Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

Cleaning up the system with pseudo-boolean optimization

You can use a PBO solver to clean up your system from unneeded automatically installed packages. First of all, you convert the system state to PB, and add an optimization function telling it to remove as many automatically installed packages as possible. Then you run this thing through a solver (such as clasp, which seems the fastest solver for PBO instances in the Debian archive) and convert its output to human-readable package names.

Code is provided at, under the MPL 2.0. You need to have python-apt and clasp installed to use it. There is potential minisat+ support, but it’s currently a bit broken.

To use, run python, and it will tell you which packages are no longer needed on your system. It ignores Suggests, if you want those in, you have to hack the code and replace {“Recommends”} by {“Recommends”, “Suggests”}. You can also turn of such dependencies by setting Program.hard_softdeps to False.

Written by Julian Andres Klode

August 16, 2012 at 20:45

Posted in APT2, Debian, Python

Implicit preferences in OR dependencies

Debian packages commonly use or dependencies of the form “a | b” to mean that a or b should be installed, while preferring option a over b. In general, for resolving an or dependency, we will try all options from the left to the right, preferring the left-most option. We also prefer real packages over virtual ones. If one of the alternatives is already installed we use that.

def solve_or(or):
  best_real = None
  best_virtual = None
  for dep in or:
     for target in dep:
        if == and best_real is None:
           best_real = target
        if != and best_virtual is None:
           best_virtual = target        
        if target.is_installed():
          return target

  return best_real if best_real is not None else best_virtual

Now, this way of solving dependencies is slightly problematic. Let us consider a package that depends on: a | b, b. APT will likely choose to install ‘a’ to satisfy the first dependency and ‘b’ to satisfy the second. I currently have draft code around for a future version of APT that will cause it to later on revert unneeded changes, which means that APT will then only install ‘b’. This result closely matches the CUDF solvers and cupt’s solver.

On the topic of solving algorithms, we also have the problem that optimizing solvers like the ones used with apt-cudf do not respect the order of dependencies, rather choosing to minimise the number of packages installed. This causes such a solver to often do stuff like selecting an sqlite database as backend for some service rather then a larger SQL server, as that installs fewer packages.

To make such solvers aware of the implicit preferences, we can introduce a new type of dependency category: Weak conflicts, also known as Recommends-Not. If a package P defines a Recommends-Not dependency against a package Q, then this means that Q should not be installed if P is installed. Now, if we have a dependency like:

Depends: a | b | c

we can encode this as:

Recommends-Not: c, c, b

Causing the solver to prefer a, then b, and then c. This should be representable as a pseudo-boolean optimization problem, as is common for the dependency problem, although I have not looked at that yet — it should work by taking the standard representation of conflicts, adding a relaxation variable and then minimising [or maximising] the number of relaxation variables.

Written by Julian Andres Klode

August 11, 2012 at 16:27

Posted in APT2, Debian, General

Memory allocators

In the past days, I looked at various memory allocators in a quest to improve performance in my multi-threaded test cases of a reference counting language runtime / object allocator (a fun project).

It turns out the glibc’s memory allocator is relatively slow, especially if you do loops that create one element and destroy one element at the same time (for example, map() on a list you no longer use after you passed it to map). To fix this problem, I considered various options.

The first option was to add a thread-local cache around malloc(). So whenever we want to free() an object, we place it on a thread-local list instead, and if a malloc() request for an object of that size comes in, we reuse the old object.

This fixes the problem with the lists, but experiments shown another problem with the glibc allocator: Allocating many objects without releasing others (let’s say appending an element to a functional list). I started testing with tcmalloc instead, and noticed that it was several times faster (reducing run time from 6.75 seconds to 1.33 seconds (5 times faster)). As I do not want to depend on a C++ code base, I decided to write a simple allocator that allocates blocks of memory via mmap(), splits those into objects, and puts the objects on a local free list. That allocator performed faster than tcmalloc(), but was also just a simple test case, and not really useable, due to potential fragmentation problems (and because the code was statically linked in, causing thread-local storage to be considerably faster, as it does not need to call a function on every TLS access, but can rather go through a segment register).

I then discovered jemalloc, and tried testing with this. It turned out that jemalloc was even faster than tcmalloc in all test cases, and also required about the same amount of memory as tcmalloc (and slightly more than the custom test code, as that combined reference counting with memory allocator metadata and thus had less meta data overhead). Using jemalloc, the list building test performs in 0.9X seconds now (instead of tcmalloc’s 1.33 or glibc’s 6.75 seconds), requires 0 major page faults (tcmalloc has 2), and uses 5361472 kb memory (tcmalloc uses 5290240, glibc requires 7866608).

Given that jemalloc is written in C, I can only recommend it to everyone in need of a scalable memory allocator. Depending on your workload, it might require less memory than tcmalloc (at least that’s the case at Facebook) and is most likely faster than tcmalloc. It also provides tcmalloc-compatible profiling facilities (as Facebook needed them). Furthermore, jemalloc is also the memory allocator used by FreeBSD, and is used by Mozilla projects, and on several Facebook projects, and should thus be relatively stable and useable.

All tests were run with 4 threads, on a dual-core laptop with hyper-threading, using (e)glibc 2.13, tcmalloc 2.0, and jemalloc 3.0. The tests may not be representative of real world performance, and do not account for fragmentation.

Should I try replacing Python’s custom caching of malloc() calls with simply calling jemalloc, and run a few Python benchmarks? Anyone interested in that?

Written by Julian Andres Klode

May 31, 2012 at 11:43

Posted in General

Reference Counting and Tail Calls

One thing I thought about today is reference counting in combination with tail calls. Imagine a function like this:

function f(x) { return g(x+1); }

Now consider that x is a reference counted object and that x + 1 creates a new object. The call to g(x + 1) shall be in tail call position.

In most reference counted languages, the reference to an argument is owned by the caller. That is, f() owns the reference to x + 1. In that case, the call to g() would no longer be in a tail call position, as we still have to decrease the reference count after g() exits.

An alternative would be that the callee owns the reference. This however, will most likely create far more reference count changes than a caller-owns language (increase reference count in caller, decrease reference count in callee). For example, the following function requires an increment before the call to g().

function f1(x) { return g(x); }

Does anyone have any ideas on how to solve the problem of tail calls while avoiding the callee-owns scenario? Something that does not require a (non-reference-counting) garbage collector would be preferably.

Written by Julian Andres Klode

April 22, 2012 at 19:05

Posted in General

Currying in the presence of mutable parameters

In the language introduced yesterday, mutable parameters play the important role of representing the side effects of actions in the type system and thus ensure referential transparency.

One of the questions currently unaddressed is how to handle currying in the presence of mutable parameters. In order to visualise this problem, consider a function

    printLn(mutable IO, String line)

If we bind the the first parameter, what should be the type of the function, and especially important how do we get back the mutable parameter? Consider the partially applied form printLn1:

    printLn1(String line)

The mutability parameter would be lost and we could not do any further I/O (and the curried form would appear as a pure function, so a good compiler would not even emit a call to it) An answer might be to put the thing into the result when currying:

    printLn2(String line) -> mutable IO

But how can we explain this? In the end result, do we maybe have to use a signature like:

    printLn(String, mutable in IO io1, mutable out IO io2)

We could then introduce syntax to call that as

    printLn(string, mutable io)

Where as the “mutable io” argument basically expands to “io” and “io1″ for the first call, and for later calls to “io1, io2″, and so on. It can also be easily curried by allowing currying to take place on variables not declared as output parameters. We can then curry the function as:

    printLn3(mutable in IO io1, mutable out IO io2)
    printLn4(mutable out IO io2)

If so, we can rename mutable back to unique and make that easier by introducing the unary operator & for two locations, just like Mercury uses ! for it. We could then write calls looking like this:

    printLn("Hello", &io);
    printLn("Hello", io, io1);

How out parameters are explained is a different topic; we could probably say that an out parameter defines a new variable.

Another option is to forbid currying of mutable parameters entirely. This would have the advantage of maintaining the somewhat simple one parameter style.

The programming language Clean does not provide any special syntactic sugar for having mutable variables. In Clean, the function gets a unique object and returns a unique object (noted by *). For example, the main entry point in a Clean program (with state) looks like this:

    Start:: *World -> *World

In short, the function Start gets a abstract world passed that is unique and at the end returns a new unique world. In Clean syntax, our example function would most likely have the signature:

    printLn :: String *IO -> *IO

You know have to either maintain one additional variable for the new unique object, which gets a bit complicated with time. On the other hand, you can do function composition on this (if you have a function composition operator that preserves uniqueness when available, as should be possible in Clean):

    printTwoLines :: *IO -> *IO 
    printTwoLines = (printLn "World") o (printLn "Hello")

Function composition on mutable things however, does not seem like it is needed often enough in a functional programming language with a C syntax.

People might also ask why monads are not used instead. Simon L Peyton Jones and Philip Wadler described monadic I/O in their paper “Imperative Functional Programming” ( in 1993, and it is used in Haskell (the paper was about the implementation in Haskell anyway), one of the world’s most popular and successful functional programming languages.

While monadic I/O works for the Haskell crowd, and surely some other people, the use of Monads also limits the expressiveness of code, at least as far as I can tell. At least as soon as you want to combine multiple monads, you will have to start lifting stuff from one monad to another (liftIO and friends), or perform all operations in the IO monad, which prevents obvious optimizations (such as parallelizing the creation of two arrays) — in short dependencies between actions are more strict than they have to be. For a functional language targeting imperative programmers, the lifting part seems a bit too complicated.

One of the big advantages of monads is that they are much easier to implement, as they do not require extensions to the type system and the Hindley-Milner type inference algorithm used by the cool functional programming languages. If you want uniqueness typing however, you need to modify the algorithm or infer the basic types first and then infer uniqueness in a second pass (as Clean seems to do it).

Written by Julian Andres Klode

April 2, 2012 at 13:43

Posted in General

Functional programming language for Python programmers and friends

Just for you, and this time in the Pythonesque rendering.

module main:
    import std (range)
    import (printf, IO)

    # print the Fahrenheit-Celcius table for fahr = 0, 20, ..., 300
    function main(mutable IO io):
        Int lower = 0    # lower bound
        Int upper = 300  # upper bound
        Int step = 20    # step
        for Int fahr in range(lower, upper, step):
            Double celcius = 5 * (fahr - 32) / 9
  , "%3d\t%6.1f\n", fahr, celcius)

It does not really look like it, but this language is purely functional. It represents side effects using unique types. If you declare a mutable parameter, you basically declare a unique input parameter and a unique output parameter.

Iā€™m also giving you a list implementation

module std.container.list:

    ## The standard singly-linked list type
    type List[E]:
        Nil                     ## empty list
            E value             ## current value
            List[E] next        ## remaining list

And yes, both languages should be able to be represented using the same abstract syntax tree. The only change is the replacement of the opening curly brace by a colon, the removal of the closing curly bracket and semicolons, the replacement of C-style comments with Python-style comments and the requirement of indentation; oh and the for statement gets a bit lighter as well.

Written by Julian Andres Klode

April 1, 2012 at 18:33

[updated] Functional programming language for C programmers and friends

Just for you:

module main {
    import std (range);
    import (printf, IO);
    /* print the Fahrenheit-Celcius table
        for fahr = 0, 20, ..., 300 */
    function main(mutable IO io) {
        Int lower = 0;   // lower bound
        Int upper = 300; // upper bound
        Int step = 20;   // step
        for (Int fahr in range(lower, upper, step)) {
            Double celcius = 5 * (fahr - 32) / 9;
  , "%3d\t%6.1f\n", fahr, celcius);

It does not really look like it, but this language is purely functional. It represents side effects using unique types. If you declare a mutable parameter, you basically declare a unique input parameter and a unique output parameter.

I’m also giving you a list implementation

module std.container.list {

    /** The standard singly-linked list type */
    type List[E] {
        Nil;                    /** empty list */
        Node {
            E value;            /** current value */
            List[E] next;       /** remaining list */

Thus are only excerpts from a document with tens of pages and the reference implementation of the standard library. The incomplete working draft for the language is attached: JAK Programming Language Early Working Draft (28 pages).

Update: Fixed the link.

Written by Julian Andres Klode

April 1, 2012 at 17:41


Get every new post delivered to your Inbox.