Category: Python

python-apt now native Python 3 code

Today I made an important change to the python-apt code: It is now native Python 3 code (but also works under Python 2). The previous versions all run 2to3 during the build process to create a Python 3 version. This is no longer needed, as the code is now identical.

As part of that change, python-apt now only supports Python 2.7, Python 3.3, and newer. I’m using some features only present in 3.3 like Python2 unicode literal syntax in order to keep the code simple.

Here’s how I did it:

I took the Python 2 code and ran 2to3 -f print -x future on it. This turned every print statement in a call to the print function. I then went ahead and added a “from __future__ import print_function” to the top of each module. This was the first commit.

For the second commit, I ran 2to3 -p -x future to convert the remaining stuff to Python 3, and then undid some changes (like unicode literals) and fixed the rest of the changes to work under both Python 2 and 3. Sometimes I added a top-level code like:

if sys.version_info_major >= 3:
    unicode = str

So I could use unicode in the code for the Python 2 cases.

I used various backported modules like io and stuff only available in Python 2.7, so dropped support for Python 2.6.


Recursive-descent in Python, using generators

Writing recursive descent parsers is easy, especially in Python (just like everything is easy in Python). But Python defines a low limit on the number of recursive calls that can be made, and recursive descent parsers are prone to exceed this limit.

We should thus write our parser without real recursion. Fortunately, Python offers us a way out: Coroutines, introduced in Python 2.5 as per PEP 342. Using coroutines and a simple trampoline function, we can convert every mutually recursive set of functions into a set of coroutines that require constant stack space.

Let’s say our parser looks like this (tokens being an iterator over characters):

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        return token
    elif token == '(':
         result = parse(tokens)
         if next(tokens) != ')': raise ...
         return result
         raise ...

We now apply the following transformations:
return X => yield (X)
parse() => (yield parse())

That is, we yield the value we want to return, and we yield a generator whenever we want to call (using a yield expression). Our rewritten parser reads:

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        yield token
    elif token == '(':
         result = yield parse(tokens)
         if next(tokens) != ')': raise ...
         return result
         raise ...

We obviously cannot call that generator like the previous function. What we need to introduce is a trampoline. A trampoline is a function that manages that yielded generators, calls them, and passes their result upwards. It looks like this:

def trampoline(generator):
    """Execute the generator using trampolining. """
    queue = collections.deque()

    def schedule(coroutine, stack=(), value=None):
        def resume():
            if 0:
                global prev
                now = stack_len(stack)
                if now < prev:
                    print("Length", now)
                    prev = -1
                elif prev != -1:
                    prev = now
            result = coroutine.send(value)
            if isinstance(result, types.GeneratorType):     # Normal call
                schedule(result, (coroutine, stack))
            elif isinstance(result, Tail):                  # Tail call (if you want to)
                schedule(result.value, stack)
            elif stack:                                     # Return to parent
                schedule(stack[0], stack[1], result)
            else:                                           # Final Return
                return result



    result = None
    while queue:
        func = queue.popleft()
        result = func()

    return result

This function is based on the code in PEP 342, the difference being that

  • we do not correctly propagate exceptions through the stack, but directly unwind to the caller of the parser (we don’t handle exceptions inside our parser generators anyway)
  • the code actually compiles (code in PEP used ‘value = coroutine.send(value)’ which does not work)
  • the code returns a value (code in PEP was missing a return in schedule)
  • we don’t use a class, and allow only one function to be scheduled at once (yes, we could get rid of the queue)
  • we allow tail calls [where Tail = namedtuple(“Tail”, [“result”])] to save some more memory.

For a more generic version of that, you might want to re-add exception passing, but the exceptions will then have long tracebacks, so I’m not sure how well they will work if you have deep recursion.

Now, the advantage of that is that our parser now requires constant stack space, because the actual real stack is stored in the heap using tuples which are used like singly-linked lists in scheme here. So, the only recursion limit is available memory.

A disadvantage of that transformation is that the code will run slightly slower for small inputs that could be handled using a normally recursive parser.

Cleaning up the system with pseudo-boolean optimization

You can use a PBO solver to clean up your system from unneeded automatically installed packages. First of all, you convert the system state to PB, and add an optimization function telling it to remove as many automatically installed packages as possible. Then you run this thing through a solver (such as clasp, which seems the fastest solver for PBO instances in the Debian archive) and convert its output to human-readable package names.

Code is provided at, under the MPL 2.0. You need to have python-apt and clasp installed to use it. There is potential minisat+ support, but it’s currently a bit broken.

To use, run python, and it will tell you which packages are no longer needed on your system. It ignores Suggests, if you want those in, you have to hack the code and replace {“Recommends”} by {“Recommends”, “Suggests”}. You can also turn of such dependencies by setting Program.hard_softdeps to False.

FAIL: Arch Linux switch python executable to Python 3

Today, I got an email from an user of one of my Python scripts asking why the script t does not work on Arch Linux anymore. As it turns out, the Arch Linux team decided to switch /usr/bin/python to Python 3.0 and use python2 for Python 2.X versions. By doing this, they decided to make their distribution incompatible to almost any Python script in the world.

Arch Linux’s decision to diverge from the rest of the world that uses python for Python 2.X and python3 for Python 3.X is stupid. And doing this without updating reverse dependencies beforehand and thus breaking packages in their own distribution is insane. In the end this means that if you use Arch Linux, you should consider switching to a distribution that does things the right way: Debian. You can also switch to Ubuntu, that’s normally just a bit less right (= faster). But using a distribution that does those crazy things in such an irresponsible way is really insane.

Really, how can they be so stupid in the Arch world?

Python 3.1 bug: Objects in modules (m_size=-1) not deallocated

Last year, in July, I reported an issue to Python’s issue tracker. This issue can be seen at Since then, there has been no action on this bug from the developers.

The bug describes that every object stored in a module will not be deallocated if the module is deallocated and it’s m_size = -1 (which it should be if the module has no state). The problem seems to be that Python copies the dictionary of the module but forgets to decrease the reference count of the copy when the module is deallocated. This bug may have serious impact if objects are stored in the module which write status to a file when they are deallocated, because the deallocation functions are never called.

Python 3.1 and python-apt

So, I have started to port python-apt to Python 3. Most things work already, but there is one single problem. I can not access the attributes of the objects, only their methods.

In Python 2.5, everything works perfectly. In Python 3.1, the same code produces an error. An example is apt_pkg.GetCache().Packages. I defined the slots tp_getattro and tp_methods. In Python 3.1, tp_getattro seems to be ignored.

If you want to help, for browsing the branch and for branching it.

This branch contains the current state and allows you to build a python-apt package shipping with Python 3.1 modules and extensions. But as I wrote above, not everything works yet. But its far enough for about 5 hours of work.

Update 2009-04-12 00:17 CEST: I just fixed a bunch of problems, including the one listed above. Most of python-apt should work now, including the ‘apt’ package.

python-apt documentation – 2nd time


introduced again some new documentation.

doc/source/apt/debfile.rst | 6
doc/source/apt_pkg/cache.rst | 540 +++++++++++++++++++++++++++++++++-
doc/source/apt_pkg/index.rst | 24 +
doc/source/ | 5
doc/source/examples/ | 22 +
doc/source/examples/ | 29 +
doc/source/examples/ | 51 +++

7 files changed, 659 insertions(+), 18 deletions(-)

You can see them at

Missing seem to be:
– AcquireFile
– AcquireItem
– ActionGroup
– Configuration
– MetaIndex
– PackageIndexFile
– PkgManager
– PkgRecords
– PkgSourceList
– PkgSrcRecords
– ProblemResolver
– TagFile
– TagSection

But the weekend is not over yet. I will blog tomorrow again, and once I’m finished with everything (which may be tomorrow, too).

Have fun, read the documentation, find the mistakes. Some things are currently written as ‘???’, if you know what fits there, please tell me.