Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

python-apt now native Python 3 code

with one comment

Today I made an important change to the python-apt code: It is now native Python 3 code (but also works under Python 2). The previous versions all run 2to3 during the build process to create a Python 3 version. This is no longer needed, as the code is now identical.

As part of that change, python-apt now only supports Python 2.7, Python 3.3, and newer. I’m using some features only present in 3.3 like Python2 unicode literal syntax in order to keep the code simple.

Here’s how I did it:

I took the Python 2 code and ran 2to3 -f print -x future on it. This turned every print statement in a call to the print function. I then went ahead and added a “from __future__ import print_function” to the top of each module. This was the first commit.

For the second commit, I ran 2to3 -p -x future to convert the remaining stuff to Python 3, and then undid some changes (like unicode literals) and fixed the rest of the changes to work under both Python 2 and 3. Sometimes I added a top-level code like:

if sys.version_info_major >= 3:
    unicode = str

So I could use unicode in the code for the Python 2 cases.

I used various backported modules like io and stuff only available in Python 2.7, so dropped support for Python 2.6.

Written by Julian Andres Klode

January 6, 2014 at 17:46

Posted in Debian, Python

python-apt 0.9 released

leave a comment »

I released python-apt 0.9. This completely removes support for the old API from the code base (it was disabled for the entirety of 0.8 in Debian, and in Ubuntu since saucy). Highlights:

  • Cleanup: Complete removal of old-api support code
  • Bug fix: Various coverty bug fixes by Michael Vogt
  • Bug fix: Correctly handles multi-arch dependencies in apt.debfile, so packagekit and gdebi can now install local multi-arch packages correctly
  • Bug fix: A segmentation fault has been fixed. When releasing the value of the policy attribute of an apt_pkg.Cache object, its destructor deleted the pkgPolicy, but that was managed by a CacheFile from APT, causing it to be deleted twice.
  • Bug fix: Tests do not depend on the contents of /tmp anymore
  • Bug fix: All examples and old tests have been updated to the current python-apt API
  • Feature: Paths can now be specified using ‘bytes’ objects instead of ‘str’ in Python 3.
  • Ubuntu-specific: Meta-data for Ubuntu 14.04 — although with a typo (‘thar’ instead of ‘tahr’), but that is fixed in git

 

Written by Julian Andres Klode

October 22, 2013 at 00:03

Posted in Debian

Review: ThinkPad X230

This week, a few hours after Lenovo announced the X240, I bought an X230. Normally, the X230 model I bought comes with a Core i5-3320M CPU, 4GB RAM, 500GB HDD. My model was a special set including a second 4GB RAM stick and a 128 GB mSATA Plextor SSD. It came without Windows; and the ThinkVantage button is black instead of blue and has no label.

I put a fresh installation of unstable on it and tweaked it to save more power when on battery (enabled RC6++, and enabled autosuspend and other powertop suggestions with a script in /etc/pm/power.d); and configured hdparm.conf to put my hard disk into standby after 5 seconds (it’s only used for big data anyway, so most of the time it is unused). It now consumes 5W in idle with minimum brightness, and 8-10W with brightness 13 of 15. Consumption when surfing is 10 – 15 watts. Booting from grub to gdm is fast, I did not measure it, but it probably took about 5 seconds.

The IPS screen looks really good. Much much much better than the screen in my 2010 Edge 15 (I reviewed that one in 2010). It seems a bit more glossy than that one, but still matte. Keyboard is good as well. The touch pad is crap however. All hardware seems to work.

Comparison to the X240 for others who think about buying one of them: The X240 is lighter and thinner (it’s an Ultrabook) and has an optional FullHD 12.5 inch screen. It also offers a much larger touchpad and palm rest. But compared to the X230, the X240 lacks many things: No dedicated buttons for the trackpoint (you need to use the click-pad), it’s limited at 8GB RAM, uses a slower low-voltage Haswell CPU, and it uses the new M.2 connector (probably supporting only the shortest cards) instead of mini-PCIe, so it’s not really possible to add an additional SSD currently; as M.2 SSDs do not seem to be available yet. I also do not know whether the X240 offers ThinkLight or LEDs for hard drive activity and similar.

Written by Julian Andres Klode

September 7, 2013 at 18:43

Posted in General

apt-show-versions rewrite in C++ (more than 10 times faster)

The script apt-show-versions is developed by another Debian Developer called Christoph Martin in Perl. Recently, it turned out that apt-show-versions is too slow for some users; so I decided to rewrite his program using APT’s C++ API. I expect this to be part of a future APT release, rendering the original apt-show-versions obsolete.

The rewrite is sadly not 100% backwards compatible to the original version; as some option names had to be renamed due to our command-line parser not supporting option names like -nh, and some other options were dropped because they are hard to support (like –status-file and –lists-dir) with our command-line parsing. I also decided not to keep the the -p and -r options, but use the standard APT command-line conventions insteads.

For now, it also cannot show you the distribution names you have specified in your sources.list file, but will always display codenames instead; if available. I hope to fix this in Jessie by extending APT’s cache format a bit.

On the performance side, this program now takes about 0.09s compared to the 1.40 seconds needed by apt-show-versions. The times are taken with all data in caches.

The current version can be found in a git repository, a link to gitweb is:

http://anonscm.debian.org/gitweb/?p=users/jak/apt-show-versions.git

Please also note that support for –allversions is not 100% implemented yet, but it should work for most uses.

Now, go testing and report back!

Written by Julian Andres Klode

April 9, 2013 at 20:40

Posted in Debian

Recursive-descent in Python, using generators

Writing recursive descent parsers is easy, especially in Python (just like everything is easy in Python). But Python defines a low limit on the number of recursive calls that can be made, and recursive descent parsers are prone to exceed this limit.

We should thus write our parser without real recursion. Fortunately, Python offers us a way out: Coroutines, introduced in Python 2.5 as per PEP 342. Using coroutines and a simple trampoline function, we can convert every mutually recursive set of functions into a set of coroutines that require constant stack space.

Let’s say our parser looks like this (tokens being an iterator over characters):

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        return token
    elif token == '(':
         result = parse(tokens)
         if next(tokens) != ')': raise ...
         return result
    else:
         raise ...

We now apply the following transformations:
return X => yield (X)
parse() => (yield parse())

That is, we yield the value we want to return, and we yield a generator whenever we want to call (using a yield expression). Our rewritten parser reads:

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        yield token
    elif token == '(':
         result = yield parse(tokens)
         if next(tokens) != ')': raise ...
         return result
    else:
         raise ...

We obviously cannot call that generator like the previous function. What we need to introduce is a trampoline. A trampoline is a function that manages that yielded generators, calls them, and passes their result upwards. It looks like this:

def trampoline(generator):
    """Execute the generator using trampolining. """
    queue = collections.deque()

    def schedule(coroutine, stack=(), value=None):
        def resume():
            if 0:
                global prev
                now = stack_len(stack)
                if now < prev:
                    print("Length", now)
                    prev = -1
                elif prev != -1:
                    prev = now
            result = coroutine.send(value)
            if isinstance(result, types.GeneratorType):     # Normal call
                schedule(result, (coroutine, stack))
            elif isinstance(result, Tail):                  # Tail call (if you want to)
                schedule(result.value, stack)
            elif stack:                                     # Return to parent
                schedule(stack[0], stack[1], result)
            else:                                           # Final Return
                return result

        queue.append(resume)

    schedule(generator)

    result = None
    while queue:
        func = queue.popleft()
        result = func()

    return result

This function is based on the code in PEP 342, the difference being that

  • we do not correctly propagate exceptions through the stack, but directly unwind to the caller of the parser (we don’t handle exceptions inside our parser generators anyway)
  • the code actually compiles (code in PEP used ‘value = coroutine.send(value)’ which does not work)
  • the code returns a value (code in PEP was missing a return in schedule)
  • we don’t use a class, and allow only one function to be scheduled at once (yes, we could get rid of the queue)
  • we allow tail calls [where Tail = namedtuple(“Tail”, ["result"])] to save some more memory.

For a more generic version of that, you might want to re-add exception passing, but the exceptions will then have long tracebacks, so I’m not sure how well they will work if you have deep recursion.

Now, the advantage of that is that our parser now requires constant stack space, because the actual real stack is stored in the heap using tuples which are used like singly-linked lists in scheme here. So, the only recursion limit is available memory.

A disadvantage of that transformation is that the code will run slightly slower for small inputs that could be handled using a normally recursive parser.

Written by Julian Andres Klode

January 11, 2013 at 12:30

Posted in Python

Cleaning up the system with pseudo-boolean optimization

You can use a PBO solver to clean up your system from unneeded automatically installed packages. First of all, you convert the system state to PB, and add an optimization function telling it to remove as many automatically installed packages as possible. Then you run this thing through a solver (such as clasp, which seems the fastest solver for PBO instances in the Debian archive) and convert its output to human-readable package names.

Code is provided at http://anonscm.debian.org/gitweb/?p=users/jak/cleanup.git, under the MPL 2.0. You need to have python-apt and clasp installed to use it. There is potential minisat+ support, but it’s currently a bit broken.

To use, run python program_builder.py, and it will tell you which packages are no longer needed on your system. It ignores Suggests, if you want those in, you have to hack the code and replace {“Recommends”} by {“Recommends”, “Suggests”}. You can also turn of such dependencies by setting Program.hard_softdeps to False.

Written by Julian Andres Klode

August 16, 2012 at 20:45

Posted in APT2, Debian, Python

Implicit preferences in OR dependencies

Debian packages commonly use or dependencies of the form “a | b” to mean that a or b should be installed, while preferring option a over b. In general, for resolving an or dependency, we will try all options from the left to the right, preferring the left-most option. We also prefer real packages over virtual ones. If one of the alternatives is already installed we use that.

def solve_or(or):
  best_real = None
  best_virtual = None
  for dep in or:
     for target in dep:
        if target.name == dep.name and best_real is None:
           best_real = target
        if target.name != dep.name and best_virtual is None:
           best_virtual = target        
        if target.is_installed():
          return target

  return best_real if best_real is not None else best_virtual

Now, this way of solving dependencies is slightly problematic. Let us consider a package that depends on: a | b, b. APT will likely choose to install ‘a’ to satisfy the first dependency and ‘b’ to satisfy the second. I currently have draft code around for a future version of APT that will cause it to later on revert unneeded changes, which means that APT will then only install ‘b’. This result closely matches the CUDF solvers and cupt’s solver.

On the topic of solving algorithms, we also have the problem that optimizing solvers like the ones used with apt-cudf do not respect the order of dependencies, rather choosing to minimise the number of packages installed. This causes such a solver to often do stuff like selecting an sqlite database as backend for some service rather then a larger SQL server, as that installs fewer packages.

To make such solvers aware of the implicit preferences, we can introduce a new type of dependency category: Weak conflicts, also known as Recommends-Not. If a package P defines a Recommends-Not dependency against a package Q, then this means that Q should not be installed if P is installed. Now, if we have a dependency like:

Depends: a | b | c

we can encode this as:

Recommends-Not: c, c, b

Causing the solver to prefer a, then b, and then c. This should be representable as a pseudo-boolean optimization problem, as is common for the dependency problem, although I have not looked at that yet — it should work by taking the standard representation of conflicts, adding a relaxation variable and then minimising [or maximising] the number of relaxation variables.

Written by Julian Andres Klode

August 11, 2012 at 16:27

Posted in APT2, Debian, General

Follow

Get every new post delivered to your Inbox.