Category: Debian

Cleaning up the system with pseudo-boolean optimization

You can use a PBO solver to clean up your system from unneeded automatically installed packages. First of all, you convert the system state to PB, and add an optimization function telling it to remove as many automatically installed packages as possible. Then you run this thing through a solver (such as clasp, which seems the fastest solver for PBO instances in the Debian archive) and convert its output to human-readable package names.
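
A minimal sketch of the encoding step, assuming a toy description of the system rather than the real APT cache: every installed package becomes a 0/1 variable, every dependency becomes a linear constraint, and the objective minimises the number of automatically installed packages that are kept. The function name to_opb and its input dictionaries are made up for this illustration and are not taken from the cleanup code; the output is meant to follow the OPB text format that pseudo-boolean solvers commonly read.

```python
def to_opb(depends, manual, auto):
    """Encode a toy package state as an OPB (pseudo-boolean) instance.

    depends: dict mapping package -> list of or-groups (lists of names)
    manual:  set of manually installed packages (must stay installed)
    auto:    set of automatically installed packages (removal candidates)
    """
    installed = sorted(manual | auto)
    var = {name: "x%d" % (i + 1) for i, name in enumerate(installed)}

    constraints = []
    # Manually installed packages are forced to stay (variable = 1).
    for pkg in sorted(manual):
        constraints.append("+1 %s >= 1 ;" % var[pkg])
    # pkg -> (d1 or d2 or ...) becomes  -pkg + d1 + d2 + ... >= 0.
    # Alternatives that are not installed are dropped: cleanup only removes.
    for pkg in installed:
        for group in depends.get(pkg, []):
            terms = ["-1 %s" % var[pkg]]
            terms += ["+1 %s" % var[d] for d in group if d in var]
            constraints.append(" ".join(terms) + " >= 0 ;")

    # Keep as few automatically installed packages as possible.
    objective = "min: " + " ".join("+1 %s" % var[p] for p in sorted(auto)) + " ;"
    header = "* #variable= %d #constraint= %d" % (len(var), len(constraints))
    return "\n".join([header, objective] + constraints) + "\n"
```

Any automatically installed package whose variable the solver sets to 0 is a candidate for removal; mapping the variable names back to package names gives the human-readable result.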

Code is provided at http://anonscm.debian.org/gitweb/?p=users/jak/cleanup.git, under the MPL 2.0. You need to have python-apt and clasp installed to use it. There is potential minisat+ support, but it’s currently a bit broken.

To use it, run python program_builder.py, and it will tell you which packages are no longer needed on your system. It ignores Suggests; if you want those included, you have to hack the code and replace {"Recommends"} by {"Recommends", "Suggests"}. You can also turn off such dependencies by setting Program.hard_softdeps to False.

Implicit preferences in OR dependencies

Debian packages commonly use or dependencies of the form “a | b” to mean that a or b should be installed, while preferring option a over b. In general, for resolving an or dependency, we will try all options from the left to the right, preferring the left-most option. We also prefer real packages over virtual ones. If one of the alternatives is already installed we use that.

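def solve_or(or_group):
    # "or" is a reserved word in Python, so the parameter is named or_group.
    best_real = None
    best_virtual = None
    for dep in or_group:
        for target in dep:
            # A target named like the dependency is a real package;
            # any other target only provides that name (virtual package).
            if target.name == dep.name and best_real is None:
                best_real = target
            if target.name != dep.name and best_virtual is None:
                best_virtual = target
            # An already installed alternative always wins.
            if target.is_installed():
                return target

    # Otherwise prefer the left-most real package over the left-most virtual one.
    return best_real if best_real is not None else best_virtual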

Now, this way of solving dependencies is slightly problematic. Let us consider a package that depends on: a | b, b. APT will likely choose to install ‘a’ to satisfy the first dependency and ‘b’ to satisfy the second. I currently have draft code around for a future version of APT that will cause it to later on revert unneeded changes, which means that APT will then only install ‘b’. This result closely matches the CUDF solvers and cupt’s solver.
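
To make the a | b, b example concrete, here is a toy sketch of the two behaviours. It is not APT's resolver and not the draft code mentioned above; the function names resolve_leftmost and revert_unneeded are hypothetical, and packages are plain strings.

```python
def resolve_leftmost(or_groups):
    """Satisfy each or-group in order, picking its left-most alternative."""
    selected = []
    for group in or_groups:
        # Only pick something if nothing selected so far satisfies the group.
        if not any(name in selected for name in group):
            selected.append(group[0])
    return selected


def revert_unneeded(selected, or_groups):
    """Drop selections that are not needed to keep every group satisfied."""
    kept = list(selected)
    for name in selected:
        trial = [n for n in kept if n != name]
        if all(any(n in trial for n in group) for group in or_groups):
            kept = trial
    return kept


groups = [["a", "b"], ["b"]]            # Depends: a | b, b
first_pass = resolve_leftmost(groups)   # installs both a and b
final = revert_unneeded(first_pass, groups)  # only b remains
```

The first pass selects a for the first group because nothing satisfies it yet, and b for the second; the second pass notices that b alone satisfies both groups and drops a.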

On the topic of solving algorithms, we also have the problem that optimizing solvers like the ones used with apt-cudf do not respect the order of dependencies, choosing instead to minimise the number of packages installed. This often causes such a solver to select, for example, an sqlite database as the backend for some service rather than a larger SQL server, as that installs fewer packages.

To make such solvers aware of the implicit preferences, we can introduce a new type of dependency category: Weak conflicts, also known as Recommends-Not. If a package P defines a Recommends-Not dependency against a package Q, then this means that Q should not be installed if P is installed. Now, if we have a dependency like:

Depends: a | b | c

we can encode this as:

Recommends-Not: c, c, b

This causes the solver to prefer a, then b, and then c. This should be representable as a pseudo-boolean optimization problem, as is common for the dependency problem, although I have not looked at that yet; it should work by taking the standard representation of conflicts, adding a relaxation variable to each, and then minimising [or maximising] the number of relaxation variables.
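
A small brute-force sketch of the relaxation idea, under the assumption that every Recommends-Not entry gets its own relaxation variable, so that listing c twice makes installing c cost twice as much as installing b. The function best_selection and its parameters are hypothetical, and a real encoding would hand the problem to a PBO solver instead of enumerating subsets.

```python
from itertools import combinations


def best_selection(or_group, weak_conflicts, installable):
    """Brute-force the cheapest non-empty selection from one or-group.

    Each weak conflict has a relaxation variable that is 1 when the
    conflict is violated. The cost is the number of violated conflicts,
    with the number of selected packages as a tie-breaker.
    """
    candidates = [p for p in or_group if p in installable]
    best = None
    for size in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, size):
            relaxed = sum(1 for q in weak_conflicts if q in chosen)
            cost = (relaxed, len(chosen))
            if best is None or cost < best[0]:
                best = (cost, set(chosen))
    return best[1] if best else None


group = ["a", "b", "c"]          # Depends: a | b | c
penalties = ["c", "c", "b"]      # Recommends-Not: c, c, b
preferred = best_selection(group, penalties, {"a", "b", "c"})
fallback = best_selection(group, penalties, {"b", "c"})
```

With all three alternatives available, a is chosen at cost 0; if a cannot be installed, b (cost 1) beats c (cost 2).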

Managing system package selections using custom meta packages

Over the last few years, I have developed a variety of meta packages for managing the package selections of the systems I administer. The meta packages are organized like this:

jak-standard
    Standard packages for all systems
jak-desktop
    Standard packages for all desktop systems (GNOME 3 if possible, otherwise GNOME 2)
jak-printing
    Print support
jak-devel
    Development packages
jak-machine-<X>
    The meta package defining the computer X

Each computer has a jak-machine-X package installed. This package is marked as manually installed; all other packages are marked as automatically installed.

The machine packages have the attribute XB-Important: yes set in debian/control. This creates an Important: yes field. This field is not official, but APT recognizes it and does not remove those packages (APT sets the same field for its own package when building the cache, since APT should not remove itself either). It seems to work a bit like Essential, with the exception that non-installed packages are not installed automatically on dist-upgrade.

The meta packages are created using seed files similar to Ubuntu. In contrast to Ubuntu, I’m not using germinate to create the packages from the seeds, but a custom dh_germinate_lite that simply takes a seed file and creates the correct substvars. It’s faster than germinate and really simplistic. It also does not handle Recommends currently.
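
Roughly, the idea looks like the sketch below. This is not the actual dh_germinate_lite; it assumes seed entries are lines of the form " * package" as in Ubuntu's seeds, with parenthesised entries standing for Recommends, and the substitution variable name germinate:Depends is only a placeholder.

```python
def seed_to_substvars(seed_text, variable="germinate:Depends"):
    """Turn the ' * package' entries of a seed file into one substvars line."""
    packages = []
    for line in seed_text.splitlines():
        line = line.strip()
        if not line.startswith("* "):
            continue  # headings, free text and blank lines
        entry = line[2:].split("#", 1)[0].strip()
        if not entry or entry.startswith("("):
            continue  # parenthesised entries (Recommends) are not handled
        if entry not in packages:
            packages.append(entry)
    return "%s=%s\n" % (variable, ", ".join(packages))


example_seed = """Standard packages

 * vim
 * (emacs)
 * git # version control
 * vim
"""
example_line = seed_to_substvars(example_seed)
```

The resulting line would be appended to the package's substvars file, and debian/control would then reference the variable in its Depends field.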

The whole result can be seen at http://anonscm.debian.org/gitweb/?p=users/jak/jak-meta.git. Maybe that’s useful for some people. And if you happen to find some packages in the seeds that are deprecated, please let me know. Oh, and yes, some packages (such as the letterman one) are internal software not publicly available yet [letterman is a simple GUI for creating letters using LaTeX].

While I’m at it, I also built Ubuntu’s version of wine1.2 for i386 squeeze. It can be found in
deb http://people.debian.org/~jak/debian/ squeeze main (it still needs a few changes to be correct though, I’ll upload a jak2 build soon). I also built updated sun-java6 packages for my parents (mostly needed due to the plugin, some websites do not work with the IcedTea one), but can’t share the binaries due to licensing requirements. I may push out a source repository, though, so others can build those packages themselves. I’ll let you know once that’s done.

dh-autoreconf v4 released, patching ltmain.sh for as-needed support

Yesterday I released version 4 of dh-autoreconf, fixing two bugs, and introducing a new feature: Patching ltmain.sh to make -Wl,--as-needed work.

For this new feature, run dh_autoreconf with the --as-needed option. dh_autoreconf will then patch all ltmain.sh equal to the system one (which should be all ltmain.sh files if libtoolize ran before or via dh_autoreconf). On clean, dh_autoreconf_clean reverses the patch again.
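
The selection step ("all ltmain.sh equal to the system one") can be pictured like this. It is a sketch in Python rather than the actual dh_autoreconf code; the function names are hypothetical, and the location of the system ltmain.sh is passed in by the caller because it depends on the installed libtool.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def find_patchable(source_dir, system_ltmain):
    """List every ltmain.sh below source_dir identical to the system copy."""
    wanted = file_digest(system_ltmain)
    matches = []
    for root, _dirs, files in os.walk(source_dir):
        if "ltmain.sh" in files:
            candidate = os.path.join(root, "ltmain.sh")
            # Modified copies are skipped: the patch might not apply to them.
            if file_digest(candidate) == wanted:
                matches.append(candidate)
    return sorted(matches)
```

Only the files returned here would be patched, and the same list is what a clean step would need in order to reverse the patch.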

So, if your package runs autoreconf and patches ltmain.sh via a patch, you can now do this automatically via dh-autoreconf and be future-proof.

The only problem is that this might break once the patch no longer applies to libtool, at which point I need to update the package to include an updated patch. A solution for this problem would be to include the patch in libtool itself, as I proposed in Bug#347650.

In case this works well, the option could also become the default which would make things even easier.

0x15 + 1/365

Yesterday was my 21st birthday, and I received all “Hitchhiker’s Guide to the Galaxy” novels: the first five in one book, and the sixth one, written by Eoin Colfer, in another. Needless to say, the first book weighs more than an N900. I have not read them yet, so now is the perfect chance to do so. Yes, I did not know that the 25th is Towel Day, sorry for that.

I also bought a Toshiba AC100 before my birthday, a Tegra 2 based notebook/netbook/“web companion” with a 1 GHz dual-core ARM Cortex A9 chip and 512 MB RAM. It runs Android by default, and had a price of 160€, which is low compared to anything else with a Cortex A9. From time to time it currently runs Ubuntu 11.04 with a specialised 2.6.37 kernel, without sound and accelerated video (and with non-functioning HDMI). I am mostly waiting for Nvidia to release a new binary blob for the video part (and yes, if you just want to build packages, you can probably be happy without those things).

Another thing happening last week is the upload of python-apt 0.8.0 to unstable, marking the beginning (or end) of the API transition I started more than a year ago. Almost all packages not supporting it have proper Breaks in python-apt [most of them already fixed, only 2 packages remaining, one of which is “maintained” (well, not really maintained right now) by me], but there may be some which do not work correctly despite being fixed (or at least thought to be fixed).

If you know any other interesting thing I did last week, leave a comment, I wrote enough now. And yes, WordPress wants to write a multiplication sign instead of an x, so I had to use &#120 instead.

Project APT2: new cache format and small things

I did not write much code or merge much of my prototype code, but some things happened since the last blog post about APT2 specific things in August and I forgot to write about them.

First of all, I dropped the GVariant-based cache. The format strings were simply getting ugly long and were not very understandable, performance was just much too slow (needing more than a few nanoseconds for a package lookup is obviously too slow for solving dependency problems); furthermore, building the cache was also slow and complicated because we needed all attributes of an object at once to pass them to GVariant, leading to ugly API.

I replaced the GVariant cache with one that can be easily mmap()ed and is described completely in C. It’s derived from APT’s cache design (but more robust, as it includes the size of the cache and we can thus detect files that are too small, although that’s scheduled for the next ABI break in APT as well), but has less duplicate data, and uses arrays where APT uses linked lists. The reason for arrays is simple: They take up less space and can be represented naturally in Python and other languages using array-based lists. The cache also contains a coalesced hash table which does use a linked list, but that one is a bit different, as it is for searching only and not exposed. Everything non-stringy is 64-bit aligned in order to keep things as simple as possible. All integers are fixed size, thus the format is architecture-independent if you fix byte orders. The format is described at http://people.debian.org/~jak/apt2-doc/apt-Cache-Format.html.
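
To illustrate the general idea (a sized header followed by an array of fixed-size records), here is a toy layout in Python. It is not the APT2 cache format: the magic string, the field choice and the function names are all invented for this sketch, and the byte order is fixed to little-endian.

```python
import struct

# Hypothetical toy layout; every integer is a fixed-size 64-bit value.
HEADER = struct.Struct("<8sQQ")  # magic, total size in bytes, record count
PACKAGE = struct.Struct("<QQ")   # name offset, version offset


def build_cache(records):
    """Pack (name_offset, version_offset) records behind a sized header."""
    body = b"".join(PACKAGE.pack(*record) for record in records)
    size = HEADER.size + len(body)
    return HEADER.pack(b"TOYCACHE", size, len(records)) + body


def open_cache(data):
    """Validate the header and return (count, accessor) for the record array."""
    if len(data) < HEADER.size:
        raise ValueError("file too small for a header")
    magic, size, count = HEADER.unpack_from(data, 0)
    if magic != b"TOYCACHE":
        raise ValueError("not a toy cache")
    # The stored size lets us detect truncated files before reading records.
    if len(data) < size or size < HEADER.size + count * PACKAGE.size:
        raise ValueError("truncated or inconsistent cache")

    def get_package(index):
        if not 0 <= index < count:
            raise IndexError(index)
        return PACKAGE.unpack_from(data, HEADER.size + index * PACKAGE.size)

    return count, get_package
```

Because records are found by multiplying an index with a fixed record size, the same accessor works on a bytes object or on an mmap of the file, and a lookup never has to walk a linked list.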

I stole one more idea from cupt and changed the configuration system to verify types of variables. APT2’s configuration system knows more types than cupt’s, though, including regular expressions, directory and filenames (i.e. it does not let you store a value /d/ in a file variable), strings (which store everything), unsigned and signed integers, and boolean options; all of which are checked when parsing files (producing warnings) or command-line options (producing errors).
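
A rough sketch of such type checking, written in Python for brevity. This is not APT2's AptConfiguration API; the Config class, the type names and the option names used in the example are hypothetical. It only shows the behaviour described above: a warning for a bad value from a file, an error for a bad value from the command line.

```python
import re
import warnings


def check_regex(value):
    re.compile(value)  # raises re.error for an invalid pattern
    return value


def check_file(value):
    if value.endswith("/"):
        raise ValueError("file name must not end in '/'")
    return value


def check_bool(value):
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError("not a boolean: %r" % value)


def check_unsigned(value):
    number = int(value)
    if number < 0:
        raise ValueError("negative value: %r" % value)
    return number


CHECKERS = {"regex": check_regex, "file": check_file, "bool": check_bool,
            "int": int, "uint": check_unsigned, "string": str}


class Config:
    def __init__(self, schema):
        self.schema = schema  # option name -> type name
        self.values = {}

    def set(self, name, raw, from_command_line=False):
        """Store raw if it passes the type check of the option."""
        checker = CHECKERS[self.schema.get(name, "string")]
        try:
            self.values[name] = checker(raw)
        except (ValueError, re.error) as exc:
            if from_command_line:
                raise ValueError("%s: %s" % (name, exc))
            warnings.warn("%s: %s" % (name, exc))


config = Config({"Example::Log": "file", "Example::Retries": "uint"})
config.set("Example::Retries", "3")
```

Options without a declared type fall back to string, which accepts everything.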

I have also simplified the type world by removing all iterator types except for one, replacing them with get_thing() and n_things() functions in the objects holding the arrays. Makes cool bindings slightly harder, but makes the C API much easier to use from C.
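
From the binding side, a pair of n_things() and get_thing() functions can still be turned into something list-like with a small adapter. The sketch below uses a pure-Python stand-in for the C object; the class names and the n_versions()/get_version() pair are hypothetical examples of the pattern, not APT2 functions.

```python
from collections.abc import Sequence


class FakePackage:
    """Stand-in for a C object exposing n_versions()/get_version()."""

    def __init__(self, versions):
        self._versions = list(versions)

    def n_versions(self):
        return len(self._versions)

    def get_version(self, index):
        return self._versions[index]


class AccessorView(Sequence):
    """Adapt a count function and an index function to a Python sequence."""

    def __init__(self, count, get):
        self._count = count
        self._get = get

    def __len__(self):
        return self._count()

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)  # also ends iteration over the view
        return self._get(index)


package = FakePackage(["1.0", "1.1"])
versions = AccessorView(package.n_versions, package.get_version)
```

Iteration, len() and membership tests then come for free from the Sequence base class, so the binding needs one generic adapter instead of one iterator type per array.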

Most things expected from a package manager are still missing, but what is there looks good in most cases (especially AptConfiguration has a nice API, and no complaints from valgrind anywhere). Currently I am working on Python bindings so I can interact with the functions easily and check things in an interactive fashion; and I am also writing a document explaining the concepts behind APT2, drafts at http://people.debian.org/~jak/midlevel.pdf. I also have some more code pending further thoughts (including complete index parsing), but it might still take some time before I have something usable in the wild.

On other package managers: From time to time I also use Cupt, look at Cupt code, hack Cupt code, and report bugs against Cupt. I still do not really understand the (extreme) nesting of directory structures in the source code, why there are so (extremely) many source files split all over them, or the general concepts of Cupt, but I can hack together what I need for my personal testing. I also play with yum whenever I end up on a Fedora system (which happens from time to time).

last two weeks

The last two weeks, two new python-apt releases were made. 0.8.0~exp3 did not add much, but 0.8.0~exp4 added some new bindings for our friends at the mancoosi project. I also committed several fixes to the APT repository, but did not upload them yet.

In #debian-devel, some people (including me and others on the Debian side; and sladen, sabdfl for the Ubuntu side) discussed the Ubuntu font license which is considered non-free by Debian, due to extreme naming restrictions in section 2 (unmodified versions must keep the name, slightly modified versions must keep the name and add something). Some consider those restrictions equivalent to invariant sections. After we discussed the font license, we quickly got to discussing Doctor Who and time travel, as those two are obviously connected.

Some other things happened as well, like closing more bugs, but all in all, the last two weeks were a bit less intensive than the two weeks before them.