# Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

## Cleaning up the system with pseudo-boolean optimization

You can use a PBO solver to clean up your system from unneeded automatically installed packages. First of all, you convert the system state to PB, and add an optimization function telling it to remove as many automatically installed packages as possible. Then you run this thing through a solver (such as clasp, which seems the fastest solver for PBO instances in the Debian archive) and convert its output to human-readable package names.

Code is provided at http://anonscm.debian.org/gitweb/?p=users/jak/cleanup.git, under the MPL 2.0. You need to have python-apt and clasp installed to use it. There is potential minisat+ support, but it’s currently a bit broken.

To use, run python program_builder.py, and it will tell you which packages are no longer needed on your system. It ignores Suggests, if you want those in, you have to hack the code and replace {“Recommends”} by {“Recommends”, “Suggests”}. You can also turn of such dependencies by setting Program.hard_softdeps to False.

Written by Julian Andres Klode

August 16, 2012 at 20:45

Posted in APT2, Debian, Python

## Implicit preferences in OR dependencies

Debian packages commonly use or dependencies of the form “a | b” to mean that a or b should be installed, while preferring option a over b. In general, for resolving an or dependency, we will try all options from the left to the right, preferring the left-most option. We also prefer real packages over virtual ones. If one of the alternatives is already installed we use that.

def solve_or(or):
best_real = None
best_virtual = None
for dep in or:
for target in dep:
if target.name == dep.name and best_real is None:
best_real = target
if target.name != dep.name and best_virtual is None:
best_virtual = target
if target.is_installed():
return target

return best_real if best_real is not None else best_virtual


Now, this way of solving dependencies is slightly problematic. Let us consider a package that depends on: a | b, b. APT will likely choose to install ‘a’ to satisfy the first dependency and ‘b’ to satisfy the second. I currently have draft code around for a future version of APT that will cause it to later on revert unneeded changes, which means that APT will then only install ‘b’. This result closely matches the CUDF solvers and cupt’s solver.

On the topic of solving algorithms, we also have the problem that optimizing solvers like the ones used with apt-cudf do not respect the order of dependencies, rather choosing to minimise the number of packages installed. This causes such a solver to often do stuff like selecting an sqlite database as backend for some service rather then a larger SQL server, as that installs fewer packages.

To make such solvers aware of the implicit preferences, we can introduce a new type of dependency category: Weak conflicts, also known as Recommends-Not. If a package P defines a Recommends-Not dependency against a package Q, then this means that Q should not be installed if P is installed. Now, if we have a dependency like:

Depends: a | b | c

we can encode this as:

Recommends-Not: c, c, b

Causing the solver to prefer a, then b, and then c. This should be representable as a pseudo-boolean optimization problem, as is common for the dependency problem, although I have not looked at that yet — it should work by taking the standard representation of conflicts, adding a relaxation variable and then minimising [or maximising] the number of relaxation variables.

Written by Julian Andres Klode

August 11, 2012 at 16:27

Posted in APT2, Debian, General

## Project APT2: new cache format and small things

I did not write much code or merge much of my prototype code, but some things happened since the last blog post about APT2 specific things in August and I forgot to write about them.

First of all, I dropped the GVariant-based cache. The format strings were simply getting ugly long and were not very understandable, performance was just much too slow (needing more than a few nanoseconds for a package lookup is obviously too slow for solving dependency problems); furthermore, building the cache was also slow and complicated because we needed all attributes of an object at once to pass them to GVariant, leading to ugly API.

I replaced the GVariant cache with one that can be easily mmap()ed and is described completely in C. It’s derived from APT’s cache design (but more robust, as it includes the size of the cache and we can thus detect to small files, although that’s scheduled for the next ABI break in APT as well), but has fewer duplicate data, and uses arrays where APT uses linked lists. The reason for arrays is simple: They take up less space and can be represented naturally in Python and other languages using array-based lists. The cache also contains a coalesced hash table which does use a linked list, but that one is a bit different, as it is for searching only and not exposed. Everything non-stringy is 64-bit aligned in order to keep things as simple as possible. All integers are fixed size, thus the format is architecture-independent if you fix byte orders. The format is described at http://people.debian.org/~jak/apt2-doc/apt-Cache-Format.html.

I stole one more idea from cupt and changed the configuration system to verify types of variables. APT2′s configuration system knows more types than cupt’s, though, including regular expressions, directory and filenames (i.e. it does not let you store a value /d/ in a file variable), strings (which store everything), unsigned and signed integers, and boolean options; all of which are checked when parsing files (producing warnings) or command-line options (producing errors).

I have also simplified the type world by removing all iterator types except for one, replacing them with get_thing() and n_things() functions in the objects holding the arrays. Makes cool bindings slightly harder, but makes the C API much easier to use from C.

Most things expected from a package manager are still missing, but what is there looks good in most cases (especially AptConfiguration has a nice API, and no complaints from valgrind anywhere). Currently I am working on Python bindings so I can interact with the functions easily and check things in an interactive fashion; and I am also writing a document explaining the concepts behind APT2, drafts at http://people.debian.org/~jak/midlevel.pdf. I also have some more code pending further thoughts (including complete index parsing), but it might still take some time before I have something usable in the wild.

On other package managers: From time to time I also use Cupt, look at Cupt code, hack Cupt code, and report bugs against Cupt. I still do not really understand the (extreme) nesting of directory structures in the source code, why there are so (extremely) many source files split all over them, or the general concepts of Cupt, but I can hack together what I need for my personal testing. I also play with yum whenever I end up on a Fedora system (which happens from time to time).

Written by Julian Andres Klode

May 11, 2011 at 18:52

Posted in APT2

## The Three Levels of Package Management

In today’s Linux distributions, there are usually two to three levels of package management. In this blog post, I will explain the three levels of package management.

### 1. Local (dpkg, rpm)

The first level of package management is the ‘local’ level. This level consists of package management tools that install and/or remove packages via package archives such as .deb or .rpm files and a database of the local’s system state.

### 2. Connected (APT, YUM, Smart)

The second level of package management is the ‘connected’ level. In this level, tools are fetching packages and information about them from (remote) locations known as ‘repositories’ or ‘sources’. A level 2 package manager is usually built on top of a level 1 package manager, for example, APT uses dpkg – but there are also cases where one tool is both, such as opkg.

### 3. Abstract (PackageKit, Smart)

A level 3 package manager is a tool that supports different level 1 package managers via one generic interface. It may be implemented on top of a level 2 package manager (PackageKit), or may be implemented directly in level 2 (Smart).

### Conclusion: Merge Level 2 and 3

Most level 2 package managers share a great deal of tasks, such as solving dependency problems. Merging level 2 package management and level 3 package management would enable us to reduce the number of dependency problem solvers and combine our efforts into a few package managers. Such an implementation would need to be written in C and support all common level 1 package managers in order to be successful. As some might have guessed, this is what I plan to do with the project codenamed APT2, although work is not progressing very fast.

### PS: Back from vacation

I spent the more than the complete last week in (mostly rainy) Corfu, Greece and am now getting back to my usual development stuff. I have already processed my news & blogs backlog (1000+ messages) and will now start with my email backlog, so don’t be angry if the answer to your email takes some time.

Written by Julian Andres Klode

October 20, 2010 at 17:01

Posted in APT2, General

## APT2 is now UPS

APT2 is now called UPS (Universal Package System). The name is inspired by the company that delivers packages in those brown trucks and from the Image Packaging System (IPS) of OpenSolaris; and mvo writing ups after I proposed upt (über package tool) in IRC. It’s definitely better than my first thought which was ‘moo’ (and libmoo).

Update: OK, let’s cancel this rename insanity.

Written by Julian Andres Klode

August 13, 2010 at 21:15

Posted in APT2

## APT2 – this time in C

As I wrote a few hours ago on deity@l.d.o (see http://lists.debian.org/deity/2010/08/msg00057.html), APT2 is back again. The first time, I tried Vala; but this time I wrote it in C (with the help of GLib, but no GObject) and the cache uses GVariant instead of an SQLite database. It’s really basic at the moment (no solver, package installation/removal), but it will improve. Read operation should be faster than with APT, although writes are slower (this will be fixed by reusing unchanged parts of the cache).

See the announcement for further information.

Written by Julian Andres Klode

August 11, 2010 at 21:22

Posted in APT2

## The previous post

Well, it seems that several news sites (golem.de, pro-linux.de, linux-magazin.de, linux-magazine.com, ubuntu-user.de [the last ones all from the same publishing house]), especially German ones have picked up the last blog post with same false impressions.

First, they stated that I am planning an APT2 release for Christmas. They took the statement

[...] the internal branch has seen a lot of new code[...]. Most of the code will need to be reworked before it will be published, but I hope to have this completed until Christmas.

as a proof for this. But in the context of this paragraph, ‘publish’ was not meant in the term of publishing a release, but in the term of publishing the code (of the internal branch) in the public repository. The code is public now and lives in a ‘temp’ branch and will be reworked there for inclusion in the master branch.

Secondly, those news sites called me an Ubuntu developer. While I do contribute to Ubuntu, and am an Ubuntu Member, I am NOT an Ubuntu Developer, as I am NOT a member of the relevant ubuntu-dev team at launchpad.

Thirdly, those who stated that speed is a goal (mostly pro-linux, at least from my understanding): It is not, at least not now. It is just a coincidence caused by using a relational database. Furthermore, the test was not really fair, since the other package managers both provide more information than capt; information which has yet to be made accessible in APT2. It was just an initial conclusion that SQLite is quite fast.

Please note that APT2 is a free time project in development, and the programming language used is still in development as well; as well as some other reverse dependencies. I should also state that neither Debian nor Ubuntu have any plans to drop apt at the moment; and APT is still actively developed.

Written by Julian Andres Klode

December 15, 2009 at 20:44

Posted in APT2

## APT2 progress report for the 1st half of December

This week was successful. I have pushed some changes from November to the repository which change the license to LGPL-2.1+ (which makes bi-directional sharing of code with other projects easier, since most Vala code is under the same license) and implement HTTP using libsoup2.4 directly, instead of using GIO and GVFS for this. I also added a parser for the sources.list format which uses regular expressions to parse the file and is relatively fast. The code needs a current checkout of Vala’s git master to work correctly; as released versions had a bug which I noticed today and Jürg Billeter fixed in Vala 25 minutes later; thank you Jürg.

While nothing else happened in the public repository, the internal branch has seen a lot of new code; including SQLite 3 caches; Acquire text progress handling; and capt; the command-line advanced package tool. Most of the code will need to be reworked before it will be published, but I hope to have this completed until Christmas. It will also depend on Vala 0.7.9 or newer, which is yet to be released.

The decision to use SQLite 3 as a backend means that we won’t see the size limitations APT has and that development can be simplified by using SQL queries for filtering requests. It also means that APT2 will be very fast in most actions, like searching; which currently happens in 0.140 seconds (unstable,experimental and more repositories enabled), whereas aptitude takes 1.101 seconds, cupt (which has no on-disk cache) 1.292 seconds, and apt-cache 0.475 seconds. Searching is performed by one SQL query. I also want to thank Jens Georg <mail@jensge.org>, who wrote Rygel’s Database class which is also used with minor modifications (like defaulting to in-memory journals) in APT2 as well. Rygel.Database is a small wrapper around sqlite3 which makes it easier to program for Vala programmers.

The command-line application ‘capt’ provides a shell based on readline with history (and later on command completion) as well as direct usage like ‘capt config dump’ or ‘capt search python-apt’. Just as with Eugene’s cupt, ‘capt’ will be the only program in the core APT2 distribution and provide the same functionality currently provided by apt-get, apt-config and friends. The name is not perfect and can be easily confused with ‘cupt’, but it was the closest option for now; considering that the name ‘apt’ is already used by Java (for its “Annotation Processing Tool”).

That’s all for now, I’ll tell you once all those features have passed my QA, and there is really something usable in the repository. In the meanwhile, you can discuss external dependency solvers, database layouts and other stuff in their threads on deity@lists.debian.org.

And a ‘screenshot’ from capt:

jak@hp:~/Desktop/APT2:temp$capt apt$ help
APT2 0.0.20091213 command-line frontend

Commands:
config dump               Dump the configuration
config get OPTION         Get the given option
config set OPTION VALUE   Set the given option
search EXPRESSION         Search for the given expression
show PACKAGE              Show all versions of the given package
sources list              Print a list of all sources
version                   Print the version of APT2
apt$search python-apt build-depends-python-apt - Dummy package to fulfill package dependencies python-apt - Python interface to libapt-pkg python-apt-dbg - Python interface to libapt-pkg (debug extension) python-apt-dev - Python interface to libapt-pkg (development files) python-aptdaemon - Python module for the server and client of aptdaemon python-aptdaemon-gtk - Python GTK+ widgets to run an aptdaemon client apt$


Written by Julian Andres Klode

December 13, 2009 at 21:07

Posted in APT2

## Results of the APT2 config parser testing

Thanks to those who have tested it (and/or will test it). The results where helpful and resulted in Bug#548443 filed against localepurge and GNOME Bug #596429 against glib. The first one is a case where quotes where used inside a value, although this has never been defined to work, and the second one is a problem with GLib’s GScanner not ignoring multi-line C-style comments although it was configured to do so.

I also fixed some bugs in APT2, like the missing build-depends on libgee-dev and the configuration parser now accepts ‘.’, ‘_’, ‘+’ in the option name. I also talked with Eugene about some differences in the way cupt and APT2 handle quotes and about some other parts of the configuration format. Seems this was a good day.

Written by Julian Andres Klode

September 26, 2009 at 19:12

Posted in APT2

## APT2: config parser testing

If you have an amd64 system, install the apt2 package from “deb http://people.debian.org/~jak/debian/ unstable/” and run the apt2-config command. Make sure that the parser reports no errors, otherwise send me an email or leave a comment here. One known exception is that all values must be quoted in the configuration file, I have no plans to fix this (probably just deprecate unquoted strings in APT instead). The parser is not as strict as cupt’s parser, but it gives you more help if something wents wrong. We also ignore most semicolons for now (they will be turned into warnings or errors later on). It is using GScanner from GLib for parsing the files.

apt2-config replicates the functionality of apt-config. It currently do not read the configuration files in the correct order, so don’t expect “apt2-config dump” to produce exactly the same output as “apt-config dump”. To build APT2 yourself, you need a patch for waf from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=548329.

Written by Julian Andres Klode

September 25, 2009 at 23:19

Posted in APT2