Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

Archive for the ‘APT2’ Category

Cleaning up the system with pseudo-boolean optimization

You can use a PBO solver to clean up your system from unneeded automatically installed packages. First of all, you convert the system state to PB, and add an optimization function telling it to remove as many automatically installed packages as possible. Then you run this thing through a solver (such as clasp, which seems the fastest solver for PBO instances in the Debian archive) and convert its output to human-readable package names.

Code is provided at http://anonscm.debian.org/gitweb/?p=users/jak/cleanup.git, under the MPL 2.0. You need to have python-apt and clasp installed to use it. There is potential minisat+ support, but it’s currently a bit broken.

To use, run python program_builder.py, and it will tell you which packages are no longer needed on your system. It ignores Suggests, if you want those in, you have to hack the code and replace {“Recommends”} by {“Recommends”, “Suggests”}. You can also turn of such dependencies by setting Program.hard_softdeps to False.

Written by Julian Andres Klode

August 16, 2012 at 20:45

Posted in APT2, Debian, Python

Implicit preferences in OR dependencies

Debian packages commonly use or dependencies of the form “a | b” to mean that a or b should be installed, while preferring option a over b. In general, for resolving an or dependency, we will try all options from the left to the right, preferring the left-most option. We also prefer real packages over virtual ones. If one of the alternatives is already installed we use that.

def solve_or(or):
  best_real = None
  best_virtual = None
  for dep in or:
     for target in dep:
        if target.name == dep.name and best_real is None:
           best_real = target
        if target.name != dep.name and best_virtual is None:
           best_virtual = target        
        if target.is_installed():
          return target

  return best_real if best_real is not None else best_virtual

Now, this way of solving dependencies is slightly problematic. Let us consider a package that depends on: a | b, b. APT will likely choose to install ‘a’ to satisfy the first dependency and ‘b’ to satisfy the second. I currently have draft code around for a future version of APT that will cause it to later on revert unneeded changes, which means that APT will then only install ‘b’. This result closely matches the CUDF solvers and cupt’s solver.

On the topic of solving algorithms, we also have the problem that optimizing solvers like the ones used with apt-cudf do not respect the order of dependencies, rather choosing to minimise the number of packages installed. This causes such a solver to often do stuff like selecting an sqlite database as backend for some service rather then a larger SQL server, as that installs fewer packages.

To make such solvers aware of the implicit preferences, we can introduce a new type of dependency category: Weak conflicts, also known as Recommends-Not. If a package P defines a Recommends-Not dependency against a package Q, then this means that Q should not be installed if P is installed. Now, if we have a dependency like:

Depends: a | b | c

we can encode this as:

Recommends-Not: c, c, b

Causing the solver to prefer a, then b, and then c. This should be representable as a pseudo-boolean optimization problem, as is common for the dependency problem, although I have not looked at that yet — it should work by taking the standard representation of conflicts, adding a relaxation variable and then minimising [or maximising] the number of relaxation variables.

Written by Julian Andres Klode

August 11, 2012 at 16:27

Posted in APT2, Debian, General

Project APT2: new cache format and small things

I did not write much code or merge much of my prototype code, but some things happened since the last blog post about APT2 specific things in August and I forgot to write about them.

First of all, I dropped the GVariant-based cache. The format strings were simply getting ugly long and were not very understandable, performance was just much too slow (needing more than a few nanoseconds for a package lookup is obviously too slow for solving dependency problems); furthermore, building the cache was also slow and complicated because we needed all attributes of an object at once to pass them to GVariant, leading to ugly API.

I replaced the GVariant cache with one that can be easily mmap()ed and is described completely in C. It’s derived from APT’s cache design (but more robust, as it includes the size of the cache and we can thus detect to small files, although that’s scheduled for the next ABI break in APT as well), but has fewer duplicate data, and uses arrays where APT uses linked lists. The reason for arrays is simple: They take up less space and can be represented naturally in Python and other languages using array-based lists. The cache also contains a coalesced hash table which does use a linked list, but that one is a bit different, as it is for searching only and not exposed. Everything non-stringy is 64-bit aligned in order to keep things as simple as possible. All integers are fixed size, thus the format is architecture-independent if you fix byte orders. The format is described at http://people.debian.org/~jak/apt2-doc/apt-Cache-Format.html.

I stole one more idea from cupt and changed the configuration system to verify types of variables. APT2′s configuration system knows more types than cupt’s, though, including regular expressions, directory and filenames (i.e. it does not let you store a value /d/ in a file variable), strings (which store everything), unsigned and signed integers, and boolean options; all of which are checked when parsing files (producing warnings) or command-line options (producing errors).

I have also simplified the type world by removing all iterator types except for one, replacing them with get_thing() and n_things() functions in the objects holding the arrays. Makes cool bindings slightly harder, but makes the C API much easier to use from C.

Most things expected from a package manager are still missing, but what is there looks good in most cases (especially AptConfiguration has a nice API, and no complaints from valgrind anywhere). Currently I am working on Python bindings so I can interact with the functions easily and check things in an interactive fashion; and I am also writing a document explaining the concepts behind APT2, drafts at http://people.debian.org/~jak/midlevel.pdf. I also have some more code pending further thoughts (including complete index parsing), but it might still take some time before I have something usable in the wild.

On other package managers: From time to time I also use Cupt, look at Cupt code, hack Cupt code, and report bugs against Cupt. I still do not really understand the (extreme) nesting of directory structures in the source code, why there are so (extremely) many source files split all over them, or the general concepts of Cupt, but I can hack together what I need for my personal testing. I also play with yum whenever I end up on a Fedora system (which happens from time to time).

Written by Julian Andres Klode

May 11, 2011 at 18:52

Posted in APT2

The Three Levels of Package Management

In today’s Linux distributions, there are usually two to three levels of package management. In this blog post, I will explain the three levels of package management.

1. Local (dpkg, rpm)

The first level of package management is the ‘local’ level. This level consists of package management tools that install and/or remove packages via package archives such as .deb or .rpm files and a database of the local’s system state.

2. Connected (APT, YUM, Smart)

The second level of package management is the ‘connected’ level. In this level, tools are fetching packages and information about them from (remote) locations known as ‘repositories’ or ‘sources’. A level 2 package manager is usually built on top of a level 1 package manager, for example, APT uses dpkg – but there are also cases where one tool is both, such as opkg.

3. Abstract (PackageKit, Smart)

A level 3 package manager is a tool that supports different level 1 package managers via one generic interface. It may be implemented on top of a level 2 package manager (PackageKit), or may be implemented directly in level 2 (Smart).

Conclusion: Merge Level 2 and 3

Most level 2 package managers share a great deal of tasks, such as solving dependency problems. Merging level 2 package management and level 3 package management would enable us to reduce the number of dependency problem solvers and combine our efforts into a few package managers. Such an implementation would need to be written in C and support all common level 1 package managers in order to be successful. As some might have guessed, this is what I plan to do with the project codenamed APT2, although work is not progressing very fast.

PS: Back from vacation

I spent the more than the complete last week in (mostly rainy) Corfu, Greece and am now getting back to my usual development stuff. I have already processed my news & blogs backlog (1000+ messages) and will now start with my email backlog, so don’t be angry if the answer to your email takes some time.

Written by Julian Andres Klode

October 20, 2010 at 17:01

Posted in APT2, General

APT2 is now UPS

APT2 is now called UPS (Universal Package System). The name is inspired by the company that delivers packages in those brown trucks and from the Image Packaging System (IPS) of OpenSolaris; and mvo writing ups after I proposed upt (über package tool) in IRC. It’s definitely better than my first thought which was ‘moo’ (and libmoo).

Update: OK, let’s cancel this rename insanity.

Written by Julian Andres Klode

August 13, 2010 at 21:15

Posted in APT2

APT2 – this time in C

As I wrote a few hours ago on deity@l.d.o (see http://lists.debian.org/deity/2010/08/msg00057.html), APT2 is back again. The first time, I tried Vala; but this time I wrote it in C (with the help of GLib, but no GObject) and the cache uses GVariant instead of an SQLite database. It’s really basic at the moment (no solver, package installation/removal), but it will improve. Read operation should be faster than with APT, although writes are slower (this will be fixed by reusing unchanged parts of the cache).

See the announcement for further information.

Written by Julian Andres Klode

August 11, 2010 at 21:22

Posted in APT2

The previous post

Well, it seems that several news sites (golem.de, pro-linux.de, linux-magazin.de, linux-magazine.com, ubuntu-user.de [the last ones all from the same publishing house]), especially German ones have picked up the last blog post with same false impressions.

First, they stated that I am planning an APT2 release for Christmas. They took the statement

[...] the internal branch has seen a lot of new code[...]. Most of the code will need to be reworked before it will be published, but I hope to have this completed until Christmas.

as a proof for this. But in the context of this paragraph, ‘publish’ was not meant in the term of publishing a release, but in the term of publishing the code (of the internal branch) in the public repository. The code is public now and lives in a ‘temp’ branch and will be reworked there for inclusion in the master branch.

Secondly, those news sites called me an Ubuntu developer. While I do contribute to Ubuntu, and am an Ubuntu Member, I am NOT an Ubuntu Developer, as I am NOT a member of the relevant ubuntu-dev team at launchpad.

Thirdly, those who stated that speed is a goal (mostly pro-linux, at least from my understanding): It is not, at least not now. It is just a coincidence caused by using a relational database. Furthermore, the test was not really fair, since the other package managers both provide more information than capt; information which has yet to be made accessible in APT2. It was just an initial conclusion that SQLite is quite fast.

Please note that APT2 is a free time project in development, and the programming language used is still in development as well; as well as some other reverse dependencies. I should also state that neither Debian nor Ubuntu have any plans to drop apt at the moment; and APT is still actively developed.

Written by Julian Andres Klode

December 15, 2009 at 20:44

Posted in APT2

Follow

Get every new post delivered to your Inbox.