Category: Ubuntu

APT 1.1.8 to 1.1.10 – going “faster”

Not only do I keep incrementing version numbers faster than ever before, APT itself also keeps getting faster. On top of that, some bugs have been fixed, and the cache is now verified with a checksum when it is opened.

Important fix for 1.1.6 regression

Since APT 1.1.6, APT uses the configured xz compression level. Unfortunately, the default was set to 9, which requires 674 MiB of RAM, compared to the 94 MiB required at level 6.

This caused the test suite to fail on the Ubuntu autopkgtest servers, but I thought it was just some temporary hiccup on their part, and so did not look into it for the 1.1.7, 1.1.8, and 1.1.9 releases. When the Ubuntu servers finally failed with 1.1.9 again (they only started building again on Monday, it seems), I noticed something was wrong.

Enter git bisect. I created a script that compiles the APT source code and runs a test with ulimit for virtual and resident memory set to 512 (that worked in 1.1.5), let it run, and thus found the cause mentioned above.

The solution: APT now defaults to level 6.
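The memory-limited test run used for the bisect can be sketched roughly like this. This is not the script I used, just a minimal illustration in Python; the helper name run_with_memory_limit is made up, and it assumes that the 512 above means 512 MiB and that only the address-space limit is set.

```python
import resource
import subprocess

MIB = 1024 * 1024


def run_with_memory_limit(cmd, limit_bytes):
    """Run cmd with its address space capped and return its exit status."""

    def apply_limit():
        # Runs in the child just before exec, similar to `ulimit -v` in a shell.
        # Only the soft limit is lowered; the existing hard limit is kept.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit).returncode
```

A wrapper for git bisect run would build the tree, call this helper on the test with a limit of 512 * MIB, and exit non-zero when the test fails, so that the commit gets marked as bad.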

New Features

APT 1.1.8 introduces /usr/lib/apt/apt-helper cat-file, which can be used to read files compressed by any compressor APT understands. It is used in the recent experimental release of apt-file, and prepares us for a future in which files on disk might be compressed with a different compressor (such as LZ4 for Contents files, which will improve rred speed on them by a factor of 7).
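To illustrate the idea of a compressor-agnostic "cat", here is a small Python sketch. It is not how apt-helper works internally; picking the decompressor from the file's magic bytes is my own simplification, the function name cat_file is only borrowed for readability, and only the three formats in the Python standard library are covered.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of each container format and the matching opener.
_OPENERS = (
    (b"\x1f\x8b", gzip.open),
    (b"BZh", bz2.open),
    (b"\xfd7zXZ\x00", lzma.open),
)


def cat_file(path):
    """Return the decompressed contents of path, or the raw bytes if it is not compressed."""
    with open(path, "rb") as raw:
        head = raw.read(6)
    for magic, opener in _OPENERS:
        if head.startswith(magic):
            with opener(path, "rb") as stream:
                return stream.read()
    # No known magic: treat the file as uncompressed.
    with open(path, "rb") as raw:
        return raw.read()
```

The caller never needs to know which compressor was used, which is the property that makes it possible to switch the on-disk format later.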

David added a feature that enables servers to advertise that they do not want APT to download and use some Architecture: all contents when they include all in their list of architectures. This is to allow archives to drop Architecture: all packages from the architecture-specific content files, to avoid redundant data and (thus) improve the performance of apt-file.

Buffered writes

APT 1.1.9 introduces buffered writing for rred, reducing the runtime by about 50% on a slowish SSD, and maybe more on HDDs. The 1.1.9 release is a bit buggy and might mess things up when a write syscall is interrupted; this is fixed in 1.1.10.
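The general technique looks roughly like the following Python sketch. It is an illustration only, not APT's C++ code: the class name BufferedFdWriter and the 64 KiB default are made up. The important detail is the loop in flush(), which copes with a write that accepts fewer bytes than offered.

```python
import os


class BufferedFdWriter:
    """Collect small writes in memory and hand them to the kernel in large chunks."""

    def __init__(self, fd, capacity=64 * 1024):
        self.fd = fd
        self.capacity = capacity
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        view = memoryview(bytes(self.buffer))
        offset = 0
        while offset < len(view):
            # os.write() may accept fewer bytes than offered (a short write),
            # so continue from where the kernel stopped instead of assuming success.
            # Python 3.5+ already retries the call itself when it fails with EINTR.
            offset += os.write(self.fd, view[offset:])
        self.buffer.clear()
```

Many tiny write() calls on the object turn into a few large system calls, and nothing is dropped if one of them is cut short.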

Cache generation improvements

APT 1.1.9 and APT 1.1.10 improve the cache generation algorithms in several ways: switching a lookup table from std::map to std::unordered_map, providing an inline isspace_ascii() function, and inlining the tolower_ascii() function. Both are tiny functions that are called a lot.

APT 1.1.10 also switches the cache’s hash function to the DJB hash function and increases the default hash table sizes to the smallest prime larger than 15000, namely 15013. This reduces the average bucket size from 6.5 to 4.5. We might increase this further in the future.
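For readers who want to play with the numbers, here is a small Python sketch of the classic DJB hash and of the prime search for the table size. Whether APT uses exactly this variant (seed 5381, multiplier 33, 32-bit wraparound) is an assumption on my part for this illustration, and all function names are made up.

```python
def djb_hash(key: str) -> int:
    """Classic DJB hash (h = h * 33 + c, seeded with 5381), kept to 32 bits."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def is_prime(n: int) -> bool:
    """Trial division; fast enough for numbers of this size."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime_after(n: int) -> int:
    """Return the smallest prime strictly larger than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def bucket_index(key: str, table_size: int = 15013) -> int:
    """Map a key to a bucket of the hash table."""
    return djb_hash(key) % table_size
```

Calling next_prime_after(15000) confirms the table size of 15013 mentioned above.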

Checksum for the cache, but no more syncs

Prior to APT 1.1.10 writing the cache was a multi-part process:

  1. Write the cache to a temporary file with the dirty bit set to true
  2. Call fsync() to sync the cache
  3. Write a new header with the dirty bit set to false
  4. Call fsync() to sync the new header
  5. (Rename the temporary file to the target name)

The last step was obviously not needed, as we could easily live with an intact cache that has its dirty field set to false, as we can just rebuild it.

But what matters more is step 2. Synchronizing the entire 40 or 50 MB takes some time. On my HDD system, it consumed 56% of the entire cache generation time, and on my SSD system, it consumed 25% of the time.

APT 1.1.10 does not sync the cache at all. It now embeds a hashsum (adler32 for performance reasons) in the cache. This helps ensure that no matter what parts of the cache are written in case of some failure somewhere, we can still detect a failure with reasonable confidence (and even more errors than before).
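The principle can be shown with a few lines of Python. This is a sketch under stated assumptions, not APT's cache format: the layout (a 32-bit Adler-32 checksum followed by the payload) and the names write_cache and read_cache are invented for the example.

```python
import struct
import zlib

HEADER = struct.Struct("<I")  # hypothetical layout: one 32-bit checksum, then the payload


def write_cache(path, payload: bytes) -> None:
    with open(path, "wb") as f:
        f.write(HEADER.pack(zlib.adler32(payload)))
        f.write(payload)
        # Deliberately no fsync(): a partially written file is caught by the checksum on open.


def read_cache(path):
    """Return the payload, or None if the file is truncated or corrupted."""
    with open(path, "rb") as f:
        header = f.read(HEADER.size)
        payload = f.read()
    if len(header) < HEADER.size:
        return None
    (expected,) = HEADER.unpack(header)
    if zlib.adler32(payload) != expected:
        return None
    return payload
```

A caller that gets None back simply regenerates the cache, which is the same recovery path a set dirty bit used to trigger.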

This means that cache generation is now much faster for a lot of people. On the downside, commands like apt-cache show that previously took maybe 10 ms to execute can now take about 80 ms, because the checksum has to be verified whenever the cache is opened.

Please report back on your performance experience with the 1.1.10 release; I’m very interested to see if it works reasonably well for other people. And if you have any other ideas on how to solve the issue, I’d be interested to hear them (all data needs to be written before the header with dirty=0 is written, but we don’t want to sync the data).

Future work

We seem to have a lot of temporary (?) std::string objects during the cache generation, accounting for about 10% of the run time. I’m thinking of introducing a string_view class similar to the one proposed for C++17 and making use of it.

I also thought about calling posix_fadvise() before starting to parse files, but the cache generation process does not seem to spend a lot of its time in system calls (even with all caches dropped before the run), so I don’t think this will improve things.

If anyone has some other suggestions or patches for performance stuff, let me know.


0x15 + 1/365

Yesterday was my 21st birthday, and I received all the “Hitchhiker’s Guide to the Galaxy” novels: the first five in one book, and the sixth one, written by Eoin Colfer, in another book. Needless to say, the first book weighs more than an N900. I have not read them yet, so now is the perfect chance to do so. Yes, I did not know that the 25th is Towel Day, sorry for that.

I also bought a Toshiba AC100 before my birthday, a Tegra 2 based notebook/netbook/“web companion” with a 1 GHz dual-core ARM Cortex A9 chip and 512 MB RAM. It runs Android by default, and had a price of 160€, which is low compared to anything else with a Cortex A9. It currently runs Ubuntu 11.04 with a specialised 2.6.37 kernel from time to time, without sound and accelerated video (and with non-functioning HDMI). I am mostly waiting for Nvidia to release a new binary blob for the video part (and yes, if you just want to build packages, you can probably be happy without those things).

Another thing happening last week is the upload of python-apt 0.8.0 to unstable, marking the beginning (or end) of the API transition I started more than a year ago. Almost all packages not supporting it have proper Breaks in python-apt [most of them already fixed, only 2 packages remaining, one of which is “maintained” (well, not really maintained right now) by me], but there may be some which do not work correctly despite being fixed (or at least thought to be fixed).

If you know of any other interesting thing I did last week, leave a comment; I have written enough for now. And yes, WordPress wants to write a multiplication sign instead of an x, so I had to use &#120 instead.

Nokia/Intel/Google/Canonical – openness and professionalism in MeeGo, Android, Ubuntu

MeeGo (Nokia/Intel): Openness does not seem to be very important for Nokia and Intel. They develop their stuff behind closed doors and then do a large code drop (once dropped, stuff gets developed in the open). In terms of professionalism, it does not look much better: if you look at the meego-dev mailing list, you feel like you are in some kind of kindergarten for open source developers. Things like HTML emails and top-posting appear to be completely normal for them; they don’t even follow the basic rules for emails, and they also appear to ignore any advice on this topic. Oh, and writing a platform in GTK+ while pushing Qt as the supported toolset is not a good idea.

Android (Google): The situation with Android appears to be even worse. Google keeps an internal development tree and the only time the public sees changes is when a new release is coming and Google does a new code drop. Such behavior is not acceptable.

Ubuntu (Canonical): Canonical does an outstanding job on openness. Canonical employees develop their software completely in the open, and there are no big code drops. They basically have no competitive advantage and allow you to use their work for your own project before they use it themselves. Canonical employees basically behave like normal free software developers doing stuff in their free time, the only exception being that they are paid for it. Most Canonical employees also know how to behave on mailing lists and do not post HTML emails or top-post. There are of course exceptions: the design process of the Ubuntu font is a disaster, with the font being created using proprietary software running on Windows; for a free software company, that’s not acceptable (the people responsible should probably not have been hired at all). Furthermore, the quality of Ubuntu packages is often bad when compared to the quality of a Debian package (especially in terms of API documentation of Canonical-created software), a result of vastly different development cycles and different priorities.

On a scale from 0 to 15 (15 being the best), Nokia/Intel receive 5 points (“D” in U.S. grades), Google receives 3 points (“F” in U.S. grades), and Canonical receives 12 points (“B” in U.S. grades). Please note that those grades apply only to the named projects; the companies may behave better or worse in other projects.

Final tips for the companies:

  • Nokia/Intel: Teach your employees how to behave in email discussions.
  • Nokia/Intel/Google: Don’t do big code drops, develop in the open. If someone else can create something awesome using your work before you can, you are on the right track. Competitive advantage must not exist.
  • Canonical: Fire those who used Windows to create the Ubuntu font and restart using free tools.
  • All: Document your code.

My First upload with new source format

Yesterday, I uploaded command-not-found 0.2.38-1 (based on version 0.2.38ubuntu4) to Debian unstable, using the “3.0 (quilt)” source format. All steps worked perfectly, including stuff like cowbuilder, lintian, debdiff, dput and the processing on ftp-master. Next steps are reverting my machine from Ubuntu 9.10 to my Debian unstable system and uploading new versions of gnome-main-menu, python-apt (0.7.93, not finished yet) and some other packages.

In other news, the development of Ubuntu 10.04 Lucid Lynx has just started. For the first time in Ubuntu’s history, the development will be based on the testing tree of Debian and not on the unstable tree. This is done in order to increase the stability of the distribution, as this release is going to be a long-term support release. Ubuntu will freeze in February, one month before the freeze of Debian Squeeze. This should give us enough room to collaborate, especially on bugfixes. This also means that I will freeze my packages in February, so they will have the same version in Squeeze and Lucid (applying the earliest freeze to both distributions; exceptions where needed).

Debian’s new time-based freezes

Overall, having time-based freezes is a good idea. But the chosen cycle is problematic, especially if one considers Ubuntu’s LTS release cycles. The problem is that if Debian releases a new version at approximately the same time as Ubuntu, there will not be much synchronization and Ubuntu will have newer program versions.

Consider the releases of Ubuntu 8.04 LTS (April 2008) and Debian GNU/Linux 5.0 (February 2009). Whereas Ubuntu 8.04 provides GNOME 2.22 including a new Nautilus, Debian provides GNOME 2.22 with nautilus 2.20. Ubuntu’s release made at about the same time (Ubuntu 9.04) already included GNOME 2.26.

The reason for this is the difference in development processes. Whereas a Debian release is based on stable upstream versions, the development of Ubuntu releases happens using the newest pre-releases, causing Ubuntu releases to be one generation ahead in most technologies, although released at the same time.

Another difference is the way of freezing and the duration of the total freeze. Ubuntu has a freeze process split into multiple stages. The freeze that is comparable to Debian’s freeze is the feature freeze, which usually happens two months before the release. Debian’s freeze happens 6 months before the release, just about the time when Ubuntu has defined the features to be included in the next release.

Synchronizing the release cycles of Debian and Ubuntu basically means that Ubuntu will always provide the newer features, better hardware support, etc. Ubuntu is the winner of this decision. It will have fewer bugs because it inherits from a frozen Debian branch, and it can include newer versions where required because it freezes 3 months later.

To synchronize the package versions shipped in Debian and Ubuntu you have to make your release schedules asynchronous, in a way that the Debian release freezes after the Ubuntu release. This basically means that I expect Debian 6.0 (2010/H1) to be more similar to Ubuntu 9.10 than to Ubuntu 10.04. I would have preferred to freeze Squeeze in December 2010 and release in April 2011, and Ubuntu LTS to be re-scheduled for October 2010.

Now let’s say you are a Debian and Ubuntu user (and you prefer neither of them) and you want to set up a new system in 2010/H2 (so the systems have had half a year to stabilize). This system should be supported for a long time. You have two options: Debian 6.0 and Ubuntu 10.04. Which will you choose? Most people would probably consider Ubuntu now, because it provides better support for their hardware and has the newer features. A (partial) solution to this problem would be to make backports an official part of Debian starting with Debian 6.0.

Furthermore, there is the question of security support. If we want to provide the ability to upgrade directly from Debian 5.0 to Debian 7.0, we will have to support Lenny until 2012/H2 (to give users enough time to upgrade, when we release 7.0 in 2012/H1). This means that in 2012 we would have to support Debian 5.0, 6.0 and 7.0. Another side effect is that Debian 6.0 would have 3 years of security support, just like Ubuntu 10.04.

All in all, the decision means that Debian and Ubuntu LTS releases will occur at about the same time, will be supported for about the same time, and that Ubuntu has the newer packages and fewer bugs to fix.

python-apt 0.7.91 released

As I promised, I released python-apt 0.7.91 today. This version provides a new API, with real classes in apt_pkg, new names which conform to PEP 8 conventions, and support for new language features such as the ‘with’ statement. Old code should still continue to work; if it does not and it is using only public interfaces, report a bug against python-apt or send an email.

I cannot guarantee that all the names will be kept as they are at the moment (it’s a pre-release), but there should not be many more changes needed. The series will hit Ubuntu Karmic later this month, and the final 0.8.0 release is going to be shipped in the final Karmic release.

If you want to help with python-apt, consider writing some examples of what can be done with python-apt, and some tutorials for the documentation. You can also check for spelling mistakes and the like. If you want, you can also contribute code. See the documentation (in the package, or online) for guidelines on how to contribute.

You can get the 0.7.91 release from Debian experimental, and you can view the documentation online at

And of course, a short example:

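import apt
cache = apt.Cache()
my_selected_packages = [cache["hello"]]  # hypothetical selection, for illustration only
with cache.actiongroup():  # group the changes made to the apt.Cache 'cache'
    for package in my_selected_packages:
        package.mark_install()  # new PEP 8 name, previously named markInstall()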

Ubuntu One

Today, I was testing Canonical’s new Ubuntu One service. Ubuntu One is a service for syncing and sharing files online, with 2GB storage for free. I installed the Ubuntu One client on Ubuntu 9.04 and it’s cool.

Ubuntu One creates a directory named Ubuntu One in your home directory. Within this directory, there are two subdirectories. The first one is “My Files” and the second one is named “Shared With Me”. When you place files in the “My Files” directory, the Ubuntu One client gets notified (using inotify) about the change and begins uploading the file to the Ubuntu One server.

When you access the web interface, which should work in every modern browser, and upload a file there, the files are fetched to your local hard disk the next time your local client connects. This also works when you have two different computers and create a file on one of them: it will be visible on the second one as soon as that computer has fetched the new file.

You could also copy your .mozilla directory into the synced directory, and create a symlink from your home directory to it. I have not tried it myself, but in theory this would allow you to have your profile synced on all your computers.
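In Python, the idea would look something like the sketch below. It is just as untested against a real Ubuntu One setup as the suggestion itself; the function name is made up, and it moves the directory instead of copying it so that the symlink can take the old place.

```python
import os
import shutil


def move_into_synced_dir(profile_dir, synced_dir):
    """Move profile_dir into synced_dir and leave a symlink at the old location."""
    profile_dir = os.path.normpath(profile_dir)
    target = os.path.join(synced_dir, os.path.basename(profile_dir))
    if os.path.lexists(target):
        # Refuse to overwrite a profile that is already in the synced directory.
        raise FileExistsError(target)
    shutil.move(profile_dir, target)
    os.symlink(target, profile_dir)
    return target
```

For the case described above, profile_dir would be the .mozilla directory in your home directory and synced_dir the “My Files” directory; close the browser before trying it.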

And the best thing about Ubuntu One is that the client is completely free software and written in Python. This makes it possible to package the client for other distributions, like Debian. Packaging it for RPM-based distributions such as Fedora should also be doable, but may require some more time.

There seems to have been some criticism that the server side is not free software. While that may not be good, it’s certainly better than other services where even the client is proprietary. And there is still the possibility of writing your own server, as the protocol is available.

Ubuntu One is currently in private beta; if you want to try it out, you need an invitation (visit for further information).