Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

Python Speed: ‘x in list’ vs ‘x in set’

Well, this is my second post about speed in Python. Today, I noticed that debimg’s dependency resolver was much much slower than before. I thought what the problem could be and finally realized that the problem was that I switched from sets to list. This is fixed now in commit d0fd700080de5c19cb5fd66918d14c5ffa26e805

Now, some benchmarks (using IPython):

In [1]: a = range(10**6)

In [2]: b = set(a)

In [3]: %timeit 10**6 in a
10 loops, best of 3: 31.8 ms per loop

In [4]: %timeit 10**6 in b
10000000 loops, best of 3: 98.6 ns per loop

1ms are 1 million ns. Therefore, using sets is about 322515 times faster than using lists (or tuples).

debimg can now calculate dependencies in 0.5 seconds again, instead of 1 minute with lists.

Written by Julian Andres Klode

April 29, 2008 at 19:42

Posted in Python

3 Responses

Subscribe to comments with RSS.

  1. This is of course expected behaviour.
    Since a set is a sort of dictionary. (Hashtable)
    Therefor searching for an item in a set is log(n) instead of n.
    has_key is a dictionary.


    April 29, 2008 at 22:15

  2. Err, what? Searching for an item in a hashtable is O(1) not O(log(n)).


    April 30, 2008 at 01:40

  3. Actually search a hash table with a O(1) operation is the hash function hashes well. A search in O(log n) is usually associated to sorted lists or (balanced) binary trees.

    Paulo José da Silva e Silva

    April 30, 2008 at 03:45

Comments are closed.


Get every new post delivered to your Inbox.

%d bloggers like this: