Category: General

Reference Counting and Tail Calls

One thing I thought about today is reference counting in combination with tail calls. Imagine a function like this:

function f(x) { return g(x+1); }

Now consider that x is a reference counted object and that x + 1 creates a new object. The call to g(x + 1) shall be in tail call position.

In most reference counted languages, the reference to an argument is owned by the caller. That is, f() owns the reference to x + 1. In that case, the call to g() would no longer be in a tail call position, as we still have to decrease the reference count after g() exits.

An alternative would be that the callee owns the reference. This however, will most likely create far more reference count changes than a caller-owns language (increase reference count in caller, decrease reference count in callee). For example, the following function requires an increment before the call to g().

function f1(x) { return g(x); }

Does anyone have any ideas on how to solve the problem of tail calls while avoiding the callee-owns scenario? Something that does not require a (non-reference-counting) garbage collector would be preferably.

Advertisements

Currying in the presence of mutable parameters

In the language introduced yesterday, mutable parameters play the important role of representing the side effects of actions in the type system and thus ensure referential transparency.

One of the questions currently unaddressed is how to handle currying in the presence of mutable parameters. In order to visualise this problem, consider a function

    printLn(mutable IO, String line)

If we bind the the first parameter, what should be the type of the function, and especially important how do we get back the mutable parameter? Consider the partially applied form printLn1:

    printLn1(String line)

The mutability parameter would be lost and we could not do any further I/O (and the curried form would appear as a pure function, so a good compiler would not even emit a call to it) An answer might be to put the thing into the result when currying:

    printLn2(String line) -> mutable IO

But how can we explain this? In the end result, do we maybe have to use a signature like:

    printLn(String, mutable in IO io1, mutable out IO io2)

We could then introduce syntax to call that as

    printLn(string, mutable io)

Where as the “mutable io” argument basically expands to “io” and “io1” for the first call, and for later calls to “io1, io2”, and so on. It can also be easily curried by allowing currying to take place on variables not declared as output parameters. We can then curry the function as:

    printLn3(mutable in IO io1, mutable out IO io2)
    printLn4(mutable out IO io2)

If so, we can rename mutable back to unique and make that easier by introducing the unary operator & for two locations, just like Mercury uses ! for it. We could then write calls looking like this:

    printLn("Hello", &io);
    printLn("Hello", io, io1);

How out parameters are explained is a different topic; we could probably say that an out parameter defines a new variable.

Another option is to forbid currying of mutable parameters entirely. This would have the advantage of maintaining the somewhat simple one parameter style.

The programming language Clean does not provide any special syntactic sugar for having mutable variables. In Clean, the function gets a unique object and returns a unique object (noted by *). For example, the main entry point in a Clean program (with state) looks like this:

    Start:: *World -> *World

In short, the function Start gets a abstract world passed that is unique and at the end returns a new unique world. In Clean syntax, our example function would most likely have the signature:

    printLn :: String *IO -> *IO

You know have to either maintain one additional variable for the new unique object, which gets a bit complicated with time. On the other hand, you can do function composition on this (if you have a function composition operator that preserves uniqueness when available, as should be possible in Clean):

    printTwoLines :: *IO -> *IO 
    printTwoLines = (printLn "World") o (printLn "Hello")

Function composition on mutable things however, does not seem like it is needed often enough in a functional programming language with a C syntax.

People might also ask why monads are not used instead. Simon L Peyton Jones and Philip Wadler described monadic I/O in their paper “Imperative Functional Programming” (http://research.microsoft.com/en-us/um/people/simonpj/papers/imperative.ps.Z) in 1993, and it is used in Haskell (the paper was about the implementation in Haskell anyway), one of the world’s most popular and successful functional programming languages.

While monadic I/O works for the Haskell crowd, and surely some other people, the use of Monads also limits the expressiveness of code, at least as far as I can tell. At least as soon as you want to combine multiple monads, you will have to start lifting stuff from one monad to another (liftIO and friends), or perform all operations in the IO monad, which prevents obvious optimizations (such as parallelizing the creation of two arrays) — in short dependencies between actions are more strict than they have to be. For a functional language targeting imperative programmers, the lifting part seems a bit too complicated.

One of the big advantages of monads is that they are much easier to implement, as they do not require extensions to the type system and the Hindley-Milner type inference algorithm used by the cool functional programming languages. If you want uniqueness typing however, you need to modify the algorithm or infer the basic types first and then infer uniqueness in a second pass (as Clean seems to do it).

Functional programming language for Python programmers and friends

Just for you, and this time in the Pythonesque rendering.

module main:
    import std (range)
    import std.io (printf, IO)

    # print the Fahrenheit-Celcius table for fahr = 0, 20, ..., 300
    function main(mutable IO io):
        Int lower = 0    # lower bound
        Int upper = 300  # upper bound
        Int step = 20    # step
        for Int fahr in range(lower, upper, step):
            Double celcius = 5 * (fahr - 32) / 9
            std.io.printf(io, "%3d\t%6.1f\n", fahr, celcius)

It does not really look like it, but this language is purely functional. It represents side effects using unique types. If you declare a mutable parameter, you basically declare a unique input parameter and a unique output parameter.

I’m also giving you a list implementation

module std.container.list:

    ## The standard singly-linked list type
    type List[E]:
        Nil                     ## empty list
        Node:
            E value             ## current value
            List[E] next        ## remaining list
 


And yes, both languages should be able to be represented using the same abstract syntax tree. The only change is the replacement of the opening curly brace by a colon, the removal of the closing curly bracket and semicolons, the replacement of C-style comments with Python-style comments and the requirement of indentation; oh and the for statement gets a bit lighter as well.

[updated] Functional programming language for C programmers and friends

Just for you:

module main {
    import std (range);
    import std.io (printf, IO);
 
    /* print the Fahrenheit-Celcius table
        for fahr = 0, 20, ..., 300 */
    function main(mutable IO io) {
        Int lower = 0;   // lower bound
        Int upper = 300; // upper bound
        Int step = 20;   // step
        for (Int fahr in range(lower, upper, step)) {
            Double celcius = 5 * (fahr - 32) / 9;
            std.io.printf(io, "%3d\t%6.1f\n", fahr, celcius);
        }
    }
}

It does not really look like it, but this language is purely functional. It represents side effects using unique types. If you declare a mutable parameter, you basically declare a unique input parameter and a unique output parameter.

I’m also giving you a list implementation

module std.container.list {

    /** The standard singly-linked list type */
    type List[E] {
        Nil;                    /** empty list */
        Node {
            E value;            /** current value */
            List[E] next;       /** remaining list */
        }
    }
}

Thus are only excerpts from a document with tens of pages and the reference implementation of the standard library. The incomplete working draft for the language is attached: JAK Programming Language Early Working Draft (28 pages).

Update: Fixed the link.

hardlink 0.2 RC1 released

I have just released version 0.2 RC1 of my hardlink program. Compared to the 0.1.X series, the program has been rewritten in C, as Python was to memory-hungry for people with millions of files. The new program uses almost the same algorithm and has almost completely the same bugs as the old version.

The code should be portable to all UNIX-like platforms supporting nftw(). I have tested the code on Debian, FreeBSD 9, and Minix 3.2. For storing path names, it uses a flexible array member on C99 compilers, a zero-length-array on GNU compilers, and a array of length 1 on all other compilers (which should work everywhere as far as I heard).

The new version may have slightly different behaviour compared to 0.1.2 when regular expressions are specified on the command-line, as a result of the switch from Python’s regular expression engine to PCRE (if compiled with PCRE support, you can also use POSIX extended regular expressions if you like).

The code is significantly faster than the old code (in situations where it’s not I/O bound), and uses much less memory than the old one. Per file, we now store a <tt>struct stat</tt>, a pointer, an int, a char, and the filename; as well as three pointers for a binary search tree (which uses tsearch()).

It should also be compatible to Red Hat’s original hardlink tool now command-line-wise, there is at least a -c option now. The history and the name conflict are interesting, but probably nothing for this post. We’re even less resistant against changing trees than Fedora’s tool (and derived) currently, but should otherwise be better (and far more complicated and feature-packed). And we don’t require mmap(), but use fread() instead. There was no real performance difference in testing. And we are not GPL-licensed, but use the MIT/Expat license.

The package is currently entering Debian experimental for testing purposes. If you have used hardlink previously, or are just curious, give it try.

And the makefile is now compatible with the various BSD makes out there, if that’s interesting for you.

Link to website: http://jak-linux.org/projects/hardlink/

Managing system package selections using custom meta packages

Over the last years, I have developed a variety of metapackages for managing the package selections of the systems I administrate. The meta packages are organized like this:

jak-standard
Standard packages for all systems
jak-desktop
Standard packages for all desktop systems (GNOME 3 if possible, otherwise GNOME 2)
jak-printing
Print support
jak-devel
Development packages
jak-machine-<X>
The meta package defining the computer X

Each computer has a jak-machine-X package installed. This package is marked as manually installed, all other packages are marked as automatically installed.

The machine packages have the attribute XB-Important: yes set in debian/control. This creates an Important: yes field. This field is not official, but APT recognizes it and does not remove those packages (the same field is set for the APT package by APT when building the cache, as APT should not be removed either by APT). It seems to work a bit like Essential, with the exception that non-installed packages are not installed automatically on dist-upgrade.

The meta packages are created using seed files similar to Ubuntu. In contrast to Ubuntu, I’m not using germinate to create the packages from the seeds, but a custom dh_germinate_lite that simply takes a seed file and creates the correct substvars. It’s faster than germinate and really simplistic. It also does not handle Recommends currently.

The whole result can be seen on http://anonscm.debian.org/gitweb/?p=users/jak/jak-meta.git. Maybe that’s useful for some people. And if you happen to find some packages in the seeds that are deprecated, please let me know. Oh, and yes, some packages (such as the letterman one) are internal software not publically available yet [letterman is a simple GUI for creating letters using LaTeX].

While I’m at it, I also built Ubuntu’s version of wine1.2 for i386 squeeze. It can be found in
deb http://people.debian.org/~jak/debian/ squeeze main (it still needs a few changes to be correct though, I’ll upload a jak2 build soon). I also built updated sun-java6 packages for my parents (mostly needed due to the plugin, some websites do not work with the IcedTea one), but can’t share the binaries due to licensing requirements. I may push out a source repository, though, so others can build those packages themselves. I’ll let you know once that’s done.