Blog of Julian Andres Klode

Debian, Ubuntu, Linux in general, and other free software

Recursive-descent in Python, using generators

Writing recursive descent parsers is easy, especially in Python (just like everything is easy in Python). But Python defines a low limit on the number of recursive calls that can be made, and recursive descent parsers are prone to exceed this limit.

We should thus write our parser without real recursion. Fortunately, Python offers us a way out: Coroutines, introduced in Python 2.5 as per PEP 342. Using coroutines and a simple trampoline function, we can convert every mutually recursive set of functions into a set of coroutines that require constant stack space.

Let’s say our parser looks like this (tokens being an iterator over characters):

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        return token
    elif token == '(':
         result = parse(tokens)
         if next(tokens) != ')': raise ...
         return result
    else:
         raise ...

We now apply the following transformations:
return X => yield (X)
parse() => (yield parse())

That is, we yield the value we want to return, and we yield a generator whenever we want to call (using a yield expression). Our rewritten parser reads:

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        yield token
    elif token == '(':
         result = yield parse(tokens)
         if next(tokens) != ')': raise ...
         return result
    else:
         raise ...

We obviously cannot call that generator like the previous function. What we need to introduce is a trampoline. A trampoline is a function that manages that yielded generators, calls them, and passes their result upwards. It looks like this:

def trampoline(generator):
    """Execute the generator using trampolining. """
    queue = collections.deque()

    def schedule(coroutine, stack=(), value=None):
        def resume():
            if 0:
                global prev
                now = stack_len(stack)
                if now < prev:
                    print("Length", now)
                    prev = -1
                elif prev != -1:
                    prev = now
            result = coroutine.send(value)
            if isinstance(result, types.GeneratorType):     # Normal call
                schedule(result, (coroutine, stack))
            elif isinstance(result, Tail):                  # Tail call (if you want to)
                schedule(result.value, stack)
            elif stack:                                     # Return to parent
                schedule(stack[0], stack[1], result)
            else:                                           # Final Return
                return result

        queue.append(resume)

    schedule(generator)

    result = None
    while queue:
        func = queue.popleft()
        result = func()

    return result

This function is based on the code in PEP 342, the difference being that

  • we do not correctly propagate exceptions through the stack, but directly unwind to the caller of the parser (we don’t handle exceptions inside our parser generators anyway)
  • the code actually compiles (code in PEP used ‘value = coroutine.send(value)’ which does not work)
  • the code returns a value (code in PEP was missing a return in schedule)
  • we don’t use a class, and allow only one function to be scheduled at once (yes, we could get rid of the queue)
  • we allow tail calls [where Tail = namedtuple("Tail", ["result"])] to save some more memory.

For a more generic version of that, you might want to re-add exception passing, but the exceptions will then have long tracebacks, so I’m not sure how well they will work if you have deep recursion.

Now, the advantage of that is that our parser now requires constant stack space, because the actual real stack is stored in the heap using tuples which are used like singly-linked lists in scheme here. So, the only recursion limit is available memory.

A disadvantage of that transformation is that the code will run slightly slower for small inputs that could be handled using a normally recursive parser.

About these ads

Written by Julian Andres Klode

January 11, 2013 at 12:30

Posted in Python

4 Responses

Subscribe to comments with RSS.

  1. Very nice! If you’re in a position to use Python 3.3, this example can be made even simpler by using the new “yield from” construct to pass control down to the recursive parser. The outermost code then just needs to catch the final StopIteration and look at its “value” attribute to see the result.

    Nick Coghlan (@ncoghlan_dev)

    January 12, 2013 at 16:22

    • I don’t know how I should use yield from here. If you mean “yield from …” instead of the “yield (yield …)” [or a "yield Tail(...)"], then no, this would not work correctly, as we then use the Python stack for recursion in the parser — at least in Python 3.3.0. The interpreter could obviously handle this without recursion, by storing the sub-generator in a slot, as hinted in PEP 380, but that does not seem implemented yet.

      I could use Python 3.3′s return in a generator; which would then raise a StopIteration with a return value, but I don’t see how this simplifies the code. (Being a recursive descent parser, we can only return at the outer level anyway, so we can’t even use that to save passing a value upwards through the stack)

      Julian Andres Klode

      January 12, 2013 at 18:18

  2. You write “Python defines a low limit on the number of recursive calls that can be made”. My question is: Why don’t you simply increase that limit using sys.setrecursionlimit(limit)?

    Richard

    January 17, 2013 at 00:41

    • I did not know (or forgot) that this exists- But even then, this solution here is cooler to write about.

      Julian Andres Klode

      January 17, 2013 at 10:46


Comments are closed.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: