Defining recursion in terms of itself is an old joke among programmers. Despite the fact that it frustrates a lot of new-comers, we don’t change it. I like to define recursion as “Iteration for the cool kids”. I don’t mean this in a snobbish, let-us-exclude-the-for-loopers kinda way, but rather in a tone of appreciation. Recursion is an elegant way of doing things. Recursive alogrithms are concise, have less noise and have immutability baked in (always a plus).

Recursion is especially important for the functional programmer. In the initiation rites of functional programming we are made to internalize this -

~~Every recursive process can be defined iteratively.~~
Every iterative process can be defined recursively.

What do I mean by there is less noise?

I’m going to look at a famous interview question. How to reverse a linked list. I will provide two solutions, one recursive, one iterative.

This is the iterative solution, and most likely this is how most candidates will solve this problem.

There’s a lot of mutable state floating around, and it took me about 5 tries to get it right. Reasoning about the variables in the while loop is particularly hard.

def reverse_ll(head):
    first = head
    second = head.next
    rem = head.next.next

    first.next = None # make head a tail 
    while rem:
        second.next = first
        first = second
        second = rem
        rem = rem.next

    second.next = first
    return second

Here’s a recursive solution.

def reverse_ll_recursively(head):
    return reverse(None, head, head.next)

def reverse(first, second, rem):
    if not rem:
        second.next = first
        return second
    else:
        second.next = first
        return reverse(second, rem, rem.next)

This is shorter, and arguably more easy to understand. The elegant bit is not in the insight that the last element of any linked list is None, but rather in the part where it is obvious that reversing an element is as simple as putting the second element before the first. This is obvious in the base case.

I’ll be honest, I couldn’t get the iterative solution after many tries, so I implemented the recursive one and then worked backwards to implement the iterative one.

In fact in clojure, the solution is even more elegant. (if you’re not familiar with clojure’s syntax you can have a look at this)

(defn reverse-linked-list 
  ([head]
    (reverse-linked-list nil (first head) (rest head)))
  ([f s rem]
   (if-not (seq rem)
     (cons s f)
     (reverse-linked-list (cons s f) (first rem) (rest rem)))))

This is pure, no mutable state :)

Okay, one last solution, I promise.

(defn reverse-ll [head]
  (reduce cons head nil))

That’s it. It’s that easy to reverse a linked-list (without calling reverse).

reduce is another way to do recursion, in fact, it abstracts away a lot of boiler plate. Notice how much shorter our function was? This is what we call a good abstraction. I have written about reduce in depth, you can read it here.

But what about stack-overflow?

Stack-overflow is great, it helps a lot of programmers, ya know? Oh you mean that stack, the one in your RAM. I’m tempted to say that things like worrying about the stack size, and stack overflow is too low level for us. But this is a practical problem. As much as we love stackoverflow, we don’t like it in our programs. So, how do functional programmers solve this problem? Before I answer that, let’s take a quick detour.

Recursive processes vs Recursive definitions.

Yes, there’s a difference between them. Here’s a famous example.

(defn factorial [n]
  (if (< n 1)
    1
    (* n (factorial (- n 1)))))

;; call factorial with a biginteger
(factorial (biginteger 1000000))
; Execution error (StackOverflowError)

hmm, not good. Let’s think about what’s happening in the stack.

;; for (factorial 4)
;; stack
(factorial 4)
(* 4 (factorial 3))
(* 4 (* 3 (factorial 2)))
(* 4 (* 3 (* 2 (factorial 1)))) ; this is most the stack has to remember
(* 4 (* 3 (* 2 1)))
(* 4 (* 3 2))
(* 4 6)
24
; ---------------------------->
; space required

It’s obvious that another step would increase the amount the stack has to remember.

Compare it to an iterative definition.

(defn factorial-iter
  ([n] 
   (factorial-iter 1 1 n))
  ([counter result limit]
   (if (= counter limit) (* counter result)
       (factorial (+ 1 counter) (* counter result) limit))))

The stack looks quite different for this.

;; for (factorial-iter 4)
;; stack
(factorial-iter 4)
(factorial-iter 1 1 4)
(factorial-iter 2 1 4)
(factorial-iter 3 2 4)
(factorial-iter 4 6 4)
24
; --------------->
; space required

at any given point there’s a constant amount of information the stack has to remember. In fact this is how it would look if you iteratively defined factorial. The key difference between the two stacks is that the first one is a recursive process and the second one is an iterative process. Paraphrasing from SICP; The operations in the first stack are deferred until the base case is reached and then the subsequent operations are carried out, this is defined as a recursive process. The second stack holds 3 state variables, and operations aren’t deferred, this is defined as an iterative process.

Here’s where things get interesting. Both of these functions are recursively defined. Both call themselves until the base case is reached.


(defn factorial [n]
  (if (< n 1)
    1
    (* n (factorial (- n 1)))))

(defn factorial-iter
  ([n] 
   (factorial-iter 1 1 n))
  ([counter result limit]
   (if (= counter limit) 
       (* counter result)
       (factorial (+ 1 counter) (* counter result) limit))))

I want to really drive home the fact that

A recursively defined function can result in an iterative process!

Tail recursion

Let’s look at the last form of the previous functions.


(defn factorial [n]
;;   (if (< n 1)
    ;; 1
    (* n (factorial (- n 1))))


(defn factorial-iter
;;   ([n] 
;;    (factorial-iter 1 1 n))
;;   ([counter result limit]
;;    (if (= counter limit) 
       (* counter result)
       (factorial (+ 1 counter) (* counter result) limit))

Notice anything? In factorial the last form calls factorial, gets its result, and computes something, and returns that.

In factorial-iter the last form IS a function call. The result of that function call is returned.

Because the function call is the last form (at the tail), it’s called tail-recursive. Tail-recusion is how functional programmers avoid stack overflow.

In factorial the compiler/interpeter has to remember the calling function to pass the value of the recursive calls. In factorial-iter the compiler/interpreter says “hold on, the calling function is just returning the value of calling another function, I can easily forget about the calling function.”

Enough talk, let’s see whether this works in practice.

(factorial-iter (biginteger 1000000))
; Execution error (StackOverflowError)

“but you said it was an iterative proccess?” Ah, this is where you have to start worrying about low-level details, supporting tail recursion optimizations is something compilers do, and it is up to the runtime platform to implement this. The JVM doesn’t optimize for this. But but but… Clojure has a few tricks up its sleeve.

recur

Let’s redefine factorial-iter

(defn factorial-iter
  ([n] (factorial-iter 1 1 n))
  ([counter result limit]
   (if (= counter limit) (*' counter result)
       (recur (+ 1 counter) (*' counter result) limit))))

(factorial-iter (biginteger 1000000))

Try it out. It will take a long time to compute but it won’t result in a StackOverflowError.

recur is clojure’s way of providing tail-recursion optimization. So if you can define a function in a way that the recursive calls are in the tail position (last form) then you can use recur. Rich Hickey is quite ingenius.

What can be done in languages like python that don’t support tail recursion optimizations?

Trampoline

Trampolines are cooler than a polar bear sliding on ice. To use a trampoline, you need to tweak your function a little bit. Instead of having your function call itself in the tail position, you return a function in the tail position. The function you return must take no arguments. Let’s see this in action.

(defn factorial-trampoline
  ([n] (factorial-trampoline 1 1 n))
  ([counter result limit]
   (if (= counter limit) (*' counter result)
       #(factorial-trampoline (+ 1 counter) (*' counter result) limit))))

=> (trampoline factorial-trampoline 4)
24

The #() syntax is used to define an anonymous function. So factorial-trampoline returns a function instead of calling itself.

In python it would look something like this.

def factorial_trampoline(n,counter=1, result=1):
    if counter == n:
        return counter*result
    else:
        return lambda: factorial_trampoline(n, counter+1, counter*result)

trampoline takes a function and arguments (here 4) and calls the function with the arguments, since each subsequent call returns a function, trampoline calls that function until the base case is reached, then it returns the value.

Here’s clojure’s implementation of a trampoline.

(defn trampoline
  ([f]
     (let [ret (f)]
       (if (fn? ret)
         (recur ret)
         ret)))
  ([f & args]
     (trampoline #(apply f args))))

It uses recur, not very useful for python. Truth be told, in python you have to cheat a little. Here’s a trampoline implementation taken from functionali.

def trampoline(fn: Callable, *args: Any):
    recursive_fn = fn(*args)

    # this is the part where we have to cheat.
    while isinstance(recursive_fn, Callable):
        recursive_fn = recursive_fn() # ack mutable state!

    return recursive_fn

We have to use some mutable state but hey, you can define tail recursive functions and not have to worry about stackOverflow, I consider that a win.

Tail-call vs Tail-recusion optimizations

Clojure is known for having a high bar for including any function in the core library. So, when we have recur why do we need trampoline? Doesn’t that seem redundant?

Let’s look at two functions.

(declar is-even? is-odd?)

(defn is-even? [num]
  (if (= num 0)
    true
    (is-odd? (dec num))))

(defn is-odd? [num]
  (if (= num 0)
    false
    (is-even? (dec num))))

=>(trampoline is-even? 1000000000)
; Execution error (StackOverflowError)

These functions call each other until either one terminates. For big numbers, this will result in a stack overflow. Notice that these function calls are in the tail position (last form). But they are not recursive. is-even? calls is-odd? in the tail position, not itself, so we can’t use recur. This is the difference between a tail-call, and a tail-recursion. These are good examples of functions having tail-calls. As pointed out earlier, the JVM does not optimize for tail-calls or tail-recursions. This is where trampoline comes in.

Here is how you’d tweak it for trampoline.

(declar is-even? is-odd?)

(defn is-even? [num]
  (if (= num 0)
    true
    #(is-odd? (dec num))))

(defn is-odd? [num]
  (if (= num 0)
    false
    #(is-even? (dec num))))

; then call it with trampoline

=> (trampoline is-even? 100000000)
; true

Now, instead of calling is-odd? from is-even you’re returning a function that when called will compute (is-odd? (dec num) and returns its value.

Cool, right?!

Closing thoughts

Functional programs use recursion a lot, and if you plan to do functional programming, I suggest you get comfortable with it. Once you start thinking recursively, it is harder to do things with mutable variables and for loops (a good thing!). A good exercise is to take iteratively defined functions and try to define them recursively. Another exercise is to take recursive functions and redefine them to be tail-recursive. That was how I got comfortable with recursion.

What do I mean by there is less noise?#

But what about stack-overflow?#

Recursive processes vs Recursive definitions.#

Tail recursion#

recur#

Trampoline#

Tail-call vs Tail-recusion optimizations#

Closing thoughts#