How to derive the Y combinator

Thursday, December 30, 2021, 01:20 PM

Programming, Notes

I forgot my favorite method of deriving Y combinator for the n-th time, so this time I am forcing myself to write this up as a quick reminder.

For a proper tutorial, consider this: https://homes.cs.washington.edu/~sorawee/en/blog/2017/10-05-deriving-Y.html.

For each example here, verify with fact(10) == 3628800.

Version 0

This definition of fact requires that its body can look up the name fact in its parent environment, which is bad.

fact = lambda n: 1 if n == 0 else n * fact(n-1)

We want to be able to write fact entirely in terms of anonymous functions which do not have free variables. These things have a fancy name: “combinators.”

Y combinator, which we will eventually reinvent, is of interest because it provides us with a primitive for building recursive functions in languages that provide higher-order functions but not recursion by default.

Version 1

The trick is to pass in the recursion as an argument instead.

fact_base = lambda f: lambda n: 1 if n == 0 else n * f(f)(n-1)
fact = fact_base(fact_base) # this relies on currying/partial application

Version 2

Because there are no circular dependencies, we can just inline everything. We can just stop here if we want.

fact = (lambda f: lambda n: 1 if n == 0 else n * f(f)(n-1))(lambda f: lambda n: 1 if n == 0 else n * f(f)(n-1))

Version 3

I hate code duplication, so I am writing a helper function to call the function on itself instead.

fact = (lambda f: f(f))(lambda f: lambda n: 1 if n == 0 else n * f(f)(n-1))

Suppose I define make_recur = lambda f: f(f), then the user could make their own recursive function just by writing a function that takes in f. To make a recursive call, write f(f)(...). Here is an example of how to use it.

make_recur = lambda f: f(f)
fib = make_recur(lambda f: lambda n: n if n <= 1 else f(f)(n-1)+f(f)(n-2))
# fib(10) == 55

Version 4 (broken)

This is where I usually get stuck.

Right now, requiring the user to write f(f)(...) is disgusting. We want to allow the user to write f(...) instead.

user_fact_base = lambda f: lambda n: 1 if n == 0 else n * f(n-1)
fact_base = ???
fact = fact_base(fact_base)

Our task is to figure out the definition of fact_base to wrap around user_fact_base. Recall, fact_base takes in a function f as its argument. It should call user_fact_base with f(f) already prepped in that argument slot. This should work:

user_fact_base = lambda f: lambda n: 1 if n == 0 else n * f(n-1)
fact_base = lambda f: user_fact_base(f(f))
fact = fact_base(fact_base)

Actually, this results in stack overflow because Python uses strict evaluation.

fact = fact_base(fact_base)
     = (lambda f: user_fact_base(f(f)))(lambda f: user_fact_base(f(f)))
     = user_fact_base( (lambda f: user_fact_base(f(f)))(lambda f: user_fact_base(f(f))) )
     = user_fact_base( user_fact_base( ... ) ) # infinite recursion

Version 5

Realize that the issue is that Python evaluates f(f) before passing into user_fact_base. We do not want this to be evaluated unless it is necessary (i.e. when function body requires a recursive call). The trick is to simply wrap the expression f(f) in a somewhat redundant lambda expression (make the argument explicit).

user_fact_base = lambda f: lambda n: 1 if n == 0 else n * f(n-1)
fact_base = lambda f: user_fact_base(lambda x: f(f)(x))
fact = fact_base(fact_base)

Now we can inline everything if we want.

The Y combinator

What we discovered in version 4 was the Y combinator. Specifically:

user_fact_base = lambda f: lambda n: 1 if n == 0 else n * f(n-1)
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))
# Y = lambda f: (lambda x: x(x))(lambda x: f(x(x)))
fact = Y(user_fact_base)

$Y = \lambda f. (\lambda x. f (x x))(\lambda x. f (x x))$.

Some people might like to write: $Y = \lambda f. (\lambda x. x x)(\lambda x. f (x x))$.

The Z combinator

What we discovered in version 5 was the Z combinator. Specifically:

user_fact_base = lambda f: lambda n: 1 if n == 0 else n * f(n-1)
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))
# Z = lambda f: (lambda x: x(x))(lambda x: f(lambda v: x(x)(v)))
fact = Z(user_fact_base)

$Y = \lambda f. (\lambda x. f (\lambda v. x x v))(\lambda x. f (\lambda v. x x v))$.

Again, you can condense this into $Y = \lambda f. (\lambda x. x x)(\lambda x. f (\lambda v. x x v))$.

This works in Python.

Another way to derive the Y combinator

Here is another way I like to use to memorize how to write the Y combinator. This is probably not very understandable if you have not seen it before.

First, you need to know that $Y$ finds fixed point of a function, namely $Y f$ gives the input $x$ such that $f x = x$, i.e. $f (Y f) = Y f$.

We can repeatedly substitute $Y f$ with $f (Y f)$. So, $Y f = f (Y f) = f (f (Y f)) = f (f (f (f \dots)))$.

This equation, $Y f = f (f (f (f \dots)))$, is the goal we are trying to achieve. How do we apply $f$ repeatedly?

Well, let’s answer a simpler question first: How do you make something run infinitely?

This works: $(\lambda x. x x)(\lambda x. x x)$.

Notice that if you keep simplifying the expression, you are always stuck at $(\lambda x. x x)(\lambda x. x x)$.

We just need to sneak $f$ in there: $(\lambda x. f (x x))(\lambda x. f (x x))$.

Observe $(\lambda x. f (x x))(\lambda x. f (x x)) = f (\lambda x. f (x x))(\lambda x. f (x x)) = f (f (\lambda x. f (x x))(\lambda x. f (x x))) = f (f (f \dots))$.

We are done! $Y = \lambda f. (\lambda x. f (x x))(\lambda x. f (x x))$.

To get the Z combinator, just add a redundant $v$: $Z = \lambda f. (\lambda x. f (\lambda v. x x v))(\lambda x. f (\lambda v. x x v)) $.

Yay!

« Review/Summary: How to Design Programs (HTDP vs. SICP)

Power tripping with infinite lists in Haskell »

aqua-ified thoughts

never developed nor fully solidified