Introduction §
If you’ve done much functional programming, you’ve probably heard of the λ-calculus, invented by Alonzo Church in the 1930s. If you haven’t, the term might be quite new to you; but don’t worry, despite the intimidating name the λ-calculus is actually very simple.
In fact, that’s exactly what it’s for: a deliberately minimal programming language that nevertheless captures all the difficult parts of a real programming language (such as Turing-completeness), so that it can be used as a playground for experimenting with different programming language features, or as a basis for a real programming language (one that is defined, if not implemented, by a syntactic translation into a core λ-calculus).
By using the λ-calculus instead of working on a fully-featured programming language like C# or Haskell, the programming language designer can focus on building the features they want to build, with minimal distraction from the large amounts of syntactic sugar and well-understood language features that go into making a real programming language designed for enterprise use. Of course, sometimes it is then unclear how different features should be integrated into the same language, a problem that initiated a lot of the work on algebraic effects.
The variants of the λ-calculus are too numerous to list here, but I will try to give a couple of representative examples and show how they are useful.
The untyped λ-calculus §
This is the simplest form of the calculus, and the basis on which all the others are built. It simply consists of three types of expression:
- variables, named by some set of names, e.g. \(x\);
- abstractions, written \(λx. e\), where \(x\) is a variable name and \(e\) is another expression;
- applications, written \(f e\), where both \(f\) and \(e\) are terms.
The λ-calculus syntax can be understood easily with reference to a subset of a language supporting higher-order functions, like JavaScript:
- variables work the same: a variable is represented by a valid variable name.
- abstractions \(λx. e\) are a shorter syntax for function expressions function(x) { return e; } (or, in modern JavaScript, x => e).
- applications \(f e\) simply mean f(e).
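Putting those three clauses together, a λ-term like \(λf.λx. f (f x)\) reads directly as JavaScript (a small illustration of my own):

// λf. λx. f (f x), as nested arrow functions
const twice = f => x => f(f(x));
console.log(twice(n => n + 1)(0)); // 2: the function was applied twice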
Brackets may be used to disambiguate groupings; there are a couple of conventions for this, and they conflict, so you have to know which is in use. In most contexts application is left-associative, meaning that \(f g h\) means \((f g) h\), and bracketing will only be used to represent \(f (g h)\). However, in some semantics work, especially that in which multiple arguments are applied at once, such as Jean-Jacques Lévy’s work on optimal reduction, the syntax for an application always includes brackets around the function, e.g. \((f)gh…\) to mean that the function \(f\) is applied to arguments \(g\) and \(h\). I will use the former convention here.
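In the JavaScript reading, left-associative application is just a chain of calls, so \(f g h\) is f(g)(h) rather than f(g(h)) (again my own illustration, with a hypothetical curried function):

// f g h parses as (f g) h, i.e. f(g)(h)
const f = x => y => [x, y];
console.log(f("g")("h")); // [ 'g', 'h' ]
// f (g h) would instead be written f(g(h))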
As a shorthand, functions that immediately return functions, such as \(λf.λx.x\), can be abbreviated to \(λfx.x\).
In addition to the three syntactic forms, the (call-by-name) λ-calculus comes with a single reduction rule describing what happens when you apply a function to an argument:
- \((λx. e_1) e_2\) evaluates to (written \(\Longrightarrow\)) \(e_1\) with \(x\) replaced by \(e_2\) (often written \(e_1[x/e_2]\)). To be pedantic, this is capture-avoiding substitution: bound variable names in \(e_1\) must be renamed where necessary so that they don’t capture free variables of \(e_2\).
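To make the rule concrete, here is a minimal sketch of substitution and one-step call-by-name reduction in JavaScript; the object encoding of terms is my own choice, not part of the calculus:

// Terms: {kind: 'var', name}, {kind: 'lam', param, body}, {kind: 'app', fun, arg}
let counter = 0;
const fresh = base => `${base}_${++counter}`; // fresh names for renaming

// the set of free variables of a term
function freeVars(t) {
  switch (t.kind) {
    case 'var': return new Set([t.name]);
    case 'lam': { const s = freeVars(t.body); s.delete(t.param); return s; }
    case 'app': return new Set([...freeVars(t.fun), ...freeVars(t.arg)]);
  }
}

// subst(t, x, v): t with x replaced by v, renaming bound variables in t
// that would capture free variables of v
function subst(t, x, v) {
  switch (t.kind) {
    case 'var': return t.name === x ? v : t;
    case 'app': return {kind: 'app', fun: subst(t.fun, x, v), arg: subst(t.arg, x, v)};
    case 'lam':
      if (t.param === x) return t; // x is shadowed here; nothing to do
      if (freeVars(v).has(t.param)) { // would capture: rename the bound variable first
        const p = fresh(t.param);
        const body = subst(t.body, t.param, {kind: 'var', name: p});
        return {kind: 'lam', param: p, body: subst(body, x, v)};
      }
      return {kind: 'lam', param: t.param, body: subst(t.body, x, v)};
  }
}

// one step of call-by-name reduction: fire the leftmost-outermost redex
function step(t) {
  if (t.kind === 'app' && t.fun.kind === 'lam')
    return subst(t.fun.body, t.fun.param, t.arg);
  if (t.kind === 'app') {
    const f = step(t.fun);
    return f && {kind: 'app', fun: f, arg: t.arg};
  }
  return null; // no redex: a variable or an abstraction
}

// (λx. x) y  ⇒  y
const id = {kind: 'lam', param: 'x', body: {kind: 'var', name: 'x'}};
console.log(step({kind: 'app', fun: id, arg: {kind: 'var', name: 'y'}}));
// { kind: 'var', name: 'y' }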
That’s it! This simple programming language is powerful enough to encode all (pure) programs. As an example, take the famous looping term \(Ω\):
\[ Ω = (λx.\, x\, x)\, (λx.\, x\, x) \]
This program is an infinite loop! If we replace all the \(x\)s in the first term with \((λx.\, x\, x)\), we get:
\[ (λx.\, x\, x)\, (λx.\, x\, x) \]
… which is back where we started. Despite its lack of any looping or recursion primitives, or even function names to use to perform recursive calls, we can loop in the λ-calculus just by using the name of the argument twice.
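The same term can be run in JavaScript, where, since each call pushes a stack frame, the infinite loop shows up as a stack overflow (my own illustration):

// (λx. x x) (λx. x x): recurses forever, so the call stack overflows
try {
  (x => x(x))(x => x(x));
} catch (e) {
  console.log(e.message); // e.g. "Maximum call stack size exceeded"
}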
An encoding example: Church numerals §
Not only that, because we have so many different functions to work with, we can encode values like natural numbers directly into the calculus itself, and manipulate them using the reduction rule. A well-known example is that of the Church numerals, named after Alonzo Church. In the Church numerals, a natural number n is encoded as a function that takes a function and an argument and applies the function to the argument n times. For example, the Church numeral 0 is defined as:
\[ 0 = λfx.\, x \]
The Church numeral 1:
\[ 1 = λfx.\, f\, x \]
And so on:
\[ 2 = λfx.\, f\, (f\, x) \qquad 3 = λfx.\, f\, (f\, (f\, x)) \qquad … \]
It’s easy to see that, given a Church numeral n, we can define its successor as \(λfx.f(nfx)\), since this means that we take the numeral n (a function that applies a given function to its argument n times) and apply the function to the result one more time. We can codify this:
\[ \operatorname{succ} = λmfx.\, f\, (m\, f\, x) \]
Remember that \(λmfx.\) is just shorthand for \(λm.λf.λx.\), so \(\operatorname{succ} n\) evaluates to \(λfx.f(nfx)\).
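The numerals and \(\operatorname{succ}\) translate directly into arrow functions; the toNumber helper is my own addition, ‘counting’ the applications with ordinary arithmetic:

const zero = f => x => x;               // apply f zero times
const one  = f => x => f(x);            // apply f once
const succ = n => f => x => f(n(f)(x)); // apply f one more time than n does

// read a numeral back by counting applications
const toNumber = n => n(k => k + 1)(0);
console.log(toNumber(succ(succ(zero)))); // 2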
With a zero and a successor function, we can construct any natural number we like. Only one thing remains for a complete encoding of the naturals: computation out of them, i.e. the ability to distinguish between different naturals and alter the course of the computation based on them. Thankfully, the Church numerals encode this functionality into the number itself: because the encoding of the numeral n applies the function n times, it’s possible to pass a function that ‘counts’ the number of times it has been called on an argument and behaves differently according to the count.
For example, given (arbitrary) terms \(\def\tt{{\mathrm{\bf t\!t}}}\tt\) and \(\def\ff{{\mathrm{\bf f\!f}}}\ff\) representing the Booleans true and false, we can write a predicate \(\operatorname{isZero}\):
\[ \operatorname{isZero} = λn.\, n\, (λx.\, \ff)\, \tt \]
There’s a Church encoding of the Booleans too, that looks like \(\tt = λtf.t\); \(\ff = λtf.f\). Can you see how to write an if?
If the function is called zero times, we get \(\tt\); but as soon as the numeral calls the function at least once, the resulting term ignores the previous value and returns \(\ff\).
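In JavaScript the whole thing looks like this (a sketch, reusing the names from above):

const zero = f => x => x;           // Church 0
const one  = f => x => f(x);        // Church 1
const tt = t => f => t;             // Church true
const ff = t => f => f;             // Church false
const isZero = n => n(_ => ff)(tt); // apply 'always ff' n times to tt

console.log(isZero(zero) === tt); // true: the function was never called
console.log(isZero(one) === tt);  // false: one call replaced tt with ff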
The Church numerals are a prominent example of a more general procedure called Church encoding, in which data types can be encoded and manipulated as their recursor (the function that computes on the data type).
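For instance, a pair can be Church-encoded as its recursor, a function that hands both components to whatever consumer it is given (again my own example):

const pair = a => b => f => f(a)(b); // the pair, encoded as its recursor
const fst = p => p(a => b => a);     // consume the pair, keeping the first half
const snd = p => p(a => b => b);     // ...or the second
console.log(fst(pair(1)(2)), snd(pair(1)(2))); // 1 2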
A brief aside: imperative programming §
While the λ-calculus is often associated with functional programming, it can represent imperative languages just as well by a straightforward transformation: an imperative program such as X(); Y(); can be represented by a term like:
\[ (λy.\, (λz.\, z)\, (Y\, y))\, (X\, x) \]
where the \(x\), \(y\), and \(z\) terms represent the state of the imperative program that is updated by the X() and Y() functions. Alternatively, different evaluation orders can be used to control exactly when evaluation occurs, making it possible to reason about arbitrary imperative side effects that happen in the course of evaluation.
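A rough JavaScript rendering of the same idea, with hypothetical state-updating functions X and Y, makes the threading explicit:

// each 'statement' takes the current state and returns the updated state
const X = s => ({ ...s, xDone: true });
const Y = s => ({ ...s, yDone: true });

// "X(); Y();" becomes nested applications passing the state along
const program = x => (y => (z => z)(Y(y)))(X(x));
console.log(program({})); // { xDone: true, yDone: true }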
Type systems §
Depending on whether we are interested in syntax or semantics, extensions to the λ-calculus generally involve adding or changing types or terms respectively. In a real-world programming language, usually the λ-calculus is extended both ways, and often a new term will be presented with a type system that ensures correctness of use.
Simply-typed λ-calculus and the lambda cube §
The simplest type system extension given to the λ-calculus is the aptly-named simply-typed λ-calculus. In the simply-typed λ-calculus, we demand that every expression \(e\) has a type \(τ\), written \(e : τ\). Types \(τ\) are either:
- a base type \(A\) drawn from some (usually unspecified) set of base types (e.g. string, int); or
- a function type \(σ → τ\) from values of type \(σ\) to values of type \(τ\).
Along with the types we associate typing contexts (lists of variable types) and typing rules:
- if a variable \(x\) has type \(τ\) in a context \(Γ\), then the term \(x\) has type \(τ\) in context \(Γ\).
- if \(e\) has type \(τ\) in a context \(Γ, x: σ\) in which \(x\) has type \(σ\), then \(λx. e\) has type \(σ → τ\) in context \(Γ\).
- if, in context \(Γ\), \(f\) has type \(σ → τ\) and \(e\) has type \(σ\), then \(f e\) has type \(τ\).
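These three rules translate almost line-for-line into a checker; here is a minimal sketch in JavaScript, where the representation of types and (explicitly annotated) terms is my own assumption:

// Types: {kind: 'base', name} or {kind: 'fun', from, to}
// Terms: {kind: 'var', name}, {kind: 'lam', param, paramType, body},
//        {kind: 'app', fun, arg}; abstractions carry a type annotation
//        so that checking needs no inference.

const typeEq = (a, b) =>
  a.kind === b.kind &&
  (a.kind === 'base' ? a.name === b.name
                     : typeEq(a.from, b.from) && typeEq(a.to, b.to));

// ctx is the context Γ, mapping variable names to types
function typeOf(term, ctx) {
  switch (term.kind) {
    case 'var': // first rule: look the variable up in Γ
      if (!(term.name in ctx)) throw new Error(`unbound variable ${term.name}`);
      return ctx[term.name];
    case 'lam': { // second rule: type the body in Γ, x: σ
      const bodyType = typeOf(term.body, { ...ctx, [term.param]: term.paramType });
      return { kind: 'fun', from: term.paramType, to: bodyType };
    }
    case 'app': { // third rule: the argument type must match the domain
      const funType = typeOf(term.fun, ctx);
      const argType = typeOf(term.arg, ctx);
      if (funType.kind !== 'fun' || !typeEq(funType.from, argType))
        throw new Error('type mismatch in application');
      return funType.to;
    }
  }
}

// λx:int. x  :  int → int
const int = { kind: 'base', name: 'int' };
const idTerm = { kind: 'lam', param: 'x', paramType: int, body: { kind: 'var', name: 'x' } };
console.log(typeOf(idTerm, {})); // { kind: 'fun', from: ..., to: ... }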
Notably, the λ-calculus extended with (only) this type system is no longer Turing-complete, because in order to type something like the looping term \(Ω\) discussed earlier, we would have to give it an infinitely long type.
In practice, Turing-completeness is often recovered in typed λ-calculi by adding explicit terms for recursion as primitives, such as the Y combinator or recursive let rec bindings that may refer to themselves by name. This kind of type system is the basis of type systems in modern programming languages, and corresponds to minimal logic, a very simple logic with only (conjunction and) implication.
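As an illustration of such an explicit recursion primitive: JavaScript is call-by-value, so the plain Y combinator loops forever, and the η-expanded variant (often called Z) is used instead; the factorial here is my own example:

// Z = λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))
const Z = f => (x => f(v => x(x)(v)))(x => f(v => x(x)(v)));

// recursion without the function ever naming itself
const fact = Z(self => n => n === 0 ? 1 : n * self(n - 1));
console.log(fact(5)); // 120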
Other type systems of interest include those extended with the ability to parameterize things over other things; these are categorized by Barendregt’s famous lambda cube. Each of these enables some practical feature that has since made its way into ‘real’ programming languages:
- System F allows writing polymorphic functions, a.k.a. generic functions in languages like Rust or Java.
- System Fω allows writing polymorphic types, such as trees or lists, a.k.a. generic types in languages like Rust or Java.
- λΠ allows writing dependent types, wherein types depend on terms, allowing the expression of types like ‘arrays of length 3’ or ‘integers that are multiples of \(k\)’ for some variable \(k\).
In (typed) functional programming circles, the ability to parameterize types on other types is often taken for granted, while elsewhere the term ‘generic type’ is sometimes used to mean a type that is available for inspection at runtime, something like Java reflection.
Of the points on the lambda cube, all but dependent types have been widely adopted by most modern mainstream typed programming languages; until quite recently, Go was the notable exception.