Setting the stage
It’s 2026. The LLM hype continues, and amid rounds of layoffs we’re told once again that the end is nigh: programming is dead! Programmers are obsolete! Now, in this golden age of vibe coding, the old dream has been realized: anybody can start programming just by writing plain old English (or, better, Polish) with no need to learn these finicky programming languages or fiddle about with ‘tooling’ or ‘infrastructure’ or ‘type systems’. Twitter is ablaze with tech influencers excitedly showing off the latest Photoshop clone their agent built while they were in the bath. Conscientious developers are having existential crises over the fact that they aren’t churning out software in their sleep the way the Internet tells them they could be.
But there’s a spanner in the works. The production of real, non-trivial software isn’t speeding up the way we would expect it to if these demos scaled the way we’re being told they do. Quantitative studies of AI-assisted coding output consistently show negative or statistically insignificant productivity gains, even while the coders themselves feel more productive, put in longer hours, and burn out faster.
It’s clear that LLMs are a real breakthrough in artificial intelligence and make short work of language-heavy tasks, and in certain cases they can one-shot tasks that would take humans weeks of effort. So what gives? I think the answer is simple: LLMs can’t program!
The struggle of syntax
People who claim that programming is dead, or that LLMs can do programming ‘for you’, usually focus on syntax. Every novice programmer knows the frustration of trying to remember where the semicolons go in a for loop, or which of its subexpressions is executed when, and the programming languages of today have plenty of ‘gotchas’ that can trip up even senior programmers: any C++ programmer who tells you they’ve never been bitten by an unexpected implicit type conversion is lying!
To a non-programmer, programming language syntax looks like the big hurdle of programming. The job of a programmer, surely, is to take a specification of a business problem written in plain, clear English that we normal people understand and translate it into this incomprehensible string of symbols that is the only thing that computers speak, because computers are too dumb to understand natural language. The mastery of programming, then, is in memorizing the meaning of all these arcane sigils and directly translating the natural-language specifications, drawn up by the people who make the big-picture decisions, into them. If we can teach the machines to understand regular-people language, there’s nothing left for the programmer to do! The job of programming is dead: now the normal people can just tell the computer what to do directly, and all that time programmers spent memorizing corner cases of C++ syntax (I’m picking on C++ a bit here; it deserves it) was a waste of twenty years.
Except… that’s not really the case at all. In fact, to some extent, the opposite is true! Programming languages are designed for humans, not computers, which only understand electrical pulses at the end of the day. The ‘inscrutable’ syntax of programming languages is, in fact, designed to make the task of programming easier, and modern programming languages represent the state of the art in (a selection of competing approaches to) making programs easy to read and write. That’s certainly not to say they’re perfect: if they were, we wouldn’t need so many! Learning and remembering programming language syntax really is difficult, and it frequently trips people up. But we have mostly settled on syntaxes that, once learnt, make programming easier for humans, not harder. The trend, if anything, is towards less natural languages for programming: there was, at some point, a movement to make programming languages more like natural languages, embracing syntactic complexity and irregularity in exchange for reduced typing or grammatical familiarity to speakers of languages like English. The effects of this can be seen in, for example, COBOL, Perl (which has a special place in my heart, and in programming language history, for being designed not by a computer scientist but by a linguist), or (to a lesser extent) C++. But more recent popular programming languages like Python (which retains a few concessions to natural language, held to be the basis of some of its more confusing syntactic features), Go, or Rust (yes really – see the turbofish!) eschew such attempts in favour of grammatical unambiguity — not because computers are dumb (we know how to write parsers for these things!) but because even that small amount of ambiguity is enough to regularly trip up humans reading or writing programs.
The joy and pain of notation
For a more storied example of this process in action, let us look at mathematical syntax. Modern conventions for mathematical notation are much more recent than one might imagine; for example, the equals sign (=), a cornerstone of modern mathematical notation, dates only from the 16th century, set out by the Welsh mathematician Robert Recorde:
And to auoide the tediouſe repetition of theſe woordes : is equalle to : I will ſette as I doe often in woorke vſe, a paire of paralleles, or Gemowe lines of one lengthe, thus: =, bicauſe noe .2. thynges, can be moare equalle.
Before the invention of notation for a particular concept, the unambiguity required by mathematics is achieved by jargon, which is the formalized use of natural language specialized to (and given a formal meaning in) a particular domain. For example, when a modern mathematician says ‘group’ in the context of a dinner party, they are probably (though by no means certainly, as you’ll know if you’ve spoken to many mathematicians) using it in its informal sense to mean more than one person; but, in the context of a mathematical discussion, they almost certainly mean a set closed over a binary and a unary operation that satisfy the appropriate laws.
Natural language, jargon, and notation exist along a continuum. Natural language is the easiest to use, and simply involves explaining your concept using a variety of words that already have semantics in the natural language of discourse, but achieving precision for novel concepts in natural language can be tricky, usually requiring elaborate circumlocutions. Once a concept has been adequately described in natural language, and if the concept will likely be needed again (to auoide the tediouſe repetition of theſe woordes), we typically assign it some jargon that we can use to refer back to it in this context. But when we understand a concept more deeply, we can use notation, a graphical or non-lexical representation of the concept, to emphasize properties of the concept we consider important and de-emphasize those we don’t. For example, in the world of arithmetic, in modern notation we typically use juxtaposition to denote multiplication and the plus sign (+) to denote addition. Compared to ‘fourteen multiplied by an unknown added to fifteen is equal to seventy-one’, the notation \(14x + 15 = 71\) has several advantages. For one, it’s shorter! It expresses the same information in less space, which means we can fit more information on the page, easing visual comparison and reference. The same argument applies to the mental load of interpreting it: short-term memory is (held to be) organized into ‘chunks’, and while the precise number of chunks, as well as their composition, is debatable and probably varies somewhat from person to person, it’s quite well agreed that we humans can fit only a certain number of things in mind at a time. The other big advantage of notation is that we can use it to visually expose properties of the underlying object. In the equation \(14x + 15 = 71\) we see this done in several different ways:
- The variable, \(x\), which must be treated separately in various ways, is visually distinct from the constants \(14\), \(15\), and \(71\). Additionally, the (nominally arbitrary) choice of the name \(x\), as opposed to \(z\), \(G\), or \(Γ\), has cultural connotations about the contents (or type) of the variable and its intended usage in the mathematical explanation underway.
- The plus sign, especially when spaced as is conventional, has less typographical colour (‘colour’, in typographical terms, refers just to the perceived visual ‘weight’ or ‘presence’ of the text on the page) than the terms to which it applies, which visually represents that it has lower precedence than the multiplication next to it.
- The constants themselves are represented in the Hindu–Arabic numeral system, a positional system; the advantage of this can’t be overstated! In non-positional systems, like Roman numerals, multiplication and division require long sequences of repeated additions or subtractions, usually aided by a mechanical device such as an abacus, and were mostly the purview of professional mathematicians. With modern positional notation, we can perform long multiplication and division directly on the digits, work that grows only with the number of digits (the logarithm of the operands), and these manipulations are straightforward enough that we teach them to young children; a worked sketch follows this list.
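To make that last point concrete, here is the digit-by-digit procedure at work on a small example (the numbers are chosen purely for illustration). Each row is one digit-product, so the total work grows with the number of digits, not with the magnitude of the numbers themselves:

\[
\begin{array}{r}
14 \\
\times \; 14 \\
\hline
56 \\
140 \\
\hline
196
\end{array}
\qquad
d\text{-digit operands take about } d^2 \text{ digit-products, with } d \approx \log_{10} n .
\]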
Importantly, mathematical notation was never constrained by computer implementation, and indeed much of it remains unimplementable by computers today. This is a pure example of a formal notation that won out over natural language by sheer force of its advantages for humans in reading and writing.
That’s not to say we should derive special-purpose notation for everything. Learning a notation is an up-front cost that explanation in natural language doesn’t have, and its expenditure should be justified: notation shouldn’t be introduced for no good reason, but needs to pull its weight in making more complicated manipulations easier. But in the cases where it is worth it, notation is a powerful tool that has a force-multiplying effect on more complex work. To regress from the sophisticated (albeit imperfect) programming notations we currently have, which have undergone nearly a century of evolution since the introduction of the lambda calculus in the 1930s as well as standing on the shoulders of the giants of mathematical notation, to mere natural language as if we were in the 15th century, is not a win at all, but actually a huge step backwards for professional programming. (Which is not to say it isn’t useful: for novice programmers, or non-programmers, the ability to tap into the LLM’s training data without needing to understand the code contained within it unlocks a lot of capability, even if that capability remains much more limited than what a professional programmer can achieve.)
What is programming?
If we agree that the translation from natural language to syntax is not the crux of programming, what is? What is it that programmers actually do? This is a question that researchers in the field of programming psychology have been asking for years, and the answer seems to be deceptively simple: programmers make decisions. Software is made of decisions and programmers are those who make or clarify those decisions and weave them together into a coherent whole.
This is at stark odds with the business-oriented view of programming we set out above, where executives or product owners make the decisions about what the software is to be and then programmers, like robots on an assembly line, simply hammer the pieces together according to instruction. From this perspective, then, we can see enterprise software as merely the distillation of the business’s decision-making process: decisions about the company’s vision and market positioning feed into decisions about user targeting and user interface design, which feed into decisions about data structures and API endpoints. The executive class and the developer class, far from being the Montagues and Capulets forever separated, are in fact performing the same activity at different levels of abstraction. Each successive refinement merely reduces the ambiguity of the decisions until they are well-specified enough to be fed directly into a computer; and each refinement is an equally creative endeavour, filling in the holes left in the specification by the previous step with new information known only by the current refiner.
But the refinement doesn’t stop there! Even after the programmer has handed off the program (‘program’ is the name we give to a specification once it is well-defined enough for a computer to execute without further human intervention), the computer itself takes on the job of further refinement. Libraries, compilers, runtimes, virtual machines, and operating systems further elaborate the program, adding in information about algorithms, garbage collection, scheduling, and available hardware instructions. The CPU itself does even more work, analysing the resulting machine code for dependencies and pipelining and parallelizing the execution with its knowledge of its own capabilities.
LLMs don’t just translate natural language into programming-language syntax. They are also capable, to some extent, of filling holes left in the specification by extrapolating from their training data, which includes approximately all the code ever publicly written. (I withhold commentary here about the ethical implications of such a thing; enough has been said about it elsewhere. This essay is about programming, but I will take this opportunity to allude to a rather loose analogy between learning programming language syntax and the mechanical skills of drawing, sculpting, writing, et cetera in the arts.) But from the perspective we’ve outlined above, this is not programming! The LLM, when doing this, is not making any decisions, merely pattern-matching on similar situations that arise in its training data and replicating the mean code that the average person on the Internet (who is, definitionally, you should remember, an absolutely mediocre programmer with no understanding of your codebase or context) writes when faced with a similar-looking situation. You can cause the LLM to output ‘better’ (more aligned with your tastes, requirements, aut cetera) code by shunting its context of reference into one in which humans on the Internet output better code; you can cause it to output code that is better suited to your particular problem context by feeding more of what you know about your problem into it. These practices are commonly referred to as ‘prompt engineering’, but they are in fact nothing more nor less than plain old programming: you are making decisions and using those decisions to refine a specification to make the computer execute it in line with your desires, or your interpretation of the specification that you, in turn, were fed. And you’re doing it in natural language, a notation uniquely unsuited to the task!

Notably, though, the LLM itself is not programming. By picking the most generic output possible, it is exactly not making any decision, not adding any information, not refining the specification at all. The result may run, but it is unlikely to be what you want: the main difference is that, instead of outputting ‘syntax error’, the computer will simply run a different program. In some ways that’s better; in other ways it’s worse, and the arguments are roughly analogous to the old arguments between Python programmers and Haskell programmers: would you rather quickly have a program that does the wrong thing but lets you look at its runtime behaviour and iterate on it, or would you rather be told about all the potential problems (including a few that aren’t real!) up front? (The correct answer is, of course, ‘both’, as Python is moving towards with optional type checking — though it has a long way to go!)
Indeed, looking at it from this direction, it seems obvious that programming is a fundamentally human activity, and that in order for an agent to be able to compete seriously at programming, it would need to be a true AGI. Refinement of specifications requires a robust understanding of (and alignment with, which is a whole other, and maybe even more difficult, problem) the intended goals of the resulting software, which comes from having a deep understanding of the world the software will inhabit, including a theory of mind sufficient to give a decent approximation of the intentions, knowledge, and desires of both the specification writer and the end user, not to mention any collaborators the model might have to work with in delivering the software.
The best analogy I’ve ever seen for explaining to non-programmers what it is that LLMs do to programming is this floorplan. At first glance, or if you look at just one part at a time, it looks sensible. All the ‘syntax’ is there: the model knows how to represent a wall, a floor, a toilet. To a non-architect taking a glance it looks like the LLM has ‘done architecture’. It’s definitely seen floorplans before, and if you had just asked it to give you an architectural diagramme of a bathroom, I’m pretty sure it would have spat a perfectly good one out from its training data. But if you look more closely at the product as a whole, the model has completely failed to understand the whole point of a house. It fundamentally doesn’t understand what houses are for or how humans use them, and the result is unusable and unfixable.
Programming in the age of LLMs
Does that mean that LLMs are inherently useless for programming? Certainly not. (Though I do suspect they may be unusually poorly suited for programming compared to other disciplines in which direct translation, without adding information, is a much bigger part of the work.) Let’s take an optimistic look at how we can help LLMs generate reasonable code for our use case, saving us some typing and documentation reading, even if we can’t expect them to program per se. We’ve already described novice programmers as an audience for whom LLMs can drastically improve programming capabilities. They’re also great lossy search engines for code that is well represented in their training data: if you want a to-do app, an LLM will spit one out with very little ceremony or effort on your part! For professional programmers building novel software, though, they currently come with two quite crippling disadvantages: the first is the rather lacklustre language one must use to interact with them, and the second is the way they hide (and sometimes stomp on) the boundary between translation and elaboration.
In my experience, ‘successful’ LLM programmers, ones who manage to eventually produce code palatable to human reviewers, use a variety of techniques to overcome these limitations. These techniques are very reminiscent of practices advocated in the 1990s and early 2000s: waterfall software development (refining the specification until it is watertight before ever starting the ‘programming’ phase), pair programming (watching what your collaborator is doing so you can step in if they go off the rails), extremely thorough zero-trust code reviews, and test-driven development with enforced 100% test coverage all come with very heavy overhead, but they severely limit the damage that less competent programmers can do to your codebase. Unfortunately, it seems that currently the overhead from this make-work roughly cancels out any benefit to be gained from using LLMs in the first place.
If we want LLMs to be a net positive to programming productivity, then, we need to drastically reduce the cost of adding these guard rails. Asking LLMs to add them for us is correctness theatre: if we could trust the LLM to correctly interpret our instructions, there would be no need for the guard rails in the first place. Perturbing the space by using a different model or giving it different instructions increases the likelihood of catching some issues, but doing so is already challenging in its own right (it’s adding information to the artefact — i.e. programming!), and as soon as the models start to interact they are incentivized to align with each other, in a way that is not required to match what the human wanted.
Furthermore, once you start trying to align several agents on a goal, you need to orchestrate them, i.e. establish protocols for communication between them. People talk about this as if it were a business process (‘being the CEO of your own small company of agents’), and in a way it is — but setting out business processes is, again, programming! It can even be done using real, formal programming languages, such as the π-calculus. Not only that, it’s concurrent programming, a famously difficult and error-prone type of programming.
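As a minimal sketch of why this is concurrent programming, consider two agents exchanging messages over channels, modelled here in Python with asyncio. The rôles and messages are invented for illustration, and a real model call would replace the hard-coded strings:

```python
import asyncio

async def coder(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    # Send first, then wait. Had the coder instead waited for feedback
    # before sending (as the reviewer below does), both agents would
    # block forever: a classic deadlock.
    await outbox.put("draft: add caching to fetch_user()")
    feedback = await inbox.get()
    print("coder received:", feedback)

async def reviewer(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    draft = await inbox.get()             # wait for a draft...
    await outbox.put(f"revise: {draft}")  # ...then respond

async def main() -> None:
    to_reviewer: asyncio.Queue = asyncio.Queue()
    to_coder: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        coder(inbox=to_coder, outbox=to_reviewer),
        reviewer(inbox=to_reviewer, outbox=to_coder),
    )

asyncio.run(main())
```

Even with only two parties the protocol already has a failure mode; with a whole flock of agents, the ordering and alignment problems multiply.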
It seems that any way we go, we end up programming. So let’s embrace that! What does it look like to program with coding agents that are deeply flawed or even malicious? I think it will at least involve some new directions in programming UX. Many of these are already active research directions, but are less urgent when collaborating with a trusted human-speed teammate.
Multiplayer
Traditional programming is primarily a solo activity. The standard definition of ‘algorithm’ is entirely non-interactive. Even interactive models of computation like game semantics focus on two-party interaction — player and opponent, me and you, program and environment, and only player/I/program is under the control of the programmer.
Programming in the large with LLMs involves shepherding large flocks of them, each specialized to a particular task, which might be filling in different bits of the codebase, checking the work of other agents, or pulling information from other resources to regurgitate into the artefact. We need to be able to easily write programs around them that give them well-defined rôles and communication channels.
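A hedged sketch of what such a program might look like, continuing in Python: each agent gets a named rôle and a shared channel, and everything funnels back through a single review point. The rôles, tasks, and the call_llm stand-in are hypothetical, not any real API:

```python
import asyncio

async def call_llm(role: str, task: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real model call
    return f"[{role}] output for: {task}"

async def worker(role: str, tasks: asyncio.Queue, results: asyncio.Queue) -> None:
    # Each worker has one well-defined role and one pair of channels.
    while True:
        task = await tasks.get()
        if task is None:  # sentinel: no more work for this worker
            break
        await results.put(await call_llm(role, task))

async def main() -> None:
    tasks: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    roles = ["implementer", "test-writer", "doc-writer"]

    for t in ["parse config", "cache layer", "error messages"]:
        tasks.put_nowait(t)
    for _ in roles:
        tasks.put_nowait(None)  # one sentinel per worker

    await asyncio.gather(*(worker(r, tasks, results) for r in roles))

    while not results.empty():       # everything flows back through
        print(await results.get())   # a single review point

asyncio.run(main())
```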
Interactive
Popular programming languages are largely batch-driven: a program is written in a text editor by a human, and then sent to a programming language implementation of some kind for verification and execution. The editor typically provides a little live feedback for typing or documentation, but is largely static.
To have many agents working together on a problem with a human overseer, feedback needs to happen at human speed. Interactive programming has been a research direction for a while (and has recently been extended with code synthesis integration), and it has found some traction in the arts, but so far there are no mainstream live programming environments used for professional programming. If we are to collaborate effectively with a whole horde of agents, being able to see and respond to what they’re doing in real time is table stakes. In fact, there’s a name for the kind of radical collaborativity required to work on the same codebase as a gang of unruly agents: mob programming.
Heavily isolated
If untrusted LLMs are generating parts of our programs, we need to make sure that they (and the programs they generate) can’t break out of their sandbox and mess with other parts of the program. This implies strongly typed and tested interfaces with automated checking. Good feedback to the agent when it tries to violate the boundary should also help the agent iterate faster towards a well-behaved solution. The boundaries can be shifted by the user or by an overseeing agent to constrain the agent’s solution, adding a new test or refining the type.
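Here is a minimal sketch, in Python, of a boundary defined by a type plus tests; the Summarizer interface and its checks are invented for illustration. The point is that the interface and the tests, not the agent, decide what counts as a solution, and ‘shifting the boundary’ is just adding a test or tightening a type:

```python
from typing import Protocol

class Summarizer(Protocol):
    """The typed interface the agent must implement."""
    def summarize(self, text: str, max_words: int) -> str: ...

def check_boundary(impl: Summarizer) -> None:
    # Automated checks the agent's submission must pass; refining the
    # specification means adding another assertion here.
    out = impl.summarize("one two three four five", max_words=3)
    assert isinstance(out, str)
    assert len(out.split()) <= 3
    assert impl.summarize("", max_words=5) == ""  # edge case: empty input

# A candidate the agent might hand back, checked before it is trusted:
class TruncatingSummarizer:
    def summarize(self, text: str, max_words: int) -> str:
        return " ".join(text.split()[:max_words])

check_boundary(TruncatingSummarizer())
print("boundary checks passed")
```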
We already see people generating sandboxes for their agents using Linux containers or simply buying dedicated hardware. This achieves some level of isolation, but it isn’t fine-grained enough: it’s hard to give the agent access to certain websites without giving it access to the whole Web, for example. Fine-grained isolation is helpful for security, but it is crucial if the isolation boundary is to help define the problem for the agent.
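At the tool level, that fine-grained boundary might look something like the following sketch, where the fetch tool exposed to the agent enforces a per-task allowlist rather than container-wide network rules (the domains here are placeholders):

```python
from urllib.parse import urlparse

# Hypothetical per-task allowlist: the agent may consult these docs
# and nothing else on the Web.
ALLOWED_HOSTS = {"docs.python.org", "pypi.org"}

def check_url_boundary(url: str) -> None:
    """Raise with useful feedback when the agent steps outside its boundary."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        # Good feedback here helps the agent iterate, just as a good
        # type error helps it converge on a well-behaved solution.
        raise PermissionError(f"{host!r} is outside this task's boundary")

for url in ["https://docs.python.org/3/library/", "https://example.com/"]:
    try:
        check_url_boundary(url)
        print("allowed:", url)
    except PermissionError as err:
        print("blocked:", err)
```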
Typed holes as found in Agda, Lamdu, or Hazel are an obvious candidate here, though the notion of ‘type system’ used to specify the boundary of the hole does a lot of the heavy lifting. The circumscription and connexion of these holes can be equivalently seen as programming, specification-writing, agent orchestration, or business process modeling — depending only on the level of abstraction.
LLM-native programming languages
So far, we’ve thought about what it takes to make LLMs productive members of the programming lifecycle today. But what can LLMs bring to the programming languages of tomorrow?
For me, the key perspective here is that LLMs are optimized for languages. There are two competing definitions of ‘language’, and it’s important to distinguish them here. In computer science, ‘language’ typically refers to a formal language, which is a set of strings of symbols defined by certain acceptance criteria, and all our current programming languages fall under this umbrella. In linguistics, though, a (natural) language is the chief object of study. To the linguistics researcher, programming languages are not languages, in that they don’t evolve over time in the natural world with the constraints that entails, and as such they fail to obey the laws that we see natural languages cleave to.
The type of language that LLMs are primarily trained on, and the type of language they work with most fluently, is the latter. Rather than thinking about replacing the formal languages that we already use for programming with clumsy specification in natural language, it’s more interesting to me to examine how LLMs could be used to bridge the gap between unambiguous formal specification languages and the fluidity and malleability of natural language.
As a representative example, take metaphor. Metaphor is the brick from which natural language is built. Start with a concrete example like a battle, with an attacker and a defender. Remove the aspect of physical altercation and you have an argument, or a debate. But you can preserve the notions of ‘attacking an argument’ or ‘defending an argument’, and everybody will immediately know what you mean — even though no attacker or defender (in the original sense) is present. Over time, in a context where physical combat is less common than heated debate, to ‘attack’ someone comes to mean to criticize them or their logical position; this becomes a new concrete meaning, ripe for being metaphorically extended again.
We see this process in the evolution of programming languages as well. A ‘function’ in the mathematical sense (a possibly infinite set of input/output pairs) becomes a ‘function’ in the C sense (a procedure that takes some input values, performs some side effects, and usually returns an output value), and from there can be generalized again into, for example, an ‘asynchronous function’ in C# or Rust, which is a sort of state machine that can be paused and resumed! In each of these transitions the new meaning formally bears little resemblance to the old meaning, but has a similar enough interface that humans reading about them can pretend they are similar things, with some small difference to keep in mind, and thereby build intuition for the new thing. This intuition can be built on in turn, and thus we develop mathematics.
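A toy sketch of the three senses side by side, in Python (the bodies are invented for illustration): the interfaces are similar enough to carry intuition across, even though the underlying objects are formally very different.

```python
import asyncio

def square(x: int) -> int:
    # 'Function' in the mathematical sense: a pure mapping from input
    # to output, nothing more.
    return x * x

def logged_square(x: int) -> int:
    # 'Function' in the C sense: computes a value *and* performs a
    # side effect along the way.
    print(f"squaring {x}")
    return x * x

async def async_square(x: int) -> int:
    # 'Asynchronous function': really a suspendable state machine that
    # merely resembles a function call at the use site.
    await asyncio.sleep(0)
    return x * x

print(square(3))                     # 9
print(logged_square(3))              # prints a log line, then 9
print(asyncio.run(async_square(3)))  # 9, via the event loop
```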
In computing and mathematics, these steps must be done manually, with the correspondence worked out by hand, implemented, and then eventually propagated to the mainstream. This is a slow and intellectually demanding process for all involved. But LLMs excel at metaphor: one of the classic examples of vector embeddings is that \(\mathrm{king} - \mathrm{man} + \mathrm{woman} = \mathrm{queen}\), translating the concept of ‘monarch’ from the semantic space of ‘man’ to the semantic space of ‘woman’. If LLMs are capable of applying natural-language reasoning to programming languages, couldn’t we have an equivalent concept of metaphor for programming-language concepts? What does it mean to apply a function metaphorically, in a different context than the one for which it was written? (Homotopy type theory might have an answer from the formal perspective.) Does it align with the linguistic notion? When we enable programming languages to evolve like natural languages, what will we discover?
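To ground the vector arithmetic above, here is a toy recreation of the classic example in Python. The three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions learnt from data), but the analogy-by-arithmetic works the same way:

```python
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),  # royal, male-coded
    "queen": np.array([0.9, 0.1, 0.8]),  # royal, female-coded
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman lands nearest to queen: the 'monarch' concept,
# translated across the gender axis of the space.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w != "king"), key=lambda w: cosine(emb[w], target))
print(best)  # queen
```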
Apologies and apologetics
It hasn’t escaped my notice that this essay started very concrete and ended rather speculative. I have concrete ideas for some of the science fiction herein, which may or may not bear fruit; but readers looking for some hard technology or some wildly optimistic visions of the future are likely to be equally disappointed.
On the same note, I fear it would be easy to misread this essay as being ‘anti-AI’, a badge that makes one rather unpopular at the moment in certain circles, and I’d like to try to preëmptively fend off such accusations. I am certainly pro ‘artificial intelligence’ in the abstract, but LLMs and the strata we build on them are far from the be-all and end-all of that programme, and we would be wise not to assume (as several generations have before us) that they are essentially human-level intelligence, perhaps modulo some tweaks, just because they can string a sentence together. Their inference is still very different from our reasoning, as we should always expect — after all, even if their architecture were perfectly capable of simulating a human, they inhabit a vastly different world to us, with much more limited goals and learning opportunities! If we can’t even understand the world a lion inhabits (Philosophical Investigations, Ludwig Wittgenstein), how could something that doesn’t share our physical reality possibly relate to us? Perhaps world models are sufficient to overcome or ameliorate this limitation, perhaps not; at the least they should make the failings of the model clear once data is no longer the bottleneck.
That all said, I do believe that the existence of LLMs is an exciting step forward for research on both computation and language (though only the future knows whether the step forward is exciting enough to justify the resource consumption…), and even that in certain circumstances they could potentially result in significant productivity gains. I find it unfortunate that, through a series of historical accidents, they are being touted as a solution to — even a trivialization of — the thorny problem of programming.