Bare Promises

The hunger is real. AI agents are writing more production code every day, and the instinct to design tooling specifically for them is not wrong. It is actually one of the more clear-headed reactions to where software engineering is heading. If agents become first-class producers of code, then the languages and tools they use deserve first-class design attention.
But there is a gap between instinct and execution, and right now that gap is wide enough to drive a truck through.
The problem is not ambition. The problem is that we do not actually know what makes a language good for agents. Not in a vague, "well, simplicity is probably good" way. In a rigorous, empirical, here-is-the-data way. And without that data, every claim of "agent-first" is a hypothesis wearing the clothes of a product.
Consider what we would need to study to answer this question properly. How does syntax density affect token consumption and output quality? If a language requires more tokens to express the same logic, does an LLM generate worse code because it runs out of context window faster? Does implicit behavior (type inference, operator overloading, magic methods) increase hallucination rates because the model has to track invisible state? Do functional patterns produce fewer runtime bugs than imperative ones when the author is a language model rather than a human?
Nobody knows. Not the researchers. Not the platform teams. Not the companies shipping agent-first tooling. The studies have not been done. We are in an empirical dark age about one of the most important questions in AI-assisted development, and we are shipping products into it anyway.
Then there is the cold-start problem. It is obvious enough that it should stop most launches before they begin. Any new language enters the world with zero representation in LLM training data. For a language claiming to be built for agents, this is self-defeating. An agent asked to write code in a novel language needs extensive documentation fed into its context window just to produce something that compiles. Meanwhile, the same agent can produce working TypeScript because it has seen billions of examples. The agent-first advantage has to be large enough to overcome the training data deficit. Has anyone measured whether it is?
This is not an argument against new languages. It is an argument against calling them agent-first before proving they are.
The irony is that existing languages are already surprisingly agent-friendly, each in a different accidental way. TypeScript has volume. The sheer mass of training data means agents produce working output more often than not. Rust has a strict type system and the borrow checker, which together act as an extra correctness filter, narrowing the space of valid programs and catching many hallucinated calls and ownership mistakes at compile time. Go has deliberate simplicity, which means fewer ways for an agent to go wrong. None of these were designed for agents. They just happen to work.
My boss put it plainly recently. Anyone can code in any language now, he said, provided the language is popular enough for the models to be good at it, and you as the engineer know enough to keep the model from writing code with security and performance risks. Popularity buys you working output. The rest is on you.
A new language has to beat these accidental advantages on purpose. That is a high bar. And most of what passes for agent-first design right now (structured diagnostics, formal grammars, constrained output, better error messages) consists of features you could add to existing toolchains. A thin layer on top of the TypeScript compiler that emits machine-parseable diagnostics in a format optimized for context windows would get you eighty percent of the way there without asking anyone to learn a new language. Why does this need to be a new language instead of a better tool?
That question deserves an answer, and in most cases, it does not have one.
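To make the "better tool" option concrete, here is a minimal sketch of what such a thin diagnostics layer might do. Everything in it is an assumption for illustration: the `RawDiagnostic` shape, the field names, and the one-JSON-object-per-line output are hypothetical, not the TypeScript compiler's own API. A real version would read diagnostics from the compiler instead of taking them as plain objects.

```typescript
// Hypothetical shape for one compiler diagnostic. The field names are
// illustrative and are not the TypeScript compiler's own types.
interface RawDiagnostic {
  file: string;
  line: number; // 1-based
  column: number; // 1-based
  code: number; // e.g. 2322
  message: string;
  severity: "error" | "warning";
}

// Compact, stable record meant to be cheap to place in a model's context.
interface AgentDiagnostic {
  loc: string;
  code: string;
  sev: "E" | "W";
  msg: string;
}

function toAgentDiagnostic(d: RawDiagnostic, maxMessageLength = 200): AgentDiagnostic {
  // Collapse whitespace so multi-line messages become a single line.
  const flat = d.message.replace(/\s+/g, " ").trim();
  // Truncate long messages so one noisy error cannot dominate the context.
  const msg =
    flat.length > maxMessageLength ? flat.slice(0, maxMessageLength - 3) + "..." : flat;
  return {
    loc: `${d.file}:${d.line}:${d.column}`,
    code: `TS${d.code}`,
    sev: d.severity === "error" ? "E" : "W",
    msg,
  };
}

// One JSON object per line, errors first, then ordered by location.
function renderForAgent(diagnostics: RawDiagnostic[]): string {
  const sorted = [...diagnostics].sort((a, b) => {
    if (a.severity !== b.severity) return a.severity === "error" ? -1 : 1;
    return a.file.localeCompare(b.file) || a.line - b.line || a.column - b.column;
  });
  return sorted.map((d) => JSON.stringify(toAgentDiagnostic(d))).join("\n");
}
```

Whether a format like this actually helps an agent fix code faster is exactly the kind of claim that would need to be measured, not assumed.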
What would a genuinely agent-first language look like if we actually did the research first? It would probably not resemble a C-family language with minor syntax tweaks. It might look closer to a specification language. It might embed verification directly into the syntax, so that every function declares what it guarantees and the compiler enforces those guarantees at generation time. It might prioritize determinism and auditability over expressiveness. It might ship with a formal grammar that fits in a single context window and a synthetic training corpus that teaches the language to any model in a few hundred examples.
Or it might look like something none of us have imagined yet. That is the point. We should be running experiments, not writing press releases.
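As one small experiment along these lines, the idea of a function that declares what it guarantees can be approximated in an existing language. The sketch below is an assumption-heavy illustration: `Contract`, `withContract`, and `ContractViolation` are hypothetical names, and the checks run at call time, which is weaker than the compiler-enforced guarantees imagined above.

```typescript
// Hypothetical contract wrapper. Names and shape are illustrative only.
interface Contract<A, R> {
  requires: (arg: A) => boolean; // precondition on the input
  ensures: (arg: A, result: R) => boolean; // postcondition on the output
}

class ContractViolation extends Error {
  constructor(kind: "requires" | "ensures", name: string) {
    super(`${kind} failed in ${name}`);
    this.name = "ContractViolation";
  }
}

// Wraps an implementation so every call is checked against its contract.
function withContract<A, R>(
  name: string,
  contract: Contract<A, R>,
  impl: (arg: A) => R,
): (arg: A) => R {
  return (arg: A): R => {
    if (!contract.requires(arg)) throw new ContractViolation("requires", name);
    const result = impl(arg);
    if (!contract.ensures(arg, result)) throw new ContractViolation("ensures", name);
    return result;
  };
}

// Example guarantee: the output has the same length and is in ascending order.
const sortAscending = withContract<number[], number[]>(
  "sortAscending",
  {
    requires: (xs) => xs.every((x) => isFinite(x)),
    ensures: (xs, out) =>
      out.length === xs.length && out.every((v, i) => i === 0 || out[i - 1] <= v),
  },
  (xs) => [...xs].sort((a, b) => a - b),
);
```

If an agent generates an implementation that violates the declared postcondition, the call fails loudly with the contract's name, which gives both the agent and a human reviewer an auditable signal. Whether that signal measurably improves generated code is an open question.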
The hunger for agent-first tooling is not going away. If anything, it will intensify as agents take on more of the software supply chain. The first team that actually studies how language models interact with programming language design, and builds something from that data, will earn the label. Everyone else is making bare promises.
Disclaimer: All content reflects my personal views only and does not represent the positions, strategies, or opinions of any entity I am or have been associated with.


