Overdressed and Underperforming

When context files first started appearing in projects, I was the AGENTS.md maximalist. I told anyone who would listen to put everything in. Coding conventions. Folder structures. Design patterns. Database migration guides. Tech stack notes. Testing preferences. The more detail the better, I said. Make it a constitution. The agent will thank you.

I was wrong.

There was no single disaster. No dramatic outage at 2 AM. What happened was quieter and more insidious: a thousand small paper cuts.

The first cut came when I asked the agent to write a simple utility function. A dozen lines, nothing clever. It returned a fully abstracted factory pattern with interfaces, an abstract base class, and a configuration provider. Correct. Completely unnecessary. I had written "prefer the factory pattern for object creation" in my AGENTS.md months earlier, thinking about a payment processing module I was working on. The agent applied it to a string formatter.

The database migration guide was worse. I had written it for SQLite. Months later, the project moved to a different database. I updated the tech stack section at the top of the file but forgot the migration guide buried three hundred lines down. The agent kept suggesting SQLite-specific commands long after they stopped working. Each time, I blamed the model. I never blamed the file.

The folder structure section told the same story. I documented where to put models, services, and endpoints. The agent could already discover this by scanning the codebase. It did not need my map. But it obeyed it anyway, and sometimes my outdated map pointed to the wrong neighborhood.

These moments did not arrive all at once. They accumulated across weeks of daily work. The agent was not malfunctioning. It was following the constitution I had written. I was the bottleneck I had been complaining about.

Then a study landed that changed how I thought about all of this. Researchers tested multiple coding agents across a range of real-world tasks, comparing performance with no context file, LLM-generated context files, and developer-written ones. LLM-generated context files reduced task success rates while increasing inference costs significantly. Human-written files fared slightly better but still drove up costs. The researchers concluded that unnecessary requirements from context files make tasks harder and recommended that human-written context files describe only minimal requirements.

The study identified three failure modes, all of which I recognized in my own work. First, over-generalization. The agent treats AGENTS.md as a constitution to be obeyed, not a collection of suggestions. "Always apply the factory pattern" does not mean "apply it where it makes sense." It means always. The agent lacks the judgment to know when to ignore a rule.
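One way to catch this during an audit is a crude lint that surfaces absolute wording for a human to review. The sketch below is hypothetical: the `flag_absolute_rules` function and its word list are illustrative assumptions, not part of any agent tool. It only catches explicit words like "always" or "never". A soft "prefer", like the one that produced my factory-wrapped string formatter, slips straight past it, so this supplements reading the file rather than replacing it.

```python
import re

# Hypothetical word list: explicit absolutes that deserve a second look.
ABSOLUTE_WORDS = re.compile(r"\b(always|never|must|every)\b", re.IGNORECASE)


def flag_absolute_rules(agents_md: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose wording is absolute."""
    hits = []
    for number, line in enumerate(agents_md.splitlines(), start=1):
        if ABSOLUTE_WORDS.search(line):
            hits.append((number, line.strip()))
    return hits


sample = """# Conventions
- Always apply the factory pattern for object creation.
- Use pytest for tests.
- Never commit directly to main.
"""

# Each hit is a prompt for the audit question: does this really apply to nearly every task?
for number, line in flag_absolute_rules(sample):
    print(f"line {number}: {line}")
```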

Second, redundancy. Modern coding agents can scan your codebase and discover your conventions on their own. They can see how you organize folders, what patterns you use, where you put tests. When your AGENTS.md describes the same things the code already shows, you now have two information sources that can drift apart. The outdated one in your file competes with the current one the agent discovers. Conflicting instructions produce confused output.

Third, staleness. AGENTS.md is documentation, and documentation rots. You change your tech stack. Your conventions evolve. Your team grows and shifts practices. The code updates. The file often does not. Every line that goes stale is a line that costs tokens without delivering value. Worse, it can actively mislead the agent into doing work that no longer applies.
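Staleness is the easiest failure mode to check mechanically, at least for file paths. The sketch below assumes your AGENTS.md wraps paths in backticks, and `find_stale_paths` is a hypothetical helper, not a feature of any agent. It reports backtick-quoted, path-like strings that no longer exist in the repository. It cannot tell whether a command still works, only whether a referenced path still exists.

```python
import re
import tempfile
from pathlib import Path

# Matches inline-code spans with no whitespace, e.g. `src/models/` or `db/migrate.py`.
BACKTICK_SPAN = re.compile(r"`([^`\s]+)`")


def find_stale_paths(agents_md: str, repo_root: Path) -> list[str]:
    """Return backtick-quoted, path-like strings that do not exist under repo_root."""
    stale = []
    for candidate in BACKTICK_SPAN.findall(agents_md):
        # Heuristic: treat spans containing a slash as paths, but skip URLs.
        if "/" not in candidate or "://" in candidate:
            continue
        # Strip a leading slash so the path resolves relative to the repo root.
        if not (repo_root / candidate.lstrip("/")).exists():
            stale.append(candidate)
    return stale


# Demo against a throwaway directory that contains only src/models/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "models").mkdir(parents=True)
    doc = "Models live in `src/models/`. Migrations live in `db/sqlite_migrations/`. Run `make test`."
    print(find_stale_paths(doc, root))  # ['db/sqlite_migrations/']
```

Running something like this before each audit turns "the file often does not update" into a short list of lines to fix or delete.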

The practical question is what to do about it. I do not delete my AGENTS.md anymore. I audit it. For every section, I ask three questions.

Does this apply to nearly every task? If the instruction matters only five percent of the time, it does not belong in a file that gets loaded into every single context window. Move it somewhere scoped.

Can the agent figure this out on its own? Folder structures, coding patterns visible in the codebase, anything the agent can discover by scanning the repository is redundant. Delete it. Keep only what the agent cannot infer: custom build commands and non-obvious conventions the code does not surface.

Is this specific enough to live elsewhere? Conditional instructions that apply only to certain file types or scenarios belong in scoped rules or agent skills, not the global context file. Rules can be targeted to specific folders or file patterns. Skills can be invoked on demand. Keep the global file genuinely global.
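To make the scoping idea concrete, here is a toy model of rules targeted by file pattern. Everything in it is an assumption for illustration: the `SCOPED_RULES` table, the `rules_for` helper, and the rule text are hypothetical, and it does not reproduce any particular tool's rule format. The point is that a string formatter under `src/utils/` never sees the payment module's factory rule, while the global list stays short.

```python
from fnmatch import fnmatch

# Hypothetical global rules: loaded for every task, so kept deliberately short.
GLOBAL_RULES = ["Run the custom build script before tests."]

# Hypothetical scoped rules: each applies only to paths matching its globs.
SCOPED_RULES = {
    "payments-factory": {
        "globs": ["src/payments/*"],
        "text": "Use the factory pattern for payment processor objects.",
    },
    "migrations": {
        "globs": ["db/migrations/*.sql"],
        "text": "Write migrations for the current database, not SQLite.",
    },
}


def rules_for(path: str) -> list[str]:
    """Return the global rules plus only the scoped rules whose globs match path."""
    selected = list(GLOBAL_RULES)
    for rule in SCOPED_RULES.values():
        if any(fnmatch(path, pattern) for pattern in rule["globs"]):
            selected.append(rule["text"])
    return selected


print(rules_for("src/utils/strings.py"))  # ['Run the custom build script before tests.']
print(rules_for("src/payments/processor.py"))
```

Note that `fnmatch` lets `*` match `/` as well, so `src/payments/*` also covers nested files; a real tool may use stricter glob semantics.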

A word of caution about agent skills. Self-discovery is still unreliable. Agents ignore skills sometimes. Performance can suffer when you add too many. You cannot assume a skill will fire when needed. You still need to be the steward. Notice when a step gets skipped. Trigger the skill manually. The human is not out of the loop yet.

The deeper lesson is not about AGENTS.md at all. It is about certainty. I spent months recommending something I had never tested, repeating advice from others who had also never tested it. We were all operating on vibes and passing them off as best practice. The study did not just change how I write context files. It changed how I hold my own opinions.

Trust your experience. It is real and it matters. But verify it. Especially when you are telling other people what to do.
