
Stop Waiting for Better Models

“You are not allowed to ask for better people. The right system makes the wrong people do the right thing.” — maybe Russ Roberts on EconTalk¹

If you follow any discussion about AI coding agents, you’ll notice a strange disconnect. Critics point to real weaknesses: accumulation of dead code, bad architecture, and so on. Proponents claim they’re 10x more productive and can’t imagine going back. I think both groups are telling the truth, but they’re talking about fundamentally different things.

It seems that critics usually describe the experience of pointing a vanilla agent at a codebase and asking it to do something. The proponents have typically spent months refining their tooling, writing custom skills, building feedback loops, and evolving their workflow. Comparing these two experiences is like comparing someone who writes code in Notepad with no tests and no version control to someone working with CI, code review, and a testing culture. Of course the outcomes are different.

This is where the quote at the top comes in. Nothing stops you from building a system around an agent’s weaknesses. The work is in identifying them and finding a way to address them.

Addressing some common complaints

I’ve been working with Claude Code on a production codebase for several months. I’ve also adopted the BMAD method, which defines workflows and the agent personas that support them. One of my favorite features of it is retrospectives with a group of agents. These are remarkably effective and lead to continuous improvement not only in how the agent and I work together but also in the work the agent does by itself.

One of our recent retrospectives (I say “our” but I was really the only human in it) led to the creation of a new /code-health skill. It describes the process and tools the agent uses to address the following common issues:

Agents won’t remove dead code. The code health skill runs knip to find unused exports and tsc --noUnusedLocals to catch dead variables and imports. The agent isn’t reluctant to delete code — it just doesn’t know what’s dead without tooling. Neither do humans, which is why IDEs grey things out.
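Wiring these checks up can be as simple as a couple of npm scripts the skill tells the agent to run. A hypothetical package.json sketch (the script names are my invention; knip picks up its own config file):

```json
{
  "scripts": {
    "dead-exports": "knip",
    "dead-locals": "tsc --noEmit --noUnusedLocals --noUnusedParameters"
  }
}
```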

Agents miss cross-file refactors. The skill runs jscpd across the entire codebase to find duplication clusters. This gives the agent a map of where similar code lives, independent of its context window. It then tackles clusters one at a time with test verification between each pass. In one case, jscpd flagged auth middleware duplication across 21 call sites, and the agent extracted a shared procedure that made org-scoping structural.
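jscpd is configured per project, and the settings matter: too low a token threshold floods the agent with noise. A minimal .jscpd.json sketch (the thresholds here are illustrative, not the ones from my setup):

```json
{
  "minTokens": 50,
  "reporters": ["console", "json"],
  "ignore": ["**/node_modules/**", "**/dist/**"],
  "absolute": true
}
```

The json reporter is the useful one for an agent: it yields a machine-readable map of duplication clusters that survives across context windows.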

Agents don’t notice complexity creep. The skill runs lizard for cyclomatic complexity analysis. Functions above the threshold get flagged. The retro process added nuance: high complexity from JSX ternaries is usually fine, but high complexity from state management logic is a real problem.
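That triage nuance can be captured as a small filter over lizard’s report. A minimal sketch in Python (the data shape and the jsx heuristic are my own simplification, not the actual skill):

```python
# Each entry mimics one row of lizard's report: function name, its
# cyclomatic complexity (CCN), and whether it is a JSX render function.
functions = [
    {"name": "OrderTable.render", "ccn": 14, "jsx": True},   # ternary-heavy JSX
    {"name": "syncCartState",     "ccn": 13, "jsx": False},  # state management
    {"name": "formatPrice",       "ccn": 3,  "jsx": False},
]

CCN_THRESHOLD = 10

def needs_attention(fn):
    """Flag high-complexity functions, but tolerate JSX render noise."""
    if fn["ccn"] <= CCN_THRESHOLD:
        return False
    # High CCN from JSX ternaries is usually fine; from logic it is not.
    return not fn["jsx"]

flagged = [fn["name"] for fn in functions if needs_attention(fn)]
print(flagged)  # only the state-management function is flagged
```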

We modified the BMAD workflows to run the skill as one of the last steps when finalizing work on a ticket. Per a recurring action item assigned to me at that same retro, I am also running the skill against the entire codebase at least once a week until we decide in the retro that we are happy with the baseline.

Results (so far): Duplication density peaked at 4.7 clones/kLOC during rapid feature development in mid-March. After the code health skill was introduced, it dropped to 2.9 — the codebase grew 22% while absolute clone count went down from 116 to 88. Average function complexity held flat at 2.6 the entire time, despite the codebase more than doubling.
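Those density numbers hang together, which is easy to sanity-check (the kLOC figures below are derived from the stated clone counts and densities, not measured directly):

```python
clones_before, density_before = 116, 4.7   # clones, clones per kLOC (mid-March peak)
clones_after = 88
growth = 1.22                              # codebase grew 22%

kloc_before = clones_before / density_before      # ~24.7 kLOC (derived)
kloc_after = kloc_before * growth                 # ~30.1 kLOC
density_after = clones_after / kloc_after

print(round(density_after, 1))  # matches the reported 2.9 clones/kLOC
```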

The real measure will be escaped bugs and velocity changes, but it’s too early to give meaningful numbers on those.

It’s about the system

The above improvement to the process I use with my coding agents is just one of many examples I could give (running adversarial reviews with both Codex and OpenHands + Kimi 2 Thinking would be another standout). The key point, though, isn’t the exact improvements but the acknowledgment that you can overcome “skill issues” with process. This is what I loved about eXtreme Programming for human teams, and what I love about whatever you want to call what I am doing with AI agents now. The parallel is that, once again, retros are key (could we make them faster and more automatic with agents, though?), along with a willingness to adjust your process and tools.

So in summary: Don’t ask for better models! Build the system that makes the wrong model do the right thing!

  1. I remember this quote from Econtalk but haven’t been able to track down the exact episode. I doubt I made it up though 😂. The closest attributed version is Deming’s “A bad system will beat a good person every time.” If you know the episode, please let me know. 

All rights reserved by the author.