Is prompt engineering more leveraged than the harness, long-term?

As base models keep getting better at following short, loose prompts, does the long-term leverage shift back to prompt-craft, or does it stay with the harness around the agent (typed adapters, file-driven planning, adversarial review, test scaffolds)? For code I expect to maintain in a year or three, where’s the compounding return?

The tension

Prompt techniques are getting cheaper, stronger base models need less prompting. Harness work stays expensive to build and to keep current. If model capability keeps climbing fast enough, a lot of scaffolding could look like over-engineering in three years. On the other hand, prompts go stale every time a model changes; typed adapters, file-driven phases, and adversarial review loops don’t.

My current lean

Harness over prompt. The decisions and playbooks already on this site embody that lean:

Why I keep LLMs behind typed adapters, structural, outlives prompt drift.
A file-driven planning framework for AI-assisted coding, each phase reads and writes a file, crash-resumable.
Adversarial review via a second AI model, a second model with different blindspots, priced in as a step.

The bet is that harness pays off more on maintenance than prompt-craft does, because maintenance is when the model, the codebase, and the team have all changed.

What would change my mind

Experienced practitioners shipping and maintaining long-running agent systems, Karpathy and the small group of people doing this work in public, landing on “the harness was over-engineering; prompt discipline would have carried it.” If the consensus among that group shifts toward prompt-first for long-lived code, revisit.

Status

Open. Not holding the lean tightly.

Knowledge graph

Is prompt engineering more leveraged than the harness, long-term?

The tension

My current lean

What would change my mind

Status

Talk to me