AI · Agents · Developer Tools · Workflow Design

The Coding Agent Needs a Harness

Anthropic's large-codebase Claude Code guide makes a familiar point concrete: agents need context, tools, maintenance, and ownership around them before they can do serious work.

Max Kelly · May 16, 2026 · 9 min read

Anthropic's recent guide to using Claude Code in large codebases is one of the clearest enterprise versions of an argument I keep coming back to:

Agents are only as useful as the system around them.

In "What is an AI Agent, Anyways?", I used a simple five-part frame: model, prompt, context, tools, and harness.

That last piece is the one people skip. An agent is not the model by itself. It is the model operating inside a system of instructions, context, tools, and guardrails. The harness is the thing tying that together.

The Anthropic piece takes that same idea and makes it concrete for software teams.

That matters because most AI coding advice still treats the model as the main event. Use a better model. Write a better prompt. Try a different editor. Add more context.

That can work for small tasks.

It starts to break down once the codebase gets real.

Large codebases have old decisions, hidden conventions, duplicated names, odd build commands, local ownership boundaries, generated files, and tests that only make sense from a particular directory.

A coding agent can be strong and still waste most of its time looking in the wrong places.

The better question is:

What does the agent need around it to work like a useful engineer?

That is where the harness stops being an abstract idea and becomes engineering infrastructure.

The harness is the operating layer

Anthropic's large-codebase guide makes a useful point: Claude Code does not depend on a stale central index. It navigates more like an engineer does, using the live filesystem, command-line tools, file reads, references, and search.

That is a good default.

It also means the setup around the repo matters a lot.

If the agent starts from the wrong level of the tree, it burns context finding the relevant subsystem. If every convention is crammed into one giant rules file, it carries noise through every session. If the test command is generic, it may run the wrong suite or drown itself in output.

The fix is not a bigger instruction file.

It is better harness design.

A small root CLAUDE.md, or equivalent rules file, should explain the shape of the repo, the main commands, and the gotchas that apply everywhere. Subdirectory rules should handle local conventions. A service with its own test command, deployment behavior, or naming pattern should say that close to the code.
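To make the layering concrete, here is a minimal Python sketch that works out which rules files apply to a given file: the root file first, then each subdirectory file on the way down. The repo layout and the function name applicable_rules are hypothetical. This only models the idea; it is not a description of how Claude Code actually discovers or loads CLAUDE.md files.

```python
from pathlib import PurePosixPath

# Hypothetical: rules files that exist in a repo, as repo-relative paths.
RULES_FILES = {
    "CLAUDE.md",
    "services/billing/CLAUDE.md",
    "services/billing/api/CLAUDE.md",
    "web/CLAUDE.md",
}

def applicable_rules(file_path: str, rules_files: set[str] = RULES_FILES) -> list[str]:
    """Return the rules files that apply to file_path, broadest (root) first."""
    parent = PurePosixPath(file_path).parent
    # For a relative path, .parents ends with PurePosixPath("."), the repo root.
    dirs = list(reversed([parent, *parent.parents]))
    found = []
    for d in dirs:
        candidate = "CLAUDE.md" if str(d) == "." else f"{d}/CLAUDE.md"
        if candidate in rules_files:
            found.append(candidate)
    return found
```

Editing services/billing/api/routes.py picks up the root file plus the billing and API files. A change under web/ never sees the billing conventions at all.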

This is the same principle that makes good documentation work.

Put the durable, broad information at the top. Put the specific information where it is used. Keep the rest out of the way.

Rules and skills do different jobs

One distinction I liked in the article was between rules and skills.

Rules are conventions.

Skills are workflows.

That sounds minor, but it prevents a lot of messy agent setup.

A rule might say that API routes live in a specific folder, that a billing module uses cents rather than dollars, or that generated files should not be edited. The agent should carry that as background when it works in that area.

A skill is different. It is a repeatable process for a task type: add an endpoint, review a migration, update documentation, create a release note, audit auth logic, or investigate a flaky test.

Those instructions do not need to be loaded all the time.

They need to appear when the task calls for them.

This is one of the places where AI work starts to look less like prompt writing and more like product operations. You are not trying to create one perfect mega prompt. You are deciding which pieces of expertise should be available, when they should appear, and how they should stay current.

That is the same pattern I see in non-coding workflows too. Context files, reusable playbooks, review points, and tool access are not optional extras once the work has to repeat. They are the structure that makes the agent usable.

The mistake is loading all of that expertise into the session by default.

Rules should be durable background. Skills should be called when the task needs them.
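Here is a sketch of that split, assuming rules are keyed by directory and skills by task type. The Rule and Skill types, the example entries, and build_session_context are all hypothetical. The point is only that rules load based on where the agent is working, while a skill's steps load only when a task names it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    path_prefix: str  # directory whose files this convention covers
    text: str

@dataclass
class Skill:
    task_type: str  # kind of task that should pull this workflow in
    steps: list[str]

# Hypothetical examples based on the conventions and task types above.
RULES = [
    Rule("services/billing/", "Amounts are integers in cents, not dollars."),
    Rule("gen/", "Generated files: do not edit by hand."),
]
SKILLS = {
    "add_endpoint": Skill("add_endpoint", [
        "Add the route under the service's routes folder.",
        "Add a handler test next to the route.",
        "Run the service's own test command.",
    ]),
    "review_migration": Skill("review_migration", [
        "Check that the migration can be reversed.",
        "Check for long locks on large tables.",
    ]),
}

def build_session_context(touched_paths: list[str], task_type: Optional[str] = None) -> list[str]:
    """Rules load by location; a skill's steps load only when the task names it."""
    context = [rule.text for rule in RULES
               if any(path.startswith(rule.path_prefix) for path in touched_paths)]
    skill = SKILLS.get(task_type) if task_type else None
    if skill is not None:
        context.extend(skill.steps)
    return context
```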

Search needs to become more precise

Basic text search is useful.

It is also blunt.

In a small repo, grep or ripgrep (rg) gets you most of the way there. In a large repo, a common function name can return hundreds of matches. The agent then spends context reading through files to figure out which reference is real.

That is where language-server-backed tools become interesting.

Developers already rely on this every day. Go to definition. Find references. Rename symbol. Show type errors. The IDE has a structural understanding of the code that plain string search does not.

Giving that same capability to the agent changes the quality of exploration.

It can ask for the definition of the symbol it is actually editing. It can find references to that exact function rather than every text match with the same name. It can separate two identical names in different languages or modules.
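Python's standard ast module is not a language server, but it is enough to show the difference in kind. In this sketch, the sample source and both helper names are made up for illustration. A plain text scan for total hits the definition, a comment, a string literal, and the call. The syntax tree reports only the definition and the actual call site. A real language server goes further and resolves references across files and modules.

```python
import ast

SOURCE = '''
def total(items):
    return sum(items)

# total is recomputed on every request
LABEL = "total"

def report(items):
    return total(items)
'''

def text_matches(source: str, name: str) -> list[int]:
    """Line numbers where the name appears anywhere as text, like a plain grep."""
    return [i for i, line in enumerate(source.splitlines(), start=1) if name in line]

def structural_matches(source: str, name: str) -> dict[str, list[int]]:
    """Definition and direct call sites of a function named `name`, from the syntax tree."""
    tree = ast.parse(source)
    defs, calls = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            defs.append(node.lineno)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == name:
            calls.append(node.lineno)
    return {"definitions": sorted(defs), "calls": sorted(calls)}
```

On SOURCE, text_matches returns four lines while structural_matches returns one definition and one call.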

For small teams, this may feel like extra machinery.

For a large codebase, it is basic navigation.

Hooks are maintenance too

Most people think about hooks as a safety layer.

Stop the agent from editing a directory. Run formatting. Block dangerous commands.

That is useful.

Some of it should be deterministic. Formatting, linting, and deny rules should not depend on the model remembering an instruction.
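As an illustration of a deterministic deny rule, here is a sketch of a pre-tool-use check in Python. It assumes the hook receives a JSON payload with tool_name and tool_input fields on stdin, and that exiting with code 2 blocks the call. That matches my reading of Claude Code's hooks documentation, but check the current docs before relying on it. The protected paths and blocked commands are hypothetical.

```python
import json
import sys
from typing import TextIO

# Hypothetical repo policy: paths the agent must not edit, commands it must not run.
PROTECTED_PREFIXES = ("gen/", "vendor/")
BLOCKED_COMMAND_FRAGMENTS = ("rm -rf", "git push --force")

def check_tool_call(payload: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for one proposed tool call. Pure, so it is easy to test."""
    tool = payload.get("tool_name", "")
    args = payload.get("tool_input", {})
    if tool in ("Edit", "Write"):
        path = args.get("file_path", "")
        for prefix in PROTECTED_PREFIXES:
            # Match repo-relative paths and absolute paths containing the directory.
            if path.startswith(prefix) or f"/{prefix}" in path:
                return False, f"{path} is generated or vendored; edit its source instead."
    if tool == "Bash":
        command = args.get("command", "")
        for fragment in BLOCKED_COMMAND_FRAGMENTS:
            if fragment in command:
                return False, f"Command contains a blocked fragment: {fragment!r}"
    return True, ""

def run_hook(stream: TextIO) -> int:
    """Assumed hook contract: JSON payload on stdin; exit code 2 blocks, stderr explains why."""
    allowed, reason = check_tool_call(json.load(stream))
    if not allowed:
        print(reason, file=sys.stderr)
        return 2
    return 0

# As a hook script, the entry point would be: sys.exit(run_hook(sys.stdin))
```

Keeping the policy in a pure function means it can be tested without running the agent. The model never has to remember it.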

But the more interesting use is maintenance.

A start hook can load fresh context into a session: current Git state, recent commits, the owning team, relevant docs, or the ticket being worked on.
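A start hook along those lines might look like the sketch below. It gathers Git state with standard git commands and formats it as plain text. The owner and docs inputs are hypothetical; they might come from a CODEOWNERS lookup or the ticket. How a hook's output actually reaches the session depends on the tool, so treat that wiring as an assumption to verify against your tool's hook documentation.

```python
import subprocess

def git_output(*args: str) -> str:
    """Run a git command in the current directory; return '' if git fails or is missing."""
    try:
        result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()

def build_start_context(status: str, recent_log: str, owner: str, docs: list[str]) -> str:
    """Format fresh session context as plain text for the agent to read at session start."""
    sections = [
        f"Owning team: {owner}",
        "Uncommitted changes:\n" + (status or "(clean)"),
        "Recent commits:\n" + (recent_log or "(none)"),
    ]
    if docs:
        sections.append("Relevant docs: " + ", ".join(docs))
    return "\n\n".join(sections)

def start_hook(owner: str, docs: list[str]) -> str:
    """Collect live Git state; owner and docs are hypothetical inputs supplied by the caller."""
    return build_start_context(
        git_output("status", "--short"),
        git_output("log", "--oneline", "-n", "5"),
        owner,
        docs,
    )
```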

A stop hook can inspect what changed and propose updates to CLAUDE.md or local rules while the session is still fresh.

That matters because repo guidance goes stale quietly.

The code changes. The conventions drift. The model gets better. The old workaround becomes drag.

If nobody maintains the agent layer, it slowly turns into another pile of outdated documentation. Hooks can make that maintenance visible. They do not need to auto-merge guidance changes. Even a generated review note is useful if it catches that a local convention changed and the repo instructions did not.
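One way to generate that review note, sketched in Python, is to compare the files a session changed against the local rules files. The changed files could come from git diff --name-only. The sketch then flags any directory where code moved but its rules file did not. The function name and the skip-the-root heuristic are my assumptions. The output is a note for a human, not an automatic edit.

```python
from pathlib import PurePosixPath

def stale_rules_notes(changed_files: list[str], rules_files: set[str]) -> list[str]:
    """Review notes for local rules files whose directory changed while the file itself did not."""
    changed = set(changed_files)
    notes = []
    for rules in sorted(rules_files):
        rules_dir = PurePosixPath(rules).parent
        if rules_dir == PurePosixPath("."):
            continue  # skip the root file: nearly every change falls under it
        touched = [f for f in changed
                   if f != rules and PurePosixPath(f).is_relative_to(rules_dir)]
        if touched and rules not in changed:
            notes.append(f"{rules}: {len(touched)} changed file(s) under {rules_dir}/ "
                         "but no update to the rules file; check its conventions.")
    return notes
```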

This is especially important because the model is not the only thing changing.

The harness itself has to evolve.

Exploration should not bloat the editing session

Subagents are useful because exploration is messy.

A good investigation can touch a lot of files. It may involve reading old modules, searching issue history, checking docs, comparing patterns, and ruling out bad paths.

That work is valuable.

It is also expensive context.

For serious tasks, it often makes sense to split the job:

Send one agent to map the backend.
Send another to inspect the frontend path.
Send another to check database or auth assumptions.
Bring back concise findings.
Let the main session edit from the summary.
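Here is the shape of that split as a sketch. The subagent is replaced by a hypothetical explore function that returns long, messy notes. What matters is the boundary: the raw notes stay with each investigation, and only a short conclusion per area reaches the editing session. The CONCLUSION marker and the character cap are illustrative assumptions, not how any particular agent reports back.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    area: str
    summary: str

def explore(area: str) -> str:
    """Hypothetical stand-in for a subagent: long investigation notes ending in a conclusion."""
    steps = " ".join(f"read file {i};" for i in range(50))
    return f"[{area}] {steps} CONCLUSION: {area} path looks safe to change."

def summarize(area: str, notes: str, max_chars: int = 120) -> Finding:
    """Keep only the conclusion, capped at max_chars, as the finding for this area."""
    conclusion = notes.split("CONCLUSION:", 1)[-1].strip()
    return Finding(area, conclusion[:max_chars])

def brief_for_editing_session(areas: list[str]) -> str:
    """Run each investigation separately and return only the concise findings."""
    findings = [summarize(area, explore(area)) for area in areas]
    # The editing session receives these few lines, not the raw exploration notes.
    return "\n".join(f"- {f.area}: {f.summary}" for f in findings)
```

The investigations run one after another here on purpose. Running them in parallel would not change the point, which is what crosses into the editing session.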

The point is not parallelism for its own sake.

The point is keeping the editing session clean enough to make good decisions.

This is very similar to how a strong engineer works. They investigate, take notes, narrow the problem, and then make the change with the right context in view.

The org layer is the part people skip

The strongest part of Anthropic's advice is also the least glamorous: someone has to own this.

If every developer builds their own agent setup, the organization gets fragmentation. Different rules. Different skills. Different permissions. Different assumptions about review, testing, and what the agent is allowed to touch.

That can feel productive at first because everyone is experimenting.

Then it becomes hard to scale.

The better pattern is a small ownership group. Developer experience, platform engineering, or one clear directly responsible individual (DRI) can own the shared rules, skills, plugin setup, MCP servers, permissions, and rollout path.

This is where the article's plugin point matters.

If a team figures out a good setup, it should not stay trapped in one person's dotfiles. Package the working rules, skills, hooks, and MCP configuration so the next person starts from the same baseline.

That does not mean every team loses control.

It means there is a standard place for the shared layer, and a clear process for extending it.
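Here is a small sketch of what "a standard place plus a clear process for extending it" could mean in code. The AgentSetup type and the merge policy are hypothetical and are not the format of Claude Code plugins. Teams can add rules and skills on top of the shared baseline. They may not silently replace a shared skill, and they can add to the deny list but not remove from it.

```python
from dataclasses import dataclass, field

@dataclass
class AgentSetup:
    rules: list[str] = field(default_factory=list)
    skills: dict[str, str] = field(default_factory=dict)  # skill name -> description
    denied_commands: set[str] = field(default_factory=set)

def extend(baseline: AgentSetup, team: AgentSetup) -> AgentSetup:
    """Layer a team's additions on the shared baseline without weakening it."""
    shadowed = baseline.skills.keys() & team.skills.keys()
    if shadowed:
        # Changing a shared skill goes through the owners of the shared layer.
        raise ValueError(f"team skills would replace shared skills: {sorted(shadowed)}")
    return AgentSetup(
        rules=baseline.rules + team.rules,
        skills={**baseline.skills, **team.skills},
        # Union: a team can deny more commands but cannot re-allow a shared denial.
        denied_commands=baseline.denied_commands | team.denied_commands,
    )
```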

This is the part that makes AI coding adoption feel less like a pile of individual tricks and more like infrastructure.

Start with the boring pieces

If I were setting this up for a team, I would not start with the most advanced part.

I would start with five boring pieces:

A lean root rules file.
Local rules for the most active subsystems.
Exact test and lint commands for each subsystem.
Ignore rules for generated files and noisy directories.
A small set of repeatable skills for common work.
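The third and fourth items are easy to encode. Here is a hypothetical sketch that maps changed files to the exact test and lint commands for their subsystem and skips generated or noisy paths. The subsystem prefixes, commands, and ignore patterns are placeholders for whatever your repo actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical subsystem map: path prefix -> exact test and lint commands.
SUBSYSTEMS = {
    "services/billing/": {"test": "pytest services/billing -q", "lint": "ruff check services/billing"},
    "web/": {"test": "npm --prefix web test", "lint": "npm --prefix web run lint"},
}
# Hypothetical ignore patterns for generated and noisy paths.
IGNORED = ["*/generated/*", "*.min.js", "*/__snapshots__/*"]

def commands_for(changed_files: list[str]) -> list[str]:
    """Exact commands for the subsystems a change touches, skipping ignored files."""
    commands = []
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in IGNORED):
            continue
        for prefix, cmds in SUBSYSTEMS.items():
            if path.startswith(prefix):
                for cmd in (cmds["test"], cmds["lint"]):
                    if cmd not in commands:
                        commands.append(cmd)
    return commands
```

A change under services/billing/ runs only the billing suite and linter. An edit that only touches web/generated/ runs nothing.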

Then I would add sharper navigation through LSP or MCP where the repo actually needs it.

Then I would add hooks for context loading and rule maintenance.

Then I would package what works and formalize ownership.

The ordering matters because the harness should grow from real friction. A repo with three services does not need the same setup as a million-line monorepo. A team with two developers does not need the same rollout process as an enterprise engineering org.

But the direction is the same.

The coding agent is not a magic layer floating above the codebase.

It is another participant in the engineering system.

If the system is legible, scoped, testable, and maintained, the agent can do much more useful work.

If the system is vague, stale, and tribal, the agent inherits that confusion.

That is the practical lesson.

The model matters.

The harness around it matters more than most teams have admitted.
