What Running Claude Code as an Agent Taught Me

Most of my Claude Code time used to look like this. Type a prompt. Read the diff. Accept it. Type another prompt.

That is chat. It works. It is also not the thing the tool was built for.

The shift happened when I started letting Claude run as an agent. Long-running tasks. Plan mode. Subagents doing the investigation. Hooks running verification on their own. After a few months of that, the lessons I came back to over and over were not about features. They were about discipline, and other people working this way keep landing in the same place.

Here is what running Claude Code agentically actually changed for me.

Short sessions beat long marathons

The single most useful thing I noticed is that output quality degrades long before the context window fills up. Anthropic's best-practices doc says it plainly: performance drops as context fills.

In practice it looks like this. At around 60 to 70 percent of the context window, Claude starts forgetting rules in CLAUDE.md it was following an hour ago. It bundles commits even though I told it one fix per commit. It claims a fix works without running the verify command.

The fix is /clear between tasks, not /compact. Compaction summarizes. Clearing resets. For unrelated work, you want the reset. Agent-based runs in particular eat context fast because the agent reads files, runs commands, and pulls back tool output you do not need anymore.

Rule I keep. Two unrelated tasks in one session is one too many.

Checkpoint before any autonomous run

If you are about to say "go fix all the lint errors" or "migrate this directory to TypeScript", commit first.

Atomic, named, signed if your repo cares. The reason is simple. Agent-based work compounds errors. By the time you notice the agent has gone off the rails, it has touched fifteen files. Reverting from a clean checkpoint takes one command. Fixing forward takes an hour and rarely works.

Claude Code now ships checkpointing inside the tool itself. Double-tap Escape to restore the code, the conversation, or both. Useful, but it only tracks Claude's own edits, not changes made by external processes. Real git commits still cover you for everything else.
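
A minimal sketch of the checkpoint reflex, assuming Python and a plain git repo. The function names and the "checkpoint: before ..." message format are my own illustration, not anything Claude Code provides. It builds the git commands as plain lists so they can be inspected before anything runs.

```python
import subprocess


def checkpoint_commands(label: str, sign: bool = False) -> list:
    """Git commands that snapshot the whole working tree before an agent run."""
    commit = ["git", "commit", "--allow-empty"]
    if sign:
        commit.append("-S")  # sign the commit if your repo cares
    commit += ["-m", f"checkpoint: before {label}"]
    # --allow-empty keeps a named checkpoint even when the tree is already clean.
    return [["git", "add", "--all"], commit]


def rollback_commands(checkpoint_sha: str) -> list:
    """Git commands that discard everything done since the checkpoint."""
    return [
        ["git", "reset", "--hard", checkpoint_sha],
        # Also delete untracked files the agent created; ignored files stay.
        ["git", "clean", "-fd"],
    ]


def run_all(commands: list, repo_dir: str) -> None:
    """Run each command in the repo, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


def create_checkpoint(repo_dir: str, label: str) -> str:
    """Commit the current state and return the checkpoint's commit hash."""
    run_all(checkpoint_commands(label), repo_dir)
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Typical use would be sha = create_checkpoint(".", "lint fix") before the run and run_all(rollback_commands(sha), ".") if it goes wrong. The reset is the one command that matters. The clean step is an extra I added for files the agent created, and it deletes them permanently, so drop it if you keep untracked work around.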

Rollback over fix-forward

This is the rule that took me longest to accept.

When an agent run produces something wrong, the instinct is to correct. "No, you missed this case. Fix it." That works for one round. Rarely two. Past that, the context is polluted with failed approaches and the model is now reasoning from the wrong frame.

A fresh session with a sharper prompt almost always outperforms a long session with twenty corrections. The Claude Code docs flag this directly under "common failure patterns" and call it the correcting-over-and-over trap.

When I notice I have corrected the same issue twice, I stop. /clear, write a better prompt that bakes in what I learned, and go again. Faster every time.

Plan mode is not optional for multi-file work

The agent will happily jump straight into a refactor and bundle three concerns into one commit before you can stop it.

Plan mode separates research from execution. Claude reads files and writes a plan to disk. Nothing changes until you approve. For anything touching more than two files, anything refactor-shaped, anything in auth or migrations, I use it without thinking.

The single CLAUDE.md rule that made the biggest difference. "If execution diverges from the approved plan, stop and re-enter plan mode." Without it, agents drift. With it, they ask before scope-creeping.
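
If you manage several repos, a small script can make sure the rule is present in each one. This is a sketch under my own assumptions: Python, a CLAUDE.md at a path you pass in, and the rule stored as a single bullet line. The helper name ensure_rule is hypothetical.

```python
from pathlib import Path

PLAN_RULE = "If execution diverges from the approved plan, stop and re-enter plan mode."


def ensure_rule(claude_md: Path, rule: str = PLAN_RULE) -> bool:
    """Append the rule to CLAUDE.md unless it is already there.

    Returns True when the file was changed, False when the rule already existed.
    """
    existing = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    if rule in existing:
        return False
    # Make sure the new bullet starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    claude_md.write_text(f"{existing}{separator}- {rule}\n", encoding="utf-8")
    return True
```

Calling ensure_rule(Path("CLAUDE.md")) twice changes the file only the first time, so it is safe to run from a setup script.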

Press Alt+T inside plan mode to enable extended thinking. For migrations and architecture decisions, the extra reasoning budget is worth more than the tokens it costs.

Verification is the line between an agent and an intern

The model defaults to optimism. Left alone, it will say "done" when the test it wrote does not actually run.

Three things plug this gap.

A PostToolUse hook that runs typecheck after any edit. If the typecheck fails, the hook reports the errors straight back to the agent before it moves on. Hooks cannot be ignored the way prompts can.

A CLAUDE.md rule that says "do not say done until you have run the verify command and pasted the output." Roughly 80 percent compliance from the rule alone. The hook covers the other 20.

For UI work, screenshots. A Chrome MCP server or Playwright lets the agent open the deployed page, grab a screenshot, and compare it with what was asked for. This catches the cosmetic drift that tests cannot.
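
The typecheck hook from the first item can be sketched in Python. As I understand the hook contract, the event arrives as JSON on stdin with tool_name and tool_input fields, and exit code 2 passes stderr back to the agent. Everything else here is an assumption on my part: the tsc command, the set of edit tool names, and the file suffixes.

```python
import json
import subprocess
import sys

# Assumption: the project type-checks with this command; swap in your own.
TYPECHECK_CMD = ["npx", "tsc", "--noEmit"]
# Assumption: these are the tool names that modify files.
EDIT_TOOLS = {"Edit", "MultiEdit", "Write"}
CHECKED_SUFFIXES = (".ts", ".tsx")


def should_typecheck(payload: dict) -> bool:
    """True when the hook payload describes an edit to a TypeScript file."""
    if payload.get("tool_name") not in EDIT_TOOLS:
        return False
    file_path = payload.get("tool_input", {}).get("file_path", "")
    return file_path.endswith(CHECKED_SUFFIXES)


def main() -> int:
    """Read the hook event from stdin and return the hook's exit code."""
    payload = json.load(sys.stdin)
    if not should_typecheck(payload):
        return 0
    result = subprocess.run(TYPECHECK_CMD, capture_output=True, text=True)
    if result.returncode == 0:
        return 0
    # Exit code 2 sends stderr back to the agent as feedback.
    sys.stderr.write(result.stdout + result.stderr)
    return 2
```

Saved as a script, the file would end with sys.exit(main()) and be registered as a command under the PostToolUse event in the project's Claude Code settings, with a matcher for the edit tools. Filtering on the file suffix keeps the hook from running the compiler after edits to docs or config.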

Split your models

Running everything on Opus is expensive and often slower. Running everything on Sonnet leaves quality on the table for the hard reasoning steps.

The split that works for me. Opus on the main session for planning, integration, and architecture calls. Sonnet on subagents for the focused work. The CLAUDE_CODE_SUBAGENT_MODEL env var controls the subagent model. Set once, forget it.

Subagents get their own context window. The main agent stays clean. The subagent burns its own tokens on a focused task and returns a summary. This was the single biggest token win I measured when I switched.
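
One way to pin the split is a tiny launcher, sketched here in Python. The env var name comes from above and --model is the CLI's model flag. The values "opus" and "sonnet" are assumptions: I am assuming the env var accepts the same aliases as the flag, so check what your version expects. The function names are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional

SUBAGENT_MODEL_VAR = "CLAUDE_CODE_SUBAGENT_MODEL"


def split_model_env(
    subagent_model: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Copy the environment and set the subagent model in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env[SUBAGENT_MODEL_VAR] = subagent_model
    return env


def launch_session(main_model: str = "opus", subagent_model: str = "sonnet") -> int:
    """Start an interactive session with the two models split."""
    completed = subprocess.run(
        ["claude", "--model", main_model],
        env=split_model_env(subagent_model),
    )
    return completed.returncode
```

Building the environment as a copy keeps the setting scoped to the launched session instead of leaking into the rest of the shell.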

Worktrees beat branches for parallel agents

Two agents on the same checkout will collide. They will overwrite each other's edits, fight over the same lockfile, and leave the working tree in a state neither one expected.

Git worktrees give each agent its own isolated checkout of the same repo. Same git history, separate working directory. The Claude Code desktop app supports this directly. Each session gets its own worktree. Edits do not cross.

I use worktrees for two patterns. Writer plus reviewer, where one agent implements and a second agent in a fresh context reviews without bias toward the code it just wrote. And fan-out, where I run several focused agents on independent chunks of a larger migration.
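
If you are wiring this up by hand rather than through the desktop app, the setup is one git worktree add per agent. A sketch in Python, where the agent/ branch prefix and the sibling-directory layout are naming choices of mine, not a convention from git or Claude Code:

```python
def worktree_commands(repo_name: str, task_names: list, base_ref: str = "HEAD") -> list:
    """One `git worktree add` command per agent task.

    Each task gets its own branch and its own directory next to the main checkout.
    """
    commands = []
    for task in task_names:
        branch = f"agent/{task}"
        path = f"../{repo_name}-{task}"
        # -b creates the new branch at base_ref and checks it out in the new directory.
        commands.append(["git", "worktree", "add", "-b", branch, path, base_ref])
    return commands
```

For the writer plus reviewer pattern, worktree_commands("shop", ["writer", "reviewer"]) yields two commands to run from the main checkout. Each agent then starts in its own directory, so edits and lockfiles never collide.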

Where to start

Three things, in order.

Set up CLAUDE.md and the rest of the harness, and include the rule about re-entering plan mode if execution diverges. One line. It changes more behavior than any other line you can write.

Make checkpoint commits a reflex before any autonomous task. Atomic, named, before the run starts. Reverting becomes free.

When you notice yourself correcting the same thing twice in a session, stop. /clear and start over with a sharper prompt. Treat polluted context as a dead session, not something to salvage.

Agent-based coding is a different mode of work, not a different feature. The features were already there. The discipline is what I had to build.