Claude Code Is Not the Problem. Your Harness Is.
A working Claude Code setup: CLAUDE.md, persistent memory, subagents for grep work, MCP servers, hooks, and the workflow rules that actually save tokens.
Most people I see complaining about Claude Code “going dumb” or burning through quota are not using a broken model.
They are using a broken setup.
No CLAUDE.md. No memory. No tools. They dump 40 files into one context window, watch it hallucinate, and conclude the model is the problem.
After running Claude Code daily for the last several months, I have landed on a setup that mostly works. The framing I keep coming back to is this. The harness is the variable. Below is what I actually use, the rules that hold up under daily pressure, and the parts I would skip.
Context discipline cuts token usage
Anthropic’s own Claude Code docs say context is your fundamental constraint. Every token you spend on noise is a token you do not have for the actual problem.
Three things bring this under control.
A CLAUDE.md at the repo root. Stack overview, ownership matrix, hard rules. Things like “run tsc —noEmit after every edit”, “max 50 lines per bugfix”, “one fix per commit”, “do not touch auth or payments without approval”. Claude reads it at the start of every session. You stop answering the same questions on every chat.
Persistent memory at ~/.claude/projects/<project>/memory/. Typed markdown files prefixed with user_, feedback_, project_, reference_, with a one-line index in MEMORY.md. I do not use this for everything. I use it for facts that survive across sessions, things like deployment quirks, naming conventions, and prior decisions I keep wanting to look up. Code patterns and file paths do not belong here. Those live in the code.
Subagents for grep work. This is the biggest single token win I have measured. Spawning an Explore or general-purpose subagent to do file digging keeps the noise out of your main context. The subagent burns its own tokens, returns a summary, and disappears. My main window stays clean.
The caveat from Anthropic’s docs is real. Subagents have startup overhead, so you do not use them for small one-shot lookups. Use them when the search is open-ended or when the result will be a thrown-away pile of file reads.
Workflow rules that hold up
A few rules I keep because they actually change behavior.
Auto-retros after each non-trivial session, saved to docs/retros/YYYY-MM-DD-topic.md. The next session loads the latest retro at start. You get continuity without the re-briefing tax.
Verification before completion. Claude does not get to say “done” or “fixed” until it has run the verify command and shown the output. This kills hallucinated success. It is also the rule I had to enforce most aggressively at first because the model defaults to optimism.
Atomic commits, one fix per commit, hard line limits. Forces Claude to scope its work and gives me clean rollback. Without this rule, the model will quietly bundle three things into a single commit and you will not notice until something breaks.
For architecture decisions or anything touching security or migrations, I spawn Gemini Pro, Flash, and Sonnet in parallel and synthesize the answers. Three independent reads beat one confident monologue. This has caught things a single model missed more than once.
MCP servers I keep installed
The ones I would not work without:
supabase for SQL, migrations, and schema reads from chat. Saves me bouncing to a separate console.
github for PRs, diffs, issues, and file reads. The official server. One connection covers most of what I used to leave the terminal for.
playwright and chrome-devtools-mcp so Claude can browse the deployed site, take screenshots, and run JS in the page. It QAs its own work instead of asking me to “please verify”.
context7 for current library docs instead of stale training data. This is the single MCP that has reduced my “Claude wrote against the wrong API version” bugs to almost zero.
sentry for production error triage from chat.
firecrawl for on-demand scraping and a gemini MCP to power the multi-model panel above. I use both occasionally, less than the others.
One thing the post is right about: 10+ MCP servers slows startup and adds token overhead. Anthropic’s best-practices doc warns about the same thing. Install what you actually use, remove what you do not.
Hooks do the janitor work
PreToolUse, PostToolUse, SessionStart, PreCompact, Stop. These let you run shell commands in response to specific events. Auto-save memory on Stop. Auto-run typecheck on PostToolUse for an edit. Sync state before compaction.
I use a small number of hooks. The point is not to automate everything. The point is to stop doing the same three or four things by hand at the start and end of every session.
What I would skip
A lot of “ultimate Claude Code setup” lists recommend graphify for clustering large repos into HTML and JSON knowledge graphs, and claude-flow for swarm orchestration with hooks, memory coordination, SPARC, and TDD pipelines.
I have tried both. For the kind of work I do, they were more setup than payoff. If you maintain a 200-file monorepo with multiple teams touching it, the calculus might be different. For a solo or small-team codebase, plain CLAUDE.md plus subagents already covers most of the win.
More tools is not better. The reason a tight setup works is because the model has fewer distractions, not because every component on a list is essential.
Where to start
Pick three things in this order.
Write a CLAUDE.md. Put your stack, your rules, and your “do not touch” list in it. Aim for 50 to 150 lines. If it grows past that, you are putting code-pattern detail in there that belongs in the code.
Add the verification-before-completion rule. One sentence in CLAUDE.md. Claude must run the verify command and paste output before claiming a fix works.
Use subagents for any task that starts with “find all the places where…” or “check whether the codebase does…”. Those are the open-ended searches that destroy your context if you do them in the main thread.
The model is not the problem. The setup around it is what decides whether Claude Code feels like a senior engineer or a confused intern.
Sources: