How I Run Claude Code
Most people use Claude Code as a smarter chat: type a prompt, get a result, repeat. That's fine, but it leaves most of the leverage on the table. My setup treats the harness itself as a programming target: I write code that shapes how Claude behaves, runs supervised agents on top of it, and offloads long-horizon work to processes that survive context resets.
What follows is the actual stack I run, not a tutorial. If you want to copy any of it, the artifacts are linked.
The unit of personal voice: global CLAUDE.md
The single highest-leverage file I maintain is ~/.claude/CLAUDE.md. It loads into every Claude Code session as user-level context and codifies things I'd otherwise have to repeat by hand:
- Surface assumptions. State assumptions before proceeding on non-trivial tasks. Don't silently fill in ambiguous requirements.
- Push back, don't yes-machine. If my approach has problems, point them out with evidence. Accept my override if I insist.
- Scope discipline. Touch only what I asked you to touch. Every changed line should trace to my request.
- Right-size the process. Trivial tasks get done immediately; medium tasks get a brief plan; large ones get plan → audit → implement → review.
- Verify before reporting done. After writing or modifying code, run the project's build/test/lint cycle.
- Grep broadly before fixing. When a bug is found, search all files for the same pattern before committing.
- Respect stop signals. When I say "enough" or "this is good", stop suggesting next steps.
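Condensed, the file is just those corrections written down as directives. A stripped-down sketch, not my verbatim file:

```markdown
# ~/.claude/CLAUDE.md (condensed sketch)

## Working style
- State your assumptions before starting any non-trivial task; don't silently fill in ambiguous requirements.
- If my approach has problems, say so with evidence. Accept my override if I insist.
- Touch only what the task requires; every changed line should trace to my request.

## Verification
- After writing or modifying code, run the project's build/test/lint cycle before reporting done.
- When you find a bug, grep all files for the same pattern before committing the fix.

## Stop signals
- When I say "enough" or "this is good", stop suggesting next steps.
```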
For the first six months of using Claude Code, I kept correcting the same patterns. CLAUDE.md is just the cumulative record of those corrections, written down once instead of typed every session. That's the leverage.
Settings that matter
```
// ~/.claude/settings.json (subset)
{
  "model": "opus",
  "effortLevel": "high",
  "env": {
    "MAX_OUTPUT_TOKENS": "64000",
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```
Three deliberate choices:
- Opus by default, not Sonnet. The cost difference is small relative to my time; the quality difference on hard tasks is large.
- High effort + high max output tokens. I want longer reasoning and longer responses available when needed. Adaptive thinking handles the actual budget per turn.
- Agent teams enabled (still experimental as of mid-2026) so I can try teammate workflows for parallel work.
I run Claude Code with --dangerously-skip-permissions for any session where I'm doing real work. The permission prompts are a friction tax that's worth paying only when I'm exploring an unknown codebase. For my own projects, I trust the harness and the validation loop to catch problems.
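In practice that's just a shell alias. The alias name here is mine; the flag is real:

```bash
# Trusted sessions: skip permission prompts entirely.
alias cc='claude --dangerously-skip-permissions'
```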
Skills as the dominant primitive
Skills are the thing I reach for first. A skill is a small directory with a SKILL.md plus optional scripts that gets loaded on demand by name. The description stays in context as a one-liner; the body loads only when the skill is invoked.
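On disk a skill is just a directory; the frontmatter name and description are what stays in context as the one-liner. A sketch (contents illustrative):

```
~/.claude/skills/slack-api/
├── SKILL.md             # loaded on demand by name
└── scripts/
    └── post-message.sh  # optional helper scripts
```

And the top of SKILL.md:

```markdown
---
name: slack-api
description: Control a Slack workspace via the Web API with curl.
---
```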
The skills I rely on most:
- agent-orchestrator: controls the always-on daemon for spawning supervised Claude agents. (See agent-orchestrator.)
- research-orchestrator: runs multi-Claude parallel research with shared memory. (See research-orchestrator.)
- vault-kb: compiles inbox items in my Obsidian vault into wiki articles, lints sections for quality.
- slack-api: full Slack workspace control via the Web API using curl. Replaces the Slack MCP entirely; smaller context footprint, fewer tokens.
- twitter-api: same idea for Twitter via TwitterAPI.io.
- infinite-brainstorm: drives a board.json canvas. (See Infinite Brainstorm.)
The pattern that keeps proving itself: prefer a skill over a heavy MCP server when both can do the job. MCPs blow up context and token usage. A skill that wraps curl is smaller, debuggable, and doesn't need a long-running server.
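To make "a skill that wraps curl" concrete, here's a hypothetical helper the slack-api skill might ship. chat.postMessage is the real Slack Web API endpoint; the script name and argument convention are illustrative:

```bash
#!/usr/bin/env bash
# post-message.sh <channel> <text> -- hypothetical skill helper.
set -euo pipefail
curl -s -X POST "https://slack.com/api/chat.postMessage" \
  -H "Authorization: Bearer ${SLACK_BOT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg ch "$1" --arg text "$2" '{channel: $ch, text: $text}')"
```

No server to keep alive, and when it misbehaves I can run it by hand and read the raw response.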
Hooks for the things memory can't do
Memory and CLAUDE.md preferences can't implement automated behaviors: "from now on, when X, do Y." Those need hooks, executed by the harness itself, not by Claude. A few I use:
- Marathon-session detection on UserPromptSubmit. After ~50 user turns or 5 hours of session time, surface a reminder to wrap up and start a fresh session. Long sessions degrade quality; the hook fires the warning so I don't have to remember. (A sketch follows this list.)
- Memory injection at session start. Loads MEMORY.md from ~/.claude/projects/<cwd>/memory/ so prior-conversation facts are available without me typing "remember when…".
- Validation loops. For my own projects, after a write/edit, the harness runs typecheck/lint and surfaces failures in-line so I (or the agent) fix them before reporting done.
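For the marathon check, registration lives in settings.json and the script reads the hook's JSON input from stdin; whatever it prints gets surfaced back into the session. A sketch, assuming the standard UserPromptSubmit hook shape (the script path, the threshold, and the turn-counting heuristic are my conventions):

```
// ~/.claude/settings.json (subset)
{
  "hooks": {
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "~/.claude/hooks/marathon-check.sh" }] }
    ]
  }
}
```

```bash
#!/usr/bin/env bash
# marathon-check.sh -- sketch. Hook input arrives as JSON on stdin,
# including transcript_path; stdout is surfaced back into the session.
input=$(cat)
transcript=$(jq -r '.transcript_path // empty' <<<"$input")
[ -f "$transcript" ] || exit 0
# Rough count of user turns in the JSONL transcript.
turns=$(grep -c '"role":"user"' "$transcript" || true)
if [ "${turns:-0}" -ge 50 ]; then
  echo "Marathon session: ${turns} user turns. Consider wrapping up and starting fresh."
fi
```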
The bar for adding a hook: not having it would be a real recurring annoyance. If it's a one-time preference, it goes in CLAUDE.md instead.
Persistent memory
Memory lives in ~/.claude/projects/<cwd>/memory/ as plain Markdown files with frontmatter. There's an index at MEMORY.md and individual files for each fact. Key types:
- user: who I am, role, preferences, knowledge depth
- feedback: guidance I've given (corrections and validated approaches), with a Why and a How to apply (example after this list)
- project: facts about ongoing work (status, deadlines, stakeholders)
- reference: pointers to where information lives in external systems
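A hypothetical feedback memory, to make the frontmatter-plus-Why/How shape concrete (the field names beyond type are illustrative):

```markdown
---
type: feedback
created: 2026-05-12
---
# Prefer table-driven tests in this repo

Why: I corrected generated tests toward table-driven style twice; it's the repo convention.
How to apply: when writing tests here, default to table-driven cases unless the test is trivial.
```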
The discipline that took me a while to learn: save memories from successful approaches, not just from corrections. If I only saved corrections, the model would drift toward overly cautious choices because all the positive signal evaporates between sessions.
Plan → Audit → Implement → Review
For any non-trivial multi-file task:
- Plan. Write a detailed plan document (plans/YYYY-MM-DD-*.md).
- Audit the plan. Spawn 3–4 review agents in parallel: scope/ROI, dependencies/ordering, design consistency, backward compatibility. They look at the plan from independent angles before any code is written. (A way to script this wave is sketched after the list.)
- Incorporate findings. Update the plan with the corrections.
- Implement. Execute in batches, running typecheck + lint after each batch.
- Review the code. Spawn another review wave β pattern recognition, architecture, simplicity, security.
- Visually verify if it's UI.
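One way to drive the audit wave from a shell, using claude's print mode. The plan path, review angles, and output layout here are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical audit wave: one fresh print-mode sub-session per review angle.
plan="plans/2026-05-12-example.md"
mkdir -p reviews
for angle in "scope and ROI" "dependencies and ordering" "design consistency" "backward compatibility"; do
  claude -p "Review ${plan} strictly for ${angle}. List concrete problems only." \
    > "reviews/${angle// /-}.md" &
done
wait
```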
The audit step before implementing is what saved me real time. It's tempting to skip it. Don't.
Coordinator over executor for long-horizon work
For genuinely heavy work β multi-day refactors, multi-phase research projects β I don't run them in a single Claude Code session. The session would exhaust its turn budget or saturate context. Instead, a coordinator process spawns fresh claude CLI sub-sessions for each step. Each sub-session gets its own context window and turn limit. The coordinator just reads state between steps and decides what's next.
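A minimal sketch of that shape, assuming a state.json convention of my own (not a claude feature) for tracking what's next:

```bash
#!/usr/bin/env bash
# Hypothetical coordinator: fresh sub-session per step, durable state between steps.
for _ in $(seq 1 50); do   # hard cap so a stuck step can't loop forever
  step=$(jq -r '.next_step // empty' state.json)
  [ -z "$step" ] && break
  claude --dangerously-skip-permissions \
    -p "Read state.json. Execute step '${step}' only. Write results and the next step back to state.json, then stop."
done
```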
This is what claude-autoresearch and agent-orchestrator are: durable runtimes that survive context boundaries, with the agent loop reduced to "read state, do step N, write state, exit." Fresh context for the work, persistent state for the loop.
What I've actually built on top
Three pieces of infrastructure I run continuously:
- agent-orchestrator: always-alive daemon that supervises Claude agents spawned from CLAUDE.md harness templates. They run as long-running processes, get evaluated automatically, and resume across context boundaries.
- claude-autoresearch: Claude Code plugin that runs autonomous, milestone-verified research loops. The shape is modify → run → measure → keep/revert → repeat, from Karpathy's autoresearch concept.
- research-orchestrator: Claude Code skill that runs deep research as a multi-agent pipeline (splitter → N parallel agents → coordinator → gap-fill agents → synthesizer → judge).
The unifying principle: I program the harness, not just the prompts. The model is one component; the system around it is the actual product.
What I read to think about all this
- Agent Harness Engineering – Synthesis: the framework that informs most of my CLAUDE.md / skills / hooks decisions. Karpathy's skill tree, Boris Cherny's thread taxonomy, MercadoLibre's four levers.
- Karpathy Autoresearch – Deep Research Report: the source of truth for the autonomous-loop pattern.
- Production LLM Eval Platforms – Full Research Report: what it actually takes to know whether your harness is working.
Connection points
- The substrate this all runs on is described in Homelab for Autonomous Agents: when I run autonomous agents on local Ollama instead of the cloud, this stack doesn't change; only the inference endpoint does.
- The "Claude Code as programming target, not chat replacement" thesis is what ties claude-autoresearch, agent-orchestrator, research-orchestrator, and Infinite Brainstorm together.