How I Run Claude Code
Most people use Claude Code as a smarter chat: type a prompt, get a result, repeat. That's fine, but it leaves most of the leverage on the table. My setup treats the harness itself as a programming target: I write code that shapes how Claude behaves, runs supervised agents on top of it, and offloads long-horizon work to processes that survive context resets.
What follows is the actual stack I run, not a tutorial. If you want to copy any of it, the artifacts are linked.
The unit of personal voice: global CLAUDE.md
The single highest-leverage file I maintain is ~/.claude/CLAUDE.md. It loads into every Claude Code session as user-level context and codifies things I'd otherwise have to repeat by hand:
- Surface assumptions. State assumptions before proceeding on non-trivial tasks. Don't silently fill in ambiguous requirements.
- Push back, don't yes-machine. If my approach has problems, point them out with evidence. Accept my override if I insist.
- Scope discipline. Touch only what I asked you to touch. Every changed line should trace to my request.
- Right-size the process. Trivial tasks get done immediately; medium tasks get a brief plan; large ones get plan → audit → implement → review.
- Verify before reporting done. After writing or modifying code, run the project's build/test/lint cycle.
- Grep broadly before fixing. When a bug is found, search all files for the same pattern before committing.
- Respect stop signals. When I say "enough" or "this is good", stop suggesting next steps.
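Condensed, the file is just those corrections written down as directives. A stripped-down sketch, not my verbatim file:

```markdown
# ~/.claude/CLAUDE.md (condensed sketch)

## Working style
- State your assumptions before starting any non-trivial task; don't silently fill in ambiguous requirements.
- If my approach has problems, say so with evidence. Accept my override if I insist.
- Touch only what the task requires; every changed line should trace to my request.

## Verification
- After writing or modifying code, run the project's build/test/lint cycle before reporting done.
- When you find a bug, grep all files for the same pattern before committing the fix.

## Stop signals
- When I say "enough" or "this is good", stop suggesting next steps.
```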
For the first six months of using Claude Code, I kept correcting the same patterns. CLAUDE.md is just the cumulative record of those corrections, written down once instead of typed every session. That's the leverage.
Settings that matter
```
// ~/.claude/settings.json (subset)
{
  "model": "opus",
  "effortLevel": "high",
  "env": {
    "MAX_OUTPUT_TOKENS": "64000",
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```
Three deliberate choices:
- Opus by default, not Sonnet. The cost difference is small relative to my time; the quality difference on hard tasks is large.
- High effort + high max output tokens. I want longer reasoning and longer responses available when needed. Adaptive thinking handles the actual budget per turn.
- Agent teams enabled (still experimental as of mid-2026) so I can try teammate workflows for parallel work.
I run Claude Code with --dangerously-skip-permissions for any session where I'm doing real work. The permission prompts are a friction tax that's worth paying only when I'm exploring an unknown codebase. For my own projects, I trust the harness and the validation loop to catch problems.
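In practice that's just a shell alias. The alias name here is mine; the flag is real:

```bash
# Trusted sessions: skip permission prompts entirely.
alias cc='claude --dangerously-skip-permissions'
```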
Skills as the dominant primitive
Skills are the thing I reach for first. A skill is a small directory with a SKILL.md plus optional scripts that gets loaded on demand by name. The description stays in context as a one-liner; the body loads only when the skill is invoked.
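On disk a skill is just a directory; the frontmatter name and description are what stays in context as the one-liner. A sketch (contents illustrative):

```
~/.claude/skills/slack-api/
├── SKILL.md             # loaded on demand by name
└── scripts/
    └── post-message.sh  # optional helper scripts
```

And the top of SKILL.md:

```markdown
---
name: slack-api
description: Control a Slack workspace via the Web API with curl.
---
```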
The skills I rely on most:
- agent-orchestrator: controls the always-on daemon for spawning supervised Claude agents. (See agent-orchestrator.)
- research-orchestrator: runs multi-Claude parallel research with shared memory. (See research-orchestrator.)
- vault-kb: compiles inbox items in my Obsidian vault into wiki articles, lints sections for quality.
- slack-api: full Slack workspace control via the Web API using curl. Replaces the Slack MCP entirely; smaller context footprint, fewer tokens.
- twitter-api: same idea for Twitter via TwitterAPI.io.
- infinite-brainstorm: drives a board.json canvas. (See Infinite Brainstorm.)
The pattern that keeps proving itself: prefer a skill over a heavy MCP server when both can do the job. MCPs blow up context and token usage. A skill that wraps curl is smaller, debuggable, and doesn't need a long-running server.
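To make "a skill that wraps curl" concrete, here's a hypothetical helper the slack-api skill might ship. chat.postMessage is the real Slack Web API endpoint; the script name and argument convention are illustrative:

```bash
#!/usr/bin/env bash
# post-message.sh <channel> <text> -- hypothetical skill helper.
set -euo pipefail
curl -s -X POST "https://slack.com/api/chat.postMessage" \
  -H "Authorization: Bearer ${SLACK_BOT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg ch "$1" --arg text "$2" '{channel: $ch, text: $text}')"
```

No server to keep alive, and when it misbehaves I can run it by hand and read the raw response.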
Hooks for the things memory can't do
Memory and CLAUDE.md preferences can't implement automated behaviors: "from now on, when X, do Y." Those need hooks, executed by the harness itself, not by Claude. A few I use:
- Marathon-session detection on UserPromptSubmit. After ~50 user turns or 5 hours of session time, surface a reminder to wrap up and start a fresh session. Long sessions degrade quality; the hook fires the warning so I don't have to remember. (A sketch follows this list.)
- Memory injection at session start. Loads MEMORY.md from ~/.claude/projects/<cwd>/memory/ so prior-conversation facts are available without me typing "remember when…".
- Validation loops. For my own projects, after a write/edit, the harness runs typecheck/lint and surfaces failures in-line so I (or the agent) fix them before reporting done.
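For the marathon check, registration lives in settings.json and the script reads the hook's JSON input from stdin; whatever it prints gets surfaced back into the session. A sketch, assuming the standard UserPromptSubmit hook shape (the script path, the threshold, and the turn-counting heuristic are my conventions):

```
// ~/.claude/settings.json (subset)
{
  "hooks": {
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "~/.claude/hooks/marathon-check.sh" }] }
    ]
  }
}
```

```bash
#!/usr/bin/env bash
# marathon-check.sh -- sketch. Hook input arrives as JSON on stdin,
# including transcript_path; stdout is surfaced back into the session.
input=$(cat)
transcript=$(jq -r '.transcript_path // empty' <<<"$input")
[ -f "$transcript" ] || exit 0
# Rough count of user turns in the JSONL transcript.
turns=$(grep -c '"role":"user"' "$transcript" || true)
if [ "${turns:-0}" -ge 50 ]; then
  echo "Marathon session: ${turns} user turns. Consider wrapping up and starting fresh."
fi
```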
The bar for adding a hook: not having it would be a real recurring annoyance. If it's a one-time preference, it goes in CLAUDE.md instead.
Persistent memory
Memory lives in ~/.claude/projects/<cwd>/memory/ as plain Markdown files with frontmatter. There's an index at MEMORY.md and individual files for each fact. Key types:
- user: who I am, role, preferences, knowledge depth
- feedback: guidance I've given (corrections and validated approaches), with a Why and a How to apply (example after this list)
- project: facts about ongoing work (status, deadlines, stakeholders)
- reference: pointers to where information lives in external systems
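A hypothetical feedback memory, to make the frontmatter-plus-Why/How shape concrete (the field names beyond type are illustrative):

```markdown
---
type: feedback
created: 2026-05-12
---
# Prefer table-driven tests in this repo

Why: I corrected generated tests toward table-driven style twice; it's the repo convention.
How to apply: when writing tests here, default to table-driven cases unless the test is trivial.
```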
The discipline that took me a while to learn: save memories from successful approaches, not just from corrections. If I only saved corrections, the model would drift toward overly cautious choices because all the positive signal evaporates between sessions.
Plan → Audit → Implement → Review
For any non-trivial multi-file task:
- Plan. Write a detailed plan document (plans/YYYY-MM-DD-*.md).
- Audit the plan. Spawn 3–4 review agents in parallel: scope/ROI, dependencies/ordering, design consistency, backward compatibility. They look at the plan from independent angles before any code is written. (A way to script this wave is sketched after the list.)
- Incorporate findings. Update the plan with the corrections.
- Implement. Execute in batches, running typecheck + lint after each batch.
- Review the code. Spawn another review wave β pattern recognition, architecture, simplicity, security.
- Visually verify if it's UI.
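One way to drive the audit wave from a shell, using claude's print mode. The plan path, review angles, and output layout here are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical audit wave: one fresh print-mode sub-session per review angle.
plan="plans/2026-05-12-example.md"
mkdir -p reviews
for angle in "scope and ROI" "dependencies and ordering" "design consistency" "backward compatibility"; do
  claude -p "Review ${plan} strictly for ${angle}. List concrete problems only." \
    > "reviews/${angle// /-}.md" &
done
wait
```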
The audit step before implementing is what saved me real time. It's tempting to skip it. Don't.
Coordinator over executor for long-horizon work
For genuinely heavy work β multi-day refactors, multi-phase research projects β I don't run them in a single Claude Code session. The session would exhaust its turn budget or saturate context. Instead, a coordinator process spawns fresh claude CLI sub-sessions for each step. Each sub-session gets its own context window and turn limit. The coordinator just reads state between steps and decides what's next.
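A minimal sketch of that shape, assuming a state.json convention of my own (not a claude feature) for tracking what's next:

```bash
#!/usr/bin/env bash
# Hypothetical coordinator: fresh sub-session per step, durable state between steps.
for _ in $(seq 1 50); do   # hard cap so a stuck step can't loop forever
  step=$(jq -r '.next_step // empty' state.json)
  [ -z "$step" ] && break
  claude --dangerously-skip-permissions \
    -p "Read state.json. Execute step '${step}' only. Write results and the next step back to state.json, then stop."
done
```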
This is what claude-autoresearch and agent-orchestrator are: durable runtimes that survive context boundaries, with the agent loop reduced to "read state, do step N, write state, exit." Fresh context for the work, persistent state for the loop.
What I've actually built on top
Three pieces of infrastructure I run continuously:
- agent-orchestrator: always-alive daemon that supervises Claude agents spawned from CLAUDE.md harness templates. They run as long-running processes, get evaluated automatically, and resume across context boundaries.
- claude-autoresearch: Claude Code plugin that runs autonomous, milestone-verified research loops. The shape is modify → run → measure → keep/revert → repeat, from Karpathy's autoresearch concept.
- research-orchestrator: Claude Code skill that runs deep research as a multi-agent pipeline (splitter → N parallel agents → coordinator → gap-fill agents → synthesizer → judge).
The unifying principle: I program the harness, not just the prompts. The model is one component; the system around it is the actual product.
What I read to think about all this
- Agent Harness Engineering – Synthesis: the framework that informs most of my CLAUDE.md / skills / hooks decisions. Karpathy's skill tree, Boris Cherny's thread taxonomy, MercadoLibre's four levers.
- Karpathy Autoresearch – Deep Research Report: the source of truth for the autonomous-loop pattern.
- Production LLM Eval Platforms – Full Research Report: what it actually takes to know whether your harness is working.
Connection points
- The substrate this all runs on is described in Homelab for Autonomous Agents: when I run autonomous agents on local Ollama instead of the cloud, this stack doesn't change; only the inference endpoint does.
- The "Claude Code as programming target, not chat replacement" thesis is what ties claude-autoresearch, agent-orchestrator, research-orchestrator, and Infinite Brainstorm together.