Evergreen

planted Dec 27, 2025 · tended May 3, 2026

#moc #ai #agents #autonomous-systems

AI Agents

A map organizing my exploration of AI agents and autonomous systems.

Featured

Production LLM Eval Platforms — Multi-agent research synthesis on the production-eval landscape: the eval/observability flywheel, trace data layers, failure-mode discovery, headless agent-driven evals, AI-gateway tracing, and governance.

Research

Karpathy Autoresearch — Deep Research Report — Deep research on autonomous AI agents running ML experiments with one GPU per agent. Architecture, multi-agent patterns, the OpenClaw security crisis, and a four-GPU consumer-hardware replication build.
Agent Harness Engineering — Synthesis — Synthesis of how to make AI coding agents work reliably. Karpathy's skill tree, Boris Cherny's thread taxonomy, MercadoLibre's four levers at 20K-dev scale, OpenAI Codex team's AGENTS.md pattern.
x402 Implementation Guide — Production build journal for an x402 pay-per-call API. Hono + Coinbase CDP facilitator + EIP-3009. Specific package versions, debugging playbook, hardening patterns.
x402 Competitive Landscape — Live Services Analysis — Scrape of the x402 ecosystem (~230 services across Bazaar + ecosystem + the402.ai). Where the gaps are, where the slop is, what to build.

What I'm Building

claude-autoresearch — Plugin for Claude Code that runs autonomous, milestone-verified research loops.
agent-orchestrator — Always-alive daemon for spawning supervised Claude agents from CLAUDE.md harness templates.
research-orchestrator — Multi-Claude parallel-research pipeline with shared memory and a synthesizer/judge stage.
Autonomous Agent Arena — Three bots running 24/7 on arenabot.io against local Ollama on a four-GPU rig.
Infinite Brainstorm — Agent-native infinite canvas. Humans and agents both edit the same board.json.

Getting Started

New to AI agents? Start here:

AI Agents Fundamentals 🌿 — Core concepts, architectures, and agent types
Tool Use and Function Calling 🌿 — How agents interact with external systems
Agent Frameworks Comparison 🌿 — Choosing the right framework

Core Concepts

Agent Architecture

AI Agents Fundamentals 🌿 — Components, patterns, and agentic behavior
Agent Memory Systems 🌿 — Short-term, long-term, and episodic memory
Multi-Agent Systems 🌿 — Coordination and collaboration patterns

Tool Integration

Tool Use and Function Calling 🌿 — Extending agent capabilities
Database access, web search, code execution
Tool composition and caching

Practical Guides

Framework-Specific

Building Agents with LangChain 🌿 — LangChain development guide
Claude Agent Patterns 🌿 — Best practices for Claude
Agent Frameworks Comparison 🌿 — LangChain, AutoGPT, CrewAI, and more

Development Workflow

Agent Evaluation and Testing 🌿 — Testing strategies and benchmarks
Agent Security Considerations 🌿 — Prompt injection, tool safety, auditing
Production Agent Deployment 🌿 — Scaling, monitoring, and operations

Production Considerations

Operations

Production Agent Deployment 🌿 — Deployment architectures, scaling strategies
Agent Security Considerations 🌿 — Security best practices
Agent Evaluation and Testing 🌿 — Performance metrics and monitoring

Cost & Performance

Token budgets and caching
Rate limiting and quotas
Latency optimization

Advanced Topics

Collaboration

Multi-Agent Systems 🌿 — Hierarchical, parallel, and democratic patterns
Agent communication protocols
Conflict resolution

Memory & Learning

Agent Memory Systems 🌿 — Vector stores, episodic memory, summarization
Long-term knowledge retention
Privacy and forgetting

Project Documentation

Experiments

eliza-001 — First AI agent experiment: ElizaOS framework exploration
eliza-002 — Agent capabilities and architecture deep dive

Case Studies

To be added as I build more agents

Learning Path

Connection Points

External resources: