> AI Agents Fundamentals
AI Agents Fundamentals
πΏ Budding note β foundational concepts for autonomous AI systems.
What is an AI Agent?
An AI agent is an autonomous system that:
- Perceives its environment through inputs (text, APIs, sensors)
- Reasons about what actions to take
- Acts on the environment to achieve goals
- Learns from feedback to improve over time
Key distinction from chatbots:
Chatbot: User β LLM β Response
Agent: User β LLM β [Tools] β Actions β Results β LLM β Response
β__________________|
Autonomous loop
Core Components
1. The Brain (LLM)
The reasoning engine that makes decisions:
# Claude as the agent brain
from anthropic import Anthropic
client = Anthropic()
def agent_think(task: str, context: dict) -> str:
"""Agent reasoning with Claude"""
response = client.messages.create(
model="claude-sonnet-4-5-20250929",
max_tokens=4096,
system="You are an autonomous agent. Analyze the task and decide what action to take.",
messages=[{
"role": "user",
"content": f"Task: {task}\nContext: {context}\nWhat should I do next?"
}]
)
return response.content[0].text
Popular models for agents:
- Claude 3.5 Sonnet: Best for complex reasoning, tool use
- GPT-4: Strong general capabilities
- Mixtral: Open-source alternative
- Gemini Pro: Google's multimodal option
Related: Claude Agent Patterns for Claude-specific best practices
2. Memory Systems
Agents need memory to maintain context and learn:
Short-term memory (conversation buffer):
class AgentMemory:
def __init__(self, max_messages: int = 10):
self.messages = []
self.max_messages = max_messages
def add(self, role: str, content: str):
self.messages.append({"role": role, "content": content})
if len(self.messages) > self.max_messages:
self.messages.pop(0) # FIFO
def get_context(self) -> list:
return self.messages
Long-term memory (vector database):
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
class LongTermMemory:
def __init__(self):
self.client = QdrantClient(":memory:")
self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
def store(self, memory: str, metadata: dict):
"""Store experience in vector DB"""
vector = self.encoder.encode(memory).tolist()
self.client.upsert(
collection_name="memories",
points=[{
"id": hash(memory),
"vector": vector,
"payload": {"text": memory, **metadata}
}]
)
def recall(self, query: str, limit: int = 5):
"""Retrieve relevant memories"""
vector = self.encoder.encode(query).tolist()
results = self.client.search(
collection_name="memories",
query_vector=vector,
limit=limit
)
return [hit.payload for hit in results]
Deep dive: Agent Memory Systems
3. Tool Use
Agents extend their capabilities through tools:
Tool definition:
from typing import Callable, Dict, Any
class Tool:
def __init__(
self,
name: str,
description: str,
function: Callable,
parameters: Dict[str, Any]
):
self.name = name
self.description = description
self.function = function
self.parameters = parameters
def execute(self, **kwargs) -> Any:
return self.function(**kwargs)
# Example tools
web_search = Tool(
name="web_search",
description="Search the web for current information",
function=lambda query: search_api(query),
parameters={"query": {"type": "string", "required": True}}
)
calculator = Tool(
name="calculator",
description="Perform mathematical calculations",
function=lambda expression: eval(expression),
parameters={"expression": {"type": "string", "required": True}}
)
Related: Tool Use & Function Calling
4. Planning & Reasoning
Agents use various strategies to plan actions:
ReAct Pattern (Reasoning + Acting):
Thought: I need to find the current weather in Tokyo
Action: web_search("Tokyo weather today")
Observation: Temperature is 18Β°C, partly cloudy
Thought: Now I can answer the user's question
Action: respond("The weather in Tokyo is 18Β°C and partly cloudy")
Chain-of-Thought:
def chain_of_thought_prompt(question: str) -> str:
return f"""Let's approach this step-by-step:
1. First, identify what information we need
2. Break down the problem into sub-tasks
3. Execute each sub-task
4. Synthesize the results
Question: {question}
Let's begin:"""
Tree of Thoughts (explore multiple reasoning paths):
class ThoughtTree:
def __init__(self, root_thought: str):
self.root = {"thought": root_thought, "children": [], "score": 0}
def expand(self, node: dict, num_branches: int = 3):
"""Generate alternative reasoning paths"""
for i in range(num_branches):
child_thought = generate_next_thought(node["thought"])
child = {
"thought": child_thought,
"children": [],
"score": evaluate_thought(child_thought)
}
node["children"].append(child)
def best_path(self) -> list:
"""Find highest-scoring reasoning path"""
return traverse_tree(self.root, key=lambda n: n["score"])
Agent Architectures
1. Simple ReAct Agent
Single LLM call with tools:
class ReActAgent:
def __init__(self, llm, tools: list[Tool]):
self.llm = llm
self.tools = {t.name: t for t in tools}
def run(self, task: str, max_iterations: int = 10):
"""Execute ReAct loop"""
context = []
for i in range(max_iterations):
# Reasoning step
prompt = self._build_prompt(task, context)
response = self.llm.generate(prompt)
# Parse action
action = self._parse_action(response)
if action["type"] == "final_answer":
return action["content"]
# Execute tool
tool = self.tools[action["tool"]]
result = tool.execute(**action["args"])
context.append({
"thought": response,
"action": action,
"observation": result
})
return "Max iterations reached"
2. Multi-Agent Systems
Specialized agents collaborating:
class MultiAgentSystem:
def __init__(self):
self.agents = {
"researcher": ResearchAgent(),
"writer": WriterAgent(),
"critic": CriticAgent(),
}
async def solve(self, task: str):
"""Collaborative problem solving"""
# Research phase
research = await self.agents["researcher"].investigate(task)
# Writing phase
draft = await self.agents["writer"].write(research)
# Review phase
feedback = await self.agents["critic"].review(draft)
# Refinement
final = await self.agents["writer"].revise(draft, feedback)
return final
Deep dive: Multi-Agent Systems
3. Hierarchical Agents
Manager delegates to specialists:
Manager Agent
βββ Planning Agent (strategy)
βββ Execution Agent (actions)
βββ Monitoring Agent (validation)
class HierarchicalAgent:
def __init__(self):
self.manager = ManagerAgent()
self.specialists = {
"planner": PlanningAgent(),
"executor": ExecutionAgent(),
"monitor": MonitorAgent(),
}
async def run(self, goal: str):
# Manager creates plan
plan = await self.manager.plan(goal)
# Delegate to specialists
results = []
for step in plan.steps:
specialist = self.specialists[step.type]
result = await specialist.execute(step)
results.append(result)
# Manager synthesizes
return await self.manager.synthesize(results)
Agent Types
1. Task Completion Agents
Goal: Complete specific tasks (research, data analysis, code generation)
class TaskAgent:
async def complete_task(self, task_description: str):
# Understand task
requirements = await self.analyze_task(task_description)
# Break into steps
plan = await self.create_plan(requirements)
# Execute steps
for step in plan:
result = await self.execute_step(step)
if not self.validate(result):
await self.revise_plan(step)
return self.synthesize_results()
Examples:
- Research assistant (gather information)
- Code generator (write functions)
- Data analyst (SQL queries, visualizations)
2. Conversational Agents
Goal: Natural dialogue with memory and personality
class ConversationalAgent:
def __init__(self, personality: str):
self.personality = personality
self.memory = AgentMemory()
async def chat(self, user_message: str):
# Add to memory
self.memory.add("user", user_message)
# Generate response with personality
system_prompt = f"You are {self.personality}"
response = await self.llm.generate(
system=system_prompt,
messages=self.memory.get_context()
)
self.memory.add("assistant", response)
return response
Examples:
- Customer support bots
- Personal assistants
- Tutoring systems
3. Autonomous Agents
Goal: Continuous operation toward long-term goals
class AutonomousAgent:
def __init__(self, goal: str):
self.goal = goal
self.state = {}
async def run_loop(self):
"""Continuous operation"""
while not self.goal_achieved():
# Perceive environment
observations = await self.observe()
# Update internal state
self.state = await self.update_state(observations)
# Decide next action
action = await self.decide(self.state)
# Execute action
result = await self.execute(action)
# Learn from result
await self.learn(action, result)
await asyncio.sleep(self.tick_rate)
Examples:
- AutoGPT (recursive task completion)
- Cryptocurrency trading bots
- DevOps automation agents
Key Concepts
Agentic Behavior
What makes something truly "agentic":
- Autonomy: Self-directed action without human intervention
- Reactivity: Responds to environment changes
- Proactivity: Takes initiative toward goals
- Social ability: Interacts with other agents/humans
Grounding
Connecting agent actions to real-world effects:
def grounded_action(action: str, world_model: WorldState):
"""Ensure action has real effect"""
# Verify pre-conditions
if not world_model.check_preconditions(action):
raise InvalidAction("Preconditions not met")
# Execute with confirmation
result = execute(action)
# Verify post-conditions
new_state = world_model.update(result)
if not new_state.matches_expected():
rollback(action)
raise ActionFailed("Post-conditions not satisfied")
return new_state
Tool Calling vs Code Execution
Tool calling: Structured function invocation
# LLM returns structured tool call
{"tool": "web_search", "args": {"query": "Python tutorials"}}
Code execution: Generate and run arbitrary code
# LLM generates code
code = """
import requests
data = requests.get('https://api.example.com').json()
result = [item['name'] for item in data if item['active']]
"""
exec(code) # β οΈ Security implications!
See: Agent Security Considerations
Evaluation
Measuring agent performance:
Success Metrics
class AgentMetrics:
def __init__(self):
self.tasks_completed = 0
self.tasks_failed = 0
self.avg_steps = []
self.tool_usage = defaultdict(int)
def task_success_rate(self) -> float:
total = self.tasks_completed + self.tasks_failed
return self.tasks_completed / total if total > 0 else 0
def avg_steps_to_completion(self) -> float:
return sum(self.avg_steps) / len(self.avg_steps)
def tool_efficiency(self) -> dict:
"""Which tools are most useful?"""
total_calls = sum(self.tool_usage.values())
return {
tool: count / total_calls
for tool, count in self.tool_usage.items()
}
Benchmarks
- WebArena: Web navigation tasks
- SWE-bench: Software engineering tasks
- GAIA: General AI assistant tasks
- AgentBench: Multi-domain capabilities
Related: Agent Evaluation & Testing
Common Patterns
1. Retry with Refinement
async def retry_with_feedback(agent, task, max_attempts=3):
"""Retry failed tasks with error feedback"""
for attempt in range(max_attempts):
try:
result = await agent.execute(task)
return result
except Exception as e:
if attempt < max_attempts - 1:
task = f"{task}\n\nPrevious attempt failed: {e}\nPlease try a different approach."
else:
raise
2. Human-in-the-Loop
class HumanApprovalAgent:
async def execute_with_approval(self, action):
"""Require human approval for sensitive actions"""
if action.is_sensitive():
print(f"Agent wants to: {action.description}")
approval = input("Approve? (yes/no): ")
if approval.lower() != "yes":
return "Action rejected by human"
return await action.execute()
3. Self-Reflection
async def self_reflect(agent, task, result):
"""Agent critiques its own work"""
critique = await agent.llm.generate(f"""
Task: {task}
My result: {result}
Critically evaluate:
1. Did I fully address the task?
2. Are there errors or gaps?
3. How could I improve?
""")
if critique.suggests_revision():
return await agent.revise(result, critique)
return result
Practical Applications
Customer Support
class SupportAgent:
async def handle_ticket(self, ticket):
# Classify issue
category = await self.classify(ticket.description)
# Search knowledge base
kb_results = await self.search_kb(ticket.description)
# If known solution exists
if kb_results:
return self.format_solution(kb_results[0])
# Otherwise, escalate to human
return await self.escalate(ticket, reason="No KB match")
Code Assistant
class CodeAgent:
async def implement_feature(self, spec: str):
# Generate code
code = await self.generate_code(spec)
# Run tests
test_results = await self.run_tests(code)
# Fix if tests fail
while not test_results.passed:
errors = test_results.errors
code = await self.fix_code(code, errors)
test_results = await self.run_tests(code)
return code
Related: Building Agents with LangChain, Production Agent Deployment
Research Assistant
class ResearchAgent:
async def research_topic(self, topic: str):
# Web search
sources = await self.web_search(topic)
# Extract information
facts = []
for source in sources:
content = await self.fetch_url(source.url)
extracted = await self.extract_facts(content, topic)
facts.extend(extracted)
# Synthesize report
report = await self.synthesize(facts)
# Add citations
return self.add_citations(report, sources)
Challenges & Limitations
1. Reliability
Agents can fail unpredictably:
- Tool calls with wrong arguments
- Infinite loops
- Context window overflow
- Hallucinated actions
Mitigation: Agent Evaluation & Testing
2. Cost
LLM API costs accumulate:
# Track token usage
class CostTracker:
def __init__(self, cost_per_1k_tokens: float):
self.cost_per_1k = cost_per_1k_tokens
self.total_tokens = 0
def add_usage(self, input_tokens: int, output_tokens: int):
self.total_tokens += input_tokens + output_tokens
def total_cost(self) -> float:
return (self.total_tokens / 1000) * self.cost_per_1k
Mitigation: Production Agent Deployment
3. Security
Agents can be exploited:
- Prompt injection
- Tool misuse
- Data exfiltration
Mitigation: Agent Security Considerations
Learning Resources
Frameworks
- LangChain: Popular Python agent framework
- LangGraph: Stateful agent orchestration
- AutoGPT: Autonomous agent template
- CrewAI: Multi-agent collaboration
See: Agent Frameworks Comparison
Papers
- "ReAct: Synergizing Reasoning and Acting in Language Models" (2023)
- "Toolformer: Language Models Can Teach Themselves to Use Tools" (2023)
- "Generative Agents: Interactive Simulacra of Human Behavior" (2023)
- "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (2022)
Courses
- DeepLearning.AI: "LangChain for LLM Application Development"
- "Building AI Agents" by Harrison Chase
- "Prompt Engineering for ChatGPT" by Vanderbilt
Connection Points
Start here:
- AI Agents MOC β Main navigation
- Tool Use & Function Calling β Extending agent capabilities
- Agent Memory Systems β Context management
Deep dives:
- Claude Agent Patterns β Claude-specific techniques
- Multi-Agent Systems β Collaborative agents
- Agent Frameworks Comparison β Choose the right framework
Advanced:
- Agent Security Considerations β Security practices
- Agent Evaluation & Testing β Measuring performance
- Production Agent Deployment β Scaling to production