Overview
The agent loop is the heart of klaw. It orchestrates the conversation between user input, LLM decisions, and tool execution, with built-in safety guardrails: iteration limits, context management, cost tracking, and human approval.
The Loop
Step-by-Step Flow
Planning Phase (Optional)
On the first message, if planning is enabled, a planning prompt is injected asking the LLM to outline steps before acting. The default planning prompt asks the LLM to:
- Analyze what the user is asking for
- List steps (max 5)
- Identify potential issues
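The injection step can be sketched as follows. This is a minimal illustration, not klaw's actual API: the `Message` shape and `injectPlanningPrompt` name are assumptions, and the prompt text paraphrases the defaults listed above.

```go
package main

import "fmt"

// Message is a minimal conversation message (hypothetical shape).
type Message struct {
	Role    string
	Content string
}

// injectPlanningPrompt prepends a planning instruction on the first turn.
func injectPlanningPrompt(history []Message) []Message {
	prompt := Message{
		Role: "system",
		Content: "Before acting: analyze what the user is asking for, " +
			"list up to 5 steps, and identify potential issues.",
	}
	return append([]Message{prompt}, history...)
}

func main() {
	h := injectPlanningPrompt([]Message{{Role: "user", Content: "refactor pkg/x"}})
	fmt.Println(len(h), h[0].Role) // → 2 system
}
```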
Context Window Check
Before each LLM call, the context manager estimates token usage. If the conversation exceeds the compaction threshold, middle messages are summarized via an LLM call.
- Max context: 200,000 tokens (default)
- Compaction threshold: 75% of available context
- Reserve: 8,192 tokens for the response
- Keeps first user message and last 6 messages verbatim
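The check can be sketched with the defaults above. The 4-characters-per-token heuristic and the function names are assumptions for illustration; real estimators depend on the tokenizer.

```go
package main

import "fmt"

const (
	maxContextTokens    = 200_000
	compactionThreshold = 0.75
	reserveTokens       = 8_192
)

// estimateTokens uses a rough 4-chars-per-token heuristic (an assumption;
// actual token counts vary by model tokenizer).
func estimateTokens(text string) int { return len(text)/4 + 1 }

// needsCompaction reports whether the history exceeds 75% of the context
// that remains after reserving room for the response.
func needsCompaction(historyTokens int) bool {
	available := maxContextTokens - reserveTokens
	return float64(historyTokens) > compactionThreshold*float64(available)
}

func main() {
	// Threshold is 0.75 * (200,000 - 8,192) = 143,856 tokens.
	fmt.Println(needsCompaction(100_000), needsCompaction(150_000)) // → false true
}
```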
Send to LLM
The agent sends the conversation history to the LLM provider with resilient delivery (retry + fallback).
Process Stream
The agent collects streaming events: text chunks, tool calls, and stop signals with usage data.
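A minimal event-collection loop might look like the following. The `StreamEvent` and `Usage` field names mirror the event table later in this page but are assumptions about the concrete types, not klaw's actual definitions.

```go
package main

import "fmt"

// Usage carries token counts delivered with the stop event.
type Usage struct{ InputTokens, OutputTokens int }

// StreamEvent mirrors the event types described in this page
// (hypothetical concrete shape).
type StreamEvent struct {
	Type     string // "text", "tool_use", "stop", ...
	Text     string
	ToolCall string
	Usage    Usage
}

// collect drains a stream, accumulating text and tool calls until "stop".
func collect(events []StreamEvent) (text string, tools []string, u Usage) {
	for _, ev := range events {
		switch ev.Type {
		case "text":
			text += ev.Text
		case "tool_use":
			tools = append(tools, ev.ToolCall)
		case "stop":
			u = ev.Usage
			return
		}
	}
	return
}

func main() {
	text, tools, u := collect([]StreamEvent{
		{Type: "text", Text: "Reading file..."},
		{Type: "tool_use", ToolCall: "read_file"},
		{Type: "stop", Usage: Usage{InputTokens: 120, OutputTokens: 8}},
	})
	fmt.Println(text, tools, u.InputTokens, u.OutputTokens)
}
```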
Human Approval (Optional)
If a tool is in the require_approval list, the user is prompted before execution. The user sees:

⚠ Tool 'bash' requires approval. Execute? [y/N]:

Execute Tools (Parallel)
Approved tools execute concurrently. Each tool runs in its own goroutine with a 2-minute timeout. Results are collected in original call order after all goroutines complete. If the LLM returns 3 file reads simultaneously, they complete in roughly the time of one read instead of three sequential reads.
Reflection (Optional)
After every N tool calls (default: 3), a reflection prompt is injected asking the LLM to assess progress.
Iteration Limit
Each agent has a maximum number of loop iterations (default: 50). This prevents runaway loops where the LLM repeatedly calls tools without converging on a solution. When the limit is reached, the loop stops with AgentError{Code: ErrMaxIterations}.
Context Window Management
Large conversations can exceed the LLM's context window. The context manager handles this automatically:

| Parameter | Default | Description |
|---|---|---|
| MaxContextTokens | 200,000 | Maximum tokens before compaction |
| CompactionThreshold | 0.75 | Ratio of max tokens that triggers compaction |
| ReserveTokens | 8,192 | Tokens reserved for LLM response |
When compaction triggers, the manager will:
- Keep the first user message verbatim (preserves original intent)
- Keep the last 6 messages verbatim (preserves recent context)
- Summarize everything in between via an LLM call
- Replace middle messages with a [Previous conversation summary] block
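The splice itself can be sketched as below. This is illustrative: the `Message` type and `compact` signature are assumptions, and a real implementation would obtain `summary` from an LLM call rather than take it as a parameter.

```go
package main

import "fmt"

// Message is a minimal conversation message (hypothetical shape).
type Message struct {
	Role    string
	Content string
}

// compact keeps the first message and the last keep messages verbatim,
// replacing everything in between with a summary placeholder.
func compact(history []Message, summary string, keep int) []Message {
	if len(history) <= keep+2 {
		return history // nothing worth compacting
	}
	out := []Message{history[0]} // first user message: original intent
	out = append(out, Message{
		Role:    "system",
		Content: "[Previous conversation summary]\n" + summary,
	})
	return append(out, history[len(history)-keep:]...) // recent context
}

func main() {
	h := make([]Message, 10)
	for i := range h {
		h[i] = Message{Role: "user", Content: fmt.Sprintf("msg %d", i)}
	}
	out := compact(h, "user asked to refactor a package", 6)
	fmt.Println(len(out)) // → 8 (first + summary + last 6)
}
```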
Cost Tracking
Every LLM call records input and output token counts. The cost tracker computes session cost using per-model pricing tables:

| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| claude-sonnet-4-20250514 | $3.00 | $15.00 |
| claude-opus-4-20250514 | $15.00 | $75.00 |
| claude-3-5-haiku-20241022 | $0.80 | $4.00 |
| openai/gpt-4o | $2.50 | $10.00 |
| deepseek/deepseek-chat | $0.14 | $0.28 |
A session budget can be set via max_session_cost in config. When the budget is reached, the agent stops with ErrBudgetExceed. A warning is logged at 80% of the budget.
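The arithmetic is a straight per-million-token rate. A minimal sketch using a subset of the table above (names are illustrative, not klaw's internal API):

```go
package main

import "fmt"

// pricing holds per-1M-token USD rates from the table above.
type pricing struct{ in, out float64 }

var prices = map[string]pricing{
	"claude-sonnet-4-20250514":  {3.00, 15.00},
	"claude-3-5-haiku-20241022": {0.80, 4.00},
	"deepseek/deepseek-chat":    {0.14, 0.28},
}

// cost computes the USD cost of one call for a known model.
func cost(model string, inTok, outTok int) float64 {
	p := prices[model]
	return float64(inTok)/1e6*p.in + float64(outTok)/1e6*p.out
}

func main() {
	// 10k input + 2k output on Sonnet: 0.01*$3 + 0.002*$15 = $0.06
	fmt.Printf("$%.4f\n", cost("claude-sonnet-4-20250514", 10_000, 2_000)) // → $0.0600
}
```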
Structured Errors
The agent loop uses typed errors with machine-readable codes:

| Error Code | Constant | Description |
|---|---|---|
| max_iterations | ErrMaxIterations | Reached iteration limit |
| provider_error | ErrProvider | LLM API or provider failure |
| tool_execution | ErrToolExec | Tool execution failed |
| context_limit | ErrContextLimit | Context window exceeded |
| budget_exceeded | ErrBudgetExceed | Session cost budget exceeded |
Errors render as [CODE] message: cause and support errors.Unwrap() for inspecting the underlying cause.
Stream Events
The provider returns a stream of events:

| Event Type | Description | Fields |
|---|---|---|
| text | Text chunk to stream to user | Text |
| tool_use | LLM wants to call a tool | ToolCall |
| tool_result | Result of a tool call | Content |
| error | Error occurred | Error |
| stop | Generation complete | Usage{InputTokens, OutputTokens} |
The stop event includes a Usage field with token counts, which the cost tracker consumes.
Tool Execution
Timeout
Each tool has a 2-minute timeout by default.
Result Formatting
Tool results are formatted for display.
Error Handling
Tool errors are captured and fed back to the LLM as structured errors.
Memory Integration
Before each LLM call, workspace context is loaded.
Sub-Agent Delegation
The delegate tool allows the main agent to spawn ephemeral sub-agents that run inline during a conversation. This is useful for parallelizable sub-tasks or specialized work.
- Sub-agents use RunOnce, a non-streaming agent loop with parallel tool execution
- Each sub-agent gets its own tool registry (parent delegate excluded, child delegate added at depth+1)
- Maximum nesting depth: 3 levels
- 5-minute timeout per delegation
- Output truncated at 30,000 characters
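The depth and output guards above reduce to two small checks. A sketch with illustrative names (not klaw's actual API), using the documented limits of 3 levels and 30,000 characters:

```go
package main

import "fmt"

const (
	maxDelegationDepth = 3      // maximum nesting depth for sub-agents
	maxDelegateOutput  = 30_000 // character cap on sub-agent output
)

// canDelegate reports whether an agent at the given depth may spawn
// another sub-agent.
func canDelegate(depth int) bool { return depth < maxDelegationDepth }

// truncateOutput caps sub-agent output at the documented limit.
func truncateOutput(s string) string {
	if len(s) <= maxDelegateOutput {
		return s
	}
	return s[:maxDelegateOutput] + "\n[truncated]"
}

func main() {
	fmt.Println(canDelegate(0), canDelegate(3)) // → true false
}
```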
Performance Considerations
| Factor | Impact | Mitigation |
|---|---|---|
| Tool execution time | Blocks loop | Timeout, parallel execution |
| Large outputs | Token cost | Output truncation |
| Many tool calls | Latency | Parallel execution |
| Long history | Context limits | Automatic compaction |
| Runaway loops | Cost & time | Iteration limit |
| Provider outages | Availability | Retry + fallback |
| Sub-agent depth | Complexity | Depth limit (max 3) |
Next Steps
- Resilience: provider retry, fallback, and error handling
- Cost & Safety: budgets, approval, and monitoring
- Tools Reference: available built-in tools
- Custom Tools: build your own tools

