Overview

The agent loop is the heart of klaw. It orchestrates the conversation between user input, LLM decisions, and tool execution, with built-in safety guardrails for iteration limits, context management, cost tracking, and human approval.

The Loop

┌────────────────────────────────────────────────────────────────────┐
│                             AGENT LOOP                             │
│                                                                    │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐  │
│  │ Receive  │ →  │ Planning │ →  │  Context │ →  │     LLM      │  │
│  │  Input   │    │  (opt.)  │    │  Check   │    │   Decision   │  │
│  └──────────┘    └──────────┘    └──────────┘    └──────┬───────┘  │
│       ▲                                                 │          │
│       ▼                                                 ▼          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐  │
│  │  Cost &  │ ←  │ Reflect  │ ←  │ Execute  │ ←  │   Approval   │  │
│  │ Iter Chk │    │  (opt.)  │    │  Tools   │    │   (opt.)     │  │
│  └──────────┘    └──────────┘    └──────────┘    └──────────────┘  │
│       │                                                            │
│       └── Loop or Complete ─────────────────────────────────────── │
└────────────────────────────────────────────────────────────────────┘

Step-by-Step Flow

Step 1: Receive Input

User sends a message through a channel (CLI, Slack, API).
msg, _ := channel.Receive(ctx)
history = append(history, Message{Role: "user", Content: msg})

Step 2: Planning Phase (Optional)

On the first message, if planning is enabled, a planning prompt is injected asking the LLM to outline steps before acting.
if planner.Enabled && isFirstMessage {
    history = InjectPlanRequest(history, planner.PlanPrompt)
}
The default planning prompt asks the LLM to:
  1. Analyze what the user is asking for
  2. List steps (max 5)
  3. Identify potential issues
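The injection itself can be sketched as a simple append of a system message. This is an illustrative sketch, not klaw's actual types — `Message` and the `InjectPlanRequest` signature here are assumptions based on the snippet above:

```go
package main

import "fmt"

// Message is a simplified stand-in for klaw's conversation message type.
type Message struct {
	Role    string
	Content string
}

// InjectPlanRequest appends the planning prompt as a system message so the
// LLM outlines its steps before acting. The real implementation may place
// or format the prompt differently.
func InjectPlanRequest(history []Message, planPrompt string) []Message {
	return append(history, Message{Role: "system", Content: planPrompt})
}

func main() {
	history := []Message{{Role: "user", Content: "Refactor the config loader"}}
	history = InjectPlanRequest(history,
		"Before acting: analyze the request, list up to 5 steps, note potential issues.")
	fmt.Println(len(history), history[1].Role) // prints "2 system"
}
```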

Step 3: Context Window Check

Before each LLM call, the context manager estimates token usage. If the conversation exceeds the compaction threshold, middle messages are summarized via an LLM call.
history, _ = contextMgr.CompactIfNeeded(ctx, provider, history)
  • Max context: 200,000 tokens (default)
  • Compaction threshold: 75% of available context
  • Reserve: 8,192 tokens for the response
  • Keeps first user message and last 6 messages verbatim

Step 4: Send to LLM

Agent sends conversation history to the LLM provider with resilient delivery (retry + fallback).
events, _ := provider.Stream(ctx, history)

Step 5: Process Stream

Agent collects streaming events: text chunks, tool calls, stop signals with usage data.
stream:
for event := range events {
    switch event.Type {
    case "text":
        channel.Send(ctx, Message{Content: event.Text, Partial: true})
    case "tool_use":
        toolCalls = append(toolCalls, event.ToolCall)
    case "stop":
        costTracker.Add(model, event.Usage.InputTokens, event.Usage.OutputTokens)
        break stream // a bare break would only exit the switch, not the loop
    }
}

Step 6: Human Approval (Optional)

If a tool is in the require_approval list, the user is prompted before execution.
if approval.NeedsApproval(call.Name) {
    approved := approval.RequestApproval(ctx, channel, call)
    if !approved {
        result = "Denied by user"
        continue
    }
}
The user sees: ⚠ Tool 'bash' requires approval. Execute? [y/N]:

Step 7: Execute Tools (Parallel)

Approved tools execute concurrently. Each tool runs in its own goroutine with a 2-minute timeout. Results are collected in original call order after all goroutines complete.
// All approved tools run in parallel
var wg sync.WaitGroup
results := make([]*tool.Result, len(approved))
for i, call := range approved {
    wg.Add(1)
    go func(idx int, tc ToolCall) {
        defer wg.Done()
        ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
        defer cancel()
        results[idx] = tools.Execute(ctx, tc.Name, tc.Input)
    }(i, call)
}
wg.Wait()
If the LLM returns 3 file reads simultaneously, they complete in roughly the time of one read instead of three sequential reads.

Step 8: Reflection (Optional)

After every N tool calls (default: 3), a reflection prompt is injected asking the LLM to assess progress.
if ShouldReflect(reflectionConfig, toolCallsSinceReflection) {
    history = InjectReflection(history, reflectionConfig.Prompt)
    toolCallsSinceReflection = 0
}
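The `ShouldReflect` check reduces to a counter comparison. A minimal sketch, assuming a `ReflectionConfig` with `Enabled` and `Interval` fields (klaw's actual field names may differ):

```go
package main

import "fmt"

// ReflectionConfig is a simplified assumption of klaw's reflection settings.
type ReflectionConfig struct {
	Enabled  bool
	Interval int // reflect after this many tool calls (default: 3)
}

// ShouldReflect reports whether a reflection prompt should be injected,
// per the documented "after every N tool calls" rule.
func ShouldReflect(cfg ReflectionConfig, toolCallsSinceReflection int) bool {
	return cfg.Enabled && cfg.Interval > 0 && toolCallsSinceReflection >= cfg.Interval
}

func main() {
	cfg := ReflectionConfig{Enabled: true, Interval: 3}
	fmt.Println(ShouldReflect(cfg, 2), ShouldReflect(cfg, 3)) // prints "false true"
}
```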

Step 9: Cost & Iteration Check

The loop checks two safety limits before continuing:
// Iteration limit (default: 50)
if iteration >= maxIterations {
    return AgentError{Code: ErrMaxIterations}
}

// Cost budget
if err := costTracker.CheckBudget(); err != nil {
    return AgentError{Code: ErrBudgetExceed}
}

Step 10: Loop or Complete

If tools were called, loop back to the context check step. If no tools were requested, the response is complete.
if len(toolCalls) > 0 {
    continue // Back to Step 3
}
channel.Send(ctx, Message{Content: text, Done: true})

Iteration Limit

Each agent has a maximum number of loop iterations (default: 50). This prevents runaway loops where the LLM repeatedly calls tools without converging on a solution.
[agent.default]
max_iterations = 50

[agent.researcher]
max_iterations = 30
When the limit is reached, the agent stops with AgentError{Code: ErrMaxIterations}.

Context Window Management

Large conversations can exceed the LLM’s context window. The context manager handles this automatically:
| Parameter           | Default | Description                                  |
| ------------------- | ------- | -------------------------------------------- |
| MaxContextTokens    | 200,000 | Maximum tokens before compaction             |
| CompactionThreshold | 0.75    | Ratio of max tokens that triggers compaction |
| ReserveTokens       | 8,192   | Tokens reserved for LLM response             |
Compaction strategy:
  1. Keep the first user message verbatim (preserves original intent)
  2. Keep the last 6 messages verbatim (preserves recent context)
  3. Summarize everything in between via an LLM call
  4. Replace middle messages with [Previous conversation summary]
Token estimation uses a ~4 characters per token heuristic.
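Under that heuristic, the compaction decision is simple arithmetic. A sketch using the documented defaults (function names are illustrative, not klaw's API):

```go
package main

import (
	"fmt"
	"strings"
)

const (
	maxContextTokens    = 200_000
	compactionThreshold = 0.75
	reserveTokens       = 8_192
)

// estimateTokens applies the documented ~4 characters per token heuristic.
func estimateTokens(text string) int {
	return len(text) / 4
}

// needsCompaction checks estimated usage against the threshold, after
// setting aside the response reserve. The real implementation may round
// or weight messages differently.
func needsCompaction(estimatedTokens int) bool {
	available := maxContextTokens - reserveTokens // 191,808 tokens
	return float64(estimatedTokens) >= compactionThreshold*float64(available)
}

func main() {
	small := strings.Repeat("a", 400_000) // ~100,000 tokens
	big := strings.Repeat("a", 600_000)   // ~150,000 tokens
	fmt.Println(needsCompaction(estimateTokens(small)), needsCompaction(estimateTokens(big)))
	// prints "false true"
}
```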

Cost Tracking

Every LLM call records input and output token counts. The cost tracker computes session cost using per-model pricing tables:
| Model                     | Input (per 1M) | Output (per 1M) |
| ------------------------- | -------------- | --------------- |
| claude-sonnet-4-20250514  | $3.00          | $15.00          |
| claude-opus-4-20250514    | $15.00         | $75.00          |
| claude-3-5-haiku-20241022 | $0.80          | $4.00           |
| openai/gpt-4o             | $2.50          | $10.00          |
| deepseek/deepseek-chat    | $0.14          | $0.28           |
Set a budget with max_session_cost in config. When the budget is reached, the agent stops with ErrBudgetExceed. A warning is logged at 80% of the budget.
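The per-call cost computation is the token counts scaled by the pricing table. A minimal sketch, assuming a simple map-based table (klaw's `CostTracker` API may differ):

```go
package main

import "fmt"

// pricing holds per-1M-token USD rates, matching the table above.
type pricing struct{ input, output float64 }

var prices = map[string]pricing{
	"claude-sonnet-4-20250514": {3.00, 15.00},
	"deepseek/deepseek-chat":   {0.14, 0.28},
}

// addCost converts a call's token counts to dollars for the given model.
func addCost(model string, inputTokens, outputTokens int) float64 {
	p := prices[model]
	return float64(inputTokens)/1e6*p.input + float64(outputTokens)/1e6*p.output
}

func main() {
	// 10,000 input + 2,000 output tokens on claude-sonnet-4:
	// 0.01 * $3.00 + 0.002 * $15.00 = $0.06
	cost := addCost("claude-sonnet-4-20250514", 10_000, 2_000)
	fmt.Printf("$%.2f\n", cost) // prints "$0.06"
}
```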

Structured Errors

The agent loop uses typed errors with machine-readable codes:
| Error Code      | Constant         | Description                  |
| --------------- | ---------------- | ---------------------------- |
| max_iterations  | ErrMaxIterations | Reached iteration limit      |
| provider_error  | ErrProvider      | LLM API or provider failure  |
| tool_execution  | ErrToolExec      | Tool execution failed        |
| context_limit   | ErrContextLimit  | Context window exceeded      |
| budget_exceeded | ErrBudgetExceed  | Session cost budget exceeded |
type AgentError struct {
    Code    ErrorCode  // Machine-readable code
    Message string     // Human-readable message
    Cause   error      // Underlying error (optional)
}
Errors format as [CODE] message: cause and support errors.Unwrap().

Stream Events

The provider returns a stream of events:
| Event Type  | Description                  | Fields                           |
| ----------- | ---------------------------- | -------------------------------- |
| text        | Text chunk to stream to user | Text                             |
| tool_use    | LLM wants to call a tool     | ToolCall                         |
| tool_result | Result of a tool call        | Content                          |
| error       | Error occurred               | Error                            |
| stop        | Generation complete          | Usage{InputTokens, OutputTokens} |
The stop event includes a Usage field with token counts, which the cost tracker consumes.
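One plausible shape for this event union, with `Type` selecting which field is meaningful (field names beyond those in the table are assumptions):

```go
package main

import "fmt"

// Usage carries the token counts reported on the stop event.
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// ToolCall is a simplified stand-in for a requested tool invocation.
type ToolCall struct {
	Name  string
	Input map[string]any
}

// StreamEvent sketches the provider's event stream element: Type selects
// which of the other fields is populated, per the table above.
type StreamEvent struct {
	Type     string // "text", "tool_use", "tool_result", "error", "stop"
	Text     string
	ToolCall ToolCall
	Content  string
	Error    error
	Usage    Usage
}

func main() {
	stop := StreamEvent{Type: "stop", Usage: Usage{InputTokens: 1200, OutputTokens: 340}}
	fmt.Println(stop.Type, stop.Usage.InputTokens) // prints "stop 1200"
}
```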

Tool Execution

Timeout

Each tool has a 2-minute timeout by default:
ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
defer cancel()
result, err := tool.Execute(ctx, input)

Result Formatting

Tool results are formatted for display:
╭──────────────────────────────────────────╮
│ Tool: bash                               │
│ Command: git status                      │
├──────────────────────────────────────────┤
│ On branch main                           │
│ Your branch is up to date                │
│ nothing to commit, working tree clean    │
╰──────────────────────────────────────────╯

Error Handling

Tool errors are captured and fed back to the LLM as structured errors:
result, err := tool.Execute(ctx, input)
if err != nil {
    return AgentError{Code: ErrToolExec, Message: err.Error(), Cause: err}
}

Memory Integration

Before each LLM call, workspace context is loaded:
System Prompt = Base Instructions
             + SOUL.md content
             + AGENTS.md content
             + TOOLS.md content
             + Skill prompts
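The composition above is plain string concatenation. A sketch with file loading stubbed out (the function name, separator, and empty-file handling are assumptions):

```go
package main

import (
	"fmt"
	"strings"
)

// buildSystemPrompt joins the base instructions, workspace files, and skill
// prompts into one system prompt, skipping files absent from the workspace.
func buildSystemPrompt(base, soul, agents, toolsDoc string, skills []string) string {
	parts := append([]string{base, soul, agents, toolsDoc}, skills...)
	var nonEmpty []string
	for _, p := range parts {
		if strings.TrimSpace(p) != "" {
			nonEmpty = append(nonEmpty, p)
		}
	}
	return strings.Join(nonEmpty, "\n\n")
}

func main() {
	prompt := buildSystemPrompt(
		"You are a helpful agent.",
		"# SOUL.md\nBe concise.",
		"# AGENTS.md\nPrefer small diffs.",
		"", // no TOOLS.md in this workspace
		[]string{"# Skill: git\nUse conventional commits."},
	)
	fmt.Println(strings.Count(prompt, "\n\n")) // prints "3"
}
```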

Sub-Agent Delegation

The delegate tool allows the main agent to spawn ephemeral sub-agents that run inline during a conversation. This is useful for parallelizable sub-tasks or specialized work.
┌──────────────────────────────────────────────┐
│                  Main Agent                  │
│      "Research patterns and summarize"       │
│                                              │
│  ┌─────────────┐    ┌─────────────┐          │
│  │  delegate   │    │  delegate   │          │
│  │  (depth 1)  │    │  (depth 1)  │          │
│  │  research   │    │  summarize  │          │
│  └──────┬──────┘    └──────┬──────┘          │
│         │                  │                 │
│         │  can delegate    │                 │
│         ▼                  ▼                 │
│  ┌─────────────┐    ┌─────────────┐          │
│  │  delegate   │    │  (result)   │          │
│  │  (depth 2)  │    └─────────────┘          │
│  └─────────────┘                             │
└──────────────────────────────────────────────┘
Key properties:
  • Sub-agents use RunOnce — a non-streaming agent loop with parallel tool execution
  • Each sub-agent gets its own tool registry (parent delegate excluded, child delegate added at depth+1)
  • Maximum nesting depth: 3 levels
  • 5-minute timeout per delegation
  • Output truncated at 30,000 characters
// The delegate tool calls RunOnce with a callback to avoid circular imports
delegateTool := tool.NewDelegateTool(
    func(ctx context.Context, cfg tool.RunConfig) (string, error) {
        return agent.RunOnce(ctx, agent.RunOnceConfig{
            Provider: cfg.Provider.(provider.Provider),
            Tools:    cfg.Tools,
            Prompt:   cfg.Prompt,
        })
    },
    provider, tools,
)

Performance Considerations

| Factor              | Impact         | Mitigation                  |
| ------------------- | -------------- | --------------------------- |
| Tool execution time | Blocks loop    | Timeout, parallel execution |
| Large outputs       | Token cost     | Output truncation           |
| Many tool calls     | Latency        | Parallel execution          |
| Long history        | Context limits | Automatic compaction        |
| Runaway loops       | Cost & time    | Iteration limit             |
| Provider outages    | Availability   | Retry + fallback            |
| Sub-agent depth     | Complexity     | Depth limit (max 3)         |

Next Steps

Resilience

Provider retry, fallback, and error handling

Cost & Safety

Budgets, approval, and monitoring

Tools Reference

Available built-in tools

Custom Tools

Build your own tools