Overview
klaw is designed to handle failures gracefully. This page covers the resilience mechanisms built into the provider layer, context management, structured error handling, and observability.
Provider Retry
When an LLM API call fails with a retryable error, klaw automatically retries with exponential backoff and jitter.
Retry Flow
Backoff Calculation
| Parameter | Default | Description |
|---|---|---|
| initial_backoff | 1s | Starting delay |
| backoff_factor | 2.0 | Exponential multiplier |
| max_backoff | 30s | Maximum delay cap |
| jitter | 0-25% | Random addition to prevent thundering herd |
With the defaults, the first three retries wait approximately:
- Attempt 1: ~1s wait (1s + jitter)
- Attempt 2: ~2s wait (2s + jitter)
- Attempt 3: ~4s wait (4s + jitter)
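The delays above can be reproduced with a short Go sketch. It assumes the delay is `initial_backoff * backoff_factor^(attempt-1)`, capped at `max_backoff`, with jitter added on top of the capped value. The function names and the order of capping and jitter are assumptions, not klaw's confirmed implementation.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// Defaults taken from the parameter table above.
const (
	initialBackoff = 1 * time.Second
	backoffFactor  = 2.0
	maxBackoff     = 30 * time.Second
	maxJitter      = 0.25 // up to 25% of the base delay is added
)

// baseDelay returns the capped exponential delay for a 1-based attempt number.
func baseDelay(attempt int) time.Duration {
	d := float64(initialBackoff) * math.Pow(backoffFactor, float64(attempt-1))
	if d > float64(maxBackoff) {
		d = float64(maxBackoff)
	}
	return time.Duration(d)
}

// withJitter adds u * 25% of d, where u is a random fraction in [0, 1).
// Taking u as a parameter keeps the function deterministic and easy to test.
func withJitter(d time.Duration, u float64) time.Duration {
	return d + time.Duration(u*maxJitter*float64(d))
}

func main() {
	for attempt := 1; attempt <= 6; attempt++ {
		base := baseDelay(attempt)
		fmt.Printf("attempt %d: base=%v actual=%v\n",
			attempt, base, withJitter(base, rand.Float64()))
	}
}
```

Because jitter is applied after the cap in this sketch, the actual wait can exceed `max_backoff` by up to 25%. Applying the cap last would be an equally plausible design.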
Retryable Errors
| Error | Description |
|---|---|
| HTTP 429 | Rate limit exceeded |
| HTTP 500 | Internal server error |
| HTTP 502 | Bad gateway |
| HTTP 503 | Service unavailable |
| HTTP 529 | Overloaded |
| timeout / timed out | Request timeout |
| connection reset | Connection dropped |
| connection refused | Server unavailable |
| EOF | Unexpected disconnect |
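A classifier for the table above might look like the following Go sketch. It assumes status codes are matched exactly and the textual errors are matched as case-sensitive substrings of the error message. `isRetryable` and its signature are hypothetical, not klaw's API.

```go
package main

import (
	"fmt"
	"strings"
)

// retryableStatus holds the HTTP status codes from the table above.
var retryableStatus = map[int]bool{429: true, 500: true, 502: true, 503: true, 529: true}

// retryableFragments holds the error-message fragments from the table above.
var retryableFragments = []string{
	"timeout", "timed out", "connection reset", "connection refused", "EOF",
}

// isRetryable reports whether a failed call should be retried.
// status is the HTTP status code (0 when no response was received);
// errMsg is the text of the transport error, if any.
func isRetryable(status int, errMsg string) bool {
	if retryableStatus[status] {
		return true
	}
	for _, frag := range retryableFragments {
		if strings.Contains(errMsg, frag) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRetryable(429, ""))                                   // rate limited
	fmt.Println(isRetryable(401, "unauthorized"))                       // auth failure
	fmt.Println(isRetryable(0, "read tcp: connection reset by peer"))   // dropped connection
}
```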
Fallback Chain
When retries are exhausted on the primary provider, klaw tries fallback providers in order.
Configuration
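However the chain is configured, the in-order behavior can be sketched in Go as follows. The `Provider` interface, the `stub` type, and `completeWithFallback` are hypothetical names introduced for illustration. The sketch assumes each provider has already exhausted its own retries before returning an error.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical interface; klaw's real provider type may differ.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// stub is a fake provider used only for this illustration.
type stub struct {
	name string
	fail bool
}

func (s stub) Name() string { return s.name }

func (s stub) Complete(prompt string) (string, error) {
	if s.fail {
		return "", errors.New("HTTP 503")
	}
	return s.name + " answered: " + prompt, nil
}

// completeWithFallback tries each provider in order and returns the first success.
func completeWithFallback(chain []Provider, prompt string) (string, error) {
	if len(chain) == 0 {
		return "", errors.New("no providers configured")
	}
	var lastErr error
	for _, p := range chain {
		out, err := p.Complete(prompt)
		if err == nil {
			return out, nil
		}
		lastErr = fmt.Errorf("%s: %w", p.Name(), err)
	}
	return "", fmt.Errorf("all %d providers failed, last error: %w", len(chain), lastErr)
}

func main() {
	chain := []Provider{
		stub{name: "primary", fail: true},
		stub{name: "fallback-1", fail: false},
	}
	out, err := completeWithFallback(chain, "hello")
	fmt.Println(out, err)
}
```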
Context Window Compaction
When conversation history approaches the context window limit, klaw automatically compacts it via LLM-based summarization.
Compaction Lifecycle
Configuration
| Parameter | Default | Description |
|---|---|---|
| MaxContextTokens | 200,000 | Maximum context window size |
| CompactionThreshold | 0.75 | Fraction of available tokens that triggers compaction |
| ReserveTokens | 8,192 | Tokens reserved for the LLM response |
With the defaults, compaction triggers at (200,000 - 8,192) * 0.75 = 143,856 tokens.
Token estimation uses a ~4 characters per token heuristic. The summarized section is prefixed with [Previous conversation summary].
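The threshold arithmetic and the token heuristic can be checked with a small Go sketch. The function names are hypothetical, and the estimate here divides the byte length by 4 and rounds down. How klaw counts characters and rounds is not stated, so treat both as assumptions.

```go
package main

import "fmt"

// Defaults from the configuration table above.
const (
	maxContextTokens    = 200_000
	reserveTokens       = 8_192
	compactionThreshold = 0.75
)

// compactionTrigger returns the token count at which compaction starts:
// (max context - reserved response tokens) * threshold.
func compactionTrigger(maxTokens, reserve int, threshold float64) int {
	return int(float64(maxTokens-reserve) * threshold)
}

// estimateTokens applies the ~4 characters per token heuristic.
// len counts bytes, so non-ASCII text is overestimated in this sketch.
func estimateTokens(text string) int {
	return len(text) / 4
}

// needsCompaction reports whether the estimated size of the history
// has reached the trigger point.
func needsCompaction(history []string) bool {
	total := 0
	for _, msg := range history {
		total += estimateTokens(msg)
	}
	return total >= compactionTrigger(maxContextTokens, reserveTokens, compactionThreshold)
}

func main() {
	fmt.Println(compactionTrigger(maxContextTokens, reserveTokens, compactionThreshold))
	fmt.Println(estimateTokens("What is the capital of France?"))
	fmt.Println(needsCompaction([]string{"hello", "world"}))
}
```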
Structured Errors
All agent errors use the AgentError type with machine-readable codes.
Error Codes
| Code | Constant | Trigger |
|---|---|---|
| max_iterations | ErrMaxIterations | Agent loop exceeded iteration limit |
| provider_error | ErrProvider | LLM API failure after retries and fallbacks |
| tool_execution | ErrToolExec | Tool returned an error |
| context_limit | ErrContextLimit | Context window exceeded after compaction |
| budget_exceeded | ErrBudgetExceed | Session cost exceeded max_session_cost |
Error Format
Errors serialize as [CODE] message: cause.
AgentError implements Unwrap() for Go error chain inspection.
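The following Go sketch shows how a type of this shape works with the standard `errors` package. The field names (`Code`, `Message`, `Cause`) and the choice to print the code string verbatim inside the brackets are assumptions. Only the `[CODE] message: cause` format and the presence of `Unwrap()` come from this page.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentError is a hypothetical reconstruction; klaw's real fields may differ.
type AgentError struct {
	Code    string // machine-readable code, e.g. "provider_error"
	Message string // human-readable description
	Cause   error  // underlying error, may be nil
}

// Error formats the error as "[CODE] message: cause".
func (e *AgentError) Error() string {
	if e.Cause != nil {
		return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
	}
	return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

// Unwrap exposes the cause so errors.Is and errors.As can walk the chain.
func (e *AgentError) Unwrap() error { return e.Cause }

// errRateLimited stands in for a low-level provider error.
var errRateLimited = errors.New("HTTP 429")

func main() {
	var err error = &AgentError{
		Code:    "provider_error",
		Message: "all providers failed",
		Cause:   errRateLimited,
	}
	fmt.Println(err)

	// errors.As recovers the typed error to branch on its code.
	var agentErr *AgentError
	if errors.As(err, &agentErr) {
		fmt.Println("code:", agentErr.Code)
	}

	// errors.Is reaches the wrapped cause through Unwrap.
	fmt.Println("rate limited:", errors.Is(err, errRateLimited))
}
```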
Observability
Structured Logging
klaw uses Go’s slog package for structured JSON logging.
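As a minimal illustration of slog's JSON output, the Go sketch below builds a logger with the standard library's `slog.NewJSONHandler`. The message text and attribute keys (`session_id`, `input_tokens`, and so on) are invented for the example and are not klaw's actual log schema.

```go
package main

import (
	"log/slog"
	"os"
	"strings"
)

// renderLog formats a single Info record as JSON and returns it as a string,
// which makes the output shape easy to inspect.
func renderLog(msg string, args ...any) string {
	var sb strings.Builder
	slog.New(slog.NewJSONHandler(&sb, nil)).Info(msg, args...)
	return sb.String()
}

func main() {
	// A JSON logger writing to stdout: one JSON object per line.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	logger.Info("llm request completed",
		"session_id", "abc123",
		"input_tokens", 1200,
		"output_tokens", 350,
	)
	logger.Warn("retrying provider call", "attempt", 2, "status", 429)

	// The same record shape, captured as a string.
	os.Stdout.WriteString(renderLog("tool executed", "tool", "bash"))
}
```

Each record is emitted as one JSON object containing `time`, `level`, and `msg` keys followed by the supplied attributes.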
Metrics
Global and per-session metrics are tracked using atomic counters.
Global Metrics:
| Counter | Description |
|---|---|
| TotalInputTokens | Cumulative input tokens |
| TotalOutputTokens | Cumulative output tokens |
| TotalRequests | Total LLM API requests |
| TotalErrors | Total error occurrences |
| TotalToolCalls | Total tool invocations |
Per-Session Metrics:
| Field | Description |
|---|---|
| InputTokens | Session input tokens |
| OutputTokens | Session output tokens |
| Requests | Session request count |
| Errors | Session error count |
| ToolCalls | Per-tool call counts (map) |
| StartedAt | Session start timestamp |
Metrics are recorded through three functions:
- RecordRequest(sessionID, inputTokens, outputTokens): after each LLM call
- RecordToolCall(sessionID, toolName): after each tool execution
- RecordError(sessionID, errorCode): on any error
Call metrics.Summary() to get a snapshot of all global counters.
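A rough Go sketch of the global counters, using `sync/atomic`, is shown below. The struct layout, parameter types, and the map returned by `Summary` are assumptions. Per-session tracking is left out to keep the example short, so the session ID arguments are accepted but unused.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Metrics mirrors the global counters listed above; the layout is assumed.
type Metrics struct {
	TotalInputTokens  atomic.Int64
	TotalOutputTokens atomic.Int64
	TotalRequests     atomic.Int64
	TotalErrors       atomic.Int64
	TotalToolCalls    atomic.Int64
}

// RecordRequest is called after each LLM call.
func (m *Metrics) RecordRequest(sessionID string, inputTokens, outputTokens int64) {
	m.TotalInputTokens.Add(inputTokens)
	m.TotalOutputTokens.Add(outputTokens)
	m.TotalRequests.Add(1)
}

// RecordToolCall is called after each tool execution.
func (m *Metrics) RecordToolCall(sessionID, toolName string) {
	m.TotalToolCalls.Add(1)
}

// RecordError is called on any error.
func (m *Metrics) RecordError(sessionID, errorCode string) {
	m.TotalErrors.Add(1)
}

// Summary returns a point-in-time copy of the global counters.
func (m *Metrics) Summary() map[string]int64 {
	return map[string]int64{
		"TotalInputTokens":  m.TotalInputTokens.Load(),
		"TotalOutputTokens": m.TotalOutputTokens.Load(),
		"TotalRequests":     m.TotalRequests.Load(),
		"TotalErrors":       m.TotalErrors.Load(),
		"TotalToolCalls":    m.TotalToolCalls.Load(),
	}
}

func main() {
	var m Metrics
	var wg sync.WaitGroup

	// Atomic counters stay consistent under concurrent updates without a mutex.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.RecordRequest("session-1", 10, 5)
		}()
	}
	wg.Wait()

	m.RecordToolCall("session-1", "bash")
	fmt.Println(m.Summary())
}
```

Each counter is read atomically, but the snapshot as a whole is not taken under a lock, so values recorded while `Summary` runs may appear in some counters and not others.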
Next Steps
- Agent Loop: How these mechanisms fit into the execution cycle
- Cost & Safety Guide: Practical guide to configuring safety features

