Overview
klaw supports distributed deployment, in which a central controller manages multiple worker nodes. This architecture enables:
- Horizontal scaling - Add more nodes as workload increases
- Fault tolerance - Individual nodes can fail without bringing down the system
- Resource isolation - Run different agents on different machines
- Geographic distribution - Deploy nodes closer to users
Architecture
Setting Up the Controller
Start the Controller
On your central server:
Controller Output
Check Controller Status
Setting Up Worker Nodes
Join a Controller
On each worker machine:
Node Registration
When a node joins, it:
- Authenticates with the controller
- Registers its available agents
- Starts heartbeat (every 30 seconds)
- Waits for task dispatch
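The heartbeat step can be illustrated with a small controller-side sketch. This is not klaw's implementation: `HeartbeatTracker`, its method names, and the rule of tolerating three missed heartbeats are assumptions made for illustration. Only the 30-second interval comes from the text above.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HeartbeatTracker:
    """Hypothetical helper that records the last heartbeat seen per node."""

    interval: float = 30.0   # heartbeat period stated in the text
    missed_allowed: int = 3  # assumption: tolerate three missed heartbeats
    last_seen: Dict[str, float] = field(default_factory=dict)

    @property
    def timeout(self) -> float:
        # A node is considered lost after this many seconds of silence.
        return self.interval * self.missed_allowed

    def beat(self, node: str, now: Optional[float] = None) -> None:
        # Record a heartbeat; `now` can be injected to make testing easy.
        self.last_seen[node] = time.monotonic() if now is None else now

    def is_ready(self, node: str, now: Optional[float] = None) -> bool:
        # Unknown nodes, and nodes silent for longer than the timeout, are not ready.
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(node)
        return seen is not None and (now - seen) <= self.timeout


tracker = HeartbeatTracker()
tracker.beat("node-1", now=0.0)
print(tracker.is_ready("node-1", now=60.0))   # True: 60 s is within the 90 s timeout
print(tracker.is_ready("node-1", now=120.0))  # False: 120 s exceeds the 90 s timeout
```

A longer timeout tolerates brief network hiccups at the cost of detecting a dead node more slowly.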
Check Node Status
Managing the Cluster
List Nodes
Describe a Node
Dispatching Tasks
Send a Task to the Cluster
- Finds a node with the coder agent
- Dispatches the task to that node
- Streams results back
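The routing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not klaw's scheduler: the `Node` type, `pick_node`, `dispatch`, the `run` transport callback, and the "researcher" agent name are all hypothetical, and picking the first matching node is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical view of a registered worker node."""
    name: str
    agents: Set[str] = field(default_factory=set)
    ready: bool = True


def pick_node(nodes: List[Node], agent: str) -> Optional[Node]:
    """Return the first Ready node that has registered `agent`, else None."""
    for node in nodes:
        if node.ready and agent in node.agents:
            return node
    return None


def dispatch(
    nodes: List[Node],
    agent: str,
    task: str,
    run: Callable[[Node, str, str], Iterable[str]],
) -> Iterator[str]:
    """Route `task` to a matching node and yield output chunks as they arrive.

    `run` stands in for the real controller-to-node transport (an assumption).
    Because this is a generator, the lookup happens on first iteration.
    """
    node = pick_node(nodes, agent)
    if node is None:
        raise LookupError(f"no Ready node provides agent {agent!r}")
    yield from run(node, agent, task)


def fake_run(node: Node, agent: str, task: str) -> Iterable[str]:
    # Stand-in transport that pretends the node produced two output chunks.
    return [f"{node.name}: started {task}", f"{node.name}: done"]


nodes = [Node("node-1", {"researcher"}), Node("node-2", {"coder"})]
for chunk in dispatch(nodes, "coder", "fix the bug", fake_run):
    print(chunk)
```

If no Ready node offers the requested agent, the sketch raises an error rather than queueing the task; a real scheduler may behave differently.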
View Tasks
Task Output
Creating Agents on Nodes
Define Agents Locally
On each node, create agents:
Register Agents with Controller
When a node joins, it automatically registers its agents:
Namespaces in Distributed Mode
Create Cluster and Namespaces
Bind Agents to Namespaces
Dispatch to Namespace
High Availability
Multiple Controllers (Future)
For production deployments, run multiple controllers with shared state:
Node Auto-Recovery
Nodes automatically reconnect if the controller restarts:
Container Deployment
Run Nodes in Containers
Kubernetes Deployment
Monitoring
Cluster Metrics
View Logs
Security Considerations
Use authentication tokens
Always set --token for production deployments:
Network security
- Use TLS for controller-node communication
- Restrict network access with firewalls
- Use private networks when possible
API key management
- Store API keys in environment variables
- Use secrets management (Vault, K8s secrets)
- Rotate keys regularly
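As an illustration of the first point, a process can read its key from the environment at startup and fail early when it is missing. This is a generic sketch: `load_api_key` and the variable name `EXAMPLE_API_KEY` are placeholders, not names documented by klaw.

```python
import os


def load_api_key(var_name: str = "EXAMPLE_API_KEY") -> str:
    """Read an API key from the environment and fail loudly if it is absent.

    EXAMPLE_API_KEY is a placeholder name chosen for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key
```

Keeping the key out of source files and config checked into version control also makes rotation simpler, since only the environment (or the secrets store that populates it) has to change.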
Troubleshooting
Node can't connect
- Verify controller is running: klaw controller status
- Check network connectivity: nc -zv controller-host 9090
- Verify token matches
- Check firewall rules
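Where nc is not installed on the node, the same reachability check can be approximated with a few lines of Python. This is a generic TCP probe using only the standard library; the function name `can_connect` is an illustrative choice, and the probe says nothing about whether the token is accepted.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_connect("controller-host", 9090)` mirrors the nc check in the list above. A False result points to DNS, routing, or firewall problems rather than to authentication.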
Tasks not dispatching
- Check if node has the required agent
- Verify node is in Ready state
- Check controller logs for errors
Node disconnects frequently
- Check network stability
- Increase heartbeat timeout
- Check node resource usage

