# Performance
Agno-Go is designed for extreme performance, achieving 16x faster agent instantiation compared to Python Agno.
## Executive Summary

**✅ Performance Goals Achieved:**
- ✅ Agent instantiation: ~180ns (<1μs target)
- ✅ Memory footprint: ~1.2KB/agent (<3KB target)
- ✅ Concurrency: Linear scaling with no contention
## Benchmark Results

### Agent Creation Performance

| Benchmark | Time/op | Memory/op | Allocs/op |
|---|---|---|---|
| Simple Agent | 184.5 ns | 1,272 B (1.2 KB) | 8 |
| With Tools | 193.0 ns | 1,288 B (1.3 KB) | 9 |
| With Memory | 111.9 ns | 312 B (0.3 KB) | 6 |
**Key Findings:**

- ⚡ Agent creation takes <200ns (5x better than the 1μs target)
- 💾 Memory usage is 1.2-1.3KB per agent (60% below the 3KB target)
- 🎯 Adding tools costs only ~8.5ns of overhead
- 🎯 Memory support is lightweight (only 312B per agent)
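
These numbers come from standard Go micro-benchmarks. A minimal sketch of what such a benchmark looks like (the `Config` fields shown are placeholders, not the framework's exact options):

```go
func BenchmarkAgentCreation(b *testing.B) {
    config := agent.Config{
        // model, tools, etc. (placeholders)
    }

    b.ReportAllocs() // report B/op and allocs/op alongside ns/op
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := agent.New(config); err != nil {
            b.Fatal(err)
        }
    }
}
```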
### Execution Performance

| Benchmark | Throughput |
|---|---|
| Simple Run | ~6M ops/sec |
| With Tool Calls | ~0.5M ops/sec |
| Memory Operations | ~1M ops/sec |
**Note:** The results above use mock models; real-world throughput is bounded by LLM API latency (typically 100-1000ms per call).
### Concurrent Performance

| Benchmark | Time/op | Memory/op | Scaling |
|---|---|---|---|
| Parallel Creation | 191.0 ns | 1,272 B | ✅ Linear |
| Parallel Run | on par with serial | on par with serial | ✅ Linear |
**Key Findings:**

- ✅ Concurrent and single-threaded performance are virtually identical
- ✅ No lock contention or race conditions detected
- ✅ Well suited to high-concurrency workloads
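
The parallel numbers can be reproduced with `b.RunParallel`, which drives the benchmark body from multiple goroutines at once. A sketch, using the same placeholder `Config` as above:

```go
func BenchmarkParallelAgentCreation(b *testing.B) {
    config := agent.Config{
        // placeholder configuration
    }

    b.ReportAllocs()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            if _, err := agent.New(config); err != nil {
                b.Error(err) // Error (not Fatal) is safe from worker goroutines
                return
            }
        }
    })
}
```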
## Performance Comparison

### vs Python Agno

| Metric | Go | Python | Improvement |
|---|---|---|---|
| Instantiation | ~180ns | ~3μs | 16x faster |
| Memory/Agent | ~1.2KB | ~6.5KB | 5x less |
| Concurrency | Native goroutines | GIL-limited | Superior |
## Real-World Scenarios

### Scenario 1: Batch Agent Creation

Creating 1,000 agents:

- Time: 1,000 × 180ns ≈ 0.18ms
- Memory: 1,000 × 1.2KB ≈ 1.2MB
### Scenario 2: High-Concurrency API Service

Handling 10,000 req/s with one agent instance per request:

- Memory overhead: 10,000 × 1.2KB = 12MB
- Framework latency: <1ms (excluding LLM API calls)
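
Because instantiation is this cheap, a fresh agent per request is viable. A hypothetical `net/http` handler for this pattern (the query parameter and `Run` returning `(output, error)` are assumptions, not the framework's documented API):

```go
func makeHandler(config agent.Config) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // ~180ns and ~1.2KB per request: cheap enough to create per call.
        ag, err := agent.New(config)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        output, err := ag.Run(r.Context(), r.URL.Query().Get("q")) // assumed signature
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        fmt.Fprint(w, output)
    }
}
```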
### Scenario 3: Multi-Agent Workflow

100 agents collaborating:

- Total memory: 100 × 1.2KB = 120KB
- Startup time: 100 × 180ns = 18μs
## Optimization Techniques

### 1. Low Allocation Count

- Only 8-9 heap allocations per agent
- No unnecessary interface conversions
- Pre-allocated slice capacities (see the sketch below)
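
As an illustration of the last point (the `config.Toolkits` field is hypothetical):

```go
// Pre-sizing the slice means appends never trigger a re-allocation.
tools := make([]Toolkit, 0, len(config.Toolkits)) // config.Toolkits is illustrative
for _, t := range config.Toolkits {
    tools = append(tools, t)
}
```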
### 2. Efficient Memory Layout

```go
// Field sizes assume a 64-bit platform.
type Agent struct {
    ID           string    // 16B (string header)
    Name         string    // 16B
    Model        Model     // 16B (interface)
    Tools        []Toolkit // 24B (slice header)
    Memory       Memory    // 16B (interface)
    Instructions string    // 16B
    MaxLoops     int       // 8B
    // Total: ~112B struct + heap allocations
}
```
### 3. Zero-Copy Operations

- String references (no copying)
- Interface pointers (no copying)
- Slice views (no copying; see the example below)
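
For instance, re-slicing creates a new slice header over the same backing array; no elements are copied:

```go
messages := loadHistory() // hypothetical helper returning the conversation history

// recent shares messages' backing array; only a 24B slice header is created.
recent := messages[len(messages)-10:]
```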
## Bottleneck Analysis

### Current Bottlenecks

1. **LLM API latency** (100-1000ms)
   - Solutions: streaming, caching, batch requests
2. **Tool execution time** (varies)
   - Solutions: parallel execution, timeout controls (see the sketch below)
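
A timeout control can be expressed with a plain `context` deadline (sketch; `Run` returning `(output, error)` is an assumption):

```go
// Bound each run so a slow tool or LLM call cannot hang the caller.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

output, err := ag.Run(ctx, input) // assumed signature
```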
**Not yet benchmarked:**
- Team coordination overhead
- Workflow execution overhead
- Vector DB queries
## Production Recommendations

### 1. Agent Pooling

Reuse agent instances to reduce GC pressure:
```go
// AgentPool is a fixed-size pool of reusable agents.
type AgentPool struct {
    agents chan *agent.Agent
}

func NewAgentPool(size int, config agent.Config) (*AgentPool, error) {
    pool := &AgentPool{
        agents: make(chan *agent.Agent, size),
    }
    for i := 0; i < size; i++ {
        ag, err := agent.New(config)
        if err != nil {
            return nil, err
        }
        pool.agents <- ag
    }
    return pool, nil
}

// Get blocks until an agent is available.
func (p *AgentPool) Get() *agent.Agent {
    return <-p.agents
}

// Put clears conversation state and returns the agent to the pool.
func (p *AgentPool) Put(ag *agent.Agent) {
    ag.ClearMemory()
    p.agents <- ag
}
```
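
Callers check an agent out, use it, and return it:

```go
pool, err := NewAgentPool(10, config)
if err != nil {
    log.Fatal(err)
}

ag := pool.Get()
defer pool.Put(ag)
// ... handle one request with ag ...
```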
### 2. Goroutine Limits

Limit concurrency to avoid resource exhaustion:

```go
sem := make(chan struct{}, 100) // allow at most 100 concurrent tasks
var wg sync.WaitGroup

for _, task := range tasks {
    sem <- struct{}{} // acquire a slot
    wg.Add(1)
    go func(t Task) {
        defer wg.Done()
        defer func() { <-sem }() // release the slot

        ag, err := agent.New(config)
        if err != nil {
            return
        }
        ag.Run(ctx, t.Input)
    }(task)
}
wg.Wait() // wait for the remaining tasks to finish
```
### 3. Response Caching

Cache LLM responses to reduce API calls:

```go
// CachedModel wraps a model with a read-through response cache.
type CachedModel struct {
    model models.Model
    cache map[string]*types.ModelResponse
    mu    sync.RWMutex
}

func (m *CachedModel) Invoke(ctx context.Context, req *models.InvokeRequest) (*types.ModelResponse, error) {
    key := hashRequest(req) // hashRequest (not shown) derives a stable cache key

    m.mu.RLock()
    if cached, ok := m.cache[key]; ok {
        m.mu.RUnlock()
        return cached, nil // cache hit
    }
    m.mu.RUnlock()

    // Cache miss: call the underlying model and store the response.
    resp, err := m.model.Invoke(ctx, req)
    if err != nil {
        return nil, err
    }

    m.mu.Lock()
    m.cache[key] = resp
    m.mu.Unlock()
    return resp, nil
}
```
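
The `cache` map must be initialized before `Invoke` writes to it; a small constructor keeps that invariant:

```go
func NewCachedModel(m models.Model) *CachedModel {
    return &CachedModel{
        model: m,
        cache: make(map[string]*types.ModelResponse),
    }
}
```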
### 4. Monitoring

Monitor key metrics in production:

```go
import "github.com/prometheus/client_golang/prometheus"

var (
    agentCreations = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "agno_agent_creations_total",
    })
    agentLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "agno_agent_run_duration_seconds",
    })
)

func init() {
    // Metrics must be registered before they appear on the /metrics endpoint.
    prometheus.MustRegister(agentCreations, agentLatency)
}
```
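
Recording these metrics around a run might look like this (sketch; `Run` returning `(output, error)` is an assumption):

```go
func runWithMetrics(ctx context.Context, config agent.Config, input string) error {
    ag, err := agent.New(config)
    if err != nil {
        return err
    }
    agentCreations.Inc()

    timer := prometheus.NewTimer(agentLatency) // observes into the histogram
    defer timer.ObserveDuration()

    _, err = ag.Run(ctx, input) // assumed signature
    return err
}
```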
## Running Benchmarks

### Run All Benchmarks

```bash
make bench
# or
go test -bench=. -benchmem ./...
```

### Run Specific Benchmark

```bash
go test -bench=BenchmarkAgentCreation -benchmem ./pkg/agno/agent/
```

### Generate CPU Profile

```bash
go test -bench=. -cpuprofile=cpu.prof ./pkg/agno/agent/
go tool pprof cpu.prof
```

### Generate Memory Profile

```bash
go test -bench=. -memprofile=mem.prof ./pkg/agno/agent/
go tool pprof mem.prof
```
## Profiling Tips

### 1. CPU Profiling

```bash
go test -cpuprofile=cpu.prof -bench=.
go tool pprof -http=:8080 cpu.prof
```

### 2. Memory Profiling

```bash
go test -memprofile=mem.prof -bench=.
go tool pprof -http=:8080 mem.prof
```

### 3. Race Detection

```bash
go test -race ./...
```
## Future Optimizations

### Planned Improvements
- [ ] String interning for repeated values
- [ ] sync.Pool for agent reuse
- [ ] Batch tool execution
- [ ] HTTP/2 connection pooling for LLM APIs
- [ ] gRPC support for lower latency
## Conclusion

Agno-Go exceeds its performance targets:

- ✅ 5x faster instantiation than the target (~180ns vs 1μs)
- ✅ 60% less memory than the target (~1.2KB vs 3KB)
- ✅ 16x faster than Python Agno, with 5x less memory
- ✅ Linear concurrency scaling

This makes it a good fit for:

- Thousands of concurrent agents
- 10K+ requests per second
- Low-latency, real-time applications