# Performance
Agno-Go is designed for extreme performance, achieving 16x faster agent instantiation compared to Python Agno.
## Executive Summary

**✅ Performance Goals Achieved:**
- ✅ Agent instantiation: ~180ns (<1μs target)
- ✅ Memory footprint: ~1.2KB/agent (<3KB target)
- ✅ Concurrency: Linear scaling with no contention
## Benchmark Results

### Agent Creation Performance

| Benchmark | Time/op | Memory/op | Allocs/op |
|---|---|---|---|
| Simple Agent | 184.5 ns | 1,272 B (1.2 KB) | 8 |
| With Tools | 193.0 ns | 1,288 B (1.3 KB) | 9 |
| With Memory | 111.9 ns | 312 B (0.3 KB) | 6 |
**Key Findings:**

- ⚡ Agent creation takes <200ns (5x better than the 1μs target)
- 💾 Memory usage is 1.2-1.3KB per agent (60% below the 3KB target)
- 🎯 Adding tools costs only ~8.5ns of overhead
- 🎯 Memory support is lightweight (only 312B per agent)
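
These numbers come from standard Go micro-benchmarks. A minimal sketch of what such a benchmark looks like (the `Config` fields shown are placeholders, not the framework's exact options):

```go
func BenchmarkAgentCreation(b *testing.B) {
    config := agent.Config{
        // model, tools, etc. (placeholders)
    }

    b.ReportAllocs() // report B/op and allocs/op alongside ns/op
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if _, err := agent.New(config); err != nil {
            b.Fatal(err)
        }
    }
}
```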
### Execution Performance

| Benchmark | Throughput |
|---|---|
| Simple Run | ~6M ops/sec |
| With Tool Calls | ~0.5M ops/sec |
| Memory Operations | ~1M ops/sec |
**Note:** The results above use mock models; real-world throughput is bounded by LLM API latency (typically 100-1000ms per call).
### Concurrent Performance

| Benchmark | Time/op | Memory/op | Scaling |
|---|---|---|---|
| Parallel Creation | 191.0 ns | 1,272 B | ✅ Linear |
| Parallel Run | on par with serial | on par with serial | ✅ Linear |
**Key Findings:**

- ✅ Concurrent and single-threaded performance are virtually identical
- ✅ No lock contention or race conditions detected
- ✅ Well suited to high-concurrency workloads
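
The parallel numbers can be reproduced with `b.RunParallel`, which drives the benchmark body from multiple goroutines at once. A sketch, using the same placeholder `Config` as above:

```go
func BenchmarkParallelAgentCreation(b *testing.B) {
    config := agent.Config{
        // placeholder configuration
    }

    b.ReportAllocs()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            if _, err := agent.New(config); err != nil {
                b.Error(err) // Error (not Fatal) is safe from worker goroutines
                return
            }
        }
    })
}
```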
## Performance Comparison

### vs Python Agno

| Metric | Go | Python | Improvement |
|---|---|---|---|
| Instantiation | ~180ns | ~3μs | 16x faster |
| Memory/Agent | ~1.2KB | ~6.5KB | 5x less |
| Concurrency | Native goroutines | GIL-limited | Superior |
## Real-World Scenarios

### Scenario 1: Batch Agent Creation

Creating 1,000 agents:

- Time: 1,000 × 180ns ≈ 0.18ms
- Memory: 1,000 × 1.2KB ≈ 1.2MB
### Scenario 2: High-Concurrency API Service

Handling 10,000 req/s with one agent instance per request:

- Memory overhead: 10,000 × 1.2KB = 12MB
- Framework latency: <1ms (excluding LLM API calls)
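
Because instantiation is this cheap, a fresh agent per request is viable. A hypothetical `net/http` handler for this pattern (the query parameter and `Run` returning `(output, error)` are assumptions, not the framework's documented API):

```go
func makeHandler(config agent.Config) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // ~180ns and ~1.2KB per request: cheap enough to create per call.
        ag, err := agent.New(config)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }

        output, err := ag.Run(r.Context(), r.URL.Query().Get("q")) // assumed signature
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        fmt.Fprint(w, output)
    }
}
```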
### Scenario 3: Multi-Agent Workflow

100 agents collaborating:

- Total memory: 100 × 1.2KB = 120KB
- Startup time: 100 × 180ns = 18μs
## Optimization Techniques

### 1. Low Allocation Count

- Only 8-9 heap allocations per agent
- No unnecessary interface conversions
- Pre-allocated slice capacities (see the sketch below)
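
As an illustration of the last point (the `config.Toolkits` field is hypothetical):

```go
// Pre-sizing the slice means appends never trigger a re-allocation.
tools := make([]Toolkit, 0, len(config.Toolkits)) // config.Toolkits is illustrative
for _, t := range config.Toolkits {
    tools = append(tools, t)
}
```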
### 2. Efficient Memory Layout

```go
// Field sizes assume a 64-bit platform.
type Agent struct {
    ID           string    // 16B (string header)
    Name         string    // 16B
    Model        Model     // 16B (interface)
    Tools        []Toolkit // 24B (slice header)
    Memory       Memory    // 16B (interface)
    Instructions string    // 16B
    MaxLoops     int       // 8B
    // Total: ~112B struct + heap allocations
}
```
### 3. Zero-Copy Operations

- String references (no copying)
- Interface pointers (no copying)
- Slice views (no copying; see the example below)
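
For instance, re-slicing creates a new slice header over the same backing array; no elements are copied:

```go
messages := loadHistory() // hypothetical helper returning the conversation history

// recent shares messages' backing array; only a 24B slice header is created.
recent := messages[len(messages)-10:]
```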
## Bottleneck Analysis

### Current Bottlenecks

1. **LLM API latency** (100-1000ms)
   - Solutions: streaming, caching, batch requests
2. **Tool execution time** (varies)
   - Solutions: parallel execution, timeout controls (see the sketch below)
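
A timeout control can be expressed with a plain `context` deadline (sketch; `Run` returning `(output, error)` is an assumption):

```go
// Bound each run so a slow tool or LLM call cannot hang the caller.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

output, err := ag.Run(ctx, input) // assumed signature
```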
**Not yet benchmarked:**
- Team coordination overhead
- Workflow execution overhead
- Vector DB queries
## Production Recommendations

### 1. Agent Pooling

Reuse agent instances to reduce GC pressure:
```go
// AgentPool is a fixed-size pool of reusable agents.
type AgentPool struct {
    agents chan *agent.Agent
}

func NewAgentPool(size int, config agent.Config) (*AgentPool, error) {
    pool := &AgentPool{
        agents: make(chan *agent.Agent, size),
    }
    for i := 0; i < size; i++ {
        ag, err := agent.New(config)
        if err != nil {
            return nil, err
        }
        pool.agents <- ag
    }
    return pool, nil
}

// Get blocks until an agent is available.
func (p *AgentPool) Get() *agent.Agent {
    return <-p.agents
}

// Put clears conversation state and returns the agent to the pool.
func (p *AgentPool) Put(ag *agent.Agent) {
    ag.ClearMemory()
    p.agents <- ag
}
```
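
Callers check an agent out, use it, and return it:

```go
pool, err := NewAgentPool(10, config)
if err != nil {
    log.Fatal(err)
}

ag := pool.Get()
defer pool.Put(ag)
// ... handle one request with ag ...
```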
### 2. Goroutine Limits

Limit concurrency to avoid resource exhaustion:

```go
sem := make(chan struct{}, 100) // allow at most 100 concurrent tasks
var wg sync.WaitGroup

for _, task := range tasks {
    sem <- struct{}{} // acquire a slot
    wg.Add(1)
    go func(t Task) {
        defer wg.Done()
        defer func() { <-sem }() // release the slot

        ag, err := agent.New(config)
        if err != nil {
            return
        }
        ag.Run(ctx, t.Input)
    }(task)
}
wg.Wait() // wait for the remaining tasks to finish
```
### 3. Response Caching

Cache LLM responses to reduce API calls:

```go
// CachedModel wraps a model with a read-through response cache.
type CachedModel struct {
    model models.Model
    cache map[string]*types.ModelResponse
    mu    sync.RWMutex
}

func (m *CachedModel) Invoke(ctx context.Context, req *models.InvokeRequest) (*types.ModelResponse, error) {
    key := hashRequest(req) // hashRequest (not shown) derives a stable cache key

    m.mu.RLock()
    if cached, ok := m.cache[key]; ok {
        m.mu.RUnlock()
        return cached, nil // cache hit
    }
    m.mu.RUnlock()

    // Cache miss: call the underlying model and store the response.
    resp, err := m.model.Invoke(ctx, req)
    if err != nil {
        return nil, err
    }

    m.mu.Lock()
    m.cache[key] = resp
    m.mu.Unlock()
    return resp, nil
}
```
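
The `cache` map must be initialized before `Invoke` writes to it; a small constructor keeps that invariant:

```go
func NewCachedModel(m models.Model) *CachedModel {
    return &CachedModel{
        model: m,
        cache: make(map[string]*types.ModelResponse),
    }
}
```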
### 4. Monitoring

Monitor key metrics in production:

```go
import "github.com/prometheus/client_golang/prometheus"

var (
    agentCreations = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "agno_agent_creations_total",
    })
    agentLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "agno_agent_run_duration_seconds",
    })
)

func init() {
    // Metrics must be registered before they appear on the /metrics endpoint.
    prometheus.MustRegister(agentCreations, agentLatency)
}
```
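
Recording these metrics around a run might look like this (sketch; `Run` returning `(output, error)` is an assumption):

```go
func runWithMetrics(ctx context.Context, config agent.Config, input string) error {
    ag, err := agent.New(config)
    if err != nil {
        return err
    }
    agentCreations.Inc()

    timer := prometheus.NewTimer(agentLatency) // observes into the histogram
    defer timer.ObserveDuration()

    _, err = ag.Run(ctx, input) // assumed signature
    return err
}
```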
## Running Benchmarks

### Run All Benchmarks

```bash
make bench
# or
go test -bench=. -benchmem ./...
```

### Run Specific Benchmark

```bash
go test -bench=BenchmarkAgentCreation -benchmem ./pkg/agno/agent/
```

### Generate CPU Profile

```bash
go test -bench=. -cpuprofile=cpu.prof ./pkg/agno/agent/
go tool pprof cpu.prof
```

### Generate Memory Profile

```bash
go test -bench=. -memprofile=mem.prof ./pkg/agno/agent/
go tool pprof mem.prof
```
## Profiling Tips

### 1. CPU Profiling

```bash
go test -cpuprofile=cpu.prof -bench=.
go tool pprof -http=:8080 cpu.prof
```

### 2. Memory Profiling

```bash
go test -memprofile=mem.prof -bench=.
go tool pprof -http=:8080 mem.prof
```

### 3. Race Detection

```bash
go test -race ./...
```
## Future Optimizations

### Planned Improvements
- [ ] String interning for repeated values
- [ ] sync.Pool for agent reuse
- [ ] Batch tool execution
- [ ] HTTP/2 connection pooling for LLM APIs
- [ ] gRPC support for lower latency
## Conclusion

Agno-Go exceeds its performance targets:

- ✅ 5x faster instantiation than the target (~180ns vs 1μs)
- ✅ 60% less memory than the target (~1.2KB vs 3KB)
- ✅ 16x faster than Python Agno, with 5x less memory
- ✅ Linear concurrency scaling

This makes it a good fit for:

- Thousands of concurrent agents
- 10K+ requests per second
- Low-latency, real-time applications