Performance

Agno-Go is designed for extreme performance, achieving 16x faster agent instantiation compared to Python Agno.

Executive Summary

Performance Goals Achieved:

  • ✅ Agent instantiation: ~180ns (<1μs target)
  • ✅ Memory footprint: ~1.2KB/agent (<3KB target)
  • ✅ Concurrency: Linear scaling with no contention

Benchmark Results

Agent Creation Performance

| Benchmark | Time/op | Memory/op | Allocs/op |
| --- | --- | --- | --- |
| Simple Agent | 184.5 ns | 1,272 B (1.2 KB) | 8 |
| With Tools | 193.0 ns | 1,288 B (1.3 KB) | 9 |
| With Memory | 111.9 ns | 312 B (0.3 KB) | 6 |

Key Findings:

  • ⚡ Agent creation: <200 nanoseconds (roughly 5x better than the 1μs target)
  • 💾 Memory usage: 1.2–1.3KB (60% below the 3KB target)
  • 🎯 Adding tools costs only ~8.5ns of overhead (193.0ns vs 184.5ns)
  • 🎯 The memory component is lightweight (only 312B)

Execution Performance

| Benchmark | Throughput |
| --- | --- |
| Simple Run | ~6M ops/sec |
| With Tool Calls | ~0.5M ops/sec |
| Memory Operations | ~1M ops/sec |

Note: Real-world performance is bounded by LLM API latency (100–1000ms); the results above were measured with mock models.

Concurrent Performance

| Benchmark | Time/op | Memory/op | Scaling |
| --- | --- | --- | --- |
| Parallel Creation | 191.0 ns | 1,272 B | ✅ Linear |
| Parallel Run | Similar | Similar | ✅ Linear |

Key Findings:

  • ✅ Concurrent and single-threaded performance are virtually identical
  • ✅ No lock contention or race conditions
  • ✅ Perfect for high-concurrency scenarios

Performance Comparison

vs Python Agno

| Metric | Go | Python | Improvement |
| --- | --- | --- | --- |
| Instantiation | ~180ns | ~3μs | 16x faster |
| Memory/Agent | ~1.2KB | ~6.5KB | 5x less |
| Concurrency | Native goroutines | GIL-limited | Superior |

Real-World Scenarios

Scenario 1: Batch Agent Creation

Creating 1,000 agents:

  • Time: 1,000 × 180ns = 0.18ms
  • Memory: 1,000 × 1.2KB = 1.2MB
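
A minimal sketch of this pattern, assuming agent.New and config as used elsewhere on this page:

go
// Illustrative: pre-create 1,000 agents and measure wall time.
start := time.Now()
agents := make([]*agent.Agent, 0, 1000) // pre-sized, no re-growth
for i := 0; i < 1000; i++ {
    ag, err := agent.New(config)
    if err != nil {
        log.Fatal(err)
    }
    agents = append(agents, ag)
}
log.Printf("created %d agents in %s", len(agents), time.Since(start))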

Scenario 2: High-Concurrency API Service

Handling 10,000 req/s:

  • Per request: 1 agent instance
  • Memory overhead: 10,000 × 1.2KB = 12MB
  • Latency: <1ms (excluding LLM API calls)
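
In an HTTP service this could look as follows (a sketch; handleQuery is hypothetical, and the exact return values of Run may differ):

go
// Illustrative: one short-lived agent per request.
func handleQuery(w http.ResponseWriter, r *http.Request) {
    ag, err := agent.New(config) // ~180ns and ~1.2KB per instance
    if err != nil {
        http.Error(w, "agent init failed", http.StatusInternalServerError)
        return
    }
    out, err := ag.Run(r.Context(), r.URL.Query().Get("q")) // assumed signature
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadGateway)
        return
    }
    fmt.Fprintln(w, out)
}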

Scenario 3: Multi-Agent Workflow

100 agents collaborating:

  • Total memory: 100 × 1.2KB = 120KB
  • Startup time: 100 × 180ns = 18μs

Optimization Techniques

1. Low Allocation Count

  • Only 8-9 heap allocations per agent
  • No unnecessary interface conversions
  • Pre-allocated slice capacities
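
For example, sizing a slice to its final capacity up front turns repeated grow-and-copy steps into a single allocation (illustrative; cfg.Toolkits is a hypothetical field):

go
// One allocation instead of several as the slice grows.
tools := make([]Toolkit, 0, len(cfg.Toolkits))
tools = append(tools, cfg.Toolkits...)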

2. Efficient Memory Layout

go
type Agent struct {
    ID           string        // 16B
    Name         string        // 16B
    Model        Model         // 16B (interface)
    Tools        []Toolkit     // 24B (slice header)
    Memory       Memory        // 16B (interface)
    Instructions string        // 16B
    MaxLoops     int           // 8B
    // Total: ~112B struct + heap allocations
}

3. Zero-Copy Operations

  • String references (no copying)
  • Interface pointers (no copying)
  • Slice views (no copying)
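
These are standard Go semantics, illustrated against the Agent struct above:

go
// Each line copies a small fixed-size header, never the underlying data.
name := ag.Name            // string assignment: 16 B header copy
subset := ag.Tools[:1]     // re-slice: 24 B slice header copy
var mem Memory = ag.Memory // interface assignment: 16 B pointer pair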

Bottleneck Analysis

Current Bottlenecks

  1. LLM API Latency (100-1000ms)

    • Solution: Streaming, caching, batch requests
  2. Tool Execution Time (varies)

    • Solution: Parallel execution, timeout controls
  3. Not yet benchmarked:

    • Team coordination overhead
    • Workflow execution overhead
    • Vector DB queries

Production Recommendations

1. Agent Pooling

Reuse agent instances to reduce GC pressure:

go
type AgentPool struct {
    agents chan *agent.Agent
}

func NewAgentPool(size int, config agent.Config) *AgentPool {
    pool := &AgentPool{
        agents: make(chan *agent.Agent, size),
    }
    for i := 0; i < size; i++ {
        ag, err := agent.New(config)
        if err != nil {
            panic(err) // fail fast: a short-filled pool would deadlock Get
        }
        pool.agents <- ag
    }
    return pool
}

// Get blocks until an agent is available.
func (p *AgentPool) Get() *agent.Agent {
    return <-p.agents
}

// Put wipes conversation state and returns the agent for reuse.
func (p *AgentPool) Put(ag *agent.Agent) {
    ag.ClearMemory()
    p.agents <- ag
}
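
A typical usage pattern (illustrative):

go
pool := NewAgentPool(100, config)

ag := pool.Get()   // blocks until an agent is free
defer pool.Put(ag) // clears memory and returns it to the pool
ag.Run(ctx, input)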

2. Goroutine Limits

Limit concurrency to avoid resource exhaustion:

go
semaphore := make(chan struct{}, 100) // at most 100 agents in flight

for _, task := range tasks {
    semaphore <- struct{}{} // acquire a slot; blocks when all are taken
    go func(t Task) {
        defer func() { <-semaphore }() // release the slot

        ag, _ := agent.New(config) // error handling elided for brevity
        ag.Run(ctx, t.Input)
    }(task)
}

3. Response Caching

Cache LLM responses to reduce API calls:

go
type CachedModel struct {
    model models.Model
    cache map[string]*types.ModelResponse
    mu    sync.RWMutex
}

// NewCachedModel wraps a model with an in-memory response cache.
// The map must be initialized here; writing to a nil map panics.
func NewCachedModel(m models.Model) *CachedModel {
    return &CachedModel{
        model: m,
        cache: make(map[string]*types.ModelResponse),
    }
}

func (m *CachedModel) Invoke(ctx context.Context, req *models.InvokeRequest) (*types.ModelResponse, error) {
    key := hashRequest(req)

    m.mu.RLock()
    if cached, ok := m.cache[key]; ok {
        m.mu.RUnlock()
        return cached, nil
    }
    m.mu.RUnlock()

    resp, err := m.model.Invoke(ctx, req)
    if err != nil {
        return nil, err
    }

    m.mu.Lock()
    m.cache[key] = resp // concurrent misses may duplicate work; last write wins
    m.mu.Unlock()

    return resp, nil
}
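
hashRequest is left undefined above. A minimal sketch, assuming models.InvokeRequest serializes cleanly with encoding/json; a production version should canonicalize the request and handle marshal errors:

go
import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
)

// hashRequest derives a cache key from the serialized request.
// encoding/json writes struct fields in declaration order, so equal
// requests yield equal keys.
func hashRequest(req *models.InvokeRequest) string {
    b, _ := json.Marshal(req)
    sum := sha256.Sum256(b)
    return hex.EncodeToString(sum[:])
}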

4. Monitoring

Monitor key metrics in production:

go
import "github.com/prometheus/client_golang/prometheus"

var (
    agentCreations = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "agno_agent_creations_total",
    })

    agentLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "agno_agent_run_duration_seconds",
    })
)
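
These collectors must be registered before they record anything, and exposed for scraping (for example with promhttp.Handler from prometheus/promhttp). An illustrative wiring, where runWithMetrics is a hypothetical helper:

go
func init() {
    prometheus.MustRegister(agentCreations, agentLatency)
}

func runWithMetrics(ctx context.Context, input string) {
    ag, _ := agent.New(config) // error handling elided in this sketch
    agentCreations.Inc()
    timer := prometheus.NewTimer(agentLatency)
    defer timer.ObserveDuration() // records elapsed seconds into the histogram
    ag.Run(ctx, input)
}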

Running Benchmarks

Run All Benchmarks

bash
make bench
# or
go test -bench=. -benchmem ./...

Run Specific Benchmark

bash
go test -bench=BenchmarkAgentCreation -benchmem ./pkg/agno/agent/

Generate CPU Profile

bash
go test -bench=. -cpuprofile=cpu.prof ./pkg/agno/agent/
go tool pprof cpu.prof

Generate Memory Profile

bash
go test -bench=. -memprofile=mem.prof ./pkg/agno/agent/
go tool pprof mem.prof

Profiling Tips

1. CPU Profiling

bash
go test -cpuprofile=cpu.prof -bench=.
go tool pprof -http=:8080 cpu.prof

2. Memory Profiling

bash
go test -memprofile=mem.prof -bench=.
go tool pprof -http=:8080 mem.prof

3. Race Detection

bash
go test -race ./...

Future Optimizations

Planned Improvements

  • [ ] String interning for repeated values
  • [ ] sync.Pool for agent reuse
  • [ ] Batch tool execution
  • [ ] HTTP/2 connection pooling for LLM APIs
  • [ ] gRPC support for lower latency

Conclusion

Agno-Go exceeds performance targets:

  • ✅ 5x faster than target (180ns vs 1μs)
  • ✅ 60% less memory than target (1.2KB vs 3KB)
  • ✅ 16x faster than Python, 5x less memory
  • ✅ Linear concurrency scaling

Supports:

  • Thousands of concurrent agents
  • 10K+ requests/second
  • Low-latency real-time applications
