Multi-Agent
2 posts
Tokencake: Multi-Agent KV Cache Scheduling That Cuts vLLM Latency by Half
Researchers from Beihang, Peking, and Alibaba introduce Tokencake, a KV-cache-centric serving framework for multi-agent applications. By combining temporal and spatial scheduling with CPU buffering and progressive GPU reservation, it cuts end-to-end latency by more than 47% versus vLLM and raises GPU cache utilization by about 17%.
October 30, 2025 · 4 min · 679 words · DreamBeast Programming
Agno-Go: Building AI Agents in Go - What's it Like Being 16x Faster than Python?
Rewriting an AI agent framework in Go brings a 16x performance boost, 180 ns agent startup, and a memory footprint of only 1.2 KB. This is the extreme performance Agno-Go delivers.
October 4, 2025 · 5 min · 854 words · Rexai Programming