Tokencake: Multi-Agent KV Cache Scheduling That Cuts Latency Nearly in Half Versus vLLM
Researchers from Beihang University, Peking University, and Alibaba introduce Tokencake, a KV-cache-centric serving framework for multi-agent LLM applications. Tokencake schedules the KV cache in both time and space, buffers KV cache in CPU memory, and progressively reserves GPU memory. Compared with vLLM, it reduces end-to-end latency by more than 47% and raises GPU KV-cache utilization by about 17%.
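The summary names CPU buffering and progressive GPU reservation without showing how they interact. The toy model below is a hypothetical sketch, not Tokencake's code or API. It assumes that KV memory is counted in fixed-size blocks, that an agent's entire KV cache is moved to a CPU buffer while the agent waits on an external tool call, and that GPU blocks are reserved in fixed-size chunks before the agent resumes. `Agent`, `ToyKVScheduler`, `reserve_step`, and every other name here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    kv_blocks: int        # KV-cache blocks this agent's context occupies
    on_gpu: bool = False
    reserved: int = 0     # GPU blocks already reserved for its return


class ToyKVScheduler:
    """Hypothetical toy: offload to CPU on tool calls, reserve GPU progressively.

    Illustration only; block counts stand in for real memory, and no timing
    or tool-duration prediction is modeled.
    """

    def __init__(self, gpu_blocks: int):
        self.gpu_free = gpu_blocks
        self.cpu_buffer: dict[str, int] = {}  # agent name -> offloaded blocks

    def admit(self, agent: Agent) -> None:
        # Place a new agent's KV cache on the GPU if it fits.
        if agent.kv_blocks > self.gpu_free:
            raise MemoryError(f"not enough GPU blocks for {agent.name}")
        self.gpu_free -= agent.kv_blocks
        agent.on_gpu = True

    def release(self, agent: Agent) -> None:
        # Agent finished: return its GPU blocks to the pool.
        if agent.on_gpu:
            self.gpu_free += agent.kv_blocks
            agent.on_gpu = False

    def on_tool_call(self, agent: Agent) -> None:
        # Agent stalls on an external call: park its KV cache in CPU memory
        # so the freed GPU blocks can serve other agents in the meantime.
        self.cpu_buffer[agent.name] = agent.kv_blocks
        self.gpu_free += agent.kv_blocks
        agent.on_gpu = False
        agent.reserved = 0

    def reserve_step(self, agent: Agent, chunk: int) -> int:
        # Reserve GPU blocks a chunk at a time ahead of the agent's return,
        # rather than grabbing the whole amount at once. Returns blocks taken.
        needed = self.cpu_buffer[agent.name] - agent.reserved
        take = min(chunk, needed, self.gpu_free)
        self.gpu_free -= take
        agent.reserved += take
        return take

    def on_tool_return(self, agent: Agent) -> bool:
        # Restore the KV cache only if the full reservation is in place;
        # reserved blocks were already deducted from gpu_free.
        if agent.reserved < self.cpu_buffer[agent.name]:
            return False
        del self.cpu_buffer[agent.name]
        agent.on_gpu = True
        agent.reserved = 0
        return True


if __name__ == "__main__":
    sched = ToyKVScheduler(gpu_blocks=100)
    planner, coder = Agent("planner", 60), Agent("coder", 30)
    sched.admit(planner)
    sched.admit(coder)
    sched.on_tool_call(planner)            # planner waits on a tool; 60 blocks freed
    print(sched.gpu_free)                  # 70
    sched.reserve_step(planner, chunk=30)
    sched.reserve_step(planner, chunk=30)
    print(sched.on_tool_return(planner))   # True
```

In this sketch, chunked reservation lets other agents keep using the freed blocks until shortly before the stalled agent resumes. If a reservation is still incomplete when the tool returns, the agent waits rather than evicting others.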