#Local LLM

2 articles

Cramming a 400B Model into 48GB: The Magic Behind LLM in a Flash

AI 2026-03-24 5 min read

An Apple paper from 2023 made it possible to run a 400-billion-parameter model on an ordinary MacBook. The …

90 Seconds of Waiting, Gone: How oMLX Buries Ollama on Mac

AI 2026-03-23 6 min read

oMLX is built for Apple Silicon, using the MLX framework, SSD-backed KV cache, and continuous batching to cut …