Apple Silicon
3 posts

Cramming a 400B Model into 48GB: The Magic Behind LLM in a Flash
An Apple paper from 2023 made it possible to run a 400-billion-parameter model on an ordinary MacBook. The core techniques, MoE and quantization, rest on an engineering philosophy built around on-demand loading.
March 24, 2026 · 5 min · 857 words · Dream Beast Programming

90 Seconds of Waiting, Gone: How oMLX Buries Ollama on Mac
oMLX is built for Apple Silicon. It uses the MLX framework, an SSD-backed KV cache, and continuous batching to cut time to first token (TTFT) from 90 seconds to 1-3 seconds in long-context scenarios, outperforming Ollama.
March 23, 2026 · 6 min · 1133 words · Mengshou Programming

Rust 1.89: Intel Macs demoted to Tier 2, clearer lifetimes, and new x86 intrinsics
Rust 1.89 demotes x86_64-apple-darwin (Intel Mac) from Tier 1 to Tier 2, adds the mismatched_lifetime_syntaxes lint, supports ‘_’ inference for const generic arguments, and brings a wave of new x86 intrinsics.
August 9, 2025 · 2 min · 309 words · Rexai Programming
