MoE
2 posts

Cramming a 400B Model into 48GB: The Magic Behind LLM in a Flash
A 2023 Apple paper made it possible to run a 400-billion-parameter model on an ordinary MacBook. Behind its core techniques, MoE and quantization, lies an engineering philosophy built around on-demand loading.
March 24, 2026 · 5 min · 857 words · Dream Beast Programming
Bring Kimi K2 Thinking Home with 247GB RAM: Dynamic 1-bit GGUF Field Notes
A step-by-step guide to running Unsloth’s Dynamic 1-bit GGUF build of the 1T-parameter Kimi K2 Thinking model on high-end PCs, covering installation, download, inference, serving, and troubleshooting.
November 11, 2025 · 4 min · 815 words · Rexai Programming
