Qwen
2 posts

Cramming a 400B Model into 48GB: The Magic Behind LLM in a Flash
An Apple paper from 2023 made it possible to run a 400-billion-parameter model on an ordinary MacBook. Behind the core techniques, MoE and quantization, lies an engineering philosophy built around on-demand loading.
March 24, 2026 · 5 min · 857 words · Dream Beast Programming

Rust Makes Qwen LLM Models Blazing Fast Again: 6x Speed Tokenizer Black Magic
bpe-qwen: the BPE tokenization core for Qwen models rewritten in Rust, benchmarked at a 6x–12x speedup while remaining compatible with the HuggingFace API. A one-line drop-in replacement that accelerates your inference pipeline.
October 16, 2025 · 6 min · 1189 words · Rexai Programming
