LLM | Dream Beast Programming

AI Agent memory infrastructure toolkit - Ghost + Memory Engine + PostgreSQL

Your AI Agent Can Think, But It Can't Remember

AI agents can reason, plan, and converse—but forget everything once the session ends. The Ghost project solves this with a pure PostgreSQL-based infrastructure, turning the database into the agent’s memory palace.

March 26, 2026 · 6 min · 1196 words · Dream Beast Programming

Mac mini connected to SSD freezer and DRAM fridge, illustrating the layered architecture of LLM in a Flash

Cramming a 400B Model into 48GB: The Magic Behind LLM in a Flash

An Apple paper from 2023 made it possible to run a 400 billion parameter model on an ordinary MacBook. The core technologies—MoE and quantization—hide an engineering philosophy built around on-demand loading.

March 24, 2026 · 5 min · 857 words · Dream Beast Programming

oMLX runs local LLMs on Mac Apple Silicon, dramatically outperforming Ollama with TTFT dropping from 90s to 1-3s

90 Seconds of Waiting, Gone: How oMLX Buries Ollama on Mac

oMLX is built for Apple Silicon, using the MLX framework, SSD-backed KV cache, and continuous batching to cut TTFT from 90 seconds to 1-3 seconds in long-context scenarios, comprehensively outperforming Ollama.

March 23, 2026 · 6 min · 1133 words · Dream Beast Programming

Ramp AI Agent Enterprise Finance Automation: One Agent + A Thousand Skills

Don't Build a Thousand Agents: How Ramp Automates Finance with One Agent

Ramp, America’s fastest-growing enterprise finance platform valued at $32B with 50,000+ customers and $100B+ in annual transaction volume, chose a ‘one Agent + a thousand skills’ architecture over building many agents. This is a deep dive into Ramp’s hands-on AI experience.

March 19, 2026 · 17 min · 3428 words · Dream Beast Programming

Microgpt 200-line GPT implementation soft-tech infographic, showing five modules: data layer, tokenizer, backpropagation, GPT architecture, training inference

Karpathy's Latest Work: Complete GPT in 200 Lines of Code - The Most Adorable AI Tutorial

Andrej Karpathy has done it again! This time he implemented a GPT model that can be trained and run for inference in just 200 lines of pure Python with no dependencies. This might be the most concise large language model implementation ever.

March 2, 2026 · 6 min · 1093 words · Dream Beast Programming