Ever tried running a scripting language on a microcontroller?

Blinking an LED on Arduino is bread-and-butter in C. But what if you need users to write their own scripts to control the device? That changes everything.

Take the Arduino UNO — 2KB RAM, 32KB Flash. You want to run Lua on it and let users write business logic. The options are limited:

Option 1: Port eLua. eLua's VM alone needs far more Flash than the UNO can spare, leaving little or no room for application code. 2KB of RAM? Nowhere near enough to initialize the VM.

Option 2: Write your own interpreter. Lexer, parser, bytecode execution engine… three months later you’re still debugging GC issues.

The root problem isn't Lua itself; Lua is already one of the lightest scripting languages in the embedded space. The problem is the interpreter's overhead: fetching bytecodes one at a time, decoding operands, dispatching to C functions, and returning through layers of calls. That overhead is negligible on a PC, but on a 16 MHz microcontroller every cycle and every byte counts.

What if there’s another path? Don’t interpret, compile directly. Turn Lua source into machine code and let the chip run it natively.

Cleuton Sampaio recently built a project called TinyLua in Rust, and that's exactly what it does. It compiles Lua source directly into native AVR/ARM machine code, skipping the VM, the interpreter, and the garbage collector. The resulting binary is flashed onto an Arduino just like C firmware.

What does “zero VM” actually mean?

How traditional Lua runs

Anyone who’s worked with the Lua C API knows the pattern — your code calls into the interpreter:

// Traditional embedded Lua: the full interpreter ships inside the firmware
#include <lauxlib.h>
#include <lualib.h>
int main(void) {
    lua_State *L = luaL_newstate();       // allocate VM state on the heap
    luaL_openlibs(L);                     // load standard libraries: more heap
    luaL_dostring(L, "print('hello')");   // compile to bytecode, then interpret it
    lua_close(L);                         // tear the VM down again
    return 0;
}

This is perfectly fine on a PC. On a microcontroller, every allocation is a luxury.

TinyLua takes a different approach: don’t interpret, compile. Your Lua code:

local a = 10
local b = 20
local c = a + b

Goes through TinyLua’s compiler and comes out as AVR machine code ready to write directly to the chip. No lua_State, no lua_CFunction, no instruction dispatch loop — the CPU runs compiled instructions directly.

Think of it as learning a foreign language instead of always speaking through an interpreter. The interpreter is gone, so conversation speed naturally goes up.

How does the compiler bypass the VM?

A source-to-machine-code compiler has three stages: frontend (source → AST), middle-end (type inference, optimization), backend (target machine code generation).

Lua source → Lexer → Token stream
       → Parser → AST
       → Semantic analysis → IR (intermediate representation with type info)
       → Code generation → AVR/ARM machine instructions
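The first stage is the easiest to picture. Here is a minimal sketch of a lexer for a tiny Lua subset (the `local` keyword, identifiers, integer literals, `=` and `+`). The `Token` enum and `lex` function are hypothetical names for illustration, not TinyLua's actual code; a real lexer also handles strings, comments, floats, and multi-character operators such as `==`.

```rust
/// Tokens for a tiny Lua subset (hypothetical; a real lexer covers far more).
#[derive(Debug, PartialEq)]
enum Token {
    Local,
    Ident(String),
    Int(i64),
    Assign,
    Plus,
}

/// Turn source text into a token stream, or report the first bad character.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            // Consume a run of digits as one integer literal.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text
                .parse::<i64>()
                .map_err(|e| format!("bad integer {}: {}", text, e))?;
            tokens.push(Token::Int(value));
        } else if c.is_alphabetic() || c == '_' {
            // Identifiers and keywords share the same character rules.
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            tokens.push(if word == "local" { Token::Local } else { Token::Ident(word) });
        } else if c == '=' {
            tokens.push(Token::Assign);
            i += 1;
        } else if c == '+' {
            tokens.push(Token::Plus);
            i += 1;
        } else {
            return Err(format!("unexpected character {:?} at position {}", c, i));
        }
    }
    Ok(tokens)
}

fn main() {
    let tokens = lex("local c = a + b").unwrap();
    println!("{:?}", tokens);
}
```

The parser then consumes this token stream to build the AST for `local c = a + b`.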

Every step has its challenges, but the biggest is this: Lua is dynamically typed, while machine instructions operate on fixed-size registers and need to know how each value is represented before they can be emitted.

Consider this Lua code:

function add(a, b)
    return a + b
end

In the Lua interpreter, a + b is checked at runtime: are a and b numbers? If they are strings, can they be coerced to numbers? If they are tables, does a metatable provide an __add metamethod? These runtime checks are the interpreter's core job, and the main source of its performance cost.
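To make that cost concrete, here is a minimal sketch, in Rust, of the work an interpreter does for every `+` it executes. It assumes a simplified tagged value (`Value`, `to_number`, and `dynamic_add` are hypothetical names), approximates Lua's string-to-number coercion with Rust's parsers, and omits metamethods. The tag checks run on every call, before any arithmetic happens.

```rust
/// A simplified tagged Lua value (hypothetical; real Lua has more variants).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Int(i64),
    Float(f64),
    Str(String),
}

/// Convert an operand to a number, roughly the way Lua coerces numeric strings.
fn to_number(v: &Value) -> Option<Value> {
    match v {
        Value::Int(_) | Value::Float(_) => Some(v.clone()),
        Value::Str(s) => {
            let s = s.trim();
            s.parse::<i64>()
                .map(Value::Int)
                .or_else(|_| s.parse::<f64>().map(Value::Float))
                .ok()
        }
        Value::Nil => None,
    }
}

/// Runtime `+`: inspect both tags on every call before doing any arithmetic.
fn dynamic_add(a: &Value, b: &Value) -> Result<Value, String> {
    let (x, y) = match (to_number(a), to_number(b)) {
        (Some(x), Some(y)) => (x, y),
        // A full interpreter would look for an __add metamethod here.
        _ => return Err(format!("attempt to perform arithmetic on {:?} and {:?}", a, b)),
    };
    let sum = match (x, y) {
        (Value::Int(i), Value::Int(j)) => Value::Int(i.wrapping_add(j)),
        (Value::Int(i), Value::Float(f)) | (Value::Float(f), Value::Int(i)) => {
            Value::Float(i as f64 + f)
        }
        (Value::Float(f), Value::Float(g)) => Value::Float(f + g),
        _ => unreachable!("to_number only returns numeric values"),
    };
    Ok(sum)
}

fn main() {
    println!("{:?}", dynamic_add(&Value::Int(10), &Value::Int(20)));
    println!("{:?}", dynamic_add(&Value::Str("10".to_string()), &Value::Float(0.5)));
    println!("{:?}", dynamic_add(&Value::Nil, &Value::Int(1)));
}
```

A compiled program that already knows both operands are integers skips all of this and keeps only the `wrapping_add` (Lua 5.3+ integers wrap on overflow).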

TinyLua's compiler needs to determine types at compile time and generate the corresponding machine instructions directly. If it infers that both a and b are integers, a + b compiles straight to the target's native add: a single instruction on a 32-bit ARM core, or an ADD/ADC pair for a 16-bit integer on 8-bit AVR. No runtime type checks, no metatable dispatch, just a few instructions.
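The exact instruction count depends on the target's word size. Here is a minimal sketch of an emitter for integer `+` on 8-bit AVR, assuming an integer is held in consecutive registers, least significant byte first. The `emit_avr_add` helper and the register numbers in the example are hypothetical, and register allocation is assumed to happen elsewhere. AVR's `add` adds the low bytes, and `adc` adds each higher byte plus the carry from the byte below.

```rust
/// Emit AVR assembly for `dst += src`, where each operand is an integer
/// occupying `width` consecutive registers, least significant byte first.
/// (Hypothetical helper; register allocation is assumed to happen elsewhere.)
fn emit_avr_add(width: u8, dst: u8, src: u8) -> Vec<String> {
    assert!(width >= 1, "an integer occupies at least one register");
    assert!(
        dst as u16 + width as u16 <= 32 && src as u16 + width as u16 <= 32,
        "AVR only has registers r0..r31"
    );
    (0..width)
        .map(|i| {
            // The lowest byte uses ADD; every higher byte uses ADC to fold in the carry.
            let op = if i == 0 { "add" } else { "adc" };
            format!("{} r{}, r{}", op, dst + i, src + i)
        })
        .collect()
}

fn main() {
    // `c = a + b` with 16-bit ints: a in r24:r25 (overwritten by c), b in r22:r23.
    for line in emit_avr_add(2, 24, 22) {
        println!("{}", line);
    }
}
```

For a 16-bit integer this produces `add r24, r22` followed by `adc r25, r23`; a 32-bit integer needs four instructions on the same chip.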

This is essentially AOT compilation combined with static type inference: the compiler figures out variable types as much as possible before generating code. For what it can determine at compile time, it emits direct instructions. For what must be decided at runtime, it inserts minimal type tags and dispatch code.
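That split can be sketched as a small lowering decision. The names below (`StaticType`, `Lowered`, `lower_add`, and the runtime helper name `rt_dynamic_add`) are hypothetical and only illustrate the shape of the choice, not TinyLua's internals: known integer operands get a direct machine add, known numeric operands get a float add, and anything unresolved becomes a call into a runtime routine that checks type tags.

```rust
/// What type inference learned about an operand (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum StaticType {
    Int,
    Float,
    Unknown,
}

/// The code the back end will produce for one `+` (hypothetical IR).
#[derive(Debug, PartialEq)]
enum Lowered {
    /// Types known: a direct machine add, no tags involved.
    NativeIntAdd,
    NativeFloatAdd,
    /// Types unknown: pass tagged operands to a runtime routine.
    RuntimeCall(&'static str),
}

fn lower_add(left: StaticType, right: StaticType) -> Lowered {
    use StaticType::*;
    match (left, right) {
        (Int, Int) => Lowered::NativeIntAdd,
        // Lua converts the integer operand to float when the other is a float.
        (Int, Float) | (Float, Int) | (Float, Float) => Lowered::NativeFloatAdd,
        // Anything unresolved falls back to tag checks at runtime.
        _ => Lowered::RuntimeCall("rt_dynamic_add"),
    }
}

fn main() {
    println!("{:?}", lower_add(StaticType::Int, StaticType::Int));
    println!("{:?}", lower_add(StaticType::Int, StaticType::Unknown));
}
```

On AVR there is no floating-point hardware, so in practice even the "native" float path would call a software floating-point routine; only the integer path is truly a couple of inline instructions.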

Why Rust for building a compiler?

Compiler development is a textbook “problem Rust was made for”:

Pattern matching + enums: A compiler spends most of its time processing AST nodes and IR trees. Rust’s enum plus match makes traversing and transforming these tree structures elegant:

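// Runtime values a Lua program can hold (subset)
#[allow(dead_code)]
enum LuaValue {
    Nil,
    Boolean(bool),
    Integer(i64),
    Number(f64),
    String(String),
}

// Types the compiler can reason about statically
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum LuaType { Nil, Boolean, Integer, Number, String }

enum BinOp { Add }  // other operators omitted

#[allow(dead_code)]
enum Expr {
    Integer(i64),
    Number(f64),
    Name(String),
    BinaryOp { op: BinOp, left: Box<Expr>, right: Box<Expr> },
}

// Compile-time type inference
fn infer_type(expr: &Expr) -> Option<LuaType> {
    match expr {
        Expr::Integer(_) => Some(LuaType::Integer),
        Expr::Number(_) => Some(LuaType::Number),
        Expr::BinaryOp { op: BinOp::Add, left, right } => {
            match (infer_type(left)?, infer_type(right)?) {
                (LuaType::Integer, LuaType::Integer) => Some(LuaType::Integer),
                (LuaType::Integer | LuaType::Number, LuaType::Integer | LuaType::Number) => {
                    Some(LuaType::Number)
                }
                _ => None, // strings, booleans, etc.: leave it to runtime
            }
        }
        _ => None,  // can't determine at compile time (e.g. a bare variable name)
    }
}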

Zero-cost abstractions: A compiler needs lots of intermediate data structures (ASTs, IR, symbol tables, register-allocation graphs). In C or C++, trees and graphs with shared references are easy to get wrong: leaks, dangling pointers, double frees. Rust's ownership system rules out dangling pointers and double frees at compile time and makes leaks rare, without adding runtime overhead, so the compiler itself stays both safe and fast.
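One common Rust pattern for such structures, shown here as an illustration rather than a description of TinyLua's code, is an index-based arena: nodes live in a single `Vec`, children are referenced by index, and dropping the arena frees the whole tree at once, with no reference cycles to leak. `Ast`, `Node`, and `NodeId` are hypothetical names.

```rust
/// Index of a node inside the arena; cheap to copy, no lifetimes needed.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeId(usize);

#[derive(Debug)]
enum Node {
    Int(i64),
    Add(NodeId, NodeId),
}

/// The arena owns every node; dropping it frees the whole tree at once.
#[derive(Default)]
struct Ast {
    nodes: Vec<Node>,
}

impl Ast {
    fn push(&mut self, node: Node) -> NodeId {
        self.nodes.push(node);
        NodeId(self.nodes.len() - 1)
    }

    /// Evaluate an integer-only expression (a toy constant-folding pass).
    fn eval(&self, id: NodeId) -> i64 {
        match &self.nodes[id.0] {
            Node::Int(v) => *v,
            Node::Add(l, r) => self.eval(*l).wrapping_add(self.eval(*r)),
        }
    }
}

fn main() {
    // Build `10 + 20` the way a parser would: children first, then the parent.
    let mut ast = Ast::default();
    let a = ast.push(Node::Int(10));
    let b = ast.push(Node::Int(20));
    let sum = ast.push(Node::Add(a, b));
    println!("{}", ast.eval(sum));
}
```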

Cross-compilation: Rust's LLVM-based toolchain has targets for both AVR and ARM Cortex-M, and the #![no_std] attribute lets code run without an OS or the standard library. Working close to bare-metal targets is much more ergonomic in Rust than in most languages.

Memory management: the trickiest problem in embedded compilers

If type inference is the first hurdle, memory management is the second — and the hardest.

Lua has GC (garbage collection). When you write t = {} to create a table in Lua, you don’t worry about when it’s freed — GC handles it. But in a bare-metal compilation scenario, the generated code has no GC runtime. Who reclaims the memory?

TinyLua’s approach:

  1. Static allocation first: whenever possible, the compiler calculates each variable's required space at compile time and places it directly in the generated data segment
  2. Arena allocation: for structures whose sizes can't be determined at compile time (like runtime-length strings), it uses a bump allocator: a large array with a pointer that only moves forward. Nothing is reclaimed individually; the whole arena is released at once
  3. Reference counting: for genuinely dynamic allocation scenarios, the generated code includes lightweight reference counting
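The arena strategy in item 2 is simple enough to sketch. Below is a minimal bump allocator over a fixed byte buffer, assuming a 512-byte arena in the example (an illustrative size, not TinyLua's). Allocation is a bounds check plus a pointer increment; there is no `free`, only a `reset` that releases everything at once. `BumpArena` is a hypothetical name, and it hands out offsets rather than raw pointers to stay in safe Rust.

```rust
/// A fixed-size bump allocator: `next` only moves forward; nothing is freed individually.
struct BumpArena<const N: usize> {
    buf: [u8; N],
    next: usize,
}

impl<const N: usize> BumpArena<N> {
    const fn new() -> Self {
        Self { buf: [0; N], next: 0 }
    }

    /// Reserve `size` bytes aligned to `align` (a power of two); returns the offset.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        debug_assert!(align.is_power_of_two());
        let start = (self.next + align - 1) & !(align - 1); // round up to alignment
        let end = start.checked_add(size)?;
        if end > N {
            return None; // arena exhausted: the caller must handle it
        }
        self.next = end;
        Some(start)
    }

    /// Borrow the bytes of a previous allocation.
    fn bytes_mut(&mut self, offset: usize, size: usize) -> &mut [u8] {
        &mut self.buf[offset..offset + size]
    }

    /// Release everything at once by rewinding the pointer.
    fn reset(&mut self) {
        self.next = 0;
    }

    fn used(&self) -> usize {
        self.next
    }
}

fn main() {
    let mut arena: BumpArena<512> = BumpArena::new();
    // A runtime-length string, e.g. a formatted sensor reading.
    let msg = b"temp=23";
    let off = arena.alloc(msg.len(), 1).expect("arena full");
    arena.bytes_mut(off, msg.len()).copy_from_slice(msg);
    // A 16-bit counter, 2-byte aligned, so one padding byte is skipped.
    let _counter = arena.alloc(2, 2).expect("arena full");
    println!("used {} of 512 bytes", arena.used());
    arena.reset();
}
```

A 512-byte arena is already a quarter of the UNO's 2KB of RAM, which is why static allocation comes first.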

These three strategies cover most typical microcontroller scenarios. Of course, this means your Lua code can't go wild: no unbounded recursion, no concatenating arbitrarily large strings, no creating massive temporary tables. But on a 2KB RAM device, those constraints were always there anyway.

-- TinyLua-style code: static, predictable
local led_pin = 13
local delay_ms = 500

function blink()
    gpio_write(led_pin, 1)
    delay(delay_ms)
    gpio_write(led_pin, 0)
    delay(delay_ms)
end
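For the blink example, static allocation means each top-level variable gets a fixed address before the program ever runs. Here is a minimal sketch of that layout step, assuming 16-bit integers and a data segment starting at 0x0100 (where SRAM begins on the ATmega328P). `Slot` and `layout_statics` are hypothetical names; alignment is a parameter because 8-bit AVR needs none (1), while 32-bit ARM cores prefer naturally aligned data.

```rust
/// A variable's fixed home in the data segment (hypothetical layout record).
#[derive(Debug, PartialEq)]
struct Slot {
    name: String,
    addr: u16,
    size: u16,
}

/// Assign each variable a fixed, aligned address at compile time.
/// `vars` holds (name, size in bytes); returns the slots and bytes used (including padding).
fn layout_statics(vars: &[(&str, u16)], base: u16, align: u16) -> (Vec<Slot>, u16) {
    let mut next = base;
    let mut slots = Vec::new();
    for &(name, size) in vars {
        next = (next + align - 1) / align * align; // round up to the alignment
        slots.push(Slot { name: name.to_string(), addr: next, size });
        next += size;
    }
    (slots, next - base)
}

fn main() {
    // led_pin and delay_ms from the blink example, assumed to be 16-bit integers on AVR.
    let (slots, used) = layout_statics(&[("led_pin", 2), ("delay_ms", 2)], 0x0100, 1);
    for s in &slots {
        println!("{:<10} @ 0x{:04X} ({} bytes)", s.name, s.addr, s.size);
    }
    println!("static data: {} bytes", used);
}
```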

How does it actually perform?

There's no official benchmark yet, but demo videos and technical blog posts suggest the following:

  • Flash usage: The compiled binary is much smaller than carrying a full Lua VM. Without the interpreter, GC, and standard library runtime, only the compiled result of actually used language features remains
  • Execution speed: Pure native instruction execution — no VM dispatch loop overhead. Simple arithmetic and logic operations approach hand-written C performance
  • Startup time: No VM initialization — the program executes immediately. Critical for embedded scenarios that need fast response
  • Developer experience: Users write Lua (simple syntax, low learning curve) but actually run native machine code. Best of both worlds

The significance of this approach may not be in the raw performance numbers, but in proving one thing: a dynamic scripting language doesn’t need a VM to run on a microcontroller. Ahead-of-time compilation is a viable path.

What does this mean?

For embedded developers, tools like TinyLua offer a new option:

Approach                 | Performance | Memory usage    | Learning curve | Flexibility
Pure C/C++               | Highest     | Lowest          | Steep          | Low (recompile and reflash)
MicroPython/eLua         | Medium      | High (needs VM) | Gentle         | High (hot reload)
TinyLua (native compile) | High        | Low             | Gentle         | Medium (compile, then flash)

It's not a “replace everything” solution, but it offers an option that didn't exist before for scenarios where you want the development speed of a scripting language without giving up native performance.

These scenarios are particularly well-suited:

  1. Education: Teaching embedded programming — Lua is much friendlier than C, without exposing students to VM complexity
  2. Simple control logic: Sensor reading, motor control, LED strips — simple logic that needs fast response
  3. User-scriptable devices: Smart home gateways, industrial controllers — let users write Lua configuration logic, compile and push to device

Of course, the project is still early. Supported language features are limited, the standard library is incomplete, and compiler error messages could be more helpful. But the direction and thinking are fascinating. The Rust community has been pushing the boundaries of “infrastructure built in Rust” — databases, operating systems, browser engines, and now compilers.


FAQ

Q: Does TinyLua support the full Lua 5.x syntax?

Currently, no. It covers a Lua subset: basic data types, variables, functions, control flow, and basic table operations. Advanced features like coroutines, metatables, and dynamic require are intentionally omitted, since they're rarely needed on bare metal and expensive to implement there.

Q: Can the generated machine code run cross-platform?

No, this is native compilation. The compiler needs to know the target chip’s instruction set architecture (AVR, ARM Cortex-M, etc.) and generate platform-specific machine code. Switching chips requires recompilation.

Q: How is it different from MicroPython?

MicroPython is still an interpreter with a full VM and runtime running on your chip. TinyLua compiles the code on your PC — only native instructions go on the chip, with no runtime. Think pre-cooked meals vs a full kitchen: lighter weight vs more flexibility.

Q: What about infinite loops or infinite recursion in Lua?

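On bare metal with no OS to fall back on, an infinite loop genuinely runs forever (often on purpose: a firmware main loop never exits), and runaway recursion overflows the small stack and can silently overwrite other data. The compiler can do some static checks, like detecting loops without exit conditions, but ultimately the developer is responsible.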
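A check like that can be sketched as a walk over loop bodies in the AST. The tiny `Stmt` enum and the `can_exit`/`find_endless_loops` functions below are hypothetical; the check only recognizes literal `while true` loops and counts a `break` in the same loop, or a `return` anywhere inside, as a way out. It reports a warning rather than an error, because a firmware main loop is usually meant to run forever.

```rust
/// A tiny statement AST (hypothetical; just enough for the check).
#[allow(dead_code)]
enum Stmt {
    WhileTrue(Vec<Stmt>), // `while true do ... end`
    If(Vec<Stmt>),        // `if cond then ... end` (condition omitted)
    Break,
    Return,
    Call(&'static str),
}

/// Can this block leave the loop it belongs to?
/// `nested` is true inside an inner loop, where `break` only exits that inner loop.
fn can_exit(block: &[Stmt], nested: bool) -> bool {
    block.iter().any(|s| match s {
        Stmt::Return => true,
        Stmt::Break => !nested,
        Stmt::If(body) => can_exit(body, nested),
        Stmt::WhileTrue(body) => can_exit(body, true),
        Stmt::Call(_) => false,
    })
}

/// Collect one warning per `while true` loop with no way out.
fn find_endless_loops(block: &[Stmt], warnings: &mut Vec<String>) {
    for s in block {
        match s {
            Stmt::WhileTrue(body) => {
                if !can_exit(body, false) {
                    warnings.push("`while true` loop has no break or return".to_string());
                }
                find_endless_loops(body, warnings);
            }
            Stmt::If(body) => find_endless_loops(body, warnings),
            _ => {}
        }
    }
}

fn main() {
    // Arduino-style main loop: `while true do blink() end`
    let program = vec![Stmt::WhileTrue(vec![Stmt::Call("blink")])];
    let mut warnings = Vec::new();
    find_endless_loops(&program, &mut warnings);
    for w in &warnings {
        println!("warning: {}", w);
    }
}
```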

Q: How do you load Lua code on a device with no filesystem?

You don’t load it on the device. The code is compiled on the development machine into a binary, then flashed to the chip via ISP/JTAG/USB — exactly the same process as burning C firmware.