An Ordinary Tuesday Morning, CI Suddenly Got 35% Faster

Over morning tea, a colleague posted in the group chat: “Hey, did we upgrade the CI machines today? Why is it running so fast?”

I paused, switched to the CI dashboard—our pipeline that had been running like clockwork at 42 minutes for eight months was now showing 27 minutes. I refreshed. Still 27 minutes and 14 seconds.

I hadn’t changed any code, added any caching, or stayed up late tweaking configurations. The Rust compiler had quietly upgraded.

Just like that, overnight, our CI wall-clock time dropped by 35%. For the first time, it really sank in: the optimizations compiler teams ship are genuine “freebies” for those of us writing business code.

But hold on, it’s not that simple.

When Compilation Gets Slow, It Actually Kills You

You might say, slow compilation? Just grab a coffee and come back, right?

But if you face this scenario every single day, you won’t say that anymore.

When a full compilation takes over 40 minutes, engineers start avoiding things. They become reluctant to rebase, because it means recompiling from scratch. They shy away from large refactors, because changing one file can mean a 30-minute wait just to verify it. CI queues pile up, and urgent hotfixes sit in line behind feature branches.

Our team was in exactly this state. In the days before each release, people hardly dared touch the code, because they knew any change meant waiting through another compile. Worse, some people gave up on incremental builds entirely and ran clean builds every time; after all, it was CI’s machines and CI’s time being wasted, not theirs.

This phenomenon has been given a name: “compilation fatigue.” In plain terms, when the compiler gets slow enough, it starts stealing your team’s productivity, confidence, and sleep.

The “Folk Remedies” We Tried Over the Years

Faced with slow compilation, most teams reach first for the same seemingly reasonable fixes:

Buy faster machines. 8 cores not enough? Get 16. 16 not enough? Get 32. This works, right up until the CI bill spirals out of control and you realize no machine is fast enough to outrun codebase growth.

Enable incremental compilation. Rust’s incremental compilation is on by default for dev builds (and off by default for release builds), but the actual effect varies wildly: sometimes changing one line barely triggers any recompilation, and sometimes it sets off a small nuclear explosion that rebuilds half the dependency graph. Unpredictable means untrustworthy.
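To make the unpredictability concrete, here’s a distilled sketch (the crate and function names are hypothetical, not our real code). A one-line edit to a generic function in a foundational crate forces downstream crates to redo every monomorphized instantiation, while an equivalent edit to a non-generic function leaves far less to redo:

```rust
// Imagine this lives in `core_utils`, a crate most of the workspace uses.

// Generic: the body is effectively compiled again in every downstream
// crate, once per concrete `T` it's called with. Editing this one line
// can ripple workspace-wide.
pub fn parse_field<T: std::str::FromStr>(raw: &str) -> Option<T> {
    raw.trim().parse().ok()
}

// Non-generic: compiled once, here. Downstream crates still rebuild,
// but incremental compilation can reuse most of their previous work.
pub fn normalize_field(raw: &str) -> String {
    raw.trim().to_lowercase()
}
```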

Split large crates. This works too, but splitting has costs: you need to redesign module boundaries, handle cross-crate dependencies, and accept that IDE support may get worse.
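For what it’s worth, the split hurt less once we kept a thin facade crate that re-exports the new pieces, so call sites didn’t have to change. A minimal sketch, with hypothetical crate names:

```rust
// `app_core/src/lib.rs` becomes a facade: the heavy code moves into
// smaller subcrates that compile in parallel and recompile
// independently, while downstream code keeps writing
// `use app_core::{Order, load_orders};` exactly as before.
pub use app_core_io::{load_orders, save_orders};
pub use app_core_model::{Order, User};
```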

After trying all these methods, we discovered a problem: when the codebase grows large enough, the bottleneck is no longer any single stage, but the coordination overhead of the entire compiler pipeline itself.

Frontend analysis, monomorphization, code generation, linking: in principle these stages can overlap, but in practice they compete for global locks and shared queues. On our 8-core CI runner, CPU utilization hovered around 40 to 50 percent, with half the cores idling while the rest waited their turn.
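Stripped of everything rustc-specific, the shape of the problem looks like this toy (a sketch of the general contention pattern, not of how the compiler is actually written): eight workers all pulling from one lock-guarded queue serialize on the lock, and utilization ends up looking a lot like our 40 to 50 percent.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // One global, lock-guarded work queue shared by all workers.
    let queue = Arc::new(Mutex::new((0..1_000_000u64).collect::<Vec<_>>()));
    let mut handles = Vec::new();

    for _ in 0..8 {
        let queue = Arc::clone(&queue);
        handles.push(thread::spawn(move || {
            let mut local_sum = 0u64;
            loop {
                // The lock is the global synchronization point: only one
                // worker can dequeue at a time, so cheap work serializes
                // and the other cores sit idle.
                let item = queue.lock().unwrap().pop();
                match item {
                    Some(x) => local_sum += x,
                    None => break,
                }
            }
            local_sum
        }));
    }

    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("sum = {total}");
}
```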

What Actually Changed in the Rust Compiler Pipeline?

The new pipeline didn’t make any individual compiler pass faster; it rearranged and recombined the work.

It decouples the analysis, code generation, and linking stages more aggressively. Scheduling becomes more fine-grained, and there are far fewer global synchronization points: the critical barriers that used to make the entire build sit and wait.

[Figure: Rust compiler pipeline comparison, old vs. new]

For projects like ours with large crates, heavy generic usage, and deep dependency graphs, this change was practically tailor-made.

Here’s a concrete example. In the code generation stage, the compiler used to be nominally parallel, but much of the work actually queued behind a global scheduler. Now each compilation unit is more independent and can make better use of multiple cores.
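A rough illustration of why this matters for generic-heavy code (the example is contrived, but the mechanism is real): every concrete instantiation of a generic function is a separate piece of backend work, and how freely those pieces can be scheduled across cores is exactly what the pipeline change improves.

```rust
// Each instantiation below is its own codegen item. With many types and
// many generic functions, the backend has thousands of such items to
// compile; the less they queue behind a global scheduler, the better
// the cores are used.
fn encode<T: std::fmt::Debug>(value: &T) -> String {
    format!("{value:?}")
}

fn main() {
    let _ = encode(&42_u64);        // encode::<u64>
    let _ = encode(&"hello");       // encode::<&str>
    let _ = encode(&vec![1, 2, 3]); // encode::<Vec<i32>>
}
```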

So we saw exactly the change you’d expect: CPU utilization jumped from 40-50% to 85-90%. The machines weren’t doing more work; they were finally doing the same work without waiting.

The Data Doesn’t Lie, But It Makes You Think

In the first week after upgrading, we collected a bunch of data with some interesting conclusions:

[Chart: build time comparison, before vs. after the upgrade]

Full builds (clean build):

  • P50 time: Dropped from 31 minutes to 20 minutes
  • P99 time: Dropped from 46 minutes to 30 minutes
  • Changed nothing, adjusted no configs, just faster

Incremental builds:

  • P50 improved by about 20%, small changes still fast
  • P99 actually slowed by 10-15%, especially in generic-heavy modules

This is a bit awkward.

Incremental builds were never predictable, but at least the spread used to be narrower. Now the median is better while the extremes are worse: some changes compile about as fast as before, and some are actually slower.

Why? Because the new pipeline made compilation units more independent. That improved parallelism, but it also exposed problems that global stalls used to hide. Some modules secretly depend on the entire world; before, the global waiting masked it. Now each module’s true cost is laid out on the flame graph, plain to see.
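The classic shape of a module that “depends on the entire world” (a distilled example, not our actual code) is a kitchen-sink prelude crate. Because cargo rebuilds a crate whenever any of its dependencies change, everyone who imports the prelude inherits everyone else’s changes:

```rust
// crates/prelude/src/lib.rs — convenient, but it couples every consumer
// to every crate it re-exports. A change in `metrics` now looks like a
// change to `prelude`, and everything downstream of it rebuilds.
pub use config::*;
pub use db::*;
pub use http::*;
pub use metrics::*;
```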

Our engineers took a while to adapt. Some complained: “This change used to take 5 minutes, why did it just take 7?” What they didn’t notice was that the 5-minute waits had become rarer; most of the time the build now finished in 3.

It’s just that human brains are naturally more sensitive to negative information, so occasional slowdowns are easier to remember than consistent speedups.

Memory, That Old Friend, Found a New Way to Torture Us

Our CI’s other pain point was the linker. Peak RSS would spike, and jobs on shared CI runners would get OOM-killed.

The new pipeline makes one deliberate trade: it keeps more intermediate state in memory at once in order to schedule work more aggressively in parallel.

The result: memory peaks arrive earlier, but the curve is smoother. Total memory usage is slightly higher, with no more scary spikes. For CI machines already tight on memory, this is actually good news: failures come earlier and more predictably, so at least you know where the problem is, instead of a job dying mysteriously at some random point.

But if your CI machines are already configured on the edge, this upgrade might make you hit the memory ceiling earlier.

A Few Hard-Won Lessons

If you also want to upgrade the Rust compiler to get this 35% speedup, here are some suggestions we earned through actual experience:

Don’t blindly upgrade CI. If your CI is already hovering at the memory limit, this upgrade might not help and could even trigger earlier OOMs. Try it on low-traffic runners first and confirm it behaves before rolling it out widely.

This isn’t a silver bullet. If your compilation time mostly goes to macro expansion, build scripts, or code generation outside rustc (bindgen, for example), this upgrade will help much less. It solves the compiler’s internal coordination problem, not every compilation bottleneck.
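One cheap win in that category, though: make sure build scripts declare their real inputs. By default, cargo re-runs a build script whenever any file in the package changes; a couple of `rerun-if-changed` lines can narrow that dramatically. A minimal sketch (the paths are hypothetical):

```rust
// build.rs
fn main() {
    // Only re-run this script when these inputs actually change, instead
    // of on every change anywhere in the package (cargo's default when
    // no rerun-if-changed directive is printed).
    println!("cargo:rerun-if-changed=bindings/api.h");
    println!("cargo:rerun-if-changed=build.rs");
    // ...the expensive generation step (bindgen, protoc, etc.) goes here
}
```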

The upgrade is also a mirror that exposes crate structure problems. Poorly structured crates become glaringly obvious: too many generics, tangled dependencies, bad module boundaries. Before, those just meant slow compiles; now you’ll find that certain changes trigger massive recompilation. In a way that’s a good thing, because at least you know where the problem is.

After upgrading, profile once. Run CPU and memory profiling on one clean build and see where the time goes; cargo even ships a starting point in `cargo build --timings`, which writes an HTML report of per-crate compile times. Then optimize with intent, like splitting the crate that hogs 30% of code generation time. That’s the right way to use this upgrade: as a forcing function that pushes you to fix the structural issues you’ve been putting off.

So, Is This Upgrade Worth It?

The 35% reduction in compilation time is real; the numbers on our CI dashboard don’t lie. But the 35% isn’t free. It comes with prerequisites:

  • You need enough memory headroom to support higher peak concurrency.
  • You need to accept that incremental build variance will increase.
  • You need to treat this upgrade as an occasion to review and fix the crate structure issues you’ve long ignored.

If your compilation time is already short (a few minutes), you might barely notice this upgrade. If your CI machines are memory-starved, you might even regress. And if your crate graph is a mess, the compiler will now show you that problem in full detail.

But if you’re like us, with a medium-to-large codebase, some resource headroom, and a need for a catalyst to finally take compilation optimization seriously, then that 35% speedup is there for the taking.

More importantly, it taught me one thing: performance optimization rewards those who measure and punishes those who guess.

Before, when we discussed compilation optimization, we went by feel: “This module should be slow, right?” “That dependency looks heavy.” Now? Pull up a flame graph and let the data talk. Which crate takes the most time? Which one is most worth splitting? Every answer has data behind it.

Final Thoughts

This compiler upgrade made me realize something: when we complain about slow compilation, it’s rarely just about the slowness. Behind it are poor resource utilization, unclear module boundaries, and a mismatch between the toolchain and the code structure.

The Rust compiler team put in enormous effort redesigning the pipeline to hand us this 35% speedup. As users, we shouldn’t just lie back and freeload; we should take the opportunity to fix the issues that actually block our efficiency.

Next time CI finishes running, spend three minutes looking at the flame graph. You might discover that module you always thought “should be fine” is actually secretly eating up 40% of your compilation time.


If this article helped you, please like and share it so more colleagues still suffering through long compiles can see it. And feel free to share your own compilation optimization experiences in the comments: what pitfalls have you hit, and what tricks have worked?

Follow Dream Beast Programming; we’ll keep digging into Rust performance optimization stories in the next article.