Programming The Internet

Cloudflare Raves About Performance Gains After Rust Rewrite (cloudflare.com) 53

"We've spent the last year rebuilding major components of our system," Cloudflare announced this week, "and we've just slashed the latency of traffic passing through our network for millions of our customers." (There's a 10ms cut in the median time to respond, plus a 25% performance boost as measured by CDN performance tests.) They replaced a 15-year-old system named FL (where they run security and performance features), and "At the same time, we've made our system more secure, and we've reduced the time it takes for us to build and release new products."

And yes, Rust was involved: We write a lot of Rust, and we've gotten pretty good at it... We built FL2 in Rust, on Oxy [Cloudflare's Rust-based next generation proxy framework], and built a strict module framework to structure all the logic in FL2... Built in Rust, [Oxy] eliminates entire classes of bugs that plagued our Nginx/LuaJIT-based FL1, like memory safety issues and data races, while delivering C-level performance. At Cloudflare's scale, those guarantees aren't nice-to-haves, they're essential. Every microsecond saved per request translates into tangible improvements in user experience, and every crash or edge case avoided keeps the Internet running smoothly. Rust's strict compile-time guarantees also pair perfectly with FL2's modular architecture, where we enforce clear contracts between product modules and their inputs and outputs...
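
As a rough illustration of what "clear contracts between product modules and their inputs and outputs" could look like in Rust, here is a minimal sketch. Every name in it (`Module`, `RequestMeta`, `Decision`, `PathBlocker`, `run_pipeline`) is hypothetical and chosen for illustration; none of it is taken from FL2 or Oxy.

```rust
// Hypothetical sketch: none of these names come from FL2 or Oxy.

/// Read-only request data that every module receives.
#[derive(Debug, Clone)]
pub struct RequestMeta {
    pub host: String,
    pub path: String,
}

/// The only thing a module may hand back to the framework.
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    Continue,
    Block { status: u16 },
}

/// The contract: every product module takes the same typed input and
/// returns the same typed output, so anything else fails to compile.
pub trait Module {
    fn name(&self) -> &'static str;
    fn run(&self, req: &RequestMeta) -> Decision;
}

/// Example module: blocks requests whose path starts with a prefix.
pub struct PathBlocker {
    pub prefix: &'static str,
}

impl Module for PathBlocker {
    fn name(&self) -> &'static str {
        "path_blocker"
    }

    fn run(&self, req: &RequestMeta) -> Decision {
        if req.path.starts_with(self.prefix) {
            Decision::Block { status: 403 }
        } else {
            Decision::Continue
        }
    }
}

/// Runs modules in order and stops at the first non-Continue decision.
pub fn run_pipeline(modules: &[Box<dyn Module>], req: &RequestMeta) -> Decision {
    for m in modules {
        let decision = m.run(req);
        if decision != Decision::Continue {
            return decision;
        }
    }
    Decision::Continue
}
```

Because a module can only see a `&RequestMeta` and can only return a `Decision`, a change inside one module cannot silently alter what another module receives, which is the kind of property a compile-time contract is meant to give.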

It's a big enough distraction from shipping products to customers to rebuild product logic in Rust. Asking all our teams to maintain two versions of their product logic, and reimplement every change a second time until we finished our migration was too much. So, we implemented a layer in our old NGINX and OpenResty based FL which allowed the new modules to be run. Instead of maintaining a parallel implementation, teams could implement their logic in Rust, and replace their old Lua logic with that, without waiting for the full replacement of the old system.

Over 100 engineers worked on FL2 — and there was extensive testing, plus a fallback-to-FL1 procedure. But "We started running customer traffic through FL2 early in 2025, and have been progressively increasing the amount of traffic served throughout the year...." As we described at the start of this post, FL2 is substantially faster than FL1. The biggest reason for this is simply that FL2 performs less work [thanks to filters controlling whether modules need to run]... Another huge reason for better performance is that FL2 is a single codebase, implemented in a performance focussed language. In comparison, FL1 was based on NGINX (which is written in C), combined with LuaJIT (Lua, and C interface layers), and also contained plenty of Rust modules. In FL1, we spent a lot of time and memory converting data from the representation needed by one language, to the representation needed by another. As a result, our internal measures show that FL2 uses less than half the CPU of FL1, and much less than half the memory. That's a huge bonus — we can spend the CPU on delivering more and more features for our customers!
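
The "performs less work" point, where filters decide whether a module needs to run at all, can be sketched as follows. This is only an assumption about the general shape of such a design: `FilteredModule`, `dispatch`, and the two example filters are hypothetical names, not Cloudflare's.

```rust
// Hypothetical sketch of filter-gated modules; all names are illustrative.

pub struct Request {
    pub path: String,
    pub has_cookie: bool,
}

pub struct FilteredModule {
    pub name: &'static str,
    /// Cheap predicate: does this module need to run for this request?
    pub filter: fn(&Request) -> bool,
    /// The (potentially expensive) module body.
    pub run: fn(&Request),
}

/// Runs only the modules whose filter matches and returns their names,
/// so unrelated modules cost one predicate call instead of a full run.
pub fn dispatch(modules: &[FilteredModule], req: &Request) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for m in modules {
        if (m.filter)(req) {
            (m.run)(req);
            ran.push(m.name);
        }
    }
    ran
}

// Stand-in module body; a real module would do its work here.
fn noop(_req: &Request) {}

fn is_api(req: &Request) -> bool {
    req.path.starts_with("/api/")
}

fn has_cookie(req: &Request) -> bool {
    req.has_cookie
}

/// Two example modules, each gated by its own filter.
pub fn example_modules() -> Vec<FilteredModule> {
    vec![
        FilteredModule { name: "api_shield", filter: is_api, run: noop },
        FilteredModule { name: "session_check", filter: has_cookie, run: noop },
    ]
}
```

With these example modules, a request for `/api/v1/users` without a cookie runs only `api_shield`, and a plain request for `/` runs nothing.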

Using our own tools and independent benchmarks like CDNPerf, we measured the impact of FL2 as we rolled it out across the network. The results are clear: websites are responding 10 ms faster at the median, a 25% performance boost. FL2 is also more secure by design than FL1. No software system is perfect, but the Rust language brings us huge benefits over LuaJIT. Rust has strong compile-time memory checks and a type system that avoids large classes of errors. Combine that with our rigid module system, and we can make most changes with high confidence...

We have long followed a policy that any unexplained crash of our systems needs to be investigated as a high priority. We won't be relaxing that policy, though the main cause of novel crashes in FL2 so far has been due to hardware failure. The massively reduced rates of such crashes will give us time to do a good job of such investigations. We're spending the rest of 2025 completing the migration from FL1 to FL2, and will turn off FL1 in early 2026. We're already seeing the benefits in terms of customer performance and speed of development, and we're looking forward to giving these to all our customers.

After that, when everything is modular, in Rust and tested and scaled, we can really start to optimize...!

Thanks to long-time Slashdot reader Beeftopia for sharing the article.

  • by TurboStar ( 712836 ) on Sunday November 02, 2025 @12:59AM (#65767056)

    The headline is misleading. Rust has nothing to do with the performance gains. They rewrote an entire system using what they learned from the previous version. Sometimes this is the right thing to do. It's not always easy to predict, but when folks start raving about the rewrite then it was probably a good decision.

    • I agree, most performance issues are due to a misdirected architecture, not language.

      Not all architectural performance bottlenecks were a problem initially, but as the system grows they become more and more noticeable.

      • by ArmoredDragon ( 3450605 ) on Sunday November 02, 2025 @02:58AM (#65767124)

        People who say this usually haven't tried to write simultaneously multithreaded and concurrent applications in a systems language. Shit, rust makes it easier to do that than even "easy" higher level languages that were specifically designed for it from the beginning, like go.

        • Ok so they didn't know how to make the changes to make it go faster without rust. That just means they are inept, not that rust made it go faster.
    • by 93 Escort Wagon ( 326346 ) on Sunday November 02, 2025 @02:14AM (#65767102)

      The headline is misleading.

      At a minimum, the headline is certainly (and perhaps intentionally?) ambiguous. The "summary" - which probably includes the entire blog post - does make it pretty obvious that rust was not the reason for the speedup, although their choice of rust certainly makes prima facie sense.

      I'm a little surprised that replacing a bunch of old disparate software that's basically hacked together with a scripting language (obligatory xkcd [xkcd.com]) with a new custom compiled job only resulted in a 25% speed-up.

      • Re: (Score:2, Informative)

        The "summary" - which probably includes the entire blog post ...

        You know, assumptions are a tricky thing to base your reasoning on. But, as a hint, CF blog posts tend to be on the longer side, with interesting technical details, so it's kind of sad to see that TFS does not include a link to the source - although it aligns with the tradition of not RTFA around here, so maybe-ok job EditorDavid? Anyway, for your reading pleasure, this appears to be the missing link [cloudflare.com].

        And btw, this:

        I'm a little surprised that re

    • by arglebargle_xiv ( 2212710 ) on Sunday November 02, 2025 @02:36AM (#65767108)
      Came here to say the same thing. There have been several claims of the superiority of language or methodology X which, on closer examination, turned out to be caused by a rewrite of some old cobbled-together mess accumulated over a 20-year period with a new, properly-designed replacement. You could have replaced it with something new written in Visual Basic and seen an improvement.
    • by Somervillain ( 4719341 ) on Sunday November 02, 2025 @11:48AM (#65767762)

      The headline is misleading. Rust has nothing to do with the performance gains. They rewrote an entire system using what they learned from the previous version. Sometimes this is the right thing to do. It's not always easy to predict, but when folks start raving about the rewrite then it was probably a good decision.

      Having seen the difference between something in Go and Rust, the hype is real. Go is a dogshit slow language....much slower than Java, but a little faster than Python. I ported some Go testing utilities some dipshit at my company wrote to rust...VERY tangible difference...probably 25%. Had full confirmation from the team there was no loss in functionality. The go code was even well written, IMO, certainly no obvious explanations for the bad performance...it's just a shitty slow language...same with Python, JavaScript, etc. We've seen similar results porting node.js or Go garbage from old teams into Java.

      If you don't care about performance or efficiency or cloud spend...write in whatever you want...when money is on the line?...it's pretty common your toy prototypes need to be rewritten in grown-up languages like Rust or Java or even C/C++. Facebook famously started on PHP and had to rewrite everything because the language couldn't handle a site that big. We ported a bloated boondoggle Python app to Java...gave us MASSIVE cloud spend bills...reduced to less than half when porting to Java. We used to require 12-20 instances and now it's like 4-8...response time is 1/3 of what it used to be, etc. Admittedly, a modest fraction of that is what you described...re-examining old functionality, but most of the latency was really just removing the Python overhead.

      Replacing Lua with Rust is like replacing a heavy steel beach cruiser with a modern carbon-fiber or titanium racing bike. You WILL see a massive performance boost. You could claim the cyclist just got better and yeah...the operator makes a bigger difference than the tool...but....shitty tools are shitty tools. Lua...FFS...who the fuck would use that for mission critical infrastructure? Isn't Lua a kid-friendly scripting language? No scripting language should be in charge of anything you want performance and efficiency from.

      • To be fair there's a common way to compile Lua to JVM bytecode so it's likely just a Java front-end, not using the basic interpreter.

        Back in the day there was a craze to port Lua, Ruby, Perl, Groovy(!), to run as Java front-ends. Not many got put into production outside of Lua.

        However the real point here is that it's now "tell me why I shouldn't use Rust" time.

        Moving ABI might be a reasonable objection for a small team but Cloudflare has over a hundred engineers on this so it's not a problem.

        They get speed

    • by shanen ( 462549 )

      Okay FP, but I think there is actually a term "second-system effect" to describe it. I just confirmed it's in the old jargon file. (I even had a dead tree version a long time ago...)

  • by boa ( 96754 ) on Sunday November 02, 2025 @01:09AM (#65767066)

    I mean, come on. Everybody knows that. They could've implemented the Lua parts in C as well, and then compare performance.

    • by kertaamo ( 16100 )

      Yes, but that would have been a silly thing to do. Performance is not the only point here.

    • by bsolar ( 1176767 )

      I mean, come on. Everybody knows that. They could've implemented the Lua parts in C as well, and then compare performance.

      According to the article they used LuaJIT. I would not be surprised in their use case to get basically equivalent C performance.

      They did state the main reason for the better performance: the new implementation has less logic, and by using Rust cohesively instead of mixing it with C/Lua components there is no need for "translation layers" between languages anymore.

      They could have achieved the same by consolidating to C, but of course Rust brings additional important advantages for them.

    • Usually I see a headline like this and I'm like "yeah, the original code sucked and would have been made faster by rewriting it in the original language too".

      And I'm 100% confident that this is the case here too. But Lua is slow as fuck and I'm sure that transition did help considerably.

  • I have noticed it not being quite the latency fiend it used to be. I wonder if it's related or maybe someone just unkinked the Cat5 at the ISP by coincidence.
  • by Rosco P. Coltrane ( 209368 ) on Sunday November 02, 2025 @01:37AM (#65767080)

    and the ubiquitous surveillance of large swathes of the internet as well. Woohoo!

  • by registrations_suck ( 1075251 ) on Sunday November 02, 2025 @03:56AM (#65767164)

    Does "better performance" include inducing major outages or are those conveniently ignored?

  • In the Iranian Zolympics he/she used friction tape to predict the Simpsons. Zoho One the heat, but not without social controversy. Semipro trainers used the analogy to cover other Zoho integrations as well.
  • Let's assume, for the purpose of argument, that the reported performance gains are real and not merely artifacts of the measurement process.

    1. It's more than likely that the original code simply wasn't well-written. First generation code often isn't.

    2. Even if the original code was well-written, a rewrite is highly likely to produce improvements -- presuming that the authors of the second-generation code studied what already existed and thought carefully about its issues/problems.

    In other words, I
    • First generation code is more often than not written with theoretical use-cases in mind. Second generation code is usually written with hindsight of how the 1st generation code was actually used in practice. You seem to call that "better", I'd rather use "more insightful".

  • There is only one reliable way to improve the speed of code: performance tuning / profiling. At least 90% of performance problems exist because nobody bothered to profile the code.

    C / C++ are inherently faster than languages like Rust, because they play fast and loose with memory and references, leaving it to the programmer to make sure they properly release unused memory, and not try to reference unallocated memory. But those differences in inherent speed are vastly overwhelmed by poorly structured code. S

  • . . . including its runtime, garbage collection, and compiler. For high-transaction-rate, highly parallel I/O use cases - such as at Cloudflare - that means a LuaJIT application, even when deployed across multiple processes running on the same box, will likely be hamstrung in comparison to even moderately competently written Rust, Java, or C.

    Or am I missing something obvious?
