
Google's Chrome Team Evaluates Retrofitting Temporal Memory Safety on C++ (googleblog.com)

"C++ allows for writing high-performance applications but this comes at a price, security..." So says Google's Chrome security team in a recent blog post, adding that in general, "While there is appetite for different languages than C++ with stronger memory safety guarantees, large codebases such as Chromium will use C++ for the foreseeable future."

So the post discusses "our journey of using heap scanning technologies to improve memory safety of C++." The basic idea is to put explicitly freed memory into quarantine and only make it available when a certain safety condition is reached. Microsoft has shipped versions of this mitigation in its browsers: MemoryProtector in Internet Explorer in 2014 and its successor MemGC in (pre-Chromium) Edge in 2015. In the Linux kernel a probabilistic approach was used where memory was eventually just recycled. And this approach has seen attention in academia in recent years with the MarkUs paper. The rest of this article summarizes our journey of experimenting with quarantines and heap scanning in Chrome.
In essence, the C++ memory allocator (used by new and delete) is "intercepted." There are various hardening options, each of which comes with a performance cost (see the sketch after this list):


- Overwrite the quarantined memory with special values (e.g. zero);

- Stop all application threads when the scan is running or scan the heap concurrently;

- Intercept memory writes (e.g. by page protection) to catch pointer updates;

- Scan memory word by word for possible pointers (conservative handling) or provide descriptors for objects (precise handling);

- Segregating application memory into safe and unsafe partitions to opt out certain objects which are either performance-sensitive or can be statically proven safe to skip;

- Scan the execution stack in addition to just scanning heap memory...
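
To make the combination of quarantine plus conservative scanning concrete, here is a deliberately simplified C++ sketch (the names QuarantineFree and ScanAndRelease are made up for illustration; Chrome's production allocator, PartitionAlloc, is far more involved):

    // Freed blocks are poisoned and parked in a quarantine; a conservative
    // scan over a memory region then releases only the blocks that nothing
    // scanned still points into. Assumes blocks originally came from malloc.
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <cstring>
    #include <unordered_map>

    struct Block { std::size_t size; };
    static std::unordered_map<std::uintptr_t, Block> g_quarantine;

    // Intercepted "free": overwrite with zeroes (one of the hardening
    // options above) and quarantine instead of releasing immediately.
    void QuarantineFree(void* p, std::size_t size) {
      std::memset(p, 0, size);
      g_quarantine[reinterpret_cast<std::uintptr_t>(p)] = Block{size};
    }

    // Conservative handling: every pointer-sized word in [begin, end) is
    // treated as a possible pointer into the quarantine.
    void ScanAndRelease(const std::uintptr_t* begin, const std::uintptr_t* end) {
      for (auto it = g_quarantine.begin(); it != g_quarantine.end();) {
        bool referenced = false;
        for (const std::uintptr_t* w = begin; w != end && !referenced; ++w)
          referenced = (*w >= it->first && *w < it->first + it->second.size);
        if (referenced) {
          ++it;  // possibly still reachable: keep it quarantined
        } else {
          std::free(reinterpret_cast<void*>(it->first));
          it = g_quarantine.erase(it);
        }
      }
    }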


Running our basic version on Speedometer2 regresses the total score by 8%. Bummer...

To reduce the regression we implemented various optimizations that improve the raw scanning speed. Naturally, the fastest way to scan memory is to not scan it at all and so we partitioned the heap into two classes: memory that can contain pointers and memory that we can statically prove to not contain pointers, e.g. strings. We avoid scanning memory that cannot contain any pointers. Note that such memory is still part of the quarantine, it is just not scanned....
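
A hypothetical sketch of that split (the enum and names below are made up for illustration; ConservativeScan stands in for the word-by-word scan sketched above):

    // Allocations that provably cannot hold pointers (e.g. string payloads)
    // are tagged at allocation time. They still pass through quarantine,
    // but the scanner skips them entirely.
    #include <cstddef>
    #include <vector>

    enum class Partition { kMayContainPointers, kPointerFree };
    struct Allocation { void* ptr; std::size_t size; Partition partition; };

    void ConservativeScan(void* p, std::size_t n);  // word-by-word, as above

    void ScanQuarantine(const std::vector<Allocation>& quarantine) {
      for (const Allocation& a : quarantine) {
        if (a.partition == Partition::kPointerFree)
          continue;  // quarantined, but never scanned
        ConservativeScan(a.ptr, a.size);
      }
    }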

[That and other] optimizations helped to reduce the Speedometer2 regression from 8% down to 2%.

Thanks to Slashdot reader Hari Pota for sharing the link.
  • ...couldn't they scramble + randomly re-write the memory before freeing it, so it's freed faster and useless for re-reading?

    I'd call this function "scramem" XD

    • That has the same overhead as zeroing it.

      • by kmoser ( 1469707 )
        Don't most CPUs have native opcodes for filling a block of RAM with zeroes, or any given value, that would be faster than a multi-op loop? Basically blitting, but not necessarily to video RAM.
        • pointers.
          c plus pluses advantage.
          and.
          c plus pluses curse

        • by slack_justyb ( 862874 ) on Monday June 06, 2022 @06:52PM (#62598592)

          Don't most CPUs have native opcodes for filling a block of RAM with zeroes

          For totally generic x86 targets? No. With MMX stuff you can movq 64 bits at a time and keep things in order. Best is a 64-bit write through a pointer to long long or __m64, but you must be 8-byte aligned, and even then there are going to be a couple of cycles to revalidate the cache if we're talking about something active. On SSE, movaps exists, but the address has to be 16-byte aligned, which would require movsb until you get there; even then that's still looping, just a bit faster.
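
          Roughly, the aligned-SSE version looks like this (a sketch; ZeroAligned16 is a made-up name, and it assumes a 16-byte-aligned buffer whose size is a multiple of 16 — a real routine, or just memset, handles the ragged edges):

              #include <immintrin.h>
              #include <cstddef>

              // Zero a 16-byte-aligned buffer using movdqa (aligned 128-bit
              // stores). Assumes `bytes` is a multiple of 16.
              void ZeroAligned16(void* p, std::size_t bytes) {
                const __m128i zero = _mm_setzero_si128();
                __m128i* dst = static_cast<__m128i*>(p);
                for (std::size_t i = 0; i < bytes / 16; ++i)
                  _mm_store_si128(dst + i, zero);
              }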

          Modern CPUs are not really fast PDP-11 machines. There are all kinds of different levels of pipes, caches, and prediction machinery that are better tuned to the looping process. The prediction pipes are made to optimize patterns they keep seeing, which loops tend to produce, especially this kind of loop. The core is going to engage the prefetcher and mark the addresses exclusively held. Once that happens, you write to RAM as fast as you can get the cached page onto the bus, which is as fast as you're going to get.

          You'll note I didn't mention vmovdqa32 from AVX-512. That's because the instruction separates cache validation from memory write-back: what's in the cache might be in a state of flux, and if you're doing scientific computations as fast as possible, you'll want to wait to flush to RAM until the dust settles. Atomic instructions in x86 only complete once all caches are updated, and only if the page alignment hits just right; if not, it's twice the cycles, because you pay once for each fragment of memory you cross an alignment boundary on. The non-atomic methods seem tempting, but the processor will fight you over this in ways that are frustratingly hard to predict.

          Modern CPUs are doing lots of things that aren't a 1-to-1 analog to C code, or hell, even assembly. Keeping the cache validated has an entire microcode architecture that handles the prefetch, directory updates, invalidation, and signaling exclusive or non-exclusive control to the other cores. Way back in the day one could push DMA cycles to force write-backs, but you did so at your own risk, because there wasn't any way for the DMA controller to have a full understanding of the lay of memory inside the CPU's cache. Today, CPUs won't let you do this and instead force you to opt into the directory-based ledger that's kept in L3.

          Basically blitting

          Blitting works because the page of memory is the source of truth. In video, what I write to memory is what I expect to appear. There isn't really any additional processing left: here are some bits, make them appear. So yes, you can do that with video memory. You cannot exactly do that with RAM. If the CPU still has instructions cached from what you are blitting to, what happens when the CPU is done and the cache is flushed back to RAM? Who is the source of truth here? And there are valid reasons for both answers. Blitting makes sense when there is one and only one source of truth; you wouldn't blit to a memory page that the GL pipe is currently writing to, for the same reason. Who is supposed to win here? You find a use case for that logic far less often in GPUs than you would in a CPU.

          • by kmoser ( 1469707 )
            Wow, wish I had mod points to upvote your answer, which was more useful than my question.
        • Lots of ways. XOR it with itself, AND it with zero, etc. Zeroing costs almost nothing, but either way the proposal here is doing more than nothing.

          A lot of this sounds like they're trying to reinvent the garbage collector.

          But then, Rust is a thing now, and it doesn't need any of this crap to do what they want.

    • In principle, that should be a lower energy option and possibly faster: if they 0 it, they discharge all the electrons, and have to put them back in at next writes. If they do something that keeps them there in a random state/distributed state with the same total voltage, they should be able to use some of that voltage to power the next writes, even while the last state is unreadable.

      But whether that has anything to do with current RAM or the way things are actually set up, I doubt.

      • by Anonymous Coward

        if they 0 it, they discharge all the electrons, and have to put them back in at next writes.

        You need to have an understanding of how DRAM works before you make up silly ideas like this. DRAM is made up of a whole bunch of tiny capacitors. Unfortunately, these capacitors have leakage (they gradually self-discharge), thus requiring constant "refresh": the hardware periodically reads each value and recharges the capacitor back up to its fully charged state (if it was read as a 1).

        If you cleared memory to all zeros, then the periodic refreshing would not have to recharge anything.

          I don't see anything here that says I am wrong in principle, though there are probably wise financial reasons why DRAM designers don't worry about it.

          My point is that increasing entropy is free, whether local or global. If you let them zero out, it's like letting water drain out of a row of buckets (where 1 is full and 0 is empty) into the gutter, increasing global entropy. But if you can, for instance, let the whole group of buckets drain only into each other, you increase entropy locally, but still have some water.

          Oh, btw, for clarity: insofar as you are saying preserving the random 1s long-term is more expensive than preserving 0s, you are right. I am saying that in very active memory, if you can find a way to not let all that heat flow out from erasure, it would be more efficient. The refresh interval of DDR2 SDRAM is 64 ms and the amount of leakage is going down all the time, so I think it's relevant.

    • by Miamicanes ( 730264 ) on Sunday June 05, 2022 @09:50PM (#62596104)

      I don't remember the exact scenario, but I think one of the recent CPU vulnerabilities happened because "zeroing out" a block of memory didn't necessarily propagate from cache to DRAM before another core could physically read it.

      I think somewhere in the memory controller, there was a performance-optimization-induced bug where malicious code abused the cache-control opcodes to make it think it could save time & skip the final step (copying from cache to DRAM), then read the original values at its leisure.

      High-performance cache really throws a monkey wrench into security when you have to assume multiple threads from the same user are mutually hostile, and the new security reality completely wrecks assumptions that Windows, VMS, Unix, and other OSes have made for decades (and that Intel/AMD/ARM optimized for).

  • Rewrite the whole thing in Rust. You know you want to... :)

    (yes, I know that's impractical in the short term)

    • by Anonymous Coward

      more likely, rewrite in modern C++20 with better smart pointers

      • Smart pointers are ok. I love that there is a separation for unique and shared. I used this pattern for many years.

        I think they could have done far better. This is far too much manual labor and much too high a risk of accidental abuse. I feel that rather than offering memory safety, it offers convenience and, more importantly, a great way to defer cleanup. I have always hated how delete and free were fatalistically immediate.

        Also, C++ will always suck as long as memory isn't relocatable.
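
        For anyone who hasn't used the pattern, a minimal sketch of the unique/shared split (the types here are purely illustrative):

            #include <memory>
            #include <string>

            struct Session { std::string user; };

            int main() {
              // Exactly one owner; freed deterministically at end of scope.
              auto doc = std::make_unique<std::string>("owned by one");

              // Reference-counted; freed when the last owner goes away.
              auto s1 = std::make_shared<Session>();
              auto s2 = s1;  // refcount is now 2
            }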
        • I'll second that it will always be easier to rewrite a C++ codebase using modern paradigms which make these memory issues nearly disappear, rather than rewrite in another language.

          Also, why does C++ need relocatable memory? That hasn't been an issue in decades, since all modern CPUs (except some embedded systems) use virtual memory with hardware-based Translation Lookaside Buffers (TLBs), which eliminates the bother of memory fragmentation (down to page-size granularity).

  • by SuperKendall ( 25149 ) on Sunday June 05, 2022 @02:10PM (#62595174)

    Pretty sure if you look in the dictionary under the definition of "Hack" it just links to this blog post.

    They say we'll not be writing large applications in anything but C++ for the foreseeable future. It kind of makes me despair that even today, with so much of the world converted to digital form and browsers used so heavily in all aspects of life, even the most fundamental browser security is still really a big joke.

  • by fahrbot-bot ( 874524 ) on Sunday June 05, 2022 @02:33PM (#62595218)

    "C++ allows for writing high-performance applications but this comes at a price, security..."

    That last part is a pretty blanket assertion -- pretty sure it's not always, or even mostly, true.
    Even so, I imagine it would depend heavily on the programmer.

    • Even so, I imagine it would depend heavily on the programmer.

      It's not a question of security of the code, but rather security from the programmer doing silly things. It's the principle of not giving someone enough rope to hang themselves.

      Sure you can write memory safe applications in C++, it's just even easier in other languages.

  • Intercepted? (Score:3, Interesting)

    by Viol8 ( 599362 ) on Sunday June 05, 2022 @02:52PM (#62595252) Homepage

    One of the basic facilities of C++, since at least 1998, has been the ability to provide your own memory allocators and to override the new and delete operators, so they're not doing anything fundamentally different from what many people and teams have done before. Sounds like Google is starting to become MS in making a big deal over reinventing the wheel.

    • Re:Intercepted? (Score:5, Informative)

      by jythie ( 914043 ) on Sunday June 05, 2022 @05:35PM (#62595578)
      Looking at the post, I do not think they are making a 'big deal' about it. They developed a custom allocator and are talking about how it works. Not that different from countless other dev-type tech blogs.
      • Re:Intercepted? (Score:4, Insightful)

        by AmiMoJo ( 196126 ) on Monday June 06, 2022 @04:36AM (#62596536) Homepage Journal

        It's also a significant step forward in terms of performance compared to previous custom allocators. It's a topic that has been studied a lot over the decades and it's not a trivial one to solve, so their results are actually pretty interesting. Not least because they have it working in a real-world, highly complex application, which can be realistically benchmarked.

  • by Gibgezr ( 2025238 ) on Sunday June 05, 2022 @02:55PM (#62595254)

    Chrome has enormous memory leaks. How about tracking down those and fixing them first? THEN, once the code is "fixed" and correct, you can begin improving it. Right now you're building skyscrapers of code upon an unsteady foundation.

    • Ah. But you see, if they fix the compiler to make memory leaks irrelevant to security, then they don't have to plug the leaks! What? It's a foolproof 3-point plan, signed off on by management. Finite state machines? Why don't the users just upgrade to infinite state machines????

      - Every code boot camp / script kiddie / App developer ever.
    • Comment removed based on user account deletion
      • The leaks should never have made it to production code: very minimal basic testing should have detected them when they were introduced to the codebase. I am going to guess the problem is one of programmers being too clever for their own good. You can't just wish away or ignore a major problem in your architecture like that; it should be priority #1 to fix, no matter how big the fix. This isn't some niche application, it's the most popular browser on the planet.
        Memory leaks mean your code is bad and needs to be fixed.

        • Re: (Score:2, Informative)

          Comment removed based on user account deletion
          • Not an MBA. Spent the last 25 years programming in C++. I work within similar restrictions to the ones Google has with C++ (no RAII, no smart pointers). Their memory leaks should be trivial to detect (not fix, but detect): Chrome leaks like a sieve under normal usage. When you impose restrictions on using things like smart pointers, you absolutely need to make detecting and fixing memory leaks in your code a priority. I can only assume Google has a bunch of your MBAs writing code for Chrome.

        • The leaks should have never made it to production code: very minimal basic testing should have detected them when they were introduced to the codebase.
          It is close to impossible to test for memory leaks.

          Where did you study CS? I'd like to write out a warning to my students not to continue their studies at your place.

          • I have been writing C++ code for 25 years now, and test my code for memory leaks frequently. I would expect Google to run unit tests designed to find them, and no, they are not hard to detect. Compile the entire code base with debug, memory-tracking versions of new() and delete() and check the reports generated by them after testing.
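
            A bare-bones sketch of what such a counting allocator can look like (illustrative only; it is not thread-safe, and real leak hunting is better done with tools like LeakSanitizer or Valgrind):

                #include <cstdio>
                #include <cstdlib>
                #include <new>

                static long g_live = 0;  // outstanding allocations

                void* operator new(std::size_t size) {
                  void* p = std::malloc(size);
                  if (!p) throw std::bad_alloc();
                  ++g_live;
                  return p;
                }

                void operator delete(void* p) noexcept {
                  if (p) { --g_live; std::free(p); }
                }

                int main() {
                  int* leaked = new int(42);  // deliberately never deleted
                  (void)leaked;
                  if (g_live != 0)
                    std::printf("LEAK: %ld allocation(s) outstanding\n", g_live);
                }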

            • You cannot write unit tests to detect memory leaks.

              Perhaps you want to check what the term "unit test" means.

              Compile the entire code base with debug, memory-tracking versions of new() and delete() and check the reports generated by them after testing.
              That is not a unit test, and not a useful approach - except for extremely wild memory bugs.

              Perhaps you should read some books about it for starters, e.g. Andrew Koenig, James Coplien, or Jiri Soukup, to learn a bit about C++ memory management. Good luck.

              Ofc y

    • Erm, from which planet are you?

      Writing your own memory allocation and deallocation lib is exactly how you track those problems and fix them. Are you really that stupid?

      • Yup. So do that, instead of not doing that. They are not doing that. We could hope that a side-effect of them focusing on security would be that they fix the tendency of Chrome to leak GB of data over a few days' use, but I'd rather they fix their software development practices and not release major applications in such terrible shape.

  • by istartedi ( 132515 ) on Sunday June 05, 2022 @03:04PM (#62595274) Journal

    In Rust you have to annotate things to get compile-time memory safety. Instead of mucking up the runtime like this, why not just build a better C++ compiler that's smart enough to figure out you're being stupid at compile time? If that's not possible, why not add some Rust-like memory safety annotations (optional) to C++ and throw warnings when compiled with -Wall? Yes, some people will turn off compiler warnings but that's always been the case. No, this can't possibly make C++ any more cluttered than it already is. That ship has sailed.
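
    Embryonic versions of such annotations do exist; e.g. Clang's [[clang::lifetimebound]] attribute (shown below on a hypothetical function) lets the compiler warn at build time when a returned reference would outlive its argument:

        #include <vector>

        // Clang can warn at the call site below that the returned
        // reference would dangle; no runtime cost at all.
        const int& first(const std::vector<int>& v [[clang::lifetimebound]]) {
          return v.front();
        }

        // const int& r = first({1, 2, 3});  // warning: the temporary vector
        //                                   // dies at the end of the statement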

    • The only annotations Rust really needs are for lifetimes. You'd need to create a C++ borrow checker for that to be of any use. Which also means you'd need to add borrowing rules. And then to make it ergonomic, you'd need to add a lot of methods to help the programmer work with the borrow checker, at which point you just say fuck it: the Rust language has already done all of that, so we may as well use it instead, tossing out C++'s decades-old baggage in the process.

  • It still allows accessing validly allocated memory with unrelated pointers, doesn't it? So this just fixes use-after-free, not every buffer overflow.
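
    In sketch form: quarantining addresses the first (temporal) bug below, but not the second (spatial) one, because the overflowing pointer still refers to live memory (deliberately buggy code, for illustration only):

        int main() {
          int* p = new int[4];
          delete[] p;
          int uaf = p[0];  // use-after-free: quarantine targets this

          int* q = new int[4];
          int oob = q[7];  // out-of-bounds on a live allocation: it doesn't
          delete[] q;
          return uaf + oob;  // keep the reads from being optimized away
        }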

  • If these techniques were effective, there would be hardware support for them. And for that matter, why not simply flip the ECC checksum bit on invalidated memory so that it gives a read error/warning when it is accessed?
  • It seems that the programmers still feel entitled to use 100% of CPU for performance vs. spending more resources on quality.
