Slashdot is powered by your submissions, so send in your scoop

 



Forgot your password?
typodupeerror
×
Programming IT Technology

Experiences w/ Garbage Collection and C/C++? 112

dberger queries: "Java has helped garbage collection enter the mainstream programmer's life - but it's certainly not new or unique to java. There have been (and are) several ways to add garbage collection to C/C++ - the most active seeming to be Hans Boehm's free libgc. I'm curious if any of the Slashdot crowd has used this (or any other) C++ garbage collector in non-trivial commercial applications. If so - what were your experiences? If not, why not? (Before you ask, yes - I know that GC isn't the only difference between C++ and Java, but 'automagic memory management' is certainly part of Java's marketing luster)"
This discussion has been archived. No new comments can be posted.

Experiences w/ Garbage Collection and C/C++?

Comments Filter:
  • Comment removed (Score:4, Informative)

    by account_deleted ( 4530225 ) * on Monday September 15, 2003 @09:28PM (#6970428)
    Comment removed based on user account deletion
  • GC in OpenCM (Score:5, Informative)

    by Jonathan S. Shapiro ( 321593 ) on Monday September 15, 2003 @10:33PM (#6971151) Homepage
    We made a decision early to use GC and exceptions in OpenCM [opencm.org], even though the application is written in C. Conceptually, it was a big success, but there were a number of hurdles along the way. Here are some things we learned:
    1. The Boehm-Weiser (BW) collector is not as portable as we had hoped. There are a number of platforms we wanted to run on where it just doesn't run at all. Relatively small changes to the target runtime can create a need to port it all over again. OpenBSD, in particular, was an ongoing hassle until we abandoned BW. Hans, I hasten to add, was quite encouraging, but he simply doesn't have time to adequately support the collector.

    2. The BW collector doesn't work in our application. OpenCM has a few very large objects. For reasons we don't really understand, this tends to cause a great deal of garbage retention when running the BW collector. Enough so that the OpenCM server crashed a lot when using it. Please note that this was NOT a bug involving falsely retained pointers, as later experience showed.

    3. Conservative collectors are actually too conservative. If you are willing to make very modest changes in your source code as you design the app, there prove to be very natural places in the code for collection, and the resulting collector is quite efficient.

    4. Independent of the collector, we also hacked together an exceptions package. This was also the right thing to do, but it's easy to trip over it in certain ways. The point of mentioning this is that once you do exceptions the pointer tracking becomes damned near hopeless and you essentially have to go to GC.

      I think the way to say this is: exceptions + GC reduces your error handling code by a lot. Instead of three lines of error check on every procedure call, the error checking is confined to logical recovery points in the program, and you don't have to mess around simulating multiple return values in order to return a result code in parallel with the actually intended return value.

    5. To provide malloc pluggability, we implemented an explicit free operation. This lets us interoperate compatibly with other libraries and do leak detection. Turns out to be very handy in lots of ways.

    6. Hybrid storage management works very well. For example, our diff routine explicitly frees some of its local storage (example [opencm.org]) [Sorry -- this link will go stale within the next few weeks because the OpenCM web interface will change in a way that makes it obsolete. If the link doesn't work for you, try looking for the same file in .../DEV/opencm/...] This is actually quite wonderful, as it lets us build certain libraries to be GC compatible without being GC dependent. One of the challenges in using a GC'd runtime in a library is compatibility with an enclosing application that doesn't use GC. We haven't tried it yet, but it looks like our gcmalloc code will handle this.

    Eventually, we gave up on the BW collector and wrote our own. Our collector is conceptually very similar to the collector that Keith Packard built for Nickle [nickle.org], though we've since built from there. A variant of the Nickle collector is also used as a debugging leak tracer for X11.

    The OpenCM GC system is reasonably well standalone. We need to document it, but others might want to look at it when we cut our next release.

    On the whole, I'ld say that GC for this app was definitely the right thing to do. Once you get into object caches it becomes very hard to locate all of the objects and decide when to free them. We were able to use a conservative approach with no real hassle, and heap size is fairly well bounded by the assisted GC approach we took.

    On the other hand, I would not recommend a pure conservative collector for a pro

  • There's another way. (Score:5, Informative)

    by Lally Singh ( 3427 ) on Monday September 15, 2003 @11:34PM (#6971624) Journal
    Garbage collection has costs:
    - The obvious: CPU & memory overhead for the checking and tracking. I can't comment on the amount here, but it is a generalized solution, so you forego the optimization opportunities that you'd otherwise have.
    - The subtle: Memory allocation can become a major bottleneck in multithreaded systems. Garbage collection has similar issues.
    - The irritating: you don't know when your destructors are called.

    Another way: Smart Pointers. They're simple wrappers around the types that act like pointers, but they can make sure your objects live as long as you need and no longer. The big trick is knowing which kind of smart pointer you want.
    - Reference Counting Smart Pointer (RCSP for short): this type of smart pointer will keep of how many RCSPs are pointing to the same object. It'll delete the object when the last RCSP is destroyed. A good one is the boost shared_ptr. Available for free from www.boost.org. This type is great for general use.

    - Owning Smart Pointer (OSP): this type is specialized for those cases when the refcnt is never more than 1. When you assign one OSP (a) to another (b), the new OSP (a) gets ownership of the referred object, and the old one (b) is automatically set to null. When an OSP that isn't set to null is destroyed, it deletes the object it owns. It's great for parameter passing, return values, and objects you want dead at the end of the current scope, even if there's an exception. The STL comes with auto_ptr, which works this way.

    You can use an RCSP wherever you can use an OSP, but not the other way around. The STL containers are a great example.

    Sure it's not as easy as 'allocate and forget,' but you won't have the (sometimes very costly) expense of full-blown garbage collection.

    Also, you can optimize your smart pointers for individual types (through template specialization). A great example is to give the no-longer-needed object back to a pool for later reuse.

    This is really a quick, quick overview. For the meat & potatoes, go read Effective STL by Scott Meyers.

    I've tried really hard to be fair & polite. There's probably still a bias, but I'm really trying!!
  • by ville ( 29367 ) on Tuesday September 16, 2003 @12:26AM (#6971983)

    And what is the deal with the sort(,) as a free-standing function? Following OO principles, shouldn't the vector object v know how to sort itself with a call to v.sort()?


    Not necessarily. If you have multiple types of containers and you can write a single sort that can sort all those types then why implement it in all of them instead of just once.


    Here [www.gotw.ca] is an article that deals with the question which functions should be members and which shouldn't. It uses the std::string as an example which has a lot of methods that turns out shouldn't have to be.



    // ville

  • by bo0ork ( 698470 ) on Tuesday September 16, 2003 @02:47AM (#6972742)
    I've wrote an OO language back in 1993 that's being used by two medium-sized companies. It's garbage collected, and it's kernel is written in C. The language is not interpreted; it gets translated into C and then compiled. The applications written with the language are fairly large. The source code of one is 28MB uncompressed. I'll skip the general implementation details, and just go over the garbage collection approach I used. These definitions are true for that language; they're not meant to be general.

    A program variable is either a global variable, a stack variable, a class variable or an instance variable. Global and stack variables are held in lists. Class and instance variables are kept inside objects.

    Every class object has a global variable that always refers to it.

    Any object that is not, and that can not become referenced (directly or indirectly) by a global or stack program variable is garbage.

    Each object has a 'not-garbage' flag.

    For each global and stack variable, if the referenced object is not marked not-garbage, mark the referenced object as not-garbage, and recurse for that objects contained variables.

    Delete all objects that are not marked not-garbage.

    There are a few more twists, like handling return values on the stack, but this algorithm correctly handles self-referencing objects no matter the complexity.

  • GC costs (Score:5, Informative)

    by greppling ( 601175 ) on Tuesday September 16, 2003 @05:14AM (#6973269)
    One thing you didn't mention is that GC is deemed to have pretty high processor cache-miss costs. The obvious part is that the GC run itself is basically pointer chasing, i.e. pretty much the worst thing you can do cache-wise. And after the GC run, the cache is clobbered with stuff useless for continuing the work.

    There is another indirect cost pointed out by Linus Torvalds in a lengthy post to the gcc mailining list [gnu.org]. The executive summary is that (he thinks that) memory that is not to be used anymore should be freed immediately. Otherwise, the data in there will keep lying around in the data cache. Also, he claims that explicit ref-counting gives you advantages for optimization: Assume you have to make some modifications to a data structure, but you don't want other parts of the program to see the modifications. Without ref-counting, you have to copy all the data structure before modifying it. With ref-couting, you can omit the copying if you are the only one with access to the data structure.

    And finally, he thinks that GC makes it too easy to write pointer-chasing-heavy code---as that kind of code is bad for cache behaviour all the time.

    It is an ongoing discussion whether GC really has that bad effects on performance of GCC. But Linus Torvalds seems to have very good points. (And some of them certainly cannot be taken into account in a "GC cost is less than hand-written memory management"-paper.)

  • by Inode Jones ( 1598 ) on Tuesday September 16, 2003 @09:06AM (#6974331) Homepage
    I am using the BDW collector in an EDA tool. EDA tools store large databases of circuit connectivity, and for various reasons we don't want to be bothered with explicit memory management.

    The salient points:

    Destructors are not Called

    If an object is allocated in collectible memory, then its destructor will not be called when the object is collected. Therefore, destructors are pretty much useless and your code must be designed to work without them.

    Actually, if your object derives from class gc_cleanup, then its destructor will be called. However, due to the handling of cleanup functions in the BDW collector, cycles of such objects will never be collected. For this reason, I don't use gc_cleanup much.

    Allocating Collectible Memory

    By default, C++ allocates objects in the "malloc" heap. The BDW collector maintains a separate heap. In effect, there are four types of memory:

    • scannable and collectible (GC)
    • scannable, but uncollectible (NoGC)
    • non-scannable, but collectible (GC_atomic)
    • non-scannable, non-collectible (malloc)

    "Scannable" refers to the property that objects in the heap are scanned for pointers. "Collectible" refers to the property that objects in the heap will be deallocated if no further references are found.

    These four memory types are an issue when you interact with STL and third-party class libraries. By default, STL uses the malloc heap. If you want, say, a std::vector in collectible memory, then you need to write an allocator to get it. The most recent versions of the collector come with such a beast; the version I started using did not.

    Similarly, std::string is reference-counted, and in the malloc heap. Here, rather than using an allocator to force it into the collectible heap, I wrote my own lightweight GCString class, which stores the string as an immutable object, and relies on the collector for cleanup.

    Third-party class libraries such as ANTLR may use reference-counted objects; you need to bridge between GC and non-GC applications carefully.

  • by haystd ( 145257 ) * on Tuesday September 16, 2003 @11:47AM (#6975897)
    With the first example I belive the point is that the string and vector classes will clean up themselves when they go out of scope (when their destructors are called). STL is very helpful especially when supplemented with the Boost [boost.org] libraries.

Get hold of portable property. -- Charles Dickens, "Great Expectations"

Working...