Experiences w/ Garbage Collection and C/C++?

dberger queries: "Java has helped garbage collection enter the mainstream programmer's life - but it's certainly not new or unique to java. There have been (and are) several ways to add garbage collection to C/C++ - the most active seeming to be Hans Boehm's free libgc. I'm curious if any of the Slashdot crowd has used this (or any other) C++ garbage collector in non-trivial commercial applications. If so - what were your experiences? If not, why not? (Before you ask, yes - I know that GC isn't the only difference between C++ and Java, but 'automagic memory management' is certainly part of Java's marketing luster)"
  • Comment removed (Score:4, Informative)

    by account_deleted ( 4530225 ) * on Monday September 15, 2003 @08:28PM (#6970428)
    Comment removed based on user account deletion
    • C++ is very expressive indeed, and the example shows that C++ is good at something for which many would consider a scripting language to be required. But like some scripting languages, this C++ code fragment descends into a mix of Egyptian hieroglyphics and Hittite cuneiform rather quickly.

      So please tell me what

      typedef vector::const_iterator Iter;

      (or rather vector::const_iterator) is supposed to mean. I suppose vector is a templated class, but how does ::const_iterator come up with a type name -

      • And what is the deal with the sort(,) as a free-standing function? Following OO principles, shouldn't the vector object v know how to sort itself with a call to v.sort()?

        Not necessarily. If you have multiple types of containers and you can write a single sort that can sort all those types then why implement it in all of them instead of just once.
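        A minimal sketch of that point, using only the standard library: one `std::sort` implementation serves every container with random-access iterators, so a vector, a deque and a plain array all reuse it. (`std::list` is the exception; its iterators are not random-access, so it provides a `sort()` member instead.) The helper name `sorted_copy` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// One generic helper (hypothetical name) for any container whose iterators
// are random-access: std::sort is written once, not once per container.
template <typename Container>
Container sorted_copy(Container c) {
    std::sort(c.begin(), c.end());
    return c;
}

// The same free function also works directly on a plain array.
void sort_array(int* first, int* last) {
    std::sort(first, last);
}
```

        Usage: `sorted_copy(std::vector<int>{3, 1, 2})` yields `{1, 2, 3}`, and the same template works unchanged for `std::deque<int>`.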

        Here [www.gotw.ca] is an article that deals with the question which functions should be members and which shouldn't. It uses the std::string as an example which has a lot

      • Classes can have inner classes as well as typedefs. Those are in the namespace of the class, so the namespace operator :: is used to access them.

        And the sort question was answered by somebody else, but here is a bit more on the subject: if a bunch of classes share portions of their interfaces, and the shared subset is enough to perform a useful operation, why not share the implementation of the operation? While you could certainly "tell a vector to sort itself", it makes just as much sense to "apply the
        • The original discussion was whether anyone used/benefited from a C++ implementation of GC. A poster responded with a link to Stroustrup pointing out that C++ is so expressive that you don't need GC -- you can embed the memory management in stack-frame objects which take care of it for you.

          I pointed out that Stroustrup's example shows the expressive power of C++, but there is a big "huh?" factor in reading the code, because many of us mere mortals are not well versed in the use of templates and STL,

      • So please tell me what

        typedef vector::const_iterator Iter;

        (or rather vector::const_iterator) is supposed to mean. I suppose vector is a templated class, but how does ::const_iterator come up with a type name -- I thought :: either references a static field or a class member function?

        No, classes can have types too. In this case, a vector::const_iterator is an iterator over the vector type that can point to anything in the container (the vector) but cannot change anything in it. A read-only pointer,
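        To illustrate the answer above: a class can declare a type as a member, and `Class::name` then refers to that type, exactly as `vector<T>::const_iterator` refers to the vector's iterator type. A minimal sketch with a hypothetical `IntBag` wrapper:

```cpp
#include <cassert>
#include <vector>

// Hypothetical container that wraps std::vector<int> and publishes a
// nested type name, the same way the standard containers do.
class IntBag {
public:
    typedef std::vector<int>::const_iterator const_iterator;  // a member *type*

    void add(int x) { items_.push_back(x); }
    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }

private:
    std::vector<int> items_;
};

// "IntBag::const_iterator" names a type, not a static field or a function.
// A const_iterator can read elements but cannot modify them.
int sum(const IntBag& b) {
    typedef IntBag::const_iterator Iter;
    int total = 0;
    for (Iter p = b.begin(); p != b.end(); ++p) total += *p;
    return total;
}
```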

        • C++ is not just an OO language. It's a multi-paradigm language.

          I'm tired of people confusing any constraining in a language with being handcuffed. They are so used to looseness that anything else looks too tight to them. I know nothing about music, but when you learned it, you accepted the staff (pentagram) paradigm and notation for describing music, didn't you? Do you need pointers and templates for music? It seems like a single language, single paradigm to me.
          In the end, if you don't like it, just don't use it.

          • I'm tired of people confusing any constraining in a language with being handcuffed.

            I didn't say anything about being handcuffed. Sure, Java and other OO languages are Turing complete. You can write anything you want in them. Whether you should or not is another matter. Basically, you should always, as much as possible, use the right tool for the job. Sometimes, heck, even a lot of times, OO works well. But there are times when it doesn't, and for those times you're much better off using a langu

        • I agree with this assessment somewhat, in that C++ is a generally more nifty language (in the hands of an expert) than most people give it credit for. However, it's got some limitations I wish were addressed ---

          It's awfully verbose. Namespaces don't help, because of irritating practical restrictions on the use of "using namespace" in header files.

          Its support for functional constructs is very limited. In particular, the lack of type inference and proper lambdas makes functional code painful to write.

          C++ is
          • To write functional code you need to stick to template meta programming.

            But oh well, that code is really ugly :-)

            A good book about it (despite the misleading title) is Generative Programming by Ulrich Eisenecker (sorry, and another guy whose name is too long/strange to memorize, but he too is a real coryphaeus)

            angel'o'sphere
            • To write functional code you need to stick to template meta programming.
              You can write functional code in templates, but it executes at compile time, which isn't always what you want. Unless you meant lambda functions, which I call 'expression objects'?
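              For the runtime case, the pre-lambda idiom is a hand-written function object: a class whose members hold the captured state and whose `operator()` holds the behaviour. A minimal sketch; `AddN` and `add_to_all` are illustrative names, not from any library:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Function object ("expression object"): state captured in a member,
// behaviour in operator().
struct AddN {
    explicit AddN(int n) : n_(n) {}
    int operator()(int x) const { return x + n_; }
    int n_;
};

// Applies AddN(n) to every element; this runs at run time, not compile time.
std::vector<int> add_to_all(std::vector<int> v, int n) {
    std::transform(v.begin(), v.end(), v.begin(), AddN(n));
    return v;
}
```

              C++11 later added lambdas to the language, so `[n](int x) { return x + n; }` now expresses the same thing inline.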
          • Its support for functional constructs is very limited. In particular, the lack of type inference and proper lambdas makes functional code painful to write.

            That is actually being addressed by the Boost library [boost.org]. Boost is basically a testing ground for future additions to the language, so if something works out there, there's a good chance it will be added to the standard in a few years as an add-on library, similar to the STL. For lambdas, take a look at The Boost Lambda Library [boost.org], especially the examples [boost.org]. Other st

    • I looked at Stroustrup's two examples. It looks like his first example does not involve freeing any memory at all. Am I right? His second example seems to use auto_ptr to assure that an object is freed when the function where it's allocated returns. Is that all it's doing? I would expect the situations where people get memory leaks to be more complex than auto_ptr could handle.

      Anyway, he never mentions garbage collection; just easier "explicit" management. (I put "explicit" in quotes, because malloc

      • With the first example I believe the point is that the string and vector classes will clean up after themselves when they go out of scope (when their destructors are called). STL is very helpful, especially when supplemented with the Boost [boost.org] libraries.
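        A minimal sketch in the spirit of Stroustrup's first example (not his actual code): the vector and the strings allocate heap memory, yet no `delete` appears, because each destructor runs when its object goes out of scope. `sort_and_join` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts the words and joins them with '+'. Every heap allocation made by
// std::vector and std::string is released by their destructors when
// `words` and `cat` go out of scope; there is no explicit delete anywhere.
std::string sort_and_join(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    std::string cat;
    typedef std::vector<std::string>::const_iterator Iter;
    for (Iter p = words.begin(); p != words.end(); ++p) cat += *p + "+";
    return cat;
}
```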
      • I can see plenty of freeing of memory in the first example.
        In case you missed it, here it is again:
        }

        I remember racing C code with (compiled) Scheme once, and the two were pretty close for the kind of task I was doing. However, when my problem size reached a certain point, GC would have to kick in and Scheme was suddenly relegated to the vastly-slower-than-C camp, with so many other languages that otherwise would have plenty of merits. So I admit that gave me an instant anti-GC bias. (That and the _disaster_
        • OK, now I understand. Basically he's saying that if you follow a certain discipline, allocated memory will be freed when the function returns. But the same could be said of explicitly freeing at the end of the function, and having gotos at places where you would otherwise return early. The second example bumps it up exactly one level. However, for objects that get passed around between more than two functions, you still need to keep track of what must be freed.

          Regarding Scheme performance, check out

          • I think perhaps you're still misunderstanding slightly.

            The idea is that you wrap the resources in an automatic variable -- something that *will* be destroyed automatically when it goes out of scope. You cannot forget this, because the language does it for you. Now, have the destructor for that automatic variable release the resource it manages and bingo, you can never forget to release the resource.

            The idiom is known as "resource acquisition is initialisation", BTW, if you want to look it up.
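            A minimal sketch of the idiom, assuming a hypothetical resource whose acquire and release are simulated by a global counter so the effect can be checked: the destructor releases the resource on every exit path, including an early `return`.

```cpp
#include <cassert>

// Hypothetical stand-in for a real resource (file, lock, connection):
// the counter records how many are currently held.
int g_open_handles = 0;

class Handle {
public:
    Handle() { ++g_open_handles; }            // acquire in the constructor
    ~Handle() { --g_open_handles; }           // release in the destructor
    Handle(const Handle&) = delete;           // one owner: no copies
    Handle& operator=(const Handle&) = delete;
};

// Two exit paths, zero explicit releases.
bool use_handle(bool bail_out_early) {
    Handle h;
    if (bail_out_early) return false;         // ~Handle() runs here...
    return true;                              // ...and here
}
```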

            • I don't think I'm misunderstanding. I understand you have a mechanism that destroys the object a variable is bound to when that variable goes out of scope. That's only useful if you aren't putting the object in a data structure or otherwise planning on using the object outside of the block where that variable is in scope. Basically, it's only useful in trivial cases. I'm not saying it's worthless; I can see it has some advantage over explicitly freeing all the objects. It's just not anything near as po

              • I still think you're misunderstanding the significance of this. :-)

                In well-written C++, almost all objects are automatic variables (or are ultimately contained within objects that are automatic variables). It's also far more common to pass by value or pass a reference than it is to start throwing ownership around using pointers. You don't write:

                SomeType *st = new SomeType();

                all over the place as in something like Java. You just write:

                SomeType st;

                under most circumstances.

                Typically, a data structu

          • Bjarne's emphasising the fact that if one has the need to do it explicitly, then one has the opportunity to muck up. No paradigm is an unchinked silver bullet, but some are less chinked than others.

            Language shootout - if that's the one I'm thinking of it promotes the use of unidiomatic, deliberately perverse code to make Perl look 2 times slower than it really is. However, it might not be the same shootout, so I'll google as soon as I click send. (But take that as a caveat that any language can be made slo
    • by Javagator ( 679604 ) on Tuesday September 16, 2003 @10:36AM (#6975779)
      I was part of a project to write an image display system of about 100k lines of C++ code. Our coding standards required that we allocate resources in constructors, release them in destructors, and put objects on the stack when possible. When putting objects on the stack was not possible, we used smart pointers.

      Towards the end of the project, we finally got our company to spring for Purify. We ran Purify on the code and found a few places where we forgot to release a graphics context, and one place where we didn't follow our coding standards, but no other memory leaks. We then ran Purify on a comparable C system and found hundreds of memory leaks.

      Proper use of constructors and destructors can make resource management virtually automatic.
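      The same discipline also covers the exception path. A minimal sketch, with `GraphicsContext` and its live-count as hypothetical stand-ins for a real graphics API: the function throws while holding a context, and stack unwinding still runs the destructor.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical count of live graphics contexts (stand-in for a real API).
int g_live_contexts = 0;

class GraphicsContext {
public:
    GraphicsContext() { ++g_live_contexts; }   // acquire in the constructor
    ~GraphicsContext() { --g_live_contexts; }  // release in the destructor
    GraphicsContext(const GraphicsContext&) = delete;
    GraphicsContext& operator=(const GraphicsContext&) = delete;
};

// Throws while holding a context; unwinding runs ~GraphicsContext() anyway.
void draw(bool fail) {
    GraphicsContext gc;
    if (fail) throw std::runtime_error("draw failed");
}
```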
    • Depending upon how you are using Java, you end up having to do a lot of things that amount to managed garbage collection anyway, i.e. it's not that different from C/C++, except that when you screw up you don't end up with a memory leak.

      If you use a tool like BoundsChecker and run it a couple of times a day, you'll avoid most of these sorts of problems in C++. It's an amazing tool I couldn't live without. One advantage to having these sorts of things flag as a "memory leak" is that you can often find subtl

    • Bjarne is only partly right.

      First of all, he is against GC, and that's why he finds ways to avoid it and arguments for why it is unnecessary.
      However, there are two points which make his argumentation weak:
      a) you need to know a great deal about how the standard library works, and about C++ in general, to apply his hints. If you had GC ... programmers could focus on their algorithms instead

      b) all the arguments *against* GC are in fact arguments *for* GC. All the work and the burden the ordinary programmer is freed fr
  • Certainly.. (Score:5, Interesting)

    by QuantumG ( 50515 ) <qg@biodome.org> on Monday September 15, 2003 @08:36PM (#6970560) Homepage Journal
    Have a look at our project Boomerang [sourceforge.net]. We're over 230k lines of code and we garbage collect everything. It's as easy as linking to Hans Boehm's libgc and adding the following lines to one of your files (probably best is the one which contains "main").

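    #include <gc.h>   // Hans Boehm's collector (libgc)
    #include <new>    // std::bad_alloc, std::size_t

    void* operator new(std::size_t n) {
        void* p = GC_malloc(n);
        if (!p) throw std::bad_alloc();   // operator new must never return null
        return p;
    }

    void operator delete(void* p) noexcept {
        // intentionally empty: the collector reclaims unreachable memory
    }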

    You can also mix collected memory with uncollected memory, but we really don't see the point. This way we can still have destructors which do useful things, but the actual memory cleanup is left to the garbage collector. Of course, as we write more and more new code we leave our deletes and our destructors out, and eventually we'll go through and remove them all. Until then, we can disable the garbage collector just by #if 0ing these lines out.

    • Re:Certainly.. (Score:3, Interesting)

      by AJWM ( 19027 )
      leave our deletes and our destructors out,

      If you have destructors, do they ever get called? Destructors aren't just for freeing memory, they're also used for freeing other system resources (depending on the object). File descriptors, database connections, that kind of stuff. (Granted, you can have explicit methods to take care of that cleanup, but then you can have explicit methods to free memory too. Seems to me you want all that as automatic as possible.)

      • You can call the destructors explicitly (with a delete) or you can configure gc to call them when the object is garbage collected. We don't do that because the only thing our destructors do is clean up memory, which is what the garbage collector does.
      • System resources you need to free should ALWAYS BE FREED EXPLICITLY regardless of GC. Relying on a destructor to close a file or database connection is bad.
        • Shrug. Memory is a system resource.

          This is what destructors are for, so you can explicitly (in the destructor) free resources that haven't yet been freed. Simplifies dealing with exceptions, especially where an exception may take control out of the scope of the object.

          Sure, explicitly free your descriptors and whatnot -- that makes the code clearer -- but also do it in the destructor (wrapped in a suitable check so you don't do it twice, of course) as a back up.
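          A minimal sketch of that belt-and-braces pattern, using C stdio's `std::tmpfile()` so it runs anywhere; the `File` class is hypothetical. `close()` is idempotent, and the destructor calls it again as a backup.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical file wrapper: explicit close() for clarity, destructor as backup.
class File {
public:
    File() : fp_(std::tmpfile()) {}
    ~File() { close(); }                 // backup release; harmless if already closed
    bool is_open() const { return fp_ != nullptr; }
    void close() {
        if (fp_) {                       // guard so fclose is never called twice
            std::fclose(fp_);
            fp_ = nullptr;
        }
    }
    File(const File&) = delete;          // single owner of the FILE*
    File& operator=(const File&) = delete;
private:
    std::FILE* fp_;
};
```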
          • Ignoring the whole "resource acquisition is initialization" thing is a good way to mess up when you're dealing with exceptions; C++ lacks finally blocks for a reason, after all.

            You can just rely on the GC to clean up other system resources as well as memory, but most of them are a lot finickier than memory: if you open a file, you want to close it when you're done, not when GC runs.

          • Sure, explicitly free your descriptors and whatnot -- that makes the code clearer -- but also do it in the destructor (wrapped in a suitable check so you don't do it twice, of course) as a back up.

            Actually, I'd argue that most of the time, you shouldn't be releasing anything directly. If you find yourself writing my_file.close(), consider whether you've got the my_file object at the right scope in the first place.

            What does it mean to refer to my_file after the close() call anyway? In most cases, the a

            • Valid point. It depends what exactly the object is and what you're doing with it, of course, but if you're encapsulating properly then usually what you argue is correct.
        • Why? The whole point of deterministic destruction is that you can rely on destructors to clean things up, and in C++, the accepted idiom for resource management does just that. If the destruction is happening at the "wrong time", it's probably a symptom of a design flaw (see my other post in this subthread).


    • Like the other guy, I want to know when your destructors get called? Plus not every system allows you to override global new and delete operators (e.g. the Symbian cell phone OS).
      • Well, for a start, you can read the reply I gave to the other guy. Secondly, I'd be surprised if the Boehm garbage collector even runs on that platform (it is very platform specific and supports a number of popular platforms).
    • Actually, you may not write any destructors, but the compiler still creates them for you. The destructor is one of the few functions that get generated for you if you don't write one yourself.
      Personally, I think GC is overrated. GC should be left to languages like Java, where it is built in and a lot of design consideration went into adding it to the language. At the bare minimum, I think everyone should have to manage their own memory for a while in order to learn what's going on and why.
    • Hmm, but wouldn't it be better to have the delete operator do something? I know you don't _need_ it to do anything, because the garbage collector will eventually free the memory, but performance might be better if you can insert explicit 'delete' at certain places in your program where you know that an object is no longer referenced. operator delete() would inform the GC that the memory had been freed manually.
  • GC in OpenCM (Score:5, Informative)

    by Jonathan S. Shapiro ( 321593 ) on Monday September 15, 2003 @09:33PM (#6971151) Homepage
    We made a decision early to use GC and exceptions in OpenCM [opencm.org], even though the application is written in C. Conceptually, it was a big success, but there were a number of hurdles along the way. Here are some things we learned:
    1. The Boehm-Weiser (BW) collector is not as portable as we had hoped. There are a number of platforms we wanted to run on where it just doesn't run at all. Relatively small changes to the target runtime can create a need to port it all over again. OpenBSD, in particular, was an ongoing hassle until we abandoned BW. Hans, I hasten to add, was quite encouraging, but he simply doesn't have time to adequately support the collector.

    2. The BW collector doesn't work in our application. OpenCM has a few very large objects. For reasons we don't really understand, this tends to cause a great deal of garbage retention when running the BW collector. Enough so that the OpenCM server crashed a lot when using it. Please note that this was NOT a bug involving falsely retained pointers, as later experience showed.

    3. Conservative collectors are actually too conservative. If you are willing to make very modest changes in your source code as you design the app, there prove to be very natural places in the code for collection, and the resulting collector is quite efficient.

    4. Independent of the collector, we also hacked together an exceptions package. This was also the right thing to do, but it's easy to trip over it in certain ways. The point of mentioning this is that once you do exceptions the pointer tracking becomes damned near hopeless and you essentially have to go to GC.

      I think the way to say this is: exceptions + GC reduces your error handling code by a lot. Instead of three lines of error check on every procedure call, the error checking is confined to logical recovery points in the program, and you don't have to mess around simulating multiple return values in order to return a result code in parallel with the actually intended return value.

    5. To provide malloc pluggability, we implemented an explicit free operation. This lets us interoperate compatibly with other libraries and do leak detection. Turns out to be very handy in lots of ways.

    6. Hybrid storage management works very well. For example, our diff routine explicitly frees some of its local storage (example [opencm.org]) [Sorry -- this link will go stale within the next few weeks because the OpenCM web interface will change in a way that makes it obsolete. If the link doesn't work for you, try looking for the same file in .../DEV/opencm/...] This is actually quite wonderful, as it lets us build certain libraries to be GC compatible without being GC dependent. One of the challenges in using a GC'd runtime in a library is compatibility with an enclosing application that doesn't use GC. We haven't tried it yet, but it looks like our gcmalloc code will handle this.

    Eventually, we gave up on the BW collector and wrote our own. Our collector is conceptually very similar to the collector that Keith Packard built for Nickle [nickle.org], though we've since built from there. A variant of the Nickle collector is also used as a debugging leak tracer for X11.

    The OpenCM GC system is reasonably self-contained. We need to document it, but others might want to look at it when we cut our next release.

    On the whole, I'd say that GC for this app was definitely the right thing to do. Once you get into object caches it becomes very hard to locate all of the objects and decide when to free them. We were able to use a conservative approach with no real hassle, and heap size is fairly well bounded by the assisted GC approach we took.

    On the other hand, I would not recommend a pure conservative collector for a pro

  • There's another way. (Score:5, Informative)

    by Lally Singh ( 3427 ) on Monday September 15, 2003 @10:34PM (#6971624) Journal
    Garbage collection has costs:
    - The obvious: CPU & memory overhead for the checking and tracking. I can't comment on the amount here, but it is a generalized solution, so you forego the optimization opportunities that you'd otherwise have.
    - The subtle: Memory allocation can become a major bottleneck in multithreaded systems. Garbage collection has similar issues.
    - The irritating: you don't know when your destructors are called.

    Another way: Smart Pointers. They're simple wrappers around the types that act like pointers, but they can make sure your objects live as long as you need and no longer. The big trick is knowing which kind of smart pointer you want.
    - Reference Counting Smart Pointer (RCSP for short): this type of smart pointer keeps track of how many RCSPs are pointing to the same object. It deletes the object when the last RCSP is destroyed. A good one is the boost shared_ptr, available for free from www.boost.org. This type is great for general use.

    - Owning Smart Pointer (OSP): this type is specialized for those cases when the refcount is never more than 1. When you assign one OSP (a) to another (b), the target (b) takes ownership of the referred object, and the source (a) is automatically set to null. When an OSP that isn't null is destroyed, it deletes the object it owns. It's great for parameter passing, return values, and objects you want dead at the end of the current scope, even if there's an exception. The STL comes with auto_ptr, which works this way.

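    You can use an RCSP wherever you can use an OSP, but not the other way around. The STL containers are a great example: they can hold RCSPs, but not auto_ptrs, because copying an auto_ptr transfers ownership instead of producing an equivalent copy.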
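    A minimal sketch using today's standard names rather than the ones in the post: Boost's `shared_ptr` was adopted as `std::shared_ptr` in C++11, and `std::auto_ptr` was deprecated in C++11 and removed in C++17 in favour of `std::unique_ptr`, which makes the ownership transfer explicit with `std::move`. `Widget` and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Widget { int id; };

// Reference counting: copies share one Widget, which is deleted when the
// last shared_ptr to it goes away.
long shared_count_demo() {
    std::shared_ptr<Widget> a = std::make_shared<Widget>();
    std::shared_ptr<Widget> b = a;           // count is now 2
    return a.use_count();
}

// Sole ownership: moving transfers the object and nulls the source,
// the same transfer rule auto_ptr applied on plain assignment.
bool unique_transfer_demo() {
    std::unique_ptr<Widget> a(new Widget());
    std::unique_ptr<Widget> b = std::move(a);
    return a == nullptr && b != nullptr;
}

// RCSPs go into standard containers freely; auto_ptr never could.
std::vector<std::shared_ptr<Widget>> make_widgets(int n) {
    std::vector<std::shared_ptr<Widget>> v;
    for (int i = 0; i < n; ++i) v.push_back(std::make_shared<Widget>(Widget{i}));
    return v;
}
```

    Unlike `auto_ptr`, a `unique_ptr` can also be stored in standard containers, because containers move it rather than copy it.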

    Sure it's not as easy as 'allocate and forget,' but you won't have the (sometimes very costly) expense of full-blown garbage collection.

    Also, you can optimize your smart pointers for individual types (through template specialization). A great example is to give the no-longer-needed object back to a pool for later reuse.
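    The post suggests template specialization; a simpler route to the same effect, and a swapped-in technique rather than the poster's, is a `shared_ptr` custom deleter that returns the object to a pool instead of deleting it. A minimal, single-threaded sketch; `Buffer` and `BufferPool` are hypothetical, and the pool must outlive every pointer it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Buffer { char data[256]; };

// Hypothetical free-list pool: released Buffers are kept for reuse
// instead of being deleted. Not thread-safe.
class BufferPool {
public:
    ~BufferPool() {
        for (Buffer* b : free_) delete b;    // only idle buffers are owned here
    }
    std::shared_ptr<Buffer> acquire() {
        Buffer* b;
        if (free_.empty()) {
            b = new Buffer();
        } else {
            b = free_.back();
            free_.pop_back();
        }
        // Custom deleter: when the last shared_ptr dies, hand the Buffer back.
        return std::shared_ptr<Buffer>(b, [this](Buffer* p) { free_.push_back(p); });
    }
    std::size_t idle() const { return free_.size(); }
private:
    std::vector<Buffer*> free_;
};
```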

    This is really a quick, quick overview. For the meat & potatoes, go read Effective STL by Scott Meyers.

    I've tried really hard to be fair & polite. There's probably still a bias, but I'm really trying!!

    • by swillden ( 191260 ) * <shawn-ds@willden.org> on Tuesday September 16, 2003 @01:50AM (#6972753) Journal

      You repeat some common myths about GC; allow me to counter them.

      The obvious: CPU & memory overhead for the checking and tracking. I can't comment on the amount here, but it is a generalized solution, so you forego the optimization opportunities that you'd otherwise have.

      Malloc/free and new/delete (without pooling) are also generalized solutions, and they also consume CPU and memory overhead for checking and tracking. There is good reason to believe that in the right type of language (which C and C++ are not) that GC can actually be much more efficient than manual deallocation, mainly because it can do its work in larger batches, and because it can reorganize objects in memory to make allocating more efficient. Contrast a simple single-heap malloc implementation, which has to scan a free list looking for a sufficiently large block against a copying garbage-collected system where the allocation pool is simply a large contiguous block from which you just grab the first 'n' bytes.
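      To make the allocation-cost contrast concrete, here is a minimal bump-pointer allocator: it hands out the next `n` bytes of one contiguous block by advancing an offset, with no free-list search. It only illustrates the allocation side that a copying collector makes possible; the hypothetical `Arena` class reclaims memory all at once via `reset()` rather than by collecting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t bytes) : buf_(bytes), used_(0) {}

    // Returns n bytes aligned to `align` (a power of two, at most
    // alignof(max_align_t)), or nullptr when the block is exhausted.
    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);  // round up
        if (start + n > buf_.size()) return nullptr;
        used_ = start + n;                     // "grab the first n bytes"
        return buf_.data() + start;
    }
    void reset() { used_ = 0; }                // free everything in O(1)
    std::size_t used() const { return used_; }
private:
    std::vector<unsigned char> buf_;           // operator new aligns the base
    std::size_t used_;
};
```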

      If you look on Boehm's web site, you can find a few papers comparing the performance of conservative GC for C with optimized malloc/free implementations. malloc/free wins, but not by as much as you'd expect.

      The subtle: Memory allocation can become a major bottleneck in multithreaded systems. Garbage collection has similar issues.

      Actually, GC *eases* the issues associated with recovering memory in multithreaded systems. Why? In a multi-threaded program with manual deallocation, both allocation and deallocation occur in every thread context. In a GC system, all deallocation is typically concentrated in a single thread, the GC thread. Allocation is still spread across threads but the required interlocking is hugely reduced since the GC thread can do all of the reclaiming, block coalescing and free list construction (if that's the technique used) without any interference from the other threads. It will have to acquire a mutex to place the recovered blocks back where the active threads can get them, of course.

      Generational, copying GCs can do even better, but not for C or C++.

      The irritating: you don't know when your destructors are called.

      As experience with finalize methods in Java has shown, you should really treat GC as a way of having infinite memory. The problem with finalizers/destructors is that not only do you not know when they'll be called, you have no way of knowing that they'll *ever* be called. That means that they're effectively useless and add significant complexity and overhead for little or no return.

      IMO, if you want to use C++ with GC, you should make sure that objects that have non-trivial destructors (those that do something besides memory management) get destructed normally, and just let GC handle the memory.

      Reference Counting Smart Pointer (RCSP for short)

      They're useful, but I'd hardly call them great. Reference counting is *far* more compute-intensive than scanning-type garbage collection. And then there's the problem of circular references, which will never be reclaimed. Of course, it's not that hard to avoid those situations most of the time, but with GC you don't have to care.
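      A minimal sketch of the circular-reference problem with `std::shared_ptr` (the standardized form of Boost's `shared_ptr`): two nodes that own each other are never freed, while making the back link a `std::weak_ptr` lets both counts reach zero. The `Node` type and the live-node counter are hypothetical instrumentation.

```cpp
#include <cassert>
#include <memory>

int g_nodes_alive = 0;   // instrumentation for the demo

struct Node {
    Node() { ++g_nodes_alive; }
    ~Node() { --g_nodes_alive; }
    std::shared_ptr<Node> next;   // owning link
    std::weak_ptr<Node> prev;     // non-owning back link
};

// Links two nodes both ways and returns a weak handle to the first.
// With a strong back link the counts never reach zero and both nodes leak;
// with a weak back link both are destroyed when the locals go away.
std::weak_ptr<Node> build_cycle(bool weak_back_link) {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = std::make_shared<Node>();
    a->next = b;
    if (weak_back_link) b->prev = a;
    else b->next = a;             // strong cycle: a -> b -> a
    return a;
}
```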

      Owning Smart Pointer (OSP) ... auto_ptr

      These are very useful, and conscientious use of them will eliminate 95% of memory leaks and dangling pointers. OTOH, they don't work when things get complex enough that ownership isn't simple and clear.

      Also, you can optimize your smart pointers for individual types (through template specialization). A great example is to give the no-longer-needed object back to a pool for later reuse.

      Generally, I would do this through specialized new/delete, rather than specialized smart pointers. Regardless of the mechanism, though, pooled allocation is the absolute best thing you can do to minimize the cost of memory management in your application. The reason is, of course, that you build the pooling based on your knowledge of the actual usage characteristics of the objects; knowledge that no general-purpose memory manager can possibly have.
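      A minimal sketch of pooling through class-specific `operator new`/`operator delete`, as suggested above: freed blocks go onto an intrusive free list and are handed out again on the next allocation. `Message` is hypothetical; this pool is not thread-safe and never returns memory to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Message {
public:
    int payload[4];

    // Reuse a block from the free list if one is available.
    static void* operator new(std::size_t n) {
        if (n != sizeof(Message)) return ::operator new(n);  // e.g. derived classes
        if (free_list_) {
            FreeBlock* b = free_list_;
            free_list_ = b->next;
            return b;
        }
        return ::operator new(sizeof(Message));
    }
    // Push the block onto the free list instead of releasing it.
    static void operator delete(void* p, std::size_t n) {
        if (!p) return;
        if (n != sizeof(Message)) { ::operator delete(p); return; }
        FreeBlock* b = static_cast<FreeBlock*>(p);
        b->next = free_list_;
        free_list_ = b;
    }
private:
    struct FreeBlock { FreeBlock* next; };   // overlays a freed Message
    static FreeBlock* free_list_;
};

Message::FreeBlock* Message::free_list_ = nullptr;
```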

      • And then there's the problem of circular references, which will never be reclaimed. Of course, it's not that hard to avoid those situations most of the time, but with GC you don't have to care.

        Ref-counting can be safe if used in conjunction with other methods. Its flaw is not only in that circular references between objects can't be tracked down but also in the general assumption that you should stay alive in memory while at least someone holds a reference to you. Semantically that's not always fair, sin

        • A mechanism that allows a reference to be invalidated when an object dies of its own will complements reference counting.

          Yep, manually breaking the circularity solves the problem, but that's not always possible.

          • When exactly is it not possible? You program the destruction notification yourself, so why shouldn't it be possible? The only difficulty here is extra coding in both objects, the reference holder and the target. They have to be somehow `aware' of this functionality, and most likely that means it should be built into the base class of your hierarchy, as Gtk and VCL do.

            In my experience, this technique combined with 'smart' pointers works quite well even in huge and complex applications.

      • by Anonymous Coward
        Actually, GC *eases* the issues associated with recovering memory in multithreaded systems. Why? In a multi-threaded program with manual deallocation, both allocation and deallocation occur in every thread context. In a GC system, all deallocation is typically concentrated in a single thread, the GC thread. Allocation is still spread across threads but the required interlocking is hugely reduced since the GC thread can do all of the reclaiming, block coalescing and free list construction (if that's the tech
        • You also have to consider cache lines to avoid stalling different threads. There's a dozen other factors as well. In the end even your best generational garbage collector is no match for a modern SMP malloc/free implementation like Hoard. Google on it.

          Thanks, I will.

          I'll freely admit that my knowledge of memory management schemes was state of the art circa 1998, but that changes in processor architecture (heavy memory caching, deep pipelines) and machine architecture (widespread SMP) could very well

        • I don't think you are thinking about what actually happens at CPU level.

          If a mutex is active and a thread blocks, it will get put into the wait queue and not use CPU. The only ready processes will be new IO, then you are back to the GC code again. The only overhead is the actual scanning on a single-CPU system. I agree that GC and SMP can get a bit hairy (it would probably be best to divide the memory space N ways for an N-processor system and separate thread memory as much as possible).

          It's also true that GC w
      • Throughout your post I was thinking "yes, but...", "well put, but..." and then I reached this, which I agree with, and which is the only problem I have with GCs.

        >The reason is, of course, that you build the pooling based on your knowledge of the actual usage characteristics of the objects; knowledge that no general-purpose memory manager can possibly have.

        I find this in general... Garbage Collectors, not unlike VMs, do well when they know their problem domain well. So, it's common to use a Garbage Collecto
        • by swillden ( 191260 ) *

          iow, it's really worthwhile to think about memory management issues, who owns memory, etc, and not just for the free() call, but for the design itself. It pays back a lot to think of these things, and use GC for particular cases that can benefit.

          I agree with this, actually. I've seen many cases where having to think about object lifetimes has given me clearer insights into the problem domain and into the design, and resulted in better, tighter, cleaner and more maintainable code than would have been the

    • GC costs (Score:5, Informative)

      by greppling ( 601175 ) on Tuesday September 16, 2003 @04:14AM (#6973269)
      One thing you didn't mention is that GC is deemed to have pretty high processor cache-miss costs. The obvious part is that the GC run itself is basically pointer chasing, i.e. pretty much the worst thing you can do cache-wise. And after the GC run, the cache is clobbered with stuff useless for continuing the work.

      There is another indirect cost pointed out by Linus Torvalds in a lengthy post to the gcc mailing list [gnu.org]. The executive summary is that (he thinks that) memory that is no longer going to be used should be freed immediately; otherwise, the data in it will keep lying around in the data cache. He also claims that explicit ref-counting gives you optimization opportunities: assume you have to make some modifications to a data structure, but you don't want other parts of the program to see them. Without ref-counting, you have to copy the entire data structure before modifying it. With ref-counting, you can skip the copy if you are the only one with access to the data structure.
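The copy-avoidance argument can be sketched with std::shared_ptr, whose use_count() reports how many owners share the data. This is a minimal single-threaded sketch; CowVector is a hypothetical name, and in multithreaded code use_count() alone is not a safe basis for this decision.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical copy-on-write wrapper: data is shared until someone writes.
class CowVector {
public:
    CowVector() : data_(std::make_shared<std::vector<int>>()) {}

    const std::vector<int>& read() const { return *data_; }

    // Mutable access: copy only if another owner could see the change.
    std::vector<int>& write() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<int>>(*data_);  // shared: copy first
        }
        return *data_;  // sole owner: modify in place, no copy
    }

    // Address of the underlying buffer, handy for checking whether a copy happened.
    const void* identity() const { return data_.get(); }

private:
    std::shared_ptr<std::vector<int>> data_;
};
```

The sole-owner branch is exactly the case where a tracing GC, which has no reference count to consult, would have to copy defensively.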

      And finally, he thinks that GC makes it too easy to write pointer-chasing-heavy code---as that kind of code is bad for cache behaviour all the time.

      Whether GC really has such bad effects on GCC's performance is still being debated. But Linus Torvalds seems to have very good points. (And some of them certainly cannot be taken into account in a "GC cost is less than hand-written memory management" paper.)

      • Re:GC costs (Score:4, Interesting)

        by Hard_Code ( 49548 ) on Tuesday September 16, 2003 @07:09AM (#6973884)
        Can anybody informed tell me whether we have not ALREADY lost the war against pointer-chasing and cache clobbering? Any OOP or interpreted language (the vast majority of mainstream code) is doing this already, true?
      • Re:GC costs (Score:3, Insightful)

        by be-fan ( 61476 )
        Your description is slightly inaccurate. He said that explicitly freeing allows the next alloc to reuse a given chunk of cache-hot memory, while the GC will ignore that memory and allocate a cache-cold chunk instead.
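The reuse argument can be illustrated with a fixed-size block pool whose free list is LIFO: the block freed most recently is handed back first, so it is likely still cache-hot. BlockPool is a hypothetical sketch, not glibc's malloc or any particular collector's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical fixed-size block pool with a LIFO free list.
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t count)
        : block_size_(round_up(block_size)),
          storage_(new unsigned char[round_up(block_size) * count]) {
        // Push in reverse so the first allocation returns block 0.
        for (std::size_t i = count; i-- > 0;) push(&storage_[i * block_size_]);
    }

    void* allocate() {
        if (head_ == nullptr) return nullptr;  // pool exhausted
        Node* n = head_;
        head_ = n->next;
        return n;
    }

    // Freed block goes to the front, so it is the next one handed out.
    void deallocate(void* p) { push(p); }

private:
    struct Node { Node* next; };

    // Every block must hold a Node and keep Node alignment.
    static std::size_t round_up(std::size_t n) {
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + alignof(Node) - 1) / alignof(Node) * alignof(Node);
    }

    void push(void* p) { head_ = ::new (p) Node{head_}; }

    std::size_t block_size_;
    std::unique_ptr<unsigned char[]> storage_;
    Node* head_ = nullptr;
};
```

A collector that only reclaims memory in a later sweep cannot offer this immediate reuse; the freed block sits idle until the next collection.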
      • Many people are not aware that GCC itself uses garbage collection as it runs. You can actually select which algorithm gets used at configure time, and tweak the GC parameters during runtime (via a growing set of command-line options that users never think to use).

        That aside: I've corresponded with Linus a couple of times (on other subjects), and while he is the brilliant guy that /. thinks he is, he is a kernel expert, not a compiler expert. Entirely different problem domain, very different approaches to

      • But Linus Torvalds seems to have very good points.

        After reading many of Linus Torvalds' posts, I think it's useful to remember where he's coming from. He's spent ten years writing a kernel, with some of the best programmers, where every line of code has been rewritten several times. Yes, in that environment, garbage collection won't be a huge win. But you don't always have forever to work on one project; you don't always have a team of crack programmers on the job; and a lot of times, efficiency is not of pri
    • I used BW Garbage Collection on an Information Visualization system here [umd.edu] available under the GPL.
      It works, but with some problems, mainly because it doesn't know enough about the OS, in particular about large pages allocated by libraries, which it then has to scan for pointers. There is nothing that cannot be fixed in theory, but systems are not designed for it right now.
      On my visualization application, it spends seconds scanning some memory mapped zone opened by NVidia OpenGL implementation (this is a guess
    • What about circular references? If they are not themselves referenced by an external object, do they still get cleaned up, or do they stick around because they each have a refcount of 1?
    • The memory management cost for GC is basically the same as for malloc/free. It's just amortized in a different place.

      It turns out, however, that there are natural places to do GC, and a little help from the application can go a very long ways. In the OpenCM collector, we mark procedures that return pointers using a special GC_RETURN macro. This works because at the return from a procedure all of its local variables are known to be unreachable. The only surviving objects are the ones that are reachable fro

    • I read some good articles about this at Relisoft.com [relisoft.com] and it was very helpful.
    • Reference Counting Smart Pointer (RCSP for short): this type of smart pointer will keep track of how many RCSPs are pointing to the same object. It'll delete the object when the last RCSP is destroyed.

      So, if you have two RCSPs pointing at each other (or a whole daisy chain of them), and nothing else pointing to any of them, when do they get deleted?

      (They don't. That's the weakness of reference counting. You're fine so long as you never create any circular lists. (That's one reason you cannot create hard link
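The cycle problem is easy to reproduce with std::shared_ptr, the standard reference-counting smart pointer. In this sketch (RcNode and its alive counter are hypothetical, for illustration only), two nodes that point at each other through strong references survive after every outside reference is gone, while making the back edge a std::weak_ptr lets the pair be reclaimed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type that counts live instances.
struct RcNode {
    std::shared_ptr<RcNode> next;  // strong link: keeps the target alive
    std::weak_ptr<RcNode> peer;    // weak link: does not affect the count
    static int alive;
    RcNode() { ++alive; }
    ~RcNode() { --alive; }
};
int RcNode::alive = 0;
```

The usual rule of thumb is to make one direction of every cycle weak (for example child-to-parent links), or to break the cycle by hand before dropping the last outside reference.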
    • Another way: Smart Pointers. They're simple wrappers around the types that act like pointers, but they can make sure your objects live as long as you need and no longer. The big trick is knowing which kind of smart pointer you want.

      I've read from multiple resources that smart pointers don't work with STL containers (due to the way the internal container handles memory)
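That warning applies to std::auto_ptr, whose copy transfers ownership and nulls the source, so a container copying its elements could silently empty them. Reference-counted pointers behave differently: copying a std::shared_ptr (standard since C++11, boost::shared_ptr before that) just increments the count. A small sketch, with make_names as a hypothetical helper:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Copies of a shared_ptr share ownership, so container copies and
// reallocations leave every element valid.
std::vector<std::shared_ptr<std::string>> make_names() {
    std::vector<std::shared_ptr<std::string>> v;
    v.push_back(std::make_shared<std::string>("alpha"));
    v.push_back(std::make_shared<std::string>("beta"));
    return v;
}
```

std::unique_ptr (also C++11) can be stored in containers too, as long as elements are moved rather than copied.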

  • by bo0ork ( 698470 ) on Tuesday September 16, 2003 @01:47AM (#6972742)
    I wrote an OO language back in 1993 that's being used by two medium-sized companies. It's garbage collected, and its kernel is written in C. The language is not interpreted; it gets translated into C and then compiled. The applications written with the language are fairly large. The source code of one is 28MB uncompressed. I'll skip the general implementation details, and just go over the garbage collection approach I used. These definitions are true for that language; they're not meant to be general.

    A program variable is either a global variable, a stack variable, a class variable or an instance variable. Global and stack variables are held in lists. Class and instance variables are kept inside objects.

    Every class object has a global variable that always refers to it.

    Any object that is not, and that can not become referenced (directly or indirectly) by a global or stack program variable is garbage.

    Each object has a 'not-garbage' flag.

    For each global and stack variable, if the referenced object is not marked not-garbage, mark it as not-garbage and recurse into that object's contained variables.

    Delete all objects that are not marked not-garbage.

    There are a few more twists, like handling return values on the stack, but this algorithm correctly handles self-referencing objects no matter the complexity.
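The marking scheme described above can be sketched as a small mark-and-sweep collector. This is a simplified model under stated assumptions, not the author's implementation: Obj, Heap and roots are hypothetical names, roots stands in for the global and stack variable lists, and marking recurses just as the description does (a production collector would usually use an explicit stack to avoid deep recursion).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical object: references to other objects plus a not-garbage flag.
struct Obj {
    std::vector<Obj*> fields;  // stands in for class/instance variables
    bool not_garbage = false;
};

class Heap {
public:
    std::vector<Obj*> roots;  // stands in for global and stack variables

    Obj* make() {
        objects_.push_back(std::make_unique<Obj>());
        return objects_.back().get();
    }

    void collect() {
        for (auto& o : objects_) o->not_garbage = false;
        for (Obj* r : roots) mark(r);
        // Sweep: delete every object not marked not-garbage.
        objects_.erase(std::remove_if(objects_.begin(), objects_.end(),
                                      [](const std::unique_ptr<Obj>& o) { return !o->not_garbage; }),
                       objects_.end());
    }

    std::size_t size() const { return objects_.size(); }

private:
    static void mark(Obj* o) {
        if (o == nullptr || o->not_garbage) return;  // already marked: this stops cycles
        o->not_garbage = true;
        for (Obj* f : o->fields) mark(f);
    }
    std::vector<std::unique_ptr<Obj>> objects_;
};
```

The early return on an already-marked object is what lets this approach handle self-referencing structures that plain reference counting cannot.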

  • It's okay (Score:4, Interesting)

    by Anonymous Coward on Tuesday September 16, 2003 @07:41AM (#6974092)
    I've used a garbage collection system in a C project before and it works surprisingly well. The problem with GC in C though is that it is possible and legal to,

    o allocate memory
    o write the pointer to a disk
    o lose the pointer in memory
    o read the pointer back off the disk,
    o make use of the pointer

    With all GC strategies I'm aware of, by the time you read the pointer from the disk the memory may well have been freed.

    I'm not saying that this style of programming is generally a good idea, but it is used in certain specialised situations, and such code is therefore not suitable for a garbage-collected language.
    • Well, yes, you could do that, but... for goodness' sake, just don't do that! While it could work, I cannot imagine any reason to do such a thing.
      • Maybe that's a problem with your imagination. You'll be telling me next that Knuth never propagated the concept of the XOR DLList, and dancing pointers?

        If Boehm says to Knuth "but you shouldn't be doing that", then I think it's perfectly acceptable for Knuth to respond "but _you_ shouldn't be doing _that_".

        YAW.
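For readers who haven't met it, the XOR doubly linked list stores prev XOR next in a single integer field instead of two pointers. That is exactly the kind of structure that defeats a conservative collector, because no ordinary pointer to a neighbouring node exists anywhere in memory. A minimal sketch (XorNode, link_all and traverse are hypothetical names; it assumes the usual flat address space in which a null pointer converts to integer 0):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each node stores address(prev) ^ address(next) instead of two pointers.
struct XorNode {
    int value;
    std::uintptr_t link;
};

inline std::uintptr_t addr(const XorNode* n) { return reinterpret_cast<std::uintptr_t>(n); }

// Link an array of nodes into one list: nodes[0] and nodes[n-1] are the ends.
inline void link_all(XorNode* nodes, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const XorNode* prev = i > 0 ? &nodes[i - 1] : nullptr;
        const XorNode* next = i + 1 < n ? &nodes[i + 1] : nullptr;
        nodes[i].link = addr(prev) ^ addr(next);
    }
}

// Walk from either end; XOR with the node we came from yields the next one.
inline std::vector<int> traverse(XorNode* start) {
    std::vector<int> out;
    XorNode* prev = nullptr;
    XorNode* cur = start;
    while (cur != nullptr) {
        out.push_back(cur->value);
        XorNode* next = reinterpret_cast<XorNode*>(cur->link ^ addr(prev));
        prev = cur;
        cur = next;
    }
    return out;
}
```

Under a conservative collector such as Boehm's, nodes reachable only through such links would look unreachable and could be reclaimed while still in use, which is the point being argued.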
    • A perfect solution for when you want to write data to a disk is to use a Memory Mapped File. You can write data to a file and still keep it in memory. GC will just work correctly, although using GC with a Memory Mapped File may cause the data to be read in again every time a GC occurs.

      I once wrote some classes to work with Memory Mapped Files (under Windows) in an almost transparent manner. It works great for making complex C++ object hierarchies persistent.

    • ... well thanks, you made my head explode by saying that!....

      Luckily I don't need my head to type. I can no longer read the articles on slashdot... but that doesn't matter, I can still post.
  • I am using the BDW collector in an EDA tool. EDA tools store large databases of circuit connectivity, and for various reasons we don't want to be bothered with explicit memory management.

    The salient points:

    Destructors are not Called

    If an object is allocated in collectible memory, then its destructor will not be called when the object is collected. Therefore, destructors are pretty much useless and your code must be designed to work without them.

    Actually, if your object derives from class gc_cleanu

  • Very happy... (Score:3, Insightful)

    by DrCode ( 95839 ) on Tuesday September 16, 2003 @11:07AM (#6976224)
    A few years ago, I used the Boehm GC when writing a pair of compilers (Verilog/VHDL) in C++. I was very happy with the result, since it was rare for GC even to get called at all. It was also surprising how much simpler code gets when you don't have to worry about deleting objects.

  • ILOG [ilog.com] Solver & Scheduler are mainstream commercial third-party libraries in C++ based on the constraint programming paradigm. One of their major features is ILOG's automatic garbage collection heap, which deallocates memory automatically (based on assumptions about program flow). To make this efficient, they skip all deallocations (using a longjmp rather than a return).

    At first this may look like an elegant way to get rid of complicated memory management & garbage collection without losing efficie
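The comment doesn't describe ILOG's internals beyond the longjmp trick, so this is not their implementation. A closely related, widely used technique is arena (region) allocation: objects are carved out of one buffer and never freed individually, and the whole region is released at once. A minimal sketch with a hypothetical Arena class:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical bump-pointer arena: no per-object deallocation at all.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Round the offset up to the requested alignment, then advance.
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t offset = (used_ + align - 1) / align * align;
        if (offset > capacity_ || size > capacity_ - offset) return nullptr;
        used_ = offset + size;
        return buffer_.get() + offset;
    }

    // No destructor ever runs for arena objects, so only accept types
    // that don't need one.
    template <typename T, typename... Args>
    T* make(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "arena objects are never destroyed individually");
        void* mem = allocate(sizeof(T), alignof(T));
        return mem ? ::new (mem) T{std::forward<Args>(args)...} : nullptr;
    }

    // Release everything at once: O(1), no per-object work.
    void reset() { used_ = 0; }
    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

The trade-off mirrors the one in the comment: deallocation becomes free, but only if no object in the region needs cleanup and nothing outlives the reset.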
  • The Qt toolkit (on which KDE is based) has a nice garbage collection facility. All of the widgets derived from the base class QWidget take care of deleting child widgets that are also derived from QWidget, including user defined types. This means you can add, remove or move widgets in your user interface without having to worry about the corresponding delete.

    Tom.
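The parent/child ownership idea can be sketched without Qt. The Widget class below is hypothetical and far simpler than Qt's actual classes: each widget registers with its parent, a parent deletes its remaining children in its destructor, and a child deleted early removes itself from its parent's list.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical parent/child ownership, not Qt's implementation.
class Widget {
public:
    explicit Widget(Widget* parent = nullptr) : parent_(parent) {
        ++alive;
        if (parent_) parent_->children_.push_back(this);
    }
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;

    virtual ~Widget() {
        // Delete children; each child's destructor unregisters it from us.
        while (!children_.empty()) delete children_.back();
        if (parent_) {
            auto& siblings = parent_->children_;
            siblings.erase(std::remove(siblings.begin(), siblings.end(), this), siblings.end());
        }
        --alive;
    }

    static int alive;  // live-instance counter, for illustration only

private:
    Widget* parent_;
    std::vector<Widget*> children_;
};
int Widget::alive = 0;
```

Only the top-level widget needs an owner; everything created with a parent is cleaned up when that owner goes away.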
  • by umofomia ( 639418 ) on Tuesday September 16, 2003 @03:51PM (#6979264) Journal
    There's a really good book [amazon.com] about everything you ever needed to know about garbage collection. Although most of the book deals with garbage collection techniques in general, it has two complete chapters devoted to implementing and using garbage collectors in C and C++ and which ones you should use depending on your application needs.
  • ...whenever you find yourself writing an overly-complicated means to overcome issues of object/memory 'ownership'.

    (Granted, one could say that this would apply to the GC itself, but not necessarily so)

    The trick is, memory is a 'resource' and as such is subject to acquisition and release steps in order to maintain it properly. If the notion of ownership of memory is ambiguous, you need to normalize your data somehow so you get back to a 1:n relationship between owners and acquired resources. This happe
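One common way to normalize ownership to a single owner per resource in C++ is std::unique_ptr held by exactly one owner, with everyone else holding non-owning raw pointers. A sketch with hypothetical Registry and Connection types (C++14 for std::make_unique):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Connection {
    std::string host;
};

// Hypothetical sole owner: every Connection lives exactly as long as the Registry.
class Registry {
public:
    Connection* open(std::string host) {
        owned_.push_back(std::make_unique<Connection>(Connection{std::move(host)}));
        return owned_.back().get();  // non-owning handle for callers
    }
    std::size_t count() const { return owned_.size(); }

private:
    std::vector<std::unique_ptr<Connection>> owned_;  // the single owner
};
```

The one rule the design asks you to keep is that the non-owning pointers must not outlive the Registry.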
  • You can get the same effect using smart pointers without giving up the control that adopting a garbage collection system would cost you. See Boost [boost.org] and Alexandrescu, Andrei. Modern C++ Design. There is also a nice article on CUJ. [cuj.com]
  • We considered using the Boehm collector for our commercial product, NewJ Library for C++. But we opted not to due to Boehm's lack of predictable object destruction and its maintenance of separate heaps, complicating integration with existing libraries. These issues are covered in detail in a recent C/C++ User's Journal (CUJ) article on the Boehm collector.

    Instead, we developed our own automated object management facility based on reference objects, that is, "smart pointer" objects with these new capabilit

  • I've dealt with garbage collection in Java and now in Python. When you get into the mindset of a language that does this natively, I have found that your code naturally flows into that paradigm. I can't imagine trying to use garbage collection in C/C++ -- it just doesn't fit into the scheme of things for me. True, the STL has auto_ptr, and I have used that in the past -- works rather nicely, IMHO -- however the way I learned how to write clean, efficient C code was to make sure you write the code to dealloc

