C# Memory Leak Torpedoed Princeton's DARPA Chances
nil0lab writes "In a case of 20/20 hindsight, Princeton DARPA Grand Challenge team member Bryan Cattle
reflects on how their code failed to forget obstacles it had passed. It was written in Microsoft's C#, which isn't supposed to let you have memory leaks. 'We kept noticing that the computer would begin to bog down after extended periods of driving. This problem was pernicious because it only showed up after 40 minutes to an hour of driving around and collecting obstacles. The computer performance would just gradually slow down until the car just simply stopped responding, usually with the gas pedal down, and would just drive off into the bush until we pulled the plug. We looked through the code on paper, literally line by line, and just couldn't for the life of us imagine what the problem was.'"
Stupid Slashdot headline (Score:5, Interesting)
It's not C#'s fault. The team had references to the obstacle list (event handlers), which prevented garbage collection. The
Re:Slashvertisement (Score:5, Interesting)
Re:Stupid Slashdot headline (Score:3, Interesting)
Maybe so. But if they explicitly call delete to invoke the garbage collection of an object, would it not be better for the system to destroy the object and then throw an exception when something tried to send an event notification to a non-existent object?
Furthermore, if delete is called and the garbage collector does not delete the object because it realizes that the object is registered on certain events, would it not be just as easy to then un-register the object for the event? Or at least report it? After all, the GC already went to the trouble of checking to see if the object was registered with an event notification.
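The leak pattern under discussion, and the explicit unregistration that fixes it, can be sketched like this (a Java sketch; Obstacle, ObstacleMap, and passed() are hypothetical names standing in for the team's code, not taken from it):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event source: each obstacle subscribes to the map's change
// event when it is created.
class ObstacleMap {
    interface Listener { void onUpdate(); }
    private final List<Listener> listeners = new ArrayList<>();
    void addListener(Listener l) { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }
}

class Obstacle implements ObstacleMap.Listener {
    private final ObstacleMap map;
    Obstacle(ObstacleMap map) { this.map = map; map.addListener(this); }
    public void onUpdate() { /* react to map changes */ }
    void passed() { map.removeListener(this); } // explicit unregistration: the fix
}

public class LeakDemo {
    public static int[] demo() {
        ObstacleMap map = new ObstacleMap();
        for (int i = 0; i < 1000; i++) {
            new Obstacle(map); // dropped immediately, but the map still holds it
        }
        int leaked = map.listenerCount(); // 1000: none of these can be collected
        Obstacle o = new Obstacle(map);
        o.passed();                       // unregister once the car drives past
        int afterFix = map.listenerCount() - leaked; // 0: this one is collectable
        return new int[] { leaked, afterFix };
    }
}
```

The GC cannot un-register the handler itself, because as far as it can tell the event source's reference is a deliberate, live reference like any other.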
c#? (Score:3, Interesting)
They were using massive cooling systems and holding very thorough code reviews; that sounds like a perfect reason to use C over C#.
only 10KLOC? (Score:4, Interesting)
Of course memory leaks can happen with garbage collected languages, but these leaks are a little easier to find....
Maybe they should have coded in a higher-level language like OCaml or Haskell.
And yes, I'm sure most of an autonomous vehicle's software is not low-level drivers, but planning & perception tasks. On such tasks, higher-level languages definitely make sense.
I also didn't understand what kind of libraries these teams are using.
I'm also surprised that it is apparently so easy to get funded with only 10 KLOC inside a car!
Hard/weak references for event handlers (Score:4, Interesting)
So I guess the real question here is whether event handlers should be hard-referenced (as they are here), or just soft/weak referenced...
From a developer's perspective it's quite natural to think that, as long as his code doesn't hold any reference to an object, it should be garbage-collectable. If registerEvent() is going to hard-reference handlers, the documentation should be *very* explicit about it (and about the need to unregister a handler for GC to work on it).
On the other hand, if handlers are not hard-referenced you can no longer register anonymous class event handlers...
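The trade-off above can be sketched in Java (the registry API is hypothetical; ref.clear() is used below to simulate the collector reclaiming an otherwise-unreferenced anonymous handler deterministically):

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical event source that holds its handlers only weakly.
class WeakEventSource {
    interface Handler { void handle(); }
    private final List<WeakReference<Handler>> handlers = new ArrayList<>();
    WeakReference<Handler> register(Handler h) {
        WeakReference<Handler> ref = new WeakReference<>(h);
        handlers.add(ref);
        return ref;
    }
    int fire() {                          // returns how many handlers actually ran
        int ran = 0;
        for (Iterator<WeakReference<Handler>> it = handlers.iterator(); it.hasNext(); ) {
            Handler h = it.next().get();
            if (h == null) it.remove();   // target collected: silently dropped
            else { h.handle(); ran++; }
        }
        return ran;
    }
}

public class WeakHandlerDemo {
    public static String demo() {
        WeakEventSource src = new WeakEventSource();
        WeakEventSource.Handler anon = () -> { };   // think: anonymous handler
        WeakReference<WeakEventSource.Handler> ref = src.register(anon);
        int before = src.fire();  // 1: 'anon' is still strongly reachable
        // Once the caller drops its only reference, the collector may reclaim
        // the handler at any time; ref.clear() simulates that collection here.
        ref.clear();
        int after = src.fire();   // 0: the handler silently stopped firing
        return before + "/" + after;
    }
}
```

This is exactly the anonymous-handler problem: with weak registration, a handler nothing else references can disappear mid-run without any error.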
Re:Stupid Slashdot headline (Score:5, Interesting)
don't these kids learn anything anymore? (Score:5, Interesting)
(2) You are particularly supposed to test your software if you send $200k and 1 ton of hardware careening through the street on autonomous real-time control.
(3) Garbage collectors do not prevent memory leaks.
(4) Garbage collected systems can be good for building real-time systems, but you need a real-time garbage collector or you need to treat the system as if it didn't have a garbage collector at all.
What "ruined their chances" was not that they overlooked a memory leak; what ruined their chances was that they didn't know what they were doing.
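Point (4) in practice: one common discipline for treating the system as if it had no collector is to preallocate everything up front and recycle, so the steady-state control loop never allocates and never gives the collector (or a leak) a chance to grow. A sketch in Java, with illustrative names:

```java
import java.util.ArrayDeque;

// Fixed-size pool of obstacle records: all allocation happens in the
// constructor, none in the hot loop. (ObstaclePool/Record are hypothetical.)
class ObstaclePool {
    static final class Record { double x, y; boolean live; }
    private final ArrayDeque<Record> free = new ArrayDeque<>();
    ObstaclePool(int capacity) {
        for (int i = 0; i < capacity; i++) free.push(new Record());
    }
    Record acquire(double x, double y) {
        Record r = free.pop();      // no 'new' in the control loop
        r.x = x; r.y = y; r.live = true;
        return r;
    }
    void release(Record r) {        // obstacle passed: recycle, don't retain
        r.live = false;
        free.push(r);
    }
    int available() { return free.size(); }
}

public class PoolDemo {
    public static int drive() {
        ObstaclePool pool = new ObstaclePool(64);
        for (int step = 0; step < 10_000; step++) {  // simulated control loop
            ObstaclePool.Record r = pool.acquire(step, -step);
            pool.release(r);                         // forget passed obstacles
        }
        return pool.available(); // 64: the working set never grows
    }
}
```

A leak like the team's would show up immediately here as the pool running dry, instead of as a gradual slowdown forty minutes in.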
Ahahaha! (Score:3, Interesting)
This kind of thing makes me so happy. Sure, it's not really a bug in C#, but this is even better, a perfect demonstration of how GC does next to nothing to prevent this type of bug, and instead fools people into complacency while making the bug much more subtle.
In my opinion there is a proper language level for nearly any task. For kernel programming, drivers, or RT stuff, C. User-level stuff is usually better in C++. Well, I'm a big fan of C++ and more comfortable there so I'll usually extend its range down to some lower-level work and sometimes I'll bang out a quick-and-dirty app or script type thing (lots of user input parsing and other things C++ isn't great at) in it too, even if it could be done better (yes, better as in higher quality) or faster in another language.
Anyway, although I could be making incredibly wrong assumptions about the nature of the problem, I'm pretty sure that C# wasn't the right language for the job. C# very nicely occupies the space between C++-level languages and scripting languages, but for a problem that involves probably no parsing whatsoever (it shouldn't, anyway), needs to be perfectly stable (in my experience GC apps are buggier; I'm not going to go off on that tangent now and explain, but it's been my experience), and needs as deterministic a runtime as possible, it's C or a subset of C++ (little to no STL) all the way. This paragraph was brought to you by Lisp.
This problem was caused by (I'm going to go out on a limb here) the wrong language choice. If this were C/C++, there would have been a segfault (easy to debug, usually) or the old reference wouldn't have mattered at all. C#'s real strong point, its huge and well-integrated library, probably didn't help them out very much.
Every programmer who wants to call themselves a real programmer should learn as many languages in as wide a range as possible. Sure, have favorites, but that should mean trying to work in your language's realm, not extending it way beyond its range.
Re:Hard/weak references for event handlers (Score:5, Interesting)
The poster of the article was trolling, and not only trolled with the post, but managed to get a troll attached to a slashvertisement which was not even trolling.
Impressive on the part of the person who submitted it, but disappointing considering Taco's comments a few weeks back about articles that are truly nothing but advertisements.
Re:I'll show you mine if you.. (Score:3, Interesting)
A moderately experienced programmer would recognize the problem very easily by, say, noticing that a listener method is getting called 100,000 times for each event.
Re:Swing (Score:1, Interesting)
The resizing function was non-synchronous and had a callback to let your GUI know when the resizing was done. Turns out it was never removing the pointer to the uncompressed image internally after passing it out to the callback function. This bug has long-since been fixed, but it just goes to show that everyone makes this kind of mistake sometimes. It took quite a while to track that down, as the memory leak wasn't in my code, but eventually I fixed it by using a blocking resize function.
Wow, how embarrassing (Score:3, Interesting)
Criticisms of the team aside, I would like to say that neither Java nor C# has made any steps to remedy problems like this, which seem to be all too common with inexperienced developers. Both Java and C# need to support attaching to events with "weak" handlers. That is, the handler will not hold onto the object which defines the handler (and will automatically deregister itself sometime after the object has been collected). In many cases, there is a need for an object to listen for and handle an event from another object, but only whilst the object that is listening is still referenced (with the exception of the reference held by the object firing the event).
In C#, the (admittedly ugly) way to implement this is to use an anonymous method and a weak reference: the "closure" that is created for the anonymous method does not hold a reference to "this", as it does not access any of "this"'s fields or methods except through the WeakReference.
The code has a flaw where the event handler code (only a few bytes to hold the closure) will never be deregistered and collected unless the event is fired sometime after the owner object has been collected. This can be fixed by using a NotifyingWeakReference (a weak reference that raises an event when its target has been collected).
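For comparison, in Java the standard-library analogue of such a NotifyingWeakReference is a WeakReference registered with a ReferenceQueue: the runtime enqueues the reference once its target is collected, so the event source can purge dead handlers without waiting for the event itself to fire. A sketch with illustrative names:

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Event source that learns about collected handlers via a ReferenceQueue.
class SelfPurgingSource {
    interface Handler { void handle(); }
    private final ReferenceQueue<Handler> deadQueue = new ReferenceQueue<>();
    private final List<WeakReference<Handler>> handlers = new ArrayList<>();
    void register(Handler h) { handlers.add(new WeakReference<>(h, deadQueue)); }
    int purge() {                 // call opportunistically, e.g. before firing
        int removed = 0;
        Reference<? extends Handler> r;
        while ((r = deadQueue.poll()) != null) {  // enqueued after collection
            handlers.remove(r);
            removed++;
        }
        return removed;
    }
    int size() { return handlers.size(); }
}

public class PurgeDemo {
    public static int demo() {
        SelfPurgingSource src = new SelfPurgingSource();
        SelfPurgingSource.Handler h = () -> { };
        src.register(h);
        // While 'h' is strongly reachable, nothing is enqueued, nothing purged.
        return src.purge() == 0 ? src.size() : -1;   // 1
    }
}
```

(The actual enqueue-after-collection step depends on the GC running, so the demo only exercises the deterministic live-handler path.)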
Re:Reference counting (Score:2, Interesting)
MS used GC rather than reference counting in the CLR to get around the problem of circular references so prevalent in COM/DCOM etc.
Also, this is a classic problem in C#: the event handler holds on to the reference, so you need a Dispose() to handle exceptions and explicit removal of references. But having recently tried to do something similar in C++ using various combinations of boost::bind, siglib, and reference-counted pointers, it's not a trivial problem to solve in any language.
For those who say just use a weak reference when you need multicast delegate behaviour, I would like to see the C++ template code they use to parameterise this kind of functionality. Really, I am still looking for a good solution to event handling in C++ that works well in multithreaded environments and takes a reasonably modern approach to memory (i.e. RAII).
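I can't offer the requested C++ template code, but for contrast, here is a sketch of what the multithreaded half of the problem looks like in a GC language: java.util.concurrent.CopyOnWriteArrayList lets fire() iterate a consistent snapshot while other threads register or unregister handlers, with no explicit locking (names are illustrative):

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Thread-safe multicast event: fire() is safe to run concurrently with
// register()/unregister() because iteration sees an immutable snapshot.
class ConcurrentEvent {
    interface Handler { void handle(); }
    private final CopyOnWriteArrayList<Handler> handlers = new CopyOnWriteArrayList<>();
    void register(Handler h) { handlers.add(h); }
    void unregister(Handler h) { handlers.remove(h); }
    void fire() { for (Handler h : handlers) h.handle(); }
}

public class ConcurrentDemo {
    public static int demo() {
        ConcurrentEvent ev = new ConcurrentEvent();
        AtomicInteger hits = new AtomicInteger();
        ConcurrentEvent.Handler h = hits::incrementAndGet;
        ev.register(h);
        Thread firer = new Thread(() -> { for (int i = 0; i < 1000; i++) ev.fire(); });
        firer.start();
        try { firer.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        ev.unregister(h);
        ev.fire();              // handler no longer invoked after unregister
        return hits.get();      // 1000
    }
}
```

The lifetime question (when is it safe to destroy the handler's owner?) is exactly the part GC solves and RAII-style C++ still has to answer by hand.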
Re:Stupid Slashdot headline (Score:5, Interesting)
Now, if you have control of the implementation of the object who accepts Listeners you can store them internally in a weak collection, which allows them to be garbage-collected. This would work but may not be what the programmer intends. Actually in a language like Java I'd hazard that usually the programmer wouldn't want that at all: consider an application that listens to UI events. As a programmer I want to be able to stick listeners wherever they are needed and leave them there permanently. If I don't need a pointer to the object, I don't want to keep it around, and thus may not have a reference to the listener EXCEPT in the event-management collection. That's the advantage of GC languages: as soon as the object which creates those events (say, a dialog box) goes away, the objects it refers to have one fewer pointer and may be eligible for GC.
Anyway, lots of code has issues like this: we had a problem at my work where an Apache taglib was caching some compilation in a cache that would grow forever. It was a simple code fix to solve that problem, but there was no way for us to even SEE the problem until we ran our application under load in a profiler. Fun fun fun.
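The "weak collection" option mentioned above can be sketched in Java with a WeakHashMap-backed set (illustrative names; this is precisely the behaviour the parent argues is usually unwanted for UI listeners, since a listener with no outside reference can vanish):

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// A listener registry that does NOT keep its listeners alive: the set's keys
// are weakly held, so each listener lives only as long as some outside
// strong reference to it does.
class WeakListenerSet {
    interface Listener { void changed(); }
    private final Set<Listener> listeners =
            Collections.newSetFromMap(new WeakHashMap<Listener, Boolean>());
    void add(Listener l) { listeners.add(l); }
    void fireAll() { for (Listener l : listeners) l.changed(); }
    int size() { return listeners.size(); }
}

public class WeakSetDemo {
    public static int demo() {
        WeakListenerSet set = new WeakListenerSet();
        WeakListenerSet.Listener kept = () -> { };
        set.add(kept);      // stays registered while 'kept' is reachable
        return set.size();  // 1: the strong reference still exists
    }
}
```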
Re:I'll show you mine if you.. (Score:5, Interesting)
Re:Well, there's your problem! (Score:5, Interesting)
I first ran into this sort of problem in 1983 when working on a CDC mainframe. The only way to find the bug was the line by line analysis method since even compiling the code with debug caused it to run slower and the nature of the problem changed. That's as much detail as I remember.
I expect to see a lot more of these kinds of errors pop up as multi-core CPUs become more prevalent (true parallel execution) and people continue to assume that they can just crank out code without taking the time to understand the design. I'd also expect the prevalence of multi-core processors to create a demand for more parallelism. If you don't take advantage of the additional cores, your program will only be as fast as if it were on a single core system. If the competition can create a program that uses the additional cores, your program will seem slow.
Cheers,
Dave
Re:I'll show you mine if you.. (Score:5, Interesting)
The Slashdot editor who posted it moved the link so it looked like I was linking to the original study, not the article about the study. It's like they felt compelled to make a change, so they made one even if the change didn't improve the quality of the article.
I will say that the rest of the text remained unchanged, and really the only problem with the submission is that people who thought they were going to a study were actually going to a newspaper article about a study, but the point is Slashdot editors *do* make changes all the time.
Slashdot not a news source? Agree! (Score:5, Interesting)
People complain that Slashdot sucks: the headlines are sensationalistic, the editors get commissions based on the number of dupes they post, and articles about 6-month-old events get posted as "news".
So why do I even bother visiting Slashdot? The answer is two things: the community of posters, and Slashcode moderation.
The value of Slashdot is in its community. You and I, dear Slashdotters. Our collective mind will pick through the various articles, point out their flaws, expose sensationalist FUD for what it is (and, surprisingly, will do this equally for anti-Linux and anti-MS FUD), debate various trends, and provide a significantly international (though heavily USA-centric) perspective.
This value is enhanced by Slashdot's moderating system, so that information and insight can bubble to the top among the mass of inane posts. Metamoderation limits the amount of crack that the moderators can be on.
So, Slashdot editors, take note! *WE* are the reason we are here. *YOU* are not. Many of us don't even bother to read the articles any more, preferring to soak up the collective wisdom of techies from varying age groups and fields. If you piss us off, and the collective community of Slashdot deteriorates, then there's no reason for me (or others) to keep coming back.
Think about it.
Re:Stupid Slashdot headline (Score:4, Interesting)
I must apologise in advance if this is a bit of a rant. I have a graduate degree in, well, programming language design, and I find some things close to my field just very upsetting. You wrote:
Perhaps you write very C++-adapted, boilerplate code. The reason garbage collection is essential in a programming language is that without it (a) you cannot provide a safe implementation of first-class functions, since they implicitly grant indefinite lifespan to arbitrary objects; and (b) you cannot build an abstract data type, whose implementation is hidden from the user, since no matter what other features the language may have, you can always tell whether the type a library has handed you is an automatically managed 'atomic' object, or a 'reference type.'
But why get so upset about weird advanced programming techniques not coming out quite right?
Because the kicker is, that to those of us who grew up with garbage collected languages, first class functions and abstract data types are elementary programming techniques. They are the bricks and mortar of which everything else is made. "Data structures + Algorithms," you see. Sure, C++ programmers consider it rocket science and discuss ad nauseam their clever smart pointer techniques and their baroque fifty-line function object implementations (or, if they advocate Boost, their two-line function object implementation that requires a five-thousand-line header file and employs a completely different syntax from everything else they do). That's because they're now used to getting through life with no arms and artificial legs.
The sense in which garbage collected languages make memory leaks a thing of the past is this: that if you received a non-C++-adapted education, focussed on data structures and algorithms and not the fifty-three (or five thousand and six - they make money, let's invent more) Programming Patterns that help you evade the design flaws of the One True Language, and so you are in the habit of thinking and coding using callbacks, strategy functions, abstract types, state encapsulators - all those basic things that (unless the goal is avoiding the shortcomings of C++) are taught in school, and, indeed, all those things that both functional programming and object oriented programming were invented to make notationally direct, then you can just go ahead and code what you think, and you won't be bitten on the bum. The abstract model of computation comes reasonably close to matching the reality. Without it, you're still tracing through the execution in your mind at every step, because relying on the abstraction itself will get you burned.
Yes, a competent programmer can adapt. Yes, a competent programmer can think at the level of assembly language and either work out exactly the lifetime of the data, or do a second explicit computation, woven in with the main one, to determine it dynamically. A competent programmer can also deal with a language having divergent notations for data, expressions, statements, type expressions, templates, and type expressions within templates; or with phase-of-the-moon dependent name resolution (templates again!); or with notational 'abstractions' requiring manual instantiation in real implementations
Re:I'll show you mine if you.. (Score:3, Interesting)
If you're trying to imply that errors can be made in any language, you're right, but the big difference is that leaks in a manually allocated language like C++ are a heck of a lot easier to find and fix than leaks in a language that tries to be smart and "help" you avoid leaks.
If you're failing to dispose of an object, look at the places where it should be freed and make sure that it is. Generally, there aren't a lot of these places. If you have a dangling reference, it will show up in the form of a crash. The resulting backtrace will tell you exactly what object contains a dangling reference, at which point you just have to track down why the reference was not cleaned up when the object it referenced was freed.
By contrast, in a garbage-collected language (or a reference-counted language), you usually don't have the ease of tracking down the dangling references, and even when you do, you rarely have a way to tell the computer "I expect this object to go away when I release it; warn me if it doesn't." As a result, tracking down those last two or three errant references can be a pain in the ass, particularly if you have circularly-referenced structures jacking up your reference counts. It's not always easy to figure out which of the eighty tree-structured objects is being referenced. You might have a reference to a leaf of a tree and the parent links are keeping the tree alive, or you might have a reference to the root and the child links are keeping it alive, or you might have a reference smack in the middle of the rightmost branch and the references bubble up and back down.
Workarounds like "soft" references don't help, either. At best, you could keep a hard reference to every node in the tree in some enclosing object and use soft references inside it, but this is a pretty hackish workaround. Anything short of that and you are forced to handle the case where your "parent" node can suddenly go away if the last hard reference to the top of tree is released. This is probably not desirable.
Nothing annoys me as a programmer more than having to deal with garbage collection or reference counting on complex data structures. Above a certain level of complexity, the complexity of managing memory yourself becomes less than the complexity of working around the garbage collection and/or reference counting.
The best of these schemes, IMHO, is an advisory reference counting algorithm in which you make a habit of incrementing an object's ref counter whenever you add a reference to it and decrementing it when you remove a reference. Combined with manual alloc/free, this gives you the ease of tracking down where those references are, while still giving you the ability to test for references on free and generate a warning that "Object 0x46b32158 is still referenced in three places when freed at utilities.c:866."
Then, you just have to correlate the crash dump with the warnings and you know -exactly- what is going wrong and where the failure is occurring. Better, because you are choosing when to increase or decrease the reference counts, you can pick and choose which references to count, ignoring references between objects in a data structure like a tree and only considering outside references (the ones you actually care about). You can even do smart tricks like keeping a second reference count on the top node in the tree that indicates a reference to a subtree so that your warnings can be more informative. "Warning: tree 0x23421420 has active external references to child node 0x43298470." And so on.
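The advisory scheme described above (the original describes C code; this sketch uses Java for consistency with the other examples here, with retain()/release() as bookkeeping only and free() warning when the count says somebody still points at the object):

```java
// Advisory reference counting: the counter does not control lifetime, it only
// records claimed holders so free() can flag a likely dangling reference.
class Advisory {
    private int refs = 0;
    private boolean freed = false;
    void retain()  { refs++; }   // caller records that it now holds a reference
    void release() { refs--; }   // caller records that it dropped one
    /** Frees the object; returns a warning string if references remain, else null. */
    String free() {
        freed = true;
        if (refs != 0) {
            return "Object " + Integer.toHexString(System.identityHashCode(this))
                 + " is still referenced in " + refs + " place(s) when freed";
        }
        return null;
    }
    boolean isFreed() { return freed; }
}

public class AdvisoryDemo {
    public static boolean demo() {
        Advisory a = new Advisory();
        a.retain(); a.retain();     // two holders recorded
        a.release();                // ...but only one remembered to release
        String warning = a.free();  // free() flags the leaked reference
        return warning != null && warning.contains("1 place(s)");
    }
}
```

Because the programmer chooses which references to count, internal links inside a tree can be ignored and only outside references reported, as described above.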
The point is that by taking these things out of the hands of the language and keeping them in the hands of the programmer, while the actual programming is slightly more complex, debugging is so much easier that it makes up for that extra complexity in terms of robustness, performance, etc. (at least for programs with a certain level of complexity and longevity). I thus view garbage collection as a tool that is primarily useful for very simple programs and detrimental
Re:Slashvertisement (Score:1, Interesting)
What facts did you use to arrive at your opinion? VB.NET now has all of the syntactic capabilities of C#, and vice versa, so the language is just as suitable for any project as C#. In fact, the ambiguous {} brackets and plethora of semicolons make C# code harder to read.
VB is the Rodney Dangerfield of programming languages - it gets no respect. I've used it to produce software used by engineers/geologists that sells for minimum $25k/seat in a timeframe that easily allowed us to beat our competition to market.
Re:Slashvertisement (Score:1, Interesting)
That's an oxymoron.
If you seriously believe a VB programmer can be good, then you think a VB program can be good. You would have to reinvent VB from the ground up, as Microsoft has for the past 6 versions of VB, and you still would get a lousy language, in which serious programs can't be written.
Even C# suffers from incompatibilities from version to version. What is it? Is Microsoft unable to create languages that are backward compatible, or does it do it on purpose so that companies have to rewrite all their programs every few years?
I think Microsoft really does it on purpose, because Microsoft has binary backward compatibility for all their software (you can run DOS programs in all Windows versions, even Vista), but Microsoft prefers not to have source code backward compatibility for a reason, and I guess the reason may be Developers, Developers, Developers... http://video.google.com/videoplay?docid=6304687408656696643 [google.com]
They want developers to have jobs, and especially jobs that are dependent on Microsoft technologies. So you, the developer, switch to Microsoft and all your clients will always call you every few years to rewrite their very same systems. All you, the developer, need to do is to learn the little differences that spring up from version to version by buying their books, so Microsoft is more a book-selling company for developers, since tools are free, while at the same time, Microsoft sells cheap operating systems to companies to run the software those developers write. Microsoft has been doing this since the early 90's and it has worked well so far.
The first problem Microsoft has is that Java is open source and Linux is open source. Any company investing in Linux and Java will never have to look back; their investment is safe. They could even run their software on Windows or AS/400 if they wish. Let's hope they don't, but let us recognize they could.
The second problem is that source code written in Java is forward compatible. You write it now and you run it forever. Serious companies will always choose open source technologies, because they can control them (so they are safe for them) and they know they won't get buyer's remorse. Is it more expensive to hire Java developers than
Absolutely!!!
http://www.indeed.com/salary?q1=java&l1=&q2=jms&l2=&q3=session+beans&l3=&q4=spring+hibernate&l4=&q5=ibatis&l5=&q6=struts+ajax&l6=&q7=j2ee+architect&l7=&q8=.net&l8=&q9=java&l9= [indeed.com]
So companies may think it is cheaper to build
That is what explains why developers using open source Java get rich quicker than
Slashdot *does* have editors. When I submitted... (Score:4, Interesting)
It's not. I didn't. I was edited! (Score:2, Interesting)
relationship with the product. In fact, I started my submission (which
was edited, see other comment above) with something like "in a blatant
plug for some kind of profiling product..."
Re:I'll show you mine if you.. (Score:3, Interesting)
Since every other garbage collected language, from every other company, would have had the same problem, how does it show that?