Memory Checker Tools For C++?
An anonymous reader writes "These newfangled memory-managed languages like Java and C# leave an old C++ dev like me feeling like I am missing the love. Are there any good C++ tools out there that do really good memory validation and heap checking? I have used BoundsChecker but I was looking for something a little faster. For my problem I happen to need something that will work on Windows XP 64. It's a legacy app so I can't just use Boost's uber nifty shared_ptr. Thanks for any ideas."
Most tools I've tried are useless (Score:4, Insightful)
They might be useful for small apps but if you have a massive app they are almost more trouble than they are worth.
It's hard to say what you can do except foster safe coding practice and highlight the common pitfalls such as memory leaks, buffer overflows etc. Many compilers can help detect heap / memory overruns because the debug libs put guard bytes on the stack & heap that trigger exceptions when something bad happens. There are also 3rd party libs such as Boehm [hp.com] which help with memory leak / garbage collection issues and dump stats. I'd say using STL & Boost is also a very good way of minimizing errors, simply because doing so avoids having to write your own implementations of arrays, strings etc. which are bound to be less stable.
Re:Boost? Ugh (Score:2, Insightful)
If you wanna code in C++ then you'd better get used to the "weird syntax" of templates and especially the Boost libraries; they ARE the basis of most of the additions to the standard library in TR1, so they will become the "C++ norm".
Re:Boost? Ugh (Score:5, Insightful)
I'm used to template syntax (though I think it's ugly and Stroustrup could have done a lot better) but Boost makes it worse by overloading operators and then using them in ways never intended, producing syntax that a plain C++ programmer wouldn't even recognise, never mind understand what it's doing, e.g. the gratuitous overload of () for matrix ops where a simple function call would have been much cleaner and easier to follow.
Re:Most tools I've tried are useless (Score:2, Insightful)
Purify is what you need (Score:1, Insightful)
Linux developers can use Valgrind, which is also very good and is free. But it won't run on your platform.
Then there are the static checking tools like Coverity. I believe that they do great things, though I have never used them. If you are a big company I think it would be well worth getting them to talk to you; you would probably find it intellectually interesting, if nothing else. There are other tools in the same field; Wikipedia has a list.
You may find that Purify is too slow. It has various options that you can tweak. It also benefits from having loads of RAM (steal it from your colleagues while they have lunch). But basically you need to live with the speed and either be patient or hack your application to go straight to the problematic bit.
In my experience this sort of debugging is always painful, and the lesson it teaches us is to *not put the f***ing bugs in the code in the first place!* By that I mean:
- Avoid dynamic memory allocation when possible (i.e. use std containers instead).
- Every time you type 'new' or 'malloc', think to yourself "where does this get deleted/freed?"; ideally the call to delete or free should be a few lines away from the call to new or malloc and it should be blindingly obvious that they occur in pairs.
- Be really clear about ownership of pointers.
- Use smart pointers (like the Boost scoped_ptr and shared_ptr) when appropriate.
- Avoid pointer arithmetic.
- Don't use a NULL sentinel value.
Every time you find yourself doing one of these "bad" things, try to remember your last epic all-night debug session with Purify and fix it....
By following these sorts of practices, I have managed to avoid any nasty memory-allocation-related bugs for a few years. But of course it doesn't help with your legacy codebase.
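A minimal sketch of a few of the habits listed above, assuming a C++11 compiler. std::vector replaces a hand-managed array, and std::unique_ptr stands in for the Boost scoped_ptr the poster names (that substitution is mine, not the poster's). Widget and live_widgets are hypothetical names that exist only to make destruction observable.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type that counts live instances so a leak would be visible.
struct Widget {
    static int live_widgets;
    int value;
    explicit Widget(int v) : value(v) { ++live_widgets; }
    Widget(const Widget& other) : value(other.value) { ++live_widgets; }
    ~Widget() { --live_widgets; }
};
int Widget::live_widgets = 0;

// No new/delete pair to get wrong: the vector owns its elements and the
// unique_ptr owns the single heap object. Both are released at scope exit,
// including on an early return or an exception.
int sum_widgets(int count) {
    std::vector<Widget> widgets;                     // instead of new Widget[count]
    for (int i = 0; i < count; ++i) {
        widgets.emplace_back(i);
    }
    std::unique_ptr<Widget> extra(new Widget(100));  // one clearly stated owner

    int total = extra->value;
    for (std::size_t i = 0; i < widgets.size(); ++i) {
        total += widgets[i].value;
    }
    return total;
}
```

After sum_widgets returns, Widget::live_widgets is back at zero without a single delete in the function body.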
Re:Boost? Ugh (Score:0, Insightful)
with which C++ was designed. Creating DSEL (Domain Specific Embedded Languages) like boost::spirit (where you basically write EBNF syntax in C++ to generate parsers) was and is a key goal when C++ was developed.
So you now don't know what an operator like does? So what, it does what it is supposed to do in this context.
This really helps to raise the abstraction level of the language to solve problems of the problem domain, in a very efficient way.
A second vote for Valgrind (Score:3, Insightful)
Re:Most tools I've tried are useless (Score:3, Insightful)
Q) How do I deal with memory leaks?
A) By writing code that doesn't have any. (goes on to advocate vector & string)
And also: C++ Is my favorite garbage collected language because it generates so little garbage (http://www.research.att.com/~bs/bs_faq.html#real
Over the past 6 months or so, I've really made an effort to better my usage of C++ (using Effective C++, Effective STL and C++ Coding Standards). With a combination of STL, references, RAII, std::string and boost::shared_ptr, all of my memory, ownership & null-pointer problems just went away. I hardly ever actually write 'new' any more. The Java model of just leaking objects and hoping they'll get collected sooner or later seems horrible.
But I'm not maintaining old code, so this is completely -1 Offtopic.
Re:Boost? Ugh (Score:3, Insightful)
No, something like vectorAdd(v1,v2) would be a lot more readable and a damn sight easier to grep for. Idiot.
But I just see you signed your post with "Idiot." Thus I guess I shouldn't have taken it seriously anyway.
Re:um (Score:4, Insightful)
Regarding legacy applications, I think the point was that he can't go back through the app and rewrite everything to use smart_ptr.
Most people can't understand Purify's output (Score:5, Insightful)
For example, this code has serious issues:
char *ptr;
.
.
.
ptr = method_that_returns_string_object();
.
.
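The poster's explanation of the snippet did not survive in this copy of the thread, so the following is only one plausible reading, not a reconstruction: assume the method returns a std::string by value and the char pointer is taken from that temporary. The body of method_that_returns_string_object below is hypothetical, and the risky line is kept as a comment because actually using such a pointer is undefined behaviour.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for the poster's method: returns a string by value.
std::string method_that_returns_string_object() {
    return std::string("temporary contents");
}

// Risky pattern (comment only): the temporary std::string is destroyed at
// the end of the statement, so ptr dangles immediately.
//
//     const char* ptr = method_that_returns_string_object().c_str();
//
// Safer: keep the string object alive for as long as the pointer is used.
std::size_t safe_length() {
    std::string owner = method_that_returns_string_object();
    const char* ptr = owner.c_str();   // valid while 'owner' lives and is unmodified
    return std::strlen(ptr);
}
```

A checker such as Purify or Valgrind would typically report the commented-out form as a read of freed memory, far away from the line that caused it.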
And FWIW, I've used Purify on massive apps, and found huge problems that the developers didn't even know were there. On one project, they couldn't explain why their "perfect" app kept crashing, either. Worse for them, I had been hired as a consultant to fix their problems that they couldn't seem to believe existed (HINT: your boss hired someone from the outside...), and after watching the team flail and spend literally almost a man-year trying to find one memory bug, I finally had enough of "advice giving" being ignored and got on their system, linked their app under Purify, ran it, and found the bug - a double delete of an object from two different threads. It all took me about fifteen minutes. I did that in front of their management. I made my point.
Purify (and like tools) are a great help. Not using them is like trying to build a house without power tools. Yeah, it can be done. But what would you think if you hired a builder to make your house and his team showed up carrying hand saws? Oh, and you are paying that team to hand-saw all the lumber...
What would you think of that builder?
Yet, when a developer asks for tools like Purify, management often balks. Because 1) they're shortsighted, and 2) developers don't know how to use such tools.
Like I said - what would you think of a construction company where the workers don't know how to use modern power tools to help their productivity?
Well, you just put yourself in that category.
Yes, Purify is somewhat slower than running without Purify. But it's a lot faster than most other full-memory checking methods. If you're worried about speed, link against the Win32 debug libraries - they'll at least show problems with double free() calls, access of free()'d and deleted objects, etc. And without too many performance problems.
Re:um (Score:3, Insightful)
Don't allocate or free = no leaks = need no tools (Score:5, Insightful)
We pre-allocate pools of objects at startup and then re-use them. No other memory is allocated or freed while the process is running. Our pools of reusable objects are monitored very carefully as an object that isn't released back to its pool when the job is done is akin to a memory leak. Use of sentries to automatically release objects back to the pools when they fall out of scope is mandatory.
So my answer to the problem is:
1. Use sentries (or some other mechanism) to guarantee memory is released.
2. Don't allocate except at startup.
3. No need for elaborate tools due to the above.
I'm sure that not all applications' data usage would fit into this model, but it is surprising how many can.
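A rough sketch of the pool-plus-sentry idea from the numbered list above, assuming C++11. Message, MessagePool and MessageSentry are hypothetical names; the poster's real pools are not shown in the thread. All storage is allocated once in the pool constructor, and the sentry returns its object to the pool when it goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pooled object.
struct Message {
    int payload;
    Message() : payload(0) {}
};

// Fixed-size pool: everything is allocated once, in the constructor.
class MessagePool {
public:
    explicit MessagePool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i) {
            free_.push_back(&storage_[i]);
        }
    }
    MessagePool(const MessagePool&) = delete;
    MessagePool& operator=(const MessagePool&) = delete;

    // Returns a null pointer when the pool is exhausted; never allocates.
    Message* acquire() {
        if (free_.empty()) return nullptr;
        Message* m = free_.back();
        free_.pop_back();
        return m;
    }

    void release(Message* m) {
        assert(m != nullptr);
        m->payload = 0;        // reset before reuse
        free_.push_back(m);    // capacity was reserved, so no allocation here
    }

    // Objects currently checked out; a count that only grows signals a "leak".
    std::size_t in_use() const { return storage_.size() - free_.size(); }

private:
    std::vector<Message> storage_;
    std::vector<Message*> free_;
};

// Sentry: hands the object back to its pool when it goes out of scope.
class MessageSentry {
public:
    explicit MessageSentry(MessagePool& pool) : pool_(pool), msg_(pool.acquire()) {}
    ~MessageSentry() { if (msg_) pool_.release(msg_); }
    MessageSentry(const MessageSentry&) = delete;
    MessageSentry& operator=(const MessageSentry&) = delete;
    Message* get() const { return msg_; }

private:
    MessagePool& pool_;
    Message* msg_;
};
```

Watching in_use() over time is one cheap way to do the "monitored very carefully" part: if it never returns to its baseline, something is holding objects without a sentry.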
We have seen some leaks in our applications. These were tracked down to STL internally leaking. They weren't generally very large and therefore we continue to live with them.
On the subject of garbage collectors, some of our colleagues use Java and
So don't think that a garbage collector is the solution. Perhaps in less demanding applications it is a potential answer.
Lastly, I strongly dislike anything from Rational. I find them overpriced unreliable bloatware (YMMV). Purify used to be good some time ago, but those days are long gone.
I echo what others have said above. You are a developer. You know your requirements. Build a simple tool to monitor and check your usage. For us it was managed pools of re-usable objects.
Re:um (Score:5, Insightful)
I guess that's a long way of saying "I agree completely with what you just said."
Re:Most tools I've tried are useless (Score:3, Insightful)
If one manages their objects correctly, C/C++ perform quite well too.
Re:Most tools I've tried are useless (Score:5, Insightful)
A good example of what I'm talking about is a std::ifstream versus a java.io.FileInputStream. If you make an ifstream on the stack, you can be absolutely certain that when it goes out of scope, the destructor will be called and the file closed. You can be certain that it will happen, and you can also be certain when it happens; at the very point it goes out of scope.
With a heap based FileInputStream, you have no such guarantee. You leak it, and you just hope that the finaliser gets called soon (if at all). I've had more than one occasion where I've been leaking FileInputStreams quicker than the garbage collector cares to clean them up, and sooner or later the OS says 'no' and you get an exception. And it's very difficult to reproduce, because it's all down to the whim of the garbage collector, and you always go slower when you're looking for a bug.
Of course the answer to this is to say "Well you should Close() your input stream beforehand". But that's just as bad as saying "You should delete your heap based objects" in C++. It's that situation of having to manually shut down objects that seems old fashioned to me.
Maybe there's a better way these days, I've been away from Java for a couple of years now.
(I do enjoy coding in either language though!)
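The scope-bound cleanup described for std::ifstream above can be sketched as follows. The function name write_then_read and the scratch file path passed to it are hypothetical; the point is only that neither stream needs an explicit close() call.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes one line to 'path', then reads it back. Each stream closes its
// file at the closing brace of its scope.
std::string write_then_read(const std::string& path, const std::string& text) {
    {
        std::ofstream out(path.c_str());
        out << text << '\n';
    }   // 'out' destroyed here: buffer flushed, file closed

    std::string line;
    {
        std::ifstream in(path.c_str());
        std::getline(in, line);
    }   // 'in' destroyed here: file closed

    std::remove(path.c_str());   // clean up the scratch file
    return line;
}
```

The same closing braces do the cleanup if an exception is thrown in between, which is the part a manual Close() call tends to miss.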
C++ errors suck! (Score:1, Insightful)
I would say that this is a serious enough problem that we ought to stop and fix it before developing yet-more-complex libraries. One attempt to fix it at the language level is the introduction of 'concepts' in the next version of C++, which allows template classes to specify properties that their parameters must have - and which presumably allows more sane error messages when the properties do not hold. An attempt to fix it at the library level can be seen in the message that you cite: "property map not found". Yes, it's embedded in a load of stuff from the compiler, but maybe that's the best it can do.
I'd be very interested to know whether any of the other compilers give more comprehensible error messages than g++.
Re:Most tools I've tried are useless (Score:1, Insightful)
Re:um (Score:4, Insightful)
I also have (unfortunately) written enough ugly stuff that when I go back later I say "I can't believe I actually did something that stupid."
You live, ideally you learn, and when you look at code you wrote 5 years ago you likely slap your forehead in embarrassment - that's how you know you're getting better. That, and when your coworkers aren't trying to slash their wrists when they get handed something you wrote...
Re:um (Score:5, Insightful)
You are confusing two aspects here. Ugliness does limit maintainability. But it does not limit "solidness". "Solidness" would mean that the code actually works, and has a proven track record, such as being used in production for over 20 years. Code that has been in production for over 20 years is usually both solid and ugly.
Or it could be a monument over "the world is a complex place, and if you change anything here, and it causes the program to fail in some weird special case, your company is going to lose umpteen zillion dollars". While the reality is probably somewhere in between, rewrites should still be avoided like the plague. However, if you really have taken the time to understand what some nasty bit of code does, there's nothing wrong about cleaning it up. But most of the time, the ugly code is there for a reason.
Home brew tool for memory leaks with glibc (Score:4, Insightful)
One of our guys coded up a simple shared lib that can be loaded with LD_PRELOAD that sets simple hooks of printing memory locations for new/realloc/delete. He then wrote a perl script that kept track of these things and spit out anything that was malloc'ed and not realloc'ed or free'd.
I can't post it, because technically it's not my code, it's my company's. But his shared lib code is just 300 lines long, and shouldn't be hard to duplicate. The perl log filter is even more straightforward. Each malloc gets saved. Each free removes the malloc. Each realloc removes the old malloc and adds a new one. Anything left over is a leak.
Override __malloc_initialize_hook with a pointer to your init_function. In your init_function, save the old functions at __malloc_hook, __free_hook, __memalign_hook and __realloc_hook and substitute your own. Then write your replacement functions: in each one, do your logging, temporarily restore the old hooks, call the original function, and reinstall your hook on the way out so it catches the next call. All of the hooks should be wrapped in a mutex to help with re-entrancy problems.
It's not a full memory detector, it just does leaks, but it's non-intrusive, requires no recompiles, and is the best way we have to leak-detect our huge, long-running server code.
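The bookkeeping described for the log filter can be sketched in a few lines. This is written in C++ rather than perl, and the AllocEvent record layout is hypothetical, since the post does not describe the real log format. It covers only the replay step, not the hook library itself.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// One logged allocator call. The layout is an assumption for illustration.
struct AllocEvent {
    enum Kind { Malloc, Free, Realloc } kind;
    const void* old_ptr;   // block passed to free/realloc (null for malloc)
    const void* new_ptr;   // block returned by malloc/realloc (null for free)
    std::size_t size;      // requested size (unused for free)
};

// Replays the log and returns every block still outstanding, with its size.
std::map<const void*, std::size_t> find_leaks(const std::vector<AllocEvent>& log) {
    std::map<const void*, std::size_t> live;
    for (std::size_t i = 0; i < log.size(); ++i) {
        const AllocEvent& e = log[i];
        switch (e.kind) {
        case AllocEvent::Malloc:
            if (e.new_ptr) live[e.new_ptr] = e.size;   // each malloc gets saved
            break;
        case AllocEvent::Free:
            live.erase(e.old_ptr);                     // each free removes it
            break;
        case AllocEvent::Realloc:
            // A failed realloc (null result) leaves the old block alone.
            if (e.new_ptr) {
                live.erase(e.old_ptr);
                live[e.new_ptr] = e.size;
            }
            break;
        }
    }
    return live;   // anything left over is a leak
}
```

In a real tool each entry would also carry a call site or backtrace, otherwise the report says how much leaked but not where.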
Re:Boost? Ugh (Score:3, Insightful)
The only thing I can add to this is that an error message that only takes up 8 lines is a cissy error coming from BGL. I had errors that were multiple screenfuls. It seems somehow wrong when a tiny type error that can be fixed with maybe 3 or 4 well placed characters can be so verbose. I guess that's C++ for you.
Peter
Re:two points (Score:5, Insightful)
It's not automatically bad, but using semi-automated memory management like this tends to reduce the emphasis on constructing things only when they're needed and destroying them immediately when you're done with them. This concern, known as "Java bloat syndrome" in honour of the language that first popularised it, can lead to major performance problems in applications that manipulate a lot of data, and is a favourite mistake made by the cult of "hardware is cheap, so optimisation doesn't matter".
The thing is, this sort of care-free programming philosophy is natural in languages like Java, so languages like Java have had to learn from their early mistakes and adapt. There have been dramatic improvements in GC technology since those early days, and today there isn't the same degree of performance penalty associated with relying on GC to clear everything up.
However, this sort of behind-the-scenes magic isn't really the "C++ way". You can do it, but tools like shared_ptr don't have the same level of sophistication as full-blown GC. Using them requires some care from programmers, and as the grandparent post said, this can lead to problems if the programmers come to rely on them more than they ought.
FWIW, I'm not sure I'd have described things in quite such black-and-white terms as the GP, but I can see the underlying point and I think it's a valid one.
Re:Boost? Ugh (Score:3, Insightful)
Nope. Garbage collection solves one problem, memory management, but does not solve the more general issue of resource management. Incorporating a few file handles, database connections or what have you into objects in Java leads immediately to manual resource management issues. You cannot reflect a couple of resources into an object and have deterministic release behaviour unless you explicitly code for it. Shared pointers (reference counting) do cater for this, albeit at a performance cost. RAII is impossible in Java, yet commonplace in C++, with or without reference counting. They're just different, each with tradeoffs of their own, mkay?
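A small sketch of the deterministic release being described, assuming C++11 and using std::shared_ptr in place of the Boost version. Connection and Session are hypothetical types; open_count exists only so the release point can be observed.

```cpp
#include <memory>

// Hypothetical non-memory resource, e.g. a database connection.
struct Connection {
    static int open_count;
    Connection() { ++open_count; }    // "open" the resource
    ~Connection() { --open_count; }   // "close" it
};
int Connection::open_count = 0;

// Holder objects share one connection. It closes the moment the last
// holder is destroyed, with no finaliser and no explicit close() call.
struct Session {
    std::shared_ptr<Connection> conn;
    explicit Session(const std::shared_ptr<Connection>& c) : conn(c) {}
};
```

The performance cost mentioned above is the reference count itself: every copy of the pointer updates a shared counter, which is atomic in std::shared_ptr.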
Re:Those "skilled coders" must not code much (Score:3, Insightful)
>>fantasy land and you're nowhere near as good a coder as you think you are.
Pfft.
Actually, good coding habits will indeed work.
We were three people coding a 100,000 line program, 0 memory leaks. C.
Re:two points (Score:4, Insightful)
Re:Boost? Ugh (Score:3, Insightful)
If you really consider yourself a skilled C++ programmer you'd acknowledge that C++ provides 1001 ways to do the same thing. For some purposes using operator overloading and templates is better, for other purposes using method overloading and OO inheritance is better. Same goes for other problems C++ offers multiple solutions for. Sometimes multiple inheritance is ok, sometimes it is terrible. Heck, sometimes using 'goto' even makes sense. If you not only consider yourself a skilled C++ programmer but also a skilled software engineer, you'd also acknowledge that code re-use and design patterns are almost always good things if applied properly, irrespective of the implementation language. If you say 'design patterns' is just new and cool terminology for clueless programmers you probably never even opened the de-facto standard work about design patterns (Gamma et al.) and browsed it a little. It's just common solutions to recurring problems, that can save you a lot of work because you don't have to re-invent them yourself. It's just design re-use on the architectural level, which is even more important than re-use on the implementation level.
I think you should try coding up some Java, C#, D or Python some day. You'll probably be disgusted by all the 'paradigms of the month' they applied to those languages, how much re-use and design patterns are incorporated into them etc. You think it's just because the people who created the languages wanted to show off?
Re:Are you thick or what? (Score:2, Insightful)
Indeed, with generic programming, the same code may call several different implementations of operator+, depending on what type it is used on. The same goes BTW for normal named functions. And the same is also true for virtual member functions (operator or not) in an OOP context.
I'd say if you have to find the callers of operator+ in order to check if it is implemented correctly, there's something fundamentally wrong with your code.
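To make the generic programming point concrete, here is a small sketch in which one template ends up calling three different implementations of operator+. Vec2 and add_three are hypothetical names chosen for the illustration.

```cpp
#include <string>

// Hypothetical 2D vector with its own operator+.
struct Vec2 {
    double x, y;
};

Vec2 operator+(const Vec2& a, const Vec2& b) {
    Vec2 r = {a.x + b.x, a.y + b.y};
    return r;
}

// One generic function; which operator+ runs depends entirely on T.
template <typename T>
T add_three(const T& a, const T& b, const T& c) {
    return a + b + c;
}
```

Calling add_three with int, std::string and Vec2 arguments uses built-in addition, string concatenation and the Vec2 overload respectively, so a grep for a single function name would not find all of them either.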
Re:two points (Score:3, Insightful)
Re:Are you thick or what? (Score:3, Insightful)
Re:Are you thick or what? (Score:1, Insightful)
Oh I dunno, because you've changed the implementation and need to know where in the program it's used so you can create suitable tests? Just a wild guess.
Oh, come on, really? You didn't immediately think of searching for the name of the class on which operator+ is overloaded? The class name will appear in the function parameters or the local variable declarations for code where the unit tests might need to be examined. Given that hint, you should be able to think of where else you might need to look for the class name. If you cannot handle that, consider sticking to that language where "&" is used for string catenation, as you noted elsewhere. Presumably you were referring to VB6 or earlier?
- T
GC is good, but relocation +GC is better (Score:2, Insightful)
Before I go into a rant, I must first rant about how much I hate Java. I feel that it was a great proof of concept and that they should have taken what they learned, gone back and done it better. Java is a lot of great ideas implemented poorly due to lack of experience. I think they should have spent more time with the SmallTalk guys who actually almost had it right to begin with. Hell, evolving SmallTalk would have been far more intelligent than turning C++ into SmallTalk.
Ok... here's the thing. I tried a lot of competing products. I tried memory checkers, memory allocators (and SmartHeap is the shit!), I tried memory profilers, hand instrumented memory logging, etc... what I've learned are a few things...
Garbage collection (even reference counting) can improve performance greatly, but it had little or no impact on fragmentation. A system that I slapped together as a malloc/free new/delete override proved quite successful at drastically improving browsing performance. Instead of deleting memory immediately, I queued the deletes and processed them either when the pool needed to be grown or during idle cycles.
This just made the program seem faster during runtime... obviously, the added overhead just made it slower.
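The queued-delete idea can be sketched at the object level as follows. The poster did this inside a malloc/free and new/delete override, which is not reproduced here; Node and DeferredDeleter are hypothetical names, and C++11 is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node type that counts live instances.
struct Node {
    static int live;
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Collects objects to destroy later instead of destroying them on the spot.
class DeferredDeleter {
public:
    DeferredDeleter() {}
    ~DeferredDeleter() { flush(); }
    DeferredDeleter(const DeferredDeleter&) = delete;
    DeferredDeleter& operator=(const DeferredDeleter&) = delete;

    // Cheap at call time: just remember the pointer.
    void defer(Node* n) { pending_.push_back(n); }

    // Call when idle, or when the allocator is about to grow its pool.
    void flush() {
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            delete pending_[i];
        }
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<Node*> pending_;
};
```

As the poster notes, the total work goes up rather than down; the gain is only in when the cost is paid.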
To explain why a web browser is one of the most rigorous tests of a memory environment... just think of the hundreds to thousands of DOM nodes/elements/etc..., script objects, images, etc... there are in a single page. Add that each element is typically represented by a single allocated object. Consider that images can decode to 100 megs in size (yes, it happens), most often closer to 2-3 megs for background images.
A web browser can't use a memory system optimized for specific object sizes.
Due to the dynamic nature of the objects, object reusability is not really an option
Scripts can grow or shrink memory usage thousands of times per second
Browsers typically contain 3rd party code from plugins which need to interact with the browser
I can go on and on... a web browser is possibly the worst memory management nightmare on the planet. Often I worked with customers that were developing their own operating system. They used to tell me that my web browser must suck because making it stable on their system was a pain in the ass and often required them to either change or rewrite their entire memory management system to get good performance on embedded devices. Then I'd explain to them that up until now, their memory manager has been having a friendly snow-ball fight with a penguin, now it's running for its life from an avalanche caused by a mean Yeti.
So here's the deal, GC really didn't pay off for us... helped a little, but I have to say that once we started using reference counting and simple GC, the quality of the code got really poor. Just look at Symbian for an example of a product that suffers from using a great memory management system that increases coding complexity 10 fold. It makes it so that you spend all your time coding for the memory manager and you run out of time to make the program itself work.
Now on the other hand, I played with a few Java web browsers and learned something important. Java is made for phones. If it has no other purpose (and I'm convinced it doesn't), it's for embedded devices. Because of relocatability, fragmentation doesn't occur (in a good VM) and applications run much better. The GC + Relocation system is REALLY REALLY REALLY good, if I were to start writing a new web browser today, I'd find a good alternative to Java and get moving on it.
Oh... last thing about auto-pointers, they're a blessing and a curse. For the most part, I find the best solution to be to use a proper system library like Qt instead of boost or STL. Qt seems to actu