C Coding Tip - Self-Managed Memory Allocation
An anonymous reader writes: "The C programming language defines two standard memory management functions: malloc() and free(). C programmers frequently use these functions to allocate buffers at run time to pass data between functions. In many situations, however, you cannot predetermine the actual sizes required for the buffers, which causes several fundamental problems for constructing complex C programs. This article advocates a self-managing, abstract data buffer. It outlines a pseudo-C implementation of the abstract buffer and details the advantages of adopting this mechanism."
Um (Score:4, Interesting)
Re:Um (Score:5, Insightful)
And malloc is of course free, right? ("Well no, Wally, they're opposites.")
Good GCs operate incrementally. Good GCs let you turn collection on and off at will and disable it altogether for designated arenas. Good GCs can run in a separate thread on another CPU, whereas malloc/free cannot.
The reason Java's GC goes wiggy is not that the GC is bad (it's just not very tunable except on Solaris); it's that Java allocates new objects all over the place (and is happily helped at it by the standard libraries). If you go hog wild with resource consumption, yes, you're going to pay for it later.
For the 99.99% of programs that do NOT need hard realtime, you're better off with gc. Cripes, it's like saying homes shouldn't have thermostats because a home thermostat isn't suitable for a reactor sensor.
Re:Um (Score:5, Informative)
Of course, there are also hard real-time garbage collectors (ie Cheng's [nec.com]), though I don't think you'll find them in general-purpose production compilers. However, you will find good garbage collectors in a number of real production compilers (say, in mlton [mlton.org]). It's definitely worth benchmarking.
Re:Um (Score:5, Insightful)
While I don't think GC is quite to the point where it is free or beneficial to the performance of the average application, it is a lot less harmful than most people think. Given that it simplifies the code and eliminates a lot of bugs (usually more than it introduces), it is definitely worthwhile in almost all new application code (kernel-mode code isn't quite there yet, but it's coming), with only a small performance penalty. And I suspect that it won't be too long before it starts to be more of a speed booster, not a perf hit.
I think this is just another step in the process of handing another menial task over to the CPU. We moved from binary to assembly, assembly to low-level languages, low-level languages to higher-level languages, etc. At each step, the new method had a performance penalty at first, then as the new method matured, it turned out to actually be faster than the old method it replaced, while dramatically increasing programmer productivity (i.e. modern optimizers can usually do a better job than an assembly language programmer; often C++ code is faster than the equivalent C code since the compiler has more information to work with and the programmer can make use of more effective techniques like templates).
Re:Um (Score:1)
Re:Um (Score:2, Informative)
Forget C++! High-level modern multi-paradigm languages like Common Lisp and OCaml, which can do most things C++ can do and a lot of things it can't, often produce code as fast as the average C implementation of any given algorithm, despite their relying heavily on GC. And while it's generally possible to tweak the C version t
Re:Um (Score:1)
So the point is that people shouldn't be afraid of "high level" features like GC. While
Re:Um (Score:2)
Re:Um (Score:1)
You lucky dog. I've currently got the fun job of trying to write a hard realtime application for a DSP chip which does not have a C++ (or high level language except C) compiler. For me, this technique looks pretty interesting.
Re:Um (Score:1)
Re:GC in one thread is the fatal bottleneck (Score:1)
Re:Um (Score:3, Interesting)
I've heard that for years, it's yet to be true for the general case ... and most people doing manual allocation don't call malloc/free
Re:Um (Score:2, Insightful)
Caching slab, lookaside lists, whatever, you're still calling some kind of allocator with some kind of cost. There are definitely applications where memory allocation/deallocation follows a pattern that allows that kind of optimization to work well. But here is the
Re:Um (Score:2)
Let me prefix this by saying that I didn't want to get into a GC flamewar; as I said, I think GC can be used in some applications where you don't care about the negative side effects. I only replied to you because you seemed to be giving one of the better "Let's burn everything and rebuild using only GC" arguments. Possibly some of the difference is due to you working for MSFT and me being at RHAT, but I doubt it ... and if you don't hold it against me I'll do the same :). So anyway...
Re:Um (Score:1)
To make sure that I wasn't just blowing steam, I did a little bit of looking around, reading some research papers, etc. Here is what I came up with:
Re:Um (Score:2)
Well it's hard to argue with no links :). But your point that large objects are easier on a malloc()/free() seems like common sense. It's much more often that you'd write a custom allocator for small objects.
Re:Um (Score:5, Interesting)
So in the presence of exceptions, you won't leak memory on the heap. But you will leak mutexes, file handles, etc. You need another idiom to handle those cases.
In the .NET world, C# introduced synchronization blocks to handle the leaking mutex problem. But it is a pain in Managed C++ and VB.NET.
Garbage collection is not the be all and end all.
If I ruled the world, I would create a multi-paradigm (object-oriented, generic, functional, and modular) strongly-typed low-level language that let you program at a high level, plus a second high-level language that was loosely typed, garbage collected, and could be interpreted or natively compiled. Then I would define a standard to interface the two languages.
In other words, take C++ and add the concept of components/packages. Take Python and add the features (such as generics) that are missing from C++. And then define an interface between components written in both languages.
Currently Boost.Python and SWIG exist. But I wish that they would just work automagically, every time I typed make at the command line or build in VC++.
Re:Um (Score:5, Insightful)
You just described Scheme/CL/Dylan.
Re:Um (Score:2)
Re:Um (Score:1, Insightful)
Re:Um (Score:1)
Re:Um (Score:3, Funny)
My GC does something like this:
1) Allocate memory, reference count 1
N-1) When ref count reaches 0, we call destroy() on the object
N) Free the memory
Re:Um (Score:2)
1) Allocate objects A, B, and C, each with reference count 1
2) Link them into a cycle: A->B, B->C, C->A (each count rises to 2)
3) Dispose of the root references to A, B, and C (each count falls back to 1, never 0)
4) Have fun with your memory leak.
Re:Um (Score:2)
Re:Um (Score:3, Interesting)
So in the presence of exceptions, you won't leak memory on the heap. But you will leak mutexes, file handles, etc. You need another idiom to handle those cases.
Java is probably the most widely-used garbage-collected language in existence. I think I speak for all Java programmers when I say "WTF are you talking about?"
It is true that you
Re:Um (Score:5, Interesting)
He's talking about the unpredictability of resource release using GC. If unpredictability isn't a problem, fine. If you need to synchronize your resources carefully (which is what a mutex is
Now, this article is about C, so let's compare the two.
The post you're responding to wasn't about C: it was about a weakness of GC compared to, say, RAII (which is the idiom C++, among others, uses). But just for fun, let's go on to see how little you know about C.
Java: failing to close a file == usually no problem whatsoever
Unless you need to open the file again before the garbage collector decides to reclaim the handle.
C: failing to close a file == permanently leaked handle
Bzzt... wrong. All file handles are released upon program termination. I like how you used '==' to try to impress people with your programming skillz, though.
As far as the other case you mention, mutexes, goes, Java has two means of providing mutual exclusion. The "synchronized" keyword
Wait, wait... you're saying... hold on a second, now... that Java uses a different idiom to handle mutexes? That's exactly what the parent post said it would have to do... because GC isn't as useful as RAII when it comes to general resource allocation (not just memory).
But you make it sound as if garbage collection is a step backwards from malloc/free
He made no such comparison. He compared it (unfavorably) to RAII.
Re:Um (Score:2)
"Next, all open streams with unwritten buffered data are flushed, all open streams are
closed, and all files created by the tmpfile function are removed."
Maybe you should learn the language a little better.
Re:Um (Score:2)
Re:Um (Score:2, Interesting)
My guess is that he or she was referring to C++ using the RAII (resource acquisition is initialization) paradigm. Every resource is wrapped in an object and all objects are put on the stack. Voila, no resource leaks.
Also, I think your example is far from complete. While failing to close a file handle may be "no problem whatsoever" for the application itself, it can be a pr
Re:Um (Score:2, Insightful)
Re:Um (Score:2)
Would you call C++ strongly typed if the keyword were "typealias" instead of "typedef"?
Even enums aren't distinguishable from ints!
Bzzt, wrong! Try to assign an int to an enum without a cast... You'll get an error.
Re:Um (Score:1)
I'm afraid I only know of 'typealias' in Fortran (where it is synonymous with 'typedef') and in the Meta Object Facility (MOF) of CORBA. I have found reference to a python class of that name, but know nothing of it.
Would you be able to provide me a reference? Ta.
You are correct in
Re:Um (Score:2)
I was trying to make it clear that typedef doesn't define or declare a new type, but an alias to an existing type. IOW the keyword "typedef" is misleading, but it doesn't make C++ weakly typed.
The cast, however, provides no type checking.
Yes, that's a design feature of the language. The compiler errs out if you implicitly try to cast incompatible types, but when you explicitl
Re:Um (Score:2)
And if I ruled the world, me and Salma Hayek would... I'm sorry, what was the question?
Re:Um (Score:2)
Which means you don't have predictable destruction. Which means you don't have destructors. Which means you can't use idioms like resource-allocation-is-initialization.
In standard Python you have both garbage collection and predictable destruction.
In other words, take C++ and add the concept of components/packages. Take Python and add the features (such as generics) that are missing from C++. And then define an interface between components written in both langauges.
If you are using Python for your
Re:Um (Score:2, Informative)
In the
File/mutex/whatever leakage in the presence of exceptions? Two words:
unwind-protect
Re:Um (Score:1)
Weeeee Lets implement Slab allocators (Score:2)
Gee, isn't that handy (Score:5, Interesting)
The fundamental problem is that this sort of thing needs to be done at the C library level. And if it's not done in a flexible fashion, you end up with a library call that rarely gets used. Anyone used hsearch() lately?
If only clib streams (FILE* and friends) were extensible, this article would never have had to be written.
c.
Re:Gee, isn't that handy (Score:2)
Take a look at Vstr [and.org]. I think it's pretty flexible ... and it certainly has much better-researched documentation than the content of this IBM "research" article.
Re:Gee, isn't that handy (Score:2)
glibc's fmemopen() moots most of the IBM article, I think, but since I don't code exclusively in a glibc environment... Grrr... If only POSIX specced out FILE* a bit tighter...
c.
Re:Gee, isn't that handy (Score:2)
The io_* API is part of the examples, and not in the library itself. But it's a pretty small wrapper over what's in the library ... and, yes, the library itself was designed so it could do non-blocking IO [and.org] (which FILE* can't). So in that regard, yeh, I don't tend to use FILE* anymore.
Please (Score:5, Funny)
realloc (Score:2)
Re:realloc (Score:2)
Re:realloc (Score:5, Interesting)
When you call realloc, you're very likely to cause the data to be copied from the old buffer to the new buffer, which is very high overhead. The article discusses how to do similar things without this unnecessary copying (i.e., lower overhead). It's actually not that interesting an article, as what it describes is hardly new, and I believe any competent programmer could come up with that solution when faced with the particular circumstances that inspired it.
Realloc works by seeing if there is free memory after the end of the allocated block, and changing the block's size if so. Realloc can do this because it knows about the internals of the malloc/free implementation. If there is allocated memory right after the block in question, a new block must be allocated, as you cannot "move" the later block in a language like C where any memory location can be a pointer. You could try this kind of stuff in other languages (or in some bastardized C where you do not have direct access to memory, but go through more indirection, the next logical abstraction after the article), but when you start automatically finding/checking/updating memory pointers, you get into GC.
You may be able to overcome some overhead on realloc if you move the problem down into the kernel. The kernel could play page table games so there is little or no actual copying involved, just updating of page tables. This would be fairly easy to implement, but I don't think anyone's done it because (a) flushing the relevant TLB entries could hurt performance more than the copying, and (b) the system call overhead might be more overhead than the copying. Realloc is generally only used for small buffers (due to programmers knowing about the copying overhead) and this trick would only have gains for large buffers spanning multiple pages. For small buffers, the library-level realloc could avoid the system call and do the copying itself, avoiding system call overhead and TLB entry flushes.
This scheme I describe could make for an interesting paper (especially determining for what size of buffer and what type of program it has gains), but I doubt it would make much difference in real system performance as programmers avoid realloc for large buffers, and there are very few cases where one needs direct linear access to a large range of memory rather than being better-served by organizing that memory into some data structure.
Re:realloc (Score:2)
One way to use realloc better is to keep a current length and a max capacity, and to allocate more memory than needed each time you grow. This gives better locality than a linked list, and you can always use realloc to trim the block down to the current length. Linked lists cause many problems for prefetching and fast memory access, so they are slow when you are walking down a list. You can do a tree using one list also, by having an invers
Re:realloc (Score:3, Insightful)
Care to back that up with some [and.org] facts [and.org].
So you a) have the library code do that ... and b) have lots of tests. Of course you want to do both of those for something using a single block of memory too, and if you want it to be efficient it usually does something clever to avoid copies ... and so is probably more likely to screw up and use/deallocate th
Re:realloc (Score:2)
Linked lists require links -- this puts pressure on the L1 cache. Linked list access is also necessarily serial, meaning that you lose all parallel execution potential (from high level software all the way down to the CPU core). Sub-blocked operations require full iteration through the intersection. And finally, of course, there is the small matter of random access.
Re:realloc (Score:3, Informative)
I said facts, not theories. Yes, I know all about cache and his friend random access. Maybe you'll take a noticeable cache hit when moving between nodes ... and yes, certainly doing memcpy() over a single large block X will be faster than doing it X/20 times for 20-byte nodes (20 was the figure given in the IBM "research" article). But that doesn't take into account the time taken to call realloc() each time to expand the large block ... o
Re:realloc (Score:2)
AFAIK even glib has these.
Re:realloc (Score:2)
Re:realloc (Score:2)
Realloc can work well if you always malloc the largest buffer you possibly need, then realloc smaller when the real size is known. For that to work well, the allocations either need to be serial (that is malloc, load, realloc), or you also have a need for smaller blocks that will fit in to the space made by reallocing down.
In some systems such as where you are the only process on the CPU, you can do compression (copy down) when waiting for external events (like an interrupt). At that point, the cycles and
Memory Allocation (Score:5, Funny)
Hmmm (Score:5, Interesting)
This is simple to do, and avoids a lot of errors. It's also not much of a headline.
Re:Hmmm (Score:3, Insightful)
Um, this is reasonable how? If you're coding on a 3GHz dream machine w/4GB of 400MHz RAM, there will be somebody out there who, quite reasonably, will want to run your code on a 64MB Pentium 133... Testing only on unencumbered machines makes for delusional developers.
I'm split: either yer trollin'
Hmmmm (Score:2)
Also, I'm not suggesting you allocate a 30 meg buffe
Re:Hmmm (Score:2)
Note that in Linux and many other OSes, malloc and friends don't actually cause physical memory to be committed to the process; they just create an unmapped virtual area. RAM is only committed when a write to a page faults it in. Reading from such a virtual area faults in THE zero page (one page in the whole system containing zeros); writing to a zero-mapped page faults in a fresh, private page.
So, you can allocate a 1GB buffer and read it all to verify that it contains only zeros, at a cost
Re:Hmmm (Score:3, Insightful)
While I agree the IBM "research" article is terrible, the idea behind it isn't.
Actually, having done tests [and.org] and benchmarks [and.org], I can safely say:
Just to be clear... (Score:2)
Your benchmarks, on the other hand, are a good headline. Going into a project, you usually have a fair idea of your options for memory management and how long they'll take to implement. However, you don't always have a good grasp of the performance implications - your breakdown is handy.
Re:Hmmm (Score:1)
It has been my experience that this was done when systems were constrained, i.e., test to see if there is enough system memory at start-up instead of running out of memory in the middle of execution. This was apparently so prevalent that some vendors (SGI for one) changed malloc to use a "first touched" memory model. In this model,
Indeed (Score:2)
Mostly I just hate people to be doing lots of work in C to save 30k of system memory - and ending up with a buggy program full of memory leaks. Many apps have data sets this small (30k) and yet are spending lot
Wow! (Score:5, Funny)
Now, I'll need a nice short catchy name for it... oh! I know! I'll call it a heap!
Useful (Score:3, Interesting)
The solution is one of the "bad ones"... (Score:1)
And the proposed solution requires both parties to stick to the common abstract buffer interface.
Hmmm!
A memory leak IN THE SPECIFICATION?!? (Score:5, Funny)
From the article:
pLostBlock?!? This almost sounds as if it's designed to leak!
-- MarkusQ
P.S. Seriously, I think this is a fine idea, if not particularly earth shaking. But the typo was too ironic not to point out.
Vstr (Score:5, Informative)
The article basically proposes a very bad implementation of Vstr [and.org]. Most of the advice is extremely simplified at best, and more likely just uninformed: an "efficient" abstract buffer that mixes shorts and pointers (words almost fail me); solving the problem of "what do you do with the data when it's all in the buffer" by just copying it back out again (hey, what's a couple of extra copies between friends?); representing in-memory object sizes with "long int". *sigh*
If you are interested in the article, go read this explanation of why you want it for security [and.org] and this explanation of why you want it for speed [and.org].
Vstr is LGPL, has actual benchmark data behind the block sizes it picks, has an extensive test suite ... and has documentation for the many functions that come with the library (including a fully compliant printf-like function). Of course, I don't have a PhD ... but after reading this, you might well count that as a plus too
Re:Vstr (Score:2)
FWIW, this is not as trivial as one might think (from your project, I suspect you're aware of the issues involved).
I've been working on a program that ptrace()s another program and has to keep track of ranges of memory. Just because I like doing things properly, I figured that using a void * to store memory range offsets would be sufficient. Problem is, C doesn't define some operations that I needed to do (like modulus) on pointers. So I get t
Re:Vstr (Score:2)
Well I meant the sizes, not
Not News (Score:4, Informative)
Re:Not News (Score:2)
I do a lot of work in vxWorks. The version that I routinely use in supporting a legacy product (5.4) doesn't do any collection on the free'd spaces. Thus, you can very easily kill the system by doing something like this:
char *ptr;
while (1)
{
    ptr = malloc(rand());
    free(ptr);
}
Eventually you'll wind up where memory is so fragmented that it can't find an appropriately sized chunk for a given size, even though there's plenty of memory available.
It's easier
Re:Not News (Score:2)
SCO owns C++ (and C too?) (Score:1, Offtopic)
Darl McBride, SCO
WTF?! It's true, he said that. Read more here [zdnet.com] and here [groklaw.net]
So use C++ already (Score:4, Insightful)
Yes, C++ has a host of problems, and Stroustrup and the C++ committee refuse to fix them. But the STL is a huge improvement on malloc/free. (They still can't get auto_ptr right, though.)
I must be missing something (Score:1)
unsigned char *buffer = 0;
b(&buffer);
free(buffer);
Re:2 words for this article.. (Score:1, Offtopic)
Re:This isn't Slashdot worthy (Score:2)
On my Linux box, if I malloc() a megabyte buffer, but only ever write to the first page of that buffer, the VM system will only ever hand me a page or so to use.
Probably a bit oversimplified, since overcommits might cause pressure to write out buffer data, but still...WTF is this guy thinking?
Re:What a crap story... (Score:2, Funny)