The New Garbage Man 205

We've all heard of "garbage collection" in reference to cleaning up after your own memory allocations. Two graduate students at the Illinois Institute of Technology are proposing a new method of garbage collection and memory allocation called DMM. What they are trying to do is "map basic software algorithms such as malloc(), free(), and realloc() into hardware." Sounds as if it has some promise, but can it ever be as flexible as some of the memory managers used today?
This discussion has been archived. No new comments can be posted.

  • If they were to do this, it would probably increase performance. But marginally. At least in linux programs, when I've done profiles, most of the cycles were taken up in console i/o. Printf, etc.

    It would also require a modification to the kernel - so that the brk() system call would simply call the hardware variants.

    I think that it probably *would* remove flexibility, but at the same time I think that's a good thing. Part of the problem with memory leaks and stuff is the flexibility...

    Then again, I might just be talking out of a bodily orifice.


    If you can't figure out how to mail me, don't.
  • Weak files that can't survive a crash should be abandoned and deleted. You heard it here first!
  • What's the point of trying to speed up memory allocations? Why not just make the hardware faster?

    I'm not saying everyone should go out and write inefficient code. It just seems that this research will not get very far. By the time they have something working, memory will be faster than their hardware solution. Sure, you could still use their hardware to make things even faster, but only if they get the cost down very low.

    Bjarne looked at some specialized machines. They just weren't cost-justified. I would have to think that is the case here.

    I'd rather see research into fabricating memory that ran at core processor speeds. That would speed up every memory access, not just the malloc()s and free()s.

    --
  • by Spiff T ( 17065 ) on Saturday February 26, 2000 @09:44AM (#1244241) Homepage
    After working on a project that was rewriting a heap allocator (I just wrote the test program, but I did read up on memory allocators quite a bit), I realized exactly how complicated memory allocators can be and, more importantly, how little the community knows about what the optimal way is to do memory allocation so that both speed and fragmentation are minimized. Given this uncertainty it is probably best to keep the allocator in software so that it can be modified.
  • Why is having this 'software flexibility' important? I mean, most of the programmers I know just use the OS's memory management stuff (and, if they really want to, they can even do their own memory management). So it's not like the application programmers really lose any of their flexibility anyway. It's kind of like 3d hardware: some of the really crazy demoscene [scene.org] demos might not be possible with hardware acceleration, but anyone just using a standard API being rendered in hardware would notice an enormous improvement in speed.

    But it seems like, unlike games and demos, most programmers are using "APIs" for their memory management. So I don't think switching to hardware would do anything but speed things up. It could also allow for system types that we haven't seen before, with much, much more use of dynamic memory.

    [ c h a d   o k e r e ] [dhs.org]
  • Transparent garbage collection has always been a good thing...it's just been a slow good thing. Hopefully, hardware support for garbage collection would simplify matters in the way that hardware threads could simplify the problems of multithreaded code.

    (Of course, both problems become easier if one programs in an appropriately intelligent functional language, not C -- a language that facilitates integrating such things in a transparent way -- but that's most definitely another rant...)

    When I hear questions like "but will it be flexible?" I can't help thinking that it sounds a little like whining from folks comfortable with their current programming model. Considering the number of problems that memory management gets programmers into, you'd think that any move towards a fast, automatic system (think George W. -- Java on cocaine) would be welcomed...
  • by Merlin42 ( 148225 ) on Saturday February 26, 2000 @09:46AM (#1244245)
    The whole idea behind RISC is that the hardware provides FAST primitives that the upper levels can use to create more complex structures. Putting malloc, free, etc. into hardware would just make things more complicated and maybe faster in the VERY short run, but would surely not be as good as just making a DAMN fast memory subsystem or an awesome processor.
  • The Java Virtual Machine Specification [sun.com] has stated since 1996 that:

    "The Java virtual machine does not assume any particular implementation technology, host hardware, or host operating system. It is not inherently interpreted, but can just as well be implemented by compiling its instruction set to that of a silicon CPU. It may also be implemented in microcode or directly in silicon." from Chapter 1, the Introduction.

    Since the JVM is responsible for garbage collection, this implies that the concept of implementing memory management in the hardware is at least 4 years old.

    I find it hard to believe no one's thought of this idea before for C, C++ (and so on) either.

    I wouldn't knock it till they try it or, at least, someone publishes some calculations that prove or disprove the possibility of major speed-ups.

  • What's the point of trying to speed up memory allocations? Why not just make the hardware faster?

    Um, that's what they're doing, basically. These 'specialized machines' could be included in the next intel/amd chipset.

    I'd rather see research into fabricating memory that ran at core processor speeds. That would speed up every memory access, not just the malloc()s and free()s.

    Well, I'm sure we'll have 800mhz ram in a few years, of course CPU speeds will be around 3 gigHz. The only way to get (cheap) ram speeds that fast is by stopping CPU development. Anyway it doesn't really matter, because almost all Memory access come from out of the CPU catch. (If you have a PC, try disabling the l1 and l2 catches in the BIOS, see how fast your system is without them). Speeding up raw main memory won't get you that much of an increase. And what would you rather pay for, a $25 chip that speeds up a few routines, or $10,000 worth of ultra high-speed ram?

    [ c h a d   o k e r e ] [dhs.org]
  • JWZ [jwz.org]'s thoughts on garbage collection. [jwz.org]
  • by xyzzy ( 10685 ) on Saturday February 26, 2000 @09:59AM (#1244252) Homepage
    How is this different from what Lisp machines (Symbolics and LMI) did 15-20 years ago?
  • FrontPage's lack of proper software garbage collection is the perfect environment to test hardware garbage collection in.

  • this seems as though it would be another one of those innovations that causes problems. why? it seems as though it will make it so that programmers no longer have to code correctly, and we already have enough of this garbage (pun) floating around. it's time to stop making it so easy for programmers not to have to code properly; myself included. i WANT to have to worry about malloc() calls etc. 'nuff said
  • Well, I'm sure we'll have 800mhz ram in a few years, of course CPU speeds will be around 3 gigHz. The only way to get (cheap) ram speeds that fast is by stopping CPU development. Anyway it doesn't really matter, because almost all Memory access come from out of the CPU catch. (If you have a PC, try disabling the l1 and l2 catches in the BIOS, see how fast your system is without them). Speeding up raw main memory won't get you that much of an increase. And what would you rather pay for, a $25 chip that speeds up a few routines, or $10,000 worth of ultra high-speed ram?

    Sorry to be anal about this, but it is "cache".

    I understand they are trying to make a few specific things faster, but my point is to work on making the RAM run at processor core speed (which, for the most part, happens to be what cache RAM runs at). What I'm thinking of is eliminating the cache and making memory run at cache speed. The reason that cache exists is because main memory is slow. If main memory is very fast, there is no need for cache.

    And, quite frankly, unless my $25 can speed my system up noticeably, I won't spend it.

    Why would cheap RAM at those speeds require stopping CPU development? I could see increased RAM production lowering CPU production, but not stopping development.
    --
  • Why do I care what JWZ thinks again?

    He wrote a lot of code. Big deal. He doesn't have any theoretical training that would come into play here, just programmer feedback.
  • I don't think this would make much difference speed-wise of course, but it could be helpful against memory leaks.
  • Look, would you rather have 512 Megs of slow ram, or 16 Megs of core-speed ram? The reason ram is so slow now is because it's so cheap, less than a dollar a Meg. When the PIII came out, it had about half a Meg of half-speed cache, and that cost almost as much as the processor die itself. I'm sure it would be possible to make SDRAM run at 800mhz without changing anything, but why would you want to? Look at just the price difference between an Athlon 800 and an Athlon 500. Now imagine the price of RAM chips when going from 100mhz to 800. The CPU spends hardly any time whatsoever accessing memory outside the cache, speeding up main memory to core speeds wouldn't speed the computer up very much at all. Like I said, go into your BIOS and try disabling L1 and L2 cache. There's an order of magnitude difference in performance when you do that, because the CPU never really uses Main memory other than for loading and unloading once.

    [ c h a d   o k e r e ] [dhs.org]
  • A lot of the posters here don't seem to get it. Two things to keep in mind:

    1) Garbage collection is good for two reasons:
    i) You simplify your interfaces. W/o garbage collection, your interfaces need to be designed to keep track of what has or hasn't been allocated. This makes programming simpler. If you don't use GC, you essentially are "reference counting", which is one of the worst performing GC algorithms, in fact.
    ii) You'll have fewer memory leaks. Memory leaks are difficult, often impossible, to detect in large applications. W/o GC, the amount of memory leaked is unbounded in the program size. With GC, you can formally prove that memory leaks will remain bounded.

    So with all these good thing going for GC, why is it not commonly used?

    1) It is often slower.
    2) It's not a standard part of C/C++. (But it is in Perl, Java and just about every other "modern" language).

    The hardware knows when a reference/pointer is used. Software often doesn't, and has to rely on techniques like reference counting or mark&sweep. This would be a *lot* easier if a few simple hardware hacks were thrown in (not too many transistors). Remember, b/c of Moore's Law, CPU makers have more transistors than they know what to do with. It's kind of like "test and set bit"... implementing this in hardware is waaay easier than implementing locks in software.

    As for #2, well, talk to Bjarne about that... apparently the man can do no wrong, if you ask him...

    -Amit Dubey
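
    (As an aside, here is a minimal sketch of the "test and set bit" idea mentioned above, written in modern C with C11 atomics standing in for the hardware instruction; the acquire/release names are just for illustration, not anything from the article.)

    #include <stdatomic.h>

    /* One shared flag: the hardware test-and-set makes the
       read-modify-write atomic, which plain loads and stores cannot. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void)
    {
        /* atomic_flag_test_and_set maps onto the CPU's test-and-set
           (or equivalent) instruction; spin until the old value was 0. */
        while (atomic_flag_test_and_set(&lock))
            ;   /* busy-wait */
    }

    void release(void)
    {
        atomic_flag_clear(&lock);
    }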

  • by sjames ( 1099 ) on Saturday February 26, 2000 @10:27AM (#1244267) Homepage Journal

    FrontPage's lack of proper software garbage collection is the perfect environment to test hardware garbage collection in.

    Wouldn't hardware dedicated to collecting software garbage tend to delete FrontPage entirely?

  • If this is implemented, then only C languages would benefit from it (OSes like Linux, possibly Windows), but non-C languages like VB, Pascal, etc. wouldn't benefit because they couldn't use it effectively.

    Instead of moving a few C functions to hardware I'm with most people here who think that we should just create a faster memory subsystem.

    -- iCEBaLM
  • And you miss the point...

    If we want things to be "DAMN fast", why don't we just write everything in assembly? Let's hand-tune all our code! Yeah! (David Cutler complained about this when first implementing NT at Microsoft, btw -- he wanted to develop for multiple platforms to force cross-platform compatibility, but MS programmers of the time would automatically code in x86 assembly to make things faster...)

    Garbage collection improves the memory management model, which is the point in the first place. Which is better: slightly slower code that reduces the number of memory errors and security holes, or faster code with bugs?

    (For comparison, which is better: portably written, slower C or fast, nonportable, nonreadable x86 assembly?)
  • People complain about FrontPage and grammar? That's hardly a valid reason. Having met two of these individuals (Chang and Srisa-an), I can say that I do not believe either one to be a native English speaker. Both are, however, very knowledgeable about computer architecture and hardware design.

    I'm not saying that this is a good/bad idea, or that it will/won't improve performance. I don't know. I do know that the two men I've met are two of the very few people in IIT's CS department who might come up with something usable, regardless of their lack of English and HTML skills.

    The English I saw browsing the pages was really not bad, and even if the page was done in FrontPage it's still better than the main IIT site (take a look if you don't believe me). Many successful open-source software projects involve people who do not use flawless English, but that does not affect the quality of the work.
  • What do you think Pascal, Ada, and Java call when they want memory? Malloc. It's what anything calls when it wants memory, ultimately.
  • 1) This *IS* a hardware solution - read the article before replying.
    2) This doesn't counter RISC philosophy (i.e., many RISC chips have test-and-set bit instructions)
    3) GC *IS* good. GC is for incompetent programmers just like "high-level" languages such as C were for incompetent programmers in the 1970s and '80s.
  • Ha ha! It's posts like this that make me love slashdot.
  • Sure, take pot shots at ASM programmers.
    The ASM is better, depending on what you're doing.
    For example if you're writing some small low-level code,
    A) It's short so porting isn't really an issue. A few thousand lines of ASM code is just as easy to rewrite for another arch as it is to port.
    B) ASM is just as readable as C these days. take a look at MASM (Microsoft's assembler)
    C) It's fast as hell, and for small critical sections of programs, it's really useful.
  • Wrong. Faster CPU= Faster CPU. The memory will create a bottleneck. Memory needs big improvement.
    But it seems like, unlike games and demos, most programmers are using "APIs" for their memory management. So I don't think switching to hardware would do anything but speed things up.

    True, programmers use APIs for memory management, but what do these APIs call? malloc(), more than likely.

  • by pli ( 8167 ) on Saturday February 26, 2000 @10:52AM (#1244283) Homepage
    After reading the one page description I have some comments:

    They are going to implement malloc/free in hardware; that doesn't mean they will build garbage collection into the hardware (which is a completely different thing). However, as the project description says, it could improve garbage collection performance, but that is another issue.

    DMM (Dynamic Memory Management) is a well known term, not something new as the slashdot article tries to make it sound like.

    Building DMM into hardware is not a new idea. However, I'm not really sure if it has been successfully implemented in any "real" CPUs.

  • What do you think Pascal, Ada, and Java call when they want memory? Malloc. It's what anything calls when it wants memory, ultimately.

    But they all do it differently, and even the same language under different compilers allocates memory differently; this is what I'm talking about. If you implement C-style malloc(), free(), etc. in hardware, then only C languages will benefit, because other languages do it too differently for it to be useful to them.

    -- iCEBaLM
  • I'd rather see research into fabricating memory that ran at core processor speeds. That would speed up every memory access, not just the malloc()s and free()s.

    There is no technical problem with implementing memory at the processor core speed. The problem is economics. That's the reason we have small, fast (and expensive) on-chip cache memory, and large (but cheap) off-chip main memory. So the problem is not in the fabrication of such high-speed memory (although it would require being physically on the same chip as the processor to attain such speeds), but rather the economic feasibility of producing such chips in high quantity. I'm sorry, but it just isn't going to happen; we are going to be stuck with slow main memory and fast on-chip cache, and no amount of research is going to find a way to make it economical any time soon. Rather, research *should* be done in small but realistically achievable goals, like better hardware caching algorithms (greater hit ratio) and things like implementing hardware memory allocation/deallocation.

    Spyky
  • What's wrong with reference counting?
  • by AndyB ( 11841 ) on Saturday February 26, 2000 @10:57AM (#1244287)
    While I agree that this is pretty useless for your general purpose desktop, this is useful for real-time systems.

    People would love to write programs for such systems using a high-level language like Java or C++. The main issue is that real-time systems must be able to deterministically predict how long any operation will take. Problem is, with current implementations of runtimes, the garbage collector could kick in at any time and take some arbitrary amount of time to finish, totally messing things up.

    Let's say you've got a software-controlled robot arm, putting together a car or something (cheesy example). A start command is executed, and the arm begins moving. Then, right before the stop command was supposed to be executed, in comes your garbage collector. It does its thing and finishes, and the stop command executes - 50 milliseconds too late, putting the arm out of position. Oops - I hope you weren't planning on selling that car.

    But if you can deterministically predict when and how long all of the operations will take, voila, you can just take them into account when scheduling what happens and everything is all hunky-dory. The hardware part of it is a bit of a distraction I think - the deterministic part is what's cool.

  • Maybe interesting in this context, a language called 'C--' was developed last year to serve as a portable assembly language with garbage collection. It was intended mostly for compilers of functional programming languages (many fp compilers generate regular C code). Here's a link [microsoft.com] (yes, at microsoft!)
  • On the flip side, Windows should release the memory of a process after termination. Memory leaks should not persist past the termination of a process.

    And your god, BG, should take that advice too; I've yet to see ANY program except for netscape with more memory leaks than Windows.


    If you can't figure out how to mail me, don't.
  • If they were to do this, it would probably increase performance. But marginally. At least in linux programs, when I've done profiles, most of the cycles were taken up in console i/o. Printf, etc.

    malloc/free is a major bottleneck in many systems, especially on SMPs. So, yes, a faster (lockless) malloc/free would speed up a program several times (not just marginally). I have several examples of programs that spend up to 80% of their time in malloc/free. But of course this depends on the kind of program. Multithreaded object-oriented programs running on SMP hardware are definitely one category of programs that would benefit from this.

    It would also require a modification to the kernel - so that the brk() system call would simply call the hardware variants.

    Most likely, but is that really a problem?

    I think that it probably *would* remove flexibility, but at the same time I think that's a good thing. Part of the problem with memory leaks and stuff is the flexibility...

    Yes, it would probably remove some flexibility (which is not a good thing). But I don't see what you mean when you say that flexibility is part of the memory leak problems.
  • by jacobm ( 68967 ) on Saturday February 26, 2000 @11:15AM (#1244293) Homepage
    Sigh. Have you ever written a garbage collector? Think about it- the people who cared enough about garbage collection to implement it in Lisp, Java, Python, etc. did. I've written one too, and I promise, you MUST have an intimate knowledge of memory to get one working. It's not incompetent programmers who write these things.

    So why did those non-incompetent programmers who know all about managing memory on their own bother with writing these tools for the weak-minded? Mainly, it's that good programmers understand the principles of program design. One important design principle is that when you want to know what your program is doing at the top level, you shouldn't have to care what is really going on at the lower levels. That principle is the reason we have operating systems and compilers, for example. All of those things allow the programmer not to have to worry about what goes on under the hood, not because the programmer is stupid, but because the programmer's time is better spent solving the problem than dealing over and over again with low-level details that have nothing to do with the program at hand as opposed to any other program in the world.

    Garbage-collected languages do the same thing- you trade a little speed (the idea that garbage collectors must be slower than hand allocation and deallocation actually turns out not to be true, incidentally, but in practice it often is) for the ability not to have to think about a ridiculously machine-level idea (id est, 'in which cell of memory am I going to write down this particular chunk of data?') so your code can just concisely reflect what your program actually does- quicker to write, easier to read, better all the way around. A smidgen slower, but who cares? You can certainly write programs of acceptable speed in garbage-collected languages, so it's no big deal unless you're writing something really speed critical, like tight graphics rendering loops or something, in which case us softheaded garbage-collection-needing incompetent programmers just use inline assembly. =)
  • Depends if (a) your basic profession is programming rather than needing to write programs to solve problems and (b) you're writing highly fluid programs with very complicated data structures (needed to solve real problems), so that when suddenly the entire basis of the program changes it's incredibly difficult to figure out which memory freeing times have changed and where the safest place to deallocate them is. (And when I talk about complicated data structures, I mean complicated, not the namby-pamby things you get in programs dealing with pure computer problems.) If so then not having to do something that's as error prone as ensuring correct new/delete behaviour is a bonus that lets you solve your problems quicker.

    Saying that not wanting to waste mental effort on something irrelevant to the task in hand makes me incompetent is a bit like saying that, because I've only passed a conventional driving test, I'm incompetent because I can't drive my car on two wheels, leap over trucks and do other car stunts. Sure, such technical skill is achievable and is needed by, e.g., stuntmen, but since I only drive to get to and from work, it's not a relevant skill.

    Bottom line: anything that gets me the answers I want from my programs with less programming effort is a good thing.
  • by jetson123 ( 13128 ) on Saturday February 26, 2000 @11:35AM (#1244298)
    Actually, a number of machines have had special hardware to deal with memory allocation issues (among them, Lisp Machines, an old Intel "object oriented processor", and various segmented architectures). Note, also, that the UNIX malloc on many machines does actually make use of the memory mapping hardware to avoid page copies when memory gets "realloc'ed".

    None of those machines have caught on. The reason is that generally, you seem to do better if you invest the same kinds of resources into just making your processor faster.

    Simply implementing existing malloc/free algorithms in hardware is unlikely to help. Those are complex algorithms with complex data structures, and they are probably memory limited. Where hardware could help is by changing the representation of pointers themselves: adding tag bits that are invisible to other instructions (allowing pointers and non-pointers to be distinguished, as well as space for mark bits), and possibly representing pointers as base+offset or some other kind of descriptor that makes it easier for the allocator to move memory around without the C/C++ program knowing about it. As a side benefit, this kind of approach could also make C/C++ programs much safer and help run languages like Java faster.

    But altogether, after having seen several attempts at this, I'm not hopeful. Unless Intel or Motorola do this in a mainstream, mass market processor, it will be very expensive and not perform as well. If you want those features, you end up getting better performance at a lower price by simply buying a mainstream processor without the features and simulating what you need.
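
    (Hardware tag bits have no direct C equivalent, but the flavour of the idea can be sketched in software: with 8-byte-aligned allocations the low three bits of every pointer are always zero, so a collector can hide a mark/type tag in them. Purely illustrative; the macro and function names below are made up and this is not the scheme the poster or the IIT project describes.)

    #include <stdint.h>

    /* Low 3 bits of an 8-byte-aligned pointer are free for tags. */
    #define TAG_MASK  ((uintptr_t)0x7)
    #define MARK_BIT  ((uintptr_t)0x1)

    static void *set_mark(void *p)  { return (void *)((uintptr_t)p | MARK_BIT); }
    static int   is_marked(void *p) { return ((uintptr_t)p & MARK_BIT) != 0; }
    static void *strip_tag(void *p) { return (void *)((uintptr_t)p & ~TAG_MASK); }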

  • If they're planning on doing hardware-level memory management, they're probably going to be improving the control algorithms for the TAG and/or CAM memories.

    Your basic cache/main-memory hit, miss, refresh, and stale address mapping functions, but with a more flexible and comprehensive interface.
  • by QZS4 ( 7063 ) on Saturday February 26, 2000 @11:40AM (#1244300) Homepage

    Argh! Moving stuff to hardware, when we have long struggled to get rid of esoteric hardware instructions?!? The whole idea behind RISC is to remove this kind of stupidity.

    History is full of examples where this kind of hardware "feature" made things faster originally, but has since become a bottleneck instead of an improvement. "rep movsb" made sense on the 8086, but on a Pentium Pro it's just another thing which makes the chip more complicated, big, and power-consuming.

    And as for garbage collection, those of you who say that "GC is slow" are just plain wrong. There is a lot of research on garbage collection, and nowadays a good GC may be faster than manual malloc/freeing. It does not lock up your programs either - there are incremental GC's which you won't notice. They are even used in real-time environments today.

    A major hindrance for good GC's in commonplace programs is that people usually write in C(++), and you can't have a good GC in a programming language which allows pointers. Just to give an example: I once saw a program which used a doubly-linked list. But instead of storing "next" and "prev" pointers, it only used one value - the XOR of prev and next! It was possible to traverse, since you always knew the address of either the previous or the next entry, so getting the other was just an XOR operation. But please tell me, how the h* is a GC algorithm supposed to know this? Even if you use the loosest approach, "treat every single word in memory as a possible pointer", it will fail - since there are no real pointers at all!

    So the result is this: There are lots of good GC's around, but they can't be used in C - so people don't know that they exist. More or less.
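
    (For the curious, here is a rough sketch of the XOR-linked-list trick described above; the struct and function names are invented for illustration. Each node stores only the XOR of its neighbours' addresses, so you can walk the list in either direction as long as you remember where you came from -- and, as the poster says, a conservative GC scanning this heap would never see a real pointer.)

    #include <stdint.h>
    #include <stddef.h>

    struct node {
        int       value;
        uintptr_t link;   /* (uintptr_t)prev ^ (uintptr_t)next */
    };

    /* link == prev ^ next, so next == link ^ prev. */
    static struct node *step(struct node *prev, struct node *cur)
    {
        return (struct node *)(cur->link ^ (uintptr_t)prev);
    }

    /* Walk forward from the head, whose "prev" is NULL. */
    static void walk(struct node *head)
    {
        struct node *prev = NULL, *cur = head;
        while (cur != NULL) {
            struct node *next = step(prev, cur);
            /* ... visit cur->value ... */
            prev = cur;
            cur  = next;
        }
    }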

  • He mentioned he was porting to windows. In my experience, the windows kernel is much worse at handling things like that than the unix kernel is. Sometimes you have to reboot to free the memory.


    If you can't figure out how to mail me, don't.
  • No. Memory is limited by latency. It doesn't matter if the CPU is 100 ghz, if it has to wait on the memory the whole system will slow down.

    It's like if you're in a car, you can go 150, the speed limit is 150, but the guy in front of you is going 50. The speed of your car doesn't matter, and the speed limit doesn't matter either. You're going 50.

    Kindly hold back your flames till you know what you're talking about. It makes you look stupid.


    If you can't figure out how to mail me, don't.
    The reason ram is so slow now is because it's so cheap
    No, the reason RAM is cheap is because it is slow. There's a difference.
    No, the reason RAM is cheap is because it is slow. There's a difference.

    Do you understand how a cache works? The speed of the cache is not what works the magic. That's only part of it. Caches work on the principles of temporal and spatial locality. RTFM.

    The CPU spends hardly any time whatsoever accessing memory outside the cache, speeding up main memory to core speeds wouldn't speed the computer up very much at all.
    There would be no need for a cache if the processor could read and write to main memory at core processor speeds.

    What I'm saying is that I'd rather see main memory speed up and the elimination of cache. I don't want to turn main memory into cache. That won't work.

    You are right about disabling cache slowing down your system. If you had no cache and your RAM ran at core processor speeds, your system would be faster than a system with core speed cache and low speed RAM.

    Sure, pricing is the key point here. But the research dollars could be better spent. Processors are plenty fast today. I/O subsystems are the areas that need our attention. If core speed RAM came out, geeks like us would be the first in line to buy it. Everything is high priced when it first hits the market. Prices would drop.
    --
  • if you had:

    struct LinkedList {
    LinkedList *next;
    };

    LinkedList *a, *b;

    a = new LinkedList;
    b = new LinkedList;

    a->next = b;
    b->next = a;

    Then, if you never used a or b again, a reference-counting GC wouldn't be able to collect them: both have a non-zero reference count!

    Considering that circular linked lists *do* happen, this is a BAD thing.
  • What's more, PC Plus magazine in the UK reported that Windows 2000 now needs a ridiculous amount of memory allocation just to start up Oracle, let alone to do anything useful with it. Sounds like a serious bug to me.

  • Any hardware-based malloc shouldn't restrict your algorithm. You always have the option to malloc in chunks and then sub-malloc it the way you want it, using your own algorithm (sorry if I'm not using the correct terminology here). That's what a lot of languages/environments have done, historically. And if the hardware version turns out to be a pile of crap, you can just not use it at all. Obvious really when you think about it, isn't it?

    And anyway, how hard can malloc and free be? I mean, sure, there's questions of optimising for speed and lack of fragmentation, etc., but I hardly think it can be that complicated.

    Garbage collection, on the other hand (which is what they're actually talking about) is slightly more mind-bending - especially when you get into incremental GC. Ouch - my head hurts.

    But then, I don't know anything about memory management, so I could be wrong.
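
    (A minimal sketch of the "malloc a big chunk, then sub-allocate it yourself" idea mentioned above: a plain bump allocator with no per-object free(), names invented for illustration.)

    #include <stdlib.h>
    #include <stddef.h>

    static char  *arena;        /* one big block obtained from malloc() */
    static size_t arena_size;
    static size_t arena_used;

    int arena_init(size_t size)
    {
        arena = malloc(size);
        arena_size = size;
        arena_used = 0;
        return arena != NULL;
    }

    /* Hand out the next piece, rounded up to 8-byte alignment.
       Individual pieces are never freed; the whole arena is released
       at once with free(arena) when you're done. */
    void *arena_alloc(size_t n)
    {
        size_t need = (n + 7) & ~(size_t)7;
        if (arena_used + need > arena_size)
            return NULL;            /* or fall back to plain malloc() */
        void *p = arena + arena_used;
        arena_used += need;
        return p;
    }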

  • ASM is just as readable as C these days.

    Um, no, I don't think so. You are wearing rose-tinted glasses my friend.

    Unless you're knee-deep in assembly macros, in which case that's strictly speaking at a higher-level than assembly anyway.

  • Specialized GC hardware is an old idea--is this something new, or just a rehashing of something that's been done?
  • Depends on what you mean by "coding properly". Is explicitly allocating and deallocating memory "coding properly"? Why--just because you're forced to look at how the memory management works, I'm guessing you'll say.

    How about making sure you're using the correct registers? Isn't improper saving and restoring of registers a performance problem? How about simply not using registers when you should? We have compilers now, though, that take care of that for you--and you don't even have to think about it at all.

    GC could work the same way--if it were fast enough. Now, putting it into hardware is something I have more of a problem with, simply for cross-platform issues, but simply hiding memory allocation isn't necessarily a Bad Thing.
  • I disagree with you that writing a GC has to be difficult. I've written several, and all of them have been a page or less of code. Here's the GC from my minimal Lisp interpreter, which I call Stutter:

    void garbage_collect(void)
    {
        CELL *cell;
        int i, count = 0;

        mark(binding_list);
        for (i = 0; i < protect_used; i++)
            mark(protect_table[i]);
        for (cell = heap, i = 0; i < heap_size; cell++, i++) {
            if (!cell_mark(cell)) {
                cell_car(cell) = free_list;
                free_list = cell;
                count++;
            }
            cell_mark(cell) = 0;
        }
    }

    Surely, this is not rocket science.

    -- GWF
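
    (The mark() routine isn't shown above; for a cons-cell heap like this one it would presumably look something along these lines. This is a guess, not Stutter's actual code: cell_cdr and is_heap_pointer are invented here, by analogy with the cell_car/cell_mark accessors used in the sweep loop.)

    /* Hypothetical mark phase: set the mark bit on every cell
       reachable from a root, recursing through car and cdr. */
    void mark(CELL *cell)
    {
        if (cell == NULL || cell_mark(cell))
            return;                      /* already visited */
        cell_mark(cell) = 1;
        if (is_heap_pointer(cell_car(cell)))
            mark(cell_car(cell));
        if (is_heap_pointer(cell_cdr(cell)))
            mark(cell_cdr(cell));
    }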

    There are very real reasons why you aren't going to get RAM working at core speeds cheaply. One big reason is that there is a huge difference in the number of transistors used to store a bit in the high-speed, low-latency L1 and L2 caches. I believe that one transistor is used per bit in DRAM. That's pretty cheap. What is the difference in the number of transistors between a P3 and a Coppermine? That's all for 256k of L2 cache.
    The reason that we have all these layers of cache is because it is cost effective and resource effective. Good luck making full speed ram with low latency that is inexpensive. I believe it would require a complete paradigm shift.
  • Not so fast. First of all, have a look at the C/C++ Portability Guidelines [mozilla.org] courtesy of mozilla.org. By reading this document you will find that lots of yummy features of C++ are hardly portable. By suggesting the use of some kind of standard memory allocator/g-collector, you will then rely on this thing to get decent performance, and probably seriously screw your chances to port this thing anywhere. If you are so desperate for this thing, then you can code it yourself and use it.
  • I believe it would require a complete paradigm shift.
    Exactly. That's what needs to happen. Even if Intel and AMD manage to release a 1GHz processor, the thing will just be wasting clock cycles.

    Processors are plenty damn fast today. But they are useless if they don't have information to process. If we could process things without making the CPU wait, then we would see some pretty snappy performance.
    --
  • At least his German is better than Hemos' English.
  • I wonder how many clock cycles are actually wasted waiting on memory access. The high speed caches close to the processor help in this regard quite a bit. It would be interesting to see a simulation comparing how fast software runs on current memory systems to how fast it would run with high speed low latency memory.
  • Cocoa (OpenStep) solves this pretty easily. Any time an element is added to a data structure, it is retained. When the structure is released, everything inside it is released as well. If that drops the reference count of the object to 0, it is freed. Cocoa also has an autorelease pool that is emptied between cycles of the event loop, so you can keep memory around for temporary operations.

    I understand that this is working at a higher level than a garbage collector would work at, but it works very well.
  • Maybe I gave the wrong impression. I didn't mean to imply that writing a garbage collector was a gigantic time-intensive effort and that garbage collectors were huge beasts - the garbage collector I wrote (for a Scheme-like toy language, but that's not important) wasn't much bigger than that. I just said that you have to understand how memory works to be able to write one (if you'll recall, the original poster said that garbage collection was for people who don't know how memory works). The program fragment you posted certainly takes more memory smarts than it takes to free() your malloc()s, though it might not take more typing.
  • A good point, I remember back in I think '97 when there were some chips in development to do just that, I believe Motorola and IBM were the two major proponents, but I could be wrong.

    However, this example only illustrates the truism that any resources directed at specific hardware implementations would be better off being directed at making a general CPU faster. These chips don't exist today because Intel and AMD have pushed the envelope so far that they just aren't necessary, or couldn't keep up.

    (For the record, I am a Java programmer so this isn't an anti-Java bias speaking).

    Hotnutz.com [hotnutz.com] - Funny
  • Sure, writing an idiotic GC is easy. Why don't you post the code for a generational garbage collector instead of this piece of junk, and then we will compare the sizes.

    Writing a good GC is NOT easy; that's why there are so few of them. (For instance, ref counting SUCKS!)

    Gavin E. Mendel-Gleason
  • Actually, the "problem" with functional languages is that they've been more research-minded than anything else, so they haven't been tuned and optimized as much as, say, C++.

    Well, let's see...there's room to speed up functional languages. You can make them implicitly parallel (try that in C). They're easier to implement garbage collection in.

    So which one would you choose?
  • This was hilarious.

    > It's short so porting isn't really an issue.

    Correct me if I'm wrong, but to *port*, you first have to *parse*... Try porting from CISC to RISC. It's significantly more of a pain than playing a little with C code and recompiling.

    > ASM is just as readable as C these days.

    I think someone else went after this one, but assembly is still assembly. It's not like they renamed the instructions or something.

    > It's fast as hell...

    Have you heard of optimizing compilers? They've gotten pretty good the past few years or so... So, yes, sometimes you need assembly (like interrupt and page fault handlers). But if you're like that group that wrote an entire base OS directly in assembly, you're insane...
  • it seems as though it will make it so that programmers no longer have to code correctly, ... i WANT to have to worry about malloc() calls etc.

    Sometimes, the only correct way to make a free() call is to not make it. You don't know and have no way of knowing if you've got the last active reference to the block of memory or not. In such cases, using anything other than garbage collection is not coding "correctly".

    --

  • by Anonymous Coward
    I really think you are incorrect in calling Java "platform independent". That is the last thing it is. Every time you write a Java program you are writing a program for the Java platform, which is a proprietary specification, so in some ways it's just as bad as writing for a specific OS. Now sure, you can have the Java VM on top of different platforms, but we all know the "write once, debug everywhere" motto. Differences between Java VMs are also a major problem.

    So always be careful when you think you have a panacea, because you never do; you only have a false sense of security.

  • Umm, no. Under any given operating system, all programs allocate memory from the system using the exact same OS calls, regardless of what language they were written in.

    If this functionality were moved to hardware, and the OS modified to support this, all languages would benefit. If the OS was not modified to support this, no languages would benefit, not even C.

    I think you have some very fundamental misunderstandings about how memory allocation works. For the purposes we're discussing here, all non-stack based programming languages do it identically. Pascal, C, Ada, etc...

    --
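
    (To illustrate the point about everything bottoming out in the same OS calls, here is a Unix-flavoured sketch: whatever the language, its runtime ultimately asks the kernel for raw pages, with something like brk()/sbrk() or mmap(), and carves them up with its own allocator on top. Windows would use VirtualAlloc() instead; grab_pages is an invented name.)

    #include <stddef.h>
    #include <sys/mman.h>

    /* Ask the OS for a block of fresh, zero-filled pages. */
    void *grab_pages(size_t bytes)
    {
        void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return (p == MAP_FAILED) ? NULL : p;
    }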

  • In my experience, the windows kernel is much worse at handling things like that than the unix kernel is. Sometimes you have to reboot to free the memory.

    It's not a matter of "better or worse", but of "different". If you're porting to Windows 3.1 or Windows 9x in 16-bit mode, then yes, you can corrupt the global heap by failing to release memory after a process terminates. That's because all global memory in those models is shared among all processes, and so the system can't clean up without possibly destabilizing some other process. Bizarre as it sounds, that's not a bug, it's a feature. It's one of the ways that Windows 3.1 and 9x preserved backwards compatibility with DOS, in which all processes had relatively direct access to physical memory.

    If you're porting to Windows 9x in 32-bit mode, or if you're porting to Windows NT, then, no, processes clean up after themselves when they shut down. That's because Win32 uses an explicit shared memory model, with shared memory retained against the file system. Thus, if two processes share memory, and the memory that one of them holds is released by process termination, memory held by the other process doesn't get corrupted.

  • First of all, thanks for pointing out that hack for implementing a doubly-linked list with the XOR of prev and next. I love interesting tricks like that.

    Garbage Collection by Jones and Lins points out that even if a programmer stays away from code like the hack in your example, an optimizing compiler can still make life difficult for a garbage collector. They give the following example (pg. 247):

    Strength reduction, for example, can destroy all direct references to an object. Given an array of size SIZE, the code fragment:

    for (i = 0; i < SIZE; i++)
        ... x[i] ...;
    ... x ...;

    might be transformed into:

    xend = x + SIZE;
    for (; x < xend; x++)
        ... *x ...;
    x -= SIZE;
    ... x ...;

    if there is pressure on registers. Inside the loop the only references to the array are through interior pointers, and on exit x points one beyond the end of the array. If this is not regarded as a valid reference, any allocation may cause the array's space to be recycled.

  • by bhurt ( 1081 ) on Saturday February 26, 2000 @01:50PM (#1244335) Homepage
    GC for procedural code is optional. A good idea, IMHO - but still optional. If worst comes to worst, you can always "bubble up" responsibility for deallocating the memory to the block which encompasses the entire lifetime of the memory. For example, if you had:
    void *mem_p;
    void foo(void) {
        mem_p = malloc(somesize);
        /* use mem_p */
    }
    void bar(void) {
        /* does mem_p need to be allocated? */
        /* use mem_p */
        /* does mem_p need to be freed? */
    }
    void bang(void) {
        foo();
        bar();
    }
    We can refactor this to:
    void foo(void *mem_p) {
        ...
    }
    void bar(void *mem_p) {
        ...
        /* we don't need to free mem_p here */
    }
    void bang(void) {
        void *mem_p = malloc(somesize);
        foo(mem_p);
        bar(mem_p);
    }

    We can do this because the lifetime of the body of bang() completely encompasses the lifetime of mem_p - which neither foo()'s nor bar()'s does.

    Unfortunately, OO programming makes GC significantly less optional. And exceptions make GC no longer optional at all. Consider the following C++ code:
    {
        someclass *ptr1 = new someclass();
        someclass *ptr2 = new someclass();
        ...
    }

    Can you spot the bug? If the program runs out of memory trying to allocate the second object, new throws an exception. But since the exception isn't immediately caught, the stack is simply popped, and the memory pointed to by ptr1 is leaked.

    The solution, as any C++ programmer will tell you, is to use smart classes and smart pointers, which implement GC. I.e., if the language doesn't have GC, you have to add it.

    There are other reasons to use GC - by using GC you can more effectively divorce the implementation of a class from its interface. In a GC system you don't have to export information about the memory management aspects of the implementation. This is especially important if you have two different classes implementing the same interface, but with different memory management requirements.

    Which is faster, GC or manual allocation, often depends upon both the program in question, and the GC algorithm implemented. There is some data indicating that copying garbage collection is the fastest in complex situations- what you lose copying the data you win back in better cache and virtual memory utilization.
  • Implementing malloc() and functions like that in hardware doesn't seem to be such a good idea.

    Reference counting is the more basic solution to the same problem.

    > If you don't use GC, you essentially are "reference counting", which is one of the worst performing GC algorithms, in fact.

    Maybe so, but wouldn't it be a lot smarter to implement a couple of instructions that would speed up reference counting considerably, instead of implementing a couple of high-level functions?
  • ...you can't have a good GC in a programming language which allows pointers.

    Call me pedantic, but Java and Lisp have pointers, they just don't have pointer arithmetic.
  • Surely, this is not rocket science.

    Come on, saying that's all you need for a complete garbage collector is like saying this code is a complete garbage collector:

    int gc(void)
    {
        mark();
        sweep();
    }

    Yes, now that I look at it, you're right! Garbage collection is easy and requires only two lines of code!

    Get real. Show me a GC that (a) deals with objects of varying type and size, not an array of CELLs, (b) preserves blocks of memory pointed to by nothing in the heap but by a pointer on some process's stack, and (c) finds pointers on the heap while not knowing the internal structure of the objects allocated there.

    If it doesn't satisfy those three requirements, let's not pretend it's anything other than a toy. Even at that, we're talking about fairly primitive GC. Once you've solved those problems, make sure your GC is (a) fast, (b) thread-safe, and (c) reasonably real-time. Then we're talking about something that might be considered for use in a real program...

    I've written several garbage collectors myself, and I can assure you they don't take a page or less of code. Not if you understand the real issues involved in writing real garbage collectors.

    --

  • Only to a certain point. Once the CPU speed outstrips the latency of the memory, the memory becomes the roadblock.

    It introduces "wait states", which is basically a clock cycle where the cpu isn't doing anything but waiting for the memory to respond. These add up.


    If you can't figure out how to mail me, don't.
  • So why did those non-incompetent programmers who know all about managing memory on their own bother with writing these tools for the weak-minded? Mainly, it's that good programmers understand the principles of program design. One important design principle is that when you want to know what your program is doing at the top level, you shouldn't have to care what is really going on at the lower levels.

    I work for a large software company. One of my major jobs is training new developers in "best practices". What you've just said is one of the classic things that people in academic circles believe...but which is utterly and unforgivably wrong. I spend a lot of time teaching them that they do care very much about how the system works, and that they care a lot about how the memory allocator in their system works.

    You see, most modern commercial software is well-designed. But efficient design will only take you so far. One of the key performance bottlenecks on a modern computer is simple and straightforward: page faulting. The basic joke we tell is that doubling the speed of your processor gets you to your next page fault twice as fast... and it takes as long on a newer machine as it did before. This means that the cheapest way to get a performance improvement is to avoid page faults -- and you do that by avoiding heap allocation at all costs. If you need to do a malloc, do it in two stages: allocate a small buffer on the stack, test the space needed, and use that buffer if it's big enough. It's on the stack, so it won't get page faulted out. (But always check that it is big enough... there's this thing called a buffer overflow condition that you risk there.) Only fall back on the heap when you have to. Allocate several contiguous pages of memory to handle list nodes, and use them. Only go outside that block when you must. Etc.

    And this is for any program. Do you allow the user to undo things he's done? Then you maintain an undo stack. Allocate nodes for it efficiently. Are you doing searches in dynamic lists? Allocate your tree efficiently. It's not enough to know that you need to use a B-tree or an AVL-tree to keep your data around; you must also make sure that you keep stuff compact. Do you write in C++? The x86 code to support C++-style exceptions (or Java-style exceptions) is fantastically expensive, since the x86 doesn't support dynamic context unwinding in hardware. Maybe you're better off doing nothing in your constructor, and using an explicit initialization step afterwards. That way, you don't need to handle the throwing of exceptions.

    Don't get me wrong: good design is the most important key to successful coding. A well-designed piece of software can easily run ten times faster than a badly designed piece of software that does the same thing. But implementation is also critical, and that often means knowing more than just how the algorithm works; it also means knowing how much operations cost on the target system.
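
    (A small illustration of the stack-buffer-first pattern described above; the function and sizes are made up, and real code would report errors rather than silently return.)

    #include <stdlib.h>
    #include <string.h>

    void process(const char *data, size_t len)
    {
        char  stackbuf[256];        /* covers the common small case */
        char *buf = stackbuf;

        if (len + 1 > sizeof stackbuf) {
            buf = malloc(len + 1);  /* rare case: fall back on the heap */
            if (buf == NULL)
                return;
        }

        memcpy(buf, data, len);
        buf[len] = '\0';
        /* ... work with buf ... */

        if (buf != stackbuf)
            free(buf);
    }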

  • by Detritus ( 11846 ) on Saturday February 26, 2000 @02:29PM (#1244341) Homepage
    I've used dynamic memory allocation in embedded real-time systems which required deterministic behaviour. Instead of garbage collection, all buffers had reference counts to control when they were put back on free lists. During program initialization, the free memory pool was chopped up into the appropriate number and sizes of buffers for the program and placed on free lists. Each type of buffer had its own free list. Each buffer had a magic number in its header to allow type and validity checks. This prevented common bugs such as freeing bogus pointers, using buffers after they were freed and freeing a buffer twice. It wasn't as flexible as the typical malloc/free implementation but it was fast, reliable and deterministic.
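
    (A bare-bones sketch of that kind of scheme: fixed-size buffers on a free list, each with a magic number and a reference count. The field names and the magic value are invented; the real system was surely more elaborate, with per-type free lists and pools carved up at init time.)

    #include <stddef.h>
    #include <assert.h>

    #define BUF_MAGIC 0xB0F4EADu    /* invented value for validity checks */

    struct buf {
        unsigned    magic;       /* catches bogus pointers and double frees */
        int         refcount;
        struct buf *next_free;   /* link on the free list */
        /* payload follows */
    };

    static struct buf *free_list;    /* filled from a static pool at init */

    struct buf *buf_get(void)
    {
        struct buf *b = free_list;
        if (b == NULL)
            return NULL;             /* pool exhausted: deterministic, no malloc */
        free_list = b->next_free;
        b->refcount = 1;
        return b;
    }

    void buf_put(struct buf *b)
    {
        assert(b->magic == BUF_MAGIC);   /* freeing a bogus pointer? */
        assert(b->refcount > 0);         /* freeing twice? */
        if (--b->refcount == 0) {
            b->next_free = free_list;
            free_list = b;
        }
    }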
  • ASM is just as readable as C these days.

    Um, no, I don't think so. You are wearing rose-tinted glasses my friend.

    Unless you're knee-deep in assembly macros, in which case that's strictly speaking at a higher-level than assembly anyway.

    Um, YES, certain processors have *extremely readable* assembly. I know for certain that the 68k has very nice asm opcodes (it's very much like C - reads like it, too). And this is with a dumb 2-pass assembler (no macros, nothing).

    The only reason "assembly is hard" is due to Intel and their awful pocket-calculator-descended chip, the x86. Now, this chip has extremely weird instruction rules with weird register accesses (some faster than others for certain operations)...
  • I never said porting would be easy. I said that a few thousand lines of critical ASM is easy enough just to rewrite. And check out some code written for MASM at www.gamedev.net; it looks like C with short function names. Yes, optimizing compilers work great these days (even gcc works decently), but some things don't get optimized correctly and are just easier to write in ASM. I agree that writing an entire base OS in assembly is stupid, I never said otherwise. The type of ASM I'm talking about is small, critical sections of code written in ASM to really enhance performance. Back in the day before OpenGL on the PC, people would write their games in C, but write something like a triangle rasterizer (a few hundred lines of code) in ASM for maximum performance. The guys at Be are doing this with their OpenGL implementation, writing some critical, but small, portions in ASM for speed.
  • Question: is this OpenStep or Objective-C? At any rate, you still aren't saved from circular references.

    Let's say you had a function LinkedList *CircularLinkedList() that returns a circular linked list.

    c = CircularLinkedList();

    OK. The count of the first element in the list is incremented. No problem. You do some processing, then you do

    free(c);

    the first element in the linked list has its reference count decreased. Its count is now 1. And it will remain 1 even though the guy who is referring to him is unreferenceable!!!

    This is a general failure of reference-counting techniques. There are other shortcomings, including the extra space needed to keep track of ref. counts, and the work of keeping track of the counts themselves. Think about it: every time a pointer is assigned, passed to a function, or returned from a function, you need to increment or decrement some counter *somewhere* in memory. And this can't be parallelized like mark&sweep can and run while your main process is waiting - it takes a hit on the main process itself.

    It may work "OK", but better methods have long since been developed. I hope that I've explained this well, but if you still think reference counting is good, then obviously I haven't... because this isn't a contentious point.
  • The devil is in the details.

    IIRC, in early IBM MVS (360 vintage), memory was allocatable from several subpools and subpools could be allocated/freed themselves. Surely 30+ years of "progress" has not made storage allocation more primitive.

    In reality you would want to allocate a "few" large blocks from the OS, then sub-allocate pieces of those blocks within a subsystem. When you have shared memory and long-lived processes, things get interesting.
  • Game programmers may not mind writing in assembly for speed. Fine -- they're paid to be concerned about speed. However I, and most other programmers, are paid to write working, maintainable programs in a reasonable amount of time, not to write slightly faster and partially unreadable programs in excessive amounts of time.

    I tend to write two kinds of programs: simple user utilities (most of which can be done in Perl to ignore this garbage) and security-critical, low-level daemons. The former are not speed-sensitive enough to write in assembly. In the latter, I am more concerned about being able to instantly determine program flow and implications than having to grok assembly.

    I have no problem with people using assembly in very specialized applications. But I think anyone considering using assembly for any reason but as an absolute last resort should be sent to work on Microsoft BOB.
  • It would require more than a paradigm shift. See, there is a reason why high frequency connectors get smaller and smaller. There is a reason why GHz people don't use BNC and such. To get good characteristics on a high frequency signal you are almost forced to shrink your design or else adhere to insane tolerances. The first way leads to on-chip cache, the second leads to very expensive RAM. Rambus is a good early indicator of how RAM gets expensive as you bump up clock speed.

    There is also a problem of synchronization. Even within one fast enough chip you can't have one clock running the show because of relativity issues. Engineers are starting to deal with this today so it's not a far away problem. But now imagine a physically remote RAM chip (remote here can mean as little as a millimeter away). Even if you could run it as fast as the microprocessor, you couldn't keep them in synch.

    Having ultra-fast RAM is quite pointless, IMHO. You want local caches all over your chip. In the future chips will likely have many units on the same die working from their own clock and with their own cache.
  • I mean complicated, not the namby-pamby things you get in programs dealing with pure computer problems. If so then not having to do something that's as error prone as ensuring correct new/delete behaviour is a bonus that lets you solve your problems quicker.

    I think this line of thinking is plain ol' wrong. I understand what you're saying, but I think you're giving up tons of performance, especially in cases of, as you said, REALLY complex data. For instance, if you're using something that takes care of all the memory management for you and allows you to concentrate solely on the problem - something like, say, Java - you might be able to express your data conversion routine quickly, but oh-my-god is it ever going to suck to actually parse a few gigs of data.

    At that point, having efficient memory management AND a tight loop could save you hours or even days of mining time.

    Bottom line: anything that gets me the answers I want from my programs with less programming effort is a good thing.

    Be sure to put that at the front of your user manual; I'm sure the people waiting on your code to do what THEY want will appreciate your insight. :)
    And, damn is it ever funny that that post got moderated up.
    -- blue
  • For all you people whining about learning from RISC architecture and complex hardware, realize something: this is two grad students and their advisor working on a project to get their master's degrees. How many people are there out there with a master's degree in CS? How many of those people's projects actually turned into something used by a major hardware company? Probably not many. But their research can still turn up a lot of useful information. That's why it's research. For example, how many people have read Linus' comments about the academic community and microkernels? He basically slams people who advocate microkernel technology, but then he turns around and says that most of the optimizations that can be used to make microkernels faster can also be used to make monolithic kernels (i.e. Linux) faster as well. So maybe you think the study these guys are doing is useless, but what they learn in this project may turn out to be very useful information. And they may do other things that could be very useful. (For example, I may be doing some work with one of these guys on the implementations of threading and garbage collection in Kaffe. Maybe that's not exactly related, but I'm sure the studies he wants us to read on garbage collection came from his work on this project.)

    Oh, and as far as comments about their HTML or English go: neither of those is particularly relevant to their research. I've had both of these guys as TAs for CS classes here at IIT, and I had Dr. Chang as a teacher. All three of them are not native English speakers and have very heavy accents, but while they can be difficult to understand, they are extremely knowledgeable about hardware and hardware design, and are some of the brightest people I have met in college.
  • One of the biggest problems with garbage collection is that it can't really be controlled. If you're writing Quake3, you do not want garbage collection. You don't want the program to start cleaning up memory right when you're trying to generate the next frame...

    BUT, if they do this and make it so you can, say, force the garbage collection to do its thing at a given time, maybe that could be doable... still pretty crappy for high-speed apps that need consistent speed, though...

    Esperandi
  • I once saw a program which used a doubly-linked list. But instead of storing "next" and "prev" pointers, it only used one value - the XOR of prev and next!


    That's a cool trick! Although it probably doesn't make the program especially readable. Does anyone else have any good tricks like this? Preferably relating to garbage collection but offtopic will do :-)
  • Is this really an inherent problem with GC? The existence of incremental GC has been mentioned by others (though I don't know where it has been implemented). This should be fine except for the most finicky code. For that code it would seem fairly easy to implement a GC system that could be turned off for critical operations. This would raise the problem of not being able to allocate memory, but that could be addressed by specifying an amount of memory to reserve before GC is turned off, leaving the responsibility for observing those limits in the hands of the coder until GC is restored.
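    A rough sketch of the kind of interface that would imply, in C; gc_reserve_and_disable() and gc_enable() are hypothetical names, not any real collector's API:

    #include <stddef.h>

    /* Hypothetical hooks a collector could expose. */
    static int    gc_disabled;    /* nonzero while a critical section runs */
    static size_t gc_reserved;    /* bytes guaranteed allocatable while disabled */

    /* Collect up front if needed, then promise not to collect again until
       gc_enable(); the caller promises to allocate at most 'bytes' meanwhile. */
    void gc_reserve_and_disable(size_t bytes)
    {
        /* ...run a collection here if free space is below 'bytes'... */
        gc_reserved = bytes;
        gc_disabled = 1;
    }

    void gc_enable(void)
    {
        gc_disabled = 0;
        gc_reserved = 0;
        /* ...the collector may run again from this point on... */
    }

    /* Usage around a latency-critical section (render_frame() is hypothetical):
     *
     *     gc_reserve_and_disable(2 * 1024 * 1024);   // this frame's budget
     *     render_frame();                            // no collection pauses here
     *     gc_enable();
     */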
  • Unless you're using a generational garbage collector, in which case it's usually just walking your most recently allocated data.

    Many garbage collectors just collect before expanding the heap, so that's not so bad, and it's arguably better to page a bunch of stuff in all at once than to page something in on every call of free.

  • And anyway, how hard can malloc and free be?

    Please write a memory management system and then come back. I'm sure there's some classic understatement using almost exactly those words that I can quote at you, but it doesn't come to mind... :-)
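    To give a feel for why it's harder than it looks, here is a deliberately naive first-fit allocator in C (a sketch only: no block splitting, no coalescing, no thread safety, and sbrk() is used purely for brevity; mymalloc() and myfree() are made-up names):

    #include <unistd.h>    /* sbrk(), assumed available on the platform */
    #include <stddef.h>

    struct header {
        size_t         size;   /* payload size in bytes */
        int            free;   /* nonzero if the block is available for reuse */
        struct header *next;   /* next block we have ever carved out */
    };

    static struct header *head;

    void *mymalloc(size_t n)
    {
        /* First fit: reuse any free block that is big enough. */
        for (struct header *h = head; h != NULL; h = h->next) {
            if (h->free && h->size >= n) {
                h->free = 0;
                return h + 1;          /* payload starts right after the header */
            }
        }
        /* Nothing fits: grow the heap. */
        struct header *h = sbrk(sizeof *h + n);
        if (h == (void *)-1)
            return NULL;
        h->size = n;
        h->free = 0;
        h->next = head;
        head = h;
        return h + 1;
    }

    void myfree(void *p)
    {
        if (p != NULL)
            ((struct header *)p - 1)->free = 1;   /* no coalescing, so it fragments */
    }

    Even this toy raises the questions that make real allocators hard: when to split and merge blocks, how to keep fragmentation and search time down, and how to stay fast under concurrent use.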

    Daniel
  • -
    From the site (DMM):
    >The memory intensive nature of object-oriented languages such as C++ and Java

    From a reply:
    >why not make the hardware faster?

    Fellow programmers,
    am I the only one who still remembers Assembler and the intensive search for memory-preserving methods?

    Am I the only one who tries to make things fast, without thinking about the processor?

    In the 70s we struggled for every byte we could spare (yes, and we created the Y2K problem).
    "Star programmers" like me even changed the code during runtime, by moving new instructions over old ones.
    Yes, it was very hard to read, but it was top efficient.

    Fellow programmers,
    are you all too young, or what happened to the common sense?

    If I have to solve a problem for my daily life with my machines, I FIRST check if a SHELL SCRIPT can do it.

    Not because I'm too lazy to use C, but because it might be faster.

    If you run the shell, there are built-in commands and external commands, some having the very same name.

    I.e.: a built-in "test" runs faster than an external "test" command.

    BUT, YOUR:

    case "$1" in
    hello) COMMAND ;;
    esac

    RUNS *** NO *** FASTER, if "test" is built into your shell (and it is, folks), than:

    if test "$1" = hello
    then
        COMMAND
    fi

    Am I the only one who knows that?

    Java needs a lot of memory.
    It's hip.
    But what is Java REALLY?
    Nothing but an interpreted language that just happens to have support from MS and Netscape (and, due to this, is now in our Linux kernels).

    In the 80s we used (and I still do) REXX.
    It's also an interpreted language, and it can also be compiled to object code.
    It runs on MVS, VM, Unix, DOS, MS Windows...

    At that time, there were just no browsers (besides Mosaic).
    It can do everything Java can;
    the only reason you guys and girls use Java is that big companies ship their interpreters with their browsers,
    so it looks as if Java runs on its own, like magic.

    The only thing that runs on its own is Assembler.

    Now I don't want to say you should use Assembler,
    but I think I need to remind us all
    that hardware becomes faster and faster,
    and that because of this, programmers get lazy and code stuff that runs on state-of-the-art hardware
    but would run 4 times as fast if these programmers would first think about RESOURCES.

    A good program will run on a 486 at a reasonable speed and on a PIII like "real time".

    I want programmers to THINK about how they code.
    It IS possible to write applications that do not need extensive memory and a fast CPU (or two),
    IF the programmer first THINKS about how to write something and optimizes the code himself, not only with a standard optimizer (if he uses that at all).

    Read "The Unix Programming Environment" by
    Brian W. Kernighan and Rob Pike.
    After that, your programs will run 4 times faster.

    Replies greatly appreciated,
    fine day y'all, george./

  • After all, if I always have to remember to set objects to NULL then I might as well just always remember to call delete!

    Somewhat off-topic. I like to set pointers to NULL after their objects have been freed or are no longer valid. This helps prevent the nasty situation of code referring to an object that is now a chunk of memory on the free list, or worse, allocated to some other use.
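    One common way to make that habit mechanical in C is a tiny macro (FREE_AND_NULL is just an illustrative name):

    #include <stdlib.h>

    /* Free the object and immediately clear the pointer, so a stale
       dereference fails loudly instead of quietly reading freed memory. */
    #define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

    Since free(NULL) is defined to do nothing, the macro is also harmless if it runs twice on the same variable. It only protects that one variable, though; any other copies of the pointer are still dangling.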

  • by Chuan-kai Lin ( 26774 ) on Sunday February 27, 2000 @12:05AM (#1244381) Homepage

    There seem to be a number of misconceptions about what this is all about and how it might be useful in practice. I will offer some opinions on some of the issues raised in other posts.

    The first one, as already mentioned by a number of people, is that a hardware implementation of malloc and free has nothing to do with GC. The most difficult part of GC is to determine which parts of memory are garbage (this is not as easy as it may seem) without using lots of memory (after all, garbage collectors are typically only activated when free memory blocks are running low), and, for garbage collectors running as an independent thread, to avoid possible race conditions with the foreground processes. Other issues a garbage collector faces include how to reduce the impact on overall system performance, and how to decrease memory fragmentation.

    A garbage collector is not something easy to design and implement. Making a good garbage collector especially requires the almost-impossible combination of a profound understanding of the memory usage behavior of typical programs, logical reasoning abilities, and coding techniques. In addition to the garbage collector itself, you also need support from the language and compiler side, and you have to integrate all of these into a clean design. That is about as hard as things like this can get.

    (Of course, you can also write a conservative pointer-finding GC like the libgc used by many functional language compilers --- but that is far from the state of the art in this business.)

    The proposed hardware support has nothing to do with the points mentioned above, and therefore has nothing to do with GC. Then, is it possible to build some hardware support for garbage collection? Maybe, but I am not an expert in this field. Whatever the solution turns out to be, it will never be as simple as a hardware implementation of malloc and free.

    Second, this also has nothing to do with real-time systems. Many people seem to think that being "real time" means you have to code everything in assembly language and make things run as fast as possible, but that is simply not true. Being (hard) real time means operations must respond and complete within bounded time; as long as that bound is satisfied, there is no need for the task to be completed "very fast". The trick of building real-time systems is in making sure that nothing will delay the operation for a long time, and that requires careful analysis and implementation.

    If you remain unconvinced, think of a typical hard real-time application: CD-R burning control programs. They must feed a continuous stream of data (say, 600KB/s for 4X burning) into the burner, or the disc will turn into a coaster. Is it necessary to code it in assembly? Absolutely not, because pumping data at 600KB/s is very little work for current architectures with 600MHz processors and 100MHz memory busses. However, does that mean you do not need to pay special attention to make it real time? Wrong, because although the hardware is several orders of magnitude more powerful than it needs to be, there is still the possibility that something will make the program stop working for a second or two and ruin the party. It is the responsibility of real-time system designers to make sure that that does not happen.

  • If you always chase the list, no problem. If you ever need to point into the list from outside, forget it.

    --
  • You need a pointer to the first entry in the list. Assuming that the last "next" pointer and the first "prev" pointer are NULL, it's easy to go from there. Sample:

    struct node *p, *prev, *tmp;

    p = first;
    prev = NULL;
    while (p) {
        /* do stuff with *p */
        tmp = (struct node *)(p->xoredstuff ^ (uintptr_t)prev);
        prev = p;
        p = tmp;
    }

    Traversing in the other direction is left as an exercise for the reader.
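    For concreteness, here is a sketch of the node type and link arithmetic the loop above assumes (the field and helper names are made up); the cast through uintptr_t is needed because C only lets you XOR integers, not pointers:

    #include <stdint.h>

    struct node {
        int       value;
        uintptr_t xoredstuff;   /* (uintptr_t)prev ^ (uintptr_t)next */
    };

    /* Combine the two neighbours into the single stored link. */
    uintptr_t xor_link(struct node *prev, struct node *next)
    {
        return (uintptr_t)prev ^ (uintptr_t)next;
    }

    Insertion and deletion have to recompute the stored link of both neighbours, which is part of why the trick saves a pointer per node but costs readability.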

  • java has references, which is different.. they are equiv. to handles (pointer pointers) where the middle pointer is a table to the location of the object - so things can be moved around without the actual pointer value in the program changing.

    Well, more or less. As part of a programming languages class I took in school, I actually had to write a garbage collector for a Scheme-like language. While one could implement references in a Java/Scheme/LISP-like language in the above fashion, that's really slow; it adds another layer of indirection that isn't really necessary. If the run-time system provides information about the types of objects in memory (which is required for Java and Scheme anyway -- instanceof and symbol?), then the GC can reach in and directly change the pointer values within the program's stack or whatever.

    From the point of view of the applications programmer, though, the two implementations are equivalent.

    As far as I'm concerned, the primary difference between pointers and references is that with a reference to an object of type Foo, you are guaranteed that the object referred to is a valid instance of type Foo. This is not the case with pointers. In C/C++ terms: int i; float *pF; pF = (float *)&i; -- the object at *pF is not a valid instance of a float.

    (Yes, you can in fact break this guarantee in C++ using a series of really crufty typecasts. C++'s references are broken, in a number of respects.)

    Of course, as MassacrE points out, references are almost always implemented internally in terms of pointers, because that's what the underlying architecture tends to support.
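    A minimal sketch of the "pointer pointer" handle table described in the quoted post, in C; handle, handle_table, deref(), and relocate() are invented names for illustration:

    #include <stddef.h>

    #define MAX_OBJECTS 1024

    typedef size_t handle;                    /* an index, not an address */

    static void *handle_table[MAX_OBJECTS];   /* slot i holds object i's current address */

    /* The program stores handles; only the table stores real addresses. */
    void *deref(handle h)
    {
        return handle_table[h];
    }

    /* A moving collector can relocate an object and patch one table slot,
       without touching any handle held by the program. */
    void relocate(handle h, void *new_addr)
    {
        handle_table[h] = new_addr;
    }

    This is the extra level of indirection the parent post mentions: every access pays a table lookup, which is why many collectors instead scan the stack and heap and patch the raw pointers directly.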

  • I claim that there is no need for a garbage collector. Finding memory leaks quickly is not a problem _if_ you have the right tools. Purify is good but it is too slow. There are tools I have developed which can show with pinpoint accuracy where the leaks are and have no performance penalty.

    It's a misperception that garbage collection is about debugging, though -- the point is that you can write programs that aren't convoluted by the need to allocate and deallocate memory cells for remembering new things. The C/C++ idiom of things like

    void get_string(char *dummy_string)
    {
        /* initialize buffer */
    }

    makes sense only because you've seen it a million times, and because you're thinking about the way that memory works. The much more intuitive Java style -- that is,

    String getString()
    {
        return new String( ... );
    }

    reflects your intent much more naturally, but is only really possible because of garbage collection. (I know you can allocate and then return in C/C++, but I also know that it can turn out to be a huge mess to figure out exactly when to deallocate, and I know of nobody who writes anything that isn't trivial by returning a newly-allocated pointer.)

    The point is simply that if you want to be able to design and implement programs correctly and rapidly, a facility that lets you specify only "what my program does" without bothering you with the details is invaluable. One aspect of speed is how fast your program will run, and that's certainly important. Another aspect of speed is how quickly you can write a correct program, and high-level abstractions make application development far quicker. And for most of your code, how rapidly it executes isn't really a big deal -- the idea around Rice University, where I go to school, is that you should write your programs in as high-level a language as possible, figure out where it's necessary to make the code faster, and then rewrite those pieces in C or assembly or whatever (that idea is based on the premise that in typical programs, a very small subset of the code takes up the vast majority of the execution time). You get speed where you need it and flexibility (ease of maintenance, reusability, etc.) everywhere else -- the best of both worlds, and you also wrote your program in half the time it took the other guy to write the C++ version.
  • > One of the biggest problems with garbage collection is that it can't really be controlled

    Says who? Most GCs operate in threads. Make your critical operations atomic. Presto, no interruptions.
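    A rough sketch of what "make your critical operations atomic" can look like when the collector runs in its own thread: both sides take the same lock, so a collection cannot start in the middle of the critical work (pthreads shown for illustration; gc_cycle() and render_frame() are hypothetical):

    #include <pthread.h>

    static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Collector thread: only collects when it can take the lock. */
    void *gc_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&gc_lock);
            /* gc_cycle(); */            /* hypothetical: one collection cycle */
            pthread_mutex_unlock(&gc_lock);
            /* sleep, or wait until memory pressure rises */
        }
        return NULL;
    }

    /* Time-critical code: hold the lock for the whole operation. */
    void critical_section(void)
    {
        pthread_mutex_lock(&gc_lock);
        /* render_frame(); */            /* hypothetical: no collection pause in here */
        pthread_mutex_unlock(&gc_lock);
    }

    Of course, if the critical operation itself is long, holding the lock only postpones collection (and any allocation the operation needs must still succeed without it), which is exactly the objection raised in the next reply.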
  • Guilty as charged -- I am "in academic circles," and I haven't ever had to work in an industry setting on really big projects where speed was very important. But I maintain that what I said was still accurate, while conceding your point: it may be true that with current tools, programmers do sometimes need to think about machine-level details (like whether their code is likely to cause page faults) in their high-level designs, but it's also still true that they shouldn't have to. I have mentioned elsewhere in this thread the idea (popular among the faculty here at Rice) that you ought to first write your program in a very high-level language (at Rice, it's MzScheme [rice.edu]), figure out where it's important for your code to go fast, and then rewrite those pieces in a low-level language where you can control machine-level details. The claim is that you get a speedy enough program, because you've optimized the hell out of the critical sections, while the stuff where optimization doesn't matter -- which is most of your code -- was written much more rapidly than would have been possible in a lower-level language.
  • Make transferring 50,000 polygons to a 3D accelerator atomic?

    Esperandi
  • by slashdot-me ( 40891 ) on Sunday February 27, 2000 @11:27AM (#1244416)
    You fail to grasp that fast memory will always be more expensive than slow memory. That should be obvious, if for no other reason than that fast (static) memory needs six transistors per bit where DRAM needs one.

    Here's an example. Say fast memory is 10x the price of slow memory and 10x the speed. Also, assume that a cache one tenth the size of the main RAM has a miss rate of 0.5%. Given these specs, you can have either:

    N megs of fast memory with an average access time of Q.

    5N megs of DRAM with N/2 megs of cache, with an average access time of 1.045Q.

    That's five times the capacity for a 4.5% speed hit. In the real world the situation is even more favorable for a memory hierarchy.
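    A quick sanity check of those numbers (a throwaway sketch; Q is taken as 1.0 in arbitrary time units, per the assumptions above):

    #include <stdio.h>

    int main(void)
    {
        double Q = 1.0;            /* access time of the fast memory */
        double miss_rate = 0.005;  /* cache one tenth the size of main RAM */
        double hit = Q;            /* a hit is serviced at fast-memory speed */
        double miss = 10.0 * Q;    /* a miss goes to DRAM, 10x slower */

        double avg = (1.0 - miss_rate) * hit + miss_rate * miss;
        printf("average access time = %.3fQ\n", avg);   /* prints 1.045Q */
        return 0;
    }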

    BTW, nobody just uses RAM anymore; everyone has swap space too. Why don't you argue for disks that run at core speed?

    Ryan Salsbury
  • I should've been more clear -- when I talk about the programming-time advantage of higher-level languages vs. lower-level languages, I'm not thinking of Java, I'm thinking of Scheme. Java is syntactically enough like C++ that it seems to me the only thing you're saving yourself from having to do is explicitly deallocate memory (and deal with C++'s brain-dead unit system -- making programmers do the things they have to do with #includes and .h/.cc files ought to be against the law), so it doesn't surprise me that in real-life situations the benefits of Java would be outweighed by the speed of C++. There are other high-level programming languages, though, like Scheme, that have saved me tremendous amounts of time over C++ or especially C in the average case.
  • This article is by now so old that no one will read this comment, but what the hell. Karma whore, etc.

    Some of the articles that have been posted seem to miss the point. Several people suggested that this design goes against the principles of RISC. I am puzzled. The RISC philosophy is about maximizing efficiency by reducing CPU complexity. But this is memory management research, i.e. it proposes a new MMU design, not a new CPU. It is like suggesting that 3D accelerator boards are contrary to the principle of RISC design because they involve complex hardware. There's no contradiction in having a simplified CPU and complex off-chip hardware to back it up.

    Others have suggested that there's no point to this work because a hardware implementation of malloc() and free() would run only marginally faster than their software counterparts. I suggest reading some of the publications on their Web site, particularly their Introduction to DMMX. [iit.edu] They aren't merely trying to implement malloc() and free() in hardware, and the solution they describe would allocate and sweep the heap in constant time. If the scenario described in this paper is feasible, it could be pretty interesting stuff.

"Your stupidity, Allen, is simply not up to par." -- Dave Mack (mack@inco.UUCP) "Yours is." -- Allen Gwinn (allen@sulaco.sigma.com), in alt.flame

Working...