
Memory Management Technique Speeds Apps By 20%

Dotnaught writes "A paper (PDF) to be presented later this month at the IEEE International Parallel and Distributed Processing Symposium in Atlanta describes a new approach to memory management that allows software applications to run up to 20% faster on multicore processors. Yan Solihin, associate professor of electrical and computer engineering at NCSU and co-author of the paper, says that using the technique is just a matter of linking to a library in a program that makes heavy use of memory allocation. The technique could be especially valuable for programs that are difficult to parallelize, such as word processors and Web browsers." Informationweek has a few more details from an interview with Solihin.
  • by Estanislao Martínez ( 203477 ) on Monday April 05, 2010 @08:11PM (#31743272) Homepage
    Beware the key term there: "up to."
  • by Ancient_Hacker ( 751168 ) on Monday April 05, 2010 @08:11PM (#31743284)

    Nothing to see here...

    Moving malloc() to a separate thread does not do a thing for the putative word processor.

    They might get some speedup if they take a lousy old malloc() and move its lock contention into a single server thread.

    But of course the *right* way would be to write a new malloc() that is re-entrant from the get-go and doesn't require a bevy of slow locks.

  • 20%?! (Score:5, Insightful)

    by temojen ( 678985 ) on Monday April 05, 2010 @08:21PM (#31743372) Journal
    If most programs are spending 20% of their time on memory management, something is wrong.
  • by lordlod ( 458156 ) on Monday April 05, 2010 @08:41PM (#31743554)
    The article(s) are very scarce on details, but it seems like the gains will be limited in most applications. Fundamentally, you have to block until the malloc has finished before you can use the memory. So it helps if you malloc well ahead of time, but not if you malloc as you need it.

    A common simplified structure is:

    malloc memory
    use memory
    free memory

    With these new innovations you get:

    async malloc memory
    block until malloc finishes
    use memory
    async free memory

    And free shouldn't take a noticeable amount of time.

  • by w0mprat ( 1317953 ) on Monday April 05, 2010 @08:46PM (#31743596)

    Because we learnt to program for a single-threaded core, with its single processing pipeline, way back when, using high-level languages that pre-date the multi-threaded era. Making proper use of 32, 64, or 128 cores involves re-thinking how things are done at a fundamental level. Oh, and we all know how many programmers are 'get off my lawn' types, myself included.
    If I still coded much anymore it would drive me to drink.

  • by Zironic ( 1112127 ) on Monday April 05, 2010 @08:58PM (#31743702)

    It's a performance gain because it's extremely rare that all your cores are maxed out at once. If you can distribute the computing power more evenly, you come out ahead in most circumstances, even if the net computing power required increases.

  • Re:20%?! (Score:3, Insightful)

    by naasking ( 94116 ) <naasking.gmail@com> on Monday April 05, 2010 @09:04PM (#31743736) Homepage

    Not at all. 20% is a very typical overhead for dynamic memory management. Did you think malloc/free costs nothing?

  • by nxtw ( 866177 ) on Monday April 05, 2010 @09:09PM (#31743770)

    Well, the Intel AES instructions would benefit even more from parallelized AES CTR mode pre-computation than straight multiple cores, so that doesn't invalidate what I'm saying at all. :-)

    Are your storage and network devices that fast?

  • by AuMatar ( 183847 ) on Monday April 05, 2010 @09:28PM (#31743914)

    Wouldn't it be rather trivial to write a lockless malloc? Just have every thread allocate its own memory and maintain its own free list- problem solved.

  • by Spatial ( 1235392 ) on Monday April 05, 2010 @10:02PM (#31744092)
    I like to mentally replace that with the actual meaning: "between 0 and".

    It could allow software applications to run between 0 and 20% faster!
  • by Georules ( 655379 ) on Monday April 05, 2010 @10:05PM (#31744104)
    You might consider mentally replacing it with the sad reality that it might be between 0 and x faster AND it could also be infinitely slower.
  • by wealthychef ( 584778 ) on Monday April 05, 2010 @11:28PM (#31744484)
    But how much of your time is spent allocating memory? If you spend 5% of your time in malloc(), doubling its speed saves you 2.5% of your execution time.
  • by mswhippingboy ( 754599 ) on Tuesday April 06, 2010 @12:21AM (#31744796)
    What you are missing (as are most of the posters so far) is that there is considerable overhead in the actual management of memory, i.e. keeping track of which blocks are free or allocated, quite apart from the cost of maintaining locks. Moving this management overhead to a separate thread lets an otherwise single-threaded app take advantage of additional cores without any code changes. It doesn't appear all that novel, however, as modern garbage collectors already do this.
  • Re:20%?! (Score:3, Insightful)

    by RAMMS+EIN ( 578166 ) on Tuesday April 06, 2010 @12:51AM (#31744892) Homepage Journal

    ``20% is a very typical overhead for dynamic memory management. Did you think malloc/free costs nothing?''

    Many people actually seem to think that, and that only automatic memory management is costly. Out in the real world, of course, managing memory costs resources no matter how you do it, and you can sometimes make huge performance gains by optimizing it. I've seen the share of time spent on memory management run as high as 99% in real programs. As always: measure, don't guess.

  • by jasmusic ( 786052 ) on Tuesday April 06, 2010 @02:41AM (#31745328)
    Those developers can hold the rest of the software industry hostage for mad income. OS kernels don't write themselves.
  • by headLITE ( 171240 ) on Tuesday April 06, 2010 @03:11AM (#31745428)

    A large number of malloc()/free() calls is very typical of server applications that handle many concurrent requests. In this scenario, the problem is made worse by the locking used in many traditional implementations. Don't underestimate that.

    This is becoming more and more of a problem in client applications as well. Thanks to object orientation, many modern applications are little more than endless streams of created and subsequently destroyed objects; and in many modern languages this happens implicitly all the time.

  • by julesh ( 229690 ) on Tuesday April 06, 2010 @05:17AM (#31745812)

    ``When used for locking it is called spinning and not busy-looping, and stop your silly doomsday speak and grow a brain. The linux kernel itself more often use spinning than locking, because it is much faster and uses less cpu-cycles. You have busy-looping thousands of times each second when the kernel synchronizes threads and hardware, this is a no-go in application design, but a really common and efficient trick in low-level libraries and routines, and it will save you cpu-cycles and energy compared to semaphores, not use more.''

    Only if you restrict its use to occasions where you know the lock will become available quickly. The Linux kernel uses spinlocks for its internal structures where it knows that no other CPU is going to lock them for more than a few thousand cycles at most. I also believe it (usually) disables interrupts while the lock is held, so it knows that nothing will interrupt that operation prior to its completion. This is a very different situation from an environment where there may easily be multiple seconds between allocation requests.

  • by tibit ( 1762298 ) on Tuesday April 06, 2010 @09:25AM (#31747164)

    I think that part of the problem is that there are still human developers on the other side of the keyboard. Code that utilizes asynchronous I/O in the general case, where you may be accessing multiple files from different places in your application, is just a pain to write in languages like C or C++.

    You need at least sensible coroutine support to make it palatable, IMHO. To really utilize async I/O without spawning many threads that each use sync I/O, you need to have cooperative multitasking -- thus coroutines or somesuch.
