
More Effective Use of Shared Memory on Linux

An anonymous reader writes "Making effective use of shared memory in high-level languages such as C++ is not straightforward, but it is possible to overcome the inherent difficulties. This article describes, and includes sample code for, two C++ design patterns that use shared memory on Linux in interesting ways and open the door for more efficient interprocess communication."
  • SysV IPC is obsolete (Score:4, Informative)

    by bogolisk ( 18818 ) on Monday November 14, 2005 @08:06AM (#14025016)
    Someone should tell the authors to RTFM.

    $ man shm_open
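
    A minimal sketch of the POSIX API being recommended here (the object name "/demo_shm" and the 4 KiB size are arbitrary choices; link with -lrt on older glibc):

    #include <fcntl.h>      // O_CREAT, O_RDWR
    #include <sys/mman.h>   // shm_open, shm_unlink, mmap
    #include <unistd.h>     // ftruncate, close
    #include <cstdio>       // perror

    int main() {
        const char* name = "/demo_shm";
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        if (fd == -1) { std::perror("shm_open"); return 1; }
        if (ftruncate(fd, 4096) == -1) { std::perror("ftruncate"); return 1; }

        void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

        // Every process that shm_open()s the same name and mmap()s the
        // result sees the same 4 KiB of memory.
        static_cast<char*>(p)[0] = 42;

        munmap(p, 4096);
        close(fd);
        shm_unlink(name);   // remove the name once it's no longer needed
        return 0;
    }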

  • by Cyberax ( 705495 ) on Monday November 14, 2005 @08:18AM (#14025041)
    There is a great C++ library for shared memory support: SHMEM [prohosting.com]. It can place complex objects and STL-like containers in shared memory. And it is cross-platform (both POSIX and Windows are supported).

    And it will soon (hopefully) be a part of Boost [boost.org]!
    • by maxwell demon ( 590494 ) on Monday November 14, 2005 @08:40AM (#14025106) Journal
      It can place complex objects and STL-like containers in shared memory.

      Depends on your definition of "complex objects".

      From the documentation:

      Virtuality forbidden

      This is not a problem specific to Shmem; it is a problem for all mechanisms that place objects in shared memory. The virtual table pointer and the virtual table itself are in the address space of the process that constructs the object, so if we place a class with virtual functions or inheritance in shared memory, the virtual table pointer stored there will be invalid for other processes.


      Basically, I would have been surprised if they had found a solution for that. I guess it cannot be solved portably; the runtime system would have to be prepared for it. I could imagine that objects whose code lives in a shared library (so the same code is guaranteed to be visible to both processes) could be placed in shared memory, if the compiler/runtime provided the means for it: say, instead of a pointer to the vtable, the object would store an offset into the constant data section of the shared library, plus something to identify the library with, such as a system-wide unique active-library index generated by the dynamic linker.
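
      The usual workaround, for what it's worth, is to keep virtual dispatch out of the shared objects entirely; a sketch (the shape types here are invented for illustration):

      enum class ShapeKind { Circle, Square };

      struct SharedShape {      // safe to place in shared memory:
          ShapeKind kind;       // a plain tag instead of a vptr
          double size;
      };

      // Each process dispatches manually on the tag; nothing in the
      // shared object refers to any process-local address.
      double area(const SharedShape& s) {
          switch (s.kind) {
              case ShapeKind::Circle: return 3.14159265 * s.size * s.size;
              case ShapeKind::Square: return s.size * s.size;
          }
          return 0.0;
      }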
      • Well, it's possible to use shmem as a very fast method for marshalling arguments across process boundaries and then use BIL (Boost Interfaces Library [cdiggins.com]) to marshal the actual function calls. It will look like the Local Procedure Call subsystem in Windows NT.

        You can get virtual functions this way and it will be fast enough but not very "nice", of course.
        • You could use XML to marshal the objects. It lends itself rather well (if you're careful with how you construct your schema) to dealing with some of the interesting challenges around virtual member functions and member data. It may be overkill in certain situations, but it's probably a simpler way to deal with marshalling than reworking a large object hierarchy so that it contains no virtual functions; i.e. don't reinvent the wheel.
  • const (Score:4, Funny)

    by hey ( 83763 ) on Monday November 14, 2005 @08:40AM (#14025109) Journal
    I suppose everything marked const could be shared.
    • The problem is that you can still change data in const objects. In some cases that's necessary - hence the "mutable" keyword.

      I once made a class that abstracted a file as an array of records. To be able to make a const version of it that could still read the backing file, I had to mark the read pointers into the file as "mutable". There is still changeable state; it's just more controlled.

      const is really not a very good guarantee that things won't be changed. The language lets you get around it in too many ways.
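
      A reconstruction of the kind of class being described (names invented; not the poster's actual code):

      #include <cstdio>

      class RecordFile {
      public:
          explicit RecordFile(const char* path)
              : file_(std::fopen(path, "rb")) {}
          ~RecordFile() { if (file_) std::fclose(file_); }

          // Logically read-only, hence const -- but it must move the
          // file's read position and remember it, hence "mutable".
          bool readRecord(long index, char* buf, long len) const {
              if (!file_) return false;
              if (std::fseek(file_, index * len, SEEK_SET) != 0) return false;
              lastRead_ = index;   // mutable state changes under const
              return std::fread(buf, 1, len, file_) == static_cast<std::size_t>(len);
          }

      private:
          std::FILE* file_;
          mutable long lastRead_ = -1;  // changeable even via const access
      };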
  • This is nothing new (Score:4, Interesting)

    by Anonymous Coward on Monday November 14, 2005 @08:46AM (#14025122)
    You've been able to do this for a while using process-shared mutexes and condition variables, which let you do the same things you could do with pthreads and shared memory. The tradeoff is that you get better performance by avoiding syscalls for IPC, but it's less robust. If a process segfaults, you have to assume that the shared memory is in an unknown state and either shut down or restart everything. The other processes can (or will be able to) detect this once robust futex support is in Linux. Idiot programmers will of course ignore this and continue to use the corrupted memory anyway, just like they do now with SysV semaphores used as mutexes with the SEM_UNDO option to auto-reset the semaphore if a process exits without resetting it.

    Anyway, old stuff. Wake me up when you start talking about the newer tricks with shared memory.
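
    With modern glibc, the detection described above is exposed as robust, process-shared mutexes; a sketch, assuming mu points into a mapping visible to every participant:

    #include <pthread.h>
    #include <cerrno>    // EOWNERDEAD
    #include <cstdio>

    void init_shared_mutex(pthread_mutex_t* mu) {
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(mu, &a);
        pthread_mutexattr_destroy(&a);
    }

    bool lock_shared_mutex(pthread_mutex_t* mu) {
        int rc = pthread_mutex_lock(mu);
        if (rc == EOWNERDEAD) {
            // A previous owner died holding the lock, so the protected
            // data may be inconsistent. Repair it (or give up), then
            // mark the mutex usable again.
            std::fprintf(stderr, "owner died; recovering shared state\n");
            pthread_mutex_consistent(mu);
            return true;
        }
        return rc == 0;
    }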

  • CML (Score:4, Informative)

    by putko ( 753330 ) on Monday November 14, 2005 @08:49AM (#14025136) Homepage Journal
    For concurrent applications, it is hard to beat Reppy's CML.
    http://portal.acm.org/citation.cfm?id=113470 [acm.org]

    In particular, the things you synchronize on are first-class. Also you can speculatively send/receive things. Normal "select" is only for reading. You don't have to manage your memory either.

    There are other concurrent languages, but CML is nice in that it has a formal semantics, so unlike typical languages like "C", "C++", Erlang or Java, a program has a meaning other than "whatever the program does when I run it."

    You can implement the primitives of CML in your favorite higher-order language, so you don't have to be limited by ML. That's what's in Reppy's book.

    A proper implementation can achieve speeds that are about 30x faster than pthreads for typical tests like "ping/pong".

  • by VernonNemitz ( 581327 ) on Monday November 14, 2005 @08:51AM (#14025141) Journal
    Quite a few years ago, there was a brief period of popularity for dual-ported VRAM (video RAM), whose memory cells were specifically designed with one input line and TWO output lines. The idea was that the part of the hardware that scans the image out to the screen only needs to read memory, while the system responsible for creating the image needs both read and write access. Ever since then, I've wondered why this kind of memory isn't used in multiprocessor systems for communication between processors: Processor A would have read/write access to one block of VRAM, to pass info to Processor B (which has read-only access), while Processor B would have read/write access to a different block, to pass info to Processor A (which has read-only access).
  • Doors (Score:5, Interesting)

    by Anonymous Coward on Monday November 14, 2005 @09:06AM (#14025201)
    I'm surprised no one has mentioned Solaris Doors. Doors is an IPC mechanism whereby the first process (the client) can hand off any residual time in its timeslice to the second process (the server), so that short IPC calls run in much less time: no timeslice time is discarded, and there is no wait for the server process to be scheduled (since it uses the client's timeslice).
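
    From memory, the API looks roughly like this (a sketch only; see door_create(3C) and door_call(3C) for the real details, and note that the path passed to fattach() must name an existing file):

    #include <door.h>      // Solaris only
    #include <stropts.h>   // fattach

    // Server procedure: runs in the server process but on the client's
    // timeslice, and door_return() transfers control straight back.
    static void serv(void*, char* argp, size_t arg_size, door_desc_t*, uint_t) {
        char reply[] = "pong";
        door_return(reply, sizeof reply, nullptr, 0);
    }

    int server_setup(const char* path) {
        int d = door_create(serv, nullptr, 0);
        fattach(d, path);     // publish the door in the filesystem
        return d;             // then keep the process alive (e.g. pause())
    }

    int client_call(int d) {  // d: the door fd, e.g. from open(path)
        char buf[64];
        door_arg_t a{};
        char msg[] = "ping";
        a.data_ptr = msg;  a.data_size = sizeof msg;
        a.rbuf = buf;      a.rsize = sizeof buf;
        return door_call(d, &a);   // control passes directly to serv()
    }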

    • Re:Doors (Score:3, Interesting)

      by Foolhardy ( 664051 )
      That sounds much like NT's event pairs, used to implement Quick LPC. An event pair consists of a high and a low event. The server thread waits on the high event and the client thread waits on the low event. Only one event can be signalled at a time, and two software interrupts are provided to toggle the event pair's state: interrupt 0x2C calls KiSetLowWaitHighThread() and interrupt 0x2B calls KiSetHighWaitLowThread().
  • by Tzinger ( 550448 )
    Too many people here are willing to make inane useless comments about honest work efforts. If you have a better way, offer it. If you merely want to say something nasty about someone else's work, save it for the coffee house.
  • Unix Domain Sockets use shared memory to transfer data between applications. How does this compare to other shared memory methods in performance?

  • And? (Score:4, Interesting)

    by ratboy666 ( 104074 ) <<moc.liamtoh> <ta> <legiew_derf>> on Monday November 14, 2005 @10:10AM (#14025528) Journal
    Ok, I get it... it's an attempt to exploit shared memory in C++.

    And why is this news? Is it so difficult that nobody has done it? No, that can't be - the shm stuff can be wrapped. Is it so important that it rates a "design pattern"? Not that either - the one illustrated isn't the best solution.

    So, just what is this article? Methinks fluff. Sort of in line with the "How to implement co-routines with setjmp/longjmp" genre. Or "Restructuring data to assist processor cache residency". And "How to remove locks from performance-critical MP code".

    Except not as interesting or useful.

    Ratboy.
  • by photon317 ( 208409 ) on Monday November 14, 2005 @10:32AM (#14025639)

    A lot of shared memory synchronization and/or caching problems can be solved on Linux through the effective use of a few simple things:

    1) shm_open (for separately-started processes that need to coordinate through shared memory), or mmap(MAP_SHARED|MAP_ANONYMOUS) for a process that will fork children which need to communicate/share between themselves and the parent.

    2) Use <asm/atomic.h>'s "atomic_t" integer type within that shared memory region (atomic_t* my_shm_array = mmap(....)). The atomic_t type has several functions defined in that header for atomic read, write, increment, etc. for the Linux hardware platform at hand. On most sane (cache-coherent) SMP architectures, aligned reads and writes are already atomic, so this basically devolves to setting and getting integers like normal, with a little syntactic sugar (struct { volatile int val; }) to make sure the C compiler doesn't optimize away accesses that it shouldn't. You can implement a whole lot of sane algorithms using nothing but shared-memory integer reads and writes, with no locking or special atomic-increment ops (see the sketch after this list).

    3) If you need more advanced or complex locking on the shared memory for synchronization, use Linux futexes. They're in the man pages, and they're really fast.
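
    A sketch of points 1 and 2, using std::atomic as a portable stand-in for the kernel's atomic_t (std::atomic<int> is lock-free on mainstream Linux targets, so it works across processes):

    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <atomic>
    #include <new>      // placement new
    #include <cstdio>

    int main() {
        // 1) An anonymous shared mapping, inherited across fork().
        void* p = mmap(nullptr, sizeof(std::atomic<int>),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }

        // 2) An atomic counter constructed inside the shared mapping.
        auto* ctr = new (p) std::atomic<int>(0);

        if (fork() == 0) {                       // child increments...
            for (int i = 0; i < 100000; ++i) ctr->fetch_add(1);
            _exit(0);
        }
        for (int i = 0; i < 100000; ++i) ctr->fetch_add(1);   // ...parent too

        wait(nullptr);
        std::printf("count = %d\n", ctr->load());   // 200000, no locks
        return 0;
    }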

    • 1) shm_open(2) was already mentioned in the second post.

      2) Don't you know that NPTL already does this for you? On the fast path, NPTL's POSIX mutex just does atomic operations and avoids the syscall. Stick to the standard API and let the platform guys (libc, kernel, ...) do the optimization. They're smarter than you.

      3) You don't want to do this, seriously! If futexes were really consumable by the general public, why did the glibc guy have to write such a long paper describing how to use them?
       
  • The mutex doesn't seem to be shared between processes, which would make the code incorrect. Can anyone confirm this?
  • not really useful (Score:2, Informative)

    by vtoroman ( 930911 )
    The code shown uses a pthread mutex for synchronization. As initialized (without the PTHREAD_PROCESS_SHARED attribute), such a mutex only synchronizes threads within one process, so the code is useless (even dangerous) for interprocess communication (IPC). In the case of threads, another question is just screaming for an answer:
    Why would someone use a shared memory block for threads, which all run in the same address space anyway?

    We come to the conclusion that the code is quite useless for inter-thread communication too. All in all - useless.
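
    The one-attribute fix these two comments are pointing at, as a sketch (error handling omitted):

    #include <pthread.h>
    #include <sys/mman.h>

    pthread_mutex_t* make_ipc_mutex() {
        // The mutex itself must live in memory both processes can see...
        void* p = mmap(nullptr, sizeof(pthread_mutex_t),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        auto* mu = static_cast<pthread_mutex_t*>(p);

        // ...and must be marked process-shared, the attribute these
        // comments say the article's code never sets.
        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(mu, &a);
        pthread_mutexattr_destroy(&a);
        return mu;   // usable by this process and its fork()ed children
    }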
  • by Animats ( 122034 ) on Monday November 14, 2005 @02:57PM (#14028186) Homepage
    For historical reasons, most of the UNIX-like operating systems have terrible interprocess communication mechanisms. Early UNIX only had pipes. This started a tradition that interprocess communication works like I/O, leading to named pipes, sockets, and domain sockets. The result is a set of rather slow interprocess communication mechanisms. (One can do worse. In the old MacOS, interprocess communication could only pass one message per vertical refresh time, and this wasn't documented.)

    On top of those mechanisms, even slower interprocess communication systems are typically implemented, such as OpenRPC and CORBA. (For even more inefficiency, there's XPC. In Perl. But I digress.)

    Because of this history, there's a perception that interprocess communication has to be slow. It doesn't.

    What you really want looks more like what QNX [qnx.com] has - fast interprocess messaging that interacts properly with the scheduler. QNX has to have interprocess communication done right, because it does everything through it, including all I/O. This works out quite well. You take a performance hit (maybe 20% for this), but you get much of that back because the higher levels become more efficient when built on good IPC.

    The QNX messaging primitives are available for Linux [cogentrts.com], although the implementation isn't good enough for inclusion in the standard kernel. That work should be redone for the current kernel.

    IPC/scheduler interaction really matters. If you get it wrong, each interprocess transaction results in an extra pass through the scheduler, or worse, both the sending process and the receiving process lose their turn at the CPU. This is easy to test. Start up two processes that communicate using your IPC mechanism. Measure the performance. Then start up a compute-bound process and measure again. If the IPC rate drops by much more than a factor of 2, something is wrong. Don't be surprised if it drops by two orders of magnitude. That's an indication that IPC/scheduler interaction was botched.
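
    The described test, sketched over a pair of pipes (run it alone, then again with a busy loop such as "while :; do :; done" in the background, and compare the rates):

    #include <sys/wait.h>
    #include <unistd.h>
    #include <ctime>
    #include <cstdio>

    int main() {
        int ab[2], ba[2];                 // parent->child, child->parent
        pipe(ab); pipe(ba);
        char c = 'x';

        if (fork() == 0) {                // "pong" process: echo bytes back
            close(ab[1]); close(ba[0]);
            while (read(ab[0], &c, 1) == 1) write(ba[1], &c, 1);
            _exit(0);
        }
        close(ab[0]); close(ba[1]);

        const int N = 100000;
        timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; ++i) {     // "ping": one round trip per loop
            write(ab[1], &c, 1);
            read(ba[0], &c, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        std::printf("%.0f round trips/sec\n", N / s);

        close(ab[1]);                     // EOF lets the child exit
        wait(nullptr);
        return 0;
    }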

    Sun addressed this in the mid-1990s with their "Doors" interface in Solaris, which had roughly the right primitives. But that idea never caught on.

    The article here implements a message-passing system via shared memory, which is not exactly a new idea, even for UNIX. I think it first appeared in MERT [bell-labs.com], in the 1970s. It's an attempt to solve at the user level something that the OS should be doing for you.

    Shared memory is a hack. It's hard to make it work right. With it, one process can crash other processes in hard-to-debug ways. Sometimes you need it because you're moving vast amounts of data (by which I mean more than just a video stream), but that's rarely the case.
