High-Performance Programming Techniques on Linux 19

Posted by michael on Wednesday June 26, 2002 @07:27PM from the tools-of-the-trade dept.

Dejected @Work writes "A senior IBM developer has come out with a series of articles on high-performance Linux programming techniques using pipes, sockets, threads, and processes. The series has been running for a while and juxtaposes these high-performance techniques on Linux and Windows. Guess who wins?"

This discussion has been archived. No new comments can be posted.

Disclaimer (Score:1, Funny)

by Slashdot's Attorney ( 588436 ) writes:

Slashdot makes no guarantees as to the accuracy of the above information. All data provided are for entertainment purposes only, and any damage to hardware as a result of following the above advice is the sole responsibility of the user.
This doesn't seem too interesting. (Score:2, Informative)

by Lenolium ( 110977 ) writes:

Linux, always known for its fast context switches, is faster than Windows, which is pretty slow at context switches as far as OSes go, when running a minimal-load multi-threaded application. Well, it's not quite a shocker, but glad to know that we are still ahead.
"high performance"??? (Score:3, Insightful)

by dirtydamo ( 160364 ) writes: on Wednesday June 26, 2002 @09:30PM (#3775471)

That's the most ludicrous statement. How can one call a normal send/recv loop high performance socket code?

I, for one, remain totally unconvinced by this article (at least the guy who wrote it admits he doesn't know anything about Windows). How can one possibly compare "high performance" I/O on Windows without using overlapped I/O, and possibly even completion ports?
- Re:"high performance"??? (Score:2)
  
  by Paranoid ( 12863 ) writes:
  
  Likewise, how can one possibly compare "high performance" I/O on Linux without using O_NONBLOCK and SIGIO, and possibly even POSIX AIO? =)
  
  I believe the point was trying to compare apples to apples, which is why the same API was used (to the extent possible) on both sides of the pond.
  
  Perhaps the article had a misleading title. On the other hand, don't all benchmarks have the term 'high-performance' in them somewhere?
  
  I actually liked these articles (even though I saw at least one of them before, here [slashdot.org]) - it seemed a good test of basic functionality, and as you rightly pointed out, the API they used really is basic. It did a far better job of comparing apples with apples than most comparisons, rather than shooting for some abstract (and uncomparable) "high-level" API, without even indicating how much of a benefit such an interface has over the base-level.
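A minimal sketch of the non-blocking style mentioned above, written in Python rather than C for brevity. `os.set_blocking(fd, False)` sets `O_NONBLOCK` on the descriptor; `make_nonblocking_pipe` and `try_read` are hypothetical helper names, and SIGIO and POSIX AIO are not shown.

```python
import os


def make_nonblocking_pipe():
    """Create a pipe whose read end has O_NONBLOCK set."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # equivalent to setting O_NONBLOCK via fcntl
    return read_fd, write_fd


def try_read(fd, max_bytes=4096):
    """Return the available bytes, b"" at end of file, or None if nothing is ready."""
    try:
        return os.read(fd, max_bytes)
    except BlockingIOError:
        # A blocking descriptor would have stalled here; a non-blocking one
        # reports EAGAIN, which Python surfaces as BlockingIOError.
        return None


if __name__ == "__main__":
    r, w = make_nonblocking_pipe()
    print(try_read(r))   # nothing written yet
    os.write(w, b"ping")
    print(try_read(r))
    os.close(w)
    os.close(r)
```

The caller decides what to do when `try_read` returns `None`, which is the whole point of the technique: the thread is free to do other work instead of sleeping inside `read`.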
Why I don't do Windows (Score:2)

by PD ( 9577 ) writes:

From the article about pipes:

The number 24 in the first executable line of code above was determined experimentally. I found no mention of it anywhere in the Platform SDK. If it is not present, the program doesn't work. Apparently, the pipe facility requires a 24-byte header on each write to the pipe.

If this were Linux, we'd be able to find out what those 24 bytes were.
Benchmark bullshit and no knowledge of Windows (Score:5, Informative)

by Twylite ( 234238 ) writes: <<az.oc.tpyrc> <ta> <etilywt>> on Thursday June 27, 2002 @03:07AM (#3777171) Homepage

This article is a typical case of benchmark bullshit. The author has taken a deliberately Unix-centric view of computing, and ignored design and implementation concepts that are normal for Windows-based systems.

In the synchronisation article (the /. poster missed the link for that one) only Mutexes, Semaphores and Critical Sections are evaluated. It is well known that mutex performance on Windows is poor compared to *nix, but that is mitigated by a number of benefits in the Windows threading model.

Here's a brief intro, to show why they CAN'T be compared:
Windows has processes and threads as first class citizens, and they have fair (multi-level round-robin) scheduling. Mutexes, semaphores and critical sections are the primary locks, but there are also atomic check-and-increment functions as well as events/signals (long lasting flags). Every object (mutex, sem, section, event, thread, process, file, socket, etc) in Windows can be waited on, and you can wait on any number and combination of objects at once, in either an AND or OR configuration. e.g. wait for a mutex AND an async socket IO; or wait for a semaphore OR a thread to end OR an event.
Linux's options are far more limited - to achieve the same results you have to use a different architecture (not that this is necessarily a bad thing); on the other hand Linux's primitives and context switching are faster than the Windows equivalents. Linux has kernel scheduled processes, userland threads (kernel threads are available), a fair but not deterministic scheduler, mutexes, semaphores and condition variables.
A condition variable is similar to an event, but is instantaneous - if no thread is waiting on the condition variable, nothing happens. An event stays set until it releases a thread (auto-reset events) or until explicitly reset (manual-reset events). A condition variable is one of the few time-waitable objects in Linux (all objects are time-waitable in Windows; mutexes and semaphores are not time-waitable in Linux).
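The difference between the two primitives can be sketched with Python's `threading` module, used here only as a stand-in: `threading.Condition` behaves like a pthreads condition variable, and `threading.Event` behaves roughly like a Windows manual-reset event. The function name `lost_notification_demo` is hypothetical.

```python
import threading


def lost_notification_demo(timeout_s=0.1):
    """Signal both primitives before anyone waits, then wait on each."""
    cond = threading.Condition()
    event = threading.Event()

    # Signal first, with no waiter present.
    with cond:
        cond.notify()   # no thread is waiting, so the notification is dropped
    event.set()         # the flag stays set until clear() is called

    # Now wait. The condition variable has forgotten the earlier signal.
    with cond:
        cond_woke = cond.wait(timeout=timeout_s)   # False: timed out
    event_woke = event.wait(timeout=timeout_s)     # True: flag still set

    return cond_woke, event_woke


if __name__ == "__main__":
    print(lost_notification_demo())
```

This is why code built on condition variables pairs them with a predicate checked under the lock, while event-based code can rely on the flag itself to carry state.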

The comparative power of the Windows threading and synchronisation model may not be obvious to long-time Unix programmers, but consider the wider range of architectural possibility when you can wait (with a timeout) on any combination of any objects in the system.

In the socket article, the author compares the BSD socket API on Linux with the WSASocket API on Windows, which is meant primarily for asynchronous operation. Despite claiming techniques for "high performance" sockets, he fails to mention /dev/poll, POSIX AIO, or Windows' IoCompletion Ports. POSIX AIO can be reasonably compared to Windows' async socket/file support, but it is impossible to make a valid comparison between /dev/poll (or kqueue, etc) and IoCompletion Ports because they require significantly different architectures to function at peak efficiency.

On to processes and threads. CreateProcess() has the combined functionality of fork() and exec(), so the article starts off on the wrong foot. It also supports security attributes, so the equivalent Linux example should have had a larger function starting with fork(), then dropping permissions in the child and exec()ing another binary.
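The larger Linux-side function described here might look roughly like the following sketch, in Python rather than C. `os.fork`, `os.setgid`, `os.setuid`, and `os.execvp` wrap the corresponding system calls; `spawn`, `drop_to_uid`, and `drop_to_gid` are hypothetical names, and a production version would also clear supplementary groups and close inherited descriptors.

```python
import os
import sys


def spawn(argv, drop_to_uid=None, drop_to_gid=None):
    """fork(), optionally drop privileges in the child, exec argv, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child process: never return into the caller's code.
        try:
            # Drop the group first; after setuid() we may lack the right to do so.
            if drop_to_gid is not None:
                os.setgid(drop_to_gid)
            if drop_to_uid is not None:
                os.setuid(drop_to_uid)
            os.execvp(argv[0], argv)  # replaces the process image on success
        finally:
            os._exit(127)             # reached only if a step above failed

    # Parent process: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)       # killed by a signal


if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exited with", code)
```

Dropping privileges only succeeds when the parent runs with sufficient rights, so the example call leaves both IDs unset.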

The author incorrectly asserts that Linux threads are scheduled by the CPU - he is using the pthreads library, which is userland threading. pthreads is also far from "fair"; Windows uses a multi-level round-robin algorithm, which makes thread scheduling very deterministic; pthreads is far more prone to thread starvation in a system where processing cascades between threads. e.g. an input thread, processing thread and output thread, which use mutex-protected queues to communicate; this is an excellent architecture for Windows, but performs poorly by comparison on *nix because a sudden heavy load will see the input thread scheduled more often than other threads, until its load dies down, at which point the processing thread will get the load, and so on - throughput stays much the same as a Windows system, but latency near-triples.
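For readers unfamiliar with the architecture in question, here is a minimal sketch of a three-stage pipeline joined by mutex-protected queues. Python's `queue.Queue` (a lock plus condition variables internally) stands in for hand-rolled queues; `run_pipeline`, `work`, and the `_DONE` sentinel are hypothetical names. It illustrates the structure only and makes no claim about the scheduling behaviour described above.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(items, work=lambda x: x * 2):
    """Pass items through input -> processing -> output threads, in order."""
    to_process = queue.Queue()
    to_output = queue.Queue()
    results = []

    def input_stage():
        for item in items:
            to_process.put(item)
        to_process.put(_DONE)

    def processing_stage():
        while True:
            item = to_process.get()      # blocks until the input stage delivers
            if item is _DONE:
                to_output.put(_DONE)     # forward the shutdown signal
                return
            to_output.put(work(item))

    def output_stage():
        while True:
            item = to_output.get()
            if item is _DONE:
                return
            results.append(item)

    stages = [threading.Thread(target=stage)
              for stage in (input_stage, processing_stage, output_stage)]
    for thread in stages:
        thread.start()
    for thread in stages:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

With one producer and one consumer per queue, ordering is preserved; latency depends on how promptly each downstream thread is scheduled after an item is queued, which is the property under dispute.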

Benchmarking thread creation is a load of crap. Few seriously high-performance servers use a thread-per-connection architecture anymore; and at the very least they use thread pools.

The entire article is unfair to both sides: on Windows, threads are first-class citizens; on Linux you are more likely to use multiple processes for stability and performance.

I've already covered everything necessary to dispute the bullshit in the Scheduling article.

Conclusion: this is an excellent case of "don't believe the FUD". You can't compare apples and apples when some of the apples are growing on an orange tree. The only way to achieve a meaningful comparison of these platforms is to construct applications with equivalent functions, but designed and implemented for the target platform.
- mod parent up.. (Score:2, Interesting)
  
  by johnfoobar ( 258419 ) writes:
  
  in fairness UNIX (or at least linux and the BSDs) are comparatively weak when it comes to multi-threading and lots of the slashdot zealots (sue me) could really benefit from actually sitting down with a copy of Inside Windows 2000 [sysinternals.com] rather than just mouthing off about microsoft being evil and windows being crap.
  multi-threading is why, for example aolserver [aolserver.com] can do with one process what apache needs a bunch of processes to do. (though i digress, aolserver only has to run tcl interps, where apache is much more versatile.)
  meanwhile, both FreeBSD and NetBSD are trying to get SMP and scheduler activations into their kernels. this would improve their support for multi-threading substantially. there's a paper [mit.edu] which explains this better than i ever could.
- Re:Benchmark bullshit and no knowledge of Windows (Score:5, Informative)
  
  by ianezz ( 31449 ) writes: on Thursday June 27, 2002 @05:18AM (#3777458) Homepage
  The author incorrectly asserts that Linux threads are scheduled by the CPU - he is using the pthreads library, which is userland threading.
  Uh?
  Last time I checked, "pthread" is just an API, and on Linux you have at least two implementations of that:
  
  linuxthreads [inria.fr] (kernel-based, uses the clone() system call, definitely scheduled by the kernel), which is the one shipped with GNU libc (the one normally used, and the one used by the author of the article, btw).
  
  GNU Pth [gnu.org] (completely userland).
  
  IBM is also working to implement an M:N threading implementation with a pthread API [ibm.com], partially kernel-based and partially in userland.
- Re:Benchmark bullshit and no knowledge of Windows (Score:2)
  
  by sigwinch ( 115375 ) writes:
  
  Every object (mutex, sem, section, event, thread, process, file, socket, etc) in Windows can be waited on, and you can wait on any number and combination of objects at once, in either an AND or OR configuration. e.g. wait for a mutex AND an async socket IO; or wait for a semaphore OR a thread to end OR an event.
  
  Not serial ports--they take a different API. (Last I heard, I may be misinformed.)
  
  Linux has kernel scheduled processes, userland threads (kernel threads are available)...
  
  As somebody else points out, Linux kernel threads do exist and are usually used. More importantly, the Linux kernel multiprogramming model makes no distinction between threads and processes. A thread is simply a process that shares memory with another process. Linux thread creation and switching are very fast; forking a new process is only a little more expensive than starting a new thread.
  
  A condition variable one of the few time-waitable objects in Linux (all objects are time-waitable in Windows; mutexes and semaphores are not time-waiting in Linux).
  
  However, when your pipes are fast you don't *need* a tasteless profusion of inter-context communication, and Linux pipes are time waitable (using the conventional I/O waiting API: select, poll, /dev/poll). The only thing Linux lacks is the ability to wait for several conditions to become true using a single system call. (You can do AND with blocking read(2), but you can't wait on anything else at the same time.)
  
  Benchmarking thread creation is a load of crap. Few seriously high-performance servers use a thread-per-connection architecture anymore; and at the very least they use thread pools.
  
  If your threads suck, you are constrained to use a thread pool. If your threads are good, you can use whatever is appropriate for the job. (There are many small jobs that thread-per-connection will handle just fine provided your OS isn't raping you for it.)
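The timed wait on pipes via the conventional I/O waiting API, mentioned earlier in this reply, can be sketched with Python's `select` module, which wraps select(2). `wait_readable` is a hypothetical helper name. Passing several descriptors gives an OR wait with a timeout.

```python
import os
import select


def wait_readable(fds, timeout_s):
    """Block until any descriptor in fds is readable or timeout_s seconds elapse.

    Returns the list of readable descriptors, which is empty on timeout.
    """
    ready, _, _ = select.select(fds, [], [], timeout_s)
    return ready


if __name__ == "__main__":
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()

    # Nothing written yet, so the wait times out.
    print(wait_readable([r1, r2], 0.1))

    # Data on the second pipe wakes the waiter and identifies the source.
    os.write(w2, b"x")
    ready = wait_readable([r1, r2], 1.0)
    print(ready == [r2])

    for fd in (r1, w1, r2, w2):
        os.close(fd)
```

Anything that can be expressed as a file descriptor can join the same wait, which is how pipe-based designs get timeouts without timed mutexes or semaphores.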
He doesn't know what he is doing. (Score:3, Informative)

by benhaha ( 456005 ) writes: on Thursday June 27, 2002 @07:22AM (#3777789)

The validity of the exercise is compromised by his assumption that multiple processes, as opposed to multiple threads, were the best choice for whatever his benchmark is supposed to model, and that if they are, RPC, COM or shared memory are not more appropriate to the IPC task. Windows has many ways of doing IPC and concurrent tasking, and most applications use other IPC methods than pipes. This failure of choice is an important reason why such like-for-like benchmarks are of little value.

In short, these "high-performance techniques" are high-performance on Linux only, the way he does it. On Windows, other methods, not available on Linux, are more commonly used.