High-Performance Programming Techniques on Linux
Dejected @Work writes "A senior IBM developer has come out with a series of articles on high performance Linux programming techniques using pipes, sockets, threads, and processes. The series has been running for a while and compares these high-performance techniques on Linux and Windows. Guess who wins?"
Disclaimer (Score:1, Funny)
This doesn't seem too interesting. (Score:2, Informative)
"high performance"??? (Score:3, Insightful)
I, for one, remain totally unconvinced by this article (at least the guy who wrote it admits he doesn't know anything about Windows). How can one possibly compare "high performance" I/O on Windows without using overlapped I/O, and possibly even completion ports?
Re:"high performance"??? (Score:2)
Likewise, how can one possibly compare "high performance" I/O on Linux without using O_NONBLOCK and SIGIO, and possibly even POSIX AIO? =)
I believe the point was trying to compare apples to apples, which is why the same API was used (to the extent possible) on both sides of the pond.
Perhaps the article had a misleading title. On the other hand, don't all benchmarks have the term 'high-performance' in them somewhere?
I actually liked these articles (even though I saw at least one of them before, here [slashdot.org]) - it seemed a good test of basic functionality, and as you rightly pointed out, the API they used really is basic. It did a far better job of comparing apples with apples than most comparisons, rather than shooting for some abstract (and uncomparable) "high-level" API, without even indicating how much of a benefit such an interface has over the base-level.
Why I don't do Windows (Score:2)
The number 24 in the first executable line of code above was determined experimentally. I found no mention of it anywhere in the Platform SDK. If it is not present, the program doesn't work. Apparently, the pipe facility requires a 24-byte header on each write to the pipe.
If this were Linux, we'd be able to know what that 24 bytes was.
Benchmark bullshit and no knowledge of Windows (Score:5, Informative)
This article is a typical case of benchmark bullshit. The author has taken a deliberately Unix-centric view of computing, and ignored design and implementation concepts that are normal for Windows-based systems.
In the synchronisation article (the /. poster missed the link for that one) only Mutexes, Semaphores and Critical Sections are evaluated. It is well known that mutex performance on Windows is poor compared to *nix, but that is mitigated by a number of benefits in the Windows threading model.
Here's a brief intro, to show why they CAN'T be compared:
Windows has processes and threads as first class citizens, and they have fair (multi-level round-robin) scheduling. Mutexes, semaphores and critical sections are the primary locks, but there are also atomic check-and-increment functions as well as events/signals (long lasting flags). Every object (mutex, sem, section, event, thread, process, file, socket, etc) in Windows can be waited on, and you can wait on any number and combination of objects at once, in either an AND or OR configuration. e.g. wait for a mutex AND an async socket IO; or wait for a semaphore OR a thread to end OR an event
Linux's options are far more limited - to achieve the same results you have to use a different architecture (not that this is necessarily a bad thing); on the other hand, Linux's primitives and context switching are faster than the Windows equivalents. Linux has kernel scheduled processes, userland threads (kernel threads are available), a fair but not deterministic scheduler, mutexes, semaphores and condition variables.
A condition variable is similar to an event, but is instantaneous - if no thread is waiting on the condition variable, nothing happens. An event stays set until it releases a thread (auto-reset events) or until explicitly reset (manual-reset events). A condition variable is one of the few time-waitable objects in Linux (all objects are time-waitable in Windows; mutexes and semaphores cannot be waited on with a timeout in Linux).
The comparative power of Windows' threading and synchronisation model may not be obvious to long-time Unix programmers, but consider the wider range of architectural possibilities when you can wait (with a timeout) on any combination of any objects in the system.
In the socket article, the author compares the BSD socket API on Linux with the WSASocket API on Windows, which is meant primarily for asynchronous operation. Despite claiming to present techniques for "high performance" sockets, he fails to mention /dev/poll, POSIX AIO, or Windows' IoCompletion ports. POSIX AIO can reasonably be compared to Windows' async socket/file support, but it is impossible to make a valid comparison between /dev/poll (or kqueue, etc.) and IoCompletion ports, because they require significantly different architectures to function at peak efficiency.
On to processes and threads. CreateProcess() has the combined functionality of fork() and exec(), so the article starts off on the wrong foot. It also supports security attributes, so the equivalent Linux example should have had a larger function starting with fork(), then dropping permissions in the child and exec()ing another binary.
The author incorrectly asserts that Linux threads are scheduled by the CPU - he is using the pthreads library, which is userland threading. pthreads is also far from "fair"; Windows uses a multi-level round-robin algorithm, which makes thread scheduling very deterministic, while pthreads is far more prone to thread starvation in a system where processing cascades between threads. Consider an input thread, a processing thread and an output thread, which use mutex-protected queues to communicate: this is an excellent architecture on Windows, but performs poorly by comparison on *nix, because a sudden heavy load will see the input thread scheduled more often than the other threads until its load dies down, at which point the processing thread gets the load, and so on. Throughput stays much the same as on a Windows system, but latency nearly triples.
Benchmarking thread creation is a load of crap. Few seriously high-performance servers use a thread-per-connection architecture anymore; and at the very least they use thread pools.
The entire article is unfair to both sides: on Windows, threads are first-class citizens; on Linux you are more likely to use multiple processes for stability and performance.
I've already covered everything necessary to dispute the bullshit in the Scheduling article.
Conclusion: this is an excellent case of "don't believe the FUD". You can't compare apples to apples when some of the apples are growing on an orange tree. The only way to achieve a meaningful comparison of these platforms is to construct applications with equivalent functions, but designed and implemented for the target platform.
mod parent up.. (Score:2, Interesting)
multi-threading is why, for example aolserver [aolserver.com] can do with one process what apache needs a bunch of processes to do. (though i digress, aolserver only has to run tcl interps, where apache is much more versatile.)
meanwhile, both FreeBSD and NetBSD are trying to get SMP and scheduler activations into their kernels. this would improve their support for multi-threading substantially. there's a paper [mit.edu] which explains this better than i ever could.
Re:Benchmark bullshit and no knowledge of Windows (Score:5, Informative)
Uh?
Last time I checked, "pthread" is just an API, and on Linux you have at least two implementations of it.
IBM is also working to implement a M:N threading implementation with a pthread API [ibm.com], partially kernel-based and partially in userland.
Re:Benchmark bullshit and no knowledge of Windows (Score:2)
He doesn't know what he is doing. (Score:3, Informative)
The validity of the exercise is compromised by his assumption that multiple processes, as opposed to multiple threads, were the best choice for whatever his benchmark is supposed to model - and, if they are, that RPC, COM or shared memory would not be more appropriate for the IPC task. Windows has many ways of doing IPC and concurrent tasking, and most applications use IPC methods other than pipes. This failure of choice is an important reason why such like-for-like benchmarks are of little value.
In short, these "high-performance techniques" are high-performance only on Linux, the way he does them. On Windows, other methods, not available on Linux, are more commonly used.