


Syscall Speed On Linux And Windows
1010011010 writes: "IBM has tested the syscall speed of Linux 2.2.16, 2.4.2 and Windows 2000. As it turns out, Linux is a little more than twice as fast. This may be interesting to people who have been reading the LKML recently, as a debate has been going on about syscall speed. Also, a method ("magic page") for further improving syscall speed is being developed by the kernel developers. The rate at which all aspects of Linux are improving -- kernel, GUIs, etc. -- is phenomenal. I think Linux is pretty cool now; I can't wait to see it in 18 months."
Maybe not that accurate.... (Score:1)
Oh, for Christ's sake! (Score:4)
Did you actually read the article? It does not by any means "test the syscall speed" of Linux vs. Windows! It introduces timing routines for Linux and Windows which will be used for future articles comparing various things between Linux and Windows. The point of the article is not to reveal that Windows QueryPerformanceCounter() takes 1.945 usec and is therefore less than half as fast as a Linux gettimeofday(), but rather to demonstrate that BOTH systems are capable of providing sub-2-microsecond timing resolution, and that therefore the benchmarks to be performed in future articles will be accurate!
Feel free to interpret this as "Linux r0x, Windoze suxx!!", but really, it's about as significant as saying "gettimeofday() is only 14 characters long, and only lower-case, and can therefore be typed faster than the Windows equivalent, QueryPerformanceCounter(), which is 25 characters and mixed-case! Therefore programming under Linux is quicker and easier!".
Anyway, both methods are a wank. They should just use some inline asm to query the performance counters directly. Same code for both OS then.. :-)
gettimeofday()? give me a break. (Score:3)
The best way to get performance data on Linux or Windows is via the Intel chip's time-stamp counter; here's some example gcc code to do it:
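static unsigned long long rdtsc(void) {
    unsigned int lo, hi;
    /* rdtsc leaves the 64-bit counter in edx:eax; naming both halves
       explicitly also works on x86-64, where "=A" does not mean edx:eax */
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}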
The previous method takes about 13 cycles on an Athlon 750. (DO NOT try and make it inline -- or gcc might optimize your to-be-timed code out from between the rdtsc() calls.) It is a straightforward matter to read the CPU clock speed from
As with any timing method, take care to execute it a few times before you gather any information, to prime the i-cache.
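A minimal sketch of that warm-up advice, for anyone who wants to try it. The names now_ns(), work() and time_min_ns() are made up for illustration, and clock_gettime() is used as a portable stand-in for the rdtsc() routine (an assumption, not what the post measures with). On x86 you could swap rdtsc() in to count cycles instead of nanoseconds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for clock_gettime() in strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic time in nanoseconds.
   Stand-in for rdtsc(); this is an assumption made for portability. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical workload: sums 0..999 and stores the result in a
   volatile so the compiler cannot drop the store. */
static volatile unsigned long sink;
static void work(void) {
    unsigned long s = 0;
    for (unsigned long i = 0; i < 1000; i++)
        s += i;
    sink = s;
}

/* Run fn `warmup` times untimed (to prime the caches), then return the
   fastest of `samples` timed runs. */
static uint64_t time_min_ns(void (*fn)(void), int warmup, int samples) {
    assert(samples > 0);
    for (int i = 0; i < warmup; i++)
        fn();
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < samples; i++) {
        uint64_t t0 = now_ns();
        fn();
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling time_min_ns(work, 3, 10) discards three runs and reports the quickest of the next ten. Taking the minimum rather than the mean is a design choice here: it filters out runs that were disturbed by interrupts or cache misses.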
Apparently the lameness filter believes that this is a "junk character post", so I'll type some more. Intel has a useful whitepaper that describes how to do this in an M$ compiler, available here: http://developer.intel.com/software/idap/resource
a serialized version (Score:1)
If you'd like to disable out-of-order execution for your timing code (maybe necessary on PPro and later processors -- I've found that it doesn't make a lot of difference for most real benchmarking tasks), add a cpuid instruction before the rdtsc. Note that the cpuid instruction will clobber eax, ebx, ecx, and edx (you can give these registers to the GCC __asm__ directive).
The CPUID instruction forces all instructions in the pipeline to complete. Using the serialized rdtsc takes about 40 cycles on an Athlon 750.
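Roughly what the serialized version might look like in gcc, as a sketch rather than the exact code that was timed above. The function names are illustrative, and the non-x86 branch is purely an assumption so the file still compiles elsewhere (it returns nanoseconds, not cycles). eax and edx are outputs; ebx and ecx go in the clobber list as described.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
/* cpuid (leaf 0) serializes execution, then rdtsc reads the counter
   into edx:eax. cpuid also overwrites ebx and ecx, hence the clobbers. */
static unsigned long long rdtsc_serialized(void) {
    unsigned int lo, hi;
    __asm__ __volatile__ (
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        : "a"(0)
        : "ebx", "ecx", "memory");
    return ((unsigned long long)hi << 32) | lo;
}
#else
#include <time.h>
/* Assumption: fallback for non-x86 builds. Returns nanoseconds, not cycles. */
static unsigned long long rdtsc_serialized(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ull
         + (unsigned long long)ts.tv_nsec;
}
#endif

/* Hypothetical helper: ticks consumed by one serialized read,
   measured with two back-to-back reads. */
static unsigned long long serialized_read_cost(void) {
    unsigned long long start = rdtsc_serialized();
    unsigned long long end = rdtsc_serialized();
    assert(end >= start);
    return end - start;
}
```

Subtracting serialized_read_cost() from a measured interval gives a rough correction for the timer's own overhead. The result varies from run to run, so call it a few times and keep the smallest value.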
OT: I really had to wrestle with the lameness filter to get this through -- even one line with the inline asm declaration was a "junk character post". Perhaps the lame"ness" filter should recognize that "C/asm code" is different from "ASCII goatse"...
What this is actually testing... (Score:1)
Of course, since the purpose of the time-timers.cpp [ibm.com] program is to time the timing routine, we want to time the actual overhead, including the user/kernel space copy (and the overhead of the function call),
I'm not sure why Linux is so much faster on the gettimeofday() call. I'm guessing it perhaps can retrieve the time directly into the final buffer? Or perhaps it has a more efficient way to copyout the data?
Then again, maybe NetBSD uses a different way to get the time. When gettimeofday() is called, NetBSD does a few I/O accesses to the timer chip (Intel 8253) and returns the result of that. What does Linux do? (I don't have a copy of the Linux source handy.)
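For reference, timing the timer can be sketched in a few lines of C. This is not the time-timers.cpp program itself, just a guess at the general shape, and gettimeofday_cost_usec() is a made-up name. It calls gettimeofday() back to back and averages, so the figure includes the function call overhead and whatever copying the kernel or C library does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Average cost of one gettimeofday() call, in microseconds, taken over
   `calls` back-to-back calls. The two bracketing calls add a small
   error that shrinks as `calls` grows. */
static double gettimeofday_cost_usec(long calls) {
    struct timeval start, end, scratch;
    assert(calls > 0);
    gettimeofday(&start, NULL);
    for (long i = 0; i < calls; i++)
        gettimeofday(&scratch, NULL);
    gettimeofday(&end, NULL);
    double elapsed_usec = (double)(end.tv_sec - start.tv_sec) * 1e6
                        + (double)(end.tv_usec - start.tv_usec);
    return elapsed_usec / (double)calls;
}
```

Something like gettimeofday_cost_usec(100000) gives a per-call figure comparable in kind to the numbers quoted from the article. How much of that is kernel entry and how much is library code depends on the system, so treat the result as the cost of the call as seen from user space.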
eric