Robert Love Explains Variable HZ
An anonymous reader writes "Robert Love, author of the kernel preemption patch for Linux, has backported a new performance-boosting patch from the 2.5 development kernel to the 2.4 stable kernel. This patch allows one to tune the frequency of the timer interrupt, defined in 2.4 as "HZ=100". Robert explains 'The timer interrupt is at the heart of the system. Everything lives and dies based on it. Its period is basically the granularity of the system: timers hit on 10ms intervals, timeslices come due at 10ms intervals, etc.' The 2.5 kernel has bumped the HZ value up to 1000, boosting performance."
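For readers who want to see the granularity for themselves, here is a minimal userland sketch (my own, not from the patch): it asks for a 1 ms sleep and measures how long the sleep actually took. On a stock HZ=100 kernel the answer tends to come back in the 10-20 ms range, because sleeps are rounded up to whole timer ticks; on a HZ=1000 kernel it lands much closer to the request. Only standard POSIX calls (nanosleep, gettimeofday) are assumed.

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

/* Ask for a 1 ms sleep and report how long we actually slept.
   With HZ=100 the kernel can only wake us on a 10 ms tick boundary,
   so the measured time is typically 10-20 ms. */
int main(void)
{
    struct timespec req = { 0, 1000000 };   /* 1 ms */
    struct timeval before, after;

    gettimeofday(&before, NULL);
    nanosleep(&req, NULL);
    gettimeofday(&after, NULL);

    long usec = (after.tv_sec - before.tv_sec) * 1000000L
              + (after.tv_usec - before.tv_usec);
    printf("asked for 1000 usec, slept for %ld usec\n", usec);
    return 0;
}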
In FreeBSD (Score:3, Interesting)
Finally... (Score:5, Funny)
Re:Finally... (Score:2)
Re:Finally... (Score:1)
The article is not available online unfortunately, but some of the amused reactions of their readers are here [heise.de] (in German), and you can even find a picture [heise.de] of the gizmo (note the photoshopped activity LED).
It doesn't improve performance. (Score:5, Informative)
To make a long story short, for number crunching machines, servers, and other applications which don't need much user interaction, larger timeslices are preferable because it doesn't matter how responsive the user interface is. For desktop systems, the timeslice can be decreased to improve the responsiveness of the user interface and give a better "feel" to the system at the expense of a minor performance loss. Being able to tune these parameters to meet your needs is one of Linux's great strengths.
Re:It doesn't improve performance. (Score:5, Informative)
This is not quite true. If you only have a single program running just one thread, it is true. You have to do a context switch at each tick to Ring 0 and back, which takes maybe 500 cycles, or half a microsecond on a 1GHz machine. Do this 1000 times a second and you've lost 500 microseconds of processing time.
BUT once you have more than one program or thread running the situation is different. Say you have one thread running flat out and another that needs to do 100 microseconds of work. With 100 ticks per second you will lose 50 usec to context switching and up to 9900 usec waiting for the next context switch. With 1000 ticks per second you lose 500 usec to context switching but only up to 900 usec waiting for the next context switch. So you get more work done.
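To put rough numbers on that trade-off, here is a toy calculation (my own, using the half-microsecond-per-tick cost assumed above) comparing pure tick overhead against the worst-case wait for the next tick:

#include <stdio.h>

/* Toy numbers: assume each timer tick costs ~0.5 usec of kernel time.
   Compare total tick overhead per second with the worst-case wait
   for the next tick at a few HZ values. */
int main(void)
{
    const double tick_cost_us = 0.5;
    const int hz_values[] = { 100, 1000, 10000 };

    for (int i = 0; i < 3; i++) {
        int hz = hz_values[i];
        double overhead_us = hz * tick_cost_us;     /* lost to ticks, per second */
        double worst_wait_us = 1000000.0 / hz;      /* longest wait for next tick */
        printf("HZ=%-5d overhead %6.0f usec/sec (%.3f%%), worst-case wait %6.0f usec\n",
               hz, overhead_us, overhead_us / 10000.0, worst_wait_us);
    }
    return 0;
}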
For someone who always runs at 100% processor utilization, 1000 ticks per second is probably a fine setting, since you are probably just running one thread 99% of the time and only once in a while writing logs to disk or responding to some other event. If you are more like me and run at 1% processor utilization most of the time, with 100% utilization only happening when you compile, then you would rather be able to keep using the computer than save 1ms on a 5 minute compile, and an even higher value might make sense. 10000 maybe, assuming there aren't limitations in the kernel that prevent the higher value.
Disclaimer: I've been applying Love's patches for a while now. They make a real difference in the responsiveness of X, especially if you're running stuff like Mozilla or Gnome/KDE on your box. I haven't applied it on any servers cuz the preempt patch is not quite stable.
Re:It doesn't improve performance. (Score:2, Redundant)
I disagree with your analysis.
If a process isn't doing processing, that's because it's blocked in the kernel. (Q: What does a HLT do in userland?) As soon as the kernel puts a process on a wait queue, it reschedules. So you don't have any loss 'waiting for the next context switch'; that's just time that another process is running, or if nothing has anything to do, that the kernel halts the processor.
Note: I haven't studied how process scheduling is handled under Linux, but I can't imagine any OS that wouldn't do what I said here... or at least, I can't imagine one that would halt the processor after a process blocks, while it waits for a timer interrupt to schedule the next process.
Okay, maybe one.
Re:It doesn't improve performance. (Score:3, Interesting)
Okay, after re-reading the article, I did see one performance gain this could get: the case of select/poll. (This is blatantly stated in the article; I shot my mouth off before reading the article closely enough.)
Under BSD, as I understand it (I don't have the Daemon Book handy, but a quick reading of the source seems to agree), select will put a process on the wait queue until something arrives. During a select, the kernel does nothing with the process-- timer or not.
From the look of the article, under Linux, select actually does some sort of polling at or related to HZ. It may be on some sort of almost-run queue: a selecting process gets allocated timeslices; on its slice, it polls and either returns to userland or goes back onto the almost-run queue. I don't have time to verify that-- I don't know my way around the Linux kernel-- but it seems to be reasonable, based on the article. Can I get a Linux developer to confirm/deny my guess?
So it seems that in the case of something selecting, primarily on an otherwise idle or near-idle system, increasing HZ may improve performance. This situation is less common than it used to be in today's world of multithreaded servers (since each thread typically blocks only on a single fd), but it's still potentially significant.
Re:It doesn't improve performance. (Score:4, Informative)
Deny. It's actually the idle timeout that's affected by HZ. select() itself doesn't poll at all, and e.g. a select() call with an infinite timeout will be completely unaffected by HZ (select will wake up when the network gets an interrupt resulting in readable data/writeable buffer space).
Example of the timeout effect: a game could have a select() loop that waits on user input, but also has a timeout argument so that it can go ahead and update the screen, do enemy AI, etc. The kernel, in the absence of interrupts, schedules on HZ boundaries. Suppose that you as a programmer put a 1/60 second timeout argument in the select loop (intending to update the screen with a 60 Hz refresh and figure out where everything's moving). If you call select() right after a HZ boundary, you could find yourself waiting until 1/50 second passes even on an idle machine with HZ=100; after 1/100 sec, your timeout hasn't expired yet. The next chance to schedule is at 2/100 (1/50) sec.
With HZ=1000, you'll schedule no more than 1/1000 sec after the 1/60 sec boundary (on an idle machine).
This example is really simplified; a real-life app would adjust for scheduling creep by keeping track of wall-time. But the same concept, with more complicated apps, can cause faster HZ ticks to give you better CPU utilization (especially in e.g. video editing apps and such) because you get around to using the CPU closer to when you want it.
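A minimal sketch of that kind of loop, assuming nothing beyond standard select() (the 1/60 second figure is from the example above): on a HZ=100 kernel the timeout is rounded up to whole jiffies, so on an idle box the loop runs closer to 50 Hz than the 60 Hz it asks for.

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to ~1/60 s for input on stdin, then do a frame's worth of work.
   With HZ=100 the kernel can only wake us on a tick boundary, so the
   effective frame rate drifts toward 50 Hz on an idle machine. */
int main(void)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv = { 0, 16667 };    /* ~1/60 second */

        FD_ZERO(&rfds);
        FD_SET(STDIN_FILENO, &rfds);

        if (select(STDIN_FILENO + 1, &rfds, NULL, NULL, &tv) > 0) {
            /* handle user input here */
        }
        /* update the screen, run enemy AI, etc. */
    }
    return 0;
}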
The preempt kernel is an even better example of where decreasing latency can increase throughput, sometimes significantly. There you can really get around to dealing with I/O quickly, keeping CPU saturated (and saturated with cache-warm data) and benefiting things like heavily loaded web servers just as much as sound editing stations.
Sumner
Re:It doesn't improve performance. (Score:2)
Okay, that makes much more sense, thank you!
I don't yet see why the timeout of select would be such a big deal (outside of a few specific cases), but I'll have to think about it, and your examples, more carefully.
FWIW: BSD has the same select setup as you described.
So here's my thought: how expensive is it to reprogram the timer chip? Would it be possible to adjust it dynamically to create perfect granularity in sleep/select?
Re:It doesn't improve performance. (Score:2)
Yeah, pretty much every Unix has interrupt-driven returns for the non-timeout case, anything else would be pretty bogus--though some systems (e.g. Linux 2.5.x) do interrupt mitigation under high load, but that's more of an "above and beyond" thing. The timeout case is handled differently on several Unices.
So here's my thought: how expensive is it to reprogram the timer chip? Would it be possible to adjust it dynamically to create perfect granularity in sleep/select?
There is a tickless Linux implementation.
I can't find the home page at the moment, but see e.g.
http://www.uwsg.iu.edu/hypermail/linux/kernel/0
There are a lot of other ways of dealing with this, and tickless has some negative attributes I don't fully understand (among them is that it's not portable to older hardware, and there is some overhead to programming timer interrupts). I think the nanosecond kernel patches (which are starting to go into 2.5) address the select/sleep granularity issue in a different way but I'm really fuzzy on the details.
Sumner
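On the cost question above: reprogramming the timer chip itself is cheap on x86. The i8254 PIT only needs a mode byte and a 16-bit divisor written to its I/O ports, which is essentially what the kernel's own boot-time timer setup does. A rough kernel-context sketch (the port numbers and ~1193182 Hz input clock are the standard PIT values; treat this as pseudocode, not a drop-in patch):

#include <asm/io.h>   /* outb(), kernel context */

#define PIT_CLOCK 1193182                      /* PIT input clock, in Hz */
#define NEW_HZ    1000
#define LATCH     ((PIT_CLOCK + NEW_HZ / 2) / NEW_HZ)

/* Program channel 0 of the 8254 as a rate generator firing NEW_HZ
   times per second: one command byte, then the divisor lo/hi bytes. */
static void program_pit(void)
{
    outb(0x34, 0x43);          /* channel 0, lobyte/hibyte, mode 2 */
    outb(LATCH & 0xff, 0x40);  /* divisor, low byte */
    outb(LATCH >> 8, 0x40);    /* divisor, high byte */
}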
Re:It doesn't improve performance. (Score:2)
Sumner
Re:It doesn't improve performance. (Score:1)
Terrific, thanks! The IBM project it discusses sounds a lot like my half-verbalized idea. I'll have to delve deeper into this idea, and what they've done so far.
Re:It doesn't improve performance. (Score:2)
Also, after further investigation the Anzinger solution is _not_ in 2.5.x yet; Linus has looked at the patch, asked for clarification, and Anzinger recently replied with an updated patch. Search linux-kernel archives for "high-res-timer" or "POSIX timer" patches for more info.
Sumner
Re:It doesn't improve performance. (Score:2)
Like he knows that he needs to poll to get decent response times when there are no interrupts to wake the process up a quarter timeslice from now.
There are examples that are silly, like the lvcool user space idle loop I run on my AMD laptop cuz the kernel doesn't halt the processor in low power mode. I need to kill it to play a DVD, but if I don't run lvcool the fan runs constantly and the CPU still gets very hot. This should be in the kernel (and I've read it is in 2.5 now, as part of ACPI).
Then there are examples that aren't going to change unless the time slices get much smaller: busy waits do exist even in some well written code. And sometimes a thread will spin until another thread gets control and releases some resource.
Re:It doesn't improve performance. (Score:2, Informative)
A reschedule does not happen only on the timer tick (100 or 1000 times a second depending on the HZ setting); it happens on a number of occasions, the timer tick being one of them. The other ones remove the concerns zenyu seems to be having: first, whenever the running process blocks (sleeps, or waits for I/O), the kernel reschedules immediately; second, whenever an interrupt makes a waiting process runnable, it can be scheduled right away.
The second point may seem a little weird, but a process can only become willing to do something as a result of some interrupt - a timer interrupt if the process was sleeping for a given amount of time, an I/O interrupt if the process handles the keyboard or the mouse. In any case, interrupts are handled by the kernel, so if a process is to wake up from its sleep, or if a process gets something on a stream it is waiting on (stdin on a keyboard interrupt, a socket on a network card interrupt, etc.), that process is simply scheduled to wake up and do its work.
So on an idle machine the HZ does not really have much impact, and on a utilized machine the smoothness of process interaction (like window manager vs. X server) increases with increased HZ but this also increases the overhead.
Hope it's clearer.
Re:It doesn't improve performance. (Score:2)
Sure, there are some poorly-written apps that do excessive busy-waiting, but they are the exception, and there's not much the OS can do about it anyway.
The only benefit of increasing HZ is latency.
<RANT>
BTW, I'd just like to mention a pet peeve of mine. In the article, they mention that "RedHat shipped their 8.0 kernel at HZ=512". There is no reason whatsoever that this should be a power of two, so I believe it should not be. Powers of two have a magical status in the computer world, but I think you should not give your code this kind of connotation unless you have actually decided that a power of two is the best choice. Otherwise, you should pick a number that reflects the ad-hoc nature of your choice. Powers of ten reflect this better than powers of two. Thus, all else being equal, they should have chosen 500 over 512.
</RANT>
because 1/50th second minimum select timeout sucks (Score:1)
Presumably you meant "The only benefit of increasing HZ is decreasing latency" which is not a bad thing unto itself. Most people run interactive desktop applications, not scientific number crunching jobs for days at a time.
Having a minimum granularity of 1/50th of a second for a select() when HZ=100 really sucks, quite frankly.
Music players and animation programs have to resort to busy wait loops to get good response and tie up all CPU in the process. This is completely unnecessary in a modern OS.
It's 1/50th, not 1/100th, of a second with HZ=100 because of the way POSIX defines select(): you have to wait for two jiffies at a minimum, according to Linus [iu.edu].
Anyway, HZ > 500 sure as hell is better than HZ=100.
A HZ-less kernel with on-demand timer scheduling would be much better, though. IBM has such a kernel patch for their mainframe version of Linux to improve responsiveness when hundreds of Linux VMs are running concurrently.
Pity about the USER_HZ = 100 thing to accommodate all the borken programs that pick up HZ from the linux kernel header file and assume it is a) constant, or worse yet b) 100.
Had HZ been a proper syscall instead of a #define in the first place for user-land programs, this would not have been a problem today.
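For what it's worth, userland has long had a reasonable substitute for such a syscall: sysconf(_SC_CLK_TCK) reports the tick unit the kernel exposes to user space (what 2.5 calls USER_HZ), with no kernel headers involved. A minimal example:

#include <stdio.h>
#include <unistd.h>

/* Ask libc for the clock-tick unit used in kernel/user interfaces
   (USER_HZ), rather than hardcoding 100 or reading kernel headers. */
int main(void)
{
    long ticks = sysconf(_SC_CLK_TCK);
    printf("clock ticks per second (USER_HZ): %ld\n", ticks);
    return 0;
}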
Can someone do me a big favor and post RedHat 8.0's asm-i386/param.h file so I can see how they defined HZ, USER_HZ and friends? I'd like to see it without actually going to the trouble of installing RedHat 8.0.
Wrong! Re:It doesn't improve performance. (Score:2)
Yes. Say you have one thread running flat out and another that needs to do 100 microseconds of work. With 100 ticks per second you will lose 50 usec to context switching and up to 9900 usec waiting for the next context switch.
No! The task does 100 microseconds of work and then calls the sleep command, or does I/O or whatever. This ultimately goes through the kernel and the kernel does an early context switch. It certainly doesn't waste the rest of the timeslice.
Incidentally, the overhead of doing the context switch is much bigger than you say here- one of the things that the kernel has to do is flush the caches as it swaps the virtual memory in and out- that will slow the system for tens of thousands of instructions afterwards.
Anyway, you're wrong about it not improving performance; it certainly can improve latency, which is very definitely a performance metric; but obviously you'll lose some cpu time due to the more frequent context switches that will occur.
Re:Wrong! Re:It doesn't improve performance. (Score:2)
It is higher if you switch to another userland application. If you go to the scheduler and decide to keep running the same app, the TLB does not have to be flushed. Even if you do switch to another app, it's unlikely that it's going to thrash the cache. Those gnome-apps aren't so data intensive. It's even more unlikely that you will have to page in virtual memory. I don't think I've even bothered to allocate virtual memory lately; when 2-4 Gigs are cheap, why bother? (On a P4 you can even tell the processor not to flush the local entries out of the TLB when you load a new LDT, and it never flushes the kernel's entries unless you change the GDT for some reason.) You can thrash the cache if you want to, just start a compile with -j# with # greater than the number of processors. But those little applications that need a small timeslice once in a while aren't gonna do it. There might be a security argument for flushing the cache, so that some app can't communicate with another by reading or not reading a memory location into the cache. But at that level of paranoia I wouldn't be using Linux anyway.
If you're swapping in virtual memory from a hard drive, who cares what your timeslice is? It's going to take milliseconds just to get the page anyway! The only benefit of virtual memory is that it can swap out unused code so only the working set uses up RAM, in which case you still rarely actually swap things in, since your working set is in RAM by definition. Overlays probably had a better granularity for that purpose. I'm always afraid virtual memory will be abandoned, even though it could be useful at some future date when you might have just 4GB of fast RAM and 64TB of plain old DDR-RAM or something else the processor can't handle without OS help. (Yes, I know there are machines that actually use virtual memory, but I'm not going to argue that they should have more ticks; they might benefit from fewer, in fact. I just haven't seen one of those machines in at least two years, so I think addressing the Athlons, and P3 & P4's of the world isn't such a bad idea.)
Re:Wrong! Re:It doesn't improve performance. (Score:2)
Re:Wrong! Re:It doesn't improve performance. (Score:2)
True, but if people stop even creating swap partitions in large numbers, who's going to want to maintain the code? Code dies when it's not maintained... I don't think this will happen to Linux or any of the free OS's anytime soon, since they are still used on old hardware where it's hard to even find anyone selling compatible memory. And as long as the embedded people don't all just switch over to uLinux/rtLinux, it won't happen either. Something like PS2 Linux really needs swap files with just 32 megs of RAM. (The chip can support gigs of RAM, but the hack requires lots of soldering and expert knowledge of the MMU.)
I think the chance of Linux abandoning the MMU is 0%, if only because you need memory protection on any general purpose machine. Also, I think you'd be capped at 36 bits of address space on i386, or only 64 Gigs of RAM, which sounds great now but won't in 5 years.
Re:Wrong! Re:It doesn't improve performance. (Score:3, Informative)
The point is that virtual memory reduces the amount of real memory you need for each thread- each only takes what it really needs. Sure if memory is cheap, it may not matter so much. But even if it is cheap do you really want to give each process 1 gig of space on the off-chance that it might need it? I don't think so.
Virtual memory is when a process thinks it has 1 gigabyte of memory, but it actually only has, say, 128 megabytes. It can read or write to any bit of it, and the OS does what is necessary to ensure that it never notices the difference; obviously up to the actual system limits.
Virtual memory and swap space go together very nicely, but one does not imply the other. You can use virtual memory to implement garbage collection for example; with no backing store at all.
I guess there are other ways to do similar things- for example, don't use virtual memory, use real memory and set up the MMU so that each thread can only see its own map. But there are issues with that, and it isn't necessarily faster.
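As a small illustration of that point (mine, not the parent's): reserving a big chunk of address space is nearly free, because the kernel only attaches real pages to it as they are touched. That is the same trick garbage collectors and sparse data structures lean on.

#include <stdio.h>
#include <sys/mman.h>

/* Reserve 1 GB of address space; physical pages are only allocated
   as individual pages are first touched, so the reservation itself
   costs almost nothing. */
int main(void)
{
    size_t len = 1UL << 30;    /* 1 GB of virtual address space */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    p[0] = 1;          /* touch one page: one real page appears */
    p[len - 1] = 1;    /* touch another page at the far end */

    printf("reserved %lu bytes, touched two pages\n", (unsigned long)len);
    munmap(p, len);
    return 0;
}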
virtual memory (Score:2)
I wasn't clear enough, I see 0% chance that virtual memory will disappear from Linux because it provides protection from one application playing with another's memory.
I do fear for swapping, but only years from now when it's not so common. I do not fear for the loss of MMU support including virtual memory.
It isn't clear this is what I'm saying from that post, but if you read what I said before I think it's clear. I was agreeing with you on the point of virtual memory not being a big deal, but adding that swapping was in dirge territory on the modern systems that will benefit from upping HZ. Your original comment on swapping is what inspired me to write the comment, cuz I thought you were making the point that it's not a performance loss to use virtual memory even if you never swap, while my point on swapping had nothing to do with performance, but code maintenance. If a signal never fires, who cares how long it takes to handle it, after all.
If you have to do any swapping to disk I don't care how much you try to tune HZ, you need to buy more memory or run fewer apps to get a snappier system.
But enough on this point, it's tangential and I think I agree with everything you said in this last comment without exception.
Re:It doesn't improve performance. (Score:4, Informative)
Also, you don't necessarily have to increase the clock frequency by a whole order of magnitude. A fair compromise could be 200Hz, or 250Hz, or 500Hz. A typical workstation running X-Windows could use 250 or 500, for example.
Re:It doesn't improve performance. (Score:2)
You can do that in Linux too.
Re:It doesn't improve performance. (Score:2, Interesting)
NT Server has a larger timeslice and more caching for some system functions, while NT Workstation has a smaller timeslice with caching geared for user apps.
I know NT is old technology, and I'm not sure if this still applies to the latest MS offerings. Hardly justifies the price difference between Server and Workstation!
Re:It doesn't improve performance. (Score:1, Interesting)
System Control Panel -> Performance -> Optimize Performance for 'Applications' or 'Background Services'.
NT 4.0 Server's default was oddly "Applications"!
NT also has a priority boost for interactive apps. However, if the GUI is 'dead' for a long period of time (such as on a server), it will stop doing this. That's why if you walk up to an even lightly loaded W2K server, it's got that X11-style laggy mouse that your workstation never has.
AFAICT, there's no real operational difference between "Server" and "Workstation", at least for NT4 and 5, although at least W2K Server has some sane non-workstation default settings. The kernel thread/registry entries are for licensing purposes only.
Re:It doesn't improve performance. (Score:1)
I think RedHat did this... (Score:5, Informative)
I tried recompiling the stock RedHat kernel, and sure enough there was an option in there to increase the HZ for the internal timer.
Moore's law (Score:2)
Re:Moore's law (Score:4, Informative)
The reason is that across a scheduling tick the processor's cache gets flushed and reloaded. This means that you end up doing a burst of memory reads, and that will dominate if the clock tick is too short.
Re:Moore's law (Score:3, Informative)
Whoa! What architecture is that!
That just doesn't sound right. The register files get flushed (well, swapped), but if that 2 meg cache got flushed on every context switch there wouldn't be much point in having it at all. You can get cache thrashing if too many cache hungry programs are running simultaneously, but that's why you get a bigger cache if you run lots of those programs: so that their working set is preserved across context switches.
Perhaps you mean the L1 caches? They can get tossed out cuz they can only hold a few inner loops and a few small working sets at a time anyway, but all that stuff should still be in the L2 cache and get loaded very quickly into those puny L1 caches; the L1 data cache is practically a register file anyway on P4's, 64 bit moves to/from it happen in a cycle...
Those L2->L1 moves might start to affect you at 1,000,000 ticks per second, but no one is proposing that, right? Even so, in a typical environment the other context is just the scheduler, which I can't imagine filling the L1 cache... It's not that complicated on a mostly idle machine. (Quick & dirty schedulers have been written, some of which looked through the entire process list. Erm, but on my machine there are less than 100 processes right now, still not so bad for L1.)
Anyway I think 1000 is just fine; if you're doing real-time music synthesis on lotsa channels a larger number might be better. Someone in Europe is working on a music distro, so maybe they will discover that 8000 is the magic number for 16 channels at 48kHz on a P4 at 2GHz.
It would be neat if someone came up with metrics so that the tick was set so that 99.999% of the time the sound systems got their slices once every 500 usec, but otherwise the timeslices were as large as possible. Then you could just tune that 500 usec figure: make it longer if you're on a 386, shorter if you really need more than half-millisecond timings. I guess any program that needed frequent time slices could write to some proc file how much more often it should be called, or whether it could afford to be called less often. For example 1.2 if it wants to be called more often, 0.8 if its timing needs were met. The kernel would only have to ensure all the numbers it got were less than 1.0, and if the largest one were less than 0.95 it could even afford fewer time slices.

The kernel might also want to ensure through process accounting that the time sensitive processes never got more than a certain percentage of the cycles available, even if it meant they got called less often. This is to prevent a denial of service where you just always write 10 to that proc file whenever you get run, so the tick rate grows until you spend all your time in the scheduler. It might also want to set a floor, so that a human can interact with the machine. Ticks should never be less than say 10, for instance, on a PC (or 250 if it's my machine).

Though for some special purpose interstellar Linux probe you might want to sleep for a whole second at a time before checking your direction once on your way, so a tick of 1 would be acceptable once out of your solar system. (You still want 64 bit uptimes for your interstellar probe; it would be so embarrassing if it arrived and the aliens were like, "Woah, this species can't develop an operating system with more than a 3 day uptime for a space probe that took like 40 years to get here, what l0s3rs!")
Re:Moore's law (Score:4, Interesting)
All of them AFAIK.
That just doesn't sound right.
Well, it is. Deal ;-)
The problem occurs when the memory management unit gets modified to maintain the virtual memory 'illusion'. Then you have to flush the caches to maintain consistency. Of course it doesn't happen on every clock tick, you hope.
That means that all the caches above the memory management unit need to get flushed. This includes the program cache; and any other data too.
I did a quick check on the web for this, but I haven't managed to find a good reference to where the MMU is placed in the different architectures yet.
Anyway, that's one of the main reasons the OS scheduling isn't shorter, but any decent OS has to do quite a bit of dorking around at that time.
Re:Moore's law (Score:2)
Sure, but flush the whole cache? The virtual memory argument justifies flushing the TLB if there is an actual switch to another running process. If it's just a trip to the scheduler and back, doesn't that have a valid mapping in any process? (That whole 'reserve 1G for the OS out of the 4GB directly addressable' thing must be for this purpose, right?) But while I'm not familiar with the actual implementation of these chips, I can't see why the caches wouldn't just be addressed by physical locations in memory, hence no need to invalidate their data just because you change their virtual addresses.
I'm not an Intel expert, but I know they have a GDT and LDT. That is, a Global Descriptor Table and a Local one, so the scheduler should be able to use the global one while the application uses the local one. I actually have the manuals, so I looked, but it's a bit esoteric. What I found that supports the TLB flushing is that whenever you load a new LDT you invalidate all the local TLB entries. You can have over 8000 entries in an LDT, but the OS needs to use one for each user level process in order to protect an application's memory from other applications. So if you're Amiga OS you just use the GDT for the kernel and an LDT for your apps, but if you're Linux each app gets its own LDT. The Pentium 4 has a PGE flag that can be used to prevent flushing frequently used entries. So you could avoid flushing the entries for some user level app that was run frequently enough to get special treatment.
I'm still not convinced the actual L2 & L3 caches get flushed, especially since you can even avoid TLB flushes. The TLB is small, which is why you would want to flush it before running a different process; the caches are relatively big...
Re:Moore's law (Score:2)
Re:Moore's law (Score:1)
you don't flush the cache on a context switch.
Re:Moore's law (Score:2)
Re:Moore's law (Score:2)
However, I've always wondered if there was a performance win to multiple threads running in the same memory space as compared to multiple processes, for this very reason.
Anecdotally no: I spoke to the BeOS guys at a conference back in the days of wanting every cycle you could get, and they didn't give threads from the same process as the previous thread any higher probability of running next, which would be the natural thing to do if it were a performance win.
Re:Moore's law (Score:2)
It had better be astronomically small otherwise user programs will gradually screw up; and I don't think it is that small in fact.
Re:Moore's law (Score:3, Insightful)
I seriously doubt we are going to be needing 1/10th second slices for quite a few years, and by that time I expect the kernel to run something in idle time to auto-tune the slices for my current workload average. Remember, a higher HZ only improves "responsiveness"; it actually decreases system performance computation-wise. There is a specific number that is best for every system at any particular time, and going above or below that number hurts performance.
Good for streaming media (Score:1, Informative)
Among other things, streaming media is an important beneficiary of this change. Let's say you have a medium-bitrate video stream (about 2.5 to 5 megabits per second). That means that your packets should be spaced about 2 to 4 milliseconds apart. This is easy to schedule when your system has a 1 millisecond granularity, but is a disaster when your clocks are 10 milliseconds apart -- your packets end up going out in clumps. Your 100bT network may not care either way, but if you are pushing video over ADSL, 802.11b, or ATM, you may find your packets getting lost along the way.
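A back-of-the-envelope check of that spacing, assuming full-size ~1500-byte Ethernet packets (my arithmetic, not the parent's):

#include <stdio.h>

/* Packet spacing for a constant-bitrate stream, assuming ~1500-byte
   (full Ethernet MTU) packets. */
int main(void)
{
    const double packet_bits = 1500.0 * 8;        /* 12000 bits per packet */
    const double rates_bps[] = { 2.5e6, 5e6 };    /* stream bitrates */

    for (int i = 0; i < 2; i++) {
        double spacing_ms = packet_bits / rates_bps[i] * 1000.0;
        printf("%.1f Mbit/s -> one packet every %.1f ms\n",
               rates_bps[i] / 1e6, spacing_ms);
    }
    return 0;
}

That comes out to one packet every 2.4 to 4.8 ms, so a 10 ms tick can only release them in clumps.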
Re:Good for streaming media (Score:2, Insightful)
When an application sends data over the network, it does a send() (or possibly a write()) on a socket. These are system calls, so the CPU switches context to the kernel, and the data sent by the program is placed in the kernel network buffers. Note that this happens immediately, without waiting for another timeslice.
Then the kernel sends as much as possible (depends on the buffer size on the network card itself) of the data to the network card (after slapping on IP and TCP headers), after which the kernel returns to the application.
Now comes the difference: you suggest that when the network card is done sending the data, it'll have to wait for the next timeslice (because then a context switch to kernelspace occurs and the kernel can do some work), but this is not true!
When the network card is done sending the data, it immediately generates an interrupt (what do you think IRQs are for?). On interrupt, the CPU switches context to the kernel, and the kernel (still having the data to be sent in the network buffers) can immediately replenish the buffer on the network card, allowing packets to follow very closely on each other, regardless of timer granularity.
By the way, somewhat modern network cards can burst packets. That is, they can receive a whole batch of packets from the kernel, which they will then send at the appropriate speed of the medium, so that not every packet will generate an interrupt. And that's a good thing (tm), because high interrupt loads (think towards 100,000 interrupts/sec for gigabit - without jumbo frames and bursts) are performance killers.
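For reference, the application side of that path really is just one call: a send()/sendto() returns as soon as the data has been copied into the kernel's socket buffers (assuming buffer space is free), long before anything hits the wire. A minimal UDP sketch (the loopback address and discard port are just placeholders):

#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one UDP datagram; sendto() returns once the data is queued
   in the kernel's socket buffers, not when it reaches the network. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);                 /* discard port, for illustration */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    const char payload[] = "hello";
    if (sendto(fd, payload, sizeof(payload), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("sendto");

    close(fd);
    return 0;
}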
Never thought I'd say this... (Score:1)
Re:Never thought I'd say this... (Score:1)
I think they more than made up for it in reboot time ;-)
Wow. (Score:2)
Wow. Are you saying that linux pages out the running process at every context switch? I think I might have found an explanation for X's choppiness.
Re:Wow. (Score:2)
It has to do stuff like that to keep the processes' address spaces separate - otherwise one rogue process could kill all the others, like in 95.
Huh? Win 95 had virtual memory... (Score:2)
Huh? Mac OS 7 had virtual memory... (Score:1)
Windows 95 absolutely does have virtual memory. (Are you thinking of Mac OS 9??)
Mac OS 7 had virtual memory. It just wasn't protected virtual memory until Mac OS X.
Re:Huh? Mac OS 7 had virtual memory... (Score:3, Insightful)
Great - now binaries are broken. (Score:1, Informative)
Re:Great - now binaries are broken. (Score:4, Informative)
Robert Love to talk about all this in LA (Score:2, Interesting)
If you use the promo code: F633F you can get into the expo free.
Re:Robert Love to talk about all this in LA (Score:2, Interesting)
If this helps, something is broken (Score:2)