Follow Slashdot blog updates by subscribing to our blog RSS feed


Forgot your password?

Knuth Got It Wrong 298

davecb writes "Think you've mastered the art of server performance? Think again. Poul-Henning Kamp, in an article at ACM Queue, finds an off-by-ten error in btrees, because they fail to take virtual memory into account. And he solves the problem in the open source 'Varnish' HTTP accelerator, for all of us to see and use."
This discussion has been archived. No new comments can be posted.

Knuth Got It Wrong

Comments Filter:
  • by gnasher719 ( 869701 ) on Monday June 14, 2010 @07:30PM (#32572724)
    You should have read a bit further to the bit with B-trees and B*-trees.
  • by conspirator57 ( 1123519 ) on Monday June 14, 2010 @07:30PM (#32572726)

    article summary:

    "disk is slower than RAM. some doofs don't realize their system is swapping. ergo algorithm is bad."

    throw in 'Knuth is wrong' to generate page hits.

    non-sensationalized takeaway: "remember swap is slow; try not to use it."

  • Agreed (Score:5, Insightful)

    by eldavojohn ( 898314 ) * <{moc.liamg} {ta} {nhojovadle}> on Monday June 14, 2010 @07:35PM (#32572792) Journal

    Knuth's analysis is valid in the framework of his assumptions, and what is described in the linked article has been known as "cache oblivious b-tree" for not so short time.

    Yeah, using this logic, once quantum computers get to the point of solving general problems there's going to be an awful lot of people who "got it wrong" because their algorithms do not apply in a quantum computer. Advancements in technology causing algorithms & data structures to be tweaked means the original person who thought them up "got it wrong"? Ridiculous.

    "Oh, RSA was brilliant. But they got it wrong. You see, they failed to account for xanxinglydiumapping in 2056 computers. Poor stupid wrong bastards."

  • by InsertWittyNameHere ( 1438813 ) on Monday June 14, 2010 @07:36PM (#32572794)
    Yes it's true. In some real-world applications an algorithm encounters it's worst case running time more than the predicted theoretical average case running time. This is where case by case optimizations come into play.

    Knuth never claimed the algorithm was the best choice in YOUR particular case. Don't drag his name through your sensational mud for the sake of your slashvertisement.
  • by jfengel ( 409917 ) on Monday June 14, 2010 @07:37PM (#32572812) Homepage Journal

    There isn't any "off by ten error", and this isn't telling us anything we don't already know (in CS terms): implementation on an actual computer can be different in performance from an abstract machine.

    What the author is saying (quite well) is that the virtual memory performance amounts to cache misses, which cause extra performance overhead. He found a case where it was significant and got a 10x speedup in his particular application.

    The article is a little over-zealous its characterization, though it's careful to note that this is not actually a theoretical novelty. The summary, on the other hand, bastardizes and exaggerates it.

    The article is interesting, and worth reading, but if you RTFS without RTFA you'll be dumber than you were before. Thanks, kdawson.

  • news at 11 (Score:3, Insightful)

    by inkyblue2 ( 1117473 ) on Monday June 14, 2010 @07:47PM (#32572906)

    Algorithm revised in light of real-world performance constraints! Read all about it!

    Seriously, we just rewrote a tree (that runs in a high traffic serving environment) this month at work because it wasn't streamlined just right to take full advantage of the underlying architecture. No one will write a paper about it.

    Also, hey kids, profile your code.

  • Ow my eyes (Score:1, Insightful)

    by Anonymous Coward on Monday June 14, 2010 @07:49PM (#32572930)

    R-G colorblindness makes it near impossible to distinguish the graphs from each other.

  • Re:Nope, he didn't (Score:5, Insightful)

    by SashaMan ( 263632 ) on Monday June 14, 2010 @07:55PM (#32572986)

    Mod parent up. There was a comment on the article that referenced cache oblivious algorithms, which was a new concept to me and very interesting. Basically, this set of algorithms assumes a memory hierarchy (e.g. fast ram vs slow disk) that is optimized to limit the number of times the slower memory is accessed. Importantly, cache oblivious algorithms are optimal REGARDLESS of the size of the cache. That's opposed to a cache aware algorithm, like a normal b-tree, where the size of each node is set according to the page size of the machine.

    A very helpful overview here from MIT Opencourseware: []

  • Stupid headline (Score:4, Insightful)

    by Sloppy ( 14984 ) on Monday June 14, 2010 @07:57PM (#32573020) Homepage Journal

    Good article (increase your locality to get fewer page faults). Stupidly wrong Slashdot headline and summary. "Off by ten error?" Please!

    kdawson, next time, RTFA before you post someone's lame summary.

  • by MightyMartian ( 840721 ) on Monday June 14, 2010 @07:59PM (#32573028) Journal

    I don't think anyone accused him of not knowing what he's talking about. He's been accused of using an extremely hyperbolic headline to draw readers. Besides, what he's saying isn't exactly brand new.

  • by maraist ( 68387 ) * <michael DOT mara ... DOT n0spam DOT > on Monday June 14, 2010 @08:03PM (#32573068) Homepage
    Did you read the article? Not that it's terribly innovative or anything, but he's not saying anything about swap is slow.. He's saying when you need to page to disk (e.g. a database or disk-spilling cache), don't use traditional B-Tree's or B+Tree's which tend to only store a single level in a memory-page / disk-page. This causes Log(n) disk-lookups to find the leaf-node (where all B+Tree data lives, and half of B-Tree data lives). Instead store multiple levels of the tree in a contiguous mem/disk page. It's a relatively simple re-organization, and I don't think he's scaling out as well as he's suggesting (when you get to 1B + records - you're not going to get a large number of the 20 level lookups in a single vertical slice).
  • by McNihil ( 612243 ) on Monday June 14, 2010 @08:04PM (#32573072)

    To place algorithm theory in the same space as implementation is just plain wrong... and the article does say that in a better way than kdawson's sensationalist header.

    The implementation of a particular algorithm on specific hardware is more inline with resource economy than anything else and to subject comparison to the theoretical implementation is just ludicrous.

    For instance one could with simple means device a unit where only bubble-sort would be possible because of the massive overhead of qsort on said machine. This is trivial... for instance Casio Fx180 calculator.

  • by shoppa ( 464619 ) on Monday June 14, 2010 @08:29PM (#32573306)

    "Virtual memory leads to virtual performance."

    Just what I always wanted, a B-tree implementation that is guaranteed to swap.

  • by Anonymous Coward on Monday June 14, 2010 @08:36PM (#32573356)
    Then someone starts two copies of your program and you lock 160% of available memory
  • Re:Nope, he didn't (Score:5, Insightful)

    by Darinbob ( 1142669 ) on Monday June 14, 2010 @08:47PM (#32573432)
    Agreed. Very often these results get misused and misunderstood. Every operation has a cost, and you need to know what those costs are before you decide which one of those gets measured.

    For instance, it is common with sorting algorithms to measure (in Big-Oh notation) how many comparison operations there are. However these may not necessarily be very expensive relative to other operations; such as the cost of swapping two elements. If the elements are large structures and the comparison keys are integers, then you should be concerned about the number of copies. But if the elements are pointers to structures and the keys are long strings, then the concern will be with the number of comparisons.

    Almost every text book on algorithms you see is going to assume some sort of idealized computing device. In the real world there are multiple levels of memory speeds; caches up to virtual memory to disk. There are also other external factors that must be taken into account as well; for instance if you don't have enough memory, virtual or not, to hold all the data at once. Say you're sorting a database selection; or you're on a mainframe with limited memory space but reading customer data from tapes.
  • by Darinbob ( 1142669 ) on Monday June 14, 2010 @09:05PM (#32573536)
    But everyone already knows that (or should). Knuth knows that as well, caches and paging systems are not unknown things to him. He was writing a text book for students, and simplified a complicated problem to the point where it can be studied and analyzed more easily. Similar to teaching students Newtonian physics. It is sensationalist and naive to call Knuth wrong here.
  • Re:Nope, he didn't (Score:3, Insightful)

    by jemfinch ( 94833 ) on Monday June 14, 2010 @09:13PM (#32573608) Homepage

    No, and no. The data structure described by Henning-Kamp is not a B-tree, but a heap. Additionally, it's cache-aware, not cache-oblivious.

  • by Anonymous Coward on Monday June 14, 2010 @11:20PM (#32574386)

    A 300-GB backing store, memory mapped on a machine with no more than 16 GB of RAM, is quite typical. The user paid for 64 bits of address space, and I am not afraid to use it.

    As an undergraduate in the early 1990s, I am sure I was told about "out of core" algorithms within my first two years in school, including how people were starting to apply these to "out of cache" scenarios. In other words, everyone knows and takes for granted that you do not run a RAM-oriented algorithm such that its working set expands beyond the RAM space that has the access metrics you require. And B-trees/B*-trees are the classic data structure to organize data that is too large for your main working RAM. And the HPC community has been tuning hybrid data structures to live in multi-level RAM hierarchies for a long long time. I don't care how much of a "guru" the author is. If he wants to believe virtual memory is a transparent abstraction (a typical unix acolyte mistake), he is ignoring so much computer science; virtual memory is a slightly more graceful failure handler for temporary resource crises. If your average working set ever meets swapping, you are doing it wrong.

  • Re:Nope, he didn't (Score:4, Insightful)

    by JasterBobaMereel ( 1102861 ) on Tuesday June 15, 2010 @03:12AM (#32575314)

    Does it work : Yes

    Is there a better way : Yes

    Did it take 46 *years* for anyone to find a better way : yes

    If Knuth's mistakes take this long to spot then I still rate his code over most peoples

  • Re:Nope, he didn't (Score:5, Insightful)

    by yuhong ( 1378501 ) <yuhongbao_386 AT hotmail DOT com> on Tuesday June 15, 2010 @03:25AM (#32575368) Homepage
    Especially when virtual memory did not exist 46 years ago.
  • by gnud ( 934243 ) on Tuesday June 15, 2010 @04:51AM (#32575642)
    Because your paging algorithm will be much less tested, used and bugfixed than any OS paging algorithm. And the memory you allocated might be swapped out by the OS anyway.
  • by Nitage ( 1010087 ) on Tuesday June 15, 2010 @05:03AM (#32575682)
    He does know what he's talking about.

    Where, n is the number of items stored, the lookup time is given by log(n)*COMPARRISON_COST + log(n)*MEMORY_ACCESS_COST.
    That gives O(logn) only if COMPARRISON_COST and MEMORY_ACCESS_COST are *constant* - the entire point of the article is that MEMORY_ACCESS_COST is not constant, but increases as n increases.
  • by Rakishi ( 759894 ) on Tuesday June 15, 2010 @07:06AM (#32576104)

    Except it's a bounded increase. That's my whole point. There is nothing slower than the hard drive to cache from. You can assume every operation has the worst possible access time and you'd still end up with the same O() running time.


As in certain cults it is possible to kill a process if you know its true name. -- Ken Thompson and Dennis M. Ritchie