Forgot your password?
This discussion has been archived. No new comments can be posted.

The Law of Leaky Abstractions

Comments Filter:
  • I looked down in suprise to find a leaky abstraction. Had to change pants.
  • by ewithrow (409712) on Thursday November 14, 2002 @11:23AM (#4668348) Homepage
    Because the first step to solving any problem is always to create more problems.

  • by Anonymous Coward on Thursday November 14, 2002 @11:23AM (#4668354)
    Great! I'll print off a hardcopy and stick it on my refrigerator! I'm sure my wife will love it!
    • by Nexus7 (2919) on Thursday November 14, 2002 @11:40AM (#4668508)
      I guess a "liberal arts" major would be considered the quintessential "non-programmer". Certainly these people profess a non-concern for most technology, and of course, computing. I don't mean that they wouldn't know about Macs and PCs and Word, but we can agree that is a very superficial view of computing. But appreciating an article such as this "leaky abstractions" required some understanding of the way the networks work, even if there isn't any heavy math in it. In other words, the non-programmer wouldn't understand what the fuss is about.

      But that isn't how it's supposed to be. Liberal arts people are supposed to be interested in precisely this kind of thing, because it takes a higher level view of something that is usually presented in a way that only a CS major would find interesting or useful, and generalizes an idea to be applicable beyond the specific subject, networking.

      That is, engineers are today's liberal arts majors. It's time to get the so called "liberal arts" people out from politics, humanities, governance, management and other fields of importance because they just aren't trained to have or look for the conceptual basis of decision making and correctly apply it.
    • by jaredcoleman (616268) on Thursday November 14, 2002 @12:19PM (#4668845)
      Very funny! I agree that the average Joe is still going to be lost with the technical aspects of this article, but the author does generalize...

      And you can't drive as fast when it's raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it's raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can't see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions

      I've heard a lot of people say that they can't believe how many homes, schools, and other buildings were destroyed by the huge thunderstorms that hit the states this past weekend, or that many people died. Hello, we haven't yet figured out how to control everything! American (middle to upper-class) life is a leaky abstaction. We find this out when we have a hard time coping with natural things that shake up our perfect (abstacted) world. That is what we all need to understand.

  • Informative (Score:5, Insightful)

    by Omkar (618823) on Thursday November 14, 2002 @11:26AM (#4668375) Homepage Journal
    Although I used to program as a hobby, my eyes bugged out when I saw this article. It's actually quite interesting; I finally realize why the hell people program in lower level languages.

    One point that I think could be addressed is backward compatibilty. I really know nothing about this, but don't the versions of the abstractions have to be fairly compatible with each other, especially on a large, distributed system? This extra abstraction of an abstraction has to be orders of magnitude more leaky. The best example I can think of is Windows.
    • Re:Informative (Score:2, Informative)

      by Jamey12345 (41681)
      Com and it's decendants are supposed to take care of this. In reality they work relativly well, but also lead to larger and larger libraries. The simple reason is because Com has to remain backwards compatable, by way of leaving in the old functions and methods.
    • Re:Informative (Score:4, Interesting)

      by Bastian (66383) on Thursday November 14, 2002 @11:35AM (#4668462)
      I think backward(really more slantwise or sideways) compatibility is almost certainly one of the reasons behind why C++ treats string literals as arrays of characters.

      I program in C++, but link to C libraries all the time. I also pass string literals into functions that have char* parameters. If C++ didn't treat string literals as char*, that would be impossible.
    • Re:Informative (Score:5, Interesting)

      by binaryDigit (557647) on Thursday November 14, 2002 @11:43AM (#4668524)
      I think it's a mistake to simply say that "high level languages make for buggier/bloated code". After all, many abstractions are created to solve common problems. If you don't have a string class then you'll either roll your own or have code that is complex and bug prone from calling 6 different functions to append a string. I don't think anyone would agree that it's better to write your own line drawing algorithm and have to program directly to the video card, vs calling one OpenGL method to do the same (well unless you need the absolute last word in performance, but that's another topic).
      • Re:Informative (Score:5, Insightful)

        by CynicTheHedgehog (261139) on Thursday November 14, 2002 @12:22PM (#4668865) Homepage
        Exactly. The only way to do something more easily or more efficiently is to restrict your scope. If you know something about a particular operation, or if you can make a few assumptions about it, your life because much easier. Take sorting, for example. Comparison sorts run (at best) in Omega(n log n) time. However, if you know the maximum range of numbers k in a set of length n, and k is much smaller than n, you can use a counting sort and do it in Theta(n) time. But what happens if you put a k+1 number in there? Well, all hell breaks loose.

        Another example: Java provides a pretty nifty mail API that you can use to create any kind of E-mail you can dream up in 20 lines of code or so. But you only ever want to send E-mail with a text/plain bodypart and a few attachments. So you make a class that does just that, and save yourself 15 lines of code every time you send mail. But suppose you want to send HTML E-mail, or you want to do something crazy with embedded bodyparts? Well it's not in the scope, so it's back to the old way.

        In order to abstract you have to reduce your scope somehow, and you have to ensure that certain parameters are within your scope (which adds overhead). And sometimes there's just nothing you can do about that overhead (like in TCP). And occasionally (if you abstract too much) you limit your scope to the point where your code can't be re-used.

        And as you abstract you tend to pile up a list of dependencies. Every library you abstract from needs to be included in addition to your library (assuming you use DLLs). So yes, there are maintenance and versioning headaches involved.

        Bottom line: non-trivial abstraction saves time up front, but costs later, mostly in the maintenance phase. There's probably some fixed kharmic limit to how much can be simplified beyond which any effort spent simply in displaces the problem.
      • Re:Informative (Score:3, Insightful)

        by oconnorcjo (242077)
        I think it's a mistake to simply say that "high level languages make for buggier/bloated code". After all, many abstractions are created to solve common problems. If you don't have a string class then you'll either roll your own or have code that is complex and bug prone from calling 6 different functions to append a string. -by binaryDigit.

        You said my own thoughts so well that I decided to quote you instead! Actually I thought the article just "stated the obvious" but that it didn't really matter. When I want to "just get things done", abstractions just make it so that I can do it in a magnitude faster than hand coding the machine language [even assembler is an abstraction]. Abstractions allow people to forget the BS and just get stuff done. Are abstractions slower, bloated, and buggy? To some degree yes! But the reason why they are so widely accepted and appreciated is that it makes life SIGNIFICANTLY easier, faster and better for programmers. My Uncle who was a programmer in the 1960's had a manager who said "an assembler compiler took too many cycles on the mainframe and was a waist of time". Now in the 1960's that may have been true but today that would be a joke. Today, I won't even go near a programming language lower than C and I like Python much better.

        • Re:Informative (Score:3, Interesting)

          by MrResistor (120588)
          even assembler is an abstraction

          I have to disagree. Every assembly instruction directly maps to a machine code instruction, so there is absolutely nothing hidden or being done behind the scenes.

          Assembly is just mnemonics for machine code. There is no abstraction in assembly since it doesn't hide anything, it simply makes it easier for humans to read through direct substitution. You might as well say that binary is an abstraction; you'd be equally correct.

          Also, there is no such thing as an "assembly compiler". There are assemblers, which are not compilers.

          • Re:Informative (Score:5, Informative)

            by GlassHeart (579618) on Thursday November 14, 2002 @03:37PM (#4670956) Journal
            Every assembly instruction directly maps to a machine code instruction, so there is absolutely nothing hidden or being done behind the scenes.

            Nonsense. On the 80x86, for example, a one-pass assembler cannot know if a forward JMP (jump) instruction is a "near jump" (8 bit offset) or a "far jump" (16 bit offset). It must generate code to assume the worst, so it tentatively creates a "far jump" and makes a note of this, because it doesn't know where it must jump to yet. In the backpatching phase, it may now know that the jump was actually "near", so it changes the instruction to a "near jump", fills in the 8-bit offset, and overwrites the spare 8 bits with a NOP (no operation) instead of shifting every single instruction below it up by one byte.

            A multi-pass assembler can avoid the NOP, but the fact is still that the same JMP assembly instruction can map to two distinct machine language sequences. The two different kinds of JMP are abstracted and hidden from the programmer.

            Typically, assemblers also provide:

            • Symbolic constants
            • Symbolic addresses
            • Macro definition and expansion
            • Numeric operators and conversion on constants
            • Strings
            which are all useful abstractions.
            • Plus ... (Score:3, Interesting)

              by SimonK (7722)
              ... machine code itself is an abstraction in the first place. This is especially true for modern processors that reorder instructions, execute them in parallel, and in extreme cases convert them into an entirely different instruction set.
    • Yes it would be nice to get back to 'first principles' and address machine resources directly, but its impossible to deliver a product to the marketplace in a meaningul timeframe using this method, particularly when Moore's law blurs the gains anyway - crap runs fast enough.
  • by Jack Wagner (444727) on Thursday November 14, 2002 @11:27AM (#4668384) Homepage Journal
    I'm of the idea that the whole premise that high-level tools and high level abstraction coupled with encasulation are the biggest bane of the software industry. We have these high level tools which most programmers really don't understand and are taught that they don't need to understand in order to build these sophisticated products.

    Yet, when something goes wrong with the underlying technology they are unable to properly fix their product because all they know is some basic java or VB and they don't understand anything about sockets or big-endian/little endian byte alignment issues. It's no wonder todays software is huge and slow and doesn't work as advertised.

    The one shining example of this is FreeBSD, which is based totally on low level C programs and they stress using legacy program methodologies in place of the fancy schmancy new ones which are faulty. The proof is in the pudding, as they say, when you look at the speed and quality if FreeBSD, as opposed to some of the slow ponderous OS's like Windows XP or Mac OSX.

    Warmest regards,
    • by binaryDigit (557647) on Thursday November 14, 2002 @11:37AM (#4668482)
      Well I'd agree up to a point. The fact is that FreeBSD is trying to solve a different problem/attract a different audience than XP/OSX. If FreeBSD was forced to add all the "features" of the other two in an attempt to compete in that space, then it would suffer mightily. You also have to take into account the level/type of programmers working on the these projects. While FreeBSD might have a core group of seasoned programmers working on it, the other two have a great range of programming experience working on it. A few guys who know what they're doing working on a smaller featureset would always produce better stuff than a large group of loosely coupled and widely differing talents working on a monsterous feature set.
    • by jorleif (447241) on Thursday November 14, 2002 @11:50AM (#4668579)
      The real problem is not the existance of high-level abstractions, but the fact that many programmers are unwilling or unable to understand the abstraction.

      So you say "let's get rid of encapsulation". But that doesn't solve this problem, because this problem is one of laziness or incompetence rather than not being allowed to touch what's inside the box. Encapsulation solves an entirely different problem, that is the one of modularity. If we abolish encapsulation the same clueless programmers will just produce code that is totally dependent on some obscure property in a specific version of a library. They still won't understand what the library does, so we're in a worse position than when we started.
    • I guess you are a troll (I hope you don't really believe what you're saying!!), but you're missing an important point: FreeBSD has security holes frequently found in it (ie, buffer overflows, heap overflows, format string attacks); bugs that would be impossible to make in a language like (say) Java. Security holes are the most salient example, but there are many perils to trying to do things "manually" and by programmer brute-force in a language like C.

      Java's not my favorite language, but programs written in it tend to be more robust than their C counterparts.

    • by Ars-Fartsica (166957) on Thursday November 14, 2002 @11:58AM (#4668647)
      This argument is so tired. The downfall of programming is now due to people who can't/don't write C. Twenty years before that the downfall of programming was C programmers who couldn't/wouldn't write assembler.

      The market rewards abstractions because they help create high level tools that get products on the market faster. Classic case in point is WordPerfect. They couldn't get their early assembler-based product out on a competitive schedule with Word or other C based programs.

      • The market rewards abstractions because they help create high level tools that get products on the market faster.

        Agreed, but I think it's important to note that without the understanding of where the abstraction came from, the high-level tools can be a bane rather than a help.

        I write C++ every day. Most of the time, I get to think in C++ abstraction land, which works fine. However, on days where the memory leaks, the buffer overflows, or the seg faults show up, it's not my abstraction knowledge of C++ that solves the problem. It's the lower level, assembly based, page swapping, memory layout understanding that does the debugging.

        I'm glad I don't have to write Assembly. It's fun as a novelty, but a pain in the butt for me to get something done. However, I'm not sure I could code as well without the underlying knowledge of what was happening under the abstraction. It's just too useful when something goes wrong...

      • by ChaosDiscord (4913) on Thursday November 14, 2002 @01:34PM (#4669562) Homepage Journal
        The market rewards...

        I'd suggest stearing clear of that phrase if your intention is to indicate that something is "good". It's also completes with things like "The market rewards skilled con men who disappear before you realize you've been rooked" and "The market rewards CEOs who destroy a company's long term future to boost short term stock value so he can cash out and retire."

        I'm all in favor of good abstractions, good abstractions will help make us more efficient. But even the best abstractions occasionally fail, and when they fail a programmer needs to be able to look beneath the abstraction. If you're unable to work below and without the abstraction, you'll be forced to call in external help which may cost you any of time, money, showing people you don't entirely trust your proprietary code, and being at the mercy of an external source. Sometimes this trade off is acceptable (I don't really have the foggest idea how my car works, when it breaks I put myself at the mercy of my auto shop). Perhaps we're even moving to a world where you have high level programmers that occasionally call in low level programmers for help. But you can't say that it's always best to live at the highest level of abstraction possible. You need to evaluate the benefits for each case individually.

        You point out that many people complain that some new programmers can't program C, while twenty years ago the complaint was the some new programmers can't program assembly. Interestingly both are right. If you're going to be skilled programmer you should have at least a general understanding of how a processor works and assembly. Without this knowledge you're going to be hard pressed to understand certain optimizations and cope with catastrophic failure. If you're going to write in Java or Python, knowing how the layer below (almost always C) works will help you appreciate the benefits of your higher level abstraction. You can't really judge the benefits of one language over another if you don't understand the improvements each tries to make over a lower level language. To be a skilled generalist programmer, you really need at least familiarity with every layer below the one you're using (this is why many Computer Science desgrees include at least one simple assembly class and one introductory electronics class).

    • The problem is that you need all these high level abstractions to reduce the workload of creating large systems. There's just no way you could have those VB monkeys be productive in C and there's just no way you are going to replace them with competent C programmers. Besides, competent programmers are more productive using high abstraction level tools.

      BSD is just a kernel + a small toolset. As soon as you start running all the regular stuff on top of it performance is comparable to a full blown linux/mac os X/windows desktop. Proof: mac os X, remove the none BSD stuff and see what's left: no ui, no friendly tools, no easy access to all connected devices.
    • I'm of the idea that the whole premise that high-level tools and high level abstraction coupled with encasulation are the biggest bane of the software industry.

      The problem with programming at the lower level, like Xlib [], is that it takes 2 years to get the first version of your program out. Then you move on to Xt and now it only takes only 1 year. Then you move on to Motif [] and it only takes you 6 months. Then you move on to Qt [] and it only takes 3 hours. Of course you want it to look slick so you use kdelibs [].

    • I'm of the idea that the whole premise that high-level tools and high level abstraction coupled with encasulation are the biggest bane of the software industry.

      Now that simply isn't true. Imagine you need to do reformat the data in a text file. In Perl, this is trivial, because you don't have to worry about buffer size and maximum line length, and so on. Plus you have a nice string type that lets you concatenate strings in a clean and efficient way.

      If you wrote the same program in C, you'd have to be careful to avoid buffer overruns, you'd have to work without regular expressions (and if you use a library, then that's a high level abstraction, right?), and you have to suffer with awful functions like strcat (or write your own).

      Is this really a win? What have you gained? Similarly, what will you have gained if you write a GUI-centric database querying application in C using raw Win32 calls instead of using Visual Basic? In the latter case, you'll write the same program in maybe 1/4 the time and it will have fewer bugs.
    • Amen.

      I can't tell you how many times this has happened to me. After 5 years of programming, my favorite language has become assembler - not because I hate HLL's, but rather, because you get exactly what you code in assembler. There are no "Leaky Abstractions" in assembly.

      And knowing the underlying details has made me a much better HLL coder. Knowing how the compiler is going to interpret a while statement or for loop makes me much more capable of writing fast, efficient C and C++ code. I can choose algorithms which I know the compiler can optimize well.

      And inevitably, at some point in a programmer's career, they'll come across a system in which the only available development tool is an assembler - at which point, the HLL-only programmer becomes completely useless to his company. This actually happened to me quite recently - my boss doesn't want to foot the bill for the rather expensive C++ compiler, so I'm left coding one of my projects in assembly. Because my education was focused on learning algorithms, rather than languages, my transition to using assembly has been a rather graceful one.

      • by radish (98371) on Thursday November 14, 2002 @12:18PM (#4668839) Homepage
        And inevitably, at some point in a programmer's career, they'll come across a system in which the only available development tool is an assembler

        Do you REALLY believe that? Are you mad? I can be pretty sure that in my career I will never be required to develop in assembler. And even if I do, I just have to brush up on my asm - big deal. To be honest, if I was asked to do that I'd probably quit anyway, it's not something I enjoy.

        Sure it's important to understand what's going on under the hood, but you have to use the right tools for the right job. No one would cut a lawn with scissors, or someones hair with a mower. Likewise I wouldn't write a FPS game in prolog or a web application in asm.

        The real point is that people have to get out of the "one language to code them all" mentality - you need to pick the right language and environment for the task at hand. From a personal point of view that means haveing a solid enough grasp of the fundamentals AT ALL LEVELS (i.e. including high and low level languages) to be able to learn the skills you inevitably won't have when you need them.

        Oh, and asm is just an abstraction of machine code. If you're coding in anything except 1's and 0's you're using a high(er) level language. Get over it.
        • No one would cut a lawn with scissors

          You'd be surprised what people will cut lawns with. In Brasilia (Capital of Brasil) the standard method of trimming lawns is to use a machete. No, I'm not talking about hacking down waist-high grass, I'm talking about trimming 3-inch high grass down to two inches by hacking repeatedly at it with a machete, trying to swing parallel to the ground as best you can. No, you don't do this yourself, you hire someone to do it. And if you're a salaried groundskeeper, it makes sure that you always have something to do - you woldn't want to be found slacking off during the day. On rare occasions I've seen people using hedge trimmers (aka big scissors) instead. My family was the only one I knew about in our neighborhood that even owned an American-style lawn mower. My parents were too cheap to hire a full-time groundskeeper, and I have lots of brothers and sisters who work for free :)

          Moral of the story; if it works and fits the requirements better, someone will do it.
      • by Junks Jerzey (54586) on Thursday November 14, 2002 @12:18PM (#4668843)
        After 5 years of programming, my favorite language has become assembler - not because I hate HLL's, but rather, because you get exactly what you code in assembler. There are no "Leaky Abstractions" in assembly.

        Ah, but you are wrong, and I'm speaking as someone who has written over 100,000 lines of assembly code. The great majority of the time, when you're faced with a programming problem, you don't want to think about that problem in terms of bits and and bytes and machine instructions and so on. You want to think about the problem in a more abstract way. After all, programming can be extremely difficult, and if you focus on the minute then you may never come up with a solution. And many high level abstractions simply do not exist in assembly language.

        What does a closure look like in assembly? It doesn't exist as a concept. Even if you write code using closures in Lisp, compile to assembly language, and then look at the assembly language, the concept of a closure will not exist in the assembly listing. Period. Because it's a higher level concept. It's like talking about a piece of lumber when you're working on a molecular level. There's no such thing when you're viewing things in such a primitive way. "Lumber" only becomes a concept when you have a macroscopic view. Would you want to build a house using individual molecules or would you build a house out of lumber or brick?
        • Here goes.... (Score:3, Interesting)

          by gillbates (106458)

          many high level abstractions simply do not exist in assembly language.

          Consider the following assembly language code:

          WHILE [input.txt.status] != [input.txt.eof] main_loop
          mov bx,infile_buffer
          call input.txt.read_line
          call input.txt.tokenize
          call evaluate_expression

          IF [expression_result] == 1 expression_match
          call write_fields
          ENDIF expression_match
          ENDWHILE main_loop

          Okay, so this is a little snippet of some assembly language I've just recently worked on. Here's the declaration for the input file:

          textfile input.txt

          That's it. Is this readable? Is it abstracted at a level high enough? The primary difference between assembly and a HLL is that in assembly one must invent their own logical abstractions for a real world problem, where languages such as C/C++ simply provide them.

          You've probably noticed that I'm using a lot of macros. In fact, classes, polymorphism, inheritance, and virtual functions are all easily implemented with macros. I'm using NASM right now (though I'm using my own macro processor), and it works very well. Because I understand both the high-level concepts and low level details, I can code rather high-level abstractions in a relatively low level language such as assembler. I get the best of both worlds: the ease of HLL abstraction with the power of low level coding.

          Please tell me what you think of this - I would honestly like to know. For the past few years, I've been working on macro sets and libraries that make coding in assembly seem more like a HLL. I've also set rules for function calls, like a function must preserve all registers, except those which are used to pass parms. With a well developed library of classes and routines, I've found that I can develop applications quickly and painlessly. Because I stick to coding standards, I'm able to reuse quite a bit (> 50%) of my assembly code.

          You might be tempted to ask, "Why not just write in a HLL then?" I do. In fact, I prefer to write in C++. But when the need arises, it's nice to be able to apply the same abstractions of a HLL in assembly. It just so happens that the need has arisen - I'm working on a project that will last a few weeks, and my boss doesn't consider it fiscally responsible to buy a $1200 compiler that will be used for such a short time.

          Interestingly, the use of assembly has made me a better programmer. Assembly forces one to think about what one is doing before coding the solution, which usually results in better code. Assembly forces me to come up with new abstractions and solutions that fit the problem, rather than fitting the problem into any given HLL's logical paradigm. Once I prove that the abstract algorithm will indeed solve the problem, I'm then free to convert the algorithm into assembly. Notice that this is the opposite of the way most HLL coders go about writing code - they find a way in which to squeeze a real world problem into the paradigm of the language used. Which leaves them at a loss when "leaky abstractions" occur. Assembly has the flexibility to adapt to the solution best suited to a problem, where as HLL's, while very good at solving the particular problem for which they were designed, perform very poorly for solving problems outside of their logical paradigms. While assembly is easily surpassed by C/C++, Java, or VB for many problems, there are simply some problems that cannot be solved without it. But even if one never uses assembly professionally, learning it forces one to learn to develop logical abstractions on their own - which in turn, increases their general problem solving ability, regardless of the language in which they write.

          I see the key difference between a good assembly coder and a HLL coder is that an assembly language coder must invent high level abstractions, where as the HLL coder simply learns and uses them. So assembly is a bit more mental work.

          • Re:Here goes.... (Score:4, Insightful)

            by Junks Jerzey (54586) on Thursday November 14, 2002 @03:46PM (#4671044)
            Please tell me what you think of this - I would honestly like to know.

            I've worked in a way similar to you, and I might still if it were as mindlessly simple to write assembly language programs under Windows as it was back in the day of smaller machines (i.e. no linker, no ugly DLL calling conventions, smaller instruction set, etc.). In addition to being fun, I agree in that assembly language is very useful when you need to develop your own abstractions that are very different from other languages, but it's a fine line. First, you have to really gain something substantial, not just a few microseconds of execution time and an executable that's ten kilobytes smaller. And second, sometimes you *think* you're developing a simpler abstraction, but by the time you're done you really haven't gained anything. It's like the classic newbie mistake of thinking that it's trivial to write a faster memcpy.

            These days, I prefer to work the opposite way in these situations. Rather than writing directly in assembly, I try to come up with a workable abstraction. Then I write a simple interpreter for that abstraction in as high a level language as I can (e.g. Lisp, Prolog). Then I work on ways of mechanically optimizing that symbolic representation, and eventually generate code (whether for a virtual machine or an existing assembly language). This is the best of both worlds: You get your own abstraction, you can work with assembly language, but you can mechanically handle the niggling details. If I come up with an optimization, then I can implement it, re-convert my symbolic code, and there it is. This assumes you're comfortable with the kind of programming promoted in books the _Structure and Interpretation of Computer Programs_ (maybe the best programming book ever written). To some extent, this is what you are doing with your macros, but you're working on a much lower level.
      • by Chris Mattern (191822) on Thursday November 14, 2002 @12:44PM (#4669045)
        > There are no "Leaky Abstractions" in assembly.

        At this point, may I whisper the word "microcode" in your ear?

        Chris Mattern
        • by YU Nicks NE Way (129084) on Thursday November 14, 2002 @01:41PM (#4669623)
          And I had my mod points expire this morning...

          He's exactly right. No leaky abstractions? I once worked on a project that was delayed six months because a simple, three-line assembler routine that had to return 1 actually returned something else about one time in a thousand. The code was basically "Load x 5 direct; load y addr ind; subt x from y in place", where we could see in the logic analyzer showing the contents in the address which was to be moved into register y was 6. Literally, 999 times in a thousand, that left a 1 in register y. The other time...

          We sent the errata off to the manufacturer, who had the good grace to be horrified. It then took six months to figure out how to work around the problem.

          And, hey, guess what? Semiconductor holes are a leaky abstraction, too. And don't get me started on subatomic particles.
        • Even at the machine code level, IEEE floating point is the mother of all leaky abstractions for real numbers.
    • by Yokaze (70883) on Thursday November 14, 2002 @12:12PM (#4668787)
      Don't blame the tools.

      High level languages and abstractions aren't the problem, neither are pointers in low level languages. It's the people, who can't use them.

      Abstraction does mean that you should not have to care about the underlying mechanisms, not that you should not understand them.
  • Great examples (Score:5, Interesting)

    by Ratface (21117) on Thursday November 14, 2002 @11:31AM (#4668427) Homepage Journal
    I loved the articel - great examples, especially this favourite bugbear of mine:

    "Problem: the ASP.NET designers needed to hide the fact that in HTML, there's no way to submit a form from a hyperlink. They do this by generating a few lines of JavaScript and attaching an onclick handler to the hyperlink. The abstraction leaks, though. If the end-user has JavaScript disabled, the ASP.NET application doesn't work correctly, and if the programmer doesn't understand what ASP.NET was abstracting away, they simply won't have any clue what is wrong."

    This is a perfect illustration of why I will never hire a web developer who cannot program HTML by hand. The kids who are going to university nowadays and building everything using Visual Studio or Dreamweaver will never be able to ensure that a site runs in an acceptable cross-platform way.

    Another favourite bugbear that this example reminds me of is people building page layouts using 4 levels of embedded tables where they reallly only needed 1 or 2.

    Great article!
    • I would just like to second this. Although I use VS.NET, I don't use it to build my HTML. Sure, I still allow ASP.NET to insert javascript (you can disable that), but I build Web Applications and a Web Application demands a UI that requires javascript to create. Nevertheless, I would never hire a dev who didn't understand the implications. We may need to one day create a simple web site in ASP.NET and I don't want to sit their explaining to him why a "LinkButton" won't render properly in all browsers.
  • Mirrored Here (Score:2, Redundant)

    by CFN (114345)
    It looked like that site was about to be /.ed, so I copied the text below.

    By Joel Spolsky
    November 11, 2002
    Printer Friendly Version

    There's a key piece of magic in the engineering of the Internet which you rely on every single day. It happens in the TCP protocol, one of the fundamental building blocks of the Internet.

    TCP is a way to transmit data that is reliable. By this I mean: if you send a message over a network using TCP, it will arrive, and it won't be garbled or corrupted.

    We use TCP for many things like fetching web pages and sending email. The reliability of TCP is why every exciting email from embezzling East Africans arrives in letter-perfect condition. O joy.

    By comparison, there is another method of transmitting data called IP which is unreliable. Nobody promises that your data will arrive, and it might get messed up before it arrives. If you send a bunch of messages with IP, don't be surprised if only half of them arrive, and some of those are in a different order than the order in which they were sent, and some of them have been replaced by alternate messages, perhaps containing pictures of adorable baby orangutans, or more likely just a lot of unreadable garbage that looks like the subject line of Taiwanese spam.

    Here's the magic part: TCP is built on top of IP. In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.

    To illustrate why this is magic, consider the following morally equivalent, though somewhat ludicrous, scenario from the real world.

    Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes. Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn't have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country. Hollywood Express works by checking that each actor arrives in perfect condition, and, if he doesn't, calling up the home office and requesting that the actor's identical twin be sent instead. If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn't even tell the movie directors in California what happened. To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.

    That is, approximately, the magic of TCP. It is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers. As it turns out, a lot of computer programming consists of building abstractions. What is a string library? It's a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers. What is a file system? It's a way to pretend that a hard drive isn't really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes.

    Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn't, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can't do anything about it and your message doesn't arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.

    This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can't quite protect you from. This is but one example of what I've dubbed the Law of Leaky Abstractions:

    All non-trivial abstractions, to some degree, are leaky.

    Abstractions fail. Sometimes a little, sometimes a lot. There's leakage. Things go wrong. It happens all over the place when you have abstractions. Here are some examples.

    Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the "grain of the wood" -- one direction may result in vastly more page faults than the other direction, and page faults are slow. Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it's really just an abstraction, which leaks when there's a page fault and certain memory fetches take way more nanoseconds than other memory fetches.
    The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.
    Even though network libraries like NFS and SMB let you treat files on remote machines "as if" they were local, sometimes the connection becomes very slow or goes down, and the file stops acting like it was local, and as a programmer you have to write code to deal with this. The abstraction of "remote file is the same as local file" leaks. Here's a concrete example for Unix sysadmins. If you put users' home directories on NFS-mounted drives (one abstraction), and your users create .forward files to forward all their email somewhere else (another abstraction), and the NFS server goes down while new email is arriving, the messages will not be forwarded because the .forward file will not be found. The leak in the abstraction actually caused a few messages to be dropped on the floor.
    C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + "bar" to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type "foo" + "bar", because string literals in C++ are always char*'s, never strings. The abstraction has sprung a leak that the language doesn't let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn't just add a native string class to the language itself eludes me at the moment.)
    And you can't drive as fast when it's raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it's raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can't see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.
    One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I'm training someone to be a C++ programmer, it would be nice if I never had to teach them about char*'s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they'll write the code "foo" + "bar", and truly bizarre things will happen, and then I'll have to stop and teach them all about char*'s anyway. Or one day they'll be trying to call a Windows API function that is documented as having an OUT LPTSTR argument and they won't be able to understand how to call it until they learn about char*'s, and pointers, and Unicode, and wchar_t's, and the TCHAR header files, and all that stuff that leaks up.

    In teaching someone about COM programming, it would be nice if I could just teach them how to use the Visual Studio wizards and all the code generation features, but if anything goes wrong, they will not have the vaguest idea what happened or how to debug it and recover from it. I'm going to have to teach them all about IUnknown and CLSIDs and ProgIDS and ... oh, the humanity!

    In teaching someone about ASP.NET programming, it would be nice if I could just teach them that they can double-click on things and then write code that runs on the server when the user clicks on those things. Indeed ASP.NET abstracts away the difference between writing the HTML code to handle clicking on a hyperlink () and the code to handle clicking on a button. Problem: the ASP.NET designers needed to hide the fact that in HTML, there's no way to submit a form from a hyperlink. They do this by generating a few lines of JavaScript and attaching an onclick handler to the hyperlink. The abstraction leaks, though. If the end-user has JavaScript disabled, the ASP.NET application doesn't work correctly, and if the programmer doesn't understand what ASP.NET was abstracting away, they simply won't have any clue what is wrong.

    The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying "learn how to do it manually first, then use the wizzy tool to save time." Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don't save us time learning.

    And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.

    During my first Microsoft internship, I wrote string libraries to run on the Macintosh. A typical assignment: write a version of strcat that returns a pointer to the end of the new string. A few lines of C code. Everything I did was right from K&R -- one thin book about the C programming language.

    Today, to work on CityDesk, I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML. All high level tools compared to the old K&R stuff, but I still have to know the K&R stuff or I'm toast.

    Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we've created over the years do allow us to deal with new orders of complexity in software development that we didn't have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks. And when you need to hire a programmer to do mostly VB programming, it's not good enough to hire a VB programmer, because they will get completely stuck in tar every time the VB abstraction leaks.

    The Law of Leaky Abstractions is dragging us down.

    My company, Fog Creek Software, has just released FogBUGZ 3.0, the latest version of our innovative system for managing the software development process. Check it out now!

  • Really? (Score:2, Informative)

    by CowboyMeal (614487)
    Here's the magic part: TCP is built on top of IP. In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.

    Seems to have worked pretty well so far...
  • Good Points.... (Score:3, Insightful)

    by Cap'n Canuck (622106) on Thursday November 14, 2002 @11:35AM (#4668461)
    This is a well written article that explains many things!

    The more layers, the slower things get.
    As computers speed up, it allows more feasible abstraction layers and next generation languages.

    Optimizing routines is still important.
    This holds true for ALL layers, as users are always expecting faster and faster apps.

    It's still important to know all the layers
    This allows old-timers to look down their noses at those whippersnappers.
    • This allows old-timers to look down their noses at those whippersnappers.

      And my first-year computer scientist coworker wonders why I think VB and .NET development tools are the work of the devil :P When they break (note the "when" not "if"), I and my boss are the only ones that can figure out why because we're used to C, C++, and all the nifty "old" languages that make you abstract on your own, gosh durn it! Now, where's my medication...?
    • The more layers, the slower things get.

      You can tell I'm tired. I read that line like this:

      The more lawyers, the slower things get.

      Well, it's true...

    • It's still important to know all the layers
      This allows old-timers to look down their noses at those whippersnappers.

      Uhhh yeah, right. What are you doing to do when it doesn't work as it should. You'll have to go get dad to help you put it together.

      This all reminds me of the Star Trek episode where people have become so reliant on machines that they are totally helpless when they break.
  • by cjustus (601772) on Thursday November 14, 2002 @11:36AM (#4668475) Homepage
    This article does a great job of describing problems in development environments today... But how do we solve it?

    Hire VB programmers with assembly language experience that are network admins and are familiar with assembly language? No - the solution is not to hire people just with skill XYZ, but to hire people that are capable of thinking for themselves, doing research, problem solving, and RTFM...

    It's a shame that so many companies hiring today are looking for skill X, Y, and Z... so some moron with X, Y, and Z on his resume gets hired, while someone that knows X - could learn Y and Z, and could outperform the moron, gets overlooked...

    Yet I see it happen at my IT Staffing company almost every day...

    • by Bastian (66383) on Thursday November 14, 2002 @12:03PM (#4668695)
      I got into that problem at a carreer fair the other day. Someone asked me if I had VB experience, and I said I didn't because I can't afford a copy of VB, but I was familiar with other flavors of BASIC, event-driven programming, and other RAD kits, so I would probably be able to learn VB in an afternoon.

      He just looked at me as if I had said someting in Klingon.

      I've been getting the distinct impression that the good programmer who knows his shit but doesn't have skills X, Y, and Z from the start is fucked because the people who do the hiring are clueless enough about programming that all they can do is watch for X, Y, and Z on a resume and fail to notice anything else.
      • To be frank, why should they waste their time with you when they can probably easily find people with deep VB experience quickly? Its not like you are really bringing deep skills to the table - as you claim, most of your other skills lie only in other high-level pseudo-coder toolkits.

        Knowing the specific tool is important. More important really than being well rounded. Look in any office and their best programmer is typically not the one with the grad degree, but the one who is willing to geek out and learn every detail about the particular language they use (sometimes these are the same person, often not).

        • by irix (22687) on Thursday November 14, 2002 @01:24PM (#4669462) Journal

          Knowing the specific tool is important. More important really than being well rounded.

          Riiight. That is why you always want to hire someone who graduated from Computer College X rather than a CS or Engineering program. I mean they know Visual Basic so well they can't program without it, but they couldn't solve another problem if their life depended on it. Just who I want to hire!

          Look in any office and their best programmer is typically not the one with the grad degree, but the one who is willing to geek out and learn every detail about the particular language they use

          So wrong. Where you you work, so I can avoid ever working there?

          The best programmers I work with are the smartest all-around people that also love their craft. Sure, they might know tool X or Y really well because they use it all of the time, but they also know a little bit about everything, and can learn tool Z at the drop of a hat. They also know that there are many tools/languages to solve a problem, and tend to use the best one.

          The language/tool geek who knows every nook and cranny about their language they use but can't think outside of that box is the last person I want to work with. They create code that is unmaintainable becuase they make heavy use of some obsure language features, but when it comes time to work with another language or tool they are useless. And no matter how much a particular problem cries out for another language to be used, they try and cram their square language into my round problem hole. No thanks.

    • No - the solution is not to hire people just with skill XYZ, but to hire people that are capable of thinking for themselves, doing research, problem solving, and RTFM...

      ...and of course all of these highly skilled "classical" eningeers want to work on your VB product, right?

      Maybe, if you are willing to pay them twice the going rate for the minimally competent VB coder...and at the end of the day someone who is not analytical by nature but just knows VB itself really well will probably produce a better product. Critical thinking is useful, but knowing your tool very well is probably twice as valuable.

      In any case as the barrier to entry into programming lowers, wages drop, and the cheapest adequate solution wins.

    • Hire VB programmers with assembly language experience that are network admins and are familiar with assembly language?

      Unfortunately, that's what it takes, sometimes. For example, I feel I'm a better software engineer since I studied for and obtained UNIX system and network administration certs. Why? The article says it very well. Even though I often work with J2EE-based web applications, for example, there are often failures in the web applications that are due to network infrastructure or system configuration. I have also found that I'm a better Java programmer due to having experience with C and C debuggers. C debuggers enforce a rigor in debugging that most high-level neophytes have not yet learned.
  • though joel sometimes thinks he is cooler than he is, this article he wrote was great. i think the points he make are valid.

    i think the solution is that we need to have CERTIFIED software components. to really evolve the art of software engineering, we need to stop having to deal with low-level ideas and increasingly interact with higher-level abstractions. but joel makes the point that any abstraction we may use is leaky, forcing us to understand how it works! so we need to start having certification processes for abstractions - i should be able to pick up a component/abstraction and believe in its specifications 100%. that way i can stop worrying about low-level details and start using components to build even more interesting systems / applications.

    • Well it's a nice idea in concept, but what software company would want to go throught the hassle of certifying a component? To avoid the "leaky abstraction", you are basically saying that it would have to be utterly bug free and that it would cover every conceivable use of the tool. Take the example of a c++ string class. Someone else (was it the author, I can't remember now) mentioned that the abstraction becomes leaky as soon as the novice C++ programmer tries "A" + "B", which would seem perfectly fine given instrinsic string support.
    • so we need to start having certification processes for abstractions

      Sure, that'd be nice I guess. And it'd be impossible, except for simple cases. Joel's example of the motorcar in the rain can't be fixed 100% unless you've got a weather-genie flying along above you.

      Leaky abstractions have 2 main causes:
      • Laws of Physics ("acts of god"). These you can never fix, because someday the implementation of the abstraction will encounter a situation beyond its ability. When that happens, the users can either give up and go home, or learn about the mechanisms underlying the abstraction and resolve it directly (or call in a specialist to do it).
      • Backwards compatibility ("acts of man"). Things like the ASP example of hyperlinks to submit forms, or the C++ string class (mostly). These we could possibly fix, but to do so is often prohibitively expensive, or just won't pay off fast enough. The goal of 100% specification confidence is nice, but today people aren't usually willing to make the sacrifices.

    • As I said before in another comment (which had even more spelling errors than this one!) sometimes the leaky abstraction is good enough.
      It depends upon your specs! Remember, performance/reliability is designed in from the ground up. It can't be 'tested' in later!

      Believing in a sub components specification 100% is important, but don't expect 100% up time. Infact, be thankful for 98%. If you want better specs, add redundancy. Duh.

      Using Joel's TCP model, if I have two computers (client & server) and in the middle of a request I physically power off the server, there is no way TCP can get my message there!

      So you expect every message you send to get there? Even though you have put your message bytes into the TCP socket and TCP says "Ok! I'll deliver these!" If your peer isn't there anymore you won't find out for two minutes! Uh, oh!

      So have another server, and have your own timeout. If you don't get your response back close your first socket and hot-swap.

      Your ceritification is your Mean time to failure analysis and your redundancy used to mitigate that. Avoid single points of failure.
      Your certification is within your own Business/Software Engineering process. You need X uptime. Can your components deliver?

      So maybe certification is the solution, but only in terms of 1) spec'ing the margins of error and 2) testing to ensure your system can meet what it was spec'd for. Don't ever expect to pick up some COTS pieces and have them "just work"!

      personally, I didn't see what was so great about the article.
  • by LordNimon (85072)
    In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.

    How is this news? All technologies, on some level, are inherently unreliable. Therefore, in order to obtain reliability, it is always by adding some kind of redundancy to an unreliable tool.

    I've never seen a technology touted as "reliable" that didn't achieve that reliability without some kind of self-checking or redundancy somewhere. Maybe that's the author's point, but he makes it sound as TCP/IP is unique in this regard.

    This is what programming is all about. It seems pretty obvious to me.

    • but he makes it sound as TCP/IP is unique in this regard

      I think he's just using it as an example that almost anyone can relate to in the internet age. And while it is obvious to those of us who code/administer/tinker with such things, but his use of the "hollywood actors" analogy would seem to point to the fact that his audience is not us.
  • by nounderscores (246517) on Thursday November 14, 2002 @11:45AM (#4668542)
    Is our own bodies.

    I'm studying to be a bioinformatics guy with the university of melbourne and have just had the misfortune of looking into the enzymatic reactions that control oxygen based metabolism in the human body.

    I tried to do a worst case complexity analysis and gave up about half way through the krebs cycle. []

    When you think about it, most of basic science, some religeon and all of medicine has been about removing layers of abstraction to try and fix things when they go wrong.
  • start with, or at least be competent with, the basics.

    Any good programmer I've ever known started with the lower level stuff and was successful for this reason. Or at least plowed hard into the lower level stuff and learned it well when the time came, but the first scenario is preferable.

    Throwing dreamweaver in some HTML kiddie's lap, as much as I love dreamweaver, is not going to get you a reliable Internet DB app.
  • The mechanical, electrical, chemical, etc, engineering fields all have various degrees of abstractions via object hiding. It just isn't called "object hiding" because these are in fact real objects and there is no need to call them objects because it is natural to think of them that way. When debugging a design in any of these fields, it is not unusual to have to strip down layers and layers of "abstraction" (ie, pry into physical objects) to get to the bottom of a real tough problem. Those engineers with the broadest skills are usually the best at dealing with such problems. There isn't really anything new in the article.
  • Abstractions are good things, they help people understand systems better. An abstraction is a model, and if you use a model, you need to understand its limitations. High level languages have allowed a tremendous increase in programming productivity, with a price. But just as you cannot be really good at calculus without a thorough understanding of algebra, you cannot be a really good coder if you don't know what's going on underneath your abstractions.

    Great article, but don't throw out the high level tools and go back to coding Assembler.

  • by dereklam (621517) on Thursday November 14, 2002 @11:49AM (#4668572)
    I said that TCP guarantees that your message will arrive. It doesn't, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can't do anything about it and your message doesn't arrive.

    Unfortunately, his Slashdotted server is proving that to us right now.

  • by Anonymous Coward
    Hiding ugliness has its penalties. Over time processor performance buries these penalties. What Joel doesn't tell you is that abstraction can buy you productivity and simply put, make programming easier and open it up to larger audiences.

    Maybe someone out there prefers to program without any abstraction layers at all, but they inherit so much complexity that it will be impossible for them to deliver a meaningful product in a reasonable time.

    • Over time processor performance buries these penalties.

      Sometimes. But not all penalties can be resolved by more CPU cycles. A faster CPU can't repair your severed ethernet wire. It can't change all the existing HTML browsers and C++ compilers to cover up supposed "flaws".

      And unless this CPU is awesome enough to enable real AI, it can't save us from future shortcomings in computer-interface languages either.
  • by RobertB-DC (622190) on Thursday November 14, 2002 @11:50AM (#4668582) Homepage Journal
    As a VB programmer, I've *lived* leaky abstractions. Nowhere has it been more obvious than in the gigantic VB app our team is responsible for maintaining. 262 .frm files, 36 .bas modules, 25 .cls classes, and a handful of .ctl's.

    Much of our troubles, though, come from a single abstraction leak: the Sheridan (now called Infragistics []) Grid control.

    Like most VB controls, the Sheridan Grid is designed to be a drop-in, no-code way to display database information. It's designed to be bound to a data control, which itself is a drop-in no-code connection to a database using ODBC (or whatever the flavor of the month happens to be).

    The first leak comes in to play because we don't use the data control. We generate SQL on the fly because we need to do things with our queries that go beyond the capabilities of the control, and we don't save to the database until the client clicks "OK". Right away, we've broken the Sheridan Grid's paradigm, and the abstraction started to leak. So we put in buckets -- bucketfuls of code in obscure control events to buffer up changes to be written when the form closes.

    Just when things were running smoothly, Sheridan decided to take that kid with his finger in the dike and send him to an orphanage. They "upgraded" the control. The upgrade was designed to make the control more efficient, of course... but we don't use the data control! It completely broke all our code. Every single grid control in the application -- at least one and usually more in each of 200+ forms -- had to have all-new buckets installed to catch the leaks.

    You may be wondering by now why we haven't switched to a better grid control. Sure enough, there are controls out there now that would meet 95% of our needs... but 1) that 5% has high client visibility and 2) the rest of the code works, by golly! No way we're going to rip it out unless we're absolutely forced to.

    By the way, our application now compiles to a svelte 16.9 MEG...
  • This is one the best essays on software engineering I've read in a while. As a programmer and CS educator, it's really served to crystallize for me why (a) it seems so much harder for students to learn programming these days, and (b) why I've grown unhappy over the years with the series of new engineering paradigms that are in use. Extremely helpful for putting my own thoughts in order.

    The law statement itself, "all non-trivial abstractions, to some degree, are leaky" may possibly get included in my personal "top 10" aphorisms manifesto.
  • Bad term choosen IMO (Score:2, Interesting)

    by shadowtramp (573751)
    Don't know how the term leak fills to mentioned nonprogrammers but in programmers' slang the word leak has distinct meaning. And it does diverge from what Joel use it for.

    I think that term ooze would suite better in this case. It's possesses a kind of dirtiness to itself and the fealing the word 'ooze' gives me fits good with the matter of described problem. :o)

    Back to the article. To be serious, i think that Joel mixed all things as examples of 'Leaky abstraction' to no purpose. Too different situations make concept to fall apart. Here what i mean:
    In case of tcp/ip it denotes limits of abstraction. And regardless of programmer background every sane man should now those limits do exist.
    In case of page faults it's a matter of competence - there is no abstraction at all. You either do know how your code is compiled and executed or you don't. It's the same when you know what the phrase in a given language do realy mean or you don't. I simplify here.
    In the case of C++ strings i saw the only good example. What in my opinion the experience of STL and string class usage tells in this case is: one should understand the underlying mechanics fully before rely on abstraction behaviour.

    In programming it is realy simple to tell will the given 'abstraction' present you with an easter egg or not: if you can imagine FSM for the abstraction you will definitely know when to use it.

  • Joel should write an article about his leaky hosting company... or maybe his leaky colo-ed box.

    Since I can't get to the site and read the article, I'll tell some jokes.

    "Leaky Abstractions?! Is this guy talking about Proctology??"

    "Leaky Abstractions?! Someone get this guy a plumber!"

    "Leaky Abstractions?! I knew we should have used the pill!"

  • Isn't "leaky abstraction" a leaky abstraction of the leaky abstractions?
  • For something like IP packets, leaky is acceptable, but for many of those other abstractions, constipated might be a better adjective. Some of the tools and technologies out there (remember 4GL report-writers?) were big clogging masses that just won't pass.

    The first thing I do when I start in on a new technology (VBA, CGI, ASP, whatever) is to start digging in the corners and see where the model starts breaking down.

    What first turned me on to Perl (I'm trying hard not to flamebait here) was the statement that the easy things should be easy, and the hard things possible.

    But even Perl's abstraction of scalars could use a little fiber to move through the system. Turn strict and warnings on, and suddenly your "strings when you need 'em" stop being quite so flexible, and you start worrying about when it's really got something in it or not.

    On the HTML coding model breaking down, my current least-fave is checkboxes: if unchecked, they don't return a value to the server in the query, making it hard to determine whether the user is coming at the script the first time and there's no value, or just didn't select a value.

    Then there's always "This space intentionally left blank.*" Which I always footnote with "*...or it would have been if not for this notice." Sure sign of needing more regularity in your diet.
  • Sure, the author points out a few examples of leaky abstractions. But his conclusion seems to be that you always will have to know what is behind the abstraction.

    I don't think that's true. It depends on how the abstraction is defined, what it claims to be.

    You can use TCP without knowing how the internals work, and assume that all data will be reliably delivered, _unless_ the connection is broken. That is a better abstraction.

    And the virtual memory abstraction doesn't say that all memory accesses is guaranteed to take the same amount of time, so I don't consider it to be leaky.

    So I don't entirely agree with the author's conclusions.
  • by PureFiction (10256) on Thursday November 14, 2002 @12:05PM (#4668719)
    Proper abstractions avoid unintended side-effects by presenting a clean view of the intent and function of a given interface, and not just a collection of methods or structures.

    When I read what Joel wrote about "leaky abstractions" i saw a peice complaining about "unintended side-effects". I don't think the problem is with abstractions themselves, but rather the implementation.

    He lists some examples:

    1. TCP - This is a common one. Not only does TCP itself have peculiar behavior in less than ideal conditions, but it is also interfaced with via sockets, which compound the problem with an overly complex API.

    If you were to improve on this and present a clean reliable stream transport abstraction is would likely have a simple connection establishment interface and some simple read/write functionality. Errors would be propagated up to a user via exceptions or event handlers. But the point I want to make is that This problem can be solved with a cleaner abstraction.

    2. SQL - This example is a straw man. The problem with SQL is not the abstraction it provides, but the complexity of dealing with unknown table sizes when you are trying to write fast generic queries. There is no way to ensure that a query runs fastest on all systems. Every system and environment is going to have different amounts and types of data. The amount of data in a table, the way it is indexed, and the relationship between records is what determines a queries speed. There will always be manual performance tweaking of truly complex SQL simply because every scenario is different and the best solution will vary.

    3. C++ string classes. I think this is another straw man. Templates and pointers in C++ are hard. That is all there is too it. Most Visual Basic only coders will not be able to wrap their minds around the logic that is required to write complex c++ template code. No matter how good the abstractions get in C++, you will always have pointers, templates, and complexity. Sorry Joel, your VB coders are going to have to avoid c++ forever. There is simply no way around it. This abstraction was never meant to make things simple enough for Joe Programmer, but rather to provide an extensible, flexible tool for the programmer to use when dealing with string data. Most of the time this is simpler, sometimes it is more complex (try writing your own derived string class - there are a number of required constructors you must implement which are far from obvious) but the end result is that you have a flexible tool, not a leaky abstraction.

    There are some other examples, but you see the point. I think Joel has a good idea brewing regarding abstractions, complexity, and managing dependencies and unintended side-effects, but I do not think the problem is anywhere near as clear cut as he presents. As a discipline software engineering has a horrible track record of implementing arcane and overly complex abstractions for network programming (sockets and XTI) generic programming (templates, ref counting, custom allocators) and even operating systems API's (POSIX).

    Until we can leave behind all of the cruft and failed experiments of the past, start new with complete and simple abstractions that do not mask behavior, but rather recognize it and provide a mechansim to handle it gracefully, we will run into these problems.

    Luckily, such problems are fixable - just write the code. If joel were right and complex abstractions were fundamentally flawed, that would be a dark picture indeed for the future of software engineering (it is only going to grow ever more complex from here kids - make no mistake about it).
  • by Frums (112820) on Thursday November 14, 2002 @12:11PM (#4668775) Homepage Journal

    The problem that this article points to is a byproduct of large scale software development primarily being an exercise in complexity management. Abstraction is the foremost tool available in order to reduce complexity.

    In practice a person can keep track of between 4 and 11 different concepts at a time. The median lands around 5 or 6. If you want to do a self-experiment have someone write down a list of twenty words, then spend 30 seconds looking at them without using memnonic devices such as anagrams to memorize them then put the list away. After thirty more seconds write down as many as you can recall.

    This rule applies equally when attempting to manage a piece of software - you can only really keep track of between 4 and 11 "things" at the same time, so the most common practice is to abstract away complexity - you reduce an array of characters terminated by a null characters and a set of functions designed to operate on that array to a String. You went from half a dozen functions, a group of data pieces, and a pointer to a single concept - freeing up slots to pay attention to something else.

    The article is completely correct in its thesis that abstractions gloss over details and hide problems - they are designed to. Those details will stop you from being productive because the complexity in the project will rapidly outweigh your ability to pay attention to it.

    This range of attention sneaks into quite a few places in software development:

    • Team sizes: teams of between four and ten people are generally the most productive - they, and the project manager can track who is doing what without gross context switching.
    • Object models: When designing a system there will generally be between four and eleven components (which might break into more at lower levels of abstraction). Look at most UML diagrams - they will have four to eleven items (unless they were autogenerated by Rose).
    • Methods on an object: When it is initially created an object will generally have between four and eleven methods - after that it is said to start to smell, and could stand to be decomposed into multiple objects.
    • Vacation Days in the US: Typoically between five and ten - management can think about that many at one time, any more and they cannot keep track of them all in their head so there are obviously too many ;-)
    • Layers in the standard networking stack
    • Groups in a company
    • Directories off of /

    other schemes exist for managing complexity, but abstraction is decided human - you don't open a door, rotate, sit down backwards, rotate again, bend legs, position your feet, extend left arm, grasp door, pull door shut, insert key in iginition, extend right arm above left shoulder, grasp seatbelt, etc... you start the car. Software development is no different.

    There exist peopel that can track vast amounts of information in their heads at one time - look at Emacs - iirc RMS famously wrote it as he did because he could keep track of what everythign did, no one else can though. There also exist memnonic devices aside from abstraction for managing complexity - naming conventions, taxonomies, making notes, etc.


  • by jneemidge (183665) on Thursday November 14, 2002 @12:17PM (#4668823)
    This article reminds me of what I hated most about Jurassic Park (the novel -- the movie blessly omits the worst of it) -- Ian Malcolm's runaway pessimism. The arguments boil down to be very similar. Ian Malcolm says that complex systems are so complex we can't ever understand them all, so they're doomed to fail. Joel Spolsky says that our high-level abstractions will fail and because of that we're doomed to need to understand the lower-level stuff. I have problems with both -- they're a sort of technopessimism that I find particularly offensive, because they make the future sound bleak and hopeless despite volumes of evidence that, in fact, we've been dealing successfully with these issues for decades and they're just not all that bad.

    We have examples of massively complex systems that work very reliably day-in and day-out. Jet airplanes, for one; the national communications infrastructure, for another. Airplanes are, on the whole, amazingly reliable. The communications infrastructure, on the other hand, suffers numerous small faults, but they're quickly corrected and we go on. Both have some obvious leaky abstractions.

    The argument works out to be pessimism, pure and simple -- and unwarrented pessimism to boot. If it were true that things were all that bad, programmers would all _need_ to understand, in gruesome detail, the microarchitectures they're coding to, how instructions are executed, the full intricacies of the compiler, etc. All of these are leaky abstractions from time to time. They'd also need to understand every line of libc, the entire design of X11 top to bottom, and how their disk device driver works. For almost everyone, this simply isn't true. How many web designers, or even communications applications writers, know -- to the specification level -- how TCP/IP works? How many non-commo programmers?

    The point is that sometimes you need to know a _little bit_ about the place where the abstraction can leak. You don't need to know the lower layer exhaustively. A truly competant C programmer may need to know a bit about the architecture of their platform (or not -- it's better to write portable code) but they surely do not need to be a competant assembly programmer. A competant web designer may need to know something about HTML, but not the full intricacies of it. And so forth.

    Yes, the abstractions leak. Sometimes you get around this by having one person who knows the lower layer inside and out. Sometimes you delve down into the abstraction yourself. And sometimes, you say that, if the form fails because it needs JavaScript and the user turned off JavaScript, it's the user's fault and mandate JavaScript be turned on -- in fact, a _good_ high-level tool would generate defensive code to put a message on the user's screen telling them that, in the absence of JavaScript, things will fail (i.e. the tool itself can save the programmer from the leaky abstraction).

    What Ian Malcolm says, when you boil it all down, is that complex systems simply can't work in a sustained fashion. We have numerous examples which disprove the theory. That doesn't mean that we don't need to worry about failure cases, it means we overengineer and build in failsafes and error-correcting logic and so forth. What Joel Spolsky says is that you can't abstract away complexity because the abstractions leak. Again, there are numerous examples where we've done exactly that, and the abstraction has performed perfectly adequately for the vast majority of users. Someone needs to understand the complex part and maintain the abstraction -- the rest of us can get on with what we're doing, which may be just as complex, one layer up. We can, and do, stand on the shoulders of giants all the time -- we don't need to fully understand the giants to make use of their work.

  • Neal Stephenson... (Score:4, Interesting)

    by mikeee (137160) on Thursday November 14, 2002 @12:26PM (#4668901)
    Neal Stephenson talks about something similar in In the Beginning was the Command Line. He calls it interface shear; he's specificially referring the the UI as an abstraction (an interesting idea in itself). His take on it was that abstractions are metaphors, and that "interface shear"/"leaky abstractions" occur in regions where the metaphors break down.

    Interesting stuff...
  • by PacoSuarez (530275) on Thursday November 14, 2002 @12:37PM (#4668979)
    I think the article is great. And this principle can also be applied to Math. Theorems are much like library function calls. You can use them in your own proofs, without caring about how they are proved, because someone has already taken care of that for you. You prove that the hypothesis are true, and you get a result which is guaranteed to be true.

    The problem is that in real Math, you often need a slightly different result, or you cannot prove that the hypothesis are true in your situation. The solution often involves understanding what's "under the hood" in the theorem, so that you can modify the proof a little bit and use it.

    Every professional mathematician knows how to prove the theorems that he/she uses. There is no such thing as a "high-level mathematician", that doesn't really know the basics, but only uses sophisticated theorems in top of each other. The same should be true in programming, and this is what the article is about.

    The solution? Good education. If anyone wants to be considered a professional programmer, he/she should have a basic understanding of digital electronics, micro-processor design, assembly language (at least one), OS architechture, C, some object oriented language, databases... and should be able to understand the relationship between all those things, because when things go wrong, you may have to go to any of the levels.

    It's a lot of things to learn, but there is no other way out. Building software is a difficult task and whoever sells you something else lies.

  • by Anonymous Coward on Thursday November 14, 2002 @12:39PM (#4669004)
    I agree with Joel, but some people seem to be taking it as a call to stop abstracting. That's silly.

    Humans form abstractions. That's what we do. If you abstractions are leaking with detrimental consequences, then it could be because the programming language implementation you're using is deficient, not because you shouldn't be abstracting.

    Try a high-performnce Common Lisp compiler some time. Strong dynamic typing and optional static typing, macros, first class functions, generic-function OO, restartable conditions, first class symbols and package systems make abstraction much easier and less prone to arbitrary decisions and problems that are really:

    (i) workarounds for methods-in-once-class-rule of "ordinary" single-dispatch OO

    (ii) workarounds for the association of what an object is with the name of the object rather than it itself (static typing is really saying "this variable can only hold this type of object", dynamic typing is saying "the object is of this type". Some languages mix these issues up, or fail to recognise the distinction.

    (iii) workarounds for the fact that most languages, unlike forth and lisp, are not themselves extensible for new abstractions

    (iv) workarounds for the fact that one cannot pass functions as parameters to functions in some languages (doesn't apply to C, thanks to function pointers - here's where the odd fact that low level languages are often easier to form new abstractions in comes in)
    (v) workarounds for namespace issues

    (vi) workarounds for crappy or nonexistent exception processing

    Plus, Common Lisp's incremental compile cycle means faster development, and it's defined behaviours for in place modifications to running programs makes it good for high-availability systems
  • Argh. (Score:4, Insightful)

    by be-fan (61476) on Thursday November 14, 2002 @01:09PM (#4669286)
    While I usually like Joel's work, I'm pissed about the random jab at C++. For those he didn't read the article, he says something along the lines of

    "A lot of the stuff the C++ committe added to the language was to support a string class. Why didn't they just add a built-in string type?"

    It's good that a string class wasn't added, because that lead to templates being added! And templates are the greatest thing, ever!

    The comment shows a total lack of understanding of post-template, modern C++. People are free not to like C++ (or aspects of it) and to disagree with me about templates, of course, and in that case I'm fine with them taking stabs at it. But I get peeved when people who have just given the language a cursory glance try to fault it. If you haven't used stuff like Loki or Boost, or taken a look at some of the fascinating new design techniques that C++ has enabled, then you're in no place to comment about the language. At least read something like the newer editions of D&E or "The C++ Programming Language" then read "Modern C++" before spouting off.

    PS> Of course, I'm not accusing the author of being unknowledgable about C++ or anything of the sort. I'm just saying that this particular comment sounded rather n00b'ish, so to speak.
  • by Badgerman (19207) on Thursday November 14, 2002 @01:10PM (#4669303)
    Loved this article. Sent it on to my manager and a co-worker.

    One thing I liked especially is the danger of the Shiny New Thing. It may be neat and cool and save time, but knowing how to use it does not mean that you can do anything else - or function outside of it.

    Right now I'm on an ASP.NET project - and some ASP.NET stuff I actually like. But the IDE actually makes it harder to program responsibly, and even utilize .NET effectively. Unless one understands some of the underpinnings of this NEW technology, you actually can't take advantage of it. Throw in the generated code issues and the IDE, an abstraction of an abstraction, really is disadvantageous.

    A friend of mine just about strangled some web developers he worked with as they ONLY use tools (and they love all the Shiny New Ones) and barely know what the tools produce. This has led to hideous issues of having to configure servers and designs to work with their products as opposed to them actually knowing how they work. The guy's a saint, I swear.

    I think managers and employers need to be aware of how abstract things can get, and realize good programmers can "drill down" from one layer to another to fix things. A Shiny New Thing made with Shiny New Things does NOT mean the people who did it are talented programmers, or that they can haul your butt out of a jam when the Shiny New Thing looses its shine.

  • by dsaxena42 (623107) on Thursday November 14, 2002 @01:13PM (#4669339)
    Maybe I'm an old fashioned has-been but people doign software development should understand the fundamentals of how computers work. That means that they should understand things like memor management, they should understand what a pointer is, they should undertsand about how tight loops versus unrolled loops might affect the performance of the caches on their system. I meet so many "programmers" that have no understanding that there are architectural constraints on what they can and can't do. Software runs on hardware. If you're going to write software and treat the hardware as a black box, you're not going to write it as well, or as efficiently as you could be doing it.
  • by Dr. Awktagon (233360) on Thursday November 14, 2002 @01:29PM (#4669513) Homepage
    Looks like he just discovered and renamed the basic idea that "all models are incomplete". Any scientist could tell you that one! I remember a quote that goes something like this: The greatest scientific accomplishment of the 19th century was the discovery that everything could be described by equations. The greatest scientific accomplishment of the 20th century is that nothing can be described by equations.

    That's all an abstraction is: a model. Just like Newtonian physics, supply and demand under perfect competition, and every other hard or soft scientific model. Supply and demand breaks down at the low end (you can't be a market participant if you haven't eaten in a month) and the high end (if you are very wealthy, you can change the very rules of the game). Actually, supply and demand breaks down in many ways, all the time. Physics breaks down at the very large or very small scales. Planetary orbits have wobbles that can only be explained by more complex theories. Etc.

    No one should pretend that the models are complete. Or even pretend that complete models are possible. However, the models help you understand. They help you find better solutions (patterns) to problems. They help you discuss and comprehend and write about a problem. They allow you to focus on invariants (and even invariants break down).

    All models are imperfect. It's good that computer science folks can understand this, however, I don't think Joel should use a term like "leaky abstraction". Calling it that implies the existence of "unleaky abstraction", which is impossible. These are all just "abstractions" and the leaks are unavoidable.

    Example: if I unplug the computer and drop it out of a window, the software will fail. That's a leak, isn't it? Think of how you would address that in your model: maybe another computer watches this one so it can take over if it dies..etc..more complexity, more abstractions, more leaks....

    He also points out that, basically, computer science isn't exempt from the complexity, specialization, and growing body of understanding that accompanies every scientific field. Yeah, these days you have to know quite a bit of stuff about every part of a computer system in order to write truly reliable programs and understand what they are doing. And it will only get more complex as time goes on.

    But what else can we do, go back to the Apple II? (actually that's not a bad idea. That was the most reliable machine I've ever owned!)
  • by Animats (122034) on Thursday November 14, 2002 @02:55PM (#4670458) Homepage
    There's been a trend away from non-leaky abstractions. LISP, for example, was by design a non-leaky abstraction; you don't need to know how it works underneath. So is Smalltalk. Perl is close to being one. Java leaks more, leading to "write once, debug everywhere". C++ adds abstractions to C without hiding anything, which increases the visible complexity of the system.

    It's useful to distinguish between performance-related leaks and correctness leaks. SQL offers an abstraction for which the underlying database layout is irrelevant except for performance issues. The performance issues may be major, but at least you don't have to worry about correctness.

    C++ is notorious for this; the language adds abstractions with "gotchas" inside. If you try to get the C++ standards committee to clean things up, you always hear 1) that would break some legacy code somewhere, even if we can't find any examples of such code anywhere in any open source distro or Microsoft distro, or 2) that only bothers people who arent "l33t".

    Hardware people used to insist that everything you needed to know to use a part had to be on the datasheet. This is less true today, because hardware designers are so constrained on power, space, heat, and cost all at once.

"We learn from history that we learn nothing from history." -- George Bernard Shaw