The Law of Leaky Abstractions
Joel Spolsky has a nice essay on the leaky abstractions which underlie all high-level programming. Good reading, even for non-programmers.
After a long night of coding once (Score:2, Funny)
Re:After a long night of coding once (Score:3, Funny)
Abstraction: (Score:3, Funny)
even for non-programmers (Score:3, Funny)
Re:even for non-programmers (Score:4, Interesting)
But that isn't how it's supposed to be. Liberal arts people are supposed to be interested in precisely this kind of thing, because it takes a higher level view of something that is usually presented in a way that only a CS major would find interesting or useful, and generalizes an idea to be applicable beyond the specific subject, networking.
That is, engineers are today's liberal arts majors. It's time to get the so-called "liberal arts" people out of politics, humanities, governance, management, and other fields of importance, because they just aren't trained to find the conceptual basis of decision making and apply it correctly.
There are no more Renaissance Men. (Score:3, Insightful)
Unfortunately, the subjects you list have all grown to the point that no human can obtain even a BASIC understanding of all of them before he's too old to have a useful career left.
It was once possible to be a "Renaissance Man" - a master of ALL the sciences and arts reduced to teachability. No more. It's just too bloody large. (I say this as someone who attended a university that claims to try to produce such people - centuries after the last of them is dead. B-) )
Unfortunately, "Liberal Arts" schools have, over much of the last century, been filled with the mathematically and technically illiterate - both because the students without the necessary skills gravitated there, and because the faculties themselves were so disabled, and in turn disparaged the skills they were incompetent to teach.
The engineering/scientific/biologic/technical curriculum had constant feedback from the real world about what was true and what was false. But the "Arts Schools" taught classes where what was "right" was ONLY a matter of opinion - and grades solely a measure of how well you could regurgitate your Prof's pet bonnet-bees. (This DESPITE the fact that SOME of these theories could be TESTED - if only the academics understood, and/or believed in, things like the scientific method, statistics, and sampling methods.)
Yes the "Social 'Sciences'" are hard. But the bulk of their credentialed practitioners used this as an excuse to drop "science" from their methodologies. (This despite that fact that mathematics departments were generally part of the art, rather than the engineering, side of the school organization.)
I've been out of academia for a while now. I can hope that things have improved, as you seem to claim. But I have not personally seen any sign of such from the outside (other than your claim).
In my school days, too, many students on the Arts side of the wall knew tech, math, and the like. (Students are generally young, and still hunting for their muse.) But they would generally transfer out to some field more conducive to clear thought, drop out to use it in the real world, or (if they stayed in LS&A) suppress it or flunk out.
Re:even for non-programmers (Score:5, Insightful)
I've heard a lot of people say that they can't believe how many homes, schools, and other buildings were destroyed by the huge thunderstorms that hit the States this past weekend, or that so many people died. Hello, we haven't yet figured out how to control everything! American (middle to upper-class) life is a leaky abstraction. We find this out when we have a hard time coping with natural things that shake up our perfect (abstracted) world. That is what we all need to understand.
Informative (Score:5, Insightful)
One point that I think could be addressed is backward compatibility. I really know nothing about this, but don't the versions of the abstractions have to be fairly compatible with each other, especially on a large, distributed system? This extra abstraction of an abstraction has to be orders of magnitude more leaky. The best example I can think of is Windows.
Re:Informative (Score:2, Informative)
Re:Informative (Score:4, Interesting)
I program in C++, but link to C libraries all the time. I also pass string literals into functions that have char* parameters. If C++ didn't treat string literals as char*, that would be impossible.
Re:Informative (Score:4, Interesting)
Re:Informative (Score:4, Informative)
C++ is first of all definitely a separate language, in the sense that a C++ compiler will fail to compile legal C code. (Many compilers accept both C and C++ code, but must necessarily process them as either C or C++, not both.) If C and C++ are not "separate languages", then converting code from C to C++ or C++ to C must be a trivial task.
C++ is also a separate language in the sense that good C++ code (the definition of which does seem to differ depending on which edition of Stroustrup you look at) looks little like good C code. The STL (and templates in general) and exceptions result in source code that looks little like C.
it is merely an incremental improvement, an add-on basically. That's why it's called C++ and not D.
Stroustrup wrote "I picked C++ because it was short, had nice interpretations, and wasn't of the form 'adjective C'" in his own FAQ [att.com]. No mention of C++ "merely" being an "incremental improvement".
If you're curious, yes, there was a B, but there was not actually an A (or rather, there was, but it was called ALGOL).
B came from BCPL [bell-labs.com].
Re:Informative (Score:4, Informative)
The most commonly encountered difference is probably assigning the result of malloc() to a pointer without a cast, which is perfectly valid (and good) C. It is invalid C++, because in C++ the void * returned by malloc() must be explicitly cast to (char *). However, the cast version would actually be substandard C: since C assumes that an unprototyped function returns int, forgetting to include stdlib.h would normally generate a diagnostic, and the explicit cast silences it. Furthermore, the most recent iteration of ANSI C, known as C99, contains many features not supported in C++.
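For readers who want the concrete snippets (the code itself was lost from the post above), the comparison presumably looked something like this; treat it as a reconstruction rather than the poster's exact code:

    #include <stdlib.h>

    void example(void) {
        char *p = malloc(100);          /* idiomatic C; a C++ compiler rejects the
                                           implicit void* -> char* conversion      */
        char *q = (char *) malloc(100); /* compiles as both C and C++, but in
                                           pre-C99 C the cast silences the warning
                                           you'd get if <stdlib.h> were missing    */
        free(p);
        free(q);
    }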
Re:Informative (Score:5, Interesting)
Re:Informative (Score:5, Insightful)
Another example: Java provides a pretty nifty mail API that you can use to create any kind of E-mail you can dream up in 20 lines of code or so. But you only ever want to send E-mail with a text/plain bodypart and a few attachments. So you make a class that does just that, and save yourself 15 lines of code every time you send mail. But suppose you want to send HTML E-mail, or you want to do something crazy with embedded bodyparts? Well it's not in the scope, so it's back to the old way.
In order to abstract you have to reduce your scope somehow, and you have to ensure that certain parameters are within your scope (which adds overhead). And sometimes there's just nothing you can do about that overhead (like in TCP). And occasionally (if you abstract too much) you limit your scope to the point where your code can't be re-used.
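A minimal sketch of that scope-reduction idea, translated into C++ (the comment's example is Java's mail API; send_mail() below is a made-up stand-in, not a real library call):

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical general-purpose API: flexible, but verbose to call.
    void send_mail(const std::string& to, const std::string& subject,
                   const std::string& mime_type, const std::string& body,
                   const std::vector<std::string>& attachments) {
        std::cout << "To: " << to << " [" << mime_type << ", "
                  << attachments.size() << " attachments]\n"
                  << subject << "\n" << body << "\n";
    }

    // The abstraction: only text/plain plus attachments is in scope.
    void send_plain_mail(const std::string& to, const std::string& subject,
                         const std::string& body,
                         const std::vector<std::string>& attachments = {}) {
        send_mail(to, subject, "text/plain", body, attachments);
    }
    // Need HTML mail or nested bodyparts? That's outside the wrapper's scope,
    // so you're back to calling the general API directly.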
And as you abstract you tend to pile up a list of dependencies. Every library you abstract from needs to be included in addition to your library (assuming you use DLLs). So yes, there are maintenance and versioning headaches involved.
Bottom line: non-trivial abstraction saves time up front but costs later, mostly in the maintenance phase. There's probably some fixed karmic limit to how much can be simplified, beyond which any effort spent simplifying merely displaces the problem.
Re:Informative (Score:3, Insightful)
You said my own thoughts so well that I decided to quote you instead! Actually I thought the article just "stated the obvious", but that didn't really matter. When I want to "just get things done", abstractions let me do it an order of magnitude faster than hand-coding the machine language [even assembler is an abstraction]. Abstractions allow people to forget the BS and just get stuff done. Are abstractions slower, bloated, and buggy? To some degree, yes! But the reason they are so widely accepted and appreciated is that they make life SIGNIFICANTLY easier, faster and better for programmers. My uncle, who was a programmer in the 1960s, had a manager who said "an assembler compiler took too many cycles on the mainframe and was a waste of time". Now in the 1960s that may have been true, but today that would be a joke. Today, I won't even go near a programming language lower than C, and I like Python much better.
Re:Informative (Score:3, Interesting)
I have to disagree. Every assembly instruction directly maps to a machine code instruction, so there is absolutely nothing hidden or being done behind the scenes.
Assembly is just mnemonics for machine code. There is no abstraction in assembly since it doesn't hide anything, it simply makes it easier for humans to read through direct substitution. You might as well say that binary is an abstraction; you'd be equally correct.
Also, there is no such thing as an "assembly compiler". There are assemblers, which are not compilers.
Re:Informative (Score:5, Informative)
Nonsense. On the 80x86, for example, a one-pass assembler cannot know whether a forward JMP (jump) instruction can be a "short jump" (8-bit offset) or needs a "near jump" (16-bit offset). It must assume the worst, so it tentatively emits a "near jump" and makes a note of this, because it doesn't yet know where the jump lands. In the backpatching phase, it may discover that the jump was actually short, so it changes the instruction to a "short jump", fills in the 8-bit offset, and overwrites the spare byte with a NOP (no operation) instead of shifting every single instruction below it up by one byte.
A multi-pass assembler can avoid the NOP, but the fact is still that the same JMP assembly instruction can map to two distinct machine language sequences. The two different kinds of JMP are abstracted and hidden from the programmer.
Typically, assemblers also provide:
Plus ... (Score:3, Interesting)
Joel is ignoring market factors (Score:2)
The underlying problem with programming (Score:5, Insightful)
Yet, when something goes wrong with the underlying technology, they are unable to properly fix their product, because all they know is some basic Java or VB and they don't understand anything about sockets or big-endian/little-endian byte-order issues. It's no wonder today's software is huge and slow and doesn't work as advertised.
The one shining example of this is FreeBSD, which is based totally on low-level C programs, and they stress using legacy programming methodologies in place of the fancy-schmancy new ones which are faulty. The proof is in the pudding, as they say, when you look at the speed and quality of FreeBSD, as opposed to slow, ponderous OSes like Windows XP or Mac OS X.
Warmest regards,
--Jack
Re:The underlying problem with programming (Score:4, Insightful)
Re:The underlying problem with programming (Score:5, Insightful)
So you say "let's get rid of encapsulation". But that doesn't solve this problem, because this problem is one of laziness or incompetence rather than not being allowed to touch what's inside the box. Encapsulation solves an entirely different problem, that is the one of modularity. If we abolish encapsulation the same clueless programmers will just produce code that is totally dependent on some obscure property in a specific version of a library. They still won't understand what the library does, so we're in a worse position than when we started.
Re:The underlying problem with programming (Score:2)
Java's not my favorite language, but programs written in it tend to be more robust than their C counterparts.
Re:The underlying problem with programming (Score:3, Insightful)
OK, fine: All programming languages have an implementation and a host operating system. But switching from Java to C++ certainly won't save you from these kinds of problems. (In fact, there is only ONE C++ compiler that I know of that actually claims to be compliant with the C++ language definition; i.e., every C++ compiler that people use to build programs is filled with bugs concerning the language's many insane idiosyncrasies!)
I only mean to point out Java as a *language* that has better abstraction properties than C++. (Personally, I prefer other less popular languages like SML, but Java serves the point as well. Just be careful not to take Java as the best example of a high-level language, because high-level languages can have better features and be more efficient than Java is.) Software written in a correct implementation of Java on a correct OS cannot have buffer overflows. Programs written in C, even with a correct compiler (few exist) on a correct OS, can and frequently do have buffer overflows. I am reluctant to call this a programmer problem, because such bugs are so common, even among extremely good programmers. (Are the authors of Quake III Arena, Apache, MySQL, the Linux kernel, ssh, BIND, and wu-ftpd just all bad programmers for having buffer overflows in their software? I personally don't think so...)
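To make the language-property point concrete, here is a small illustration (mine, not the poster's) of the same copy written against a fixed C-style buffer and against std::string:

    #include <cstring>
    #include <string>

    void c_style(const char* input) {
        char buf[16];
        std::strcpy(buf, input);   // overflows buf whenever input is 16+ characters
    }

    void cpp_style(const std::string& input) {
        std::string copy = input;  // grows to fit; no overflow possible here
    }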
Some people are reading this article and using it as evidence to support low-level languages like C. ("Abstractions are leaky, so programmers need to have access to low-level details in order to work around leaky abstractions." or "Abstractions are leaky, so there's no point in using abstraction.") I think that's exactly backwards! Essentially, what I'm claiming is that C++ is a poor language for large software precisely because it does not allow programmers to create "tight" abstractions. Some languages do! These languages are much more pleasant to program in, and to build large software in! And in those languages, we can indeed make tight abstractions without the kinds of leaks he's described.
Time to market is the factor, not elegance (Score:5, Insightful)
The market rewards abstractions because they help create high-level tools that get products to market faster. A classic case in point is WordPerfect. They couldn't get their early assembler-based product out on a competitive schedule with Word or other C-based programs.
Re:Time to market is the factor, not elegance (Score:3, Insightful)
Agreed, but I think it's important to note that without the understanding of where the abstraction came from, the high-level tools can be a bane rather than a help.
I write C++ every day. Most of the time, I get to think in C++ abstraction land, which works fine. However, on days where the memory leaks, the buffer overflows, or the seg faults show up, it's not my abstraction knowledge of C++ that solves the problem. It's the lower level, assembly based, page swapping, memory layout understanding that does the debugging.
I'm glad I don't have to write Assembly. It's fun as a novelty, but a pain in the butt for me to get something done. However, I'm not sure I could code as well without the underlying knowledge of what was happening under the abstraction. It's just too useful when something goes wrong...
Re:Time to market is the factor, not elegance (Score:5, Insightful)
I'd suggest steering clear of that phrase if your intention is to indicate that something is "good". It sits right alongside things like "The market rewards skilled con men who disappear before you realize you've been rooked" and "The market rewards CEOs who destroy a company's long-term future to boost short-term stock value so they can cash out and retire."
I'm all in favor of good abstractions; good abstractions will help make us more efficient. But even the best abstractions occasionally fail, and when they fail a programmer needs to be able to look beneath the abstraction. If you're unable to work below and without the abstraction, you'll be forced to call in external help, which may cost you any of: time, money, showing your proprietary code to people you don't entirely trust, and being at the mercy of an external source. Sometimes this trade-off is acceptable (I don't really have the foggiest idea how my car works; when it breaks I put myself at the mercy of my auto shop). Perhaps we're even moving to a world where you have high-level programmers that occasionally call in low-level programmers for help. But you can't say that it's always best to live at the highest level of abstraction possible. You need to evaluate the benefits of each case individually.
You point out that many people complain that some new programmers can't program C, while twenty years ago the complaint was that some new programmers couldn't program assembly. Interestingly, both are right. If you're going to be a skilled programmer you should have at least a general understanding of how a processor works and of assembly. Without this knowledge you're going to be hard-pressed to understand certain optimizations and cope with catastrophic failure. If you're going to write in Java or Python, knowing how the layer below (almost always C) works will help you appreciate the benefits of your higher-level abstraction. You can't really judge the benefits of one language over another if you don't understand the improvements each tries to make over a lower-level language. To be a skilled generalist programmer, you really need at least familiarity with every layer below the one you're using (this is why many Computer Science degrees include at least one simple assembly class and one introductory electronics class).
Re:The underlying problem with programming (Score:2)
BSD is just a kernel plus a small toolset. As soon as you start running all the regular stuff on top of it, performance is comparable to a full-blown Linux/Mac OS X/Windows desktop. Proof: Mac OS X - remove the non-BSD stuff and see what's left: no UI, no friendly tools, no easy access to all connected devices.
Re:The underlying problem with programming (Score:2)
The problem with programming at the lower level, like Xlib [x.org], is that it takes 2 years to get the first version of your program out. Then you move on to Xt and now it only takes only 1 year. Then you move on to Motif [lesstif.org] and it only takes you 6 months. Then you move on to Qt [trolltech.com] and it only takes 3 hours. Of course you want it to look slick so you use kdelibs [kde.org].
Re:The underlying problem with programming (Score:3, Insightful)
Now that simply isn't true. Imagine you need to reformat the data in a text file. In Perl, this is trivial, because you don't have to worry about buffer size and maximum line length, and so on. Plus you have a nice string type that lets you concatenate strings in a clean and efficient way.
If you wrote the same program in C, you'd have to be careful to avoid buffer overruns, you'd have to work without regular expressions (and if you use a library, then that's a high level abstraction, right?), and you have to suffer with awful functions like strcat (or write your own).
Is this really a win? What have you gained? Similarly, what will you have gained if you write a GUI-centric database querying application in C using raw Win32 calls instead of using Visual Basic? In the latter case, you'll write the same program in maybe 1/4 the time and it will have fewer bugs.
Re:The underlying problem with programming (Score:3, Interesting)
Amen.
I can't tell you how many times this has happened to me. After 5 years of programming, my favorite language has become assembler - not because I hate HLL's, but rather, because you get exactly what you code in assembler. There are no "Leaky Abstractions" in assembly.
And knowing the underlying details has made me a much better HLL coder. Knowing how the compiler is going to interpret a while statement or for loop makes me much more capable of writing fast, efficient C and C++ code. I can choose algorithms which I know the compiler can optimize well.
And inevitably, at some point in a programmer's career, they'll come across a system in which the only available development tool is an assembler - at which point, the HLL-only programmer becomes completely useless to his company. This actually happened to me quite recently - my boss doesn't want to foot the bill for the rather expensive C++ compiler, so I'm left coding one of my projects in assembly. Because my education was focused on learning algorithms, rather than languages, my transition to using assembly has been a rather graceful one.
Re:The underlying problem with programming (Score:5, Insightful)
Do you REALLY believe that? Are you mad? I can be pretty sure that in my career I will never be required to develop in assembler. And even if I do, I just have to brush up on my asm - big deal. To be honest, if I was asked to do that I'd probably quit anyway, it's not something I enjoy.
Sure it's important to understand what's going on under the hood, but you have to use the right tools for the right job. No one would cut a lawn with scissors, or someone's hair with a mower. Likewise I wouldn't write an FPS game in Prolog or a web application in asm.
The real point is that people have to get out of the "one language to code them all" mentality - you need to pick the right language and environment for the task at hand. From a personal point of view that means having a solid enough grasp of the fundamentals AT ALL LEVELS (i.e. including high- and low-level languages) to be able to learn the skills you inevitably won't have when you need them.
Oh, and asm is just an abstraction of machine code. If you're coding in anything except 1's and 0's you're using a high(er) level language. Get over it.
Cutting lawns with scissors... (Score:3, Interesting)
You'd be surprised what people will cut lawns with. In Brasilia (the capital of Brazil) the standard method of trimming lawns is to use a machete. No, I'm not talking about hacking down waist-high grass, I'm talking about trimming 3-inch-high grass down to two inches by hacking repeatedly at it with a machete, trying to swing parallel to the ground as best you can. No, you don't do this yourself, you hire someone to do it. And if you're a salaried groundskeeper, it makes sure that you always have something to do - you wouldn't want to be found slacking off during the day. On rare occasions I've seen people using hedge trimmers (aka big scissors) instead. My family was the only one I knew about in our neighborhood that even owned an American-style lawn mower. My parents were too cheap to hire a full-time groundskeeper, and I have lots of brothers and sisters who work for free.
Moral of the story; if it works and fits the requirements better, someone will do it.
Re:The underlying problem with programming (Score:5, Insightful)
Ah, but you are wrong, and I'm speaking as someone who has written over 100,000 lines of assembly code. The great majority of the time, when you're faced with a programming problem, you don't want to think about that problem in terms of bits and bytes and machine instructions and so on. You want to think about the problem in a more abstract way. After all, programming can be extremely difficult, and if you focus on the minutiae then you may never come up with a solution. And many high level abstractions simply do not exist in assembly language.
What does a closure look like in assembly? It doesn't exist as a concept. Even if you write code using closures in Lisp, compile to assembly language, and then look at the assembly language, the concept of a closure will not exist in the assembly listing. Period. Because it's a higher level concept. It's like talking about a piece of lumber when you're working on a molecular level. There's no such thing when you're viewing things in such a primitive way. "Lumber" only becomes a concept when you have a macroscopic view. Would you want to build a house using individual molecules or would you build a house out of lumber or brick?
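To illustrate the point (this example is mine, not the parent poster's): a C++ lambda that captures a variable, next to roughly what the compiler lowers it to. At the machine level only the struct's bytes and an ordinary call remain; the concept of a "closure" is gone.

    #include <cstdio>

    int main() {
        int step = 3;
        auto add_step = [step](int x) { return x + step; };   // the closure

        // Approximately what the lambda desugars to: a plain struct + function.
        struct AddStep {
            int step;
            int operator()(int x) const { return x + step; }
        } lowered{step};

        std::printf("%d %d\n", add_step(4), lowered(4));      // prints "7 7"
        return 0;
    }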
Here goes.... (Score:3, Interesting)
many high level abstractions simply do not exist in assembly language.
Consider the following assembly language code:
Okay, so this is a little snippet of some assembly language I've just recently worked on. Here's the declaration for the input file:
That's it. Is this readable? Is it abstracted at a high enough level? The primary difference between assembly and an HLL is that in assembly one must invent one's own logical abstractions for a real-world problem, where languages such as C/C++ simply provide them.
You've probably noticed that I'm using a lot of macros. In fact, classes, polymorphism, inheritance, and virtual functions are all easily implemented with macros. I'm using NASM right now (though I'm using my own macro processor), and it works very well. Because I understand both the high-level concepts and low level details, I can code rather high-level abstractions in a relatively low level language such as assembler. I get the best of both worlds: the ease of HLL abstraction with the power of low level coding.
Please tell me what you think of this - I would honestly like to know. For the past few years, I've been working on macro sets and libraries that make coding in assembly seem more like a HLL. I've also set rules for function calls, like a function must preserve all registers, except those which are used to pass parms. With a well developed library of classes and routines, I've found that I can develop applications quickly and painlessly. Because I stick to coding standards, I'm able to reuse quite a bit (> 50%) of my assembly code.
You might be tempted to ask, "Why not just write in a HLL then?" I do. In fact, I prefer to write in C++. But when the need arises, it's nice to be able to apply the same abstractions of a HLL in assembly. It just so happens that the need has arisen - I'm working on a project that will last a few weeks, and my boss doesn't consider it fiscally responsible to buy a $1200 compiler that will be used for such a short time.
Interestingly, the use of assembly has made me a better programmer. Assembly forces one to think about what one is doing before coding the solution, which usually results in better code. Assembly forces me to come up with new abstractions and solutions that fit the problem, rather than fitting the problem into any given HLL's logical paradigm. Once I prove that the abstract algorithm will indeed solve the problem, I'm then free to convert the algorithm into assembly. Notice that this is the opposite of the way most HLL coders go about writing code - they find a way in which to squeeze a real-world problem into the paradigm of the language used. Which leaves them at a loss when "leaky abstractions" occur. Assembly has the flexibility to adapt to the solution best suited to a problem, whereas HLLs, while very good at solving the particular problems for which they were designed, perform very poorly for solving problems outside of their logical paradigms. While assembly is easily surpassed by C/C++, Java, or VB for many problems, there are simply some problems that cannot be solved without it. But even if one never uses assembly professionally, learning it forces one to learn to develop logical abstractions on one's own - which in turn increases one's general problem-solving ability, regardless of the language in which one writes.
The key difference I see between a good assembly coder and an HLL coder is that the assembly language coder must invent high-level abstractions, whereas the HLL coder simply learns and uses them. So assembly is a bit more mental work.
Re:Here goes.... (Score:4, Insightful)
I've worked in a way similar to you, and I might still if it were as mindlessly simple to write assembly language programs under Windows as it was back in the day of smaller machines (i.e. no linker, no ugly DLL calling conventions, smaller instruction set, etc.). In addition to being fun, I agree in that assembly language is very useful when you need to develop your own abstractions that are very different from other languages, but it's a fine line. First, you have to really gain something substantial, not just a few microseconds of execution time and an executable that's ten kilobytes smaller. And second, sometimes you *think* you're developing a simpler abstraction, but by the time you're done you really haven't gained anything. It's like the classic newbie mistake of thinking that it's trivial to write a faster memcpy.
These days, I prefer to work the opposite way in these situations. Rather than writing directly in assembly, I try to come up with a workable abstraction. Then I write a simple interpreter for that abstraction in as high-level a language as I can (e.g. Lisp, Prolog). Then I work on ways of mechanically optimizing that symbolic representation, and eventually generate code (whether for a virtual machine or an existing assembly language). This is the best of both worlds: you get your own abstraction, you can work with assembly language, but you can mechanically handle the niggling details. If I come up with an optimization, then I can implement it, re-convert my symbolic code, and there it is. This assumes you're comfortable with the kind of programming promoted in books like _Structure and Interpretation of Computer Programs_ (maybe the best programming book ever written). To some extent, this is what you are doing with your macros, but you're working on a much lower level.
Re:The underlying problem with programming (Score:5, Insightful)
At this point, may I whisper the word "microcode" in your ear?
Chris Mattern
Re:The underlying problem with programming (Score:5, Interesting)
He's exactly right. No leaky abstractions? I once worked on a project that was delayed six months because a simple, three-line assembler routine that had to return 1 actually returned something else about one time in a thousand. The code was basically "Load x 5 direct; load y addr ind; subt x from y in place", and we could see in the logic analyzer that the contents of the address which was to be moved into register y were 6. Literally, 999 times in a thousand, that left a 1 in register y. The other time...
We sent the errata off to the manufacturer, who had the good grace to be horrified. It then took six months to figure out how to work around the problem.
And, hey, guess what? Semiconductor holes are a leaky abstraction, too. And don't get me started on subatomic particles.
Re:The underlying problem with programming (Score:3, Insightful)
Re:The underlying problem with programming (Score:5, Insightful)
High-level languages and abstractions aren't the problem, and neither are pointers in low-level languages. It's the people who can't use them.
Abstraction does mean that you should not have to care about the underlying mechanisms, not that you should not understand them.
Great examples (Score:5, Interesting)
"Problem: the ASP.NET designers needed to hide the fact that in HTML, there's no way to submit a form from a hyperlink. They do this by generating a few lines of JavaScript and attaching an onclick handler to the hyperlink. The abstraction leaks, though. If the end-user has JavaScript disabled, the ASP.NET application doesn't work correctly, and if the programmer doesn't understand what ASP.NET was abstracting away, they simply won't have any clue what is wrong."
This is a perfect illustration of why I will never hire a web developer who cannot write HTML by hand. The kids who are going to university nowadays and building everything using Visual Studio or Dreamweaver will never be able to ensure that a site runs in an acceptable cross-platform way.
Another favourite bugbear that this example reminds me of is people building page layouts using 4 levels of embedded tables where they really only needed 1 or 2.
Great article!
Re:Great examples (Score:2)
Re:Great examples (Score:2)
Mirrored Here (Score:2, Redundant)
By Joel Spolsky
November 11, 2002
There's a key piece of magic in the engineering of the Internet which you rely on every single day. It happens in the TCP protocol, one of the fundamental building blocks of the Internet.
TCP is a way to transmit data that is reliable. By this I mean: if you send a message over a network using TCP, it will arrive, and it won't be garbled or corrupted.
We use TCP for many things like fetching web pages and sending email. The reliability of TCP is why every exciting email from embezzling East Africans arrives in letter-perfect condition. O joy.
By comparison, there is another method of transmitting data called IP which is unreliable. Nobody promises that your data will arrive, and it might get messed up before it arrives. If you send a bunch of messages with IP, don't be surprised if only half of them arrive, and some of those are in a different order than the order in which they were sent, and some of them have been replaced by alternate messages, perhaps containing pictures of adorable baby orangutans, or more likely just a lot of unreadable garbage that looks like the subject line of Taiwanese spam.
Here's the magic part: TCP is built on top of IP. In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.
To illustrate why this is magic, consider the following morally equivalent, though somewhat ludicrous, scenario from the real world.
Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes. Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn't have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country. Hollywood Express works by checking that each actor arrives in perfect condition, and, if he doesn't, calling up the home office and requesting that the actor's identical twin be sent instead. If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn't even tell the movie directors in California what happened. To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.
That is, approximately, the magic of TCP. It is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers. As it turns out, a lot of computer programming consists of building abstractions. What is a string library? It's a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers. What is a file system? It's a way to pretend that a hard drive isn't really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes.
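For the programmers reading along, here is a toy sketch in C++ of the retransmit-until-delivered trick the Hollywood Express analogy describes. This is my simplification, not real TCP: no sequence numbers, no reordering, no windows - just resend over a lossy channel until the message gets through.

    #include <cstdio>
    #include <cstdlib>
    #include <optional>
    #include <string>

    // Unreliable "IP": delivers the message only about half the time.
    std::optional<std::string> unreliable_send(const std::string& msg) {
        if (std::rand() % 2 == 0) return std::nullopt;   // packet lost
        return msg;                                      // packet delivered
    }

    // Reliable "TCP": keep sending the identical twin until delivery succeeds.
    void reliable_send(const std::string& msg) {
        for (int attempt = 1;; ++attempt) {
            if (unreliable_send(msg)) {
                std::printf("delivered \"%s\" on attempt %d\n",
                            msg.c_str(), attempt);
                return;                                  // acknowledged
            }
            // no acknowledgement: loop around and retransmit
        }
    }

    int main() {
        reliable_send("actor #1");
        reliable_send("actor #2");
        return 0;
    }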
Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn't, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can't do anything about it and your message doesn't arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.
This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can't quite protect you from. This is but one example of what I've dubbed the Law of Leaky Abstractions:
All non-trivial abstractions, to some degree, are leaky.
Abstractions fail. Sometimes a little, sometimes a lot. There's leakage. Things go wrong. It happens all over the place when you have abstractions. Here are some examples.
Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the "grain of the wood" -- one direction may result in vastly more page faults than the other direction, and page faults are slow. Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it's really just an abstraction, which leaks when there's a page fault and certain memory fetches take way more nanoseconds than other memory fetches.
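A minimal sketch of that "grain of the wood" effect (my example, not the essay's): both functions below sum the same two-dimensional array, and only the traversal order differs, which decides whether memory is visited sequentially or in large strides.

    #include <cstddef>
    #include <vector>

    long long sum_row_major(const std::vector<std::vector<int>>& a) {
        long long s = 0;
        for (std::size_t i = 0; i < a.size(); ++i)         // walk along each row:
            for (std::size_t j = 0; j < a[i].size(); ++j)  // consecutive addresses
                s += a[i][j];
        return s;
    }

    long long sum_column_major(const std::vector<std::vector<int>>& a) {
        long long s = 0;
        if (a.empty()) return s;
        for (std::size_t j = 0; j < a[0].size(); ++j)      // walk down each column:
            for (std::size_t i = 0; i < a.size(); ++i)     // one big stride per step
                s += a[i][j];
        return s;
    }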
The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.
Even though network libraries like NFS and SMB let you treat files on remote machines "as if" they were local, sometimes the connection becomes very slow or goes down, and the file stops acting like it was local, and as a programmer you have to write code to deal with this. The abstraction of "remote file is the same as local file" leaks. Here's a concrete example for Unix sysadmins. If you put users' home directories on NFS-mounted drives (one abstraction), and your users create
C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + "bar" to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type "foo" + "bar", because string literals in C++ are always char*'s, never strings. The abstraction has sprung a leak that the language doesn't let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn't just add a native string class to the language itself eludes me at the moment.)
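The leak, in code (a small illustration of the paragraph above): concatenation works as soon as at least one operand is a std::string, but two bare literals are just pointers, and adding two pointers isn't concatenation - it doesn't even compile.

    #include <string>

    int main() {
        std::string a = std::string("foo") + "bar";   // fine: string + char*
        std::string b = "foo" + std::string("bar");   // fine: char* + string
        // std::string c = "foo" + "bar";             // error: can't add two pointers
        return 0;
    }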
And you can't drive as fast when it's raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it's raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can't see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.
One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I'm training someone to be a C++ programmer, it would be nice if I never had to teach them about char*'s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they'll write the code "foo" + "bar", and truly bizarre things will happen, and then I'll have to stop and teach them all about char*'s anyway. Or one day they'll be trying to call a Windows API function that is documented as having an OUT LPTSTR argument and they won't be able to understand how to call it until they learn about char*'s, and pointers, and Unicode, and wchar_t's, and the TCHAR header files, and all that stuff that leaks up.
In teaching someone about COM programming, it would be nice if I could just teach them how to use the Visual Studio wizards and all the code generation features, but if anything goes wrong, they will not have the vaguest idea what happened or how to debug it and recover from it. I'm going to have to teach them all about IUnknown and CLSIDs and ProgIDS and
In teaching someone about ASP.NET programming, it would be nice if I could just teach them that they can double-click on things and then write code that runs on the server when the user clicks on those things. Indeed ASP.NET abstracts away the difference between writing the HTML code to handle clicking on a hyperlink (<a>) and the code to handle clicking on a button. Problem: the ASP.NET designers needed to hide the fact that in HTML, there's no way to submit a form from a hyperlink. They do this by generating a few lines of JavaScript and attaching an onclick handler to the hyperlink. The abstraction leaks, though. If the end-user has JavaScript disabled, the ASP.NET application doesn't work correctly, and if the programmer doesn't understand what ASP.NET was abstracting away, they simply won't have any clue what is wrong.
The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying "learn how to do it manually first, then use the wizzy tool to save time." Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don't save us time learning.
And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.
During my first Microsoft internship, I wrote string libraries to run on the Macintosh. A typical assignment: write a version of strcat that returns a pointer to the end of the new string. A few lines of C code. Everything I did was right from K&R -- one thin book about the C programming language.
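Something like the following, sketched from the description (the original code isn't shown, so this is a guess at the shape of the assignment): a strcat variant that returns a pointer to the new terminating NUL, so repeated appends don't rescan the string. Plain K&R-style C that also compiles as C++.

    #include <cstdio>

    char* strcat_end(char* dst, const char* src) {
        while (*dst) ++dst;               // find the end of dst
        while ((*dst = *src) != '\0') {   // copy src, including the NUL
            ++dst;
            ++src;
        }
        return dst;                       // points at the terminating NUL
    }

    int main() {
        char buf[32] = "Mac";
        char* end = strcat_end(buf, "intosh");
        end = strcat_end(end, " rules");  // append again without rescanning
        std::printf("%s\n", buf);         // prints "Macintosh rules"
        return 0;
    }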
Today, to work on CityDesk, I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML. All high level tools compared to the old K&R stuff, but I still have to know the K&R stuff or I'm toast.
Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we've created over the years do allow us to deal with new orders of complexity in software development that we didn't have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks. And when you need to hire a programmer to do mostly VB programming, it's not good enough to hire a VB programmer, because they will get completely stuck in tar every time the VB abstraction leaks.
The Law of Leaky Abstractions is dragging us down.
My company, Fog Creek Software, has just released FogBUGZ 3.0, the latest version of our innovative system for managing the software development process. Check it out now!
Really? (Score:2, Informative)
Seems to have worked pretty well so far...
Good Points.... (Score:3, Insightful)
The more layers, the slower things get.
As computers speed up, more abstraction layers and next-generation languages become feasible.
Optimizing routines is still important.
This holds true for ALL layers, as users are always expecting faster and faster apps.
It's still important to know all the layers.
This allows old-timers to look down their noses at those whippersnappers.
Re:Good Points.... (Score:2, Funny)
And my first-year computer scientist coworker wonders why I think VB and
Re:Good Points.... (Score:2, Funny)
You can tell I'm tired. I read that line like this:
The more lawyers, the slower things get.
Well, it's true...
Re:Good Points.... (Score:2)
This allows old-timers to look down their noses at those whippersnappers.
Uhhh yeah, right. What are you going to do when it doesn't work as it should? You'll have to go get Dad to help you put it together.
This all reminds me of the Star Trek episode where people have become so reliant on machines that they are totally helpless when they break.
Good article - how to solve? (Score:5, Insightful)
Hire VB programmers with assembly language experience who are also network admins? No - the solution is not to hire people just with skill XYZ, but to hire people who are capable of thinking for themselves, doing research, solving problems, and RTFM...
It's a shame that so many companies hiring today are looking for skills X, Y, and Z... so some moron with X, Y, and Z on his resume gets hired, while someone who knows X, could learn Y and Z, and could outperform the moron gets overlooked...
Yet I see it happen at my IT Staffing company almost every day...
Re:Good article - how to solve? (Score:5, Interesting)
He just looked at me as if I had said something in Klingon.
I've been getting the distinct impression that the good programmer who knows his shit but doesn't have skills X, Y, and Z from the start is fucked because the people who do the hiring are clueless enough about programming that all they can do is watch for X, Y, and Z on a resume and fail to notice anything else.
They were right to skip on you (Score:2)
Knowing the specific tool is important. More important really than being well rounded. Look in any office and their best programmer is typically not the one with the grad degree, but the one who is willing to geek out and learn every detail about the particular language they use (sometimes these are the same person, often not).
Re:They were right to skip on you (Score:5, Insightful)
Knowing the specific tool is important. More important really than being well rounded.
Riiight. That is why you always want to hire someone who graduated from Computer College X rather than a CS or Engineering program. I mean they know Visual Basic so well they can't program without it, but they couldn't solve another problem if their life depended on it. Just who I want to hire!
Look in any office and their best programmer is typically not the one with the grad degree, but the one who is willing to geek out and learn every detail about the particular language they use
So wrong. Where do you work, so I can avoid ever working there?
The best programmers I work with are the smartest all-around people that also love their craft. Sure, they might know tool X or Y really well because they use it all of the time, but they also know a little bit about everything, and can learn tool Z at the drop of a hat. They also know that there are many tools/languages to solve a problem, and tend to use the best one.
The language/tool geek who knows every nook and cranny of the language they use but can't think outside of that box is the last person I want to work with. They create code that is unmaintainable because they make heavy use of some obscure language features, but when it comes time to work with another language or tool they are useless. And no matter how much a particular problem cries out for another language to be used, they try and cram their square language into my round problem hole. No thanks.
No, cheapest adequate solution wins (Score:2)
...and of course all of these highly skilled "classical" engineers want to work on your VB product, right?
Maybe, if you are willing to pay them twice the going rate for the minimally competent VB coder...and at the end of the day someone who is not analytical by nature but just knows VB itself really well will probably produce a better product. Critical thinking is useful, but knowing your tool very well is probably twice as valuable.
In any case as the barrier to entry into programming lowers, wages drop, and the cheapest adequate solution wins.
Re:Good article - how to solve? (Score:2)
Unfortunately, that's what it takes, sometimes. For example, I feel I'm a better software engineer since I studied for and obtained UNIX system and network administration certs. Why? The article says it very well. Even though I often work with J2EE-based web applications, for example, there are often failures in the web applications that are due to network infrastructure or system configuration. I have also found that I'm a better Java programmer due to having experience with C and C debuggers. C debuggers enforce a rigor in debugging that most high-level neophytes have not yet learned.
leaky abstractions need to be patched up (Score:2, Insightful)
I think the solution is that we need to have CERTIFIED software components. To really evolve the art of software engineering, we need to stop having to deal with low-level ideas and increasingly interact with higher-level abstractions. But Joel makes the point that any abstraction we may use is leaky, forcing us to understand how it works! So we need to start having certification processes for abstractions - I should be able to pick up a component/abstraction and believe in its specifications 100%. That way I can stop worrying about low-level details and start using components to build even more interesting systems/applications.
Re:leaky abstractions need to be patched up (Score:2)
Re:leaky abstractions need to be patched up (Score:3, Insightful)
Sure, that'd be nice I guess. And it'd be impossible, except for simple cases. Joel's example of the motorcar in the rain can't be fixed 100% unless you've got a weather-genie flying along above you.
Leaky abstractions have 2 main causes:
No, the solution is redundancy and realistic specs (Score:2)
It depends upon your specs! Remember, performance/reliability is designed in from the ground up. It can't be 'tested' in later!
Believing in a sub-component's specification 100% is important, but don't expect 100% uptime. In fact, be thankful for 98%. If you want better specs, add redundancy. Duh.
Using Joel's TCP model, if I have two computers (client & server) and in the middle of a request I physically power off the server, there is no way TCP can get my message there!
So you expect every message you send to get there? Even though you have put your message bytes into the TCP socket and TCP says "Ok! I'll deliver these!" If your peer isn't there anymore you won't find out for two minutes! Uh, oh!
So have another server, and have your own timeout. If you don't get your response back close your first socket and hot-swap.
Your certification is your mean-time-to-failure analysis and the redundancy used to mitigate it. Avoid single points of failure.
Your certification is within your own Business/Software Engineering process. You need X uptime. Can your components deliver?
So maybe certification is the solution, but only in terms of 1) spec'ing the margins of error and 2) testing to ensure your system can meet what it was spec'd for. Don't ever expect to pick up some COTS pieces and have them "just work"!
Personally, I didn't see what was so great about the article.
Re:leaky abstractions need to be patched up (Score:2)
Oh man, I forgot - when Joel says something we must follow and not question him. Two internships at Microsoft and the man is a messiah!
But seriously, take hardware engineering for example. When Boeing builds a plane, it can say with high confidence that 1) it won't crash due to mechanical failure, and 2) it'll last such-and-such years. Hardware components are often "certified" - until we evolve the software engineering process to be similarly reliable, we'll be stuck with a status quo of buggy software. Regardless of what Joel says, this CAN be done. Take NASA software for spacecraft - they RIGOROUSLY test every possibility and guarantee their specifications (they have no choice but to do so - if they don't, people die). Some have even suggested enacting "lemon laws" for software - basically, if software fails on you, you can sue the creator.
As technology becomes more of a support structure for our daily existence, we'll increasingly have to ensure its reliability.
Duh! (Score:2)
How is this news? All technologies, on some level, are inherently unreliable. Therefore, reliability is always obtained by adding some kind of redundancy to an unreliable tool.
I've never seen a technology touted as "reliable" that achieved that reliability without some kind of self-checking or redundancy somewhere. Maybe that's the author's point, but he makes it sound as if TCP/IP were unique in this regard.
This is what programming is all about. It seems pretty obvious to me.
Re:Duh! (Score:2)
I think he's just using it as an example that almost anyone can relate to in the internet age. And while it is obvious to those of us who code/administer/tinker with such things, his use of the "Hollywood actors" analogy would seem to indicate that his audience is not us.
The ultimate leaky abstraction (Score:5, Insightful)
I'm studying to be a bioinformatics guy at the University of Melbourne and have just had the misfortune of looking into the enzymatic reactions that control oxygen-based metabolism in the human body.
I tried to do a worst-case complexity analysis and gave up about halfway through the Krebs cycle. [virginia.edu]
When you think about it, most of basic science, some religion, and all of medicine have been about removing layers of abstraction to try and fix things when they go wrong.
This reinforces the important point... (Score:2, Insightful)
Any good programmer I've ever known started with the lower level stuff and was successful for this reason. Or at least plowed hard into the lower level stuff and learned it well when the time came, but the first scenario is preferable.
Throwing dreamweaver in some HTML kiddie's lap, as much as I love dreamweaver, is not going to get you a reliable Internet DB app.
This is true of almost any engineering profession (Score:2, Interesting)
Abstractions don't kill systems, people kill... (Score:2, Interesting)
Great article, but don't throw out the high level tools and go back to coding Assembler.
Leaky slashdotted server... (Score:3, Funny)
Unfortunately, his Slashdotted server is proving that to us right now.
Moore's law buries Leaky Abstractions law (Score:2, Insightful)
Maybe someone out there prefers to program without any abstraction layers at all, but they inherit so much complexity that it will be impossible for them to deliver a meaningful product in a reasonable time.
Re:Moore's law buries Leaky Abstractions law (Score:2)
Sometimes. But not all penalties can be resolved by more CPU cycles. A faster CPU can't repair your severed ethernet wire. It can't change all the existing HTML browsers and C++ compilers to cover up supposed "flaws".
And unless this CPU is awesome enough to enable real AI, it can't save us from future shortcomings in computer-interface languages either.
Visual Basic and abstraction breakdown (Score:5, Interesting)
Many of our troubles, though, come from a single abstraction leak: the Sheridan (now called Infragistics [infragistics.com]) Grid control.
Like most VB controls, the Sheridan Grid is designed to be a drop-in, no-code way to display database information. It's designed to be bound to a data control, which itself is a drop-in no-code connection to a database using ODBC (or whatever the flavor of the month happens to be).
The first leak comes into play because we don't use the data control. We generate SQL on the fly because we need to do things with our queries that go beyond the capabilities of the control, and we don't save to the database until the client clicks "OK". Right away, we've broken the Sheridan Grid's paradigm, and the abstraction started to leak. So we put in buckets -- bucketfuls of code in obscure control events to buffer up changes to be written when the form closes.
Just when things were running smoothly, Sheridan decided to take that kid with his finger in the dike and send him to an orphanage. They "upgraded" the control. The upgrade was designed to make the control more efficient, of course... but we don't use the data control! It completely broke all our code. Every single grid control in the application -- at least one and usually more in each of 200+ forms -- had to have all-new buckets installed to catch the leaks.
You may be wondering by now why we haven't switched to a better grid control. Sure enough, there are controls out there now that would meet 95% of our needs... but 1) that 5% has high client visibility and 2) the rest of the code works, by golly! No way we're going to rip it out unless we're absolutely forced to.
By the way, our application now compiles to a svelte 16.9 MEG...
A truly excellent essay (Score:2)
The law statement itself, "all non-trivial abstractions, to some degree, are leaky" may possibly get included in my personal "top 10" aphorisms manifesto.
Bad term chosen IMO (Score:2, Interesting)
I think the term 'ooze' would suit better in this case. It possesses a kind of dirtiness, and the feeling the word 'ooze' gives me fits well with the nature of the problem being described. :o)
Back to the article. To be serious, I think Joel mixed together very different things as examples of 'leaky abstraction', to no purpose. The situations are too different, and the concept falls apart. Here's what I mean:
In the case of TCP/IP it denotes the limits of the abstraction. And regardless of programming background, every sane person should know those limits exist.
In the case of page faults it's a matter of competence - there is no abstraction at all. You either know how your code is compiled and executed or you don't. It's the same as knowing what a phrase in a given language really means, or not. I'm simplifying here.
The case of C++ strings is the only good example I saw. What the experience of STL and string-class usage tells us here, in my opinion, is: one should fully understand the underlying mechanics before relying on the abstraction's behaviour.
In programming it is really simple to tell whether a given 'abstraction' will present you with an easter egg or not: if you can imagine the FSM for the abstraction, you will definitely know when to use it.
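For what it's worth, a toy sketch of what "imagining the FSM" might look like for a connection-style abstraction. The states and events here are a simplified, hypothetical lifecycle, not the real TCP state diagram:

enum class State { Closed, Connecting, Open, Failed };
enum class Event { Connect, Established, Send, Error, Close };

// Once the transition table is written down, it is obvious which inputs
// the abstraction cannot absorb (nothing ever recovers a Failed stream).
State step(State s, Event e) {
    switch (s) {
        case State::Closed:     return e == Event::Connect ? State::Connecting : State::Failed;
        case State::Connecting: return e == Event::Established ? State::Open
                                     : e == Event::Error       ? State::Failed : s;
        case State::Open:       return e == Event::Close ? State::Closed
                                     : e == Event::Error ? State::Failed : s;  // Send keeps it Open
        case State::Failed:     return State::Failed;
    }
    return State::Failed;   // unreachable, keeps compilers quiet
}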
other leaks (Score:2)
Since I can't get to the site and read the article, I'll tell some jokes.
"Leaky Abstractions?! Is this guy talking about Proctology??"
"Leaky Abstractions?! Someone get this guy a plumber!"
"Leaky Abstractions?! I knew we should have used the pill!"
-gerbik
1 question: (Score:2)
Leaky? There's got to be a better word. (Score:2, Interesting)
The first thing I do when I start in on a new technology (VBA, CGI, ASP, whatever) is to start digging in the corners and see where the model starts breaking down.
What first turned me on to Perl (I'm trying hard not to flamebait here) was the statement that the easy things should be easy, and the hard things possible.
But even Perl's abstraction of scalars could use a little fiber to move through the system. Turn strict and warnings on, and suddenly your "strings when you need 'em" stop being quite so flexible, and you start worrying about when it's really got something in it or not.
On the HTML coding model breaking down, my current least-fave is checkboxes: if unchecked, they don't return a value to the server in the query, making it hard to determine whether the user is coming at the script the first time and there's no value, or just didn't select a value.
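A sketch of the usual workaround, assuming nothing beyond the standard library; the Form type, the parse step it stands in for, and the field names are hypothetical. The trick is to pair the checkbox with a field that always submits, so absence of the checkbox key reads as a deliberate "no" rather than "never saw the form":

#include <map>
#include <string>

using Form = std::map<std::string, std::string>;   // stand-in for parsed query params

enum class Newsletter { NotAsked, OptedIn, OptedOut };

Newsletter newsletter_choice(const Form& form) {
    // "form_submitted" is a hidden input that always posts with the form
    if (form.count("form_submitted") == 0)
        return Newsletter::NotAsked;                 // first visit: absence means nothing yet
    return form.count("newsletter") ? Newsletter::OptedIn
                                    : Newsletter::OptedOut;  // submitted, box left unchecked
}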
Then there's always "This space intentionally left blank.*" Which I always footnote with "*...or it would have been if not for this notice." Sure sign of needing more regularity in your diet.
Not all abstractions are leaky (Score:2, Insightful)
I don't think that's true. It depends on how the abstraction is defined, what it claims to be.
You can use TCP without knowing how the internals work, and assume that all data will be reliably delivered, _unless_ the connection is broken. That is a better abstraction.
And the virtual memory abstraction doesn't say that all memory accesses are guaranteed to take the same amount of time, so I don't consider it to be leaky.
So I don't entirely agree with the author's conclusions.
This is an overrated rant about bad coding (Score:5, Insightful)
When I read what Joel wrote about "leaky abstractions" I saw a piece complaining about "unintended side-effects". I don't think the problem is with abstractions themselves, but rather the implementation.
He lists some examples:
1. TCP - This is a common one. Not only does TCP itself have peculiar behavior in less than ideal conditions, but it is also interfaced with via sockets, which compound the problem with an overly complex API.
If you were to improve on this and present a clean, reliable stream-transport abstraction, it would likely have a simple connection-establishment interface and some simple read/write functionality. Errors would be propagated up to the user via exceptions or event handlers. But the point I want to make is that this problem can be solved with a cleaner abstraction.
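A minimal sketch of what that cleaner abstraction could look like, assuming POSIX sockets on Linux/BSD; the class name and interface are illustrative, not any existing library:

#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

// Connection setup plus blocking read/write; every failure surfaces as an
// exception instead of errno checks scattered through caller code.
class TcpStream {
public:
    TcpStream(const std::string& host, const std::string& port) {
        addrinfo hints{};
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        addrinfo* res = nullptr;
        if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0)
            throw std::runtime_error("cannot resolve " + host);
        fd_ = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd_ < 0 || connect(fd_, res->ai_addr, res->ai_addrlen) < 0) {
            freeaddrinfo(res);
            throw std::runtime_error(std::string("connect failed: ") + std::strerror(errno));
        }
        freeaddrinfo(res);
    }
    TcpStream(const TcpStream&) = delete;             // copying omitted for brevity
    TcpStream& operator=(const TcpStream&) = delete;
    ~TcpStream() { if (fd_ >= 0) close(fd_); }

    void write_all(const std::string& data) {
        std::size_t sent = 0;
        while (sent < data.size()) {
            ssize_t n = send(fd_, data.data() + sent, data.size() - sent, 0);
            if (n <= 0)                                // the leak, surfaced deliberately
                throw std::runtime_error("connection lost mid-write");
            sent += static_cast<std::size_t>(n);
        }
    }

    std::string read_some(std::size_t max = 4096) {
        std::string buf(max, '\0');
        ssize_t n = recv(fd_, &buf[0], buf.size(), 0);
        if (n < 0) throw std::runtime_error("read error");
        buf.resize(static_cast<std::size_t>(n));       // zero bytes == orderly close by peer
        return buf;
    }

private:
    int fd_ = -1;
};

Whether the errors come back as exceptions or event callbacks is a taste question; the point is only that the failure modes are part of the interface rather than hidden behind it.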
2. SQL - This example is a straw man. The problem with SQL is not the abstraction it provides, but the complexity of dealing with unknown table sizes when you are trying to write fast generic queries. There is no way to ensure that a query runs fastest on all systems. Every system and environment is going to have different amounts and types of data. The amount of data in a table, the way it is indexed, and the relationships between records are what determine a query's speed. There will always be manual performance tweaking of truly complex SQL, simply because every scenario is different and the best solution will vary.
3. C++ string classes. I think this is another straw man. Templates and pointers in C++ are hard. That is all there is to it. Most Visual Basic-only coders will not be able to wrap their minds around the logic that is required to write complex C++ template code. No matter how good the abstractions get in C++, you will always have pointers, templates, and complexity. Sorry Joel, your VB coders are going to have to avoid C++ forever. There is simply no way around it. This abstraction was never meant to make things simple enough for Joe Programmer, but rather to provide an extensible, flexible tool for the programmer to use when dealing with string data. Most of the time this is simpler; sometimes it is more complex (try writing your own derived string class - there are a number of required constructors you must implement which are far from obvious), but the end result is that you have a flexible tool, not a leaky abstraction.
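To make that parenthetical concrete: even a toy value-type string needs the full set of special member functions before it behaves the way callers of std::string take for granted. A rough sketch only (small-string optimization, exception guarantees, iterators, etc. all omitted):

#include <cstddef>
#include <cstring>
#include <utility>

class TinyString {
public:
    TinyString() : data_(new char[1]{'\0'}), len_(0) {}
    TinyString(const char* s)
        : data_(new char[std::strlen(s) + 1]), len_(std::strlen(s)) {
        std::memcpy(data_, s, len_ + 1);
    }
    TinyString(const TinyString& other)                    // copy constructor
        : data_(new char[other.len_ + 1]), len_(other.len_) {
        std::memcpy(data_, other.data_, len_ + 1);
    }
    TinyString(TinyString&& other) noexcept                // move constructor
        : data_(other.data_), len_(other.len_) {
        other.data_ = nullptr;
        other.len_ = 0;
    }
    TinyString& operator=(TinyString other) noexcept {     // copy-and-swap assignment
        std::swap(data_, other.data_);
        std::swap(len_, other.len_);
        return *this;
    }
    ~TinyString() { delete[] data_; }

    const char* c_str() const { return data_ ? data_ : ""; }
    std::size_t size() const  { return len_; }

private:
    char* data_;
    std::size_t len_;
};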
There are some other examples, but you see the point. I think Joel has a good idea brewing regarding abstractions, complexity, and managing dependencies and unintended side-effects, but I do not think the problem is anywhere near as clear cut as he presents it. As a discipline, software engineering has a horrible track record of implementing arcane and overly complex abstractions for network programming (sockets and XTI), generic programming (templates, ref counting, custom allocators), and even operating system APIs (POSIX).
Until we can leave behind all of the cruft and failed experiments of the past, and start anew with complete and simple abstractions that do not mask behavior but rather recognize it and provide a mechanism to handle it gracefully, we will keep running into these problems.
Luckily, such problems are fixable - just write the code. If Joel were right and complex abstractions were fundamentally flawed, that would be a dark picture indeed for the future of software engineering (it is only going to grow ever more complex from here, kids - make no mistake about it).
Complexity Management (Score:3, Interesting)
The problem that this article points to is a byproduct of large scale software development primarily being an exercise in complexity management. Abstraction is the foremost tool available in order to reduce complexity.
In practice a person can keep track of between 4 and 11 different concepts at a time; the median lands around 5 or 6. If you want to do a self-experiment, have someone write down a list of twenty words, then spend 30 seconds looking at them (without using mnemonic devices such as anagrams to memorize them) and put the list away. After thirty more seconds, write down as many as you can recall.
This rule applies equally when attempting to manage a piece of software - you can only really keep track of between 4 and 11 "things" at the same time, so the most common practice is to abstract away complexity - you reduce an array of characters terminated by a null character, plus a set of functions designed to operate on that array, to a String. You went from half a dozen functions, a group of data pieces, and a pointer to a single concept - freeing up slots to pay attention to something else.
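As a small illustration of that slot-freeing, assuming nothing beyond the standard library: the raw-C version makes you juggle the buffer size, the terminator, and the right library calls, while the string version collapses all of it into one concept.

#include <cstdio>
#include <cstring>
#include <string>

void c_style(const char* user) {
    char greeting[64];                                   // you track the capacity...
    std::strcpy(greeting, "hello, ");                    // ...and the terminator...
    std::strncat(greeting, user,
                 sizeof(greeting) - std::strlen(greeting) - 1);  // ...and the overflow math
    std::puts(greeting);
}

void abstracted(const std::string& user) {
    std::string greeting = "hello, " + user;             // one concept: "a string"
    std::puts(greeting.c_str());
}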
The article is completely correct in its thesis that abstractions gloss over details and hide problems - they are designed to. Those details will stop you from being productive because the complexity in the project will rapidly outweigh your ability to pay attention to it.
This range of attention sneaks into quite a few places in software development.
Other schemes exist for managing complexity, but abstraction is decidedly human - you don't open a door, rotate, sit down backwards, rotate again, bend legs, position your feet, extend left arm, grasp door, pull door shut, insert key in ignition, extend right arm above left shoulder, grasp seatbelt, etc... you start the car. Software development is no different.
There exist people who can track vast amounts of information in their heads at one time - look at Emacs - IIRC RMS famously wrote it as he did because he could keep track of what everything did; no one else can, though. There also exist mnemonic devices aside from abstraction for managing complexity - naming conventions, taxonomies, making notes, etc.
-Frums
Pessimism gone rampant (Score:5, Insightful)
We have examples of massively complex systems that work very reliably day-in and day-out. Jet airplanes, for one; the national communications infrastructure, for another. Airplanes are, on the whole, amazingly reliable. The communications infrastructure, on the other hand, suffers numerous small faults, but they're quickly corrected and we go on. Both have some obvious leaky abstractions.
The argument works out to be pessimism, pure and simple -- and unwarranted pessimism to boot. If it were true that things were all that bad, programmers would all _need_ to understand, in gruesome detail, the microarchitectures they're coding to, how instructions are executed, the full intricacies of the compiler, etc. All of these are leaky abstractions from time to time. They'd also need to understand every line of libc, the entire design of X11 top to bottom, and how their disk device driver works. For almost everyone, this simply isn't true. How many web designers, or even communications applications writers, know -- to the specification level -- how TCP/IP works? How many non-commo programmers?
The point is that sometimes you need to know a _little bit_ about the place where the abstraction can leak. You don't need to know the lower layer exhaustively. A truly competent C programmer may need to know a bit about the architecture of their platform (or not -- it's better to write portable code) but they surely do not need to be a competent assembly programmer. A competent web designer may need to know something about HTML, but not the full intricacies of it. And so forth.
Yes, the abstractions leak. Sometimes you get around this by having one person who knows the lower layer inside and out. Sometimes you delve down into the abstraction yourself. And sometimes, you say that, if the form fails because it needs JavaScript and the user turned off JavaScript, it's the user's fault and mandate JavaScript be turned on -- in fact, a _good_ high-level tool would generate defensive code to put a message on the user's screen telling them that, in the absence of JavaScript, things will fail (i.e. the tool itself can save the programmer from the leaky abstraction).
What Ian Malcolm says, when you boil it all down, is that complex systems simply can't work in a sustained fashion. We have numerous examples which disprove the theory. That doesn't mean that we don't need to worry about failure cases, it means we overengineer and build in failsafes and error-correcting logic and so forth. What Joel Spolsky says is that you can't abstract away complexity because the abstractions leak. Again, there are numerous examples where we've done exactly that, and the abstraction has performed perfectly adequately for the vast majority of users. Someone needs to understand the complex part and maintain the abstraction -- the rest of us can get on with what we're doing, which may be just as complex, one layer up. We can, and do, stand on the shoulders of giants all the time -- we don't need to fully understand the giants to make use of their work.
Neal Stephenson... (Score:4, Interesting)
Interesting stuff...
It also happens in Math (Score:3, Interesting)
The problem is that in real Math, you often need a slightly different result, or you cannot prove that the hypotheses are true in your situation. The solution often involves understanding what's "under the hood" in the theorem, so that you can modify the proof a little bit and use it.
Every professional mathematician knows how to prove the theorems that he/she uses. There is no such thing as a "high-level mathematician" who doesn't really know the basics but only uses sophisticated theorems on top of each other. The same should be true in programming, and this is what the article is about.
The solution? Good education. If anyone wants to be considered a professional programmer, he/she should have a basic understanding of digital electronics, micro-processor design, assembly language (at least one), OS architecture, C, some object-oriented language, databases... and should be able to understand the relationship between all those things, because when things go wrong, you may have to go down to any of the levels.
It's a lot of things to learn, but there is no other way out. Building software is a difficult task and whoever sells you something else lies.
Use a better language if leaking abstractions (Score:3, Interesting)
Humans form abstractions. That's what we do. If your abstractions are leaking with detrimental consequences, then it could be because the programming language implementation you're using is deficient, not because you shouldn't be abstracting.
Try a high-performance Common Lisp compiler some time. Strong dynamic typing and optional static typing, macros, first-class functions, generic-function OO, restartable conditions, and first-class symbols and package systems make abstraction much easier and less prone to arbitrary decisions and problems that are really:
(i) workarounds for the methods-in-one-class rule of "ordinary" single-dispatch OO
(ii) workarounds for the association of what an object is with the name of the object rather than with the object itself (static typing is really saying "this variable can only hold this type of object"; dynamic typing is saying "the object is of this type"). Some languages mix these issues up, or fail to recognise the distinction.
(iii) workarounds for the fact that most languages, unlike forth and lisp, are not themselves extensible for new abstractions
(iv) workarounds for the fact that one cannot pass functions as parameters to functions in some languages (doesn't apply to C, thanks to function pointers - here's where the odd fact comes in that low-level languages are often easier to form new abstractions in; see the sketch after this comment)
(v) workarounds for namespace issues
(vi) workarounds for crappy or nonexistent exception processing
Plus, Common Lisp's incremental compile cycle means faster development, and its defined behaviours for in-place modification of running programs make it good for high-availability systems.
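Point (iv) in the list above, sketched in C-flavoured C++: a plain function pointer is enough to hand one function to another, which is the seed of the higher-order abstractions being described here.

#include <cstdio>

// apply a caller-supplied function twice to a value
int apply_twice(int (*f)(int), int x) { return f(f(x)); }

int increment(int x) { return x + 1; }

int main() {
    std::printf("%d\n", apply_twice(increment, 5));   // prints 7
}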
Argh. (Score:4, Insightful)
"A lot of the stuff the C++ committe added to the language was to support a string class. Why didn't they just add a built-in string type?"
It's good that a string class wasn't added, because that led to templates being added! And templates are the greatest thing, ever!
The comment shows a total lack of understanding of post-template, modern C++. People are free not to like C++ (or aspects of it) and to disagree with me about templates, of course, and in that case I'm fine with them taking stabs at it. But I get peeved when people who have just given the language a cursory glance try to fault it. If you haven't used stuff like Loki or Boost, or taken a look at some of the fascinating new design techniques that C++ has enabled, then you're in no place to comment about the language. At least read something like the newer editions of D&E or "The C++ Programming Language" then read "Modern C++" before spouting off.
PS> Of course, I'm not accusing the author of being unknowledgeable about C++ or anything of the sort. I'm just saying that this particular comment sounded rather n00b'ish, so to speak.
The Irony of the Shiny New Thing (Score:5, Interesting)
One thing I liked especially is the danger of the Shiny New Thing. It may be neat and cool and save time, but knowing how to use it does not mean that you can do anything else - or function outside of it.
Right now I'm on an ASP.NET project - and some ASP.NET stuff I actually like. But the IDE actually makes it harder to program responsibly, and even utilize
A friend of mine just about strangled some web developers he worked with as they ONLY use tools (and they love all the Shiny New Ones) and barely know what the tools produce. This has led to hideous issues of having to configure servers and designs to work with their products as opposed to them actually knowing how they work. The guy's a saint, I swear.
I think managers and employers need to be aware of how abstract things can get, and realize good programmers can "drill down" from one layer to another to fix things. A Shiny New Thing made with Shiny New Things does NOT mean the people who did it are talented programmers, or that they can haul your butt out of a jam when the Shiny New Thing loses its shine.
Too much abstraction is a bad thing (Score:3, Insightful)
abstractions == models (Score:3, Insightful)
That's all an abstraction is: a model. Just like Newtonian physics, supply and demand under perfect competition, and every other hard or soft scientific model. Supply and demand breaks down at the low end (you can't be a market participant if you haven't eaten in a month) and the high end (if you are very wealthy, you can change the very rules of the game). Actually, supply and demand breaks down in many ways, all the time. Physics breaks down at the very large or very small scales. Planetary orbits have wobbles that can only be explained by more complex theories. Etc.
No one should pretend that the models are complete. Or even pretend that complete models are possible. However, the models help you understand. They help you find better solutions (patterns) to problems. They help you discuss and comprehend and write about a problem. They allow you to focus on invariants (and even invariants break down).
All models are imperfect. It's good that computer science folks can understand this; however, I don't think Joel should use a term like "leaky abstraction". Calling it that implies the existence of an "unleaky abstraction", which is impossible. These are all just "abstractions", and the leaks are unavoidable.
Example: if I unplug the computer and drop it out of a window, the software will fail. That's a leak, isn't it? Think of how you would address that in your model: maybe another computer watches this one so it can take over if it dies..etc..more complexity, more abstractions, more leaks....
He also points out that, basically, computer science isn't exempt from the complexity, specialization, and growing body of understanding that accompanies every scientific field. Yeah, these days you have to know quite a bit of stuff about every part of a computer system in order to write truly reliable programs and understand what they are doing. And it will only get more complex as time goes on.
But what else can we do, go back to the Apple II? (actually that's not a bad idea. That was the most reliable machine I've ever owned!)
Non-leaky abstractions (Score:3, Interesting)
It's useful to distinguish between performance-related leaks and correctness leaks. SQL offers an abstraction for which the underlying database layout is irrelevant except for performance issues. The performance issues may be major, but at least you don't have to worry about correctness.
C++ is notorious for this; the language adds abstractions with "gotchas" inside. If you try to get the C++ standards committee to clean things up, you always hear 1) that would break some legacy code somewhere, even if we can't find any examples of such code anywhere in any open source distro or Microsoft distro, or 2) that only bothers people who aren't "l33t".
Hardware people used to insist that everything you needed to know to use a part had to be on the datasheet. This is less true today, because hardware designers are so constrained on power, space, heat, and cost all at once.
Re:timeout (Score:3, Insightful)
TCP for the bored (Score:5, Insightful)
So in your case of timing out N, re-tx'ing N, and then getting the response to the first N back after sending the second N, you do a few things:
1) Good! You got yr packet!
2) keep track of how many bytes you have received thus far (TCP is not sending messages, it is sending a stream)
3) when you get the response from your second request, discard it, because you already received those bytes from the stream.
4) since you timed out, DON'T use the Round Trip Time for that response: back off your RTT estimate, and THEN start measuring (sketched at the end of this comment).
And guess what? If I unplug the NIC of the other machine, there is no reliable way of transmitting that data (assuming your destination machine isn't dual-homed) - so I keep streaming bytes to a TCP socket and I don't find out my peer is gone for approx. 2 minutes.
WOW. There's nothing reliable about that boundary condition!
My point is that TCP is reliable ENOUGH. But I wouldn't equate it with a Maytag warranty. It is not a panacea. In fact, for a closed homogeneous network I wouldn't even consider it the best option. But if the boundary conditions fall within the acceptable fudge range (remember, real-time, human-grade systems are not 100% reliable, only 99.99999%, and much of that is achieved through redundancy), your leaks are OK.
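A rough sketch of the timer bookkeeping step 4 is pointing at, with illustrative constants rather than the exact RFC arithmetic: samples from retransmitted segments are ignored (Karn's rule), and a timeout backs the timer off instead.

#include <cmath>

struct RtoEstimator {
    double srtt   = 1.0;    // smoothed round-trip time, seconds
    double rttvar = 0.5;    // RTT variance estimate
    double rto    = 3.0;    // current retransmission timeout

    void on_ack(double sample_rtt, bool segment_was_retransmitted) {
        if (segment_was_retransmitted) return;          // Karn: ambiguous sample, skip it
        rttvar = 0.75 * rttvar + 0.25 * std::fabs(srtt - sample_rtt);
        srtt   = 0.875 * srtt + 0.125 * sample_rtt;
        rto    = srtt + 4.0 * rttvar;
    }

    void on_timeout() { rto *= 2.0; }                   // exponential backoff before retrying
};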
I'm sorry.. (Score:3, Funny)
It seems like your keyboard keeps dropping packets. Could we have a repost of this comment?
Re:timeout (Score:2)
Re:timeout (Score:4, Informative)
A timeout is a legal result by the TCP specification, but it's not reliable, because your data didn't make it through.
By the IP specification, your data might not make it either- and that's a legal result because the spec allows it to drop packets for any reason at all. That doesn't mean IP is reliable, just that it obeys its own definition.
Of course, no real protocol can ever meet this restrictive definition of reliable. Some maniac can always cut through your wires or incinerate your CPUs. Calling TCP a "reliable protocol" is just shorthand for "as much more reliable than the underlying protocols as we could manage".
The timeout you mention does make TCP more reliable than IP, because it alerts you to the data loss, where the application can possibly take steps to retransmit it sometime in the future.
But it's not as if TCP could ever achieve the perfect reliability that the simplest, most abstract description of it would imply. Which is why, as the author says, those who rely on the abstractions can get bitten later.
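One concrete way an application "takes steps" at that point, assuming POSIX sockets (error handling trimmed for brevity): put a deadline on the read, so a stalled peer surfaces as a visible condition instead of an indefinite hang.

#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/time.h>

// Returns true if data arrived, false if the deadline passed; the caller
// then decides whether to retransmit, reconnect, or give up.
bool recv_with_deadline(int fd, char* buf, std::size_t len, int seconds) {
    timeval tv{};
    tv.tv_sec = seconds;
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return false;                                   // the leak, made visible on schedule
    return n > 0;
}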
Re:timeout (Score:2)
1. Data either arrives intact or doesn't arrive at all.
2. Data always arrives in the order it was sent
3. Data is never duplicated.
You can safely rely on all of these facts. Calling TCP a guarantee that your packets arrive is leaky, but TCP doesn't claim that.
So if you don't abstract TCP more than you are meant to, it is a non-leaky abstraction.
Re:What? (Score:2)
Re:What? (Score:4, Insightful)
UDP is also built on IP, but adds almost nothing to it (aside from the ability to target individual port numbers instead of an entire computer), so it's not any more reliable than IP.
The internet's developers could've built TCP on top of UDP, but for unknown reasons they didn't. Maybe they felt that a whole extra layer of abstraction would've been useless overhead.
Re:WTF? (Score:2)
> essays . . . but you can retransmit freaking TCP packets!
He does go on to say that "retransmission" in this case means sending an identical twin of the actor. Your criticism really isn't fair in ignoring this. The metaphor of identical twins for copies of information is not perfect, but as a teaching tool for explaining TCP to a non-technical type, I'm not sure I could come up with another metaphor which matches this one for its intuitive value.
joelindenial (Score:5, Interesting)
In physics the abstractions leak. Newton's laws leak like crazy. Einstein's theories leak. Presently there are no fundamental theories in physics which don't leak like crazy when quantum mechanics and gravity interact.
In sports the abstractions leak. That's how we get players like Gretzky and pay a lot of money to watch what they do.
And how about the reason C++ didn't define a native string type: because there isn't any way to implement a string class that serves all possible applications. The premise of C++ is not being stuck with someone else's choice about which part of the abstraction should leak. Because C++ doesn't define a native string type, the user is free to replace the default standard string implementation with any other string implementation and have it integrate with the language on an equal footing with the standard string type.
If a language imposes standard abstractions, it only takes one abstraction you can't live with to make that choice of language untenable. Which is how C++ has been so successful despite being the worst of all possible languages (except for all the others).
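One concrete reading of that "equal footing" claim, assuming only the standard library: because the string type is a library template rather than a built-in, a project can plug in its own allocator (the ArenaAllocator below is a made-up stand-in, not a real library type) or swap in a different string class altogether, and callers use the result exactly like std::string.

#include <cstddef>
#include <new>
#include <string>

template <typename T>
struct ArenaAllocator {                       // hypothetical pool/arena allocator
    using value_type = T;
    ArenaAllocator() = default;
    template <typename U> ArenaAllocator(const ArenaAllocator<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

using arena_string =
    std::basic_string<char, std::char_traits<char>, ArenaAllocator<char>>;

arena_string greeting() { return arena_string("hello"); }   // used like any other string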
Re:a leaky abstraction is a wrong abstraction (Score:3, Insightful)
Re:"leaky abstractions" my foot. (Score:3, Insightful)