Programming As If Performance Mattered 615
Junks Jerzey writes "Saw the essay 'Programming as if Performance Mattered', by James Hague, mentioned at the Lambda the Ultimate programming language weblog. This is the first modern and sensible spin on how optimization has changed over the years. The big 'gotcha' in the middle caught me by surprise. An inspiring read." Hague begins: "Are particular performance problems perennial? Can we never escape them? This essay is an attempt to look at things from a different point of view, to put performance into proper perspective."
The question I always ask is (Score:5, Insightful)
What annoys me (Score:4, Insightful)
a sad indictment
the software taketh what the hardware giveth. (Score:4, Insightful)
I've seen glenz vectors and roto-zoomers on the Commodore 64.
Modern OSes, especially Windows, seem super-sluggish when you see what is possible on those old computers if you just care to optimize the code to the max.
You don't optimize, that's the job of the compiler (Score:2, Insightful)
That's the difference between modern languages and more archaic ones. Sure, you can't get the absolute best hand-tuned result, but the interpreter/compiler doing its job will probably produce better optimizations than any you could think of yourself.
The only thing that really needs optimization is streamlining data structures, because the compiler can't predict which parts of a data structure go unused at runtime. You just need to make sure you use the right data structure for the job and put the basic pen-and-paper (optimized) algorithm down in plain code. No strange hacker tricks needed.
Re:Funny thing about performance (Score:5, Insightful)
Assuming there is a second version, which there may not be because potential customers found that the performance of v1.0 sucked.
Premature Optimization (Score:5, Insightful)
The first was to get a SQL query to run faster: a simple matter of creating a view and supporting indexes.
The second was also SQL related, but on a different level: the code was making many small queries to the same data structures. Simply pulling the relevant subset into a hash table and accessing it from there fixed that one.
The most recent one was more complex: it was similar to the second SQL problem (lots of high-overhead small queries) but with a more complex structure. I built an object that caches the data in a set of hashes and "emulates" the MoveNext()/EOF() ADO-style access the code expected.
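To make that concrete, here's a minimal C++ sketch of the same pattern (the Record type and the bulk query result are hypothetical stand-ins for whatever the real code returns):

    #include <cstddef>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical row type standing in for whatever the real query returns.
    struct Record {
        int key;
        std::string value;
    };

    // Replaces many small queries with one bulk load plus hash lookups,
    // exposing the ADO-style MoveNext()/Eof() iteration the old code expected.
    class RecordCache {
    public:
        explicit RecordCache(const std::vector<Record>& bulkQueryResult) {
            for (const Record& r : bulkQueryResult)
                rows_[r.key].push_back(r);
        }
        void Find(int key) { current_ = &rows_[key]; pos_ = 0; }  // position cursor
        bool Eof() const { return !current_ || pos_ >= current_->size(); }
        void MoveNext() { ++pos_; }
        const Record& Current() const { return (*current_)[pos_]; }
    private:
        std::unordered_map<int, std::vector<Record>> rows_;
        std::vector<Record>* current_ = nullptr;
        std::size_t pos_ = 0;
    };

The point is that the expensive part (the round trip to the database) happens once, while the iteration interface the calling code expects is preserved.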
We have also had minor performance issues with the XML documents we throw around; we may have to fix that in the future.
Point? None of this is "low level optimization": it is simply reviewing the performance data we collect on the production system to determine where we spend the most time, and making high-level structural changes. In the case of SQL vs. a hash-based cache, we got a 10-fold speed increase simply by not heading back to the DB so often.
Irony? There are plenty of other places where similar caches could be built, but you won't see me rushing out to do so. For the most part performance has held up in the face of thousands of users *without* resorting to even rudimentary optimization. Modern hardware is scary fast for business applications.
Re:The question I always ask is (Score:5, Insightful)
Re:Funny thing about performance (Score:5, Insightful)
However, I will dispute the claim that performance gains happen only at the hardware level - although programmers cannot really optimize every tiny bit, there is no harm in encouraging good programming.
The thing is that a lot of programmers today have grown NOT to respect the need for performance - they just assume that upcoming systems will have really fast processors and infinite amounts of RAM and disk space, and write shitty code.
I agree with Knuth that premature optimization is the root of all evil. However, writing absolutely non-optimized code is evil in itself - when a simple problem can be simplified in order and time, it's criminal not to.
A lot of times, programmers (mostly the non-CS folks who jumped on the programming bandwagon) write really bad code, leaving a lot of room for optimization. IMHO, this is a very bad practice, something we have not really been paying much attention to because we always have faster computers coming up.
If we ever do hit the hardware barrier, I'm sure this will show through.
Re:The question I always ask is (Score:3, Insightful)
The question I ask is, can the server handle the load any other way? As far as my company is concerned, my time is worth nothing. They pay me either way. The only issue is, and will always be, will it work? Throwing more hardware at the problem has never solved a single performance problem, ever.
We've been slashdotted twice. In the pipeline are a database request, a SOAP interaction, a custom Apache module, and an XSLT transform.
Our server never even came close to its breaking point. I attribute it to optimizing for performance.
Depends on your target (Score:5, Insightful)
When operations take several seconds, a user gets annoyed. The program is perceived to be junk and the user begins looking for something else that can do the job faster. It doesn't matter if productivity is actually enhanced. It just matters that it's perceived to be enhanced, or that the potential is there.
You also have to consider if the time taken to complete an operation is just because of laziness. If you can easily make it faster, there's little excuse not to.
For distributed apps you have to consider the cost of hardware. It may cost several hours of labor to optimize but it may save you the cost of a system or few.
In the world of games, half a second per operation works out to 2 frames per second, which is far from acceptable. Users expect at minimum 30 frames per second. It's up to the developer to decide what's the lowest system they'll try to hit that target on.
You have to consider the number of users that will have that system vs the amount it will cost to optimize the code that far.
In terms of games you also have to consider that time wasted is time possibly better spent making the graphics look better. You could have an unoptimized mesh rendering routine, or a very fast one and time left over to apply all the latest bells and whistles the graphics card has to offer.
There are countless factors in determining when something is optimized enough. Games more so than apps. Sometimes you just need to get it out the door and say "it's good enough."
Ben
Re:Don't agree (Score:5, Insightful)
Clear concise programs that allow the programmer to understand -- and easily modify -- what is really happening matter more than worrying about (often) irrelevant details. This is certainly influenced by the language chosen.
e.g. I'm working on a large F77 program (ugh...) that I am certain would be much _faster_ in C++ simply because I could actually understand what the code was doing, rather than trying to trace through tens (if not hundreds) of goto statements. Not to mention actually being able to use CS concepts developed over the past 30 years...
Re:The question I always ask is (Score:3, Insightful)
To a certain extent. I've seen that excuse for some pretty bad/slow code out there.
Writing efficient and somewhat optimised code is like writing readable, extensible code: if you design and write with that in mind, you usually get 90% of it done for very, very little (if any) extra work. Bolt it on later and you usually get a mess that doesn't actually do what you intended.
A good programmer should always keep both clean code and fast code in mind while writing software.
optimize with discretion (Score:3, Insightful)
In the business world, you have to satisfy market demands and thus cannot take an endless amount of time to produce a highly optimized product. However, unless you are Microsoft, it is very difficult to succeed by quickly shoving a slow pile of crap out the door and calling it "version 1".
So where do you optimize? Where do you concentrate your limited amount of time before you miss the window of opportunity for your product?
I know plenty of folks in academia who would scoff at what I'm about to say, but I'll say it anyway: just because something could be faster doesn't mean it has to be. If you could spend X hours or Y days tweaking a piece of code to run faster, would it be worth it? Not necessarily. It depends on several things, and there's no really good formula; each case ought to be evaluated individually. For instance, if you're talking about a nightly maintenance task that runs between 2am and 4am when nobody is on the system, resource consumption doesn't matter, etc., then why bother making it run faster? If you have an answer, then good for you, but maybe you don't, and should thus leave that 2 hour maintenance task alone and spend your time doing something else.
For people who are really into performance optimization, I say get into hardware design or academia, because the rest of the business world doesn't really seem to make time for "doing things right" (just an observation, not my opinion).
One thing new programmers often miss (Score:3, Insightful)
In a high-level interpreted language with nice syntax--mine is Python, not Erlang, but the same arguments apply--it's easier to write clean, lean code. So high-level languages lead to (c)leaner code, which is faster code. I often find that by choosing the right approach and implementing it in an elegant way, I get performance far better than I was expecting. And if what I was expecting would have been "fast enough", I'm done -- without optimizing.
Re:speed/easy coding (Score:4, Insightful)
Code tweaking (Score:5, Insightful)
Every time I'm tempted to start micro-optimizing, I remind myself of the following three simple rules:
1) Don't.
2) If you feel tempted to violate rule 1, at least wait until you've finished writing the program.
3) Non-trivial programs are never finished.
Re:What annoys me (Score:5, Insightful)
Re:speed/easy coding (Score:4, Insightful)
Say you're a large business, and you have a mix of client-side and server-side applications. Both have significant processing time requirements. Which do you spend more time optimizing?
In this scenario, you're going to have a large number of client machines and a small number of servers. If servers need a little more power, you can upgrade the machine without too much disruption or money spent. The upgrade will benefit all users of the system. In this case, it's more cost effective to upgrade the server than it is to pay developers to optimize the hell out of the code.
The client machines are a different story. There are a lot of machines in use. Upgrading any one will only help the user of that computer. Optimizing the code will help every user. In this case, paying a developer to optimize your code will be a lot cheaper than doing a company-wide hardware upgrade.
This is all assuming, of course, that you're designing things well in the first place. You should do things like use a quicksort (or whatever may be more appropriate in the case at hand) instead of a bubble sort. The point is it's not worth spending days to get the last 1% of performance.
Re:Funny thing about performance (Score:2, Insightful)
Better a version 1.0 that sucked than none at all.
And funny how Microsoft seems to release so many crappy 1.0 releases yet usually ends up clawing back to become the market leader.
Re:What annoys me (Score:2, Insightful)
Re:You don't optimize, that's the job of the compi (Score:5, Insightful)
I remember looking over something once that was clear, simple and very slow. It was a set of at least twenty if statements, testing the input and setting a variable. The input was tested against values in numeric order, and the variable was set the same way - not even else-ifs, so the code had to go through every statement no matter the value. I rewrote it as a single if, testing whether the input was in the appropriate range and calculating the variable's value. No compiler is going to do that. Brute force can be clear, simple and slow.
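A hedged reconstruction of that rewrite in C++ (the actual values are invented for illustration):

    // Before: twenty sequential tests, all evaluated for every input.
    int lookup_slow(int input) {
        int v = 0;
        if (input == 1) v = 10;
        if (input == 2) v = 20;
        if (input == 3) v = 30;
        // ... seventeen more of these ...
        return v;
    }

    // After: one range test and a calculation. No compiler will discover
    // the arithmetic relationship hiding inside the if-chain for you.
    int lookup_fast(int input) {
        if (input >= 1 && input <= 20)
            return input * 10;
        return 0;
    }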
Performance, an aspect of design and understanding (Score:3, Insightful)
That is why I insist on "optimization" in the beginning. Not peephole optimization - but design optimization. Designs (or "patterns", in the latest terminology) that are fast are also naturally simple. And simple - while hard to come up with initially - is easy to understand.
But that's also why I discount any "high level language is easier" statement, like the one this fellow makes. It is significantly harder to come up with a good architecture than to learn to handle a "hard" language. If you can't do the former (including understanding resource allocation, threads, and other basic concepts), you certainly aren't going to do the latter. Visual Basic is not an inherently bad language in which nobody can program well. It just attracts bad programmers.
And the same goes for many of the newer "Basics": these "managed languages" that let people "code" without really understanding what they're doing. Sure, you get lines of code that way. But you don't get a good product.
And then the whole thing falls apart.
Bad performance is built in. (Score:3, Insightful)
1. Mathematically impossible to do it any other way.
2. Modularity.
Of course crap code/logic also counts, but it can be rewritten.
The problem with modularity is that it forces us to break certain functions down at arbitrary points. This is handy for reusing code, of course, and it saves us a lot of work. It's the main reason we can build the huge systems we build today. However, it comes with a price.
While I don't really know how to solve this practically, it could be solved by writing code that never ever calls other code. In other words, the entire program would be custom-written from beginning to end for this one purpose. Sort of like a novel which tells one complete story and is one unified and self-contained package.
Programs are actually written more like chapters in the mother of all choose-your-own-adventure books. Trying to run the program causes an insane amount of page flipping for the computer (metaphorically and actually).
Of course this approach is much more flexible and allows us to build off of the massive code that came before us, but it is also not a very efficient way to think about things.
Personally, I think the languages are still the problem, because of where they draw the line for abstractions. They limit you to thinking within very small boxes and force you to express yourself in limited ways. In other words, your painting can be as big as you want, but you only get one color (a single return value, in many languages). It is like we're still stuck at the Model T stage of language development - it comes in any color you want, as long as it's black!
Followed your link (Score:4, Insightful)
2. Assuming your stuff is good, when are you going to code up SHA-1 (*MY* favorite hash)?
3. On the server side of things, I would argue that correctness is more important than otherwise. If an app crashes 1 in 100 times for a desktop user, the developer blames Windows and the user is satisfied (don't flame me on this, please). On the server, if the app crashes 1 in 100 times, it may bring down the transactions of hundreds of users, making things very bad for the developer. For non-crash correctness problems, consider a bug which makes a minor but cumulative error in subsequent runs. That would likely be disastrous in the server situation.
As far as clarity, find me one developer who has taken over a project and not complained about the quality of the inherited code ever. Seriously. (that's not directed at parent)
right for the wrong reasons (Score:5, Insightful)
Sigh. One of the best sources of flamebait is being right for the wrong reasons.
Surely C++ must rate as the least well understood language of all time. The horrors of C++ are almost entirely syntactic, beginning with the decision to maintain compatibility with the C language type declaration syntax and then adding several layers of abstraction complexity (most notably namespaces and templates).
There are only two areas where I fear C++ for program correctness. The first is making a syntactic brain fart leading to an incorrect operator resolution or some such. These can be tedious to ferret out, but most of these battles are fought with the compiler long before a defect makes it into the production codebase.
My second source of fear concerns interactions of exception unwinding across mixtures of object oriented and generic components. I see this as the only case where managed memory provides a significant advantage: where your program must incorporate exception handling. If you can't manage your memory correctly in the absence of the exception handling mechanism, I really don't believe you can code anything else in your application correctly either.
I think exceptions are mostly a salvation for poor code structure. If all your code constructs are properly guarded, you don't need an error return path. Once a statement fails to achieve a precondition for the code that follows, the code path that follows becomes a very efficient "do nothing" exercise until control is returned to a higher layer by the normal return path, whereupon the higher layer can test whether the objectives were achieved and take appropriate measures. I think the stupidest optimization in all of programming is cutting a "quick out" error return path that skips the normal path of program execution, so that the normal path can play fast and loose with guard predicates.
The four languages I use regularly are C, C++, PHP, and Perl. Perl is the language I'm least fond of maintaining. Too many semantic edge cases that offer no compelling advantage to motivate remembering the quirk. C++ has many strange cases, but for C++ I can remember the vast majority of these well enough, because I've stopped to think about how they evolved from taking the C language as a starting point.
I happen to love PHP for the property of being the most forgettable of all languages. I forget everything I know about PHP after every program I write, and it never slows me down the next time I sit down to write another PHP program. The managed memory model of PHP appeals to me in a way that Java doesn't, because as an inherently session-oriented programming model, PHP has a good excuse for behaving this way.
I have a love/hate relationship with both C and C++. I write one program at a high level of abstraction in C++ and then when I return to C it feels like a breath of fresh air to live for a while in an abstraction free zone, until the first time I need to write a correctness safe string manipulation more complicated than a single sprintf, and then I scream in despair.
The part of my brain that writes correct code writes correct code equally easily in all of these languages, with Perl 5 slightly in the rear.
If I really, really, really want correct code, I will always use C++. The genericity facilities of C++ create an entire dimension of correctness calculus with no analog in most other programming languages. The template type mechanism in C++ is a pure functional programming language just as hard core as Haskell, but because C++ is a multi-paradigm language, in C++ you only have to pull out the functional programming hammer for the slice of your problem where nothing less will do.
What I won't dispute is that C++ is a hard language to master to the level of proficiency where it becomes correctness friendly. It demands a certain degree of meticulous typing skill (not typing = for ==). It demands an unflagging determination to master the sometimes baffling corners of the language.
This isn't an article about optimization (Score:5, Insightful)
The analogy is all wrong. These days there are two distinct types of "optimization": algorithmic, and the traditional "to the metal" style.
During college I worked with the English department training English students to use computers as their work had to be done on a computer. (This was before laptops were commonplace) The theory was that word processing allowed students a new window into language communication. To be able to quickly and painlessly reorganize phrases, sentences and paragraphs showed the students how context, clarity and meaning could change just by moving stuff around.
This is what the author has discovered. That by being able to move code actions around, he can experiment and "play" with the algorithm to boost speed while keeping error introduction to a minimum. (Ye olde basic anyone?)
He mistakenly equates this to "advanced technologies" like virtual machines and automatic memory buffer checking. In reality, we've just removed the "advanced" technologies (i.e., pointers, dynamic memory allocation, etc.) from the process. (Ye olde BASIC, anyone?)
There's nothing wrong with this. Though I am a C++ programmer by trade, I was far more productive when I was professionally programming Java. But that was because I had LESS creative control over the solution, thanks to the language syntax: no changing passed-in variables, no multiple inheritance, etc. So when I'm thinking about how to lay out the code, there's a pretty limited set of ways I can go about it.
It's like the difference between having the Crayola box of 8 crayons and the mondo-uber box of 64. If you're going to color the green grass with the box of 8, you've got: Green. If you've got 64 colors, you're going to agonize over blue-green, green-blue, lime green, yellow-green, pine green and GREEN.
That doesn't make C++ less "safe" than Java. Sure, you can overwrite memory. But you can also create a Memory class in C++ ONCE that monitors the overflow situation FOR you, so you never have to worry about it again.
But back to optimization:
66 fps seems really fast. But in a game context it's still kind of meaningless. Here's why: you're not just displaying uncompressed images. You're also doing AI, physics, scoring, digital sound generation, dynamic music, user input, possibly networking. As a game programmer, you don't stop at 66 fps. Because if you can do 132 fps, then you can run at 66 fps and still have half your CPU time left over to do some smarter AI or pathfinding. Or if you get it up to 264 fps, then you can spend a quarter of each cycle doing rendering, and maybe you can add true dynamic voice synthesis so you don't have to prerecord all your speech!
Ultimately, my point is this (and I think this is what the author intended): you're going to get bugs in whatever language you write in. That's the nature of the beast. VMs and 4th generation languages take away the nitty gritty of programming while still providing a lot of performance power. And in a lot of cases, that's a good thing. But it's still nothing more than a "model" of what's really going on in the hardware. If you're really going to push the limits of the machine, you have to be able to control all aspects of it. Now, it's getting harder to do that in Windows; we spend more time coding to the OS than to the metal. But in the embedded systems category, and in console video game systems, the metal still reigns, and if you're going to develop a game that pushes the hardware, you're going to need a programming language that lets you speak machine language. Not one that's going to protect you from yourself.
As it was in the beginning, as it always will be: Right tool for the right job.
Depends on the design and the bottleneck (Score:3, Insightful)
You have to consider the entire system design when looking at the best place to make the optimization. You need to look at what the bottleneck is and attack that, but keep in mind the issues in upgrading the system.
Re:Throw hardware at it. (Score:4, Insightful)
Well, that depends.
You probably picked the simplest, dumbest algorithm and probably used the most basic data structure. Why do all of the hard work when you don't even know if the easy work will suffice?
If they don't suffice, your options are to develop your own algorithm (or find a better one) and a more natural data structure, or to throw hardware at it. Chances are you won't be lucky enough that you can just upgrade, so you'll have to spend valuable programmer time implementing a more complex algorithm that needs more careful maintenance, is likely to have more bugs, and is probably less robust. You'll probably have to convert the data to a more machine-friendly format. Maybe you'll have to inconvenience the user or ship a lot of precompiled data. Whatever.
It's rare that the easy algorithm is slow enough that it won't do as-is, but fast enough that doubling cpu power makes it tolerable. Usually there are orders of magnitude differences between the "best" algorithm and the easy algorithm, and only incremental speed bumps in computer offerings.
On the other hand, maybe with an extra GB of RAM you'll never have to touch swap. Maybe that's good enough. ;)
Re:speed/easy coding (Score:5, Insightful)
On the server side, I'd say that correctness and clarity are even more important. I guess it's all a matter of opinion as to where the "sweet spot" is, but most programming involves finding the right balance between speed and clarity.
If you're in a situation where you need the servers to process large amounts of data, you're most likely in a position to be able to justify the expense of throwing better hardware at the problem.
LK
Re:Funny thing about performance (Score:5, Insightful)
Asymptotic performance (Score:5, Insightful)
You can spend all day optimizing your code to never have a cache-miss, a branch misprediction, divisions, square roots, or any other "slow" things. But if you designed an O(n^2) algorithm, my non-optimized O(n) algorithm is still going to beat it (for sufficiently large n).
If the asymptotic performance of your algorithm is good, then the author is right, and you may not find it worth your time to worry about further optimizations. If the asymptotic performance of your algorithm is bad, you may quickly find that moving it to better hardware doesn't help you so much.
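As an illustrative sketch (not from the article), here is that difference in C++ terms: the first version can be micro-tuned forever and still lose to the second for large n:

    #include <cstddef>
    #include <unordered_set>
    #include <vector>

    // O(n^2): compares every pair, no matter how well-tuned the inner loop is.
    bool has_duplicate_quadratic(const std::vector<int>& v) {
        for (std::size_t i = 0; i < v.size(); ++i)
            for (std::size_t j = i + 1; j < v.size(); ++j)
                if (v[i] == v[j]) return true;
        return false;
    }

    // O(n): a single pass with a hash set wins for sufficiently large n.
    bool has_duplicate_linear(const std::vector<int>& v) {
        std::unordered_set<int> seen;
        for (int x : v)
            if (!seen.insert(x).second) return true;  // insert failed => duplicate
        return false;
    }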
Alan
Re:Don't agree (Score:5, Insightful)
I don't think that fits into the description the article was talking about.
The point of this article is not targeted at you. I've seen interns as recently as last year complain about the same things mentioned in the article: division is slow, floating point is slow, missed branch prediction is slow, use MMX whenever more than one float is used, etc.
The point I get out of the article is not to bother with what is wasteful at a low level, but to be concerned about the high levels. A common one I've seen lately is young programmers trying to pack all their floats into SSE2. Since that computation was not slow to begin with, they wonder why all their 'improvements' didn't speed up the code. Even the fact that they were doing a few hundred unnecessary matrix ops (each taking a few hundred CPU cycles) didn't show up in profiling. Their basic algorithms, in a few cases I'm thinking of, were either very wasteful or could have been improved by a few minor adjustments.
The article mentions some basic techniques: choosing a different algorithm, pruning data, caching a few previously computed results, finding commonalities in data to improve the algorithm. Those are timeless techniques, which you have probably already learned since you work on such a big system. Writing your code so that you can find and easily implement high-level changes is generally more important than rewriting some specific block of code to run in the fewest CPU cycles.
A very specific example: at the last place I worked, there was one eager asm coder who wrote template specializations of most of the STL classes for intrinsic types in pure asm. His code was high quality and had very few bugs. He rewrote memory management so there were almost no calls to the OS for memory. When we used his libraries, it DID result in some speed gains, enough to notice on wall-clock time.
However... unlike his spending hundreds of hours on this low-return fruit, I could spend a day with a profiler, find one of the slower-running functions or pieces of functionality, figure out what made it slow, and make some small improvements. Usually a little work on 'low-hanging fruit', stuff that gives a lot of result for a little bit of work, is the best place to look. For repeatedly computed values, I would sometimes cache a few results. Other times, I might see if a few system functions could be made to do the same work. On math-heavy functions, there were times when I'd look for a better solution, or an 'accurate enough but much faster' solution, using calculus. I'd never spend more than two days optimizing a bit of functionality, and I'd get better results than our 'optimize it in asm' guru.
Yes, I would spend a little time thinking about data locality (stuff in the CPU cache vs. ram) but typically that doesn't give me the biggest bang for the buck. But I'm not inherently wasteful, either. I still invert and multiply rather than divide (it's a habit), but I know that we have automatic vectorizers and both high-level and low-level optimizers in our compilers, and an out-of-order core with AT LEAST two floating point, two integer, and one memory interface unit.
And did I mention, I write software with realtime constraints; I'm constantly fighting my co-workers over CPU quotas. I read and refer to the intel and AMD processor documentation, but usually only to see which high-level functionality best lends itself to the hardware. I am tempted to 'go straight to the metal' occasionally, or to count the CPU cycles of everything, but I know that I can get bigger gains elsewhere. That's what the point of the article is, I believe.
Re:This guy is out on a limb (Score:2, Insightful)
I should have ended the post by typing "HEY, GRAMMER FREAKS! LOOK AT ME! I SUCK!" instead of writing that last sentence.
Quick, go get Fortran 95. (Score:3, Insightful)
Fortran 77 sucks.
But C++ sucks, in different ways.
Fortran 95 is a much better language than Fortran 77, and for many things, better than C++ as well.
It is practically a new language with an old name.
If you currently have a F77 code, it is almost certainly far better to start using Fortran 95.
Essentially all Fortran 95 implementations have compile and run-time checks which can make things as safe as Java, and when you take off the checks, things will run very fast. With the Intel Fortran 8.0, probably faster than anything other than hand-tuned assembly. You will probably whip GCC C++.
It is also quite doubtful you will get significantly better performance in C++.
No, I am not an old-fogey (I'm 35 now, programming since age 13). I learned Basic on the Apple II+, then Fortran 77 and C simultaneously when I got my first summer job. Then C++. Then Eiffel and Sather. Then Fortran 95.
Yes, indeed, fully knowing C++, I chose Fortran 95 for technical superiority in the problems I want and need to solve. (Sather was the best ever, but is now dead; Eiffel is good if you have sophisticated data structures and don't need multi-dim arrays; and F95 is best for anything linking to Fortran, with multi-dim arrays and modules but not objects.)
The problem is that C++ bugs, though less frequent than bugs in C, can be deep, subtle and severe. The language has very opaque bits. Include files are antediluvian. Pointers and references, baroque and archaic. The object model, brittle. Templates, powerful and dangerous. A hideous and error-prone syntax.
This is not the case in Fortran 95. Other than fully algorithmic ones, bugs are shallow.
"computer science" truly misunderestimates Fortran 95.
Re:You don't optimize, that's the job of the compi (Score:4, Insightful)
A compiler can do low-level optimization, but it can't figure out a better algorithm for you, and the simplest, least convoluted algorithm is usually not the fastest.
All the assembly language fiddling in the world -- by the optimizer or by hand -- will give you maybe a 2x performance over C, 10x over perl, but a better algorithm will often increase performance by many orders of magnitude.
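A classic small example of that scale difference (illustrative, not from the parent post):

    #include <cstdint>

    // Exponential: fib_naive(45) makes a few billion recursive calls.
    std::uint64_t fib_naive(int n) {
        return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
    }

    // Linear: fib_linear(45) does 45 additions. Same answer, many orders
    // of magnitude faster - a gap no amount of assembly fiddling can close.
    std::uint64_t fib_linear(int n) {
        std::uint64_t a = 0, b = 1;
        for (int i = 0; i < n; ++i) { std::uint64_t t = a + b; a = b; b = t; }
        return a;
    }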
Re:Clarity in Code (Score:2, Insightful)
Guilty. But at least I'm thinking about the poor SOB who's going to be maintaining my code. In fact we were just implementing new functionality using a superfast but arcane algorithm and were having trouble debugging it (mucho matrix maths - yuk). Instead of finishing that, we researched another algorithm that instead uses triply-nested loops with two conditionals. It won't be half as fast (because of conditionals within loops of course) but it will be a heck of a lot easier for my successor to maintain. (Took 10 minutes to implement and worked first time. Had to check I wasn't stuck in a BTL simulation)
Re:speed/easy coding (Score:3, Insightful)
Of course, a correct and fast server is much better than a correct and slow server.
Re:The question I always ask is (Score:5, Insightful)
Re:This guy is out on a limb (Score:4, Insightful)
Not only that, but even simple 2D games can need optimizing. Perhaps they need optimizing because they're on an inherently slow platform (like Flash or a cell phone), or perhaps because they're multiplayer (and games with bad network code are immediately obvious and usually fail miserably).
I find it strange that so many programmers here talk about things being "fast enough" or "not worth my time"... yet any article about Mozilla, OpenOffice, Windows, OS X, or damn near any software package with a GUI is filled with complaints about slowness and bloat.
Makes you wonder what IS worth their time.
Re:This isn't an article about optimization (Score:2, Insightful)
For instance, the famous 'Goto is considered harmful'.
In actual machine code, the processor's equivalent of 'goto' (usually called a 'jump') is one of the most common operations...
Another way of looking at this is antilock brakes in cars.
It's not so much that the 'new' way of doing things is really any better to a skilled user. But they sure help reduce headaches caused by a lack of skill on the part of a new and/or less talented person.
Re:Don't agree (Score:2, Insightful)
It doesn't sound like any low level work is going to get you anywhere in your simulation. The best bet is to buy lots of hardware to brute-force it...
Or might statistics/heuristics help? Pick the most likely region(s) (based on other theory) for computation, calculate that (those) first, and then work out into (assuming probability is smooth) less likely region(s).
As the article points out, these kinds of optimizations (very similar conceptually to those in the article) are those easiest to do in a very high-level language.
wtf (Score:4, Insightful)
BS.
First of all, Erlang won't catch logical or algorithmic errors, which are quite common when you're optimizing.
Second, you can optimize just fine in C++ the same way just as easily, IF YOU ARE A C++ programmer. You just try out some new techniques the same way you always do. So array bounds aren't checked. You get used to it and you just stop making that kind of mistake or else you get good at debugging it. Hey at least you have static type checking.
In fact you might be able to do a better job of optimization because you'll be able to see, right in front of you, low level opportunities for optimization and high level ones also. C++ programmers aren't automatically stupid and blinded by some 1:1 source line to assembly line ratio requirement.
Re:You don't optimize, that's the job of the compi (Score:5, Insightful)
Wrong. Dead wrong.
You don't micro-optimise unless the compiler doesn't do the job well enough. But nowadays, you almost never have to. Your superior brainpower can mostly be freed from the mundane details of your hardware and instead you can concentrate on using more suitable algorithms or data structures.
Indeed, the best thing you can do to get your code running fast is to write it with good abstractions. That way, when you find a performance problem, you can swap some old code out and swap some new code in and everything else will still work.
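For example (a minimal sketch with invented names), hide the slow part behind an interface and the later swap touches one class instead of the whole program:

    #include <memory>
    #include <string>

    // Callers only ever see this interface, never how lookups are done.
    struct Lookup {
        virtual ~Lookup() = default;
        virtual std::string find(int key) = 0;
    };

    struct LinearScanLookup : Lookup {        // the first, simple version
        std::string find(int /*key*/) override { /* slow scan */ return {}; }
    };

    struct IndexedLookup : Lookup {           // drop-in replacement after profiling
        std::string find(int /*key*/) override { /* hash/B-tree lookup */ return {}; }
    };

    // The swap is one line; everything written against Lookup still works.
    std::unique_ptr<Lookup> MakeLookup() {
        return std::make_unique<IndexedLookup>();
    }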
Re:Asymptotic performance (Score:5, Insightful)
To give a nice example: a colleague of mine worked on a program that took two months to execute (it consisted of finding the depth of all connections between all nodes in a graph containing 50,000 nodes). Since the customer needed to run this program once a month, this took far too long. So my colleague rewrote the whole program in assembly, which took him a few months, managing to reduce the required time to, indeed, one month.
My boss then asked me to take a look at it. Together with a mathematician I analysed the central function of the program, and we noticed that it was, basically, a matrix multiplication. We rewrote the program in Delphi in an hour or so, and reduced the required running time to less than an hour.
I won't spell out the lesson.
Re:Followed your link (Score:5, Insightful)
This may be true when you are producing libraries of math routines and similar stuff like you are doing. It doesn't hold an ounce of water when you do the sort of work I do. My projects are generally medium sized, mixed languages, developers of all different skill levels. Code clarity is far more important for 98% of the stuff we do. I need my juniors to be able to follow the code the seniors write, even if they can't write it themselves. The other 2% of the time it's fine to sacrifice clarity for speed to get the performance to an acceptable level on the target platform.
I have generally found that clear code is usually good code, as long as you are aware of the cost implications of your design decisions. For instance, I seem to recall the bubble sort (mentioned earlier) was actually faster than a qsort under some circumstances. Deep knowledge of your data will help you make the decision as to which is needed... don't just reach for that qsort; it may be the fastest in most cases, but not all.
Re:Funny thing about performance (Score:5, Insightful)
> Any tendency to optimize prematurely ought to be
> avoided, at least until after v1.0 ships.
Performance gains occur at the algorithm level. It doesn't matter how much hardware you throw at a problem if it needs to scale properly and you have an O(n^3) solution.
Re:Asymptotic performance (Score:2, Insightful)
You might be a good optimiser, or you might just be a good programmer. Your colleague however, is a bad programmer.
-Lasse
Persuasive, widely accepted, and wrong. (Score:1, Insightful)
Today they're widely accepted. Most of the responses on this page repeat them.
And that's probably why my 2GHz cpu doesn't really feel much snappier than the first computer I ever had, a 10MHz 286 box.
Re:Throw hardware at it. (Score:2, Insightful)
Most of my server-side code (JavaScript) works fast enough on the first try, with most of the optimization being in the design rather than in coding tweaks. I'll do other simple optimizations as I see them, if there's a noticeable slowdown and they won't adversely affect the clarity, maintainability, or reusability of the code. My biggest optimization on recent sites was an AES encryption/decryption library written in JavaScript, which I managed to speed up several fold. I've done research to determine the optimal tweaks for our servers, to get speedups without writing better code. Client-side things that work fast enough from the start I don't bother to optimize. 90% of the time, reusable code is much better than efficient code. Most of what I write is 10x as fast as it needs to be, and with our web sites, bandwidth is the most limiting factor. I do optimize a bit to reduce bandwidth, but not excessively. And we've spent roughly $0 a year on server hardware since probably the year 2000.
There's one last optimization we're planning to make that will eliminate 99% of the need for future optimization, allow us to write horribly inefficient code in favor of code reuse (such great reuse that our non-programmers can jump into development), and give us maximum performance from our database-driven web sites. It's a home-grown site generator, for all the pages that are created from a database but are otherwise static. The use of static pages means they can be gzipped and cached on the server, allowing for near-instantaneous page loads with low server overhead.
Then there's the CS side of the fence:
Right now I'm working on a capstone project that will be one big exercise in optimization: extremely 3D-, memory-, CPU-, and hard disk-intensive, working with data that's too big to fit into memory without culling but needs to be rendered at high frame rates anyway. So everything I just said can go out the window for the next couple of months.
Re:Funny thing about performance (Score:5, Insightful)
I still don't think you should start doing every single silly trick in your code, like unrolling loops by hand, unless there's a provable need to do so. Write clearly, use comments, and use a profiler to see what needs to be optimized.
That is coming from someone who used to write assembly, btw.
But here's the other side of the coin: I don't think he included better algorithms in "premature optimization". The same goes for having some clue about your underlying machine and architecture. And that's where most of the problem lies nowadays.
E.g., there is no way in heck that an O(n*n) algorithm can beat an O(log n) algorithm for large data sets, and data sets _are_ getting larger. No matter how much loop unrolling you do, no matter how cleverly you replace the loops to count downwards, it just won't. At best you'll manage to fool yourself that it runs fast enough on those 100-record test cases. Then it goes productive with a database with 350,000 records. (And that's a small one nowadays.) Poof, it needs two days to complete now.
And no hardware in the world will save you from that kind of a performance problem.
E.g., if most of the program's time is spent waiting for a database, there's no point in unrolling loops and such. You'll save... what? 100 CPU cycles, when you wait 100,000,000 cycles or more for a single SQL query? On the other hand, you'd be surprised how much of a difference it can make if you retrieve the data in a single SQL query, instead of causing a flurry of 1000 individual connect-query-close sequences.
(And you'd also be surprised how many clueless monkeys design their architecture without ever thinking of the database. They end up with a beautiful class architecture on paper, but a catastrophic flurry of queries when they actually have to read and write it.)
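A sketch of the difference (the Db type here is hypothetical, standing in for whatever database driver is in use):

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical driver interface; query() sends SQL and returns rows.
    struct Db {
        std::vector<std::pair<int, std::string>> query(const std::string& sql) {
            /* send the SQL over the wire */ (void)sql; return {};
        }
    };

    // Anti-pattern: one network round trip per id, ~10^8 cycles each.
    void load_one_by_one(Db& db, const std::vector<int>& ids,
                         std::map<int, std::string>& out) {
        for (int id : ids)
            for (auto& row : db.query(
                     "SELECT id, name FROM t WHERE id = " + std::to_string(id)))
                out[row.first] = row.second;
    }

    // One round trip: fetch the whole working set with a single query
    // (in real code, an IN (...) list built from the ids), then answer
    // everything from memory.
    void load_in_one_query(Db& db, std::map<int, std::string>& out) {
        for (auto& row : db.query("SELECT id, name FROM t"))
            out[row.first] = row.second;
    }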
E.g., if you're using EJB, it's a pointless exercise to optimize 100 CPU cycles away, when the RMI/IIOP remote call's overhead is at least somewhere between 1,000,000 and 2,000,000 CPU cycles by itself. That is, assuming that you don't also have network latency adding to that RPC time. On the other hand, optimizing the very design of your application, so it only uses 1 or 2 RPC calls, instead of a flurry of 1000 remote calls to individual getters and setters... well, that might just make or break the performance.
(And again, you'd be surprised how many people don't even know that those overheads exist. Much less actually design with them in mind.)
So in a nutshell, what I'm saying is: Optimize the algorithm and design, before you jump in to do silly micro-level tricks. That's where the real money is.
Re:me too... (Score:2, Insightful)
Who cares? I'll bet that for a typical techie, the collection of all the source code they've ever written is smaller than their current MP3 collection. By at least one order of magnitude. Give me readability.
and compiles faster (because the parser doesn't have to read and ignore all those whitespace characters).
Technically you are right. However, this effect will be utterly negligible next to the things that really take up time when compiling a C++ program, such as preprocessing #defines, expanding template code, checking types, resolving variable references, and generating machine code. C++ compilers are generally quite slow (compared to, say, Borland Delphi); this is due to the complexity of the language, not the whitespace.
Re:This guy is out on a limb (Score:3, Insightful)
I don't know why OpenOffice is slow, I've never analysed the way it works in enough detail. I'm sure the reason is fairly obvious to anyone who knows the code base well enough to comment.
Windows isn't really slow, but it has some annoying recently-added features that can slow you down; for instance, the user interface will try to open files of certain types to display information about them whenever you perform any operation on them, which isn't exactly helpful if opening the file is slow...
My only experience of using OSX is that it's blindingly fast. But I've used it for about 3 hours, so that's hardly conclusive.
But you see the pattern -- these systems are slowed down by features that are _necessarily_ slow. You couldn't have the same features without the performance problems they bring. Windows can't give you a preview of an image, or tell you how many files are in an archive, without opening it (although I _really_ wish it'd do it in another thread...). Mozilla can't support its really easy-to-write user interface extensions without an interpreted UI.
The people complaining are people who don't actually want these features, and don't see why they should suffer for them, which is a fair point.
Re:Funny thing about performance (Score:4, Insightful)
Certain algorithms take more-than-proportionately longer as the data size increases. For example, if you're writing route-planning software, each additional stop on a route might cause the number of calculations required to (roughly) double.
In such a case, having hardware which is twice as powerful would mean the run time would halve, but as soon as the user added two more data points, performance would be slower than on the original machine.
To clarify a tad, let's say FedEx decides to optimize the routes drivers in Montana are travelling. Assume that there are 10,000 stops and 200 drivers, and that your code runs in, say, an hour on FedEx's machines.
Assume that you've used an algorithm for which each additional data point doubles the amount of computation required. Now FedEx deciding to hire 10 more drivers means that your route planning software is going to take 2^10 times as long to plan their routes (since it doubles for each new data point, that's 2^1 for one driver, 2^2 for two, 2^3 for three...).
The point is that tiny operations add up when you've chosen the wrong algorithm. Despite the fact that runtime was fine using FedEx's CPU farm in the original situation, your disregard for efficiency will cause the route-planning time to take not the overnight-batch-job-friendly hour, but a stunning 1024 times as long (hint: over a month).
Say a new big fast machine enters the market, with four times the CPU power. FedEx will still need 256 times as many machines to perform the same calculations in under an hour, or at least, say, 32 times as many in order to be able to perform them overnight.
All because you decided that choosing algorithms based on performance was poppycock.
Prematurely optimizing on a microscopic level may be "bad", but choosing the proper algorithm can make the difference between a product with a future and a product with too many designed-in limitations to be able to handle greater-than-expected loads.
(CS fans will note that the TSP was an unrefined example to have pulled out, given the whole P/NP thing, but that's the point -- sticky situations can and will arise for which no amount of source-level optimization will save the day.)
Somebody doesn't understand O notation... (Score:3, Insightful)
You totally missed the point, didn't you? There are situations where a bubble sort is faster than a merge sort or a quicksort. It has almost no setup overhead, so if you're sorting sufficiently small arrays (and what I remember from CS101 is that "sufficiently small" goes up to about 1000 members) bubble sort is actually significantly faster.
So, as a matter of fact, if you had to sort a million small arrays, bubble sort would be the only feasible option.
Those who code so no one can do their work... (Score:2, Insightful)
Given that the only reason to deliberately make it hard for others to understand your work is to increase your job security, that must mean that you don't think you bring enough other skills to the job to keep it on merit.
In other words, you don't think you're good enough.
And given that most programmers think they are better than they actually are, if you don't think you're good enough, why the hell should anyone else?
JIT optimization is just peephole optimization (Score:3, Insightful)
People keep saying that the JIT-style optimizers in .NET and Java can radically optimize the application "for programmers who can't or won't."
Peephole optimization and clock scheduling are among the simplest of optimizations. The machine looks at a few low-level instructions and substitutes an alternative that operates identically but with better performance. That's really all the VM has the time or the capability to perform today.
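To illustrate the scale of what a peephole pass can do (a sketch at the source level, not actual VM output):

    // Peephole-level: the optimizer happily rewrites this multiply as a
    // shift (x << 3). A few instructions in, a few instructions out.
    unsigned scaled(unsigned x) { return x * 8; }

    // Algorithm-level: no optimizer knows the array is sorted, so this
    // linear scan is never rewritten into a binary search. That's on you.
    bool contains(const int* sorted, int n, int x) {
        for (int i = 0; i < n; ++i)
            if (sorted[i] == x) return true;
        return false;
    }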
Mid-range optimizations include vectorizing, unrolling of loops, and register reduction. These are still machine-analyzable, so I expect the JIT-style optimizers to continue to make strides here.
But I don't think you're ever going to see JIT-style optimizers which replace an O(n^2) algorithm with an O(log n) algorithm. That is real optimization. That's where you win the performance races. That's the one that programmers should care about, and should learn how to do. The level of analysis required to "divine" the whole meaning of a large routine, realize the alternative algorithm equivalent, and fix up the code is far beyond any JIT solution.
I think we will have to wait far longer than the 6 GHz Longhorn machines before you see any meaningful machine optimization of sloppy code.
Re:Funny thing about performance (Score:4, Insightful)
Scarily, you have just enough knowledge to sound like you know what you're talking about. Sometimes it DOES matter how much hardware you throw at the problem, lest you forget the specialized hardware DESIGNED to crack DES.
How about if, in your next computer, I replace all the carry-lookahead adders with ripple-carry adders? Please look up those terms if you don't know them. I'm sure you'd be unpleasantly surprised.
Re:Throw hardware at it. (Score:4, Insightful)
The problem your prof is probably trying to get you to avoid is wasting time tuning code that rarely gets executed. It comes down to the old 80/20 rule. Sure, you can spend weeks hand tuning some import routine, but all your time was wasted if that import is only run once a month, at night while the system is offline.
Re:me too... (Score:3, Insightful)
This is a well-known problem, (Score:3, Insightful)
Re:Throw hardware at it. (Score:4, Insightful)
Optimization is great, but profiling to make sure that your optimization isn't wasted is more important.
Re:Performance tuning. (Score:5, Insightful)
The most effective, most used (if unintentionally used) development methodology is the prototype methodology. The first pass is simply a reality check: can we even accomplish what needs to be accomplished on the hardware and development tools we have available? The prototype is then shown to management as a proof of concept, to show them that their ideas are possible, and then a second generation is re-engineered from the ground up, using the lessons learned in the first generation as the foundation for a solid, well-engineered deliverable product. This breaks down in one of two ways: management says "screw the rewrite, let's just run what we have", or the developers are not smart enough to understand that their first pass wasn't production quality code, only a prototype.
What your client has right now is a prototype, a proof of concept. It "works" in the same sense that a kite flies: as a demonstration that the concept is viable, but not meant for real work. You could probably push a big kite hard enough to "fly" two people, but that doesn't make it a good idea. You could continue to "tweak" the kite and even double its performance, getting 4 people off the ground - but I wouldn't recommend using it for commercial applications.
Odds are the app needs to be understood from top to bottom, so that a set of software engineers knows the concepts: what the package is intended to do, how it currently does it, and what the expectations are for performance and growth. Then the SEs who understand it need to rewrite it from the ground up, developing performance-engineered code that is production quality.
Better compilers, clean code (Score:2, Insightful)
I don't know Erlang, but if it is a pure functional language, the compiler/interpreter can use "special" optimizations: e.g., a chain of list-processing functions need not produce intermediate lists, because the compiler can use lazy evaluation when decoding the data.
My point is that many optimizations do not sacrifice readability. Many times it is possible to refactor slow code in a way that improves both readability and execution speed, but you must know the pros and cons of the tools you are using!
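The same effect exists outside pure functional languages; for instance (a sketch using C++20 ranges), a filter/transform chain is evaluated lazily, element by element, without building any intermediate containers:

    #include <iostream>
    #include <ranges>
    #include <vector>

    int main() {
        std::vector<int> data{1, 2, 3, 4, 5, 6};

        // Nothing is computed here: views are lazy, so no temporary
        // vector is ever built for the filtered or squared elements.
        auto evens_squared = data
            | std::views::filter([](int x) { return x % 2 == 0; })
            | std::views::transform([](int x) { return x * x; });

        for (int x : evens_squared)    // elements are produced on demand
            std::cout << x << ' ';     // prints: 4 16 36
    }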
Ummmm... (Score:2, Insightful)
Re:Throw hardware at it. (Score:4, Insightful)
On the other hand, if it's something that hundreds of people are going to be using four or five times a day, then it's probably worthwhile to do some algorithmic/data structure improvements.
Finally, you get the extreme case: some library that will end up being used by millions. Those are the times when you want to eke out every bit of performance you can. The size of the project doesn't always determine its importance, nor does the importance of the project always determine how much optimization is needed.
what's the score (Score:3, Insightful)
He picks Targa decoding as his example, despite the fact that nobody (other than him) seems to be interested in Targa graphics these days, and that the total amount of data involved (less than a megabyte) is minuscule compared to the limiting bandwidths in modern computers (ranging from ~100 MB/sec for the disk I/O subsystem, to ~500 MB/sec for main memory, to multiple GB/sec for the caches and internal registers).
Then he shows us just what a wonderful benchmark this program is: it is so wonderful that it runs nearly instantaneously! Maybe it's just me, but I like my benchmarks to take a little while to complete, either because I don't trust the sub-second accuracy of the system time routines, or because I like to get a reasonable sample of the system states contributing to the overall performance of the benchmark.
Next, the guy tells us how he used unsatisfactory tools to implement an ill-conceived algorithm and, glory-glory, later fixed his own dumb-ass mistakes!
Finally, he claims that he would not have been able to make the same kinds of algorithmic adjustments if he had implemented in C rather than Erlang, though it is not obvious why it would have been more difficult (and he doesn't give any argument to support his assertion).
From this exercise he concludes that, GASP, optimization is still important!
What a senseless waste of skin.
Re:Somebody doesn't understand O notation... (Score:4, Insightful)
For a better analysis of optimization in this specific part of the sort space, I recommend Jon Bentley's classic "Engineering a sort function [utexas.edu]".
This paper discusses how to implement an optimal sort, after doing real-life measurements. Conclusions include dropping to an O(N^2) sort algorithm when the qsort partitions become small enough - insertion sort was chosen. (The selected cutoff was seven elements at that point; it may be that it would be sensible to choose a higher cutoff for the generic case now, as cache locality might help. However, I won't bet on this either way without doing measurements.)
The qsort implemented there is the one still used in at least FreeBSD. I don't know the status for other OSen.
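The cutoff idea looks roughly like this (a simplified sketch, not the actual Bentley-McIlroy or FreeBSD code; the threshold of seven comes from the paper):

    #include <cstddef>
    #include <utility>
    #include <vector>

    const std::size_t kCutoff = 7;   // small-partition threshold from the paper

    void insertion_sort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
        for (std::size_t i = lo + 1; i <= hi; ++i)
            for (std::size_t j = i; j > lo && a[j - 1] > a[j]; --j)
                std::swap(a[j - 1], a[j]);
    }

    void quicksort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
        if (hi <= lo) return;
        if (hi - lo + 1 <= kCutoff) {        // small partition: the O(N^2)
            insertion_sort(a, lo, hi);       // sort wins on setup overhead
            return;
        }
        std::size_t i = lo;                  // plain partition, pivot = a[hi]
        for (std::size_t j = lo; j < hi; ++j)
            if (a[j] < a[hi]) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi]);
        if (i > lo) quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }

Call it as quicksort(v, 0, v.size() - 1) on a non-empty vector; the inclusive-bounds convention matches the recursion above.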
As for big-O notation: the discussion in the previous post is so imprecise as to be misleading. It uses "cost" and "complexity" where it means asymptotic complexity; these are distinctly different, and it is necessary to be quite clear on the distinction to do correct analyses.
Big-O notation measures asymptotic complexity over an arbitrarily selected set of basic operations assumed to have unit cost. It discards all constants to make the analysis easy to do and easy to work with. This is a useful tool, but it only measures asymptotic complexity, and it only does so in terms of arbitrary basic operations.
In practice, a mere factor-of-1000 speed difference (one second versus roughly twenty minutes) might be quite noticeable. This is exactly what a big-O analysis REMOVES, which can make it point in a quite different direction from the truth.
In the parent post, sorting 1000 elements is assigned a unit cost, claiming that the time will be similar for a bubble sort and a quick sort, and "low enough not to matter". Further, the conclusion is "never use bubble sort". Assuming a naive implementation of both bubble sort and quicksort, and a set of arrays that is already sorted, the quicksort will be O(N^2) and the bubble sort will be O(N) in the number of items in each bin. That is a quite noticeable difference in asymptotic complexity.
A naive programmer is, in my opinion, the only relevant assumption if we're to give absolute advice on simple sort functions. A non-naive programmer will know how to do complexity evaluation, will know the startup tradeoffs of the various algorithms, and will only be implementing a sort him- or herself because actual speed measurements or specific knowledge of the sort behaviour show that the system-supplied sort is not fast enough for the case in question, and that a custom sort can do better. (S)he will also evaluate whether the data to sort is likely to be almost sorted or highly random, and thus which kind of algorithm is likely to go faster. (And insertion sort/bubble sort is actually faster even for large data sets if they're almost sorted beforehand.)
Eivind, who if he had to give general advice would give "evaluate qsort, mergesort, heapsort, insertion sort, and using a data structure that keeps order before choosing bubble sort."
Useless article (Score:3, Insightful)
Sure, if you're talking about puzzle games.
However, for most retail games pushing the graphics as far as possible is important. If you can squeeze a 5% improvement out of the engine you can use the freed up time to make the game a bit prettier. Or put another way, art expands to fill available processing power. Graphics blocking on the video card? Well, you can use processing power for increasingly realistic physics simulations and artificial intelligence.
If you play a lot of games you know that there is great variance between games. Some games coast along at a bare minimum while others surprise you with their ability to create compelling visuals with older hardware.
Ummmm, just about anyone sane? Wow, decoding a measly megabyte of data. And the encoding? Simple run-length encoding. That's not a real programming problem; that's a homework assignment for Computer Science 101. If you're keen on loading graphics, you could at least pick something that is slightly complicated, like JPEG.
Is it true that optimization is massively overrated; that most programs are plenty fast? Sure. But this article doesn't provide a bit of evidence for that.
Scientific computing (Score:2, Insightful)
Get programming as if SIZE matters... (Score:2, Insightful)
I think size is a far more serious problem than speed-- just because I can put multi-gigabytes in my PC doesn't mean I want to waste them loading bloatware. In fact, I probably wouldn't NEED the multi-gigabytes if it wasn't for code bloat. Of course, Gates loves such things because the machine retailers love them-- easier to sell more memory to someone who needs it because they just tried to upgrade to XP or something.
So which compilers space-optimize by rolling loops instead of unrolling them?
Broken C++ code (Score:3, Insightful)
vector <pair <int, float> > myVector
and
vector<pair<int,float>>myVector
It's only subtle until you try to compile it: without the spaces, the compiler parses the closing >> as a right-shift operator and rejects the declaration.
Derek
That guy doesn't know ANYTHING about performance (Score:1, Insightful)