
Programming As If Performance Mattered 615

Junks Jerzey writes "Saw the essay 'Programming as if Performance Mattered', by James Hague, mentioned at the Lambda the Ultimate programming language weblog. This is the first modern and sensible spin on how optimization has changed over the years. The big 'gotcha' in the middle caught me by surprise. An inspiring read." Hague begins: "Are particular performance problems perennial? Can we never escape them? This essay is an attempt to look at things from a different point of view, to put performance into proper perspective."
This discussion has been archived. No new comments can be posted.

Programming As If Performance Mattered

Comments Filter:
  • speed/easy coding (Score:2, Interesting)

    by rd4tech ( 711615 ) on Thursday May 06, 2004 @12:41AM (#9070665)
    The golden rule of programming has always been that clarity and correctness matter much more than the utmost speed. Very few people will argue with that

    Really? How about the server side of things?

    Shameless bragging: Why don't you take a look at my page to get a whole new view on performance?
  • by ObviousGuy ( 578567 ) <ObviousGuy@hotmail.com> on Thursday May 06, 2004 @12:44AM (#9070683) Homepage Journal
    You can spend all your time optimizing for performance, but when you finally release your product, your competition, whose main objective was to get the product out the door faster and who used a slower algorithm, is already first in mindshare with your customers. Not only that, the processors you thought you would be targeting are already a generation behind, and the algorithm that was going to hold back your competition runs perfectly fast on the new processors.

    Performance gains occur at the hardware level. Any tendency to optimize prematurely ought to be avoided, at least until after v1.0 ships.
  • by ObviousGuy ( 578567 ) <ObviousGuy@hotmail.com> on Thursday May 06, 2004 @12:49AM (#9070706) Homepage Journal
    Do you code all your own algorithms?

    No, I use the language's sort routine. This typically means quicksort or heapsort.
  • I think... (Score:2, Interesting)

    by rms_nz ( 196697 ) on Thursday May 06, 2004 @12:49AM (#9070707)
    ...it would have been better for him to show the run times for all the versions of his program, so we could see what difference each of the changes made...
  • by DakotaSandstone ( 638375 ) on Thursday May 06, 2004 @12:49AM (#9070709)
    Yes, yes, yes. Do optimize. But, come on people, do we really need to turn that nice readable device init code that only executes once into something like:
    for (i=0,j=0,init();i!=initDevice(j);j++,writeToLog()) ;

    Sheesh!

  • Don't agree (Score:5, Interesting)

    by Docrates ( 148350 ) on Thursday May 06, 2004 @12:51AM (#9070718) Homepage
    While the author's point seems to be that optimization and performance are not all that important, and that you can achieve better results through how you do things rather than what you use, I tend to disagree with him.

    The thing is, in real-life applications, playing with a Targa file is not the same as a service-critical, 300-user, number-crunching, data-handling system, where a small performance improvement must be multiplied by the number of users/uses, by many hours of operation, and by years in service to understand its true impact.

    Just now I'm working on an econometric model for the Panama Canal (they're trying to make it bigger and need to figure out if it's worth the effort/investment) and playing with over 300 variables and 100 parameters to simulate dozens of different scenarios can make any server beg for more cycles, and any user beg for a crystal ball.
  • by jesup ( 8690 ) * <randellslashdot&jesup,org> on Thursday May 06, 2004 @12:57AM (#9070743) Homepage
    66 fps on a 3 GHz machine, doing a 600x600 simple RLE decode...

    Ok, it's not bad for a language like Erlang, but it's not exactly fast.

    The big point here for the author is "it's fast enough". Lots of micro- (and macro-) optimizations are done when it turns out they aren't needed. And writing in a high level language you're comfortable in is important, if it'll do the job. This is a good point.

    On the other hand, even a fairly naive implementation in something like C or C++ (and perhaps Java) would probably have achieved the goal without having to make 5 optimization passes (and noticeable time spent examining behavior).

    And even today, optimizations often do matter. I'm working on code that does pretty hard-real-time processing on multiple threads and keeps them synchronized while communicating with the outside world. A mis-chosen image filter or copy algorithm can seriously trash the rest of the system (not overlapping DMA's, inconvenient ordering of operations, etc). The biggest trick is knowing _where_ they will matter, and generally writing not-horrible-performance (but very readable) code as a matter of course as a starting point.

    Disclaimer: I was a hard-core ASM & C programmer who for years beta-tested 680x0 compilers by critiquing their optimizers.
  • by jbms ( 733980 ) on Thursday May 06, 2004 @01:07AM (#9070779)
    As another user commented, server software can benefit greatly from a large variety of optimizations, since better performance translates directly into supporting more users on fewer/cheaper servers.

    Optimizations also have significant effect in software designed to perform complex computations, such as scheduling.

    Also, the trend of ignoring performance considerations, on the claim that modern hardware makes optimization obsolete, is precisely what leads to software, particularly Microsoft's, becoming significantly slower with each revision.
  • by neil.orourke ( 703459 ) on Thursday May 06, 2004 @01:07AM (#9070782)
    But the great demos on the Amiga and C64 never hit the OS.

    Have a look at some of the PC demos that boot from DOS and take over the machine (eg. www.scene.org) and tell me that they aren't just as amazing.
  • You would think that, with all the years put into developing computer languages, as well as the decades of software engineering, these algorithms and techniques would have made their way into best practices.

    This, of course, has already begun, with many frequently used algorithms like sorting and hashing being made part of language core libraries; but beyond that, it seems that effort gets duplicated far more often than it needs to be.

    This is one instance where Microsoft has really come through. Their COM architecture allows for inter-language reuse of library code. By releasing a library which is binary compatible across different languages, as well as backwards compatible with itself (v2.0 supports v1.9), the COM object architecture takes much of the weight of programming difficult and repetitive tasks out of the hands of programmers and into the hands of library maintainers.

    This kind of separation of job function allows library programmers the luxury of focusing on optimizing the library. It also allows the client programmer the luxury of ignoring that optimization and focusing on improving the speed and stability of his own program by improving the general structure of the system rather than the low level mundanities.

    Large libraries like Java's and .Net's as well as Smalltalk's are all great. Taking the power of those libraries and making them usable across different languages, even making them scriptable would bring the speed optimizations in those libraries available to everyone.
  • by techno-vampire ( 666512 ) on Thursday May 06, 2004 @01:24AM (#9070868) Homepage
    The thing is that a lot of programmers today have grown NOT to respect the need for performance - they just assume that the upcoming systems would have really fast processors and infinite amounts of RAM and diskspace, and write shitty code.

    That's not the only reason. Programmers usually get to use fast machines with lots of RAM and disk space, and often end up writing programs that need everything those machines have.

    Back in the DOS days, I worked on a project that had a better way of doing things. We had one machine of reasonable speed as the testbed. It wasn't specially tuned, since we didn't expect our customers to know how to do that, and the programs we were writing didn't need expanded or extended memory. If what you wrote wouldn't run on that machine, it didn't matter how well it worked on your own machine; you had to tweak it to use less memory.

  • by corngrower ( 738661 ) on Thursday May 06, 2004 @01:27AM (#9070885) Journal
    Often, to get improved performance, you need to examine the algorithms used. At other times, and on certain CPU architectures, the things that slow your code down can be very subtle.

    If your code must process a large amount of data, look for ways of designing your program so that you process the data serially. Don't bring large amounts of data in from a database or data file all at once if you don't have to. Once you are no longer able to contain the data in physical memory, and the program starts using virtual memory, things slow down very fast. I've seen architects forget about this, which is why I'm writing this reminder.
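
    A minimal sketch of that serial, chunk-at-a-time style (the file name and chunk size below are invented for illustration):

    #include <cstddef>
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main() {
        // Hypothetical input; stands in for the database export or data file.
        std::ifstream in("big_data.bin", std::ios::binary);
        if (!in) return 1;

        const std::size_t kChunkSize = 1 << 20;    // 1 MB at a time, never the whole file
        std::vector<char> chunk(kChunkSize);
        unsigned long long checksum = 0;

        while (in) {
            in.read(chunk.data(), chunk.size());
            std::streamsize got = in.gcount();     // may be short on the last read
            // Process only what was actually read; summing bytes stands in
            // for the real per-record work.
            for (std::streamsize i = 0; i < got; ++i)
                checksum += static_cast<unsigned char>(chunk[i]);
        }
        std::cout << "checksum: " << checksum << '\n';
        return 0;
    }

    The working set stays at one chunk, so the process never spills into virtual memory no matter how large the input grows.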

    On the other hand, I've worked on a C++ project where, in a certain segment of the code, it was necessary to write our own container class to replace one of the std:: classes, for performance on the SPARC architecture. Using the std:: container caused the subroutines to nest deeply enough that the CPU registers needed to be written out to slower memory. The effect was enough to be quite noticeable in the app.

    With today's processors, to optimize for speed you have to think about memory utilization, since running within cache is noticeably faster than running from main memory. Things are not as clear-cut, as far as speed optimization goes, as they once were.

  • by Anonymous Coward on Thursday May 06, 2004 @01:31AM (#9070892)
    While I agree that it is silly to spend eternity optimizing small routines that trivially achieve the required level of performance for their intended purpose, a lot of scientific packages and simulation software absolutely demand serious performance optimization. The mantra about premature optimization being the root of all evil only holds if optimizing doesn't save you significant development and testing time. If you have a simulation application that requires an hour to do enough work to complete a minimal function/correctness test, it immediately pays off to spend time optimizing the code if you can cut your test times down to half an hour, for example.

    I've worked on a lot of software packages that run calculations for days and/or weeks at a time on parallel computers. You always want to start out with fast sequential algorithms and data structures before you get into using multiple processors. Parallelism is inherently more complex, so it's often worth it to squeeze the most you can out of a single thread/process before going to multiple threads/processors.

    While desktop apps typically consume a negligible amount of CPU time/resources, apps that are candidates for running on parallel computers, clusters, or big SMP machines are inherently more costly to run in both CPU time and the user's wall-clock time, so those applications don't fall under the same logic as trivial tasks like image decompression/formatting.
  • by ivec ( 61549 ) on Thursday May 06, 2004 @01:35AM (#9070916) Homepage

    - Decoding an RLE data buffer is hardly impressive as a benchmark. RLE was designed as a simple, specific (and generally inefficient) compression approach for age-old hardware (think 8 MHz, not the 333 MHz base system used here).
    How about JPEG or PNG?

    - The author actually spent several iterations optimizing this Erlang code, and these optimizations required handling special cases. (So performance eventually did matter to the author?) Now, wouldn't a 'first throw' implementation in C/C++ have been quicker to write, with simpler code, while immediately performing better than the Erlang version?

    - I agree that the performance of compiled/interpreted code matters less and less, because processors are so much more powerful. For instance, the processing for RLE decompression should in any case be negligible compared to the memory or disk I/O involved.
    What is becoming increasingly important, however, is the data structures and algorithms that are used. In this perspective, C++ still shines, thanks to the flexibility that its algorithms and containers library provides.
    C++ offers both a high level of abstraction (working with containers) and the ability to convert to a different implementation strategy with ease - if and when profiling demonstrates a need.
    For large system and library development, the strong static typing of C++ is also a real plus (whether or not it is faster than dynamic typing doesn't matter to me).
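
    A rough illustration of that "swap the implementation later" flexibility (the alias name and element type here are invented): code written against a single alias can change containers in one line once profiling says so.

    #include <deque>
    #include <iostream>
    #include <vector>

    // One alias isolates the implementation choice.  If profiling later shows
    // that, say, front insertions dominate, switching to std::deque is a
    // one-line edit; the rest of the code never changes.
    using Samples = std::vector<double>;
    // using Samples = std::deque<double>;

    double average(const Samples& s) {
        double sum = 0.0;
        for (double v : s) sum += v;
        return s.empty() ? 0.0 : sum / s.size();
    }

    int main() {
        Samples s{1.0, 2.0, 3.0, 4.0};
        std::cout << average(s) << '\n';   // prints 2.5
        return 0;
    }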

    I totally agree that performance should not be a concern during program implementation (other than avoiding 'unnecessary pessimization', which involves the KISS principle and knowledge of language idioms). Optimization should only be performed where the need for a speed-up has been demonstrated.
    Other than saying "wow this interpreted language runs damn fast on current hardware", this article does a poor job at making any relevant point.

    radix omnia malorum prematurae optimisatia est -- Donald Knuth

  • by aiyo ( 653781 ) on Thursday May 06, 2004 @01:39AM (#9070924)
    My software engineering prof. believes that optimization should never be done during a project. Instead, he thinks the programmer should wait until the project is complete and then give careful consideration to whether to optimize or not. He says most problems can be fixed by upgrading to better hardware, and that hours of optimization aren't worth $3-4K more in hardware costs. I thought he was crazy to preach this during lecture. What do you guys think? Would you spend a day designing a better algorithm, or finish the project and buy faster hardware?
  • When to optimize (Score:2, Interesting)

    by lejordet ( 766984 ) <lej-slashdot@circuitry.no> on Thursday May 06, 2004 @02:01AM (#9071005) Homepage
    When I start on a program, I usually make "place holder" functions where necessary to get the program up. Sure, this will be slow, but at least I can get the program up and running quickly (the place holders usually do what they're supposed to in the most convenient-to-code way I could think of, or emulate their final functionality - for example by returning true all the time).

    What this achieves for me is that I can look at the program as a whole, and _then_ identify where the problem areas are - most likely not where I thought they were... Even if the first version takes 5 minutes to run (as my first attempt at a depth-first tree search did), it works passably, and it is often easier to optimize than trying to optimize each function as I write it.
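
    A tiny sketch of that placeholder style (the function and records below are invented): the stub keeps the whole program runnable end to end and can be replaced later if profiling or correctness demands it.

    #include <iostream>
    #include <string>

    // Placeholder: pretend every record is valid so the rest of the pipeline
    // can be exercised.  Swap in real validation once the program as a whole
    // works and this spot proves to matter.
    bool recordLooksValid(const std::string& /*record*/) {
        return true;   // most convenient-to-code stand-in
    }

    int main() {
        const std::string records[] = {"alpha", "beta", "gamma"};
        for (const auto& r : records)
            if (recordLooksValid(r))
                std::cout << "processing " << r << '\n';
        return 0;
    }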

    Might not work for everyone, but I like coding this way :)
  • by Animats ( 122034 ) on Thursday May 06, 2004 @02:04AM (#9071015) Homepage
    "I was also bothered--and I freely admit that I am probably one of the few people this would bother--by the fragility of my decoder."

    And, sure enough, there's a known, exploitable buffer overflow in Microsoft's RLE image decoder.

  • by SatanicPuppy ( 611928 ) <SatanicpuppyNO@SPAMgmail.com> on Thursday May 06, 2004 @02:09AM (#9071029) Journal
    Depends on how bad it is. I've seen stuff that runs so slow there really isn't a way to throw more hardware at it. Of course that was written by a guy who had two goals: 1) to make sure no one but him could support his work, and 2) to do as little work as possible.

    I don't know. Clean, elegant, functional code is beautiful. If you're ever going to have to work on it again, I think it's better for it to be clean and optimized.

    Also depends on the size of the app. With a small app, what excuse do you have for not optimizing? Wouldn't take that long. With a big project? Depends on your work environment.

    The bosses will never know whether it's optimal or not. If you tell them you've maxed out the server, they just think you write big badass code. A lot of times, though, there isn't time to thoroughly bug-check a big app (that's what users are for, eh?), much less optimize it.
  • uhh.. yeah (Score:3, Interesting)

    by XO ( 250276 ) <blade.eric@NospAM.gmail.com> on Thursday May 06, 2004 @02:10AM (#9071036) Homepage Journal
    ok, so this guy is saying that.. he found 5 or 6 ways to improve the performance of his program by attacking things in an entirely different fashion... ok..

    back in the day, i discovered a really great trick... you might represent it as something like... :

    boolean a;
    a = 1 - a;

    this is a zillion times more efficient than if(a == 1) a = 0; else a = 1;

    it is also about the same as a ^= 1; if you were going to use bitwise operators.

    OK. Great.

  • by defile ( 1059 ) on Thursday May 06, 2004 @02:10AM (#9071037) Homepage Journal

    How is this fair? He completely and utterly changed the entire assignment on you forcing you to throw all of your work away. And gave you one week for it!?

    apt-get and emerge are two totally different implementations of the same idea. Changing the environment on you may have taught you a lesson about how optimizing eliminates robustness, but if the last professor encouraged you to try MMX/SIMD instructions then you were totally right to tie yourself to the x86.

    I would've kicked that moron's ass.

  • by metlin ( 258108 ) * on Thursday May 06, 2004 @02:20AM (#9071070) Journal
    You assume that I made that reference to myself as being a bad programmer.

    The reason I made that statement was because just last week I was at Redmond for an interview for internship at Microsoft, and I was interviewed by the team that was trying to prevent just this sort of thing from happening.

    The idea was to design heuristics-enabled compilers that would effectively detect any "bad-code" and help make managed code and pseudo-managed code the norm, or convert existing code into managed code.

    I did not say that I was using a programming language that had such protections, merely that such programming languages have their own advantages. I was interviewed for creating compilers, linkers and OS-level protection that did not allow those troublesome things to exist - not use them - and hence my justification :)

    That said, you may knowingly or unknowingly use a language designed for bad programmers even when you program C or C++ in upcoming versions of compilers that insist on managed code - they may just wrap up your code in a nice wrapper to prevent mishaps and hand it over to the linker after having taken care of your holes.
  • by BitwizeGHC ( 145393 ) on Thursday May 06, 2004 @02:34AM (#9071105) Homepage
    Some implementations of popular dynamic languages (e.g., Lisp, Scheme) let you do some type inference and/or some explicit declarations, and will spit out machine code or C that does the job that much faster. Tweak your algorithm in the slow version of the language, and then produce a program that runs ten times faster with an optimizing compiler.

    The Squeak [squeak.org] VM is a great example of this. The whole thing is written in Squeak itself. Running a Smalltalk VM this way is painfully slow, but a Smalltalk->C translator generates the code that will be compiled and used as the actual, runtime VM (which can support a whole host of things, including raster and vector graphics, sound, MP3 audio and MPEG video!).
  • by AaronW ( 33736 ) on Thursday May 06, 2004 @02:52AM (#9071185) Homepage
    Less code does not equal faster code. You can usually get the best performance increases by using a better algorithm. For example, if you're doing a lot of random adding and deleting of entries, a hash table will be much faster than a linked list. This will have the greatest impact.

    Other things that can help are doing your own memory management at times (i.e. freelists) since that will be faster than malloc/new, and will have less memory overhead. Also, design your storage to your data. If you know you'll allocate up to 64K of an item, and the item is small, allocate an array of 64K of them and maintain a freelist. This will use a lot less memory than dynamically allocating each item and will result in better locality.
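
    A minimal sketch of the freelist idea (the Item type, pool size, and names are invented): one up-front allocation, with O(1) acquire/release that never touches malloc/new.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Item {
        int   value;
        Item* next;   // doubles as the free-list link while the item is unused
    };

    class ItemPool {
    public:
        explicit ItemPool(std::size_t capacity) : storage_(capacity), free_(nullptr) {
            // Thread every slot onto the free list once, up front.
            for (Item& it : storage_) { it.next = free_; free_ = &it; }
        }
        Item* acquire() {                    // O(1), no heap call
            if (!free_) return nullptr;      // pool exhausted
            Item* it = free_;
            free_ = it->next;
            return it;
        }
        void release(Item* it) {             // O(1), no heap call
            it->next = free_;
            free_ = it;
        }
    private:
        std::vector<Item> storage_;          // one contiguous allocation: good locality
        Item* free_;
    };

    int main() {
        ItemPool pool(65536);                // e.g. "up to 64K of an item"
        Item* a = pool.acquire();
        a->value = 42;
        std::cout << a->value << '\n';
        pool.release(a);
        return 0;
    }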

    I write code in the embedded space, where memory usage and performance are both equally important. Usually getting the last ounce of performance out of the compiler doesn't make much difference.

    A good real-world example: I replaced the malloc code provided by a popular embedded OS with dlmalloc, which glibc's allocator is based on. The dlmalloc code is *much* more complicated, and the code path is much longer, but thanks to much better algorithms, operations that took an hour with the old simple malloc dropped to 3 minutes. It went from exponential to linear time.

    -Aaron
  • by Kunta Kinte ( 323399 ) on Thursday May 06, 2004 @02:53AM (#9071191) Journal
    Maybe a little offtopic.

    But if you haven't heard of it, http://www.javaperformancetuning.com/ [javaperfor...tuning.com] is a good source of performance tips for Java

  • by grimen ( 777425 ) on Thursday May 06, 2004 @03:28AM (#9071315)
    The author argues (I think correctly) that for many tasks we over-stress optimization in places where it isn't necessary. Well and fine for tasks where it's not necessary, such as the example he gives.

    However, as available processing power increases, some tasks change. Many technologies follow a trajectory that starts at "unthinkable" then move to "if you have special hardware" and then move gradually to software. Often along the way, features and computational complexity are added that keep a technology barely in reach (of both HW and SW implementations). It can be many many years before some technologies settle into a stage where they can be comfortably supported in SW at acceptable performance.

    Examples include: sound (which started with clicks and beeps and moved through to multichannel 3D audio), graphics, games (text-based to ever-more-complex 3D) and video codecs (simple RLE moving to ridiculously complex stuff like the H.264 codec). In games, for example, there are often preference panels controlling which features should be disabled for performance reasons. This seems to be evidence that the authors/publishers feel they can't count on their customers having enough power to run the games without cutting features to gain performance.

    I think for those applications where processing power trails needs and desires of customers and where optimization can make up the difference, developers will need to optimize or be eaten by the competition. In my experience, in things like codec and graphics development, you can get many-times performance increases over solid but poorly optimized implementations (sometimes even when you're just feeding HW).

    I think those gains can be critical.
  • Re:me too... (Score:5, Interesting)

    by some guy I know ( 229718 ) on Thursday May 06, 2004 @04:22AM (#9071483) Homepage
    If the man was coming from a BASIC programming background, his belief may have been understandable.
    Some (very old) BASIC interpreters used to parse each source line each time it was executed.
    Doing it that way saved memory (no intermediate code to store).
  • So true... (Score:3, Interesting)

    by warrax_666 ( 144623 ) on Thursday May 06, 2004 @04:46AM (#9071545)
    You have to work much, much harder in C++ to get anywhere near FORTRAN performance, so much so that it's almost never worth the effort.

    One of the most dangerous things (optimization-wise) in C++, I've found, is the temporary-creation problem. You have to be insanely careful to avoid creating temporaries to get any sort of reasonable performance... (or maybe I just need a better compiler than GNU GCC?)
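
    A small illustration of where those temporaries come from (the Vec type here is invented): chained operator+ calls materialise intermediate objects, while compound assignment updates in place.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Vec {
        std::vector<double> d;
        explicit Vec(std::size_t n, double v = 0.0) : d(n, v) {}
    };

    // Returning by value from operator+ materialises a temporary per use.
    Vec operator+(const Vec& a, const Vec& b) {
        Vec r(a.d.size());
        for (std::size_t i = 0; i < a.d.size(); ++i) r.d[i] = a.d[i] + b.d[i];
        return r;
    }

    // Compound assignment updates in place: no temporary.
    Vec& operator+=(Vec& a, const Vec& b) {
        for (std::size_t i = 0; i < a.d.size(); ++i) a.d[i] += b.d[i];
        return a;
    }

    int main() {
        Vec x(1000, 1.0), y(1000, 2.0), z(1000, 3.0);
        Vec t = x + y + z;   // builds intermediate objects along the way
        x += y;              // in-place; and passing by const reference above
        x += z;              //   avoids copying arguments as well
        std::cout << t.d[0] << ' ' << x.d[0] << '\n';   // 6 and 6
        return 0;
    }

    Expression-template libraries (like the one linked in the reply below) keep the convenient a + b + c syntax while eliminating the intermediates.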

    Templates are powerful and dangerous.


    Not quite sure why you would consider them dangerous, but they are Turing Complete (i.e. they are a compile-time language all of their own). Which some people have used to create this [oonumerics.org]. It looks almost as fast as Fortran, but the syntax is a lot more complex than just A*B for a matrix-multiplication.
  • by Anonymous Coward on Thursday May 06, 2004 @04:49AM (#9071554)
    (Posting as anonymous coward to protect the guilty)

    When I started my Ph.D. work, I came into a project doing compiler stuff in functional languages. There was a home-brew lexer my adviser had written that did 2D "array" lookups by scanning all the way through a list of lists. We thought it was broken, as it never finished. I changed it to use real arrays and got it down to a matter of minutes - merely using an O(n^2) implementation rather than O(n^4) :) (The language and implementation still sucked, though.)
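
    For a rough feel of the difference (the dimensions below are made up), compare walking a list of lists with direct array indexing; only the access pattern changes, not the data.

    #include <cstddef>
    #include <iostream>
    #include <list>
    #include <vector>

    int main() {
        const std::size_t rows = 500, cols = 500;

        // "2D array" as a list of lists: every lookup walks both lists,
        // so each access costs O(rows + cols).
        std::list<std::list<int>> slow(rows, std::list<int>(cols, 1));
        auto slow_at = [&](std::size_t r, std::size_t c) {
            auto row = slow.begin();
            std::advance(row, r);            // linear walk to the row
            auto cell = row->begin();
            std::advance(cell, c);           // linear walk to the column
            return *cell;
        };

        // Real array: O(1) per access.
        std::vector<int> fast(rows * cols, 1);
        auto fast_at = [&](std::size_t r, std::size_t c) { return fast[r * cols + c]; };

        long sum = 0;
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                sum += fast_at(r, c);        // swap in slow_at(r, c) to feel the difference
        std::cout << sum << ' ' << slow_at(0, 0) << '\n';
        return 0;
    }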

    Moral: it's the O-factors that will kill you. Optimizing anything but that is a waste of time until you have seen the profiling data. And even the O-factors are irrelevant in code that's not going to be executed often, or that is outside of high-performance areas.

    I still catch myself trying to avoid single instruction improvements in handling of, say, user dialog actions. But the user and window system together probably took over a second already to cause the action, so doing a bit of extra CPU work to be clearer or safer is The Right Thing.
  • by Anonymous Coward on Thursday May 06, 2004 @05:03AM (#9071602)
    I don't think you are really being fair.

    I've come to the belief that sending out machine-code packages is flawed, because you don't really know what the target platform is going to look like.

    Example: can the processor support SIMD instructions? This can make a _HUGE_ performance difference in some applications. I would argue that if you are shipping a binary, you just wouldn't know.

    Example: Code optimized for a Pentium III did not run as fast on a Pentium IV. Why? The pipeline changed and this had an effect on the way the compiler should schedule instructions.

    Summary: you want the binding from program to machine code to be as late as you can push it. This doesn't mean I advocate source distribution, since that has its own issues (see Gentoo). But an easily translatable representation like Microsoft's IL or Java's bytecode seems to be the solution. This is a classic versioning problem: do you want every app to deal with it, or deal with it once at the OS level?

    Not to mention that having really stupid loaders causes other problems down the pipeline. Was this code compiled for thread-safety? How about with debug information?

    But back to performance: low-level efficiency matters, but how much it matters depends on the application. Think in terms of processing large amounts of data. Obviously you want to touch it as few times as possible: O(log n) is better than O(n), which is better than O(n^2). But to get good performance: BE CACHE FRIENDLY. Make sure you access data in an array, so that if you access the same data again you access it soon, and so that if you look at one element you are likely to look at one of its neighbors.

    This will keep your code running well today, and will allow future hardware to process it faster (assuming new machines use the same type of memory hierarchy, only bigger). This does mean preferring indexing tricks into the array over pointers. And yes, linked lists are your enemies.
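
    A small sketch of what "be cache friendly" means in practice (the matrix size is arbitrary): both loops below do the same arithmetic, but only the first walks memory in the order it is laid out.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        const std::size_t n = 2048;
        std::vector<double> m(n * n, 1.0);   // row-major: element (r, c) lives at r * n + c
        double sum = 0.0;

        // Cache-friendly: consecutive iterations touch consecutive addresses.
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t c = 0; c < n; ++c)
                sum += m[r * n + c];

        // Cache-hostile: same work, but each step jumps n * sizeof(double) bytes,
        // so nearly every access can miss the cache on a large matrix.
        for (std::size_t c = 0; c < n; ++c)
            for (std::size_t r = 0; r < n; ++r)
                sum += m[r * n + c];

        std::cout << sum << '\n';
        return 0;
    }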

    Anything else, it is going to be a wash. The first thing that researchers of some new language find out is how to get rid of 80% of the inefficiencies that make it slower than C.

    In sum: concentrate more on the algorithms. Be cache friendly. Program in a language that you enjoy. And leave the pissing contest over whose language (and therefore whose unit) is fundamentally faster to people who have nothing better to do.
  • engineering (Score:3, Interesting)

    by sir_cello ( 634395 ) on Thursday May 06, 2004 @06:43AM (#9071833)

    A lot of this discussion here is either crap, a rehash or was covered in Engineering 101.

    Basically, you have some requirements for the product, and you optimise according to those requirements. Performance is just one variable (time to market, scalability, reliability, security, usability, cost, etc - are the many others).

    The requirements for a product in a fast moving market entry company are less about performance and more about rollout ASAP.

    The requirements for the same product two years later may be to improve performance to achieve scalability requirements.

    If you're writing some sort of overnight build or batch application: whether it takes an extra hour or not may not matter, because it has a 12 hour window to run in.

    If you're writing an order processing system, then performance and end-to-end turnaround will be vitally important, but you won't focus on the assembly; you'll focus on the algorithms, design and architecture.

    If you're writing a compression or encryption module: you probably will work on the assembly.

    In all of the above cases, before you optimise anything, you profile and work out how the optimisation is going to pay back in real terms.

    In my experience, you cannot prescribe any of this: you need to take it on case by case basis because every product and circumstance is different.

  • Re:me too... (Score:1, Interesting)

    by Anonymous Coward on Thursday May 06, 2004 @06:59AM (#9071894)
    ... And if you were customizing and programming Vantive (customer support backend software where all customization is done either in VB for Applications or in stored procs on the SQL server), you'd still find yourself shortening variable names and reducing whitespace.

    Vantive stores VB for Application data in a set size blob on the server. Max size 64k. Go over that, and it won't save.

    Of course you could always refactor the blocks to move things out, but when you're working 70-hour weeks under the gun to "get-it-done-now-however", you just substitute a variable or two and damn the consequences.
  • by ultranova ( 717540 ) on Thursday May 06, 2004 @07:37AM (#9072011)

    So throwing more hardware at Windows doesn't solve its performance problems ? So the parent poster was right ?

    So what's your point ?

    In any case, the main usability problem with Windows XP isn't slowness - it's the thousand faces of madness within! I mean the stupid popups that keep harassing me! "Do you want to clean up the desktop?" No, I want to work with the app I just started! Why is there no "never bother me again" button on the darn popup?!?

    I've only used XP at school, so I admit that my skills are probably lacking, and that a more skillful person could probably turn all of these annoyances off. Still, it is not fun having those popups disturb me all the time, it is not funny having the stupid operating system and Office programs hide half their menus, and it is NOT funny having that idiotic paperclip jumping up and offering useless "advice" all the time!

    Sorry about the rant, but I had to get that off my chest. Feel free to mod flamebait/offtopic/troll/whatever.

  • Coding while blind (Score:5, Interesting)

    by xyote ( 598794 ) on Thursday May 06, 2004 @07:40AM (#9072015)
    The biggest problem I see with performance is the lack of visibility of performance factors. At the hardware level there is cache (which is supposed to be transparent) and really deeply pipelined processors. These can have a major effect on what would otherwise be an optimal algorithm. And the hardware keeps changing, so what was optimal at one point can become suboptimal later on.

    In software, the biggest problem is the lack of performance directives. POSIX pthreads is one of the biggest offenders here. Best performance practices in pthreads are based on how common implementations work, yet POSIX allows implementations that would cause major performance problems for those so-called best practices. For example, POSIX allows pthread_cond_signal implementations to wake all waiting threads, not just one, while there are programs that depend on pthread_cond_signal waking only one thread in order to avoid "thundering herd" problems. So while standards allow portability of correct programs, they do not necessarily allow portability of performance.

    We need explicit performance directives.
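
    A stripped-down sketch of the pattern in question (the work queue and thread counts are invented): one item is handed to exactly one waiter via pthread_cond_signal, which is precisely the assumption a wake-them-all implementation breaks.

    #include <pthread.h>
    #include <cstdio>
    #include <queue>

    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
    static std::queue<int> work;
    static bool done = false;

    void* worker(void*) {
        for (;;) {
            pthread_mutex_lock(&mtx);
            while (work.empty() && !done)
                pthread_cond_wait(&cv, &mtx);     // sleep until signalled
            if (work.empty() && done) { pthread_mutex_unlock(&mtx); return nullptr; }
            int item = work.front(); work.pop();
            pthread_mutex_unlock(&mtx);
            std::printf("handled %d\n", item);
        }
    }

    int main() {
        pthread_t t[4];
        for (auto& th : t) pthread_create(&th, nullptr, worker, nullptr);

        for (int i = 0; i < 16; ++i) {
            pthread_mutex_lock(&mtx);
            work.push(i);
            // One item, one waiter: signal, not broadcast.  If the implementation
            // wakes all four threads anyway, three of them contend for the mutex
            // only to find the queue empty and go back to sleep.
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&mtx);
        }

        pthread_mutex_lock(&mtx);
        done = true;
        pthread_cond_broadcast(&cv);              // shutdown really does need everyone
        pthread_mutex_unlock(&mtx);
        for (auto& th : t) pthread_join(th, nullptr);
        return 0;
    }
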
  • by Anonymous Coward on Thursday May 06, 2004 @07:42AM (#9072022)
    It's sad that the gems of truth about speed and optimisation are buried in the Erlang flag-waving.

    Optimising the algorithm is certainly the best way to speed it up, but his apparent belief that C programmers would have trouble using the same optimisations is truly bizarre. In truth, C programmers probably wouldn't bother, because they'd have stopped earlier, doing less work for a similar final speed and reaping the benefit of making it 'just fast enough'. I know I made little effort to optimise image readers even on ancient 16 MHz PCs; the naive, readable C++ was fast enough.

    If it had anything to say about micro-optimisation (and when not to bother), it would be useful, but that's not really covered in the article; it's just an invention of the Slashdot preamble.
  • Re:What annoys me (Score:2, Interesting)

    by Bush Pig ( 175019 ) on Thursday May 06, 2004 @08:45AM (#9072284)
    Sure, but far too many people send Word97 files when a plain text file would have been adequate. Most people assume that you're going to have the same version of Word as they do, and go all blank when you ask for something else (I'm speaking from personal experience).

  • Re:me too... (Score:3, Interesting)

    by ggeens ( 53767 ) <ggeens AT iggyland DOT com> on Thursday May 06, 2004 @08:49AM (#9072311) Homepage Journal

    Some (very old) BASIC interpreters used to parse each source line each time it was executed.

    One time (around 1985 IIRC), I read an article in the local computer club's magazine. The author had written a BASIC program (GW-BASIC I think) to "shorten" BASIC programs. It would:

    • Shorten all variable names to 1 or 2 characters
    • Remove whitespace
    • Renumber all lines so that every GOTO n statement became shorter

    The source code that went along with the article looked like it was used on itself (very terse).

    With the machines you had back then, it probably made a difference.

  • De-commenting (Score:4, Interesting)

    by RogL ( 608926 ) on Thursday May 06, 2004 @09:08AM (#9072427)
    Back in the late '80s and early '90s, I worked on DOS-based commercial products written in APL. APL is an interpreted language written in Greek and math symbols, some of them overstrikes.

    Each byte of that 640K was precious, so it was common practice to "de-comment" the code before release; remove all comments, reduce whitespace, move multiple statements onto a single line, possibly shorten variable names. You could gain a substantial (for the time) amount of memory that way. You also dynamically imported/destroyed functions.

    I regularly debugged client systems with "de-commented" APL; if you could read that, you could read anything!
  • by Old Uncle Bill ( 574524 ) on Thursday May 06, 2004 @09:09AM (#9072437) Journal
    Amen. Poorly written queries, excessive XML parses/transforms and too much bandwidth utilization are all things NOT solved by tuning the architecture. We typically make 2-3X improvements in our product through tuning the system and up to 100X by tuning the above. I've worked on projects where the system (in this case one 4-way DB server and two 2-way app servers) could support 2 users. No amount of throwing hardware at that thing would improve the performance. Funny thing is, the client was a bit frosted because they had paid (at that time) about $4 million for the project. As a performance architect, lazy and inefficient programmers will keep me employed for centuries.
  • by scorp1us ( 235526 ) on Thursday May 06, 2004 @09:18AM (#9072504) Journal
    If you spend hours tweaking code to eliminate a few instructions, even instructions in a loop, then you are just wasting your time.


    Real optimizations come before you write your program. Take, for example, that loop you removed an instruction or two from. Say it is searching an array and looks like:

    for (i=0; i<strlen(x); i++){
    if (x[i]=='&') ;
    }


    There are two things wrong. One: you call strlen() repeatedly. strlen() is Theta(n), so you have a loop that executes n times at a cost of n each - n*n = n^2. That's one of the slowest algorithms around. Maybe your compiler is smart enough to see that x is not being modified and will hoist s = strlen(x) out of the loop and compare against s for you, but probably not (see the hoisted version sketched below).
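
    A hoisted version of that loop (the input string and the counting are invented, since the original if-body was empty):

    #include <cstddef>
    #include <cstdio>
    #include <cstring>

    int main() {
        const char* x = "a&b&&c";
        int hits = 0;
        const std::size_t len = std::strlen(x);   // computed once: O(n) total, not O(n^2)
        for (std::size_t i = 0; i < len; ++i) {
            if (x[i] == '&')
                ++hits;                           // stand-in work; the original body was empty
        }
        std::printf("%d\n", hits);                // prints 3
        return 0;
    }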


    The other thing: when searching an array, try to give it structure. If your array contains sorted characters, then you can find an element in log_2(n). Of course, if you sort by frequency (most commonly accessed at the top), then your n^2 loop *might* do better.


    The article is right: constant-time operations (type checking, etc.) are asymptotically negligible in algorithms. The article's real problem is that it is n, but on a 2D image n = (x*y), so you can't do any better. Note that it is not n^2 (though it makes a square picture), because you're operating on pixels. So that will be your unit of measure - NOT time.


    Which is my 2nd point: don't measure in time or instructions, measure in OPERATIONS. Operations are not instructions or lines of code. An operation is every time you have to look at a unit; it is a logical unit of cost. Hardware can be changed out. We know that hardware performance doubles every 18 months, so the constant-time instructions will get smaller. (Clocks per cycle are irrelevant as well.) But your loops will remain the biggest part of your program.


    With that, here's a crash course in CS:

    1. Loops are where most of the program's time and operations go. Try not to use them.
    2. To avoid loops, structure your data. Giving structure means assumptions, and assumptions mean you can skip irrelevant sections of data.
    3. Determine your dataset and minimize worst-case occurrences. Find out what order of data or instructions will make your n*log_2(n) algorithm become n^2, then find a way around it.
    4. Optimize for the average case. That is, if you never sort more than 6 numbers at a time, an n^2 algorithm will beat an n*log_2(n) one.
    5. If your data structure introduces overhead (most will), find your most common or costly operation and optimize your data structure for that (searching, sorting, etc.). If you do a combination, determine the ratio and optimize for that. The cost of the overhead is usually small compared to the speedup the data structure gives your common operation.
    6. The most obvious and easiest to code algorithm is the slowest. (Bubble sort vs. Radix or quick-sort)
    7. Mastery of the above is the difference between a $50k programmer and a $90K programmer.


      To learn more, take a data structures class at your local university. Please review Calculus II before doing that, though.

  • Re:me too... (Score:2, Interesting)

    by randomencounter ( 653994 ) on Thursday May 06, 2004 @11:48AM (#9074101)
    I know where he might have gotten the idea.
    The Vic20/C64 basic allowed you to merge program lines using a semicolon. This took 1 byte less per merged line, and did indeed run somewhat faster. Since the Vic20 had only 2.3K usable without tricks this was a big deal.

    Of course, anyone inflexible enough to carry that through to a C++/C/Cobol project shouldn't be programming.

  • by Anonymous Coward on Thursday May 06, 2004 @11:49AM (#9074109)
    I think your love for F95 is a bit overzealous. It suffers from many of the same problems that C++ does.

    Like C++, F95 has opaque types and function (and operator) overloading, and "pointers". These offer all the same merits and demerits as in C++.

    Many F95 compilers do not check runtime array bounds, and none do by default. And when they do, the executable slows noticeably. Nobody ships F95-based apps or runs production code with array bounds checking turned on. It just isn't done.

    F95 bugs are at least as common as in C/C++, simply because the language's programming idiom does not routinely check return status variables, much less trap exceptions. Maybe that's not the fault of the language, but idioms count just as much as syntax.

    I agree that F95 is almost certainly superior to most or all other languages if you need to manipulate arrays. F95 compilers also produce better-optimized code than the others (C/C++ aside). That's simply where the priorities of its users lie.

    But since most users of F95 attempt a different kind of programming task, with its own characteristic set of bugs, it's not meaningful to compare the subtleties of F95's floating-point exceptions with C++'s null pointers.

    Finally, perhaps the reason that "computer science" often underestimates F95 is that there are no books written on the language for computer scientists. Fortran terminology was coined in 1955 and NEVER updated. Rather than use common terms like "record" or "procedure", fortran persists in promulgating its own antiquated argot.

    Until fortran "gets with the program", it will remain a language that only an engineer could love.

    Randy
  • by Valar ( 167606 ) on Thursday May 06, 2004 @12:25PM (#9074569)
    Actually, when I was TAing data structures, we called that the 52-card-pickup sort. You take the deck of cards and throw it in the air. Pick up the cards, and if they end up sorted, stop. If not, throw them in the air again. We used it as an example of "just because it works doesn't mean you should do it", and as an example of an algorithm with a big O firmly in the 'bad' column.
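
    For the curious, a sketch of that 52-card-pickup sort (better known as bogosort), purely as the cautionary example the parent describes; its expected running time grows like n * n!.

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    // "Throw the deck in the air; if it lands sorted, stop."
    void pickupSort(std::vector<int>& v, std::mt19937& rng) {
        while (!std::is_sorted(v.begin(), v.end()))
            std::shuffle(v.begin(), v.end(), rng);   // throw the cards again
    }

    int main() {
        std::mt19937 rng(12345);
        std::vector<int> v{5, 3, 8, 1, 4};           // keep it tiny: expected work grows like n * n!
        pickupSort(v, rng);
        for (int x : v) std::cout << x << ' ';
        std::cout << '\n';                           // 1 3 4 5 8
        return 0;
    }
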
  • by solprovider ( 628033 ) on Thursday May 06, 2004 @03:35PM (#9076536) Homepage
    When I was finally allowed to program, I did it on a Commodore PET in elementary school. Of course I was writing games, and I optimized because games are not fun when they are slow. I was too lazy to type in the source from magazines, so all of my programs grew until they were usable, then grew some more as people played them and asked for features.

    Junior High did not have computers yet. I finally convinced my family to get me a computer if I paid half. With my budget, that meant a VIC20 for under $100. The VIC20 had 4KB of RAM. You could buy a 16KB expansion, but I could not afford it.

    The language was the same as the PET, so I tried to run my existing programs. They ran. I tried to modify them, save, and run them, and they would not work, even if the change was to remove code. I finally tried changing all the commands to "tokens" to shorten them. IIRC, a token was the first 2 characters of a command and an underscore. Since most of the commands were 4 letters, this saved quite a few characters. I also renamed all my variables to shorten them. Then I saved and the program ran. Yeah!

    Then I made another change, and the problem reappeared.

    I decided that:
    10 The program loaded as written to the tape. (Hard drive? Floppy disk? Never heard of them.)
    20 If the program fit in memory, it would run.
    30 When the program was loaded for editing, all tokens were expanded to the full command.
    40 The program was saved as text, except...
    50 If the tokenized version of a command was encountered, then it was saved as the token. I never figured out if they were saved as the 2 Hex number, the dollar sign and the number, or the 3-character shorthand "token" I typed.
    60 GOTO 10 (and see if it runs.)

    So every time I wanted to modify the longer programs, I had to change every command to the "token" format. (About half of my programs were under the 4KB limit, about half could be "fixed" using this technique, and a few were large enough that I never got them working again.) Any changes to the longer programs required 20 minutes of "tokenizing" the commands before saving it. That killed much of the fun of programming. (Today I get upset if a build takes longer than a game of Solitaire, but "getting upset" means deciding to fix the build process.)

    Commodore bought BASIC from MS, and then modified it, so I do not know who to blame for the hours I wasted on this, but Commodore is gone and MS continues to take the fun out of computers, so I blame MS.

    ---
    My next venture into computers was the C64. They had "Sprites". Half of the code in my games was controlling the graphics, and this improvement to the platform made that code obsolete. For the challenge, I upgraded one game to use Sprites. They took much of the fun out of it, and (IIRC) you were limited to 4 of them, so you had to play games (pun intended) to write PacMan. (4 Ghosts and PacMan required 5 Sprites. The dots and cherries would be handled without Sprites. It was easy to write a 3-ghost PacMan game, and really difficult to write a 4-ghost game.)

    Since the C64s at school did not have tape drives, my old programs had to be typed in, if I had a printout from elementary school (no printer at home). I already stated that I was lazy, so they are gone. Well, I still have the tapes, but they are two decades old, and my PC does not have a tape drive anyway.
  • by Anonymous Coward on Friday May 07, 2004 @10:41AM (#9084125)
    If humanity's software-developing skills had progressed at the same speed as hardware, we would have sentient, thinking computers right now.

    You're a bit too early to be making that kind of statement. Try again in 15-20 years. Computers are still THAT far behind the processing/memory capabilities of a human.

"Engineering without management is art." -- Jeff Johnson

Working...