When Making a Comprehensive Retrofit of your Code...

chizor asks: "My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code) in the interest of consistency and reducing redundancy. We're going to have to make some hard decisions about code style. What suggestions might readers have about tackling a large-scale retrofit?" Once the decision has been made for a sweeping rewrite of a project, what can you do to make sure things go smoothly and you don't run into any development snags...especially as things progress in the development cycle?
  • object orientation (Score:3, Informative)

    by apsmith ( 17989 ) on Friday December 21, 2001 @05:23PM (#2739376) Homepage
    We're doing something similar - switching to Java (JSPs + Tomcat, Struts) to replace a lot of old Perl CGIs. The Java code is much, much cleaner. But object-oriented Perl code can help if you don't want to take the plunge too far into a new language. And at least find a way to go mod_perl rather than CGI for the things where performance matters at all.
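
    If you go the mod_perl route, the low-effort path is Apache::Registry, which runs existing CGI scripts persistently inside Apache with few or no source changes. A minimal httpd.conf sketch (mod_perl 1.x era; the paths are hypothetical):

        PerlModule Apache::Registry
        Alias /perl/ /var/www/perl/
        <Location /perl>
            SetHandler perl-script
            PerlHandler Apache::Registry
            Options +ExecCGI
            PerlSendHeader On
        </Location>

    Scripts mostly run unchanged, but watch out for globals and lexicals that now persist between requests.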
    • by TheRain ( 67313 )
      why is this flamebait? all this person is saying is that object orientation could make the code easier to manage and, thus, help to reduce redundancy. it's a good statement, modularize your code and it becomes more reusable... and therefore less redundant. who cares if he likes java over perl and CGI.
    • by apsmith ( 17989 ) on Friday December 21, 2001 @05:47PM (#2739486) Homepage
      Alright, maybe I posted a little too soon, but shouldn't "flamebait" be attracting flaming responses? I don't see any...

      Anyway, if I'd spent a little more time thinking about the advice side of it, taking a look at appropriate programming methodologies (like Extreme Programming advocated in another thread here) would be one piece I'd advocate. Given the size of the code (1 MB = about 20-30,000 lines?) there's no need for major heavy-weight processes here. More important I'd say is sitting down and figuring out in the appropriate level of detail what exactly your system is doing right now - you can do this using UML [omg.org] diagrams which seems to be becoming a standard, though the main use we've found is to try to get an overall view of things which we then throw out when we get into the details again.

      The other thing to do along these lines is look for your use of standard patterns within your code - the Design Patterns [amazon.com] book is extremely helpful if you're moving to an object-oriented framework at all; following well-known patterns and indicating clearly what you are doing can make your code much easier for others to follow.
    • Paradigm Caution (Score:2, Insightful)

      by Tablizer ( 95088 )
      There is NO decent evidence that OOP factors better, or does *anything* else better than procedural/relational programming (or other non-OO paradigms).

      IOW, don't switch paradigms just because something is in fashion right now.

      Look carefully before leaping into something that may just turn out to be a fad or only shines in specific domains.
  • For consistent style...;

    refactor as a result of learning from your mistakes and redundancies;

    and try to minimize the busy parts (where all developers have a hand) when things change (like lists of unique symbols, numbers, etc.)

  • Sleeping dogs (Score:5, Insightful)

    by Chairboy ( 88841 ) on Friday December 21, 2001 @05:25PM (#2739389) Homepage
    I'm sure a bunch of code nazis will disagree with me (please note the clever way I attempt to pre-emptively undermine their arguments by labeling them as 'nazis') but sometimes massive engineering re-writes are not necessary.

    Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.

    Most companies that decide to massively re-engineer their code (do a big rewrite) usually end up regretting it because it forces them to re-fix the problems that caused the original strange looking code in the first place.

    Does your CGI nest work? If so, maybe you should leave it alone. If you are fixing specific problems, then go ahead, but if this is a generalized attempt to fix the 'not invented here' syndrome that plagues engineers (who will almost universally agree that it is easier to write code than it is to read it), perhaps you should reconsider.
    • Re:Sleeping dogs (Score:3, Informative)

      by kz45 ( 175825 )
      Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.

      This is a lesson to be learned. Engineer your code from the beginning. Use easy-to-understand comments and structured code. Although it takes some discipline, you will almost never have to reconsider "re-writing from scratch".
      • Re:Sleeping dogs (Score:3, Insightful)

        by thogard ( 43403 )
        You can't engineer your code from the beginning in the part of the real world where I work. I've got a huge mess of Perl that does reports, and about one out of ten clients wants something different. These requests could not have been predicted, since many of them have no business reason and no function other than to look more like the client's older system, which just happened to do it that way.
        • Re:Sleeping dogs (Score:3, Insightful)

          by sunking ( 19846 )
          So you're saying your problem is too hard to be solved with a clean design? Good software design is all about preparing for the unexpected! In fact, if you ever had a perfect understanding of the problem then it wouldn't be worth designing a solution carefully - you wouldn't need the flexibility!

          Thinking that writing one-off code is giving you flexibility is a grave mistake.

          -sam

    • ++ to that comment.

      A general refactoring, without the intent of doing it to add new functionality / fix bugs, isn't worth it. What business value do you gain from it?

      However, there are times that it is worth undertaking substantial refactoring to add what may seem like relatively small pieces of functionality. I think it was in his _Refactoring_ book that Martin Fowler likened this to reaching a local peak in a mountain range, where you have to go down into a valley before you can climb the next peak.

      ObOfftopic: This is my first post in a while that hasn't been a troll or flamebait. It just isn't as fun being informative.

      • Re:Sleeping dogs (Score:5, Insightful)

        by Nyarly ( 104096 ) <nyarly@redfiv[ ]c.com ['ell' in gap]> on Friday December 21, 2001 @05:50PM (#2739499) Homepage Journal
        Certainly, but if you find yourself trying to make a change to existing code, it will probably not be the last change you need to make. At least refactoring anything your change touches will make future changes easier to make.

        Related note: the original poster doesn't say "refactoring" and he does say "Perl." Informative statements relating to these two facts:

        • Buy and read Martin Fowler's Refactoring. The examples are mostly Java, and he references the Gang of Four's Design Patterns. "Retro-fitting," in which you probably plan to rewrite portions of code from scratch, will break your app, your mind, and your budget. Learn what refactoring implies and entails.
        • Learn about design patterns in general, and consider how they might apply to your code. One description of a design pattern is "a target for refactoring" (a small Perl sketch follows this list).
        • (Donning asbestos) You might want to reconsider perl as the language of choice for a large scale application. I realize I'm posting this comment to a Perl system, but Perl hangs together like an immense kludge of a language. That said, you're probably stuck with it, and AFAIK, you may be forging new paths in programming for reusability by applying the above concepts to Perl. Good luck, and be sure you can trust your machete.
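
        To make the "target for refactoring" idea concrete with a hypothetical Perl example: a long if/elsif chain that picks behavior from a type string can usually be collapsed into a dispatch table, which is about as close as idiomatic Perl gets to the Strategy pattern. A sketch, with invented handler names:

            my %render_for = (
                html  => \&render_html,     # hypothetical per-format handlers
                csv   => \&render_csv,
                plain => \&render_plain,
            );

            sub render_report {
                my ($format, $data) = @_;
                my $handler = $render_for{$format}
                    or die "unknown report format '$format'";
                return $handler->($data);
            }

        Supporting a new format then means adding one table entry instead of editing every copy of the conditional.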
        • by bbqdeath ( 314918 ) on Friday December 21, 2001 @06:20PM (#2739627)

          I second the concern about PERL. And I offer my advice as someone who has virtually no qualifications to talk about large systems of code. I just like Python [python.org] better than PERL because it doesn't hurt my eyeballs like PERL.

          Yeah, I guess this is a troll. But it's honest. I use Python like most people use toilet paper: several times a day, and for more things than it was originally intended.

        • Perlmonks.org (Score:2, Informative)

          by consumer ( 9588 )
          You might want to reconsider perl as the language of choice for a large scale application. I realize I'm posting this comment to a Perl system, but Perl hangs together like an immense kludge of a language.

          What a monstrous Christmas troll you are. What qualifies you to make this judgement? Perl, like any other mature language, has people who write kludges with it and people who write clean, elegant code with it. Your lousy Perl code is not indicative of a language problem.

          That said, you're probably stuck with it, and AFAIK, you may be forging new paths in programming for reusability by applying the above concepts to Perl.

          And this shows how much you know, since the Perl community is full of activity around design patterns, refactoring tools, unit-testing, and other practices which are in favor among experienced people trying to write solid, maintainable code.

          My suggestion for those who are looking for actual useful advice rather than this kind of "throw away all your work and learn Java" crap, would be to head straight for http://perlmonks.org/ [perlmonks.org] and read up. There's tons of advice there for serious Perl coders. You would also do well to start reading the mod_perl mailing list, which often has informative discussions about these issues.

        • Re:Sleeping dogs (Score:3, Interesting)

          by foobar104 ( 206452 )
          You might want to reconsider perl as the language of choice for a large scale application.

          I agree 100%. My company started to bring a commercial application to market a little less than a year ago. I prototyped the code in Perl, and the prototype was sufficiently okay that the decision was made to evolve the prototype into the release code.

          This was A Mistake. It was A Dumb Idea. It was also My Decision. I have taken Much Shit for this from my coworkers. But you live and you learn.

          I have since (over the past four months or so) rewritten the entire application-- every line, every file-- in C++. The source tree is 3.8 MB, and it compiles to about 100 MB of object code. (The actual executables, of course, are much smaller than that.) It was a pretty big job.

          Not only is my code tighter and cleaner than the original Perl stuff (which was actually pretty okay code) but it's between 2 and 10 times faster.

          I love Perl, absolutely adore programming in it, but there are some things that are easier to do with C, or C++, or (presumably) Java. When you split a project up among a number of people, for example, using the Bridge design pattern and distributing read-only interface header files makes modular integration so very much easier. That's just one example.

          We would not have been able to get our app to market without the Perl prototype. And I don't think it would have been worth a damn if we hadn't rewritten it in C++.
    • see joel on software (Score:4, Informative)

      by kubalaa ( 47998 ) on Friday December 21, 2001 @05:44PM (#2739464) Homepage
      He says [joelonsoftware.com] the exact same thing as you, only better.
      • Although I enjoyed his book, and respect his opinion on matters of interface design, Joel Spolsky is no champion of well-designed code. In his book, he even suggests that a programming language shouldn't be used if another can create the same user interface in less time... as if the time it takes to create the first version of a product should take precedence over other factors such as maintainability of code, reuse, and flexibility. He does a good job of regurgitating the works of other interface specialists, but he is not an expert in code maintenance.
    • by ChaosDiscordSimple ( 41155 ) on Friday December 21, 2001 @05:50PM (#2739501) Homepage

      Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.

      Yes, and if they're cryptic and uncommented, they are worthless. Eventually one of these incomprehensible, magical fixes will stop working. Perhaps the bug it works around gets fixed. Perhaps the way the function is used changes in a previously unexpected way. Some poor engineer will look at the little bit of magic, scratch his head, and be forced to make a blind decision about how to fix it. Perhaps he can change the code while leaving the bit of magic working, but he can't be certain, since he doesn't understand it. If the collection of cryptic tweaks becomes dense enough, any attempt to fix a bug or add a feature becomes highly risky.

      On a related note, don't let this happen to you. If you add one of these strange little fixes, then for the sake of the programmer who follows you, document it. Just a little "Need to toggle the Foo because Qux 1.4 incorrectly fails to do so" will bring tears of joy to the eyes of future programmers.

      • You're wrong. The comments you recommend will only /add/ to the mass of code to be grokked. Don't forbid commenting; but don't consider them a solution.

        Instead, refactor and unit test. Unlike comments inserted into the middle of the code, unit tests will fail and point you to the reason for the failure. In the above example, when we upgrade to Qux1.5, the unit test which asserts that the Foo is untoggled will fail, and will point right to the function which made the assumption. Bingo -- a quick fix.
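
        A minimal Test::More sketch of that assertion (module and method names are invented, following the Foo/Qux example above):

            use strict;
            use warnings;
            use Test::More tests => 1;
            use MyApp::Foo;    # hypothetical module containing the workaround

            # Qux 1.4 fails to toggle the Foo itself, so our code does it.
            # If a Qux upgrade changes that, this test fails and points here.
            my $foo = MyApp::Foo->new;
            $foo->apply_qux_workaround;
            ok( $foo->is_toggled, 'Foo toggled to compensate for Qux 1.4' );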

        -Billy
    • Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.


      ... which is why whenever you add one of these "little design decisions" whose purpose isn't blatantly obvious, it's important to put in a comment saying what it does and why it is there. Otherwise someone might come through later on, think it's an error, and remove it.

    • Your tangled mass of spaghetti code paths are probably full of almost incomprehensible little design decisions and seemingly out of place declarations and functions, but most of those were probably added as specific fixes for bugs encountered under real-world use.

      If you need to make big changes to the code, then you are already hosed. If you have no record of what bugs you fixed, and no way of testing if those bug fixes are still working after you make code changes, then keeping all of your spaghetti in place is no guarantee that you won't re-introduce bugs later on.

      The Code Nazi approach is to write a unit test for each bug you fix. And you have an easy way to re-run all of your unit tests. After you make a big change, you can test if all of your bug fixes are still working. And now you have the flexibility to refactor your code whenever you want. Which means you can keep your code base clean and elegant, and you'll never reach the crisis point that this group has reached. This is the approach advocated by Extreme Programming, as well as by other software disciplines.

      Doug Moen

    • Joel Spolsky has an article [joelonsoftware.com] up on his blog site that speaks to this point.

      He uses Netscape's decision to rewrite Netscape 6 from scratch as an example, and expands upon many of the points mentioned above.

    • Sometimes, but not always, if your code is a mess, then that means that your business is a mess too. If your programmers have a hard time understanding the business, then your customers will too.

      When you investigate those twisty little lines of code, see if a business rule can be simplified. Try really hard. If it can't be, then put it into code. Don't make the opposite mistake of making the rules TOO simple.
    • by JohnsonWax ( 195390 ) on Friday December 21, 2001 @11:17PM (#2740242)

      There's a concept in the world of stuff you can touch called 'depreciation'. Why the software world hasn't caught onto this is beyond me, but from my experience it seems to apply reasonably well.

      The value of your code goes down over time. Now, don't confuse that with the value of your design - your design could be grand, but the code is part of your equipment, like your hardware. Depreciate it over time to reflect the increasing maintenance and integration costs in a migrating business.

      Reworking your code allows you to make adjustments to your design to reflect a new environment, or to move away from languages/APIs/toolkits that might be hard to maintain.

      I depreciate my code over about 3 years. It's all modular, and I replace code about as frequently as I add. A number of years ago, I had tons of bandwidth but not much CPU power, so I tended to push data rather than compute. The reverse is now true, so I made some design changes as part of a standard rewrite - no need to wait until it broke. Overall, most of my code hasn't radically changed in its design, but it has been rewritten several times. I've had code cut back to 10% of its original size by adopting a new toolkit, etc. I've made it more robust, faster, cleaner, better documented. I can't think of a case where it's gotten worse, and I can't think of a case where the rewrite took much longer to write than the original code - and more often than not took much less time. It's worked well enough that I've added a considerable amount of functionality, but spend no more time reworking code because it becomes increasingly efficient and is never too far removed from future additions.

      Many people suggest that code rewrites are a waste of time, but it's a maintenance function. People that only budget time to write new code often find those extra work hours devoted to maintenance. Budget it in - and the best way is through rewrites.

  • You should read up on Extreme Programming [extremeprogramming.org], in particular Code Refactoring [extremeprogramming.org]. It's a method of cleaning up old code. A very well written book, as well as an excellent code-housekeeping method.
    • Yah, I keep hearing this. Read the whole book - it recommends strong test plans, incremental development, pair programming, tons of snacks, and a maximum of a forty-hour work week. In itself, not a bad thing.

      Interpreted by management?
      • They see the part about everything being stuck out in the open, with the computers set up in the center of the room - cubes are overrated, offices are evil
      • half as many boxes are required since you are "coding in tandem". See above for hint on how personal space is valued.
      • Demand complex test plans that will cure everything. These same people never have actual requirement documentation for what the code should do. Classic example? Validation rules for a phone number... gets ugly when someone wants to put in 1 800 FOO-BARR, add an extension, deal with international numbers. Think you can put together a test script? Got to have requirements first, and that is never a hard thing to get from the business (shudder).
      • I'm sure you have had complete management buy in on realistic time estimates now, so I wonder where the rumors come from that development tends to work more than an eight hour day. When they find out how long it will take to refactor the code base with thorough testing and daily updates, they will tell marketing and adjust the dates accordingly. Heck, I thought the Microsoft Project Plan defaulted to an eight-hour day - must be one of those rare Pentium errata issues that messes things up for me.

      Not that I'm bitter, cause I'm not... but XP takes a handful of generally good practices and assumes perfect management buy-in and team communication. With any management support and teamwork, almost any method works.
  • Break the code into re-usable modules (or objects if you go another route besides perl, or even if you do, if you like that sort of pain), each programmer responsible for his own set (ie layout, calculation, database, etc)
  • CYA (Score:5, Insightful)

    by The Gardener ( 519078 ) on Friday December 21, 2001 @05:27PM (#2739395) Homepage

    Don't stop maintaining the old code until the new code is on solid ground. No matter how sure you are that you can do it, the new code might never come through.

    The Gardener

  • Don't do it (Score:3, Informative)

    by mrpotato ( 97715 ) on Friday December 21, 2001 @05:28PM (#2739398)
    See the recent Joel Spolsky interview here [softwarema...lution.com], that was discussed on /. here [slashdot.org].

    Basically, Joel's take on a similar problem is: don't do it.

    Unless you have a _really_ good reason to make huge changes to a big codebase, don't bother, and do something more productive instead.

  • It always seems to me that large rewrites (though I've never done one as large as the one discussed) tend to make bad code bases worse for a while. If there is any effort made to maintain functionality in a changing codebase, a piecemeal rewrite is more of a headache than a help. So, if you're planning on a big rewrite, please give your developers the full thumbs up to break everything and expect them to put it back together later--and expect to see nothing tangible in the short term. Or call it off.

    That being said, my last company sat around and bickered about code style for nearly 4 months and produced no code that wasn't rewritten later. If you are going to concern yourself about style, settle that well in advance and make sure it's logical and consistent.

    It's also been my experience that conformant code style is highly overrated. Once the Best Practices document extends beyond language constructs and caveats, into brace styles, spacing, tab size (yes, there was a 3-space tab stop standard at my last job--retch), and even the naming of locals, parameters, members, constants, enumerations, etc., it gets to be a thick-ass bible of stuff that only a few people will digest or attempt to adhere to. The point I'm trying to make is, choose your battles. The hope is your developers will make sane choices independently, and use standards to help integrate different people's work together. Anything beyond that and it's pissing in the wind.

    My $2e-2 or less.
  • Horror Story (Score:3, Informative)

    by Brontosaurus Jim ( 528803 ) on Friday December 21, 2001 @05:33PM (#2739417) Homepage
    My firm went through this sort of thing just two years ago. The PHB at the time decided, for some reason, that our 300,000 lines of semi-poorly written C code, and the 50,000 associated lines of Java (don't ask), needed a complete rewrite.

    Anyway, it took 7 of us over 2 months to get even halfway done. The pressure the boss was putting on us was awful, and she didn't really even understand what we were doing, even though she was the one demanding it. I think she read it in a trade mag somewhere. God, I'd do a lot more work if she didn't read that shit.

    Anyway, about halfway through the "Great Leap Forward" (as we [appropriately] named it) the boss quit, and the next boss, who so far has been fairly clueful, took over. He didn't think the whole deal was needed, but he was pressured by the former boss's husband (the CTO) to get it done. Seriously.

    Hope yours goes better than ours. From what we did, here are some tips I can give you.

    1. Be consistent through the whole thing.
    2. Make sure everything is planned before you start. This was the one part we got right.
    3. The team you have should have worked together before, because this sort of task requires previous knowledge of each other.

    Other than that, my condolences. Or maybe it will work better for you.

    Good luck!
  • Rule #1 (Score:2, Informative)

    by ackthpt ( 218170 )
    We're still in the middle of a sweeping change and lemme tell ya, make d@mn sure there's someone accountable for managing the whole project from beginning to end, particularly this being their main focus.

    Transitioning in new managers or having the current manager only look in on the project once in a while is as sure a path to madness and doom as no management at all.

    Our due date was mid-August, we'll be lucky to get it through testing and into production by January 31st. All the while with the logjam we're having to put pieces of it into production and cross our fingers that the new changes don't break anything.

    Love to talk more about it, but need another gallon of coffee.

  • by pyrrho ( 167252 ) on Friday December 21, 2001 @05:35PM (#2739424) Journal
    You are going to rewrite the system from scratch. Design from scratch. Your new design might be able to use some old code, if the old code is useful.

    A large scale retrofit is really an oxymoron.

    IMHO, but with 15 years experience.
  • by ChuckPollock ( 168749 ) on Friday December 21, 2001 @05:35PM (#2739426)
    The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet. -- Michael Jackson
  • Document (Score:5, Insightful)

    by Tosta Dojen ( 165691 ) on Friday December 21, 2001 @05:35PM (#2739430) Homepage
    I suppose what I have to say almost goes without saying, but I know so many programmers who neglect it, I will say it anyway. There are two things you have to know when going back into old code.

    1) Know what you are doing. 2) Know what the code does.

    Both are expedited by good documentation. This is so important, it deserves to be written thusly:

    DOCUMENTATION!

    Write everything down. If the code is not commented, figure out what it does and write it down. When you add a line or a module, write down what it is supposed to do. Declarations? Write those down too. Document everything so that you can figure everything out, both now and down the road when you decide to fix something else.

    This is the voice of experience. I have had to reverse-engineer my own code 6 months after I wrote it because I failed to document anything. Learn from my mistake.

    • Depend on tests, not documents. Documents lie. Tests don't.

      Use as little documentation as possible, BUT NO LESS. (In other words, for heaven's sake don't ever try to get away without any documentation.) Documentation should state fundamental premises -- things like "The customer wants X." and "This code checks that I'm fulfilling the customer's requirement of X." Documentation should not state intrinsic properties -- the statement "this code does X" should be made as a test, not a document.

      -Billy
      • I'll have to disagree. Document as much as possible, BUT NO MORE! But this documentation must be meaningful and relevant. Otherwise it is worse than useless.

        Document every function, listing the purpose of every parameter and the meaning of the return value. Document why you are doing something if there is more than one way to do it. If a section of code fixes a bug, document what it does and do not just document a bug number. Use self-documenting code whenever possible (i.e. name your variables and functions meaningfully). Use a document generation tool if possible (javadoc, doxygen, etc). Write the user docs at the same time you're writing the code. Incorporate the user docs into the code if at all possible.

        Here is a bad comment: // store x2 in fu
        Here is a good comment: // save the index because we will use it later
        Here is a bad comment: // this is not meant for you to understand
        Here is a good comment: // please see Smith's "The Black Magic of Filesystems" for details on this algorithm

        The most important part of commenting is realizing who is going to read it. It may be you. But in all likelihood it will be someone you never met long after you have left the project or even the company. It may be code or design reviewers who don't know programming but do know how to block projects they can't understand. If it's Open Source, it may be some brilliant programmer wanting to fix a bug but without the time to puzzle over your constructs.

        In every code review I have ever been in, someone has made some silly assumption about the code, with the final recommendation that that section of code be commented better to avoid future silly assumptions.
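
        In Perl-land the equivalent of javadoc/doxygen is POD, which perldoc and pod2html can render. A minimal sketch of per-function documentation in that style (the function and its behavior are invented; in a real file the POD directives must start in column one):

            =head2 normalize_phone( $raw, \%opts )

            Returns the phone number in a canonical form, or undef if $raw
            cannot be parsed.  %opts may contain default_country (ISO code,
            defaults to 'US').  Dies only on internal errors, never on bad
            user input.

            =cut

            sub normalize_phone {
                my ($raw, $opts) = @_;
                # ... implementation elided ...
            }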
  • Think again (Score:2, Insightful)

    by pkphilip ( 6861 )

    From what you say, you are planning on making these changes to clean up the code and make it prettier.

    I would strongly urge you to reconsider this, as the probability that you will end up breaking parts of the code while "cleaning" it up is quite high... especially since you seem to have a fairly large code base (~1 MB).

    Ugly code containing redundant stuff is still better than beautiful code that doesn't work.

  • by Weasel Boy ( 13855 )
    Make sure you have a suite of tests that produce known output for your old code, so that you can ensure that the new code works in exactly the same way. Don't add anything new until you are proven conformant with what you had before.
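
    One cheap way to do that with a pile of CGIs is a golden-file test: capture the old script's output for fixed inputs once, then assert the rewritten code still produces it byte for byte. A sketch using Test::More (the paths, script name, and query string are hypothetical):

        use strict;
        use warnings;
        use Test::More tests => 1;

        # Output captured once from the old, trusted script.
        open my $fh, '<', 't/golden/report.html' or die "no golden file: $!";
        my $golden = do { local $/; <$fh> };

        # Run the rewritten script against the same canned CGI input.
        local $ENV{REQUEST_METHOD} = 'GET';
        local $ENV{QUERY_STRING}   = 'report=weekly&dept=42';
        my $new = `perl cgi-bin/report.pl`;

        is( $new, $golden, 'rewritten report.pl matches the old output' );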
  • Why do people mod comments about alternatives as Flamebait (and presciently, this one)? Are people afraid to hear that you shouldn't write large scale systems in Perl?

    It really is valid (and in my opinion, correct) to say that if you _are_ going to do this you should look at other technologies and languages. Perl is for system administrators and system administrators-cum-developers, not real software development. Look at java. Look at PHP. Look at commercial and non-commercial web application systems, like Zope. Or don't rewrite it at all if it works. But for God's Sake, don't rewrite it in Perl - it's pointless.
    • Perl is a *tool* (Score:3, Insightful)

      by tmoertel ( 38456 )
      Perl is for system administrators and system administrators-cum-developers, not real software development.

      Baloney.

      Contrary to what lots of people around here seem to think, especially those who like to make wide-sweeping declarations about what things are and what they are not, Perl is a tool. Nothing less, nothing more. Like all tools, it can be used to create well-engineered systems, and it can be used to create crap.

      The community that grew up around Perl is all-welcoming and generally free of elitism. That's why a lot of newbie programmers and "system administrators-cum-developers" use Perl -- because they can without getting crapped on by others who think they know better. As a result, there is a lot of amateurish-looking Perl code out there, but that's not a result of the language, that's a result of the all-inclusive set of people who use Perl.

      Let's be clear: If you write code in any language and the code sucks, it's your fault, not the language's -- the language is just a tool.

      Don't blame the problems of programmers on their tools.

    • What would the advantage be in changing to another language? Assuming the team develops coding standards and creates clean, documented code that can run under mod_perl, why would PHP or java be better?

      The team is obviously already familiar with perl, and some code could be reused after a bit of cleaning. Perl is perfect for applications that require a lot of string manipulation and as a front end for a database. This is what most web programming involves.

      The only drawback that I can see in using perl is if your team is messy. Perl will allow you to be messy, but it doesn't have to be. Perl can look just as clean as any other language. I go back to apps I wrote 3 years ago in perl and can follow every bit of it. It's all commented and has plenty of white space to make it easy on the eyes.

      I've written several large perl applications for the web and yes, I do know other languages, but perl is better suited for this type of work. It's fairly quick to put together and performs well. Now I wouldn't try to write an OS or first person shoot'em up game in perl, but for this it's just as viable of a choice as anything else.

      I'm also not saying that PHP or java would be bad if you have a good grasp on it and are starting a new project. Remember, TMTOWTDI is good!
      Just don't use ASP! ;)
    • > Perl is for system administrators and system administrators-cum-developers, not real software development. Look at java. Look at PHP.

      Mmmm. I write large-scale web applications for a living, and I do it in Perl. By large-scale, I mean sites that are expected to support hundreds of thousands of page-views per day, serve hundreds of thousands of distinct users per month, and collate hundreds of thousands of distinct chunks of content into dynamic pages.

      My company is a high-end development shop. We generally bid on projects that will take six to nine months to complete, and we only do jobs for clients who understand how we work and why. Part of our approach is to use very small teams of extremely experienced web developers. We usually deploy four programmers on a project. Other companies that bid against us sometimes use several times that many people on a single development team. Another part of our approach is to build everything on top of our open-source development framework [xymbollab.com]. That sometimes used to be a tough sell ("what, give software away for free, heaven forfend") but these days, most customers are pretty receptive to the twin "more eyeballs means better code" and "you're not locked into our closed, proprietary product" arguments.

      We also generally build small clusters of dual-processor Athlon or PIII machines, whereas our Java- (Oracle-, IBM -, BEA-, etc) wielding colleagues often specify absurdly-expensive hardware.

      Perl is flexible, complete, performs relatively well and has an extraordinary base of skilled developers and re-usable components (CPAN). We couldn't work as quickly or write code that is as concise and maintainable using any other currently-available language/approach. Most of the lines of code that we write actually go into Perl modules, with HTML::Mason templates handling the dynamic web-page generation. We can push Mason out of the way and use straight mod_perl for small, defined tasks. And we can easily integrate C code into our Perl frameworks in places where performance is really, really critical (though those places are rare, as when push comes to shove, one is almost always waiting on the database).

      There are lots of things that aren't "right" about web development. The package that results from gluing HTML and program logic together in a stateless execution environment is sometimes a little lumpy, and unavoidably so. There's no magic bullet toolkit, and (as with other specialized programming arenas, like graphics or embedded systems) a lot of hard-won, domain-specific knowledge goes into the development of a fast, reliable, maintainable web app.

      The Perl/Apache/Mason combination that we use is far from perfect. But it's better -- for us -- than any of the alternatives.

      I really like Java, and have written big systems in that language, too. If for some reason I had to manage a very large team of programmers, or had to manage a team with a large percentage of less-experienced programmers, I would use a Java-based solution. Java is a more rigid language than Perl, and the structure that the language provides would be a useful management tool in those contexts. But for my small teams of skilled hackers, Perl is more productive. (We have an extensive, evolved, self-imposed "structure," so we don't need the language to impose one on us -- in fact, it gets in the way.)

      I would never use PHP for the kind of work we do. PHP just isn't the kind of powerful, flexible, complete environment that Perl is.

      Zope and Python are really neat. I'm a fan of the work that folks on that side of the fence are doing. But Python+Zope don't offer us anything new that Perl+Mason+Comma don't. I also like Perl more than Python (which is a subjective preference), and think that the Perl development environment is more mature (which is a subjective judgment).

      So don't listen to the folks who tell you to dump Perl. You should certainly consider all of your options and make an informed decision about core tools, but anyone who thinks that Perl is just a "scripting" language, or that it doesn't scale, hasn't been paying attention.

      To finish this up with a little more specific advice to the original poster: You mentioned "150+ perl CGIs" in your question. You should consider moving away from the CGI model, if possible. Take a look at HTML::Mason [masonhq.com], which is a very good embedded-perl environment. You can build solid, consistent application layers using Mason as a base. Also, I couldn't agree more with the folks recommending writing perl modules and requiring complete regression tests for each module. There are lots of ways to write tests, but in perl-land one of the easiest is to simply make a t/ directory down your module tree, write a bunch of scripts in that directory named <some-test>.t that print out a series of "ok <n>\n" lines, and use make test or Test::Harness::runtests() to invoke them.
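
      A bare-bones version of that layout, with invented module and file names, looks something like this:

          # t/render.t -- prints "ok <n>" lines, old-school style
          use strict;
          use warnings;
          use MyApp::Render;                    # hypothetical module under test

          print "1..2\n";
          print MyApp::Render->can('page') ? "ok 1\n" : "not ok 1\n";
          my $html = MyApp::Render::page({ title => 'test' });
          print $html =~ m{<title>test</title>} ? "ok 2\n" : "not ok 2\n";

      And a one-line runner (roughly what make test does for you):

          use Test::Harness;
          runtests( glob 't/*.t' );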

  • by ADRA ( 37398 )
    There is no use in doing a rewrite unless you do it right from the beginning. If you can't spend a decent amount of time planning the architecture of the system, then stop now, and quit.

    Also, since you have decided to pour resources into this thing, my opinion would be to make as much of your code as generic as possible so that you don't have to make code changes later. It doesn't matter if there is an initial performance hit with the systems, because in the short term, you can convince your boss to get a new leet server, and in the future, hardware needed to run your apps will be trivially cheap anyway.

    If you are going to cut a new release, try to avoid going back and taking snippets of code from your old system. It makes people slide back into the old paradigm and can cause detrimental effects on the bottom line of your new system. This is new, so the less exposure to the old one, the better.
  • by cybrthng ( 22291 ) on Friday December 21, 2001 @05:42PM (#2739455) Homepage Journal
    Well, I can't say it enough: use a web framework. I don't know of any for Perl off the top of my head, but I use Resin.

    Resin is a JSP, Servlet, XML, XSLT application server that supports all the latest and greatest EJB components and managed persistence on the database, and it makes a great framework to build from.

    I have PostgreSQL for the database, and I use beans to run queries and output via XML, and then XSLT turns the XML data into the wonderful HTML code.

    The beauty is, my code is in beans, servlets and JSPs. My HTML is in .xtp files (in the case of Resin) and simple XSLT sheets parse the XML to render to the user.

    That means I can produce output easily for WAP, Palm, CE, normal web browsing, email, and whatnot without modifying the backend. Just create XSLT using a session identifier to bring up the corresponding stylesheets for whatever device is accessing the page.

    Enough about Java, but something similar, even if developed in house, would be your best bet.

    I also get away with using a Swing application to manage the database and users and run reports, providing an easy-to-navigate GUI which just interprets the same XML data that would be retrieved by the HTML client. Not a single change to the backend, since I'm using the beauty of SOAP, JSPs, servlets and Java.

    Virtualize your interfaces, standardize your backend and use re-usable components. Perl is similar in many ways, in that you can load libraries and abstract your code from the display, which will save you tons of hours of hassle in any future upgrades compared to the bit of extra work you would do now.
  • Don't Listen (Score:5, Insightful)

    by augustz ( 18082 ) on Friday December 21, 2001 @05:43PM (#2739461)
    A ton of people will tell you a ton of things, never having retrofitted anything.

    - Do not undervalue the investment you have learning your existing coding language. New challenges await you if you jump on a new language like java. Make the jump if you are excited about learning about the new language.

    - If you use your existing coding language you will literally fly through the retrofit. Do it piece by piece. Make all those changes first, then test app, then make next set of changes then test. The simple fact is, most wasted time is spent on bugs not working on performance, and you've already knocked down a lot of bugs, don't let them pop back up by blowing everything up. There are books on this.

    - Sometimes blowing everything up is worth it. Do it right this time. Realize it won't be as perfect as you might think it will be.

    - Remember there are countless open source and shareware products that tried to create TNG with a total rewrite, got nowhere, and ended up improving their existing product. Remember the lesson, bite off what you can chew.

    - Spend a week poking around researching possibilities. I do this all the time, bookmark things I think are important. Then for the next project you've got all the little things you might forget at your fingertips. Optimizations/Tools/Paradigms. Think you know it all? You'd be surprised at what is out there and what you missed. And what you spent a month in house re-inventing. This one's important.

    - Use open source software. Nothing beats free. Nothing is more fun. Java's ugly standardization history makes me puke... the BS Sun has pulled with Java is staggering. That the Java Lobby swallows it and loves it even more so. This is irrelevant to your question, and not fair to the Lobby, but I like to give them a hard time.

    - Corollary to Java: you need less abstract design than you think. Endless object hierarchies will weigh you and your app down... There are books on this too.

    - You need more documentation than you think. Ever found code someone ELSE wrote too EASY to follow? I don't think so. Especially if you are using perl and someone is enjoying the line noise capabilities perl allows. Perl has 20 ways to do EVERYTHING, and you may not know the latest or twistiest. Document as you WRITE the code. Do not leave at the end of the day without catching up the docs. A week of documenting is the worst form of hell, avoided with a minute's worth of clarification each time you write a function/class.

    - Hardware is cheap.

    Anyways, have fun... and good luck. Be interested to read what others have to say.
    • >- If you use your existing coding language you will literally fly through the retrofit. Do it piece by piece. Make all those changes first, then test app, then make next set of changes then test. The simple fact is, most wasted time is spent on bugs not working on performance, and you've already knocked down a lot of bugs, don't let them pop back up by blowing everything up. There are books on this.

      This is good advice. To be more specific:
      1. START with your regression test suite
      2. Then add self-documentation features like standard naming conventions. Seems dull and bureaucratic and pointless but really truly saves maintenance time.
      3. Have a standard comment header for each function (see the sketch at the end of this comment). The standard should answer questions like "Can that argument be NULL?" and "What do the error returns mean?"
      4. If you're going through every line already, do a security audit.

      There's good advice in the refactoring books, for example http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0201485672&vm=
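
      For point 3, a header template along these lines answers those questions before anyone has to ask them (the function and its contract are invented):

          #------------------------------------------------------------------
          # lookup_customer( $cust_id )
          #
          # Args:    $cust_id - required, positive integer; never undef
          # Returns: hashref of customer fields on success,
          #          undef if no such customer exists,
          #          dies on database connection failure
          # Notes:   results are cached for the lifetime of the process
          #------------------------------------------------------------------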
    • hardware is cheap, but bumping up the hardware requirements when you're talking about pushing out your new code to several hundred machines in a hosted data center will quickly kill that Xmas bonus you were hoping for...
      • Re:Don't Listen (Score:3, Interesting)

        by augustz ( 18082 )
        Yes, I agree with this. I actually dislike some of the easy but (what I consider heavy) choices like Java for this reason. You can get by with so much less so easily for many projects, and support literally 5x as many users per server. For a small business this is the difference between profit and no profit. For a large business this can be the difference between .com and .bomb.

        All too many times I see sites begging for money for hosting/bandwidth. Take a look at their HTML/CSS and see it is hugely bloated (no linked CSS, all caching prevented - including images - by default cache-buster installs) and not gzipped, and I wonder, if what I can see is so bad, behind the scenes it is probably even worse (i.e. dynamic page generation where none is needed). Wish I had included this in my list and left off the hardware point, which I agree is the wrong message.

        But damn, if you do code right, hardware is so cheap I can't believe it. I'm convinced some 10-machine projects with bad coding can be supported on a single machine now.
      • I agree with your point and I'd like to expand on it.

        Yeah, adding a gig of RAM or increasing the CPU from 800MHz to 1.8GHz isn't expensive. But if you go beyond what a single-processor machine can handle, you run into another host of problems.

        Adding a second CPU means *MUCH* higher chances of race conditions and other threading bugs. If you know you're coding for a single processor, you can often use a single-threaded model which makes life so much easier.

        Adding clustering brings a whole host of data synchronization problems. It's *ALWAYS* easier to code for a single machine than to code for a cluster. There are tools you can use to make shared memory easier, but those often flood the network.
    • I agreed with everything until the last line. That statement needs to be violently beaten out of every programmer, manager, and engineer on the planet. I almost want to suggest that the death penalty be decreed for that statement of heresy.

      The "hardware is cheap" mantra is exactly why we have crappy code and bloatware. "I don't have to optimize, hardware is cheap." "Why make it efficient? Hardware's cheap!" "I don't care that it is slow, hardware is so cheap these days."

      Please, everyone, carry stones in your pockets and throw them really hard at the next programmer that makes that statement.
      • That last line is making me have to BS my boss.

        "Sure, the dual tbird appro should handle 10000 users"

        omitting, of course, if it was all written in clean tight C, instead of java.
  • KISS (Score:2, Insightful)

    by ZaneMcAuley ( 266747 )
    Keep It Simple, Stupid. Keep code simple; why have funky lines of code that look awe-inspiring? It's pig ugly. I work on the principle of KISS. When I am working on a code area that is being cleaned up or fixed, I try to simplify that area. The simpler it is, the easier and quicker it is to maintain and bug hunt.

    I'm a strong believer in managed code (whether it's C# or Java; the managed extensions to C++ are damn pig ugly :) and in exception handling for separating error handling from the code that performs the actual work.

    Make sure there is GOOD in-code documentation (and out-of-code documentation) to explain the intention of that code. (Intentional programming is a research area, btw, for programming one's intentions.)

    You know something is bad when you have to maintain that code later on, 6 months down the line or whenever. That's when it bites you in the butt.
  • by adamy ( 78406 ) on Friday December 21, 2001 @05:45PM (#2739474) Homepage Journal
    To expand on the concept of Refactoring:

    1. Write a test for a specific block of code.
    2. Apply the refactoring [refactoring.org] (a worked example appears at the end of this comment).

    You are going to want a good testing framework.

    To expand on the modules post: Do a dependency analysis. If you are writing DB based code, look at what tables can be logically grouped together.
    We did something like this at my company not too long ago. The base-level package we had was the security package, which identifies users and roles. Most other packages depended on this. All contact management stuff went into a package called Directory. All stuff for the people our system was managing went into Participant, etc.

    For each of the packages, split the code out into a set of interfaces, and a set of implementing code for business logic, and the UI required to drive that business logic. This is the standard breakdown for code. You may want to further pull out code into helper packages. Avoid the urge to call these helper or util, and instead try to name them based on what they contain: we have one called format for stuff like phone numbers and social security numbers.

    Don't forget the make scripts. Whatever build system you use, it should be used to specify which modules you want to compile/deploy.

    I recommend a little UML modeling session for the end package structure.

    Go in little pieces. After each refactoring, make sure stuff still runs.

    Good Luck
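
    A tiny end-to-end example of steps 1 and 2 (all names are invented): first pin the behavior with a test, then apply an Extract Subroutine refactoring so the formatting logic that was copied into several CGIs lives in one module.

        # t/money.t -- step 1: a test for the behavior about to be moved
        use strict;
        use warnings;
        use Test::More tests => 2;
        use MyApp::Format;                 # hypothetical new home for the code

        is( MyApp::Format::money(0),      '$0.00' );
        is( MyApp::Format::money(1234.5), '$1,234.50' );

        # lib/MyApp/Format.pm -- step 2: the extracted subroutine
        package MyApp::Format;
        use strict;
        use warnings;

        sub money {
            my ($amount) = @_;
            my $s = sprintf '%.2f', $amount;
            1 while $s =~ s/^(\d+)(\d{3})/$1,$2/;   # insert thousands separators
            return "\$$s";
        }

        1;

    Run the test before and after each such move; if it stays green, the refactoring didn't change behavior.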
  • Do you have the original engineers for that code?

    Most knowledge is in their heads, not on paper, unfortunately. Their experience in that area can never be written down.
  • Plan a big sweeping rewrite; idealize the system you design based on your experience with the real one. Make the design clean, simple, and flexible.

    Don't build it. Instead, evaluate the code you have now and plot a course towards the idealized system. Approach the actual work of the "retrofit" incrementally. Count on having multiple customer-facing revisions of the software tagged and QA'd before the system you're delivering looks anything like the planned rewrite.

    Taking baby steps towards a new design is probably the only way you'll ever migrate your project to that design. With the knowledge you've accrued working on the old system, it probably seems straightforward to start from scratch. Even if this isn't wishful thinking, though, it's a waste of time. Part of the discipline of design is an understanding of where the "hot spots" are that can't tolerate inferior implementation, and how to tell those hot spots from the spongy mass of integration, reporting, tracing, and sanitizing that is neither performance sensitive nor mutable enough to justify engineering effort.

    You can take early baby steps that make it easier to make holistic changes down the road. Refactor relentlessly. Migrate code recklessly out of subsystems and into common repositories and libraries. I've found it handy to distinguish between "proper" shared library and "dumps" of utility code that don't need scrupulously conceived interfaces.

    Most importantly, design for testability. In this respect, the biggest asset you have is the steaming lump of old Perl code you're facing; use it to figure out the expected behavior of subsystems. Write replacements, in modules with clean interfaces, and unit test them. A unit test probes code (functions, statements, internal states) --- NOT entire programs. You'll work ten times faster when you can move forward as a team knowing what components you can trust and what components you need to worry about. You'll work ten times slower if you haven't clarified your outputs, side effects, and return values enough to know whether your replacement parts are valid!

    We've seen articles on Slashdot before about this and I agree with the prevailing opinion: rewrites are often seductive traps and time sinks that don't offer value to customers. A better mentality that will eventually get you where you (think you) want to be is to adopt a strategy of constant measurement (testing, profiling, debugging) and improvement.

  • It might seem like an obvious step, but don't throw away the old system until you're sure that the new one works! Keep somebody minding the existing, working system so that if/when your attempt to completely rework it fails you won't be stuck. Once you have rewritten it, try setting it up on a trial basis in parallel to the working system so you can find the crippling bugs before they take down your system.

    While it's not a perfect example, Slashdot is actually a decent example with their switch to their new system. They kept the old, crufty version as the primary and set up a beta site with the new software. They knew that there would be problems and got some of their more loyal users to test the new system, and only switched over after they were pretty confident that they had gotten the worst problems out of the way.

    You can afford to take a few more risks as long as you keep a known working system around as a fallback.

  • Small steps (Score:5, Insightful)

    by ChaosDiscordSimple ( 41155 ) on Friday December 21, 2001 @05:56PM (#2739534) Homepage

    Large overhauls are usually mistakes. Details in the previous code are lost. If the overhaul takes non-trivial time, people become frustrated that two weeks ago they had a working (if problematic) system and today most of the system doesn't work.

    Instead, make small incremental changes. Pick something lots of code is replicating and attempt to unify it into a shared code base. Spend some time documenting key parts of the code. Pick a particularly hairy class or function and untangle some of the worst bits. These sorts of changes can reveal minor bugs, build up to significant improvements, and leave you satisfied at the end of the day that you improved things.

    If a significant overhaul is necessary, try to overhaul portions while maintaining the existing bits.

  • by DotComVictim ( 454236 ) on Friday December 21, 2001 @05:57PM (#2739541)
    1) Identify common functionality.

    2) Encapsulate it in libraries (a sketch follows this list)

    3) Be sure to extract enough generality that you don't have special case functions

    4) Don't extract so much generality that functional interfaces become unwieldy.

    5) Write everything in the same language.

    6) Find any complex pieces or algorithms. If they can be simplified or re-written, do it. If not, save it so you don't need to debug it again.

    7) Throw everything else away.
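
    Applied to a pile of CGIs, steps 1 and 2 usually start with the boilerplate every script repeats: database connect, page header, error page. A hypothetical shared-module sketch:

        package OurSite::Common;        # invented name
        use strict;
        use warnings;
        use CGI;
        use DBI;
        use Exporter;
        our @ISA       = ('Exporter');
        our @EXPORT_OK = qw(db_handle page_start fail);

        # One place to change the DSN instead of 150+ scripts.
        sub db_handle {
            return DBI->connect( 'dbi:mysql:ourdb', 'www', 'secret',
                                 { RaiseError => 1 } );
        }

        sub page_start {
            my ($title) = @_;
            my $q = CGI->new;
            return $q->header . $q->start_html($title);
        }

        sub fail {
            my ($msg) = @_;
            my $q = CGI->new;
            print $q->header, $q->start_html('Error'),
                  $q->escapeHTML($msg), $q->end_html;
            exit 1;
        }

        1;

    Each CGI then begins with "use OurSite::Common qw(db_handle page_start fail);" and the redundancy disappears one script at a time.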
  • by angel'o'sphere ( 80593 ) <angelo.schneider@nOSpam.oomentor.de> on Friday December 21, 2001 @05:59PM (#2739551) Journal
    The first thing you definitely have to do is set up a test suite for regression testing.

    For those not familiar with the term 'regression test':

    Program a set of so-called "test drivers": programs that call your code (routines/scripts).

    Define test data, either in a DB or in flat files, used by those driver programs.

    The test programs and test data needs to work with the old code, of course.

    As the new code should behave similar you only need to adjust PATH or script names to let the test programs work with the new code.

    Plan your project by defining which test cases(test program plus test data) should work at a planned milestone with the changed code.

    After making changes rerun all tests.

    Well, there is a lot more you could do, but that above is minimum (basic software engineering, sorry no art involved here).
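
    A minimal driver in that spirit (paths and names invented), where one environment variable decides whether the old or the new tree gets exercised:

        #!/usr/bin/perl
        # t/drivers/run_report.pl -- regression driver for report.pl
        use strict;
        use warnings;

        # Point CODE_UNDER_TEST at either the old or the new checkout.
        my $tree   = $ENV{CODE_UNDER_TEST} || '/srv/app-old';
        my $script = "$tree/cgi-bin/report.pl";

        for my $case ( glob 't/data/report/*.in' ) {
            ( my $expected = $case ) =~ s/\.in$/.out/;
            my $got = `perl $script < $case`;
            open my $fh, '<', $expected or die "missing $expected: $!";
            my $want = do { local $/; <$fh> };
            my $status = $got eq $want ? 'ok' : 'not ok';
            print "$status $case\n";
        }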

    Regards,
    angel'o'sphere
  • Little by little (Score:3, Interesting)

    by Mike Schiraldi ( 18296 ) on Friday December 21, 2001 @06:05PM (#2739580) Homepage Journal
    Don't stop development on the old tree and shift all work to the restructuring project.

    Instead, leave most of the manpower working on the old tree as always. Take a small team of your best people and have them gather input from the others, while the others keep working on the old tree. Then have the small team outline the changes that need to be made.

    Work on the changes while simultaneously working on the usual stuff. Say, 90% of your manpower should do what they do now, and 10% should work on restructuring things. One mouthful at a time.

    When you have a mouthful that you think is ready, branch the old tree. Merge your diffs into the branch, TEST IT, and if things seem to work, land the change onto the old tree.
    • Wise, but may I suggest the obvious extension?

      Don't play with code trees. Use only one "tree": the current, live one. Don't let one code base rot unmaintained while the other one is hacked to bits, untested.

      Do take one mouthful at a time. Add unit tests, make sure they work. Now add a unit test for something you WANT to work, but which doesn't. Make that unit test work by implementing the feature. Release the result. Repeat.
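
      A small sketch of that rhythm with Test::More (the module and function here are made up): pin down what already works, add the failing test for what you want, then implement until it passes.

          use strict;
          use warnings;
          use Test::More tests => 2;
          use MySite::Cart;    # hypothetical module under test

          # Existing behaviour, pinned down so a refactor can't silently break it:
          is( MySite::Cart::total([ 10, 20 ]), 30, 'total sums line items' );

          # The feature we WANT: an empty cart totals to zero.  This fails until
          # it is implemented, and then stays as a permanent regression test.
          is( MySite::Cart::total([]), 0, 'empty cart totals to zero' );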

      -Billy
  • by burtonator ( 70115 ) on Friday December 21, 2001 @06:05PM (#2739581)
    Most of the computer industry now calls this Refactoring.

    I would HIGHLY recommend the book "Refactoring" by Martin Fowler.

    There are a number of things you should do here.

    1. Document your plan and come up with an official PROPOSAL document. Allow others to comment on this document and incorporate fixes for all relevant issues.

    I started using this under the Apache Jetspeed project and now a lot of other Apache projects are accepting this practice.

    It really allows the community to become involved in your changes and encourages constructive feedback and involvement.

    2. Break this into phases. You should NOT attempt to do this all at once. Each phase should be isolated and should consist of one unit of work.

    Each phase should be branched off of CVS, worked on, stabilized, brought back into HEAD and tagged. You should then RUN this code in a semi-deployment role for a period of time to correct all issues which WILL arise with the updated code.

    After this you can then start your next phase.

    3. UNIT TESTS! If management (assuming you have management) has approved the time for this type of refactor then you need to take the time and write Unit Tests for each major component.

    Be aware that Unit Testing can sometimes be just as hard as, if not harder than, the actual development itself.

    In some situations you can avoid Unit Testing; some here are going to call me crazy for saying this, but it is true. In a lot of high-level applications, which are NOT used as libraries by other applications, you can bypass Unit Testing in order to speed up development. This is a dangerous practice, but the risk is often outweighed by the extra functionality you will end up with in your product.

    Anyway. Good luck!

    Kevin
  • by beamz ( 75318 ) on Friday December 21, 2001 @06:05PM (#2739582)
    Not too long ago a link was posted to an interview with Joel Spolsky, who used to work at Microsoft.

    His comments about code reworking and rewriting have a lot of insight in them.

    Here are some quotes from his article:

    SMS: Joel, what, in your opinion, is the single greatest development sin a software company can commit?


    Joel: Deciding to completely rewrite your product from scratch, on the theory that all your code is messy and bug prone and is bloated and needs to be completely rethought and rebuild from ground zero.

    SMS: Uh, what's wrong with that?

    Joel: Because it's almost never true. It's not like code rusts if it's not used. The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. There's nothing wrong with it.


    The point is this: spending time completely rewriting your code may be a waste of the company's resources; that is for you to determine.

    His interview is here:
    http://www.softwaremarketsolution.com/index_2.htm

    and his site has more information about the concepts here:
    http://joel.editthispage.com/articles/fog0000000069.html
    • This advice makes sense if you are embedded in a culture like Microsoft, where you have a huge quantity of legacy code, and a really shitty software process. Microsoft has no record of what design decisions were responsible for making their code the way it is; they just have the code itself. They can't change it because they don't know what they might break by doing this.

      If you enforce a good software process for every line of code that is written, then you have more flexibility. For example, suppose that you use Extreme Programming. Then you will have a unit test for every bug that was fixed. Each time you re-run the unit tests, you find out if any of your previous bug fixes have become undone. This gives you the freedom to refactor your code whenever it needs to be refactored.

      Doug Moen.
      • Microsoft doesn't have quality code, but it DOES have reams of documentation. They have, and follow sporadically, a really strong process -- in fact, a huge number of really strong processes.

        In many of their products, the design decisions are documented, and documented well.

        As with all documentation at that level of detail, good luck finding the right page...

        Other than that, you're right.

        -Billy
          Interesting. I think that a software process can be "strong" (tedious, bureaucratic) without necessarily being "good" (i.e., effective). Does Microsoft's current process actually deal with the problem of not being able to refactor your code without losing all of your bug fixes? Keeping all of your design history in labyrinthine documentation that is difficult to find anything in is expensive, but doesn't solve this particular problem.

          Doug Moen.
          • Well said, on all points. Microsoft's problem is that they have too many processes, and IMO most of them are too heavy (and therefore they aren't really being well-applied). So they don't have A current process; they have many of them.

            I personally have a bias against strong processes. I do admit that they can work, and work well; but they also cost a LOT, in every way; it's very hard to tell whether you're applying enough of the process to make it work. I'm definitely an XP fan; there, your process is light (although it IS hard), and feedback is immediate and automatic.

            -Billy
  • by tim_maroney ( 239442 ) on Friday December 21, 2001 @06:08PM (#2739594) Homepage
    I second all that has been said about making sure that you really need to do this and that it is worth the time and risk. One sign that you may need to do so is an excessive reopened bug rate, where fixing one bug often creates another bug due to side effects and component interactions. If you decide that it is, then the three keys to success will be modularity, incremental rollout, and unit tests.

    Modularity is probably what you're already thinking about. Go over the old code base, in a code review, and find where the same thing is done over and over either with copy-and-paste code -- the bane of crap engineers -- or with different code that serves the same ends. Look for repeated sequences in particular. Create a new library that encapsulates those pieces of code.

    Incremental rollout is vital. Only replace small parts of your system at a time, doing complete retests frequently. Don't write a new encapsulated routine and then roll it out to each of the three dozen places in which it appears in the whole code base. Write the whole function library, with unit tests, and then start applying it to separable modules one by one, retesting as you go. Otherwise I guarantee the whole thing will fall apart and you won't be able to tell why. Ideally, you might set a threshold on the rate of replacement of old modules and work primarily on creating new modules with the abstracted logic.

    Unit tests are crucial because, as noted, the messiness of your old code probably conceals a lot of necessary logic. We had this great phenomenon on Apple's Copland where people who had never used the old OS managers were rewriting them in C or C++ from the assembly source. When they saw something in the assembly they didn't understand, they just ignored it. Guess what -- the new managers didn't have any backwards compatibility. The only answer to this is to have a thorough unit test for any module that you replace, against which you can test the new version. This also confers other quality benefits, but during a rewrite it's critical.

    Finally, once you have replaced a significant number of your modules, you will find that new levels of abstraction appear. The average size of each function or method will have shrunk considerably, and now it becomes possible to see new repeated code sequences that were not visible due to the old cruft. Move these into your new library modules and start using them in continuing replacement work. In addition, start going back -- slowly and incrementally -- through the already converted modules and replacing the repeated sequences with calls to the new abstractions.

    Finally, figure out how you got into this mess in the first place. The worst programmer habit I know of is copy-and-paste coding instead of using subroutines. You can tell people not to do it, but some always will. Those people should be bid farewell -- you can't afford their overhead. Other common problems include lack of planning and review, a code first and think later mentality. Start moving your organization up the levels of the CMM and you may find that you wind up with fewer modules that need replacement.

    Hope this helps.

    Tim
  • In particular, how do you write a large program? I work on a program that has over 3000 files of 30 to 300 pages each. The software does a lot, has been around for over 30 years, and has just had code sloppily added to it. So how does one go about rewriting an application whose full functionality no one knows?
  • This is a redundant post, but I think it's important enough to be repeated.

    Don't set out to refactor your codebase as one big project. Try to split up the code by functional areas and take them on one at a time. Now, this doesn't really work too well most of the time - you're going to run into way too many places where things are interdependent - but try anyway.

    If you can, resist changes to your database schema while you are refactoring code. Having both of these things happen at the same time is pretty scary.

    You say CGI so I assume you are using Perl; look into OO Perl, it's worth it. Even if you don't want to go OO all the way, the two big things that can make your life easier are packages and layering.

    Using packages, especially for things like DB access, can save you tons of time and headaches. You have one place where you run a query and build a hash, and all of your code calls it when it needs the data. HUGE advantages here.

    Layering your design is helpful as well. I've found that you can do a lot of good if you have designed the Data Access, Logic and Presentation layers separately. All each one of these layers needs to do is take the hash ref passed by the other layer and do X with it. You can rebuild each layer at will as long as the data structures passed between them don't change.
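
    A rough sketch of the kind of DB-access package being described (connection details and table are invented); the logic and presentation layers only ever see the hash refs it hands back:

        package MySite::DB;              # hypothetical data-access layer
        use strict;
        use warnings;
        use DBI;

        my $dbh;
        sub dbh {
            # One connection, set up in one place.
            $dbh ||= DBI->connect('dbi:mysql:mysite', 'user', 'secret',
                                  { RaiseError => 1 });
            return $dbh;
        }

        # Callers get back an array ref of hash refs and never touch SQL.
        sub products_in_category {
            my ($category) = @_;
            return dbh()->selectall_arrayref(
                'SELECT id, name, price FROM products WHERE category = ?',
                { Slice => {} },         # each row comes back as a hash ref
                $category,
            );
        }

        1;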

  • chizor wrote:

    My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code)... What suggestions might readers have about tackling a large-scale retrofit?

    My advice for successfully accomplishing the changes:

    • Where possible, use Python or Java instead of Perl
    • If there is a line-of-business piece of code (i.e. if your company's bread is on the line when this fails) make sure you can roll the original code back if the new version doesn't perform
    • Divide the programming team in groups of 2-4 people
    • Use XP techniques for development
    • Have an external group evaluate progress to keep people honest; this group is responsible for testing that functionality meets or exceeds that of the original system

    I led the development and migration of some very large mission-critical systems in my career. Too many programmers making decisions on-the-fly, totally centralized management, or a "leave the technical folks alone until they're done" attitude are sure recipes for disaster.

    Good luck with the changes.

    Merry Christmas, and God bless us everyone!

    E

  • DON'T DO IT! (Score:3, Interesting)

    by mcrbids ( 148650 ) on Friday December 21, 2001 @06:26PM (#2739651) Journal
    As posted elsewhere, re-writing is painful, and generally NOT A GOOD IDEA!

    I currently maintain a code base of around 120,000 lines of php and html (written by myself in a long, hard year) and have had to "retrofit" it a few times.

    I find that when it's time to do an "over-haul" it's generally best to:

    1) Pretend I know nothing - redesign from scratch. Write out a spec with flow charts, DB table definitions, etc. - make it VERY DETAILED. Spend lots of time at it. More time spent here saves even more time later.

    2) Ignore your spec. (See step three)

    3) When a bug comes up, or new functionality needs to be added to the codebase, refer to the spec built in step 1, build to it, and then put in compatibility wrappers to work with the existing codebase.

    Make these compatibility wrappers log their calls in some way, based on a global variable (a sketch follows this list). This allows you to see when they're no longer needed simply by defining a variable in a config file and waiting a while.

    4) You'll be slowly bringing the application up to the new spec - eventually you'll reach a point where it's easier just to bring the remaining pieces up to snuff than to build more abstraction wrappers. When you get to that point, you'll find most of the work is already done; just finish it to the spec and remove the compatibility wrappers.
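
    A sketch of the wrapper-plus-logging idea from step 3 (the names and the flag are invented):

        use strict;
        use warnings;

        our $LOG_COMPAT = $ENV{LOG_COMPAT_CALLS};   # or a flag from your config file

        # Old call sites keep using get_user_row(); new code calls User::fetch()
        # directly (hypothetical).  The wrapper logs each use so you can tell
        # when it is finally safe to delete it.
        sub get_user_row {
            my ($id) = @_;
            if ($LOG_COMPAT) {
                my (undef, $file, $line) = caller();
                warn "compat: get_user_row called from $file line $line\n";
            }
            return User::fetch($id);
        }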

    This can still be a painful process, but at least it isn't a "gun to your head"! This allows you to regression test your work as it's done, resulting in a more stable deliverable, and you can still meet clients' needs in the meantime without making them wait 6 months while you re-write all your stuff.

    Hope this helps...

  • ...don't use Perl ;-)


    [-1 Flamebait]

  • First, read this essay: The Big Ball of Mud [laputan.org]. It is an interesting look at why, when we all know that spaghetti, gnarly, twisted code is bad, it happens anyway (hint: it may mirror your understanding of the problem).

    Ignore the "don't touch it" naysayers. Even before it's done, it'll be much nicer code to deal with. You can make decisions with less nagging doubts. You'll code onward with gusto. You'll be able to accurately predict the names of methods without looking them up.

    Test the current state of things at all points through the process. I'm hoping that you have lots of automated tests you can run everytime code is checked in; if not, make them FIRST, before the overhaul. You are majorly diverting the intent of the code at hundreds of points; you can run astray in so many places that the above naysayers would be correct. Constantly assure yourseleves that the code is working. Go out of your way to ensure that the code is buildable and runnable, even to the point of writing scaffolding you know will be soon thrown away.

    Burning a little incense every day in obesience to the Gods can't hurt, and will make the room smell nice.

    mahlen

    Shantytowns are usually built from common, inexpensive materials and simple tools. Shantytowns can be built using relatively unskilled labor. Even though the labor force is "unskilled" in the customary sense, the construction and maintenance of this sort of housing can be quite labor intensive. There is little specialization. Each housing unit is constructed and maintained primarily by its inhabitants, and each inhabitant must be a jack of all the necessary trades. There is little concern for infrastructure, since infrastructure requires coordination and capital, and specialized resources, equipment, and skills. There is little overall planning or regulation of growth. Shantytowns emerge where there is a need for housing, a surplus of unskilled labor, and a dearth of capital investment. Shantytowns fulfill an immediate, local need for housing by bringing available resources to bear on the problem. Loftier architectural goals are a luxury that has to wait. -- from "The Big Ball of Mud"

  • by runswithd6s ( 65165 ) on Friday December 21, 2001 @06:31PM (#2739666) Homepage
    As everyone else should be telling you, your existing codebase is a very valuable resource. It's been tested and debugged a number of times. It's mature, it's stable, it's there. Don't throw out the baby with the bathwater.

    Now, on coding standards and how to incorporate them into a legacy project. Your concern is NOT format. The format of your code, such as indentation, spacing, etc., should be the least of your concerns. Everyone has their own style, but there are wonderful tools that you can use to force everyone to a single style, indent and astyle just to name a couple. Use a wrapper script around CVS (or a similar system) on checkins: force the code through an automated cleanup, check the code back out, and make sure it compiles/runs as expected.

    What you should worry about is how much your design team has embraced the "black box" design principle. Parameters go in, results come out, with no "side effects" that impact the remainder of the code. Make your code re-entrant, i.e. stay away from globally scoped variables as much as possible.
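
    A before/after sketch of that principle (names invented):

        use strict;
        use warnings;

        # Before: depends on hidden global state, so its result depends on
        # whoever ran last.
        our ($tax_rate, $order_total);
        sub add_tax_global { $order_total += $order_total * $tax_rate }

        # After: a black box -- parameters in, result out, no side effects.
        sub add_tax {
            my ($total, $rate) = @_;
            return $total + $total * $rate;
        }

        my $with_tax = add_tax(100, 0.08);    # always 108, regardless of history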

    Someone's going to give you the whole OO-Design sales pitch. Yes, it's nice on paper, but don't sell out because something looks nice on paper. I learned this the hard way. I have a tendency to overdesign things. With OO, this gets to be really scary. I waste my time writing object classes for "everything" instead of simply designing the software to its functionality spec. Make things more "object oriented", "functional", or "blackboxed" when you find yourself repeating code elsewhere in the application.

    Don't spend a lot of time with naming standards such as Hungarian, Modified Hungarian, etc. Find a style that you and your team are comfortable with for the Interface API level. Below the Interface API, be more lenient. It's likely that portion of the code will undergo many changes anyway.

    And most importantly, document! This is the single most important issue of any coding project. Either force your developers to write docs as they go, use embedded documentation solutions, or hire a techwriter to follow you and your team around for a few months. A documented API is the quickest way to start someone off in the project, and a great way to keep track of the flow of the program.

  • Does it work? (Score:2, Insightful)

    by litewoheat ( 179018 )
    If it works, don't break it. Pointless rewrites do nothing but feed the programmers' egos and knock the company out of business. It's happened over and over and over and over again. Progress, not regress.
  • My suggestions (Score:5, Informative)

    by Pinball Wizard ( 161942 ) on Friday December 21, 2001 @07:32PM (#2739828) Homepage Journal
    My programming team is considering making some sweeping changes to our code base (150+ perl CGIs, over a meg of code)


    First of all, I think it's important to realize that you have a medium-sized website and not a big software project. Therefore, some of the above comments recommending refactoring, UML, and eXtreme programming may be a bit overkill.


    Web programming != software development! It's usually done at a much faster pace. Even if an object-oriented approach is taken, you are still probably talking about simple function libraries rather than complex C++ or Java classes. Again, overkill.


    150 files is still a small enough project to be managed by one or two decent coders. Actually, I just looked at the amount of stuff I've written over the years for my online bookstore [page1book.com] and it's more like 500 files and over 4 megs of code. I don't feel like it's too much of a job to manage this codebase by myself.


    So, here are my recommendations.


    You probably have gotten better at programming since the time you started your project. Take a few of the most recent CGIs you have written and compare them to the first ones you wrote. You just might notice a glaring difference in the quality. Also, the first pages you wrote are likely to be among the most important in your project, yet they are also likely the worst quality-wise.


    Regardless of what language you program in, I think it's important that you can tell what's going on in the program by reading the comments. If a manager can understand what a program does by reading the English bits, there's a good chance other programmers will be able to jump in and help as well. One specific rule I also follow: if you use regexes, say IN ENGLISH what those regexes do. I say this because regexes are one of the hardest things to read.
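
    Perl even lets you put that English inside the pattern: with the /x modifier you can space a regex out and comment each piece (this phone-number check is just an invented example):

        my $phone = '(555) 867-5309';
        if ( my ($area, $exchange, $line) =
                 $phone =~ / ^ \(? (\d{3}) \)?   # area code, parens optional
                             [\s-]?              # optional space or dash
                             (\d{3})             # exchange
                             -?                  # optional dash
                             (\d{4}) $           # line number
                           /x ) {
            print "area code $area\n";
        }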


    Look for any code that can be "factored out" of your scripts and put those into function libraries. Then include those in your program. The only problem with this occurs when you have huge function libraries that slow down your scripts when you include them. In that case you would logically separate your functions into different files. I have included very common functions in different include files, so I can make the actual code compiled or interpreted as small as possible.


    Consider using a flowcharting tool as an aid to programming and/or documenting your code.


    Standardize how you name variables and functions, how you write comments, and your indentation and spacing.


    Be sure to include the date you wrote each script in its comments, in case the filesystem wipes this information out.


    I'm sure there are other things I've left out, but following the above guidelines has helped me do exactly what you are trying to do: manage a growing codebase. But don't forget, this is web programming, not rocket science, and some of the above suggestions may be more trouble than they are worth. Keep it simple.

  • by Degrees ( 220395 ) <`em.hcsireg' `ta' `seerged'> on Friday December 21, 2001 @07:38PM (#2739846) Homepage Journal
    IBM was faced with a similar problem when they came out with the RISC version of the AS/400 hardware. They needed to re-write, but they had to maintain absolute compatibility. I think the articles describing this were in Dr. Dobb's Journal (although I might be wrong on this). Unfortunately, I do not have URLs for them.

    One of the success factors they found was documenting the interfaces for each and every call between modules. The documentation turned out to be excruciatingly precise - but this led to zero ambiguity (and thus 100% interoperability). It also required meetings (sometimes arguments) between programmers to hash out what was actually going to happen. Another factor was that they decided to allow zero 'overloading' of functions by different modules. A programmer was not allowed to duplicate someone else's work, nor create a second, incompatible version of a function provided in a different module. If the function was provided by someone else's module, the programmer had to call it (properly). The result was that they reaped the benefit of object oriented programming - reuse and refinement of modular libraries.

    It would be better if you could get the real scoop from the real programmers - but this might give you something to think about.

  • by William Tanksley ( 1752 ) on Friday December 21, 2001 @07:48PM (#2739880)
    Some here are warning you that major changes always require a total rewrite; yet in real life, total rewrites result in an inability to compete (look how long the rewrite left Netscape paralysed and unable to meet Microsoft's challenge!). There's some good discussion of the danger of rewriting at a former MS software engineer's site [joelonsoftware.com], and some limited advice about how to get away without doing it.

    But you've decided to rework rather than rewrite, you say, so I have no doubt you'll ignore the naysayers here. So what CAN you do? After all, as you recognise, reworking is dangerous!

    The following rules have worked for me; I've refined my own experience with advice from Fowler's Refactoring, a book as useful as Design Patterns, and with study of Extreme Programming, a design methodology forged in the traditions of Smalltalk, and in the knowledge that maintenance, the most important and expensive part of software engineering, is also the least studied.

    First, do the simplest thing that could possibly work. Don't EVER take your program out of commission for more than a day; make sure it runs at the end of each day. If you're doing something and at the end of the day your code base is broken, STRONGLY consider throwing away your changes and going back to the design stages.

    Second, rely on unit tests extensively. Start every change by writing as extensive a unit test as possible. Unit test every function you touch, BEFORE you touch it, and after. Unit test every change you make, and run the unit test BEFORE you make the change to ensure that it fails (i.e. it detects the change). Write your unit tests BEFORE you write code, whenever possible; you'll objectively know your code is done when your unit tests pass.

    Third, don't design too far ahead; you don't know what tricks the old code is going to throw at you. Implement one feature at a time, bringing the code into compliance. Once everything has a unit test (thanks to your following the above principles), THEN you can safely embark on larger design changes -- and in the meantime, you have working code with new features, a win even if your customer/boss/manager decides not to continue.

    Fourth, don't be afraid to redesign your own code. The stuff you wrote has more tests, so it's safer to change, but it's more likely than the old code to lack some critical understanding only age can give.

    Fifth, use the principles of refactoring. Whenever possible split each code change into two parts: first, a part which changes the structure of the code without changing its function (and which therefore allows you to run the same unit tests); and second, a part which uses the new structure to perform a new function (thereby requiring new unit tests).
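
    For instance (hypothetical code), the two halves of a single change might look like this:

        # Step 1 -- structure only: pull the duplicated price math into a sub.
        # Behaviour is unchanged, so the existing unit tests must still pass.
        sub line_total {
            my ($qty, $price) = @_;
            return $qty * $price;
        }

        # Step 2 -- new function: build the new feature on the new structure,
        # covered by unit tests written before this code.
        sub discounted_line_total {
            my ($qty, $price, $discount) = @_;
            return line_total($qty, $price) * (1 - $discount);
        }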

    Good luck. If you want more advice, read up on Extreme Programming [extremeprogramming.org].

    -Billy
    • Much agreed. We've found, though, that sometimes it's tough to write as many tests as you really should. Before you even start changing code, write a pile of tests-- it's best if you can just allocate a week or three for it.

      Beyond that, it's tough to stay motivated about writing tests; people want to write new code, not test the old. So mix it in-- we've decided to require a test to be checked in with every bug fix, figuring that if it broke once, it'll break again.

      While I'm at it, I totally disagree that a total rewrite is always the wrong choice-- you just have to keep significant resources going ahead on the old stuff. If your old stuff really was that bad, the new will catch up with it without too much trouble. If it doesn't, well, you didn't really need the rewrite.
  • Don't rewrite your code. That's a waste of time.

    Don't spend a big chunk of time refactoring it either. Waste of time too.

    Instead, make slight refactorings as you go. But make sure you are doing what you are really being paid for: implementing business value.

    And you'll find that you'll have much more courage to refactor if you have a full set of automated tests, so maybe you should work on tests first.

  • I've rebuilt dynamic sites from scratch twice. First and foremost, if you're rewriting because of serious scalability or design weaknesses, then it is unavoidable. If it's just to get rid of annoying things, then I would say don't even try it. I consulted at a fairly big E-Commerce site that was crawling and couldn't handle the traffic. The original site was built by a programmer who scaled examples provided by MS. After it was done the whole site was a dog and would crash constantly. They finally brought in a programmer who was able to rewrite parts of it and make it work. After 7 months of intensive work, those two people stabilized the site. They decided to completely rewrite the site and I was contracted to help.

    In this particular case, it was necessary because the site was right at its max. If the traffic increased, it would kill the site. Since it was an E-Commerce site, rewriting it was fairly straightforward. The old code kept running until we were able to finish the new system and make sure it was stable and ready.

    As a consultant, one of the most important aspects is detailed documentation that explains both the high and low level details. Often I will include very specific details about why a design was chosen and what limitations it has. When applicable, I will also describe how to extend, or modify the code to support additional features. This means you spend a lot of time doing documentation, but it forces you to think about a design more thoroughly and will expose weaknesses. Always keep an open mind and never fall in love with your design. There is no right way to build something, only right for the situation you are given.

  • Obviously you didn't read this slashdot article [slashdot.org] about Joel on Software.

    What is the #1 thing he says causes software companies to fail? Rewriting from scratch!!!

    I won't deny that there are times it has to be done. Joel points out some of those times (and yours isn't one of them, even from the little you've written). Ours happened to fit everything he said, and we did rewrite; not only was it the right thing to do, it was the only option for us - but the company just barely survived the process.

    Don't take that lightly. I speak with experience and Joel is right. You don't rewrite from scratch unless there is no conceivable alternative. Joel describes it well, so I won't "rewrite" it. You can just click the link and read it yourself.
  • php project mngt app (Score:2, Informative)

    by gol64738 ( 225528 )
    a good project management application is important for any development team. usually, these are hard to come by unless you plunk down $10,000 or more, although these come with a gazillion features that you probably won't end up using.

    i discovered a new tool on sourceforge [sourceforge.net] which is an open project written in php.

    i'm impressed with it. the code is also well documented.
    the homepage can be found here [tutos.org].
    i recommend checking out the screenshots as well.
  • When you first design and implement some module, a lot of time is involved in cycling between "ok, I know what to do" and "huh, maybe not". I've found this crucial, esp. in team work, in order to gain a good conception of the scope of the task. Also, many external issues, e.g. how the module interacts with the system, efficiency, etc., that aren't pure functional issues, are first grappled with here.

    Refactoring is different from this, in that you're probably very comfortable with the "state of mind" of the code. Instead of creating, you'll be clarifying. So, most of the refactoring is in your head (99%). All the external issues have been addressed before (or else this probably isn't really refactoring), so just work at a white board with your team until writing the code will basically be transcription (1%).

    I've found this to yield the best code.
  • by Codifex Maximus ( 639 ) on Friday December 21, 2001 @11:40PM (#2740275) Homepage
    if it works... don't fix it.

    If you feel you have to fix it, then prioritise the most problematic parts and fix them according to a set plan/policy. Use a naming and calling convention. Break functions that do more than one thing up into component functions that can be tested, verified and reused by other parts of your program. Fix it incrementally, not all at once. Try using an interface contract when you make objects; that way, your new functions can call new methods and the old code can depend on the old methods still being there. Deprecate the old methods when no code depends on them. Don't forget to comment - comment the code, then come back the next day and read your own comments. Make changes to the comments so they make sense today.
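
    One hypothetical way to honour that contract in Perl: keep the old method as a thin shim over the new one, and have it complain so the remaining callers can be found before it is removed.

        package MySite::Order;           # invented example class
        use strict;
        use warnings;
        use Carp qw(carp);

        # New, preferred interface.
        sub total_in_cents {
            my ($self) = @_;
            return $self->{total_cents};
        }

        # Old interface: still works, but tells you who is still using it.
        sub getTotal {
            my ($self) = @_;
            carp 'getTotal() is deprecated; use total_in_cents()';
            return $self->total_in_cents / 100;
        }

        1;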

    Blah... blah... blah... Etc... etc... etc...
  • by LoveMe2Times ( 416048 ) on Saturday December 22, 2001 @12:38AM (#2740388) Homepage Journal
    There are a few things I'll say. One, not a single person posting here really knows what your situation is (well, unless your coworkers read /. and are posting advice for you...). I don't want to be mean, but most of what's been posted here is likely not applicable to your situation. For starters, ignore all the threads arguing over languages: they'll never agree, and in the meantime you have work to get done. Next, I would suggest ignoring all of the posts from people who aren't talking about web sites. There are a lot of methodology recommendations there, and it takes a long time to really understand *any* methodology. So if you don't already know it, it won't help for *this* project. If you're interested in that kind of stuff, read up on it; maybe it'll be useful on a future project.
    Next, no matter what you decide, from your current position, there's substantial risk involved. If you don't have a good way of estimating the costs and benefits of your alternatives, then you will wind up shooting in the dark no matter what you choose. This isn't really unusual, but it can be stressful! Somebody in management has to decide if they want to play high stakes poker with this project. This will establish how many nice risk mitigating activities you can budget (like unit tests, code reviews, documentation, etc). One thing a lot of technology people have difficulty grasping (this isn't directed at anybody in particular), is that if your management decides on a high risk course of action, it's not WRONG because it's technically inferior. Now, if management doesn't understand that it's a high risk proposition, then there's a communication failure somewhere that will screw you no matter what gets chosen.
    Ok, now that you've decided where you're going, I've got 3 pieces of useless advice (all advice is useless advice, btw...)

    1) Know your priorities. This is the most important thing to getting ANY solution.
    2) Understand your requirements (I don't care how you do this). This is the most important thing to getting (close to) a CORRECT solution.
    3) Remember it's only a job. This can be very important for retaining sanity :)
  • by Chris Johnson ( 580 ) on Saturday December 22, 2001 @09:15AM (#2741154) Homepage Journal
    I'm currently doing something similar- the program I maintain is Mastering Tools, which has long been an over-featured, densely packed program operating on a text input basis with a certain amount of visual feedback on the text input.

    However, I've long known that there are two things my real users (ideal users? the serious mastering engineers) want: a familiar interface and realtime processing. I can't deliver realtime processing without literally doing the whole thing over again in a language I don't know (it's done in REALbasic, which is little better than a scripting language for speed, though it's got really, really nice prototyping abilities and GUI support).

    However, the time's come to completely overhaul the interface, partly because I have some ideas for mid/side processing and don't have any room in the current layout to fit them! The ideas have to do with rectifying the side channel and using it to either enhance or remove signals that aren't in both channels equally - where regular mono is a 'node' that totally eliminates out-of-phase content, it's also possible to completely eliminate R-only and L-only content and keep only R+L and also the out-of-phase content.

    That part's the easy part- it's simply signal processing (and will be duly released under the GPL as soon as it's done, as always). However, the interface is asking for a total, complete overhaul in several ways, and that's what's taking all my effort currently. Here's the situation and how I'm handling it...

    Layering. It's no longer possible to fit the whole app in a 640x480 area, even with small print. There are several possible answers to this: one would be having separate windows. This would be more adaptable to larger screens, but it's untidy and there are issues with closing windows and still referring to controls they contain - so that's out. What's looking more reasonable is tabbed panels - RB implements a nice little drag-and-droppable tabbed panel control that appears quite easy (though I had some trouble attempting to do nested tabbed panels - after one experience of having all the nested panels (at identical coordinates) switch to the top panel of their parent panel, I quickly gave up on that concept). Instead I'm using more panel real estate and trying to divide the controls into logical categories. That is, of course, a real headache - doing interface properly is hard! (says 'interface is hard Barbie') It's only somewhat easier with the additional space. Complicating matters further is the expectation of the intended audience here. It has to both be organized and look and feel like a mixing board, amplifier or rackmount box of some sort.

    Solution: implementing controls like knobs and meters. It's actually quite fun to code a knob appearance out of graphics primitives- and surprisingly hard to get mouse gestures to work on the damn thing- using a two-arg arctangent routine in RB that I don't fully understand. I also have a single meter control already implemented for azimuth display- think that it might be best to tear that apart and re-implement it in a more general sense.

    That's because the Knob class turned out to be the right thing to do- it's implemented almost totally separate from the main body of the program, as if it were a RB control or something. Knowing I was going to be using it in different sizes, possibly different colors etc, I wrote the knob code as completely scalable- from maybe 20 pixels to over 100. It reads its size from the control width and height, runs itself as far as handling mouse input and storing a control value, and does not inherit comparable interfaces to the controls it will be replacing- so when it's time to plug 'em in I can run the program and see what routines crash and burn (and need to be rewritten for the new control interface). I'm thinking meters need to also be handled in the same way- somehow- not clear on the form yet.

    So: dunno what else to tell you, but it seems like the things I'm doing that are helpful are: compartmentalize new interfaces, make them adaptable and get them working independently of the existing code, while leaving the existing code in working condition. Then when the new stuff is brought online do it in such a way that you could do it one control at a time or in small batches.

    Dunno if that's relevant to where you're coming from- but it's what I've found necessary when facing a major re-implementation.
