Organizing Source Code, Regardless of Language? 64

og_sh0x queries: "I'm looking for a source of information dedicated to organizing source code. I see a lot of books and other resources covering syntax and various syntax-related philosophies, but I can never seem to find a good resource for organizing source code in general. For instance, at what point do you split that massive source file into multiple files? At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function? These are problems that plague many programming languages. Are there such resources that cover these issues?"
  • This (Score:4, Insightful)

    by psavo ( 162634 ) <psavo@iki.fi> on Tuesday July 02, 2002 @06:10AM (#3806294) Homepage
    is called 'Experience' part of your CV.

    I've yet to find a simple way to determine any of those. It's just that feeling you get while looking at the code: 'damn, not again...'.
    • Refactoring (Score:5, Informative)

      by fingal ( 49160 ) on Tuesday July 02, 2002 @06:40AM (#3806363) Homepage
      I would strongly recommend reading "Refactoring" (Martin Fowler, Addison-Wesley, ISBN 0-201-48567-2), after reading Design Patterns, for the solid techniques it introduces: clearly defined manipulations that change the shape of code without changing its functionality. However, even this doesn't completely resolve the question of "when" that you raised, as summed up in the introduction to chapter 3, "Bad Smells in Code":-
      By now you have a good idea of how refactoring works. But just because you know how doesn't mean you know when. Deciding when to start refactoring, and when to stop, is just as important to refactoring as knowing how to operate the mechanics of a refactoring.

      Now comes the dilemma. It is easy to explain to you how to delete an instance variable or create a hierarchy. These are simple matters. Trying to explain when you should do these things is not so cut-and-dried. Rather than appealing to some vague notion of programming aesthetics (which frankly is what we consultants usually do), I wanted something a bit more solid.
      I was mulling over this tricky issue when I visited Kent Beck in Zurich. Perhaps he was under the influence of the odors of his newborn daughter at the time, but he had come up with the notion of describing the "when" of refactoring in terms of smells. "Smells," you say, "and that is supposed to be better than vague aesthetics?" Well, yes. We look at lots of code, written for projects that span the gamut from wildly successful to nearly dead. In doing so, we have learned to look for certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring.

      One thing we won't try to do here is give you precise criteria for when a refactoring is overdue. In our experience no set of metrics rivals informed human intuition. What we will do is give you indications that there is trouble that can be solved by a refactoring. You will have to develop your own sense of how many instance variables are too many instance variables and how many lines of code in a method are too many lines.

      The book then goes on to describe the various types of "abstract smells" and what sort of corrective techniques can be applied to each, for example:-

      Inappropriate Intimacy
      Sometimes classes become far too intimate and spend too much time delving in each others' private parts. We may not be prudes when it comes to people, but we think our classes should follow strict, puritan rules.

      Overintimate classes need to be broken up as lovers were in ancient days. Use
      Move Method and Move Field to separate the pieces to reduce the intimacy. See whether you can arrange a Change Bidirectional Association to Unidirectional. If the classes do have common interests, use Extract Class to put the commonality in a safe place and make honest classes of them. Or use Hide Delegate to let another class act as a go-between.

      Inheritance often can lead to overintimacy. Subclasses are always going to know more about their parents than their parents would like them to know. If it's time to leave home, apply
      Replace Inheritance with Delegation.

      I have frequently found that just reading through this short collection of roughly fifteen abstracted "smells" is a very good way of supplementing the "experience" that you speak of. It helps you make decisions with the benefit of a) a bit of third-party support, and b) a clearly defined set of rules for applying each of the refactorings, including test cases to prove that the functionality has not been changed in the process and, more importantly, a clean roll-back procedure for those times when the olfactory senses get a little bit confused...
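      To make "Inappropriate Intimacy" and Move Method concrete, here is a minimal before/after sketch in Python. The Invoice/Customer classes and the discount rule are invented for illustration; they are not taken from Fowler's book:

```python
# Before: Invoice reaches into Customer's internals to compute a discount,
# i.e. the two classes are inappropriately intimate.
class CustomerBefore:
    def __init__(self, loyalty_years):
        self.loyalty_years = loyalty_years

class InvoiceBefore:
    def __init__(self, customer, amount):
        self.customer = customer
        self.amount = amount

    def total(self):
        # Invoice knows the discount policy, but Customer owns the data.
        discount = 0.05 * min(self.customer.loyalty_years, 5)
        return self.amount * (1 - discount)

# After applying Move Method: the discount logic lives with the data it uses.
class Customer:
    def __init__(self, loyalty_years):
        self._loyalty_years = loyalty_years

    def discount_rate(self):
        return 0.05 * min(self._loyalty_years, 5)

class Invoice:
    def __init__(self, customer, amount):
        self.customer = customer
        self.amount = amount

    def total(self):
        # Invoice no longer cares how the discount is derived.
        return self.amount * (1 - self.customer.discount_rate())
```

      The external behaviour is unchanged (both versions compute the same totals), which is exactly the test the book insists on after every refactoring step.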

      • Re:Refactoring (Score:5, Informative)

        by gaj ( 1933 ) on Tuesday July 02, 2002 @06:59AM (#3806398) Homepage Journal
        I'll second that recommendation.

        When I first saw "Refactoring", I said to myself: "Self, now I've seen everything". I thought it was yet another book enshrining process and procedure over good working code.

        I was wrong.

        This book is really good for those who haven't yet learned what Stroustrup refers to as "taste". Hell, I've been coding for many, many years and I certainly thought it was worth a read!

        My only caution is the same as the one I give about GoF, UML, OOA/OOD/OOP, or any other codified programming "methods": Don't blindly follow them w/o taking your own experience into account.

        Basically, the less time you've been coding, the more seriously you should take these concepts. Over time, and with many KLOC, you'll develop your own "taste"; your own sense of what works. This is not to say that learning new methods is useless to someone who's been coding for a long time; far from it. Just that, in most cases, a hacker will develop a pretty good idea of what works for them and what doesn't. The dirty little secret is, whether you like the language he "accreted" or not, Wall is right: TMTOWTDI. And knowing which way to use in any given circumstance can only come with experience. Reading books like "Refactoring" can help a lot until you get that experience, though!

      • The good news is that there are great tools for automating refactorings for some languages (Java and Smalltalk come to mind immediately). The bad news is that C/C++ is not one of them, but C# is.

        That said, some of what refactoring browsers do can be done with search and replace. Be careful though, you don't want to change all occurrences of "i" to "index" and end up with code that won't compile because there's no type "indexnt".

        The ability to rollback refactorings is essential, too. Industrial-strength source control tools are pretty much a necessity, allowing you to re-get your CVS tree if a refactoring attempt gets out of hand.

        Since I bought Martin Fowler's book and started studying refactorings, my code has gotten much easier to live with. I no longer fear making a significant change to add functionality or fix bugs because I know I can refactor and still have code that continues to work as before. The addition of unit tests has helped to ensure that it not only keeps working, but it keeps behaving as expected.
        • That said, some of what refactoring browsers do can be done with search and replace. Be careful though, you don't want to change all occurences of "i" to "index" and end up with code that won't compile because there's no type "indexnt".

          This is why one should separate every independent token. The vi command :%s/i/index/g may break a lot of things, but :%s/ i / index /g will not.

          • Make that:
            :%s/\<i\>/index/gc
            That's escaped angle brackets to make i match only when it's not part of another word, and /c to require a confirm of each substitution. Confirm might not be necessary but then again it might.
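            The same word-boundary idea carries over to any regex engine, not just vi. A minimal Python sketch (the sample line of C code is invented for illustration):

```python
import re

code = "for (i = 0; i < n; i++) { int total = items[i]; }"

# Naive replacement corrupts identifiers like "int" and "items",
# producing exactly the "indexnt" breakage mentioned above:
naive = code.replace("i", "index")

# A word-boundary pattern (\b, the regex cousin of vi's \< and \>)
# touches only the standalone token "i":
safe = re.sub(r"\bi\b", "index", code)
print(safe)
# for (index = 0; index < n; index++) { int total = items[index]; }
```

            Note that \b also catches cases the space-padded :%s/ i / index /g pattern misses, such as "i++" or "items[i]", where the token is not surrounded by spaces.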
  • For instance, at what point do you split that massive source file into multiple files?
    Right from the start, I'd say. Each function/class/whatever should have its own file.
    At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?
    As early as possible, while as few other functions or parts of the code as possible already use it.
  • Read Stroustrup (Score:3, Informative)

    by ObviousGuy ( 578567 ) <ObviousGuy@hotmail.com> on Tuesday July 02, 2002 @06:13AM (#3806304) Homepage Journal
    The C++ Programming Language's first couple of chapters discuss this very topic.
  • by Anonymous Coward
    www.refactoring.com - or any other good refactoring books should help loads to get you started. but there's nothing like experience :)
  • There are no simple answers to these questions. The best you can do is to formulate your own policy and stick to it. In real-life projects there will always be exceptions and special cases, but it helps a lot if all the people working on the project at least know of the existence of common guidelines, and preferably understand and agree with the reasoning behind them.
  • Book recommendation (Score:5, Informative)

    by Read The Fine Manual ( 27464 ) on Tuesday July 02, 2002 @07:18AM (#3806435)
    Have a look at this book by Steve McConnell: Code Complete: A Practical Handbook of Software Construction [amazon.com].

    Yes, McConnell is a Microsoft guy, but this book is completely operating-system and programming-language agnostic (even though the examples are in C, Fortran, and Pascal, IIRC). It is an excellent guide to software construction, covering every aspect from design through coding practice and style issues to project management. I highly recommend it.

  • by p3d0 ( 42270 ) on Tuesday July 02, 2002 @10:13AM (#3807114)
    These sound like the wrong questions to me. It reminds me of someone's (perhaps Dijkstra's?) story of the response he received when he recommended abolishing gotos [acm.org]. Someone said "ok, I'll buy that; so what do I do if I'm at this point in the program, and I want to get to that point?"

    The trouble with such a question is that it has no answer. Dijkstra's argument was not that one should take existing programs and remove the gotos; rather, that programs written using only structured elements (sequencing, conditionals, loops) are more comprehensible, and don't require any gotos because there is a more elegant way to achieve the same effect. Thus, as you can see, there really is no answer to the question; the questioner's approach was fundamentally flawed.

    Likewise, software organization is not done in terms of functions; rather, it is done in terms of information-hiding modules [acm.org]. To ask when one huge function should be split in two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are involved in a project written by someone else. In that case, you do indeed need to make this kind of decision.

    However, true modular programming does not mean taking huge lumbering hunks of code and splitting them into modules. It means writing modules using the principles of information hiding to avoid making huge lumbering hunks of code in the first place.

    This, of course, is easier said than done. It's not that hard to avoid gotos, because the use of Dijkstra's structured programming techniques makes them unnecessary. In contrast, writing good modules is hard, and without superhuman foresight, some modules are bound to be pretty crummy. These will need to be rewritten in order to achieve good information hiding properties.

    So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide. Modules for which you can't find a succinct statement (12 words or less, with no ifs, ands, or ors) of what information they hide are poorly designed, and need an overhaul. A symptom of this may be that your functions are redundant, or too long, but the core problem is one of poor module design.
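    As a concrete illustration of designing a module around the information it hides, here is a minimal Python sketch. The JobLog module and its methods are hypothetical, invented for this example:

```python
# Secret this module hides, stated in under 12 words:
# "how completed jobs are stored and ordered."
class JobLog:
    def __init__(self):
        self._entries = []  # the hidden representation; callers never see it

    def record(self, job_id, duration):
        self._entries.append((job_id, duration))

    def slowest(self):
        # Callers get an answer, not the data structure. The list could be
        # swapped for a heap or a database table without touching any caller.
        return max(self._entries, key=lambda entry: entry[1])[0]

log = JobLog()
log.record("build", 12.5)
log.record("deploy", 3.0)
log.record("migrate", 40.0)
print(log.slowest())  # migrate
```

    The interface is phrased in terms of what callers need to know (record, slowest), not in terms of the representation, which is exactly the succinct-secret test described above.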
    • by elflord ( 9269 )
      So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide.

      Actually, the questions he is asking are indeed very important. It's all well and good to say that code "should be well designed", and indeed, most books spend a lot of time talking about design principles for people with clean slates. Unfortunately, very few people have a clean slate to work with. Using a good design up front is not an option if you're not the one who did the up-front design. We are either stuck maintaining poorly designed code, or code that was designed well up front but needs a change in design to meet changing requirements. What a book like Refactoring brings to the table is a process of incremental redesign. Redesigning code without rewriting it is a fine art, and Refactoring basically explains how to do it.

      • You're right, of course, that it is too easy to say "you should have designed it right in the first place", and I tried not to say just that, though I may have failed. :-)

        I tried to give some advice on how to tell whether a module system is good (that is, by information hiding); and further, to answer his question, my advice would be to refactor whenever he sees that information is not being hidden properly by the system's modules.
    • To ask when one huge function should be split in two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are involved in a project written by someone else.
      Dude, that's the funniest quote I've read all year. Thanks.
    • (* design module interfaces not according to what services they provide, but what information they hide. *)

      Sounds like a hidden ad for OO thinking.

      oop.ismad.com

      OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses. (Except possibly in a few narrow domains.)

      The trick to procedural is good table schema design IMO. In the '70s they didn't know about this when they started bashing procedural designs and promoting OO as a solution.

      • Sounds like a hidden ad for OO thinking.
        Information hiding is orthogonal to OO.
        OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses.
        Name a discipline that has been proven in such a way.
        The trick to procedural is good table schema design IMO.
        No offence intended, but IMHO that's retarded. Most of your system shouldn't have a clue that there even are tables. Plus, this falls flat for systems that are not based on tables. If you don't mind my saying so, it sounds like you have written software in a fairly narrow application domain.
        • (* Information hiding is orthogonal to OO. *)

          Perhaps a realistic example is in order. Shape, animal, and device driver toy examples don't scale to real things that I actually encounter.

          (* Name a discipline that has been proven in such a way. *)

          One can show that 3rd-generation languages can code the same thing with less code and be more transportable to other platforms than assembler.

          (* Most of your system shouldn't have a clue that there even are tables. *)

          Relational tables are a protocol and organizational philosophy. They allow, for example, one to get GOF-like patterns with mere formulas instead of painstaking hand-referencing needed in OOP.
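          One way to read this table-driven claim is that a lookup table can stand in for a Strategy-style class hierarchy. A minimal Python sketch (the discount rules are invented for illustration, not taken from the linked page):

```python
# A table of behaviour: each row maps a key to a pricing rule.
# In a GOF Strategy design, each row would instead be a subclass.
DISCOUNT_RULES = {
    "retail":    lambda amount: amount,
    "wholesale": lambda amount: amount * 0.90,
    "employee":  lambda amount: amount * 0.70,
}

def price(customer_kind, amount):
    # Dispatch is a lookup, not polymorphism. Adding a new customer
    # kind means adding a row, not writing a new class.
    return DISCOUNT_RULES[customer_kind](amount)
```

          In a real system the table could live in an RDBMS rather than in code, which is closer to the "communicate through the database" position argued in this thread.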

          (* Plus, this falls flat for systems that are not based on tables. *)

          Well, I consider tables a paradigm. It is true that paradigm X will match better with another interface that is also in paradigm X, and vice versa. However, OO faces the same tradeoff. This is one of the reasons for the "impedance mismatch" between OO and RDBMSs.

          • This is getting interesting.
            Perhaps a realistic example is in order. Shape, animal, and device driver toy examples don't scale to real things that I actually encounter.
            Those are toy examples of OO, so of course I won't use those to demonstrate how information hiding is orthogonal to OO. :-)

            A nontrivial example won't fit here, so I'll have to refer you to Parnas's original article [acm.org] on the topic. Its example could still be considered trivial by today's standards, but it's far better than anything I could fit in this space. It makes no reference to OO whatsoever. In fact, it's decidedly non-OO, with modules like "circular shifter" and "alphabetizer" that are most certainly procedural abstractions.

            If you want a bigger example, there's my Master's thesis work [toronto.edu], especially my defence presentation [toronto.edu] (PowerPoint slides). I consider it a good example of a successful application of information hiding principles, and it's about 23,000 LOC, so it's big enough to be considered nontrivial. It's also OO, so it doesn't prove that OO is orthogonal to information hiding, but I feel that its success arises from information hiding more than OO (especially since it's written in C and so makes no use of inheritance).

            One can show that 3rd-generation languages can code the same thing with less code and be more transportable to other platforms than assembler.
            How does one show that? Do you have a reference for such a study?

            The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way.

            Relational tables are a protocol and organizational philosophy. They allow, for example, one to get GOF-like patterns with mere formulas instead of painstaking hand-referencing needed in OOP.
            That's interesting. Do you have any references for this?
            • (* I'll have to refer you to Parnas's original article *)

              I interpret Parnas as pointing toward a need for a standardized way to access collections. IOW, a database interface.

              Besides, it is not very clear exactly what the system is supposed to do, so it is hard to estimate future change patterns and frequencies.

              (* If you want a bigger example, there's my Master's thesis work *)

              Speaking of modular, it is tough to figure out exactly what this contraption does. It seems like systems-software, kinda outside my domain of custom biz software.

              Also, students don't really have enough real-world experience to have a feel for how and where requirements change IMO. I probably would have gone along with OO out of school because of its appeal to (over) idealistic change patterns. I wouldn't know any better back then.

              (* Do you have a reference for such a study? *)

              No. But I never met an assembler fan who challenged it. You are not questioning the cross-platform claim, are you?

              (* The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way. *)

              I don't think it would take that much. Take a medium-complexity problem and challenge an assembler fan to do it with less code. Then toss them some typical change scenarios and see whose code is affected the most. (They can counter with their own scenarios, BTW.)

              Besides, if I am wrong, perhaps there are assembler fans who can out-program and out-maintain C,Python,LISP, etc. programmers.

              That would suggest that paradigms are subjective. People favor the paradigm that best maps to the way that they think.

              I don't think this is really the case with assembler, but is with other paradigms.

              (* That's interesting. Do you have any references for this? *)

              I didn't apply any metrics, but examples of GOF and GOF-like patterns using tables can be found at:

              http://www.geocities.com/tablizer/prpats.htm
              • Thanks for the reference. I have never seen relational programming advanced as a general-purpose paradigm for software construction, so I'll find it interesting to investigate.

                My personal opinion regarding OO is that people are disappointed in it for a number of reasons:
                • It has been oversold as a panacea, so people become disappointed when they discover that they still need to think.
                • It has been represented very poorly by at least one language, C++, which has convinced many that OO is unworkable on large, complex projects.
                • Popular OO languages and approaches miss out on Design by Contract, making them far less effective.
                • The majority of programmers are simply not skilled enough to architect large enough projects to evaluate a paradigm's scalability. (This same assertion, in a different form, is what led Fred Brooks to promote the surgeon team in The Mythical Man Month twenty years ago.) It is my feeling that skillfully-applied OO wins over some other paradigms (equally skillfully-applied) at the high end of complexity, though I could just be another of those unskilled programmers relying on blind faith in OO. :-)
                Thanks again for the discussion.
                • (* It is my feeling that skillfully-applied OO wins over some other paradigms (equally skillfully-applied) at the high end of complexity *)

                  This is often, but not always, stated by OO fans. If this is the case, then how come it is being touted for everything (all sizes), and pushing alternatives and research in alternatives away?

                  IMO, the procedural/relational approach scales well because you consider mostly *one task* at a time, and communicate mostly through the database.

                  Detractors will say that relying on tables like this causes ripple effects if the schema needs to change. I would point out that this is very similar to the effect of an *interface* changing in an OO app. Tables *are* an interface.

                  (Hiding changes via database views and triggers varies per vendor. The products could probably improve here, but there is no inborn limit of the paradigm which prevents them.)

                  (* I have never seen relational programming advanced as a general-purpose paragigm for software construction, so I'll find it interesting to investigate. *)

                  I don't know if it is general purpose, it just seems to work well for custom biz apps. One-size-fits-all is probably not the case.

                  Regarding Design-by-Contract, it is hard to implement such for many types of business rules. It takes more code to state the contract than it does to implement it in many cases. You end up having to change two things instead of one when new requirements come: the implementation *and* the contract verification code. Thus, you increase the chance of errors. It often violates the once-and-only-once rule of factoring.

                  The stack DBC examples in the books don't seem to extrapolate to real-world requirements very smoothly. (I stopped using stacks when decent databases came along. A "stack" is simply one of many possible views of any collection. IOW, "Has-a" stack view instead of "is-a" stack.) Good abstraction is all about managing relativism IMO.
                  • Regarding Design-by-Contract, it is hard to implement such for many types of business rules. It takes more code to state the contract than it does to implement it in many cases.
                    This is the same objection as "how do I get from here to there without goto". If you design your application with contracts in mind, your contracts are never so complicated as to become a burden. Likewise, if you design your app with relational tables in mind, those probably tend to stay relatively simple too.
                    You end up have to change 2 things instead of one when new requirements come: the implementation *and* the contract verification code. Thus, you increase the chance of errors. It often violates the once-and-only-once rule of factoring.
                    Yes, this is often stated as a flaw in Design by Contract. I disagree with it. Firstly, there is no contract verification code; only the contract. When a modification affects a contract, you're in for a lot of trouble (even more so when you're not using DbC, and therefore may not realize that the contract actually has changed), and the effort of modifying the contract itself reminds a programmer of that. Contracts only need to be changed for exactly those situations in which the work of changing the contract itself vanishes in comparison with the labour required to propagate that modification to the rest of the system.

                    There are a number of other reasons I disagree with your assessment:

                    1. The contract and the code do not say the same thing, except for trivial cases. Unfortunately, the kinds of cases shown to beginners must be trivial, so that is all they see.
                    2. In a system designed using DbC, the contracts are far simpler than the implementation. Let's say they're 5 times simpler. Then, even assuming they are entirely redundant (which they aren't), that's only a 20% growth in code size. It's well worth it for DbC's benefits.
                    3. Finding a solution to a problem is generally harder than demonstrating the solution to be correct. This is the basis for the conjecture that P != NP. It is also the basis for DbC: namely, contracts are easier to write than programs, and can demonstrate that a program is working correctly.
                    4. Whatever redundancy there is, is a good thing in this case. Stating certain things twice, in two different ways, and having the computer check them against each other, helps to locate errors. The same argument regarding redundancy could be used against type annotations, or variable declarations, or even multi-letter variable names, but most would argue that these kinds of redundancy help program correctness, rather than hindering it.
                    The most convincing argument in favour of DbC, I think, is Bertrand Meyer's "law of the excluded miracle": if the author of a class/procedure/module doesn't know what it's supposed to do, then the odds that it will do it properly are vanishingly small. Preconditions and postconditions are nothing more than a precise way to specify what something does.
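                    A minimal sketch of preconditions and postconditions in this style, expressed as plain Python assertions (the withdraw example is hypothetical, not taken from Meyer):

```python
def withdraw(balance, amount):
    # Preconditions: the caller's obligations. If these fail,
    # the bug is in the caller, not in this routine.
    assert amount > 0, "precondition: amount must be positive"
    assert amount <= balance, "precondition: cannot overdraw"

    new_balance = balance - amount

    # Postconditions: the routine's promise back to the caller.
    assert new_balance >= 0, "postcondition: balance stays non-negative"
    return new_balance
```

                    The asserts here are the "machine-checkable specification" argued for above: they say what withdraw does, and with `python -O` they compile away entirely once you trust the code.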
                    • (* In a system designed using DbC, the contracts are far simpler than the implementation. *)

                      Maybe in scientific computing, where the interface is simple but the computations are complex. However, biz apps tend to be the other way around. (Biz apps tend to be complex in the way that multiple things interact and in what the biz rules can reference.)

                      (* Preconditions and postconditions are nothing more than a precise way to specify what something does. *)

                      Try comments. Well-worded comments are not going to beat the usefulness of some machine-readable notation precisely because it is tuned for the machine instead for people.

                      (* The same argument regarding redundancy could be used against type annotations.... *)

                      I can live without those. I tend more toward scriptish langs anyhow these days.
                    • Maybe in scientific computing where the interface is simple, but the computations are complex. However, biz apps tend to be the other way around.
                      That does not match my experience, but let's assume you're right. Then, the fact that an interface is complex makes it that much more important to document it in a rigorous way; and the fact that this is difficult to do certainly doesn't mean that it shouldn't be done.

                      I have applied DbC successfully to business apps and system software. I have never written any scientific software, so I can't comment on that.

                      Preconditions and postconditions are nothing more than a precise way to specify what something does.
                      Try comments. Well-worded comments are not going to beat the usefulness of some machine-readable notation precisely because it is tuned for the machine instead for people.
                      I'll ignore the Freudian slip, and assume you meant that well-worded comments are going to beat the usefulness of assertions. In that case, I disagree with that too:
                      • Nobody ever said assertions in DbC need to be executable. DbC is a design methodology, based on the principle that you should know what the parts of your system are supposed to do. I use DbC when I write my C code, and my contracts take the form of comments.
                      • Forcing comments to be executable makes them less ambiguous. In this way, Eiffel-style assertions are often preferable to the comments that usually pass for interface documentation.
                      • Executable contracts are continually double-checked against the implementation code to make sure they agree. Comments can drift and become inaccurate over time.
                      The same argument regarding redundancy could be used against type annotations...
                      I can live without those. I tend more toward scriptish langs anyhow these days.
                      Good for you. The point of my comparison with type annotations (plus, more importantly, variable declarations and multi-character variable names, which you have elided) was that redundancy is not always bad. The redundancy argument could also be used against comments; I hope you won't argue that comments are contrary to the "say it once" principle?
                    • Besides,

                      Validation checks can be made with simple IF statements.

                      If not inRange(...) then
                      panic_or_something
                      end if
                    • No, for two reasons:
                      • To add this kind of checking code everywhere throughout the system would be prohibitively slow, even if the errors you are checking for never happen. In contrast, once certain bugs are rare enough, assertion checks can be disabled, and no longer add any performance overhead. Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks.
                      • What you have shown is not Design by Contract. DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system. Yours is an example of defensive programming, which is basically the opposite of DbC.
                      Have a look at some of the Design by Contract literature on the web. I promise, it will be time well spent, even if you don't end up using it.
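                      A minimal sketch of the distinction being drawn here, in C (function names are made up for the example):

```c
#include <assert.h>
#include <stdio.h>

/* Defensive programming: the callee checks for bad input and tries */
/* to recover at runtime, on every call, forever.                   */
double defensive_divide(double a, double b) {
    if (b == 0.0) {
        fprintf(stderr, "divide by zero\n");
        return 0.0;                  /* arbitrary recovery value */
    }
    return a / b;
}

/* Design by Contract: the *caller* is obliged to ensure b != 0.    */
/* The assert documents that obligation and traps violations while  */
/* debugging; it can be compiled out with -DNDEBUG once trusted.    */
double contract_divide(double a, double b) {
    assert(b != 0.0);                /* precondition: caller's duty */
    return a / b;
}
```

In the defensive version, responsibility is smeared across every function; in the contract version, it is assigned to exactly one side of the call.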
                    • (* In contrast, once certain bugs are rare enough, assertion checks can be disabled, and no longer add any performance overhead. *)

                      So it is slow the first 2 years, before Moore's law makes it not matter? That is not a very good selling point.

                      (* Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks. *)

                      And IF statements are not because they are not weird and funky enough to stand out? That is a silly argument. Besides, you can call the same function each time:

                      if Not inRange(...)
                      SameFamiliarName("Foo out of range")
                      end if

                      (* DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system. *)

                      Yeah yeah. I have had this argument before, and how DBC is so *subtly* different that it does not really matter.

                      Use what is already available and stop adding goofy little syntax to a language to make it funkier and funkier. Reinvent something really different, not a glorified IF statement. That is a waste of complexity.
                      Ok, I think I may be wasting my time. I thought you were just unfamiliar with DbC, but it seems to me you have made up your mind to believe that DbC is something it's not, and to argue against it based on faults it doesn't possess. You have taken a gratuitously pessimistic view of everything I say, to the point that some of your comments actually contradict what I have said.

                      I hope I'm wrong, and that we can continue to have a rational discussion about this.

                      So it is slow the first 2 years, before Moore's law makes it not matter? That is not a very good selling point.
                      Yes, it sure isn't a good selling point if you pull this two-year time frame out of your ass. What if I told you the time frame is more like six weeks? That would be more in line with my experience.
                      Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks.
                      And IF statements are not because they are not weird and funky enough to stand out?
                      I'm not sure what your point is here. Mine is that you can't disable error checking code unless you know which error checks can safely be disabled. Sure, you can grep for "if", but you need to know the difference between error checks that trap bugs in the program, versus those that catch valid error conditions like user errors.

                      For instance, in a C compiler, you should eventually be able to disable internal data structure consistency checks, but you can never disable parse error checks. In most software, such as business apps, the line between bugs and actual error conditions is not so clear.

                      The way you tell error conditions from bugs is Design by Contract. To the extent that you can tell these two things apart, you are using DbC, whether you have chosen to do it consciously or not.

                      DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system.
                      Yeah yeah. I have had this argument before, and how DBC is so *subtly* different that it does not really matter.

                      Use what is already available and stop adding goofy little syntax to a language to make it funkier and funkier. Reinvent something really different, not a glorified IF statement.

                      This kind of logic is hard to argue with. You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. Well, I agree that there's no point in adding glorified IF statements to a language, but I can only emphasize once again that DbC is a design methodology. (Why do you think they call it Design by Contract?)

                      I have already told you that I use DbC to design C code. C obviously doesn't have any special contract syntax, so I'm not sure how you could believe that DbC is just a syntax issue.

                      There are mountains of resources on the internet describing the DbC technique, and if you want to ignore it and argue against a straw man instead, that's your prerogative.
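                      The bugs-versus-error-conditions line can be sketched in C like this (the percentage table is a hypothetical example, not from either poster):

```c
#include <assert.h>

#define TABLE_MAX 8
static int table[TABLE_MAX];
static int count = 0;

/* Rejecting bad input is a legitimate error condition: it is      */
/* handled with an ordinary if, and can never be compiled out.     */
int add_percentage(int p) {
    if (p < 0 || p > 100)
        return -1;                    /* user error: always checked */
    if (count >= TABLE_MAX)
        return -1;                    /* capacity: always checked   */
    table[count++] = p;
    /* An internal consistency check guards against bugs in *our*  */
    /* code; once the code is trusted, it can be disabled with     */
    /* -DNDEBUG at no runtime cost.                                */
    assert(count > 0 && count <= TABLE_MAX);
    return 0;
}
```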

                    • (* Sure, you can grep for "if", but you need to know the difference between error checks that trap bugs in the program, versus those that catch valid error conditions like user errors. *)

                      I already described how to do that.

                      Another way is with a comment. The advantage of a comment is that you can create more complex "removal schemes". For example, you may not want to remove *all* the checks, but just the most costly ones (CPU-wise).

                      if Not inRange(....) // DBC: level_3
                      DBCraise("x is out of range")
                      end if ....
                      if Not inRange(....) // DBC: level_2
                      DBCraise("y is out of range")
                      end if

                      If you rely on built-in stuff, then you cannot add features like that if you want to: you are stuck with whatever is out-of-the-box. In this case, all-or-nothing removal/disable of the checks.
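                      For what it's worth, the leveled-removal scheme described above can also be built without comments and a stripping script, using only the C preprocessor (a sketch; the check macros and the costly-scan stub are invented names):

```c
#include <assert.h>

/* Check level chosen at compile time: 0 = none, 1 = cheap checks, */
/* 3 = everything, including costly consistency scans.             */
#define DBC_LEVEL 1

#if DBC_LEVEL >= 1
#define CHECK1(cond) assert(cond)
#else
#define CHECK1(cond) ((void)0)
#endif

#if DBC_LEVEL >= 3
#define CHECK3(cond) assert(cond)
#else
#define CHECK3(cond) ((void)0)        /* costly check compiled out */
#endif

/* Stand-in for an expensive (CPU-wise) consistency scan. */
static int whole_table_consistent(void) { return 1; }

int clamp(int x, int lo, int hi) {
    CHECK1(lo <= hi);                  /* cheap: kept at level 1   */
    CHECK3(whole_table_consistent());  /* costly: removed below 3  */
    return x < lo ? lo : (x > hi ? hi : x);
}
```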

                      (* You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. *)

                      I did not say "worthless". I am saying that you have not justified dedicated syntax.

                      DBC is just a round-about, consultant buzzword wallet-draining way of saying:

                      "Testing assumptions is good"
                    • Ok, fair enough. I think we're near the end of where this discussion can take us. Your idea with the comments seems impractically labour-intensive to me, but if it works for you, then that's great. Also, I don't recall you ever mentioning how to tell bugs from legitimate error conditions.

                      One last thing: regarding the dedicated syntax, I have said a number of times that DbC doesn't require any dedicated syntax, as I have used it on C projects. Again, it's a design methodology, not an implementation technique.
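                      For illustration, DbC on a C project often amounts to nothing more exotic than this (a sketch; REQUIRE/ENSURE and the Stack type are invented names, not any standard API):

```c
#include <assert.h>

/* Contract macros in plain C -- no dedicated language syntax. */
#define REQUIRE(cond) assert(cond)    /* caller's obligation   */
#define ENSURE(cond)  assert(cond)    /* function's guarantee  */

#define STACK_CAP 16
typedef struct { int data[STACK_CAP]; int top; } Stack;

void push(Stack *s, int v) {
    REQUIRE(s->top < STACK_CAP);      /* precondition: not full   */
    s->data[s->top++] = v;
    ENSURE(s->top > 0);               /* postcondition: not empty */
}

int pop(Stack *s) {
    REQUIRE(s->top > 0);              /* precondition: not empty  */
    return s->data[--s->top];
}
```

The design payoff is that push's documentation and pop's documentation are the contract: callers know exactly what they must guarantee and what they get in return.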

                    • (* Also, I don't recall you ever mentioning how to tell bugs from legitimate error conditions. *)

                      Have a function ReportBug(msg), and handle the rest any way that is appropriate.

                      I would note that you seemed to agree that the difference can be a fuzzy area in some cases.

                      (Or use new bloated OO-influenced style:

                      system.bugManager.bugReporter.reportBug( messageManager.messageFactory.msg("foo"))

                      )
                    • Oh, actually, another thing:
                      DBC is just a round-about, consultant buzzword wallet-draining way of saying: "Testing assumptions is good"
                      (Wallet-draining? I'd love to meet your clients. :-)

                      No, it's not. DbC is not about testing assumptions; it's a methodology for deciding what those assumptions should be. It provides guidelines like command-query separation, rules on how contracts behave in the presence of inheritance, etc. The testing of the assumptions is secondary, and is helpful for debugging, but it's not Design by Contract.
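                      Command-query separation, mentioned above, can be sketched in C like so (the Account type and function names are invented for the example):

```c
#include <assert.h>

typedef struct { int balance; } Account;

/* Query: answers a question and changes nothing. Because it has   */
/* no side effects, it is safe to call inside a contract check.    */
int balance_of(const Account *a) { return a->balance; }

/* Command: changes state. Its effect is specified by a            */
/* postcondition phrased in terms of queries.                      */
void deposit(Account *a, int amount) {
    assert(amount > 0);                       /* precondition  */
    int old = balance_of(a);
    a->balance += amount;
    assert(balance_of(a) == old + amount);    /* postcondition */
}
```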

                      Anyway, I don't really give a rat's ass whether you ever actually use DbC yourself. I just figured since you introduced me to your paradigm, I'd return the favour.

                    • (* Wallet-draining? I'd love to meet your clients *)

                      They are usually the ones who have been screwed over by buzzword-spewing consultants, and are thus looking for something lean, mean, and clean that works instead of talks.

  • For instance, at what point do you split that massive source file into multiple files?

    You do it as soon as you notice the problem. If you have good tools, it will be simple and fun (yes, fun).

    A refactoring browser like IDEA from IntelliJ [intellij.com] makes it simple. Highlight a few lines of code, choose "Extract Method" from a menu, and the code is extracted into a new method with all the necessary parameters created and passed in and the necessary return type and assignment created. For example:

    1: int a = 12, b = 9;
    2: a += 43 * b + 12 / 4;
    Highlight the expression after the "+=" on line 2 and extract a method, calling it "foo":
    1: int a = 12, b = 9;
    2: a += foo( b );

    3: private int foo( int c ) {
    4: return 43 * c + 12 / 4;
    5: }

    At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?

    It also has a rename feature which will rename a method or variable and change all references to it, but doesn't change references to different variables or methods that happen to have the same name.

    It has lots more features, but you can read about them for yourself and download the program and play with it.

    There are other refactoring browsers out there too, like the free Eclipse from IBM. With the right tools, you can easily make your code less messy.

  • This question is the subject of Large Scale C++ Software Design [barnesandnoble.com] by John Lakos.

    Don't let the title fool you--although he uses C++ for his examples, the concepts he talks about (splitting code into components, why each component should be in its own file, levelization of components, etc.) make sense in any OO language.

    I consider this book a must-read for anybody working on large programs.

  • Before recommendations are made, I'd say congratulations that you're thinking of this. I'd further congratulate you if you work for someone who'll let you make such improvements. In my company, there's always some "business case", policies, politics, and other reasons for not making improvements.

    Nevertheless, I'd recommend a good book on structured analysis and design, then go to a design patterns book, perhaps. The SA&D is probably more micro and the patterns is probably more macro in nature today. I'd do this if you needed "justification" for what you were doing.

    Although not directly related, learning basic database concepts wouldn't be bad either. I think normalization concepts might affect your thinking in a positive way. Not only for organizing data structures, but also where you place files in your build structure. Maybe it'll help you avoid needless redundancy. At least these are based on more sound mathematics, and the knowledge thereof will last.

    I think the dirty little secret towards these books is that much of it is written opinion, with little scientific evidence for its reasoning. In computers today, much credibility comes more from writing about something than proving something. Not entirely, perhaps, but if it were really gospel, then we wouldn't have displaced SA&D with OOA&D and patterns. Develop your own style, be consistent, and stick to your guns, because your opinion is probably no worse than the others.
  • Java forces you to make each file a different object. Then comes organizing all your files into packages. For this, we use patterns (like model-view-controller pattern). The higher level after patterns is application specific.

    Ahh, the joys of OOP...
    • Um...what *are* you talking about?

      In Java, every *class* needs to be in a separate file, except inner classes, which are only visible to the public class (the one with the name of the file).

      Classes are organized into packages, and should be put into a directory tree that matches the package hierarchy.

      What this has to do with MVC I'd rather not guess.
      • bah, shoulda said "each object into a different file". Its been one of "those" kinda days... sorry.
          • This is getting rather off-topic, but every class goes in a different file. You never code objects at all, only classes. Then, some class magically gets its public static void main(String[] args) method called, from which you can instantiate objects based on the classes that you have written.
          • Every public class goes in a different file, IIRC. That file can also define other, non-public classes, both inner and non-inner.
    • Java most certainly does not force you to make each file a different object, inasmuch as objects are created at runtime and do not exist in Java source files. Nor does Java force you to define each top level class in a different file, as davidmccabe stated. user2048 was correct in stating that "Every public class goes in a different file".

      The complete rule is that only one public top level class or interface can be defined per file, and the name of the file must be the same as the name of the public class or interface (plus .java). Additional non-public classes or interfaces can be defined in that same file, or indeed in some other file with an arbitrary .java name. For completeness, it is not necessary to define any classes or interfaces at all in a java source file, although a file with no class or interface definitions will be useless. And, yes, none of this file organization has much of anything to do with patterns.

      Anyone want to hire a Java language lawyer?
  • by oliverthered ( 187439 ) <oliverthered@hotmail. c o m> on Friday July 05, 2002 @11:36AM (#3827702) Journal
    Databases and code should be designed in a similar way, for more or less the same reasons. If all the refactoring books people have been recommending seem a bit extreme (even the word "refactoring" sounds extreme to me, a bit like "downsizing" grrrr....),

    try getting a simple DB design book that goes through a normalisation process instead; it should make for a lighter read.

    Then think about how to apply the process to software (a bit of light thinking)

    The first couple of steps are something like

    separate everything out into discrete chunks

    look at 'keys' and 'indexes' (in source code they are design patterns, data structures the things that tie the chicks together).

    You don't need a 1000-page bible, you need ten pages of guidelines and good practices and a bit of brain power.
    • data structures the things that tie the chicks [sic] together

      When you find yourself subconsciously writing about "tying chicks together" in a discussion of source code organization, it's time to take a break from the keyboard and go get laid, if you can...

      I agree with you about the correlation between database normalization and code factoring [which is the correct and long-established term, no matter how much you might dislike the term "refactoring"]. However, to get a database into Nth normal form can be done by following some fairly simple rules. Code isn't quite so easy. Books like Fowler's refactoring book cover details, subtleties, and rationales that even above-average developers may miss.

      Also, refactoring is a name for something that programmers have always done anyway. An agreed-on name is better than no name at all, or many non-standard names.

  • Programming Pearls by Jon Bentley - old as the hills by now (he talks about the location of data on tape....), but full of very good insights into writing "good code"TM.

    You might also like "The pragmatic programmer" - Hunt and Thomas - which is another "meta-programming" book with a lot of ideas and insights you could actually sell to your pointy-headed boss.

    The section on "zero-tolerance" coding is a great "why and when to refactor" argument. There's also a good section on how to design the units of which your software is composed, how to reduce the coupling between those units, and how to test 'em when (you think...) they're done.

    Nev
