


Organizing Source Code, Regardless of Language?
og_sh0x queries: "I'm looking for a source of information dedicated to organizing source code. I see a lot of books and other resources covering syntax and various syntax-related philosophies, but I can never seem to find a good resource for organizing source code in general. For instance, at what point do you split that massive source file into multiple files? At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function? These are problems that plague many programming languages. Are there such resources that cover these issues?"
This (Score:4, Insightful)
I've yet to find a simple way to determine any of those. It's just that feeling you get while looking at the code: 'damn, not again...'
Refactoring (Score:5, Informative)
The book then goes on to describe the various types of "abstract smells" and the sort of corrective techniques that can be applied to each: "Duplicated Code" and "Long Method", for example, both typically call for Extract Method.
I have frequently found that just reading through this short (~15) collection of abstracted "smells" is a very good way of supplementing the "experience" that you speak of. It helps you make decisions with the benefit of (a) a bit of third-party support in making those decisions, and (b) a clearly defined set of rules as to how to apply each of the refactorings, including test cases to prove that the functionality has not been changed in the process and, more importantly, a clean roll-back procedure for those times when the olfactory senses get a little bit confused...
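To give a flavour of what one of those smells looks like in practice, here is a toy example of my own (not one from the book): two near-duplicate routines are the classic "Duplicated Code" smell, and the usual cure is to merge them into one parameterized routine.

    // Before: two near-identical methods that differ only in the rate applied.
    double regularPrice(double base) { return base + base * 0.08; }
    double reducedPrice(double base) { return base + base * 0.03; }

    // After: one routine, with the variation expressed as a parameter.
    double priceWithTax(double base, double taxRate) {
        return base + base * taxRate;
    }

The catalogue's value is in spelling out the mechanical steps, and the tests that prove the behaviour is unchanged along the way.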
Re:Refactoring (Score:5, Informative)
When I first saw "Refactoring", I said to myself: "Self, now I've seen everything". I thought it was yet another book enshrining process and procedure over good working code.
I was wrong.
This book is really good for those who haven't yet learned what Stroustrup refers to as "taste". Hell, I've been coding for many, many years and I certainly thought it was worth a read!
My only caution is the same as the one I give about GoF, UML, OOA/OOD/OOP, or any other codified programming "methods": Don't blindly follow them w/o taking your own experience into account.
Basically, the less time you've been coding, the more seriously you should take these concepts. Over time, and with many KLOC, you'll develop your own "taste"; your own sense of what works. This is not to say that learning new methods is useless to someone who's been coding for a long time; far from it. Just that, in most cases, a hacker will develop a pretty good idea of what works for them and what doesn't. The dirty little secret is, whether you like the language he "accreted" or not, Wall is right: TMTOWTDI. And knowing which way to use in any given circumstance can only come with experience. Reading books like "Refactoring" can help a lot until you get that experience, though!
Re:Refactoring (Score:2)
That said, some of what refactoring browsers do can be done with search and replace. Be careful though: you don't want to change all occurrences of "i" to "index" and end up with code that won't compile because there's no type "indexnt".
The ability to rollback refactorings is essential, too. Industrial-strength source control tools are pretty much a necessity, allowing you to re-get your CVS tree if a refactoring attempt gets out of hand.
Since I bought Martin Fowler's book and started studying refactorings, my code has gotten much easier to live with. I no longer fear making a significant change to add functionality or fix bugs because I know I can refactor and still have code that continues to work as before. The addition of unit tests has helped to ensure that it not only keeps working, but it keeps behaving as expected.
Re:Refactoring (Score:1)
This is why one should separate every independent token. The vi command :%s/i/index/g may break a lot of things, but :%s/ i / index /g is far less likely to (it can still miss an "i" at the very start or end of a line).
Re:Refactoring (Score:1)
Always! (Score:1)
Re:Always! (Score:1)
If the functions belong to a library, you can still have a library file that includes all of the functions, so that projects using the library only have to include a single file.
Putting each function in its own file could also ease version management if you are not the only one working on the project, at least if you do not use cvs for whatever reason.
Does it still sound ridiculous? Is there any real argument AGAINST using separate files for each function, class, etc.? If so, please tell.
Re:Always! (Score:3, Insightful)
My obligatory plug for The Mozilla Project [mozilla.org]. Not quite one function per source file, but definitely lots of very small source files, each implementing a very narrow slice of functionality. Mozilla is pretty well factored code, and maintainability is enhanced by the separation of responsibilities. It makes it possible to enhance or fix problems in one area, say in nsFTPChannel, and know that all the thousands of other lines in the program will be largely insulated from those changes.
Yes, it does take a while to get familiar with the entire Mozilla codebase. The flip side is that you only have to look at and understand a small fraction of it to start becoming productive.
If you are using C++, Large Scale C++ Software Design is definitely a recommendation I can second.
Re:Always! (Score:1)
Read Stroustrup (Score:3, Informative)
refactoring is what you're after (Score:1, Informative)
Be consistent (Score:2)
Book recommendation (Score:5, Informative)
Yes, McConnell is a Microsoft guy, but this book is completely operating-system and programming-language agnostic (even though the examples are in C, Fortran, and Pascal, IIRC). It is an excellent guide to software construction, covering everything from design through coding practice and style issues to project management. I highly recommend it.
Re:Book recommendation (Score:1)
Large-Scale C++ Software Design (Score:2)
The wrong questions (Score:4, Insightful)
The trouble with such a question is that it has no answer. It is a bit like asking Dijkstra at what point a program has accumulated enough gotos that they ought to be removed. Dijkstra's argument was not that one should take existing programs and remove the gotos; rather, that programs written using only structured elements (sequencing, conditionals, loops) are more comprehensible, and don't require any gotos because there is a more elegant way to achieve the same effect. Thus, as you can see, there really is no answer to the question; the questioner's approach was fundamentally flawed.
Likewise, software organization is not done in terms of functions; rather, it is done in terms of information-hiding modules [acm.org]. To ask when one huge function should be split in two, or when two similar functions should be merged, indicates to me that the design might be flawed. Sometimes that's unavoidable; for instance, if you are working on a project written by someone else. In that case, you do indeed need to make this kind of decision.
However, true modular programming does not mean taking huge lumbering hunks of code and splitting them into modules. It means writing modules using the principles of information hiding to avoid making huge lumbering hunks of code in the first place.
This, of course, is easier said than done. It's not that hard to avoid gotos, because the use of Dijkstra's structured programming techniques makes them unnecessary. In contrast, writing good modules is hard, and without superhuman foresight, some modules are bound to be pretty crummy. These will need to be rewritten in order to achieve good information hiding properties.
So, there's your answer: don't put the cart before the horse. Don't expect that someone will tell you that you need to split a function when it gets beyond X number of lines. Rather, look at the integrity of the system's modules. If I can leave you with one piece of advice, I hope it is this: design module interfaces not according to what services they provide, but what information they hide. Modules for which you can't find a succinct statement (12 words or less, with no ifs, ands, or ors) of what information they hide are poorly designed, and need an overhaul. A symptom of this may be that your functions are redundant, or too long, but the core problem is one of poor module design.
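As a small, contrived sketch of that advice (the class and method names are mine, purely for illustration): this module's one-sentence secret is "how customer records are stored", and nothing in its interface betrays it.

    // Secret: how customer records are stored.
    public class CustomerDirectory {
        // Today the representation is an in-memory map; tomorrow it could be a
        // flat file or a database table, and no caller would need to change.
        private final java.util.Map<String, String> nameById =
            new java.util.HashMap<String, String>();

        public void add(String id, String name) { nameById.put(id, name); }

        public String nameOf(String id) { return nameById.get(id); }
    }

If the best one-sentence description you can manage is "provides various customer-related services", that's the overhaul signal.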
Re:The wrong questions (Score:3, Insightful)
Actually, the questions he is asking are indeed very important. It's all well and good to say that code "should be well designed", and indeed, most books spend a lot of time talking about design principles for people with clean slates. Unfortunately, very few people have a clean slate to work with. Using a good design up front is not an option if you're not the one who did the up-front design. We are either stuck with maintaining poorly designed code, or even code that was designed well up front but needs a change in design to meet changing requirements. What a book like Refactoring brings to the table is a process of incremental redesign. Redesigning code without rewriting it is a fine art, and refactoring basically explains how to do it.
Re:The wrong questions (Score:2)
I tried to give some advice on how to tell whether a module system is good (that is, by information hiding); and further, to answer his question, my advice would be to refactor whenever he sees that information is not being hidden properly by the system's modules.
Re:The wrong questions (Score:1)
Them er fightin' words (Score:2)
Sounds like a hidden ad for OO thinking.
oop.ismad.com
OOP has never been proven to be objectively superior, neither WRT code size, nor reuse, nor less change under change-impact analyses. (Except possibly in a few narrow domains.)
The trick to procedural is good table schema design, IMO. In the '70s they didn't know about this when they started bashing procedural designs and promoting OO as a solution.
Re:Them er fightin' words (Score:1)
Re:Them er fightin' words (Score:2)
Perhaps a realistic example is in order. Shape, animal, and device driver toy examples don't scale to real things that I actually encounter.
(* Name a discipline that has been proven in such a way. *)
One can show that 3rd-generation languages can code the same thing with less code and be more transportable to other platforms than assembler.
(* Most of your system shouldn't have a clue that there even are tables. *)
Relational tables are a protocol and organizational philosophy. They allow, for example, one to get GOF-like patterns with mere formulas instead of painstaking hand-referencing needed in OOP.
(* Plus, this falls flat for systems that are not based on tables. *)
Well, I consider tables a paradigm. It is true that paradigm X will match better with another interface that is also in paradigm X, and vice versa. However, OO faces the same tradeoff. This is one of the reasons for the "impedance mismatch" between OO and RDBMS's.
Re:Them er fightin' words (Score:1)
A nontrivial example won't fit here, so I'll have to refer you to Parnas's original article [acm.org] on the topic. Its example could still be considered trivial by today's standards, but it's far better than anything I could fit in this space. It makes no reference to OO whatsoever. In fact, it's decidedly non-OO, with modules like "circular shifter" and "alphabetizer" that are most certainly procedural abstractions.
If you want a bigger example, there's my Master's thesis work [toronto.edu], especially my defence presentation [toronto.edu] (PowerPoint slides). I consider it a good example of a successful application of information hiding principles, and it's about 23,000 LOC, so it's big enough to be considered nontrivial. It's also OO, so it doesn't prove that OO is orthogonal to information hiding, but I feel that its success arises from information hiding more than OO (especially since it's written in C and so makes no use of inheritance).
How does one show that? Do you have a reference for such a study? The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way.
That's interesting. Do you have any references for this?
Re:Them er fightin' words (Score:2)
I interpret Parnas as pointing toward a need for a standardized way to access collections. IOW, a database interface.
Besides, it is not very clear exactly what the system is supposed to do, so it is hard to estimate future change patterns and frequencies.
(* If you want a bigger example, there's my Master's thesis work *)
Speaking of modular, it is tough to figure out exactly what this contraption does. It seems like systems-software, kinda outside my domain of custom biz software.
Also, students don't really have enough real-world experience to have a feel for how and where requirements change IMO. I probably would have gone along with OO out of school because of its appeal to (over) idealistic change patterns. I wouldn't know any better back then.
(* Do you have a reference for such a study? *)
No. But I never met an assembler fan who challenged it. You are not questioning the cross-platform claim, are you?
(* The 3rd-generation-versus-assembly is the most clear-cut case of programming language expressive power there is, and yet it's still quite hard to "prove" in any meaningful way. *)
I don't think it would take that much. Take a medium-complexity problem and challenge an assembler fan to do it with less code. Then toss them some typical change scenarios and see whose code is affected the most. (They can counter with their own scenarios, BTW.)
Besides, if I am wrong, perhaps there are assembler fans who can out-program and out-maintain C, Python, LISP, etc. programmers.
That would suggest that paradigms are subjective. People favor the paradigm that best maps to the way that they think.
I don't think this is really the case with assembler, but is with other paradigms.
(* That's interesting. Do you have any references for this? *)
I didn't apply any metrics, but examples of GOF and GOF-like patterns using tables can be found at:
http://www.geocities.com/tablizer/prpats.htm
Re:Them er fightin' words (Score:1)
My personal opinion regarding OO is that people are disappointed in it for a number of reasons:
Re:Them er fightin' words (Score:2)
This is often, but not always, stated by OO fans. If this is the case, then how come it is being touted for everything (all sizes), pushing aside both alternatives and research into alternatives?
IMO, the procedural/relational approach scales well because you consider mostly *one task* at a time, and communicate mostly through the database.
Detractors will say that relying on tables like this causes ripple effects if the schema needs to change. I would point out that this is very similar to the effect of an *interface* changing in an OO app. Tables *are* an interface.
(Hiding changes via database views and triggers varies per vendor. The products could probably improve here, but there is no inborn limit of the paradigm which prevents them.)
(* I have never seen relational programming advanced as a general-purpose paradigm for software construction, so I'll find it interesting to investigate. *)
I don't know if it is general purpose, it just seems to work well for custom biz apps. One-size-fits-all is probably not the case.
Regarding Design-by-Contract, it is hard to implement such for many types of business rules. It takes more code to state the contract than it does to implement it in many cases. You end up having to change two things instead of one when new requirements come: the implementation *and* the contract verification code. Thus, you increase the chance of errors. It often violates the once-and-only-once rule of factoring.
The stack DBC examples in the books don't seem to extrapolate to real-world requirements very smoothly. (I stopped using stacks when decent databases came along. A "stack" is simply one of many possible views of any collection. IOW, "Has-a" stack view instead of "is-a" stack.) Good abstraction is all about managing relativism IMO.
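To show the kind of duplication I mean, here is a contrived sketch (my own, not from any DbC text): the "contract" is just the business rule stated a second time, so every rule change has to be made in two places.

    // The rule: orders over 1000 get a 5% discount.
    double discounted(double total) {
        double result = (total > 1000) ? total * 0.95 : total;
        // "Postcondition" that merely restates the rule; change the rule, change both.
        assert result == ((total > 1000) ? total * 0.95 : total);
        return result;
    }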
Re:Them er fightin' words (Score:1)
There are a number of other reasons I disagree with your assessment:
Re:Them er fightin' words (Score:2)
Maybe in scientific computing, where the interface is simple but the computations are complex. However, biz apps tend to be the other way around. (Biz apps tend to be complex in how multiple things interact and in what the biz rules can reference.)
(* Preconditions and postconditions are nothing more than a precise way to specify what something does. *)
Try comments. Well-worded comments are not going to beat the usefulness of some machine-readable notation precisely because it is tuned for the machine instead for people.
(* The same argument regarding redundancy could be used against type annotations.... *)
I can live without those. I tend more toward scriptish langs anyhow these days.
Re:Them er fightin' words (Score:1)
I have applied DbC successfully to business apps and system software. I have never written any scientific software, so I can't comment on that.
I'll ignore the freudian slip, and assume you meant that well-worded comments are going to beat the usefulness of assertions. In that case, I disagree with that too:
Re:Them er fightin' words (Score:2)
Validation checks can be made with simple IF statements.
If not inRange(...) then
panic_or_something
end if
Re:Them er fightin' words (Score:1)
Re:Them er fightin' words (Score:2)
So it is slow the first 2 years, before Moore's law makes it not matter? That is not a very good selling point.
(* Furthermore, even if you never disable assertion checks, DbC makes it clear exactly where they are necessary, so you don't end up with duplicate redundant checks. *)
And IF statements do not, because they are not weird and funky enough to stand out? That is a silly argument. Besides, you can call the same function each time:
if Not inRange(...)
SameFamiliarName("Foo out of range")
end if
(* DbC is not an implementation technique to check for errors; it's a design methodology to delineate precisely the responsibilities of each class/module/function in a system. *)
Yeah yeah. I have had this argument before, and how DBC is so *subtly* different that it does not really matter.
Use what is already available and stop adding goofy little syntax to a language to make it funkier and funkier. Reinvent something really different, not a glorified IF statement. That is a waste of complexity.
Re:Them er fightin' words (Score:1)
I hope I'm wrong, and that we can continue to have a rational discussion about this.
Yes, it sure isn't a good selling point if you pull this two-year time frame out of your ass. What if I told you the time frame is more like six weeks? That would be more in line with my experience.
I'm not sure what your point is here. Mine is that you can't disable error checking code unless you know which error checks can safely be disabled. Sure, you can grep for "if", but you need to know the difference between error checks that trap bugs in the program, versus those that catch valid error conditions like user errors. For instance, in a C compiler, you should eventually be able to disable internal data structure consistency checks, but you can never disable parse error checks. In most software, such as business apps, the line between bugs and actual error conditions is not so clear.
The way you tell error conditions from bugs is Design by Contract. To the extent that you can tell these two things apart, you are using DbC, whether you have chosen to do it consciously or not.
This kind of logic is hard to argue with. You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. Well, I agree that there's no point in adding glorified IF statements to a language, but I can only emphasize once again that DbC is a design methodology. (Why do you think they call it Design by Contract?)
I have already told you that I use DbC to design C code. C obviously doesn't have any special contract syntax, so I'm not sure how you could believe that DbC is just a syntax issue.
There are mountains of resources on the internet describing the DbC technique, and if you want to ignore it and argue against a straw man instead, that's your prerogative.
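To make the bug-versus-error distinction concrete, here is roughly how it looks in code; the method and names are invented for the example, not taken from any particular compiler.

    void define(String identifier, java.util.Map<String, Integer> symbols) {
        // Contract check: a null symbol table here can only mean a bug in the
        // compiler itself, so it may eventually be disabled (Java only runs
        // asserts under the -ea switch anyway).
        assert symbols != null : "compiler bug: no symbol table in scope";

        // Error check: bad user input is always possible, so this test can
        // never be disabled.
        if (!identifier.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("parse error: illegal identifier " + identifier);
        }

        symbols.put(identifier, Integer.valueOf(symbols.size()));
    }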
Re:Them er fightin' words (Score:2)
I already described how to do that.
Another way is with a comment. The advantage of a comment is that you can create more complex "removal schemes". For example, you may not want to remove *all* the checks, but just the most costly ones (CPU-wise).
if Not inRange(....)
DBCraise("x is out of range")
end if
if Not inRange(....)
DBCraise("y is out of range")
end if
If you rely on built-in stuff, then you cannot add features like that if you want to: you are stuck with whatever is out-of-the-box. In this case, all-or-nothing removal/disable of the checks.
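A rough sketch of one such scheme, reusing the DBCraise() from above (the other names are made up; constant flags here play the role the comments would):

    // Hand-rolled check levels: flip a flag and a whole tier of checks stops running.
    static final boolean CHECK_CHEAP  = true;
    static final boolean CHECK_COSTLY = false;   // e.g. switched off in production

    void process(Order order) {
        if (CHECK_CHEAP && order == null) {
            DBCraise("order is missing");
        }
        if (CHECK_COSTLY && !balancesAgainstLedger(order)) {   // expensive cross-check skipped when the flag is off
            DBCraise("order does not reconcile with the ledger");
        }
        // ... normal processing ...
    }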
(* You dismiss my statement that Design by Contract is more than just an IF statement, and then you claim that because it's just an IF statement, it's worthless. *)
I did not say "worthless". I am saying that you have not justified dedicated syntax.
DBC is just a roundabout, consultant-buzzword, wallet-draining way of saying:
"Testing assumptions is a good thing."
Re:Them er fightin' words (Score:1)
One last thing: regarding the dedicated syntax, I have said a number of times that DbC doesn't require any dedicated syntax, as I have used it on C projects. Again, it's a design methodology, not an implementation technique.
Re:Them er fightin' words (Score:2)
Have a function ReportBug(msg), and handle the rest any way that is appropriate.
I would note that you seemed to agree that the difference can be a fuzzy area in some cases.
(Or use the new bloated OO-influenced style:
system.bugManager.bugReporter.reportBug( messageManager.messageFactory.msg("foo"))
)
Re:Them er fightin' words (Score:1)
No, it's not. DbC is not about testing assumptions; it's a methodology for deciding what those assumptions should be. It provides guidelines like command-query separation, rules on how contracts behave in the presence of inheritance, etc. The testing of the assumptions is secondary, and is helpful for debugging, but it's not Design by Contract.
Anyway, I don't really give a rat's ass whether you ever actually use DbC yourself. I just figured since you introduced me to your paradigm, I'd return the favour.
Re:Them er fightin' words (Score:2)
They are usually the ones who have been screwed over by buzzword-spewing consultants, and are thus looking for something lean, mean, and clean that works instead of talks.
If you have good tools... (Score:2, Interesting)
You do it as soon as you notice the problem. If you have good tools, it will be simple and fun (yes, fun).
A refactoring browser like IDEA from IntelliJ [intellij.com] makes it simple. Highlight a few lines of code, choose "Extract Method" from a menu, and the code is extracted into a new method, with all the necessary parameters created and passed in and the necessary return type and assignment generated. For example:
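Take a toy fragment like this one (a stand-in of my own; the variable names are invented):

    double total = basePrice;
    total += quantity * unitPrice * (1.0 + taxRate);
    total += shippingCost;
    return total;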
Highlight the expression after the "+=" on line 2 and Extract Method, calling it "foo": IDEA pulls the expression out into a new foo() method, works out the parameters and return type for you, and replaces the expression with a call to foo().
"At what point do two functions approaching similar functionality need to be merged, despite the cost of digging through the source and making changes to call the new function?"
It also has a rename feature which will rename a method or variable and change all references to it, but doesn't change references to different variables or methods that happen to have the same name.
It has lots more features, but you can read about them for yourself, and download the program and play with it.
There are other refactoring browsers out there too, like the free Eclipse from IBM. With the right tools, you can easily make your code less messy.
This book answers in detail (Score:2, Informative)
Don't let the title fool you--although he uses C++ for his examples, the concepts he talks about (splitting code into components, why each component should be in its own file, levelization of components, etc.) make sense in any OO language.
I consider this book a must-read for anybody working on large programs.
Re:Le plus ca change... (Score:2)
It's always good to see the grey-hairs confirming that what seems new and different and untested is in fact obvious and essential for junior programmers to know. Repackaging it as Refactoring may not add anything new, but it does place it in a context that's more accessible to those not raised on FORTRAN and COBOL. Plus, when the old classics are out of print and hard to find, it's good that the new refactorings of the information are still on the shelves at Amazon.
Re:Le plus ca change... (Score:2)
Although that characterization does describe a valid benefit of OOP, it completely misses possibly the most important aspect of OOP, which is the introduction of type-based polymorphism.
In fact, the "organizing procedural code" benefit of OOP is simply a side effect of designing systems based on interacting types, something which procedural systems didn't directly support. Saying that OOP is a way of organizing your procedural code completely misses the point.
Modern texts on refactoring focus on factoring issues in these systems of interacting types, and as such are relevant to current systems in a way that it's difficult for, e.g., Plauger to be. Certainly, normalizing/factoring/compressing systems has been and always will be a basic goal of software development, but just because the concept is old doesn't mean that there aren't new insights into it. Suggesting otherwise is a little like saying that using Jupiter's gravity to give a space probe an energy boost is nothing new, since Newton discovered gravity. I suspect NASA scientists get most of their information somewhere other than the Principia.
Re:Le plus ca change... (Score:1)
Yourdon & Constantine's 1975 book Structured Design [amazon.com], originally published by Prentice-Hall, is still in print in a photocopy/perfect-bound edition from Yourdon Press. The covers suck but the text is reproduced perfectly, except for the halftone boxes at the heads of the chapters.
Congratulations (Score:1)
Nevertheless, I'd recommend a good book on structured analysis and design, then go to a design patterns book, perhaps. The SA&D is probably more micro and the patterns are probably more macro in nature today. I'd do this if you needed "justification" for what you were doing.
Although not directly related, learning basic database concepts wouldn't be bad either. I think normalization concepts might affect your thinking in a positive way, not only for organizing data structures, but also for where you place files in your build structure. Maybe it'll help you avoid needless redundancy. At least these are based on more sound mathematics, and the knowledge thereof will last.
I think the dirty little secret about these books is that much of their content is written opinion, with little scientific evidence behind the reasoning. In computing today, credibility comes more from writing about something than from proving something. Not entirely, perhaps, but if it were really gospel, then we wouldn't have displaced SA&D with OOA&D and patterns. Develop your own style, be consistent, and stick to your guns, because your opinion is probably no worse than the others.
Java? (Score:2)
Ahh, the joys of OOP...
Re:Java? (Score:1)
In Java, every *class* needs to be in a separate file, except inner classes, which are only visible to the public class (the one with the name of the file).
Classes are organized into packages, and should be put into a directory tree that matches the package hierarchy.
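For instance (a made-up class, purely to show how the package name, the directory path, and the file name line up):

    // File: com/example/billing/Invoice.java -- the directory path mirrors the package name
    package com.example.billing;

    public class Invoice {
        // the single public top-level class in this file, named to match the file
    }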
What this has to do with MVC, I'd rather not guess.
Re:Java? (Score:1)
Re:Java? (Score:1)
Re:Java? (Score:1)
Yes, a class is a blueprint for a type of object.
The thing about classes and objects is that they are not the same: they don't have a one-to-one relationship all of the time, just as "this kind of foo" is not the same as "this particular foo".
Re:Java? (Score:1)
Re:Java? (Score:1)
The complete rule is that only one public top-level class or interface can be defined per file, and the name of the file must be the same as the name of the public class or interface (plus the ".java" extension).
Anyone want to hire a Java language lawyer?
Database normalisation rules. (Score:3, Interesting)
Try getting a simple DB design book that goes through a normalisation process; it should make for a lighter read.
Then think about how to apply the process to software (a bit of light thinking).
The first couple of steps are something like
separate everything out into discrete chunks
look at 'keys' and 'indexes' (in source code they are design patterns and data structures, the things that tie the chicks together).
You don't need a 1000-page bible, you need ten pages of guidelines and good practices and a bit of brain power.
Re:Database normalisation rules. (Score:2)
When you find yourself subconsciously writing about "tying chicks together" in a discussion of source code organization, it's time to take a break from the keyboard and go get laid, if you can...
I agree with you about the correlation between database normalization and code factoring [which is the correct and long-established term, no matter how much you might dislike the term "refactoring"]. However, to get a database into Nth normal form can be done by following some fairly simple rules. Code isn't quite so easy. Books like Fowler's refactoring book cover details, subtleties, and rationales that even above-average developers may miss.
Also, refactoring is a name for something that programmers have always done anyway. An agreed-on name is better than no name at all, or many non-standard names.
Other recommendations... (Score:2, Informative)
You might also like "The pragmatic programmer" - Hunt and Thomas - which is another "meta-programming" book with a lot of ideas and insights you could actually sell to your pointy-headed boss.
The section on "zero-tolerance" coding is a great "why and when to refactor" argument. There's also a good section on how to design the units of which your software is composed, how to reduce the coupling between those units, and how to test em when (you think...) they're done.
Nev