Space Shuttle Software: Not For Hacks 178
As someone who's done more than his share of late-nighters, I found it an interesting view into the mission-critical environment. Maybe there are a few software firms out there that would rather spend some of their money on better processes than on technical support engineers. Maybe a little more market research and a little less marketing, too. A good read.
These guys are "pretty thorough" the way Vlad the Impaler was "a little unbalanced." Still, you have to wonder how they can claim single-digit errors among thousands of lines of code, but I guess the proof is in the rocket-powered pudding. And lucky for them, their target platform was recently upgraded.
Who wrote the Mars landing software? (Score:2)
The only thing I could think of after hearing that such an error caused a multimillion-dollar craft to crash was IDIOTS - any scientist should be using SI units today.
Re:Processes in software engineering. (Score:1)
Most likely not - but automatic verification of programs using logical constructs is a big growth area.
You can test a program with all possible inputs, and have a clean run. But this does not mean the program is 100% reliable. You must prove the program is correct if you want to be sure it is good enough for circumstances such as shuttle or aeroplane flight.
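To make the point concrete, here is a tiny Python sketch (my own illustration, nothing to do with any flight code): for a 16-bit input domain you really can test every input, and doing so is the only way to catch the one value that breaks a naive absolute-value routine. For 64-bit inputs, let alone concurrent programs, exhaustive testing is impossible and proof is the only option.

```python
def int16(x):
    """Wrap a Python int into 16-bit two's-complement range."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x

def abs16(x):
    """Absolute value as a 16-bit machine would compute it."""
    return int16(-x) if x < 0 else x

# Exhaustive testing is feasible here: only 65536 inputs.
failures = [x for x in range(-0x8000, 0x8000) if abs16(x) < 0]
print(failures)  # [-32768] -- negating -32768 overflows back to -32768
```

One input out of 65536 fails; a randomly sampled test suite would almost certainly miss it.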
With all the complexities of semaphore control in parallel computing, you really have to make sure a program enters and leaves critical sections at the correct times, without anything else running (that has been designated mutually exclusive).
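As a minimal sketch of what "entering and leaving critical sections correctly" means in practice (a toy Python example, nothing shuttle-specific):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:          # critical section: mutually exclusive
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock held around every update
```

Drop the lock and the read-modify-write on `counter` can interleave, silently losing updates - exactly the kind of bug testing alone rarely exposes.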
Many experts believe that some Airbus crashes were caused by incorrect verification.
On a single processor machine, this is much easier, but how many space shuttles do you know of that only have one CPU!
Have a look at some of the links at Dr. Mark Ryan's [bham.ac.uk] page (university of Birmingham) for some more info.
Re:Seems almost like ISO... (Score:1)
I've worked on programs assessed at Levels 3 and 4, and supposedly the folks I work with now are Level 5 (I know they made 4, but I'm not sure the certification for 5 is finished). I grind my teeth sometimes at the layers of process we have to wade through to get things done - but every six months or so, they make changes to (hopefully) make it better.
The SEI's not just working with software [cmu.edu] - they're developing models for System Engineering and Integrated Product Development, as well as Personal and Team [cmu.edu] software process models for small and independent-minded folk. Your tax dollars at work!
Structural problems with the software industry (Score:1)
The main reason Space Shuttle reliability is not a priority in the software industry in general is that the whole focus of the industry has become the quick buck, the rush to the IPO, the dazzling of the user with endless "features" that have minimal utility. The classic example was Windows 3.1. It was colorful, had lots of features - and barely worked.
The marketroids who set timetables for software projects are another problem. Most of them think any arbitrarily complex piece of software can be designed, implemented and tested in about 3 weeks and get impatient when this doesn't happen. In the shuttle program the engineers are in charge and they determine the timetables.
Yeah, I'm a bitter, angry little coder...
Suits? (Score:1)
suits.
Seriously. Why is wearing a suit such a huge thing in the business world? I can understand if you're a lawyer and need to impress people with your multi-thousand dollar clothing, or an executive who deals with customers and must appease the customers' sense of what's proper in an executive.. but other than that.. WHY?
It's been proven and re-proven that people are more productive in an environment where they're comfortable. In this particular case the idea seems to be to make your coders as uncomfortable as possible so they can think of nothing but getting the code perfect the first time so they can go home.. but most places (as has been mentioned repeatedly) aren't like that. So why is it that a guy who works in a cubicle and never gets closer to customers than a middle-manager who is in charge of a supervisor who is in charge of the customer service department has to show up to work in a tie?
And the worst part is that the business world seems to think people enjoy this. Sure, it's nice to look good.. if you've got a $2,000 suit you're going to want to wear it on occasion.. but how many of us can honestly say that we feel more productive in it?
Dreamweaver
Reliability and Tough Professionals (Score:1)
When I meet programmers who think that they are cool and tough, I tell them to read Bravo Two Zero [amazon.com] by Andy McNab. It's the true story of an SAS (British army special forces) unit that operated behind the lines during the Gulf War. Here in the UK, the SAS is revered by most guys in the way that Navy SEALS are in the US. The book has a lot to teach about programming.
Many people seem to think that special forces troops are so good that they can just be handed a task, left to get it done, and that they will deal with whatever problems arise. Wrong. According to McNab, the True Motto(tm) of the SAS is "check and plan". For example, before approaching an Iraqi military vehicle, they would rehearse opening the vehicle's door: which way the handle turns, whether the handle has to first be pushed in or pulled out, whether the door swings open or slides back, how much force needs to be used, etc. etc. etc. Every little detail is checked like this. And there are backup plans.
Now read the first sentence of the previous paragraph, but substitute "top software programmers" for "special forces troops". You can see my point. Truly good special forces/programmers/professionals all have some things in common: they are focused, disciplined, and methodical. And they don't feel a need to prove how good they are by taking unnecessary chances.
The main article also notes that programming teams such as those used for the Space Shuttle seem good at drawing in women. This is hardly surprising. Women naturally like men who are justifiably confident about what they do.
How well did the eight-man SAS unit perform? They were surrounded by Iraqis, who had armored vehicles. Three were killed. The other five retreated: over 85 km (>2 marathons) in one night with 100 kg (220 lb) of equipment each. About 250 Iraqis were killed along the way, and thousands more were terrorized.
Sara Chan
Re: (Score:1)
Re:Spacecraft Design (Score:1)
Our Telcordia subsidiary (formerly BellCore, half of what was once AT&T Bell Labs) is one of those Level 5 organizations - we're all learning from them.
CMM Level 5 does not come from hacking (Score:1)
The software fits the budget, is what the client actually requested, etc.
Many major companies/consultancies try to aim for CMM Level 3, and most defence contracts require it.
It makes the achievements of the NASA Shuttle program seem all the more impressive.
It doesn't necessarily fulfill the hackers' development model; however, it does try to ensure Software Quality.
Re:Again, I don't understand (Score:1)
pyrrho
Re:Haven't we seen this before? (Score:1)
Re:Who are the kernel QA gurus? (Score:1)
Part of one SW development process I've worked successfully with has QA engineers designing the test plan, based on the spec, while the SW engineers write the code. When the code's done, you implement and run the test plan.
If a change is made to part of the code, it ought to be reviewed, and the QA engineer should be present. He can then write some new tests that look specifically at the effects of the change. (And run at least a representative sample of the standard tests.)
My experience was at the application level, on a multimedia authoring and playback system. I'd be tempted to apply similar processes to OS kernel development and testing.
Scenario testing -- what you described -- can find bugs the formal tests didn't, for a hundred users can be more devious than one QA engineer. But you can't rely on it to find bugs at the early stage; it's too random and undirected.
--Timberwoof
Re:Seems almost like ISO... (Score:2)
What's needed is a "meta-process", a process to develop the software process and keep it directed towards the goal. I would suggest that a democratic meta-process, where developers themselves work together to evolve the procedures they will use, would work better than decrees from clueless management.
Well, that's one set of religions. Others - such as Zen Buddhism - would say that such rules, or "process", are things to ultimately be transcended. The enlightened person, the sage or bodhisattva, does not refrain from killing based on some religious law; he simply acts. The practice of these religions is designed to help lead ordinary people to that state of enlightenment.
Perhaps that should be the goal of software development practices, as well - to help lead ordinary programmers into that state where they are enlightened enough to be simply incapable of producing flawed software.
The value of an exacting process... (Score:2)
Don't forget why the Ariane 5 rocket blew up in 1996 [ufl.edu]: a conversion error caused a software shutdown that led to the self-destruct of the rocket.
"The internal SRI software exception was caused during execution of a data conversion from a 64-bit floating-point number to a 16-bit signed integer value. The value of the floating-point number was greater than what could be represented by a 16-bit signed integer. The result was an operand error. The data conversion instructions (in Ada code) were not protected from causing operand errors, although other conversions of comparable variables in the same place in the code were protected."
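The same failure mode can be sketched in Python with the guard the quoted passage says was missing; the function name and values here are illustrative, not taken from the Ariane code:

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1

def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, raising instead of
    silently overflowing -- the protection the failing conversion lacked."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n

print(to_int16_checked(1234.5))    # 1234: in range, truncates cleanly
try:
    to_int16_checked(64000.0)      # out of range: trapped, not garbage
except OverflowError as e:
    print("operand error trapped:", e)
```

In the unprotected Ada conversion, the out-of-range value raised an unhandled exception instead, shutting down the inertial reference system.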
What was the estimate? About $8,000,000,000 of uninsured losses, including 10 years of work for the scientists with satellites on board.
I wonder how many other maiden voyages have started off so poorly - other than the Titanic, that is.
Can you imagine just how simple those things are.. (Score:4)
and how slowly they are being developed? I don't mean that it's a bad thing -- it's good that the Shuttle program allows them to do it at a reasonable pace and with reasonable requirements. But if everyone else wasn't under constant pressure, and if everyone else's software wasn't a victim of feature bloat, of dealing with poorly documented and even worse implemented protocols, and of a never-ending stream of bullshit coming from the management, everyone else would write robust software, too. Well, not really everyone -- some "programmers" wouldn't be able to do anything because they have no skill, no education, or are plain dumb. But a reasonably geeky and educated programmer can pull off something like that in ideal conditions -- and those guys _are_ working in ideal conditions.
Re:Again, I don't understand (Score:2)
Re:Flight Software (Score:1)
>Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big
>reason is that new hardware would almost certainly require massive changes to the flight software. And
>rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if
>it ain't broke...
Actually, AFAIK, the main reason is that the old 386s are tested, tested and, once more, tested for space use. With newer processors, there are too many unknowns to risk a space shuttle. The line-widths in modern processors are so small that background radiation begins to cause problems in space without proper shielding. They're probably testing 486s and Pentiums right now, but it'll be another ten years before they're ready for extensive space use.
The importance of documentation (Score:5)
If I had their budget... (Score:2)
Re:Flight Software (Score:2)
Schedule is driven by the planned date for launch, and worked backward from there. For example, if you're going to launch a mission at date L, then the crew begins training at L minus X months, which means that the software has to be ready for the SMS at L minus Y months, which means you have to begin design at L minus Z months, etc. I'm not sure what X, Y, Z and related time deltas are, but I believe they probably start planning at least a couple of years in advance.
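That backward scheduling can be sketched like this; since I don't know the real X, Y and Z, the numbers below are pure placeholders, as is the launch date:

```python
from datetime import date, timedelta

L = date(2001, 6, 1)        # hypothetical launch date
X, Y, Z = 6, 10, 24         # hypothetical months of lead time

def minus_months(d, m):
    """Rough month arithmetic: treat a month as 30 days."""
    return d - timedelta(days=30 * m)

print("crew training starts: ", minus_months(L, X))
print("software ready for SMS:", minus_months(L, Y))
print("design begins:         ", minus_months(L, Z))
```

The point is just that every milestone is derived by subtraction from L, so slipping the launch date automatically reflows the whole plan.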
--Jim
Re:I can understand why they want no hacks (Score:1)
Or this:
/* clean up later -
   too drunk right now */
- eddy the lip
Italics (Score:2)
My god! Where's all my karma going?
Re:how the best are made... (Score:1)
However, dropping to your knees and worshipping the brilliant scientist-programmer who wrote the core code your company's business depends on will not make you millions of dollars.
That code still needs to be tested against specifications -- even if the specs are written afterwards -- and (re)engineered so that it can be maintained and expanded as new versions and applications demand. Trust me, it's better to write the code in a comprehensible and maintainable way from the start.
If you have a genius who won't work within the programming *organization*'s process, you're sunk. If your genius sees the process as liberating, freeing his mind to create really good stuff ... then pay him lots of money and stock options.
--
Re:Interesting stuff (Score:1)
Please, think of the balsa wood and cardboard tubes. For their sake, please don't release such a dangerous tool!
Re:Processes in software engineering. (Score:1)
Here's my idea of computer logic: a friend of mine is taking a course in computer engineering (in the Netherlands). He once showed me one of his scratch notes. It was a difficult program, several factors involved, etc.
But it fit into one simple "logic" line!
On the other hand, another "simple" programme took almost 3 long lines of "logical formulas".
What I meant is: it would be nice to write programmes in this language, but let the computer do its thing writing the code.
Sorry if it sounded too stupid.
Re:Suits? (Score:1)
However, maybe I missed it, but I didn't see anywhere in the article that it mentioned that they wore suits. It said "moderately dressy".
In the photographs, as far as I could make out, one group of people was wearing jackets, but the other group didn't even have ties. Probably what separates management from the grunts, would be my guess.
Re:The importance of documentation (Score:1)
"It was called Moving Block Signaling"
Oh, there were certainly problems, especially with the MBP (Moving Block Processor). It was a truly great idea, no question. The idea was, roughly, that since the dawn of railways, signalling has been on the 'Fixed Block' system - divide the railway into chunks, and only allow one train per chunk. The MBP idea is that if the trains have a map of the system, and monitor their own speed, the condition of their brakes, the diameters of their wheels to the nearest 100th of a millimetre, and a whole bunch more data, plus information about the other trains on the network, then the MBP can work out the safe braking distance (LMA, Limit of Movement Authority, including several extra metres for safety). The upshot is that you can then cram a lot more trains per mile of track, driving themselves more safely than humans ever could.
The old system would allow up to 12 trains per hour, MBP could potentially do 36, if you could get people on and off the trains fast enough.
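As a rough illustration only (the function name and numbers are invented, and a real moving-block system folds in brake condition, gradient, wheel diameter and position uncertainty as described above), the core of the calculation is just worst-case stopping distance plus a margin:

```python
def limit_of_movement_authority(speed_mps, brake_mps2, margin_m=5.0):
    """Toy LMA: worst-case stopping distance plus a safety margin,
    from the standard kinematic formula d = v^2 / (2a)."""
    stopping = speed_mps**2 / (2 * brake_mps2)
    return stopping + margin_m

# A train at 20 m/s (~72 km/h) braking at 1 m/s^2:
print(limit_of_movement_authority(20.0, 1.0))  # 205.0 metres
```

Because the distance grows with the square of speed, continuously re-computing it per train is what lets moving block pack trains far closer than fixed blocks sized for the worst case.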
The project went so far over budget the whole firm looked like going tits up, and after losing 25 million pounds on this project alone, it was all rather scaled down.
But to be honest, what really blew up everyone's plans was that, when the project started, it was meant to be delivered for 2003. Then the Major Government decided to have the Millennium Dome and furthermore decided that the Jubilee Line Extension would be the preferred (i.e. the only convenient) way of getting there. Bingo - the project delivery date suddenly moved forward by three years with no possibility of compromise. Now by early 1999, we'd run simulator tests - first two fake trains, then a real train and a sim - and were about due to get to the Two-Real-Trains test. However, at this point, a) London Underground needed pretty much constant access to the tracks, and b) we still needed to get our Safety Case. That could easily take another year.
So yes, they got Colour Light Signals (which, apart from losing the MBP benefits described above, also means you need to think about stuff like braking distances vs. line-of-sight).
The trains have most of the equipment - Automatic Train Operation, Automatic Train Protection, the Common Logical Environment (effectively an operating system for railways) - and once the MBP's finished it could be fairly easily retrofitted. It's still being developed for the Madrid Metro.
Basically, if our client (London Underground) hadn't had their timescales rewritten for them by the Govt, I believe we would have delivered the most advanced railway in the world, and the repeat business would have made not millions but billions.
Shame, really. A lot of very good people did a lot of very good work on MBP.
TomV
Re:Formal Methods are the key. (Score:1)
http://archive.comlab.ox.ac.uk/formal-methods/b
Re:The joy of PLCs (Score:1)
For example:
|--Switch1---lightbulb1---|
|--Switch2-/
This represents two switches in parallel, so lightbulb1 will get juice if either Switch is on. So this is the equivalent of OR.
|--Switch1--Switch2--Light1---|
This is AND.
You can add new rungs and include relays, so that a switch3 could be a relay driven off of lightbulb1. By cascading with relays, you can have states, which can represent steps in a process. Switches can be sensors and lightbulbs can be actuators, so you can build a very simple circuit that controls a multi-step process with safety conditions, such as "only activate the forge if there is a blank in place (detected by a proximity sensor), and the temperature is within certain limits (sensors), and previous steps were completed successfully, and the operator's hands are safely out of the way holding down switches 8 and 9." Instead of wiring all this up as actual circuits, you can connect all of the sensors and actuators to the PLC. That allows you to store your programs, it simplifies the wiring, and you don't need to use actual relays, timers, etc. (You'll still use some relays, of course, if you need the low voltage coming out of the PLC to activate heavy equipment.)
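A rough software analogue of those rungs (a toy Python sketch, not real ladder logic or any vendor's instruction set - the signal names are invented):

```python
def scan(inputs):
    """One PLC scan cycle: read input contacts, evaluate each rung
    top to bottom, return the resulting coil (output) states."""
    out = {}
    # Rung 1: Switch1 OR Switch2 drives Light1 (parallel contacts)
    out["Light1"] = inputs["Switch1"] or inputs["Switch2"]
    # Rung 2: Switch1 AND Switch2 drives Light2 (series contacts)
    out["Light2"] = inputs["Switch1"] and inputs["Switch2"]
    # Rung 3: interlock -- forge fires only when the blank is present,
    # the temperature is in range, and both hand switches are held
    out["Forge"] = (inputs["PartPresent"] and inputs["TempOK"]
                    and inputs["Hand8"] and inputs["Hand9"])
    return out

print(scan({"Switch1": True, "Switch2": False,
            "PartPresent": True, "TempOK": True,
            "Hand8": True, "Hand9": False}))
```

Here one hand switch is released, so the forge interlock stays off even though every other condition is met - the whole point of the rung.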
Simple do-it-yourself application: You could connect all your home lighting, along with motion sensors and switches to a PLC, and set up any number of different logical relationships. So a single switch could be "home/away" which could control a large number of lights throughout the house. A single "movie lighting" switch could turn off certain lights, turn others on, dim a few more, turn off the dishwasher, and set a timer to go back to normal in two hours in case you fall asleep.
I don't have one, but I think the cheapest models are probably under $100. They never crash, they can run for years, they're extremely reliable, easy to use, and cheap. If you can program a VCR, then you can program a PLC. Unfortunately, that rules it out as a product for the home market.
"What I cannot create, I do not understand."
How many of us wish (Score:2)
You get what you pay for, and take the time for. These days, most people and companies seem quite willing to settle for "bad, buggy programs now" rather than "better programs, later". Of course, without organization (also common), it's possible to wait and get nothing later, too. Process is expensive in terms of people involved and time, but it's a lot cheaper in the long run than the alternative.
Open-source projects actually follow this - every successful open project I've seen has a definite hierarchy of people managing patches and controlling what winds up in the latest sub-point build, and making key architectural decisions so nothing derails them. Oh - and there's no one who'll fire you if marketing's last-minute changes aren't rushed through.
Re:What's fun in software development? (Score:1)
Typing code is not what the job is about (despite what people seem to think). We're in the business of doing cool things for people. The creativity and ideas that flow from the (very smart) people around me are what drive me.
Just sitting coding typing is a bit dull compared to human interaction...
"The reason I was speeding is.....
Re:Somewhat bogus article (Score:1)
If you have enough to do it poorly ..... (Score:3)
The problem with this argument is that while many companies think they can't afford to do it, what they really can't afford is NOT to do it. Software is becoming more complex - it's the nature of the beast. For the most part, design is not; we are all still using procedures that were brought into being at the dawn of the computer age, with the exception of higher-order languages and more focus on OO.
You are correct in that it may be expensive, THE FIRST TIME. This is called a 'learning curve', and the cost is amortized over the number of times you use this technique. You may also say that the process itself is expensive, but that is incorrect, or at least only partially correct. The process allows errors to be caught EARLY, which reduces cost. Please don't tell me that you believe a code-compile-fix routine can catch these sorts of errors as early as a well-thought-out design.
Also, rigorous design allows for flexibility - this may sound contradictory, but consider the use of design patterns. They are NOT things that can just be thrown into the code ad hoc; they require thought and intelligence. A good upfront design means the ability to use these tools. Consequently, use of these design patterns allows for a certain level of flexibility in satisfying the low-to-medium-level nasty customer requests, and certainly helps on the more egregious ones. Does a code-now, look-later approach allow this? (If you think so, I have this bridge I'd like to sell you...)
In short, yes, using these techniques is expensive. But they also produce code that cuts development time (i.e., no stuck in debug/extra request phase for 2 years) and once people get used to the process, the extra cost/load is minimal.
Haven't we seen this before? (Score:1)
Their code is good because... (Score:2)
They were rated CMM level 5 in 1988 - one of the first organizations anywhere rated at that level of software process maturity. Another good description of their processes (and how they created them) is in the book "The Capability Maturity Model - Guidelines for Improving the Software Process" (ISBN 0-201-54664-7) in Chapter 6, "A High-Maturity Example: Space Shuttle Onboard Software".
As far as making software error-free, a quote from the book will help illustrate the difference in attitude they have (it's talking about a graph). "These data include failures occurring during NASA's testing, during use on flight simulators, during flight, or during any use by other contractors. Any behavior of the software that deviates from the requirements in any way, however benign, constitutes a failure. Contrast this level of commitment with the cavalier attitude toward users in most warranties offered by vendors of personal computer software."
The best place to find more about the CMM is their web site at http://www.sei.cmu.edu/ [cmu.edu]
Flight software crashes (Score:1)
2-3 times per flight is more than I usually experienced, but I think I had to reset at least one system on 50% or more of my flights. That's quite a bit more than 1 every 500 hours. Some aircraft were better than others, too... One jet required its radar to be reset every 15-20 minutes. That problem was eventually traced down to a wiring-harness connector...
In addition, there were and still are known software problems in that aircraft. The known ones usually have some sort of workaround (if the heads up display freezes, cycle power on the display processor, stuff like that), but the occasional random crashes or glitches (like occasionally the plane will suddenly think it's flying 100,000 ft below the ground) have no known cause and the only fix is to reset something until the jet behaves itself again.
My last point is that the flight control software in the F-15E is designed to go offline if the aircraft exceeds certain parameters. In that case, the flight controls must be manually reset in one of four ways. There is a quick reset switch, a "hard" reset switch for pitch, roll, and yaw, we can cycle power for those systems, and worst case we can pull and reset the circuit breakers for the flight control system components.
The funny thing is, it works only because the rest of the design is very robust. Most systems have some sort of backup, and the plane flies just fine without any electrical power at all. Once the software problems are known, they're dealt with as simply one more environmental factor until they're fixed. The fix may take over a year, but they are usually fixed eventually.
Again, I don't understand (Score:1)
Is this supposed to be black magic or something? If something bad is bound to happen, it will happen regardless of how many "certificates" and such were signed.
Or maybe it's about transferring responsibility?
Maybe Mr. Keller could sign a certificate that aliens will contact us next wednesday?
Re:Flight Software (Score:1)
Maybe they can fix 2.3's VM (Score:1)
Re:"No Pizza" is good (Score:2)
It isn't.
Many organisations are starting to find this out and are moving to proper professional engineering practices that improve reliability, increase schedule predictability and, more importantly, reduce costs.
A couple of hundred years ago people built houses & bridges the way we build software - work until it's done. These days we have architects and project managers who build houses faster, more reliably and ON BUDGET.
This is the way the wind's blowing. It's a lot less heroic but it's the future.
Not just space shuttles. (Score:4)
how the best are made... (Score:1)
The obvious candidate would be Bill Joy's TCP/IP implementation. Everyone runs it:
1. The BSDs always used it
2. SYS V incorporated it - thus it flowed to most commercial Unixes
3. Linux borrowed heavily from it (recall that Regents of the University of California boot message?)
4. If the TCP/IP fingerprint of WIN2000 is any indication, they borrowed it too.
And it works right every single time you use it. So, what process made it? A single genius. All the cool process in the world won't make up for the fact that the single requirement for great software is a great designer/programmer. The required process is simple - whatever that person requires to let their genius loose.
The only way to circumvent this requirement is to do what NASA does and spend probably literally hundreds of $ per line of code.
Re:Flight Software (Score:1)
On another note, the group that I work in (Flight Design and Dynamics) may start looking into moving from our IBM/AIX platform to a Linux platform. Penguins in space! I guess that is a bit offtopic, but oh well.
But consider what "a crash" means ... (Score:1)
Hey! Isn't this the way that profs say to program? (Score:1)
hehe... (Score:1)
Re:Who wrote the Mars landing software? (Score:1)
That might be true of the scientists and engineers, but not necessarily of the contractors or of other government agencies.
--
Re:Processes in software engineering. (Score:1)
Have you ever programmed a halfway complex system yourself? Rewriting it from scratch is often the best thing that you can do - the more often, the better. In fact, there are software engineering models that deliberately choose to rewrite their code often. This is called "prototype-based SE".
The reason is that while you write the code, you invariably notice that some decisions you made earlier were wrong, but they affected the design so deeply that changing them would be more work than rewriting from scratch. The alternative is to live with the design flaws; most commercial projects do that because they don't have the time to rewrite their code.
Re:The importance of documentation (Score:5)
Likewise, i worked for a while on the signalling system for the Jubilee Line Extension for the London Underground.
Totally documentation driven. First there was the CRS (Customer Requirements Spec.); this then transformed via an SRS (System Requirements Spec.) into the FRS (Functional...) and the NFRS (Non-Functional...). From these we had Software Design Specs, Module Design Specs, Object Design Specs, Boundary Layer Design Specs. In all, there were around 4000 specification documents for the project, often at issue numbers well into the teens.
What really made the difference, though, was not so much the existence of documentation as the absolute insistence on traceability - every member function of every class in the whole system could be traced back to the Customer Requirements Spec, and every requirement could be traced to its implementation. This meant no chrome: everything in the spec was provided, and nothing was provided that wasn't in the spec.
Also worth noting: the whole thing was in Ada 95. The compiler was very carefully chosen. Coding standards were tight, and tightly enforced - function point analysis was king - anything with more than 7 function points was OUT, simple as that. Every change to anything, however small, required an inspection meeting before and after implementation, with specialists from every part of the system that could be impacted, plus one of the two people with a general overview. Then there were the two independent test teams and the validation team.
Ye Gods it got tedious, no denying that. But in a situation where lives depended on good software...
Now I probably apply only a tiny fraction of what I learned, but when I decide to ignore part of the methodology, at least I know I'm ignoring it. And I'm aware of what I'm missing.
In short - learn about the safety-critical approach. Ditch most of it as excess baggage by all means - it's often simply not justifiable. But be aware of the choices you're making.
TomV
Re:ohh if only... (Score:1)
I bet they also have enough time and enough $$$ (Score:1)
Re:Here's the difference (Score:2)
Bullshit.
NASA didn't just have a solid process, they had MONEY. They BOUGHT that quality, by hiring an order of magnitude more testers than you'd find in the commercial world. By budgeting several years of development time rather than weeks or months. By reducing the number of lines of code that any one developer is responsible for.
There's a lot to learn from a highly structured development process like NASA's. But don't kid yourself that the quality they produced is simply because they 'had the right process' or had better management.
Higher Quality != Higher Cost/Time! (Score:3)
But then the Japanese came along with a radical new idea: if there are defective parts coming down the line, then we should figure out why they were created defectively in the first place and fix that. Then the number of defective parts at the end of the line would be smaller, so you would need *fewer* inspectors and *less* time at the end of the assembly line. (Ironically, this principle came from an American named W. Edwards Deming; unfortunately, American companies were too successful during his lifetime to take him seriously :-) So the Japanese were able to build cheaper cars quicker than the Americans while actually having higher quality.
I think that's very analogous to the current argument. Under the current system of coding, you basically hack together something that sorta works, and then use sophisticated debuggers/development tools to figure out which parts are buggy. Using that system, it's true that higher quality requires more cost and time.
But I think the point of this article is that that is the wrong way to approach programming. First, figure out why defective code gets written in the first place (be it poor client specifications, poor management, poor documentation, whatever) and then fix those processes, and you'll turn out quality code without having to spend any more time or money!
As a practical example, I first learned C under a CompSci Ph.D. who was a quality fanatic. In order to teach me to code properly, he would give me projects and then not allow me to use a debugger. Nothing at all. Zilch. Nada. The only thing I was allowed was to place print statements within my program wherever I wanted to see what was going on. As a result, I spent *a lot* of my time planning my code out, and reviewing it over and over again before even compiling it, because I knew that if there were bugs in it, I couldn't just fire up a debugger and take a look.
And secondly, if there were bugs, I couldn't just trace through the entire program or create a watch list of every variable. I had to study the bug and understand it, look at the code and figure out where the bug most likely was, and then use selective print statements to examine the most suspicious stuff.
If this sounds like a programming class from hell, believe me, it was incredible! I couldn't believe how much of my code worked the first time it compiled. And when there were bugs, I actually fixed the underlying flaw in the logic rather than just applying a temporary patch. What's more, since the rest of my program was well planned and documented, there were no "hidden" effects: if I found a bug, I knew exactly which parts of the program it affected, and perhaps more importantly, *how* it affected those parts. Thus they were very easy to fix.
Believe it or not, it took me less time to program this way than using debuggers, and the resulting code was much more stable and better understood.
If you look at commercial software these days, it's not uncommon for the debugging period to take longer than the actual coding. In other words, there are more quality inspectors than there are assembly workers, and the time the code spends in inspection stations is longer than it spends being produced. It's tough to say that this is the "efficient" method of programming...
If you want to see where this is heading, just turn once again to the car industry: once American companies got their asses kicked by the Japanese, they adopted their techniques, and Surprise! Cars now come out of their factories with higher quality, in less time, and at less cost (adjusted for inflation and new features :-). Who would've believed it? :-)
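The no-debugger discipline described above can be sketched in a few lines. This is a toy example of my own, not anything from that class: form a hypothesis about where the bug lives, print only the state that tests it, then fix the root cause rather than patching around it.

```python
def average(values):
    """First draft: floor division silently truncates the result."""
    return sum(values) // len(values)

# Hypothesis: the division is wrong. Print only the state that tests it,
# rather than tracing the whole program in a debugger.
print("sum:", sum([1, 2, 2]), "n:", 3, "avg:", average([1, 2, 2]))
# prints: sum: 5 n: 3 avg: 1   (5 // 3 was truncated, so the division is the bug)

def average_fixed(values):
    """Fix the underlying flaw (true division), not a symptom elsewhere."""
    return sum(values) / len(values)
```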
Re:"No Pizza" is good (Score:2)
No one wants to write buggy code...
Well, mister know-it-all...how do you go about getting really obnoxious amounts of money out of the customer?
Knocking those who rewrite their code. (Score:2)
"No Pizza" is good (Score:3)
That's because pizza-and-coke all-nighters are a direct byproduct of poor planning, either by the engineer implementing the code, the architect creating the design (if there even is such a person) or the person making the engineer's schedule. And the result is usually hastily written, incompletely tested software that is typical of most product offerings for use on the desktop.
The process of authoring mission critical, man rated software is so far removed from the ad hoc, informal, duct-tape-it-together approach that most programmers use that no direct comparison can be made. I've seen both ends of the software development spectrum and they each have their uses. You can't launch a shuttle with a bunch of last minute kernel patches and some stuff that was written the night before the launch date. But you can't compete in the commercial software marketplace with code that takes 2 or 3 years to specify, design, implement, test, and integrate, either.
Stand in awe of the people who have the skill and discipline to write software of this quality. Learn what you can from their process and try and use the lessons they've learned. Their stuff doesn't break, because when it does, people die. If O/S developers had that same attitude about their code, we'd never see blue screens of death, kernel panics, or any of the other flakiness we tolerate on our desktop machines.
Re:An alternative strategy (Score:2)
I suspect that, upon seeing the "computer restart" button, the test pilot evaluating the aircraft would start asking a series of questions:
1. What is the failure rate of the computers; i.e., how often will that button have to be pressed?
2. What is the time elapsed between the computer failing and the computer becoming operational again, including the reaction time of the pilot or weapons officer? Assume that the pilot and weapons officer are already a) flying the plane, b) lining up on target, c) watching for SAM sites, ground fire, enemy aircraft, and d) coordinating with friendly aircraft.
3. How does the computer controlled, fly-by-wire system function during the timeframe covered in question 2? Will it fly steady (given that many modern fighter airframes are inherently unstable in flight, and rely on active computer control)? Will I have any control over the plane until it restarts?
4. If this happens in a dogfight, what are the chances of recovery and survival?
Or not... In truth, I suspect the first few questions would really be something like "You're kidding me, right? Do you think I'm crazy? Would you be willing to fly this deathtrap?"
Re:Processes in software engeneering. (Score:2)
There were several occasions last year when a co-worker and I ended up trashing pages and pages of code and rewriting it with the same functionality, but modular; in some cases it even ended up smaller.
My company used consultants who wrote terrible code. Take this example: there was a program that calculates x days ago. The consultant's program went and tried to calculate leap years and all of that. Our replacement used system library calls to get the date, then simply subtracted the proper number of seconds. Other cases were hardcoded scripts to run SQL on our database; we replaced those with a perl script that took the SQL as a parameter.
So there are times when a rewrite is better than maintaining the code. I guess the biggest case in point is Mozilla versus Navigator. Basically I agree that if projects were planned and used software engineering principles, we would most likely end up with good products. Granted, games seem to be done best when they're a hack... but how many times have you seen long-term maintenance of a game?
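The subtract-seconds rewrite described above can be sketched like this. It is a hypothetical reconstruction, not the actual script; the name `days_ago` and the injectable `now` parameter are mine, and UTC is used to keep the example deterministic:

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60

def days_ago(n, now=None):
    """Date string for n days ago: take the system clock's epoch seconds,
    subtract, and let the C library's calendar code handle leap years
    instead of reimplementing them."""
    if now is None:
        now = time.time()
    return time.strftime("%Y-%m-%d", time.gmtime(now - n * SECONDS_PER_DAY))

# Ten days after the epoch is 1970-01-11; one day earlier is 1970-01-10:
print(days_ago(1, now=10 * SECONDS_PER_DAY))  # prints 1970-01-10
```

One caveat with the subtract-seconds trick: in a local timezone with DST, a "day" is occasionally 23 or 25 hours long, which is why this sketch sticks to UTC via `gmtime`.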
Somewhat bogus article (Score:5)
Here's NASA's own history [nasa.gov] on bugs in that software:
The Shuttle's user interface is awful. The thing has hex keyboards! Some astronaut comments include
This project should not be held up as a great example of software engineering. Even NASA doesn't think it is.
Re:THAT is how to write code (Score:2)
With these types of clients (and I've dealt with many), taking the proper long stage of design and discussion doesn't work at all. The client immediately changes their tune after seeing initial results. Not so much to add features, but because the features they actually requested were not the ones they needed, or didn't work within their business practices.
Doug
Re:Higher Quality != Higher Cost/Time! (Score:2)
A good book on this (from 1986-8, so it leaves off when the US auto industry was in pretty much the nadir of its decline) is David Halberstam's The Reckoning [fatbrain.com]... I'd go into further detail, but you have to read the book. It goes into Ford & Nissan overall, but it's very rich with both history and personality (particularly Mr. K of Datsun 240Z fame) and an excellent read.
There are definitely some lessons to learn, particularly regarding American hubris during fat economic times..
Your Working Boy,
Re:ohh if only... (Score:2)
Here's the difference (Score:2)
This software is the work of 260 women and men...
Commercial programs of equivalent complexity would have been written by 7 or 8 people.
Re:Safety is cool... (Score:2)
I would suggest that if every software project were SEI-5, there would be no Internet and people would be doing papers on typewriters.
3rd Job = 1st Experience with Software Engineering (Score:2)
Then I got a job at the Waterford Institute. Their process probably wasn't as tight as the space shuttle's, but there WAS a process, and there were specs. Nice specs. Nearly pseudo-code.
We were programming educational activities for kids learning math. Activities were created by design teams consisting of an educator, an artist, a tech writer, and a programmer. The tech writer would document everything that went on at the meetings and distill it into a spec. The design team would meet regularly over a period of several months, refining the spec until it was solid.
The spec described the various states of the software. When a user did something, the state of the software changed, and it responded accordingly. I'd never seen software described this way, but it made a big impression on me, and it made things easy to write and debug.
('course, the platform we were writing for was Java, which kept changing, and in-house developers were writing our own object library, which kept changing too, so your code would work one day and then wouldn't the next. So everything wasn't perfect. But hey, I was impressed with the specs.)
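A state-oriented spec like that can be sketched as a transition table. The states and events here are invented for illustration, not taken from any actual Waterford spec:

```python
# (state, event) -> next state. With a spec written this way, each row
# of the table can be checked against the document during review.
TRANSITIONS = {
    ("waiting", "correct_answer"): "praise",
    ("waiting", "wrong_answer"):   "hint",
    ("hint",    "correct_answer"): "praise",
    ("hint",    "wrong_answer"):   "show_solution",
    ("praise",        "next"):     "waiting",
    ("show_solution", "next"):     "waiting",
}

def step(state, event):
    """Advance the activity; events not in the table leave the state alone."""
    return TRANSITIONS.get((state, event), state)

# Two wrong answers in a row walk the activity to the worked solution:
state = "waiting"
for event in ("wrong_answer", "wrong_answer"):
    state = step(state, event)
print(state)  # prints show_solution
```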
Interesting stuff (Score:2)
That said, something I was curious about that the article didn't answer, and that I don't see mentioned here yet-- what language is all of this done in? Ada would be my guess, or is there something even better than that?
Re:Flight Software (Score:3)
http://www.ksc.nasa.gov/mirrors/images/images/p
--Jim
Re:"No Pizza" is good (Score:3)
If your company took the time to write very stable, near-bug-free code, they'd take so long doing it they'd go out of business - their competitor would get the business with a flakey but shipping product and by the time you turned up with your perfect product, everyone would be locked into their stuff (and most likely would have been using it for a couple of years).
No one wants to write buggy code; we all try to do our best. Logical & clear design, defensive programming & good documentation give a good base. Peer review and experience (been there, don't want to do that again) help a lot too. Just writing the comments first (saying what you're *going* to do before doing it) helps.
Another problem is that writing bug free apps on (say) windows is almost pointless as the app will still fall over when some bit of buggy OS/windows API code falls over. Things have to be stable and bug-free from the hardware upwards to give an impression of stability to the user - the problem is, the average user can't tell the difference (and couldn't give a toss) whether the app or the os fell over, it's just "my WP crashed and I lost my work".
Welcome to the real world. Software can be flakey because it was written to be useful before the hardware went out of date - not exactly a problem with the shuttle. You can spend ages hand-crafting efficient code to be overtaken by crap code on a faster CPU. Blame the chip companies for moving so quickly
Hugo
Re:Interesting stuff (Score:2)
Re:Flight Software (Score:2)
I thought they were the only group to achieve SEI Level 5. If not, then who else has? I'd love to go and correct one of my lecturers.
When the Capability Maturity Model for Software [cmu.edu] was published by the SEI [cmu.edu] there was only one ML-5 organization; at the time they were known as the IBM Onboard Shuttle group. Thankfully, times are changing.
According to the SEI's 1999 survey [cmu.edu], 61 organizations reported a Maturity Level of 4 or 5. Of those, 40 were Level 4 groups and 21 were Level 5. The survey goes on to mention that as of 15-Feb-2000, some 71 organizations reported that they were Level 4 or 5. Those that gave their consent are listed in Appendix A [cmu.edu].
Re:Not Exactly (Score:2)
The B-1B has seven of the GPCs that the Shuttle has, so it couldn't fly at all either. The F/A-18E has a number of PowerPC-based flight control computers... the F/A-18E is the first US fighter to use Cat-5 Ethernet to connect the computers together instead of obscure military cabling... at least that's what I read.
IMHO the biggest problem with the F-16 is the fact that it has a single engine. If you look, single-engine jets crash more than twice as often as twin-engine jets. A single point of failure will get you every time.
Re:Interesting stuff (Score:5)
I work in the Flight Software (FSW) Verification group in Houston.
The shuttle FSW code is written in something called HAL/S. This stands for High-level Assembly Language / Shuttle. The language was designed to read like mathematics is written. Superscripts like vector bars are actually displayed on the line above, subscripts like indices are displayed on the line below. Vectors and matrices can be operated on naturally, without looping.
We are the only ones with a compiler, because we wrote it ourselves.
Here's a sample (rendered in HAL/S's single-line form, with subscripts written as $(...), since the two-dimensional layout doesn't survive plain text):
EXAMPLE:
PROGRAM;
   DECLARE A ARRAY(12) SCALAR;
   DECLARE B ARRAY(12) INTEGER INITIAL(0);
   DECLARE SCALE ARRAY(3) CONSTANT(0.013, 0.026, 0.013);
   DECLARE BIAS SCALAR INITIAL(57.296);
   DO FOR TEMPORARY I = 0 TO 9 BY 3;
      DO FOR TEMPORARY J = 1 TO 3;
         A$(I+J) = B$(I+J) SCALE$(J) + BIAS;
      END;
   END;
CLOSE EXAMPLE;
Re:Spacecraft Design (Score:2)
Re:Interesting stuff (Score:2)
The Shuttle was flying before Ada had been developed.
Processes in software engeneering. (Score:2)
Not as many as should have.
Every time I read the history of a programme and find the line "completely re-wrote the code", I begin having second thoughts about how good the programme really is.
With the ever-growing complexity of programmes, it becomes more and more difficult for humans, even aided by computers, to keep track of a project. But if you teach everyone how the computer's logic works, programming becomes just a matter of writing the necessary simple code (ha! hackers, get this!).
Will the next generation of programmers write in a "logic language" instead of C++? Who knows, but IMHO it would make programmes more robust and even better.
Unintentionally humorous quote in article (Score:3)
I can see many Dilbert-fans wondering if that is a bug or a feature.
If they ever release this to the public (Score:4)
1 Space Shuttle Endeavor
1 Launch Pad
1 Houston Mission Control Station
4 Astronauts
ohh if only... (Score:2)
I had the patience.
Well, this is cool. It proves that you can't write perfect software*. However, you can come close.
If only everybody would do it this way, not just some cool company.
This probably even produces better software than the "open source" way. OpenBSD is the only open software project that comes close, which really is kind of sad. People need to relax to do it right; down with stress!
Well, if you meet someone who works at some dot com (and there are quite a lot of them here in Stockholm), they are always really, really stressed. That might impress the stock market but not really anyone else... That is the reason everybody talks about "When will the bubble burst?", and I can tell you this:
The "bubble" (which consists of overstressed people) will burst very soon. The more relaxed people will take it easily.
* Well you can, but Hello World! isn't really THAT
complex.
Bug free?? (Score:2)
How can they be sure it's bug free? If the last 14 versions had 20 errors, did they think each was bug free, only to find more bugs? At 500k lines of code you can't prove it all mathematically, and human checkers are... well, human.
One way to measure how many bugs your code has is to purposefully introduce a bug and tell people to find it. Then you count how many new bugs they found along with the one you introduced, and scale that by the lines of code you have. But this technique won't work if you only have 1 or 2 bugs that people are actively looking for in the first place. So, my question is: how can they be sure it is bug free?
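The seeding trick sketched above is a known technique (Mills-style error seeding). A rough sketch of the arithmetic, with made-up numbers:

```python
def estimated_real_bugs(seeded, seeded_found, real_found):
    """Error-seeding estimate: if testers caught seeded_found of the
    planted bugs while also finding real_found genuine ones, assume
    genuine bugs are caught at roughly the same rate, so the estimated
    total of genuine bugs is real_found * seeded / seeded_found."""
    if seeded_found == 0:
        raise ValueError("no seeded bugs found; the estimate is undefined")
    return real_found * seeded / seeded_found

# Testers find 5 of 10 planted bugs plus 4 real ones: the estimate is
# 8 real bugs in total, i.e. about 4 still lurking.
print(estimated_real_bugs(10, 5, 4))  # prints 8.0
```

And as the comment notes, the estimate collapses when only one or two real bugs exist: at that point the ratio is all sampling noise.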
I can understand why they want no hacks (Score:3)
#Shuttle Waste Dump
#
#I dunno WHY this works, but it does!
What's fun in software development? (Score:4)
I can almost hear the moans from the pizza-and-coke crowd when they read this: "Where's the fun? Where's the creativity?". But they're under the mistaken assumption that putting lines of code into the editor is the only fun thing about developing software.
IMHO, software development is full of fun activities. What about analysis and design? In my experience, that's where the creativity really comes into play. Just talking to the customer, understanding the problem and making a working design is really difficult, and hence rewarding when you pull it off.
And what about the process itself? Software development is a young discipline, where individuals and small groups really can make an impact. Nobody really knows how to make good software. Maybe you'll be the one to find out? As the man says, in the shuttle software group, people use their creativity on improving the process.
And last, but not least, I bet those guys have a really good feeling when they talk to the customer after delivery. Not like some people I know, who just hide. ;)
If you can't see the fun of these other activities, maybe you shouldn't be working in this field...
This is what fault-tolerant systems are for (Score:2)
Re:Interesting stuff (Score:2)
this only works for few projects (Score:3)
- half a million LOC (that's small)
- under development for 20 years
- new requirements are avoided at all cost
So it is a small, long-lived project with a nearly unlimited budget. No wonder they can afford to have such a process in place. But now, realistically: how long does it take to set up such a project from scratch? How about having a customer who does not know what he wants? How about deadlines of less than 10 years from now?
I honestly believe that this way of delivering software is optimal for nothing but long-lived, multi-billion-dollar projects. In any other case you'll end up with something that is delivered years too late, matches the requirements of 10 years ago, and is close to useless.
Unfortunately many software companies are in a situation where they can't afford to wait for perfect software. Take mobile phones as an example. Typically these things become obsolete within half a year after introduction. The software process is what determines time to market. Speed is everything. If you can deliver the software one month earlier, you can sell the phone one month longer.
Of course testing, requirement specs and software designs are useful for any project, but it's usually not feasible to do them properly.
Re:Processes in software engeneering. (Score:2)
The only way to know for certain is to either code directly in bits or be (extremely) intimate with the compiler and linker. At that point, a proof will be correct.
The way the shuttle seems to work is you'd better have a damn good reason to write/alter/delete/modify/worship a line of code. This will catch the vast majority of reasonable errors (by their reports, roughly 99.95% relative to standard commercial software).
They identified the weakest link in the chain of software engineering and have fortified it quite well.
Re:The value of an exacting process... (Score:2)
Re:An alternative strategy (Score:2)
If that's so, it's an interesting illustration of the overall system's requirements imposing lower quality standards on components of that system.
To wit: the article (I presume; haven't read it, but have read similar ones on the same topic) discusses the importance of achieving a 100% quality rate on a given chunk of software.
Now, that software is merely one component in a much larger system.
Actually, these larger systems nest "outwards". I.e. the shuttle itself is a larger system than the software it contains, but so is NASA a larger system than the shuttle; so is the US government larger than NASA; so is the USA larger than the government; so is the planet's population larger than the USA; etc.
In this case, there are specific reasons I can suggest account for the 100% quality requirement that might otherwise go unnoticed:
Failure resulting in death of participants, and especially of non-participants (humans), is not an option.
However, failure resulting in not launching, not even building it in the first place, especially not building it within some timeframe, is an option. That is, failure of the "commitment to quality" approach to actually deliver the component on a "timely" basis is an acceptable option.
The world generally will admire a program such as the space shuttle less if it crashes and burns frequently, killing/maiming people and destroying equipment, than if it succeeds on the extremely rare occasions on which it is tried -- perhaps even less than if it never happened in the first place.
A delay in a shuttle launch costs, overall, far less than the cumulative risks of premature shuttle launches. (Challenger demonstrated that.)
Compare these elements to fighter aircraft, where the software is part of a somewhat different set of larger systems:
The deaths of participants and non-participants is expected by most everyone of this sort of system and the activities around which it revolves.
On the contrary, the sorts of failures that result from failing to launch a fighter plane, or never having designed it in the first place, are generally not so well-tolerated.
The world will likely fear a non-existent fighter plane, even one that has 100% success in its flight-control software (doesn't require rebooting) but is launched extremely rarely (it's hard to build) or too late, far less than it will a large fleet of existing, dangerous fighters that have even a 10% "kill" rate of its pilots per year.
A delay in a fighter-plane deployment can literally cause lost wars. In that sense, the loss of pilots due to poor design is a calculated positive compared to the loss of a nation's (and/or its peoples') freedom.
Of course, I'm making pretty much everything up, above, so don't bother arguing details or interpretations with me -- I have no idea whether they're correct or not.
But, they're probably correct enough to illustrate why it's probably okay for us to be using highly buggy computers on a poorly designed (for the way it's being used now, anyway) Internet rather than, as another post on this thread put it, using typewriters and plain paper.
Not that there aren't wonderful advantages to deploying 100% correct software components in a large-scale, much-buggier system! "Creeping quality" is not a bad thing at all, since it allows people working on the system to worry less about various portions of it as they try to debug it.
But, the effort to deploy such perfect components may well outweigh the utility of doing so, overall, given the pertinent timeframe.
In particular, when trying to deploy such a perfect component in a large, buggy system, it can be hard figuring out which component can be made so "perfect" and still be useful in that (presumably speedily-evolving) system by the time it's ready!
So maybe it's appropriate to view almost everything we deal with on the Internet as a very early alpha-stage prototype after all. ;-)
Re:Seems almost like ISO... (Score:4)
I'm working on a large NASA project now. I have determined that the purpose of this project is not to produce a working software system, but rather to produce a wall full of loose-leaf binders of incomprehensible documentation that no one will ever refer to again.
The process says we must have code reviews - great! But instead of being an analysis of the logic of my code, it turns into a check against the local code formatting standards - "You can't declare two variables with one declaration, use int a; int b; instead of int a,b;" (yes, that's an actual standard around here) instead of "Hey, if foo is true and bar is negative, you're going to dereference a garbage pointer here!"
The forms are observed, but the meaning is forgotten, like Christians going to church on Sunday then cutting people off and flipping them the bird on the drive home.
"Process" won't save us. Which doesn't mean that a certain amount of it can't help, but there is no silver bullet. [virtualschool.edu]
Re:An alternative strategy (Score:2)
Flight Software (Score:5)
I can also tell you that NASA avoids having to make unnecessary changes to the FSW. For example, the new "glass cockpit" recently discussed here on Slashdot: when these upgrades were designed, they chose to design the interface to the new display modules to exactly mimic the interface to the old instruments. In other words, they are true plug-and-play replacements; one significant reason for this was so the flight software didn't have to be modified.
Likewise, people often ask why the shuttle continues to use such antiquated General Purpose Computers: slow, 16-bit machines designed back in the seventies. There are many reasons, but a big reason is that new hardware would almost certainly require massive changes to the flight software. And rewriting and recertifying all that software would be a huge task. The current FSW works reliably; if it ain't broke...
Huzzah! As I type, we just launched Atlantis. Go, baby, go!
--Jim
Re:ohh if only... (Score:2)
Formal Methods are the key. (Score:2)
If everyone would simply use VDM/Z or Larch/CLU for all their development work, it would be much easier for us to prove our software is correct, and then all bugs would be a thing of the past.
It really is that simple. Don't these people remember what they were taught at college?
Seems almost like ISO... (Score:2)
I have come across fellow workers who absolutely hate this type of practice... well, they're probably best suited for development of systems that aren't life-critical.
If you read past this... (Score:2)
It's complicated.
It's simple.
It's complicated again.
The article gets worse from there.
--
THAT is how to write code (Score:5)
The good thing about the way software is written here is that the requirements are written down and sorted out before they even do the planning. How many programmers, groups, firms, etc. can say that? I will admit, though, that a major problem is changing requirements, something that surely happens to NASA in the same way. It might just be better if people decided to wait a bit before jumping into the programming. They'll save themselves more time and money in the long run.
Re:Formal Methods are the key. (Score:2)
"Would you rather fly on an airplane with software that has been proven to be correct, or on an airplane with software that has been rigorously tested through actual flight time?"
I think the answer is clear.
Re:Formal Methods are the key. (Score:3)
Another formal method that originated in France is the B Method (Méthode B), which consists of progressively refining logical statements that apply to the desired behaviour of your program (like the assert()s you put before and after the body of a function) into an implementation of that behaviour:
http://estas1.inrets.fr:8001/ESTAS/BUG/WWW/BUGh
An academic formal methods team that checks the Ariane 5 software:
http://pauillac.inria.fr/para/eng.htm
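A toy illustration of the refine-from-assertions idea, in plain Python rather than the B Method's actual notation or tooling: the assert()s state the spec, and the loop body is one implementation that refines it.

```python
def isqrt(n):
    """Integer square root: the largest r with r*r <= n."""
    assert n >= 0, "precondition: n must be non-negative"
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Postcondition: r is the floor of the square root of n.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r

print(isqrt(15), isqrt(16))  # prints 3 4
```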
Safety is cool... (Score:4)
I never quite understand why it is considered an act of macho bravado to work all night and live off pizza. It indicates two things: 1) a badly run project, and 2) poor maintainability in the code.
In one of my previous incarnations I worked on display systems for Air Traffic Control, where the quality level was also very high, where the performance requirements were exacting and the specifications precise.
Some would think that this means simple and boring... Of course not. Having to display a track from reception at the Radar to the display in 1/10th of a second isn't easy by any stretch of the imagination, and to do it so it works 100% of the time means you have to understand the problem properly rather than coding and patching.
If only more projects worked like that then there would be a lot less bugs in the world.
Re:THAT is how to write code (Score:2)
Re:Flight Software (Score:2)
Congratz,
Something I've always wondered about is at what point do you figure you have done enough planning and start to work on the actual project? What is their timeline dependent on: 100% error-free pseudo-code before anything gets implemented, or a set number of readings?
I think a lot of the reasons most companies don't go through this extensive pre-stage process is because they fear the project will get lost in a black hole of redesign and doublechecks.
Also where can one find the Software Engineering Institute ( SEI ) specs?