Writing Code for Spacecraft
CowboyRobot writes "In an article subtitled 'And you think *your* operating system needs to be reliable,'
Queue has an interview with the developer of the OS that runs on the Mars Rovers. Mike Deliman, chief engineer of operating systems at Wind River Systems, has quotes like, 'Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet.' and, 'The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes.'"
Wait a minute? (Score:4, Insightful)
VxWorks does not even offer memory protection, and the RAM can get fragmented. Not to sound trollish, but I would pick something like QNX or NetBSD for any critical app or embedded device.
It's amazing that the engineers fixed it and got it to work reliably, but an operating system designed for mission-critical work would have been a better choice.
Other options being considered (Score:5, Insightful)
There are a few groups at JPL that have been actively experimenting with other options, including RTLinux and a few different variants of hard-real-time Java (basically Java with explicit memory management and no garbage collection).
Re:Wait a minute? (Score:5, Insightful)
Dynamically allocating memory is usually a big no-no in real time systems.
Re:hmm... (Score:4, Insightful)
Probably not as big a cost as losing a Mars rover because your OS wasn't reliable enough.
Re:Wait a minute? (Score:4, Insightful)
Why would you even want memory protection in a system like this? Memory protection is great to prevent crappy apps on your PC from doing too much damage, but in a system like the Rover it's pure overhead.
As for RAM getting fragmented, it all depends on how you program it. Often you don't even need dynamic memory allocation, so fragmentation never becomes a problem.
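To make the "no dynamic allocation, no fragmentation" point concrete, here is a minimal sketch of a fixed-size block pool in C. It is not taken from the rover code or from VxWorks; the names (`pool_init`, `pool_alloc`, `pool_free`) and the sizes are made up for illustration. All storage is reserved statically, and because every block is the same size the pool cannot fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define BLOCK_SIZE  64u   /* bytes per block */
#define BLOCK_COUNT 8u    /* number of blocks in the pool */

/* A free block reuses its own storage to hold the free-list link. */
typedef union block {
    union block *next;
    uint8_t      payload[BLOCK_SIZE];
} block_t;

static block_t  pool_storage[BLOCK_COUNT];  /* reserved at link time, never grows */
static block_t *free_list;

/* Put every block on the free list. Call once at startup. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* Constant-time allocation: pop the head of the free list, or NULL if empty. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* Constant-time release: push the block back onto the free list. */
void pool_free(void *p)
{
    if (p == NULL) {
        return;
    }
    /* Debug check: the pointer must lie inside the static pool. */
    uintptr_t addr  = (uintptr_t)p;
    uintptr_t start = (uintptr_t)&pool_storage[0];
    assert(addr >= start && addr < start + sizeof pool_storage);

    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```

The trade-off is that the worst-case memory use is fixed up front: once all `BLOCK_COUNT` blocks are handed out, `pool_alloc` returns NULL and the caller has to handle that. This sketch is also not thread-safe; a real system would guard the free list or give each task its own pool.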
Re:Efficiency (Score:3, Insightful)
Now, any particular instance of the kernel gets compiled for a specific processor and only includes the drivers it needs, which does save some space. But a lot of that extra space comes from things like a dynamic loader/linker, graphics packages, local shells (usually in multiple flavors), and a host of other applications that are "standard."
The thing that saves *that* space is the local WDB debugging agent. It lets you offload almost all of the bells and whistles to another machine, which does the object loading, provides your shell, and does whatever debugging you need, then sends simple instructions to the agent to carry out. That dramatically increases the interface capabilities without increasing the footprint.
Marketing crap (Score:4, Insightful)
I used VxWorks on a reasonably large project several years ago. It's a fine piece of work, but nothing special; it's nowhere close to the quality of a recent Linux kernel.
About halfway through our project we developed a need for a local filesystem on our box. We bought a FAT filesystem add-on from Wind River that was annoyingly poor quality: lots of bizarre little problems, memory leaks, and of course no source to look at. In the end we didn't use it; we put together our own filesystem from freely available sources.
When I read the articles about VxWorks filesystem problems nearly borking the entire Mars rover mission I laughed and laughed. I'm sure that it was the same crappy code (although I don't really know for sure).
For me it's a case study in why you shouldn't use closed source software: you can't evaluate the quality of the code on the other side of the trade-secret barrier, and you wind up trusting things like glossy brochures.
jeff
Open source spaceware (Score:5, Insightful)
If that were open source, there are so many space nerds who are programmers that flaws of that magnitude would never get past the army of testers.
Many would help out simply because hey, it's the *space program* and that's good enough for them. Others would want their name listed next to some obscure bug fix on a NASA site; it's good for the ego or your CV.
Simply put, even a binary distribution of that code would allow unlimited free testing for crashes. Why wouldn't NASA do it?
Because there are still people in Washington who think code mysteriously gets damaged by being public, even if such code isn't modifiable by the public who reads it.
This is evidence of advanced cluelessness in Washington, and maybe independent anti-free-source advocates (spelled M-i-c-r-o-s-o-f-t) are the cause.
But I've learned not to bash. Never explain by Microsoft malice what could be explained by stupidity. Such as using DOS on a space thing...
Re:Open source spaceware (Score:3, Insightful)
Open Source is great and all, but it's hardly the answer to everything.
Hell yes! (Score:5, Insightful)
Exactly!
The problem is that most /.ers are used to thinking of an OS as something that needs to run any arbitrary program under any arbitrary conditions and survive any arbitrary crash in those programs.
For a Rover, none of those are true. They know exactly what code is going to be run. They know exactly where it's going to sit in memory. And they test it. (This is the part that /.ers can't quite understand.) They test these programs far more rigorously than any bog-standard x86 Linux OSS program ever gets tested. Those programs have their problems, but they will be mistakes in logic (metric/imperial conversions, or thread priority inversions), not segfaults because of derefing a null pointer.
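For readers who haven't run into the thread priority inversions mentioned above: a low-priority task holds a lock that a high-priority task needs, and a medium-priority task keeps preempting the lock holder so the high-priority task never gets to run. One common remedy is a mutex with priority inheritance. VxWorks has its own semaphore API; the sketch below uses POSIX threads instead, only because that can be compiled on an ordinary desktop. The function name `init_inversion_safe_mutex` is made up for this example, and on older toolchains you may need to build with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L  /* ask the headers to expose PTHREAD_PRIO_INHERIT */

#include <assert.h>
#include <pthread.h>

/*
 * Initialise *m as a mutex that uses the priority-inheritance protocol:
 * while a low-priority thread holds the lock and a higher-priority thread
 * is blocked on it, the holder temporarily runs at the waiter's priority.
 * Returns 0 on success or an errno-style error code.
 */
int init_inversion_safe_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }

    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

The inheritance only has a visible effect when the threads involved run under a real-time scheduling policy with distinct priorities; with default scheduling the mutex simply behaves like an ordinary one.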
I wonder how many undergrad CS degree programs still teach correctness proofs? Not "yeah, I ran it lots of times and it didn't crash," but "I ran it 100,000 times with 100,000 different inputs, all random, and it didn't crash, but while it was running I also sat down and mathematically proved the code is correct."
Embedded programming is just plain different from "normal" programming. It's usually a mistake to try to generalize from one to the other.
(All that said, the next version of VxWorks is advertised to optionally support a "traditional Unix" process model, and I think protected memory boundaries are one of the features. In case your embedded app needs to run arbitrary third-party software which probably doesn't get stress-tested at JPL :-), you can turn all that stuff on and live with the overhead.)
Re:mutex's always cause trouble (Score:2, Insightful)
Ditto (Score:3, Insightful)
Part of the problem in my case was that VxWorks is for smaller embedded systems, which my project is NOT. I need fast disk storage, I need graphics, I need networking, I need things that VxWorks just doesn't provide very well.
Were I able to change one decision about the design of my project, I would have gone with Linux instead.
WRS *used* to have something to offer, in that they provided a real-time OS and hardware driver bundles (board support packages in WRS-speak). However, they no longer provide great value in that area - Linux has far better hardware support, and for any reasonably complex project will scale down as well as VxWorks will scale up.
Re:Open source spaceware (Score:3, Insightful)
Huh? (Score:4, Insightful)
"...and even though I chose the wrong tool for the job, it's still the tool's fault for not doing everything I need."
Re:Out of curiousity (Score:4, Insightful)
In the real world, once you get up in the vicinity of the Van Allen belt, you get into hard radiation. If you use typical modern high-density chips, with 0.15 micron die spacing, a single particle will short or damage half a dozen traces on the chip in a single impact. If you use really old stuff, with 5 micron die spacing (and higher), a particle will be too small to hit multiple traces in a single impact. You may still get a single bit flip, but ECC will catch that, and you can deal with it. In the former case of a high-density die, the failure would end up being catastrophic when a particle impacts the chip.
There are practical limits to the size of die that can be mounted on a carrier, and the trace density defines the capacity of that die. Yes, it's possible to cram 32 meg of RAM into that space, but it won't last more than a few minutes in a hard radiation environment. Take that same silicon wafer, using 5 micron traces, and it'll last years exposed to the same environment, but it'll only have 1 meg of usable RAM locations due to the decrease in density.
You can't just throw more of them on, because then power consumption becomes the issue. In overly simplified terms, the chip is going to use power relative to its surface area; it matters not whether it's got 1 or 32 meg of addressable locations in that area. Clock frequency is the other major contributor to power consumption, hence it's not uncommon at all to see space hardware clock speeds measured in kHz rather than the MHz and GHz most folks are used to, and there are damn good reasons to leave it that way.
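As an illustration of how ECC can catch the single bit flip mentioned above, here is a toy Hamming(7,4) code in C. It is only a sketch: real ECC memory typically uses wider codes that can also detect double-bit errors, and the function names here are made up for the example. Four data bits are stored with three parity bits, and any one flipped bit in the 7-bit word can be located and corrected.

```c
#include <assert.h>
#include <stdint.h>

/* Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4.
 * Position i is stored in bit (i - 1) of the byte. */
static unsigned bit(uint8_t w, unsigned pos)
{
    return (w >> (pos - 1)) & 1u;
}

/* Encode the low 4 bits of 'nibble' into a 7-bit codeword. */
uint8_t hamming74_encode(uint8_t nibble)
{
    assert(nibble < 16);
    unsigned d1 = nibble & 1u;
    unsigned d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u;
    unsigned d4 = (nibble >> 3) & 1u;

    unsigned p1 = d1 ^ d2 ^ d4;  /* checks positions 1, 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;  /* checks positions 2, 3, 6, 7 */
    unsigned p3 = d2 ^ d3 ^ d4;  /* checks positions 4, 5, 6, 7 */

    return (uint8_t)(p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
                     (d2 << 4) | (d3 << 5) | (d4 << 6));
}

/* Decode a 7-bit codeword, correcting at most one flipped bit. */
uint8_t hamming74_decode(uint8_t code)
{
    unsigned s1 = bit(code, 1) ^ bit(code, 3) ^ bit(code, 5) ^ bit(code, 7);
    unsigned s2 = bit(code, 2) ^ bit(code, 3) ^ bit(code, 6) ^ bit(code, 7);
    unsigned s3 = bit(code, 4) ^ bit(code, 5) ^ bit(code, 6) ^ bit(code, 7);

    /* A non-zero syndrome is the 1-based position of the flipped bit. */
    unsigned syndrome = s1 | (s2 << 1) | (s3 << 2);
    if (syndrome != 0) {
        code ^= (uint8_t)(1u << (syndrome - 1));
    }

    return (uint8_t)(bit(code, 3) | (bit(code, 5) << 1) |
                     (bit(code, 6) << 2) | (bit(code, 7) << 3));
}
```

Flip any single bit of `hamming74_encode(n)` and `hamming74_decode` still returns `n`. Two flips in the same word, which is the dense-die scenario described above, would be silently "corrected" to the wrong value by this small code.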
An all-up spacecraft platform has hard limits on physical size (constrained by the physical limits of the launcher) and hard limits on total mass, determined by the launch vehicle's capability to the final trajectory required. The final design will budget a portion of its mass allowance to power generation, and that power is in turn budgeted to various systems. The folks doing the controllers will have a hard limit on power consumption, another on volume, and a third on mass. Working within those limits, they have to design and deploy a system that is expected to have 99.999999% reliability, operating in conditions more extreme than it's possible to actually simulate on Earth.
It's a shame, but there is one thing they don't seem to teach in computer science courses anymore: out here in the real world, reality gets in the way of all the theory. Moore's law may well say chips will get faster and denser as time goes on, but it becomes irrelevant when other limiting factors get in the way. Until gamma particles start to shrink, or we come up with an effective way of making sure they don't hit the electronics, 10-year-old and older stuff is going to remain 'state of the art' for use in space. Die density and the ability to shield are hard limitations. You can't get past them, and you won't see more modern equipment going into the reaches of space until those limitations are overcome. That's not likely to happen in the foreseeable future: the research in that area is all 'nuclear research', and that's all out of vogue these days. It's going to take a couple more generations or a severely critical power shortage to change that.
An important point (Score:2, Insightful)
RTFA (Score:1, Insightful)
I'm as fervent a Wind River basher as the next guy. But at least bash them for things they are *guilty* of. Sheesh!