Writing Code for Spacecraft
CowboyRobot writes "In an article subtitled 'And you think *your* operating system needs to be reliable,' Queue has an interview with the developer of the OS that runs on the Mars Rovers. Mike Deliman, chief engineer of operating systems at Wind River Systems, has quotes like, 'Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet.' and, 'The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes.'"
George Neville-Neil (Score:5, Informative)
The interviewer George Neville-Neil co-authored "The Design and Implementation of the FreeBSD Operating System" with Marshall Kirk McKusick.
Re:hmm... (Score:2, Informative)
As to releasing their source code? From Wind River? ROTFL!
1. 2. 3. 4. Profit???? (For a quick mod up)
Will they quit using FAT? (Score:4, Informative)
Remember, some time ago Spirit was continuously rebooting due to a flash memory problem. The use of the FAT file system in the embedded system was partly responsible for the mess.
The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.
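For anyone who wants to see that mechanism in miniature, here is a toy sketch in Python. It is not real FAT code and has nothing to do with the rover's actual software; the `ToyDirectory` class and its method names are made up for illustration. The only details borrowed from FAT are the 32-byte directory entry and the 0xE5 marker written into the first byte of a deleted entry.

```python
# Toy model of a FAT-style directory (hypothetical, for illustration only).
# Each entry is a fixed 32-byte slot. Deleting a file does not remove its
# slot; it only overwrites the first byte with the 0xE5 "deleted" marker.

ENTRY_SIZE = 32       # bytes per directory entry in FAT
DELETED_MARK = 0xE5   # first byte of a deleted directory entry in FAT


class ToyDirectory:
    def __init__(self):
        self.slots = []  # one bytearray of ENTRY_SIZE bytes per entry

    def size_bytes(self):
        """Size of the directory 'file', counting deleted slots too."""
        return len(self.slots) * ENTRY_SIZE

    def live_count(self):
        """Number of entries that are not marked deleted."""
        return sum(1 for s in self.slots if s[0] != DELETED_MARK)

    def create(self, name):
        """Add an entry, reusing the first deleted slot if there is one."""
        raw = name.encode("ascii")[:11].ljust(11, b" ")
        entry = bytearray(raw + bytes(ENTRY_SIZE - 11))
        for i, slot in enumerate(self.slots):
            if slot[0] == DELETED_MARK:
                self.slots[i] = entry
                return i
        self.slots.append(entry)
        return len(self.slots) - 1

    def delete(self, index):
        """Mark an entry deleted; the directory does not shrink."""
        self.slots[index][0] = DELETED_MARK


d = ToyDirectory()
for n in range(1000):
    d.create(f"FILE{n:04d}")
for i in range(1000):
    d.delete(i)

# Every file is gone, but the directory is as large as it ever was.
print(d.live_count(), d.size_bytes())  # prints: 0 32000
```

The point of the sketch is only that the directory's size tracks the peak number of entries, not the current number, which matches the behaviour described in the article.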
By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.
According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.
The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.
Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.
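To put rough numbers on "almost inevitable", here is a back-of-the-envelope sketch in Python using only the figures quoted above. It assumes a byte-for-byte mirror of flash contents in the 96 Mbytes of data RAM, which is a simplification of whatever the third-party software actually did; the function names are made up for illustration.

```python
# Figures are the ones quoted in the article; the byte-for-byte mirror
# is an assumption made purely for this illustration.
FLASH_MB = 256      # flash memory on the rover
DRAM_MB = 128       # total DRAM
DRAM_DATA_MB = 96   # portion of DRAM used for data


def mirror_headroom_mb(flash_used_mb, ram_for_mirror_mb=DRAM_DATA_MB):
    """RAM left after mirroring; negative means the mirror does not fit."""
    return ram_for_mirror_mb - flash_used_mb


def max_mirrorable_fraction(flash_mb=FLASH_MB, ram_for_mirror_mb=DRAM_DATA_MB):
    """Largest fraction of flash that could be mirrored in data RAM."""
    return ram_for_mirror_mb / flash_mb


print(max_mirrorable_fraction())  # prints: 0.375
for used in (32, 96, 128, 256):
    print(used, mirror_headroom_mb(used))
```

Under that assumption the mirror stops fitting once more than 96 Mbytes of flash are in use, long before the flash itself is full.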
Source: DOS Glitch Nearly Killed Mars Rover [extremetech.com]
BTW, there is another interview with Mike Deliman [pcworld.com] that I read some time ago in PCWorld.
Re:Efficiency (Score:5, Informative)
> megabytes; the rest of the code, plus data space,
> eventually exceeded 30 megabytes." This should be used as
> the example for efficient coding
You've GOT to be kidding, right? 2 meg of OS code? That's ULTRABLOAT compared to most spacecraft. In fact, for the vast majority of the space age, that would have exceeded the resources of the computer by several orders of magnitude.
I've done this kind of programming for a living (for 10 years, moved up to controls design) but the last system I programmed for has 372k of memory, total. That includes data, code, OS, everything. Runs at 432 KIPS. And it performs what is probably one of the most complex in-flight autonomous control operations ever.
Most are even more restrictive. For example, 8K of PROM, 1K of volatile memory, and 28 WORDS of non-volatile memory. This is more than adequate for most applications, if you do it right.
Many spacecraft OS's are more akin to this:
hardware interrupt
external electronics power up processor.
external electronics set PC = 80hex
run
{execute all the code}
halt
power down
Once every 1/4 of a second for 15 years.
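The wake, run everything, halt pattern above can be sketched in Python as a function that runs to completion on each wakeup and keeps nothing except a small non-volatile state. This is only an illustration: the `run_once` name, the proportional control law, and the contents of `nv_state` are all made up and are not from any real flight code.

```python
WAKEUPS_PER_SECOND = 4  # "once every 1/4 of a second"


def total_wakeups(years):
    """How many times the cycle runs over a mission of the given length."""
    seconds = years * 365.25 * 24 * 3600
    return int(seconds * WAKEUPS_PER_SECOND)


def run_once(nv_state, sensor_value):
    """One wakeup: start at the fixed entry point, execute all the code, halt.

    Locals model volatile memory and are lost at power down. Only
    nv_state (a stand-in for the few words of non-volatile memory)
    survives between wakeups.
    """
    error = nv_state["setpoint"] - sensor_value   # hypothetical control law
    command = nv_state["gain"] * error
    nv_state["cycles"] += 1
    return command                                # then halt and power down


nv = {"setpoint": 10.0, "gain": 0.5, "cycles": 0}
for reading in (8.0, 9.0, 10.0):                  # three simulated interrupts
    print(run_once(nv, reading))

print(nv["cycles"])          # prints: 3
print(total_wakeups(15))     # prints: 1893456000
```

There is no scheduler, no file system, and no heap in this model, which is the point: the "OS" is little more than a fixed entry point and a power switch.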
The project I am currently working on uses VxWorks (and so we were quite interested in the Mars Rover problem) and it's so bloated with unnecessary features it's absurd. This is not a Windows box, it's a spacecraft processor.
I can't argue with the 30 meg of data space. Using the memory as a data recorder would be quite useful and a good picture takes a lot of space. But it's alarming to me that you could figure out how to waste maybe 4-5 meg on code. If you started with a bare home-brew OS, I would guess (and I get paid for this sort of guess) that you could do the entire flight code in 512K, with maybe 8k of data space, excluding the science data.
Only recently have space-qualified rad-hard processors with this kind of capability become available. Until then, if you said you needed 2 meg for the OS alone, you would have gotten fired on the spot and referred to mental health professionals. The availability of these processors enabled people to use high-level languages with tremendous overhead (like C++). And this was only done for employee retention purposes during the bubble. For years it was done at the assembler or even machine level. It's still not at all uncommon, and we've done MANY flight code patches with only a processor handbook and an engineering paper pad, setting individual bits one by one.
Brett
Re:Out of curiousity (Score:1, Informative)
Why, in the 21st century, is it necessary to fit something like the Mars rover code in 2MB of memory? If something like a Gameboy Advance or a PDA can hold 64MB to a couple of gigs, what is holding NASA back, with their gigantic budget and all?
One thing: radiation. It's cheaper to send up simpler, purpose-designed and fabricated, bulkier chips that don't get upset when a particle hits them than it is to send up the latest and smallest chips, which are oh so fast but supersensitive to radiation, and add lead shielding that doubles only as dead weight.
Re:Out of curiousity (Score:5, Informative)
Last I read (maybe a year ago?), NASA still used 386 and 486 chips because they didn't generate a lot of heat (compared to today's machines) and could be made to withstand higher than normal forces (through extra padding on the device, I imagine). They were more resilient to the issues you might see in space than newer processors.
Simply put, if they put the latest CPU with tons of RAM in there, and it fails, how are they going to fix it?
-- Joe
And on top of all that... (Score:3, Informative)
...the memory inside the Gameboy Advance and whatnot isn't radiation-hardened.
The grandparent poster needs to RTFA, and note what had to be done to protect circuits from Marvin the Martian's cosmic rays. The chips get physically bigger (sometimes a lot bigger), and that builds up quickly.
Re:Out of curiousity (Score:4, Informative)
Smaller memory capacity for a given surface area implies larger feature size.
By the way, the class I took was 1-on-1 with Prof. Stephen McGuire at Cornell. Extremely cool guy.
Re:Marketing crap (Score:1, Informative)
BS. How much of the Linux kernel have you read in detail to determine its quality? I agree that access to code allows you to FIX things faster or on your own, but you can't evaluate the quality of any large piece of commercial software by looking at it.
Also, FWIW, VxWorks still kicks Linux's ass on context switch times. For a really responsive system (think line rate packet switching etc.) Linux is not even an option. Look at QNX, VxWorks etc.
Re:Huh? (Score:4, Informative)
"WindRiver portrayed their tool as being able to do those things, thus I made the wrong decision based upon the false claims of the manufacturer."
You see, WRS would have you believe that VxWorks has a reasonable disk subsystem, even though they have no option of using DMA for the data transfers, a fact they conveniently don't make available.
WRS had a port of XFree available for VxWorks. However, they did not release the source for it, and they stopped supporting it, and thus it fell behind in support for the video chips now in use. Of course, they did not inform developers of their impending decision to drop support until it was too late.
WRS has a TCP/IP stack. However, they did NOT have support for DHCP or DNS, and on certain platforms their stack has gross errors (e.g. packets being shifted by one byte so that when they reach the application they are corrupted).
WRS claims to have board support packages so that you don't have to develop them. They don't mention that they don't support half the hardware on most boards (e.g. they don't enable the cache on XScale processors, halving the speed of the processor).
WRS claimed they would support development under Linux as a host OS "within a couple of months" - that was back in 1998. They started supporting development under Linux this year - and then not very well.
Yes, I chose the wrong tool for the job - because WRS did not correctly represent their tool's capabilities and there was no other way to evaluate the capabilities of the tool.
Re:hmm... (Score:1, Informative)
Development.' by Robert Love.
As far as locks go in pthreads(?)... WTF?
You can lock critical sections in pthreads without using constructs like semaphores, which are crappy anyway (read Stevens... again, or maybe for the first time), by using a little imagination.