
Writing Code for Spacecraft

CowboyRobot writes "In an article subtitled, "And you think *your* operating system needs to be reliable." Queue has an interview with the developer of the OS that runs on the Mars Rovers. Mike Deliman, chief engineer of operating systems at Wind River Systems, has quotes like, 'Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet.' and, 'The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes.'"

  • Re:hard to imagine.. (Score:4, Informative)

    by brilinux ( 255400 ) on Saturday November 20, 2004 @02:43PM (#10875612) Journal
    Actually, if I remember correctly, there was a problem with one of the rovers, and they had to re-flash it from millions of KM away. I am not sure whether they had a backup copy of the OS on the rover that would facilitate the re-flashing, or whether there was some patch that was transmitted, but I remember them talking about it on the news.
  • George Neville-Neil (Score:5, Informative)

    by cpghost ( 719344 ) on Saturday November 20, 2004 @02:45PM (#10875618) Homepage

    The interviewer George Neville-Neil co-authored "The Design and Implementation of the FreeBSD Operating System" with Marshall Kirk McKusick.

  • Re:hmm... (Score:2, Informative)

    by Anonymous Coward on Saturday November 20, 2004 @02:48PM (#10875638)
    WindRiver RTOS (real-time operating system) is painful to work with. Their debugging environment is a nightmare, and the cost of development and deployment is almost 3x that of an embedded Linux. My little company just finished doing a trade study of the various RTOS kernels available and yes, theirs might be more reliable, but at a huge cost. Furthermore, performance-wise, it just isn't up to snuff vs. say MercuryOS on a single CPU, let alone a multi-CPU system.
    As to releasing of their source code? From Wind River? ROTFL!
    1. 2. 3. 4. Profit???? (For a quick mod up)
  • by EqualSlash ( 690076 ) on Saturday November 20, 2004 @03:05PM (#10875731)

    Remember some time ago Spirit was continuously rebooting due to a flash memory problem. The use of the FAT file system in the embedded system was partly responsible for the mess.

    The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.
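A toy model of the behavior described in the quote above, as a rough sketch only. The 32-byte entry size and the 0xE5 "deleted" marker are standard FAT conventions; the `Directory` class and its method names are hypothetical and do not come from the rover's file system code.

```python
DELETED = 0xE5   # FAT marks a deleted entry by overwriting the first name byte
ENTRY_SIZE = 32  # bytes per FAT directory entry
NAME_LEN = 11    # 8.3 name stored as 11 bytes, space padded


def _raw_name(name):
    """Encode a name into the fixed 11-byte field (simplified, no dot handling)."""
    return name.encode("ascii")[:NAME_LEN].ljust(NAME_LEN, b" ")


class Directory:
    """Hypothetical in-memory stand-in for a FAT directory file."""

    def __init__(self):
        self.entries = []  # list of bytearray(ENTRY_SIZE)

    def create(self, name):
        entry = bytearray(ENTRY_SIZE)
        entry[:NAME_LEN] = _raw_name(name)
        # Reuse the first slot marked as deleted, if there is one.
        for i, old in enumerate(self.entries):
            if old[0] == DELETED:
                self.entries[i] = entry
                return
        # Otherwise the directory file grows by one entry.
        self.entries.append(entry)

    def delete(self, name):
        raw = _raw_name(name)
        for entry in self.entries:
            if bytes(entry[:NAME_LEN]) == raw:
                entry[0] = DELETED  # slot is marked, not removed
                return True
        return False

    def size_bytes(self):
        return len(self.entries) * ENTRY_SIZE

    def live_count(self):
        return sum(1 for e in self.entries if e[0] != DELETED)


d = Directory()
for i in range(1000):
    d.create(f"F{i:04d}")
peak = d.size_bytes()
for i in range(1000):
    d.delete(f"F{i:04d}")
print(d.live_count(), d.size_bytes() == peak)
```

After every file is deleted, the directory holds no live entries but is still as large as it was at its peak, which is the growth pattern the quote blames.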

    By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.

    According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.

    The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.

    Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.

    Source: DOS Glitch Nearly Killed Mars Rover

    BTW, there is another interview of Mike Deliman I read some time ago in PCWorld.

  • Re:Efficiency (Score:5, Informative)

    by Brett Buck ( 811747 ) on Saturday November 20, 2004 @03:55PM (#10876011)
    > "The operating system and kernel fit in less than 2
    > megabytes; the rest of the code, plus data space,
    > eventually exceeded 30 megabytes." This should be used as
    > the example for efficient coding

    You've GOT to be kidding, right? 2 meg of OS code? That's ULTRABLOAT compared to most spacecraft. In fact, for the vast majority of the space age, that would have exceeded the resources of the computer by several orders of magnitude.

    I've done this kind of programming for a living (for 10 years, then moved up to controls design), but the last system I programmed for has 372k of memory, total. That includes data, code, OS, everything. Runs at 432 KIPS. And it performs what is probably one of the most complex in-flight autonomous control operations ever.

    Most are even more restrictive. For example, 8K of PROM, 1k of volatile memory, and 28 WORDS of non-volatile memory. This is more than adequate for most applications, if you do it right.

    Many spacecraft OS's are more akin to this:

    hardware interrupt
    external electronics power up processor.
    external electronics set PC = 80hex
    {execute all the code}
    power down

    Once every 1/4 a second for 15 years.
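For a sense of scale, here is a minimal sketch of that wake-run-sleep pattern, plus the arithmetic for how many wake-ups it implies. The function names and the toy counter state are hypothetical, and the 365.25-day year is an assumption; nothing here is taken from real flight code.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def wake_cycles(years, period_s=0.25):
    """Number of wake-ups for a mission of the given length."""
    return int(years * SECONDS_PER_YEAR / period_s)


def run_cycle(state, read_sensors, compute, write_actuators):
    """One wake-up: run all the code once, top to bottom, then return.

    Returning models the power-down; `state` stands for whatever small
    amount of data survives between wake-ups.
    """
    inputs = read_sensors()
    state, outputs = compute(state, inputs)
    write_actuators(outputs)
    return state


# Hypothetical example: count wake-ups and echo the sensor value.
log = []
state = 0
for _ in range(4):  # one second's worth of cycles at 4 Hz
    state = run_cycle(
        state,
        read_sensors=lambda: 7,
        compute=lambda s, x: (s + 1, x),
        write_actuators=log.append,
    )

print(state, log, wake_cycles(15))
```

With no scheduler and no tasks, the only thing that can go wrong in a cycle is the straight-line code itself, which is much of the appeal.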

    The project I am currently working on uses VxWorks (and so we were quite interested in the Mars Rover problem) and it's so bloated with unnecessary features it's absurd. This is not a Windows box, it's a spacecraft processor.

    I can't argue with the 30 meg of data space. Using the memory as a data recorder would be quite useful and a good picture takes a lot of space. But it's alarming to me that you could figure out how to waste maybe 4-5 meg on code. If you started with a bare home-brew OS, I would guess (and I get paid for this sort of guess) that you could do the entire flight code in 512K, with maybe 8k of data space, excluding the science data.

    Only recently have space-qualified rad-hard processors with this kind of capability become available. Until then, if you said you needed 2 meg for the OS alone, you would have gotten fired on the spot and referred to mental health professionals. The availability of these processors enabled people to use high-level languages with tremendous overhead (like C++). And this was only done for employee retention purposes during the bubble. For years it was done at the assembler or even machine level. It's still not at all uncommon to do, and we've done MANY flight code patches, with only a processor handbook, an engineering paper pad, and by setting individual bits one-by-one.

  • by GileadGreene ( 539584 ) on Saturday November 20, 2004 @04:46PM (#10876324) Homepage
    Or perhaps because NASA doesn't own the code -WindRiver does.
  • Re:Out of curiousity (Score:1, Informative)

    by Anonymous Coward on Saturday November 20, 2004 @05:08PM (#10876469)

    why, in the 21st century, is it necessary to fit something like the Mars rover code in 2MB of memory? If something like a Gameboy Advance or a PDA can hold 64MB-a couple gigs, what is holding NASA back, with their gigantic budget and all?

    One thing: radiation. It is cheaper to take up simpler, purpose-designed and fabricated, bulkier chips that don't get upset when a particle hits them than it is to send up the latest and smallest chips, which are supersensitive to radiation but oh so fast, and add lead shielding that doubles only as dead weight.

  • Re:Out of curiousity (Score:5, Informative)

    by The Vulture ( 248871 ) on Saturday November 20, 2004 @05:20PM (#10876531) Homepage
    The problem is that technology moves too quickly for it to get "NASA certified". When you send something up in space where making changes to it will be difficult, you need something that is known to be robust and reliable, that has several years of testing.

    Last I read (maybe a year ago?), NASA still used 386 and 486 chips because they didn't generate a lot of heat (compared to today's machines) and could be made to withstand higher than normal forces (through extra padding on the device, I imagine). They were more resilient to the issues you might see in space than newer processors.

    Simply put, if they put the latest CPU with tons of RAM in there, and it fails, how are they going to fix it?

    -- Joe
  • by devphil ( 51341 ) on Saturday November 20, 2004 @05:42PM (#10876664) Homepage

    ...the memory inside the Gameboy Advance and whatnot isn't radiation-hardened.

    The grandparent poster needs to RTFA, and note what had to be done to protect circuits from Marvin the Martian's cosmic rays. The chips get physically bigger (sometimes a lot bigger), and that builds up quickly.

  • by ragnar ( 3268 ) on Saturday November 20, 2004 @05:46PM (#10876698) Homepage
    I agree about opening the source, but for entirely different reasons. It would be an ideal teaching aid in a real time CS course or for enthusiasts. Although it might be possible to contribute bug fixes, I wouldn't count on it. From what I've read and seen concerning the open source projects, they tend to gather contributors for features much more readily than for bug fixes, especially the variety that are very hard to reproduce or require formal proof along with the fix.
  • Re:Out of curiousity (Score:5, Informative)

    by GileadGreene ( 539584 ) on Saturday November 20, 2004 @06:21PM (#10876891) Homepage
    Shielding does not protect against single-event upsets (particle-induced bit flips), it only provides some mitigation against total ionizing dose (which causes long term cumulative degradation as a result of drift in transistor operating parameters). There are design techniques and fabrication processes that can reduce the likelihood that a circuit will suffer upsets, but it's still standard practice to provide either redundant memory, or error detection and correction coding. In the case of MER they had 3 physically separate PROMs carrying identical copies of the flight software, and the RAM was (IIRC) protected by an EDAC code implemented in a rad-hard FPGA.
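Two textbook illustrations of the redundancy ideas mentioned above, offered only as a sketch. The post does not say how the three PROM copies are combined or which EDAC code protects the RAM, so the bitwise majority vote and the Hamming(7,4) code below are assumptions chosen for clarity; real memory EDAC typically uses wider codes.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 vote across three copies of the same image."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def _syndrome(bits):
    """XOR of the 1-based positions of all set bits (bits[0] is unused)."""
    s = 0
    for pos in range(1, 8):
        if bits[pos]:
            s ^= pos
    return s


def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (list of 0/1, positions 1..7)."""
    bits = [0] * 8
    for pos, shift in zip((3, 5, 6, 7), (3, 2, 1, 0)):
        bits[pos] = (nibble >> shift) & 1
    s = _syndrome(bits)
    for p in (1, 2, 4):  # parity positions are set so the syndrome becomes 0
        bits[p] = 1 if s & p else 0
    return bits[1:]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (nibble, position_fixed_or_0)."""
    bits = [0] + list(codeword)
    s = _syndrome(bits)
    if s:
        bits[s] ^= 1  # a nonzero syndrome is the position of the bad bit
    nibble = 0
    for pos in (3, 5, 6, 7):
        nibble = (nibble << 1) | bits[pos]
    return nibble, s


good = bytes([0x12, 0x34, 0x56])
hit = bytes([0x12, 0x34 ^ 0x08, 0x56])   # one copy takes a bit flip
print(majority_vote(good, hit, good) == good)

word = hamming74_encode(0b1011)
word[4] ^= 1                             # flip position 5 (index 4)
print(hamming74_decode(word))
```

The vote masks any upset confined to one copy, while the Hamming code corrects a single flipped bit within a word. Neither helps if two copies, or two bits of one word, are hit in the same place.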
  • Re:Out of curiousity (Score:4, Informative)

    by arnasobr ( 134440 ) on Saturday November 20, 2004 @06:26PM (#10876928)
    Feature size. The smaller the feature (think gate level), the higher the chance it will be ruined by random radiation exposure. And that's the one-sentence summary of the "Radiation Effects on Microelectronics" class I took about 7 years ago.

    Smaller memory capacity for a given surface area implies larger feature size.

    By the way, the class I took was 1-on-1 with Prof. Stephen McGuire at Cornell. Extremely cool guy.
  • by sexylicious ( 679192 ) on Saturday November 20, 2004 @07:31PM (#10877245)
    Yeah, when I was doing RT stuff at my former employer we made a pretty unanimous decision to not even get close to WR's stuff. Not that it couldn't do the job. They had some funky licensing thing that interfered with how we wanted to use the code. We ended up looking at a Linux variant that had some tweaks to the tasking algorithm that fit perfectly with what we wanted. I think we ended up actually going in-house because one of the engineers we had programmed some code for an earlier project and we found out that the code did the EXACT thing we wanted. But I am pretty sure we would have gone Linux if we didn't get the in-house stuff. (The in-house code wasn't considered at first because the project was classified and all the source for it was locked away. It wasn't until that other engineer told us that we already had something in house that looked like it would work that we found out about it.)
  • Re:Marketing crap (Score:1, Informative)

    by Anonymous Coward on Sunday November 21, 2004 @04:56AM (#10879768)
    For me it's a case study on why you shouldn't use closed source software, you can't evaluate the quality of the code

    BS. How much of the Linux kernel have you read in detail for determining its quality? I agree that access to code allows you to FIX things faster or on your own, but you can't evaluate the quality of any large piece of commercial software by looking at it.
    Also, FWIW, VxWorks still kicks Linux's ass on context switch times. For a really responsive system (think line rate packet switching etc.) Linux is not even an option. Look at QNX, VxWorks etc.
  • Re:Huh? (Score:4, Informative)

    by wowbagger ( 69688 ) on Sunday November 21, 2004 @10:23AM (#10880514) Homepage Journal
    It's called:

    "WindRiver portrayed their tool as being able to do those things, thus I made the wrong decision based upon the false claims of the manufacturer."

    You see, WRS would have you believe that VxWorks has a reasonable disk subsystem, even though they have no option of using DMA for the data transfers, a fact they conveniently don't make available.

    WRS had a port of XFree available for VxWorks. However, they did not release the source for it, and they stopped supporting it, and thus it fell behind in support for the video chips now in use. Of course, they did not inform developers of their impending decision to drop support until it was too late.

    WRS has a TCP/IP stack. However, they did NOT have support for DHCP, nor DNS, and on certain platforms their stack has gross errors (e.g. packets being shifted by one byte so that when they reach the application they are corrupted).

    WRS claims to have board support packages so that you don't have to develop them. They don't mention that they don't support half the hardware on most boards (e.g. they don't enable the cache on XScale processors, halving the speed of the processor).

    WRS claimed they would support development under Linux as a host OS "within a couple of months" - that was back in 1998. They started supporting development under Linux this year - and then not very well.

    Yes, I chose the wrong tool for the job - because WRS did not correctly represent their tool's capabilities and there was no other way to evaluate the capabilities of the tool.
  • Re:hmm... (Score:1, Informative)

    by Anonymous Coward on Sunday November 21, 2004 @01:01PM (#10881239)
    Please, please, please read 'Linux Kernel Development' by Robert Love.
    As far as locks go in pthreads(?)... WTF?
    You can lock critical sections in pthreads without using constructs like semaphores, which are crappy anyway (read Stevens... again, or maybe for the first time), by using a little imagination.
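The poster does not say which primitive is meant. The usual non-semaphore choice in pthreads is a mutex (`pthread_mutex_lock` / `pthread_mutex_unlock`), so the sketch below assumes that and uses Python's `threading.Lock` as a stand-in. The `run_counter` helper is hypothetical.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads under one mutex."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Critical section: only one thread at a time may execute the
            # read-modify-write on the shared counter.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run_counter())
```

Because every increment happens while holding the lock, the final count equals `num_threads * increments` regardless of how the threads interleave.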
