A Diagnosis of Self-Healing Systems

Posted by michael on Tuesday December 21, 2004 @07:40PM from the heal-thyself dept.

gManZboy writes "We've been hearing about self-healing systems for a while, but (as is usual), so far it's more hype than reality. Well it looks like Mike Shapiro (from Sun's Solaris Kernel group) has been doing a little actual work in this direction. His prognosis is that there's a long way to go before we get fully self-healing systems. In this article he talks a little bit about what he's done, points out some alternative approaches to his own, as well as what's left to do."

Re:The challenge of a truly self-healing system (Score:5, Informative)

by grahamsz ( 150076 ) writes: on Tuesday December 21, 2004 @07:49PM (#11154052) Homepage Journal

Plenty of Sun's boxes have redundant power supplies.

If something goes wrong with one, the system should detect either too little or too much DC voltage or current coming from it, and switch to its backup.

Your suggestion doesn't make much sense. Should Mozilla know what to do if a USB mouse fails or is removed unexpectedly? Of course not; the Mozilla developers expect that this will be taken care of at a lower level.

Likewise when a correctable memory or disk error occurs... The memory controller or disk firmware should deal with it and the application should be none the wiser.
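
To make the redundant power supply case concrete, here is a minimal sketch of the detect-and-switch logic described above. The 12 V nominal rail, the 5% tolerance band, and the names `PowerSupply` and `select_supply` are all assumptions for illustration; real hardware does this in firmware with its own limits.

```python
from dataclasses import dataclass

# Assumed values for illustration only; real limits are hardware-specific.
NOMINAL_VOLTS = 12.0
TOLERANCE = 0.05  # accept readings within +/- 5% of nominal


@dataclass
class PowerSupply:
    name: str
    volts: float  # most recent DC voltage reading

    def in_range(self) -> bool:
        """True if the reading is neither too low nor too high."""
        low = NOMINAL_VOLTS * (1 - TOLERANCE)
        high = NOMINAL_VOLTS * (1 + TOLERANCE)
        return low <= self.volts <= high


def select_supply(active: PowerSupply, backup: PowerSupply) -> PowerSupply:
    """Return the supply that should carry the load.

    Stay on the active supply while it is healthy, otherwise switch to
    the backup. Nothing above this layer needs to know which one is used.
    """
    if active.in_range():
        return active
    if backup.in_range():
        return backup
    raise RuntimeError("no healthy power supply available")
```

The point of the sketch is the layering: the switch happens below the application, which is why Mozilla never has to care.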

Similar to IBM's Autonomic Computing (Score:2, Informative)

by bhadreshl ( 841411 ) writes: on Tuesday December 21, 2004 @08:12PM (#11154266)

Well, this seems like where computing services are heading, as IBM is doing extensive research on Self-Configuring, Self-Healing, Self-Optimizing, and Self-Protecting computing systems, which it calls 'Autonomic'.

Check out: Autonomic Computing [ibm.com]

Re:I'm confused (Score:3, Informative)

by segfaultcoredump ( 226031 ) writes: on Tuesday December 21, 2004 @10:46PM (#11155341)

Fault Tolerance implies the ability to not just detect the fault (e.g. a failed CPU), but to keep the processes running as if nothing happened. This is possible with Stratus and Tandem boxes. It is generally not possible with common x86/Power/SPARC boxes (unless you put a lot of software on top of two boxes to make them look like one big virtual system).

"Self Healing", in this context, is the system's ability to detect a fault (hardware or software), deal with it (restart a process, isolate hardware, etc.) and then get on with life (in a possibly degraded mode). In a way, the venerable Veritas Cluster Server is an example of a "self healing" system: it detects a failure of a service group and restarts it, on another node if needed.

Note that with "self healing" systems, the process may die, and end users may notice a failure. But the system is 'back online' sooner than if it required manual intervention. Compare this to a Fault Tolerant system that never went down in the first place.
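
The detect, restart, fail-over loop can be sketched in a few lines. This is not how any particular cluster product is implemented; `heal`, `is_healthy`, and `start` are hypothetical names, and the health check and start action are passed in as plain callables so the sketch stays self-contained.

```python
from typing import Callable, Optional, Sequence


def heal(
    service: str,
    nodes: Sequence[str],
    current: str,
    is_healthy: Callable[[str, str], bool],
    start: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the node the service runs on after one healing pass.

    is_healthy(service, node) and start(service, node) are hypothetical
    hooks supplied by the caller. Returns None if no node could start
    the service, which is the point where a human has to step in.
    """
    if is_healthy(service, current):
        return current  # nothing to heal

    # Try a restart in place first, then the other nodes in order.
    candidates = [current] + [n for n in nodes if n != current]
    for node in candidates:
        if start(service, node):
            return node
    return None
```

Between the failed health check and the successful `start`, the service is down, which is exactly the gap that separates self-healing from fault tolerance.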

IBM's been there, done that (Score:3, Informative)

by supersnail ( 106701 ) writes: on Wednesday December 22, 2004 @06:19AM (#11157091)

.... given away the T-shirts.

The current zSeries machines come with 16 CPUs and L2 & L1 cache packaged together on a board.
But only 12 CPUs are used.

Each "CPU" is actually two CPUs and a comparator. When the two come up with different answers, that CPU is shut down and processing is taken over by one of the four free CPUs on the board.

You will never know it happened until you run one of the maintenance utilities.

In the way of IBM, this technology will probably appear on top-end pSeries (AIX/Linux) and iSeries boxes in a couple of years.
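
For anyone who wants to see the paired-CPU-plus-comparator idea in code, here is a toy model. It is only a sketch: a "core" is modeled as a plain function, and `LockstepUnit`, `Board`, and `retired` are made-up names, not anything from IBM's hardware or documentation.

```python
from typing import Callable, List

Core = Callable[[int], int]  # toy stand-in for a physical core


class LockstepMismatch(Exception):
    """Raised when the two cores of a unit disagree."""


class LockstepUnit:
    """One logical CPU: two cores plus a comparator."""

    def __init__(self, core_a: Core, core_b: Core) -> None:
        self.core_a = core_a
        self.core_b = core_b

    def run(self, x: int) -> int:
        a, b = self.core_a(x), self.core_b(x)
        if a != b:  # the comparator
            raise LockstepMismatch()
        return a


class Board:
    """Active units plus a pool of spares held in reserve."""

    def __init__(self, active: List[LockstepUnit], spares: List[LockstepUnit]) -> None:
        self.active = active
        self.spares = spares
        self.retired = 0  # only visible if you go looking for it

    def execute(self, slot: int, x: int) -> int:
        while True:
            try:
                return self.active[slot].run(x)
            except LockstepMismatch:
                if not self.spares:
                    raise  # out of spares, the fault becomes visible
                # Shut down the faulty unit and retry on a spare.
                self.active[slot] = self.spares.pop()
                self.retired += 1
```

The caller of `execute` gets the right answer either way; the only trace of the failure is the `retired` counter, much like having to run a maintenance utility to find out.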
