Toyota Acceleration and Embedded System Bugs
An anonymous reader writes "David Cummings, a programmer who worked on the Mars Pathfinder project, has written an interesting editorial in the L.A. Times encouraging Toyota to drop claims of software infallibility in their recent acceleration problems. He argues that embedded systems developers must program more defensively, and that companies should stop relying on software for safety. Quoting: 'If Toyota has indeed tested its software as thoroughly as it says without finding any bugs, my response is simple: Keep trying. Find new ways to instrument the software, and come up with more creative tests. The odds are that there are still bugs in the code, which may or may not be related to unintended acceleration. Until these bugs are identified, how can you be certain they are not related to sudden acceleration?'"
Infallible fail. (Score:5, Insightful)
Re:Impossible to test (Score:5, Insightful)
Right, just hype. Except for those families who were killed by the Toyota acceleration problems.
Re:Impossible to test (Score:5, Insightful)
Re:Impossible to test (Score:1, Insightful)
Most software is nearly -impossible- to test under flawless conditions. Especially embedded systems with small amounts of CPU power and memory.
Plus, all this hype around these Toyota acceleration problems is just that, hype.
It's not just software. This is an embedded system, and the software is highly dependent on the hardware involved. When I say hardware, I mean the entire system of boards, wires, and connectors in the vehicle. Embedded systems have a tight coupling of software and hardware, unlike your average computer application. A slight error in the hardware could also send the software into some unknown behavior if it was not tested correctly and fully. It's a fairly difficult feat to ensure complete software/hardware compliance in an embedded system.
Testing. (Score:5, Insightful)
Of course, there is an alternative theory... (Score:1, Insightful)
Driver error, much like the Audi "unexplained acceleration" problem.
When Audi came up with a new automatic transmission interlock that forced drivers to put their foot on the brake before they could shift out of park, the problem went away.
The US government and its friends in the not-so-big three are using this as an excuse to pick on Toyota. No other car recall has had anything close to this level of government intervention.
Of course, my car has a fail-safe mechanism to disconnect the engine from the wheels - it's called a clutch. You should get one on your next car.
Re:Another interesting statistic (Score:4, Insightful)
Be careful to note that the 24 cases discussed there are only the ones that have led to serious incidents.
Logic of Testing (Score:5, Insightful)
David Cummings does seem to know what he's talking about, but as it is written, there is some strange logic in the article.
Testing cannot prove the absence of bugs, only their presence. There are two things that do not follow from this:
It sounds to me as if Toyota is saying the former, while Cummings says the latter. Neither is a correct conclusion.
Re:Logic of Testing (Score:5, Insightful)
Given practical software engineering conditions though, a) is highly unlikely while b) is highly likely.
Re:Logic of Testing (Score:4, Insightful)
I'll grant you that, but what I don't understand is this:
If you test, and do find some bugs, does that allow you to put any more trust in your software than if you tested and didn't find any?
Re:Impossible to test (Score:1, Insightful)
Right. Driver error. You mean it wasn't the floormats (first excuse), brake pedals (second excuse) and now something else? Why are you apologizing for Toyota? They have killed a number of people now, including children. Unacceptable. And yes, other manufacturers have issues too.
Re:Maybe it's not a bug, maybe it's an Easter Egg. (Score:2, Insightful)
The type of people that purposely hide bugs that will likely kill several people are the same type of people you can't really "appease" no matter what you do.
Boeing versus Airbus (Score:5, Insightful)
I'm loving this conversation because I've been crucified on Slashdot before for making comments similar to this whole thread. I grew up in a family of top managers of Boeing systems engineers. They hated computers. My dad never even learned how to turn one on. He hired other monkeys to use the computers. As a child I was regaled with wonderful stories of every hard lesson in safety my dad had learned over his lifetime. He loved World War II because they got to use cutting-edge designs for balls-out performance, yet at the same time learned how to make things reliable by dissecting the accidents. He would tell me about the accident that taught them that the engine pumps need to be at full speed but flow stalled on takeoff, so that there's no lag when you hot swap after a pump fails. He told me of the accident where they learned not to route 100% of the control system wiring through any one junction box. Etc...
Probably because of all these hard-won lessons, Boeing for years insisted on fully mechanical or hydraulic flight surface controls, whereas Airbus and others jumped on the fly-by-wire concept early. My dad would spit after hearing some young person tout all the advantages of fly-by-wire. He knew them perfectly well. He was big on accepting new innovations to reduce fuel costs and increase performance. He was not a Luddite. But he had a safety background that told him these electronic systems were hard as hell to validate and hard as hell to make truly independent from each other.
For example, they often used triple-redundant computers, and if one of them disagreed the other two would vote it off the island and stop listening to it. From what I've read, it's now suspected that the latest Airbus crash in the Pacific had one of its root problems in the voting nexus, where a superior computer overruled a more primitive safety system.
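To make the "vote it off the island" idea concrete, here is a minimal sketch of a two-out-of-three voter. It is only an illustration under assumptions: the function name `vote_2oo3`, the tolerance parameter, and the choice to average the agreeing channels are all hypothetical, and nothing here is taken from any real avionics system. Note that the voter itself is exactly the kind of shared "nexus" the comment above worries about.

```python
from statistics import median
from typing import List, Optional, Tuple


def vote_2oo3(readings: List[float], tolerance: float) -> Tuple[Optional[float], List[int]]:
    """Two-out-of-three vote over redundant channel readings (illustrative only).

    Returns (voted_value, outvoted_channel_indexes). voted_value is None when
    no two channels agree, so the caller can fall back to a fail-safe state.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three channel readings")

    # With three values, any agreeing pair must include the median,
    # so comparing every channel against the median is sufficient.
    mid = median(readings)
    agreeing = [i for i, r in enumerate(readings) if abs(r - mid) <= tolerance]
    outvoted = [i for i in range(3) if i not in agreeing]

    if len(agreeing) < 2:
        return None, outvoted  # no majority: do not trust any channel

    value = sum(readings[i] for i in agreeing) / len(agreeing)
    return value, outvoted


if __name__ == "__main__":
    print(vote_2oo3([10.0, 10.2, 55.0], tolerance=0.5))  # channel 2 is outvoted
    print(vote_2oo3([1.0, 5.0, 9.0], tolerance=0.5))     # no majority
```

The design choice worth noticing is the `None` result: when all three channels disagree, the sketch refuses to pick a winner instead of guessing.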
While we all know that computer software validation is hard if not impossible, it's not something we readily admit here on Slashdot. That's because for years people like my dad would throttle the innovations the computer engineers wanted to implement. I think as a result there grew up this culture of computer engineers that, to offset that, presented the case that embedded computing could be made safer than it really could be.
So now we come full circle and have to admit there is this middle ground. Just because a computer can improve performance does not mean it's reliable and safe. The old guys had a point after all when it came to safety.
Next week I'll tell you about how the ancient shocking lesson of the British Comet aluminum aircraft wings falling off led to the unanticipated discovery of metal fatigue and probably was the reason Boeing was slow to move to composite materials in commercial aircraft (but not in military aircraft). In hindsight we have heard many tales of the composite tails of planes falling off as the reason for the loss of control before a crash. Conversely, composite wings on UAVs allow them to absorb a lot of bullet holes with no loss of control and to operate under higher performance conditions.
The point is that safety and performance are trade-offs when both are pushed to the limit. The old guys know a lot more about safety than you might expect. The young guys are all about performance.
God damn legal system (Score:2, Insightful)
prove a negative? (Score:3, Insightful)
Isn't this like proving God doesn't exist?
They can test and test and not get a result that said this is the bug, so they assume that it doesn't exist.
Re:Impossible to test (Score:5, Insightful)
"Unacceptable" is strong. Sad, yes, but this is real life. There is no such thing as zero risk. Taking the attitude that it is somehow achievable despite the utter impossibility is something that makes for a good trial lawyer but a terrible human.
Re:Impossible to test (Score:2, Insightful)
Programmers today just give up too easily, and companies cut costs by skimping on testing, documentation and verification.
I'll start with the cheap joke - apparently you didn't learn the lesson too well.
And then the actual point - this is the real world, sir. Academics may have the luxury of taking as much time as they like to never produce a result. That's not how things actually work for the rest of us.
Re:I agree on non-software fail-safes (Score:5, Insightful)
Re:Impossible to test (Score:1, Insightful)
What Toyota is doing is unacceptable. They have had this issue since 1986. 1986! In 2003 this problem surfaced in the Sienna, but they hid it for 5 years. They now are blaming the aftermarket floormats, the drivers, the pedals, whatever. UNACCEPTABLE. And then ridiculous people are defending them.
Get this, people: we are in an economic war with Asia, which we are currently losing. It affects you and your future in a very real way. Supporting/defending crappy foreign companies doesn't do you any good, so stop it.
Re:Impossible to test (Score:5, Insightful)
Since when did that ever prevent anyone from doing anything? You must have us confused with some society that generally considers the full implications and long-term repercussions of our decisions...
Re:Impossible to test (Score:2, Insightful)
Toyota's reaction may well be unacceptable to you, but I find it highly unlikely that they were aware of the cause of the problem and left it alone anyway. Even assuming your economic war theory is true, there are no economic gains to be had by killing your customers.
In any case, this isn't me defending Toyota. Don't argue like a teenage girl.
Re:Another interesting statistic (Score:5, Insightful)
That part is strange. Uncontrolled acceleration is a much greater risk to life and limb than the red-lined/blown engine you might get if it were put into neutral with the throttle wide open. Being "afraid to try neutral" makes no sense.
Just an irrelevant side note: I've always found it low-class and tacky that phone calls made to 911 become publicly available, especially when you hear them on the news. The message is, "hey sir or madam, remember that moment when you were highly emotional and had no idea if you were going to live or die? Well, we've got great news! That highly personal moment of reflection on your own mortality is now a public spectacle for millions of people! It's okay, we make a profit from this! No we won't share that profit with you..."
I realize it's a government service funded by taxpayer dollars. That explains how this is possible. It fails to explain how this is the best or most honorable thing to do.
That part generally shouldn't be a surprise. I'd imagine it also helps if you can keep calm and avoid panicking, as panicky people often fail at things they could do easily if they were not in a state of deer-in-headlights fear.
Nor does it explain why older drivers were disproportionately affected. Possibly the Toyota brand is more popular among older drivers because it historically has retained a decent resale value. While nothing the driver does should ever cause this kind of uncontrollable automatic acceleration, perhaps older drivers tend to have habits that somehow manifest whatever the actual underlying problem is. There are a lot of coincidences and correlations being pointed out in this discussion but unfortunately there seems to be little certainty about whether they are more than that.
Re:Toyota: (Score:3, Insightful)
There will always be another bug.
And even more important - the bug may be a combination of software and hardware. Just ask what may happen if the code suddenly jumps to the wrong address. Do they use ECC memories in the electronics? What about a voltage spike? Driver has wrong socks/pants causing a spark that jumps to the OBD-II connector and messes up the CAN bus?
If anything can go wrong - it will. Think outside the box about how bad it can be, then multiply by pi to get a value closer to reality.
And more examples of how wrong things can get can be found here: http://thedailywtf.com/
Re:Impossible to test (Score:5, Insightful)
Your opinion of its likelihood is not relevant. Not only is it likely, evidence points to it being true. You are being disingenuous by phrasing it "no economic gains to be had by killing your customers." A product has a flaw, people die, that happens sometimes. If you issue a recall, you draw attention to the problem and cost yourself money in lost sales, repair costs, and possible lawsuits. "Killing your customers" is a bit different from "hoping that driver error is the official cause, not faulty cars," and you deciding to phrase it that way is an appeal to emotion, not a logical argument.
You can say we're just arguing semantics, but you're going to have to back up your unlikely opinion with links to convince me.
Re:Impossible to test (Score:2, Insightful)
Re:Boeing versus Airbus (Score:5, Insightful)
I'm pretty sure that the tail of an airplane falling off is an unanticipated scenario that humans cannot deal with either ;)
Re:Boeing versus Airbus (Score:5, Insightful)
Re:Impossible to test (Score:3, Insightful)
The articles you linked indicate that Toyota has attempted a number of different fixes for what it believed were several separate acceleration related issues. I don't have to find links to make my case, you made it for me.
Re:Impossible to test (Score:5, Insightful)
There is a component of moral empowerment, though, that you have to consider. Most people are more willing to accept risk if they control the situation, even if the risk is greater. Other people are more accepting of an inherent justice in the results when something bad happens to someone else who they feel was in control of the situation than when they were not.
Consider that, on a per person per mile of travel basis, a drunk "walker" is more likely to cause a traffic-related fatality than a drunk driver. They do things like stumble off sidewalks into traffic, misjudge the rate of oncoming traffic and run out into busy highways, sit and take rests on unlit rural roads, and more. Still, we vilify the drunk driver because when they cause a traffic fatality chances are they are not the individual contributing to the statistic, whereas with the walkers they are usually the one killed.
If we were really minimizing risk we would be more condemning of drunk walking than driving, because someone is more likely to die. We don't operate that way though; we don't think that way. Many people would take a friend's keys; few would forcibly restrain them if they could not be convinced to stay a little longer and sober up, even though that friend would be safer behind the wheel.
The same thing applies here: most of us would be more willing to accept that our loved one died because they were not able to control a set of mechanical and hydraulic linkages correctly and quickly enough to avoid an auto accident than we are when a software system fails to do the same, even though the latter was far less likely.
I am not saying that makes sense in moral terms, statistical terms, or anything. In fact the more objectively you look at it the less sense it makes to not use drive by wire and computerized systems but "we" still don't "feel" that way about it.
Re:Boeing versus Airbus (Score:3, Insightful)
Justifying something based on its performance on outlier conditions has never been a good idea. The number of accidents where the tail falls off is probably greater than the number of accidents where the tail falls off and the software would be able to compensate.
Maybe having the "oh shit" button turn this on would be a good idea, but I think if you look at the number of crashes and their causes, you'd want to build redundancy in the rudder or strengthen the tail.
Re:From the days of "winmodems" (Score:4, Insightful)
On the flip side, the USN replaced complicated and heavy hardware analog computing systems for [SSBN] missile guidance systems with software running on a digital computer, and MTBF went through the roof and maintenance man hours and MTTR through the floor. The same thing happened when they replaced the analog torpedo fire controls with digital ones. The same thing happened again when the hovering system controls were upgraded to digital.
Now, before you claim that is a limited set of examples, I invite you to consider the millions of incident-free flight hours accumulated by fly-by-wire aircraft. Or the replacement of DIP switches in PCs with software configuration. Etc... Etc...
Re:Impossible to test (Score:3, Insightful)
The "PR shitstorm" is way, way over-hyped; it would be simple for the news to state "Toyota has confirmed an issue affecting the engine speed controls, and has issued a recall. If this happens to you while driving, Toyota advises drivers to shift the car into neutral, engage the 4-ways, and pull over in a safe location. If your car has a push-button start, be aware that you will need to hold it down for up to 5 seconds to shut down the engine." The fact that some people have died as a result of poor driving ability is no different from every fall here when it snows and some dumb person forgets that snow is slippery: a terrible thing to have happen, but usually 100% their fault.
You don't know much about the mainstream media in the US, do you?
Re:Toyota: (Score:5, Insightful)
Other questions would be "What kind of transducer is measuring the input?"; "How many transducers are there?" and "What output do you get in the case of a failure?"
Note that there are applications where an unknown throttle setting resulting in full power being applied is the right thing to do. Maybe Toyota thought they were building a light aircraft rather than a car...
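The questions above (how many transducers, and what output you get on failure) can be sketched as a plausibility check over two redundant pedal sensors. This is a hypothetical illustration, not Toyota's design: the name `throttle_request`, the normalized 0.0 to 1.0 range, the 0.05 disagreement limit, and the default fail-safe of idle are all assumptions. The fail-safe value is a parameter precisely because, as noted, the right answer differs between a car and a light aircraft.

```python
from typing import Optional


def throttle_request(
    sensor_a: Optional[float],
    sensor_b: Optional[float],
    max_disagreement: float = 0.05,  # assumed limit, not a real calibration
    fail_safe: float = 0.0,          # idle for a car; an aircraft might choose otherwise
) -> float:
    """Return a throttle demand in [0.0, 1.0] from two redundant pedal sensors."""
    for value in (sensor_a, sensor_b):
        # A missing, NaN, or out-of-range reading is treated as a sensor failure.
        # (NaN fails the range comparison, so it lands here too.)
        if value is None or not (0.0 <= value <= 1.0):
            return fail_safe

    # Two healthy-looking sensors that disagree cannot both be right.
    if abs(sensor_a - sensor_b) > max_disagreement:
        return fail_safe

    # When they agree, take the smaller demand as the conservative choice.
    return min(sensor_a, sensor_b)


if __name__ == "__main__":
    print(throttle_request(0.40, 0.42))  # sensors agree
    print(throttle_request(0.40, 0.90))  # sensors disagree, falls back to idle
    print(throttle_request(None, 0.30))  # one sensor missing, falls back to idle
```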
Re:Boeing versus Airbus (Score:3, Insightful)
Amen. Fly by wire can never be perfectly safe - no matter how well a system is designed it can still fail. As long as it's safer than the mechanical systems we're still ahead though.
Open Source the damn code -- so that I can test it (Score:2, Insightful)
I'm a former professional "software tester from hell" (currently unemployed) -- and I drive a 2007 Camry Hybrid.
Officially, my car does not have a problem -- other than the floor mats -- which are now in my trunk and were never a real problem anyways.
Months before all of this publicity, I complained to my dealer about what seemed like a "sticky throttle" during routine maintenance.
The engine continues to run fast for 3-5 seconds after letting up on the gas.
The dealer actually charged me for the extra inspection, but did not find a problem.
So obviously -- I have some concerns.
I doubt that it is a software bug. It didn't start happening until the car was 2 years old.
But who knows ?
Some of those embedded computers have probably been running since the day the Hybrid battery pack was connected in the factory.
Any software running that long could have become unstable.
But without source code, I am powerless to do anything about it.
I have to rely on the word of Toyota's software QA people -- even though I know the current state of the art of software testing is a JOKE !
If Toyota open sourced the code -- I'd have a lot more confidence -- that with a lot more eyes on it -- the software really was OK.
(and if they offered a reward for finding a problem -- I'd be even more confident)
Now for a quick rant as to WHY the current state of software testing is a joke, and why I have little confidence in ANY corporate software QA.
I write this as a former CSTE -- the QAI's "Certified Software Test (Engineer/Expert)".
I also should say that I love software testing because it is the one part of software development where creativity and intuition still play a significant role.
And -- it is one area where techniques and standards are still being developed at a significant pace.
Most software development today is 98% boilerplate and copies of stuff somebody else did.
Engineers translate functional specifications into code based on established design patterns.
There are some basic calculations to ensure good response times and scalability.
Software testers typically create test plans from the same set of functional specs that the engineers use.
They simply validate that everything that is supposed to happen, happens.
Then they might run some performance tests -- but only if management budgeted for a test environment suitable for performance testing.
Then they stop.
Inevitably -- bugs appear in areas that no one ever expected.
Those are fixed later -- and regression tests are added to the test plan.
But -- almost NO ONE EVER LOOKS FOR THOSE "UNEXPECTED" BUGS -- before software is put into production.
Why ?
Because engineers hate the unexpected and don't typically know how to deal with it.
Micro-managed companies following strict Six Sigma processes (like Toyota) don't know how to create a time and resource budget for a "hunt for the unexpected".
The QAI (Quality Assurance Institute) doesn't help either.
They are run by a bunch of engineers obsessed with a desire to precisely measure and quantify every aspect of software testing.
Their techniques are useful, and largely valid -- but if they don't know HOW to quantify something -- they IGNORE it.
Just ask any CSTE -- "How do you test for race conditions" ?
There is no established technique for this, so the QAI simply IGNORES the issue.
There is no mention of race conditions in the CSTE's CBOK (Certified Body of Knowledge).
I used to work for one of the world's top software QA "gurus" and I once asked him how we test for race conditions -- the answer was --
"we don't, because we don't know how to do it".
Despite this -- intermittent race condition bugs account for a huge portion of real-world bugs !
As programmers make more use of multi-core CPUs and GPUs -- race condition bugs are getting to be more and more common.
And yet -- testing for race conditions and testing for "the unexpected" IS actually possible -- it just
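The comment above is cut off, so what the poster had in mind is unknown. One approach that does exist for small pieces of logic is to stop relying on luck and enumerate every possible interleaving of the steps. The sketch below is a toy under stated assumptions: two hypothetical threads, "A" and "B", each doing an unprotected read followed by a write of `shared + 1`. It models the schedules in plain Python instead of using real threads, so the result is deterministic.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Schedule = Tuple[str, ...]


def interleavings(steps_a: int, steps_b: int) -> Iterator[Schedule]:
    """Yield every schedule of thread ids that keeps each thread's own step order."""
    total = steps_a + steps_b
    for positions in combinations(range(total), steps_a):
        chosen = set(positions)
        yield tuple("A" if i in chosen else "B" for i in range(total))


def run_schedule(schedule: Schedule) -> int:
    """Simulate two unsynchronized increments (read, then write) under one schedule."""
    shared = 0
    local = {"A": 0, "B": 0}  # each thread's private copy of the value it read
    step = {"A": 0, "B": 0}   # 0 = about to read, 1 = about to write
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = shared       # read the shared counter
        else:
            shared = local[tid] + 1   # write back the stale copy plus one
        step[tid] += 1
    return shared


def find_lost_updates() -> List[Schedule]:
    """Return every schedule where the final count is not the expected 2."""
    return [s for s in interleavings(2, 2) if run_schedule(s) != 2]


if __name__ == "__main__":
    bad = find_lost_updates()
    print(f"{len(bad)} of {len(list(interleavings(2, 2)))} schedules lose an update")
    for schedule in bad:
        print(" ".join(schedule))
```

Exhaustive enumeration only scales to tiny examples, since the number of schedules grows combinatorially, but it shows that an intermittent bug can be turned into a repeatable one.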
Re:Speaking as an embedded programmer here... (Score:1, Insightful)
Also speaking as an embedded programmer, with 18+ years work on avionics, industrial robotics, pneudraulic-control systems, and the like, the SAFE assumption is to point at software. Your points about flaky/glitchy hardware conditions are well put--they are unfortunately far too common in far too many systems (which is another rant for another time, about piss poor design)--however it is the JOB OF THE SOFTWARE to sanely and 'correctly' control those systems. That means they _must_ be ARCHITECTED AND WRITTEN to handle all of those conditions, even the "impossible" corner-cases. Period. No exceptions, no excuses.
Please note: I'm making a distinction between the software programmers who made TECHNICAL DESIGN DECISIONS, and the person(s) who made POLICY DESIGN DECISIONS. From my general perusal of the news reports and statements made by Toyota in the aftermath of all of this, there are _clearly_ multiple problems with their POLICY DESIGN. If someone hits the brakes, it should override the accelerator functions. Period--no exceptions. Audi has always done this by default, Toyota is "implementing this as an additional safety feature"... Funny how they didn't "implement" it prior to this. But there also are likely multiple implementation issues with the TECHNICAL DESIGN of the software as well. 30 Million lines of code? Yeah, there's going to be some garbage in there... I'll lay $50 on that.
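The brake-override policy described above is simple enough to sketch in a few lines. This is a hypothetical illustration only: the function name `commanded_throttle` and the normalized 0.0 to 1.0 pedal range are assumptions, and a real implementation would sit on top of validated sensor inputs.

```python
def commanded_throttle(accel_pedal: float, brake_pressed: bool) -> float:
    """Return the throttle demand in [0.0, 1.0], with the brake always winning."""
    # Policy decision: any brake application overrides the accelerator.
    if brake_pressed:
        return 0.0
    # Otherwise clamp the pedal reading into the valid range.
    return max(0.0, min(1.0, accel_pedal))


if __name__ == "__main__":
    print(commanded_throttle(0.8, brake_pressed=True))   # brake wins
    print(commanded_throttle(0.8, brake_pressed=False))  # pedal passes through
```

The check comes first and has no exceptions, which mirrors the distinction the comment draws: the policy is trivial to state, and the hard part is deciding to have it.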
My point? That despite any hardware failures, hardware design flaws, and the like, software development is the last bastion of control. (Should it be? Maybe not--hence the decades-long argument for avoiding fly-by-wire.) To maintain that control requires good planning, good execution, and good testing through the entire software process. Toyota clearly failed at one of these, and likely in others. (I'm not even going to touch on the pending lawsuit concerning internal documentation swiped by their ex-lawyer, where it is alleged they made false statements, hid evidence, etc.) For Toyota to make absurdist statements in the press, and then act surprised, is B.S. They deserve to go through the flames on this.
Re:Boeing versus Airbus (Score:3, Insightful)
I agree with that wholeheartedly. Statements like "question all assumptions" are about as helpful as "foresee everything and never make mistakes."
I question that. It's just that the most complex tasks require software because they are beyond what can be done with mechanical systems alone. For an example of a complex mechanical and software system, look at the Space Shuttle. What brought it down, twice? Mechanical problems, not software glitches.
Re:Boeing versus Airbus (Score:4, Insightful)
One can be short without being wrong. You were both short *and* wrong.
No, actually they discovered (as is widely documented in aviation histories) that they failed to correctly account for the stresses caused by multiple pressurization and depressurization cycles. They knew perfectly well how to design for metal fatigue, but lacked information on how that fatigue would manifest itself.
I merely pointed out how you have the story wrong, not that your point was false.
Re:Impossible to test (Score:3, Insightful)
Actually, these days the engine won't blow if you suddenly shift into neutral due to rev limits in the engine controller, but that just makes any refusal to shift into neutral at speed even less excusable.
Defensive programming would include making a shift into neutral at speed override the accelerator position much like Toyota is doing (after the fact) with the brakes. Of course if the part of the software dealing with driver input is wedged, that won't help.
The other issue is the start button. Apparently you have to press and hold it for 3 seconds. We all know that now, but apparently that info is buried in the owner's manual. It also defies psychology. In a panic situation where time is badly distorted for the driver, they are to patiently press and hold the button. The natural behavior is to press rapid fire.
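One way to read the point about panic behavior is that the button logic could accept rapid-fire presses as well as a long hold. The sketch below is a hypothetical design, not how any Toyota works: the name `shutdown_requested`, the three-press count, and the two-second window are all assumptions chosen for illustration. Only the 3-second hold comes from the comment above.

```python
from typing import Iterable, List, Tuple


def shutdown_requested(
    presses: Iterable[Tuple[float, float]],
    hold_s: float = 3.0,          # long-hold threshold mentioned in the thread
    rapid_count: int = 3,         # assumed number of quick presses
    rapid_window_s: float = 2.0,  # assumed window for those presses
) -> bool:
    """Decide whether a sequence of button presses means "shut the engine down".

    presses holds (press_time, release_time) pairs in seconds, in chronological order.
    """
    starts: List[float] = []
    for down, up in presses:
        # Case 1: the documented behavior, one long press-and-hold.
        if up - down >= hold_s:
            return True
        # Case 2: the panic behavior, several presses in quick succession.
        starts.append(down)
        recent = [t for t in starts if down - t <= rapid_window_s]
        if len(recent) >= rapid_count:
            return True
    return False


if __name__ == "__main__":
    print(shutdown_requested([(0.0, 3.2)]))                          # long hold
    print(shutdown_requested([(0.0, 0.1), (0.4, 0.5), (0.8, 0.9)]))  # rapid fire
    print(shutdown_requested([(0.0, 0.1)]))                          # single tap
```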
I have had the throttle stick open on an old car once. Not wide open, but I released the accelerator and it had no effect. I turned the car off and pulled over. The ignition switch (being a mechanical device) behaved the same way it always does when turned, so there was no confusion. There's a lot of good in a positive action user interface. I was easily able to unstick the throttle cable and continue on my way.
Re:Other lessons from Boeing (Score:2, Insightful)
Re:Impossible to test (Score:4, Insightful)
Judging by how the power button appears to work in the Prius, I would guess all that it does is tell the computer to shut down the engine. It's not like a typical car where turning the key to Acc or off would cut the power to several critical systems. So if the computer is messed up to the point where the gear shift selector is not working, I wouldn't count on the power button to help you either.