- RISKS-LIST: RISKS-FORUM Digest Wednesday 5 April 1989 Volume 8 : Issue 49
-
- FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
- ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator
-
- Contents:
- An unusual "common mode failure" in B-1B aircraft (PGN)
- Gripen crash caused by flight control software (Mitchell Charity, Mike Nutley)
- Airbus A320 article plus some comments (Nancy Leveson) [long]
-
- ----------------------------------------------------------------------
-
- Date: Wed, 5 Apr 1989 10:44:35 PDT
- From: Peter Neumann <neumann@csl.sri.com>
- Subject: An unusual "common mode failure" in B-1B aircraft
-
- A rather bizarre common mode failure has been detected in the recent inspection
- of grounded B-1B bombers: there was a shortage of lubricant in a critical
- gearbox in 70 of the 80 planes inspected (with 17 more still to go). The
- problem was found on the plane whose wing swept into the fuel tank
- (RISKS-8.46), which resulted in two fractured shafts and a leak along a
- fuel-tank seam. [San Francisco Chronicle, 5 April 1989, p. A7]
-
- ------------------------------
-
- Date: Wed, 5 Apr 89 01:15:31 EDT
- From: mcharity@ATHENA.MIT.EDU
- Subject: Gripen crash caused by flight control software
-
- (quotes&inserts from FLIGHT INTERNATIONAL, 25 March 1989)
-
- On Feb 2 the 1st prototype (of 5) of Sweden's Saab JAS39 Gripen fighter crashed
- while landing at the end of its 6th test flight. It hit the runway, broke its
- left main gear, bounced, skidded and flipped over.
-
- ``Gripen is naturally unstable and has a triplex digital fly-by-wire system
- with a triplex analogue backup.''
-
- Initial flight was ``some 18 months behind schedule'' and this was ``attributed
- to difficulties in proving the software for the flight control system.''
-
- After the 1st flight, the test pilot ``remarked that the control system seemed
- too sensitive and that the control laws would probably need to be changed.''
- On all flights ``the aircraft experienced problems with lateral oscillations.''
- [On the] ``last flight oscillation in pitch was also apparent.''
-
- The accident investigation committee chairman ``confirms earlier assumptions
- that the flight control system was at fault.''
-
- Chairman:
- ``The accident was caused by the aircraft experiencing increasing pitch
- oscillations (divergent dynamic instability) in the final stage of landing,
- the oscillations becoming uncontrollable. This was because movement
- of the stick in the pitch axis exceeded the values predicted when
- designing the flight control system, whereby the stability margins were
- exceeded at the critical frequency.''
-
- Separate investigation by the JAS Industry Group:
- ``The control laws implemented in the flight-control system's computer had
- deficiencies with respect to controlling the pitch axis at low speed.
- In this case, the pilot's control commands were subjected to such a delay
- that he was out of phase with the aircraft's motion.''
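The JAS Industry Group's explanation, a delay that put the pilot's commands out of phase with the aircraft, can be illustrated with a toy discrete-time loop. This is a minimal sketch, not a model of the Gripen: the gain, the step count and the function names `simulate_pitch_error` and `stability_limit` are assumptions for illustration. Each step the hypothetical pilot removes a fraction `gain` of the pitch error he sees, but what he sees is `delay_steps` samples old. For this difference equation the loop decays only while 0 < gain < 2·sin(π/(2(2d+1))), so every extra step of delay shrinks the usable gain, much as a pure time delay of T seconds adds a phase lag of ωT radians and erodes stability margins at some critical frequency.

```python
import math


def simulate_pitch_error(gain: float, delay_steps: int, steps: int = 100) -> list[float]:
    """Toy pilot-in-the-loop model (hypothetical; not based on Gripen data).

    Each step the 'pilot' removes gain * (the error he sees), but the error
    he sees is delay_steps samples old: e[n+1] = e[n] - gain * e[n-d].
    """
    # Pad the history with zeros so the delayed lookup is defined, then
    # start from a unit pitch disturbance.
    history = [0.0] * delay_steps + [1.0]
    for _ in range(steps):
        seen = history[-1 - delay_steps]           # stale error the pilot reacts to
        history.append(history[-1] - gain * seen)  # error after his correction
    return history[delay_steps:]                   # drop the zero padding


def stability_limit(delay_steps: int) -> float:
    """Gain above which e[n+1] = e[n] - gain * e[n-d] stops decaying."""
    return 2.0 * math.sin(math.pi / (2 * (2 * delay_steps + 1)))


prompt = simulate_pitch_error(0.6, delay_steps=0)
late = simulate_pitch_error(0.6, delay_steps=3)
print(max(abs(e) for e in prompt[-14:]))  # tiny: the disturbance has died out
print(max(abs(e) for e in late[-14:]))    # large: divergent oscillation
```

With the same gain of 0.6, the prompt pilot is well inside `stability_limit(0)` (2.0), while the delayed one exceeds `stability_limit(3)` (about 0.445), so his corrections pump energy into the oscillation instead of removing it.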
-
- ``the company hopes to fly JAS39-2 before the end of the year.''
- ``Delivery of the first production aircraft [...] is now expected
- in [1993, although typo said `1933'], instead of 1992.''
-
- ------------------------------
-
- Subject: Swedish Gripen Fighter Crash
- Date: Wed, 5 Apr 89 17:09:44 BST
- From: jpff@maths.bath.ac.uk
- Sender: jpff@maths.bath.ac.uk
-
- From Datalink, April 3 1989 (a British systems/software newspaper)
- quoted in full without permission.
-
- Swedish wind cuts fly-by-wires
-
- Flight-control software has been blamed for the crash of the prototype
- Swedish Gripen fighter last February. The preliminary report from the
- Swedish government's crash-investigation commission identified the
- software's inability to cope with gusting winds and the oversensitivity of
- the control system as the prime reasons for the accident.
-
- According to a spokesman for the commission, problems with the £3.2
- billion project first arose in an earlier flight test. "The preceding test
- flight had shown up problems, but it's not a problem with the aircraft or
- with the flight control systems. It's a software problem.
-
- "The whole of the control system was too sensitive for the pilot; it
- operated too fast. It was too easy for the pilot to go outside the
- flight-control envelope into unstable flight."
-
- In common with many fighters currently being developed, the JAS39
- Gripen is designed to be inherently unstable to increase its
- manoeuvrability. It relies on the software to keep it under control.
-
- "There were limitations on the flight control systems, but during the
- landing phase the wind was stronger than allowed for by these
- limitations. The pilot had to try to overcome them."
-
- A final report into the crash is due in May, but work has already
- started on the second prototype aircraft, including a modified version
- of the flight-control software.
- Mike Nutley
- ------------------------------
-
- Date: Wed, 05 Apr 89 13:54:29 -0400
- From: leveson@electron.LCS.MIT.EDU
- Subject: Airbus A320 article plus some comments
-
- Here is the full Washington Post article, interspersed with a few of my
- comments.
-
- WASHINGTON POST: OUTLOOK, 04/02/89
- Copyright (c) 1989 The Washington Post Co.
- By Jim Beatson
-
- [Jim Beatson writes on aviation issues for the Guardian and some other
- British newspapers. He is currently living in Canada. NGL]
- [Apparently either Beatson or the Post removed some more controversial
- items from the original British appearance of this material. PGN]
-
- IN JUNE, a new plane hits the American skies. Northwest Airlines will become
- the first U.S. carrier to take delivery of the European Airbus A320 -- the most
- advanced passenger aircraft in the world, and already one of the most
- controversial. In use since last May by British Airways and Air France, the
- medium-sized 150-seat twin-engine jet is the first airliner to have every
- function, from flight controls to toilet operation, directed by computer.
- On June 26, 1988, two days after the third A320 went into service, it
- crashed while performing a low-level pass at a French air show. A woman and two
- children on board were killed. An investigation blamed the accident on pilot
- error, but the pilot faulted a number of factors including the aircraft's
- computers for providing incorrect altitude information. (The pilot, a senior
- Air France captain, was subsequently dismissed.) Since then, various unsettling
- reports have appeared in the European press, regarding: engines unexpectedly
- throttling up on final approach; inaccurate altimeter readings; sudden power
- loss prior to landing; steering problems while taxiing.
-
- [NGL: It is interesting that the pilot was never believed about
- the altimeter, although there is now plenty of evidence to back up
- his story. I have noticed several things about evaluation of
- accidents in general:
-
- 1) Human error is always the first ascribed cause whenever a human
- is involved in the system where an accident occurred. However,
- most accidents are multi-factorial. If the altimeter is indeed
- inaccurate, then the accident was only partially caused by the
- pilot. Humans tend to want simple answers to complex problems and
- to be able to ascribe blame to some single cause. There are, of
- course, other factors at work in these oversimplifications such
- as liability issues and misplaced faith in technology. But seldom
- are accidents the result of only one thing going wrong. Actually,
- the few times I have found this to be true (i.e., one thing is at
- fault), it is a computer that is the primary agent. Perhaps engineers
- expect other things to fail and therefore design systems so that a
- single failure cannot lead to an accident. But since (as engineers
- often tell me or write in system safety evaluations) computer software
- does not fail...
-
- 2) If a human cannot be blamed, then the hardware is. The first
- incident involving the Therac 25 occurred in Hamilton, Ontario.
- The accident was blamed on a faulty microswitch (a "transient"
- failure since nothing could be found wrong with the microswitch).
- The fix for the problem was to put in a duplicate microswitch to
- detect when the filter was not in place to correctly filter
- the X-ray beam. When the next incident occurred in Tyler,
- Texas (again involving the misalignment of the filter), it was
- believed that the burn suffered by the patient (who died from his
- injuries 6 months later) was electrical. Nobody believed that he
- could have suffered an overdose or that the computer could be
- involved. The electrical system was checked out and found to be
- OK so the machine was deemed safe. Two weeks later another man
- was overdosed in Tyler (he died two weeks after this) and FINALLY,
- someone (at the hospital) decided the computer might be involved.
- It was the physicist at the hospital who was able to reproduce the
- problem and raise an alarm about the computer. He had some difficulty
- convincing anyone else about this. The Therac 25 victim in Georgia
- had great trouble convincing anyone that the Therac was responsible
- for her severe burns. This was true also for the first overdose in
- Yakima. Finally, when the second person was overdosed in Yakima
- (and all the prior incidents had occurred including the detection
- of an error in the software that could have caused the incidents),
- people were willing to examine the possibility that this was a
- software error (a different software error was given the blame
- this time). Why are people so reluctant to believe that the
- computer may be at fault?]
-
- [returning to the Washington Post article]
-
- Of course, the introduction of any new aircraft entails shake-out
- problems of one kind or another. But the A320's extensive use of
- computers raises a new set of questions: Are we ready to rely so heavily
- on complex software systems for such safety-critical applications as
- commercial flight?
-
- Bird on a Wire
-
- The control system employed by the A320 is known as "fly by wire." FBW
- replaces the conventional stick and rudder controls with a series of
- computers and miles of electronic cables. Instead of the familiar
- control-column, the pilots use "side-sticks," a single lever resembling
- the joysticks used in video games.
- Sensing devices which gauge the aircraft's flight characteristics
- pass the information to the six color monitors that replace nearly all
- the traditional analog instruments and result, Airbus says, in 75
- percent fewer instruments than conventional configurations. On the
- uncluttered flight deck, the pilot on the right uses the side-stick with
- the right hand while the pilot on the left has a left-handed version.
- (On the left, pilots tend to push the aircraft to the right owing to the
- position of the forearm and wrist; that side-stick was adjusted to
- compensate.) But the computer system actually directs the control surfaces.
- Only the rudder and horizontal stabilizer -- both on the tail -- can be
- mechanically directed by the pilot.
- All other flight controls are managed by the electrical flight-control
- system (EFCS), which contains three spoiler/elevator computers
- (SEC), two elevator/aileron computers (ELAC) and the flight-augmentation
- computer that oversees stability, limiting and protection functions. The
- engines and throttles are managed by the full-authority digital
- engine-control (FADEC) computers. The EFCS uses "dissimilar redundancy." That
- is, computers that are designed to back each other up are of different
- brands, have different microprocessor types and are supplied by
- different vendors -- all to minimize the likelihood of identical hardware
- parts failing at the same time. And different programmers were employed
- to write each of the parallel sets of software. Moreover, each computer
- is divided into two physically separate units with "segregated" power
- supplies.
-
- [NGL: There were different programmers. Were there different
- requirements specifications? How about design specifications?
- How much detailed design information was provided to the programmers?]
-
- The EFCS is designed to fly within a theoretical "flight
- envelope" -- permissible ranges for various maneuvers -- thus providing
- computer-monitored protection against windshear forces, overload or
- overspeed conditions. If the pilot were to, say, allow the speed to drop
- toward the stall point, the computer would sound alarms and
- automatically increase the power.
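The envelope-protection behaviour described above, clamping demands to permissible ranges and adding power as speed decays toward the stall, can be sketched as a filter sitting between the pilot's command and the actuators. This is a hypothetical sketch: the limits are invented placeholders rather than A320 values, the names `Envelope`, `Command` and `protect` are illustrative, and the real EFCS logic is far more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Envelope:
    # Placeholder limits for illustration only; NOT A320 values.
    min_speed_kt: float = 120.0
    max_load_factor_g: float = 2.5
    min_load_factor_g: float = -1.0


@dataclass
class Command:
    thrust_fraction: float  # 0.0 = idle, 1.0 = full power
    load_factor_g: float    # g demanded through the side-stick
    alarm: bool = False


def protect(cmd: Command, airspeed_kt: float, env: Envelope) -> Command:
    """Return the command actually passed on, limited to the envelope."""
    # Overload protection: clamp the demanded load factor to the permitted range.
    g = min(max(cmd.load_factor_g, env.min_load_factor_g), env.max_load_factor_g)
    thrust, alarm = cmd.thrust_fraction, False
    if airspeed_kt < env.min_speed_kt:
        # Speed decaying toward the stall: sound the alarm and add power.
        thrust, alarm = 1.0, True
    return Command(thrust_fraction=thrust, load_factor_g=g, alarm=alarm)


# A pilot pulling 4 g at low speed with the throttles back:
print(protect(Command(0.3, 4.0), airspeed_kt=110.0, env=Envelope()))
# Command(thrust_fraction=1.0, load_factor_g=2.5, alarm=True)
```

The design choice worth noticing is that the pilot's input is an argument, not an order: whatever the stick asks for, the value that reaches the control surfaces is the one the envelope allows.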
- In the event that two computers should disagree, one automatically
- shuts itself down and its tasks are carried out by the other. For
- example, if one unit directed the flaps to be partly extended and its
- monitoring software expected full flap extension, then the first unit
- would automatically shut itself down and its functions would be passed
- over to the other. The pilots' display monitors would tell them what had
- happened. Finally, each of the five flight-control-surface computers is
- capable of performing all of the essential tasks of the others as well
- as its own tasks.
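One plausible reading of the shutdown rule in the flap example, assumed here rather than taken from Airbus documentation, is a command/monitor pair: each unit computes its output twice, in a command lane and a monitor lane, and latches itself off when the two differ by more than a tolerance, handing control to the next unit. The `Unit` class, the priority order of the list and the `tolerance` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Unit:
    name: str
    command_lane: Callable[[float], float]  # computes the surface position
    monitor_lane: Callable[[float], float]  # independently computes the expected position
    active: bool = True


def select_output(units: list[Unit], pilot_input: float,
                  tolerance: float) -> Optional[tuple[str, float]]:
    """Return (unit name, surface command) from the first healthy unit.

    A unit whose two lanes disagree by more than `tolerance` shuts itself
    down and control passes to the next unit in the (hypothetical) list.
    """
    for unit in units:
        if not unit.active:
            continue
        cmd = unit.command_lane(pilot_input)
        expected = unit.monitor_lane(pilot_input)
        if abs(cmd - expected) > tolerance:
            unit.active = False  # self-shutdown; the crew display would report it
            continue
        return unit.name, cmd
    return None  # every unit has shut itself down


# Flap example: unit1 commands half extension while its monitor expects full.
unit1 = Unit("unit1", lambda x: 0.5 * x, lambda x: x)
unit2 = Unit("unit2", lambda x: x, lambda x: x)
print(select_output([unit1, unit2], 1.0, tolerance=0.05))  # ('unit2', 1.0)
```

What the sketch cannot settle is which lane was wrong: if the monitor lane is the faulty one, a correct command lane is switched off, and nothing in the comparison itself says which to trust. That is the question Leveson raises about this passage.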
-
- [NGL: If two computers disagree, how is it determined which computer
- to shut down? It does not sound like the pilots do this; they are
- just told about the event afterward (and may not have the information
- necessary to make this decision anyway). So how is the decision
- made? How do they know that the monitor is correct and the other
- one is not?]
-
- The Airbus A320, of course, is not the first civilian aircraft to use
- computerized control. Boeing's 757 and 767, for example, have
- computer-activated spoilers; and Boeing had planned to use FBW technology in the
- 7J7 but subsequently deferred development. Joe Sutter, Boeing's chief
- engineer for the past 20 years, believes that "fly-by-wire is way
- overstated as to its benefits"; and as for the side-stick system, "we
- have some reservations -- like what one pilot is doing is not obvious to
- the other."
- "The main benefit of FBW," he says, "is to reduce weight and increase
- range. It will really boost safety. But fooling around with FBW to
- reduce [something like] tail size goes against the design philosophy I
- have always urged -- that you've got to design an aircraft which one day
- for some reason or other is going to get into a hell of a lot of trouble."
- That means mechanical back-up systems for the main control surfaces.
- "What happens with FBW when the aircraft gets outside its control laws?
- It's going to leave the pilot in one hell of a lot of trouble -- for what?
- One-percent fuel burn?"
- A great deal more than that, says Airbus, which believes it now
- enjoys a significant competitive advantage over Boeing and McDonnell
- Douglas in fuel and weight savings. An Air France official says that the
- Airbus A320 is 40 percent more fuel efficient than the old Boeing 727s
- they have replaced. He was expecting 8 to 9 percent better, "but it's a
- good result anyway."
-
- How Safe Is Safe?
-
- But for all FBW's advantages, critics argue that its sophisticated
- computer system may be too far ahead of its time because of our
- relatively limited ability to test the reliability of software.
- Airbus Industrie executive Robert Alizart believes that the duplicate
- architecture "reduces the chances of a total system loss to an absolute
- minimum." But Martyn Thomas, chairman of Praxis Systems, which produces special
- high-reliability software for Britain's Air Force, believes such precautions
- offer no guarantees. "Errors get through," Thomas says. "There may be common
- sources of error, such as a faulty specification, which cause the same mistakes
- in every version of the program. Identical errors may be made by independent
- teams. Testing only exercises a small proportion of the possible situations
- that the program may have to handle."
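Thomas's point about common sources of error can be made concrete with a 2-out-of-3 voter. In this hypothetical sketch, three versions of a flap-permission check are written from the same specification. One team makes an independent slip, which the voter masks; but a mistake in the specification itself (here an invented 200 kt limit where 180 kt was intended) appears in all three versions, so they agree and the voter passes the wrong answer. Function names and speeds are illustrative assumptions.

```python
from collections import Counter

# Hypothetical specification as written: "flaps may extend at or below 200 kt".
# Suppose the limit actually intended was 180 kt, so the spec itself is wrong.
SPEC_LIMIT_KT = 200
INTENDED_LIMIT_KT = 180


def version_a(speed_kt: float) -> bool:
    return speed_kt <= SPEC_LIMIT_KT


def version_b(speed_kt: float) -> bool:
    return not speed_kt > SPEC_LIMIT_KT


def version_c(speed_kt: float) -> bool:
    # Independent slip by this team: '<' where the spec says 'at or below'.
    return speed_kt < SPEC_LIMIT_KT


def majority_vote(outputs: list[bool]) -> bool:
    """2-out-of-3 vote: return the value most versions produced."""
    value, _count = Counter(outputs).most_common(1)[0]
    return value


def voted_flap_permission(speed_kt: float) -> bool:
    return majority_vote([v(speed_kt) for v in (version_a, version_b, version_c)])


# At 200 kt version_c disagrees and is outvoted: the independent slip is masked.
print(voted_flap_permission(200))                            # True
# At 190 kt all three agree, yet 190 kt exceeds the intended 180 kt limit.
print(voted_flap_permission(190), 190 <= INTENDED_LIMIT_KT)  # True False
```

Voting protects only against faults that differ between versions; anything inherited from the shared specification is outvoted by nobody.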
- Peter Neumann, a computer scientist at S.R.I. International, a Menlo Park,
- Calif., think tank, is a specialist in software engineering who has documented
- hundreds of software failure cases in the aerospace and other industries.
- Neumann says, "There are very serious risks in reliance on software in
- safety-critical applications. A seemingly innocuous addition to the software
- could have disastrous effects not discovered in testing. Never trust anyone who
- says such failures can never happen."
- The task facing testers is prodigious. "For even small amounts of software,"
- says Thomas, "the number of possible paths far exceeds the number which could
- realistically be tested. For example, a recent module comprising 100 lines of
- assembly code was analyzed and found to contain 38 million possible paths, of
- which 500,000 could be followed with valid input data."
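Why the path count explodes can be seen by counting entry-to-exit paths in an acyclic control-flow graph. The sketch below (hypothetical helpers `count_paths` and `chain_of_branches`) builds a straight chain of n independent if/else statements, which has 2^n paths; 26 such decisions already give more than 38 million. Loops make matters worse, and a static count like this also includes infeasible paths, which is why Thomas distinguishes the 500,000 paths that valid input data can actually follow.

```python
from functools import lru_cache


def count_paths(cfg: dict[str, list[str]], entry: str, exit_: str) -> int:
    """Count entry-to-exit paths in an acyclic control-flow graph."""
    @lru_cache(maxsize=None)
    def paths_from(node: str) -> int:
        if node == exit_:
            return 1
        # Paths from a node = sum of paths from each successor (memoized).
        return sum(paths_from(succ) for succ in cfg.get(node, []))
    return paths_from(entry)


def chain_of_branches(n: int) -> dict[str, list[str]]:
    """Build a CFG of n consecutive if/else statements (one 'diamond' each)."""
    cfg: dict[str, list[str]] = {}
    for i in range(n):
        cfg[f"d{i}"] = [f"t{i}", f"f{i}"]      # the two-way decision
        nxt = f"d{i + 1}" if i + 1 < n else "exit"
        cfg[f"t{i}"] = [nxt]                   # then-branch rejoins
        cfg[f"f{i}"] = [nxt]                   # else-branch rejoins
    return cfg


print(count_paths(chain_of_branches(26), "d0", "exit"))  # 67108864
```

Memoization keeps the count itself cheap (linear in the graph size); it is executing each path as a test case that is infeasible.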
- Mike Hennell, head of Computational Mathematics at Liverpool University --
- an authority on software reliability -- has not examined the A320's software
- code. Still, he says: "I wouldn't get into an Airbus A320 or any fly-by-wire
- aircraft."
- "We don't have the technology yet to tell if the programs have been
- adequately tested. We don't know what 'adequately tested' means. We can't
- predict what errors are left after testing, what their frequency is or what
- their impact will be. If, after testing over a long period, the program has not
- crashed, then it is assumed to be okay. That presupposes that they will have
- generated all of the sort of data that will come at it in real life -- and it
- is not clear that that will be true."
- Indeed, scientists have been working for 15 years on software
- reliability models, writes John Musa of AT&T's Bell Laboratories in the
- February issue of IEEE Spectrum. And they are now "moving into practice
- and starting to pay off." But they "deal with average rather than
- specific behavior, since the random nature of program usage and fault
- introduction generates failures at random." In the case of an airline
- reservation system, for example, "it is impossible to predict the next
- specific input and hence the next specific failure. Average behavior,
- however, can be characterized."
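As an example of the average behaviour Musa describes, one of his models, the basic execution-time model, treats failures as a non-homogeneous Poisson process whose intensity falls as faults are removed: lambda(tau) = lambda0 * exp(-lambda0 * tau / nu0), with mean cumulative failures mu(tau) = nu0 * (1 - exp(-lambda0 * tau / nu0)), where lambda0 is the initial failure intensity, nu0 the total expected failures and tau the execution time. The sketch below implements those formulas; the function names and the parameter values in the example are hypothetical.

```python
import math


def failure_intensity(tau: float, lambda0: float, nu0: float) -> float:
    """Musa basic execution-time model: failures per unit execution time at tau."""
    return lambda0 * math.exp(-lambda0 * tau / nu0)


def expected_failures(tau: float, lambda0: float, nu0: float) -> float:
    """Mean cumulative number of failures experienced by execution time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))


def prob_no_failure(tau: float, window: float, lambda0: float, nu0: float) -> float:
    """P(zero failures in [tau, tau + window]) under the Poisson-process assumption."""
    mean = expected_failures(tau + window, lambda0, nu0) - expected_failures(tau, lambda0, nu0)
    return math.exp(-mean)


# Hypothetical parameters: 10 failures per CPU-hour initially, 100 failures in total.
print(expected_failures(50, 10, 100))    # about 99.3
print(prob_no_failure(50, 1, 10, 100))   # chance of a failure-free next CPU-hour
```

The model yields expected counts and probabilities over many runs; it says nothing about which specific input will trigger the next failure, which is exactly the limitation Musa points out.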
- The international design standard for airborne software systems (RTCA
- DO-178A) was developed by the Washington-based Radio Technical Commission for
- Aeronautics. Nancy Leveson, a specialist in software safety research and
- currently a visiting professor at MIT, says that DO-178A is "not adequate for
- certifying commercial aircraft software. It lacks any mention of formal
- verification of safety, as required, for example, by the Department of Defense"
- which demands safety and hazard analysis.
- The FAA does, however, oblige developers "to use certain accepted concepts
- for design and development," says Mike DeWalt, an aircraft computer software
- specialist with the FAA. Although FAA officials do not see all the programming
- ("obviously there's no way in the world that a review agency could look at that
- much code"), they do demand adequate testing and quality evaluation, and even
- sample the programmers' work. "Basically, we take a slice through the whole
- system," says DeWalt. That is, pick a function like left aileron control and
- "follow it all the way down through testing and configuration management."
- "I don't want to imply that manufacturers and subcontractors will not do
- their best," Leveson says. "After all, they have the liability, and I'm sure
- they are decent human beings who care about human life. The problem is that
- without external review, we are depending on the competence of the employees of
- these companies, and I am less sanguine about the general state of software
- engineering knowledge and practice in industry than I am about the good
- intentions of humans."
- Daryl Pederson, deputy director of the FAA's Aircraft Certification Division
- and the man charged with certifying the A320, says of DO-178A, "The document
- recognizes that you can't test every situation you encounter." His British
- counterpart, Brian Perry, head of Avionics and Electrical Systems at the Civil
- Aviation Authority, agrees: "It's true that we are not able to establish to a
- fully verifiable level that the A320 software has no errors. It's not
- satisfactory, but it's a fact of life."
-
- Computers in the Sky
-
- Nonetheless, FBW offers the pilot some real gains. In extreme situations
- such as suddenly encountering strong windshear, the computers
- instantaneously compensate. Gordon Gorbes, chief test pilot for Airbus,
- says, "If a pilot has to make violent changes to the aircraft's attitude
- in an emergency, then the computer will prevent the pilot pushing it
- past design strengths. For example, the computer would prevent the pilot
- putting it into a dive that might break off the tail." And FBW saves
- money for the plane's owner, by reducing hardware costs, keeping the
- aircraft at optimum fuel-saving trim and facilitating the switch from
- three- to two-person flight crews.
- Many pilots flying the A320 have been enthusiastic in praising its
- handling and flying qualities. But some have complained about software
- problems and control irregularities. (The number of such complaints,
- according to Airbus' technical director, Bernard Ziegler, is small.) One
- problem reported by Air France, in a memo dated July 10, 1988 to Airbus,
- noted a software bug in its altimeter which measures the aircraft's
- height, a problem which has also been observed with British Airways'
- A320s. It is this problem that the pilot of the A320 that crashed at the
- small French airport at Mulhouse last June claimed contributed to the
- accident.
-
- [NGL: And which no one believed at the time.]
-
- There are various ways to fix a bug or add to a plane's installed
- software. Complete boxes containing replacement hardware and software
- can be exchanged by Airbus Industrie. For carriers like Northwest, with
- 100 aircraft on order, this option would be expensive. So reprogramming
- could take place at a keyboard in the aircraft, conducted by Airbus or
- Northwest engineers. With over 640 aircraft on order around the world
- using two different makes of engine and a variety of sub-systems, the
- problem of "configuration management," as it is termed in the computer
- industry, becomes apparent.
-
- [NGL: Note that a configuration management problem involving
- a navigation computer was implicated in the Antarctica crash of
- the Air New Zealand plane into Mount Erebus. Of course, planes
- are not sent back to the factory for all of the hardware design
- changes that occur -- usually the maintenance crew handles
- them. Is the problem different for software?]
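At its simplest, the bookkeeping involved is a check that every computer on every airframe carries a software load approved for that airframe's configuration, whether the load was changed by swapping a box or by reprogramming at a keyboard. This is a hypothetical sketch: the approval table, engine labels, part numbers and tail identifiers are invented, and a real configuration-control system would also need to track hardware modification standards and change history.

```python
from dataclasses import dataclass, field

# Hypothetical approved software loads, keyed by (engine make, computer).
# All part numbers are invented for illustration.
APPROVED_LOADS: dict[tuple[str, str], set[str]] = {
    ("engine_a", "FADEC"): {"FA-3.1"},
    ("engine_b", "FADEC"): {"FB-2.4", "FB-2.5"},
    ("engine_a", "ELAC"): {"EL-7"},
    ("engine_b", "ELAC"): {"EL-7"},
}


@dataclass
class Aircraft:
    tail: str
    engine: str
    installed: dict[str, str] = field(default_factory=dict)  # computer -> load


def nonconforming(fleet: list[Aircraft]) -> list[tuple[str, str, str]]:
    """List (tail, computer, load) entries not approved for that airframe."""
    problems = []
    for ac in fleet:
        for computer, load in ac.installed.items():
            allowed = APPROVED_LOADS.get((ac.engine, computer), set())
            if load not in allowed:
                problems.append((ac.tail, computer, load))
    return problems


fleet = [
    Aircraft("TAIL-001", "engine_a", {"FADEC": "FA-3.1", "ELAC": "EL-7"}),
    # Reprogrammed with the other engine make's FADEC load by mistake:
    Aircraft("TAIL-002", "engine_b", {"FADEC": "FA-3.1", "ELAC": "EL-7"}),
]
print(nonconforming(fleet))  # [('TAIL-002', 'FADEC', 'FA-3.1')]
```

Every added engine make, subsystem variant or software revision multiplies the rows in such a table, which is the scaling problem the article points to.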
-
- So does the problem of anticipating a near-infinitude of real-life
- contingencies. In 1983 a United Airlines Boeing 767 went into a
- four-minute powerless glide after the pilot was compelled to shut down both
- engines because of overheating. The National Transportation Safety Board
- discovered that the plane's computerized engine-management system had
- ordered the engines to run at a relatively slow speed to optimize fuel
- efficiency. In the flight's particular atmospheric circumstances,
- however, this had allowed ice to build up on some engine surfaces,
- reducing the flow of air and causing the engines to work harder and
- overheat.
- "The problem is that the designer didn't anticipate all the possible
- demands the software would face," says Hennell. "The computer will
- always do something. But it will only do the correct thing if it has
- been programmed for that situation."
-
- ------------------------------
-
- End of RISKS-FORUM Digest 8.49
- ************************
- -------
-