SOFTWARE ENGINEERING

COMMAND AND CONTROL

Inside a hollowed-out mountain, software fiascoes--and a signal success

The focal point of North America's defense network looks like nothing so much as a Laundromat. Here in the computer room of the NORAD Command Center, 1,750 feet below the surface of Cheyenne Mountain in Colorado, sensor readings from heat-sensing spacecraft, tracking stations, weather satellites and coastal radar arrays converge in order to alert American and Canadian commanders to any bomber, missile or satellite attack. Sorting through that barrage of data falls to an odd lot of computers, some running software written a generation ago. My guide, Russell F. Mullins, proudly points out three shiny new VAX machines, which last year took over the processing of air defense intelligence from 74 antiquated predecessors. But I am more fascinated by the bank of magnetic tape units and the fleet of 20-year-old disk drives--they look more like coin-operated washing machines--that are still used to track ballistic missiles.

They should not be here. In 1981 the Pentagon started the Cheyenne Mountain Upgrade (CMU) program to replace the center's five main computer systems over six years, at a cost of $968 million. But as with many attempts to build grandiose software, the project soon derailed. In 1994 the General Accounting Office reported that the CMU was running 11 years behind schedule and about $1 billion over budget. Despite the extra time and money sunk into development, most of the new systems were still too slow or unreliable to use, so the air force had to keep the old systems running alongside as a backup.

This duplication created a problem, Mullins explains as he steers me through a maze of unmarked steel corridors to the bunker's systems center, which he heads. In this cramped room, technicians monitor the base's computers and its connections to the sensors, commanders and world leaders aboveground. Each new system added more warning panels and more glitches to fix. "We used to call this the Double Jeopardy Room," Mullins laughs, "because we had to constantly scan more than 20 monitors for a wide variety of alerts" to network failures--alerts as subtle as "yes" changing to "no." His team fell behind amid the growing complexity. "If a missile warning component fails, we have to switch to backup systems in only two minutes," he says, suddenly very serious. "The best we could do was about four."

To solve the problems caused by too much software, CMU managers decided in April 1995 to build yet another software program, an automated tracking and monitoring system (ATAMS). With it, Mullins's crew could control the entire network using just two monitors and a simple, consistent interface that made failures hard to miss.

But the project seemed doomed from day one. Contractors estimated it would take two years to build; the air force allowed one. Bureaucratic snafus delayed delivery of Sun Microsystems workstations, forcing programmers to write the software for IBM hardware, then convert it later. Users demanded 10 times more functions than originally planned. Tests turned up unexpected bugs in the systems that ATAMS keeps tabs on. And Mullins's group found several errors just before the system was finished.

Yet in April 1996 ATAMS was complete, on time and within its $2-million budget. Unlike the rest of the CMU, it immediately worked as intended. "Now we regularly make the switchover to backups in 45 seconds," Mullins beams as he simulates losing communications with a missile launch detector. "It cut down on operator errors. And we can now operate this whole system with just two people, rather than four." To date, users have uncovered only two bugs in the software; both were fixed easily.

The success of ATAMS was surprising but no fluke, claims Buford D. Tackett of Kaman Sciences, who led the development team. He combined several techniques that were shown years ago to produce better software faster yet are still rarely used. Mullins sketched out what he wanted to see on the ATAMS screens, and Kaman built the displays first, rather than last. Tackett split the system into small segments and put the riskiest parts at the head of the line, rather than letting them slip to the end. The team incorporated off-the-shelf software and large sections from other systems. Programmers peer-reviewed one another's designs and code, catching more than 200 major design errors while they were still easy to fix. Tackett forced his engineers to perfect each segment before moving on, and rather than avoiding contact with the users, "they begged us periodically to come see what they had done," Mullins recalls.

Perhaps the most important difference between ATAMS and conventional systems is that it will be updated every year, rather than replaced once a decade. And it was designed to be just the first in a product line of related systems. Like a line of car models, its relatives will look and perform differently but share an underlying design and many of the same innards. "As we replace more elements of Cheyenne Mountain systems, we will use this product-line approach, applying the lessons of ATAMS," promises Colonel John M. Case, head of the Space and Warning Systems Directorate. Other contractors have begun experimenting with the process as well. "So eventually we should reach the point where we can evolve software continuously," Case says, "at a much lower cost."

If so, perhaps future billion-dollar fiascoes will be fewer. But as I leave this cold war relic and pass three-foot-thick blast doors that take 45 seconds to open, I suspect obsolete mind-sets may prove hardest to upgrade.

--W. Wayt Gibbs inside Cheyenne Mountain, Colo.