- Newsgroups: comp.unix.cray
- Path: sparky!uunet!haven.umd.edu!darwin.sura.net!tulane!pops.navo.navy.mil!anderson
- From: anderson@prowler.navo.navy.mil (Douglas T Anderson)
- Subject: System Stability [Was Re: Cray EL]
- Nntp-Posting-Host-[nntpd-14647]: prowler.navo.navy.mil
- Message-ID: <1993Jan7.144524.8438@cs.tulane.edu>
- Sender: news@cs.tulane.edu
- Reply-To: anderson@pops.navo.navy.mil
- Organization: POPS Facility Management
- References: <BzL9CC.9E@rex.uokhsc.edu> <1993Jan6.180326@siisun.epfl.ch> <1993Jan6.181111.14914@chpc.utexas.edu>
- Date: Thu, 7 Jan 1993 14:45:24 GMT
- Lines: 43
-
- In article <1993Jan6.181111.14914@chpc.utexas.edu>, jones@chpc.utexas.edu (William L. Jones) writes:
- |>
- |> >From: brossard@siisun.epfl.ch (Alain Brossard EPFL-SIC/SII)
- |> .
- |> >Also the least stable, though to be fair Cray seems to have
- |> >finally succeeded in providing us with a stable machine. You
- |> >wouldn't believe how often it used to crash in the first few
- |> >months.
- |>
- |> Just as stable as their large machines.
- |>
- |> I sure wish cray would really put in the effort that is needed to fix
- |> their operating system.
- |>
-
- These messages got me thinking. We have 3 Cray systems here on
- site (YMP8/8128, YMP2E/116, XMPEA-116), and over the past 3 years we
- have experienced what I consider to be pretty good stability. We
- have significantly more failures caused by shooting ourselves in the
- foot than Cray-caused failures. I don't have the numbers here
- in front of me, but our MTBF (Mean Time Between Failures) is
- on the order of 800 hours on the Y's and about 500 hours on the
- X. We run 24 hours a day, 7 days a week, and have experienced
- 99.86% uptime over the past 3 months.
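
For a rough sense of what those figures mean in hours, here is a minimal Python sketch of the arithmetic. It is illustrative only: the 90-day window is an assumption (the post says only "3 months"), the function names are made up for this example, and combining the long-run MTBF with the 3-month uptime figure to estimate a mean time to repair assumes the standard steady-state relation availability = MTBF / (MTBF + MTTR), which the post itself does not state.

```python
def downtime_hours(availability: float, window_hours: float) -> float:
    """Hours of downtime implied by an availability fraction over a window."""
    return (1.0 - availability) * window_hours


def implied_mttr(mtbf_hours: float, availability: float) -> float:
    """Mean time to repair implied by availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours * (1.0 - availability) / availability


# Assumption: "3 months" is taken as 90 days of round-the-clock operation.
WINDOW_HOURS = 90 * 24

# Figures quoted in the post.
AVAILABILITY = 0.9986   # 99.86% uptime
MTBF_YMP = 800.0        # hours, "on the order of" for the Y-MPs
MTBF_XMP = 500.0        # hours, approximate for the X-MP

if __name__ == "__main__":
    print(f"Downtime over the window: {downtime_hours(AVAILABILITY, WINDOW_HOURS):.2f} h")
    print(f"Implied MTTR (Y-MP): {implied_mttr(MTBF_YMP, AVAILABILITY):.2f} h")
    print(f"Implied MTTR (X-MP): {implied_mttr(MTBF_XMP, AVAILABILITY):.2f} h")
```

Under the 90-day assumption, 99.86% uptime works out to roughly 3 hours of total downtime in the quarter. The implied repair times are only a back-of-the-envelope estimate, since the MTBF figures are approximate and cover a different period than the uptime figure.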
-
- CUG statistics show what I, as a computer center manager, would
- consider good numbers for stability (though not as high as ours ;~) ).
-
- Given our experience, and what appears to be the "average" from the
- CUG reports, I think the system/OS stability is pretty good, but
- more work needs to be done on system operations/administration tools
- to help us in the data center. Users don't care if it's a CPU
- failure, a disk head crash, a memory failure, or the operator "pulling
- the plug": down is down.
-
- --
- Douglas T Anderson
- Technical Services Manager/Chief Engineer
- POPS Program
-
- All opinions expressed are mine and mine alone. They do not
- reflect the opinions of Grumman Data Systems, the US Navy, or
- anyone else, unless they want them to.
-