NetNews Usenet Archive 1993 #1

Text File | 1993-01-07 | 2.4 KB | 57 lines

Newsgroups: comp.unix.cray
Path: sparky!uunet!haven.umd.edu!darwin.sura.net!tulane!pops.navo.navy.mil!anderson
From: anderson@prowler.navo.navy.mil (Douglas T Anderson)
Subject: System Stability [Was Re: Cray EL]
Nntp-Posting-Host-[nntpd-14647]: prowler.navo.navy.mil
Message-ID: <1993Jan7.144524.8438@cs.tulane.edu>
Sender: news@cs.tulane.edu
Reply-To: anderson@pops.navo.navy.mil
Organization: POPS Facility Managment
References: <BzL9CC.9E@rex.uokhsc.edu> <1993Jan6.180326@siisun.epfl.ch> <1993Jan6.181111.14914@chpc.utexas.edu>
Date: Thu, 7 Jan 1993 14:45:24 GMT
Lines: 43

In article <1993Jan6.181111.14914@chpc.utexas.edu>, jones@chpc.utexas.edu (Willaim L. Jones) writes:
|> 
|> >From: brossard@siisun.epfl.ch (Alain Brossard EPFL-SIC/SII)
|> .
|> >Also the least stable, though to be fair Cray seems to have
|> >finally succeeded in providing us with a stable machine. You
|> >wouldn't believe how often it used to crash in the first few
|> >months.
|> 
|> Just as stable as their large machines.
|> 
|> I sure wish cray would really put in the effort that is needed to fix
|> their operating system.
|> 

These messages got me thinking. We have 3 Cray systems here on site (YMP8/8128, YMP2E/116, XMPEA-116), and over the past 3 years we have experienced what I consider to be pretty good stability. We have had significantly more failures caused by shooting ourselves in the foot than failures caused by Cray. I don't have the numbers here in front of me, but our MTBF (Mean Time Between Failures) is on the order of 800 hours on the Y's and about 500 hours on the X. We run 24 hours a day, 7 days a week, and have had 99.86% uptime over the past 3 months. CUG (Cray User Group) statistics show what, to me as a Computer Center manager, would be good numbers for stability (though not as high as ours ;~) ).
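For a rough sense of scale, the uptime and MTBF figures above can be turned into hours with a few lines of arithmetic. This is only a sketch: the 92-day window standing in for "3 months" is an assumption, the helper names are hypothetical, and dividing the window by the MTBF gives a crude expected failure count that assumes a constant failure rate. The post does not say the MTBF and uptime figures were measured over the same period.

```python
# Hypothetical helpers for back-of-the-envelope availability arithmetic.
HOURS_PER_DAY = 24

def downtime_hours(uptime_fraction: float, window_hours: float) -> float:
    """Hours of downtime implied by an uptime fraction over a time window."""
    return (1.0 - uptime_fraction) * window_hours

def expected_failures(window_hours: float, mtbf_hours: float) -> float:
    """Crude expected failure count: window length divided by MTBF
    (assumes a roughly constant failure rate)."""
    return window_hours / mtbf_hours

# Assumption: "3 months" taken as 92 days, i.e. 2208 hours.
window = 92 * HOURS_PER_DAY

print(f"downtime at 99.86% uptime: {downtime_hours(0.9986, window):.1f} h")
print(f"expected failures, 800 h MTBF (Y's): {expected_failures(window, 800):.2f}")
print(f"expected failures, 500 h MTBF (X): {expected_failures(window, 500):.2f}")
```

Under these assumptions, 99.86% uptime works out to roughly 3 hours of downtime in the quarter, and an 800-hour MTBF to somewhere under 3 failures per machine over the same period.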
With our experience, and what appears to be the "average" from the CUG reports, I think the system/OS stability is pretty good, but more work needs to be done on System Operations/Administration tools to help us in the data center. Users don't care whether it's a CPU failure, a disk head crash, a memory failure, or the operator "pulling the plug"; down is down.

-- 
Douglas T Anderson
Technical Services Manager/Chief Engineer
POPS Program

All opinions expressed are mine and mine alone. They do not reflect the opinions of Grumman Data Systems, the US Navy or anyone else, unless they want them to.