- Newsgroups: comp.os.ms-windows.programmer.win32
- Path: sparky!uunet!microsoft!hexnut!leefi
- From: leefi@microsoft.com (lee fisher)
- Subject: re: Does NT's SMP handel processor failure?
- Message-ID: <1992Jul27.041246.7129@microsoft.com>
- Date: 27 Jul 92 04:12:46 GMT
- Organization: microsoft corp., redmond, wa
- Lines: 105
-
- hmm, i can't find the original article... anyway, here's a response from
- a co-worker (posted with his permission) regarding this NT SMP discussion.
- hope this clarifies things...
-
- the question:
-
- > The question is: On an N processor CPU, if one of the processors fails,
- > does NT "catch" the failure, and eliminate that CPU from eligibility
- > to run any other threads?
- >
- > The reason this is important is that if the answer is yes, then
- > an N processor CPU should be N times as reliable as a 1 processor
- > CPU (since they would all have to fail before you completely lost
- > function). If the answer is no, then it would be 1/N'th as reliable,
- > since any one processor failing would garbage all threads scheduled
- > to the failing processor.
- >
- > On Compuserve, someone said they had been trying and failing to
- > get a definitive answer on this question. Does anyone know for
- > sure?
-
- the response:
-
- > People have been kind enough to forward me your posting on either
- > Compuserve or Usenet. I would like to take a moment to add some
- > comments/points of my own to your debate. I apologize for the length of my
- > message; I did try to keep it reasonably succinct.
- >
- > --- Fault tolerance of SMP machines ---
- >
- > First of all, if a processor fails during runtime, Windows/NT will crash.
- >
- > Next, I want to point out that you need *special* hardware to be able to
- > continue running when a processor crashes. No x86 based SMP machine
- > currently has the required hardware to make this support even possible.
- > So, yes, I will accept the bet offered on the claim that MP OS/2, assuming
- > it appears, will actually continue upon processor failure: it can't be
- > done on the current hardware. (Now, if IBM provides it via a special
- > FT-enabled computer, that's a different story.)
- >
- > --- Reliability of SMP machines ---
- >
- > Yes, an N-processor machine is more apt to crash than a 1-processor
- > machine. Let's say out of every M crashes in a year due to faulty
- > hardware, 1 is due to a broken processor. Then an N-processor machine
- > would crash M+N-1 times for the same period of time (all other things
- > being equal). Now, I don't think every hardware crash is caused by
- > a failing processor, so I don't think the odds of a crash just increased
- > by a factor of N-1 for the given period. I.e.:
- >
- > These numbers are just examples - I don't know the true statistics - if
- > you have them, please provide the reference for them. They would be
- > valuable in attempting to make an accurate determination of this problem.
- >
- > Let's assume:
- > 100 hardware crashes a year, 1 due to a CPU failure
- >
- > If we used 4 times the number of CPUs and left everything else the same,
- > we would have:
- > 103 hardware crashes a year, 4 due to CPU failures
- >
- > or...
- >
- > 400% of the number of CPU failures, but only around 3% more total crashes
- > (due to hardware). (From machines which ideally provide close to 4 times
- > the processing power)
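
[Editor's sketch, not part of the original post.] The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch under the post's own simplifying assumptions: every CPU fails at the same rate as the single CPU in the uniprocessor machine, any one CPU failure crashes the whole system, and all non-CPU hardware crashes stay the same. The function name `hardware_crashes` and its parameter names are made up for illustration.

```python
def hardware_crashes(total_up, cpu_up, n_cpus):
    """Return (total hardware crashes, CPU-caused crashes) per year.

    total_up: yearly hardware crashes on the uniprocessor machine (M).
    cpu_up:   how many of those are due to the CPU (1 in the post).
    n_cpus:   number of processors in the MP machine (N).
    Assumption: non-CPU crashes are unchanged and every CPU failure
    crashes the machine.
    """
    non_cpu = total_up - cpu_up      # crashes not caused by a CPU
    cpu = cpu_up * n_cpus            # each CPU fails at the same rate
    return non_cpu + cpu, cpu


total, cpu = hardware_crashes(total_up=100, cpu_up=1, n_cpus=4)
print(total, cpu)                                    # 103 4
print(f"{(total - 100) / 100:.0%} more total crashes")  # 3% more total crashes
```

With one CPU-caused crash out of M, the total reduces to the M+N-1 figure given earlier; the example numbers are the post's illustrative ones, not measured statistics.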
- >
- > However, it is also a fact that moving from a UP platform to an MP platform
- > does not leave the rest of the hardware 'equal', so there are other
- > factors yet to be calculated in. (On the 'down side' there is extra
- > hardware in an MP machine which could break. Plus there is typically more
- > stress placed on the other parts of the box. On the 'up side' many MP
- > machines are fault-tolerant to the more common problems so they are apt to
- > be more stable in these areas than most UP machines).
- >
- > --- Other ---
- >
- > Multi-processor support in Windows/NT is intended to give users more
- > processing power. Sure, I would love to provide FT abilities at the same
- > time, but the real intention of multiple processors in this 'context'
- > is to get more processing power for the user - that's how the machines are
- > designed, and that's how Windows/NT utilizes them.
- >
- > Other machines do things like use 2 processors tied together which run the
- > same bits of code and are expected to produce the same bits of output. If
- > they differ, or one simply ceases to function, then they are stopped and
- > the OS can sort out the problem without risk to the integrity of the rest
- > of the machine. Obviously here is a different 'context' where a
- > multiprocessor machine is designed to be fault-tolerant, not to increase
- > performance. (and then, of course, you get into hybrid designs which want
- > to do both).
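
[Editor's sketch, not part of the original post.] The paired-processor idea described above can be illustrated in software, with the caveat that real fault-tolerant machines do this comparison in hardware. The sketch below is only an analogy: two callables stand in for the two processors, and the names `run_in_lockstep` and `LockstepMismatch` are hypothetical.

```python
class LockstepMismatch(Exception):
    """Raised when the two redundant executions disagree."""


def run_in_lockstep(primary, shadow, value):
    """Run the same computation on two stand-in 'processors'.

    Returns the result only if both agree; otherwise stops by raising,
    so a bad result never propagates to the rest of the system.
    """
    a = primary(value)
    b = shadow(value)
    if a != b:
        raise LockstepMismatch(f"outputs differ: {a!r} vs {b!r}")
    return a


def healthy(x):
    return x * 2


def faulty(x):
    return x * 2 + 1  # simulated processor fault


print(run_in_lockstep(healthy, healthy, 21))  # 42
try:
    run_in_lockstep(healthy, faulty, 21)
except LockstepMismatch as err:
    print("halted:", err)
```

Note that the second processor adds no throughput here: both do the same work, which is the trade-off between a fault-tolerant design and a performance-oriented SMP design.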
- >
- > -------------
- >
- > I tried not to color my opinions more than necessary :^). I believe the
- > SMP support in Windows/NT is a good and viable feature.
- >
- > Thanks for listening,
- >
- > Ken Reneris
- > Microsoft Corp.
- > kenr@microsoft.com
- __
- Lee Fisher, (not a spokesperson for) Microsoft Corp., Redmond, WA, USA
- leefi@microsoft.com, {uunet,uw-beaver,sun,sco,decvax}!microsoft!leefi
-