- Path: sparky!uunet!charon.amdahl.com!amdahl!nsc!decwrl!adobe!usenet
- From: zstern@adobe.com (Zalman Stern)
- Newsgroups: comp.arch
- Subject: Re: IBM's FIRST RISC System/6000(tm) based Shared Memory Parallel Processor
- Message-ID: <1992Dec14.104305.17593@adobe.com>
- Date: 14 Dec 92 10:43:05 GMT
- References: <id.I6TV.241@ferranti.com>
- Sender: usenet@adobe.com (USENET NEWS)
- Organization: Adobe Systems Incorporated
- Lines: 79
-
- In article <id.I6TV.241@ferranti.com> peter@ferranti.com (Peter da Silva)
- writes:
- > chased@rbbb.Eng.Sun.COM (David Chase) writes:
- > > I've had this nagging suspicion for about a year that people building
- > > MP's were working awfully hard to maintain coherency where it just
- > > didn't matter. Given that most of the programming languages tell you
- > > to lock your data if it is shared, the "we're guarding against this"
- > > examples always looked like buggy programs to me.
- >
- > I guess that depends on what the consequences of running these buggy
- > programs are. Are they guarding against bringing the whole system down
- > (unacceptable) or simply crashing a single application (acceptable)?
-
- Actually, buggy programs aren't really the issue.
-
- You put locking in to serialize access to data. In the "simple"
- circumstances, this breaks your program into critical sections, each of
- which gets exclusive access to a collection of data. One can write perfectly
- good algorithms which do not work this way if you have consistent memory,
- but let's ignore that for the moment.
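-
- To make the "simple" case concrete, here is a minimal sketch in C with
- POSIX threads (my notation, not anything the systems discussed here would
- have used; the names are hypothetical): the lock and the data it guards are
- kept together, and each critical section holds the lock for the duration of
- its access.
-
-     #include <pthread.h>
-
-     /* Hypothetical shared counter; the mutex serializes all access to it. */
-     struct shared_counter {
-         pthread_mutex_t lock;
-         long            value;
-     };
-
-     /* Each call is one critical section with exclusive access to the data. */
-     void counter_add(struct shared_counter *c, long n)
-     {
-         pthread_mutex_lock(&c->lock);
-         c->value += n;                  /* protected update */
-         pthread_mutex_unlock(&c->lock);
-     }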
-
- The problem arises in that the data a critical section accesses is hard to
- enumerate. In some cases it may be relatively straightforward. (The finer
- grained your locks, the easier it should be to associate specific data with
- each one.) However, in a situation like running a good bit of the TCP/IP
- protocol stack under a single lock, identifying all the modified data is
- going to be tough. (And as was pointed out here by Vernon Schryver, there
- are good reasons to do this.) Dealing with shared data is an area of active
- language and compiler research.
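-
- To make the granularity point concrete, a hedged sketch (hypothetical
- names, same C-with-pthreads notation as above): with a fine grained,
- per-connection lock it is obvious which data a critical section covers,
- while a single lock over a big chunk of the protocol stack guards whatever
- data any code path under it happens to touch.
-
-     #include <pthread.h>
-
-     /* Fine grained: the lock travels with exactly the data it protects. */
-     struct tcp_conn {
-         pthread_mutex_t lock;       /* guards only the fields below */
-         unsigned long   snd_next;
-         unsigned long   rcv_next;
-     };
-
-     /* Coarse grained: one lock for a whole subsystem; the set of data it
-      * guards is implicit in the code paths that run while it is held. */
-     extern pthread_mutex_t tcpip_big_lock;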
-
- Even when you have identified the shared data, you need some way of making
- it consistent. Explicit flushing is one possibility. Then there is a
- performance hit for the flush instructions, and the writes to main memory.
- In a cache coherent system, the flushing will happen when the data is
- accessed by another processor (or as a result of normal write back
- operations). It's easy to see how snoopy caches might win here. (Of course it
- depends on how much slower your caches are than non-coherent ones.) Even in
- the snoopy cache world, there are many choices in the algorithms controlling
- cache line allocation and ownership.
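-
- For what explicit flushing looks like in software, here is a hedged sketch
- for a machine without hardware coherency (the cache_flush and
- memory_barrier primitives are hypothetical stand-ins for whatever
- machine-specific instructions or kernel calls a real system provides):
-
-     /* Hypothetical primitives on a non-coherent machine: write the named
-      * range back to main memory and invalidate it, and order stores. */
-     extern void cache_flush(void *addr, unsigned long len);
-     extern void memory_barrier(void);
-
-     struct message { int ready; char payload[64]; };
-
-     void publish(struct message *m)
-     {
-         /* ... fill in m->payload ... */
-         m->ready = 1;
-         memory_barrier();           /* order the writes */
-         cache_flush(m, sizeof *m);  /* pay for the flush and the write back
-                                        to main memory here; a coherent cache
-                                        would do this lazily, when another
-                                        processor actually asks for the line */
-     }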
-
- There is much ongoing research into scalable cache coherency hardware.
- Directory mechanisms and such are one particular method. This pushes up the
- number of processors for which cache coherency works. There are plenty of
- people predicting limits to this, but I tend to regard them in the same
- light as the people telling HP they can't make their chips any faster :-)
- You will still want to do things in software to try and keep data close to
- where it is being used though.
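-
- "Keeping data close to where it is used" is mostly a matter of data layout.
- One common software trick (a generic sketch, not anything specific to the
- machines discussed here; the 64-byte line size is an assumption) is to give
- each processor its own copy of hot data, padded so the copies never share a
- cache line and so never ping-pong between caches:
-
-     #define CACHE_LINE 64   /* assumed cache line size */
-
-     /* One counter per processor, each on its own cache line, so updates by
-      * different processors never contend for the same line. */
-     struct percpu_counter {
-         long value;
-         char pad[CACHE_LINE - sizeof(long)];
-     };
-
-     struct percpu_counter counters[32];    /* indexed by processor id */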
-
- In short, this is pretty complex stuff. For small numbers of processors
- (i.e. less than 8, 16 or 32 :-)) cache coherency is a win. Especially
- considering that the most important application for many of these systems is
- the UNIX kernel. After that, multi-processor UNIX vendors are interested in
- making standard database software run fast (e.g. Oracle, Informix, etc.).
- This certainly biases their design decisions. (Both the kernel and databases
- are large bodies of software that aren't going to rewrite themselves just
- because you came up with a whole new way to do the hardware.)
-
- IBM Research has been in the thick of the above-mentioned work for quite
- a while. Notable projects include RP3, the ACE research multiprocessor, and
- their Scientific Visualization System (SVS) machine. The ACE machine was a
- non-uniform memory access (NUMA) machine based on the ROMP (IBM RT)
- architecture. It supported up to five processors, with a practical limit of
- four if I recall. Since the RT didn't have caches, coherency wasn't as much
- of a problem. Bob Fitzgerald and a few others at IBM Yorktown ported
- Mach to this beastie. A lot of work went into algorithms to "page" data from
- one processor to another to get reasonable performance. The SVS machine is
- similar in principle but based on a larger number of Intel i860's. The
- coherency mechanism there was to not cache shared data.
-
- The Power/4 sounds like the next step in this line of development, being
- based on the RIOS processor. Qualitatively, these machines work fine, but
- have extremely large performance deviations between codes with good locality
- properties and those with bad ones. I got the impression that they don't
- make good general purpose timeshared multi-processors. Of course, evolution
- has a way of making better products...
- --
- Zalman Stern zalman@adobe.com (415) 962 3824
- Adobe Systems, 1585 Charleston Rd., POB 7900, Mountain View, CA 94039-7900
- "Yeah. Ask 'em if they'll upgrade my shifters too." Bill Watterson
-