- Path: sparky!uunet!charon.amdahl.com!amdahl!nsc!decwrl!adobe!usenet
- From: zstern@adobe.com (Zalman Stern)
- Newsgroups: comp.arch
- Subject: Re: IBM's FIRST RISC System/6000(tm) based Shared Memory Parallel Processor
- Message-ID: <1992Dec14.104305.17593@adobe.com>
- Date: 14 Dec 92 10:43:05 GMT
- References: <id.I6TV.241@ferranti.com>
- Sender: usenet@adobe.com (USENET NEWS)
- Organization: Adobe Systems Incorporated
- Lines: 79
-
- In article <id.I6TV.241@ferranti.com> peter@ferranti.com (Peter da Silva)
- writes:
- > chased@rbbb.Eng.Sun.COM (David Chase) writes:
- > > I've had this nagging suspicion for about a year that people building
- > > MP's were working awfully hard to maintain coherency where it just
- > > didn't matter. Given that most of the programming languages tell you
- > > to lock your data if it is shared, the "we're guarding against this"
- > > examples always looked like buggy programs to me.
- >
- > I guess that depends on what the consequences of running these buggy
- > programs are. Are they guarding against bringing the whole system down
- > (unacceptable) or simply crashing a single application (acceptable)?
-
- Actually, buggy programs aren't really the issue.
-
- You put locking in to serialize access to data. In the "simple"
- circumstances, this breaks your program into critical sections, each of
- which gets exclusive access to a collection of data. One can write perfectly
- good algorithms which do not work this way if you have consistent memory,
- but let's ignore that for the moment.
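-
- To make the "simple" case concrete, here is a minimal sketch in C with
- POSIX threads (my notation, not anything the systems discussed here would
- have used; the names are hypothetical): the lock and the data it guards are
- kept together, and each critical section holds the lock for the duration of
- its access.
-
-     #include <pthread.h>
-
-     /* Hypothetical shared counter; the mutex serializes all access to it. */
-     struct shared_counter {
-         pthread_mutex_t lock;
-         long            value;
-     };
-
-     /* Each call is one critical section with exclusive access to the data. */
-     void counter_add(struct shared_counter *c, long n)
-     {
-         pthread_mutex_lock(&c->lock);
-         c->value += n;                  /* protected update */
-         pthread_mutex_unlock(&c->lock);
-     }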
-
- The problem arises in that the data a critical section accesses is hard to
- enumerate. In some cases it may be relatively straightforward. (The finer
- grained your locks, the easier it should be to associate specific data with
- each one.) However, in a situation like running a good bit of the TCP/IP
- protocol stack under a single lock, identifying all the modified data is
- going to be tough. (And as was pointed out here by Vernon Schryver, there
- are good reasons to do this.) Dealing with shared data is an area of active
- language and compiler research.
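-
- To make the granularity point concrete, a hedged sketch (hypothetical
- names, same C-with-pthreads notation as above): with a fine grained,
- per-connection lock it is obvious which data a critical section covers,
- while a single lock over a big chunk of the protocol stack guards whatever
- data any code path under it happens to touch.
-
-     #include <pthread.h>
-
-     /* Fine grained: the lock travels with exactly the data it protects. */
-     struct tcp_conn {
-         pthread_mutex_t lock;       /* guards only the fields below */
-         unsigned long   snd_next;
-         unsigned long   rcv_next;
-     };
-
-     /* Coarse grained: one lock for a whole subsystem; the set of data it
-      * guards is implicit in the code paths that run while it is held. */
-     extern pthread_mutex_t tcpip_big_lock;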
-
- Even when you have identified the shared data, you need some way of making
- it consistent. Explicit flushing is one possibility. Then there is a
- performance hit for the flush instructions, and the writes to main memory.
- In a cache coherent system, the flushing will happen when the data is
- accessed by another processor (or as a result of normal write back
- operations). It's easy to see how snoopy caches might win here. (Of course it
- depends on how much slower your caches are than non-coherent ones.) Even in
- the snoopy cache world, there are many choices in the algorithms controlling
- cache line allocation and ownership.
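-
- For what explicit flushing looks like in software, here is a hedged sketch
- for a machine without hardware coherency (the cache_flush and
- memory_barrier primitives are hypothetical stand-ins for whatever
- machine-specific instructions or kernel calls a real system provides):
-
-     /* Hypothetical primitives on a non-coherent machine: write the named
-      * range back to main memory and invalidate it, and order stores. */
-     extern void cache_flush(void *addr, unsigned long len);
-     extern void memory_barrier(void);
-
-     struct message { int ready; char payload[64]; };
-
-     void publish(struct message *m)
-     {
-         /* ... fill in m->payload ... */
-         m->ready = 1;
-         memory_barrier();           /* order the writes */
-         cache_flush(m, sizeof *m);  /* pay for the flush and the write back
-                                        to main memory here; a coherent cache
-                                        would do this lazily, when another
-                                        processor actually asks for the line */
-     }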
-
- There is much ongoing research into scalable cache coherency hardware.
- Directory mechanisms and such are one particular method. This pushes up the
- number of processors for which cache coherency works. There are plenty of
- people predicting limits to this, but I tend to regard them in the same
- light as the people telling HP they can't make their chips any faster :-)
- You will still want to do things in software to try and keep data close to
- where it is being used though.
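-
- "Keeping data close to where it is used" is mostly a matter of data layout.
- One common software trick (a generic sketch, not anything specific to the
- machines discussed here; the 64-byte line size is an assumption) is to give
- each processor its own copy of hot data, padded so the copies never share a
- cache line and so never ping-pong between caches:
-
-     #define CACHE_LINE 64   /* assumed cache line size */
-
-     /* One counter per processor, each on its own cache line, so updates by
-      * different processors never contend for the same line. */
-     struct percpu_counter {
-         long value;
-         char pad[CACHE_LINE - sizeof(long)];
-     };
-
-     struct percpu_counter counters[32];    /* indexed by processor id */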
-
- In short, this is pretty complex stuff. For small numbers of processors
- (i.e. less than 8, 16 or 32 :-)) cache coherency is a win. Especially
- considering that the most important application for many of these systems is
- the UNIX kernel. After that, multi-processor UNIX vendors are interested in
- making standard database software run fast (e.g. Oracle, Informix, etc.).
- This certainly biases their design decisions. (Both the kernel and databases
- are large bodies of software that aren't going to rewrite themselves just
- because you came up with a whole new way to do the hardware.)
-
- IBM Research has been in the thick of the above-mentioned work for quite
- a while. Notable projects include RP3, the ACE research multiprocessor, and
- their Scientific Visualization System (SVS) machine. The ACE machine was a
- non-uniform memory access (NUMA) machine based on the ROMP (IBM RT)
- architecture. It supported up to five processors, with a practical limit of
- four if I recall. Since the RT didn't have caches, coherency wasn't as much
- of a problem. Bob Fitzgerald and a few others at IBM Yorktown ported
- Mach to this beastie. A lot of work went into algorithms to "page" data from
- one processor to another to get reasonable performance. The SVS machine is
- similar in principle but based on a larger number of Intel i860's. The
- coherency mechanism there was to not cache shared data.
-
- The Power/4 sounds like the next step in this line of development, being
- based on the RIOS processor. Qualitatively, these machines work fine, but
- have extremely large performance deviations between codes with good locality
- properties and those with bad ones. I got the impression that they don't
- make good general purpose timeshared multi-processors. Of course, evolution
- has a way of making better products...
- --
- Zalman Stern zalman@adobe.com (415) 962 3824
- Adobe Systems, 1585 Charleston Rd., POB 7900, Mountain View, CA 94039-7900
- "Yeah. Ask 'em if they'll upgrade my shifters too." Bill Watterson
-