Xref: sparky comp.sys.atari.st:12498 comp.sys.atari.st.tech:4463
Newsgroups: comp.sys.atari.st,comp.sys.atari.st.tech
Path: sparky!uunet!elroy.jpl.nasa.gov!hanauma.jpl.nasa.gov!hyc
From: hyc@hanauma.jpl.nasa.gov (Howard Chu)
Subject: Re: Utterly bizarre idea for Atari
Message-ID: <1992Aug18.224051.6308@elroy.jpl.nasa.gov>
Sender: news@elroy.jpl.nasa.gov (Usenet)
Nntp-Posting-Host: hanauma.jpl.nasa.gov
Organization: SAR Systems Development & Processing, JPL
References: <1992Aug15.043618.17054@news.csuohio.edu>
Date: Tue, 18 Aug 1992 22:40:51 GMT
Lines: 197

In article <1992Aug15.043618.17054@news.csuohio.edu> max@madnick.cba.csuohio.edu (Max Polk) writes:
>Here it is in a nutshell: a parallel processing Atari microcomputer
>featuring a set of 68000 microprocessors, sold at low cost yet
>featuring high technology.

In 1985 or so, a company called Alliant Computer Systems, based in Littleton,
Massachusetts, introduced their FX series of machines. The main product was
the FX/8: eight 68020 clones in a single chassis. This mini-supercomputer was
outfitted with special vector-processor hardware as well, and cost around a
million bucks. Alliant went bankrupt this year, and FX/8s now cost around
$30K or so second-hand.

It's an interesting idea, but it takes hard work to get it right. Alliant
abandoned the Motorola family in favor of the Intel i860 (which I think was
a mistake), but their operating system was a pretty mature product already.
> * * *

>As I glanced at my JDR Microdevices catalog, I couldn't help but wonder
>why. Why is it that our thinking is so bound to a single microprocessor
>running the whole show? Why must a multitasking operating system be
>bound to a single microprocessor doing EVERYTHING. Isn't it obvious that
>we are running into the very real limits of clock frequency and chip
>size having to make the single microprocessor do more and more things
>faster and faster?

Sure is. But still, it's very difficult to break up a problem into segments
that can be run concurrently. If you can't partition your problem this way,
then parallel multiprocessing doesn't gain you anything.
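That limit can be made precise with Amdahl's law: if only a fraction p of a
program's work can run in parallel, then n processors buy at most
1/((1-p) + p/n) speedup. A quick modern sketch (Python, purely illustrative,
not part of the original post):

```python
def speedup(p, n):
    """Amdahl's law: best-case speedup on n processors when a
    fraction p of the program's work can run in parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# A program that is only 50% parallelizable never runs more than
# 2x faster, no matter how many 68000s you throw at it.
print(speedup(0.5, 27))     # ~1.93 with 27 processors
print(speedup(1.0, 27))     # 27.0 only if *everything* parallelizes
```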
>
>The catalog I referred to lists the price of microprocessors you can
>order: the Atari ST's Motorola 68000 costs $6.95, while the 68020
>costs $189.95. The IBM PC's Intel 8086 costs $5.95 while the 80486
>that runs at 50 MHZ costs $1199.00.

Hm. The cost is certainly attractive, but I still wouldn't want to use
anything less than a 68020 in a system I were to design today. A real
32-bit architecture, instead of the 16-bit data/24-bit addressing of the
68000, plus the support needed for virtual memory, exotic address spaces, etc.
Just can't see developing a Real Computing System without it.
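The gap in addressing reach is easy to put in numbers: 24 address bits versus
32. (A modern aside, mine, not the poster's:)

```python
# 68000: 16-bit data bus, 24-bit addressing; 68020: full 32-bit.
M68000_ADDR_BITS = 24
M68020_ADDR_BITS = 32

mb = (1 << M68000_ADDR_BITS) // (1 << 20)   # megabytes
gb = (1 << M68020_ADDR_BITS) // (1 << 30)   # gigabytes
print(mb, "MB directly addressable on a 68000")   # 16
print(gb, "GB directly addressable on a 68020")   # 4
```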
>
>Let's see: $189.95 / $6.95 is about 27, and $1199.00 / $5.95 is about
>201. Ignoring for a moment all of what you have grown to know and love
>about operating systems and microcomputer architecture, consider this
>possibility: couldn't you do more with 27 outstanding 68000's running
>exactly one process each, in parallel, than you could with one 68020
>running and switching between 27 processes, executing no more than one
>instruction at a time (for roughly the same amount of money)?

If you had 27 things you wanted to do right away, all at once, this would
be fantastic. In my day-to-day use, I seldom have more than 4 windows open
on my X terminal, and I'm generally only actively using one or two of them.
When I ftp a new distribution of the X window system, or GCC, or some other
large package, I actually do compile them in parallel on our FX/8 or FX/800,
using up to 8 processors at once on the FX/8. It's wonderful. But it's not a
common occurrence. (See my paper "GNU & You, Building a Better World" in
the proceedings of the 9th Annual Sun User Group conference...)
>
>For that matter, couldn't you do more with 201 good 8086's running
>exactly one process each, in parallel, than you could with one 80486
>running 201 processes all at once (for roughly the same amount of
>money)?

For roughly the same amount of money, you would have to scale back
severely from 201 processors. It takes a tremendous amount of logic to
synchronize a bus between a large number of processors.
>
>Consider what a great product Atari could have, flying into the twenty-
>first century at low cost, if they could write an operating system that
>is based upon the architecture of one microprocessor for each process
>that can run. It wouldn't be that hard, if my thinking is correct:
>each process runs on its own microprocessor with its own memory,
>totally protected, so as to achieve instances of deaf, blind, and mute
>minicomputers all inside the same box. (These computers inside of
>computers will be referred to as DBMT's: Deaf, Blind, and Mute aTari's.)

This is called asymmetric multi-processing. Sun is taking this approach in
Solaris 1.x, and going for symmetric multiprocessing in Solaris 2.0. The
Alliant Concentrix operating system was fully symmetric. It's just fine to
have one processor per Unix process, but if you have only one kernel "process,"
then you get serious performance bottlenecks. 27 or 201 processors making
system calls to a single OS process will be a major drag. You will find most
of your processors waiting in spin loops for the OS processor to complete
their service requests.
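A toy model (mine, not from the post) makes the serialization visible: if one
OS processor services requests strictly one at a time, the i-th of n
simultaneous callers spins through i full service times before its turn.

```python
def spin_waits(n_clients, service_time=1.0):
    """One OS processor, serial service: caller i spins while the
    i requests ahead of it are handled, one after another."""
    return [i * service_time for i in range(n_clients)]

waits = spin_waits(27)
print(max(waits))                # 26.0 -- the last caller's spin time
print(sum(waits) / len(waits))   # 13.0 -- average wait grows linearly with n
```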
>
>The operating system would only wait for system calls to be issued from
>the various DBMT's and then handle the requests.
>
>Each DBMT would be constructed on a single plug-in board and there
>would be as many of them as you had slots to plug them into. (This is
>beginning to take on mainframe proportions!)

That's a good idea. I think Alliant painted themselves into a corner by
only allowing a small fixed number of processors per system. (8 procs on
an FX/80, up to 28 on an FX/2800.) Better is the transputer approach, where
there is no predetermined limit to the number of processors you can connect.
But again, the control circuitry for this approach gets quite complex.
>
>The process that desired to run a program would issue a system call,
>and could either wait or not for it to finish. The operating system
>would start the next idle DBMT. Disk access and window graphics would
>also be system calls, giving good separation from the operating system,
>and excellent protection (no more crashes either). Actually, the
>protection would be perfect, leaving no possibilities whatsoever for
>the user to tinker with system globals, keyboard vector interrupts,
>vertical blanking interrupt routines, etc.

So you're implying absolutely no sharing of memory between processor nodes, eh?
That means all communication between a node and the OS processor has to be by
DMA transfer? That will also introduce a performance hit. Motorola's DMA
controllers only operate at speeds of up to around 10 or 12 MHz, for transfer
rates of about 4 or 5 MB/sec. That's not so terrible for older SCSI-1 block
devices, but that imposes tremendous overhead for tty-style character-at-a-time
I/O processing.
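The arithmetic behind those figures, under the assumption (mine, not stated in
the post) of a 16-bit DMA channel spending about five clocks per bus cycle:

```python
def dma_rate_mb_per_s(clock_mhz, bytes_per_cycle=2, clocks_per_cycle=5):
    """Sustained DMA throughput in MB/s.  Two bytes per cycle and five
    clocks per cycle are illustrative assumptions, not datasheet values."""
    return clock_mhz * bytes_per_cycle / clocks_per_cycle

print(dma_rate_mb_per_s(10))   # 4.0 MB/s at 10 MHz
print(dma_rate_mb_per_s(12))   # 4.8 MB/s at 12 MHz
```

With numbers in that range, every tty character moved by DMA costs a full bus
transaction's worth of setup, which is where the character-at-a-time overhead
comes from.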
>
>Perhaps this parallel concept would be applied to the filesystem as
>well. Don't wait for a single head to reposition itself over and
>over again while the programs are twiddling their thumbs -- use several
>small drives or some such scheme to prevent the tremendous speed of
>the parallel processors from bottlenecking at the disk drive.

Check out the literature on RAID - "Redundant Arrays of Inexpensive Disks."
All the rage in current high-performance disk storage systems.
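The core trick in a striped array (RAID level 0 in the usual numbering) is
just modular arithmetic on block numbers; a minimal sketch:

```python
def locate(block, n_disks):
    """Map a logical block number to (disk, block-within-disk) in a
    striped array: consecutive blocks land on different spindles,
    so several seeks/transfers can proceed in parallel."""
    return block % n_disks, block // n_disks

# Four consecutive blocks on a four-disk array hit four different drives.
print([locate(b, 4) for b in range(4)])   # [(0, 0), (1, 0), (2, 0), (3, 0)]
print(locate(6, 4))                       # (2, 1)
```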
>
>The speed would be so tremendous that no single microprocessor computer
>could ever compare, no matter how powerful or fast. And yet, the
>cost would be very low, since you are using the older 68000's and
>a simple parallel processing scheme.
>
>Breaking a program down into several subprograms each on its own
>DBMT with message passing between them through the operating system
>would speed up any program. So Atari could make the most effective
>microcomputer on the market today. Atari would supply the buyer with
>tools to write parallel-processing programs in ordinary languages
>to get things rolling.

This is a very hard problem. To be able to easily split a single program
among different processors, you really want to have shared memory between
those processors, which is something we've already excluded from this design.
Aside from that, it takes a different mind-set to get out of writing programs
in the start -> crunch -> stop model of linear computing. Current compiler
technology just takes traditional code and tries to run loops in parallel.
If you have a huge processing job but it just consists of a sequence of steps
that run in series, you can't get any benefit. (What you do then, of course,
is fire off multiple instances of the job on multiple processors... But if
you only need to run it once, you can't win.)
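The difference shows up directly in a program's data dependences; the
loop-parallelizing compilers mentioned above can only exploit the first shape
below. A small illustration (mine, not the poster's):

```python
def independent(xs):
    """Each iteration touches only its own element: a parallelizing
    compiler can split this loop across processors."""
    return [x * x for x in xs]

def serial_chain(xs):
    """Each iteration needs the previous result (a recurrence):
    extra processors cannot shorten the chain."""
    out, acc = [], 0
    for x in xs:
        acc = acc * 2 + x   # depends on the previous iteration's acc
        out.append(acc)
    return out

print(independent([1, 2, 3]))    # [1, 4, 9]
print(serial_chain([1, 2, 3]))   # [1, 4, 11]
```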
>
>Atari would have been one of the firsts to jump beyond the single
>microprocessor barrier for the average home computer, giving users
>fantastic power for the software that will soon arise that needs more
>than a single microprocessor can handle (even with a math coprocessor).
>
>Because each DBMT is just that, deaf, blind, and mute, it would not
>have to worry about I/O devices, leaving only the microprocessor and
>memory to be installed on each DBMT board. Such a modular design
>would be easier to construct and debug at the hardware level.

It's an interesting idea. Sounds much like a transputer network, but with
no internodal communications.
>
>It's about time that someone tries this out in one form or another
>for reasonably-priced small computers, and Atari could pull it off.
>If they did, it wouldn't be perfect, since no new ventures like this
>are, but they could start things rolling and place their name in
>history.

Actually, thinking about this a bit more, there's no reason to limit such
a design to any specific model of processor. OK, so we have a single bus
master that's basically a request dispatcher. When "the system" is booted,
this guy starts up, inits the I/O subsystems, and inits each processor
module. The processor modules and I/O modules can be pretty interchangeable,
basically a card that resides at a particular bus address, with a processor,
memory, and, if an I/O module, whatever I/O channels it requires.

Hm. With a 68020 or up as the central system, you can have up to 256
24-bit (i.e. 68000) nodes. (Assuming that you decode the function codes & such,
giving a full 32-bit address space for the 68020 itself, and another 32 bits
for the nodes... Maybe even more if you care to distinguish I/O nodes from
processor nodes.)
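That carving-up is just shift arithmetic: a 32-bit space holds 2^(32-24) = 256
windows of 16 MB, one per 68000 node. Hypothetical decode helpers (the names
are mine):

```python
NODE_ADDR_BITS = 24   # each 68000 node exposes a 24-bit (16 MB) window

def node_base(node):
    """Bus address where a node's window begins in the 32-bit space."""
    return node << NODE_ADDR_BITS

def node_of(addr):
    """Which node a 32-bit bus address falls in."""
    return addr >> NODE_ADDR_BITS

print(hex(node_base(255)))        # 0xff000000 -- the last of 256 windows
print(node_of(0x01FFFFFF))        # 1 -- top byte of the address selects the node
print(2 ** (32 - NODE_ADDR_BITS)) # 256 nodes fit
```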

OK, now, in order to make each single node useful, they need to have a good
amount of RAM. I guess 4M is good for most common tasks. But wouldn't it be
silly to have, say, 16 nodes == 64M of RAM in your system, but you can't run
a single program that wants 8M because your memory is in 4M chunks? I guess
this tells me that you've really got to have a single memory pool, with
shared access to all the processors.
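In numbers: partitioned memory caps any single program at one node's slice, no
matter how much total RAM the box has. A toy comparison (the function is mine,
purely illustrative):

```python
MB = 1024 * 1024

def largest_job(n_nodes, ram_per_node, shared_pool):
    """Biggest single program the machine can run: the whole pool if
    memory is shared, one node's slice if it is partitioned."""
    return n_nodes * ram_per_node if shared_pool else ram_per_node

print(largest_job(16, 4 * MB, shared_pool=False) // MB)  # 4  -- an 8M job won't fit
print(largest_job(16, 4 * MB, shared_pool=True) // MB)   # 64 -- the full pool
```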

Ah, this brings up another issue. So I can buy 27 68000s for the cost of one
68020, but I probably can't afford to outfit all 27 of 'em with enough RAM for
them all to do reasonable work, eh? I think it should be obvious from all this
that, when you're designing a computer system, deciding on the processor is one
of the easiest steps. Creating the memory system is the hard part; everything
depends on its performance.
--
-- Howard Chu @ Jet Propulsion Laboratory, Pasadena, CA
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::To the owner of the blue Mazda, license 742-XLT, your headlights are on...::
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::