- Newsgroups: comp.arch
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!uwm.edu!cs.utexas.edu!wotan.compaq.com!twisto.eng.hou.compaq.com!croatia.eng.hou.compaq.com!leigh
- From: leigh@croatia.eng.hou.compaq.com (Kevin Leigh)
- Subject: COMPAQ PROPOSED SCALABLE I/O ARCHITECTURE
- Message-ID: <1992Dec10.025609.5164@twisto.eng.hou.compaq.com>
- Summary: Cost effective, higher performance, processor-independent I/O scheme
- Keywords: I/O, Point-to-Point, High Performance, Low Cost, Bus
- Sender: news@twisto.eng.hou.compaq.com (Netnews Account)
- Organization: Compaq Computer Corp.
- Date: Thu, 10 Dec 1992 02:56:09 GMT
- Lines: 342
-
- ************************************************************
- * *
- * COMPAQ COMPUTER CORPORATION *
- * HOUSTON, TX *
- * *
- * STRATEGIC TECHNOLOGY DEVELOPMENT GROUP *
- * *
- * We're tired of beating our heads against the wall *
- * trying to expand the performance of I/O buses. Let's *
- * face it, wide buses are just not a long-term, cost- *
- * effective solution for high-performance computer *
- * systems. So here is an alternative... *
- * *
- ************************************************************
- * *
- * Compaq presented a proposal at the PCMCIA sub-committee *
- * (CardBus) meeting in Deerfield Beach, Florida, on *
- * 12/7/92 for possible adoption as the CardBus standard. *
- * *
- ************************************************************
-
- Our proposed I/O solution is
- - hierarchical
- - point-to-point
- - a channel-based I/O architecture
- - scalable
- - high performance
- - processor- and endian-independent
- - low cost
-
- In short, NOT yet-another shared-wide-bus!!!
-
-
- CPU
- |
- |---memory
- MIOC
- / \
- IOC dev3
- / \
- dev1 dev2
-
- Figure-1
-
- The basic I/O subsystem consists of a "main" I/O concentrator
- (MIOC) which interfaces the I/O subsystem to any CPU/Memory
- architecture [figure-1]. One or more point-to-point
- channels propagate in a hierarchical manner from the MIOC to
- the devices. One such device is an I/O Concentrator (IOC)
- which allows the bandwidth of a single point-to-point
- channel to be shared amongst several devices. Other devices
- might include network interfaces, graphics, SCSI, etc. A
- few very low bandwidth devices may optionally share the
- same channel in a controlled manner.
-
- A channel is an interconnect between two ports, one residing
- in a (M)IOC and the other in a device. A port closer to
- the CPU-memory complex is referred to as upstream, and a
- port further from it as downstream. The physical channel
- consists of 12 signals:
- - a 50 MHz synchronizing clock
- - two handshake signals
- - eight data signals, and
- - a parity signal
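- As a sketch of how the parity signal might be generated for each
- byte-wide transfer (the proposal does not specify odd or even
- parity, so even parity is assumed here):

```python
def parity_bit(data_byte):
    """Even-parity bit for one byte-wide transfer (assumption: the
    proposal does not state whether parity is odd or even)."""
    p = 0
    for i in range(8):
        p ^= (data_byte >> i) & 1  # XOR of all eight data bits
    return p

# 0b10110010 has four 1-bits, so the even-parity bit is 0
print(parity_bit(0b10110010))
```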
-
- The small number of signals on a channel reduces the pin
- requirements for interface ICs and the physical size of
- add-in boards and connectors, reducing the cost of
- implementation. Because all data transfers are byte-wide
- and sequenced in increasing address order, there is no
- little- or big-endianness in the data. Also note that the
- channel pin-out is NOT a derivative of any particular CPU
- chip's memory interface signals, so the solution is not
- tied to any specific CPU architecture.
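- The endian independence can be illustrated with a small sketch: a
- write packet carries the bytes at addresses A, A+1, ... in that
- order, so a little-endian and a big-endian host emit the identical
- byte stream for the same memory image (the function name here is
- illustrative, not part of the proposal):

```python
def channel_stream(memory, addr, size):
    """Byte-wide transfers in increasing address order: the channel
    carries memory[addr], memory[addr+1], ... one byte at a time,
    regardless of how the host CPU orders multi-byte words."""
    return bytes(memory[addr:addr + size])

# Contents of addresses 0..3; the wire order matches address order
image = bytes([0x12, 0x34, 0x56, 0x78])
print(channel_stream(image, 0, 4).hex())  # 12345678
```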
-
- The proposed solution minimizes "out-of-band" signals. All
- operations within the I/O subsystem are carried out via
- packets. Even the synchronizing channel clock is propagated
- from the MIOC to the devices through IOCs. In other words,
- the channel clock is redistributed on every IOC's downstream
- port. The signal timing of each channel is independent of
- all other channels. This virtually eliminates the signal skew
- problems, especially for the clock, common in today's shared
- bus systems.
-
- The standard data transfer rate on a channel is 50
- MBytes/sec (MBPS) and the upper-bound will be limited by the
- physical environment of the channel. For example, a current
- high performance implementation with GTL (Gunning Transistor
- Logic) drivers and receivers could yield a transfer rate of
- 200 MBPS.
-
- Two ports on a channel "agree" on a transfer rate during
- initialization. Each port multiplies the 50 MHz channel
- clock to generate internal clocks that operate at the agreed
- transfer rate. Each IOC port can operate at a different
- transfer rate. A device designed to support high transfer
- rates also works at lower transfer rates, which means a
- device can be used at any level of the I/O hierarchy. The
- I/O "tree" should be carefully configured to optimize the
- bandwidth distribution according to the system and device
- requirements.
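- The rate agreement can be sketched as follows, assuming the simple
- rule that the two ports settle on the highest rate both support
- (the proposal only says the ports "agree" during initialization):

```python
BASE_CLOCK_MHZ = 50  # the synchronizing channel clock

def negotiate_rate(upstream_max_mbps, downstream_max_mbps):
    """Settle on the highest transfer rate both ports support, and
    derive the internal clock multiplier from the 50 MHz channel
    clock (this negotiation rule is an assumption, not from the
    proposal)."""
    rate = min(upstream_max_mbps, downstream_max_mbps)
    multiplier = rate // BASE_CLOCK_MHZ
    return rate, multiplier

# A 200 MBPS-capable GTL device paired with a 100 MBPS IOC port
print(negotiate_rate(100, 200))  # -> (100, 2)
```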
-
-
- Operations occur using packet-based commands that flow
- through the channels. Each packet generally contains four
- fields:
- - command (Read, Write, Error, Extensions/Interrupts, etc.)
- - size (the amount of data to be transferred)
- - address
- - data
-
- Only the command field is mandatory. All ports are
- responsible for some encoding and decoding of packets. Note
- that only one packet at a time can be transferred on a
- channel. This means there will be times when
- - a device cannot send or receive because an IOC downstream port
- does not have space or data, respectively, and
- - both ports want to send at the same time.
- The channel protocol supports handshaking for data transfers
- as well as for conflict resolution.
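- The packet format above can be sketched as a data structure (field
- widths and encodings are not given in the proposal, so none are
- assumed here):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    """One channel packet. Only the command field is mandatory;
    size, address, and data appear as the command requires."""
    command: str                     # e.g. "Read", "Write", "Error"
    size: Optional[int] = None       # bytes to transfer
    address: Optional[int] = None
    data: Optional[bytes] = None

read_req = Packet(command="Read", size=4, address=0x1000)
print(read_req.command, read_req.size)
```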
-
- The commands are designed to allow optimal use of the
- distributed I/O bandwidth and to let devices maintain high
- bandwidths even when latencies are long. For example, the
- read operations are split transactions, where each read
- packet is later answered by a corresponding read-response
- packet.
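- The split-transaction behavior can be sketched as follows; matching
- a read-response to its read by a tag is an assumption here (the
- proposal only says each read packet is later answered by a
- corresponding read-response packet):

```python
class SplitReads:
    """Track outstanding split-transaction reads. The channel is
    free for other packets between a read and its read-response."""
    def __init__(self):
        self.outstanding = {}  # tag -> (address, size)

    def issue_read(self, tag, address, size):
        self.outstanding[tag] = (address, size)

    def complete(self, tag, data):
        address, size = self.outstanding.pop(tag)
        assert len(data) == size, "read-response size mismatch"
        return address, data

ch = SplitReads()
ch.issue_read(1, 0x1000, 4)
ch.issue_read(2, 0x2000, 2)         # a second read overlaps the first
print(ch.complete(2, b'\xab\xcd'))  # responses may return out of order
```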
-
- The proposed solution is designed to allow the CPU to
- directly read and write registers of MIOCs, IOCs, and
- devices in a programmed I/O manner. However, the
- architecture particularly lends itself to Command List
- Processing Master devices. These entities run device-
- specific command contexts placed in system memory that define
- their operation. This potentially frees the main CPU to
- perform other operations.
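- Command-list processing can be illustrated with a toy sketch; the
- command names and fields below are entirely hypothetical, since the
- proposal defines no context format:

```python
# A bus-master device walks a list of device-specific commands placed
# in system memory, freeing the CPU; "op", "address", and "size" are
# hypothetical illustrations, not part of the proposal.
command_list = [
    {"op": "transfer", "address": 0x8000, "size": 512},
    {"op": "transfer", "address": 0x9000, "size": 256},
    {"op": "stop"},
]

def run_command_list(cmds):
    moved = 0
    for cmd in cmds:
        if cmd["op"] == "stop":
            break
        moved += cmd["size"]  # the device performs the transfer itself
    return moved

print(run_command_list(command_list))  # total bytes moved: 768
```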
-
-
- ************************************************************
- * *
- * So, what do you think? We are interested in making *
- * this proposal an open Industry Standard. As such, you *
- * may request further information, give us your comments, *
- * or ask questions. *
- * *
- * *
- * Contact: *
- * *
- * David Wooten, Manager, *
- * Strategic Technology Development *
- * Compaq Computer Corporation, *
- * 20555, SH 249, *
- * Houston, TX *
- * *
- * Email: davidw@twisto.compaq.com *
- * Phone: (713)378-7231 *
- * Fax: (713)374-2580 *
- * *
- * *
- * *
- * CONTINUE READING IF YOU'RE INTERESTED IN WHY WE SPENT *
- * TIME DEVELOPING THIS CONCEPT!!! *
- * *
- ************************************************************
-
- The question we asked ourselves was how can one build an I/O
- subsystem that is high performance, easy to interface, easy
- to expand, and inexpensive?
-
- A shared wide-bus is not the solution due to its inherent
- limitations, such as distributed capacitance, one-at-a-time
- transfers, and high pin counts on the motherboard and add-in
- cards.
-
- In the past, shared bus architectures made sense because of
- TTL technology and the high cost and low integration of
- silicon. The PC architecture that started almost a decade
- ago with a single, shared, 8-bit memory-I/O bus (known as
- "ISA") has evolved into many variants due to the ever-
- escalating performance requirements of x86 based CPU
- systems. This phenomenon is also seen in other system
- architectures based on non-x86 CPUs.
-
- One of the first changes in PC evolution was to move the CPU
- and the memory to a faster proprietary local bus when the
- ISA bus became a bottleneck. The I/O data bus was widened
- to 16-bit and then to 32-bit to provide higher bandwidth
- for some devices. There were two PC "wide-bus" solutions:
- EISA, a super-set of ISA for backward compatibility, and
- Micro Channel (MCA).
-
- An EISA system is more flexible than an ISA (or MCA) system
- because it can accept either EISA or ISA cards. Also, EISA
- cards in EISA systems can provide higher performance than
- ISA cards in ISA (or EISA) systems. EISA has served quite
- well for several high-performance devices (e.g., Compaq's
- high performance QVision graphics card). As a tradeoff
- for the higher bandwidth, the system and the add-in cards
- for the wider I/O buses are more expensive than comparable
- ISA solutions. Because ISA can provide adequate bandwidth
- for the majority of devices, several board vendors continue
- to build more boards for ISA because of the larger available
- market.
-
- More recently, color graphics and video applications have
- become popular for PCs. In addition, high pixel
- resolution and more bits-per-pixel (for color) displays
- have become more affordable. Consequently, the bandwidth
- demands for the graphics/video devices have become too
- large for even EISA and MCA. To correct this deficiency,
- several system OEMs have moved graphics onto higher
- bandwidth proprietary local buses. As the practice of
- moving faster devices onto the local bus became common,
- there was a need for a standard. Currently, there are
- two local bus standard proposals, namely, PCI (by Intel)
- and VL-Bus (by VESA).
-
- Existing processors do not have a PCI bus and future
- processors will not directly support the VL-Bus. Besides,
- PCI and VL have limited plug-in (card) support.
- Consequently, PC systems utilizing PCI or VL may have
- multiple levels of buses (CPU bus, local bus, standard I/O
- bus) and IC bridges to interface between different bus
- protocols. All of this leads to higher system and add-in
- board costs. Chip and board vendors also need to choose
- between many different interfaces and form-factors to
- support a wide range of system types.
-
- Worse yet, both PCI and VL buses will very likely try to
- satisfy future needs, such as higher bandwidth and 3.3V
- technology, by CHANGING the physical layer. Examples
- might include: a wider (64-bit) bus once 32-bits runs out of
- steam, and a different driver/receiver technology for higher
- frequencies. When a 32-bit bus is expanded to 64-bit, a
- high-bandwidth (e.g., video) device will not gain
- performance on the 64-bit bus unless it is redesigned for
- 64-bit transfers. Protocols might also have to change and
- many of these changes will render today's PCI and VL devices
- incompatible (PCI does, however, define a compatible
- transition to 64 bits). The additional drivers/receivers and
- new designs will make the new devices more expensive. The
- trick is to
- allow the aggregate system bandwidth to go up without having
- to redesign everything.
-
- Neither PCI nor VL supports ease-of-use features (hot-
- insertion and plug-and-play). Depending on the
- implementation, PCMCIA may not support hot-swap and may not
- offer enough performance for some applications. A PCMCIA sub-committee
- was created to come up with a PCMCIA super-set called
- CardBus. As mentioned earlier, board vendors made more ISA
- cards than EISA cards for cost reasons and larger market share.
- Similarly, we are guessing that board vendors will build
- more PCMCIA cards than CardBus cards if CardBus is a
- backward-compatible super-set of PCMCIA, i.e., a wider and
- more expensive 32-bit bus.
-
- To summarize: the PC industry has tried several times to
- meet ever-higher I/O performance requirements by
- utilizing a multitude of wide, shared buses. Most solutions
- did not last long because they were not scalable and they
- were EXPENSIVE. As a result most PC users continue to buy
- cheap, reasonable-performance solutions. What the
- computer industry needs is a solution that:
- (a) solves today's problems (performance, expansion and
- ease-of-use) in a cost effective manner AND
- (b) offers a migration path for those who want it AND
- (c) is processor independent AND
- (d) will provide longevity.
-
- Our proposed solution can offer ALL of these properties!
-
-
- ************************************************************
- * *
- * To summarize the major Features: *
- * *
- ************************************************************
-
- Our proposed solution is scalable in multiple dimensions:
- performance, expandability and cost. Specifically:
-
- FOR PERFORMANCE:
- (a) the CPU-memory bandwidth is decoupled from the slower
- device latencies,
- (b) point-to-point minimizes the physical constraints and
- GTL enables higher signal rates than TTL,
-
- FOR EXPANDABILITY:
- (c) a virtually unlimited number of devices can be connected
- on-board or via connectors,
- (d) expansion can also be made in external boxes via a small
- pin-count robust protocol (e.g., a fast serial link)
- (e) hot plug-and-play is supported,
-
- FOR COST:
- (f) small packages/connectors/boards and high integration
- can be achieved because of the small number of pins to
- interface,
- (g) the number of PCB layers can be minimized because of
- "clean" layout and no long traces (even in the case of
- busing, the fanouts are fairly short),
- (h) development cost can be reduced by utilizing common
- functional blocks (e.g., state machines, FIFOs),
- (i) time-to-market can be reduced by integrating common
- parts,
- (j) common parts for a wide range of systems enable large
- volume, fewer inventory part types and consistent
- test/manufacturing techniques,
-
- FOR COMPATIBILITY:
- (k) standard I/Os such as ROM and keyboards can be tapped
- off of an MIOC or IOC,
- (l) existing standard buses (e.g., ISA, PCMCIA) can also be
- implemented off MIOC or IOC,
- (m) Existing application software and operating systems will
- be compatible with some driver updates.
-
- FOR LONGEVITY:
- (n) the same devices can be reused in future systems because
- they are processor-neutral,
- (o) the same I/O subsystem or architecture can be used in a
- family of products (portables, desktops, workstations,
- servers),
- (p) the same device can be used for different transfer rates
- without changing the physical interface, and
- (q) changing a device interface in the future on a channel
- does not affect the rest of the devices [we call this
- "damage control"].
-
- ************************************************************
- * *
- * Thanks for reviewing our thoughts! Feel free to *
- * contact us for more information, we're here to *
- * help!!! *
- * *
- * Happy Holidays from (Strat. Tech. Dev.) *
- * David Wooten, Kevin Leigh, Reynold Starnes, *
- * Thanh Tran, Chris Simonich, Brett Costly, *
- * David Murray, Craig Miller, Roger Tipley *
- * *
- ************************************************************
-