Liren Large Software Subsidy 15


Text File | 1991-03-11 | 23.0 KB | 521 lines

======================================================================
MACH Design Examples
----------------------------------------------------------------------

************************* Disclaimer *********************************
Note: The following application examples are provided solely for
the purpose of learning the MACH architecture and software. No
implicit guarantee of functionality or usability for any other
purpose is represented.
**********************************************************************

Overview

This section of the documentation provides several application
"shorts" to guide the beginning MACH user in learning the MACH
device. They also serve to instruct designers in alternate methods
of implementing useful structures. Each shows "clever and
significant" design use of the device resources. The logic in these
files may be described further in applications notes, exploring
their significance. These test cases explore processing options,
resource limits, simulation support, etc.

Counters:
------------------------
CNT_SMPL.SCH    device=MACH110
    16-bit up/down Counter with preload -- created with 74192
    macros. Shows "brute force" approach to counter. Explanation
    of connection problems demonstrated.

CNT_16B.SCH     device=MACH110
    Better implementation of 16-bit counter, built with 74163
    macros.

AMD_C16.110     device=MACH110
    16-bit up/down Counter with preload -- Vector PDS format.
    Shows use of byte-carry to minimize number of inputs that
    have to communicate between blocks.

CNT_31.PDS      device=MACH110
    31-bit up/down Counter with simulation

Data Routing Functions:
------------------------
AMD_CM8.110     device=MACH110
    Dual 8-bit data multiplexing function -- Preloadable Up/down
    counter feeding a 2:1 mux taking its input from counter
    outputs & pre-load data inputs. Extreme utilization of all
    device resources. Requires use of FITR expand & Full flags.

BRL_SHFT.PDS    device=MACH110
    8-bit Barrel Shifter - Shows need to space out large
    functions to allow sourcing of global inputs and use of
    feedback.

Arithmetic Functions:
------------------------
ADD_11B.PDS     device=MACH110
    11-bit Adder

ADD_4B.PDS      device=MACH110
    4-bit Adder

ADD_8B.PDS      device=MACH110
    8-bit Adder

Miscellaneous Functions:
------------------------
ORCADDMA.SCH    device=MACH110
    Design example used in the documentation tutorial; shows how
    to design for a MACH110 using the OrCAD/SDT III editor.

A_TO_D.sch      device=MACH110
    Successive approximation register for an analogue-to-digital
    converter system with simulation.

M7SEG1.PDS      device=MACH110
    7 Segment counter plus decoder with simulation.

SRAM1.PDS       device=MACH110
    Simple state machine for control of static RAM - from 29K
    handbook.

DRAM.PDS        device=MACH210
    Integration of 2 inter-linked state machines (originally 23S8
    designs) from pg 2-187 of the PAL Handbook.

RLLENDEC.PDS    device=MACH210
    Integration of Codec functions


Pre-Loadable U/D N-bit Counter

Purpose

These examples show how to implement wide counters within the MACH
architecture. They guide the user through some of the elementary
pitfalls and illustrate use of the FITR command line options for
successful fitting.

Large counters represent an important and useful application for
PLDs. They are also one of the most common means of "benchmarking"
what a PLD architecture can do. The ability to create wide product
terms across many flip-flops measures the global signal access of
the part. Speed of operation and logic utilization are easy to
evaluate because of the regular structure and the ease of
extrapolation to larger size devices and counters.

Logic Implementation

Type of Flip-flop

Counters can be implemented in a variety of ways. We are most
interested in those that count in a binary sequence (000, 001,
010, ...) and operate synchronously.
If implemented with only D-type flip-flops, wider OR structures are
needed for each successive bit added to the width of the counter.
For large counters (10-12 bits), this product term growth overcomes
most fixed product term allocations. After this point, some sort of
flip-flop banking is required and enabling of the clock source is
commonly utilized.

A Toggle flip-flop is much more efficient for implementing large
counters. Only one product term (2 total for up and down) per stage
is needed to tell each T-FF to complement. Whenever all bits lower
than its stage are "ones", the flip-flop changes state for counting
up -- likewise all "zeroes" triggers counting down. At each stage,
only the width of the AND gate changes, encompassing more stages in
the all "ones" determination.

Carry Generation

Carries can be pipelined when the device block size limits access to
counter stages. Creation of large AND gates is limited in most
segmented PLD architectures -- there is a certain "granularity" that
allows N input functions, but no more. For larger functions, a tree
or chain of elements must be created, usually implying additional
delay for logic propagation.

Pre-decoding

This would seem to limit the maximum frequency of operation, but
these AND terms can be pre-calculated and a spare macrocell
(flip-flop) can store the partial term. In the next stage the
completed AND term can be formed, to cause a given stage to count.
In this way, modular PLD architectures can feed partial terms to
where they are needed for later stages of the counter. The counter
will then operate at the maximum frequency allowed by a single stage
of logic propagation.

----------------------------------------------------------------------------
Example Files

Different implementations of counters are demonstrated in these
files:

Cnt16_Smpl.Sch
    16-bit up/down Counter with preload -- created with 74192
    macros. Shows "brute force" approach to counter.
    Explanation of connection problems demonstrated.

Cnt_16b.Sch
    Better implementation of 16-bit counter, built with 74163
    macros.

Cnt_16V.pds
    16-bit up/down Counter with preload -- Vector PDS format.
    Shows use of byte-carry to minimize number of inputs that
    have to communicate between blocks.

Cnt_31b.pds
    A maximal length (31-bit) counter for the MACH110.

Cnt_16s.pds
    AND/OR equation version of Cnt_16UDV used for size limit
    testing (Cnt_16r.pds is also on disk).

----------------------------------------------------------------------------
Detailed Discussion

Cnt16_Smpl.Sch

Cnt16_Smpl illustrates the "brute force" approach to building a
16-bit up/down counter. Entering the design using OrCAD's Draft
program takes only a few minutes to place the 74192 4-bit counter
macros and to interconnect the Carry and Borrow terms. No Connect
(NC) flags are placed on the carry and borrow out signals from the
highest counter stage.

When run through the MACH software, we can see that a reasonable set
of resources is used (for the MACH110).

                   Available    Used    Remaining
    Clocks:             2         1         1
    Pins:              38        37         1       97%
    IO_Mac:            32        16        16
    Total macro:       32        16        16
    Product terms:    128        80        16       62%

    MACH-PLD Resource Checks OK! - Utilization *: 69 %

The difficulty occurs when we get to block partitioning. We see here
that all of the block inputs are used by placing only a few counter
stages into each block. The remaining stages do not fit and remain
"unplaced".

FITR Flags

This testcase was run with the FULL flag on -- with it off, the
design would have failed similarly, but with only 1 equation in each
block. Error F580 announces the problem -- signals unplaced. Warning
F120 indicates the block partitioning measure is too high
(everything is connected to everything -- contributory factor).

What has happened is that all stages of combinatorial logic along
the carry / borrow path have been merged together, increasing the
number of inputs required at the upper stages. The logic for this
design is relatively simple, but the gating is very wide.

Superficially, there is no simple solution to the simple 16-bit
counter presented here. The design as drawn needs larger PLD blocks
(more inputs) than are available on the MACH. The only solution is
to drain the swamp -- change the design to require fewer inputs on
the higher stages.

----------------------------------------------------------------------------
Cnt16_Btr.Sch

Cnt16_Btr demonstrates the technique of redoing the logic to limit
the number of inputs to the logic driving the upper stages. This
time the 16-bit counter is built with 74163 macros and only counts
up. Each of the 8-bit counter stages employs standard logic for
connecting the ripple carry output (RCO) to the count enable (ENP)
within the block.

Carry Implementation

Carry between 8-bit blocks is constructed as a 7-bit AND gate
(decoding 1111.111x). A Node flag is placed on its output to force
the software to generate a separate equation for this logic, instead
of merging with other stages. This will require an additional
propagation delay, but since the lowest bit is not included, there
is extra time for this logic.

The lowest bit of the counter (C0) is not included in this carry,
but instead is coupled to all the upper stages of the counter. This
is done as an alternative to carry pipelining -- each counter bit
can "see" the LSB and make its own decision on counting, preserving
the full speed operation of the device.

FITR Flags

This example will be successfully fitted with no flags on.

Logic Hazard

This method of carry logic implementation comes with a slight
problem -- what happens when the counter is preloaded to
1111.1111.1111.1111? Does it "roll over" on the next clock cycle, if
it is enabled to count? No -- because the carry does not have time
to settle and thus won't be valid. This "minor" glitch is one reason
this sort of logic change can't be automated to happen transparently
to the user, but must be designed in manually.
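
One way to picture the hazard described above is a small behavioural
model in Python. This is only a sketch under a stated assumption: the
two-pass carry node is treated as lagging the counter by one clock,
so the value seen at a clock edge was decoded from the previous
state. The function name step and the lagged_carry argument are
hypothetical and do not come from the design files, and the model is
not a timing analysis of the device.

```python
def step(state, lagged_carry, enable=True):
    """Advance a 16-bit up counter by one clock edge.

    state        -- current 16-bit count
    lagged_carry -- the 7-input AND node (1111.111x) as decoded from the
                    state of the PREVIOUS clock (modelling assumption)
    Returns (next_state, carry_node_for_next_edge).
    """
    low, high = state & 0xFF, (state >> 8) & 0xFF
    c0 = low & 1                                # LSB, seen directly by every upper stage
    node_now = 1 if (low & 0xFE) == 0xFE else 0 # bits 1..7 of the low byte all ones
    if enable:
        # The upper byte counts only when the settled node AND the LSB agree.
        if lagged_carry and c0:
            high = (high + 1) & 0xFF
        low = (low + 1) & 0xFF
    return (high << 8) | low, node_now
```

Under this assumption, counting up from zero passes through all
65536 states in order, because bits 1 to 7 are stable for two clocks
before the carry is needed. Calling step(0xFFFF, 0) immediately after
a preload from a state whose low byte was not 1111.111x returns
0xFF00 rather than 0x0000: the low byte rolls over but the upper
byte does not count.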

----------------------------------------------------------------------------
Cnt_16V.pds

This file shows the same technique of input reduction applied to a
16-bit up/down counter. The design file is written using vectored
signal equations -- this significantly compacts the design file.

Carry / Borrow

For this design, both Carry and Borrow need to be implemented to
allow up and down count modes. Since each stage has access to the
count enable and the counting mode (UP), the carry and borrow logic
may be combined into one macrocell. Two product terms are used to
detect the all "ones" and all "zeros" conditions of the 7 bits as
before (1111.111x and 0000.000x). This combined signal is then fed
to each of the upper count stages and decoded to give the correct
counter operation.

FITR Flags

This example is routed successfully using the Expand flag. If this
is not done, it fails with 2 inputs incompletely routed. The block
partitioning display indicates that each byte of the counter is
placed into "natural" divisions and that the switch matrix is
"heavily" utilized (90% _ 69%). Significant user additions to the
functionality may jeopardize this fitting success.

----------------------------------------------------------------------------
Data Routing Functions

Purpose

Show how the I/O macro-rich architecture of the MACH implements data
routing functions. Introduce some resource-planning considerations
of dealing with the connection-dense routing functions.

Data routing functions are a classic area for PLD applications. With
the byte-wide paths and product term-clumps allocated to specific
I/O ports, the MACH devices are ideally suited for many functions.
We offer a modest selection of some particularly useful
configurations.

----------------------------------------------------------------------------
Example Files

Cnt_Mux8.pds
    Dual 8-bit data multiplexing function -- Preloadable Up/down
    counter feeding a 2:1 mux taking its input from counter
    outputs and pre-load data inputs. Extreme utilization of all
    device resources (91%). Requires use of FITR expand & Full
    flags.

Brl_Shft.pds
    8-bit Barrel Shifter - Shows need to space out large
    functions to allow sourcing of global inputs and use of
    feedback.

----------------------------------------------------------------------------
Detailed Discussion

Cnt_Mux8.pds

Dual 8-bit data multiplexing function -- Preloadable Up/down counter
feeding a 2:1 mux taking its input from counter outputs & pre-load
data inputs. Extreme utilization of all device resources (91%).
Requires use of FITR expand & Full flags.

    *** Device Resource Checks
                   Available    Used    Remaining
    Clocks:             2         1         1
    Pins:              38        37         1       97%
    IO_Mac:            32        16        16
    Total macros:      32        32         0
    Product terms:    128        94         0       72%

    MACH-PLD Resource Checks OK! - Utilization *: 90 %
    ...

----------------------------------------------------------------------------
8-bit Barrel Shifter

Shows need to space out large AND/OR functions to allow sourcing of
global inputs and use of feedback.

    *** Device Resource Checks
                   Available    Used    Remaining
    Clocks:             2         1         1
    Pins:              38        22        16       57%
    IO_Mac:            32         8        24
    Total macros:      32         8        24
    PTs:              128        80        32       62%

    MACH-PLD Resource Checks OK! - Utilization *: 47 %
    ...

----------------------------------------------------------------------------
Arithmetic Functions

Purpose

Demonstrate new ways of realizing arithmetic functions using MACH
devices. Show how the functional implementation can be structured
around the device architecture and the advantages that result.

Simple arithmetic functions form an important area of system
building blocks unaddressed by conventional PLDs. New high density
PLD architectures redo the balance sheet, opening up new application
areas.
Designer creativity is the key ingredient to satisfying these
opportunities.

----------------------------------------------------------------------------
Parallel Adder

Design Considerations

A basic 4-bit adder can be constructed by cascading 4 single-bit
full adders. The problem with this approach is that the delays
associated with the carry are additive. In the case of a 4-bit adder
this means 4 delays (60 ns in a 15 ns MACH device) before the final
result is obtained. This approach, using a MACH device, yields a
design that takes 8 macros and produces a result in 60 ns. This is
good from a macro usage viewpoint but takes too much time.

For later comparison purposes this adder design was expanded to fill
as much of a MACH110 as possible. The first trial (12 bits) would be
satisfactory for the number of macros (24), but would require too
many array inputs (36). An 11-bit adder using this approach was
successfully routed using the Expand Product Term option on the
compiler. Again this design is very efficient but causes problems in
the area of speed (165 ns).

Flores Sum & Carry

A classic way of designing fast adders was described in the 1960s by
Flores in his "The Logic of Computer Arithmetic". He describes a
look-ahead carry method where independent sum and carry terms are
generated for each bit and these are used to generate the full carry
terms. Using the intermediate sum and the full carry, the final sum
bits are generated. With lots of gates with very wide inputs it is
possible to generate an adder of any length in 3 gate delays.

Examination of the schematic for the 7483 in TI's 'yellow book'
shows a 4-bit adder constructed using this "LookAhead Carry"
methodology. The problem that exists with this approach in the MACH
family is one of inputs to the array. On a 4-bit adder this approach
takes 20 array inputs while using 19 macrocells. We could reduce
this somewhat by combining the G equations into the C equations,
resulting in 4 fewer macrocells and array inputs.
For both implementations of Flores's adders, the sums are generated
in 45 ns and the carry in 30 ns. At 15 macrocells and that speed,
this approach may be a good compromise over the ripple carry
approach.

If this approach is extended to fill a MACH110, the result is a
maximum of 7 bits using 25 macrocells with the sum in 45 ns and the
carry in 30 ns.

----------------------------------------------------------------------------
Logic Implementation

Another Approach

One needs to step back in methodology to generate an even more
efficient solution. Instead of using look-ahead carry methods, try a
brute force method. Let's make a 2-bit adder by writing the direct
equations without using intermediate nodes. This results in a 2-bit
adder with 1 delay and 6 macrocells used. If 2 of these are stacked
together we have our 4-bit adder using 12 macrocells instead of the
15 in the LookAhead Carry version and producing a result in 2 delays
rather than 3. Further, it uses only 10 array inputs as opposed to
16.

Expanding this method to the limits of a MACH110 produces an 11-bit
adder (at 12 bits we run out of product terms). This design operates
at 90 ns for both sums and carry.

Refinements

The final trial will be to apply "LookAhead Carry" techniques to the
2-bit chunk implementation. Using full "LookAhead" across all 8 bits
didn't work, but if the last 2 bits are implemented as a simple
2-bit adder, the MACH110 fits 8 bits and produces a sum and carry
result in 45 ns.

----------------------------------------------------------------------------
Summary

Guidelines

Remember that array inputs are golden and excellent results can be
obtained by restructuring logic with that fact in mind.

RULE : It is best to expand product terms up to the maximum
       supported by the part if you want to save delay time and/or
       array inputs.
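
The 2-bit chunk adder described under "Another Approach" can be
sketched in Python to check the logic. This is an illustrative model,
not the contents of the ADD_*.PDS files: the names add2, add4 and
chunk_delay_ns are hypothetical, and the delay helper assumes one
15 ns array pass per 2-bit chunk, which is consistent with the
figures quoted above (2 delays for 4 bits, 90 ns for 11 bits).

```python
def add2(a, b, cin):
    """2-bit adder chunk; every output depends on the chunk inputs only."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 ^ b0 ^ cin
    # c1 is a Python local for readability only; in a PLD equation it would
    # be expanded into the product terms of s1 instead of using a node.
    c1 = (a0 & b0) | (a0 & cin) | (b0 & cin)
    s1 = a1 ^ b1 ^ c1
    # Carry out written as a flat sum of seven product terms.
    c2 = ((a1 & b1)
          | (a1 & a0 & b0) | (a1 & a0 & cin) | (a1 & b0 & cin)
          | (b1 & a0 & b0) | (b1 & a0 & cin) | (b1 & b0 & cin))
    return (s1 << 1) | s0, c2


def add4(a, b, cin=0):
    """4-bit adder built by stacking two 2-bit chunks (2 delays)."""
    lo, c2 = add2(a & 3, b & 3, cin)
    hi, c4 = add2((a >> 2) & 3, (b >> 2) & 3, c2)
    return (hi << 2) | lo, c4


def chunk_delay_ns(bits, pass_ns=15):
    """Assumed delay: one array pass per 2-bit chunk, rounded up."""
    chunks = -(-bits // 2)  # ceiling division
    return chunks * pass_ns
```

An exhaustive check of add4 against ordinary integer addition takes
only 512 cases, and chunk_delay_ns(4) and chunk_delay_ns(11) give 30
and 90, matching the delays stated for the 4-bit and 11-bit designs.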

Macro Usage for Adders

Number of bits      4     6     7     8     9    10    11
---------------------------------------------------------------------
Ripple Carry        8    12    14    16    18    20    22
LookAhead Carry    15    21    25   Can't Do
2 Bit Chunks       12    18    20    24    26    30    32
2 Bit with Look    10    15    17    21   Can't Do

Sum Delay Times for Adders (ns for MACH110-15)

Number of bits      4     6     7     8     9    10    11
---------------------------------------------------------------------
Ripple Carry       60    90   105   120   135   150   165
LookAhead Carry    45    45    45   Can't Do
2 Bit Chunks       30    45    60    60    75    75    90
2 Bit with Look    45    45    45    45   Can't Do

Carry Delay Times for Adders (ns for MACH110-15)

Number of bits      4     6     7     8     9    10    11
---------------------------------------------------------------------
Ripple Carry       60    90   105   120   135   150   165
LookAhead Carry    30    30    30   Can't Do
2 Bit Chunks       30    45    60    60    75    75    90
2 Bit with Look    30    30    45    45   Can't Do

----------------------------------------------------------------------------
Example Files

ADD_11B.pds - 11-bit adder using Ripple Carry
ADD_4B.pds  - 4-bit adder using 2 Bit Chunks & Look-ahead Carry
ADD_8B.pds  - 8-bit adder using 2 Bit Chunks & Look-ahead Carry

----------------------------------------------------------------------------
Additional Discussion

Simplification may also be obtained in simulation if one considers
how the design works. A classic 4-bit adder with Carry In and Out
would require 2^9 = 512 input test conditions to exhaustively test
all possible combinations. A file with a large number of test
vectors runs the danger of running out of memory before the task is
complete.

In our designs, however, we used two independent 2-bit adders. A
2-bit adder requires only 2^5 = 32 combinations for an exhaustive
test. Noting that the Carry Out has 16 of the 32 input conditions
with a HIGH result and 16 with a LOW result, we may test both halves
of the adder. Noting where C2 is a "0" and where it is a "1" allows
the designer to correctly place the tests for the second module in
parallel with the first module.
This shortened example can be seen as applied to the 8-bit adder in
ADD_8B.PDS.

The number of simulation vectors is important since the software can
process only so many. It is limited by the number of vectors and the
number of signals within the simulation commands. Not only will many
vectors consume more time, but the simulation may well run out of
memory before any results are obtained.

----------------------------------------------------------------------------
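
The vector-pairing idea from the Additional Discussion can be
sketched in Python. This is an illustration only, assuming a 4-bit
adder made of two 2-bit modules; the names add2 and paired_vectors
are hypothetical and the vector format is not that of the simulator.
Each module-1 vector is grouped by the C2 it produces, then paired
with a module-2 operand pair, so both modules are tested
exhaustively in the same 32 vectors.

```python
from itertools import product


def add2(a, b, cin):
    """Reference model of one 2-bit adder module: (sum, carry_out)."""
    total = a + b + cin
    return total & 3, total >> 2


def paired_vectors():
    """Return 32 (a, b, cin) vectors for a 4-bit adder built from two
    2-bit modules, covering every input condition of both modules."""
    # Group the 32 module-1 input conditions by the carry (C2) they produce.
    by_carry = {0: [], 1: []}
    for a_lo, b_lo, cin in product(range(4), range(4), range(2)):
        by_carry[add2(a_lo, b_lo, cin)[1]].append((a_lo, b_lo, cin))

    vectors = []
    for c2 in (0, 1):
        # Module 2 needs all 16 operand pairs for each value of its carry in.
        upper_pairs = product(range(4), range(4))
        for (a_lo, b_lo, cin), (a_hi, b_hi) in zip(by_carry[c2], upper_pairs):
            vectors.append(((a_hi << 2) | a_lo, (b_hi << 2) | b_lo, cin))
    return vectors
```

The pairing works out evenly because exactly 16 of the 32 module-1
conditions give C2 = 1 and 16 give C2 = 0, as noted above, so 32
vectors replace the 2^9 = 512 of a fully exhaustive test.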