TECHNOTE:
Understanding
PCI Bus Performance
TECHNOTE 1008 | OCTOBER 1995
By Paul Freeburn
freeburn@applelink.apple.com
Apple Developer Technical Support (DTS)
Because the IB chip competes with other system devices for access to system memory, continuous PCI bursting is not possible. Therefore the achievable PCI bandwidth on Power Macintosh computers, while a significant improvement over NuBus, is less than the theoretical PCI maximum. The bandwidth also depends on the PCI target's hardware design and on the architecture of the driver software.
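As a rough illustration of why uninterrupted bursting matters, the following C sketch computes sustained throughput for back-to-back transactions. It assumes a 32-bit (4-byte) PCI data path clocked at 33 MHz; `overhead_cycles` is a hypothetical stand-in for the clocks spent per transaction on arbitration, the address phase, and waiting for the IB chip to win access to system memory. The function name is hypothetical, and the figures below are arithmetic, not measurements of Power Macintosh hardware.

```c
#include <assert.h>

/* Sustained PCI throughput, in bytes per second, for back-to-back transactions.
 * clock_hz:        PCI clock (33 MHz assumed for 32-bit PCI)
 * bytes_per_beat:  width of the data path (4 bytes on 32-bit PCI)
 * data_beats:      data phases per transaction (8 for one 32-byte cache line)
 * overhead_cycles: hypothetical non-data clocks per transaction
 */
unsigned long long effective_bytes_per_sec(unsigned long long clock_hz,
                                           unsigned bytes_per_beat,
                                           unsigned data_beats,
                                           unsigned overhead_cycles)
{
    assert(data_beats > 0);
    unsigned long long bytes_per_txn  = (unsigned long long)bytes_per_beat * data_beats;
    unsigned long long clocks_per_txn = (unsigned long long)data_beats + overhead_cycles;
    /* Each transaction moves bytes_per_txn bytes in clocks_per_txn PCI clocks. */
    return clock_hz * bytes_per_txn / clocks_per_txn;
}
```

With no overhead, eight-beat bursts give 132,000,000 bytes per second, the theoretical ceiling for this bus; eight hypothetical overhead clocks per burst halve that to 66,000,000 bytes per second.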
A PCI burst transfer is defined as one PCI bus transaction with a single address phase followed by two or more data phases. How can the bus master transfer a data object on every PCI clock cycle? To initiate a bus transaction, the PCI master has to arbitrate for ownership of the bus only once. The master then issues the start address and transaction type during the address phase. It is the responsibility of the target device to latch the start address into an address counter and to increment the address from one data phase to the next. (A single-beat read or write transaction is defined as a single address phase followed by only one data phase.)
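The target-side address counter described above can be sketched as follows. The function name is hypothetical, and the sketch assumes a linear (sequentially incrementing) burst on a 32-bit bus, so the counter advances by one 4-byte long word per data phase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models a PCI target's address counter during a linear burst: the master
 * drives the start address once, in the single address phase, and the
 * target advances its own counter for every data phase that follows.
 * Writes the address used in each data phase to out[0..data_phases-1]. */
void burst_addresses(uint32_t start, size_t data_phases, uint32_t out[])
{
    assert(data_phases == 0 || out != NULL);
    uint32_t counter = start;            /* latched during the address phase */
    for (size_t i = 0; i < data_phases; i++) {
        out[i] = counter;                /* address for this data phase */
        counter += 4;                    /* one 4-byte long word per beat */
    }
}
```

For a start address of 0x1000 and eight data phases, the target uses addresses 0x1000 through 0x101C, although the master drove only the first of them.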
For data to be transferred between the PowerPC processor and a PCI target, or between a PCI target and system memory, one of the commands shown in Table 1 is initiated.
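For reference, the PCI Local Bus Specification encodes each bus command on the C/BE[3:0]# lines during the address phase. The C enum below lists those encodings; the enum and helper names are hypothetical, and Table 1 may show only the subset relevant to the IB chip. The helper flags the three commands this technote associates with cache line bursts.

```c
#include <assert.h>

/* PCI bus command encodings driven on C/BE[3:0]# during the address phase.
 * Encodings 0x4, 0x5, 0x8 and 0x9 are reserved. */
typedef enum {
    PCI_CMD_INTERRUPT_ACK           = 0x0,
    PCI_CMD_SPECIAL_CYCLE           = 0x1,
    PCI_CMD_IO_READ                 = 0x2,
    PCI_CMD_IO_WRITE                = 0x3,
    PCI_CMD_MEMORY_READ             = 0x6,
    PCI_CMD_MEMORY_WRITE            = 0x7,
    PCI_CMD_CONFIG_READ             = 0xA,
    PCI_CMD_CONFIG_WRITE            = 0xB,
    PCI_CMD_MEMORY_READ_MULTIPLE    = 0xC,
    PCI_CMD_DUAL_ADDRESS_CYCLE      = 0xD,
    PCI_CMD_MEMORY_READ_LINE        = 0xE,
    PCI_CMD_MEMORY_WRITE_INVALIDATE = 0xF
} PciCommand;

/* Nonzero for the commands that signal a cache-line-oriented transfer. */
int pci_is_cache_line_command(PciCommand cmd)
{
    return cmd == PCI_CMD_MEMORY_READ_LINE ||
           cmd == PCI_CMD_MEMORY_READ_MULTIPLE ||
           cmd == PCI_CMD_MEMORY_WRITE_INVALIDATE;
}
```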
With the basics of the PCI bus described and the details of the Power Macintosh PCI implementation outlined, there is now enough background to describe the functionality of the IB chip: in particular, which type of PCI command it performs under which circumstances.
If the PCI target's address space is set to write-through cache mode, the IB chip performs an eight-beat burst read on PCI with the Memory Read Line command. This corresponds to one cache line: eight 4-byte long words, or 32 bytes.
If the PCI target's address space is set to write-back (copy-back) cache mode, the IB chip performs an eight-beat burst write on PCI with the Memory Write and Invalidate command.
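The IB chip behavior in the two preceding paragraphs can be summarized as a lookup from cache mode to the cache line burst it produces. This is a descriptive model only, with hypothetical type and function names. The cache-inhibited case reflects the technote's later point that cache-inhibited space does not produce 32-byte cache line transactions, and is reported here simply as "no burst".

```c
#include <assert.h>

/* Cache setting of a PCI address range as seen by the PowerPC. */
typedef enum { MODE_CACHE_INHIBITED, MODE_WRITE_THROUGH, MODE_COPY_BACK } CacheMode;

typedef struct {
    int         is_write;   /* 1 = PowerPC to PCI target, 0 = PCI target to PowerPC */
    const char *command;    /* PCI command the IB chip issues */
    int         data_beats; /* data phases in the burst; 0 = no cache line burst */
} IbBurst;

/* Cache line burst the IB chip performs for a given cache mode. */
IbBurst ib_cache_line_burst(CacheMode mode)
{
    IbBurst b = { 0, "no cache line burst", 0 };
    switch (mode) {
    case MODE_WRITE_THROUGH:   /* cache line read: 8 x 4-byte long words */
        b.is_write = 0; b.command = "Memory Read Line"; b.data_beats = 8;
        break;
    case MODE_COPY_BACK:       /* cache line write */
        b.is_write = 1; b.command = "Memory Write and Invalidate"; b.data_beats = 8;
        break;
    case MODE_CACHE_INHIBITED: /* default for PCI space: no 32-byte bursts */
        break;
    }
    return b;
}
```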
A PCI Memory Write and Invalidate command performs an eight-beat transaction if the address is aligned on a 32-byte boundary.
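Because an eight-beat Memory Write and Invalidate burst needs a 32-byte-aligned start address, a transfer can use it only for the whole, aligned cache lines inside a buffer. The sketch below, with hypothetical function names and assuming `start + len` does not overflow 32 bits, tests alignment and counts those lines.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* cache line size used by the IB chip */

/* Nonzero if addr starts on a 32-byte boundary. */
int mwi_aligned(uint32_t addr)
{
    return (addr & (CACHE_LINE - 1)) == 0;
}

/* Number of whole, 32-byte-aligned cache lines inside [start, start + len).
 * Only these lines qualify for eight-beat MWI bursts; leading or trailing
 * partial lines do not. */
uint32_t mwi_full_lines(uint32_t start, uint32_t len)
{
    uint32_t first = (start + CACHE_LINE - 1) & ~(CACHE_LINE - 1); /* round up   */
    uint32_t end   = (start + len) & ~(CACHE_LINE - 1);            /* round down */
    return end > first ? (end - first) / CACHE_LINE : 0;
}
```

A 96-byte buffer starting at 0x1004 contains two such lines, at 0x1020 and 0x1040; the partial lines at each end are not eligible for an eight-beat MWI burst.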
The PCI Memory Read Line and Memory Read Multiple commands perform an eight-beat transaction if the start address is at least 8 bytes below the next 32-byte boundary. The IB chip treats Memory Read Line and Memory Read Multiple the same way: in either case it disconnects after an eight-beat transaction, that is, after one 32-byte cache line.
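Since the IB chip disconnects after one 32-byte cache line, a bus master reading a large buffer from system memory has to start a new PCI transaction for every line. The sketch below counts those transactions. It assumes a 32-byte-aligned buffer, which sidesteps the start-address condition above, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PCI read transactions needed when the IB chip disconnects after one
 * 32-byte cache line (eight beats) per Memory Read Line or Memory Read
 * Multiple command. Covers 32-byte-aligned buffers only. */
uint32_t ib_read_transactions(uint32_t start, uint32_t len)
{
    assert((start & 31u) == 0);   /* sketch assumes an aligned buffer */
    return (len + 31u) / 32u;     /* one transaction per (possibly partial) line */
}
```

Reading a 4,096-byte buffer, for example, takes 128 separate transactions, each with its own address phase.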
The numbers in Tables 2 and 3 are based on the following assumptions:
With this reference release of the OS, Apple begins to distinguish between APIs (Application Programming Interfaces) and SPIs (System Programming Interfaces).
In the present Mac OS release, and in future releases such as Copland, APIs and Toolbox services are no longer available to driver software. Mac OS 7.5.2 provides the Driver Services Library (DSL), which implements all the SPI services available to drivers; it is documented in Chapter 9 of Designing PCI Cards and Drivers for Power Macintosh Computers.
Among the services in the DSL are PrepareMemoryForIO and CheckpointIO. The PrepareMemoryForIO function makes buffer memory resident in system memory, provides its logical and physical address information, and, together with CheckpointIO, manages coherency between system memory and the PowerPC caches. CheckpointIO is called after the buffer transfer is complete; it either relinquishes the memory back to the OS and adjusts the processor caches for coherency, or prepares the buffer for another I/O transfer.
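The pairing of PrepareMemoryForIO and CheckpointIO amounts to a small state machine around each DMA transfer. The following C sketch models only that ordering; its types and functions are hypothetical, perform no memory or cache operations, and do not reproduce the real DSL signatures, which are documented in Designing PCI Cards and Drivers for Power Macintosh Computers.

```c
#include <assert.h>

/* Hypothetical model of the PrepareMemoryForIO / CheckpointIO ordering.
 * These functions only track state; they are NOT the DSL routines. */
typedef enum { BUF_RELEASED, BUF_PREPARED, BUF_DMA_ACTIVE } BufState;

typedef struct {
    BufState state;
    int      transfers;   /* completed DMA transfers on this preparation */
} DmaBuffer;

/* Models PrepareMemoryForIO: returns 0, or -1 if already prepared. */
int model_prepare(DmaBuffer *b)
{
    if (b->state != BUF_RELEASED) return -1;
    b->state = BUF_PREPARED;
    b->transfers = 0;
    return 0;
}

/* DMA hardware may only be started on a prepared buffer. */
int model_start_dma(DmaBuffer *b)
{
    if (b->state != BUF_PREPARED) return -1;
    b->state = BUF_DMA_ACTIVE;
    return 0;
}

/* Models CheckpointIO, called once the transfer is complete.
 * final != 0 models relinquishing the memory to the OS;
 * final == 0 models keeping it prepared for another transfer. */
int model_checkpoint(DmaBuffer *b, int final)
{
    if (b->state != BUF_DMA_ACTIVE) return -1;
    b->transfers++;
    b->state = final ? BUF_RELEASED : BUF_PREPARED;
    return 0;
}
```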
PrepareMemoryForIO should not be confused with PCIPrepareMemoryForIO. PCI cards that have DMA hardware should use PrepareMemoryForIO, a DSL service, to locate physical addresses in system memory. Older I/O expansion cards would typically use the Toolbox call GetPhysical to locate physical addresses in system memory. To be fully compatible with present and future Mac OS releases, drivers should use only SPI services. Again, this is fully documented in Designing PCI Cards and Drivers for Power Macintosh Computers.
Remember that PCI address space defaults to cache-inhibited mode. To let the PowerPC burst to an area of PCI memory space, that area must be set to a cacheable mode, which can be done with the SetProcessorCacheMode function (see Chapter 9 of Designing PCI Cards and Drivers for Power Macintosh Computers). Set the desired PCI address space to kProcessorCacheModeCopyBack for cache line writes and to kProcessorCacheModeWriteThrough for cache line reads.
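The cache mode advice above is a choice driven by transfer direction. The sketch below, with hypothetical names, returns the name of the constant to request from SetProcessorCacheMode (it does not call that function or reproduce the constants' numeric values). It also reports which 256-Mbyte effective-address segment, selected by the upper 4 address bits, a card's range starts in, because that segment matters for the SetProcessorCacheMode limitation described next.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { DIR_READ_FROM_CARD, DIR_WRITE_TO_CARD } TransferDir;

typedef struct {
    const char *mode_name; /* DSL constant to request, by name only */
    unsigned    segment;   /* 0..15: upper 4 bits of the effective address */
} CacheRequest;

/* Copy-back for cache line writes to the card, write-through for cache
 * line reads from it. */
CacheRequest plan_cache_request(uint32_t pci_base, TransferDir dir)
{
    CacheRequest r;
    r.mode_name = (dir == DIR_WRITE_TO_CARD) ? "kProcessorCacheModeCopyBack"
                                             : "kProcessorCacheModeWriteThrough";
    r.segment = (unsigned)(pci_base >> 28);
    return r;
}

/* Nonzero if two addresses fall in the same 256-Mbyte segment. */
int same_segment(uint32_t a, uint32_t b)
{
    return (a >> 28) == (b >> 28);
}
```

With the example addresses used below, 0x80800000 and 0x80801000 both fall in segment 8, while 0xA0000000 falls in segment A.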
SetProcessorCacheMode has an undocumented limitation. The PowerPC address space is divided into sixteen 256-Mbyte segments, distinguished by the upper 4 bits of the effective address. SetProcessorCacheMode can change the cache setting of only one contiguous section of memory per 256-Mbyte segment. Therefore, if two PCI cards both have PCI address assignments in the same segment, only one card can change the cache setting of its address space. For example, suppose two cards (card x and card y) have addresses mapped into segment 8, one at 0x80800000 and the other at 0x80801000. The first call to SetProcessorCacheMode, from the driver of card x, to make an address range in segment 8 cacheable will work. A second call, say from the driver of card y, to modify the cache setting in segment 8 will not work, nor will it report an error. This scenario will most likely result in lower than expected performance for card y, because card y's address space remains cache inhibited, which rules out 32-byte cache line PCI transactions. If the two cards are mapped into different segments, such as 8 and A, they can both modify the cache settings within their respective segments. This limitation will be relaxed in the future.
Extensions to the BlockMove routine that optimize performance on the PowerPC CPU family have been incorporated in the DSL. In particular, BlockMoveData is optimized for data that is cacheable, and BlockMoveDataUncached for data that is cache inhibited. The difference between the cached and uncached versions of these routines is that BlockMoveData uses the PowerPC dcbz instruction to avoid the logically unnecessary read of the destination cache blocks. BlockMoveDataUncached does not use dcbz, because dcbz is extremely slow on address space marked cache inhibited or write-through.
Table 4 lists the different BlockMove functions provided in the DSL.
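The cached/uncached guidance above reduces to one test on the destination's cache mode. The following C sketch, with hypothetical type and function names, returns the name of the DSL routine to call; it covers only the BlockMoveData versus BlockMoveDataUncached choice, not the case of blocks containing 68K code.

```c
#include <assert.h>
#include <string.h>

typedef enum { DEST_COPY_BACK, DEST_WRITE_THROUGH, DEST_CACHE_INHIBITED } DestCacheMode;

/* BlockMoveData uses dcbz to skip reading destination cache blocks, which
 * only pays off on copy-back cacheable space; dcbz is extremely slow on
 * write-through or cache-inhibited space, so those get the uncached routine. */
const char *block_move_for(DestCacheMode dest)
{
    return dest == DEST_COPY_BACK ? "BlockMoveData" : "BlockMoveDataUncached";
}
```

Write-through destinations map to BlockMoveDataUncached because dcbz is slow on write-through space as well as on cache-inhibited space.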
The difference between the BlockMove and BlockMoveData versions is whether or not the block being moved contains 68K instructions. If the data does contain 68K instructions, BlockMove must be called, because it also flushes the cache of the DR (Dynamic Recompilation) Emulator. This is costly in time, so if the block does not contain 68K instructions, be sure to use BlockMoveData or BlockMoveDataUncached. Also with performance in mind, when appropriate the BlockMove routines align the source and destination addresses so that they can use floating-point load and store instructions. Of the BlockMove routines, BlockMoveData or BlockMoveDataUncached should be used for transfers of large buffers between PCI cards, depending on whether the destination address space is marked write-back cacheable. Furthermore, PCI drivers will most likely not need the non-Data variants of the BlockMove routines, because destination buffers, whether in PCI address space or in system memory, will probably not hold 68K code to be executed. The BlockMoveData function will force the IB chip to burst 32-byte cache lines: eight-beat data phases per PCI command transaction.
Summary
The PCI bus on Power Macintosh computers delivers higher I/O performance, along with lower cost and complexity, than the previous NuBus architecture. PCI also represents an emerging standard in the desktop PC industry. To maximize bus performance, use the services available in the Driver Services Library, and pay close attention to PCI chip selection: in particular, choose chips that can execute cache line burst transactions with the Memory Read Line, Memory Read Multiple, and Memory Write and Invalidate commands. Finally, consider Designing PCI Cards and Drivers for Power Macintosh Computers essential documentation for successful PCI development on the Mac platform.