The IP5, IP7, IP11, IP15, IP19, and IP21 multiprocessor CPUs have bus-watching caches that automatically invalidate the affected data cache lines when the system performs DMA (direct memory access) into physical memory. On these CPUs, software need not write back or invalidate the data cache, because the hardware performs these functions.
DMA operations are categorized as DMA reads or DMA writes. DMA operations that transfer from memory to a device, and hence read memory, are DMA reads. DMA operations that transfer from a device to memory are DMA writes. Thus, you may want to think of DMA operations as being named from the point of view of what happens to memory.
The single-processor CPUs based on the R2000 or R3000 employ a cache architecture known as a write-through cache. This means that every store the processor makes to a cached memory location goes into the cache and into memory at the same time. Therefore, the data caches on these systems never contain data that is more recent than memory. However, after the system performs DMA into physical memory, the data cache lines corresponding to that memory contain data that is stale with respect to memory. A driver running on a single-processor CPU must therefore explicitly invalidate the data cache after the DMA completes and before reading from the corresponding cached addresses.
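The staleness described above can be reproduced with a toy model. The sketch below is a user-space simulation under simplifying assumptions, not driver code: a single four-word cache line (LINE_WORDS), a plain array standing in for physical memory, no allocation on a store miss, and the hypothetical helpers wt_load(), wt_store(), wt_dma_write(), and wt_invalidate().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model: one write-through cache line of LINE_WORDS words
 * in front of a small array that plays the role of physical memory. */
#define LINE_WORDS 4

struct wt_line {
    bool valid;                   /* does the cache currently hold the line? */
    unsigned data[LINE_WORDS];    /* the cached copy */
};

static unsigned wt_memory[LINE_WORDS];

/* CPU load: a hit returns the cached copy; a miss refills the line from memory. */
static unsigned wt_load(struct wt_line *c, int i)
{
    assert(i >= 0 && i < LINE_WORDS);
    if (!c->valid) {
        memcpy(c->data, wt_memory, sizeof wt_memory);
        c->valid = true;
    }
    return c->data[i];
}

/* CPU store, write-through: memory is always updated, and the cached copy
 * too if the line is present, so the cache never holds newer data. */
static void wt_store(struct wt_line *c, int i, unsigned v)
{
    assert(i >= 0 && i < LINE_WORDS);
    wt_memory[i] = v;
    if (c->valid)
        c->data[i] = v;
}

/* DMA write (device -> memory): updates memory only; the cache is not told. */
static void wt_dma_write(int i, unsigned v)
{
    assert(i >= 0 && i < LINE_WORDS);
    wt_memory[i] = v;
}

/* Invalidate: drop the cached copy. Nothing is lost, since a
 * write-through line never holds data newer than memory. */
static void wt_invalidate(struct wt_line *c)
{
    c->valid = false;
}
```

In this model, a CPU store of 1 followed by a DMA write of 42 to the same word leaves wt_load() returning the stale 1 until wt_invalidate() forces the next load to refill the line from memory.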
R4000s and R8000s employ a cache architecture known as a write-back cache. This means that stores generated by the processor go only into the cache; a modified line is written back from the cache to memory only when a cache miss causes that line to be replaced. With this type of cache, the cache can contain data that is newer than the corresponding memory locations, so memory is stale with respect to the cache. Drivers that perform DMA reads from memory to a device must therefore explicitly cause the cache to be written back to memory before the DMA starts. On IP19 and IP21 platforms, however, DMA pulls data from the processor caches when necessary, thus providing coherent I/O.
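A companion sketch for the write-back case shows why a DMA read needs a write-back first. It is again a hypothetical user-space model of one four-word line (the wb_ names are illustrative only), and it assumes write-allocate on a store miss, a common write-back policy that the text does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model of one write-back cache line. */
#define WB_WORDS 4

struct wb_line {
    bool valid;                  /* does the cache hold the line? */
    bool dirty;                  /* cache holds data newer than memory */
    unsigned data[WB_WORDS];
};

static unsigned wb_memory[WB_WORDS];

/* CPU store, write-back: allocate the line on a miss, then update only
 * the cache and mark the line dirty. Memory is now stale. */
static void wb_cpu_store(struct wb_line *c, int i, unsigned v)
{
    assert(i >= 0 && i < WB_WORDS);
    if (!c->valid) {
        memcpy(c->data, wb_memory, sizeof wb_memory);
        c->valid = true;
    }
    c->data[i] = v;
    c->dirty = true;
}

/* DMA read (memory -> device): the device sees memory, never the cache. */
static unsigned wb_dma_read(int i)
{
    assert(i >= 0 && i < WB_WORDS);
    return wb_memory[i];
}

/* Write back: copy a dirty line to memory; the line stays valid and clean. */
static void wb_writeback(struct wb_line *c)
{
    if (c->valid && c->dirty) {
        memcpy(wb_memory, c->data, sizeof wb_memory);
        c->dirty = false;
    }
}
```

Here a CPU store of 7 is invisible to wb_dma_read(), which still returns 0 from memory, until wb_writeback() copies the dirty line out.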
Recall the code examples that read kernel data in Chapter 3, "Writing a VME Device Driver," and Chapter 5, "Writing a SCSI Device Driver." Before reading the data, the driver code used the dki_dcache_inval() function to invalidate the appropriate data cache lines. Once those lines are invalidated, accessing the kernel data causes a cache miss, which forces a read from physical memory. Therefore, to ensure portability, your driver must always use dki_dcache_inval() to invalidate the data cache, even though on the IP5 and IP7 dki_dcache_inval() is a stub function that does no actual work. Likewise, your driver should always use dki_dcache_wb() before starting a DMA from memory to a device. See Table A-2 for a summary of cache line sizes for various MIPS processors.
If your driver uses the functions userdma() or physio() (physio() calls userdma() internally), the data cache is automatically written back and invalidated for you, whatever system you are using. If your driver does not use these functions for a DMA write into cached memory, it must call dki_dcache_inval() to invalidate the data cache explicitly after the DMA completes.[18] Similarly, if it does not use these functions for a DMA read from memory to a device, it must call dki_dcache_wb() to write back the data cache explicitly before the DMA starts.
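The ordering rules above can be sketched as follows. dki_dcache_wb() and dki_dcache_inval() are the kernel functions named in this section and take an address and a byte count; here they are replaced by user-space stubs that only log the call so the sketch compiles and runs. dev_start_dma(), dev_wait_dma(), do_dma_read(), and do_dma_write() are hypothetical names standing in for device-specific code. A driver that goes through userdma() or physio() would omit the explicit cache calls.

```c
#include <assert.h>
#include <string.h>

/* User-space stubs so the sketch runs outside the kernel. In a real IRIX
 * driver, dki_dcache_wb() and dki_dcache_inval() come from the kernel;
 * here they only record that they ran. dev_start_dma() and dev_wait_dma()
 * are hypothetical device hooks. */
static char call_log[128];

static void log_call(const char *tag)
{
    assert(strlen(call_log) + strlen(tag) < sizeof call_log);
    strcat(call_log, tag);
}

static void dki_dcache_wb(void *addr, unsigned len)    { (void)addr; (void)len; log_call("wb "); }
static void dki_dcache_inval(void *addr, unsigned len) { (void)addr; (void)len; log_call("inval "); }
static void dev_start_dma(void *addr, unsigned len)    { (void)addr; (void)len; log_call("start "); }
static void dev_wait_dma(void)                         { log_call("wait "); }

/* DMA read (memory -> device): write back BEFORE starting, so the device
 * reads what the CPU last stored rather than stale memory. */
static void do_dma_read(void *buf, unsigned len)
{
    dki_dcache_wb(buf, len);
    dev_start_dma(buf, len);
    dev_wait_dma();
}

/* DMA write (device -> memory): invalidate AFTER completion, so the next
 * CPU load misses and fetches the newly transferred data from memory. */
static void do_dma_write(void *buf, unsigned len)
{
    dev_start_dma(buf, len);
    dev_wait_dma();
    dki_dcache_inval(buf, len);
}
```

The asymmetry is the point: the write-back must precede a DMA read, while the invalidation must follow a DMA write.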
Another consideration is buffer alignment for DMA. The R4000 processor implements a secondary cache line size of 4 to 32 words, depending on the CPU board implementation. Buffers used for DMA must be aligned on an address boundary that is a multiple of the cache line size. To accomplish this, use the kmem_alloc() function with the KM_CACHEALIGN flag, which returns a buffer with the necessary alignment for the system.
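The alignment arithmetic can be sketched in user space, where C11 aligned_alloc() stands in for kmem_alloc() with KM_CACHEALIGN. The helper names are hypothetical, the line size is passed in rather than queried from the system, and it must be a power of two (an R4000 line of 4 to 32 32-bit words is 16 to 128 bytes). Padding the length to whole lines is an extra precaution assumed here: it keeps other data out of the buffer's last line as well as its first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of line (line must be a power of two). */
static size_t round_up_to_line(size_t n, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (n + line - 1) & ~(line - 1);
}

/* True if p starts on a cache-line boundary. */
static int is_line_aligned(const void *p, size_t line)
{
    return ((uintptr_t)p & (line - 1)) == 0;
}

/* Hypothetical user-space stand-in for kmem_alloc(size, KM_CACHEALIGN):
 * start on a line boundary and pad the length to whole lines.
 * Release the result with free(). */
static void *alloc_dma_buffer(size_t size, size_t line)
{
    return aligned_alloc(line, round_up_to_line(size, line));
}
```

For example, a 300-byte request with 128-byte lines becomes a 384-byte allocation that starts on a 128-byte boundary.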
Note: The R8000 has, in general, the same DMA alignment problems as the R4000, as do all systems with write-back caches. Why is this alignment necessary? Suppose you have a variable, X, immediately followed by a buffer you are going to use for a DMA write, so that X and the start of the buffer share a cache line. If you invalidate the buffer before the DMA write but then reference X, the resulting cache miss brings part of the buffer back into the cache, and when the DMA write completes, that part of the cache is stale with respect to memory. If, however, you invalidate the cache after the DMA write completes, you discard any modified value of X that has not yet been written back to memory.
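The conflict in the note arises only when X and the buffer fall in the same cache line, which can be checked by comparing line indices. This hypothetical helper treats addresses as plain integers (the addresses in the usage example are made up) and takes the line size in bytes as an assumed parameter; if it reports sharing, neither invalidation order described in the note is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the cache line holding byte address a; line is the line size in bytes. */
static uintptr_t line_of(uintptr_t a, size_t line)
{
    return a / line;
}

/* Returns 1 if the object [obj, obj + obj_len) touches any cache line that
 * the DMA buffer [buf, buf + buf_len) also touches, else 0. */
static int shares_line(uintptr_t obj, size_t obj_len,
                       uintptr_t buf, size_t buf_len, size_t line)
{
    assert(obj_len > 0 && buf_len > 0 && line > 0);
    uintptr_t obj_first = line_of(obj, line);
    uintptr_t obj_last  = line_of(obj + obj_len - 1, line);
    uintptr_t buf_first = line_of(buf, line);
    uintptr_t buf_last  = line_of(buf + buf_len - 1, line);
    /* Two line ranges overlap if each starts no later than the other ends. */
    return obj_first <= buf_last && buf_first <= obj_last;
}
```

With 128-byte lines, a 4-byte X at 0x1000 and a buffer starting at 0x1004 share a line, whereas a buffer starting at the aligned address 0x1080 does not. A buffer that starts aligned but ends mid-line can still share its last line with whatever follows it.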