Data Cache Write Back and Invalidation

All CPUs use the MIPS R2000, R3000, R4000, or R8000 (formerly known as "TFP") series processor chips. For all practical purposes, the R4400 and R4600 are the same as the R4000. These chips use a data cache to speed access to heavily used memory.

The IP5, IP7, IP11, IP15, IP19, and IP21 multiprocessor CPUs have bus-watching caches that automatically invalidate the data cache when the system performs DMA (direct memory access) into physical memory. For these CPUs, no data cache write back or invalidation is required by software because these functions are performed by hardware.

DMA operations are categorized as DMA reads or DMA writes. DMA operations that transfer from memory to a device, and hence read memory, are DMA reads. DMA operations that transfer from a device to memory are DMA writes. Thus, you may want to think of DMA operations as being named from the point of view of what happens to memory.

The single-processor CPUs based on the R2000 or R3000 employ a cache architecture known as a write through cache. This means that all stores generated by the processor to a cached memory location go into the cache and into memory at the same time. Therefore, the data caches on these systems never contain data that is more recent than memory. However, after the system performs a DMA write into physical memory, the lines in the processor's data cache that correspond to this physical memory contain data that is stale with respect to memory. Therefore, after the DMA completes, a driver running on a single-processor CPU must explicitly invalidate the data cache before it reads from the corresponding cached addresses.
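The stale-cache problem described above can be reproduced in a toy user-space model. This is a minimal sketch, not driver code: the structure `toy_wt_cache`, the 4-byte `LINE_SIZE`, and every `wt_` function are hypothetical names invented for illustration, and the model holds a single cache line.

```c
#include <string.h>

#define LINE_SIZE 4   /* bytes per line in this toy model (assumption) */

/* Toy model: one cache line backed by one line of "physical memory". */
struct toy_wt_cache {
    unsigned char memory[LINE_SIZE];
    unsigned char line[LINE_SIZE];
    int valid;
};

/* CPU store: write through updates memory, and the cached copy if present. */
static void wt_store(struct toy_wt_cache *c, int off, unsigned char v)
{
    if (c->valid)
        c->line[off] = v;
    c->memory[off] = v;
}

/* CPU load: a hit returns the cached copy; a miss refills from memory. */
static unsigned char wt_load(struct toy_wt_cache *c, int off)
{
    if (!c->valid) {
        memcpy(c->line, c->memory, LINE_SIZE);
        c->valid = 1;
    }
    return c->line[off];
}

/* DMA write (device to memory): goes straight to memory, bypassing the cache. */
static void wt_dma_write(struct toy_wt_cache *c, const unsigned char *src)
{
    memcpy(c->memory, src, LINE_SIZE);
}

/* Models the effect of invalidating the line: the next load must miss. */
static void wt_invalidate(struct toy_wt_cache *c)
{
    c->valid = 0;
}

/* Returns the byte the CPU sees at offset 0 after a DMA write,
 * with or without an invalidate between the DMA and the load. */
static unsigned char wt_read_after_dma(int invalidate_first)
{
    struct toy_wt_cache c;
    unsigned char dev[LINE_SIZE] = { 0xAA, 0xAA, 0xAA, 0xAA };

    memset(&c, 0, sizeof c);
    wt_store(&c, 0, 0x11);     /* memory holds 0x11 */
    (void)wt_load(&c, 0);      /* line is now cached and holds 0x11 */
    wt_dma_write(&c, dev);     /* memory holds 0xAA, cache still 0x11 */
    if (invalidate_first)
        wt_invalidate(&c);
    return wt_load(&c, 0);
}
```

Calling `wt_read_after_dma(0)` returns the stale value 0x11, while `wt_read_after_dma(1)` returns the 0xAA that the device placed in memory.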

The R4000 and R8000 employ a cache architecture known as a write back cache. This means that stores generated by the processor go only into the cache; they are written back from cache to memory only when a cache miss causes that cache line to be replaced. With this type of cache, the cache can contain data that is newer than the corresponding memory locations. Memory is then stale with respect to the cache. Drivers that perform DMA reads from memory to device must explicitly write the cache back to memory before the DMA starts. On IP19 and IP21 platforms, DMA pulls data from the processor caches, if necessary, thus providing coherent I/O.
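The following toy user-space model illustrates why the write back must come before a DMA read. It is only a sketch under stated assumptions: `toy_wb_cache`, the 32-byte `LINE_SIZE`, and the `wb_` functions are hypothetical names, the model has a single cache line, and the simulated device reads memory without consulting the cache (that is, the I/O is not coherent).

```c
#include <string.h>

#define LINE_SIZE 32  /* bytes per line in this toy model (assumption) */

/* Toy model: one write back cache line over one line of "physical memory". */
struct toy_wb_cache {
    unsigned char memory[LINE_SIZE];
    unsigned char line[LINE_SIZE];
    int valid;
    int dirty;
};

/* CPU store: goes only into the cache and marks the line dirty. */
static void wb_store(struct toy_wb_cache *c, int off, unsigned char v)
{
    if (!c->valid) {
        memcpy(c->line, c->memory, LINE_SIZE);
        c->valid = 1;
    }
    c->line[off] = v;
    c->dirty = 1;
}

/* Models an explicit write back: a dirty line is copied to memory. */
static void wb_writeback(struct toy_wb_cache *c)
{
    if (c->valid && c->dirty) {
        memcpy(c->memory, c->line, LINE_SIZE);
        c->dirty = 0;
    }
}

/* DMA read (memory to device): the device sees memory, not the cache. */
static unsigned char wb_dma_read_byte(const struct toy_wb_cache *c, int off)
{
    return c->memory[off];
}

/* Returns the byte the device receives after the CPU stores 0x42,
 * with or without a write back before the DMA read. */
static unsigned char wb_device_sees(int writeback_first)
{
    struct toy_wb_cache c;

    memset(&c, 0, sizeof c);
    wb_store(&c, 0, 0x42);     /* cache holds 0x42, memory still 0x00 */
    if (writeback_first)
        wb_writeback(&c);
    return wb_dma_read_byte(&c, 0);
}
```

Without the write back, the device receives the stale 0x00 from memory; with it, the device receives 0x42.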

Recall the code examples that read kernel data in Chapter 3, "Writing a VME Device Driver," and Chapter 5, "Writing a SCSI Device Driver." Before reading the data, the driver code used the dki_dcache_inval() function to invalidate the appropriate data cache lines. When the data cache lines are invalidated, accessing the kernel data causes a cache miss and, thus, forces a read from physical memory. Therefore, to ensure driver portability, your driver must always use dki_dcache_inval() to invalidate the data cache. This is the case even though the dki_dcache_inval() functions defined for the IP5 and IP7 are stub functions that do no actual work. For the same reason, always use dki_dcache_wb() before starting a DMA from memory to device. See Table A-2 for a summary of cache line sizes for various MIPS processors.

If your driver uses the functions userdma() or physio() (physio() calls userdma() internally), the data cache is automatically written back and invalidated for you, no matter what system you are using. If your driver does not use these functions for a DMA write into cached memory, your driver must use dki_dcache_inval() to invalidate the data cache explicitly after the DMA completes.[18] Further, if your driver does a DMA read from memory to device, it must use dki_dcache_wb() to write back the data cache explicitly before the DMA is started.
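The ordering rules above can be summarized in a short sketch. This is user-space illustration only, not kernel code: the two `dki_` functions here are stand-ins that merely record that they were called, their (address, length) argument shape is an assumption to be checked against the reference pages for your system, and `start_dma()`, `wait_dma_done()`, `demo_buf`, and the event log are hypothetical helpers.

```c
#include <string.h>

#define MAX_EVENTS 8

static char event_log[MAX_EVENTS + 1];  /* one letter per call, in order */
static int n_events;
static char demo_buf[64];               /* hypothetical DMA buffer */

static void log_event(char e)
{
    if (n_events < MAX_EVENTS) {
        event_log[n_events++] = e;
        event_log[n_events] = '\0';
    }
}

static void reset_log(void)
{
    memset(event_log, 0, sizeof event_log);
    n_events = 0;
}

/* Stand-ins for the kernel routines; the argument shape is an assumption. */
static void dki_dcache_wb(void *addr, unsigned len)
{
    (void)addr; (void)len;
    log_event('W');
}

static void dki_dcache_inval(void *addr, unsigned len)
{
    (void)addr; (void)len;
    log_event('I');
}

/* Hypothetical device hooks: start a transfer, then wait for completion. */
static void start_dma(void *buf, unsigned len)
{
    (void)buf; (void)len;
    log_event('S');
}

static void wait_dma_done(void)
{
    log_event('D');
}

/* DMA read (memory to device): write back BEFORE the DMA starts. */
static const char *dma_read_from_memory(void *buf, unsigned len)
{
    reset_log();
    dki_dcache_wb(buf, len);
    start_dma(buf, len);
    wait_dma_done();
    return event_log;
}

/* DMA write (device to memory): invalidate AFTER the DMA completes. */
static const char *dma_write_to_memory(void *buf, unsigned len)
{
    reset_log();
    start_dma(buf, len);
    wait_dma_done();
    dki_dcache_inval(buf, len);
    return event_log;
}
```

For `demo_buf`, the DMA read path logs the sequence W, S, D and the DMA write path logs S, D, I, which matches the order the text requires.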

Cache Line Sizes by Processor Type

Processor Type	Primary Size (D/I)	Primary Type	Primary Line Size (D/I)	Secondary Size (D/I)	Secondary Type	Secondary Line Size (D/I)
R3000 (IP12)	32K/32K	D	4/64	None	None	None
R3000 (IP7)	64K/64K	D	4/64	None	None	None
R4000PC (IP20)	8K/8K	D	32/32	None	None	None
R4000SC (IP17)	8K/8K	D	32/32	1-4 MB	D	128/128
R4400MP (IP19)	16K/16K	D	32/32	1-4 MB	D	128/128
R4600PC (IP22)	16K/16K	2	32/32	None	None	None
R4600SC (IP22)	16K/16K	2	32/32	512K	2	128
R8000 (IP21)	16K/16K	D	32/32	4 MB	4	512
R8000 (IP26)	16K/16K	D	32/32	2 MB	4	128
In the Type columns, D denotes a direct-mapped cache, and a number denotes the set associativity (2 is two-way, 4 is four-way).

Buffer alignment is another consideration for DMA. The R4000 processor implements a secondary cache line size of between 4 and 32 words; the exact size depends on the CPU board implementation. Buffers used for DMA must be aligned on a cache line boundary, that is, at an address that is a multiple of the cache line size. To accomplish this, use the kmem_alloc() function with the KM_CACHEALIGN flag. This returns a buffer with the necessary alignment for the system.
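The alignment arithmetic itself is simple and can be sketched in portable C. The helper names below (`align_up`, `is_line_aligned`, `pad_to_line`) are hypothetical and are not part of any kernel interface; in a driver, kmem_alloc() with KM_CACHEALIGN does this work for you. The sketch assumes the line size is a power of two, as with the 128-byte secondary lines (32 words of 4 bytes each) listed in the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address up to the next multiple of line_size.
 * line_size must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t line_size)
{
    return (addr + line_size - 1) & ~(line_size - 1);
}

/* Nonzero if addr sits exactly on a cache line boundary. */
static int is_line_aligned(uintptr_t addr, uintptr_t line_size)
{
    return (addr & (line_size - 1)) == 0;
}

/* Round a length up to a whole number of cache lines, so that the
 * end of the buffer does not share a line with whatever follows it. */
static size_t pad_to_line(size_t len, size_t line_size)
{
    return (len + line_size - 1) & ~(line_size - 1);
}
```

With a 128-byte line, `align_up(0x1004, 128)` yields 0x1080, and a 100-byte request is padded to 128 bytes by `pad_to_line(100, 128)`.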

Note: In general, the R8000 has the same DMA alignment problems as the R4000. This is true for all systems with write back caches. Why is this alignment necessary? Suppose you have a variable, X, followed in memory by a buffer that you are going to use for a DMA write, so that X shares a cache line with the start of the buffer. If you invalidate the buffer prior to the DMA write, but then reference the variable X, the resulting cache miss brings part of the buffer back into the cache. When the DMA write completes, the cache is stale with respect to memory. If, however, you invalidate the cache after the DMA write completes, you destroy any value of the variable X that is held in the cache but has not yet been written back to memory.
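The situation in the note can be checked with a small layout sketch. The structures, the 128-byte `LINE_SIZE`, and the helper functions are all hypothetical, and the sketch assumes the structure itself begins on a cache line boundary, so that offsets within it map directly to cache lines.

```c
#include <stddef.h>

#define LINE_SIZE 128  /* assumed cache line size in bytes */

/* X immediately followed by the DMA buffer: they share a cache line. */
struct unpadded {
    int  x;
    char buf[256];
};

/* Padding pushes the buffer to the start of the next cache line. */
struct padded {
    int  x;
    char pad[LINE_SIZE - sizeof(int)];
    char buf[256];
};

/* Nonzero if two offsets from a line-aligned base fall in the same line. */
static int same_line(size_t off_a, size_t off_b)
{
    return off_a / LINE_SIZE == off_b / LINE_SIZE;
}

static int x_shares_line_unpadded(void)
{
    return same_line(offsetof(struct unpadded, x),
                     offsetof(struct unpadded, buf));
}

static int x_shares_line_padded(void)
{
    return same_line(offsetof(struct padded, x),
                     offsetof(struct padded, buf));
}
```

In the unpadded layout, X and the first bytes of the buffer fall in the same line, so invalidating the buffer also affects X. In the padded layout, the buffer starts at offset 128 and spans exactly two lines, so cache operations on it do not touch X.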

[18] On systems with write through caches, and on IP5, IP7, IP9, IP11, and so on, the dki_dcache_inval() functions are stub functions that perform no actual work.