Cache Mapping in Challenge and Onyx Systems

The cache design in the Challenge/Onyx line depends on the CPU model in use. The basic Challenge uses the IP19 board with R4x00 processors. This CPU board uses a simple algorithm to assign a memory location to a cache line: the byte address is taken modulo the cache size to produce the cache address (a direct-mapped cache). As a result, two words separated in main memory by an exact multiple of the cache size always map to the same cache location.

Only one of those words can occupy the cache at a time, so a program that alternates between them takes a cache miss on every reference. This situation is surprisingly easy to create. The following code fragment performs poorly on an R4K Challenge system with a 1 MB cache:

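float part1[262144];  /* 262144 4-byte floats = 1 MB */
float part2[262144];  /* adjacent 1 MB, exactly one cache size after part1 */
int j;
for (j = 0; j < 262144; ++j)
    part1[j] = part2[j];  /* part1[j] and part2[j] share a cache location */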

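Because part2 begins exactly 1 MB after part1, part1[j] and part2[j] map to the same cache line. Loading part2[j] evicts the line holding part1[j], and storing into part1[j] evicts part2[j] in turn, so each assignment in the loop incurs two cache misses. (Some systems have caches of different sizes, but the same principle applies.)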

Note: The cache in the POWER Challenge, which uses the R8000 or R10000 CPU, does not use simple modulo mapping; it is set-associative, which makes it much more resistant to cache conflicts.