Only one of the words can occupy the cache at a time, so if your program alternates between words, it will have a cache miss on each reference. It is surprisingly easy to create this situation. The following code fragment causes bad performance in an R4K Challenge system with a 1 MB cache.
    float part1[262144]; /* 1 MB */
    float part2[262144]; /* adjacent 1 MB */
    for (j=0;j<262144;++j)
        part1[j] = part2[j];

In that code fragment, the words of each array hash to the identical cache lines, so each assignment in the loop incurs two cache misses. (Some systems have caches of different sizes, but the same principle applies.)
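The mapping behind this conflict can be sketched in a few lines of C. This is a minimal illustration, not the hardware's actual logic: it assumes a 1 MB direct-mapped cache with simple modulus mapping, and the 128-byte line size is an assumption chosen for the example. The names `cache_line_index`, `addresses_conflict`, and `struct padded_arrays` are hypothetical helpers introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry for illustration: 1 MB direct-mapped cache.
   The 128-byte line size is an assumption, not taken from the text. */
#define CACHE_BYTES (1024u * 1024u)
#define LINE_BYTES  128u
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

#define N 262144    /* elements per array, as in the fragment above */

/* Simple modulus mapping: the cache line a byte address falls into. */
static unsigned cache_line_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_LINES);
}

/* Nonzero when two addresses lie in different memory lines
   but compete for the same cache line. */
static int addresses_conflict(uintptr_t a, uintptr_t b)
{
    return (a / LINE_BYTES != b / LINE_BYTES)
        && (cache_line_index(a) == cache_line_index(b));
}

/* Hypothetical layout: one cache line of padding between the arrays,
   so part1[j] and part2[j] no longer share a cache line index. */
struct padded_arrays {
    float part1[N];
    char  pad[LINE_BYTES];
    float part2[N];
};
```

With the arrays exactly 1 MB apart, `addresses_conflict(0, (uintptr_t)N * sizeof(float))` is nonzero, which matches the two misses per assignment described above. Under the same assumptions, the offsets of `part1` and `part2` inside `struct padded_arrays` map to neighboring cache lines instead, so corresponding elements stop evicting each other.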
Note: The cache in the POWER Challenge, which uses the R8000 or R10000 CPU, does not use simple modulus mapping; it is an associative memory that is much more resistant to cache conflicts.