Carsten Scholtes
Fachgruppe Informatik
Universitaet Bayreuth
Universitaetsstr. 30
95447 Bayreuth, Germany
email: carsten.scholtes@uni-bayreuth.de
With modern architectures, efficient use of caches is growing more and more important for performant execution of applications. Thus, given a specific architecture and a set of alternative source codes for a specific problem, it is interesting to compare the alternatives for their cache performance on the architecture under consideration. This paper presents a method which can be applied to the source code of a program to derive a set of calculations that can be evaluated in constant time and yields a probabilistic estimation of the number of cache misses to be expected when executing the program on a specific architecture. Architectures with one-level, direct-mapped caches are considered. The resulting set of calculations can be configured to architecture specific parameters like cache size, line size and the sizes of data types. The method is designed to be applicable to any program, in particular also to irregular applications like for example the Cholesky factorization of sparse matrices. Since the cache performance of an irregular application can depend heavily on the respective input, additional parameters describing the input of the application are allowed, as long as they are known beforehand or can be derived from the input in constant time. The method is based on the assumption that the probability for a miss accessing a specific memory line can be derived from the number of accesses to other memory lines that have been carried out since the last access to the same memory line. Starting with the source code of the program to be analysed, all places which generate memory accesses are identified. Each of them is analysed separately. The accesses through them are classified according to their behaviour considering the repeated use of memory lines. For each of these classes a specific miss probability is derived. A probability weighted sum over all classes yields the estimation of the number of misses. The method was developed with the intention to be automated and incorporated into a compiler. It has been applied to analyse both, a program to multiply a sparse matrix with a dense matrix and a program for the Cholesky-factorization of a sparse matrix. The Cholesky factorization is implemented according to a column based left-looking strategy. The sparse matrix is stored adhering to a compressed column storage scheme, where the array of row indices is further compressed when consecutive columns exhibit the same nonzero structure as their predecessors. The resulting estimations of the numbers of cache misses are compared with measurements of the respective programs.