Kowarschik-2002-Overview

Páginas: 34 (8352 palabras) Publicado: 13 de mayo de 2012
10. An Overview of Cache Optimization
Techniques and Cache-Aware Numerical
Algorithms∗
Markus Kowarschik and Christian Weiß

10.1 Introduction
In order to mitigate the impact of the growing gap between CPU speed and
main memory performance, today’s computer architectures implement hierarchical memory structures. The idea behind this approach is to hide both
the low main memory bandwidth and thelatency of main memory accesses
which is slow in contrast to the floating-point performance of the CPUs. Usually, there is a small and expensive high speed memory sitting on top of the
hierarchy which is usually integrated within the processor chip to provide
data with low latency and high bandwidth; i.e., the CPU registers. Moving
further away from the CPU, the layers of memory successively becomelarger
and slower. The memory components which are located between the processor core and main memory are called cache memories or caches. They are
intended to contain copies of main memory blocks to speed up accesses to
frequently needed data [378, 392]. The next lower level of the memory hierarchy is the main memory which is large but also comparatively slow. While
external memory such as harddisk drives or remote memory components in a
distributed computing environment represent the lower end of any common
hierarchical memory design, this paper focuses on optimization techniques
for enhancing cache performance.
The levels of the memory hierarchy usually subset one another so that
data residing within a smaller memory are also stored within the larger memories. A typical memoryhierarchy is shown in Fig. 10.1.
Efficient program execution can only be expected if the codes respect the
underlying hierarchical memory design. Unfortunately, today’s compilers cannot introduce highly sophisticated cache-based transformations and, consequently, much of this optimization effort is left to the programmer [335, 517].
This is particularly true for numerically intensive codes, which our paperconcentrates on. Such codes occur in almost all science and engineering
disciplines; e.g., computational fluid dynamics, computational physics, and
mechanical engineering. They are characterized both by a large portion of
floating-point (FP) operations as well as by the fact that most of their execution time is spent in small computational kernels based on loop nests.


This research is beingsupported in part by the Deutsche Forschungsgemeinschaft
(German Science Foundation), projects Ru 422/7–1,2,3.

U. Meyer et al. (Eds.): Algorithms for Memory Hierarchies, LNCS 2625, pp. 213-232, 2003.
© Springer-Verlag Berlin Heidelberg 2003

Markus Kowarschik and Christian Weiß

CPU

L1 Data Cache

L1 Inst Cache

Main Memory

Registers
L3 Cache

214

L2 Cache

Fig. 10.1. A typical memory hierarchycontaining two on-chip L1 caches, one
on-chip L2 cache, and a third level of off-chip cache. The thickness of the interconnections illustrates the bandwidths between the memory hierarchy levels.

Thus, instruction cache misses have no significant impact on execution performance. However, the underlying data sets are typically by far too large to
be kept in a higher level of the memory hierarchy; i.e.,in cache.
Due to data access latencies and memory bandwidth issues, the number
of arithmetic operations alone is no longer an adequate means of describing
the computational complexity of numerical computations. Efficient codes in
scientific computing must necessarily combine both computationally optimal
algorithms and memory hierarchy optimizations. Multigrid methods [731],
for example, are among themost efficient algorithms for the solution of large
systems of linear equations. The performance of such codes on cache-based
computer systems, however, is only acceptable if memory hierarchy optimizations are applied [762].
This paper is structured as follows. In Section 10.2, we will introduce
some fundamental cache characteristics, including a brief discussion of cache
performance analysis...
Leer documento completo

Regístrate para leer el documento completo.

Estos documentos también te pueden resultar útiles

  • Japan Overview
  • Bi overview
  • Atg Overview
  • Tcg Overview
  • Brazilian Insurance Market Overview
  • Plan 2002
  • Overview
  • Overview Of Federalism

Conviértase en miembro formal de Buenas Tareas

INSCRÍBETE - ES GRATIS