Table of Contents
Performance Optimization - Part I
For More Information
Outline
SN0 Architecture (Origin 2000 and Onyx2)
Nonuniform Memory Access (NUMA)
SN0 Cache Coherence
Cache Coherence (cont.)
Cache Contention
False Sharing Example
False Sharing Example (cont.)
False Sharing Example (cont.)
MIPS R10000 Architecture
R10000 Execution Units
R10000 Cache Architecture
R10000 Cache Architecture (cont.)
Set Associativity
Set Associativity (cont.)
Set Associativity (cont.)
R10000 Cache Architecture (cont.)
Out-of-order Execution
Out-of-order Execution (cont.)
Speculative Execution
SN0 Input/Output
POWER, POWER2, and PowerPC Processors
POWER3 Processor
POWER3 Processor (cont.)
POWER3 Processor (cont.)
POWER3 Processor (cont.)
DOE ASCI IBM SPs
DOE ASCI IBM SPs (cont.)
Virtual Memory
Virtual Memory (cont.)
Integration of performance optimization with code development
Integration with code development (cont.)
Integration with code development (cont.)
Tuned Library Routines
Profiling: Locating the Hot Spots
SGI Profiling Tools
SGI Profiling Tools (cont.)
SGI Profiling Tools (cont.)
SGI SpeedShop
SGI SpeedShop (cont.)
SpeedShop Sampling Time Bases
SpeedShop Sampling Time Bases (cont.)
SpeedShop Sampling Time Bases (cont.)
SpeedShop Sampling Time Bases (cont.)
SpeedShop Sampling Time Bases (cont.)
SGI SpeedShop - Ideal Time Profiling
Call Hierarchy Display
Compiler Feedback File
Exception Profiling
Exception Profiling (cont.)
dprof
dprof (cont.)
AIX Profiling Tools
prof output
gprof output
tprof
tprof (cont.)
Xprofiler
SGI Compiler Optimizations
SGI Compiler Optimizations (cont.)
SGI Compiler IEEE Conformance
IEEE Conformance - Example
SGI Compiler Roundoff Control
Roundoff Control - Example
XL Fortran 5.5.1 Compiler Options
Compiler Options (cont.)
Compiler Options (cont.)
SGI Automatic Parallelization
SGI Automatic Parallelization (cont.)
Running a Multithreaded Program under IRIX 6.5
IBM Automatic Parallelization
IBM Automatic Parallelization (cont.)
Interprocedural Analysis (IPA)
IPA (cont.)
Loop Nest Optimization