Performance Optimization - Part I

4/7/99



Table of Contents

Performance Optimization - Part I

For More Information

Outline

SN0 Architecture (Origin 2000 and Onyx2)

Nonuniform memory access (NUMA)

SN0 Cache Coherence

Cache Coherence (cont.)

Cache Contention

False Sharing Example

False Sharing Example (cont.)

False Sharing Example (cont.)

MIPS R10000 Architecture

R10000 Execution Units

R10000 Cache Architecture

R10000 Cache Architecture (cont.)

Set Associativity

Set Associativity (cont.)

Set Associativity (cont.)

R10000 Cache Architecture (cont.)

Out-of-order Execution

Out-of-order Execution (cont.)

Speculative Execution

SN0 Input/Output

POWER, POWER2, and PowerPC Processors

POWER3 Processor

POWER3 Processor (cont.)

POWER3 Processor (cont.)

POWER3 Processor

DOE ASCI IBM SPs

DOE ASCI IBM SPs (cont.)

Virtual Memory

Virtual Memory (cont.)

Integration of performance optimization with code development

Integration with code development (cont.)

Integration with code development (cont.)

Tuned Library Routines

Profiling: Locating the Hot Spots

SGI Profiling Tools

SGI Profiling Tools (cont.)

SGI Profiling Tools (cont.)

SGI SpeedShop

SGI SpeedShop (cont.)

SpeedShop Sampling Time Bases

SpeedShop Sampling Time Bases (cont.)

SpeedShop Sampling Time Bases (cont.)

SpeedShop Sampling Time Bases (cont.)

SpeedShop Sampling Time Bases (cont.)

SGI SpeedShop - Ideal Time Profiling

Call Hierarchy Display

Compiler Feedback File

Exception Profiling

Exception Profiling (cont.)

dprof

dprof (cont.)

AIX Profiling Tools

prof output

gprof output

tprof

tprof (cont.)

Xprofiler

SGI Compiler Optimizations

SGI Compiler Optimizations (cont.)

SGI Compiler IEEE Conformance

IEEE Conformance - Example

SGI Compiler Roundoff Control

Roundoff Control - Example

XL Fortran 5.5.1 Compiler Options

Compiler Options (cont.)

Compiler Options (cont.)

SGI Automatic Parallelization

SGI Automatic Parallelization (cont.)

Running a Multithreaded Program under IRIX 6.5

IBM Automatic Parallelization

IBM Automatic Parallelization (cont.)

Interprocedural Analysis (IPA)

IPA (cont.)

Loop Nest Optimization

Author: Shirley Browne

Email: browne@cs.utk.edu