Introduction to PAPI
PAPI, or the Performance Application Programming Interface, is a machine-independent set of callable routines that provide access to the performance counters on most modern processors. It is installed on a variety of machines available through cs.utk. The illustration in this exercise should work on any of the Intel Pentium Linux machines (msc01 through msc08 or torc0 through torc8) or on any node of the boba or frodo SinRG clusters.
For further information about PAPI, check the PAPI website.
PAPI High Level Calls
PAPI is implemented in layers. The top layer consists of eight calls that provide a simple interface to PAPI functionality for many applications. An overview of these eight functions can be found on the PAPI manual pages. A single high-level call, PAPI_flops, is all that this illustration needs.
Using PAPI to Measure Execution Time
As described in the PAPI_flops documentation, a call to PAPI_flops returns four values through its pointer arguments:
*rtime -- total real time in seconds since the first PAPI_flops() call
*ptime -- total process time in seconds since the first PAPI_flops() call
*flpops -- total floating point operations since the first PAPI_flops() call
*mflops -- Mflop/s (millions of floating point operations per second) achieved since the latest PAPI_flops() call
The values of rtime and ptime are derived from the cycle counter on the Pentium chip and converted to seconds using the processor's clock speed, which PAPI computes by measuring the cycle counter against a system real-time clock. You can stop the counters used by PAPI_flops with a call to PAPI_stop_counters; the next call to PAPI_flops then starts over with fresh values for all returned parameters.
The Source Code
To illustrate the use of PAPI_flops for performance measurement, we provide a simple C routine that multiplies two matrices. The source code can be found in PAPI_flops.c. Note that all programs that use PAPI must #include papi.h. You can open each of these files in your browser and save them to your home area.
Running this example
To try out this example, log on to a machine on which PAPI is installed: any of msc01 through msc08 or torc0 through torc8, or a node of the boba or frodo SinRG clusters. Save the files PAPI_flops.c and papi.h into your area.
Execute the following command line to compile and link this test:
UNIX> gcc -I/usr/local/include -O0 PAPI_flops.c /usr/local/lib/libpapi.a -o PAPI_flops
When you run the program, you should get output similar to the following (your mileage may vary):
UNIX> PAPI_flops
Real_time: 0.077321
Proc_time: 0.077193
Total flpins: 2000000
MFLOPS: 25.909208
PAPI_flops.c PASSED
Programming on your own
To use PAPI_flops in your own code, you can either modify this file to suit your needs or copy the relevant pieces into code you have already written. Make sure to #include "papi.h", and remember that a -1 value in flpins will reset the counters. Experiment with the compile line to suit your needs.
Notes for Fortran
You can refer to the PAPI_flops documentation for the exact Fortran calling syntax, and to the PAPI Fortran page for more general information on calling PAPI routines from Fortran. Remember that the Fortran calls have an extra check parameter at the end to pass back error status. Also keep in mind that a long long value in C (64-bit integer) is an INTEGER*8 in Fortran, and a float in C is a REAL in Fortran.
A sample command line to compile and link the Fortran program foo
might look like this:
UNIX> f77 foo.f /usr/local/lib/libpapi.a -o foo.out