PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004 (Home page)

Updated: 20 May 2004

Hybrid full packed format

John A. Gunnels and Fred G. Gustavson
IBM T.J. Watson Research Center
Yorktown Heights NY 10598, USA
email: gustav@watson.ibm.com

We introduce a new format called HFP ( Hybrid Full Packed ) We will cover the six points (1-6) below as well as seeing how HFP fits together with Standard Full and Packed Data Formats.

  1. Full format triangular and symmetric arrays waste half the storage but provide high performance via the use of level 3 BLAS

  2. Packed format arrays fully utilize storage but provide low performance as there are NO level 3 packed BLAS.

  3. Major New Idea : Combine (1) and (2) using HFP format to obtain high performance using L3 (Level 3) BLAS as HFP is totally full format. An Lapack full and/or packed symmetric/Hermitian routine becomes a single new HFP routine. There are some 125 x two such symmetric/Hermitian/triangular routines and hence some 125 new new routines can be produced.

  4. Major New Idea is possible because conceptually two equal or nearly equal isosceles triangles make a square or near square. Let N be given and n1 = N/2 and n2 = N - n1. We give an example where N = odd. When N is even the case is similar and hence will be omitted. Covering the odd and even case for UPLO = 'U' and 'L' is general. However, because of symmetry there can be different ways to define this format. The HPF format to be given here is my best guess of what will become the standard. It is essentially independent of uplo.

    Both UHFP and LHFP formats consist two full arrays. The 1st full array T holds two triangles T1 and T2 of sizes n1 and n2. The LDT is n1+1 and the number cols is n2. For uplo = 'L' T1 is lower format and T2 is in upper format. The same is true for uplo = 'U'. In the 'L' case T2 is stored in upper format so that the two triangles form a compact square. In the 'U' case T1 is stored in lower format so that the two triangles form a compact square.

    The second full array S1 contains the near square between the two triangles of sizes n1 and n2. For uplo = 'L' S1 is stored in row major order ( format is 'Transpose' ) and for uplo = 'U' S1 is stored in col major order ( format is 'Normal' ).

    Together these two arrays ( concatenation along the row dimension ) form a single full array of size 2*n1 + 1 rows by n2 columns. Note that LDS = n1 and that both arrays T and S1 are in column major format. The example is general for N odd and T + S1 hold exactly NT = N*(N+1)/2 elements as does the UP and LP arrays.

  5. Performance: Take any packed LAPACK L3 routine. It has a full L3 block algorithm. Write a simple related partition algorithm SRPA with partition sizes n1 and n2. Apply the new SRPA on the new HPF data structure. We give an example for DPPTRF ( the present routine ) The SRPA consists of 4 subroutine calls to existing L3 codes.

  6. It is suggested that format conversion routines DPTHF and DHFTP be made available. The method in (5) is quite general. It offers the possibility to replace all Lapack packed L2 codes and all Lapack full L3 codes with new simple L3 codes based on existing Lapack L3 full format codes.The appeal is especially strong for SMP codes as many L3 BLAS are enabled for SMP implementations. Note that DPTHF and DHFTP offer efficient SMP implementations. Finally, if this new HFP format catches on as it might one can offer any user the ability to only use full format L2 BLAS. The user will not use the current packed L2 BLAS as HFP L2 BLAS are easily implemented by calls to full format L2 BLAS. These new codes will be efficient as no calls to DPTHF or DHFTP are needed as the data format is already HPF.

    We give an example in (4) for the case N = 9.


    \begin{displaymath}\begin{array}{cc}
\mbox{UP} & \mbox{LP} \\
\left( \begin{a...
...0&81&82&83&84&85&86&87&88 \\
\end{array} \right) \end{array} \end{displaymath}


    \begin{displaymath}\begin{array}{cc}
\mbox{UHFP T} & \mbox{LHFP T} \\
\left( ...
...&22&77&87 \\
30&31&32&33&88
\end{array} \right) \end{array} \end{displaymath}


    \begin{displaymath}\begin{array}{cc}
\mbox{UHFP S1} & \mbox{LHFP S1} \\
\left...
...&62&72&82 \\
43&53&73&73&83
\end{array} \right) \end{array} \end{displaymath}

    Home page



2004-05-20