IA-32 Extended Precision Library Version 1.1 Intel 32-bit architecture (IA-32) uses internally 80-bit arithmetic in its floating point calculations. Most compilers support setting the precision control to either double precision rounding (53-bit mantissa, 64-bit rounding) or extended precision rounding (64-bit mantissa, 80-bit rounding) once in a code with little support for extended precision variables. We are providing an Extended Precision Math Library (700+ routines in Version 1.1) to assist in getting at this extended precision. For those interested in details on this topic, please see http://www.netlib.org/utk/papers/chapter5.ps. There are currently 9 primitives provided in this latest library for using extended precision. They are: add, subtract, multiply, multiply-add combination, multiply-subtract combination, square root, assignment, reciprocal, and a full dot product. The syntax for the first five primitives (add, subtract, multiply, multiply-add, multiply-subtract) is the same (625 routines in all): xadd_YYY__ ( a, b, c ) does a = b + c xsub_YYY__ ( a, b, c ) does a = b - c xmul_YYY__ ( a, b, c ) does a = b * c xfma_YYY__ ( a, b, c ) does a = a + b * c xfms_YYY__ ( a, b, c ) does a = a - b * c where Y is (respective to the order of the 3 arguments): Y == "s" : single (32-bit, 24-bit mantissa) Y == "d" : double (64-bit, 53-bit mantissa) Y == "x" : extended (80-bit, 64-bit mantissa) Y == "w" : double-double (128-bit, 106-bit dual mantissa) Y == "q" : extended-extended (160-bit, 128-bit dual mantissa) Each of the first 5 primitives above has 3 arguments, and the location of Y indicates the precision of the argument. For example, xadd_xdw is a routine that takes a 64-bit number (2nd argument) and adds it to a double-double (3rd argument) and stores the result as an extended 80-bit number (1st argument). Even if a compiler does not support extended, double-double, or extended-extended formats, one could still pass in a double array with 2 or 3 entries, or a character*10 variable, or a single array with enough entries, etc.. Setting, assigning, and retrieving values into this array can be done via the set primitive described below. The next 3 primitives (square root, assignment and reciprocal) have only two arguments and are of the format: xsqrt_YY__ ( a, b ) does a = sqrt(b) xset_YY__ ( a, b ) does a = b xrecip_YY__ ( a, b ) does a = 1/ b Unlike the first set of primitives, not all combinations are present for xsqrt and xrecip. The assignment protocal "xset" has the full combinations of precision. xsqrt and xrecip just have "s", "d" and "x", with the addition of xsqrt_ww. The last primitive is really a first attempt at an extended precision BLAS. It does a dot product and is of the format xddot(n, x, y, sum) where it is assumed that x and y are double precision vectors of length n and increment one, and this stores the result as an 80-bit number sum. In addition, this library contains some new utilities: con64__(): change the rounding mode to 64-bit con80__(): change the rounding mode to 80-bit xprint__(x, ival) : print the number x. But you must tell xprint the precision with the ival flag: if you set ival=4 then x is single ("s",real*4) if you set ival=8 then x is double ("d",real*8) if you set ival=10 then x is extended ("x",real*10) if you set ival=16 then x is double-double ("w",real*16) if you set ival=20 then x is extended-extended ("q",real*20) if you set ival=32 then x is assumed to be real*32 This is fortran callable, so both x and ival are passing by reference (pointers). The 2 appended underscores should allow most linux linkers to make the codes callable from fortran: "call xprint(x,16)" or from C: "double x[2]; int ival=16; .... ; xprint__( x, &ival );". A final note on xprint__(): This routine is not optimized in the current version and it is terribly slow. This library is provided simply as a means to tinker with extended precision. Future library versions will have more extended precision BLAS calls as well. For questions, please contact Greg Henry at henry@co.intel.com - Greg Henry