IRIX Base Documentation 2002 November

INTRO_BLAS(3F)                                   Last changed: 11-2-98

NAME
     INTRO_BLAS - Introduction to Basic Linear Algebra Subprograms

IMPLEMENTATION
     See the individual man pages for implementation details.

DESCRIPTION
     BLAS is a library of routines that perform basic operations
     involving matrices and vectors.  They were designed as a way of
     achieving efficiency in the solution of linear algebra problems.
     The BLAS, as they are now commonly called, have been very
     successful and have been used in a wide range of software,
     including LINPACK, LAPACK and many of the algorithms published in
     the ACM Transactions on Mathematical Software.  They are an aid to
     clarity, portability, modularity and maintenance of software, and
     have become the de facto standard for elementary vector and matrix
     operations.

     The BLAS promote modularity by identifying frequently occurring
     operations of linear algebra and by specifying a standard
     interface to these operations.  Efficiency is achieved through
     optimization within the BLAS without altering the higher-level
     code that references them.

     There are three levels of BLAS:

     * Level 1: The original set of BLAS, commonly referred to as the
       Level 1 BLAS, perform low-level operations such as dot-product
       and the adding of a multiple of one vector to another.
       Typically these operations involve O(n) floating point
       operations and O(n) data items moved (loaded or stored), where
       n is the length of the vectors.  The Level 1 BLAS permit
       efficient implementation on scalar machines, but the ratio of
       floating-point operations to data movement is too low to be
       effective on most vector or parallel hardware.

     * Level 2: The Level 2 BLAS perform matrix-vector operations that
       occur frequently in the implementation of many of the most
       common linear algebra algorithms.  These routines involve
       O(n^2) floating point operations.
       Algorithms that use Level 2 BLAS can be very efficient on
       vector computers, but are not well suited to computers with a
       hierarchy of memory (such as cache memory).

     * Level 3: The Level 3 BLAS are targeted at matrix-matrix
       operations.  They involve O(n^3) floating point operations, but
       only create O(n^2) data movement.  These operations permit
       efficient reuse of data that resides in cache and create what
       is often called the surface-to-volume effect for the ratio of
       computations to data movement.  In addition, matrices can be
       partitioned into blocks, and operations on distinct blocks can
       be performed in parallel, and within the operations on each
       block, scalar or vector operations may be performed in
       parallel.

     BLAS2 and BLAS3 modules are optimized and parallelized to take
     advantage of Silicon Graphics' RISC parallel architecture.  The
     best performance is achieved for BLAS3 routines (for example,
     DGEMM), where outer-loop unrolling and blocking techniques were
     applied to take advantage of the memory cache.  The performance
     of BLAS2 routines (for example, DGEMV) is sensitive to the size
     of the problem; for large sizes the high rate of cache misses
     slows down the algorithms.

     LAPACK algorithms use (preferably) BLAS3 modules and are the most
     efficient.  LINPACK uses only BLAS1 modules and therefore is less
     efficient than LAPACK.

     To link with libblas, use f77 to load all the required Fortran
     libraries; otherwise, include -lftn in your link line.  For R8000
     and R10000 based machines, use the MIPS4 version by specifying
     the -mips4 option when linking, as in this example:

          f77 -mips4 -o foobar.out foo.o bar.o -lblas

     To use the parallelized version, use the -mips4 and -mp options
     and link with -lblas_mp, as follows:

          f77 -mips4 -mp -o foobar.out foo.o bar.o -lblas_mp

   Increment arguments
     A vector's description consists of the name of the array (x or y)
     followed by the storage spacing (increment) in the array of
     vector elements (incx or incy).
     The increment can be positive or negative.  When a vector x
     consists of n elements, the corresponding actual array arguments
     must be of a length at least 1+(n-1)*|incx|.  For a negative
     increment, the first element of x is assumed to be
     x(1+(n-1)*|incx|).  The standard specification of _SCAL, _NRM2,
     _ASUM, and I_AMAX does not define their behavior for negative
     increments, so this functionality is an extension to the standard
     BLAS.  Setting an increment argument to 0 can cause unpredictable
     results.

   Multiple routine man pages
     Many of the routines are available in real (single-precision),
     complex, double precision and double complex versions.  Often
     little or no difference exists between these versions, other than
     the data types of some inputs and outputs.  In this case, the
     routines are described on the same man page, and that man page is
     named after the real or complex routine.

     The following data types are used in these routines:

     * REAL: Fortran "real" data type, 32-bit floating point; these
       routine names begin with S.

     * COMPLEX: Fortran "complex" data type, two 32-bit floating point
       reals; these routine names begin with C.

     * DOUBLE PRECISION: Fortran "double precision" data type, 64-bit
       floating point; these routine names begin with D.

     * DOUBLE COMPLEX: Fortran "double complex" data type, two 64-bit
       floating point doubles; these routine names begin with Z.

     The man(1) command can find a man page online by either the real,
     complex, double precision, or double complex name.
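
     The increment rule stated under "Increment arguments" can be
     checked with a small model.  The sketch below is written in
     Python rather than Fortran and does not call libblas; the
     function names (min_array_length, fortran_subscripts, ref_axpy)
     are hypothetical and exist only for this illustration.  It
     assumes n >= 1 and applies the minimum length 1+(n-1)*|incx| and
     the negative-increment starting subscript given in that section.

```python
def min_array_length(n, inc):
    """Minimum array length for an n-element vector stored with increment inc."""
    if n < 1:
        raise ValueError("this sketch assumes n >= 1")
    if inc == 0:
        raise ValueError("an increment of 0 can cause unpredictable results")
    return 1 + (n - 1) * abs(inc)


def fortran_subscripts(n, inc):
    """1-based array subscripts of vector elements 1..n, in that order."""
    length = min_array_length(n, inc)  # also validates n and inc
    # Positive increment: element 1 is at subscript 1.
    # Negative increment: element 1 is at subscript 1+(n-1)*|inc|.
    start = 1 if inc > 0 else length
    return [start + k * inc for k in range(n)]


def ref_axpy(n, a, x, incx, y, incy):
    """Model of y = a*x + y on Python lists (illustration only, not libblas)."""
    xs = fortran_subscripts(n, incx)
    ys = fortran_subscripts(n, incy)
    for ix, iy in zip(xs, ys):
        y[iy - 1] += a * x[ix - 1]  # 1-based subscript to 0-based index
    return y


if __name__ == "__main__":
    print(min_array_length(4, -2))    # 7
    print(fortran_subscripts(4, 2))   # [1, 3, 5, 7]
    print(fortran_subscripts(4, -2))  # [7, 5, 3, 1]
    print(ref_axpy(3, 2.0, [1.0, 2.0, 3.0], 1, [10.0, 20.0, 30.0], -1))
    # [16.0, 24.0, 32.0]
```

     With n = 4 and an increment of -2, the model reports a minimum
     array length of 7 and visits subscripts 7, 5, 3, 1, so a negative
     increment walks the same storage backward.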

     The following table describes the naming conventions for these
     routines:

     -------------------------------------------------------------------
               32-bit real   64-bit real   32-bit        64-bit complex
                             (double       complex       (double complex
                             precision)                  precision)
     -------------------------------------------------------------------
     form:     Sname         Dname         Cname         Zname
     example:  SAXPY         DAXPY         CAXPY         ZAXPY
     -------------------------------------------------------------------

   Fortran type declaration for functions
     Always declare the data type of external functions.  Declaring
     the data type of the complex Level 1 BLAS functions is
     particularly important because, based on the first letter of
     their names and the Fortran data typing rules, the default
     implied data type would be REAL.

   Summary of routines
     The following tables list the available BLAS routines.

   BLAS Level 1
     -------------------------------------------------------------------------
     Function                  Prefix and suffix (if provided)   Man page name
     -------------------------------------------------------------------------
     dot product               s- d- c-u c-c z-u z-c             dot
     y = a*x + y               s- d- c- z-                       axpy
     setup Givens rotation     s- d-                             rotg
     apply Givens rotation     s- d- cs- zd-                     rot
     copy x into y             s- d- c- z-                       copy
     swap x and y              s- d- c- z-                       swap
     Euclidean norm            s- d- sc- dz-                     nrm2
     sum of absolute values    s- d- sc- dz-                     asum
     x = a*x                   s- d- cs- c- zd- z-               scal
     index of max abs value    is- id- ic- iz-                   amax
     -------------------------------------------------------------------------

   BLAS Level 2
     In the following tables, these abbreviations are used:

     MV   Matrix vector multiply
     R    Rank one update to a matrix
     R2   Rank two update to a matrix
     SV   Solving certain triangular matrix problems

     Single precision Level 2 BLAS   |  Double precision Level 2 BLAS
     ----------------------------------------------------------------
           MV  R   R2  SV            |        MV  R   R2  SV
     SGE   x   x                     |  DGE   x   x
     SGB   x                         |  DGB   x
     SSP   x   x   x                 |  DSP   x   x   x
     SSY   x   x   x                 |  DSY   x   x   x
     SSB   x                         |  DSB   x
     STR   x           x             |  DTR   x           x
     STB   x           x             |  DTB   x           x
     STP   x           x             |  DTP   x           x

     Complex Level 2 BLAS            |  Double precision complex Level 2 BLAS
     -------------------------------------------------------------------------
           MV  R   RC  RU  R2  SV    |        MV  R   RC  RU  R2  SV
     CGE   x       x   x             |  ZGE   x       x   x
     CGB   x                         |  ZGB   x
     CHE   x   x           x         |  ZHE   x   x           x
     CHP   x   x           x         |  ZHP   x   x           x
     CHB   x                         |  ZHB   x
     CTR   x                   x     |  ZTR   x                   x
     CTB   x                   x     |  ZTB   x                   x
     CTP   x                   x     |  ZTP   x                   x

   BLAS Level 3
     In the following tables, these abbreviations are used:

     MM   Matrix matrix multiply
     RK   Rank-k update to a matrix
     R2K  Rank-2k update to a matrix
     SM   Solving triangular matrix with many right-hand sides

     Single precision Level 3 BLAS   |  Double precision Level 3 BLAS
     ----------------------------------------------------------------
           MM  RK  R2K SM            |        MM  RK  R2K SM
     SGE   x                         |  DGE   x
     SSY   x   x   x                 |  DSY   x   x   x
     STR   x           x             |  DTR   x           x

     Complex Level 3 BLAS            |  Double precision complex Level 3 BLAS
     -------------------------------------------------------------------------
           MM  RK  R2K SM            |        MM  RK  R2K SM
     CGE   x                         |  ZGE   x
     CSY   x   x   x                 |  ZSY   x   x   x
     CHE   x   x   x                 |  ZHE   x   x   x
     CTR   x           x             |  ZTR   x           x

FILES
     /usr/lib/libblas.a
     /usr/lib/libblas_mp.a
     /usr/include/cblas.h

NOTES
     libblas does not currently support reshaped arrays.

SEE ALSO
     S.P. Datardina, J.J. Du Croz, S.J. Hammarling and M.W. Pont, "A
     Proposed Specification of BLAS Routines in C", NAG Technical
     Report TR6/90.

     C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
     Algebra Subprograms for Fortran Usage", ACM Transactions on
     Mathematical Software, 5 (1979), pp. 308-325.

     J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An
     Extended Set of Fortran Basic Linear Algebra Subprograms", ACM
     Transactions on Mathematical Software, 14, 1 (1988), pp. 1-32.

     J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of
     Level 3 Basic Linear Algebra Subprograms", ACM Transactions on
     Mathematical Software (Dec 1989).

     This man page is available only online.