- INTRO_BLAS(3F) Last changed: 11-2-98
-
-
- NAME
- INTRO_BLAS - Introduction to Basic Linear Algebra Subprograms
-
- IMPLEMENTATION
- See the individual man pages for implementation details.
-
- DESCRIPTION
- BLAS is a library of routines that perform basic operations involving
- matrices and vectors. They were designed as a way of achieving
- efficiency in the solution of linear algebra problems. The BLAS, as
- they are now commonly called, have been very successful and have been
- used in a wide range of software, including LINPACK, LAPACK, and many
- of the algorithms published in ACM Transactions on Mathematical
- Software. They are an aid to clarity, portability, modularity, and
- maintenance of software, and have become the de facto standard for
- elementary vector and matrix operations.
-
- The BLAS promote modularity by identifying frequently occurring
- operations of linear algebra and by specifying a standard interface to
- these operations. Efficiency is achieved through optimization within
- the BLAS without altering the higher-level code that references them.
-
- There are three levels of BLAS:
-
- * Level 1: The original set of BLAS, commonly referred to as the
- Level 1 BLAS, perform low-level operations such as the dot product and
- the addition of a multiple of one vector to another.
-
- Typically these operations involve O(n) floating point operations
- and O(n) data items moved (loaded or stored), where n is the length
- of the vectors. The Level 1 BLAS permit efficient implementation on
- scalar machines, but the ratio of floating-point operations to data
- movement is too low to be effective on most vector or parallel
- hardware.
-
- * Level 2: The Level 2 BLAS perform matrix-vector operations that
- occur frequently in the implementation of many of the most common
- linear algebra algorithms.
-
- These routines involve O(n^2) floating point operations. Algorithms
- that use Level 2 BLAS can be very efficient on vector computers, but
- are not well suited to computers with a hierarchy of memory (such as
- cache memory).
-
- * Level 3: The Level 3 BLAS are targeted at matrix-matrix operations.
-
- They involve O(n^3) floating point operations, but require only O(n^2)
- data movement. These operations permit efficient reuse of data that
- resides in cache and create what is often called the surface-to-volume
- effect for the ratio of computations to data movement. In
- addition, matrices can be partitioned into blocks, and operations on
- distinct blocks can be performed in parallel, and within the
- operations on each block, scalar or vector operations may be
- performed in parallel.
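-
- The ratios described for the three levels can be made concrete with
- textbook operation counts. The following Python sketch is illustrative
- only: the function names are hypothetical, and the counts assume
- square n-by-n matrices with each operand moved exactly once. They are
- not measurements of this library.
-
```python
# Textbook operation counts (assumptions, not measurements of any library).

def axpy_counts(n):
    """Level 1, y = a*x + y: one multiply and one add per element."""
    flops = 2 * n
    moved = 3 * n              # load x, load y, store y
    return flops, moved

def gemv_counts(n):
    """Level 2, y = A*x + y with an n-by-n matrix A."""
    flops = 2 * n * n
    moved = n * n + 3 * n      # load A, load x, load y, store y
    return flops, moved

def gemm_counts(n):
    """Level 3, C = A*B + C with n-by-n matrices."""
    flops = 2 * n ** 3
    moved = 4 * n * n          # load A, B and C, store C
    return flops, moved

def ratio(counts):
    """Floating point operations per data item moved."""
    flops, moved = counts
    return flops / moved

for n in (10, 100, 1000):
    print(n, ratio(axpy_counts(n)), ratio(gemv_counts(n)), ratio(gemm_counts(n)))
```
-
- Under these assumptions the ratio stays below a small constant for
- Level 1 and Level 2, but grows as n/2 for Level 3.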
-
- BLAS2 and BLAS3 modules are optimized and parallelized to take
- advantage of Silicon Graphics' RISC parallel architecture. The best
- performance is achieved for BLAS3 routines (for example, DGEMM), where
- outer-loop unrolling and blocking techniques were applied to take
- advantage of the memory cache. The performance of BLAS2 routines (for
- example, DGEMV) is sensitive to the size of the problem; for large
- sizes, the high cache-miss rate slows down the algorithms.
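-
- The blocking technique mentioned above can be sketched in a few lines
- of Python. This is a generic illustration of loop blocking, not the
- code used inside DGEMM; the function names and the block size bs are
- hypothetical, and a tuned library would choose the block size to match
- the cache.
-
```python
def matmul_naive(a, b):
    """Plain triple loop: C = A*B for matrices stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c

def matmul_blocked(a, b, bs=2):
    """Same product, computed block by block.

    Each (i0, j0, p0) step touches only a bs-by-bs tile of A, B and C,
    so the tiles can stay in cache while they are reused.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, m, bs):
            for p0 in range(0, k, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, m)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + bs, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c
```
-
- Both functions compute the same product; only the order in which the
- data is visited differs.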
-
- LAPACK algorithms preferentially use BLAS3 modules and are the most
- efficient. LINPACK uses only BLAS1 modules and is therefore less
- efficient than LAPACK.
-
- To link with libblas, use f77 so that all of the required Fortran
- libraries are loaded; otherwise include -lftn in your link line. For
- R8000 and R10000 based machines, use the MIPS4 version by specifying
- the -mips4 option when linking, as in this example:
-
- f77 -mips4 -o foobar.out foo.o bar.o -lblas
-
- To use the parallelized version, use the -mips4 and -mp options and
- link with -lblas_mp, as follows:
-
- f77 -mips4 -mp -o foobar.out foo.o bar.o -lblas_mp
-
- Increment arguments
- A vector's description consists of the name of the array (x or y)
- followed by the storage spacing (increment) in the array of vector
- elements (incx or incy). The increment can be positive or negative.
- When a vector x consists of n elements, the corresponding actual array
- argument must be of length at least 1+(n-1)*|incx|. For a negative
- increment, the first element of x is assumed to be
- x(1+(n-1)*|incx|). The standard specification of _SCAL, _NRM2, _ASUM,
- and I_AMAX does not define their behavior for negative increments, so
- this functionality is an extension to the standard BLAS.
-
- Setting an increment argument to 0 can cause unpredictable results.
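-
- The indexing rule above can be sketched as follows. The Python
- function names are hypothetical, positions are 1-based as in Fortran,
- and rejecting a zero increment is a choice made for this sketch only.
-
```python
def min_array_length(n, inc):
    """Smallest array length that can hold an n-element vector."""
    return 1 + (n - 1) * abs(inc)

def element_positions(n, inc):
    """1-based array positions of the logical elements 1..n of a vector."""
    if inc == 0:
        # The man page only says results are unpredictable; this sketch refuses.
        raise ValueError("increment 0 is not supported")
    # A negative increment starts at the far end and walks backward.
    start = 1 if inc > 0 else 1 + (n - 1) * abs(inc)
    return [start + i * inc for i in range(n)]

print(element_positions(4, 2))
print(element_positions(4, -2))
```
-
- With a negative increment the same array positions are visited, but
- in the reverse order.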
-
- Multiple routine man pages
- Many of the routines are available in real (single-precision),
- complex, double precision and double complex versions. Often little
- or no difference exists between these versions, other than the data
- types of some inputs and outputs. In this case, the routines are
- described on the same man page, and that man page is named after the
- real or complex routine.
-
- The following data types are used in these routines:
-
- * REAL: Fortran "real" data type, 32-bit floating point; these routine
- names begin with S.
-
- * COMPLEX: Fortran "complex" data type, two 32-bit floating point
- reals; these routine names begin with C.
-
- * DOUBLE PRECISION: Fortran "double precision" data type, 64-bit
- floating point; these routine names begin with D.
-
- * DOUBLE COMPLEX: Fortran "double complex" data type, two 64-bit
- floating point doubles; these routine names begin with Z.
-
- The man(1) command can find a man page online by the real, complex,
- double precision, or double complex name.
-
- The following table describes the naming conventions for these
- routines:
-
- -------------------------------------------------------------------
-                        64-bit real                   64-bit complex
-                        (double        32-bit         (double complex
-           32-bit real  precision)     complex        precision)
- -------------------------------------------------------------------
- form:     Sname        Dname          Cname          Zname
- example:  SAXPY        DAXPY          CAXPY          ZAXPY
- -------------------------------------------------------------------
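-
- The simple form shown in the table can be expressed as a lookup. The
- Python names below are hypothetical, and the sketch covers only
- routines that follow the plain prefix-plus-name form; it does not
- handle names with compound prefixes.
-
```python
# First letter of the routine name for each Fortran data type.
PREFIX_BY_TYPE = {
    "REAL": "S",
    "DOUBLE PRECISION": "D",
    "COMPLEX": "C",
    "DOUBLE COMPLEX": "Z",
}
TYPE_BY_PREFIX = {prefix: dtype for dtype, prefix in PREFIX_BY_TYPE.items()}

def routine_name(data_type, name):
    """Build a routine name such as DAXPY from a data type and a base name."""
    return PREFIX_BY_TYPE[data_type.upper()] + name.upper()

def data_type_of(routine):
    """Recover the data type from the first letter of a simple routine name."""
    return TYPE_BY_PREFIX[routine[0].upper()]

print(routine_name("double precision", "axpy"))
print(data_type_of("ZAXPY"))
```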
-
- Fortran type declaration for functions
- Always declare the data type of external functions. Declaring the
- data type of the complex Level 1 BLAS functions is particularly
- important because, based on the first letter of their names and the
- Fortran data typing rules, the default implied data type would be
- REAL.
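-
- The typing rule referred to above is the Fortran default that names
- beginning with I through N are INTEGER and all others are REAL. A
- minimal Python sketch of that rule (the function name is hypothetical)
- shows why an undeclared complex function would be mistyped:
-
```python
def implicit_type(name):
    """Default Fortran type implied by the first letter of a name."""
    first = name[0].upper()
    # Letters I through N imply INTEGER; every other letter implies REAL.
    return "INTEGER" if "I" <= first <= "N" else "REAL"

for routine in ("CDOTC", "ZDOTU", "DDOT", "ISAMAX"):
    print(routine, implicit_type(routine))
```
-
- Without an explicit declaration, CDOTC and ZDOTU would both be treated
- as REAL even though they return complex values.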
-
- Summary of routines
- The following tables list the available BLAS routines.
-
- BLAS Level 1
- -------------------------------------------------------------------------
- Function                  Prefix and suffix (if provided)   Man page name
- -------------------------------------------------------------------------
- dot product               s-  d-  c-u  c-c  z-u  z-c        dot
- y = a*x + y               s-  d-  c-  z-                    axpy
- setup Givens rotation     s-  d-                            rotg
- apply Givens rotation     s-  d-  cs-  zd-                  rot
- copy x into y             s-  d-  c-  z-                    copy
- swap x and y              s-  d-  c-  z-                    swap
- Euclidean norm            s-  d-  sc-  dz-                  nrm2
- sum of absolute values    s-  d-  sc-  dz-                  asum
- x = a*x                   s-  d-  cs-  c-  zd-  z-          scal
- index of max abs value    is-  id-  ic-  iz-                amax
- -------------------------------------------------------------------------
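-
- Each entry in the middle column is a prefix, a dash that stands for
- the man page name, and an optional suffix. A small Python sketch of
- that reading (the function name expand is hypothetical) is:
-
```python
def expand(entry, man_page_name):
    """Turn a table entry such as 'c-u' plus 'dot' into a routine name."""
    # The dash marks where the man page name is substituted.
    prefix, _, suffix = entry.partition("-")
    return (prefix + man_page_name + suffix).upper()

for entry, base in (("s-", "dot"), ("c-u", "dot"), ("sc-", "nrm2"),
                    ("cs-", "scal"), ("is-", "amax")):
    print(entry, base, expand(entry, base))
```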
-
- BLAS Level 2
- In the following tables, these abbreviations are used:
-
- MV    Matrix vector multiply
-
- R     Rank one update to a matrix
-
- RC    Rank one update to a matrix, conjugated (complex only)
-
- RU    Rank one update to a matrix, unconjugated (complex only)
-
- R2    Rank two update to a matrix
-
- SV    Solving certain triangular matrix problems
-
- Single precision Level 2 BLAS   |  Double precision Level 2 BLAS
- -----------------------------------------------------------------------
-       MV   R    R2   SV         |        MV   R    R2   SV
- SGE   x    x                    |  DGE   x    x
- SGB   x                         |  DGB   x
- SSP   x    x    x               |  DSP   x    x    x
- SSY   x    x    x               |  DSY   x    x    x
- SSB   x                         |  DSB   x
- STR   x              x          |  DTR   x              x
- STB   x              x          |  DTB   x              x
- STP   x              x          |  DTP   x              x
-
- Complex Level 2 BLAS                  |  Double precision complex Level 2 BLAS
- -------------------------------------------------------------------------------
-       MV   R    RC   RU   R2   SV     |        MV   R    RC   RU   R2   SV
- CGE   x         x    x                |  ZGE   x         x    x
- CGB   x                               |  ZGB   x
- CHE   x    x              x           |  ZHE   x    x              x
- CHP   x    x              x           |  ZHP   x    x              x
- CHB   x                               |  ZHB   x
- CTR   x                        x      |  ZTR   x                        x
- CTB   x                        x      |  ZTB   x                        x
- CTP   x                        x      |  ZTP   x                        x
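-
- The operations behind the column headings MV, R, R2, and SV can be
- sketched on small dense matrices. This Python sketch is a simplified
- illustration with hypothetical function names: the actual routines
- also take scaling factors, transpose options, and increments, and
- support banded and packed storage, none of which is modeled here.
-
```python
def mv(a, x):
    """MV: matrix-vector product y = A*x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def rank_one_update(a, x, y):
    """R: A := A + x*y^T, updating the matrix in place."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            a[i][j] += xi * yj
    return a

def rank_two_update(a, x, y):
    """R2: A := A + x*y^T + y*x^T, which keeps a symmetric A symmetric."""
    n = len(x)
    for i in range(n):
        for j in range(n):
            a[i][j] += x[i] * y[j] + y[i] * x[j]
    return a

def solve_lower(l, b):
    """SV: solve L*x = b for a lower triangular L by forward substitution."""
    x = []
    for i, row in enumerate(l):
        s = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - s) / row[i])
    return x
```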
-
- BLAS Level 3
- In the following tables, these abbreviations are used:
-
- MM    Matrix matrix multiply
-
- RK    Rank-k update to a matrix
-
- R2K   Rank-2k update to a matrix
-
- SM    Solving triangular systems with multiple right-hand sides
-
- Single precision Level 3 BLAS   |  Double precision Level 3 BLAS
- -----------------------------------------------------------------------
-       MM   RK   R2K  SM         |        MM   RK   R2K  SM
- SGE   x                         |  DGE   x
- SSY   x    x    x               |  DSY   x    x    x
- STR   x              x          |  DTR   x              x
-
- Complex Level 3 BLAS            |  Double precision complex Level 3 BLAS
- -----------------------------------------------------------------------
-       MM   RK   R2K  SM         |        MM   RK   R2K  SM
- CGE   x                         |  ZGE   x
- CSY   x    x    x               |  ZSY   x    x    x
- CHE   x    x    x               |  ZHE   x    x    x
- CTR   x              x          |  ZTR   x              x
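-
- A routine name in these tables is the row label followed by the
- column heading (for example, DGE and MM give DGEMM). The following
- Python sketch lists the Level 3 names for one precision; the table
- contents are transcribed from the tables above, and the names LEVEL3
- and level3_names are hypothetical.
-
```python
# Matrix type -> operations marked in the Level 3 tables.
LEVEL3 = {
    "GE": ["MM"],
    "SY": ["MM", "RK", "R2K"],
    "HE": ["MM", "RK", "R2K"],   # Hermitian rows appear for C and Z only
    "TR": ["MM", "SM"],
}

def level3_names(prefix):
    """List the Level 3 routine names for a precision prefix (S, D, C, Z)."""
    prefix = prefix.upper()
    names = []
    for matrix, operations in LEVEL3.items():
        if matrix == "HE" and prefix not in ("C", "Z"):
            continue
        names.extend(prefix + matrix + op for op in operations)
    return names

print(level3_names("D"))
print(level3_names("Z"))
```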
-
- FILES
- /usr/lib/libblas.a
- /usr/lib/libblas_mp.a
- /usr/include/cblas.h
-
- NOTES
- libblas does not currently support reshaped arrays.
-
- SEE ALSO
- S.P. Datardina, J.J. Du Croz, S.J. Hammarling, and M.W. Pont, "A
- Proposed Specification of BLAS Routines in C", NAG Technical Report
- TR6/90.
-
- C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear Algebra
- Subprograms for Fortran Usage", ACM Transactions on Mathematical
- Software, 5 (1979), pp. 308-325.
-
- J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An Extended
- Set of Fortran Basic Linear Algebra Subprograms", ACM Transactions on
- Mathematical Software, 14, 1 (1988), pp. 1-32.
-
- J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of Level 3
- Basic Linear Algebra Subprograms", ACM Transactions on Mathematical
- Software (Dec 1989).
-
- This man page is available only online.
-