INTRO_BLAS(3S)                                                  INTRO_BLAS(3S)

NAME
     INTRO_BLAS - Introduction to SCSL Basic Linear Algebra Subprograms

IMPLEMENTATION
     See individual man pages for operating system and hardware
     availability.

DESCRIPTION
     The Basic Linear Algebra Subprograms comprise a library of routines
     that perform basic operations involving matrices and vectors.  They
     were designed as a way of achieving efficiency in the solution of
     linear algebra problems.  The BLAS, as they are now commonly called,
     have been very successful and have been used in a wide range of
     software, including LINPACK, LAPACK, and many of the algorithms
     published by the ACM Transactions on Mathematical Software.  They
     are an aid to clarity, portability, modularity, and maintenance of
     software, and have become the de facto standard for elementary
     vector and matrix operations.

     The BLAS promote modularity by identifying frequently occurring
     operations of linear algebra and by specifying a standard interface
     to these operations.  Efficiency is achieved through optimization
     within the BLAS without altering the higher-level code that
     references them.

     There are three levels of BLAS:

     *   Level 1: The original set of BLAS, commonly referred to as the
         Level 1 BLAS, perform low-level operations such as the dot
         product and the addition of a multiple of one vector to another.
         (A calling sketch appears after this list.)

         Typically these operations involve O(n) floating-point
         operations and O(n) data items moved (loaded or stored), where n
         is the length of the vectors.  The Level 1 BLAS permit efficient
         implementation on scalar machines, but the ratio of
         floating-point operations to data movement is too low to be
         effective on most vector or parallel hardware.

         For more details on the Level 1 BLAS routines available in SCSL,
         see the INTRO_BLAS1(3S) man page.

     *   Level 2: The Level 2 BLAS perform matrix-vector operations that
         occur frequently in the implementation of many of the most
         common linear algebra algorithms.  (A DGEMV sketch appears later
         in this section.)

         These routines involve O(n^2) floating-point operations.
         Algorithms that use the Level 2 BLAS can be very efficient on
         vector computers, but they are not well suited to computers with
         a hierarchy of memory (such as cache memory).

         For more details on the Level 2 BLAS routines available in SCSL,
         see the INTRO_BLAS2(3S) man page.

     *   Level 3: The Level 3 BLAS are targeted at matrix-matrix
         operations.  (A DGEMM sketch appears at the end of this
         section.)

         They involve O(n^3) floating-point operations but create only
         O(n^2) data movement.  These operations permit efficient reuse
         of data that reside in cache and create what is often called the
         surface-to-volume effect for the ratio of computations to data
         movement.  In addition, matrices can be partitioned into blocks,
         operations on distinct blocks can be performed in parallel, and
         within the operations on each block, scalar or vector operations
         may be performed in parallel.

         For more details on the Level 3 BLAS routines available in SCSL,
         see the INTRO_BLAS3(3S) man page.
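
     As an illustration, the following is a minimal Fortran sketch of two
     Level 1 calls, DAXPY (y := alpha*x + y) and DDOT (dot product),
     using the standard Fortran interfaces.  The program name and data
     are illustrative only; link against SCSL (for example, -lscs) or any
     other BLAS implementation.

          program level1_sketch
             ! Minimal sketch of Level 1 BLAS usage: y := alpha*x + y
             ! via DAXPY, then the dot product of x and y via DDOT.
             ! Both calls use unit strides through the vectors.
             implicit none
             integer, parameter :: n = 5
             double precision :: x(n), y(n)
             double precision, external :: ddot
             integer :: i
             do i = 1, n
                x(i) = dble(i)       ! x = (1, 2, ..., n)
                y(i) = 1.0d0         ! y = (1, 1, ..., 1)
             end do
             call daxpy(n, 2.0d0, x, 1, y, 1)   ! y := 2*x + y
             print *, 'x . y =', ddot(n, x, 1, y, 1)
          end program level1_sketch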

     BLAS2 and BLAS3 modules in SCSL are optimized and parallelized to
     take advantage of SGI's chip-level and system-level architectures.
     The best performance is achieved with BLAS3 routines (for example,
     DGEMM), where outer-loop unrolling and blocking techniques have been
     applied to take advantage of the memory cache.  The performance of
     BLAS2 routines (for example, DGEMV) is sensitive to the size of the
     problem; for large sizes the high cache miss rate slows down the
     algorithms.
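
     The following is a minimal sketch of a Level 2 call, DGEMV, which
     computes y := alpha*A*x + beta*y under the standard interface.  The
     small matrix and its values are illustrative only; the BLAS require
     A to be stored in column-major order.

          program level2_sketch
             ! Minimal sketch of Level 2 BLAS usage: the matrix-vector
             ! product y := alpha*A*x + beta*y via DGEMV.  The leading
             ! dimension argument (here m) is the declared row count of A.
             implicit none
             integer, parameter :: m = 3, n = 2
             double precision :: a(m,n), x(n), y(m)
             ! Columns of A: (1,2,3) and (4,5,6)
             a = reshape((/ 1.0d0, 2.0d0, 3.0d0, &
                            4.0d0, 5.0d0, 6.0d0 /), (/ m, n /))
             x = 1.0d0
             y = 0.0d0
             ! 'N' = use A as-is (no transpose); unit strides in x and y
             call dgemv('N', m, n, 1.0d0, a, m, x, 1, 0.0d0, y, 1)
             print *, 'y =', y    ! expect (5, 7, 9)
          end program level2_sketch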

     SCSL's LAPACK algorithms make extensive use of BLAS3 modules and are
     more efficient than the older, BLAS1-based LINPACK algorithms.
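
     Finally, a minimal sketch of a Level 3 call, DGEMM, which computes
     C := alpha*A*B + beta*C under the standard interface.  The blocking
     and unrolling described above happen inside the library, so the
     calling code is the same at any problem size; the tiny matrices here
     are illustrative only.

          program level3_sketch
             ! Minimal sketch of Level 3 BLAS usage: the matrix-matrix
             ! product C := alpha*A*B + beta*C via DGEMM.
             implicit none
             integer, parameter :: m = 2, n = 2, k = 2
             double precision :: a(m,k), b(k,n), c(m,n)
             ! Column-major fill: A = [1 2; 3 4], B = [5 6; 7 8]
             a = reshape((/ 1.0d0, 3.0d0, 2.0d0, 4.0d0 /), (/ m, k /))
             b = reshape((/ 5.0d0, 7.0d0, 6.0d0, 8.0d0 /), (/ k, n /))
             c = 0.0d0
             ! 'N','N' = no transpose; leading dimensions match the
             ! declared row counts of A, B, and C.
             call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, &
                        0.0d0, c, m)
             print *, c           ! column-major: 19, 43, 22, 50
          end program level3_sketch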

NOTES
     SCSL does not currently support reshaped arrays.

SEE ALSO
     S.P. Datardina, J.J. Du Croz, S.J. Hammarling, and M.W. Pont, "A
     Proposed Specification of BLAS Routines in C", NAG Technical Report
     TR6/90.

     C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
     Algebra Subprograms for Fortran Usage", ACM Transactions on
     Mathematical Software, 5 (1979), pp. 308-325.

     J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An Extended
     Set of Fortran Basic Linear Algebra Subprograms", ACM Transactions
     on Mathematical Software, 14, 1 (1988), pp. 1-32.

     J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of Level
     3 Basic Linear Algebra Subprograms", ACM Transactions on
     Mathematical Software, 16, 1 (1990), pp. 1-17.

     INTRO_SCSL(3S), INTRO_BLAS1(3S), INTRO_BLAS2(3S), INTRO_BLAS3(3S),
     INTRO_CBLAS(3S), INTRO_LAPACK(3S)