INTRO_BLAS(3S)                                                  INTRO_BLAS(3S)

NAME
     INTRO_BLAS - Introduction to SCSL Basic Linear Algebra Subprograms

IMPLEMENTATION
     See individual man pages for operating system and hardware
     availability.

DESCRIPTION
     The Basic Linear Algebra Subprograms comprise a library of routines
     that perform basic operations involving matrices and vectors.  They
     were designed as a way of achieving efficiency in the solution of
     linear algebra problems.  The BLAS, as they are now commonly called,
     have been very successful and have been used in a wide range of
     software, including LINPACK, LAPACK, and many of the algorithms
     published by the ACM Transactions on Mathematical Software.  They
     are an aid to clarity, portability, modularity, and maintenance of
     software, and have become the de facto standard for elementary
     vector and matrix operations.

     The BLAS promote modularity by identifying frequently occurring
     operations of linear algebra and by specifying a standard interface
     to these operations.  Efficiency is achieved through optimization
     within the BLAS without altering the higher-level code that
     references them.

     There are three levels of BLAS:

     *   Level 1: The original set of BLAS, commonly referred to as the
         Level 1 BLAS, perform low-level operations such as the dot
         product and the addition of a multiple of one vector to another.
         (A calling sketch appears after this list.)

         Typically these operations involve O(n) floating-point
         operations and O(n) data items moved (loaded or stored), where n
         is the length of the vectors.  The Level 1 BLAS permit efficient
         implementation on scalar machines, but the ratio of
         floating-point operations to data movement is too low to be
         effective on most vector or parallel hardware.

         For more details on the Level 1 BLAS routines available in SCSL,
         see the INTRO_BLAS1(3S) man page.

     *   Level 2: The Level 2 BLAS perform matrix-vector operations that
         occur frequently in the implementation of many of the most
         common linear algebra algorithms.  (A DGEMV sketch appears later
         in this section.)

         These routines involve O(n^2) floating-point operations.
         Algorithms that use the Level 2 BLAS can be very efficient on
         vector computers, but they are not well suited to computers with
         a hierarchy of memory (such as cache memory).

         For more details on the Level 2 BLAS routines available in SCSL,
         see the INTRO_BLAS2(3S) man page.

     *   Level 3: The Level 3 BLAS are targeted at matrix-matrix
         operations.  (A DGEMM sketch appears at the end of this
         section.)

         They involve O(n^3) floating-point operations but create only
         O(n^2) data movement.  These operations permit efficient reuse
         of data that reside in cache and create what is often called the
         surface-to-volume effect for the ratio of computations to data
         movement.  In addition, matrices can be partitioned into blocks,
         operations on distinct blocks can be performed in parallel, and
         within the operations on each block, scalar or vector operations
         may be performed in parallel.

         For more details on the Level 3 BLAS routines available in SCSL,
         see the INTRO_BLAS3(3S) man page.
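
     As an illustration, the following is a minimal Fortran sketch of two
     Level 1 calls, DAXPY (y := alpha*x + y) and DDOT (dot product),
     using the standard Fortran interfaces.  The program name and data
     are illustrative only; link against SCSL (for example, -lscs) or any
     other BLAS implementation.

          program level1_sketch
             ! Minimal sketch of Level 1 BLAS usage: y := alpha*x + y
             ! via DAXPY, then the dot product of x and y via DDOT.
             ! Both calls use unit strides through the vectors.
             implicit none
             integer, parameter :: n = 5
             double precision :: x(n), y(n)
             double precision, external :: ddot
             integer :: i
             do i = 1, n
                x(i) = dble(i)       ! x = (1, 2, ..., n)
                y(i) = 1.0d0         ! y = (1, 1, ..., 1)
             end do
             call daxpy(n, 2.0d0, x, 1, y, 1)   ! y := 2*x + y
             print *, 'x . y =', ddot(n, x, 1, y, 1)
          end program level1_sketch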

     BLAS2 and BLAS3 modules in SCSL are optimized and parallelized to
     take advantage of SGI's chip-level and system-level architectures.
     The best performance is achieved with BLAS3 routines (for example,
     DGEMM), where outer-loop unrolling and blocking techniques have been
     applied to take advantage of the memory cache.  The performance of
     BLAS2 routines (for example, DGEMV) is sensitive to the size of the
     problem; for large sizes the high cache miss rate slows down the
     algorithms.
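
     The following is a minimal sketch of a Level 2 call, DGEMV, which
     computes y := alpha*A*x + beta*y under the standard interface.  The
     small matrix and its values are illustrative only; the BLAS require
     A to be stored in column-major order.

          program level2_sketch
             ! Minimal sketch of Level 2 BLAS usage: the matrix-vector
             ! product y := alpha*A*x + beta*y via DGEMV.  The leading
             ! dimension argument (here m) is the declared row count of A.
             implicit none
             integer, parameter :: m = 3, n = 2
             double precision :: a(m,n), x(n), y(m)
             ! Columns of A: (1,2,3) and (4,5,6)
             a = reshape((/ 1.0d0, 2.0d0, 3.0d0, &
                            4.0d0, 5.0d0, 6.0d0 /), (/ m, n /))
             x = 1.0d0
             y = 0.0d0
             ! 'N' = use A as-is (no transpose); unit strides in x and y
             call dgemv('N', m, n, 1.0d0, a, m, x, 1, 0.0d0, y, 1)
             print *, 'y =', y    ! expect (5, 7, 9)
          end program level2_sketch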

     SCSL's LAPACK algorithms make extensive use of BLAS3 modules and are
     more efficient than the older, BLAS1-based LINPACK algorithms.
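
     Finally, a minimal sketch of a Level 3 call, DGEMM, which computes
     C := alpha*A*B + beta*C under the standard interface.  The blocking
     and unrolling described above happen inside the library, so the
     calling code is the same at any problem size; the tiny matrices here
     are illustrative only.

          program level3_sketch
             ! Minimal sketch of Level 3 BLAS usage: the matrix-matrix
             ! product C := alpha*A*B + beta*C via DGEMM.
             implicit none
             integer, parameter :: m = 2, n = 2, k = 2
             double precision :: a(m,k), b(k,n), c(m,n)
             ! Column-major fill: A = [1 2; 3 4], B = [5 6; 7 8]
             a = reshape((/ 1.0d0, 3.0d0, 2.0d0, 4.0d0 /), (/ m, k /))
             b = reshape((/ 5.0d0, 7.0d0, 6.0d0, 8.0d0 /), (/ k, n /))
             c = 0.0d0
             ! 'N','N' = no transpose; leading dimensions match the
             ! declared row counts of A, B, and C.
             call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, &
                        0.0d0, c, m)
             print *, c           ! column-major: 19, 43, 22, 50
          end program level3_sketch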

NOTES
     SCSL does not currently support reshaped arrays.

SEE ALSO
     S.P. Datardina, J.J. Du Croz, S.J. Hammarling, and M.W. Pont, "A
     Proposed Specification of BLAS Routines in C", NAG Technical Report
     TR6/90.

     C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
     Algebra Subprograms for Fortran Usage", ACM Transactions on
     Mathematical Software, 5 (1979), pp. 308-325.

     J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An Extended
     Set of Fortran Basic Linear Algebra Subprograms", ACM Transactions
     on Mathematical Software, 14, 1 (1988), pp. 1-32.

     J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of Level
     3 Basic Linear Algebra Subprograms", ACM Transactions on
     Mathematical Software, 16, 1 (1990), pp. 1-17.

     INTRO_SCSL(3S), INTRO_BLAS1(3S), INTRO_BLAS2(3S), INTRO_BLAS3(3S),
     INTRO_CBLAS(3S), INTRO_LAPACK(3S)