Vector Libraries

This page contains a continually expanding set of vector libraries that are available to the AltiVec programmer through the Accelerate framework on Mac OS X 10.3 (Panther). On earlier versions of Mac OS X, these were available in vecLib.framework. The currently available libraries are described below.

Digital Signal Processing (vDSP)

vDSP is a collection of digital signal processing functions such as FFTs, convolutions and squares. On G4-equipped computers it uses the vector engine when certain criteria are met; on G3-equipped computers it uses the scalar unit. vDSP is available in Mac OS 9.1 and Mac OS X.

(sample code)

vImage Image Processing Framework

vImage is a new image processing library delivered with Mac OS X 10.3 (Panther) as part of the Accelerate umbrella framework. It is not available on earlier operating systems. It provides a series of image processing operations, such as image resize, distortion, rotation, convolution (useful for blur, sharpen, and derivative effects), morphology (erode, dilate, min, max), histogram operations, alpha compositing, and data format conversion and clipping. It supports two channel formats: 8-bit integer per channel and 32-bit IEEE-754 floating point per channel. The channels can either be kept separate in a planar arrangement (a separate array for each color channel, such as red, green, blue, alpha, etc.) or interleaved in an ARGB format. Conversion routines are provided to support other data formats.
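To make the difference between the interleaved and planar arrangements concrete, here is a minimal sketch in portable C. It is not vImage's API (vImage supplies its own conversion routines); the function name and parameters are hypothetical, and the sketch assumes a tightly packed buffer with no row padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of vImage.
   Splits an interleaved ARGB image (8 bits per channel, 4 bytes per pixel,
   stored A,R,G,B,A,R,G,B,...) into four planar buffers, one per channel.
   Assumes the pixels are tightly packed, with no padding between rows. */
void argb8888_to_planar8(const uint8_t *argb, size_t pixel_count,
                         uint8_t *a, uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(argb && a && r && g && b);
    for (size_t i = 0; i < pixel_count; i++) {
        a[i] = argb[4 * i + 0];   /* alpha plane */
        r[i] = argb[4 * i + 1];   /* red plane   */
        g[i] = argb[4 * i + 2];   /* green plane */
        b[i] = argb[4 * i + 3];   /* blue plane  */
    }
}
```

In the planar form each channel can be processed as one contiguous array, while the interleaved form keeps all four channels of a pixel next to each other.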

vImage is optimized to provide the best performance on all platforms. It transparently selects scalar code on G3 and vector code on G4 and G5. It comes pre-installed on every system running Mac OS X 10.3 or later.

Tableau is a sample application that uses vImage to show what can be done with the framework. If you have installed the developer examples onto your system, it can be found here:

/Developer/Examples/Accelerate/vImage/Tableau

Basic Linear Algebra Subprograms (BLAS)

vecLib also contains Basic Linear Algebra Subprograms (BLAS) that use AltiVec technology for their implementations. The functions are grouped into three categories (called levels), as follows:

Vector-scalar linear algebra subprograms
Matrix-vector linear algebra subprograms
Matrix operations
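As a rough illustration of what each level covers, the sketch below gives one plain scalar routine per level. These are hypothetical reference functions written for this page, not the vecLib BLAS entry points, and they make no attempt at vectorization. Matrices are assumed to be stored in column-major order.

```c
#include <assert.h>

/* Level 1 (operations on vectors and scalars): y := alpha*x + y */
void ref_axpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y := A*x, where A is m-by-n, column-major,
   with leading dimension lda >= m. */
void ref_gemv(int m, int n, const double *a, int lda,
              const double *x, double *y)
{
    assert(lda >= m);
    for (int i = 0; i < m; i++)
        y[i] = 0.0;
    for (int j = 0; j < n; j++)          /* walk A one column at a time */
        for (int i = 0; i < m; i++)
            y[i] += a[j * lda + i] * x[j];
}

/* Level 3 (matrix-matrix): C := A*B, where A is m-by-k, B is k-by-n and
   C is m-by-n; all three are column-major and tightly packed. */
void ref_gemm(int m, int n, int k,
              const double *a, const double *b, double *c)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += a[p * m + i] * b[j * k + p];
            c[j * m + i] = sum;
        }
}
```

The amount of arithmetic per element of data grows with the level, which is why the higher levels generally benefit most from a tuned implementation.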

A Readme file is included that contains the following sections:

Descriptions of functions
Comparison with BLAS (Basic Linear Algebra Subroutines)
Test methodology
Future releases
Compiler version

Download Sample Code

LAPACK

LAPACK provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

See also <http://netlib.org/lapack/index.html>.

The C version of LAPACK (denoted CLAPACK) is intended to provide LAPACK to users who do not have access to a Fortran compiler.

However, CLAPACK is designed to create C code that is still callable from Fortran, so all arguments must be passed using Fortran calling conventions and data structures. This requirement has several repercussions. The first is that since many compilers require distinct Fortran and C routine namespaces, an underscore (_) is appended to C routine names which will be called from Fortran. Therefore, f2c has added this underscore to all the names in CLAPACK. So, a call that in Fortran would look like:

call dgetrf(...)

becomes in C:

dgetrf_(...);

Second, the user must pass ALL arguments by reference, i.e. as pointers, since this is how Fortran works. This includes all scalar arguments like M and N. This restriction means that you cannot make a call with numbers directly in the parameter sequence. For example, consider the LU factorization of a 5-by-5 matrix. If the matrix to be factored is called A, the Fortran call

call dgetrf(5, 5, A, 5, ipiv, info)

becomes in C:

M = N = LDA = 5;
dgetrf_(&M, &N, A, &LDA, ipiv, &info);

Some LAPACK routines take character string arguments. In all but the testing and timing code, only the first character of the string is significant. Therefore, the CLAPACK driver, computational, and auxiliary routines only expect single character arguments. For example, the Fortran call

call dpotrf( 'Upper', n, a, lda, info )

becomes in C:

char s = 'U';
dpotrf_(&s, &n, a, &lda, &info);

In a future release we hope to provide "wrapper" routines that will remove the need for these unnecessary pointers, and automatically allocate ("malloc") any workspace that is required.
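Until such wrappers exist, a thin one is easy to write by hand. The sketch below shows the pattern under stated assumptions: demo_trisum_ is a hypothetical stand-in written in the Fortran calling style described above (trailing underscore, every argument a pointer, column-major storage) so that the example runs without LAPACK, and trisum is a hypothetical by-value wrapper around it. Neither is a CLAPACK routine, and workspace allocation is not shown.

```c
#include <assert.h>

/* Hypothetical stand-in for a Fortran-style routine (not part of CLAPACK).
   Sums the upper ('U') or lower ('L') triangle, diagonal included, of an
   n-by-n column-major matrix. Only the first character of uplo is examined.
   On return *info is 0 on success or -1 if uplo is not recognized. */
void demo_trisum_(char *uplo, int *n, double *a, int *lda,
                  double *sum, int *info)
{
    int upper = (*uplo == 'U' || *uplo == 'u');
    int lower = (*uplo == 'L' || *uplo == 'l');
    *sum = 0.0;
    if (!upper && !lower) {
        *info = -1;
        return;
    }
    for (int j = 0; j < *n; j++)
        for (int i = 0; i < *n; i++)
            if ((upper && i <= j) || (lower && i >= j))
                *sum += a[j * (*lda) + i];
    *info = 0;
}

/* Hypothetical wrapper: the caller passes scalars by value, and the wrapper
   takes their addresses for the Fortran-style routine. Returns info. */
int trisum(char uplo, int n, const double *a, int lda, double *sum)
{
    int info;
    assert(lda >= n);
    demo_trisum_(&uplo, &n, (double *)a, &lda, sum, &info);
    return info;
}
```

With the wrapper in place a call such as trisum('U', 5, A, 5, &s) can use literal values directly, which the pointer-based interface does not allow.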

As a final point, we must stress that there is a difference in the definition of a two-dimensional array in Fortran and C. A two-dimensional Fortran array declared as

DOUBLE PRECISION A(LDA, N)

is a contiguous piece of LDA × N double-words of memory, stored in column-major order: elements in a column are contiguous, and elements within a row are separated by a stride of LDA double-words.

In C, however, a two-dimensional array is in row-major order. The array:

double A[LDA][N];

is a contiguous block of LDA rows, each of length N, so it is the elements within a row that are adjacent in memory, the opposite of what Fortran expects. A matrix built instead from an array of row pointers (double **) is further still from the Fortran layout, because its rows can in principle be anywhere in memory. Passing either kind of two-dimensional C array to a CLAPACK routine will almost surely give erroneous results. Instead, you must use a one-dimensional C array of size LDA × N double-words (or else malloc the same amount of space) and fill it in column-major order.

We recommend using the following code to get the array CLAPACK will be expecting:

double *A;
A = malloc( LDA*N*sizeof(double) );

Note that for best memory utilization, you would set LDA=M, the actual number of rows of A. If you now wish to operate on the matrix A, remember that A is in column-major order. As an example of accessing Fortran-style arrays in C, the following code fragments show how to initialize the array A declared above so that all of column j has the value j:

double *ptr;
ptr = A;
for(j=0; j < N; j++) {
    for (i=0; i < M; i++) *ptr++ = j;
    ptr += (LDA - M);
}

or, you can use:

for(j=0; j < N; j++) {
    for (i=0; i < M; i++) A[j*LDA+i] = j;
}

Note that the loop over the row index i is the inner loop, since column entries are contiguous.
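If the data starts out in a row-major C buffer, it has to be rearranged before it is handed to CLAPACK. The following is a minimal sketch of one way to do that; the COLMAJOR macro and the rowmajor_to_colmajor function are hypothetical helpers, not part of CLAPACK.

```c
#include <assert.h>
#include <stdlib.h>

/* Offset of element (i, j), zero-based, in a column-major array whose
   leading dimension is lda. Hypothetical helper macro. */
#define COLMAJOR(i, j, lda) ((size_t)(j) * (size_t)(lda) + (size_t)(i))

/* Copies an m-by-n matrix from a contiguous row-major buffer into a newly
   allocated column-major buffer with leading dimension lda >= m.
   Padding rows (when lda > m) are zero-filled.
   Returns NULL if allocation fails; the caller frees the result. */
double *rowmajor_to_colmajor(const double *src, int m, int n, int lda)
{
    assert(lda >= m);
    double *dst = calloc((size_t)lda * (size_t)n, sizeof *dst);
    if (dst == NULL)
        return NULL;
    for (int j = 0; j < n; j++)          /* columns outermost ...        */
        for (int i = 0; i < m; i++)      /* ... so writes are contiguous */
            dst[COLMAJOR(i, j, lda)] = src[(size_t)i * (size_t)n + (size_t)j];
    return dst;
}
```

The same macro can be used afterwards to read results back, for example A[COLMAJOR(i, j, LDA)] for the element in row i and column j.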

Vector Math Library (vMathLib and vForce)

vMathLib is a collection of numerical functions designed to facilitate a wide range of numerical programming for the AltiVec programming model. It includes computational functions (divide, square root), exponential functions and transcendental functions, to name a few. vForce is similar, except that where vMathLib takes SIMD vectors as function arguments, vForce takes long vectors (pointers to arrays) as function arguments. vForce transparently selects between scalar and vector as appropriate for the current architecture, and in many cases on PowerPC is considerably faster than vMathLib, because there is more data for it to process concurrently.
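The difference between the two interfaces can be sketched in portable C. Everything below is hypothetical: float4 stands in for a four-float SIMD register, and sketch_div4 and sketch_div_array are illustrative names, not the vMathLib or vForce entry points. The array version takes its element count by pointer, which is believed to mirror how vForce declares its routines; check vForce.h before relying on that.

```c
#include <assert.h>

/* Hypothetical stand-in for a SIMD register holding four floats. */
typedef struct { float lane[4]; } float4;

/* vMathLib-style interface: one SIMD vector per operand, passed by value.
   The caller is responsible for looping over larger data sets. */
float4 sketch_div4(float4 num, float4 den)
{
    float4 q;
    for (int k = 0; k < 4; k++)
        q.lane[k] = num.lane[k] / den.lane[k];
    return q;
}

/* vForce-style interface: whole arrays passed by pointer, so the library
   sees all *n elements at once and can choose how to process them.
   Computes z[i] = num[i] / den[i]. */
void sketch_div_array(float *z, const float *num, const float *den,
                      const int *n)
{
    assert(z && num && den && n);
    for (int i = 0; i < *n; i++)
        z[i] = num[i] / den[i];
}
```

With the array form the loop lives inside the library, which is what gives it more data to process concurrently.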

Apple provides source for vMathLib for educational purposes. If you intend to use this code in your application, it is suggested that you use the routines in vecLib.framework instead. They are likely to be more optimized and more correct. The Accelerate.framework version also includes x86 versions.

Basic Algebraic Operations (vBasicOps)

vBasicOps contains a library of algebraic operations that expand on the capabilities of the AltiVec hardware. It includes integer add and subtract, both modulo and saturated, for 64- and 128-bit operands; half-multiply (wherein the result is the same size as the input operands) for 8-, 16-, 32-, 64-, and 128-bit operands; full multiply even/odd for 64- and 128-bit operands; divide for all size operands from 8- to 128-bit; logical left and right shift for 64-bit operands; algebraic shift for 64- and 128-bit operands; and rotate left and right for 64- and 128-bit operands.
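For readers unfamiliar with the terminology, the scalar sketch below illustrates modulo versus saturated addition and half versus full multiplication. The function names are hypothetical and the code is plain C rather than the AltiVec implementation; it also assumes that the half-multiply result is the low-order half of the product, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo add: the result wraps around on overflow. */
uint64_t add_modulo_u64(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Saturated add: the result clamps to the largest value on overflow. */
uint64_t add_saturated_u64(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;
    return (sum < a) ? UINT64_MAX : sum;   /* sum < a means it wrapped */
}

/* Half multiply: the result is the same size as the operands
   (assumed here to be the low 32 bits of the product). */
uint32_t mul_half_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * (uint64_t)b);
}

/* Full multiply: the result is twice the size of the operands. */
uint64_t mul_full_u32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Small self-check showing where the variants differ. */
void basic_ops_sketch_check(void)
{
    assert(add_modulo_u64(UINT64_MAX, 2) == 1);
    assert(add_saturated_u64(UINT64_MAX, 2) == UINT64_MAX);
    assert(mul_half_u32(0x10000u, 0x10000u) == 0);
    assert(mul_full_u32(0x10000u, 0x10000u) == 0x100000000ULL);
}
```

The two additions agree whenever the sum fits in 64 bits, and the two multiplications agree whenever the product fits in 32 bits.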

Apple provides source for vBasicOps for educational purposes. If you intend to use this code in your application, it is suggested that you use the routines in vecLib.framework instead. They are likely to be more optimized.

Vector Big Number Library

vBigNum implements the following for 256-, 512-, and 1024-bit operands, using the AltiVec vector instruction set:

shift left, right
rotate left, right
multiply (half and full result)
divide (with remainder)
mod
add
subtract
negate
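To show what arithmetic on operands of this size involves, here is a minimal scalar sketch of 256-bit add and negate built from 64-bit limbs. The u256 type and both functions are hypothetical and unrelated to the vBigNum source; the real library works on AltiVec vectors rather than scalar limbs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-bit unsigned integer: four 64-bit limbs,
   least significant limb first. */
typedef struct { uint64_t limb[4]; } u256;

/* r = a + b (mod 2^256). Returns the carry out of the top limb (0 or 1).
   r may alias a or b. */
int u256_add(u256 *r, const u256 *a, const u256 *b)
{
    assert(r && a && b);
    uint64_t carry = 0;
    for (int k = 0; k < 4; k++) {
        uint64_t s  = a->limb[k] + carry;
        uint64_t c1 = (s < carry);         /* wrapped while adding carry */
        uint64_t t  = s + b->limb[k];
        uint64_t c2 = (t < s);             /* wrapped while adding b     */
        r->limb[k] = t;
        carry = c1 | c2;                   /* at most one of them is set */
    }
    return (int)carry;
}

/* r = -a (mod 2^256), computed as the bitwise complement plus one. */
void u256_negate(u256 *r, const u256 *a)
{
    u256 inverted;
    u256 one = {{1, 0, 0, 0}};
    for (int k = 0; k < 4; k++)
        inverted.limb[k] = ~a->limb[k];
    (void)u256_add(r, &inverted, &one);
}
```

Subtract follows from these two (a - b is a plus the negation of b), and the 512- and 1024-bit cases differ only in the number of limbs.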

Apple provides source for the vBigNum library for educational purposes. If you intend to use this code in your application, it is suggested that you use the routines in vecLib.framework instead. They are likely to be more optimized.
