In article <1992Dec15.145457@is.morgan.com> robt@is.morgan.com (Rob Torop) writes:
>
>I have some code, the core of which is a loop that computes the transformation of a
>vector by a band matrix multiple times:
>
>    x <-- Bx,   where
>
>    B =  b c 0 . . . 0
>         a b c . . . 0
>           .  .  .
>         0 0 0 . a b c
>
>Right now, I'm basically doing the obvious thing, looping over the
>vector. I have this nagging feeling that there is something brilliant
>and very different that can be done. I'm running on Sun Sparcs (2 and
>10), in case that makes a difference. Any advice?
>
>
>-- Rob Torop
Yes, you're right that there are other ways to do this. Your banded
matrix has special structure: it is a Toeplitz matrix, that is, each
diagonal has constant entries. Multiplication of a vector by a
Toeplitz matrix can be done as a convolution (consult Numerical
Recipes, for example, for an explanation). However, there is an even
easier way in the case of your compact, tridiagonal matrix.
Think of your matrix B as the sum aI(-) + bI + cI(+), where I is the
identity matrix, I(-) is the matrix with ones along the first
sub-diagonal and zeros elsewhere, and I(+) is the matrix with ones
along the first super-diagonal. The effect of multiplying by I(-) is
to shift a vector forward by one entry, with the first entry replaced
by zero. Thus, I(-)*[1,2,3] = [0,1,2]; I intend my vector notation
to denote a column vector.
Similarly, multiplication by I(+) shifts the entries back by one,
replacing the last entry with zero: I(+)*[1,2,3] = [2,3,0].
Multiplication by I, of course, simply returns the original vector.
If you put this all together, our example would be
Bx = a*[0,1,2] + b*[1,2,3] + c*[2,3,0]
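
In C, the whole update is one short loop. Here is a minimal sketch;
the function and variable names (tridiag_mv, x, y, n) are mine, not
from your code, and it assumes n >= 2 and a separate output vector:

#include <stdio.h>

/* y <- B*x for the tridiagonal Toeplitz matrix B = aI(-) + bI + cI(+).
 * Assumes n >= 2; x and y must not overlap. */
void tridiag_mv(double a, double b, double c,
                const double *x, double *y, int n)
{
    y[0] = b*x[0] + c*x[1];                    /* first row: no 'a' term */
    for (int i = 1; i < n - 1; i++)
        y[i] = a*x[i-1] + b*x[i] + c*x[i+1];   /* interior rows */
    y[n-1] = a*x[n-2] + b*x[n-1];              /* last row: no 'c' term */
}

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0}, y[3];
    tridiag_mv(1.0, 10.0, 100.0, x, y, 3);     /* a=1, b=10, c=100 */
    printf("%g %g %g\n", y[0], y[1], y[2]);    /* prints 210 321 32 */
    return 0;
}

If you want the update in place (x <-- Bx), keep the previous entry
of x in a scalar temporary as you sweep, or copy from a second vector
as above.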
The generalization to your problem is straightforward (the sketch
above is one way to write it). It takes about 3n multiplications
(3*(n-1) + 1, to be exact) and roughly as many additions, whereas
general matrix-vector multiplication requires n^2 multiplications,
where n is the length of the vector. If your matrix is Toeplitz but
more of the diagonals are filled in, you can use Fast Fourier
transforms to do the equivalent convolutions, which can be done in
order n*log2(n) operations. In practice, the speed of the FFT
approach depends a lot on implementation details, and so your effort
wouldn't be repaid except at large n (~512 or larger).
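
For what it's worth, here is a rough, self-contained sketch of the
FFT route as well. Everything below (the helper names fft and
toeplitz_mv_fft, the radix-2 FFT itself) is my own illustration, not
code from any particular library; on a Sparc you would substitute
whatever tuned FFT you have at hand. The idea is to embed the n x n
Toeplitz matrix in an m x m circulant matrix, with m a power of two
and m >= 2n; multiplying by a circulant matrix is a circular
convolution, which the FFT turns into a pointwise product.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>

/* In-place radix-2 FFT; n must be a power of two.
 * sign = -1 gives the forward transform, sign = +1 the unscaled inverse. */
static void fft(double complex *a, int n, int sign)
{
    for (int i = 1, j = 0; i < n; i++) {       /* bit-reversal permutation */
        int bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            double complex t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    for (int len = 2; len <= n; len <<= 1) {   /* butterfly passes */
        double complex wlen = cexp(sign * 2.0 * acos(-1.0) * I / len);
        for (int i = 0; i < n; i += len) {
            double complex w = 1.0;
            for (int k = 0; k < len / 2; k++) {
                double complex u = a[i + k];
                double complex v = a[i + k + len/2] * w;
                a[i + k]         = u + v;
                a[i + k + len/2] = u - v;
                w *= wlen;
            }
        }
    }
}

/* y <- T*x for an n x n Toeplitz T given by its first column tc[0..n-1]
 * and first row tr[0..n-1] (tr[0] must equal tc[0]).  T is embedded in
 * an m x m circulant matrix whose first column is
 *   [ tc[0], ..., tc[n-1], 0, ..., 0, tr[n-1], ..., tr[1] ],
 * and a circulant times a vector is a circular convolution, done by FFT. */
void toeplitz_mv_fft(const double *tc, const double *tr,
                     const double *x, double *y, int n)
{
    int m = 1;
    while (m < 2 * n)
        m <<= 1;
    double complex *col  = calloc(m, sizeof *col);
    double complex *xpad = calloc(m, sizeof *xpad);
    for (int k = 0; k < n; k++) {
        col[k]  = tc[k];
        xpad[k] = x[k];
    }
    for (int k = 1; k < n; k++)
        col[m - k] = tr[k];                    /* wrap the first row around */

    fft(col, m, -1);
    fft(xpad, m, -1);
    for (int k = 0; k < m; k++)
        xpad[k] *= col[k];                     /* convolution theorem */
    fft(xpad, m, +1);
    for (int k = 0; k < n; k++)
        y[k] = creal(xpad[k]) / m;             /* unscale the inverse FFT */

    free(col);
    free(xpad);
}

int main(void)
{
    double tc[3] = { 10.0,   1.0, 0.0 };       /* first column: b, a, 0 */
    double tr[3] = { 10.0, 100.0, 0.0 };       /* first row:    b, c, 0 */
    double x[3]  = {  1.0,   2.0, 3.0 }, y[3];
    toeplitz_mv_fft(tc, tr, x, y, 3);
    printf("%g %g %g\n", y[0], y[1], y[2]);    /* prints 210 321 32 */
    return 0;
}

The circulant size m is rounded up to a power of two only to keep the
radix-2 FFT simple; a mixed-radix FFT would let you use any m >= 2n-1.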