Text File | 1990-01-03 | 17KB | 405 lines

KFFT.DOC            KFFT version 1.1       Jerry Kallaus   14 Feb 1989

KFFT V1.1 (C)Copyright 1989, Jerry Kallaus. All rights reserved.
May be freely redistributed for non-commercial use (FREEWARE).

    Jerry Kallaus
    993-D Mangrove
    Sunnyvale, Ca 94086

Feedback welcome.

---------------------------------------------------------------------

                 KFFT -- Fast Fourier Transform

CONTENTS

     1. Introduction
     2. Files
     3. Words
     4. Stack Diagrams
     5. Conditional Compile Controls
     6. Run Time Controls and Global Variables
     7. Data Representation
     8. Performance
     9. Future Versions
    10. Miscellaneous


1. INTRODUCTION

The following documents a Fast Fourier Transform (FFT) program which is
written in JForth* for Commodore Amiga computers. A common
computationally intensive bottleneck in a wide variety of digital signal
processing (DSP) applications is the transformation of data back and
forth between the time and frequency domains using the FFT.

The original motivation for writing this program was to satisfy my
curiosity as to how fast an FFT could be performed on a stock Commodore
Amiga computer with a 7.14 MHz Motorola 68000 microprocessor. Another
motivating factor was the idea that a good FFT program is probably the
most massively useful thing to have lying around when doing DSP
applications.

One set of source code is used which conditionally compiles variations
of the same basic algorithm, namely a radix-2 Cooley-Tukey FFT. A choice
had to be made as to whether to have one set of source code cluttered
with conditional compilation statements, or to maintain umpty-dozen sets
of source code. The former was chosen.

Some of the FFT variations achieve a speed which is probably within
about twenty percent of what could ultimately be achieved by
implementing the features described in the section on future versions.
Currently, times as good as 0.54 seconds for a 1024 point complex
fixed-point FFT, and 6.5 seconds for a 1024 point floating-point FFT,
have been achieved (see the section on Performance).

While JForth does a good job of supporting Forth standards, some
"features" of JForth may have been used which could make this code
difficult to port to other Forths. The current FFT version works with
JForth versions 2.0 and 1.2. As part of cleaning up some of the clutter
in future FFT versions, support for JForth v1.2 will probably be dropped
altogether.

The first few sections of this document consist of rather terse
reference material, with more elaboration given in later sections.

* JForth is a product of
      Delta Research
      P.O. Box 1051
      San Rafael, Ca 94915


2. FILES

The file KFFT1.ZOO contains the following files:

    fft.doc        this file
    cmplx.doc      documentation for complex arithmetic support words
    fftasm.doc     documentation for assembly language support words
    fftinc         INCLUDEs for compile
    fftcontrols    Conditional compilation controls
    fft1           Forth FFT code for complex data
    fftrc          Forth FFT code for real data
    fftmisc        Forth miscellaneous support words
    cmplx          Complex arithmetic and data type Forth words
    fft.asm        FFT assembler support words
    wtable         cosine, sine table for FFT
    makewtable     utility for making wtable
    testfft        simple test code for fft
    fft1.for       Fortran source which concisely depicts algorithm.


3. WORDS

    FFT                 fft - complex data in, complex data out
    FFTRC               fft - real data in, complex data out
    IFFT                inverse fft - complex data in, complex data out
    IFFTCR              inverse fft - complex data in, real data out
    CHECK.INPUTS.FFT    sanity check of fft arguments
    BIT.REVERSAL        performs bit reversal reordering operation
    INIT.MAP.FFT        makes bit reversal map for quick-reversal
    QUICK.REVERSAL      internal faster bit reversal, used with
                        init.map.fft

See also cmplx.doc and fftasm.doc


4. STACK DIAGRAMS

    FFT                 ( data-address log2-fft-size -- )
    FFTRC               ( data-address log2-fft-size -- )
    IFFT                ( data-address log2-fft-size -- )
    IFFTCR              ( data-address log2-fft-size -- )
    CHECK.INPUTS.FFT    ( data-address log2-fft-size -- )
    BIT.REVERSAL        ( data-address log2-fft-size -- )
    INIT.MAP.FFT        ( reversal-map-address log2-fft-size -- )
    QUICK.REVERSAL      ( data-address reversal-map-address -- )

Note that for FFTRC and IFFTCR, the argument log2-fft-size refers to the
number of data points in the real array. Also, for 2N real data points,
FFTRC computes a transform with N+1 real numbers and N-1 imaginary
numbers. The first and Nth imaginary parts are zero; rather than working
with oddball array sizes, FFTRC puts the (N+1)th real part in the
imaginary part of the first complex number of the transform. Likewise,
IFFTCR expects to find it there.

See also cmplx.doc and fftasm.doc

For example usage, see file testfft; words tfftrc and tifftcr.


5. FFT CONDITIONAL COMPILE CONTROLS

The file fftcontrols contains the following flags which control the
conditional compilation (* indicates default):

    FLOAT_FFT?          true  => version for floating point data
                      * false => version for fixed point data

    W_TABLE_FFT?      * true  => use table for cosine, sine values
                        false => compute cosine, sine values as needed

    ASM_FFT?          * true  => use assembler code for some basic
                                 operations
                        false => use JForth code for everything

    INNER_ASM_FFT?    * true  => use assembler code for inner loop of FFT
                        false => don't use assembler for inner loop of FFT

    AUTO_SCALE_FFT?   * true  => fixed pt version using block floating
                                 point
                        false => no auto scaling

Certain combinations of flags are not supported, and logic is present
which precludes those combinations. Specifically,

    if FLOAT_FFT? is false, then W_TABLE_FFT? will be forced to true
    if FLOAT_FFT? is true, then INNER_ASM_FFT? will be forced to false
    if FLOAT_FFT? is true, then AUTO_SCALE_FFT?
        will be forced to false

In other words, for a fixed point version, table trig values must be
used; and there is no assembler code for the inner loop, and no
auto-scale feature, for the floating version. The logic which forces the
above flag conditions is provided primarily as a convenience which
allows switching from a fixed-point version to a floating-point version
or vice versa by simply changing FLOAT_FFT? and not bothering with the
other flags.


6. RUN TIME CONTROLS AND GLOBAL VARIABLES

The following global variables provide run time controls and fft output
scaling control and information. Use of any of these is optional, and
except for REVERSAL-FFT, all are applicable only to fixed-point
versions.

    REVERSAL-FFT    addr => pre-formed reversal map address
    EVENS-FFT       true => divide by 2 on even fft stages
    ODDS-FFT        true => divide by 2 on odd fft stages
    INSHIFT-FFT     value specifying right shift of input to first stage
    OUTBITS-FFT     value specifying number of significant output bits
    SHIFTS-FFT      value returned specifying number of shifts done on
                    data
    BLK-EXP-FFT     adjusted block floating point exponent

Explanation:

Part of the FFT computation involves reordering the data by a process
called bit-reversal, so called because pairs of elements are exchanged
whose indices are the bit-reversal of each other. By default, the FFT
algorithms provided here perform this bit-reversal logic each time an
FFT is computed, using a word named BIT.REVERSAL. However, the word
INIT.MAP.FFT may optionally be used to eliminate most of this work. As
an example, suppose the user wishes to compute many 1024 point FFT's;
then the following code could be used:

    512 ZARRAY mymap          ( note only need half as many elements )
    0 mymap 10 INIT.MAP.FFT   ( 2**10 = 1024, 0 mymap is mymap begin addr )

The word INIT.MAP.FFT will compute and store into mymap a bit-reversal
map consisting of index swap pairs and a zero terminator at the end of
the list. A little less than the 512 elements are actually used.
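
The swap-pair idea can be illustrated outside of Forth. The following is
only a Python sketch of the general technique, not the KFFT code: the
names bit_reverse, swap_pairs and apply_map are made up for this
illustration, and it assumes that each map element holds one (i, j) swap
pair. Under that assumption a 1024 point map needs 496 pairs, which
agrees with "a little less than 512".

```python
def bit_reverse(index, bits):
    """Return index with its low `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def swap_pairs(log2_size):
    """List each (i, j) with j = bit_reverse(i) and i < j.

    Indices that are their own bit-reversal need no swap, and keeping
    only i < j records every exchange exactly once.
    """
    pairs = []
    for i in range(1 << log2_size):
        j = bit_reverse(i, log2_size)
        if i < j:
            pairs.append((i, j))
    return pairs


def apply_map(data, pairs):
    """Reorder data in place using a precomputed list of swap pairs."""
    for i, j in pairs:
        data[i], data[j] = data[j], data[i]


pairs = swap_pairs(10)      # map for a 2**10 = 1024 point transform
print(len(pairs))           # 496 swap pairs

data = list(range(8))
apply_map(data, swap_pairs(3))
print(data)                 # [0, 4, 2, 6, 1, 5, 3, 7]
```

Building the pair list once and reusing it replaces the per-index bit
manipulation with a simple walk over the stored pairs.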

The address of mymap is stored in the global variable REVERSAL-FFT.
Later, when the FFT algorithms are used, the non-zero value in
REVERSAL-FFT will indicate that a bit-reversal map exists and where it
is located; and instead of using the normal BIT.REVERSAL word, a word
named QUICK.REVERSAL will be used. The FFT of any data set of the same
size may be taken using the same map, and the user simply uses the FFT
algorithms as normal.

If different sizes of FFT's are desired, simply use INIT.MAP.FFT for
each FFT size, providing a different array for each size. In that case,
just before each use of the FFT algorithms, store the appropriate map
array address into the variable REVERSAL-FFT.

The following applies to the scaling of output data from the various FFT
algorithms. For floating FFT's, scaling the output is left to the user.
For fixed-point non-auto-scaled FFT's, the scaling of the output data is
left to the user, but the use of the variables INSHIFT-FFT, EVENS-FFT,
and ODDS-FFT will affect the scaling. For auto-scaled FFT's, the code
uses a block floating point concept to maintain maximum accuracy without
overflows throughout the computation. For auto-scaled FFT's, the
variables INSHIFT-FFT, SHIFTS-FFT, BLK-EXP-FFT, and OUTBITS-FFT are used
to control, and inform the user of, the scaling of the output data.
Also, a number of independent functions are provided for determining and
changing the scaling of arrays of fixed point data.

The casual reader may wish to skip to the last paragraph of this
section, as the following tends to go on ad nauseam about scaling
considerations when doing fixed point fft's.

A radix-2 fft computes the FFT in log2 n stages; for example, a 1024
point FFT takes 10 stages. The root mean square magnitude of the complex
elements of the FFT increases by the square root of two on each stage,
and specific elements may double (note that the RMS value doubles for
every two stages).
This isn't really a problem with a floating point FFT; however, for a
fixed point FFT, this can create arithmetic overflow problems.

For a fixed point FFT version which does NOT use auto-scale, the
following is applicable. The flags EVENS-FFT and ODDS-FFT can be set to
scale down (shift) the data by a factor of two on even or odd stages or
both. For this purpose, the stages are counted zero through M-1, where M
is the number of stages. The scale-down occurs on the input of the
stage. The default is for scale-down on odd stages. When taking forward
or inverse FFT's, if it is known or suspected that the data is impulsive
in nature in the domain being transformed to, scale-down on both even
and odd stages should probably be used to avoid arithmetic overflows.

Unfortunately, when taking large fft's, the preceding capability has
serious shortcomings. If a shift is done on every stage, complete loss
of significance can occur. If a shift is done on every other stage,
overflows can occur. Both situations are unacceptable. The auto-scale
feature now described addresses that problem.

The basic operation done on each element on each stage is of the form

    An+1(i) = An(i) + W * An(j),

where n refers to the old stage, n+1 refers to the new stage, i and j
are array indices, and A and W are complex numbers. W is the cosine and
sine of some angle, and complex multiplication by W represents a
rotation about the origin in the complex plane. Assume that a
fixed-point representation is used with an implied binary point such
that the magnitude of each An is less than one; then the magnitude of
each An+1 will be less than two. So it would be possible to compute the
magnitude of each An+1, pick the maximum of these, and if it is greater
than one, right shift all the data by one on input to the next stage,
which would ensure that all values remain less than one. Unfortunately,
the additional multiplies, adds, compares, etc., required would nearly
double the fft time.
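
The two growth claims above (magnitudes stay below two, and the RMS
value grows by the square root of two per stage) can be checked
numerically. The following Python sketch is only an illustration and is
not part of KFFT. It assumes the usual radix-2 butterfly, in which each
pair of inputs produces both A + W*B and A - W*B; the text above shows
only the sum form. The names butterfly and random_in_unit_disc are made
up for this example.

```python
import cmath
import math
import random


def butterfly(a, b, w):
    """One radix-2 butterfly: returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def random_in_unit_disc(rng):
    """Random complex number whose magnitude is strictly below one."""
    return cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * math.pi))


rng = random.Random(1989)
max_mag = 0.0       # largest output magnitude seen
energy_in = 0.0     # sum of squared input magnitudes
energy_out = 0.0    # sum of squared output magnitudes

for _ in range(10000):
    a = random_in_unit_disc(rng)
    b = random_in_unit_disc(rng)
    w = cmath.rect(1.0, rng.uniform(0.0, 2.0 * math.pi))  # pure rotation
    p, q = butterfly(a, b, w)
    max_mag = max(max_mag, abs(p), abs(q))
    energy_in += abs(a) ** 2 + abs(b) ** 2
    energy_out += abs(p) ** 2 + abs(q) ** 2

rms_growth = math.sqrt(energy_out / energy_in)
print(max_mag < 2.0)   # True: outputs never reach magnitude two
print(rms_growth)      # very close to sqrt(2) = 1.41421...
```

The RMS factor is exact rather than statistical, because
|a + t|^2 + |a - t|^2 = 2 (|a|^2 + |t|^2) and |t| = |b| when W is a pure
rotation.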

A much more efficient solution is as follows. If the magnitude of each
real and imaginary part of each An is limited to be less than one, then
the maximum magnitude that the real or imaginary part of any An+1 can
have is 1+sqrt(1+1), or approximately 2.414, and would occur when W
corresponds to a 45 degree rotation. The implementation of this is as
follows. The absolute values of the real and imaginary parts of all the
An are bitwise logically OR'd together in a register. If the resultant
value is less than one, no shift is done on the input to the next stage.
If the value is between one and two, the next stage input is right
shifted by one. If the value is greater than two, the next stage input
is right shifted by two. This algorithm only adds about ten percent to
the fft time.

It should be mentioned here that on machines which have the same word
length for everything, say 16 bits, the implementation of this algorithm
would lose one bit of significance relative to the algorithm which
tracks the complex magnitudes. This happens because the real and
imaginary parts would have to be kept smaller than one-half to prevent
overflows. However, on the M68000 microprocessor, even though fixed
point multipliers and multiplicands are limited to 16 bits, everything
else may be 32 bits. Thus, it is permissible for the operation to
overflow by two bits between stages, and there is no loss in
significance.

The variables:

    INSHIFT-FFT    may be used to specify a right shift amount for the
                   data going into the first stage of the fft.

    OUTBITS-FFT    may be used to specify the number of significant bits
                   to use for the output of the fft.

    SHIFTS-FFT     an output value specifying the total number of shifts
                   performed on the data.

    BLK-EXP-FFT    input and output value which represents the block
                   floating point exponent of the data. It is updated by
                   the FFT code simply by adding SHIFTS-FFT to the input
                   value.


7. DATA REPRESENTATION

Each value uses one cell (long word, 4 bytes, 32 bits).

For complex numbers, the following conventions apply. On the stack, the
imaginary part is on top with the real part under it. In memory, the
real part is at the lower memory address with the imaginary part at the
adjacent higher memory cell address.

For fixed point data, the following applies. Although 32 bits are used
to hold each value, the signed value must be contained within the
low-order 16 bits, with the high-order 16 bits being the sign extension
of the low-order 16 bits. The variable INSHIFT-FFT is primarily provided
for handling data with more than 16 bits of significance, including the
sign bit. The functions OR.ABS.ARRAY and NSBITS may be used for rapidly
determining the magnitude of an array of fixed point data.


8. PERFORMANCE

Times in seconds for a 1024 point fft for complex (FFT, IFFT) and real
(FFTRC, IFFTCR) data. The run-time quick-reversal option was used in all
cases. Add one-tenth of a second for the time without this option.

    complex   real    float  asm-fft  w-table  inner-asm  auto-scale

      6.92    3.72      T       F        F         F          F
      3.92    2.00      F       F        F         F          F
      1.24     .60      F       T        T         F          F
       .54     .34      F       T        T         T          F
       .62     .38      F       T        T         T          T


9. FUTURE VERSIONS

The following are only possibilities, and will probably be developed
only if and when the need arises. The timing performance improvements
are very rough guesstimates.

Add option to use table for all trig values, as opposed to their
recursive computation. (5-10 percent)

Provide additional inner loops which eliminate multiplies by ones and
zeros, and possibly combine multiplies by sqrt(2). (5-10 percent)

Probably will not develop radix-4 fft algorithm (10-20 percent) or
other/mixed radix algorithms.

Provide some commonly used windowing functions.

Change to algorithm(s) which do the bit-reversal after the transform,
rather than prior to the transform. This would be preferable for high
speed convolution (filtering) and correlation applications which do not
necessarily require any bit-reversal.


10. MISCELLANEOUS

A discussion or derivation of the fast Fourier transform is beyond the
scope of this document. Numerous texts on digital signal processing are
available for this purpose. I would particularly recommend "Digital
Signal Processing", edited by Lawrence R. Rabiner and Charles M. Rader,
IEEE Press.