The C Users' Group Library 1994 August

home *** CD-ROM | disk | FTP | other *** search

/ The C Users' Group Library 1994 August / wc-cdrom-cusersgrouplibrary-1994-08.iso / vol_200 / 209_01 / simpfitr.doc < prev next >

Wrap

Text File | 1990-03-04 | 15KB | 533 lines

SIMPFITR.DOC VERS:- 01.00 DATE:- 09/26/86 TIME:- 10:02:49 PM documentation for C routines for simplex fitting and quadratic approximation of least squares surface. By J.A. Rupley, Tucson, Arizona NOTES ON THE SIMPLEX FITTING PROGRAM XXXXFITn AND ITS SUPPORTING MODULES A. DESCRIPTION OF THE PROGRAM: A function explicitly relating a dependent variable to one or more independent variables and up to ten variable parameters is fit to a set of data points (observed values of the dependent and independent variables). The Nelder-Mead algorithm is used to locate the least squares minimum by a simplex search procedure, giving estimates of the best-fit values of the parameters. Standard deviations of the parameters are estimated by use of a quadratic approximation of the least squares surface near the minimum. The quadratic approximation, equivalent to minimization by use of a Taylor's series approximation, also returns an improved estimate of the parameter. Thus the quadratic approximation can be used as part of the minimization procedure. Alternation of several cycles (eg, 30) of simplex minimization with one cycle of quadratic minimization quickly converges on the least squares minimum. The simplex search efficiently defines the region of parameter space that contains the minimum; bounds are easily incorporated; because derivatives are not used, ill-behaved functions can be handled. The quadratic minimization, on the other hand, rapidly moves to the least squares minimum, but it is reliable only when the parameter estimates have been brought near the minimum. One cycle of quadratic minimization takes about as much time as 30 cycles of simplex minimization. If the quadratic minimization fails, as it often will in the early stages of the search, considerable time is wasted. To reduce this waste, the next one or several cycles of quadratic minimization are skipped after a failure. 1 USAGE: A>xxxxfitn [d:]input_file [[d:]output_file] the optional output file can be LST: or a disk file; the required input file has the following structure: one-line title, with no tabs or ctrl characters; lines following give control variables, the starting simplex, and the data array; the order of entry is fixed by <read_data()>; see sample file for structure; the one-word descriptors must be present; the data format is free-form (no set field widths); comments at the end of the file are not read by the program and any other information to be stored should be there. 2 B. CONTENTS OF THE C SOURCE FILES: SIMPMAIN: main program controlling input, output & fitting = main() XXXXFITn: routines for simplex fitting that depend on the data and function to be fit (must be changed when function fit is changed): all external definitions, except for <data>, which is defined as dummy storage in SIMPLIB1 or its modification function for calculation of dependent variable and weighted sum of residuals squared = func() print of <data> records = fdatprint() customizable display called by <simpfit()> = fspecial() other special functions called by above functions SIMPLIB0: general routines for simplex fitting: simplex fitting routine = simpfit() quadratic fit for standard deviations = simpdev() supporting functions SIMPLIB1: routines for data input that depend (partially) on the data and function to be fit: definition of the aggregate <data>, with a dummy structure declaration usage message displayed on command line error = use_mess() opening of files for input and optional output = file() input of variables and data = read_data() 3 C. PROGRAM FLOW: Program flow is controlled by <main()>. I. A call to <read_data()> at the start of execution initializes the following: a one-line title string, without tabs or other control characters; control values that are used in testing (1) for exit from the minimization (<exit_test>), (2) for cycles on which to print intermediate fitting results (<prt_cycle>), and (3) for cycles on which to pass through the quadratic approximation (<maxquad_skip> and <quad_test>); <iter>, the starting interation number, generally set at 0; the starting simplex <p>, with <nvert> vertices and <nparm> parameters; <nvert> = 1 + number of free parameters, <nfree>; parameters that are the same for all vertices are treated as fixed; the data set, stored in the aggregate <data> of size <ndata>, each record of <data[ndata]> having <ndatval> members. On return from <read_data()>: <maxiter> is set at <iter> + <prt_cycle>; <nquad_skip> and <quad_cycle>, used in resetting <quad_test>, are set at zero and <quad_test>, respectively; The first command line argument specifies the input file from which the above information is read. The (optional) second argument specifies the (optional) output file or device, on which a selection of the crt display is saved. A sample input file is given in a following section. II. Fitting by use of the Nelder-Mead simplex algorithm is carried out by call of <simpfit()>. Control is returned to main every <prt_cycle> number of fitting cycles, for printout of the current simplex and its centroid. III. On certain cycles of simplex minimization, determined by the value of <quad_test>, the current simplex is passed to the quadratic approximation routine <simpdev()>. Printout consists of (1) the data array with calculated values of the dependent variable and (2) a summary of the results of the quadratic fit, 4 including estimates of the standard deviations of the parameters. <Simpdev()> returns a modified simplex, with, if quadratic minimization was successful, the new low vertex closer to the least squares minimum. The variance-covariance matrix is returned in <qmat[nfree][nfree]>. The standard deviations of the parameters and other information are returned in the structure <q>. IV. Minimization consists of looping through steps II and III, until the value of <test> is less than <exit_test>: <exit_test> = input value <test> = <rms_func>/<mean_func> <rms_func> = (root mean square of the deviations of the least squares values at the simplex vertices) <mean_func> = (mean of the least squares values) On exit from the minimization, a pass through <simpdev()> gives the final estimates of the standard deviations of the parameters. The alternation of a set of simplex fitting cycles with a pass through the quadratic approximation gives rapid convergence to the least squares minimum. In order to control this, <quad_test> can be used in either of two ways: If in the input file <quad_test> is set < 1, then <simpdev()> is entered at the first printing cycle for which <test> is less than <quad_test>. Before looping back to <simpfit()>, <quad_test> is reduced by a factor of 10. If in the input file <quad_test> is set >= 1 (in which case it should be an integer multiple of <prt_cycle>), then <simpdev()> is entered every {<quad_cycle> * (<nquad_skip> + 1)} number of cycles. <nquad_skip> is initially zero; it is incremented on each failure of the quadratic minimization, up to the limit <maxquad_skip>; on a successful quadratic minimization, <nquad_skip> is reset at zero. This algorithm reduces the time spent in unsuccessful quadratic minimization. The operator can choose, from within <simpfit()>, to return to <main()> and pass through <simpdev()> for a cycle of quadratic approximation. 5 D. CODING IN XXXXFITn FOR THE SPECIFIC FUNCTION TO BE FIT: The routine <func()> calculates for a simplex vertex the weighted sum of residuals for the set of parameter values passed to it. The result is stored in <p.val>. Bounds on the parameters can be implemented easily, by a test that returns an excessively large function value (eg, 1.E38) if a parameter exceeds a bound. Coding should be efficient, because much of the minimization time is spent in <func()>. The print statement of the routine <fdatprint()> may have to be recoded after change in the function or data. The routine <fspecial()> controls the customizable display printed at the end of the standard display for each cycle of the simplex minimization in <simpfit()>. Other routines should be function and data independent, unless special manipulation of the data is required. NOTE: change in the model being fit should require changes in the coding only of the function of XXXXFITn. 6 E. COMMENTS ON THE CONSTRUCTION OF THE SIMPLEX FITTING PROGRAM: About 4K of memory are reserved to allow expansion of <main()> and<func()>, for more elaborate output, functions, etc. The maximum number of parameters, NPARM, is set at 10; if more are needed, all routines (SIMPMAIN, SIMPLIB0, etc) must be recompiled with the new value of NPARM. To make more readable the coding in <func()> of the model equation to be fit to the data: (1) use mnemonic member names in declaring <struct dat> in XXXXFITn; (2) declare a dummy structure, <struct pnamestruct>, that is entirely equivalent to the structure that holds the parameter values, <pstruct>, but that has mnemonic member names; the mnemonic dummy structure then can be used with the <pstruct> address passed as a parameter to <func()>. The DEFINITION of the aggregate <data> and the functions <use_mess()>, <file()>, and <read_data()> are in a separate file, SIMPLIB1; this is to to allow expansion of the aggregate <data> by overwriting most of SIMPLIB1; the SIMPLIB1 routines are entered only once, at the start of execution. the DECLARATION of <struct dat> according to the requirements of the model described by <func()> is given in XXXXFITn; <func()>, etc. reference the aggregate <data> as external to XXXXFITn, but through the structure <dat>, declared locally in XXXXFITn with mnemonic member names suitable for use in the coding of <func()>, etc. The intent is to generalize the read of the data file and the allocation of data storage (in SIMPLIB1), while retaining flexibility in the declaration of <struct dat data> in XXXXFITn; the following comments bear upon this arrangement: The loading of values into the aggregate <data> is done in the SIMPLIB1 module <read_me()> by use of: (1) a generalized ("dummy") structure for <data>; (2) a read loop that moves successive double values from the ascii data file into the storage at and above <data[0]>, without referencing the elements of <data> by structure member or index. 7 The "useful" declaration of the structure for <data> is in XXXXFITn, where it is referenced by <func()> and <fdatprint()>; <struct dat> must be changed to accord with the requirements of <func()> and <fdatprint()>; all members of <struct dat> MUST be of type double. Change in the model being fit should not require recoding and recompilation of <read_me()> or of any other routines except those of XXXXFITn; of course, change in the model requires change of <func()>, <fdatprint()>, and of the declarations of <struct dat> and <struct pnamestruct> in XXXXFITn. The size of the <data> aggregate is limited by (1) the size of free memory and (2) the size of SIMPLIB1, most of which can be overwritten by data records; for this version of the program, SIMPLIB1 corresponds to about 600 double values, and unused memory to about 600 double values; overwriting of the code of SIMPLIB1 may not be allowed by some compilers. For the six-member structure <dat> used in this version of the program, the maximum number of data points is more than 100 (600 double values), expandable to more than 200 (1200 double values) if SIMPLIB1 is recompiled with an increase of NDATA; NDATA is currently set at 350 double values; increase of NDATA of course decreases the amount of memory available for expansion of the code of <main()>, <func()>, etc. 8