Text File | 1986-01-08 | 32KB | 897 lines

PC-SIZE

A Program for Sample Size Determinations

Version 2.0
August 8, 1985

Gerard E. Dallal

USDA Human Nutrition Research Center on Aging
at Tufts University
711 Washington Street
Boston, MA 02111

and

Tufts University School of Nutrition
132 Curtis Street
Medford, MA 02155


PC-SIZE determines the sample size requirements for single factor
experiments, two factor experiments, randomized blocks designs, and
paired t-tests. In generic F mode, PC-SIZE can determine sample sizes
for any experiment in which the power at the alternative is given by
a non-central F distribution with fixed numerator degrees of freedom,
denominator degrees of freedom that are linear in the sample size,
and a non-centrality parameter that is proportional to the sample
size.

PC-SIZE can determine the sample size needed to detect a non-zero
population correlation coefficient when sampling from a bivariate
normal distribution. It can also be used to obtain the common sample
size required to test the equality of two proportions.

PC-SIZE can calculate the power of specific sample sizes as well as
determine the sample size needed to achieve specific power.


NOTICE

Documentation and original code copyright 1985 by Gerard E. Dallal.
Reproduction of material for non-commercial purposes is permitted,
without charge, provided that suitable reference is made to PC-SIZE
and its author. Neither PC-SIZE nor its documentation should be
modified in any way without permission from the author, except for
those changes that are essential to move PC-SIZE to another computer.

Please acknowledge PC-SIZE in any manuscript that uses its
calculations.


TABLE OF CONTENTS

Features
Installation
Operation
    Specifying the design
    Specifying the alternative
    Generic F mode
    Initial approximation
Correlation coefficient
Proportions
Other applications
    Two sample t-test
    Comparing a single sample to a known standard
Power of specific sample sizes
Non-centrality parameters
Validation
Algorithms
References
Sample size tables for the correlation coefficient


FEATURES

1. Flexibility: Query system for single factor, two factor,
   randomized blocks designs and paired t-tests. Generic Mode permits
   sample size calculations for many problems in which the power at
   the alternative is given by the non-central F distribution.

2. Portability: PC-SIZE is written in FORTRAN 77, but not too far
   from the 66 standard. To make PC-SIZE run on a VAX, for example,
   all you need do is modify the I/O unit numbers (contained in a
   single DATA statement) and an OPEN statement.

3. PC-SIZE will calculate the power of a specific sample size as well
   as the sample size required to achieve specific power.

4. Calculations may be saved in a designated output file.

5. Double precision calculations are used throughout.

6. Quantities contained in square brackets at the prompts are default
   values which can be obtained by pressing the return key. Default
   values are updated with the latest entry for each quantity,
   thereby simplifying the task of requesting a number of sample size
   calculations that share many of the same specifications.

7. Trailing decimal points may be omitted or included as you wish.


INSTALLATION

PC-SIZE is written for the IBM-PC.
Installation on a new computer may entail modifying the following
statements:

    The first DATA statement:

        IIN   -- input unit number (screen)
        IOUT  -- output unit number (screen)
        IWOUT -- save file unit number
        NMAX0 -- large integer constant (the largest sample size
                 that can be considered)

    The OPEN statement for the save file just before statement 10.


OPERATION

Operation begins with the user specifying the level of the test and
the power required at the alternative. PC-SIZE will report the number
of observations per cell, per group (in the case of proportions), or
per randomized block.

Specifying the Design

Single factor designs: The user is prompted for the number of groups.

Two factor designs: The user is prompted for the number of levels of
each factor. (PC-SIZE assumes that the calculations are being carried
out for the main effects of Factor A.) The user can then indicate
whether an interaction term will be present in the model and the
ANOVA table. (A * B * (N - 1) denominator degrees of freedom, where
'A' and 'B' are the number of levels of the two factors, if
interaction is present; A * B * N - A - B + 1 denominator degrees of
freedom, if not.)

Randomized blocks designs: The user is prompted for the number of
levels of the treatment factor. PC-SIZE calculates the number of
blocks needed to achieve the desired power assuming each block
receives one complete set of treatments.

Paired t-tests: The user is prompted for the expected difference and
the standard deviation of the differences.

Specifying the Alternative

In the cases of single factor, two factor, and randomized blocks
designs, the user is given three options for specifying the
alternative at which the power is to be evaluated:

1. Specifying the individual effects. PC-SIZE automatically centers
   the effects about zero. It is not necessary to subtract the mean
   from each effect before entry.

2. Specifying a range (a single number) for the effects.
   The minimum and maximum effects are assumed to occupy the
   endpoints of the range with the remaining effects distributed
   uniformly throughout.

3. Specifying the average squared effect (where, for this option,
   the mean has been subtracted from each effect before squaring)
   divided by the error variance.

Generic F Mode

Generic mode requires more sophistication on the part of the user but
is capable of handling a wide variety of problems, specifically, any
problem for which the power at the alternative is given by a
non-central F distribution with fixed numerator degrees of freedom,
denominator degrees of freedom that are linear in the sample size,
and a non-centrality parameter that is a multiple of the sample size.
(Non-centrality parameters are discussed below.) The user is prompted
for the numerator degrees of freedom, the linear function that
defines the denominator degrees of freedom, and the multiple of the
sample size that defines the non-centrality parameter.

Initial Approximation

PC-SIZE invokes a "large sample approximation" (using a non-central
chi-square power function in place of the non-central F) to get a
rough estimate of the necessary sample size. The power is calculated
at increments of 1 if the estimate is less than 500, 10 if the
estimate is between 500 and 5000, 100 if the estimate is between 5000
and 50000, and so on. The calculations start at the large sample
estimate less 5% or a count of 10, whichever is greater, rounded to
the nearest increment, and continue until the required power is
obtained. The correlation coefficient and proportions are handled
differently--see below.


CORRELATION COEFFICIENT

This mode is used when sampling from a bivariate normal population,
neither of the two variables having its values fixed prior to
sampling. PC-SIZE will calculate the sample size needed to carry out
a two-tailed test of the hypothesis that the population correlation
coefficient is 0.
The user is prompted for a non-null value of the coefficient.

Note: The distribution of the sample correlation coefficient when the
population value is non-zero is obtained through numerical
integration using Simpson's Rule with some bells and whistles to
speed up convergence. Ordinates of the density function are
calculated recursively, resulting in an execution time that is
proportional to sample size. PC-SIZE reports the power of the test
for sample sizes 3, (2**K: K=2,3,...) successively until the required
power is exceeded. A binary search is then carried out (with
intermediate results NOT reported) to locate the minimum adequate
sample size. If the sample size is large, the binary search can
consume large amounts of execution time.

The tables at the end of this document, produced by PC-SIZE, give the
necessary sample size for tests of power 0.50(0.10)0.90, 0.95 at
levels 0.05 and 0.01 for underlying population correlation
coefficients of 0.05, 0.10(0.10)0.90.


PROPORTIONS

PC-SIZE uses formulas 3.14 and 3.15 of Fleiss (1981) to determine the
common sample size for a test of the equality of two proportions.
This estimate is a large sample approximation based on standard
normal theory. The user is prompted for the values of the proportions
under the alternative to equality.

In some instances the values produced by PC-SIZE will be 1 greater
than those in Fleiss's Table A.3. Fleiss has apparently taken the
values produced by the formulae and rounded to the nearest integer.
PC-SIZE reports the smallest integer not less than the result of the
formulae.


OTHER APPLICATIONS

Two Sample t-test

This is a single factor analysis of variance with two groups.

Comparing a Single Sample to a Known Standard

Use the paired t-test mode, setting the "expected difference" to the
expected difference between the unknown population mean and the known
standard.
Set the "estimate of standard deviation of difference" to the
estimated population standard deviation.


POWER OF SPECIFIC SAMPLE SIZES

PC-SIZE will perform power calculations for specific sample sizes as
well as determine the sample size required to achieve specific power.
If the requested power is an integer greater than or equal to 1,
PC-SIZE starts its power calculations at a sample size equal to the
requested power. The user is prompted for an increment and a stopping
value.


NON-CENTRALITY PARAMETERS

Different authors use different definitions of the non-centrality
parameter of the non-central F distribution. The differences
typically involve a square root, a factor of (numerator degrees of
freedom + 1), and/or a factor of 2.

PC-SIZE follows the notation of Kendall and Stuart (1973, pp. 237,
262): The sum of the squares of "d" independent normal variables with
arbitrary means and unit variances is said to follow a non-central
chi-square distribution with "d" degrees of freedom and
non-centrality parameter equal to the sum of the squared means. The
ratio of a non-central chi-square variable with "d1" degrees of
freedom and non-centrality parameter "lambda", divided by "d1", to an
independent central chi-square variable with "d2" degrees of freedom,
divided by "d2", is said to follow a non-central F distribution with
"d1" numerator degrees of freedom, "d2" denominator degrees of
freedom, and non-centrality parameter "lambda". Scheffe (1959, p. 414)
defines his non-centrality parameter to be the square root of this
quantity.

Following Graybill (1961, Theorem 11.16), a non-centrality parameter
can be obtained as the numerator degrees of freedom times (the
difference between the numerator expected mean square and the error
variance) divided by the error variance. It is assumed that the error
variance is given by the expected mean square of the denominator of
the F-ratio.

The following notation is used throughout this section:

    ALPHA  -- level of the test
    POWER  -- power at the alternative
    K      -- number of effects under test (number of groups,
              levels,...)
    F1     -- numerator degrees of freedom
    F2     -- denominator degrees of freedom
    AVGESQ -- average squared effect divided by the error variance
    LAMBDA -- non-centrality parameter
    N      -- sample size
    EVAR   -- error variance (often within cell)
    EFF(I) -- the I-th of the effects under test

    [ AVGESQ = (SUM(EFF(I)**2) / K) / EVAR ]

1. Single Factor Experiment (K Groups):

       LAMBDA = N * SUM(EFF(I)**2) / EVAR
              = N * K * AVGESQ

2. Two Factor Experiment (Factor A -- "A" levels; Factor B -- "B"
   levels):

   Main effects for Factor A:

       LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
              = N * A * B * AVGESQ

   Two factor interaction:

       LAMBDA = N * SUM(EFF(I)**2) / EVAR
              = N * A * B * AVGESQ

3. Randomized blocks designs (Single treatment factor at K levels):

       LAMBDA = N * SUM(EFF(I)**2) / EVAR
              = N * K * AVGESQ

4. Simple linear regression:

       E(Y(i)) = C0 + C1 * X(i)

   (N observations at each X(i), i=1,...,p, with mean 0)

       LAMBDA = N * (C1**2 * SUM(X(i)**2)) / EVAR

5. Quadratic regression:

       E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2

   H0: C1 = C2 = 0:

       LAMBDA = N * (C1**2 * SUM(X(i)**2)
                     + 2 * C1 * C2 * SUM(X(i)**3)
                     + C2**2 * SUM(X(i)**4)) / EVAR

   H0: C2 = 0:

       LAMBDA = C2**2 * SUM(X(i)**4)


VALIDATION

PC-SIZE was validated by applying it to all of the examples from
sections 3.2 through 3.6, inclusive, of Odeh and Fox (1975). All were
reproduced, with the following exceptions:

    example 3.3.1 (main effects for A with no interaction in the
    model): OF estimate 3. PC-SIZE calculates the power of a sample
    of size 3 to be 0.79896 (<0.80). 4 are needed.

    example 3.5.2 (test of quadratic regression term): OF estimate
    40. PC-SIZE calculates the power of a sample of size 40 to be
    0.94796 (<0.95). 41 are needed.

    example 3.6.2 (multivariate t-test): OF estimate 100.
    PC-SIZE calculates the power of a sample of size 100 to be
    0.99484 (<0.995). 101 are needed.

The values of the arguments and the resulting sample size estimates
from PC-SIZE are:

Single Factor Experiment (K Groups)

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * K * AVGESQ

Example 3.2.1:

    ALPHA = 0.05        POWER = 0.80
    K = 2               F1 = 1          F2 = 2 * (N - 1)
    AVGESQ = 2          LAMBDA = 4 * N
    N = 4

Example 3.2.2:

    ALPHA = 0.025       POWER = 0.70
    K = 3               F1 = 2          F2 = 3 * (N - 1)
    AVGESQ = 1/3        LAMBDA = 1 * N
    N = 11

Example 3.2.3:

    ALPHA = 0.01        POWER = 0.975
    K = 6               F1 = 5          F2 = 6 * (N - 1)
    AVGESQ = 2/3        LAMBDA = 4 * N
    N = 9

Two Factor Experiment (Factor A -- "A" levels; Factor B -- "B"
levels)

Main effects for Factor A:

    LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
           = N * A * B * AVGESQ

A * B interaction:

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * A * B * AVGESQ

    where EFF(i), i=1,...,A*B are the interaction terms.

Example 3.3.1:

  Main effects for A with interaction in model:

    ALPHA = 0.05        POWER = 0.80
    A = 3               F1 = 2          F2 = 6 * (N - 1)
    B = 2
    AVGESQ = 2/3        LAMBDA = 4 * N
    N = 4

  Main effects for A with no interaction in model:

    ALPHA = 0.05        POWER = 0.80
    A = 3               F1 = 2          F2 = 6 * N - 4
    B = 2
    AVGESQ = 2/3        LAMBDA = 4 * N
    N = 4

  Test for interaction (Use generic mode):

    ALPHA = 0.05        POWER = 0.90
    K = 6               F1 = 2          F2 = 6 * (N - 1)
    AVGESQ = 1/2        LAMBDA = 3 * N
    N = 5

Example 3.3.2:

  Main effects for A with interaction in model:

    ALPHA = 0.005       POWER = 0.60
    A = 4               F1 = 3          F2 = 16 * (N - 1)
    B = 4
    AVGESQ = 1          LAMBDA = 16 * N
    N = 2

  Main effects for A with no interaction in model:

    ALPHA = 0.005       POWER = 0.60
    A = 4               F1 = 3          F2 = 16 * N - 7
    B = 4
    AVGESQ = 1          LAMBDA = 16 * N
    N = 2

  Test for interaction (Use generic mode):

    ALPHA = 0.10        POWER = 0.60
    K = 16              F1 = 9          F2 = 16 * (N - 1)
    AVGESQ = 1/8        LAMBDA = 2 * N
    N = 5

Example 3.3.3:

  Main effects for A with interaction in model:

    ALPHA = 0.01        POWER = 0.70
    A = 2               F1 = 1          F2 = 6 * (N - 1)
    B = 3
    AVGESQ = 1          LAMBDA = 6 * N
    N = 3

  Main effects for A with no interaction in model:

    ALPHA = 0.01        POWER = 0.70
    A = 2               F1 = 1          F2 = 6 * N - 4
    B = 3
    AVGESQ = 1          LAMBDA = 6 * N
    N = 3

  Test for interaction (Use generic mode):

    ALPHA = 0.001       POWER = 0.90
    K = 6               F1 = 2          F2 = 6 * (N - 1)
    AVGESQ = 1/2        LAMBDA = 3 * N
    N = 10

Randomized blocks designs (Single treatment factor at K levels)

    LAMBDA = N * SUM(EFF(I)**2) / EVAR
           = N * K * AVGESQ

Example 3.4.1(i):

    ALPHA = 0.05        POWER = 0.90
    K = 3               F1 = 2          F2 = 2 * (N - 1)
    AVGESQ = 2/3        LAMBDA = 2 * N
    N = 8

Example 3.4.1(ii): multiple treatment factors use generic mode

    ALPHA = 0.05        POWER = 0.90
    A = B = 3           F1 = 2          F2 = 8 * (N - 1)
    AVGESQ = 2/3        LAMBDA = 6 * N
    N = 3

Example 3.4.2: multiple treatment factors use generic mode

    ALPHA = 0.001       POWER = 0.95
    A = B = 2           F1 = 1          F2 = 12 * N - 2
    K = 1,...,6*N
    AVGESQ = 1          LAMBDA = 24 * N
    N = 2

Example 3.4.3: multiple treatment factors use generic mode

    ALPHA = 0.025       POWER = 0.70
    A = 6               F1 = 5          F2 = 17 * (N - 1)
    B = 3
    AVGESQ = 1/3        LAMBDA = 6 * N
    N = 3

Regression using Generic Mode

Simple linear regression

    E(Y(i)) = C0 + C1 * X(i)

    (N observations at each X(i), i=1,...,p, with mean 0)

    LAMBDA = N * (C1**2 * SUM(X(i)**2)) / EVAR

Quadratic regression

    E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2

    LAMBDA = N * (C1**2 * SUM(X(i)**2)
                  + 2 * C1 * C2 * SUM(X(i)**3)
                  + C2**2 * SUM(X(i)**4)) / EVAR

Example 3.5.1 (linear):

    ALPHA = 0.001       POWER = 0.995
    F1 = 1              F2 = 3 * N - 2
    LAMBDA = 17 * N
    N = 5

Example 3.5.1 (quadratic): H0: C1 = C2 = 0

    ALPHA = 0.001       POWER = 0.995
    F1 = 2              F2 = 3 * (N - 1)
    LAMBDA = 144 * N
    N = 3

Example 3.5.1 (quadratic): H0: C2 = 0

    ALPHA = 0.001       POWER = 0.995
    F1 = 1              F2 = 3 * (N - 1)
    LAMBDA = 257 * N
    N = 3

Example 3.5.2 (linear):

    ALPHA = 0.025       POWER = 0.95
    F1 = 1              F2 = 6 * N - 2
    LAMBDA = 1.150 * N
    N = 14

Example 3.5.2 (quadratic): H0: C2 = 0

    ALPHA = 0.025       POWER = 0.95
    F1 = 1              F2 = 3 * (N - 1)
    LAMBDA = .382 * N
    N = 41

Multivariate t-test

Example 3.6.1:

    ALPHA = 0.10        POWER = 0.70
    F1 = 5              F2 = N - 5
    LAMBDA = 1 * N
    N = 14

Example 3.6.2:

    ALPHA = 0.10        POWER = 0.995
    F1 = 4              F2 = 2 * N - 5
    LAMBDA = .25 * N
    N = 101


ALGORITHMS

PC-SIZE makes use of the following published routines, modified to
run in double precision:

Best, D.J. and D.E. Roberts (1975). Algorithm AS 91. The percentage
    points of the chi-squared distribution. Appl. Statist., 24,
    385-388.

Bhattacharjee, G.P. (1970). The incomplete gamma integral. Appl.
    Statist., 19, 285-287.

Cran, G.W., K.J. Martin and G.E. Thomas (1977). Remark AS R19 and
    Algorithm AS 109. A remark on algorithms AS 63: The incomplete
    beta integral, and AS 64: Inverse of the incomplete beta function
    ratio. Appl. Statist., 26, 111-114.

Hill, I.D. (1973). Algorithm AS 66. The normal integral. Appl.
    Statist., 22, 424-427.

Majumder, K.L. and G.P. Bhattacharjee (1973). Algorithm AS 63. The
    incomplete beta integral. Appl. Statist., 22, 409-411.

Odeh, R.E. and J.O. Evans (1974). Algorithm AS 70. The percentage
    points of the normal distribution. Appl.
    Statist., 23, 96-97.

and the author's FORTRAN translation of

Pike, M.C. and I.D. Hill (1966). Algorithm 291. Logarithm of the
    gamma function. Commun. Ass. Comput. Mach., 9, 684.


REFERENCES

Fleiss, Joseph L. (1981). Statistical Methods for Rates and
    Proportions, 2nd ed. New York: John Wiley & Sons, Inc.

Graybill, Franklin A. (1961). An Introduction to Linear Models,
    Vol. 1. New York: McGraw-Hill Book Company, Inc.

Kendall, Maurice G. and Alan Stuart (1973). The Advanced Theory of
    Statistics, Volume 2, 3rd ed. New York: Hafner Publishing Co.

Odeh, Robert E. and Martin Fox (1975). Sample Size Choice: Charts
    for Experiments with Linear Models. New York: Marcel Dekker, Inc.

Scheffe, Henry (1959). The Analysis of Variance. New York: John
    Wiley and Sons, Inc.


SAMPLE SIZE FOR THE TEST OF A NON-ZERO CORRELATION COEFFICIENT

ALPHA = 0.05

                              POWER
    RHO     0.50    0.60    0.70    0.80    0.90    0.95

    0.05    1536    1959    2467    3137    4198    5192
    0.10     384     489     616     782    1046    1293
    0.20      96     122     153     193     258     319
    0.30      43      54      67      84     112     138
    0.40      24      30      37      46      61      75
    0.50      15      19      23      29      37      46
    0.60      11      13      15      19      24      30
    0.70       8       9      11      13      17      20
    0.80       6       7       8       9      11      13
    0.90       5       5       6       6       8       9

ALPHA = 0.01

                              POWER
    RHO     0.50    0.60    0.70    0.80    0.90    0.95

    0.05    2653    3199    3841    4667    5944    7116
    0.10     662     798     958    1163    1481    1772
    0.20     165     198     237     287     365     436
    0.30      72      87     103     125     158     189
    0.40      40      48      57      68      86     102
    0.50      25      30      35      42      52      62
    0.60      17      20      23      27      34      40
    0.70      12      14      16      19      23      27
    0.80       9      10      11      13      15      18
    0.90       6       7       8       9      10      11
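
The search strategy described in the CORRELATION COEFFICIENT section
(power reported at sample sizes 3, 4, 8, 16, ... followed by a binary
search) can be sketched in Python as below. This is a minimal sketch
and not PC-SIZE's FORTRAN code. The names minimum_sample_size and
fisher_z_power are hypothetical, and the power function uses Fisher's
z approximation as a stand-in for the exact numerical integration that
PC-SIZE performs, so its results may differ slightly from the tables
above.

```python
import math
from statistics import NormalDist


def fisher_z_power(n, rho, alpha=0.05):
    """Approximate power of the two-tailed test of rho = 0.

    Assumption: Fisher's z approximation, NOT the exact density
    integration used by PC-SIZE.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = math.atanh(rho) * math.sqrt(n - 3)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


def minimum_sample_size(power_fn, target, n_max=10**7):
    """Smallest n >= 3 with power_fn(n) >= target.

    power_fn is assumed to be non-decreasing in n.
    """
    lo, hi = None, 3  # lo: largest size known to be inadequate
    # Doubling phase: try 3, then 2**K for K = 2, 3, ...
    while power_fn(hi) < target:
        lo = hi
        hi = 4 if hi == 3 else hi * 2
        if hi > n_max:
            raise ValueError("required sample size exceeds n_max")
    if lo is None:
        return hi  # n = 3 is already adequate
    # Binary search, keeping power_fn(lo) < target <= power_fn(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_fn(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n = minimum_sample_size(lambda m: fisher_z_power(m, 0.5, 0.05), 0.80)
    print("approximate n for rho = 0.5, alpha = 0.05, power = 0.80:", n)
```

The doubling phase needs only about log2(n) power evaluations to
bracket the answer, and the binary search needs about as many again,
which matters when each evaluation costs time proportional to the
sample size.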