PC-SIZE
A Program for Sample Size Determinations
Version 2.0
August 8, 1985
Gerard E. Dallal
USDA Human Nutrition Research Center on Aging
at Tufts University
711 Washington Street
Boston, MA 02111
and
Tufts University School of Nutrition
132 Curtis Street
Medford, MA 02155
PC-SIZE determines the sample size requirements for single
factor experiments, two factor experiments, randomized blocks
designs, and paired t-tests. In generic F mode, PC-SIZE can
determine sample sizes for any experiment in which the power
at the alternative is given by a non-central F distribution
with fixed numerator degrees of freedom, denominator degrees
of freedom that are linear in the sample size, and a non-
centrality parameter that is proportional to the sample size.
PC-SIZE can determine the sample size needed to detect a non-
zero population correlation coefficient when sampling from a
bivariate normal distribution. It can also be used to obtain
the common sample size required to test the equality of two
proportions. PC-SIZE can calculate the power of specific
sample sizes as well as determine the sample size needed to
achieve specific power.
NOTICE
Documentation and original code copyright 1985 by Gerard E.
Dallal. Reproduction of material for non-commercial purposes
is permitted, without charge, provided that suitable
reference is made to PC-SIZE and its author.
Neither PC-SIZE nor its documentation should be modified in
any way without permission from the author, except for those
changes that are essential to move PC-SIZE to another
computer.
Please acknowledge PC-SIZE in any manuscript that uses its
calculations.
TABLE OF CONTENTS
Features
Installation
Operation
    Specifying the design
    Specifying the alternative
    Generic F mode
    Initial approximation
Correlation coefficient
Proportions
Other applications
    Two sample t-test
    Comparing a single sample to a known standard
Power of specific sample sizes
Non-centrality parameters
Validation
Algorithms
References
Sample size tables for the correlation coefficient
FEATURES
1. Flexibility:
Query system for single factor, two factor, randomized
blocks designs and paired t-tests.
Generic Mode permits sample size calculations for many
problems in which the power at the alternative is
given by the non-central F distribution.
2. Portability: PC-SIZE is written in FORTRAN 77, but not
too far from the 66 standard. To make PC-SIZE run on a
VAX, for example, all you need do is modify the I/O unit
numbers (contained in a single DATA statement) and an
OPEN statement.
PC-SIZE G.E. Dallal
3. PC-SIZE will calculate the power of a specific sample
size as well as the sample size required to achieve
specific power.
4. Calculations may be saved in a designated output file.
5. Double precision calculations are used throughout.
6. Quantities contained in square brackets at the prompts
are default values which can be obtained by pressing the
return key. Default values are updated with the latest
entry for each quantity, thereby simplifying the task of
requesting a number of sample size calculations that
share many of the same specifications.
7. Trailing decimal points may be omitted or included as you
wish.
INSTALLATION
PC-SIZE is written for the IBM-PC. Installation on a new
computer may entail modifying the following statements:
The first DATA statement:
IIN -- input unit number (screen)
IOUT -- output unit number (screen)
IWOUT -- save file unit number
NMAX0 -- large integer constant (the
largest sample size that can
be considered)
The OPEN statement for the save file just before statement
10.
OPERATION
Operation begins with the user specifying the level of the
test and the power required at the alternative. PC-SIZE will
report the number of observations per cell, per group (in the
case of proportions), or per randomized block.
Specifying the Design
Single factor designs: The user is prompted for the number
of groups.
Two factor designs: The user is prompted for the number of
levels of each factor. (PC-SIZE assumes that the calculations
are being carried out for the main effects of Factor A.) The
user can then indicate whether an interaction term will be
present in the model and the ANOVA table. (With 'A' and 'B'
the numbers of levels of the two factors, the denominator has
A * B * (N - 1) degrees of freedom if interaction is present
and A*B*N - A - B + 1 degrees of freedom if not.)
Randomized blocks designs: The user is prompted for the
number of levels of the treatment factor. PC-SIZE calculates
the number of blocks needed to achieve the desired power
assuming each block receives one complete set of treatments.
Paired t-tests: The user is prompted for the expected
difference and the standard deviation of the differences.
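The denominator degrees of freedom implied by each design can be sketched as below. The two factor formulas are the ones quoted above; the single factor and randomized blocks formulas are inferred from the F2 entries in the validation examples, and N - 1 for the paired t-test is the standard result. The function name and the design labels are illustrative and are not taken from PC-SIZE.

```python
def denominator_df(design, n, a=2, b=1, interaction=True):
    """Denominator degrees of freedom for sample size n.

    n is the number of observations per cell, the number of blocks,
    or the number of pairs, depending on the design.
    a and b are the numbers of levels (b is used only for two factors).
    """
    if design == "single":        # a groups, n observations per group
        return a * (n - 1)
    if design == "two_factor":    # a x b cells, n observations per cell
        if interaction:
            return a * b * (n - 1)
        return a * b * n - a - b + 1
    if design == "blocks":        # a treatments in each of n blocks
        return (a - 1) * (n - 1)
    if design == "paired":        # n pairs of observations
        return n - 1
    raise ValueError("unknown design: " + design)
```

For instance, denominator_df("two_factor", 4, 3, 2, interaction=False) evaluates 6 * N - 4 at N = 4.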
Specifying the Alternative
In the cases of single factor, two factor, and randomized
blocks designs, the user is given three options for
specifying the alternative at which the power is to be
evaluated:
1. Specifying the individual effects. PC-SIZE automatically
centers the effects about zero. It is not necessary to
subtract the mean from each effect before entry.
2. Specifying a range (a single number) for the effects.
The minimum and maximum effects are assumed to occupy the
endpoints of the range with the remaining effects
distributed uniformly throughout.
3. Specifying the average squared effect (where, for this
option, the mean has been subtracted from each effect
before squaring) divided by the error variance.
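The three options all lead to the same quantity, the average squared effect divided by the error variance. A minimal sketch follows, assuming that "distributed uniformly" in option 2 means evenly spaced between the endpoints; that reading, and the function names, are assumptions rather than a description of the PC-SIZE source.

```python
def avgesq_from_effects(effects, evar):
    """Option 1: individual effects, centered about zero automatically."""
    k = len(effects)
    mean = sum(effects) / k
    return sum((e - mean) ** 2 for e in effects) / k / evar


def effects_from_range(width, k):
    """Option 2: k effects evenly spaced across a range of the given width.

    Even spacing is an assumed reading of "distributed uniformly".
    """
    if k < 2:
        raise ValueError("need at least two effects")
    step = width / (k - 1)
    return [-width / 2.0 + i * step for i in range(k)]


def avgesq_from_range(width, k, evar):
    """Option 2 reduced to the average squared effect over error variance."""
    return avgesq_from_effects(effects_from_range(width, k), evar)
```

Option 3 needs no conversion, since the user enters the average squared effect over the error variance directly.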
Generic F Mode
Generic mode requires more sophistication on the part of the
user but is capable of handling a wide variety of problems,
specifically, any problem for which the power at the
alternative is given by a non-central F distribution with
fixed numerator degrees of freedom, denominator degrees of
freedom that are linear in the sample size, and a non-
centrality parameter that is a multiple of the sample size.
(Non-centrality parameters are discussed below.) The user is
prompted for the numerator degrees of freedom, the linear
function that defines the denominator degrees of freedom, and
the multiple of the sample size that defines the non-
centrality parameter.
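The calculation that generic mode describes can be sketched in a few dozen lines. This is not the PC-SIZE code: it evaluates the non-central F power as a Poisson-weighted sum of incomplete beta integrals (a textbook representation), uses a continued fraction for the incomplete beta integral instead of the published routines listed under ALGORITHMS, and scans sample sizes from 1 upward rather than starting from a large sample estimate. All function and parameter names are illustrative.

```python
from math import exp, lgamma, log, log1p


def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the incomplete beta integral (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def power_f(df1, df2, lam, alpha):
    """Power of the level-alpha F test when the non-centrality is lam."""
    a, b = df1 / 2.0, df2 / 2.0
    lo, hi = 0.0, 1.0
    for _ in range(200):              # bisection for the critical point,
        mid = (lo + hi) / 2.0         # on the scale x = df1*F/(df1*F + df2)
        if betainc(a, b, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    half = lam / 2.0
    weight = exp(-half)               # underflows for very large lam
    total, cdf, j = 0.0, 0.0, 0
    while total < 1.0 - 1e-12 and j < 100000:
        cdf += weight * betainc(a + j, b, hi)
        total += weight
        j += 1
        weight *= half / j            # next Poisson(lam/2) weight
    return 1.0 - cdf


def generic_f_sample_size(alpha, power, df1, df2_slope, df2_const,
                          lam_per_n, n_max=10000):
    """Smallest N with F2 = df2_slope*N + df2_const and LAMBDA = lam_per_n*N."""
    for n in range(1, n_max + 1):
        df2 = df2_slope * n + df2_const
        if df2 > 0 and power_f(df1, df2, lam_per_n * n, alpha) >= power:
            return n
    return None
```

With the specifications of Example 3.2.1 in the VALIDATION section (ALPHA = 0.05, POWER = 0.80, F1 = 1, F2 = 2 * (N - 1), LAMBDA = 4 * N), the call would be generic_f_sample_size(0.05, 0.80, 1, 2, -2, 4.0); the document reports N = 4 for that example.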
Initial Approximation
PC-SIZE invokes a "large sample approximation" (using a non-
central chi-square power function in place of the non-central
F) to get a rough estimate of the necessary sample size. The
power is calculated at increments of 1 if the estimate is
less than 500, 10 if the estimate is between 500 and 5000,
100 if the estimate is between 5000 and 50000, and so on.
The calculations start at the large sample estimate less 5%
or a count of 10, whichever is greater, rounded to the
nearest increment, and continue until the required power is
obtained. The correlation coefficient and proportions are
handled differently--see below.
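The stepping rule can be sketched as follows. The sketch reads "less 5% or a count of 10, whichever is greater" as subtracting the larger of the two amounts from the estimate, and it floors the starting point at 2; both choices are assumptions, as are the function names. The power function is passed in as a callable so that any routine can be plugged in.

```python
def increment_for(estimate):
    """Step of 1 below 500, 10 below 5000, 100 below 50000, and so on."""
    inc, upper = 1, 500
    while estimate >= upper:
        inc *= 10
        upper *= 10
    return inc


def starting_point(estimate):
    """Return (first sample size to try, increment) for a rough estimate."""
    inc = increment_for(estimate)
    start = estimate - max(0.05 * estimate, 10)   # assumed reading
    start = int(round(start / inc)) * inc         # nearest increment
    return max(start, 2), inc                     # floor of 2 is assumed


def scan_for_power(power_at, target, estimate, n_max=10 ** 7):
    """Step upward from the starting point until the target power is met."""
    n, inc = starting_point(estimate)
    while n <= n_max:
        if power_at(n) >= target:
            return n
        n += inc
    return None
```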
CORRELATION COEFFICIENT
This mode is used when sampling from a bivariate normal
population, neither of the two variables having its values
fixed prior to sampling. PC-SIZE will calculate the sample
size needed to carry out a two-tailed test of the hypothesis
that the population correlation coefficient is 0. The user
is prompted for a non-null value of the coefficient.
Note: The distribution of the sample correlation coefficient
when the population value is non-zero is obtained through
numerical integration using Simpson's Rule with some bells
and whistles to speed up convergence. Ordinates of the
density function are calculated recursively, resulting in an
execution time that is proportional to sample size.
PC-SIZE reports the power of the test for sample sizes 3, 4,
8, 16, ... (that is, 3 followed by 2**K, K=2,3,...) in
succession until the required power is exceeded. A binary
search is then carried out (with intermediate results NOT
reported) to locate the minimum adequate sample size. If the
sample size is large, the binary search can consume large
amounts of execution time.
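The search strategy (doubling, then bisection) can be sketched independently of how the power is computed. In the sketch below the power function is a stand-in based on Fisher's z transformation, not the numerical integration that PC-SIZE uses, so its answers need not agree exactly with the PC-SIZE tables. All names are illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_power(n, rho, alpha):
    """Approximate two-tailed power via Fisher's z (a stand-in only)."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = atanh(rho) * sqrt(n - 3)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def min_sample_size(power_at, target):
    """Try 3, 4, 8, 16, ... then bisect for the minimum adequate size.

    Assumes power_at is non-decreasing in the sample size.
    """
    if power_at(3) >= target:
        return 3
    lo, hi = 3, 4
    while power_at(hi) < target:      # doubling phase
        lo, hi = hi, hi * 2
    while hi - lo > 1:                # power_at(lo) < target <= power_at(hi)
        mid = (lo + hi) // 2
        if power_at(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

A call such as min_sample_size(lambda n: fisher_z_power(n, 0.5, 0.05), 0.80) exercises both phases. The bisection needs roughly log2 of the final sample size further power evaluations, each of which takes time proportional to the sample size in PC-SIZE, which is why large problems are slow.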
The tables at the end of this document, produced by PC-SIZE,
give the necessary sample size for tests of power
0.50(0.10)0.90, 0.95 at levels 0.05 and 0.01 for underlying
population correlation coefficients of 0.05, 0.10(0.10)0.90.
PROPORTIONS
PC-SIZE uses formulas 3.14 and 3.15 of Fleiss(1981) to
determine the common sample size for a test of the equality
of two proportions. This estimate is a large sample
approximation based on standard normal theory. The user is
prompted for the values of the proportions under the
alternative to equality.
In some instances the values produced by PC-SIZE will be 1
greater than those in Fleiss's Table A.3. Fleiss has
apparently taken the values produced by the formulae and
rounded to the nearest integer. PC-SIZE reports the smallest
integer not less than the results of the formulae.
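The rounding difference is easy to see in a sketch. The formulas below are the usual two-sided normal-theory sample size for two proportions and its continuity-corrected refinement; this document does not reproduce Fleiss's formulas 3.14 and 3.15, so treating these as equivalent to them is an assumption. The function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_n(p1, p2, alpha, power):
    """Unrounded common sample size per group (assumed form, see above)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)     # two-sided test assumed
    z_beta = z.inv_cdf(power)
    pbar = (p1 + p2) / 2.0
    diff = abs(p1 - p2)
    # normal-theory size without continuity correction
    n_plain = (z_alpha * sqrt(2.0 * pbar * (1.0 - pbar))
               + z_beta * sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    n_plain /= diff ** 2
    # continuity-corrected refinement
    return n_plain / 4.0 * (1.0 + sqrt(1.0 + 4.0 / (n_plain * diff))) ** 2


raw = two_proportion_n(0.2, 0.4, 0.05, 0.80)
reported_by_ceiling = ceil(raw)      # smallest integer not less than raw
reported_by_rounding = round(raw)    # nearest integer
```

The two conventions differ by 1 whenever the fractional part of the unrounded value is below one half.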
OTHER APPLICATIONS
Two Sample t-test
This is a single factor analysis of variance with two groups.
Comparing a Single Sample to a Known Standard
Use the paired t-test mode setting the "expected difference"
to the expected difference between the unknown population
mean and the known standard. Set the "estimate of standard
deviation of difference" to the estimated population standard
deviation.
POWER OF SPECIFIC SAMPLE SIZES
PC-SIZE will perform power calculations for specific sample
sizes as well as determine the sample size required to
achieve specific power. If the requested power is an integer
greater than or equal to 1, PC-SIZE starts its power
calculations at a sample size equal to the requested power.
The user is prompted for an increment and a stopping value.
NON-CENTRALITY PARAMETERS
Different authors use different definitions of the non-
centrality parameter of the non-central F distribution. The
differences typically involve a square root, a factor of
(numerator degrees of freedom + 1), and/or a factor of 2.
PC-SIZE follows the notation of Kendall and Stuart(1973,
pp.237,262): The sum of the squares of "d" independent
normal variables with arbitrary means and unit variances is
said to follow a non-central chi-square distribution with "d"
degrees of freedom and non-centrality parameter equal to the
sum of the squared means. The ratio of a non-central chi-
square variable with "d1" degrees of freedom and non-
centrality parameter "lambda", divided by "d1", to an
independent central chi-square variable with "d2" degrees of
freedom, divided by "d2", is said to follow a non-central F
distribution with "d1" numerator degrees of freedom, "d2"
denominator degrees of freedom, and non-centrality parameter
"lambda". Scheffe(1959,p.414) defines his non-centrality
parameter to be the square root of this quantity.
Following Graybill(1961, Theorem 11.16), a non-centrality
parameter can be obtained as the numerator degrees of freedom
times (the difference between the numerator expected mean
square and the error variance) divided by the error variance.
It is assumed that the error variance is given by the
expected mean square of the denominator of the F-ratio.
The following notation is used throughout this section:
ALPHA -- level of the test
POWER -- power at the alternative
K -- number of effects under test
(number of groups, levels,...)
F1 -- numerator degrees of freedom
F2 -- denominator degrees of freedom
AVGESQ -- average squared effect divided by
the error variance
LAMBDA -- non-centrality parameter
N -- sample size
EVAR -- error variance (often within cell)
EFF(I) -- the I-th of the effects under test
[ AVGESQ = (SUM(EFF(I)**2) / K) / EVAR ]
1. Single Factor Experiment (K Groups):
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * K * AVGESQ
2. Two Factor Experiment (Factor A -- "A" levels; Factor B
-- "B" levels):
Main effects for Factor A:
LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
= N * A * B * AVGESQ
Two factor interaction:
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * A * B * AVGESQ
3. Randomized blocks designs (Single treatment factor at K
levels):
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * K * AVGESQ
4. Simple linear regression: E(Y(i)) = C0 + C1 * X(i)
(N observations at each X(i), i=1,...,p, with mean 0)
LAMBDA = N * (C1**2 * SUM(X(I)**2)) / EVAR
5. Quadratic regression:
E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2
H0: C1 = C2 = 0:
LAMBDA =
N * (C1**2 * SUM(X(i)**2) + 2 * C1 * C2 * SUM(X(i)**3)
+ C2**2 * SUM(X(i)**4)) / EVAR
H0: C2 = 0
LAMBDA = C2**2 * SUM(X(i)**4)
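The first four formulas above translate directly into code. The sketch below centers the effects before squaring, in line with the way PC-SIZE treats entered effects; the function names are illustrative and the regression function assumes, as stated above, that the X(i) have mean 0.

```python
def centered(effects):
    """Subtract the mean so the effects sum to zero."""
    mean = sum(effects) / len(effects)
    return [e - mean for e in effects]


def lambda_single_factor(n, effects, evar):
    """LAMBDA = N * SUM(EFF(I)**2) / EVAR (also randomized blocks)."""
    return n * sum(e * e for e in centered(effects)) / evar


def lambda_factor_a(n, b, effects_a, evar):
    """Main effects of Factor A: LAMBDA = N * B * SUM(EFF(I)**2) / EVAR."""
    return n * b * sum(e * e for e in centered(effects_a)) / evar


def lambda_linear_regression(n, c1, xs, evar):
    """LAMBDA = N * C1**2 * SUM(X(I)**2) / EVAR, for xs with mean 0."""
    return n * c1 ** 2 * sum(x * x for x in xs) / evar
```

Dividing any of these by N gives the multiple of the sample size that generic mode asks for.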
VALIDATION
PC-SIZE was validated by applying it to all of the examples
from sections 3.2 through 3.6 of Odeh and Fox (1975),
abbreviated OF below. All were reproduced, with the following
exceptions:
example 3.3.1 (main effects for A with no interaction in the
model): OF estimate 3. PC-SIZE calculates the power of a
sample of size 3 to be 0.79896 (<0.80). 4 are needed.
example 3.5.2 (test of quadratic regression term): OF
estimate 40. PC-SIZE calculates the power of a sample of
size 40 to be 0.94796 (<0.95). 41 are needed.
example 3.6.2 (multivariate t-test): OF estimate 100. PC-
SIZE calculates the power of a sample of size 100 to be
0.99484 (<0.995). 101 are needed.
The values of the arguments and the resulting sample size
estimates from PC-SIZE are:
Single Factor Experiment
(K Groups)
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * K * AVGESQ
Example 3.2.1:
ALPHA = 0.05 POWER = 0.80 K = 2
F1 = 1 F2 = 2 * (N - 1)
AVGESQ = 2 LAMBDA = 4 * N N = 4
Example 3.2.2:
ALPHA = 0.025 POWER = 0.70 K = 3
F1 = 2 F2 = 3 * (N - 1)
AVGESQ = 1/3 LAMBDA = 1 * N N = 11
Example 3.2.3:
ALPHA = 0.01 POWER = 0.975 K = 6
F1 = 5 F2 = 6 * (N - 1)
AVGESQ = 2/3 LAMBDA = 4 * N N = 9
Two Factor Experiment
(Factor A -- "A" levels; Factor B -- "B" levels)
Main effects for Factor A:
LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
= N * A * B * AVGESQ
A * B interaction:
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * A * B * AVGESQ
where EFF(i),i=1,...,A*B are the interaction terms.
Example 3.3.1:
Main effects for A with interaction in model:
ALPHA = 0.05 POWER = 0.80 A = 3
F1 = 2 F2 = 6 * (N - 1) B = 2
AVGESQ = 2/3 LAMBDA = 4 * N N = 4
Main effects for A with no interaction in model:
ALPHA = 0.05 POWER = 0.80 A = 3
F1 = 2 F2 = 6 * N - 4 B = 2
AVGESQ = 2/3 LAMBDA = 4 * N N = 4
Test for interaction (Use generic mode):
ALPHA = 0.05 POWER = 0.90 K = 6
F1 = 2 F2 = 6 * (N - 1)
AVGESQ = 1/2 LAMBDA = 3 * N N = 5
Example 3.3.2:
Main effects for A with interaction in model:
ALPHA = 0.005 POWER = 0.60 A = 4
F1 = 3 F2 = 16 * (N - 1) B = 4
AVGESQ = 1 LAMBDA = 16 * N N = 2
Main effects for A with no interaction in model:
ALPHA = 0.005 POWER = 0.60 A = 4
F1 = 3 F2 = 16 * N - 7 B = 4
AVGESQ = 1 LAMBDA = 16 * N N = 2
Test for interaction (Use generic mode):
ALPHA = 0.10 POWER = 0.60 K = 16
F1 = 9 F2 = 16 * (N - 1)
AVGESQ = 1/8 LAMBDA = 2 * N N = 5
Example 3.3.3:
Main effects for A with interaction in model:
ALPHA = 0.01 POWER = 0.70 A = 2
F1 = 1 F2 = 6 * (N - 1) B = 3
AVGESQ = 1 LAMBDA = 6 * N N = 3
Main effects for A with no interaction in model:
ALPHA = 0.01 POWER = 0.70 A = 2
F1 = 1 F2 = 6 * N - 4 B = 3
AVGESQ = 1 LAMBDA = 6 * N N = 3
Test for interaction (Use generic mode):
ALPHA = 0.001 POWER = 0.90 K = 6
F1 = 2 F2 = 6 * (N - 1)
AVGESQ = 1/2 LAMBDA = 3 * N N = 10
Randomized blocks designs
(Single treatment factor at K levels)
LAMBDA = N * SUM(EFF(I)**2) / EVAR
LAMBDA = N * K * AVGESQ
Example 3.4.1(i):
ALPHA = 0.05 POWER = 0.90 K = 3
F1 = 2 F2 = 2 * (N - 1)
AVGESQ = 2/3 LAMBDA = 2 * N N = 8
Example 3.4.1(ii): multiple treatment factors
use generic mode
ALPHA = 0.05 POWER = 0.90 A = B = 3
F1 = 2 F2 = 8 * (N - 1)
AVGESQ = 2/3 LAMBDA = 6 * N N = 3
Example 3.4.2: multiple treatment factors
use generic mode
ALPHA = 0.001 POWER = 0.95 A = B = 2
F1 = 1 F2 = 12 * N - 2 K = 1,...,6*N
AVGESQ = 1 LAMBDA = 24 * N N = 2
Example 3.4.3: multiple treatment factors
use generic mode
ALPHA = 0.025 POWER = 0.70 A = 6
F1 = 5 F2 = 17 * (N - 1) B = 3
AVGESQ = 1/3 LAMBDA = 6 * N N = 3
Regression using Generic Mode
Simple linear regression
E(Y(i)) = C0 + C1 * X(i)
(N observations at each X(i), i=1,...,p, with mean 0)
LAMBDA = N * (C1**2 * SUM(X(I)**2)) / EVAR
Quadratic regression
E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2
LAMBDA =
N * (C1**2 * SUM(X(i)**2) + 2 * C1 * C2 * SUM(X(i)**3)
+ C2**2 * SUM(X(i)**4)) / EVAR
Example 3.5.1 (linear):
ALPHA = 0.001 POWER = 0.995
F1 = 1 F2 = 3 * N - 2
LAMBDA = 17 * N N = 5
Example 3.5.1 (quadratic): H0: C1 = C2 = 0
ALPHA = 0.001 POWER = 0.995
F1 = 2 F2 = 3 * (N - 1)
LAMBDA = 144 * N N = 3
Example 3.5.1 (quadratic): H0: C2 = 0
ALPHA = 0.001 POWER = 0.995
F1 = 1 F2 = 3 * (N - 1)
LAMBDA = 257 * N N = 3
Example 3.5.2 (linear):
ALPHA = 0.025 POWER = 0.95
F1 = 1 F2 = 6 * N - 2
LAMBDA = 1.150 * N N = 14
Example 3.5.2 (quadratic): H0: C2 = 0
ALPHA = 0.025 POWER = 0.95
F1 = 1 F2 = 3 * (N - 1)
LAMBDA = .382 * N N = 41
Multivariate t-test
Example 3.6.1 :
ALPHA = 0.10 POWER = 0.70
F1 = 5 F2 = N - 5
LAMBDA = 1 * N N = 14
Example 3.6.2:
ALPHA = 0.10 POWER = 0.995
F1 = 4 F2 = 2 * N - 5
LAMBDA = .25 * N N = 101
ALGORITHMS
PC-SIZE makes use of the following published routines,
modified to run in double precision:
Best, D.J. and D.E. Roberts (1975). Algorithm AS 91. The
percentage points of the chi-squared distribution. Appl.
Statist.,24,385-388.
Bhattacharjee, G.P. (1970). The incomplete gamma integral.
Appl. Statist.,19,285-287.
Cran, G.W., K.J. Martin and G.E. Thomas (1977). Remark
AS R19 and Algorithm AS 109. A remark on algorithms AS
63: The incomplete beta integral, and AS 64: Inverse of
the incomplete beta function ratio. Appl.
Statist.,26,111-114.
Hill, I.D. (1973). Algorithm AS 66. The normal integral.
Appl. Statist.,22,424-427.
Majumder, K.L. and G.P. Bhattacharjee (1973). Algorithm
AS 63. The incomplete beta integral. Appl.
Statist.,22,409-411.
Odeh, R.E. and J.O. Evans (1974). Algorithm AS 70. The
percentage points of the normal distribution. Appl.
Statist.,23,96-97.
and the author's FORTRAN translation of
Pike, M.C. and I.D. Hill (1966). Algorithm 291. Logarithm
of the gamma function. Commun. Ass. Comput. Mach.,9,684.
REFERENCES
Fleiss, Joseph L. (1981). Statistical Methods for Rates and
Proportions, 2nd ed. New York: John Wiley & Sons, Inc.
Graybill, Franklin A. (1961). An Introduction to Linear
Models, Vol. 1. New York: McGraw-Hill Book Company, Inc.
Kendall, Maurice G. and Alan Stuart (1973). The Advanced
Theory of Statistics, Volume 2, 3rd ed. New York: Hafner
Publishing Co.
Odeh, Robert E. and Martin Fox (1975). Sample Size Choice:
Charts for Experiments with Linear Models. New York:
Marcel Dekker, Inc.
Scheffe, Henry (1959). The Analysis of Variance. New York:
John Wiley and Sons, Inc.
SAMPLE SIZE FOR THE TEST OF A NON-ZERO
CORRELATION COEFFICIENT
ALPHA = 0.05

                          POWER
RHO:    0.50    0.60    0.70    0.80    0.90    0.95
0.05    1536    1959    2467    3137    4198    5192
0.10     384     489     616     782    1046    1293
0.20      96     122     153     193     258     319
0.30      43      54      67      84     112     138
0.40      24      30      37      46      61      75
0.50      15      19      23      29      37      46
0.60      11      13      15      19      24      30
0.70       8       9      11      13      17      20
0.80       6       7       8       9      11      13
0.90       5       5       6       6       8       9

ALPHA = 0.01

                          POWER
RHO:    0.50    0.60    0.70    0.80    0.90    0.95
0.05    2653    3199    3841    4667    5944    7116
0.10     662     798     958    1163    1481    1772
0.20     165     198     237     287     365     436
0.30      72      87     103     125     158     189
0.40      40      48      57      68      86     102
0.50      25      30      35      42      52      62
0.60      17      20      23      27      34      40
0.70      12      14      16      19      23      27
0.80       9      10      11      13      15      18
0.90       6       7       8       9      10      11