C     SOURCE FILE STPRG.FOR, DATED 1985-11-29 (FROM STATCORR.ZIP)
C
C     ..................................................................
C
C        SUBROUTINE STPRG
C
C        PURPOSE
C           TO PERFORM A STEPWISE MULTIPLE REGRESSION ANALYSIS FOR A
C           DEPENDENT VARIABLE AND A SET OF INDEPENDENT VARIABLES. AT
C           EACH STEP, THE VARIABLE ENTERED INTO THE REGRESSION EQUATION
C           IS THAT WHICH EXPLAINS THE GREATEST AMOUNT OF VARIANCE
C           BETWEEN IT AND THE DEPENDENT VARIABLE (I.E. THE VARIABLE
C           WITH THE HIGHEST PARTIAL CORRELATION WITH THE DEPENDENT
C           VARIABLE). ANY VARIABLE CAN BE DESIGNATED AS THE DEPENDENT
C           VARIABLE. ANY INDEPENDENT VARIABLE CAN BE FORCED INTO OR
C           DELETED FROM THE REGRESSION EQUATION, IRRESPECTIVE OF ITS
C           CONTRIBUTION TO THE EQUATION.
C
C        USAGE
C           CALL STPRG (M,N,D,XBAR,IDX,PCT,NSTEP,ANS,L,B,S,T,LL,IER)
C
C        DESCRIPTION OF PARAMETERS
C           M    - TOTAL NUMBER OF VARIABLES IN DATA MATRIX
C           N    - NUMBER OF OBSERVATIONS
C           D    - INPUT MATRIX (M X M) OF SUMS OF CROSS-PRODUCTS OF
C                  DEVIATIONS FROM MEAN, STORED AS A VECTOR OF LENGTH
C                  M*M WITH ELEMENT (I,J) IN D(M*(J-1)+I). THIS MATRIX
C                  WILL BE DESTROYED.
C           XBAR - INPUT VECTOR OF LENGTH M OF MEANS
C           IDX  - INPUT VECTOR OF LENGTH M HAVING ONE OF THE FOLLOWING
C                  CODES FOR EACH VARIABLE.
C                    0 - INDEPENDENT VARIABLE AVAILABLE FOR SELECTION
C                    1 - INDEPENDENT VARIABLE TO BE FORCED INTO THE
C                        REGRESSION EQUATION
C                    2 - VARIABLE NOT TO BE CONSIDERED IN THE EQUATION
C                    3 - DEPENDENT VARIABLE
C                  THIS VECTOR WILL BE DESTROYED.
C           PCT  - A CONSTANT VALUE INDICATING THE PROPORTION OF THE
C                  TOTAL VARIANCE TO BE EXPLAINED BY ANY INDEPENDENT
C                  VARIABLE. THOSE INDEPENDENT VARIABLES WHICH FALL
C                  BELOW THIS PROPORTION WILL NOT ENTER THE REGRESSION
C                  EQUATION. TO ENSURE THAT ALL VARIABLES ENTER THE
C                  EQUATION, SET PCT = 0.0.
C           NSTEP- OUTPUT VECTOR OF LENGTH 5 CONTAINING THE FOLLOWING
C                  INFORMATION
C                    NSTEP(1)- THE NUMBER OF THE DEPENDENT VARIABLE
C                    NSTEP(2)- NUMBER OF VARIABLES FORCED INTO THE
C                              REGRESSION EQUATION
C                    NSTEP(3)- NUMBER OF VARIABLES DELETED FROM THE
C                              EQUATION
C                    NSTEP(4)- THE NUMBER OF THE LAST STEP
C                    NSTEP(5)- THE NUMBER OF THE LAST VARIABLE ENTERED
C           ANS  - OUTPUT VECTOR OF LENGTH 11 CONTAINING THE FOLLOWING
C                  INFORMATION FOR THE LAST STEP
C                    ANS(1)- SUM OF SQUARES REDUCED BY THIS STEP
C                    ANS(2)- PROPORTION OF TOTAL SUM OF SQUARES REDUCED
C                    ANS(3)- CUMULATIVE SUM OF SQUARES REDUCED UP TO
C                            THIS STEP
C                    ANS(4)- CUMULATIVE PROPORTION OF TOTAL SUM OF
C                            SQUARES REDUCED
C                    ANS(5)- SUM OF SQUARES OF THE DEPENDENT VARIABLE
C                    ANS(6)- MULTIPLE CORRELATION COEFFICIENT
C                    ANS(7)- F RATIO FOR SUM OF SQUARES DUE TO
C                            REGRESSION
C                    ANS(8)- STANDARD ERROR OF THE ESTIMATE (SQUARE
C                            ROOT OF THE RESIDUAL MEAN SQUARE)
C                    ANS(9)- INTERCEPT CONSTANT
C                    ANS(10)-MULTIPLE CORRELATION COEFFICIENT ADJUSTED
C                            FOR DEGREES OF FREEDOM.
C                    ANS(11)-STANDARD ERROR OF THE ESTIMATE ADJUSTED
C                            FOR DEGREES OF FREEDOM.
C           L    - OUTPUT VECTOR OF LENGTH K, WHERE K IS THE NUMBER OF
C                  INDEPENDENT VARIABLES IN THE REGRESSION EQUATION.
C                  THIS VECTOR CONTAINS THE NUMBERS OF THE INDEPENDENT
C                  VARIABLES IN THE EQUATION.
C           B    - OUTPUT VECTOR OF LENGTH K, CONTAINING THE PARTIAL
C                  REGRESSION COEFFICIENTS CORRESPONDING TO THE
C                  VARIABLES IN VECTOR L.
C           S    - OUTPUT VECTOR OF LENGTH K, CONTAINING THE STANDARD
C                  ERRORS OF THE PARTIAL REGRESSION COEFFICIENTS,
C                  CORRESPONDING TO THE VARIABLES IN VECTOR L.
C           T    - OUTPUT VECTOR OF LENGTH K, CONTAINING THE COMPUTED
C                  T-VALUES CORRESPONDING TO THE VARIABLES IN VECTOR L.
C           LL   - WORKING VECTOR OF LENGTH M
C           IER  - 0, IF THERE IS NO ERROR.
C                  1, IF RESIDUAL SUM OF SQUARES IS NEGATIVE OR IF THE
C                  PIVOTAL ELEMENT IN THE STEPWISE INVERSION PROCESS IS
C                  ZERO. IN THIS CASE, THE VARIABLE WHICH CAUSES THIS
C                  ERROR IS NOT ENTERED IN THE REGRESSION, THE RESULT
C                  PRIOR TO THIS STEP IS RETAINED, AND THE CURRENT
C                  SELECTION IS TERMINATED.
C
C        REMARKS
C           THE NUMBER OF DATA POINTS MUST BE GREATER THAN THE NUMBER
C           OF INDEPENDENT VARIABLES PLUS ONE. FORCED VARIABLES ARE
C           ENTERED INTO THE REGRESSION EQUATION BEFORE ALL OTHER
C           INDEPENDENT VARIABLES. WITHIN THE SET OF FORCED VARIABLES,
C           THE ONE TO BE CHOSEN FIRST WILL BE THAT ONE WHICH EXPLAINS
C           THE GREATEST AMOUNT OF VARIANCE.
C           INSTEAD OF USING, AS A STOPPING CRITERION, A PROPORTION OF
C           THE TOTAL VARIANCE, SOME OTHER CRITERION MAY BE ADDED TO
C           SUBROUTINE STOUT.
C
C        SUBROUTINES AND FUNCTION SUBPROGRAMS REQUIRED
C           STOUT(NSTEP,ANS,L,B,S,T,NSTOP)
C           THIS SUBROUTINE MUST BE PROVIDED BY THE USER. IT IS AN
C           OUTPUT ROUTINE WHICH WILL PRINT THE RESULTS OF EACH STEP OF
C           THE REGRESSION ANALYSIS. NSTOP IS AN OPTION CODE WHICH IS
C           ONE IF THE STEPWISE REGRESSION IS TO BE TERMINATED, AND IS
C           ZERO IF IT IS TO CONTINUE. THE USER MUST CONSIDER THIS IF
C           SOME OTHER STOPPING CRITERION THAN VARIANCE PROPORTION IS TO
C           BE USED.
C
C        METHOD
C           THE ABBREVIATED DOOLITTLE METHOD IS USED TO (1) DECIDE WHICH
C           VARIABLES ENTER THE REGRESSION AND (2) COMPUTE REGRESSION
C           COEFFICIENTS. REFER TO C. A. BENNETT AND N. L. FRANKLIN,
C           'STATISTICAL ANALYSIS IN CHEMISTRY AND THE CHEMICAL
C           INDUSTRY', JOHN WILEY AND SONS, 1954, APPENDIX 6A.
C
C     ..................................................................
C
      SUBROUTINE STPRG (M,N,D,XBAR,IDX,PCT,NSTEP,ANS,L,B,S,T,LL,IER)
C
      DIMENSION D(1),XBAR(1),IDX(1),NSTEP(1),ANS(1),L(1),B(1),S(1),T(1),
     1LL(1)
C
C     ..................................................................
C
C        IF A DOUBLE PRECISION VERSION OF THIS ROUTINE IS DESIRED, THE
C        C IN COLUMN 1 SHOULD BE REMOVED FROM THE DOUBLE PRECISION
C        STATEMENT WHICH FOLLOWS.
C
C     DOUBLE PRECISION D,XBAR,ANS,B,S,T,RD,RE
C
C        THE C MUST ALSO BE REMOVED FROM DOUBLE PRECISION STATEMENTS
C        APPEARING IN OTHER ROUTINES USED IN CONJUNCTION WITH THIS
C        ROUTINE.
C
C        THE DOUBLE PRECISION VERSION OF THIS SUBROUTINE MUST ALSO
C        CONTAIN DOUBLE PRECISION FORTRAN FUNCTIONS. SQRT IN STATEMENTS
C        85, 90, 114, 132, AND 134 MUST BE CHANGED TO DSQRT.
C
C     ..................................................................
C
C        INITIALIZATION
C
      IER=0
      ONM=N-1
      NFO=0
      NSTEP(3)=0
      ANS(3)=0.0
      ANS(4)=0.0
      NSTOP=0
C
C        FIND DEPENDENT VARIABLE, NUMBER OF VARIABLES TO BE FORCED TO
C        ENTER IN THE REGRESSION, AND NUMBER OF VARIABLES TO BE DELETED
C
      DO 30 I=1,M
      LL(I)=1
      IF(IDX(I)) 30, 30, 10
   10 IF(IDX(I)-2) 15, 20, 25
   15 NFO=NFO+1
      IDX(NFO)=I
      GO TO 30
   20 NSTEP(3)=NSTEP(3)+1
      LL(I)=-1
      GO TO 30
   25 MY=I
      NSTEP(1)=MY
      LY=M*(MY-1)
      LYP=LY+MY
      ANS(5)=D(LYP)
   30 CONTINUE
      NSTEP(2)=NFO
C
C        FIND THE MAXIMUM NUMBER OF STEPS
C
      MX=M-NSTEP(3)-1
C
C        START SELECTION OF VARIABLES
C
      DO 140 NL=1,MX
      RD=0
      IF(NL-NFO) 35, 35, 55
C
C        SELECT NEXT VARIABLE TO ENTER AMONG FORCED VARIABLES
C
   35 DO 50 I=1,NFO
      K=IDX(I)
      IF(LL(K)) 50, 50, 40
   40 LYP=LY+K
      IP=M*(K-1)+K
      RE=D(LYP)*D(LYP)/D(IP)
      IF(RD-RE) 45, 50, 50
   45 RD=RE
      NEW=K
   50 CONTINUE
      GO TO 75
C
C        SELECT NEXT VARIABLE TO ENTER AMONG NON-FORCED VARIABLES
C
   55 DO 70 I=1,M
      IF(I-MY) 60, 70, 60
   60 IF(LL(I)) 70, 70, 62
   62 LYP=LY+I
      IP=M*(I-1)+I
      RE=D(LYP)*D(LYP)/D(IP)
      IF(RD-RE) 64, 70, 70
   64 RD=RE
      NEW=I
   70 CONTINUE
C
C        TEST WHETHER THE PROPORTION OF THE SUM OF SQUARES REDUCED BY
C        THE VARIABLE JUST SELECTED IS GREATER THAN OR EQUAL TO THE
C        SPECIFIED PROPORTION
C
   75 IF(RD) 77,77,76
   76 IF(ANS(5)-(ANS(3)+RD))77,77,78
   77 IER=1
      GO TO 150
   78 RE=RD/ANS(5)
      IF(RE-PCT) 150, 80, 80
C
C        IT IS GREATER THAN OR EQUAL
C
   80 LL(NEW)=0
      L(NL)=NEW
      ANS(1)=RD
      ANS(2)=RE
      ANS(3)=ANS(3)+RD
      ANS(4)=ANS(4)+RE
      NSTEP(4)=NL
      NSTEP(5)=NEW
C
C        COMPUTE MULTIPLE CORRELATION, F-VALUE FOR ANALYSIS OF
C        VARIANCE, AND STANDARD ERROR OF ESTIMATE
C
   85 ANS(6)= SQRT(ANS(4))
      RD=NL
      RE=ONM-RD
      RE=(ANS(5)-ANS(3))/RE
      ANS(7)=(ANS(3)/RD)/RE
   90 ANS(8)= SQRT(RE)
C
C        DIVIDE BY THE PIVOTAL ELEMENT
C
      IP=M*(NEW-1)+NEW
      RD=D(IP)
      LYP=NEW-M
      DO 100 J=1,M
      LYP=LYP+M
      IF(LL(J)) 100, 94, 97
   94 IF(J-NEW) 96, 98, 96
   96 IJ=M*(J-1)+J
      D(IJ)=D(IJ)+D(LYP)*D(LYP)/RD
   97 D(LYP)=D(LYP)/RD
      GO TO 100
   98 D(IP)=1.0/RD
  100 CONTINUE
C
C        COMPUTE REGRESSION COEFFICIENTS
C
      LYP=LY+NEW
      B(NL)=D(LYP)
      IF(NL-1) 112, 112, 105
  105 ID=NL-1
      DO 110 J=1,ID
      IJ=NL-J
      KK=L(IJ)
      LYP=LY+KK
      B(IJ)=D(LYP)
      DO 110 K=1,J
      IK=NL-K+1
      MK=L(IK)
      LYP=M*(MK-1)+KK
  110 B(IJ)=B(IJ)-D(LYP)*B(IK)
C
C        COMPUTE INTERCEPT, STANDARD ERRORS, AND T-VALUES
C
  112 ANS(9)=XBAR(MY)
      DO 115 I=1,NL
      KK=L(I)
      ANS(9)=ANS(9)-B(I)*XBAR(KK)
      IJ=M*(KK-1)+KK
  114 S(I)=ANS(8)* SQRT(D(IJ))
  115 T(I)=B(I)/S(I)
C
C        PERFORM A REDUCTION TO ELIMINATE THE LAST VARIABLE ENTERED
C
      IP=M*(NEW-1)
      DO 130 I=1,M
      IJ=I-M
      IK=NEW-M
      IP=IP+1
      IF(LL(I)) 130, 130, 120
  120 DO 126 J=1,M
      IJ=IJ+M
      IK=IK+M
      IF(LL(J)) 126, 122, 122
  122 IF(J-NEW) 124, 126, 124
  124 D(IJ)=D(IJ)-D(IP)*D(IK)
  126 CONTINUE
      D(IP)=D(IP)/(-RD)
  130 CONTINUE
C
C        ADJUST STANDARD ERROR OF THE ESTIMATE AND MULTIPLE CORRELATION
C        COEFFICIENT
C
      RD=N-NSTEP(4)
      RD=ONM/RD
  132 ANS(10)=SQRT(1.0-(1.0-ANS(6)*ANS(6))*RD)
  134 ANS(11)=ANS(8)*SQRT(RD)
C
C        CALL THE OUTPUT SUBROUTINE
C
      CALL STOUT (NSTEP,ANS,L,B,S,T,NSTOP)
C
C        TEST WHETHER THE STEP-WISE REGRESSION WAS TERMINATED IN
C        SUBROUTINE STOUT
C
      IF(NSTOP) 140, 140, 150
C
  140 CONTINUE
C
  150 RETURN
      END
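C
C     ..................................................................
C
C        ILLUSTRATIVE EXAMPLE (NOT PART OF THE ORIGINAL LISTING)
C
C        STPRG NEEDS A USER-SUPPLIED STOUT AND AN INPUT MATRIX D OF
C        SUMS OF CROSS-PRODUCTS OF DEVIATIONS FROM THE MEANS. THE TWO
C        PROGRAM UNITS BELOW ARE A MINIMAL SKETCH OF ONE WAY TO
C        PROVIDE BOTH. THE NAMES STDEMO AND X, THE SAMPLE DATA, THE
C        OUTPUT UNIT (6), AND THE FORMATS ARE ASSUMPTIONS MADE FOR
C        THIS EXAMPLE ONLY. D IS FILLED SO THAT ELEMENT (I,J) IS IN
C        D(M*(J-1)+I), MATCHING THE INDEX ARITHMETIC IN STPRG. THE
C        SKETCH IS SINGLE PRECISION AND ASSUMES A COMPILER THAT STILL
C        ACCEPTS ARITHMETIC IF AND DIMENSION (1) DUMMY ARRAYS.
C
C     ..................................................................
C
      SUBROUTINE STOUT (NSTEP,ANS,L,B,S,T,NSTOP)
      DIMENSION NSTEP(5),ANS(11),L(1),B(1),S(1),T(1)
C
C        PRINT SUMMARY STATISTICS FOR THE STEP JUST COMPLETED
C
      WRITE (6,10) NSTEP(4),NSTEP(5),ANS(2),ANS(4),ANS(6),
     1ANS(7),ANS(8),ANS(9)
   10 FORMAT (/' STEP',I3,'  VARIABLE ENTERED',I3/
     1' PROPORTION REDUCED THIS STEP',F10.5/
     2' CUMULATIVE PROPORTION       ',F10.5/
     3' MULTIPLE CORRELATION        ',F10.5/
     4' F RATIO                     ',E14.5/
     5' STANDARD ERROR OF ESTIMATE  ',E14.5/
     6' INTERCEPT                   ',E14.5)
C
C        ONE LINE PER VARIABLE IN THE EQUATION. NSTEP(4) IS THE STEP
C        NUMBER, WHICH EQUALS THE NUMBER OF VARIABLES ENTERED SO FAR.
C
      K=NSTEP(4)
      DO 20 I=1,K
      WRITE (6,30) L(I),B(I),S(I),T(I)
   20 CONTINUE
   30 FORMAT (' VARIABLE',I3,'  B=',E14.5,'  S=',E14.5,'  T=',E14.5)
C
C        NSTOP=0 LETS STPRG CONTINUE. SET NSTOP=1 HERE TO STOP EARLY
C        ON SOME OTHER CRITERION.
C
      NSTOP=0
      RETURN
      END
C
      PROGRAM STDEMO
C
C        THREE VARIABLES (X1, X2, Y) AND SIX OBSERVATIONS. THE DATA
C        ARE MADE UP FOR THIS EXAMPLE. X IS STORED ONE VARIABLE PER
C        COLUMN, AND VARIABLE 3 IS THE DEPENDENT VARIABLE.
C
      PARAMETER (M=3, N=6)
      DIMENSION X(N,M),D(M*M),XBAR(M),IDX(M),NSTEP(5),ANS(11),
     1L(M),B(M),S(M),T(M),LL(M)
      DATA X /1.0,2.0,3.0,4.0,5.0,6.0,
     1        2.0,1.0,4.0,3.0,6.0,5.0,
     2        3.1,3.9,7.2,6.8,11.1,10.9/
C
C        MEANS OF EACH VARIABLE
C
      DO 20 J=1,M
      XBAR(J)=0.0
      DO 10 K=1,N
      XBAR(J)=XBAR(J)+X(K,J)
   10 CONTINUE
      XBAR(J)=XBAR(J)/N
   20 CONTINUE
C
C        SUMS OF CROSS-PRODUCTS OF DEVIATIONS FROM THE MEANS
C
      DO 40 J=1,M
      DO 40 I=1,M
      IJ=M*(J-1)+I
      D(IJ)=0.0
      DO 30 K=1,N
      D(IJ)=D(IJ)+(X(K,I)-XBAR(I))*(X(K,J)-XBAR(J))
   30 CONTINUE
   40 CONTINUE
C
C        VARIABLES 1 AND 2 AVAILABLE FOR SELECTION, VARIABLE 3
C        DEPENDENT. PCT=0.0 LETS EVERY AVAILABLE VARIABLE ENTER.
C
      IDX(1)=0
      IDX(2)=0
      IDX(3)=3
      PCT=0.0
      CALL STPRG (M,N,D,XBAR,IDX,PCT,NSTEP,ANS,L,B,S,T,LL,IER)
      WRITE (6,50) IER
   50 FORMAT (/' IER =',I2)
      STOP
      END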