- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!howland.reston.ans.net!paladin.american.edu!auvm!vm.sas.com!mozart.unx.sas.com!saswfk
- Originator: saswfk@thurstne.unx.sas.com
- X-Sender: news@unx.sas.com (Noter of Newsworthy Events)
- References: <8289@news.duke.edu>
- Nntp-Posting-Host: thurstne.unx.sas.com
- Organization: SAS Institute Inc. Cary NC
- Keywords: scaling
- Lines: 69
- Message-ID: <C0C3ox.H04@unx.sas.com>
- Newsgroups: bit.listserv.stat-l
- Date: Mon, 4 Jan 1993 14:46:09 GMT
- Sender: STATISTICAL CONSULTING <STAT-L@MCGILL1.BITNET>
- Comments: Warning -- original Sender: tag was NETNEWS@VM.SAS.COM
- From: "Warren F. Kuhfeld" <saswfk@UNX.SAS.COM>
- Subject: Re: qualitative principal components
-
- In article <8289@news.duke.edu>, Frank Harrell <feh@DUKE.EDU> writes:
- |> What is the state of the art in scaling techniques for combinations
- |> of continuous, ordinal, polytomous, and binary variables? To me,
- |> qualitative principal components using the alternating least squares-type
- |> techniques in SAS PROC PRINQUAL look very promising, but we have had
- |> tremendous convergence problems using this procedure.
-
- As I see it, the problem with using PROC PRINQUAL on many data sets is
- simply that at best there are too few observations for the number of
- parameters, and at worst, there are more parameters than observations.
- Consider 20 categorical variables with 10 categories each. There are
- 20 * (10 - 1) = 180 parameter estimates required for the optimal scoring.
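The count above can be sketched in a few lines. This is only an illustration: the function name and the sample size of 150 observations are made up for the example, not taken from the post or from PRINQUAL.

```python
def optimal_scoring_parameters(categories_per_variable):
    """Free parameters when each nominal variable is optimally scored.

    Each variable with k categories contributes k - 1 parameters,
    matching the 20 * (10 - 1) count used in the text.
    """
    return sum(k - 1 for k in categories_per_variable)


# The example from the text: 20 categorical variables, 10 categories each.
n_params = optimal_scoring_parameters([10] * 20)
print(n_params)  # 180

# Hypothetical sample size, chosen only to illustrate the comparison.
n_obs = 150
print(n_params > n_obs)  # True: more parameters than observations
```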
-
- Another problem is that the algorithms often work "too well", finding
- an uninteresting or silly solution that is in fact the optimal
- solution. Consider the two-dimensional point cloud:
-
-                  X  X  X
-                X  X  X  X
-              X X X  X X X
-    A           X X XX X
-                X  X  X X
-                  X X X
-
- If given the freedom to do so, PROC PRINQUAL could transform this to:
-
-    A              X
-
- It tries to collapse all the X's into one point. Often it does not
- quite succeed and the X's get *almost* the same scores. If the
- original mean and variance are restored, "A" will get extreme scores.
- This problem is most acute when ordinary (period) missing values are
- optimally scored. The example in the SAS manual showing how to use
- PRINQUAL to estimate missing data, I now believe, is not very useful.
- That technique too frequently leads to optimal but uninteresting
- solutions.
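A small numeric sketch of the collapse described above, assuming a made-up cloud of 25 X's plus the single point A. It does not run PRINQUAL; it only shows what happens to A once a fully collapsed scoring is rescaled to a fixed mean and variance (here mean 0 and variance 1, as a stand-in for restoring the original mean and variance).

```python
import math


def standardize(scores):
    """Rescale scores to mean 0 and population variance 1."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]


# Degenerate "optimal" scoring: A gets one value, all 25 X's share another.
collapsed = [0.0] + [1.0] * 25  # first entry is A (hypothetical data)
z = standardize(collapsed)

print(round(z[0], 3))  # -5.0  (A sits five standard deviations out)
print(round(z[1], 3))  # 0.2   (every X gets the same small score)
```

With n points in total, A lands at a distance of sqrt(n - 1) standard deviations, so the score becomes more extreme as the sample grows.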
-
- In my (unfortunately unpublished) dissertation, I concluded that the
- best way to compute principal components of ordered categorical data
- was to first perform a rank transformation, and then perform an
- ordinary PCA of the ranks.
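A minimal sketch of the rank-then-PCA idea, in plain Python. Several choices here are assumptions rather than anything stated in the post: ties receive average ranks, the PCA is done on the correlation matrix of the ranks, only the first component is extracted (by power iteration), and the data are invented.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def standardize(col):
    """Rescale a column to mean 0 and population variance 1."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]


def first_component(columns, iterations=200):
    """Leading eigenvector and eigenvalue of the columns' correlation matrix."""
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):  # power iteration
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


# Invented ordered-categorical data: 3 variables, 8 observations.
data = [
    [1, 2, 2, 3, 4, 5, 5, 3],
    [1, 1, 2, 3, 3, 5, 4, 2],
    [2, 1, 3, 3, 5, 4, 5, 2],
]
ranked = [average_ranks(col) for col in data]
loadings, eigenvalue = first_component(ranked)
print(loadings, eigenvalue)
```

Because the ranks are fixed before the PCA, no scoring parameters are estimated, which is what keeps this approach away from the degenerate solutions described earlier.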
-
- In the ALS approach, binary variables can be treated as nominal,
- ordinal, or interval. It does not matter; the results will be the
- same.
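One way to see why, sketched with made-up data: any two distinct scores assigned to a two-category variable are a linear function of the 0/1 coding, so after standardization the variable is identical, or merely reflected if the category order is reversed. The specific scores below are arbitrary.

```python
import math


def standardize(scores):
    """Rescale scores to mean 0 and population variance 1."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]


binary = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # invented binary variable

as_interval = standardize([float(b) for b in binary])
# Arbitrary order-preserving scores (what an ordinal scoring could pick).
order_kept = standardize([-2.0 if b == 0 else 7.5 for b in binary])
# Arbitrary order-reversing scores (allowed under a nominal scoring).
order_flipped = standardize([7.5 if b == 0 else -2.0 for b in binary])

same = all(math.isclose(a, b, abs_tol=1e-12)
           for a, b in zip(as_interval, order_kept))
reflected = all(math.isclose(a, -b, abs_tol=1e-12)
                for a, b in zip(as_interval, order_flipped))
print(same, reflected)  # True True
```

A reflection only flips the sign of that variable's loadings, so the principal components are unchanged in any way that matters.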
-
- If the total number of categories in all of the nominal variables is
- small relative to the total number of observations, consider optimally
- scoring them with PRINQUAL. If (degree + number of knots) times the
- number of spline variables is small relative to the total
- number of observations, consider splines. However, if either of these
- numbers is large, you may get uninteresting results. If there are
- multivariate outliers, you may also get uninteresting results.
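The spline count in this rule of thumb can be written out the same way. The function name and the numbers (five cubic-spline variables with three knots each, 200 observations) are hypothetical, and no cutoff for "small" is implied; the post does not give one.

```python
def spline_parameters(degree, n_knots, n_spline_variables):
    """(degree + number of knots) times the number of spline variables."""
    return (degree + n_knots) * n_spline_variables


# Hypothetical setup: 5 variables, cubic splines (degree 3), 3 knots each.
n_params = spline_parameters(degree=3, n_knots=3, n_spline_variables=5)
n_obs = 200  # made-up sample size

print(n_params)          # 30
print(n_params / n_obs)  # 0.15
```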
-
- The 6.07 release of PRINQUAL for MVS, CMS, and VMS has a REITERATE option.
- It allows you to output the results, change the model, and start
- iterating again using the previous results as a starting point. It
- also allows you to specify random initial scores. Perhaps this might
- help.
-
- --
- ----------------------------------------------------------------------
- Warren F. Kuhfeld Statistical R & D (919) 677-8000 x7922
- saswfk@unx.sas.com SAS Institute Inc. (919) 677-8123 (Fax)
- Cary, NC 27513-2414
-