- Path: sparky!uunet!pmafire!mica.inel.gov!ux1!fcom.cc.utah.edu!hellgate.utah.edu!cs.utexas.edu!zaphod.mps.ohio-state.edu!rpi!newsserver.pixel.kodak.com!kodak!eastman!b56vxg.kodak.com!ekdug
- From: ekdug@b56vxg.kodak.com (Linda Stustman)
- Newsgroups: sci.math.stat
- Subject: Help With Statistics on "Compressed Data"
- Keywords: statistics, compressed data,
- Message-ID: <5NOV199209582623@b56vxg.kodak.com>
- Date: 5 Nov 92 14:58:00 GMT
- Sender: news@eastman.UUCP
- Organization: Eastman Kodak Company, Rochester NY
- Lines: 42
- News-Software: VAX/VMS VNEWS 1.41
-
- I'm looking for a way to estimate an upper bound on the standard deviation
- of a stream of plant data. The complication is that the data is coming from
- a process data base that has a type of compression applied to it.
-
- Simply put, data from the process is generated every 30 minutes (an analysis
- by a gas chromatograph). The process data base compares the new value with
- the previous one and only records the new value (with an associated time
- stamp) if the absolute value of the difference between the two readings is
- greater than a fixed threshold.
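
The recording rule described above can be sketched in a few lines of Python. This is only an illustration, not the process data base's actual code: the function name `deadband_compress` is hypothetical, and it assumes that "the previous one" means the last value that was actually recorded (comparing against the previous raw reading would give different results).

```python
def deadband_compress(readings, threshold):
    """Return (index, value) pairs that would be kept by the rule.

    Assumption: each new reading is compared against the last
    RECORDED value, and the first reading is always recorded.
    """
    recorded = []
    last = None
    for i, value in enumerate(readings):
        # Keep the reading only if it moved more than the threshold.
        if last is None or abs(value - last) > threshold:
            recorded.append((i, value))
            last = value
    return recorded


if __name__ == "__main__":
    # Made-up readings, one per 30-minute analysis.
    sample = [10.0, 10.2, 10.4, 11.2, 11.0, 9.9]
    print(deadband_compress(sample, threshold=0.5))
```

The index stands in for the time stamp; with one analysis every 30 minutes, index `i` corresponds to `30 * i` minutes after the first reading.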
-
- In practice, this means 3 to 6 values a day are recorded for the variable,
- out of the 48 analyses that are actually done. My problem is to come up
- with a way of providing a reasonable estimate for the standard deviation
- of the analysis values that uses the information present in the recorded
- values, but also includes the information that the other 42 to 45 analyses
- varied less than the threshold value.
-
- My only idea on how to attack the problem (to date) is to assume that the
- range of +/- the threshold value corresponds to +/- 3 sigma of a normally
- distributed variable. Then, I could generate the appropriate number of
- "missing" values from a random normal distribution, add the recorded
- values and calculate a standard deviation from the resulting augmented
- "data" set. Perhaps arguments could be made for treating the +/- range
- of the threshold value as +/- 2 or +/- 4 sigma (and does anyone have any
- comments?).
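
A minimal sketch of the augmentation idea in the paragraph above, using only the Python standard library. Several things here are assumptions not stated in the post: the function name `augmented_std`, the choice to center the simulated values on the mean of the recorded values, and the default of 48 analyses per day. The parameter `k` is the number of sigmas that the threshold is taken to represent (3 in the post, with 2 and 4 as alternatives).

```python
import random
import statistics


def augmented_std(recorded, threshold, n_total=48, k=3.0, seed=None):
    """Standard deviation of recorded values plus simulated 'missing' ones.

    Assumptions (not from the original post):
      * missing values are drawn from Normal(mean(recorded), threshold / k)
      * n_total analyses were done in the period (48 per day)
    """
    recorded = list(recorded)
    n_missing = n_total - len(recorded)
    if n_missing < 0:
        raise ValueError("more recorded values than total analyses")
    rng = random.Random(seed)
    sigma = threshold / k  # +/- threshold treated as +/- k sigma
    center = statistics.mean(recorded)
    simulated = [rng.gauss(center, sigma) for _ in range(n_missing)]
    # Sample standard deviation of the augmented "data" set.
    return statistics.stdev(recorded + simulated)


if __name__ == "__main__":
    day = [10.0, 11.2, 9.9]  # made-up recorded values for one day
    for k in (2.0, 3.0, 4.0):
        print(k, augmented_std(day, threshold=0.5, k=k, seed=1))
```

Because the simulated values are random, the result changes from run to run unless a seed is fixed; repeating the calculation over many seeds would show how much of the estimate is driven by the simulation rather than by the recorded data.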
-
- Or, is there some well-known (but not by me) method of handling this
- situation? Comments are *most* welcome!
-
- (As an answer to a possible question as to why I seem to be trying to
- calculate a daily standard deviation, the process has several sources of
- rather slow drifts superimposed over the normal measurement noise that
- occurs in industrial processes. One day is long enough to do reasonable
- averages over and not so long as to pick up too much of the drift
- contributions. Historical data has to be used because we can't go back
- to the older modes of operating the process.)
-
- Please reply to the conference, if you feel this topic might have broad
- interest, or to jhcox@Kodak.com
-
- Thanks in advance,
- JHCox
-