NetNews Usenet Archive 1992 #26

Internet Message Format | 1992-11-05 | 2.7 KB

Path: sparky!uunet!pmafire!mica.inel.gov!ux1!fcom.cc.utah.edu!hellgate.utah.edu!cs.utexas.edu!zaphod.mps.ohio-state.edu!rpi!newsserver.pixel.kodak.com!kodak!eastman!b56vxg.kodak.com!ekdug
From: ekdug@b56vxg.kodak.com (Linda Stustman)
Newsgroups: sci.math.stat
Subject: Help With Statistics on "Compressed Data"
Keywords: statistics, compressed data,
Message-ID: <5NOV199209582623@b56vxg.kodak.com>
Date: 5 Nov 92 14:58:00 GMT
Sender: news@eastman.UUCP
Organization: Eastman Kodak Company, Rochester NY
Lines: 42
News-Software: VAX/VMS VNEWS 1.41

I'm looking for a way to estimate an upper bound on the standard deviation of a stream of plant data. The complication is that the data is coming from a process data base that has a type of compression applied to it.

Simply put, data from the process is generated every 30 minutes (an analysis by a gas chromatograph). The process data base compares the new value with the previous one and only records the new value (with an associated time stamp) if the absolute value of the difference between the two readings is greater than a fixed threshold. In practice, this means 3 to 6 values a day are recorded for the variable, out of the 48 analyses that are actually done.

My problem is to come up with a way of providing a reasonable estimate for the standard deviation of the analysis values that uses the information present in the recorded values, but also includes the information that the other 42 to 45 analyses varied less than the threshold value.

My only idea on how to attack the problem (to date) is to assume that the range of +/- the threshold value corresponds to +/- 3 sigma of a normally distributed variable. Then, I could generate the appropriate number of "missing" values from a random normal distribution, add the recorded values and calculate a standard deviation from the resulting augmented "data" set.
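
A minimal sketch of the augmentation idea described above, in Python, may make the bookkeeping concrete. It is an illustration under stated assumptions, not a validated method. Assumptions that are not in the post: each new reading is compared with the last *recorded* value; the simulated "missing" readings are centred on that last recorded value and redrawn if they fall outside the threshold; the day starts with a recorded value at minute 0. The function names, the parameter `k` (how many sigma the threshold is taken to represent) and the sample readings are all hypothetical.

```python
import random
import statistics

SAMPLE_MINUTES = 30      # one analysis every 30 minutes, per the post
DAY_MINUTES = 24 * 60


def draw_unrecorded(rng, centre, sigma, threshold):
    """Draw one simulated reading that would NOT have been recorded."""
    while True:
        x = rng.gauss(centre, sigma)
        if abs(x - centre) <= threshold:   # stays inside the +/- threshold band
            return x


def augment(recorded, threshold, k=3.0, rng=None):
    """Fill the gaps between recorded values with simulated readings.

    recorded  : list of (minute_of_day, value) pairs, sorted by time
    threshold : recording threshold of the process data base
    k         : ASSUMED number of sigma that the threshold corresponds to
    """
    if rng is None:
        rng = random.Random()
    sigma = threshold / k
    values = []
    for i, (t, v) in enumerate(recorded):
        values.append(v)
        # The gap runs to the next recorded value, or to the end of the day.
        t_next = recorded[i + 1][0] if i + 1 < len(recorded) else DAY_MINUTES
        n_missing = max((t_next - t) // SAMPLE_MINUTES - 1, 0)
        # ASSUMPTION: unrecorded readings scatter around the last recorded value.
        values.extend(
            draw_unrecorded(rng, v, sigma, threshold) for _ in range(n_missing)
        )
    return values


def augmented_std(recorded, threshold, k=3.0, rng=None):
    """Sample standard deviation of the recorded plus simulated readings."""
    return statistics.stdev(augment(recorded, threshold, k, rng))


if __name__ == "__main__":
    # Made-up readings: three recorded values in one day.
    recorded = [(0, 10.0), (480, 12.0), (960, 9.5)]
    for k in (2.0, 3.0, 4.0):
        s = augmented_std(recorded, threshold=1.0, k=k, rng=random.Random(1992))
        print(f"threshold = {k:.0f} sigma -> estimated std = {s:.3f}")
```

With three recorded values spaced eight hours apart, the augmented set has 48 entries, matching the 48 analyses per day. Note that the within-band spread in such a sketch comes entirely from the assumed `k`, so the result mainly shows how sensitive the estimate is to that choice.
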
Perhaps arguments could be made for treating the +/- range of the threshold value as +/- 2 or +/- 4 sigma (and does anyone have any comments?). Or, is there some well-known (but not by me) method of handling this situation? Comments are *most* welcome!

(As an answer to a possible question as to why I seem to be trying to calculate a daily standard deviation, the process has several sources of rather slow drifts superimposed over the normal measurement noise that occurs in industrial processes. One day is long enough to do reasonable averages over and not so long as to pick up too much of the drift contributions. Historical data has to be used because we can't go back to the older modes of operating the process.)

Please reply to the conference, if you feel this topic might have broad interest, or to jhcox@Kodak.com

Thanks in advance,

JHCox