- Path: sparky!uunet!hela.iti.org!usc!usc!not-for-mail
- From: ajayshah@almaak.usc.edu (Ajay Shah)
- Newsgroups: alt.lang.awk
- Subject: Re: Histogrammer? (with some src)
- Date: 28 Jan 1993 16:48:41 -0800
- Organization: University of Southern California, Los Angeles, CA
- Lines: 157
- Message-ID: <1k9ut9INNg0k@almaak.usc.edu>
- References: <dfrankow.728249852@myria> <1993Jan28.221236.28016@csc.ti.com>
- NNTP-Posting-Host: almaak.usc.edu
-
- I also wrote a histogrammer in awk and found it was too slow.
- I then rewrote it in C.
- If there is interest I'll post it.
-
- BTW there are two variants:
-
- a) "One-way and two-way frequency tables"
-
- This works with discrete data. It reduces data into a table like:
-
-   value  count  percent  cumulative
-       1      2      40%         40%
-       2      2      40%         80%
-       7      1      20%        100%
-
-   Total      5
-
- or
-
-            | f2
-         f1 |      1        2     |   Total
- -----------+--------------------+----------
-          1 |      1        2     |       3
-          2 |      0        1     |       1
- -----------+--------------------+----------
-      Total |      1        3     |       4
-
-
- b) "Histogram". This deals with real-valued data and comes
- up with things like:
-
-           0.0                                                        0.778
-           +---------------------------------------------------------------
- 0.00-3.00 |**********************************************
- 3.00-6.00 |******************
- 6.00-9.00 |******************
-
-
-
- I have a program named `crosstab' which produces the first
- kind of tables and a program named `binning' which does the second.
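
For the second variant, here is a rough sketch of fixed-width binning in awk, wrapped in a shell function so it can be run directly. It is not the `binning` program mentioned above, which was not posted. The function name `hist`, the bin-width argument, the use of column 1, the restriction to non-negative data, and the bar length of 50 stars for the fullest bin are all assumptions made for illustration.

```sh
# hist WIDTH [file...]: histogram of column 1 using bins of size WIDTH.
# Assumption: data are non-negative numbers, one value per line.
# Bars are scaled so that the fullest bin gets 50 stars.
hist() {
    w=$1; shift
    awk -v w="$w" '
    BEGIN { maxb = 0; maxn = 0 }
    NF {
        b = int($1 / w)              # bin number b covers [b*w, (b+1)*w)
        n[b]++
        if (b > maxb) maxb = b
        if (n[b] > maxn) maxn = n[b]
    }
    END {
        if (maxn == 0) exit          # no data, nothing to draw
        for (b = 0; b <= maxb; b++) {
            bar = ""
            len = int(50 * n[b] / maxn)
            for (j = 0; j < len; j++) bar = bar "*"
            printf("%.2f-%.2f |%s\n", b * w, (b + 1) * w, bar)
        }
    }' "$@"
}
```

Something like `hist 3 data.txt` would then print one row per bin of width 3, starting at 0 (`data.txt` is a placeholder name).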
-
- -ans.
-
-
- Here is an awk one-way frequency table:
-
-
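- # Tally how often each distinct value of field 3 occurs.
- {
-     count[$3]++;
- }
-
- END {
-     # Copy the distinct values into y[1..N] so they can be sorted.
-     N = 0;
-     for (x in count)
-         y[++N] = x;
-
-     # Bubble sort. Array subscripts are strings in awk, so add 0 to
-     # force a numeric comparison; otherwise "10" sorts before "9".
-     do {
-         interchanged = 0;
-         for (i = 1; i < N; i++)
-             if (y[i] + 0 > y[i+1] + 0) {
-                 tmp = y[i]; y[i] = y[i+1]; y[i+1] = tmp;
-                 interchanged = 1;
-             }
-     } while (interchanged);
-
-     # Print each value with its count.
-     for (i = 1; i <= N; i++)
-         printf("%f %d\n", y[i], count[y[i]]);
- }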
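
The sample table under (a) also has percentage and cumulative-percentage columns, which the script above does not print. Here is a minimal sketch of one way to add them, assuming numeric values in column 3 as above. It hands the ordering to sort(1) instead of using the bubble sort, and the function name `freq_pct` is hypothetical.

```sh
# freq_pct [file...]: value, count, percent, cumulative percent for column 3.
# First awk pass tallies; sort -n orders by value; second pass adds percentages.
freq_pct() {
    awk 'NF >= 3 { count[$3]++; total++ }
         END { for (x in count) print x, count[x], total }' "$@" |
    sort -n |
    awk '{ cum += $2
           printf("%s %d %.0f%% %.0f%%\n", $1, $2, 100 * $2 / $3, 100 * cum / $3)
           total = $3 }
         END { if (NR) printf("\nTotal %d\n", total) }'
}
```

Percentages are rounded to whole numbers, so on other data the percent column may not add up to exactly 100.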
-
-
- Here it is in perl:
-
- #!/usr/rand/bin/perl
-
- #From sondeen@ISI.EDU Thu Mar 19 10:44:10 1992
- #Here is a perl program to do what you request (I did it for practice :-)
-
- #To run it, you must have the program "perl" in your search path (hint:
- #do "which perl"). Save the following in a file called ctab and then
- #run "perl ctab your_field_numbers your_data_file(s)" eg:
-
- #perl ctab 2 1 ctabs.dat ctabs.dat <== for two copies :-)
- # From: ajayshah@alhena.usc.edu (Ajay Shah)
- # Newsgroups: alt.sources.wanted
- # Subject: Has someone written a Unix tool for [cross-]tabulation?
- #
- # I'm looking for a Unix utility which does tabulations and cross
- # tabulations.
- #
- # E.g. given a dataset
- #
- # 1 7 9
- # 1 6 19
- # 2 7 4
- # 2 6 3
- # 3 7 2
- # ...
-
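- $usage = "ctab field [field+] file+";
-
- # Leading numeric arguments are 1-based field numbers; the rest are files.
- # Only the first two fields are tabulated: with three or more, the
- # lookups in the print loop below never match and every cell prints 0.
- @indices = ();
- while (@ARGV && $ARGV[0] =~ /^\d+$/) {
-     push(@indices, shift(@ARGV) - 1);   # reduce index by 1
- }
-
- # Need at least one field number and at least one file.
- if (@indices < 1 || @ARGV < 1) {
-     print "$usage\n";
-     exit 1;
- }
-
- %counts = ();    # "first|second" (or just "first") => count
- %firsts = ();    # distinct values of the first field
- %seconds = ();   # distinct values of the second field, if any
-
- while (<>) {
-     # split(' ', ...) splits on runs of whitespace and ignores
-     # leading whitespace, so no chop or leading-field fixup is needed.
-     @fields = split(' ', $_);
-     @vals = grep(defined($_), @fields[@indices]);
-     next if (@vals != @indices);        # skip lines missing a field
-     $firsts{$vals[0]}++;
-     $seconds{$vals[1]}++ if (@indices > 1);
-     $counts{join('|', @vals)}++;
- }
-
- # String sort; use sort { $a <=> $b } instead for numeric data.
- @firsts = sort keys %firsts;
- @seconds = sort keys %seconds;
-
- # Header row: the distinct values of the second field.
- if (@seconds) {
-     print " ";
-     foreach $second (@seconds) {
-         print "$second ";
-     }
-     print "\n";
- }
-
- foreach $first (@firsts) {
-     print "$first ";
-     foreach $second (@seconds) {
-         # a combination that never occurred shows up as 0
-         printf "%d ", $counts{$first . '|' . $second} || 0;
-     }
-     print "$counts{$first} " if (!@seconds);
-     print "\n";
- }
-
- exit 0;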
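
For comparison, here is a sketch of the two-way table in awk, including the row and column totals shown in the sample table near the top of the post. It is not the `crosstab` program, which was not posted. The function name `crosstab2`, the argument order, the numeric ordering of keys, and the column widths are assumptions for illustration.

```sh
# crosstab2 F1 F2 [file...]: counts of field F1 (rows) by field F2 (columns).
# Assumption: both fields hold numeric codes, so keys are ordered numerically.
crosstab2() {
    f1=$1; f2=$2; shift 2
    awk -v f1="$f1" -v f2="$f2" '
    # Insertion sort of keys[1..n], compared as numbers.
    function isort(keys, n,    i, j, t) {
        for (i = 2; i <= n; i++) {
            t = keys[i]
            for (j = i - 1; j >= 1 && keys[j] + 0 > t + 0; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = t
        }
    }
    NF >= f1 && NF >= f2 {
        r = $f1; c = $f2
        if (!(r in rowtot)) rows[++nr] = r    # remember each new row key
        if (!(c in coltot)) cols[++nc] = c    # remember each new column key
        cell[r, c]++; rowtot[r]++; coltot[c]++; total++
    }
    END {
        isort(rows, nr); isort(cols, nc)
        printf("%8s |", "")
        for (j = 1; j <= nc; j++) printf(" %6s", cols[j])
        printf(" | %6s\n", "Total")
        for (i = 1; i <= nr; i++) {
            printf("%8s |", rows[i])
            for (j = 1; j <= nc; j++) printf(" %6d", cell[rows[i], cols[j]])
            printf(" | %6d\n", rowtot[rows[i]])
        }
        printf("%8s |", "Total")
        for (j = 1; j <= nc; j++) printf(" %6d", coltot[cols[j]])
        printf(" | %6d\n", total)
    }' "$@"
}
```

Called as `crosstab2 2 1 ctabs.dat`, it would tabulate field 2 against field 1, matching the argument order of the Perl example.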
- --
- Ajay Shah, (213)749-8133, ajayshah@rcf.usc.edu
-