NetNews Usenet Archive 1993 #3

Internet Message Format | 1993-01-28 | 3.9 KB

Path: sparky!uunet!hela.iti.org!usc!usc!not-for-mail
From: ajayshah@almaak.usc.edu (Ajay Shah)
Newsgroups: alt.lang.awk
Subject: Re: Histogrammer? (with some src)
Date: 28 Jan 1993 16:48:41 -0800
Organization: University of Southern California, Los Angeles, CA
Lines: 157
Message-ID: <1k9ut9INNg0k@almaak.usc.edu>
References: <dfrankow.728249852@myria> <1993Jan28.221236.28016@csc.ti.com>
NNTP-Posting-Host: almaak.usc.edu

I also wrote a histogrammer in awk and found it was too slow. I then
rewrote it in C. If there is interest I'll post it.

BTW, there are two variants:

a) "One-way and two-way frequency tables". This works with discrete
   data. It reduces data into a table like:

       1    2    40%    40%
       2    2    40%    80%
       7    1    20%   100%
       Totally 5

   or

                  | f2
       f1         |     1         2    | Total
       -----------+--------------------+----------
       1          |     1         2    |     3
       2          |     0         1    |     1
       -----------+--------------------+----------
       Total      |     1         3    |     4

b) "Histogram". This deals with real-valued data and comes up with
   things like:

                 0.0                                                        0.778
                 +---------------------------------------------------------------
       0.00-3.00 |**********************************************
       3.00-6.00 |******************
       6.00-9.00 |******************

I have a program named `crosstab' which produces the first kind of
tables and a program named `binning' which does the second.

    -ans.

Here is an awk one-way frequency table:

    { count[$3]++; }
    END {
        for (x in count) { ++i; y[i] = x; }
        N = i;
        # Now y is an array containing the distinct values of field 3.
        # Bubble sort them; "+ 0" forces a numeric comparison, because
        # the keys produced by "for (x in count)" are strings.
        do {
            interchanged = 0;
            for (i=1; i<N; i++)
                if (y[i] + 0 > y[i+1] + 0) {
                    tmp = y[i]; y[i] = y[i+1]; y[i+1] = tmp;
                    interchanged = 1;
                }
        } while (interchanged);
        for (i=1; i<=N; i++)
            printf("%f %d\n", y[i], count[y[i]]);
    }

Here it is in perl:

    #!/usr/rand/bin/perl
    #From sondeen@ISI.EDU Thu Mar 19 10:44:10 1992
    #Here is a perl program to do what you request (I did it for practice :-)
    #To run it, you must have the program "perl" in your search path (hint:
    #do "which perl").  Save the following in a file called ctab and then
    #run "perl ctab your_field_numbers your_data_file(s)" eg:
    #perl ctab 2 1 ctabs.dat ctabs.dat   <== for two copies :-)
    # From: ajayshah@alhena.usc.edu (Ajay Shah)
    # Newsgroups: alt.sources.wanted
    # Subject: Has someone written a Unix tool for [cross-]tabulation?
    #
    # I'm looking for a Unix utility which does tabulations and cross
    # tabulations.
    #
    # E.g. given a dataset
    #
    # 1 7 9
    # 1 6 19
    # 2 7 4
    # 2 6 3
    # 3 7 2
    # ...

    $usage = "ctab field [field+] file+";
    @indices = ();
    while (($parm = shift) =~ /^\d+/) {
        push(@indices, $parm - 1);      # reduce index by 1
    }
    if ($#indices < 0 || $parm eq '') {
        print "$usage\n";
        exit 1;
    }
    unshift(@ARGV, $parm);
    %counts = ();
    %firsts = ();
    %seconds = ();
    while (<>) {
        chomp;
        @_ = split;     # explicit @_: Perl 5.12+ no longer splits into @_ implicitly
        shift(@_) if (@_ && $_[0] eq '');   # skip leading nothing (a bare shift would shift @ARGV here)
        @_ = @_[@indices];
        if (!grep { !defined } @_) {        # every requested field is present
            $firsts{$_[0]}++;
            $seconds{$_[1]}++ if ($#indices > 0);
            $index = join('|', @_);
            $counts{$index}++;
        } else {
            # skip a line that lacks a requested field
        }
    }
    @firsts = sort keys %firsts;
    @seconds = sort keys %seconds;
    if ($#seconds >= 0) {
        print " ";
        foreach $second (@seconds) {
            print "$second ";
        }
        print "\n";
    }
    foreach $first (@firsts) {
        print "$first ";
        foreach $second (@seconds) {
            printf "%d ", $counts{$first . '|' . $second};  # printf to make '' show up as 0
        }
        print "$counts{$first} " if ($#seconds < 0);
        print "\n";
    }
    exit 0;

-- 
Ajay Shah, (213)749-8133, ajayshah@rcf.usc.edu
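The awk program in the post prints each value with its count only, whereas the `crosstab' sample under (a) also carries a percentage and a cumulative percentage. Below is a minimal awk sketch of that fuller one-way table. It is not the author's `crosstab', which was not posted; the `COL` variable (set with `-v COL=n`, default 1) is a hypothetical knob, values are sorted numerically, and percentages are rounded to whole numbers.

```awk
# One-way frequency table with percent and cumulative percent.
# A sketch only: COL (hypothetical, set with -v COL=n) picks the field;
# it defaults to 1.  Values are sorted numerically.

# Insertion sort of a[1..n], comparing the elements as numbers.
function numsort(a, n,    i, j, t) {
    for (i = 2; i <= n; i++) {
        t = a[i]
        for (j = i - 1; j >= 1 && a[j] + 0 > t + 0; j--)
            a[j + 1] = a[j]
        a[j + 1] = t
    }
}

# part as a whole-number percentage of whole (0 when whole is 0).
function pct(part, whole) {
    return whole ? int(100 * part / whole + 0.5) : 0
}

BEGIN { if (COL == "") COL = 1 }

NF >= COL { count[$COL]++; total++ }

END {
    n = 0
    for (v in count) keys[++n] = v
    numsort(keys, n)
    cum = 0
    for (i = 1; i <= n; i++) {
        cum += count[keys[i]]
        printf("%s %d %d%% %d%%\n", keys[i], count[keys[i]],
            pct(count[keys[i]], total), pct(cum, total))
    }
    printf("Total %d\n", total)
}
```

Given the values 1, 1, 2, 2 and 7, one per line, this prints `1 2 40% 40%`, `2 2 40% 80%`, `7 1 20% 100%` and then `Total 5`, the same rows as the sample. An insertion sort stands in for the posted bubble sort; both are quadratic, which is acceptable while the number of distinct values stays small.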
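Variant (b), the `binning' histogram, can also be approximated in awk. This sketch is an illustration under stated assumptions, not the author's program: it reads values from field 1, uses bins of fixed width `W` starting at `LO` (defaults 3 and 0, to mirror the sample's 0.00-3.00, 3.00-6.00, ... bins), and draws the fullest bin with `BAR` stars (default 50). `LO`, `W` and `BAR` are hypothetical names, settable with `-v`, and `W` must be positive. The scale's right end is taken to be the fullest bin's share of all observations, which is an assumption about what the sample's 0.778 denotes.

```awk
# Fixed-width histogram in the spirit of the `binning' sample.
# A sketch only.  Hypothetical knobs (set with -v):
#   LO  = lower edge of the first bin (default 0)
#   W   = bin width, must be > 0 (default 3)
#   BAR = number of stars drawn for the fullest bin (default 50)
# Data values are read from field 1.

# Largest integer <= x (awk's int() truncates toward zero).
function ifloor(x,    i) {
    i = int(x)
    return (i > x) ? i - 1 : i
}

# Bin k covers [LO + k*W, LO + (k+1)*W).
function binof(x) {
    return ifloor((x - LO) / W)
}

# The string c repeated m times ("" when m <= 0).
function rep(c, m,    s) {
    s = ""
    while (m-- > 0) s = s c
    return s
}

BEGIN {
    if (LO == "") LO = 0
    if (W == "") W = 3
    if (BAR == "") BAR = 50
}

NF > 0 {
    k = binof($1 + 0)
    freq[k]++
    n++
    if (n == 1 || k < kmin) kmin = k
    if (n == 1 || k > kmax) kmax = k
}

END {
    if (n > 0) {
        top = 0
        for (k = kmin; k <= kmax; k++)
            if (freq[k] > top) top = freq[k]
        # Scale: 0 at the left, the fullest bin's share of n at the right.
        right = sprintf("%.3f", top / n)
        print rep(" ", 12) "0.0" rep(" ", BAR - 2 - length(right)) right
        print rep(" ", 12) "+" rep("-", BAR)
        for (k = kmin; k <= kmax; k++) {
            lo = LO + k * W
            label = sprintf("%.2f-%.2f", lo, lo + W)
            printf("%11s |%s\n", label, rep("*", int(BAR * freq[k] / top + 0.5)))
        }
    }
}
```

With the defaults and the input 1, 2, 2.5, 4 and 7, three values land in 0.00-3.00 and one each in 3.00-6.00 and 6.00-9.00, so the first bar gets 50 stars, the other two get 17, and the scale ends at 0.600. Empty bins between the smallest and largest occupied bin are printed with no stars rather than skipped.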
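The Perl `ctab' in the post prints two-way counts without margins. An awk sketch that adds the row and column totals shown in the two-way sample under (a) might look like the following; it is not the author's `crosstab'. `R` and `C` (defaults 1 and 2, set with `-v`) are hypothetical names for the row and column field numbers, and both sets of keys are sorted numerically.

```awk
# Two-way frequency table with row and column totals.
# A sketch only: R and C (hypothetical, set with -v) are the row and
# column field numbers, defaulting to 1 and 2.  Keys sort numerically.

# Insertion sort of a[1..n], comparing the elements as numbers.
function numsort(a, n,    i, j, t) {
    for (i = 2; i <= n; i++) {
        t = a[i]
        for (j = i - 1; j >= 1 && a[j] + 0 > t + 0; j--)
            a[j + 1] = a[j]
        a[j + 1] = t
    }
}

# Record one observation with row value r and column value c.
function tally(r, c) {
    if (!(r in rowtot)) rows[++nr] = r
    if (!(c in coltot)) cols[++nc] = c
    cell[r, c]++
    rowtot[r]++
    coltot[c]++
    total++
}

BEGIN { if (R == "") R = 1; if (C == "") C = 2 }

NF >= R && NF >= C { tally($R, $C) }

END {
    numsort(rows, nr)
    numsort(cols, nc)
    printf("%-10s |", "")
    for (j = 1; j <= nc; j++) printf(" %5s", cols[j])
    printf(" | %5s\n", "Total")
    for (i = 1; i <= nr; i++) {
        printf("%-10s |", rows[i])
        for (j = 1; j <= nc; j++) printf(" %5d", cell[rows[i], cols[j]])
        printf(" | %5d\n", rowtot[rows[i]])
    }
    printf("%-10s |", "Total")
    for (j = 1; j <= nc; j++) printf(" %5d", coltot[cols[j]])
    printf(" | %5d\n", total)
}
```

Feeding it the four pairs 1 1, 1 2, 1 2 and 2 2 yields cells 1 2 and 0 1, row totals 3 and 1, column totals 1 and 3, and a grand total of 4, the numbers in the sample table. The dashed rules of the sample are left out to keep the sketch short.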