Usenet 1994 January

Internet Message Format | 1989-09-17 | 45.9 KB

Subject: v20i006: Relational database and graphing tools, Part03/03 Newsgroups: comp.sources.unix Sender: sources Approved: rsalz@uunet.UU.NET Submitted-by: hafro.is!gunnar (Gunnar Stefansson) Posting-number: Volume 20, Issue 6 Archive-name: reldb/part03 #!/bin/sh # to extract, remove the header and type "sh filename" if `test ! -s ./README` then echo "writting ./README" cat > ./README << '\Rogue\Monster\' Here is a neat little collection of programs to do all sorts of handy things. Directories: reldb.src Programs for relational database operations plot.src Plot(1) filter (for X11). scat.src Program for drawing simple plots (SCATtergram) doc Documentation testdb Sample programs for testing the reldb collection. testgr Sample programs for testing the scat options. Cd to each directory and examine the Makefiles and README files. Note that some makefiles may do automatic installation. \Rogue\Monster\ else echo "will not over write ./README" fi if `test ! -d ./bin` then mkdir ./bin echo "mkdir ./bin" fi if `test ! -s ./bin/README` then echo "writting ./bin/README" cat > ./bin/README << '\Rogue\Monster\' This is the default location for binary files. The directory can safely be removed, but is included in the distribution simply so that the binaries have someplace to go in case 'make' is used without appropriate local modifications of the Makefiles. \Rogue\Monster\ else echo "will not over write ./bin/README" fi if `test ! -d ./convert.src` then mkdir ./convert.src echo "mkdir ./convert.src" fi if `test ! -d ./convert.src/Glim` then mkdir ./convert.src/Glim echo "mkdir ./convert.src/Glim" fi if `test ! 
-s ./convert.src/Glim/pretoglim` then echo "writting ./convert.src/Glim/pretoglim" cat > ./convert.src/Glim/pretoglim << '\Rogue\Monster\' : # pretoglim.sh -- convert prelude to glim file # read header read dummy awk '{ for(i=1;i<NF;i++){ if($(i)==int($(i))) printf("%d ",$(i)); else if($(i)>1) printf("%.6f ",$(i)); else printf("%f ",$(i)); } printf("%.4f\n",$(NF)) }' > glimdat.junk units=`wc -l < glimdat.junk ` echo "\$units $units \$echo \$data $header \$read" | sed 's/ / \ /g' > glimcmds.junk echo 'Data in glimdat.junk, commands in glimcmds.junk' echo "Use 'cat glimcmds.junk glimdat.junk command-file | glim' to run " \Rogue\Monster\ else echo "will not over write ./convert.src/Glim/pretoglim" fi if `test ! -s ./convert.src/README` then echo "writting ./convert.src/README" cat > ./convert.src/README << '\Rogue\Monster\' Files for converting from reldb to various other formats. The directory names say it all. These programs are merely hints - they are currently run on a mix of machines (hp and sun), but not all programs work on all machines. (We can't test all combinations, since we don't have all packages on all machines). \Rogue\Monster\ else echo "will not over write ./convert.src/README" fi if `test ! -d ./convert.src/S` then mkdir ./convert.src/S echo "mkdir ./convert.src/S" fi if `test ! -s ./convert.src/S/pretoS` then echo "writting ./convert.src/S/pretoS" cat > ./convert.src/S/pretoS << '\Rogue\Monster\' #!/bin/sh # # Convert Prelude file to S files # # Ideally, the first column in the Prelude file should be a labelling column, # e.g. year. 
FILE=$1 cat $FILE > /tmp/tmp$$ tail +3 < /tmp/tmp$$ | grep -v '\-999 ' > $FILE.s ( echo '-f ' ; awk 'NR==1{for(i=1;i<=NF;i++)print $i,"REAL",i}' < /tmp/tmp$$) > $FILE.s.des Splus extract $FILE.s sed -n '3,$s/^\([^ ]*\).*$/"\1"/p' < /tmp/tmp$$ > rownames.txt head -1 /tmp/tmp$$ | tr ' ' '\012' | sed 's/\(.*\)/"\1"/' > varnames.txt NCOL=`wc -l < varnames.txt` rm /tmp/tmp$$ echo 'Now use restore("'$FILE'.s.ext") within Splus, to read the data or use x<-matrix(read("'$FILE'.s"),ncol='$NCOL',byrow=TRUE) to read the data as a matrix and colnames<-read("varnames.txt") to get the column labels for the matrix. If the first column contained labels, you can use rownames<-read("rownames.txt") to get the row labels.' \Rogue\Monster\ else echo "will not over write ./convert.src/S/pretoS" fi if `test ! -d ./convert.src/Bmdp` then mkdir ./convert.src/Bmdp echo "mkdir ./convert.src/Bmdp" fi if `test ! -s ./convert.src/Bmdp/2d.inp` then echo "writting ./convert.src/Bmdp/2d.inp" cat > ./convert.src/Bmdp/2d.inp << '\Rogue\Monster\' /PROBLEM TITLE IS 'Heiti verkefnis'. /PRINT PAGE=66. LINE=80. /INPUT FILE IS 'bmdp.test'. CODE IS TEST_DATA. /END \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/2d.inp" fi if `test ! -s ./convert.src/Bmdp/7d.inp` then echo "writting ./convert.src/Bmdp/7d.inp" cat > ./convert.src/Bmdp/7d.inp << '\Rogue\Monster\' /PROBLEM TITLE IS 'Heiti verkefnis'. /PRINT PAGE=66. LINE=80. /INPUT FILE IS 'bmdp.test'. CODE IS TEST_DATA. /GROUP CODES(y) ARE 1, 3, 4, 5. CODES(z) ARE 2, 3, 4, 5, 6. /HISTOGRAM GROUPING= z. VARIABLE=x. /END \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/7d.inp" fi if `test ! -s ./convert.src/Bmdp/Makefile` then echo "writting ./convert.src/Bmdp/Makefile" cat > ./convert.src/Bmdp/Makefile << '\Rogue\Monster\' # # BINDIR is where the binaries and scripts go. Note that a simple 'make' # will also install. You'll want to define BINDIR as ../bin until # you know things work. 
# BINDIR=/usr/local/bin # # Use -DBSD for all BSD-systems # -DSYSV for sysv-style systems # CFLAGS= -DDEBUG=0 -DBSD BINARIES= $(BINDIR)/pretobmdp $(BINDIR)/pretobmdp : pretobmdp.sh cp pretobmdp.sh $(BINDIR)/pretobmdp chmod +x $(BINDIR)/pretobmdp \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/Makefile" fi if `test ! -s ./convert.src/Bmdp/README` then echo "writting ./convert.src/Bmdp/README" cat > ./convert.src/Bmdp/README << '\Rogue\Monster\' README pretobmdp Converts reldb to BMDP Sample input file: bmdp.reldb Sample command: pretobmdp bmdp.reldb y z Output files: bmdp.prog bmdp.test junkfile \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/README" fi if `test ! -s ./convert.src/Bmdp/bmdp.prog` then echo "writting ./convert.src/Bmdp/bmdp.prog" cat > ./convert.src/Bmdp/bmdp.prog << '\Rogue\Monster\' /PROBLEM TITLE IS 'Heiti verkefnis'. /PRINT PAGE=66. LINE=80. /INPUT FILE IS 'bmdp.test'. CODE IS TEST_DATA. /GROUP CODES(y) ARE 1, 3, 4, 5. CODES(z) ARE 2, 3, 4, 5, 6. /END \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/bmdp.prog" fi if `test ! -s ./convert.src/Bmdp/bmdp.reldb` then echo "writting ./convert.src/Bmdp/bmdp.reldb" cat > ./convert.src/Bmdp/bmdp.reldb << '\Rogue\Monster\' x y z v - - - - 0 1 2 3 1 3 4 2 3 4 5 3 4 5 6 4 5 6 7 \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/bmdp.reldb" fi if `test ! -s ./convert.src/Bmdp/bmdp.test` then echo "writting ./convert.src/Bmdp/bmdp.test" cat > ./convert.src/Bmdp/bmdp.test << '\Rogue\Monster\' else echo "will not over write ./convert.src/Bmdp/bmdp.test" fi if `test ! -s ./convert.src/Bmdp/pretobmdp.sh` then echo "writting ./convert.src/Bmdp/pretobmdp.sh" cat > ./convert.src/Bmdp/pretobmdp.sh << '\Rogue\Monster\' #!/bin/sh # # pretobmdp -- convert reldb to bmdp format # # Usage: pretobmdp input-file [factors] # # Converts the input file to BMDP format and sets up a BMDP script # to read that file. 
It then runs the script, generates a BMDP data set # and suggests a next step. if [ ! -f "$1" ] then echo "Usage: pretobmdp input-file [factors]" echo "(Input file must be specified)" exit 1 fi INFILE=$1 # First set up the correct missing-value notation. sed ' s/^ /* / s/ $/ */ s/ -1 / * /g s/ -1 / * /g s/ -1 / * /g s/ / * /g s/ / * /g s/ / * /g ' < $1 > tmp$$.pre # First read Reldb header lines read names < $1 # column names t=`echo $names | wc -w` # compute number of variables bmnames=`echo "$names"|sed 's/ /, /g' | tr ' ' '\012'` # make bmdp name list (echo '/PROBLEM TITLE IS '\''Heiti verkefnis'\''. /PRINT PAGE=66. LINE=80. /INPUT VARIABLES ARE '$t'. FORMAT IS FREE. /VARIABLE NAMES ARE ' "$bmnames". ' /SAVE FILE IS '\''bmdp.test'\'.' CODE IS TEST_DATA. NEW. /END' sed ' 1,2d s/ / /g') < tmp$$.pre > junkfile # The file 'junkfile' now contains a bmdp-style file, ready e.g. for # the program 1v. Running this program will give a bmdp data file, # called bmdp.test echo "junkfile contains a simple bmdp program + data WARNING : <tab><tab> was assumed to mean a missing value. ALSO <tab>-1<tab> !!! These have been coded to bmdp missing values. The following is the output from the command '1d < junkfile'" 1d < junkfile echo "The '1d'-run generated bmdp.test, which contains the data in bmdp format" # Now to give a sample program to read bmdp.test echo '/PROBLEM TITLE IS '\''Heiti verkefnis'\''. /PRINT PAGE=66. LINE=80. /INPUT FILE IS '\''bmdp.test'\'.' CODE IS TEST_DATA.' > 2d.inp shift # Remaining arguments are factors # # When they exist, we will set up a histogram- and # multiple regression program (7d.inp and 2v.inp). # These are very similar, but not quite the same. if [ $# -ge 1 ] then echo -n "Processing your factors:" cp 2d.inp 7d.inp echo '/GROUP' >> 7d.inp for i # Generate a code-stmnt do echo -n "$i..." echo " CODES($i) ARE " >> 7d.inp project $i < tmp$$.pre | tail +3 | sort -u | sed -e 's/$/,/' \ -e '$s/,$/./' \ -e '/\*/d' >> 7d.inp done echo -n "Finishing up..." 
var=`sed -e '1{s/\([^ ]\) .*/\1./ q }' < tmp$$.pre` # Use 1st var as dependent echo "VARIABLE=$var" >> 7d.inp cp 7d.inp 2v.inp # This ends the similarity 7d=2v echo '/HISTOGRAM GROUPING=' >> 7d.inp echo "/DESIGN DEPENDENT IS $var. GROUPING ARE " >> 2v.inp x=`echo $* | sed -e 's/ */,/g' -e 's/$/./'` echo $x >> 7d.inp echo $x >> 2v.inp echo '/END' >> 7d.inp fi echo '/END' >> 2d.inp echo "Done" echo "bmdp.prog contains a bmdp program to read bmdp.test and to generate simple statistics. Your next step(s) should be : 2d < 2d.inp" if [ -f 7d.inp ] then echo ' 2v < 2v.inp' echo ' 2d < 2d.inp' echo ' 7d < 7d.inp' fi echo "and then use the files" *.inp "for future development." rm -f tmp$$.pre \Rogue\Monster\ else echo "will not over write ./convert.src/Bmdp/pretobmdp.sh" fi if `test ! -d ./doc` then mkdir ./doc echo "mkdir ./doc" fi if `test ! -s ./doc/addcol.1` then echo "writting ./doc/addcol.1" cat > ./doc/addcol.1 << '\Rogue\Monster\' .\" Man page for addcol(1) .TH ADDCOL 1 "1. April 1989" .SH NAME addcol - add columns to a table .SH SYNOPSIS .B addcol .I newcolumn1 [ .I newcolumn2 \... ] .SH DESCRIPTION The .I addcol command adds new, empty columns to an existing reldb table. .I Project also does this, if nonexistent columns are named on its command line, but with .I addcol, mentioning existing columns is not needed. .SH "SEE ALSO" .BR reldb(1),\ rename(1),\ project(1) .SH "BUGS" None known. \Rogue\Monster\ else echo "will not over write ./doc/addcol.1" fi if `test ! -s ./doc/compute.1` then echo "writting ./doc/compute.1" cat > ./doc/compute.1 << '\Rogue\Monster\' .\" Man page for compute(1) .TH COMPUTE 1 "1. April 1989" .SH NAME compute - compute values into a table. .SH SYNOPSIS .B compute .I compute-statement [ .I < data ] .SH DESCRIPTION .B compute reads columnar data from the standard input and computes new values into the columns according to the computational formula given. Arbitrarily complex expressions (c.f. 
.I awk(1) ) can be used, based on the column names. .LP This is best explained by way of an example. Suppose the file .I data contains the lines: .nf x y z v u w i - - - - - - - 0 1 2 3 4 * 6 1 2 3 4 5 ** 7 2 3 4 5 6 *** 8 3 4 5 6 7 v 9 4 5 6 7 8 vi 10 .fi where exactly one tab-character separates the columns. .LP The command .I compute 'z = 2; .I v=2*i' < data yields .nf x y z v u w i - - - - - - - 0 1 2 12 4 * 6 1 2 2 14 5 ** 7 2 3 2 16 6 *** 8 3 4 2 18 7 v 9 4 5 2 20 8 vi 10 .fi on the standard output. .LP More complex numerical computation commands can use if-statements, be placed on multiple lines etc. The format of the if-statements is like the expressions described in select(1). .SH "SEE ALSO" .BR reldb(1),\ awk(1),\ select(1). .SH BUGS .LP .B Compute simply calls upon .I awk to do the work. Any bugs in .I awk will be reflected in .I compute. .LP Since the column names are translated into .I awk variable names, column names are bound by the same rules as variable names in .I awk. To be on the safe side, one should only use alphabetic characters and underscores. International characters (e.g. from ISO 8859/1, upper half) will probably not work. Other symbols, such as a hyphen (minus) and asterisk will not work. .LP Some names are reserved .I awk symbols and should not be used for column names (e.g. if, for, length). .LP The single most difficult problem encountered with .I compute and .I select is the error messages, which come from within .I awk. These are hard to trace, as .I compute and .I select generate awk-commands, which are stored in a temporary file, and this file is erased immediately after execution. .LP In BSD, one can suspend .I compute (with ^Z) immediately after it is started and take a look at the temporary file. Usually the error is completely obvious. .LP A good-natured soul should add a debug-option to .I compute and .I select, to allow output of this temporary file, e.g. to the terminal. 
\Rogue\Monster\ else echo "will not over write ./doc/compute.1" fi if `test ! -s ./doc/count.1` then echo "writting ./doc/count.1" cat > ./doc/count.1 << '\Rogue\Monster\' .\" man-page for count(1). .TH COUNT 1 "1. April 1989" .SH NAME count - give a count of identical lines. .SH SYNOPSIS .B count [ .I < data ] .SH DESCRIPTION .B Count reads a (sorted) .I reldb table and counts the number of identical lines. The output consists of the input, where repetitions are output only once, but a count column is added to give the number of repetitions. .LP .I Count is typically used after .I sorttable, and is very useful for generating histograms. This program is essentially the .I reldb analog of .I uniq. .SH "SEE ALSO" .BR reldb(1),\ sorttable(1),\ uniq(1). .SH BUGS .LP No bugs are known. .LP After the program was written, it became obvious that it would have been trivial to implement using .I uniq and .I sed. However, this beginner's C-program is much more efficient than that combination. \Rogue\Monster\ else echo "will not over write ./doc/count.1" fi if `test ! -s ./doc/dbe.1` then echo "writting ./doc/dbe.1" cat > ./doc/dbe.1 << '\Rogue\Monster\' .\" man-page for dbe(1). .TH DBE 1 "1. April 1989" .SH NAME dbe - database editor .LP dbe.add - add entries to reldb table .LP dbe.change - change entries in reldb table. .SH SYNOPSIS .B dbe.add .I data .LP .B dbe.change .I data .SH DESCRIPTION .LP A true database editor should allow easy editing of a table, including type checking, limits on numbers etc. Even better, it should be possible to edit several tables, going through them all simultaneously, using one or more key fields. .LP None of the above has been implemented, although it would be fairly easy e.g. in lisp within GNU Emacs. .LP Instead, two shell scripts exist: .I dbe.add for adding lines to a table, and .I dbe.change for going through all the lines in a table, with the option of reentering any field in any record. 
.LP To use .I dbe.add, simply type the command followed by the name of a table. The table must exist. The script will then prompt for each field in the record. The user enters values for the field and hits return in the end. After a record is complete, the program appends the record to the file. .LP The program is terminated by typing ^C (hold down control while pressing c). .SH "SEE ALSO" .BR reldb(1). .SH BUGS .I Dbe is missing. \Rogue\Monster\ else echo "will not over write ./doc/dbe.1" fi if `test ! -s ./doc/jointable.1` then echo "writting ./doc/jointable.1" cat > ./doc/jointable.1 << '\Rogue\Monster\' .\" Man page for jointable(1) .TH JOINTABLE 1 "1. April 1989" .SH NAME jointable - relational join on two tables .SH SYNOPSIS .B jointable .I file1\ file2 [ .I >\ result ] .SH DESCRIPTION .LP .B Jointable takes two input files and joins them ("side by side"), using the first column of each as a key. .LP A line in one file which does not have a matching line in the other file is not output. Thus .I jointable can be used with .I sorttable to select lines from either or both files. This will tend to be faster than .I select on large file, but requires both files to be sorted alphabetically. .LP The current implementation of .I jointable is simply a shell script which calls .I join(1). Any bugs in .I join will therefore also be reflected in .I jointable. .SH "SEE ALSO" .BR reldb(1),\ sorttable(1),\ select(1),\ plokk(1). .SH BUGS .LP .I Jointable requires both files first to be sorted alphabetically, .B not\ numerically using .I sorttable. .LP Running .I sorttable and .I jointable to select lines from files may get somewhat slow. See plokk(1) for alternate ways to select lines based on values in a key field. See .I select for a much more flexible method of selecting lines. .LP Bugs have been reported when using .I jointable with larger files on a Sun 3/50 running SunOs 4.0. 
These bugs do not occur on Sun 4's or on earlier versions of SunOs and are likely due to bugs in .I join(1) on the Sun 3/50. \Rogue\Monster\ else echo "will not over write ./doc/jointable.1" fi if `test ! -s ./doc/matrix.1` then echo "writting ./doc/matrix.1" cat > ./doc/matrix.1 << '\Rogue\Monster\' .\" man-page for matrix(1). .TH MATRIX 1 "1. April 1989" .SH NAME matrix - transform (z,x,y)-style data to matrix format. .SH SYNOPSIS .B matrix .I -f format [ .I < data ] .SH DESCRIPTION .B matrix reads a three-column table (as written by count(1)) from the standard input and transforms it to matrix format by summing the first column into a matrix indexed by the second and third column. .LP The format option should be used to obtain pretty output. It refers to the format of all columns after the first one. .LP The z-columns (typically frequencies) can be any real numbers. The coordinate columns can be arbitrary positive integers, although .I matrix does put limits on their size. .LP This is best explained by way of an example. Suppose the file .I data contains the lines: .nf count x y ----- - - 1 1 2 1 1 3 2 3 4 2 4 5 2 4 6 .fi where exactly one tab-character separates the columns. .LP The command .I matrix \fI"\\t%.0f" < data\fP yields .nf y x1 x2 x3 x4 - -- -- -- -- 2 1 0 0 0 3 1 0 0 0 4 0 0 2 0 5 0 0 0 2 6 0 0 0 2 .fi on the standard output. .LP The input does not need to be sorted in any way. .SH "SEE ALSO" .BR reldb(1),\ count(1). .SH BUGS .LP .B Matrix has hardwired dimensions. This should definitely be changed. .LP Other options may be available, e.g. to control whether lines consisting of all zeros are output. Check the code. \Rogue\Monster\ else echo "will not over write ./doc/matrix.1" fi if `test ! -s ./doc/plokk.1` then echo "writting ./doc/plokk.1" cat > ./doc/plokk.1 << '\Rogue\Monster\' .\" Man page for plokk(1) .TH PLOKK 1 "1. April 1989" .SH NAME plokk - 'pluck' lines from a file. 
.SH SYNOPSIS .B plokk .I selfile < datafile [ .I >\ result ] .LP .B bplokk .I selfile < datafile [ .I >\ result ] .SH DESCRIPTION .LP .B Plokk uses a .I selection file to decide which lines to select from a .I datafile. It is very fast and is often used in place of .I select(1) or a .I sorttable(1)/jointable(1) combination. .LP .I Plokk is useful when selecting according to values in a fixed column of the datafile. The datacolumn of interest must be the first column of the data (see .I project(1)). The selection must be based on numerical values (see also BUGS, below). .LP The first column of .I selfile is assumed to contain numbers corresponding to values to be selected from the datafile. .LP .I Plokk does not require any sorting of the files. However, there is a limit on the largest number which can be selected. .LP .I Bplokk (big .I plokk) can handle more arbitrary numbers (even text) in the .I selection\ file, but in this case there is a limit on the size of the .I selection file. .I Bplokk requires the .I selection file to have only one column, the column of values to be selected. .SH "SEE ALSO" .BR reldb(1),\ jointable(1),\ sorttable(1),\ select(1),\ fgrep(1). .SH "BUGS AND LIMITATIONS" .LP .I Plokk works by reading the numbers in the .I selection file and inserting '1' into a vector for elements corresponding to values to be selected. The check to see if a line is to be selected therefore only requires a single lookup into a vector. This is extremely fast, even for large datafiles, but the dimension of the vector limits the numerical size of indices to be selected. .LP .I Bplokk overcomes the limitation of .I plokk, by using the .I selection file for building a command for .I egrep, to search for strings at the beginning of lines in the datafile. The shell on the user's system will limit the size of the search string, which implies a limit on the number of items in the .I selection file. 
.LP The name of the command has an Icelandic origin, and is used to describe .I pulling lice out of hair, .I plucking chicken etc, c.f. .I pluck. .LP Note that .I fgrep(1) can be used with the same input as .I bplokk. This may alleviate some of the limitations of the number of search-items in .I bplokk, but .I fgrep will of course search for the search-strings anywhere in the data file, not just in the first column. \Rogue\Monster\ else echo "will not over write ./doc/plokk.1" fi if `test ! -s ./doc/project.1` then echo "writting ./doc/project.1" cat > ./doc/project.1 << '\Rogue\Monster\' .\" Man page for project(1) .TH PROJECT 1 "1. April 1989" .SH NAME project - project columns from table .SH SYNOPSIS .B project .I columns [ .I < data ] .SH DESCRIPTION .B project reads columnar data from the standard input and outputs the named columns. .LP This is best explained by way of an example. Suppose the file .I data contains the lines: .nf xcol y z v u w i ---- - - - - - - 0 1 2 3 4 * 6 1 2 3 4 5 ** 7 2 3 4 5 6 *** 8 3 4 5 6 7 v 9 4 5 6 7 8 vi 10 .fi where exactly one tab-character separates the columns. .LP The command .I project y v xcol < data yields .nf y v xcol - - ---- 1 3 0 2 4 1 3 5 2 4 6 3 5 7 4 .fi on the standard output. .LP .B Project is typically used with other reldb commands such as select, in which case the output from one command is piped into another. .SH "SEE ALSO" .BR reldb(1) .SH BUGS .LP .B Project uses a fixed size matrix for storing column names. This will overflow when too many names are given on the command line. No warning is given. \Rogue\Monster\ else echo "will not over write ./doc/project.1" fi if `test ! -s ./doc/reldb.1` then echo "writting ./doc/reldb.1" cat > ./doc/reldb.1 << '\Rogue\Monster\' .\" Man page for reldb(1). gunnar@hafro.is 1. april, 1989. .TH RELDB 1 "1. 
April 1989" .SH NAME reldb - programs for simple relational database operations .SH DESCRIPTION All .I reldb programs read columnar data from the standard input and write to the standard output. .LP All database operations are based on the Unix notion of pipes and filters. Thus, programs tend to be small, doing one task. The output from any one program can be piped into another. .LP A data file is a simple ascii file, with columns separated by a single tab. Thus, two tabs in a row denote an empty field (e.g. a missing value). Data fields can contain anything, character or numbers, although some commands will strip the 8th bit since the Unix tools called upon will do so. Naturally, there is no built-in notion of maximum field-, record- or file-length. .LP The primary problem with flat Unix files is the loss of the data dictionary concept. This is partially solved by using the first two lines in any file as a header, where the first line contains the column names (separated by tabs) and the second line contains a series of dashes. This allows commands to operate on columns by their name, regardless of their position in the input file. .LP The only restriction on the format of the files is that a file must contain the same fixed number of tabs in every line. .LP This is best explained by way of an example: .nf xcol y z v u w i ---- - - - - - - 0 1 2 3 4 * 6 1 2 3 4 5 ** 7 2 3 4 5 6 *** 8 3 4 5 6 7 v 9 4 5 6 7 8 vi 10 .fi .LP When the fields are long, the columns will not look "straight" as in the above example, but that is irrelevant. The only important element is that a tab always separates the columns, and there are an equal number of tabs in every line. .LP .I Reldb commands can read this file from the standard input. Thus, one can .I select lines from the file (c.f. .I grep(1)), .I project columns (c.f. .I cut(1)), use .I jointable to join it (sideways, like .I join(1) ) with another file or use .I union to concatenate two such files (like .I cat (1) ). 
Other operations include .I rename to rename columns, .I compute to compute new values into columns, .I addcol to add columns into a table, .I sorttable to sort a table, etc. .LP Data from a .I reldb file can be trivially piped to "ordinary" Unix programs, via the .I tail\ +3 < data command. In particular, this allows a very easy interface with the excellent collection of statistical routines by Gary Perlman, .I |Stat, (previously .I Unix|Stat), also available as freeware. .LP The best feature of the .I reldb approach is the possibility of combining programs using Unix pipes. Thus one can first .I select lines where .I 'quantity' is at least one, and .I project columns .I 'price' and .I 'quantity' from the output: .nf .ce .I select 'quantity >= 1' < data | project price quantity .fi Similarly, one can .I compute new values into columns, .I select lines, .I project columns and pipe the result through .I scat to plot the results. .SH "SEE ALSO" .BR project(1),\ select(1),\ addcol(1),\ rename(1), .BR sorttable(1),\ compute(1),\ plokk(1),\ subtotal(1), .BR matrix(1),\ number(1),\ check(1),\ dbe(1), .BR testdb(1),\ testgr(1), .BR |Stat(1). .SH BUGS .LP Probably very many. See each command. .LP The column names should be restricted to ascii characters and underscores. On some systems international characters (i.e. ISO 8859/1, "upper half") will work, but a number of Unix implementations have awk's which strip the 8th bit. Furthermore, symbols such as a hyphen and an asterisk will be misinterpreted in .I compute and .I select, which call upon .I awk(1) to do their work, using the column names as variable names. Thus the column names are bound to the same rules as variable names in .I awk. .LP Most of the commands have their .I man- page, but there will always be some that are missing, since programs will always be written before their documentation, contrary to what tends to be called good practice. 
.SH "ACKNOWLEDGEMENTS AND HISTORY" .LP These programs would never have been written had VenturCom Inc. not refused to port their Prelude package to our (then) central computer. This taught us the lesson that one is never independent of hardware vendors unless one is independent of software vendors. The lack of Prelude on the new computer made us think about just how hard it would be to rewrite the bare essentials of a rdbs (here called .I project,\ select,\ union and .I jointable). We found this a trivial task and in time a large number of other programs followed. All of them were written as they were needed and most of them are based on corresponding Unix tools, making the "writing" incredibly easy in each case. Having gone through this, we now very much question the wisdom of everyone writing in proprietary 4GL languages and using expensive dbms for simple applications. .LP .I Prelude is a much more complete system than .I reldb, but the principal elements look much the same as .I reldb was modeled to resemble a very early version of .I Prelude. A user wanting a complete and bug-free rdbs might want to contact .I VenturCom, but of course .I reldb is free and the source code is available, to be readily ported to any system (no, we do not have any ties with .I VenturCom, other than being a fairly happy customer). .LP This approach, although very cheap in manpower, has its problems in terms of integration, performance and elegance of individual programs. Overall, the system has been found to be extremely portable, as it is based mainly on shell scripts, which call corresponding Unix tools, after dealing with the header lines. Efficiency, although nowhere nearly the best possible, has rarely been a bottleneck. .LP Bugs (with suggested corrections, please) should be reported to gunnar@hafro.is (Internet) or uunet!mcvax!hafro!gunnar (Uucp). \Rogue\Monster\ else echo "will not over write ./doc/reldb.1" fi if `test ! 
-s ./doc/rename.1` then echo "writting ./doc/rename.1" cat > ./doc/rename.1 << '\Rogue\Monster\' .\" Man page for rename(1) .TH RENAME 1 "1. April 1989" .SH NAME rename - rename columns of a table .SH SYNOPSIS .B rename .I oldcolumn1 newcolumn1 [ .I old2 new2 ... ] [ .I < data ] .SH DESCRIPTION Self-explanatory. .SH "SEE ALSO" .BR reldb(1),\ addcol(1),\ project(1) .SH BUGS None known. \Rogue\Monster\ else echo "will not over write ./doc/rename.1" fi if `test ! -s ./doc/scat.1` then echo "writting ./doc/scat.1" cat > ./doc/scat.1 << '\Rogue\Monster\' .\" Man page for scat(1l) .TH SCAT 1L "1. April 1989" .SH NAME scat \- line drawings with points or labels, based on data .SH SYNOPSIS .B scat [ .B \-x .I m,n,l ] [ .B \-y .I m,n,l ] [ .B \-w m,n,l,o ] [ .B \-X text ] [ .B \-Y text ] [ .B \-H text ] [ .B \-F "x format" ] [ .B \-F "y format" ] [ .B \-c ] [ .B \-h ] [ .B \-e ] [ .B \-E ] [ .B \-l name ] [ .B \-L name ] [ .B \-d number ] .SH DESCRIPTION .B scat (scattergram) reads data from the standard input to generate .BR plot(5) commands on the standard output. These can be read by the .BR plot(1) filters to produce a graph on the desired device. .LP .B Scat is designed so that running it with no options at all will result in a useful, though possibly not-so-pretty graph, and the options deal mostly with making the graph look nicer. .LP Data must be in columns, with a single tab character between columns. A missing value is thus denoted by two tabs. The first column is taken to be the X-axis and is therefore assumed to be numeric. .LP The first line of the data must contain column names, separated by tabs. The second line consists of dashes, to separate the first line from the data. 
.LP This is best explained by way of an example: .nf xcol y z v u w i - - - - - - - 0 1 2 3 4 * 6 1 2 3 4 5 ** 7 2 3 4 5 6 *** 8 3 4 5 6 7 v 9 4 5 6 7 8 vi 10 .fi .LP When no options are given, .B scat assumes that the user wants a single figure, where each column after the first is plotted against the first. .LP In the simplest case, e.g. with only the xcol and y columns above, and no options, .B scat will draw lines through the points (xcol,y), using the data to compute bounds for the graph, no figurecaption will be given, and the column names, "xcol" and "y", will be used as axes labels. .LP In the above example, the .B w column is nonnumeric, so an option must be given to indicate it as a label column. The labels will be placed at the points (xcol,u). All the other columns will be plotted against .B xcol. .LP When a y-value is missing, no line will be drawn through that point (i.e. two line segments will be skipped). A missing value for a label is simply ignored. .SH OPTIONS .LP Options are quite numerous and follow the strict Sys V format. .TP .BI \-x\ m,n,l Numerical bounds on the horizontal axis (minimum, maximum and interval width). If missing, bounds based on the data will be used. .TP .BI \-y\ m,n,l Corresponding bounds for the y axis. .TP .BI \-w\ m,n,l,o Window bounds (minx, miny, maxx,maxy). These define the region of the plot to be used for the figure. For example, to plot in the upper right quadrant use: .I \-w\ 0.5,0.5,1.0,1.0 . If missing, (0,0) to (1,1) is used. (0,0) is the lower left hand corner. .TP .BI \-X\ text Text for X axis. If missing, the name of the first column is used. .I text can be an arbitrary string (within quotes, if needed). .TP .BI \-Y\ text Text for Y axis. If missing, the name of the second column is used. .TP .BI \-H\ text Figurecaption. Text to be placed above the figure. If missing, no figurecaption is given. .TP .BI \-F\ xformat Format for numbers on x axis. 
For example,
.B -F "x %.2f"
will result in the numbers on the x axis having two digits after the
decimal point. If missing, a fixed default format is used.
.TP
.BI \-F\ yformat
Format for numbers on y axis.
.TP
.BI \-c
Use colors. If not used, dashed and dotted lines will be used to distinguish
between different columns in the input.
.TP
.BI \-E
Use extensions to the plot(5) command set. This is needed in order to
accommodate rotation of text (y-axis) and centering of labels (header,
x-axis and points), none of which is supported in the base command set.
.TP
.BI \-e
Don't erase before plotting. This is useful when several pictures
are to be plotted on the same piece of paper (or screenful or...).
In this case, the
.I \-e
option is used for all figures after the first one.
.TP
.BI \-l\ name
Name of y column not to draw lines for. This will result in small
points being drawn for each (x,y) pair, unless a label column
is specified for this y-column. The default is always to draw lines
between points.
.TP
.BI \-L\ name
Name of label-column. When used, column
.I name
must contain labels to be used. These labels will be placed at points
(x,y), where x is taken from the first column and y is the column
preceding the column
.I name.
.TP
.BI \-d\ number
Debug level. Higher levels give more output.
.SH "SEE ALSO"
.BR plot (1),
.BR plot(5)
.SH BUGS
.LP
.B Scat
stores all points internally. Strictly speaking this is not needed when the
data bounds are given, but avoiding it would require quite a bit of
reprogramming.
.LP
The method of setting up the default axes should be replaced by one
which gives reasonable numbers by default. The current method uses the
absolute maximum and minimum from the data. This tends to give
ridiculous numbers, although they do of course cover the whole figure.
.LP
As
.I scat
is tied to the
.I plot(5)
command set, it cannot do miracles. Simple extensions to this command
set do wonders, however.
These have been implemented in the filters in this
distribution and will be used when the
.I \-E
option is used in
.I scat.
\Rogue\Monster\
else
echo "will not overwrite ./doc/scat.1"
fi
if `test ! -s ./doc/select.1`
then
echo "writing ./doc/select.1"
cat > ./doc/select.1 << '\Rogue\Monster\'
.\" Man page for select(1)
.TH SELECT 1 "1. April 1989"
.SH NAME
select - select lines from a table.
.SH SYNOPSIS
.B select
.I criterion
[
.I < data
]
.SH DESCRIPTION
.B select
reads columnar data from the standard input and outputs those lines
satisfying the given criterion. Arbitrarily complex expressions (cf.
.I awk(1)
) can be used, based on the column names.
.LP
This is best explained by way of an example. Suppose the file
.I data
contains the lines:
.nf
x	y	z	v	u	w	i
-	-	-	-	-	-	-
0	1	2	3	4	*	6
1	2	3	4	5	**	7
2	3	4	5	6	***	8
3	4	5	6	7	v	9
4	5	6	7	8	vi	10
.fi
where exactly one tab-character separates the columns.
.LP
The command
.I select 'z == 2' < data
(where == is the usual Unix notation for 'equal') yields
.nf
x	y	z	v	u	w	i
-	-	-	-	-	-	-
0	1	2	3	4	*	6
.fi
on the standard output.
.LP
Other numerical selection commands include
.B < > >= <=
and these can be used together, with
.B &&
(and),
.B ||
(or), using parentheses as needed.
.LP
The command
.I select 'z > 2 &&
.I v <= 5' < data
will give
.nf
x	y	z	v	u	w	i
-	-	-	-	-	-	-
1	2	3	4	5	**	7
2	3	4	5	6	***	8
.fi
This last command should be read: select those lines where the
z column is (strictly) larger than 2 and (at the same time) the
v column is less than or equal to 5.
.SH "SEE ALSO"
.BR reldb(1),\ plokk(1),\ awk(1),\ grep(1).
.SH BUGS
.LP
.B Select
simply calls upon
.I awk
to do the work. Any bugs in
.I awk
will be reflected in
.I select.
.LP
Since the column names are translated into
.I awk
variable names, column names are bound by the same rules as variable names in
.I awk.
To be on the safe side, one should only use alphabetic characters and
underscores. International characters (e.g.
from ISO 8859/1, upper half) will probably not work.
Other symbols, such as a hyphen (minus) and asterisk, will not work.
.LP
Some names are reserved
.I awk
symbols and should not be used for column names (e.g. if, for, length).
\Rogue\Monster\
else
echo "will not overwrite ./doc/select.1"
fi
if `test ! -s ./doc/sorttable.1`
then
echo "writing ./doc/sorttable.1"
cat > ./doc/sorttable.1 << '\Rogue\Monster\'
.\" Man page for sorttable(1)
.TH SORTTABLE 1 "1. April 1989"
.SH NAME
sorttable - sort a table
.SH SYNOPSIS
.B sorttable
[
.I -n
]
.I < file
[
.I >\ result
]
.SH DESCRIPTION
.LP
.B Sorttable
takes an input file and sorts it, using either alphabetic sort
(default) or numeric sort (-n option).
.LP
The current implementation of
.I sorttable
is simply a shell script which calls
.I sort(1).
.LP
Note that the
.I -n
option is only useful for generating pretty output. It should
.I not
be used with
.I jointable.
.SH "SEE ALSO"
.BR reldb(1).
.SH BUGS
.LP
.I Sorttable
only reads the standard input, so '<' must be used.
.LP
When using numeric sort (-n), only the first column is sorted numerically.
For fixed values of the first column, the rest of the line is sorted
alphabetically.
\Rogue\Monster\
else
echo "will not overwrite ./doc/sorttable.1"
fi
if `test ! -s ./doc/subtotal.1`
then
echo "writing ./doc/subtotal.1"
cat > ./doc/subtotal.1 << '\Rogue\Monster\'
.\" man-page for subtotal(1).
.TH SUBTOTAL 1 "1. April 1989"
.SH NAME
subtotal - subtotal values of some columns for fixed values of other columns.
.SH SYNOPSIS
.B subtotal
.I by bycolumnlist
.I on oncolumnlist
[
.I < data
]
.SH DESCRIPTION
.B subtotal
reads columnar data from the standard input and subtotals the
on-columns for fixed values of the by-columns.
.LP
This is best explained by way of an example. Suppose the file
.I data
contains the lines:
.nf
x	y	z	v
-	-	-	-
1	1	2	3
1	1	3	4
2	3	4	5
2	4	5	6
2	4	6	7
.fi
where exactly one tab-character separates the columns.
.LP
The command
.I subtotal by x y on z v < data
yields
.nf
x	y	z	v
-	-	-	-
1	1	5	7
2	3	4	5
2	4	11	13
.fi
on the standard output.
.LP
Note that the input must be sorted on the by-columns in some fashion,
so that all instances of equivalent lines are adjacent.
.LP
On-columns and by-columns can be intermixed in the input. Other columns
in the input are ignored in the computations and are not output.
.SH "SEE ALSO"
.BR reldb(1),\ sorttable(1).
.SH BUGS
.LP
.B Subtotal
calls upon two other programs,
.I project
and
.I addcol
to do the work. This is inefficient but the inefficiency is rarely noticeable.
\Rogue\Monster\
else
echo "will not overwrite ./doc/subtotal.1"
fi
if `test ! -s ./doc/union.1`
then
echo "writing ./doc/union.1"
cat > ./doc/union.1 << '\Rogue\Monster\'
.\" Man page for union(1)
.TH UNION 1 "1. April 1989"
.SH NAME
union - appends one table to another
.SH SYNOPSIS
.B union
.I file1\ file2
[
.I >\ outputfile
]
.SH DESCRIPTION
.LP
The
.I union
command simply concatenates two files, appending the second to the first,
but using only the header of the first file.
.LP
With the two files,
.I data1:
.nf
first	second	third
-----	------	-----
1	3	0
1	2	3
.fi
and
.I data2:
.nf
first	second	third
-----	------	-----
2	3	4
3	4	5
4	5	6
.fi
the command
.BI union\ data1\ data2
yields:
.nf
first	second	third
-----	------	-----
1	3	0
1	2	3
2	3	4
3	4	5
4	5	6
.fi
on the standard output.
.LP
Note that the two input files must have the same columns, in the same order.
.SH "SEE ALSO"
.BR reldb(1),\ cat(1).
\Rogue\Monster\
else
echo "will not overwrite ./doc/union.1"
fi
if `test !
-s ./doc/Makefile`
then
echo "writing ./doc/Makefile"
cat > ./doc/Makefile << '\Rogue\Monster\'
MANDIR=/usr/man/man1
MANPAGES=addcol.1 check.1 compute.1 count.1 dbe.1 invert.1 \
	jointable.1 \
	math.1 matrix.1 \
	number.1 \
	plokk.1 project.1 \
	recode.1 \
	reldb.1 rename.1 scat.1 select.1 \
	testdb.1 testgr.1 \
	sorttable.1 subtotal.1 union.1
install: $(MANPAGES)
	cp $(MANPAGES) $(MANDIR)
uninstall:
	cd $(MANDIR);rm -f $(MANPAGES)
\Rogue\Monster\
else
echo "will not overwrite ./doc/Makefile"
fi
if `test ! -s ./doc/invert.1`
then
echo "writing ./doc/invert.1"
cat > ./doc/invert.1 << '\Rogue\Monster\'
.\" man-page for invert(1).
.TH INVERT 1 "1. April 1989"
.SH NAME
invert - matrix data to (z,x,y)-style data.
.SH SYNOPSIS
.B invert
[
.I < data
]
.SH DESCRIPTION
.B invert
reads a matrix-style table and outputs a three-column table.
.LP
The contents of the table can be any real numbers.
.LP
This is best explained by way of an example. Suppose the file
.I data
contains the lines:
.nf
y	1	2	3	4
-	-	-	-	-
2	1	0	0	0
3	1	0	0	0
4	0	0	2	0
5	0	0	0	2
6	0	0	0	2
.fi
where exactly one tab-character separates the columns.
.LP
The command
.I invert < data
yields
.nf
col	x	y
---	-	-
1	1	2
1	1	3
2	3	4
2	4	5
2	4	6
.fi
on the standard output.
.SH "SEE ALSO"
.BR reldb(1),\ matrix(1).
.SH BUGS
.LP
.I Invert
uses awk to do its work. This makes for easy programming, but
at a cost in efficiency. Usually, tables are small and this is not a problem.
\Rogue\Monster\
else
echo "will not overwrite ./doc/invert.1"
fi
if `test ! -s ./doc/number.1`
then
echo "writing ./doc/number.1"
cat > ./doc/number.1 << '\Rogue\Monster\'
.\" man-page for number(1).
.TH NUMBER 1 "1. April 1989"
.SH NAME
number - add a column with sequential line numbers.
.SH SYNOPSIS
.B number
[
.I < data
]
.SH DESCRIPTION
.B Number
adds a column to a
.I reldb
table and inserts the line number into the column. This is essentially
the same as
.I nl(1),
but the input and output are
.I reldb
tables.
.SH "SEE ALSO"
.BR reldb(1),\ nl(1).
.SH BUGS
.LP
No bugs are known.
\Rogue\Monster\
else
echo "will not overwrite ./doc/number.1"
fi
if `test ! -s ./doc/check.1`
then
echo "writing ./doc/check.1"
cat > ./doc/check.1 << '\Rogue\Monster\'
.\" man-page for check(1).
.TH CHECK 1 "19. April 1989"
.SH NAME
check - check sanity of reldb table(s).
.SH SYNOPSIS
.B check
.I data-files
.SH DESCRIPTION
.B check
reads the named
.I reldb
tables and checks them for sanity, i.e. for the number of tab characters
per line.
.SH "SEE ALSO"
.BR reldb(1).
.SH BUGS
.LP
No bugs are known.
\Rogue\Monster\
else
echo "will not overwrite ./doc/check.1"
fi
if `test ! -s ./doc/recode.1`
then
echo "writing ./doc/recode.1"
cat > ./doc/recode.1 << '\Rogue\Monster\'
.\" Man page for recode(1)
.TH RECODE 1 "1. April 1989"
.SH NAME
recode - recode a column in a table
.SH SYNOPSIS
.B recode
.I codefile
[
.I <\ datafile
] [
.I >\ result
]
.SH DESCRIPTION
.LP
.B Recode
reads the recoding scheme from
.I codefile
and applies it to the first column in the
.I datafile.
.LP
The
.I codefile
contains two columns,
.I oldcode
and
.I newcode
(in that order).
.LP
The
.I datafile
is assumed to contain the old code as the first column.
.LP
The output consists of a new column, of the same name as the
.I newcode
column in
.I codefile,
and then the columns of the
.I datafile.
.SH EXAMPLE
.LP
Suppose the file
.I recode.data
contains the lines:
.nf
ex1	w	y	z	v	u	extra
---	-	-	-	-	-	-----
25	0	9	2	3	4	-3
4	0	2	2	3	4	-2
5	0	3	4	5	6	-3
5	2	3	4	6	7	-4
9	2	5	6	7	10	-3
9	2	5	6	120	8	-116
9	23	5	6	7	8	16
25	0	9	2	3	4	-3
25	2	9	2	5	6	-3
.fi
And the code file
.I recode.codes
contains:
.nf
ex1	new
---	---
1	20
2	21
3	22
4	23
5	24
9	25
25	0
.fi
The following is the output from the command
.I recode\ recode.codes\ <\ recode.data
.nf
new	ex1	w	y	z	v	u	extra
---	---	-	-	-	-	-	-----
0	25	0	9	2	3	4	-3
23	4	0	2	2	3	4	-2
24	5	0	3	4	5	6	-3
24	5	2	3	4	6	7	-4
25	9	2	5	6	7	10	-3
25	9	2	5	6	120	8	-116
25	9	23	5	6	7	8	16
0	25	0	9	2	3	4	-3
0	25	2	9	2	5	6	-3
.fi
.SH BUGS
.LP
There is a built-in limit on the magnitude of the old codes, as these
are read into a vector of hard-coded size.
\Rogue\Monster\
else
echo "will not overwrite ./doc/recode.1"
fi
echo "Finished archive 3 of 3"
exit