-
- Tara Datafile Utilities, Version 2.0
-
- USER'S MANUAL
- by
- David C. Oshel
-
-
-
-
- Copyright (c) 1987 by David C. Oshel
-
- ALL RIGHTS RESERVED
-
- Private individuals are granted free license to copy and
- distribute this complete set of Tara Datafile Utilities, provided
- my copyright notice is not removed, and provided you distribute
- both programs and documentation without charge.
-
- Corporate or governmental use requires a license fee of $25.00.
- Send check or money order payable to:
-
- MicroConsulting Services
- 1219 Harding Avenue
- Ames, IA 50010
-
-
-
- I. Introduction
-
- These files are included in Tara Datafile Utilities, version 2.0:
-
- File Purpose
- -------------------------------------------------------------------------
- BROWSE.BAT demonstration, uses STARS.DAT
- CRYPTIC.DOC (doc)
- CRYPTIC.EXE sophisticated file encipher, decipher
- ENTER.DOC (doc)
- ENTER.EXE data entry for MailMerge-type datafiles
- FIELD.DOC (doc)
- FIELD.EXE selected fields, with format for browse
- FNKEY.DOC (doc)
- FNKEY.EXE assign macro strings to PC function keys
- NAMES.FLD sample field definition file for NAMES.DAT
- NSORT.DOC (doc)
- NSORT.EXE numeric sort on any field or fields
- PICK.DOC (doc)
- PICK.EXE select records on any field or fields
- SETNAMES.BAT define Fn keys for use with NAMES.DAT
- SETSTARS.BAT define Fn keys for use with STARS.DAT
- STARS.DAT demonstration data file, list of bright stars
- STARS.FLD field definition file for STARS.DAT
- TARA.DOC you're reading it
- TSORT.DOC (doc)
- TSORT.EXE text sort on any field or fields
-
-
- The Tara Datafile Utilities are a set of small, powerful tools
- which were originally intended just to make life with MailMerge a
- little easier. Until now, information retrieval from this kind
- of data file meant you had to use WordStar's Merge Print facility
- to restrict, select out, and display your data. If you wanted to
- sort (or resort) your data before using it, you were out of luck.
-
- But with these tools, you now have direct access to your own data
- and you no longer need to fire up WordStar and a printer just to
- see what's what.
-
- Most of these programs are simple, one-task tools. They are used
- in combination with each other and with the MS-DOS command line
- i/o redirection facility. In general, Tara Datafile Utilities
- allow you to:
-
- a) restrict the mass of data you have to the few records and
- fields you actually need to look at, based on the contents of any
- field or combination of fields;
-
- b) sort this smaller subset of information in either ascending
- or descending order, on either text or numeric data, on any field
- or combination of fields;
-
- c) project the results to another file for use by WordStar or
- some other program, or to the screen, where you may view selected
- and/or formatted fields at leisure.
-
- Tara Datafile Utilities also include:
-
- d) a quick data entry module, using a field definition file which
- you create;
-
- e) a sophisticated file encryption program;
-
- f) a program which assigns macro strings to PC function keys.
-
-
- ###
-
-
- II. MS-DOS i/o redirection and pipes
-
- You should have a firm grasp of what "i/o redirection" is all
- about in order to use the Tara Datafile Utilities. (There is
- also a brief discussion of this issue in your MS-DOS manual.)
-
- In general, "i/o" means INPUT TO and OUTPUT FROM any particular
- program. Most commonly, input comes from your computer keyboard
- and output goes to your computer screen.
-
- Less commonly, input may come from a text file, and the stream of
- characters from that file is treated exactly as though someone
- were rapidly typing on the computer's keyboard.
-
- Similarly, output may be sent to some other destination than the
- computer screen -- either to a line printer or to a text file.
-
- Less commonly still, the OUTPUT from one program can be the INPUT
- to another program! This is called a "pipe".
-
- So, the three things to be aware of are 1) redirected input, 2)
- redirected output, and 3) pipes. These three demons are invoked
- on the MS-DOS command line, and nowhere else.
-
- The DEFAULT input is "<CON:" and the DEFAULT output is ">CON:".
- You do not need to specify the DEFAULT input or output on any
- command line. However...
-
- Input from a FILE has the form "<file1.dat" and output to a file
- has the form ">file2.dat". Output to the line printer would
- typically be written as ">PRN:", but you may also see ">LPT1:" or
- ">LPT2:" for parallel printers, or ">COM1:" in the case of serial
- printers or modems.
-
- A PIPE is specified with the vertical bar character, "|",
- surrounded by spaces fore and aft, " | ", between two COMMANDS:
-
- C>dir | sort | more
-
- This example comes entirely from MS-DOS; i.e., "dir", "sort" and
- "more" are all MS-DOS commands, and they are discussed in your
- MS-DOS manual.
-
- Note that MS-DOS sort typically REQUIRES i/o redirection:
-
- C>sort <inputfile >outputfile
-
-
- Note also that the SOURCE file and the DESTINATION file must NOT
- be the SAME file! I.e., this is a serious error which typically
- destroys your source document:
-
- *** DANGER *** C>sort <abc.dat >abc.dat
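
Why the same-file case is destructive can be sketched in a few
lines of Python (Python is not part of the Tara package, and the
function names here are invented for illustration). The sketch
assumes the usual behavior, in which the output file is created,
and therefore emptied, before the program reads any input:

```python
import os
import tempfile

def sort_onto_itself(path):
    """Mimic 'sort <path >path': the output side is opened first."""
    out = open(path, "w")          # creating the output empties the file
    with open(path, "r") as src:   # the input side now sees nothing
        lines = src.readlines()
    out.writelines(sorted(lines))
    out.close()
    return len(lines)

def sort_via_temp(path):
    """Safe variant: write to a second file, then replace the original."""
    tmp = path + ".tmp"
    with open(path, "r") as src:
        lines = src.readlines()
    with open(tmp, "w") as out:
        out.writelines(sorted(lines))
    os.replace(tmp, path)
    return len(lines)

def demo():
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "abc.dat")
    with open(path, "w") as f:
        f.write("pear\napple\n")
    safe = sort_via_temp(path)       # both records survive, now sorted
    with open(path) as f:
        after_safe = f.read()
    lost = sort_onto_itself(path)    # nothing is read: the data is gone
    size = os.path.getsize(path)
    return safe, after_safe, lost, size
```

The safe habit is the second one: send the output to a different
file name, and replace the original only after the command has
finished.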
-
-
- ###
-
- III. Features common to all Tara Datafile Utilities
-
- A. Record type is MailMerge-compatible
-
- All Tara Datafile Utilities assume that the data they work with
- is compatible with MicroPro, Inc.'s WordStar program -- in
- particular with Merge Print (a.k.a. MailMerge in older versions).
-
- That means, in general, that records consist of a single line of
- characters terminated with a carriage return/line feed pair
- ("newline"), and that fields within each record are delimited by
- commas, except for the last field. If a field contains a comma,
- the field is enclosed in double quotation marks. If a field
- contains both a comma and quotes, the field is enclosed in
- apostrophes. If neither of these quotation schemes is adequate
- to mark off the fields in the record, the delimiting character
- can be changed from comma to something else.
-
- For example:
-
- "Oshel, Ph.D.",David,C.,1219 Harding,Ames,IA,50010,,,yes
-
- There are THREE fields following the zip code comma in this
- example. The last field is terminated by the same newline pair
- that terminates the record. The first field contains a comma, so
- that field is quoted.
-
- Tara Datafile Utilities refer to the fields in a record by FIELD
- NUMBER. The nth field in each record is to the immediate left of
- the nth comma (or chosen delimiting character). In the example,
- the word "Harding" lies in the fourth field, there is no home
- phone number in the eighth field, and the word "yes" occupies the
- last, tenth field. The ENTER program provides field numbers for
- ready reference -- you don't HAVE to count your commas! You do
- have to refer to fields by the numbers.
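
The record layout and field numbering described above can be
sketched in Python. This is only an illustration of the format as
this manual describes it, not the code the utilities actually use;
the names split_record and field are invented here, and the sketch
assumes a quote mark only opens a quoted field when it is the
first character of that field:

```python
def split_record(line, delimiter=","):
    """Split one MailMerge-style record into a list of fields."""
    fields, current, quote = [], [], None
    at_field_start = True
    for ch in line.rstrip("\r\n"):
        if quote:                        # inside a quoted field
            if ch == quote:
                quote = None             # closing quote mark
            else:
                current.append(ch)       # delimiters are plain text here
        elif at_field_start and ch in "\"'":
            quote = ch                   # opening quote or apostrophe
            at_field_start = False
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            at_field_start = True
        else:
            current.append(ch)
            at_field_start = False
    fields.append("".join(current))      # last field ends at the newline
    return fields

def field(record_fields, n):
    """Return field number n, counting from 1 as the manual does."""
    return record_fields[n - 1] if 1 <= n <= len(record_fields) else ""

record = '"Oshel, Ph.D.",David,C.,1219 Harding,Ames,IA,50010,,,yes\r\n'
fields = split_record(record)
```

With the example record, field(fields, 4) is "1219 Harding",
field(fields, 8) is empty, and field(fields, 10) is "yes". A
different delimiter is passed as the second argument, in the same
spirit as the -F switch.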
-
- There is NO DISTINCTION between upper and lower case in any of the
- Tara Datafile Utilities. This applies generally and everywhere.
- However, upper and lower case IN DATA are preserved when data is
- written to a new file or to the screen.
-
- All of these utilities STRIP THE HIGH BIT out of characters when
- writing data to standard output. This does not alter the source
- data, but does produce a file which contains no "negative ASCII"
- characters -- on IBM PC's, these are the letters with diacritical
- marks, box characters, Greek letters, etc. WordStar uses them to
- hide formatting information in DOCUMENT MODE text.
-
- This fixup allows PICK, TSORT and NSORT to work with data that
- was inadvertently edited in WordStar document mode. You can
- detect "funny letters" in your data by using the MS-DOS TYPE
- command to examine your file. Negative ASCII characters would
- prevent the pattern-matching utilities from finding "obvious"
- matches. Tara Datafile Utilities do not alter your source data,
- unless you write it back to the original file through a pipe.
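
Stripping the high bit amounts to clearing bit 7 of every byte.
A minimal Python sketch of the idea follows; the function names
are invented for illustration, and the sample byte is simply one
with the high bit set, as document mode might leave it:

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, leaving plain 7-bit ASCII."""
    return bytes(b & 0x7F for b in data)

def has_high_bit(data: bytes) -> bool:
    """True if any byte is a 'negative ASCII' character (128..255)."""
    return any(b & 0x80 for b in data)

# "Ames" with the high bit set on its last letter, then ",IA"
marked = b"Ame" + bytes([ord("s") | 0x80]) + b",IA"
clean = strip_high_bit(marked)
```

Here has_high_bit(marked) is true, and clean is the plain text
"Ames,IA", which a pattern match on "Ames" can find again.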
-
- Notice that MailMerge-type datafiles are somewhat idiosyncratic.
- Another useful definition of an "ASCII data file" is that fields
- with text data are quoted, fields containing quotes double the
- quotation character (e.g., """" defines a field which contains a
- single " character!), but numeric data is not quoted. Some
- programs that support some version of this other definition are
- BASIC and R:Base 5000. The difference lies in how to handle the
- ambiguity caused by quoting the quote character. WordStar tries
- to duck the issue by adding ' as another quote character.
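
The doubled-quote convention can be tried out with Python's
standard csv module, which happens to follow the same rule. This
is offered only as an illustration of that other definition; it
says nothing about how BASIC or R:Base 5000 are implemented, and
the helper names are invented here:

```python
import csv
import io

def read_record(line):
    """Parse one record in which a doubled quote means a literal quote."""
    reader = csv.reader([line], delimiter=",", quotechar='"',
                        doublequote=True)
    return next(reader)

def write_record(values):
    """Quote text fields, leave numbers bare, double embedded quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerow(values)
    return buffer.getvalue()

fields = read_record('"Oshel, Ph.D.","""",50010')
line = write_record(['say "hi"', 42])
```

The second field of the parsed record is a single " character,
as in the """" example above. Note that a reader of this kind
hands back the unquoted 50010 as text; deciding that it is a
number is a separate step.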
-
- However, there is no hard and fast rule or widely held standard.
- These other formats are chiefly used only to import and export
- data to and from expensive data base management programs, which
- do not themselves use the format internally. "MailMerge format"
- is extremely and immediately useful (to WordStar as well as other
- programs) so that is the format chosen here. Data imported from
- R:Base 5000 or dBase III ("DELIMITED BY ,") will probably be
- acceptable to Tara Datafile Utilities, and probably acceptable to
- WordStar Merge Print, but you may need to do some preliminary
- fixing up. (If you, like me, have either of those pricey
- database programs you're probably using Tara for the same reason
- I do -- Tara is quick and dirty and easy to live with. But I
- wouldn't do a fancy job with just these simple tools. Yet...!)
-
- This kind of record has one distinct advantage -- it is variable
- length -- and one obvious disadvantage -- it takes longer to get
- information out of the file (especially numeric information). In
- general, MailMerge-type data files should not contain more than a
- few hundred records, or processing time will be tedious. But
- you can always PICK a smaller set of data from a larger file.
-
- The largest record Tara Datafile Utilities can handle contains
- 4095 characters (with any number of fields within that limit).
-
- ###
-
- B. Switches
-
- The "-H" or "/H" switch always invokes a help screen for each of
- the utilities.
-
- Examples: C>enter -h
- C>pick /h
- C>fnkey -h
-
- The "-F" or "/F" switch changes the field delimiter character
- from comma to something else, in those utilities that read data.
-
- Examples: C>pick -f\ <abc.dat 1 has aardvark
- C>field <temp /f* /s. 1 12 3 8: | more
- C>nsort "-f|" 16 d
-
- But: C>fnkey -f9 "pick -f\ <abc.dat"
-
- The "-S" or "/S" switch only occurs in FIELD.EXE, and is used to
- change the fill character from BLANK to something else when
- right- or left-justifying formatted output.
-
- The "-E" or "/E", and "-D" or "/D", switches are used in CRYPTIC
- and nowhere else, to indicate mode: encipher or decipher.
-
- All program switches begin with either hyphen or slash. Switches
- alter the usual behavior of the utility in some way. There are
- only one or two switches at most for each utility, in addition to
- the help switch.
-
- Undefined switches usually, but not always, invoke the utility's
- help message. FNKEY will not try to interpret any switches but
- its own, viz., -H and the first instance of -F. Negative number
- arguments are never interpreted as switches.
-
- Note that a switch which contains <, |, or > must be QUOTED.
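
The switch rules above (hyphen or slash, no case distinction,
negative numbers left alone) can be sketched in Python. This is a
guess at the general shape only, with an invented function name;
it does not reproduce the special cases of any one utility, such
as FNKEY looking only at the first -F:

```python
import re

_NUMBER = re.compile(r"^-\d*\.?\d+$")    # e.g. -5, -0.25, -.5

def split_switches(args):
    """Separate switches from ordinary arguments.

    Returns (switches, others). switches maps the upper-case switch
    letter to whatever text follows it; others keeps the remaining
    arguments in their original order.
    """
    switches, others = {}, []
    for arg in args:
        looks_like_switch = len(arg) >= 2 and arg[0] in "-/"
        if looks_like_switch and not _NUMBER.match(arg):
            switches[arg[1].upper()] = arg[2:]
        else:
            others.append(arg)
    return switches, others

# The arguments of:  pick -f\ <abc.dat 1 has aardvark
# (the <abc.dat part is taken by MS-DOS and never reaches the program)
switches, others = split_switches(["-f\\", "1", "has", "aardvark"])
```

In this example switches holds the letter F with a backslash as
its delimiter character, and others holds the three remaining
arguments.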
-
-
- ###
-
- C. Standard Output
-
- All of the utilities support redirected i/o using standard input
- and standard output. This feature is not especially useful with
- ENTER or FNKEY but it's there if you can figure out a use for it.
-
- ("Standard input" and "standard output" refer to the method which
- a program uses to receive or transmit data. The "standard" way
- to do things supports MS-DOS i/o redirection and pipes. The
- various non-standard i/o schemes used by almost all commercial
- programs are much faster ... and also much less flexible.)
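
A program written the "standard" way is often called a filter: it
reads lines from its input, changes them, and writes them to its
output, without caring where either end is connected. A minimal
Python sketch, with invented names, and with in-memory text
standing in for the keyboard, files and pipes:

```python
import io

def run_filter(source, sink, transform):
    """Copy source to sink one line at a time, applying transform."""
    for line in source:
        sink.write(transform(line))

def pipe(text, *transforms):
    """Chain filters the way 'a | b | c' does on the command line."""
    for transform in transforms:
        source, sink = io.StringIO(text), io.StringIO()
        run_filter(source, sink, transform)
        text = sink.getvalue()       # one filter's output feeds the next
    return text

def commas_to_semicolons(line):
    return line.replace(",", ";")

result = pipe("Ames,IA\nBoone,IA\n", str.upper, commas_to_semicolons)
```

A real filter would hand sys.stdin and sys.stdout to run_filter,
and MS-DOS would decide what those are connected to.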
-
- ###
-
- D. ANSI.SYS, only for FNKEY.EXE
-
- None of these utilities need to have ANSI.SYS installed on your
- system -- except for FNKEY, a program that assigns meanings to
- your PC's function keys. See your MS-DOS manual for instructions
- on installing ANSI.SYS. What you need to do is place the command
- DEVICE=ANSI.SYS into the CONFIG.SYS file on your startup disk.
-
-
- ###
-
- E. Numeric and dollar data types
-
- Utilities that recognize numeric data will correctly recognize
- the dollar format, e.g., $123,456.78. A minus sign may appear
- anywhere in its proper scan field. All numeric data is converted
- to floating point for purposes of comparison. Scientific formats
- like "1.3e-2" are supported, but "2e" is not considered a number
- (because there is no digit following the E); the exponentiation
- operator may be either E or D. Scientific and dollar notations
- are mutually incompatible in the same field. The FIELD utility
- relaxes the strict interpretation of what is a number, and allows
- multiple hyphens, parentheses and slash -- this allows data like
- social security numbers and phone numbers to be right-justified
- in formatted output. All non-numeric data is Text.
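
The strict rules for what counts as a number can be approximated
in Python as follows. This is a sketch built from the description
above, not the utilities' own routine; the function name is
invented, and where the minus sign may sit in a dollar amount is
an assumption (before or after the dollar sign):

```python
import re

# plain or scientific: optional sign, digits, optional E/D exponent
_PLAIN = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eEdD][+-]?\d+)?$")
# dollar format: $ sign, digits with commas, optional cents
_DOLLAR = re.compile(r"^(-?)\$(-?)(\d[\d,]*)(\.\d*)?$")

def to_number(text):
    """Return the field as a float, or None if the field is Text."""
    s = text.strip()
    if _PLAIN.match(s):
        # D is accepted as an exponent marker; float() only knows E
        return float(s.replace("d", "e").replace("D", "e"))
    m = _DOLLAR.match(s)
    if m:
        sign = -1.0 if "-" in (m.group(1), m.group(2)) else 1.0
        digits = m.group(3).replace(",", "") + (m.group(4) or "")
        return sign * float(digits)
    return None
```

Under these rules "$123,456.78" and "1.3e-2" are numbers, while
"2e" (no digit after the E) and "$1e3" (dollar and scientific
notation mixed in one field) come back as Text.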
-
-
- ###
-
-
- IV. The CRYPTIC Program
-
- Usage: C>cryptic [ -D | -E ] password <inputfile >outputfile
-
- Examples: C>cryptic absinthe <myfile.dat >coded.dat
- C>cryptic -D absinthe <coded.dat | more
-
- The -D switch selects Decipher Mode. The -E switch selects
- Encipher Mode, and is the default mode. You must supply the same
- password each time you use Cryptic with a particular file.
-
- *** WARNING: Do NOT forget your password! ***
-
- This is a simple "filter" which scrambles the contents of text
- files. Coded files are indecipherable by normal means. To
- decode the scrambled file, run it through CRYPTIC once again
- using the same password as before.
-
- If you forget your own password, you're out of luck. The
- password is not recoverable. Remember it! You should keep a
- backup copy of any data file you use, in your possession and off
- the premises. This anticipates the problem that someone might
- maliciously encrypt your data for you.
-
- This utility provides a first level of data security only. It
- will prevent unauthorized access by average persons, but will not
- withstand expert analysis.
-
- Be advised that your physical disk medium probably retains an
- image of some part of your plain text, even if you have erased
- the file, unless you reformat the disk (after copying your coded
- data to another disk, of course!).
-
- Using a PIPE with plain text in it will also leave a transient
- image on physical disk media! If security is crucial, use your
- floppy drive and be sure to format the working diskette after a
- session. This physical data image, not associated with any file,
- can be viewed (and security compromised!) by any number of garden
- variety programs including Norton Utilities and PC-Tools.
-
- The encryption algorithm used here is more sophisticated than the
- usual "xor" type of scramble. In its day, this cipher could not
- be cracked, but no doubt things have changed a bit since the
- Crimean War.
-
- To illustrate the potential difficulty of cracking this cipher,
- Tables 1 and 2 compare the frequency of occurrence of characters
- found in a plain-text data file, against the frequency found in
- an especially cryptic version of the same data.
-
- The enciphered data file was run through Cryptic four times using
- a different password each time. Ciphered data has "smeared" over
- the range of all printable characters, while the gap between most
- and least frequent characters is far less than in the plain text.
-
- Both the plain text and the ciphered data contain 117 records and
- a total of 8,602 printing characters. Tables follow.
-
-
- Table 1. Frequencies of 79 characters found in a plain text
- data file delimited by commas.
-
- , = 1088, 12.6482% R = 48, 0.5580%
- (space) = 563, 6.5450% k = 46, 0.5348%
- e = 532, 6.1846% v = 46, 0.5348%
- r = 339, 3.9409% P = 41, 0.4766%
- o = 335, 3.8944% H = 40, 0.4650%
- a = 314, 3.6503% f = 40, 0.4650%
- 0 = 302, 3.5108% O = 40, 0.4650%
- n = 275, 3.1969% ) = 37, 0.4301%
- s = 268, 3.1156% ( = 37, 0.4301%
- t = 260, 3.0226% b = 34, 0.3953%
- 2 = 251, 2.9179% * = 32, 0.3720%
- i = 241, 2.8017% L = 30, 0.3488%
- 1 = 207, 2.4064% U = 28, 0.3255%
- A = 204, 2.3715% W = 27, 0.3139%
- l = 173, 2.0112% J = 24, 0.2790%
- 5 = 170, 1.9763% G = 23, 0.2674%
- 3 = 168, 1.9530% N = 23, 0.2674%
- m = 139, 1.6159% E = 22, 0.2558%
- C = 138, 1.6043% & = 20, 0.2325%
- d = 129, 1.4997% T = 19, 0.2209%
- 4 = 122, 1.4183% x = 17, 0.1976%
- - = 119, 1.3834% F = 16, 0.1860%
- I = 118, 1.3718% ? = 16, 0.1860%
- S = 110, 1.2788% / = 14, 0.1628%
- 9 = 104, 1.2090% K = 12, 0.1395%
- 8 = 104, 1.2090% V = 11, 0.1279%
- c = 101, 1.1741% Y = 10, 0.1163%
- u = 100, 1.1625% # = 7, 0.0814%
- h = 94, 1.0928% z = 6, 0.0698%
- 6 = 92, 1.0695% ; = 4, 0.0465%
- 7 = 88, 1.0230% : = 2, 0.0233%
- p = 79, 0.9184% ' = 2, 0.0233%
- M = 69, 0.8021% X = 1, 0.0116%
- . = 66, 0.7673% ! = 1, 0.0116%
- y = 66, 0.7673% Z = 1, 0.0116%
- D = 62, 0.7208% j = 1, 0.0116%
- " = 62, 0.7208% q = 1, 0.0116%
- B = 58, 0.6743% \ = 1, 0.0116%
- g = 58, 0.6743% Q = 1, 0.0116%
- w = 53, 0.6161%
-
- Table 2. Frequencies of 95 characters found in a 4-ply cipher of
- the same data.
-
- 0 = 155, 1.8019% p = 90, 1.0463%
- . = 131, 1.5229% K = 89, 1.0346%
- 6 = 129, 1.4997% m = 88, 1.0230%
- , = 127, 1.4764% w = 88, 1.0230%
- % = 123, 1.4299% D = 87, 1.0114%
- ' = 123, 1.4299% H = 87, 1.0114%
- # = 122, 1.4183% e = 86, 0.9998%
- 2 = 121, 1.4066% g = 86, 0.9998%
- * = 121, 1.4066% 5 = 85, 0.9881%
- + = 119, 1.3834% o = 84, 0.9765%
- v = 118, 1.3718% M = 83, 0.9649%
- x = 117, 1.3601% ^ = 83, 0.9649%
- ( = 116, 1.3485% G = 83, 0.9649%
- | = 116, 1.3485% n = 81, 0.9416%
- 8 = 115, 1.3369% > = 81, 0.9416%
- 3 = 114, 1.3253% ; = 80, 0.9300%
- 4 = 113, 1.3136% { = 79, 0.9184%
- ! = 113, 1.3136% J = 79, 0.9184%
- ~ = 112, 1.3020% Q = 78, 0.9068%
- } = 111, 1.2904% N = 76, 0.8835%
- " = 110, 1.2788% X = 76, 0.8835%
- = = 109, 1.2671% a = 75, 0.8719%
- r = 109, 1.2671% f = 75, 0.8719%
- z = 109, 1.2671% l = 73, 0.8486%
- C = 107, 1.2439% ` = 72, 0.8370%
- ) = 106, 1.2323% F = 71, 0.8254%
- u = 106, 1.2323% Z = 71, 0.8254%
- A = 105, 1.2206% j = 70, 0.8138%
- & = 104, 1.2090% h = 69, 0.8021%
- < = 103, 1.1974% \ = 69, 0.8021%
- ? = 101, 1.1741% k = 67, 0.7789%
- E = 101, 1.1741% c = 67, 0.7789%
- / = 100, 1.1625% V = 66, 0.7673%
- 9 = 100, 1.1625% T = 65, 0.7556%
- : = 97, 1.1276% d = 64, 0.7440%
- $ = 96, 1.1160% L = 64, 0.7440%
- y = 95, 1.1044% O = 64, 0.7440%
- i = 94, 1.0928% U = 63, 0.7324%
- 1 = 92, 1.0695% [ = 61, 0.7091%
- t = 92, 1.0695% R = 60, 0.6975%
- @ = 92, 1.0695% S = 59, 0.6859%
- (space) = 92, 1.0695% W = 56, 0.6510%
- q = 92, 1.0695% P = 56, 0.6510%
- - = 91, 1.0579% Y = 55, 0.6394%
- B = 91, 1.0579% b = 55, 0.6394%
- I = 90, 1.0463% _ = 54, 0.6278%
- 7 = 90, 1.0463% ] = 52, 0.6045%
- s = 90, 1.0463%
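
Counts like those in Tables 1 and 2 are easy to reproduce for
your own files. The Python sketch below is not the program that
produced the tables; the function names are invented, and it
assumes "printing characters" means blank through tilde, which
is 95 characters in all:

```python
from collections import Counter

def frequency_table(text):
    """Return (character, count, percent) rows, most frequent first."""
    printable = [c for c in text if " " <= c <= "~"]
    counts = Counter(printable)
    total = len(printable)
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]

def spread(rows):
    """Gap between the most and least frequent characters, in
    percentage points."""
    return rows[0][2] - rows[-1][2]

rows = frequency_table("a,b,a\r\n")   # newline characters are not counted
```

Each percentage is the count divided by the total number of
printing characters; the first row of Table 1, for instance, is
100 * 1088 / 8602 = 12.6482%. A small spread, as in Table 2,
means the frequencies give an analyst little to work with.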
-
- ###
-
- V. Errors and problems:
-
-
- Problem: Nothing happens, the computer just sits there.
-
- You have forgotten to name the <input file. As a result, the
- program is correctly (but stupidly) waiting for you to type
- characters on the keyboard. (Same as <CON:)
-
- Solution: Type Ctrl-C or Ctrl-Break.
- This problem does not occur in Version 2.0.
-
-
-
- Problem: File not found.
-
- You have not spelled the name of your <input file correctly, or
- you have given an incorrect or incomplete path name.
-
- Solution: Check the directory for exact spelling.
-
-
- Problem: No result.
-
- The first thing that should suggest itself is an empty data file.
- As unlikely as this seems, it might be true. Your directory will
- indicate 0 bytes for the file size, if this is the case. (When
- you see the evidence, you will remember how you caused the
- problem yourself sometime last week.) A doctor I know created
- this situation by using WordStar to split data into two files; he
- then DELETED all the records from the original file instead of
- erasing it. It can take quite a while to diagnose this kind of
- error, so go "by the book" -- check your file size as a matter of
- course.
-
- Secondly, you might be asking for something that does not exist.
- Either PICK cannot find an exact match in a field, or FIELD
- cannot find the 6th field in records that only have 5 fields,
- etc.
-
- Another possibility, you want PICK to find the pattern 150 but
- PICK thinks 150 is a field number..! Use PICK 0 HAS 150 instead
- of PICK 150. Also valid, PICK 0 150.
-
- Solution: Check your directory. Check your spelling.
- Try to PICK using the HAS operator instead of EQ.
-
-
- Problem: Output contains garbage from a help screen.
-
- You have run the output from a utility through a pipe into FIELD.
- However, a command early in the chain bombed out and printed its
- help message into the pipe. When FIELD got it, it tried to
- format the output. The result is bits and pieces of a help
- screen, possibly TSORT-ed!
-
- Solution: Put complex commands in a batch file. Use FNKEY.
-
-
-
- Problem: No room on the disk
-
- MS-DOS pipes are actually FILES -- hidden, system-level files,
- but files nonetheless. Every pipe you create will demand disk
- space until its job is done. In addition, MS-DOS will allocate
- at least one entire CLUSTER to a file no matter how small the
- file actually is in byte count. Compare available disk space
- before and after deleting a very small file. Are you surprised
- by the result? On a disk with 8k clusters (the size varies with
- the disk format), erasing one small file frees up a full 8k of
- disk space! So a command chain that uses an input file, a pipe,
- and an output file can demand up to THREE TIMES the size of the
- original file by itself. And MS-DOS allocates a whole cluster
- for any fractional part of one -- that is, with 8k clusters, a
- file with 8,193 actual bytes takes up 16k of space! Your
- reaction to this bit of news is probably the same as mine was.
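
The arithmetic is just rounding up to a whole number of clusters.
A small Python sketch, with invented names, assuming the 8k
cluster figure used above (your disk may differ) and assuming an
empty file occupies no clusters:

```python
def allocated_bytes(file_size, cluster_size=8192):
    """Disk space actually consumed by a file of file_size bytes."""
    if file_size <= 0:
        return 0                                 # assumed: nothing allocated
    clusters = -(-file_size // cluster_size)     # round up to whole clusters
    return clusters * cluster_size

def chain_demand(file_size, cluster_size=8192):
    """Worst case for: input file, one pipe, and an output file,
    each taken to be as large as the input."""
    return 3 * allocated_bytes(file_size, cluster_size)

tiny = allocated_bytes(1)          # one byte still costs a whole cluster
just_over = allocated_bytes(8193)  # one byte past 8k costs two clusters
```

So tiny comes to 8,192 bytes and just_over to 16,384 bytes, the
16k mentioned above.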
-
- Solution: Erase some files and try again. Get a hard disk.
- Get <input from Drive B, while logged onto Drive A.
-
- ###
-
-