GDT.COM
Horizontally Aligns Material on a Given Marker Byte
by
Robert A. Magnuson
DMB, DCRT, NIH
Bethesda, MD 20892
Mar 1988
Revised Apr 1988
GDT (for Generalized Decimal Tab) is a filter that reads lines
from a set of files--or from stdin if no files are specified.
Each line is written to stdout, shifted such that the first
occurrence of a marker byte is in a specified column. A line
with no marker byte is shifted as if a marker byte were just
beyond the end. An option permits truncation. Without that
option the shifting stops just short of truncation. Such lines
can optionally be written to stderr.
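The shifting rule just described can be sketched in Python. This is an illustration only, not GDT's Turbo C source. The name align_line and its parameters are hypothetical, and reading "stops just short of truncation" as "only leading blanks may be removed" is an assumption.

```python
def align_line(line, marker, col, delete=False, truncate=False):
    """Shift one line so its first `marker` lands in 1-based column `col`.

    Returns (shifted_line, needed_truncation).
    """
    pos = line.find(marker)          # 0-based index of the first marker
    found = pos >= 0
    if not found:
        pos = len(line)              # pretend a marker sits just past the end
    if found and delete:             # /d: remove only the first marker
        line = line[:pos] + line[pos + 1:]
    shift = (col - 1) - pos          # > 0: pad on the left, < 0: remove bytes
    if shift >= 0:
        return " " * shift + line, False
    excess = -shift
    if truncate:                     # /t: cut leading bytes freely
        return line[excess:], False
    # Assumption: without /t only leading blanks may be removed.
    blanks = len(line) - len(line.lstrip(" "))
    dropped = min(excess, blanks)
    return line[dropped:], dropped < excess


if __name__ == "__main__":
    for text in ["47.23", "1923.50", ".57", "32768"]:
        print(align_line(text, ".", 10)[0])
```

The second element of the returned pair is true for exactly those lines that the /e option would copy to stderr.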
[This document is intended to be read from the screen.
It contains some characters which will probably not print
correctly on a printer.]
A DOS command line that invokes GDT contains a number of
arguments. The syntax is expressed by the following diagram:
     ┌─────────────────────────────┐
     │     ┌─ d ─┐ delete marker   │                     ┌──────────┐
GDT ─┴ / ┬─┤     ├────────────────┬┴─ <marker> ── <col> ─┴┬ <file> ┬┴──
         │ ├─ e ─┤ write truncated│                       └────────┘
         │ │     │ lines to stderr│
         │ └─ t ─┘ truncation OK  │                          3/27/88
         └────────────────────────┘
First there may be optional arguments specifying various GDT
options. Then there is the required marker byte, the first
occurrence of which in each line read determines the horizontal
shifting of that line. Next, there is the required column
specification. Then come the filenames which can be wildcarded.
If no filenames appear, GDT gets its input from stdin.
The arguments can optionally be enclosed in double
quotes. The enclosing quotes are stripped off and not
seen by GDT. Should you need to have a double quote
within an argument, the argument must be double quoted
and the internal double quote must be escaped by
preceding it with a backslash. This treatment of the
double quotes is done by the argc/argv mechanism of the C
compiler. [GDT is implemented in Borland TurboC.]
GDT exits with ERRORLEVEL set to one if some lines needed to be
truncated--but weren't because there was no /t option. In that
case the produced alignment is incorrect for those lines. The
ERRORLEVEL is set to zero when no truncation problems arose, and
to two for syntax problems.
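The three ERRORLEVEL values can be summarized in a small Python sketch. The constant names and the function exit_status are hypothetical; only the values 0, 1 and 2 and their meanings come from the text above.

```python
EXIT_OK = 0          # no truncation problems arose
EXIT_TRUNCATION = 1  # some lines needed truncation and /t was absent
EXIT_SYNTAX = 2      # illegal, missing or invalid arguments


def exit_status(syntax_error, lines_needing_truncation, t_option):
    """Pick the exit code from what happened during the run."""
    if syntax_error:
        return EXIT_SYNTAX
    if lines_needing_truncation and not t_option:
        return EXIT_TRUNCATION
    return EXIT_OK
```

In a batch file, IF ERRORLEVEL 1 is true for any exit code of one or more, so that single test catches both the truncation case and the syntax case.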
OPTION ARGUMENTS:
GDT options are specified by the presence or absence of various
option letters in option arguments. Any option argument must
begin with a slash and must appear in front of the other kinds of
arguments. Any number of legal option letters can appear in an
option argument. Thus, you can have multiple option arguments,
perhaps each with a single option letter (and each beginning with
a slash), or just one option argument containing all of the
option letters desired. CURRENTLY, ALL LEGAL GDT OPTION LETTERS
ARE lower case.
For the sake of readability, the above syntax diagram
shows only the case where all option letters appear in
one option argument.
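The rules for option arguments can be sketched as follows in Python. The function split_options is hypothetical and is not taken from GDT itself. Treating a bare "/" as a non-option, so that it could serve as a marker, is an assumption.

```python
LEGAL_LETTERS = set("det")   # the option letters shown in the syntax diagram


def split_options(args):
    """Return (set_of_option_letters, remaining_args).

    Option arguments begin with a slash and must precede all other
    arguments; each may hold any number of option letters.
    """
    options = set()
    index = 0
    # Assumption: a bare "/" is not an option argument.
    while (index < len(args)
           and args[index].startswith("/")
           and len(args[index]) > 1):
        letters = args[index][1:]
        illegal = set(letters) - LEGAL_LETTERS
        if illegal:
            raise ValueError("illegal option letter(s): "
                             + "".join(sorted(illegal)))
        options |= set(letters)
        index += 1
    return options, args[index:]
```

With this sketch, ["/d", "/t", "@", "30"] and ["/dt", "@", "30"] give the same result, while an upper case letter such as "/D" is rejected.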
A GDT syntax error occurs when illegal option letters appear, and
when required arguments are missing or invalid. Both the
<marker> and <col> arguments are required. The marker must be
entered as a single byte, as \d..d, or as \xh..h--where the d's
are decimal digits, and the h's are hexadecimal digits.
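The three marker forms can be decoded along these lines. This Python sketch is an illustration under assumptions: the names parse_marker and _to_int are hypothetical, the 0..255 range check is inferred from "marker byte", and accepting upper case hexadecimal digits is a guess.

```python
import string


def _to_int(digits, base, allowed):
    """Convert a digit string, rejecting anything but the allowed digits."""
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError("bad digits in marker: %r" % digits)
    return int(digits, base)


def parse_marker(arg):
    """Turn a <marker> argument into a byte value in the range 0..255."""
    if len(arg) == 1:                       # a single literal byte
        value = ord(arg)
    elif arg.startswith("\\x"):             # \xh..h, hexadecimal
        value = _to_int(arg[2:], 16, string.hexdigits)
    elif arg.startswith("\\"):              # \d..d, decimal
        value = _to_int(arg[1:], 10, string.digits)
    else:
        raise ValueError("marker must be one byte, \\d..d or \\xh..h")
    if not 0 <= value <= 255:               # assumption: must fit in a byte
        raise ValueError("marker value out of range: %d" % value)
    return value
```

Under these rules the arguments @, \64 and \x40 all name the same marker byte.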
When a syntax error occurs GDT prints a boxed syntax diagram
containing terse instructions on how to use GDT. This mechanism
can be deliberately tripped in order to get on-screen help. The
suggested way is to invoke GDT with no arguments--thus causing
the no-<marker> syntax error.
The /d option specifies that the markers are to be deleted from
the shifted lines. The marker is searched for from left to
right. The first one found is used for alignment. Any further
markers are ignored and remain as text.
The /e option is used for error reporting. Lines that could not
be aligned without truncation are copied to stderr.
The /t option freely permits truncation. No errors are reported,
and the ERRORLEVEL will not be one.
EXAMPLES:
Let the file AMOUNTS contain the following lines:
    47.23
    1923.50
    .57
    32768
    1.27
    4.12
    588.90
The indentation shown is to distinguish the displayed contents
from the explanatory text. I.e., each line in AMOUNTS starts in
column one without any leading blanks. A copy of this data with
the decimal points aligned in, say, column ten is obtained via
    gdt . 10 AMOUNTS
Here the marker is a period, the alignment column is 10. The
results will come out on the screen, i.e., stdout, as follows:
           47.23
         1923.50
             .57
        32768
            1.27
            4.12
          588.90
The following material
    Julius @Caesar
    Charles @De Gaulle
    Hernando @de Soto
    Ferdinand Victor Eugene @Delacroix
    Office of the @Emperor
    Kublai @Khan
    @Office of the President
    Aleksandr Sergeyevich @Pushkin
    Baron von @Richthofen
    @Royal Canadian Air Force
    Bachelor of @Theology
    Vincent @van Gogh
when filtered through GDT via
    gdt/d @ 30
comes out as
                          Julius Caesar
                         Charles De Gaulle
                        Hernando de Soto
         Ferdinand Victor Eugene Delacroix
                   Office of the Emperor
                          Kublai Khan
                                 Office of the President
           Aleksandr Sergeyevich Pushkin
                       Baron von Richthofen
                                 Royal Canadian Air Force
                     Bachelor of Theology
                         Vincent van Gogh
Now you may wonder how the at-sign markers got into the file, how
the material got sorted, etc. Much of this can be done by FP.EXE
and CHP.EXE (which are pattern utilities).
Suppose we start with NAMES, an unsorted file of the material to
be massaged:
---------- NAMES
[1]Charles @De Gaulle
[2]Kublai Khan
[3]Bachelor of Theology
[4]Aleksandr Sergeyevich Pushkin
[5]@Office of the President
[6]Office of the Emperor
[7]Julius Caesar
[8]Vincent @van Gogh
[9]@Royal Canadian Air Force
[10]Ferdinand Victor Eugene Delacroix
[11]Hernando @de Soto
[12]Baron von Richthofen
We assume here that most names are to be sorted by the last word
first. Therefore, at-sign markers appear only where needed by
exception (in lines 1, 5, 8, 9 and 11). GDT is used at the end
of the processing. The required processing is:
■ Separate out the lines with the at-signs,
■ For the other lines, precede the last word with the
  at-sign mark,
■ Recombine the lines just automatically marked with the others,
■ For each line, put the mark and following bytes in front,
  followed by a new temporary mark (here a smiley face),
  followed by the stuff before the mark,
■ Sort,
■ Rearrange each line in its former order (removing the smiley
  faces),
■ Finally, via GDT, align the mark on the chosen column.
This can be done automatically via the following batch file:
---------- Z.BAT
[1]:split off already marked names to temp file,
[2]: and, at the same time,
[3]:take unmarked names, precede last word with mark,
[4]:then append to temp file
[5]fp/vou (tmp) @ names|chp "\( ?\)\([a-z]+\)$" \1@\2>>(tmp)
[6]:put the mark and following stuff in front,
[7]:followed by a smiley face, followed by stuff before mark,
[8]:then sort, then rearrange,
[9]:then align at-sign mark on column 30
[10]chp \(.*\)\(@.*\) \2☺\1 (tmp) | sort
| chp \(.*\)☺\(.*\) \2\1 |gdt/d @ 30
Comments and explanations follow:
Note that the /u option of FP is used below. You will
need a version of FP.EXE with a date of at least 4/01/88
for this option to be in FP.
[5]fp/vou (tmp) @ names|chp "\( ?\)\([a-z]+\)$" \1@\2>>(tmp)
Here FP (for Find Pattern) first writes to the file (TMP)
those lines in the file NAMES that contain an at-sign.
These lines are the ones that are not selected, and are
written because of the /u option. At the same time FP
selects those lines that do NOT have an at-sign. (The /v
option inverts the selection, and the /o option omits
including the filename--which would be ---------- NAMES.)
Those lines are piped to CHP (CHange via Pattern). The
pattern consists mainly of two backslash-parenthesized
patterns. The first matches zero or one blank. The
second matches one or more letters (irrespective of
case). The ending dollar sign matches end of line. Thus
\2 is the last word, and \1 is the preceding space.
(There would be no space if a line had just one word on
it.) The result consists of the possibly empty blank, the
at-sign, and the last word. What about the material to
the left of the rightmost word? CHP writes out intact
all unmatched material. This is then appended (because
of the >> redirection) to the temp file that already
contains the lines that had had at-signs. Now every line
has an at-sign in front of the major-sort part. The
[a-z] character set would have to be enlarged if any of
the names were hyphenated or contained apostrophes.
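For readers without FP and CHP, the same marking step can be sketched with Python's re module. This is an analogue, not a description of how CHP works internally. The function mark_last_word is hypothetical, and [A-Za-z] stands in for CHP's case-insensitive [a-z].

```python
import re

# Zero or one blank, then one or more letters, then end of line.
LAST_WORD = re.compile(r"( ?)([A-Za-z]+)$")


def mark_last_word(line, mark="@"):
    """Put `mark` in front of the last word unless the line has a mark."""
    if mark in line:
        return line                  # already marked by hand: leave intact
    # Unmatched material to the left is kept as is, as with CHP.
    return LAST_WORD.sub(lambda m: m.group(1) + mark + m.group(2), line)
```

A hyphenated last name shows the limitation mentioned above: with this character set only the part after the hyphen would be treated as the last word.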
[10]chp \(.*\)\(@.*\) \2☺\1 (tmp) | sort
| chp \(.*\)☺\(.*\) \2\1 |gdt/d @ 30
Now, before sorting, the major-sort part needs to be put
in front. Therefore, the first CHP--reading from the
temp file--puts all the material preceding the at-sign
into \1, and puts the at-sign and all following bytes
into \2. The result is to be the two parts interchanged
with a smiley face in between. The smiley face is
inserted so that the end of the last name (in case of
multiple words there) can be found later. This is piped
to SORT whose stdout is, in turn, piped to the second
CHP. There, the two parts are interchanged again and the
smiley face is removed. Finally GDT does its thing. You
may wonder about the smiley face used temporarily to mark
the end of the major-sort part. Other characters could
have been used but this temporary marker should be
lexicographically below a blank in order to sort
correctly.
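The rearrange, sort, rearrange sequence can be sketched in Python as follows. The function sort_by_marked_part is hypothetical. Byte 1 stands in for the smiley face, and the case-insensitive sort key is an assumption about how SORT orders lines. Each line is assumed to hold exactly one mark.

```python
TEMP_MARK = "\x01"   # temporary end-of-key mark; it sorts below a blank


def sort_by_marked_part(lines, mark="@"):
    """Sort lines on the text that starts at `mark`, keeping them intact."""
    decorated = []
    for line in lines:
        before, found, after = line.partition(mark)
        if not found:
            raise ValueError("line has no mark: %r" % line)
        # Major-sort part first, then the temporary mark, then the rest.
        decorated.append(mark + after + TEMP_MARK + before)
    decorated.sort(key=str.upper)    # assumption: SORT ignores case
    restored = []
    for item in decorated:
        key, _, before = item.partition(TEMP_MARK)
        restored.append(before + key)   # back to the former order
    return restored
```

Because the temporary mark compares lower than a blank, a key such as "Khan" sorts ahead of a longer key that begins with "Khan" followed by a blank and more words.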