Text File | 1990-01-11 | 25KB | 577 lines
hdiff 1.22
hed 1.01
Purpose
-------
Hdiff compares two DOS text files and records the differences
between them in a third file. Although hdiff can be used for
simple "what's changed?" purposes, its real function is to
assist in maintaining program source code or similar text files
that change over time. By maintaining an original base file
and a series of "difference" files, it's possible to retain all
versions of a file at a great savings in space over retaining
the full text of all versions.
The hdiff system includes an auxiliary program, hed, that is
used to apply the difference files to the original (although
EDLIN can also be used if the files are small enough).
HDIFF 1.22 is unchanged from the version released in December,
1987. HED 1.01 corrects a problem that very occasionally caused
incorrect reproduction of the original file after applying
updates. Boy, did it take us a long time to track this one
down.
Running hdiff
-------------
The general syntax for hdiff is:
hdiff [-ecs] old-file new-file [dif-file]
The simplest use of hdiff is exemplified by:
hdiff oldfile.txt newfile.txt
which displays a simple report of differences between the two
files: it shows which lines of OLDFILE.TXT do not appear in
NEWFILE.TXT (deletions), and which lines of NEWFILE.TXT do not
appear in OLDFILE.TXT (insertions). The simple change report
consists of text lines in this format:
nnnn[+/-] text
A '+' indicates that the line is new (an insertion); a '-'
indicates that the line is gone (a deletion). Thus:
0001- This line appears in the old file only
0001+ This line appears in the new file only
The 'nnnn' represents the line number. For '+' lines, it's the
line number in the new file; for '-' lines, it's the line number
in the old file.
(Note that the first file named on the command line is always
assumed to be the "old" file, and the second is the "new" file.)
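If you want to post-process the simple report with a script, the line format above is easy to pick apart. The following is only a sketch in Python, not part of hdiff; it assumes that exactly one space separates the +/- marker from the text, and the name parse_report_line is made up for this illustration.

```python
import re

# Assumed layout: digits, then '+' or '-', then one space, then the text.
REPORT_LINE = re.compile(r"^(\d+)([+-]) ?(.*)$")

def parse_report_line(line):
    """Split one simple-report line into (line_number, kind, text)."""
    match = REPORT_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a simple-report line: {line!r}")
    number, sign, text = match.groups()
    # '+' lines are numbered in the new file, '-' lines in the old file.
    kind = "insertion" if sign == "+" else "deletion"
    return int(number), kind, text
```

Remember that the number returned for an insertion refers to the new file, and the number returned for a deletion refers to the old file.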
If you want the report to be sent to a text file rather than to
the screen, simply include the file name as a third parameter:
hdiff oldfile.txt newfile.txt changes.txt
NOTE: the simple report does not show lines that have been
moved. The edlin-format report (-e switch) does include moved
lines. Use the -e report for maintaining difference files; the
simple report does not contain enough information.
Optional switches
-----------------
Here are the switches that can optionally be added to the
command line. They must precede the file names:
-c Case insensitive: hdiff ignores differences in
alphabetic case. Thus, the two lines:
This is text
THIS IS TEXT
are not reported as changed.
-e Edlin: produce an edlin-compatible difference file
rather than the simple difference report described
above. This switch is also used to create hed-format
files. See succeeding sections for more information.
-s Space insensitive: hdiff ignores differences in spacing.
Thus, the two lines:
This is text
This   is    text
are not reported as changed.
The switches may be combined, and they may be in any order:
-e -c
-ec
-ce
-c -e
are all equivalent. All switches must, however, precede the
first filename.
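The switch rules can be pictured with a short Python sketch. This is not hdiff's code: the function names are invented here, and the treatment of -s (collapsing each run of blanks to one space and dropping leading and trailing blanks) is an assumption, since this document does not say exactly how spacing is ignored.

```python
def parse_switches(args):
    """Collect switch letters that precede the first file name."""
    switches = set()
    index = 0
    while index < len(args) and args[index].startswith("-"):
        # "-ec" contributes both 'e' and 'c'; "-e -c" does the same.
        switches.update(args[index][1:])
        index += 1
    return switches, args[index:]

def normalize(line, switches):
    """Return the form of a line that would be compared under -c and -s."""
    if "s" in switches:
        # Assumed meaning of "space insensitive": collapse runs of blanks.
        line = " ".join(line.split())
    if "c" in switches:
        line = line.upper()
    return line
```

Two lines are then treated as equal when their normalized forms are equal, which is why "This is text" and "THIS IS TEXT" drop out of the report under -c.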
Examples of hdiff use:
hdiff foo.c newfoo.c
compares file 'foo.c' with file 'newfoo.c' and displays
a simple report showing insertions (lines in newfoo that
do not appear in foo) and deletions (lines in foo that
do not appear in newfoo). Lines that have been moved
but are otherwise unchanged do not appear in this
report.
hdiff -ec foo.c newfoo.c foo.114
compares foo.c with newfoo.c, ignoring case differences,
and prepares an edlin/hed script in the file foo.114.
This script, if applied to foo as described below, will
create a copy of newfoo.
Applying difference files: edlin and version control
----------------------------------------------------------
The main purpose of hdiff is to assist you in maintaining
multiple versions of program source or other text files. Many
programmers like to keep archival copies of old source, for any
of a number of reasons (one reason: sometimes changes don't work
and it's necessary to go back to a previous version!). You
could simply keep an archive or library with the complete text
of all versions, but this is wasteful of disk space.
A better solution (short of purchasing a true SCCS ["Source Code
Control System"] for big bucks) is to use hdiff and hed or edlin
to keep one original source file plus smaller difference files
that can be used to re-create any version.
To see how this works, assume that you have an old version of
your program MYPROG.C (in a file called MYPROG.SCC) and a new
version named MYPROG.C:
myprog.scc (version 1.00)
myprog.c (version 1.10)
To create a difference file, use hdiff:
hdiff -e myprog.scc myprog.c myprog.110
After hdiff is finished, you will have a file (MYPROG.110) that
contains the differences between 1.00 and 1.10. Because of the
-e switch, this file is in a special format: it is actually the
text of a series of edlin commands that would turn version 1.00
source into version 1.10 source. It is an edlin script. So, if
you were to execute the commands (remember that MYPROG.SCC is
version 1.00):
copy myprog.scc myprog.c
edlin myprog.c < myprog.110
the result (after edlin finished) would be a file called
MYPROG.C that contains the source for version 1.10. Thus,
between the original (1.00) MYPROG.SCC and the difference file
MYPROG.110 you have all you need to re-create either version of
the program. Chances are, however, that MYPROG.110 is much
smaller than the full source for MYPROG.C, so considerable
storage is saved.
Note that edlin cannot deal, in this context, with files larger
than about 48K. If you try to apply a difference file to a base
file larger than 48K using edlin, the resultant file will be
damaged and probably unusable. For this reason and others, we
recommend using the supplied program "hed" rather than edlin.
Using hed
---------
Hed is a simple program that can be used in place of edlin to
apply update files. We prefer it to edlin for this purpose;
there are several reasons:
1. It's much faster.
2. It doesn't suffer from Edlin's 48K file size restriction.
3. It handles file dates in a more useful manner.
4. It can create a new file with a different name.
5. It can apply more than one update file at a time.
Hed's full syntax is:
hed [-nv] base diff[+diff...] [new]
where base is the original source file (MYPROG.SCC, in the above
example), diff is the difference file created by hdiff, and new
is the optional output file name.
The optional parameters are:
-n No sort: instructs hed not to sort multiple update
files, i.e., to apply them in the stated order.
-v reVerse: instructs hed to sort multiple update
files in reverse order (more about this shortly).
If both -n and -v are specified, -n takes precedence.
This command creates MYPROG version 1.10 from the 1.00 source
and the difference file created above:
hed myprog.scc myprog.110 myprog.c
On completion, you'll have the 1.10 source in the file MYPROG.C.
If you do not include a third file name (new), hed will change
the extension of the base file to BAK and re-use the base file
name for the output. In other words,
hed myprog.scc myprog.110
will leave the original MYPROG.SCC in MYPROG.BAK, and the new
MYPROG.SCC will be the source for version 1.10. This is exactly
what edlin would do.
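The naming rule for the two-file and three-file forms can be summed up in a few lines of Python. This is a sketch for illustration only; hed_file_plan and hed_arguments are hypothetical helper names, not anything supplied with hed, and the lower-case ".bak" is simply how the sketch spells the BAK extension.

```python
import os

def hed_file_plan(base, new=None):
    """Return (output_name, backup_name) for a hed run, per the rule above."""
    if new is not None:
        # A third file name was given: base is left alone, no backup is made.
        return new, None
    stem, _ = os.path.splitext(base)
    # No third name: the old base goes to .BAK and the base name is re-used.
    return base, stem + ".bak"

def hed_arguments(base, diffs, new=None):
    """Build the argument list for hed; diff files are joined with '+'."""
    args = ["hed", base, "+".join(diffs)]
    if new is not None:
        args.append(new)
    return args
```

For example, hed_file_plan("myprog.scc") yields ("myprog.scc", "myprog.bak"), matching the MYPROG.BAK behavior described above.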
Note that you can apply several updates at once:
hed myprog.scc myprog.110+myprog.111+myprog.120 myprog.c
More information about this feature is in the section called
"Hed and Multiple Updates".
File dates
----------
If you use hdiff's -e switch and specify an output file, hdiff
will set the difference file date to the same date as new-file.
That is, after
hdiff -e myprog.scc myprog.c myprog.110
MYPROG.110 will have the same date as MYPROG.C. This is useful
because hed uses the difference file date for its own output.
That is, after:
hed myprog.scc myprog.110 myprog.c
MYPROG.C will have the same date as MYPROG.110, which, in turn,
has the same date as the original copy of MYPROG.C.
In this manner, the hdiff/hed system can retain true file dates
for all versions.
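The date hand-off can be imitated on any system with a few lines of Python. This is only a sketch of the idea (stamp the output with the date of the file it was derived from); it is not how hdiff or hed do it internally, and copy_file_date is a made-up name.

```python
import os

def copy_file_date(source, target):
    """Give `target` the same access and modification times as `source`."""
    info = os.stat(source)
    os.utime(target, (info.st_atime, info.st_mtime))
```

In the example above, the first step corresponds to copy_file_date("myprog.c", "myprog.110") after hdiff runs, and the second to copy_file_date("myprog.110", "myprog.c") after hed runs.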
Cdelta and cget
---------------
The two demonstration batches, cdelta and cget, provide a quick
sample of the kinds of things that can be done with hdiff and
hed. The two batches are designed for C programs; to revise
them for other languages, simply replace all references to ".c"
with the desired extension (.asm, for example).
The purpose of cdelta is to generate a change script that will
convert a "base" source file into a specified version of your
source. Cget performs the inverse task; it applies a specified
change file to the base and produces a file containing the
specified version. File naming conventions are as follows:
file.scc: "base" source; scc = source code control
file.###: A change script to produce version ###
file.c: The current version (cdelta), or the
output file (cget)
For example, suppose you are working with a C program called
FOO. A base (earliest) version of this file should be in
FOO.SCC. You have just finished revision 1.10 of FOO. To
create the change file, type
cdelta foo 110
The batch will create a new file, FOO.110; this file is an
edlin/hed compatible script that will convert FOO.SCC into
version 1.10 of FOO.C.
To retrieve a specified version, say 1.05, use
cget foo 105
The batch will apply the script FOO.105 to FOO.SCC (using hed)
and produce FOO.C, which will contain the source for version
1.05.
Note that cget always creates a file with a C extension,
overwriting any existing file with the same name. This implies
that you do NOT keep your current source in FILE.C; you keep the
current source only by retaining FILE.SCC and the delta files.
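The naming conventions translate into commands roughly as follows. The batch files themselves are not reproduced in this document, so treat this Python sketch as a guess at what they boil down to; cdelta_command and cget_command are hypothetical names, and the ext parameter stands in for the ".c" that you would replace for other languages.

```python
def cdelta_command(name, version, ext="c"):
    """Command that records the current source as a change script."""
    # e.g. hdiff -e foo.scc foo.c foo.110
    return ["hdiff", "-e", f"{name}.scc", f"{name}.{ext}", f"{name}.{version}"]

def cget_command(name, version, ext="c"):
    """Command that rebuilds a given version from the base file."""
    # e.g. hed foo.scc foo.105 foo.c  (overwrites any existing foo.c)
    return ["hed", f"{name}.scc", f"{name}.{version}", f"{name}.{ext}"]
```

With these, "cdelta foo 110" corresponds to cdelta_command("foo", "110") and "cget foo 105" to cget_command("foo", "105").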
Sequential version control
--------------------------
If you have access to a system that provides more sophisticated
control over the execution of DOS commands (Personal REXX or
Extended Batch Language, for example), it's not difficult to
provide for "sequential" version control for even greater space
savings. The demo batch files, cdelta and cget, use only one
base file; each new version is represented by a difference file
that is the difference from the original version:
foo.scc + foo.110 = foo.c (version 1.10)
foo.scc + foo.111 = foo.c (version 1.11)
foo.scc + foo.120 = foo.c (version 1.20)
This scheme has the virtue of simplicity, but there is a
disadvantage: the difference files just keep getting bigger and
bigger. Each difference file contains the cumulative
differences of all preceding versions. You may eventually find
that the difference files are larger than the base file.
The sequential method keeps differences between versions, rather
than differences between the current version and an original
base. That is, FOO.110 is the difference between FOO.SCC and
version 1.10; FOO.111 is the difference between versions 1.10
and 1.11; FOO.120 is the difference between versions 1.11 and
1.20. To obtain version 1.20, we start with the base file and
apply all difference files sequentially:
foo.scc + foo.110 = temp.c (foo version 1.10)
temp.c + foo.111 = temp2.c (foo version 1.11)
temp2.c + foo.120 = foo.c (foo version 1.20)
This scheme is obviously somewhat more complex, but it allows
you to save all versions of a file in the least amount of space.
Note that the single command:
hed foo.scc foo.110+foo.111+foo.120 foo.c
would do the whole job in one step. See the next section for
more information on how to apply sequential update files.
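To see why the sequential scheme saves space, here is a toy model in Python. It does NOT use hdiff's edlin script format; the delta here is just a list of (start, end, replacement lines) triples computed with Python's standard difflib module, and make_delta, apply_delta, and delta_size are names invented for the sketch.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Keep only the changed regions of `old` and what replaces them."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def apply_delta(old, delta):
    """Rebuild the newer version from `old` plus a delta."""
    result = list(old)
    # Work from the end of the file so earlier line numbers stay valid.
    for start, end, lines in reversed(delta):
        result[start:end] = lines
    return result

def delta_size(delta):
    """Number of text lines stored in a delta."""
    return sum(len(lines) for _, _, lines in delta)

# Base file and three later versions, as lists of lines.
v100 = ["a", "b", "c"]
v110 = ["a", "b2", "c"]
v111 = ["a", "b2", "c", "d"]
v120 = ["z", "a", "b2", "c", "d"]

# Sequential scheme: each delta is relative to the previous version.
chain = [make_delta(v100, v110), make_delta(v110, v111), make_delta(v111, v120)]
rebuilt = v100
for delta in chain:
    rebuilt = apply_delta(rebuilt, delta)
```

After the loop, rebuilt holds version 1.20. The last sequential delta stores only the one new line, while a delta taken directly from the base to 1.20 has to carry every change made since the base.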
Hed and Multiple Updates
------------------------
As noted, you can apply more than one update file per hed run by
using the "+" operator:
hed file.scc file.110+file.111+file.112 file.c
Here are the full rules:
1. You can use wildcard file specifications. For example,
if FILE.110, FILE.111, and FILE.112 were the only update
files in the current directory, you could use:
hed file.scc file.1* file.c
If you had FILE.110, FILE.121, and FILE.200:
hed file.scc file.1*+file.2* file.c
2. The file list must be separated by "+" ONLY; spaces are
not permitted. Thus,
hed file.scc file.100 + file.110
is not legal. It must be
hed file.scc file.100+file.110
3. A total of up to 40 update files (including all
wildcard expansions) may be specified.
4. Hed sorts the files by extension and applies them in
sorted order, one after the other. (If you use the -v
switch, hed will sort in reverse order; if you use -n, no
sorting will be performed.) In other words, if you
enter:
hed file.scc file.1* file.c
and files FILE.120 and FILE.110 are present in the
current directory (in that order), hed will:
a. Sort the update files by extension; 110 will
precede 120 even though they are "out of order"
in the disk directory.
b. Read in FILE.SCC.
c. Apply FILE.110 updates, creating an "in-memory"
copy of version 110.
d. Apply FILE.120 updates, creating an "in-memory"
copy of version 120.
e. Write out the resultant file as FILE.C
Note that intermediate versions are not written to disk.
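The rules above can be sketched in Python as follows. This is an illustration, not hed's code: order_updates is a made-up name, the directory is passed in as a plain list of names, Python's fnmatch stands in for DOS wildcard matching (the two are not identical in every case), and comparing extensions as plain strings is an assumption.

```python
import fnmatch

MAX_UPDATES = 40  # limit stated in rule 3

def order_updates(spec, directory, no_sort=False, reverse=False):
    """Expand a diff[+diff...] spec and return the files in application order."""
    if " " in spec:
        raise ValueError("update files must be separated by '+' only")
    files = []
    for pattern in spec.split("+"):
        # Case is ignored, as DOS file names are not case sensitive.
        files.extend(name for name in directory
                     if fnmatch.fnmatchcase(name.lower(), pattern.lower()))
    if len(files) > MAX_UPDATES:
        raise ValueError("too many update files")
    if no_sort:
        # -n wins over -v: keep the order as stated or as found.
        return files
    return sorted(files, key=lambda name: name.rsplit(".", 1)[-1], reverse=reverse)
```

With a directory holding FILE.120 and FILE.110 in that order, order_updates("file.1*", ...) returns the 110 file first, as in steps a through e above.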
Reverse sorting
---------------
The purpose of the -v switch is to allow you to implement a
"reverse" version scheme. Rather than keeping an original base
and multiple updates from that base, some people prefer to keep
the full current source and difference files for earlier
versions. For example, if you have FOO versions 1.00, 1.05, and
1.10 (the current), the "traditional" scheme would be to keep
the source for 1.00 and update files for 1.05 and 1.10:
foo.scc + foo.105 -> foo.c (version 1.05)
foo.c + foo.110 -> foo.c (version 1.10)
The "reverse" scheme would be to keep the full source to version
1.10 and keep difference files that would create 1.05 and 1.00:
foo.scc = current (1.10)
foo.scc + foo.105 -> foo.c (version 1.05)
foo.c + foo.100 -> foo.c (version 1.00)
Using this scheme, the hed command
hed foo.scc foo.1* foo.c
(to create version 1.00) wouldn't work, because the update files
would be sorted in the wrong order: 1.00 would precede and be
applied before 1.05. However, the command
hed -v foo.scc foo.1* foo.c
would sort in reverse order and apply 1.05 before 1.00,
correctly producing 1.00.
The advantage to the "reverse" scheme is that the most current
version of the source can be obtained immediately, without the
need to apply many sequential files.
Other uses of hdiff
-------------------
In addition to the version control application of hdiff, you can
find other uses for the system.
The simplest use for hdiff is to compare two files to see if
they are the same. This can be used to check for corruption
during backups, copies, etc., or to determine which of two files
is newer. Even this simple use of hdiff can be useful in
unexpected ways, however. For example, look at this small batch
file:
dir a: > temp
find "-" temp > dir.a
dir b: > temp
find "-" temp > dir.b
hdiff dir.b dir.a > temp.bat
erase dir.a
erase dir.b
erase temp
This batch can be used for a simple backup system. Assume that
the default directory in drive A contains a series of files that
you want to back up, and that the default directory in drive B
contains the same set of files from the last backup. The batch
will isolate differences between the two directories and prepare
a file called TEMP.BAT that contains a list of those files that
have been changed or added since the last backup. Many popular
text editors could very easily convert (or be programmed to
automatically convert) the TEMP.BAT file into a series of copy
commands that could be used, in batch mode, to perform the
copying.
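If you would rather do the conversion with a script than with an editor, something like the following Python sketch would serve. It rests on several assumptions that are not stated in this document: that each '+' line of the report carries one DIR entry of the form name, extension, size, date, time separated by blanks; that entries without an extension have only four fields; and that the drives are A: and B:. The name copy_commands is invented for the example.

```python
def copy_commands(report_lines, source="a:", dest="b:"):
    """Turn '+' lines of a simple report on DIR listings into copy commands."""
    commands = []
    for line in report_lines:
        marker, _, text = line.partition(" ")
        if not marker.endswith("+"):
            continue  # '-' lines describe the old backup; nothing to copy
        fields = text.split()
        if "<DIR>" in fields or len(fields) < 4:
            continue  # skip subdirectories and anything unrecognized
        # Five fields: name ext size date time. Four fields: no extension.
        name = fields[0] if len(fields) == 4 else fields[0] + "." + fields[1]
        commands.append(f"copy {source}{name} {dest}")
    return commands
```

A changed file shows up in the report twice (a '-' line for the old entry and a '+' line for the new one), so looking only at '+' lines picks up both changed and added files exactly once.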
Restrictions
------------
The following act, in one way or another, as restrictions on
hdiff:
- File format: hdiff is intended as a DOS text file differencer
only. It is NOT a replacement for the DOS utility COMP or our
own QCMP. Don't use it on binary (program or data) files, or on
most word processor files.
- Available memory: hdiff works entirely in memory, and it
needs quite a lot. The starting memory requirement is about
220K; then, for each UNIQUE line in either file, hdiff needs
about 12 bytes plus the length of the line. Identical lines are
stored only once, no matter how many times they occur. Thus,
the two files:
File 1:
Line 1
/* Comment */
Line 2
File 2:
Line 1
/* Comment */
/* Comment */
Line 2
Line 3
have four unique lines ("Line 1", "/* Comment */", "Line 2", and
"Line 3"). These will use about 79 bytes of storage (in
addition to the 220K starting memory!):
4 lines @ 12 bytes: 48
Total text length: 31
- Number of lines: neither file can exceed 5000 lines of text.
- Line size: limited to a maximum of 1000 characters per line.
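The memory arithmetic and the two size limits can be checked ahead of time with a short Python sketch. The figures (12 bytes per unique line, 5000 lines, 1000 characters) come from the list above; the function names are made up, and the result is only the approximate line storage, not the 220K starting requirement.

```python
def estimate_line_storage(old_lines, new_lines, per_line_overhead=12):
    """Approximate bytes needed for line storage, beyond the starting 220K."""
    # Identical lines are stored once, no matter which file they come from.
    unique = set(old_lines) | set(new_lines)
    return per_line_overhead * len(unique) + sum(len(line) for line in unique)

def within_limits(lines, max_lines=5000, max_length=1000):
    """True if a file respects the line-count and line-length limits."""
    return len(lines) <= max_lines and all(len(line) <= max_length for line in lines)

file1 = ["Line 1", "/* Comment */", "Line 2"]
file2 = ["Line 1", "/* Comment */", "/* Comment */", "Line 2", "Line 3"]
```

For the two sample files shown above, estimate_line_storage(file1, file2) gives 79, agreeing with the hand calculation.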
Notes on the algorithm
----------------------
Hdiff uses a file comparison algorithm that was developed by
Paul Heckel and described by Dave Cortesi in Dr. Dobb's Journal
#94 (August, 1984). The algorithm is substantially more
efficient than traditional file comparison methods; it can
generate a difference report between two files in little more
than the time it takes for the program to read them.
Hdiff was derived from Cortesi's demonstration program, with
substantial modifications that
- accommodate differences between edlin and CP/M's "ed" (for
which the demo was written)
- allow use of edlin's block move capabilities
- allow for much larger files through the use of all
available memory.
- allow case and spacing insensitive comparisons.
- allow the user to request the simpler difference report
rather than the edlin script.
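For the curious, here is a compact Python sketch of the general technique Heckel published: lines that occur exactly once in each file are taken as fixed points, and matches are then extended to their neighbors forward and backward. It is written from the published idea, not from hdiff's source or Cortesi's program, it reports only insertions and deletions (no moves), and heckel_diff is a name chosen for this example.

```python
from collections import Counter

def heckel_diff(old, new):
    """Return (deletions, insertions) as lists of (line_number, text), 1-based."""
    old_count, new_count = Counter(old), Counter(new)
    old_index = {line: i for i, line in enumerate(old)}
    old_link = [None] * len(old)   # old_link[i] = matching index in new
    new_link = [None] * len(new)   # new_link[j] = matching index in old

    # Lines that occur exactly once in each file must be the same line.
    for j, line in enumerate(new):
        if old_count[line] == 1 and new_count[line] == 1:
            i = old_index[line]
            old_link[i], new_link[j] = j, i

    # Forward pass: if a pair matches, try to match the lines that follow.
    # The position before line 1 of each file counts as a matched pair.
    for j in range(-1, len(new) - 1):
        i = -1 if j == -1 else new_link[j]
        if (i is not None and i + 1 < len(old)
                and new_link[j + 1] is None and old_link[i + 1] is None
                and old[i + 1] == new[j + 1]):
            old_link[i + 1], new_link[j + 1] = j + 1, i + 1

    # Backward pass: the same idea, working up from the end of the files.
    for j in range(len(new), 0, -1):
        i = len(old) if j == len(new) else new_link[j]
        if (i is not None and i - 1 >= 0
                and new_link[j - 1] is None and old_link[i - 1] is None
                and old[i - 1] == new[j - 1]):
            old_link[i - 1], new_link[j - 1] = j - 1, i - 1

    deletions = [(i + 1, old[i]) for i in range(len(old)) if old_link[i] is None]
    insertions = [(j + 1, new[j]) for j in range(len(new)) if new_link[j] is None]
    return deletions, insertions
```

Each pass walks the files once, which is where the "little more than the time it takes to read them" behavior comes from.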
Copyright/License/Warranty
--------------------------
This document and the program files HDIFF.EXE and HED.EXE ("the
software") are copyrighted by the author. The copyright owner
hereby licenses you to: use the software; make as many copies
of the program and documentation as you wish; give such copies
to anyone; and distribute the software and documentation via
electronic means. There is no charge for any of the above.
However, you are specifically prohibited from charging, or
requesting donations, for any such copies, however made; and
from distributing the software and/or documentation with
commercial products without prior permission. An exception is
granted to not-for-profit user's groups, which are authorized to
charge a small fee (not to exceed $7) for materials, handling,
postage, and general overhead. NO FOR-PROFIT ORGANIZATION IS
AUTHORIZED TO CHARGE ANY AMOUNT FOR DISTRIBUTION OF COPIES OF
THE SOFTWARE OR DOCUMENTATION, OR TO INCLUDE COPIES OF THE
SOFTWARE OR DOCUMENTATION WITH SALES OF THEIR OWN PRODUCTS.
THIS INCLUDES A SPECIFIC PROHIBITION AGAINST FOR-PROFIT
ORGANIZATIONS DISTRIBUTING THE SOFTWARE, EITHER ALONE OR WITH
OTHER SOFTWARE, AND CHARGING A "HANDLING" OR "MATERIALS" FEE OR
ANY OTHER SUCH FEE FOR THE DISTRIBUTION. NO FOR-PROFIT
ORGANIZATION IS AUTHORIZED TO INCLUDE THE SOFTWARE ON ANY MEDIA
FOR WHICH MONEY IS CHARGED. PERIOD.
No copy of the software may be distributed or given away without
this document; and this notice must not be removed.
There is no warranty of any kind, and the copyright owner is not
liable for damages of any kind. By using this free software,
you agree to this.
The software and documentation are:
Copyright (C) 1985, 1986, 1987, 1990 by
The Cove Software Group
Christopher J. Dunford
P.O. Box 1072
Columbia, Maryland 21044
(301) 992-9371
CompuServe 76703,2002 [IBMNET]
Software and documentation author: Chris Dunford