home *** CD-ROM | disk | FTP | other *** search
-
-
- Jan 13 13:00 1988 CDIFF.MEM Page 1
-
-
-
- Diff maintains all information needed to compare the two files in
- main memory. This means that very large files (or fairly large
- files with many differences) will cause the program to abort with
- an "out of space" error. Main memory requirements (in words) are
- approximately:
-
- 2 * (length of file1 + length of file2) + (3 * number of changes)
-
- The diff algorithm reads each file twice (once to build hash
- tables and a second time to check for fortuitous matches), then
- reads the differences by seeking randomly within the files. CPU
- time requirements include sorting the two hash vectors and
- randomly searching memory tables for equivalence classes. For
- example, running in Vax compatibility mode, two 1000 line files
- with a fair number of differences took about 25 seconds (elapsed
- wall clock time) for processing. Most of this time was spent in
- the file read routines. This test required slightly more than
- 6000 words of memory for internal tables.
-
- The diff algorithm was developed by J. W. Hunt and M. D. McIlroy,
- using a central algorithm defined by H. S. Stone. The algorithm
- was described in:
-
- Hunt, J. W., and McIlroy, M. D.,
- An Algorithm for Differential File Comparison,
- Computing Science Technical Report #41,
- Bell Laboratories, Murray Hill, NJ 07974
-
- The following description is summarized from that document. While
- it has been slightly modified to correspond to the program
- source, the algorithm is essentially identical.
-
- 1. Read the input files, building two vectors containing the line
- number (serial) and hash value (hash) of each line. Data for
- fileA will be in a vector pointed to by fi\eA[], while data for
- fileB will be pointed to by fileB[]. The lengths (number of
- lines) of the files will be represented by lenA and lenB
- respectiveLy. [This is slightly different from the + 3 * (number of changes)
-
- (Where "length" is the number of lines of data in each
- file.)
-
- The algorithm reads each file twice, once to build hash
- tables and once to check for fortuitous matches (two lines
- that are in fact different, but which have the same hash
- value). CPU time requirements include sorting the hash
- tables and randomly searching memory tables for equivalence
- classes. For example, on a time-shared VAX-11/780, comparing
- two 1000 line files required about 30 seconds (elapsed clock
- time) and about 10,000 bytes of working storage. About 90
- per-cent of the time was taken up by file I/O.
-
- D✓D✓D✓DI✓I✓I✓IA✓A✓A✓AG✓G✓G✓GN✓N✓N✓NO✓O✓O✓OS✓S✓S✓ST✓T✓T✓TI✓I✓I✓IC✓C✓C✓CS✓S✓S✓S
- Warning, bad option 'x'
- The option is ignored.
-
-
-
- Page 1 (printed 1/13/88)
-
-
-
-
-
-
- C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓) U✓U✓U✓UN✓N✓N✓NI✓I✓I✓IX✓X✓X✓X 5✓5✓5✓5.✓.✓.✓.0✓0✓0✓0 C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓)
-
-
-
- Usage ...
- Two input files were not specified.
-
- Can't open input file "filename".
- Can't continue.
-
- Out of space
- The program ran out of memory while comparing the two
- files.
-
- Can't read line nnn at xxx in file[A/B]
- This indicates an I/O error when seeking to the
- specific line. It should not happen.
-
- Spurious match, output is not optimal.
- Two lines that were different yielded the same hash
- value. This is harmless except that the difference
- output is not the minimum set of differences between
- the two files. For example, instead of the output:
- lines 1 to 5 were changed to ...
- the program will print
- lines 1 to 3 were changed to ...
- lines 4 to 5 were changed to ...
-
- The program uses a CRC16 hash code.
- The likelihood of this error is quite small.
-
- A✓A✓A✓AU✓U✓U✓UT✓T✓T✓TH✓H✓H✓HO✓O✓O✓OR✓R✓R✓R
- The diff algorithm was developed by J. W. Hunt and M. D.
- McIlroy, using a central algorithm defined by H. S. Stone.
- It was published in:
- Hunt, J. W., and McIlroy, M. D.,
- An Algorithm for Differential File Comparison,
- Computing Science Technical Report #41,
- Bell Laboratories, Murray Hill, NJ 07974
-
- B✓B✓B✓BU✓U✓U✓UG✓G✓G✓GS✓S✓S✓S
- On RSX and DECUS C on VMS systems, diff may fail if the both
- files are not "variable-length, implied carriage control"
- format. The scopy program can be used to convert files to
- this format if problems arise.
-
- When compiled under VAX C, diff handles STREAM_LF files
- properly (in addition to the canonical variable-length
- implied carriage control files). Other variations should
- work, but have not been tested.
-
- When compiled under VAX C, diff is quite slow for unknown
- reasons which ought to be investigated. On the other hand,
- it has access to effectively unlimited memory.
-
- Output in a form suitable for ed - the -e option - seems
-
-
-
- Page 2 (printed 1/13/88)
-
-
-
-
-
-
- C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓) U✓U✓U✓UN✓N✓N✓NI✓I✓I✓IX✓X✓X✓X 5✓5✓5✓5.✓.✓.✓.0✓0✓0✓0 C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓)
-
-
-
- rather pointless; the analogue on DEC systems is SLP (SUMSLP
- on VMS). It would be simple to provide SLP-compatible
- output. The question is, why bother - since the various DEC
- file comparisonFound 424 control chars in "diff.doc"
- utilities already produce it.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Page 3 (printed 1/13/88)
-
-
-