GEMini Atari

home *** CD-ROM | disk | FTP | other *** search

/ GEMini Atari / GEMini_Atari_CD-ROM_Walnut_Creek_December_1993.iso / files / utility / diff / diff.mem < prev next >

Wrap

Text File | 1988-07-20 | 9.5 KB | 347 lines

Jan 13 13:00 1988 CDIFF.MEM Page 1 Diff maintains all information needed to compare the two files in main memory. This means that very large files (or fairly large files with many differences) will cause the program to abort with an "out of space" error. Main memory requirements (in words) are approximately: 2 * (length of file1 + length of file2) + (3 * number of changes) The diff algorithm reads each file twice (once to build hash tables and a second time to check for fortuitous matches), then reads the differences by seeking randomly within the files. CPU time requirements include sorting the two hash vectors and randomly searching memory tables for equivalence classes. For example, running in Vax compatibility mode, two 1000 line files with a fair number of differences took about 25 seconds (elapsed wall clock time) for processing. Most of this time was spent in the file read routines. This test required slightly more than 6000 words of memory for internal tables. The diff algorithm was developed by J. W. Hunt and M. D. McIlroy, using a central algorithm defined by H. S. Stone. The algorithm was described in: Hunt, J. W., and McIlroy, M. D., An Algorithm for Differential File Comparison, Computing Science Technical Report #41, Bell Laboratories, Murray Hill, NJ 07974 The following description is summarized from that document. While it has been slightly modified to correspond to the program source, the algorithm is essentially identical. 1. Read the input files, building two vectors containing the line number (serial) and hash value (hash) of each line. Data for fileA will be in a vector pointed to by fi\eA[], while data for fileB will be pointed to by fileB[]. The lengths (number of lines) of the files will be represented by lenA and lenB respectiveLy. [This is slightly different from the + 3 * (number of changes) (Where "length" is the number of lines of data in each file.) The algorithm reads each file twice, once to build hash tables and once to check for fortuitous matches (two lines that are in fact different, but which have the same hash value). CPU time requirements include sorting the hash tables and randomly searching memory tables for equivalence classes. For example, on a time-shared VAX-11/780, comparing two 1000 line files required about 30 seconds (elapsed clock time) and about 10,000 bytes of working storage. About 90 per-cent of the time was taken up by file I/O. D✓D✓D✓DI✓I✓I✓IA✓A✓A✓AG✓G✓G✓GN✓N✓N✓NO✓O✓O✓OS✓S✓S✓ST✓T✓T✓TI✓I✓I✓IC✓C✓C✓CS✓S✓S✓S Warning, bad option 'x' The option is ignored. Page 1 (printed 1/13/88) C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓) U✓U✓U✓UN✓N✓N✓NI✓I✓I✓IX✓X✓X✓X 5✓5✓5✓5.✓.✓.✓.0✓0✓0✓0 C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓) Usage ... Two input files were not specified. Can't open input file "filename". Can't continue. Out of space The program ran out of memory while comparing the two files. Can't read line nnn at xxx in file[A/B] This indicates an I/O error when seeking to the specific line. It should not happen. Spurious match, output is not optimal. Two lines that were different yielded the same hash value. This is harmless except that the difference output is not the minimum set of differences between the two files. For example, instead of the output: lines 1 to 5 were changed to ... the program will print lines 1 to 3 were changed to ... lines 4 to 5 were changed to ... The program uses a CRC16 hash code. The likelihood of this error is quite small. A✓A✓A✓AU✓U✓U✓UT✓T✓T✓TH✓H✓H✓HO✓O✓O✓OR✓R✓R✓R The diff algorithm was developed by J. W. Hunt and M. D. McIlroy, using a central algorithm defined by H. S. Stone. It was published in: Hunt, J. W., and McIlroy, M. D., An Algorithm for Differential File Comparison, Computing Science Technical Report #41, Bell Laboratories, Murray Hill, NJ 07974 B✓B✓B✓BU✓U✓U✓UG✓G✓G✓GS✓S✓S✓S On RSX and DECUS C on VMS systems, diff may fail if the both files are not "variable-length, implied carriage control" format. The scopy program can be used to convert files to this format if problems arise. When compiled under VAX C, diff handles STREAM_LF files properly (in addition to the canonical variable-length implied carriage control files). Other variations should work, but have not been tested. When compiled under VAX C, diff is quite slow for unknown reasons which ought to be investigated. On the other hand, it has access to effectively unlimited memory. Output in a form suitable for ed - the -e option - seems Page 2 (printed 1/13/88) C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓) U✓U✓U✓UN✓N✓N✓NI✓I✓I✓IX✓X✓X✓X 5✓5✓5✓5.✓.✓.✓.0✓0✓0✓0 C✓C✓C✓CD✓D✓D✓DI✓I✓I✓IF✓F✓F✓FF✓F✓F✓F(✓(✓(✓(1✓1✓1✓1)✓)✓)✓) rather pointless; the analogue on DEC systems is SLP (SUMSLP on VMS). It would be simple to provide SLP-compatible output. The question is, why bother - since the various DEC file comparisonFound 424 control chars in "diff.doc" utilities already produce it. Page 3 (printed 1/13/88) -1 -3 -5 -4 After unsort() has unwound fileB: Class[] (1 .. lines in fileA): 3 4 0 5 0 1 Within unravel(), match is built in the following order: Jan 13 13:00 1988 CDIFF.MEM Page 5 match[6] := 6 match[4] := 5 match[2] := 3 match[1] := 1 Match[] (0 .. lines in fileA): 0 1 3 0 5 0 6 Output is as follows: 1a2 > line 1.5 inserted 3c4 < line 3 --- > line 3 changed 5d5 < line 5 gets deleted ******************************************************************** /* * s t r e q . c */ String Equality Test String equality test Synopsis: streq(a, b); char *a; char *b; Description: Return TRUE if the strings are equal. Bugs *************************************************************** /* * e r r o r . c */ Fatal Error Exit Synopsis: _error() error(format, args) Jan 13 13:00 1988 CDIFF.MEM Page 6 char *format; Documentation: Fatal error exits. _error() halts, error() prints something on stderr and then halts. Bugs: THIS DOES NOT WORK ON MANY SYSTEMS DUE TO EXTREMLY NON-PORTABLE CODE. Why oh why can't people learn to use varargs properly? This code will blow up on OSK. Fortunatly, it isn't used often...