Text File | 1990-01-11 | 25KB | 577 lines

hdiff 1.22
hed 1.01

Purpose
-------

Hdiff compares two DOS text files and records the differences between them in a third file. Although hdiff can be used for simple "what's changed?" purposes, its real function is to assist in maintaining program source code or similar text files that change over time. By maintaining an original base file and a series of "difference" files, it's possible to retain all versions of a file at a great savings in space over retaining the full text of all versions.

The hdiff system includes an auxiliary program, hed, that is used to apply the difference files to the original (although EDLIN can also be used if the files are small enough).

HDIFF 1.22 is unchanged from the version released in December, 1987. HED 1.01 corrects a problem that very occasionally caused incorrect reproduction of the original file after applying updates. Boy, did it take us a long time to track this one down.

Running hdiff
-------------

The general syntax for hdiff is:

    hdiff [-ecs] old-file new-file [dif-file]

The simplest use of hdiff is exemplified by:

    hdiff oldfile.txt newfile.txt

which displays a simple report of differences between the two files: it shows which lines of OLDFILE.TXT do not appear in NEWFILE.TXT (deletions), and which lines of NEWFILE.TXT do not appear in OLDFILE.TXT (insertions).

The simple change report consists of text lines in this format:

    nnnn[+/-] text

A '+' format indicates that the line is new (an insertion); the '-' indicates that the line is gone (a deletion). Thus:

    0001- This line appears in the old file only
    0001+ This line appears in the new file only

The 'nnnn' represents the line number. For '+' lines, it's the line number in the new file; for '-' lines, it's the line number in the old file. (Note that the first file named on the command line is always assumed to be the "old" file, and the second is the "new" file.)
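The simple report format above is easy to read back mechanically. The following Python sketch shows one way to do it. It assumes the layout is exactly "digits, then '+' or '-', then a space, then the text"; the names `Change` and `parse_report_line` are hypothetical and are not part of hdiff.

```python
import re
from typing import NamedTuple, Optional


class Change(NamedTuple):
    line_no: int  # line number in the new file for '+', in the old file for '-'
    kind: str     # '+' (insertion) or '-' (deletion)
    text: str     # the text of the line


# Assumed layout: line number, sign, one optional space, then the text.
_REPORT_LINE = re.compile(r"^(\d+)([+-]) ?(.*)$")


def parse_report_line(line: str) -> Optional[Change]:
    """Parse one line of the simple report; return None if it does not match."""
    match = _REPORT_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return Change(int(match.group(1)), match.group(2), match.group(3))
```

A caller could, for example, collect all `Change` records with `kind == '-'` to list the lines that exist only in the old file.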
If you want the report to be sent to a text file rather than to the screen, simply include the file name as a third parameter:

    hdiff oldfile.txt newfile.txt changes.txt

NOTE: the simple report does not show lines that have been moved. The edlin-format report (-e switch) does include moved lines. Use the -e report for maintaining difference files; the simple report does not contain enough information.

Optional switches
-----------------

Here are the switches that can optionally be added to the command line. They must precede the file names:

    -c   Case insensitive: hdiff ignores differences in alphabetic case. Thus, the two lines:

             This is text
             THIS IS TEXT

         are not reported as changed.

    -e   Edlin: produce an edlin-compatible difference file rather than the simple difference report described above. This switch is also used to create hed-format files. See succeeding sections for more information.

    -s   Space insensitive: hdiff ignores differences in spacing. Thus, the two lines:

             This is text
             This    is    text

         are not reported as changed.

The switches may be combined, and they may be in any order:

    -e -c
    -ec
    -ce
    -c -e

are all equivalent. All switches must, however, precede the first filename.

Examples of hdiff use:

    hdiff foo.c newfoo.c

compares file 'foo.c' with file 'newfoo.c' and displays a simple report showing insertions (lines in newfoo that do not appear in foo) and deletions (lines in foo that do not appear in newfoo). Lines that have been moved but are otherwise unchanged do not appear in this report.

    hdiff -ec foo.c newfoo.c foo.114

compares foo.c with newfoo.c, ignoring case differences, and prepares an edlin/hed script in the file foo.114. This script, if applied to foo as described below, will create a copy of newfoo.

Applying difference files: edlin and version control
----------------------------------------------------

The main purpose of hdiff is to assist you in maintaining multiple versions of program source or other text files.
Many programmers like to keep archival copies of old source, for any of a number of reasons (one reason: sometimes changes don't work and it's necessary to go back to a previous version!). You could simply keep an archive or library with the complete text of all versions, but this is wasteful of disk space. A better solution (short of purchasing a true SCCS ["Source Code Control System"] for big bucks) is to use hdiff and hed or edlin to keep one original source file plus smaller difference files that can be used to re-create any version.

To see how this works, assume that you have an old version of your program MYPROG.C (in a file called MYPROG.SCC) and a new version named MYPROG.C:

    myprog.scc    (version 1.00)
    myprog.c      (version 1.10)

To create a difference file, use hdiff:

    hdiff -e myprog.scc myprog.c myprog.110

After hdiff is finished, you will have a file (MYPROG.110) that contains the differences between 1.00 and 1.10. Because of the -e switch, this file is in a special format: it is actually the text of a series of edlin commands that would turn version 1.00 source into version 1.10 source. It is an edlin script. So, if you were to execute the commands (remember that MYPROG.SCC is version 1.00):

    copy myprog.scc myprog.c
    edlin myprog.c < myprog.110

the result (after edlin finished) would be a file called MYPROG.C that contains the source for version 1.10.

Thus, between the original (1.00) MYPROG.SCC and the difference file MYPROG.110 you have all you need to re-create either version of the program. Chances are, however, that MYPROG.110 is much smaller than the full source for MYPROG.C, so considerable storage is saved.

Note that edlin cannot deal, in this context, with files larger than about 48K. If you try to apply a difference file to a base file larger than 48K using edlin, the resultant file will be damaged and probably unusable. For this reason and others, we recommend using the supplied program "hed" rather than edlin.
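The base-plus-difference idea can be illustrated in a few lines of Python. This is only a sketch of the general technique: it uses Python's standard `difflib` module rather than hdiff's algorithm, and the delta it builds is an in-memory list, not an edlin script. The function names `make_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Return a compact list of edits that turns the line list `old` into `new`.

    Each edit is (start, end, replacement_lines): replace old[start:end]
    with replacement_lines. Unchanged lines are not stored at all.
    """
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            delta.append((i1, i2, new[j1:j2]))
    return delta


def apply_delta(old, delta):
    """Re-create the new version from the base lines plus the delta."""
    result = list(old)
    # Apply edits from the end of the file backward so that the
    # positions recorded for earlier edits stay valid.
    for start, end, lines in reversed(delta):
        result[start:end] = lines
    return result
```

Only the changed lines travel in the delta, which is why a difference file is normally much smaller than a full copy of the new version.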
Using hed
---------

Hed is a simple program that can be used in place of edlin to apply update files. We prefer it to edlin for this purpose; there are several reasons:

    1. It's much faster.
    2. It doesn't suffer from Edlin's 48K file size restriction.
    3. It handles file dates in a more useful manner.
    4. It can create a new file with a different name.
    5. It can apply more than one update file at a time.

Hed's full syntax is:

    hed [-nv] base diff[+diff...] [new]

where base is the original source file (MYPROG.SCC, in the above example), diff is the difference file created by hdiff, and new is the optional output file name.

The optional parameters are:

    -n   No sort: instructs hed not to sort multiple update files, i.e., to apply them in the stated order.

    -v   reVerse: instructs hed to sort multiple update files in reverse order (more about this shortly).

If both -n and -v are specified, -n takes precedence.

This command creates MYPROG version 1.10 from the 1.00 source and the difference file created above:

    hed myprog.scc myprog.110 myprog.c

On completion, you'll have the 1.10 source in the file MYPROG.C.

If you do not include a third file name (new), hed will change the extension of the base file to BAK and re-use the base file name for the output. In other words,

    hed myprog.scc myprog.110

will leave the original MYPROG.SCC in MYPROG.BAK, and the new MYPROG.SCC will be the source for version 1.10. This is exactly what edlin would do.

Note that you can apply several updates at once:

    hed myprog.scc myprog.110+myprog.111+myprog.120 myprog.c

More information about this feature is in the section called "Hed and Multiple Updates".

File dates
----------

If you use hdiff's -e switch and specify an output file, hdiff will set the difference file date to the same date as new-file. That is, after

    hdiff -e myprog.scc myprog.c myprog.110

MYPROG.110 will have the same date as MYPROG.C. This is useful because hed uses the difference file date for its own output.
That is, after:

    hed myprog.scc myprog.110 myprog.c

MYPROG.C will have the same date as MYPROG.110, which, in turn, has the same date as the original copy of MYPROG.C. In this manner, the hdiff/hed system can retain true file dates for all versions.

Cdelta and cget
---------------

The two demonstration batches, cdelta and cget, provide a quick sample of the kinds of things that can be done with hdiff and hed. The two batches are designed for C programs; to revise them for other languages, simply replace all references to ".c" with the desired extension (.asm, for example).

The purpose of cdelta is to generate a change script that will convert a "base" source file into a specified version of your source. Cget performs the inverse task; it applies a specified change file to the base and produces a file containing the specified version.

File naming conventions are as follows:

    file.scc:  "base" source; scc = source code control
    file.###:  A change script to produce version ###
    file.c:    The current version (cdelta), or the output file (cget)

For example, suppose you are working with a C program called FOO. A base (earliest) version of this file should be in FOO.SCC. You have just finished revision 1.10 of FOO. To create the change file, type

    cdelta foo 110

The batch will create a new file, FOO.110; this file is an edlin/hed compatible script that will convert FOO.SCC into version 1.10 of FOO.C.

To retrieve a specified version, say 1.05, use

    cget foo 105

The batch will apply the script FOO.105 to FOO.SCC (using hed) and produce FOO.C, which will contain the source for version 1.05.

Note that cget always creates a file with a C extension, overwriting any existing file with the same name. This implies that you do NOT keep your current source in FILE.C; you keep the current source only by retaining FILE.SCC and the delta files.
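The naming conventions above can be expressed as a small Python sketch that builds the command lines the two batches would plausibly run. The contents of the cdelta and cget batch files are not reproduced in this document, so the exact commands here are an assumption inferred from the hdiff and hed syntax described earlier; the function names are hypothetical.

```python
def _check_version(version: str) -> None:
    # The version is used as a DOS file extension, so it is limited to 1-3 characters.
    if not 1 <= len(version) <= 3:
        raise ValueError("version must be 1 to 3 characters (DOS extension)")


def cdelta_command(name: str, version: str, ext: str = "c") -> str:
    """Assumed equivalent of 'cdelta name version': base + current -> change script."""
    _check_version(version)
    return f"hdiff -e {name}.scc {name}.{ext} {name}.{version}"


def cget_command(name: str, version: str, ext: str = "c") -> str:
    """Assumed equivalent of 'cget name version': base + change script -> source."""
    _check_version(version)
    return f"hed {name}.scc {name}.{version} {name}.{ext}"
```

Passing a different `ext` (for example `"asm"`) corresponds to the edit the text suggests for adapting the batches to other languages.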
Sequential version control
--------------------------

If you have access to a system that provides more sophisticated control over the execution of DOS commands (Personal REXX or Extended Batch Language, for example), it's not difficult to provide for "sequential" version control for even greater space savings.

The demo batch files, cdelta and cget, use only one base file; each new version is represented by a difference file that is the difference from the original version:

    foo.scc + foo.110 = foo.c (version 1.10)
    foo.scc + foo.111 = foo.c (version 1.11)
    foo.scc + foo.120 = foo.c (version 1.20)

This scheme has the virtue of simplicity, but there is a disadvantage: the difference files just keep getting bigger and bigger. Each difference file contains the cumulative differences of all preceding versions. You may eventually find that the difference files are larger than the base file.

The sequential method keeps differences between versions, rather than differences between the current version and an original base. That is, FOO.110 is the difference between FOO.SCC and version 1.10; FOO.111 is the difference between versions 1.10 and 1.11; FOO.120 is the difference between versions 1.11 and 1.20. To obtain version 1.20, we start with the base file and apply all difference files sequentially:

    foo.scc + foo.110 = temp.c  (foo version 1.10)
    temp.c  + foo.111 = temp2.c (foo version 1.11)
    temp2.c + foo.120 = foo.c   (foo version 1.20)

This scheme is obviously somewhat more complex, but it allows you to save all versions of a file in the least amount of space.

Note that the single command:

    hed foo.scc foo.110+foo.111+foo.120 foo.c

would do the whole job in one step. See the next section for more information on how to apply sequential update files.

Hed and Multiple Updates
------------------------

As noted, you can apply more than one update file per hed run by using the "+" operator:

    hed file.scc file.110+file.111+file.112 file.c

Here are the full rules:

1. You can use wildcard file specifications. For example, if FILE.110, FILE.111, and FILE.112 were the only update files in the current directory, you could use:

       hed file.scc file.1* file.c

   If you had FILE.110, FILE.121, and FILE.200:

       hed file.scc file.1*+file.2* file.c

2. The file list must be separated by "+" ONLY; spaces are not permitted. Thus,

       hed file.scc file.100 + file.110

   is not legal. It must be

       hed file.scc file.100+file.110

3. A total of up to 40 update files (including all wildcard expansions) may be specified.

4. Hed sorts the files by extension and applies them in sorted order, one after the other. (If you use the -v switch, hed will sort in reverse order; if you use -n, no sorting will be performed.) In other words, if you enter:

       hed file.scc file.1* file.c

   and files FILE.120 and FILE.110 are present in the current directory (in that order), hed will:

   a. Sort the update files by extension; 110 will precede 120 even though they are "out of order" in the disk directory.
   b. Read in FILE.SCC.
   c. Apply FILE.110 updates, creating an "in-memory" copy of version 110.
   d. Apply FILE.120 updates, creating an "in-memory" copy of version 120.
   e. Write out the resultant file as FILE.C

   Note that intermediate versions are not written to disk.

Reverse sorting
---------------

The purpose of the -v switch is to allow you to implement a "reverse" version scheme. Rather than keeping an original base and multiple updates from that base, some people prefer to keep the full current source and difference files for earlier versions.
For example, if you have FOO versions 1.00, 1.05, and 1.10 (the current), the "traditional" scheme would be to keep the source for 1.00 and update files for 1.05 and 1.10:

    foo.scc + foo.105 -> foo.c (version 1.05)
    foo.c   + foo.110 -> foo.c (version 1.10)

The "reverse" scheme would be to keep the full source to version 1.10 and keep a difference file that would create 1.05 and 1.00:

    foo.scc = current (1.10)
    foo.scc + foo.105 -> foo.c (version 1.05)
    foo.c   + foo.100 -> foo.c (version 1.00)

Using this scheme, the hed command

    hed foo.scc foo.1* foo.c

(to create version 1.00) wouldn't work, because the update files would be sorted in the wrong order: 1.00 would precede and be applied before 1.05. However, the command

    hed -v foo.scc foo.1* foo.c

would sort in reverse order and apply 1.05 before 1.00, correctly producing 1.00.

The advantage to the "reverse" scheme is that the most current version of the source can be obtained immediately, without the need to apply many sequential files.

Other uses of hdiff
-------------------

In addition to the version control application of hdiff, you can find other uses for the system. The simplest use for hdiff is to compare two files to see if they are the same. This can be used to check for corruption during backups, copies, etc., or to determine which of two files is newer.

Even this simple use of hdiff can be useful in unexpected ways, however. For example, look at this small batch file:

    dir a: > temp
    find "-" temp > dir.a
    dir b: > temp
    find "-" temp > dir.b
    hdiff dir.b dir.a > temp.bat
    erase dir.a
    erase dir.b
    erase temp

This batch can be used for a simple backup system. Assume that the default directory in drive A contains a series of files that you want to back up, and that the default directory in drive B contains the same set of files from the last backup.
The batch will isolate differences between the two directories and prepare a file called TEMP.BAT that contains a list of those files that have been changed or added since the last backup. Many popular text editors could very easily convert (or be programmed to automatically convert) the TEMP.BAT file into a series of copy commands that could be used, in batch mode, to perform the copying.

Restrictions
------------

The following act, in one way or another, as restrictions on hdiff:

- File format: hdiff is intended as a DOS text file differencer only. It is NOT a replacement for the DOS utility COMP or our own QCMP. Don't use it on binary (program or data) files, or on most word processor files.

- Available memory: hdiff works entirely in memory, and it needs quite a lot. The starting memory requirement is about 220K; then, for each UNIQUE line in either file, hdiff needs about 12 bytes plus the length of the line. Identical lines are stored only once, no matter how many times they occur. Thus, the two files:

      File 1:
          Line 1
          /* Comment */
          Line 2

      File 2:
          Line 1
          /* Comment */
          /* Comment */
          Line 2
          Line 3

  have four unique lines ("Line 1", "/* Comment */", "Line 2", and "Line 3"). These will use about 79 bytes of storage (in addition to the 220K starting memory!):

      4 lines @ 12 bytes:   48
      Total text length:    31

- Number of lines: neither file can exceed 5000 lines of text.

- Line size: limited to a maximum of 1000 characters per line.

Notes on the algorithm
----------------------

Hdiff uses a file comparison algorithm that was developed by Paul Heckel and described by Dave Cortesi in Dr. Dobb's Journal #94 (August, 1984). The algorithm is substantially more efficient than traditional file comparison methods; it can generate a difference report between two files in little more than the time it takes for the program to read them.
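For readers curious about how this kind of comparison works, here is a minimal Python sketch of the unique-line matching technique as it is commonly described for Heckel's method. It is not hdiff's actual code: it omits refinements such as begin/end sentinel lines, case and space folding, and edlin script generation, and the names `heckel_match` and `simple_report` are hypothetical.

```python
def heckel_match(old, new):
    """Pair up lines of `old` and `new`; return (old_to_new, new_to_old).

    Each list holds the index of the matching line in the other file,
    or None where no match was found.
    """
    # Passes 1 and 2: count each distinct line in both files and
    # remember where it last appeared in the old file.
    table = {}  # line text -> [count in old, count in new, index in old]
    for line in new:
        table.setdefault(line, [0, 0, None])[1] += 1
    for i, line in enumerate(old):
        entry = table.setdefault(line, [0, 0, None])
        entry[0] += 1
        entry[2] = i

    old_to_new = [None] * len(old)
    new_to_old = [None] * len(new)

    # Pass 3: a line occurring exactly once in each file is taken as the same line.
    for j, line in enumerate(new):
        old_count, new_count, i = table[line]
        if old_count == 1 and new_count == 1:
            new_to_old[j] = i
            old_to_new[i] = j

    # Pass 4: extend each match forward over identical, still-unmatched neighbours.
    for j in range(len(new) - 1):
        i = new_to_old[j]
        if (i is not None and i + 1 < len(old)
                and new_to_old[j + 1] is None and old_to_new[i + 1] is None
                and new[j + 1] == old[i + 1]):
            new_to_old[j + 1] = i + 1
            old_to_new[i + 1] = j + 1

    # Pass 5: extend each match backward in the same way.
    for j in range(len(new) - 1, 0, -1):
        i = new_to_old[j]
        if (i is not None and i > 0
                and new_to_old[j - 1] is None and old_to_new[i - 1] is None
                and new[j - 1] == old[i - 1]):
            new_to_old[j - 1] = i - 1
            old_to_new[i - 1] = j - 1

    return old_to_new, new_to_old


def simple_report(old, new):
    """Return (deletions, insertions) as lists of (1-based line number, text)."""
    old_to_new, new_to_old = heckel_match(old, new)
    deletions = [(i + 1, s) for i, s in enumerate(old) if old_to_new[i] is None]
    insertions = [(j + 1, s) for j, s in enumerate(new) if new_to_old[j] is None]
    return deletions, insertions
```

Each pass walks a file once and uses a table lookup per line, which fits the claim that the comparison takes little more time than reading the files. Matched lines whose positions differ are moves, which is why a report built only from unmatched lines does not show them.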
Hdiff was derived from Cortesi's demonstration program, with substantial modifications that

- accommodate differences between edlin and CP/M's "ed" (for which the demo was written)
- allow use of edlin's block move capabilities
- allow for much larger files through the use of all available memory.
- allow case and spacing insensitive comparisons.
- allow the user to request the simpler difference report rather than the edlin script.

Copyright/License/Warranty
--------------------------

This document and the program files HDIFF.EXE and HED.EXE ("the software") are copyrighted by the author. The copyright owner hereby licenses you to: use the software; make as many copies of the program and documentation as you wish; give such copies to anyone; and distribute the software and documentation via electronic means.

There is no charge for any of the above. However, you are specifically prohibited from charging, or requesting donations, for any such copies, however made; and from distributing the software and/or documentation with commercial products without prior permission. An exception is granted to not-for-profit user's groups, which are authorized to charge a small fee (not to exceed $7) for materials, handling, postage, and general overhead.

NO FOR-PROFIT ORGANIZATION IS AUTHORIZED TO CHARGE ANY AMOUNT FOR DISTRIBUTION OF COPIES OF THE SOFTWARE OR DOCUMENTATION, OR TO INCLUDE COPIES OF THE SOFTWARE OR DOCUMENTATION WITH SALES OF THEIR OWN PRODUCTS.

THIS INCLUDES A SPECIFIC PROHIBITION AGAINST FOR-PROFIT ORGANIZATIONS DISTRIBUTING THE SOFTWARE, EITHER ALONE OR WITH OTHER SOFTWARE, AND CHARGING A "HANDLING" OR "MATERIALS" FEE OR ANY OTHER SUCH FEE FOR THE DISTRIBUTION. NO FOR-PROFIT ORGANIZATION IS AUTHORIZED TO INCLUDE THE SOFTWARE ON ANY MEDIA FOR WHICH MONEY IS CHARGED. PERIOD.

No copy of the software may be distributed or given away without this document; and this notice must not be removed.
There is no warranty of any kind, and the copyright owner is not liable for damages of any kind. By using this free software, you agree to this.

The software and documentation are:

    Copyright (C) 1985, 1986, 1987, 1990 by

    The Cove Software Group
    Christopher J. Dunford
    P.O. Box 1072
    Columbia, Maryland 21044
    (301) 992-9371
    CompuServe 76703,2002 [IBMNET]

Software and documentation author: Chris Dunford