The C Users' Group Library 1994 August

home *** CD-ROM | disk | FTP | other *** search

/ The C Users' Group Library 1994 August / wc-cdrom-cusersgrouplibrary-1994-08.iso / vol_100 / 193_01 / crypt.doc < prev next >

Wrap

Text File | 1985-11-14 | 17KB | 334 lines

12/9/85 INTRODUCTION The "Infinite Key Encryption System" article in issue #94 (August 1984) of Dr. Dobbs contains an excellent tutorial on Encryption Systems. At the time it was published, I read it with only passing interest. About six months ago, I developed a need for this type of utility and so begins my story. The first thing I did was write a shell cypher program (cypher.c-Listing #1). Since I had already written a generic file copy utility which allowed modifications during transfer, it was a simple matter to modify the argument passing to include multiple keys and add a cypher() function call to encrypt the file with a simple exclusive-or'ing algorithm (cypher1.c-Listing #2). While this method did encrypt the file and allowed for easy decryption (using the same run string), there were definite detectable patterns in the resultant file. These patterns, a function of the key period, were easily found in areas of repetitive characters. ( eg. a string of asterisks or spaces, the hex 1A's at the end of a file, etc ) Another drawback to this scheme is the inability to pass nonprintable characters in the runstring, thereby limiting the number of encryption tokens. Sooooo it was back to the drawing board. Rereading the aforementioned article with renewed interest, I gained an insight into the methods and schemes of practical modern cyphers. I don't intend to cover these concepts, so if you're interested or lacking in the basics, avail yourself of that article, and those in its bibliography. While the tutorial portion was excellent, I disagreed with a few points on implementation. First, there was the code itself and the intimation that assembly language was required for reasonable speed. The MAC Assembler is only used with CP/M-80 and I wanted more portable source, so I decided to write it in C. Second, I felt that a random key of some prime length could be generated solely from the original key (cypher2.c - Listing #3). Although some keys may work better than others, a means to evaluate results can render this method very functional. And last, I disagreed with the need for passing information in the encrypted file. It seemed unnecessary and cumbersome. My goal was to develop a method that worked entirely from the run-string. Overall I must commend the work done by John Thomas and Joan Thersites for presenting such a complete treatment of thier topic. THE ULTIMATE CYPHER The ultimate cypher is like the ultimate weapon. No matter how sophisticated, an anti-weapon (anti-cypher) can eventually be developed, IF there is a need. The user must make some judgements regarding needs and level of protection. The 2 algorithms mentioned are relatively simple to implement and use. The same keys will encypher or decypher the file and key order isn't important. The experts (and it becomes quite obvious) point out that the strongest cypher schemes utilize a combination of transposition and substitution. However, when transposition is added, the order of decyphering must be the exact reverse of encyphering. The last cypher module contains an algorithm for transposing the file tokens along with the random generated key encryption scheme (cypher3.c-Listing #4). This is a small price to pay for the added security. THE NEED FOR TOOLS As I progressed in my quest of the ultimate cypher/decypher algorithm, I became aware of the deficiencies of the standard CP/M utilities at my disposal, so I developed my own. The first tool, fv.c (file view - Listing #5), replaced my CP/M dump.com. Dump provides a continual display on screen of the hex contents of a file. Since most encryption is performed on text files, it is most beneficial to include the ascii form along with the hex. And, since most algorithms use an exclusive-or as the means of encryption it was easy enough to dump two files and the exclusive-ored difference between them. The next tool, fstat.c (file statistics - Listing #6) calculates and display the statistical characteristics of the file. This tool scans the file, counting the occurances of each element, and provides a 16 x 16 display of the distribution of characters. It calculates mean, median, mode and range of the character distribution and displays it's histogram. As one might suspect, each file type has a definite signature. In fact after limited use of this utility the user will be able to recognize the histogram patterns for text, Wordstar, and many other files. While experimenting with various schemes, it becomes obvious that the most difficult file to disguise is one that contains a single byte for every entry or some sequential scheme. So the next task was developing a utility, makef.c (make file - Listing #7), to generate a known sequential or uni-character file of some user defined length. Finally the last utility, sp.c (search pattern - Listing #7), I needed was a search scheme to look for repetitive patterns occuring within a file and provide some information regarding location and depth of the repetition. Also included is the option to calculate the delta characteristics of a file to search for repetitive mathematical as well character sequences. CYPHER.C - LISTING #1 The cypher shell program is provided for use as is, or for user modification. It contains the argument passing and file handling source code needed to copy from an existing file to a new file via a 16 Kbyte buffer with a cypher function being called to encrypt the file (If the file is less than 16k the input file name may be the same as the output thus destroying the original contents) A 16K buffer was chosen because it should fit easily with most compilers. This value may be adjusted to meet individual compiler needs. Any of the 3 algorithms that follow may be included or linked with this shell. Caution is recommended to ensure that the function name is appropriate for the method you use. The keys are passed in the CP/M command line and therefore limited by its length as well as the argument passing capability of the C-compiler. CYPHER1.C - LISTING #2 This minimal cypher algorithm uses an exclusive-or'ing scheme to encrypt the file with the keys passed. If the user employs keys of some prime length and performs multiple passes the results can be quite difficult to decypher. It's quick and easy to use and provides an excellent means to test the basics of cryptography. Since the keys are limited to only include printable characters, we don't take full advantage of the 256 codes available for a byte. CYPHER2.C - LISTING #3 Now something a little more difficult for the code breaker, an algorithm which grew out of the previous listing and generates a prime length key for each user key passed. One of 50 prime values (betweem 1009 and 1999) is selected as a function of the key passed. The prime key is then generated using a simple summing- anding-exclusive oring algorithm and the file is encrypted using this new key. If two or more keys are used, this method guarantees a cypher period in excess of 1,000,000, which is significantly larger than most text files. The key generation scheme is based entirely on the original key length and its contents and I fail to see how this can be worked backwards to regain the original, especially if multiple keys are used. Some keys will generate shorter periods within the prime length, but this is easily tested with the tools provided. I welcome feedback or suggestions for improving this algorithm. CYPHER3.C - LISTING #4 Adding chaos to disorder has probably driven many a code-cracker to drink. And this is just what we're trying to accomplish. The cypher2.c algorithm was modified slightly to test the first character of each key passed. If the key begins with a dash (-), then the buffer is transposed by some value between 2 and 17. Else, it encrypts the file using the algorithm as described above. This simple but very effective method puts the icing on the cake, however it completely reverses the decryption method. If the original runstring was cypher file.txt file.new ABRAHAM -2 LINCOLN -3 then the decryption runstring would be cypher file.new file.doc -3 LINCOLN -2 ABRAHAM FV.C - LISTING #5 As I mentioned earlier, this is my replacement for the CP/M dump utility. It allows the user to pass one or two files in the runstring for display. If one file name is passed in the runstring, the output appears much like the CP/M dump.com output with the addition of the ascii display. If two file names are passed, the output consists of a line from file one, a line from file two and a third line containing the exclusive oring of the two files (labeled "dif"). In all cases, non printable characters are replaced with a carrot ( ^ ) in the ascii portion and nulls are replaced with an equal sign ( = ), to readily identify comparisons between two files. The comparative output is purely a byte for byte operation and no attempt is made to realign the file to comparing characters as in a compare utility. The first file length controls display length. Figures 1 shows an example of screen output from the runstring <fv cypher1.c> and Figure 2 from the runstring <fv cypher1.c cypher2.c> FSTAT.C - LISTING #6 Descriptive statistics is the name of the game here and as with any statistical evaluation, one must be brutally honest (at least with oneself) to draw an objective conclusion. The entire file is read, 16Kbytes at a time. As we read, the occurrences of each of the 256 tokens is accumulated and we obtain the sum of all bytes as well as the min and max token occurrences. The sum is divided by the total characters to obtain the mean, the max becomes the mode and the range is the difference between min and max. Next the 256 byte array is copied to a second array and sorted to obtain the median. With all calculations completed, the numerical values of occurrences are displayed in a 16 X 16 display for evaluation. The statistical characteristics are displayed and the program pauses to await some keyboard entry to display the histogram. Depressing the space bar prints a scaled horizontal histogram of 16 groups (0-15, 16-31, . . ., 241-256). The ideal random file (which is what we'd like to see) would have the following characteristics: mean 127.5 mode not critical range < 20% of the total bytes divided by 256 median at midpoint of range histogram reasonably flat Remember I said ideal ! This is where judgement and self honesty come into play. A sequential file will display these ideal characteristics as well as a true random file. Also, files that look too good statistically should be just as suspect as those that don't. Figure 3 is the output produced by this article as is and Figure 4 when its encrypted with the runstring cypher crypt-tb.art new frederick -1 angelo -2 scacchitti -3 In all fairness, I must state that other possibly better statistical methods are better for determining randomness. I opted to use descriptive statistic because they are more easily understood and implemented. MAKEF.C - LISTING #7 A simple enough utility but an absolutely necessity if we are to evaluate encryption schemes. A filename of n 256 byte blocks is created and if a value between 0 and 255 is passed the file will be filled with this value. If no value (or one that is non- numeric) is passed, each block contains a sequential count from 0 to 255. An added benefit to this program is its ability to clean a disk. The user simply creates a file the size of the remaining disk space and then erases it. This results in all free disk space being set as the user defined. SP.C - LISTING #8 Along with the fstat utility this one confirms or denys (maybe even questions) the statistical data we received. A brute force search method is used to find matching character strings in the file. By default, the search starts at the first character and searches the first 128 bytes for a match of 4 or more characters. If the match depth exceeds the block size it is searching the program will return to CP/M after displaying the match data. If not it will continue in its search. Block size to search, minimum depth of comparison and starting point in the buffer may optionally be changed in the runstring. Also if an additional argument is passed ( one more than the three mentioned) the software converts the data in the buffer to delta-form. That is, element[n] = element[n] - element[n+1] for all data in the buffer. After the conversion is made the search scheme continues as before. This feature allows us to evaluate the file for some mathematical sequence. One drawback to this program as it stands is the limiting factor of the buffer size. No attempt is made to search beyond it. This shouldn't matter for most text files. SMALL C, CP/M, AND MISCELLANEOUS chkkbd() is a function which enables us to stop display (program) activity if Control-S is depressed. Following with a Control-C causes a return to CP/M and any other character will allow the program to continue. This function calls a bdos() function so the user may have to modify this for other operating systems. The getchx() function is a non-echoing version of getchar() which uses BDOS function 6. getchar() may be substituted for getchx(). calloc(), malloc() and cfree() functions are used for the dynamic allocation and deallocation of memory. My allocation/deallocation scheme is of the simple variety where the programmer must pay heed to order or pay the consequences. The source code contained here should work with most implementations of these functions. Printer output may be obtained from any of the programs by using the CP/M Control P function. It was the simplest method to implement. Math functions (especially floating point) are difficult for Small-C. There are several routines in the fstat.c source that perform the necessary long and fractional calculations. It's not necessary to change these however, if your compiler supports floats and longs, have at it. Each program will display the usage if entered without the proper number of arguments in the run string. Also, since most software users begin to feel uncomfortable when their computer is off somewhere performing exotic calculations, each program displays status to the screen thus putting these fears at rest. CYPHER BENCHMARKS My version, written in Small C, is generic enough to adapt to any C compiler. Running on a 4 Mhz Z80 based CP/M system, it benchmarks at less than 1K/sec for file I/O, 16K/sec for file transposition and approximately 4K/sec per key for encryption. Key encryption is difficult to benchmark since it includes the time to generate the prime length key which varies from 1009 to 1999 characters in length. AND FINALLY These tools should be employed with a measure of common sense. A strong cypher is only indicated when both statistically and pattern-whise indicated. (And it doesn't hurt to view the file either) My intent in developing these utilities was to enable the cryptographer-programmer a means to evaluate the strength of an encryption scheme as well as the resistance of schemes to "cracking". However (like most tools) these can be used for destructive as well as constructive purposes. The author assumes no liability for illegal use of these tools and sincerely believes they can only result in stronger encryption schemes being developed.