home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!usc!cs.utexas.edu!wupost!waikato.ac.nz!aukuni.ac.nz!cs18.cs.aukuni.ac.nz!pgut1
- Newsgroups: sci.crypt
- Subject: SHS/SHA source code + random thoughts
- Message-ID: <1992Sep4.060230.28313@cs.aukuni.ac.nz>
- From: Peter Gutmann (pgut1@cs.aukuni.ac.nz)
- Date: Fri, 4 Sep 1992 06:02:30 GMT
- Sender: pgut1@cs.aukuni.ac.nz (PeterClaus Gutmann )
- Organization: HPACK Conspiracy Secret Laboratory
- Keywords: SHS SHA message digest
- Summary: Source code for SHS/SHA message digest algorithm + comparison with MD5
- Lines: 503
-
- The following is a C implementation of the NIST SHA (Secure Hash Algorithm) -
- this is sometimes also referred to as SHS (Secure Hash Standard), but I've used
- SHA for now since the standard hasn't been adopted yet. It's called SHS and
- SHA rather interchangeably just to confuse people, I've called it SHA in the
- text but SHS in the code in anticipation of it becoming a standard. Once it's
- adopted, you can feed the text through sed and change all the names. I'll
- assume everyone has a copy of the SHS document and won't bother including it in
- this posting (it's rather long).
-
-
- The SHS/SHA Code
- ================
-
- It's a fairly straightforward implementation which has been tested under
- MSDOS and OS/2 on a PClone, and Unix on a DECstation. Some of the
- optimizations used are mentioned in the code. One of the results of these
- optimizations is that the code isn't endianness-independant, meaning that on
- little-endian machines a byteReverse() function has to be used for
- endianness-reversal. This entails defining LITTLE_ENDIAN on a little-endian
- system (it's currently defined by default in SHS.H). Being able to assume a
- given data endianness makes getting data to the SHA transform() routine a lot
- faster.
-
-
- SHA Speed
- =========
-
- I ran the SHA code against the distributed version of MD5 (which is
- significantly less optimised than the SHA code). The results were as follows:
-
- 25 MHz PClone DECstation 2100 DECstation 5000
-
- SHA 31 K/sec 120 K/sec 208 K/sec
- MD5 55 K/sec 169 K/sec 278 K/sec
-
- This comparison isn't 100% fair since the standard MD5 distribution is a
- somewhat pessimal implementation (in fact an optimised PC version runs around 5
- times faster). An implementation of MD5 optimised to the level of SHA runs
- around 2 times faster.
-
- Therefore with similar levels of optimisation it appears MD5 is around three
- times as fast on the PClone as SHA. Even the pessimal MD5 on the DECstations
- is around a third as fast again as SHA, and would probably also be 2 or more
- times as fast if optimized (I used the standard MD5 distribution rather than
- an optimized custom one to allow others to verify the results).
-
-
- SHA's Awkward Block Size (Some Flamage)
- ========================
-
- Will SHA be weakened by taking out h4 and E and reducing the number of rounds
- to create a 128-bit MD algorithm? This seems a lot nicer than the current one
- since it's now a power-of-two size which fits in a lot better with most current
- software. Removing h4 and E and reducing the total number of rounds by 16
- doesn't seem to weaken it any, and IMHO the decision to force SHA to fit an
- awkward DSS data size wasn't such a hot idea, especially if it's to be used
- with non-DSS code or if q is ever changed.
-
-
- SHA vs MD5
- ==========
-
- When implementing SHA I noticed how very similar it was to MD4. The main
- changes were the addition of an the 'expand' transformation (Step B), the
- addition of the previous round's output into the next round for a faster
- avalance effect, and the increasing of the whole transformation to fit the DSS
- block size. SHA is very much an MD4-variant rather than a redesign like MD5.
-
- The design decisions behind MD5 were given in the MD5 document, the design
- for SHA is never gone into in the SHS/SHA document (mind you it's pretty
- obvious what's going on - everything except how the Mysterious Constants
- were chosen can be seen at a glance). Presumably some of the changes made
- were to avoid the known attacks for MD4, but again no design details are
- given. Anyway, using what I had available I took the points raised in the
- MD5 design and compared them with what SHA did:
-
- - A fourth round has been added.
-
- SHA does this too. However in SHA the fourth round uses the same f-function
- as the second round (not obvious whether this is a problem or not, I'll have
- to look at it a bit more).
-
- - Each step now has a unique additive constant.
-
- SHA keeps the MD4 scheme where it reuses the constants for each group of
- rounds (in this case for 20 rounds at a time). Actually MD4 only has the
- additive constants in the last two rounds, not for all 4, but the principal
- is the same - the constants are reused many times.
-
- - The function g in round 2 was changed from ( XY v XZ v YZ ) to
- ( XZ v Y not( Z ) ) to make g less symmetric.
-
- SHA uses the MD4 version ( XY v XZ v YZ ) (SHA calls it f2 rather than g).
-
- - Each step now adds in the result of the previous step. This promotes a faster
- "avalanche effect".
-
- This change has been made in SHA as well.
-
- - The order in which input words are accessed in rounds 2 and 3 is changed, to
- make these patterns less like each other.
-
- SHA always access the words the same way, like MD4.
-
- - The shift amounts in each round have been approximately optimized, to yield a
- faster "avalanche effect". The shifts in different rounds are distinct.
-
- SHA uses a constant shift amount in each round. This shift amount is
- relatively prime to the word size (as in MD4).
-
- If you take SHA and remove h4, E, and a few rounds, the result is (basically)
- MD4 with one more round, the addition of the output of step n-1 to step n
- (giving a faster avalanche effect), and the addition of the initial "expand"
- transformation. This initial transformation is important, since it spreads the
- input data bits out over an area four times as large as the original, and then
- mixes in the expanded version of the data in each round. This means that
- instead of reusing the input data in each group of rounds, SHA uses different
- permutations of the input data in each group of rounds. This is definitely A
- Good Thing.
-
- This leads to the following situation:
-
- SHA = MD4 + 'expand' transformation + extra round + better-avalanche
-
- MD5 = MD4 + improved bit-bashing + extra round + better-avalanche
-
- Which is stronger, MD5 with its improved bit-bashing or SHA with it's 'expand'
- transformation? (Ain't no way I'm going to answer this one :-).
-
- One point is that the only "extra" in SHA which MD5 doesn't have, namely the
- 'expand' transformation, can be easily added to MD5, but that the MD5
- improvements can't be added to SHA without redesigning the whole algorithm.
- Thus the paranoid types can hedge their bets by adding 'expand' to MD5, giving
- them the best of both worlds (its very easy to simply change MD5 to have the
- 'expand' transformation - maybe this is an MD6 in the making?).
-
-
- Conclusion
- ==========
-
- This positing is beginning to sound as if I'm the official net.apologist for
- MD5 :-). Maybe I'm being a bit harsh on SHA, but I think someone should point
- out that it may not be the be-all and end-all of message digest algorithms. I
- welcome any email on the subject, or post your flames here....
-
- Peter.
-
- ------------------------------- Chainsaw here ---------------------------------
- #include <stdio.h>
- #include <stdlib.h>
- #include <time.h>
-
- /* --------------------------------- SHS.H ------------------------------- */
-
- /* NIST proposed Secure Hash Standard.
-
- Written 2 September 1992, Peter C. Gutmann.
- This implementation placed in the public domain.
-
- Comments to pgut1@cs.aukuni.ac.nz */
-
- /* Useful defines/typedefs */
-
- typedef unsigned char BYTE;
- typedef unsigned long LONG;
-
- /* The SHS block size and message digest sizes, in bytes */
-
- #define SHS_BLOCKSIZE 64
- #define SHS_DIGESTSIZE 20
-
- /* The structure for storing SHS info */
-
- typedef struct {
- LONG digest[ 5 ]; /* Message digest */
- LONG countLo, countHi; /* 64-bit bit count */
- LONG data[ 16 ]; /* SHS data buffer */
- } SHS_INFO;
-
- /* Whether the machine is little-endian or not */
-
- #define LITTLE_ENDIAN
-
- /* --------------------------------- SHS.C ------------------------------- */
-
- /* NIST proposed Secure Hash Standard.
-
- Written 2 September 1992, Peter C. Gutmann.
- This implementation placed in the public domain.
-
- Comments to pgut1@cs.aukuni.ac.nz */
-
- #include <string.h>
-
- /* The SHS f()-functions */
-
- #define f1(x,y,z) ( ( x & y ) | ( ~x & z ) ) /* Rounds 0-19 */
- #define f2(x,y,z) ( x ^ y ^ z ) /* Rounds 20-39 */
- #define f3(x,y,z) ( ( x & y ) | ( x & z ) | ( y & z ) ) /* Rounds 40-59 */
- #define f4(x,y,z) ( x ^ y ^ z ) /* Rounds 60-79 */
-
- /* The SHS Mysterious Constants */
-
- #define K1 0x5A827999L /* Rounds 0-19 */
- #define K2 0x6ED9EBA1L /* Rounds 20-39 */
- #define K3 0x8F1BBCDCL /* Rounds 40-59 */
- #define K4 0xCA62C1D6L /* Rounds 60-79 */
-
- /* SHS initial values */
-
- #define h0init 0x67452301L
- #define h1init 0xEFCDAB89L
- #define h2init 0x98BADCFEL
- #define h3init 0x10325476L
- #define h4init 0xC3D2E1F0L
-
- /* 32-bit rotate - kludged with shifts */
-
- #define S(n,X) ( ( X << n ) | ( X >> ( 32 - n ) ) )
-
- /* The initial expanding function */
-
- #define expand(count) W[ count ] = W[ count - 3 ] ^ W[ count - 8 ] ^ W[ count - 14 ] ^ W[ count - 16 ]
-
- /* The four SHS sub-rounds */
-
- #define subRound1(count) \
- { \
- temp = S( 5, A ) + f1( B, C, D ) + E + W[ count ] + K1; \
- E = D; \
- D = C; \
- C = S( 30, B ); \
- B = A; \
- A = temp; \
- }
-
- #define subRound2(count) \
- { \
- temp = S( 5, A ) + f2( B, C, D ) + E + W[ count ] + K2; \
- E = D; \
- D = C; \
- C = S( 30, B ); \
- B = A; \
- A = temp; \
- }
-
- #define subRound3(count) \
- { \
- temp = S( 5, A ) + f3( B, C, D ) + E + W[ count ] + K3; \
- E = D; \
- D = C; \
- C = S( 30, B ); \
- B = A; \
- A = temp; \
- }
-
- #define subRound4(count) \
- { \
- temp = S( 5, A ) + f4( B, C, D ) + E + W[ count ] + K4; \
- E = D; \
- D = C; \
- C = S( 30, B ); \
- B = A; \
- A = temp; \
- }
-
- /* The two buffers of 5 32-bit words */
-
- LONG h0, h1, h2, h3, h4;
- LONG A, B, C, D, E;
-
- /* Initialize the SHS values */
-
- void shsInit( SHS_INFO *shsInfo )
- {
- /* Set the h-vars to their initial values */
- shsInfo->digest[ 0 ] = h0init;
- shsInfo->digest[ 1 ] = h1init;
- shsInfo->digest[ 2 ] = h2init;
- shsInfo->digest[ 3 ] = h3init;
- shsInfo->digest[ 4 ] = h4init;
-
- /* Initialise bit count */
- shsInfo->countLo = shsInfo->countHi = 0L;
- }
-
- /* Perform the SHS transformation. Note that this code, like MD5, seems to
- break some optimizing compilers - it may be necessary to split it into
- sections, eg based on the four subrounds */
-
- void shsTransform( SHS_INFO *shsInfo )
- {
- LONG W[ 80 ], temp;
- int i;
-
- /* Step A. Copy the data buffer into the local work buffer */
- for( i = 0; i < 16; i++ )
- W[ i ] = shsInfo->data[ i ];
-
- /* Step B. Expand the 16 words into 64 temporary data words */
- expand( 16 ); expand( 17 ); expand( 18 ); expand( 19 ); expand( 20 );
- expand( 21 ); expand( 22 ); expand( 23 ); expand( 24 ); expand( 25 );
- expand( 26 ); expand( 27 ); expand( 28 ); expand( 29 ); expand( 30 );
- expand( 31 ); expand( 32 ); expand( 33 ); expand( 34 ); expand( 35 );
- expand( 36 ); expand( 37 ); expand( 38 ); expand( 39 ); expand( 40 );
- expand( 41 ); expand( 42 ); expand( 43 ); expand( 44 ); expand( 45 );
- expand( 46 ); expand( 47 ); expand( 48 ); expand( 49 ); expand( 50 );
- expand( 51 ); expand( 52 ); expand( 53 ); expand( 54 ); expand( 55 );
- expand( 56 ); expand( 57 ); expand( 58 ); expand( 59 ); expand( 60 );
- expand( 61 ); expand( 62 ); expand( 63 ); expand( 64 ); expand( 65 );
- expand( 66 ); expand( 67 ); expand( 68 ); expand( 69 ); expand( 70 );
- expand( 71 ); expand( 72 ); expand( 73 ); expand( 74 ); expand( 75 );
- expand( 76 ); expand( 77 ); expand( 78 ); expand( 79 );
-
- /* Step C. Set up first buffer */
- A = shsInfo->digest[ 0 ];
- B = shsInfo->digest[ 1 ];
- C = shsInfo->digest[ 2 ];
- D = shsInfo->digest[ 3 ];
- E = shsInfo->digest[ 4 ];
-
- /* Step D. Serious mangling, divided into four sub-rounds */
- subRound1( 0 ); subRound1( 1 ); subRound1( 2 ); subRound1( 3 );
- subRound1( 4 ); subRound1( 5 ); subRound1( 6 ); subRound1( 7 );
- subRound1( 8 ); subRound1( 9 ); subRound1( 10 ); subRound1( 11 );
- subRound1( 12 ); subRound1( 13 ); subRound1( 14 ); subRound1( 15 );
- subRound1( 16 ); subRound1( 17 ); subRound1( 18 ); subRound1( 19 );
- subRound2( 20 ); subRound2( 21 ); subRound2( 22 ); subRound2( 23 );
- subRound2( 24 ); subRound2( 25 ); subRound2( 26 ); subRound2( 27 );
- subRound2( 28 ); subRound2( 29 ); subRound2( 30 ); subRound2( 31 );
- subRound2( 32 ); subRound2( 33 ); subRound2( 34 ); subRound2( 35 );
- subRound2( 36 ); subRound2( 37 ); subRound2( 38 ); subRound2( 39 );
- subRound3( 40 ); subRound3( 41 ); subRound3( 42 ); subRound3( 43 );
- subRound3( 44 ); subRound3( 45 ); subRound3( 46 ); subRound3( 47 );
- subRound3( 48 ); subRound3( 49 ); subRound3( 50 ); subRound3( 51 );
- subRound3( 52 ); subRound3( 53 ); subRound3( 54 ); subRound3( 55 );
- subRound3( 56 ); subRound3( 57 ); subRound3( 58 ); subRound3( 59 );
- subRound4( 60 ); subRound4( 61 ); subRound4( 62 ); subRound4( 63 );
- subRound4( 64 ); subRound4( 65 ); subRound4( 66 ); subRound4( 67 );
- subRound4( 68 ); subRound4( 69 ); subRound4( 70 ); subRound4( 71 );
- subRound4( 72 ); subRound4( 73 ); subRound4( 74 ); subRound4( 75 );
- subRound4( 76 ); subRound4( 77 ); subRound4( 78 ); subRound4( 79 );
-
- /* Step E. Build message digest */
- shsInfo->digest[ 0 ] += A;
- shsInfo->digest[ 1 ] += B;
- shsInfo->digest[ 2 ] += C;
- shsInfo->digest[ 3 ] += D;
- shsInfo->digest[ 4 ] += E;
- }
-
- #ifdef LITTLE_ENDIAN
-
- /* When run on a little-endian CPU we need to perform byte reversal on an
- array of longwords. It is possible to make the code endianness-
- independant by fiddling around with data at the byte level, but this
- makes for very slow code, so we rely on the user to sort out endianness
- at compile time */
-
- static void byteReverse( LONG *buffer, int byteCount )
- {
- LONG value;
- int count;
-
- byteCount /= sizeof( LONG );
- for( count = 0; count < byteCount; count++ )
- {
- value = ( buffer[ count ] << 16 ) | ( buffer[ count ] >> 16 );
- buffer[ count ] = ( ( value & 0xFF00FF00L ) >> 8 ) | ( ( value & 0x00FF00FFL ) << 8 );
- }
- }
- #endif /* LITTLE_ENDIAN */
-
- /* Update SHS for a block of data. This code assumes that the buffer size
- is a multiple of SHS_BLOCKSIZE bytes long, which makes the code a lot
- more efficient since it does away with the need to handle partial blocks
- between calls to shsUpdate() */
-
- void shsUpdate( SHS_INFO *shsInfo, BYTE *buffer, int count )
- {
- /* Update bitcount */
- if( ( shsInfo->countLo + ( ( LONG ) count << 3 ) ) < shsInfo->countLo )
- shsInfo->countHi++; /* Carry from low to high bitCount */
- shsInfo->countLo += ( ( LONG ) count << 3 );
- shsInfo->countHi += ( ( LONG ) count >> 29 );
-
- /* Process data in SHS_BLOCKSIZE chunks */
- while( count >= SHS_BLOCKSIZE )
- {
- memcpy( shsInfo->data, buffer, SHS_BLOCKSIZE );
- #ifdef LITTLE_ENDIAN
- byteReverse( shsInfo->data, SHS_BLOCKSIZE );
- #endif /* LITTLE_ENDIAN */
- shsTransform( shsInfo );
- buffer += SHS_BLOCKSIZE;
- count -= SHS_BLOCKSIZE;
- }
-
- /* Handle any remaining bytes of data. This should only happen once
- on the final lot of data */
- memcpy( shsInfo->data, buffer, count );
- }
-
- void shsFinal( SHS_INFO *shsInfo )
- {
- int count;
- LONG lowBitcount = shsInfo->countLo, highBitcount = shsInfo->countHi;
-
- /* Compute number of bytes mod 64 */
- count = ( int ) ( ( shsInfo->countLo >> 3 ) & 0x3F );
-
- /* Set the first char of padding to 0x80. This is safe since there is
- always at least one byte free */
- ( ( BYTE * ) shsInfo->data )[ count++ ] = 0x80;
-
- /* Pad out to 56 mod 64 */
- if( count > 56 )
- {
- /* Two lots of padding: Pad the first block to 64 bytes */
- memset( ( BYTE * ) &shsInfo->data + count, 0, 64 - count );
- #ifdef LITTLE_ENDIAN
- byteReverse( shsInfo->data, SHS_BLOCKSIZE );
- #endif /* LITTLE_ENDIAN */
- shsTransform( shsInfo );
-
- /* Now fill the next block with 56 bytes */
- memset( &shsInfo->data, 0, 56 );
- }
- else
- /* Pad block to 56 bytes */
- memset( ( BYTE * ) &shsInfo->data + count, 0, 56 - count );
- #ifdef LITTLE_ENDIAN
- byteReverse( shsInfo->data, SHS_BLOCKSIZE );
- #endif /* LITTLE_ENDIAN */
-
- /* Append length in bits and transform */
- shsInfo->data[ 14 ] = highBitcount;
- shsInfo->data[ 15 ] = lowBitcount;
-
- shsTransform( shsInfo );
- #ifdef LITTLE_ENDIAN
- byteReverse( shsInfo->data, SHS_DIGESTSIZE );
- #endif /* LITTLE_ENDIAN */
- }
-
- /* ----------------------------- SHS Test code --------------------------- */
-
- /* Size of buffer for SHS speed test data */
-
- #define TEST_BLOCK_SIZE ( SHS_DIGESTSIZE * 100 )
-
- /* Number of bytes of test data to process */
-
- #define TEST_BYTES 10000000L
- #define TEST_BLOCKS ( TEST_BYTES / TEST_BLOCK_SIZE )
-
- void main( void )
- {
- SHS_INFO shsInfo;
- time_t endTime, startTime;
- BYTE data[ TEST_BLOCK_SIZE ];
- long i;
-
- /* Test output data (this is the only test data given in the SHS
- document, but chances are if it works for this it'll work for
- anything) */
- shsInit( &shsInfo );
- shsUpdate( &shsInfo, ( BYTE * ) "abc", 3 );
- shsFinal( &shsInfo );
- if( shsInfo.digest[ 0 ] != 0x0164B8A9L || \
- shsInfo.digest[ 1 ] != 0x14CD2A5EL || \
- shsInfo.digest[ 2 ] != 0x74C4F7FFL || \
- shsInfo.digest[ 3 ] != 0x082C4D97L || \
- shsInfo.digest[ 4 ] != 0xF1EDF880L )
- {
- puts( "Error in SHS implementation" );
- exit( -1 );
- }
-
- /* Now perform time trial, generating MD for 10MB of data. First,
- initialize the test data */
- memset( data, 0, TEST_BLOCK_SIZE );
-
- /* Get start time */
- printf( "SHS time trial. Processing %ld characters...\n", TEST_BYTES );
- time( &startTime );
-
- /* Calculate SHS message digest in TEST_BLOCK_SIZE byte blocks */
- shsInit( &shsInfo );
- for( i = TEST_BLOCKS; i > 0; i-- )
- shsUpdate( &shsInfo, data, TEST_BLOCK_SIZE );
- shsFinal( &shsInfo );
-
- /* Get finish time and time difference */
- time( &endTime );
- printf( "Seconds to process test input: %ld\n", endTime - startTime );
- printf( "Characters processed per second: %ld\n", TEST_BYTES / ( endTime - startTime ) );
- }
- --
- pgut1@cs.aukuni.ac.nz || peterg@kcbbs.gen.nz || peter@nacjack.gen.nz
- or sling a bottle in the ocean, it'd be about as reliable as our
- email currently is.
-
-