home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!dtix!darwin.sura.net!jvnc.net!yale.edu!yale!mintaka.lcs.mit.edu!hal.gnu.ai.mit.edu!knight
- From: knight@hal.gnu.ai.mit.edu (Knight of Tossing and Turning)
- Newsgroups: sci.crypt
- Subject: mcrypt again
- Message-ID: <1992Aug18.233021.15859@mintaka.lcs.mit.edu>
- Date: 18 Aug 92 23:30:21 GMT
- Sender: news@mintaka.lcs.mit.edu
- Organization: MIT Laboratory for Computer Science
- Lines: 300
-
-
- In <1992Aug13.153842.24060@mintaka.lcs.mit.edu>
- knight@hal.gnu.ai.mit.edu (Knight of Tossing and Turning) says:
-
- >>This is the post to the creation method of the "Encryption Grid" of
- >>"Mass Encryption" aka mcrypt.
- >> example deleted
-
- > I'm afraid that I do not understand this fully. Mass Encryption was
- > earlier supposed to be a byte Simple Substitution followed by
- > bit-transposition on a 1K block:
-
- >> ...stuff deleted
-
- > How does the User Key create the *two* necessary permutations?
- > Or does the cipher simply re-use the *same* permutation for *both*
- > the substitution and transposition stages? But how can this be?
- > The Simple Substitution on bytes needs 256 entries of 0..255; a
- > complete bit-permutation for a 1K block needs 8192 entries of
- > 0..8191.
-
- As I explained earlier, I figured everyone knew what a simple
- substitution looked like, so I wouldn't waste time with it. Basically,
- I was going to use a program which is basically the algorythm used
- for shuffling decks of cards in Video Poker programs, just using some
- elements of the password line to generate them. I figured that this
- method would be safe as long as I didn't generate the bit transposition
- phase and the simple-substitution phase the exact same way or in some
- way directly linked outside of the password field. In other words, bits
- are used in the transposition phase, and integers and positioning are
- used in the simple-substitution phase. Therefore, not directly linked by
- anything but the password.
-
- >> For password string "X", and distribution of "X" in a 32 * 32
- >> matrix is represented by the function GENKEY, . . .
-
- > OK, I think I see how the process works now:
- > ... accurate stuff deleted
-
- > It also means that adjacent bits in the input stream are about
- > 50% likely to be adjacent when enciphered; 25% likely to be separated
- > by one bit; 12.5% likely to be separated by two bits, etc. This
- > is in no sense a general permutation; it is better described as
- > key-controlled multiplexing or "braiding" (to recall the discussion
- > we had last year) without either the random control or the noise data
- > channel.
-
- Your right, I had this problem proving my system earlier. I wrote
- a program which took into consideration the rules which I generated
- and came up with a number in the whereabouts of 2^54. Since then,
- I've been debating if I should remove the required 0<x<255 rule.
- Removal of this would cause the permutations to jump dramatically.
-
- If you are off by a single bit, the entire encryption will likely be
- ruined and the simple substitution will cause a 1-in-256 chance of
- anything you've generated being accurate at all. It is basically
- a deck of cards -- pull out the wrong bit and the whole scheme will
- collapse.
-
- I wrote a program to solve for this series of numbers, I posted the
- results of the program to sci.crypt several days back.
-
- > On average, 16 bits of ciphertext contain two intermixed 8-bit
- > sequences; take 32 bits of ciphertext and you are sure to get a couple
- > of complete plaintext characters. In other words, the diffusion is
- > very weak, and diffusion is all that transposition does.
-
- Okay, lets get more to the heart of what you really were trying to
- point out -- and that was the arrangement must be stream-like, and
- therefore has a fairly finite method of extraction from 32 bits.
-
- First of all, lets say you extract 3 bytes of data from 32 bits of
- ciphertext. First of all, some of that data will be sequential to
- your needs. That is -- in order of extraction. However, some of that
- plaintext will be out of order -- after inverse takes place. Therefore,
- bytes 1, 2, and 3 could be mixed with bytes 678, 679, and 680. This
- is severely "out of phase" since you haven't even determined what bytes
- 4-677 are supposed to be yet.
-
- To repeatitively "take guesses" would be 95*95 (if in password field)
- or 10,355 ways to extract the very first BYTE, the first piece of
- pseudo-plaintext, and THAT could be one of 256 values, because of the
- simple-substitution made earlier. Therefore, the ultimate chances of
- extracting the very first byte, in accordance to the "system" which
- will extract all additional bytes accurately, is 1 in 2,650,880! And
- then you will have to guess the NEXT byte so that it follows the exact
- same parameters, which cannot be derrived from the first byte.
-
- Yes, I realize that there are only 256 possible numbers for the first
- byte, but this the encryption is highly structured. To take the first
- byte that "matches" will doubtless prevent the extraction of any other
- byte accurately -- you'd be "making it up" as you go along.
-
- > Moreover, the input data separations (gaps between the runs) are
- > not random, indeed, they *make sense* in the underlying reference
- > grid; they are, in fact, ASCII letter bit positions. Known-plaintext
- > (after any substitution) may well recover these letters, and thus the
- > actual User Key itself. Any phrase-like structure in the User Key
- > will help to fill in gaps in the cryptanalysis.
-
- Yes, but this is basically true of any encryption. However, this
- isn't ALWAYS true. Hence:
-
- 1 2 3 4
- 1 C H I N
- 2 A *
- 3
- 4
-
- The "*" was a distinct, NON-ASCII character. Therefore, to assume you
- can treat any series at all as only ASCII bit positions, you would have
- the problem dealing with the length of the password itself.
-
- And how well do they make sense? Lets take the following pattern that
- I will agree to: for the first 6 digits (assuming a 6 digit minimum)
- the high (7) bit will be always "0". After digit 6, the number could
- be ASCII or the NON-ASCII value (190 possibilities). After that, it
- could be hashed to a number as high as 254 or as low as 1. So the
- "pattern" permutations jump right where most people start placing a
- password! You'd need as few as 7 digits to watch it climb past the
- "ascii" assumption. By lowering the number of digits "necessary" in
- my program, it causes the permutations to jump all the sooner. It
- would be maximized by a single letter password.
-
- > One remaining issue is how strong the Simple Substitution is at
- > preventing the information necessary for attack. Usually Simple
- > Substitution is considered very weak, and it will be much weaker
- > if the same "reference grid" is used for both substitution and
- > mixing. Apparently the substitution step was not displayed in the
- > "debug mode" output, so it is hard to know how this is done.
-
- Sorry, this step wasn't even in my program when I originally placed
- was requested to show my idea to "sci.crypt". I'll have this placed
- for all to view (and my source code) probably within the week, provided
- my relatives ever leave the house (and my telephone ever starts working
- right for a consistant period of time.)
-
- Even though simple substitution is weak, it was included for two
- basic reasons:
-
- 1) It adds doubt to anything removed out of the "bit transposition"
- phase (eliminating the bulk of the plaintext prior to the really
- tough(?) encryption.) Unless you have the real password, you
- will have to guess as to the result of this step will be.
-
- 2) It prevents the argument that if the encrypted file has the same
- number of bits as the unencrypted file, and the two files were
- of large enough proportions, they were the same file.
-
- >> The rest of the grid can be generated from happenstance values
- >>from manipulation of characters from the password characters that
- >>make up X. A few formulas would be:
- >> .. pointless hash functions deleted.
-
- > These formulas simply to expand the User Key. They do not produce
- > independent results; the results are completely defined by the
- > User Key. Thus, the maximum possible uniqueness is the size of the
- > User Key alphabet (95) to the power of the size of the User Key
- > string (1024), or 95^1024. Of course, since only 1024 bits are
- > used to encipher a block, there can be no more than 2^1024
- > possibilities. Still huge, to be sure, but note the difference
- > from the claim:
-
- >> Total permutations for the GENKEY() function = 95 ^ (1024!).
-
- + Deleted 2^1024 vs 2^8192 harp, since everyone attacked it already.
-
- Also, my claim of 95^1024! is due to the fact that length is provided
- in the password "key grid" by placing of the non-ascii value. Therefore,
- for each limited length string between 1 and 1024 characters long, there
- exists a unique key grid -- which is reminicant of a factorial value.
- Admittably, I wish I had some computer to tell me which of the two numbers
- is smaller, but I can generate the factorial value inside of the 2^8192
- grid, but only because I "trick" the "1024" position into not having to
- supply me with a length. (Hence, if all 1024 bytes are ascii, then I
- don't have to place a non-ascii value to tell me the length.)
-
- Looks to me as if my claim still holds valid.
-
- > Few users will have a 1024-character User Key, so the average
- > uniqueness will be much lower. The multiplexing will not be
- > well-distributed or random, because a User Key of any size will
- > probably be a language phrase, and that plaintext phrase will control
- > the multiplexer directly. The expansion formulas may well be used to
- > expand any cryptanalytic "breaks" to multiple places within the block.
- > The expansion formulas themselves may relate individual User Key
- > characters to multiple locations in the block, yielding multiple
- > attack possibilities (it may be possible to attack individual User
- > Key characters this way). For icing on the cake, the plaintext User
- > Key itself will exist in a known position the "reference grid," just
- > begging to be the first thing attacked. And this is not to mention
- > anagramming, the classical attack on transposition.
-
- A-HA! Great points! Let me see if I can defend against this:
-
- 1) Like all 'password' based encryptions, attacks against language
- phrases by using phonics can be used to reduce the total number of
- "attempts" required IF the word is indeed a word. Furthermore,
- the "BUG" story comes into effect as far as people using really
- small passwords, as many people prefer to do that.
-
- To quote Helen Gaines's book "Cryptanalysis", p3:
-
- "Even a basically weak cipher, in the hands of an expert, can be
- made to serve its purpose; and the strongest can be made useless
- when inproperly used."
-
- 2) Okay, admittably everything along the line is basically a
- series starting with 95 ways times several possibilties that are
- approximately 95 ways until reaching value 1024. But this is
- the limiting factor, which is still a huge number. Even a six
- digit password would be 95*95*95*95*95*95, or 95^6, which is
- basically 735,091,890,625 possibilities. That may look fairly
- weak, but if your not sure about the length of the password, the
- 7 digit possibilities- 69833729609375 and the 8 digit possibilties
- - 6634204312840625, are basically enough to make "standard"
- brute force unfeasable against those attempts.
-
- DES encryption has the problem that two strings could generate
- the same result. I believe this also holds true with RSA
- due to the relatively small number of permutations compared to
- possible passwords that can be entered.
-
- 3) Attack against the fact the password itself is used in the
- "key" grid. Lets consider the following bits were placed using
- mcrypt, and that these bits represent the area of the "grid"
- where the password (in ASCII format) was used to generate the
- positioning of these bits, find the password:
-
- POSITION
-
- byte 1 byte 2 byte 3 byte 4 byte 5 byte 6 byte 7
- 01234567 01234567 01234567 01234567 01234567 01234567 01234567
-
- D: 01001011 11011011 11011001 00100110 11001000 11101110 10100001
-
- D = Encrypted data.
-
- Starting from scratch, not knowing what the data inside contains,
- knowing it could very well have been compressed prior to
- encryption, knowing that the values of the bytes have been altered
- prior to bit transposition, how do you extract the password
- using nothing but that data? If the data could be anything,
- then the password itself likewise could be anything. How long
- is the password? Is it 6 digits? Is it 7 digits? Is it larger
- then the segment I removed? I took VERY careful consideration to
- make sure that "clues" to the password never made it into the
- encrypted file.
-
- I didn't think that using the ascii characters themselves
- in the encryption grid is a problem. Sure, it looks vulernable when
- you stare at them on paper, but after the file is encrypted, the
- entire thing is discarded completely. I figured it was the opposite,
- a STRENGTH to the encryption. That way, no matter what, I was
- guarenteed the maximum number of keys generatable by ASCII.
-
- > But the real problem is that the underlying multiplexing step
- > is so very weak.
-
- An eight digit password has 6,634,204,312,840,625 possible arrangements
- of bits. This doesn't include passwords of earlier length. If you
- mess up but a single BIT anywhere in that combination, the rest of the
- encryption will fall apart and you will unlikely be able to generate
- any provable plaintext from the entire system (due to the simple
- substitution.)
-
- > One might argue that the multiplexing still provides 2^[8192] options
- > for the data block (although I argue that correlation effects greatly
- > reduce this value). But few would disagree that each additional
- > block enciphered in the same way would greatly improve the ability
- > to attack the cipher.
-
- Every "additional" block encrypted will cause extensive damage to the
- I/O usage and DRAM required to analyze the encryption. I'm not sure
- if you are aware of what components of the computer are slower and faster
- then others, but the more you rely on disk drive seeks and calls to
- stored data in Dynamic Memory, the less likely your program has a chance
- to ever finish. More calls to the operating system to handle that, the
- more slowdown you will receive, also. "Hardware" protected? Maybe.
-
- This brings about a point not covered recently, and that is the lack of
- a "verification" system. Basically, Mcrypt has no way of telling
- if the data was properly extracted. You will have to create your own
- methods, which adds considerable amounts of time.
-
- > References
- >
- > [1] Geffe, P. 1973. How to protect data with ciphers that are
- > really hard to break. Electronics. January 4. 99-101.
- >
- > [2] Siegenthaler, T. 1985. Decrypting a Class of Stream Ciphers
- > Using Ciphertext Only. IEEE Transactions on Computers.
- > C34: 81-85.
-
- Will try to get, thank you.
-
- > ---
- > Terry Ritter ritter@cactus.org
-
- --- Eric Knight, knight@gnu.ai.mit.edu
-