10.5. Cryptographic Algorithms and Protocols

Often cryptographic algorithms and protocols are necessary to keep a system secure, particularly when communicating through an untrusted network such as the Internet. Where possible, use cryptographic techniques to authenticate information and keep the information private (but don't assume that simple encryption automatically authenticates as well). Generally you'll need to use a suite of available tools to secure your application.

For background information and code, you should probably look at the classic text ``Applied Cryptography'' [Schneier 1996]. The newsgroup ``sci.crypt'' has a series of FAQ's; you can find them at many locations, including http://www.landfield.com/faqs/cryptography-faq. Linux-specific resources include the Linux Encryption HOWTO at http://marc.mutz.com/Encryption-HOWTO/. A discussion on how protocols use the basic algorithms can be found in [Opplinger 1998]. A useful collection of papers on how to apply cryptography in protocols can be found in [Stallings 1996]. What follows here is just a few comments; these areas are rather specialized and covered more thoroughly elsewhere.

Cryptographic protocols and algorithms are difficult to get right, so do not create your own. Instead, where you can, use protocols and algorithms that are widely-used, heavily analyzed, and accepted as secure. When you must create anything, give the approach wide public review and make sure that professional security analysts examine it for problems. In particular, do not create your own encryption algorithms unless you are an expert in cryptology, know what you're doing, and plan to spend years in professional review of the algorithm. Creating encryption algorithms (that are any good) is a task for experts only.

A number of algorithms are patented; even if the owners permit ``free use'' at the moment, without a signed contract they can always change their minds later, putting you at extreme risk later. In general, avoid all patented algorithms - in nearly all cases there's an unpatented approach that is at least as good or better technically, and by doing so you avoid a large number of legal problems.

10.5.1. Cryptographic Protocols

For protocols, try to use standard-conforming protocols such as SSL (soon to be TLS), SSH, IPSec, GnuPG/PGP, and Kerberos. Many of these overlap somewhat in functionality, but each has a ``specialty'' niche. SSL (soon to be TLS) is the primary method for protecting http (web) transactions. PGP-compatible protocols (implemented in PGP and GnuPG) are a primary method for securing email end-to-end. Kerberos is a primary method for securing and supporting authentication on a LAN, and for establishing shared secrets (thus, it needs to be used with other algorithms for the actual protection of communication). SSH is the primary method of securing ``remote terminals'' over an internet, e.g., telnet-like and X windows connections, though it's often used for securing other data streams too (such as CVS accesses). Note that there are two major versions of the SSH protocol, and there are several choices for key types and so on; see its documentation for more information. OpenSSH is an open source implementation of SSH. IPSec is the primary method for securing lower-level packets and ``all'' packets, so it's particularly useful for securing virtual private networks and remote machines. The new version of the Internet Protocol, IPv6, comes with IPSec ``built in,'' but IPSec also works with the more common IPv4 protocol.

Many of these protocols allow you to select a number of different algorithms, so you'll still need to pick reasonable defaults for algorithms (e.g., for encryption).

10.5.2. Symmetric Key Encryption Algorithms

The use, export, and/or import of implementations of encryption algorithms are restricted in many countries, and the laws can change quite rapidly. Find out what the rules are before trying to build applications using cryptography.

For secret key (bulk data) encryption algorithms, use only encryption algorithms that have been openly published and withstood years of attack, and check on their patent status. I would recommend using the new Advanced Encryption Standard (AES), also known as Rijndahl -- a number of cryptographers have analyzed it and not found any serious weakness in it, and I believe it has been through enough analysis to be trustworthy now. A good alternative to AES is the Serpent algorithm, which is slightly slower but is very resistant to attack. For many applications triple-DES is a very good encryption algorithm; it has a reasonably lengthy key (112 bits), no patent issues, and a very long history of withstanding attacks (it's withstood attacks far longer than any other encryption algorithm with reasonable key length in the public literature). However, triple-DES is very slow when implemented in software, so triple-DES can be considered ``safest but slowest.'' Twofish appears to be a good encryption algorithm, but there are some lingering questions - Sean Murphy and Fauzan Mirza showed that Twofish has properties that cause many academics to be concerned (though as of yet no one has managed to exploit these properties). MARS is highly resistent to ``new and novel'' attacks, but it's more complex and is impractical on small-ability smartcards. For the moment I would avoid Twofish - it's quite likely that this will never be exploitable, but it's hard to be sure and there are alternative algorithms which don't have these concerns. Don't use IDEA - it's subject to U.S. and European patents. Don't use stupid algorithms such as XOR with a constant or constant string, the ROT (rotation) scheme, a Vinegere ciphers, and so on - these can be trivially broken with today's computers. Don't use ``double DES'' (using DES twice) - that's subject to a ``man in the middle'' attack that triple-DES avoids. Your protocol should support multiple encryption algorithms, anyway; that way, when an encryption algorithm is broken, users can switch to another one.

For symmetric-key encryption (e.g., for bulk encryption), don't use a key length less than 90 bits if you want the information to stay secret through 2016 (add another bit for every additional 18 months of security) [Blaze 1996]. For encrypting worthless data, the old DES algorithm has some value, but with modern hardware it's too easy to break DES's 56-bit key using brute force. If you're using DES, don't just use the ASCII text key as the key - parity is in the least (not most) significant bit, so most DES algorithms will encrypt using a key value well-known to adversaries; instead, create a hash of the key and set the parity bits correctly (and pay attention to error reports from your encryption routine). So-called ``exportable'' encryption algorithms only have effective key lengths of 40 bits, and are essentially worthless; in 1996 an attacker could spend $10,000 to break such keys in twelve minutes or use idle computer time to break them in a few days, with the time-to-break halving every 18 months in either case.

Block encryption algorithms can be used in a number of different modes, such as ``electronic code book'' (ECB) and ``cipher block chaining'' (CBC). In nearly all cases, use CBC, and do not use ECB mode - in ECB mode, the same block of data always returns the same result inside a stream, and this is often enough to reveal what's encrypted. Many modes, including CBC mode, require an ``initialization vector'' (IV). The IV doesn't need to be secret, but it does need to be unpredictable by an attacker. Don't reuse IV's across sessions - use a new IV each time you start a session.

There are a number of different streaming encryption algorithms, but many of them have patent restrictions. I know of no patent or technical issues with WAKE. RC4 was a trade secret of RSA Data Security Inc; it's been leaked since, and I know of no real legal impediment to its use, but RSA has often threatened court action against users of it (it's not at all clear what RSA could do, but no doubt they could tie up users in worthless court cases). If you use RC4, use it as intended - in particular, always discard the first 256 bytes it generates, or you'll be vulnerable to attack. SEAL is patented by IBM - so don't use it. SOBER is patented; the patent owner has claimed that it will allow many uses for free if permission is requested, but this creates an impediment for later use. Even more interestingly, block encryption algorithms can be used in modes that turn them into stream ciphers, and users who want stream ciphers should consider this approach (you'll be able to choose between far more publicly-available algorithms).

10.5.3. Public Key Algorithms

For public key cryptography (used, among other things, for signing and sending secret keys), there are only a few widely-deployed algorithms. One of the most widely-used algorithms is RSA; RSA's algorithm was patented, but only in the U.S., and that patent expired in September 2000, so RSA can be freely used. Never decrypt or sign a raw value that an attacker gives you directly using RSA and expose the result, because that could expost the private key (this isn't a problem in practice, because most protocols involve signing a hash computed by the user - not the raw value - or don't expose the result). Never decrypt or sign the exact same raw value multiple times (the original can be exposed). Both of these can be solved by always adding random padding (PGP does this) - the usual approach is called Optimal Asymmetric Encryption Padding (OAEP).

The Diffie-Hellman key exchange algorithm is widely used to permit two parties to agree on a session key. By itself it doesn't guarantee that the parties are who they say they are, or that there is no middleman, but it does strongly help defend against passive listeners; its patent expired in 1997. If you use Diffie-Hellman to create a shared secret, be sure to hash it first (there's an attack if use its shared value directly).

NIST developed the digital signature standard (DSS) (it's a modification of the ElGamal cryptosystem) for digital signature generation and verification; one of the conditions for its development was for it to be patent-free.

RSA, Diffie-Hellman, and El Gamal's techniques require more bits for the keys for equivalent security compared to typicaly symmetric keys; a 1024-bit key in these systems is roughly equivalent to an 80-bit symmetric key, and in my opinion that is the minimum you should use today. If you need very few bits for a public key, then you might use elliptic curve cryptography (IEEE P1363 has some suggested curves; finding curves is hard). However, be careful - elliptic curve cryptography isn't patented, but certain speedup techniques are patented (elliptic curve is fast enough that it really doesn't need these speedups anyway for its usual use of encrypting session / bulk encryption keys).

10.5.4. Cryptographic Hash Algorithms

Some programs need a one-way cryptographic hash algorithm, that is, a function that takes an ``arbitrary'' amount of data and generates a fixed-length number that hard for an attacker to invert (e.g., it's difficult for an attacker to create a different set of data to generate that same value). For a number of years MD5 has been a favorite, but recent efforts have shown that its 128-bit length may not be enough [van Oorschot 1994] and that certain attacks weaken MD5's protection [Dobbertin 1996]. Indeed, there are rumors that a top industry cryptographer has broken MD5, but is bound by employee agreement to keep silent (see the Bugtraq 22 August 2000 posting by John Viega). Anyone can create a rumor, but enough weaknesses have been found that the idea of completing the break is plausible. If you're writing new code, use SHA-1 instead of MD5. Don't use the original SHA (now called ``SHA-0''); SHA-0 had the same weakness that MD5 does. If you need more bits in your hash algorithm, use SHA-256, SHA-384, or SHA-512; you can get the specifications in NIST FIPS PUB 180-2.

10.5.5. Integrity Checking

When communicating, you need some sort of integrity check (don't depend just on encryption, since an attacker can then induce changes of information to ``random'' values). This can be done with hash algorithms, but don't just use a hash function directly (this exposes users to an ``extension'' attack - the attacker can use the hash value, add data of their choosing, and compute the new hash). The usual approach is ``HMAC'', which computes the integrity check as
  H(k xor opad, H(k xor ipad, data)).
where H is the hash function (typically MD5 or SHA-1) and k is the key. Thus, integrity checks are often HMAC-MD5 or HMAC-SHA-1. Note that although MD5 has some weaknesses, as far as I know MD5 isn't vulnerable when used in this construct, so HMAC-MD5 is (to my knowledge) okay. This is defined in detail in IETF RFC 2104.

Note that in the HMAC approach, a receiver can forge the same data as a sender. This isn't usually a problem, but if this must be avoided, then use public key methods and have the sender ``sign'' the data with the sender private key - this avoids this forging attack, but it's more expensive and for most environments isn't necessary.

10.5.6. Other Cryptographic Issues

You should both encrypt and include integrity checks of data that's important. Don't depend on the encryption also providing integrity - an attacker may be able to change the bits into a different value, and although the attacker may not be able to change it to a specific value, merely changing the value may be enough. In general, you should use different keys for integrity and secrecy, to avoid certain subtle attacks.

One issue not discussed often enough is the problem of ``traffic analysis.'' That is, even if messages are encrypted and the encryption is not broken, an adversary may learn a great deal just from the encrypted messages. For example, if the presidents of two companies start exchanging many encrypted email messages, it may suggest that the two comparies are considering a merger. For another example, many SSH implementations have been found to have a weakness in exchanging passwords: observers could look at packets and determine the length (or length range) of the password, even if they couldn't determine the password itself. They could also also determine other information about the password that significantly aided in breaking it.

Be sure to not make it possible to solve a problem in parts, and use different keys when the trust environment (who is trusted) changes. Don't use the same key for too long - after a while, change the session key or password so an adversary will have to start over.

Generally you should compress something you'll encrypt - this does add a fixed header, which isn't so good, but it eliminates many patterns in the rest of the message as well as making the result smaller, so it's usually viewed as a ``win'' if compression is likely to make the result smaller.

In a related note, if you must create your own communication protocol, examine the problems of what's gone on before. Classics such as Bellovin [1989]'s review of security problems in the TCP/IP protocol suite might help you, as well as Bruce Schneier [1998] and Mudge's breaking of Microsoft's PPTP implementation and their follow-on work. Again, be sure to give any new protocol widespread public review, and reuse what you can.