Comparing English files coded in ASCII
You may easily use WinMerge.exe. You may also use WinMergeU.exe (if you are on NT/2000/XP/2003), but it will be slower due to encoding conversion.
Comparing western European files coded in the common Windows codepage
(See FAQ: How do I tell what encoding my file uses) You may easily use WinMerge.exe. You may also use WinMergeU.exe (if you are on NT/2000/XP/2003), but it will be slower due to encoding conversion.
Comparing Unicode files
You should use WinMergeU.exe (if you are on NT/2000/XP/2003) as it will be at least as fast, and may be faster than WinMerge.exe, in this scenario. Also, WinMergeU.exe correctly handles UTF-8 files, unlike WinMerge.exe. If you are on Win95/98/ME, or if your files are all in English or a western European language, you may get by with WinMerge.exe.
Comparing multilingual files
You should use WinMergeU.exe (if you are on NT/2000/XP/2003) in order to be able to see the characters from different languages simultaneously. If you are on Win95/98/ME, then you may attempt to view the files in WinMerge.exe, but it is recommended that you do not attempt to merge them.
Comparing East Asian files
You should use WinMergeU.exe (if you are on NT/2000/XP/2003) in order to correctly display double-wide characters. See the Font section for font recommendations. If you are on Win95/98/ME, then you may attempt to view the files in WinMerge.exe, but it is recommended that you do not attempt to merge them.
Characters (such as "a" or "1" or "&") are represented by computers as numbers, and there is more than one way to do this. A text encoding is a way to map characters to the numbers (bytes) that make up a file. The main text encodings and encoding families are:
ASCII
Each character is one byte in the file. This is an old encoding which only supports English letters, numbers, and punctuation.
Windows codepages
Each character is one byte in the file. These are used by default by versions of the Windows operating system. Examples include:
1252 (windows-1252) Western European (supports many western European languages including English, French, Spanish, and German)
1251 (windows-1251) Cyrillic (supports Cyrillic languages including Russian and Bulgarian)
1250 (windows-1250) Central European (supports central European Latin-script languages including Polish)
1253 (windows-1253) Greek (supports Greek)
ISO codepages
Each character is one byte in the file. These have names such as "ISO-8859-1". Some windows codepages approximate some of these. Examples include:
ISO-8859-1 ("Latin-1") This is approximated by windows-1252
UCS-2LE (Microsoft's style of Unicode)
Each character is two bytes in the file, least significant byte first. This supports the 16-bit range of Unicode 2.0, which covers most of the characters in common use.
UTF-8
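Each character may be one, two, three, or four bytes in the file. (The original UTF-8 design allowed sequences of up to six bytes, but the current standard, RFC 3629, limits them to four.) This supports the full range of Unicode.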
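The byte counts described above can be checked with a short script. The following is only an illustrative sketch in Python and is not part of WinMerge; the names SAMPLES, ENCODINGS, encoded_length, and byte_table are made up for this example. Python has no codec named UCS-2LE, so the sketch assumes "utf-16-le" as a stand-in, which produces the same bytes for characters that fit in 16 bits.

```python
# Illustrative sketch (not WinMerge code): how many bytes does each
# sample character occupy in each of the encodings described above?

# Sample characters, written as escapes so the script itself is pure ASCII.
SAMPLES = [
    "a",        # plain English letter
    "\u00e9",   # e with acute accent (western European)
    "\u0416",   # Cyrillic letter ZHE
    "\u20ac",   # euro sign
    "\u4e2d",   # CJK ideograph (East Asian)
]

# Assumption: "utf-16-le" stands in for UCS-2LE, since the two agree
# for every character that fits in 16 bits.
ENCODINGS = ["ascii", "cp1252", "iso-8859-1", "utf-16-le", "utf-8"]


def encoded_length(ch, encoding):
    """Return the number of bytes ch takes in encoding, or None if the
    encoding cannot represent the character at all."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


def byte_table(samples=SAMPLES, encodings=ENCODINGS):
    """Build {character: {encoding: byte count or None}}."""
    return {
        ch: {enc: encoded_length(ch, enc) for enc in encodings}
        for ch in samples
    }


if __name__ == "__main__":
    for ch, row in byte_table().items():
        # Print the code point rather than the character so the output
        # does not depend on the console's own encoding.
        print("U+%04X" % ord(ch), row)
```

In this sketch the euro sign is encodable in cp1252 but not in iso-8859-1, which is one of the places where the Windows codepage only approximates the ISO one. The single-byte encodings report None for the Cyrillic and East Asian samples, which is why such files need a different codepage or a Unicode encoding.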