Encoding
Text can be encoded in multiple ways. Most (older) text files use an
encoding named ANSI, which has room for only a limited number of
different characters but is often sufficient to display all the text.
Unicode encodings, however, cover a much larger set of characters, so
a single file can contain many languages at once, at the cost of a
larger file size. Notepad++ automatically tries to detect the encoding
when it opens a file, and lets you change it while editing. To change
only the displayed encoding (without modifying the actual file
contents), select one of the "Encode in ..." options from the Format
menu. To convert the text to a certain encoding, select one of the
"Convert to ..." options in the Format menu.
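The difference between changing the displayed encoding and converting the text can be sketched in a few lines of Python. This is only an illustration of the idea, not how Notepad++ is implemented; the sample word and the choice of Windows-1252 as the "ANSI" code page are assumptions made for the example.

```python
# Illustration only: "caf\u00e9" (cafe with an accented e) and the
# Windows-1252 code page are assumptions chosen for this sketch.
text = "caf\u00e9"
raw = text.encode("cp1252")  # the bytes as stored in an "ANSI" file

# Changing the displayed encoding: the same bytes are read with a
# different decoder. The stored bytes do not change, only what is shown.
shown_as_ansi = raw.decode("cp1252")
shown_as_utf8 = raw.decode("utf-8", errors="replace")  # last byte is not valid UTF-8

# Converting: the text stays the same, but the bytes are rewritten.
converted = raw.decode("cp1252").encode("utf-8")

print(ascii(shown_as_ansi))  # the original text
print(ascii(shown_as_utf8))  # ends in U+FFFD, the replacement character
print(raw, converted)        # one byte for the accented letter vs. two
```

Reading the unchanged bytes with the wrong decoder garbles the accented letter, while converting keeps the text intact and changes the bytes instead.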
It can happen that a file is saved with a certain encoding, but is
detected as another encoding when it is reopened in Notepad++. This is
a technical limitation: different encodings can produce exactly the
same bytes for the same text, so the file alone does not reveal which
one was used. This is most noticeable if the file is saved without a
BOM (Byte Order Mark) indicating the encoding used.
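A short Python sketch shows why detection can be ambiguous and how a BOM removes the ambiguity. The helper name sniff_bom is hypothetical, it covers only the encodings listed in this section, and it is not the detection logic Notepad++ actually uses.

```python
import codecs

# Plain ASCII text gives byte-for-byte identical files in ANSI
# (Windows-1252 is assumed here) and in UTF-8 without a BOM.
sample = "plain ASCII text"
same_bytes = sample.encode("cp1252") == sample.encode("utf-8")

# BOM byte sequences for the encodings covered in this section.
KNOWN_BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),
)

def sniff_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none.

    Hypothetical helper for illustration; other encodings (for example
    UTF-32) are deliberately not handled.
    """
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None

print(same_bytes)                            # True
print(sniff_bom(sample.encode("utf-8-sig"))) # UTF-8
print(sniff_bom(sample.encode("utf-8")))     # None
```

Without a BOM the function has nothing to go on, and any detector has to fall back on guessing from the content.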
Notepad++ offers the following encoding schemes:
ANSI : Older encoding. Smallest file size, but error-prone because the meaning of the bytes depends on the code page in use.
UTF-8 : Unicode encoding. ASCII characters (most Western text) take one byte each; other characters take two to four bytes. A three-byte BOM is added on save.
UTF-8 without BOM : Like UTF-8, but no BOM is added. Saves three bytes, but makes encoding detection harder.
UTF-16 Little Endian : Most characters take two bytes, stored in little-endian order; characters outside the Basic Multilingual Plane take four. A two-byte BOM is added on save.
UTF-16 Big Endian : Most characters take two bytes, stored in big-endian order; characters outside the Basic Multilingual Plane take four. A two-byte BOM is added on save.
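The size differences between these encodings can be checked with a small Python sketch. The sample characters are arbitrary choices for illustration, and Python's codecs are used here only as a stand-in for what an editor writes to disk.

```python
import codecs

# Arbitrary sample characters, chosen for illustration.
samples = {
    "ASCII letter": "A",
    "accented Latin letter": "\u00e9",
    "CJK character": "\u4e2d",
    "emoji": "\U0001F600",
}

# Bytes per character as (UTF-8, UTF-16), BOM not included.
sizes = {
    label: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for label, ch in samples.items()
}

# Length of the BOM that is written at the start of the file.
bom_lengths = {
    "UTF-8": len(codecs.BOM_UTF8),
    "UTF-16 Little Endian": len(codecs.BOM_UTF16_LE),
    "UTF-16 Big Endian": len(codecs.BOM_UTF16_BE),
}

for label, (utf8_len, utf16_len) in sizes.items():
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
for name, length in bom_lengths.items():
    print(f"{name} BOM: {length} bytes")
```

For text that is mostly ASCII, UTF-8 is therefore the more compact of the two Unicode encodings, while UTF-16 can be smaller for text made up mainly of CJK characters.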