Exim performs various transformations on the sender and recipient addresses of all messages that it handles, and also on the messages' header lines. Some of these are optional and configurable, while others always take place. All of this processing, except rewriting as a result of routing, and the addition or removal of header lines while delivering, happens when a message is received, before it is placed on Exim's queue.
Some of the automatic processing takes place only for “locally-originated” messages. This adjective is used to describe messages that are not received over TCP/IP, but instead are passed to an Exim process on its standard input. This includes the interactive “local SMTP” case that is set up by the -bs command line option. Note: messages received over TCP/IP on the loopback interface (127.0.0.1 or ::1) are not considered to be locally-originated. Exim does not treat the loopback interface specially in any way.
RFC 2821 specifies that CRLF (two characters: carriage-return, followed by linefeed) is the line ending for messages transmitted over the Internet using SMTP over TCP/IP. However, within individual operating systems, different conventions are used. For example, Unix-like systems use just LF, but others use CRLF or just CR.
Exim was designed for Unix-like systems, and internally, it stores messages using the system's convention of a single LF as a line terminator. When receiving a message, all line endings are translated to this standard format. Originally, it was thought that programs that passed messages directly to an MTA within an operating system would use that system's convention. Experience has shown that this is not the case; for example, there are Unix applications that use CRLF in this circumstance. For this reason, and for compatibility with other MTAs, the way Exim handles line endings for all messages is now as follows:
LF not preceded by CR is treated as a line ending.
CR is treated as a line ending; if it is immediately followed by LF, the LF is ignored.
The sequence “CR, dot, CR” does not terminate an incoming SMTP message, nor a local message in the state where a line containing only a dot is a terminator.
If a bare CR is encountered within a header line, an extra space is added after the line terminator so as not to end the header line. The reasoning behind this is that bare CRs in header lines are most likely either to be mistakes, or people trying to play silly games.
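The four rules above can be sketched in Python as follows. This is a minimal illustration and not Exim's own code: the function name normalize_line_endings and the in_header flag are assumptions made for the example, and the SMTP dot-terminator rule is not modelled.

```python
def normalize_line_endings(data: bytes, in_header: bool = False) -> bytes:
    """Translate CRLF, bare CR and bare LF line endings to a single LF.

    Hypothetical helper: `in_header` tells the sketch whether the data
    belongs to the header section, where a bare CR must not end the header
    line.
    """
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == 0x0D:  # CR is always treated as a line ending
            out.append(0x0A)
            if i + 1 < n and data[i + 1] == 0x0A:
                i += 1  # CR immediately followed by LF: the LF is ignored
            elif in_header:
                # Bare CR inside a header line: add a space after the
                # terminator so the next line becomes a continuation line.
                out.append(0x20)
        else:
            out.append(byte)  # LF not preceded by CR passes through as LF
        i += 1
    return bytes(out)
```

With this sketch, a body containing a mixture of CRLF, LF and CR endings comes out with LF only, and a header line containing a bare CR stays a single (folded) header line.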
By default, Exim expects every address it receives from an external host to be fully qualified. Unqualified addresses cause negative responses to SMTP commands. However, because SMTP is used as a means of transporting messages from MUAs running on personal workstations, there is sometimes a requirement to accept unqualified addresses from specific hosts or IP networks.
Exim has two options that separately control which hosts may send unqualified sender or recipient addresses in SMTP commands, namely sender_unqualified_hosts and recipient_unqualified_hosts. In both cases, if an unqualified address is accepted, it is qualified by adding the value of qualify_domain or qualify_recipient, as appropriate.
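The decision described above can be sketched as follows. This is an illustration under simplifying assumptions: the function accept_address is hypothetical, the host list is reduced to a list of IP networks (real Exim host lists are much richer), and an address is treated as qualified simply because it contains an @ character.

```python
import ipaddress
from typing import Iterable, Optional


def accept_address(address: str, host_ip: str,
                   unqualified_hosts: Iterable[str],
                   domain: str) -> Optional[str]:
    """Return the address to use, or None if it should be rejected.

    `unqualified_hosts` stands in for sender_unqualified_hosts or
    recipient_unqualified_hosts; `domain` stands in for qualify_domain or
    qualify_recipient. All names here are assumptions for the sketch.
    """
    if "@" in address:
        return address  # already fully qualified
    client = ipaddress.ip_address(host_ip)
    allowed = any(client in ipaddress.ip_network(net)
                  for net in unqualified_hosts)
    if not allowed:
        return None  # negative SMTP response in the real server
    return f"{address}@{domain}"
```

For example, an unqualified address from a host inside a permitted network is completed with the qualifying domain, while the same address from any other host is refused.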
Messages that have come from UUCP (and some other applications) often begin with a line containing the envelope sender and a timestamp, following the word “From”. Examples of two common formats are:
From a.oakley@berlin.mus Fri Jan 5 12:35 GMT 1996
From f.butler@berlin.mus Fri, 7 Jan 97 14:00:00 GMT
This line precedes the RFC 2822 header lines. For compatibility with Sendmail, Exim recognizes such lines at the start of messages that are submitted to it via the command line (that is, on the standard input). It does not recognize such lines in incoming SMTP messages, unless the sending host matches ignore_fromline_hosts or the -bs option was used for a local message and ignore_fromline_local is set. The recognition is controlled by a regular expression that is defined by the uucp_from_pattern option, whose default value matches the two common cases shown above and puts the address that follows “From” into $1.
When a non-SMTP message that contains a “From” line is submitted by a trusted user, the message's sender address is constructed by expanding the contents of uucp_sender_address, whose default value is “$1”. The result is then parsed as an RFC 2822 address. If there is no domain, the local part is qualified with qualify_domain unless it is the empty string. However, if the command line -f option is used, it overrides the “From” line.
If the caller of Exim is not trusted, the “From” line is recognized, but the sender address is not changed. This is also the case for incoming SMTP messages that are permitted to contain “From” lines.
Only one “From” line is recognized. If there is more than one, the second is treated as a data line that starts the body of the message, as it is not valid as a header line. This also happens if a “From” line is present in an incoming SMTP message from a source that is not permitted to send them.
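The handling of a leading “From” line for non-SMTP messages can be sketched as follows. The regular expression is written for this example to match the two formats shown earlier; it should not be taken as the actual default of uucp_from_pattern. The function name and the dash_f parameter (standing for a value supplied with -f) are likewise assumptions.

```python
import re
from typing import Optional

# Illustrative pattern only; group 1 captures the address after "From".
FROM_LINE = re.compile(
    r"^From\s+(\S+)\s+"
    r"(?:[A-Za-z]{3},?\s+)?"                      # optional day name
    r"(?:[A-Za-z]{3}\s+\d?\d"                     # "Jan 5"
    r"|\d?\d\s+[A-Za-z]{3}\s+\d\d(?:\d\d)?)"      # "7 Jan 97"
    r"\s+\d\d?:\d\d?"                             # start of the time
)


def sender_from_uucp_line(line: str, trusted: bool, qualify_domain: str,
                          dash_f: Optional[str] = None) -> Optional[str]:
    """Return a new envelope sender, or None to leave the sender unchanged."""
    if dash_f is not None:
        return dash_f  # -f overrides the "From" line
    match = FROM_LINE.match(line)
    if match is None or not trusted:
        return None  # not recognized, or recognized but caller not trusted
    sender = match.group(1)  # corresponds to "$1"
    if "@" not in sender:
        sender = f"{sender}@{qualify_domain}"  # no domain: qualify it
    return sender
```

The sketch returns None both when the line is not a “From” line and when the caller is untrusted, because in both cases the existing sender address is kept.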
RFC 2822 makes provision for sets of header lines starting with the string Resent- to be added to a message when it is resent by the original recipient to somebody else. These headers are Resent-Date:, Resent-From:, Resent-Sender:, Resent-To:, Resent-Cc:, Resent-Bcc: and Resent-Message-ID:. The RFC says:
Resent fields are strictly informational. They MUST NOT be used in the normal processing of replies or other such automatic actions on messages.
This leaves things a bit vague as far as other processing actions such as address rewriting are concerned. Exim treats Resent- header lines as follows:
A Resent-From: line that just contains the login id of the submitting user is automatically rewritten in the same way as From: (see below).
If there's a rewriting rule for a particular header line, it is also applied to Resent- header lines of the same type. For example, a rule that rewrites From: also rewrites Resent-From:.
For local messages, if Sender: is removed on input, Resent-Sender: is also removed.
For a locally-submitted message, if there are any Resent- header lines but no Resent-Date:, Resent-From:, or Resent-Message-Id:, they are added as necessary. It is the contents of Resent-Message-Id: (rather than Message-Id:) which are included in log lines in this case.
The logic for adding Sender: is duplicated for Resent-Sender: when any Resent- header lines are present.
Whenever Exim generates a bounce or a delay warning message, it includes the header line
Auto-Submitted: auto-generated
If Exim is called with the -t option, to take recipient addresses from a message's header, it removes any Bcc: header line that may exist (after extracting its addresses). If -t is not present on the command line, any existing Bcc: is not removed.
If a locally-generated message has no Date: header line, Exim adds one, using the current date and time.
Delivery-date: header lines are not part of the standard RFC 2822 header set. Exim can be configured to add them to the final delivery of messages. (See the generic delivery_date_add transport option.) They should not be present in messages in transit. If the delivery_date_remove configuration option is set (the default), Exim removes Delivery-date: header lines from incoming messages.
Envelope-to: header lines are not part of the standard RFC 2822 header set. Exim can be configured to add them to the final delivery of messages. (See the generic envelope_to_add transport option.) They should not be present in messages in transit. If the envelope_to_remove configuration option is set (the default), Exim removes Envelope-to: header lines from incoming messages.
If a locally-generated incoming message does not contain a From: header line, Exim adds one containing the sender's address. The calling user's login name and full name are used to construct the address, as described in section 44.16. They are obtained from the password data by calling getpwuid() (but see the unknown_login configuration option). The address is qualified with qualify_domain.
For compatibility with Sendmail, if an incoming, non-SMTP message has a From: header line containing just the unqualified login name of the calling user, this is replaced by an address containing the user's login name and full name as described in section 44.16.
If a locally-generated incoming message does not contain a Message-ID: or Resent-Message-ID: header line, Exim adds one to the message. If there are any Resent- header lines in the message, it creates Resent-Message-ID:. The id is constructed from Exim's internal message id, preceded by the letter E to ensure it starts with a letter, and followed by @ and the primary host name. Additional information can be included in this header line by setting the message_id_header_text and/or message_id_header_domain options.
A Received: header line is added at the start of every message. The contents are defined by the received_header_text configuration option, and Exim automatically adds a semicolon and a timestamp to the configured string.
Return-path: header lines are defined as something an MTA may insert when it does the final delivery of messages. (See the generic return_path_add transport option.) Therefore, they should not be present in messages in transit. If the return_path_remove configuration option is set (the default), Exim removes Return-path: header lines from incoming messages.
For a locally-originated message from an untrusted user, Exim may remove an existing Sender: header line, and it may add a new one. You can modify these actions by setting local_sender_retain true or local_from_check false. No processing of Sender: header lines is done for messages received by TCP/IP or for messages submitted by trusted users.
When a local message is received from an untrusted user and local_from_check is true (the default), a check is made to see if the address given in the From: header line is the correct (local) sender of the message. The address that is expected has the login name as the local part and the value of qualify_domain as the domain. Prefixes and suffixes for the local part can be permitted by setting local_from_prefix and local_from_suffix appropriately. If From: does not contain the correct sender, a Sender: line is added to the message.
If you set local_from_check false, this checking does not occur. However, the removal of an existing Sender: line still happens, unless you also set local_sender_retain to be true. It is not possible to set both of these options true at the same time.
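The interaction of local_from_check and local_sender_retain for a local message from an untrusted user can be sketched as follows. The function is hypothetical. It treats the prefix and suffix as literal strings, which is a simplification of local_from_prefix and local_from_suffix, and it assumes that the added Sender: line carries the expected local sender address.

```python
from typing import Optional, Tuple


def sender_actions(from_address: str, login: str, qualify_domain: str, *,
                   local_from_check: bool = True,
                   local_sender_retain: bool = False,
                   prefix: str = "", suffix: str = "",
                   ) -> Tuple[bool, Optional[str]]:
    """Return (remove_existing_sender, sender_address_to_add_or_None)."""
    if local_from_check and local_sender_retain:
        # The two options cannot both be true at the same time.
        raise ValueError("local_from_check and local_sender_retain "
                         "cannot both be true")
    remove_existing = not local_sender_retain
    if not local_from_check:
        return remove_existing, None  # no checking, so nothing is added
    expected = f"{login}@{qualify_domain}"
    acceptable = {expected, f"{prefix}{login}{suffix}@{qualify_domain}"}
    if from_address in acceptable:
        return remove_existing, None
    return remove_existing, expected  # From: is not the local sender
```

With the defaults, a From: line that names somebody else leads to the removal of any existing Sender: line and the addition of a new one. With local_from_check false and local_sender_retain true, the header is left alone.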
When a message is delivered, the addition and removal of header lines can be specified on any of the routers and transports, and also in the system filter. Changes specified in the system filter affect all deliveries of a message.
Header changes specified on a router affect all addresses handled by that router, and also any new addresses it generates. If an address passes through several routers, the changes are cumulative. When a message is processed by a transport, the message's original set of header lines is output, except for those named in any headers_remove options that the address has encountered as it was processed, and any in the transport's own headers_remove option. Then the new header lines from headers_add options are output.
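The order of operations at transport time can be sketched as follows. The function name and the flat list representation of header lines are assumptions for the example. The lists passed in are taken to be the accumulated headers_remove and headers_add settings from every router the address passed through, plus those of the transport.

```python
from typing import Iterable, List


def headers_for_delivery(original: Iterable[str],
                         remove_names: Iterable[str],
                         added: Iterable[str]) -> List[str]:
    """Output the original header lines minus removed names, then additions."""
    # Header field names are compared without regard to case.
    removed = {name.strip().rstrip(":").lower() for name in remove_names}
    kept = [line for line in original
            if line.split(":", 1)[0].strip().lower() not in removed]
    return kept + list(added)


# Two routers and one transport contribute cumulatively.
router_removes = ["X-Internal"]
transport_removes = ["X-Debug"]
result = headers_for_delivery(
    ["From: a@example.com", "X-Internal: secret",
     "X-Debug: 1", "Subject: hi"],
    router_removes + transport_removes,
    ["X-Routed-By: r1"],
)
```

Here result holds the From: and Subject: lines followed by the added X-Routed-By: line.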
When Exim constructs a sender address for a locally-generated message, it uses the form
<user name> <<login>@<qualify_domain>>
For example:
Zaphod Beeblebrox <zaphod@end.univ.example>
The user name is obtained from the -F command line option if set, or otherwise by looking up the calling user by getpwuid() and extracting the “gecos” field from the password entry. If the “gecos” field contains an ampersand character, this is replaced by the login name with the first letter upper cased, as is conventional in a number of operating systems. See the gecos_name option for a way to tailor the handling of the “gecos” field. The unknown_username option can be used to specify user names in cases when there is no password file entry.
In all cases, the user name is made to conform to RFC 2822 by quoting all or parts of it if necessary. In addition, if it contains any non-printing characters, it is encoded as described in RFC 2047, which defines a way of including non-ASCII characters in header lines. The value of the headers_charset option specifies the name of the encoding that is used (the characters are assumed to be in this encoding). The setting of print_topbitchars controls whether characters with the top bit set (that is, with codes greater than 127) count as printing characters or not.
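The construction described above can be approximated with Python's standard library. This is only a sketch: expand_gecos and sender_header are hypothetical helpers, and email.utils.formataddr applies its own quoting rules and RFC 2047 encoding (triggered by non-ASCII characters rather than by Exim's print_topbitchars logic), so the output is an approximation of what Exim would produce. The charset argument plays the role of headers_charset.

```python
from email.utils import formataddr


def expand_gecos(gecos: str, login: str) -> str:
    """Replace an ampersand with the login name, first letter upper-cased."""
    capitalised = login[:1].upper() + login[1:]
    return gecos.replace("&", capitalised)


def sender_header(user_name: str, login: str, qualify_domain: str,
                  charset: str = "utf-8") -> str:
    """Build '<user name> <login@qualify_domain>' with quoting as needed."""
    return formataddr((user_name, f"{login}@{qualify_domain}"), charset)


name = expand_gecos("& Beeblebrox", "zaphod")
header_value = sender_header(name, "zaphod", "end.univ.example")
```

In this example header_value is the address form shown earlier for Zaphod Beeblebrox. A name containing a comma would be wrapped in double quotes, and a name with non-ASCII characters would be emitted as an RFC 2047 encoded word.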
RFC 2822 states that the case of letters in the local parts of addresses cannot be assumed to be non-significant. Exim preserves the case of local parts of addresses, but by default it uses a lower-cased form when it is routing, because on most Unix systems, usernames are in lower case and case-insensitive routing is required. However, any particular router can be made to use the original case for local parts by setting the caseful_local_part generic router option.
If you must have mixed-case user names on your system, the best way to proceed, assuming you want case-independent handling of incoming email, is to set up your first router to convert incoming local parts in your domains to the correct case by means of a file lookup. For example:
correct_case:
  driver = redirect
  domains = +local_domains
  data = ${lookup{$local_part}cdb\
            {/etc/usercased.cdb}{$value}fail}\
            @$domain
For this router, the local part is forced to lower case by the default action (caseful_local_part is not set). The lower-cased local part is used to look up a new local part in the correct case. If you then set caseful_local_part on any subsequent routers which process your domains, they will operate on local parts with the correct case in a case-sensitive manner.
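The effect of this router can be modelled in a few lines of Python. The dictionary stands in for the cdb file and the function name is an assumption. Returning None corresponds to the lookup failing.

```python
from typing import Mapping, Optional


def correct_case(local_part: str, domain: str,
                 cased_users: Mapping[str, str]) -> Optional[str]:
    """Map a local part to its correctly cased form via a lower-case key."""
    key = local_part.lower()  # default routing behaviour: lower-cased
    if key not in cased_users:
        return None  # no entry: the lookup fails
    return f"{cased_users[key]}@{domain}"


users = {"jsmith": "JSmith"}  # lower-cased key -> correctly cased name
```

Any spelling of the local part, such as JSMITH or jsmith, is redirected to the one stored form, which later case-sensitive routers can then match exactly.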
RFC 2822 forbids empty components in local parts. That is, an unquoted local part may not begin or end with a dot, nor have two consecutive dots in the middle. However, it seems that many MTAs do not enforce this, so Exim permits empty components for compatibility.
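For reference, the condition that RFC 2822 forbids, and that Exim tolerates, can be tested as follows. The helper name is an assumption, and the check is meant for unquoted local parts only.

```python
def has_empty_component(local_part: str) -> bool:
    """True if an unquoted local part has an empty dot-separated component."""
    # A leading dot, a trailing dot, or two consecutive dots each produce
    # an empty string when the local part is split on dots.
    return "" in local_part.split(".")
```

A strict parser would reject any local part for which this returns true. Exim accepts such addresses for compatibility.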
Rewriting of sender and recipient addresses, and addresses in headers, can happen automatically, or as the result of configuration options, as described in chapter 31. The headers that may be affected by this are Bcc:, Cc:, From:, Reply-To:, Sender:, and To:.
Automatic rewriting includes qualification, as mentioned above. The other case in which it can happen is when an incomplete non-local domain is given. The routing process may cause this to be expanded into the full domain name. For example, a header such as
To: hare@teaparty
might get rewritten as
To: hare@teaparty.wonderland.fict.example
Rewriting as a result of routing is the one kind of message processing that does not happen at input time, as it cannot be done until the address has been routed.
Strictly, one should not do any deliveries of a message until all its addresses have been routed, in case any of the headers get changed as a result of routing. However, doing this in practice would hold up many deliveries for unreasonable amounts of time, just because one address could not immediately be routed. Exim therefore does not delay other deliveries when routing of one or more addresses is deferred.