Recipient-Friendly MIME Generation

Keith Moore

MIME is widely supported, but many tools do a poor job of generating MIME. These tools generate messages that are excessively long, that cannot be read on some platforms, or that, for no good reason, are not easily read with non-MIME mail readers. This has made some communities hostile to the introduction of MIME and has delayed MIME's acceptance. This memo offers advice to MIME implementors on generating ``Recipient-friendly MIME'', that is, MIME which will not have an unnecessarily adverse impact on recipients.

NOTE: Most of what is written here applies only to typed-in text, not to attachments. The author's philosophy is that typed-in text should (by default) be readable by the maximum number of recipients, even if that involves making slight changes to the text (auto-wrapping of lines, deleting trailing spaces, substituting `` and '' for left and right double quotes). Attachments, on the other hand, should be conveyed byte-for-byte intact to the recipient.

1. Don't use quoted-printable (or for that matter, base64) to encode typed-in text, unless the text contains non-ASCII characters.

Quoted-printable should only be used when it's necessary to convey some characters that cannot be represented in 7-bit ASCII. Some mail senders automatically encode everything in quoted-printable (or even base64) whether it needs it or not. This is annoying to recipients who don't have MIME mail readers.

NOTE: Some people feel that enough mailers are 8-bit transparent that quoted-printable should not be used at all by default. However, there are still some mailers out there that will reject unencoded 8-bit text.
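The rule can be sketched in Python. The helper name `choose_transfer_encoding` is hypothetical, and the check is simplified: it looks only for non-ASCII characters and for lines longer than the 998-octet limit that RFC 2045 sets for 7bit data, not for NULs or bare CRs.

```python
def choose_transfer_encoding(text: str) -> str:
    """Pick a Content-Transfer-Encoding for typed-in text (hypothetical helper)."""
    # RFC 2045 7bit data: US-ASCII only, no line longer than 998 octets.
    lines_ok = all(len(line) <= 998 for line in text.splitlines())
    if text.isascii() and lines_ok:
        # Send it as-is: readable by MIME and non-MIME readers alike.
        return "7bit"
    # Only text that really needs it gets quoted-printable.
    return "quoted-printable"


print(choose_transfer_encoding("Plain old ASCII text.\n"))  # 7bit
print(choose_transfer_encoding("Caf\u00e9 au lait\n"))       # quoted-printable
```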

2. When generating quoted-printable from typed-in text, delete trailing spaces first.

If someone accidentally types a space at the end of a line, and it gets encoded in quoted-printable, the result looks like this:

There is a space at the end of this line=20
Here is the following line.

or maybe even like this:

There is a space at the end of this line=
 =
Here is the following line.

This is ugly. It adds either extra lines or extra =20 thingys, which (again) are annoying to recipients who don't have MIME mail readers.

If you delete trailing spaces before applying quoted-printable encoding, short lines without = characters in them will look just the same to MIME recipients and non-MIME recipients.
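A minimal Python sketch of this order of operations, using the standard `quopri` module. The helper names are hypothetical, and the sketch assumes lines are separated by `"\n"` inside the composer.

```python
import quopri


def strip_trailing_spaces(text: str) -> str:
    # Remove spaces and tabs at the end of every line; keep the line breaks.
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def encode_typed_text(text: str) -> bytes:
    # Delete trailing whitespace first, then apply quoted-printable.
    return quopri.encodestring(strip_trailing_spaces(text).encode("utf-8"))


raw = "There is a space at the end of this line \nHere is the following line.\n"
before = quopri.encodestring(raw.encode("ascii"))  # trailing space becomes =20
after = encode_typed_text(raw)                      # no =20; lines unchanged
print(before)
print(after)
```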

3. In text/plain, it's the sender's responsibility to wrap long lines.

The wording in the spec is a bit muddy about this, but text/plain is supposed to consist of preformatted text. Line breaks are explicit and indicated by CR LF.

Some existing products don't insert line breaks except between paragraphs. Recipients of messages from such products may see long lines that, depending on the recipient's user agent, either (a) are wrapped in strange places, (b) are truncated at the right margin, or (c) extend way past the right side of the window such that horizontal scrolling is required to read the text.

There are basically two kinds of user interfaces for typed-in text:

1. The mail composer automatically word-wraps long lines that are typed in.

2. The mail composer does not do auto-wrap; the author must explicitly supply line terminators (typically by pressing RETURN or ENTER).

NOTE: The line length limit in quoted-printable encoding exists to allow transport through old mail systems that stored mail in fixed-length 80-character records. It applies only to the encoded form of the text (after quoted-printable encoding has been applied), so it does not affect what the recipient sees (unless the recipient lacks MIME capability).
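For a composer that auto-wraps on screen, one way to supply the real line breaks before sending is sketched below with Python's `textwrap` module. The 72-column width and the helper name `wrap_typed_text` are assumptions, not requirements of the MIME specification; each existing newline is treated as a break the author intended.

```python
import textwrap


def wrap_typed_text(text: str, width: int = 72) -> str:
    """Hard-wrap typed-in text for text/plain (sketch; width is an assumption)."""
    out_lines = []
    # Each existing newline is treated as an explicit break made by the author.
    for line in text.split("\n"):
        if not line.strip():
            out_lines.append("")  # keep blank lines between paragraphs
            continue
        # Break only at whitespace. A single word longer than `width`
        # (e.g. a URL) is left intact rather than split.
        out_lines.extend(textwrap.wrap(line, width=width,
                                       break_long_words=False,
                                       break_on_hyphens=False))
    # text/plain line breaks are CR LF on the wire.
    return "\r\n".join(out_lines)


paragraph = "This paragraph was typed without pressing RETURN. " * 4
print(wrap_typed_text(paragraph.strip()))
```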

4. When generating quoted-printable, try to put ``soft line breaks'' between words rather than in the middle of words.

To someone without a MIME mail reader,

|----------------| (pretend this is 78 columns wide)
This is an =
extremely long =
line that must =
be wrapped.

looks much better than

|----------------| (pretend this is 78 columns wide) 
This is an extrem=
ely long line tha=
t must be wrapped.

(This mostly applies to attachments rather than typed-in text, since typed-in text should normally be wrapped before it is encoded.)
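The Python sketch below encodes one line (without its terminator) and, when a soft line break is needed, places it after the last space that fits. The function name is hypothetical, the 76-character default comes from the quoted-printable line limit, and the sketch ignores line terminators and binary data; a real encoder would need to handle both.

```python
import quopri


def qp_encode_line(line: bytes, max_len: int = 76) -> list[str]:
    """Quoted-printable encode one line (no line terminator), preferring
    soft line breaks after spaces rather than inside words."""
    # Turn each byte into an indivisible "atom": a literal character or =XX.
    atoms = []
    for i, b in enumerate(line):
        is_last = i == len(line) - 1
        if (33 <= b <= 126 and b != ord("=")) or (b in (9, 32) and not is_last):
            atoms.append(chr(b))
        else:
            # "=", control bytes, non-ASCII bytes, and whitespace at end of line.
            atoms.append(f"={b:02X}")

    limit = max_len - 1  # leave room for the "=" that marks a soft break
    out, current, cur_len, last_space = [], [], 0, -1
    for atom in atoms:
        while cur_len + len(atom) > limit:
            if last_space >= 0:
                # Break just after the last space on the current output line.
                head = current[:last_space + 1]
                current = current[last_space + 1:]
            else:
                # No space to break at: fall back to a break inside the word.
                head, current = current, []
            out.append("".join(head) + "=")
            cur_len = sum(len(a) for a in current)
            last_space = -1  # the carried-over tail contains no spaces
        current.append(atom)
        cur_len += len(atom)
        if atom == " ":
            last_space = len(current) - 1
    out.append("".join(current))
    return out


text = b"This is an extremely long line that must be wrapped."
encoded = qp_encode_line(text, max_len=18)
print("\n".join(encoded))
# This is an =
# extremely long =
# line that must =
# be wrapped.

# Round-trip check: decoding restores the original line.
assert quopri.decodestring("\n".join(encoded).encode("ascii")) == text
```

Run on the example sentence with an 18-column limit, it produces the word-boundary layout shown earlier; only a word too long to fit on any line gets split.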

5. Don't send typed-in text as HTML, unless it's needed to express something requested by the author.

Some mail composers send HTML body parts even when the author has typed in only plain text with no markup: no bold or italic, no changes in character sizes, no subscripts or superscripts, and no ``links''. This is silly, especially if the HTML is difficult to read as plain text.

This also implies that HTML should not be the default for composed messages.
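With Python's standard `email` package, for example, the default composition path can simply produce a single text/plain part. The function name and Subject value are illustrative only. For short US-ASCII lines the package also picks a 7bit transfer encoding, consistent with point 1.

```python
from email.message import EmailMessage


def compose(typed_text: str) -> EmailMessage:
    # Default path for typed-in text: one text/plain part, no HTML at all.
    msg = EmailMessage()
    msg["Subject"] = "Lunch on Friday?"  # illustrative value
    msg.set_content(typed_text)
    return msg


msg = compose("Are we still on for Friday?\n")
print(msg.get_content_type())            # text/plain
print(msg["Content-Transfer-Encoding"])  # 7bit
```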

6. If you must generate HTML from typed-in text, generate recipient-friendly HTML.

If the author has explicitly requested some amount of markup (boldface/italic, typeface, etc.), HTML might be a reasonable choice to represent such markup. But the HTML should be generated such that it can easily be read by a reader that doesn't support HTML, or for that matter, MIME. (Note that the MIME specification says that, in the absence of specific support for text/html, a mail reader should display text/html as if it were text/plain.)

Some guidelines:

7. Don't use HTML to reply to a plain text message, if that message is quoted in the reply.

...especially if the message being replied to already contains "> " or some other quote character from previous replies.

Okay, so the "> " convention is a pretty crude mechanism to indicate that you're quoting something from a previous message, and it gets pretty ugly after several layers of replies. It's understandable that people would like to replace this with some nestable HTML construct. But if there's anything worse than lots of "> "s at the beginning of each line, it's a mixture of "> "s (or ">"s) and HTML constructs.

8. Don't send multipart/alternative unless the author explicitly asks for it.

Multipart/alternative is useful when you want to send a message in multiple formats and have the recipient's mail reader use the ``best'' format that it understands. But it takes up a lot of extra space. It's especially annoying on mailing lists, and many Usenet sites will actually refuse to accept such messages.

A single recipient-friendly HTML body part might be better than a multipart/alternative body part consisting of text/plain and text/html components.
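A rough comparison using Python's `email` package: a single text/html part versus a multipart/alternative built from the same text. The sample bodies are made up; the comparison only illustrates the extra size of the alternative form.

```python
from email.message import EmailMessage

# A single text/html part: one copy of the message.
single = EmailMessage()
single.set_content("<p>This is <b>important</b>.</p>\n", subtype="html")

# multipart/alternative: a text/plain copy plus a text/html copy of the same text.
alternative = EmailMessage()
alternative.set_content("This is *important*.\n")
alternative.add_alternative("<p>This is <b>important</b>.</p>\n", subtype="html")

print(single.get_content_type())       # text/html
print(alternative.get_content_type())  # multipart/alternative
print(len(single.as_bytes()) < len(alternative.as_bytes()))  # True
```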

9. Don't use nonstandard content-types or character sets when there is a good ``standard'' choice.

Some mail senders use proprietary character sets even when all of the characters needed to display the message are in US-ASCII or ISO-8859-1. For example, it's silly to use a proprietary character set to transmit characters such as apostrophes and double quotes. ASCII ' and `` '' work well enough.

10. Use content-disposition to label attachments and associate them with reasonable filenames.

The content-disposition header field should be used to indicate that a body part is an attachment, and to suggest a filename under which it might be stored. For example:

Content-disposition: attachment; filename="foo.txt"

Note that the "name=" parameter is only defined for the application/octet-stream content-type. Its use with other content-types is mostly harmless, but it is not universally recognized either.
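In Python's `email` package, `EmailMessage.add_attachment` with a `filename` argument produces this kind of header. The subject, file name and attachment bytes below are placeholders. The call sets `filename` on Content-Disposition and does not add a "name=" parameter to Content-Type.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Quarterly report"
msg.set_content("The report is attached.\n")

# Placeholder attachment bytes; conveyed byte-for-byte (base64 by default).
pdf_bytes = b"%PDF-1.4 placeholder"
msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                   filename="report.pdf")

part = next(msg.iter_attachments())
print(part["Content-Disposition"])
print(part.get_filename())  # report.pdf
```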

11. Don't include attachments (especially those in proprietary formats) unless the author explicitly asks for them in that message.

This applies to things like "MS-TNEF" attachments as well as to things like vCard objects. Since most recipients have no use for such things, they waste disk space and bandwidth. They are also a nuisance to recipients who lack MIME-capable mail readers.

Worse, many mail readers handle unrecognized attachments by asking if the recipient wants to save the attachment to a file. So having this cruft sent out in every message isn't merely wasteful, it's downright offensive to the recipient, and it doesn't reflect well on the sender.

If you want to make your vCard accessible to recipients without annoying them, put it on a web server and include its URL in your signature.

12. Don't generate extra quotes around phrases in header fields.

RFC 822 requires quotes around names when they contain special characters. Since ``.'' is a special character, the name "Bryan K. Moore" needs to be in double quotes.  Unless the name contains a special character, the double quotes should not be included.

Even worse is including single quotes (usually within double quotes). These serve no purpose at all.
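A sketch of the quoting rule in Python. The helper name is hypothetical, and the check covers only the RFC 822 specials; it does not handle control characters or non-ASCII names (see point 14).

```python
# RFC 822 "specials": a phrase containing any of these must be quoted.
RFC822_SPECIALS = set('()<>@,;:\\".[]')


def format_phrase(name: str) -> str:
    """Quote a display name only when RFC 822 requires it (sketch)."""
    if any(ch in RFC822_SPECIALS for ch in name):
        # Inside a quoted-string, backslash and double quote must be escaped.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return name  # no specials: no quotes of any kind


print(format_phrase("Keith Moore"))     # Keith Moore
print(format_phrase("Bryan K. Moore"))  # "Bryan K. Moore"
```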

13. Don't generate redundant phrases in header fields.

If an address isn't accompanied by a name, don't generate one. In particular, don't copy the address to the phrase that precedes the address.
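Python's `email.utils.formataddr` already omits the phrase when the name is empty, and adds double quotes only when the name contains specials. The wrapper below, a hypothetical helper, also drops a "name" that merely repeats the address.

```python
from email.utils import formataddr


def make_address(name: str, address: str) -> str:
    """Build an address for a header field without a redundant phrase (sketch)."""
    # No known name, or the "name" is just the address again: emit the bare address.
    if not name or name.strip() == address:
        name = ""
    # formataddr omits an empty phrase and quotes a phrase only if needed.
    return formataddr((name, address))


print(make_address("", "moore@example.com"))                   # moore@example.com
print(make_address("moore@example.com", "moore@example.com"))  # moore@example.com
print(make_address("Keith Moore", "moore@example.com"))        # Keith Moore <moore@example.com>
```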

14. Don't generate "Q" or "B" encoding in header fields unless it's necessary.

"Q" or "B" encoded-words (RFC 2047 and its predecessors) should only be used to encode non-ASCII characters that appear in text portions of message or body part headers - say in the phrase that precedes an address, in a Subject field, or in a comment.

Encoded-words should never be used for machine-readable portions of the headers (e.g. within an address). Addresses consist of ASCII characters only, both for compatibility with RFC 822, and so that anybody in the world can type them in.
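A sketch using Python's `email.header.Header`, which produces an RFC 2047 encoded-word; whether it picks "Q" or "B" for UTF-8 is left to the library. The helper name is hypothetical. `formataddr` is shown for the address case: it encodes a non-ASCII phrase but leaves the address itself alone.

```python
from email.header import Header, decode_header, make_header
from email.utils import formataddr


def encode_text_field(value: str) -> str:
    """Use an RFC 2047 encoded-word only if the text is not pure ASCII (sketch)."""
    if value.isascii():
        return value  # plain ASCII needs no "Q" or "B" encoding
    return Header(value, "utf-8").encode()


print(encode_text_field("Lunch on Friday?"))  # unchanged
encoded = encode_text_field("Gr\u00fc\u00dfe aus K\u00f6ln")
print(encoded)                                # an =?utf-8?...?= encoded-word
print(str(make_header(decode_header(encoded))))  # decodes back to the original

# Only the phrase is encoded; the address stays plain ASCII.
print(formataddr(("J\u00fcrgen", "juergen@example.com")))
```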

15. Don't generate header fields that contain unencoded non-ASCII characters.

Message and body part headers consist entirely of ASCII characters. Unencoded non-ASCII characters in headers are nonstandard, undefined, and will cause some mail parsers to barf.

16. Use appropriate, registered content-types.
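One way to choose a type for an attachment is Python's `mimetypes.guess_type`, falling back to application/octet-stream. The helper name is hypothetical, and the platform's type table may include unregistered "x-" types, so its answers are worth reviewing rather than trusting blindly.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a content type from the file name, else fall back (sketch)."""
    ctype, encoding = mimetypes.guess_type(filename)
    if ctype is None or encoding is not None:
        # Unknown, or compressed (e.g. .gz): label it as opaque binary data.
        return "application/octet-stream"
    return ctype


print(content_type_for("foo.txt"))      # text/plain
print(content_type_for("report.pdf"))   # application/pdf
print(content_type_for("blob.xyz123"))  # application/octet-stream
```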

17. Don't label text with a superset character set when a smaller character set will suffice.

In particular, if a body part consists only of ASCII characters, label it as "US-ASCII" rather than "iso-8859-1" or something else.
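A sketch of choosing the smallest adequate charset in Python. The candidate list (US-ASCII, then ISO-8859-1, then UTF-8) and the helper name are assumptions for illustration.

```python
def smallest_charset(text: str) -> str:
    """Return the first (smallest) candidate charset that can encode the text."""
    # Candidate list, smallest first: an assumption for illustration.
    for charset in ("us-ascii", "iso-8859-1", "utf-8"):
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can encode this text")


print(smallest_charset("Hello"))           # us-ascii
print(smallest_charset("caf\u00e9"))       # iso-8859-1
print(smallest_charset("\u65e5\u672c"))    # utf-8
```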

Acknowledgements

Thanks to Steinar Bang, Nathaniel Borenstein, Earl Hood, and Dan Wing for their comments.