NOTE NOTE NOTE
This project is not being maintained! It’s some code I wrote over a decade ago which I am releasing because it seems a shame for it to sit on a hard drive gathering dust.
In particular, what is written below about problems with the Mail gem may not be true any more 12 years later.
Email is a well-behaved email handling library for Ruby.
"If your mail reader can read it, so can the Email library." That's how it should be, and if it isn't, you should report a bug.
The Email library's message parser is fast, and is only going to get faster. I'm working on a Ragel-based C implementation of some parts of the parser (like tokenisation), so new versions of the library should only get faster over time.
Missing features
- Some kind of sensible email-writing DSL.
- A way to inject messages into the local (or remote) mail queue once they've been written. I'm not sure whether this is a good thing.
The API
The API is quite simple.
msg = Email::Message.new File.read('B802810E-B8D4-4630-A203-DAA10243F66E.eml')
# => unparsed email message
msg.subject # => "Re: Meeting on Tuesday"
msg.from # => ["God" <God@heaven.af.mil>]
msg.from.first.address # => "God@heaven.af.mil"
msg.to # => ["Angels mailing list" <angels@lists.heaven.af.mil>]
msg.to.first.name # => "Angels mailing list"
msg.body # => "Just a quick reminder about meeting [...]"
## or ##
msg.text_part.body # => "Just a quick reminder about meeting [...]"
msg.html_part.body # => "<p>Just a quick reminder about [...]"
The problems with the Mail gem
The authors of the Mail gem seemed to take the approach that "parsing RFC822 formatted email messages is a hard problem, so here's a hard solution."
- The Mail gem is slow. It takes about 7 times longer to parse a message than does the Email library.
- The Mail gem is unreliable. On my test set of about 200,000 messages, the Mail gem fails roughly once for every 1,000 emails it processes. The Email library successfully parses them all.
- The Mail gem makes accessing some information, such as the human-readable name of a recipient in an address list, unnecessarily difficult.
- Sometimes, when the Mail gem decides it can't extract any information from a structured field, it abandons the field. Instead of exposing either an empty set of information or an at least somewhat accurate one through its APIs, it just replaces the field with one of type Mail::Multibyte::Chars. This makes building reliable software with the Mail gem unnecessarily difficult, since every time you access a field you must check whether it is of the kind you expect or an 'unstructured' field instead. The Email library will always attempt to parse things into the same format for the same field.
- The Mail gem does not tokenise field values where necessary. This means it can only be at best vaguely accurate about some aspects of the 822 specification, because it can't handle things like whitespace, comments, or even quoted parts inside an address list. Whitespace is somewhat frequent, especially where folding has been applied after the original writer was done with the message; comments inside an address are admittedly very rare but are allowed by the spec; quoted parts are also rare, but more common than comments. There are numerous other places where the Mail gem's failure to tokenise the fields results in a parser which is bulkier, slower and less reliable than it could have been.
- The Mail gem doesn't handle invisible lines in folded headers properly, always treating them as if they're the end of the header block.
In fact, I believe that a lot of these problems originate from the fact that the Mail gem's test suite is based on the TREC spam corpus, in turn based on the Enron email dataset. The problem is that the format of these messages is mostly faked by Microsoft Outlook. Thus they are mostly written by the same message writer software, and so don't include a fair representation of the slightly different formatting quirks that the 822 format writers of all the different mailers in the world have. When you have such a narrow sample set to test against, it's easy to be misled into thinking that your software is a lot more reliable than it actually is.
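To make the tokenisation point from the list above concrete, here is a minimal sketch of an 822-style tokeniser for structured field values. It is not the Email library's parser: `tokenise_822` and `scan_comment` are hypothetical names invented for this illustration, and the grammar is simplified (no domain literals, no encoded words). It only shows why tokenising first avoids the mistakes a plain string split makes with quoted parts, comments, and folding whitespace.

```ruby
require 'strscan'

# Hypothetical helper, not part of the Email library's API.
# Reads a (possibly nested) comment starting at "(" and returns its text.
def scan_comment(scanner)
  depth = 0
  text = String.new
  until scanner.eos?
    char = scanner.getch
    case char
    when '\\'
      text << scanner.getch.to_s      # backslash escapes the next character
    when '('
      depth += 1
      text << char if depth > 1       # keep inner parentheses only
    when ')'
      depth -= 1
      return text if depth.zero?
      text << char
    else
      text << char
    end
  end
  text                                # unterminated comment: return what we have
end

# Hypothetical helper: splits a structured field value into [type, text] pairs.
def tokenise_822(value)
  scanner = StringScanner.new(value)
  tokens = []
  until scanner.eos?
    if scanner.skip(/[ \t\r\n]+/)
      next                            # folding whitespace carries no meaning here
    elsif scanner.check(/\(/)
      tokens << [:comment, scan_comment(scanner)]
    elsif (text = scanner.scan(/"(?:[^"\\]|\\.)*"/m))
      tokens << [:quoted, text[1..-2].gsub(/\\(.)/m, '\1')]
    elsif (text = scanner.scan(/[<>@,;:.\[\]]/))
      tokens << [:special, text]
    elsif (text = scanner.scan(/[^\s()<>@,;:\\".\[\]]+/))
      tokens << [:atom, text]
    else
      tokens << [:unknown, scanner.getch]
    end
  end
  tokens
end

value = "\"Angels, Heavenly\" <angels@lists.heaven.af.mil> (the list),\r\n\tGod@heaven.af.mil"
tokens = tokenise_822(value)

# Only one comma separates addresses; the other sits inside a quoted part.
separators = tokens.count { |token| token == [:special, ','] }
puts separators                # prints 1
puts value.split(',').size     # prints 3, which is what a naive split sees
```

Because commas, angle brackets and the like only count when they appear as `:special` tokens, a later parsing stage can split the address list on real separators without ever looking inside quoted parts or comments.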
Moderately esoteric features of the Email library
- The Email library does not actually parse a message until it has to. It loads the source into memory, but doesn't actually parse it until later.
- Related to the last, you don't have to actually load the message into memory until it's time to parse it. You can pass a block to the Email::Message.new method and it will be treated as a thunk that will return the source of the message. This way you avoid wasting memory: you just have to cons a new thunk for every message, instead of a string to hold the entire source of the message. This feature and the previous one will save a lot of memory if you load a lot of messages into core from disk at once and don't use most of them for a while.
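The two features above can be sketched roughly as follows. This is an illustration of the idea only, not the Email library's actual code: `LazyMessage` and its deliberately naive header parser are hypothetical stand-ins, and the real Email::Message internals may well differ.

```ruby
# Hypothetical illustration of lazy loading and lazy parsing.
class LazyMessage
  # Accepts either the message source as a string or a block (thunk)
  # that returns the source when it is eventually needed.
  def initialize(source = nil, &thunk)
    raise ArgumentError, 'give a source string or a block' unless source || thunk
    @source = source
    @thunk = thunk
    @parsed = nil
  end

  # True once the source has been fetched and parsed.
  def parsed?
    !@parsed.nil?
  end

  def subject
    headers['subject']
  end

  private

  # Fetches the source (calling the thunk if needed) and parses it,
  # but only on first access; later calls reuse the cached result.
  def headers
    @parsed ||= parse(@source || @thunk.call)
  end

  # Naive header parser for demonstration: unfolds continuation lines
  # and splits each header line at the first colon.
  def parse(source)
    header_block = source.split(/\r?\n\r?\n/, 2).first.to_s
    unfolded = header_block.gsub(/\r?\n(?=[ \t])/, '')
    unfolded.each_line.each_with_object({}) do |line, fields|
      name, value = line.chomp.split(':', 2)
      fields[name.downcase] = value.to_s.strip if name
    end
  end
end

reads = 0
msg = LazyMessage.new do
  reads += 1   # stands in for something like File.read(path)
  "Subject: Re: Meeting on Tuesday\r\n\r\nJust a quick reminder about meeting"
end

puts msg.parsed?   # prints false: the block has not been called yet
puts msg.subject   # prints "Re: Meeting on Tuesday"
puts reads         # prints 1
```

Until `subject` is called, the object holds only the block, so a large collection of such messages costs one small closure each rather than one full source string each.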
Thanks
This library was developed in response to the needs of two applications: the obvious one is Trilby, but the original app is a mailing-list archiving program. dpk has owed this program to Grant Hutchinson for far too long now. Sincerest thanks for his patience, goodwill, and entertainment. He also provided a large chunk of the archive of the NewtonTalk mailing-list, which I've used as test data throughout the development of this library. Without this, the parsing code would not be anywhere near as robust.
Lastly, thanks to Daniel J. Bernstein for his superb documentation on the 822 message format, and for mess822, both of which were frequent references when writing the Email library. If you want to host your own email system, qmail is undoubtedly the best way to do it.