Parsing forwarded emails - python

I'm writing some code to parse forwarded emails. What I'm not sure about is whether there is a Python library, an RFC I could stick to, or some other resource that would allow me to automate the task.
To be precise, I don't know if the "layout" of forwarded emails is covered by some standard or recommendation, or if it has just evolved over the years so now most email clients produce similar output for the text part:
Begin forwarded message:
> From: Me <me#me.me>
> Date: January 30, 2010 18:26:33 PM GMT+02:00
> To: Other Me <other-me#me.me>
> Subject: Unwise question
-- and go wild for attachments (and whatever other MIME sections can be there).
If it's still not precise enough I'll clarify it, it's just that I'm not 100% sure what to ask about (RFC, Python lib, convention or something else).

Unlike what many other people said, there is a standard on forwarded emails: RFC 2046, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", more than ten years old. See especially its section 5.2, "Message Media Type".
The basic idea behind RFC 2046 is to encapsulate one message in a MIME part of another, with a type named (unfortunately) message/rfc822 (never forget that MIME is recursive). Python's MIME library (the email package) can handle it fine.
I did not downvote the other answers because they are right in one respect: the standard is not followed by every mailer. For instance, the mutt mailer can forward a message in RFC 2046 format but also in an ad hoc format. So, in practice, a mailer probably cannot handle only RFC 2046; it also has to parse the various other, underspecified syntaxes.
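To illustrate the RFC 2046 case, here is a minimal sketch using the standard email package. The sample message, the boundary string and the helper name forwarded_messages are made up for the example; it only covers forwards that really use message/rfc822, not the inline "Begin forwarded message:" style.

```python
from email import message_from_string, policy

# Hypothetical sample: a forward that carries the original as message/rfc822.
RAW = """\
From: forwarder@example.com
To: someone@example.com
Subject: Fwd: Unwise question
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See the forwarded message below.
--XYZ
Content-Type: message/rfc822

From: me@example.com
To: other-me@example.com
Subject: Unwise question

Original body.
--XYZ--
"""


def forwarded_messages(msg):
    """Yield every message embedded as a message/rfc822 part."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding one message.
            yield part.get_payload(0)


outer = message_from_string(RAW, policy=policy.default)
for inner in forwarded_messages(outer):
    print(inner["From"], "|", inner["Subject"])
```

Because walk() descends into nested parts, a forward of a forward should be picked up as well.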

In my experience just about every email client forwards/replies differently. Typically you'll have a plain-text version and an HTML-encoded version in the MIME parts at the bottom of the mail pack. Mail headers do have an RFC (http://www.faqs.org/rfcs/rfc2822.html "2822"), but unfortunately the content of the message body is outside its scope.
Not only do you have to contend with the mail client variance, but the variance of user preferences. As an example: Lotus Notes puts replies at the top and Thunderbird replies at the bottom. So when a Thunderbird user is replying to a Lotus Notes user's reply they might insert their reply at the top and leave their signature at the bottom.
Another pitfall may be contending with the word wrapping of reply chains.
>>>> The outer reply that goes over the limit and is word wrapped by
the middle replier's mail client\n
>> The message body of a middle reply
> Previous reply
Newest reply
I wouldn't parse the message; I'd leave it to the user to parse in their heads. Or, I'd borrow the code from another project.

As the other answers already indicate: there is no standard, and your program is not going to be flawless.
You could have a look at the headers, in particular the User-Agent header, to see what kind of client was used, and code specifically for the most common clients.
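A rough sketch of that header check, assuming the standard email package. The X-Mailer fallback and the substrings in KNOWN_CLIENTS are my own illustrative assumptions, not an authoritative list; real header values vary a lot.

```python
from email import message_from_string

# Hypothetical mapping: substring of the client header -> client family.
KNOWN_CLIENTS = {
    "thunderbird": "thunderbird",
    "outlook": "outlook",
    "apple mail": "apple-mail",
    "mutt": "mutt",
}


def guess_client(msg):
    """Return a client family name, or None if the headers give no hint."""
    # Some clients send X-Mailer instead of User-Agent (assumption: check both).
    raw = msg.get("User-Agent") or msg.get("X-Mailer") or ""
    lowered = raw.lower()
    for needle, family in KNOWN_CLIENTS.items():
        if needle in lowered:
            return family
    return None


sample = message_from_string(
    "User-Agent: Mozilla/5.0 (X11; Linux) Thunderbird/3.0\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(guess_client(sample))
```

The result could then be used to pick a client-specific parser for the forwarded body.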
To find out which clients you should consider supporting, have a look at this popularity study. Various Outlooks, Yahoo!, Hotmail, Mail.app, iPhone mail, Gmail and Lotus Notes rank highly. About 11% of the mail is classified as "undetectable", but using headers from the forwarded e-mail you might be able to do better than that. Note that the statistics were gathered by placing an image inside the e-mail, so results may be skewed.
Another problem is HTML mail, which may or may not include a plain-text version. I'm not sure about clients' usual behaviour in this respect.

The convention for a reply/forward is to prepend > to each line, once for every level of nesting; whether to include who sent the initial e-mail is up to the client to sort out. So what you need to do in Python is simply add > to the start of each line.
imap Test <imap#gazler.com> Wrote:
>
>twice
>imap Test wrote:
>> nested
>>
>> imap#gazler.com wrote:
>>> test
>>>
>>> --
>>> Message sent via AHEM.
>>>
>>
>
Attachments just simply need to be attached to the message or as you put it 'go wild.'
I am not familiar with Python, but I believe the code would be (note that the first line needs a > as well):
string = ">" + string.replace("\n", "\n>")
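For the parsing direction, a small sketch of the same convention read backwards: count the leading > markers to get the nesting depth of a line. The names quote_text and quote_depth are made up for the example, and the regex assumes markers may be separated by a single optional whitespace character, which will not hold for every client.

```python
import re

# One or more ">" markers, each optionally followed by one whitespace character.
_QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def quote_text(text):
    """Prefix every line of text with '>' (one more level of nesting)."""
    return "\n".join(">" + line for line in text.splitlines())


def quote_depth(line):
    """Return how many '>' markers start the line (0 for unquoted text)."""
    match = _QUOTE_PREFIX.match(line)
    return match.group(0).count(">") if match else 0


for line in [">>> test", ">> nested", ">twice", "Newest reply"]:
    print(quote_depth(line), line)
```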

Related

Recover email list from a mailbox using Python

Is there any possibility or any library to log in to a given mail and recover a list of messages for a given sender?
I mean the situation in which I provide an e-mail address, all messages in the inbox are filtered by that address, and I get back the list of e-mails or the sender's last message.
I use flask-mail to send emails, but I don't think it is possible to recover the list of messages.
You should check the standard mailbox library. It provides functionalities to read mailboxes stored on disk using the most popular mailbox file formats (Maildir, mbox, MH, Babyl, and MMDF at the time of this writing).
Be warned: nowadays, for performance reasons, many mail clients use embedded database engines to store emails. SQLite being a popular choice, you can also try the sqlite3 library.
Finally, you will also find exotic file formats like Mork. For that, you will have to write your own parser or turn to PyPI to search if someone has already done the work for you.
As a personal note, if your email client allows changing its storage backend, you may consider switching to a well-known text-based storage format for your emails. It definitely helps in case of disaster recovery.
As an example, I am using Thunderbird and set it up to use the mbox file format. So I can iterate over the messages of my Junk folder that way:
>>> path = '~/.thunderbird/4tuag540.default/ImapMail/ssl0.ovh-1.net/INBOX.sbd/Junk'
>>> from mailbox import mbox
>>> junk = mbox(path)
>>> for message in junk:
...     # Print the "From" header:
...     print(message['From'])
...
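Building on that, a minimal sketch of filtering by sender, which is what the question asks for. The function name messages_from is hypothetical, and it assumes the mailbox is an mbox file on disk; "last message" here means last in file order, which is not necessarily the most recent by date.

```python
import mailbox
from email.utils import parseaddr


def messages_from(mbox_path, sender):
    """Return the messages in an mbox file whose From address equals sender."""
    wanted = sender.lower()
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [
            msg for msg in box
            # parseaddr splits 'Name <addr>' into (name, addr).
            if parseaddr(str(msg.get("From", "")))[1].lower() == wanted
        ]
    finally:
        box.close()
```

Usage would look like messages_from(path, "someone@example.com"), taking the last element of the result if only the latest stored message is needed.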

Python read my outlook email mailbox and parse messages [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Reading e-mails from Outlook with Python through MAPI
I am completely new to Python and have been given the task of writing a program that connects to my Microsoft Outlook mailbox and goes through all the emails. If the subject contains a certain word, the email's time and subject will be saved in variables, and the message body will be parsed and the relevant information stored in variables as well. This information will then be stored in an external server/database. The program also needs to monitor any new emails that come to my mailbox and repeat the drill of checking the subject line and taking appropriate action.
I have written exactly the same kind of program in C# earlier using the Interop library, but now need to do this in Python. I can figure out the nitty-gritty details by reading the module documentation later on, but from a high-level perspective, what modules should I use? I have been doing my research and some modules that have been mentioned include email, procmail and imaplib, but what do the Python veterans here recommend for the kind of project I am undertaking?
Thanks in advance for any help you might be able to provide!
At one company where I worked, we had one mailbox for suggestions of websites with 'adult' material and one mailbox for spam mail that should be blocked.
Once I began working there, I was put in 'charge' of this 'gracious' job.
When I checked, there were something like 2000 unread mails with sites to block and 4000 spam mails to block too.
Of course that is a task to be automated, so I looked for a good solution.
What I did:
[1] Used Python IMAP to connect to the Exchange server
[2] Used BeautifulSoup (Python) to parse the href values inside the email
[3] After that, sent an email 'thanking' the user for their collaboration (very important)
Three days later my boss thanked me for the great effort I was making answering all the e-mails, and said that we had got compliments, because NOW we were answering the customers back (not me, the script).
OK, now let's make a plan:
Check the Python imap module [1], and afterwards take a tutorial on using IMAP4 over SSL [4].
Decide what is best for YOUR problem: download the emails (POP3) or search and browse them on the server (IMAP).
CHECK beforehand that you can connect using the IMAP4 or POP3 protocols. Exchange is buggy in this area, so please check this bug report too [3].
Once you are sure you can connect using IMAP4 or POP3, fetch one message and parse it with BeautifulSoup or lxml (in my case I looked for href and 'mailto:').
Write a nice message using the 'From:' field of the email to make it personal.
PROFIT
[1] google it imap python
[2] google it BeautifulSoup python
[3] http://support.microsoft.com/kb/296387
[4] http://yuji.wordpress.com/2011/06/22/python-imaplib-imap-example-with-gmail/
Sorry but I had to give the google urls because of my low score.
I hope this answer gives you some good pointers to your solution.
Of course you can make it more hax0r using lxml, sending the data to a DB.
But after you connect and start manipulating you can do anything :)
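The plan above (connect over IMAP, fetch the matching messages, pull out the links) could be sketched roughly as follows. To keep it standard-library only, I swapped BeautifulSoup for html.parser; fetch_matching, extract_links and LinkCollector are hypothetical names, and the host and credentials are whatever your server needs. The IMAP part is untested against Exchange.

```python
import email
import imaplib
from email import policy
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(msg):
    """Return all hrefs (including 'mailto:' ones) found in the HTML parts."""
    collector = LinkCollector()
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            collector.feed(part.get_content())
    return collector.links


def fetch_matching(host, user, password, word):
    """Return the INBOX messages whose subject contains word."""
    messages = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)
        _status, data = conn.search(None, "SUBJECT", '"%s"' % word)
        for num in data[0].split():
            _status, parts = conn.fetch(num, "(RFC822)")
            raw = parts[0][1]  # the raw bytes of the message
            messages.append(email.message_from_bytes(raw, policy=policy.default))
    return messages
```

Monitoring for new mail can then be as simple as calling fetch_matching on a schedule and remembering which messages were already handled.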

How to maintain mail conversation (reply / forward / reply to all like gmail) of email using Python pop/imap lib?

I've developed a webmail client that works with any mail server.
I want to implement message conversations for it: for example, the fwd/reply/reply-to-all emails of the same thread should be shown together, like Gmail does.
My question is: what's the key to finding the emails that are replies/forwards of, or otherwise related to, the original mail?
The In-Reply-To header of the child should have the value of the Message-Id header of the parent(s).
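A minimal sketch of grouping on those two headers, assuming every message carries a Message-Id and at most one id in In-Reply-To. The function names are made up, and this is not how any particular client does it; it just follows the parent links up to the first message of each conversation.

```python
from collections import defaultdict


def thread_root(message_id, parent_of):
    """Follow In-Reply-To links upwards until a message with no known parent."""
    seen = set()
    while message_id in parent_of and message_id not in seen:
        seen.add(message_id)  # guards against reference loops
        message_id = parent_of[message_id]
    return message_id


def group_by_thread(messages):
    """Map the Message-Id of each conversation root to its list of messages."""
    parent_of = {}
    for msg in messages:
        mid = (msg.get("Message-ID") or "").strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        if mid and parent:
            parent_of[mid] = parent
    threads = defaultdict(list)
    for msg in messages:
        mid = (msg.get("Message-ID") or "").strip()
        threads[thread_root(mid, parent_of)].append(msg)
    return dict(threads)
```

Header lookups in the email package are case-insensitive, so "Message-ID" also matches "Message-Id".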
Google just seems to chain messages based on the subject line (so does Apple Mail by the way.)

Python - Open default mail client using mailto, with multiple recipients

I'm attempting to write a Python function to send an email to a list of users, using the default installed mail client. I want to open the email client, and give the user the opportunity to edit the list of users or the email body.
I did some searching, and according to here:
http://www.sightspecific.com/~mosh/WWW_FAQ/multrec.html
It's apparently against the RFC spec to put multiple comma-delimited recipients in a mailto link. However, that's the way everybody else seems to be doing it. What exactly is the modern stance on this?
Anyhow, I found the following two sites:
http://2ality.blogspot.com/2009/02/generate-emails-with-mailto-urls-and.html
http://www.megasolutions.net/python/invoke-users-standard-mail-client-64348.aspx
which seem to suggest solutions using urllib.parse (urllib.parse.quote for me), and webbrowser.open.
I tried the sample code from the first link (2ality.blogspot.com), and that worked fine, and opened my default mail client. However, when I try to use the code in my own module, it seems to open up my default browser, for some weird reason. No funny text in the address bar, it just opens up the browser.
The email_incorrect_phone_numbers() function is in the Employees class, which contains a dictionary (employee_dict) of Employee objects, which themselves have a number of employee attributes (sn, givenName, mail etc.). Full code is actually here (Python - Converting CSV to Objects - Code Design)
from urllib.parse import quote
import webbrowser
....
    def email_incorrect_phone_numbers(self):
        email_list = []
        for employee in self.employee_dict.values():
            if not PhoneNumberFormats.standard_format.search(employee.telephoneNumber):
                print(employee.telephoneNumber, employee.sn, employee.givenName, employee.mail)
                email_list.append(employee.mail)
        recipients = ', '.join(email_list)
        webbrowser.open("mailto:%s?subject=%s&body=%s" %
            (recipients, quote("testing"), quote('testing'))
        )
Any suggestions?
Cheers,
Victor
Well, since you asked for suggestions: forget about the mailto: scheme and webbrowser, and write a small SMTP client using Python's smtplib module. It's standard, fully supported on all systems, and there's an example included in the documentation which you can practically just copy-and-paste pieces out of.
Of course, if you're using smtplib you will have to ask the user for the details of an SMTP server to use (hostname and port, and probably a login/password). That is admittedly inconvenient, so I can see why you'd want to delegate to existing programs on the system to handle the email. Problem is, there's no system-independent way to do that. Even the webbrowser module doesn't work everywhere; some people use systems on which the module isn't able to detect the default (or any) browser, and even when it can, what happens when you provide a mailto: link is entirely up to the browser.
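For reference, a bare-bones sketch of the smtplib route. The function names are made up, and the port 587 with STARTTLS default is an assumption that depends entirely on the server the user gives you.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipients, subject, body):
    """Build a plain-text message addressed to a list of recipients."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host, port=587, user=None, password=None):
    """Send msg through the given SMTP server (assumes STARTTLS is available)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        if user:
            smtp.login(user, password)
        smtp.send_message(msg)
```

The obvious downside compared to mailto: is that the user does not get to edit the message in their own mail client before it goes out.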
If you don't want to or can't use SMTP, your best bet might be to write a custom module that is able to detect and open the default email client on as many different systems as possible - basically what the webbrowser module does, except for email clients instead of browsers. In that case it's up to you to identify what kinds of mail clients your users have installed and make sure you support them. If you're thorough enough, you could probably publish your module on PyPI (Python package index) and perhaps even get it included in a future version of the Python standard library - I'm sure there are plenty of people who would appreciate something like that.
As is often the case in Python, somebody's already done most of the hard work. Check out this recipe.
In the following line, there shouldn’t be a space after the comma.
recipients = ', '.join(email_list)
Furthermore, Outlook needs semicolons, not commas. Apart from that, mailto never gave me grief.
The general tip is to test mailto URLs manually in the browser first and to debug URLs by printing them out and entering them manually.
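A small sketch that applies those points: no space after the separator, a configurable separator for Outlook, and the URL built as a string so it can be printed and tested by hand before opening it. build_mailto is a made-up helper name.

```python
from urllib.parse import quote


def build_mailto(recipients, subject="", body="", separator=","):
    """Build a mailto: URL; pass separator=';' for Outlook."""
    # Keep '@' readable in addresses, percent-encode everything else unsafe.
    to = separator.join(quote(addr, safe="@") for addr in recipients)
    return "mailto:%s?subject=%s&body=%s" % (
        to,
        quote(subject, safe=""),
        quote(body, safe=""),
    )


url = build_mailto(["a@example.com", "b@example.com"], "testing", "hello world")
print(url)  # paste this into the browser to check it before automating
```

Once the printed URL behaves as expected, it can be passed to webbrowser.open(url).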

Deleting the most recently received email via Python script?

I use Gmail and an application that notifies me if I've received a new email, containing its title in a tooltip. (GmailNotifier with Miranda-IM) Most of the emails I receive are ones I don't want to read, and it's annoying having to log in to Gmail on a slow connection just to delete said email. I believe the plugin is closed source.
I've been (unsuccessfully) trying to write a script that will log in and delete the 'top' email (the one most recently received). However, this is not as easy as I thought it would be.
I first tried using imaplib, but discovered that it doesn't contain any of the methods I hoped it would. It's a bit like the dbapi spec, containing only minimal functionality in case the IMAP spec is changed. I then tried reading the IMAP RFC (rfc3501). Halfway through it, I realized I didn't want to write an entire mail client, so decided to try using POP3 instead.
poplib is also minimal but seemingly has what I need. However pop3 doesn't appear to sort the messages in any order I'm familiar with. I have to either call top() or retr() on every single email to read the headers if I want to see the date received.
I could probably iterate through every single message header, searching for the most recent date, but that's ugly. I want to avoid parsing my entire mailbox if possible. I also don't want to 'pop' the mailbox and download any other messages.
It's been 6 hours now and I feel no closer to a solution than when I started. Am I overlooking something simple? Is there another library I could try? (I found a 'chilkat' one, but it's bloated to hell, and I was hoping to do this with the standard library)
import poplib
# connect to server
mailserver = poplib.POP3_SSL('pop.gmail.com')
mailserver.user('recent:YOURUSERNAME')  # use 'recent mode'
mailserver.pass_('YOURPASSWORD')  # consider not storing in plaintext!
# newest email has the highest message number
numMessages = len(mailserver.list()[1])
# confirm this is the right one, can comment these out later
newestEmail = mailserver.retr(numMessages)
print(newestEmail)
# most servers will not delete until you quit
mailserver.dele(numMessages)
mailserver.quit()
I worked with poplib recently, writing a very primitive email client. I tested this with my email server (not Gmail) on some test emails and it seemed to work correctly. I would recommend sending yourself a few dummy emails to test it out first.
Caveats:
Make sure you are using 'recent mode': http://mail.google.com/support/bin/answer.py?answer=47948
Make sure your Gmail account has POP3 enabled: Gmail > Settings > Forwarding and POP/IMAP > "Enable POP for all mail"
Hope this helps, it should be enough to get you going!
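If downloading the whole message with retr() is too slow on a bad connection, a variation is to fetch only the headers with top() and decide from those. This is just a sketch: the helper names are made up, and it relies on the same assumption as above that the newest email has the highest message number.

```python
import poplib  # the server argument below is a logged-in poplib.POP3_SSL object
from email import message_from_bytes


def newest_headers(server):
    """Return (message number, parsed headers) of the newest message."""
    count, _size = server.stat()
    if count == 0:
        return 0, None
    # top(n, 0) asks for the headers plus zero lines of the body.
    _resp, lines, _octets = server.top(count, 0)
    return count, message_from_bytes(b"\r\n".join(lines))


def delete_newest_if(server, predicate):
    """Delete the newest message when predicate(headers) is true."""
    number, headers = newest_headers(server)
    if headers is not None and predicate(headers):
        server.dele(number)  # takes effect when the session ends with quit()
        return True
    return False
```

For example, delete_newest_if(mailserver, lambda h: "newsletter" in (h["Subject"] or "").lower()) followed by mailserver.quit().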
