Parsing All Mail from Mailbox File in Python - python

Maybe I'm going about this the wrong way, but I want to parse a single "catch all" email inbox via Python. I see the email module and I can make it parse an individual email, but what I want to do is open (for example) /var/spool/mail/catchall and parse all of the individual messages inside it. Opening that file and running the parser over it treats the whole thing as one giant email. How would I break it into individual messages?
Alternatively: is this A Bad Idea, given I'm going to want to delete the messages when I'm done with them? I'm tying this route instead of POP/ IMAP only because the server support isn't available right now.

You'll want to use mailbox to actually go through the mailbox.

Related

Recover email list from a mailbox using Python

Is there any possibility or any library to log in to a given mail and recover a list of messages for a given sender?
I mean the situation in which I provide an e-mail address, based on this address, all messages in the inbox are filtered, and I am returned to the list of e-mails or the user's last message.
I use flask-mail to send emails, but I don't think it is possible to recover the list of messages.
You should check the standard mailbox library. It provides functionalities to read mailboxes stored on disk using the most popular mailbox file formats (Maildir, mbox, MH, Babyl, and MMDF at the time of this writing).
Be warned, nowaday, for performance, reasons many mail clients are using embedded database engines to store emails. SQLite being popular choice, you can also try the sqlite3 library.
Finally, You will also find exotic file formats like Mork. For that, you will have to write your own parser or turn to PyPy to search if someone has already done the work for you.
As a personal note, if your email client allows changing its storage backend, you may consider switching to a well know text-based storage format for your emails--it definitely helps in case of disaster recovery
As an example, I am using Thunderbird and set it up to use the mbox file format. So I can iterate over the message of my Junk folder that way:
>>> path = '~/.thunderbird/4tuag540.default/ImapMail/ssl0.ovh-1.net/INBOX.sbd/Junk'
>>> from mailbox import mbox
>>> junk = mbox(path)
>>> for message in junk:
... # Prinf the "From" header:
... print(message['From'])
...

Parsing emails as soon as they are received

I have users sending emails with some text I need to extract. Each user's email is mapped to a single mailbox. I'm currently using a cron job that polls the mailbox (postfix) every 5 minutes, checks for new messages, and sends it to a queue where I have workers parse them. I have two main questions:
Is there a way I can parse the email as soon as it's received instead of
polling the server? Also, how could
I implement this to be scalable? For
example, if there are 50 incoming
messages per second.
I'm programatically writing each user's email address to point to mailbox in the postfix configuration file. Would it be better to create a catch all account, so I don't have to write each email address? However, I know catch-all accounts are more susceptible to spam.
Use a pipe alias to catch the email, then use celery to dump it into a MQ for processing.
Yes, this can be done quite easily. All you need to do is configure the postfix to forward email to a script instead of to a mailbox. It does not really have to be a catch-all, you can configure postfix to forward specific emails to a script. The script can be written in any language. I wrote such script in php a couple of times. Another possibility for a very busy server, like 50 emails per second is to write your own filter server, then configure postfix to pass each message to your filter.
TO forward email to a script, in aliases file put a line like this: the path must point to this file
someaccount |/usr/local/bin/emailParser.php
To forward emails to a filter, it has to be configured in master.cf, a little more difficult.
I would recommend using Procmail for this. It is specifically designed to process your incoming mail and you can pass all mail with a certain property to your app.
http://www.procmail.org/
The spam problem with catchall addresses can usually be solved quite easily by monitoring all mail on the machine. If multiple addresses recieve the same mail, than there's a high probability that it's spam.

Python - Open default mail client using mailto, with multiple recipients

I'm attempting to write a Python function to send an email to a list of users, using the default installed mail client. I want to open the email client, and give the user the opportunity to edit the list of users or the email body.
I did some searching, and according to here:
http://www.sightspecific.com/~mosh/WWW_FAQ/multrec.html
It's apparently against the RFC spec to put multiple comma-delimited recipients in a mailto link. However, that's the way everybody else seems to be doing it. What exactly is the modern stance on this?
Anyhow, I found the following two sites:
http://2ality.blogspot.com/2009/02/generate-emails-with-mailto-urls-and.html
http://www.megasolutions.net/python/invoke-users-standard-mail-client-64348.aspx
which seem to suggest solutions using urllib.parse (url.parse.quote for me), and webbrowser.open.
I tried the sample code from the first link (2ality.blogspot.com), and that worked fine, and opened my default mail client. However, when I try to use the code in my own module, it seems to open up my default browser, for some weird reason. No funny text in the address bar, it just opens up the browser.
The email_incorrect_phone_numbers() function is in the Employees class, which contains a dictionary (employee_dict) of Employee objects, which themselves have a number of employee attributes (sn, givenName, mail etc.). Full code is actually here (Python - Converting CSV to Objects - Code Design)
from urllib.parse import quote
import webbrowser
....
def email_incorrect_phone_numbers(self):
email_list = []
for employee in self.employee_dict.values():
if not PhoneNumberFormats.standard_format.search(employee.telephoneNumber):
print(employee.telephoneNumber, employee.sn, employee.givenName, employee.mail)
email_list.append(employee.mail)
recipients = ', '.join(email_list)
webbrowser.open("mailto:%s?subject=%s&body=%s" %
(recipients, quote("testing"), quote('testing'))
)
Any suggestions?
Cheers,
Victor
Well, since you asked for suggestions: forget about the mailto: scheme and webbrowser, and write a small SMTP client using Python's smtplib module. It's standard, fully supported on all systems, and there's an example included in the documentation which you can practically just copy-and-paste pieces out of.
Of course, if you're using smtplib you will have to ask the user for the details of an SMTP server to use (hostname and port, and probably a login/password). That is admittedly inconvenient, so I can see why you'd want to delegate to existing programs on the system to handle the email. Problem is, there's no system-independent way to do that. Even the webbrowser module doesn't work everywhere; some people use systems on which the module isn't able to detect the default (or any) browser, and even when it can, what happens when you provide a mailto: link is entirely up to the browser.
If you don't want to or can't use SMTP, your best bet might be to write a custom module that is able to detect and open the default email client on as many different systems as possible - basically what the webbrowser module does, except for email clients instead of browsers. In that case it's up to you to identify what kinds of mail clients your users have installed and make sure you support them. If you're thorough enough, you could probably publish your module on PyPI (Python package index) and perhaps even get it included in a future version of the Python standard library - I'm sure there are plenty of people who would appreciate something like that.
As is often the case in Python, somebody's already done most of the hard work. Check out this recipe.
In the following line, there shouldn’t be a space after the comma.
recipients = ', '.join(email_list)
Furthermore, Outlook needs semicolons, not commas. Apart from that, mailto never gave me grief.
The general tip is to test mailto URLs manually in the browser first and to debug URLs by printing them out and entering them manually.

In Django, I want to insert a database record by sending myself an email?

I'm looking into a possible feature for my little to-do application... I like the idea that I can send an email to a particular email address, containing a to-do task I need to complete, and this will be read by my web application and be put in the database... So, when I come to log into my application, the to-do task I emailed will be there as a entry in the app.
Is this possible? I have a slice with SliceHost (basically a dedicated server) so I have total control on what to install etc. I'm using Python/Django/MySQL for this.
Any ideas on what steps to take to make this happen?
If I were to implement this, I'd use a scheduler and a job to be scheduled.
That job would connect to the mail server (be it POP3 or IMAP) and parse the unread messages (or messages unread by the job). Based on that I would insert that record.
You'd get 2 types of records that way. A list of mail message ids which have been processed (so you don't reprocess mails) and a list of tasks.
Disadvantage is that it takes some time before you see the task, as the job only executes every X minutes, or seconds.
If that is not good enough I'd go for a permanent IMAP connection, but you'd have to implement more error handling; you don't just retry automatically every X minutes.
Googling for Django +scheduler will get you started.
also have a look at this StackOverflow thread, no need to reinvent the wheel :)
I needed the exact same thing. I use the Lamson project (which is written in python) to transform email, forward email based on rules to my www.evernote.com and thinking rock www.trgtd.com.au accounts, update firewall web filtering rules, update allow/deny lists for my spam filter, read and write databases etc....
I like to think of it as email server automation and email application development.
www.lamsonproject.org
Troy
One way that I've solved this in the past was using qmail's .qmail files (docs).
Basically you set up qmail and point your email address (for ease of use, lets assume proc#whatever.com is your email address) to your home directory. In that directory you set up a .qmail-proc file to handle the mail.
This allows you to use a full-fledged SMTP server on your server, including spam filtering, forwarding, aliases, all that fun stuff. You can then pipe the data from an email into an application. In your case, I would suggest making a Mangement Command in Django to process the email (I'll call it proc_email). Thus your .qmail-proc may look like:
/var/spool/mail/proc
| /www/django/myproject/manage.py proc_email
This stores a copy of the email in /var/spool/mail/proc, then passes the email to the script in the second line. The email itself is passed to proc_email via sys.stdin. Simply read the email from there, and store it through your Django Models.
If you need to process email for different addresses later, you can also set up aliases which point to your home directory, and use .qmail-<username> files for each alias. Allowing you to pass other flags (such as the username for each alias) to proc_email if needed.
I should note that this isn't the simplest solution, but it can scale, and is pretty darn bullet proof.
I would not focus on Django for this.
I would create a mail server to catch these emails. Use http://docs.python.org/library/smtpd.html.
I would then use just the Django ORM to update the database based on the emails received.

Deleting the most recently received email via Python script?

I use Gmail and an application that notifies me if I've received a new email, containing its title in a tooltip. (GmailNotifier with Miranda-IM) Most of the emails I receive are ones I don't want to read, and it's annoying having to login to Gmail on a slow connection just to delete said email. I believe plugin is closed source.
I've been (unsuccessfully) trying to write a script that will login and delete the 'top' email (the one most recently received). However this is not as easy I thought it would be.
I first tried using imaplib, but discovered that it doesn't contain any of the methods I hoped it would. It's a bit like the dbapi spec, containing only minimal functionality incase the imap spec is changed. I then tried reading the imap RFC (rfc3501). Halfway through it, I realized I didn't want to write an entire mail client, so decided to try using pop3 instead.
poplib is also minimal but seemingly has what I need. However pop3 doesn't appear to sort the messages in any order I'm familiar with. I have to either call top() or retr() on every single email to read the headers if I want to see the date received.
I could probably iterate through every single message header, searching for the most recent date, but that's ugly. I want to avoid parsing my entire mailbox if possible. I also don't want to 'pop' the mailbox and download any other messages.
It's been 6 hours now and I feel no closer to a solution than when I started. Am I overlooking something simple? Is there another library I could try? (I found a 'chilkat' one, but it's bloated to hell, and I was hoping to do this with the standard library)
import poplib
#connect to server
mailserver = poplib.POP3_SSL('pop.gmail.com')
mailserver.user('recent:YOURUSERNAME') #use 'recent mode'
mailserver.pass_('YOURPASSWORD') #consider not storing in plaintext!
#newest email has the highest message number
numMessages = len(mailserver.list()[1])
#confirm this is the right one, can comment these out later
newestEmail = mailserver.retr(numMessages)
print newestEmail
#most servers will not delete until you quit
mailserver.dele(numMessages)
mailserver.quit()
I worked with the poplib recently, writing a very primitive email client. I tested this with my email server (not gmail) on some test emails and it seemed to work correctly. I would send yourself a few dummy emails to test it out first.
Caveats:
Make sure you are using 'recent
mode':
http://mail.google.com/support/bin/answer.py?answer=47948
Make sure your Gmail account has POP3
enabled: Gmail > Settings >
Forwarding and POP/IMAP > "Enable POP
for all mail"
Hope this helps, it should be enough to get you going!

Categories

Resources