Parsing emails as soon as they are received - python

I have users sending emails with some text I need to extract. Each user's email is mapped to a single mailbox. I'm currently using a cron job that polls the mailbox (postfix) every 5 minutes, checks for new messages, and sends it to a queue where I have workers parse them. I have two main questions:
Is there a way I can parse the email as soon as it's received instead of
polling the server? Also, how could
I implement this to be scalable? For
example, if there are 50 incoming
messages per second.
I'm programatically writing each user's email address to point to mailbox in the postfix configuration file. Would it be better to create a catch all account, so I don't have to write each email address? However, I know catch-all accounts are more susceptible to spam.

Use a pipe alias to catch the email, then use celery to dump it into a MQ for processing.

Yes, this can be done quite easily. All you need to do is configure the postfix to forward email to a script instead of to a mailbox. It does not really have to be a catch-all, you can configure postfix to forward specific emails to a script. The script can be written in any language. I wrote such script in php a couple of times. Another possibility for a very busy server, like 50 emails per second is to write your own filter server, then configure postfix to pass each message to your filter.
TO forward email to a script, in aliases file put a line like this: the path must point to this file
someaccount |/usr/local/bin/emailParser.php
To forward emails to a filter, it has to be configured in master.cf, a little more difficult.

I would recommend using Procmail for this. It is specifically designed to process your incoming mail and you can pass all mail with a certain property to your app.
http://www.procmail.org/
The spam problem with catchall addresses can usually be solved quite easily by monitoring all mail on the machine. If multiple addresses recieve the same mail, than there's a high probability that it's spam.

Related

Simple SMTP Processing point

I'm working with a third party service that will send update information on some machinery to an email address I specify. What I want to do is take that email, parse out the the owner of a particular piece of hardware, and send that owner an email update. To be clear, I can't send the first email directly to the end user because the third party source I'm working with doesn't have my end user's email address data.
I've been able to do the two parts [(1)receiving+parsing, (2) sending new message ] independently using the smtpd and smtplib python packages, but I'm not able to use them together. Python doesn't have a builtin MTA so any outgoing messages I send with smtplib are intercepted by the smtpd.SMTPServer. Is there any way I can combine these two parts (receiving+parsing and sending an update) with a simple script without having to invest in a big smtp server package like Lamson?

How to receive an email on server? Better using Python or Perl

I've checked so many articles, but can't find one for server to server email receiving. I want to write a program or some code just acts as an email receiver, not SMTP server or something else.
Let's suppose I have a domain named example.com, and a gmail user user#gmail.com sends me an email to admin#example.com, or a yahoo user user#yahoo.com sends me an email to test#example.com. Now, what do I do to receive this email? I prefer to write this code in Python or Perl.
Regards,
David
http://docs.python.org/library/smtpd.html
http://www.doughellmann.com/PyMOTW/smtpd/
In Perl:
Net::SMTP libraries (including Net::SMTP::Server).
Here's an example of using it: http://wiki.nil.com/Simple_SMTP_server_in_PERL
"reveive" is not a word. I'm really not sure if you mean "receive" or "retrieve".
If you mean "receive" then you probably do want an SMTP server, despite your claim. An SMTP server running on a computer is responsible for listening for network requests from other SMTP servers that wish to deliver mail to that computer.
The SMTP server then, typically, deposits the mail in a directory where it can be read by the recipient. They can usually be configured (often in combination with tools such as Procmail) to do stuff to incoming email (such as pass it to a program for manipulation along the way, this allows you to avoid having to write a full blown SMTP server in order to capture some emails).
If, on the other hand, you mean "retrieve", then you are probably looking to find a library that will let your program act as an IMAP or POP client. These protocols are used to allow remote access to a mailbox.
Good article at http://muffinresearch.co.uk/archives/2010/10/15/fake-smtp-server-with-python/ showing how to subclass smtpd.SMTPserver and how to run it.

In Django, I want to insert a database record by sending myself an email?

I'm looking into a possible feature for my little to-do application... I like the idea that I can send an email to a particular email address, containing a to-do task I need to complete, and this will be read by my web application and be put in the database... So, when I come to log into my application, the to-do task I emailed will be there as a entry in the app.
Is this possible? I have a slice with SliceHost (basically a dedicated server) so I have total control on what to install etc. I'm using Python/Django/MySQL for this.
Any ideas on what steps to take to make this happen?
If I were to implement this, I'd use a scheduler and a job to be scheduled.
That job would connect to the mail server (be it POP3 or IMAP) and parse the unread messages (or messages unread by the job). Based on that I would insert that record.
You'd get 2 types of records that way. A list of mail message ids which have been processed (so you don't reprocess mails) and a list of tasks.
Disadvantage is that it takes some time before you see the task, as the job only executes every X minutes, or seconds.
If that is not good enough I'd go for a permanent IMAP connection, but you'd have to implement more error handling; you don't just retry automatically every X minutes.
Googling for Django +scheduler will get you started.
also have a look at this StackOverflow thread, no need to reinvent the wheel :)
I needed the exact same thing. I use the Lamson project (which is written in python) to transform email, forward email based on rules to my www.evernote.com and thinking rock www.trgtd.com.au accounts, update firewall web filtering rules, update allow/deny lists for my spam filter, read and write databases etc....
I like to think of it as email server automation and email application development.
www.lamsonproject.org
Troy
One way that I've solved this in the past was using qmail's .qmail files (docs).
Basically you set up qmail and point your email address (for ease of use, lets assume proc#whatever.com is your email address) to your home directory. In that directory you set up a .qmail-proc file to handle the mail.
This allows you to use a full-fledged SMTP server on your server, including spam filtering, forwarding, aliases, all that fun stuff. You can then pipe the data from an email into an application. In your case, I would suggest making a Mangement Command in Django to process the email (I'll call it proc_email). Thus your .qmail-proc may look like:
/var/spool/mail/proc
| /www/django/myproject/manage.py proc_email
This stores a copy of the email in /var/spool/mail/proc, then passes the email to the script in the second line. The email itself is passed to proc_email via sys.stdin. Simply read the email from there, and store it through your Django Models.
If you need to process email for different addresses later, you can also set up aliases which point to your home directory, and use .qmail-<username> files for each alias. Allowing you to pass other flags (such as the username for each alias) to proc_email if needed.
I should note that this isn't the simplest solution, but it can scale, and is pretty darn bullet proof.
I would not focus on Django for this.
I would create a mail server to catch these emails. Use http://docs.python.org/library/smtpd.html.
I would then use just the Django ORM to update the database based on the emails received.

Deleting the most recently received email via Python script?

I use Gmail and an application that notifies me if I've received a new email, containing its title in a tooltip. (GmailNotifier with Miranda-IM) Most of the emails I receive are ones I don't want to read, and it's annoying having to login to Gmail on a slow connection just to delete said email. I believe plugin is closed source.
I've been (unsuccessfully) trying to write a script that will login and delete the 'top' email (the one most recently received). However this is not as easy I thought it would be.
I first tried using imaplib, but discovered that it doesn't contain any of the methods I hoped it would. It's a bit like the dbapi spec, containing only minimal functionality incase the imap spec is changed. I then tried reading the imap RFC (rfc3501). Halfway through it, I realized I didn't want to write an entire mail client, so decided to try using pop3 instead.
poplib is also minimal but seemingly has what I need. However pop3 doesn't appear to sort the messages in any order I'm familiar with. I have to either call top() or retr() on every single email to read the headers if I want to see the date received.
I could probably iterate through every single message header, searching for the most recent date, but that's ugly. I want to avoid parsing my entire mailbox if possible. I also don't want to 'pop' the mailbox and download any other messages.
It's been 6 hours now and I feel no closer to a solution than when I started. Am I overlooking something simple? Is there another library I could try? (I found a 'chilkat' one, but it's bloated to hell, and I was hoping to do this with the standard library)
import poplib
#connect to server
mailserver = poplib.POP3_SSL('pop.gmail.com')
mailserver.user('recent:YOURUSERNAME') #use 'recent mode'
mailserver.pass_('YOURPASSWORD') #consider not storing in plaintext!
#newest email has the highest message number
numMessages = len(mailserver.list()[1])
#confirm this is the right one, can comment these out later
newestEmail = mailserver.retr(numMessages)
print newestEmail
#most servers will not delete until you quit
mailserver.dele(numMessages)
mailserver.quit()
I worked with the poplib recently, writing a very primitive email client. I tested this with my email server (not gmail) on some test emails and it seemed to work correctly. I would send yourself a few dummy emails to test it out first.
Caveats:
Make sure you are using 'recent
mode':
http://mail.google.com/support/bin/answer.py?answer=47948
Make sure your Gmail account has POP3
enabled: Gmail > Settings >
Forwarding and POP/IMAP > "Enable POP
for all mail"
Hope this helps, it should be enough to get you going!

Parsing All Mail from Mailbox File in Python

Maybe I'm going about this the wrong way, but I want to parse a single "catch all" email inbox via Python. I see the email module and I can make it parse an individual email, but what I want to do is open (for example) /var/spool/mail/catchall and parse all of the individual messages inside it. Opening that file and running the parser over it treats the whole thing as one giant email. How would I break it into individual messages?
Alternatively: is this A Bad Idea, given I'm going to want to delete the messages when I'm done with them? I'm tying this route instead of POP/ IMAP only because the server support isn't available right now.
You'll want to use mailbox to actually go through the mailbox.

Categories

Resources