I'm using imaplib to fetch emails for several accounts (Gmail, Yahoo, ...).
What is the best way to store emails locally (including attachments)?
Is there any way to pickle emails and store them as files?
Is it possible to store emails as bytes and retrieve them
later as message objects?
I'll try to save each mail in a separate folder, with each field in a JSON file
and each attachment as a separate file, but I was wondering if there is a
native way of doing it.
There are already several established ways to store mailboxes (i.e. a list of emails). Popular examples are Maildir and mbox.
Python's standard library includes the mailbox module, which can handle them:
Supported mailbox formats are Maildir, mbox, MH, Babyl, and MMDF.
You can of course roll your own solution, pickle them or dump them as JSON to a file, but if you use one of the common formats, you gain compatibility with other programs (importing them into Thunderbird, for example).
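A minimal sketch of the mailbox approach, assuming the raw bytes come from an imaplib `(RFC822)` fetch: it stores them untouched in a Maildir and parses one back into a message object. The function names `store_raw_messages` and `load_message` and the `archive` directory are hypothetical; `mailbox.Maildir`, `get_bytes` and `email.message_from_bytes` are standard-library calls.

```python
import email
import email.policy
import mailbox
import os
import tempfile


def store_raw_messages(maildir_path, raw_messages):
    """Add raw RFC 822 messages (bytes, e.g. from an imaplib "(RFC822)" fetch) to a Maildir.

    The untouched bytes are stored, so headers, body and attachments survive as-is.
    Returns the keys Maildir assigned, for looking the messages up later.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    keys = [box.add(raw) for raw in raw_messages]
    box.flush()
    return keys


def load_message(maildir_path, key):
    """Read one stored message back as an email.message.EmailMessage."""
    box = mailbox.Maildir(maildir_path, create=False)
    return email.message_from_bytes(box.get_bytes(key), policy=email.policy.default)


if __name__ == "__main__":
    sample = (b"From: alice@example.com\r\n"
              b"To: bob@example.com\r\n"
              b"Subject: Hello\r\n"
              b"\r\n"
              b"Body text\r\n")
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "archive")
        [key] = store_raw_messages(archive, [sample])
        print(load_message(archive, key)["Subject"])
```

Swapping `mailbox.Maildir` for `mailbox.mbox` gives a single-file archive instead; the `add`, `flush` and `get_bytes` calls stay the same.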
I have a Python program that will read an Outlook inbox using these Python libraries:
1. IMAPClient
2. email
I want to know if it is possible to get the date the email attachment was created.
I don't see anything in the email headers that stands out. I can get the date an email was sent (or forwarded), but it is the forwarded case that prompts this question.
I want to get the creation date of the attachment inside the email. If anyone has done this and has a full working code snippet to share, it would be greatly appreciated.
I have done several searches, looked carefully through the email headers, and read the documentation for both libraries I am using (IMAPClient and email), and I see nothing that would lead to a solution.
Some file formats include this, but most don't. For example, the EXIF data in some JPEG files includes it; to read that you'll need an EXIF library, not an IMAP library. Microsoft Word files include a creation date (IIRC that's mandatory for that format), but again you'll need a type-specific library.
IMAP is merely the channel through which you download the data you want to examine.
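One more place worth checking before reaching for type-specific libraries: RFC 2183 defines optional `creation-date` and `modification-date` parameters on the `Content-Disposition` header. Few mail clients fill them in, so expect `None` most of the time. A minimal sketch using only the standard `email` package, assuming you already have the raw message bytes; `attachment_dates` is a hypothetical helper name.

```python
import email
from email.utils import collapse_rfc2231_value, parsedate_to_datetime


def attachment_dates(raw_message):
    """Return (filename, creation_date, modification_date) for every attachment.

    The dates are the optional RFC 2183 Content-Disposition parameters, parsed
    into datetime objects; each is None when absent or unparseable.
    """
    msg = email.message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        if part.get_content_disposition() != "attachment":
            continue
        dates = []
        for param in ("creation-date", "modification-date"):
            value = part.get_param(param, header="content-disposition")
            if value is None:
                dates.append(None)
                continue
            try:
                # collapse_rfc2231_value turns an RFC 2231 tuple into a plain string.
                dates.append(parsedate_to_datetime(collapse_rfc2231_value(value)))
            except (TypeError, ValueError):
                # Malformed dates are treated the same as missing ones.
                dates.append(None)
        found.append((part.get_filename(), *dates))
    return found
```

The function walks nested multiparts too, so attachments inside a forwarded message are included.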
Is there any way, or any library, to log in to a given mailbox and retrieve a list of messages from a given sender?
I mean a situation in which I provide an e-mail address, all messages in the inbox are filtered by that address, and I get back the list of e-mails, or the sender's last message.
I use flask-mail to send emails, but I don't think it can retrieve the list of messages.
You should check the standard mailbox library. It provides functionalities to read mailboxes stored on disk using the most popular mailbox file formats (Maildir, mbox, MH, Babyl, and MMDF at the time of this writing).
Be warned that nowadays, for performance reasons, many mail clients use embedded database engines to store emails. SQLite is a popular choice, so you can also try the sqlite3 library.
Finally, you will also find exotic file formats like Mork. For those, you will have to write your own parser or search PyPI to see whether someone has already done the work for you.
As a personal note, if your email client allows changing its storage backend, you may consider switching to a well-known text-based storage format for your emails; it definitely helps in case of disaster recovery.
As an example, I am using Thunderbird and set it up to use the mbox file format, so I can iterate over the messages in my Junk folder like this:
>>> path = '~/.thunderbird/4tuag540.default/ImapMail/ssl0.ovh-1.net/INBOX.sbd/Junk'
>>> from mailbox import mbox
>>> junk = mbox(path)
>>> for message in junk:
...     # Print the "From" header:
...     print(message['From'])
...
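A sketch that applies the same `mailbox` approach to the original question of filtering by sender, assuming the messages are in a local mbox file: it matches the address part of each `From:` header and picks the newest match by its `Date:` header. `messages_from` and `last_message_from` are hypothetical names. For messages still on an IMAP server, the server-side equivalent is the IMAP SEARCH `FROM` criterion, e.g. `conn.search(None, 'FROM', '"alice@example.com"')` with imaplib.

```python
import mailbox
from email.utils import mktime_tz, parseaddr, parsedate_tz


def messages_from(mbox_path, sender):
    """All messages in an mbox whose From: address equals `sender` (case-insensitive)."""
    wanted = sender.strip().lower()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [msg for msg in box
                if parseaddr(msg.get("From", ""))[1].lower() == wanted]
    finally:
        box.close()


def _timestamp(msg):
    """UTC timestamp from the Date: header, or None if it is missing or unparseable."""
    parsed = parsedate_tz(msg.get("Date", ""))
    return mktime_tz(parsed) if parsed else None


def last_message_from(mbox_path, sender):
    """The newest message from `sender` by Date: header, or None if there is none."""
    dated = []
    for msg in messages_from(mbox_path, sender):
        ts = _timestamp(msg)
        if ts is not None:
            dated.append((ts, msg))
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]
```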
Summary
How do I create a set of on-demand mock/dummy/fake emails in numerous
IMAP folders? Email content needs to be shareable in a public forum via
a commonly accessed IMAP-server account (typically for testers/developers
trying to debug MUA problems/configurations), with no privacy
risks.
I haven't yet found a solution to do this. Unless I can find something
(I'm totally looking for suggestions), I'm relegated to writing my own
software client. If so, I'd do it in Python, and I'm looking for general
pointers on which tools/libraries/methods/approaches I should employ to
get a first working prototype as quickly as possible.
How should I solve the above, given the context below?
Purpose
I want to test various MUA deployments, sharing the same IMAP
account between many users/testers/developers (of any MUA) in a
public arena. Example: I might ask on a Notmuch email list:
"Why is my mbsync/Notmuch
config not working? Here's a shared gmail.com account we can
collectively use as a common IMAP server to minimize server-side
variables and thus collectively help debug stuff."
IMAP-Client Requirements
The IMAP-client program:
1. must be able to create a variable number of nested IMAP folders with any number of emails,
2. must ensure all email content and folder names are shareable, with no privacy concerns (any reasonable content will do, and it doesn't have to make sense; e.g. #1: variants of Lorem ipsum might work; e.g. #2: input for 3 or more example emails provided by the user/caller), so long as the emails can be opened and read by MUAs and their attachments are "real" enough to be opened by the attachment file's corresponding application,
3. will include some number of emails with 1 or more attachments,
4. will optimally (though this is not required for initial versions) be capable of generating GBs of content by creating many thousands of emails in hundreds of nested IMAP folders; many or large file attachments will help with this,
5. must be able to do all of the above on demand, given any new/fresh IMAP-server account credentials.
As an implementation shortcut, it's ok for the client to duplicate
much/most of the email content, so long as there's significant variance
in Date:, To:, From:, and Subject: headers and email-folder names (all
of which are presumably easy to "randomize").
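A minimal sketch of the message-generation half, using only the standard library: each call builds an `EmailMessage` with randomized `From:`, `To:`, `Subject:` and `Date:` headers and Lorem ipsum text, and optionally adds a plain-text attachment of a chosen size. The name pools, `fake_address`, `fake_message` and `notes.txt` are placeholder assumptions; more realistic PDF or image attachments would need extra libraries.

```python
import random
from datetime import timedelta
from email.message import EmailMessage
from email.utils import format_datetime, make_msgid

# Hypothetical placeholder pools; every value is made up, so nothing private leaks.
FIRST_NAMES = ["alice", "bob", "carol", "dave", "erin", "frank"]
DOMAINS = ["example.com", "example.org", "example.net"]
LOREM = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod "
         "tempor incididunt ut labore et dolore magna aliqua.").split()


def fake_address(rng):
    return f"{rng.choice(FIRST_NAMES)}{rng.randint(1, 999)}@{rng.choice(DOMAINS)}"


def fake_message(rng, start, end, attachment_size=0):
    """Build one randomized EmailMessage dated between the aware datetimes start and end.

    Returns (message, sent_datetime). If attachment_size > 0, a text attachment of
    exactly that many bytes is added, which any text editor can open.
    """
    msg = EmailMessage()
    msg["From"] = fake_address(rng)
    msg["To"] = fake_address(rng)
    msg["Subject"] = " ".join(rng.sample(LOREM, 4)).capitalize()
    span = int((end - start).total_seconds())
    sent = start + timedelta(seconds=rng.randint(0, span))
    msg["Date"] = format_datetime(sent)
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content(" ".join(rng.choices(LOREM, k=60)))
    if attachment_size > 0:
        filler = (" ".join(LOREM) + "\n").encode("ascii")
        data = (filler * (attachment_size // len(filler) + 1))[:attachment_size]
        # Converts the message to multipart/mixed and base64-encodes the bytes.
        msg.add_attachment(data, maintype="text", subtype="plain", filename="notes.txt")
    return msg, sent
```

Passing a seeded `random.Random` makes a run reproducible, which helps when several testers need to talk about "the same" mailbox.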
More Details
I've pondered trying to non-private-ize existing emails/folders from
IMAP accounts I already have (that serve the above requirements), but
that work appears way too hard. Too much personal, sensitive information
would need to be "converted"/"private-ized." However, I'd like to hear
options for ways to easily privatize (scramble, encrypt, something?)
this existing email content. Such a path might save me having to write
the software.
The only way I see to solve this properly: leverage an IMAP client
program (again, I'm presumably writing it) that can create emails and
email folders on any designated IMAP server/account. Program input can
include example (presumably private) email content, number of folders
and nesting levels, randomness, date ranges (of emails), etc.
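The IMAP half of such a client can be sketched with imaplib's `create` and `append` methods, assuming an already logged-in `imaplib.IMAP4_SSL` connection. `folder_names` and `populate` are hypothetical helpers; the `/` hierarchy separator is an assumption (Gmail uses it, other servers may use `.`), and the generated folder names contain no spaces, so no IMAP quoting is needed.

```python
def folder_names(prefix, breadth, depth, sep="/"):
    """Yield nested folder names, parents before children.

    folder_names("Test", 2, 1) yields "Test", "Test/F1", "Test/F2".
    The "/" separator is an assumption; check your server's LIST output.
    """
    yield prefix
    level = [prefix]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for i in range(1, breadth + 1):
                name = f"{parent}{sep}F{i}"
                next_level.append(name)
                yield name
        level = next_level


def populate(conn, folders, make_message, per_folder):
    """CREATE each folder on conn (an imaplib.IMAP4 connection), then APPEND messages.

    make_message is any callable returning (EmailMessage, aware datetime); the
    datetime becomes the message's INTERNALDATE on the server.
    """
    for name in folders:
        # A NO reply usually means the folder already exists, so it is not fatal.
        conn.create(name)
        for _ in range(per_folder):
            msg, sent = make_message()
            typ, data = conn.append(name, None, sent, msg.as_bytes())
            if typ != "OK":
                raise RuntimeError(f"APPEND to {name!r} failed: {data!r}")
```

For example, after `conn = imaplib.IMAP4_SSL(host)` and `conn.login(user, password)`, calling `populate(conn, folder_names("Test", 3, 2), make_message, 20)` would create 13 folders with 20 messages each, where `make_message` is whatever generator you plug in.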
I've not yet found anything that does this.
GreenMail appears to set up
the IMAP server, but not the IMAP server content, unless I'm overlooking
something?
My All mailbox contains many more emails than my Important and Sent mailboxes. I want to remove all the mails that are in neither the Important nor the Sent mailbox.
I cannot do it with the following steps:
1) Delete all the emails in the All mailbox (when I delete all the emails in the All mailbox, all the emails in the Important and Sent mailboxes are deleted at the same time),
2) then copy the emails back from the Important and Sent mailboxes.
How can I write code to accomplish this?
The problem can also be put another way:
how can I make a copy of the emails in my Gmail mailbox "[Gmail]/&kc2JgQ-" in the local directory g:\mygmail?
There are 5 emails in my Gmail inbox. I saved all of them in g:\mygmails with the following code, naming them 0th.myemail, 1th.myemail, 2th.myemail, 3th.myemail and 4th.myemail. How can I now read them with Thunderbird or some other email software? I don't want to write my own code to read them.
import imaplib
import os

att_path = "g:\\mygmails\\"
user = "xxxx"
password = "yyyy"

con = imaplib.IMAP4_SSL('imap.gmail.com')
con.login(user, password)
con.select('INBOX')
resp, items = con.search(None, "ALL")
items = items[0].split()
for idx, num in enumerate(items):
    # Fetch the complete raw message (headers, body and attachments) as bytes.
    resp, data = con.fetch(num, "(RFC822)")
    raw = data[0][1]
    # Save it as 0th.myemail, 1th.myemail, ...
    with open(os.path.join(att_path, str(idx) + "th.myemail"), 'wb') as fp:
        fp.write(raw)
con.logout()
After some digging around on Google, I found a GitHub repository that provides a module for doing just this. It is not very well documented, but the source code is very easy to read, so that isn't a significant loss.
To use this module for your problem, you can load each email with the specified labels and mark it to be kept, then go through all the emails and delete the ones that have not been marked.
I don't currently see a natural way to mark the emails on the remote server, so you may have to record the emails as strings and store them in a set.
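A sketch of that marking step with plain imaplib instead of the module mentioned above, assuming the `Message-ID` header is a good enough key: collect the Message-IDs of the Important and Sent mailboxes, then list the All Mail UIDs whose Message-ID is not among them. `fetch_message_ids` and `uids_to_delete` are hypothetical names, and the actual deletion is deliberately left out so you can review the list first.

```python
import email
import re

MSGID_FETCH = "(UID BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])"


def fetch_message_ids(conn, mailbox):
    """Return {uid: Message-ID} for one mailbox, opened read-only so nothing changes."""
    conn.select(mailbox, readonly=True)
    typ, data = conn.uid("SEARCH", None, "ALL")
    uids = data[0].split()
    if not uids:
        return {}
    typ, data = conn.uid("FETCH", b",".join(uids), MSGID_FETCH)
    ids = {}
    for item in data:
        if not isinstance(item, tuple):
            continue  # the closing b")" entries carry no header data
        match = re.search(rb"UID (\d+)", item[0])
        if match:
            ids[match.group(1)] = email.message_from_bytes(item[1]).get("Message-ID")
    return ids


def uids_to_delete(all_mail_ids, keep_ids):
    """UIDs from All Mail whose Message-ID appears in none of the kept mailboxes.

    Messages without a Message-ID are never selected, as a safety measure.
    """
    keep = {mid for mid in keep_ids if mid}
    return sorted((uid for uid, mid in all_mail_ids.items() if mid and mid not in keep),
                  key=int)
```

Gmail mailbox names containing spaces must be quoted for imaplib, e.g. `fetch_message_ids(conn, '"[Gmail]/All Mail"')`; localized accounts use different names, so check them with `conn.list()` first.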
If you still have any questions, just post a comment on this answer and I can elaborate.
For example, if you wanted to copy the entries of a particular mailbox into a Python data structure, you could do it like this:
# Global Variables
username, password, mailboxname = '', '', '[Gmail]/&kc2JgQ-'
# Set up
import gmail
g = gmail.Gmail()
g.login(username, password)
# Actual code: fetch every message in the mailbox.
# (The loop variable is not called "email" so it doesn't shadow the email module.)
emails = []
for msg in g.mailbox(mailboxname).mail():
    emails.append(msg.fetch())
# Tear down.
g.logout()
Assuming you adjust the global variables accordingly, you now have a Python list (in the variable emails) of all the emails in mailboxname for the Gmail account username. Once you have this, you can easily save them to one or more files.
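For the Thunderbird part of the question, one option is to put the messages into a single mbox file, which Thunderbird can import (for example with an import add-on such as ImportExportTools NG). A minimal sketch, assuming the input is either raw bytes, such as the `.myemail` files written by the imaplib loop in the question, or `email.message.Message` objects such as those in the `emails` list; `save_as_mbox` and `raw_files` are hypothetical names.

```python
import glob
import mailbox
import os


def save_as_mbox(mbox_path, messages):
    """Append messages (raw bytes or email.message.Message objects) to one mbox file."""
    box = mailbox.mbox(mbox_path, create=True)
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


def raw_files(folder, pattern="*.myemail"):
    """Yield the bytes of each saved message file in folder, in file-name order."""
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path, "rb") as fh:
            yield fh.read()
```

Usage would be something like `save_as_mbox(r"g:\mygmails\saved.mbox", raw_files(r"g:\mygmails"))`. Alternatively, giving each saved file an `.eml` extension usually lets Thunderbird open it directly as a single saved message.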
If you like Windows PowerShell, I have a solution that can be reused with little effort and customized for your needs. You can set up your mail user agent to use the Web Access API and automate this task. In my examples, good old PowerShell (Microsoft's task automation and configuration management framework), with its headless IE capabilities (which let it run as a daemon and contact us only if preconditions are true), is able to support all this.
To be more precise, if you have to log in and use firewall Web Access APIs, the implementation is almost the same. So we kill two birds with one stone: every morning you'll be behind the wall and know your mail content. Here you can see a sample solution.
I'm writing a script in Python that saves attachments from Gmail, only from unseen emails. To save on bandwidth I want to make sure that every file only gets downloaded once.
- I can't check the folder where I save them, because a file may already have been removed, and then it shouldn't be downloaded again. (The script accesses the Inbox read-only, so it doesn't mark the email as read. The next time the script runs, it will download the same attachments again, until the email gets marked read via another channel.)
- Right now I save the filename to an SQLite database, but there are two problems: I haven't figured out how to check the database for the filename the next time I run the script, and there's also a chance that at some point an attachment arrives with the same filename as an earlier one, which then wouldn't get downloaded.
What's a safe and scalable way to make sure I don't download the files more than once?
There are several open source projects in Python that already perform this task very well. Why don't you take a look at the source code of OfflineIMAP and getmail? Also, if you're just trying to back up your Gmail account, I suggest you use one of those rather than rolling your own.
Instead of saving only the filename to the database, you could also save, for example, the Date: header of the mail (or any combination of headers that you are sure uniquely identifies a mail).
You could fetch the headers for the message, and use the message's Date and/or Message-Id header value to construct a "unique id prefix" for all of the attachments in that message. Then create a key of the form [unique_id]_[filename], check if that key exists in your database or filesystem. If not, download all attachments for that message, and save each with the modified unique id key.
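A minimal sketch of that key scheme using the standard `sqlite3` module: the unique id is a hash of the `Message-ID` and `Date` headers, the key is `[unique_id]_[filename]`, and a `PRIMARY KEY` lookup answers "have I downloaded this before?". The table name `downloaded` and all function names are hypothetical.

```python
import hashlib
import sqlite3


def open_db(path):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS downloaded (key TEXT PRIMARY KEY)")
    db.commit()
    return db


def attachment_key(message_id, date, filename):
    """Build "[unique_id]_[filename]".

    The unique id is a SHA-1 prefix of Message-ID plus Date, so odd characters in
    those headers never end up in the key or in file names built from it.
    """
    unique_id = hashlib.sha1(f"{message_id}|{date}".encode("utf-8")).hexdigest()[:16]
    return f"{unique_id}_{filename}"


def already_downloaded(db, key):
    row = db.execute("SELECT 1 FROM downloaded WHERE key = ?", (key,)).fetchone()
    return row is not None


def mark_downloaded(db, key):
    # INSERT OR IGNORE keeps a repeated call from failing on the PRIMARY KEY constraint.
    with db:
        db.execute("INSERT OR IGNORE INTO downloaded (key) VALUES (?)", (key,))
```

In the download loop, compute the key for each attachment, skip it when `already_downloaded` returns True, and call `mark_downloaded` only after the file has been written, so an interrupted download is retried on the next run.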