How do I convert an email body to plain text in python? - python

I'm trying to store the text from the body of an email into a file (as plain text) for comparison. This is the section of the code that's currently giving me the problem:
result, data = mail.search(None, "ALL")
ids = data[0]
id_list = ids.split()
latest_email_id = id_list[-1]
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
mymail = email.message_from_bytes(raw_email)
mytext=mymail.get_payload()[0].get_payload()
print(mytext)
The output of the file looks like this:
[<email.message.Message object at 0x03BFC410>, <email.message.Message object at 0x03C0D210>]
How do I get the body of the email as plain text?
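One possible approach, sketched under the assumption that the message may be multipart and that the first text/plain part is the body you want: walk the message tree, skip attachments, and decode the payload using the part's declared charset. The helper name `get_plain_text_body`, the sample message, and the UTF-8 fallback are illustrative assumptions, not part of the original code.

```python
import email


def get_plain_text_body(msg):
    """Return the first text/plain part of msg as str, or None if there is none."""
    for part in msg.walk():
        # Skip containers and anything explicitly marked as an attachment.
        if part.is_multipart():
            continue
        if part.get_content_disposition() == "attachment":
            continue
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "utf-8"  # fallback is an assumption
            return payload.decode(charset, errors="replace")
    return None


# A hand-written sample message so the sketch runs without a mail server.
raw_email = (
    b"Subject: demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"Hello in plain text\r\n"
    b"--sep\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>Hello in HTML</p>\r\n"
    b"--sep--\r\n"
)
mymail = email.message_from_bytes(raw_email)
mytext = get_plain_text_body(mymail)
print(mytext)
```

In the question's code, `raw_email` would come from `data[0][1]` instead of the sample bytes. If a message has only an HTML part, this helper returns None, and converting the HTML to text would be a separate step.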

Related

Use python to download email attachments only based on Subject

The following code uses IMAP to find emails by subject line; it returns all parts of each email and downloads the attachments. However, I only need it to download the attachments, not the entire body as well. I understand this has to do with the for part in email_message.walk(): loop, which iterates over the entire email. Could someone please help me change this code so that it downloads only the attachments? I'm sure this is a simple change, but I'm not sure how to make it.
import imaplib
import email.header
import os
import sys
import csv
# Your IMAP Settings
host = 'imap.gmail.com'
user = 'User email'
password = 'User password'
# Connect to the server
print('Connecting to ' + host)
mailBox = imaplib.IMAP4_SSL(host)
# Login to our account
mailBox.login(user, password)
boxList = mailBox.list()
# print(boxList)
mailBox.select()
searchQuery = '(SUBJECT "CDR Schedule output from schedule: This is a test to see how it works")'
result, data = mailBox.uid('search', None, searchQuery)
ids = data[0]
# list of uids
id_list = ids.split()
i = len(id_list)
for x in range(i):
    latest_email_uid = id_list[x]
    # fetch the email body (RFC822) for the given ID
    result, email_data = mailBox.uid('fetch', latest_email_uid, '(RFC822)')
    # I think I am fetching a bit too much here...
    raw_email = email_data[0][1]
    # converts byte literal to string removing b''
    raw_email_string = raw_email.decode('utf-8')
    email_message = email.message_from_string(raw_email_string)
    # downloading attachments
    for part in email_message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue
        fileName = part.get_filename()
        if bool(fileName):
            filePath = os.path.join('C:/install files/', fileName)
            if not os.path.isfile(filePath):
                fp = open(filePath, 'wb')
                fp.write(part.get_payload(decode=True))
                fp.close()
            subject = str(email_message).split("Subject: ", 1)[1].split("\nTo:", 1)[0]
            print('Downloaded "{file}" from email titled "{subject}" with UID {uid}.'.format(file=fileName, subject=subject, uid=latest_email_uid.decode('utf-8')))
mailBox.close()
mailBox.logout()
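A minimal sketch of one way to restrict the loop to attachments. It assumes the files you want are sent with `Content-Disposition: attachment` (some senders mark them `inline`, which this sketch would skip). The function name `save_attachments`, the demo message, and the temporary directory are hypothetical and only there so the example runs without a mail server.

```python
import os
import tempfile
from email.message import EmailMessage


def save_attachments(msg, target_dir):
    """Write every attachment part of msg into target_dir; return the saved paths."""
    saved = []
    for part in msg.walk():
        # Only parts whose Content-Disposition is "attachment" are saved,
        # so body parts (text/plain, text/html) are ignored.
        if part.get_content_disposition() != "attachment":
            continue
        file_name = part.get_filename()
        if not file_name:
            continue
        # basename() guards against path components in a sender-supplied name.
        file_path = os.path.join(target_dir, os.path.basename(file_name))
        with open(file_path, "wb") as fp:
            fp.write(part.get_payload(decode=True))
        saved.append(file_path)
    return saved


# Build a small message locally (assumed sample data).
demo = EmailMessage()
demo["Subject"] = "CDR Schedule output"
demo.set_content("This body text should not be written to disk.")
demo.add_attachment(b"a,b\n1,2\n", maintype="text", subtype="csv", filename="report.csv")

out_dir = tempfile.mkdtemp()
saved_paths = save_attachments(demo, out_dir)
print(saved_paths)
```

In the question's loop, `email_message` would take the place of `demo` and `'C:/install files/'` the place of `out_dir`.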

Python: Keep checking new email and alert of further new emails

I have this code that checks the latest email and then goes and does something. Is it possible to write something that keeps checking the inbox folder for new mail, always picking up the latest new email? Is it getting too complicated if I try to store the fact that it has already made one pass, so that it doesn't alert about the same email twice?
Code:
import imaplib
import email
import Tkinter as tk
word = ["href=", "href", "<a href="] #list of strings to search for in email body
#connection to the email server
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('xxxx', 'xxxx')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("Inbox", readonly=True) # connect to inbox.
result, data = mail.uid('search', None, "ALL") # search and return uids instead
ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)') # fetch the email headers and body (RFC822) for the given ID
raw_email = data[0][1] # here's the body, which is raw headers and html and body of the whole email
# including headers and alternate payloads
.....goes and does other code regarding to email html....
Try this approach:
The logic is the same as in @tripleee's comment.
import imaplib
import time
word = ["href=", "href", "<a href="] #list of strings to search for in email body
#connection to the email server
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('xxxx', 'xxxx')
mail.list()
# Out: list of "folders" aka labels in gmail.
latest_email_uid = ''
while True:
    mail.select("Inbox", readonly=True)
    result, data = mail.uid('search', None, "ALL") # search and return uids instead
    id_list = data[0].split() # data[0] is a space separated string of uids
    if id_list and id_list[-1] != latest_email_uid:
        # remember the newest uid before data is overwritten by the fetch
        latest_email_uid = id_list[-1]
        result, data = mail.uid('fetch', latest_email_uid, '(RFC822)') # fetch the email headers and body (RFC822) for the given ID
        raw_email = data[0][1]
    time.sleep(120) # put your value here, be sure that this value is sufficient (see @tripleee's comment below)
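The loop above looks only at the newest UID, so if several messages arrive between two polls, only the last one would be noticed. A minimal sketch of an alternative, assuming UIDs only increase within a mailbox (which holds as long as its UIDVALIDITY does not change): keep the highest UID already handled and process every UID above it. The helper name `newer_uids` and the sample values are hypothetical.

```python
def newer_uids(search_data, last_uid):
    """Return UIDs from an IMAP SEARCH response that are greater than last_uid."""
    # search_data mirrors imaplib's shape: a list whose first item is a
    # space-separated bytes string of UIDs, e.g. [b"101 102 105"].
    uids = [int(u) for u in search_data[0].split()]
    return [u for u in uids if u > last_uid]


last_uid = 102  # highest UID already handled (example value)
fresh = newer_uids([b"101 102 105 110"], last_uid)
print(fresh)  # [105, 110]
if fresh:
    last_uid = max(fresh)
```

Inside the polling loop, `search_data` would be the `data` returned by `mail.uid('search', None, "ALL")`, and each UID in `fresh` would be fetched with `mail.uid('fetch', str(uid), '(RFC822)')`.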

How can I make my email show only one time per email?

I've been making a Python script which checks for emails since logging in. Here is my code so far:
#!/usr/bin/python
import imaplib, getpass
mail = imaplib.IMAP4_SSL('imap.gmail.com')
u = raw_input('Your Gmail Address: ')
p = getpass.getpass()
mail.login(u, p)
mail.select("inbox")
while 1:
    r, data = mail.search(None, "ALL")
    ids = data[0]
    id_list = ids.split()
    latest_email_id = id_list[-1]
    r, data = mail.fetch(latest_email_id, "(RFC822)")
    raw_email = data[0][1]
    print raw_email
The problem is that it keeps showing the same email over and over again (until a new one is received) because of the while loop.
How can I make it:
Only show a received email once until a new one is received
Only show the new one once
Repeat forever
So you basically want to develop an email listener...
In the following code, I download only the unseen emails, so that we have just the relevant data. Then, once an email is fetched, I mark it as 'read' so its id won't turn up again:
while 1:
    r, search_data = mail.search(None, "UNSEEN") #gets only the unseen emails
    ids = search_data[0]
    id_list = ids.split()
    if id_list: #skip this pass when nothing is unseen
        latest_email_id = id_list[-1]
        r, data = mail.fetch(latest_email_id, "(RFC822)")
        raw_email = data[0][1]
        print raw_email
        mail.store(ids.replace(' ',','),'+FLAGS','\\Seen') #marks as read
Now, at least your code won't print the same email again and again. IMAP is generally more reliable than POP3 in getting new emails quickly. Still, it can take some time.
I have found a solution:
list = []
while 1:
    mail.select('inbox')
    r, data = mail.search(None, "ALL")
    ids = data[0]
    id_list = ids.split()
    latest_email_id = id_list[-1]
    r, data = mail.fetch(latest_email_id, "(RFC822)")
    raw_email = data[0][1]
    if not raw_email in list:
        print raw_email
        list.append(raw_email)
Basically, it creates a list called list:
list = []
Then the loop is mostly the same, except that at the beginning it selects the mailbox again:
while 1:
    mail.select('inbox')
At the end, it prints raw_email if it is not in the list and then adds it to the list, so it will not be printed again:
    if not raw_email in list:
        print raw_email
        list.append(raw_email)

Python - Get the body of a multipart email

I get an email with the following code:
m = imaplib.IMAP4_SSL(MailReceiveSRV)
m.login(MailReceiveUSER, MailReceivePWD)
m.select("Inbox")
status, unreadcount = m.status('INBOX', "(UNSEEN)")
unreadcount = int(unreadcount[0].split()[2].strip(').,]'))
items = m.search(None, "UNSEEN")
items = str(items[1]).strip('[\']').split(' ')
for index, emailid in enumerate(items):
    resp, data = m.fetch(emailid, "(RFC822)")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    for part in mail.walk():
        body = part.get_payload()
FYI: This is always a part of the example code.
But body is now a big list of objects. If the Content-Type were plain text, it would be much easier.
How can I get access to the body of that mail now?
Short answer
You have a multipart email. That's why you're getting a list instead of a string: get_payload returns a list of Message objects if it's a multipart message, and a string if it's not.
Explanation
From the docs:
Return the current payload, which will be a list of Message objects when is_multipart() is True, or a string when is_multipart() is False.
Hence get_payload returning a list.
Your code for getting the body would be something like:
if email_message.is_multipart():
    for part in email_message.get_payload():
        body = part.get_payload()
        # more processing?
else:
    body = email_message.get_payload()
Again, from the docs:
Note that is_multipart() returning True does not necessarily mean that "msg.get_content_maintype() == 'multipart'" will return the True. For example, is_multipart will return True when the Message is of type message/rfc822.
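The note quoted above can be checked with a short sketch. The sample message is hand-written for illustration and is not taken from the question; it wraps one message inside another using the message/rfc822 content type.

```python
import email

# An outer message whose entire body is another message (assumed sample data).
raw = (
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner message\n"
    "\n"
    "inner body\n"
)
msg = email.message_from_string(raw)

# The payload is a list holding the embedded message, so is_multipart() is
# True even though the main type is "message" rather than "multipart".
print(msg.is_multipart())
print(msg.get_content_maintype())

inner = msg.get_payload()[0]
print(inner["Subject"])
print(inner.get_payload())
```

This is why checking `is_multipart()` and then recursing into `get_payload()` (or simply using `walk()`) tends to be more robust than testing the main type alone.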

python imaplib gmail fetching multiple results from list

I am trying to obtain email ids and then fetch all of them. How do I do this? Thanks!
The following is my code:
import imaplib
import re
user = 'user'
pwd = 'password'
imap_server = imaplib.IMAP4_SSL('imap.gmail.com', 993)
imap_server.login(user, pwd)
imap_server.select('Inbox')
typ, response = imap_server.search(None, '(SUBJECT "Hello")')
response = str(response[0])
response_re = re.compile('\d+')
response_pat = re.findall(response_re, response)
for i in response_pat:
    results, datas = imap_server.fetch(i, "(RFC822)")
for i in datas:
    print i
This still only prints one value of datas, even though I have iterated through a list of multiple values.
You made a mistake with the command. It should be RFC822 instead of RCF822. Simply just change one line of your code. Change this line from
results, datas = imap_server.fetch(i, "(RCF822)")
to
results, datas = imap_server.fetch(i, "(RFC822)")
Also, don't use regex when you can simply use string methods. Instead of using regex, split the search response (response[0], before it is converted with str()) in your loop:
for i in response[0].split():
    results, datas = imap_server.fetch(i, "(RFC822)")
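To keep every fetched message rather than only the last one, the results can be collected inside the loop. The following is a sketch under assumptions: `fetch_all` is a hypothetical helper, and `FakeImap` is a made-up stand-in that only mimics the shape of imaplib's `search` and `fetch` responses so the example runs without a server. With a real connection you would pass the `imap_server` object instead.

```python
def fetch_all(imap, criteria):
    """Search with criteria and return the raw RFC822 bytes of every match."""
    typ, response = imap.search(None, criteria)
    messages = []
    for msg_id in response[0].split():
        typ, data = imap.fetch(msg_id, "(RFC822)")
        # data[0] is a (header, body) tuple; the body is the raw message.
        messages.append(data[0][1])
    return messages


class FakeImap:
    """Hypothetical stand-in mimicking the shape of imaplib responses."""

    def __init__(self, store):
        self.store = store  # maps message id (bytes) to raw message (bytes)

    def search(self, charset, criteria):
        # Pretend every stored message matches the criteria.
        return "OK", [b" ".join(sorted(self.store))]

    def fetch(self, msg_id, parts):
        raw = self.store[msg_id]
        return "OK", [(msg_id + b" (RFC822 {%d}" % len(raw), raw), b")"]


fake = FakeImap({
    b"1": b"Subject: Hello\r\n\r\nfirst",
    b"2": b"Subject: Hello\r\n\r\nsecond",
})
all_messages = fetch_all(fake, '(SUBJECT "Hello")')
print(len(all_messages))
```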
