Getting n most recent emails using IMAP and Python - python

I'm looking to return the n (most likely 10) most recent emails from an email account's inbox using IMAP.
So far I've cobbled together:
import imaplib
from email.parser import HeaderParser
M = imaplib.IMAP4_SSL('my.server')
user = 'username'
password = 'password'
M.login(user, password)
M.search(None, 'ALL')
for i in range(1, 10):
    data = M.fetch(i, '(BODY[HEADER])')
    header_data = data[1][0][1]
    parser = HeaderParser()
    msg = parser.parsestr(header_data)
    print msg['subject']
This is returning email headers fine, but it seems to be a semi-random collection of emails that it gets, not the 10 most recent.
If it helps, I'm connecting to an Exchange 2010 server. Other approaches also welcome, IMAP just seemed the most appropriate given that I only wanted to read the emails not send any.

The sort command is available, but it is not guaranteed to be supported by the IMAP server. For example, Gmail does not support the SORT command.
To try the sort command, you would replace:
M.search(None, 'ALL')
with
M.sort(sort_criteria, 'UTF-8', 'ALL')
where sort_criteria is a string like:
sort_criteria = 'DATE' # Ascending, most recent email last
sort_criteria = 'REVERSE DATE' # Descending, most recent email first
sort_criteria = '[REVERSE] sort-key' # General format for sorting
According to RFC 5256, these are the valid sort keys:
"ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" / "SUBJECT" / "TO"
Notes:
1. The charset is required. Try US-ASCII or UTF-8; IMAP servers are not required to support any others.
2. Search criteria are also required. ALL is a valid search key, but there are many others. See more at http://www.networksorcery.com/enp/rfc/rfc3501.txt
The world of IMAP is wild and crazy. Good luck
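Since SORT may or may not be available, here is a rough sketch of trying it and falling back to SEARCH. The function name newest_message_ids is made up for illustration. It assumes conn is an imaplib connection that is already logged in with a mailbox selected, and that the server advertises SORT in conn.capabilities. Note that the fallback orders by sequence number (mailbox order), which is not necessarily the same as the Date header order.

```python
def newest_message_ids(conn, n=10):
    """Return up to n message sequence numbers, newest first.

    conn is assumed to be a logged-in imaplib.IMAP4 / IMAP4_SSL object
    with a mailbox already selected.
    """
    if n <= 0:
        return []
    if "SORT" in conn.capabilities:
        # Server-side sort by Date header, newest first (RFC 5256).
        status, data = conn.sort("REVERSE DATE", "UTF-8", "ALL")
        if status != "OK":
            raise RuntimeError("SORT failed: %r" % (data,))
        return data[0].split()[:n]
    # Fallback: SEARCH lists sequence numbers in ascending order,
    # so the last n entries are the most recently added messages.
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("SEARCH failed: %r" % (data,))
    return data[0].split()[-n:][::-1]
```

Each returned id can then be passed to conn.fetch(...) as in the question.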

This code gets the sender, recipient, date, subject, and content of each email:
import imaplib, email
user = "your@email.com"
password = "pass"
imap_url = "imap.gmail.com"
connection = imaplib.IMAP4_SSL(imap_url)
connection.login(user, password)
connection.select("INBOX")  # a mailbox must be selected before searching
result, data = connection.uid('search', None, "ALL")
if result == 'OK':
    for num in data[0].split():
        result, data = connection.uid('fetch', num, '(RFC822)')
        if result == 'OK':
            email_message = email.message_from_bytes(data[0][1])
            # str() guards against missing headers, which come back as None
            print('From:' + str(email_message['From']))
            print('To:' + str(email_message['To']))
            print('Date:' + str(email_message['Date']))
            print('Subject:' + str(email_message['Subject']))
            payload = email_message.get_payload()
            # get_payload() returns a list of parts only for multipart messages
            content = payload[0] if email_message.is_multipart() else payload
            print('Content:' + str(content))
connection.close()
connection.logout()

# get the most recent email
from imap_tools import MailBox
with MailBox('imap.mail.com').login('test@mail.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch(limit=1, reverse=True):
        print(msg.date_str, msg.subject)
https://github.com/ikvk/imap_tools

This works for me:
import imaplib
from email.parser import HeaderParser
M = imaplib.IMAP4_SSL('my.server')
user = 'username'
password = 'password'
M.login(user, password)
(retcode, messages) = M.search(None, 'ALL')
news_mail = get_mostnew_email(messages)
for i in news_mail:
    data = M.fetch(i, '(BODY[HEADER])')
    header_data = data[1][0][1]
    parser = HeaderParser()
    msg = parser.parsestr(header_data)
    print msg['subject']
And this is the function that picks the newest emails:
def get_mostnew_email(messages):
    """
    Get the most recent emails using IMAP and Python.
    :param messages: data returned by M.search(), a list holding one space-separated string of ids
    :return: ids of the 10 most recent emails as strings, newest first
    """
    ids = messages[0]  # messages is a list
    id_list = ids.split()  # ids is a space-separated string
    # latest_ten_email_id = id_list  # get all
    latest_ten_email_id = id_list[-10:]  # get the latest 10
    keys = map(int, latest_ten_email_id)
    news_keys = sorted(keys, reverse=True)
    str_keys = [str(e) for e in news_keys]
    return str_keys

Workaround for Gmail. Since IMAP.sort('DATE','UTF-8','ALL') does not work for Gmail, we can insert the messages and their dates into a list and sort the list in reverse order of date. A counter then limits processing to the first n mails. This method will take a few minutes longer if there are hundreds of mails.
import datetime, email
M.login(user, password)
rv, data = M.search(None, 'ALL')
if rv == 'OK':
    msg_list = []
    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv == 'OK':
            msg_object = {}
            msg_object_copy = {}
            msg = email.message_from_bytes(data[0][1])
            msg_date = ""
            for val in msg['Date'].split(' '):
                if len(val) == 1:
                    val = "0" + val
                    # to pad the single-digit day with 0
                msg_date = msg_date + val + " "
            msg_date = msg_date[:-1]
            # to remove the last space
            msg_object['date'] = datetime.datetime.strptime(msg_date, "%a, %d %b %Y %H:%M:%S %z")
            # to convert string to date time object for sorting the list
            msg_object['msg'] = msg
            msg_object_copy = msg_object.copy()
            msg_list.append(msg_object_copy)
    msg_list.sort(reverse=True, key=lambda r: r['date'])
    # sorts by datetime so latest mails are parsed first
    count = 0
    for msg_obj in msg_list:
        if count == n:
            # stop once n mails have been handled
            break
        count = count + 1
        msg = msg_obj['msg']
        # do things with the message
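The date handling in this workaround can probably be shortened with the standard library's email.utils.parsedate_to_datetime, which parses RFC 2822 style Date headers directly. Below is a minimal sketch of just the sorting step. The function name newest_messages is hypothetical, and raw_messages is assumed to be a list of raw message bytes already fetched from the server. Messages without a usable Date header are skipped, and dates without a timezone are treated as UTC, which is a simplifying assumption.

```python
import email
from datetime import timezone
from email.utils import parsedate_to_datetime


def newest_messages(raw_messages, n):
    """Return the n newest parsed messages, ordered by Date header."""
    dated = []
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        try:
            when = parsedate_to_datetime(msg["Date"])
        except (TypeError, ValueError):
            # Missing or unparseable Date header: skip this message.
            continue
        if when.tzinfo is None:
            # Assumption: treat naive dates as UTC so all values compare.
            when = when.replace(tzinfo=timezone.utc)
        dated.append((when, msg))
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [msg for _, msg in dated[:n]]
```

Fetching only '(BODY.PEEK[HEADER])' instead of '(RFC822)' should be enough for this step and avoids downloading every full message.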

To get the latest mail:
This will return all the mail numbers contained inside the 2nd return value which is a list containing a bytes object:
imap.search(None, "ALL")[1][0]
This will split the bytes object of which the last element can be taken by accessing the negative index:
imap.search(None, "ALL")[1][0].split()[-1]
You may use the mail number to access the corresponding mail.
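For example, once you have the mail number you could fetch just the headers and pull the subject out of the response. This is only a sketch: subject_from_fetch is a made-up helper name, and it assumes the fetch response has the usual imaplib shape, a list whose tuple entries hold the raw header bytes in their second element.

```python
import email
from email.header import decode_header, make_header


def subject_from_fetch(fetch_data):
    """Extract the decoded Subject from the data part of an imaplib fetch()."""
    for item in fetch_data:
        # Tuples carry (envelope, raw bytes); plain bytes entries such as
        # b")" are just the closing part of the response.
        if isinstance(item, tuple):
            msg = email.message_from_bytes(item[1])
            raw_subject = msg["Subject"]
            if raw_subject is None:
                return None
            # Decode RFC 2047 encoded words such as =?utf-8?q?...?=
            return str(make_header(decode_header(raw_subject)))
    return None
```

Usage would look something like status, data = imap.fetch(latest_id.decode(), "(BODY.PEEK[HEADER])") followed by subject_from_fetch(data), where latest_id is the value taken with [-1] above.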

Related

How to make a time restriction in outlook using python?

I am making a program that:
opens Outlook
finds emails by subject
extracts some data from the emails (code and number)
fills this data into an Excel file.
Standard email looks like this:
Subject: Test1
Hi,
You got a new answer from user Alex.
Code: alex123fj
Number1: 0611111111
Number2: 1020
Number3: 3032
I encounter 2 main problems in the process.
Firstly, I do not understand how to apply a time restriction to emails in Outlook. For example, I want to read only the emails from yesterday.
Secondly, I save all codes and numbers from the emails in lists. But every item comes out as ["alex123fj\r"] instead of ["alex123fj"].
I would appreciate any help or advice; this is my first ever program in Python.
Here is my code:
import win32com.client
import re
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.Folders('myemail@....').Folders('Inbox')
messages = inbox.Items
def get_code(messages):
    codes_lijst = []
    for message in messages:
        subject = message.subject
        if subject == "Test1":
            body = message.body
            matches = re.finditer("Code:\s(.*)$", body, re.MULTILINE)
            for match in matches:
                codes_lijst.append(match.group(1))
    return codes_lijst
def get_number(messages):
    numbers_lijst = []
    for message in messages:
        subject = message.subject
        if subject == "Test1":
            body = message.body
            matches = re.finditer("Number:\s(.*)$", body, re.MULTILINE)
            for match in matches:
                numbers_lijst.append(match.group(1))
    return numbers_lijst
code = get_code(messages)
number = get_number(messages)
print(code)
print(number)
Firstly, never loop through all items in a folder. Use Items.Find/FindNext or Items.Restrict with a restriction on ConversationTopic (e.g. [ConversationTopic] = 'Test1').
To create a date/time restriction, add a range restriction: ([ReceivedTime] > 'some value') AND ([ReceivedTime] < 'other value').
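As a rough illustration of such a restriction for "yesterday", the filter string could be built in Python like this. The helper names received_between and yesterday_range are invented for this sketch, and the US-style date format is an assumption: Outlook interprets date strings according to the local regional settings, so the format may need adjusting on your machine.

```python
from datetime import datetime, timedelta


def received_between(start, end, subject=None):
    """Build an Outlook Items.Restrict filter for a ReceivedTime range."""
    # Assumption: US-style short date with a 12-hour clock.
    fmt = "%m/%d/%Y %I:%M %p"
    clauses = [
        "[ReceivedTime] >= '%s'" % start.strftime(fmt),
        "[ReceivedTime] < '%s'" % end.strftime(fmt),
    ]
    if subject is not None:
        clauses.append("[ConversationTopic] = '%s'" % subject)
    return " AND ".join(clauses)


def yesterday_range(now=None):
    """Return (start of yesterday, start of today) as datetimes."""
    now = now or datetime.now()
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1), today
```

It would then be used along the lines of messages = inbox.Items.Restrict(received_between(*yesterday_range(), subject="Test1")), after which only the matching items are looped over.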

Add label to Gmail using IMAP, Python

I'm trying to add a label to a subset of gmails. It's really buggy. It works, then doesn't work, then adds to the first email only...
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('myaccountxyz@gmail.com', mypassword)
mail.select("my-folder") # finds emails with this label
result, data = mail.uid('search', None, 'all')
label_to_add = "label-to-add" # previously created in Gmail
for email_uid in data[0].split():
    result, data_single = mail.uid('fetch', email_uid, '(RFC822)')
    raw_email = data_single[0][1]
    email_message = email.message_from_string(raw_email)
    sender = email_message['From']
    mail.store(email_uid, '+X-GM-LABELS', '('+label_to_add+')')
    # also tried without the parentheses
    mail.store(email_uid, '+X-GM-LABELS', label_to_add)
If you use mail.uid('search', ...) you need to use mail.uid('store', ...) as well. Otherwise you're mixing UIDs and MSNs (message sequence numbers), which don't correspond, so the store only works when you get lucky and a UID happens to be low enough to match an MSN.
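A small sketch of keeping everything in UID space might look like this. add_gmail_label is a hypothetical helper name, mail is assumed to be a logged-in imaplib connection with the folder already selected, and the label is assumed to contain no spaces or special characters (those would need quoting).

```python
def add_gmail_label(mail, label, search_criterion="ALL"):
    """Add a Gmail label to every message matching the search, by UID."""
    status, data = mail.uid("SEARCH", None, search_criterion)
    if status != "OK":
        raise RuntimeError("UID SEARCH failed: %r" % (data,))
    labelled = []
    for uid in data[0].split():
        # UID STORE, so the id is interpreted as a UID rather than an MSN.
        status, _ = mail.uid("STORE", uid, "+X-GM-LABELS", "(%s)" % label)
        if status == "OK":
            labelled.append(uid)
    return labelled
```

Fetching the full message first is not needed just to add a label, so the sketch leaves that step out.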

How to scrape a link from a multipart email in python

I have a program which logs on to a specified gmail account and gets all the emails in a selected inbox that were sent from an email that you input at runtime.
I would like to be able to grab all the links from each email and append them to a list so that I can then filter out the ones I don't need before outputting them to another file. I was using a regex to do this, which requires me to convert the payload to a string. The problem is that the regex I am using doesn't work with findall(); it only works when I use search() (I am not too familiar with regexes). I was wondering if there is a better way to extract all links from an email that doesn't involve me messing around with regexes?
My code currently looks like this:
print(f'[{Mail.timestamp}] Scanning inbox')
sys.stdout.write(Style.RESET)
self.search_mail_status, self.amount_matching_criteria = self.login_session.search(Mail.CHARSET,search_criteria)
if self.amount_matching_criteria == 0 or self.amount_matching_criteria == '0':
    print(f'[{Mail.timestamp}] No mails from that email address could be found...')
    Mail.enter_to_continue()
    import main
    main.main_wrapper()
else:
    pattern = '(?P<url>https?://[^\s]+)'
    prog = re.compile(pattern)
    self.amount_matching_criteria = self.amount_matching_criteria[0]
    self.amount_matching_criteria_str = str(self.amount_matching_criteria)
    num_mails = re.search(r"\d.+",self.amount_matching_criteria_str)
    num_mails = ((num_mails.group())[:-1]).split(' ')
    sys.stdout.write(Style.GREEN)
    print(f'[{Mail.timestamp}] Status code of {self.search_mail_status}')
    sys.stdout.write(Style.RESET)
    sys.stdout.write(Style.YELLOW)
    print(f'[{Mail.timestamp}] Found {len(num_mails)} emails')
    sys.stdout.write(Style.RESET)
    num_mails = self.amount_matching_criteria.split()
    for message_num in num_mails:
        individual_response_code, individual_response_data = self.login_session.fetch(message_num, '(RFC822)')
        message = email.message_from_bytes(individual_response_data[0][1])
        if message.is_multipart():
            print('multipart')
            multipart_payload = message.get_payload()
            for sub_message in multipart_payload:
                string_payload = str(sub_message.get_payload())
                print(prog.search(string_payload).group("url"))
I ended up using this for loop with a recursive function and a regex to get the links. I then dropped all links that lack the substring you can input earlier in the program, and added the rest to a set:
for message_num in self.amount_matching_criteria.split():
    counter += 1
    _, self.individual_response_data = self.login_session.fetch(message_num, '(RFC822)')
    self.raw = email.message_from_bytes(self.individual_response_data[0][1])
    raw = self.raw
    self.scraped_email_value = email.message_from_bytes(Mail.scrape_email(raw))
    self.scraped_email_value = str(self.scraped_email_value)
    self.returned_links = prog.findall(self.scraped_email_value)
    for i in self.returned_links:
        if self.substring_filter in i:
            self.link_set.add(i)
    self.timestamp = time.strftime('%H:%M:%S')
    print(f'[{self.timestamp}] Links scraped: [{counter}/{len(num_mails)}]')
The function used:
def scrape_email(raw):
    if raw.is_multipart():
        return Mail.scrape_email(raw.get_payload(0))
    else:
        return raw.get_payload(None,True)
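The recursive function above only follows the first part of each multipart message. An alternative sketch that visits every text part with Message.walk() is shown below. extract_links and URL_PATTERN are names made up for this example, the regex is a deliberately simple assumption about what counts as a link, and decoding falls back to UTF-8 when a part declares an unknown charset.

```python
import re

# Simple URL pattern; stops at whitespace, quotes and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(message):
    """Collect links from every text part of an email.message.Message."""
    links = []
    for part in message.walk():
        if part.get_content_maintype() != "text":
            # Skip multipart containers, attachments, images, etc.
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        links.extend(URL_PATTERN.findall(text))
    return links
```

The result is a plain list, so the substring filter and the set from the loop above can be applied to it unchanged.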

Python read last 10 emails from Outlook

I can read my last email from my Outlook and send all the results according to each line's content.
However, I am unable to find the way to read my last 10 emails to be added to the fileCollect.txt file.
Any ideas how I could do this? Here is my current code:
import win32com.client
import csv
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6) # "6" refers to the index of a folder - in this case,
# the inbox. You can change that number to reference
# any other folder
messages = inbox.Items
message = messages.GetLast()
fileCollect = open("fileCollect.txt",'a')
delimiter = "¿"
fileCollect.write( str(message.Sender) + delimiter + str(message.Subject)+ delimiter + str(message.Body) )
fileCollect.close()
csvfile = open("csvfile.csv",'a')
with open("fileCollect.txt","r") as outfile:
    for line in outfile:
        if line.find("test") != -1:
            csvfile.write(line)
csvfile.close()
The Items collection will not be sorted in any particular order until you actually sort it by calling Items.Sort. The VB script below sorts the collection by ReceivedTime in the descending order:
set messages = inbox.Items
messages.Sort "ReceivedTime", True
set message = messages.GetFirst()
while not (message Is Nothing)
    MsgBox message.Subject
    set message = messages.GetNext()
wend
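Translated to Python, the same idea might look roughly like this. newest_items is a hypothetical helper, and items is assumed to be an Outlook Items collection obtained through win32com (for example inbox.Items), relying only on its Sort, GetFirst and GetNext methods.

```python
def newest_items(items, n=10):
    """Return up to n items from an Outlook Items collection, newest first."""
    # Sort in place by received time, descending (True).
    items.Sort("[ReceivedTime]", True)
    result = []
    item = items.GetFirst()
    # GetNext() yields None once the collection is exhausted.
    while item is not None and len(result) < n:
        result.append(item)
        item = items.GetNext()
    return result
```

Each returned item could then be written to fileCollect.txt the same way the single message is in the question.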
You can get the last 10 messages by specifying a negative index:
last_10_messages = messages[-10:]
This will return an array from messages[-10], which is the 10th to the last message, to the last message in the messages array.
Use len(inbox.Items) to get the number of items in the inbox.
Use inbox.Items.Item(i) to get the i-th email in the inbox.
Ref:
https://learn.microsoft.com/en-us/office/vba/api/outlook.items.item

TypeError: cannot concatenate 'str' and 'list' objects in email

I am working on sending an email in Python. Right now, I want to send entries from a list via email, but I am encountering an error saying "TypeError: cannot concatenate 'str' and 'list' objects" and I have no idea how to debug it. The following is the code that I have. I'm still new to this language (3 weeks), so I have little background.
import smtplib
x = [2, 3, 4] #list that I want to send
to = '' #Recipient
user_name = '' #Sender username
user_pwrd = '' #Sender Password
smtpserver = smtplib.SMTP("mail.sample.com",port)
smtpserver.ehlo()
smtpserver.starttls()
smtpserver.ehlo()
smtpserver.login(user_name,user_pwrd)
#Header Part of the Email
header = 'To: '+to+'\n'+'From: '+user_name+'\n'+'Subject: \n'
print header
#Msg
msg = header + x #THIS IS THE PART THAT I WANT TO INSERT THE LIST THAT I WANT TO SEND. the type error occurs in this line
#Send Email
smtpserver.sendmail(user_name, to, msg)
print 'done!'
#Close Email Connection
smtpserver.close()
The problem is with msg = header + x. You're trying to apply the + operator to a string and a list.
I'm not exactly sure how you want x to be displayed but, if you want something like "[1, 2, 3]", you would need:
msg = header + str(x)
Or you could do,
msg = '{header}{lst}'.format(header=header, lst=x)
The problem is that in the line msg = header + x, header is a string and x is a list, so the two cannot be concatenated using the + operator. The solution is to convert x to a string. One way of doing that is to extract the elements from the list, convert them to str and .join() them together. So you should replace the line:
msg = header + x
by:
msg = header + "".join([str(i) for i in x])
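As an alternative to building the header string by hand, the message could be assembled with the standard library's EmailMessage class (Python 3). This is only a sketch: build_message is an illustrative name, and joining the values with ", " is one assumed way of displaying the list.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, values):
    """Build an email whose body lists the given values."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Convert each element to str before joining; a list of ints
    # cannot be joined or concatenated to a string directly.
    msg.set_content(", ".join(str(v) for v in values))
    return msg
```

The result can be passed to smtpserver.send_message(msg) instead of calling sendmail with a hand-built string.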
