Python imaplib can't select() custom gmail labels - python

I have a bot I'm writing using imaplib in Python to fetch emails from Gmail and output some useful data from them. I've hit a snag on selecting the inbox, though: the existing sorting system uses custom labels to separate emails from different customers. I've partially replicated this system in my test email account, but imaplib.select() throws "imaplib.IMAP4.error: SELECT command error: BAD [b'Could not parse command']" with custom labels. Screenshot attached. My bot has no problem with the default Gmail folders, such as INBOX or [Gmail]/Spam. In that case, it hits an error later in the code that deals with a completely different problem I have yet to fix. The point, though, is that imaplib.select() is successful with the default folders and fails only with custom labels.
The way my code works is it works through all the available inboxes, compares it to a user-inputted name, and if they match, saves the name and sets a boolean to true to signal that it found a match. It then checks, if there was a match (the user-inputted inbox exists) it goes ahead, otherwise it throws an error message and resets. It then attempts to select the inbox the user entered.
I've verified that the variable the program's saving the inbox name to matches what's listed as the name in the imap.list() command. I have no idea what the issue is.
I could bypass the process by iterating through all mail to find the emails I'm looking for, but it's far more efficient to use the existing sorting system due to the sheer number of emails on the account I'll be using.
Any help is appreciated!
EDIT: Code attached after request. Thank you to the person who told me to do so.
'''
Fetches emails from the specified inbox and outputs them to a popup
'''
def fetchEmails(self):
    # Create an imap object. Must be local, otherwise we can only establish a single connection
    # imap states are kinda bad
    imap = imaplib.IMAP4_SSL(host="imap.gmail.com", port=993)
    # Log in and fetch a list of available inboxes
    imap.login(username.get(), password.get())
    status, inboxList = imap.list()
    # Set a reference boolean and iterate through the list
    inboxNameExists = False
    for i in inboxList:
        # Finds the name of the inbox
        name = self.inboxNameParser(i.decode())
        # If the given inbox name is encountered, set its existence to true and break
        if name.casefold() == inboxName.get().casefold():
            inboxNameExists = True
            break
    # If the inbox name does not exist, log out and give an error message
    if not inboxNameExists:
        self.logout(imap)
        tk.messagebox.showerror("Disconnected!", "That Inbox does not exist.")
        return
    '''
    If/else to correctly feed the imap.select() method the inbox name
    Apparently inboxes containing spaces require quotations before and after
    Selects the inbox and pushes it to a variable
    two actually but the first is unnecessary(?)
    imap is weird
    '''
    if name.count(" ") > 0:
        status, messages = imap.select("\"" + name + "\"")
    else:
        status, messages = imap.select(name)
    # Int containing total number of emails in inbox
    messages = int(messages[0])
    # If there are no messages, disconnect and show an infobox
    if messages == 0:
        self.logout(imap)
        tk.messagebox.showinfo("Disconnected!", "The inbox is empty.")
    self.mailboxLoop(imap, messages)
Figured the issue out after a few hours banging through it with a friend. As it turns out the problem was that imap.select() wants quotations around the mailbox name if it contains spaces. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"")
You can see this reflected in the code I posted with the last if/else statement.

Python imaplib requires mailbox names with spaces to be surrounded by double quotation marks. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"").
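The quoting rule can be wrapped in a small helper so the if/else on spaces is not needed. This is only a sketch: quote_mailbox is a hypothetical name, not part of imaplib, and it assumes the label is plain ASCII (labels with other characters need additional encoding that is not covered here). It always adds the quotes, which should also be acceptable for names without spaces such as INBOX.

```python
def quote_mailbox(name: str) -> str:
    """Wrap a mailbox name in double quotes for use with imap.select().

    Hypothetical helper, not part of imaplib. Backslashes and double
    quotes inside the name are escaped so the quoted string stays valid.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Usage, assuming `imap` is an already logged-in imaplib.IMAP4_SSL object:
#     status, messages = imap.select(quote_mailbox("Label Name"))
print(quote_mailbox("Label Name"))
```

Quoting every name keeps the calling code to a single select() call regardless of what the user typed.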

Related

Python Script takes very long time to run

I've managed to write a piece of code (pieced together from multiple sources on the web and adapted to my needs) which should do the following:
Reads an excel file
From column A to search the value of each cell within the subject of mails from a specific folder
If matches (cell value equal to first 9 characters of the subject), save the attachment (each mail has only one attachment, no more, no less) with the value of cell in an "output" folder.
If doesn't match, go to the next mail, respectively next cell value.
In the end, display the run time (not very important, only for my knowledge)
The code actually works (tested with an email folder with only 9 emails). My problem is the run time.
The actual scope of the script is to look for 2539 values in a folder with 32700 emails and save the attachments.
I've done 2 runs as follow:
2539 values in 32700 emails (stopped after ~1 hour)
10 values in 32700 emails (stopped after ~40 minutes; in this time the script processed 4 values)
I would like to know / learn if there is a way to make the script faster, or if it's slow because it's badly written, etc.
Below is my code:
from pathlib import Path
import win32com.client
import os
from datetime import datetime
import time
import openpyxl
# start the timer before any work is done
start = time.time()
# name of the folder created for output
output_dir = Path.cwd() / "Orders"
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.Folders.Item("Shared Mailbox Name")
inbox = folder.Folders.Item("Inbox")
messages = inbox.Items
wb = openpyxl.load_workbook(r"C:\Users\TEST\Path-to-excel\FolderName\ExcelName.xlsx")
sheet = wb['Sheet1']
names = sheet['A']
# for every value in column A, scan every message in the folder
for cellObj in names:
    ordno = str(cellObj.value)
    print(ordno)
    for message in messages:
        subject = message.Subject
        body = message.body
        attachments = message.Attachments
        # the first 9 characters of the subject must equal the cell value
        if str(subject)[:9] == ordno:
            output_dir.mkdir(parents=True, exist_ok=True)
            for attachment in attachments:
                attachment.SaveAsFile(output_dir / str(attachment))
        else:
            pass
print(f'Time taken to run: {time.time() - start} seconds')
I need to mention that I am a complete rookie in Python thus any help from the community is welcomed, especially next to some clarifications of what I did wrong and why.
I've also read some similar questions but nothing helps, or at least I don't know how to adopt the methods.
Thank you!
Seems to me the main problem with your program is that you have two nested loops (one over the values and one over the mails) when you only need to loop over the mails and check whether their subject is in the list of values.
First you need to construct your list of values with something like:
ordno_values = [str(cellObj.value) for cellObj in names]
then, in your loop over mails, you just need to adapt the condition to :
if str(subject)[:9] in ordno_values:
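A small variation on the idea above: storing the values in a set rather than a list makes each membership check roughly constant-time, which matters with 2539 values and 32700 mails. The sketch below uses plain strings as stand-ins for message subjects; matching_subjects and its parameters are hypothetical names used only for illustration.

```python
def matching_subjects(subjects, order_numbers, prefix_len=9):
    """Return the subjects whose first `prefix_len` characters are a wanted value."""
    # Build the lookup set once, outside the loop over mails
    wanted = {str(value) for value in order_numbers}
    return [s for s in subjects if str(s)[:prefix_len] in wanted]


subjects = ["123456789 - order", "999999999 - other", "short"]
print(matching_subjects(subjects, [123456789]))
```

With real Outlook items the same check would sit inside the single loop over messages, with the attachment saved whenever the prefix is in the set.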
Your use case is too specific for anyone to be able to recreate, so hints about performance can only be generic, but your main problem is a combination of "values x messages" nested iteration and synchronous processing: currently you are processing one value and one message at a time, which includes disk IO to get the e-mail.
You can certainly improve things by creating a single list of values from the workbook. You can then use this list with a processing pool (see the Python documentation) to read multiple e-mails at once.
But things might be even better if you can use the subject to query the mail server.
If you have follow-up questions, please break them down to specific parts of the task.
First of all, instead of iterating over all items in the folder:
for message in messages:
subject = message.Subject
And then checking whether a subject starts from the specified string or includes such string:
if str(subject)[:9] == ordno:
Instead, you need to use the Find/FindNext or Restrict methods of the Items class, where you can get a collection of items that correspond to your search criteria. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
For example, you could use the following restriction on the collection (taken from the VBA sample):
criteria = "@SQL=" & Chr(34) & "urn:schemas:httpmail:subject" & Chr(34) & " ci_phrasematch 'question'"
See Filtering Items Using a String Comparison for more information.
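Since the question is in Python, the same kind of criteria string can be built there and passed to Restrict. This is a sketch under assumptions: subject_filter is a hypothetical helper, and doubling single quotes inside the search text is an assumed escaping convention that should be checked against the filter documentation.

```python
def subject_filter(text: str, keyword: str = "ci_phrasematch") -> str:
    """Build a DASL filter string on the message subject.

    `keyword` is the comparison keyword: ci_phrasematch matches a phrase
    anywhere in the subject, ci_startswith matches a prefix.
    """
    if keyword not in ("ci_phrasematch", "ci_startswith"):
        raise ValueError(f"unsupported keyword: {keyword}")
    # Assumption: a single quote inside the text is escaped by doubling it
    safe = text.replace("'", "''")
    return f'@SQL="urn:schemas:httpmail:subject" {keyword} \'{safe}\''


# Usage, assuming `inbox` is an Outlook folder object obtained via win32com:
#     matches = inbox.Items.Restrict(subject_filter("question"))
print(subject_filter("question"))
```

For the order-number case in the question, ci_startswith with each 9-character value would let Outlook do the matching instead of the Python loop.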
Also you may find the AdvancedSearch method of the Application class helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information on that.

Python and Outlook - Marking message thread as 'read'

My company uses JIRA to track issues, and is set up to send an e-mail to all watchers and tagged users whenever an update is done on the issue. We also have some automation in place that will adjust fields on the issue (like sprint number) whenever it gets closed (this'll also send an e-mail). I also have a filter within Outlook that'll put any e-mail from JIRA into a separate subfolder 'JIRA'.
I often receive e-mails on issues that have been closed. I'm trying to write a small Python script that'll mark all these e-mails as read if the JIRA issue has been closed already. The basic idea is I can run this script once a week or so to clean up my mailbox.
I'm using the pywin32 and jira packages to do this, but I can't figure out how to change a message status. The fact that documentation is scarce doesn't help...
What I have:
import re
import textwrap
from jira import JIRA
import pandas as pd
import win32com.client
jira = JIRA("<JIRA URL>", None, ("<USER>", "<JIRA API key>"))
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
wrapper = textwrap.TextWrapper(initial_indent="", width=100, subsequent_indent=" " * 4)
days_back = 10
start_time = pd.to_datetime("now").floor("D") - pd.to_timedelta(days_back, unit="D")
for message in outlook.getDefaultFolder(6).Folders.Item("JIRA").Items.Restrict(f"[ReceivedTime] >= '{start_time.strftime('%d/%m/%Y %H:%M %p')}'"):
    if message.Unread:
        jira_issue = re.search("\[JIRA\] \([A-Z0-9-]+\)", str(message)).group().split()[1][1:-1]
        print(message, jira_issue)
        print(message.body)
        issue = jira.issue(jira_issue)
        status = issue.fields.status
        if status in ("Done", "Checked"):
            message.Unread = False
as noted in this SO issue. This doesn't seem to mark any e-mail as read.
Is this something I can even do in Python? If so, how? If not, what could be an alternative approach?
You can use Categories property to assign a red category to items in Outlook. Categories is a delimited string of category names that have been assigned to an Outlook item. This property uses the character specified in the value name, sList, under HKEY_CURRENT_USER\Control Panel\International in the Windows registry, as the delimiter for multiple categories. See Setting an Outlook mailitem's category programmatically? for more information.
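Separately from categories, one thing that may be worth checking in the question's script is the status comparison. This is an assumption, not something confirmed in the thread: if issue.fields.status is an object with a name attribute rather than a plain string, then "status in ("Done", "Checked")" is never true and the Unread line is never reached. The sketch below uses a hypothetical FakeStatus class as a stand-in for the real status object to show the difference.

```python
CLOSED_STATUSES = {"Done", "Checked"}


class FakeStatus:
    """Hypothetical stand-in for the object returned by issue.fields.status."""

    def __init__(self, name):
        self.name = name


def is_closed(status) -> bool:
    """Compare by status name, whether given an object or a plain string."""
    name = getattr(status, "name", None)
    if name is None:
        name = str(status)
    return name in CLOSED_STATUSES


done = FakeStatus("Done")
print(done in ("Done", "Checked"))  # comparing the object itself to strings
print(is_closed(done))              # comparing its name
```

In the original loop this would correspond to testing is_closed(issue.fields.status) before setting message.Unread = False.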

separate emails in the email thread based on reference or in-reply-to headers using imap_tools

I am working on a CRM, where I am receiving hundreds of emails for offers/requirements per day. I am building an API that will process the email and will insert entries in the CRM.
I am using imap_tools to get the mails in my API, but I am stuck at the point where there's a thread/conversation. I read some articles about using the References or In-Reply-To header from the mail, but no luck so far. I have also tried using the Message-ID, but it gave me the same email thread instead of multiple emails.
I am getting an email thread/conversation as a single email, and I want to get separate emails so I can process them easily.
Here's what I have done so far.
from imap_tools import MailBox
with MailBox('mail.mail.com').login('abc@abc.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch():
        From = msg.headers['from'][0]
        To = msg.headers['to'][0]
        subject = msg.headers['subject'][0]
        received_date = msg.headers['date'][0]
        raw_email = msg.text
        process_email(raw_email)  # processing the email
The issue you are facing is not related to the headers reference or in-reply-to. Most email clients will append the previous email as quoted text to the new mail when you reply. Hence in a thread, a mail will have the body of all previous mails as quoted text.
In most cases (and I say most since email standards vary a lot from client to client), the client will quote the previous mail by prepending > to all quoted lines:
new message
> old message
>> very old message
As a hacky solution, you can drop all lines that start with >.
In Python, you can use splitlines() and filter:
lines = email.splitlines()
new_lines = [i for i in lines if not i.startswith('>')]
or
new_lines = list(filter(lambda i: not i.startswith('>'), lines))
You may use regular expressions or other techniques too.
The issue with this solution is obvious: if an email contains > elsewhere at the start of a line, it will cause loss of information. Hence a more complicated approach is to select the lines with > and compare them with the previous emails in the thread (found via the References header), removing those that match.
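The simple line filter can be wrapped into a helper that returns the remaining text as one string. This is a minimal sketch of the hacky approach only; strip_quoted_lines is a hypothetical name, and ignoring leading whitespace before the > is an added assumption about how some clients indent quotes.

```python
def strip_quoted_lines(body: str) -> str:
    """Drop every line that starts with '>' (after optional leading whitespace)."""
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept)


thread = "new message\n> old message\n>> very old message"
print(strip_quoted_lines(thread))
```

In the question's loop, this would be applied to msg.text before passing it to process_email.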
Google has their patented implementation here
https://patents.google.com/patent/US7222299
Source: How to remove the quoted text from an email and only show the new text
Edit
I realized Gmail follows the > quoting and other clients may follow other methods. There's a Wikipedia article on it: https://en.wikipedia.org/wiki/Posting_style
Conceptually the approach needed will be similar, but different types of clients will need to be handled.

listing Outlook emails by specific date in Python

I'm using Python 3.
I'm trying to extract (list / print / show) Outlook emails by date.
I was trying a loop, maybe a while loop or an if statement.
Can it be done, since one is a string and the other is a date?
Please consider what I've got so far. Thanks.
import win32com.client, datetime

# Connect with MS Outlook - must be open.
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# connect to Sent Items
sent = outlook.GetDefaultFolder(5).Items  # "5" refers to the sent item of a folder

# Get yesterdays date
y = (datetime.date.today() - datetime.timedelta(days=1))
# Get emails by selected date
if sent == y:
    msg = sent.GetLast()
    # get Subject line
    sjl = msg.subject
    # print it out
    print(sjl)
I've completed the code. Thanks for the help.
import sys, win32com.client, datetime
# Connect with MS Outlook - must be open.
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# connect to Sent Items
s = outlook.GetDefaultFolder(5).Items  # "5" refers to the sent item of a folder
#s.Sort("s", true)
# Get yesterdays date for the purpose of getting emails from this date
d = (datetime.date.today() - datetime.timedelta(days=1)).strftime("%d-%m-%y")
# get the email/s
msg = s.GetLast()
# Loop through emails
while msg:
    # Get email date
    date = msg.SentOn.strftime("%d-%m-%y")
    # Get Subject Line of email
    sjl = msg.Subject
    # Set the criteria for what's wanted
    if d == date and msg.Subject.startswith("xx") or msg.Subject.startswith("yy"):
        print("Subject: " + sjl + " Date : ", date)
    msg = s.GetPrevious()
This works. However, if no message matching the constraint is found, it doesn't exit. I've tried break, which just finds one message and not all. I'm wondering if and how to use an exception? If I try an else with d != date, it doesn't work either (it will not find anything).
I can't see that a for loop will work using a date with a msg (string).
I'm not sure -- beginner here :)
??
The outlook API has a method, Items.Find, for searching the contents of .Items. If this is the extent of what you want to do, that's probably how you should do it.
Right now it seems like your if statement is checking whether the whole set of emails is equal to yesterday's date.
Microsoft's documentation says .Items is returning a collection of emails which you first must iterate through using a few different methods including Items.GetNext or by referencing a specific index with Items.Item.
You can then take the current email and access the .SentOn property.
currentMessage = sent.GetFirst()
while currentMessage:
    if currentMessage.SentOn == y:
        sjl = currentMessage.Subject
        print(sjl)
    currentMessage = sent.GetNext()
This should iterate through all messages in the sent folder until sent.GetNext() has no more messages to return. You will have to make sure y is the same formatting as what .SentOn returns.
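To make the "same formatting" point concrete: y is a plain date, while SentOn carries a time of day as well, so comparing the two directly will not match. One option is to compare only the date part. The sketch below uses ordinary datetime objects as stand-ins and assumes SentOn behaves like a datetime.datetime (the question's later code calls SentOn.strftime, which suggests it does); sent_on_day is a hypothetical helper name.

```python
import datetime


def sent_on_day(sent_on: datetime.datetime, day: datetime.date) -> bool:
    """True when the timestamp falls on the given calendar day."""
    return sent_on.date() == day


stamp = datetime.datetime(2024, 1, 15, 17, 30)  # stand-in for currentMessage.SentOn
day = datetime.date(2024, 1, 15)                # stand-in for y
print(stamp == day)             # direct comparison of datetime and date
print(sent_on_day(stamp, day))  # comparison of the date parts only
```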
If you don't want to iterate through every message, you could probably also nest two loops that goes back in messages until it gets to yesterday, iterates until it is no longer within "yesterday", and then breaks.
The COM API documentation is fairly thorough, you can see the class list for example here. It also documents the various methods you can use to manipulate the objects it has. In your particular example what you are after is to restrict your set of items via date. You will see that there is already a function for that in the items class here. Conveniently it is called Restrict. The only gotcha I can see with that function is that you need to specify the filter that you would like on your items in string form, thus requiring you to construct the string yourself.
So for example to continue your code and restrict by time:
#first create the string filter, here you would like to filter on sent time
#assuming you wanted emails after 5 pm as an example and your date d from the code above
sFilter = "[SentOn] > '{0} 5:00 PM'".format(d)
#then simply retrieve your restricted items
filteredEmails = s.Restrict(sFilter)
You can of course restrict by all sorts of criteria, just check the documentation on the function. This way if you restrict and it returns an empty set of items you can handle that case in the code rather than having to work with exceptions. So for example:
# you have restricted your selection; now check whether you have anything
if filteredEmails.Count == 0:
    # handle this situation however you would like
    pass
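For the original goal of listing yesterday's mails, the filter string can cover a whole day by combining a lower and an upper bound. This is a sketch: sent_on_day_filter is a hypothetical helper, and the default date format is an assumption, since the format Outlook accepts in filter strings can depend on regional settings.

```python
import datetime


def sent_on_day_filter(day: datetime.date, fmt: str = "%m/%d/%Y %I:%M %p") -> str:
    """Build a Restrict filter matching items sent on `day` (midnight to midnight)."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    return f"[SentOn] >= '{start.strftime(fmt)}' AND [SentOn] < '{end.strftime(fmt)}'"


yesterday = datetime.date.today() - datetime.timedelta(days=1)
# Usage, assuming `s` is the Items collection from the code above:
#     filteredEmails = s.Restrict(sent_on_day_filter(yesterday))
print(sent_on_day_filter(yesterday))
```

Because the result is a collection, an empty day shows up as Count == 0 and the loop over messages simply never runs, which avoids the non-terminating while loop from the question.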

How to change email flag to Recent using IMAPClient

I am retrieving emails from my email server using IMAPClient (Python), by checking for emails flagged with "\Recent". After the email has been read the email server automatically sets the email flag to "\Seen".
What I want to do is reset the email flag to "\Recent" so that when I check the email directly on the server it still appears as unread.
What I'm finding is that IMAPClient throws an exception when I try to add the "\Recent" flag to an email using IMAPClient's "set_flags" method. Adding any other flag works fine.
The IMAPClient documentation says the Recent flag is read-only, but I was wondering if there is still a way to mark an email as unread.
From my understanding email software like Thunderbird allows you to set emails as un-read so I assume there must be a way to do it.
Thanks.
For completeness, here's an actual example using IMAPClient. The \Seen flag is updated in order to control whether messages are marked as read or unread.
from imapclient import IMAPClient, SEEN
client = IMAPClient(...)
client.select_folder('INBOX')
msg_ids = client.search(...)
# Mark messages as read
client.add_flags(msg_ids, [SEEN])
# Mark messages as unread
client.remove_flags(msg_ids, [SEEN])
Note that add_flags and remove_flags are used instead of set_flags because the latter resets the flags to just those specified. When setting the read/unread status you typically want to leave any other message flags intact.
It's also worth noting that it's possible to call fetch using the "BODY.PEEK" data item to retrieve parts of messages without affecting the \Seen flag. This can avoid the need to fix up the \Seen flag after downloading a message.
See section 6.4.5 of RFC 3501 for more details.
IMAPClient docs specifically stated the '\Recent' flag is ReadOnly:
http://imapclient.readthedocs.org/en/latest/#message-flags
This is probably a feature (or limitation) of IMAP and IMAP servers. (That is: probably not an IMAPClient limitation).
Use the '\Seen' flag to mark something unread.
Disclaimer: I'm familiar with IMAP but not Python-IMAPClient specifically.
Normally the 'seen' flag determines if an email summary will be shown normal or bold.
You should be able to reset the seen flag. However, the recent flag may not be under your direct control. The IMAP server will set it if it notices new messages arriving.
@Menno Smits:
I'm having issues adding the '\Seen' flag to a mail after parsing through it.
I only want to mark a mail as READ when it contains a particular text.
I've been trying to use add_flags as in the "client.add_flags(msg_ids, [SEEN])" call you gave above, but I keep getting "store failed: Command received in invalid state". What exactly goes into the [SEEN]? (Is this just a placeholder or the exact syntax?)
Here is a portion of my code:
#login and authentication
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
iobj=imapclient.IMAPClient('outlook.office365.com', ssl=True,ssl_context=context)
iobj.login(uname,pwd)
iobj.select_folder('INBOX', readonly=True)
unread=iobj.search('UNSEEN')
print('There are: ',len(unread),' UNREAD emails')
for i in unread:
    mail=iobj.fetch(i,['BODY[]'])
    mail_body=html2text.html2text(mcontent.html_part.get_payload().decode(mcontent.html_part.charset))
    ##Do some regex to parse the email to check if it contains text
    meter_no=(re.findall(r'\nACCOUNT NUMBER: (\d+)', mail_body))
    req_type=(re.findall(r'Complaint:..+?\n(.+)\n', mail_body))
    if 'Key Change' in req_type:
        if meter_no in kct['Account_no'].values:
            print 'Going to sendmail'# Call a function
            sending_email(meter_no,subject,phone_no,req_type,)
            mail[b'FLAGS']=r'b\Seen'+','+''+r'b\Answered'##Trying to manuaally alter the flag but didn't work##
            iobj.add_flags(i,br'\Seen')# Didn't work too (but is 'i' my msg_id??)
            iobj.add_flags(i,[SEEN]) # Complains Name SEEN not defined
        else: print 'KCT is yet to be generated'
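Two things stand out in the code above, offered as likely causes rather than a confirmed diagnosis. First, the folder is selected with readonly=True, and flags cannot be stored on a folder opened read-only, which would plausibly explain the "store failed" error. Second, SEEN is a constant exported by the imapclient package (as in the earlier answer's "from imapclient import IMAPClient, SEEN"), so it has to be imported before use. The sketch below shows the flow with any client object that offers IMAPClient's select_folder, search, fetch and add_flags methods; mark_matching_as_read is a hypothetical helper, and SEEN is defined locally only so the block runs without the import.

```python
SEEN = b"\\Seen"  # same value as imapclient.SEEN; defined here to keep the sketch self-contained


def mark_matching_as_read(client, needle: bytes, folder: str = "INBOX"):
    """Mark unread messages containing `needle` as read; return their ids."""
    # Open the folder read-write, otherwise flags cannot be changed
    client.select_folder(folder, readonly=False)
    marked = []
    for msg_id in client.search("UNSEEN"):
        # BODY.PEEK[] retrieves the message without implicitly setting \Seen
        data = client.fetch([msg_id], ["BODY.PEEK[]"])
        raw = data[msg_id][b"BODY[]"]
        if needle in raw:
            client.add_flags([msg_id], [SEEN])
            marked.append(msg_id)
    return marked


# Usage, assuming `iobj` is a logged-in imapclient.IMAPClient:
#     mark_matching_as_read(iobj, b"Key Change")
```

Fetching with BODY.PEEK[] means messages that do not contain the text stay unread, so only the explicit add_flags call changes their status.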
