Python Script takes very long time to run - python

I've managed to write a piece of code (composed by multiple sources along the web, and adapted to my needs) which should do the following:
Reads an excel file
From column A to search the value of each cell within the subject of mails from a specific folder
If matches (cell value equal to first 9 characters of the subject), save the attachment (each mail has only one attachment, no more, no less) with the value of cell in an "output" folder.
If doesn't match, go to the next mail, respectively next cell value.
In the end, display the run time (not very important, only for my knowledge)
The code actually works (tested with an email folder with only 9 emails). My problem is the run time.
The actual scope of the script is to look for 2539 values in a folder with 32700 emails and save the attachments.
I've done 2 runs as follow:
2539 values in 32700 emails (stopped after ~1 hour)
10 values in 32700 emails (stopped after ~40 minutes; in this time the script processed 4 values)
I would like to know / learn, if there a way to make the script faster, or if it's slow because it's bad written etc.
Below is my code:
from pathlib import Path
import win32com.client
import os
from datetime import datetime
import time
import openpyxl
#name of the folder created for output
output_dir = Path.cwd() / "Orders"
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.Folders.Item("Shared Mailbox Name")
inbox = folder.Folders.Item("Inbox")
messages = inbox.Items
wb = openpyxl.load_workbook(r"C:\Users\TEST\Path-to-excel\FolderName\ExcelName.xlsx")
sheet = wb['Sheet1']
names=sheet['A']
for cellObj in names:
ordno = str(cellObj.value)
print(ordno)
for message in messages:
subject = message.Subject
body = message.body
attachments = message.Attachments
if str(subject)[:9] == ordno:
output_dir.mkdir(parents=True, exist_ok=True)
for attachment in attachments:
attachment.SaveAsFile(output_dir / str(attachment))
else:
pass
start = time()
print(f'Time taken to run: {time() - start} seconds')
I need to mention that I am a complete rookie in Python thus any help from the community is welcomed, especially next to some clarifications of what I did wrong and why.
I've also read some similar questions but nothing helps, or at least I don't know how to adopt the methods.
Thank you!

Seems to me the main problem with your program is that you have two nested loop (one over the values & one over the mails) when you only need to loop over the mails and check if their subject is in the list of values.
First you need to construct your list of value with something like :
ordno_values = [str(cellObj.value) for cellObj in names]
then, in your loop over mails, you just need to adapt the condition to :
if str(subject)[:9] in ordno_values:

Your use case is too specific for anyone to be able to recreate, and hints about performance only generic but your main problem is a combination of "O x N" and synchronous processing: currently you are processing one value, one message at a time, which includes disk IO to get the e-mail.
You can certainly improve things by creating a single list of values from the workbook. You can then use this list with a processing pool (see the Python documentation) to read multiple e-mails at once.
But things might be even better if you can use the subject to query the mail server.
If you have follow-up questions, please break them down to specific parts of the task.

First of all, instead of iterating over all items in the folder:
for message in messages:
subject = message.Subject
And then checking whether a subject starts from the specified string or includes such string:
if str(subject)[:9] == ordno:
Instead, you need to use the Find/FindNext or Restrictmethods of theItems` class where you could get collection of items that correspond to your search criteria. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
For example, you could use the following restriction on the collection (taken form the VBA sample):
criteria = "#SQL=" & Chr(34) & "urn:schemas:httpmail:subject" & Chr(34) & " ci_phrasematch 'question'"
See Filtering Items Using a String Comparison for more information.
Also you may find the AdvancedSearch method of the Application class helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information on that.

Related

Python and Outlook - Marking message thread as 'read'

My company uses JIRA to track issues, and is set up to send an e-mail to all watchers and tagged users whenever an update is done on the issue. We also have some automation in place that will adjust fields on the issue (like sprint number) whenever it gets closed (this'll also send an e-mail). I also have a filter within Outlook that'll put any e-mail from JIRA into a separate subfolder 'JIRA'.
I often receive e-mails on issues that have been closed. I'm trying to write a small Python script that'll mark all these e-mails as read if the JIRA issue has been closed already. The basic idea is I can run this script once a week or so to clean up my mailbox.
I'm using the pywin32 and jira packages to do this, but I can't figure out how to change a message status. The fact that documentation is scarce doesn't help...
What I have:
import re
import textwrap
from jira import JIRA
import pandas as pd
import win32com.client
jira = JIRA("<JIRA URL>", None, ("<USER>", "<JIRA API key>"))
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
wrapper = textwrap.TextWrapper(initial_indent="", width=100, subsequent_indent=" " * 4)
days_back = 10
start_time = pd.to_datetime("now").floor("D") - pd.to_timedelta(days_back, unit="D")
for message in outlook.getDefaultFolder(6).Folders.Item("JIRA").Items.Restrict(f"[ReceivedTime] >= '{start_time.strftime('%d/%m/%Y %H:%M %p')}'"):
if message.Unread:
jira_issue = re.search("\[JIRA\] \([A-Z0-9-]+\)", str(message)).group().split()[1][1:-1]
print(message, jira_issue)
print(message.body)
issue = jira.issue(jira_issue)
status = issue.fields.status
if status in ("Done", "Checked"):
message.Unread = False
as noted in this SO issue. This doesn't seem to mark any e-mail as read.
Is this something I can even do in Python? If so, how? If not, what could be an alternative approach?
You can use Categories property to assign a red category to items in Outlook. Categories is a delimited string of category names that have been assigned to an Outlook item. This property uses the character specified in the value name, sList, under HKEY_CURRENT_USER\Control Panel\International in the Windows registry, as the delimiter for multiple categories. See Setting an Outlook mailitem's category programmatically? for more information.

Text manipulation in Outlook using Python

Sometimes when sending a new event invitation for a certain meeting in Outlook I need to mention all the required people for the meeting in the invitation body, due to company conventions. Many times, the names I already sent the invitation to are the very same people I need to write all over again. I found that if I copy those names from the "To..." field, they are pasted in the format of name <mail>; name <mail>; name <mail>, so I wrote this Python function to turn it into a plain list of names separated by a new line with the mail addresses removed:
def format_invitees(string):
import re; return ''.join(x.strip(' \n')+'\n' for x in re.sub("[<].*?[>]", "", string).replace(' ; ', ';').split(';')).strip('\n')
Now, is there any good way to implement this function into an Outlook Macro, with whether to assign it to a hotkey or add it to the menu on right click? To mention that Python is the only language I know, and I am not allowed to install any external software due to organization orders. Best regards!
I think import re; return re.sub(r" ?<.*?>;? ?","\n",string) is a shorter way of defining the Python function.
But more to your point, follow the instructions at this SO question to enable VBA regex module (given for Word, but applicable in Outlook): How to Use/Enable (RegExp object) Regular Expression using VBA (MACRO) in word
I think the outlook.Recipients property may be useful for getting the names of the people you're needing to list: https://learn.microsoft.com/en-us/office/vba/api/outlook.recipients

Python imaplib can't select() custom gmail labels

I have a bot I'm writing using imaplib in python to fetch emails from gmail and output some useful data from them. I've hit a snag on selecting the inbox, though; the existing sorting system uses custom labels to separate emails from different customers. I've partially replicated this system in my test email, but imaplib.select() throws a "imaplib.IMAP4.error: SELECT command error: BAD [b'Could not parse command']" with custom labels. Screenshot attatched My bot has no problem with the default gmail folders, fetching INBOX or [Gmail]/Spam. In that case, it hits an error later in the code that deals with completely different problem I have yet to fix. The point, though, is that imaplib.select() is succsessful with default inboxes and just not custom labels.
The way my code works is it works through all the available inboxes, compares it to a user-inputted name, and if they match, saves the name and sets a boolean to true to signal that it found a match. It then checks, if there was a match (the user-inputted inbox exists) it goes ahead, otherwise it throws an error message and resets. It then attempts to select the inbox the user entered.
I've verified that the variable the program's saving the inbox name to matches what's listed as the name in the imap.list() command. I have no idea what the issue is.
I could bypass the process by iterating through all mail to find the email's I'm looking for, but it's far more efficient to use the existing sorting system due to the sheer number of emails on the account I'll be using.
Any help is appreciated!
EDIT: Code attached after request. Thank you to the person who told me to do so.
'''
Fetches emails from the specified inbox and outputs them to a popup
'''
def fetchEmails(self):
#create an imap object. Must be local otherwise we can only establish a single connection
#imap states are kinda bad
imap = imaplib.IMAP4_SSL(host="imap.gmail.com", port="993")
#Login and fetch a list of available inboxes
imap.login(username.get(), password.get())
type, inboxList = imap.list()
#Set a reference boolean and iterate through the list
inboxNameExists = False
for i in inboxList:
#Finds the name of the inbox
name = self.inboxNameParser(i.decode())
#If the given inbox name is encountered, set its existence to true and break
if name.casefold().__eq__(inboxName.get().casefold()):
inboxNameExists = True
break
#If the inbox name does not exist, break and give error message
if inboxNameExists != True:
self.logout(imap)
tk.messagebox.showerror("Disconnected!", "That Inbox does not exist.")
return
'''
If/else to correctly feed the imap.select() method the inbox name
Apparently inboxes containing spaces require quoations before and after
Selects the inbox and pushes it to a variable
two actually but the first is unnecessary(?)
imap is weird
'''
if(name.count(" ") > 0):
status, messages = imap.select("\"" + name + "\"")
else:
status, messages = imap.select(name);
#Int containing total number of emails in inbox
messages = int(messages[0])
#If there are no messages disconnect and show an infobox
if messages == 0:
self.logout(imap)
tk.messagebox.showinfo("Disconnected!", "The inbox is empty.")
self.mailboxLoop(imap, messages)
Figured the issue out after a few hours banging through it with a friend. As it turns out the problem was that imap.select() wants quotations around the mailbox name if it contains spaces. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"")
You can see this reflected in the code I posted with the last if/else statement.
Python imaplib requires mailbox names with spaces to be surrounded by apostrophes. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"").

Get Email list from Outlook using Python

Am trying to get last 1000 emails i received in outlook. But the code only get Email from Main folder not from sub folders.
Please assist
import win32com.client
import pandas as pd
import dateutil.parser
from datetime import datetime
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6) # "6" refers to the index of a folder - in this case,
# the inbox. You can change that number to reference
# any other folder
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
i=1
df = pd.DataFrame(columns=['Sender','Subject','DateTime'])
Today = datetime.now().strftime("%m/%d/%Y") # current date and time
while i<1000:
message=messages[i]
DT1=message.ReceivedTime
DT = DT1.strftime("%m/%d/%Y, %H:%M:%S")
a=message.SenderEmailAddress
if "-" in a:
a=a.split("-",1)[1]
b=message.subject
df = df.append({'Sender':a,'Subject':b,'DateTime':DT}, ignore_index=True)
i+=1
df.to_excel("C:/Users/abc/Downloads/Email.xlsx")
To perform a search over multiple folders you need to use the AdvancedSearch method of the Application class. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
Possibility to search for any item types: mail, appointment, calendar, notes etc. in any location, i.e. beyond the scope of a certain folder. The Restrict and Find/FindNext methods can be applied to a particular Items collection.
Full support for DASL queries (custom properties can be used for searching too). You can read more about this in the Filtering article in MSDN. To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
Read more about the AdvancedSearch method and find samples in the Advanced search in Outlook programmatically: C#, VB.NET article.
The Restrict or Find/FindNext methods of the Items class allow getting items according to your conditions from a single folder only.

listing Outlook emails by specific date in Python

I'm using Python 3.
I'm trying to extract (list / print show) outlook emails by date.
I was trying a loop.. maybe WHILE or IF statement.
Can it be done since ones a string and the other is a date.
Please concide what I've got so far: Thanks.
1. import win32com.client, datetime
2.
3. # Connect with MS Outlook - must be open.
4. outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
5. # connect to Sent Items
6. sent = outlook.GetDefaultFolder(5).Items # "5" refers to the sent item of a folder
7.
8. # Get yesterdays date
9. y = (datetime.date.today () - datetime.timedelta (days=1))
10. # Get emails by selected date
11. if sent == y:
12. msg = sent.GetLast()
13. # get Subject line
14. sjl = msg.subject
14. # print it out
15. print (sjl)
Ive completed the code. Thanks for help.
`import sys, win32com.client, datetime
# Connect with MS Outlook - must be open.
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace
("MAPI")
# connect to Sent Items
s = outlook.GetDefaultFolder(5).Items # "5" refers to the sent item of a
folder
#s.Sort("s", true)
# Get yesterdays date for the purpose of getting emails from this date
d = (datetime.date.today() - datetime.timedelta (days=1)).strftime("%d-%m-%
y")
# get the email/s
msg = s.GetLast()
# Loop through emails
while msg:
# Get email date
date = msg.SentOn.strftime("%d-%m-%y")
# Get Subject Line of email
sjl = msg.Subject
# Set the critera for whats wanted
if d == date and msg.Subject.startswith("xx") or msg.Subject.startswith
("yy"):
print("Subject: " + sjl + " Date : ", date)
msg = s.GetPrevious() `
This works. However if no message according to the constraint if found, it doesnt exit. Ive tried break which just finds one message and not all, Im wondering if and how to do an exception? or if i try a else d != date it doenst work either (it will not find anything).
I cant see that a For loop will work using a date with a msg(string).
I not sure -- biginner here :)
??
The outlook API has a method, Items.Find, for searching the contents of .Items. If this is the extent of what you want to do, that's probably how you should do it.
Right now it seems like your if statement is checking whether set of emails is equal to yesterday.
Microsoft's documentation says .Items is returning a collection of emails which you first must iterate through using a few different methods including Items.GetNext or by referencing a specific index with Items.Item.
You can then take the current email and access the .SentOn property.
currentMessage = sent.GetFirst()
while currentMessage:
if currentMessage.SentOn == y:
sjl = currentMessage.Subject
print(sjl)
currentMessage = sent.GetNext()
This should iterate through all messages in the sent folder until sent.GetNext() has no more messages to return. You will have to make sure y is the same formatting as what .SentOn returns.
If you don't want to iterate through every message, you could probably also nest two loops that goes back in messages until it gets to yesterday, iterates until it is no longer within "yesterday", and then breaks.
The COM API documentation is fairly thorough, you can see the class list for example here. It also documents the various methods you can use to manipulate the objects it has. In your particular example what you are after is to restrict your set of items via date. You will see that there is already a function for that in the items class here. Conveniently it is called Restrict. The only gotcha I can see with that function is that you need to specify the filter that you would like on your items in string form, thus requiring you to construct the string yourself.
So for example to continue your code and restrict by time:
#first create the string filter, here you would like to filter on sent time
#assuming you wanted emails after 5 pm as an example and your date d from the code above
sFilter = "[SentOn] > '{0} 5:00 PM'".format(d)
#then simply retrieve your restricted items
filteredEmails = s.Restrict(sFilter)
You can of course restrict by all sorts of criteria, just check the documentation on the function. This way if you restrict and it returns an empty set of items you can handle that case in the code rather than having to work with exceptions. So for example:
#you have restricted your selection now want to check if you have anything
if filteredEmails.Count == 0:
#handle this situation however you would like

Categories

Resources