Text manipulation in Outlook using Python


Sometimes when sending a new event invitation for a certain meeting in Outlook I need to mention all the required people for the meeting in the invitation body, due to company conventions. Many times, the names I already sent the invitation to are the very same people I need to write all over again. I found that if I copy those names from the "To..." field, they are pasted in the format of name <mail>; name <mail>; name <mail>, so I wrote this Python function to turn it into a plain list of names separated by a new line with the mail addresses removed:
import re
def format_invitees(string):
    names = re.sub("[<].*?[>]", "", string).replace(' ; ', ';').split(';')  # drop every <mail>, split on ';'
    return ''.join(x.strip(' \n') + '\n' for x in names).strip('\n')  # one trimmed name per line
Now, is there a good way to turn this function into an Outlook macro, either assigned to a hotkey or added to the right-click menu? I should mention that Python is the only language I know, and I am not allowed to install any external software due to organization rules. Best regards!

I think import re; return re.sub(r" ?<.*?>;? ?","\n",string) is a shorter way of writing the Python function (note that it leaves a trailing newline after the last name).
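A quick way to check that the shorter expression gives the same list is to run both versions on a sample pasted from the "To..." field. The names and addresses below are made up, and the .strip() on the shorter version is an addition: that pattern also turns the last " <mail>" into a newline.

```python
import re

def format_invitees(string):
    """Original approach: drop every <mail> part, then split on ';'."""
    names = re.sub("[<].*?[>]", "", string).replace(' ; ', ';').split(';')
    return ''.join(x.strip(' \n') + '\n' for x in names).strip('\n')

def format_invitees_short(string):
    """Shorter approach: replace each ' <mail>; ' with a newline."""
    # .strip() removes the newline left behind by the last address
    return re.sub(r" ?<.*?>;? ?", "\n", string).strip()

sample = ("Alice Smith <alice@example.com>; Bob Jones <bob@example.com>; "
          "Carol White <carol@example.com>")
print(format_invitees(sample))
print(format_invitees_short(sample))
```

Both calls print the three names, one per line.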
But more to your point, follow the instructions in this SO question to enable the VBA regex module (they are written for Word, but apply to Outlook as well): How to Use/Enable (RegExp object) Regular Expression using VBA (MACRO) in word
I think the Recipients collection (exposed through the Recipients property of the item) may be useful for getting the names of the people you need to list: https://learn.microsoft.com/en-us/office/vba/api/outlook.recipients
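For comparison, a sketch of the Recipients idea in Python over COM. It needs Windows, Outlook and the pywin32 package (win32com), which the question says may not be installable, so treat that as an assumption; the function names are hypothetical. It reads the item currently open in its own window (ActiveInspector().CurrentItem) and keeps the recipients whose Type is 1, which is olRequired for meeting items (and olTo for mail items).

```python
OL_REQUIRED = 1  # OlMeetingRecipientType.olRequired (olTo for mail items)

def required_attendee_names(recipients, recipient_type=OL_REQUIRED):
    """Return the names of recipients of the given type, one per line."""
    return "\n".join(r.Name for r in recipients if r.Type == recipient_type)

def names_from_open_invitation():
    """Read the Recipients of the meeting invitation currently open in Outlook.

    Windows only: needs a running Outlook and the pywin32 package.
    """
    import win32com.client  # imported here so the helper above works anywhere

    outlook = win32com.client.Dispatch("Outlook.Application")
    item = outlook.ActiveInspector().CurrentItem  # the open meeting item
    return required_attendee_names(item.Recipients)
```

The split keeps the text handling (required_attendee_names) testable without Outlook.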

Related

Python Script takes very long time to run

I've managed to write a piece of code (put together from multiple sources on the web and adapted to my needs) which should do the following:
Read an Excel file.
For each cell in column A, search for its value in the subjects of the mails in a specific folder.
If there is a match (the cell value equals the first 9 characters of the subject), save the attachment (each mail has exactly one attachment, no more, no less) with the value of the cell into an "output" folder.
If there is no match, go on to the next mail, and then to the next cell value.
At the end, display the run time (not very important, only for my knowledge).
The code actually works (tested with an email folder with only 9 emails). My problem is the run time.
The actual scope of the script is to look for 2539 values in a folder with 32700 emails and save the attachments.
I've done 2 runs as follows:
2539 values in 32700 emails (stopped after ~1 hour)
10 values in 32700 emails (stopped after ~40 minutes; in this time the script processed 4 values)
I would like to know / learn if there is a way to make the script faster, or if it's slow because it's badly written, etc.
Below is my code:
from pathlib import Path
import time

import openpyxl
import win32com.client

start = time.time()  # time is a module, so call time.time(); start before the work
# name of the folder created for output
output_dir = Path.cwd() / "Orders"
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.Folders.Item("Shared Mailbox Name")
inbox = folder.Folders.Item("Inbox")
messages = inbox.Items
wb = openpyxl.load_workbook(r"C:\Users\TEST\Path-to-excel\FolderName\ExcelName.xlsx")
sheet = wb['Sheet1']
names = sheet['A']
for cellObj in names:
    ordno = str(cellObj.value)
    print(ordno)
    for message in messages:
        subject = message.Subject
        body = message.body
        attachments = message.Attachments
        if str(subject)[:9] == ordno:
            output_dir.mkdir(parents=True, exist_ok=True)
            for attachment in attachments:
                # pass a plain string path to the COM method
                attachment.SaveAsFile(str(output_dir / str(attachment)))
print(f'Time taken to run: {time.time() - start} seconds')
I need to mention that I am a complete rookie in Python, so any help from the community is welcome, especially with some explanation of what I did wrong and why.
I've also read some similar questions but nothing helps, or at least I don't know how to adapt the methods.
Thank you!

It seems to me the main problem with your program is that you have two nested loops (one over the values and one over the mails), when you only need to loop over the mails and check whether their subject is in the list of values.
First you need to construct your list of values with something like:
ordno_values = [str(cellObj.value) for cellObj in names]
Then, in your loop over the mails, you just need to change the condition to:
if str(subject)[:9] in ordno_values:
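The single-loop idea can be sketched as a generator that walks the messages once. Putting the values in a set instead of a list is a small addition here: each "in" test then becomes a hash lookup rather than a scan over all 2539 values. find_matches is a hypothetical name, and the SimpleNamespace objects only stand in for Outlook items (anything with a .Subject attribute works).

```python
from types import SimpleNamespace

def find_matches(messages, ordno_values, prefix_len=9):
    """Yield (ordno, message) for each message whose subject starts with a wanted value.

    Only one pass is made over the messages; the values live in a set,
    so each membership test is a hash lookup.
    """
    wanted = set(ordno_values)
    for message in messages:
        key = str(message.Subject)[:prefix_len]
        if key in wanted:
            yield key, message

# Stand-ins for Outlook items
inbox_items = [
    SimpleNamespace(Subject="123456789 Order confirmation"),
    SimpleNamespace(Subject="999999999 Something else"),
    SimpleNamespace(Subject="555555555 Invoice"),
]
for ordno, message in find_matches(inbox_items, ["123456789", "555555555"]):
    print(ordno, "->", message.Subject)
```

In the real script you would pass inbox.Items and ordno_values, and save message.Attachments inside the loop.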

Your use case is too specific for anyone to be able to recreate, so hints about performance can only be generic, but your main problem is a combination of "O x N" work (every value is checked against every message) and synchronous processing: currently you are processing one value and one message at a time, which includes disk IO to get each e-mail.
You can certainly improve things by creating a single list of values from the workbook. You can then use this list with a processing pool (see the Python documentation) to read multiple e-mails at once.
But things might be even better if you can use the subject to query the mail server.
If you have follow-up questions, please break them down to specific parts of the task.

First of all, instead of iterating over all items in the folder:
for message in messages:
    subject = message.Subject
and then checking whether each subject starts with (or includes) the specified string:
if str(subject)[:9] == ordno:
you should use the Find/FindNext or Restrict methods of the Items class, which give you a collection of only the items that match your search criteria. Read more about these methods in the following articles:
How To: Use Find and FindNext methods to retrieve Outlook mail items from a folder (C#, VB.NET)
How To: Use Restrict method to retrieve Outlook mail items from a folder
For example, you could use the following restriction on the collection (taken from the VBA sample):
criteria = "@SQL=" & Chr(34) & "urn:schemas:httpmail:subject" & Chr(34) & " ci_phrasematch 'question'"
See Filtering Items Using a String Comparison for more information.
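A Python sketch of the Restrict approach over win32com, assuming an Items collection such as inbox.Items from the question. Because the question compares the start of the subject, it uses a DASL LIKE with a trailing % (a prefix match) instead of ci_phrasematch. The helper names are hypothetical, and doubling single quotes inside the value is an assumption about DASL string escaping. Outlook still runs one filter per value, but it does the matching itself instead of handing every item to Python for every value.

```python
from pathlib import Path

SUBJECT_PROP = "urn:schemas:httpmail:subject"

def subject_startswith_filter(ordno):
    """Build a DASL filter matching subjects that start with `ordno`."""
    value = str(ordno).replace("'", "''")  # assumption: SQL-style quote escaping
    return f"@SQL=\"{SUBJECT_PROP}\" LIKE '{value}%'"

def save_attachments_for(items, ordno_values, output_dir):
    """Let Outlook filter `items` (an Outlook Items collection) per value and save attachments."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for ordno in ordno_values:
        for message in items.Restrict(subject_startswith_filter(ordno)):
            for attachment in message.Attachments:
                # pass a plain string path to the COM method
                attachment.SaveAsFile(str(output_dir / attachment.FileName))

print(subject_startswith_filter("123456789"))
```

The print shows the filter string that would be passed to Items.Restrict.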
Also you may find the AdvancedSearch method of the Application class helpful. The key benefits of using the AdvancedSearch method in Outlook are:
The search is performed in another thread. You don’t need to run another thread manually since the AdvancedSearch method runs it automatically in the background.
It can search for any item type (mail, appointment, calendar, notes, etc.) in any location, i.e. beyond the scope of a single folder. The Restrict and Find/FindNext methods, by contrast, only apply to a particular Items collection (see the Items property of the Folder class in Outlook).
Full support for DASL queries (custom properties can be used for searching too). To improve the search performance, Instant Search keywords can be used if Instant Search is enabled for the store (see the IsInstantSearchEnabled property of the Store class).
You can stop the search process at any moment using the Stop method of the Search class.
See Advanced search in Outlook programmatically: C#, VB.NET for more information on that.

Extract text from a config file [duplicate]

This question already has answers here:
Parse key value pairs in a text file
(7 answers)
Closed 1 year ago.
I'm using a config file to inform my Python script of a few key-values, for use in authenticating the user against a website.
I have three variables: the URL, the user name, and the API token.
I've created a config file with each key on a different line, so:
url:<url string>
auth_user:<user name>
auth_token:<API token>
I want to be able to extract the text after the keywords into variables, also stripping any "\n" that exists at the end of the line. Currently I'm doing this, and it works but seems clumsy:
from re import match
from sys import argv

with open(argv[1], mode='r') as config_file:
    lines = config_file.readlines()
for line in lines:
    url_match = match('jira_url:', line)
    if url_match:
        jira_url = line[9:].split("\n")[0]
    user_match = match('auth_user:', line)
    if user_match:
        auth_user = line[10:].split("\n")[0]
    token_match = match('auth_token:', line)
    if token_match:
        auth_token = line[11:].split("\n")[0]
Can anybody suggest a more elegant solution? Specifically it's the ... = line[10:].split("\n")[0] lines that seem clunky to me.
I'm also slightly confused why I can't reuse my match object within the for loop, and have to create new match objects for each config item.

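You could use a .yml file and read the values with the yaml.load() function from PyYAML:
import yaml
with open('settings.yml') as file:
    settings = yaml.load(file, Loader=yaml.FullLoader)
Now you can access elements like settings["url"] and so on. Note that YAML needs a space after each colon (url: <url string>); without it, the line is read as a plain string rather than a key-value pair.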

If the format is always <tag>:<value> you can easily parse it by splitting the line at the colon and filling up a custom dictionary:
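settings = dict()
with open(filename, "r") as config_file:
    for line in config_file:
        # rstrip('\n') instead of l[:-1]: the last line may have no newline
        elements = line.rstrip('\n').split(':')
        # re-join the rest so values that contain ':' (e.g. URLs) stay whole
        settings[elements[0]] = ':'.join(elements[1:])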
So you get a dictionary that has the tags as keys and the values as values. You can then just refer to these dictionary entries in your program.
(e.g. if you need the auth_token, just use settings["auth_token"])
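A variation on the same idea, sketched with str.partition, which splits at the first colon only, so a value such as a URL keeps its own colons without re-joining. Skipping blank lines and lines without a colon is a choice made here, not something the answer requires; parse_config is a hypothetical name, and io.StringIO stands in for an opened file.

```python
import io

def parse_config(lines):
    """Parse '<tag>:<value>' lines into a dict of strings."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tag, sep, value = line.partition(':')  # split at the first colon only
        if sep:  # ignore lines that contain no colon at all
            settings[tag] = value
    return settings

sample = io.StringIO("url:https://example.com/jira\nauth_user:alice\nauth_token:abc123\n")
settings = parse_config(sample)
print(settings["url"])         # https://example.com/jira
print(settings["auth_token"])  # abc123
```

With a real file, pass the opened file object instead: parse_config(open(filename)).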

If you can add one line to the config file, configparser is a good choice:
https://docs.python.org/3/library/configparser.html
[1] config file: 1.cfg (configparser's config files need a section name, hence the [DEFAULT] line)
[DEFAULT]
url:<url string>
auth_user:<user name>
auth_token:<API token>
[2] Python script
import configparser
config = configparser.ConfigParser()
config.read('1.cfg')
print(config.get('DEFAULT','url'))
print(config.get('DEFAULT','auth_user'))
print(config.get('DEFAULT','auth_token'))
[3] output
<url string>
<user name>
<API token>
configparser's methods are also useful when you can't guarantee that the config file is always complete (for example has_option(), or the fallback argument of get()).
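A self-contained version of the configparser example, using read_string() so it runs without the 1.cfg file, plus the "incomplete file" case. has_option() and the fallback keyword of get() are part of the standard library; the timeout key is hypothetical and only there to show a missing option.

```python
import configparser

# Same content as 1.cfg, inlined so the example runs on its own
CONFIG_TEXT = """\
[DEFAULT]
url:<url string>
auth_user:<user name>
auth_token:<API token>
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)  # config.read('1.cfg') does the same for a file

defaults = config['DEFAULT']  # mapping-style access also works
print(defaults['url'])                                  # <url string>
print(config.has_option('DEFAULT', 'auth_token'))       # True
print(config.get('DEFAULT', 'timeout', fallback='30'))  # 30 (key not in the file)
```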

You have a couple of great answers already, but I wanted to step back and provide some guidance on how you might approach these problems in the future. Getting quick answers sometimes prevents you from understanding how those people knew about the answers in the first place.
When you zoom out, the first thing that strikes me is that your task is to provide config, using a file, to your program. Software has the remarkable property of solve-once, use-anywhere. Config files have been a problem worth solving for at least 40 years, so you can bet your bottom dollar you don't need to solve this yourself. And already-solved means someone has already figured out all the little off-by-one and edge-case dramas like stripping line endings and dealing with expected input. The challenge of course, is knowing what solution already exists. If you haven't spent 40 years peeling back the covers of computers to see how they tick, it's difficult to "just know". So you might have a poke around on Google for "config file format" or something.
That would lead you to one of the most prevalent config file systems on the planet - the INI file. Just as useful now as it was 30 years ago, and as a bonus, looks not too dissimilar to your example config file. Then you might search for "read INI file in Python" or something, and come across configparser and you're basically done.
Or you might see that sometime in the last 30 years, YAML became the more trendy option, and wouldn't you know it, PyYAML will do most of the work for you.
But none of this gets you any better at using Python to extract from text files in general. So zooming in a bit, you want to know how to extract parts of lines in a text file. Again, this is an age-old problem, and if you were to learn about it (rather than just be handed the solution), you would learn that this is called parsing and often involves tokenisation. If you do some research on, say, "parsing a text file in python", you would learn about the general techniques that work regardless of the language, such as looping over lines and splitting each one in turn.
Zooming in one more step closer, you're looking to strip the new line off the end of the string so it doesn't get included in your value. Once again, this ain't a new problem, and with the right keywords you could dig up the well-trodden solutions. This is often called "chomping" or "stripping", and with some careful search terms, you'd find rstrip() and friends, and not have to do awkward things like splitting on the '\n' character.
Your final question is about re-using the match object. This is much harder to research. But again, the "solution" won't necessarily show you where you went wrong. What you need to keep in mind is that the statements in the for loop are sequential. To think them through you should literally execute them in your mind, one after another, and imagine what's happening. Each time you call match, it either returns None or a Match object. You never use the object, except to check for truthiness in the if statement. And the next time you call match, you do so with different arguments, so you get a new Match object (or None). Therefore, you don't need to keep the object around at all. You can simply do:
if match('jira_url:', line):
    jira_url = line[9:].split("\n")[0]
if match('auth_user:', line):
    auth_user = line[10:].split("\n")[0]
and so on. Not only that: if the first if triggered, you don't need to bother calling match again, because it certainly won't trigger any of the other matches for the same line. So you could do:
if match('jira_url:', line):
    jira_url = line[9:].rstrip()
elif match('auth_user:', line):
    auth_user = line[10:].rstrip()
and so on.
But then you can start to think - why bother doing all these matches on the colon, only to then manually split the string at the colon afterwards? You could just do:
tokens = line.rstrip().split(':', 1)  # split at the first colon only, so URLs stay whole
if tokens[0] == 'jira_url':
    jira_url = tokens[1]
elif tokens[0] == 'auth_user':
    auth_user = tokens[1]
If you keep making these improvements (and there are lots more to make!), eventually you'll end up rewriting configparser, but at least you'll have learned why it's often a good idea to use an existing library where practical!

How can I do language packs on my Discord Bot?

I'm developing a Discord Bot (some of you may already know it), and I wanted to add "language packs": text files where I store all the sentences and words the bot uses, in different languages.
But here's my problem.
I'm trying to import the "English" file into a list and print the list's content on Discord. That part is easy, but when I then try to re-import the list from another text file ("French"), it doesn't work.
I tried to adapt the "prefix" code shown below, but it doesn't seem to work.
with open("language.json") as f:
    language = json.load(f)
default_language = "EN"
def language(bot, message):
    id = message.server.id
    return language.get(id, default_language)
lang = [i.strip() for i in open(#UNFINISHED)]#import each lines of the file [lang = [i.strip() for i in open(language).readlines()]]
I expected this code to import the language file the user selected (English by default), but I get a TypeError.
I asked a friend for help; he told me to use aiofiles, but I don't know what it is or how to use it. He didn't know how to solve this problem.
==================================EDIT=====================================
I moved to Discord.py Rewrite because I was stuck on something. So I'm making the bot again, with some difficulties, but I'm still planning to add these translations.
But this time it is more complex. I use cogs, and these cogs will need to be translated too.
N.B.: As I can't find many tutorials on the rewrite version, I'll probably ask more questions in the future.
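The TypeError most likely comes from the name language being used twice: json.load(f) stores the mapping in language, and the def language(...) right after it rebinds that name to the function. From then on, language.get(...) inside the function and open(language) in the commented-out line see a function instead of a dict or a file name, and passing a function to open() raises TypeError. Below is a sketch that keeps the names apart. The file layout (lang/EN.txt, lang/FR.txt, one sentence per line), the helper names, and the {server id: language code} shape of language.json are assumptions, not taken from the question. JSON object keys are always strings, so the server id is converted with str().

```python
import json
from pathlib import Path

DEFAULT_LANGUAGE = "EN"

def load_server_languages(path="language.json"):
    """Read the {server_id: language_code} mapping; empty if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def get_language(server_languages, server_id):
    """Language code chosen for a server, falling back to English."""
    return server_languages.get(str(server_id), DEFAULT_LANGUAGE)

def load_pack(language_code, folder="lang"):
    """Read one sentence per line from e.g. lang/EN.txt (hypothetical layout)."""
    pack_path = Path(folder) / f"{language_code}.txt"
    with open(pack_path, encoding="utf-8") as f:
        return [line.strip() for line in f]
```

Switching languages is then just load_pack(get_language(server_languages, guild_id)) with a different code; nothing is overwritten because the dict, the function and the loaded list all have distinct names.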

Using imaplib, how can I create a mailbox without the \\NoSelect attribute

I'm attempting to create directory trees in a Gmail IMAP account. I've used the "create()" command in imaplib, but it seems to add the \\Noselect attribute to the created folder. This breaks Gmail's nested labels feature. Is there a way to remove the \\Noselect attribute, or avoid it being created in the first place?
Example:
>> imap.create("foo/bar")
('OK', [b'Success'])
>> imap.list()
[b'(\\Noselect \\HasChildren) "/" "foo"', b'(\\HasNoChildren) "/" "foo/bar"',...

I figured out a solution - Not sure if it's the 'best' way though. When creating a nested mailbox in one command, the top level mailboxes automatically are flagged \\Noselect. While it may be hacky, you can remove this flag by creating each level explicitly.
Example:
folder = "abc/def/ghi/jkl"
target = ""
for level in folder.split('/'):
    target += "{}/".format(level)
    imap.create(target)
I'll leave the question open to see if anyone has a better solution.

bjeanes: Sam's solution works for me as long as I leave off the trailing hierarchy delimiter.
So, if I want to create the nested folder a/b/c, I first create just plain "a". If I do an xlist, it has the hasNoChildren flag set. Now I create "a/b", and an xlist will now show "a" with the "hasChildren" flag set, and "a/b" with the "hasNoChildren" flag set. Finally, I create "a/b/c", and now "b" has the "hasChildren" flag set as well. A look at the gmail web interface confirms this as well.
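Following this comment (create each level explicitly, and without the trailing delimiter), a small sketch. nested_mailbox_paths and create_nested_mailbox are hypothetical helper names; imap is assumed to be an already logged-in imaplib.IMAP4_SSL connection, and "/" is assumed to be the hierarchy delimiter, as the imap.list() output in the question shows.

```python
def nested_mailbox_paths(folder, delimiter="/"):
    """Every level of `folder`, top level first, without a trailing delimiter."""
    parts = folder.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]

def create_nested_mailbox(imap, folder, delimiter="/"):
    """Create each level of `folder` explicitly; return (path, status) pairs."""
    results = []
    for path in nested_mailbox_paths(folder, delimiter):
        typ, data = imap.create(path)  # a 'NO' status can mean the level already exists
        results.append((path, typ))
    return results
```

For "a/b/c" this issues create("a"), create("a/b"), create("a/b/c"), matching the sequence described above.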
Sam: thanks for figuring this out and posting the solution. "Hacky" beats "not working." :^)

Extracting data from MS Word

I am looking for a way to extract / scrape data from Word files into a database. Our corporate procedures have Minutes of Meetings with clients documented in MS Word files, mostly due to history and inertia.
I want to be able to pull the action items from these meeting minutes into a database so that we can access them from a web-interface, turn them into tasks and update them as they are completed.
Which is the best way to do this:
VBA macro from inside Word to create CSV and then upload to the DB?
VBA macro in Word with connection to DB (how does one connect to MySQL from VBA?)
Python script via win32com then upload to DB?
The last one is attractive to me as the web-interface is being built with Django, but I've never used win32com or tried scripting Word from python.
EDIT: I've started extracting the text with VBA because it makes it a little easier to deal with the Word Object Model. I am having a problem though - all the text is in Tables, and when I pull the strings out of the CELLS I want, I get a strange little box character at the end of each string. My code looks like:
sFile = "D:\temp\output.txt"
fnum = FreeFile
Open sFile For Output As #fnum
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
    Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
    Assign = Application.ActiveDocument.Tables(2).Cell(n, 3).Range.Text
    Target = Application.ActiveDocument.Tables(2).Cell(n, 4).Range.Text
    If Target = "" Then
        ExportText = ""
    Else
        ExportText = Descr & Chr(44) & Assign & Chr(44) & _
            Target & Chr(13) & Chr(10)
        Print #fnum, ExportText
    End If
Next n
Close #fnum
What's up with the little control character box? Is some kind of character code coming across from Word?

Word puts a little end-of-cell marker at the end of the text of every cell in a table.
It works just like an end-of-paragraph marker does in paragraphs: it stores the formatting for the entire cell.
Range.Text returns it as two characters, Chr(13) & Chr(7), so use the Left() function to strip both, i.e.
Left(Target, Len(Target) - 2)
By the way, instead of
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
    Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
try this:
For Each row in Application.ActiveDocument.Tables(2).Rows
    Descr = row.Cells(2).Range.Text

Well, I've never scripted Word, but it's pretty easy to do simple stuff with win32com. Something like:
from win32com.client import Dispatch

word = Dispatch('Word.Application')
doc = word.Documents.Open('d:\\stuff\\myfile.doc')
# FileFormat=2 is wdFormatText (plain text)
doc.SaveAs(FileName='d:\\stuff\\text\\myfile.txt', FileFormat=2)
doc.Close()
word.Quit()
This is untested, but I think something like that will open the file and save it as plain text; you could then read the text into Python and manipulate it from there. There is probably a way to grab the contents of the file directly, too, but I don't know it offhand; documentation can be hard to find, but if you've got VBA docs or experience, you should be able to carry them across.
Have a look at this post from a while ago: http://mail.python.org/pipermail/python-list/2002-October/168785.html Scroll down to COMTools.py; there's some good examples there.
You can also run makepy.py (part of the pythonwin distribution) to generate python "signatures" for the COM functions available, and then look through it as a kind of documentation.
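For the Python/win32com route from the question, a sketch of the same table export. It assumes Windows with Word and pywin32 installed, that the action items sit in table 2, columns 2 to 4, as in the VBA above, and that the cells are not merged; export_action_items and clean_cell_text are hypothetical names. Each cell's Range.Text ends with Word's end-of-cell marker, Chr(13) & Chr(7) (the "little box"), so that is stripped first. csv.writer is used instead of joining with Chr(44), so values that contain commas get quoted.

```python
import csv

END_OF_CELL = "\r\x07"  # Word's end-of-cell marker: Chr(13) & Chr(7)

def clean_cell_text(text):
    """Remove the end-of-cell marker, then surrounding whitespace."""
    if text.endswith(END_OF_CELL):
        text = text[:-len(END_OF_CELL)]
    return text.strip()

def export_action_items(doc_path, csv_path, table_index=2):
    """Write columns 2-4 of one table to CSV (Windows, Word and pywin32 required)."""
    from win32com.client import Dispatch  # pywin32, Windows only

    word = Dispatch("Word.Application")
    doc = word.Documents.Open(doc_path)
    try:
        table = doc.Tables(table_index)
        with open(csv_path, "w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            for n in range(1, table.Rows.Count + 1):
                descr, assign, target = (
                    clean_cell_text(table.Cell(n, col).Range.Text) for col in (2, 3, 4)
                )
                if target:  # same rule as the VBA: skip rows without a target
                    writer.writerow([descr, assign, target])
    finally:
        doc.Close(False)  # close without saving changes
        word.Quit()
```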

You could use OpenOffice. It can open Word files, and can also run Python macros.

I'd say look at the related questions on the right -->
The top one seems to have some good ideas for going the python route.

How about saving the file as XML, then using Python (or something else) to pull the data out of Word and into the database?
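A sketch of the XML route using only the standard library. It assumes the minutes are .docx files (Office Open XML, which is a zip archive containing word/document.xml); older .doc files would first need saving in that format. Cell text is collected from the w:t elements, paragraphs inside a cell are joined with newlines, and nested tables are assumed not to occur, since iter() would descend into them too. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def tables_from_document_xml(xml_text):
    """Return each table as a list of rows; each row is a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.iter(W + "tr"):
            row = []
            for tc in tr.iter(W + "tc"):
                paragraphs = ["".join(t.text or "" for t in p.iter(W + "t"))
                              for p in tc.iter(W + "p")]
                row.append("\n".join(paragraphs))
            rows.append(row)
        tables.append(rows)
    return tables

def docx_tables(path):
    """Read word/document.xml out of a .docx (a zip archive) and extract its tables."""
    with zipfile.ZipFile(path) as archive:
        return tables_from_document_xml(archive.read("word/document.xml"))
```

Each row can then be inserted into the database with whatever driver the Django project already uses.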

It is possible to programmatically save a Word document as HTML and to import the table(s) contained into Access. This requires very little effort.
