How to read email file (saved email to local drive, with “.msg” extension)?
I tried these two lines, but they don't work:
msg = open('Departure HOUSTON EXPRESS Port NORFOLK.msg', 'r')
print msg.read()
I searched the web for an answer and found the code below:
import email
def read_MSG(file):
    email_File = open(file)
    messagedic = email.Message(email_File)
    content_type = messagedic["plain/text"]
    FROM = messagedic["From"]
    TO = messagedic.getaddr("To")
    sujet = messagedic["Subject"]
    email_File.close()
    return content_type, FROM, TO, sujet
myMSG= read_MSG(r"c:\\myemail.msg")
print myMSG
However it gives an error:
Traceback (most recent call last):
File "C:\Python27\G.py", line 19, in <module>
myMSG= read_MSG(r"c:\\myemail.msg")
File "C:\Python27\G.py", line 10, in read_MSG
messagedic = email.Message(email_File)
TypeError: 'LazyImporter' object is not callable
Some answers online suggest converting the .msg to .eml before parsing, but I am not sure how to do that.
What would be the best way to read a .msg file?
The code you have now won't work for what you're trying to accomplish. Outlook ".msg" files can be parsed in Python, but not with the email module. If you can use ".eml" files instead, as you mentioned, it will be easier because the email module can read those.
To read .eml files, see email.message_from_file().
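A minimal sketch of that approach, assuming Python 3 and a message that has already been saved as .eml (plain RFC 822 text, not Outlook's binary .msg format). The helper name read_eml is made up for illustration.

```python
import email
from email import policy


def read_eml(path):
    """Parse an .eml file and return (sender, recipients, subject, body)."""
    # Hypothetical helper: handles RFC 822 text only, not Outlook .msg files.
    with open(path, "r", encoding="utf-8") as handle:
        message = email.message_from_file(handle, policy=policy.default)
    # Prefer the plain-text part; fall back to an empty body if there is none.
    body_part = message.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return message["From"], message["To"], message["Subject"], body
```

Passing policy.default makes the parser return an EmailMessage, which is what provides get_body() and get_content().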
In case someone else comes across this like me, almost a decade after the original question:
After trying several solutions offered here and elsewhere on the internet, I found that the easiest for me was to use extract-msg, which you can install with pip. The readme documentation is limited, but the docstrings in the library itself are quite comprehensive.
In my case, I needed to read a .msg file on disk and save its attachments to disk. Here is some sample code to show how easy this is with extract-msg:
import extract_msg
msg = extract_msg.openMsg('c:/some_folder/some_mail.msg')
sender = msg.sender
subject = msg.subject
body = msg.body
time_received = msg.receivedTime # datetime
attachment_filenames = []
for att in msg.attachments:
    att.save(customPath='c:/saved_attachments/')
    attachment_filenames.append(att.name)
I am currently logged in to my BBG Anywhere (web login) on my Mac. My first question: would I still be able to extract data using tia, given that I am not actually on my terminal?
import pdblp
con = pdblp.BCon(debug=True, port=8194, timeout=5000)
con.start()
I got this error
pdblp.pdblp:WARNING:Message Received:
SessionStartupFailure = {
reason = {
source = "Session"
category = "IO_ERROR"
errorCode = 9
description = "Connection failed"
}
}
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/prasadkamath/anaconda2/envs/Pk36/lib/python3.6/site-packages/pdblp/pdblp.py", line 147, in start
raise ConnectionError('Could not start blpapi.Session')
ConnectionError: Could not start blpapi.Session
I am assuming that I need to be on the terminal to be able to extract data, but wanted to confirm that.
This is a duplicate of this issue here on SO. It is not an issue with pdblp per se, but with blpapi not finding a connection. You mention that you are logged in via the web, which only allows you to use the terminal (or Excel add-in) within the browser, but not outside of it, since this way of accessing Bloomberg lacks a data feed and an API. More details and alternatives can be found here.
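One quick way to see whether anything is reachable on the port used in the question (8194 on the local machine) is a plain TCP check before calling con.start(). This is only a sketch: the helper name port_is_open is hypothetical, and a successful connection shows that something is listening, not that it is a Bloomberg service.

```python
import socket


def port_is_open(host="localhost", port=8194, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Port 8194 reachable:", port_is_open())
```

If this reports False, the "Connection failed" error above is expected regardless of how pdblp is configured.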
So I am trying to create a bot that cross-posts from one sub (r/pics) to another (r/polpics) using a bit of code from u/GoldenSights. I upgraded to a new Python distro and now get a ton of errors; I don't even know where to begin. Here is the code (formatting off, error lines bold):
Traceback (most recent call last):
File "C:\Users\tonyc\AppData\Local\Programs\Python\Python36-32\Lib\site-packages\praw\subdump.py", line 84, in <module>
r = praw.Reddit(USERAGENT)
File "C:\Users\tonyc\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\reddit.py", line 150, in __init__
raise ClientException(required_message.format(attribute))
praw.exceptions.ClientException: Required configuration setting 'client_id' missing.
This setting can be provided in a praw.ini file, as a keyword argument to the `Reddit` class constructor, or as an environment variable.
This seems to be related to the USERAGENT setting. I don't think I have that configured right.
USERAGENT = ""
# This is a short description of what the bot does. For example "/u/GoldenSights' Newsletter bot"
SUBREDDIT = "pics"
# This is the sub or list of subs to scan for new posts.
# For a single sub, use "sub1".
# For multiple subs, use "sub1+sub2+sub3+...".
# For all use "all"
KEYWORDS = ["It looks like this post is about US Politics."]
# Any comment containing these words will be saved.
KEYDOMAINS = []
# If non-empty, linkposts must have these strings in their URL
This is the error line:
print('Logging in')
r = praw.Reddit(USERAGENT) <--here, this is error line 84
r.set_oauth_app_info(APP_ID, APP_SECRET, APP_URI)
r.refresh_access_information(APP_REFRESH)
Also in Reddit.py :
raise ClientException(required_message.format(attribute)) <--- error
praw.exceptions.ClientException: Required configuration setting 'client_id' missing.
This setting can be provided in a praw.ini file, as a keyword argument to the `Reddit` class constructor, or as an environment variable.
Firstly, you're going to want to have your API credentials stored externally in your praw.ini file. This makes things a lot more secure, and looks like it might go some way to fixing your issue. Here's what a completed praw.ini file looks like, including the useragent, so try to replicate this.
[DEFAULT]
# A boolean to indicate whether or not to check for package updates.
check_for_updates=True
# Object to kind mappings
comment_kind=t1
message_kind=t4
redditor_kind=t2
submission_kind=t3
subreddit_kind=t5
# The URL prefix for OAuth-related requests.
oauth_url=https://oauth.reddit.com
# The URL prefix for regular requests.
reddit_url=https://www.reddit.com
# The URL prefix for short URLs.
short_url=https://redd.it
[appname]
client_id=IE*******T14_w
client_secret=SW***********************CLY
password=******************
username=appname
user_agent=web:appname:1.0.0 (by /u/username)
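Since praw.ini is an ordinary INI file, the standard configparser module can illustrate how a site section such as [appname] relates to [DEFAULT]. This is a sketch of the file layout only, not of PRAW's internals, and the credential values are placeholders.

```python
import configparser

# Shortened copy of the praw.ini above; the credential values are placeholders.
PRAW_INI = """
[DEFAULT]
check_for_updates=True
oauth_url=https://oauth.reddit.com

[appname]
client_id=PLACEHOLDER_ID
client_secret=PLACEHOLDER_SECRET
user_agent=web:appname:1.0.0 (by /u/username)
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(PRAW_INI)
site = parser["appname"]

# Keys set in [appname] are read directly; missing ones fall back to [DEFAULT].
print(site["client_id"])
print(site["oauth_url"])

# With the real file in place, PRAW is pointed at the section by its name:
#     reddit = praw.Reddit("appname")
```

The point is that client_id lives in the named section, so the section name, not the user agent string, is what gets passed to praw.Reddit.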
Let me know how things go after you sort this out.
I have a Windows 10 environment with Python 2.7 and the win32com package (build 219) installed.
I was able to run the code below, which runs a macro in Excel and generates a pie chart that is attached to (and embedded in the body of) an email, which is then sent.
This program used to work fine, but after a Windows update it raises AttributeError: olEmbeddeditem, even though I have imported win32com.client and its constants.
I want the image embedded in the email body, so I don't think replacing olEmbeddeditem with olByValue, etc. will help; I tried it anyway and it didn't work either.
I have also reinstalled the win32com package, but the problem persists.
The earlier working code did not include "from win32com.client import constants". I added this line when the code stopped working, but that didn't help either.
Any help would be appreciated.
import sys
import os
import win32com.client
import codecs
from win32com.client import constants
sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
all_inbox = inbox.Items
folders = inbox.Folders
olMailItem = 0x0
obj = win32com.client.Dispatch("Outlook.Application")
xlApp = win32com.client.Dispatch("Excel.Application")
ExcelWorkBook = xlApp.Workbooks.Open('C:\Users\xxx\Desktop\data.xlsm')
xlSheet1 = ExcelWorkBook.Sheets("Sheet1")
xlApp.Application.Run("data.xlsm!Macro1")
chart1 = xlSheet1.ChartObjects(1)
chart1.Chart.Export("C:\Users\xxx\Desktop\photo.gif", "GIF", False)
xlApp.Workbooks(1).Close(SaveChanges=0)
xlApp.Application.Quit()
newMail = obj.CreateItem(olMailItem)
newMail.Subject = "Presentation of Automation"
attachment = newMail.Attachments.Add("C:\Users\xxx\Desktop\photo.gif", win32com.client.constants.olEmbeddeditem, 0, "photo")
imageCid = "photo.gif"
attachment.PropertyAccessor.SetProperty("http://schemas.microsoft.com/mapi/proptag/0x3712001E", imageCid)
newMail.HTMLBody = "<body>Dear Sir,Madam,<br>Please find the requested details.<br><br><p><img src=\"cid:{0}\"></body>".format(imageCid)
newMail.To = x
attachment1 = "C:\Users\xxx\Desktop\photo.gif"
newMail.Attachments.Add(attachment1)
newMail.Send()
os.remove("C:\Users\xxx\Desktop\photo.gif")
msg.UnRead = False
The root cause turned out not to be a Windows update, as suspected, but a group email in the Inbox. After deleting that group email, or moving it to a folder other than Inbox, the issue was resolved. I am still not sure why it caused the error, or how to make sure such emails don't end in a traceback in future.
The main reason for this attribute error is that your COM server has switched from late binding (dynamic) to early binding (static).
Delete the gen_py folder in Temp, which will revert Dispatch from static to dynamic, and your code should work fine.
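The deletion can be scripted. The sketch below assumes the cache sits directly under the user's Temp directory, as described above; other pywin32 setups may keep it elsewhere. The helper name clear_gen_py_cache is made up for illustration.

```python
import os
import shutil
import tempfile


def clear_gen_py_cache(temp_dir=None):
    """Delete <temp_dir>/gen_py if present; return True if it was removed."""
    # Assumption: the cache folder is named gen_py and lives in Temp.
    base = temp_dir if temp_dir is not None else tempfile.gettempdir()
    cache = os.path.join(base, "gen_py")
    if os.path.isdir(cache):
        shutil.rmtree(cache)
        return True
    return False
```

Call clear_gen_py_cache() with no arguments to target the current user's Temp directory, and close any running Python sessions that use win32com first.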
Instead of using
attachment = newMail.Attachments.Add("C:\Users\xxx\Desktop\photo.gif", win32com.client.constants.olEmbeddeditem, 0, "photo")
you can pass the constant's numeric value (olEmbeddeditem is 5) directly:
attachment = newMail.Attachments.Add("C:\Users\xxx\Desktop\photo.gif", 0x5, 0, "photo")
I'm processing a large (120 MB) text file from my Thunderbird IMAP directory and attempting to extract to/from info from the headers using mbox and regex. The process runs for a while until I eventually get an exception: "TypeError: expected string or buffer".
The exception references the fifth line of this code:
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\@[0-9A-Za-z._-]+")
temp_list = []
mymbox = mbox("data.txt")
for email in mymbox.values():
    from_address = PAT_EMAIL.findall(email["from"])
    to_address = PAT_EMAIL.findall(email["to"])
    for item in from_address:
        temp_list.append(item) #items are added to a temporary list where they are sorted then written to file
I've run the code on other (smaller) files, so I'm guessing the issue is my file. The file appears to be just a bunch of text. Can someone point me in the right direction for debugging this?
There can only be one from address (I think!):
In the following:
from_address = PAT_EMAIL.findall(email["from"])
I have a feeling you're trying to duplicate the work of email.message_from_file and email.utils.parseaddr
>>> from email.utils import parseaddr
>>> s = "Jon Clements <jon@example.com>"
>>> parseaddr(s)
('Jon Clements', 'jon@example.com')
So you can use parseaddr(email['from'])[1] to get the email address and use that.
Similarly, you may wish to look at email.utils.getaddresses to handle to and cc addresses...
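A sketch of how those helpers might be combined, assuming Python 3 and message objects like the ones mbox yields. The function name sender_and_recipients is made up for illustration. It also guards against messages that lack a From or To header: in that case the header lookup returns None, which is a likely source of the "expected string or buffer" error.

```python
from email.message import Message
from email.utils import getaddresses, parseaddr


def sender_and_recipients(msg):
    """Return (from_address, [recipient addresses]) for one message."""
    # msg["from"] would be None when the header is missing, so use get()
    # with an empty-string default before parsing.
    from_address = parseaddr(msg.get("from", ""))[1]
    # get_all returns every To/Cc header; the default [] covers missing ones.
    headers = msg.get_all("to", []) + msg.get_all("cc", [])
    recipients = [addr for _name, addr in getaddresses(headers) if addr]
    return from_address, recipients
```

Messages without usable headers then produce an empty string and an empty list instead of raising.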
Well, I didn't solve the issue but have worked around it for my own purposes. I inserted a try statement so that the iteration just continues past any TypeError. For every thousand email addresses I'm getting about 8 failures, which will suffice. Thanks for your input!
PAT_EMAIL = re.compile(r"[0-9A-Za-z._-]+\@[0-9A-Za-z._-]+")
temp_list = []
mymbox = mbox("data.txt")
for email in mymbox.values():
    try:
        from_address = PAT_EMAIL.findall(email["from"])
    except TypeError:
        print "TypeError!"
        continue  # skip this message so a stale from_address is not reused
    try:
        to_address = PAT_EMAIL.findall(email["to"])
    except TypeError:
        print "TypeError!"
        to_address = []  # no usable "to" header on this message
    for item in from_address:
        temp_list.append(item) #items are added to a temporary list where they are sorted then written to file
I am using Microsoft's CDO (Collaboration Data Objects) to programmatically read mail from an Outlook mailbox and save embedded image attachments. I'm trying to do this from Python using the Win32 extensions, but samples in any language that uses CDO would be helpful.
So far, I am here...
The following Python code will read the last email in my mailbox, print the names of the attachments, and print the message body:
from win32com.client import Dispatch
session = Dispatch('MAPI.session')
session.Logon('','',0,1,0,0,'exchange.foo.com\nbar')
inbox = session.Inbox
message = inbox.Messages.Item(inbox.Messages.Count)
for attachment in message.Attachments:
    print attachment
print message.Text
session.Logoff()
However, the attachment names are things like: "zesjvqeqcb_chart_0". Inside the email source, I see image source links like this:
<IMG src="cid:zesjvqeqcb_chart_0">
So, is it possible to use this CID URL (or anything else) to extract the actual image and save it locally?
Differences between versions of the OS, Outlook, and CDO may be the source of confusion, so here are the steps to get it working on WinXP/Outlook 2007/CDO 1.21:
Install CDO 1.21.
Install win32com.client.
Go to the C:\Python25\Lib\site-packages\win32com\client\ directory and run the following:
python makepy.py
From the list, select "Microsoft CDO 1.21 Library (1.21)" and click OK.
C:\Python25\Lib\site-packages\win32com\client>python makepy.py
Generating to C:\Python25\lib\site-packages\win32com\gen_py\3FA7DEA7-6438-101B-ACC1-00AA00423326x0x1x33.py
Building definitions from type library...
Generating...
Importing module
Examining the file 3FA7DEA7-6438-101B-ACC1-00AA00423326x0x1x33.py that has just been generated will give you an idea of which classes, methods, properties and constants are available.
Now that we are done with the boring steps, here is the fun part:
import win32com.client
from win32com.client import Dispatch
session = Dispatch('MAPI.session')
session.Logon ('Outlook') # this is profile name
inbox = session.Inbox
messages = session.Inbox.Messages
message = inbox.Messages.GetFirst()
if message:
    attachments = message.Attachments
    for i in range(attachments.Count):
        attachment = attachments.Item(i + 1) # yep, indexes are 1 based
        filename = "c:\\tmpfile" + str(i)
        attachment.WriteToFile(FileName=filename)
session.Logoff()
The same general approach will also work if you have an older version of CDO (CDO for Win2k).
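In the question, the attachment names match the cid: values in the message's HTML source. Assuming that holds, the IDs can be collected from the HTML with a regular expression and compared against each attachment's name, so that only the inline images are saved. The sketch below covers only the parsing step; find_content_ids is a hypothetical helper, and the sample HTML reuses the tag from the question.

```python
import re

# Matches src="cid:..." or src='cid:...' inside an HTML body.
CID_PATTERN = re.compile(r"""src\s*=\s*["']cid:([^"']+)["']""", re.IGNORECASE)


def find_content_ids(html):
    """Return the content IDs referenced by inline images, in order."""
    return CID_PATTERN.findall(html)


html = '<p>Report</p><IMG src="cid:zesjvqeqcb_chart_0">'
print(find_content_ids(html))
```

A regular expression is enough for well-formed tags like the one shown; messier HTML may call for a real HTML parser.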