Reading attributes of .msg file - python

I am trying to read a .msg file to get the sender, recipients, and title.
I'm writing this script for my workplace, where I'm only allowed to use Python's standard library, so I want to use the email module to do this.
On the Python website I found some examples of using the email module. https://docs.python.org/3/library/email.examples.html
Near the end of the page it shows how to get the sender, subject and recipient. I've tried using that code like this:
# Import the email modules we'll need
from email import policy
from email.parser import BytesParser
with open('test_email.msg', 'rb') as fp:
    msg = BytesParser(policy=policy.default).parse(fp)
# Now the header items can be accessed as a dictionary, and any non-ASCII will
# be converted to unicode:
print('To:', msg['to'])
print('From:', msg['from'])
print('Subject:', msg['subject'])
This results in an output:
To: None
From: None
Subject: None
I checked the file test_email.msg; it is a valid email.
When I add the line
print(msg)
I get garbled output, the same as if I opened the .msg file in Notepad.
Can anybody suggest why the email module isn't finding the sender/recipient/subject correctly?

You are apparently attempting to read some sort of proprietary binary format. The Python email library does not support this; it only handles traditional (basically text) RFC822 / RFC5322 format.
To read Microsoft's OLE formats, you will need a third-party module, and some patience, voodoo, and luck.
Also, for the record, there is no unambiguous definition of .msg. Outlook uses this file extension for its files, but other programs use it for files in other formats as well, including traditional RFC822 files.
(Microsoft documents the Outlook .msg format in its MS-OXMSG specification on MSDN; but Microsoft have in the past regarded URLs as some sort of depletable resource which runs out when you use it, so any link to it will probably stop working if enough people click on it.)
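Before reaching for a third-party module, it can help to check which kind of .msg you actually have. Outlook .msg files are stored in the OLE2 Compound File Binary container, which always begins with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, whereas a traditional RFC 5322 message is plain text. A minimal sketch, assuming that signature check is enough to tell the two apart (the function name read_headers is hypothetical):

```python
from email import policy
from email.parser import BytesParser

# First 8 bytes of every OLE2 / Compound File Binary file, the container
# format Outlook uses for its .msg files.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_headers(path):
    """Return (to, from, subject) for an RFC 5322 message file.

    Raises ValueError if the file looks like an Outlook/OLE2 binary,
    which the standard library email package cannot parse.
    """
    with open(path, "rb") as fp:
        if fp.read(8) == OLE2_SIGNATURE:
            raise ValueError(f"{path} is an OLE2 (Outlook) file, not RFC 5322")
        fp.seek(0)  # rewind so the parser sees the whole message
        msg = BytesParser(policy=policy.default).parse(fp)
    return msg["to"], msg["from"], msg["subject"]
```

For a file saved from Outlook this raises ValueError instead of silently returning None for every header, which is the symptom described in the question.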

Related

remove remaining hex code in Pyc file

I have a project to submit in which I have to send an email using Python.
My code is complete, so I was about to send it.
Because smtplib needs my email login, I compiled my code so people could not see my email and password. However, even after compiling, when we look at the hex dump we can still see my email and password (and some of the printed strings).
Is there a way to compile it so that no such information is left?
Thank you very much for your help and time!
Generally it is a bad idea to hold sensitive information in the code: compiling to .pyc does not hide it, because string constants are stored in the bytecode as-is. There is no single best way to handle credentials, but common practices for storing them include:
in a separate code file not in your code base (local_settings.py, added to .gitignore)
in a separate config file outside of the project (e.g. json or yml)
environment variables (read using os.environ)
command line parameters
request as user input
a combination of all above
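A minimal sketch combining a few of these options. The environment variable names SMTP_USER/SMTP_PASSWORD and the JSON layout with "user" and "password" keys are assumptions for illustration; the function falls back to prompting, using getpass so the password is not echoed:

```python
import getpass
import json
import os


def get_smtp_credentials(config_path=None):
    """Look up SMTP credentials without hard-coding them in the source.

    Order tried (an assumption; adjust to taste):
      1. environment variables SMTP_USER / SMTP_PASSWORD (hypothetical names)
      2. a JSON config file kept outside the project, e.g.
         {"user": "...", "password": "..."}
      3. an interactive prompt
    """
    user = os.environ.get("SMTP_USER")
    password = os.environ.get("SMTP_PASSWORD")
    if user and password:
        return user, password

    if config_path and os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as fp:
            cfg = json.load(fp)
        return cfg["user"], cfg["password"]

    user = input("SMTP user: ")
    password = getpass.getpass("SMTP password: ")  # not echoed to the terminal
    return user, password
```

With smtplib this could be used as server.login(*get_smtp_credentials(os.path.expanduser("~/.smtp.json"))), where the path is again only an example. Nothing secret ends up in the .py or .pyc files.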

How to generate a static .html with python

I'm looking for a Python solution to create a static .html file that can be sent out via email, either attached or embedded in the email (ignore the latter option if it requires a lot more work). I have no requirements regarding the layout of the .html. The focus here is on identifying the least painful way to generate an offline .html file.
A potential solution could be along the lines of the following pseudo-code.
from some_unknown_pkg import StaticHTML
# Initialise instance
newsletter = StaticHTML()
# Append charts, tables and text to blank newsletter.
newsletter.append(text_here)
newsletter.append(interactive_chart_generated_with_plotly)
newsletter.append(more_text_here)
newsletter.append(a_png_file_loaded_from_local_pc)
# Save newsletter to .html, ready to be sent out.
newsletter.save_to_html('newsletter.html')
Here 'newsletter.html' can be opened in any browser. Just to provide a bit more context, this .html is supposed to be sent to a few selected people inside my company and contains sensitive data. I'm using plotly to generate the interactive charts to be inserted in the .html.
Possible solution here
It seems the package in that answer is exactly what you want. Docs: http://www.yattag.org/
Another pretty nice package here.
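If installing packages is not an option, the StaticHTML interface from the question can be approximated with the standard library alone. This is a hypothetical class mirroring the question's pseudo-code, not an existing package; append_text, append_html and append_png are names chosen for illustration. A Plotly figure can be turned into an HTML fragment with fig.to_html(full_html=False); with include_plotlyjs=True (the default) the plotly.js library is embedded, so the chart works offline at the cost of a large file.

```python
import base64
import html


class StaticHTML:
    """Minimal stdlib-only take on the StaticHTML idea from the question."""

    def __init__(self, title="Newsletter"):
        self.title = title
        self.parts = []

    def append_text(self, text):
        # Escape <, > and & so plain text cannot break the markup.
        self.parts.append(f"<p>{html.escape(text)}</p>")

    def append_html(self, fragment):
        # Raw HTML, e.g. a Plotly figure rendered with fig.to_html(full_html=False).
        self.parts.append(fragment)

    def append_png(self, path):
        # Inline the image as a base64 data URI so the file stays self-contained.
        with open(path, "rb") as fp:
            data = base64.b64encode(fp.read()).decode("ascii")
        self.parts.append(f'<img src="data:image/png;base64,{data}">')

    def save_to_html(self, path):
        body = "\n".join(self.parts)
        page = (
            '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
            f"<title>{html.escape(self.title)}</title></head>\n"
            f"<body>\n{body}\n</body>\n</html>\n"
        )
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(page)
```

Keep in mind that most email clients do not run JavaScript, so the Plotly charts will only be interactive once the recipient opens the .html file in a browser.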
Start your Python module by importing the sys module and redirecting stdout to newsletter.html:
import sys
sys.stdout = open('newsletter.html', 'w')
This redirects any printed output to the HTML file. Now just use print to write HTML tags to the file. For example:
print("<html>")
print("<p> This is my NewsLetter </p>")
print("</html>")
sys.stdout.close()  # flush and close newsletter.html
sys.stdout = sys.__stdout__  # restore normal console output
This snippet creates a basic HTML file that you can open in any browser. To send it by email, you can use Python's email and smtplib modules.
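A sketch of the email side using only the standard library's email.message.EmailMessage, covering both options from the question: the HTML is included as an alternative body part (shown inline by most clients) and is also attached as newsletter.html. The function name and the SMTP host in the comment are placeholders:

```python
from email.message import EmailMessage


def build_newsletter_email(html_text, sender, recipient, subject="Newsletter"):
    """Build a message that shows the HTML inline and also attaches it."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content("Please open the attached newsletter.html in a browser.")
    # HTML alternative: most clients display this part inline.
    msg.add_alternative(html_text, subtype="html")
    # The same HTML as a file attachment; this makes the message multipart/mixed.
    msg.add_attachment(html_text, subtype="html", filename="newsletter.html")
    return msg


# Sending (not run here; host and port are placeholders):
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as server:
#     server.starttls()
#     server.login(user, password)
#     server.send_message(msg)
```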
The Dominate package looks like it provides a simple and intuitive way to create HTML pages. https://github.com/Knio/dominate

Email message extract has encoding characters

I am trying to do some text mining on emails which I have exported from my email client (Mail in OS X) just by copying and pasting them into an RTF file.
When I run tf-idf on the files, either in Python or RapidMiner, I get features that are clearly not in the message content itself. I wonder where they come from and how I can get rid of them. Perhaps from the headers? For example, features such as fonttbl, colortbl, cocoa rtf, paperw, etc. Clearly they are some properties of the email. Where do they come from, and how can I remove them from the files or extract only the email contents from the original messages?
Perhaps this is an encoding issue?
Thanks!
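The tokens listed (fonttbl, colortbl, cocoartf, paperw) are RTF control words that the macOS Cocoa text system writes into the .rtf file itself; they are not email headers and not an encoding problem. The robust fix is to save or convert the text as plain text before mining it (on macOS, textutil -convert txt file.rtf should do this). If you must work from the existing files, a rough regex-based sketch can strip the most common markup. It assumes simple Cocoa-style RTF and does not handle escaped braces, \'xx hex escapes or nested groups:

```python
import re

# Header groups that hold formatting tables rather than text, e.g.
# {\fonttbl\f0\fswiss Helvetica;} or {\*\expandedcolortbl;;}.
# Assumes these groups contain no nested braces.
_DESTINATIONS = (
    r"\{\\(?:\*\\)?(?:fonttbl|colortbl|expandedcolortbl|stylesheet|info)[^{}]*\}"
)


def strip_rtf(rtf):
    """Very rough RTF-to-text conversion for simple Cocoa RTF files."""
    text = re.sub(_DESTINATIONS, "", rtf)
    # \par and a backslash at end of line both mean a line break in RTF.
    text = re.sub(r"\\par\b ?|\\\n", "\n", text)
    # Drop remaining control words such as \paperw11900, \f0, \cf0.
    text = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", text)
    # Drop group braces.
    text = re.sub(r"[{}]", "", text)
    # Collapse whitespace-only lines left over from the markup.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())
```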

Python: How to email an attached Unix file in DOS format

On a Unix server, I am using smtplib in Python to send an email to myself; the email also contains a Unix file attachment. I use the Outlook client to view the email, and when I open the file, it does not display correctly because of the difference between Unix and DOS line endings.
Is there any way, using smtplib, to send the Unix file in DOS format?
I do not want to use unix2dos, as I do not want to create or modify files on the filesystem.
Edit: I've updated the question based on suggestions from senior members.
Since I have been asked to modify the file, I need to know whether there is a simpler way to do that. I am not well versed in Python, so please bear with me. I have tried a few variations of the following, but none have worked. My requirement is that I do not want to write to the filesystem; I want to keep the changes in a variable in memory.
import string
fo=open(filename,"r")
filecontent=fo.readlines()
for line in filecontent:
    line = string.replace(line,"\n","\r\m")
This is just a variation on the first comment on your question:
with open(filename, 'r') as f:
    content = f.read().replace('\n', '\r\n')
After that, the variable content holds the ... content of your file, with newlines replaced. In addition, the with statement ensures your file is properly closed after reading.
Please note that it is your responsibility to ensure the file is "small enough" to hold in memory. If you are not sure, you could read it line by line as you proposed yourself. That said, I'm not quite sure I understand what was wrong with that at first... (One thing to check: "\r\m" in your snippet should be "\r\n", and assigning to line inside the loop does not change filecontent, so the replaced lines are discarded.)
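A sketch of doing the conversion entirely in memory and attaching the result with the standard library's EmailMessage, assuming the file fits in memory. Reading in binary mode avoids any newline translation, and the two-step replace avoids turning an existing \r\n into \r\r\n. The helper names to_dos_newlines and attach_as_dos are hypothetical:

```python
import os
from email.message import EmailMessage


def to_dos_newlines(data):
    """Convert LF line endings to CRLF without doubling existing CRLFs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def attach_as_dos(msg, path, filename=None):
    """Attach the file at path to msg with DOS line endings, in memory only."""
    with open(path, "rb") as fp:  # binary mode: no newline translation on read
        data = to_dos_newlines(fp.read())
    # Passing bytes makes the email package base64-encode the part, so the
    # CRLFs reach the recipient exactly as written here.
    msg.add_attachment(data, maintype="text", subtype="plain",
                       filename=filename or os.path.basename(path))
```

You would then send msg as usual, for example with smtplib.SMTP(...).send_message(msg); nothing is written back to the filesystem.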

What is a good audio library for validating files in Python?

I'm already checking for content-type, size, and extension (Django (audio) File Validation), but I need a library to read the file and confirm that it is in fact what I hope it is (mp3 and mp4 mostly).
I've been here: http://wiki.python.org/moin/Audio/ but no luck. Been at this one for a while, am a bit lost in the woods. Relying on SO big time for this whole end of things...
Thanks in advance.
EDIT:
I'm already (in Django) using UploadedFile.content_type:
"The content-type header uploaded with the file (e.g. text/plain or application/pdf). Like any data supplied by the user, you shouldn't trust that the uploaded file is actually this type. You'll still need to validate that the file contains the content that the content-type header claims -- "trust but verify.""
So, I'm already reading the header. But how can I validate the actual content of the file?
If just checking the header isn't good enough, I'd recommend using mutagen to load the file. It should throw an exception if it's not correct.
FYI, I do not think your approach is very scalable. Is it really necessary to read every byte of the file? What is your reason for not trusting the file header?
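You can run a Unix command from within Python like this:
>>> import subprocess
>>> filename = 'Giant Steps.mp3'
>>> result = subprocess.run(['file', filename], capture_output=True, text=True)
>>> print(result.stdout.strip())
Giant Steps.mp3: ISO Media, MPEG v4 system, iTunes AAC-LC
Passing the arguments as a list avoids the shell, so the space in the filename is not a problem (os.system would also only return the exit status, not the output). See the man page for the file command for more details if you want to go this route.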
See this post for other options
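The file command works by recognizing magic numbers at the start of a file. A rough pure-Python sketch of the same idea for the two formats in question, assuming MP4/M4A files start with an ISO "ftyp" box and MP3 files start with either an ID3v2 tag or an MPEG frame sync; the function name sniff_audio is hypothetical, and the bare frame-sync check can also match other MPEG audio streams such as raw AAC (ADTS):

```python
def sniff_audio(path):
    """Guess 'mp3' or 'mp4' from the first bytes of a file, else None.

    This only checks magic numbers, much like the `file` command does; it
    does not prove the rest of the file decodes cleanly.
    """
    with open(path, "rb") as fp:
        head = fp.read(12)
    # MP4/M4A: an ISO base media 'ftyp' box normally comes first,
    # so bytes 4..8 hold the ASCII text b"ftyp".
    if head[4:8] == b"ftyp":
        return "mp4"
    # MP3 with an ID3v2 tag in front of the audio frames.
    if head[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: the first 11 bits are the sync word (all ones).
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return None
```

Like the content-type header, this is still only a cheap first filter; decoding the file (for example with mutagen or ffmpeg, as the other answers suggest) is what actually proves it is valid audio.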
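Use sndhdr.
It does a little more than checking the content type: it reads the file and inspects its headers. Of course, this is still not foolproof, so using ffmpeg is then probably the only option. (Note that sndhdr only recognizes formats such as WAV, AIFF and AU, not MP3 or MP4, and it was removed from the standard library in Python 3.13.)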
