How to save an email attached in another using python smtplib?

I am using Python's imaplib to download and save email attachments. But when an email has another email as its attachment, x.get_payload(decode=True) returns None. I think mails of this type are sent by certain email clients. Since the filename was missing, I tried setting a filename in the 'Content-Disposition' header. The renamed file opens, but when I try to write to it using
fp.write(part.get_payload(decode=True))
it says a string or buffer was expected but NoneType was found.
>>>x.get_payload()
[<email.message.Message instance at 0x7f834eefa0e0>]
>>>type(part.get_payload())
<type 'list'>
>>>type(part.get_payload(decode=True))
<type 'NoneType'>
I removed decode=True and I got a list of objects
x.get_payload()[0]
<email.message.Message instance at 0x7f834eefa0e0>
I tried setting the filename when an email is found as an attachment.
if part.get('Content-Disposition'):
    attachment = str(part.get_filename())  # get filename
    if attachment == 'None':
        attachment = 'somename.mail'
        # append the number of occurrences to the filename, e.g. filename(1), in case the file exists
        attachment = self.autorename(attachment)
        x.add_header('Content-Disposition', 'attachment', filename=attachment)
        attachedmail = 1
    if attachedmail == 1:
        fp.write(str(x.get_payload()))
    else:
        fp.write(x.get_payload(decode=True))  # write contents to the opened file
The file then contains only the repr of the list, shown below:
[ < email.message.Message instance at 0x7fe5e09aa248 > ]
How can I write the contents of these attached emails to files?

I solved it myself. Since [ < email.message.Message instance at 0x7fe5e09aa248 > ] is a list of email.message.Message instances, each one has an .as_string() method. In my case, writing the output of .as_string() to a file let me extract the whole message, headers included, along with its embedded attachments. I then inspected the file line by line and saved the contents based on the encoding and file type.
>>>x.get_payload()
[<email.message.Message instance at 0x7f834eefa0e0>]
>>>fp=open('header','wb')
>>>fp.write(x.get_payload()[0].as_string())
>>>fp.close()
>>>file_as_list = []
>>>fp=open('header','rb')
>>>file_as_list = fp.readlines()
>>>fp.close()
Then I inspected each line in the file:
for line in file_as_list:
    if 'Content-Transfer-Encoding: quoted-printable' in line:
        print 'qp encoded data found!'
    if 'Content-Transfer-Encoding: base64' in line:
        print 'base64 encoded data found!'
The encoded data representing inline (embedded) attachments can be skipped, as the imaplib download already captures it.
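As an alternative to scanning the dumped text line by line, the attached email can be pulled out with the email package itself. The following is only a sketch for Python 3, assuming the attached mail arrives as a message/rfc822 part; the function name extract_attached_emails and the sample addresses are hypothetical and not from the original post.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_attached_emails(raw_bytes):
    """Return each message/rfc822 attachment as raw .eml bytes."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    attached = []
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message, which is why decode=True
            # returns None for it.
            inner_msg = part.get_payload()[0]
            attached.append(inner_msg.as_bytes())
    return attached


# Hypothetical test data: an outer email carrying another email as attachment.
inner = EmailMessage()
inner["Subject"] = "Inner message"
inner["From"] = "alice@example.com"
inner["To"] = "bob@example.com"
inner.set_content("Hello from the attached email.")

outer = EmailMessage()
outer["Subject"] = "Outer message"
outer["From"] = "bob@example.com"
outer["To"] = "carol@example.com"
outer.set_content("See the attached email.")
outer.add_attachment(inner)  # stored as a message/rfc822 part

saved = extract_attached_emails(outer.as_bytes())
```

Each item in saved can be written to a file opened in 'wb' mode to get a standalone .eml file.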

Related

How to get a table inside body of .msg file

I want to extract a table from the body of a .msg file with Python, for example into a dataframe. I can get the body content, but I can't separate the table from the rest of the body.
import win32com.client
import os
dir = r"C:\Users\Murilo\Desktop\Emails\030"
file_list = os.listdir(dir)
for file in file_list:
    if file.endswith(".msg"):
        outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
        msg = outlook.OpenSharedItem(os.path.join(dir, file))
        print(msg.Body)
I need only the table that exists in the body content, not the whole body.

If it is an HTML table, use MailItem.HTMLBody (instead of the plain text Body) and extract the table from HTML.
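A minimal sketch of the "extract the table from HTML" step, using only the standard library's html.parser. The html_body string is a hypothetical stand-in for msg.HTMLBody, and the TableExtractor class is illustrative; it does not handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of every <table> as lists of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables, each a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical HTML standing in for msg.HTMLBody
html_body = """
<p>Report</p>
<table>
  <tr><th>Item</th><th>Qty</th></tr>
  <tr><td>Bolt</td><td>12</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(html_body)
rows = parser.tables[0]
```

If pandas is available, the rows could then be turned into a dataframe with pandas.DataFrame(rows[1:], columns=rows[0]).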

I would look at the extract_msg library. It reads a .msg file directly, without Outlook, and exposes the message body, which should make it easier to extract a table from the content.
import extract_msg
msg = extract_msg.Message(fileLoc)  # fileLoc: path to the .msg file
msg_message = msg.body
content = ('Body: {}'.format(msg_message))

The Outlook object model provides three main ways for working with item bodies:
Body.
HTMLBody.
The Word editor. The WordEditor property of the Inspector class returns an instance of the Word Document which represents the message body. So, you can use the Word object model to do whatever you need with the message body. The Copy and Paste methods of the Document will do the trick.
See Chapter 17: Working with Item Bodies for more information.
But I think the easiest and cleanest way is to use the Word object model. You can read more about the Word object model, and how to use it to extract table content, in the post "How to read contents of an Table in MS-Word file Using Python?".

Parse excel attachment from .eml file in python

I'm trying to parse a .eml file. The .eml has an excel attachment that's currently base 64 encoded. I'm trying to figure out how to decode it into XML so that I can later turn it into a CSV I can do stuff with.
This is my code right now:
import email
data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)
for part in msg.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')  # the header name needs the hyphen
    if c_type == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        excelContents = part.get_payload(decode=True)
        print excelContents
The problem is that when I try to decode it, it spits back something looking like this.
I've used this post to help me write the code above.
How can I get an email message's text content using Python?
Update:
This is exactly following the post's solution with my file, but part.get_payload() returns everything still encoded. I haven't figured out how to access the decoded content this way.
import email
data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)
for part in msg.walk():
    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        name = part.get_param('name') or 'MyDoc.doc'
        f = open(name, 'wb')
        f.write(part.get_payload(None, True))
        f.close()
        print part.get("content-transfer-encoding")

As is clear from this table (and as you have already concluded), this file is an .xlsx. You can't just decode it with unicode or base64: you need a special package. Excel files specifically are a bit trickier (e.g. this one handles PowerPoint and Word, but not Excel). There are a few online, see here; xlrd might be the best.

Here is my solution:
I found 2 things out:
1.) I thought .open() was going inside the .eml and changing the selected decoded elements. I thought I needed to see decoded data before moving forward. What's really happening with .open() is it's creating a new file in the same directory of that .xlsx file. You must open the attachment before you will be able to deal with the data.
2.) You must open an xlrd workbook with the file path.
import email
import xlrd
data = file('EmailFileName.eml').read()
msg = email.message_from_string(data)  # entire message
if msg.is_multipart():
    for payload in msg.get_payload():
        bdy = payload.get_payload()
else:
    bdy = msg.get_payload()
attachment = msg.get_payload()[1]
# open and save excel file to disk
f = open('excelFile.xlsx', 'wb')
f.write(attachment.get_payload(decode=True))
f.close()
# pass the path of the file saved above, e.g. '/Users/mymac/thisProjectsFolder/excelFileName.xlsx'
xls = xlrd.open_workbook('excelFile.xlsx')
# Here's a bonus for how to start accessing excel cells and rows
for sheets in xls.sheets():
    cells = []
    for rows in range(sheets.nrows):
        for col in range(sheets.ncols):
            cells.append(str(sheets.cell(rows, col).value))
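For readers on Python 3, the same save step can be sketched with the modern email API. This is an illustrative sketch, not the answer author's code: the function name extract_xlsx_attachments is hypothetical, and the sample message carries placeholder bytes rather than a real workbook.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

XLSX_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def extract_xlsx_attachments(raw_bytes):
    """Return (filename, decoded bytes) pairs for each .xlsx attachment."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        if part.get_content_type() == XLSX_TYPE:
            name = part.get_filename() or "attachment.xlsx"
            # get_payload(decode=True) undoes the base64 transfer encoding
            found.append((name, part.get_payload(decode=True)))
    return found


# Hypothetical message; the payload is placeholder bytes, not a real workbook.
fake_workbook = b"PK\x03\x04 placeholder workbook bytes"
demo = EmailMessage()
demo["Subject"] = "Open work orders"
demo.set_content("Workbook attached.")
demo.add_attachment(
    fake_workbook,
    maintype="application",
    subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    filename="Openworkorders.xlsx",
)

attachments = extract_xlsx_attachments(demo.as_bytes())
```

The decoded bytes can be written to disk with a file opened in 'wb' mode and then opened by a spreadsheet library. Note that xlrd 2.x reads only .xls files, so a library such as openpyxl may be needed for .xlsx.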

How does rfile.read() work?

I'm sending a text file containing a string from a Python script via POST to my server:
import requests
fo = open('data.txt', 'a')
fo.write("hi, this is my testing data")
fo.close()
with open('data.txt', 'rb') as f:
    r = requests.post("http://XXX.XX.X.X", data={'data.txt': f})
And receiving and handling it here in my server handler script, built off an example found online:
def do_POST(self):
    data = self.rfile.read(int(self.headers.getheader('Content-Length')))
    empty = [data]
    with open('processing.txt', 'wb') as file:
        for item in empty:
            file.write("%s\n" % item)
    self._set_headers()
    self.wfile.write("<html><body><h1>POST!</h1></body></html>")
My question is, how does:
self.rfile.read(int(self.headers.getheader('Content-Length')))
take the length of my data (an integer, # of bytes/characters) and read my file? I am confused how it knows what my data contains. What is going on behind the scenes with HTTP?
It outputs data.txt=hi%2C+this+is+my+testing+data
to my processing.txt, but I am expecting "hi this is my testing data"
I tried but failed to find documentation for what exactly rfile.read() does, and if simply finding that answers my question I'd appreciate it, and I could just delete this question.

Your client code snippet reads contents from the file data.txt and makes a POST request to your server with data structured as a key-value pair. The data sent to your server in this case is one key data.txt with the corresponding value being the contents of the file.
Your server code snippet reads the entire HTTP Request body and dumps it into a file. The key-value pair structured and sent from the client comes in a format that can be decoded by Python's built in library urlparse.
Here is a solution that could work (Python 2, where the module is named urlparse):
import urlparse
def do_POST(self):
    length = int(self.headers.getheader('content-length'))
    field_data = self.rfile.read(length)
    fields = urlparse.parse_qs(field_data)
This snippet of code was shamelessly borrowed from: https://stackoverflow.com/a/31363982/705471
If you'd like to extract the contents of your text file back, adding the following line to the above snippet could help. Note that parse_qs returns a list of values for each key:
data_file = fields["data.txt"][0]
To learn more about how such information is encoded for the purposes of HTTP, read more at: https://en.wikipedia.org/wiki/Percent-encoding
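To see the decoding in isolation, here is a small Python 3 sketch in which urllib.parse.parse_qs plays the role of Python 2's urlparse.parse_qs. It uses the exact body string reported in the question. In a real Python 3 handler, rfile.read() returns bytes, so the body would need decoding first.

```python
from urllib.parse import parse_qs

# The request body the server received, as shown in the question
raw_body = "data.txt=hi%2C+this+is+my+testing+data"

fields = parse_qs(raw_body)
# parse_qs maps each key to a list of values, because a key may repeat
text = fields["data.txt"][0]
print(text)  # hi, this is my testing data
```

The percent-encoded %2C becomes a comma and each + becomes a space, which recovers the original file contents.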

Python 3.6: Reading a non-empty binary file is interpreted by Python to be empty

I have the following code, where 'password' is a string which is passed into the function. The issue is such that when I attempt to read the file created in the first half of the code, Python interprets it as being empty (despite the fact that File Explorer and text editors tell me it contains content). The 4 print statements are to assist with debugging (found here).
def encryptcredentials(username, password):
    # Create key randomly and save to file
    key = get_random_bytes(16)
    keyfile = open("key.bin", "wb").write(key)
    password = password.encode('utf-8')
    path = "encrypted.bin"
    # The following code generates a new AES128 key and encrypts a piece of data into a file
    cipher = AES.new(key, AES.MODE_EAX)
    ciphertext, tag = cipher.encrypt_and_digest(password)
    file_out = open(path, "wb")
    [file_out.write(x) for x in (cipher.nonce, tag, ciphertext)]
    print("the path is {!r}".format(path))
    print("path exists: ", os.path.exists(path))
    print("it is a file: ", os.path.isfile(path))
    print("file size is: ", os.path.getsize(path))
    # At the other end, the receiver can securely load the piece of data back, if they know the key.
    file_in = open(path, "rb")
    nonce, tag, ciphertext = [file_in.read(x) for x in (16, 16, -1)]
The console output is as such:
the path is 'encrypted.bin'
path exists: True
it is a file: True
file size is: 0
Here's an image of how the file is displayed in File Explorer.
It appears that there's content in the .bin file produced at [file_out.write(x) for x in (cipher.nonce, tag, ciphertext)], but I can't get Python to read it.
Welcoming all suggestions. I'm running Python 3.6, 32-bit.

You have to close (or at least flush) the file after the file_out.write(x) calls, so that the buffered data is actually written from the buffer to the file.
[file_out.write(x) for x in (cipher.nonce, tag, ciphertext)]
file_out.close()
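A with block is a common way to make sure the file is closed before it is read back. The sketch below uses hypothetical placeholder bytes instead of the real cipher.nonce, tag and ciphertext, and writes to a temporary directory so that it runs without the crypto library.

```python
import os
import tempfile

# Hypothetical stand-ins for cipher.nonce, tag and ciphertext
chunks = (b"N" * 16, b"T" * 16, b"secret-bytes")

path = os.path.join(tempfile.mkdtemp(), "encrypted.bin")

# Leaving the with block closes the file, which flushes the buffer to disk
with open(path, "wb") as file_out:
    for chunk in chunks:
        file_out.write(chunk)

size = os.path.getsize(path)

# Read the three pieces back in the same order and sizes
with open(path, "rb") as file_in:
    nonce, tag, ciphertext = [file_in.read(n) for n in (16, 16, -1)]
```

Because the file is closed before os.path.getsize runs, the reported size matches the number of bytes written.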

Save an django email to eml

I am generating a bunch of html emails in django, and I want to save them into a model, in a FileField. I can quite easily generate the html content and dump in into a File, but I want to create something that can be opened in email clients, e.g. an eml file. Does anyone know of a python or django module to do this? Just to be clear, I'm not looking for an alternative email backend, as I also want the emails to be sent when they're generated.
Edit: After a bit of reading, it looks to me like EmailMessage.message() should return the content that should be stored in the eml file. However, if I try to save it like this, the generated file is empty:
import tempfile
name = tempfile.mkstemp()[1]
fh = open(name, 'wb')
fh.write(bytes(msg.message()))
fh.close()
output = File(open(name, 'rb'), msg.subject[:50])
I want to use a BytesIO instead of a temp file, but the temp file is easier for testing.

An EML file is actually a text file: header lines made of name-value pairs, then a blank line, then the body. A valid EML file would look like
From: test@example.com
To: test@example.com
Subject: Test

Hello world!
If you follow the above pattern and save it in a file with the .eml extension, email clients such as Thunderbird will parse and show it without any problem.
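The same layout can be produced with the standard library rather than by hand. This is a sketch that uses email.message.EmailMessage independently of Django, with placeholder addresses.

```python
from email import message_from_bytes
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "test@example.com"
msg["To"] = "test@example.com"
msg["Subject"] = "Test"
msg.set_content("Hello world!")

# Serialized form: headers, a blank line, then the body
eml_bytes = msg.as_bytes()

# Round trip: parse the bytes back to confirm they form a valid message
parsed = message_from_bytes(eml_bytes)
```

Writing eml_bytes to a file opened in 'wb' mode with an .eml extension gives a file that email clients can open.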

Django's EmailMessage.message().as_bytes() will return the content of the .eml file. Then you just need to save the file to the directory of your choice:
from django.core.mail import EmailMessage
msg = EmailMessage(
    'Hello',
    'Body goes here',
    'from@example.com',
    ['to3@example.com'],
)
eml_content = msg.message().as_bytes()
file_name = "/path/to/eml_output.eml"
with open(file_name, "wb") as outfile:
    outfile.write(eml_content)

I had a similar problem. I found a ticket on the Django site, and its last comment suggests using django-eml-email-backend. It worked for me, and it is very useful and simple.
Example:
installing:
$ pip install django-eml-email-backend
using:
EMAIL_BACKEND = 'eml_email_backend.EmailBackend'
EMAIL_FILE_PATH = 'path/to/output/folder/'
