Python IMAP - rendering message bodies with embedded inline images

I am working on my own email client (powered by Django 1.10 and Python 3).
Currently, I am trying to render inbox messages using Python's IMAPClient library. It looks like I succeeded in parsing emails with the mixed and alternative subtypes, but now I am stuck trying to render body parts with the related subtype (multipart/related), that is, parts containing HTML with embedded inline attachments.
My current plan is to download all the inline images to my server one by one using the respective fetch command, and after that insert links to those images into the HTML of the target message.
To illustrate, let's say the email's HTML representation contains an inline image:
...<td><img src="cid:part1.06030702.04060203@studinter.ru"></td>...
...and the BODYSTRUCTURE part describing the inline image looks like this:
(b'IMAGE', b'JPEG', (b'NAME', b'ban1.jpg'), b'<part1.06030702.04060203@studinter.ru>', None, b'BASE64', 15400, None, (b'INLINE', (b'FILENAME', b'ban1.jpg')), None)
So, in theory, I could download the image to my server and replace the value of the src attribute (namely, cid:part1.06030702.04060203@studinter.ru) with the URL of the image on my server.
My concern is that this process of inserting inline attachments into the target HTML message body may be something that libraries like IMAPClient or Python's email package have already implemented, and that I am about to reinvent the wheel. I am completely lost in this topic.
The question is, do I really have to implement it on my own? If yes, is the described method appropriate? And if no, I would really appreciate a hint on how to do this with IMAPClient, or standard library's imaplib.

My external lib: https://github.com/ikvk/imap_tools
from imap_tools import MailBox, A
with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
    for msg in mailbox.fetch(A(all=True)):
        msg.html  # str: '<b>Hello 你 Привет</b>'
        for att in msg.attachments:
            att.filename  # str: 'cat.jpg'
            att.payload  # bytes: b'\xff\xd8\xff\xe0...'
            att.content_id  # str: 'part45.06020801.00060008@mail.ru'
            att.content_type  # str: 'image/jpeg'
            att.content_disposition  # str: 'inline'
That is enough data for rendering.
You can take each att.content_id and look for the matching cid: reference in the HTML.
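The matching step can be sketched with the standard library alone. This is a minimal sketch, not something imap_tools provides: embed_inline_images is a hypothetical helper name, and embedding the images as base64 data: URIs is an assumption made here so that nothing has to be stored on the server. Swapping in a URL on your own server instead is a one-line change.

```python
import base64
import re

# Matches src="cid:..." attributes in the message HTML.
CID_SRC = re.compile(r'src="cid:([^"]+)"')


def embed_inline_images(html, images):
    """Replace cid: references with data: URIs.

    images maps a Content-ID (with or without angle brackets)
    to a (content_type, payload_bytes) tuple.
    """
    # Normalise keys: BODYSTRUCTURE reports IDs wrapped in <...>.
    lookup = {cid.strip('<>'): value for cid, value in images.items()}

    def replace(match):
        cid = match.group(1)
        if cid not in lookup:
            # Leave unknown references untouched.
            return match.group(0)
        content_type, payload = lookup[cid]
        encoded = base64.b64encode(payload).decode('ascii')
        return 'src="data:{};base64,{}"'.format(content_type, encoded)

    return CID_SRC.sub(replace, html)
```

With the attributes shown above, the mapping could be built as {att.content_id: (att.content_type, att.payload) for att in msg.attachments} and passed together with msg.html.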

Related

How do I create a PDF file containing a Signature Field, using python?

In order to be able to sign a PDF document using a token based DSC, I need a so-called signature field in my PDF.
This is a rectangular field you can fill with a digital signature using e.g. Adobe Reader or Adobe Acrobat.
I want to create this signable PDF in Python.
I'm starting from plain text, or a rich-text document (Image & Text) in .docx format.
How do I generate a PDF file with this field, in Python?

Check out pyHanko. You can add, edit and digitally sign PDFs using Python.
https://github.com/MatthiasValvekens/pyHanko
It's totally free. And if you have any problems, Matthias is very helpful and responsive.
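For the signature field itself, a minimal sketch along the lines of the pyHanko documentation might look like this. It assumes you already have a PDF (converting the .docx to PDF is a separate step), and the function name, field name, and box coordinates are placeholders chosen for illustration.

```python
def add_empty_signature_field(pdf_path, field_name='Signature1',
                              box=(50, 50, 250, 110)):
    """Append an empty, visible signature field to the first page."""
    # Imported lazily so this module still loads where pyHanko is missing.
    from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter
    from pyhanko.sign.fields import SigFieldSpec, append_signature_field

    # 'rb+' because the field is written back as an incremental update.
    with open(pdf_path, 'rb+') as doc:
        writer = IncrementalPdfFileWriter(doc)
        # box is (x1, y1, x2, y2) in PDF user-space units; on_page is 0-based.
        spec = SigFieldSpec(sig_field_name=field_name, on_page=0, box=box)
        append_signature_field(writer, spec)
        writer.write_in_place()
```

Calling add_empty_signature_field('contract.pdf') should leave a field that Adobe Reader offers for signing; check the pyHanko docs for the exact options supported by your installed version.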

Unfortunately, I couldn't find any (free) solutions. Just Python programs that sign PDF documents.
But there is a Python PDF SDK called PDFTron that has a free trial. Here's a link to a specific article showing how to "add a certification signature field to a PDF document and sign it".
# Open an existing PDF
doc = PDFDoc(docpath)
page1 = doc.GetPage(1)
# Create a text field that we can lock using the field permissions feature.
annot1 = TextWidget.Create(doc.GetSDFDoc(), Rect(50, 550, 350, 600), "asdf_test_field")
page1.AnnotPushBack(annot1)
# Create a new signature form field in the PDFDoc. The name argument is optional;
# leaving it empty causes it to be auto-generated. However, you may need the name for later.
# Acrobat doesn't show digsigfield in side panel if it's without a widget. Using a
# Rect with 0 width and 0 height, or setting the NoPrint/Invisible flags makes it invisible.
certification_sig_field = doc.CreateDigitalSignatureField(cert_field_name)
widgetAnnot = SignatureWidget.Create(doc, Rect(0, 100, 200, 150), certification_sig_field)
page1.AnnotPushBack(widgetAnnot)
...
# Save the PDFDoc. Once the method below is called, PDFNet will also sign the document using the information provided.
doc.Save(outpath, 0)

You can use https://github.com/mstamy2/PyPDF2 to work with the PDF in Python code,
and then use the open source Java-Digital-Signature, a Java command line tool for digital signatures with a PKCS#11 token: https://github.com/AlessioScarfone/Java-Digital-Signature
You can call it from your Python code:
import subprocess
subprocess.call(['java', '-jar', 'signer.jar', 'pades', 'test.pdf'])

I use Python's signpdf library to sign PDFs.
Read its documentation for a better understanding: https://github.com/yourcelf/signpdf
pip install signpdf
Demo:
Sign the first page of "contract.pdf" with the signature "sig.png":
signpdf contract.pdf sig.png --coords 1x100x100x150x40
For an explanation of the coordinates, see the GitHub page linked above.

Posting files to a chat through Slack API

I'm trying to deliver videos through the Slack API using Python's slackclient library.
I often use slack.api_call('chat.postMessage'...) and I am familiar with 'files.upload', but when I execute
slack = SlackClient(TOKEN)
slack.api_call('files.upload', file=open('video.mp4', 'rb')...)
the file is uploaded to the given channel, but is not posted as a message.
What I am trying to achieve is to create a message which I can send as a private message or to a channel that would look something like this
and maybe add some text above it if possible.
I've explored the Attachment section in the docs, but couldn't find anything related to files.
If there is a way to supply the file as a link rather than in binary format, that would also be OK (as long as it is displayed in an embedded fashion).

How about this sample script? It uses io.BytesIO(f.read()) for the file. To use it, files:write:user has to be included in the scopes. As for the text, you can add it using initial_comment. In my environment, attachments could not be used with files.upload. The API documentation is at https://api.slack.com/methods/files.upload.
Script:
import io
# slack is the SlackClient(TOKEN) instance from the question
with open('./sample.mp4', 'rb') as f:
    slack.api_call(
        "files.upload",
        channels='#sample',
        filename='sample.mp4',
        title='sampletitle',
        initial_comment='sampletext',  # text shown above the file
        file=io.BytesIO(f.read())
    )
Result :
If I misunderstand your question, I'm sorry.

I came across this question because I had the same issue - my file would upload and I would get a response, but the file would not be posted to the channel I had sent. It turned out to be a poor job by me of reading the Slack API documentation. I had used chat.postMessage many times and included a single 'channel' argument. Here is that API: https://api.slack.com/methods/chat.postMessage
The files.upload method wants a comma-separated list of channels in a 'channels' argument. See https://api.slack.com/methods/files.upload. Once I changed from 'channel' to 'channels' and made sure to pass it as a list, I was successfully posting the image to the channel I wanted.
To the original question then: in the code you linked (https://ibb.co/hwH5hF), try changing channel='bla' to channels=['bla'].
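To make the 'channel' versus 'channels' difference harder to get wrong, the arguments can be assembled in one place. This is only a sketch: build_upload_params is a hypothetical helper, not part of slackclient, and it simply joins the channel names into the comma-separated string the API documentation describes.

```python
def build_upload_params(channels, filename, title=None, initial_comment=None):
    """Assemble keyword arguments for a files.upload call."""
    params = {
        # files.upload expects 'channels' (plural), comma separated.
        'channels': ','.join(channels),
        'filename': filename,
    }
    # Optional fields are only sent when provided.
    if title is not None:
        params['title'] = title
    if initial_comment is not None:
        params['initial_comment'] = initial_comment
    return params
```

It could then be used as slack.api_call('files.upload', file=f, **build_upload_params(['#sample'], 'sample.mp4')).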

This works for me:
import slack
client = slack.WebClient(token='xoxb-XXX')
with open('/path/to/attachment.jpeg', 'rb') as att:
    r = client.api_call("files.upload", files={
        'file': att,
    }, data={
        'channels': '#my_channel',
        'filename': 'downloaded_filename.jpeg',
        'title': 'Attachment\'s title',
        'initial_comment': 'Attachment\'s description',
    })
    assert r.status_code == 200

Obtain DTD information from XML using Python

I'm trying to extract the DTD information from an XML document using Python and preferably the standard library. At first glance, it seems xml.sax.handler.DTDHandler is the way to go, so I wrote the following example code to extract the DTD of a trivial DocBook v4 document:
import xml.sax
from contextlib import closing
XML_CODE = '''<!DOCTYPE example PUBLIC
"-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<example><title>Hello World in Python</title>
<programlisting>
print('Hello World!')
</programlisting>
</example>'''
class DTDPrinter(xml.sax.handler.DTDHandler):
    def notationDecl(self, name, publicId, systemId):
        print('name={}, publicId={}'.format(name, publicId))
if __name__ == '__main__':
    with closing(xml.sax.make_parser()) as parser:
        parser.setFeature(xml.sax.handler.feature_external_pes, False)
        parser.setFeature(xml.sax.handler.feature_validation, False)
        parser.setDTDHandler(DTDPrinter())
        print('---------- before feed')
        parser.feed(XML_CODE)
        print('---------- after feed')
My expectation was that when running this code with Python 3.5 the output would be something like:
---------- before feed
name=example, publicId=-//OASIS//DTD DocBook XML V4.1.2//EN
---------- after feed
Instead I get output with DTDs seemingly related to various image formats, but not the one specified in the document:
---------- before feed
name=BMP, publicId=+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows bitmap//EN
name=CGM-CHAR, publicId=ISO 8632/2//NOTATION Character encoding//EN
name=CGM-BINARY, publicId=ISO 8632/3//NOTATION Binary encoding//EN
...
name=WMF, publicId=+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION Microsoft Windows Metafile//EN
name=WPG, publicId=None
name=linespecific, publicId=None
---------- after feed
Although maybe the last entry, with the name linespecific, refers to the document DTD in a crippled way?
I also noticed a delay of a couple of seconds after the last output, despite the document being trivial. Maybe the parser attempts to connect to the internet? I tried to disable this by setting the features
parser.setFeature(xml.sax.handler.feature_external_pes, False)
parser.setFeature(xml.sax.handler.feature_validation, False)
but to no avail.
How can I convince the DTDHandler to react to the DTD occurring in the document and not connect to the internet?

As I didn't receive an answer and couldn't get this to work properly in the past 2 weeks, I resorted to violence and simply used a regular expression to extract the information. This is ugly and won't be able to properly process all valid ways to express a DTD, but it is good enough for my purpose.
import re
# [^"]+ stops at the first closing quote, so a system ID on the same line is not swallowed
DTD_REGEX = re.compile(
    r'<!DOCTYPE\s+(?P<name>[a-zA-Z][a-zA-Z-]*)\s+PUBLIC\s+"(?P<public_id>[^"]+)"')
dtd_match = DTD_REGEX.match(XML_CODE)
if dtd_match is not None:
    public_id = dtd_match.group('public_id')
    print(public_id)
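A standard library alternative to the regular expression is the lower-level xml.parsers.expat module, which reports the DOCTYPE declaration through StartDoctypeDeclHandler. The sketch below is an assumption about what would serve the question, not the poster's solution; read_doctype is a name made up here. As far as I know, a bare expat parser does not fetch the external DTD unless an external entity handler is installed, so there should be no network access.

```python
import xml.parsers.expat


def read_doctype(xml_text):
    """Return name, system ID and public ID of the DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when the parser reaches <!DOCTYPE ...>.
        found['name'] = name
        found['system_id'] = system_id
        found['public_id'] = public_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # True marks this chunk as the final one.
    parser.Parse(xml_text, True)
    return found
```

Calling read_doctype(XML_CODE) with the document from the question should give back the name example together with the OASIS public ID and the docbookx.dtd URL.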

Using Tablib Library with Web2py

I've been trying for a while to make tablib work with web2py without luck. The code is delivering a .xls file as expected, but it's corrupted and empty.
import tablib
data = []
headers = ('first_name', 'last_name')
data = tablib.Dataset(*data, headers=headers)
data.append(('John', 'Adams'))
data.append(('George', 'Washington'))
response.headers['Content-Type']= 'application/vnd.ms-excel;charset=utf-8'
response.headers['Content-disposition']='attachment; filename=test.xls'
response.write(data.xls, escape=False)
Any ideas??
Thanks!

Per the web2py documentation, response.write is documented as serving
to write text into the output page body
(my emphasis). data.xls is not text -- it's binary stuff! To verify that is indeed the cause of your problem, try using data.csv instead, and that should work, since it is text.
I believe you'll need to use response.stream instead, to send "binary stuff" as your response (or as an attachment thereto).

send data from blobstore as email attachment in GAE

Why isn't the code below working? The email is received, and the file comes through with the correct filename (it's a .png file). But when I try to open the file, it doesn't open correctly (Windows Gallery reports that it can't open this photo or video and that the file may be unsupported, damaged or corrupted).
When I download the file using a subclass of blobstore_handlers.BlobstoreDownloadHandler (basically the exact handler from the GAE docs), and the same blob key, everything works fine and Windows reads the image.
One more bit of info - the binary files from the download and the email appear very similar, but have a slightly different length.
Anyone got any ideas on how I can get email attachments sending from GAE blobstore? There are similar questions on S/O, suggesting other people have had this issue, but there don't appear to be any conclusions.
from google.appengine.api import mail
from google.appengine.ext import blobstore
def send_forum_post_notification():
    blob_reader = blobstore.BlobReader('my_blobstore_key')
    blob_info = blobstore.BlobInfo.get('my_blobstore_key')
    value = blob_reader.read()
    mail.send_mail(
        sender='my.email@address.com',
        to='my.email@address.com',
        subject='this is the subject',
        body='hi',
        reply_to='my.email@address.com',
        attachments=[(blob_info.filename, value)]
    )
send_forum_post_notification()

I do not understand why you use a tuple for the attachment. I use:
message = mail.EmailMessage(sender = ......
message.attachments = [blob_info.filename,blob_reader.read()]

I found that this code doesn't work on dev_appserver but does work when pushed to production.

I ran into a similar problem using the blobstore on a Python Google App Engine application. My application handles PDF files instead of images, but I was also seeing a "the file may be unsupported, damaged or corrupted" error using code similar to your code shown above.
Try approaching the problem this way: Call open() on the BlobInfo object before reading the binary stream. Replace this line:
value = blob_reader.read()
... with these two lines:
bstream = blob_info.open()
value = bstream.read()
Then you can remove this line, too:
blob_reader = blobstore.BlobReader('my_blobstore_key')
... since bstream above will be of type BlobReader.
Relevant documentation from Google is located here:
https://cloud.google.com/appengine/docs/python/blobstore/blobinfoclass#BlobInfo_filename
