I save each whole message as xx.eml, but for some mails the first line of the body says that the mail is base64-encoded, for example:
charset="utf-8" Content-Transfer-Encoding: base64
charset="gb2312" Content-Transfer-Encoding: base64
I tried to inspect the keys of body[0][1], but there is no Content-Transfer-Encoding field (only Content-Type).
How can I process such mails?
def saveMail(conn, num):
    typ, body = conn.fetch(num, 'RFC822')
    message = open(emldirPath + '\\' + num + '.eml', 'w+')
    message.write(str(email.message_from_string(body[0][1])))
    print email.message_from_string(body[0][1]).keys()
    #['Received', 'Return-Path', 'Received', 'Received', 'Date', 'From', 'To',
    # 'Subject', 'Message-ID', 'X-mailer', 'Mime-Version', 'X-MIMETrack',
    # 'Content-Type', 'X-Coremail-Antispam']
    message.close()
I found the problem; it is not a decoding problem.
The correct mail looks like this:
------=_Part_446950_1309705579.1326378953207
Content-Type: text/plain; charset=GBK
Content-Transfer-Encoding: base64
What my program downloads:
------=_Part_446950_1309705579.1326378953207
Content-Type: text/plain;
        charset="utf-8"
Content-Transfer-Encoding: base64
When my program saves the .eml file, it breaks the line after 'text/plain;',
so Outlook Express cannot parse the mail.
If I edit the line to read Content-Type: text/html;charset="utf-8",
it works.
Now the question is: how can I change my program so that it does not break that line?
Emails that are transferred as base64 must set Content-Transfer-Encoding. However, you are most likely dealing with a MIME multipart message (e.g. both text/plain and HTML in the same message), in which case the transfer encoding is set separately for each part. You can test for this with is_multipart() or by checking whether the Content-Type is multipart/alternative. If that is the case, use walk() to iterate over the different parts.
EDIT: It is quite normal to send text/plain using quoted-printable and HTML using BASE64.
Content-Type: multipart/alternative; boundary="=_d6644db1a848db3cb25f2a8973539487"
Subject: multipart sample
From: Foo Bar <foo#example.net>
To: Fred Flintstone <fred#example.net>

--=_d6644db1a848db3cb25f2a8973539487
Content-Transfer-Encoding: base64
Content-Type: text/plain; charset=utf-8

SOME BASE64 HERE
--=_d6644db1a848db3cb25f2a8973539487
Content-Transfer-Encoding: base64
Content-Type: text/html; charset=utf-8

AND SOME OTHER BASE64 HERE
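A minimal sketch of the walk() approach, assuming Python 3. The RAW message and the describe_parts helper are made up for illustration. The top-level message has no Content-Transfer-Encoding header, which matches what the question observed, while each leaf part carries its own.

```python
import email

# Hypothetical two-part message; boundary and content are illustrative only.
RAW = (
    'Content-Type: multipart/alternative; boundary="BOUND"\n'
    'MIME-Version: 1.0\n'
    '\n'
    '--BOUND\n'
    'Content-Type: text/plain; charset=utf-8\n'
    'Content-Transfer-Encoding: base64\n'
    '\n'
    'SGVsbG8=\n'
    '--BOUND\n'
    'Content-Type: text/html; charset=utf-8\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    '<b>Hello</b>\n'
    '--BOUND--\n'
)


def describe_parts(msg):
    """Return (content type, transfer encoding, decoded bytes) for each leaf part."""
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no body of their own
        rows.append((
            part.get_content_type(),
            part.get('Content-Transfer-Encoding', '7bit'),
            part.get_payload(decode=True),  # undoes base64 / quoted-printable
        ))
    return rows


msg = email.message_from_string(RAW)
parts = describe_parts(msg)
for content_type, encoding, body in parts:
    print(content_type, encoding, body)
```

The decoded bodies are still bytes; turning them into text requires decoding with the charset of the part.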
I use the following code to send an e-mail with a pdf attachment. For most receivers this works without any issues but some clients show the pdf as corrupt or not at all. Thus I think there is probably something wrong and most clients are just forgiving enough to make it work anyway. Unfortunately, at this point I am out of ideas as I tried so many header combinations - all without success.
The pdf is base64 encoded.
import smtplib
import ssl

def sendMail(receiver, pdf):
    marker = "AUNIQUEMARKER"
    # A blank line separates each header block from its body, as MIME requires.
    message = """Subject: The Subject
From: {sender}
To: {receiver}
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary={marker}

--{marker}
Content-Type: text/plain; charset="utf-8"

Text goes here.
--{marker}
Content-Type: application/pdf; name="{filename}"
Content-Transfer-Encoding:base64
Content-Disposition: attachment; filename={filename}

{pdfcontent}
--{marker}--
""".format(marker=marker, sender="some#sender.com", receiver=receiver, filename="Test.pdf", pdfcontent=pdf)
    port = 587
    smtp_server = "some.server.com"
    context = ssl.create_default_context()
    with smtplib.SMTP(smtp_server, port) as server:
        server.starttls(context=context)
        server.login("user", "password")
        server.sendmail("some#sender.com", [receiver, "cc#sender.com"], message.encode())
In case it is relevant, the pdf is created via LaTeX as follows:
pdfl = PDFLaTeX.from_texfile('latex/test.tex')
pdf, log, completed_process = pdfl.create_pdf(keep_pdf_file=False, keep_log_file=False)
pdfBase64 = base64.b64encode(pdf).decode()
Thanks for any help.
PS: Not showing the attachment at all might be fixed as I switched from Content-Type: multipart/alternative to multipart/mixed.
Well, apparently the base64 block must be broken into lines of at most 76 characters (this is what RFC 2045 requires). In my case that means I had to switch from base64.b64encode to base64.encodebytes, as the latter inserts a newline after every 76 characters of output.
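The difference between the two functions can be checked with a short sketch, assuming Python 3. The pdf_bytes value is only a stand-in for real PDF data.

```python
import base64

# Stand-in for the real PDF bytes (assumption for illustration).
pdf_bytes = bytes(range(256)) * 4

# b64encode returns everything on a single line.
one_line = base64.b64encode(pdf_bytes).decode("ascii")

# encodebytes inserts a newline after every 76 output characters.
wrapped = base64.encodebytes(pdf_bytes).decode("ascii")

longest = max(len(line) for line in wrapped.splitlines())
print(len(one_line), longest)
```

Both forms decode back to the same bytes, so only the line wrapping changes, not the data.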
I'm currently writing an application that needs to load an email into memory, add an attachment to it and send the same email back to the user. This has worked fine in the past, however I'm currently facing an issue where an email is sent in Content-Transfer-Encoding of base64.
I found a script online that converts a built-in Python email message object to multipart. However, whenever I do this, the original email is no longer sent as base64 and appears as plain text when I re-send it.
Does anyone know how I could fix it? The (mostly redacted) email has been added and the code I used to convert the email to multipart. Thanks for the help in advance.
E-Mail
# Before conversion
From: ██████████ <█████#██████.com>
To: ████████ <███████#██████.com>
Subject: █████████
Date: Fri, ██ ███ 2017 00:18:17 +0200
Content-Language: nl-NL
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
cmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRy
ZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJl
ZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZA0K
# After conversion
Content-Type: multipart/mixed; boundary="===============0883378942=="
MIME-Version: 1.0
From: ██████████ <█████#██████.com>
To: ████████ <███████#██████.com>
Subject: █████████
Date: Fri, ██ ███ 2017 00:18:17 +0200
Content-Language: nl-NL
Content-Transfer-Encoding: base64
MIME-Version: 1.0
--===============0883378942==
Content-Type: text/html; charset="utf-8"
cmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRy
ZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJl
ZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZHJlZGFjdGVkcmVkYWN0ZWRyZWRhY3RlZA0K
--===============0883378942==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="foo.txt"
Hello world
--===============0883378942==--
Plain to Multipart code
from email.mime.multipart import MIMEMultipart

# If this method is not used on an email object
# A `TypeError` is raised with the message "Attach is not valid on a message with a non-multipart payload"
def mail_to_multipart(mail):
    """
    Convert an email to a multipart email
    :param mail: Email object
    :return: Multipart email object
    """
    if mail.is_multipart():
        return mail
    mail_new = MIMEMultipart("mixed")
    headers = list((k, v) for (k, v) in mail.items() if k != "Content-Type")
    for k, v in headers:
        mail_new[k] = v
    for k, v in headers:
        del mail[k]
    mail_new.attach(mail)
    return mail_new
Apparently the issue was that the Content-Transfer-Encoding header was moved to the new multipart message instead of staying on the original body part. Changing the following line:
headers = list((k, v) for (k, v) in mail.items() if k != "Content-Type")
to this:
headers = list((k, v) for (k, v) in mail.items() if k not in ("Content-Type", "Content-Transfer-Encoding"))
fixed the issue.
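Put together, the corrected function might look like the sketch below, assuming Python 3. Two details are additions that are not in the original fix: header names are compared case-insensitively, and MIME-Version is not copied because MIMEMultipart already sets one. The RAW message is a made-up example.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Headers that describe the body itself and must stay on the original part.
BODY_HEADERS = ("content-type", "content-transfer-encoding")


def mail_to_multipart(mail):
    """Wrap a non-multipart message in a multipart/mixed container."""
    if mail.is_multipart():
        return mail
    mail_new = MIMEMultipart("mixed")
    moved = [(k, v) for k, v in mail.items() if k.lower() not in BODY_HEADERS]
    for k, v in moved:
        if k.lower() == "mime-version":
            continue  # MIMEMultipart already added this header (addition)
        mail_new[k] = v
    for k, _ in moved:
        del mail[k]
    mail_new.attach(mail)
    return mail_new


# Hypothetical base64-encoded HTML mail; the body decodes to <b>Hello</b>.
RAW = (
    "From: sender@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+SGVsbG88L2I+\n"
)

converted = mail_to_multipart(email.message_from_string(RAW))
converted.attach(MIMEText("Hello world"))
html_part = converted.get_payload()[0]
print(converted.as_string())
```

Because the transfer encoding stays on the HTML part, a receiving client can still decode its base64 body.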
Here is a method which tries to get the html part of an email message:
from __future__ import absolute_import, division, unicode_literals, print_function
import email
html_mail_quoted_printable=b'''Subject: =?ISO-8859-1?Q?WG=3A_Wasenstra=DFe_84_in_32052_Hold_Stau?=
MIME-Version: 1.0
Content-type: multipart/mixed;
Boundary="0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253"
--0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: multipart/alternative;
Boundary="1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253"
--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: quoted-printable
Freundliche Gr=FC=DFe
--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253
Content-type: text/html; charset=ISO-8859-1
Content-Disposition: inline
Content-transfer-encoding: quoted-printable
<html><body>
Freundliche Gr=FC=DFe
</body></html>
--1__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253--
--0__=4EBBF4C4DFD012538f9e8a93df938690918c4EBBF4C4DFD01253--
'''
def get_html_part(msg):
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            return part.get_payload(decode=True)

msg=email.message_from_string(html_mail_quoted_printable)
html=get_html_part(msg)
print(type(html))
print(html)
Output:
<type 'str'>
<html><body>
Freundliche Gr��e
</body></html>
Unfortunately I get a byte string. I would like to have unicode string.
According to this answer msg.get_payload(decode=True) should do the magic. But it does not in this case.
How to decode a mime part of a message and get a unicode string in Python 2.7?
Unfortunately I get a byte string. I would like to have unicode string.
The decode=True parameter to get_payload only decodes the Content-Transfer-Encoding wrapper, the =-encoding in this message. To get from there to characters is one of the many things the email package makes you do yourself:
raw = part.get_payload(decode=True)
charset = part.get_content_charset('iso-8859-1')
chars = raw.decode(charset, 'replace')
(iso-8859-1 being the fallback in case the message specifies no encoding.)
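For comparison, here is a sketch of the same two-step decoding on Python 3, where the result is a str. The RAW message is a shortened, made-up variant of the one in the question, and get_html_text is a hypothetical helper name.

```python
import email

# Shortened, single-part stand-in for the message in the question.
RAW = (
    b"Content-Type: text/html; charset=ISO-8859-1\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"<html><body>Freundliche Gr=FC=DFe</body></html>\n"
)


def get_html_text(msg, fallback="iso-8859-1"):
    """Return the first text/html part as text, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            raw = part.get_payload(decode=True)            # step 1: undo transfer encoding
            charset = part.get_content_charset(fallback)   # step 2: find the charset
            return raw.decode(charset, "replace")          # step 3: bytes to text
    return None


html = get_html_text(email.message_from_bytes(RAW))
print(html)
```

The 'replace' error handler keeps the function from failing on bytes that are invalid in the declared charset.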
I'm using Python's email.mime library to write emails. I created two MIMEText objects and attached them to the message as text (not as attachments), which produced the MIME document below. It contains two text parts, one of type plain and one of type html. In some mail clients I can only see the latter part (the html one), while in others (for example, live.com) I can see both. What causes this?
Content-Type: multipart/mixed; boundary="===============0542266593=="
MIME-Version: 1.0
FROM: john.smith#NYU.com
TO: john.smith#live.com, john.smith#gmail.com
SUBJECT: =?utf-8?q?A_Greeting_From_Postman?=
--===============0542266593==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
SGkhCkhvdyBhcmUgeW91PwpIZXJlIGlzIHRoZSBsaW5rIHlvdSB3YW50ZWQ6Cmh0dHA6Ly93d3cu
cHl0aG9uLm9yZw==
--===============0542266593==
Content-Type: text/html; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
ICAgICAgICA8aHRtbD4KICAgICAgICAgIDxoZWFkPjwvaGVhZD4KICAgICAgICAgIDxib2R5Pgog
ICAgICAgICAgICA8cD5IaSE8YnI+CiAgICAgICAgICAgICAgIEhvdyBhcmUgeW91Pzxicj4KICAg
ICAgICAgICAgICAgSGVyZSBpcyB0aGUgPGEgaHJlZj0iaHR0cDovL3d3dy5weXRob24ub3JnIj5s
aW5rPC9hPiB5b3Ugd2FudGVkLgogICAgICAgICAgICA8L3A+CiAgICAgICAgICA8L2JvZHk+CiAg
ICAgICAgPC9odG1sPgogICAgICAgIA==
--===============0542266593==--
You have specified 'multipart/mixed' as the MIME type. If you want only one item to be displayed, specify 'multipart/alternative', like so:
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.encoders import encode_base64
# Note: 'alternative' means only display one of the items.
msg = MIMEMultipart('alternative')
msg['Subject'] = "Hello"
msg['From'] = 'me#example.com'
msg['To'] = 'you#example.com'
msg.attach(MIMEText('Hello!', 'plain'))
msg.attach(MIMEText('<b>Hello!</b>', 'html'))
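# Not required, but you had it in your example, so I kept it.
for i in msg.get_payload():
    # encode_base64 appends a Content-Transfer-Encoding header without
    # replacing the one MIMEText already set, so drop the old one first.
    del i['Content-Transfer-Encoding']
    encode_base64(i)
print(msg.as_string())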
Python provides a quite functional MIME library called email.mime.
What I want to achieve is to get a MIME part containing plain UTF-8 text encoded as quoted-printable rather than base64. Although all the functionality is available in the library, I did not manage to use it:
Example:
import email.mime.text, email.encoders
m=email.mime.text.MIMEText(u'This is the text containing ünicöde', _charset='utf-8')
m.as_string()
# => Leads to a base64-encoded message, as base64 is the default.
email.encoders.encode_quopri(m)
m.as_string()
# => Leads to a strange message
The last command leads to a strange message:
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Transfer-Encoding: quoted-printable
GhpcyBpcyB0aGUgdGV4dCBjb250YWluaW5nIMO8bmljw7ZkZQ=3D=3D
This is obviously not encoded as quoted-printable, and the doubled Content-Transfer-Encoding header is strange at least (if not illegal).
How can I get my text encoded as quoted-printable in the MIME message?
Okay, I got one solution which is very hacky, but at least it leads into some direction: MIMEText assumes base64 and I don't know how to change this. For this reason I use MIMENonMultipart:
import email.mime, email.mime.nonmultipart, email.charset
m=email.mime.nonmultipart.MIMENonMultipart('text', 'plain', charset='utf-8')
#Construct a new charset which uses Quoted Printables (base64 is default)
cs=email.charset.Charset('utf-8')
cs.body_encoding = email.charset.QP
#Now set the content using the new charset
m.set_payload(u'This is the text containing ünicöde', charset=cs)
Now the message seems to be encoded correctly:
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
This is the text containing =C3=BCnic=C3=B6de
One can even construct a new class which hides the complexity:
class MIMEUTF8QPText(email.mime.nonmultipart.MIMENonMultipart):
    def __init__(self, payload):
        email.mime.nonmultipart.MIMENonMultipart.__init__(self, 'text', 'plain',
                                                          charset='utf-8')
        utf8qp=email.charset.Charset('utf-8')
        utf8qp.body_encoding=email.charset.QP
        self.set_payload(payload, charset=utf8qp)
And use it like this:
m = MIMEUTF8QPText(u'This is the text containing ünicöde')
m.as_string()
In Python 3 you do not need your hack:
import email.charset
import email.mime.text
# Construct a new charset which uses Quoted Printables (base64 is default)
cs = email.charset.Charset('utf-8')
cs.body_encoding = email.charset.QP
m = email.mime.text.MIMEText(u'This is the text containing ünicöde', 'plain', _charset=cs)
print(m.as_string())
Adapted from Python issue 1525919 and tested on Python 2.7:
from email.Message import Message
from email.Charset import Charset, QP
text = "\xc3\xa1 = \xc3\xa9"
msg = Message()
charset = Charset('utf-8')
charset.header_encoding = QP
charset.body_encoding = QP
msg.set_charset(charset)
msg.set_payload(msg._charset.body_encode(text))
print msg.as_string()
will give you:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
=C3=A1 =3D =C3=A9
Also see this response from a Python committer.
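On Python 3.6 and later, the newer EmailMessage API is another option. None of the answers above uses it, so take this as an alternative sketch rather than the method they describe. Its set_content method accepts a cte argument for text content.

```python
from email.message import EmailMessage

msg = EmailMessage()
# cte selects the Content-Transfer-Encoding used for the text body.
msg.set_content(
    "This is the text containing ünicöde",
    charset="utf-8",
    cte="quoted-printable",
)

serialized = msg.as_string()
print(serialized)
```

Reading the body back with msg.get_content() returns the original text, so the encoding only affects the serialized form.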