Strange characters in email that cannot be ingested by imap server - python

I am writing a tool in Python that backs up and restores emails in Gmail via IMAP.
In some cases the emails backed up from Gmail contain a strange character sequence, ^#, and then cannot be re-ingested by Gmail IMAP.
Delivered-To: xxxxx#lxxxxxx
Received: by 1x.xx.xx.xx with SMTP id jjjjjjjj;
Tue, 14 Jun 2011 16:56:26 -0700 (PDT)
Received: by x.x.x.x with SMTP id xxxx.xxx;
Tue, 14 Jun 2011 16:56:16 -0700 (PDT)
Return-Path: <foo.bar#email.com>
Delivery-Date: Mon, 23 Aug 2010 17:58:56 +0200
Received: from xxxxx (xxxxx [x.x.x.x])
by xxxx (node=xxx) with ESMTP (xxx)
id xxx ; Mon, 23 Aug 2010 17:58:56 +0200
Received: from [x] (x)
by x (x) with x (x)
id x; Mon, 23 Aug 2010 17:58:50 +0200
Message-ID: <x#foo.com>
Date: Mon, 23 Aug 2010 17:58:48 +0200
From: Foo Bar <foo.bar#email.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
To: bar.foo#email.com <x>
Subject: The subject
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 8bit
X-Provags-ID: xxxxxxxxxxx=
Envelope-To: foo.bar#email.com
Hello All,
blah blah blah
^#
At the end there is this special character. In other emails it sometimes appears in the middle.
When I store the email on disk (.eml format), I just save it and read it back later.
The encoding seems correct.
What is this character?
Am I doing something wrong when I store the email as .eml?
A bit of guidance would be appreciated.
Thanks.

Short answer: You can strip null characters from the body of the email before sending it back to Google.**
Longer answer:
Old email (according to RFC 822) was allowed to contain null characters. New email (according to RFC 2822, published in 2001) is not allowed to contain null characters. RFC 2822 reads: "Differences from earlier standards... ASCII 0 (null) removed."
It's entirely possible that Gmail accepts 822-style emails via SMTP (that's how the email first got to your inbox) but only 2822-style emails via IMAP (which is why you can't put it back via IMAP).
** Note: Don't blindly strip nulls from MIME documents included in the email. RFC 2822 "specifies that messages are made up of characters in the US-ASCII range of 1 through 127. There are other documents, specifically the MIME document series [RFC2045, RFC2046, RFC2047, RFC2048, RFC2049], that extend [RFC 2822] to allow for values outside of that range."
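A minimal sketch of the short answer, assuming Python 3 and `imaplib`: remove NUL bytes from the raw message right before handing it to `IMAP4.append`. The helper names `strip_nulls` and `restore_message` are hypothetical. The guard that refuses messages with a `binary` Content-Transfer-Encoding is one assumed way of honouring the caveat above: base64 and quoted-printable parts never contain raw NUL bytes, and RFC 2045 forbids NUL in 7bit and 8bit data too, so only `binary` parts can legitimately carry them.

```python
import email
import email.policy
import imaplib


def strip_nulls(raw: bytes) -> bytes:
    """Return the raw message with every NUL (0x00) byte removed.

    Refuses messages that contain a 'binary' Content-Transfer-Encoding part,
    because that is the only encoding in which NUL bytes may be real data.
    """
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    for part in msg.walk():
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte == "binary":
            raise ValueError("message has a binary MIME part; not stripping NULs")
    return raw.replace(b"\x00", b"")


def restore_message(imap: imaplib.IMAP4, mailbox: str, raw: bytes):
    """Re-upload one backed-up .eml file (raw bytes) to the given mailbox."""
    # flags=None and date_time=None leave both to the server's defaults
    return imap.append(mailbox, None, None, strip_nulls(raw))
```

Reading the .eml file in binary mode (`open(path, "rb")`) and passing the bytes straight through avoids any re-encoding on the way back in.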

Related

Read all email messages from a text file containing multiple email messages using python

I have a single .txt file which contains multiple email messages. Here is a sample of the file, which contains multiple emails (.eml format):
From details
Return-Path: <emailaddress>
Delivered-To: email#address.com
Received: details
Received-SPF: details
Authentication-Results: details
Received: details
ARC-Seal: details
ARC-Message-Signature: details
Received: details
From: details
To: details
Subject: details
Thread-Topic: details
Thread-Index: details
Date: details
Message-ID:details
Accept-Language: en-US
Content-Language: en-US
Content-Type: multipart/mixed;
boundary="_004_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_"
MIME-Version: 1.0
details
--_004_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_
Content-Type: multipart/alternative;
boundary="_000_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_"
--_000_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
test with FA
--_000_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
<html>
<head>
<body>
details
</body>
</html>
--_000_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_--
--_004_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_
Content-Type: details
Content-Description: details
Content-Disposition: attachment; filename="p2_eml.eml"; size=37836;
creation-date="Tue, 04 Aug 2020 10:48:34 GMT";
modification-date="Tue, 04 Aug 2020 10:48:34 GMT"
Content-Transfer-Encoding: base64
base64encoded data
--_004_DM5PR13MB138821372E6760B35B854B0CB74A0DM5PR13MB1388namp_--
From details <--- 2nd email starts --->
Return-Path: <emailaddress>
Delivered-To: email#address.com
Received: details
Received-SPF: details
Authentication-Results: details
Received: details
more details
Using Python's email library, I can only grab the first email message; the rest of the messages are not processed.
msg = email.message_from_file(message) creates a single msg object that refers to the first email message in the .txt file and never returns the next message object.
Is there a way I can fetch all email messages from the .txt file and process them one by one?
Code tried:
```python
import email


def extract_text(message):  # wrapper name is illustrative; the question showed only the body
    msg = email.message_from_file(message)
    # Dump extra information To, From, Date, Subject header values.
    dump_extra_info(msg)
    decoded_content_list = []
    for part in msg.walk():
        charset = part.get_content_charset()
        if part.get_content_type() == "application/octet-stream":
            logger.info("found content disposition returning ...")
            continue
        decoded_data = part.get_payload(decode=True)
        if decoded_data and charset is not None:
            utf8decoded = decoded_data.decode(charset)
            decoded_content_list.append(utf8decoded)
    return ' '.join(decoded_content_list)
```
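The "From ..." separator lines in the sample are the mbox convention, so one option is the standard library's `mailbox.mbox` class, which splits such a file into individual messages. A minimal sketch, assuming Python 3 and that every message in the file starts with a `From ` line; `read_all_messages` and `text_of` are hypothetical helper names, and `text_of` reuses the decoding idea from the snippet above:

```python
import mailbox


def read_all_messages(path):
    """Return every message in an mbox-style file as a list of mboxMessage objects."""
    box = mailbox.mbox(path, create=False)  # create=False: fail if the file is missing
    try:
        return list(box)  # each message is parsed into its own in-memory object
    finally:
        box.close()


def text_of(msg):
    """Join the decoded text/* parts of one message, skipping attachments."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        if part.get_filename():  # e.g. an attached .eml file
            continue
        payload = part.get_payload(decode=True)
        if payload:
            charset = part.get_content_charset() or "us-ascii"
            chunks.append(payload.decode(charset, errors="replace"))
    return " ".join(chunks)


# Usage sketch: process the messages one by one
# for msg in read_all_messages("emails.txt"):
#     print(msg["Subject"], text_of(msg)[:80])
```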

Sending reply to existing message using gmail api does not show conversation trail in the gmail inbox

I'm using the Gmail REST API from Python. I construct the message with Python's built-in email library in the following way:
import base64
import email.mime.multipart

def create_message(reply_to=None):  # wrapper name is illustrative
    message = email.mime.multipart.MIMEMultipart('alternative')
    message['from'] = 'Satish <satish#gmail.com>'
    message['to'] = 'Satish1 <satish1#gmail.com>'
    message['subject'] = "Same as reply to message's subject"
    # The API expects the raw message as URL-safe base64 text
    raw_message = {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}
    if reply_to:
        raw_message['threadId'] = reply_to  # thread id of existing conversation
    return raw_message
I send this message with the Gmail REST API like this:
users().messages().send(userId='me', body=raw_message).execute()
The message shows up as part of the thread, which is fine, but the conversation trail is not attached to it.
[The conversation trail is the "..." shown under a message; hovering over it shows "Show trimmed content".]
Any help on this is appreciated.
Thanks in Advance
The trail is just part of the message body; the Gmail API does not build it for you. Get the message you are responding to and put its text below your own, starting each quoted line with >. This is what such a reply looks like in raw form:
MIME-Version: 1.0
Received: by 10.194.176.73 with HTTP; Thu, 11 Feb 2016 07:48:48 -0800 (PST)
In-Reply-To: <CADsZLRyvpU3bVw4MmmqGKTr=4bAAQmrRKj3gABVBWqrr8peoUA#mail.gmail.com>
References: <CADsZLRyvpU3bVw4MmmqGKTr=4bAAQmrRKj3gABVBWqrr8peoUA#mail.gmail.com>
Date: Thu, 11 Feb 2016 16:48:48 +0100
Delivered-To: emtholin#gmail.com
Message-ID: <CADsZLRztKLR0GgUSZxN6+B4pwxZiFi=6Rexq+kBXTYWy1UnojQ#mail.gmail.com>
Subject: Re: Hello my friend
From: Emil Tholin <emtholin#gmail.com>
To: Emil Tholin <emtholin#gmail.com>
Content-Type: multipart/alternative; boundary=001a1130d08c848a3e052b807cea
--001a1130d08c848a3e052b807cea
Content-Type: text/plain; charset=UTF-8
Likewise buddy.
2016-02-11 16:48 GMT+01:00 Emil Tholin <emtholin#gmail.com>:
> Nice to meet you.
>
--001a1130d08c848a3e052b807cea
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">Likewise buddy.</div><div class=3D"gmail_extra"><br><div c=
lass=3D"gmail_quote">2016-02-11 16:48 GMT+01:00 Emil Tholin <span dir=3D"lt=
r"><<a href=3D"mailto:emtholin#gmail.com" target=3D"_blank">emtholin#gma=
il.com</a>></span>:<br><blockquote class=3D"gmail_quote" style=3D"margin=
:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr">N=
ice to meet you.</div>
</blockquote></div><br></div>
--001a1130d08c848a3e052b807cea--
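A minimal sketch of that advice, assuming Python 3, that `original` is the message being answered, already fetched (for example with `format='raw'`) and parsed with `email.policy.default`, and that the returned dict is passed as `body` to `users().messages().send()`. `build_reply` and its `me` parameter are hypothetical. Besides quoting the text, it copies the `In-Reply-To` and `References` headers that the raw reply above carries and keeps a matching `Re:` subject.

```python
import base64
from email.message import EmailMessage


def build_reply(original, reply_text, me, thread_id=None):
    """Build a Gmail API 'send' body replying to `original` with a quoted trail.

    `original` must be an EmailMessage parsed with email.policy.default
    (get_body() does not exist on the legacy compat32 Message class).
    """
    reply = EmailMessage()
    reply["From"] = me
    reply["To"] = str(original["From"])
    subject = str(original["Subject"] or "")
    reply["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject

    # Threading headers, as in the raw Gmail reply shown above
    msg_id = original["Message-ID"]
    if msg_id:
        reply["In-Reply-To"] = str(msg_id)
        reply["References"] = (str(original["References"] or "") + " " + str(msg_id)).strip()

    # Quote the plain-text body of the original, prefixing each line with ">"
    body = original.get_body(preferencelist=("plain",))
    old_text = body.get_content() if body is not None else ""
    quoted = "\n".join("> " + line if line else ">" for line in old_text.splitlines())
    date, sender = original["Date"], original["From"]
    intro = f"On {date}, {sender} wrote:" if date else f"{sender} wrote:"
    reply.set_content(f"{reply_text}\n\n{intro}\n{quoted}\n")

    raw = {"raw": base64.urlsafe_b64encode(reply.as_bytes()).decode()}
    if thread_id:
        raw["threadId"] = thread_id
    return raw
```

Gmail's "Show trimmed content" detection is its own heuristic, so this sketch only reproduces the structure of a normal reply; it does not guarantee how the web UI will collapse it.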

MIME Attachments won't send with Subject Line

I'm having trouble with a bit of code sending an email with attachments AND a Subject line.
# Code excerpt from Oli: http://stackoverflow.com/questions/3362600/how-to-send-email-attachments-with-python
# Emails aren't sending with a subject--need to fix this.
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import COMMASPACE, formatdate
from os.path import basename

def send_mail(self, send_from, send_to, subject, text, files=None, server="localhost"):
    assert isinstance(send_to, list)
    msg = MIMEMultipart(
        Subject=subject,
        From=send_from,
        To=COMMASPACE.join(send_to),
        Date=formatdate(localtime=True)
    )
    msg.attach(MIMEText(text))
    for f in files or []:
        with open(f, "rb") as fil:
            msg.attach(MIMEApplication(
                fil.read(),
                Content_Disposition='attachment; filename="%s"' % basename(f),
                Name=basename(f)
            ))
    smtp = smtplib.SMTP(server)
    smtp.sendmail(send_from, send_to, msg.as_string())
    smtp.close()
This code sends an email fine, but it does not set the 'Subject' header, and the emails it sends have a subject line of "NO SUBJECT". Here's what it shows when I print the first part of the MIME msg:
From nobody Thu Oct 29 16:17:38 2015
Content-Type: multipart/mixed; date="Thu, 29 Oct 2015 16:17:38 +0000";
to="me#email.com";
from="someserver#somewhere.com"; subject="TESTING";
boundary="===============0622475305469306134=="
MIME-Version: 1.0
--===============0622475305469306134==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Here we go, oh! ho! ho!
--===============0622475305469306134==
Content-Type: application/octet-stream; Content- Disposition="attachment;
filename=\"Log_Mill.py\""; Name="Log_Mill.py"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
I might be able to figure it out if I plug away for hours and hours, but I'm hoping to avoid the extra work for such a trivial fix.
Any help is appreciated!
You are passing Subject etc. as keyword arguments to the MIMEMultipart constructor, which turns extra keyword arguments into parameters of the Content-Type header (that is exactly what your printed output shows). The headers you want should be set on the msg itself instead, like this:
msg = MIMEMultipart()
msg['Subject'] = subject
msg['From'] = send_from
msg['To'] = COMMASPACE.join(send_to)
msg['Date'] = formatdate(localtime=True)
The output should then look more like this:
From nobody Thu Oct 29 16:17:38 2015
Date: Thu, 29 Oct 2015 16:17:38 +0000
To: <me#email.com>
From: <someserver#somewhere.com>
Subject: TESTING
Content-Type: multipart/mixed;
boundary="===============0622475305469306134=="
MIME-Version: 1.0
--===============0622475305469306134==
Content-Type: text/plain; .......
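The attachment part has the same problem: in the printed output, `Content_Disposition` ended up as a Content-Type parameter instead of a real `Content-Disposition` header. A sketch of the whole function with both fixes, assuming Python 3; splitting it into a hypothetical `build_mail` (testable without a mail server) and `send_mail` is a design choice, not part of the original code:

```python
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import COMMASPACE, formatdate
from os.path import basename


def build_mail(send_from, send_to, subject, text, files=None):
    """Build a multipart/mixed message with real Subject/From/To/Date headers."""
    assert isinstance(send_to, list)
    msg = MIMEMultipart()  # keyword args here would become Content-Type parameters
    msg["Subject"] = subject
    msg["From"] = send_from
    msg["To"] = COMMASPACE.join(send_to)
    msg["Date"] = formatdate(localtime=True)
    msg.attach(MIMEText(text))
    for f in files or []:
        with open(f, "rb") as fil:
            part = MIMEApplication(fil.read(), Name=basename(f))
        # Set Content-Disposition as a header, not as a constructor keyword
        part.add_header("Content-Disposition", "attachment", filename=basename(f))
        msg.attach(part)
    return msg


def send_mail(send_from, send_to, subject, text, files=None, server="localhost"):
    msg = build_mail(send_from, send_to, subject, text, files)
    with smtplib.SMTP(server) as smtp:  # QUIT is sent when the block exits
        smtp.sendmail(send_from, send_to, msg.as_string())
```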
You could also use a package specialised for writing HTML emails, showing pictures inline and attaching files easily!
The package I'm referring to is yagmail and I'm the developer/maintainer.
import yagmail
yag = yagmail.SMTP('email#email.com', 'email_pwd')
file_names = ['/local/path/f.mp3', '/local/path/f.txt', '/local/path/f.avi']
yag.send('to#email.com', 'Sample subject', contents=['This is text'] + file_names)
That's all there is to it.
Use pip install yagmail to obtain your copy.
Contents can be a list in which you also add text; you can even pass only the file_names as contents. Awesome, no?
It reads each file, magically determines the encoding, and attaches it :)
See the GitHub page for other tricks like passwordless scripts, aliasing and more.

How can I send Inline images in Email with Python/Django? [duplicate]

This question already exists:
How to send Inline images in Email with Python/Django?
I'm trying to send an email with an inline image using Python/Django.
Here is the code showing how I am doing it.
It's still in development, so for now all it is meant to do is send a dummy email message with a picture of a bumble bee embedded in it.
Yet when I receive the email in my Gmail inbox, I see only the following text-based email: the various MIME parts of the email show up in its body as text.
I clicked the "Show Original" button in Gmail and cut-n-pasted the entire email below so you can see what I get.
Can someone suggest what I'm doing wrong here? And a possible solution?
Delivered-To: myemail#gmail.com
Received: by 10.58.189.196 with SMTP id gk4csp207059vec;
Mon, 17 Feb 2014 23:10:53 -0800 (PST)
X-Received: by 10.140.22.145 with SMTP id 17mr38512811qgn.0.1392707452834;
Mon, 17 Feb 2014 23:10:52 -0800 (PST)
Return-Path: <0000014443d53bd9-c1021b39-b43e-4d6f-bb55-0aff6c4b38f5-000000#amazonses.com>
Received: from a8-41.smtp-out.amazonses.com (a8-41.smtp-out.amazonses.com. [54.240.8.41])
by mx.google.com with ESMTP id j50si9661440qgf.137.2014.02.17.23.10.52
for <myemail#gmail.com>;
Mon, 17 Feb 2014 23:10:52 -0800 (PST)
Received-SPF: pass (google.com: domain of 0000014443d53bd9-c1021b39-b43e-4d6f-bb55-0aff6c4b38f5-000000#amazonses.com designates 54.240.8.41 as permitted sender) client-ip=54.240.8.41;
Authentication-Results: mx.google.com;
spf=pass (google.com: domain of 0000014443d53bd9-c1021b39-b43e-4d6f-bb55-0aff6c4b38f5-000000#amazonses.com designates 54.240.8.41 as permitted sender) smtp.mail=0000014443d53bd9-c1021b39-b43e-4d6f-bb55-0aff6c4b38f5-000000#amazonses.com
Return-Path: 0000014443d53bd9-c1021b39-b43e-4d6f-bb55-0aff6c4b38f5-000000#amazonses.com
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Hello World3
From: My Django App <donotrespond#mydjangoapp.com>
To: myemail#gmail.com
Date: Tue, 18 Feb 2014 07:10:51 +0000
Message-ID: <0000014443d53bd9-c1021b39-b43e-4d6f-bb55-0aff6c4b38f5-000000#email.amazonses.com>
X-SES-Outgoing: 2014.02.18-54.240.8.41
Content-Type: multipart/related;
boundary="===============1003274537458441237=="
MIME-Version: 1.0
--===============1003274537458441237==
Content-Type: text/html; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
<p>Hello <img src="cid:myimage" /></p>
--===============1003274537458441237==
Content-Type: image/jpeg
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Id: <myimage>
/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxQTERUUEhIWFBUVFxcVFRQVGBUUFRcYFxUWFhQU
FRUYHCggGRolHRQVITEiJSkrLi4uFx8zODMsNygtLisBCgoKDg0OGhAQGywmICYzLDc3MCwvLCw1
<VERY LARGE PORTION SNIPPED>
BAgQIECAAIGaAsLKmnPVFQECBAgQIECAAAECBAgQIECAAIF0AsLKdCNTMAECBAgQIECAAAECBAgQ
IECAAIGaAsLKmnPVFQECBAgQIECAAAECBAgQIECAAIF0Av8HNFl0J1BnG68AAAAASUVORK5CYII=
--===============5170682983005376168==--
It looks like you have:
multipart/related
-> text/html
-> image/jpeg
I've also had trouble in the past sending email with the top part being multipart/related. Try this instead:
multipart/mixed
-> multipart/related
--> text/html
--> image/jpeg
Also, make sure to set the disposition on the image like this:
img.add_header("Content-Disposition", "inline", filename="myimage")
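A sketch of that structure with the standard library's `email.mime` classes, assuming Python 3 and that the JPEG bytes are already loaded; `build_inline_image_mail` is a hypothetical name, and the `cid:myimage` / `<myimage>` pair mirrors the pasted message. Note that the pasted headers also contain two `Content-Type` lines (text/plain, then multipart/related), which suggests the MIME tree may have ended up inside a text/plain body, so it is worth checking that the fully built message is what actually gets sent.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_inline_image_mail(subject, from_addr, to_addr, jpeg_bytes):
    """multipart/mixed -> multipart/related -> (text/html, image/jpeg)."""
    outer = MIMEMultipart("mixed")
    outer["Subject"] = subject
    outer["From"] = from_addr
    outer["To"] = to_addr

    related = MIMEMultipart("related")
    # The HTML refers to the image by its Content-ID, without angle brackets
    related.attach(MIMEText('<p>Hello <img src="cid:myimage" /></p>', "html"))

    img = MIMEImage(jpeg_bytes, "jpeg")  # explicit subtype, so no content sniffing
    img.add_header("Content-ID", "<myimage>")
    img.add_header("Content-Disposition", "inline", filename="myimage")
    related.attach(img)

    outer.attach(related)
    return outer
```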

JSON string decoding error

I am calling the URL
http://code.google.com/feeds/issues/p/chromium/issues/full/291?alt=json
using urllib2 and decoding the response with the json module:
import json
import urllib2
url = "http://code.google.com/feeds/issues/p/chromium/issues/full/291?alt=json"
request = urllib2.Request(url)
response = urllib2.urlopen(request)
issue_report = json.loads(response.read())
I get the following error:
ValueError: Invalid control character at: line 1 column 1120 (char 1120)
I checked the response headers and got the following:
Content-Type: application/json; charset=UTF-8
Access-Control-Allow-Origin: *
Expires: Sun, 03 Jul 2011 17:38:38 GMT
Date: Sun, 03 Jul 2011 17:38:38 GMT
Cache-Control: private, max-age=0, must-revalidate, no-transform
Vary: Accept, X-GData-Authorization, GData-Version
GData-Version: 1.0
ETag: W/"CUEGQX47eCl7ImA9WxJaFEw."
Last-Modified: Tue, 04 Aug 2009 19:20:20 GMT
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Connection: close
I also tried adding an encoding parameter as follows:
issue_report = json.loads(response.read() , encoding = 'UTF-8')
I still run into the same error.
The feed has raw data from a JPEG in it at that point; the JSON is malformed, so it's not your fault. Report a bug to Google.
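If you need to parse the feed as-is in the meantime, Python's `json` module has a `strict` flag: with `strict=False`, control characters (code points 0-31) are allowed inside JSON strings, which is exactly what "Invalid control character" complains about. A minimal sketch, assuming the rest of the document is well-formed; `parse_feed_json` is a hypothetical helper, and the embedded binary data will still come through as meaningless characters:

```python
import json


def parse_feed_json(text):
    """Parse JSON, retrying leniently if raw control characters appear in strings."""
    try:
        return json.loads(text)
    except ValueError:
        # strict=False only relaxes the control-character check inside strings;
        # any other kind of malformed JSON will still raise here.
        return json.loads(text, strict=False)
```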
You could consider using lxml on the XML version of the feed instead, since the JSON is malformed. Its XPath support makes working with XML pretty straightforward:
import lxml.etree
url = 'http://code.google.com/feeds/issues/p/chromium/issues/full/291'
doc = lxml.etree.parse(url)
ns = {'issues': 'http://schemas.google.com/projecthosting/issues/2009'}
issues = doc.xpath('//issues:*', namespaces=ns)
It's fairly easy to manipulate the elements, for instance to strip the namespace from the tags and convert them to a dict:
>>> dict((x.tag[len(ns['issues'])+2:], x.text) for x in issues)
{'closedDate': '2009-08-04T19:20:20.000Z',
'id': '291',
'label': 'Area-BrowserUI',
'stars': '13',
'state': 'closed',
'status': 'Verified'}
