I want to decode quoted-printable encoded strings in Python, but I am stuck.
I fetch certain mails from my Gmail account with the following code:
import imaplib
import email
import quopri

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mail@gmail.com', '*******')
mail.list()
mail.select('"[Gmail]/All Mail"')
typ, data = mail.search(None, 'SUBJECT', '"{}"'.format('123456'))
print(data[0].split())
for e_mail in data[0].split():
    typ, data = mail.fetch('{}'.format(e_mail.decode()), '(RFC822)')
    raw_mail = data[0][1]
    email_message = email.message_from_bytes(raw_mail)
    if email_message.is_multipart():
        for part in email_message.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload()
                to = email_message['To']
                utf = quopri.decodestring(to)
                text = utf.decode('utf-8')
                print(text)
                ...
If I print 'to', for example, this is the result when the To header contains characters like é, á, ó:
=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=
I can decode the 'body' quoted-printable encoded string successfully using the quopri library as such:
quopri.decodestring(sometext).decode('utf-8')
But the same logic doesn't work for other parts of the e-mail, such as the To, From and Subject headers.
Does anyone have a hint?
The header string you have is not plain quoted-printable, so quopri alone will not decode it. It is an RFC 2047 encoded word: a charset plus data that is either base64 (marked ?B?, as here) or quoted-printable (marked ?Q?). You can decode it with the standard library:
from email.header import decode_header

result = decode_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=')
# result is a list of (decoded_bytes, charset) tuples
for data, encoding in result:
    print(data.decode(encoding))
# outputs: Péter Petőcz
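A header can also mix plain and encoded parts, in which case the charset of a part comes back as None and data.decode(encoding) fails. A minimal sketch that sidesteps this with make_header from the same module; the helper name decode_mime_header is just an illustration, not part of the standard library:

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Decode an RFC 2047 header value (To, From, Subject...) into a str."""
    if value is None:
        # The header is missing from the message
        return None
    # make_header reassembles the (bytes, charset) parts, including
    # parts whose charset is None
    return str(make_header(decode_header(value)))


print(decode_mime_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?='))
```

In the question's loop this would be called as decode_mime_header(email_message['To']).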
You are trying to decode latin characters using utf-8. The output you are getting is base64. It reads:
No printable characters found, try another source charset, or upload your data as a file for binary decoding.
Give this a try.
Python: Converting from ISO-8859-1/latin1 to UTF-8
This solves it:
from email.header import decode_header

def mail_header_decoder(header):
    if header is not None:
        mail_header_decoded = decode_header(header)
        # Collect the charset of every part; None means the part was not encoded
        charsets = [header_part[1] for header_part in mail_header_decoded]
        if all(charset is None for charset in charsets):
            # Nothing was encoded, so return the header unchanged
            return header
        header_new = []
        for header_part in mail_header_decoded:
            # Decode with the part's own charset, defaulting to UTF-8
            header_new.append(header_part[0].decode(header_part[1] or 'utf-8'))
        return ''.join(header_new)  # convert list to string
I went to this website: https://jwt.io/#debugger-io
I took the sample data given there and passed it to my code.
But the encoded values I get do not match what the website shows.
Since I am doing something wrong here, I cannot generate the signature in a valid format.
I need to write a program for JWT verification without using libraries such as PyJWT.
Here is my code:
import base64
import hmac
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890", "name": "John Doe", "iat": 1516239022}
header = base64.urlsafe_b64encode(bytes(str(header), 'utf-8'))
payload = base64.urlsafe_b64encode(bytes(str(payload), 'utf-8'))
print(header)
print(payload)
signature = hmac.new(bytes('hi', 'utf-8'), header + b'.' + payload, digestmod='sha256').hexdigest()
print(signature)
Outputs
There are 3 things that need to be changed in your code.
1. Your header and payload are not valid JSON. If you decode your Base64-encoded header eydhbGcnOiAnSFMyNTYnLCAndHlwJzogJ0pXVCd9, you get
{'alg': 'HS256', 'typ': 'JWT'}
but it should be (with " instead of ')
{"alg": "HS256", "typ": "JWT"}
This happens because str() on a dict produces Python's repr, not JSON.
2. The function base64.urlsafe_b64encode produces output that can still contain '=' padding. The padding has to be removed.
3. You create the signature with .hexdigest(), which produces a hex-ASCII string. Instead you need to use .digest() to get binary output and then Base64URL-encode the result.
The code below is a corrected version of your code which produces a JWT string that can be verified on https://jwt.io.
import base64
import hmac
header = '{"alg":"HS256","typ":"JWT"}'
payload = '{"sub":"1234567890","name":"John Doe","iat":1516239022}'
header = base64.urlsafe_b64encode(bytes(str(header), 'utf-8')).decode().replace("=", "")
payload = base64.urlsafe_b64encode(bytes(str(payload), 'utf-8')).decode().replace("=", "")
print(header)
print(payload)
signature = hmac.new(bytes('hi', 'utf-8'), bytes(header + '.' + payload, 'utf-8'), digestmod='sha256').digest()
sigb64 = base64.urlsafe_b64encode(bytes(signature)).decode().replace("=", "")
print(sigb64)
token = header + '.' + payload + '.' + sigb64
print(token)
As a sidenote: The secret that you use to create the HMAC should have a length of at least 256 bits, even if the very short key is accepted. Some libs enforce a minimum key length.
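Since the goal was verification without PyJWT, here is a rough sketch of the reverse direction using only the standard library. The function names (b64url, sign_hs256, verify_hs256) are made up for this example, it only handles HS256, and it does not check claims such as exp, so treat it as a starting point rather than a complete validator:

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64URL-encode and strip the '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then Base64URL-decode."""
    return base64.urlsafe_b64decode(text + '=' * (-len(text) % 4))


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    # json.dumps with compact separators gives the same JSON text as jwt.io
    parts = [b64url(json.dumps(obj, separators=(',', ':')).encode('utf-8'))
             for obj in (header, payload)]
    signing_input = '.'.join(parts).encode('ascii')
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return '.'.join(parts + [b64url(signature)])


def verify_hs256(token: str, secret: bytes) -> bool:
    try:
        header_b64, payload_b64, signature_b64 = token.split('.')
        header = json.loads(b64url_decode(header_b64))
        signature = b64url_decode(signature_b64)
        signing_input = (header_b64 + '.' + payload_b64).encode('ascii')
    except ValueError:
        # Wrong number of parts, bad Base64, bad JSON or non-ASCII input
        return False
    if not isinstance(header, dict) or header.get('alg') != 'HS256':
        return False
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(signature, expected)


token = sign_hs256({"alg": "HS256", "typ": "JWT"},
                   {"sub": "1234567890", "name": "John Doe", "iat": 1516239022},
                   b'hi')
print(token)
print(verify_hs256(token, b'hi'))
```

The signature is recomputed over the first two parts exactly as received, so the payload is never re-serialized during verification.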
I tried to adapt this script I found by searching Google.
It was working perfectly with the emails I was receiving previously, as it extracted the "From" field directly and I didn't get the error.
Here is what my code looks like:
#!/usr/bin/python
import imaplib
import sys
import email
import re

#FOLDER=sys.argv[1]
FOLDER = 'folder'
LOGIN = 'login@gmail.com'
PASSWORD = 'password'
IMAP_HOST = 'imap.gmail.com'  # Change this according to your provider

email_list = []
email_unique = []

mail = imaplib.IMAP4_SSL(IMAP_HOST)
mail.login(LOGIN, PASSWORD)
mail.select(FOLDER)
result, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
for i in id_list:
    typ, data = mail.fetch(i, '(RFC822)')
    for response_part in data:
        if isinstance(response_part, tuple):
            msg = email.message_from_string(response_part[1])
            sender = msg['reply-to'].split()[0]
            address = re.sub(r'[<>]', '', sender)
            # Ignore any occurrences of own email address and add to list
            if not re.search(r'' + re.escape(LOGIN), address) and address not in email_list:
                email_list.append(address)
                print(address)
Instead of messing around with string splitting and slicing, the correct approach is to use parseaddr from the email.utils module in the standard library. It correctly handles the various legal address formats in email headers.
Some examples:
>>> from email.utils import parseaddr
>>> parseaddr("sally@foo.com")
('', 'sally@foo.com')
>>> parseaddr("<sally@foo.com>")
('', 'sally@foo.com')
>>> parseaddr("Sally <sally@foo.com>")
('Sally', 'sally@foo.com')
>>> parseaddr("Sally Smith <sally@foo.com>")
('Sally Smith', 'sally@foo.com')
Also, you shouldn't assume that emails have a Reply-To header. Many do not.
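Putting both points together, something along these lines might work. It is only a sketch: the helper name sender_address and the choice to fall back to From are my own assumptions, not something the original script did:

```python
from email import message_from_string
from email.utils import parseaddr


def sender_address(msg):
    """Return the bare address from Reply-To, falling back to From."""
    # msg.get() returns None instead of failing when the header is absent
    header = msg.get('Reply-To') or msg.get('From') or ''
    _name, address = parseaddr(header)
    return address


sample = message_from_string("From: Sally Smith <sally@foo.com>\n\nHello")
print(sender_address(sample))
```

An empty string comes back when neither header is present, so the caller can skip that message.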
I'm trying to save Unicode data to an external web service.
When I try to save æ-ø-å, it gets saved as æ-ø-Ã¥ in the external system.
Edit:
(My first name value is Jørn; the value from Django is J\\xf8rn.)
firstname.value=user_firstname = Jørn
Here are my results if I try to use encode:
firstname.value=user_firstname.encode('ascii', 'replace') = J?rn
firstname.value=user_firstname.encode('ascii', 'xmlcharrefreplace') = J&#248;rn
firstname.value=user_firstname.encode('ascii', 'backslashreplace') = J\xf8rn
firstname.value=user_firstname.encode('ascii', 'ignore') = I get a Unicode error using ignore.
My form for updating a user:
def show_userform(request):
    if request.method == 'POST':
        form = UserForm(request.POST, request.user)
        if form.is_valid():
            u = UserProfile.objects.get(username = request.user)
            firstname = form.cleaned_data['first_name']
            lastname = form.cleaned_data['last_name']
            tasks.update_webservice.delay(user_firstname=firstname, user_lastname=lastname)
            return HttpResponseRedirect('/thank-you/')
    else:
        form = UserForm(instance=request.user) # An unbound form
    return render(request, 'myapp/form.html', {
        'form': form,
    })
Here is my task:
from suds.client import Client

@task()
def update_webservice(user_firstname, user_lastname):
    membermap = client.factory.create('ns2:Map')
    firstname = client.factory.create('ns2:mapItem')
    firstname.key = "Firstname"
    firstname.value = user_firstname
    lastname = client.factory.create('ns2:mapItem')
    lastname.key = "Lastname"
    lastname.value = user_lastname
    membermap.item.append(firstname)
    membermap.item.append(lastname)
    d = dict(CustomerId='xxx', Password='xxx', PersonId='xxx', ContactData=membermap)
    try:
        # Send updates to SetPerson function
        result = client.service.SetPerson(**d)
    except WebFault as e:
        print(e)
What do I need to do to make the data save correctly?
Your external system is interpreting your UTF-8 as if it were Latin-1, or maybe Windows-1252. That's bad.
Encoding or decoding ASCII is not going to help. Your string is definitely not plain ASCII.
If you're lucky, it's just that you're missing some option in that web service's API, with which you could tell it that you're sending it UTF-8.
If not, you've got quite a maintenance headache on your hands, but you can still fix what you get back. The web service took the string you encoded as UTF-8 and decoded it as Latin-1, so you just need to do the exact reverse of that:
user_firstname = user_firstname.encode('latin-1').decode('utf-8')
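To see why the reverse works, the round trip can be reproduced locally. This is only an illustration; the function name repair_mojibake is invented, and it assumes the far end really did decode UTF-8 bytes as Latin-1:

```python
def repair_mojibake(text, wrong_encoding='latin-1'):
    """Undo a UTF-8 string that was mistakenly decoded as another encoding."""
    try:
        return text.encode(wrong_encoding).decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it alone
        return text


# Simulate what the web service does: UTF-8 bytes read as Latin-1
garbled = 'Jørn'.encode('utf-8').decode('latin-1')
print(garbled)
print(repair_mojibake(garbled))
```

If the service turns out to use Windows-1252 instead, passing wrong_encoding='cp1252' would be the equivalent guess.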
Use the encode method of str and the decode method of bytes.
For example:
x = "this is a test"   # a text string (str)
x = x.encode("utf-8")  # now UTF-8 encoded bytes
x = x.decode("utf-8")  # decoded back to a text string
I am trying to obtain email ids and then fetch all of them. How do I do this? Thanks!
The following is my code:
import imaplib
import re

user = 'user'
pwd = 'password'
imap_server = imaplib.IMAP4_SSL('imap.gmail.com', 993)
imap_server.login(user, pwd)
imap_server.select('Inbox')
typ, response = imap_server.search(None, '(SUBJECT "Hello")')
response = str(response[0])
response_re = re.compile(r'\d+')
response_pat = re.findall(response_re, response)
for i in response_pat:
    results, datas = imap_server.fetch(i, "(RFC822)")
for i in datas:
    print(i)
This still prints only one value of datas, even though I have iterated through a list of multiple values.
You made a mistake with the command. It should be RFC822 instead of RCF822. Just change one line of your code, from
results, datas = imap_server.fetch(i, "(RCF822)")
to
results, datas = imap_server.fetch(i, "(RFC822)")
Also, don't use a regex when plain string methods will do. Keep response as the list returned by search (skip the str() conversion and the regex) and loop over the split ids:
for i in response[0].split():
    results, datas = imap_server.fetch(i, "(RFC822)")
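The id splitting can be pulled into a small helper so that an empty search result is handled as well. A sketch, assuming Python 3 where search returns a list holding one bytes string; the name message_ids is made up:

```python
def message_ids(search_data):
    """Split the data list returned by IMAP4.search() into message id strings."""
    if not search_data or not search_data[0]:
        # No matches: the server sent an empty id list
        return []
    return [num.decode('ascii') for num in search_data[0].split()]


print(message_ids([b'1 2 3']))
```

Each id can then be passed to imap_server.fetch(num, "(RFC822)"), and the fetched data has to be used inside that same loop, otherwise only the last message is kept.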
I'd like to fetch the whole message from an IMAP4 server.
In the Python docs I found this bit of code that works:
>>> t, data = M.fetch('1', '(RFC822)')
>>> body = data[0][1]
I'm wondering if I can always trust that data[0][1] returns the body of the message. When I ran 'RFC822.SIZE' I got just a string instead of a tuple.
I've skimmed through RFC 1730 but I wasn't able to figure out the proper response structure for 'RFC822'. It is also hard to tell the structure of the fetch result from the imaplib documentation.
Here is what I'm getting when fetching RFC822:
('OK', [('1 (RFC822 {858569}', 'body of the message', ')')])
But when I fetch RFC822.SIZE I'm getting:
('OK', ['1 (RFC822.SIZE 847403)'])
How should I properly handle the data[0] list?
Can I trust that when it is a list of tuples, the tuples have exactly 3 parts and the second part is the payload?
Or maybe you know a better library for IMAP4?
No... imaplib is a pretty good library, it's imap that's so unintelligible.
You may wish to check that t == 'OK', but data[0][1] works as expected for as much as I've used it.
Here's a quick example I use to extract signed certificates I've received by email, not bomb-proof, but suits my purposes:
import getpass, os, imaplib, email
from OpenSSL.crypto import load_certificate, FILETYPE_PEM

def getMsgs(servername="myimapserverfqdn"):
    usernm = getpass.getuser()
    passwd = getpass.getpass()
    subject = 'Your SSL Certificate'
    conn = imaplib.IMAP4_SSL(servername)
    conn.login(usernm, passwd)
    conn.select('Inbox')
    typ, data = conn.search(None, '(UNSEEN SUBJECT "%s")' % subject)
    for num in data[0].split():
        typ, data = conn.fetch(num, '(RFC822)')
        # On Python 3 data[0][1] holds the raw message as bytes
        msg = email.message_from_bytes(data[0][1])
        typ, data = conn.store(num, '-FLAGS', '\\Seen')
        yield msg

def getAttachment(msg, check):
    for part in msg.walk():
        if part.get_content_type() == 'application/octet-stream':
            if check(part.get_filename()):
                return part.get_payload(decode=1)

if __name__ == '__main__':
    for msg in getMsgs():
        payload = getAttachment(msg, lambda x: x.endswith('.pem'))
        if not payload:
            continue
        try:
            cert = load_certificate(FILETYPE_PEM, payload)
        except Exception:
            cert = None
        if cert:
            cn = cert.get_subject().commonName
            filename = "%s.pem" % cn
            if not os.path.exists(filename):
                # The decoded payload is bytes, so write in binary mode
                with open(filename, 'wb') as f:
                    f.write(payload)
                print("Writing to %s" % filename)
            else:
                print("%s already exists" % filename)
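On the question of how far data[0][1] can be trusted: as far as I can tell, imaplib hands back each literal (such as a message body) as a tuple of envelope and payload, and everything else (such as the RFC822.SIZE reply or a closing parenthesis) as a plain bytes item. A more defensive sketch under that assumption, with an invented helper name:

```python
def fetch_literals(data):
    """Return the literal payloads from the data list of an IMAP4.fetch() reply."""
    # Tuples carry (envelope, payload); plain bytes items carry no literal
    return [item[1] for item in data
            if isinstance(item, tuple) and len(item) >= 2]


# Shapes modelled on the two replies shown in the question
print(fetch_literals([(b'1 (RFC822 {5}', b'hello'), b')']))
print(fetch_literals([b'1 (RFC822.SIZE 847403)']))
```

For a single-message RFC822 fetch the first element of the returned list is the raw message; an empty list means the reply held no literal at all.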
The IMAPClient package is a fair bit easier to work with. From the description:
Easy-to-use, Pythonic and complete IMAP client library.
Try my package:
https://pypi.org/project/imap-tools/
example:
from imap_tools import MailBox

# get list of email bodies from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'password', 'INBOX') as mailbox:
    bodies = [msg.text or msg.html for msg in mailbox.fetch()]
Features:
Parsed email message attributes
Query builder for searching emails
Work with emails in folders (copy, delete, flag, move, append)
Work with mailbox folders (list, set, get, create, exists, rename, delete, status)
No dependencies
This was my solution to extract the useful bits of information. It's been reliable so far:
import datetime
import email
import email.header
import email.utils
import imaplib

EMAIL_ACCOUNT = "your@gmail.com"
PASSWORD = "your password"

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(EMAIL_ACCOUNT, PASSWORD)
mail.list()
mail.select('inbox')
result, data = mail.uid('search', None, "UNSEEN") # (ALL/UNSEEN)
i = len(data[0].split())

for x in range(i):
    latest_email_uid = data[0].split()[x]
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    # result, email_data = conn.store(num,'-FLAGS','\\Seen')
    # this might work to set flag to seen, if it doesn't already
    raw_email = email_data[0][1]
    raw_email_string = raw_email.decode('utf-8')
    email_message = email.message_from_string(raw_email_string)

    # Header Details
    date_tuple = email.utils.parsedate_tz(email_message['Date'])
    if date_tuple:
        local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
        local_message_date = "%s" %(str(local_date.strftime("%a, %d %b %Y %H:%M:%S")))
    email_from = str(email.header.make_header(email.header.decode_header(email_message['From'])))
    email_to = str(email.header.make_header(email.header.decode_header(email_message['To'])))
    subject = str(email.header.make_header(email.header.decode_header(email_message['Subject'])))

    # Body details
    for part in email_message.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_payload(decode=True)
            file_name = "email_" + str(x) + ".txt"
            # The with block closes the file even if the write fails
            with open(file_name, 'w') as output_file:
                output_file.write("From: %s\nTo: %s\nDate: %s\nSubject: %s\n\nBody: \n\n%s" %(email_from, email_to,local_message_date, subject, body.decode('utf-8')))
        else:
            continue
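One thing to watch in the script above is that the body is always decoded as UTF-8, which can fail for mails sent in another charset. A possible variation, offered as a sketch only (the helper name plain_text_body and the UTF-8 fallback are my own choices), reads the charset the part declares:

```python
import email


def plain_text_body(msg):
    """Return the first text/plain part as str, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            # decode=True undoes quoted-printable or base64 and returns bytes
            payload = part.get_payload(decode=True)
            # Use the charset declared in Content-Type, assuming UTF-8 if absent
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return None


raw = (b"Content-Type: text/plain; charset=iso-8859-1\r\n"
       b"Content-Transfer-Encoding: quoted-printable\r\n"
       b"\r\n"
       b"J=F8rn\r\n")
print(plain_text_body(email.message_from_bytes(raw)).strip())
```

This also removes the need to call quopri by hand on the body, since get_payload(decode=True) already handles the transfer encoding.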