I'm having a problem using non-ASCII characters in a file I'm trying to send as an attachment using requests.
The exception is raised in the httplib module, in the _send_output function.
Here is my code:
response = requests.post(url="https://api.mailgun.net/v2/%s/messages" % utils.config.mailDomain,
                         auth=("api", utils.config.mailApiKey),
                         data={
                             "from": me,
                             "to": recepients,
                             "subject": subject,
                             "html" if html else "text": message
                         },
                         files=[('attachment', open(f)) for f in attachments] if attachments and len(attachments) else []
                         )
The problem is with the files parameter, which contains non-ASCII data (Hebrew).
The exception is:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 673: ordinal not in range(128)
The open() function has an encoding parameter, used like f = open('t.txt', encoding='utf-8'), which accepts a variety of values as outlined in the docs. Find out which encoding scheme your data uses (probably UTF-8) and see if opening with that encoding works.
Don't use the encoding parameter to open the files, because you want to open them as binary data. The calls to open should look like open(f, 'rb'). The requests documentation shows examples like this on purpose and even documents this behaviour.
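Applied to the snippet above, the files argument would then look something like this (a sketch; the rest of the call stays the same, and the empty-list fallback can be simplified):

files=[('attachment', open(f, 'rb')) for f in attachments] if attachments else []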
Related
I am trying to output my model as a CSV file. It works fine with a small amount of data in the model but is very slow with large data, and secondly there is an error when outputting the model as CSV. The logic I am using is:
def some_view(request):
    # Create the HttpResponse object with the appropriate CSV header.
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="news.csv"'
    writer = csv.writer(response)
    news_obj = News.objects.using('cms').all()
    for item in news_obj:
        #writer.writerow([item.newsText])
        writer.writerow([item.userId.name])
    return response
and the error I am facing is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-6: ordinal not in range(128)
and further it says:
The string that could not be encoded/decoded was: عبدالله الحذ
Replace the line
writer.writerow([item.userId.name])
with:
writer.writerow([item.userId.name.encode('utf-8')])
Before saving a unicode string to a file you must encode it in some encoding. Most systems use UTF-8 by default, so it's a safe choice.
From the error, the csv writer is encoding the content as ASCII, so any character outside ASCII fails. One option is to drop those characters:
>>> u'aあä'.encode('ascii', 'ignore')
'a'
You can avoid this error by ignoring the non-ASCII characters:
writer.writerow([item.userId.name.encode('ascii', 'ignore')])
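Note that 'ignore' silently drops everything ASCII cannot represent; encoding to UTF-8, as in the answer above, keeps the text intact. A quick comparison using the name from the error message:

# -*- coding: utf-8 -*-
name = u'عبدالله الحذ'
print(repr(name.encode('ascii', 'ignore')))  # ' ' -- only the ASCII space survives
print(repr(name.encode('utf-8')))            # '\xd8\xb9...' -- all characters kept as UTF-8 bytes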
On Amazon SES, I have a rule to save incoming emails to an S3 bucket. Amazon saves these in MIME format.
These emails have a .txt attachment that shows up in the MIME file as content-type=text/plain, Content-Disposition=attachment ... .txt, and Content-Transfer-Encoding=quoted-printable or base64.
I am able to parse it fine using Python.
I have a problem decoding the content of the .txt attachment when it is compressed (i.e., content-type: application/zip), as if the encoding weren't base64.
My code:
import base64
s = unicode(base64.b64decode(attachment_content), "utf-8")
throws the error:
Traceback (most recent call last):
File "<input>", line 796, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 10: invalid continuation byte
Below are the first few lines of the "base64" string in attachment_content, which by the way has length 53683 plus "==" at the end, and I thought the length of a base64 string should be a multiple of 4 (??).
So maybe the decoding is failing because the compression changes attachment_content and I need some other operation before/after decoding it? I really have no idea.
UEsDBBQAAAAIAM9Ah0otgkpwx5oAADMTAgAJAAAAX2NoYXQudHh0tL3bjiRJkiX23sD+g0U3iOxu
REWGu8c1l2Ag8lKd0V2ZWajM3kLuC6Hubu5uFeZm3nYJL6+n4T4Ry8EOdwCSMyQXBRBLgMQ+7CP5
QPBj5gdYn0CRI6JqFxWv7hlyszursiJV1G6qonI5cmQyeT6dPp9cnCaT6Yvp5Yvz6xfJe7cp8P/k
1SbL8xfJu0OSvUvr2q3TOnFVWjxrknWZFeuk2VRlu978s19MRvNMrHneOv51SOZlGUtMLYnfp0nd
...
I have also tried using "latin-1", but got gibberish.
The problem was that, after conversion, I was dealing with a file in zip format, starting with "PK \x03 \x04 \x3c \xa \x0c ...", and I needed to unzip it before decoding it to UTF-8 unicode.
This code worked for me:
import email
import base64
import zipfile
from cStringIO import StringIO

# Parse results from email
received_email = email.message_from_string(email_text)
for part in received_email.walk():
    c_type = part.get_content_type()
    c_enco = part.get('Content-Transfer-Encoding')
    attachment_content = part.get_payload()
    if c_enco == 'base64':
        decoded_file = base64.b64decode(attachment_content)
        print("File decoded from base64")
        if c_type == "application/zip":
            zfp = zipfile.ZipFile(StringIO(decoded_file), "r")
            unzipped_list = zfp.open(zfp.namelist()[0]).readlines()
            decoded_file = "".join(unzipped_list)
            print('And un-zipped')
        result = unicode(decoded_file, "utf-8")
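As a sanity check, both clues from the question line up once the "==" padding is counted (a small sketch using the first characters of the sample above; valid base64, padding included, always has a length that is a multiple of 4):

import base64

sample = "UEsDBBQA"  # start of the attachment content shown above
print(len(sample) % 4 == 0)                # True
print(repr(base64.b64decode(sample)[:4]))  # 'PK\x03\x04' -- the magic bytes of a zip archive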
I was trying to read a file in Python 2.7, and it was read perfectly. The problem I have is that when I execute the same program in Python 3.4, this error appears:
'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte'
Also, when I run the program on Windows (with Python 3.4), the error doesn't appear. The first line of the document is:
Codi;Codi_lloc_anonim;Nom
and the code of my program is:
def lectdict(filename, colkey, colvalue):
    f = open(filename, 'r')
    D = dict()
    for line in f:
        if line == '\n': continue
        D[line.split(';')[colkey]] = D.get(line.split(';')[colkey], []) + [line.split(';')[colvalue]]
    f.close()
    return D

Traduccio = lectdict('Noms_departaments_centres.txt', 1, 2)
In Python2,
f = open(filename,'r')
for line in f:
reads lines from the file as bytes.
In Python3, the same code reads lines from the file as strings. Python3
strings are what Python2 calls unicode objects. These are bytes decoded
according to some encoding. The default encoding in Python3 is utf-8.
The error message
'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte'
shows Python3 is trying to decode the bytes as utf-8. Since there is an error, the file apparently does not contain utf-8 encoded bytes.
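You can reproduce the failure by hand. The byte 0xf2 starts a multi-byte sequence in UTF-8, so unless valid continuation bytes follow, decoding fails; in a single-byte encoding it is simply a character (a minimal sketch, using latin-1 just as an example):

>>> b'\xf2abc'.decode('utf-8')    # what Python3's open() attempts by default
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte
>>> b'\xf2abc'.decode('latin-1')  # the same bytes under a single-byte encoding
'òabc'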
To fix the problem you need to specify the correct encoding of the file:
with open(filename, encoding=enc) as f:
    for line in f:
If you do not know the correct encoding, you could run this program to simply
try all the encodings known to Python. If you are lucky there will be an
encoding which turns the bytes into recognizable characters. Sometimes more
than one encoding may appear to work, in which case you'll need to check and
compare the results carefully.
# Python3
import pkgutil
import os
import encodings

def all_encodings():
    modnames = set(
        [modname for importer, modname, ispkg in pkgutil.walk_packages(
            path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

filename = '/tmp/test'
encodings = all_encodings()
for enc in encodings:
    try:
        with open(filename, encoding=enc) as f:
            # print the encoding and the first 500 characters
            print(enc, f.read(500))
    except Exception:
        pass
OK, I did what @unutbu suggested. The result was a lot of encodings; one of them was cp1250, and for that reason I changed:
f = open(filename,'r')
to
f = open(filename,'r', encoding='cp1250')
as @triplee suggested. And now I can read my files.
In my case I can't change the encoding because my file is really UTF-8 encoded. But some rows are corrupted and cause the same error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 7092: invalid continuation byte
My solution was to open the file in binary mode:
open(filename, 'rb')
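If you still need text rather than bytes, another option (a sketch, assuming the file is mostly valid UTF-8) is to decode manually and substitute the corrupt bytes:

with open(filename, 'rb') as f:
    raw = f.read()
# corrupt byte sequences become U+FFFD instead of raising UnicodeDecodeError
text = raw.decode('utf-8', errors='replace')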
I am creating a log file using Python's logging library. When I try to read it with Python and print it on an HTML page (using Flask), I get:
<textarea cols="80" rows="20">{% for line in log %}{{line}}{% endfor %}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 36: ordinal not in range(128)
I guess this has to do with the log file being encoded in some other encoding, but which?
This is the line setting the log file if it helps:
fileLogger = logging.handlers.TimedRotatingFileHandler(filename = 'log.log', when = 'midnight', backupCount = 30)
How do I solve this problem?
The logging package file handlers will encode any Unicode object you send to it to UTF-8, unless you specified a different encoding.
Use io.open() to read the file as UTF-8 data again, you'll get unicode objects again, ideal for Jinja2:
import io
log = io.open('log.log', encoding='utf8')
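In the Flask view you can then hand the lines straight to the template from the question (a minimal sketch; the route and template name are assumptions):

import io
from flask import render_template

@app.route('/log')
def show_log():
    with io.open('log.log', encoding='utf8') as f:
        return render_template('log.html', log=f.readlines())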
You could also specify a different encoding for the TimedRotatingFileHandler, but UTF-8 is an excellent default. Use the encoding keyword argument if you want to pick a different one:
fileLogger = logging.handlers.TimedRotatingFileHandler(
filename='log.log', when='midnight', backupCount=30,
encoding='Latin1')
I'm not familiar with Flask, but if you can grab the contents of the log as a string, you can encode it to UTF-8 like so:
string = string.encode('utf-8') # string is the log's contents, now in utf-8
I've been trying to scrape data from a website and write the data I find out to a file. More than 90% of the time I don't run into Unicode errors, but when the data contains characters such as "Burger King®, Hans Café" it fails to write them to the file, so my error handling prints them to the screen as is, without any further errors.
I've tried the encode and decode functions and the various encodings but to no avail.
Please find an excerpt of the current code that I've written below:
import urllib2,sys
import re
import os
import urllib
import string
import time
from BeautifulSoup import BeautifulSoup,NavigableString, SoupStrainer
from string import maketrans
import codecs
f=codecs.open('alldetails7.txt', mode='w', encoding='utf-8', errors='replace')
...
soup5 = BeautifulSoup(html5)
enc_s5 = soup5.originalEncoding
for company in iter(soup5.findAll(height="20px")):
    stream = ""
    count_detail = 1
    for tag in iter(company.findAll('td')):
        if count_detail > 1:
            stream = stream + tag.text.replace(u',', u';')
            if count_detail < 4:
                stream = stream + ","
        count_detail = count_detail + 1
    stream.strip()
    try:
        f.write(str(stnum)+","+br_name_addr+","+stream.decode(enc_s5)+os.linesep)
    except:
        print "Unicode error ->"+str(storenum)+","+branch_name_address+","+stream
Your f.write() line doesn't make sense to me: stream will be a unicode, since it's built indirectly from tag.text and BeautifulSoup gives you Unicode, so you shouldn't call decode on stream. (You use decode to turn a str with a particular character encoding into a unicode.) You've opened the file for writing with codecs.open() and told it to use UTF-8, so you can just write() a unicode and that should work. So, instead I would try:
f.write(unicode(stnum)+br_name_addr+u","+stream+os.linesep)
... or, supposing that instead you had just opened the file with f=open('alldetails7.txt','w'), you would do:
line = unicode(stnum)+br_name_addr+u","+stream+os.linesep
f.write(line.encode('utf-8'))
Have you checked the encoding of the file you're writing to, and made sure the characters can be represented in that encoding? Try setting the character encoding to UTF-8 or something else explicitly to make the characters show up.
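For example (a sketch of that suggestion, reusing the codecs.open call from the question):

# -*- coding: utf-8 -*-
import codecs

# an explicit encoding lets characters like ® and é be written without errors
with codecs.open('alldetails7.txt', mode='w', encoding='utf-8') as f:
    f.write(u'Burger King®, Hans Café\n')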