I'm fetching JSON with Requests from an API (using Python 3.5), and when I try to print (or use) the JSON, whether via response.text, json.loads(...) or response.json(), I get a UnicodeEncodeError.
print(response.text)
UnicodeEncodeError: 'ascii' codec can't encode character '\xc5' in position 676: ordinal not in range(128)
The JSON contains an array of dictionaries with country names, and some of them contain special characters, e.g. (just one dictionary from the byte string, for example):
b'[{\n "name" : "\xc3\x85land Islands"\n}]'
I have no idea why there is an encoding problem, and also why "ascii" is used when Requests detects a UTF-8 encoding (and even setting it manually to UTF-8 doesn't change anything).
Edit 2: The problem was Microsoft Visual Studio Code 1.4. It wasn't able to print the characters.
If your code is running within VS Code, then it sounds like Python can't work out the encoding of the built-in console, so it defaults to ASCII. If you try to print anything non-ASCII, Python throws an error rather than printing text that won't display.
You can force Python's encoding by using the PYTHONIOENCODING environment variable. Set it within the run configuration for the script.
Depending on VS Code's console, you may get away with:
PYTHONIOENCODING=utf-8
or you may have to use a typical 8-bit charset like:
PYTHONIOENCODING=windows-1252
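As a quick sanity check, you can ask Python what encoding it actually picked for the console (a minimal sketch; run it from the same console that shows the error):

import sys

# If this prints 'ascii' (or None), print() will choke on non-ASCII characters;
# once PYTHONIOENCODING takes effect it should report the value you set.
print(sys.stdout.encoding)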
Related
I'm getting the error below when I execute a Python script from Jenkins:
File "/export/app-33-1/jenkins/w/ee4a092a/install/src/linux-amd64-gcc_4_4-release/bin/eat2/eat.py", line 553, in _runtest
print('ERROR:' + msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u0447' in position 315: ordinal not in range(128)
Where exactly does it get the ascii encoding from? I have changed the default encoding of Python, of the Jenkins master and slave processes, and of the systems themselves.
I even added # coding: utf-8 at the start of the script, but it didn't help.
It's not only about printing strings to the console: my code also accesses files whose paths contain Russian characters, so everything fails.
When I run the same script manually from the Linux console, everything works.
Any idea what could be the solution here?
Contrary to widespread belief, the default encoding for the built-in open() function, as well as for the sys.std* streams (print() uses sys.stdout), is not always UTF-8 in Python 3. It might be on one machine but not on another, because it's platform-dependent.
From the docs for sys.stdin/stdout/stderr:
These streams are regular text files like those returned by the open() function. Their parameters are chosen as follows:
The character encoding is platform-dependent. Non-Windows platforms use the locale encoding [...]
And later on:
Under all platforms, you can override the character encoding by setting the PYTHONIOENCODING environment variable before starting Python [...]
Note that there are some exceptions for Windows.
For files opened with open(), you can easily take control by explicitly setting the encoding= parameter.
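For example, a minimal sketch (the file name is made up) that pins the encoding instead of relying on the platform default:

# Explicit encoding: behaves the same under Jenkins and in an interactive shell.
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write('\u0447')  # the Cyrillic character from the traceback above

# Without encoding=..., the locale decides, which is exactly what differs
# between your console session and the Jenkins-spawned process.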
When I make a request with the Python requests library like below, I get the exception shown.
import requests

def get_request(url):
    return requests.get(url).json()
Exception
palo:dataextractor minisha$ python escoskill.py
Traceback (most recent call last):
File "escoskill.py", line 62, in <module>
print(response.json())
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 277: ordinal not in range(128)
However, the same piece of code works for some requests but not all. For the URL below, it doesn't work:
https://ec.europa.eu/esco/api/resource/concept?uri=http://data.europa.eu/esco/isco/C2&language=en
URL that works:
https://ec.europa.eu/esco/api/resource/taxonomy?uri=http://data.europa.eu/esco/concept-scheme/isco&language=en
The exception you're getting, UnicodeEncodeError, means we have some character that we cannot encode into bytes. In this case, we're trying to encode \xe4, or ä, which ASCII¹ does not have; hence, the error.
In this line of code:
print(response.json())
The only thing that's going to be doing encoding is the print(). print(), to emit text into something, needs to encode it to bytes. Now, what it does by default depends on what sys.stdout is. Typically, stdout is your screen unless you've redirected output to a file. On Unix-like OSs (Linux, OS X), the encoding Python will use will be whatever LANG is set to; typically, this should be something like en_US.utf8 (the first part, en_US, might differ if you're in a different country; the utf8 bit is what is important here). If LANG isn't set (this is unusual, but can happen in some contexts, such as Docker containers) then it defaults to C, for which Python will use ASCII as an encoding.
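You can see both halves of this in isolation (a small sketch, independent of requests):

import os
import sys

print(os.environ.get('LANG'))  # e.g. 'en_US.utf8', or None if it is unset
print(sys.stdout.encoding)     # the encoding print() will use for output

# And the failure itself, reproduced directly:
'\xe4'.encode('ascii')  # raises UnicodeEncodeError, just like print() does here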
(Edit) From the additional information in the comments (you're on OS X, you're using IntelliJ, and LANG is unset: print(repr(os.environ.get('LANG'))) printed None), this is a tough one to give advice on. LANG being unset means Python will assume it can only output ASCII, and it will error out, as you've seen, on anything else. In order of preference, I would:
Try to figure out why LANG is unset. This might be some configuration of the mini-terminal in the IDE², if that is what you have and are using. This may prove hard to find if you're unfamiliar with character encodings, and I might be off base here, as I'm unfamiliar with IntelliJ.
Since you seem to be running your program from a command line, you can see if setting LANG helps. Where you currently do,
python escoskill.py
You can set LANG for a single run with:
LANG=en_US.utf8 python escoskill.py
If that works, you can make it stick for the rest of the terminal session with:
export LANG=en_US.utf8
# LANG will have that value for all future commands run from this terminal.
python escoskill.py
You can override what Python autodetects the encoding to be, or you can override its behavior when it hits a character it can't encode. For example,
PYTHONIOENCODING=ascii:replace python -c 'print("\xe4")'
tells Python to use an output encoding of ASCII (which is what it was doing before anyway), but the :replace bit makes characters that it can't encode in ASCII, such as ä, come out as ?s instead of erroring out. This might make some output harder to read, of course.
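The same error handler is available on .encode() directly, which is an easy way to see what :replace does (a one-line sketch):

print('\xe4'.encode('ascii', 'replace'))  # b'?' -- unencodable characters become '?'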
¹ASCII is a character encoding. A character encoding tells one how to translate bytes into characters. There's not just one, because… humans.
²or perhaps your OS, but LANG being unset on OS X just sounds very implausible
I'm using BeautifulSoup to parse some HTML, with Spyder as my editor (both brilliant tools, by the way!). The code runs fine in Spyder, but when I try to execute the .py file from a terminal, I get an error:
from bs4 import BeautifulSoup

file = open('index.html', 'r')
soup = BeautifulSoup(file)
html = soup.prettify()
file1 = open('index.html', 'wb')
file1.write(html)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 5632: ordinal not in range(128)
I'm running openSUSE on a Linux server, with Spyder installed using zypper.
Does anyone have any suggestions what the problem might be?
Many thanks.
That is because before outputting the result (i.e. writing it to the file) you must encode it first:
file1.write(html.encode('utf-8'))
See, every file object has an attribute file.encoding. To quote the docs:
file.encoding
The encoding that this file uses. When Unicode strings are written to a file, they will be converted to byte strings using this encoding. In addition, when the file is connected to a terminal, the attribute gives the encoding that the terminal is likely to use (that information might be incorrect if the user has misconfigured the terminal). The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting Unicode strings.
See the last sentence? soup.prettify() returns a Unicode object, and given this error, I'm pretty sure you're using Python 2.7, because its sys.getdefaultencoding() is ascii.
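Alternatively, you can open the output file with an explicit encoding so that every write() encodes for you; a minimal sketch using io.open, assuming html is the Unicode string returned by soup.prettify():

import io

# io.open accepts an encoding= parameter on Python 2, like the
# built-in open() does on Python 3; write() then expects unicode.
with io.open('index.html', 'w', encoding='utf-8') as file1:
    file1.write(html)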
Hope this helps!
I have a script that reads an XML file and writes it into the database.
When I run it through the browser (calling it via a view) it works fine, but
when I created a command for it (./manage.py importxmlfile) I get the following message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 6: ordinal not in range(128)
I'm not sure why this would only happen when calling the import via the command line... any ideas?
Update
I'm trying to convert an lxml.etree._ElementUnicodeResult object to a string and save it in the DB (utf8 collation) using str(result).
This produces the error mentioned above only on the command line.
Ah, don't use str(result).
Instead, do:
result.encode('utf-8')
When you call str(result), Python will use the default system encoding (usually ascii) to try to encode the characters in result. This will break whenever an ordinal is not in range(128). Rather than relying on the implicit ascii codec, call .encode() yourself and tell Python which codec to use.
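You can see the difference in a Python 2 session (a quick sketch using the character from your traceback):

u = u'\xfc'        # ü, the character at position 6 in your error
str(u)             # raises UnicodeEncodeError via the implicit ascii codec
u.encode('utf-8')  # '\xc3\xbc' -- explicit codec, no error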
Check out the Python Unicode HowTo for more information. You might also want to check out this related question or this excellent presentation on the subject.
I've got a problem with unichr() on my server. Please see below:
On my server (Ubuntu 9.04):
>>> print unichr(255)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
On my desktop (Ubuntu 9.10):
>>> print unichr(255)
ÿ
I'm fairly new to python so I don't know how to solve this. Anyone care to help? Thanks.
When using the print statement, you're writing to the sys.stdout output stream. sys.stdout can usually display Unicode strings only if the characters can be converted to ASCII using str(message).
You'll need to encode the string to your OS's terminal encoding when printing for this to work.
The locale module can sometimes detect the encoding of the output console:
import locale
print unichr(0xff).encode(locale.getdefaultlocale()[1], 'replace')
but it's usually better to just specify the encoding yourself, as Python often gets it wrong:
print unichr(0xff).encode('latin-1', 'replace')
I think UTF-8 or Latin-1 is used by many modern Linux distros.
If you know the encoding of your console, the lines below will encode Unicode strings automatically when you use "print":
import sys
import codecs
sys.stdout = codecs.getwriter(ENCODING)(sys.stdout)
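For example, with a UTF-8 console (ENCODING above is a placeholder; 'utf-8' here is an assumption about your terminal):

import sys
import codecs

# Wrap stdout so that unicode strings are encoded on the way out.
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print unichr(0xff)  # now prints ÿ instead of raising UnicodeEncodeError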
If the encoding is ascii or something similar, you may need to change the console encoding of your OS to be able to display that character.
See also: http://wiki.python.org/moin/PrintFails
The terminal settings on your server are different, probably set to 7-bit US ASCII.
It's not really unichr()-related. The problem is with the locale setting in your server environment, which is probably set to something like en_US and isn't Unicode-aware.
Consider using an explicit encoding when printing unicode strings where OS settings are not uniform.
unicode.encode([encoding[, errors]])
Return an encoded version of the string. Default encoding is the current default string encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Codec Base Classes. For a list of possible encodings, see section Standard Encodings.
For example,
>>> print unichr(0xff).encode('iso8859-1')
ÿ
>>>