Python - Encode error - cp850.py

Python - Encode error - cp850.py - python

I am Python beginner so I hope this problem will be an easy fix.
I would like to print the value of an attribute as follows:
print (follower.city)
I receive the following error message:
File “C:\Python34\lib\encodings\cp850.py“, line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: ‘charmap‘ codec can’t encode character ‘\u0130‘
0: character maps to (undefined)
I think the problem is that cp850.py does not contain the relevant character in the encoding table.
What would be the solution to this problem? No ultimate need to display the character correctly, but the error message must be avoided. Do I need to modify cp850.py?
Sorry if this question has been addressed before, but I was not able to figure it out using previous answers to this topic.

To print a string it must first be converted from pure Unicode to the byte sequences supported by your output device. This requires an encode to the proper character set, which Python has identified as cp850 - the Windows Console default.
Starting with Python 3.3 you can set the Windows console to use UTF-8 with the following command issued at the command prompt:
chcp 65001
This should fix your issue, as long as you've configured the window to use a font that contains the character.

Related

Python program is running in IDLE but not in command line

Before someone says this is a duplicate question, I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.
I am trying to run a very short script in Python
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen("http://dictionary.reference.com/browse/word?s=t").read().strip()
dhtml = str(html, "utf-8").strip()
soup = BeautifulSoup(dhtml.strip(), "html.parser")
print(soup.prettify())
But I keep getting an error when I run this program with python.exe. UnicodeEncodeError: 'charmap' codec can't encode character '\u025c. I have tried a lot of methods to get around this, but I managed to isolate it to the problem of converting bytes to strings. When I run this program in IDLE, I get the HTML as expected. What is it that IDLE is automatically doing? Can I use IDLE's interpretation program instead of python.exe? Thanks!
EDIT:
My problem is caused by print(soup.prettify()) but type(soup.prettify()) returns str?
RESOLVED:
I finally made a decision to use encode() and decode() because of the trouble that has been caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers

UnicodeEncodeError: 'charmap' codec can't encode character '\u025c'
The console character encoding can't represent '\u025c' i.e., "ɜ" Unicode character (U+025C LATIN SMALL LETTER REVERSED OPEN E).
What is it that IDLE is automatically doing?
IDLE displays Unicode directly (only BMP characters) if the corresponding font supports given Unicode characters.
Can I use IDLE's interpretation program instead of python.exe
Yes, run:
T:\> py -midlelib -r your_script.py
Note: you could write arbitrary Unicode characters to the Windows console if Unicode API is used:
T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?

I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.
Not really. You have PrintFails like everyone else.
The Windows console can't print Unicode. (This isn't strictly true, but going into exactly why, when and how you can get Unicode out of the console is a painful exercise and not usually worth it.) Trying to print a character that isn't in the console's limited encoding can't work, so Python gives you an error.
print them out (which I need an easier solution to because I cannot do .encode("utf-8") for a lot of elements
You could run the command set PYTHONIOENCODING=utf-8 before running the script to tell Python to use and encoding which can include any character (so no errors), but any non-ASCII output will still come out garbled as its encoding won't match the console's actual code page.
(Or indeed just use IDLE.)

I finally made a decision to use encode() and decode() because of the trouble that has been caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers

Why web-server complains about Cyrillic letters and command line not?

I have a web-server on which I try to submit a form containing Cyrillic letters. As a result I get the following error message:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
This message comes from the following line of the code:
ups = 'rrr {0}'.format(body.replace("'","''"))
(body contains Cyrillic letters). Strangely I cannot reproduce this error message in the python command line. The following works fine:
>>> body = 'ппп'
>>> ups = 'rrr {0}'.format(body.replace("'","''"))

It's working in the interactive prompt because your terminal is using your locale to determine what encoding to use. Directly from the Python docs:
Whereas the other file-like objects in python always convert to ASCII
unless you set them up differently, using print() to output to the
terminal will use the user’s locale to convert before sending the
output to the terminal.
On the other hand, while your server is running the scripts, there is no such assumption. Everything read as a byte str from a file-like object is encoded as ASCII in memory unless otherwise specified. Your Cyrillic characters, presumably encoded as UTF-8, can't be converted; they're far beyond the U+007F code point that maps directly between UTF-8 and ASCII. (Unicode uses hex to map its code points; U+007F, then, is U+00127 in decimal. In fact, ASCII only has 127 zero-indexed code points because it uses only 1 byte, and of that one byte, only the least-significant 7 bits. The most significant bit is always 0.)
Back to your problem. If you want to operate on the body of the file, you'll have to specify that it should be opened with a UTF-8 encoding. (Again, I'm assuming it's UTF-8 because it's information submitted from the web. If it's not -- well, it really should be.) The solution has already been given in other StackOverflow answers, so I'll just link to one of them rather than reiterate what's already been answered. The best answer may vary a little bit depending on your version of Python -- if you let me know in a comment I could give you a clearer recommendation.

Understanding Python Unicode and Linux terminal

I have a Python script that writes some strings with UTF-8 encoding. In my script I am using mainly the str() function to cast to string. It looks like that:
mystring="this is unicode string:"+japanesevalues[1]
#japanesevalues is a list of unicode values, I am sure it is unicode
print mystring
I don't use the Python terminal, just the standard Linux Red Hat x86_64 terminal. I set the terminal to output utf8 chars.
If I execute this:
#python myscript.py
this is unicode string: カラダーズ ソフィー
But if I do that:
#python myscript.py > output
I got the typical error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 253-254: ordinal not in range(128)
Why is that?

The terminal has a character set, and Python knows what that character set is, so it will automatically decode your Unicode strings to the byte-encoding that the terminal uses, in your case UTF-8.
But when you redirect, you are no longer using the terminal. You are now just using a Unix pipe. That Unix pipe doesn't have a charset, and Python has no way of knowing which encoding you now want, so it will fall back to a default character set.
You have marked your question with "Python-3.x" but your print syntax is Python 2, so I suspect you are actually using Python 2. And then your sys.getdefaultencoding() is generally 'ascii', and in your case it's definitely so. And of course, you can not encode Japanese characters as ASCII, so you get an error.
Your best bet when using Python 2 is to encode the string with UTF-8 before printing it. Then redirection will work, and the resulting file with be UTF-8. That means it will not work if your terminal is something else, though, but you can get the terminal encoding from sys.stdout.encoding and use that (it will be None when redirecting under Python 2).
In Python 3, your code should work as is, except that you need to change print mystring to print(mystring).

If it outputs to the terminal then Python can examine the value of $LANG to pick a charset. All bets are off if you redirect.

Python: replace nonbreaking space in Unicode

In Python, I have a text that is Unicode-encoded. This text contains non-breaking spaces, which I want to convert to 'x'. Non-breaking spaces are equal to chr(160). I have the following code, which works great when I run it as Django via Eclipse using Localhost. No errors and any non-breaking spaces are converted.
my_text = u"hello"
my_new_text = my_text.replace(chr(160), "x")
However when I run it any other way (Python command line, Django via runserver instead of Eclipse) I get an error:
'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
I guess this error makes sense because it's trying to compare Unicode (my_text) to something that isn't Unicode. My questions are:
If chr(160) isn't Unicode, what is it?
How come this works when I run it from Eclipse? Understanding this would help me determine if I need to change other parts of my code. I have been testing my code from Eclipse.
(most important) How do I solve my original problem of removing the non-breaking spaces? my_text is definitely going to be Unicode.

In Python 2, chr(160) is a byte string of length one whose only byte has value 160, or hex a0. There's no meaning attached to it except in the context of a specific encoding.
I'm not familiar with Eclipse, but it may be playing encoding tricks of its own.
If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160).
E.g.,
>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld

Return unicode string from python via ajax

I have a small webapp that runs Python on the server side and javascript (jQuery) on the client side.
Now upon a certain request my Python script returns a unicode string and the client is supposed to put that string inside a div in the browser. However i get a unicode encode error from Python.
If i run the script from the shell (bash on debian linux) teh script runs fine and prints the unicode string.
Any ideas ?
Thanks!
EDIT
This is the print statement that causes the error:
print u'öäü°'
This is the error message i get:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 34-36: ordinal not in range(128)
However i only get that message when calling the script via ajax ( $('#somediv').load('myscript.py'); )
Thank you !

If the python interpreter can't determine the encoding of sys.stdout ascii is used as a fallback however the characters in the string are not part of ascii, therefore a UnicodeEncodeError exception is raised.
A solution would be to encode the string yourself using something like .encode(sys.stdout.encoding or "utf-8"). This way utf-8 is used as a fallback instead of ascii.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.