emoji not printing in python shows UnicodeEncodeError

emoji not printing in python shows UnicodeEncodeError - python

I'm trying to print an emoji on Python
Code:
print("\U0001f600")
I'm trying to print using unicode but after compiling it shows error
Error:
Traceback (most recent call last): File
"C:/Users/RAHUL/Desktop/new.py", line 1, in print("\U0001f600")
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f600'
in position 0: Non-BMP character not supported in Tk
I have searched on the internet but did't found the answer

Related

Python 3.5 cannot read a file with special character ★

i have below code in python 3.5 screen of IDE and powershell
real_path_log_file = 'C:\\Users\\XXXXX\\Desktop\\report\\file1.log'
with open(real_path_log_file, 'r', encoding='utf-8') as open_log:
read_log = open_log.readlines()
print(read_log)
from powershell i get below error but if i run this from pycharm i can getting the output of file content
please note that file i am reading has ★ character
Traceback (most recent call last):
File ".\test_read_file.py", line 5, in <module>
print(read_log)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python35\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2605' in position 367: character maps to <undefined>

I'm no python expert. At least in 3.5, I think print() needs to believe that the console can support unicode characters. For what it's worth, 3.8.5 doesn't have this problem.

Python and IBM DB2: UnicodeDecodeError

I'm getting this error message
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38: ordinal not in range(128)
when I try to execute any sql query in Python, like this one:
>>> import ibm_db
>>> conn = ibm_db.connect("sample","root","root")
>>> ibm_db.exec_immediate(conn, "select * from act")
I checked default encoding and it seems to be 'utf8':
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
I also know about this thread, where people are discussing quite a similar problem. One of the advices is:
Have you applied the required database PTFs (SI57014 and SI57015 for 7.1 and SI57146 and SI57147 for 7.2)? They are included as a distreq, so they should have been in the order with your PTFs, but won't be automatically applied.
However, I do not know what is database PTF and how to apply it. Need help.
PS. I'm using Windows 10.
EDIT
This is how I get my error message:
>>> print(ibm_db.stmt_errormsg())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38:
ordinal not in range(128)
But when I run the same query "select * from act" in DB2 CLP, then it's ok.
And this is driver information, whcih I got running this code in Python:
if client:
print("DRIVER_NAME: string(%d) \"%s\"" % (len(client.DRIVER_NAME), client.DRIVER_NAME))
print("DRIVER_VER: string(%d) \"%s\"" % (len(client.DRIVER_VER), client.DRIVER_VER))
print("DATA_SOURCE_NAME: string(%d) \"%s\"" % (len(client.DATA_SOURCE_NAME), client.DATA_SOURCE_NAME))
print("DRIVER_ODBC_VER: string(%d) \"%s\"" % (len(client.DRIVER_ODBC_VER), client.DRIVER_ODBC_VER))
print("ODBC_VER: string(%d) \"%s\"" % (len(client.ODBC_VER), client.ODBC_VER))
print("ODBC_SQL_CONFORMANCE: string(%d) \"%s\"" % (len(client.ODBC_SQL_CONFORMANCE), client.ODBC_SQL_CONFORMANCE))
print("APPL_CODEPAGE: int(%s)" % client.APPL_CODEPAGE)
print("CONN_CODEPAGE: int(%s)" % client.CONN_CODEPAGE)
ibm_db.close(conn)
else:
print("Error.")
it prints:
DRIVER_NAME: string(10) "DB2CLI.DLL"
DRIVER_VER: string(10) "10.05.0007"
DATA_SOURCE_NAME: string(6) "SAMPLE"
DRIVER_ODBC_VER: string(5) "03.51"
ODBC_VER: string(10) "03.01.0000"
ODBC_SQL_CONFORMANCE: string(8) "EXTENDED"
APPL_CODEPAGE: int(1251)
CONN_CODEPAGE: int(1208)
True
EDIT
I also tried this:
>>> cnx = ibm_db.connect("sample","root","root")
>>> query = "select * from act"
>>> query.encode('ascii')
b'select * from act'
>>> ibm_db.exec_immediate(cnx, query)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception
>>> print(ibm_db.stmt_errormsg())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38:
ordinal not in range(128)
As you can see, in this case I also get the very same error message.
SUMMARY
Below are all my attemts:
C:\Windows\system32>chcp
Active code page: 65001
C:\Windows\system32>python
Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 20:20:57) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ibm_db
>>> cnx = ibm_db.connect("sample","root","root")
>>> ibm_db.exec_immediate(cnx, "select * from act")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception
>>> print(ibm_db.stmt_errormsg())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 38: ordinal not in range(128)
>>> ibm_db.exec_immediate(cnx, b"select * from act")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: statement must be a string or unicode
>>> query = "select * from act"
>>> query = query.encode()
>>> ibm_db.exec_immediate(cnx, query)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: statement must be a string or unicode
>>> ibm_db.exec_immediate(cnx, "select * from act").decode('cp-1251')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception

What you have here is an incompatibility between your client code (ibm_db) and the DB2 server. As you can see in the client code the logic for your query is basically:
Extract and check the parameters passed in (lines 4873 to 4918).
Allocate native objects for the query (up to 4954).
Do the query and decode the results (the rest of the function).
Based on our investigations so far, you know that the data you're passing in for the query is well-formed (and so it is not step 1). Looking at the error paths in step 2, you'd see simple error messages explaining these failures. You're therefore failing in step 3.
You are getting an empty Exception raised on the query and when you try to get the details of the error you get another Unicode decoding Exception. This looks like either a bug in ibm_db or a configuration error that means your DB2 installation is not compatible. So how can we find out which...?
As flagged elsewhere, the issue is fundamentally to do with codepages. All the ibm_db code basically interprets strings as ASCII (by converting them with StringOBJ_FromASCII which maps down to calls into Python APIs that insist on receiving ASCII chars - and will throw unicode exceptions if not).
Based on your diags, you could try to prove/disprove this problem, by installing/configuring both your systems (client and DB2 server) to use US English. This should get you past the codepage incompatibility to find the real error here.
If the query is really going out over the network, you might just get a network trace that shows the response coming back from the server. However, based on the fact that you saw nothing in the logs, I'm not convinced this will bear any fruit.
Failing that you need to patch the ibm_db code to handle non-ASCII content - either by raising a bug report with the maintainer or trying it yourself (if you know how to build and debug C extensions).

The problem is that the DB2 server is returned CP-1251 (also known as Windows-1251) text (as evidenced by APPL_CODEPAGE: int(1251)) in your config output. Python (specifically, the interactive Python REPL) is expecting either UTF-8 or ASCII output, so this causes issues.
The solution is to do:
ibm_db.exec_immediate(conn, "select * from act").decode('cp-1251')
Additionally, you need to make sure that your terminal's text encoding is set to UTF-8. Details on changing that setting will depend on the specific terminal that you are using. Since you have said you are using cmd, the appropriate command is chcp 65001.

In this kind of case, using an utf8 environment, with a stuff that requires a ascii one; i use the decode method.
'ascii' codec can't decode byte 0xc8
Allright, it's normal, this is not ascii but utf8 string: you should decode it with utf8 encoding.
...
query.decode('utf8')
ibm_db.exec_immediate(cnx, query)
After that you may need to re-encode the results to write or print them.

Encode UTF-8 for list

I'm using selenium to retrieve a list from a javascript object.
search_reply = driver.find_element_by_class_name("ac_results")
When trying to write to csv, I get this error:
Traceback (most recent call last):
File "insref_lookup15.py", line 54, in <module>
wr_insref.writerow(instrument_name)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 22: ordinal not in range(128)
I have tried placing .encode("utf-8") on both:
search_reply = driver.find_element_by_class_name("ac_results").encode("utf-8")
and
wr_insref.writerow(instrument_name).encode("utf-8")
but I just get the message
AttributeError: 'xxx' object has no attribute 'encode'

You need to encode the elements in the list:
wr_insref.writerow([v.encode('utf8') for v in instrument_name])
The csv module documentation has an Examples section that covers writing Unicode objects in more detail, including utility classes to handle this automatically.

UnicodeEncodeError in PyWinUSB with Python 2.7.9?

So, I'm trying to work with PyWinUSB, but I can't get very far because I keep getting a UnicodeEncodeError.
The code:
import pywinusb.hid as hid
hid.find_all_hid_devices()
The ouput:
[Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 91: ordinal not in range(128)
Note that this only happens when I have my external keyboard and mouse plugged in (It's a Microsoft wireless combo).
This is what I get in Python 3.4 when I try the same code.
HID device (vID=0x045e, pID=0x00e3, v=0x0053); Microsft; Microsoft Wireless Optical Desktop\xae 2.20, Path: \\?\hid#vid_045e&pid_00e3&mi_01&col01#8&a279d5&0&0000#{4d1e55b2-f16f-11cf-88cb-001111000030}
HID device (vID=0x045e, pID=0x00e3, v=0x0053); Microsft; Microsoft Wireless Optical Desktop\xae 2.20, Path: \\?\hid#vid_045e&pid_00e3&mi_01&col03#8&a279d5&0&0002#{4d1e55b2-f16f-11cf-88cb-001111000030}
HID device (vID=0x045e, pID=0x00e3, v=0x0053); Microsft; Microsoft Wireless Optical Desktop\xae 2.20, Path: \\?\hid#vid_045e&pid_00e3&mi_01&col04#8&a279d5&0&0003#{4d1e55b2-f16f-11cf-88cb-001111000030}
However, if I try to do a print for each item with Python 3.4, I get this:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python34\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 91: character maps to <undefined>
Any ideas on how I can fix this?

This is not a pyWinUsb issue, it's actually a design issue on Python 2.x.
By default, there is no character decoding management on stdout writing.
USB device strings are Unicode UTF-8 encoded, so it is safe to install a character decoder (from pyWinUsb show_hids.py example):
if sys.version_info < (3,):
import codecs
output = codecs.getwriter('mbcs')(sys.stdout)
else:
# python3, you have to deal with encodings, try redirecting to any file
output = sys.stdout

All of a sudden str.decode('unicode_escape') stopped working [2.7.3]

This snippet is taken from my recent python work. And it used to work just fine
strr = "What is th\u00e9 point?"
print strr.decode('unicode_escape')
But now it throws the unicode decoding error:
Traceback (most recent call last):
File "C:\Users\Lenon\Documents\WorkDir\pyWork\ocrFinale\F1\tests.py", line 49, in <module>
print strr.decode('unicode_escape')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 10: ordinal not in range(128)
What is the possible cause of this?

You either have enabled unicode literals or have created a Unicode object by another means, by mistake.
The strr value is already a unicode object, so in order to decode the value Python first tries to encode to a byte string.
If you have an actual byte string your code works:
>>> strr = "What is th\u00e9 point?"
>>> strr.decode('unicode_escape')
u'What is th\xe9 point?'
but as soon as strr is in fact a Unicode object, you get the error as Python tries to encode the object using the default ASCII codec first:
>>> strr.decode('unicode_escape').decode('unicode_escape')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 10: ordinal not in range(128)
It could be that you enabled unicode_literals, for example:
>>> from __future__ import unicode_literals
>>> strr = "What is th\u00e9 point?"
>>> type(strr)
<type 'unicode'>
>>> strr.decode('unicode_escape')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 10: ordinal not in range(128)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.