UnicodeEncodingError - python

I'm using xlrd and cx_freeze.
Now, when I try to read from the Excel file, I get an error when it reaches the "," mark:
UnicodeEncodeError: 'charmap' codec can't encode character "\u2019" in position 12: character maps to <undefined>
from xlrd3 import *
book= open_workbook('s1.xls')
sheet=book.sheet_by_index(0)
import sys
from encodings import *
from codecs import *
def pr():
    global row,col
    if isinstance((sheet.cell(row,col).value), float):
        cell = sheet.cell(row,col)
        cell_value = cell.value
        cell_value1= int(cell.value)
        s=[]
        s.append(cell_value1)
        print (s)
        s=[]
    else:
        cell = sheet.cell(row,col)
        cell_value = cell.value
        s=[]
        s.append(cell_value)
        print (s)
        s=[]
def co():
    x=input("'S'earch again or 'Q'uit?: ")
    if x == 'S' or x=='s':
        search()
    elif x == 'Q'or x == 'q':
        sys.exit(0)
    else:
        print ('Please enter a valid answer: ')
        co()
def search():
    global row,col
    s=[]
    a=(input("Enter Search : "))
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            s.append(str(sheet.cell(row,col).value))
            if a in (str(sheet.cell(row,col).value)):
                for col in range(sheet.ncols):
                    pr()
            else:
                s=[]
    co()
search()
That is the code.

You don't show which line of code your error came from. A full traceback is much more informative than only showing the error message.
A UnicodeEncodeError (note: not 'UnicodeEncodingError' as you titled this thread) usually shows up when you pass a Unicode string to something that has to convert it to bytes using an encoding that cannot represent one of its characters. The most likely culprit here is your print(s). A traceback would have told us whether this was the source of the problem.
The problem is that Python cannot encode some of your characters in the encoding your Windows console uses (the 'charmap' codec in the message is the one Python uses for Windows code pages). In this case it's the character '\u2019', which is the right single quotation mark (’), and not the "," mark, which is a comma.
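To see why the same string can print in one place and fail in another, you can try encoding the character yourself. This is a small sketch: cp850 is only assumed as a typical Western European console code page and cp1252 as the usual Windows GUI code page, so your console may use a different one.

```python
# Check which encodings can represent U+2019 (RIGHT SINGLE QUOTATION MARK).
# cp850 is an assumed example of a Windows console code page.
ch = "\u2019"  # the apostrophe-like quote from the error message, not a comma

def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

for enc in ("cp850", "cp1252", "utf-8"):
    print(enc, can_encode(ch, enc))
```

The console code page fails while cp1252 and UTF-8 succeed, which is why the error depends on where the output goes.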
You must tell it how to encode the Unicode as a set of bytes appropriate for your terminal; specifically, a Windows terminal. This reduces your problem to the one discussed at Prevent encoding errors in Python and several dozen other posts which you get when searching for your error message here.
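Since you are on Windows, take the advice from that "Prevent encoding errors in Python" link. Note that s in your code is a list, so convert it to a string first; errors='replace' substitutes ? for anything the console code page cannot represent, and decoding the result again gives a str that print() can write:
print(str(s).encode('cp850', errors='replace').decode('cp850'))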
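The same idea wrapped in a small helper, as a sketch: safe_for_console is a hypothetical name, and cp850 is again only an assumed console code page. On Python 3.7 and later, sys.stdout.reconfigure(errors='replace') is an alternative that changes the error handling for every later print() call.

```python
def safe_for_console(value, encoding="cp850"):
    """Return str(value) with characters the code page cannot represent replaced by '?'.

    encoding defaults to cp850, an assumed Windows console code page; pass the
    one your console actually uses (sys.stdout.encoding usually tells you).
    """
    return str(value).encode(encoding, errors="replace").decode(encoding)

row = ["Don\u2019t panic"]       # a list, like s in the question
print(safe_for_console(row))     # ['Don?t panic']
```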

Related

Encoding a file with ord function

I'm trying to encode a file and write the encoded output to a new file, but I get this error:
TypeError: ord() expected string of length 1, but int found
My code:
from sys import argv, exit
def encode(data):
    encoded = ''
    while data:
        current = data[0]
        count = 1
        for i in data[1:]:
            if i == current:
                count += 1
            else:
                break
            if count == 255:
                break
        encoded += '{}{}'.format(chr(ord(current) & 255), chr(count & 255)) #error occurs here.
        data = data[count:]
    return encoded
if __name__ == '__main__':
    if len(argv) < 2:
        print('Please specify input file!')
        exit(0)
    with open(argv[1], 'rb') as (f):
        data = f.read()
    with open(argv[1] + '.out', 'wb') as (f):
        f.write(encode(data))
Additional question: How do I decode the encoded file?
You are reading bytes (open(..., 'rb')), so when you take one element of the byte string, you get a byte, i.e. a number. This number already is the character code, so just leave out the ord. Alternatively, you could open the file without the b modifier (open(..., 'r')), which will return a string; I would advise keeping it as a byte string, though (or you could run into encoding issues if you are parsing anything non-ASCII).
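A quick way to see the difference in Python 3: indexing a bytes object gives an int, slicing gives bytes, and ord() only accepts a string (or bytes object) of length 1.

```python
data = b"aab"

first = data[0]      # int: the byte value itself (97 for 'a')
piece = data[0:1]    # slicing keeps the bytes type: b'a'
print(type(first).__name__, first)   # int 97
print(piece)                         # b'a'

try:
    ord(first)       # what the question's encode() does with data[0]
except TypeError as exc:
    print("TypeError:", exc)
```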
You will run into a similar problem when saving your file: you cannot write a string to a file opened with the b modifier. Since you have characters outside the ASCII range (128 and above), writing them as a string is not a good idea anyway, because Python will encode your characters (e.g. in UTF-8), and you will end up with completely different bytes. Therefore, the best solution is probably not to concatenate your data into a string in your loop (the part where you do '{}{}'.format(...)), but to build a list (encoded = [], then encoded.append(current) and encoded.append(count)) and convert that to a byte string with bytes(encoded) after your loop. You can then pass that to write without a problem.
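A sketch of that approach, building a list of ints and calling bytes() at the end. It walks the input with an index instead of slicing data[count:] on every run (which copies the rest of the data each time), and caps each run at 255 so the count fits in one byte. rle_encode is just an illustrative name.

```python
def rle_encode(data: bytes) -> bytes:
    """Run-length encode data as (byte value, run length) pairs; runs are capped at 255."""
    encoded = []
    i = 0
    while i < len(data):
        current = data[i]            # already an int, so no ord() needed
        count = 1
        # Extend the run while the next byte matches and the count still fits in a byte.
        while i + count < len(data) and data[i + count] == current and count < 255:
            count += 1
        encoded.append(current)
        encoded.append(count)
        i += count
    return bytes(encoded)            # every item is 0..255, so bytes() accepts the list

print(rle_encode(b"aaab"))           # b'a\x03b\x01'
```

In the question's main block you would then call f.write(rle_encode(data)) on the file opened with 'wb'.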
As for how to decode your file, you can just open the file like you do for encoding, read two bytes b1 and b2, and append [b1]*b2 to your output (again, as a list), and convert that to a byte string with bytes().
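A matching decoder, again as a sketch with an illustrative name. It assumes the input is exactly the (value, count) pairs produced by an encoder like the one described above.

```python
def rle_decode(encoded: bytes) -> bytes:
    """Expand (byte value, run length) pairs back into the original bytes."""
    if len(encoded) % 2:
        raise ValueError("encoded data must consist of (value, count) pairs")
    out = []
    # encoded[0::2] are the byte values b1, encoded[1::2] the counts b2 (both ints).
    for b1, b2 in zip(encoded[0::2], encoded[1::2]):
        out.extend([b1] * b2)
    return bytes(out)

print(rle_decode(b"a\x03b\x01"))     # b'aaab'
```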

Test if byte is empty in python

I'd like to test the content of a variable containing a byte in a way like this:
line = []
while True:
    for c in self.ser.read(): # read() from pySerial
        line.append(c)
        if c == binascii.unhexlify('0A').decode('utf8'):
            print("Line: " + line)
            line = []
            break
But this does not work...
I'd also like to test whether a byte string is empty:
In this case
print(self.ser.read())
prints: b'' (with two single quotes)
So far I haven't managed to test this with
if self.ser.read() == b''
or anything similar; it always shows a syntax error...
I know it's very basic, but I don't get it...
Thank you for your help. The first part of the question was answered by @sisanared:
if self.ser.read():
performs the test: an empty bytes object is falsy, so the body only runs when something was actually read.
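For reference, both ways of testing work once the if line ends with a colon (its absence is the most likely cause of the syntax error). A small sketch with literal bytes standing in for what ser.read() returns; pySerial's read() returns b'' when its timeout expires before any byte arrives. describe is a hypothetical helper.

```python
def describe(chunk: bytes) -> str:
    """Describe a chunk as returned by a read() call."""
    if chunk == b'':      # explicit comparison; the trailing colon is required
        return "empty"
    return "%d byte(s): %r" % (len(chunk), chunk)

for chunk in (b'', b'\n', b'abc'):
    # "if chunk:" is the shorter test: empty bytes are falsy, non-empty bytes are truthy.
    print(bool(chunk), describe(chunk))
```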
The second part of the question (the end-of-line with the hex value 0A) still doesn't work, but I think it is wise to close this question since the question in the title has been answered.
Thank you all
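If you want to verify the contents of the variable or string you read from pySerial, use the repr() function. Also note that in Python 3, iterating over a bytes object yields ints, so each c has to be compared with the integer 0x0A rather than with b'\x0A' or a decoded string; that is why the end-of-line test never matches. Something like: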
import serial
from binascii import unhexlify
# Assumes this runs inside a method of a class that defines port_name, baudrate, etc.
self.ser = serial.Serial(self.port_name, self.baudrate, self.bytesize, self.parity, self.stopbits, self.timeout, self.xonxoff, self.rtscts)
line = []
while True:
    for c in self.ser.read(): # read() from pySerial; iterating over bytes yields ints
        line.append(c)
        if c == 0x0A: # compare with the int value of '\n', not with b'\x0A'
            print("Line:", repr(bytes(line)))
            print(repr(unhexlify('0A').decode('utf8'))) # shows '\n'
            line = []
            break
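Another way around the int-versus-bytes issue is to read one byte at a time with read(1), which returns a bytes object of length 1, so comparing with b'\n' works. Below is a sketch of a line reader: read_lines is a hypothetical helper, and io.BytesIO stands in for the serial port so it runs without hardware. With a real port and a timeout, read(1) returns b'' when nothing arrives in time, so you would probably keep looping instead of stopping as this sketch does. (pySerial objects also have readline(), which may be all you need.)

```python
import io

def read_lines(port, terminator=b"\n"):
    """Yield complete lines (without the terminator) from an object with read(size).

    `port` is assumed to behave like pySerial's Serial: read(1) returns a
    length-1 bytes object, or b'' when nothing is available. `terminator`
    must be a single byte.
    """
    line = bytearray()
    while True:
        c = port.read(1)
        if not c:             # b'': nothing more to read (end of the in-memory stream here)
            break
        if c == terminator:   # bytes compared with bytes, so this test works
            yield bytes(line)
            line.clear()
        else:
            line += c
    # Any incomplete trailing data is discarded in this sketch.

fake_port = io.BytesIO(b"first\nsecond\npartial")
print(list(read_lines(fake_port)))   # [b'first', b'second']
```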

Printing icons in the shell

Below are some examples. When run inside the Python 3.4.3 shell included in IDLE3, they output special characters (icons). When I run the same code in a terminal, the characters do not appear at all.
""" Some print functions with backslashes.
In IDLE3 they will output 'special characters' or icons.
In a terminal, they will not output anything. """
#Sometimes a visual effect.
print ("a, \a") #telephone
print ("\a")
print ("b, \b") #checkmark
print ("c, \c") # just a '\c' output.
# other letters like '\c' left out of the rest of this list.
print ("f, \f") #quarter note (musical)
print ("n, \n") #newline
print ("r, \r") #half note (musical)
print ("t, \tTabbed in")
#print ("u, \u") #syntaxerror
print ("\u0000") #empty
print ("\u0001") #left arrow
print ("\u0002") #left arrow underline
print ("\u0003") #right arrow (play)
print ("v, \v") #eighth note (musical)
print ("\x01") # == '\u0001' __________(x == 00 ?)
print ("\1") # == '\u0001' == '\x01'
#some more fooling around
print ("\1") #left arrow
print ("\2") #underlined left arrow
print ("\3") #right arrow
print ("\4") #underlined right arrow
print ("\5") #trinity
print ("\6") #Q-parking
print ("\7") #telephone
print ("\8")
print ("\9")
print ("\10") #checkmark
print ("\11 hi") #tab
print ("\12 hi") #newline
print ("\13") #8th note
print ("\14") #4th note
print ("\15") #half note
print ("\16") #whole note
print ("\17") #double 8th note
print ("\18")
print ("\19")
print ("\20") #left arrow (black)
print ("\21") #right arrow (black)
print ("\22") #harry potter
print ("\23") #X-chrom-carrying cell
print ("\24") #Y-chrom-carrying cell
print ("\25") #diameter for lefties
print ("\26") #pentoid
print ("\27") #gamma?
print ("\28") #I finally realised this will have to do with triple
# binary per character? 111 = 7, stop = 8
print ("\30") #
print ("\31") # female
print ("\32") # male
print ("\33") #
print ("\34") # clock
print ("\35") # alpha / ichthys
print ("\36") # arc
print ("\37") # diameter
print ("\40hi") # spaces? I don't know.
# This does not work by the way:
##import string
###No visual effect.
##alfa = string.ascii_lowercase
##for x in alfa:
## print ("\%s" % x)
Some of my output in the Python 3.4.3 shell in IDLE3:
Are these 'special' characters, or icons, used in any way? Is there some documentation I could have read that would have prevented me from asking this question?
I checked other questions about this issue on Stack Overflow, but all I found were people trying to pass in 'foreign' characters (like symbols from Word) and get Python to print them.
If you are in IDLE and click "Options" and then "Configure IDLE...", you can see the font you are using. Fonts convert character numbers to what you see, so a different font can produce different characters.
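It may also help to know what those escapes actually are. In a Python string literal, \1 through \37 are octal escapes, so "\37" is chr(31), and code points 1 to 31 are the ASCII control characters (Unicode category Cc); "\8" and "\9" are not escapes at all, so the backslash is kept. A terminal either acts on control characters (\a, \b, \t, \n, \r, \v, \f) or shows nothing, while IDLE apparently draws whatever glyph its font has for them. A small sketch to inspect them with the standard unicodedata module:

```python
import unicodedata

# Octal escapes: the digits after the backslash are read in base 8.
samples = {"\\1": "\1", "\\10": "\10", "\\12": "\12", "\\37": "\37", "\\40": "\40"}

for literal, ch in samples.items():
    # category "Cc" marks a control character; "Zs" is an ordinary space
    print(literal, ord(ch), unicodedata.category(ch), repr(ch))
```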
Example:
>>> print(u'\u2620')
☠
I found this character (U+2620, SKULL AND CROSSBONES) by searching for "unicode skull".
Not all fonts support all characters.
Unicode characters are organized into blocks by topic. I like the "Miscellaneous Symbols" block (U+2600 to U+26FF), which is where the skull comes from.
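Instead of searching the web, you can also look characters up by name with the standard unicodedata module. A sketch: find_by_name is a hypothetical helper, the default range is the Miscellaneous Symbols block, and the results depend on the Unicode version your Python ships with.

```python
import unicodedata

def find_by_name(word, start=0x2600, end=0x26FF):
    """Return (code point, name) for characters in [start, end] whose name contains word."""
    hits = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")   # unassigned code points have no name
        if word.upper() in name:
            hits.append((hex(cp), name))
    return hits

print(find_by_name("skull"))                                   # includes ('0x2620', 'SKULL AND CROSSBONES')
print(unicodedata.lookup("SKULL AND CROSSBONES") == "\u2620")  # True
```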
Encoding
Another important question is which encoding you use. The encoding determines how bytes are mapped to Unicode characters. A character has to travel from print(u'\u0001') through sys.stdout to the console reading it and on to the window manager, and each step only passes bytes, where a single byte has only 256 possible values.
So there are various encodings, such as latin-1, which take the 256 possible byte values and map them onto Unicode characters; latin-1 maps them onto the first two blocks (U+0000 to U+00FF). There are also encodings such as UTF-8, which uses 1 to 4 bytes per character, UTF-16, which uses 2 or 4 bytes, and UTF-32, which always uses 4 bytes; these allow far more characters to be transferred from the print through the different steps.
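A quick way to check those byte counts in Python 3 (the -le variants of UTF-16 and UTF-32 are used so that no byte order mark is added), and that latin-1 maps the 256 byte values straight onto U+0000 to U+00FF:

```python
samples = ["A", "\u00e9", "\u0436", "\u2620", "\U0001F600"]  # A, é, ж, ☠, an emoji outside the BMP

for ch in samples:
    sizes = [len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")]
    print(hex(ord(ch)), sizes)
# 0x41 [1, 2, 4]; 0xe9 [2, 2, 4]; 0x436 [2, 2, 4]; 0x2620 [3, 2, 4]; 0x1f600 [4, 4, 4]

# latin-1: byte value n decodes to code point n, for all 256 values
print(bytes(range(256)).decode("latin-1") == "".join(chr(n) for n in range(256)))  # True
```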
If I want to encode the skull and crossbones in latin-1 I would get this error:
>>> u'\u2620'.encode('latin-1')
Traceback (most recent call last):
File "<pyshell#32>", line 1, in <module>
u'\u2620'.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2620' in position 0: ordinal not in range(256)
Another example, where I encode the Russian letter zhe (ж) in the Cyrillic code page and in the Western (Latin) one:
>>> print u'\u0436', repr(u'\u0436'.encode('cp1251')) # cyrillic works
ж '\xe6'
>>> print u'\u0436', repr(u'\u0436'.encode('cp1252')) # latin-1 fails
ж
Traceback (most recent call last):
File "<pyshell#41>", line 1, in <module>
print u'\u0436', repr(u'\u0436'.encode('cp1252')) # latin-1
File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0436' in position 0: character maps to <undefined>
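The transcripts above are from Python 2 (print without parentheses, u'' literals). In Python 3 the same experiment looks roughly like this; try_encode is just an illustrative helper that returns None instead of raising:

```python
def try_encode(text, encoding):
    """Return text encoded with encoding, or None if a character has no mapping."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

zhe = "\u0436"    # CYRILLIC SMALL LETTER ZHE
skull = "\u2620"  # SKULL AND CROSSBONES

print(try_encode(zhe, "cp1251"))         # b'\xe6'  (Cyrillic code page works)
print(try_encode(zhe, "cp1252"))         # None     (Western code page has no zhe)
print(try_encode(skull, "latin-1"))      # None     (only U+0000 to U+00FF fit)
print(try_encode(zhe + skull, "utf-8"))  # b'\xd0\xb6\xe2\x98\xa0'
```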
To escape this encoding jungle, use UTF-8, which can encode every Unicode character:
>>> print u'\u0436\u2620', repr(u'\u0436\u2620'.encode('utf-8'))
ж☠ '\xd0\xb6\xe2\x98\xa0'
Encoding with one encoding and decoding with another changes the character. If you want to use funny characters, use Unicode strings and UTF-8.
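For example, in Python 3, decoding the cp1251 byte for ж with cp1252 silently turns it into a different letter, while decoding with the matching encoding gets it back:

```python
zhe = "\u0436"                    # ж
raw = zhe.encode("cp1251")        # b'\xe6'

wrong = raw.decode("cp1252")      # same byte, wrong code page
right = raw.decode("cp1251")      # matching code page

print(ascii(wrong), ascii(right))  # '\xe6' '\u0436'  (æ versus ж)
print(right == zhe)                # True
```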

Using the python unicode function

I'm working on a project that compares text.
Here is the relevant piece of code:
def post(self):
    A = unicode(flask.request.form['A'])
    B = unicode(flask.request.form['B'])
I posted large pieces of text from Project Gutenberg and I get errors like this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)
Based on this page, I have tried errors='ignore' and errors='replace', and I get the error:
TypeError: decoding Unicode is not supported
If possible, I want to be able to accept the widest possible range of characters. I was hoping there was a Python library that would allow this.
Here is more of the code. I think the problem may occur when I try to turn my input into a string.
C = A.split()
D = B.split()
Both = []
for x in C:
    if x in D:
        Both.append(x)
for x in range(len(Both)):
    Both[x]=str(Both[x])
Final = []
for x in set(Both):
    Final.append(x)
MissingA = []
for x in C:
    if x not in Final and x not in MissingA:
        MissingA.append(x)
for x in range(len(MissingA)):
    MissingA[x]=str(MissingA[x])
MissingB = []
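Here is more of the code. I think the problem may occur when I try to turn my input into a string.
I think that's right: try eliminating the str() calls. In Python 2, str() applied to a unicode object encodes it with the default 'ascii' codec, so any non-ASCII character such as u'\xe9' (é) raises exactly this UnicodeEncodeError. The TypeError comes from passing errors= to unicode() when the value is already a unicode object (Flask already gives you form data as unicode): unicode() can only decode byte strings.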
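For comparison, in Python 3 str is already Unicode, so the same word comparison needs no str() or unicode() conversions at all. A sketch of the comparison from the question using sets; compare_words is a hypothetical name, and MissingB is assumed to be the mirror image of MissingA:

```python
def compare_words(a: str, b: str):
    """Return (common, only_in_a, only_in_b) word lists, without duplicates, in first-seen order."""
    words_a, words_b = a.split(), b.split()
    set_a, set_b = set(words_a), set(words_b)   # sets make the "in" tests fast
    # dict.fromkeys removes duplicates while keeping order (Python 3.7+).
    common = list(dict.fromkeys(w for w in words_a if w in set_b))
    only_a = list(dict.fromkeys(w for w in words_a if w not in set_b))
    only_b = list(dict.fromkeys(w for w in words_b if w not in set_a))
    return common, only_a, only_b

common, only_a, only_b = compare_words("caf\u00e9 au lait au", "caf\u00e9 noir")
print(common == ["caf\u00e9"], only_a, only_b)   # True ['au', 'lait'] ['noir']
```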

Replace Specialchars in Python

I need to replace special characters in file names. I'm currently trying this with translate, but it isn't working well, and I hope you have an idea how to do it. It's to make a clean playlist: I have a bad MP3 player in my car that can't handle umlauts or special characters.
My code so far:
# -*- coding: utf-8 -*-
import os
import sys
import id3reader
pfad = os.path.dirname(sys.argv[1])+"/"
ordner = ""
table = {
    0xe9: u'e',
    0xe4: u'ae',
    ord(u'ö'): u'oe',
    ord(u'ü'): u'ue',
    ord(u'ß'): u'ss',
    0xe1: u'ss',
    0xfc: u'ue',
}
def replace(s):
    return ''.join(c for c in s if (c.isalpha() or c == " " or c =="-") )
fobj_in = open(sys.argv[1])
fobj_out = open(sys.argv[1]+".new","w")
for line in fobj_in:
    if (line.rstrip()[0:1]=="#" or line.rstrip()[0:1] ==" "):
        print line.rstrip()[0:1]
    else:
        datei= pfad+line.rstrip()
        #print datei
        id3info = id3reader.Reader(datei)
        dateiname= str(id3info.getValue('performer'))+" - "+ str(id3info.getValue('title'))
        #print dateiname
        arrPfad = line.split('/')
        dateiname = replace(dateiname[0:60])
        print dateiname
        # dateiname = dateiname.translate(table)+".mp3"
        ordner = arrPfad[0]+"/"+dateiname
        # os.rename(datei,pfad+ordner)
        fobj_out.write(ordner+"\r\n")
fobj_in.close()
I get this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 37: ordinal not in range(128)
If I try to use translate on the ID3 title, I get TypeError: expected a character buffer object
If I need to get rid of non-ASCII characters, I often use:
>>> unicodedata.normalize("NFKD", u"spëcïälchärs").encode('ascii', 'ignore')
'specialchars'
It decomposes each character into its normalized Unicode form (NFKD), e.g. ë into e plus a combining diaeresis, and the ASCII encode with 'ignore' then keeps only the ASCII part.
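To see what NFKD actually does, you can look at the decomposed string before encoding. In Python 3, encode() returns bytes, so decode the result again if you want a str:

```python
import unicodedata

s = "spëcïälchärs"
decomposed = unicodedata.normalize("NFKD", s)

# 'ë' becomes 'e' followed by U+0308 COMBINING DIAERESIS
print([hex(ord(c)) for c in decomposed[:4]])   # ['0x73', '0x70', '0x65', '0x308']

# Encoding to ASCII with 'ignore' drops the combining marks (and anything else non-ASCII)
print(decomposed.encode("ascii", "ignore").decode("ascii"))   # specialchars

# ß has no decomposition, so it is simply dropped
print(unicodedata.normalize("NFKD", "Straße").encode("ascii", "ignore").decode("ascii"))  # Strae
```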
The downside is that it throws away everything it cannot reduce to ASCII, and it is not smart enough to transliterate umlauts (ü to ue, ä to ae, etc.).
But it might help you to at least play those mp3s.
Of course, you are free to do your own translate first and wrap the result in this to eliminate every non-ASCII character still left. In fact, if your replacement table is correct, this will solve your problem. I'd suggest you take a look at str.translate and str.maketrans, though.
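Putting both suggestions together, a Python 3 sketch: a str.maketrans table for the umlauts first, then the NFKD fallback for everything else, then a filter similar to the question's replace() (it also keeps digits). The table contents and the helper names are assumptions for illustration. Note also that 0xe1 in the question's table is 'á', not 'ß' (which is 0xdf), and that ord(u'ü') and 0xfc are the same key.

```python
import unicodedata

# Explicit transliterations, applied before the generic fallback.
# This table is an illustrative assumption; extend it as needed.
TRANSLIT = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def to_ascii(text: str) -> str:
    """Transliterate umlauts, then strip remaining accents via NFKD and drop non-ASCII."""
    text = text.translate(TRANSLIT)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def safe_filename(text: str) -> str:
    """Keep only letters, digits, spaces and hyphens, similar to replace() in the question."""
    return "".join(c for c in to_ascii(text) if c.isalnum() or c in " -")

print(safe_filename("Motörhead - Ace of Spades!"))   # Motoerhead - Ace of Spades
print(to_ascii("Grüße aus Köln, café"))              # Gruesse aus Koeln, cafe
```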
