Python ascii encoding issue - python

I run a Python script and I receive the following error:
sql = 'insert into posts(msg_id, msg_text, msg_date) values("{0}", "{1}", "{2}")'.format(msg['id'], text.encode('utf-8'), msg['datime'])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 25-31: ordinal not in range(128)
How can I correct this error, or catch it as an exception? Any ideas?

Try:
sql = u'insert into posts(msg_id, msg_text, msg_date) values("{0}", "{1}", "{2}")'.format(msg['id'], text.decode('utf-8'), msg['datime'])
Basically, your text contains UTF-8 characters, and by using the encode() method you keep them as they are. But the main string (the one you're formatting) is a plain ASCII string. By adding u in front of the string (u'') you make it a unicode string. Then, whatever is in text, you want to have it decoded as UTF-8, hence .decode() instead of .encode().
And if you want to catch that kind of error, simply:
try:
    sql = …
except UnicodeEncodeError as err:
    print err
But if you really want to get rid of the UTF-8/ASCII mess, you should think about switching to Python 3.
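A minimal Python 3 sketch of the same statement, using made-up sample values for msg and text (neither is shown in the question). In Python 3 every str is Unicode, so the formatting step itself does not go through the ASCII codec; the error only shows up if something forces the result into ASCII:

```python
# Hypothetical sample data; the real msg and text are not shown in the question.
msg = {"id": 42, "datime": "2014-01-01"}
text = "caf\u00e9 \u2602"  # non-ASCII message text

# Formatting works on Unicode text directly, no encode()/decode() needed here.
sql = 'insert into posts(msg_id, msg_text, msg_date) values("{0}", "{1}", "{2}")'.format(
    msg["id"], text, msg["datime"]
)

# Forcing the statement into ASCII is what raises the error; it can be caught.
try:
    sql.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as err:
    ascii_ok = False
    print(err)

# UTF-8 can represent every character, so this always succeeds.
utf8_bytes = sql.encode("utf-8")
```

The design point is to keep text as Unicode for as long as possible and to encode once, at the boundary where bytes are required.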
HTH

Related

'ascii' codec can't encode character u'\u2602' in position 438: ordinal not in range(128)

I am running into a problem where, when I try to decode a string, I run into one error, and when I try to encode it, I run into another error (errors below). Is there a permanent solution for this?
P.S. Please note that you may not be able to reproduce the encoding error with the string I provided, as I couldn't copy/paste some errors.
text = "sometext"
string = '\n'.join(list(set(text)))
try:
    print "decode"
    text = string.decode('UTF-8')
except Exception as e:
    print e
    text = string.encode('UTF-8')
Errors:
Error while using string.decode('UTF-8')
'ascii' codec can't encode character u'\u2602' in position 438: ordinal not in range(128)
Error while using string.encode('UTF-8')
Exception All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
The First Error
The code you have provided will work, because text is a bytestring (you are using Python 2). The error message, however, comes from the ASCII codec: Python is trying to convert a Unicode string to ASCII, which is possible only if that string contains only characters that have an ASCII equivalent (you can see the list of ASCII characters here). In your case, it's encountering a Unicode character (specifically ☂, U+2602) that has no ASCII equivalent. You can try to get around this behaviour by using:
string.decode('UTF-8', 'ignore')
Which will just ignore (i.e. replace with nothing) the characters that cannot be encoded into ASCII.
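To see what the error handlers actually do, here is a small Python 3 sketch that encodes a made-up sample string to ASCII. The sample text is an assumption; only the ☂ character comes from the error message:

```python
text = "rain \u2602 today"  # hypothetical sample containing U+2602 (umbrella)

# The default handler is 'strict', which raises on the first non-ASCII character.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ignored = text.encode("ascii", "ignore")    # silently drops the umbrella
replaced = text.encode("ascii", "replace")  # substitutes '?' for the umbrella

print(strict_failed, ignored, replaced)
```

Both handlers lose information, so they are best kept for display or logging and not for data you need to keep.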
The Second Error
This error is more interesting. It appears the text you are trying to encode into UTF-8 contains either NULL bytes or specific control characters, which, according to the error message, are not allowed in an XML-compatible string. Again, the code that you have actually provided works, but something in the text that you are trying to encode is violating that rule. You can try the same trick as above:
string.encode('UTF-8', 'ignore')
Which will simply remove the offending characters, or you can look into what it is in your specific text input that is causing the problem.
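To look into what is causing the problem, one option is to list the offending characters before removing them. The sketch below is written for Python 3 and assumes that "XML compatible" means the character set allowed by XML 1.0; the helper names are made up for illustration:

```python
import re

# Anything outside the XML 1.0 character ranges: NULL bytes, most control
# characters, lone surrogates, U+FFFE and U+FFFF.
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

def find_illegal_xml_chars(text):
    """Return (index, character) pairs for characters XML 1.0 does not allow."""
    return [(m.start(), m.group()) for m in _ILLEGAL_XML.finditer(text)]

def strip_illegal_xml_chars(text):
    """Return text with the disallowed characters removed."""
    return _ILLEGAL_XML.sub("", text)

sample = "ok\x00text\x0b"  # hypothetical input with a NULL byte and a vertical tab
print(find_illegal_xml_chars(sample))
print(strip_illegal_xml_chars(sample))
```

Printing the positions first makes it easier to trace where in the input the bad characters come from.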

Unicode error in python program output

I am trying to run a bash command from my Python program, which writes the result to a file. I am using os.system to execute the bash command, but I am getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 793: ordinal not in range(128)
I do not understand how to handle it. Please suggest a solution.
Have a look at this Blog post
These messages usually mean that you're either trying to mix Unicode strings with 8-bit strings, or trying to write Unicode strings to an output file or device that only handles ASCII.
First convert the input data to Unicode. Assuming the string referred to by value is encoded as UTF-8:
value = unicode(value, "utf-8")
You need to encode your string as:
your_string = your_string.encode('utf-8')
For example:
>>> print(u'\u201c'.encode('utf-8'))
“
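The decode-on-input, encode-on-output pattern can be sketched in Python 3 as follows. The sample bytes and the temporary file are assumptions for illustration; the question does not show its data or its output file:

```python
import os
import tempfile

raw = b"\xe2\x80\x9cquoted\xe2\x80\x9d"  # hypothetical input: UTF-8 bytes for “quoted”

# Decode once when the data enters the program.
value = raw.decode("utf-8")

# Encode once when it leaves, by opening the file with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(value)

# Reading the file back as bytes shows the same UTF-8 data that came in.
with open(path, "rb") as fh:
    written = fh.read()
```

Naming the encoding explicitly avoids depending on the platform default, which may be ASCII in some environments.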

pyodbc Netezza 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

I am new to Python scripting. I was converting my shell scripts to Python for a Netezza DB call in which a stored procedure is invoked with passed arguments. Everything works as expected and gives the same result as the shell version. But in one case, if one parameter is null, the script reads that data from a Netezza table (a Varchar field). While testing that scenario, when I tried to print the result read from the table, I got a weird error saying "'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)". I tried to convert the value to a string, but that did not work.
Attaching the script for reference.
Note: The script may not follow Python standards. I am open to any suggestions for improving the code.
# connection
try:
    conn_str ="DRIVER={NetezzaSQL};SERVER="+results.host+";PORT=5480;DATABASE="+results.sugarDB+";UID="+results.username+";PWD="+results.password+""
    print conn_str
    conn_sugar = pyodbc.connect(conn_str,ansi=True)
    cur_sugar = conn_sugar.cursor()
    if (conn_sugar):
        print "Connection successful"
except Exception, e:
    print "Error while creating Netezza connection Error:",e
    sys.exit(-1)
# reading data from Netezza table
try:
    # checking for null parameter dim list
    if str(results.dimList)=="":
        print "dimlist is null"
        var_query="select LP.DIMENSIONS AS DIMENSIONS from PICASO..LKP_PX_RECOMMEND_METADATA LP where LP.client_id="+results.clientID+""
        print var_query
        for row in cur_sugar.execute(var_query):
            print "line no 62"
            print row.DIMENSIONS
        conn_sugar.commit();
    else:
        print "dimlist is not null",results.dimList
        v=results.dimList
        cur_sugar.execute("{exec SQLTOOLKIT..UDP_PRC_GET_MEDIAPLAN_RECOMMENDATION_3004("+results.clientID+","+results.configID+","+results.jobinstanceID+",'"+results.convBegin+"','"+results.convEnd+"','"+results.jaMeta+"','"+results.sugarDB+"','"+results.dimList+"','"+results.flag+"')}")
        conn_sugar.commit();
    conn_sugar.close();
except Exception, e:
    print "procedure call failed!!! Error :",e
Error coming as
procedure call failed!!! Error : 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Thanks
Anoop R
The error message is saying that it cannot convert the data into a valid ASCII string. decode() for bytes has an option for how to handle errors: you can use 'ignore' or 'replace'. 'replace' will put a replacement character (U+FFFD, usually displayed as a question mark) wherever the original bytes could not be parsed as ASCII.
value = b''
val_str = value.decode("ascii", 'ignore')
Think of the ordinal as the decimal number in bytes for the ascii table lookup. http://www.asciitable.com/
value = bytes([97]) # a
val_str = value.decode("ascii", "ignore")
print(val_str)
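The difference between the two handlers is easier to see with bytes that are not valid ASCII. This Python 3 sketch uses a made-up value; the actual contents of the DIMENSIONS column are not shown in the question:

```python
raw = b"caf\xc3\xa9"  # hypothetical value: the UTF-8 bytes for "café"

ignored = raw.decode("ascii", "ignore")    # non-ASCII bytes are dropped
replaced = raw.decode("ascii", "replace")  # each bad byte becomes U+FFFD
as_utf8 = raw.decode("utf-8")              # the right codec loses nothing

print(repr(ignored), repr(replaced), repr(as_utf8))
```

If the column really holds UTF-8 data, decoding with the matching codec is preferable to either error handler.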
This issue could be related to UTF-8 conversion. Your result has a non-Unicode field, and that might be causing the issue. Try this solution.
pyodbc remove unicode strings
Refer: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)
A similar issue at Django area. This will give you an idea where it originates from.

UnicodeEncodeError: 'ascii' codec can't encode characters due to één from database

I have to get a field from a database that contains a string with the part één, and while getting it I get this error:
"UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-15: ordinal not in range(128)"
I have searched for this error, and other people were having issues due to Unicode escapes that look something like u'\xa0', etc. But in my case, I think it's due to special characters. I cannot make changes in the database, as I do not have that access; I can only read from it.
The code is here (actually it's a call to an external URL):
req = urllib2.Request(url)
req.add_header("Content-type", "application/json")
res = urllib2.urlopen(req,timeout = 50) #50 secs timeout
clientid = res.read()
result = json.loads(clientid)
Then I use result variable to get the above mentioned string and I get error on this line:
updateString +="name='"+str(result['product_name'])+"', "
You need to find out which encoding was used for your data before it was inserted into the database. Let's assume it's UTF-8, since that's the most common.
In that case you will want to decode as UTF-8 instead of ASCII. You didn't provide any code, so I'm assuming you have "data".decode(). Try "data".decode("utf-8"), and if your data was encoded using this encoding, it will work.
So it sounds to me like the string was already unicode then. So remove the str() and unicode() calls on that line.
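A small Python 3 sketch of that last suggestion, with a made-up response body (only the product_name key and the word één come from the question). The value returned by json.loads is already Unicode text, so it can be concatenated directly and encoded once, where bytes are needed:

```python
import json

# Hypothetical response body standing in for res.read().
body = '{"product_name": "\\u00e9\\u00e9n product"}'
result = json.loads(body)

name = result["product_name"]            # already Unicode text, no str() call
update_string = "name='" + name + "', "

# Encode only at the boundary where bytes are actually required.
encoded = update_string.encode("utf-8")
print(update_string)
```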

Unicode error trying to call Google search API

I need to perform a Google search to retrieve the number of results for a query. I found the answer here - Google Search from a Python App
However, for a few queries I am getting the error below. I think the query has Unicode characters.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)
I searched Google and found that I need to convert Unicode to ASCII, and found the code below.
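import unicodedata

def convertToAscii(text, action):
    try:
        # Decode the UTF-8 bytes, split accented letters into base + mark,
        # then encode to ASCII using the given error handler.
        temp = unicode(text, "utf-8")
        fixed = unicodedata.normalize('NFKD', temp).encode('ASCII', action)
        return fixed
    except Exception, errorInfo:
        print errorInfo
        print "Unable to convert the Unicode characters to xml character entities"
        raise errorInfo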
If I use the action ignore, it removes those characters, but if I use other actions, I get exceptions.
Any idea how to handle this?
Thanks
== Edit ==
I am using the code below to encode the query and then perform the search, and this is throwing the error.
query = urllib.urlencode({'q': searchfor})
You cannot urlencode raw Unicode strings. You need to first encode them to UTF-8 and then feed to it:
query = urllib.urlencode({'q': u"München".encode('UTF-8')})
This returns q=M%C3%BCnchen which Google happily accepts.
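For comparison, a short Python 3 sketch of the same call. There the function lives in urllib.parse, and as far as I know it encodes str values as UTF-8 by default, so both forms give the same query string:

```python
from urllib.parse import urlencode

# str input: encoded as UTF-8 by default before percent-escaping.
from_text = urlencode({"q": "München"})

# bytes input: the explicit encode from the Python 2 answer still works.
from_bytes = urlencode({"q": "München".encode("utf-8")})

print(from_text, from_bytes)
```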
You can't safely convert Unicode to ASCII. Doing so involves throwing away information (specifically, it throws away non-English letters).
You should be doing the entire process in Unicode, so as not to lose any information.
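The information loss is easy to demonstrate. This Python 3 sketch reuses the NFKD approach from the question; the to_ascii name and the sample strings are made up for illustration:

```python
import unicodedata

def to_ascii(text, action):
    """Decompose accented characters, then encode to ASCII with the given handler."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", action)

lossy = to_ascii("München", "ignore")  # the diaeresis is dropped
gone = to_ascii("日本語", "ignore")      # nothing survives at all

# 'xmlcharrefreplace' keeps the information as numeric character references.
entities = "München".encode("ascii", "xmlcharrefreplace")

print(lossy, gone, entities)
```

For Latin text the result may look acceptable, but for other scripts the whole query disappears, which is why keeping everything in Unicode is the safer route.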
