Django encoding/decoding issues - python

So I am using a Django Tastypie resource, and I am trying to find a generic way to decode any string that may be posted to the resource.
For example, I have a name like this:
luiçscoico2##!&&á
and I want my code to be able to identify the type of encoding and decode it appropriately.
I am trying to fetch the string like this:
print bundle.data.get('first_name')
When I do a json.dumps, my first_name string becomes:
"lui\u00e7scoico2##!&&\u00e1"
and i get an INTERNAL SERVER ERROR... any ideas?
UPDATE:
I do get a
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in
position 3: ordinal not in range(128)
if I try to decode('utf-8') before doing the json.dumps to send it to the server.

OK, I'm going to try to give a semi-blind answer here. Your string is already Unicode; I know this because of the u'\xe7' in the error, which is exactly the ç character.
This means you don't have to decode it. If you need your string as UTF-8 bytes, just do:
x.encode('utf-8')
and it will probably work :)
Hope this helps!
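For anyone on Python 3, where str is already Unicode text (the counterpart of Python 2's unicode type), here is a minimal sketch of why the \u00e7 in the question is harmless. It only assumes the standard json module; the first_name key mirrors the question's field.

```python
import json

# In Python 3, str is Unicode text, like Python 2's unicode type.
name = "luiçscoico2##!&&á"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
# "\u00e7" is valid JSON for "ç", not corruption.
escaped = json.dumps({"first_name": name})
print(escaped)  # {"first_name": "lui\u00e7scoico2##!&&\u00e1"}

# Parsing the JSON gives back exactly the original text.
restored = json.loads(escaped)["first_name"]

# ensure_ascii=False keeps the characters as-is in the output text.
unescaped = json.dumps({"first_name": name}, ensure_ascii=False)

# Need UTF-8 bytes (e.g. for a request body)? Encode the text; don't decode it.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'lui\xc3\xa7scoico2##!&&\xc3\xa1'
```

The escaped form and the raw form are the same JSON string, so either can be sent; the server error is more likely caused by the extra decode step than by the escapes.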

Related

Unicode error in python program output

I am trying to run a bash command from my Python program that writes its output to a file. I am using os.system to execute the bash command, but I am getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 793: ordinal not in range(128)
I don't understand how to handle it. Please suggest a solution.
Have a look at this blog post:
These messages usually mean that you’re either trying to mix Unicode strings with 8-bit strings, or trying to write Unicode strings to an output file or device that only handles ASCII.
The unicode() constructor can be used to convert input data to Unicode properly. Assuming the string referred to by value is encoded as UTF-8:
value = unicode(value, "utf-8")
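In Python 3 the unicode(value, "utf-8") call becomes a decode on a bytes object. A minimal sketch, assuming the input bytes really are UTF-8; the sample bytes are made up for illustration:

```python
# Hypothetical input: raw bytes as read from a file, socket, or subprocess pipe.
raw = b"\xe2\x80\x9cquoted\xe2\x80\x9d"

# Python 3 equivalent of unicode(value, "utf-8"): decode bytes into str.
value = raw.decode("utf-8")

# The same conversion through the str constructor with an explicit encoding.
same_value = str(raw, "utf-8")
```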
You need to encode your string as:
your_string = your_string.encode('utf-8')
For example:
>>> print(u'\u201c'.encode('utf-8'))
“
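A Python 3 sketch of the same idea, assuming the output should be UTF-8: the ASCII encode below reproduces the kind of error in the question, and the file write names its encoding explicitly instead of relying on the platform default. The sample text and the temporary file are illustrative only.

```python
import os
import tempfile

text = "He said \u201chello\u201d"

# Encoding to ASCII is the step that fails when Unicode text meets an
# ASCII-only output; the curly quote is outside range(128).
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly as UTF-8 succeeds for any str.
data = text.encode("utf-8")

# When writing text to a file, name the encoding instead of relying on the default.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as fh:
    fh.write(text)
    path = fh.name

with open(path, encoding="utf-8") as fh:
    read_back = fh.read()
os.remove(path)
```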

UnicodeEncodeError: 'ascii' codec can't encode characters due to één from database

I have a field to fetch from a database which contains a string with the part één, and while fetching it I get this error:
"UnicodeEncodeError: 'ascii' codec can't encode characters in position 12-15: ordinal not in range(128)"
I have searched for this error, and other people were having issues due to Unicode strings starting with something like u'\xa0, etc. But in my case, I think it's due to special characters. I cannot make changes to the database, as it's not under my control; I can only read it.
Here is the code (it's actually a call to an external URL):
req = urllib2.Request(url)
req.add_header("Content-type", "application/json")
res = urllib2.urlopen(req,timeout = 50) #50 secs timeout
clientid = res.read()
result = json.loads(clientid)
Then I use the result variable to get the string mentioned above, and I get an error on this line:
updateString +="name='"+str(result['product_name'])+"', "
You need to find out which encoding was used for your data before it was inserted into the database. Let's assume it's UTF-8, since that's the most common.
In that case you will want to decode as UTF-8 instead of ASCII. You didn't provide any code, so I'm assuming you have "data".decode(). Try "data".decode("utf-8"), and if your data was encoded using this encoding, it will work.
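A minimal Python 3 sketch of the difference, assuming the stored bytes are UTF-8 as the answer suggests; the sample value is the één from the question:

```python
# Hypothetical bytes as stored or transmitted, assumed to be UTF-8 encoded.
data = "één".encode("utf-8")

# Decoding with the ASCII codec fails on the first non-ASCII byte.
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the encoding that was actually used gives the right text.
text = data.decode("utf-8")
```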
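So it sounds to me like the string was already Unicode. So remove the str() and unicode() calls on that line: json.loads already returns unicode objects, and str() tries to encode them with the default ASCII codec.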
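A Python 3 sketch of the flow in the question, assuming the response body is UTF-8 JSON (in Python 3, urllib.request.urlopen(...).read() returns bytes). The body below is made up to stand in for the real response: decode once at the boundary, and after json.loads the values are already text.

```python
import json

# Hypothetical response body: raw UTF-8 bytes, as res.read() would return.
body = '{"product_name": "één"}'.encode("utf-8")

# Decode once at the boundary, then parse; string values come back as str.
result = json.loads(body.decode("utf-8"))
name = result["product_name"]

# The value is already text, so no str()/unicode() wrapping is needed.
update_string = "name='" + name + "', "
```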

Django - POST data in latin1, decode as utf-8

I'm using MySQL (not my choice), with everything set to utf8, utf8_general_ci. In the normal case everything is utf8 and happy.
However, if I POST something like É’s (some latin1) and save it into the database as normal, I can't call .decode('utf-8') on the resulting model field:
>>> myinstance.myfield.decode('utf-8')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 7: ordinal not in range(128)
I want to clean all incoming data so that it can be decoded as utf8.
Trying an approach like this just causes the UnicodeEncodeError upfront.
Edit: As Daniel's answer suggests, this question comes from a misunderstanding. latin1 is not the culprit here. Calling .decode('utf-8') on a unicode object first tries to encode it to ASCII, so it will fail for unicode like u'팩맨'.decode('utf-8'). It pains me to leave this question up, knowing what I know now, but maybe it will help someone. Since the data is actually coming back as unicode, I think what we were trying to do was equivalent to u'É’'.decode('utf-8').
Django fields are always unicode. Trying to call decode on them means that Python will try to encode first, to ASCII, before trying to decode as UTF-8. That clearly isn't what you want. I expect you actually just want to do myinstance.myfield.encode('utf-8').
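A Python 3 sketch of what the answer describes. The field value is a plain string standing in for a Django model field; the explicit ASCII encode reproduces the hidden step Python 2 ran before decoding.

```python
# Stand-in for a Django model field value: already text (str in Python 3).
field_value = "\u00c9\u2019s"  # "É’s"

# Python 2's u"...".decode("utf-8") first ran an implicit ASCII encode;
# doing that step explicitly shows where the UnicodeEncodeError came from.
try:
    field_value.encode("ascii")
    implicit_error = None
except UnicodeEncodeError as exc:
    implicit_error = exc

# Python 3 removes the trap: str has no decode method at all.
has_decode = hasattr(field_value, "decode")

# To get UTF-8 bytes, encode the text; decode the bytes to get it back.
as_bytes = field_value.encode("utf-8")
round_trip = as_bytes.decode("utf-8")
```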

Conversion of Unicode

I am a newbie in Python.
I have a Unicode string in Tamil.
When I use sys.getdefaultencoding() I get the output "Cp1252".
When I use text = testString.decode("utf-8") I get the error "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to undefined"
When I use the sys.getdefaultencoding() I get the output as "Cp1252"
Two comments on that: (1) it's "cp1252", not "Cp1252". Don't type from memory. (2) Whoever caused sys.getdefaultencoding() to produce "cp1252" should be told politely that that's not a very good idea.
As for the rest, let me guess. You have a unicode object that contains some text in the Tamil language. You try, erroneously, to decode it. Decode means to convert from a str object to a unicode object. Unfortunately you don't have a str object, and even more unfortunately you get bounced by one of the very few awkish/perlish warts in Python 2: it tries to make a str object by encoding your unicode string using the system default encoding. If that's 'ascii' or 'cp1252', encoding will fail. That's why you get a UnicodeEncodeError instead of a UnicodeDecodeError.
Short answer: do text = testString.encode("utf-8"), if that's what you really want to do. Otherwise please explain what you want to do, and show us the result of print repr(testString).
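A Python 3 sketch of checking what you actually have before converting. to_text is a hypothetical helper, not a standard function: it decodes only bytes and passes text through. ascii() is used because it prints an ASCII-only representation, like Python 2's repr of a unicode object.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return str, decoding only when given bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value  # already text: nothing to decode
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


tamil = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the word "Tamil" written in Tamil script

# ascii() shows which kind of object you have, using only ASCII characters.
print(ascii(tamil))
print(ascii(tamil.encode("utf-8")))
```

Each of these five Tamil code points takes three bytes in UTF-8, so the encoded form is 15 bytes long.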
Add this as the first line of your source file:
# -*- coding: utf-8 -*-
Later in your code:
text = unicode(testString,"UTF-8")
You need to know which character encoding testString uses. If it is not UTF-8, an error will occur when using decode('utf8').
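Note that the coding declaration only tells Python how to read the source file itself; it does not change how a string is decoded at run time. A minimal Python 3 sketch of why the actual encoding matters, assuming some bytes were produced by a cp1252 encoder (the sample word is illustrative):

```python
# Hypothetical bytes produced by a cp1252 (Windows Western European) encoder.
cp1252_bytes = "caf\u00e9".encode("cp1252")  # b"caf\xe9"

# Decoding with the wrong codec fails: 0xE9 is not a complete UTF-8 sequence here.
try:
    cp1252_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the codec that actually produced the bytes works.
text = cp1252_bytes.decode("cp1252")
```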

utf-8 plus question marks

I have a site that displays user input by decoding it to unicode using utf-8. However, user input can include binary data, which is obviously not always able to be 'decoded' by utf-8.
I'm using Python, and I get an error saying:
'utf8' codec can't decode byte 0xbf in position 0: unexpected code byte. You passed in '\xbf\xcd...
Is there a standard efficient way to convert those undecodable characters into question marks?
It would be most helpful if the answer uses Python.
Try:
inputstring.decode("utf8", "replace")
See here for reference
I think what you are looking for is:
str.decode('utf8','ignore')
which should drop invalid bytes rather than raising an exception.
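A Python 3 sketch comparing both answers. One detail: when decoding, "replace" inserts U+FFFD (the Unicode replacement character), not a literal question mark. If real question marks are wanted, a custom error handler can be registered with codecs.register_error; the name "question_mark" and the handler function below are hypothetical, chosen for illustration.

```python
import codecs

raw = b"\xbf\xcdabc"  # starts like the bytes in the question, then plain ASCII

# "replace" substitutes U+FFFD (the replacement character), not "?".
replaced = raw.decode("utf-8", "replace")

# "ignore" silently drops the undecodable bytes.
ignored = raw.decode("utf-8", "ignore")


def question_mark_handler(error):
    """Hypothetical error handler: emit "?" and resume after the bad bytes."""
    return ("?", error.end)


codecs.register_error("question_mark", question_mark_handler)
question_marks = raw.decode("utf-8", "question_mark")
```

The handler is called once per invalid sequence, so each undecodable byte here becomes one "?".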
