I'm having the following problem: when I use the SQSConnection.send_message method with a fixed string as a parameter (with no accented characters), it works as expected. But when I get the body of a message (using get_messages) and try to send it again to the same queue, I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 38: ordinal not in range(128)
The messages were written directly from the Amazon Web Console and have a few ";" characters and some accented characters such as "õ" and "ã". What should I do? I'm already using set_message_class(RawMessage) as suggested here:
Using python BOTO with AWS SQS, getting back nonsense characters
but it only worked for receiving the messages. I'm using Ubuntu 12.04, with python-boto installed from the repositories (I think it's version 2.22, but I don't know how to check).
Thanks!!
send_message can only handle byte strings (str class). What you are receiving from SQS is a Unicode string (unicode class). You need to convert your Unicode string to a byte string by calling encode('utf-8') on it.
If you have a mix of string types coming in you may need to conditionally encode the Unicode string into a byte string.
if type(message_body) is unicode:
    message_content = message_body.encode('utf-8')
else:
    message_content = message_body
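For the SQS round trip specifically, a minimal sketch might look like this (assuming boto 2.x; the region name and queue name are hypothetical):

from boto.sqs import connect_to_region
from boto.sqs.message import RawMessage

# Hypothetical region and queue name, for illustration only.
conn = connect_to_region('us-east-1')
queue = conn.get_queue('my-queue')
queue.set_message_class(RawMessage)

messages = queue.get_messages(1)
if messages:
    body = messages[0].get_body()       # comes back as a unicode object
    if isinstance(body, unicode):
        body = body.encode('utf-8')     # send_message needs a byte string
    conn.send_message(queue, body)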
I am running into this problem where, when I try to decode a string, I run into one error, and when I try to encode it I run into another (both errors are below). Is there a permanent solution for this?
P.S. Please note that you may not be able to reproduce the encoding error with the string I provided, as I couldn't copy/paste some of the errors.
text = "sometext"
string = '\n'.join(list(set(text)))
try:
print "decode"
text = string.decode('UTF-8')
except Exception as e:
print e
text = string.encode('UTF-8')
Errors:
Error while using string.decode('UTF-8'):
'ascii' codec can't encode character u'\u2602' in position 438: ordinal not in range(128)
Error while using string.encode('UTF-8'):
Exception All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
The First Error
The code you have provided will work, as text is a bytestring (since you are using Python 2). But what you're trying to do is decode from a UTF-8 string to an ASCII one, which is possible, but only if that string contains nothing but characters that have an ASCII equivalent (you can see the list of ASCII characters here). In your case, it's encountering a Unicode character (specifically ☂) which has no ASCII equivalent. You can get around this behaviour by using:
string.decode('UTF-8', 'ignore')
Which will just ignore (i.e. replace with nothing) the characters that cannot be encoded into ASCII.
The Second Error
This error is more interesting. It appears the text you are trying to encode into UTF-8 contains either NULL bytes or specific control characters, which whatever consumes the encoded text does not allow (the error message suggests an XML layer is enforcing this). Again, the code that you have actually provided works, but something in the text that you are trying to encode is violating those rules. You can try the same trick as above:
string.encode('UTF-8', 'ignore')
Which will simply remove the offending characters, or you can look into what it is in your specific text input that is causing the problem.
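To see what the 'ignore' error handler does in isolation, here is a standalone sketch with a made-up UTF-8 byte string (not the poster's data):

# -*- coding: utf-8 -*-
raw = 'caf\xc3\xa9 \xe2\x98\x82'          # UTF-8 bytes for u'caf\xe9 \u2602' (cafe + umbrella)
text = raw.decode('utf-8')                # bytes -> unicode
print repr(text)                          # u'caf\xe9 \u2602'

# 'ignore' silently drops characters the target codec cannot represent,
# while 'replace' substitutes a placeholder instead:
print text.encode('ascii', 'ignore')      # 'caf '
print text.encode('ascii', 'replace')     # 'caf? ?'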
I am trying to run a bash command from my Python program which outputs the result to a file. I am using os.system to execute the bash command, but I am getting an error as follows:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 793: ordinal not in range(128)
I am not able to understand how to handle it. Please suggest a solution for it.
Have a look at this blog post:
These messages usually mean that you're trying to either mix Unicode strings with 8-bit strings, or trying to write Unicode strings to an output file or device that only handles ASCII.
Try the following. This can be used to properly convert input data to Unicode; assuming the string referred to by value is encoded as UTF-8:
value = unicode(value, "utf-8")
You need to encode your string as:
your_string = your_string.encode('utf-8')
For example:
>>> print(u'\u201c'.encode('utf-8'))
“
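As a concrete illustration for the os.system case, a minimal sketch (the command and output path are made up):

# -*- coding: utf-8 -*-
import os

# Hypothetical command containing non-ASCII characters (curly quotes, U+201C/U+201D).
command = u'echo \u201chello\u201d > /tmp/out.txt'

# os.system expects a byte string on Python 2, so encode the unicode command
# explicitly instead of letting Python fall back to the ASCII codec.
os.system(command.encode('utf-8'))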
So I am using a Django Tastypie resource and I am trying to find a generic way to decode any string that may be posted to the resource.
I have, for example, a name like this:
luiçscoico2##!&&á
and I want to be able to identify the type of encoding and decode it appropriately.
I am trying to fetch the string like this:
print bundle.data.get('first_name')
When I do a json.dumps my first_name string becomes
"lui\u00e7scoico2##!&&\u00e1"
and I get an INTERNAL SERVER ERROR... any ideas?
UPDATE:
I do get a
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in
position 3: ordinal not in range(128)
if I try to decode('utf-8') before doing the json.dumps to send to the server.
OK, I'm gonna try to give a semi-blind answer here. Your string is already Unicode; the reason I know this is the u'\xe7', which is exactly the ç character.
This means you don't have to decode it. If you need your string in UTF-8 then just do:
x.encode('utf-8')
and it will probably work :)
Hope this helps!
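To make that concrete, a small sketch (the name is the one from the question; everything else is illustrative):

# -*- coding: utf-8 -*-
import json

first_name = u'lui\xe7scoico2##!&&\xe1'   # already a unicode object

# json.dumps escapes non-ASCII characters by default, which is why the payload
# shows \u00e7 and \u00e1 -- that is valid JSON, not corruption.
print json.dumps({'first_name': first_name})

# If you need the value as UTF-8 bytes (for logging, storage, etc.), encode it:
print first_name.encode('utf-8')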
I have a small webapp that runs Python on the server side and javascript (jQuery) on the client side.
Now upon a certain request my Python script returns a unicode string and the client is supposed to put that string inside a div in the browser. However I get a unicode encode error from Python.
If I run the script from the shell (bash on Debian Linux) the script runs fine and prints the unicode string.
Any ideas ?
Thanks!
EDIT
This is the print statement that causes the error:
print u'öäü°'
This is the error message I get:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 34-36: ordinal not in range(128)
However I only get that message when calling the script via Ajax ( $('#somediv').load('myscript.py'); ).
Thank you !
If the Python interpreter can't determine the encoding of sys.stdout, ASCII is used as a fallback. However, the characters in the string are not part of ASCII, so a UnicodeEncodeError exception is raised.
A solution would be to encode the string yourself using something like .encode(sys.stdout.encoding or "utf-8"). This way UTF-8 is used as a fallback instead of ASCII.
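A minimal sketch of that approach (assuming a CGI-style script that prints to stdout):

# -*- coding: utf-8 -*-
import sys

text = u'öäü°'

# From a terminal, sys.stdout.encoding is usually set; under a web server it is
# often None, so fall back to UTF-8 explicitly.
encoding = sys.stdout.encoding or 'utf-8'
print text.encode(encoding)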
I'm having a problem when trying to apply a regular expression to some strings encoded in latin-1 (ISO-8859-1).
What I'm trying to do is send some data via HTTP POST from a page encoded in ISO-8859-1 to my python application and do some parsing on the data using regular expressions in my python script.
The web page uses jQuery to send the data to the server and I'm grabbing the text from the page using the .text() method. Once the data is sent back to the server, the regular expression I apply to it looks like this: re.compile(r"^[\s,]*(\d*\s*\d*\/*\d)[\s,]*"). Unfortunately the \s in my regular expression is not matching my data, and I traced the problem down to the fact that the HTML page uses &nbsp;, which gets encoded to 0xA0 (non-breaking space) and sent to the server. For some reason, it seems, my script is not interpreting that character as whitespace and is not matching. According to the Python documentation it looks like this should work, so I must have an encoding issue here.
I then wanted to try converting the string into unicode and passing it to the regular expression, so I tried to view what would happen when I converted the string: print(unicode(data, 'iso-8859-1')).
Unfortunately I got this error:
UnicodeEncodeError at /script/
'ascii' codec can't encode character u'\xa0' in position 122: ordinal not in range(128)
I'm confused though: I'm obviously not trying to use ASCII decoding, so is Python trying to decode using ASCII even though I'm obviously passing another codec?
Try this instead:
print(repr(unicode(data, 'iso-8859-1')))
By printing a unicode object you're implicitly trying to convert it to the default encoding, which is ASCII. Using repr will escape it into an ASCII-safe form, and it'll be easier for you to figure out what's going on while debugging.
Are you using Python 3.X or 2.X? It makes a difference. Actually looks like 2.X but you confused me by using print(blahblah) :-)
Answer to your last question: Yes, ASCII by default when you do print(). On 3.X: Use print(ascii(foo)) for debugging, not print(foo). On 2.X use repr(), not ascii().
Your original problem with the no-break space should go away if (a) the data is unicode and (b) you use the re.UNICODE flag with re.compile().
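A minimal sketch of that combination (the POST data below is a made-up latin-1 example; the regex is the one from the question):

# -*- coding: utf-8 -*-
import re

# Hypothetical latin-1 POST data starting with a no-break space (0xA0).
raw = '\xa0 12/3 trailing text'
data = unicode(raw, 'iso-8859-1')

# With a unicode subject string and re.UNICODE, \s also matches u'\xa0'.
pattern = re.compile(r"^[\s,]*(\d*\s*\d*\/*\d)[\s,]*", re.UNICODE)
match = pattern.match(data)
if match:
    print repr(match.group(1))            # u'12/3'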