I have a Unicode string. I'm sure it's UTF-8, but I can't decode it. The string is '\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'. How do I decode it?
In Python 2 you can use aString.decode('unicode_escape'), which converts a byte string containing \uXXXX escape sequences into a unicode object:
>>> u'\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'
u'\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'
>>> '\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'.decode('unicode_escape')
u'\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'
>>>
In your case:
>>> print '\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'.decode('unicode_escape')
Легковые
>>>
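The session above is Python 2, where str is a byte string with a .decode method. On Python 3, str has no .decode, so a rough equivalent is to encode to bytes first. This is only a sketch that assumes the text is pure ASCII with literal \uXXXX escapes; the helper name decode_escapes is made up for illustration.

```python
def decode_escapes(text):
    """Turn literal \\uXXXX escapes in an ASCII-only str into real characters."""
    # Assumption: `text` is ASCII-only; non-ASCII input raises UnicodeEncodeError.
    return text.encode('ascii').decode('unicode_escape')


# The raw-string prefix keeps the backslashes literal, as in the question.
escaped = r'\u041b\u0435\u0433\u043a\u043e\u0432\u044b\u0435'
print(decode_escapes(escaped))  # Легковые
```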
I have lots of Unicode code points stored as strings in Python 3, e.g.
unicode = '3077'
where U+3077 is ぷ. How do I print this as human-readable text? I.e. how do I convert the string unicode to unicode_as_text such that:
>>> print(unicode_as_text)
ぷ
Your string is a Unicode code point written in hexadecimal, so convert it to an integer with int(..., 16) and pass that integer to chr to get the character:
>>> print(chr(int('3077', 16)))
ぷ
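Since the question mentions many such codes, the same idea can be wrapped in a small helper and applied to a list. This is a minimal sketch; the name codepoint_to_char and the sample codes are illustrative, not from the question.

```python
def codepoint_to_char(hex_code):
    """Convert a hexadecimal code point string such as '3077' to its character."""
    return chr(int(hex_code, 16))


codes = ['3077', '3042', '41']  # sample inputs chosen for illustration
text = ''.join(codepoint_to_char(code) for code in codes)
print(text)  # ぷあA

# Going the other way: character back to a zero-padded hex string.
print(format(ord('ぷ'), '04x'))  # 3077
```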
I have a variable b whose value is b'\xac\xed\x05sr\x00'.
How can I convert it to 'aced05737200'?
s and r are converted to 73 and 72 respectively because their ASCII codes are 0x73 and 0x72.
b.decode('utf-8') gives me this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in position 0: invalid start byte
Simply use the .hex() method:
>>> b = b'\xac\xed\x05sr\x00'
>>> b.hex()
'aced05737200'
This gives the wanted result. It is not a decoding or encoding problem: the bytes are not text to be decoded, and .hex() simply writes each byte as two hexadecimal digits in an ordinary string.
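For completeness, here is a short sketch of the round trip using only the standard library. bytes.hex() needs Python 3.5 or later; binascii.hexlify is shown as an alternative and returns bytes rather than str.

```python
import binascii

b = b'\xac\xed\x05sr\x00'

hex_text = b.hex()                    # bytes -> str of hex digits
print(hex_text)                       # aced05737200

round_trip = bytes.fromhex(hex_text)  # str of hex digits -> bytes
print(round_trip == b)                # True

# binascii.hexlify does the same job but returns bytes, not str.
print(binascii.hexlify(b))            # b'aced05737200'
```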
I have a question about Python 2 encoding. I am trying to decode an ASCII string that contains the Unicode escape code of a letter into Unicode, and then encode it back to Latin-1, but with no success. Here is an illustration:
In[27]: d = u'\u010d'
In[28]: print d.encode('utf-8')
č
In[29]: d1 = '\u010d'
In[30]: d1.decode('ascii').encode('utf-8')
Out[30]: '\\u010d'
I would like to convert '\u010d' to 'č'. Are there any built-in solutions to avoid custom string replacement?
When you do
d1 = '\u010d'
you actually get this string:
In [3]: d1
Out[3]: '\\u010d'
This is because in Python 2 "normal" (non-Unicode) strings don't recognize the \unnnn escape sequence, so it is kept as a literal backslash followed by unnnn.
In order to decode that, you need to use the unicode_escape codec:
In [4]: print d1.decode("unicode_escape").encode('utf-8')
č
But of course you shouldn't use Unicode escape sequences in non-Unicode strings in the first place.
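The same situation can be reproduced on Python 3 with a raw string, which makes the six literal characters easy to see. This is a sketch under the assumption that the escaped text is ASCII-only. Note that Latin-1 cannot represent U+010D at all, so UTF-8 is used for the final encoding here.

```python
# In Python 3 every str literal understands \u escapes, so a raw string
# is used here to reproduce the six literal characters from the question.
d1 = r'\u010d'
print(len(d1))  # 6

decoded = d1.encode('ascii').decode('unicode_escape')
print(decoded)                  # č
print(decoded.encode('utf-8'))  # b'\xc4\x8d'

# Latin-1 has no code for U+010D, so encoding to it fails.
try:
    decoded.encode('latin-1')
except UnicodeEncodeError as exc:
    print('latin-1 cannot encode it:', exc.reason)
```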
I have a string that looks like this:
>>> st = 'aaaaa\x12bbbbb'
I can convert it to an escaped string via:
>>> escaped_st = st.encode('string-escape')
>>> escaped_st
'aaaaa\\x12bbbbb'
How can I convert the escaped string back to the original string? I was trying to do something like this:
escaped_st.replace('\\\\', '\\')
Decode the escaped string with the same codec:
>>> st = 'aaaaa\x12bbbbb'
>>> escaped_st = st.encode('string-escape')
>>> escaped_st
'aaaaa\\x12bbbbb'
>>> escaped_st.decode('string-escape')
'aaaaa\x12bbbbb'
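The string-escape codec exists only in Python 2. On Python 3, a comparable round trip can be sketched with the unicode_escape codec, keeping in mind that encoding there produces bytes rather than str:

```python
st = 'aaaaa\x12bbbbb'

# str -> bytes, with non-printable characters written as backslash escapes
escaped_st = st.encode('unicode_escape')
print(escaped_st)  # b'aaaaa\\x12bbbbb'

# bytes -> str, turning the escapes back into the original characters
restored = escaped_st.decode('unicode_escape')
print(restored == st)  # True
```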
I have a string that contains Unicode escape sequences, e.g. \u2026. Somehow I receive it as a str rather than as unicode. How do I convert it back to unicode?
>>> a="Hello\u2026"
>>> b=u"Hello\u2026"
>>> print a
Hello\u2026
>>> print b
Hello…
>>> print unicode(a)
Hello\u2026
>>>
So clearly unicode(a) is not the answer. Then what is?
Unicode escapes only work in unicode strings, so this
a="\u2026"
is actually a string of 6 characters: '\', 'u', '2', '0', '2', '6'.
To make unicode out of this, use decode('unicode-escape'):
a="\u2026"
print repr(a)
print repr(a.decode('unicode-escape'))
## '\\u2026'
## u'\u2026'
Decode it with the unicode-escape codec:
>>> a="Hello\u2026"
>>> a.decode('unicode-escape')
u'Hello\u2026'
>>> print _
Hello…
This is because in a non-unicode string \u2026 is not recognised as an escape and is instead treated as a literal series of characters (to put it more clearly, 'Hello\\u2026'). You need to decode the escapes, and the unicode-escape codec can do that for you.
Note that you can get unicode to recognise it in the same way by specifying the codec argument:
>>> unicode(a, 'unicode-escape')
u'Hello\u2026'
But the a.decode() way is nicer.
>>> a="Hello\u2026"
>>> print a.decode('unicode-escape')
Hello…
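One caveat: the unicode-escape codec reads its input bytes as Latin-1, so it can mangle text that already contains non-ASCII characters. When only \uXXXX sequences need converting, a regular expression is one possible alternative. The sketch below is written for Python 3, and the helper name unescape_u is hypothetical.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_u(text):
    """Replace literal \\uXXXX sequences with the characters they name."""
    # Assumption: only \uXXXX is handled; other escapes such as \n or \x12
    # and surrogate pairs are left untouched.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(unescape_u(r'Hello\u2026'))        # Hello…
print(unescape_u('naïve' + r'\u2026'))   # naïve…
```

Existing non-ASCII characters pass through unchanged because the substitution only touches the matched escape sequences.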