String Encoding in Python [duplicate] - python

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
I have strings like Trang chủ and Đồ Dùng Nhà Bếp, which contain special characters. When I print them, they are shown as they are. But when I convert them to JSON, they change to Trang ch\xe1\xbb\xa7. How can I print them unchanged in JSON format as well? Thanks in advance.
I tried the suggested answer of -
string.encode('utf-8', "ignore")
string.decode("ascii", "ignore")
and got this error:
UnicodeDecodeError('ascii', 'Trang ch\xe1\xbb\xa7', 8, 9, 'ordinal not in range(128)')
Is there a way around?
The link provided as duplicate is not the question I was asking.
However, the answer provided does solve my problem:
json.dumps(your_string, ensure_ascii=False)

Just use:
json.dumps(your_string, ensure_ascii=False)
This disables the escaping of non-ASCII characters.
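As a small sketch of the difference (Python 3 assumed, with one of the strings from the question), the default call and the ensure_ascii=False call can be compared side by side. Both results are valid JSON and decode back to the same text:

```python
import json

text = "Trang chủ"

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(text)

# With ensure_ascii=False the characters are written as they are.
readable = json.dumps(text, ensure_ascii=False)

print(escaped)
print(readable)

# Either form loads back to the original string.
print(json.loads(escaped) == json.loads(readable) == text)
```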

Related

How to convert a string containing hexadecimal number to ascii text in python [duplicate]

This question already has answers here:
Convert from ASCII string encoded in Hex to plain ASCII?
(9 answers)
Closed 4 years ago.
I've been trying to convert this string:
str = "68656c6c6f20776f726421"
to recover its ASCII value:
str = "hello word!"
I need some help with this please.
EDIT
Sorry for not giving more information; I'm a newbie, unfortunately.
But after reading several pages of this wonderful site, I found the solution.
The problem was that I read the string from a file, and a \n came along with it, changing the length of the string.
Solution here: Python: binascii.a2b_hex gives "Odd-length string"
Have you tried:
codecs.decode(str, "hex").decode('utf-8')
You may need to assign this to a variable or print the results to see the output depending on your use case.
I would do it as follows:
data = bytes.fromhex('68656c6c6f20776f726421')
print('The hex value converted to ascii is the following:', data.decode('utf-8'))
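The edit in the question mentions that the string came from a file with a trailing \n. A minimal sketch of handling that, assuming Python 3 and a hypothetical variable raw standing in for the line read from the file, is to strip whitespace before decoding:

```python
# Hypothetical line as it might come out of a file, newline included.
raw = "68656c6c6f20776f726421\n"

# strip() drops the trailing newline so only hex digits remain.
text = bytes.fromhex(raw.strip()).decode("ascii")

print(text)
```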

How can I stringify a dictionary having a string with cyrillic symbols into a pretty JSON? [duplicate]

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 5 years ago.
Desired output:
'{"a": "йцукен"}'
I tried this:
>>> import json
>>> json.dumps({'a': 'йцукен'})
'{"a": "\\u0439\\u0446\\u0443\\u043a\\u0435\\u043d"}'
How can I avoid these U-codes and print normal symbols?
I am using Python 3.
You can use the ensure_ascii keyword argument as shown here:
json.dumps({'a': 'йцукен'}, ensure_ascii=False)
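Since the title asks for pretty JSON, the same call can also take the indent argument. This is only a sketch; the indent width of 4 is an arbitrary choice, not something from the question:

```python
import json

data = {'a': 'йцукен'}

# ensure_ascii=False keeps the Cyrillic text readable,
# indent=4 puts each key on its own indented line.
pretty = json.dumps(data, ensure_ascii=False, indent=4)

print(pretty)
```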

Python character encoding for '%C5%9' and similar [duplicate]

This question already has an answer here:
Weird character encoding issue with python / nautilus scripts combo
(1 answer)
Closed 9 years ago.
I am working in Python with strings, but I can't manage to display certain characters properly.
For example, I have this string:
%23%C5%9Een%C5%9EakrakTakiple%C5%9FelimYine
I have applied several functions to it, to no avail. How can I display the appropriate characters on a web site?
You need two things. First, unescape the URL-encoded data with urllib.unquote; then decode the resulting bytes from whatever charset they are in, which here looks like UTF-8:
>>> import urllib
>>> foo = '%23%C5%9Een%C5%9EakrakTakiple%C5%9FelimYine'
>>> print urllib.unquote(foo).decode('utf-8')
#ŞenŞakrakTakipleşelimYine
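The session above is Python 2. In Python 3 the function lives in urllib.parse and decodes the bytes in the same call, so a rough equivalent (assuming the data really is UTF-8, as guessed above) would be:

```python
from urllib.parse import unquote

foo = "%23%C5%9Een%C5%9EakrakTakiple%C5%9FelimYine"

# unquote() undoes the percent-encoding and decodes the bytes;
# UTF-8 is the default, spelled out here for clarity.
decoded = unquote(foo, encoding="utf-8")

print(decoded)
```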

Unicode values in strings are escaped when dumping to JSON in Python [duplicate]

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
For example:
>>> print(json.dumps('růže'))
"r\u016f\u017ee"
(Of course, in the real program it's not just a single string, and it also appears like this in the file when using json.dump().) I'd like it to output simply "růže" as well. How can I do that?
Pass the ensure_ascii=False argument to json.dumps:
>>> print(json.dumps('růže', ensure_ascii=False))
"růže"
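The question also mentions json.dump() writing to a file. A sketch of that case follows, assuming Python 3; the file name and the payload are made up for illustration. The file is opened with an explicit UTF-8 encoding so the unescaped characters can be written regardless of the platform default:

```python
import json
import os
import tempfile

data = {"flower": "růže"}  # hypothetical payload

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.json")

# Open the file as UTF-8 and let json.dump write the text unescaped.
with open(path, "w", encoding="utf-8") as fh:
    json.dump(data, fh, ensure_ascii=False)

# Read the file back to see what was actually stored.
with open(path, encoding="utf-8") as fh:
    written = fh.read()

print(written)
```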

string encode / decode [duplicate]

This question already has answers here:
Python - email header decoding UTF-8
(9 answers)
Closed 6 years ago.
'=?KOI8-R?B?W1JFUS0wMDI1NDEtNDc5NzddIO/h7yAi89TSz8rGwdLGz9IiIDs=?=\r\n\t=?KOI8-R?B?Ry43MjkgKDEwKQ==?='
How can I convert this into something readable?
Thanks !
>>> email.header.decode_header('=?KOI8-R?B?W1JFUS0wMDI1NDEtNDc5NzddIO/h7yAi89TSz8rGwdLGz9IiIDs=?=\r\n\t=?KOI8-R?B?Ry43MjkgKDEwKQ==?=')
[('[REQ-002541-47977] \xef\xe1\xef "\xf3\xd4\xd2\xcf\xca\xc6\xc1\xd2\xc6\xcf\xd2" ;G.729 (10)', 'koi8-r')]
>>> print '[REQ-002541-47977] \xef\xe1\xef "\xf3\xd4\xd2\xcf\xca\xc6\xc1\xd2\xc6\xcf\xd2" ;G.729 (10)'.decode('koi8-r')
[REQ-002541-47977] ОАО "Стройфарфор" ;G.729 (10)
This is encoded-word encoding as specified in RFC 2047.
The email package should be able to deal with this format.
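The session above is Python 2. In Python 3, decode_header returns bytes for encoded words, so one possible approach (a sketch, not the only way) is to pass its result to email.header.make_header and convert that to str:

```python
from email.header import decode_header, make_header

raw = ('=?KOI8-R?B?W1JFUS0wMDI1NDEtNDc5NzddIO/h7yAi89TSz8rGwdLGz9IiIDs=?=\r\n\t'
       '=?KOI8-R?B?Ry43MjkgKDEwKQ==?=')

# decode_header() splits the value into (bytes, charset) pairs;
# make_header() rebuilds a Header whose str() is the decoded text.
subject = str(make_header(decode_header(raw)))

print(subject)
```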
