string encode / decode [duplicate] - python

This question already has answers here:
Python - email header decoding UTF-8
(9 answers)
Closed 6 years ago.
'=?KOI8-R?B?W1JFUS0wMDI1NDEtNDc5NzddIO/h7yAi89TSz8rGwdLGz9IiIDs=?=\r\n\t=?KOI8-R?B?Ry43MjkgKDEwKQ==?='
How can I convert this into something readable?
Thanks!

>>> email.header.decode_header('=?KOI8-R?B?W1JFUS0wMDI1NDEtNDc5NzddIO/h7yAi89TSz8rGwdLGz9IiIDs=?=\r\n\t=?KOI8-R?B?Ry43MjkgKDEwKQ==?=')
[('[REQ-002541-47977] \xef\xe1\xef "\xf3\xd4\xd2\xcf\xca\xc6\xc1\xd2\xc6\xcf\xd2" ;G.729 (10)', 'koi8-r')]
>>> print '[REQ-002541-47977] \xef\xe1\xef "\xf3\xd4\xd2\xcf\xca\xc6\xc1\xd2\xc6\xcf\xd2" ;G.729 (10)'.decode('koi8-r')
[REQ-002541-47977] ОАО "Стройфарфор" ;G.729 (10)

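This is the "encoded-word" syntax specified in RFC 2047. Each =?charset?encoding?text?= chunk names a charset (here KOI8-R) and an encoding (B for Base64, Q for quoted-printable).
The email package handles this format. email.header.decode_header returns a list of (encoded text, charset) pairs, and each text can then be decoded with its charset, as in the session above.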
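The session above is Python 2, where .decode() is called on a byte string. The code below is a sketch of a Python 3 equivalent, assuming the same header value. In Python 3, decode_header returns (bytes, charset) pairs for encoded-words. make_header then decodes each chunk with its charset, and str() on the result gives the readable text.

```python
from email.header import decode_header, make_header

# The folded RFC 2047 header from the question: two encoded-words,
# both KOI8-R with Base64 ("B") encoding, separated by CRLF + tab.
raw = ('=?KOI8-R?B?W1JFUS0wMDI1NDEtNDc5NzddIO/h7yAi89TSz8rGwdLGz9IiIDs=?=\r\n\t'
       '=?KOI8-R?B?Ry43MjkgKDEwKQ==?=')

# decode_header yields (bytes, charset) pairs; adjacent words with the same
# charset are merged and the folding whitespace between them is dropped.
print(decode_header(raw))

# make_header decodes each chunk with its charset; str() joins the pieces.
readable = str(make_header(decode_header(raw)))
print(readable)  # [REQ-002541-47977] ОАО "Стройфарфор" ;G.729 (10)
```

The one-liner str(make_header(decode_header(value))) also works for headers that mix plain ASCII text with encoded-words.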

Related

Python - How to convert HTML entity to UTF-8 [duplicate]

This question already has answers here:
Decode HTML entities in Python string?
(6 answers)
Closed 3 years ago.
In Python 2.7, I want to convert strings containing HTML entities such as
"&euro;" and "&#380;"
into UTF-8 strings.
How to do it?
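Python 3:
>>> import html
>>> html.unescape('&copy;')
'©'
>>> html.unescape('&euro;')
'€'
>>> html.unescape('&#380;')
'ż'
html.unescape is in the html module (Python 3.4 and later). It converts both named and numeric character references.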
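The question asks for UTF-8, so the unescaped str still has to be encoded. Below is a small Python 3 sketch of both steps. entities_to_utf8 is a hypothetical helper name and is not part of the html module.

```python
import html

def entities_to_utf8(text):
    """Hypothetical helper: resolve HTML entities, then encode as UTF-8 bytes."""
    # html.unescape handles named (&euro;) and numeric (&#380;, &#x17c;) references.
    return html.unescape(text).encode('utf-8')

print(html.unescape('&euro; &#380;'))     # € ż
print(entities_to_utf8('&euro; &#380;'))  # b'\xe2\x82\xac \xc5\xbc'
```

Python 2.7, which the question targets, has no html.unescape. The substitute commonly used there is the undocumented HTMLParser.HTMLParser().unescape method, which is not shown or tested here.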

Encoding string to Windows-1252 URL format in Python 3 [duplicate]

This question already has answers here:
URL encoding in python
(3 answers)
Closed 4 years ago.
I want to represent all characters in a string as in this table.
But when I do
raw = 'æøå'
encoded = raw.encode('cp1252')
print(encoded)
I get
>>> b'\xe6\xf8\xe5'
What I want is
>>> %E6%F8%E5
as a string for use in a URL.
You have to "quote" your string using urllib tools.
import urllib.parse
raw = 'æøå'
print(urllib.parse.quote(raw, encoding='cp1252'))
# prints %E6%F8%E5
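The same encoding argument also works when building a whole query string. The sketch below compares cp1252 with quote's UTF-8 default. The parameter name "name" is only an example.

```python
from urllib.parse import quote, urlencode

raw = 'æøå'

# Percent-encode one value using Windows-1252 bytes (one byte per character).
print(quote(raw, encoding='cp1252'))                # %E6%F8%E5

# urlencode passes the encoding through to each key and value.
print(urlencode({'name': raw}, encoding='cp1252'))  # name=%E6%F8%E5

# For contrast, the UTF-8 default produces two bytes per character here.
print(quote(raw))                                   # %C3%A6%C3%B8%C3%A5
```

quote leaves unreserved ASCII characters (letters, digits and _.-~) unescaped, so only characters outside that set become %XX sequences.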

Python convert Hexadecimal Character to Respective Symbols? [duplicate]

This question already has answers here:
How do I url unencode in Python?
(3 answers)
Closed 5 years ago.
I'm trying to find a Python package or sample code that can convert the input "why+don%27t+you+want+to+talk+to+me" to "why+don't+you+want+to+talk+to+me".
That is, I want to convert hex codes like %27 to their respective characters (here '). I could hard-code the whole hex character set and swap each code for its symbol, but I want a simple and scalable solution.
Thanks for helping
You can use urllib's unquote function.
import urllib.parse
print(urllib.parse.unquote('why+don%27t+you+want+to+talk+to+me'))
# prints why+don't+you+want+to+talk+to+me
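unquote leaves the + signs alone, which matches the output asked for. In HTML form data, + stands for a space, and unquote_plus also converts it. The sketch below compares the two functions.

```python
from urllib.parse import unquote, unquote_plus

s = 'why+don%27t+you+want+to+talk+to+me'

print(unquote(s))       # why+don't+you+want+to+talk+to+me
print(unquote_plus(s))  # why don't you want to talk to me
```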

encoding string that has been decoded with %' to unicode [duplicate]

This question already has answers here:
Transform URL string into normal string in Python (%20 to space etc)
(3 answers)
Url decode UTF-8 in Python
(5 answers)
Decode escaped characters in URL
(5 answers)
Closed 5 years ago.
The HTML POST method percent-encoded my string like this:
Ostrołęka => Ostro%C5%82%C4%99ka
How do I decode it back into readable form in Python?
Sorry if this is a duplicate.
EDIT: The solution in the 'possible duplicate' doesn't solve the problem above.
Python 2 (unquote returns a byte string here, so decode it to get unicode):
from urllib import unquote
x = unquote('Ostro%C5%82%C4%99ka').decode('utf-8')
Python 3:
from urllib.parse import unquote
x = unquote('Ostro%C5%82%C4%99ka')
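The string came from an HTML POST form, so the whole request body may need decoding, not just one value. The Python 3 sketch below assumes a hypothetical application/x-www-form-urlencoded body with made-up field names city and note.

```python
from urllib.parse import unquote, parse_qs

encoded = 'Ostro%C5%82%C4%99ka'

# unquote decodes the %XX escapes as UTF-8 by default (encoding='utf-8').
print(unquote(encoded))  # Ostrołęka

# Hypothetical form body: parse_qs splits on & and =, turns + into spaces,
# percent-decodes each part and returns a list of values per key.
body = 'city=Ostro%C5%82%C4%99ka&note=hello+world'
print(parse_qs(body))  # {'city': ['Ostrołęka'], 'note': ['hello world']}
```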

String Encoding in Python [duplicate]

This question already has answers here:
Saving UTF-8 texts with json.dumps as UTF-8, not as a \u escape sequence
(12 answers)
Closed 7 months ago.
I have strings like Trang chủ and Đồ Dùng Nhà Bếp, which contain special characters. When I print them, they are shown as-is. But when I convert them to JSON, Trang chủ changes to Trang ch\xe1\xbb\xa7. How can I keep them as they are in the JSON output too? Thanks in advance.
I tried the suggested answer:
string.encode('utf-8', "ignore")
string.decode("ascii", "ignore")
and got this error:
UnicodeDecodeError('ascii', 'Trang ch\xe1\xbb\xa7', 8, 9, 'ordinal not in range(128)')
Is there a way around this?
The question linked as a duplicate is not the question I was asking.
The answer provided does solve my problem:
json.dumps(your_string, ensure_ascii=False)
Just use:
json.dumps(your_string, ensure_ascii=False)
This disables the \uXXXX escaping of non-ASCII characters, so they appear unchanged in the JSON output.
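The Python 3 sketch below shows the difference, using one of the strings from the question. The last line suggests where the \xe1\xbb\xa7 in the question comes from: those are the UTF-8 bytes of ủ, as Python displays them in a byte-string repr.

```python
import json

s = 'Trang chủ'

escaped = json.dumps(s)                        # default ensure_ascii=True
unescaped = json.dumps(s, ensure_ascii=False)  # keep non-ASCII characters as-is
print(escaped)    # "Trang ch\u1ee7"
print(unescaped)  # "Trang chủ"

# To store or send the JSON as UTF-8 bytes, encode the resulting str.
print(unescaped.encode('utf-8'))  # b'"Trang ch\xe1\xbb\xa7"'
```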
