Python - How to convert HTML entity to UTF-8 [duplicate] - python

This question already has answers here:
Decode HTML entities in Python string?
(6 answers)
Closed 3 years ago.
In Python 2.7, I want to convert strings containing HTML entities, such as
"&euro;", "&#380;"
and similar, to UTF-8 strings.
How to do it?

Python 3:
>>> import html
>>> html.unescape('&copy;')
'©'
>>> html.unescape('&euro;')
'€'
>>> html.unescape('&#380;')
'ż'
html.unescape is part of the standard-library html module (Python 3.4 and later).
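If actual UTF-8 bytes are needed rather than a decoded str, one possible follow-up is to encode the result. The sketch below is only an illustration: the helper name entity_to_utf8 and the sample entities are assumptions, not part of the original answer.

```python
import html


def entity_to_utf8(text):
    """Replace HTML entities in text, then encode the result as UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    decoded = html.unescape(text)   # str with entities replaced by characters
    return decoded.encode("utf-8")  # UTF-8 encoded bytes


# Named and numeric entities are handled the same way.
for sample in ("&copy;", "&euro;", "&#380;"):
    print(sample, entity_to_utf8(sample))
```

In Python 3 the str returned by html.unescape is already Unicode, so the encode step is only needed when writing bytes to a file or socket.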

Related

Encoding string to Windows-1252 URL format in Python 3 [duplicate]

This question already has answers here:
URL encoding in python
(3 answers)
Closed 4 years ago.
I want to represent all characters in a string as in this
table.
But when I do
raw = 'æøå'
encoded = raw.encode('cp1252')
print(encoded)
I get
b'\xe6\xf8\xe5'
What I want is
%E6%F8%E5
as a string for use in a URL.
You have to percent-encode ("quote") the string with urllib.parse.quote, passing the target encoding:
import urllib.parse
raw = 'æøå'
print(urllib.parse.quote(raw, encoding='cp1252'))
# prints %E6%F8%E5
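To make the role of the encoding argument concrete, here is a small sketch comparing the cp1252 result with the default (UTF-8) result, and decoding it back again. The variable names are illustrative assumptions.

```python
from urllib.parse import quote, unquote

raw = "æøå"

# Each character becomes one byte in cp1252, so one %XX escape each.
as_cp1252 = quote(raw, encoding="cp1252")

# With the default encoding (UTF-8) each of these characters takes two bytes.
as_utf8 = quote(raw)

# Decoding must use the same encoding that was used for quoting.
restored = unquote(as_cp1252, encoding="cp1252")

print(as_cp1252)  # %E6%F8%E5
print(as_utf8)    # %C3%A6%C3%B8%C3%A5
```

Whichever encoding is chosen, the receiving side has to assume the same one, otherwise the decoded text will be garbled.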

Decode %xx in Python 3.6 [duplicate]

This question already has answers here:
Transform URL string into normal string in Python (%20 to space etc)
(3 answers)
Closed 4 years ago.
I have
%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%28%27%3c%61%20%68%72%65%66%3d%22%6d%61%69%6c%74%6f%3a%62%65%6e%2e%61%6e%67%65%72%40%6b%6e%6f%62%62%65%2e%63%6f%6d%22%20%72%65%6c%3d%22%6e%6f%69%6e%64%65%78%2c%20%6e%6f%66%6f%6c%6c%6f%77%22%3e%62%65%6e%2e%61%6e%67%65%72%40%6b%6e%6f%62%62%65%2e%63%6f%6d%3c%2f%61%3e%27%29%3b
It's from a JavaScript tag that I scraped.
Unfortunately, none of the solutions in Javascript unescape() vs. Python urllib.unquote() seem to work in Python 3.
In Python 3, unquote() moved from urllib to the urllib.parse module:
>>> from urllib.parse import unquote
>>> unquote('%64%6f%63%75%6d%65%6e%74%2e')
'document.'
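For code that may need to run on both Python 2 and Python 3, a common pattern is to try the new import location first and fall back to the old one. This is a sketch of that pattern, applied to a longer prefix of the scraped string from the question:

```python
try:
    from urllib.parse import unquote  # Python 3 location
except ImportError:
    from urllib import unquote        # Python 2 fallback

# First part of the percent-encoded JavaScript from the question.
escaped = "%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65%28"
decoded = unquote(escaped)
print(decoded)  # document.write(
```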

Python convert Hexadecimal Character to Respective Symbols? [duplicate]

This question already has answers here:
How do I url unencode in Python?
(3 answers)
Closed 5 years ago.
I'm trying to find a Python package or sample code that can convert the input "why+don%27t+you+want+to+talk+to+me" to "why+don't+you+want+to+talk+to+me",
that is, convert hex codes like %27 to the characters they represent ('). I could hardcode the whole hex character set and swap each code for its symbol, but I want a simple and scalable solution.
Thanks for helping
You can use the unquote function from urllib.parse (Python 3):
import urllib.parse
urllib.parse.unquote('why+don%27t+you+want+to+talk+to+me')
# returns "why+don't+you+want+to+talk+to+me"
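Note that unquote leaves the + signs alone, which matches what the question asks for. If the string came from a form submission or query string, where + stands for a space, urllib.parse.unquote_plus may be the better fit. A short comparison, offered as a sketch:

```python
from urllib.parse import unquote, unquote_plus

text = "why+don%27t+you+want+to+talk+to+me"

# unquote only decodes %XX escapes; '+' is kept as-is.
kept_plus = unquote(text)

# unquote_plus additionally turns '+' into a space (form encoding).
with_spaces = unquote_plus(text)

print(kept_plus)    # why+don't+you+want+to+talk+to+me
print(with_spaces)  # why don't you want to talk to me
```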

encoding string that has been decoded with %' to unicode [duplicate]

This question already has answers here:
Transform URL string into normal string in Python (%20 to space etc)
(3 answers)
Url decode UTF-8 in Python
(5 answers)
Decode escaped characters in URL
(5 answers)
Closed 5 years ago.
An HTML POST request percent-encoded my string like this:
Ostrołęka => Ostro%C5%82%C4%99ka
How do I decode it back into readable form in Python?
Sorry for possible duplicate.
EDIT: Solution in 'possible duplicate' doesn't solve above problem
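Python 2:
from urllib import unquote
x = unquote('Ostro%C5%82%C4%99ka')  # UTF-8 byte string; use x.decode('utf-8') for unicode
Python 3:
from urllib.parse import unquote
x = unquote('Ostro%C5%82%C4%99ka')  # 'Ostrołęka' (bytes are decoded as UTF-8 by default)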
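In Python 3 the decoding step can also be made explicit, which helps when the escapes might not be UTF-8. The following sketch uses urllib.parse.unquote_to_bytes to get the raw bytes first; the variable names are illustrative.

```python
from urllib.parse import unquote, unquote_to_bytes

escaped = "Ostro%C5%82%C4%99ka"

# Step 1: turn the %XX escapes into raw bytes, with no text decoding yet.
raw_bytes = unquote_to_bytes(escaped)

# Step 2: decode those bytes with the encoding the sender used.
text = raw_bytes.decode("utf-8")

# Equivalent one-step call; errors="strict" raises on invalid UTF-8
# instead of silently inserting replacement characters.
same_text = unquote(escaped, encoding="utf-8", errors="strict")

print(raw_bytes)  # b'Ostro\xc5\x82\xc4\x99ka'
```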

How to convert `%xx` code in URL back to the corresponding UTF-8 character [duplicate]

This question already has answers here:
How do I url unencode in Python?
(3 answers)
Closed 8 years ago.
For example, I want to convert
'http://en.wikipedia.org/wiki/Ana%C3%AFs_Croze'
to
u'http://en.wikipedia.org/wiki/Anaïs_Croze'
How to do this in Python?
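In Python 2:
>>> import urllib2
>>> print urllib2.unquote('http://en.wikipedia.org/wiki/Ana%C3%AFs_Croze')
http://en.wikipedia.org/wiki/Anaïs_Croze
The result is a UTF-8 byte string; call .decode('utf-8') on it to get the unicode object shown in the question.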
The above code as a runnable 'bunk' http://codebunk.com/bunk#-Iy8_GcBQ02jlMauuYP4
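The answer above is Python 2 only (urllib2 and the print statement do not exist in Python 3). A rough Python 3 equivalent, given here as a sketch, uses urllib.parse and returns a str directly. The round trip with quote and its safe argument is an added illustration, not part of the original answer.

```python
from urllib.parse import quote, unquote

url = "http://en.wikipedia.org/wiki/Ana%C3%AFs_Croze"

# In Python 3, unquote decodes the escaped bytes as UTF-8 and returns str.
readable = unquote(url)

# Going back: keep ':' and '/' unescaped so the URL structure survives.
requoted = quote(readable, safe=":/")

print(requoted)  # http://en.wikipedia.org/wiki/Ana%C3%AFs_Croze
```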
