How to URL-encode a unicode string in Python 2.7
I want to URL-encode a unicode string like u'€',
but I don't know how to do it.
>>> u='€'
>>> _u=u'€'
>>> u
'\xa2\xe6'
>>> _u
u'\u20ac'
>>> urllib.quote(u)
'%A2%E6'
>>> urllib.quote(_u)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\urllib.py", line 1268, in quote
return ''.join(map(quoter, s))
KeyError: u'\u20ac'
>>> print urllib.unquote(urllib.quote(u))
€
>>>
I just need to get '%A2%E6' from the unicode string u'€'.
What should I do?
>>> print urllib.unquote(urllib.quote(u'€'.encode('utf8')))
?
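You should encode the unicode string to a byte string and then pass it to urllib.quote. You wrote the answer yourself in your question. Note that the 'utf8' codec produces '%E2%82%AC'; to get exactly '%A2%E6' you would encode with the codec your console used for the plain '€' literal. The UTF-8 version is: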
urllib.quote(u'€'.encode('utf8'))
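As a version-tolerant sketch of the same idea: the text is encoded to bytes first and only then percent-quoted. The helper name quote_unicode and the import fallback are illustrative assumptions, not part of the original answer.

```python
try:
    from urllib.parse import quote  # Python 3 location
except ImportError:
    from urllib import quote  # Python 2 location


def quote_unicode(text, encoding='utf-8'):
    """Percent-encode a unicode string via an explicit byte encoding.

    Hypothetical helper: the encoding decides which bytes end up in the URL.
    """
    return quote(text.encode(encoding))


print(quote_unicode(u'\u20ac'))  # %E2%82%AC
```

Passing a different codec name as `encoding` changes the bytes that get quoted, which is why the question's '%A2%E6' and the UTF-8 result differ.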
I have the Python string u'\u221220' aka "−20" with the Unicode minus sign.
When trying to convert into a float, I'm getting
>>> a = u'\u221220'
>>> float(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u2212' in position 0: invalid decimal Unicode string
with Python 2 and
>>> a = u'\u221220'
>>> float(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '−20'
with Python 3.
How can I properly convert u'\u221220' into the float -20.0 in both Python 2 and Python 3? A portable solution would be great.
From @j-f-sebastian:
a = u'\u221220'
float(a.replace(u'\N{MINUS SIGN}', '-'))
does the trick. See the related Python issue.
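Wrapped in a small function, the replacement above might look like the sketch below. The name parse_float is an assumption made for illustration; the u'' prefixes keep the same source valid on Python 2 and on Python 3.3+.

```python
def parse_float(text):
    """Convert text to float, accepting U+2212 MINUS SIGN as a minus.

    Hypothetical helper: only the minus sign is normalized, nothing else.
    """
    return float(text.replace(u'\N{MINUS SIGN}', u'-'))


print(parse_float(u'\u221220'))  # -20.0
print(parse_float(u'-20'))       # -20.0
```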
I have
(Pdb) email
'\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'
(Pdb) print email
test#gmail.com
I need to validate whether this value is in email format. How can I convert this string to an actual ASCII string?
It looks like UTF-16 data, but the string has an odd number of bytes, so decoding it as-is fails:
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 28: truncated data
Dropping the first byte, or decoding as big-endian and ignoring the dangling last byte, works:
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16')
u'test#gmail.com'
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16-le')
u'test#gmail.com'
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16-be', 'ignore')
u'test#gmail.com'
Converting your email to an ASCII string can be done like this (the [1:] drops the stray leading '\x00' so that the remaining bytes are valid UTF-16-LE):
str(email[1:].decode('utf-16-le'))
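A minimal Python 3 sketch of the same repair, assuming the input is little-endian UTF-16 preceded by one stray NUL byte. The function name and the sample address are made up for illustration.

```python
def decode_shifted_utf16(raw):
    """Decode UTF-16-LE bytes that carry one stray leading NUL byte.

    Hypothetical helper: it only handles the odd-length case seen in the
    question and leaves well-formed input untouched.
    """
    if len(raw) % 2 == 1 and raw[:1] == b'\x00':
        raw = raw[1:]  # drop the stray byte so the length is even again
    return raw.decode('utf-16-le')


# Assumed sample data with the same shape as the value in the question.
raw = b'\x00' + 'test@example.com'.encode('utf-16-le')
text = decode_shifted_utf16(raw)
print(text)                  # test@example.com
print(text.encode('ascii'))  # b'test@example.com'
```

The final encode('ascii') raises UnicodeEncodeError if the decoded text contains anything outside ASCII, which doubles as a cheap sanity check before validating the address.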
I get an error saying that the value to encode needs to be bytes, not str/dict.
I know that adding a "b" before the text literal solves that and prints the encoded value.
import base64
s = base64.b64encode(b'12345')
print(s)
>>b'MTIzNDU='
But how do I encode a variable,
such as:
import base64
s = "12345"
s2 = base64.b64encode(s)
print(s2)
It gives me an error both with and without the b added. I don't understand why.
I'm also trying to encode/decode a dictionary with base64.
You need to encode the unicode string. If it's just normal characters, you can use ASCII. If it might have other characters in it, or just for general safety, you probably want utf-8.
>>> import base64
>>> s = "12345"
>>> s2 = base64.b64encode(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ". . . /lib/python3.3/base64.py", line 58, in b64encode
raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
>>> s2 = base64.b64encode(s.encode('ascii'))
>>> print(s2)
b'MTIzNDU='
>>>
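For the dictionary part of the question, one possible approach (an assumption, since the thread does not show one) is to serialize the dict to JSON text, encode that text to bytes, and then base64-encode the bytes. The helper names below are hypothetical, and the sketch only works for JSON-serializable values.

```python
import base64
import json


def b64_encode_text(text, encoding='utf-8'):
    """Base64-encode a str by first turning it into bytes."""
    return base64.b64encode(text.encode(encoding))


def b64_encode_dict(data):
    """Base64-encode a JSON-serializable dict (dict -> JSON str -> bytes)."""
    return base64.b64encode(json.dumps(data).encode('utf-8'))


def b64_decode_dict(blob):
    """Reverse b64_encode_dict (base64 bytes -> JSON str -> dict)."""
    return json.loads(base64.b64decode(blob).decode('utf-8'))


s = "12345"
print(b64_encode_text(s))  # b'MTIzNDU='

settings = {"user": "alice", "id": 7}  # made-up sample data
blob = b64_encode_dict(settings)
print(b64_decode_dict(blob) == settings)  # True
```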
When I try to convert a unicode variable to float using unicodedata.numeric(variable_name), I get this error "need a single Unicode character as parameter". Does anyone know how to resolve this?
Thanks!
Here is the code snippet I'm using :
f = urllib.urlopen("http://compling.org/cgi-bin/DAL_sentence_xml.cgi?sentence=good")
s = f.read()
f.close()
doc = libxml2dom.parseString(s)
measure = doc.getElementsByTagName("measure")
valence = unicodedata.numeric(measure[0].getAttribute("valence"))
activation = unicodedata.numeric(measure[0].getAttribute("activation"))
This is the error I'm getting when I run the code above
Traceback (most recent call last):
File "sentiment.py", line 61, in <module>
valence = unicodedata.numeric(measure[0].getAttribute("valence"))
TypeError: need a single Unicode character as parameter
Summary: Use float() instead.
The numeric function takes a single character. It does not do general conversions:
>>> import unicodedata
>>> unicodedata.numeric('½')
0.5
>>> unicodedata.numeric('12')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: need a single Unicode character as parameter
If you want to convert a numeric string to a float, use the float() function.
>>> float('12')
12.0
It won't do that Unicode magic, however:
>>> float('½')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '½'
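If both behaviours are wanted, the two calls can be combined. The sketch below (Python 3, with a hypothetical helper name) tries float() first and falls back to unicodedata.numeric() only for single characters.

```python
import unicodedata


def to_float(text):
    """Convert text to float, falling back to Unicode numeric values.

    Hypothetical helper: float() handles ordinary numeric strings, and
    unicodedata.numeric() handles single characters such as a vulgar fraction.
    """
    try:
        return float(text)
    except ValueError:
        if len(text) == 1:
            return unicodedata.numeric(text)
        raise  # multi-character input that float() rejected


print(to_float('12'))      # 12.0
print(to_float('\u00bd'))  # 0.5 (U+00BD is the one-half character)
```

For attribute values that are ordinary decimal strings, as in the question, the fallback never triggers and plain float() would be enough.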
I am trying to do a simple conversion from a Unicode string to a standard string, but without success.
I have: PyQt4.QtCore.QString(u'\xc5\x9f')
I want: '\xc5\x9f' (note the str type, not unicode), because the library I am using does not accept unicode.
Here is what I tried, you can see how hopeless I am :) :
>>> s = QtCore.QString(u'\xc5\x9f')
>>> str(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.toUtf8()
PyQt4.QtCore.QByteArray('\xc3\x85\xc2\x9f')
>>> s.toUtf8().decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'QByteArray' object has no attribute 'decode'
>>> str(s.toUtf8()).decode("utf-8")
u'\xc5\x9f'
>>> str(str(s.toUtf8()).decode("utf-8"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
I know there are a lot of questions related to Unicode, but I can't find this answer.
What should I do?
Edit:
I found a hacky way:
>>> unicoded = str(s.toUtf8()).decode("utf-8")
>>> unicoded
u'\xc5\x9f'
>>> eval(repr(unicoded)[1:])
'\xc5\x9f'
Do you know a better way?
If you have a unicode string of the QString data type and need to convert it to a Python string, just use:
unicode(YOUR_QSTRING_STRING)
Is this what you are after?
In [23]: a
Out[23]: u'\xc5\x9f'
In [24]: a.encode('latin-1')
Out[24]: '\xc5\x9f'
In [25]: type(a.encode('latin-1'))
Out[25]: <type 'str'>
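Why 'latin-1' works here can be shown with a short Python 3 sketch, where Python 2's unicode and str correspond to str and bytes. Latin-1 maps each code point from 0 to 255 to the byte with the same value, so the two code points become the two bytes the library expects. The variable names are illustrative only.

```python
text = '\xc5\x9f'             # two code points: U+00C5 and U+009F
raw = text.encode('latin-1')  # one byte per code point, same numeric value

print(raw)                         # b'\xc5\x9f'
print(len(text), len(raw))         # 2 2
print(ascii(raw.decode('utf-8')))  # '\u015f'
```

The last line suggests the original data was the UTF-8 encoding of a single character (U+015F) that had been read as if each byte were its own character.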