QString: Unicode encoding-decoding problem - python

I am trying to do a simple conversion from a Unicode string to a standard string, but with no success.
I have: PyQt4.QtCore.QString(u'\xc5\x9f')
I want: '\xc5\x9f' (note the str type, not unicode), because the library I am using does not accept unicode.
Here is what I tried; you can see how hopeless I am :) :
>>> s = QtCore.QString(u'\xc5\x9f')
>>> str(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.toUtf8()
PyQt4.QtCore.QByteArray('\xc3\x85\xc2\x9f')
>>> s.toUtf8().decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'QByteArray' object has no attribute 'decode'
>>> str(s.toUtf8()).decode("utf-8")
u'\xc5\x9f'
>>> str(str(s.toUtf8()).decode("utf-8"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
I know there are a lot of questions related to Unicode, but I can't find this answer.
What should I do?
Edit:
I found a hacky way:
>>> unicoded = str(s.toUtf8()).decode("utf-8")
>>> unicoded
u'\xc5\x9f'
>>> eval(repr(unicoded)[1:])
'\xc5\x9f'
Do you know a better way?

If you have a unicode string of the QString data type and need to convert it to a Python unicode string, just call:
unicode(YOUR_QSTRING_STRING)

Is this what you are after?
In [23]: a
Out[23]: u'\xc5\x9f'
In [24]: a.encode('latin-1')
Out[24]: '\xc5\x9f'
In [25]: type(a.encode('latin-1'))
Out[25]: <type 'str'>
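
The latin-1 trick works because latin-1 maps every code point below 256 to the byte with the same value, so the two characters in u'\xc5\x9f' become the two bytes '\xc5\x9f'. Here is a minimal sketch of that round trip. It is written for Python 3 (where the unicode/str split becomes str/bytes) and uses a plain string in place of the QString, which is an assumption made so the example runs without PyQt4:

```python
# Stand-in for unicode(s): two code points, U+00C5 and U+009F.
text = '\xc5\x9f'

# latin-1 maps code points 0-255 straight to single bytes.
raw = text.encode('latin-1')
print(raw)                   # b'\xc5\x9f'

# Those two bytes are the UTF-8 encoding of one character, U+015F.
print(raw.decode('utf-8'))   # ş

# Encoding the original text as UTF-8 instead doubles the bytes,
# which is what s.toUtf8() returned in the question.
print(text.encode('utf-8'))  # b'\xc3\x85\xc2\x9f'
```

If the QString really holds UTF-8 bytes stored as individual characters like this, it may be worth fixing the place where it was created instead of converting afterwards.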

Related

"invalid literal" from int(base64.urlsafe_b64decode(mystr))

I have a base64 string that I need to decode and then convert into an integer so I can "% 2" it. The base64 decode is easy, but apparently I am confused about how Python actually handles binary data:
>>> y = 'EFbSUq0g7qvoW2ehykfSveb_pSmunxOJUEVao1RWwck'
>>> int(base64.urlsafe_b64decode('EFbSUq0g7qvoW2ehykfSveb_pSmunxOJUEVao1RWwck='), 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 2: b'\x10V\xd2R\xad \xee\xab\xe8[g\xa1\xcaG\xd2\xbd\xe6\xff\xa5)\xae\x9f\x13\x89PEZ\xa3TV\xc1\xc9'
>>> int(base64.urlsafe_b64decode('EFbSUq0g7qvoW2ehykfSveb_pSmunxOJUEVao1RWwck='), 16)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 16: b'\x10V\xd2R\xad \xee\xab\xe8[g\xa1\xcaG\xd2\xbd\xe6\xff\xa5)\xae\x9f\x13\x89PEZ\xa3TV\xc1\xc9'
>>>
Use int.from_bytes() to convert the base64-decoded bytes into an int:
int.from_bytes(
    base64.urlsafe_b64decode(
        'EFbSUq0g7qvoW2ehykfSveb_pSmunxOJUEVao1RWwck='
    ),
    'big'  # the endianness
)
7390406020584230016520446236832857473226268177813448430255309703833393217993
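
As a sketch of the whole task from the question (decode, convert, then take "% 2"), something like the following could work in Python 3. The helper name b64url_to_int is made up for this example, and the padding step assumes the input is URL-safe base64 with the trailing '=' stripped, as the variable y in the question appears to be:

```python
import base64

def b64url_to_int(text):
    """Decode URL-safe base64 text and read the bytes as a big-endian int."""
    # Restore any stripped '=' padding so the length is a multiple of 4.
    padded = text + '=' * (-len(text) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return int.from_bytes(raw, 'big')

y = 'EFbSUq0g7qvoW2ehykfSveb_pSmunxOJUEVao1RWwck'
n = b64url_to_int(y)
print(n % 2)  # 1, because the last decoded byte (0xc9) is odd
```

int() with a base argument expects text digits such as '1010', which is why passing raw bytes to it fails.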

convert unicode string to float

I have the Python string u'\u221220' aka "−20" with the Unicode minus sign.
When trying to convert into a float, I'm getting
>>> a = u'\u221220'
>>> float(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u2212' in position 0: invalid decimal Unicode string
with Python 2 and
>>> a = u'\u221220'
>>> float(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '−20'
with Python 3.
How can I properly convert u'\u221220' into the float -20.0 in both Python 2 and Python 3? A portable solution would be great.
From @j-f-sebastian:
a = u'\u221220'
float(a.replace(u'\N{MINUS SIGN}', '-'))
does the trick. See the related Python issue.
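
Wrapped into a small helper, the same idea might look like the sketch below. The name parse_float is hypothetical. The u prefixes are kept so that the code should also be accepted by Python 2, though it has only been reasoned through for Python 3:

```python
def parse_float(text):
    """Convert text to float, treating U+2212 MINUS SIGN as a minus."""
    # Swap the Unicode minus for the ASCII hyphen-minus that float() accepts.
    return float(text.replace(u'\N{MINUS SIGN}', u'-'))

print(parse_float(u'\u221220'))  # -20.0
print(parse_float(u'-20'))       # -20.0
print(parse_float(u'3.5'))       # 3.5
```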

Unicode string in python

I have
(Pdb) email
'\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'
(Pdb) print email
test#gmail.com
I need to validate whether this value is in email format. How can I convert this string to an actual ASCII string?
It seems to be encoded as UTF-16, but decoding the whole string fails because it has an odd number of bytes:
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 28: truncated data
Dropping the leading null byte, or decoding as big-endian while ignoring the trailing byte, works:
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16')
u'test#gmail.com'
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16-le')
u'test#gmail.com'
>>> '\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16-be', 'ignore')
u'test#gmail.com'
Converting your email to an ASCII string can be done like this (the stray leading null byte has to be dropped first, as in the examples above):
str(email[1:].decode('utf-16-le'))
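
For Python 3, where the value would be a bytes object, a sketch along the following lines might be used. The helper name decode_utf16_be is made up, and it assumes the data is big-endian UTF-16 with one stray trailing byte, which is only one way to read the value shown in the question:

```python
raw = (b'\x00t\x00e\x00s\x00t\x00#\x00g\x00m\x00a\x00i\x00l'
       b'\x00.\x00c\x00o\x00m\x00')

def decode_utf16_be(data):
    """Decode big-endian UTF-16, dropping an unpaired trailing byte."""
    # UTF-16 code units are 2 bytes wide, so trim to an even length.
    even = data[:len(data) - (len(data) % 2)]
    return even.decode('utf-16-be')

email = decode_utf16_be(raw)
print(email)                         # test#gmail.com
print(email.encode('ascii'))         # b'test#gmail.com'
```

The final encode('ascii') raises UnicodeEncodeError if the decoded text contains non-ASCII characters, which can double as a first validation step.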

How to URL-encode unicode in Python 2.7?

How can I URL-encode a unicode string in Python 2.7?
I want to encode a unicode character like '€',
but I don't know what I should do...
>>> u='€'
>>> _u=u'€'
>>> u
'\xa2\xe6'
>>> _u
u'\u20ac'
>>> urllib.quote(u)
'%A2%E6'
>>> urllib.quote(_u)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\urllib.py", line 1268, in quote
return ''.join(map(quoter, s))
KeyError: u'\u20ac'
>>> print urllib.unquote(urllib.quote(u))
€
>>>
I just need to get '%A2%E6' from the unicode '€'.
What should I do?
>>> print urllib.unquote(urllib.quote(u'€'.encode('utf8')))
?
You should encode the unicode string first and then call urllib.quote. You wrote the answer to your own question yourself:
urllib.quote(u'€'.encode('utf8'))
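
Note that the quoted result depends on which encoding is chosen. UTF-8 yields '%E2%82%AC', not the '%A2%E6' requested in the question. The bytes '\xa2\xe6' look like the euro sign in the Korean cp949 code page, so the sketch below assumes that was the console encoding. It uses Python 3's urllib.parse.quote, which accepts an encoding argument. In Python 2.7 the equivalent would presumably be urllib.quote(u'€'.encode('cp949')):

```python
from urllib.parse import quote

euro = '\u20ac'  # the euro sign

# Default encoding is UTF-8: three bytes, three escapes.
print(quote(euro))                     # %E2%82%AC

# Assumed console code page from the question: two bytes.
print(quote(euro, encoding='cp949'))   # %A2%E6
```

Whichever encoding is used, the receiving side has to decode with the same one.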

Type error in Python: need a single Unicode character as parameter

When I try to convert a unicode variable to float using unicodedata.numeric(variable_name), I get this error "need a single Unicode character as parameter". Does anyone know how to resolve this?
Thanks!
Here is the code snippet I'm using :
f = urllib.urlopen("http://compling.org/cgi-bin/DAL_sentence_xml.cgi?sentence=good")
s = f.read()
f.close()
doc = libxml2dom.parseString(s)
measure = doc.getElementsByTagName("measure")
valence = unicodedata.numeric(measure[0].getAttribute("valence"))
activation = unicodedata.numeric(measure[0].getAttribute("activation"))
This is the error I'm getting when I run the code above
Traceback (most recent call last):
File "sentiment.py", line 61, in <module>
valence = unicodedata.numeric(measure[0].getAttribute("valence"))
TypeError: need a single Unicode character as parameter
Summary: Use float() instead.
The numeric function takes a single character. It does not do general conversions:
>>> import unicodedata
>>> unicodedata.numeric('½')
0.5
>>> unicodedata.numeric('12')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: need a single Unicode character as parameter
If you want to convert a string of digits to a float, use the float() function:
>>> float('12')
12.0
It won't do that Unicode magic, however:
>>> float('½')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '½'
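
If both behaviours are needed, the two functions can be combined. This is a minimal Python 3 sketch. The helper name parse_number is hypothetical, and falling back to unicodedata.numeric only for single characters is a design choice made for this example:

```python
import unicodedata

def parse_number(text):
    """Try float() first, then unicodedata.numeric() for a single character."""
    try:
        return float(text)
    except ValueError:
        # numeric() only accepts exactly one character, e.g. a fraction sign.
        if len(text) == 1:
            return unicodedata.numeric(text)
        raise

print(parse_number('12'))    # 12.0
print(parse_number('0.75'))  # 0.75
print(parse_number('½'))     # 0.5
```

For attribute values such as "valence" in the question, plain float() is likely all that is needed.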
