I encoded a comma-delimited list of ids (e.g. "1,2,3") to base64, and the data returned from the form looks like x below.
I tried decoding, encoding, and all sorts of other things, but nothing returns the original string.
x = "b'Mw=='"
base64.b64decode(x)
# b'l\xcc'
x.decode()
# AttributeError: 'str' object has no attribute 'decode'
y = x.encode('utf-8')
print(y)
# b"b'Mw=='"
What am I missing?
If you have b'...' in your data, that's the repr()esentation of a bytestring.
If you can't get your data source to fix their content (it should just be Mw==: what they're giving you isn't valid base64 encoding!), you can use ast.literal_eval() to read it into a bytestring:
>>> import ast, base64
>>> x = "b'Mw=='"
>>> base64.b64decode(ast.literal_eval(x))
b'3'
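If the goal is the original comma-separated string rather than bytes, the decoded result still needs a final decode() step. A minimal sketch, assuming the form always returns the repr of a bytes object wrapping valid base64; the helper name recover_ids is made up for illustration:

```python
import ast
import base64

def recover_ids(raw):
    """Turn a string like "b'Mw=='" back into a list of id strings."""
    # literal_eval parses the bytes repr without executing arbitrary code
    token = ast.literal_eval(raw)
    # b64decode returns bytes, so decode once more to get a str
    text = base64.b64decode(token).decode('utf-8')
    return text.split(',')

# Reproduce the broken payload: str() applied to the base64 bytes
payload = str(base64.b64encode(b"1,2,3"))
print(recover_ids(payload))    # ['1', '2', '3']
print(recover_ids("b'Mw=='"))  # ['3']
```

The cleaner fix is still on the sending side: call .decode('ascii') on the base64 bytes before putting them into the form, so that no b'...' wrapper is ever stored.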
This question is similar to this one here but if I put this into this code like so:
import base64
theone = input('Enter your plaintext: ')
encoded = str(base64.b64encode(theone))
encoded = base64.b64encode(encoded.encode('ascii'))
encoded = encoded[2:]
o = len(encoded)
o = o-1
encoded = encoded[:o]
print(encoded)
it raises this problem:
line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
And then if I remove this line of code:
encoded = base64.b64encode(encoded.encode('ascii'))
then it raises the same error. I'm not sure what to do from here and I would be grateful for any help.
You seem to be having problems with bytes and strings. The value returned by input is a string (str), but base64.b64encode expects bytes (bytes).
If you print a bytes instance you see something like
b'spam'
To remove the leading 'b' you need to decode back to a str.
To make your code work, pass bytes to base64.b64encode, and decode the result to print it.
>>> theone = input('Enter your plaintext: ')
Enter your plaintext: Hello World!
>>> encoded = base64.b64encode(theone.encode())
>>> encoded
b'SGVsbG8gV29ybGQh'
>>> print(encoded.decode())
SGVsbG8gV29ybGQh
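The same idea can be wrapped into a pair of helpers so the str/bytes conversions happen in one place. This is only a sketch; the names to_b64 and from_b64 are hypothetical, and UTF-8 is an assumed text encoding:

```python
import base64

def to_b64(text):
    """Encode a str to a base64 str (no b'...' prefix)."""
    # str -> bytes -> base64 bytes -> str
    return base64.b64encode(text.encode('utf-8')).decode('ascii')

def from_b64(token):
    """Decode a base64 str (or bytes) back to the original str."""
    return base64.b64decode(token).decode('utf-8')

encoded = to_b64('Hello World!')
print(encoded)            # SGVsbG8gV29ybGQh
print(from_b64(encoded))  # Hello World!
```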
Well, let me introduce the problem first.
I've got some data via POST/GET requests. The data was a UTF-8 encoded string. I didn't realize that at the time and converted it with str(). Now I have a database full of "nonsense data" and can't find a way back.
Example code:
unicode_str - this is the string I should obtain
encoded_str - this is the string I got with POST/GET requests - initial data
bad_str - the data I have in the Database at the moment and I need to get unicode from.
So apparently I know how to convert:
unicode_str =(encode)=> encoded_str =(str)=> bad_str
But I couldn't come up with a way to go back:
bad_str =(???)=> encoded_str =(decode)=> unicode_str
In [1]: unicode_str = 'Příliš žluťoučký kůň úpěl ďábelské ódy'
In [2]: unicode_str
Out[2]: 'Příliš žluťoučký kůň úpěl ďábelské ódy'
In [3]: encoded_str = unicode_str.encode("UTF-8")
In [4]: encoded_str
Out[4]: b'P\xc5\x99\xc3\xadli\xc5\xa1 \xc5\xbelu\xc5\xa5ou\xc4\x8dk\xc3\xbd k\xc5\xaf\xc5\x88 \xc3\xbap\xc4\x9bl \xc4\x8f\xc3\xa1belsk\xc3\xa9 \xc3\xb3dy'
In [5]: bad_str = str(encoded_str)
In [6]: bad_str
Out[6]: "b'P\\xc5\\x99\\xc3\\xadli\\xc5\\xa1 \\xc5\\xbelu\\xc5\\xa5ou\\xc4\\x8dk\\xc3\\xbd k\\xc5\\xaf\\xc5\\x88 \\xc3\\xbap\\xc4\\x9bl \\xc4\\x8f\\xc3\\xa1belsk\\xc3\\xa9 \\xc3\\xb3dy'"
In [7]: new_encoded_str = some_magical_function_here(bad_str) ???
You turned a bytes object to a string, which is just a representation of the bytes object. You can obtain the original bytes object by using ast.literal_eval() (credits to Mark Tolonen for the suggestion), then a simple decode() will do the job.
>>> import ast
>>> ast.literal_eval(bad_str).decode('utf-8')
'Příliš žluťoučký kůň úpěl ďábelské ódy'
Since you were the one who generated the strings, using eval() would be safe, but why not be safer?
Please do not use eval, instead:
import codecs
s = 'žluťoučký'
x = str(s.encode('utf-8'))
# strip quotes
x = x[2:-1]
# unescape
x = codecs.escape_decode(x)[0].decode('utf-8')
# profit
x == s
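For repairing a whole database column, it may help to guard against rows that were never damaged. A rough sketch using the ast.literal_eval() approach, assuming every damaged row is str() of UTF-8 bytes; the function name repair is hypothetical:

```python
import ast

def repair(value):
    """Return the original text if value looks like str(bytes), else value unchanged."""
    # Only touch values shaped like b'...' or b"..."
    if not (value.startswith(("b'", 'b"')) and value[-1] in "'\""):
        return value
    try:
        raw = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a valid literal after all; leave the row alone
        return value
    return raw.decode('utf-8') if isinstance(raw, bytes) else value

original = 'žluťoučký kůň'
bad = str(original.encode('utf-8'))
print(repair(bad) == original)  # True
print(repair('plain text'))     # plain text
```

A row that merely happens to start with b' and end with a quote would also be converted, so checking a sample of the results before writing them back seems prudent.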
Python imaplib sometimes returns strings that look like this:
=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=
What is the name for this notation?
How can I decode (or should I say encode?) it to UTF8?
In short:
>>> from email.header import decode_header
>>> msg = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0][0].decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
My computer doesn't show the Polish characters, but they should appear on yours (depending on locale settings etc.).
Explained:
Use the email.header decoder:
>>> from email.header import decode_header
>>> value = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')
>>> value
[(b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie', 'utf-8')]
That will return a list with the decoded header, usually containing one tuple with the decoded message and the encoding detected (sometimes more than one pair).
>>> msg, encoding = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0]
>>> msg
b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie'
>>> encoding
'utf-8'
And finally, if you want msg as a regular str rather than UTF-8 encoded bytes, use the bytes decode method:
>>> msg = msg.decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
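As for the name: this notation is the RFC 2047 "encoded-word" format used for non-ASCII text in mail headers. When a header contains several encoded parts, decoding only element [0] drops the rest. A short sketch that lets the standard library join all parts, using make_header from the same email.header module:

```python
from email.header import decode_header, make_header

raw = '=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?='

# decode_header yields (bytes_or_str, charset) pairs;
# make_header rebuilds a Header object, and str() gives the Unicode text
text = str(make_header(decode_header(raw)))
print(text)
```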
You can also use the bytes decode method directly. Here is an example:
result, data = imapSession.uid('search', None, "ALL") #search and return uids
latest_email_uid = data[0].split()[-1] #data[] is a list, using split() to separate them by space and getting the latest one by [-1]
result, data = imapSession.uid('fetch', latest_email_uid, '(BODY.PEEK[])')
raw_email = data[0][1].decode("utf-8") #using utf-8 decoder
I'm wondering how I can convert ISO-8859-2 (Latin-2) characters (I mean integer or hex values that represent ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do with my project in python:
Receive hex values from serial port, which are characters encoded in ISO-8859-2.
Decode them, that is, get "standard" Python unicode strings from them.
Prepare and write xml file.
Using Python 3.4.3
txt_str = "ąęłóźć"
txt_str.decode('ISO-8859-2')
Traceback (most recent call last): File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still to prepare valid input for the "decode" method (it works in Python 2.7.10, and that's the one I'm using in this project). How do I prepare a valid string from decimal values that are Latin-2 code numbers?
Note that it would be uber complicated to receive utf-8 characters from serial port, thanks to devices I'm using and communication protocol limitations.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data. ISO-8859-2 pushed into uint32, 4 chars per int.
bit of code that manages unboxing:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
Your example doesn't work because you've tried to use a str to hold bytes. In Python 3 you must use byte strings.
In reality, if you're using PySerial then you'll be reading byte strings anyway, which you can convert as required:
import serial  # PySerial

with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    s = ser.read(10)
    # Py3: s == bytes
    # Py2.x: s == str
    my_unicode_string = s.decode('iso-8859-2')
If your ISO-8859-2 data is then additionally encoded as an ASCII hex representation of the bytes, you have to apply an extra layer of decoding:
import codecs
import serial  # PySerial

with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    hex_repr = ser.read(10)
    # Py3: hex_repr == bytes
    # Py2.x: hex_repr == str
    # Decodes hex representation to bytes
    # E.g. b"A3" -> b'\xa3'
    hex_decoded = codecs.decode(hex_repr, "hex")
    my_unicode_string = hex_decoded.decode('iso-8859-2')
Now you can pass my_unicode_string to your favourite XML library.
Interesting sample data. Ideally your sample data should be a direct print of the raw data received from PySerial. If you actually are receiving the raw bytes as 8-digit hexadecimal values, then:
#!python3
from binascii import unhexlify
data = b''.join(unhexlify(x)[::-1] for x in b'''\
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069'''.splitlines())
print(data.decode('iso-8859-2'))
Output:
W chuj bardzo długa nazwa jakiejś zapyziałej pipidówy, brudnej ulicyumer najgorszej rudery we wsi
Google Translate of Polish to English:
The dick very long name some zapyziałej Small Town , dirty ulicyumer worst hovel in the village
This topic is closed. Working code that handles what needs to be done:
x=177
x.to_bytes(1, byteorder='big').decode("ISO-8859-2")
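The same conversion can be applied to a whole list of values at once instead of one integer at a time. A sketch for Python 3, assuming the input is either a list of byte values or the 8-hex-digit little-endian words from the sample data; the helper name words_to_text is made up:

```python
def words_to_text(hex_words, encoding='iso-8859-2'):
    """Decode 8-hex-digit words (4 chars each, lowest byte last) into text."""
    # bytes.fromhex turns '7A647261' into 4 bytes; [::-1] restores the real order
    raw = b''.join(bytes.fromhex(word)[::-1] for word in hex_words)
    # The last word may be padded with zero bytes
    return raw.rstrip(b'\x00').decode(encoding)

# Three words taken from the sample data
print(repr(words_to_text(['7A647261', 'B364206F', '20616775'])))  # 'ardzo długa '

# A list of integer byte values can be decoded in one step
int_vals = [177, 179]
print(bytes(int_vals).decode('iso-8859-2'))  # ął
```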
I have the variables buffer (a string) and eip (bytes), and I want to concatenate eip to buffer.
My code:
junk = "\x41" * 50 # A
eip = pack("<L", 0x0015FCC4) # false jmp register
buffer = junk + eip # Problem HERE
print(buffer)
Error:
TypeError: Can't convert 'bytes' object to str implicitly
Well, I can't convert eip to string, because if I convert eip to string with str(eip), the output is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAb'\xc4\xfc\x15\x00'
I just want buffer to contain the hexadecimal string so I can use it, which is why I added the print (for debugging).
Thank you.
The following returns b'c4fc1500' (a bytes object in Python 3):
import binascii
binascii.hexlify(eip)
Is that what you need?
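If the aim is instead to concatenate the raw bytes, one option is to keep everything as bytes from the start, since Python 3 will not mix str and bytes implicitly. A minimal sketch, assuming the buffer is meant to be sent or written as binary data:

```python
import binascii
from struct import pack

junk = b"\x41" * 50             # bytes literal instead of str
eip = pack("<L", 0x0015FCC4)    # b'\xc4\xfc\x15\x00'
buffer = junk + eip             # bytes + bytes concatenates fine

print(len(buffer))                            # 54
print(binascii.hexlify(eip).decode('ascii'))  # c4fc1500
```

Calling .decode('ascii') on the hexlify() result gives a plain str for printing, without the b'...' prefix.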