Convert 'bytes' object to string - python

I tried to find a solution, but I'm still stuck.
I want to use PyVISA to control a function generator.
I have a waveform, which is a list of values between 0 and 16382.
I then have to prepare it so that each waveform point occupies 2 bytes.
Each value is represented in big-endian, MSB-first format, as straight binary. So I do: binwaveform = pack('>'+'h'*len(waveform), *waveform)
Then, when I try to write it to the instrument with AFG.write('trace ememory, '+ header + binwaveform), I get an error:
File ".\afg3000.py", line 97, in <module>
AFG.write('trace ememory, '+ header + binwaveform)
TypeError: Can't convert 'bytes' object to str implicitly
I tried to solve it with AFG.write('trace ememory, '+ header + binwaveform.decode()), but by default decode() treats the bytes as UTF-8 text, which fails for some values: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 52787: invalid start byte
Could you please help with it?

binwaveform is a bytes object containing packed integers. For example:
>>> import struct
>>> struct.pack('<h', 4545)
b'\xc1\x11'
You can't meaningfully print it as text; it makes no sense to your terminal. In the above example, 0xC1 is neither valid ASCII nor valid UTF-8.
When you concatenate a bytes object with a regular str ('trace ememory, ' + header + binwaveform), Python 3 refuses to convert between them implicitly, because it can't know which encoding you intend.
Decoding it implies that it's text - it's not.
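To see the difference between decoding and hex-encoding on the same value, here is a short sketch using the `struct.pack` example above. It uses `bytes.hex()` (Python 3.5+), which produces the same digits as `codecs.encode(..., 'hex')` but returns a str directly:

```python
import struct

packed = struct.pack("<h", 4545)  # signed 16-bit integer, little-endian
print(packed)                     # b'\xc1\x11'

try:
    packed.decode()               # decode() defaults to UTF-8
except UnicodeDecodeError as exc:
    print("not text:", exc.reason)

print(packed.hex())               # c111
```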
The best thing to do is to convert it to its hex representation:
import codecs
binwaveform_hex = codecs.encode(binwaveform, 'hex')
# decode the ASCII hex digits; str() would add a "b'...'" wrapper around them
binwaveform_hex_str = binwaveform_hex.decode('ascii')
AFG.write('trace ememory, ' + header + binwaveform_hex_str)
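If the instrument actually expects raw binary block data rather than hex text, another option is to keep the whole message as bytes: encode only the ASCII command text and concatenate bytes with bytes. This is a sketch under assumptions: `build_trace_command` is a hypothetical helper, the header is assumed to be an IEEE 488.2 definite-length block header (`#`, one digit giving the length of the byte count, then the byte count), and it packs with `'>H'` (unsigned) because the values are straight binary. With PyVISA, a bytes message like this would be sent with `write_raw` rather than `write`.

```python
import struct

def build_trace_command(waveform):
    # Each point: unsigned 16-bit, big-endian (MSB first), straight binary
    payload = struct.pack(">" + "H" * len(waveform), *waveform)
    # Hypothetical IEEE 488.2 definite-length block header: "#<digits><byte count>"
    count = str(len(payload))
    header = "#" + str(len(count)) + count
    # Encode only the text parts; bytes + bytes needs no implicit conversion
    return b"trace ememory, " + header.encode("ascii") + payload

cmd = build_trace_command([0, 8191, 16382])
print(cmd)  # b'trace ememory, #16\x00\x00\x1f\xff?\xfe'
```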

Related

How do I decode this binary string in python?

So, I have this string 01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000
and I want to decode it using Python, but I'm getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 280: invalid start byte
According to this website: https://www.binaryhexconverter.com/binary-to-ascii-text-converter
The output should be S�ellotherehowyoudoingimfineareyoufineP
Here's my code:
def decodeAscii(bin_string):
    binary_int = int(bin_string, 2)
    byte_number = binary_int.bit_length() + 7 // 8
    binary_array = binary_int.to_bytes(byte_number, "big")
    ascii_text = binary_array.decode()
    print(ascii_text)
How do I fix it?
Your bytes simply cannot be decoded as UTF-8, just as the error message tells you.
UTF-8 is the default encoding parameter of decode, and the best way to supply the correct encoding is to know it; otherwise you'll have to guess.
Guessing is probably what the website does, too, by trying the most common encodings until one does not throw an exception:
def decodeAscii(bin_string):
    binary_int = int(bin_string, 2)
    # parentheses needed: // binds tighter than +
    byte_number = (binary_int.bit_length() + 7) // 8
    binary_array = binary_int.to_bytes(byte_number, "big")
    ascii_text = "Bin string cannot be decoded"
    # 'ansi' is a Windows-only alias of the 'mbcs' codec
    for enc in ['utf-8', 'ascii', 'ansi']:
        try:
            ascii_text = binary_array.decode(encoding=enc)
            break
        except (UnicodeDecodeError, LookupError):
            pass
    print(ascii_text)
s = "01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000"
decodeAscii(s)
Output:
S°ellotherehowyoudoingimfineareyoufineP
But there's no guarantee that you find the "correct" encoding by guessing.
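Part of the problem is in the question's byte count: `binary_int.bit_length() + 7 // 8` parses as `bit_length() + (7 // 8)`, i.e. `bit_length() + 0`, because `//` binds tighter than `+`. That asks for roughly eight times too many bytes, padded with leading zero bytes, which is consistent with the error being reported at position 280 rather than near the start. A sketch of the conversion with the count fixed; `bits_to_bytes` is a hypothetical helper, it assumes the bit string length is a multiple of 8, and Latin-1 is a swapped-in encoding that accepts every byte (not necessarily the data's real encoding):

```python
def bits_to_bytes(bin_string):
    # One byte per 8 bits (assumes len(bin_string) is a multiple of 8);
    # len() also keeps leading zero bits that int() would drop
    byte_number = len(bin_string) // 8
    return int(bin_string, 2).to_bytes(byte_number, "big")

n = int("0101001110110000", 2)
print(n.bit_length() + 7 // 8)    # 15: parsed as bit_length() + 0
print((n.bit_length() + 7) // 8)  # 2: the intended ceiling division

data = bits_to_bytes("0101001110110000")
print(data)                    # b'S\xb0'
print(data.decode("latin-1"))  # S°
```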
Your binary string is just not a valid ASCII or UTF-8 string. You can tell decode to ignore invalid sequences:
ascii_text = binary_array.decode(errors='ignore')
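For comparison, here is how the error handlers behave on a short sample containing the same 0xB0 byte: 'ignore' silently drops it, 'replace' substitutes U+FFFD (the � the website shows), and Latin-1 maps it to '°':

```python
raw = b"S\xb0ello"
print(raw.decode("utf-8", errors="ignore"))   # Sello
print(raw.decode("utf-8", errors="replace"))  # S�ello
print(raw.decode("latin-1"))                  # S°ello
```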
It could be solved in one line. Try this:
def bin_to_text(bin_str):
    bin_to_str = "".join([chr(int(bin_str[i:i+8], 2)) for i in range(0, len(bin_str), 8)])
    return bin_to_str
bin_str = '01010011101100000110010101101100011011000110111101110100011010000110010101110010011001010110100001101111011101110111100101101111011101010110010001101111011010010110111001100111011010010110110101100110011010010110111001100101011000010111001001100101011110010110111101110101011001100110100101101110011001010101000000000000'
bin_to_str = bin_to_text(bin_str)
print(bin_to_str)
Output:
S°ellotherehowyoudoingimfineareyoufineP
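The one-liner works because `chr(n)` for 0 <= n <= 255 returns the same character that Latin-1 assigns to byte n, so it is effectively a Latin-1 decode. The same result can be written with an explicit encoding, which makes that choice visible (`bin_to_text_latin1` is a hypothetical name). Note that a final `00000000` chunk becomes a '\x00' character in both versions:

```python
def bin_to_text(bin_str):
    return "".join(chr(int(bin_str[i:i + 8], 2)) for i in range(0, len(bin_str), 8))

def bin_to_text_latin1(bin_str):
    # Build the bytes first, then name the encoding explicitly
    raw = bytes(int(bin_str[i:i + 8], 2) for i in range(0, len(bin_str), 8))
    return raw.decode("latin-1")

sample = "01010011101100000101000000000000"
print(repr(bin_to_text_latin1(sample)))  # 'S°P\x00'
```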

How to convert a byte array to string?

I just finished creating a Huffman compression algorithm. I converted my compressed text from a string to a byte array with bytearray(). I'm now attempting to decompress it, but my only concern is that I cannot convert my byte array back into a string. Is there a built-in function I could use to convert my byte array (held in a variable) back into a string? If not, is there a better method to convert my compressed string to something else? I attempted to use byte_array.decode() and I get this:
print("Index: ", Index)  # The Index
# Substituting text with our compressed index
for x in range(len(TextTest)):
    TextTest[x] = Index[TextTest[x]]
NewText = ''.join(TextTest)
# print(NewText)
# NewText=int(NewText)
byte_array = bytearray()  # Converts the compressed bit string to bytes
for i in range(0, len(NewText), 8):
    byte_array.append(int(NewText[i:i + 8], 2))
# len() counts the payload bytes; sys.getsizeof() would add object overhead
NewSize = ("Compressed file Size:", len(byte_array), 'bytes')
print(byte_array)
print(NewSize)
x = bytes(byte_array)
x.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte
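You can use .decode('ascii') (leave the argument empty for UTF-8). Be aware that ASCII only covers bytes 0x00 to 0x7F, so a byte such as 0x88 fails with either codec; compressed output is binary data, not text.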
>>> print(bytearray("abcd", 'utf-8').decode())
abcd
Source: Convert bytes to a string?
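Because Huffman output is a string of bits rather than text, it may be simpler not to decode the bytes at all but to turn each byte back into 8 binary digits. A sketch; the helper names are hypothetical, and storing the number of padding bits alongside the data is an assumption. Something like it is needed because the question's loop turns a short final chunk such as '11' into the byte 3, which would come back as '00000011':

```python
def bits_to_bytearray(bits):
    # Pad the last chunk to a full byte and remember how many bits were added
    padding = (8 - len(bits) % 8) % 8
    bits += "0" * padding
    data = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data, padding

def bytearray_to_bits(data, padding):
    # format(b, "08b") turns each byte back into exactly 8 binary digits
    bits = "".join(format(b, "08b") for b in data)
    return bits[:len(bits) - padding]

encoded = "1000100011"  # hypothetical Huffman output, not a multiple of 8 bits
data, padding = bits_to_bytearray(encoded)
print(data, padding)                     # bytearray(b'\x88\xc0') 6
print(bytearray_to_bits(data, padding))  # 1000100011
```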

I can't figure out how to fix "TypeError: can't concat str to bytes"

I wrote a function to encrypt a given text file. The code below is a small portion of the function.
# pad it before encrypting it
elif len(chunk) % 16 != 0:
    chunk += ' ' * (16 - len(chunk) % 16)
# write encrypted data into output file
out_file.write(encryptor.encrypt(chunk))
Whenever I try to use the function, I get an error pointing to the last line:
"TypeError: can't concat str to bytes". I'm not sure what I need to do to fix it. I've tried a few things, but they end up leading me into similar errors. Any guidance would be greatly appreciated.
The encryptor is below.
encryptor = PKCS1_OAEP.new(pub_key)
Your encryption method encryptor.encrypt() very likely accepts bytes as its argument, not str, and it very likely returns bytes too. So I suggest encoding the input (example with UTF-8) and writing the result to a file opened in binary mode ('wb'); the ciphertext is arbitrary binary data, so don't try to decode it back to str:
out_file.write(encryptor.encrypt(chunk.encode('utf-8')))
You are trying to mix and match incompatible data types. Here is an example that throws the same error:
b = bytes(123)  # note: this is 123 zero bytes, not b'123'
s = str(123)
b + s  # TypeError: can't concat str to bytes
Go through your code, find where you are concatenating a bytes value with a str value, and make the types match.
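Applied to the padding step in the question, "make the types match" means padding bytes with bytes. A minimal sketch; `pad_chunk` is a hypothetical helper, and it pads after encoding because a non-ASCII character can take more than one byte in UTF-8:

```python
def pad_chunk(chunk, block_size=16):
    # Convert text to bytes first, so every later operation is bytes + bytes
    if isinstance(chunk, str):
        chunk = chunk.encode("utf-8")
    if len(chunk) % block_size != 0:
        chunk += b" " * (block_size - len(chunk) % block_size)
    return chunk

print(pad_chunk("hello"))      # b'hello' followed by 11 spaces
print(len(pad_chunk("café")))  # 16: "café" is 5 bytes in UTF-8
```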

Conversion between ISO-8859-2 and UTF-8 in Python

I'm wondering how I can convert ISO-8859-2 (Latin-2) characters (I mean integer or hex values that represent ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do in my Python project:
1. Receive hex values from the serial port, which are characters encoded in ISO-8859-2.
2. Decode them, that is, get "standard" Python Unicode strings from them.
3. Prepare and write an XML file.
Using Python 3.4.3:
>>> txt_str = "ąęłóźć"
>>> txt_str.decode('ISO-8859-2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still preparing valid input for the decode method (it works in Python 2.7.10, and that's the version I'm using in this project). How do I prepare a valid string from decimal values, which are Latin-2 code numbers?
Note that it would be very complicated to receive UTF-8 characters from the serial port, due to the devices I'm using and limitations of the communication protocol.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data: ISO-8859-2 text packed into uint32 values, 4 chars per int.
The bit of code that handles the unboxing:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
Your example doesn't work because you've tried to use a str to hold bytes. In Python 3 you must use byte strings.
In reality, if you're using PySerial, you'll be reading byte strings anyway, which you can convert as required:
import serial
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    s = ser.read(10)
    # Py3: s == bytes
    # Py2.x: s == str
    my_unicode_string = s.decode('iso-8859-2')
If your ISO-8859-2 data is additionally encoded as an ASCII hex representation of the bytes, you have to undo that extra layer of encoding first:
import codecs
import serial
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
    hex_repr = ser.read(10)
    # Py3: hex_repr == bytes
    # Py2.x: hex_repr == str
    # Decode the hex representation to bytes,
    # e.g. b"A3" -> b'\xa3'
    hex_decoded = codecs.decode(hex_repr, "hex")
    my_unicode_string = hex_decoded.decode('iso-8859-2')
Now you can pass my_unicode_string to your favourite XML library.
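The hex layer can be checked without a serial port. This sketch feeds a hypothetical two-byte hex payload (`B1B3`, which is 'ą' and 'ł' in Latin-2) through the same `codecs.decode(..., "hex")` step, then re-encodes the text as UTF-8 for the XML file; `bytes.fromhex()` would handle the hex step equally well:

```python
import codecs

hex_repr = b"B1B3"                    # hypothetical ASCII-hex payload from the UART
raw = codecs.decode(hex_repr, "hex")  # b'\xb1\xb3'
text = raw.decode("iso-8859-2")       # 'ął'
utf8 = text.encode("utf-8")           # the same characters as UTF-8 bytes
print(text, utf8)
```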
Interesting sample data. Ideally your sample data should be a direct print of the raw data received from PySerial. If you actually are receiving the raw bytes as 8-digit hexadecimal values, then:
#!python3
from binascii import unhexlify
data = b''.join(unhexlify(x)[::-1] for x in b'''\
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069'''.splitlines())
print(data.decode('iso-8859-2'))
Output:
W chuj bardzo długa nazwa jakiejś zapyziałej pipidówy, brudnej ulicyumer najgorszej rudery we wsi
Google Translate of Polish to English:
The dick very long name some zapyziałej Small Town , dirty ulicyumer worst hovel in the village
This topic is closed. Working code that handles what needs to be done:
x=177
x.to_bytes(1, byteorder='big').decode("ISO-8859-2")
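Combining the question's unboxing steps with `bytes.decode` handles a whole message at once rather than one integer at a time. A sketch, assuming each word is an 8-digit hex string as in the sample data and that trailing `00` bytes are padding; `decode_words` is a hypothetical helper, and the two words below are hypothetical, built from the question's "ąęłóźć" test string:

```python
def decode_words(words):
    data = bytearray()
    for word in words:
        # Split the 8-digit word into byte pairs and reverse them, as in the question
        pairs = [word[0:2], word[2:4], word[4:6], word[6:8]][::-1]
        data.extend(int(pair, 16) for pair in pairs)
    # Drop NUL padding from the last word, then decode Latin-2 into a str
    return bytes(data).rstrip(b"\x00").decode("iso-8859-2")

text = decode_words(["F3B3EAB1", "0000E6BC"])
print(text)                       # ąęłóźć
xml_bytes = text.encode("utf-8")  # UTF-8 output for the XML file
```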

Python UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 74: invalid start byte

I have some data in HBase stored as a mix of bytes and strings, delimited by \x00 padding.
So the row in my HBase table looks like this:
00:00:00:00:00:00\x00\x80\x00\x00\x00U\xEF\xA0\xB00\x002\x0040.0.2.1\x00
There is a value corresponding to this row (key), which is 100.
Row description:
00:00:00:00:00:00 - this is the MAC address, stored as a string
\x80\x00\x00\x00U\xEF\xA0\xB00 - this is the time, stored as bytes
2 - this is the customer ID number, stored as a string
40.0.2.1 - this is the store ID, stored as a string
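Following the row description above, the key is really a sequence of bytes, and in Python 3 it can be assembled without mixing str and bytes. A minimal sketch, assuming the text fields are ASCII; whether starbase's `fetch` accepts a bytes key is not confirmed here:

```python
# Field values copied from the row description
mac_address = "00:00:00:00:00:00"
time_start = b"\x80\x00\x00\x00U\xEF\xA0\xB00"  # binary field: keep it as bytes
cus_id = "2"
store_id = "40.0.2.1"

# Encode the text fields, join all four with NUL separators, add the trailing NUL
row_key = b"\x00".join([
    mac_address.encode("ascii"),
    time_start,
    cus_id.encode("ascii"),
    store_id.encode("ascii"),
]) + b"\x00"
print(row_key)
```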
I have used the starbase module to connect Python to HBase's Stargate server.
Here is my code snippet that connects to the HBase table through starbase and tries to fetch the value of that row:
from starbase import Connection
import starbase
C = Connection(host='10.10.5.2', port='60010')
get_table = C.table('dummy_table')
mac_address = "00:00:00:00:00:00"
time_start = "\x80\x00\x00\x00U\xEF\xA0\xB00"
cus_id = "2"
store_id = "40.0.2.1"
create_query = "%s\x00%s\x00%s\x00%s\x00" % (mac_address, time_start, cus_id, store_id)
fetch_result = get_table.fetch(create_query)
print fetch_result
Expected output:
100
You don't have to worry about the starbase connection and its methods. They work just fine if everything is a string, but now that the time is converted into bytes, it gives me this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 74: invalid start byte
Just in case you need to see the output of create_query when I print it:
00:00:1E:00:C8:36▒U▒v▒130.0.2.6
I would highly appreciate some help. Thanks
Try this:
time_start = "\\x80\\x00\\x00\\x00U\\xEF\\xA0\\xB00"
\x is the escape sequence for hex values, so
create_query = "%s\x00%s\x00%s\x00%s\x00" % (mac_address, time_start, cus_id, store_id)
was putting the raw time_start bytes into the query string. And since 0x80 on its own is not valid UTF-8, it was throwing an error.
My guess would be that your database doesn't support storing bytes in these fields; perhaps you must store strings.
One approach would be to convert your bytes into base64 strings before storing them in the database. For example:
>>> from base64 import b64encode, b64decode
>>> b64encode("\x80\x00\x00\x00U\xEF\xA0\xB00")
'gAAAAFXvoLAw'
>>> b64decode(_)
'\x80\x00\x00\x00U\xef\xa0\xb00'
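The session above is Python 2, where a plain "..." literal is already a byte string. In Python 3, `b64encode` requires a bytes-like object and returns bytes, so a sketch of the same round trip looks like this:

```python
from base64 import b64encode, b64decode

time_start = b"\x80\x00\x00\x00U\xEF\xA0\xB00"   # must be bytes in Python 3
encoded = b64encode(time_start).decode("ascii")  # plain ASCII text, storable as a string
print(encoded)                                   # gAAAAFXvoLAw
assert b64decode(encoded) == time_start          # decodes back to the original bytes
```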
