How to convert a byte array to string? - python

I just finished creating a Huffman compression algorithm. I converted my compressed text from a string to a byte array with bytearray(). I'm now attempting to decompress it. My only concern is that I cannot convert my byte array back into a string. Is there any built-in function I could use to convert my byte array (held in a variable) back into a string? If not, is there a better way to convert my compressed string to something else? I attempted to use byte_array.decode() and I get this:
print("Index: ", Index) # The Index
# Substituting text to our compressed index
for x in range(len(TextTest)):
    TextTest[x]=Index[TextTest[x]]
NewText=''.join(TextTest)
# print(NewText)
# NewText=int(NewText)
byte_array = bytearray() # Converts the compressed string text to bytes
for i in range(0, len(NewText), 8):
    byte_array.append(int(NewText[i:i + 8], 2))
NewSize = ("Compressed file Size:",sys.getsizeof(byte_array),'bytes')
print(byte_array)
print(NewSize)
x=bytes(byte_array)
x.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 0: invalid start byte

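You can use .decode('ascii'), or call .decode() with no argument to use UTF-8. This only works when the bytes actually hold text in that encoding; arbitrary binary data such as the compressed output above (which starts with byte 0x88) is valid in neither.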
>>> print(bytearray("abcd", 'utf-8').decode())
abcd
Source : Convert bytes to a string?
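For the Huffman case in the question, the byte array holds packed bits rather than text, so one option is to turn each byte back into its 8-bit binary string instead of decoding it. The following is a minimal sketch under that assumption; the helper names bits_to_bytes and bytes_to_bits are hypothetical, and the padding count is an addition not present in the question's code (without it, a final chunk shorter than 8 bits cannot be restored exactly).

```python
def bits_to_bytes(bit_string):
    """Pack a string of '0'/'1' characters into bytes, zero-padding the tail."""
    padding = (8 - len(bit_string) % 8) % 8  # bits needed to fill the last byte
    padded = bit_string + "0" * padding
    data = bytearray(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def bytes_to_bits(data, padding):
    """Rebuild the original bit string and drop the padding bits again."""
    bits = "".join(format(byte, "08b") for byte in data)
    return bits[:len(bits) - padding]


compressed_bits = "10001000011"           # 11 bits, so 5 padding bits are added
packed, pad = bits_to_bytes(compressed_bits)
restored = bytes_to_bits(packed, pad)     # same text as compressed_bits
```

The padding count would have to be stored alongside the compressed bytes so that the decompressor knows how many trailing bits to ignore.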

Related

Read binary file and check with matching character in python

I would like to scan through data files from a GPS receiver byte-wise (eventually it will be a continuous flow, but for now I want to test the code with offline data). If I find a match, I then check the next 2 bytes for the 'length', take those 2 bytes and shift them 2 bits (not bytes) to the right, etc. I haven't handled binary data before, so I'm stuck on a simple task. I can read the binary file byte by byte, but cannot find a way to match my desired pattern (i.e. D3).
with open("COM6_200417.ubx", "rb") as f:
    byte = f.read(1) # read 1-byte at a time
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
        print(byte)
The output is:
b'\x82'
b'\xc2'
b'\xe3'
b'\xb8'
b'\xe0'
b'\x00'
b'#'
b'\x13'
b'\x05'
b'!'
b'\xd3'
b'\x00'
b'\x13'
....
How do I check if that byte is == '\xd3' (D3)?
I would also like to know how to shift bit-wise, as I need to check a decimal value consisting of 6 bits
(1-byte and next byte's first 2-bits). I am considering taking 2-bytes(8-bits) and then a 2-bit right-shift
to get 6-bits. Is that possible in Python? Any improvements/additions/changes are very much appreciated.
PS: can I get rid of that pesky 'b' at the front? If ignoring it has no effect, then it's no problem.
Thanks in advance.
'That byte' is shown with a b'' prefix, indicating that it is a bytes object. To get rid of the prefix, you can convert it to an int:
thatbyte = b'\xd3'
byteint = thatbyte[0] # or
int.from_bytes(thatbyte, 'big') # 'big' or 'little' endian; both give the same result for a single byte
To compare, you can do:
thatbyte == b'\xd3'
This compares one bytes object with another.
The shift operators << and >> work on int only.
To convert an int back to bytes (assuming it is in [0..255]) you can use:
bytes([byteint]) # note the extra brackets!
As for improvements, I would suggest reading the whole binary file at once:
with open("COM6_200417.ubx", "rb") as f:
    allbytes = f.read() # read all
    for val in allbytes:
        # Do stuff with val, val is int !!!
        print(bytes([val]))
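Putting the pieces above together, a rough sketch of the scan the question describes could look like this. It assumes the layout stated in the question (a 0xD3 marker byte, then two bytes read as a big-endian integer and shifted right by 2 bits); the function name find_marker_fields and the sample bytes are hypothetical, and the real message format of the receiver may differ.

```python
def find_marker_fields(data, marker=0xD3):
    """Yield (offset, raw16, shifted) for each marker byte followed by two more bytes."""
    for offset, value in enumerate(data):        # iterating bytes yields ints
        if value == marker and offset + 2 < len(data):
            # Combine the two bytes after the marker into one 16-bit integer.
            raw16 = int.from_bytes(data[offset + 1:offset + 3], "big")
            yield offset, raw16, raw16 >> 2      # >> shifts by bits, not bytes


sample = bytes([0x82, 0xC2, 0xD3, 0x00, 0x13, 0x05])  # made-up test data
matches = list(find_marker_fields(sample))
```

For a continuous stream, the same comparison works on each chunk read from the port, but a marker that falls at the end of one chunk would need the following bytes from the next chunk.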

Encoding a file with ord function

I'm trying to encode a file and output the encode into a new file, but I got this error:
TypeError: ord() expected string of length 1, but int found
My code:
from sys import argv, exit

def encode(data):
    encoded = ''
    while data:
        current = data[0]
        count = 1
        for i in data[1:]:
            if i == current:
                count += 1
            else:
                break
            if count == 255:
                break
        encoded += '{}{}'.format(chr(ord(current) & 255), chr(count & 255)) #error occurs here.
        data = data[count:]
    return encoded

if __name__ == '__main__':
    if len(argv) < 2:
        print('Please specify input file!')
        exit(0)
    with open(argv[1], 'rb') as f:
        data = f.read()
    with open(argv[1] + '.out', 'wb') as f:
        f.write(encode(data))
Additional question: How do I decode the encoded file?
You are reading bytes (open(..., 'rb')), so when you take one element of the byte string, you get a byte, i.e. a number (an int). This number already is the character code, so just leave out the ord. Alternatively, you could open the file without the b modifier (open(..., 'r')), which will return a string; I would advise keeping it as a byte string though (otherwise you could run into encoding issues if you are parsing something non-ASCII).
You will run into a similar problem when saving your file: you cannot write a string to a file opened with the b modifier. Since you have characters outside the ASCII range (128 and above), writing as a string is not a good idea, since Python will try to encode your characters (e.g. in UTF-8), and you will end up with completely different bytes. Therefore, the best solution is probably not to concatenate your data into a string in your loop (the part where you do '{}{}'.format(...)), but to build a list (encoded = [], then encoded.append(current) and encoded.append(count) for each run) and convert that to a byte string using bytes(encoded) after your loop. You can then pass that to write without a problem.
As for how to decode your file, you can open the file like you do for encoding, read two bytes b1 and b2 at a time, extend your output (again, a list) with [b1]*b2, and convert that to a byte string with bytes().
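The advice above can be sketched as a pair of functions that work on bytes throughout. This is one possible arrangement rather than the asker's final code: the names rle_encode and rle_decode are hypothetical, a bytearray stands in for the suggested list, and the run length is capped at 255 as in the question so that each count fits in one byte.

```python
def rle_encode(data):
    """Run-length encode bytes as (value, count) pairs with count <= 255."""
    encoded = bytearray()
    i = 0
    while i < len(data):
        current = data[i]          # indexing bytes gives an int, so no ord()
        count = 1
        while i + count < len(data) and data[i + count] == current and count < 255:
            count += 1
        encoded.append(current)
        encoded.append(count)
        i += count
    return bytes(encoded)


def rle_decode(encoded):
    """Expand (value, count) pairs back into the original bytes."""
    decoded = bytearray()
    for j in range(0, len(encoded) - 1, 2):
        value, count = encoded[j], encoded[j + 1]
        decoded.extend([value] * count)
    return bytes(decoded)


original = b"\x00" * 300 + b"\xff"
packed = rle_encode(original)
unpacked = rle_decode(packed)
```

Both results can be written directly to a file opened with 'wb', since they are already bytes objects.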

Convert 'bytes' object to string

I tried to find a solution, but I'm still stuck.
I want to use PyVisa to control a function generator.
I have a waveform, which is a list of values between 0 and 16382.
I then have to prepare it so that each waveform point occupies 2 bytes.
Each value is represented in big-endian, MSB-first format, as straight binary. So I do binwaveform = pack('>'+'h'*len(waveform), *waveform)
And then when I try to write it to the instrument with AFG.write('trace ememory, '+ header + binwaveform) I get an error:
File ".\afg3000.py", line 97, in <module>
AFG.write('trace ememory, '+ header + binwaveform)
TypeError: Can't convert 'bytes' object to str implicitly
I tried to solve it with AFG.write('trace ememory, '+ header + binwaveform.decode()), but decode() treats the data as UTF-8 text by default, which is not valid for some values: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 52787: invalid start byte
Could you please help with it?
binwaveform is a packed byte string of integers. E.g.:
struct.pack('<h', 4545)
b'\xc1\x11'
It is not printable text. In the above example,
0xC1 is invalid in both ASCII and UTF-8.
When you append a byte string to a regular str ('trace ememory, ' + header + binwaveform), Python would have to convert it to text but doesn't know how.
Decoding it implies that it's text - it's not.
If you need it as text, use its hex representation (whether the instrument accepts hex text instead of raw binary depends on its command set):
import codecs
binwaveform_hex = codecs.encode(binwaveform, 'hex')
binwaveform_hex_str = binwaveform_hex.decode('ascii') # str(binwaveform_hex) would include the b'...' wrapper in Python 3
AFG.write('trace ememory, '+ header + binwaveform_hex_str)
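If the instrument expects the raw binary data rather than hex text, an alternative is to go the other way: encode the text part of the command to bytes and concatenate, so that nothing binary is ever decoded. The sketch below only builds the message. The header shown is a hypothetical length-prefixed placeholder, not the one from the question, and 'H' (unsigned short) is swapped in for 'h' since the values are non-negative; both formats produce the same bytes for 0 to 16382.

```python
import struct

waveform = [0, 1, 8191, 16382]   # sample points in the range given in the question

# '>' selects big-endian, 'H' is an unsigned 2-byte integer per point.
binwaveform = struct.pack(">" + "H" * len(waveform), *waveform)

# Hypothetical header: '#', the number of digits in the length, then the length.
# The real header format must come from the instrument's manual.
header = "#{}{}".format(len(str(len(binwaveform))), len(binwaveform))

# Encode only the text part; the binary payload is appended unchanged.
message = ("trace ememory, " + header).encode("ascii") + binwaveform

# Sending bytes needs a raw write call rather than a str-based write; PyVISA
# message-based resources offer write_raw for this (check the documentation
# of your version before relying on it).
```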

byte to bit manipulation in python

I have a bmp file that I read in my Python program. Once I have read in the bytes, I want to do bit-wise operations on each byte I read in. My program is:
with open("ship.bmp", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
        print(byte)
output:
b'\xfe'
I was wondering how I can do manipulation on that? I.e convert it to bits. Some general pointers would be good. I lack experience with Python, so any help would be appreciated!
bytes objects yield integers from 0 through 255 inclusive when indexed. So, just perform the bit manipulation on the result of indexing.
>>> b'\xfe'[0]
254
>>> b'\xfe'[0] ^ 0x55
171
file.read(1) constructs a length-1 bytes object, which is overkill when you want the byte as an integer. To access each byte as an integer, the following is more succinct and has the benefit of using a for loop.
with open("ship.bmp", "rb") as f:
    byte_data = f.read()
    for byte in byte_data:
        # do stuff with byte. eg.
        result = byte & 0x2
        ...
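To make "convert it to bits" concrete, here are a few common operations on a single byte value, using the b'\xfe' byte from the question's output. This is only an illustration of built-in integer operations; the variable names are arbitrary.

```python
byte = b"\xfe"[0]                       # indexing gives the int 254

bit_string = format(byte, "08b")        # '11111110', zero-padded to 8 digits
bit_list = [(byte >> shift) & 1 for shift in range(7, -1, -1)]  # MSB first

lowest_bit = byte & 1                   # test the least significant bit
with_bit_set = byte | 0x01              # set the least significant bit
inverted = byte ^ 0xFF                  # flip all 8 bits
high_nibble = byte >> 4                 # upper 4 bits as a number
```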

Reading UDP Packets

I am having some trouble dissecting a UDP packet. I am receiving the packets and storing the data and sender-address in variables 'data' and 'addr' with:
data,addr = UDPSock.recvfrom(buf)
This gives me the data as a string, which I am now unable to turn into bytes. I know the structure of the datagram, which is 28 bytes in total, and that the data I am trying to extract is in bytes 17:28.
I have tried doing this:
mybytes = data[16:19]
print struct.unpack('>I', mybytes)
--> struct.error: unpack str size does not match format
And this:
response = (0, 0, data[16], data[17], 6)
bytes = array('B', response[:-1])
print struct.unpack('>I', bytes)
--> TypeError: Type not compatible with array type
And this:
print "\nData byte 17:", str.encode(data[17])
--> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
More specifically I want to parse what I think is an unsigned int. And now I am not sure what to try next. I am completely new to sockets and byte-conversions in Python, so any advice would be helpful :)
Thanks,
Thomas
An unsigned int32 is 4 bytes long, so you have to feed 4 bytes into struct.unpack.
Replace
mybytes = data[16:19]
with
mybytes = data[16:20]
(the right-hand number is the first index not included, i.e. range(16,19) = [16,17,18]) and you should be good to go.
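As a small illustration of the 4-byte requirement, the sketch below unpacks a big-endian unsigned 32-bit integer from offset 16 of a made-up 28-byte datagram. The datagram contents are invented for the example, and it is written for Python 3, where recvfrom returns bytes.

```python
import struct

datagram = bytes(range(28))             # stand-in for the 28-byte packet

# '>I' is a big-endian unsigned int and needs exactly 4 bytes: indices 16..19.
(value,) = struct.unpack(">I", datagram[16:20])

# unpack_from reads at an offset without slicing first.
(same_value,) = struct.unpack_from(">I", datagram, 16)

# struct.calcsize reports how many bytes a format string consumes.
needed = struct.calcsize(">I")
```

struct.unpack always returns a tuple, which is why the single result is unpacked with (value,).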
