Python read a binary file and decode

I am quite new to Python and I need to solve this simple problem. There are already several similar questions, but I still cannot solve it.
I need to read a binary file, which is composed of several blocks of bytes. For example, the header is composed of 6 bytes, and I would like to extract those 6 bytes and transform them into a sequence of binary digits, like 000100110 011001 for example.
navatt_dir='C:/PROCESSING/navatt_read/'
navatt_filename='OSPS_FRMT_NAVATT____20130621T100954_00296_caseB.bin'
navatt_path=navatt_dir+navatt_filename
navatt_file=open(navatt_path, 'rb')
header=list(navatt_file.read(6))
print header
Printing the list gives the following:
%run C:/PROCESSING/navatt_read/navat_read.py
['\t', 'i', '\xc0', '\x00', '\x00', 't']
which is not what I want.
I would also like to read a particular value in the binary file, knowing its position and length, without reading the whole file. Is it possible?
Thanks

ByteArray
A bytearray is a mutable sequence of bytes (Integers where 0 ≤ x ≤ 255). You can construct a bytearray from a string (If it is not a byte-string, you will have to provide encoding), an iterable of byte-sized integers, or an object with a buffer interface. You can of course just build it manually as well.
An example using a byte-string:
string = b'DFH'
b = bytearray(string)
# Print it as a string
print b
# Prints the individual bytes, showing you that it's just a list of ints
print [i for i in b]
# Lets add one to the D
b[0] += 1
# And print the string again to see the result!
print b
The result:
DFH
[68, 70, 72]
EFH
This is the type you want if you want raw byte manipulation. If what you want is to read 4 bytes as a 32bit int, one would use the struct module, with the unpack method, but I usually just shift them together myself from a bytearray.
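A minimal sketch of the two approaches just mentioned, assuming Python 3 and a big-endian unsigned 32-bit value. The sample bytes are made up for illustration:

```python
import struct

# Four example bytes (hypothetical data, not from the question's file)
raw = bytearray(b'\x00\x00\x01\x02')

# struct: '>I' means big-endian unsigned 32-bit integer; unpack returns a tuple
(value_struct,) = struct.unpack('>I', raw)

# Manual variant: shift each byte into place and OR them together
value_shift = (raw[0] << 24) | (raw[1] << 16) | (raw[2] << 8) | raw[3]

print(value_struct, value_shift)  # 258 258
```

Both give the same number; struct becomes more convenient once the record has several fields or a different byte order.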
Printing the header in binary
What you seem to want is to take the string you have, convert it to a bytearray, and print them as a string in base 2/binary.
So here is a short example for how to write the header out (I read random data from a file named "dump"):
with open('dump', 'rb') as f:
    header = f.read(6)
b = bytearray(header)
print ' '.join([bin(i)[2:].zfill(8) for i in b])
After converting it to a bytearray, I call bin() on every single one, which gives back a string with the binary representation we need, in the format of "0b1010". I don't want the "0b", so I slice it off with [2:]. Then, I use the string method zfill, which allows me to have the required amount of 0's prepended for the string to be 8 long (which is the amount of bits we need), as bin will not show any unneeded zeroes.
If you're new to the language, the last line might look quite mean. It uses list comprehension to make a list of all the binary strings we want to print, and then join them into the final string with spaces between the elements.
A less pythonic/convoluted variant of the last line would be:
result = []
for byte in b:
    string = bin(byte)[2:] # Make a binary string and slice off the first two characters ("0b")
    result.append(string.zfill(8)) # Append a 0-padded version to the results list
# Join the list into a space-separated string and print it!
print ' '.join(result)
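As an alternative sketch, the bin()/slice/zfill steps can be folded into one format() call with the '08b' specifier. This version assumes Python 3 and uses the six header bytes shown in the question:

```python
# The six header bytes printed in the question
header = bytearray(b'\ti\xc0\x00\x00t')

# '08b' = binary, zero-padded to 8 digits, no '0b' prefix
binary_header = ' '.join(format(byte, '08b') for byte in header)

print(binary_header)
# 00001001 01101001 11000000 00000000 00000000 01110100
```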
I hope this helps!
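On the second part of the question (reading a value at a known position and length without reading the whole file), a file object's seek() method moves to a byte offset before reading. The sketch below assumes Python 3; read_field, offset and length are hypothetical names, and the demo file is a throwaway temporary file rather than the real data:

```python
import os
import tempfile

def read_field(path, offset, length):
    """Return `length` bytes starting at byte `offset` of the file."""
    with open(path, 'rb') as f:
        f.seek(offset)         # jump to the field without reading what precedes it
        return f.read(length)  # may return fewer bytes if the file ends early

# Demo: a 16-byte file containing the values 0..15
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(16)))
    demo_path = tmp.name

field = read_field(demo_path, 6, 4)
os.remove(demo_path)

print(list(field))  # [6, 7, 8, 9]
```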

Related

Converting integer to a pair of bytes produces unexpected format?

I am using python 3.8.5, and trying to convert from an integer in the range (0,65535) to a pair of bytes. I am currently using the following code:
import struct
input_integer = 2111
bytes_val = input_integer.to_bytes(2,'little')
output_data = struct.pack('bb',bytes_val[1],bytes_val[0])
print(output_data)
This produces the following output:
b'\x08?'
This \x08 is 8 in hex, the most significant byte, and ? is 63 in ascii. So together, the numbers add up to 2111 (8*256+63=2111). What I can't figure out is why the least significant byte is coming out in ascii instead of hex? It's very strange to me that it's in a different format than the MSB right next to it. I want it in hex for the output data, and am trying to figure out how to achieve that.
I have also tried modifying the format string in the last line to the following:
output_data = struct.pack('cc',bytes_val[1],bytes_val[0])
which produces the following error:
struct.error: char format requires a bytes object of length 1
I checked the types at each step, and it looks like bytes_val is a bytearray of length 2, but when I take one of the individual elements, say bytes_val[1], it is an integer rather than a byte array.
Any ideas?

All your observations can be verified from the docs for the bytes class:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers
In the repr of a Python string, letters and punctuation are shown as themselves in ASCII, while control codes (0-31, 127) are shown as escapes, mostly in hexadecimal. You can see this by printing repr(''.join(map(chr, range(128)))). Bytes literals follow the same convention, except that individual elements are integers, e.g. output_data[0] is 8.
If you want to represent everything as hex
>>> output_data.hex()
'083f'
>>> bytes.fromhex('083f') # to recover
b'\x08?'
As of version 3.8 bytes.hex() now supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output.
>>> b'abcdef'.hex(' ', 2)
'6162 6364 6566'
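To tie this back to the question, here is a short sketch (Python 3) showing that the two bytes are the same however they are displayed. It packs the integer directly with the '>H' format (big-endian unsigned 16-bit), which is a substitute for the question's to_bytes-plus-pack approach:

```python
import struct

input_integer = 2111

# '>H' = big-endian unsigned 16-bit; covers the whole 0..65535 range
packed = struct.pack('>H', input_integer)
same = input_integer.to_bytes(2, 'big')

print(packed)          # b'\x08?'  (0x3f happens to be the ASCII code of '?')
print(packed == same)  # True
print(packed.hex())    # 083f
print(list(packed))    # [8, 63]
```

Only the repr mixes ASCII and hex escapes; the data written to a file or socket is the same two bytes either way.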

How to convert 0's and 1's to binary and back for a Huffman algorithm?

Currently, I am writing a Huffman Algorithm but I have a problem with converting the binary part.
The rest of the program is already working. The program can create a tree from the symbols and can create a string of 0's and 1's which represent the symbols. But now I want to convert this string to a binary format and convert it back again. Currently, I am using this code to convert the string to binary.
def toBytes(data):
    b = bytearray()
    for i in range(0, len(data), 8):
        b.append(int(data[i:i+8], 2))
    return bytes(b)
I can convert this string to a binary format but can't convert it back.
For example, when I insert "01111101011000" to the function it returns b'}\x18'. How can I convert this binary format back to my 0's and 1's?

You can write a bytes-to-binarylike-string method by making use of two observations:
You can use str.format's b type specifier to turn an integer into an equivalent string of ones and zeroes.
A bytes object can be treated just like a list of integers when you're iterating over it.
>>> def to_bin(b):
...     return "".join("{:08b}".format(x) for x in b)
...
>>> b = b'}\x18'
>>> print(to_bin(b))
0111110100011000
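Note that the output has 16 digits while the original input had 14, because the last chunk was shorter than 8 bits and the number of padding bits is not stored anywhere. One possible way to get an exact round trip, sketched here under the assumption that a one-byte prefix holding the padding count is acceptable (bits_to_bytes and bytes_to_bits are hypothetical names, and this pads on the right, unlike toBytes):

```python
def bits_to_bytes(bits):
    """Pack a '0'/'1' string into bytes; the first byte stores the padding count."""
    pad = (8 - len(bits) % 8) % 8
    padded = bits + '0' * pad  # pad on the right to a whole number of bytes
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([pad]) + body

def bytes_to_bits(data):
    """Inverse of bits_to_bytes: drop the prefix byte and the padding bits."""
    pad = data[0]
    bits = ''.join('{:08b}'.format(x) for x in data[1:])
    return bits[:len(bits) - pad]

original = '01111101011000'
encoded = bits_to_bytes(original)
restored = bytes_to_bits(encoded)

print(restored == original)  # True
```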

how to split this string of HEX bytes

i have the following string of hex bytes from a smart meter:
'~\xa0\x1e\x03\x00\x02\xfe\xff4\xca\xec\xe6\xe7\x00\xc4\x01A\x00\x02\x04\x12\x00\x05\x11\x01\x11\x01\x11\x00\xc7\x11 ~'
I want to separate them into a list and then convert them to decimal integers. Python's .split() function won't work here. Any ideas?
thanks!

You can convert a string to a list of ascii values with ord.
values = [ord(c) for c in data]
Although, depending on what you want to do, you might not even need to cast your data as a list since a str is already iterable.
Instead, iterate over your characters and recover their value. Here is a simplified example.
dt = '\xa0\x1e\x03\x00\x02\xfe'
for x in map(ord, dt):
    print(x)
Output
160
30
3
0
2
254
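If the data arrives as a bytes object in Python 3 (an assumption; the question shows a str), no ord call is needed, because iterating over bytes already yields integers. A short sketch using the first ten bytes of the question's data:

```python
# First ten bytes of the meter frame from the question, as a bytes literal
data = b'~\xa0\x1e\x03\x00\x02\xfe\xff4\xca'

values = list(data)
print(values)  # [126, 160, 30, 3, 0, 2, 254, 255, 52, 202]

# A str holding the same code points can be converted with latin-1,
# which maps code points 0-255 one-to-one onto byte values
text = '\xa0\x1e'
print(list(text.encode('latin-1')))  # [160, 30]
```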

Convert a list of ints to a float

I am trying to convert a number stored as a list of ints to a float type. I got the number via a serial console and want to reassemble it back together into a float.
The way I would do it in C is something like this:
bit_data = ((int16_t)byte_array[0] << 8) | byte_array[1];
result = (float)bit_data;
What I tried to use in python is a much more simple conversion:
result = int_list[0]*256.0 + int_list[1]
However, this does not preserve the sign of the result, as the C code does.
What is the right way to do this in python?
UPDATE:
Python version is 2.7.3.
My byte array has a length of 2.
in the python code byte_array is list of ints. I've renamed it to avoid misunderstanding. I can not just use the float() function because it will not preserve the sign of the number.

I'm a bit confused by what data you have, and how it is represented in Python. As I understand it, you have received two unsigned bytes over a serial connection, which are now represented by a list of two python ints. This data represents a big endian 16-bit signed integer, which you want to extract and turn into a float. eg. [0xFF, 0xFE] -> -2 -> -2.0
import array, struct
two_unsigned_bytes = [255, 254] # represented by ints
byte_array = array.array("B", two_unsigned_bytes)
# change above to "b" if the ints represent signed bytes ie. in range -128 to 127
signed_16_bit_int, = struct.unpack(">h", byte_array)
float_result = float(signed_16_bit_int)
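On Python 3 (not the 2.7.3 stated in the question), the same conversion can be sketched without array or struct, using int.from_bytes with signed=True:

```python
two_unsigned_bytes = [255, 254]  # represented by ints, as in the question

# bytes() accepts a list of ints in 0..255; 'big' means most significant byte first
signed_16_bit_int = int.from_bytes(bytes(two_unsigned_bytes), byteorder='big', signed=True)
float_result = float(signed_16_bit_int)

print(float_result)  # -2.0
```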

I think what you want is the struct module.
Here's a round trip snippet:
import struct
sampleValue = 42.13
somebytes = struct.pack('=f', sampleValue)
print(somebytes)
result = struct.unpack('=f', somebytes)
print(result)
The result may surprise you: unpack returns a tuple. To get the value, you can do
result[0]
or change the line that sets result to
result = struct.unpack('=f', somebytes)[0]
I personally dislike that, so I use the following instead
result, = struct.unpack('=f', somebytes) # tuple unpacking on assignment
The second thing you'll notice is that the value has extra digits of noise. That's because '=f' stores a 32-bit float, while Python's native float is a 64-bit double, so the rounded value is shown with more digits than the original.
(This is python3 btw, adjust for using old versions of python as appropriate)
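The precision loss can be seen by comparing a 32-bit round trip ('=f') with a 64-bit one ('=d'). This is a small illustrative sketch; the tolerance 1e-5 is just a loose bound chosen for the demo:

```python
import struct

sample = 42.13

# Round trip through 4 bytes (float32) and through 8 bytes (float64)
as_f32, = struct.unpack('=f', struct.pack('=f', sample))
as_f64, = struct.unpack('=d', struct.pack('=d', sample))

print(as_f32 == sample)             # False: float32 cannot hold every digit
print(as_f64 == sample)             # True: same width as Python's float
print(abs(as_f32 - sample) < 1e-5)  # True: the error is small
```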

I am not sure I really understand what you are doing, but I think you got 4 bytes from a stream and know them to represent a float32 value. The way you are handling this suggests big-endian byte order.
Python has the struct package (https://docs.python.org/2/library/struct.html) to handle bytestreams.
import struct
stream = struct.pack(">f", 2/3.)
len(stream) # 4
reconstructed_float = struct.unpack(">f", stream)

Okay, so I think int_list isn't really just a list of ints. The ints are constrained to 0-255 and represent bytes that can be built into a signed integer. You then want to turn that into a float. The trick is to set the sign of the first byte properly and then proceed much like you did.
float((int_list[0]-256 if int_list[0]>127 else int_list[0])*256 + int_list[1])
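Sign handling like this is easy to get wrong, so it is worth checking a hand-written version against struct. In the sketch below, to_signed16 is a hypothetical helper that applies the two's-complement rule (subtract 65536 when the top bit is set):

```python
import struct

def to_signed16(high, low):
    """Combine two bytes (0..255) into a big-endian signed 16-bit integer."""
    value = (high << 8) | low
    # If bit 15 is set, the value is negative in two's complement
    return value - 0x10000 if value & 0x8000 else value

# Compare against struct's '>h' (big-endian signed short) on a few edge cases
for pair in ([0xFF, 0xFE], [0x00, 0x02], [0x80, 0x00], [0x7F, 0xFF]):
    expected, = struct.unpack('>h', bytes(bytearray(pair)))
    assert to_signed16(*pair) == expected

print(float(to_signed16(0xFF, 0xFE)))  # -2.0
```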

Python writing binary

I use Python 3.
I am trying to write binary data to a file that I opened in r+b mode.
for bit in binary:
    fileout.write(bit)
where binary is a list that contains numbers.
How do I write this to the file in binary?
The resulting file has to look like
b' x07\x08\x07\
Thanks

When you open a file in binary mode, then you are essentially working with the bytes type. So when you write to the file, you need to pass a bytes object, and when you read from it, you get a bytes object. In contrast, when opening the file in text mode, you are working with str objects.
So, writing “binary” is really writing a bytes string:
with open(fileName, 'br+') as f:
    f.write(b'\x07\x08\x07')
If you have actual integers you want to write as binary, you can use the bytes function to convert a sequence of integers into a bytes object:
>>> lst = [7, 8, 7]
>>> bytes(lst)
b'\x07\x08\x07'
Combining this, you can write a sequence of integers as a bytes object into a file opened in binary mode.
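A self-contained sketch of that combination, assuming Python 3. It writes to a temporary directory and uses 'wb' rather than 'br+', since 'r+' modes require the file to exist already; the path is hypothetical:

```python
import os
import tempfile

lst = [7, 8, 7]

# Hypothetical output location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'out.bin')

with open(path, 'wb') as f:  # 'wb' creates (or truncates) the file
    f.write(bytes(lst))      # every value must be in range(256)

with open(path, 'rb') as f:
    data = f.read()

print(data)  # b'\x07\x08\x07'
```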
As Hyperboreus pointed out in the comments, bytes will only accept a sequence of numbers that actually fit in a byte, i.e. numbers between 0 and 255. If you want to store arbitrary (positive) integers in the way they are, without having to bother about knowing their exact size (which is required for struct), then you can easily write a helper function which splits those numbers up into separate bytes:
def splitNumber (num):
    lst = []
    while num > 0:
        lst.append(num & 0xFF)
        num >>= 8
    return lst[::-1]
bytes(splitNumber(12345678901234567890))
# b'\xabT\xa9\x8c\xeb\x1f\n\xd2'
So if you have a list of numbers, you can easily iterate over them and write each into the file; if you want to extract the numbers individually later you probably want to add something that keeps track of which individual bytes belong to which numbers.
with open(fileName, 'br+') as f:
    for number in numbers:
        f.write(bytes(splitNumber(number)))
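One possible way to keep track of which bytes belong to which number is a length prefix. This is only a sketch under stated assumptions: non-negative integers, each at most 255 bytes long, and hypothetical helper names encode_numbers and decode_numbers:

```python
def encode_numbers(numbers):
    """Encode non-negative ints as [length byte][big-endian bytes] records."""
    out = bytearray()
    for n in numbers:
        size = max(1, (n.bit_length() + 7) // 8)  # at least one byte, so 0 is stored too
        out.append(size)                          # one-byte prefix limits size to 255
        out += n.to_bytes(size, 'big')
    return bytes(out)

def decode_numbers(data):
    """Inverse of encode_numbers."""
    numbers, i = [], 0
    while i < len(data):
        size = data[i]
        numbers.append(int.from_bytes(data[i + 1:i + 1 + size], 'big'))
        i += 1 + size
    return numbers

numbers = [0, 7, 300, 12345678901234567890]
encoded = encode_numbers(numbers)

print(decode_numbers(encoded) == numbers)  # True
```

The encoded bytes can be written to a file opened in binary mode with a single f.write(encoded) call.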

where binary is a list that contains numbers
A number can have one thousand and one different binary representations (endianness, width, ones' complement, two's complement, floats of different precision, etc.). So first you have to decide in which representation you want to store your numbers. Then you can use the struct module to do so.
For example the byte sequence 0x3480 can be interpreted as 32820 (little-endian unsigned short), or -32716 (little-endian signed short) or 13440 (big-endian short).
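The three interpretations can be reproduced with struct by changing only the format string ('<' is little-endian, '>' is big-endian, 'H' is an unsigned short, 'h' is a signed short):

```python
import struct

raw = b'\x34\x80'  # the byte sequence 0x34 0x80

le_unsigned = struct.unpack('<H', raw)[0]
le_signed = struct.unpack('<h', raw)[0]
be_signed = struct.unpack('>h', raw)[0]

print(le_unsigned)  # 32820
print(le_signed)    # -32716
print(be_signed)    # 13440
```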
Small example:
#! /usr/bin/python3
import struct
binary = [1234, 5678, -9012, -3456]
with open('out.bin', 'wb') as f:
    for b in binary:
        f.write(struct.pack('h', b)) #or whatever format you need
with open('out.bin', 'rb') as f:
    content = f.read()
    for b in content:
        print(b)
    print(struct.unpack('hhhh', content)) #same format as above
prints
210
4
46
22
204
220
128
242
(1234, 5678, -9012, -3456)
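The byte values printed above correspond to a little-endian machine, because a bare 'h' uses the platform's native byte order. If the file has to be read on other machines, one option is to state the order explicitly. A sketch with '<4h' (four little-endian signed shorts):

```python
import struct

binary = [1234, 5678, -9012, -3456]

# '<' fixes little-endian order regardless of the machine running the code
packed = struct.pack('<4h', *binary)
unpacked = struct.unpack('<4h', packed)

print(list(packed))  # [210, 4, 46, 22, 204, 220, 128, 242]
print(unpacked)      # (1234, 5678, -9012, -3456)
```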
