Using Python, how can I read the bits in a byte?

I have a file where the first byte contains encoded information. In Matlab I can read the byte bit by bit with var = fread(file, 8, 'ubit1'), and then retrieve each bit by var(1), var(2), etc.
Is there any equivalent bit reader in python?

Read the bits from a file, low bits first.
def bits(f):
    bytes = (ord(b) for b in f.read())
    for b in bytes:
        for i in xrange(8):
            yield (b >> i) & 1

# open the file in binary mode so no newline translation takes place
for b in bits(open('binary-file.bin', 'rb')):
    print b
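The generator above is written for Python 2 (xrange, ord on each character, the print statement). A minimal Python 3 sketch of the same idea follows; it assumes the file fits in memory, and io.BytesIO is only a stand-in for a real file opened with open(path, 'rb').

```python
import io

def bits(f):
    """Yield the bits of a binary file object, lowest bit of each byte first."""
    for byte in f.read():        # iterating over bytes yields ints in Python 3
        for i in range(8):
            yield (byte >> i) & 1

# io.BytesIO stands in for a file opened with open(path, 'rb')
demo = io.BytesIO(b'\x05\x80')
print(list(bits(demo)))
```

For the two demo bytes 0x05 and 0x80 this prints sixteen values, starting with the lowest bit of 0x05.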

The smallest unit you'll be able to work with is a byte. To work at the bit level you need to use bitwise operators.
x = 3
#Check if the 1st bit is set:
x&1 != 0
#Returns True
#Check if the 2nd bit is set:
x&2 != 0
#Returns True
#Check if the 3rd bit is set:
x&4 != 0
#Returns False

With numpy it is easy like this:
import numpy
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)  # by default, the most significant bit of each byte comes first
More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html

You won't be able to read each bit one by one - you have to read it byte by byte. You can easily extract the bits out, though:
f = open("myfile", 'rb')
# read one byte
byte = f.read(1)
# convert the byte to an integer representation
byte = ord(byte)
# now convert to string of 1s and 0s
byte = bin(byte)[2:].rjust(8, '0')
# now byte contains a string with 0s and 1s
for bit in byte:
    print bit

Joining some of the previous answers I would use:
[int(i) for i in "{0:08b}".format(byte)]
Do this for each byte read from the file. The result for an example byte 0x88 is:
>>> [int(i) for i in "{0:08b}".format(0x88)]
[1, 0, 0, 0, 1, 0, 0, 0]
You can assign it to a variable and work with it as in your original request.
The "{0:08b}" format guarantees the full byte length: it pads the result with leading zeros to 8 digits.

To read a byte from a file: bytestring = open(filename, 'rb').read(1). Note: the file is opened in the binary mode.
To get bits, convert the bytestring into an integer: byte = bytestring[0] (Python 3) or byte = ord(bytestring[0]) (Python 2) and extract the desired bit: (byte >> i) & 1:
>>> for i in range(8): (b'a'[0] >> i) & 1
...
1
0
0
0
0
1
1
0
>>> bin(b'a'[0])
'0b1100001'

There are two possible ways to return the i-th bit of a byte. The "first bit" could refer to the high-order bit or it could refer to the lower order bit.
Here is a function that takes a string and index as parameters and returns the value of the bit at that location. As written, it treats the low-order bit as the first bit. If you want the high order bit first, just uncomment the indicated line.
def bit_from_string(string, index):
    i, j = divmod(index, 8)
    # Uncomment this if you want the high-order bit first
    # j = 7 - j
    if ord(string[i]) & (1 << j):
        return 1
    else:
        return 0
The indexing starts at 0. If you want the indexing to start at 1, you can adjust index in the function before calling divmod.
Example usage:
>>> for i in range(8):
...     print i, bit_from_string('\x04', i)
...
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
Now, for how it works:
A string is composed of 8-bit bytes, so first we use divmod() to break the index into two parts:
i: the index of the correct byte within the string
j: the index of the correct bit within that byte
We use the ord() function to convert the character at string[i] into an integer type. Then, (1 << j) computes the value of the j-th bit by left-shifting 1 by j. Finally, we use bitwise-and to test if that bit is set. If so return 1, otherwise return 0.
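In Python 3, indexing a bytes object already yields an integer, so ord() is not needed. The following is a sketch of the same lookup under that assumption; the name bit_from_bytes and the msb_first parameter are illustrative and not part of the original answer.

```python
def bit_from_bytes(data, index, msb_first=False):
    """Return bit number `index` (0-based) of a bytes object."""
    i, j = divmod(index, 8)   # i: byte position, j: bit position within that byte
    if msb_first:
        j = 7 - j             # count from the high-order bit instead
    return (data[i] >> j) & 1

print([bit_from_bytes(b'\x04', k) for k in range(8)])
print([bit_from_bytes(b'\x04', k, msb_first=True) for k in range(8)])
```

Note that the flipped index is 7 - j, because the bit positions inside a byte run from 0 to 7.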

Suppose you have a file called bloom_filter.bin that contains an array of bits, and you want to read the entire file and use those bits in an array.
First create the array where the bits will be stored after reading:
from bitarray import bitarray
a=bitarray(size) #same as the number of bits in the file
Open the file in binary mode. Either a plain open() call or a with block works; plain open() is used here:
f=open('bloom_filter.bin','rb')
Now load all the bits into the array 'a' at one shot using,
f.readinto(a)
'a' is now a bitarray containing all the bits

This is pretty fast I would think:
import itertools
data = range(10)
to_bits = "{:0>8b}".format  # named to_bits to avoid shadowing the built-in format()
newdata = (n == '1' for n in itertools.chain.from_iterable(map(to_bits, data)))
print(list(newdata)) # prints a long list of True and False values

I think this is a more pythonic way:
a = 140
binary = format(a, 'b')
The result of this block is:
'10001100'
I needed to get the bit planes of an image, and this function helped me write the following block:
import numpy as np

def img2bitmap(img: np.ndarray) -> list:
    if img.dtype != np.uint8 or img.ndim > 2:
        raise ValueError("Image is not uint8 or gray")
    bit_mat = [np.zeros(img.shape, dtype=np.uint8) for _ in range(8)]
    for row_number in range(img.shape[0]):
        for column_number in range(img.shape[1]):
            binary = format(img[row_number][column_number], 'b')
            for idx, bit in enumerate("".join(reversed(binary))[:]):
                bit_mat[idx][row_number, column_number] = 2 ** idx if int(bit) == 1 else 0
    return bit_mat
With the following block, I was also able to rebuild the original image from the extracted bit planes:
import cv2

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
out = img2bitmap(img)
original_image = np.zeros(img.shape, dtype=np.uint8)
for i in range(original_image.shape[0]):
    for j in range(original_image.shape[1]):
        for data in range(8):
            x = np.array([original_image[i, j]], dtype=np.uint8)
            data = np.array([data], dtype=np.uint8)
            flag = np.array([0 if out[data[0]][i, j] == 0 else 1], dtype=np.uint8)
            mask = flag << data[0]
            x[0] = (x[0] & ~mask) | ((flag[0] << data[0]) & mask)
            original_image[i, j] = x[0]

Related

Convert 2 integers to hex/byte array?

I'm using Python to transmit two integers (range 0...4095) via SPI. The package seems to expect a byte array of the form [0xff,0xff,0xff].
So e.g. 1638 (hex: 666) and 1229 (hex: 4cd) should yield [0x66,0x64,0xcd].
What would an efficient conversion look like, given that the mixed byte in the middle seems quite nasty?
You can do it by left shifting and then bitwise OR'ing the two 12-bit values together and using the int_to_bytes() function shown below, which will work in Python 2.x.
In Python 3, the int type has a built-in method called to_bytes() that will do this and more, so in that version you wouldn't need to supply your own.
def int_to_bytes(n, minlen=0):
    """ Convert integer to bytearray with optional minimum length.
    """
    if n > 0:
        arr = []
        while n:
            n, rem = n >> 8, n & 0xff
            arr.append(rem)
        b = bytearray(reversed(arr))
    elif n == 0:
        b = bytearray(b'\x00')
    else:
        raise ValueError('Only non-negative values supported')
    if minlen > 0 and len(b) < minlen: # zero padding needed?
        b = bytearray(minlen - len(b)) + b  # bytearray(k) is k zero bytes
    return b

a, b = 1638, 1229  # two 12 bit values
v = a << 12 | b  # shift first 12 bits then OR with second
ba = int_to_bytes(v, 3)  # convert to array of bytes
print('[{}]'.format(', '.join(hex(b) for b in ba)))  # -> [0x66, 0x64, 0xcd]
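For Python 3, the built-in int.to_bytes() mentioned above replaces the helper function. A short sketch, assuming the device expects the first value in the upper 12 bits and big-endian byte order (as in the example from the question):

```python
a, b = 1638, 1229                      # two 12-bit values (0x666 and 0x4cd)
packed = ((a << 12) | b).to_bytes(3, byteorder='big')
print([hex(x) for x in packed])        # ['0x66', '0x64', '0xcd']

# Unpacking on the receiving side reverses the steps
v = int.from_bytes(packed, byteorder='big')
print(v >> 12, v & 0xFFF)              # 1638 1229
```

If the SPI package wants a list of integers rather than a bytes object, list(packed) converts it.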

Python working with bits

I want to do a bit operation, and need some help:
I have a word of 16 bit and want to split it into two, reverse each and then join them again.
Example: if I have 0b11000011
First I divide it into 0b1100 and 0b0011
Then I reverse both, getting 0b0011 and 0b1100
And finally I rejoin them, getting 0b00111100
Thanks!
Here's one way to do it:
def rev(n):
    res = 0
    mask = 0x01
    while mask <= 0x80:
        res <<= 1
        res |= bool(n & mask)
        mask <<= 1
    return res

x = 0b1100000110000011
x = (rev(x >> 8) << 8) | rev(x & 0xFF)
print bin(x)  # 0b1000001111000001
Note that the method above operates on 16-bit words (reversing each 8-bit half), not on a single byte as in the example in the question.
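The question's own example uses an 8-bit value split into two 4-bit halves. A sketch that handles both widths is shown here; the function names reverse_bits and reverse_halves and the width parameter are illustrative, not taken from the answer.

```python
def reverse_bits(n, width):
    """Reverse the lowest `width` bits of n."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (n & 1)   # move the lowest bit of n onto result
        n >>= 1
    return result

def reverse_halves(x, width=8):
    """Split a `width`-bit value into two halves, reverse each, and rejoin."""
    half = width // 2
    high = x >> half
    low = x & ((1 << half) - 1)
    return (reverse_bits(high, half) << half) | reverse_bits(low, half)

print(format(reverse_halves(0b11000011), '08b'))           # 00111100
print(format(reverse_halves(0b1100000110000011, 16), '016b'))
```

The second call reproduces the 16-bit case from the answer above.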
Here are some basic operations you can try; you can concatenate the results after splitting your string in two and reversing each part:
a = "0b11000011" # make a string
b = a[:6] # get the first 6 chars, including the "0b" prefix
c = a[::-1] # reverse the string

Write boolean string to binary file?

I have a string of booleans and I want to create a binary file using these booleans as bits. This is what I am doing:
# first append the string with 0s to make its length a multiple of 8
while len(boolString) % 8 != 0:
    boolString += '0'
# write the string to the file byte by byte
i = 0
while i < len(boolString) / 8:
    byte = int(boolString[i*8 : (i+1)*8], 2)
    outputFile.write('%c' % byte)
    i += 1
But this generates the output 1 byte at a time and is slow. What would be a more efficient way to do it?
It should be quicker if you calculate all your bytes first and then write them all together. For example
b = bytearray([int(boolString[x:x+8], 2) for x in range(0, len(boolString), 8)])
outputFile.write(b)
I'm also using a bytearray which is a natural container to use, and can also be written directly to your file.
You can of course use libraries if that's appropriate such as bitarray and bitstring. Using the latter you could just say
bitstring.Bits(bin=boolString).tofile(outputFile)
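As a rough Python 3 sketch of the "compute all bytes first" approach, padding and packing can be combined in one helper. The name bits_to_bytes is hypothetical, and the input is assumed to contain only '0' and '1' characters.

```python
def bits_to_bytes(bit_string):
    """Pack a string of '0'/'1' characters into bytes, zero-padding the last byte."""
    padded = bit_string + '0' * (-len(bit_string) % 8)   # pad on the right to a multiple of 8
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

data = bits_to_bytes('1000000111')
print(data)            # b'\x81\xc0'
```

The result can then be written with a single outputFile.write(data) call on a file opened in 'wb' mode.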
Here's another answer, this time using an industrial-strength utility function from PyCrypto (The Python Cryptography Toolkit) where, in version 2.6 (the current latest stable release), it's defined in pycrypto-2.6/lib/Crypto/Util/number.py.
The comments preceding it say:
Improved conversion functions contributed by Barry Warsaw, after careful benchmarking
import struct
def long_to_bytes(n, blocksize=0):
    """long_to_bytes(n:long, blocksize:int) : string
    Convert a long integer to a byte string.
    If optional blocksize is given and greater than zero, pad the front of the
    byte string with binary zeros so that the length is a multiple of
    blocksize.
    """
    # after much testing, this algorithm was deemed to be the fastest
    s = b('')
    n = long(n)
    pack = struct.pack
    while n > 0:
        s = pack('>I', n & 0xffffffffL) + s
        n = n >> 32
    # strip off leading zeros
    for i in range(len(s)):
        if s[i] != b('\000')[0]:
            break
    else:
        # only happens when n == 0
        s = b('\000')
        i = 0
    s = s[i:]
    # add back some pad bytes. this could be done more efficiently w.r.t. the
    # de-padding being done above, but sigh...
    if blocksize > 0 and len(s) % blocksize:
        s = (blocksize - len(s) % blocksize) * b('\000') + s
    return s
You can convert a boolean string to a long using data = long(boolString,2). Then to write this long to disk you can use:
while data > 0:
    data, byte = divmod(data, 0x100)
    file.write('%c' % byte)
Note that this writes the least significant byte first.
However, there is no need to build a boolean string. It is much easier to use a long. The long type can hold an arbitrary number of bits. Using bit manipulation you can set or clear the bits as needed. You can then write the long to disk as a whole in a single write operation.
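In Python 3 the long type is simply int, and int.to_bytes() produces the whole payload for a single write. A small sketch, assuming a bit count that is a multiple of 8 and big-endian byte order; bit_string is a made-up input for illustration.

```python
bit_string = '0000000110000001'            # hypothetical input, 16 bits
n = int(bit_string, 2)                     # Python 3 ints are arbitrary precision
n |= 1 << 15                               # set the highest of the 16 bits
n &= ~1                                    # clear the lowest bit
payload = n.to_bytes(len(bit_string) // 8, byteorder='big')
print(payload)                             # b'\x81\x80'
```

Writing payload to a file opened in 'wb' mode then takes one write() call.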
You can try this code using the array class:
import array
buffer = array.array('B')
i = 0
while i < len(boolString) / 8:
    byte = int(boolString[i*8 : (i+1)*8], 2)
    buffer.append(byte)
    i += 1
f = open(filename, 'wb')
buffer.tofile(f)
f.close()
A helper class (shown below) makes it easy:
class BitWriter:
    def __init__(self, f):
        self.acc = 0
        self.bcount = 0
        self.out = f
    def __del__(self):
        self.flush()
    def writebit(self, bit):
        if self.bcount == 8 :
            self.flush()
        if bit > 0:
            self.acc |= (1 << (7-self.bcount))
        self.bcount += 1
    def writebits(self, bits, n):
        while n > 0:
            self.writebit( bits & (1 << (n-1)) )
            n -= 1
    def flush(self):
        self.out.write(chr(self.acc))
        self.acc = 0
        self.bcount = 0

with open('outputFile', 'wb') as f:
    bw = BitWriter(f)
    bw.writebits(int(boolString,2), len(boolString))
    bw.flush()
Use the struct package.
This can be used in handling binary data stored in files or from network connections, among other sources.
Edit:
An example using ? as the format character for a bool.
import struct
p = struct.pack('????', True, False, True, False)
assert p == b'\x01\x00\x01\x00'  # each bool occupies one full byte
with open("out", "wb") as o:
    o.write(p)
Let's take a look at the file:
$ ls -l out
-rw-r--r-- 1 lutz lutz 4 Okt 1 13:26 out
$ od out
0000000 000001 000001
000000
Read it in again:
with open("out", "rb") as i:
    q = struct.unpack('????', i.read())
assert q == (True, False, True, False)

Python: Extracting bits from a byte

I'm reading a binary file in python and the documentation for the file format says:
Flag (in binary)   Meaning
1 nnn nnnn         Indicates that there is one data byte to follow
                   that is to be duplicated nnn nnnn (127 maximum)
                   times.
0 nnn nnnn         Indicates that there are nnn nnnn bytes of image
                   data to follow (127 bytes maximum) and that
                   there are no duplications.
n 000 0000         End of line field. Indicates the end of a line
                   record. The value of n may be either zero or one.
                   Note that the end of line field is required and
                   that it is reflected in the length of line record
                   field mentioned above.
When reading the file I'm expecting the byte I'm at to return 1 nnn nnnn where the nnn nnnn part should be 50.
I've been able to do this using the following:
flag = byte >> 7
numbytes = int(bin(byte)[3:], 2)
But the numbytes calculation feels like a cheap workaround.
Can I do more bit math to accomplish the calculation of numbytes?
How would you approach this?
The classic approach of checking whether a bit is set, is to use binary "and" operator, i.e.
x = 10 # 1010 in binary
if x & 0b10: # explicitly: x & 0b0010 != 0
    print('First bit is set')
To check whether the n-th bit is set, use a power of two or, better, bit shifting:
def is_set(x, n):
    # equivalent but slower alternative: return x & 2 ** n != 0
    return x & 1 << n != 0
>>> is_set(10, 1)  # 1 i.e. first bit - as the count starts at 0-th bit
True
You can strip off the leading bit using a mask ANDed with a byte from file. That will leave you with the value of the remaining bits:
mask = 0b01111111
byte_from_file = 0b10101010
value = mask & byte_from_file
print bin(value)
>> 0b101010
print value
>> 42
I find the binary numbers easier to understand than hex when doing bit-masking.
EDIT: Slightly more complete example for your use case:
LEADING_BIT_MASK = 0b10000000
VALUE_MASK = 0b01111111
values = [0b10101010, 0b01010101, 0b0000000, 0b10000000]
for v in values:
    value = v & VALUE_MASK
    has_leading_bit = v & LEADING_BIT_MASK
    if value == 0:
        print "EOL"
    elif has_leading_bit:
        print "leading one", value
    elif not has_leading_bit:
        print "leading zero", value
If I read your description correctly:
if (byte & 0x80) != 0:
    num_bytes = byte & 0x7F
there you go:
class ControlWord(object):
    """Helper class to deal with control words.
    Bit setting and checking methods are implemented.
    """
    def __init__(self, value = 0):
        self.value = int(value)
    def set_bit(self, bit):
        self.value |= bit
    def check_bit(self, bit):
        return self.value & bit != 0
    def clear_bit(self, bit):
        self.value &= ~bit
Instead of int(bin(byte)[3:], 2), you could simply use: byte & 0x7F, which keeps the low 7 bits without a round trip through a string.
Not sure I understood you correctly, but if I did, this should do the trick:
>>> x = 154 #just an example
>>> flag = x >> 7
>>> flag
1
>>> nb = x & 127
>>> nb
26
You can do it like this:
def GetVal(b):
    # mask off the most significant bit, see if it's set
    flag = b & 0x80 == 0x80
    # then look at the lower 7 bits in the byte.
    count = b & 0x7f
    # return a tuple indicating the state of the high bit, and the
    # remaining integer value without the high bit.
    return (flag, count)
>>> testVal = 50 + 0x80
>>> GetVal(testVal)
(True, 50)
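Putting the masks to work on the format described in the question, a decoder for one line record might look like the sketch below (Python 3). It assumes the record is already in memory as a bytes object and follows the flag table literally; decode_line is a hypothetical name, and details of the real file format beyond that table are not covered.

```python
def decode_line(record):
    """Decode one run-length-encoded line record into raw bytes."""
    out = bytearray()
    pos = 0
    while pos < len(record):
        flag_byte = record[pos]
        pos += 1
        count = flag_byte & 0x7F          # low 7 bits: nnn nnnn
        if count == 0:                    # n 000 0000: end of line
            break
        if flag_byte & 0x80:              # high bit set: repeat the next byte
            out += bytes([record[pos]]) * count
            pos += 1
        else:                             # high bit clear: copy `count` literal bytes
            out += record[pos:pos + count]
            pos += count
    return bytes(out)

# 0x83 = repeat next byte 3 times, 0x02 = two literal bytes, 0x00 = end of line
print(decode_line(bytes([0x83, 0x41, 0x02, 0x42, 0x43, 0x00])))   # b'AAABC'
```

The end-of-line check comes first because a zero count marks the end of the record regardless of the high bit.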

Convert bytes to bits in python

I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b:
>>> c.bin[2:]
'11111111'
The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This will convert the hexadecimal string you have to an integer, and that integer to a string in which each character is 0 or 1 depending on the bit value of the integer.
As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removeprefix('0b')
'11111111'
Note: using lstrip("0b") here will lead to 0 integer being converted to an empty string. This is almost always not what you want to do.
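The lstrip pitfall is easy to reproduce. str.lstrip() treats its argument as a set of characters to strip, not as a prefix, so the digit zero disappears too. A quick comparison of three ways to drop the prefix:

```python
for n in (0, 5):
    # lstrip("0b") removes any leading '0' or 'b' characters, not the literal prefix
    print(repr(bin(n).lstrip("0b")), repr(bin(n)[2:]), repr(format(n, "b")))
```

For n = 0 the first column is an empty string, while slicing and format() both give '0'.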
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bits 7 and 8 only (the two highest bits of the byte, counting from 1), use e.g.
val = (byte >> 6) & 3
(That is: shift the byte 6 bits to the right, dropping those bits, then keep only the last two bits. 3 is the number with the lowest two bits set.)
These translate directly into simple CPU operations that are very fast.
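The shift-and-mask pattern generalizes to any bit field. A small sketch, where extract_bits is an illustrative helper name and bit positions are counted from 0 at the lowest bit:

```python
def extract_bits(value, shift, width):
    """Return `width` bits of `value`, starting `shift` bits above the lowest bit."""
    mask = (1 << width) - 1
    return (value >> shift) & mask

byte = 0b10110100
print(extract_bits(byte, 6, 2))   # top two bits: 0b10 -> 2
print(extract_bits(byte, 2, 4))   # middle four bits: 0b1101 -> 13
```

With shift=6 and width=2 this is the same computation as (byte >> 6) & 3.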
Using Python format string syntax:
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All byte objects have a .hex() function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int() function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It is using the Format Specification Mini-Language format_spec. Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, which is how we get the nice 0000 padding, and the b sets the type to binary.
I prefer this method over the bin() method because using a format string gives a lot more flexibility.
I think simplest would be use numpy here. For example you can read a file as bytes and then expand it to bits easily like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Here is how to do it using format():
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
The 08b is important: it pads each value with leading zeros to a width of 8 digits, so every byte is complete. Without it, each converted byte would have a variable bit length.
To binary:
bin(byte)[2:].zfill(8)
Use ord when reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format():
'{:08b}'.format(ord(f.read(1)))
The other answers here provide the bits in big-endian order ('\x01' becomes '00000001')
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc -
here's a snippet for that:
def bits_little_endian_from_bytes(s):
    return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
    return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
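The two helpers above rely on Python 2 strings (ord and chr on each character). A Python 3 sketch of the same round trip, assuming the input is a bytes object and the bit string length is a multiple of 8:

```python
def bits_little_endian_from_bytes(data):
    """Return a bit string with the lowest bit of each byte first."""
    return ''.join(format(byte, '08b')[::-1] for byte in data)

def bytes_from_bits_little_endian(bits):
    """Inverse of the function above: rebuild bytes from the bit string."""
    return bytes(int(bits[i:i + 8][::-1], 2) for i in range(0, len(bits), 8))

encoded = bits_little_endian_from_bytes(b'\x01\xf0')
print(encoded)                                  # 1000000000001111
print(bytes_from_bits_little_endian(encoded))   # b'\x01\xf0'
```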
A one-line function to convert bytes (not a string) to a list of bits. There is no endianness issue when the data goes from one byte reader/writer to another; it only matters when the source and target are bit readers and bit writers.
def byte2bin(b):
    return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16).
Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np

def bit_positions_numpy(val):
    """
    Given an integer value, return the positions of the on bits.
    """
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
    bit_arr = np.unpackbits(arr, bitorder='big')
    bit_positions = np.where(bit_arr[::-1])[0].tolist()
    return bit_positions
Then using string logic:
def bit_positions_str(val):
    is_negative = val < 0
    if is_negative:
        bit_length = val.bit_length() + 1
        length = math.ceil(bit_length / 8.0)  # bytelength
        neg_position = (length * 8) - 1
        # special logic for negatives to get two's complement repr
        max_val = 1 << neg_position
        val_ = max_val + val
    else:
        val_ = val
    binary_string = '{:b}'.format(val_)[::-1]
    bit_positions = [pos for pos, char in enumerate(binary_string)
                     if char == '1']
    if is_negative:
        bit_positions.append(neg_position)
    return bit_positions
And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
    positions = [pos for pos, mask in pos_masks if (mask & i)]
    BYTE_TO_POSITIONS.append(positions)

def bit_positions_lut(val):
    bit_length = val.bit_length() + 1
    length = math.ceil(bit_length / 8.0)  # bytelength
    bytestr = val.to_bytes(length, byteorder='big', signed=True)
    bit_positions = []
    for offset, b in enumerate(bytestr[::-1]):
        pos = BYTE_TO_POSITIONS[b]
        if offset == 0:
            bit_positions.extend(pos)
        else:
            pos_offset = (8 * offset)
            bit_positions.extend([p + pos_offset for p in pos])
    return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
    # for val in [-0, -1, -3, -4, -9999]:
    test_values = [
        # -1, -2, -3, -4, -8, -32, -290, -9999,
        # 0, 1, 2, 3, 4, 8, 32, 290, 9999,
        4324, 1028, 1024, 3000, -100000,
        999999999999,
        -999999999999,
        2 ** 32,
        2 ** 64,
        2 ** 128,
        2 ** 128,
    ]
    for val in test_values:
        r1 = bit_positions_str(val)
        r2 = bit_positions_numpy(val)
        r3 = bit_positions_lut(val)
        print(f'val={val}')
        print(f'r1={r1}')
        print(f'r2={r2}')
        print(f'r3={r3}')
        print('---')
        assert r1 == r2

    import xdev
    xdev.profile_now(bit_positions_numpy)(val)
    xdev.profile_now(bit_positions_str)(val)
    xdev.profile_now(bit_positions_lut)(val)

    import timerit
    ti = timerit.Timerit(10000, bestof=10, verbose=2)
    for timer in ti.reset('str'):
        for val in test_values:
            bit_positions_str(val)
    for timer in ti.reset('numpy'):
        for val in test_values:
            bit_positions_numpy(val)
    for timer in ti.reset('lut'):
        for val in test_values:
            bit_positions_lut(val)
    for timer in ti.reset('raw_bin'):
        for val in test_values:
            bin(val)
    for timer in ti.reset('raw_bytes'):
        for val in test_values:
            val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs
