Python: Extracting bits from a byte - python

I'm reading a binary file in python and the documentation for the file format says:
Flag (in binary)Meaning
1 nnn nnnn Indicates that there is one data byte to follow
that is to be duplicated nnn nnnn (127 maximum)
times.
0 nnn nnnn Indicates that there are nnn nnnn bytes of image
data to follow (127 bytes maximum) and that
there are no duplications.
n 000 0000 End of line field. Indicates the end of a line
record. The value of n may be either zero or one.
Note that the end of line field is required and
that it is reflected in the length of line record
field mentioned above.
When reading the file I'm expecting the byte I'm at to return 1 nnn nnnn where the nnn nnnn part should be 50.
I've been able to do this using the following:
flag = byte >> 7
numbytes = int(bin(byte)[3:], 2)
But the numbytes calculation feels like a cheap workaround.
Can I do more bit math to accomplish the calculation of numbytes?
How would you approach this?

The classic approach of checking whether a bit is set, is to use binary "and" operator, i.e.
x = 10 # 1010 in binary
if x & 0b10: # explicitly: x & 0b0010 != 0
print('First bit is set')
To check, whether n^th bit is set, use the power of two, or better bit shifting
def is_set(x, n):
return x & 2 ** n != 0
# a more bitwise- and performance-friendly version:
return x & 1 << n != 0
is_set(10, 1) # 1 i.e. first bit - as the count starts at 0-th bit
>>> True

You can strip off the leading bit using a mask ANDed with a byte from file. That will leave you with the value of the remaining bits:
mask = 0b01111111
byte_from_file = 0b10101010
value = mask & byte_from_file
print bin(value)
>> 0b101010
print value
>> 42
I find the binary numbers easier to understand than hex when doing bit-masking.
EDIT: Slightly more complete example for your use case:
LEADING_BIT_MASK = 0b10000000
VALUE_MASK = 0b01111111
values = [0b10101010, 0b01010101, 0b0000000, 0b10000000]
for v in values:
value = v & VALUE_MASK
has_leading_bit = v & LEADING_BIT_MASK
if value == 0:
print "EOL"
elif has_leading_bit:
print "leading one", value
elif not has_leading_bit:
print "leading zero", value

If I read your description correctly:
if (byte & 0x80) != 0:
num_bytes = byte & 0x7F

there you go:
class ControlWord(object):
"""Helper class to deal with control words.
Bit setting and checking methods are implemented.
"""
def __init__(self, value = 0):
self.value = int(value)
def set_bit(self, bit):
self.value |= bit
def check_bit(self, bit):
return self.value & bit != 0
def clear_bit(self, bit):
self.value &= ~bit

Instead of int(bin(byte)[3:], 2), you could simply use: int(bin(byte>>1),2)

not sure I got you correctly, but if I did, this should do the trick:
>>> x = 154 #just an example
>>> flag = x >> 1
>>> flag
1
>>> nb = x & 127
>>> nb
26

You can do it like this:
def GetVal(b):
# mask off the most significant bit, see if it's set
flag = b & 0x80 == 0x80
# then look at the lower 7 bits in the byte.
count = b & 0x7f
# return a tuple indicating the state of the high bit, and the
# remaining integer value without the high bit.
return (flag, count)
>>> testVal = 50 + 0x80
>>> GetVal(testVal)
(True, 50)

Related

Convert 2 integers to hex/byte array?

I'm using a Python to transmit two integers (range 0...4095) via SPI. The package seems to expect a byte array in form of [0xff,0xff,0xff].
So e.g. 1638(hex:666) and 1229(hex:4cd) should yield [0x66,0x64,0xcd].
So would an effective conversion look like as the mixed byte in the middle seems quite nasty?
You can do it by left shifting and then bitwise OR'ing the two 12-bit values together and using the int_to_bytes() function shown below, which will work in Python 2.x.
In Python 3, the int type has a built-in method called to_bytes() that will do this and more, so in that version you wouldn't need to supply your own.
def int_to_bytes(n, minlen=0):
""" Convert integer to bytearray with optional minimum length.
"""
if n > 0:
arr = []
while n:
n, rem = n >> 8, n & 0xff
arr.append(rem)
b = bytearray(reversed(arr))
elif n == 0:
b = bytearray(b'\x00')
else:
raise ValueError('Only non-negative values supported')
if minlen > 0 and len(b) < minlen: # zero padding needed?
b = (minlen-len(b)) * '\x00' + b
return b
a, b = 1638, 1229 # two 12 bit values
v = a << 12 | b # shift first 12 bits then OR with second
ba = int_to_bytes(v, 3) # convert to array of bytes
print('[{}]'.format(', '.join(hex(b) for b in ba))) # -> [0x66, 0x64, 0xcd]

Python working with bits

I want to do a bit operation, and need some help:
I have a word of 16 bit and want to split it into two, reverse each and then join them again.
Example if i have 0b11000011
First I divide it into 0b1100 and 0b0011
Then i reverse both getting 0b0011 and 0b1100
And finally rejoin them getting 0b00111100
Thanks!
Here's one way to do it:
def rev(n):
res = 0
mask = 0x01
while mask <= 0x80:
res <<= 1
res |= bool(n & mask)
mask <<= 1
return res
x = 0b1100000110000011
x = (rev(x >> 8) << 8) | rev(x & 0xFF)
print bin(x) # 0b1000001111000001
Note that the method above operates on words, not bytes as example in the question.
here are some basic operations you can try, and you can concatenate results after splitting your string in two and reversing it
a = "0b11000011" #make a string
b = a[:6] #get first 5 chars
c = a[::-1] # invert the string

Write boolean string to binary file?

I have a string of booleans and I want to create a binary file using these booleans as bits. This is what I am doing:
# first append the string with 0s to make its length a multiple of 8
while len(boolString) % 8 != 0:
boolString += '0'
# write the string to the file byte by byte
i = 0
while i < len(boolString) / 8:
byte = int(boolString[i*8 : (i+1)*8], 2)
outputFile.write('%c' % byte)
i += 1
But this generates the output 1 byte at a time and is slow. What would be a more efficient way to do it?
It should be quicker if you calculate all your bytes first and then write them all together. For example
b = bytearray([int(boolString[x:x+8], 2) for x in range(0, len(boolString), 8)])
outputFile.write(b)
I'm also using a bytearray which is a natural container to use, and can also be written directly to your file.
You can of course use libraries if that's appropriate such as bitarray and bitstring. Using the latter you could just say
bitstring.Bits(bin=boolString).tofile(outputFile)
Here's another answer, this time using an industrial-strength utility function from the PyCrypto - The Python Cryptography Toolkit where, in version 2.6 (the current latest stable release), it's defined inpycrypto-2.6/lib/Crypto/Util/number.py.
The comments preceeding it say:
Improved conversion functions contributed by Barry Warsaw, after careful benchmarking
import struct
def long_to_bytes(n, blocksize=0):
"""long_to_bytes(n:long, blocksize:int) : string
Convert a long integer to a byte string.
If optional blocksize is given and greater than zero, pad the front of the
byte string with binary zeros so that the length is a multiple of
blocksize.
"""
# after much testing, this algorithm was deemed to be the fastest
s = b('')
n = long(n)
pack = struct.pack
while n > 0:
s = pack('>I', n & 0xffffffffL) + s
n = n >> 32
# strip off leading zeros
for i in range(len(s)):
if s[i] != b('\000')[0]:
break
else:
# only happens when n == 0
s = b('\000')
i = 0
s = s[i:]
# add back some pad bytes. this could be done more efficiently w.r.t. the
# de-padding being done above, but sigh...
if blocksize > 0 and len(s) % blocksize:
s = (blocksize - len(s) % blocksize) * b('\000') + s
return s
You can convert a boolean string to a long using data = long(boolString,2). Then to write this long to disk you can use:
while data > 0:
data, byte = divmod(data, 0xff)
file.write('%c' % byte)
However, there is no need to make a boolean string. It is much easier to use a long. The long type can contain an infinite number of bits. Using bit manipulation you can set or clear the bits as needed. You can then write the long to disk as a whole in a single write operation.
You can try this code using the array class:
import array
buffer = array.array('B')
i = 0
while i < len(boolString) / 8:
byte = int(boolString[i*8 : (i+1)*8], 2)
buffer.append(byte)
i += 1
f = file(filename, 'wb')
buffer.tofile(f)
f.close()
A helper class (shown below) makes it easy:
class BitWriter:
def __init__(self, f):
self.acc = 0
self.bcount = 0
self.out = f
def __del__(self):
self.flush()
def writebit(self, bit):
if self.bcount == 8 :
self.flush()
if bit > 0:
self.acc |= (1 << (7-self.bcount))
self.bcount += 1
def writebits(self, bits, n):
while n > 0:
self.writebit( bits & (1 << (n-1)) )
n -= 1
def flush(self):
self.out.write(chr(self.acc))
self.acc = 0
self.bcount = 0
with open('outputFile', 'wb') as f:
bw = BitWriter(f)
bw.writebits(int(boolString,2), len(boolString))
bw.flush()
Use the struct package.
This can be used in handling binary data stored in files or from network connections, among other sources.
Edit:
An example using ? as the format character for a bool.
import struct
p = struct.pack('????', True, False, True, False)
assert p == '\x01\x00\x01\x00'
with open("out", "wb") as o:
o.write(p)
Let's take a look at the file:
$ ls -l out
-rw-r--r-- 1 lutz lutz 4 Okt 1 13:26 out
$ od out
0000000 000001 000001
000000
Read it in again:
with open("out", "rb") as i:
q = struct.unpack('????', i.read())
assert q == (True, False, True, False)

Using Python How can I read the bits in a byte?

I have a file where the first byte contains encoded information. In Matlab I can read the byte bit by bit with var = fread(file, 8, 'ubit1'), and then retrieve each bit by var(1), var(2), etc.
Is there any equivalent bit reader in python?
Read the bits from a file, low bits first.
def bits(f):
bytes = (ord(b) for b in f.read())
for b in bytes:
for i in xrange(8):
yield (b >> i) & 1
for b in bits(open('binary-file.bin', 'r')):
print b
The smallest unit you'll be able to work with is a byte. To work at the bit level you need to use bitwise operators.
x = 3
#Check if the 1st bit is set:
x&1 != 0
#Returns True
#Check if the 2nd bit is set:
x&2 != 0
#Returns True
#Check if the 3rd bit is set:
x&4 != 0
#Returns False
With numpy it is easy like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html
You won't be able to read each bit one by one - you have to read it byte by byte. You can easily extract the bits out, though:
f = open("myfile", 'rb')
# read one byte
byte = f.read(1)
# convert the byte to an integer representation
byte = ord(byte)
# now convert to string of 1s and 0s
byte = bin(byte)[2:].rjust(8, '0')
# now byte contains a string with 0s and 1s
for bit in byte:
print bit
Joining some of the previous answers I would use:
[int(i) for i in "{0:08b}".format(byte)]
For each byte read from the file. The results for an 0x88 byte example is:
>>> [int(i) for i in "{0:08b}".format(0x88)]
[1, 0, 0, 0, 1, 0, 0, 0]
You can assign it to a variable and work as per your initial request.
The "{0.08}" is to guarantee the full byte length
To read a byte from a file: bytestring = open(filename, 'rb').read(1). Note: the file is opened in the binary mode.
To get bits, convert the bytestring into an integer: byte = bytestring[0] (Python 3) or byte = ord(bytestring[0]) (Python 2) and extract the desired bit: (byte >> i) & 1:
>>> for i in range(8): (b'a'[0] >> i) & 1
...
1
0
0
0
0
1
1
0
>>> bin(b'a'[0])
'0b1100001'
There are two possible ways to return the i-th bit of a byte. The "first bit" could refer to the high-order bit or it could refer to the lower order bit.
Here is a function that takes a string and index as parameters and returns the value of the bit at that location. As written, it treats the low-order bit as the first bit. If you want the high order bit first, just uncomment the indicated line.
def bit_from_string(string, index):
i, j = divmod(index, 8)
# Uncomment this if you want the high-order bit first
# j = 8 - j
if ord(string[i]) & (1 << j):
return 1
else:
return 0
The indexing starts at 0. If you want the indexing to start at 1, you can adjust index in the function before calling divmod.
Example usage:
>>> for i in range(8):
>>> print i, bit_from_string('\x04', i)
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
Now, for how it works:
A string is composed of 8-bit bytes, so first we use divmod() to break the index into to parts:
i: the index of the correct byte within the string
j: the index of the correct bit within that byte
We use the ord() function to convert the character at string[i] into an integer type. Then, (1 << j) computes the value of the j-th bit by left-shifting 1 by j. Finally, we use bitwise-and to test if that bit is set. If so return 1, otherwise return 0.
Supposing you have a file called bloom_filter.bin which contains an array of bits and you want to read the entire file and use those bits in an array.
First create the array where the bits will be stored after reading,
from bitarray import bitarray
a=bitarray(size) #same as the number of bits in the file
Open the file,
using open or with, anything is fine...I am sticking with open here,
f=open('bloom_filter.bin','rb')
Now load all the bits into the array 'a' at one shot using,
f.readinto(a)
'a' is now a bitarray containing all the bits
This is pretty fast I would think:
import itertools
data = range(10)
format = "{:0>8b}".format
newdata = (False if n == '0' else True for n in itertools.chain.from_iterable(map(format, data)))
print(newdata) # prints tons of True and False
I think this is a more pythonic way:
a = 140
binary = format(a, 'b')
The result of this block is:
'10001100'
I was to get bit planes of the image and this function helped me to write this block:
def img2bitmap(img: np.ndarray) -> list:
if img.dtype != np.uint8 or img.ndim > 2:
raise ValueError("Image is not uint8 or gray")
bit_mat = [np.zeros(img.shape, dtype=np.uint8) for _ in range(8)]
for row_number in range(img.shape[0]):
for column_number in range(img.shape[1]):
binary = format(img[row_number][column_number], 'b')
for idx, bit in enumerate("".join(reversed(binary))[:]):
bit_mat[idx][row_number, column_number] = 2 ** idx if int(bit) == 1 else 0
return bit_mat
Also by this block, I was able to make primitives image from extracted bit planes
img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
out = img2bitmap(img)
original_image = np.zeros(img.shape, dtype=np.uint8)
for i in range(original_image.shape[0]):
for j in range(original_image.shape[1]):
for data in range(8):
x = np.array([original_image[i, j]], dtype=np.uint8)
data = np.array([data], dtype=np.uint8)
flag = np.array([0 if out[data[0]][i, j] == 0 else 1], dtype=np.uint8)
mask = flag << data[0]
x[0] = (x[0] & ~mask) | ((flag[0] << data[0]) & mask)
original_image[i, j] = x[0]

Python: Set Bits Count (popcount)

Few blob's have been duplicated in my database(oracle 11g), performed XOR operations on the blob using UTL_RAW.BIT_XOR. After that i wanted to count the number of set bits in the binary string, so wrote the code above.
During a small experiment, i wanted to see what is the hex and the integer value produced and wrote this procedure..
SQL> declare
2
3 vblob1 blob;
4
5 BEGIN
6
7 select leftiriscode INTO vblob1 FROM irisdata WHERE irisid=1;
8
9 dbms_output.put_line(rawtohex(vblob1));
10
11
12 dbms_output.put_line(UTL_RAW.CAST_TO_binary_integer(vblob1));
13
14
15 END;
16 /
OUTPUT: HEXVALUE:
0F0008020003030D030C1D1C3C383C330A3311373724764C54496C0A6B029B84840547A341BBA83D
BB5FB9DE4CDE5EFE96E1FC6169438344D604681D409F9F9F3BC07EE0C4E0C033A23B37791F59F84F
F94E4F664E3072B0229DA09D9F0F1FC600C2E380D6988C198B39517D157E7D66FE675237673D3D28
3A016C01411003343C76740F710F0F4F8FE976E1E882C186D316A63C0C7D7D7D7D397F016101B043
0176C37E767C7E0C7D010C8302C2D3E4F2ACE42F8D3F3F367A46F54285434ABB61BDB53CBF6C7CC0
F4C1C3F349B3F7BEB30E4A0CFE1C85180DC338C2C1C6E7A5CE3104303178724CCC5F451F573F3B24
7F24052000202003291F130F1B0E070C0E0D0F0E0F0B0B07070F1E1B330F27073F3F272E2F2F6F7B
2F2E1F2E4F7EFF7EDF3EBF253F3D2F39BF3D7F7FFED72FF39FE7773DBE9DBFBB3FE7A76E777DF55C
5F5F7ADF7FBD7F6AFE7B7D1FBE7F7F7DD7F63FBFBF2D3B7F7F5F2F7F3D7F7D3B3F3B7FFF4D676F7F
5D9FAD7DD17F7F6F6F0B6F7F3F767F1779364737370F7D3F5F377F2F3D3F7F1F2FE7709FB7BCB77B
0B77CF1DF5BF1F7F3D3E4E7F197F571F7D7E3F7F7F7D7F6F4F75FF6F7ECE2FFF793EFFEDB7BDDD1F
FF3BCE3F7F3FBF3D6C7FFF7F7F4FAF7F6FFFFF8D7777BF3AE30FAEEEEBCF5FEEFEE75FFEACFFDF0F
DFFFF77FFF677F4FFF7F7F1B5F1F5F146F1F1E1B3B1F3F273303170F370E250B
INTEGER VALUE: 15
There was a variance between the hex code and the integer value produced, so used the following python code to check the actual integer value.
print int("0F0008020003030D030C1D1C3C383C330A3311373724764C54496C0A6B029B84840547A341BBA83D
BB5FB9DE4CDE5EFE96E1FC6169438344D604681D409F9F9F3BC07EE0C4E0C033A23B37791F59F84F
F94E4F664E3072B0229DA09D9F0F1FC600C2E380D6988C198B39517D157E7D66FE675237673D3D28
3A016C01411003343C76740F710F0F4F8FE976E1E882C186D316A63C0C7D7D7D7D397F016101B043
0176C37E767C7E0C7D010C8302C2D3E4F2ACE42F8D3F3F367A46F54285434ABB61BDB53CBF6C7CC0
F4C1C3F349B3F7BEB30E4A0CFE1C85180DC338C2C1C6E7A5CE3104303178724CCC5F451F573F3B24
7F24052000202003291F130F1B0E070C0E0D0F0E0F0B0B07070F1E1B330F27073F3F272E2F2F6F7B
2F2E1F2E4F7EFF7EDF3EBF253F3D2F39BF3D7F7FFED72FF39FE7773DBE9DBFBB3FE7A76E777DF55C
5F5F7ADF7FBD7F6AFE7B7D1FBE7F7F7DD7F63FBFBF2D3B7F7F5F2F7F3D7F7D3B3F3B7FFF4D676F7F
5D9FAD7DD17F7F6F6F0B6F7F3F767F1779364737370F7D3F5F377F2F3D3F7F1F2FE7709FB7BCB77B
0B77CF1DF5BF1F7F3D3E4E7F197F571F7D7E3F7F7F7D7F6F4F75FF6F7ECE2FFF793EFFEDB7BDDD1F
FF3BCE3F7F3FBF3D6C7FFF7F7F4FAF7F6FFFFF8D7777BF3AE30FAEEEEBCF5FEEFEE75FFEACFFDF0F
DFFFF77FFF677F4FFF7F7F1B5F1F5F146F1F1E1B3B1F3F273303170F370E250B",16)
Answer:
611951595100708231079693644541095422704525056339295086455197024065285448917042457
942011979060274412229909425184116963447100932992139876977824261789243946528467423
887840013630358158845039770703659333212332565531927875442166643379024991542726916
563271158141698128396823655639931773363878078933197184072343959630467756337300811
165816534945075483141582643531294791665590339000206551162697220540050652439977992
246472159627917169957822698172925680112854091876671868161705785698942483896808137
210721991100755736178634253569843464062494863175653771387230991126430841565373390
924951878267929443498220727531299945275045612499928105876210478958806304156695438
684335624641395635997624911334453040399012259638042898470872203581555352191122920
004010193837249388365999010692555403377045768493630826307316376698443166439386014
145858084176544890282148970436631175577000673079418699845203671050174181808397880
048734270748095682582556024378558289251964544327507321930196203199459115159756564
507340111030285226951393012863778670390172056906403480159339130447254293412506482
027099835944315172972281427649277354815211185293109925602315480350955479477144523
387689192243720928249121486221114300503766209279369960344185651810101969585926336
07333771272398091
To get the set-bit count I have written the following code in C:
int bitsoncount(unsigned x)
{
unsigned int b=0;
if(x > 1)
b=1;
while(x &= (x - 1))
b++;
return b;
}
When I tried the same code in python it did not work. I am new to python through curiosity I'm experimenting, excuse me if am wrong.
def bitsoncount(x):
b=0;
if(x>1):
b=1;
while(x &= (x-1)):
I get an error at the last line, need some help in resolving this and implementing the logic in python :-)
I was interested in checking out the set bits version in python after what i have seen!
Related question: Best algorithm to count the number of set bits in a 32-bit integer?
In Python 3.10+, there is int.bit_count():
>>> 123 .bit_count()
6
Python 2.6 or 3.0:
def bitsoncount(x):
return bin(x).count('1')
Example:
>>> x = 123
>>> bin(x)
'0b1111011'
>>> bitsoncount(x)
6
Or
Matt Howells's answer in Python:
def bitsoncount(i):
assert 0 <= i < 0x100000000
i = i - ((i >> 1) & 0x55555555)
i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24
Starting with Python 3.10 you can use int.bit_count():
x = 826151739
print(x.bit_count()) # 16
what you're looking for is called the Hamming Weight.
in python 2.6/3.0 it can be found rather easily with:
bits = sum( b == '1' for b in bin(x)[2:] )
What version of Python are you using?
First off, Python uses white space not semicolon's, so to start it should look something like this...
def bitsoncount(x):
b=0
while(x > 0):
x &= x - 1
b+=1
return b
The direct translation of your C algorithm is as follows:
def bitsoncount(x):
b = 0
while x > 0:
x &= x - 1
b += 1
return b
Maybe this is what you mean?
def bits_on_count(x):
b = 0
while x != 0:
if x & 1: # Last bit is a 1
b += 1
x >>= 1 # Shift the bits of x right
return b
There's also a way to do it simply in Python 3.0:
def bits_on_count(x):
return sum(c=='1' for c in bin(x))
This uses the fact that bin(x) gives a binary representation of x.
Try this module:
import sys
if sys.maxint < 2**32:
msb2= 2**30
else:
msb2= 2**62
BITS=[-msb2*2] # not converted into long
while msb2:
BITS.append(msb2)
msb2 >>= 1
def bitcount(n):
return sum(1 for b in BITS if b&n)
This should work for machine integers (depending on your OS and the Python version). It won't work for any long.
How do you like this one:
def bitsoncount(x):
b = 0
bit = 1
while bit <= x:
b += int(x & bit > 0)
bit = bit << 1
return b
Basically, you use a test bit that starts right and gets shifted all the way through up to the bit length of your in parameter. For each position the bit & x yields a single bit which is on, or none. Check > 0 and turn the resulting True|False into 1|0 with int(), and add this to the accumulator. Seems to work nicely for longs :-) .
How to count the number of 1-bits starting with Python 3.10: https://docs.python.org/3/library/stdtypes.html#int.bit_count
# int.bit_count()
n = 19
bin(n)
# '0b10011'
n.bit_count() # <-- this is how
# 3
(-n).bit_count()
# 3
Is equivalent to (as per page linked above), but more efficient than:
def bit_count(self):
return bin(self).count("1")

Categories

Resources