I'm developing a real-time application where I have to process a line of data as fast as possible to send it over to an app. These lines arrive at a very fast rate, around 40k per minute. The task is to extract the value of certain individual bits from the hexa data in the line. I have a solution already but I doubt it's the most efficient one, so I'm asking if you can improve it.
A sample line of data:
p1 p2 p3 len data
1497383697 0120 000 5 00 30 63 4f 15
len is how many bytes are in the data, data is what we're working with. Let's say I want to extract 3 bits starting from the 11th from the left. Converting the hexa to binary with padding:
0x0030634f15 = 0000 0000 0011 0000 0110 0011 0100 1111 0001 0101
The wanted value is 0b110 which is 6 in decimal.
My working solution for the problem is this:
# 10 and 3 in the example (start is a zero-based index, so the 11th bit is index 10)
start = config.getint(p, 'start')
length = config.getint(p, 'length')
parts = line.split()
hexadata = ''.join(parts[4:])
bindata = bin(int(hexadata, 16))[2:].zfill(len(hexadata) * 4)
val = int(bindata[start:start + length], 2)
val will hold the value 6 in the end. Is there any other, more efficient way to do this? Thank you.
Instead of using string operations, it's faster to convert the input to a number and use bit operations:
parts = line.split(maxsplit=4)
# remove spaces in the number and convert it to int from base 16
num = int(parts[4].replace(' ', ''), 16)
# create a bit mask with exactly `length` 1s
mask = (1 << length) - 1
# calculate the offset from the right (40 = 5 data bytes * 8 bits)
shift = 40 - start - length
# shift the value to the right and apply the binary mask to get our value
val = (num >> shift) & mask
According to my timings, the bit operations are faster by about 20%. Timing results with 1 million iterations:
string_ops 2.735653492003621 seconds
bit_ops 2.190693126998667 seconds
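The hard-coded 40 in the shift only fits 5-byte payloads. Below is a minimal sketch that derives the width from the len column instead. The function name extract_bits and its convention (a zero-based start counted from the left, so the sample's 11th bit is index 10) are assumptions made for illustration, not part of the answer above:

```python
def extract_bits(line, start, length):
    """Return `length` bits of the data field, beginning at zero-based
    bit index `start` counted from the left (most significant) end."""
    parts = line.split(maxsplit=4)
    nbytes = int(parts[3])                       # the `len` column
    num = int(parts[4].replace(' ', ''), 16)     # data bytes as one integer
    total_bits = nbytes * 8
    if start < 0 or length <= 0 or start + length > total_bits:
        raise ValueError("bit range lies outside the data field")
    shift = total_bits - start - length          # distance from the right edge
    return (num >> shift) & ((1 << length) - 1)


sample = "1497383697 0120 000 5 00 30 63 4f 15"
print(extract_bits(sample, 10, 3))  # 6 (0b110)
```

Whether the extra int() call and range check cost a measurable amount at 40k lines per minute would need to be timed on the real input.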
Related
I have a byte array that is 6 bytes in length (48 bits). Only the first six bits of each byte are relevant. The high two bits do not contain data, so they should be ignored. They should not be included when converting to a number.
I want to extract a specific range of bits from the byte array and convert it to a number, while ignoring the high two bits of each byte.
e.g.
Take the following byte array as an example: b'\x12\x08\x1c\x30\x32\x21'
Bit 47 -> 00010010 00001000 00011100 00110000 00110010 00100001 <- Bit 0
If I want the value for bits 0 through 15, the answer should be 3233 (1+32+128+1024+2048).
00010010 00001000 00011100 00110000 00110010 00100001
                               ^^^^ XX^^^^^^ XX^^^^^^
If I want the value for bits 6 through 12, it should be 50 (2+16+32).
00010010 00001000 00011100 00110000 00110010 00100001
                                  ^ XX^^^^^^ XX
I can do this awkwardly in my head, but I'm having issues getting it down in Python. These are the steps that I think I should be doing, but I'm not sure if it's the best/easiest way nor how I should be doing it...
Convert my byte array into a single string containing its binary value
Change every seventh and eighth character of the binary string (counting from the right side) to another character ("-" for example).
Remove any of the "-" characters from the new string. [edited]
Extract the bits that I want from the binary string. [edited]
Convert that string from a binary to a value.
...so...
1 . How do I take my byte array and convert to a 48 bit binary string?
2 . Is there an easy way to change every seventh and eighth bit to "-" in my binary string?
5 . Convert a string containing a binary value to a number?
...and is my thought process any good on this, or is there an easier way to accomplish this?
I really appreciate any help with this.
[edit] I think I had STEP 3 and 4 in my question in the wrong order... I want to remove the unwanted bits BEFORE I extract my binary digits. Question edited accordingly.[/edit]
Is this what you are looking for?
bytes6 = bytearray([0b00010010, 0b00001000, 0b00011100, 0b00110000, 0b00110010, 0b00100001])
shift = 30 # Shift first byte this much (6 bits * 5)
result = 0
for b in bytes6:
    result |= (b & 0x3F) << shift
    shift -= 6
print(bin(result))
Output: '0b10010001000011100110000110010100001'
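Once the six-bit groups are packed into one integer, a bit range can be cut out with a shift and a mask. The sketch below assumes the numbering used in the question (bit 0 is the least-significant bit of the packed value, and the range is inclusive); the names pack_six_bit and bit_range are made up for this example:

```python
def pack_six_bit(data):
    """Concatenate the low six bits of every byte, first byte most significant."""
    result = 0
    for b in data:
        result = (result << 6) | (b & 0x3F)
    return result


def bit_range(data, first_bit, last_bit):
    """Value of bits first_bit..last_bit (inclusive, bit 0 = least significant)
    of the packed value, i.e. after the two high bits of each byte are dropped."""
    width = last_bit - first_bit + 1
    return (pack_six_bit(data) >> first_bit) & ((1 << width) - 1)


data = b"\x12\x08\x1c\x30\x32\x21"
print(bit_range(data, 0, 15))  # 3233
print(bit_range(data, 6, 12))  # 50
```

Shifting the accumulator left by 6 on every step avoids tracking a separate shift counter and works for any number of input bytes.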
This is what I've come up with...
def bit_value(data, first_bit, last_bit):
    """Return the value of bits first_bit..last_bit of a 6-byte array, ignoring the two high bits of each byte."""
    # Convert bytes to a binary string
    number = int.from_bytes(data, "big")
    bits = f"{number:048b}"
    # Change the two high bits of each byte to "-"
    clean_bits = ""
    for i in range(0, 48):
        if i % 8 == 0 or i % 8 == 1:
            clean_bits += "-"
        else:
            clean_bits += bits[i]
    # Strip out the unwanted "-"
    clean_bits = clean_bits.replace("-", "")
    # Get the bits we want (index 35 of the 36-character string is bit 0)
    bits_i_want = clean_bits[35 - last_bit:36 - first_bit]
    # Get the value of the resulting binary string
    value = int(bits_i_want, 2)
    return value
If I understand your requirements right, you can try something like this:
test.py:
def bit_value(data, first_bit, last_bit):
    result = ""
    for i in range(47 - last_bit, 48 - first_bit):
        byte = i // 8
        bit = 7 - (i % 8)
        if bit < 6:
            result += "01"[data[byte] >> bit & 1]
    return int(result, 2)


def main():
    data = b"\x12\x08\x1c\x30\x32\x21"
    test1 = bit_value(data, 0, 15)
    print(f"{test1:#b}, {test1}")
    test2 = bit_value(data, 6, 12)
    print(f"{test2:#b}, {test2}")


if __name__ == "__main__":
    main()
Test:
$ python test.py
0b110010100001, 3233
0b10010, 18
How can I find out the number of Bytes a certain integer number takes up to store?
E.g. for
hexadecimal \x00 - \xff (or decimal 0 - 255 = binary 0000 0000 - 1111 1111) I'm looking to get 1 (Byte),
hexadecimal \x100 - \xffff (or decimal 256 - 65535 = binary 0000 0001 0000 0000 - 1111 1111 1111 1111) would give me 2 (Bytes)
and so on.
Any clue for hexadecimal or decimal format as the input?
def byte_length(i):
    return (i.bit_length() + 7) // 8
Of course, as Jon Clements points out, this isn't the size of the actual PyIntObject, which has a PyObject header, and stores the value as a bignum in whatever way is easiest to deal with rather than most compact, and which you have to have at least one pointer (4 or 8 bytes) to on top of the actual object, and so on.
But this is the byte length of the number itself. It's almost certainly the most efficient answer, and probably also the easiest to read.
Or is ceil(i.bit_length() / 8.0) more readable?
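One detail worth checking against the question: (0).bit_length() is 0, so the formula yields 0 bytes for the value 0, while the question expects 1 for \x00. A small sketch with that case guarded; the name bytes_for and the max(1, ...) guard are assumptions about the wanted behaviour, not part of the answer above:

```python
def bytes_for(n):
    """Minimum number of bytes needed for a non-negative integer (at least 1)."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    return max(1, (n.bit_length() + 7) // 8)


for n in (0x00, 0xff, 0x100, 0xffff, 0x10000):
    size = bytes_for(n)
    # int.to_bytes raises OverflowError if `size` were too small
    print(hex(n), size, n.to_bytes(size, "big"))
```

The to_bytes call doubles as a sanity check, since it fails when the computed size cannot hold the value.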
Unless you're dealing with an array.array or a numpy.array - the size always has object overhead. And since Python deals with BigInts naturally, it's really, really hard to tell...
>>> i = 5
>>> import sys
>>> sys.getsizeof(i)
24
So on a 64bit platform it requires 24 bytes to store what could be stored in 3 bits.
However, if you did,
>>> s = '\x05'
>>> sys.getsizeof(s)
38
So no, not really - you've got the memory-overhead of the definition of the object rather than raw storage...
If you then take:
>>> a = array.array('i', [3])
>>> a
array('i', [3])
>>> sys.getsizeof(a)
60L
>>> a = array.array('i', [3, 4, 5])
>>> sys.getsizeof(a)
68L
Then you get what would be called normal byte boundaries, etc.. etc... etc...
If you just want what "purely" should be stored - minus object overhead, then from 2.(6|7) you can use some_int.bit_length() (otherwise just bitshift it as other answers have shown) and then work from there
You can use simple math:
>>> from math import log
>>> def bytes_needed(n):
...     if n == 0:
...         return 1
...     return int(log(n, 256)) + 1
...
>>> bytes_needed(0x01)
1
>>> bytes_needed(0x100)
2
>>> bytes_needed(0x10000)
3
By using a simple bitwise operation to move all the used bits over 1 byte each time you can see how many bytes are needed to store a number.
It's probably worth noting that while this method is very generic, it will not work on negative numbers and only looks at the binary of the variable without taking into account what it is stored in.
a = 256
i = 0
while a > 0:
    a = a >> 8
    i += 1
print(i)
The program behaves as follows:
a is 0000 0001 0000 0000 in binary
each run of the loop shifts this to the right by 8 bits:
loop 1:
0000 0001 0000 0000 >> 8 = 0000 0001
0000 0001 > 0 (1 > 0), so the loop runs again
loop 2:
0000 0001 >> 8 = 0000 0000
0000 0000 > 0 (0 > 0) is false
END 0 is not > 0
so there are 2 bytes needed to store the number.
At the Python prompt, ctypes.sizeof reports the size of a fixed-width C type (this is the size of the type, not of a particular value):
>>> import ctypes
>>> ctypes.sizeof(ctypes.c_int)
# Python 3
import math
nbr = 0xff # 255 defined in hexadecimal
nbr = "{0:b}".format(nbr) # Transform the number into a string of binary digits.
bit_length = len(nbr) # Number of characters
byte_length = math.ceil( bit_length/8 ) # Get minimum number of bytes
I am trying to implement my own WebSocket server in Python. I'm following the RFC 6455 spec and I'm running into problems extracting the bits from the base frame header.
I'm not having problems with the protocol; I'm having problems with the basic binary/hex math.
According to the spec, the first 4 bits are single-bit values.
So to get the first bit I do something like this (d being my data from the WebSocket):
first_byte = ord(d[0])
print "finished bit",(first_byte >> 7) & 1
And later on, if I want to get the payload size, I do:
sec_byte = ord(d[1])
print "payload size",sec_byte & 0x7f
However, later in the spec I need to grab a 4-bit value for the opcode.
This is what I need help with, maybe even a link to how this math works. I've googled/duckduckgoed my brains out, with most results being from Stack Overflow.
After even more tinkering it's starting to fall into place, but I had been stuck on this for about 4 days and it's still unsolved, so I'd appreciate any more info anyone can give.
If you need to consider only the first (most significant) 4 bits, you need to right shift by 4. The extra masking with AND is redundant when your value is already in the range 0-255, but it makes explicit which bits you are interested in. E.g.
>>> d = [128, 80, 40]
>>> print((d[0] >> 4) & 15)
8
>>> print((d[1] >> 4) & 15)
5
>>> print((d[2] >> 4) & 15)
2
128 is in binary 1000 0000; right shifting by 4 gives 0000 1000 ("new" 0 bits enter from left), i.e. 8; 80 is 0101 0000, so you obtain 0000 0101; and finally 40 is 0010 1000 and we obtain 0000 0010.
In general, consider an octet like abcd efgh where each letter is a bit. You have to shift and AND in order to isolate the bits you are interested in. E.g. suppose your spec says the cd bits define four different kinds of something. To obtain that 0-3 number, you again right shift by 4 and then AND with 3, that is 0000 0011, i.e. you "isolate" the bits you want.
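Applied to the frame header from the question: in RFC 6455 (section 5.2) the first byte carries FIN, RSV1, RSV2 and RSV3 in its four high bits and the opcode in its four low bits, so the opcode needs only a mask and no shift. A minimal sketch, written for Python 3 (where indexing a bytes object already yields an integer, so ord() is not needed); the function name and the dictionary keys are made up for this example:

```python
def parse_frame_start(frame):
    """Decode the first two bytes of a WebSocket frame (RFC 6455, section 5.2)."""
    b0, b1 = frame[0], frame[1]      # indexing bytes yields ints in Python 3
    return {
        "fin":         (b0 >> 7) & 1,
        "rsv1":        (b0 >> 6) & 1,
        "rsv2":        (b0 >> 5) & 1,
        "rsv3":        (b0 >> 4) & 1,
        "opcode":      b0 & 0x0F,    # low four bits, no shift needed
        "masked":      (b1 >> 7) & 1,
        "payload_len": b1 & 0x7F,    # 126 and 127 signal an extended length field
    }


# 0x81 = 1000 0001: FIN set, opcode 1 (text frame); 0x85 = masked, length 5
print(parse_frame_start(bytes([0x81, 0x85])))
```

The sketch stops at the first two bytes; extended payload lengths and the masking key follow in later bytes and are not handled here.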
There is a byte at a specific index in a byte string which represents eight flags; one flag per bit in the byte. If a flag is set, its corresponding bit is 1, otherwise it's 0. For example, if I've got
b'\x15'
the flags would be
0001 0101 # Three flags are set at indexes 0, 2 and 4
# and the others are not set
What would be the best way to get each bit value in that byte, so I know whether a particular flag is set or not? (Preferably using bitwise operations)
Typically, the least-significant bit is bit index 0 and the most-significant bit is bit index 7. Using this terminology, we can determine if bit index k is set by taking the bitwise-and with 1 shifted to the left by k. If the bitwise and is non-zero, then that means that index k has a 1; otherwise, index k has a 0. So:
def get_bit(byteval, idx):
    return (byteval & (1 << idx)) != 0
This will correctly determine the value of bits at indices 0...7 of the byte, going from right-to-left (i.e. the least significant bit to the most significant bit, or equivalently from the 1s place to the 2**7 = 128 place).
Why it works
I figured I should add an explanation of why it works...
1<<0 is 1 = 0000 0001
1<<1 is 2 = 0000 0010
1<<2 is 4 = 0000 0100
As you can see, 1<<k is equivalent to 2**k and contains a 1 at exactly the index we are interested in and at no other location. Consequently, the bitwise and with 1<<k will either return 0 or 1<<k; it will be 0 if the bit at the index we are interested in is 0 (because 1 and 0 is 0, and all other bits in 1<<k are zero). If the bit we are interested in is 1, then we get a 1 and a 1 in that position, and a 0 and something else, everywhere else.
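As a quick check, the same test can be run over all eight indices of the byte 0001 0101 (0x15) from the question. This is only a usage sketch; the variable names are illustrative:

```python
def get_bit(byteval, idx):
    """True if bit `idx` (0 = least significant) of byteval is set."""
    return (byteval & (1 << idx)) != 0


data = b"\x15"            # 0001 0101
flags = data[0]           # indexing bytes gives an int in Python 3
print([idx for idx in range(8) if get_bit(flags, idx)])  # [0, 2, 4]
```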
x & 1, x & 2, x & 4, x & 8, etc.
If the result is nonzero, then bit 0, 1, 2, 3, etc. (counting from the least-significant bit) is set.
Specify bit masks (read about bit masks on Wikipedia):
FLAG_1 = 1 # 0000 0001
FLAG_2 = 2 # 0000 0010
FLAG_3 = 4 # 0000 0100
...
And then use AND to check whether a bit is set (flags contains your byte):
if flags & FLAG_1:
    pass  # runs when bit 0 is set, example: 0001 0101 & 0000 0001 = 0000 0001
if flags & FLAG_2:
    pass  # runs when bit 1 is set; skipped for 0001 0101 & 0000 0010 = 0000 0000
...
Of course you should name FLAG_1, etc to something meaningful, depending on the context. E.g. ENABLE_BORDER.
Update:
I was confused by your comment about which bits are set, but after reading another answer I realized you are counting the bits from the wrong end. Bits are numbered zero-based from the right.
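Named masks like these can also be written with the standard library's enum.IntFlag (Python 3.6+), which keeps the names attached to the values. A minimal sketch; apart from ENABLE_BORDER, the flag names are placeholders invented for the example:

```python
from enum import IntFlag


class Flags(IntFlag):
    # Names are placeholders; only ENABLE_BORDER appears in the text above.
    ENABLE_BORDER = 1 << 0   # 0000 0001
    VISIBLE = 1 << 1         # 0000 0010
    RESIZABLE = 1 << 2       # 0000 0100


flags = Flags(0b0000_0101)
print(Flags.ENABLE_BORDER in flags)   # True
print(Flags.VISIBLE in flags)         # False
print(bool(flags & Flags.RESIZABLE))  # True
```

Plain integer constants work just as well; the enum mainly helps when the flags are printed or passed around.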
The function would be:
def get_site_value(byteval, index):
    '''Function to get local value at a given index'''
    return (byteval >> index) & 1