Pack data, extreme bitpacking in Python

I need to pack information as closely as possible into a bitstream.
I have variables with a different number of distinct states:
Number_of_states = [3, 5, 129, 15, 6, 2]  # a bit longer in reality
The best option I have at the moment is to create a bit field, using
2 + 3 + 8 + 4 + 3 + 1 bits -> 21 bits
However, it should be possible to pack these states into np.log2(3*5*129*15*6*2) = 18.4 bits, saving two bits. (In reality I have 298 bits and need to save a few.)
In my case this would save more than 5% of the data stream, which would help a lot.
Is there a viable solution in Python to pack the data this way? I tried compression algorithms, but they create too much overhead with just a few bytes of data. The format string is no problem; it is constant and will be transmitted beforehand.
This is the code I am using at the moment:
from bitstring import pack
import numpy as np
DATA_TO_BE_PACKED = np.zeros(6)
Number_of_states = [3, 5, 129, 15, 6, 2]  # much longer in reality
DATA_TO_BE_PACKED = np.random.randint(Number_of_states)
string = ''
for item in Number_of_states:
    string += 'uint:{}, '.format(int(np.ceil(np.log2(item))))
PACKED_DATA = pack(string, *DATA_TO_BE_PACKED)
print(len(PACKED_DATA))
print(PACKED_DATA.unpack(string))
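For reference, a quick sketch of the size comparison above (a whole number of bits per variable vs. one combined index):
import numpy as np

Number_of_states = [3, 5, 129, 15, 6, 2]

# one whole number of bits per variable
per_field_bits = sum(int(np.ceil(np.log2(n))) for n in Number_of_states)   # 21

# one combined index for all variables
combined_bits = int(np.ceil(np.log2(float(np.prod(Number_of_states)))))    # 19

print(per_field_bits, combined_bits)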

You can interpret the state as an index into a multidimensional array with shape (3, 5, 129, 15, 6, 2). This index can be encoded as the integer index into the flattened 1-d array with length 3*5*129*15*6*2 = 348300. NumPy has the functions ravel_multi_index and unravel_index that can do this for you.
For example, let num_states be the number of states for each component of your state:
In [86]: num_states = [3, 5, 129, 15, 6, 2]
Suppose state holds an instance of the data; that is, it records the state of each component:
In [87]: state = [2, 3, 78, 9, 0, 1]
To encode this state, pass it through ravel_multi_index. idx is the encoded state:
In [88]: idx = np.ravel_multi_index(state, num_states)
In [89]: idx
Out[89]: 316009
By construction, 0 <= idx < 348300, so it requires only 19 bits.
To restore state from idx, use unravel_index:
In [90]: np.unravel_index(idx, num_states)
Out[90]: (2, 3, 78, 9, 0, 1)
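If you then want to put idx on the wire, here is a minimal sketch that packs it into the minimum whole number of bits, assuming the bitstring module from the question:
import numpy as np
from bitstring import pack

num_states = [3, 5, 129, 15, 6, 2]
state = [2, 3, 78, 9, 0, 1]

# One integer index for the whole state.
idx = int(np.ravel_multi_index(state, num_states))

# 348300 possible states -> 19 bits are enough for any index.
nbits = int(np.ceil(np.log2(float(np.prod(num_states)))))

packed = pack('uint:{}'.format(nbits), idx)   # a 19-bit bitstream
restored = np.unravel_index(packed.unpack('uint:{}'.format(nbits))[0], num_states)
assert list(restored) == state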

This looks like a use case for a mixed-radix numeral system.
A quick proof of concept:
num_states = [3, 5, 129, 15, 6, 2]
input_data = [2, 3, 78, 9, 0, 1]
print("Input data: %s" % input_data)
To encode, you start with 0, and for each component you first multiply by its number of states and then add the current state:
encoded = 0
for i in range(len(num_states)):
    encoded *= num_states[i]
    encoded += input_data[i]
print("Encoded: %d" % encoded)
To decode, you go in reverse: take the remainder of the division by the number of states, then divide by the number of states:
decoded_data = []
for n in reversed(num_states):
    v = encoded % n
    encoded = encoded // n
    decoded_data.insert(0, v)
print("Decoded data: %s" % decoded_data)
Example output:
Input data: [2, 3, 78, 9, 0, 1]
Encoded: 316009
Decoded data: [2, 3, 78, 9, 0, 1]
Another example with more values:
Input data: [2, 3, 78, 9, 0, 1, 84, 17, 4, 5, 30, 1]
Encoded: 14092575747751
Decoded data: [2L, 3L, 78L, 9L, 0L, 1L, 84L, 17L, 4L, 5L, 30L, 1L]
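A minimal sketch wrapping these loops into reusable functions and serializing the result; the fixed byte width and the big-endian byte order are assumptions on my part:
def mixed_radix_encode(values, radices):
    # Fold the digits into one integer, most significant digit first.
    encoded = 0
    for value, radix in zip(values, radices):
        encoded = encoded * radix + value
    return encoded

def mixed_radix_decode(encoded, radices):
    # Peel digits off the least significant end, then restore the original order.
    values = []
    for radix in reversed(radices):
        encoded, value = divmod(encoded, radix)
        values.insert(0, value)
    return values

num_states = [3, 5, 129, 15, 6, 2]
input_data = [2, 3, 78, 9, 0, 1]

code = mixed_radix_encode(input_data, num_states)   # 316009

# Fixed width derived from the total number of states: 348300 -> 19 bits -> 3 bytes.
max_code = 1
for n in num_states:
    max_code *= n
n_bytes = ((max_code - 1).bit_length() + 7) // 8

wire = code.to_bytes(n_bytes, 'big')
assert mixed_radix_decode(int.from_bytes(wire, 'big'), num_states) == input_data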

Related

Reorder numpy array to new bitlength elements without loop

If I have a numpy array whose elements each represent e.g. a 9-bit integer, is there an easy way (ideally without a loop) to reorder it so that the resulting array elements each represent an 8-bit integer, with the "lost" bits at the end of one element added at the beginning of the next element?
for example to get the following
np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100]) # initial array in binarys
# convert to
np.array([0b10011100, 0b01001011, 0b00110011, 0b10011001, 0b01000000]) # resulting array
I hope it is understandable what I want to achieve.
Additional info, I don't know if this makes any difference:
All of my 9-bit numbers have the MSB set to 1 (they are bigger than 255), and the last two bits are always 0, as in the example above.
The arrays I want to process are much bigger, with thousands of elements.
Thanks for your help in advance!
edit:
my current (complicated) solution is the following:
import numpy as np
def get_bits(data, offset, leng):
    data = (data % (1 << (offset + leng))) >> offset
    return data

data1 = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
i = 1
part1 = []
part2 = []
for el in data1:
    if i == 1:
        part2.append(0)
    part1.append(get_bits(el, i, 8))
    part2.append(get_bits(el, 0, i) << (8 - i))
    if i == 8:
        i = 1
        part1.append(0)
    else:
        i += 1
if i != 1:
    part1.append(0)
res = np.array(part1) + np.array(part2)
It's been bugging me that np.packbits and np.unpackbits are inefficient, so I came up with a bit twiddling answer.
The general idea is to work it like any resampler: you make an output array, and figure out where each piece of the output comes from in the input. You have N elements of 9 bits each, so the output is:
data = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
result = np.empty(np.ceil(data.size * 9 / 8).astype(int), dtype=np.uint8)
Every nine output bytes follow the pattern below relative to the corresponding eight input elements. I use {...} to indicate the (inclusive) bit range taken from each input integer:
result[0] = data[0]{8:1}
result[1] = data[0]{0:0} data[1]{8:2}
result[2] = data[1]{1:0} data[2]{8:3}
result[3] = data[2]{2:0} data[3]{8:4}
result[4] = data[3]{3:0} data[4]{8:5}
result[5] = data[4]{4:0} data[5]{8:6}
result[6] = data[5]{5:0} data[6]{8:7}
result[7] = data[6]{6:0} data[7]{8:8}
result[8] = data[7]{7:0}
The index of result (call it i) is really given modulo 9. The index into data is therefore offset by 8 * (i // 9). The lower portion of the byte is given by data[...] >> (i + 1). The upper portion is given by data[...] & ((1 << i) - 1), shifted left by 8 - i bits.
That makes it pretty easy to come up with a vectorized solution:
i = np.arange(result.size)
j = i[:-1]
result[i] = (data[8 * (i // 9) + (i % 9) - 1] & ((1 << i % 9) - 1)) << (8 - i % 9)
result[j] |= (data[8 * (j // 9) + (j % 9)] >> (j % 9 + 1)).astype(np.uint8)
You need to clip the index of the low portion because it may go out of bounds. You don't need to clip the high portion because -1 is a perfectly valid index, and you don't care which element it accesses. And of course numpy won't let you OR or add int elements to a uint8 array, so you have to cast.
>>> [bin(x) for x in result]
['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
This solution should be scalable to arrays of any size, and I wrote it so that you can work out different combinations of shifts, not just 9-to-8.
You can do it in two steps with np.unpackbits and np.packbits. First turn your array into a little-endian column vector:
>>> z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100], dtype='<u2').reshape(-1, 1)
>>> z.view(np.uint8)
array([[ 56,   1],
       [ 44,   1],
       [156,   1],
       [148,   1]], dtype=uint8)
You can convert this into an array of bits directly by unpacking. In fact, at some point (PR #10855) I added the count parameter to chop off the high zeros for you:
>>> np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)
array([[0, 0, 0, 1, 1, 1, 0, 0, 1],
       [0, 0, 1, 1, 0, 1, 0, 0, 1],
       [0, 0, 1, 1, 1, 0, 0, 1, 1],
       [0, 0, 1, 0, 1, 0, 0, 1, 1]], dtype=uint8)
Now you can just repack the reversed raveled array:
>>> u = np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)[:, ::-1].ravel()
>>> result = np.packbits(u)
>>> result.dtype
dtype('uint8')
>>> [bin(x) for x in result]
['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
If your machine is natively little-endian (e.g., most Intel architectures), you can do this in a one-liner:
z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100], dtype=np.uint16).reshape(-1, 1)
result = np.packbits(np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)[:, ::-1].ravel())
Otherwise, you can do z.byteswap().view(np.uint8) to get the right starting order (still a one-liner, I suppose).
I think I understood most of what you want. You can do bit operations with numpy arrays, and you get the desired bit operation element-wise when both operands are arrays (or applied to every element when it is an array versus a number), so you just need to construct the appropriate arrays, something like this:
>>> import numpy as np
>>> x = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
>>> goal=np.array([0b10011100, 0b01001011, 0b00110011, 0b10011001, 0b01000000])
>>> x
array([312, 300, 412, 404])
>>> goal
array([156, 75, 51, 153, 64])
>>> shift1 = np.array(range(1,1+len(x)))
>>> shift1
array([1, 2, 3, 4])
>>> mask1 = np.array([2**n -1 for n in range(1,1+len(x))])
>>> mask1
array([ 1, 3, 7, 15])
>>> res=((x>>shift1)|((x&mask1)<<(9-shift1)))&0b11111111
>>> res
array([156, 75, 51, 153], dtype=int32)
>>> goal
array([156, 75, 51, 153, 64])
>>>
I don't understand why your goal array has one extra element, but the above operation gives the other numbers, and adding one extra element is not complicated, so adjust as necessary.
Now for explaining the ((x>>shift1)|((x&mask1)<<(9-shift1)))&0b11111111
First, I notice you do a bigger shift for each element; that part is simple:
>>> x>>shift1
array([156, 75, 51, 25], dtype=int32)
>>>
>>> list(map(bin,x>>shift1))
['0b10011100', '0b1001011', '0b110011', '0b11001']
>>>
We also want to catch the bits that would be lost by the shift; by ANDing with an appropriate mask we get those:
>>> x&mask1
array([0, 0, 4, 4], dtype=int32)
>>> list(map(bin,mask1))
['0b1', '0b11', '0b111', '0b1111']
>>> list(map(bin,x&mask1))
['0b0', '0b0', '0b100', '0b100']
>>>
then we left shift that result by the complementary amount:
>>> 9-shift1
array([8, 7, 6, 5])
>>> ((x&mask1)<<(9-shift1))
array([ 0, 0, 256, 128], dtype=int32)
>>> list(map(bin,_))
['0b0', '0b0', '0b100000000', '0b10000000']
>>>
then we OR both together:
>>> (x>>shift1) | ((x&mask1)<<(9-shift1))
array([156, 75, 307, 153], dtype=int32)
>>> list(map(bin,_))
['0b10011100', '0b1001011', '0b100110011', '0b10011001']
>>>
and finally we AND that with 0b11111111 to keep only the 8 bits we want.
Additionally, you mention that the last 2 bits are always zero, so an even simpler solution is to just shift right by 2, and to recover the original just shift back in the other direction:
>>> x
array([312, 300, 412, 404])
>>> y = x>>2
>>> y
array([ 78, 75, 103, 101], dtype=int32)
>>> y<<2
array([312, 300, 412, 404], dtype=int32)
>>>

How can I delete an element in a numpy array and use its index in a for loop in Python

This is what I made and it doesn't work. I made a for loop and use it to get the index so I can use it for something else. Why doesn't it work, or is there another method to delete the element and use its index?
Here is some of my code
X1_train, X1_test, y1_train, y1_test = train_test_split(EclipseFeautres, EclipseClass, test_size=0.3, random_state=0)
E_critical_class=y1_train.copy()
E_critical_class = E_critical_class[E_critical_class != 1]
for x in range(len(E_critical_class)):
    if E_critical_class[x] == 1:
        E = np.delete(E_critical_class, x)
Your task is something like filtering of an array.
You want to drop all elements == 1.
Assume that the source array (arr) contains:
array([0, 1, 2, 3, 4, 1, 0, 3, 7, 1])
so it contains 3 elements == 1 (to be dropped).
A much simpler way to do it is to use boolean indexing and save the
result back to the original variable:
arr = arr[arr != 1]
The result is:
array([0, 2, 3, 4, 0, 3, 7])
as you wish - with all ones dropped.
# delete the element that repeats itself (here, the zeros) in the numpy array
import numpy as np
a = np.array([10, 0, 0, 20, 0, 30, 40, 50, 0, 60, 70, 80, 90, 100,0])
print("Original array:")
print(a)
index = np.zeros(0, dtype=int)
print("index=", index)
for i in range(len(a)):
    if a[i] == 0:
        index = np.append(index, i)
print("index=", index)
new_a = np.delete(a, index)
print("new_a=", new_a)

Numpy: 5 bits to integer (Python)

I have an array of bits.
Input: array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0])
And I need to transform it into an array of integers, reading one unsigned integer from every 5 bits.
Output: array([1, 19, 14])
Because: (00001 -> 1, 10011 -> 19, 01110 -> 14)
Can I do it with numpy (or plain Python) ?
What if I need 6 bits to unsigned integer?
Reshape into an Nx5 array, and use a broadcasted multiplication and a sum to reduce along the length-5 axis:
temp = arr.reshape((-1, 5))
temp = temp * [16, 8, 4, 2, 1] # you can use *= here if you don't need to preserve the input
result = temp.sum(axis=1)
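To cover the 6-bit follow-up as well, here is a short sketch generalizing the same idea to any group width (the helper name is mine):
import numpy as np

def bits_to_uint(bits, width):
    # One row per value; weights are the powers of two, most significant bit first.
    weights = 1 << np.arange(width - 1, -1, -1)
    return bits.reshape(-1, width) @ weights

arr = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0])
print(bits_to_uint(arr, 5))   # [ 1 19 14]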
This is a bit complicated; maybe there is a better way to do it, but it works. This solution is without numpy.
s = ""
arr = []
for n, i in enumerate(lst):
mod = n % 5
s += str(i)
if mod == 4:
s += " "
for item in s.split():
arr.append(int(str(item), 2))
print(arr)
Output:
[1, 14, 19]
I would suggest using a factor array. With this factor array you go over the data, multiply each chunk with the factor array and calculate its sum (which is the integer representation of the bit pattern).
import numpy as np

def bitToInt(data, bits):
    factor_arr = np.ones(bits)
    for i in range(bits):
        factor_arr[0:i] = factor_arr[0:i] * 2
    res = []
    for i in range(int(len(data) / bits)):
        chunk = data[i*bits:(i+1)*bits]
        res.append(int((chunk * factor_arr).sum()))
    return np.array(res)
This gives numpy the possibility to use vector instructions for the array multiplication and the horizontal sum over the chunks.
PS: There might be a better way for chunk = data[i*bits:(i+1)*bits]

Check if an element in a pandas series is increasing with respect to the previous values, fast solution

I want to check if elements of my series are continuously increasing.
For example if I have the following numbers:
[7, 15, 23, 0, 32, 18]
my output should be
[0, 1, 2, 0, 1, 0]
If any value is greater than the previous value, then the output value will be output of previous value + 1, otherwise it resets to zero.
I have implemented a naive for loop solution in python, which is as follows:
def const_increasing(tmp):
    inc_ser = np.zeros(len(tmp))
    for i in range(1, len(tmp)):
        if tmp[i] > tmp[i-1]:
            inc_ser[i] = 1 + inc_ser[i-1]
    return inc_ser
But this solution is quite slow, as I am working with pandas series of large sizes. Is there any efficient way of implementing it ? Maybe using expanding() function or any better way in pandas or numpy.
Any help in this regard would be really appreciated.
Since you tagged pandas:
s = pd.Series([7, 15, 23, 0, 32, 18]).diff().gt(0)
s.groupby((~s).cumsum()).cumcount().to_list()
Output:
[0, 1, 2, 0, 1, 0]
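For anyone wondering why the grouping works, a quick breakdown with the intermediate values shown as comments:
import pandas as pd

s = pd.Series([7, 15, 23, 0, 32, 18]).diff().gt(0)
# s:             [False, True, True, False, True, False]   True where the value increased
# (~s).cumsum(): [1, 1, 1, 2, 2, 3]                         a new group starts at every non-increase
# cumcount():    position within each group -> the streak length so far
print(s.groupby((~s).cumsum()).cumcount().to_list())        # [0, 1, 2, 0, 1, 0]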
Here is an answer that does not use a cumulative sum:
import numpy as np
a = np.array([7, 15, 23, 0, 32, 18])
c = np.append(np.array([False]), (a[1:] > a[:-1]))
result = np.concatenate([np.arange(x.size)
                         for x in np.split(c, np.where(c == False)[0][1:])])

Extracting 2 bit integers from a string using Python

I am using python to receive a string via UDP. From each character in the string I need to extract the 4 pairs of bits and convert these to integers.
For example, if the first character in the string was "J", this is ASCII 0x4a or 0b01001010. So I would extract the pairs of bits [01, 00, 10, 10], which would be converted to [1, 0, 2, 2].
Speed is my number one priority here, so I am looking for a fast way to accomplish this.
Any help is much appreciated, thank you.
You can use np.unpackbits
import numpy as np
from timeit import timeit

def bitpairs(a):
    bf = np.unpackbits(a)
    return bf[1::2] + (bf[::2] << 1)
    ### or: return bf[1::2] | (bf[::2] << 1), but that doesn't seem faster

### small example
bitpairs(np.frombuffer(b'J', 'u1'))
# array([1, 0, 2, 2], dtype=uint8)

### large example
from string import ascii_letters as L
S = np.random.choice(np.array(list(L), 'S1'), 1000000).view('S1000000').item(0)
### one very long byte string
S[:10], S[999990:]
# (b'fhhgXJltDu', b'AQGTlpytHo')
timeit(lambda: bitpairs(np.frombuffer(S, 'u1')), number=1000)
# 8.226706639004988
You can slice the string and convert to int assuming base 2:
>>> byt = '11100100'
>>> [int(b, 2) for b in (byt[0:2], byt[2:4], byt[4:6], byt[6:8])]
[3, 2, 1, 0]
This assumes that byt is always an 8-character str, rather than the int formed through the binary literal 0b11100100.
A more generalized solution might look something like:
>>> def get_int_slices(b: str) -> list:
... return [int(b[i:i+2], 2) for i in range(0, len(b), 2)]
...
>>> get_int_slices('1110010011100100111001001110010011100100')
[3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0]
The int(x, 2) call says, "interpret the input as being in base 2."
*To my knowledge, none of my answers have ever won a speed race against Paul Panzer's, and this one is probably no exception.
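If the UDP payload arrives as raw characters rather than a ready-made bit string, a small sketch bridging the two (the helper and the use of format(..., '08b') are my additions):
def char_to_bitpairs(ch: str) -> list:
    # Render the character as its 8-bit binary string, then slice it into 2-bit integers.
    b = format(ord(ch), '08b')
    return [int(b[i:i+2], 2) for i in range(0, len(b), 2)]

print(char_to_bitpairs('J'))   # [1, 0, 2, 2]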
