Extracting 2-bit integers from a string using Python - python

I am using Python to receive a string via UDP. From each character in the string I need to extract four pairs of bits and convert these to integers.
For example, if the first character in the string was "J", this is ASCII 0x4a or 0b01001010. So I would extract the pairs of bits [01, 00, 10, 10], which would be converted to [1, 0, 2, 2].
Speed is my number one priority here, so I am looking for a fast way to accomplish this.
Any help is much appreciated, thank you.
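For reference, the extraction described above can be done in pure Python with shifts and masks; this is a minimal, unoptimized sketch (the function name is mine):

```python
def bit_pairs(data: bytes) -> list:
    """Split each byte into four 2-bit integers, most significant pair first."""
    out = []
    for byte in data:
        # Shift the pair of interest down to the low bits, then mask it off.
        out.extend((byte >> shift) & 0b11 for shift in (6, 4, 2, 0))
    return out

print(bit_pairs(b'J'))  # [1, 0, 2, 2]
```

The NumPy answers below will be much faster on long strings; this loop is only a readable baseline.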

You can use np.unpackbits:
import numpy as np

def bitpairs(a):
    bf = np.unpackbits(a)
    return bf[1::2] + (bf[::2] << 1)
    # or: return bf[1::2] | (bf[::2] << 1), but it doesn't seem faster

# small example
bitpairs(np.frombuffer(b'J', 'u1'))
# array([1, 0, 2, 2], dtype=uint8)

# large example
from string import ascii_letters as L
from timeit import timeit

S = np.random.choice(np.array(list(L), 'S1'), 1000000).view('S1000000').item(0)
# one very long byte string
S[:10], S[999990:]
# (b'fhhgXJltDu', b'AQGTlpytHo')
timeit(lambda: bitpairs(np.frombuffer(S, 'u1')), number=1000)
# 8.226706639004988

You can slice the string and convert to int assuming base 2:
>>> byt = '11100100'
>>> [int(b, 2) for b in (byt[0:2], byt[2:4], byt[4:6], byt[6:8])]
[3, 2, 1, 0]
This assumes that byt is always an 8-character str, rather than an int formed through the binary literal 0b11100100.
A more generalized solution might look something like:
>>> def get_int_slices(b: str) -> list:
...     return [int(b[i:i+2], 2) for i in range(0, len(b), 2)]
...
>>> get_int_slices('1110010011100100111001001110010011100100')
[3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0, 3, 2, 1, 0]
The int(x, 2) call says, "interpret the input as being in base 2."
*To my knowledge, none of my answers have ever won a speed race against Paul Panzer's, and this one is probably no exception.

Related

Reorder numpy array to new bitlength elements without loop

If I have a numpy array whose elements each represent e.g. a 9-bit integer, is there an easy way (ideally without a loop) to repack it so that each element of the resulting array represents an 8-bit integer, with the "lost bits" at the end of one element carried over to the beginning of the next?
For example:
np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100]) # initial array in binary
# convert to
np.array([0b10011100, 0b01001011, 0b00110011, 0b10011001, 0b01000000]) # resulting array
I hope it is understandable what I want to achieve.
Additional info, I don't know if this makes any difference:
All of my 9-bit numbers have their MSB set to 1 (they are bigger than 255), and the last two bits are always 0, like in the example above.
The arrays I want to process are much bigger with thousands of elements.
Thanks for your help in advance!
edit:
My current (complicated) solution is the following:
import numpy as np

def get_bits(data, offset, leng):
    data = (data % (1 << (offset + leng))) >> offset
    return data

data1 = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
i = 1
part1 = []
part2 = []
for el in data1:
    if i == 1:
        part2.append(0)
    part1.append(get_bits(el, i, 8))
    part2.append(get_bits(el, 0, i) << (8 - i))
    if i == 8:
        i = 1
        part1.append(0)
    else:
        i += 1
if i != 1:
    part1.append(0)
res = np.array(part1) + np.array(part2)
It's been bugging me that np.packbits and np.unpackbits are inefficient, so I came up with a bit twiddling answer.
The general idea is to work it like any resampler: you make an output array, and figure out where each piece of the output comes from in the input. You have N elements of 9 bits each, so the output is:
data = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
result = np.empty(np.ceil(data.size * 9 / 8).astype(int), dtype=np.uint8)
Every nine output bytes follow the same pattern relative to the corresponding eight input elements. I use {...} to indicate the (inclusive) bit range taken from each input integer:
result[0] = data[0]{8:1}
result[1] = data[0]{0:0} data[1]{8:2}
result[2] = data[1]{1:0} data[2]{8:3}
result[3] = data[2]{2:0} data[3]{8:4}
result[4] = data[3]{3:0} data[4]{8:5}
result[5] = data[4]{4:0} data[5]{8:6}
result[6] = data[5]{5:0} data[6]{8:7}
result[7] = data[6]{6:0} data[7]{8:8}
result[8] = data[7]{7:0}
The index of result (call it i) is really given modulo 9. The index into data is therefore offset by 8 * (i // 9). The lower portion of the byte is given by data[...] >> (i + 1). The upper portion is given by data[...] & ((1 << i) - 1), shifted left by 8 - i bits.
That makes it pretty easy to come up with a vectorized solution:
i = np.arange(result.size)
j = i[:-1]
result[i] = (data[8 * (i // 9) + (i % 9) - 1] & ((1 << i % 9) - 1)) << (8 - i % 9)
result[j] |= (data[8 * (j // 9) + (j % 9)] >> (j % 9 + 1)).astype(np.uint8)
You need to clip the index of the low portion because it may go out of bounds. You don't need to clip the high portion because -1 is a perfectly valid index, and you don't care which element it accesses. And of course numpy won't let you OR or add int elements to a uint8 array, so you have to cast.
>>> [bin(x) for x in result]
['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
This solution should be scalable to arrays of any size, and I wrote it so that you can work out different combinations of shifts, not just 9-to-8.
You can do it in two steps with np.unpackbits and np.packbits. First turn your array into a little-endian column vector:
>>> z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100], dtype='<u2').reshape(-1, 1)
>>> z.view(np.uint8)
array([[ 56,   1],
       [ 44,   1],
       [156,   1],
       [148,   1]], dtype=uint8)
You can convert this into an array of bits directly by unpacking. In fact, at some point (PR #10855) I added the count parameter to chop off the high zeros for you:
>>> np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)
array([[0, 0, 0, 1, 1, 1, 0, 0, 1],
       [0, 0, 1, 1, 0, 1, 0, 0, 1],
       [0, 0, 1, 1, 1, 0, 0, 1, 1],
       [0, 0, 1, 0, 1, 0, 0, 1, 1]], dtype=uint8)
Now you can just repack the reversed raveled array:
>>> u = np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)[:, ::-1].ravel()
>>> result = np.packbits(u)
>>> result.dtype
dtype('uint8')
>>> [bin(x) for x in result]
['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
If your machine is native little endian (e.g., most intel architectures), you can do this in a one-liner:
z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100], dtype=np.uint16).reshape(-1, 1)
result = np.packbits(np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)[:, ::-1].ravel())
Otherwise, you can do z.byteswap().view(np.uint8) to get the right starting order (still one liner though, I suppose).
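Putting that together, an endianness-safe variant of the one-liner might look like this sketch, which checks sys.byteorder at runtime (the runtime check is my addition, not part of the answer above):

```python
import sys
import numpy as np

z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100],
             dtype=np.uint16).reshape(-1, 1)
# On a big-endian host, swap so the low byte of each element comes first in memory.
zb = z if sys.byteorder == 'little' else z.byteswap()
result = np.packbits(
    np.unpackbits(zb.view(np.uint8), axis=1, bitorder='little', count=9)[:, ::-1].ravel()
)
print([bin(x) for x in result])
# ['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
```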
I think I understood most of what you want. NumPy arrays support bit operations element-wise (between two arrays, or between an array and a single number), so you just need to construct the appropriate shift and mask arrays. Something like this:
>>> import numpy as np
>>> x = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
>>> goal=np.array([0b10011100, 0b01001011, 0b00110011, 0b10011001, 0b01000000])
>>> x
array([312, 300, 412, 404])
>>> goal
array([156, 75, 51, 153, 64])
>>> shift1 = np.array(range(1,1+len(x)))
>>> shift1
array([1, 2, 3, 4])
>>> mask1 = np.array([2**n -1 for n in range(1,1+len(x))])
>>> mask1
array([ 1, 3, 7, 15])
>>> res=((x>>shift1)|((x&mask1)<<(9-shift1)))&0b11111111
>>> res
array([156, 75, 51, 153], dtype=int32)
>>> goal
array([156, 75, 51, 153, 64])
>>>
I don't understand why your goal array has one extra element, but the above operation gives the other numbers, and adding one extra element is not complicated, so adjust as necessary.
Now to explain ((x>>shift1)|((x&mask1)<<(9-shift1)))&0b11111111:
First, notice that each element is shifted by a progressively bigger amount; that part is simple:
>>> x>>shift1
array([156, 75, 51, 25], dtype=int32)
>>>
>>> list(map(bin,x>>shift1))
['0b10011100', '0b1001011', '0b110011', '0b11001']
>>>
We also want to catch the bits that would be lost by the shift; ANDing with an appropriate mask gets those:
>>> x&mask1
array([0, 0, 4, 4], dtype=int32)
>>> list(map(bin,mask1))
['0b1', '0b11', '0b111', '0b1111']
>>> list(map(bin,x&mask1))
['0b0', '0b0', '0b100', '0b100']
>>>
then we left shift that result by the complementary amount
>>> 9-shift1
array([8, 7, 6, 5])
>>> ((x&mask1)<<(9-shift1))
array([ 0, 0, 256, 128], dtype=int32)
>>> list(map(bin,_))
['0b0', '0b0', '0b100000000', '0b10000000']
>>>
then we or both together
>>> (x>>shift1) | ((x&mask1)<<(9-shift1))
array([156, 75, 307, 153], dtype=int32)
>>> list(map(bin,_))
['0b10011100', '0b1001011', '0b100110011', '0b10011001']
>>>
and finally we AND that with 0b11111111 to keep only the 8 bits we want.
Additionally, you mention that the last 2 bits are always zero, so a simpler solution is to just shift right by 2, and to recover the original, shift back in the other direction:
>>> x
array([312, 300, 412, 404])
>>> y = x>>2
>>> y
array([ 78, 75, 103, 101], dtype=int32)
>>> y<<2
array([312, 300, 412, 404], dtype=int32)
>>>

Pack data, extreme Bitpacking in Python

I need to pack information as closely as possible into a bitstream.
I have variables with a different number of distinct states:
Number_of_states=[3,5,129,15,6,2]# A bit longer in reality
The best option I have at the moment would be to create a bitfield, using
2+3+8+4+3+1 bits -> 21 bits
However, it should be possible to pack these states into np.log2(3*5*129*15*6*2)=18.4 bits, saving two bits. (In reality I have 298 bits and need to save a few.)
In my case this would save more than 5% of the data stream, which would help a lot.
Is there a viable solution in Python to pack the data this way? I tried packing algorithms, but they create too much overhead with just a few bytes of data. The format string is no problem; it is constant and will be transmitted beforehand.
This is the code I am using at the moment:
from bitstring import pack
import numpy as np

Number_of_states = [3, 5, 129, 15, 6, 2]  # much longer in reality
DATA_TO_BE_PACKED = np.random.randint(Number_of_states)
string = ''
for item in Number_of_states:
    string += 'uint:{}, '.format(int(np.ceil(np.log2(item))))
PACKED_DATA = pack(string, *DATA_TO_BE_PACKED)
print(len(PACKED_DATA))
print(PACKED_DATA.unpack(string))
You can interpret the state as an index into a multidimensional array with shape (3, 5, 129, 15, 6, 2). This index can be encoded as the integer index into the flattened 1-d array with length 3*5*129*15*6*2 = 348300. NumPy has the functions ravel_multi_index and unravel_index that can do this for you.
For example, let num_states be the number of states for each component of your state:
In [86]: num_states = [3, 5, 129, 15, 6, 2]
Suppose state holds an instance of the data; that is, it records the state of each component:
In [87]: state = [2, 3, 78, 9, 0, 1]
To encode this state, pass it through ravel_multi_index. idx is the encoded state:
In [88]: idx = np.ravel_multi_index(state, num_states)
In [89]: idx
Out[89]: 316009
By construction, 0 <= idx < 348300, so it requires only 19 bits.
To restore state from idx, use unravel_index:
In [90]: np.unravel_index(idx, num_states)
Out[90]: (2, 3, 78, 9, 0, 1)
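To actually put idx on the wire, it can then be packed into the minimal number of whole bytes. This sketch uses int.to_bytes / int.from_bytes (Python 3) and assumes big-endian byte order for the payload:

```python
import numpy as np

num_states = [3, 5, 129, 15, 6, 2]
state = [2, 3, 78, 9, 0, 1]

idx = int(np.ravel_multi_index(state, num_states))

# Bytes needed for any index in [0, 348300): 19 bits -> 3 bytes.
n_total = int(np.prod(num_states))
n_bytes = ((n_total - 1).bit_length() + 7) // 8

payload = idx.to_bytes(n_bytes, 'big')
decoded = np.unravel_index(int.from_bytes(payload, 'big'), num_states)
print(len(payload), [int(v) for v in decoded])  # 3 [2, 3, 78, 9, 0, 1]
```

Note that this rounds up to whole bytes; if several such values share one stream, you would concatenate the indices arithmetically (as in the mixed-radix answer below) before converting to bytes.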
This looks like a usecase of a mixed radix numeral system.
A quick proof of concept:
num_states = [3, 5, 129, 15, 6, 2]
input_data = [2, 3, 78, 9, 0, 1]
print("Input data: %s" % input_data)
To encode, you start with 0, and for each component you first multiply by its number of states, then add the current state:
encoded = 0
for i in range(len(num_states)):
    encoded *= num_states[i]
    encoded += input_data[i]
print("Encoded: %d" % encoded)
To decode, you go in reverse: take the remainder of division by the number of states, then divide by the number of states:
decoded_data = []
for n in reversed(num_states):
    v = encoded % n
    encoded = encoded // n
    decoded_data.insert(0, v)
print("Decoded data: %s" % decoded_data)
Example output:
Input data: [2, 3, 78, 9, 0, 1]
Encoded: 316009
Decoded data: [2, 3, 78, 9, 0, 1]
Another example with more values:
Input data: [2, 3, 78, 9, 0, 1, 84, 17, 4, 5, 30, 1]
Encoded: 14092575747751
Decoded data: [2L, 3L, 78L, 9L, 0L, 1L, 84L, 17L, 4L, 5L, 30L, 1L]

How to initialize a list of integer with certain gap in Python

I am trying to translate some of my Java algorithm codes into Python.
For the following Java code, I could not figure out a clean way to convert:
int[] groupBaseIndex = IntStream.iterate(0, n -> n < size, n -> n + step).toArray();
Basically, it generates an array from 0 to size with a step. For example, if size=14 and step=4, it generates:
[0,4,8,12]
Can someone teach me how to convert this cleanly into Python? I am sure there must be a one-liner to do this in Python.
You can use the built-in range() function:
size = 14
step = 4
print(list(range(0, size, step)))
Output:
[0, 4, 8, 12]
You can use a list comprehension to generate the list of values.
>>> groupBaseIndex = [i for i in range(0, 13, 4)]
>>> groupBaseIndex
[0, 4, 8, 12]
>>>
alternative you can use list(range()), for example
>>> groupBaseIndex = list(range(0, 13, 4))
>>> groupBaseIndex
[0, 4, 8, 12]
>>>

Keeping track of skipped numbers in Python

I'm pretty new to Python and I'm looking for a way for Python to keep track of skipped numbers in a sequence. For example, if I have a folder with pictures numbered 1-100, but 47, 58 and 98 are missing in the directory, how can I keep track of this?
You can subtract the set of numbers you have from a complete set of all the numbers, e.g.:
>>> incomplete_set = { 0, 1, 2, 3, 4, 6, 8, 9 }
>>> complete_set = set(range(10))
>>> complete_set - incomplete_set
set([5, 7])
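Applied to the original use case (pictures numbered 1-100 in a directory), a sketch assuming filenames like "47.jpg"; with a real folder you would pass os.listdir(path) instead of the sample list:

```python
import re

def missing_numbers(names, first=1, last=100):
    """Return the numbers in [first, last] with no matching file name."""
    present = {int(m.group(1)) for n in names
               if (m := re.match(r'(\d+)\.', n))}
    return sorted(set(range(first, last + 1)) - present)

# Sample stand-in for os.listdir(path):
names = ['%d.jpg' % i for i in range(1, 101) if i not in (47, 58, 98)]
print(missing_numbers(names))  # [47, 58, 98]
```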

Algorithm to offset a list of data

Given a list of data as follows:
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
I would like to create an algorithm that can offset the list by a certain number of steps. For example, if the offset = -1:
def offsetFunc(inputList, offsetList):
    # make something
    return output
where:
output = [0,0,0,0,1,1,5,5,5,5,5,5,3,3,3,2,2]
Important Note: The elements of the list are float numbers and they are not in any progression. So I actually need to shift them, I cannot use any work-around for getting the result.
So basically, the algorithm should replace the first set of values (the four "1"s) with 0, and then it should:
Detect the length of the next range of values
Create a parallel output vector with the values delayed by one set
The way I have roughly described the algorithm above is how I would do it. However, I'm a newbie to Python (and a beginner in general programming), and I have figured out over time that Python has a lot of built-in functions that could make the algorithm less heavy and iterative. Does anyone have any suggestions for a better script to do this kind of job? This is the code I have written so far (assuming a static offset of -1):
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
output = []
PrevVal = 0
NextVal = input[0]
i = 0
while input[i] == NextVal:
    output.append(PrevVal)
    i += 1
while i < len(input):
    PrevVal = NextVal
    NextVal = input[i]
    while input[i] == NextVal:
        output.append(PrevVal)
        i += 1
        if i >= len(input):
            break
print output
Thanks in advance for any help!
BETTER DESCRIPTION
My list will always be composed of "sets" of values. They are usually float numbers, and they take values such as this short example below:
Sample = [1.236,1.236,1.236,1.236,1.863,1.863,1.863,1.863,1.863,1.863]
In this example, the first set (the one with value "1.236") is long 4 while the second one is long 6. What I would like to get as an output, when the offset = -1, is:
The value "0.000" in the first 4 elements;
The value "1.236" in the second 6 elements.
So basically, this "offset" function is creating the list with the same "structure" (ranges of lengths) but with the values delayed by "offset" times.
I hope it's clear now, unfortunately the problem itself is still a bit silly to me (plus I don't even speak good English :) )
Please don't hesitate to ask any additional info to complete the question and make it clearer.
How about this:
def generateOutput(input, value=0, offset=-1):
    values = []
    for i in range(len(input)):
        if i < 1 or input[i] == input[i-1]:
            yield value
        else:  # value change in input detected
            values.append(input[i-1])
            if len(values) >= -offset:
                value = values.pop(0)
            yield value

input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
print list(generateOutput(input))
It will print this:
[0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
And in case you just want to iterate, you do not even need to build the list. Just use for i in generateOutput(input): … then.
For other offsets, use this:
print list(generateOutput(input, 0, -2))
prints:
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 5, 5, 5, 3, 3]
Using a deque as the queue, with maxlen defining the shift length, and holding only unique values: pushing new values in at the end pushes old values out at the start of the queue once the shift length has been reached.
from collections import deque

def shift(it, shift=1):
    q = deque(maxlen=shift+1)
    q.append(0)
    for i in it:
        if q[-1] != i:
            q.append(i)
        yield q[0]

Sample = [1.236,1.236,1.236,1.236,1.863,1.863,1.863,1.863,1.863,1.863]
print list(shift(Sample))
# [0, 0, 0, 0, 1.236, 1.236, 1.236, 1.236, 1.236, 1.236]
My try:
# Input
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
shift = -1

# Build service structures: for each 'set of data' store its length and its value
set_lengths = []
set_values = []
prev_value = None
set_length = 0
for value in input:
    if prev_value is not None and value != prev_value:
        set_lengths.append(set_length)
        set_values.append(prev_value)
        set_length = 0
    set_length += 1
    prev_value = value
else:
    set_lengths.append(set_length)
    set_values.append(prev_value)

# Output the result, shifting the values
output = []
for i, l in enumerate(set_lengths):
    j = i + shift
    if j < 0:
        output += [0] * l
    else:
        output += [set_values[j]] * l
print input
print output
gives:
[1, 1, 1, 1, 5, 5, 3, 3, 3, 3, 3, 3, 2, 2, 2, 5, 5]
[0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
def x(list, offset):
    return [el + offset for el in list]
A completely different approach than my first answer is this:
import itertools
First analyze the input:
values, amounts = zip(*((n, len(list(g))) for n, g in itertools.groupby(input)))
We now have (1, 5, 3, 2, 5) and (4, 2, 6, 3, 2). Now apply the offset:
values = (0,) * (-offset) + values # nevermind that it is longer now.
And synthesize it again:
output = sum([ [v] * a for v, a in zip(values, amounts) ], [])
This is way more elegant, way less understandable and probably way more expensive than my other answer, but I didn't want to hide it from you.
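For completeness, the pieces above can be assembled into one self-contained function (my wrapper around the same groupby idea; offset is assumed to be negative, as in the question):

```python
import itertools

def offset_runs(data, offset=-1, fill=0):
    """Keep the run lengths of data but delay the run values by |offset| runs."""
    values, amounts = zip(*((n, len(list(g)))
                            for n, g in itertools.groupby(data)))
    values = (fill,) * (-offset) + values  # extra trailing values are ignored by zip
    return [v for v, a in zip(values, amounts) for _ in range(a)]

data = [1, 1, 1, 1, 5, 5, 3, 3, 3, 3, 3, 3, 2, 2, 2, 5, 5]
print(offset_runs(data))      # [0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
print(offset_runs(data, -2))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 5, 5, 5, 3, 3]
```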
