If I have a numpy array whose elements each represent, e.g., a 9-bit integer, is there an easy way (ideally without a loop) to repack it so that each element of the resulting array represents an 8-bit integer, with the "lost bits" at the end of one element carried over to the beginning of the next element?
For example, I want to go from
np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100]) # initial array in binary
# convert to
np.array([0b10011100, 0b01001011, 0b00110011, 0b10011001, 0b01000000]) # resulting array
I hope it is clear what I want to achieve.
Additional info (I don't know if this makes any difference):
All of my 9-bit numbers have the MSB set to 1 (they are bigger than 255) and the last two bits are always 0, as in the example above.
The arrays I want to process are much bigger, with thousands of elements.
Thanks for your help in advance!
edit:
my current (complicated) solution is the following:
import numpy as np

def get_bits(data, offset, leng):
    data = (data % (1 << (offset + leng))) >> offset
    return data

data1 = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
i = 1
part1 = []
part2 = []
for el in data1:
    if i == 1:
        part2.append(0)
    part1.append(get_bits(el, i, 8))
    part2.append(get_bits(el, 0, i)<<(8-i))
    if i == 8:
        i = 1
        part1.append(0)
    else:
        i += 1
if i != 1:
    part1.append(0)
res = np.array(part1) + np.array(part2)
It's been bugging me that np.packbits and np.unpackbits are inefficient, so I came up with a bit twiddling answer.
The general idea is to work it like any resampler: you make an output array, and figure out where each piece of the output comes from in the input. You have N elements of 9 bits each, so the output is:
data = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
result = np.empty(np.ceil(data.size * 9 / 8).astype(int), dtype=np.uint8)
Every nine output bytes have the following pattern relative to the corresponding eight input integers. I use {...} to indicate the (inclusive) range of bits taken from each input integer:
result[0] = data[0]{8:1}
result[1] = data[0]{0:0} data[1]{8:2}
result[2] = data[1]{1:0} data[2]{8:3}
result[3] = data[2]{2:0} data[3]{8:4}
result[4] = data[3]{3:0} data[4]{8:5}
result[5] = data[4]{4:0} data[5]{8:6}
result[6] = data[5]{5:0} data[6]{8:7}
result[7] = data[6]{6:0} data[7]{8:8}
result[8] = data[7]{7:0}
The index of result (call it i) is really given modulo 9. The index into data is therefore offset by 8 * (i // 9). The lower portion of the byte is given by data[...] >> (i + 1). The upper portion is given by data[...] & ((1 << i) - 1), shifted left by 8 - i bits.
That makes it pretty easy to come up with a vectorized solution:
i = np.arange(result.size)
j = i[:-1]
result[i] = (data[8 * (i // 9) + (i % 9) - 1] & ((1 << i % 9) - 1)) << (8 - i % 9)
result[j] |= (data[8 * (j // 9) + (j % 9)] >> (j % 9 + 1)).astype(np.uint8)
You need to clip the index of the low portion because it may go out of bounds. You don't need to clip the high portion because -1 is a perfectly valid index, and you don't care which element it accesses. And of course numpy won't let you OR or add int elements to a uint8 array, so you have to cast.
>>> [bin(x) for x in result]
['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
This solution should be scalable to arrays of any size, and I wrote it so that you can work out different combinations of shifts, not just 9-to-8.
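As a sanity check on the vectorized indexing above, here is a minimal sketch that wraps it in a function and compares it against a slow but easy-to-verify reference that concatenates all bits into one Python integer. The names pack9_vectorized and pack_reference are made up for this illustration and are not part of NumPy; the sketch masks the inputs to 9 bits as a precaution.

```python
import numpy as np

def pack9_vectorized(data):
    """Pack 9-bit values into bytes using the index arithmetic described above."""
    data = np.asarray(data, dtype=np.int64) & 0x1FF   # keep only the low 9 bits
    n_out = (data.size * 9 + 7) // 8                  # ceil(9 * N / 8) output bytes
    result = np.zeros(n_out, dtype=np.uint8)
    i = np.arange(n_out)
    r = i % 9                                         # position inside each 9-byte group
    base = 8 * (i // 9)                               # first input index of that group
    # High part of each byte: the low r bits of the previous input value.
    # For r == 0 the mask is 0, so the wrapped index -1 contributes nothing.
    high = (data[base + r - 1] & ((1 << r) - 1)) << (8 - r)
    result[:] = high.astype(np.uint8)
    # Low part: the top bits of the current input value. The last output byte
    # has no current value, so it is skipped to stay in bounds.
    low = data[base[:-1] + r[:-1]] >> (r[:-1] + 1)
    result[:-1] |= low.astype(np.uint8)
    return result

def pack_reference(data, width=9):
    """Slow reference: concatenate all bits, then pad with zeros to whole bytes."""
    acc = 0
    for v in data:
        acc = (acc << width) | (int(v) & ((1 << width) - 1))
    total_bits = width * len(data)
    pad = -total_bits % 8
    n_bytes = (total_bits + pad) // 8
    return np.frombuffer((acc << pad).to_bytes(n_bytes, "big"), dtype=np.uint8)

example = [0b100111000, 0b100101100, 0b110011100, 0b110010100]
packed = pack9_vectorized(example)
print([bin(x) for x in packed])
```

Comparing the two functions on random inputs of a few different lengths (including multiples of 8) is a cheap way to catch off-by-one mistakes in the index arithmetic.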
You can do it in two steps with np.unpackbits and np.packbits. First turn your array into a little-endian column vector:
>>> z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100], dtype='<u2').reshape(-1, 1)
>>> z.view(np.uint8)
array([[ 56, 1],
[ 44, 1],
[156, 1],
[148, 1]], dtype=uint8)
You can convert this into an array of bits directly by unpacking. In fact, at some point (PR #10855) I added the count parameter to chop off the high zeros for you:
>>> np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)
array([[0, 0, 0, 1, 1, 1, 0, 0, 1],
[0, 0, 1, 1, 0, 1, 0, 0, 1],
[0, 0, 1, 1, 1, 0, 0, 1, 1],
[0, 0, 1, 0, 1, 0, 0, 1, 1]], dtype=uint8)
Now you can just repack the reversed raveled array:
>>> u = np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)[:, ::-1].ravel()
>>> result = np.packbits(u)
>>> result.dtype
dtype('uint8')
>>> [bin(x) for x in result]
['0b10011100', '0b1001011', '0b110011', '0b10011001', '0b1000000']
If your machine is native little endian (e.g., most Intel architectures), you can do this in a one-liner, as long as the input is a two-byte-per-element column vector so that axis=1 exists:
z = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100], dtype=np.uint16).reshape(-1, 1)
result = np.packbits(np.unpackbits(z.view(np.uint8), axis=1, bitorder='l', count=9)[:, ::-1].ravel())
Otherwise, you can do z.byteswap().view(np.uint8) to get the right starting order (still a one-liner though, I suppose).
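The same steps can be collected into a small helper. This is only a sketch: the function name pack_with_packbits and its width parameter are invented for this example, it assumes width is at most 16, and it forces a little-endian uint16 dtype so that the byte view does not depend on the machine's native byte order. The count and bitorder arguments of np.unpackbits need NumPy 1.17 or newer.

```python
import numpy as np

def pack_with_packbits(values, width=9):
    """Pack `width`-bit integers (width <= 16) into a uint8 array, MSB first."""
    # Explicit little-endian storage: column 0 is the low byte, column 1 the high byte.
    z = np.asarray(values, dtype="<u2").reshape(-1, 1)
    # Little bit order yields bit 0, bit 1, ...; count keeps only the low `width` bits.
    bits = np.unpackbits(z.view(np.uint8), axis=1, bitorder="little", count=width)
    # Reverse each row to MSB-first, flatten, and repack (zero-padded at the end).
    return np.packbits(bits[:, ::-1].ravel())

example = [0b100111000, 0b100101100, 0b110011100, 0b110010100]
print([bin(x) for x in pack_with_packbits(example)])
```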
I think I understood most of what you want. NumPy applies bitwise operators element-wise, either between two arrays of the same shape or between an array and a single number, so you only need to construct the appropriate shift and mask arrays. Something like this:
>>> import numpy as np
>>> x = np.array([0b100111000, 0b100101100, 0b110011100, 0b110010100])
>>> goal=np.array([0b10011100, 0b01001011, 0b00110011, 0b10011001, 0b01000000])
>>> x
array([312, 300, 412, 404])
>>> goal
array([156, 75, 51, 153, 64])
>>> shift1 = np.array(range(1,1+len(x)))
>>> shift1
array([1, 2, 3, 4])
>>> mask1 = np.array([2**n -1 for n in range(1,1+len(x))])
>>> mask1
array([ 1, 3, 7, 15])
>>> res=((x>>shift1)|((x&mask1)<<(9-shift1)))&0b11111111
>>> res
array([156, 75, 51, 153], dtype=int32)
>>> goal
array([156, 75, 51, 153, 64])
>>>
I don't understand why your goal array has one extra element, but the operation above gives the other numbers, and adding one more is not complicated, so adjust as necessary.
Now to explain ((x>>shift1)|((x&mask1)<<(9-shift1)))&0b11111111.
First, I noticed that each element is shifted by one more bit than the previous one, which is simple:
>>> x>>shift1
array([156, 75, 51, 25], dtype=int32)
>>>
>>> list(map(bin,x>>shift1))
['0b10011100', '0b1001011', '0b110011', '0b11001']
>>>
We also want to catch the bits that would be lost by the shift. ANDing with an appropriate mask gives us those:
>>> x&mask1
array([0, 0, 4, 4], dtype=int32)
>>> list(map(bin,mask1))
['0b1', '0b11', '0b111', '0b1111']
>>> list(map(bin,x&mask1))
['0b0', '0b0', '0b100', '0b100']
>>>
Then we left-shift that result by the complementary amount:
>>> 9-shift1
array([8, 7, 6, 5])
>>> ((x&mask1)<<(9-shift1))
array([ 0, 0, 256, 128], dtype=int32)
>>> list(map(bin,_))
['0b0', '0b0', '0b100000000', '0b10000000']
>>>
Then we OR both together:
>>> (x>>shift1) | ((x&mask1)<<(9-shift1))
array([156, 75, 307, 153], dtype=int32)
>>> list(map(bin,_))
['0b10011100', '0b1001011', '0b100110011', '0b10011001']
>>>
And finally we AND that with 0b11111111 to keep only the 8 bits we want.
Additionally, you mention that the last 2 bits are always zero, so a simpler solution is to shift right by 2; to recover the original, just shift back in the other direction:
>>> x
array([312, 300, 412, 404])
>>> y = x>>2
>>> y
array([ 78, 75, 103, 101], dtype=int32)
>>> y<<2
array([312, 300, 412, 404], dtype=int32)
>>>
The code I have tried
rand_array2 = np.random.randint(0,3, size=1000)
rand_array2
I am searching for a way to count the (1, 1, 2) sequences in rand_array2 with a for loop.
A possible solution could look like:
import numpy as np
rand_array2 = np.random.randint(0,3, size=1000)
pattern = np.array((1,1,2))
matches=0
for i in range(len(rand_array2)-len(pattern)+1):
    match = rand_array2[i:i+len(pattern)]==pattern
    if match.all():
        matches+=1
matches
Try this approach. Here we loop over every possible start index in the array and compare the slice that begins there against the sequence. This works with a sequence of any size, as long as it is no longer than the array.
seq = (1, 1, 2)
arr = (1, 1, 2, 1, 3, 2, 3, 2, 1, 1, 2, 1)
count = 0
for i in range(len(arr) - len(seq) + 1):
    if seq == arr[i:i+len(seq)]:
        count += 1
Edit: was a bit late.
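For large arrays the same count can also be done without an explicit Python loop. The following is a sketch of an alternative, not what either answer above does: it assumes NumPy 1.20 or newer (for sliding_window_view), and count_pattern is a name made up for this example. Note that the tuple comparison seq == arr[i:i+len(seq)] only works for plain tuples; with NumPy arrays, == compares element-wise and has to be reduced with .all().

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_pattern(arr, pattern):
    """Count (possibly overlapping) occurrences of `pattern` inside `arr`."""
    arr = np.asarray(arr)
    pattern = np.asarray(pattern)
    if pattern.size == 0 or pattern.size > arr.size:
        return 0
    # Each row of `windows` is one length-len(pattern) slice of arr (a view, no copy).
    windows = sliding_window_view(arr, pattern.size)
    # A window matches when every element equals the pattern element-wise.
    return int((windows == pattern).all(axis=1).sum())

arr = (1, 1, 2, 1, 3, 2, 3, 2, 1, 1, 2, 1)
print(count_pattern(arr, (1, 1, 2)))
```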
I've been trying to accomplish a simple linear sort that will, in this case, make a swap at every index except for when it reaches the end. Kindly help. (the while loop might be unnecessary at this point)
array = list(range(9, -1, -1))
has_flipped = True
while has_flipped:
    for num in array:
        if array.index(num) == (len(array) - 1):
            continue
        if num > array[array.index(num) + 1]:
            container = array[array.index(num) + 1]
            array[array.index(num) + 1] = num
            num = container
            has_flipped = False
    has_flipped = not has_flipped
I expect a list with the numbers 0 through 9 but I instead get 9, 9, 7, 7, 5, 5, 3, 3, 1, 1.
You are not swapping correctly: the swapped value is never written back to a position in the list. As the other answer explains...
num = container
...does not assign to an array location.
Moreover, the while loop is not required. Here is a more compact way of doing the same:
array = list(range(9, -1, -1))
ln = len(array)
for num in array:
    if num > array[ln-1]:
        container = array[ln-1]
        array[ln-1] = num
        array[array.index(num)] = container
    ln -= 1
print(array)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
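If the goal was a general sort built from adjacent swaps, here is a minimal sketch that keeps the has_flipped idea from the question but swaps through list indices, so that both positions are actually written. It assumes that a bubble sort is what was intended; the function name bubble_sort is made up for this example.

```python
def bubble_sort(values):
    """Return a sorted copy of `values` using repeated adjacent swaps."""
    items = list(values)
    has_flipped = True
    while has_flipped:
        has_flipped = False
        # Iterate over indices, not values, so assignments hit list positions.
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                # Tuple assignment writes both positions; no temporary is needed.
                items[i], items[i + 1] = items[i + 1], items[i]
                has_flipped = True
    return items

print(bubble_sort(range(9, -1, -1)))
```

Looping over indices also avoids array.index(num), which returns the first matching position and therefore misbehaves as soon as the list contains duplicates.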
Given a list of data as follows:
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
I would like to create an algorithm that can offset the list by a certain number of steps. For example, with offset = -1:
def offsetFunc(inputList, offsetList):
    #make something
    return output
where:
output = [0,0,0,0,1,1,5,5,5,5,5,5,3,3,3,2,2]
Important Note: The elements of the list are float numbers and they are not in any progression. So I actually need to shift them, I cannot use any work-around for getting the result.
So basically, the algorithm should replace the first set of values (the four 1s) with 0, and then it should:
Detect the length of the next range of values
Create a parallel output vector with the values delayed by one set
The rough description above is how I would do it. However, I'm new to Python (and a beginner at programming in general), and I keep discovering that Python has many built-in functions that could make the algorithm lighter and less loop-heavy. Does anyone have a suggestion for a better way to write this kind of script? This is the code I have written so far (assuming a static offset of -1):
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
output = []
PrevVal = 0
NextVal = input[0]
i = 0
while input[i] == NextVal:
    output.append(PrevVal)
    i += 1
while i < len(input):
    PrevVal = NextVal
    NextVal = input[i]
    while input[i] == NextVal:
        output.append(PrevVal)
        i += 1
        if i >= len(input):
            break
print(output)
Thanks in advance for any help!
BETTER DESCRIPTION
My list will always be composed of "sets" of values. They are usually float numbers, and they take values such as this short example below:
Sample = [1.236,1.236,1.236,1.236,1.863,1.863,1.863,1.863,1.863,1.863]
In this example, the first set (the one with value "1.236") has length 4, while the second one has length 6. What I would like to get as output, when offset = -1, is:
The value "0.000" in the first 4 elements;
The value "1.236" in the next 6 elements.
So basically, this "offset" function creates a list with the same "structure" (runs of the same lengths) but with the values delayed by "offset" sets.
I hope it's clear now, unfortunately the problem itself is still a bit silly to me (plus I don't even speak good English :) )
Please don't hesitate to ask any additional info to complete the question and make it clearer.
How about this:
def generateOutput(input, value=0, offset=-1):
    values = []
    for i in range(len(input)):
        if i < 1 or input[i] == input[i-1]:
            yield value
        else: # value change in input detected
            values.append(input[i-1])
            if len(values) >= -offset:
                value = values.pop(0)
            yield value

input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
print(list(generateOutput(input)))
It will print this:
[0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
And in case you just want to iterate, you do not even need to build the list. Just use for i in generateOutput(input): … then.
For other offsets, use this:
print(list(generateOutput(input, 0, -2)))
prints:
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 5, 5, 5, 3, 3]
This uses a deque as the queue, with maxlen defining the shift length. The queue only holds the distinct run values: pushing a new value in at the end pushes the oldest value out at the start of the queue once the shift length has been reached.
from collections import deque

def shift(it, shift=1):
    q = deque(maxlen=shift+1)
    q.append(0)
    for i in it:
        if q[-1] != i:
            q.append(i)
        yield q[0]

Sample = [1.236,1.236,1.236,1.236,1.863,1.863,1.863,1.863,1.863,1.863]
print(list(shift(Sample)))
#[0, 0, 0, 0, 1.236, 1.236, 1.236, 1.236, 1.236, 1.236]
My try:
#Input
input = [1,1,1,1,5,5,3,3,3,3,3,3,2,2,2,5,5]
shift = -1
#Build service structures: for each 'set of data' store its length and its value
set_lengths = []
set_values = []
prev_value = None
set_length = 0
for value in input:
    if prev_value is not None and value != prev_value:
        set_lengths.append(set_length)
        set_values.append(prev_value)
        set_length = 0
    set_length += 1
    prev_value = value
else:
    set_lengths.append(set_length)
    set_values.append(prev_value)
#Output the result, shifting the values
output = []
for i, l in enumerate(set_lengths):
    j = i + shift
    if j < 0:
        output += [0] * l
    else:
        output += [set_values[j]] * l
print(input)
print(output)
gives:
[1, 1, 1, 1, 5, 5, 3, 3, 3, 3, 3, 3, 2, 2, 2, 5, 5]
[0, 0, 0, 0, 1, 1, 5, 5, 5, 5, 5, 5, 3, 3, 3, 2, 2]
def x(list, offset):
    return [el + offset for el in list]
A completely different approach than my first answer is this:
import itertools
First analyze the input:
values, amounts = zip(*((n, len(list(g))) for n, g in itertools.groupby(input)))
We now have (1, 5, 3, 2, 5) and (4, 2, 6, 3, 2). Now apply the offset:
values = (0,) * (-offset) + values # nevermind that it is longer now.
And synthesize it again:
output = sum([ [v] * a for v, a in zip(values, amounts) ], [])
This is way more elegant, way less understandable and probably way more expensive than my other answer, but I didn't want to hide it from you.
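Put together as one function, the groupby steps above might look like the sketch below. The name offset_runs and the fill parameter are invented for this example, and it only handles zero or negative offsets (delays), as in the question.

```python
from itertools import groupby

def offset_runs(data, offset=-1, fill=0):
    """Keep the run lengths of `data` but delay the run values by -offset runs."""
    if offset > 0:
        raise ValueError("this sketch only supports offset <= 0")
    # Analyze: one (value, run length) pair per run of equal consecutive items.
    runs = [(value, len(list(group))) for value, group in groupby(data)]
    values = [value for value, _ in runs]
    lengths = [length for _, length in runs]
    # Apply the offset: prepend fill values; zip() drops the surplus at the end.
    shifted = [fill] * (-offset) + values
    # Synthesize: repeat each shifted value for the original run length.
    output = []
    for value, length in zip(shifted, lengths):
        output.extend([value] * length)
    return output

data = [1, 1, 1, 1, 5, 5, 3, 3, 3, 3, 3, 3, 2, 2, 2, 5, 5]
print(offset_runs(data, -1))
```

Using extend in a loop instead of sum(..., []) avoids building a new list for every run, which matters once the input has many runs.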
Imagine I have a numpy array and I need to find the spans/ranges where a condition is True. For example, I have the following array, in which I'm trying to find spans where items are greater than 1:
[0, 0, 0, 2, 2, 0, 2, 2, 2, 0]
I would need to find indices (start, stop):
(3, 5)
(6, 9)
The fastest thing I've been able to implement is making a boolean array of:
truth = data > threshold
and then looping through the array using numpy.argmin and numpy.argmax to find start and end positions.
pos = 0
truth = container[RATIO,:] > threshold
while pos < len(truth):
    start = numpy.argmax(truth[pos:]) + pos + offset
    end = numpy.argmin(truth[start:]) + start + offset
    if not truth[start]:#nothing more
        break
    if start == end:#goes to the end
        end = len(truth)
    pos = end
But this has been too slow for the billions of positions in my arrays and the fact that the spans I'm finding are usually just a few positions in a row. Does anyone know a faster way to find these spans?
Here's one way. First take the boolean array you have:
In [11]: a
Out[11]: array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
In [12]: a1 = a > 1
Shift it one to the right (to get the previous state at each index) using roll:
In [13]: a1_rshifted = np.roll(a1, 1)
In [14]: starts = a1 & ~a1_rshifted # it's True but the previous isn't
In [15]: ends = ~a1 & a1_rshifted
Where this is non-zero is the start of each True batch (or, respectively, end batch):
In [16]: np.nonzero(starts)[0], np.nonzero(ends)[0]
Out[16]: (array([3, 6]), array([5, 9]))
And zipping these together:
In [17]: list(zip(np.nonzero(starts)[0], np.nonzero(ends)[0]))
Out[17]: [(3, 5), (6, 9)]
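One caveat with np.roll is that it wraps the last element around to the front, so a span that touches either end of the array can be reported incorrectly. A sketch of a variant that pads the mask with False on both sides instead is shown below; true_spans is a name made up for this example, and the stop index is exclusive, as in the output above.

```python
import numpy as np

def true_spans(mask):
    """Return an (n_spans, 2) array of [start, stop) indices of the True runs."""
    mask = np.asarray(mask, dtype=bool)
    # Padding with False guarantees every run has both a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    # Positions where the value changes alternate: start, stop, start, stop, ...
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges.reshape(-1, 2)

a = np.array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
print(true_spans(a > 1).tolist())
```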
If you have access to the scipy library:
You can use scipy.ndimage.label to identify any regions of non-zero values. It returns an array where the value of each element is the id of a span or range in the original array.
You can then use scipy.ndimage.find_objects to return the slices you would need to extract those ranges. You can access the start/end values directly from those slices.
In your example:
import numpy
from scipy.ndimage import label, find_objects
data = numpy.array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
labels, number_of_regions = label(data)
ranges = find_objects(labels)
for identified_range in ranges:
    print(identified_range[0].start, identified_range[0].stop)
You should see:
3 5
6 9
Hope this helps!