numpy parse int into bit groupings - python

I have a np.array of np.uint8:
from random import randint
a = np.array([randint(1, 255) for _ in range(100)], dtype=np.uint8)
and I want to split this into low and high nibbles. I can get the low nibble with
low = np.bitwise_and(a, 0xF)
and the high nibble with
high = np.bitwise_and(np.right_shift(a, 4), 0xF)
Is there some way to do something like
>>> numpy.keep_bits(a, [(0, 3), (4, 7)])
numpy.array([
    [low1, high1],
    [low2, high2],
    ...
    [lowN, highN]
])
I'm not even sure what this would be called... but I thought maybe some numpy guru would know a cool way to do this (in reality I am looking to do this with uint32s and much more varied bit groups).
Basically I want something like struct.unpack, but for vectorized numpy operations.
EDIT: I went with a modified version of the accepted answer below. Here is my final code for anyone interested:
import numpy

def bitmask(start, end):
    """
    >>> bitmask(0, 2) == 0b111
    True
    >>> bitmask(3, 5) == 0b111000
    True

    :param start: start bit
    :param end: end bit (unlike range, the end bit is inclusive)
    :return: integer bitmask for the specified bit pattern
    """
    return (2**(end + 1 - start) - 1) << start

def mask_and_shift(a, mask_a, shift_a):
    """
    :param a: np.array
    :param mask_a: array of masks to apply (must be same size as shift_a)
    :param shift_a: array of shifts to apply (must be same size as mask_a)
    :return: reshaped a, with masks and shifts applied
    """
    masked_a = numpy.bitwise_and(a.reshape(-1, 1), mask_a)
    return numpy.right_shift(masked_a, shift_a)

def bit_partition(raw_values, bit_groups):
    """
    >>> a = numpy.array([1, 15, 16, 17, 125, 126, 127, 128, 129, 254, 255])
    >>> bit_partition(a, [(0, 2), (3, 7)])
    >>> bit_partition(a, [(0, 2), (3, 5), (6, 7)])

    :param raw_values: np.array of raw values
    :param bit_groups: list of (start_bit, end_bit) pairs describing where to bit twiddle
    :return: np.array of shape len(raw_values) x len(bit_groups)
    """
    masks, shifts = zip(*[(bitmask(s, e), s) for s, e in bit_groups])
    return mask_and_shift(raw_values, masks, shifts)
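For illustration (not part of the original post), here is what bit_partition returns for the doctest array split into its low three bits and high five bits; the output below was computed by hand from the masks:

>>> a = numpy.array([1, 15, 16, 17, 125, 126, 127, 128, 129, 254, 255])
>>> bit_partition(a, [(0, 2), (3, 7)])
array([[ 1,  0],
       [ 7,  1],
       [ 0,  2],
       [ 1,  2],
       [ 5, 15],
       [ 6, 15],
       [ 7, 15],
       [ 0, 16],
       [ 1, 16],
       [ 6, 31],
       [ 7, 31]])

Each row is one input value; the first column is value & 0b111 and the second is value >> 3.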

A one-liner, using broadcasting, for the four-bit lower and upper nibbles:
In [38]: a
Out[38]: array([  1,  15,  16,  17, 127, 128, 255], dtype=uint8)

In [39]: (a.reshape(-1,1) & np.array([0xF, 0xF0], dtype=np.uint8)) >> np.array([0, 4], dtype=np.uint8)
Out[39]:
array([[ 1,  0],
       [15,  0],
       [ 0,  1],
       [ 1,  1],
       [15,  7],
       [ 0,  8],
       [15, 15]], dtype=uint8)
To generalize this, replace the hardcoded values [0xF, 0xF0] and [0, 4] with the appropriate bit masks and shifts. For example, to split the values into three groups, containing the highest two bits, followed by the remaining two groups of three bits, you can do this:
In [41]: masks = np.array([0b11000000, 0b00111000, 0b00000111], dtype=np.uint8)

In [42]: shifts = np.array([6, 3, 0], dtype=np.uint8)

In [43]: a
Out[43]: array([  1,  15,  16,  17, 127, 128, 255], dtype=uint8)

In [44]: (a.reshape(-1,1) & masks) >> shifts
Out[44]:
array([[0, 0, 1],
       [0, 1, 7],
       [0, 2, 0],
       [0, 2, 1],
       [1, 7, 7],
       [2, 0, 0],
       [3, 7, 7]], dtype=uint8)

So, I won't comment on the specific logical operators you want to implement, since bit-hacking isn't quite a specialty of mine, but I can tell you where you should look in numpy to implement this kind of custom operator.
If you look through the numpy source (numpy/ma/core.py), you'll notice that nearly all of the bit-manipulation operations in numpy's masked-array module are just instances of _MaskedBinaryOperation; for example, the definition of bitwise_and is simply:
bitwise_and = _MaskedBinaryOperation(umath.bitwise_and)
The magic here comes in the form of the umath module, which calls down, typically, to the low-level libraries that numpy is built on. If you really want to, you could add your operator there, but I don't think it's worth mucking around at that level.
That said, this isn't the only way to incorporate these functions into numpy. In fact, the umath module has a really handy function called frompyfunc that will let you turn an arbitrary python function into one of these handy umath operators; see the numpy documentation for np.frompyfunc. An example of creating such a function is below:
>>> oct_array = np.frompyfunc(oct, 1, 1)
>>> oct_array(np.array((10, 30, 100)))
array([012, 036, 0144], dtype=object)
>>> np.array((oct(10), oct(30), oct(100))) # for comparison
array(['012', '036', '0144'],
      dtype='|S4')
If you decide on the specifics of the bitwise operator you want to implement, using this interface would be the best way to implement it.
This doesn't answer 100% of your question, but I assumed your question was much more about implementing some custom bitwise operator in proper numpy form rather than digging into the bitwise operator itself. Let me know if that's inaccurate and I can put together an example using the bitwise operator you alluded to above.
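As a quick, hedged sketch (my own, not from the answer above): wrapping the question's nibble split in frompyfunc. Note that frompyfunc calls back into Python once per element and returns object arrays, so a pure-numpy expression like the broadcasting one-liner above will be much faster:

import numpy as np

# nout=2, so the resulting ufunc returns a tuple of two object arrays,
# one for each nibble
split_nibbles = np.frompyfunc(lambda v: (int(v) & 0xF, int(v) >> 4), 1, 2)
low, high = split_nibbles(np.array([1, 15, 16, 17, 255], dtype=np.uint8))
print(low)   # [1 15 0 1 15]
print(high)  # [0 0 1 1 15]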

Related

Getting rows corresponding to label, for many labels

I have a 2D array, where each row has a label that is stored in a separate array (not necessarily unique). For each label, I want to extract the rows from my 2D array that have this label. A basic working example of what I want would be this:
import numpy as np
data = np.array([[1, 2], [3, 5], [7, 10], [20, 32], [0, 0]])
label = np.array([1, 1, 1, 0, 1])

# very simple approach
label_values = np.unique(label)
res = []
for la in label_values:
    data_of_this_label_val = data[label == la]
    res += [data_of_this_label_val]
print(res)
The result (res) can have any format, as long as it is easily accessible. In the above example, it would be
[array([[20, 32]]), array([[ 1,  2],
       [ 3,  5],
       [ 7, 10],
       [ 0,  0]])]
Note that I can easily associate each element in my list to one of the unique labels in label_values (that is, by index).
While this works, using a for loop can take quite a lot of time, especially if my label vector is large. Can this be sped up or coded more elegantly?
You can argsort the labels (which is what unique does under the hood, I believe).
If your labels are small nonnegative integers as in the example, you can get it a bit cheaper; see https://stackoverflow.com/a/53002966/7207392.
>>> import numpy as np
>>>
>>> data = np.array([[1, 2], [3, 5], [7, 10], [20, 32], [0, 0]])
>>> label = np.array([1, 1, 1, 0, 1])
>>>
>>> # use kind='mergesort' if you require a stable sort, i.e. one that
>>> # preserves the order of equal labels
>>> idx = label.argsort()
>>> ls = label[idx]
>>> split = 1 + np.where(ls[1:] != ls[:-1])[0]
>>> np.split(data[idx], split)
[array([[20, 32]]), array([[ 1,  2],
       [ 3,  5],
       [ 7, 10],
       [ 0,  0]])]
Unfortunately, there isn't a built-in groupby function in numpy, though you could write alternatives. However, your problem could be solved more succinctly using pandas, if that's available to you:
import pandas as pd
res = pd.DataFrame(data).groupby(label).apply(lambda x: x.values).tolist()
# or, if performance is important, the following will be faster on large arrays,
# but less readable IMO:
res = [data[i] for i in pd.DataFrame(data).groupby(label).groups.values()]
[array([[20, 32]]), array([[ 1,  2],
       [ 3,  5],
       [ 7, 10],
       [ 0,  0]])]
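For completeness, a hedged numpy-only sketch (mine, not from either answer) that packages the argsort-and-split idea into a small groupby helper keyed by label, since the question asked to associate each group with its label:

import numpy as np

def groupby_label(data, label):
    # Stable sort so rows keep their original relative order within a group
    order = np.argsort(label, kind='mergesort')
    sorted_labels = label[order]
    # Split wherever the sorted label changes
    boundaries = 1 + np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1])
    groups = np.split(data[order], boundaries)
    # np.unique returns the sorted unique labels, matching the group order
    return dict(zip(np.unique(label), groups))

data = np.array([[1, 2], [3, 5], [7, 10], [20, 32], [0, 0]])
label = np.array([1, 1, 1, 0, 1])
print(groupby_label(data, label)[0])  # [[20 32]]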

pythonic way for axis-wise winner-take-all in numpy

I am wondering what is the most concise and pythonic way to keep only the maximum element in each line of a 2D numpy array while setting all other elements to zero. Example:
given the following numpy array:
a = [ [1,  8,  3,  6],
      [5,  5, 60,  1],
      [63, 9,  9, 23] ]
I want the answer to be:
b = [ [0,  8,  0, 0],
      [0,  0, 60, 0],
      [63, 0,  0, 0] ]
I can think of several ways to solve that, but what interests me is whether there are python functions to do this quickly.
Thank you in advance.
You can use np.max to take the maximum along one axis, then use np.where to zero out the non-maximal elements:
np.where(a == a.max(axis=1, keepdims=True), a, 0)
The keepdims=True argument keeps the singleton dimension after taking the max (i.e. so that a.max(1, keepdims=True).shape == (3, 1)), which simplifies broadcasting it against a.
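A quick, illustrative run on the example data (my own check; np.array is needed since a above is a plain nested list):

import numpy as np

a = np.array([[1, 8, 3, 6],
              [5, 5, 60, 1],
              [63, 9, 9, 23]])
b = np.where(a == a.max(axis=1, keepdims=True), a, 0)
print(b)
# [[ 0  8  0  0]
#  [ 0  0 60  0]
#  [63  0  0  0]]

Note that if a row contains its maximum more than once, all of those ties are kept.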
I don't know what counts as pythonic, so I assume the way with the most python-specific grammar is pythonic. It uses two list comprehensions, which are a feature of python, but this way it might not be that concise:
b = [[y if y == max(x) else 0 for y in x] for x in a]

Most efficient way to implement numpy.in1d for multiple arrays

What is the best way to implement a function which takes an arbitrary number of 1d arrays and returns a tuple containing the indices of the matching values (if any)?
Here is some pseudo-code of what I want to do:
a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
c = np.array([4, 2])
(ind_a, ind_b, ind_c) = return_equals(a, b, c)
# ind_a = [2, 4]
# ind_b = [1, 3]
# ind_c = [0, 1]
(ind_a, ind_b, ind_c) = return_equals(a, b, c, sorted_by=a)
# ind_a = [2, 4]
# ind_b = [3, 1]
# ind_c = [0, 1]
def return_equals(*args, sorted_by=None):
    ...
You can use numpy.intersect1d with reduce for this:
from functools import reduce  # needed on Python 3
import numpy as np

def return_equals(*arrays):
    matched = reduce(np.intersect1d, arrays)
    return np.array([np.where(np.in1d(array, matched))[0] for array in arrays])
reduce may be a little slow here because we are creating intermediate NumPy arrays (for a large number of inputs it may be very slow); we can prevent this by using Python's set and its .intersection() method:
matched = np.array(list(set(arrays[0]).intersection(*arrays[1:])))
Related GitHub ticket: n-array versions of set operations, especially intersect1d
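For reference, running the reduce version on the arrays from the question gives the expected indices (output computed by hand, so treat it as illustrative):

>>> a = np.array([1, 0, 4, 3, 2])
>>> b = np.array([1, 2, 3, 4, 5])
>>> c = np.array([4, 2])
>>> return_equals(a, b, c)
array([[2, 4],
       [1, 3],
       [0, 1]])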
This solution basically concatenates all input 1D arrays into one big 1D array with the intention of performing the required operations in a vectorized manner. The only place where it uses loop is at the start where it gets the lengths of the input arrays, which must be minimal on runtime costs.
Here's the function implementation -
import numpy as np

def return_equals(*argv):
    # Concatenate input arrays into one big array for vectorized processing
    A = np.concatenate(argv)

    # lengths of input arrays
    narr = len(argv)
    lens = np.zeros(narr, int)
    for i in range(narr):
        lens[i] = len(argv[i])
    N = A.size

    # Start indices of each group of identical elements from different input
    # arrays in a sorted version of the huge concatenated input array
    start_idx = np.where(np.append([True], np.diff(np.sort(A)) != 0))[0]

    # Run lengths of islands of identical elements
    runlens = np.diff(np.append(start_idx, N))

    # Starting and all indices of the positions in the concatenated array that
    # hold islands of identical elements present across all input arrays
    good_start_idx = start_idx[runlens == narr]
    good_all_idx = good_start_idx[:, None] + np.arange(narr)

    # Get offsetted indices and sort them to get the desired output
    idx = np.argsort(A)[good_all_idx] - np.append([0], lens[:-1].cumsum())
    return np.sort(idx.T, 1)
In Python:
def return_equal(*args):
    rtr = []
    for i, arr in enumerate(args):
        rtr.append([j for j, e in enumerate(arr) if
                    all(e in a for a in args[0:i]) and
                    all(e in a for a in args[i+1:])])
    return rtr
>>> return_equal(a,b,c)
[[2, 4], [1, 3], [0, 1]]
For a start, I'd try:
def return_equals(*args):
    x = []
    c = args[-1]
    for a in args:
        x.append(np.nonzero(np.in1d(a, c))[0])
    return x
If I add d = np.array([1, 0, 4, 3, 0]) (it has only 1 match; what if there are no matches?), then
return_equals(a, b, d, c)
produces:
[array([2, 4], dtype=int32),
 array([1, 3], dtype=int32),
 array([2], dtype=int32),
 array([0, 1], dtype=int32)]
Since the lengths of both the input and returned arrays can differ, you really can't vectorize the problem. That is, it takes some special gymnastics to perform the operation across all inputs at once. And if the number of arrays is small compared to their typical length, I wouldn't worry about speed. Iterating a few times over the arguments is not expensive; it's iterating over hundreds of values element by element that's expensive.
You could, of course, pass the keyword arguments on to in1d.
It's not clear what you are trying to do with the sorted_by parameter. Is that something that you could just as easily apply to the arrays before you pass them to this function?
List comprehension version of this iteration:
[np.nonzero(np.in1d(x,c))[0] for x in [a,b,d,c]]
I can imagine concatenating the arrays into one longer one, applying in1d, and then splitting it up into subarrays. There is a np.split, but it requires that you tell it how many elements to put in each sublist. That means, somehow, determining how many matches there are for each argument. Doing that without looping could be tricky.
The pieces for this (which still need to be packed into a function) are:
args = [a, b, d, c]
lens = [len(x) for x in args]
abc = np.concatenate(args)
C = np.cumsum(lens)
I = np.nonzero(np.in1d(abc, c))[0]
S = np.split(I, (2, 4, 5))
[S[0], S[1] - C[0], S[2] - C[1], S[3] - C[2]]

I
# array([ 2,  4,  6,  8, 12, 15, 16], dtype=int32)
C
# array([ 5, 10, 15, 17], dtype=int32)
The split points (2, 4, 5) come from counting how many elements of I fall between successive values of C, i.e. the cumulative number of matches for each of a, b, ...
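Packing those pieces into a function, as the answer suggests (my own sketch; np.searchsorted replaces the hand-counted split points (2, 4, 5)):

import numpy as np

def return_equals_concat(*args):
    lens = [len(x) for x in args]
    abc = np.concatenate(args)
    C = np.cumsum(lens)
    # Indices into the concatenated array whose values appear in the last input
    I = np.nonzero(np.in1d(abc, args[-1]))[0]
    # Count how many hits fall before each array boundary to get split points
    splits = np.searchsorted(I, C[:-1])
    S = np.split(I, splits)
    # Subtract each array's starting offset to recover per-array indices
    offsets = np.concatenate(([0], C[:-1]))
    return [s - off for s, off in zip(S, offsets)]

a = np.array([1, 0, 4, 3, 2])
b = np.array([1, 2, 3, 4, 5])
d = np.array([1, 0, 4, 3, 0])
c = np.array([4, 2])
print(return_equals_concat(a, b, d, c))
# [array([2, 4]), array([1, 3]), array([2]), array([0, 1])]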

Python: How to find a unique element pattern from 2 arrays?

I have two numpy arrays, A and B:
A = ([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = ([2, 3, 1, 2])
where B is a unique pattern within A.
I need the output to be all the elements of A, which aren't present in B.
Output = ([1, 2, 3, 1, 3])
Easiest is to use Python's builtins, i.e. the string type:
A = "123231213"
B = "2312"
result = A.replace(B, "")
To efficiently convert a numpy.array to and from str, use these functions:
x = numpy.frombuffer("3452353", dtype="|i1")
x
array([51, 52, 53, 50, 51, 53, 51], dtype=int8)
x.tostring()
"3452353"
Note this mixes up ASCII codes (1 != "1"), but substring search will work just fine. Your data type had better fit in one char, or you may get a false match.
To sum it up, a quick hack looks like this:
A = numpy.array([1, 2, 3, 2, 3, 1, 2, 1, 3])
B = numpy.array([2, 3, 1, 2])
numpy.fromstring(A.tostring().replace(B.tostring(), ""), dtype=A.dtype)
array([1, 2, 3, 1, 3])
# note, here dtype is some int; I'm relying on the fact that
# "1 matches 1" is equivalent to "0001 matches 0001".
# this holds as long as values of B are typically non-zero.
#
# this trick can conceptually be used with floating point too,
# but beware of multiple floating-point representations of the same number
In-depth explanation:
Assuming the sizes of A and B are arbitrary, the naive approach runs in quadratic time. However, better probabilistic algorithms exist, for example Rabin-Karp, which relies on a sliding-window hash.
This is the main reason text-oriented functions, such as sub in str, str.replace, or re, will be much faster than custom numpy code.
If you truly need this function to be integrated with numpy, you can always write an extension, but it's not easy :)
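The transcript above is Python 2; on Python 3, tostring/fromstring are deprecated and str literals can't be used as byte buffers. A minimal Python 3 flavored sketch of the same trick (my adaptation, not from the answer):

import numpy as np

A = np.array([1, 2, 3, 2, 3, 1, 2, 1, 3], dtype=np.int8)
B = np.array([2, 3, 1, 2], dtype=np.int8)

# tobytes/frombuffer are the Python 3 spellings of tostring/fromstring
result = np.frombuffer(A.tobytes().replace(B.tobytes(), b""), dtype=A.dtype)
print(result)  # [1 2 3 1 3]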

Get bit at position n from all elements in ndarray in python

I have a 3D array of int32. I would like to transform each item in the array into its corresponding bit value at the "n"th position. My current approach is to loop through the whole array, but I think it can be done much more efficiently.
for z in range(0, dim[2]):
    for y in range(0, dim[1]):
        for x in range(0, dim[0]):
            byte = '{0:032b}'.format(array[z][y][x])
            array[z][y][x] = (int(byte, 2) >> n) & 1
Looking forward to your answers.
If you are dealing with large arrays, you are better off using numpy. Applying bitwise operations on a numpy array is much faster than applying them on python lists.
import numpy as np
a = np.random.randint(1, 65, (2, 2, 2))
print a
Out[12]:
array([[[37, 46],
        [47, 34]],

       [[ 3, 15],
        [44, 57]]])

print (a >> 1) & 1
Out[16]:
array([[[0, 1],
        [1, 1]],

       [[1, 1],
        [0, 0]]])
Unless there is an intrinsic relation between the different points, you have no choice other than to loop over them to discover their current values. So the best you can do will always be O(n^3).
What I don't get, however, is why you go through the hassle of converting a number to a 32-bit string and then back to an int.
If you want to check whether the nth bit of a number is set, you would do the following:
power_n = 1 << (n - 1)
for z in xrange(0, dim[2]):
    for y in xrange(0, dim[1]):
        for x in xrange(0, dim[0]):
            array[z][y][x] = 0 if array[z][y][x] & power_n == 0 else 1
Note that in this example, I'm assuming that n is 1-indexed (the first bit is at n=1).
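Combining both answers, a hedged sketch of the fully vectorized replacement for the triple loop (my own illustration; the array shape and n are made up):

import numpy as np

# hypothetical 3D int32 data and bit position (0-indexed here)
array = np.random.randint(0, 2**31 - 1, size=(4, 4, 4), dtype=np.int32)
n = 5

# one vectorized expression replaces the three nested loops
bits = (array >> n) & 1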
