I am using Python and NumPy. I have a couple of numpy arrays of the same shape that contain only 0s and 1s, and I want to merge them into a new array as a union, using the OR operation. So I do the following:
import numpy as np
segs = list()
a = np.ones((10, 10)).astype('uint8')
b = np.zeros((10, 10)).astype('uint8')
segs.append(a)
segs.append(b)
mask = np.asarray([any(tup) for tup in zip(*segs)]).astype('uint8')
With the last statement I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I use np.any, somehow my array shape is now just (10,). How can I create this merge without explicitly looping through the arrays?
EDIT
mask = np.asarray([any(tup) for tup in zip(segs)]).astype('uint8')
also results in the same error.
Your segs is a list of 2 arrays:
In [25]: segs = [np.ones((3,6),'uint8'), np.zeros((3,6),'uint8')]
In [26]: [tup for tup in zip(*segs)]
Out[26]:
[(array([1, 1, 1, 1, 1, 1], dtype=uint8),
array([0, 0, 0, 0, 0, 0], dtype=uint8)),
(array([1, 1, 1, 1, 1, 1], dtype=uint8),
array([0, 0, 0, 0, 0, 0], dtype=uint8)),
(array([1, 1, 1, 1, 1, 1], dtype=uint8),
array([0, 0, 0, 0, 0, 0], dtype=uint8))]
The zip produces tuples of 1d arrays (pairing rows of the two arrays). Python's any applied to an array raises the ambiguity error - the same is true of other Python boolean contexts such as if, or, etc., which expect a single scalar True/False.
You tried np.any - that turns each tuple of arrays into a 2d array. But without an axis parameter it works on the flattened version, returning a single scalar True/False - which is why your result collapsed to shape (10,). With an axis parameter we can apply the any across rows:
In [27]: [np.any(tup, axis=0) for tup in zip(*segs)]
Out[27]:
[array([ True, True, True, True, True, True]),
array([ True, True, True, True, True, True]),
array([ True, True, True, True, True, True])]
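In fact the zip and list comprehension can be dropped entirely: np.any accepts a list of same-shape arrays and can reduce across them in one call. A sketch using the original 10x10 setup:

```python
import numpy as np

# two same-shape masks containing only 0s and 1s
a = np.ones((10, 10), dtype='uint8')
b = np.zeros((10, 10), dtype='uint8')
segs = [a, b]

# np.any stacks the list into a (2, 10, 10) array;
# reducing over axis 0 ORs the masks elementwise
mask = np.any(segs, axis=0).astype('uint8')
print(mask.shape)  # (10, 10)
```

The same one-liner works for any number of masks in segs, not just two.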
Using the logical_or ufunc as suggested in a comment:
In [31]: np.logical_or(segs[0],segs[1])
Out[31]:
array([[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True]])
In [32]: np.logical_or.reduce(segs)
Out[32]:
array([[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True]])
Using the '|' operator isn't quite the same:
In [33]: segs[0] | segs[1]
Out[33]:
array([[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1]], dtype=uint8)
It uses the segs[0].__or__(segs[1]) method, which for integer dtypes such as uint8 is a bitwise OR (np.bitwise_or), preserving the dtype rather than returning bools. Applied to bool arrays it behaves like logical_or. For arrays containing only 0 and 1 the two agree in value, which is why the result here also looks like an elementwise max.
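A quick check confirms that `|` is a bitwise OR rather than a max - for values other than 0 and 1 the two diverge (a small sketch, assuming nothing beyond NumPy itself):

```python
import numpy as np

x = np.array([2, 5, 0], dtype='uint8')
y = np.array([5, 3, 0], dtype='uint8')

print(x | y)                # [7 7 0] -- bitwise: 0b010 | 0b101 = 0b111 = 7
print(np.maximum(x, y))     # [5 5 0] -- a true elementwise max differs
print(np.logical_or(x, y))  # [ True  True False] -- bool, any nonzero is True
```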
Related
I have a numpy array representing an image. I want to zero out all indexes that are below a certain row in each column (based on external data). I can't seem to figure out how to slice/broadcast/arrange the data to do this "the numpy way".
def first_nonzero(arr, axis, invalid_val=-1):
    mask = arr != 0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)

# Find first non-zero pixels in a processed image
# Note, I might have my axes switched here... I'm not sure.
rows_to_zero = first_nonzero(processed_image, 0, processed_image.shape[1])

# zero out data in image below the rows found
# This is the part I'm stuck on.
image[:, :rows_to_zero, :] = 0  # How can I slice along an array of indexes?

# Or in plain python, I'm trying to do this:
for x in range(image.shape[0]):
    for y in range(rows_to_zero, image.shape[1]):
        image[x, y] = 0
Create a mask leveraging broadcasting and assign -
mask = rows_to_zero <= np.arange(image.shape[0])[:,None]
image[mask] = 0
Or multiply with the inverted mask: image *= ~mask.
Sample run to showcase mask setup -
In [56]: processed_image
Out[56]:
array([[1, 0, 1, 0],
[1, 0, 1, 1],
[0, 1, 1, 0],
[0, 1, 0, 1],
[1, 1, 1, 1],
[0, 1, 0, 1]])
In [57]: rows_to_zero
Out[57]: array([0, 2, 0, 1])
In [58]: rows_to_zero <= np.arange(processed_image.shape[0])[:,None]
Out[58]:
array([[ True, False, True, False],
[ True, False, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
Also, for setting on a per-column basis, I think you meant:
rows_to_zero = first_nonzero(processed_image, 0, processed_image.shape[0]-1)
If you meant to zero out on a per-row basis, you would have the first non-zero indices per row, let's call it idx. So, then do -
mask = idx[:,None] <= np.arange(image.shape[1])
image[mask] = 0
Sample run -
In [77]: processed_image
Out[77]:
array([[1, 0, 1, 0],
[1, 0, 1, 1],
[0, 1, 1, 0],
[0, 1, 0, 1],
[1, 1, 1, 1],
[0, 1, 0, 1]])
In [78]: idx = first_nonzero(processed_image, 1, processed_image.shape[1]-1)
In [79]: idx
Out[79]: array([0, 0, 1, 1, 0, 1])
In [80]: idx[:,None] <= np.arange(image.shape[1])
Out[80]:
array([[ True, True, True, True],
[ True, True, True, True],
[False, True, True, True],
[False, True, True, True],
[ True, True, True, True],
[False, True, True, True]], dtype=bool)
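Tying the per-row pieces together end to end - a sketch where the `image` of running integers is made up here purely so the zeroing is visible:

```python
import numpy as np

def first_nonzero(arr, axis, invalid_val=-1):
    mask = arr != 0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)

processed_image = np.array([[1, 0, 1, 0],
                            [1, 0, 1, 1],
                            [0, 1, 1, 0],
                            [0, 1, 0, 1],
                            [1, 1, 1, 1],
                            [0, 1, 0, 1]])

# first non-zero column index per row
idx = first_nonzero(processed_image, 1, processed_image.shape[1] - 1)

# (6,1) <= (4,) broadcasts to a (6,4) mask; True from that column onward
mask = idx[:, None] <= np.arange(processed_image.shape[1])

# apply to a separate image of running integers so the effect is visible
image = np.arange(24).reshape(6, 4)
image[mask] = 0
print(image)
```

In each row, everything from the first non-zero column of processed_image onward is zeroed; only the entries before it survive.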
I have a numpy array with booleans:
bool_array.shape
Out[84]: (78, 8)
bool_array.dtype
Out[85]: dtype('bool')
And I would like to find the indices where the second dimension is True:
bool_array[30:35]
Out[87]:
array([[False, False, False, False, True, False, False, False],
[ True, False, False, False, True, False, False, False],
[False, False, False, False, False, True, False, False],
[ True, False, False, False, False, False, False, False],
[ True, False, False, False, False, False, False, False]], dtype=bool)
I have been using numpy.where to do this, but sometimes there are more than 1 indices along the second dimension with the True value.
I would like to find a way to obtain the same result as numpy.where but avoiding to have 2 indices from the same row:
np.where(bool_array)[0][30:35]
Out[88]: array([30, 31, 31, 32, 33])
I currently solve this by looping over the result of numpy.where, finding which entries are equal to the previous entry, and using numpy.delete to remove the unwanted indices.
I would like to know if there is a more directly way to obtain the kind of results that I want.
Notes:
- The rows of the boolean arrays that I use always have at least 1 True value.
- I don't care which one of the multiple True values remains, I only care to have just 1.
IIUC and given the fact that there is at least one TRUE element per row, you can simply use np.argmax along the second axis to select the first TRUE element along each row, like so -
col_idx = bool_array.argmax(1)
Sample run -
In [246]: bool_array
Out[246]:
array([[ True, True, True, True, False],
[False, False, True, True, False],
[ True, True, False, False, True],
[ True, True, False, False, True]], dtype=bool)
In [247]: np.where(bool_array)[0]
Out[247]: array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
In [248]: np.where(bool_array)[1]
Out[248]: array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
In [249]: bool_array.argmax(1)
Out[249]: array([0, 2, 0, 0])
Explanation -
Corresponding to the duplicates in the output of np.where(bool_array)[0], i.e.:
array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3])
we need to select just one entry per row from the output of np.where(bool_array)[1], i.e.:
array([0, 1, 2, 3, 2, 3, 0, 1, 4, 0, 1, 4])
Selecting the first True of each row with bool_array.argmax(1) picks the entries at positions 0, 4, 6 and 9 of that array, giving us:
array([0, 2, 0, 0])
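Since every row is guaranteed to contain a True, the matching row indices are simply np.arange(n_rows) - pairing them with argmax gives a duplicate-free equivalent of np.where. A sketch on the sample data above:

```python
import numpy as np

bool_array = np.array([[ True,  True,  True,  True, False],
                       [False, False,  True,  True, False],
                       [ True,  True, False, False,  True],
                       [ True,  True, False, False,  True]])

col_idx = bool_array.argmax(axis=1)       # first True in each row
row_idx = np.arange(bool_array.shape[0])  # exactly one entry per row

print(row_idx)  # [0 1 2 3]
print(col_idx)  # [0 2 0 0]

# sanity check: every selected position really is True
print(bool_array[row_idx, col_idx].all())  # True
```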
You could call np.unique on the resulting array, like so:
>>> np.where(bool_array)[0][30:35]
array([30, 31, 31, 32, 33])
>>> np.unique(np.where(bool_array)[0][30:35])
array([30, 31, 32, 33])
I'm wondering if there is a simple, built-in function in Python / Numpy for converting an integer datatype to an array/list of booleans, corresponding to a bitwise interpretation of the number please?
e.g:
x = 5 # i.e. 101 in binary
print FUNCTION(x)
and then I'd like returned:
[True, False, True]
or ideally, with padding to always return 8 boolean values (i.e. one full byte):
[False, False, False, False, False, True, False, True]
Thanks
You can use numpy's unpackbits.
From the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unpackbits.html)
>>> a = np.array([[2], [7], [23]], dtype=np.uint8)
>>> a
array([[ 2],
[ 7],
[23]], dtype=uint8)
>>> b = np.unpackbits(a, axis=1)
>>> b
array([[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 0, 1, 1, 1]], dtype=uint8)
To get to a bool array:
In [49]: np.unpackbits(np.array([1],dtype="uint8")).astype("bool")
Out[49]: array([False, False, False, False, False, False, False, True], dtype=bool)
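Note that unpackbits unpacks the most significant bit first, which is exactly the padded byte ordering the question asks for - e.g. for x = 5:

```python
import numpy as np

x = 5  # 0b00000101
bits = np.unpackbits(np.array([x], dtype=np.uint8)).astype(bool)

# convert to plain Python bools for a clean printed list
print([bool(b) for b in bits])
# [False, False, False, False, False, True, False, True]
```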
Not a built-in method, but something to get you going (and fun to write):
>>> def int_to_binary_bool(num):
...     return [bool(int(i)) for i in "{0:08b}".format(num)]
...
>>> int_to_binary_bool(5)
[False, False, False, False, False, True, False, True]
I have two NumPy arrays. For example:
arr1 = np.array(['a','b','a','c','c','b','a','d'])
arr2 = np.array(['a','b','c','d'])
My task is to create a list of indices into the arr2 array, one per element of arr1, marking where that element occurs in arr2.
The length of the desired list should equal len(arr1). For instance, in my case the correct answer is [0,1,0,2,2,1,0,3].
What is the short way to do this? Is it possible to use a list comprehension here?
I noticed that arr2 is sorted, is that by design? If so you can do:
arr1 = np.array(['a','b','a','c','c','b','a','d'])
arr2 = np.array(['a','b','c','d'])
arr2.searchsorted(arr1)
# array([0, 1, 0, 2, 2, 1, 0, 3])
As @JAB mentioned, you could use the sorter keyword of searchsorted when arr2 is not sorted:
arr2 = np.array(['d', 'c', 'b', 'a'])
sorter = arr2.argsort()
sorter[arr2.searchsorted(arr1, sorter=sorter)]
# array([3, 2, 3, 1, 1, 2, 3, 0])
This is an O(N*log(N)) method because of the argsort, but it should still be very fast for many use-cases.
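A quick way to convince yourself the unsorted case is right: indexing arr2 with the result should reproduce arr1. A small check reusing the arrays above:

```python
import numpy as np

arr1 = np.array(['a','b','a','c','c','b','a','d'])
arr2 = np.array(['d','c','b','a'])  # deliberately unsorted

sorter = arr2.argsort()
idx = sorter[arr2.searchsorted(arr1, sorter=sorter)]

print(idx)                        # [3 2 3 1 1 2 3 0]
print((arr2[idx] == arr1).all())  # True -- every index points at the right element
```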
Not sure if numpy has a method for this, but here is a builtin approach, which takes O(N) in time:
In [9]: lookup = {v:i for i, v in enumerate(arr2)}
In [10]: [lookup[v] for v in arr1]
Out[10]: [0, 1, 0, 2, 2, 1, 0, 3]
You can do it like this with NumPy using broadcasting; note, however, that if your arrays are large you can end up allocating a lot of memory for the intermediate result:
>>> import numpy as np
>>> arr1, arr2 = np.array(['a','b','a','c','c','b','a','d']), np.array(['a','b','c','d'])
>>> arr1 == arr2[:, None]
array([[ True, False, True, False, False, False, True, False],
[False, True, False, False, False, True, False, False],
[False, False, False, True, True, False, False, False],
[False, False, False, False, False, False, False, True]], dtype=bool)
>>> (arr1 == arr2[:, None]).argmax(axis=0)
array([0, 1, 0, 2, 2, 1, 0, 3])
>>>
Otherwise keep an eye on arraysetops in case someone adds a return_index parameter to intersect1d
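One more option worth knowing, with a caveat specific to this example: np.unique with return_inverse=True yields exactly this index list whenever arr2 happens to be the sorted unique values of arr1 (as it is here). Otherwise the inverse indexes into np.unique's own output, not into arr2:

```python
import numpy as np

arr1 = np.array(['a','b','a','c','c','b','a','d'])
arr2 = np.array(['a','b','c','d'])

uniques, inverse = np.unique(arr1, return_inverse=True)
print(uniques)  # ['a' 'b' 'c' 'd'] -- matches arr2 in this example
print(inverse)  # [0 1 0 2 2 1 0 3]
```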
I have two numpy arrays (2-dimensional), e.g.:
a1 = np.array([["a","b"],["a","c"],["b","b"],["a","b"]])
a2 = np.array([["a","b"],["b","b"],["c","a"],["a","c"]])
What is the most elegant way of getting a matrix like this:
array([[1,0,0,0],
[0,0,0,1],
[0,1,0,0],
[1,0,0,0]])
Where element (i,j) is 1 if all(a1[i,:] == a2[j,:]) and otherwise 0
(everything involving two for loops I don't consider elegant)
>>> (a1[:,numpy.newaxis] == a2).all(axis=2)
array([[ True, False, False, False],
[False, False, False, True],
[False, True, False, False],
[ True, False, False, False]], dtype=bool)
If you really need integers, convert to int as last step:
>>> (a1[:,numpy.newaxis] == a2).all(axis=2).astype(int)
array([[1, 0, 0, 0],
[0, 0, 0, 1],
[0, 1, 0, 0],
[1, 0, 0, 0]])