How to make this kind of equality array fast (in numpy)? - python

I have two numpy arrays (2-dimensional), e.g.
a1 = array([["a","b"],["a","c"],["b","b"],["a","b"]])
a2 = array([["a","b"],["b","b"],["c","a"],["a","c"]])
What is the most elegant way of getting a matrix like this:
array([[1,0,0,0],
       [0,0,0,1],
       [0,1,0,0],
       [1,0,0,0]])
where element (i, j) is 1 if all(a1[i,:] == a2[j,:]) and 0 otherwise.
(I don't consider anything involving two for loops elegant.)

>>> (a1[:,numpy.newaxis] == a2).all(axis=2)
array([[ True, False, False, False],
       [False, False, False,  True],
       [False,  True, False, False],
       [ True, False, False, False]], dtype=bool)
If you really need integers, convert to int as the last step:
>>> (a1[:,numpy.newaxis] == a2).all(axis=2).astype(int)
array([[1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 1, 0, 0],
       [1, 0, 0, 0]])
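For reference, here is a minimal self-contained sketch of the broadcasting approach using the arrays from the question (the variable name matches is just illustrative):
import numpy as np

a1 = np.array([["a", "b"], ["a", "c"], ["b", "b"], ["a", "b"]])
a2 = np.array([["a", "b"], ["b", "b"], ["c", "a"], ["a", "c"]])

# a1[:, np.newaxis] has shape (4, 1, 2); comparing it with a2 of shape (4, 2)
# broadcasts to (4, 4, 2), and all(axis=2) requires every column to match.
matches = (a1[:, np.newaxis] == a2).all(axis=2).astype(int)
print(matches)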

Related

How to count the number of true elements in each row of a NumPy bool array

I have a NumPy array 'boolarr' of boolean type. I want to count the number of elements whose values are True in each row. Is there a NumPy or Python routine dedicated for this task?
For example, consider the code below:
>>> import numpy as np
>>> boolarr = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=np.bool)
>>> boolarr
array([[False, False,  True],
       [ True, False,  True],
       [ True, False,  True]], dtype=bool)
The count of each row would give the following results:
1
2
2
In [48]: boolarr = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=bool)
In [49]: boolarr
Out[49]:
array([[False, False,  True],
       [ True, False,  True],
       [ True, False,  True]])
Just use sum:
In [50]: np.sum(boolarr, axis=1)
Out[50]: array([1, 2, 2])
True counts as 1 when doing addition.
Or:
In [54]: np.count_nonzero(boolarr, axis=1)
Out[54]: array([1, 2, 2])
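sum is also available as an array method, and omitting the axis argument gives the total count; a small sketch using the boolarr defined above:
boolarr.sum(axis=1)        # per-row count of True values -> array([1, 2, 2])
boolarr.sum()              # total count of True values -> 5
np.count_nonzero(boolarr)  # same total via count_nonzero -> 5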

ValueError with union of two arrays using the OR operator

I am using Python and numpy. I have a couple of numpy arrays of the same shape, and I am trying to create a union of these arrays. The arrays contain only 0 and 1, and basically I want to merge them into a new array using the OR operation. So, I do the following:
import numpy as np
segs = list()
a = np.ones((10, 10)).astype('uint8')
b = np.zeros((10, 10)).astype('uint8')
segs.append(a)
segs.append(b)
mask = np.asarray([any(tup) for tup in zip(*segs)]).astype('uint8')
With the last statement I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I use np.any, somehow my array shape is now just (10,). How can I create this merge without explicitly looping through the arrays?
EDIT
mask = np.asarray([any(tup) for tup in zip(segs)]).astype('uint8')
also results in the same error.
Your segs is a list of 2 arrays:
In [25]: segs = [np.ones((3,6),'uint8'), np.zeros((3,6),'uint8')]
In [26]: [tup for tup in zip(*segs)]
Out[26]:
[(array([1, 1, 1, 1, 1, 1], dtype=uint8),
  array([0, 0, 0, 0, 0, 0], dtype=uint8)),
 (array([1, 1, 1, 1, 1, 1], dtype=uint8),
  array([0, 0, 0, 0, 0, 0], dtype=uint8)),
 (array([1, 1, 1, 1, 1, 1], dtype=uint8),
  array([0, 0, 0, 0, 0, 0], dtype=uint8))]
The zip produces tuples of 1d arrays (pairing rows of the two arrays). Python's any applied to a multi-element array gives the ambiguity error; the same is true for other Python boolean contexts such as if, or, and and, which expect a scalar True/False.
You tried np.any - that turns the tuple of arrays into a 2d array. But without an axis parameter it works on the flattened version, returning a scalar True/False. With an axis parameter we can apply this any across rows:
In [27]: [np.any(tup, axis=0) for tup in zip(*segs)]
Out[27]:
[array([ True,  True,  True,  True,  True,  True]),
 array([ True,  True,  True,  True,  True,  True]),
 array([ True,  True,  True,  True,  True,  True])]
Using the logical_or ufunc as suggested in a comment:
In [31]: np.logical_or(segs[0],segs[1])
Out[31]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]])
In [32]: np.logical_or.reduce(segs)
Out[32]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]])
Using the '|' operator isn't quite the same:
In [33]: segs[0] | segs[1]
Out[33]:
array([[1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1]], dtype=uint8)
It uses the segs[0].__or__(segs[1]) method, i.e. an elementwise bitwise OR. Applied to uint8 (or other numeric values) this is a bitwise rather than a logical operation, so the result keeps the uint8 dtype; for arrays containing only 0 and 1 the values come out the same as the logical union.
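Putting that together for the original question, a short sketch of one way to get the union back in uint8 form (segs as built in the question):
# Elementwise OR across all arrays in the list, then back to 0/1 uint8
mask = np.logical_or.reduce(segs).astype('uint8')

# For two 0/1 uint8 arrays the bitwise operator gives the same values
# and already keeps the uint8 dtype
mask_alt = segs[0] | segs[1]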

How to zero out row slice in 2 dimensional array in numpy?

I have a numpy array representing an image. I want to zero out all indices that are below a certain row in each column (based on external data). I can't seem to figure out how to slice/broadcast/arrange the data to do this "the numpy way".
def first_nonzero(arr, axis, invalid_val=-1):
    mask = arr != 0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)

# Find first non-zero pixels in a processed image
# Note, I might have my axes switched here... I'm not sure.
rows_to_zero = first_nonzero(processed_image, 0, processed_image.shape[1])

# zero out data in image below the rows found
# This is the part I'm stuck on.
image[:, :rows_to_zero, :] = 0  # How can I slice along an array of indexes?

# Or in plain python, I'm trying to do this:
for x in range(image.shape[0]):
    for y in range(rows_to_zero[x], image.shape[1]):
        image[x, y] = 0
Create a mask leveraging broadcasting and assign -
mask = rows_to_zero <= np.arange(image.shape[0])[:,None]
image[mask] = 0
Or multiply by the inverted mask: image *= ~mask.
Sample run to showcase mask setup -
In [56]: processed_image
Out[56]:
array([[1, 0, 1, 0],
       [1, 0, 1, 1],
       [0, 1, 1, 0],
       [0, 1, 0, 1],
       [1, 1, 1, 1],
       [0, 1, 0, 1]])
In [57]: rows_to_zero
Out[57]: array([0, 2, 0, 1])
In [58]: rows_to_zero <= np.arange(processed_image.shape[0])[:,None]
Out[58]:
array([[ True, False,  True, False],
       [ True, False,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)
Also, for setting on a per-column basis, I think you meant:
rows_to_zero = first_nonzero(processed_image, 0, processed_image.shape[0]-1)
If you meant to zero out on a per-row basis, you would have the first non-zero indices per row; let's call it idx. Then do -
mask = idx[:,None] <= np.arange(image.shape[1])
image[mask] = 0
Sample run -
In [77]: processed_image
Out[77]:
array([[1, 0, 1, 0],
       [1, 0, 1, 1],
       [0, 1, 1, 0],
       [0, 1, 0, 1],
       [1, 1, 1, 1],
       [0, 1, 0, 1]])
In [78]: idx = first_nonzero(processed_image, 1, processed_image.shape[1]-1)
In [79]: idx
Out[79]: array([0, 0, 1, 1, 0, 1])
In [80]: idx[:,None] <= np.arange(image.shape[1])
Out[80]:
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [False,  True,  True,  True],
       [False,  True,  True,  True],
       [ True,  True,  True,  True],
       [False,  True,  True,  True]], dtype=bool)
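An end-to-end sketch of the per-column case, tying the pieces together (first_nonzero and processed_image as defined above; the image is assumed here to be a 2D array of the same shape):
import numpy as np

image = processed_image.copy()

# First non-zero row index per column; all-zero columns fall back to the
# last row index, as suggested above.
rows_to_zero = first_nonzero(image, 0, image.shape[0] - 1)

# Compare the per-column thresholds (n_cols,) against a column of row
# indices (n_rows, 1); broadcasting yields the full (n_rows, n_cols) mask.
mask = rows_to_zero <= np.arange(image.shape[0])[:, None]
image[mask] = 0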

Set values in matrix to 1 where index matches value

I have matrix:
[[0,1,0,2,1,0,1,2], [0,1,0,2,1,0,1,2], [0,1,0,2,1,0,1,2]]
and I would like to get matrix:
[[1,0,1,0,0,1,0,0], [0,1,0,0,1,0,1,0], [0,0,0,1,0,0,0,1]]
To explain: the new matrix should have ones where the row index matches the value. In the first row all 0s should become 1 and all other values 0, in the second row all 1s should become 1 and all other values 0, and so on...
Thanks.
You can do this easily if you take advantage of broadcasting (assuming that x is a numpy array; if not, you can convert it to one):
>>> np.arange(len(x))
array([0, 1, 2])
>>> np.arange(len(x))[:,None]
array([[0],
       [1],
       [2]])
The [:,None] (or [:,np.newaxis]) adds a singleton axis, so we have a 2D object of shape (3, 1) instead of a 1D object of shape (3,). Then we can compare:
>>> x == np.arange(len(x))[:,None]
array([[ True, False,  True, False, False,  True, False, False],
       [False,  True, False, False,  True, False,  True, False],
       [False, False, False,  True, False, False, False,  True]], dtype=bool)
where the 0, 1, and 2 get compared with each element in each row.
After this, we have:
>>> (x == np.arange(len(x))[:,None]).astype(int)
array([[1, 0, 1, 0, 0, 1, 0, 0],
       [0, 1, 0, 0, 1, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0, 1]])
Enumerate is the way to go.
lst = [[0,1,0,2,1,0,1,2], [0,1,0,2,1,0,1,2], [0,1,0,2,1,0,1,2]]
lst = [[1 if i == j else 0 for j in l] for i, l in enumerate(lst)]
This should work out:
matrix = [[0,1,0,2,1,0,1,2], [0,1,0,2,1,0,1,2], [0,1,0,2,1,0,1,2]]
def fun(a, b):
    if a == b:
        return 1
    else:
        return 0

for i in range(len(matrix)):
    matrix[i] = [fun(k, i) for k in matrix[i]]
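For completeness, the broadcasting answer wrapped in a small function and run against the matrix from the question (the name indicator_rows is just illustrative):
import numpy as np

def indicator_rows(matrix):
    # Compare each element with its row index; the (n_rows, 1) index column
    # broadcasts against the (n_rows, n_cols) matrix.
    x = np.asarray(matrix)
    return (x == np.arange(len(x))[:, None]).astype(int)

m = [[0, 1, 0, 2, 1, 0, 1, 2],
     [0, 1, 0, 2, 1, 0, 1, 2],
     [0, 1, 0, 2, 1, 0, 1, 2]]
print(indicator_rows(m))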

Built-in function in numpy to interpret an integer to an array of boolean values in a bitwise manner?

I'm wondering if there is a simple, built-in function in Python / NumPy for converting an integer to an array/list of booleans, corresponding to a bitwise interpretation of the number?
e.g:
x = 5 # i.e. 101 in binary
print FUNCTION(x)
and then I'd like returned:
[True, False, True]
or ideally, with padding to always return 8 boolean values (i.e. one full byte):
[False, False, False, False, False, True, False, True]
Thanks
You can use numpy's unpackbits.
From the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unpackbits.html)
>>> a = np.array([[2], [7], [23]], dtype=np.uint8)
>>> a
array([[ 2],
       [ 7],
       [23]], dtype=uint8)
>>> b = np.unpackbits(a, axis=1)
>>> b
array([[0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 1, 0, 1, 1, 1]], dtype=uint8)
To get to a bool array:
In [49]: np.unpackbits(np.array([1],dtype="uint8")).astype("bool")
Out[49]: array([False, False, False, False, False, False, False, True], dtype=bool)
Not a built-in method, but something to get you going (and fun to write):
>>> def int_to_binary_bool(num):
...     return [bool(int(i)) for i in "{0:08b}".format(num)]
>>> int_to_binary_bool(5)
[False, False, False, False, False, True, False, True]
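If you want the exact Python list from the question, a short sketch combining the unpackbits answer with a list conversion (this assumes the value fits in one byte, i.e. 0-255):
import numpy as np

x = 5
bits = np.unpackbits(np.array([x], dtype=np.uint8)).astype(bool)
print(bits.tolist())
# [False, False, False, False, False, True, False, True]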
