I have a numpy array representing an image. I want to zero out all indexes that are below a certain row in each column (based on external data). I can't seem to figure out how to slice/broadcast/arrange the data to do this "the numpy way".
import numpy as np

def first_nonzero(arr, axis, invalid_val=-1):
    mask = arr!=0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)
# Find first non-zero pixels in a processed image
# Note, I might have my axes switched here... I'm not sure.
rows_to_zero = first_nonzero(processed_image, 0, processed_image.shape[1])
# zero out data in image below the rows found
# This is the part I'm stuck on.
image[:, :rows_to_zero, :] = 0 # How can I slice along an array of indexes?
# Or in plain python, I'm trying to do this:
for x in range(image.shape[1]):  # x indexes columns
    for y in range(rows_to_zero[x], image.shape[0]):  # y indexes rows
        image[y, x] = 0
Create a mask leveraging broadcasting and assign -
mask = rows_to_zero <= np.arange(image.shape[0])[:,None]
image[mask] = 0
Or multiply by the inverted mask: image *= ~mask.
Sample run to showcase mask setup -
In [56]: processed_image
Out[56]:
array([[1, 0, 1, 0],
[1, 0, 1, 1],
[0, 1, 1, 0],
[0, 1, 0, 1],
[1, 1, 1, 1],
[0, 1, 0, 1]])
In [57]: rows_to_zero
Out[57]: array([0, 2, 0, 1])
In [58]: rows_to_zero <= np.arange(processed_image.shape[0])[:,None]
Out[58]:
array([[ True, False, True, False],
[ True, False, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
Also, for working on a per-column basis, I think you meant:
rows_to_zero = first_nonzero(processed_image, 0, processed_image.shape[0]-1)
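Putting the per-column pieces together, here is a minimal end-to-end sketch. It assumes image and processed_image share their first two dimensions (rows, columns), with image optionally carrying a trailing channel axis; the helper name zero_from_first_nonzero is hypothetical. As a further assumption, a column with no non-zero pixel gets invalid_val = n_rows so nothing in it is zeroed (with shape[0]-1 as above, such a column would still have its last row zeroed).

```python
import numpy as np


def first_nonzero(arr, axis, invalid_val=-1):
    # Index of the first non-zero element along `axis`; invalid_val where there is none
    mask = arr != 0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)


def zero_from_first_nonzero(image, processed_image):
    """Hypothetical helper: return a copy of image with image[r, c] zeroed for every
    row r at or below the first non-zero row of column c in processed_image."""
    n_rows = processed_image.shape[0]
    # Assumption: a column with no non-zero pixel maps to n_rows, so it is left untouched
    rows_to_zero = first_nonzero(processed_image, 0, n_rows)
    # (n_rows, 1) compared with (n_cols,) broadcasts to an (n_rows, n_cols) mask
    mask = rows_to_zero <= np.arange(n_rows)[:, None]
    out = image.copy()
    # A 2d boolean mask on an (H, W, C) array zeroes all channels of the selected pixels
    out[mask] = 0
    return out
```

For the in-place multiply variant on an (H, W, C) image, the mask needs a trailing axis to broadcast: image *= ~mask[..., None].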
If you meant to zero out on a per-row basis, you would instead have the first non-zero index of each row; let's call it idx. Then do:
mask = idx[:,None] <= np.arange(image.shape[1])
image[mask] = 0
Sample run -
In [77]: processed_image
Out[77]:
array([[1, 0, 1, 0],
[1, 0, 1, 1],
[0, 1, 1, 0],
[0, 1, 0, 1],
[1, 1, 1, 1],
[0, 1, 0, 1]])
In [78]: idx = first_nonzero(processed_image, 1, processed_image.shape[1]-1)
In [79]: idx
Out[79]: array([0, 0, 1, 1, 0, 1])
In [80]: idx[:,None] <= np.arange(image.shape[1])
Out[80]:
array([[ True, True, True, True],
[ True, True, True, True],
[False, True, True, True],
[False, True, True, True],
[ True, True, True, True],
[False, True, True, True]], dtype=bool)
I have a NumPy array 'boolarr' of boolean type. I want to count the number of elements whose values are True in each row. Is there a NumPy or Python routine dedicated to this task?
For example, consider the code below:
>>> import numpy as np
>>> boolarr = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=bool)
>>> boolarr
array([[False, False, True],
[ True, False, True],
[ True, False, True]], dtype=bool)
The count of each row would give the following results:
1
2
2
In [48]: boolarr = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=bool)
In [49]: boolarr
Out[49]:
array([[False, False, True],
[ True, False, True],
[ True, False, True]])
Just use sum:
In [50]: np.sum(boolarr, axis=1)
Out[50]: array([1, 2, 2])
True values count as 1 when summed.
Or:
In [54]: np.count_nonzero(boolarr, axis=1)
Out[54]: array([1, 2, 2])
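The same calls work along other axes too. A short sketch, reusing the boolarr from the question, of per-row, per-column and overall counts, plus the fraction of True values per row via mean (which also treats True as 1):

```python
import numpy as np

boolarr = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=bool)

per_row = np.count_nonzero(boolarr, axis=1)   # True values in each row
per_col = np.count_nonzero(boolarr, axis=0)   # True values in each column
total = np.count_nonzero(boolarr)             # True values in the whole array
frac_per_row = boolarr.mean(axis=1)           # share of True values in each row
```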
I am using Python and numpy, and I have a couple of numpy arrays of the same shape that I am trying to combine into a union. These arrays contain only 0 and 1, and basically I want to merge them into a new array using the OR operation. So, I do the following:
import numpy as np
segs = list()
a = np.ones((10, 10)).astype('uint8')
b = np.zeros((10, 10)).astype('uint8')
segs.append(a)
segs.append(b)
mask = np.asarray([any(tup) for tup in zip(*segs)]).astype('uint8')
With the last statement I get the error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I use np.any, somehow my array shape is now just (10,). How can I create this merge without explicitly looping through the arrays?
EDIT
mask = np.asarray([any(tup) for tup in zip(segs)]).astype('uint8')
also results in the same error.
Your segs is a list of 2 arrays:
In [25]: segs = [np.ones((3,6),'uint8'), np.zeros((3,6),'uint8')]
In [26]: [tup for tup in zip(*segs)]
Out[26]:
[(array([1, 1, 1, 1, 1, 1], dtype=uint8),
array([0, 0, 0, 0, 0, 0], dtype=uint8)),
(array([1, 1, 1, 1, 1, 1], dtype=uint8),
array([0, 0, 0, 0, 0, 0], dtype=uint8)),
(array([1, 1, 1, 1, 1, 1], dtype=uint8),
array([0, 0, 0, 0, 0, 0], dtype=uint8))]
The zip produces tuples of 1d arrays (pairing rows of the two arrays). Python's any applied to arrays raises the ambiguity error; the same is true of other Python logical operations like if, or, etc., which expect a scalar True/False.
You tried np.any: that turns the tuple of arrays into a 2d array, but without an axis parameter it works on the flattened version and returns a scalar True/False (one per tuple, which is why your result had shape (10,)). With an axis parameter we can apply this any across rows:
In [27]: [np.any(tup, axis=0) for tup in zip(*segs)]
Out[27]:
[array([ True, True, True, True, True, True]),
array([ True, True, True, True, True, True]),
array([ True, True, True, True, True, True])]
Using the logical_or ufunc as suggested in a comment:
In [31]: np.logical_or(segs[0],segs[1])
Out[31]:
array([[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True]])
In [32]: np.logical_or.reduce(segs)
Out[32]:
array([[ True, True, True, True, True, True],
[ True, True, True, True, True, True],
[ True, True, True, True, True, True]])
Using the '|' operator isn't quite the same:
In [33]: segs[0] | segs[1]
Out[33]:
array([[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1]], dtype=uint8)
It uses the segs[0].__or__(segs[1]) method, which for integer arrays is np.bitwise_or: it ORs the bits of each element and keeps the uint8 dtype instead of returning bool. For arrays containing only 0 and 1, the values match the logical OR (and the element-wise max).
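For the original goal (an OR-merge of any number of equally shaped 0/1 uint8 masks, returned as uint8), a minimal sketch; merge_masks is a hypothetical name, and it assumes every array in segs has the same shape:

```python
import numpy as np


def merge_masks(segs):
    # Stack the k equally shaped masks along a new leading axis: shape (k, ...)
    stacked = np.stack(segs)
    # OR across that axis (any non-zero counts as True), then back to 0/1 uint8
    return np.any(stacked, axis=0).astype(np.uint8)


a = np.ones((10, 10), dtype=np.uint8)
b = np.zeros((10, 10), dtype=np.uint8)
mask = merge_masks([a, b])
```

np.logical_or.reduce(segs).astype(np.uint8) gives the same result; np.stack just makes the new axis explicit.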
Beginner with Python here. I'm having trouble calculating the binary pairwise Hamming distance matrix between the rows of an input matrix using only the numpy library. I'm supposed to avoid loops and use vectorization. If, for instance, I have something like:
[ 1, 0, 0, 1, 1, 0]
[ 1, 0, 0, 0, 0, 0]
[ 1, 1, 1, 1, 0, 0]
The matrix should be something like:
[ 0, 2, 3]
[ 2, 0, 3]
[ 3, 3, 0]
i.e., if the original matrix is A and the Hamming distance matrix is B, then B[0,1] = hamming_distance(A[0], A[1]). In this case the answer is 2, as the two rows differ in only two elements.
So far my code is something like this:
def compute_HammingDistance(X):
    hammingDistanceMatrix = np.zeros(shape = (len(X), len(X)))
    hammingDistanceMatrix = np.count_nonzero ((X[:,:,None] != X[:,:,None].T))
    return hammingDistanceMatrix
However, it seems to just return a scalar value instead of the intended matrix. I know I'm probably doing something wrong with the array/vector broadcasting, but I can't figure out how to fix it. I've tried using np.sum instead of np.count_nonzero, but they all gave me pretty much the same thing.
Try this approach: create a new axis at position 1, then broadcast and count the Trues (non-zeros) with sum:
(arr[:, None, :] != arr).sum(2)
# array([[0, 2, 3],
# [2, 0, 3],
# [3, 3, 0]])
def compute_HammingDistance(X):
    return (X[:, None, :] != X).sum(2)
Explanation:
1) Create a 3d array which has shape (3,1,6)
arr[:, None, :]
#array([[[1, 0, 0, 1, 1, 0]],
# [[1, 0, 0, 0, 0, 0]],
# [[1, 1, 1, 1, 0, 0]]])
2) This is a 2d array with shape (3, 6)
arr
#array([[1, 0, 0, 1, 1, 0],
# [1, 0, 0, 0, 0, 0],
# [1, 1, 1, 1, 0, 0]])
3) This triggers broadcasting since the shapes don't match: arr (3, 6) is treated as (1, 3, 6) and broadcast along axis 0 of the 3d array arr[:, None, :], while each (1, 6) row of arr[:, None, :] is broadcast against the (3, 6) array. The result has shape (3, 3, 6), with element [i, j, k] equal to arr[i, k] != arr[j, k], so together the two broadcasting steps make a Cartesian (every row against every row) comparison of the original array.
arr[:, None, :] != arr
#array([[[False, False, False, False, False, False],
# [False, False, False, True, True, False],
# [False, True, True, False, True, False]],
# [[False, False, False, True, True, False],
# [False, False, False, False, False, False],
# [False, True, True, True, False, False]],
# [[False, True, True, False, True, False],
# [False, True, True, True, False, False],
# [False, False, False, False, False, False]]], dtype=bool)
4) Summing along the third axis counts how many elements are not equal (i.e., the Trues), which gives the Hamming distance.
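For comparison, the question's own attempt was close: X[:, :, None] has shape (n, m, 1) and X[:, :, None].T has shape (1, m, n), so the comparison already broadcasts to (n, m, n) with element [i, k, j] equal to X[i, k] != X[j, k]. Without an axis, count_nonzero counts over the whole array and returns a scalar; counting along axis 1 gives the matrix. A sketch of that fix, keeping the function name from the question:

```python
import numpy as np


def compute_HammingDistance(X):
    # (n, m, 1) vs (1, m, n) broadcasts to (n, m, n); entry [i, k, j] is X[i, k] != X[j, k]
    diff = X[:, :, None] != X[:, :, None].T
    # Count mismatches over the feature axis (axis 1) instead of over the whole array
    return np.count_nonzero(diff, axis=1)
```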
For reasons I do not understand, this
(2 * np.inner(a-0.5, 0.5-a) + a.shape[1] / 2)
appears to be much faster than @Psidom's for larger arrays:
import numpy as np
from timeit import timeit
a = np.random.randint(0,2,(100,1000))
timeit(lambda: (a[:, None, :] != a).sum(2), number=100)
# 2.297890231013298
timeit(lambda: (2 * np.inner(a-0.5, 0.5-a) + a.shape[1] / 2), number=100)
# 0.10616962902713567
Psidom's is a bit faster for the very small example:
a
# array([[1, 0, 0, 1, 1, 0],
# [1, 0, 0, 0, 0, 0],
# [1, 1, 1, 1, 0, 0]])
timeit(lambda: (a[:, None, :] != a).sum(2), number=100)
# 0.0004370050155557692
timeit(lambda: (2 * np.inner(a-0.5, 0.5-a) + a.shape[1] / 2), number=100)
# 0.00068191799800843
Update
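Part of the reason appears to be that floats are faster than other dtypes here (NumPy can hand floating-point dot products to an optimized BLAS routine, which it does not do for integer dtypes):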
timeit(lambda: (0.5 * np.inner(2*a-1, 1-2*a) + a.shape[1] / 2), number=100)
# 0.7315902590053156
timeit(lambda: (0.5 * np.inner(2.0*a-1, 1-2.0*a) + a.shape[1] / 2), number=100)
# 0.12021801102673635
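The inner-product formula works because every entry is 0 or 1: (x - 0.5)(0.5 - y) is -0.25 when x == y and +0.25 when x != y. For two rows of length n that differ in d positions, the inner product is therefore d/4 - (n - d)/4 = d/2 - n/4, and 2 * (d/2 - n/4) + n/2 = d. A sketch that checks this against the broadcasting version (the function names are hypothetical):

```python
import numpy as np


def hamming_broadcast(a):
    # Builds an (n, n, m) boolean temporary, then counts mismatches per pair of rows
    return (a[:, None, :] != a).sum(2)


def hamming_inner(a):
    # a - 0.5 is a float array, so np.inner runs as a float matrix product (no (n, n, m) temporary)
    n_cols = a.shape[1]
    d = 2 * np.inner(a - 0.5, 0.5 - a) + n_cols / 2
    # Every term is a multiple of 0.25, so d should hold exact integers; rint is a safeguard
    return np.rint(d).astype(int)
```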
I'm wondering if there is a simple, built-in function in Python / NumPy for converting an integer to an array/list of booleans corresponding to the bitwise interpretation of the number?
e.g.:
x = 5  # i.e. 101 in binary
print(FUNCTION(x))
and then I'd like it to return:
[True, False, True]
or ideally, with padding to always return 8 boolean values (i.e. one full byte):
[False, False, False, False, False, True, False, True]
You can use numpy's unpackbits.
From the docs (http://docs.scipy.org/doc/numpy/reference/generated/numpy.unpackbits.html)
>>> a = np.array([[2], [7], [23]], dtype=np.uint8)
>>> a
array([[ 2],
[ 7],
[23]], dtype=uint8)
>>> b = np.unpackbits(a, axis=1)
>>> b
array([[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 0, 1, 1, 1]], dtype=uint8)
To get a bool array:
In [49]: np.unpackbits(np.array([1],dtype="uint8")).astype("bool")
Out[49]: array([False, False, False, False, False, False, False, True], dtype=bool)
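unpackbits only works on uint8 data, so an integer wider than one byte can first be stored big-endian and viewed as bytes. A sketch under that assumption (int_to_bool_bits and its nbytes parameter are hypothetical; x must be non-negative and fit in nbytes bytes, with nbytes one of 1, 2, 4 or 8):

```python
import numpy as np


def int_to_bool_bits(x, nbytes=1):
    # Big-endian unsigned dtype ('>u1', '>u2', '>u4' or '>u8') puts the most significant byte first
    big_endian = np.dtype(f">u{nbytes}")
    raw = np.array([x], dtype=big_endian).view(np.uint8)
    # unpackbits expands each byte most-significant bit first (its default bitorder)
    return np.unpackbits(raw).astype(bool)


print(int_to_bool_bits(5).tolist())
# [False, False, False, False, False, True, False, True]
```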
Not a built-in method, but something to get you going (and fun to write):
>>> def int_to_binary_bool(num):
...     return [bool(int(i)) for i in "{0:08b}".format(num)]
...
>>> int_to_binary_bool(5)
[False, False, False, False, False, True, False, True]
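A variant of the same idea that skips the string round-trip and reads the bits with shifts and masks. The name int_to_bits_shift and the width parameter (8 bits by default) are hypothetical; note that bits of num above width are silently dropped, whereas the format-based version returns more than 8 values for numbers above 255:

```python
def int_to_bits_shift(num, width=8):
    # Walk from the most significant bit (width - 1) down to bit 0
    return [bool((num >> i) & 1) for i in range(width - 1, -1, -1)]
```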
I have two 2-dimensional numpy arrays, e.g.:
from numpy import array
a1 = array([["a","b"],["a","c"],["b","b"],["a","b"]])
a2 = array([["a","b"],["b","b"],["c","a"],["a","c"]])
What is the most elegant way of getting a matrix like this:
array([[1,0,0,0],
[0,0,0,1],
[0,1,0,0],
[1,0,0,0]])
Where element (i,j) is 1 if all(a1[i,:] == a2[j,:]) and otherwise 0
(everything involving two for loops I don't consider elegant)
>>> (a1[:,numpy.newaxis] == a2).all(axis=2)
array([[ True, False, False, False],
[False, False, False, True],
[False, True, False, False],
[ True, False, False, False]], dtype=bool)
If you really need integers, convert to int as last step:
>>> (a1[:,numpy.newaxis] == a2).all(axis=2).astype(int)
array([[1, 0, 0, 0],
[0, 0, 0, 1],
[0, 1, 0, 0],
[1, 0, 0, 0]])
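If the matching pairs themselves are needed rather than the full 0/1 matrix, np.argwhere on the boolean result lists the (i, j) indices. A small sketch with the arrays from the question; note the broadcast comparison builds an (n1, n2, m) temporary, so memory grows with the product of the two row counts:

```python
import numpy as np

a1 = np.array([["a", "b"], ["a", "c"], ["b", "b"], ["a", "b"]])
a2 = np.array([["a", "b"], ["b", "b"], ["c", "a"], ["a", "c"]])

# (4, 1, 2) == (4, 2) broadcasts to (4, 4, 2); rows match only if all columns match
matches = (a1[:, np.newaxis] == a2).all(axis=2)
pairs = np.argwhere(matches)  # one (i, j) row per matching pair, in row-major order
```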