Finding max in numpy skipping some rows and columns - python

I want to find the row and column index of the max in a numpy matrix, but the max must not lie in a given set of rows or columns. That is, those rows and columns should be skipped while computing the max.
Example:
# finding max in numpy matrix
[row,col] = np.where(mat == mat.max())
But it should skip the rows in removed_rows=[] and the columns in removed_cols=[]
I don't want to create a new sub matrix for the computation.

Let a be the input array, and let rows_rem and cols_rem be the row and column indices to be skipped, respectively. One approach uses masking, like so -
m,n = a.shape
d0,d1 = np.ogrid[:m,:n]
a_masked = a*~(np.in1d(d0,rows_rem)[:,None] | np.in1d(d1,cols_rem))
max_row, max_col = np.where(a_masked == a_masked.max())
Sample run -
In [204]: # Inputs
     ...: a = np.random.randint(11,99,(4,5))
     ...: rows_rem = [1,3]
     ...: cols_rem = [1,2,4]
In [205]: a
Out[205]:
array([[36, 51, 72, 18, 31],
       [78, 42, 12, 71, 72],
       [38, 46, 42, 67, 12],
       [87, 56, 76, 14, 21]])
In [206]: a_masked
Out[206]:
array([[36,  0,  0, 18,  0],
       [ 0,  0,  0,  0,  0],
       [38,  0,  0, 67,  0],
       [ 0,  0,  0,  0,  0]])
In [207]: max_row, max_col
Out[207]: (array([2]), array([3]))
Please note that if there is more than one element with the same max value, we would get all of them in the output. So, if you want just one of those, say the first, use argmax instead, like so -
max_row, max_col = np.unravel_index(a_masked.argmax(),a.shape)
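Note also (my addition, not part of the answer above) that multiplying by the mask zeroes out the skipped cells, so this assumes the max of the kept cells is positive. A minimal sketch with np.ma that drops that assumption by masking instead of zeroing -
import numpy as np

a = np.random.randint(-99, 99, (4, 5))   # may contain negatives
rows_rem, cols_rem = [1, 3], [1, 2, 4]

m, n = a.shape
# True marks the cells to skip; masked cells are ignored, not zeroed
skip = np.in1d(np.arange(m), rows_rem)[:, None] | np.in1d(np.arange(n), cols_rem)
a_masked = np.ma.masked_array(a, mask=skip)
max_row, max_col = np.unravel_index(a_masked.argmax(), a.shape)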

remove_rows = [2,3]
remove_cols = [0,1]
a = np.random.randint(11,99,(4,5))
>>> a
array([[60, 86, 89, 66, 20],
       [77, 86, 78, 90, 44],
       [68, 57, 83, 48, 25],
       [30, 81, 42, 11, 63]])
Get the row and column indices that you are interested in by filtering out the indices you want removed:
r, c = a.shape
r = [x for x in range(r) if x not in remove_rows]
c = [x for x in range(c) if x not in remove_cols]
>>> r,c
([0, 1], [2, 3, 4])
Now r and c can be used for integer array indexing; numpy.ix_ helps with this.
>>> a[np.ix_(r,c)]
array([[89, 66, 20],
       [78, 90, 44]])
Tack on ndarray.max() to get the max value:
>>> a[np.ix_(r,c)].max()
90
Finally, use numpy.where to find where it is in the original array:
>>> row, col = np.where(a == a[np.ix_(r,c)].max())
>>> row, col
(array([1]), array([3]))
This method also works when the removed rows or columns are non-sequential.
For example:
remove_rows = [0,3]
remove_cols = [1,4]
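A related edge case (my note, not from the answer above): np.where compares against the whole original array, so if the max value also appears in an excluded row or column, that position gets reported too. A small sketch that instead maps the submatrix argmax back through r and c -
# assumes a, r, c as defined above
import numpy as np

sub = a[np.ix_(r, c)]
i, j = np.unravel_index(sub.argmax(), sub.shape)
row, col = r[i], c[j]   # position of the max in the original array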

Related

How to select elements on a given axis based on the values of another array

I am trying to solve the following problem in a more numpy-friendly way (without loops):
G is an NxM matrix filled with 0, 1 or 2
D is a 3xNxM matrix
We want an NxM matrix R with R[i,j] = D[k,i,j], where k = G[i,j].
A loop-based solution is:
def getVals(g, d):
    arr = np.zeros(g.shape)
    for row in range(g.shape[0]):
        for column in range(g.shape[1]):
            arr[row, column] = d[g[row, column], row, column]
    return arr
Try with ogrid and advanced indexing:
x,y = np.ogrid[:N,:M]
out = D[G, x[None], y[None]]
Test:
N,M=4,5
G = np.random.randint(0,3, (N,M))
D = np.random.rand(3,N,M)
np.allclose(getVals(G,D), D[G, x[None], y[None]])
# True
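A small shape note (my addition): the [None] insertions give the output a leading length-1 axis, i.e. shape (1, N, M). Since np.ogrid already returns x with shape (N, 1) and y with shape (1, M), indexing with them directly broadcasts to the expected (N, M) -
out = D[G, x, y]   # shape (N, M), no extra leading axis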
You could also use np.take_along_axis
Then you can simply extract your values along one specific axis:
# Example input data:
G = np.random.randint(0,3,(4,5)) # 4x5 array
D = np.random.randint(0,9,(3,4,5)) # 3x4x5 array
# Get the results:
R = np.take_along_axis(D,G[None,:],axis=0)
Since G must have the same number of dimensions as D, we simply add a new dimension to G with G[None,:].
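The output then keeps that inserted length-1 axis, so R has shape (1, N, M) here; indexing with [0] recovers the (N, M) result (my follow-up to the snippet above) -
R = np.take_along_axis(D, G[None,:], axis=0)[0]   # shape (4, 5)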
Here's my try (I assume g and d are NumPy ndarrays):
def getVals(g, d):
    m, n = g.shape
    # d[k,i,j] sits at flat index k*m*n + i*n + j in d.flatten()
    indexes = g.flatten()*m*n + np.arange(m*n)
    arr = d.flatten()[indexes].reshape(m, n)
    return arr
So if
d = [[[96, 89, 51, 40, 51],
      [31, 72, 39, 77, 33]],
     [[34, 11, 54, 86, 73],
      [12, 21, 74, 39, 14]],
     [[14, 91, 38, 77, 97],
      [44, 55, 93, 88, 55]]]
and
g = [[2, 1, 2, 1, 1],
     [0, 2, 0, 0, 2]]
then you are going to get
arr = [[14, 11, 38, 86, 73],
       [31, 55, 39, 77, 55]]
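As an aside (my addition, not from the answers above): this pick-one-plane-per-position pattern is also exactly what np.choose does, as long as the number of choices along the first axis stays under its limit of 32 -
import numpy as np

N, M = 4, 5
G = np.random.randint(0, 3, (N, M))
D = np.random.rand(3, N, M)
R = np.choose(G, D)   # R[i, j] == D[G[i, j], i, j]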

identifying sub-arrays in numpy

I have two two-dimensional arrays a and b (#columns of a <= #columns of b). I would like to find an efficient way of matching a row in array a to a contiguous part of a row in array b.
a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24],
              [ 28, 900]])
b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])
The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]). Row 0 in array a matches Row 0 in array b (first two positions). Row 1 in array a matches row 0 in array b (third and fourth positions).
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, for efficient patch extraction, and then compare those patches against each row of a, all of it in a vectorized manner. Then, get the matching indices with np.argwhere -
# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows
In [428]: w = view_as_windows(b,(1,a.shape[1]))
In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]:
array([[0, 0],
       [1, 0],
       [0, 1],
       [3, 1],
       [2, 2]])
Alternatively, we could get the indices in the order of the rows in a by pushing forward the first axis of a while performing broadcasted comparisons -
In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]:
array([[0, 0],
       [0, 1],
       [1, 0],
       [2, 2],
       [3, 1]])
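If scikit-image isn't available, the same windows can be built with NumPy's own sliding_window_view; a sketch assuming NumPy >= 1.20 -
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[25, 28], [84, 97], [105, 24], [28, 900]])
b = np.array([[25, 28, 84, 97], [22, 25, 28, 900], [11, 12, 105, 24]])

w = sliding_window_view(b, (1, a.shape[1]))   # same layout as view_as_windows
print(np.argwhere((w == a).all(-1).any(-2))[:, ::-1])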
Another way I can think of is to loop over each row in a and perform a 2D correlation between b, which you can consider as a 2D signal, and the row in a.
We would look for results that are equal to the sum of squares of all values in that row. If we subtract this sum of squares from the correlation result, matches show up as zeros. Any row that gives a 0 result means the subarray was found in that row of b. If you are using floating-point numbers, you may want to compare against some small threshold just above 0 instead.
If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.
import numpy as np
from scipy.signal import correlate2d

a = np.array([[ 25,  28],
              [ 84,  97],
              [105,  24]])
b = np.array([[ 25,  28,  84,  97],
              [ 22,  25,  28, 900],
              [ 11,  12, 105,  24]])
EPS = 1e-8
result = []
for (i, row) in enumerate(a):
    out = correlate2d(b, row[None,:], mode='valid') - np.square(row).sum()
    locs = np.where(np.abs(out) <= EPS)[0]
    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, res))
We get:
In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]
The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.
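One more caveat (my addition): a zero residual only says the dot product equals the sum of squares, which can occasionally happen without the window actually matching the row. A sketch that confirms each candidate elementwise before accepting it -
import numpy as np
from scipy.signal import correlate2d

a = np.array([[25, 28], [84, 97], [105, 24]])
b = np.array([[25, 28, 84, 97], [22, 25, 28, 900], [11, 12, 105, 24]])

result = set()
for i, arow in enumerate(a):
    out = correlate2d(b, arow[None, :], mode='valid') - np.square(arow).sum()
    for r, s in np.argwhere(np.abs(out) < 1e-8):
        # a matching dot product is necessary but not sufficient; verify
        if np.array_equal(b[r, s:s + arow.size], arow):
            result.add((i, int(r)))
print(sorted(result))   # [(0, 0), (0, 1), (1, 0), (2, 2)]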

Efficiently zero out all but largest n elements for each image pixel

So I have an image I of size (H x W x C), where C is some number of channels. The challenge is to obtain a new image J, again of size (H x W x C), in which J[i, j] contains only the maximum n entries in I[i, j].
Equivalently, think about iterating through each image pixel in I and zero-ing out all but the highest n entries.
What I've tried:
# NOTE: bone_weight_matrix is a matrix of size (256 x 256 x 43)
argsort_four = np.argsort(bone_weight_matrix, axis=2)[:, :, -4:]
# For each pixel, retain only the top four influencing bone weights
proc_matrix = np.zeros(bone_weight_matrix.shape)
for i in range(bone_weight_matrix.shape[0]):
    for j in range(bone_weight_matrix.shape[1]):
        proc_matrix[i, j, argsort_four[i, j]] = bone_weight_matrix[i, j, argsort_four[i, j]]
return proc_matrix
The problem is that this method seems super slow and doesn't feel very pythonic. Any advice would be great.
Cheers.
Generic case : Keeping largest or smallest n elements along an axis
Basically two steps would be involved :
Get those n indices to be kept along the specified axis with np.argpartition.
Initialize a zeros array and use those earlier obtained indices with advanced-indexing to select from the input array as well as assign into the zeros array.
Let's try to solve for a generic problem that works to select n elements along the specified axis and also be able to keep largest n as well as smallest n elements.
The implementation would look like this -
def keep(ar, n, axis=-1, order='largest'):
    axis = np.core.multiarray.normalize_axis_index(axis, ar.ndim)
    slice_l = [slice(None, None, None)]*ar.ndim
    if order == 'largest':
        slice_l[axis] = slice(-n, None, None)
        idx = np.argpartition(ar, kth=-n, axis=axis)[tuple(slice_l)]
    elif order == 'smallest':
        slice_l[axis] = slice(None, n, None)
        idx = np.argpartition(ar, kth=n, axis=axis)[tuple(slice_l)]
    else:
        raise Exception('Invalid order value')
    # ogrid gives per-axis open index grids; swap in idx along the target axis
    grid = np.ogrid[tuple(map(slice, ar.shape))]
    grid[axis] = idx
    out = np.zeros_like(ar)
    out[tuple(grid)] = ar[tuple(grid)]
    return out
Sample runs
Input array:
In [208]: np.random.seed(0)
     ...: I = np.random.randint(11,99,(2,2,6))
In [209]: I
Out[209]:
array([[[55, 58, 75, 78, 78, 20],
        [94, 32, 47, 98, 81, 23]],
       [[69, 76, 50, 98, 57, 92],
        [48, 36, 88, 83, 20, 31]]])
Keep largest 2 elements along last axis:
In [210]: keep(I, n=2, axis=-1, order='largest')
Out[210]:
array([[[ 0,  0,  0, 78, 78,  0],
        [94,  0,  0, 98,  0,  0]],
       [[ 0,  0,  0, 98,  0, 92],
        [ 0,  0, 88, 83,  0,  0]]])
Keep largest 1 element along first axis:
In [211]: keep(I, n=1, axis=1, order='largest')
Out[211]:
array([[[ 0, 58, 75,  0,  0,  0],
        [94,  0,  0, 98, 81, 23]],
       [[69, 76,  0, 98, 57, 92],
        [ 0,  0, 88,  0,  0,  0]]])
Keep smallest 2 elements along last axis:
In [212]: keep(I, n=2, axis=-1, order='smallest')
Out[212]:
array([[[55,  0,  0,  0,  0, 20],
        [ 0, 32,  0,  0,  0, 23]],
       [[ 0,  0, 50,  0, 57,  0],
        [ 0,  0,  0,  0, 20, 31]]])

Multiply each row of one array with each element of another array in numpy

I have two arrays A and B in numpy. A holds Cartesian coordinates; each row is one point in 3D space, and A has the shape (r, 3). B has the shape (r, n) and holds integers.
What I would like to do is multiply each element of B with each row in A, so that the resulting array has the shape (r, n, 3). So for example:
# r = 3
A = np.array([1,1,1, 2,2,2, 3,3,3]).reshape(3,3)
# n = 2
B = np.array([10, 20, 30, 40, 50, 60]).reshape(3,2)
# Result with shape (3, 2, 3):
# [[[ 10,  10,  10], [ 20,  20,  20]],
#  [[ 60,  60,  60], [ 80,  80,  80]],
#  [[150, 150, 150], [180, 180, 180]]]
I'm pretty sure this can be done with np.einsum, but I've been trying this for quite a while now and can't get it to work.
Use broadcasting -
A[:,None,:]*B[:,:,None]
Since np.einsum also supports broadcasting, you can use that as well (thanks to @ajcr for suggesting this concise version) -
np.einsum('ij,ik->ikj',A,B)
Sample run -
In [22]: A
Out[22]:
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
In [23]: B
Out[23]:
array([[10, 20],
       [30, 40],
       [50, 60]])
In [24]: A[:,None,:]*B[:,:,None]
Out[24]:
array([[[ 10,  10,  10],
        [ 20,  20,  20]],
       [[ 60,  60,  60],
        [ 80,  80,  80]],
       [[150, 150, 150],
        [180, 180, 180]]])
In [25]: np.einsum('ijk,ij->ijk',A[:,None,:],B)
Out[25]:
array([[[ 10,  10,  10],
        [ 20,  20,  20]],
       [[ 60,  60,  60],
        [ 80,  80,  80]],
       [[150, 150, 150],
        [180, 180, 180]]])

New array of smaller size excluding one value from each column

In Python 2.7, using numpy or by any means, if I had an array of any size and wanted to exclude certain values and output the new array, how would I do that? Here is what I would like:
Start with
[(1,2,3),
 (4,5,6),
 (7,8,9)]
then exclude [4,2,9] to make the array
[(1,5,3),
 (7,8,6)]
I would always be excluding data the same length as the row length, and always only one entry per column. [(1,5,3)] would be another example of data I would want to exclude. So every time I loop the function, it reduces the array row size by one. I would imagine I have to use a masked array, or convert my mask to a masked array and subtract the two, then maybe condense the output, but I have no idea how. Thanks for your time.
You can do it very efficiently if you transform your 2-D array into an unraveled 1-D array. Then you repeat the array of elements to be excluded, called e, in order to do an element-wise comparison:
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
e = [1, 5, 3]
ar = a.T.ravel()
er = np.repeat(e, a.shape[0])
ans = ar[er != ar].reshape(a.shape[1], a.shape[0]-1).T
But this will only work if each element in e matches exactly one row in the corresponding column of a.
EDIT:
As suggested by @Jaime, you can avoid the ravel() and get the same result directly:
ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0]-1).T
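For the a and e above, both versions give (checked by hand against the column-wise comparison) -
>>> ans
array([[4, 2, 6],
       [7, 8, 9]])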
To exclude vector e from matrix a:
import numpy as np
a = np.array([(1,2,3), (4,5,6), (7,8,9)])
e = [4,2,9]
print np.array([[i for i in a.transpose()[j] if i != e[j]]
                for j in range(len(e))]).transpose()
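Under Python 3, the same expression just needs print as a function (my adjustment of the snippet above) -
print(np.array([[i for i in a.transpose()[j] if i != e[j]]
                for j in range(len(e))]).transpose())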
This would take some work to generalize, but here's something that can handle 2-d cases of the kind you describe. If passed unexpected input, this won't notice and will generate strange results, but it's at least a starting point:
def columnwise_compress(a, values):
    a_shape = a.shape
    a_trans_flat = a.transpose().reshape(-1)
    compressed = a_trans_flat[~numpy.in1d(a_trans_flat, values)]
    return compressed.reshape(a_shape[:-1] + ((a_shape[0] - 1),)).transpose()
Tested:
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [4, 2, 9])
array([[1, 5, 3],
       [7, 8, 6]])
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [1, 5, 3])
array([[4, 2, 6],
       [7, 8, 9]])
The difficulty is that you're asking for "compression" of a kind that numpy.compress doesn't do (removing different values for each column or row) and you're asking for compression along columns instead of rows. Compressing along rows is easier because it moves along the natural order of the values in memory; you might consider working with transposed arrays for that reason. If you want to do that, things become a bit simpler:
>>> a = numpy.array([[1, 4, 7],
...                  [2, 5, 8],
...                  [3, 6, 9]])
>>> a[~numpy.in1d(a, [4, 2, 9]).reshape(3, 3)].reshape(3, 2)
array([[1, 7],
       [5, 8],
       [3, 6]])
You'll still need to handle shape parameters intelligently if you do it this way, but it will still be simpler. Also, this assumes there are no duplicates in the original array; if there are, this could generate wrong results. Saullo's excellent answer partially avoids the problem, but any value-based approach isn't guaranteed to work unless you're certain that there aren't duplicate values in the columns.
In the spirit of @SaulloCastro's answer, but handling multiple occurrences of items, you can remove the first occurrence in each column by doing the following:
def delete_skew_row(a, b):
    rows, cols = a.shape
    # first row in each column where the value to remove appears
    row_to_remove = np.argmax(a == b, axis=0)
    items_to_remove = np.ravel_multi_index((row_to_remove,
                                            np.arange(cols)),
                                           a.shape, order='F')
    ret = np.delete(a.T, items_to_remove)
    return np.ascontiguousarray(ret.reshape(cols, rows-1).T)
rows, cols = 5, 10
a = np.random.randint(100, size=(rows, cols))
b = np.random.randint(rows, size=(cols,))
b = a[b, np.arange(cols)]
>>> a
array([[50, 46, 85, 82, 27, 41, 45, 27, 17, 26],
       [92, 35, 14, 34, 48, 27, 63, 58, 14, 18],
       [90, 91, 39, 19, 90, 29, 67, 52, 68, 69],
       [10, 99, 33, 58, 46, 71, 43, 23, 58, 49],
       [92, 81, 64, 77, 61, 99, 40, 49, 49, 87]])
>>> b
array([92, 81, 14, 82, 46, 29, 67, 58, 14, 69])
>>> delete_skew_row(a, b)
array([[50, 46, 85, 34, 27, 41, 45, 27, 17, 26],
       [90, 35, 39, 19, 48, 27, 63, 52, 68, 18],
       [10, 91, 33, 58, 90, 71, 43, 23, 58, 49],
       [92, 99, 64, 77, 61, 99, 40, 49, 49, 87]])
