select first occurrence of minimum index from numpy array - python

I am trying to find the index of the minimum value in each row, and I am using the code below.
#code
import numpy as np
C = np.array([[1,2,4],[2,2,5],[4,3,3]])
ind = np.where(C == C.min(axis=1).reshape(len(C),1))
ind
#output
(array([0, 1, 1, 2, 2], dtype=int64), array([0, 0, 1, 1, 2], dtype=int64))
But the problem is that it returns all indices of the minimum values in each row, and I want only the first occurrence of the minimum in each row, like:
(array([0, 1, 2], dtype=int64), array([0, 0, 1], dtype=int64))

If you want to compare against the minimum value, use np.min with keepdims set to True, so that the (n, 1) result broadcasts against each row and gives a boolean array/mask. To select the first occurrence, use argmax along each row of the mask: on a boolean array, argmax returns the index of the first True, which is the desired output.
Thus, the implementation to get the corresponding column indices would be -
(C==C.min(1, keepdims=True)).argmax(1)
Sample step-by-step run -
In [114]: C # Input array
Out[114]:
array([[1, 2, 4],
       [2, 2, 5],
       [4, 3, 3]])
In [115]: C==C.min(1, keepdims=1) # boolean array of min values
Out[115]:
array([[ True, False, False],
       [ True,  True, False],
       [False,  True,  True]], dtype=bool)
In [116]: (C==C.min(1, keepdims=True)).argmax(1) # argmax to get first occurrences
Out[116]: array([0, 0, 1])
The first output of row indices would simply be a range array -
np.arange(C.shape[0])
To get the same column indices of the first occurrence of the minimum values, a more direct way is np.argmin, which also returns the first index when there are ties -
C.argmin(axis=1)
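A minimal sketch, assuming the goal is the same (row, column) tuple layout that np.where produced in the question: pair the arange row indices with argmin. Since np.argmin resolves ties to the first occurrence, it agrees with the mask-plus-argmax version.

```python
import numpy as np

C = np.array([[1, 2, 4], [2, 2, 5], [4, 3, 3]])

rows = np.arange(C.shape[0])   # one entry per row
cols = C.argmin(axis=1)        # first index of each row's minimum
ind = (rows, cols)             # same layout as the np.where output

# The mask + argmax approach gives the same column indices
mask_cols = (C == C.min(axis=1, keepdims=True)).argmax(axis=1)
assert np.array_equal(cols, mask_cols)

print(ind)
print(C[ind])                  # the minimum value of each row
```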

Related

Delete numpy axis 1 based on condition

I need to remove values from a np axis based on a condition.
For example, I would want to remove [:,2] (the third value along axis 1, i.e. index 2) if the first value in the row is 0, else I would want to remove [:,3].
Input:
[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
Output:
[[0,1,3],[0,2,4],[1,3,4]]
So each row of my output has one fewer value along axis 1, and which value is removed depends on whether that row meets the condition.
I know I can isolate and manipulate these rows with
array[np.where(array[:,0] == 0)], but then I would have to deal with each condition separately, and it's very important for me to preserve the order of this array.
I am dealing with 3D arrays and am hoping to calculate all of this simultaneously while preserving the order.
Any help is much appreciated!
A possible solution:
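import numpy as np
a = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
b = np.arange(a.shape[1])  # all column indices of a row
# For each row x, keep every column except index 2 (if x[0] == 0) or index 3 (otherwise)
np.apply_along_axis(
    lambda x: x[np.where(x[0] == 0, np.delete(b, 2), np.delete(b, 3))], 1, a)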
Output:
array([[0, 1, 3],
       [0, 2, 4],
       [1, 3, 4]])
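The same "columns to keep" idea can be vectorized without apply_along_axis, which calls the lambda once per row. A sketch, assuming the 2D example above: build a per-row array of column indices to keep with np.where, then gather them with np.take_along_axis. Row order is preserved because each output row is taken from the matching input row.

```python
import numpy as np

a = np.array([[0, 1, 2, 3], [0, 2, 3, 4], [1, 3, 4, 5]])
b = np.arange(a.shape[1])

# Per-row column indices to keep: drop column 2 when the first value is 0, else column 3.
# The (n, 1) condition broadcasts against the two (m-1,) index arrays -> shape (n, m-1).
keep = np.where((a[:, 0] == 0)[:, None], np.delete(b, 2), np.delete(b, 3))

# Gather the kept columns for every row in one vectorized call
result = np.take_along_axis(a, keep, axis=1)
print(result)
```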
Since you are starting and ending with a list, a straightforward iteration is a good solution:
In [261]: alist =[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
In [262]: for row in alist:
...: if row[0]==0: row.pop(2)
...: else: row.pop(3)
...:
In [263]: alist
Out[263]: [[0, 1, 3], [0, 2, 4], [1, 3, 4]]
A possible array approach:
In [273]: arr = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
In [274]: mask = np.ones(arr.shape, bool)
In [275]: mask[np.arange(3),np.where(arr[:,0]==0,2,3)]=False
In [276]: mask
Out[276]:
array([[ True,  True, False,  True],
       [ True,  True, False,  True],
       [ True,  True,  True, False]])
arr[mask] will be 1d, but since we are deleting the same number of elements from each row, we can reshape it:
In [277]: arr[mask].reshape(arr.shape[0],-1)
Out[277]:
array([[0, 1, 3],
       [0, 2, 4],
       [1, 3, 4]])
I expect the list approach to be faster for small cases, but the array approach should scale better. I don't know where the trade-off is.
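The question mentions 3D arrays. A minimal sketch of the mask approach for any number of dimensions, assuming (a guess at the asker's layout, not something the question states) that the condition is tested on the first element along the last axis and that one element is removed along that same axis. np.put_along_axis replaces the np.arange fancy index, so the code does not depend on how many leading dimensions there are; drop_by_condition is a hypothetical helper name.

```python
import numpy as np

def drop_by_condition(arr):
    # Assumption: test arr[..., 0] and remove one element along the last axis
    # (index 2 if the first value is 0, else index 3).
    drop = np.where(arr[..., 0] == 0, 2, 3)          # shape arr.shape[:-1]
    mask = np.ones(arr.shape, dtype=bool)
    # Mark exactly one element per row (along the last axis) for removal
    np.put_along_axis(mask, drop[..., None], False, axis=-1)
    # Every row loses one element, so the flat result reshapes cleanly, in order
    return arr[mask].reshape(*arr.shape[:-1], arr.shape[-1] - 1)

arr2d = np.array([[0, 1, 2, 3], [0, 2, 3, 4], [1, 3, 4, 5]])
arr3d = np.stack([arr2d, arr2d[::-1]])   # hypothetical 3D input, shape (2, 3, 4)
print(drop_by_condition(arr2d))
print(drop_by_condition(arr3d))
```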

numpy argmax when values are equal

I have a NumPy matrix and I want to get the index of the maximum value in each row. E.g.
[[1,2,3],[1,3,2],[3,2,1]]
will return
[2,1,0]
However, when a row has more than one maximum value, numpy.argmax only returns the smallest index. E.g.
[[0,0,0],[0,0,0],[0,0,0]]
will return
[0,0,0]
Can I change the default (smallest index) to some other value? E.g. when there are equal maximum values, return 1 or None, so that the above result would be
[1,1,1]
or
[None, None, None]
If I can do this in TensorFlow that'll be better.
Thanks!
You can use np.partition to find the two largest values in each row and check whether they are equal, then use that as a mask in np.where to set the default value:
In [228]: a = np.array([[1, 2, 3, 2], [3, 1, 3, 2], [3, 5, 2, 1]])
In [229]: twomax = np.partition(a, -2)[:, -2:].T
In [230]: default = -1
In [231]: argmax = np.where(twomax[0] != twomax[1], np.argmax(a, -1), default)
In [232]: argmax
Out[232]: array([ 2, -1, 1])
A convenient value for "default" is -1, as argmax never returns that on its own. None does not fit in an integer array. A masked array is also an option, but I didn't go that far.
Here is a NumPy implementation:
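def my_argmax(a):
    # Row index of every element equal to its row's maximum (sorted, row-major)
    rows = np.where(a == a.max(axis=1)[:, None])[0]
    # Consecutive repeats in `rows` mark rows with more than one maximum
    rows_multiple_max = rows[:-1][rows[:-1] == rows[1:]]
    result = a.argmax(axis=1)
    result[rows_multiple_max] = -1
    return result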
Example of use:
import numpy as np
a = np.array([[0, 0, 0], [4, 5, 3], [3, 4, 4], [6, 2, 1]])
my_argmax(a) # array([-1, 1, -1, 0])
Explanation: np.where selects the indices of all maximal elements in each row. If a row has multiple maxima, its row number appears more than once in the rows array. Since this array is already sorted, such repetition is detected by comparing consecutive elements. This identifies the rows with multiple maxima, whose entries in the output of NumPy's argmax method are then overwritten with -1.
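The first answer mentions a masked array as another option, and the question asks for None on ties. A sketch of that variant, which counts how many entries in each row equal the row maximum instead of using np.partition or sorted row indices; argmax_or_masked is a hypothetical helper name. MaskedArray.tolist() turns masked entries into None, and filled(-1) gives back the -1 convention used above.

```python
import numpy as np

def argmax_or_masked(a):
    # How many entries in each row equal that row's maximum
    n_max = (a == a.max(axis=1, keepdims=True)).sum(axis=1)
    idx = a.argmax(axis=1)
    # Mask the rows whose maximum is not unique
    return np.ma.masked_array(idx, mask=n_max > 1)

a = np.array([[0, 0, 0], [4, 5, 3], [3, 4, 4], [6, 2, 1]])
res = argmax_or_masked(a)
print(res.tolist())     # masked rows become None
print(res.filled(-1))   # or fall back to the -1 convention
```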

python : selecting row where y==1 and column is 0 in a matrix

I'm new to python. This is my code:
import numpy as np
np.random.seed(42)
x1=np.random.randn(5,4)
y1=np.random.randint(0,2,(5,1))
print(x1)
print(y1)
I want to select column 1 of x1 for the rows where y1 is 1:
print(x1[y1==1, 1])
but I get the error: too many indices for array
You can use numpy.where to extract an array of integer row indices for NumPy indexing:
x1[np.where(y1==1)[0], 1]
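To understand how this works, note that y1 == 1 returns the following Boolean array of shape (5, 1). As an index it covers both axes of x1, so adding the column index 1 makes three indices for a 2D array, hence the error:
array([[ True],
       [ True],
       [False],
       [ True],
       [False]], dtype=bool)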
numpy.where returns a tuple with one index array per dimension; the row indices of the True elements are in its first element:
print(np.where(y1==1))
(array([0, 1, 3], dtype=int64), array([0, 0, 0], dtype=int64))
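An alternative sketch that keeps boolean indexing: the error comes from the (5, 1) shape of y1 == 1, so reducing the condition to 1-D (here with y1[:, 0]; y1.ravel() would also work) lets it select rows directly. The comparison below assumes the same seeded data as the question.

```python
import numpy as np

np.random.seed(42)
x1 = np.random.randn(5, 4)
y1 = np.random.randint(0, 2, (5, 1))

# y1 == 1 has shape (5, 1); taking column 0 gives a 1-D mask of length 5
row_mask = y1[:, 0] == 1
sel_bool = x1[row_mask, 1]                 # column 1, rows where y1 is 1

# Same selection via the np.where answer above
sel_where = x1[np.where(y1 == 1)[0], 1]
assert np.array_equal(sel_bool, sel_where)
print(sel_bool)
```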

numpy.where for the indices of rows that are not all zero

I have a large matrix in which some rows are all zero. I want to get the indices of the rows that are not all zero. I tried
idx = np.where(mymatrix[~np.all(mymatrix != 0, axis=1)])
and got
(array([ 21, 21, 21, ..., 1853, 3191, 3191], dtype=int64),
array([3847, 3851, 3852, ..., 4148, 6920, 6921], dtype=int64))
Is the first array the row index? Is there a more straightforward way to get only the row indices?
There is a direct way:
np.where(np.any(arr != 0, axis=1))
You are actually quite close to the solution yourself. You just need to think a bit about what you pass to np.where().
Take this matrix as an example:
array([[1, 1, 1, 1],
       [2, 2, 2, 2],
       [0, 0, 0, 0],
       [3, 3, 3, 3]])
# This gives you back a boolean array of whether your
# statement is true or false per row
np.all(mymatrix != 0, axis=1)
array([ True, True, False, True], dtype=bool)
Now if you pass that to np.where(), it returns your desired output:
np.where(np.all(mymatrix != 0, axis=1))
(array([0, 1, 3]),)
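What you did wrong was to index the matrix with the boolean array and then pass the selected rows, instead of the boolean array itself, to np.where().
# This gives you the rows without zeros.
mymatrix[np.all(mymatrix != 0, axis=1)]
array([[1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
# While this gives you the rows that contain zeros
mymatrix[~np.all(mymatrix != 0, axis=1)]
array([[0, 0, 0, 0]])
Given a single numeric array like this, np.where() behaves like np.nonzero(): it returns the row and column indices of the nonzero elements of the filtered array, and those row indices refer to the filtered rows rather than to your original matrix. That is why you got two arrays back.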
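Note that np.all(mymatrix != 0, axis=1) selects rows with no zeros at all, which for the example matrix happens to coincide with rows that are not all zero. A sketch with a hypothetical extra row shows where the two differ; np.any(..., axis=1), as in the first answer, matches what the question asks for. np.flatnonzero returns the row indices as a single array instead of a one-element tuple.

```python
import numpy as np

# Hypothetical matrix: row 1 is not all zero but does contain zeros
m = np.array([[1, 1, 1, 1],
              [0, 2, 0, 0],
              [0, 0, 0, 0],
              [3, 3, 3, 3]])

# Rows with at least one nonzero entry ("not all zero")
not_all_zero = np.flatnonzero(np.any(m != 0, axis=1))
# Rows with no zero entries at all
no_zeros = np.flatnonzero(np.all(m != 0, axis=1))

print(not_all_zero)   # row indices 0, 1, 3
print(no_zeros)       # row indices 0, 3
```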

Keeping the indexes of deleted column

I want to remove features with low variance from my data array. Using scikit-learn, the code looks like this:
>>> from sklearn.feature_selection import VarianceThreshold
>>> X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
>>> selector = VarianceThreshold()
>>> selector.fit_transform(X)
array([[2, 0],
       [1, 4],
       [1, 1]])
My question is: how can I get the indices of the columns that were deleted? Let's say I want to use them to delete the same columns from another array (columns 0 and 3 in the example above).
Any idea?
selector.get_support() returns a boolean array showing which columns are kept (True) and which are removed (False). In the case above:
selector.get_support()
will return
array([False, True, True, False], dtype=bool)
which means the first and last columns of the original input (X) were removed.
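To turn that mask into the deleted column indices and apply the same selection to another array, a sketch could look like this. The mask is copied from the get_support() output above so the example runs without scikit-learn, and other is a hypothetical second array with the same four columns. (get_support(indices=True) returns the kept column indices instead of a mask.)

```python
import numpy as np

# Boolean mask copied from selector.get_support() in the example above
support = np.array([False, True, True, False])

removed_cols = np.flatnonzero(~support)   # indices of the deleted columns
kept_cols = np.flatnonzero(support)       # indices of the kept columns

# Hypothetical second array with the same four columns
other = np.arange(12).reshape(3, 4)

# Keep the same columns as the selector did
other_reduced = other[:, support]
# Equivalent: delete the removed columns explicitly
assert np.array_equal(other_reduced, np.delete(other, removed_cols, axis=1))

print(removed_cols)
print(other_reduced)
```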
