Finding index by iterating over each row of matrix - python

I have a NumPy array 'A' of size 5000x10. I also have another number 'Num'. I want to apply the following to each row of A:
import numpy as np
np.max(np.where(Num > A[0,:]))
Is there a more Pythonic way to do this than writing a for loop?

You could use argmax -
A.shape[1] - 1 - (Num > A)[:,::-1].argmax(1)
Alternatively with cumsum and argmax -
(Num > A).cumsum(1).argmax(1)
Explanation: With np.max(np.where(...)), we are looking for the last match along each row of the comparison.
argmax can do the same job. However, argmax on a boolean array returns the first occurrence, not the last. So one trick is to perform the comparison, flip the columns with [:,::-1], and then use argmax. The resulting indices are then subtracted from the number of columns minus one (A.shape[1] - 1) to map them back to the original column order.
The second approach is very similar to one in a related post, so quoting from it:
One of the uses of argmax is to get the ID of the first occurrence of the max element along an axis of an array. So we take the cumsum along each row and get the first max ID, which represents the last non-zero element. This works because the cumsum over the remaining elements does not increase after that last non-zero element.
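To make the two one-liners concrete, here is a minimal sketch on a tiny array. The values of A and Num below are made up for illustration (the question's array is 5000x10), and the list comprehension named looped is just my stand-in for the for loop from the question, returning -1 when a row has no match.

```python
import numpy as np

# Hypothetical small stand-in for the 5000x10 array from the question.
A = np.array([[1, 7, 3, 9],
              [8, 2, 6, 4],
              [9, 9, 9, 9]])
Num = 5
mask = Num > A  # boolean matrix of matches

# Reference: per-row loop; -1 marks rows with no match (np.max would fail there).
looped = np.array([np.where(m)[0].max() if m.any() else -1 for m in mask])

# Approach 1: flip the columns, take the first True, map back to original order.
flipped = A.shape[1] - 1 - mask[:, ::-1].argmax(1)

# Approach 2: the running count first reaches its maximum at the last True.
cumulative = mask.cumsum(1).argmax(1)

print(looped)      # [ 2  3 -1]
print(flipped)     # [2 3 3]
print(cumulative)  # [2 3 0]
```

Note the last row, where nothing matches: the flip version reports the last column and the cumsum version reports 0, so if such rows can occur you would probably want to mask them out with something like mask.any(1).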

Understanding np.ix_

Code:
import numpy as np
ray = [1,22,33,42,51], [61,71,812,92,103], [113,121,132,143,151], [16,172,183,19,201]
ray = np.asarray(ray)
type(ray)
ray[np.ix_([-2:],[3:4])]
I'd like to use index slicing to get a subarray consisting of the last two rows and the 3rd/4th columns. My current code produces an error.
I'd also like to sum each column. What am I doing wrong? I cannot post a picture because I need at least 10 reputation points.
So you want to make a slice of an array. The most straightforward way to do it is... slicing:
slice = ray[-2:,3:]
or if you want it explicitly
slice = ray[-2:,3:5]
See it explained in Understanding slicing
But if you do want to use np.ix_ for some reason, you need
slice = ray[np.ix_([-2,-1],[3,4])]
You can't use : here, because the [] here don't make a slice. They construct lists, so you must explicitly specify every row number and every column number you want in the result. If there are many consecutive indices, you can use range:
slice = ray[np.ix_(range(-2, 0),range(3, 5))]
And to sum each column:
slice.sum(0)
0 means you want to reduce the 0th dimension (rows) by summation and keep other dimensions (columns in this case).
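For reference, a short sketch that runs both variants on the array from the question and checks they agree. The names by_slice, by_ix, and scattered are mine, chosen to avoid shadowing the built-in slice; the last example, with non-adjacent rows and columns, is my own illustration of a case where plain slicing would not be enough.

```python
import numpy as np

ray = np.asarray([[1, 22, 33, 42, 51],
                  [61, 71, 812, 92, 103],
                  [113, 121, 132, 143, 151],
                  [16, 172, 183, 19, 201]])

# Plain slicing: last two rows, columns with index 3 and 4.
by_slice = ray[-2:, 3:5]

# Same block through np.ix_, listing every row and column index explicitly.
by_ix = ray[np.ix_([-2, -1], [3, 4])]

# Sum over axis 0 (rows), leaving one total per column.
column_sums = by_slice.sum(0)

# np.ix_ also handles indices that are not adjacent.
scattered = ray[np.ix_([0, 3], [1, 4])]

print(by_slice)     # [[143 151]
                    #  [ 19 201]]
print(column_sums)  # [162 352]
print(scattered)    # [[ 22  51]
                    #  [172 201]]
```

One difference worth knowing: the sliced result is a view into ray, while the np.ix_ result is a copy, so writing into by_slice would change ray.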

Find index of the maximum value in a numpy array

I have a NumPy array called prediction, as follows:
array([[3.7839172e-06, 8.0308418e-09, 2.2542761e-06, 5.9392878e-08,
5.3137046e-07, 1.7033290e-05, 1.7738441e-07, 1.0742254e-03,
1.8656212e-06, 9.9890006e-01]], dtype=float32)
In order to get the index of the maximum value in this array, I used the following
np.where(prediction==prediction.max())
But the result I am getting also shows index 0.
(array([0], dtype=int64), array([9], dtype=int64))
Does anyone know why it is also showing index 0?
Also, how can I get just the index number instead of (array([9], dtype=int64)?
Use the built-in method for it:
prediction.argmax()
output:
9
Also, that index 0 is the row number, so the max is at row 0 and column 9.
The predictions array here is two-dimensional. When you call np.where with only a condition, this is the same as calling np.asarray(condition).nonzero(), which returns the indices of the non-zero elements of prediction==prediction.max(), a boolean array whose only non-zero element is at (0,9).
What you are looking for is the argmax function which will give you the index of the maximum value along an axis. You effectively only have one axis (2d but only one row) here so this should be fine.
As the other answers mentioned, you have a 2D array, so you end up with two indices. Since the array is just a row, the first index is always zero. You can bypass this in a number of ways:
Use prediction.argmax(). The default axis argument is None, which means operate on a flattened array. Other options that will get you the same result are prediction.argmax(-1) (last axis) and prediction.argmax(1) (second axis). Keep in mind that you will only ever get the index of the first maximum this way. That's fine if you only ever expect to have one, or only need one.
Use np.flatnonzero to get the linear indices similarly to the way you were doing:
np.flatnonzero(prediction == prediction.max())
Use np.nonzero or np.where, but extract the axis you care about:
np.nonzero(prediction == prediction.max())[1]
ravel the array on input:
np.where(prediction.ravel() == prediction.max())
Do the same thing, but with np.squeeze:
np.nonzero(prediction.squeeze() == prediction.max())
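As a quick check of the options above, here is a sketch using the array from the question. The use of np.unravel_index to turn the flat index back into a (row, column) pair is my addition and is not mentioned in the answers; it may be handy if the array ever has more than one row.

```python
import numpy as np

prediction = np.array([[3.7839172e-06, 8.0308418e-09, 2.2542761e-06, 5.9392878e-08,
                        5.3137046e-07, 1.7033290e-05, 1.7738441e-07, 1.0742254e-03,
                        1.8656212e-06, 9.9890006e-01]], dtype=np.float32)

# Index of the maximum in the flattened array.
flat_index = prediction.argmax()

# Convert the flat index back into a (row, column) pair.
row, col = np.unravel_index(flat_index, prediction.shape)

# One index per row; for a single-row array this has length 1.
per_row = prediction.argmax(axis=1)

# Flat indices of every element equal to the maximum (catches ties).
all_maxima = np.flatnonzero(prediction == prediction.max())

print(flat_index)  # 9
print(row, col)    # 0 9
print(per_row)     # [9]
print(all_maxima)  # [9]
```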

Finding the index of the maximum number in a python matrix which includes strings

I understand that
np.argmax(np.max(x, axis=1))
returns the index of the row that contains the maximum value and
np.argmax(np.max(x, axis=0))
returns the index of the column that contains the maximum value.
But what if the matrix contained strings? How can I change the code so that it still finds the index of the largest value?
Also (if there's no way to do what I previously asked for), can I change the code so that the operation is only carried out on a sub-section of the matrix, for instance, on the bottom right '2x2' sub-matrix in this example:
array = [['D','F','J'],
         ['K',3,4],
         ['B',3,1]]
[[3,4],
[3,1]]
Can you try first converting the column to type dtype? If you take the min/max of a dtype column, it should use string values for the minimum/maximum.
Although not efficient, this could be one way to find the index of the maximum number in the original matrix by using slices:
newmax = 0
newmaxrow = 0
newmaxcolumn = 0
# Bottom-right sub-matrix: skip row 0 and column 0, which hold the strings.
sub = [array[i][1:] for i in range(1, len(array))]
for i, row in enumerate(sub):
    for j, num in enumerate(row):
        if num > newmax:
            newmax = num
            # +1 maps sub-matrix indices back to the original matrix.
            newmaxrow = i + 1
            newmaxcolumn = j + 1
Note: this method would not work if the largest number lies within row 0 or column 0.
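If NumPy is available, the same sub-matrix search can be sketched without explicit loops. This assumes, as in the example matrix, that row 0 and column 0 hold the string labels and everything else is numeric; the names sub, max_row, and max_col are mine.

```python
import numpy as np

array = [['D', 'F', 'J'],
         ['K', 3, 4],
         ['B', 3, 1]]

# Assumption: only row 0 and column 0 contain strings, so drop them first.
sub = np.array([row[1:] for row in array[1:]], dtype=float)

# argmax works on the flattened sub-matrix; unravel_index restores (row, col).
r, c = np.unravel_index(sub.argmax(), sub.shape)

# +1 shifts the indices back into the original matrix.
max_row, max_col = int(r) + 1, int(c) + 1

print(max_row, max_col)         # 1 2
print(array[max_row][max_col])  # 4
```

Like the loop version, this cannot find a maximum sitting in row 0 or column 0, since those are excluded up front.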

Minimum element from the matrix column

I need to find the minimum over all elements of the column that has the maximum column sum.
I do the following:
Create a random matrix
from numpy import *
a = random.rand(5,4)
Then calculate the sum of each column and find the index of the maximum element
c = a.sum(axis=0)
d = argmax(c)
Then I try to find the minimum number in this column, but I am quite bad with the syntax. I only know how to find the minimum element in the row with the current index.
e = min(a[d])
But how can I change it for columns?
You can extract the minimum value of a column as follows (using the variables you have indicated):
e=a[:,d].min()
Note that using
e=min(a[:,d])
falls back to Python's built-in min, which iterates over the elements one by one and slows things down (thanks for pointing this out @SaulloCastro).
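Putting the steps together, here is a small sketch with a fixed matrix instead of random.rand so the result can be checked by hand. The values are made up for illustration, and I use import numpy as np rather than the star import from the question.

```python
import numpy as np

# Hypothetical fixed data in place of random.rand(5, 4).
a = np.array([[0.2, 0.9, 0.1],
              [0.5, 0.8, 0.3],
              [0.4, 0.7, 0.6]])

c = a.sum(axis=0)   # column sums: roughly [1.1, 2.4, 1.0]
d = c.argmax()      # index of the column with the largest sum
e = a[:, d].min()   # minimum within that column

# The same thing as a single expression.
one_liner = a[:, a.sum(axis=0).argmax()].min()

print(d)  # 1
print(e)  # 0.7
```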
