Python - Calculate Min and Index of Min in an ndarray

I have an n*n array, and I want to find the min in the array, and get the index of the min in [x,y] format
Of course, this can be done using for loops and using temporary variables, but I am looking for a more sophisticated process to do this.
Example -
[[1,2,8],
[7,4,2],
[9,1,7],
[0,1,5],
[6,-4,3]]
I should get the following output -
Output-
Min = -4
Index = [4,1]
Can I implement something similar?
TIA.

Global minimum value and index
Flatten the array and get the flat argmin index. Convert that index to row-column indices with np.unravel_index, and index the flattened array with the same flat index to get the minimum value.
import numpy as np
def smallest_val_index(a):
    idx = a.ravel().argmin()
    return a.ravel()[idx], np.unravel_index(idx, a.shape)
Sample run -
In [182]: a
Out[182]:
array([[ 1, 2, 8],
[ 7, 4, 2],
[ 9, 1, 7],
[ 0, 1, 5],
[ 6, -4, 3]])
In [183]: val, indx = smallest_val_index(a)
In [184]: val
Out[184]: -4
In [185]: indx
Out[185]: (4, 1)
Global maximum value and index
Similarly, to get the global maximum value, use argmax -
def largest_val_index(a):
    idx = a.ravel().argmax()
    return a.ravel()[idx], np.unravel_index(idx, a.shape)
Sample run -
In [187]: a
Out[187]:
array([[ 1, 2, 8],
[ 7, 4, 2],
[ 9, 1, 7],
[ 0, 1, 5],
[ 6, -4, 3]])
In [188]: largest_val_index(a)
Out[188]: (9, (2, 0))
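The two helpers above differ only in which arg-function they call, so they can be folded into one. The following is a minimal sketch, assuming a hypothetical helper name `extreme_val_index` and a `kind` parameter, neither of which is part of the original answer or of NumPy:

```python
import numpy as np

def extreme_val_index(a, kind="min"):
    """Return (value, index tuple) of the global min or max of `a`.

    `extreme_val_index` and `kind` are illustrative names, not NumPy API.
    """
    a = np.asarray(a)
    # Without an axis argument, argmin/argmax return an index into the flattened array.
    flat_idx = a.argmin() if kind == "min" else a.argmax()
    idx = np.unravel_index(flat_idx, a.shape)
    return a[idx], idx

a = np.array([[1, 2, 8],
              [7, 4, 2],
              [9, 1, 7],
              [0, 1, 5],
              [6, -4, 3]])

min_val, min_idx = extreme_val_index(a, "min")
max_val, max_idx = extreme_val_index(a, "max")
print(min_val, [int(i) for i in min_idx])  # -4 [4, 1]
print(max_val, [int(i) for i in max_idx])  # 9 [2, 0]
```

Converting the index tuple to a list of plain ints gives the [x,y] format asked for in the question.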

Related

Scipy's linear_sum_assignment giving incorrect result

When I tried using scipy.optimize.linear_sum_assignment as shown, it gives the assignment vector [0 2 3 1] with a total cost of 15.
However, from the cost matrix c, you can see that for the second task, the 5th agent has a cost of 1. So the expected assignment should be [0 3 None 2 1] (total cost of 9)
Why is linear_sum_assignment not returning the optimal assignments?
from scipy.optimize import linear_sum_assignment
c = [
[1, 5, 9, 5],
[5, 8, 3, 2],
[3, 2, 6, 8],
[7, 3, 5, 4],
[2, 1, 9, 9],
]
results = linear_sum_assignment(c)
print(results[1]) # [0 2 3 1]
linear_sum_assignment returns a tuple of two arrays. These are the row indices and column indices of the assigned values. For your example (with c converted to a numpy array):
In [51]: c
Out[51]:
array([[1, 5, 9, 5],
[5, 8, 3, 2],
[3, 2, 6, 8],
[7, 3, 5, 4],
[2, 1, 9, 9]])
In [52]: row, col = linear_sum_assignment(c)
In [53]: row
Out[53]: array([0, 1, 3, 4])
In [54]: col
Out[54]: array([0, 2, 3, 1])
The corresponding index pairs from row and col give the selected entries. That is, the indices of the selected entries are (0, 0), (1, 2), (3, 3) and (4, 1). It is these pairs that are the "assignments".
The sum associated with this assignment is 9:
In [55]: c[row, col].sum()
Out[55]: 9
In the original version of the question (since edited), it looks like you wanted to know the row index for each column, so you expected [0, 4, 1, 3]. The values that you want are in row, but the order is not what you expect, because the indices in col are not simply [0, 1, 2, 3]. To get the result in the form that you expected, you have to reorder the values in row based on the order of the indices in col. Here are two ways to do that.
First:
In [56]: result = np.zeros(4, dtype=int)
In [57]: result[col] = row
In [58]: result
Out[58]: array([0, 4, 1, 3])
Second:
In [59]: result = row[np.argsort(col)]
In [60]: result
Out[60]: array([0, 4, 1, 3])
Note that the example in the linear_sum_assignment docstring is potentially misleading; because it displays only col_ind in the python session, it gives the impression that col_ind is "the answer". In general, however, the answer involves both of the returned arrays.
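For a matrix this small, the total cost of 9 can be cross-checked by brute force. This is only a sanity-check sketch using itertools and NumPy, not how linear_sum_assignment works internally; the variable names are chosen here for illustration:

```python
from itertools import permutations

import numpy as np

c = np.array([[1, 5, 9, 5],
              [5, 8, 3, 2],
              [3, 2, 6, 8],
              [7, 3, 5, 4],
              [2, 1, 9, 9]])

n_rows, n_cols = c.shape
cols = np.arange(n_cols)

# Try every way of giving each column its own distinct row (5*4*3*2 = 120 cases).
best_cost = min(int(c[list(rows), cols].sum())
                for rows in permutations(range(n_rows), n_cols))

# Cost of the row-per-column vector [0, 4, 1, 3] derived in the answer.
answer_cost = int(c[[0, 4, 1, 3], cols].sum())

print(best_cost, answer_cost)  # 9 9
```

Only the costs are compared, because an optimal assignment need not be unique: in this matrix [0, 4, 3, 1] also adds up to 9.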

Numpy: for each element in one dimension, find coordinates of maximum of sub-array

I've seen variations of this question asked a few times but so far haven't seen any answers that get to the heart of this general case. I have an n-dimensional array of shape [a, b, c, ...] . For some dimension x, I want to look at each sub-array and find the coordinates of the maximum.
For example, say b = 2, and that's the dimension I'm interested in. I want the coordinates of the maximum of [:, 0, :, ...] and [:, 1, :, ...] in the form a_max = [a_max_b0, a_max_b1], c_max = [c_max_b0, c_max_b1], etc.
I've tried to do this by reshaping my input matrix to a 2d array [b, a*c*d*...], using argmax along axis 0, and unraveling the indices, but the output coordinates don't wind up giving the maxima in my dataset. In this case, n = 3 and I'm interested in axis 1.
shape = gains_3d.shape
idx = gains_3d.reshape(shape[1], -1)
idx = idx.argmax(axis = 1)
a1, a2 = np.unravel_index(idx, [shape[0], shape[2]])
Obviously I could use a loop, but that's not very pythonic.
For a concrete example, I randomly generated a 4x2x3 array. I'm interested in axis 1, so the output should be two arrays of length 2.
testarray = np.array([[[0.17028444, 0.38504759, 0.64852725],
[0.8344524 , 0.54964746, 0.86628204]],
[[0.77089997, 0.25876277, 0.45092835],
[0.6119848 , 0.10096425, 0.627054 ]],
[[0.8466859 , 0.82011746, 0.51123959],
[0.26681694, 0.12952723, 0.94956865]],
[[0.28123628, 0.30465068, 0.29498136],
[0.6624998 , 0.42748154, 0.83362323]]])
testarray[:,0,:] is
array([[0.17028444, 0.38504759, 0.64852725],
[0.77089997, 0.25876277, 0.45092835],
[0.8466859 , 0.82011746, 0.51123959],
[0.28123628, 0.30465068, 0.29498136]])
so the first element of the first output array will be 2, and the first element of the other will be 0, pointing to 0.8466859. The second elements of the two arrays will be 2 and 2, pointing to 0.94956865 in testarray[:,1,:].
Let's first try to get a clear idea of what you are trying to do:
Sample 3d array:
In [136]: arr = np.random.randint(0,10,(2,3,4))
In [137]: arr
Out[137]:
array([[[1, 7, 6, 2],
[1, 5, 7, 1],
[2, 2, 5, *6*]],
[[*9*, 1, 2, 9],
[2, *9*, 3, 9],
[0, 2, 0, 6]]])
After fiddling around a bit I came up with this iteration, showing the coordinates for each middle dimension, and the max value
In [151]: [(i,np.unravel_index(np.argmax(arr[:,i,:]),(2,4)),np.max(arr[:,i,:])) for i in range(3)]
Out[151]: [(0, (1, 0), 9), (1, (1, 1), 9), (2, (0, 3), 6)]
I can move the unravel outside the iteration:
In [153]: np.unravel_index([np.argmax(arr[:,i,:]) for i in range(3)],(2,4))
Out[153]: (array([1, 1, 0]), array([0, 1, 3]))
Your reshape approach does avoid this loop:
In [154]: arr1 = arr.transpose(1,0,2) # move our axis first
In [155]: arr1 = arr1.reshape(3,-1)
In [156]: arr1
Out[156]:
array([[1, 7, 6, 2, 9, 1, 2, 9],
[1, 5, 7, 1, 2, 9, 3, 9],
[2, 2, 5, 6, 0, 2, 0, 6]])
In [158]: np.argmax(arr1,axis=1)
Out[158]: array([4, 5, 3])
In [159]: np.unravel_index(_,(2,4))
Out[159]: (array([1, 1, 0]), array([0, 1, 3]))
argmax takes only a single axis value, whereas you want the equivalent of taking it over all but one axis. Some functions accept a tuple of axes (np.max does), but argmax does not, so the transpose and reshape may be the only way.
In [163]: np.max(arr1,axis=1)
Out[163]: array([9, 9, 6])
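The transpose-and-reshape steps can be wrapped up for any number of dimensions. A minimal sketch, assuming a hypothetical function name `argmax_per_slice` (not a NumPy function) and using np.moveaxis in place of an explicit transpose:

```python
import numpy as np

def argmax_per_slice(arr, axis):
    """For each index along `axis`, return the coordinates (in the remaining
    axes) of the maximum of that sub-array. Illustrative helper, not NumPy API."""
    moved = np.moveaxis(arr, axis, 0)          # bring the axis of interest to the front
    flat = moved.reshape(moved.shape[0], -1)   # one row per sub-array
    flat_idx = flat.argmax(axis=1)             # flat position of each row's maximum
    return np.unravel_index(flat_idx, moved.shape[1:])

arr = np.array([[[1, 7, 6, 2],
                 [1, 5, 7, 1],
                 [2, 2, 5, 6]],
                [[9, 1, 2, 9],
                 [2, 9, 3, 9],
                 [0, 2, 0, 6]]])

coords = argmax_per_slice(arr, axis=1)
print([c.tolist() for c in coords])  # [[1, 1, 0], [0, 1, 3]]
```

The result is one coordinate array per remaining axis, each with one entry per index along the chosen axis. Where a maximum is tied, argmax reports the first occurrence in row-major order.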

Sorted array by column sum and excluding the largest sum of each column using Numpy

I would like to sort an array by column sum and delete the largest element of each column then continue the sorting.
#sorted by sum of columns
def sorting(a):
    b = np.sum(a, axis = 0)
    idx = b.argsort()
    a = np.take(a, idx, axis=1)
    return a
arr = [[1,2,3,8], [3,0,2,1],[5, 4, 25, 67], [11, 1, 6, 10]]
print(sorting(arr))
Here is the output:
[[ 2 1 3 8]
[ 0 3 2 1]
[ 4 5 25 67]
[ 1 11 6 10]]
I was able to find the max of each column and their indexes, but I couldn't delete them without deleting the whole row/column. Any help is appreciated; I am new to numpy!
Though not very elegant, one way to achieve this would be like this using broadcasting and fancy/advanced indexing:
import numpy as np
arr = np.array([[1,2,3,8], [3,0,2,1],[5, 4, 25, 67], [11, 1, 6, 10]])
First get the intermediate array sorted by column sums.
arr1 = arr[:, arr.sum(axis = 0).argsort()]
print(arr1)
# array([[ 2, 1, 3, 8],
# [ 0, 3, 2, 1],
# [ 4, 5, 25, 67],
# [ 1, 11, 6, 10]])
Next, find where the maxima occur in each column.
idx = arr1.argmax(axis = 0)
print(idx)
# array([2, 3, 2, 2])
Now prepare row and column index arrays for indexing arr1. The line that computes rows essentially takes, for each element of idx above, the set difference between {0, 1, 2, 3} (in general, all row indices of arr1) and that element, and stores the results along the columns of the rows matrix.
k = np.arange(arr1.shape[0]) # original number of rows
rows = np.nonzero(k != idx[:, None])[1].reshape(-1, arr1.shape[0] - 1).T
cols = np.arange(arr1.shape[1])
print(rows)
# array([[0, 0, 0, 0],
# [1, 1, 1, 1],
# [3, 2, 3, 3]])
Note that cols will be broadcast to the shape of rows when arr1 is indexed by them. To make this concrete, cols will look like this once it is compatible with rows:
print(np.broadcast_to(cols, rows.shape))
# array([[0, 1, 2, 3],
# [0, 1, 2, 3],
# [0, 1, 2, 3]])
Basically when you (fancy) index arr1 by them, you get the 0th column for rows 0, 1 and 3; 1st column for rows 0, 1 and 2 and so on. Hope you get the idea.
arr2 = arr1[rows, cols]
print(arr2)
# array([[ 2, 1, 3, 8],
# [ 0, 3, 2, 1],
# [ 1, 5, 6, 10]])
You can write a simple function composing these steps for your convenience to perform the operation multiple times.
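One way such a function might look is sketched below. It swaps the index-array construction for a boolean mask, which is a different technique from the one walked through above, and the name `drop_column_max` is hypothetical:

```python
import numpy as np

def drop_column_max(arr):
    """Sort columns by their sums, then remove the largest element of each
    column. Illustrative helper; uses a boolean mask instead of index arrays."""
    arr = np.asarray(arr)
    arr1 = arr[:, arr.sum(axis=0).argsort()]      # columns ordered by column sum
    n_rows, n_cols = arr1.shape
    keep = np.ones(arr1.shape, dtype=bool)
    # Mark the first maximum of every column for removal.
    keep[arr1.argmax(axis=0), np.arange(n_cols)] = False
    # Boolean indexing flattens in row-major order, so work on the transpose
    # to keep each column's surviving values together, then transpose back.
    return arr1.T[keep.T].reshape(n_cols, n_rows - 1).T

arr = np.array([[1, 2, 3, 8], [3, 0, 2, 1], [5, 4, 25, 67], [11, 1, 6, 10]])
arr2 = drop_column_max(arr)
print(arr2)
# [[ 2  1  3  8]
#  [ 0  3  2  1]
#  [ 1  5  6 10]]
```

Because argmax returns only the first maximum, exactly one element is dropped per column even when the maximum is repeated, so the result stays rectangular and the function can be applied again to its own output.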

get maximum of absolute along axis

I have a couple of ndarrays with same shape, and I would like to get one array (of same shape) with the maximum of the absolute values for each element. So I decided to stack all arrays, and then pick the values along the new stacked axis. But how to do this?
Example
Say we have two 1-D arrays with 4 elements each, so my stacked array looks like
>>> stack
array([[ 4, 1, 2, 3],
[ 0, -5, 6, 7]])
If I would just be interested in the maximum I could just do
>>> numpy.amax(stack, axis=0)
array([4, 1, 6, 7])
But I need to consider negative values as well, so I was going for
>>> ind = numpy.argmax(numpy.absolute(stack), axis=0)
>>> ind
array([0, 1, 1, 1])
So now I have the indices I need, but how do I apply them to the stacked array? If I just index stack by ind, numpy selects whole rows, which is not what I need:
>>> stack[ind]
array([[ 4, 1, 2, 3],
[ 0, -5, 6, 7],
[ 0, -5, 6, 7],
[ 0, -5, 6, 7]])
What I want to get is array([4, -5, 6, 7])
Or to ask from a slightly different perspective: How do I get the array numpy.amax(stack, axis=0) based on the indices returned by numpy.argmax(stack, axis=0)?
The stacking operation would be inefficient. We can simply use np.where to do the choosing based on the absolute valued comparisons -
In [198]: a
Out[198]: array([4, 1, 2, 3])
In [199]: b
Out[199]: array([ 0, -5, 6, 7])
In [200]: np.where(np.abs(a) > np.abs(b), a, b)
Out[200]: array([ 4, -5, 6, 7])
This works on generic n-dim arrays without any modification.
If you have a 2D numpy ndarray, indexing with a single index array selects whole rows. To pick one element per column and avoid that broadcasting, you have to supply an index array for each axis:
>>> stack[ind, np.arange(stack.shape[1])]
array([ 4, -5, 6, 7])
For 'normal' Python:
>>> a=[[1,2],[3,4]]
>>> b=[0,1]
>>> [x[y] for x,y in zip(a,b)]
[1, 4]
Perhaps it can be applied to arrays too, I am not familiar enough with Numpy.
Find array of max and min and combine using where
maxs = np.amax(stack, axis=0)
mins = np.amin(stack, axis=0)
max_abs = np.where(np.abs(maxs) > np.abs(mins), maxs, mins)
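To answer the second phrasing of the question directly (selecting values from the stacked array by the indices that argmax returns), np.take_along_axis is one option. A minimal sketch, reusing the `stack` array from the question; the variable name `picked` is chosen here for illustration:

```python
import numpy as np

stack = np.array([[4, 1, 2, 3],
                  [0, -5, 6, 7]])

# Position (along the stacking axis) of the largest absolute value per column.
ind = np.argmax(np.abs(stack), axis=0)

# take_along_axis needs an index array with the same number of dimensions as
# `stack`, so add a length-1 leading axis and remove it again afterwards.
picked = np.take_along_axis(stack, np.expand_dims(ind, axis=0), axis=0)[0]
print(picked)  # [ 4 -5  6  7]
```

The same pattern should carry over to stacks of n-dimensional arrays, since the index array is expanded along the stacking axis only.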

Is there a way to get the index of the median in python in one command?

Is there something like numpy.argmin(x), but for median?
a quick approximation:
numpy.argsort(data)[len(data)//2]
In general, this is an ill-posed question because an array does not necessarily contain its own median for numpy's definition of the median. For example:
>>> np.median([1, 2])
1.5
But when the length of the array is odd, the median will generally be in the array, so asking for its index does make sense:
>>> np.median([1, 2, 3])
2.0
For odd-length arrays, an efficient way to determine the index of the median value is by using the np.argpartition function. For example:
import numpy as np
def argmedian(x):
    return np.argpartition(x, len(x) // 2)[len(x) // 2]
# Works for odd-length arrays, where the median is in the array:
x = np.random.rand(101)
print("median in array:", np.median(x) in x)
# median in array: True
print(x[argmedian(x)], np.median(x))
# 0.5819150016674371 0.5819150016674371
# Doesn't work for even-length arrays, where the median is not in the array:
x = np.random.rand(100)
print("median in array:", np.median(x) in x)
# median in array: False
print(x[argmedian(x)], np.median(x))
# 0.6116799104572843 0.6047559243909065
This is quite a bit faster than the accepted sort-based solution as the size of the array grows:
x = np.random.rand(1000)
%timeit np.argsort(x)[len(x)//2]
# 10000 loops, best of 3: 25.4 µs per loop
%timeit np.argpartition(x, len(x) // 2)[len(x) // 2]
# 100000 loops, best of 3: 6.03 µs per loop
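Alongside the timing comparison, it may be worth confirming that the two approaches agree. A small sketch for an odd-length array of distinct values; the seed and the names `i_part` and `i_sort` are arbitrary choices made here:

```python
import numpy as np

def argmedian(x):
    # Index of the element that would sit in the middle position after sorting.
    mid = len(x) // 2
    return np.argpartition(x, mid)[mid]

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
x = rng.random(101)              # odd length, so the median is an element of x

i_part = argmedian(x)                   # partition-based index
i_sort = np.argsort(x)[len(x) // 2]     # sort-based index

print(i_part == i_sort, x[i_part] == np.median(x))
```

With repeated values the two indices may differ while still pointing at equal values, so comparing `x[i_part]` with `x[i_sort]` is the safer check in general.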
This seems to be an old question, but I found a nice way to do it:
import random
import numpy as np
#some random list with 20 elements
a = [random.random() for i in range(20)]
#find the median index of a
#(NumPy 1.22+ spells this keyword method='nearest'; interpolation= is the older name)
medIdx = a.index(np.percentile(a,50,interpolation='nearest'))
The neat trick here is percentile's built-in option for nearest interpolation, which returns a "real" median value from the list, so it is safe to search for it afterwards.
You can keep the indices with the elements (zip), sort, and return the middle element or the two middle elements; however, sorting is O(n log n). The following method is O(n) in terms of time complexity.
import numpy as np
def arg_median(a):
    if len(a) % 2 == 1:
        return np.where(a == np.median(a))[0][0]
    else:
        l,r = len(a) // 2 - 1, len(a) // 2
        left = np.partition(a, l)[l]
        right = np.partition(a, r)[r]
        return [np.where(a == left)[0][0], np.where(a == right)[0][0]]
print(arg_median(np.array([ 3, 9, 5, 1, 15])))
# 1 3 5 9 15, median=5, index=2
print(arg_median(np.array([ 3, 9, 5, 1, 15, 12])))
# 1 3 5 9 12 15, median=5,9, index=2,1
Output:
2
[2, 1]
The idea is that if there is only one median (the array has odd length), it returns the index of the median. If two elements need to be averaged (the array has even length), it returns the indices of these two elements in a list.
The problem with the accepted answer numpy.argsort(data)[len(data)//2] is that it only works for 1-dimensional arrays. For n-dimensional arrays we need a different solution, which is based on the answer proposed by @Hagay.
import numpy as np
# Initialize random 2d array, a
a = np.random.randint(0, 7, size=16).reshape(4,4)
array([[3, 1, 3, 4],
[5, 2, 1, 4],
[4, 2, 4, 2],
[6, 1, 0, 6]])
# Get the argmedians
np.stack(np.nonzero(a == np.percentile(a,50,interpolation='nearest')), axis=1)
array([[0, 0],
[0, 2]])
# Initialize random 3d array, a
a = np.random.randint(0, 10, size=27).reshape(3,3,3)
array([[[3, 5, 3],
[7, 4, 3],
[8, 3, 0]],
[[2, 6, 1],
[7, 8, 8],
[0, 6, 5]],
[[0, 7, 8],
[3, 1, 0],
[9, 6, 7]]])
# Get the argmedians
np.stack(np.nonzero(a == np.percentile(a,50,interpolation='nearest')), axis=1)
array([[0, 0, 1],
[1, 2, 2]])
The accepted answer numpy.argsort(data)[len(data)//2] cannot handle arrays with NaNs.
For a 2-D array, to get the column index of the median along axis=1 (per row):
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3.3, 4],
'b': [80, 23, np.nan, 88],
'c': [75, 45, 76, 67],
'd': [5, 4, 6, 7]})
data = df.to_numpy()
# data
array([[ 1. , 80. , 75. , 5. ],
[ 2. , 23. , 45. , 4. ],
[ 3.3, nan, 76. , 6. ],
[ 4. , 88. , 67. , 7. ]])
# median, ignoring NaNs
amedian = np.nanmedian(data, axis=1)
aabs = np.abs(data.T-amedian).T
idx = np.nanargmin(aabs, axis=1)
idx
array([2, 1, 3, 2])
# the accepted answer: note that the third index is 2, whose cell value is 76, which should not be the median value of row [ 3.3, nan, 76. , 6. ]
idx = np.argsort(data)[:, len(data[0])//2]
idx
array([2, 1, 2, 2])
Since this is a 4*4 array with an even number of columns and the third row contains a NaN, the median value for that row should be 6 (column index 3) instead of 76.
