Let's say I have the matrix
import numpy as np
A = np.matrix([[1,2,3,33],[4,5,6,66],[7,8,9,99]])
I am trying to understand the function argmax; as far as I know, it returns the largest value.
If I try it in Python:
np.argmax(A[1:,2])
Shouldn't I get the largest element from the second row through the last (third) row, along the third column? The slice should be the array [6 9], so argmax should return 9. But why does it return 1 when I run it in Python?
And if I want to return the largest element from row 2 onwards in column 3 (which is 9), how should I modify the code?
I have checked the documentation, but it is still a bit unclear to me. Thanks for the help and explanation.
No, argmax returns the position of the largest value. max returns the largest value.
import numpy as np
A = np.matrix([[1,2,3,33],[4,5,6,66],[7,8,9,99]])
np.argmax(A) # 11, which is the position of 99
np.argmax(A[:,:]) # 11, which is the position of 99
np.argmax(A[:1]) # 3, which is the position of 33
np.argmax(A[:,2]) # 2, which is the position of 9
np.argmax(A[1:,2]) # 1, which is the position of 9
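To answer the second part of the question (getting the value 9 itself), here is a minimal sketch. It assumes a plain np.array instead of np.matrix, and the variable names (sub, pos, largest, row_in_A) are only illustrative:

```python
import numpy as np

A = np.array([[1, 2, 3, 33], [4, 5, 6, 66], [7, 8, 9, 99]])

sub = A[1:, 2]           # rows 1.., column 2 -> array([6, 9])
pos = np.argmax(sub)     # position of the maximum inside the slice
largest = sub.max()      # the maximum value itself (same as sub[pos])

# The slice started at row 1, so add that offset to get the row in A.
row_in_A = pos + 1

# For the whole array, unravel_index turns the flat position into (row, col).
row, col = np.unravel_index(np.argmax(A), A.shape)

print(pos, largest, row_in_A)   # 1 9 2
print(row, col)                 # 2 3
```

Note that the position returned by argmax is always relative to the array you pass in, which is why the slice gives 1 rather than 2.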
It took me a while to figure this function out. Basically, argmax returns the index of the maximum value in the array. The array can be 1-dimensional or have multiple dimensions. Following are some examples.
1 dimensional
a = [1,2,3,4,5]
np.argmax(a)
>>4
The array is 1 dimensional so the function simply returns the index of the maximum value(5) in the array, which is 4.
Multiple dimensions
a = [[1,2,3],[4,5,6]]
np.argmax(a)
>>5
In this example the array is 2-dimensional, with shape (2,3). Since no axis parameter is specified, NumPy flattens the array to a 1-dimensional array and then returns the index of the maximum value. In this case the array is flattened to [1,2,3,4,5,6], and the index of 6, which is 5, is returned.
When parameter is axis = 0
a = [[1,2,3],[4,5,6]]
np.argmax(a, axis=0)
>>array([1, 1, 1])
The result here was a bit confusing to me at first. With axis=0, the function compares the values down each column (that is, across the rows) and returns, for every column, the row index of that column's maximum. Here the columns are [1,4], [2,5] and [3,6]; in each of them the larger value is in the second row, whose index is 1, hence [1, 1, 1]. According to the documentation, the dimension specified in the axis parameter is removed: the original shape was (2,3), so with axis=0 the result has shape (3,), one entry per column.
When parameter is axis = 1
a = [[1,2,3],[4,5,6]]
np.argmax(a, axis=1)
>>array([2, 2])
Same concept as above, but now the column index of the maximum is returned for each row. The rows are [1,2,3] and [4,5,6]; in both of them the maximum (3 and 6 respectively) is in the 3rd column, index 2, hence [2, 2]. The column dimension of the original shape (2,3) is removed, giving (2,), so the returned array has two elements, one per row.
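A small sketch that may make the axis behaviour easier to see. The array below is a made-up example (not the one above), chosen so that the per-column and per-row results differ:

```python
import numpy as np

a = np.array([[1, 9, 3],
              [4, 5, 6]])                   # shape (2, 3)

rows_of_col_max = np.argmax(a, axis=0)      # one row index per column
cols_of_row_max = np.argmax(a, axis=1)      # one column index per row

print(rows_of_col_max, rows_of_col_max.shape)   # [1 0 1] (3,)
print(cols_of_row_max, cols_of_row_max.shape)   # [1 2] (2,)
```

The axis you pass is the one that disappears from the shape: axis=0 leaves one entry per column, axis=1 leaves one entry per row.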
Here is how argmax works. Let's suppose we are given the following array
matrix = np.array([[1,2,3],[4,5,6],[7,8,9], [9, 9, 9]])
Now, find the max value from given array
np.max(matrix)
The answer will be -> 9
Now find argmax of given array
np.argmax(matrix)
The answer will be -> 8
How did it get 8? Let's understand.
NumPy flattens the array to one dimension, so the array will look like
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9])
Index 0 1 2 3 4 5 6 7 8 9 10 11
So the max value is 9, and the first occurrence of 9 is at index 8. That's why the answer of argmax is 8.
axis = 0 (column wise max)
Now, find max value column wise
np.argmax(matrix, axis=0)
Index 0 1 2
0 [1, 2, 3]
1 [4, 5, 6]
2 [7, 8, 9]
3 [9, 9, 9]
In the first column the values are 1 4 7 9; the max value is 9 and it is at index 3.
The same goes for the second column: the values are 2 5 8 9, and the max value 9 is at index 3.
In the third column the values are 3 6 9 9; the max value 9 is at indices 2 and 3, so its first occurrence is at index 2.
So the output will be [3, 3, 2].
axis = 1 (row wise)
Now find max value row wise
np.argmax(matrix, axis=1)
Index 0 1 2
0 [1, 2, 3]
1 [4, 5, 6]
2 [7, 8, 9]
3 [9, 9, 9]
In the first row the values are 1 2 3; the max value is 3 and it is at index 2.
In the second row the values are 4 5 6; the max value is 6 and it is at index 2.
In the third row the values are 7 8 9; the max value is 9 and it is at index 2.
In the fourth row the values are 9 9 9; the max value 9 is at indices 0, 1 and 2, but its first occurrence is at index 0.
So the output will be [2 2 2 0].
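The walkthrough above can be checked with a short sketch. The last line is an addition that is not part of the original answer: np.flatnonzero is used here to list every flat position of the maximum, since argmax only reports the first one.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 9, 9]])

flat_first = np.argmax(matrix)            # first 9 in the flattened array
per_column = np.argmax(matrix, axis=0)    # row index of the first max in each column
per_row = np.argmax(matrix, axis=1)       # column index of the first max in each row

# All flat positions holding the maximum, not only the first one.
all_positions = np.flatnonzero(matrix == matrix.max())

print(flat_first)      # 8
print(per_column)      # [3 3 2]
print(per_row)         # [2 2 2 0]
print(all_positions)   # [ 8  9 10 11]
```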
argmax is a function which gives the index of the greatest number in a given row or column; which one is decided by the axis argument of argmax. If we give axis=0, it returns, for each column, the row index of that column's maximum, and if we give axis=1, it returns, for each row, the column index of that row's maximum.
In your example, A[1:, 2] first fetches the rows from index 1 onwards and, from those rows, only the values in the column with index 2; argmax then finds the index of the max value within the resulting matrix.
In my first steps in Python I tested this function, and the result of the following example clarified for me how argmax works.
Example:
# Generating 2D array for input
array = np.arange(20).reshape(4, 5)
array[1][2] = 25
print("The input array: \n", array)
# without axis
print("\nThe max element: ", np.argmax(array))
# with axis
print("\nThe indices of max element: ", np.argmax(array, axis=0))
print("\nThe indices of max element: ", np.argmax(array, axis=1))
Result Example:
The input array:
[[ 0 1 2 3 4]
[ 5 6 25 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
The max element: 7
The indices of max element: [3 3 1 3 3]
The indices of max element: [4 2 4 4]
In that output we can see 3 results.
The highest element of the whole array is at position 7 of the flattened array.
The highest element in every column is in the last row, whose index is 3, except in the third column, where the highest value is in the second row, whose index is 1.
The highest element in every row is in the last column, whose index is 4, except in the second row, where the highest value is in the third column, whose index is 2.
Reference: https://www.crazygeeks.org/numpy-argmax-in-python/
I hope that it helps.
Related
I am reading Python for Data Analysis by Wes McKinney and came across the following:
Ranking assigns ranks from one through the number of valid data points in an array. The rank methods for Series and DataFrame are the place to look; by default rank breaks ties by assigning each group the mean rank:
In [215]: obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
In [216]: obj.rank()
Out[216]:
0 6.5
1 1.0
2 6.5
3 4.5
4 3.0
5 2.0
6 4.5
dtype: float64
Unfortunately, I have no idea what this function does, and I find the explanation and the related documentation equally confusing: https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html
I can't make heads or tails of this, what is this function doing?
TL;DR
In general, ranking assigns the numerical values 1 through n to sorted data with n values.
In order to understand pandas.Series.rank(), you first need to understand what ranking is; the Wikipedia article on ranking and material on tests for rank data explain it clearly.
As rank works on sorted data, try sorting the data first
obj.sort_values()
1 -5
5 0
4 2
3 4
6 4
0 7
2 7
After sorting the data, each value gets its own rank from 1 to n. As -5 is the lowest value, its rank is 1.
0 is the second lowest value, so it has rank 2, and 2 has rank 3. 4 is the 4th lowest value, but it is repeated.
As per the Series.rank documentation, there is a parameter called method whose default value is average: tied values each receive the average of the ranks they would otherwise occupy. Conceptually, rank sorts the data, computes the ranks, and finally maps each input element to its rank.
Hence, the two 4's occupy ranks 4 and 5, and their average is 4.5; similarly, the two 7's occupy ranks 6 and 7, and the average is 6.5.
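To see how the method parameter changes the treatment of ties, here is a small sketch comparing the default with two of the other documented options, "min" and "first". The variable names are only illustrative:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

average = obj.rank()                 # default: ties share the mean of their ranks
lowest = obj.rank(method="min")      # ties all get the lowest rank of the group
first = obj.rank(method="first")     # ties are ranked in order of appearance

print(average.tolist())   # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(lowest.tolist())    # [6.0, 1.0, 6.0, 4.0, 3.0, 2.0, 4.0]
print(first.tolist())     # [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]
```

In every case the result is aligned with the original index, not with the sorted order.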
Update: looking this over, I have figured it out.
-5 is the smallest value in the array, hence the element at the argmin index (1) has rank 1.0. The next smallest value is 0, hence the element at its index has rank 2.0. Finally, the largest value is 7, but it appears twice, so it is both the 6th and 7th ranked element, and its average rank is 6.5.
I hope all of you are having a great day. In my Python class, we are learning how to use NumPy, so we got an assignment about that. My question is this: what is a rank array, and how can I construct one using Python? My instructor tried to explain it with these lines, but I did not actually understand anything :(
These are the instructions:
rank_calculator(A) - 5 pts
Given a numpy ndarray A, return its rank array.
Input: [[ 9 4 15 0 18]
[16 19 8 10 1]]
Return value: [[4 2 6 0 8]
[7 9 3 5 1]]
The return value should be an ndarray of the same size and shape as the original array A.
So, can someone explain that? I am not so good at Python, unfortunately :(
You can use numpy.argsort multiple times to handle a matrix, as suggested in this answer on SO.
import numpy as np
inp = np.array([[9,4,15,0,18],
[16,19,8,10,1]])
inp.ravel().argsort().argsort().reshape(inp.shape)
array([[4, 2, 6, 0, 8],
[7, 9, 3, 5, 1]])
What is a rank matrix?
In summary, if I were to take all the integers in the matrix, and sort them smallest to largest, then assign each one a rank from 0 to 9, that would result in the rank matrix. Notice that the smallest is 0 which gets a rank of 0, while largest is 19, which gets the last rank of 9.
How the double argsort works
# printing them so they align nicely
print('Array ->', end='')
for i in inp.ravel().astype('str'):
    print(i.center(4), end='')
print('\n')
print('Sort1 ->', end='')
for i in inp.ravel().argsort().astype('str'):
    print(i.center(4), end='')
print('\n')
print('Sort2 ->', end='')
for i in inp.ravel().argsort().argsort().astype('str'):
    print(i.center(4), end='')
Array -> 9 4 15 0 18 16 19 8 10 1
Sort1 -> 3 9 1 7 0 8 2 5 4 6
Sort2 -> 4 2 6 0 8 7 9 3 5 1
Let's first summarize what argsort does. It returns the indices that would sort the array: Sort1[k] is the position in the array of the k-th smallest element. Applying argsort a second time inverts that mapping: Sort2[i] is the position at which index i appears in Sort1, which is exactly the rank of the i-th element of the array. Knowing this, we can trace the logic backwards, from sort2 to sort1 and then to the array.
Sort2[0] is 4 and Sort1[4] is 0, so the 0th element of the array (9) has rank 4.
Sort2[9] is 1 and Sort1[1] is 9, so the 9th element of the array (1) has rank 1.
Sort2[6] is 9 and Sort1[9] is 6, so the 6th element of the array (19) has rank 9.
It's a bit confusing to wrap your head around, but once you understand how argsort() works, you shouldn't have a problem.
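The same ranks can also be built with a single argsort by writing the inverse permutation explicitly, which some readers find easier to follow than the double argsort. This is a sketch of one possible rank_calculator (the function name comes from the assignment; the body is an assumption about what is expected, and tied values would get distinct ranks in an arbitrary order):

```python
import numpy as np

def rank_calculator(A):
    """Return an array of the same shape holding each element's 0-based rank."""
    flat = A.ravel()
    order = flat.argsort()                # order[k] = position of the k-th smallest
    ranks = np.empty_like(order)
    ranks[order] = np.arange(flat.size)   # invert: the element at order[k] gets rank k
    return ranks.reshape(A.shape)

inp = np.array([[9, 4, 15, 0, 18],
                [16, 19, 8, 10, 1]])
result = rank_calculator(inp)
print(result)
# [[4 2 6 0 8]
#  [7 9 3 5 1]]
```

For this input the result matches inp.ravel().argsort().argsort().reshape(inp.shape).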
Q) What is a rank array?
Ans: It's basically the position each element would have in sorted order.
What your teacher is asking is to return, for each element, the position it would occupy if all the elements were sorted in ascending order.
CODE:
import numpy as np
A = np.array([[9, 4, 15, 0, 18],
              [16, 19, 8, 10, 1]])
flatA = A.flatten()
sorted_flatA = sorted(flatA) # will become -> [0, 1, 4, 8, 9, 10, 15, 16, 18, 19]
# Using a 'MAP' to map each value of sorted_flatA to its index in sorted_flatA.
MAP = {}
for i in range(len(sorted_flatA)):
    MAP[sorted_flatA[i]] = i
# Then simply going through the 2D array and replacing the values with their ranks.
res = np.zeros(A.shape, dtype=int) # integer ranks, as in the expected output
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        res[i][j] = MAP[A[i][j]]
print(res)
I have a pandas DataFrame that is used for a heatmap. I would like the minimal value of each column to be along the diagonal.
I've sorted the columns using
data = data.loc[:, data.min().sort_values().index]
This works. Now I just need to sort the rows such that the min value of the first column is in row 0, the min value of the second column is in row 1, and so on.
Example
import seaborn as sns
import pandas as pd
data = [[5,1,9],
[7,8,6],
[5,3,2]]
data = pd.DataFrame(data)
#sns.heatmap(data)
data = data.loc[:, data.min().sort_values().index]
#sns.heatmap(data) # Gives result in step 1
# Step 1, columns sorted by min value: 1, 2, 5
data = [[1,9,5],
[8,6,7],
[3,2,5]]
data = pd.DataFrame(data)
#sns.heatmap(data)
# How do I perform step two, maintaining column order?
# Step 2, rows sorted by min value: 1, 2, 7
data = [[1,9,5],
[3,2,5],
[8,6,7]]
data = pd.DataFrame(data)
sns.heatmap(data)
Is this possible in pandas in a clever way?
Setup
data = pd.DataFrame([[5, 1, 9], [7, 8, 6], [5, 3, 2]])
You can accomplish this by using argsort of the diagonal elements of your sorted DataFrame, then indexing the DataFrame using these values.
Step 1
Use your initial sort:
data = data.loc[:, data.min().sort_values().index]
   1  2  0
0  1  9  5
1  8  6  7
2  3  2  5
Step 2
Use np.argsort with np.diag:
data.iloc[np.argsort(np.diag(data))]
   1  2  0
0  1  9  5
2  3  2  5
1  8  6  7
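Put together as one runnable sketch, the two steps look like this. One assumption to flag: ordering rows by the current diagonal values reproduces the desired result for this example, but it may not place every column's minimum on the diagonal for arbitrary data, because reordering the rows changes the diagonal itself. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[5, 1, 9], [7, 8, 6], [5, 3, 2]])

# Step 1: order the columns by their minimum value.
by_column = data.loc[:, data.min().sort_values().index]

# Step 2: order the rows by the values currently on the diagonal.
row_order = np.argsort(np.diag(by_column.to_numpy()))
result = by_column.iloc[row_order]

print(result)
```

The original row and column labels are kept, so result.index and result.columns show where each row and column came from.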
I'm not quite sure, but you've already done the following to sort the columns
data = data.loc[:, data.min().sort_values().index]
The same trick could also be applied to sort the rows
data = data.loc[data.min(axis=1).sort_values().index, :]
To move some values around so that the min value within each column is placed along the diagonal you could try something like this:
for i in range(len(data)):
    min_index = data.iloc[:, i].idxmin()
    if data.iloc[i,i] != data.iloc[min_index, i]:
        data.iloc[i,i], data.iloc[min_index,i] = data.iloc[min_index, i], data.iloc[i,i]
Basically just swap the min with the diagonal.
So I have an array of ranks (obtained from scipy.stats.rankdata):
1 2 5 3 4
The numbers are the ranks assigned to the corresponding indices. Now I want to shift the ranks downward by 2 positions. That is, I want the indices with the top 2 ranks to be assigned the last 2 ranks. The other elements must therefore increase in rank.
3 4 2 5 1
So the indices with the top 2 ranks, i.e. indices 2 and 4 with ranks 5 and 4, are given the bottom 2 ranks, 2 and 1. The ranks of the other elements are increased accordingly.
How do I implement this shift for any top n ranks?
Here's one way:
In [19]: ranks = np.array([1, 2, 5, 3, 4])
In [20]: n = 2
In [21]: new_ranks = (ranks + n - 1) % len(ranks) + 1
In [22]: new_ranks
Out[22]: array([3, 4, 2, 5, 1])
By adding n to the ranks and taking the result modulo len(ranks), the high numbers wrap around to the low end. The -1 before modding shifts the values down by 1, because the modulo will work with the values 0, 1, 2, ... len(ranks) - 1. The +1 after modding restores the ranks to the range 1, 2, ..., len(ranks).
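Wrapped in a function for any n, the same formula could look like the sketch below. The name shift_ranks is hypothetical, and the sketch assumes the ranks are exactly the integers 1..len(ranks) with no ties:

```python
import numpy as np

def shift_ranks(ranks, n):
    """Cyclically shift 1-based ranks so the top n ranks become the bottom n."""
    ranks = np.asarray(ranks)
    # -1 moves to the 0-based range for the modulo, +1 moves back to 1-based.
    return (ranks + n - 1) % len(ranks) + 1

ranks = np.array([1, 2, 5, 3, 4])
print(shift_ranks(ranks, 2))   # [3 4 2 5 1]
print(shift_ranks(ranks, 0))   # [1 2 5 3 4]
```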
I have a minimum and a maximum value, and I also have a 70x3 array. I would like to find all the rows whose value in the second column is within the range of the min and max values, and export all 3 columns of the array for those rows.
For example
A=[2,3,4
3,5,6
5,5,5
5,6,7
10,11,22
3,50,6]
The max value is 11 and the min is 5; the resulting matrix would be something like
B=[3,5,6
5,5,5
5,6,7
10,11,22]
Up to now, what I did is:
aa = []
for i in MatrixA[:, 1]:
    # inclusive bounds, so that 5 and 11 are kept as in the example
    if minimum <= i <= maximum:
        aa.append(i)
aa = np.asarray(aa)
But this only finds the values I need from the second column, and not the corresponding values from columns 1 and 3.
You can use
A[numpy.logical_and(5 <= A[:, 1], A[:, 1] <= 11), :]
With simple numpy expression:
import numpy as np
a = np.array([[2,3,4],[3,5,6], [5,5,5], [5,6,7],[10,11,22], [3,50,6]])
b = a[(a[:,1] >= 5) & (a[:,1] <= 11)]
print(b)
The output:
[[ 3 5 6]
[ 5 5 5]
[ 5 6 7]
[10 11 22]]
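a[:,1] selects the second column (index 1) of every row; the two comparisons are combined with & into a boolean mask, and indexing a with that mask keeps only the matching rows.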
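For reuse with other columns or bounds, the mask can be wrapped in a small helper. This is a sketch; the name rows_in_range and its parameters are hypothetical, and the bounds are treated as inclusive to match the example:

```python
import numpy as np

def rows_in_range(a, col, lo, hi):
    """Return the rows of a whose value in column col lies in [lo, hi]."""
    mask = (a[:, col] >= lo) & (a[:, col] <= hi)   # one boolean per row
    return a[mask]

a = np.array([[2, 3, 4], [3, 5, 6], [5, 5, 5], [5, 6, 7], [10, 11, 22], [3, 50, 6]])
b = rows_in_range(a, col=1, lo=5, hi=11)
print(b)
# [[ 3  5  6]
#  [ 5  5  5]
#  [ 5  6  7]
#  [10 11 22]]
```

The parentheses around each comparison are required, because & binds more tightly than >= and <=.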