Pulling elements from 2D array based on index array - python

I have an array as:
import numpy as np
A = np.arange(15).reshape(3, 5)
I also have an index array as:
ind = np.asarray([1,2,0,2,2])
Each element of ind gives the row of A to read for the corresponding column of A.
i.e.
From column 0 of A, I want the element in row ind[0] = 1
From column 4 of A, I want the element in row ind[4] = 2
Desired output is:
5, 11, 2, 13, 14

Use NumPy's fancy indexing, pairing each row index in ind with its column index:
A[ind,np.arange(ind.size)]
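To see why this works, here is a minimal sketch using the arrays from the question. The names cols, picked and picked_alt are only illustrative, and the np.take_along_axis variant is an alternative added here, not part of the original answer.

```python
import numpy as np

A = np.arange(15).reshape(3, 5)
ind = np.asarray([1, 2, 0, 2, 2])

# Pair each row index in `ind` with its column index 0..4.
cols = np.arange(ind.size)
picked = A[ind, cols]

# Alternative: take_along_axis needs an index array with the same ndim as A,
# so `ind` is reshaped to (1, 5) and the single result row is taken.
picked_alt = np.take_along_axis(A, ind[np.newaxis, :], axis=0)[0]

print(picked)      # [ 5 11  2 13 14]
print(picked_alt)  # [ 5 11  2 13 14]
```

Both forms read A[ind[j], j] for every column j, which matches the desired output.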

Related

How to compare each dataframe row to each point of a tuple and assign the closest point's index to a new column?

Imagine the following dataset:
X Y
0 2 4
1 5 6
2 3 4
Now, imagine the following tuple of points: ((2,4), (6,5), (1,14))
How can I find the closest point to each row and assign the index of the point to a new column?
For example, since the closest point to the first row is the point with index 0, the first row would become:
X Y Closest_Point
0 2 4 0
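Try scipy: cdist computes the distance from every row to every point at once, and argmin then picks the closest point for each row.
import numpy as np
from scipy.spatial import distance
l = ((2, 4), (6, 5), (1, 14))  # the tuple of points from the question
ary = distance.cdist(df.values, np.array(l), metric='euclidean')
ary.argmin(1)
Out[326]: array([0, 1, 0], dtype=int32)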
I would convert both the tuple and the dataset into NumPy arrays.
For the examples you gave:
import numpy as np
dataset = np.array([[2,4],[5,6],[3,4]])
points = np.array([[2,4],[6,5],[1,14]])
dataset_indexed = []
for i in range(dataset.shape[0]):
    # start with the distance to point 0 as the best so far
    temp = ((dataset[i,0]-points[0,0])**2 + (dataset[i,1]-points[0,1])**2)**(1/2)
    index = 0
    for n in range(points.shape[0]):
        dist = ((dataset[i,0]-points[n,0])**2 + (dataset[i,1]-points[n,1])**2)**(1/2)
        print(dist)
        if dist <= temp:
            temp = dist
            index = n
    dataset_indexed.append([dataset[i,0], dataset[i,1], index])
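The same nearest-point search can probably be written without explicit loops by using NumPy broadcasting. This is a sketch, not the answer's own code: diff, sq_dist and closest are names chosen here, and squared distances are used because they rank points the same way as Euclidean distances.

```python
import numpy as np

dataset = np.array([[2, 4], [5, 6], [3, 4]])
points = np.array([[2, 4], [6, 5], [1, 14]])

# diff[i, j] is the vector from points[j] to dataset[i]; shape (3, 3, 2).
diff = dataset[:, np.newaxis, :] - points[np.newaxis, :, :]

# Squared distance from every row to every point; shape (3, 3).
sq_dist = (diff ** 2).sum(axis=2)

# Index of the nearest point for each row.
closest = sq_dist.argmin(axis=1)

# Append the index as a third column, like dataset_indexed in the loop version.
dataset_indexed = np.column_stack([dataset, closest])
print(closest)  # [0 1 0]
```

One difference to be aware of: on an exact tie, argmin keeps the first closest point, while the loop with <= keeps the last one.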

Using numpy to calculate mean?

I have a 2D array which looks like this:
array = [[23 ,89, 4, 3, 0],[12, 73 ,3, 5,1],[7, 9 ,12, 11 ,0]]
The last column is always 0 or 1 in every row. My aim is to calculate two means for column 0: one over the rows where the last column's value is 0, and one over the rows where the last column's value is 1.
e.g. for given sample array above:
mean 1: 15 (mean for 0 column for all the rows where last column is 0)
mean 2: 12 (mean for 0 column for all the rows where last column is 1)
I have tried this (where train is my input array's name):
mean_c1_0=np.mean(train[:: , 0])
variance_c1_0=np.var(train[:: , 0])
This gets me the mean and variance over all the values in column 0.
I could always add a for loop and a couple of if conditions that check the last column before adding the corresponding value from column 0, but I am looking for a more efficient approach. Since I am new to Python, I was hoping there is a NumPy function that can get this done.
Can you point me to any such documentation?
You can use NumPy's boolean-mask filtering (see "How can I slice a numpy array by the value of the ith field?") and take the mean of the filtered rows. No loops needed.
import numpy
x = numpy.array([[23, 89, 4, 3, 0],[12, 73, 3, 5, 1],[7, 9, 12, 11, 0]])
numpy.mean(x[x[:,-1]==1][::,0])
numpy.mean(x[x[:,-1]==0][::,0])
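You can try this. Select column 0 after filtering the rows; otherwise the mean is taken over every column of the matching rows.
mean_of_zeros = np.mean(numpy_array[np.where(numpy_array[:,-1] == 0)][:, 0])
mean_of_ones = np.mean(numpy_array[np.where(numpy_array[:,-1] == 1)][:, 0])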
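As a self-contained sketch of the masking idea, using the sample array from the question: the names labels, col0 and mean_c1_1 are chosen here for illustration, and the variance line is an assumption about what the asker wants next, based on the variance_c1_0 attempt.

```python
import numpy as np

train = np.array([[23, 89, 4, 3, 0],
                  [12, 73, 3, 5, 1],
                  [7, 9, 12, 11, 0]])

labels = train[:, -1]  # last column: 0 or 1
col0 = train[:, 0]     # the column to average

# A boolean mask keeps only the entries whose label matches.
mean_c1_0 = col0[labels == 0].mean()
mean_c1_1 = col0[labels == 1].mean()
variance_c1_0 = col0[labels == 0].var()

print(mean_c1_0)      # 15.0
print(mean_c1_1)      # 12.0
print(variance_c1_0)  # 64.0
```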

Sorting pandas dataframe to get min value along diagonal

I have a pandas DataFrame that is used for a heatmap. I would like the minimum value of each column to lie along the diagonal.
I've sorted the columns using
data = data.loc[:, data.min().sort_values().index]
This works. Now I just need to sort the rows so that the row holding the min value of the first column becomes row 0, the row holding the min value of the second column becomes row 1, and so on.
Example
import seaborn as sns
import pandas as pd
data = [[5,1,9],
[7,8,6],
[5,3,2]]
data = pd.DataFrame(data)
#sns.heatmap(data)
data = data.loc[:, data.min().sort_values().index]
#sns.heatmap(data) # Gives result in step 1
# Step 1: columns sorted by min value: 1, 2, 5
data = [[1,9,5],
[8,6,7],
[3,2,5]]
data = pd.DataFrame(data)
#sns.heatmap(data)
# How do I perform step two, maintaining column order?
# Step 2: rows sorted so that the diagonal reads 1, 2, 7
data = [[1,9,5],
[3,2,5],
[8,6,7]]
data = pd.DataFrame(data)
sns.heatmap(data)
Is this possible in pandas in a clever way?
Setup
import numpy as np
import pandas as pd
data = pd.DataFrame([[5, 1, 9], [7, 8, 6], [5, 3, 2]])
You can accomplish this by using argsort of the diagonal elements of your sorted DataFrame, then indexing the DataFrame using these values.
Step 1
Use your initial sort:
data = data.loc[:, data.min().sort_values().index]
   1  2  0
0  1  9  5
1  8  6  7
2  3  2  5
Step 2
Use np.argsort with np.diag:
data.iloc[np.argsort(np.diag(data))]
   1  2  0
0  1  9  5
2  3  2  5
1  8  6  7
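To check how close a row ordering gets to the goal, one option is to compare the diagonal with each column's minimum. The sketch below reruns the two steps on the example data; the names result, values and on_diagonal are chosen here and are not part of the answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[5, 1, 9], [7, 8, 6], [5, 3, 2]])

# Step 1: order columns by their minimum value.
data = data.loc[:, data.min().sort_values().index]

# Step 2: order rows by the value currently on the diagonal.
result = data.iloc[np.argsort(np.diag(data.to_numpy()))]

# Check: does each diagonal entry equal the minimum of its column?
values = result.to_numpy()
on_diagonal = np.diag(values) == values.min(axis=0)
print(on_diagonal)  # [ True  True False]
```

For this data the last column's minimum (5) only occurs in rows that are already used for the first two columns, so the diagonal ends with 7, which matches the output the question asks for.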
I'm not quite sure this is what you want, but you've already done the following to sort the columns:
data = data.loc[:, data.min().sort_values().index]
The same trick can also be applied to sort the rows:
data = data.loc[data.min(axis=1).sort_values().index, :]
To move some values around so that the min value within each column is placed along the diagonal, you could try something like this:
for i in range(len(data)):
    # idxmin returns a label; it matches the position only with the default 0..n-1 index
    min_index = data.iloc[:, i].idxmin()
    if data.iloc[i,i] != data.iloc[min_index, i]:
        data.iloc[i,i], data.iloc[min_index,i] = data.iloc[min_index, i], data.iloc[i,i]
Basically, just swap the min with the diagonal element. Note that this moves individual values within a column, so rows are not kept intact.

How do I reverse the first four elements of the 1st axis and reversing the 2nd axis of a numpy array in a single operation?

I have a numpy array M of shape (n, 1000, 6). This can be thought of as n matrices with 1000 rows and 6 columns. For each matrix I would like to reverse the order of the rows (i.e. the top row is now at the bottom and vice versa) and then reverse the order of just the first 4 columns (so column 0 is now column 3, column 1 is column 2, column 2 is column 1 and column 3 is column 0 but column 4 is still column 4 and column 5 is still column 5). I would like to do this in a single operation, without doing indexing on the left side of the expression, so this would not be acceptable:
M[:,:,0:4] = M[:,:,0:4][:,:,::-1]
M[:,:,:] = M[:,::-1,:]
The operation needs to be achievable using the Keras backend, which disallows this. It must be of the form
M = M[indexing here that solves the task]
If I wanted to reverse the order of all the columns instead of just the first 4, this could easily be achieved with M = M[:,::-1,::-1], so I've been trying to modify this to achieve my goal but unfortunately can't work out how. Is this even possible?
M[:, ::-1, [3, 2, 1, 0, 4, 5]]
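The expression can be checked in plain NumPy against a step-by-step reference. This is only a sketch: n = 2 and the arange data are arbitrary test values, and whether a particular Keras backend accepts a list index like this is not verified here.

```python
import numpy as np

n = 2  # arbitrary number of matrices for the check
M = np.arange(n * 1000 * 6).reshape(n, 1000, 6)

# Single indexing expression: ::-1 reverses the rows, and the list
# reorders the columns so that only the first four are reversed.
result = M[:, ::-1, [3, 2, 1, 0, 4, 5]]

# Reference built with in-place assignment (the form the question rules out).
expected = M.copy()
expected[:, :, 0:4] = expected[:, :, 0:4][:, :, ::-1].copy()
expected = expected[:, ::-1, :]

print(np.array_equal(result, expected))  # True
print(result.shape)                      # (2, 1000, 6)
```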

Understanding argmax

Let's say I have the matrix
import numpy as np
A = np.matrix([[1,2,3,33],[4,5,6,66],[7,8,9,99]])
I am trying to understand the function argmax. As far as I know, it returns the largest value.
If I try it in Python:
np.argmax(A[1:,2])
Shouldn't I get the largest element from the second row to the last row (the third row) along the third column? That slice should be the array [6 9], so argmax should return 9. Why does it return the value 1 when I run it in Python?
And if I want to return the largest element from row 2 onwards in column 3 (which is 9), how should I modify the code?
I have checked the Python documentation but it is still a bit unclear. Thanks for the help and explanation.
No, argmax returns the position of the largest value; max returns the largest value.
import numpy as np
A = np.matrix([[1,2,3,33],[4,5,6,66],[7,8,9,99]])
np.argmax(A) # 11, which is the position of 99
np.argmax(A[:,:]) # 11, which is the position of 99
np.argmax(A[:1]) # 3, which is the position of 33
np.argmax(A[:,2]) # 2, which is the position of 9
np.argmax(A[1:,2]) # 1, which is the position of 9
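To get the value 9 itself rather than its position, np.max can be applied to the same slice. The sketch below uses np.array instead of np.matrix to keep the slice one-dimensional; sub, pos, largest and row_in_A are names chosen here.

```python
import numpy as np

A = np.array([[1, 2, 3, 33], [4, 5, 6, 66], [7, 8, 9, 99]])

sub = A[1:, 2]         # array([6, 9]): rows 1 onwards, column 2
pos = np.argmax(sub)   # position of the largest value inside the slice
largest = np.max(sub)  # the largest value itself

# The position is relative to the slice, so add the slice start (1)
# to recover the row index in the full array.
row_in_A = pos + 1

print(pos, largest, row_in_A)  # 1 9 2
```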
It took me a while to figure this function out. Basically, argmax returns the index of the maximum value in the array. The array can have one dimension or several. Here are some examples.
1 dimensional
a = [1,2,3,4,5]
np.argmax(a)
>>4
The array is 1 dimensional, so the function simply returns the index of the maximum value (5) in the array, which is 4.
Multiple dimensions
a = [[1,2,3],[4,5,6]]
np.argmax(a)
>>5
In this example the array is 2 dimensional, with shape (2,3). Since no axis parameter is specified, NumPy treats the array as flattened to 1 dimension and returns the index of the maximum value there. In this case the flattened array is [1,2,3,4,5,6], and the index of 6 is 5.
When parameter is axis = 0
a = [[1,2,3],[4,5,6]]
np.argmax(a, axis=0)
>>array([1, 1, 1])
The result here was a bit confusing to me at first. With axis=0, the function searches down the rows separately for each column and returns the row index of that column's maximum. The column maxima 4, 5 and 6 are all in the second row, whose index is 1, hence [1, 1, 1]. According to the documentation, the dimension specified in the axis parameter is removed: the original shape was (2,3), so with axis=0 the result has shape (3,), one entry per column.
When parameter is axis = 1
a = [[1,2,3],[4,5,6]]
np.argmax(a, axis=1)
>>array([2, 2])
Same concept as above, but now the function searches across the columns separately for each row and returns the column index of that row's maximum. The row maxima 3 and 6 are both in the 3rd column, index 2, hence [2, 2]. The column dimension of the original shape (2,3) is removed, giving (2,), so the result has one entry per row.
Here is how argmax works. Suppose we are given the array
matrix = np.array([[1,2,3],[4,5,6],[7,8,9], [9, 9, 9]])
Now, find the max value of the array
np.max(matrix)
The answer will be -> 9
Now find the argmax of the array
np.argmax(matrix)
The answer will be -> 8
Let's understand how it got 8.
NumPy treats the array as flattened to one dimension, so it looks like
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9])
Index 0 1 2 3 4 5 6 7 8 9 10 11
So the max value is 9 and the first occurrence of 9 is at index 8. That's why argmax returns 8.
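If the row and column of that flat index are needed, np.unravel_index can convert it back. A short sketch with the same matrix; flat, row and col are names chosen here.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 9, 9]])

flat = np.argmax(matrix)  # 8: index into the flattened array

# Convert the flat index back to (row, column) for the original shape (4, 3).
row, col = np.unravel_index(flat, matrix.shape)

print(flat, row, col)    # 8 2 2
print(matrix[row, col])  # 9
```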
axis = 0 (column wise max)
Now, find the position of the max value column wise
np.argmax(matrix, axis=0)
Index  0  1  2
0     [1, 2, 3]
1     [4, 5, 6]
2     [7, 8, 9]
3     [9, 9, 9]
In the first column the values are 1 4 7 9; the max value is 9, at index 3.
In the second column the values are 2 5 8 9; the max value is 9, at index 3.
In the third column the values are 3 6 9 9; the max value 9 is at indices 2 and 3, so the first occurrence of 9 is at index 2.
So the output will be [3, 3, 2].
axis = 1 (row wise)
Now find the position of the max value row wise
np.argmax(matrix, axis=1)
Index  0  1  2
0     [1, 2, 3]
1     [4, 5, 6]
2     [7, 8, 9]
3     [9, 9, 9]
In the first row the values are 1 2 3; the max value is 3, at index 2.
In the second row the values are 4 5 6; the max value is 6, at index 2.
In the third row the values are 7 8 9; the max value is 9, at index 2.
In the fourth row the values are 9 9 9; the max value 9 is at indices 0, 1 and 2, so the first occurrence of 9 is at index 0.
So the output will be [2 2 2 0].
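The per-axis indices can be turned back into the maximum values by pairing them with the index along the other axis. This sketch reuses the same matrix; idx0, idx1, col_max and row_max are names chosen here.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [9, 9, 9]])

idx0 = np.argmax(matrix, axis=0)  # row index of the max in each column
idx1 = np.argmax(matrix, axis=1)  # column index of the max in each row

# Pair each index with its position along the other axis to read the values.
col_max = matrix[idx0, np.arange(matrix.shape[1])]
row_max = matrix[np.arange(matrix.shape[0]), idx1]

print(idx0, col_max)  # [3 3 2] [9 9 9]
print(idx1, row_max)  # [2 2 2 0] [3 6 9 9]
```

The values agree with matrix.max(axis=0) and matrix.max(axis=1), which is usually the simpler call when only the values are needed.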
argmax is a function that gives the index of the greatest number along a row or column; which one is decided by the axis argument of argmax. With axis=0 it returns, for each column, the row index of that column's maximum; with axis=1 it returns, for each row, the column index of that row's maximum.
In your example, A[1:, 2] first selects the rows from row 1 onwards and, from those rows, only the values in column 2; argmax then returns the index of the max value within the resulting slice.
When I was first learning Python I tested this function, and the result of this example clarified for me how argmax works.
Example:
import numpy as np
# Generating 2D array for input
array = np.arange(20).reshape(4, 5)
array[1][2] = 25
print("The input array: \n", array)
# without axis: index into the flattened array
print("\nFlat index of the max element: ", np.argmax(array))
# with axis
print("\nRow index of the max in each column: ", np.argmax(array, axis=0))
print("\nColumn index of the max in each row: ", np.argmax(array, axis=1))
Result Example:
The input array:
 [[ 0  1  2  3  4]
 [ 5  6 25  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
Flat index of the max element:  7
Row index of the max in each column:  [3 3 1 3 3]
Column index of the max in each row:  [4 2 4 4]
That output shows 3 results.
The highest element of the whole array is at flat index 7.
The highest element of every column is in the last row, whose index is 3, except in the third column, where the highest value is in the second row, whose index is 1.
The highest element of every row is in the last column, whose index is 4, except in the second row, where the highest value is in the third column, whose index is 2.
Reference: https://www.crazygeeks.org/numpy-argmax-in-python/
I hope that it helps.
