Get the column with the maximum sum from a matrix - python

Let's say:
import numpy as np
f=np.matrix("1 2; 3 4 ; 5 6")
Is it possible to retrieve the index of the column with the largest column sum? How?

You could write:
>>> f.sum(axis=0).argmax()
1
So column 1 sums to the greatest value.
To clarify what this does: f.sum(axis=0) sums the columns of the matrix f, returning the matrix matrix([[ 9, 12]]). Then argmax() is used to find the index of the maximum value in this matrix of sums.
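The same two steps can be spelled out with a plain ndarray. This is a minimal sketch; using np.array instead of np.matrix and the names col_sums and best_col are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

# Same values as f in the question, as a regular 2-D array
f = np.array([[1, 2], [3, 4], [5, 6]])

# axis=0 collapses the rows, leaving one sum per column
col_sums = f.sum(axis=0)

# argmax returns the position of the largest sum, i.e. the column index
best_col = int(col_sums.argmax())

print(col_sums, best_col)
```

With an ndarray the sums come back as a 1-D array rather than a 1x2 matrix, but argmax gives the same column index.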

Related

Select rows of dataframe whose column values amount to a given sum

I need to find out how many of the first N rows of a dataframe make up (just over) 50% of the sum of values for that column.
Here's an example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 1), columns=list("A"))
0 0.681991
1 0.304026
2 0.552589
3 0.716845
4 0.559483
5 0.761653
6 0.551218
7 0.267064
8 0.290547
9 0.182846
therefore
sum_of_A = df["A"].sum()
4.868260213425804
and with this example I need to find, starting from row 0, how many rows I need to get a sum of at least 2.43413 (approximating 50% of sum_of_A).
Of course I could iterate through the rows and sum and break when I get over 50%, but is there a more concise/Pythonic/efficient way of doing this?
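I would use .cumsum(). The expression below selects the rows whose cumulative sum is still below half of the total sum; the number of rows you need is one more than that, because the next row is the one that pushes the running total past 50%: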
df[df["A"].cumsum() < df["A"].sum() / 2]
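To get the row count directly, one option is to count the rows that are still below the halfway mark and add one. This is a sketch that assumes all values are positive (so the running total only grows); the names half, cumulative, rows_needed and first_rows are hypothetical, and the data is copied from the example in the question.

```python
import pandas as pd

# Values copied from the example in the question
df = pd.DataFrame({"A": [0.681991, 0.304026, 0.552589, 0.716845, 0.559483,
                         0.761653, 0.551218, 0.267064, 0.290547, 0.182846]})

half = df["A"].sum() / 2
cumulative = df["A"].cumsum()

# Rows whose running total is still below half, plus the row that crosses it
rows_needed = int((cumulative < half).sum()) + 1

# The leading rows that together reach at least 50% of the total
first_rows = df.iloc[:rows_needed]
print(rows_needed, first_rows["A"].sum())
```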

How to compare each dataframe row to each point of a tuple and assign the closest point's index to a new column?

Imagine the following dataset:
X Y
0 2 4
1 5 6
2 3 4
Now, imagine the following tuple of points: ((2,4), (6,5), (1,14))
How can I find the closest point to each row and assign the index of the point to a new column?
For example, since the closest point to the first row is the point with index 0, the first row would become:
X Y Closest_Point
0 2 4 0
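Try scipy; the idea is to compute the distance from every row to every point at once and then take the index of the smallest distance in each row:
import numpy as np
from scipy.spatial import distance
l = ((2,4), (6,5), (1,14))  # the tuple of points from the question
ary = distance.cdist(df.values, np.array(l), metric='euclidean')
ary.argmin(1)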
Out[326]: array([0, 1, 0], dtype=int32)
I would definitely convert both the tuple and the dataset to NumPy arrays.
For the examples you gave:
import numpy as np
dataset = np.array([[2,4],[5,6],[3,4]])
points = np.array([[2,4],[6,5],[1,14]])
dataset_indexed = []
for i in range(dataset.shape[0]):
    temp= (((dataset[i,0]-points[0,0])**2 +(dataset[i,1]-points[0,1])**2)**(1/2))
    index=0
    for n in range(points.shape[0]):
        print(((dataset[i,0]-points[n,0])**2 +(dataset[i,1]-points[n,1])**2)**(1/2))
        if(((dataset[i,0]-points[n,0])**2 +(dataset[i,1]-points[n,1])**2)**(1/2)<=temp):
            temp= ((dataset[i,0]-points[n,0])**2 +(dataset[i,1]-points[n,1])**2)**(1/2)
            index = n
    dataset_indexed.append([dataset[i,0],dataset[i,1],index])
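The nested loops can also be replaced by NumPy broadcasting, which writes the result straight into the new column. This is a minimal sketch, not either answerer's code; the DataFrame construction and the names points, diff and sq_dist are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Dataset and points from the question
df = pd.DataFrame({"X": [2, 5, 3], "Y": [4, 6, 4]})
points = np.array(((2, 4), (6, 5), (1, 14)))

# Shapes (rows, 1, 2) and (1, points, 2) broadcast to (rows, points, 2)
diff = df[["X", "Y"]].to_numpy()[:, None, :] - points[None, :, :]

# Squared Euclidean distance from every row to every point
sq_dist = (diff ** 2).sum(axis=2)

# Index of the nearest point for each row
df["Closest_Point"] = sq_dist.argmin(axis=1)
print(df)
```

Squared distances are enough here because taking the square root does not change which point is closest.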

Best way to find the maximum sum of multiple arrays given constraints on index

Say I have 3 sorted arrays, each of length 4, and I want to choose one index from each array such that the indexes sum to 4. How would I find the maximum possible sum of the chosen values without testing all possible choices?
For instance I have the following arrays
1 : [0,0,0,8]
2 : [1,4,5,6]
3 : [1,5,5,5]
Then the solution would be 3,0,1, because 3 + 0 + 1 = 4 and 8 + 1 + 5 is the maximum sum over all combinations whose indexes sum to 4.
I need a solution that can be generalized to n arrays of size m where the sum of the indexes could equal anything.
For instance, it could be asked that this be solved with 1000 arrays all of size 1000 where the sum of the index is 2000.
If there is a python package somewhere that does this please let me know.
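This will do it, though I'm not sure it will be fast enough for your requirements. Note that it relies on DataFrame.lookup, which was removed in pandas 2.0, so it needs an older pandas:
import itertools
import pandas as pd
df1=pd.DataFrame([[0,0,0,8],[1,4,5,6],[1,5,5,5]])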
df=pd.DataFrame(list(itertools.product([0,1,2,3],[0,1,2,3],[0,1,2,3])))
df=df.loc[df.sum(1)<=4,:]
df.index=df.apply(tuple,1)
df.apply(lambda x : df1.lookup(df.columns.tolist(),list(x.name)),1).sum(1).idxmax()
Out[751]: (3, 0, 1)
df.apply(lambda x : df1.lookup(df.columns.tolist(),list(x.name)),1).sum(1).max()
Out[752]: 14
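For the larger case in the question, enumerating every combination grows too quickly. One common alternative, which is not the method used in the answer above, is dynamic programming over the running index total: for each array, keep the best sum reachable for every total from 0 to the target. The sketch below is an illustration under that assumption; the function name max_sum_with_index_total and its variable names are hypothetical, and it does not rely on the arrays being sorted.

```python
import numpy as np

def max_sum_with_index_total(arrays, target):
    """Pick one index per array so the indexes sum to target and the values sum is maximal.

    Returns (best_sum, indexes), or None if no choice of indexes reaches target.
    """
    neg_inf = -np.inf
    # best[s] = best value sum over the arrays seen so far with index total s
    best = np.full(target + 1, neg_inf)
    best[0] = 0.0
    choices = []  # choices[k][s] = index picked in array k to reach total s

    for a in arrays:
        a = np.asarray(a, dtype=float)
        new_best = np.full(target + 1, neg_inf)
        choice = np.full(target + 1, -1, dtype=int)
        for i, value in enumerate(a[: target + 1]):
            # Picking index i moves every reachable total s to s + i
            cand = best[: target + 1 - i] + value
            better = cand > new_best[i:]
            new_best[i:][better] = cand[better]
            choice[i:][better] = i
        best = new_best
        choices.append(choice)

    if best[target] == neg_inf:
        return None

    # Walk back through the recorded choices to recover the indexes
    indexes = []
    s = target
    for choice in reversed(choices):
        i = int(choice[s])
        indexes.append(i)
        s -= i
    return float(best[target]), indexes[::-1]

result = max_sum_with_index_total([[0, 0, 0, 8], [1, 4, 5, 6], [1, 5, 5, 5]], 4)
print(result)
```

The work grows roughly with (number of arrays) x (array length) x (target total) instead of with the number of index combinations.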

How do I reverse the first four elements of the 1st axis and reversing the 2nd axis of a numpy array in a single operation?

I have a numpy array M of shape (n, 1000, 6). This can be thought of as n matrices with 1000 rows and 6 columns. For each matrix I would like to reverse the order of the rows (i.e. the top row is now at the bottom and vice versa) and then reverse the order of just the first 4 columns (so column 0 is now column 3, column 1 is column 2, column 2 is column 1 and column 3 is column 0 but column 4 is still column 4 and column 5 is still column 5). I would like to do this in a single operation, without doing indexing on the left side of the expression, so this would not be acceptable:
M[:,0:4,:] = M[:,0:4,:][:,::-1,:]
M[:,:,:] = M[:,:,::-1]
The operation needs to be achievable using the Keras backend, which disallows this. It must be of the form
M = M[indexing here that solves the task]
If I wanted to reverse the order of all the columns instead of just the first 4, this could easily be achieved with M = M[:,::-1,::-1], so I've been trying to modify this to achieve my goal but unfortunately can't work out how. Is this even possible?
M[:, ::-1, [3, 2, 1, 0, 4, 5]]
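A quick check in plain NumPy that this single indexing expression matches the two-step description from the question (reverse the rows, then reverse only the first four columns). This is a sketch with a made-up array where n = 2; it only verifies NumPy behaviour, and whether a particular Keras backend accepts list indexing is not tested here.

```python
import numpy as np

n = 2  # hypothetical number of matrices
M = np.arange(n * 1000 * 6).reshape(n, 1000, 6)

# Single expression from the answer: reversed rows, explicit column order
result = M[:, ::-1, [3, 2, 1, 0, 4, 5]]

# Reference built in two steps with left-side indexing
expected = M[:, ::-1, :].copy()
expected[:, :, :4] = expected[:, :, 3::-1].copy()

print(np.array_equal(result, expected))
```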

Numpy Summation for Index in Boolean Array

In NumPy I have a boolean array with the same length as a matrix (one entry per row). I want to run a calculation on the matrix rows that correspond to True entries in the boolean array. How do I do this?
a: [true, false, true]
b: [[1,1,1],[2,2,2],[3,3,3]]
Say the function was to sum the elements of the sub arrays
index 0 is True: thus I add 3 to the summation (Starts at zero)
index 1 is False: thus summation remains at 3
index 2 is True: thus I add 9 to the summation for a total of 12
How do I do this (the boolean and summation part; I don't need how to add up each individual sub array)?
You can simply use your boolean array a to index into the rows of b, then take the sum of the resulting (2, 3) array:
import numpy as np
a = np.array([True, False, True])
b = np.array([[1,1,1],[2,2,2],[3,3,3]])
# index rows of b where a is True (i.e. the first and third row of b)
print(b[a])
# [[1 1 1]
# [3 3 3]]
# take the sum over all elements in these rows
print(b[a].sum())
# 12
It sounds like you would benefit from reading the numpy documentation on array indexing.
