I have two numpy matrices of the same shape.
In one of them, each column contains all 0's except for a single 1.
In the other matrix each column contains random numbers.
My goal is to count the number of columns for which the position of the 1 in the column of the first matrix corresponds with the position of the highest element in the column of the second matrix.
For example:
a = [[1,0],
[0,1]]
b = [[2,3],
[3,5]]
myFunc(a,b)
would yield 1 since the argmax of the first column in b is not the same as in a but it is the same in the second column.
My solution was to iterate over the columns, check whether the argmax was the same, store the results in a list and sum them at the end, but this doesn't take advantage of numpy's vectorized operations. Is there a faster way to do this? Thanks!
This checks the indices of max in each column of b against indices of 1s in corresponding column of a and counts the matches:
(a.T.nonzero()[1]==b.argmax(axis=0)).sum()
output in your example:
1
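Given that there is only a single 1 in each row (or column) of the first array, you can compare the argmax positions of the two arrays directly. Note that axis=1 below compares the arrays row by row; pass axis=0 to compare column by column, as in the question.
def myfunc(binary_array, value_array):
    # count the positions where the 1 and the maximum share the same index
    return np.sum(binary_array.argmax(axis=1) == value_array.argmax(axis=1))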
a = np.array([[1,0],
[0,1]])
b = np.array([[2,3],
[3,5]])
myfunc(a,b)
1
c=np.array([[0,1,0],[1,0,0],[0,0,1]])
d=np.array([[1,2,3],[2,2,3],[1,3,4]])
myfunc(c,d)
1
e=np.array([[0,1,0],[0,0,1],[0,0,1]])
f=np.array([[1,2,3],[2,2,3],[1,3,4]])
myfunc(e,f)
2
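For the column-wise count that the question asks for, a minimal sketch could look like this. The function name count_argmax_matches is hypothetical, and the sketch assumes that every column of the first array contains exactly one 1.

```python
import numpy as np

def count_argmax_matches(one_hot, values):
    """Count columns where the 1 in one_hot sits in the row of the column maximum of values."""
    # argmax along axis=0 yields one row index per column
    return int(np.sum(one_hot.argmax(axis=0) == values.argmax(axis=0)))

a = np.array([[1, 0],
              [0, 1]])
b = np.array([[2, 3],
              [3, 5]])
print(count_argmax_matches(a, b))  # 1: only the second column matches
```

If a column of the value array contains ties, argmax picks the first maximum, so the count depends on that convention.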
I have a dataframe.
How can I calculate the sum of, say, the first 3 negative elements of the first column?
I tried loc and iloc, but they sum all negative elements in the column. I expected -3.
First filter the negative values (less than 0), then take the first 3 and sum them:
out = df.loc[df.a.lt(0), 'a'].head(3).sum()
out = df.loc[df.a.lt(0), 'a'].iloc[:3].sum()
EDIT: If you need to select the first column by position, not by the label a:
out = df.iloc[df.iloc[:, 0].lt(0).to_numpy(), 0].iloc[:3].sum()
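A small runnable sketch of the filter-then-head approach follows. The dataframe here is hypothetical, since the original data was only shown as an image; it is chosen so that the first three negative values of column a sum to -3.

```python
import pandas as pd

# Hypothetical data: the original dataframe is not available
df = pd.DataFrame({"a": [1, -1, 2, -1, -1, -5, 3],
                   "b": [0, 1, 2, 3, 4, 5, 6]})

negatives = df.loc[df["a"].lt(0), "a"]   # keep only the negative entries of column a
out = negatives.head(3).sum()            # sum the first three of them
print(out)                               # -3; the later -5 is not included
```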
I want to retrieve the original index of the column with the largest sum at each iteration after the previous column with the largest sum is removed. Meanwhile, the row of the same index of the deleted column is also deleted from the matrix at each iteration.
For example, in a 10 by 10 matrix, the 5th column has the largest sum, hence the 5th column and row are removed. Now the matrix is 9 by 9 and the column sums are recalculated. Suppose the 6th column now has the largest sum, hence the 6th column and row of the current matrix are removed, which are the 7th in the original matrix. Do this iteratively until the desired number of column indices has been collected.
My code in Julia, which does not work, is pasted below. Step two in the for loop is not correct: because a row is removed at each iteration, the column sums differ from those of the original matrix.
Thanks!
# a matrix of random numbers
mat = rand(10, 10);
# column sum of the original matrix
matColSum = sum(mat, dims=1);
# iteratively remove columns with the largest sum
idxColRemoveList = [];
matTemp = mat;
for i in 1:4 # Suppose 4 columns need to be removed
    # 1. find the index of the column with the largest column sum at current iteration
    sumTemp = sum(matTemp, dims=1);
    maxSumTemp = maximum(sumTemp);
    idxColRemoveTemp = argmax(sumTemp)[2];
    # 2. record the original index of the removed scenario
    idxColRemoveOrig = findall(x->x==maxSumTemp, matColSum)[1][2];
    push!(idxColRemoveList, idxColRemoveOrig);
    # 3. update the matrix. Note that the corresponding row is also removed.
    matTemp = matTemp[Not(idxColRemoveTemp), Not(idxColRemoveTemp)];
end
Python solution:
import numpy as np
mat = np.random.rand(5, 5)
n_remove = 3
original = np.arange(len(mat)).tolist()
removed = []
for i in range(n_remove):
    col_sum = np.sum(mat, axis=0)
    col_rm = np.argsort(col_sum)[-1]
    removed.append(original.pop(col_rm))
    mat = np.delete(np.delete(mat, col_rm, 0), col_rm, 1)
print(removed)
print(original)
print(mat)
I'm guessing the problem you had was keeping track of which index in the original array each current column/row corresponds to. I've just used a list [0, 1, 2, ...] and popped one value from it in each iteration.
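A simpler way to code the problem is to overwrite the selected column and row with a very small number instead of deleting them. This avoids the "sort" and "pop" calls, and the indices stored in removed already refer to the original matrix. Note that it relies on the remaining column sums staying larger than those of the overwritten columns, which in practice holds for positive entries such as those from np.random.rand.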
import numpy as np
n = 1000
mat = np.random.rand(n, n)
n_remove = 500
removed = []
for i in range(n_remove):
    # get sum of each column
    col_sum = np.sum(mat, axis=0)
    col_rm = np.argmax(col_sum)
    # record the column ID
    removed.append(col_rm)
    # overwrite the col_rm-th column and row with a tiny value
    mat[:, col_rm] = 1e-10
    mat[col_rm, :] = 1e-10
print(removed)
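If the matrix may contain zero or negative entries, the tiny-value trick can pick an already removed column again. One possible variant, sketched here under that assumption, zeroes the removed row and column and excludes removed columns from the argmax with a boolean mask. The function name remove_largest_columns and the mask name active are hypothetical.

```python
import numpy as np

def remove_largest_columns(mat, n_remove):
    """Return the original indices of the removed columns, in removal order."""
    work = np.array(mat, dtype=float)            # copy, so the caller's matrix is untouched
    active = np.ones(work.shape[0], dtype=bool)  # True while a column/row is still present
    removed = []
    for _ in range(n_remove):
        col_sum = work.sum(axis=0)
        col_sum[~active] = -np.inf               # removed columns can never be chosen again
        col_rm = int(np.argmax(col_sum))
        removed.append(col_rm)
        active[col_rm] = False
        work[col_rm, :] = 0.0                    # the removed row no longer adds to any sum
        work[:, col_rm] = 0.0
    return removed

mat = np.array([[1, 9, 2],
                [3, 1, 7],
                [2, 1, 4]])
print(remove_largest_columns(mat, 2))  # [2, 1]
```

Here column 2 has the largest sum (13) and is removed first; of the remaining 2 by 2 block, column 1 has the larger sum (10 against 4).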
I am trying to find the 3 nearest neighbours of a row within a set of 10 rows (each block of 10 rows is a class), and then average those 3 neighbours.
I need to do this over an array of 400 rows, where each consecutive 10 rows belong to one class.
I think I have managed to capture the 3 nearest neighbours for each row within 'indices' below.
In the output below, 'indices' is a 10x3 matrix.
I'm just not sure how to reference the 3 rows in the original xclass that the 3 elements of each row of 'indices' refer to, then add them (the challenge) and divide by 3 to get the average (I assume this division is straightforward).
Updated this para after the responses below:
Basically, X has dimensions 400x4096
Indices could be for example [[1,3,5],[2,4,8].....]
What I need to do is average out rows 1,3 and 5 of X and obtain a resultant row of shape 1x4096.
Similarly average out rows 2,4,8 of X and obtain a new row for this set and so on for each row in indices.
So basically each element in a particular row of indices refers to a specific row in X.
'''
for counter in range(0,399,10):
    #print(counter)
    # slice 10 rows per class (counter+9 would give only 9)
    xclass=X[counter:counter+10]
    yclass=y[counter:counter+10]
    #print(xclass)
    nbrs = NearestNeighbors(n_neighbors=3, algorithm='brute').fit(xclass)
    distances, indices = nbrs.kneighbors(xclass)
    #print(indices)
#print(indices)
'''
I'd appreciate any insight.
You can index lists in python using a word as such...
a = ['aaa', 'bbb', 'ccc']
b = a[a.index('aaa')]
print(b)
output: aaa
and also like...
a = ['aaa', 'bbb', 'ccc']
word = 'aaa'
b = a[a.index(word)]
print(b)
output: aaa
so you can do something like...
a = ['aaa', 'bbb', 'ccc']
word = 'aaa'
b = a[a.index(word)+1]
print(b)
output: bbb
I assume you are using numpy (or something similar). In general, you can take any indexing array and use that to capture the particular entries of interest in another array. For example,
import numpy as np
#Accessing an array by an indexing array.
X = np.arange(30).reshape(6,5) #6, 5 long vectors
I = [[0,1,2],[3,4,5]] #I wish to collect vectors 0,1,2 together and vectors 3,4,5 together
C = X[I,:] #Will be a 2 by 3 by 5. 2 collections of 3 5-long vectors.
print(C)
#Computations for averaging those collected arrays.
#Note that C is shape (2,3,5), we wish to average the 3s together, hence we need to
#average along the middle axis (axis=1).
A = np.average(C,axis=1)
print(A)
More detail about X[I,:]. In general, what we did here was specify which row indices and which column indices of X to capture. Since we wanted the full vectors in X, we did not restrict the columns and captured all of them, hence the :. Likewise, we wanted to pull the rows 3 at a time, so we specified [0,1,2] and then [3,4,5]. You could change those to any indices you wish.
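Applied to the nearest-neighbour question, the same indexing could be sketched as follows. The small X and the indices array are hypothetical stand-ins for the 400x4096 data and the output of kneighbors; the sketch also assumes the indices are relative to the 10-row slice of the class, so the class offset counter has to be added before indexing the full X.

```python
import numpy as np

# Hypothetical stand-in: 20 rows of 4 features, i.e. two classes of 10 rows each
X = np.arange(80, dtype=float).reshape(20, 4)

# Hypothetical neighbour indices for the second class (rows 10..19 of X),
# given relative to the slice X[counter:counter+10]
indices = np.array([[1, 3, 5],
                    [2, 4, 8]])
counter = 10                         # offset of this class within X

neighbours = X[counter + indices]    # shape (2, 3, 4): 3 neighbour rows per row of indices
averaged = neighbours.mean(axis=1)   # shape (2, 4): one averaged row per row of indices
print(averaged)
```

Taking the mean along axis=1 does the sum and the division by 3 in one step.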
I have to take only some parts/subset of a matrix that has 1273x1273 dimension.
I have two indices ={i,j}, and I have to take the elements of a matrix that have i as index of row, but not j as column, and vice versa.
for example:
M=[[1,2,3,4],
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]]
If i=1 and j=3, I have to construct a submatrix that is
[[5,7],
[13,15]]
I am supposing that the first row and the first column have index=0.
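First, fetch rows i and j.
# copy the rows so that M itself is not modified
row_i = list(M[i])
row_j = list(M[j])
Then remove columns i and j from both rows.
# delete the higher index first so that the lower one does not shift
for c in sorted((i, j), reverse=True):
    del row_i[c]
    del row_j[c]
Then return your new matrix ([row_i, row_j]).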
I don't know whether i and j change in your case, but the basic way to extract the first 3 columns without the fourth one from a numpy array m is:
m[:,:3]
and if you want the last column separately, use:
m[:,3]
You can change the 3 to the column number you want.
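With numpy, the submatrix from the question (rows i and j, every column except i and j) could be selected in one step. This is a sketch that assumes M is converted to a numpy array; the names rows, cols and sub are made up for illustration.

```python
import numpy as np

M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
i, j = 1, 3

rows = [i, j]
# every column index except i and j
cols = [c for c in range(M.shape[1]) if c not in (i, j)]

# np.ix_ builds the cross product of the row and column selections
sub = M[np.ix_(rows, cols)]
print(sub)
```

For the example above this selects rows 1 and 3 and columns 0 and 2, which gives [[5, 7], [13, 15]].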
df.idxmax() returns the label of the maximum along one axis (rows or columns), but I want an arg_max(df) over the full dataframe that returns a tuple (row, column).
The use case I have in mind is feature selection, wherein I have a correlation matrix and want to "recursively" remove features with highest correlation. I preprocess the correlation matrix to consider its absolute values and set the diagonal elements to -1. Then I propose to use rec_drop, which recursively drops one amongst the feature-pair that has the highest correlation (subject to a cutoff: max_allowed_correlation), and returns the final list of features. E.g.:
S = S.abs()
np.fill_diagonal(S.values,-1) # so that max can't be on the diagonal now
S = rec_drop(S,max_allowed_correlation=0.95)
def rec_drop(S, max_allowed_correlation=0.99):
    max_corr = S.max().max()
    if max_corr<max_allowed_correlation: # base case for recursion
        return S.columns.tolist()
    row,col = arg_max(S) # row and col are distinct features - max can't be on the diagonal
    S = S.drop(row).drop(row,axis=1) # removing one of the features from S
    return rec_drop(S, max_allowed_correlation)
Assuming that your whole pandas table is numerical, you can convert it to a numpy array and find the location of the maximum there. However, numpy's argmax returns an index into the flattened data, so you need to convert it back to a (row, column) pair:
# Synthetic data
>>> table = pd.DataFrame(np.random.rand(5,3))
>>> table
          0         1         2
0  0.367720  0.235935  0.278112
1  0.645146  0.187421  0.324257
2  0.644926  0.861077  0.460296
3  0.035064  0.369187  0.165278
4  0.270208  0.782411  0.690871
[5 rows x 3 columns]
Transform table to numpy data and calculate argmax:
>>> data = table.to_numpy()  # as_matrix() was removed in pandas 1.0
>>> amax = data.argmax() # 7 in this case
>>> row, col = (amax//data.shape[1], amax%data.shape[1])
>>> row, col
(2, 1)
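The division and modulo step can also be left to np.unravel_index. Below is a sketch of an arg_max helper along those lines; the name comes from the question, but the implementation and the small matrix S are only illustrative assumptions. It returns labels rather than positions, so the result can be passed to drop.

```python
import numpy as np
import pandas as pd

def arg_max(df):
    """Return the (row label, column label) of the largest entry of df."""
    values = df.to_numpy()
    # unravel_index converts the flat argmax into a (row, column) position
    row_pos, col_pos = np.unravel_index(values.argmax(), values.shape)
    return df.index[row_pos], df.columns[col_pos]

# Hypothetical correlation-like matrix with the diagonal already set to -1
S = pd.DataFrame([[-1.0, 0.2, 0.9],
                  [0.2, -1.0, 0.4],
                  [0.9, 0.4, -1.0]],
                 index=list("xyz"), columns=list("xyz"))
print(arg_max(S))
```

For a symmetric matrix the maximum appears twice; argmax reports the first occurrence in row-major order, here row x and column z.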