I need to extract only a subset of a matrix of dimension 1273x1273.
I have two indices {i, j}, and I have to take the elements whose row index is i or j but whose column index is neither i nor j.
For example:
M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
If i=1 and j=3, I have to construct a submatrix that is
[[5,7],
[13,15]]
I am supposing that the first row and the first column have index=0.
First, fetch copies of rows i and j.
# names are less than perfect
row_i = M[i][:]
row_j = M[j][:]
Then remove columns j and i from both rows, deleting the larger index first so the smaller one doesn't shift.
for col in sorted((i, j), reverse=True):
    del row_i[col]
    del row_j[col]
Then return your new matrix, [row_i, row_j].
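Wrapped into a small helper (the name submatrix is just for illustration), the whole thing could look like this:
def submatrix(M, i, j):
    # copy rows i and j so the original matrix is left untouched
    row_i, row_j = M[i][:], M[j][:]
    # delete the larger column index first so the smaller one doesn't shift
    for col in sorted((i, j), reverse=True):
        del row_i[col]
        del row_j[col]
    return [row_i, row_j]

M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

print(submatrix(M, 1, 3))   # [[5, 7], [13, 15]]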
I don't know whether your i and j change the way you want, but the basic idea here is NumPy slicing. To extract the first three columns without the fourth one use:
m[:, :3]
and if you want the last column on its own use:
m[:, 3]
You can change the 3 to whatever column index you need.
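If the matrix is a NumPy array, the exact submatrix from the question can also be built in one step with np.ix_ (a small sketch; the variable names are mine):
import numpy as np

M = np.arange(1, 17).reshape(4, 4)
i, j = 1, 3

# rows i and j, all columns except i and j
keep_cols = [c for c in range(M.shape[1]) if c not in (i, j)]
print(M[np.ix_([i, j], keep_cols)])
# [[ 5  7]
#  [13 15]]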
I'm trying to build a basic game-like program where I need to rearrange a given matrix vertically. The matrix contains only 0s and 1s, with 0 representing lighter objects and 1 heavier ones. When the function runs, all the 1s should fall to the bottom of each column and all the 0s should rise to the top. The result must contain exactly the same number of 0s and 1s as the original matrix. Example:
-If I give the following matrix:
[1,0,1,1,0,1,0],
[0,0,0,1,0,0,0],
[1,0,1,1,1,1,1],
[0,1,1,0,1,1,0],
[1,1,0,1,0,0,1]
It should rearrange it to:
[0,0,0,0,0,0,0],
[0,0,0,1,0,0,0],
[1,0,1,1,0,1,0],
[1,1,1,1,1,1,1],
[1,1,1,1,1,1,1]
Any help or suggestions will be highly appreciated.
Consider using numpy for your matrices. You can then use np.sort to do what you want:
np.sort(matrix, axis=0)
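For example, applied to the matrix from the question:
import numpy as np

matrix = np.array([[1, 0, 1, 1, 0, 1, 0],
                   [0, 0, 0, 1, 0, 0, 0],
                   [1, 0, 1, 1, 1, 1, 1],
                   [0, 1, 1, 0, 1, 1, 0],
                   [1, 1, 0, 1, 0, 0, 1]])

# sorting along axis=0 sorts each column independently,
# so the 0s float to the top and the 1s sink to the bottom
print(np.sort(matrix, axis=0))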
Not as readable as the numpy approach, but if you want to use the plain-list approach you could:
Transpose the matrix by using the zip(*matrix) approach.
Sort the resulting rows (which are columns of the original matrix)
Transpose back.
You can do it in one line:
[row for row in zip(*[sorted(column) for column in zip(*matrix)])]
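Applied to the sample matrix, this produces the expected arrangement (here each row is wrapped back into a list):
matrix = [[1, 0, 1, 1, 0, 1, 0],
          [0, 0, 0, 1, 0, 0, 0],
          [1, 0, 1, 1, 1, 1, 1],
          [0, 1, 1, 0, 1, 1, 0],
          [1, 1, 0, 1, 0, 0, 1]]

# transpose, sort each column, transpose back
result = [list(row) for row in zip(*[sorted(col) for col in zip(*matrix)])]
for row in result:
    print(row)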
If you didn't want to use numpy (though you should), you could do:
from collections import Counter
test = [[1, 0, 1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0, 0, 0],
        [1, 0, 1, 1, 1, 1, 1],
        [0, 1, 1, 0, 1, 1, 0],
        [1, 1, 0, 1, 0, 0, 1]]

new_version = [[] for _ in test]  # one empty row per row of the original
for count, item in enumerate(test[0]):  # iterate over column indices (all rows assumed the same length)
    frequency = Counter([x[count] for x in test])  # frequency count for this column
    for count_inside, item_inside in enumerate(test):
        # the top frequency[0] positions of the column get 0, everything below gets 1
        value = 0 if 0 in frequency and count_inside < frequency[0] else 1
        new_version[count_inside].append(value)

print(new_version)
I am trying to come up with a function which does the following:
Take an N x N array (preferably a sparse CSR matrix)
Find which rows and/or columns consist entirely of 0 entries
Remove both the i-th row and the i-th column whenever either of them (or both) is all zeros
Return the new, smaller array (or sparse matrix), which has no all-zero rows or columns, together with the indices of the removed rows/columns.
I manage to return the correct array, but the returned indices of the removed rows and columns are not correct (there are too few of them): because I remove rows and columns as I go, rows/columns that were not all-zero before can become all-zero afterwards.
Take for example the array [0,1,0,0],[0,0,0,0],[0,1,0,0],[1,0,0,0]. I should remove the 2nd row and the 2nd column. But in the resulting array, [0,0,0],[0,0,0],[1,0,0], new rows and columns have become all-zero and must be removed in turn, and it's easy to see that the indices found at that point are "shifted", i.e. they are no longer the indices of the original matrix.
This is the function I have created so far:
def remove_zero_rows(X):
    # X is a scipy sparse matrix. We want to remove all zero rows/columns from it
    creat_list = list(range(0, X.shape[1]))
    nonzero_row_indice, nonzero_col_indice = X.nonzero()
    unique_nonzero_indice = np.unique(nonzero_row_indice)
    row_ind = np.array(list(set(creat_list).difference(unique_nonzero_indice)))  # set of all-zero rows
    nonzero_col_indice = np.unique(nonzero_col_indice)
    col_ind = np.array(list(set(creat_list).difference(nonzero_col_indice)))  # set of all-zero columns
    merge_two = list(set(row_ind) | set(col_ind))  # indices of zero rows/columns

    # Create new matrix
    for i in range(X.shape[1]):
        if (X.shape[1] - np.unique(X.nonzero()[0]).size > 0
                or X.shape[1] - np.unique(X.nonzero()[1]).size > 0):
            X = X[np.unique(X.nonzero()[0])][:, np.unique(X.nonzero()[0])]
            X = X[np.unique(X.nonzero()[1])][:, np.unique(X.nonzero()[1])]
            # print(i)
        else:
            break

    return X, row_ind, col_ind, merge_two
Thank you!
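One way to keep the original indices correct is to carry an array of original positions along while the matrix shrinks. A minimal sketch (the function name and the details are my own, not from the question):
import numpy as np
from scipy import sparse

def remove_zero_rows_cols(X):
    # X is assumed square; repeatedly drop every index whose row or column is all zero
    # and report the dropped indices in terms of the original matrix
    X = sparse.csr_matrix(X)
    X.eliminate_zeros()               # so explicitly stored zeros don't count as nonzeros
    orig = np.arange(X.shape[0])      # original index of each remaining row/column
    removed = []
    while True:
        row_nnz = X.getnnz(axis=1)
        col_nnz = X.getnnz(axis=0)
        drop = np.where((row_nnz == 0) | (col_nnz == 0))[0]
        if drop.size == 0:
            break
        removed.extend(orig[drop].tolist())
        keep = np.setdiff1d(np.arange(X.shape[0]), drop)
        X = X[keep][:, keep]
        orig = orig[keep]
    return X, sorted(removed)

X_small, dropped = remove_zero_rows_cols([[0, 1, 0, 0],
                                          [0, 0, 0, 0],
                                          [0, 1, 0, 0],
                                          [1, 0, 0, 0]])
print(dropped)   # [0, 1, 2, 3] -- every index ends up removed for this example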
I want to retrieve the original index of the column with the largest sum at each iteration after the previous column with the largest sum is removed. Meanwhile, the row of the same index of the deleted column is also deleted from the matrix at each iteration.
For example, in a 10 by 10 matrix, the 5th column has the largest sum, hence the 5th column and row are removed. Now the matrix is 9 by 9 and the sum of columns is recalculated. Suppose the 6th column has the largest sum, hence the 6th column and row of the current matrix are removed, which is the 7th in the original matrix. Do this iteratively until the desired number of columns index is preserved.
My Julia code, which does not work, is pasted below. Step two in the for loop is not correct, because a row is also removed at each iteration and so the column sums change.
Thanks!
# a matrix of random numbers
mat = rand(10, 10);

# column sum of the original matrix
matColSum = sum(mat, dims=1);

# iteratively remove columns with the largest sum
idxColRemoveList = [];
matTemp = mat;
for i in 1:4  # suppose 4 columns need to be removed
    # 1. find the index of the column with the largest column sum at the current iteration
    sumTemp = sum(matTemp, dims=1);
    maxSumTemp = maximum(sumTemp);
    idxColRemoveTemp = argmax(sumTemp)[2];

    # 2. record the original index of the removed column
    idxColRemoveOrig = findall(x -> x == maxSumTemp, matColSum)[1][2];
    push!(idxColRemoveList, idxColRemoveOrig);

    # 3. update the matrix; note that the corresponding row is also removed
    matTemp = matTemp[Not(idxColRemoveTemp), Not(idxColRemoveTemp)];
end
python solution:
import numpy as np

mat = np.random.rand(5, 5)
n_remove = 3
original = np.arange(len(mat)).tolist()
removed = []

for i in range(n_remove):
    col_sum = np.sum(mat, axis=0)
    col_rm = np.argsort(col_sum)[-1]
    removed.append(original.pop(col_rm))
    mat = np.delete(np.delete(mat, col_rm, 0), col_rm, 1)

print(removed)
print(original)
print(mat)
I'm guessing the problem you had was keeping track of which original index each current column/row corresponds to. I've just used a list [0, 1, 2, ...] and popped one value from it in each iteration.
A simpler way to code the problem would be to replace the elements of the selected column with a very small number instead of deleting the column. This avoids the use of "sort" and "pop" and improves efficiency.
import numpy as np

n = 1000
mat = np.random.rand(n, n)
n_remove = 500
removed = []

for i in range(n_remove):
    # get the sum of each column
    col_sum = np.sum(mat, axis=0)
    col_rm = np.argmax(col_sum)
    # record the column ID
    removed.append(col_rm)
    # overwrite the col_rm-th column and row with a very small number
    mat[:, col_rm] = 1e-10
    mat[col_rm, :] = 1e-10

print(removed)
I have two numpy matrices of the same shape.
In one of them each column contains all 0's except for a 1.
In the other matrix each column contains random numbers.
My goal is to count the number of columns for which the position of the 1 in the column of the first matrix corresponds with the position of the highest element in the column of the second matrix.
For example:
a = [[1, 0],
     [0, 1]]
b = [[2, 3],
     [3, 5]]
myFunc(a,b)
would yield 1 since the argmax of the first column in b is not the same as in a but it is the same in the second column.
My solution was to iterate over the columns, check whether the argmax was the same, store the result in a list, and sum that list at the end, but this doesn't take advantage of numpy's speed. Is there a faster way to do this? Thanks!
This checks the index of the maximum in each column of b against the index of the 1 in the corresponding column of a and counts the matches:
(a.T.nonzero()[1]==b.argmax(axis=0)).sum()
output in your example:
1
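For instance, with the arrays from the question:
import numpy as np

a = np.array([[1, 0],
              [0, 1]])
b = np.array([[2, 3],
              [3, 5]])

# row positions of the 1s, column by column, vs. row positions of b's column maxima
print((a.T.nonzero()[1] == b.argmax(axis=0)).sum())   # 1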
Given that there will only be a single 1 in the first array, you should just be able to compare whether the argmax lands at the same position:
def myfunc(binary_array, value_array):
    return np.sum(binary_array.argmax(axis=1) == value_array.argmax(axis=1))

a = np.array([[1, 0],
              [0, 1]])
b = np.array([[2, 3],
              [3, 5]])
myfunc(a, b)
1
c=np.array([[0,1,0],[1,0,0],[0,0,1]])
d=np.array([[1,2,3],[2,2,3],[1,3,4]])
myfunc(c,d)
1
e=np.array([[0,1,0],[0,0,1],[0,0,1]])
f=np.array([[1,2,3],[2,2,3],[1,3,4]])
myfunc(e,f)
2
I have a NumPy matrix that contains mostly non-zero values, but occasionally will contain a zero value. I need to be able to:
Count the non-zero values in each row and put that count into a variable that I can use in subsequent operations, perhaps by iterating through row indices and performing the calculations during the iterative process.
Count the non-zero values in each column and put that count into a variable that I can use in subsequent operations, perhaps by iterating through column indices and performing the calculations during the iterative process.
For example, one thing I need to do is to sum each row and then divide each row sum by the number of non-zero values in each row, reporting a separate result for each row index. And then I need to sum each column and then divide the column sum by the number of non-zero values in the column, also reporting a separate result for each column index. I need to do other things as well, but they should be easy after I figure out how to do the things that I am listing here.
The code I am working with is below. You can see that I am creating an array of zeros and then populating it from a csv file. Some of the rows will contain values for all the columns, but other rows will still have some zeros remaining in some of the last columns, thus creating the problem described above.
The last five lines of the code below are from another posting on this forum. These last five lines of code return a printed list of row/column indices for the zeros. However, I do not know how to use that resulting information to create the non-zero row counts and non-zero column counts described above.
ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
j = 0
for j in range(0, len(TestIDs)):
    TestID = str(TestIDs[j])
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(arguments that return correct directory)
    inputfile = open(directory, 'r')
    reader = csv.reader(inputfile)
    m = 0
    for row in reader:
        if m < 9:
            if row[0] != 'TestID':
                ANOVAInputMatrixValuesArray[(j-1), m] = row[2]
                m += 1
    inputfile.close()

IndicesOfZeros = indices(ANOVAInputMatrixValuesArray.shape)
locs = IndicesOfZeros[:, ANOVAInputMatrixValuesArray == 0]
pts = hsplit(locs, len(locs[0]))
for pt in pts:
    print(', '.join(str(p[0]) for p in pt))
Can anyone help me with this?
import numpy as np
a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])
columns = (a != 0).sum(0)
rows = (a != 0).sum(1)
The expression (a != 0) produces a boolean array of the same shape as the original a, containing True for every non-zero element.
The .sum(x) method sums the elements over axis x, and summing True/False values counts the True elements.
The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:
columns = np.array([2, 1, 3])
rows = np.array([2, 3, 1])
EDIT: The whole code could look like this (with a few simplifications in your original code):
ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(arguments that return correct directory)
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
        ANOVAInputMatrixValuesArray[j, :] = loadtxt(csvfile, comments='TestId', delimiter=';', usecols=(2,))[:9]

nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)
nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)
EDIT 2:
To get the mean value of all columns/rows, use the following:
colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)
What do you want to do if there are no non-zero elements in a column/row? Then we can adapt the code to solve such a problem.
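For instance, one possible adaptation (just a sketch, not part of the answer above) keeps the mean at 0 for all-zero columns by masking the division:
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

counts = (a != 0).sum(0)
# divide only where there is something to divide by; all-zero columns stay at 0
colMean = np.divide(a.sum(0), counts, out=np.zeros(a.shape[1]), where=counts != 0)
print(colMean)   # [1.5 3.  4. ]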
A fast way to count nonzero elements per row in a scipy sparse matrix m is:
np.diff(m.tocsr().indptr)
The indptr attribute of a CSR matrix gives, for each row, where that row starts within the data array, so taking the difference between consecutive entries yields the number of non-zero elements in each row.
Similarly, for the number of nonzero elements in each column, use:
np.diff(m.tocsc().indptr)
If the data is already in the appropriate form, these will run in O(m.shape[0]) and O(m.shape[1]) respectively, rather than O(m.getnnz()) in Marat and Finn's solutions.
If you need both row and column nonzero counts, and, say, m is already a CSR matrix, you might use:
row_nonzeros = np.diff(m.indptr)
col_nonzeros = np.bincount(m.indices)
which is not asymptotically faster than first converting to CSC (which is O(m.getnnz())) to get col_nonzeros, but is faster because of implementation details.
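A quick demonstration on a small CSR matrix:
import numpy as np
from scipy import sparse

m = sparse.csr_matrix([[0, 1, 1],
                       [0, 1, 0]])

print(m.indptr)                                      # [0 2 3] -> row boundaries into m.data
print(np.diff(m.indptr))                             # [2 1]   -> nonzeros per row
print(np.bincount(m.indices, minlength=m.shape[1]))  # [0 2 1] -> nonzeros per column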
A faster way is to clone your matrix, replacing the actual values with ones, and then simply sum over rows or columns:
X_clone = X.tocsc()
X_clone.data = np.ones( X_clone.data.shape )
NumNonZeroElementsByColumn = X_clone.sum(0)
NumNonZeroElementsByRow = X_clone.sum(1)
That worked 50 times faster for me than Finn Årup Nielsen's solution (1 second versus 53).
edit:
Perhaps you will need to convert NumNonZeroElementsByColumn into a 1-dimensional array with:
np.array(NumNonZeroElementsByColumn)[0]
For sparse matrices, use the getnnz() function supported by CSR/CSC matrix.
E.g.
a = scipy.sparse.csr_matrix([[0, 1, 1], [0, 1, 0]])
a.getnnz(axis=0)
array([0, 2, 1])
(a != 0) does not work for sparse matrices (scipy.sparse.lil_matrix) in my present version of scipy.
For sparse matrices I did:
(i, j) = X.nonzero()
column_sums = np.zeros(X.shape[1])
for n in np.asarray(j).ravel():
    column_sums[n] += 1.
I wonder if there is a more elegant way.
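One more compact variant along the same lines is to feed those column indices straight to np.bincount (a small sketch):
import numpy as np
from scipy import sparse

X = sparse.lil_matrix([[0, 1, 1],
                       [0, 1, 0]])

_, j = X.nonzero()
# count how many nonzero entries fall in each column
column_sums = np.bincount(j, minlength=X.shape[1])
print(column_sums)   # [0 2 1]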