Get indices of non-zero elements of 2D array - python

From Getting indices of both zero and nonzero elements in array, I can get the indices of the non-zero elements in a 1D array in numpy like this:
indices_nonzero = numpy.arange(len(array))[~bindices_zero]
Is there a way to extend it to a 2D array?

You can use numpy.nonzero.
The following code is self-explanatory:
import numpy as np
A = np.array([[1, 0, 1],
              [0, 5, 1],
              [3, 0, 0]])
nonzero = np.nonzero(A)
# Returns a tuple of (nonzero_row_index, nonzero_col_index)
# That is (array([0, 0, 1, 1, 2]), array([0, 2, 1, 2, 0]))
nonzero_row = nonzero[0]
nonzero_col = nonzero[1]
for row, col in zip(nonzero_row, nonzero_col):
    print("A[{}, {}] = {}".format(row, col, A[row, col]))
"""
A[0, 0] = 1
A[0, 2] = 1
A[1, 1] = 5
A[1, 2] = 1
A[2, 0] = 3
"""
You can even do this
A[nonzero] = -100
print(A)
"""
[[-100 0 -100]
[ 0 -100 -100]
[-100 0 0]]
"""
Other variations
np.where(array)
It is equivalent to np.nonzero(array),
but np.nonzero is preferred because its name makes the intent clear
np.argwhere(array)
It's equivalent to np.transpose(np.nonzero(array))
print(np.argwhere(A))
"""
[[0 0]
[0 2]
[1 1]
[1 2]
[2 0]]
"""

A = np.array([[1, 0, 1],
              [0, 5, 1],
              [3, 0, 0]])
np.stack(np.nonzero(A), axis=-1)
array([[0, 0],
[0, 2],
[1, 1],
[1, 2],
[2, 0]])
np.nonzero returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html
np.stack joins the arrays in this tuple along a new axis. In our case that is the innermost axis, also known as the last axis (denoted by -1).
The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.
New in version 1.10.0.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.stack.html
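To make the effect of the axis argument concrete, here is a minimal sketch on the same example array. The names `idx`, `by_axis` and `by_pair` are illustrative only and do not come from the answer above.

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 5, 1],
              [3, 0, 0]])

idx = np.nonzero(A)                # tuple of two index arrays, each of length 5

by_axis = np.stack(idx, axis=0)    # new axis first: one row per dimension of A
by_pair = np.stack(idx, axis=-1)   # new axis last: one row per non-zero element

print(by_axis.shape)               # (2, 5)
print(by_pair.shape)               # (5, 2)

# The axis=-1 result is the transpose of the axis=0 result
# and holds the same [row, col] pairs as np.argwhere(A).
print(np.array_equal(by_pair, by_axis.T))
print(np.array_equal(by_pair, np.argwhere(A)))
```

With axis=-1 each row of the result is one [row, col] pair, which is usually the more convenient layout for iterating over positions.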

Related

Numpy assign an array value based on the values of another array with column selected based on a vector

I have a 2 dimensional array
X
array([[2, 3, 3, 3],
[3, 2, 1, 3],
[2, 3, 1, 2],
[2, 2, 3, 1]])
and a 1 dimensional array
y
array([1, 0, 0, 1])
For each row of X, I want to find the column index where X has the lowest value and y has a value of 1, and set the corresponding row-column pair in a third matrix to 1.
For example, in the case of the first row of X, the column index corresponding to the minimum X value (for the first row only) and y = 1 is 0, so I want Z[0,0] = 1 and all other Z[0,i] = 0.
Similarly, for the second row, column index 0 or 3 gives the lowest X value with y = 1. Then I want either Z[1,0] or Z[1,3] = 1 (preferably Z[1,0] = 1 and all other Z[1,i] = 0, since column 0 is the first occurrence).
My final Z array will look like
Z
array([[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]])
One way to do this is using masked arrays.
import numpy as np
X = np.array([[2, 3, 3, 3],
              [3, 2, 1, 3],
              [2, 3, 1, 2],
              [2, 2, 3, 1]])
y = np.array([1, 0, 0, 1])
#get a mask in the shape of X. (True for places to ignore.)
y_mask = np.vstack([y == 0] * len(X))
X_masked = np.ma.masked_array(X, y_mask)
out = np.zeros_like(X)
#axis=1 takes the argmin along each row, i.e. across the unmasked columns.
mins = np.argmin(X_masked, axis=1)
#Output: array([0, 0, 0, 3], dtype=int64)
#Now just set the indexes to 1 on the minimum for each row.
out[np.arange(len(out)), mins] = 1
print(out)
[[1 0 0 0]
[1 0 0 0]
[1 0 0 0]
[0 0 0 1]]
You can use numpy.argmin() to get the index of the minimum value in each row of X. For example:
import numpy as np
a = np.arange(6).reshape(2,3) + 10
ids = np.argmin(a, axis=1)
Similarly, you can get the indexes where y is 1 with either numpy.nonzero or numpy.where.
Once you have the two index arrays setting the values in third array should be quite easy.
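One way the two index arrays might be combined is sketched below. This is an assumed completion of the idea, not the answerer's own code: it restricts X to the columns where y is 1, takes the row-wise argmin there, and maps the result back to the original column numbers. The names `cols`, `local`, `mins` and `Z` are illustrative.

```python
import numpy as np

X = np.array([[2, 3, 3, 3],
              [3, 2, 1, 3],
              [2, 3, 1, 2],
              [2, 2, 3, 1]])
y = np.array([1, 0, 0, 1])

cols = np.nonzero(y)[0]                 # columns where y == 1
local = np.argmin(X[:, cols], axis=1)   # position of each row's minimum within those columns
mins = cols[local]                      # translate back to column indices of X

Z = np.zeros_like(X)
Z[np.arange(len(X)), mins] = 1          # one 1 per row, at the selected column
print(Z)
```

Because np.argmin returns the first occurrence on ties, the second row selects column 0 rather than column 3, which matches the preference stated in the question.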

Generate NumPy array containing the indices of another NumPy array

I'd like to generate a np.ndarray NumPy array for a given shape of another NumPy array. The former array should contain the corresponding indices for each cell of the latter array.
Example 1
Let's say we have a = np.ones((3,)) which has a shape of (3,). I'd expect
[[0]
[1]
[2]]
since there is a[0], a[1] and a[2] in a which can be accessed by their indices 0, 1 and 2.
Example 2
For a shape of (3, 2) like b = np.ones((3, 2)) there is already very much to write. I'd expect
[[[0 0]
[0 1]]
[[1 0]
[1 1]]
[[2 0]
[2 1]]]
since there are 6 cells in b which can be accessed by the corresponding indices b[0][0], b[0][1] for the first row, b[1][0], b[1][1] for the second row and b[2][0], b[2][1] for the third row. Therefore we get [0 0], [0 1], [1 0], [1 1], [2 0] and [2 1] at the matching positions in the generated array.
Thank you very much for taking the time. Let me know if I can clarify the question in any way.
One way to do it with np.indices and np.stack:
np.stack(np.indices((3,)), -1)
#array([[0],
# [1],
# [2]])
np.stack(np.indices((3,2)), -1)
#array([[[0, 0],
# [0, 1]],
# [[1, 0],
# [1, 1]],
# [[2, 0],
# [2, 1]]])
np.indices returns an array of index grids, where each subarray holds the indices along one axis:
np.indices((3, 2))
#array([[[0, 0],
# [1, 1],
# [2, 2]],
# [[0, 1],
# [0, 1],
# [0, 1]]])
Then rearrange the array with np.stack, which stacks the index from each axis so that every cell holds its own full index:
np.stack(np.indices((3,2)), -1)
#array([[[0, 0],
# [0, 1]],
# [[1, 0],
# [1, 1]],
# [[2, 0],
# [2, 1]]])
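As a quick sanity check of this approach, the sketch below verifies that every cell of the generated array holds its own index. It also shows np.moveaxis as an alternative way to move the leading axis to the end. The helper name `index_grid` is hypothetical and not part of NumPy.

```python
import numpy as np

def index_grid(shape):
    """Hypothetical helper: array whose cell at index idx contains idx itself."""
    return np.stack(np.indices(shape), -1)

for shape in [(3,), (3, 2), (2, 3, 4)]:
    grid = index_grid(shape)
    # The result has the original shape plus a trailing axis of length ndim.
    assert grid.shape == shape + (len(shape),)
    # Every cell holds its own index.
    assert all(np.array_equal(grid[idx], idx) for idx in np.ndindex(*shape))
    # Moving the first axis of np.indices to the end gives the same array.
    assert np.array_equal(grid, np.moveaxis(np.indices(shape), 0, -1))

print(index_grid((3, 2)).shape)   # (3, 2, 2)
```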

numpy 3D array shape

I created a numpy array of shape (4,3,2); I expected the following code to print an array shaped 4 X 3 or 3 X 4
import numpy as np
x = np.zeros((4,3,2), np.int32)
print(x[:][:][0])
However, I got
[[0 0]
[0 0]
[0 0]]
Looks like a 2 X 3? I am really confused about numpy matrices now. Shouldn't I get
[[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]]
instead? How do I map a 3D image into a numpy 3D matrix?
For example, in MATLAB, the shape (m, n, k) means (row, col, slice) in a context of an (2D/3D) image.
Thanks a lot
x[:] slices all elements along the first dimension, so x[:] gives the same result as x and x[:][:][0] is thus equivalent to x[0].
To select an element on the last dimension, you can do:
x[..., 0]
#array([[0, 0, 0],
# [0, 0, 0],
# [0, 0, 0],
# [0, 0, 0]], dtype=int32)
or:
x[:,:,0]
#array([[0, 0, 0],
# [0, 0, 0],
# [0, 0, 0],
# [0, 0, 0]], dtype=int32)
You need to specify all slices at the same time in a tuple, like so:
x[:, :, 0]
If you do x[:][:][0] you are actually indexing the first dimension three times. The first two create a view of the entire array and the third creates a view of index 0 along the first dimension.
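The difference between chained indexing and a single index tuple can be checked by comparing shapes. This is a small sketch using the array from the question:

```python
import numpy as np

x = np.zeros((4, 3, 2), np.int32)

print(x[:].shape)          # (4, 3, 2): a full slice leaves the shape unchanged
print(x[:][:][0].shape)    # (3, 2): same as x[0]
print(x[0].shape)          # (3, 2)
print(x[:, :, 0].shape)    # (4, 3): index 0 taken on the last axis
print(x[..., 0].shape)     # (4, 3): Ellipsis stands in for the leading axes
```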

Using a numpy array to assign values to another array

I have the following numpy array matrix:
matrix = np.zeros((3,5), dtype = int)
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
Suppose I have this numpy array indices as well
indices = np.array([[1,3], [2,4], [0,4]])
array([[1, 3],
[2, 4],
[0, 4]])
Question: How can I assign 1s to the elements in the matrix where their indices are specified by the indices array. A vectorized implementation is expected.
For more clarity, the output should look like:
array([[0, 1, 0, 1, 0], #[1,3] elements are changed
[0, 0, 1, 0, 1], #[2,4] elements are changed
[1, 0, 0, 0, 1]]) #[0,4] elements are changed
Here's one approach using NumPy's fancy-indexing -
matrix[np.arange(matrix.shape[0])[:,None],indices] = 1
Explanation
We create the row indices with np.arange(matrix.shape[0]) -
In [16]: idx = np.arange(matrix.shape[0])
In [17]: idx
Out[17]: array([0, 1, 2])
In [18]: idx.shape
Out[18]: (3,)
The column indices are already given as indices -
In [19]: indices
Out[19]:
array([[1, 3],
[2, 4],
[0, 4]])
In [20]: indices.shape
Out[20]: (3, 2)
Let's make a schematic diagram of the shapes of row and column indices, idx and indices -
idx (row) : 3
indices (col) : 3 x 2
For using the row and column indices for indexing into input array matrix, we need to make them broadcastable against each other. One way would be to introduce a new axis into idx, making it 2D by pushing the elements into the first axis and allowing a singleton dim as the last axis with idx[:,None], as shown below -
idx (row) : 3 x 1
indices (col) : 3 x 2
Internally, idx would be broadcasted, like so -
In [22]: idx[:,None]
Out[22]:
array([[0],
[1],
[2]])
In [23]: indices
Out[23]:
array([[1, 3],
[2, 4],
[0, 4]])
In [24]: np.repeat(idx[:,None],2,axis=1) # indices has length of 2 along cols
Out[24]:
array([[0, 0], # Internally broadcasting would be like this
[1, 1],
[2, 2]])
Thus, the broadcasted elements from idx would be used as row indices and column indices from indices for indexing into matrix for setting elements in it. Since, we had -
idx = np.arange(matrix.shape[0]),
Thus, we would end up with -
matrix[np.arange(matrix.shape[0])[:,None],indices] for setting elements.
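To see the broadcasting step explicitly, the sketch below uses np.broadcast_arrays to materialise the row and column index arrays before indexing. It also shows np.put_along_axis (available in NumPy 1.15 and later) as another way to express the same per-row assignment. The names `rows`, `cols` and `alt` are illustrative and not from the answer above.

```python
import numpy as np

matrix = np.zeros((3, 5), dtype=int)
indices = np.array([[1, 3], [2, 4], [0, 4]])

# Make the implicit broadcasting explicit: both arrays end up with shape (3, 2).
rows, cols = np.broadcast_arrays(np.arange(matrix.shape[0])[:, None], indices)
print(rows.shape, cols.shape)      # (3, 2) (3, 2)

# Each (rows[i, j], cols[i, j]) pair addresses one element of matrix.
matrix[rows, cols] = 1
print(matrix)

# Alternative: for each row, write 1 at the listed column positions.
alt = np.zeros((3, 5), dtype=int)
np.put_along_axis(alt, indices, 1, axis=1)
print(np.array_equal(matrix, alt))
```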
This involves a loop and hence may not be very efficient for large arrays:
for i in range(len(indices)):
    matrix[i, indices[i]] = 1
> matrix
Out[73]:
array([[0, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 0, 1]])

Sum over rows in scipy.sparse.csr_matrix

I have a big csr_matrix and I want to add over rows and obtain a new csr_matrix with the same number of columns but reduced number of rows. (Context: The matrix is a document-term matrix obtained from sklearn CountVectorizer and I want to be able to quickly combine documents according to codes associated with these documents)
For a minimal example, this is my matrix:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())
[[1 0 0 0 0]
[0 0 3 0 0]
[0 5 0 0 0]
[4 0 0 0 0]
[0 0 2 0 0]]
Now let's say I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing them, which would look something like this:
[[5 0 0 0 0]
[0 5 5 0 0]]
And should be again in sparse format (because the real data I'm working with is large). I tried to sum over slices of the matrix and then stack it:
idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))
But this gives me the summed up values just for the non-zero columns in the slice, so I can't combine it with the other slices because the number of columns in the summed slices are different.
I feel like there must be an easy way to do this. But I couldn't find any discussion of this online or in the documentation. What am I missing?
Thank you for your help
Note that you can do this by carefully constructing another matrix. Here's how it would work for a dense matrix:
>>> S = np.array([[1, 0, 0, 1, 0,], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
>>>
The sparse version is only a little more complicated. The information about which rows should be summed together is encoded in row:
col = range(5)
row = [0, 1, 1, 0, 1]
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S * A
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())
Output:
<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
[0 5 5 0 0]]
You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
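For the general case, the selector matrix S can be built from one group label per row of A. The sketch below assumes a hypothetical `groups` array holding those labels (standing in for the document codes mentioned in the question) and uses the `@` operator for the sparse product.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))

# Hypothetical labels: groups[i] is the output row that row i of A is summed into.
groups = np.array([0, 1, 1, 0, 1])
n_groups = groups.max() + 1

# S has a single 1 in each column i, placed in row groups[i].
S = csr_matrix(
    (np.ones(len(groups), dtype=int), (groups, np.arange(len(groups)))),
    shape=(n_groups, A.shape[0]),
)

B = S @ A                  # sparse product: row g of B is the sum of A's rows labelled g
print(issparse(B))
print(B.toarray())
```

Adding a third group only requires using the label 2 in `groups`. The shape of S follows from `n_groups`, so nothing else has to change.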
The indexing should be:
idx1 = [0, 3] # rows 1 and 4
idx2 = [1, 2, 4] # rows 2,3 and 5
Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:
A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
Note, I think the A[idx, :].sum(axis=0) operations involve conversion from sparse matrices to dense ones, so @Mr_E's answer is probably better.
Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):
A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))
Giving:
matrix([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
