From Getting indices of both zero and nonzero elements in array, I can get the indices of non-zero elements in a 1D array in NumPy like this:
indices_nonzero = numpy.arange(len(array))[~bindices_zero]
Is there a way to extend it to a 2D array?
You can use numpy.nonzero.
The following code is self-explanatory:
import numpy as np
A = np.array([[1, 0, 1],
[0, 5, 1],
[3, 0, 0]])
nonzero = np.nonzero(A)
# Returns a tuple of (nonzero_row_index, nonzero_col_index)
# That is (array([0, 0, 1, 1, 2]), array([0, 2, 1, 2, 0]))
nonzero_row = nonzero[0]
nonzero_col = nonzero[1]
for row, col in zip(nonzero_row, nonzero_col):
    print("A[{}, {}] = {}".format(row, col, A[row, col]))
"""
A[0, 0] = 1
A[0, 2] = 1
A[1, 1] = 5
A[1, 2] = 1
A[2, 0] = 3
"""
You can even assign through the index tuple directly:
A[nonzero] = -100
print(A)
"""
[[-100 0 -100]
[ 0 -100 -100]
[-100 0 0]]
"""
Other variations
np.where(array)
With a single argument, it is equivalent to np.nonzero(array).
However, np.nonzero is preferred because its name states what it does.
np.argwhere(array)
It's equivalent to np.transpose(np.nonzero(array))
print(np.argwhere(A))
"""
[[0 0]
[0 2]
[1 1]
[1 2]
[2 0]]
"""
A = np.array([[1, 0, 1],
[0, 5, 1],
[3, 0, 0]])
np.stack(np.nonzero(A), axis=-1)
array([[0, 0],
[0, 2],
[1, 1],
[1, 2],
[2, 0]])
np.nonzero returns a tuple of arrays, one for each dimension of a (the input array), containing the indices of the non-zero elements in that dimension.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html
np.stack joins this tuple of arrays along a new axis. In our case, that is the innermost axis, also known as the last axis (denoted by -1).
The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.
New in version 1.10.0.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.stack.html
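To see what the axis parameter changes, here is a short sketch (same A as above) comparing axis=0 and axis=-1. Stacking on the first axis keeps one row per dimension, while stacking on the last axis gives one (row, col) pair per row, which is the transpose.

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 5, 1],
              [3, 0, 0]])

idx = np.nonzero(A)  # tuple (row_indices, col_indices), each of shape (5,)

by_axis0 = np.stack(idx, axis=0)  # new axis first: shape (2, 5), one row per dimension
by_last = np.stack(idx, axis=-1)  # new axis last: shape (5, 2), one (row, col) pair per row

print(by_axis0.shape, by_last.shape)  # (2, 5) (5, 2)
# Stacking on the last axis is the same as transposing the axis=0 result
assert np.array_equal(by_last, by_axis0.T)
```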
I have a 2-dimensional array
X
array([[2, 3, 3, 3],
[3, 2, 1, 3],
[2, 3, 1, 2],
[2, 2, 3, 1]])
and a 1-dimensional array
y
array([1, 0, 0, 1])
For each row of X, I want to find the column index where X has its lowest value among the columns where y is 1, and set the corresponding (row, column) entry of a third matrix, Z, to 1.
For example, for the first row of X, the column index with the minimum X value (within the first row only) and y = 1 is 0, so I want Z[0,0] = 1 and all other Z[0,i] = 0.
Similarly, for the second row, column index 0 or 3 gives the lowest X value with y = 1. Then I want either Z[1,0] or Z[1,3] = 1 (preferably Z[1,0] = 1 and all other Z[1,i] = 0, since column 0 is the first occurrence).
My final Z array will look like:
Z
array([[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]])
One way to do this is using masked arrays.
import numpy as np
X = np.array([[2, 3, 3, 3],
[3, 2, 1, 3],
[2, 3, 1, 2],
[2, 2, 3, 1]])
y = np.array([1, 0, 0, 1])
#Get a mask in the shape of X (True for places to ignore).
y_mask = np.vstack([y == 0] * len(X))
X_masked = np.ma.masked_array(X, y_mask)
out = np.zeros_like(X)
#argmin along axis=1 gives one column index per row.
mins = np.argmin(X_masked, axis=1)
#Output: array([0, 0, 0, 3], dtype=int64)
#Now set the position of the minimum in each row to 1.
out[np.arange(len(out)), mins] = 1
print(out)
[[1 0 0 0]
[1 0 0 0]
[1 0 0 0]
[0 0 0 1]]
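A caveat worth checking (a sketch, not part of the original answer): argmin over axis=1 returns one column index per row, which is what out[np.arange(len(out)), mins] expects, while axis=0 returns one row index per column. On the X from the question both happen to give [0, 0, 0, 3]. The hypothetical matrix X2 below is made up so that the two differ; np.tile is used here to build the same kind of mask as the vstack line.

```python
import numpy as np

# Hypothetical example matrix (not from the question), chosen so that
# per-row and per-column minima differ.
X2 = np.array([[5, 0, 0, 1],
               [1, 0, 0, 9],
               [2, 0, 0, 3],
               [4, 0, 0, 6]])
y = np.array([1, 0, 0, 1])

# Mask (True = ignore) every column where y == 0, repeated for each row.
mask = np.tile(y == 0, (len(X2), 1))
X2_masked = np.ma.masked_array(X2, mask)

per_row = X2_masked.argmin(axis=1)  # one column index per row
per_col = X2_masked.argmin(axis=0)  # one row index per column

out = np.zeros_like(X2)
out[np.arange(len(out)), per_row] = 1
print(per_row)  # [3 0 0 0]
print(out)
```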
You can use numpy.argmin() to get the index of the minimum value in each row of X. For example:
import numpy as np
a = np.arange(6).reshape(2,3) + 10
ids = np.argmin(a, axis=1)
Similarly, you can get the indices where y is 1 with either numpy.nonzero or numpy.where.
Once you have the two index arrays, setting the values in the third array should be quite easy.
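One way to combine the two ideas (a sketch, not the answerer's code): push the columns where y is not 1 out of contention by replacing them with +inf via np.where, then take argmin along each row. It assumes at least one entry of y is 1; otherwise every row would be all inf.

```python
import numpy as np

X = np.array([[2, 3, 3, 3],
              [3, 2, 1, 3],
              [2, 3, 1, 2],
              [2, 2, 3, 1]])
y = np.array([1, 0, 0, 1])

# y broadcasts across the rows of X; excluded columns become +inf (float result).
candidates = np.where(y == 1, X, np.inf)

# argmin returns the first occurrence on ties, so row 1 picks column 0, not 3.
mins = candidates.argmin(axis=1)

Z = np.zeros_like(X)
Z[np.arange(len(X)), mins] = 1
print(Z)
```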
I'd like to generate a NumPy array (np.ndarray) for the shape of another NumPy array. The new array should contain, for each cell of the other array, that cell's indices.
Example 1
Let's say we have a = np.ones((3,)) which has a shape of (3,). I'd expect
[[0]
[1]
[2]]
since there is a[0], a[1] and a[2] in a which can be accessed by their indices 0, 1 and 2.
Example 2
For a shape of (3, 2), like b = np.ones((3, 2)), there is already quite a lot to write out. I'd expect:
[[[0 0]
[0 1]]
[[1 0]
[1 1]]
[[2 0]
[2 1]]]
since there are 6 cells in b which can be accessed by the corresponding indices b[0][0], b[0][1] for the first row, b[1][0], b[1][1] for the second row and b[2][0], b[2][1] for the third row. Therefore we get [0 0], [0 1], [1 0], [1 1], [2 0] and [2 1] at the matching positions in the generated array.
Thank you very much for taking the time. Let me know if I can clarify the question in any way.
One way to do it with np.indices and np.stack:
np.stack(np.indices((3,)), -1)
#array([[0],
# [1],
# [2]])
np.stack(np.indices((3,2)), -1)
#array([[[0, 0],
# [0, 1]],
# [[1, 0],
# [1, 1]],
# [[2, 0],
# [2, 1]]])
np.indices returns an index grid in which each subarray represents one axis:
np.indices((3, 2))
#array([[[0, 0],
# [1, 1],
# [2, 2]],
# [[0, 1],
# [0, 1],
# [0, 1]]])
Then np.stack along the last axis effectively transposes this, pairing up each element's indices from the different axes:
np.stack(np.indices((3,2)), -1)
#array([[[0, 0],
# [0, 1]],
# [[1, 0],
# [1, 1]],
# [[2, 0],
# [2, 1]]])
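The same two calls work for any shape. The sketch below wraps them in a hypothetical helper, index_grid (not a NumPy function), and checks that every cell holds its own index tuple; np.moveaxis(np.indices(shape), 0, -1) is an equivalent way to move the axis dimension to the end.

```python
import numpy as np

def index_grid(shape):
    """Hypothetical helper: array of shape (*shape, len(shape)) whose
    entry at position p is the index tuple p itself."""
    return np.stack(np.indices(shape), axis=-1)

b = np.ones((3, 2))
grid = index_grid(b.shape)
print(grid.shape)  # (3, 2, 2)

# Every cell holds its own index: grid[i, j] == [i, j]
for idx in np.ndindex(b.shape):
    assert grid[idx].tolist() == list(idx)
```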
I created a NumPy array of shape (4,3,2) and expected the following code to print an array shaped 4 X 3 or 3 X 4:
import numpy as np
x = np.zeros((4,3,2), np.int32)
print(x[:][:][0])
However, I got
[[0 0]
[0 0]
[0 0]]
That looks like a 3 X 2 array. I am really confused about NumPy arrays now. Shouldn't I get
[[0 0 0]
[0 0 0]
[0 0 0]
[0 0 0]]
instead? How do I map a 3D image into a NumPy 3D array?
For example, in MATLAB, the shape (m, n, k) means (row, col, slice) in the context of a 2D/3D image.
Thanks a lot
x[:] slices all elements along the first dimension, so x[:] gives the same result as x and x[:][:][0] is thus equivalent to x[0].
To select an element on the last dimension, you can do:
x[..., 0]
#array([[0, 0, 0],
# [0, 0, 0],
# [0, 0, 0],
# [0, 0, 0]], dtype=int32)
or:
x[:,:,0]
#array([[0, 0, 0],
# [0, 0, 0],
# [0, 0, 0],
# [0, 0, 0]], dtype=int32)
You need to specify all slices at the same time in a tuple, like so:
x[:, :, 0]
If you do x[:][:][0], you are actually indexing the first dimension three times. The first two [:] each create a view of the entire array, and the third selects index 0 of the first dimension.
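Checking the shapes directly makes the difference visible (a small sketch using the x from the question): the chained form ends up as x[0], while a single tuple index addresses each axis separately and returns a view of x.

```python
import numpy as np

x = np.zeros((4, 3, 2), np.int32)

# Chained [:] just returns the whole array again, so [0] hits the first axis.
chained = x[:][:][0]
print(chained.shape)  # (3, 2), same as x[0]

# One indexing operation with a tuple addresses each axis separately.
last = x[:, :, 0]
print(last.shape)  # (4, 3)
assert np.shares_memory(last, x)  # basic slicing returns a view, not a copy
```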
I have the following NumPy array, matrix:
matrix = np.zeros((3,5), dtype = int)
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
Suppose I also have this NumPy array, indices:
indices = np.array([[1,3], [2,4], [0,4]])
array([[1, 3],
[2, 4],
[0, 4]])
Question: How can I assign 1s to the elements of matrix at the positions given by the indices array? A vectorized implementation is expected.
For more clarity, the output should look like:
array([[0, 1, 0, 1, 0], #[1,3] elements are changed
[0, 0, 1, 0, 1], #[2,4] elements are changed
[1, 0, 0, 0, 1]]) #[0,4] elements are changed
Here's one approach using NumPy's fancy-indexing -
matrix[np.arange(matrix.shape[0])[:,None],indices] = 1
Explanation
We create the row indices with np.arange(matrix.shape[0]) -
In [16]: idx = np.arange(matrix.shape[0])
In [17]: idx
Out[17]: array([0, 1, 2])
In [18]: idx.shape
Out[18]: (3,)
The column indices are already given as indices -
In [19]: indices
Out[19]:
array([[1, 3],
[2, 4],
[0, 4]])
In [20]: indices.shape
Out[20]: (3, 2)
Let's make a schematic diagram of the shapes of row and column indices, idx and indices -
idx (row) : 3
indices (col) : 3 x 2
For using the row and column indices for indexing into input array matrix, we need to make them broadcastable against each other. One way would be to introduce a new axis into idx, making it 2D by pushing the elements into the first axis and allowing a singleton dim as the last axis with idx[:,None], as shown below -
idx (row) : 3 x 1
indices (col) : 3 x 2
Internally, idx would be broadcasted, like so -
In [22]: idx[:,None]
Out[22]:
array([[0],
[1],
[2]])
In [23]: indices
Out[23]:
array([[1, 3],
[2, 4],
[0, 4]])
In [24]: np.repeat(idx[:,None],2,axis=1) # indices has length of 2 along cols
Out[24]:
array([[0, 0], # Internally broadcasting would be like this
[1, 1],
[2, 2]])
Thus, the broadcast elements of idx are used as row indices, and the elements of indices as column indices, when indexing into matrix to set its elements. Since we had
idx = np.arange(matrix.shape[0]),
we end up with the following statement for setting the elements:
matrix[np.arange(matrix.shape[0])[:,None],indices] = 1
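As a cross-check (a sketch, not part of the original answer), the fancy-indexing assignment can be compared with np.put_along_axis, available since NumPy 1.15, which pairs each row of indices with the matching row of the target along axis 1.

```python
import numpy as np

indices = np.array([[1, 3], [2, 4], [0, 4]])

# Fancy indexing: (3, 1) row indices broadcast against (3, 2) column indices.
m1 = np.zeros((3, 5), dtype=int)
m1[np.arange(m1.shape[0])[:, None], indices] = 1

# Alternative: put_along_axis writes 1 at indices[i, k] within row i.
m2 = np.zeros((3, 5), dtype=int)
np.put_along_axis(m2, indices, 1, axis=1)

assert np.array_equal(m1, m2)
print(m1)
```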
This involves a Python loop and hence may not be very efficient for large arrays:
for i in range(len(indices)):
    matrix[i, indices[i]] = 1
In [73]: matrix
Out[73]:
array([[0, 1, 0, 1, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 0, 1]])
I have a big csr_matrix and I want to sum groups of rows to obtain a new csr_matrix with the same number of columns but fewer rows. (Context: the matrix is a document-term matrix obtained from sklearn's CountVectorizer, and I want to be able to quickly combine documents according to codes associated with them.)
For a minimal example, this is my matrix:
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse import vstack
row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))
print(A.toarray())
[[1 0 0 0 0]
[0 0 3 0 0]
[0 5 0 0 0]
[4 0 0 0 0]
[0 0 2 0 0]]
Now let's say I want a new matrix B in which rows (1, 4) and (2, 3, 5) are combined by summing them, which would look something like this:
[[5 0 0 0 0]
[0 5 5 0 0]]
The result should again be in sparse format (because the real data I'm working with is large). I tried to sum over slices of the matrix and then stack them:
idx1 = [1, 4]
idx2 = [2, 3, 5]
A_sub1 = A[idx1, :].sum(axis=1)
A_sub2 = A[idx2, :].sum(axis=1)
B = vstack((A_sub1, A_sub2))
But this gives me the summed up values just for the non-zero columns in the slice, so I can't combine it with the other slices because the number of columns in the summed slices are different.
I feel like there must be an easy way to do this. But I couldn't find any discussion of this online or in the documentation. What am I missing?
Thank you for your help
Note that you can do this by carefully constructing another matrix. Here's how it would work for a dense matrix:
>>> S = np.array([[1, 0, 0, 1, 0,], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
>>>
The sparse version is only a little more complicated. The information about which rows should be summed together is encoded in row:
col = list(range(5))
row = [0, 1, 1, 0, 1]
dat = [1, 1, 1, 1, 1]
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S @ A  # sparse matrix product
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())
Output:
<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
[0 5 5 0 0]]
You can handle more rows in your output by including higher values in row and extending the shape of S accordingly.
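To generalize this to any number of groups, the sketch below wraps the construction in a hypothetical helper, sum_rows_by_group (not a SciPy function). It builds the 0/1 selector matrix from a per-row label array and multiplies with @. It assumes labels are 0-based integers from 0 to n_groups - 1; the labels used here encode the question's groups (1, 4) and (2, 3, 5) in 0-based form.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sum_rows_by_group(A, labels, n_groups=None):
    """Hypothetical helper: row g of the result is the sum of the rows A[i]
    with labels[i] == g. Builds an (n_groups x n_rows) 0/1 selector S and
    returns the sparse product S @ A."""
    labels = np.asarray(labels)
    n_rows = A.shape[0]
    if n_groups is None:
        n_groups = int(labels.max()) + 1
    ones = np.ones(n_rows, dtype=A.dtype)
    S = csr_matrix((ones, (labels, np.arange(n_rows))), shape=(n_groups, n_rows))
    return S @ A

row = np.array([0, 4, 1, 3, 2])
col = np.array([0, 2, 2, 0, 1])
dat = np.array([1, 2, 3, 4, 5])
A = csr_matrix((dat, (row, col)), shape=(5, 5))

# Group 0 = rows 0 and 3, group 1 = rows 1, 2 and 4 (0-based).
B = sum_rows_by_group(A, [0, 1, 1, 0, 1])
print(B.toarray())
```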
The indexing should be:
idx1 = [0, 3] # rows 1 and 4
idx2 = [1, 2, 4] # rows 2,3 and 5
Then you need to keep A_sub1 and A_sub2 in sparse format and use axis=0:
A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])
Note: I think the A[idx, :].sum(axis=0) operations return dense matrices that then have to be converted back to sparse, so @Mr_E's answer is probably better.
Alternatively, it works when you use axis=0 and np.vstack (as opposed to scipy.sparse.vstack):
A_sub1 = A[idx1, :].sum(axis=0)
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))
Giving:
matrix([[5, 0, 0, 0, 0],
[0, 5, 5, 0, 0]])