How to select specific column indices from a matrix? - python

I have a matrix and a list of column indices that I want to select from the matrix for each row. How can I do that in numpy?
import numpy as np
my_matrix = np.array([[1, 2], [4, 5]])
col_idx = np.array([1, 0])
selected = ...  # should select column 1 of row 0 and column 0 of row 1
print(selected)
# np.array([2, 4])

You can index with np.arange for the rows, paired with your column indices (integer array indexing):
In [11]: my_matrix[np.arange(my_matrix.shape[0]), col_idx]
Out[11]: array([2, 4])
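For reference, here is a small self-contained sketch of the same integer-array indexing, next to np.take_along_axis (available since NumPy 1.15), which does the same per-row pick without building the row range yourself. Both return a new array rather than a view.

```python
import numpy as np

my_matrix = np.array([[1, 2], [4, 5]])
col_idx = np.array([1, 0])

# Integer-array indexing: row i is paired with column col_idx[i]
rows = np.arange(my_matrix.shape[0])
selected = my_matrix[rows, col_idx]

# Same selection with np.take_along_axis: the index array must have the
# same number of dimensions as my_matrix, hence col_idx[:, None]
selected_tal = np.take_along_axis(my_matrix, col_idx[:, None], axis=1).ravel()

print(selected)      # [2 4]
print(selected_tal)  # [2 4]
```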

np.choose is very useful for making these sorts of selections:
>>> np.choose(col_idx, my_matrix.T)
array([2, 4])
And on a larger matrix:
>>> my_matrix_2 = np.array([[1, 2], [4, 5], [3, 7], [4, 1]])
>>> col_idx_2 = np.array([1, 0, 0, 1])
>>> np.choose(col_idx_2, my_matrix_2.T)
array([2, 4, 3, 1])
The method returns a new array with the selected values (not a view of the original array).
There are more examples of this (initially slightly non-obvious) method in the documentation, but I'll explain what's happening using the second example above.
We're using np.choose to return a new array from an array of choices called my_matrix_2.T, where col_idx_2 specifies which row of the choice array we should pick from each time.
Notice we transpose my_matrix_2 for this to work:
# my_matrix_2.T
array([[1, 4, 3, 4],   # row 0
       [2, 5, 7, 1]])  # row 1
We have col_idx_2 = [1, 0, 0, 1]. Now stepping through this array one value at a time:
The first element of the new array will be the first element of row 1 of my_matrix_2.T. This is 2.
The second element of the new array will be the second element of row 0 of my_matrix_2.T. This is 4.
The third element of the new array will be the third element of row 0 of my_matrix_2.T. This is 3.
The fourth element of the new array will be the fourth element of row 1 of my_matrix_2.T. This is 1.
Hence the method returns array([2, 4, 3, 1]).
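The same walkthrough can be spelled out as an explicit loop: element i of the result is choices[col_idx_2[i], i]. This sketch is only for illustration and checks that the loop agrees with np.choose:

```python
import numpy as np

my_matrix_2 = np.array([[1, 2], [4, 5], [3, 7], [4, 1]])
col_idx_2 = np.array([1, 0, 0, 1])
choices = my_matrix_2.T  # row k of choices is column k of my_matrix_2

# Explicit loop: for position i, pick row col_idx_2[i] of choices
by_hand = np.array([choices[c, i] for i, c in enumerate(col_idx_2)])

via_choose = np.choose(col_idx_2, choices)
print(by_hand)     # [2 4 3 1]
print(via_choose)  # [2 4 3 1]
```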

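Pairing each row number with its column index also works, as long as the pairs are regrouped into one sequence of row indices and one of column indices before indexing:
In [211]: M = np.array([[1, 2], [4, 5]])
In [212]: cid = [1, 0]
In [213]: rows, cols = zip(*zip(range(M.shape[0]), cid))
In [214]: M[rows, cols]
Out[214]: array([2, 4])
(Indexing with a plain list of [row, col] lists, as in M[[list(i) for i in zip(range(M.shape[0]), cid)]], relied on deprecated behavior; since NumPy 1.23 such a list is treated as a single index array over the first axis, so it returns whole rows instead.)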

Related

Add repeated elements of array indexed by another array

I have a relatively simple problem that I cannot solve without using loops. It is difficult for me to figure out the correct title for this problem.
Let's say we have two numpy arrays:
array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
array_2 = np.array([[0, 0, 0],
                    [1, 1, 1],
                    [2, 2, 2],
                    [3, 3, 3],
                    [4, 4, 4],
                    [5, 5, 5],
                    [6, 6, 6]])
array_1 holds row indices into the result: for every element array_1[i, j], row i of array_2 should be added to row array_1[i, j] of the result. So, for example, the 4th row of the result (index 3) should be the sum of array_2[i] taken once for every 3 that appears in row i of array_1.
It is much easier to understand it in the code:
result = np.zeros_like(array_2)  # np.empty would start from uninitialized values
for i in range(array_1.shape[0]):
    for j in range(array_1.shape[1]):
        index = array_1[i, j]
        result[index] = result[index] + array_2[i]
Result should be:
[[ 0  0  0]
 [ 0  0  0]
 [ 3  3  3]
 [10 10 10]
 [ 2  2  2]
 [ 0  0  0]
 [ 3  3  3]]
I tried to use np.einsum, but I need to use both the elements of array_1 and their row numbers as indices, so I'm not sure np.einsum is the best path here.
This problem comes from graphics: array_1 holds the vertex indices of triangles, and array_2 holds normals, where the index of a row corresponds to the index of a vertex.
Any time you accumulate into repeated indices, a buffered in-place operation such as output[idx] += values doesn't work out of the box, because a repeated fancy index is only processed once. Instead, you have to use the unbuffered version of the ufunc, which is np.add.at.
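A minimal sketch of the difference, on a toy index array where index 0 appears twice:

```python
import numpy as np

idx = np.array([0, 0, 2])  # index 0 is repeated

# Buffered fancy-index update: a repeated index is written only once
buffered = np.zeros(3, dtype=int)
buffered[idx] += 1

# Unbuffered ufunc.at: every occurrence of a repeated index is applied
unbuffered = np.zeros(3, dtype=int)
np.add.at(unbuffered, idx, 1)

print(buffered)    # [1 0 1]
print(unbuffered)  # [2 0 1]
```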
Here, you have a pair of indices: the row in array_1 is the row index into array_2, and the element of array_1 is the row index into the output.
First, construct the indices explicitly as fancy indices. This will make it much simpler to use them:
output_row = array_1.ravel()
input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1]).ravel()
You can apply input_row directly to array_2, but you need add.at to use output_row:
output = np.zeros_like(array_2)
np.add.at(output, output_row, array_2[input_row])
You really only use the first four rows of array_2, so it could be truncated to
array_2 = array_2[:array_1.shape[0]]
In that case, you would want to initialize the output as:
output = np.zeros_like(array_2, shape=(output_row.max() + 1, array_2.shape[1]))
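Putting the pieces of this answer together, here is a self-contained sketch that checks np.add.at against the question's double loop (with the loop's result initialized to zeros). array_2 is rebuilt with np.repeat only for brevity; it is the same 7 x 3 array as in the question.

```python
import numpy as np

array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
# Row k of array_2 is [k, k, k], as in the question
array_2 = np.repeat(np.arange(7)[:, None], 3, axis=1)

# Flatten the index pairs: output row from array_1, input row from its row number
output_row = array_1.ravel()
input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1])

# Unbuffered accumulation, so repeated output rows are summed every time
output = np.zeros_like(array_2)
np.add.at(output, output_row, array_2[input_row])

# Reference: the question's double loop, starting from zeros
expected = np.zeros_like(array_2)
for i in range(array_1.shape[0]):
    for j in range(array_1.shape[1]):
        expected[array_1[i, j]] += array_2[i]

print(output)
```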

Remove entire row of np array if there is duplicate in first column

I have the following test array:
arr = np.array([[1, 2, 3], [1, 3, 7], [2, 1, 3], [4, 5, 6], [1, 4, 7], [2, 7, 6]])
I need to remove every row that has a duplicate value in the first column (but still keeping the first instance of that value). For this test array I require the following output:
result = [[1, 2, 3], [2, 1, 3], [4, 5, 6]]
So the first row with 1 in the first column is kept, the first row with 2 in the first column is kept, and so on.
Any help would be appreciated!
return_index in np.unique is quite useful for this:
_, i = np.unique(arr[:,0], return_index=True)
arr[i]
array([[1, 2, 3],
       [2, 1, 3],
       [4, 5, 6]])
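One thing to be aware of: np.unique returns the unique values sorted, so arr[i] comes out ordered by the first-column value rather than by where each row first appeared. In the test array these happen to coincide. If you need the original row order, sorting the returned indices is enough. A small sketch with a made-up array where the two orders differ:

```python
import numpy as np

# Illustrative array: the first-column values are not in sorted order
arr = np.array([[4, 5, 6], [1, 2, 3], [4, 0, 0], [2, 1, 3], [1, 9, 9]])

# return_index gives the position of the first occurrence of each unique value
_, first_idx = np.unique(arr[:, 0], return_index=True)

sorted_by_key = arr[first_idx]               # ordered by first-column value
in_original_order = arr[np.sort(first_idx)]  # ordered by first appearance

print(sorted_by_key)
print(in_original_order)
```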

Indexing and replacing values in sparse CSC matrix (Python)

I have a sparse CSC matrix, "A", in which I want to replace the first row with a vector that is all zeros, except for the first entry which is 1.
So far I am doing the inefficient version, e.g.:
import numpy as np
from scipy.sparse import csc_matrix
row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))
replace = np.zeros(3)
replace[0] = 1
A[0,:] = replace
A.eliminate_zeros()
But I'd like to do it with .indptr, .data, etc. Since the matrix is CSC, I am guessing that this might be inefficient as well? In my actual problem, the matrix is 66000 x 66000.
For a CSR sparse matrix I've seen it done as
A.data[1:A.indptr[1]] = 0
A.data[0] = 1.0
A.indices[0] = 0
A.eliminate_zeros()
So, basically I'd like to do the same for a CSC sparse matrix.
Expected result: To do exactly the same as above, just more efficiently (applicable to very large sparse matrices).
That is, start with:
[1, 0, 4],
[0, 0, 5],
[2, 3, 6]
and replace the upper row with a vector that is as long as the matrix, is all zeros except for 1 at the beginning. As such, one should end with
[1, 0, 0],
[0, 0, 5],
[2, 3, 6]
And be able to do it for large sparse CSC matrices efficiently.
Thanks in advance :-)
You can do it with indptr and indices. Suppose you construct your matrix directly from the data, indices and indptr arrays:
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, indices, indptr), shape=(3,3))
If you want to set all elements of the first row to zero except the first element, you need to zero the data values whose row index (stored in indices) is 0. A first attempt:
data[indices == 0] = 0
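The line above sets all the elements of the first row to 0, including the first one. To avoid setting the first element to zero, do the following instead:
in_row0 = indices == 0
in_row0[0] = False  # data[0] is A[0, 0] here (column 0's first stored row is 0), so keep it
data[in_row0] = 0
A = csc_matrix((data, indices, indptr), shape=(3,3))
A.eliminate_zeros()  # drop the explicit zeros from the sparsity structure
This keeps A[0, 0] at its stored value, which is already 1 in this example.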
Hope it helps.
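The answer above relies on the first stored entry being A[0, 0] with value 1. A more general sketch, assuming you start from an existing csc_matrix A (replace_first_row_with_e0 is a hypothetical helper name, not a SciPy function): mask every stored entry whose row index is 0, drop them with eliminate_zeros(), then add a single-entry matrix for (0, 0). The masking is vectorized over A.indices, so the work scales with the number of stored entries rather than with the full matrix shape; the final sparse addition does build a new matrix.

```python
import numpy as np
from scipy.sparse import csc_matrix


def replace_first_row_with_e0(A):
    """Hypothetical helper: return a CSC copy of A whose row 0 is [1, 0, ..., 0]."""
    A = A.tocsc(copy=True)
    # In CSC, A.indices holds the row index of every stored entry,
    # so this mask selects all stored entries of row 0 in one pass
    A.data[A.indices == 0] = 0
    A.eliminate_zeros()  # remove those explicit zeros from the structure
    # Add the single entry (0, 0) = 1; works whether or not A[0, 0] was stored
    e00 = csc_matrix(([1], ([0], [0])), shape=A.shape, dtype=A.dtype)
    return (A + e00).tocsc()


row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
data = np.array([1, 2, 3, 4, 5, 6])
A = csc_matrix((data, (row, col)), shape=(3, 3))

result = replace_first_row_with_e0(A)
print(result.toarray())
```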

How to find the index of all minimum elements in a numpy array in python?

Suppose I have a numpy array
a = np.array([0,2,3,4,5,1,9,0,0,7,9,0,0,0]).reshape(7,2)
I want to find out the indices of all the times the minimum element (here 0) occurs in the 2nd column. Using argmin I can find out the index of when 0 is occurring for the first time. How can I do this in Python?
Using np.flatnonzero on a[:, 1]==np.min(a) is the most straightforward way:
In [3]: idxs = np.flatnonzero(a[:, 1]==np.min(a))
In [4]: idxs
Out[4]: array([3, 5, 6])
After you reshaped your array it looks like this:
array([[0, 2],
       [3, 4],
       [5, 1],
       [9, 0],
       [0, 7],
       [9, 0],
       [0, 0]])
You can get the indices of all elements with a given value by using np.where. In your case the following works:
np.where(a.T[-1] == a.min())
# This would give you (array([3, 5, 6]),)
What happens here is that you create a transposed view of the array, which makes the columns easy to access as rows. "View" here means that no data is copied and the array a itself is not changed.
a.T
array([[0, 3, 5, 9, 0, 9, 0],
       [2, 4, 1, 0, 7, 0, 0]])
From this you select the last line (i.e. the last column of a) by using the index -1. Now you have the array
array([2, 4, 1, 0, 7, 0, 0])
on which you can call np.where(condition), which gives you all indices for which the condition is true. In your case the condition is
a.T[-1] == a.min()
which selects all entries in the chosen line of the transposed array that have the same value as a.min(), which, as you said, is 0 in your case.
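A common pitfall here is mixing up argmin and min: argmin returns the flattened position of the minimum, not its value, so comparing a column against a.argmin() only finds zeros, and only works when the minimum happens to be 0. This sketch uses the column's own minimum (a[:, 1].min(), in case it differs from the global one) and shows the argmin comparison failing on a shifted copy b = a + 1, which is just an illustrative array:

```python
import numpy as np

a = np.array([0, 2, 3, 4, 5, 1, 9, 0, 0, 7, 9, 0, 0, 0]).reshape(7, 2)
col = a[:, 1]

# Compare against the minimum *value* of the column
idxs = np.flatnonzero(col == col.min())
print(idxs)  # [3 5 6]

# argmin is a *position*; shift the data and the two diverge
b = a + 1
print(b.argmin(), b.min())                        # 0 1
wrong = np.flatnonzero(b[:, 1] == b.argmin())     # compares against position 0
right = np.flatnonzero(b[:, 1] == b[:, 1].min())  # compares against value 1
print(wrong)  # [] (no matches)
print(right)  # [3 5 6]
```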

Extracting required indices from an array of tuples

import numpy as np
from scipy import signal
y = np.array([[2, 1, 2, 3, 2, 0, 1, 0],
[2, 1, 2, 3, 2, 0, 1, 0]])
maximas = signal.argrelmax(y, axis=1)
print(maximas)
(array([0, 0, 1, 1], dtype=int64), array([3, 6, 3, 6], dtype=int64))
The maximas are the index tuples (0,3) and (0,6) for the first row [2, 1, 2, 3, 2, 0, 1, 0], and (1,3) and (1,6) for the second row [2, 1, 2, 3, 2, 0, 1, 0].
Indexing y with maximas prints all of the maxima values:
>>> print(y[maximas])
[3 1 3 1]
But I want to extract only the first maximum of each row, i.e., [3,3], using the tuples. So, the tuples I need are (0,3) and (1,3).
How can I extract them from the array of tuples, i.e., 'maximas'?
Given the tuple maximas, here's one possible NumPy way:
>>> a = np.column_stack(maximas)
>>> a[np.unique(a[:,0], return_index=True)[1]]
array([[0, 3],
       [1, 3]], dtype=int64)
This stacks the coordinate lists returned by signal.argrelmax into an array a. The return_index parameter of np.unique is used to find the first index of each row number. We can then retrieve the relevant rows from a using these first indexes.
This returns an array, but you could turn it into a list of lists with tolist().
To return the first column index of the maximum in each row, you just need to take the indices returned by np.unique from maximas[0] and use them to index maximas[1]. In one line, it's this:
>>> maximas[1][np.unique(maximas[0], return_index=True)[1]]
array([3, 3], dtype=int64)
To retrieve the corresponding values from each row of y, you can use np.choose:
>>> cols = maximas[1][np.unique(maximas[0], return_index=True)[1]]
>>> np.choose(cols, y.T)
array([3, 3])
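Note that np.choose(cols, y.T) only lines up when every row of y has at least one relative maximum; otherwise cols no longer has one entry per row of y. Keeping the row numbers from np.unique and using plain integer-array indexing avoids that. A sketch, with maximas hard-coded from the printed argrelmax output above so it runs without SciPy:

```python
import numpy as np

y = np.array([[2, 1, 2, 3, 2, 0, 1, 0],
              [2, 1, 2, 3, 2, 0, 1, 0]])
# maximas as printed above by signal.argrelmax(y, axis=1), hard-coded here
maximas = (np.array([0, 0, 1, 1]), np.array([3, 6, 3, 6]))

# rows: each row that has a maximum; first: position of its first maximum
rows, first = np.unique(maximas[0], return_index=True)
cols = maximas[1][first]

# Pair each such row with the column of its first maximum
values = y[rows, cols]
print(rows, cols, values)  # [0 1] [3 3] [3 3]
```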
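Well, a pure Python approach would be to use itertools.groupby (grouping on the row index) and a list comprehension. Note that max(g, key=lambda x: y[x]) picks the largest relative maximum in each row, which happens to be the first one here: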
>>> from itertools import groupby
>>> from operator import itemgetter
>>> [max(g, key=lambda x: y[x])
...  for k, g in groupby(zip(*maximas), itemgetter(0))]
[(0, 3), (1, 3)]
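If you want the first relative maximum of each row rather than the largest, as the question asks, take the first item of each group with next(g). A sketch, again with maximas hard-coded from the printed argrelmax output; the int() calls only turn the NumPy integers into plain Python ints for cleaner tuples:

```python
from itertools import groupby
from operator import itemgetter

import numpy as np

maximas = (np.array([0, 0, 1, 1]), np.array([3, 6, 3, 6]))

# groupby only merges *consecutive* equal keys; the maxima here are
# ordered by row, so each row's maxima form a single run
first_per_row = [tuple(int(v) for v in next(g))
                 for _, g in groupby(zip(*maximas), itemgetter(0))]
print(first_per_row)  # [(0, 3), (1, 3)]
```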
