selecting values from array with indexes and summing over them

I have two arrays, values and indexes:
>>> values
array([[5, 4, 2, 4, 6],
       [7, 9, 7, 3, 6]])
>>> indexes
array([[2, 4],
       [0, 3],
       [0, 1],
       [1, 3]])
What I would like is a fast way (my arrays are very large) to get, for each row of values, the sum of the elements selected by each index collection (each row) in indexes.
I.e., for the first row, [5, 4, 2, 4, 6], I want to get
>>> values[0][indexes.flatten()].reshape(indexes.shape)
array([[2, 6],
       [5, 4],
       [5, 4],
       [4, 4]])
>>> values[0][indexes.flatten()].reshape(indexes.shape).sum(axis=1)
array([8, 9, 9, 8])
Using this technique and looping over all rows of values is the fastest approach I could come up with. Is there a better way? Thank you in advance for your time.

Approach #1
Index the columns of values with the whole indexes array at once, then sum along the last axis:
values[:,indexes].sum(axis=-1)
Sample run -
In [39]: values
Out[39]:
array([[5, 4, 2, 4, 6],
       [7, 9, 7, 3, 6]])
In [40]: indexes
Out[40]:
array([[2, 4],
       [0, 3],
       [0, 1],
       [1, 3]])
In [41]: values[:,indexes].sum(axis=-1)
Out[41]:
array([[ 8,  9,  9,  8],
       [13, 10, 16, 12]])
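To confirm that Approach #1 agrees with the per-row loop from the question, here is a small self-contained check; loop_sums is a hypothetical helper name, not part of NumPy. Note that values[:, indexes] materializes an intermediate array of shape (rows of values, rows of indexes, indexes.shape[1]) before summing, which can be large for big inputs; that is one motivation for Approach #2.

```python
import numpy as np

values = np.array([[5, 4, 2, 4, 6],
                   [7, 9, 7, 3, 6]])
indexes = np.array([[2, 4],
                    [0, 3],
                    [0, 1],
                    [1, 3]])

def loop_sums(values, indexes):
    # Hypothetical helper: the question's approach, one row of values at a time
    return np.array([row[indexes].sum(axis=1) for row in values])

# Approach #1: values[:, indexes] has shape (2, 4, 2); sum over the last axis
vectorized = values[:, indexes].sum(axis=-1)
print(vectorized)
print(loop_sums(values, indexes))
```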
Approach #2
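If no row of indexes contains a repeated index, the sum-reductions can be written as a matrix multiplication, which is typically much faster on large arrays (a repeated index would only set its mask entry once, so it would be counted once):
import numpy as np
m, n = indexes.shape[0], values.shape[1]
mask = np.zeros((n, m), dtype=bool)  # a float dtype is usually faster, since dot() can then use BLAS
mask[indexes, np.arange(m)[:, None]] = 1  # column i is 1 at every position listed in indexes[i]
out = values.dot(mask)  # out[r, i] == values[r, indexes[i]].sum()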
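The no-duplicates caveat can be checked directly. In the sketch below the repeated index in the last row of indexes is made up for illustration. It compares plain assignment into the mask with np.add.at, which applies the addition once per occurrence, so a repeated index contributes twice; this is one possible way to lift the restriction, not part of the original answer. A float mask is used so that dot can dispatch to BLAS; whether that is faster on your data is worth measuring.

```python
import numpy as np

values = np.array([[5, 4, 2, 4, 6],
                   [7, 9, 7, 3, 6]])
# Last row repeats index 1 on purpose to show the duplicate caveat
indexes = np.array([[2, 4],
                    [0, 3],
                    [0, 1],
                    [1, 1]])

m, n = indexes.shape[0], values.shape[1]
# Broadcast the row positions against the column numbers explicitly
rows, cols = np.broadcast_arrays(indexes, np.arange(m)[:, None])

# Plain assignment: a repeated index is written once, so it is counted once
mask_assign = np.zeros((n, m))
mask_assign[rows, cols] = 1

# np.add.at is unbuffered: every occurrence adds 1, so repeats are counted
mask_count = np.zeros((n, m))
np.add.at(mask_count, (rows, cols), 1)

reference = values[:, indexes].sum(axis=-1)  # Approach #1 counts repeats
out_assign = values.dot(mask_assign)
out_count = values.dot(mask_count)
print(reference)
print(out_assign)
print(out_count)
```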

Related

Notation of swapping rows on a numpy array in Python

Let's say we have a NumPy array:
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
Can anyone explain why this line swaps two rows (in this case the 1st and the 4th)?
A[[0, 3]] = A[[3, 0]]

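You are updating the positions of two subarrays simultaneously: the right-hand side A[[3, 0]] uses integer-array (fancy) indexing, which returns a copy, so both rows are read before either one is overwritten.
However, doing:
A[0] = A[3]
A[3] = A[0]
would not work, because row 0 has already been overwritten with the old row 3 by the time it is copied into row 3. So you need to do it simultaneously with:
A[[0, 3]] = A[[3, 0]]
A
array([[10, 11, 12],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [ 1,  2,  3]])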
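A runnable version of that comparison; fresh is a hypothetical helper that rebuilds the example array so each variant starts from the same data. The last variant is the explicit temporary-copy swap, which behaves the same as the fancy-indexing one-liner.

```python
import numpy as np

def fresh():
    # Hypothetical helper: rebuild the example array from the question
    return np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9],
                     [10, 11, 12]])

# Sequential assignment: row 0 is overwritten before it is copied to row 3
A = fresh()
A[0] = A[3]
A[3] = A[0]
print(A)  # rows 0 and 3 both end up as [10, 11, 12]

# Fancy indexing on the right-hand side returns a copy,
# so both rows are read before either one is written
B = fresh()
B[[0, 3]] = B[[3, 0]]
print(B)

# Equivalent explicit form with a temporary copy
C = fresh()
tmp = C[0].copy()  # C[0] alone would be a view and change along with row 0
C[0] = C[3]
C[3] = tmp
```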

keep elements of an np.ndarray by values of another np.array (vectorized) [duplicate]

I have two matrices of the same size, A and B. I want to use the columns of B to access the columns of A, on a per-column basis. For example,
A = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
and
B = np.array([[0, 0, 2],
              [1, 2, 1],
              [2, 1, 0]])
I want something like:
A[B] = [[1, 4, 9],
        [2, 6, 8],
        [3, 5, 7]]
I.e., I've used the j'th column of B as indices into the j'th column of A.
Is there any efficient way of doing so?
Thanks!

You can use advanced indexing, pairing each row index in B with its column number:
A[B, np.arange(A.shape[1])]
array([[1, 4, 9],
       [2, 6, 8],
       [3, 5, 7]])
Or with np.take_along_axis:
np.take_along_axis(A, B, axis=0)
array([[1, 4, 9],
       [2, 6, 8],
       [3, 5, 7]])
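The column index passed alongside B needs length A.shape[1] (the number of columns), because it is broadcast across the rows of B. With a square A either shape happens to work, so a small non-square example (made-up data) makes the difference visible: with np.arange(A.shape[0]) here, the index arrays could not be broadcast together and NumPy would raise an IndexError.

```python
import numpy as np

# A non-square example: 2 rows, 3 columns
A = np.array([[1, 4, 7],
              [2, 5, 8]])
# Row indices per column, each in range(A.shape[0])
B = np.array([[1, 0, 1],
              [0, 1, 0]])

# The column numbers run over A.shape[1] and broadcast across B's rows
by_fancy = A[B, np.arange(A.shape[1])]
# take_along_axis does the same pairing along axis 0
by_take = np.take_along_axis(A, B, axis=0)
print(by_fancy)
print(by_take)
```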

Applying tf.gather to all rows of two tensors

I want to apply tf.gather() to all the rows of a given parameters tensor and an indices tensor.
I can apply tf.gather() on two 1D tensors to extract a 1D tensor:
# params == array([3, 8, 9, 7, 6])
# inds == array([1, 2, 3])
>>> tf.gather(params, inds).eval()
array([8, 9, 7])
Now what if I have two 2D tensors, and want to apply tf.gather() on them row-wise? I want something like this:
# params == array([[3, 8, 9, 7, 6],
#                  [6, 1, 7, 0, 7],
#                  [7, 4, 4, 5, 8]])
# inds == array([[1, 2, 3],
#                [2, 3, 4],
#                [0, 1, 2]])
>>> row_wise_gather(params, inds)
array([[8, 9, 7],
       [7, 0, 7],
       [7, 4, 4]])
The closest I've come so far is using tf.gather() with axis=1, which yields a 3D tensor, and then indexing the result with gather_nd():
>>> gathered3d = tf.gather(params, inds, axis=1)
# gathered3d == array([[[8, 9, 7],
#                       [9, 7, 6],
#                       [3, 8, 9]],
#
#                      [[1, 7, 0],
#                       [7, 0, 7],
#                       [6, 1, 7]],
#
#                      [[4, 4, 5],
#                       [4, 5, 8],
#                       [7, 4, 4]]])
>>> tf.gather_nd(gathered3d, [[0, 0], [1, 1], [2, 2]]).eval()
array([[8, 9, 7],
       [7, 0, 7],
       [7, 4, 4]])
(I'd call other functions instead of giving literal values, but that's beside the point and not an issue)
This is very clumsy. Is there a more efficient way to do this?
By the way, the indices I use are always values increasing one by one; each row just has a different start and end value. That might make the problem easier.
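The operation being asked for is a row-wise gather, out[i, j] = params[i, inds[i, j]]. The snippet below is a NumPy analogue rather than TensorFlow code: it builds inds from per-row start values by broadcasting (starts and width are hypothetical inputs, and it assumes every row uses the same number of consecutive indices, as in the example) and then gathers with np.take_along_axis. In TensorFlow itself, tf.gather accepts a batch_dims argument in recent versions, and tf.gather(params, inds, batch_dims=1) should perform the same row-wise gather; check that your version supports it.

```python
import numpy as np

params = np.array([[3, 8, 9, 7, 6],
                   [6, 1, 7, 0, 7],
                   [7, 4, 4, 5, 8]])

# Assumption: each row uses `width` consecutive indices starting at starts[i]
starts = np.array([1, 2, 0])  # hypothetical per-row start values
width = 3
inds = starts[:, None] + np.arange(width)  # [[1, 2, 3], [2, 3, 4], [0, 1, 2]]

# Row-wise gather: out[i, j] = params[i, inds[i, j]]
out = np.take_along_axis(params, inds, axis=1)
print(out)
```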

How can I isolate rows in a 2d numpy matrix that match a specific criteria?

How can I isolate rows in a 2D NumPy matrix that match a specific criterion? For example, if I have some data and I only want to look at the rows where the value at index 0 is 5 or less, how would I retrieve those rows?
I tried this approach:
import numpy as np
data = np.matrix([
    [10, 8, 2],
    [1, 4, 5],
    [6, 5, 7],
    [2, 2, 10]])
# My attempt to retrieve all rows where index 0 is less than 5
small_data = (data[:, 0] < 5)
The output is:
matrix([[False],
        [ True],
        [False],
        [ True]], dtype=bool)
However I'd like the output to be:
[[1, 4, 5],
 [2, 2, 10]]
Another approach would be to loop through the matrix rows and append each row whose index-0 value is smaller than 5 to a list, but I am hoping there is a better way than that.
Note: I'm using Python 2.7.

First: don't use np.matrix; use normal np.arrays (the NumPy documentation no longer recommends the matrix class).
import numpy as np
data = np.array([[10, 8, 2],
                 [1, 4, 5],
                 [6, 5, 7],
                 [2, 2, 10]])
Then you can always use boolean indexing (based on the boolean array you get when you do comparisons) to get the desired rows:
>>> data[data[:, 0] < 5]
array([[ 1,  4,  5],
       [ 2,  2, 10]])
or integer array indexing:
>>> data[np.where(data[:, 0] < 5)]
array([[ 1,  4,  5],
       [ 2,  2, 10]])
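A self-contained version of both indexing styles (the names mask, rows_bool, rows_where and rows_le are illustrative). The question's wording says "5 or less" while its code uses < 5; the last line shows that an inclusive bound only changes the comparison operator, and for this data both give the same two rows.

```python
import numpy as np

data = np.array([[10, 8, 2],
                 [1, 4, 5],
                 [6, 5, 7],
                 [2, 2, 10]])

mask = data[:, 0] < 5              # 1-D boolean array, one entry per row
rows_bool = data[mask]             # boolean indexing keeps the True rows
rows_where = data[np.where(mask)]  # same rows via their integer positions

# Inclusive bound ("5 or less"): only the comparison changes
rows_le = data[data[:, 0] <= 5]
print(rows_bool)
```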

The comparison gives you a boolean array that you can use to select the desired rows; convert the matrix to a regular array first:
>>> data = np.matrix([
...     [10, 8, 2],
...     [1, 4, 5],
...     [6, 5, 7],
...     [2, 2, 10]])
>>> data = np.array(data)
>>> data[(data[:, 0] < 5), :]
array([[ 1,  4,  5],
       [ 2,  2, 10]])
If data were still an np.matrix, data[:, 0] would be a 4x1 matrix; np.asarray plus np.squeeze turns it into a 1-D mask that filters the rows:
>>> ind = np.squeeze(np.asarray(data[:, 0])) < 5
>>> data[ind, :]
array([[ 1,  4,  5],
       [ 2,  2, 10]])

Use the mask from your question, flattened to 1-D so that it selects whole rows:
data[np.asarray(small_data).ravel()]
That would work.

NumPy array indexing

I want to extract the second column and the third to fifth columns of a NumPy array. How would I go about it?
A = array([[0, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 4, 5, 6]])
A[:, [1, 4:6]]
This obviously doesn't work.

Assuming I've understood you -- it's usually a good idea to explicitly specify the output you want, because it's not obvious -- you could use numpy.r_:
In [27]: A
Out[27]:
array([[0, 1, 2, 3, 4, 5, 6],
       [4, 5, 6, 7, 4, 5, 6]])
In [28]: A[:, [1,3,4,5]]
Out[28]:
array([[1, 3, 4, 5],
       [5, 7, 4, 5]])
In [29]: A[:, r_[1, 3:6]]
Out[29]:
array([[1, 3, 4, 5],
       [5, 7, 4, 5]])
In [37]: A[1:, r_[1, 3:6]]
Out[37]: array([[5, 7, 4, 5]])
which you can then flatten or reshape as you like. r_ is basically a convenience function to generate the right indices, e.g.
In [30]: r_[1, 3:6]
Out[30]: array([1, 3, 4, 5])
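A quick check in plain Python 3 with an explicit np.r_ (the bare r_ and array in the session above presumably come from something like from numpy import *). It also shows that building the same index list by hand needs list(range(...)) in Python 3, since range no longer returns a list.

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4, 5, 6],
              [4, 5, 6, 7, 4, 5, 6]])

cols = np.r_[1, 3:6]                # slices inside r_ expand to ranges
cols_py3 = [1] + list(range(3, 6))  # plain-list equivalent for Python 3
print(cols)
print(A[:, cols])
print(A[1:, cols])                  # all rows but the first
```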

Perhaps you are looking for this?
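In [10]: A[1:, [1] + list(range(3, 6))]
Out[10]: array([[5, 7, 4, 5]])
Note that this gives you the second, fourth, fifth and sixth columns of all rows but the first. (In Python 2, range returns a list, so [1]+range(3,6) also works there; in Python 3 it must be wrapped in list().)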

The second column is A[:,1]. Columns 3-5 (I'm assuming you want them inclusive) are A[:,2:5]. You won't be able to get both with a single slice. To get them as one array, you could do
import numpy as np
A = np.array([[0, 1, 2, 3, 4, 5, 6], [4, 5, 6, 7, 4, 5, 6]])
my_cols = np.hstack((A[:,1][...,np.newaxis], A[:,2:5]))
The np.newaxis stuff is just to make A[:,1] a 2D array, consistent with A[:,2:5].
Hope this helps.
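For comparison (same A as above), the same four columns can also be taken in a single step with an integer index array, and a width-one slice such as A[:, 1:2] keeps the column dimension without needing np.newaxis. The variable names here are illustrative.

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4, 5, 6],
              [4, 5, 6, 7, 4, 5, 6]])

# Stacking approach from the answer above
stacked = np.hstack((A[:, 1][..., np.newaxis], A[:, 2:5]))
# A width-one slice (1:2) already keeps the column dimension
stacked_slice = np.hstack((A[:, 1:2], A[:, 2:5]))
# A single integer-array index selects the same columns in one step
single = A[:, np.r_[1, 2:5]]
print(single)
```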
