Efficiently index a multidimensional numpy array by another array


I have an array x whose values I would like to access at specific positions; the indices are given by another array.
For example, x is
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
and the indices are an array of shape Nx2
idxs = np.array([[1,2], [4,3], [3,3]])
I would like a function that returns an array of x[1,2], x[4,3], x[3,3], i.e. [7, 23, 18]. The following code does the trick, but I would like to speed it up for large arrays, perhaps by avoiding the for loop.
import numpy as np

def arrayvalsofinterest(x, idx):
    # Look up one element of x per row of idx, one at a time
    output = np.zeros(idx.shape[0])
    for i in range(len(output)):
        output[i] = x[tuple(idx[i, :])]
    return output

if __name__ == "__main__":
    xx = np.arange(25).reshape(5, 5)
    idxs = np.array([[1, 2], [4, 3], [3, 3]])
    print(arrayvalsofinterest(xx, idxs))

You can pass in an iterable of axis-0 coordinates and an iterable of axis-1 coordinates. See the NumPy docs on integer array indexing.
i0, i1 = zip(*idxs)
x[i0, i1]
As @Divakar points out in the comments, this is less memory efficient than indexing with views of the columns of idxs, i.e.
x[idxs[:, 0], idxs[:, 1]]
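As a rough sketch of how the vectorized lookup compares with the loop in the question, the following checks that both give the same values on the example data. The helper name `lookup_loop` is made up here for illustration and is not part of the original post.

```python
import numpy as np

x = np.arange(25).reshape(5, 5)
idxs = np.array([[1, 2], [4, 3], [3, 3]])

def lookup_loop(x, idx):
    # Reference version: one element per row of idx (hypothetical helper name)
    return np.array([x[tuple(row)] for row in idx])

# One index array per axis; NumPy pairs them up element by element
vectorized = x[idxs[:, 0], idxs[:, 1]]

# Equivalent spelling that works for any number of axes
vectorized_nd = x[tuple(idxs.T)]

print(vectorized)  # [ 7 23 18]
```

The `tuple(idxs.T)` form may be convenient when the number of dimensions is not known in advance, since it produces one index array per axis automatically.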

Related

Pytorch tensor indexing error for sizes M < 32?

I am trying to index a PyTorch tensor with a matrix of indices, and I recently came across this bit of code; I cannot work out why it fails.
The code below is split into two parts. The first half works, whilst the second raises an error. I fail to see why. Could someone shed some light on this?
import torch
import numpy as np
a = torch.rand(32, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # WORKS for a torch.tensor of size M >= 32. It doesn't work otherwise.
a = torch.rand(16, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # IndexError: too many indices for tensor of dimension 2
If I change it to a = np.random.rand(16, 16), it works as well.

To whoever comes looking for an answer: it looks like it's a bug in PyTorch.
Indexing with numpy arrays is not well defined; it works reliably only when tensors are indexed with tensors. So, in my example code, this works flawlessly:
a = torch.rand(M, N)
m, n = a.shape
xx, yy = torch.meshgrid(torch.arange(m), torch.arange(m), indexing='xy')
result = a[xx] # WORKS
I made a gist to check it, and it's available here

First, let me give you a quick insight into the idea of indexing a tensor with a numpy array versus with another tensor.
Example: these are the two index objects and the target tensor to be indexed
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3]]) # numpy array
tensor_indices = torch.tensor([[0, 1, 2, 7],
                               [0, 1, 2, 3]]) # 2D tensor
t = torch.tensor([[1, 2, 3, 4], # targeted tensor
                  [5, 6, 7, 8],
                  [9, 10, 11, 12],
                  [13, 14, 15, 16],
                  [17, 18, 19, 20],
                  [21, 22, 23, 24],
                  [25, 26, 27, 28],
                  [29, 30, 31, 32]])
numpy_result = t[numpy_indices]
tensor_result = t[tensor_indices]
Indexing using a 2D numpy array: the index is read as (row, column) pairs, i.e. tensor[row, column], giving t[0,0], t[1,1], t[2,2], and t[7,3].
print(numpy_result) # tensor([ 1, 6, 11, 32])
Indexing using a 2D tensor: the index tensor is walked row by row, and each value selects a whole row of the targeted tensor,
e.g. [[t[0], t[1], t[2], t[7]], [t[0], t[1], t[2], t[3]]]. As the example below shows, the new shape of tensor_result after indexing is (tensor_indices.shape[0],tensor_indices.shape[1],t.shape[1])=(2,4,4).
print(tensor_result) # tensor([[[ 1,  2,  3,  4],
                     #          [ 5,  6,  7,  8],
                     #          [ 9, 10, 11, 12],
                     #          [29, 30, 31, 32]],
                     #         [[ 1,  2,  3,  4],
                     #          [ 5,  6,  7,  8],
                     #          [ 9, 10, 11, 12],
                     #          [13, 14, 15, 16]]])
If you try to add a third row to numpy_indices, you will get the same error you have, because each index would then have three components, e.g. (0,0,0)...(7,3,3), while the tensor has only two dimensions.
indices = np.array([[0, 1, 2, 7],
                    [0, 1, 2, 3],
                    [0, 1, 2, 3]])
numpy_result = t[indices] # IndexError: too many indices for tensor of dimension 2
However, this is not the case when indexing by a tensor; the result simply gets a bigger shape, (3,4,4).
Finally, as you can see, the outputs of the two types of indexing are completely different. To solve your problem, you can use
xx = torch.tensor(xx).long() # convert a numpy array to a tensor
What happens with advanced indexing when numpy_indices has more than 3 rows, as in your situation, is still ambiguous and unresolved; see links 1, 2, 3.
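The two readings described in this answer (index pairs versus row selection) can also be reproduced in plain NumPy, which may help to see where the shapes come from. This is only a NumPy illustration of the two semantics; it makes no claim about which one a given PyTorch version applies to a NumPy index array.

```python
import numpy as np

t = np.arange(1, 33).reshape(8, 4)   # same values as the target tensor above
idx = np.array([[0, 1, 2, 7],
                [0, 1, 2, 3]])

# Reading 1: the two rows of idx are row and column coordinates
pairs = t[idx[0], idx[1]]            # t[0,0], t[1,1], t[2,2], t[7,3]

# Reading 2: every entry of idx selects a whole row of t
rows = t[idx]                        # shape (2, 4, 4)

print(pairs)       # [ 1  6 11 32]
print(rows.shape)  # (2, 4, 4)
```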

Index numpy 3d-array with 1d array of indices

I have a 3D numpy array of shape (i, j, k). I have an array of length i which contains indices in k. I would like to index the array to get a shape (i, j).
Here is an example of what I am trying to achieve:
import numpy as np
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
# array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
#
# [[12, 13, 14, 15],
# [16, 17, 18, 19],
# [20, 21, 22, 23]]])
indices = np.array([1, 3])
# I want to mask `arr` using `indices`
# Desired output is equivalent to
# np.stack((arr[0, :, 1], arr[1, :, 3]))
# array([[ 1, 5, 9],
# [15, 19, 23]])
I tried reshaping the indices array to be able to broadcast with arr but this raises an IndexError.
arr[indices[np.newaxis, np.newaxis, :]]
# IndexError: index 3 is out of bounds for axis 0 with size 2
I also tried creating a 3D mask and applying it to arr. This seems closer to the correct answer to me but I still end up with an IndexError.
mask = np.stack((np.arange(arr.shape[0]), indices), axis=1)
arr[mask.reshape(2, 1, 2)]
# IndexError: index 3 is out of bounds for axis 0 with size 2

From what I understand in your example, you can simply pass indices as the index for the last axis, and a range of the same length as indices as the index for the zeroth axis, like this:
import numpy as np
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
indices = np.array([1, 3])
print(arr[range(len(indices)), :, indices])
# array([[ 1, 5, 9],
# [15, 19, 23]])

This works:
sub = arr[[0,1], :, [1,3]]
Output:
>>> sub
array([[ 1, 5, 9],
[15, 19, 23]])
A more dynamic version by @Psidom:
>>> sub = arr[np.arange(len(arr)), :, [1,3]]
>>> sub
array([[ 1,  5,  9],
       [15, 19, 23]])
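To make the shape of the result easier to follow, here is a small sketch that repeats the indexing from the answers above and compares it with `np.take_along_axis`, which is offered only as an alternative and is not mentioned in the original answers.

```python
import numpy as np

arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
indices = np.array([1, 3])

# Mixed advanced/basic indexing: the two index arrays broadcast to shape (2,),
# and the sliced axis of length 3 is appended, giving shape (2, 3)
sub = arr[np.arange(arr.shape[0]), :, indices]

# Alternative: pick one position along the last axis per leading index
picked = np.take_along_axis(arr, indices[:, None, None], axis=2)  # shape (2, 3, 1)
picked = picked[:, :, 0]

print(sub.shape)  # (2, 3)
```

Both forms return a copy, as advanced indexing in NumPy generally does.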

Iterating Over Rows in Python Array to Extract Column Data

I am using Python and looking to iterate through each row of an Nx9 array and extract certain values from the row to form another matrix with them. The N value can change depending on the file I am reading but I have used N=3 in my example. I only require the 0th, 1st, 3rd and 4th values of each row to form into an array which I need to store. E.g:
result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[11, 12, 13, 14, 15, 16, 17, 18, 19],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
#Output matrix of first row should be: ([[1,2],[4,5]])
#Output matrix of second row should be: ([[11,12],[14,15]])
#Output matrix of third row should be: ([[21,22],[24,25]])
I should then end up with N matrices formed from the extracted values, one 2D matrix per row. However, the array I build is 3D, so when I transpose it and subtract I receive the error ValueError: operands could not be broadcast together with shapes (2,2,3) (3,2,2). I am aware that a (3,2,2) array cannot be subtracted from a (2,2,3) one, so how do I obtain a 2D matrix N times? Would a loop be better suited? Any suggestions?
import numpy as np
result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[11, 12, 13, 14, 15, 16, 17, 18, 19],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
a = result[:, 0]
b = result[:, 1]
c = result[:, 2]
d = result[:, 3]
e = result[:, 4]
f = result[:, 5]
g = result[:, 6]
h = result[:, 7]
i = result[:, 8]
output = [[a, b], [d, e]]
output = np.array(output)
output_transpose = output.transpose()
result = 0.5 * (output - output_transpose)

In [276]: result = np.array(
...: [
...: [1, 2, 3, 4, 5, 6, 7, 8, 9],
...: [11, 12, 13, 14, 15, 16, 17, 18, 19],
...: [21, 22, 23, 24, 25, 26, 27, 28, 29],
...: ]
...: )
...:
...: a = result[:, 0]
...
...: i = result[:, 8]
...: output = [[a, b], [d, e]]
In [277]: output
Out[277]:
[[array([ 1, 11, 21]), array([ 2, 12, 22])],
[array([ 4, 14, 24]), array([ 5, 15, 25])]]
In [278]: arr = np.array(output)
In [279]: arr
Out[279]:
array([[[ 1, 11, 21],
[ 2, 12, 22]],
[[ 4, 14, 24],
[ 5, 15, 25]]])
In [280]: arr.shape
Out[280]: (2, 2, 3)
In [281]: arr.T.shape
Out[281]: (3, 2, 2)
Called without arguments, transpose reverses the order of the axes, so for a 3D array it exchanges the 1st and last dimensions.
A cleaner way to make a (N,2,2) array from selected columns is:
In [282]: arr = result[:,[0,1,3,4]].reshape(3,2,2)
In [283]: arr.shape
Out[283]: (3, 2, 2)
In [284]: arr
Out[284]:
array([[[ 1, 2],
[ 4, 5]],
[[11, 12],
[14, 15]],
[[21, 22],
[24, 25]]])
Since the last 2 dimensions both have size 2, you can swap just those two and take the difference:
In [285]: arr-arr.transpose(0,2,1)
Out[285]:
array([[[ 0, -2],
[ 2, 0]],
[[ 0, -2],
[ 2, 0]],
[[ 0, -2],
[ 2, 0]]])
Another way to get the (N,2,2) array is with a matrix index:
In [286]: result[:,[[0,1],[3,4]]]
Out[286]:
array([[[ 1, 2],
[ 4, 5]],
[[11, 12],
[14, 15]],
[[21, 22],
[24, 25]]])
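Putting the pieces of this answer together, a minimal sketch for an arbitrary number of rows N might look like the following. The function name `antisymmetric_parts` is hypothetical, and the factor 0.5 is taken from the question's formula.

```python
import numpy as np

result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [11, 12, 13, 14, 15, 16, 17, 18, 19],
                   [21, 22, 23, 24, 25, 26, 27, 28, 29]])

def antisymmetric_parts(rows):
    # Hypothetical helper: build one 2x2 matrix per row from columns 0, 1, 3, 4
    mats = rows[:, [0, 1, 3, 4]].reshape(-1, 2, 2)   # -1 lets N vary with the input
    # Swap only the last two axes so each 2x2 matrix is transposed on its own
    return 0.5 * (mats - mats.swapaxes(1, 2))

out = antisymmetric_parts(result)
print(out.shape)  # (3, 2, 2)
```

Using `reshape(-1, 2, 2)` instead of a hard-coded `3` is the only change needed to handle files with a different number of rows.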

Ok, this is not a coding problem, but a math problem. I wrote some code for you, since it's pretty obvious you're a beginner, so there will be some unfamiliar syntax in there that you should look into so you can avoid problems like this in the future. You might not use them all that often, but it's good to know how to do it, because it expands your understanding of python syntax in general.
First up, the complete code for easy copy and pasting:
import numpy as np
result=np.array([[1,2,3,4,5,6,7,8,9],
                 [11,12,13,14,15,16,17,18,19],
                 [21,22,23,24,25,26,27,28,29]])
output = np.array(tuple(result[:,i] for i in (0,1,3)))
def Matrix_Operation(Matrix,Coefficient):
    if (Matrix.shape == Matrix.shape[::-1]
            and isinstance(Matrix,np.ndarray)
            and isinstance(Coefficient,(float,int))):
        return Coefficient*(Matrix-Matrix.transpose())
    else:
        print('The shape of your Matrix is not palindromic')
        print('You cannot subtract matrices of unequal shape')
        print('Your shape: %s'%str(Matrix.shape))
print(Matrix_Operation(output,0.5))
Now let's talk about a step by step explanation of what's happening here:
import numpy as np
result=np.array([[1,2,3,4,5,6,7,8,9],
[11,12,13,14,15,16,17,18,19],
[21,22,23,24,25,26,27,28,29]])
Python uses indentation (alignment of whitespace) as an integral part of its syntax. However, inside brackets you usually don't need aligned indentation for the interpreter to understand your code. If you provide a large array of values manually, it is usually advisable to start new lines at the commas (here, the commas separating the sublists). It's just more readable, and that way your data doesn't run off screen in your editor.
output = np.array(tuple(result[:,i] for i in (0,1,3)))
List comprehensions and generator expressions are a big deal in python and really handy for quick one-liners. That's one of the reasons why python is so great. Here I used a generator expression that yields result[:,i] for every i in (0,1,3) and passed it to tuple(), which collects the columns into a tuple (there is no separate "tuple comprehension"; the expression inside tuple(...) is a generator expression). Finally I wrapped it in the np.array function, since this is the type required for our mathematical operations later on.
def Matrix_Operation(Matrix,Coefficient):
    if (Matrix.shape == Matrix.shape[::-1]
            and isinstance(Matrix,np.ndarray)
            and isinstance(Coefficient,(float,int))):
        return Coefficient*(Matrix-Matrix.transpose())
    else:
        print('The shape of your Matrix is not palindromic')
        print('You cannot subtract matrices of unequal shape')
        print('Your shape: %s'%str(Matrix.shape))
print(Matrix_Operation(output,0.5))
If you're gonna create a complex formula in python code, why not wrap it inside an abstractable function? You can incorporate a lot of "quality control" into a function as well, to check if it is given the correct input for the task it is supposed to do.
Your code failed because you were trying to subtract a (3,2,2) shaped array from a (2,2,3) shaped one. So we need a code snippet to check whether the provided matrix has a palindromic shape. You can reverse the order of items in a container with Container[::-1], so we ask whether Matrix.shape == Matrix.shape[::-1]. Further, our Matrix must be a np.ndarray and our coefficient must be a number. That's what I'm doing with the isinstance() function. You can check for multiple types at once, which is why isinstance(Coefficient,(float,int)) contains a tuple with both int and float in it.
Now that we have ensured that our input makes sense, we can perform our Matrix_Operation.
So in conclusion: Check if your math is solid before asking SO for help, because people here can get pretty annoyed at that sort of thing. You probably noticed by now that someone has already downvoted your question. Personally, I believe it's necessary to let newbies ask a couple stupid questions before they get into the groove, but that's what the voting button is for, I guess.

Numpy slicing function: Dynamically create slice indices np.r_[a:b, c:d, ...] from array shaped (X, 2) for selection in array

The situation
I have a 2D array representing dual-channel audio. I want to create a function that returns slices of this array at arbitrary locations (e.g. speech-only parts). I already know how to do it when I explicitly write the values into np.r_:
Sample data
arr = np.arange(0,24).reshape((2, -1))
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
# [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
Input
An array of length X and width 2, i.e. shape (X, 2). E.g.
selector = np.array([[0, 2], [6, 9]])
# array([[0, 2],
# [6, 9]])
Desired output
# create an index array
selection_indices = np.r_[0:2, 6:9]
# array([0, 1, 6, 7, 8])
# use indices to select 2D
arr[:, selection_indices]
# array([[ 0, 1, 6, 7, 8],
# [12, 13, 18, 19, 20]])
Goal
A function that takes an array of shape (X, 2), where each row holds the start and end of a slice, and uses it to return a selection of an array. Effectively np.r_[0:2, 6:9], but built from an argument.
arr = np.arange(0,24).reshape((2, -1))
def slice_returner(arr, selector):
    # something like this (broken); should be like: np.r_[0:2, 6:9]
    selection_indices = np.r_[[row[0]:row[1]] for row in selector]
    # return 2D slice
    return arr[:, selection_indices]
selector = np.array([[0, 2], [6, 9]])
sliced_arr = slice_returner(arr, selector)
How do I turn the input into selection slices? Preferably with minimal array creation / copying.

I think boolean indexing could be an efficient way: create a mask over the columns, then index the columns with it to get the output -
# Generate mask for cols
mask = np.zeros(arr.shape[1],dtype=bool)
for (i,j) in selector:
    mask[i:j] = True
# Boolean index into cols for final o/p
out = arr[:,mask]
The memory overhead is just the mask, which should be minimal since it is a boolean array, plus the final output, which is required anyway.
Vectorized mask creation
If there are many entries in selector, there's a broadcasting-based vectorized way to create the mask for cols, like so -
r = np.arange(arr.shape[1])
mask = ((selector[:,0,None]<=r) & (selector[:,1,None]>r)).any(0)
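For reference, the broadcasting-based mask can be wrapped into a function along these lines. This is only a sketch on the question's sample data; the name `select_columns` is an assumption made for illustration.

```python
import numpy as np

def select_columns(arr, selector):
    # Hypothetical helper name; selector has shape (X, 2) with [start, stop) per row
    selector = np.asarray(selector)
    cols = np.arange(arr.shape[1])
    # Compare every column index against every [start, stop) pair at once
    inside = (selector[:, 0, None] <= cols) & (cols < selector[:, 1, None])
    mask = inside.any(axis=0)
    return arr[:, mask]

arr = np.arange(0, 24).reshape((2, -1))
selector = np.array([[0, 2], [6, 9]])
out = select_columns(arr, selector)
print(out)
# [[ 0  1  6  7  8]
#  [12 13 18 19 20]]
```

Note that a mask always returns the selected columns in ascending order and only once each, so overlapping or unordered ranges in `selector` would behave differently from `np.r_`, which keeps the order given and repeats duplicates. The intermediate comparison array has shape (X, number of columns), which is worth keeping in mind for long recordings.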

You can just create an indexing array from individual aranges
slices = [[0, 2], [6, 9]]
np.concatenate([np.arange(*i) for i in slices])
# array([0, 1, 6, 7, 8])
and use it to extract the data
arr[:, np.concatenate([np.arange(*i) for i in slices])]
# array([[ 0, 1, 6, 7, 8],
# [12, 13, 18, 19, 20]])
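Since the question asks for something that behaves like `np.r_[0:2, 6:9]`, another possible sketch is to build the slice objects explicitly and pass them to `np.r_` as a tuple, which is what the literal `np.r_[0:2, 6:9]` receives. It assumes `selector` has at least one row.

```python
import numpy as np

def slice_returner(arr, selector):
    # Turn each (start, stop) row into a slice object; np.r_ expands every
    # slice into a range and concatenates them, as in np.r_[0:2, 6:9]
    slices = tuple(slice(int(start), int(stop)) for start, stop in selector)
    selection_indices = np.r_[slices]
    return arr[:, selection_indices]

arr = np.arange(0, 24).reshape((2, -1))
selector = np.array([[0, 2], [6, 9]])
sliced_arr = slice_returner(arr, selector)
print(sliced_arr)
# [[ 0  1  6  7  8]
#  [12 13 18 19 20]]
```

This still creates an index array and a copy of the selected columns, so it is comparable to the `np.concatenate` approach in terms of memory.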

Rearrange columns of numpy 2D array

Is there a way to change the order of the columns in a numpy 2D array to a new and arbitrary order?
For example, I have an array
array([[10, 20, 30, 40, 50],
[ 6, 7, 8, 9, 10]])
and I want to change it into, say
array([[10, 30, 50, 40, 20],
[ 6, 8, 10, 9, 7]])
by applying the permutation
0 -> 0
1 -> 4
2 -> 1
3 -> 3
4 -> 2
on the columns. In the new matrix, I therefore want the first column of the original to stay in place, the second to move to the last column and so on.
Is there a numpy function to do it? I have a fairly large matrix and expect to get even larger ones, so I need a solution that does this quickly and in place if possible (permutation matrices are a no-go)
Thank you.

This is possible in O(n) time and O(n) space using fancy indexing:
>>> import numpy as np
>>> a = np.array([[10, 20, 30, 40, 50],
... [ 6, 7, 8, 9, 10]])
>>> permutation = [0, 4, 1, 3, 2]
>>> idx = np.empty_like(permutation)
>>> idx[permutation] = np.arange(len(permutation))
>>> a[:, idx] # return a rearranged copy
array([[10, 30, 50, 40, 20],
[ 6, 8, 10, 9, 7]])
>>> a[:] = a[:, idx] # in-place modification of a
Note that a[:, idx] is returning a copy, not a view. An O(1)-space solution is not possible in the general case, due to how numpy arrays are strided in memory.
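As a side note, the index array built above is the inverse of the permutation, which can be cross-checked in a couple of ways. The sketch below uses `np.argsort` and a scatter-style assignment purely for comparison; neither is part of the original answer, and `argsort` costs O(n log n) rather than O(n).

```python
import numpy as np

a = np.array([[10, 20, 30, 40, 50],
              [ 6,  7,  8,  9, 10]])
permutation = [0, 4, 1, 3, 2]      # column i moves to position permutation[i]

# argsort of a permutation is its inverse, same as the empty_like/arange construction
idx = np.argsort(permutation)

# Scatter form: write column i of a into column permutation[i] of a new array
scattered = np.empty_like(a)
scattered[:, permutation] = a

print(idx)        # [0 2 4 3 1]
print(a[:, idx])
```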

The easiest way in my opinion is:
a = np.array([[10, 20, 30, 40, 50],
[6, 7, 8, 9, 10]])
print(a[:, [0, 2, 4, 3, 1]])
the result is:
[[10 30 50 40 20]
 [ 6  8 10  9  7]]

I have a matrix-based solution for this: post-multiply the original matrix by a permutation matrix. This moves the columns of the original matrix to their new positions (note that the result is a float array).
import numpy as np
a = np.array([[10, 20, 30, 40, 50],
              [ 6,  7,  8,  9, 10]])
# Create the permutation matrix by placing a 1 in each row at the column to move to
your_permutation = [0,4,1,3,2]
perm_mat = np.zeros((len(your_permutation), len(your_permutation)))
for idx, i in enumerate(your_permutation):
    perm_mat[idx, i] = 1
print(np.dot(a, perm_mat))

If you're looking for any random permutation, you can do it in one line if you transpose columns into rows, permute the rows, then transpose back:
a = np.random.permutation(a.T).T
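A variant without the double transpose is to draw a random ordering of the column indices and index with it. The sketch below uses NumPy's `Generator` API with an arbitrarily chosen seed; both choices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)     # seed chosen arbitrarily for repeatability
a = np.array([[10, 20, 30, 40, 50],
              [ 6,  7,  8,  9, 10]])

order = rng.permutation(a.shape[1])   # random ordering of the column indices
shuffled = a[:, order]                # every row is rearranged the same way
```

Keeping `order` around also makes it possible to apply the same shuffle to another array with the same number of columns.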
