I have a data array of unknown shape and an array bounds of lower/upper bounds for slicing the data. This code works for 3D data, but is there any way of generalizing it to N dimensions?
for b in bounds:
    l0, u0 = b[0]
    l1, u1 = b[1]
    l2, u2 = b[2]
    a = data[l0:u0, l1:u1, l2:u2]
    print(a)
I tried using a Python range object as the index, but that did not work.
Examples for data:
data2D = np.arange(2*3).reshape((2, 3))
data3D = np.arange(2*3*4).reshape((2, 3, 4))
Corresponding bounds:
bounds2D = np.array([[[0, 2], [0, 2]], [[0, 2], [1, 3]]])
bounds3D = np.array(
    [
        [[0, 2], [0, 2], [0, 2]],
        [[0, 2], [0, 2], [2, 4]],
        [[0, 2], [1, 3], [0, 2]],
        [[0, 2], [1, 3], [2, 4]],
    ]
)
You can use the built-in slice function to create a slice object from each (lower, upper) pair in bounds. Then collect these slices into a single tuple and use it as the index to recover the wanted items of the array. You can adapt your code as follows:
import numpy as np
# The dimension of the slices is equal to the
# one specified by the bounds provided
def create_slices(bounds):
    slices = list()
    # Take a single item of the bounds and create the corresponding slices
    for b in bounds:
        # Slices are collected inside a single tuple
        slices.append(tuple(slice(l, u) for l, u in b))
    return slices
# 4D example data
data4D = np.arange(2*3*4*5).reshape((2,3,4,5))
# Bounds array for 4D data
bounds4D = np.array(
    [
        [[0, 2], [0, 2], [0, 2], [0, 2]],
        [[0, 2], [0, 2], [0, 2], [2, 4]],
        [[0, 2], [1, 3], [2, 4], [0, 2]],
        [[0, 2], [1, 3], [2, 4], [2, 4]],
    ]
)
slices = create_slices(bounds4D)
# Each element of slices is a single slice that can be used on
# the corresponding data array
for single_slice in slices:
    a = data4D[single_slice]
    print("Slice", a)
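As a compact variant of the same idea (not from the original answer), the tuple of slices can also be built inline with map, since the two columns of each bounds row are the lower and upper bounds:

```python
import numpy as np

data4D = np.arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5))
bounds4D = np.array([
    [[0, 2], [0, 2], [0, 2], [0, 2]],
    [[0, 2], [1, 3], [2, 4], [2, 4]],
])

for b in bounds4D:
    # map slice over the lower/upper bound columns of one bounds row
    a = data4D[tuple(map(slice, b[:, 0], b[:, 1]))]
    print(a.shape)
```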
I would like the order of b to equal the result of a below.
The result of b is not as I expected. I thought it would be sorting by the columns in ascending order but I think I misunderstood how lexsort works.
My goal is to be able to sort an array the way the df below is sorted.
I'm using lexsort because I think it would be the best thing to use for an array that also contained categorical values.
import numpy as np
import pandas as pd
x = np.array([[1,2],[1,3],[1,4],[2,1],[2,2],[2,3],[2,4],[0,1],[1,0],[0,2]])
a=pd.DataFrame(x).sort_values(by=[0,1], ascending=[0,1])
b=x[np.lexsort((x[:,1],x[:,0][::-1]))]
print(a)
print(b)
From the docs, the last key passed to lexsort is used as the primary sort key, so the keys should be ordered (secondary, primary) to get the sort order:
sorter = np.lexsort((x[:, 1], x[:, 0]))
x[sorter][::-1] # sorting in descending order
Out[899]:
array([[2, 4],
       [2, 3],
       [2, 2],
       [2, 1],
       [1, 4],
       [1, 3],
       [1, 2],
       [1, 0],
       [0, 2],
       [0, 1]])
To simulate descending order on one column with ascending order on the other, you could combine np.unique with np.split and np.concatenate:
temp = x[sorter]
_, s = np.unique(temp[:, 0], return_counts=True)
np.concatenate(np.split(temp, s.cumsum())[::-1])
Out[972]:
array([[2, 1],
       [2, 2],
       [2, 3],
       [2, 4],
       [1, 0],
       [1, 2],
       [1, 3],
       [1, 4],
       [0, 1],
       [0, 2]])
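As a side note on the mixed-order goal from the question: since the first column is numeric, negating it in the sort key is a common alternative that gives descending order on column 0 and ascending order on column 1 in a single lexsort call:

```python
import numpy as np

x = np.array([[1, 2], [1, 3], [1, 4], [2, 1], [2, 2],
              [2, 3], [2, 4], [0, 1], [1, 0], [0, 2]])

# lexsort treats its last key as the primary key; negating
# column 0 (numeric data only) makes the primary sort descending
# while column 1 stays ascending
b = x[np.lexsort((x[:, 1], -x[:, 0]))]
```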
Given the following two tensors:
x = torch.tensor([[[1, 2],
                   [2, 0],
                   [0, 0]],
                  [[2, 2],
                   [2, 0],
                   [3, 3]]])  # [batch_size x sequence_length x subseq_length]
y = torch.tensor([[2, 1, 0],
                  [2, 1, 2]])  # [batch_size x sequence_length]
I would like to sort the sequences in x based on their sub-sequence lengths (0 corresponds to padding in the sequence). y corresponds to the lengths of the sub-sequences in x. I have tried the following:
y_sorted, y_sort_idx = y.sort(dim=1, descending=True)
print(x.scatter_(dim=1, index=y_sort_idx.unsqueeze(2), src=x))
This results in:
tensor([[[1, 2],
         [2, 0],
         [0, 0]],

        [[2, 2],
         [2, 0],
         [2, 3]]])
However what I would like to achieve is:
tensor([[[1, 2],
         [2, 0],
         [0, 0]],

        [[2, 2],
         [3, 3],
         [2, 0]]])
This should do it:
y_sorted, y_sort_idx = y.sort(dim=1, descending=True)
index = y_sort_idx.unsqueeze(2).expand_as(x)
x = x.gather(dim=1, index=index)
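A runnable check of the gather approach on the question's tensors; stable=True is added here so that ties among equal lengths keep their original order (the second batch item has lengths 2, 1, 2):

```python
import torch

x = torch.tensor([[[1, 2], [2, 0], [0, 0]],
                  [[2, 2], [2, 0], [3, 3]]])
y = torch.tensor([[2, 1, 0],
                  [2, 1, 2]])

# stable sort keeps equal lengths in their original order
y_sorted, y_sort_idx = y.sort(dim=1, descending=True, stable=True)

# expand the [batch x seq] indices across the subseq dimension so
# gather moves whole sub-sequences, not single elements
index = y_sort_idx.unsqueeze(2).expand_as(x)
x_sorted = x.gather(dim=1, index=index)
```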
I have a matrix represented by a np array. Here is an example of what I am talking about; you can see it has 3 "vectors" inside of it:
x = np.array([[1, 1], [1,2],[2,3]])
[1, 1], [1,2] and [2,3]
The goal is to turn this into a matrix where these vectors are repeated. So the 0th row of said matrix should simply be [1,1] repeated n times, and the 1st row should be [1,2] repeated n times. For n=4, I believe this would look like:
xresult = np.array([[[1, 1], [1, 1], [1, 1], [1, 1]],
                    [[1, 2], [1, 2], [1, 2], [1, 2]],
                    [[2, 3], [2, 3], [2, 3], [2, 3]]])
And therefore
xresult[0,0] = [1,1]
xresult[0,1] = [1,1]
xresult[0,2] = [1,1]
xresult[1,2] = [1,2]
The goal, of course, is to do this without loops if possible, since a loop is the obvious but perhaps less elegant/performant solution.
Here are some attempts that do not work:

np.tile([x], (2, 1))
>>> array([[[1, 1],
        [1, 2],
        [2, 3],
        [1, 1],
        [1, 2],
        [2, 3]]])

np.tile([x], (2,))
>>> array([[[1, 1, 1, 1],
        [1, 2, 1, 2],
        [2, 3, 2, 3]]])

np.append(x, x, axis=0)
>>> array([[1, 1],
       [1, 2],
       [2, 3],
       [1, 1],
       [1, 2],
       [2, 3]])

np.append([x], [x], axis=1)
>>> array([[[1, 1],
        [1, 2],
        [2, 3],
        [1, 1],
        [1, 2],
        [2, 3]]])

np.array([[x], [x]])
>>> array([[[[1, 1],
         [1, 2],
         [2, 3]]],

       [[[1, 1],
         [1, 2],
         [2, 3]]]])
(Some of these were just with n=2 as a goal)
It is worth noting that the ultimate goal is to take x and y (a similarly crafted array of vectors of the same dimension, but not necessarily the same number of vectors):

y = np.array([[99, 11], [23, 44], [33, 44], [2, 1], [9, 9]])

and run the procedure on x so that the number of columns of the result equals the number of vectors in y, and run a procedure on y that is similar but does this row-wise. y after this transform would have the following:

yresult[0,0] = [99,11]
yresult[1,0] = [23,44]
yresult[2,0] = [33,44]
yresult[2,1] = [33,44]

This way I can subtract the two matrices. The goal is to create a matrix where x's vector index is the row, y's vector index is the column, and the element is the difference between these two vectors:

ultimateResult[0,1] = [1,1] - [23,44] = [-22,-43]

Perhaps there is a better way to get this.
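For what it's worth, broadcasting can produce the difference directly without materializing either repeated matrix; a minimal sketch on the question's data:

```python
import numpy as np

x = np.array([[1, 1], [1, 2], [2, 3]])
y = np.array([[99, 11], [23, 44], [33, 44], [2, 1], [9, 9]])

# x[:, None, :] has shape (3, 1, 2) and y[None, :, :] has shape
# (1, 5, 2); broadcasting stretches both to (3, 5, 2), so the
# repeated matrices never need to be built explicitly
diff = x[:, None, :] - y[None, :, :]
print(diff[0, 1])  # x[0] - y[1]
```

If the repeated matrix itself is needed, np.broadcast_to(x[:, None, :], (3, 5, 2)) produces the xresult layout as a view, without copying.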
Let's say I have a 3D NumPy array:

array([[[0, 1, 2],
        [0, 1, 2],
        [0, 2, 5]]])
Is it possible to remove the first entry from all the rows (those innermost rows)? In this case the 0 would be deleted from each row.
Giving us the following output:
[[[1, 2],
  [1, 2],
  [2, 5]]]
x
array([[[0, 1, 2],
        [0, 1, 2],
        [0, 2, 5]]])
x.shape
# (1, 3, 3)
You can use Ellipsis (...) to select across all the outermost axes, and slice out the first value from each row with 1:.
x[..., 1:]
array([[[1, 2],
        [1, 2],
        [2, 5]]])
x[..., 1:].shape
# (1, 3, 2)
To complement @coldspeed's response, slicing in numpy is very powerful and can be done in a variety of ways, including with the colon operator : in the index. That is,
print(x[:,:,1:])
# array([[[1, 2],
#         [1, 2],
#         [2, 5]]])
is equivalent to the established use of the ellipsis.
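The ellipsis pays off when the number of leading axes varies: the same expression covers arrays of any dimensionality, whereas an explicit x[:, :, 1:] is tied to exactly three axes. A quick sketch:

```python
import numpy as np

x3 = np.arange(9).reshape(1, 3, 3)
x4 = np.arange(24).reshape(2, 2, 2, 3)

# '...' expands to however many leading axes the array has,
# so one expression drops the first column of every row
print(x3[..., 1:].shape)
print(x4[..., 1:].shape)
```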
I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4],
       [1, 8],
       [2, 3],
       [4, 6],
       [4, 2],
       [5, 1],
       [5, 2],
       [5, 6]]
I would like the result to be:
arr = [[1, 8],
       [2, 3],
       [4, 6],
       [5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!
Here's one way based on np.maximum.reduceat -
def grouby_maxY(a):
    b = a[a[:, 0].argsort()]  # if first col is already sorted, skip this
    grp_idx = np.flatnonzero(np.r_[True, b[:-1, 0] != b[1:, 0]])
    grp_maxY = np.maximum.reduceat(b[:, 1], grp_idx)
    return np.c_[b[grp_idx, 0], grp_maxY]
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=True)[1].
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
       [3, 3],
       [3, 1],
       [3, 2],
       [4, 0],
       [0, 4],
       [2, 1],
       [0, 1],
       [1, 0],
       [1, 4]])
In [456]: grouby_maxY(arr)
Out[456]:
array([[0, 4],
       [1, 4],
       [2, 1],
       [3, 3],
       [4, 0]])
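The np.unique variant mentioned above, written out on the question's example data (the helper name here is made up):

```python
import numpy as np

def groupby_maxY_unique(a):
    # sort rows by the first column so equal x-values are adjacent
    b = a[a[:, 0].argsort()]
    # return_index gives the first position of each distinct x,
    # i.e. exactly the group start indices reduceat needs
    _, grp_idx = np.unique(b[:, 0], return_index=True)
    grp_maxY = np.maximum.reduceat(b[:, 1], grp_idx)
    return np.c_[b[grp_idx, 0], grp_maxY]

arr = np.array([[1, 4], [1, 8], [2, 3], [4, 6],
                [4, 2], [5, 1], [5, 2], [5, 6]])
print(groupby_maxY_unique(arr))
```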