Using scatter_() to replace values in matrix - python

Given the following two tensors:
x = torch.tensor([[[1, 2],
[2, 0],
[0, 0]],
[[2, 2],
[2, 0],
[3, 3]]]) # [batch_size x sequence_length x subseq_length]
y = torch.tensor([[2, 1, 0],
[2, 1, 2]]) # [batch_size x sequence_length]
I would like to sort the sequences in x based on their sub-sequence lengths (0 corresponds to padding in the sequence). y corresponds to the lengths of the sub-sequences in x. I have tried the following:
y_sorted, y_sort_idx = y.sort(dim=1, descending=True)
print(x.scatter_(dim=1, index=y_sort_idx.unsqueeze(2), src=x))
This results in:
tensor([[[1, 2],
[2, 0],
[0, 0]],
[[2, 2],
[2, 0],
[2, 3]]])
However what I would like to achieve is:
tensor([[[1, 2],
[2, 0],
[0, 0]],
[[2, 2],
[3, 3],
[2, 0]]])

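This should do it. Use gather instead of scatter_: gather reads x[b, index[b, i, k], k] for every output position, whereas scatter_ writes each source element to the position named in index. The index also has to be expanded to the shape of x; with shape [batch_size x sequence_length x 1], only the first element of each sub-sequence is moved.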
y_sorted, y_sort_idx = y.sort(dim=1, descending=True)
index = y_sort_idx.unsqueeze(2).expand_as(x)
x = x.gather(dim=1, index=index)
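A self-contained sketch of the gather approach on the tensors from the question. The name x_sorted and the stable=True argument are additions for illustration; stable=True is assumed to be available (PyTorch 1.9 or later) and only pins the order of the two equal lengths in the second batch.

```python
import torch

x = torch.tensor([[[1, 2], [2, 0], [0, 0]],
                  [[2, 2], [2, 0], [3, 3]]])  # [batch_size x sequence_length x subseq_length]
y = torch.tensor([[2, 1, 0],
                  [2, 1, 2]])                 # [batch_size x sequence_length]

# Sort the lengths per batch; stable=True (an assumption, see above) keeps
# tied lengths in their original order.
y_sorted, y_sort_idx = y.sort(dim=1, descending=True, stable=True)

# Repeat the [batch, seq] permutation along the last dimension so that all
# elements of a sub-sequence move together.
index = y_sort_idx.unsqueeze(2).expand_as(x)

# gather along dim=1: x_sorted[b, i, k] = x[b, index[b, i, k], k]
x_sorted = x.gather(dim=1, index=index)
print(x_sorted)
```

Writing the result to a new tensor also avoids reading from and writing to x at the same time, which the in-place scatter_ call with src=x does.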

Related

Indexing ndarray with unknown number of dimensions with range dynamically

I have a data array of unknown shape and an array bounds holding the bounds for slicing data. This code is for 3D data, but is there any way of generalizing it to N dimensions?
for b in bounds:
    l0, u0 = b[0]
    l1, u1 = b[1]
    l2, u2 = b[2]
    a = data[l0:u0, l1:u1, l2:u2]
    print(a)
I tried using a Python range object as the index, but that did not work.
Examples for data:
data2D = np.arange(2*3).reshape((2, 3))
data3D = np.arange(2*3*4).reshape((2, 3, 4))
Corresponding bounds:
bounds2D = np.array([[[0, 2], [0, 2]], [[0, 2], [1, 3]]])
bounds3D = np.array(
[
[[0, 2], [0, 2], [0, 2]],
[[0, 2], [0, 2], [2, 4]],
[[0, 2], [1, 3], [0, 2]],
[[0, 2], [1, 3], [2, 4]],
],
)
You can use the slice function to create a single slice from each element in bounds. Then collect these slices into a single tuple and use it to correctly recover the wanted items of the array. You can adapt your code as follows:
import numpy as np
# The dimension of the slices is equal to the
# one specified by the bounds provided
def create_slices(bounds):
    slices = []
    # Take a single item of the bounds and create the corresponding slices
    for b in bounds:
        # The slices for one item are collected inside a single tuple
        slices.append(tuple(slice(l, u) for l, u in b))
    return slices
# 4D example data
data4D = np.arange(2*3*4*5).reshape((2,3,4,5))
# Bounds array for 4D data
bounds4D = np.array(
[
[[0, 2], [0, 2], [0, 2], [0, 2]],
[[0, 2], [0, 2], [0, 2], [2, 4]],
[[0, 2], [1, 3], [2, 4], [0, 2]],
[[0, 2], [1, 3], [2, 4], [2, 4]],
],
)
slices = create_slices(bounds4D)
# Each element of slices is a tuple of slice objects (one per dimension)
# that can be used to index the corresponding data array
for single_slice in slices:
    a = data4D[single_slice]
    print("Slice", a)
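The same idea can be applied without knowing the number of dimensions in advance. The sketch below runs it on the 2D and 3D examples from the question; slice_by_bounds is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

def slice_by_bounds(data, bounds):
    """Yield data[l0:u0, l1:u1, ...] for each row of (lower, upper) pairs."""
    for b in bounds:
        # One slice per dimension, packed into a tuple so it can index data
        yield data[tuple(slice(int(l), int(u)) for l, u in b)]

data2D = np.arange(2 * 3).reshape((2, 3))
bounds2D = np.array([[[0, 2], [0, 2]], [[0, 2], [1, 3]]])

data3D = np.arange(2 * 3 * 4).reshape((2, 3, 4))
bounds3D = np.array([
    [[0, 2], [0, 2], [0, 2]],
    [[0, 2], [0, 2], [2, 4]],
])

blocks2D = list(slice_by_bounds(data2D, bounds2D))
blocks3D = list(slice_by_bounds(data3D, bounds3D))
for block in blocks2D + blocks3D:
    print(block.shape)
```

Because the tuple is built from however many (lower, upper) pairs each bounds row contains, the helper does not need to be changed when the dimensionality of data changes.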

how to split a list in python based on the values of the list

I have a list having sublists of numbers and want to extract specific ones. In my simplified example I have two main sublists and each one has its own pairs of numbers:
data=[[[1, 0], [2, 0], [2, 1], [2, 2],\
[1, 0], [1, 1], [1, 2],\
[0, 1], [0, 2], [0, 3]],\
[[1, 0], [2, 0],\
[1, 0],\
[0, 1], [0, 2], [1, 2],\
[1, 0], [1, 1], [1, 1]]]
The pairs stored in data can be divided based on some rules, and I want the last pair of each division. For simplicity, I have shown each division as a row in data. Each division starts with [1, 0] or [0, 1], and these two pairs are break points. I simply want the last pair before each break point. In some cases there is no pair between two break points, and then I only want the previous break point. Finally, I want the result as the following list:
data=[[[2, 2],\
[1, 2],\
[0, 3]],\
[[2, 0],\
[1, 0],\
[1, 2],\
[1, 1]]]
You can do the following, using enumerate:
def fun(lst):
    return [p for i, p in enumerate(lst) if i == len(lst) - 1 or set(lst[i + 1]) == {0, 1}]

[*map(fun, data)]
# [[[2, 2], [1, 2], [0, 3]], [[2, 0], [1, 0], [1, 2], [1, 1]]]
fun filters a nested list for all elements that are either last or succeeded by [0, 1] or [1, 0].
data=[[[1, 0], [2, 0], [2, 1], [2, 2],
[1, 0], [1, 1], [1, 2],
[0, 1], [0, 2], [0, 3]],
[[1, 0], [2, 0],
[1, 0],
[0, 1], [0, 2], [1, 2],
[1, 0], [1, 1], [1, 1]]]
newData = []
for subarray in data:
    new_subarray = []
    for i, item in enumerate(subarray):
        if item == [0, 1] or item == [1, 0]:
            if i > 0:
                new_subarray.append(subarray[i - 1])
        if i == len(subarray) - 1:
            new_subarray.append(item)
    newData.append(new_subarray)
print(newData)
Here is a fun little unreadable numpy oneliner:
import numpy as np
[np.array(a)[np.roll(np.flatnonzero(np.logical_or(np.all(np.array(a)==(1, 0), axis=1), np.all(np.array(a)==(0, 1), axis=1)))-1, -1)].tolist() for a in data]
# [[[2, 2], [1, 2], [0, 3]], [[2, 0], [1, 0], [1, 2], [1, 1]]]
It works, but in practice you'd be better off using schwobaseggl's solution.

Get value of variable index in particular dimension

Say I have the following tensor:
value = torch.tensor([
[[0, 0, 0], [1, 1, 1]],
[[2, 2, 2], [3, 3, 3]],
])
It has shape (2, 2, 3).
Now say I have index = [1, 0], which means I want to take:
# row 1 of [[0, 0, 0], [1, 1, 1]], giving me: [1, 1, 1]
# row 0 of [[2, 2, 2], [3, 3, 3]], giving me: [2, 2, 2]
So that the final output:
output = torch.tensor([[1, 1, 1], [2, 2, 2]])
Is there a vectorized way to achieve this?
You can use advanced indexing.
I can't find good PyTorch documentation about this, but I believe it works the same way as in NumPy, so refer to NumPy's documentation on indexing.
import torch
value = torch.tensor([
[[0, 0, 0], [1, 1, 1]],
[[2, 2, 2], [3, 3, 3]],
])
index = [1, 0]
i = range(0,2)
result = value[i, index]
# same as result = value[i, index, :]
print(result)
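Since the answer relies on NumPy's indexing rules, here is a minimal NumPy sketch of the same lookup. The names rows and result are illustrative, and the assumption is that the same expression behaves identically on the torch tensor.

```python
import numpy as np

value = np.array([
    [[0, 0, 0], [1, 1, 1]],
    [[2, 2, 2], [3, 3, 3]],
])                                   # shape (2, 2, 3)
index = np.array([1, 0])             # which row to pick in each batch entry

# Pair batch position b with index[b]: result[b] = value[b, index[b]]
rows = np.arange(value.shape[0])
result = value[rows, index]          # shape (2, 3)
print(result)
```

The two index arrays are matched element by element, so rows must have the same length as index. Using value.shape[0] instead of a hard-coded range keeps the lookup valid for any batch size.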

Creating a 2D matrix of vectors from a n-d array

I have a matrix represented by a NumPy array. Here is an example of what I am talking about:
x = np.array([[1, 1], [1, 2], [2, 3]])
You can see it has 3 "vectors" inside of it: [1, 1], [1, 2] and [2, 3].
The goal is to turn this into a matrix where these vectors are repeated. So the 0th row of said matrix should simply be [1, 1] repeated n times, and the 1st row should be [1, 2] repeated n times. I believe it would look something like this for n=4:
xresult = np.array([[[1, 1], [1, 1], [1, 1], [1, 1]],
[[1, 2], [1, 2], [1, 2], [1, 2]],
[[2, 3], [2, 3], [2, 3], [2, 3]]])
And therefore
xresult[0,0] = [1,1]
xresult[0,1] = [1,1]
xresult[0,2] = [1,1]
xresult[1,2] = [1,2]
The goal is of course to do this without loops if possible as that is an obvious but perhaps less elegant/performant solution.
Here are some attempts that do not work
np.tile([x],(2,1))
>>>array([[[1, 1],
[1, 2],
[2, 3],
[1, 1],
[1, 2],
[2, 3]]])
np.tile([x],(2,))
>>>array([[[1, 1, 1, 1],
[1, 2, 1, 2],
[2, 3, 2, 3]]])
np.append(x,x,axis=0)
>>>array([[1, 1],
[1, 2],
[2, 3],
[1, 1],
[1, 2],
[2, 3]])
np.append([x],[x],axis=1)
>>>array([[[1, 1],
[1, 2],
[2, 3],
[1, 1],
[1, 2],
[2, 3]]])
np.array([[x],[x]])
>>>array([[[[1, 1],
[1, 2],
[2, 3]]],
[[[1, 1],
[1, 2],
[2, 3]]]])
(Some of these were just with n=2 as a goal)
It is worth noting that the ultimate end goal is to take x and y (a similarly crafted array of vectors of the same dimension, but not necessarily the same number of vectors):
y = np.array([[99,11], [23,44],[33,44], [2, 1], [9, 9]])
Then run the procedure on x so that the number of columns of the result equals the number of vectors in y, and run a similar procedure on y that does this row-wise.
y after this transform would have the following
yresult[0,0] = [99,11]
yresult[1,0] = [23,44]
yresult[2,0] = [33,44]
yresult[2,1] = [33,44]
This way I can subtract the two matrices. The goal is to create a matrix where x's vector index is the row, y's vector index is the column, and the element is the difference between these two vectors.
ultimateResult[0,1]=[1,1]-[23,44]=[-22,-43]
Perhaps there is a better way to get this.
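One possible approach, sketched here under the assumption that only the pairwise differences are needed in the end, is NumPy broadcasting: inserting a length-1 axis lets NumPy repeat the vectors implicitly. The names xresult and diff are illustrative; np.repeat is shown only to reproduce the explicitly repeated matrix from the question.

```python
import numpy as np

x = np.array([[1, 1], [1, 2], [2, 3]])                         # shape (3, 2)
y = np.array([[99, 11], [23, 44], [33, 44], [2, 1], [9, 9]])   # shape (5, 2)

# Explicit repetition: xresult[i, j] == x[i] for every j in range(n)
n = 4
xresult = np.repeat(x[:, np.newaxis, :], n, axis=1)            # shape (3, 4, 2)

# Implicit repetition via broadcasting: diff[i, j] == x[i] - y[j]
diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]               # shape (3, 5, 2)

print(xresult[1, 2])
print(diff[0, 1])
```

With broadcasting, neither repeated matrix has to be materialized; only the (3, 5, 2) result is allocated.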

How do I accept "divide by zero" as zero? (Python)

So I have the following code:
import numpy as np
array1 = np.array([[[[2, 2, 3], [0, 2, 0], [2, 0, 0]],
[[1, 2, 2], [2, 2, 0], [0, 2, 3]],
[[0, 4, 2], [2, 2, 2], [2, 2, 3]]],
[[[2, 3, 0], [3, 2, 0], [2, 0, 3]],
[[0, 2, 2], [2, 2, 0], [2, 2, 3]],
[[1, 0, 2], [2, 2, 2], [2, 2, 0]]],
[[[2, 0, 0], [0, 2, 0], [2, 0, 0]],
[[2, 2, 2], [0, 2, 0], [2, 2, 0]],
[[0, 2, 2], [2, 2, 2], [2, 2, 0]]]])
array2 = np.array([[[[2, 2, 3], [0, 2, 0], [2, 0, 0]],
[[1, 2, 2], [2, 2, 0], [0, 2, 3]],
[[0, 4, 2], [2, 2, 2], [2, 2, 3]]],
[[[2, 3, 0], [3, 2, 0], [2, 0, 3]],
[[0, 2, 2], [2, 10, 0], [2, 2, 3]],
[[1, 0, 2], [2, 2, 2], [2, 2, 0]]],
[[[2, 0, 0], [0, 2, 0], [2, 0, 0]],
[[2, 2, 2], [0, 2, 0], [2, 2, 0]],
[[0, 2, 2], [2, 2, 2], [2, 2, 0]]]])
def calc(x, y):
    result = y / x
    return result

final_result = []
for x, y in zip(array1, array2):
    final_result.append(calc(np.array(x), np.array(y)))
So all in all I have two lists that include some 3D arrays, and then I have defined a function. The last part is where I use each 3D array in the function, and I ultimately end up with a list (final_result) of some other 3D arrays where the function has been used on each entry from array1 and array2.
However, as you can see, array1, which supplies the x values in the function, has 0 values in some of its entries. Mathematically, this is no good. But in this case, I really just need the entries that have a zero x-entry to be zero. So the function doesn't need to run whenever that happens; it can just skip the entry and leave it as zero.
Can this be done?
This question has been answered before. NumPy has a specific way to suppress such warnings:
def calc(x, y):
    """Compute y / x elementwise; entries where x == 0 become 0."""
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide(y, x)
        c[~np.isfinite(c)] = 0  # replace -inf, inf and NaN with 0
    return c
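An alternative sketch, instead of masking the result afterwards, is to skip the division wherever the divisor is zero by using the where and out arguments of np.divide. The name safe_divide and the small example arrays are hypothetical and used only for illustration.

```python
import numpy as np

def safe_divide(y, x):
    """Return y / x elementwise, leaving 0.0 wherever x == 0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Pre-fill the output with zeros; positions excluded by `where` keep them
    out = np.zeros(np.broadcast(y, x).shape, dtype=float)
    np.divide(y, x, out=out, where=(x != 0))
    return out

x = np.array([[2, 0, 4], [0, 5, 1]])
y = np.array([[4, 5, 2], [0, 10, 3]])
result = safe_divide(y, x)
print(result)
```

Because the division is never evaluated at the masked positions, no divide-by-zero warning is raised and np.errstate is not needed.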
