Creating a 2D matrix of vectors from an n-d array - python

I have a matrix represented by a NumPy array. Here is an example of what I am talking about:
x = np.array([[1, 1], [1, 2], [2, 3]])
You can see it has 3 "vectors" inside of it: [1, 1], [1, 2] and [2, 3].
The goal is to turn this into a matrix where these vectors are repeated. So the 0th row of said matrix should simply be [1, 1] repeated n times, and the 1st row should be [1, 2] repeated n times. For n=4, I believe the result would look like this:
xresult = np.array([[[1, 1], [1, 1], [1, 1], [1, 1]],
                    [[1, 2], [1, 2], [1, 2], [1, 2]],
                    [[2, 3], [2, 3], [2, 3], [2, 3]]])
And therefore
xresult[0,0] = [1,1]
xresult[0,1] = [1,1]
xresult[0,2] = [1,1]
xresult[1,2] = [1,2]
Ideally this is done without loops, since a loop is the obvious but probably less elegant and slower solution.
Here are some attempts that do not work:
np.tile([x],(2,1))
>>>array([[[1, 1],
[1, 2],
[2, 3],
[1, 1],
[1, 2],
[2, 3]]])
np.tile([x],(2,))
>>>array([[[1, 1, 1, 1],
[1, 2, 1, 2],
[2, 3, 2, 3]]])
np.append(x,x,axis=0)
>>>array([[1, 1],
[1, 2],
[2, 3],
[1, 1],
[1, 2],
[2, 3]])
np.append([x],[x],axis=1)
>>>array([[[1, 1],
[1, 2],
[2, 3],
[1, 1],
[1, 2],
[2, 3]]])
np.array([[x],[x]])
>>>array([[[[1, 1],
[1, 2],
[2, 3]]],
[[[1, 1],
[1, 2],
[2, 3]]]])
(Some of these were just with n=2 as a goal)
It is worth noting that the ultimate goal is to take x and y (a similarly built array of vectors of the same dimension, but not necessarily the same number of vectors):
y = np.array([[99,11], [23,44],[33,44], [2, 1], [9, 9]])
Then run the procedure on x so that the number of columns in the result equals the number of vectors in y, and run a similar procedure on y that does this row-wise.
After this transform, y would have the following:
yresult[0,0] = [99,11]
yresult[1,0] = [23,44]
yresult[2,0] = [33,44]
yresult[2,1] = [33,44]
This way I can subtract the two matrices. The goal is to create a matrix where the x vector index is the row, the y vector index is the column, and the element is the difference between these two vectors.
ultimateResult[0,1]=[1,1]-[23,44]=[-22,-43]
Perhaps there is a better way to get this.
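One possible approach, sketched here on the assumption that NumPy broadcasting is acceptable: indexing with None inserts a length-1 axis, and NumPy then repeats the vectors implicitly during the subtraction. The explicit repeat is only needed if the repeated matrix itself is wanted. The names n and diff are illustrative and not from the question.

```python
import numpy as np

x = np.array([[1, 1], [1, 2], [2, 3]])                        # 3 vectors
y = np.array([[99, 11], [23, 44], [33, 44], [2, 1], [9, 9]])  # 5 vectors

# Repeat each vector of x n times along a new middle axis.
n = 4
xresult = np.repeat(x[:, None, :], n, axis=1)   # shape (3, 4, 2)

# Pairwise differences without materializing the repeats:
# x is viewed as (3, 1, 2) and y as (1, 5, 2); broadcasting yields (3, 5, 2).
diff = x[:, None, :] - y[None, :, :]

print(xresult[1, 2])  # [1 2]
print(diff[0, 1])     # [-22 -43]
```

Here diff[i, j] holds x[i] - y[j], so the x vector index is the row and the y vector index is the column.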

Related

Indexing ndarray with unknown number of dimensions with range dynamically

I have an array data of unknown shape and an array bounds that holds the bounds for slicing data. This code works for 3D data, but is there any way of generalizing it to N dimensions?
for b in bounds:
    l0, u0 = b[0]
    l1, u1 = b[1]
    l2, u2 = b[2]
    a = data[l0:u0, l1:u1, l2:u2]
    print(a)
I tried using a Python range object as an index, but that did not work.
Examples for data:
data2D = np.arange(2*3).reshape((2, 3))
data3D = np.arange(2*3*4).reshape((2, 3, 4))
Corresponding bounds:
bounds2D = np.array([[[0, 2], [0, 2]], [[0, 2], [1, 3]]])
bounds3D = np.array(
[
[[0, 2], [0, 2], [0, 2]],
[[0, 2], [0, 2], [2, 4]],
[[0, 2], [1, 3], [0, 2]],
[[0, 2], [1, 3], [2, 4]],
],
)
You can use the slice function to create a single slice from each element in bounds. Then collect these slices into a single tuple and use it to correctly recover the wanted items of the array. You can adapt your code as follows:
import numpy as np

# The dimension of the slices is equal to the
# one specified by the bounds provided
def create_slices(bounds):
    slices = list()
    # Take a single item of the bounds and create corresponding slices
    for b in bounds:
        # Slices are collected inside a single tuple
        slices.append(tuple([slice(l, u) for l, u in b]))
    return slices

# 4D example data
data4D = np.arange(2*3*4*5).reshape((2, 3, 4, 5))
# Bounds array for 4D data
bounds4D = np.array(
    [
        [[0, 2], [0, 2], [0, 2], [0, 2]],
        [[0, 2], [0, 2], [0, 2], [2, 4]],
        [[0, 2], [1, 3], [2, 4], [0, 2]],
        [[0, 2], [1, 3], [2, 4], [2, 4]],
    ],
)
slices = create_slices(bounds4D)
# Each element of slices is a tuple of slice objects that can be
# used to index the corresponding data array
for single_slice in slices:
    a = data4D[single_slice]
    print("Slice", a)
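The same idea can be applied to the 3D data from the question. The following is a minimal sketch; slice_by_bounds and blocks are hypothetical names introduced here, and the only assumption is that each row of bounds holds one [lower, upper] pair per axis of data.

```python
import numpy as np

def slice_by_bounds(data, bounds):
    """Yield one sub-array per entry of bounds (hypothetical helper)."""
    for b in bounds:
        # One slice per axis, packed into a tuple so that NumPy
        # treats it as a multi-axis index.
        yield data[tuple(slice(lo, hi) for lo, hi in b)]

data3D = np.arange(2 * 3 * 4).reshape((2, 3, 4))
bounds3D = np.array([
    [[0, 2], [0, 2], [0, 2]],
    [[0, 2], [0, 2], [2, 4]],
    [[0, 2], [1, 3], [0, 2]],
    [[0, 2], [1, 3], [2, 4]],
])

blocks = list(slice_by_bounds(data3D, bounds3D))
for block in blocks:
    print(block.shape)
```

Because the index tuple is built from however many pairs each bounds entry contains, the helper does not need to know the number of dimensions in advance.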

How to create a multidimensional matrix in Python

I am a beginner in Python.
I want to create the matrix below. How should I create it?
[
[0,1], [0,2], [0,3],
[1,1], [1,2], [1,3],
[2,1], [2,2], [2,3],
[3,1], [3,2], [3,3]
]
I looked at NumPy, but maybe I am not searching the right way, because I did not find a good way to do it.
This is almost what numpy.ndindex is doing, except you want one of the values to start with 1. You can fix it by converting to array and adding 1:
np.array(list(np.ndindex(4,3)))+[0,1]
Output:
array([[0, 1],
[0, 2],
[0, 3],
[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 1],
[3, 2],
[3, 3]])
A rather simple list comprehension will generate this data structure. Numpy not required.
[[x, y] for x in range(4) for y in range(1, 4)]
Result:
[[0, 1], [0, 2], [0, 3],
[1, 1], [1, 2], [1, 3],
[2, 1], [2, 2], [2, 3],
[3, 1], [3, 2], [3, 3]]
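If a NumPy array is wanted without going through a Python list first, a grid-based sketch is another option. This is an alternative to the two answers above, not a restatement of them; the names rows, cols, grid and pairs are illustrative.

```python
import numpy as np

rows = np.arange(4)      # first component: 0..3
cols = np.arange(1, 4)   # second component: 1..3

# indexing="ij" keeps the first component as the slowest-varying one.
grid = np.stack(np.meshgrid(rows, cols, indexing="ij"), axis=-1)  # shape (4, 3, 2)
pairs = grid.reshape(-1, 2)                                       # shape (12, 2)

print(pairs)
```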

Using scatter_() to replace values in matrix

Given the following two tensors:
x = torch.tensor([[[1, 2],
[2, 0],
[0, 0]],
[[2, 2],
[2, 0],
[3, 3]]]) # [batch_size x sequence_length x subseq_length]
y = torch.tensor([[2, 1, 0],
[2, 1, 2]]) # [batch_size x sequence_length]
I would like to sort the sequences in x based on their sub-sequence lengths (0 corresponds to padding in the sequence). y corresponds to the lengths of the sub-sequences in x. I have tried the following:
y_sorted, y_sort_idx = y.sort(dim=1, descending=True)
print(x.scatter_(dim=1, index=y_sort_idx.unsqueeze(2), src=x))
This results in:
tensor([[[1, 2],
[2, 0],
[0, 0]],
[[2, 2],
[2, 0],
[2, 3]]])
However what I would like to achieve is:
tensor([[[1, 2],
[2, 0],
[0, 0]],
[[2, 2],
[3, 3],
[2, 0]]])
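This should do it. scatter_ writes values from src into the positions given by index, whereas gather reads the values found at those positions, which is what reordering by the sorted indices needs: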
y_sorted, y_sort_idx = y.sort(dim=1, descending=True)
index = y_sort_idx.unsqueeze(2).expand_as(x)
x = x.gather(dim=1, index=index)
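For readers who want to check the indexing logic outside PyTorch, here is a NumPy analogue. It is an assumption of this sketch, not part of the answer above: np.take_along_axis plays the role of gather. Note that the second batch contains two sub-sequences of length 2, so the result depends on how ties are ordered. A stable sort is used here so that tied rows keep their original order, which reproduces the output the question asks for.

```python
import numpy as np

x = np.array([[[1, 2], [2, 0], [0, 0]],
              [[2, 2], [2, 0], [3, 3]]])   # [batch, sequence, subseq]
y = np.array([[2, 1, 0],
              [2, 1, 2]])                  # [batch, sequence]

# Descending order via negation; kind="stable" keeps ties in input order.
order = np.argsort(-y, axis=1, kind="stable")   # shape (2, 3)

# Read rows of x in that order (the counterpart of gather along dim=1).
x_sorted = np.take_along_axis(x, order[:, :, None], axis=1)

print(x_sorted)
```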

Problem getting pairs in a 3D list using list comprehension

I'm working with list comprehensions, but I'm having trouble with this one. I have a 3D list and I'm trying to obtain pairs of adjacent elements from the inner lists. My code does produce pairs, but not exactly in the form I need. Here is my code:
mylist = [[[3, 2, 4, 3], [3, 2, 1], [2, 1]], [[1, 2, 3], [3, 1], [2, 1]]]
res = [[x[idx: idx+2] for i in mylist for x in i for idx in range(0, len(x) - 1)]]
print(res)
#res = [[[3, 2], [2, 4], [4, 3], [3, 2], [2, 1], [2, 1], [1, 2], [2, 3], [3, 1], [2, 1]]]
As you can see, I do get a 3D list with the pairs, but the pairs are not separated by group; everything ends up in a single inner list. I was expecting this:
#Output
res = [[[3, 2], [2, 4], [4, 3], [3, 2], [2, 1], [2, 1]], [[1, 2], [2, 3], [3, 1], [2, 1]]]
# The separation is between the first group of six pairs and the second group of four pairs
I've been working on my list comprehension, but I can't see where the problem is. I believe something is wrong with the brackets, but I have tried different combinations and nothing seems to work, so any help will be appreciated.
Also, maybe this is a bit of a stretch, but is there some way to eliminate the repeated inner lists in the 3D list? I mean, using res to get:
newres = [[[3, 2], [2, 4], [4, 3], [2, 1]], [[1, 2], [2, 3], [3, 1], [2, 1]]]
#[3, 2], [2, 1] eliminated
If you can point me in the right direction, that would be great. Thank you so much!
[[x[idx: idx+2] for x in i for idx in range(0, len(x) - 1)] for i in mylist ]
Sorry, I am not good at writing nested loops in one line. But this will remove duplicates and create a 3D list with pairs:
mylist = [[[3, 2, 4, 3], [3, 2, 1], [2, 1]], [[1, 2, 3], [3, 1], [2, 1]]]
res = []
for inner in mylist:
    temp = []
    for each in inner:
        for e in zip(each, each[1:]):
            if list(e) not in temp:
                temp.append(list(e))
    res.append(temp)
print(res) # [[[3, 2], [2, 4], [4, 3], [2, 1]], [[1, 2], [2, 3], [3, 1], [2, 1]]]
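Both steps can also be written as comprehensions. The following is a minimal sketch that combines the grouped comprehension with order-preserving deduplication through dict.fromkeys; this deduplication step is a swapped-in technique, not what the loop above does internally.

```python
mylist = [[[3, 2, 4, 3], [3, 2, 1], [2, 1]], [[1, 2, 3], [3, 1], [2, 1]]]

# The outer comprehension keeps one list of pairs per top-level group.
res = [[x[i:i + 2] for x in group for i in range(len(x) - 1)]
       for group in mylist]

# dict.fromkeys keeps the first occurrence of each key in insertion order;
# pairs are converted to tuples because lists are not hashable.
newres = [[list(p) for p in dict.fromkeys(map(tuple, pairs))]
          for pairs in res]

print(res)
print(newres)
```

Compared with the `not in temp` check, the dictionary lookup avoids rescanning the list of pairs for every candidate, which may matter for long inner lists.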

Numpy: Check for duplicates in first column and keep row with highest value [duplicate]

I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4],
       [1, 8],
       [2, 3],
       [4, 6],
       [4, 2],
       [5, 1],
       [5, 2],
       [5, 6]]
I would like the result to be:
arr = [[1, 8],
       [2, 3],
       [4, 6],
       [5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!
Here's one way based on np.maximum.reduceat -
def grouby_maxY(a):
    b = a[a[:,0].argsort()] # if first col is already sorted, skip this
    grp_idx = np.flatnonzero(np.r_[True,(b[:-1,0] != b[1:,0])])
    grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
    return np.c_[b[grp_idx,0], grp_maxY]
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=1)[1].
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
[3, 3],
[3, 1],
[3, 2],
[4, 0],
[0, 4],
[2, 1],
[0, 1],
[1, 0],
[1, 4]])
In [456]: grouby_maxY(arr)
Out[456]:
array([[0, 4],
[1, 4],
[2, 1],
[3, 3],
[4, 0]])
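A different route to the same result, sketched here on the array from the question, is to sort by x and then by y with np.lexsort and keep the last row of each x group. This is an alternative technique, not the reduceat approach above; the names order, s, is_last and result are illustrative.

```python
import numpy as np

arr = np.array([[1, 4], [1, 8], [2, 3], [4, 6],
                [4, 2], [5, 1], [5, 2], [5, 6]])

# lexsort uses the LAST key as the primary one: sort by x, then by y.
order = np.lexsort((arr[:, 1], arr[:, 0]))
s = arr[order]

# A row is the last of its x group if the next row has a different x
# (the final row always closes a group).
is_last = np.r_[s[1:, 0] != s[:-1, 0], True]
result = s[is_last]

print(result)
```

Since rows within each x group are sorted by ascending y, the last row of a group carries its highest y-value.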
