Remove rows from tensor by matching - python

I'm trying to do the following operation. Suppose I have these tensors in PyTorch:
a = torch.tensor([[1,0]
,[0,1]
,[2,0]
,[3,2]])
b = torch.tensor([[0,1]
,[2,0]])
I want to remove the rows [0,1], [2,0] which are the rows of b from a.
Is there any way to do this?
# result
a = torch.tensor([[1,0]
,[3,2]])

You could do it directly if the tensor shapes were broadcastable.
For a tensor a of shape (?, d) and a tensor b of shape (d,), you could write something like:
cmp = a.eq(b).all(dim=1).logical_not(), i.e. compare each d-dimensional row of a with b and give me a boolean mask that is True where the row does not match.
From this mask you can then easily build your new tensor like so:
a = a[cmp]
I doubt you'll find an elegant way of doing this when b itself contains a batch dimension; your best bet would be to write a for loop.
Full example:
>>> xs = torch.tensor([[1,0], [0,1], [2,0], [3,2]])
>>> ys = torch.tensor([[0,1],[2,0]])
>>> for y in ys:
...     xs = xs[xs.eq(y).all(dim=1).logical_not()]
>>> xs
tensor([[1, 0],
        [3, 2]])

You can do something like this exploiting broadcasting:
import torch
a = torch.tensor([[1, 0], [0, 1], [2, 0], [3, 2]])
b = torch.tensor([[0, 1], [2, 0]])
indices = ((a == b[:, None]).sum(axis = 2) != a.shape[1]).all(axis = 0)
print(indices)
print(a[indices])
indices =
tensor([ True, False, False, True])
a[indices] =
tensor([[1, 0],
[3, 2]])
This works for any tensors a and b of shapes m x n and p x n respectively, i.e. the number of columns (a.shape[1]) must be the same, and b can contain any number of rows to compare against.
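To make the shapes in the broadcasting answer easier to follow, here is a minimal sketch of the same idea written with NumPy arrays instead of torch tensors (that substitution is my assumption, chosen only so the snippet has a single dependency). The names eq, row_matches and keep are illustrative and do not come from the answer above.

```python
import numpy as np

a = np.array([[1, 0], [0, 1], [2, 0], [3, 2]])   # shape (m, n)
b = np.array([[0, 1], [2, 0]])                   # shape (p, n)

# b[:, None] has shape (p, 1, n), so the comparison broadcasts to (p, m, n).
eq = a == b[:, None]

# A row of a matches a row of b only if all n entries agree -> shape (p, m).
row_matches = eq.all(axis=2)

# Keep the rows of a that match no row of b -> boolean mask of shape (m,).
keep = ~row_matches.any(axis=0)

result = a[keep]
print(keep)
print(result)
```

Note that the intermediate array has p * m * n elements, so for very large a and b the loop over the rows of b may use less memory.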

Related

Multiplying array of vectors by matrix without for loop

I have a 2 x 2 numpy.array() matrix, and an array N x 2 X, containing N 2-dimensional vectors.
I want to multiply each vector in X by the 2 x 2 matrix. Below I use a for loop, but I am sure there is a faster way. Please, could someone show me what it is? I assume there is a way using a numpy function.
# the matrix I want to multiply X by
matrix = np.array([[0, 1], [-1, 0]])
# initialize empty solution
Y = np.empty((N, 2))
# loop over each vector in X and create a new vector Y with the result
for i in range(0, N):
    Y[i] = np.dot(matrix, X[i])
For example, these arrays:
matrix = np.array([
[0, 1],
[0, -1]
])
X = np.array([
[0, 0],
[1, 1],
[2, 2]
])
Should result in:
Y = np.array([
[0, 0],
[1, -1],
[2, -2]
])
One-liner is (matrix @ X.T).T
Just transpose your X to get your vectors in columns. Then matrix @ X.T (or np.dot(matrix, X.T) if you prefer, but now that the @ notation exists, why not use it) is a matrix whose columns are matrix times X[i]. Just transpose the result back if you need Y to hold the results as rows.
matrix = np.array([[0, 1], [-1, 0]])
X = np.array([[1,2],[3,4],[5,6]])
Y = (matrix @ X.T).T
Y is
array([[ 2, -1],
[ 4, -3],
[ 6, -5]])
As expected, I guess.
In detail:
X is
array([[1, 2],
[3, 4],
[5, 6]])
so X.T is
array([[1, 3, 5],
[2, 4, 6]])
So, you can multiply your 2x2 matrix by this 2x3 matrix, and the result will be a 2x3 matrix whose columns are the products of matrix with each column of X.T. matrix @ X.T is
array([[ 2, 4, 6],
[-1, -3, -5]])
Transposing this back gives the result already shown.
So, tl;dr: one-liner answer is (matrix @ X.T).T
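As a small check on the one-liner, the sketch below compares it with the explicit loop and with two equivalent spellings. The variable names Y_loop, Y_rows and Y_einsum are mine, and the einsum form is just one alternative that may read more clearly to some people, not something the answer above proposes.

```python
import numpy as np

matrix = np.array([[0, 1], [-1, 0]])
X = np.array([[1, 2], [3, 4], [5, 6]])   # N x 2, one vector per row

# Reference result: the explicit loop from the question.
Y_loop = np.array([matrix @ x for x in X])

# (matrix @ X.T).T equals X @ matrix.T, because (A B)^T = B^T A^T.
Y_rows = X @ matrix.T

# einsum spells out the index pattern: Y[n, i] = sum_j matrix[i, j] * X[n, j]
Y_einsum = np.einsum("ij,nj->ni", matrix, X)

print(Y_rows)
```

Writing it as X @ matrix.T avoids transposing the large array twice, although NumPy transposes are views and cost very little either way.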
You are doing a matrix multiplication of the (2,2) matrix with each row of X, treated as a (2,1) column vector.
To calculate this directly, both operands need compatible dimensions. Add a dimension with None and calculate Y like this:
matrix = np.array([[3, 1], [-1, 0.1]])
N = 10
Y = np.empty((N, 2))
X =np.ones((N,2))
X[0][0] = 2
X[5][1] = 3
# loop over each vector in X and create a new vector Y with the result
for i in range(0, N):
    Y[i] = np.dot(matrix, X[i])
# batched product: (1,2,2) @ (N,2,1) broadcasts to (N,2,1)
Ydirect = matrix[None,:] @ X[:,:,None]
print(Y)
print(Ydirect[:,:,0])
You can vectorize Adrien's result and remove the for loop, which improves performance, especially as the matrices get bigger.
matrix = np.array([[3, 1], [-1, 0.1]])
N = 10
X = np.ones((N, 2))
X[0][0] = 2
X[5][1] = 3
# calculate all products at once using the @ operator
# (the result has shape (2, N); transpose it to get one result per row)
Y = matrix @ X.T
print(Y)

Moving some matrix b(kxk) over the matrix a(nxm) with given stride, multiplying the corresponding elements, and adding them into the new matrix

So here I have this problem.
Given 2D numpy arrays 'a' and 'b' of sizes m×n and k×k
respectively (k <= n, k <= m), 2 integers 'stride' and 'padding' and
a function 'f'. You need to
first pad the 'a' matrix with 0s on each side,
then move 'b' over 'a' with stride 'stride', multiply the elements of 'a' under the window by the corresponding 'b' elements,
add the resulting k * k numbers,
apply the 'f' function to the result,
and place it in the new matrix.
a = np.array([[1, 1, 2],
[0, 1, 3],
[1, 3, 0],
[4, 5, 2]])
b = np.array([[1, 0],
[0, 1]])
stride = 1
padding = 0
f = lambda x: x**2
print(conv(a, b, stride, padding, f))
>>[[4, 16],
[9, 1],
[36, 25]]
I don't understand how I should handle the case where the stride is larger. For example, if I set stride=2 in the example above, what will the program do? Will it first take [[1,1], [0,1]] and then skip to [[0,1], [1,3]], or do something different?
And what functions or methods would be useful here? I already know how to pad matrices with 0s, but is there something else that could be useful?
def padding(a, padd):
    matrix = np.zeros((len(a)+2*padd,len(a[0])+2*padd))
    for i in range(len(a)):
        for j in range(len(a[0])):
            matrix[i+padd,j+padd] = a[i,j]
    return matrix
def conv(a, b, stride, padd, f):
    output = np.zeros((len(a)-(len(b)-1),len(b)))
    c = padding(a,padd)
    matrices = []
    for i in range(len(output)):
        column = stride - 1
        for j in range(len(output[0])):
            output[i,j] = np.sum(a[i:i+len(b),j+column:j+column+len(b)] * b)
    return f(output)
a = np.array([[1, 1, 2],
[0, 1, 3],
[1, 3, 0],
[4, 5, 2]])
b = np.array([[1, 0],
[0, 1]])
stride = 1
pad = 0
f = lambda x: x**2
print(conv(a, b, stride, pad, f))
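One possible reading of the task, sketched under assumptions of my own: the stride applies to both the row and the column direction, the window's top-left corner sits at multiples of the stride, and positions where the window would run past the padded matrix are dropped. The output size formula (size - k) // stride + 1 follows from those assumptions and is not stated in the question. np.pad replaces the hand-written padding helper.

```python
import numpy as np

def conv(a, b, stride, padding, f):
    # Pad 'a' with zeros on every side.
    padded = np.pad(a, padding, mode="constant", constant_values=0)
    k = b.shape[0]
    # Number of window positions that fit along each axis (assumed convention).
    out_rows = (padded.shape[0] - k) // stride + 1
    out_cols = (padded.shape[1] - k) // stride + 1
    out = np.zeros((out_rows, out_cols), dtype=padded.dtype)
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride   # top-left corner of the window
            out[i, j] = np.sum(padded[r:r + k, c:c + k] * b)
    return f(out)

a = np.array([[1, 1, 2],
              [0, 1, 3],
              [1, 3, 0],
              [4, 5, 2]])
b = np.array([[1, 0],
              [0, 1]])
f = lambda x: x**2

print(conv(a, b, 1, 0, f))
print(conv(a, b, 2, 0, f))
```

Under these assumptions, stride=2 on the example visits only the windows whose top-left corners are at (0, 0) and (2, 0), so the window starting at row 1 is skipped.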

NumPy apply function to groups of rows corresponding to another numpy array

I have a NumPy array with each row representing some (x, y, z) coordinate like so:
a = array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
I also have another NumPy array with unique values of the z-coordinates of that array like so:
b = array([1, 2])
How can I apply a function, let's call it "f", to each of the groups of rows in a which correspond to the values in b? For example, the first value of b is 1 so I would get all rows of a which have a 1 in the z-coordinate. Then, I apply a function to all those values.
In the end, the output would be an array the same shape as b.
I'm trying to vectorize this to make it as fast as possible. Thanks!
Example of an expected output (assuming that f is count()):
c = array([2, 2])
because there are 2 rows in array a which have a z value of 1 in array b and also 2 rows in array a which have a z value of 2 in array b.
A trivial solution would be to iterate over array b like so:
for val in b:
    apply function to a based on val
    append to an array c
My attempt:
I tried doing something like this, but it just returns an empty array.
func(a[a[:, 2]==b])
The problem is that the groups of rows with the same Z can have different sizes, so you cannot stack them into one 3D numpy array, which would have allowed you to apply a function along one axis. One solution is to use a for-loop, another is to use np.split:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2],
[4, 3, 1]])
a_sorted = a[a[:,2].argsort()]
inds = np.unique(a_sorted[:,2], return_index=True)[1]
a_split = np.split(a_sorted, inds)[1:]
# [array([[0, 0, 1],
# [4, 5, 1],
# [4, 3, 1]]),
# array([[1, 1, 2],
# [4, 5, 2]])]
f = np.sum # example of a function
result = list(map(f, a_split))
# [19, 15]
But imho the best solution is to use pandas and groupby as suggested by FBruzzesi. You can then convert the result to a numpy array.
EDIT: For completeness, here are the other two solutions
List comprehension:
b = np.unique(a[:,2])
result = [f(a[a[:,2] == z]) for z in b]
Pandas:
import pandas as pd
df = pd.DataFrame(a, columns=list('XYZ'))
result = df.groupby(['Z']).apply(lambda x: f(x.values)).tolist()
This is the performance plot I got for a = np.random.randint(0, 100, (n, 3)):
As you can see, approximately up to n = 10^5 the "split solution" is the fastest, but after that the pandas solution performs better.
If you are allowed to use pandas:
import pandas as pd
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').agg(f)
Here f can be any custom function working on grouped data.
Numeric example:
a = np.array([[0, 0, 1],
[1, 1, 2],
[4, 5, 1],
[4, 5, 2]])
df=pd.DataFrame(a, columns=['x','y','z'])
df.groupby('z').size()
z
1 2
2 2
dtype: int64
Note that .size() is the way to count the number of rows per group.
To keep it in pure NumPy, maybe this can suit your case:
tmp = np.array([a[a[:,2]==i] for i in b])
tmp
array([[[0, 0, 1],
[4, 5, 1]],
[[1, 1, 2],
[4, 5, 2]]])
which is an array holding each group of rows. This only gives a regular 3D array when every group has the same number of rows.
c = np.array([])
for x in np.nditer(b):
    c = np.append(c, np.where((a[:,2] == x))[0].shape[0])
Output:
[2. 2.]
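For the specific cases where f is a count or a sum, a fully vectorized version is possible without a Python loop. The following is a minimal sketch under that assumption; it does not help for an arbitrary f, where a loop, np.split or groupby is still needed. The names order, starts and group_sums are illustrative.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 2],
              [4, 5, 1],
              [4, 5, 2]])

# Counting rows per z value: np.unique can return the counts directly.
b, counts = np.unique(a[:, 2], return_counts=True)

# Summing every entry per group: sort the rows by z, then reduce
# over the contiguous runs that share the same z value.
order = np.argsort(a[:, 2], kind="stable")
a_sorted = a[order]
starts = np.unique(a_sorted[:, 2], return_index=True)[1]
group_sums = np.add.reduceat(a_sorted.sum(axis=1), starts)

print(b, counts, group_sums)
```

np.add.reduceat works here because sorting makes each group a contiguous block, and starts holds the index where each block begins.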

numpy, replace column with array

I am trying to replace one or several columns with a new array with the same length.
a = np.array([[1,2,3],[1,2,3],[1,2,3]])
b = np.array([[0],[0],[0]])
a[:, 0] = b
I got the error ValueError: could not broadcast input array from shape (3,1) into shape (3). However, this works when b has multiple columns.
a = np.array([[1,2,3],[1,2,3],[1,2,3]])
b = np.array([[0,7],[0,7],[0,7]])
a[:, 0:2] = b
array([[0, 7, 3],
[0, 7, 3],
[0, 7, 3]])
How can I efficiently replace a column with another array?
Thanks
J
Your example will work fine if you use a slice, just as you do in a[:, 0:2] = b. a[:, 0:1] selects just the first column but keeps it two-dimensional, with shape (3, 1), so it matches the shape of b:
a = np.array([[1,2,3],[1,2,3],[1,2,3]])
b = np.array([[0],[0],[0]])
a[:, 0:1] = b
# array([[0, 2, 3],
# [0, 2, 3],
# [0, 2, 3]])
The shape of b is incorrect. You should pass an ordinary 1D array if you want to replace only one column:
a = np.array([[1,2,3],[1,2,3],[1,2,3]])
b = np.array([0,0,0])
a[:, 0] = b
a
Returns:
array([[0, 2, 3],
[0, 2, 3],
[0, 2, 3]])
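The two answers come down to making the shape of the target and the shape of b agree. A short sketch of both directions, assuming b arrives as a (3, 1) column as in the error message; a1 and a2 are copies I introduce only so both options can be shown side by side.

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
b = np.array([[0], [0], [0]])        # column vector, shape (3, 1)

print(a[:, 0].shape)     # integer index drops the axis
print(a[:, 0:1].shape)   # slice keeps the axis

# Option 1: flatten b so it matches the 1-D target a[:, 0].
a1 = a.copy()
a1[:, 0] = b.ravel()

# Option 2: keep b as a column and assign into the 2-D slice.
a2 = a.copy()
a2[:, 0:1] = b

print(a1)
print(a2)
```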

Reverse order of some elements in Tensorflow

Say I have a tensor DATA of shape (M, N, 2).
I also have another tensor IND of shape (N) consisting of zeros and ones.
If IND(i)==1 then DATA(:,i,0) and DATA(:,i,1) have to swap. If IND(i)==0 they won't swap.
How can I do this? I know that this can be done via tf.gather_nd, but I have no idea how.
Here is one possible solution with tf.equal, tf.where, tf.scatter_nd_update, tf.gather_nd and tf.reverse_v2:
data = tf.Variable([[[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6]]]) # shape=(1,5,2)
# reverse elements where ind is 1
ind = tf.constant([1, 0, 1, 0, 1]) # shape(5,)
cond = tf.where(tf.equal([ind], 1))
match_data = tf.gather_nd(data, cond)
rev_match_data = tf.reverse_v2(match_data, axis=[-1])
data = tf.scatter_nd_update(data, cond, rev_match_data)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(data))
#[[[2 1]
# [2 3]
# [4 3]
# [4 5]
# [6 5]]]
One way which does not use tf.gather_nd is as follows. The idea is to build DATA1, which is DATA with all possible swaps (i.e. the result of swapping if IND had been a vector of 1s), and use masks to choose the correct values from either DATA or DATA1 depending on whether a swap is needed or not. The cast to tf.float64 below assumes DATA has that dtype.
DATA1 = tf.concat([tf.reshape(DATA[:,:,1], [M, N, 1]), tf.reshape(DATA[:,:,0], [M, N, 1])], axis = 2)
Mask1 = tf.cast(tf.reshape(IND, [1, N, 1]), tf.float64)
Mask0 = 1 - Mask1
Res = tf.multiply(Mask0, DATA) + tf.multiply(Mask1, DATA1)
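The masking idea can be checked on small data. The sketch below uses NumPy rather than TensorFlow (my substitution, to keep the snippet to one dependency), and it selects with np.where instead of multiplying by masks, which avoids the float cast. The shapes M and N and the test values are made up for illustration.

```python
import numpy as np

M, N = 2, 5
DATA = np.arange(M * N * 2).reshape(M, N, 2)
IND = np.array([1, 0, 1, 0, 1])

# DATA with the last axis reversed, i.e. every pair swapped.
DATA1 = DATA[:, :, ::-1]

# Broadcast IND over the M axis and the pair axis, then pick per column i.
mask = IND.reshape(1, N, 1).astype(bool)
RES = np.where(mask, DATA1, DATA)

print(RES)
```

For every i with IND[i] == 1 the two entries along the last axis are exchanged, and all other columns are left untouched.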
