Calculate pairwise distance of multiple trajectories using numpy - python

Given an arbitrary number of 3D trajectories with N points (timesteps) each, I would like to compute, for a given timestep, the distance between the corresponding points of consecutive trajectories.
Let's say we look at timestep 3 and have four trajectories t_0 ... t_3. The point of trajectory 0 at timestep 3 is written t_0(3). I want to calculate the distances as follows:
d_0 = norm(t_0(3) - t_1(3))
d_1 = norm(t_1(3) - t_2(3))
d_2 = norm(t_2(3) - t_3(3))
d_3 = norm(t_3(3) - t_0(3))
As you can see, the pattern is circular (the last distance is computed back to the first trajectory), but that is not strictly necessary.
I know how to write some for-loops to calculate what I want. What I am looking for is a concept, or maybe an implementation in numpy (or a combination of np functions), that performs this logic just by using the right axis and other numpy magic.
Here are some example trajectories:
import numpy as np
TIMESTEP_COUNT = 70
origin = np.array([0, 0, 0])
run1_direction = np.array([1, 0, 0]) / np.linalg.norm([1, 0 ,0])
run2_direction = np.array([0, 1, 0]) / np.linalg.norm([0, 1, 0])
run3_direction = np.array([0, 0, 1]) / np.linalg.norm([0, 0, 1])
run4_direction = np.array([1, 1, 0]) / np.linalg.norm([1, 1, 0])
run1_trajectory = [origin]
run2_trajectory = [origin]
run3_trajectory = [origin]
run4_trajectory = [origin]
for t in range(TIMESTEP_COUNT - 1):
    run1_trajectory.append(run1_trajectory[-1] + run1_direction)
    run2_trajectory.append(run2_trajectory[-1] + run2_direction)
    run3_trajectory.append(run3_trajectory[-1] + run3_direction)
    run4_trajectory.append(run4_trajectory[-1] + run4_direction)
run1_trajectory = np.array(run1_trajectory)
run2_trajectory = np.array(run2_trajectory)
run3_trajectory = np.array(run3_trajectory)
run4_trajectory = np.array(run4_trajectory)
This results in the following image:
[Figure: plot of the four example trajectories]
Thank you in advance!!
EDIT:
My question is different from the suggested answer below because I don't want to calculate a full distance matrix. My algorithm should only compute the distances between consecutive runs.

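I think you can stack them vertically to get an array of shape 4 x n_timesteps, and then use np.roll along axis 0 to take the difference between consecutive trajectories at each timestep (for 3D points, use np.stack to get shape 4 x n_timesteps x 3 and then take the norm over the last axis), namely: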
r = np.vstack([t0,t1,t2,t3])
r - np.roll(r,shift=-1,axis=0)
Numeric example:
t0,t1,t2,t3 = np.random.randint(1,10, 5), np.random.randint(1,10, 5), np.random.randint(1,10, 5), np.random.randint(1,10, 5)
r = np.vstack([t0,t1,t2,t3])
r
array([[1, 7, 7, 6, 2],
[9, 1, 2, 3, 6],
[1, 1, 6, 8, 1],
[2, 9, 5, 9, 3]])
r - np.roll(r,shift=-1,axis=0)
array([[-8, 6, 5, 3, -4],
[ 8, 0, -4, -5, 5],
[-1, -8, 1, -1, -2],
[ 1, 2, -2, 3, 1]])
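The numeric example above uses 1D rows. For the question's 3D trajectories, the same roll works on an array of shape (4, n_timesteps, 3), followed by np.linalg.norm over the last axis. A minimal sketch, assuming the four runs from the question are rebuilt with broadcasting (origin + t * direction) instead of the append loop; this matches the loop's result up to floating-point rounding:

```python
import numpy as np

TIMESTEP_COUNT = 70
# Unit direction vectors of the four example runs from the question
directions = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# trajectories[k, t] is the 3D point of run k at timestep t, shape (4, 70, 3)
steps = np.arange(TIMESTEP_COUNT)[None, :, None]
trajectories = steps * directions[:, None, :]

# Circular variant: dists[k, t] = norm(t_k(t) - t_{k+1}(t)), with the last run paired to run 0
diffs = trajectories - np.roll(trajectories, shift=-1, axis=0)
dists = np.linalg.norm(diffs, axis=-1)  # shape (4, 70)

# Non-circular variant: only neighbouring runs, no wrap-around
open_dists = np.linalg.norm(trajectories[:-1] - trajectories[1:], axis=-1)  # shape (3, 70)

# The four distances d_0 ... d_3 at timestep 3, as in the question
d_at_3 = dists[:, 3]
```

Every timestep is handled at once, so there is no loop over time or over trajectories; only the axis passed to np.roll and np.linalg.norm matters.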

Related

Pytorch batch matrix vector outer product

I am trying to generate a vector-matrix outer product (tensor) using PyTorch. Assuming the vector v has size p and the matrix M has size qXr, the result of the product should be pXqXr.
Example:
#size: 2
v = [0, 1]
#size: 2X3
M = [[0, 1, 2],
[3, 4, 5]]
#size: 2X2X3 (outer product of v and M)
outer = [[[0, 0, 0],
          [0, 0, 0]],
         [[0, 1, 2],
          [3, 4, 5]]]
For two vectors v1 and v2, I can use torch.bmm(v1.view(1, -1, 1), v2.view(1, 1, -1)). This can easily be extended to a batch of vectors. However, I cannot find a solution for the vector-matrix case. I also need to do this operation for batches of vectors and matrices.
You can use torch.einsum operator:
torch.einsum('bp,bqr->bpqr', v, M) # batch-wise operation v.shape=(b,p) M.shape=(b,q,r)
torch.einsum('p,qr->pqr', v, M) # cross-batch operation
I was able to do it with the following code.
Single vector and matrix
v = torch.arange(3)
M = torch.arange(8).view(2, 4)
# v: tensor([0, 1, 2])
# M: tensor([[0, 1, 2, 3],
# [4, 5, 6, 7]])
torch.mm(v.unsqueeze(1), M.view(1, 2*4)).view(3,2,4)
tensor([[[ 0, 0, 0, 0],
[ 0, 0, 0, 0]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 0, 2, 4, 6],
[ 8, 10, 12, 14]]])
For a batch of vectors and matrices, it can be easily extended using torch.bmm.
batch_size = 4  # example value
v = torch.arange(batch_size*2).view(batch_size, 2)
M = torch.arange(batch_size*3*4).view(batch_size, 3, 4)
torch.bmm(v.unsqueeze(2), M.view(-1, 1, 3*4)).view(-1, 2, 3, 4)
If [batch_size, z, x, y] is the shape of the target tensor, another solution is to build two tensors of this shape with the appropriate elements in each position and then multiply them elementwise. It works fine with batches of vectors:
import torch
# input matrices
batch_size = 2
x1 = torch.Tensor([0,1])
x2 = torch.Tensor([[0,1,2],
[3,4,5]])
x1 = x1.unsqueeze(0).repeat((batch_size, 1))
x2 = x2.unsqueeze(0).repeat((batch_size, 1, 1))
# dimensions
b = x1.shape[0]
z = x1.shape[1]
x = x2.shape[1]
y = x2.shape[2]
# solution
mat1 = x1.reshape(b, z, 1, 1).repeat(1, 1, x, y)
mat2 = x2.reshape(b,1,x,y).repeat(1, z, 1, 1)
mat1*mat2
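The einsum, bmm and repeat-based answers all compute out[n, i, j, k] = v[n, i] * M[n, j, k]. A minimal sketch that checks they agree, assuming small float tensors built with torch.arange (the sizes b, p, q, r are illustrative). It also shows that indexing with None lets broadcasting do the work of the repeat calls, so the two full-size operand tensors are never built:

```python
import torch

b, p, q, r = 2, 3, 2, 4
v = torch.arange(b * p, dtype=torch.float32).view(b, p)          # (b, p)
M = torch.arange(b * q * r, dtype=torch.float32).view(b, q, r)   # (b, q, r)

# 1) einsum: out[n, i, j, k] = v[n, i] * M[n, j, k]
out_einsum = torch.einsum('bp,bqr->bpqr', v, M)

# 2) bmm on the flattened matrices, then restore the (q, r) axes
out_bmm = torch.bmm(v.unsqueeze(2), M.view(b, 1, q * r)).view(b, p, q, r)

# 3) repeat both operands to the full target shape, then multiply elementwise
out_repeat = v.reshape(b, p, 1, 1).repeat(1, 1, q, r) * M.reshape(b, 1, q, r).repeat(1, p, 1, 1)

# 4) broadcasting: (b, p, 1, 1) * (b, 1, q, r) -> (b, p, q, r)
out_broadcast = v[:, :, None, None] * M[:, None, :, :]
```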

numpy indexing: add vector to parts of rows, starting at varying position

I have this 2d array of zeros z and this 1d array of starting points starts. In addition, I have a 1d array of offsets:
import numpy as np
z = np.zeros(35, dtype='i').reshape(5, 7)
starts = np.array([1, 5, 3, 0, 3])
offsets = np.arange(5) + 1
I would like to vectorize this little for loop here, but I seem to be unable to do it.
for i in range(z.shape[0]):
    z[i, starts[i]:] += offsets[i]
The result in this example should look like this:
z
array([[0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 2, 2],
[0, 0, 0, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4],
[0, 0, 0, 5, 5, 5, 5]])
We could use some masking and NumPy broadcasting -
mask = starts[:,None] <= np.arange(z.shape[1])
z[mask] = np.repeat(offsets, mask.sum(1))
We could play a trick of broadcasted multiplication to get the final output -
z = offsets[:,None] * mask
Another way would be to assign the offsets to every column of z and then zero out the entries outside the mask, like so -
z[:] = offsets[:,None]
z[~mask] = 0
Yet another way would be to start from a replicated version of offsets as z and then mask out the rest -
z = np.repeat(offsets,z.shape[1]).reshape(z.shape[0],-1)
z[~mask] = 0
Of course, we would need the shape parameters beforehand.
If z is not initialized as an array of zeros, only the first solution is applicable, and it would need to be updated with += instead, like so -
z[mask] += np.repeat(offsets, mask.sum(1))
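A minimal sketch that checks the masked update and the broadcasted multiplication against the original loop on the question's data, and also shows the += form on a z that does not start as zeros (the names expected, z_vec, z_mul and z_ones are illustrative):

```python
import numpy as np

z = np.zeros((5, 7), dtype=int)
starts = np.array([1, 5, 3, 0, 3])
offsets = np.arange(5) + 1

# Reference result from the original loop
expected = z.copy()
for i in range(expected.shape[0]):
    expected[i, starts[i]:] += offsets[i]

# Boolean mask: True from each row's start column onwards, shape (5, 7)
mask = starts[:, None] <= np.arange(z.shape[1])

# Masked update; mask.sum(1) counts how many entries of each row are set
z_vec = z.copy()
z_vec[mask] += np.repeat(offsets, mask.sum(1))

# Broadcasted multiplication, valid only because z starts as zeros
z_mul = offsets[:, None] * mask

# The += form still works when z is not all zeros
z_ones = np.ones((5, 7), dtype=int)
z_ones[mask] += np.repeat(offsets, mask.sum(1))
```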

Pythonic way to sparsely randomly populate array?

Problem:
Populate a 10 x 10 array of zeros randomly with 10 1's, 20 2's, 30 3's.
I don't actually have to use an array, rather I just need coordinates for the positions where the values would be. It's just easier to think of in terms of an array.
I have written several solutions for this, but they all seem convoluted and non-pythonic. I am hoping someone can give me some insight. My method has been to take a linear array of 0 to 99, choose 10 values at random (np.random.choice), remove them from the array, then choose 20 random values, and so on. After that, I convert the linear positions into (y,x) coordinates.
import numpy as np
dim = 10
grid = np.arange(dim**2)
n1 = 10
n2 = 20
n3 = 30
def populate(grid, n, dim):
    pos = np.random.choice(grid, size=n, replace=False)
    yx = np.zeros((n,2))
    for i in range(n):
        delPos = np.where(grid==pos[i])
        grid = np.delete(grid, delPos)
        yx[i,:] = [np.floor(pos[i]/dim), pos[i]%dim]
    return(yx, grid)
pos1, grid = populate(grid, n1, dim)
pos2, grid = populate(grid, n2, dim)
pos3, grid = populate(grid, n3, dim)
Extra
Suppose that when I populate the 1's, I want them all in one half of the "array". I can do it using my method (sampling from grid[dim**2//2:]), but I haven't figured out how to do the same with the other suggestions.
You can create a list of all coordinates, shuffle that list and take the first 60 of those (10 + 20 + 30):
>>> import random
>>> coordinates = [(i, j) for i in range(10) for j in range(10)]
>>> random.shuffle(coordinates)
>>> coordinates[:60]
[(9, 5), (6, 9), (1, 5), ..., (0, 2), (5, 9), (2, 6)]
You can then use the first 10 to insert the 10 values, the next 20 for the 20 values and the remaining for the 30 values.
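The same shuffle-and-slice idea can be sketched with NumPy alone, and it gives exact counts (unlike sampling with probabilities). A minimal sketch, assuming NumPy 1.17+ for numpy.random.default_rng, seeded here only for reproducibility; the names counts and coords are illustrative:

```python
import numpy as np

dim = 10
counts = {1: 10, 2: 20, 3: 30}
rng = np.random.default_rng(0)

# Shuffle all flat positions 0..99 once, then hand out consecutive slices
flat = rng.permutation(dim * dim)
grid = np.zeros((dim, dim), dtype=int)
coords = {}
start = 0
for value, n in counts.items():
    chosen = flat[start:start + n]
    start += n
    # Convert flat indices to (row, col) coordinates
    rows, cols = np.unravel_index(chosen, (dim, dim))
    coords[value] = np.column_stack((rows, cols))
    grid[rows, cols] = value
```

Because each slice comes from one permutation, no position can be chosen twice and nothing has to be deleted between draws.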
To generate the array, you can use numpy.random.choice.
np.random.choice([0, 1, 2, 3], size=(10,10), p=[.4, .1, .2, .3])
Then you can convert to coordinates. Note that numpy.random.choice generates a random sample using probabilities p, and thus you are not guaranteed to get the exact proportions in p.
Extra
If you want to have all the 1s on a particular side of the array, you can generate two random arrays and then hstack them. The trick is to slightly modify the probabilities of each number on each side.
In [1]: import numpy as np
In [2]: rem = .1/3 # amount to de- / increase the probability for non-1s
In [3]: A = np.random.choice([0, 1, 2, 3], size=(5, 10),
p=[.4-rem, .2, .2-rem, .3-rem])
In [4]: B = np.random.choice([0, 2, 3], size=(5, 10), p=[.4+rem, .2+rem, .3+rem])
In [5]: M = np.hstack( (A, B) )
In [6]: M
Out[6]:
array([[1, 1, 3, 0, 3, 0, 0, 1, 1, 0, 2, 2, 0, 2, 0, 2, 3, 3, 2, 0],
[0, 3, 3, 3, 3, 0, 1, 3, 1, 3, 0, 2, 3, 0, 0, 0, 3, 3, 2, 3],
[1, 0, 0, 0, 1, 0, 3, 1, 2, 2, 0, 3, 0, 3, 3, 0, 0, 3, 0, 0],
[3, 2, 3, 0, 3, 0, 1, 2, 3, 2, 0, 0, 0, 0, 3, 2, 0, 0, 0, 3],
[3, 3, 0, 3, 3, 3, 1, 3, 0, 3, 0, 2, 0, 2, 0, 0, 0, 3, 3, 3]])
Here, because I'm putting all the 1s on the left, I double the probability of 1 and decrease the probability of each number equally. The same logic applies when creating the other side.
Not sure if this is any more "pythonic", but here's something I came up with using part of Simeon's answer.
import random
dim = 10
n1 = 10
n2 = 20
n3 = 30
coords = [[i,j] for i in range(dim) for j in range(dim)]
def setCoords(coords, n):
    pos = []
    for i in range(n):
        random.shuffle(coords)
        pos.append(coords.pop())
    return(coords, pos)
coordsTmp, pos1 = setCoords(coords[dim**2//2:], n1)
coords = coords[:dim**2//2] + coordsTmp
coords, pos2 = setCoords(coords, n2)
coords, pos3 = setCoords(coords, n3)
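For the "Extra" requirement, the same restriction can also be sketched in NumPy (this is an alternative, not the code above): draw the 1s only from the second half of the flat positions, as grid[dim**2//2:] does in the question, then shuffle the remaining free positions for the 2s and 3s. It assumes NumPy 1.17+ for default_rng and takes "half" to mean rows 5 to 9 of the 10 x 10 grid:

```python
import numpy as np

dim = 10
rng = np.random.default_rng(1)
all_pos = np.arange(dim * dim)

# The 10 ones come only from the second half of the flat indices (rows 5..9)
ones = rng.choice(all_pos[dim * dim // 2:], size=10, replace=False)

# The 2s and 3s come from whatever positions are still free, anywhere in the grid
free = np.setdiff1d(all_pos, ones)
rest = rng.permutation(free)
twos, threes = rest[:20], rest[20:50]

grid = np.zeros(dim * dim, dtype=int)
grid[ones] = 1
grid[twos] = 2
grid[threes] = 3
grid = grid.reshape(dim, dim)
```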

Quantile/Median/2D binning in Python

Do you know a quick/elegant Python/SciPy/NumPy solution for the following problem?
You have a set of x, y coordinates with associated values w (all 1D arrays). Now bin x and y onto a 2D grid (size BINSxBINS) and calculate quantiles (like the median) of the w values for each bin, which should at the end result in a BINSxBINS 2D array with the required quantiles.
This is easy to do with some nested loops, but I am sure there is a more elegant solution.
Thanks,
Mark
This is what I came up with, I hope it's useful. It's not necessarily cleaner or better than using a loop, but maybe it'll get you started toward something better.
import numpy as np
bins_x, bins_y = 1., 1.
x = np.array([1,1,2,2,3,3,3])
y = np.array([1,1,2,2,3,3,3])
w = np.array([1,2,3,4,5,6,7], 'float')
# You can get a bin number for each point like this
x = (x // bins_x).astype('int')
y = (y // bins_y).astype('int')
shape = [x.max()+1, y.max()+1]
bin = np.ravel_multi_index([x, y], shape)
# You could get the mean by doing something like:
mean = np.bincount(bin, w) / np.bincount(bin)
# Median is a bit harder
order = bin.argsort()
bin = bin[order]
w = w[order]
edges = (bin[1:] != bin[:-1]).nonzero()[0] + 1
med_index = (np.r_[0, edges] + np.r_[edges, len(w)]) // 2
median = w[med_index]
# But that's not quite right, so maybe
median2 = [np.median(i) for i in np.split(w, edges)]
Also take a look at numpy.histogram2d
I'm just trying to do this myself, and it sounds like you want scipy.stats.binned_statistic_2d, with which you can compute the mean, median, standard deviation or any user-defined function of the third parameter (the values) within the given bins.
I realise this question has already been answered, but I believe this is a good built-in solution.
Thanks a lot for your code. Based on it, I found the following solution to my problem (only a minor modification of your code):
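A minimal sketch of the binned_statistic_2d approach on a tiny hand-made data set (the data are illustrative): the built-in 'median' statistic covers the median, and other quantiles can be passed as a callable that is applied to the w values of each bin:

```python
import numpy as np
from scipy.stats import binned_statistic_2d

x = np.array([0.5, 0.5, 0.5, 1.5, 1.5])
y = np.array([0.5, 0.5, 0.5, 1.5, 1.5])
w = np.array([1.0, 2.0, 10.0, 3.0, 5.0])

# 2 x 2 grid over [0, 2] x [0, 2]; statistic[i, j] summarizes the w values
# whose x falls in x-bin i and whose y falls in y-bin j. Empty bins become NaN.
med = binned_statistic_2d(x, y, w, statistic='median',
                          bins=2, range=[[0, 2], [0, 2]]).statistic

# Any other quantile: a callable applied to each bin's values
q75 = binned_statistic_2d(x, y, w, statistic=lambda a: np.quantile(a, 0.75),
                          bins=2, range=[[0, 2], [0, 2]]).statistic
```

With bins=BINS and the appropriate range, the result is directly the BINSxBINS array of per-bin quantiles asked for in the question.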
import numpy as np
BINS=10
boxsize=10.0
bins_x, bins_y = boxsize/BINS, boxsize/BINS
x = np.array([0,0,0,1,1,1,2,2,2,3,3,3])
y = np.array([0,0,0,1,1,1,2,2,2,3,3,3])
w = np.array([0,1,2,0,1,2,0,1,2,0,1,2], 'float')
# You can get a bin number for each point like this
x = (x // bins_x).astype('int')
y = (y // bins_y).astype('int')
shape = [BINS, BINS]
bin = np.ravel_multi_index([x, y], shape)
# Median
order = bin.argsort()
bin = bin[order]
w = w[order]
edges = (bin[1:] != bin[:-1]).nonzero()[0] + 1
median = [np.median(i) for i in np.split(w, edges)]
#construct BINSxBINS matrix with median values
binvals=np.unique(bin)
medvals=np.zeros([BINS*BINS])
medvals[binvals]=median
medvals=medvals.reshape([BINS,BINS])
print(medvals)
With numpy/scipy you can build the weighted 2D histogram and then classify each bin by the quantiles of the binned sums:
import numpy as np
import scipy.stats as stats
x = np.random.uniform(0,200,100)
y = np.random.uniform(0,200,100)
w = np.random.uniform(1,10,100)
h = np.histogram2d(x,y,bins=[10,10], weights=w,range=[[0,200],[0,200]])
hist, bins_x, bins_y = h
q = stats.mstats.mquantiles(hist,prob=[0.25, 0.5, 0.75])
>>> q.round(2)
array([ 512.8 , 555.41, 592.73])
q1 = np.where(hist<q[0],1,0)
q2 = np.where(np.logical_and(q[0]<=hist,hist<q[1]),2,0)
q3 = np.where(np.logical_and(q[1]<=hist,hist<=q[2]),3,0)
q4 = np.where(q[2]<hist,4,0)
>>> q1 + q2 + q3 + q4
array([[4, 3, 4, 3, 1, 1, 4, 3, 1, 2],
[1, 1, 4, 4, 2, 3, 1, 3, 3, 3],
[2, 3, 3, 2, 2, 2, 3, 2, 4, 2],
[2, 2, 3, 3, 3, 1, 2, 2, 1, 4],
[1, 3, 1, 4, 2, 1, 3, 1, 1, 3],
[4, 2, 2, 1, 2, 1, 3, 2, 1, 1],
[4, 1, 1, 3, 1, 3, 4, 3, 2, 1],
[4, 3, 1, 4, 4, 4, 1, 1, 2, 4],
[2, 4, 4, 4, 3, 4, 2, 2, 2, 4],
[2, 2, 4, 4, 3, 3, 1, 3, 4, 4]])
prob = [0.25, 0.5, 0.75] is the default value of mquantiles, so you can change it or leave it out.

python numpy roll with padding

I'd like to roll a 2D numpy array in Python, except that I'd like to pad the ends with zeros rather than roll the data as if it's periodic.
Specifically, the following code
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]])
np.roll(x, 1, axis=1)
returns
array([[3, 1, 2],[6, 4, 5]])
but what I would prefer is
array([[0, 1, 2], [0, 4, 5]])
I could do this with a few awkward touchups, but I'm hoping that there's a way to do it with fast built-in commands.
Thanks
NumPy 1.7.0 added a new function, numpy.pad, that can do this in one line. Pad seems to be quite powerful and can do much more than a simple "roll". The tuple ((0,0),(1,0)) used in this answer gives the (before, after) padding widths for each axis: no padding on axis 0 and one zero in front of axis 1.
import numpy as np
x = np.array([[1, 2, 3],[4, 5, 6]])
print(np.pad(x,((0,0),(1,0)), mode='constant')[:, :-1])
Giving
[[0 1 2]
[0 4 5]]
I don't think that you are going to find an easier way to do this that is built-in. The touch-up seems quite simple to me:
y = np.roll(x,1,axis=1)
y[:,0] = 0
If you want this to be more direct then maybe you could copy the roll function to a new function and change it to do what you want. The roll() function is in the site-packages\core\numeric.py file.
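The touch-up above clears only column 0, which is correct for a shift of 1. A minimal sketch of the same roll-then-clear idea for an arbitrary shift k along axis 1; the function name roll_fill and its fill parameter are hypothetical:

```python
import numpy as np

def roll_fill(x, k, fill=0):
    """Shift the columns of a 2D array by k (either sign), filling vacated columns."""
    y = np.roll(x, k, axis=1)  # np.roll returns a new array, x is untouched
    if k > 0:
        y[:, :k] = fill        # columns that wrapped around from the right end
    elif k < 0:
        y[:, k:] = fill        # columns that wrapped around from the left end
    return y

x = np.array([[1, 2, 3], [4, 5, 6]])
shifted = roll_fill(x, 1)
```

If |k| is at least the number of columns, the slice covers every column, so the result is entirely fill values.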
I just wrote the following. It could be more optimized by avoiding zeros_like and just computing the shape for zeros directly.
import numpy as np
def roll_zeropad(a, shift, axis=None):
    """
    Roll array elements along a given axis.
    Elements off the end of the array are treated as zeros.
    Parameters
    ----------
    a : array_like
        Input array.
    shift : int
        The number of places by which elements are shifted.
    axis : int, optional
        The axis along which elements are shifted. By default, the array
        is flattened before shifting, after which the original
        shape is restored.
    Returns
    -------
    res : ndarray
        Output array, with the same shape as `a`.
    See Also
    --------
    roll : Elements that roll off one end come back on the other.
    rollaxis : Roll the specified axis backwards, until it lies in a
               given position.
    Examples
    --------
    >>> x = np.arange(10)
    >>> roll_zeropad(x, 2)
    array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7])
    >>> roll_zeropad(x, -2)
    array([2, 3, 4, 5, 6, 7, 8, 9, 0, 0])
    >>> x2 = np.reshape(x, (2,5))
    >>> x2
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    >>> roll_zeropad(x2, 1)
    array([[0, 0, 1, 2, 3],
           [4, 5, 6, 7, 8]])
    >>> roll_zeropad(x2, -2)
    array([[2, 3, 4, 5, 6],
           [7, 8, 9, 0, 0]])
    >>> roll_zeropad(x2, 1, axis=0)
    array([[0, 0, 0, 0, 0],
           [0, 1, 2, 3, 4]])
    >>> roll_zeropad(x2, -1, axis=0)
    array([[5, 6, 7, 8, 9],
           [0, 0, 0, 0, 0]])
    >>> roll_zeropad(x2, 1, axis=1)
    array([[0, 0, 1, 2, 3],
           [0, 5, 6, 7, 8]])
    >>> roll_zeropad(x2, -2, axis=1)
    array([[2, 3, 4, 0, 0],
           [7, 8, 9, 0, 0]])
    >>> roll_zeropad(x2, 50)
    array([[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]])
    >>> roll_zeropad(x2, -50)
    array([[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]])
    >>> roll_zeropad(x2, 0)
    array([[0, 1, 2, 3, 4],
           [5, 6, 7, 8, 9]])
    """
    a = np.asanyarray(a)
    if shift == 0:
        return a
    if axis is None:
        n = a.size
        reshape = True
    else:
        n = a.shape[axis]
        reshape = False
    if np.abs(shift) > n:
        res = np.zeros_like(a)
    elif shift < 0:
        shift += n
        zeros = np.zeros_like(a.take(np.arange(n-shift), axis))
        res = np.concatenate((a.take(np.arange(n-shift,n), axis), zeros), axis)
    else:
        zeros = np.zeros_like(a.take(np.arange(n-shift,n), axis))
        res = np.concatenate((zeros, a.take(np.arange(n-shift), axis)), axis)
    if reshape:
        return res.reshape(a.shape)
    else:
        return res
import numpy as np
def shift_2d_replace(data, dx, dy, constant=False):
    """
    Shifts the array in two dimensions while setting rolled values to constant
    :param data: The 2d numpy array to be shifted
    :param dx: The shift in x
    :param dy: The shift in y
    :param constant: The constant to replace rolled values with
    :return: The shifted array with "constant" where roll occurs
    """
    shifted_data = np.roll(data, dx, axis=1)
    if dx < 0:
        shifted_data[:, dx:] = constant
    elif dx > 0:
        shifted_data[:, 0:dx] = constant
    shifted_data = np.roll(shifted_data, dy, axis=0)
    if dy < 0:
        shifted_data[dy:, :] = constant
    elif dy > 0:
        shifted_data[0:dy, :] = constant
    return shifted_data
This function would work on 2D arrays and replace rolled values with a constant of your choosing.
A bit late, but this seems like a quick way to do what you want in one line. It would probably work best wrapped inside a function (the example below only handles the horizontal axis):
import numpy
a = numpy.arange(1,10).reshape(3,3) # an example 2D array
print(a)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
shift = 1
# dtype=a.dtype keeps the result integer; numpy.zeros defaults to float64
a = numpy.hstack((numpy.zeros((a.shape[0], shift), dtype=a.dtype), a[:,:-shift]))
print(a)
[[0 1 2]
 [0 4 5]
 [0 7 8]]
You can also use ndimage.shift:
>>> from scipy import ndimage
>>> arr = np.array([[1, 2, 3], [4, 5, 6]])
>>> ndimage.shift(arr, (0,1))
array([[0, 1, 2],
[0, 4, 5]])
Elaborating on the answer by Hooked (since it took me a few minutes to understand it)
The code below first pads a certain number of zeros into the top, bottom, left and right margins and then selects the original matrix inside the padded one. The code itself is useless, but it is good for understanding np.pad.
import numpy as np
x = np.array([[1, 2, 3],[4, 5, 6]])
y = np.pad(x,((1,3),(2,4)), mode='constant')[1:-3,2:-4]
print(np.all(x==y))
Now, to shift up by 2 and to the right by 1, one should do
print(np.pad(x,((0,2),(1,0)), mode='constant')[2:,:-1])
You could also use numpy's triu and scipy.linalg's circulant. Make a circulant version of your matrix, then select the upper triangular part including the main diagonal (the default k=0 of triu). The row index will correspond to the number of padded zeros you want.
If you don't have scipy you can generate a nXn circulant matrix by making an (n-1) X (n-1) identity matrix and stacking a row [0 0 ... 1] on top of it and the column [1 0 ... 0] to the right of it.
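A minimal sketch of the circulant/triu idea for a single row. The answer does not spell out the orientation; transposing the output of scipy.linalg.circulant (whose first column is the input) is an assumption made here so that row k of the upper triangle is the vector shifted right by k with zero padding. Building an n x n matrix per row makes this an illustration rather than an efficient method:

```python
import numpy as np
from scipy.linalg import circulant

v = np.array([1, 2, 3])
# circulant(v) has v as its first column; its transpose has v as its first row
# and each following row rolled one more step to the right.
C = circulant(v).T        # [[1, 2, 3], [3, 1, 2], [2, 3, 1]]
# Keeping the upper triangle (k=0: main diagonal and above) zeroes exactly the
# entries that wrapped around, so row k is v shifted right by k.
shifted = np.triu(C)      # [[1, 2, 3], [0, 1, 2], [0, 0, 1]]

# Applied to every row of the question's array, row 1 gives a shift of 1
x = np.array([[1, 2, 3], [4, 5, 6]])
result = np.array([np.triu(circulant(row).T)[1] for row in x])
```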
I faced a similar problem with shifting a 2-d array in both directions
import numpy as np
def shift_frame(img,move_dir,fill=np.inf):
    # fill must be representable in img's dtype (np.inf only suits float arrays)
    frame = np.full_like(img,fill)
    x,y = move_dir
    size_x,size_y = np.array(img.shape) - np.abs(move_dir)
    frame_x = slice(0,size_x) if x>=0 else slice(-x,size_x-x)
    frame_y = slice(0,size_y) if y>=0 else slice(-y,size_y-y)
    img_x = slice(x,None) if x>=0 else slice(0,size_x)
    img_y = slice(y,None) if y>=0 else slice(0,size_y)
    frame[frame_x,frame_y] = img[img_x,img_y]
    return frame
test = np.arange(25).reshape((5,5))
shift_frame(test,(1,1),fill=-1)
'''
returns:
array([[ 6, 7, 8, 9, -1],
[11, 12, 13, 14, -1],
[16, 17, 18, 19, -1],
[21, 22, 23, 24, -1],
[-1, -1, -1, -1, -1]])
'''
I haven't measured the runtime of this, but it seems to work well enough for my use, although a built-in one-liner would be nice.
import numpy as np
def roll_zeropad(a, dyx):
    h, w = a.shape[:2]
    dy, dx = dyx
    pad_x, start_x, end_x = ((dx,0), 0, w) if dx > 0 else ((0,-dx), -dx, w-dx)
    pad_y, start_y, end_y = ((dy,0), 0, h) if dy > 0 else ((0,-dy), -dy, h-dy)
    return np.pad(a, (pad_y, pad_x))[start_y:end_y,start_x:end_x]
test = np.arange(25).reshape((5,5))
out = roll_zeropad(test,(1,1))
print(out)
"""
returns:
[[ 0 0 0 0 0]
[ 0 0 1 2 3]
[ 0 5 6 7 8]
[ 0 10 11 12 13]
[ 0 15 16 17 18]]
"""
