I am looking for a way to optimize the following code in PyTorch.
I have a function f defined over space (x, y) and time t.
In a random batch, I need to compute the average of f over all entries that share the same timestamp. I was able to achieve this with the following inefficient for-loop:
import torch
# Space (x,y) and time (t) coordinates in a random batch
x = torch.Tensor([[0, 0, 1, 0],[3, 2, 2, 1],[1,3,5,5]]).T
# compute a dummy function u = f(t,x,y)
f = (x**2 + 0.5)[:,:2]
# timestamps
t = x[:,0]
# get unique timestamps
val = torch.unique(t.squeeze())
for v in val:
    # compute a mask for all rows whose timestamp equals v
    mask = t == v
    # average over the spatial coordinates
    f[mask, :] = torch.mean(f[mask, :], dim=0)
print(f)
Which results in
f = tensor([[0.5000, 5.1667],
[0.5000, 5.1667],
[1.5000, 4.5000],
[0.5000, 5.1667]])
Is there a way to make this computation faster?
I think you are looking for index_add (or its in-place variant index_add_):
avg_size = int(t.max().item()) + 1  # number of rows in the output tensor
s = torch.zeros((avg_size, f.shape[1]), dtype=f.dtype)          # per-timestamp sums
s = torch.index_add(s, 0, t.long(), f)                          # sum the elements of f
c = torch.zeros((avg_size, 1), dtype=f.dtype)                   # per-timestamp counts
c = torch.index_add(c, 0, t.long(), torch.ones_like(f[:, :1]))  # count how many rows at each entry
out = s / c                                                     # divide to get the mean
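To reproduce the per-row layout of the original loop (every row of f replaced by the mean of its timestamp group), a small follow-up, assuming t holds non-negative integer timestamps as above:
f_avg = out[t.long()]  # same shape as f, each row holds its group's mean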
I'm trying to vectorize the following for-loop in PyTorch. I'd be happy with just vectorizing the inner for-loop, but doing the whole batch would also be awesome.
# B: the batch size
# N: the number of training examples
# dim: the dimension of each feature vector
# K: the number of discrete labels. each vector has a single label
# delta: margin for hinge loss
batch_data = torch.tensor(...) # Tensor of shape [B x N x d]
batch_labels = torch.tensor(...) # Tensor of shape [B x N x 1], each element is one of K labels (ints)
batch_losses = [] # Ultimately should be [B x 1]
batch_centroids = [] # Ultimately should be [B x K_i x dim]
for i in range(B):
    centroids = []  # Keep track of the means for each class.
    labels = batch_labels[i].squeeze(-1)  # [N] labels for this batch element
    data = batch_data[i]                  # [N x dim] features for this batch element
    classes = torch.unique(labels)  # Get the unique labels for the classes.
    # NOTE: The number of classes K for each item in the batch might actually
    # be different. This may complicate batch-level operations.
    total_loss = 0
    # For each class independently. This is the part I want to vectorize.
    for cl in classes:
        # Take the subset of training examples with that label.
        subset = data[labels == cl]
        # Find the centroid of that subset.
        centroid = subset.mean(dim=0)
        centroids.append(centroid)
        # Get the distance between each point in the subset and the centroid.
        dists = subset - centroid
        norm = torch.linalg.norm(dists, dim=1)
        # The loss is the mean of the hinge loss across the subset.
        margin = norm - delta
        hinge = torch.clamp(margin, min=0.0) ** 2
        total_loss += hinge.mean()
    # Keep track of everything. If it's too hard to keep track of centroids, that's also OK.
    loss = total_loss.mean()
    batch_losses.append(loss)
    batch_centroids.append(centroids)
I've been scratching my head on how to deal with the irregularly sized tensors. The number of classes in each batch K_i is different, and the size of each subset is different.
It turns out it actually is possible to vectorize across ragged arrays. I'll use numpy, but code should be directly translatable to torch. The key technique is to:
Sort by ragged array membership
Perform an accumulation
Find boundary indices, compute adjacent differences
For a single (non-batch) input of an n x d matrix X and an n-length array label, the following returns the k x d centroids and the n-length squared distances to the respective centroids:
import numpy as np

def inverse_permutation(p):
    # inverse of a permutation: s[p[i]] = i
    s = np.empty_like(p)
    s[p] = np.arange(len(p))
    return s

def vcentroids(X, label):
    """
    Vectorized version of centroids.
    """
    # order points by cluster label
    ix = np.argsort(label)
    label = label[ix]
    Xz = X[ix]

    # compute pos where pos[i]:pos[i+1] is the span of cluster i
    d = np.diff(label, prepend=0)   # nonzero where the (sorted) label changes
    pos = np.flatnonzero(d)         # indices where labels change
    pos = np.repeat(pos, d[pos])    # repeat for 0-length clusters
    pos = np.append(np.insert(pos, 0, 0), len(X))

    Xz = np.concatenate((np.zeros_like(Xz[0:1]), Xz), axis=0)
    Xsums = np.cumsum(Xz, axis=0)
    Xsums = np.diff(Xsums[pos], axis=0)   # per-cluster sums via boundary differences
    counts = np.diff(pos)                 # per-cluster sizes

    c = Xsums / np.maximum(counts, 1)[:, np.newaxis]   # centroids

    repeated_centroids = np.repeat(c, counts, axis=0)
    aligned_centroids = repeated_centroids[inverse_permutation(ix)]
    dist = np.sum((X - aligned_centroids) ** 2, axis=1)

    return c, dist
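A quick sanity check on a toy input (made-up values, not from the original post):
X = np.array([[0., 0.], [2., 2.], [1., 1.], [4., 4.]])
label = np.array([0, 1, 0, 1])
c, dist = vcentroids(X, label)
# c    -> [[0.5, 0.5], [3. , 3. ]]
# dist -> [0.5, 2. , 0.5, 2. ]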
Batching requires little special handling. For an input B x n x d array batch_X, with B x n batch labels batch_labels, create unique labels for each batch:
batch_k = batch_labels.max(axis=1) + 1   # number of labels used by each batch element
batch_k[1:] = batch_k[:-1]               # shift right ...
batch_k[0] = 0                           # ... so the cumsum below is an exclusive prefix sum
base = np.cumsum(batch_k)                # label offset for each batch element
batch_labels = batch_labels + base[:, np.newaxis]
So now each batch element has a unique contiguous range of labels. I.e., the first batch element will have n labels in some range [0, k0), the second will have the range [k0, k0 + k1), etc., where k_i is the number of labels used by the i-th batch element.
Then just flatten the B x n x d input to (B*n) x d and call the same vectorized method. Your loss function is derivable using the final distances and the same position-array based reduction technique.
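A minimal sketch of that flattening step, assuming batch_X and batch_labels have already been offset as above, vcentroids is the function defined earlier, and delta is the margin from the question (per_point_hinge is an ad-hoc name):
flat_X = batch_X.reshape(-1, batch_X.shape[-1])   # (B*n) x d
flat_labels = batch_labels.reshape(-1)            # (B*n,) batch-unique labels
centroids, sq_dists = vcentroids(flat_X, flat_labels)
per_point_hinge = np.maximum(np.sqrt(sq_dists) - delta, 0.0) ** 2
# per-class / per-batch means can then be taken with the same pos-based reduction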
For a detailed explanation of how the vectorization works, see my blog post.
You can vectorize the whole thing if you use a one-hot encoding for your classes and a pairwise distance trick for your norms:
import torch
B = 32
N = 1000
dim = 50
K = 25
batch_data = torch.randn((B, N, dim))
batch_labels = torch.randint(0, K, size=(B, N))
batch_one_hot = torch.nn.functional.one_hot(batch_labels, num_classes=K)
centroids = torch.matmul(
    batch_one_hot.transpose(-1, 1).type(batch_data.dtype),
    batch_data
) / batch_one_hot.sum(1)[..., None]
norms = torch.linalg.norm(batch_data[:, :, None] - centroids[:, None], dim=-1)
# Compute the rest of your loss
# ...
A couple things to watch out for:
You'll get a divide-by-zero for any batches that have a missing class. You can handle this by first computing the class sums (with matmul) and counts (summing the one-hot tensor along axis 1) separately. Then mask out the classes with count == 0 and divide the rest by their class counts (see the sketch after this list).
If you have a large number of classes, this will cause memory problems because the one-hot tensor will be too big. In that case, the answer from #VF1 probably makes more sense.
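A minimal sketch of that zero-count handling, reusing batch_data and batch_one_hot from the snippet above (names such as class_sums are ad hoc):
one_hot_f = batch_one_hot.transpose(-1, 1).type(batch_data.dtype)  # [B, K, N]
class_sums = torch.matmul(one_hot_f, batch_data)                   # [B, K, dim]
class_counts = batch_one_hot.sum(dim=1).type(batch_data.dtype)     # [B, K]
missing = class_counts == 0                                        # classes absent from a batch element
centroids = class_sums / class_counts.clamp(min=1.0)[..., None]    # safe divide; rows flagged in `missing` are all-zero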
I have tried to simulate some event onsets and predictors for an experiment. I have two predictors (circles and squares). The stimuli ('events') take 1 second and the ISI (interstimulus interval) is 8 seconds. I am also interested in both contrasts against baseline (circles against baseline; squares against baseline). In the end, I am trying to run the function I have defined (simulate_data_fixed_ISI, where N=420 is a fixed parameter) 1000 times, calculate an efficiency score at each iteration, and store the efficiency scores in a list.
import numpy as np
# glover_hrf comes from the nilearn/nistats HRF utilities, e.g.
# from nilearn.glm.first_level import glover_hrf   (exact import path may differ by version)

def simulate_data_fixed_ISI(N=420):
    dg_hrf = glover_hrf(tr=1, oversampling=1)

    # Create indices in regularly spaced intervals (9 seconds, i.e. 1 sec stim + 8 sec ISI)
    stim_onsets = np.arange(10, N - 15, 9)
    stimcodes = np.repeat([1, 2], stim_onsets.size // 2)  # create codes for two conditions
    np.random.shuffle(stimcodes)  # random shuffle
    stim = np.zeros((N, 1))
    c = np.array([[0, 1, 0], [0, 0, 1]])  # contrasts against baseline

    # Fill stim array with codes at onsets
    for i, stim_onset in enumerate(stim_onsets):
        stim[stim_onset] = 1 if stimcodes[i] == 1 else 2

    stims_A = (stim == 1).astype(int)
    stims_B = (stim == 2).astype(int)
    reg_A = np.convolve(stims_A.squeeze(), dg_hrf)[:N]
    reg_B = np.convolve(stims_B.squeeze(), dg_hrf)[:N]
    X = np.hstack((np.ones((reg_B.size, 1)), reg_A[:, np.newaxis], reg_B[:, np.newaxis]))

    dvars = [(c[i, :].dot(np.linalg.inv(X.T.dot(X))).dot(c[i, :].T))
             for i in range(c.shape[0])]
    eff = c.shape[0] / np.sum(dvars)
    return eff
However, I want to run this entire chunk 1000 times, store the 'eff' values in an array, etc., so that later on I can display them as a histogram. How can I do this?
If I understand you correctly, you should be able to just run
EFF = [simulate_data_fixed_ISI() for i in range(1000)]  # 1000 repeats
As #theonlygusti clarified, this line runs your function simulate_data_fixed_ISI() 1000 times and puts each return value in the list EFF.
Test
import numpy as np
def simulate_data_fixed_ISI(n=1):
    """
    Returns 'n' random numbers
    """
    return np.random.rand(n)
EFF = [simulate_data_fixed_ISI() for i in range(5)]
EFF
#[array([0.19585137]),
# array([0.91692933]),
# array([0.49294667]),
# array([0.79751017]),
# array([0.58294512])]
Your question seems to boil down to:
I am trying to run the function that I have defined 1000 times; at each iteration I would like to calculate an efficiency score and store the efficiency scores in a list
I guess "the function that I have defined" is the simulate_data_fixed_ISI in your question?
Then you can simply run it 1000 times using a basic for loop, and add the results into a list:
def simulate_data_fixed_ISI(N=420):
    dg_hrf = glover_hrf(tr=1, oversampling=1)
    # Create indices in regularly spaced intervals (9 seconds, i.e. 1 sec stim + 8 sec ISI)
    stim_onsets = np.arange(10, N - 15, 9)
    stimcodes = np.repeat([1, 2], stim_onsets.size // 2)  # create codes for two conditions
    np.random.shuffle(stimcodes)  # random shuffle
    stim = np.zeros((N, 1))
    c = np.array([[0, 1, 0], [0, 0, 1]])
    # Fill stim array with codes at onsets
    for i, stim_onset in enumerate(stim_onsets):
        stim[stim_onset] = 1 if stimcodes[i] == 1 else 2
    stims_A = (stim == 1).astype(int)
    stims_B = (stim == 2).astype(int)
    reg_A = np.convolve(stims_A.squeeze(), dg_hrf)[:N]
    reg_B = np.convolve(stims_B.squeeze(), dg_hrf)[:N]
    X = np.hstack((np.ones((reg_B.size, 1)), reg_A[:, np.newaxis], reg_B[:, np.newaxis]))
    dvars = [(c[i, :].dot(np.linalg.inv(X.T.dot(X))).dot(c[i, :].T))
             for i in range(c.shape[0])]
    eff = c.shape[0] / np.sum(dvars)
    return eff
eff_results = []
for _ in range(1000):
    efficiency_score = simulate_data_fixed_ISI()
    eff_results.append(efficiency_score)
Now eff_results contains 1000 entries, each of which is the result of one call to your function simulate_data_fixed_ISI.
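Since the goal is a histogram of the scores, a minimal follow-up sketch (assuming matplotlib is available):
import numpy as np
import matplotlib.pyplot as plt

eff_array = np.asarray(eff_results)  # 1000 efficiency scores
plt.hist(eff_array, bins=30)
plt.xlabel('efficiency')
plt.ylabel('count')
plt.show()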
I have a large numpy array of unordered lidar point cloud data, of shape [num_points, 3], which are the XYZ coordinates of each point. I want to downsample this into a 2D grid of mean height values - to do this I want to split the data into 5x5 X-Y bins and calculate the mean height value (Z coordinate) in each bin.
Does anyone know any quick/efficient way to do this?
Current code:
import numpy as np
from open3d import read_point_cloud
resolution = 5
# Code to load point cloud and get points as numpy array
pcloud = read_point_cloud(params.POINT_CLOUD_DIR + "Part001.pcd")
pcloud_np = np.asarray(pcloud.points)
# Code to generate example dataset
pcloud_np = np.random.uniform(0.0, 1000.0, size=(1000,3))
# Current (inefficient) code to quantize into 5x5 XY 'bins' and take mean Z values in each bin
pcloud_np[:, 0:2] = np.round(pcloud_np[:, 0:2]/float(resolution))*float(resolution) # Round XY values to nearest 5
num_x = int(np.max(pcloud_np[:, 0])/resolution)
num_y = int(np.max(pcloud_np[:, 1])/resolution)
mean_height = np.zeros((num_x, num_y))
# Loop over each x-y bin and calculate mean z value
x_val = 0
for x in range(num_x):
    y_val = 0
    for y in range(num_y):
        height_vals = pcloud_np[(pcloud_np[:,0] == float(x_val)) & (pcloud_np[:,1] == float(y_val))]
        if height_vals.size != 0:
            mean_height[x, y] = np.mean(height_vals)
        y_val += resolution
    x_val += resolution
Here is a suggestion using an np.bincount idiom on the flattened 2d grid. I also took the liberty to add some small fixes to the original code:
import numpy as np
#from open3d import read_point_cloud
resolution = 5
# Code to load point cloud and get points as numpy array
#pcloud = read_point_cloud(params.POINT_CLOUD_DIR + "Part001.pcd")
#pcloud_np = np.asarray(pcloud.points)
# Code to generate example dataset
pcloud_np = np.random.uniform(0.0, 1000.0, size=(1000,3))
def f_op(pcloud_np, resolution):
    # Current (inefficient) code to quantize into 5x5 XY 'bins' and take mean Z values in each bin
    pcloud_np[:, 0:2] = np.round(pcloud_np[:, 0:2] / float(resolution)) * float(resolution)  # Round XY values to nearest 5
    num_x = int(np.max(pcloud_np[:, 0]) / resolution) + 1
    num_y = int(np.max(pcloud_np[:, 1]) / resolution) + 1
    mean_height = np.zeros((num_x, num_y))
    # Loop over each x-y bin and calculate mean z value
    x_val = 0
    for x in range(num_x):
        y_val = 0
        for y in range(num_y):
            height_vals = pcloud_np[(pcloud_np[:, 0] == float(x_val)) & (pcloud_np[:, 1] == float(y_val)), 2]
            if height_vals.size != 0:
                mean_height[x, y] = np.mean(height_vals)
            y_val += resolution
        x_val += resolution
    return mean_height

def f_pp(pcloud_np, resolution):
    # Vectorized version: map each point to a flat bin index, then use bincount
    xy = pcloud_np.T[:2]
    xy = ((xy + resolution / 2) // resolution).astype(int)
    mn, mx = xy.min(axis=1), xy.max(axis=1)
    sz = mx + 1 - mn
    flatidx = np.ravel_multi_index(xy - mn[:, None], sz)
    histo = np.bincount(flatidx, pcloud_np[:, 2], sz.prod()) / np.maximum(1, np.bincount(flatidx, None, sz.prod()))
    return (histo.reshape(sz), *(xy * resolution))
res_op = f_op(pcloud_np, resolution)
res_pp, x, y = f_pp(pcloud_np, resolution)
from timeit import timeit
t_op = timeit(lambda:f_op(pcloud_np, resolution), number=10)*100
t_pp = timeit(lambda:f_pp(pcloud_np, resolution), number=10)*100
print("results equal:", np.allclose(res_op, res_pp))
print(f"timings (ms) op: {t_op:.3f} pp: {t_pp:.3f}")
Sample output:
results equal: True
timings (ms) op: 359.162 pp: 0.427
Speedup almost 1000x.
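For reference, the core of f_pp is the weighted/unweighted bincount pair: the per-bin mean is the per-bin sum of Z values divided by the per-bin count. A tiny illustration with made-up numbers:
bins = np.array([0, 1, 1, 2, 2, 2])            # flattened bin index of each point
z = np.array([1.0, 2.0, 4.0, 3.0, 6.0, 9.0])   # Z value carried by each point
sums = np.bincount(bins, weights=z)            # [ 1.,  6., 18.]
counts = np.bincount(bins)                     # [1, 2, 3]
means = sums / np.maximum(counts, 1)           # [1., 3., 6.]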
I want to obtain a list (or array, doesn't matter) of A from the following formula:
A_i = X_(k!=i) * S_(k!=i) * X'_(k!=i)
where:
X is a vector (and X' is the transpose of X), S is a matrix, and the subscript k is defined as {k=1,2,3,...n| k!=i}.
X = [x1, x2, ..., xn]
S = [[s11, s12, ..., s1n],
     [s21, s22, ..., s2n],
     ...,
     [sn1, sn2, ..., snn]]
I take the following as an example:
X = [0.1,0.2,0.3,0.5]
S = [[0.4, 0.1, 0.3, 0.5],
     [2, 1.5, 2.4, 0.6],
     [0.4, 0.1, 0.3, 0.5],
     [2, 1.5, 2.4, 0.6]]
So, eventually, I would get a list of four values for A.
I did this:
import numpy as np
x = np.array([0.1,0.2,0.3,0.5])
s = np.matrix([[0.4,0.1,0.3,0.5],[2,1.5,2.4,0.6],[0.4,0.1,0.3,0.5],[2,1.5,2.4,0.6]])
for k in range(x) if k!=i
A = (x.dot(s)).dot(np.transpose(x))
print (A)
I am confused with how to use a conditional 'for' loop. Could you please help me to solve it? Thanks.
EDIT:
Just to explain more. If you take i=1, then the formula will be:
A_1 = X_(k!=1) * S_(k!=1) * X'_(k!=1)
So any value associated with subscript 1 will be deleted from X and S, like:
X = [0.2,0.3,0.5]
S = [[1.5, 2.4, 0.6],
     [0.1, 0.3, 0.5],
     [1.5, 2.4, 0.6]]
Step 1: correctly calculate A_i
Step 2: collect them into A
I assume what you want to calculate is A_i = X_(k!=i) * S_(k!=i) * X'_(k!=i), i.e. the quadratic form with the i-th entry of X and the i-th row and column of S removed.
An easy way to do so is to mask away those entries using masked arrays. This way we don't need to delete or copy any matrices.
import numpy as np

# sample data
x = np.array([1, 2, 3, 4])
s = np.diag([4, 5, 6, 7])
# we will use masked arrays to remove k=i
vec_mask = np.zeros_like(x)
matrix_mask = np.zeros_like(s)
i = 0 # start
# set masks
vec_mask[i] = 1
matrix_mask[i] = matrix_mask[:,i] = 1
s_mask = np.ma.array(s, mask=matrix_mask)
x_mask = np.ma.array(x, mask=vec_mask)
# reduced product, remember using np.ma.inner instead np.inner
Ai = np.ma.inner(np.ma.inner(x_mask, s_mask), x_mask.T)
vec_mask[i] = 0
matrix_mask[i] = matrix_mask[:,i] = 0
Since zero terms don't add to the sum, we can actually skip masking the matrix and just mask the vector:
# we will use masked arrays to remove k=i
mask = np.zeros_like(x)
i = 0 # start
# set masks
mask[i] = 1
x_mask = np.ma.array(x, mask=mask)
# reduced product
Ai = np.ma.inner(np.ma.inner(x_mask, s), x_mask.T)
# unset mask
mask[i] = 0
The final step is to assemble A out of the A_i's, so in total we get:
x = np.array([1,2,3,4])
s = np.diag([4,5,6,7])
mask = np.zeros_like(x)
x_mask = np.ma.array(x, mask=mask)
A = []
for i in range(len(x)):
    x_mask.mask[i] = 1
    Ai = np.ma.inner(np.ma.inner(x_mask, s), x_mask.T)
    A.append(Ai)
    x_mask.mask[i] = 0
A_vec = np.array(A)
Implementing a matrix/vector product using loops will be rather slow in Python. Therefore, I suggest actually deleting the rows/columns/elements at the given index and performing the fast built-in dot product without any explicit loops:
i = 0 # don't forget Python's indices are zero-based
x_ = np.delete(X, i) # remove element
s_ = np.delete(S, i, axis=0) # remove row
s_ = np.delete(s_, i, axis=1) # remove column
result = x_.dot(s_).dot(x_) # no need to transpose a 1-D array
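Putting it together for the example data from the question, a short sketch that collects every A_i into one array:
import numpy as np

X = np.array([0.1, 0.2, 0.3, 0.5])
S = np.array([[0.4, 0.1, 0.3, 0.5],
              [2.0, 1.5, 2.4, 0.6],
              [0.4, 0.1, 0.3, 0.5],
              [2.0, 1.5, 2.4, 0.6]])

A = np.array([
    np.delete(X, i).dot(np.delete(np.delete(S, i, axis=0), i, axis=1)).dot(np.delete(X, i))
    for i in range(len(X))
])  # one value per removed index i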
I have some code. It takes in a value N, does a quantum walk for that many steps, and gives an array that shows the probability at each position. It's quite a complex calculation, and N must be a single integer.
What I want to do is repeat this calculation for 100 values of N and build a large 2D array.
Any idea how I would do this?
Here's my code:
import numpy as np
from numpy import array, outer, sqrt, roll, eye, kron, zeros, empty, arange, linalg

N = 100  # number of random steps
P = 2*N+1 # number of positions
#defining a quantum coin
coin0 = array([1, 0]) # |0>
coin1 = array([0, 1]) # |1>
#defining the coin operator
C00 = outer(coin0, coin0) # |0><0|
C01 = outer(coin0, coin1) # |0><1|
C10 = outer(coin1, coin0) # |1><0|
C11 = outer(coin1, coin1) # |1><1|
C_hat = (C00 + C01 + C10 - C11)/sqrt(2.)
#step operator
ShiftPlus = roll(eye(P), 1, axis=0)
ShiftMinus = roll(eye(P), -1, axis=0)
S_hat = kron(ShiftPlus, C00) + kron(ShiftMinus, C11)
#walk operator
U = S_hat.dot(kron(eye(P), C_hat))
#defining the initial state
posn0 = zeros(P)
posn0[N] = 1 # array indexing starts from 0, so index N is the central posn
psi0 = kron(posn0,(coin0+coin1*1j)/sqrt(2.))
#the state after N steps
psiN = linalg.matrix_power(U, N).dot(psi0)
# finding the probability operator
prob = empty(P)
for k in range(P):
    posn = zeros(P)
    posn[k] = 1
    M_hat_k = kron(outer(posn, posn), eye(2))
    proj = M_hat_k.dot(psiN)
    prob[k] = proj.dot(proj.conjugate()).real
prob[prob==0] = np.nan
nanmask = np.isfinite(prob)
prob_masked=prob[nanmask] #this is the final probability to be plotted
P_masked=arange(P)[nanmask] #these are the possible positions
Rather than writing out the array I get (it is about 100 entries long), I have plotted the position against probability at N = 100.
I eventually want to make a 3D plot of position against N against probability.
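One way to do this (a sketch, not from the original thread) is to wrap the code above in a function of N, say quantum_walk(N), that returns the length-(2*N+1) prob array, and then stack the results into a 2D array, padding shorter walks with NaN:
import numpy as np

N_max = 100
all_probs = np.full((N_max, 2 * N_max + 1), np.nan)  # rows: N, columns: position index
for n in range(1, N_max + 1):
    p = quantum_walk(n)                  # hypothetical wrapper around the code above
    offset = N_max - n                   # centre every walk on the middle column
    all_probs[n - 1, offset:offset + 2 * n + 1] = p
# all_probs is now the 2D (N x position) array of probabilities,
# ready for a 3D/heatmap plot of position against N against probability.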