I would like to figure out a way to apply a function that calculates pairwise distances, call it dists(A, B), row-wise for every input element in a batch, meaning:
(100, 16, 3) -- the input, where 100 is the batch size (so 100 instances), 16 is, say, the image size, and 3 the number of channels (as for Conv2D)
(5, 3) -- the tensor for which I want to calculate the row-wise distance (assume it is A in dists(A, B) and is fixed)
Now, for every instance I should get back a matrix of shape (5, 16). Naturally, I could use a for loop over the batch and get my final (100, 5, 16) result. However, I would love to know if there is an easier way to apply my function row-wise, in parallel, on the GPU.
Thank you very much for your time.
Suppose we are using the L1 distance:
import torch
# data and target
a = torch.randn(100, 16, 3)
b = torch.randn(5, 3)
# Reshape the tensors so they broadcast against each other
a = a.unsqueeze(1)               # (100, 1, 16, 3)
b = b.unsqueeze(0).unsqueeze(2)  # (1, 5, 1, 3)
print(a.shape, b.shape)
# Compute the L1 distance along the last (feature) dimension
dist = (a - b).abs().sum(3)
print(dist.shape)  # torch.Size([100, 5, 16])
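As a side note, torch.cdist computes batched pairwise p-norm distances directly; a minimal alternative sketch (the expand is a view, so no data is copied):
import torch

a = torch.randn(100, 16, 3)
b = torch.randn(5, 3)
# Expand b to a (100, 5, 3) batch view, then compute L1 distances per instance.
dist = torch.cdist(b.unsqueeze(0).expand(a.shape[0], -1, -1), a, p=1)
print(dist.shape)  # torch.Size([100, 5, 16])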
I have data with 38910 rows and 2 columns. Since it is string data, I used two feature-creation methods, A and B.
Method A gives me numpy arrays of the shape:
a.shape = (38910, 17, 21)
Method B gives me numpy arrays of the shape:
b.shape = (38910, 16, 441)
Now, to apply a convolutional neural network and other methods, I need to combine both features into a numpy array of shape (38910, 17, 21, 16, 441). What is the best way to do that without running into memory issues?
One way to avoid memory issues is to process the rows in batches. Assuming you have a function combine_features(a, b) that combines the outputs of method A and method B, here's a rough outline of a solution:
import numpy as np

a_batches = np.array_split(a, 500)
b_batches = np.array_split(b, 500)
for i, batch in enumerate(zip(a_batches, b_batches)):
    a_batch, b_batch = batch
    output = combine_features(a_batch, b_batch)
    np.save(f"{destination_folder}/data-{i}.npy", output)
Then as you are training, you can iterate through the saved files and load one at a time.
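For the training loop, a minimal sketch of loading the saved files one at a time (assuming the same destination_folder and the 500 splits from above):
import numpy as np

for i in range(500):
    batch = np.load(f"{destination_folder}/data-{i}.npy")
    # ... run a training step on `batch` here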
I have tensor data from sensors; each tensor is of shape (4, 1500).
That is 1500 time points, and for each time point I have 4 features.
I want to "smooth" the sequences with a rolling average or other rolling statistics. The end goal is to try to improve an LSTM autoencoder with rolling statistics instead of the long raw sequence.
I am familiar with pandas rolling windows, and currently I am doing this:
#tensor shape:
data.shape
(4,1500)
# convert data to a numpy array, then to a dataframe, and take the rolling mean
rolled_data = pd.DataFrame(data.numpy().swapaxes(1, 0)).rolling(10).mean()[::10]
rolled_data.shape
(150, 4)
# convert the dataframe back to a tensor
tensor_rolled_data = torch.Tensor(rolled_data.to_numpy().swapaxes(1, 0))
tensor_rolled_data.shape
torch.Size([4, 150])
My question is: is there a better way to do this? Is there a function in numpy/torch that can compute rolling statistics in a cleaner or more efficient way?
Since you're striding the output by the size of the window, this is actually more akin to downsampling by averaging than to computing rolling statistics. We can take advantage of the fact that there are no overlaps by simply reshaping the initial tensor.
Using Tensor.reshape
Assuming the length of your data tensor is divisible by 10, you can just reshape the tensor to shape (4, 150, 10) and compute the statistic along the last dimension. For example:
win_size = 10
tensor_rolled_data = data.reshape(data.shape[0], -1, win_size).mean(dim=2)
This solution doesn't give exactly the same results as your tensor_rolled_data: here the first entry contains the mean of the first 10 samples, the second entry the mean of the next 10 samples, and so on. The pandas solution is a "causal filter", so its first entry contains the mean of the 10 most recent samples up to and including sample 0, the second the mean of the 10 most recent samples up to and including sample 10, etc. (Note that the first entry is nan in the pandas solution since fewer than 10 preceding samples exist.)
If this difference is unacceptable, you can recreate the pandas result by first padding with 9 nan values and clipping off the last 9 samples.
import torch.nn.functional as F
win_size = 10
# pad with `nan` to match behavior of pandas
data_padded = F.pad(data[None, :, :-(win_size - 1)], (win_size - 1, 0), 'constant', float('nan')).squeeze(0)
# find mean of groups of N samples
tensor_rolled_data = data_padded.reshape(data.shape[0], -1, win_size).mean(dim=2)
Using Tensor.unfold
To address the comment about what to do when there are overlaps: if you're only interested in the mean statistic, then there are a number of ways to compute it (e.g. convolution, average pooling, tensor unfolding). That said, Tensor.unfold gives the most general solution since it can be used to compute any statistic over a window. For example:
# same as first example above
win_size = 10
tensor_rolled_data = data.unfold(dimension=1, size=win_size, step=win_size).mean(dim=2)
or
# same as second example above
import torch.nn.functional as F
win_size = 10
data_padded = F.pad(data.unsqueeze(0), (win_size - 1, 0), 'constant', float('nan')).squeeze(0)
tensor_rolled_data = data_padded.unfold(dimension=1, size=win_size, step=win_size).mean(dim=2)
In the above cases, unfolding produces the same result as reshape since size and step are equal. However, unlike reshape, unfolding also supports size != step.
win_size = 10
stride = 2
tensor_rolled_data = data.unfold(1, win_size, stride).mean(dim=2)
# produces shape [4, 746]
or you can pad the front of the features with win_size - 1 values to achieve the same result as pandas.
import torch.nn.functional as F
win_size = 10
stride = 2
data_padded = F.pad(data.unsqueeze(0), (win_size - 1, 0), 'constant', float('nan')).squeeze(0)
tensor_rolled_data = data_padded.unfold(1, win_size, stride).mean(dim=2)
# produces shape [4, 750]
Note: in practice you probably don't want to pad with NaN, since NaNs propagate through the mean and will quickly become a headache. Instead you could use zero padding, 'replicate' padding, or 'mirror' padding.
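For instance, a minimal sketch of the causal variant with 'replicate' padding instead of NaN (F.pad's 'replicate' mode expects a 3D input for 1D padding, hence the unsqueeze/squeeze):
import torch.nn.functional as F

win_size = 10
stride = 2
# Repeat the first sample win_size - 1 times instead of padding with NaN.
data_padded = F.pad(data.unsqueeze(0), (win_size - 1, 0), mode='replicate').squeeze(0)
tensor_rolled_data = data_padded.unfold(1, win_size, stride).mean(dim=2)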
I have the following situation. I have an array of size (3, 128, n), where n is large. (This array represents a picture.) I have a super-resolution deep learning model that takes a (3, 128, 128) picture as input and gives it back in better quality. I want to apply my model to the whole picture.
My existing solution
My first solution to this problem is to split my array into arrays of size (3, 128, 128). I then have a list of square images, and I can apply my model to each of these squares and concatenate all the results to get a new (3, 128, n) image. The problem with this method is that the model does not perform as well on the edges of each square.
My desired solution
To get around this problem, I have thought of an alternative solution. Instead of considering non-overlapping square images, I can consider all square images that can be extracted from my original image and pass all of them to my model. Then, to reconstruct a point with coordinates (a, b, c), I consider all reconstructed squares that contain c and take an average of them, giving more weight to the squares where c is near the center.
To be more specific :
I start with a 3*128*n array (call it A). I pad it on the left and on the right, which gives me a new array (call it A_pad) of size 3*128*(n+2*127).
For i in range(0, n+127), let A_i = A_pad[:, :, i:i+128]; A_i has size 3*128*128 and can be fed to my model, which creates a new array B_i of the same size.
Now I want a new array B of the same size as A, defined as follows: for each (x, y, z), B[x, y, z] is the weighted mean of the 128 values B_i[x, y, z+127-i] with z <= i < z+128, using weight 1 + min(z+127-i, i-z). That corresponds to taking the mean over all windows that contain z, with a weight proportional to the distance to the window's closest edge.
My question concerns the computation of B. Given what I've described, I could write multiple for loops that would yield the correct result, but I'm afraid it would be slow. I'm looking for a solution using numpy that is as fast as possible.
This is an example implementation that follows the steps you outlined in the section "My desired solution". It makes extensive use of np.lib.stride_tricks.as_strided, which at first glance might not seem obvious at all; I added detailed comments to each usage for clarification. Also note that in your description you use z to denote the column position within images, while in the comments I use the term n-position in order to comply with the shape specification via n.
Regarding efficiency, it's not obvious whether this approach wins. Computation happens entirely in numpy, but the expression sliding_128 * weights builds a large array (128x the size of the original image) before reducing it along the frame dimension. This definitely comes at a cost, and memory might even become an issue. A loop might come in handy at that position.
Lines which contain a comment prefixed with # [TEST] were added for testing purposes. Concretely, this means we're overwriting the weights for the final sum of frames with 1/128 in order to eventually recover the original image (since no ML model transformation is applied either).
import numpy as np
n = 640 # For example.
image = np.random.randint(0, 256, size=(3, 128, n))
print('image.shape: ', image.shape) # (3, 128, 640)
padded = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
print('padded.shape: ', padded.shape) # (3, 128, 894)
sliding = np.lib.stride_tricks.as_strided(
    padded,
    # Frames stored along first dimension; sliding across last dimension of `padded`.
    shape=(padded.shape[-1]-128+1, 3, 128, 128),
    # First dimension: Moving one frame ahead -> move across last dimension of `padded`.
    # Remaining three dimensions: Move as within `padded`.
    strides=(padded.strides[-1:] + padded.strides)
)
print('sliding.shape: ', sliding.shape) # (767, 3, 128, 128)
# Now at this part we would feed the frames `sliding` to the ML model,
# where the first dimension is the batch size.
# Assume the output is assigned to `sliding` again.
# Since we're not using an ML model here, we create a copy instead
# in order to update the strides of `sliding` with its actual shape (as defined above).
sliding = sliding.copy()
sliding_128 = np.lib.stride_tricks.as_strided(
    # Reverse last dimension since we want the last column from the first frame.
    # Need to copy again because `[::-1]` creates a view with negative stride,
    # but we want actual reversal to work with the strides below.
    # (There's perhaps a smart way of adjusting the strides below in order to not make a copy here.)
    sliding[:, :, :, ::-1].copy(),
    # Second dimension corresponds to the 128 consecutive frames.
    # Previous last dimension is dropped since we're selecting the
    # column that corresponds to the current n-position.
    shape=(128, n, 3, 128),
    # First dimension (frame position): Move one frame and one column ahead
    # (actually want to move one column less in `sliding` but since we reversed the order of columns
    # we need to move one ahead now) -> move across first dimension of `sliding` + last dimension of `sliding`.
    # Second dimension (n-position): Moving one frame ahead -> move across first dimension of `sliding`.
    # Remaining two dimensions: Move within frames (channel and row dimensions).
    strides=((sliding.strides[0] + sliding.strides[-1],) + sliding.strides[:1] + sliding.strides[1:3])
)
print('sliding_128.shape: ', sliding_128.shape) # (128, 640, 3, 128)
# Weights are independent of the n-position -> we can precompute.
weights = 1 + np.concatenate([np.arange(64), np.arange(64)[::-1]])
weights = np.ones(shape=128) # [TEST] Assign weights for testing -> want to obtain the original image back.
weights = weights.astype(float) / weights.sum() # Normalize?
weights = weights[:, None, None, None] # Prepare for broadcasting.
weighted_image = np.moveaxis(np.sum(sliding_128 * weights, axis=0), 0, 2)
print('weighted_image.shape: ', weighted_image.shape) # (3, 128, 640)
assert np.array_equal(image, weighted_image.astype(int)) # [TEST]
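As a side note, on NumPy >= 1.20 the first as_strided call could be replaced by np.lib.stride_tricks.sliding_window_view, which cannot read out of bounds and is therefore safer; a minimal sketch continuing the example above:
# Windows of width 128 along the last axis -> shape (3, 128, 767, 128).
frames = np.lib.stride_tricks.sliding_window_view(padded, 128, axis=-1)
# Move the frame axis to the front to get the (767, 3, 128, 128) batch layout.
# Note the result is a read-only view; copy it before feeding it to a model.
frames = np.moveaxis(frames, 2, 0)
assert frames.shape == sliding.shape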
So I am a little new to using matrices in Python, and I am looking for the best way to perform the following operation.
Say I have a vector of an arbitrary length, like this:
data = np.array(range(255))
And I want to fit this data inside a matrix with a shape like so:
concept = np.zeros((3, 9, 6))
Now, obviously this will not fit, and results in an error:
ValueError: cannot reshape array of size 255 into shape (3,9,6)
What would be the best way to go about fitting as much of the data vector as possible inside the first matrix of shape (3, 9, 6), while making sure any "overflow" is stored in a second (or third, fourth, etc.) matrix?
Does this make sense?
Basically, I want to be able to take a vector of any size and produce an arbitrary amount of matrices that have the data shaped according to the 3, 9, 6 dimensions.
Thank you for your help.
import numpy as np

def each_matrix(a, dims):
    size = dims.prod()
    # Pad with up to size-1 zeros so every chunk is complete.
    padded = np.concatenate([a, np.zeros(size - 1)])
    for i in range(len(padded) // size):
        yield padded[i*size : (i+1)*size].reshape(dims)

for matrix in each_matrix(np.array(range(255)),
                          dims=np.array([3, 9, 6])):
    print(str(matrix) + '\n\n-------\n')
This will fill the last matrix with zeros.
Here is a rough solution to your problem.
import numpy as np

def split_padded(a, n):
    # Number of zeros to pad so len(a) becomes a multiple of n.
    # (`n - len(a) % n` would pad a full extra block when len(a) is already a multiple of n.)
    padding = -len(a) % n
    num_splits = (len(a) + padding) // n
    print(padding, num_splits)
    return np.split(np.concatenate((a, np.zeros(padding))), num_splits)

data = np.array(range(255))
splitnum = 3 * 9 * 6
splitdata = split_padded(data, splitnum)
for mat in splitdata:
    print(mat.reshape(3, 9, 6))
It is very rough and works for 1D input arrays.
First we calculate the number of zeros to pad in padding, then the number of matrices we can get out of the input data in num_splits, and the splitting happens in the final line of the function.
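If a generator isn't needed, a more compact alternative (a minimal sketch under the same 3*9*6 block size) is to pad once and reshape into a single 4D array of blocks:
import numpy as np

data = np.array(range(255))
size = 3 * 9 * 6
# -len(data) % size is exactly the number of zeros needed to reach a multiple of size.
padded = np.concatenate([data, np.zeros(-len(data) % size)])
matrices = padded.reshape(-1, 3, 9, 6)  # shape (2, 3, 9, 6) for this example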
I have the following 3rd-order tensors. Both tensors contain matrices: the first contains 100 10x9 matrices and the second contains 100 3x10 matrices (which I have just filled with ones for this example).
My aim is to multiply the matrices as they line up in one-to-one correspondence, which would result in a tensor of shape (100, 3, 9). This can be done with a for loop that just zips up both tensors and takes the dot product of each pair, but I am looking to do this with numpy operators alone. So far, here are some failed attempts.
Attempt 1:
import numpy as np
T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))
print(T2.dot(T1).shape)
Output of attempt 1:
(100, 3, 100, 9)
Which means it tried all possible combinations ... which is not what I am after.
Actually, none of the other attempts even ran. I tried using np.tensordot and np.einsum (I read at https://jameshensman.wordpress.com/2010/06/14/multiple-matrix-multiplication-in-numpy that it is supposed to do the job, but I did not get the Einstein indices correct); in the same link there is also a crazy tensor-cube reshaping method that I did not manage to visualize. Any suggestions or explanations on how to tackle this?
Did you try?
In [96]: np.einsum('ijk,ilj->ilk',T1,T2).shape
Out[96]: (100, 3, 9)
The way I figure this out is look at the shapes:
(100, 10, 9)   (i, j, k)
(100, 3, 10)   (i, l, j)
-------------
(100, 3, 9)    (i, l, k)
The two j's are summed over and cancel out; the other indices carry through to the output.
For 4d arrays, with dimensions like (100, 3, 2, 24), there are several options:
Reshape to 3d, T1.reshape(300, 2, 24), and afterwards reshape the result back with R.reshape(100, 3, ...). Reshape is virtually costless and a good numpy tool.
Add an index to the einsum: np.einsum('hijk,hilj->hilk', T1, T2), just a parallel usage to that of i.
Or use an ellipsis: np.einsum('...jk,...lj->...lk', T1, T2). This expression works with 3d, 4d, and higher-dimensional arrays.
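As an aside, np.matmul (the @ operator) computes the same batched matrix product and broadcasts over all leading dimensions, so it covers both the 3d case and higher-dimensional ones; a minimal sketch (the 4d shapes are illustrative):
import numpy as np

T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))
print((T2 @ T1).shape)  # (100, 3, 9), same result as the einsum above

# 4d case: all leading axes are treated as batch dimensions.
U1 = np.ones((100, 3, 10, 9))
U2 = np.ones((100, 3, 5, 10))
print((U2 @ U1).shape)  # (100, 3, 5, 9)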