I have the following situation. I have an array of size (3, 128, n) (where n is large); this array represents a picture. I have a super-resolution deep learning model that takes a (3, 128, 128) picture as input and returns it in better quality. I want to apply my model to the whole picture.
My existing solution
My first solution to this problem is to split my array into arrays of size (3, 128, 128). I then have a list of square images; I can apply my model to each of these squares and concatenate all the results to get a new (3, 128, n) image. The problem with this method is that the model does not perform as well on the edges of each square.
My desired solution
To get around this problem, I have thought of an alternative solution. Instead of considering non-overlapping square images, I can consider all square images that can be extracted from my original image and pass all of them to my model. Then, to reconstruct a point with coordinates (a, b, c), I consider all reconstructed square pictures that contain c and take an average of them. I want this average to give more weight to the squares where c is near the center.
To be more specific:
I start with a 3*128*n array (let's call it A). I pad it on the left and on the right, which gives me a new array (let's call it A_pad) of size 3*128*(n+2*127).
For i in range(0, n+127), let A_i = A_pad[:, :, i:i+128]. A_i is of size (3, 128, 128) and can be fed to my model, which creates a new array B_i of the same size.
Now I want a new array B of the same size as A, defined like this: for each (x, y, z), B[x, y, z] is the weighted mean of the 128 values B_i[x, y, z+127-i] such that z <= i < z+128, with weight 1 + min(z + 127 - i, i - z). That corresponds to taking the mean of all the windows that contain z, with a weight that grows with the distance to the closest edge of the window.
My question is about the computation of B. Given what I've described, I could write multiple for loops that would yield the correct results, but I'm afraid it would be slow. I'm looking for a solution using numpy that is as fast as possible.
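For concreteness, a small sketch (not part of the original question) of the 128 weights this definition implies, as a function of the offset k = i - z:
import numpy as np

# Weight of frame i for output column z, with k = i - z in [0, 128):
# 1 + min(z + 127 - i, i - z) = 1 + min(127 - k, k)
k = np.arange(128)
weights = 1 + np.minimum(127 - k, k)
print(weights[:4], weights[62:66], weights[-4:])  # [1 2 3 4] [63 64 64 63] [4 3 2 1]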
This is an example implementation that follows the steps you outlined in the section "My desired solution". It makes extensive use of np.lib.stride_tricks.as_strided, which at first glance might not seem obvious at all; I added detailed comments to each usage for clarification. Also note that in your description you use z to denote the column position within images, while in the comments I use the term n-position to match the shape specification via n.
Regarding efficiency, it's not obvious whether this is a winner. Computation happens entirely in numpy, but the expression sliding_128 * weights builds a large array (128x the size of the original image) before reducing it along the frame dimension. This definitely comes at a cost, and memory might even be an issue; a loop might come in handy at this point (see the sketch after the code below).
Lines which contain a comment prefixed with # [TEST] were added for testing purposes. Concretely, this means we overwrite the weights for the final sum of frames with 1 / 128 in order to recover the original image (since no ML model transformation is applied either).
import numpy as np
n = 640 # For example.
image = np.random.randint(0, 256, size=(3, 128, n))
print('image.shape: ', image.shape) # (3, 128, 640)
padded = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
print('padded.shape: ', padded.shape) # (3, 128, 894)
sliding = np.lib.stride_tricks.as_strided(
padded,
# Frames stored along first dimension; sliding across last dimension of `padded`.
shape=(padded.shape[-1]-128+1, 3, 128, 128),
# First dimension: Moving one frame ahead -> move across last dimension of `padded`.
# Remaining three dimensions: Move as within `padded`.
strides=(padded.strides[-1:] + padded.strides)
)
print('sliding.shape: ', sliding.shape) # (767, 3, 128, 128)
# Now at this part we would feed the frames `sliding` to the ML model,
# where the first dimension is the batch size.
# Assume the output is assigned to `sliding` again.
# Since we're not using an ML model here, we create a copy instead
# so that the strides of `sliding` actually correspond to its shape (as defined above).
sliding = sliding.copy()
sliding_128 = np.lib.stride_tricks.as_strided(
# Reverse last dimension since we want the last column from the first frame.
# Need to copy again because `[::-1]` creates a view with negative stride,
# but we want actual reversal to work with the strides below.
# (There's perhaps a smart way of adjusting the strides below in order to not make a copy here.)
sliding[:, :, :, ::-1].copy(),
# Second dimension corresponds to the 128 consecutive frames.
# Previous last dimension is dropped since we're selecting the
# column that corresponds to the current n-position.
shape=(128, n, 3, 128),
# First dimension (frame position): Move one frame and one column ahead
# (actually want to move one column less in `sliding` but since we reverted order of columns
# we need to move one ahead now) -> move across first dimension of `sliding` + last dimension of `sliding`.
# Second dimension (n-position): Moving one frame ahead -> move across first dimension of `sliding`.
# Remaining two dimensions: Move within frames (channel and row dimensions).
strides=((sliding.strides[0] + sliding.strides[-1],) + sliding.strides[:1] + sliding.strides[1:3])
)
print('sliding_128.shape: ', sliding_128.shape) # (128, 640, 3, 128)
# Weights are independent of the n-position -> we can precompute.
weights = 1 + np.concatenate([np.arange(64), np.arange(64)[::-1]])
weights = np.ones(shape=128) # [TEST] Assign weights for testing -> want to obtain the original image back.
weights = weights.astype(float) / weights.sum() # Normalize so the weighted frame sum is a weighted mean.
weights = weights[:, None, None, None] # Prepare for broadcasting.
weighted_image = np.moveaxis(np.sum(sliding_128 * weights, axis=0), 0, 2)
print('weighted_image.shape: ', weighted_image.shape) # (3, 128, 640)
assert np.array_equal(image, weighted_image.astype(int)) # [TEST]
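As mentioned above, the large intermediate from sliding_128 * weights can be avoided with a plain loop over frames. This is only a sketch, assuming `sliding` holds the (copied or model-produced) frames of shape (n+127, 3, 128, 128) and that `image` and `n` are defined as above; it uses the real (non-test) weights:
# Accumulate the weighted columns of each frame directly into the output.
w = (1 + np.concatenate([np.arange(64), np.arange(64)[::-1]])).astype(float)
num = np.zeros(image.shape, dtype=float)   # weighted sum, shape (3, 128, n)
den = np.zeros(n, dtype=float)             # accumulated weight per output column
for i in range(sliding.shape[0]):
    # Column j of frame i contributes to output column z = i - 127 + j.
    lo, hi = max(i - 127, 0), min(i + 1, n)    # output columns covered by this frame
    j0, j1 = lo - (i - 127), hi - (i - 127)    # matching columns within the frame
    num[:, :, lo:hi] += sliding[i, :, :, j0:j1] * w[j0:j1]
    den[lo:hi] += w[j0:j1]
weighted_image_loop = num / den            # shape (3, 128, n)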
I have a 5D array with shape (80, 180, 144, 160, 11) (80 3D-images of size 180*144*160, each with 11 channels) and a set of indices referring to this array with shape (n, 4) (that is, n indices specifying which image and which 3D-pixel I am interested in).
Now to the question: I want to extract "blocks" with shape (18, 18, 20) centered around every index, preserving all channels. This will yield an ndarray of shape (n, 18, 18, 20, 11). Also, if an index is too close to the border of the 3D-image for the entire block to fit, I want to 0-pad the image.
I have managed to do this myself with a for loop over every index, but the performance is unfortunately rather poor (~10 s for n=100). I need to do this for n in the range of 10 000 - 1 000 000, so my solution is not really an option.
My attempt where the images are given in images and the indices in block_indices:
block_shape = (18, 18, 20)
blocks = np.empty((0,) + block_shape + (11,))
for index in block_indices:
block = np.pad(images[index[0]], ((block_shape[0], block_shape[0]),
(block_shape[1], block_shape[1]),
(block_shape[2], block_shape[2]),
(0, 0)))[index[1]+int(block_shape[0]/2):index[1]+int(3*block_shape[0]/2),
index[2]+int(block_shape[1]/2):index[2]+int(3*block_shape[1]/2),
index[3]+int(block_shape[2]/2):index[3]+int(3*block_shape[2]/2),
...]
blocks = np.append(blocks, block[np.newaxis, ...], axis=0)
I was thinking that this can probably be done really quickly with slicing and fancy array indexing, but my attempts have been to no avail. Do you have any suggestions for how this can be done more quickly? Thanks in advance!
PS: The numbers presented can vary a bit but should give you a rough idea of the scale.
For anyone looking to do the same thing in the future
I have managed to come up with another solution which is a lot faster and scales better. It involves the use of a "shifting" block matrix, np.tile, flattening and some reshaping. One caveat is that the block indices need to be given as a 1D array of length n, where each entry is an index into a flattened array of 3D-images. One can quite easily convert between these representations, however; a sketch of such a conversion follows.
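A minimal sketch of that conversion (an illustration, not part of the original answer), assuming (n, 4) indices of the form (image, x, y, z) and the 5D layout from the question:
import numpy as np

image_shape = (80, 180, 144, 160)             # the first four dimensions of the 5D array
indices_4d = np.array([[0,  90,  72,  80],
                       [3,  10, 140,   5]])   # two example (image, x, y, z) indices
# (n, 4) indices -> flat indices into the array flattened to shape (80*180*144*160, 11)
flat_indices = np.ravel_multi_index(indices_4d.T, image_shape)
# and back again
recovered = np.array(np.unravel_index(flat_indices, image_shape)).T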
For brevity I will only explain the main concepts of the method and then post a working code example, here goes.
Main concepts:
First we flatten our images array so that it gets shape (80*180*144*160, 11).
Now we need to realise that the blocks we are after can be accessed from the flattened array according to a predictable pattern of offsets, which is only shifted depending on the location of the block.
These elements can be taken out with np.take, so long as we know the indices.
Lastly, the result of np.take can be reshaped into an array of blocks.
Working code example:
import numpy as np

# Assumes `block_shape`, the flat `block_indices` (length n), and the image array
# (here called `x_test`, of shape (80, 180, 144, 160, 11)) are already defined.
# Make a 3D-image which enumerates all pixels.
image_pixel_enumeration = np.arange(180*144*160).reshape(180, 144, 160)
# Get the index pattern of a block.
block_shifts = image_pixel_enumeration[:block_shape[0], :block_shape[1], :block_shape[2]].flatten() \
- image_pixel_enumeration[int(block_shape[0]/2), int(block_shape[1]/2), int(block_shape[2]/2)]
# Tile an array with the pattern, one for each block.
block_shifts = np.tile(block_shifts, (len(block_indices), 1))
# Tile an array with the block center indices and add the pattern to them.
validation_data_indices = np.tile(block_indices, (np.prod(block_shape), 1)).transpose() + block_shifts
# Take out elements.
validation_data = np.take(x_test.reshape((-1, 11)), validation_data_indices.flatten(), 0, mode='clip')
# Reshape into blocks.
validation_data = validation_data.reshape((-1,) + block_shape + (11,))
This method takes (on my machine) approximately 0.1 s, 0.2 s and 1.4 s for 10, 100 and 1 000 indices respectively whilst the old method took approximately 1 s, 16 s and 900 s for the same number of indices. A massive improvement!
PS. Note that this solution does not solve the issue of blocks extending beyond the original image and can potentially pick pixels from the wrong images or wrong slices in these cases.
I am trying to preprocess an RGB image before sending it into my model. The shape of the image is (2560, 1440, 3). For that I need to calculate the mean of every channel and subtract it from the corresponding channel's pixels. I know that I can do it with:
np.mean(image_array, axis=(0, 1)).
However, I cannot understand how this is actually being done.
I am aware of how axes work individually (axis=0 for columns and axis=1 for rows). How does axis=(0, 1) work in this situation?
And also, how can I do the same thing for multiple images, say with train_data_shape = (1000, 256, 256, 3)?
I appreciate every feedback!
Consider what happens when you have an array X of shape (5, 3) and you execute np.mean(X, axis=0). You'll get back an array of shape (3,) where element i is the average of the 5 values in column i. You're essentially 'averaging out' that first dimension. If you instead set axis=1, you'd get back an array of shape (5,) where element i is the average of the 3 values in row i - now you're averaging out that second dimension.
It works similarly when multiple axes are provided. Say X is of shape (5, 4, 2). Then, executing np.mean(X, axis=(0, 1)) will return an array of shape (2,) where element i is the average of the sub-array X[:, :, i] (of shape (5, 4)). We're averaging out the first two dimensions.
To answer your second question: If you want to compute means on an image-by-image and channel-by-channel basis, use axis=(1,2). If you want to compute means over all of your images per channel, use axis=(0,1,2).
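A small sketch of these cases (the variable names are just for illustration), using keepdims so the means broadcast directly when subtracting:
import numpy as np

image = np.random.rand(2560, 1440, 3)
train_data = np.random.rand(1000, 256, 256, 3)

# Single image: one mean per channel, shape (1, 1, 3) with keepdims.
centered_image = image - image.mean(axis=(0, 1), keepdims=True)

# Batch of images, per-image per-channel means: shape (1000, 1, 1, 3).
centered_per_image = train_data - train_data.mean(axis=(1, 2), keepdims=True)

# Batch of images, one mean per channel over the whole dataset: shape (1, 1, 1, 3).
centered_global = train_data - train_data.mean(axis=(0, 1, 2), keepdims=True)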
I would like to figure out a way to apply a function which calculates pairwise distances, let's call it dists(A, B), row-wise for every input element in a batch, meaning:
(100, 16, 3) -- the input; 100 is the batch size (so 100 instances), 16 is, say, the image size, and 3 is the number of filters (thinking of Conv2D)
(5, 3) -- the tensor for which I want to calculate the row-wise distance (assume it's A in dists(A, B) and is fixed)
Now, for every instance I am supposed to get back a matrix of shape (5, 16). Naturally, I could use a for loop over the batch and get my final (100, 5, 16) result. However, I would love to know if there is an easier way to apply my function row-wise, in parallel, on the GPU.
Thank you very much for your time.
Suppose we are using the L1 distance:
import torch
# data and target
a = torch.randn(100, 16, 3)
b = torch.randn(5, 3)
# Reshape the tensors so they broadcast against each other:
# a: (100, 16, 3) -> (100, 1, 16, 3), b: (5, 3) -> (1, 5, 1, 3)
a = a.unsqueeze(1)
b = b.unsqueeze(0).unsqueeze(2)
print(a.shape, b.shape)  # torch.Size([100, 1, 16, 3]) torch.Size([1, 5, 1, 3])
# Compute the L1 distance by broadcasting, then summing over the feature dimension.
dist = (a-b).abs().sum(3)
print(dist.shape)  # torch.Size([100, 5, 16])
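As an alternative (not part of the original answer), torch.cdist can compute the same L1 distances; a self-contained sketch with fresh tensors:
import torch

a = torch.randn(100, 16, 3)
b = torch.randn(5, 3)

# cdist expects batched inputs of shape (B, P, M) and (B, R, M).
dist = torch.cdist(a, b.unsqueeze(0).expand(a.shape[0], -1, -1), p=1)  # (100, 16, 5)
dist = dist.transpose(1, 2)
print(dist.shape)  # torch.Size([100, 5, 16])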
I have tensors of sensor data; each tensor is of shape (4, 1500).
That is 1500 time points, and for each time point I have 4 features.
I want to "smooth" the sequences with a rolling average or other rolling statistics. The end goal is to try to improve an LSTM autoencoder with rolling statistics instead of the long raw sequences.
I am familiar with pandas rolling windows, and currently I am doing this:
#tensor shape:
data.shape
(4,1500)
#convert data to numpy array and then to dataframe and perform rolling mean
rolled_data=pd.DataFrame(data.numpy().swapaxes(1,0)).rolling(10).mean()[::10]
rolled_data.shape
(150, 4)
# convert back the dataframe to tensor
tensor_rolled_data=torch.Tensor(rolled_data.to_numpy().swapaxes(1,0))
tensor_rolled_data.shape
torch.Size([4, 150])
My question is: is there a better way to do it? Is there a function in numpy/torch that can compute rolling statistics in a cleaner or more efficient way?
Since you're striding the output by the size of the window, this is actually more akin to downsampling by averaging than to computing rolling statistics. We can take advantage of the fact that there are no overlaps by simply reshaping the initial tensor.
Using Tensor.reshape
Assuming your data tensor has a length divisible by 10 along the time dimension, you can just reshape the tensor to shape (4, 150, 10) and compute the statistic along the last dimension. For example:
win_size = 10
tensor_rolled_data = data.reshape(data.shape[0], -1, win_size).mean(dim=2)
This solution doesn't give exactly the same results as your tensor_rolled_data, since here the first entry will contain the mean of the first 10 samples, the second entry the mean of the second 10 samples, and so on. The pandas solution is a "causal filter", so the first entry contains the mean of the 10 most recent samples up to and including sample 0, the second the mean of the 10 most recent samples up to and including sample 10, etc. (Note that the first entry is nan in the pandas solution since fewer than 10 preceding samples exist.)
If this difference is unacceptable you can recreate the pandas result by first padding with 9 nan values and clipping off the last 9 samples.
import torch.nn.functional as F
win_size = 10
# pad with `nan` to match behavior of pandas
data_padded = F.pad(data[None, :, :-(win_size - 1)], (win_size - 1, 0), 'constant', float('nan')).squeeze(0)
# find mean of groups of N samples
tensor_rolled_data = data_padded.reshape(data.shape[0], -1, win_size).mean(dim=2)
Using Tensor.unfold
To address the comment about what to do when the windows overlap: if you're only interested in the mean, there are a number of ways to compute it (e.g. convolution, average pooling, tensor unfolding). That said, Tensor.unfold gives the most general solution, since it can be used to compute any statistic over a window. For example:
# same as first example above
win_size = 10
tensor_rolled_data = data.unfold(dimension=1, size=win_size, step=win_size).mean(dim=2)
or
# same as second example above
import torch.nn.functional as F
win_size = 10
data_padded = F.pad(data.unsqueeze(0), (win_size - 1, 0), 'constant', float('nan')).squeeze(0)
tensor_rolled_data = data_padded.unfold(dimension=1, size=win_size, step=win_size).mean(dim=2)
In the above cases, unfolding produces the same result as reshape since size and step are equal. However, unlike reshape, unfolding also supports size != step.
win_size = 10
stride = 2
tensor_rolled_data = data.unfold(1, win_size, stride).mean(dim=2)
# produces shape [4, 746]
or you can pad the front of the features with win_size - 1 values to achieve the same result as pandas.
import torch.nn.functional as F
win_size = 10
stride = 2
data_padded = F.pad(data.unsqueeze(0), (win_size - 1, 0), 'constant', float('nan')).squeeze(0)
tensor_rolled_data = data_padded.unfold(1, win_size, stride).mean(dim=2)
# produces shape [4, 750]
Note: In practice you probably don't want to pad with NaN, since it will quickly become a headache. Instead you could use zero padding, 'replicate' padding, or 'mirror' padding; a sketch of the replicate option follows.
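For instance, a sketch of the 'replicate' option (replacing the NaN padding used in the examples above):
import torch
import torch.nn.functional as F

data = torch.randn(4, 1500)
win_size, stride = 10, 2
# Repeat the first sample win_size - 1 times instead of padding with NaN.
data_padded = F.pad(data.unsqueeze(0), (win_size - 1, 0), mode='replicate').squeeze(0)
tensor_rolled_data = data_padded.unfold(1, win_size, stride).mean(dim=2)
print(tensor_rolled_data.shape)  # torch.Size([4, 750])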
I have some images I want to work with. The problem is that there are two kinds of images: both are 106 x 106 pixels, but some are in color and some are black and white.
one with only two (2) dimensions:
(106,106)
and one with three (3)
(106,106,3)
Is there a way I can strip this last dimension?
I tried np.delete, but it did not seem to work.
np.shape(np.delete(Xtrain[0], [2] , 2))
Out[67]: (106, 106, 2)
You could use NumPy's array indexing (an extension of Python's built-in slice notation):
x = np.zeros( (106, 106, 3) )
result = x[:, :, 0]
print(result.shape)
prints
(106, 106)
A shape of (106, 106, 3) means you have 3 sets of things that have shape (106, 106). So in order to "strip" the last dimension, you just have to pick one of these (that's what the indexing above does).
You can keep any slice you want. I arbitrarily choose to keep the 0th, since you didn't specify what you wanted. So, result = x[:, :, 1] and result = x[:, :, 2] would give the desired shape as well: it all just depends on which slice you need to keep.
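A small sketch (not from the original answer) of how this could be applied to a mixed set of images, using a hypothetical helper:
import numpy as np

def to_2d(img):
    # Keep grayscale images as-is; keep a single channel of RGB images.
    return img if img.ndim == 2 else img[:, :, 0]

gray = np.zeros((106, 106))
color = np.zeros((106, 106, 3))
print(to_2d(gray).shape, to_2d(color).shape)  # (106, 106) (106, 106)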
If you have a multidimensional array, this might help:
pred_mask[0, ...]   # remove the first dimension
pred_mask[..., 0]   # remove the last dimension
Just take the mean value over the color dimension (axis=2):
Xtrain_monochrome = Xtrain.mean(axis=2)
When the shape of your array is (106, 106, 3), you can visualize it as a table with 106 rows and 106 columns, where each entry is an array of 3 numbers which we can represent as [x, y, z]. Therefore, if you want to get the dimensions (106, 106), you must make the entries in your table single numbers instead of arrays. You can achieve this either by extracting the x-, y- or z-component of each data point, or by applying a function that somehow aggregates the three components, like the mean, sum, max, etc. You can extract any single component just as @Matt Messersmith suggested above.
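A small sketch of both options (extracting one component vs. aggregating the components):
import numpy as np

img = np.random.rand(106, 106, 3)
component = img[:, :, 0]       # keep a single channel
averaged = img.mean(axis=2)    # aggregate the three channels
print(component.shape, averaged.shape)  # (106, 106) (106, 106)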
Well, you should be careful when you are trying to reduce the dimensions of an image.
An image is normally a 3-D array that contains the RGB values of each pixel. If you want to reduce it to 2-D, what you are really doing is converting a colored RGB image into a grayscale image.
There are several ways to do this: you can take the maximum of the three channels, the minimum, the average, the sum, etc., depending on the accuracy you want. The best option is to take a weighted average of the RGB values using the formula
Y = 0.299R + 0.587G + 0.114B
where R stands for red, G for green and B for blue. In numpy, this can be written as:
new_image = img[:, :, 0]*0.299 + img[:, :, 1]*0.587 + img[:, :, 2]*0.114
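Equivalently (an alternative not in the original answer), the same weighted average can be written as a dot product along the channel axis:
new_image = np.dot(img, [0.299, 0.587, 0.114])  # shape (106, 106)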
Actually, np.delete would work if you applied it twice;
if you want to preserve the first channel, for example, you could run the following:
Xtrain = np.delete(Xtrain,2,2) # removes index 2 (the 3rd channel) along the last axis
print(Xtrain.shape) # will now output (106,106,2)
# again apply np.delete, now removing index 1 (the 2nd channel) along the last axis
Xtrain = np.delete(Xtrain,1,2)
print(Xtrain.shape) # will now output (106,106,1)
# you may finally squeeze your output to get a 2d array
Xtrain = Xtrain.squeeze()
print(Xtrain.shape) # will now output (106,106)
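A slightly more compact variant of the same idea (a sketch, assuming Xtrain has shape (106, 106, 3)): both unwanted channels can be removed in a single call.
Xtrain_2d = np.delete(Xtrain, [1, 2], axis=2).squeeze()  # keep only channel 0 -> (106, 106)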