Alright - assume I have two numpy arrays, shapes are:
(185, 100, 50, 3)
(64, 100, 50, 3)
The arrays hold 185 and 64 frames of video respectively. Each frame is an image 100 pixels wide and 50 pixels high, with 3 channels. The image dimensions are the same for both videos; the only value that changes is the number of frames per video.
I need to get them both into a single array of some shape like
(2, n, 100, 50, 3)
Where both videos are contained (to run through a neural net as a batch)
I've already tried using np.stack - but I get
ValueError: all input arrays must have the same shape
This is a quick brainstorm idea, along with a strategy and Python code. Note: I was going to stick to a comment, but to illustrate the idea I need to type in some code. So here we go! (Grabbing a coffee or a strong drink is recommended...)
Current State
we have video 1 vid1 with 4D shape (185, 100, 50, 3)
we have video 2 vid2 with 4D shape (64, 100, 50, 3)
... where the shape represents (frame ID, width, height, RGB channels)
we want to "stack" the two videos together as one numpy array with 5D shape (2, n, 100, 50, 3). Note: 2 because we are stacking 2 videos. n is a hyperparameter that we can choose. We keep the video size the same (100 width x 50 height x 3 RGB channels)
Opportunities
The first thing I see is that vid1 has roughly 3 times as many frames as vid2. How about using 60 as the common factor? i.e. let's set our hyperparameter n to 60. (Note: some "frame cropping" / "frame throwing away" may be required - this will be covered below.)
Strategy
Phase 1 - Crop both videos (throw away some frames)
Let's crop both vid1 and vid2 to nice round numbers that are multiples of 60 (our n - the hyperparameter). Concretely:
crop vid1 so that the shape becomes (180, 100, 50, 3). (i.e. we throw away the last 5 frames). We call this new cropped video vid1_cropped.
crop vid2 so that the shape becomes (60, 100, 50, 3). (i.e. we throw away the last 4 frames). We call this new cropped video vid2_cropped.
Phase 2 - Make both videos 60 frames
vid2_cropped is already at 60 frames, with shape (60, 100, 50, 3). So we leave this alone.
vid1_cropped, however, is at 180 frames. So I suggest we reduce this video to 60 frames by averaging the RGB channel values over groups of 3 consecutive frames, for all pixel positions (along width and height). What we get at the end of this process is a somewhat "diluted" (averaged) video with the same shape as vid2_cropped - (60, 100, 50, 3). Let's call this diluted video vid1_cropped_diluted.
Phase 3 - stack the two same-shape videos together
Now that both vid2_cropped and vid1_cropped_diluted have the same 4D shape (60, 100, 50, 3), we can stack them together to obtain our final numpy array of 5D shape (2, 60, 100, 50, 3) - let's call this vids_combined.
We are done!
Demo
Turning the strategy into code. I did this in Python 3.6 (with Jupyter Notebook / Jupyter Console).
Some notes:
I have yet to validate the code (and revise it as needed). In the meantime, if you see any bugs please shout - I will be happy to update.
I have a gut feeling that line 10 below, the "diluting" (np.average) step, might contain an error. I mean to average every 3 consecutive frames, for all pixel positions and RGB channels, and I need to double-check the syntax. (In the meantime, please kindly check line 10!)
This post illustrates the concepts and some code implementation. Ideally I would have stepped through this in more depth, using much smaller video sizes so we could build better intuition and visualise each step, pixel by pixel. (I might come back to this when I have time.) For now, I believe the numpy array shape analysis is sufficient to convey the idea.
In [1]: import numpy as np
In [2]: vid1 = np.random.random((185, 100, 50, 3))
In [3]: vid1.shape
Out[3]: (185, 100, 50, 3)
In [4]: vid2 = np.random.random((64, 100, 50, 3))
In [5]: vid2.shape
Out[5]: (64, 100, 50, 3)
In [6]: vid1_cropped = vid1[:180]
In [7]: vid1_cropped.shape
Out[7]: (180, 100, 50, 3)
In [8]: vid2_cropped = vid2[:60]
In [9]: vid2_cropped.shape
Out[9]: (60, 100, 50, 3)
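In [10]: # reshape to (60, 3, ...) so that axis 1 holds 3 consecutive frames, then average over it
    ...: vid1_cropped_diluted = np.average(vid1_cropped.reshape(60, 3, 100, 50, 3),
    ...:                                   axis=1)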
In [11]: vid1_cropped_diluted.shape
Out[11]: (60, 100, 50, 3)
In [12]: vids_combined = np.stack([vid1_cropped_diluted, vid2_cropped])
In [13]: vids_combined.shape
Out[13]: (2, 60, 100, 50, 3)
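The crop-then-average steps can also be wrapped in a small helper. What follows is only a minimal sketch, not part of the original answer: the function name `average_to_n_frames` and its parameters are hypothetical, and it assumes each video has at least n frames. Each output frame is the mean of `group` consecutive input frames.

```python
import numpy as np

def average_to_n_frames(video, n):
    """Reduce a (frames, ...) array to n frames by averaging consecutive groups."""
    group = video.shape[0] // n          # frames averaged into each output frame
    if group == 0:
        raise ValueError("video has fewer than n frames")
    cropped = video[:group * n]          # drop the trailing frames that do not fit
    # Split the frame axis into (n, group) so each group holds consecutive frames.
    grouped = cropped.reshape((n, group) + video.shape[1:])
    return grouped.mean(axis=1)

vid1 = np.random.random((185, 100, 50, 3))
vid2 = np.random.random((64, 100, 50, 3))
vids_combined = np.stack([average_to_n_frames(vid1, 60),
                          average_to_n_frames(vid2, 60)])
print(vids_combined.shape)  # (2, 60, 100, 50, 3)
```

With n = 60, vid1 is averaged in groups of 3 frames and vid2 in groups of 1 (i.e. it is only cropped).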
You can't stack arrays with different dimensions, since you need a value for each dimension.
Your options are therefore to:
1. not use arrays at all, and simply use a Python list
2. pad the arrays with zeros until they are the same shape
3. resample the arrays to the same shape
Atlas7's answer is an implementation of option 3, but you'd probably do better to use scipy.ndimage.zoom in some way, for a more flexible solution.
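For the zero-padding option, a minimal sketch might look like the following. The helper name `pad_and_stack` and the boolean mask are assumptions added for illustration (the mask is one common way to let downstream code tell real frames from padding); nothing here comes from the original answers.

```python
import numpy as np

def pad_and_stack(videos):
    """Zero-pad (frames, ...) arrays along axis 0 and stack them, with a validity mask."""
    n = max(v.shape[0] for v in videos)  # length of the longest video
    batch = np.zeros((len(videos), n) + videos[0].shape[1:], dtype=videos[0].dtype)
    mask = np.zeros((len(videos), n), dtype=bool)
    for i, v in enumerate(videos):
        batch[i, :v.shape[0]] = v        # real frames go first, zeros remain after
        mask[i, :v.shape[0]] = True      # True marks real frames
    return batch, mask

vid1 = np.random.random((185, 100, 50, 3))
vid2 = np.random.random((64, 100, 50, 3))
batch, mask = pad_and_stack([vid1, vid2])
print(batch.shape)                 # (2, 185, 100, 50, 3)
print(mask.sum(axis=1).tolist())   # [185, 64]
```

Padding keeps every original frame, at the cost of memory for the zeros, whereas resampling keeps the batch small but alters the data.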
Related
I have a 5D array with shape (80, 180, 144, 160, 11) (80 3D-images of size 180*144*160 each with 11 channels) and a set of indices referring to this array with shape (n, 4) (that is n indices referring to which image and which 3D-pixel I am interested in).
Now to the question, I want to extract "blocks" with shape (18, 18, 20) centered around every index and preserving all channels. This will yield an ndarray of shape (n, 18, 18, 20, 11). Also, if an index is too close to the border of the 3D-image as to not fit the entire block then I want to 0-pad the image.
I have managed to do this myself with a for-loop over every index but the performance is rather poor unfortunately (~10 s for n=100). I need to do this for ns in the range of 10 000 - 1 000 000 so my solution is not really an option.
My attempt where the images are given in images and the indices in block_indices:
block_shape = (18, 18, 20)
blocks = np.empty((0,) + block_shape + (11,))
for index in block_indices:
    block = np.pad(images[index[0]], ((block_shape[0], block_shape[0]),
                                      (block_shape[1], block_shape[1]),
                                      (block_shape[2], block_shape[2]),
                                      (0, 0)))[index[1]+int(block_shape[0]/2):index[1]+int(3*block_shape[0]/2),
                                               index[2]+int(block_shape[1]/2):index[2]+int(3*block_shape[1]/2),
                                               index[3]+int(block_shape[2]/2):index[3]+int(3*block_shape[2]/2),
                                               ...]
    blocks = np.append(blocks, block[np.newaxis, ...], axis=0)
I was thinking that this can probably be done really quickly with slicing and fancy array indexing but I have tried to no avail. Do you have any suggestions how this can be done more quickly? Thanks in advance!
PS: The numbers presented can vary a bit but should give you a rough idea of the scale.
For anyone looking to do the same thing in the future
I have managed to come up with another solution which is a lot faster and scales better. It involves use of a "shifting" block matrix, np.tile, flattening and some reshaping. One caveat is that the indices of the blocks need to be given in a 1D array of length n where each index corresponds to the index in a flattened array of 3D-images. One can quite easily convert between these different representations however.
For brevity I will only explain the main concepts of the method and then post a working code example. Here goes.
Main concepts:
First we flatten our images array so that it gets shape (80*180*144*160,11).
Now we need to come to the realisation that the blocks we are after can be accessed from the flattened array according to a predictable pattern which is only shifted along depending on the location of the block.
These elements can be taken out with np.take so long as we know the indices.
Lastly the result of np.take can be reshaped into an array of blocks.
Working code example:
# Make a 3D-image which enumerates all pixels.
image_pixel_enumeration = np.arange(180*144*160).reshape(180, 144, 160)
# Get the index pattern of a block.
block_shifts = image_pixel_enumeration[:block_shape[0], :block_shape[1], :block_shape[2]].flatten() \
- image_pixel_enumeration[int(block_shape[0]/2), int(block_shape[1]/2), int(block_shape[2]/2)]
# Tile an array with the pattern, one for each block.
block_shifts = np.tile(block_shifts, (len(block_indices), 1))
# Tile an array with the block center indices add to them the pattern.
validation_data_indices = np.tile(block_indices, (np.prod(block_shape), 1)).transpose() + block_shifts
# Take out elements.
validation_data = np.take(x_test.reshape((-1, 11)), validation_data_indices.flatten(), 0, mode='clip')
# Reshape into blocks.
validation_data = validation_data.reshape((-1,) + block_shape + (11,))
This method takes (on my machine) approximately 0.1 s, 0.2 s and 1.4 s for 10, 100 and 1 000 indices respectively whilst the old method took approximately 1 s, 16 s and 900 s for the same number of indices. A massive improvement!
PS. Note that this solution does not solve the issue of blocks extending beyond the original image and can potentially pick pixels from the wrong images or wrong slices in these cases.
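If the zero-padding at the borders matters, one alternative is to pad the whole array once and gather all blocks with broadcast integer indexing. This is a sketch under stated assumptions, not the method described above: the function name `extract_blocks` is hypothetical, indices are given in the original (n, 4) form, and padding the full array makes a copy, which may be expensive for arrays as large as the one in the question.

```python
import numpy as np

def extract_blocks(images, block_indices, block_shape):
    """Gather zero-padded blocks from images of shape (N, X, Y, Z, C).

    block_indices has shape (n, 4): (image, x, y, z) of each block centre.
    A block spans [centre - b // 2, centre - b // 2 + b) along each spatial axis.
    """
    block_indices = np.asarray(block_indices)
    half = [b // 2 for b in block_shape]
    # Pad the spatial axes once so that every block lies inside the padded array.
    pad = [(0, 0)] + [(h, b - h - 1) for h, b in zip(half, block_shape)] + [(0, 0)]
    padded = np.pad(images, pad, mode="constant")
    # After padding, a block centred at c starts at padded coordinate c.
    img = block_indices[:, 0][:, None, None, None]
    xs = block_indices[:, 1][:, None, None, None] + np.arange(block_shape[0])[None, :, None, None]
    ys = block_indices[:, 2][:, None, None, None] + np.arange(block_shape[1])[None, None, :, None]
    zs = block_indices[:, 3][:, None, None, None] + np.arange(block_shape[2])[None, None, None, :]
    # The four index arrays broadcast to (n, bx, by, bz); the channel axis is kept whole.
    return padded[img, xs, ys, zs]

# Small stand-in for the (80, 180, 144, 160, 11) array in the question.
images = np.arange(2 * 6 * 5 * 4 * 3, dtype=float).reshape(2, 6, 5, 4, 3)
block_indices = np.array([[0, 0, 0, 0], [1, 3, 2, 1]])
blocks = extract_blocks(images, block_indices, (2, 2, 2))
print(blocks.shape)  # (2, 2, 2, 2, 3)
```

The first index sits in a corner of its image, so most of its block comes from the zero padding; the second block lies fully inside the image.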
I have a list of matrices with size (63,32,1,600,600). When I try to stack it with torch.stack(matrices).cpu().detach().numpy(), it raises this error:
"stack expects each tensor to be equal size, but got [32, 1, 600, 600] at entry 0 and [16, 1, 600, 600] at entry 62". I tried resizing, but it did not work. I would appreciate any recommendations.
If I understand correctly, what you're trying to do is stack the outputted mini-batches together into a single batch. My bet is that your last batch is only partially filled (it has 16 elements instead of 32).
Instead of using torch.stack (which creates a new axis), I would simply concatenate with torch.cat on the batch axis (dim=0), assuming matrices is a list of torch.Tensors:
torch.cat(matrices).cpu().detach().numpy()
torch.cat concatenates along dim=0 by default.
When we have tensors that differ in size only on the first dimension, as of PyTorch v1.7.0, we can use torch.vstack() to stack it along axis 0. Using torch.stack() fails here because it expects all the tensors to be of same shape.
Here is a reproducible illustration matching your problem description:
# sample tensors (as per your size)
In [65]: t1 = torch.randn([32, 1, 600, 600])
In [66]: t2 = torch.randn([16, 1, 600, 600])
# vertical stacking (i.e., stacking along axis 0)
In [67]: stacked = torch.vstack([t1, t2])
# check shape of output
In [68]: stacked.shape
Out[68]: torch.Size([48, 1, 600, 600])
We get 48 (32 + 16) as the size of the first dimension in the result because we're stacking tensors along that dimension.
Note:
You can also initialize the result tensor, say stacked, by explicitly calculating its shape and passing it to the out= kwarg of torch.vstack(). This is useful if you want to write the result to a specific tensor, for instance to update the values of an existing tensor (of the same shape). However, this is optional.
# calculate new shape of stacking
In [80]: newshape = (t1.shape[0] + t2.shape[0], *t1.shape[1:])
# allocate an empty tensor, filled with garbage values
In [81]: stacked = torch.empty(newshape)
# stack it along axis 0 and write the result to `stacked`
In [83]: torch.vstack([t1, t2], out=stacked)
# check shape/size
In [84]: stacked.shape
Out[84]: torch.Size([48, 1, 600, 600])
I have a tiled numpy array of shape (16, 32, 16, 16), that is each tile is 16x16 pixels in a grid 32 tiles wide and 16 high.
From here I want to reshape it to a 256 high x 512 wide 2D image, and I can't quite find the right incantation of splits, slices, and reshapes to get to what I want.
You can combine numpy's reshape and transpose to get this job done. I am not entirely sure which of the three "16"s belongs to the 32x16 repetition grid, but assuming it's the first one:
import numpy as np
data = np.random.random((16, 32, 16, 16))
# put number of repetitions next to respective dimension
transposed_data = np.transpose(data, (0, 2, 1, 3))
# concatenate repeated dimensions via reshape
reshaped_data = transposed_data.reshape((16 * 16, 32 * 16))
print(reshaped_data.shape)
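To check that the tiles land where expected, one can compare a patch of the assembled image with the corresponding tile. This is a small sketch that assumes the layout (tile row, tile column, pixel row, pixel column), the same assumption the answer makes; the variable names are illustrative.

```python
import numpy as np

tile_rows, tile_cols, tile_h, tile_w = 16, 32, 16, 16
# Assumed layout: (tile row, tile column, pixel row, pixel column).
data = np.random.random((tile_rows, tile_cols, tile_h, tile_w))
image = data.transpose(0, 2, 1, 3).reshape(tile_rows * tile_h, tile_cols * tile_w)

# Tile (r, c) should occupy rows r*16..(r+1)*16 and columns c*16..(c+1)*16.
r, c = 3, 7
patch = image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
print(image.shape)                         # (256, 512)
print(np.array_equal(patch, data[r, c]))   # True
```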
I am working in Python and I have an image array which is of shape [100,3,200,1200]. The array is of format Number_of_images x Channels x Height x Width. I want to split the images along the width direction into 6 images of shape 200x200 and add that as different channels. Ultimately, I would like to receive an array of shape [100,18,200,200].
I've attempted to use the reshape function but it is not working as expected. I tried the following:
np.reshape([100,18,200,200])
When I plot each image, I notice that it is not cropping the image the way I wanted it to.
First reshape to make the splits:
a = np.reshape(a, (100, 3, 200, 6, 200))
Then move the split axis next to the channel axis:
a = np.moveaxis(a, 3, 2)
Then merge those two axes:
a = np.reshape(a, (100, 18, 200, 200))
In this case, the 18 channels would be sorted as:
[red-split1, red-split2, red-split3, red-split4, red-split5, red-split6,
green-split1, ..., green-split6,
blue-split1, ..., blue-split6]
If you change the second instruction to:
a = np.moveaxis(a, 3, 1)
The channels would be sorted as:
[red-split1, green-split1, blue-split1,
...,
red-split6, green-split6, blue-split6]
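Both channel orderings can be checked against plain slicing of the original array. The sketch below uses a smaller array (4, 3, 20, 120) as a stand-in for (100, 3, 200, 1200); the variable names are illustrative and not from the answer.

```python
import numpy as np

n, channels, height, width, splits = 4, 3, 20, 120, 6
split_w = width // splits
a = np.random.random((n, channels, height, width))

# Split the width axis into (splits, split_w).
b = a.reshape(n, channels, height, splits, split_w)
# Channel-major order: red-split1..red-split6, green-split1.., blue-split1..
channel_major = np.moveaxis(b, 3, 2).reshape(n, channels * splits, height, split_w)
# Split-major order: red-split1, green-split1, blue-split1, red-split2, ...
split_major = np.moveaxis(b, 3, 1).reshape(n, splits * channels, height, split_w)

c, s = 1, 4  # green channel, fifth split
expected = a[:, c, :, s * split_w:(s + 1) * split_w]
print(np.array_equal(channel_major[:, c * splits + s], expected))  # True
print(np.array_equal(split_major[:, s * channels + c], expected))  # True
```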
I am new to Python.
I am confused as to what is happening with the following:
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
Where A.shape = (60L, 128L, 128L)
and B.shape = (2L, 60L, 128L, 5L)
I believe it is supposed to make some sort of image patch. Can someone explain to me what this does? This example is in the context of applying neural networks to images.
The shape of A tells me that A is most likely an array of 60 grayscale images (batch size 60), with each image having a size of 128x128 pixels.
We have: B = np.array([A[..., n:n+5] for n in (5*4, 5*5)]). To better understand what's happening here, let's unpack this line in reverse:
for n in (5*4, 5*5): This is the same as for n in (20, 25). The author probably chose to write it in this way for some intuitive reason related to the data or the rest of the code. This gives us n=20 and n=25.
A[..., n:n+5]: This is the same as A[:, :, n:n+5]. This gives us all the rows from all the images in A, but only the 5 columns at n:n+5. The shape of the resulting array is then (60, 128, 5).
n=20 gives us A[:, :, 20:25] and n=25 gives us A[:, :, 25:30]. Each of these arrays is therefore of size (60, 128, 5).
Together, [A[..., n:n+5] for n in (5*4, 5*5)] gives us a list (thanks list comprehension!) with two elements, each a numpy array of size (60, 128, 5). np.array() converts this list into a numpy array of shape (2, 60, 128, 5).
The result is that B contains 2 patches of each image, each a 5-pixel-wide subset of the original image's columns: one starting at column 20 and the other starting at column 25.
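The unpacking above can be confirmed on a stand-in array with the same shape as A. The contents here are arbitrary (np.arange is used only so that every element is distinct).

```python
import numpy as np

A = np.arange(60 * 128 * 128).reshape(60, 128, 128)
B = np.array([A[..., n:n + 5] for n in (5 * 4, 5 * 5)])

print(B.shape)                                # (2, 60, 128, 5)
print(np.array_equal(B[0], A[:, :, 20:25]))   # True
print(np.array_equal(B[1], A[:, :, 25:30]))   # True
# The same array, built explicitly with np.stack:
same = np.stack([A[:, :, 20:25], A[:, :, 25:30]])
print(np.array_equal(B, same))                # True
```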
I can't speculate as to the reason for this crop without further information about the network and its purpose.
Hope this helps!