I'm working on an array shaped as follows
(64, 1, 64, 64)
This is in fact one grayscale image that was split into 64 patches, each patch with 64*64px.
Now I need to rebuild it into a 512*512px image.
I've tried using
np.reshape(arr, (512, 512))
but of course the resulting image is not as expected.
How do I resolve this?
It depends on how your patches are arranged. But the first thing you could try is
image.reshape(8, 8, 64, 64).swapaxes(1, 2).reshape(512, 512)
This is assuming that the original zeroth dimension lists the patches row by row, i.e. 0-7 are the first row of patches from left to right, 8-15 the second row and so on.
The first reshape re-establishes that arrangement: afterwards, choosing indices i, j for axes 0 and 1 addresses the (j+1)th patch in the (i+1)th row.
Now comes the interesting bit: When merging axes by reshape:
only adjacent dimensions can be combined
all but the rightmost axis in each block will be dispersed, i.e. only the rightmost axis of a block keeps its elements next to each other
Since we want to keep each patch together we have to rearrange in such a way that the current axes 2 and 3 become the rightmost members of blocks. That is what the swapaxes does.
By now the shape is (8, 64, 8, 64) and axes 1 and 3 are the within-patch coordinates. Combining the two pairs (8, 64 -> 512 and 8, 64 -> 512) is all that's left to do.
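To check the recipe, here is a small round-trip sketch: it cuts a known 512x512 array into patches and puts them back together. The variable names and the row-by-row patch order are assumptions for illustration; if your patches are stored in a different order, the reshape/swapaxes arguments will need adjusting.

```python
import numpy as np

# A known 512x512 "image" so the round trip can be checked exactly.
image = np.arange(512 * 512).reshape(512, 512)

# Split into 64 patches of 64x64, listed row by row (assumed order),
# with a singleton channel axis to match the (64, 1, 64, 64) layout.
patches = (
    image.reshape(8, 64, 8, 64)   # (patch_row, y, patch_col, x)
         .swapaxes(1, 2)          # (patch_row, patch_col, y, x)
         .reshape(64, 1, 64, 64)  # (patch_index, channel, y, x)
)

# Reassemble with the recipe from the answer.
rebuilt = patches.reshape(8, 8, 64, 64).swapaxes(1, 2).reshape(512, 512)

# The plain reshape from the question, for comparison.
naive = patches.reshape(512, 512)

print(np.array_equal(rebuilt, image))  # round trip recovers the image
print(np.array_equal(naive, image))    # plain reshape scrambles it
```

The plain reshape fails because it lays the rows of the first patch side by side instead of stacking them.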
I am trying to preprocess an RGB image before sending it into my model. The shape of the image is (2560, 1440, 3). For that I need to calculate the mean of every channel and subtract it from the corresponding channel's pixels. I know that I can do it by:
np.mean(image_array, axis=(0, 1)).
However, I cannot understand the process how it is being done.
I am aware of how axes work individually (axis=0 for columns and axis = 1 for rows). How does the axis = (0,1) work in this situation?
And also how can I do the same thing for multiple images, say, train_data_shape = (1000, 256, 256, 3)?
I appreciate every feedback!
Consider what happens when you have an array X of shape (5, 3) and you execute np.mean(X, axis=0). You’ll get back an array of shape (3,) where element i is the average of the 5 values in column i. You’re essentially ‘averaging out’ that first dimension. If you instead set axis=1, you’d get back an array of shape (5,) where element i is the average of the 3 values in row i: now you’re averaging out that second dimension. (Pass keepdims=True if you want the reduced axes kept with length 1, which gives shapes (1, 3) and (5, 1) instead.)
It works similarly when multiple axes are provided. Say X is of shape (5, 4, 2). Then, executing np.mean(X, axis=(0,1)) will return an array of shape (2,) where element i is the average of the sub-array X[:, :, i] (of shape (5, 4)). We’re averaging out the first two dimensions.
To answer your second question: If you want to compute means on an image-by-image and channel-by-channel basis, use axis=(1,2). If you want to compute means over all of your images per channel, use axis=(0,1,2).
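The following sketch illustrates these reductions and the subtraction step from the question. The array sizes are small stand-ins chosen for illustration (not the real (1000, 256, 256, 3) data), and the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Multi-axis mean on a (5, 4, 2) array: one value per last-axis index.
X = rng.random((5, 4, 2))
m = np.mean(X, axis=(0, 1))                                # shape (2,)
manual = np.array([X[:, :, i].mean() for i in range(2)])   # same values

# Small stand-in for a batch of images: (N, H, W, C).
train = rng.random((10, 8, 8, 3))

# One mean per image and channel; keepdims keeps the shape broadcastable.
per_image = train.mean(axis=(1, 2), keepdims=True)   # shape (10, 1, 1, 3)
centered_per_image = train - per_image

# One mean per channel over the whole data set.
per_channel = train.mean(axis=(0, 1, 2))             # shape (3,)
centered = train - per_channel                        # broadcasts over the last axis

print(m.shape, per_image.shape, per_channel.shape)
```

Subtracting per_channel works without keepdims because broadcasting aligns shapes from the last axis, which is the channel axis here.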
I have a tiled numpy array of shape (16, 32, 16, 16), that is each tile is 16x16 pixels in a grid 32 tiles wide and 16 high.
From here I want to reshape it to a 256 high x 512 wide 2D image, and I can't quite find the right incantation of splits, slices, and reshapes to get to what I want.
You can combine numpy's reshape and transpose to get this job done. I am not entirely sure which of the three "16"s belongs to the 32x16 repetition grid, but assuming it's the first one:
import numpy as np
data = np.random.random((16, 32, 16, 16))
# put number of repetitions next to respective dimension
transposed_data = np.transpose(data, (0, 2, 1, 3))
# concatenate repeated dimensions via reshape
reshaped_data = transposed_data.reshape((16 * 16, 32 * 16))
print(reshaped_data.shape)
I am working in Python and I have an image array which is of shape [100,3,200,1200]. The array is of format Number_of_images x Channels x Height x Width. I want to split the images along the width direction into 6 images of shape 200x200 and add that as different channels. Ultimately, I would like to receive an array of shape [100,18,200,200].
I've attempted to use the reshape function but it is not working as expected. I tried the following:
np.reshape(a, [100,18,200,200])
When I plot each image, I notice that it is not cropping the image the way I wanted it to.
First reshape to make the splits:
a = np.reshape(a, (100, 3, 200, 6, 200))
Then move the split axis beside the channel axis:
a = np.moveaxis(a, 3, 2)
Then merge those two axes:
a = np.reshape(a, (100, 18, 200, 200))
In this case, the 18 channels would be sorted as:
[red-split1, red-split2, red-split3, red-split4, red-split5, red-split6,
green-split1, ..., green-split6,
blue-split1, ..., blue-split6]
If you change the second instruction to:
a = np.moveaxis(a, 3, 1)
The channels would be sorted as:
[red-split1, green-split1, blue-split1,
...,
red-split6, green-split6, blue-split6]
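To see which output channel holds which piece, here is a small check of both orderings. The tiny sizes and the names (channel_major, split_major) are assumptions for illustration; the same index arithmetic applies to the (100, 3, 200, 1200) array.

```python
import numpy as np

# Tiny stand-in: n images, c channels, height h, k splits of width w.
n, c, h, k, w = 1, 3, 2, 6, 2
a = np.arange(n * c * h * k * w).reshape(n, c, h, k * w)

# Step 1: split the width axis into (k, w).
split = a.reshape(n, c, h, k, w)

# Variant A: split axis right after the channel axis -> channel varies slowest.
channel_major = np.moveaxis(split, 3, 2).reshape(n, c * k, h, w)

# Variant B: split axis before the channel axis -> split varies slowest.
split_major = np.moveaxis(split, 3, 1).reshape(n, c * k, h, w)

# Output channel index for original channel ch and split s:
#   channel_major: ch * k + s
#   split_major:   s * c + ch
ch, s = 1, 4  # e.g. green, fifth split
expected = a[:, ch, :, s * w:(s + 1) * w]
print(np.array_equal(channel_major[:, ch * k + s], expected))
print(np.array_equal(split_major[:, s * c + ch], expected))
```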
I am new to Python.
I am confused as to what is happening with the following:
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
Where A.shape = (60L, 128L, 128L)
and B.shape = (2L, 60L, 128L, 5L)
I believe it is supposed to make some sort of image patch. Can someone explain to me what this does? This example is in the context of applying neural networks to images.
The shape of A tells me that A is most likely an array of 60 grayscale images (batch size 60), with each image having a size of 128x128 pixels.
We have: B = np.array([A[..., n:n+5] for n in (5*4, 5*5)]). To better understand what's happening here, let's unpack this line in reverse:
for n in (5*4, 5*5): This is the same as for n in (20, 25). The author probably chose to write it in this way for some intuitive reason related to the data or the rest of the code. This gives us n=20 and n=25.
A[..., n:n+5]: This is the same as A[:, :, n:n+5]. This gives us all the rows from all the images in A, but only the 5 columns at n:n+5. The shape of the resulting array is then (60, 128, 5).
n=20 gives us A[:, :, 20:25] and n=25 gives us A[:, :, 25:30]. Each of these arrays is therefore of size (60, 128, 5).
Together, [A[..., n:n+5] for n in (5*4, 5*5)] gives us a list (thanks list comprehension!) with two elements, each a numpy array of size (60, 128, 5). np.array() converts this list into a numpy array of shape (2, 60, 128, 5).
The result is that B contains 2 patches of each image, each a 5-pixel-wide strip of columns from the original image: one starting at column 20 and the other starting at column 25.
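The walkthrough can be reproduced with a stand-in array; the contents of A below are made up, only its shape matches the question.

```python
import numpy as np

# Stand-in for A: 60 images of 128x128 with distinct values.
A = np.arange(60 * 128 * 128).reshape(60, 128, 128)

# The line from the question.
B = np.array([A[..., n:n + 5] for n in (5 * 4, 5 * 5)])

# The same thing written out with explicit slices.
B_explicit = np.stack([A[:, :, 20:25], A[:, :, 25:30]])

print(B.shape)                        # shape of the stacked patches
print(np.array_equal(B, B_explicit))  # both constructions agree
```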
I can't speculate as to the reason for this crop without further information about the network and its purpose.
Hope this helps!
I have some images I want to work with. The problem is that there are two kinds of images: all are 106 x 106 pixels, but some are in color and some are black and white.
one with only two (2) dimensions:
(106,106)
and one with three (3)
(106,106,3)
Is there a way I can strip this last dimension?
I tried np.delete, but it did not seem to work.
np.shape(np.delete(Xtrain[0], [2] , 2))
Out[67]: (106, 106, 2)
You could use numpy's basic indexing (an extension of Python's built-in slice notation):
x = np.zeros( (106, 106, 3) )
result = x[:, :, 0]
print(result.shape)
prints
(106, 106)
A shape of (106, 106, 3) means you have 3 sets of things that have shape (106, 106). So in order to "strip" the last dimension, you just have to pick one of these (that's what the indexing does).
You can keep any slice you want. I arbitrarily chose to keep the 0th, since you didn't specify what you wanted. So, result = x[:, :, 1] and result = x[:, :, 2] would give the desired shape as well: it all just depends on which slice you need to keep.
If your array has more dimensions, this might help:
pred_mask[0, ...]  # remove the first dimension
pred_mask[..., 0]  # remove the last dimension
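A short sketch of what the ellipsis forms return; the 4-D shape is an assumed example, not something from the question.

```python
import numpy as np

# Assumed example: a batch of 4 color images of 106x106 pixels.
x = np.zeros((4, 106, 106, 3))

first = x[0, ...]   # picks index 0 on the first axis
last = x[..., 0]    # picks index 0 on the last axis

print(first.shape)
print(last.shape)

# Indexing with integers and slices returns a view, not a copy,
# so writing into `last` also changes `x`.
shares = np.shares_memory(x, last)
print(shares)
```

Call .copy() on the result if you need an array that is independent of the original.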
Just take the mean value over the colors dimension (axis=2):
Xtrain_monochrome = Xtrain.mean(axis=2)
When the shape of your array is (106, 106, 3), you can visualize it as a table with 106 rows and 106 columns where each entry is an array of 3 numbers, which we can write as [x, y, z]. Therefore, if you want to get the dimensions (106, 106), the entries in your table must become single numbers instead of arrays. You can achieve this by extracting the x-, y- or z-component of each entry, or by applying a function that aggregates the three components, such as the mean, sum or max. You can extract any component just like @matt Messersmith suggested above.
well, you should be careful when you are trying to reduce the dimensions of an image.
An Image is normally a 3-D matrix that contains data of the RGB values of each pixel. If you want to reduce it to 2-D, what you really are doing is converting a colored RGB image into a grayscale image.
There are several ways to do this: you can take the maximum of the three values, the minimum, the average, the sum, etc., depending on the accuracy you want in your image. A common choice is to take a weighted average of the RGB values using the formula
Y = 0.299R + 0.587G + 0.114B
where R stands for RED, G is GREEN and B is BLUE. In numpy, this can be written as
new_image = img[:, :, 0]*0.299 + img[:, :, 1]*0.587 + img[:, :, 2]*0.114
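Since the question mixes (106, 106) and (106, 106, 3) images, a small helper can bring both to the same shape. This is only a sketch: to_grayscale is a hypothetical name, and it assumes the last axis is ordered R, G, B.

```python
import numpy as np

# Weights from the formula above (R, G, B order assumed).
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(img):
    """Hypothetical helper: return a 2-D float array for 2-D or RGB input."""
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        # Already a single-channel image: nothing to do.
        return img
    if img.ndim == 3 and img.shape[-1] == 3:
        # Weighted sum over the channel axis; same result as the
        # explicit img[:, :, 0]*0.299 + ... expression.
        return img @ WEIGHTS
    raise ValueError(f"unsupported image shape: {img.shape}")

rng = np.random.default_rng(0)
color = rng.random((106, 106, 3))
mono = rng.random((106, 106))

print(to_grayscale(color).shape)
print(to_grayscale(mono).shape)
```

With this, a list of mixed images can be stacked into one array, for example np.stack([to_grayscale(im) for im in images]), where images is whatever list you have loaded.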
Actually, np.delete does work if you apply it twice.
For example, if you want to preserve the first channel, you could run the following:
Xtrain = np.delete(Xtrain, 2, 2)  # remove index 2 (the third channel) along axis 2
print(Xtrain.shape)  # will now output (106, 106, 2)
# apply np.delete again, this time removing index 1 along axis 2
Xtrain = np.delete(Xtrain, 1, 2)
print(Xtrain.shape)  # will now output (106, 106, 1)
# you may finally squeeze your output to get a 2d array
Xtrain = Xtrain.squeeze()
print(Xtrain.shape) # will now output (106,106)
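np.delete also accepts a list of indices, so the two calls can be collapsed into one. The sketch below uses a made-up stand-in array (img) and compares the result with plain slicing.

```python
import numpy as np

# Stand-in for a single (106, 106, 3) image with distinct values.
img = np.arange(106 * 106 * 3).reshape(106, 106, 3)

# Remove channels 1 and 2 in one call, then drop the length-1 axis.
one_call = np.delete(img, [1, 2], axis=2).squeeze(axis=2)

# Plain indexing keeps channel 0 directly.
sliced = img[:, :, 0]

print(one_call.shape)
print(np.array_equal(one_call, sliced))
```

Note that np.delete always builds a new array, while the slice is a view of the original.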