Numpy remove a dimension from np array - python

I have some images I want to work with. The problem is that there are two kinds of images: both are 106 x 106 pixels, but some are in color and some are black and white.
The black-and-white ones have only two (2) dimensions:
(106, 106)
and the color ones have three (3):
(106, 106, 3)
Is there a way I can strip this last dimension?
I tried np.delete, but it did not seem to work.
np.shape(np.delete(Xtrain[0], [2], 2))
Out[67]: (106, 106, 2)

You could use NumPy's multidimensional slicing (an extension of Python's built-in slice notation):
x = np.zeros( (106, 106, 3) )
result = x[:, :, 0]
print(result.shape)
prints
(106, 106)
A shape of (106, 106, 3) means you have 3 sets of things that have shape (106, 106). So in order to "strip" the last dimension, you just have to pick one of these (that's what the slicing does).
You can keep any slice you want. I arbitrarily choose to keep the 0th, since you didn't specify what you wanted. So, result = x[:, :, 1] and result = x[:, :, 2] would give the desired shape as well: it all just depends on which slice you need to keep.
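Since the dataset in the question mixes both kinds of images, a small helper along these lines normalizes everything to (106, 106). This is just a sketch; the plain-list layout and the names are assumptions:
import numpy as np

def to_2d(img):
    # return a (106, 106) array whether img is grayscale or RGB
    if img.ndim == 3:        # (106, 106, 3): keep a single channel
        return img[:, :, 0]
    return img               # already (106, 106)

images = [np.zeros((106, 106)), np.zeros((106, 106, 3))]  # stand-in data
print([to_2d(im).shape for im in images])  # [(106, 106), (106, 106)]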

If your array has more than two dimensions, this might help:
pred_mask[0, ...]  # remove the first dimension
pred_mask[..., 0]  # remove the last dimension
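For example (pred_mask here is just a stand-in array to show the shapes):
import numpy as np

pred_mask = np.zeros((1, 224, 224, 20))  # hypothetical prediction tensor
print(pred_mask[0, ...].shape)           # (224, 224, 20): first dim removed
print(pred_mask[..., 0].shape)           # (1, 224, 224): last dim removed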

Just take the mean value over the color dimension (axis=2):
Xtrain_monochrome = Xtrain.mean(axis=2)

When the shape of your array is (106, 106, 3), you can visualize it as a table with 106 rows and 106 columns, where each data point is an array of 3 numbers that we can represent as [x, y, z]. Therefore, if you want to get to the shape (106, 106), you must turn the data points in your table from arrays into single numbers. You can achieve this by extracting a single component of each data point (the x-, y-, or z-component), or by applying a function that aggregates the three components, like the mean, sum, max, etc. You can extract any single component just like @Matt Messersmith suggested above.

Well, you should be careful when you are trying to reduce the dimensions of an image.
An image is normally a 3-D array that contains the RGB values of each pixel. If you want to reduce it to 2-D, what you are really doing is converting a colored RGB image into a grayscale image.
There are several ways to do this: you can take the maximum of the three channels, the minimum, the average, the sum, etc., depending on how faithful you want the grayscale image to be. A common choice is a weighted average of the RGB values using the formula
Y = 0.299R + 0.587G + 0.114B
where R stands for red, G for green, and B for blue. In NumPy, this can be written as
new_image = img[:, :, 0]*0.299 + img[:, :, 1]*0.587 + img[:, :, 2]*0.114
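Equivalently, the weighted average can be written as a dot product over the last axis (a sketch, assuming the channels are in RGB order):
import numpy as np

img = np.random.rand(106, 106, 3)  # stand-in RGB image
new_image = np.dot(img[..., :3], [0.299, 0.587, 0.114])
print(new_image.shape)             # (106, 106)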

Actually, np.delete would work if you applied it twice.
If you want to preserve the first channel, for example, you could run the following:
Xtrain = np.delete(Xtrain, 2, axis=2)  # remove index 2 along the third axis
print(Xtrain.shape)                    # now outputs (106, 106, 2)
# apply np.delete again, this time removing index 1 along the third axis
Xtrain = np.delete(Xtrain, 1, axis=2)
print(Xtrain.shape)                    # now outputs (106, 106, 1)
# finally, squeeze the output to get a 2-D array
Xtrain = Xtrain.squeeze()
print(Xtrain.shape)                    # now outputs (106, 106)
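As a side note, np.delete also accepts a list of indices, so the two calls above can be collapsed into one (same result, shown here with stand-in data):
import numpy as np

Xtrain = np.zeros((106, 106, 3))            # stand-in for one image
Xtrain = np.delete(Xtrain, [1, 2], axis=2)  # drop channels 1 and 2 at once
print(Xtrain.squeeze().shape)               # (106, 106)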

Related

I am trying to understand the process of calculating the mean for every channel of an RGB image

I am trying to preprocess an RGB image before sending it into my model. The shape of the image is (2560, 1440, 3). For that I need to calculate the mean of every channel and subtract it from the corresponding channel's pixels. I know that I can do it by:
np.mean(image_array, axis=(0, 1)).
However, I cannot understand how this is actually being done.
I am aware of how axes work individually (axis=0 for columns and axis = 1 for rows). How does the axis = (0,1) work in this situation?
And also how can I do the same thing for multiple images, say, train_data_shape = (1000, 256, 256, 3)?
I appreciate every feedback!
Consider what happens when you have an array X of shape (5, 3) and you execute np.mean(X, axis=0). You'll get back an array of shape (3,) where the i-th element is the average of the 5 values in column i. You're essentially 'averaging out' that first dimension. If you instead set axis=1, you'd get back an array of shape (5,) where the i-th element is the average of the 3 values in row i; now, you're averaging out that second dimension.
It works similarly when multiple axes are provided. Say X is of shape (5, 4, 2). Then, executing np.mean(X, axis=(0, 1)) will return an array of shape (2,) where the i-th element is the average of the sub-array X[:, :, i] (of shape (5, 4)). We're averaging out the first two dimensions.
To answer your second question: If you want to compute means on an image-by-image and channel-by-channel basis, use axis=(1,2). If you want to compute means over all of your images per channel, use axis=(0,1,2).
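A quick sketch of all three cases with small stand-in arrays:
import numpy as np

image = np.random.rand(64, 48, 3)                 # stands in for (2560, 1440, 3)
print(np.mean(image, axis=(0, 1)).shape)          # (3,): one mean per channel

train_data = np.random.rand(10, 32, 32, 3)        # stands in for (1000, 256, 256, 3)
print(np.mean(train_data, axis=(1, 2)).shape)     # (10, 3): per image, per channel
print(np.mean(train_data, axis=(0, 1, 2)).shape)  # (3,): per channel, over all images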

Get the value at a position from all layers in python

I have 3 numpy arrays of shape (224, 224, 20). I want to go through each of the (224, 224) positions in all 20 layers (dimensions) and compare the values to get the highest among them. For 3 dimensions, I am able to come up with this:
arr1 = np.array([[[1,2,3],[4,5,6]],[[10,11,12],[15,16,17]]])
for x in range(0, 2):
    for y in range(0, 2):
        print(arr1[:, x, y])
But, I somehow couldn't understand how to convert it for (224,224,20) shaped arrays.
I also need the index of the layer which contains the maximum value.
To get max values along one dimension, you can use numpy.amax; check out:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.amax.html
You can do this with numpy.max instead of a for loop:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.max.html
np.max(arr1, axis=2)
To get the index, use numpy.argmax
https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html
np.argmax(arr1, axis=2)
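Putting both together on the shape from the question (random data stands in for the real layers):
import numpy as np

arr = np.random.rand(224, 224, 20)
max_vals = np.max(arr, axis=2)      # (224, 224): highest value at each position
max_layer = np.argmax(arr, axis=2)  # (224, 224): index of the layer holding it
print(max_vals.shape, max_layer.shape)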

How to transform 3rd dimension of Numpy array loaded with ImageIO

I have a numpy array of shape (128,128,3) loaded from a PNG using ImageIO.
Dimension 3 seems to represent RGB values. In this instance all values for Dimension 3 are either [255,255,255] or [0,0,0] (i.e. white or black).
I want to get rid of the third dimension and replace it with a single 1D array containing 0 for black, and 1 for white. So the end result shape should be (128,128,1).
I've attempted to use combinations of numpy.reshape and numpy.transpose but I'm really struggling to understand how to do this. I am a beginner to numpy and Python so I may be missing something very simple.
You aren't missing anything, and it is very simple. Just index the channel you want:
im[:, :, 0]
To convert to zeros and ones, you can either make a boolean array:
im[:, :, 0].astype(bool)
or set 255 to one:
im = im[:, :, 0]
im[im > 0] = 1
A more advanced approach to creating a boolean array would be to view the underlying data as a boolean. This will only work well out of the box if the input is uint8:
im[:, :, 0].view(dtype=bool)
Finally, to index the last dimension of an N dimensional array, you can use ellipsis:
im[..., 0]
... (or the actual name Ellipsis) in an index means "use : for all dimensions not listed explicitly." You can use it at most once in an index.
In general, you will want to read the documentation on indexing and later on broadcasting. There are gentler introductions out there, but the numpy documentation is quite comprehensive and straight from the horse's mouth.
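Putting it together for the (128, 128, 1) shape asked for, a sketch with synthetic data standing in for the PNG:
import numpy as np

mask = np.random.randint(0, 2, (128, 128), dtype=np.uint8) * 255
im = np.stack([mask, mask, mask], axis=-1)   # fake black/white RGB image
result = (im[..., :1] > 0).astype(np.uint8)  # the :1 slice keeps the axis; 255 -> 1
print(result.shape)                          # (128, 128, 1)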

What's the meaning of "reshape(-1,1,2)"

x = np.linspace(0,10, 5)
y = 2*x
points = np.array([x, y]).T.reshape(-1, 1, 2)
What's the meaning of the third line? I know what reshape(m, n) means, but what does reshape(-1, 1, 2) mean?
Your question is not entirely clear, so I'm guessing the -1 part is what troubles you.
From the documentation:
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
The meaning of the whole line is this (breaking it down for simplicity):
points = np.array([x, y]) -> creates a 2 x 5 np.array consisting of x and y
.T -> transposes it
.reshape(-1, 1, 2) -> reshapes it, in this case to a 5 x 1 x 2 array (as can be seen from the output of points.shape, (5L, 1L, 2L))
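To see it concretely:
import numpy as np

x = np.linspace(0, 10, 5)
y = 2 * x
points = np.array([x, y]).T.reshape(-1, 1, 2)
print(points.shape)  # (5, 1, 2): 5 points, each wrapped in its own length-1 row
print(points[0])     # [[0. 0.]]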
vertices = np.array([[100,300],[200,200],[400,300],[200,400]],np.int32)
vertices.shape
pts = vertices.reshape((-1,1,2))
Consider the above code: here we have created a set of vertices to be plotted on an image using OpenCV, but OpenCV expects a 3-D array while we only have the vertices in a 2-D array. So .reshape((-1, 1, 2)) keeps the original data intact while adding a third dimension to the array (notice the extra brackets added to the list). This extra length-1 dimension is simply the layout OpenCV's drawing functions expect for point arrays; it does not hold any new data.
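For completeness, here is a minimal sketch of how such an (N, 1, 2) array is typically passed to an OpenCV drawing function (assuming the opencv-python package is installed):
import numpy as np
import cv2

vertices = np.array([[100, 300], [200, 200], [400, 300], [200, 400]], np.int32)
pts = vertices.reshape((-1, 1, 2))  # (4, 1, 2): the layout OpenCV expects

canvas = np.zeros((500, 500, 3), dtype=np.uint8)  # blank image to draw on
cv2.polylines(canvas, [pts], isClosed=True, color=(0, 255, 0), thickness=2)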

Python np.array example

I am new to Python.
I am confused as to what is happening with the following:
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
Where A.shape = (60L, 128L, 128L)
and B.shape = (2L, 60L, 128L, 5L)
I believe it is supposed to make some sort of image patch. Can someone explain to me what this does? This example is in the context of applying neural networks to images.
The shape of A tells me that A is most likely an array of 60 grayscale images (batch size 60), with each image having a size of 128x128 pixels.
We have: B = np.array([A[..., n:n+5] for n in (5*4, 5*5)]). To better understand what's happening here, let's unpack this line in reverse:
for n in (5*4, 5*5): This is the same as for n in (20, 25). The author probably chose to write it in this way for some intuitive reason related to the data or the rest of the code. This gives us n=20 and n=25.
A[..., n:n+5]: This is the same as A[:, :, n:n+5]. This gives us all the rows of all the images in A, but only the 5 columns at n:n+5. The shape of the resulting array is then (60, 128, 5).
n=20 gives us A[:, :, 20:25] and n=25 gives us A[:, :, 25:30]. Each of these arrays is therefore of size (60, 128, 5).
Together, [A[..., n:n+5] for n in (5*4, 5*5)] gives us a list (thanks list comprehension!) with two elements, each a numpy array of size (60, 128, 5). np.array() converts this list into a numpy array of shape (2, 60, 128, 5).
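To verify the shapes with stand-in data:
import numpy as np

A = np.random.rand(60, 128, 128)  # stands in for the real data
B = np.array([A[..., n:n + 5] for n in (5 * 4, 5 * 5)])
print(B.shape)                    # (2, 60, 128, 5)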
The result is that B contains 2 patches of each image, each a 5-pixel-wide column subset of the original image: one starting at column 20 and the second starting at column 25.
I can't speculate on the reason for this crop without further information about the network and its purpose.
Hope this helps!
