I am new to Python.
I am confused as to what is happening with the following:
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
Where A.shape = (60L, 128L, 128L)
and B.shape = (2L, 60L, 128L, 5L)
I believe it is supposed to make some sort of image patch. Can someone explain to me what this does? This example is in the context of applying neural networks to images.
The shape of A tells me that A is most likely an array of 60 grayscale images (batch size 60), with each image having a size of 128x128 pixels.
We have: B = np.array([A[..., n:n+5] for n in (5*4, 5*5)]). To better understand what's happening here, let's unpack this line in reverse:
for n in (5*4, 5*5): This is the same as for n in (20, 25). The author probably chose to write it in this way for some intuitive reason related to the data or the rest of the code. This gives us n=20 and n=25.
A[..., n:n+5]: This is the same as A[:, :, n:n+5]. This gives us all the rows from all the images in A, but only the 5 columns at n:n+5. The shape of the resulting array is then (60, 128, 5).
n=20 gives us A[:, :, 20:25] and n=25 gives us A[:, :, 25:30]. Each of these arrays is therefore of size (60, 128, 5).
Together, [A[..., n:n+5] for n in (5*4, 5*5)] gives us a list (thanks list comprehension!) with two elements, each a numpy array of size (60, 128, 5). np.array() converts this list into a numpy array of shape (2, 60, 128, 5).
The result is that B contains 2 patches of each image, each a 5-column-wide subset of the original image: one starting at column 20 and the other starting at column 25.
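A minimal sketch to verify the shapes, using random data in place of the real images:

import numpy as np

A = np.random.rand(60, 128, 128)  # stand-in for the batch of 60 grayscale images
B = np.array([A[..., n:n+5] for n in (5*4, 5*5)])
print(B.shape)  # (2, 60, 128, 5)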
I can't speculate on the reason for this crop without further information about the network and its purpose.
Hope this helps!
Related
I am trying to preprocess an RGB image before sending it into my model. The shape of the image is (2560, 1440, 3). For that I need to calculate the mean of every channel and subtract it from the pixels of the corresponding channel. I know that I can do it by:
np.mean(image_array, axis=(0, 1)).
However, I cannot understand the process how it is being done.
I am aware of how axes work individually (axis=0 for columns and axis=1 for rows). How does axis=(0, 1) work in this situation?
And also how can I do the same thing for multiple images, say, train_data_shape = (1000, 256, 256, 3)?
I appreciate every feedback!
Consider what happens when you have an array X of shape (5, 3) and you execute np.mean(X, axis=0). You'll get back an array of shape (3,) where element i is the average of the 5 values in column i. You're essentially 'averaging out' that first dimension. If you instead set axis=1, you'd get back an array of shape (5,) where element i is the average of the 3 values in row i - now you're averaging out the second dimension. (Pass keepdims=True if you want the reduced axes kept as size 1, i.e. shapes (1, 3) and (5, 1).)
It works similarly when multiple axes are provided. Say X is of shape (5, 4, 2). Then, executing np.mean(X, axis=(0, 1)) will return an array of shape (2,) where element i is the average of the sub-array X[:, :, i] (of shape (5, 4)). We're averaging out the first two dimensions.
To answer your second question: if you want to compute means on an image-by-image, channel-by-channel basis, use axis=(1, 2). If you want a single mean per channel over all of your images, use axis=(0, 1, 2).
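A minimal sketch of all three cases, with random data standing in for the real images:

import numpy as np

image = np.random.rand(2560, 1440, 3)          # a single RGB image
channel_means = image.mean(axis=(0, 1))        # shape (3,): one mean per channel
centered = image - channel_means               # broadcasts across all pixels

train_data = np.random.rand(10, 256, 256, 3)   # small stand-in for the (1000, ...) batch
per_image = train_data.mean(axis=(1, 2))       # shape (10, 3): per image, per channel
per_channel = train_data.mean(axis=(0, 1, 2))  # shape (3,): per channel, over all images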
I have a list itemlist that has 25 3D arrays with shape (128, 128, 3).
I want to convert/merge all these values into a single common array, basically creating an image out of it. I'm expecting the new shape to be (640, 640, 3), i.e. 5 rows and 5 columns of (128, 128) tiles.
I tried the following, but it is giving weird results, mostly repeating some arrays:
out = np.concatenate(itemlist).ravel()
out.shape ##(1228800,)
img = np.reshape(out, (640,640,3))
img.shape ## (640, 640, 3)
The final shape I get is correct, but visually it looks like a set of repeated images. Is something wrong with my logic?
With 25 (128, 128, 3) arrays,
out = np.concatenate(itemlist)
should produce a (25*128, 128, 3) array, and
out = out.ravel()
should produce a flat array of 25*128*128*3 elements:
out.shape ##(1228800,)
Reshaping that to (640, 640, 3) matches the total number of elements, but it will not produce a meaningful image: the flat values are read back in row-major order, so each output row packs about five consecutive rows of a single tile side by side, which is why you see squashed, repeated-looking copies instead of a 5x5 grid.
Working backwards from the target shape:
(5*128, 5*128, 3) => (5, 128, 5, 128, 3) => (5, 5, 128, 128, 3) => (25, 128, 128, 3)
That requires a couple of reshapes and one transpose, applied in reverse (see the sketch below).
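A sketch of those steps, assuming the tiles in itemlist are ordered row by row (tiles 0-4 forming the first row of the final image):

import numpy as np

itemlist = [np.random.rand(128, 128, 3) for _ in range(25)]  # stand-in tiles

tiles = np.stack(itemlist)               # (25, 128, 128, 3)
img = (tiles.reshape(5, 5, 128, 128, 3)  # (tile_row, tile_col, h, w, c)
            .transpose(0, 2, 1, 3, 4)    # (tile_row, h, tile_col, w, c)
            .reshape(640, 640, 3))       # merge the paired axes
print(img.shape)                         # (640, 640, 3)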
Alright - assume I have two numpy arrays, shapes are:
(185, 100, 50, 3)
(64, 100, 50, 3)
The values contained are 185 or 64 frames of video (for each frame, the width is 100 pixels, the height is 50, and there are 3 channels; these are just images). The specifics of the images remain constant - the only value that changes is the number of frames per video.
I need to get them both into a single array of some shape like
(2, n, 100, 50, 3)
Where both videos are contained (to run through a neural net as a batch)
I've already tried using np.stack - but I get
ValueError: all input arrays must have the same shape
This is a quick brainstorm idea that I've got, along with a strategy and Python code. Note: I was going to stick to just a comment, but to illustrate this idea I need to type in some code. So here we go! (Grabbing a coffee / a strong drink is recommended...)
Current State
we have video 1 vid1 with 4D shape (185, 100, 50, 3)
we have video 2 vid2 with 4D shape (64, 100, 50, 3)
... where the shape represents (frame ID, width, height, RGB channels)
we want to "stack" the two videos together as one numpy array with 5D shape (2, n, 100, 50, 3). Note: 2 because we are stacking 2 videos. n is a hyperparameter that we can choose. We keep the video size the same (100 width x 50 height x 3 RGB channels)
Opportunities
The first thing I see is that vid1 has roughly 3 times more frames than vid2. What if we use 60 as the common factor? i.e. let's set our hyperparameter n to 60. (Note: some "frame cropping" / throwing away of frames may be required - this is covered below.)
Strategy
Phase 1 - Crop both videos (throw away some frames)
Let's crop both vid1 and vid2 down to frame counts that are multiples of 60 (our n - the hyperparameter). Concretely:
crop vid1 so that the shape becomes (180, 100, 50, 3). (i.e. we throw away the last 5 frames). We call this new cropped video vid1_cropped.
crop vid2 so that the shape becomes (60, 100, 50, 3). (i.e. we throw away the last 4 frames). We call this new cropped video vid2_cropped.
Phase 2 - Make both videos 60 frames
vid2_cropped is already at 60 frames, with shape (60, 100, 50, 3). So we leave this alone.
vid1_cropped, however, is at 180 frames. So I suggest we reduce this video to 60 frames, by averaging the RGB channel values in 3-frame batches - for all pixel positions (along width and height). What we get at the end of this process is a somewhat "diluted" (averaged) video with the same shape as vid2_cropped - (60, 100, 50, 3). Let's call this diluted video vid1_cropped_diluted.
Phase 3 - Stack the two same-shape videos together
Now that both vid2_cropped and vid1_cropped_diluted are of the same 4D shape (60, 100, 50, 3), we may stack them together to obtain our final numpy array of 5D shape (2, 60, 100, 50, 3) - let's call this vids_combined.
We are done!
Demo
Turning the strategy into code. I did this in Python 3.6 (with Jupyter Notebook / Jupyter Console).
Some notes:
I have yet to fully validate the code (and will revise as needed). In the meantime, if you see any bugs please shout - I will be happy to update.
the subtle part is line 10 below, the "diluting" (np.average) step. To average consecutive 3-frame batches for all pixel positions, we reshape to (60, 3, 100, 50, 3) and average over axis=1. (Reshaping to (3, 60, ...) and averaging over axis=0 would instead average frames that are 60 apart, which is not what we want.)
this post illustrates the concepts and some code implementation. Ideally I would have stepped through this in more depth, via much smaller video sizes, so we could build better intuition / visualise each step, pixel by pixel. (I might come back to this when I have time.) For now, I believe the numpy array shape analysis is sufficient to convey the idea.
In [1]: import numpy as np
In [2]: vid1 = np.random.random((185, 100, 50, 3))
In [3]: vid1.shape
Out[3]: (185, 100, 50, 3)
In [4]: vid2 = np.random.random((64, 100, 50, 3))
In [5]: vid2.shape
Out[5]: (64, 100, 50, 3)
In [6]: vid1_cropped = vid1[:180]
In [7]: vid1_cropped.shape
Out[7]: (180, 100, 50, 3)
In [8]: vid2_cropped = vid2[:60]
In [9]: vid2_cropped.shape
Out[9]: (60, 100, 50, 3)
In [10]: vid1_cropped_diluted = np.average(vid1_cropped.reshape(60,3,100,50,3),
    ...:                                    axis=1)
In [11]: vid1_cropped_diluted.shape
Out[11]: (60, 100, 50, 3)
In [12]: vids_combined = np.stack([vid1_cropped_diluted, vid2_cropped])
In [13]: vids_combined.shape
Out[13]: (2, 60, 100, 50, 3)
You can't stack arrays with different shapes, since the result needs a single size for each dimension.
Your options are therefore to:
not use arrays at all, and simply use a python list
pad the arrays with zeros until they are the same shape
resample the arrays to the same shape
Atlas7's answer is an implementation of option 3, but you'd probably do better to use scipy.ndimage.zoom in some way, for a more flexible solution.
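For instance, a rough sketch of option 3 using scipy.ndimage.zoom to resample both videos along the time axis to a common length (the choice of n = 100 frames is arbitrary here):

import numpy as np
from scipy.ndimage import zoom

vid1 = np.random.rand(185, 100, 50, 3)
vid2 = np.random.rand(64, 100, 50, 3)

n = 100  # target frame count - an arbitrary choice
# One zoom factor per axis: rescale time, leave width/height/channels alone.
vid1_r = zoom(vid1, (n / vid1.shape[0], 1, 1, 1))
vid2_r = zoom(vid2, (n / vid2.shape[0], 1, 1, 1))

batch = np.stack([vid1_r, vid2_r])
print(batch.shape)  # (2, 100, 100, 50, 3)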
I have some images I want to work with. The problem is that there are two kinds of images: both are 106 x 106 pixels, but some are in color and some are black and white.
one with only two (2) dimensions:
(106,106)
and one with three (3)
(106,106,3)
Is there a way I can strip this last dimension?
I tried np.delete, but it did not seem to work.
np.shape(np.delete(Xtrain[0], [2] , 2))
Out[67]: (106, 106, 2)
You could use numpy's slicing (an extension of Python's built-in slice notation):
x = np.zeros( (106, 106, 3) )
result = x[:, :, 0]
print(result.shape)
prints
(106, 106)
A shape of (106, 106, 3) means you have 3 sets of things that have shape (106, 106). So in order to "strip" the last dimension, you just have to pick one of these (that's what the slicing above does).
You can keep any slice you want. I arbitrarily choose to keep the 0th, since you didn't specify what you wanted. So, result = x[:, :, 1] and result = x[:, :, 2] would give the desired shape as well: it all just depends on which slice you need to keep.
If you have more dimensions, this might help:
pred_mask[0, ...]  # remove the first dim
pred_mask[..., 0]  # remove the last dim
Just take the mean value over the color dimension (axis=2):
Xtrain_monochrome = Xtrain.mean(axis=2)
When the shape of your array is (106, 106, 3), you can visualize it as a table with 106 rows and 106 columns, where each cell holds a data point that is an array of 3 numbers, which we can represent as [x, y, z]. Therefore, if you want to get the dimensions (106, 106), you must turn the data points in your table into single numbers rather than arrays. You can achieve this by extracting either the x-component, y-component or z-component of each data point, or by applying a function that somehow aggregates the three components, like the mean, sum, max etc. You can extract any component just like @Matt Messersmith suggested above.
Well, you should be careful when you are trying to reduce the dimensions of an image.
An image is normally a 3-D array that contains the RGB values of each pixel. If you want to reduce it to 2-D, what you are really doing is converting a colored RGB image into a grayscale image.
There are several ways to do this: you can take the maximum of the three, the min, the average, the sum, etc., depending on the accuracy you want in your image. A standard approach is to take a weighted average of the RGB values using the formula
Y = 0.299R + 0.587G + 0.114B
where R stands for RED, G is GREEN and B is BLUE. In numpy, this can be written as
new_image = img[:, :, 0]*0.299 + img[:, :, 1]*0.587 + img[:, :, 2]*0.114
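Equivalently, since the formula is just a dot product over the channel axis:

new_image = img @ np.array([0.299, 0.587, 0.114])  # same result, shape (106, 106)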
Actually np.delete would work if you apply it twice.
If you want to preserve the first channel, for example, you could run the following:
Xtrain = np.delete(Xtrain,2,2) # removes index 2 along the 3rd axis (the last channel)
print(Xtrain.shape) # will now output (106,106,2)
# again we apply np.delete, this time removing index 1 along the 3rd axis
Xtrain = np.delete(Xtrain,1,2)
print(Xtrain.shape) # will now output (106,106,1)
# you may finally squeeze your output to get a 2d array
Xtrain = Xtrain.squeeze()
print(Xtrain.shape) # will now output (106,106)
I have the following 3rd-order tensors. Both tensors hold matrices: the first contains 100 10x9 matrices and the second contains 100 3x10 matrices (which I have just filled with ones for this example).
My aim is to multiply the matrices as they line up, in one-to-one correspondence, which would result in a tensor with shape (100, 3, 9). This can be done with a for loop that just zips up both tensors and takes the dot of each pair, but I am looking to do this with numpy operators alone. So far, here are some failed attempts.
Attempt 1:
import numpy as np
T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))
print(T2.dot(T1).shape)
Output of attempt 1:
(100, 3, 100, 9)
Which means it tried all possible combinations ... which is not what I am after.
Actually, none of the other attempts even run. I tried using np.tensordot and np.einsum (I read at https://jameshensman.wordpress.com/2010/06/14/multiple-matrix-multiplication-in-numpy that it is supposed to do the job, but I did not get Einstein's indices correct). The same link also shows some crazy tensor-cube reshaping method that I did not manage to visualize. Any suggestions / ideas / explanations on how to tackle this?
Did you try?
In [96]: np.einsum('ijk,ilj->ilk',T1,T2).shape
Out[96]: (100, 3, 9)
The way I figure this out is to look at the shapes:
(100, 10, 9)   (i, j, k)
(100, 3, 10)   (i, l, j)
-------------
(100, 3, 9)    (i, l, k)
The two j's are summed over and drop out; the other indices carry through to the output.
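A quick check against the zip-and-dot loop from the question, using random data so the comparison is meaningful:

import numpy as np

T1 = np.random.rand(100, 10, 9)
T2 = np.random.rand(100, 3, 10)

out = np.einsum('ijk,ilj->ilk', T1, T2)              # (100, 3, 9)
loop = np.array([b.dot(a) for a, b in zip(T1, T2)])  # the explicit per-matrix dot
print(out.shape, np.allclose(out, loop))             # (100, 3, 9) True

Incidentally, np.matmul (the @ operator) treats leading axes as batch dimensions, so T2 @ T1 gives the same (100, 3, 9) result directly.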
For 4d arrays, with dimensions like (100, 3, 2, 24), there are several options:
Reshape to 3d with T1.reshape(300, 2, 24), and afterwards reshape back with R.reshape(100, 3, ...). Reshape is virtually costless, and a good numpy tool.
Add an index to einsum: np.einsum('hijk,hilj->hilk',T1,T2), just a parallel usage to that of i.
Or use an ellipsis: np.einsum('...jk,...lj->...lk', T1, T2). This expression works with 3d, 4d, and up (a quick sketch follows).
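A quick sketch of the 4d case, checking that the explicit and ellipsis spellings agree (the extra dimension sizes here are arbitrary choices):

import numpy as np

T1 = np.random.rand(100, 3, 2, 24)  # (h, i, j, k)
T2 = np.random.rand(100, 3, 5, 2)   # (h, i, l, j)

a = np.einsum('hijk,hilj->hilk', T1, T2)
b = np.einsum('...jk,...lj->...lk', T1, T2)
print(a.shape, np.allclose(a, b))   # (100, 3, 5, 24) True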