I can declare a 3D array like this:
# a 3D array, shape (2, 2, 2)
array_3d = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
So if I have a 10x10 pixel image with 3 RGB channels, I would expect image.shape to be (3, 10, 10).
But I see image.shape equal to (10, 10, 3) all the time, and I don't understand why.
Thanks for your attention.
Usually in numpy and matplotlib the RGB channels are on the last axis. This is just a convention, so there is little you can do about it. If you use a program that follows the other convention (channels first), you can transform the image with:
channels_first_im = np.moveaxis(channels_last_im, -1, 0)
and the other way:
channels_last_im = np.moveaxis(channels_first_im, 0, -1)
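For example, a quick shape check of the round trip (a minimal sketch using a dummy 10x10 RGB array, not taken from the question):
import numpy as np

channels_last_im = np.zeros((10, 10, 3))                    # height x width x channels
channels_first_im = np.moveaxis(channels_last_im, -1, 0)    # channels x height x width
print(channels_first_im.shape)                              # (3, 10, 10)
print(np.moveaxis(channels_first_im, 0, -1).shape)          # (10, 10, 3)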
If you're confused about why image arrays are conventionally of shape (N, M, 3) instead of (3, N, M), let's look at how indexing would work in both of those scenarios.
Let's assume we have an array called image that represents a random colored image with a width and height of 100 pixels, and let's try to index it to access the value of the pixel at index (50, 50).
Channels First
import numpy as np
image = np.random.random((3, 100, 100)) #image.shape == (3, 100, 100)
pixel = image[:, 50, 50] #pixel.shape == (3,)
Channels Last
import numpy as np
image = np.random.random((100, 100, 3)) #image.shape == (100, 100, 3)
pixel = image[50, 50] #pixel.shape == (3,)
Having the channels as the last dimension of the array means that the individual pixel information is easier to index, whereas in the channels-first case we need to specify that we want the entire first dimension every time. These are inherently the same thing, but leaving the channels last lets us be less verbose about how we index the array.
Related
I am working on an image processing/building problem. I have a smaller image that I want to place into a larger one. As usual, the image is represented as a 3D array. This works fine with the following code (both element_pixels and image_pixels are 3D ndarrays with depth 3 representing RGB; element_pixels is equal to or smaller than image_pixels in the other dimensions):
element_pixels = element.get_pixels()
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :] = element_pixels
However, I want to treat black pixels in the element as transparent. The simplest way to do this seems to be to mask the element so I don't modify image_pixels where element_pixels is black. I tried the following, but I am tying myself in knots:
element_pixels = element.get_pixels()
b = np.all(element_pixels == [0, 0, 0], axis=-1)
black_pixels_mask = np.dstack([b,b,b])
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :][black_pixels_mask] = element_pixels
This looks to be correctly generating a mask but I can't figure out how to use it. I get the following error:
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :][black_pixels_mask] = element_pixels
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 3 dimensions
The masking kind-of works (i.e. runs without exceptions) if I replace the final = element_pixels with a constant, but I'm struggling to extrapolate this to a solution.
Extra detail of sizes
element_pixels.shape = (40, 40, 3)
image_pixels.shape = (100, 100, 3)
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :].shape = (40, 40, 3)
An MRE in 2D
This captures what I'm trying to do without the complexity of the extra dimension.
import numpy as np
bg = np.ones((10,10))*0.5
img = np.concatenate([np.zeros((5,1)),np.ones((5,1))], axis=1)
mask = img == 0
# copy the *non-zero* pixel values of img to a particular location in bg
bg[5:10,5:7][mask] = img # this throws exception
print(bg)
I discovered after some experimentation that the (perhaps obvious in hindsight) answer is that you have to apply the mask to both sides.
So taking my MRE:
import numpy as np
bg = np.ones((10,10))*0.5
img = np.concatenate([np.zeros((5,1)),np.ones((5,1))], axis=1)
mask = img > 0  # select the non-zero pixels of img
bg[5:10,5:7][mask] = img[mask]  # apply the mask on both sides
print(bg)
Or going back to my original code, the only line that changes is:
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :][~black_pixels_mask] = element_pixels[~black_pixels_mask]
Well you can use a 2d mask on a 3d array. So something like this will replace all black pixels of img with those of background.
img = np.random.randint(0, 2, (10, 10, 3))
background = np.random.randint(0, 2, (10, 10, 3))
mask = np.all(img == [0,0,0], axis=2)
img[mask] = background[mask]
I'm not sure I understand what is in image_pixels but I think you can do something similar.
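For instance, a sketch of the same idea applied to the variables from the question (element.position, element.height and element.width come from the question's code, so this is untested guesswork):
# the target region of the big image is a view, so assigning into it
# updates image_pixels in place
region = image_pixels[element.position[0]:element.position[0] + element.height,
                      element.position[1]:element.position[1] + element.width, :]
mask = ~np.all(element_pixels == [0, 0, 0], axis=-1)  # 2D mask: True where the element is NOT black
region[mask] = element_pixels[mask]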
I am not sure how to word this. I am sure there is an operation that describes what I am trying to do; I just don't have a lot of experience manipulating image arrays.
I have a 2D array (matrix) of 1s and 0s which specifies whether a group of pixels should be the color [255, 255, 255] or the color [0, 0, 0] in RGB. It seems like this should be a simple multiplication: I should be able to multiply my color by my matrix of 1s and 0s to make an image, but all the dot products and matrix multiplications I have tried have failed.
Here is a simple example of my 2D numpy array and the RGB array:
# 2D pixels array
[[0, 1],
 [1, 1]]
# rgb array
[[255, 255, 255]]
What I would want is the following 3D array
[[[0, 0, 0], [255, 255, 255]],
 [[255, 255, 255], [255, 255, 255]]]
This array has shape 2x2x3.
Here are the arrays for reproducibility and to make it easy for anyone willing to help.
pixel = np.array([0,1,1,1]).reshape(2,2)
rgb = np.array([255,255,255]).reshape(1,3)
How about reshaping pixel into a 3D matrix and using dot?
pixel = np.array([0,1,1,1]).reshape(2,2,-1)
rgb = np.array([255,255,255]).reshape(1,3)
pixel.dot(rgb)
Output
array([[[  0,   0,   0],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255]]])
If you are multiplying matrices element-wise, they must be broadcastable, meaning that numpy can convert their size into a common size. From the numpy documentation:
Broadcasting can be understood by four rules:
1. All input arrays with ndim smaller than the input array of largest ndim, have 1’s prepended to their shapes.
2. The size in each dimension of the output shape is the maximum of all the input sizes in that dimension.
3. An input can be used in the calculation if its size in a particular dimension either matches the output size in that dimension, or has value exactly 1.
4. If an input has a dimension size of 1 in its shape, the first data entry in that dimension will be used for all calculations along that dimension. In other words, the stepping machinery of the ufunc will simply not step along that dimension (the stride will be 0 for that dimension).
Your desired dimension structuring boils down to a three-dimensional array, where the dimension indices have the meaning (y coordinate, x coordinate, colour channel), where the colour channel index is either 0, 1, or 2 (for red, green, and blue, respectively).
To abide by the rules shown above and get the desired dimension structuring explained above, we need to ensure that our pixel array has three dimensions, of which the third dimension can have size 1 (see rule 3). The rgb array can stay the same, as dimensions will automatically be added at the front (see rule 1). Because of rule 2, the resulting array will take the size of the pixel array for the first two dimensions, and the size of the rgb array for the third dimension.
To add a third dimension to the pixel array, you can either use reshape (as the other answers show), or the expand_dims function. Then, you can simply do an elementwise multiplication between the arrays, and numpy will automatically broadcast the arrays using the rules discussed above. See the following example:
>>> pixel = np.array([
... [0, 1],
... [1, 1]
... ])
>>> rgb = np.array([255, 255, 255])
>>> pixel.shape
(2, 2)
>>> rgb.shape
(3,)
>>> pixel = np.expand_dims(pixel, axis=2) # add a third axis to get the correct shape
>>> pixel.shape
(2, 2, 1)
>>> image = rgb * pixel
>>> image.shape
(2, 2, 3)
>>> image
array([[[  0,   0,   0],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255]]])
Note that numpy prints the array a bit differently than you might expect, but you can simply verify that the array matches your desired array:
>>> image[0, 0, :]
array([0, 0, 0])
>>> image[0, 1, :]
array([255, 255, 255])
>>> image[1, 0, :]
array([255, 255, 255])
>>> image[1, 1, :]
array([255, 255, 255])
When multiplying matrices, the number of columns of the first matrix must match the number of rows of the second matrix, and the number of rows of the first matrix and the number of columns of the second matrix give the dimensions of the result.
An example: if you have a 1x3 matrix and you multiply it by a 3x4 matrix, this is a valid multiplication since the first matrix has 3 columns and the second matrix has 3 rows. The result will then be a 1x4 matrix, because the first matrix has a single row and the second has 4 columns.
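For instance, a quick shape check of that rule (a small illustrative snippet, not taken from the question):
import numpy as np

a = np.ones((1, 3))
b = np.ones((3, 4))
print((a @ b).shape)  # (1, 4): the 3 columns of a match the 3 rows of b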
For your example I think you need a nested loop to multiply each element in your pixel array with the RGB values. I don't know a great way to do this, but I think this may work.
pixel = np.array([0, 1, 1, 1]).reshape(2, 2)
rgb = np.array([255, 255, 255]).reshape(1, 3)
arr = np.zeros((2, 2, 3))
for i in range(2):
    for j in range(2):
        for k in range(3):
            arr[i][j][k] = pixel[i][j] * rgb[0][k]
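For comparison, the broadcasting approach from the other answers gives the same array without loops (using the same pixel and rgb as above):
arr = pixel[:, :, np.newaxis] * rgb[0]  # (2, 2, 1) * (3,) broadcasts to (2, 2, 3)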
I want to make a simple program which outputs a video as a webcam, but the cam wants an RGBA numpy array and I only have RGB from the video. How can I convert the 3-dimensional array to 4 dimensions?
You're actually not converting a 3-dimensional array to a 4-dimensional array. You're changing the size of one of the dimensions from three to four.
Let's say you have an NxMx3 image. You then need to:
temp = np.zeros((N, M, 4))
temp[:, :, 0:3] = image
temp[:, :, 3] = 1.0  # or whatever default alpha you choose to use
Generalize as you see fit.
Assuming your existing array is shaped (xsize, ysize, 3) and you want to create alpha as a 4th entry all filled with 1, you should be able to do something like
alpha = np.ones((*rgb.shape[0:2], 1))
rgba = np.concatenate((rgb, alpha), axis=2)
If you wanted a different uniform alpha value you could use np.full with that value instead of np.ones, but normally when converting RGB to RGBA you want fully opaque.
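For instance, a sketch with a hypothetical uniform alpha of 0.5 (any other value works the same way):
alpha = np.full((*rgb.shape[0:2], 1), 0.5)  # 0.5 is just an illustrative value
rgba = np.concatenate((rgb, alpha), axis=2)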
You can np.dstack your original im with np.ones(im.shape[:2])
new_im = np.dstack((im, np.ones(im.shape[:2])))
Update: this is equivalent to @hobbs' solution np.concatenate(..., axis=2).
Maybe try something like this (with import numpy as np):
arr # shape (n_bands, y_pixels, x_pixels)
swapped = np.moveaxis(arr, 0, 2) # shape (y_pixels, x_pixels, n_bands)
arr4d = np.expand_dims(swapped, 0) # shape (1, y_pixels, x_pixels, n_bands)
I'm trying to get image data from a convolution layer in TensorFlow.
I have an array like:
data = [N, Width, Height, Channel]
where N is the image number, Width and Height are the image dimensions, and Channel is the index of the channel.
What do I need is another 4D array like:
[N, Channel, Width, Height]
The reason is that I want to loop over N and Channel and get a 2D array of bytes for each channel of each image:
img = Image.fromarray(data[N][Channel], 'L')
img.save('my.png')
Use the transpose function to reorder the dimensions.
https://www.tensorflow.org/versions/r0.9/api_docs/python/array_ops.html#transpose
You would do something like this in the tensorflow code.
image = tf.transpose(image, perm = [0, 3, 1, 2])
In the perm parameter you specify the new order of dimensions you want. In this case you move the channels dimension (3) into the second position.
If you want to do it before inputting it into the tensorflow model you can use np.transpose in the same way.
Just use np.transpose:
x = np.zeros((32, 10, 10, 3))  # 32 images with 3 channels, size 10x10
res = np.transpose(x, (0, 3, 1, 2))
print(res.shape)  # prints (32, 3, 10, 10)
I'm checking with you whether there is a neat numpy-only solution for resizing down a 2D numpy array (which is an image) using bilinear filtering.
More specifically, my array has the shape (width, height, 4) (as in an RGBA image). The downscaling is also only done in "even" steps: i.e. from (w, h, 4) to (w/2, h/2, 4) to (w/4, h/4, 4), etc.
I've browsed around for quite some time now but everyone seems to refer to the scipy/PIL versions of imresize.
I want to minimize the number of dependencies on python packages, hence the numpy only requirement.
I just wanted to check with SO before I go implement it in C++ instead.
I don't think there is any specific solution in numpy, but you should be able to implement it efficiently without leaving the comfort of python. Correct me if I'm wrong, but when the size of the image is divisible by 2, a bilinear filter is basically the same as averaging 4 pixels of the original image to get 1 pixel of the new one, right? Well, if your image size is a power of two, then the following code:
from __future__ import division
import numpy as np
from PIL import Image
def halve_image(image):
    rows, cols, planes = image.shape
    image = image.astype('uint16')  # widen so the 2x2 sums don't overflow uint8
    image = image.reshape(rows // 2, 2, cols // 2, 2, planes)
    image = image.sum(axis=3).sum(axis=1)  # sum each 2x2 block
    return ((image + 2) >> 2).astype('uint8')  # divide by 4 with rounding

def mipmap(image):
    img = image.copy()
    rows, cols, planes = image.shape
    mipmap = np.zeros((rows, cols * 3 // 2, planes), dtype='uint8')
    mipmap[:, :cols, :] = img
    row = 0
    while rows > 1:
        img = halve_image(img)
        rows = img.shape[0]
        mipmap[row:row + rows, cols:cols + img.shape[1], :] = img
        row += rows
    return mipmap
img = np.asarray(Image.open('lena.png'))
Image.fromarray(mipmap(img)).save('lena_mipmap.png')
This produces a mipmap image (lena_mipmap.png) as output.
With an original image of 512x512, it runs on my system in:
In [3]: img.shape
Out[3]: (512, 512, 4)
In [4]: %timeit mipmap(img)
10 loops, best of 3: 154 ms per loop
This will not work if an odd side length ever comes up, but depending on exactly how you want to handle the downsampling in those cases, you should be able to drop a full row (or column) of pixels, then reshape your image to (rows // 2, 2, cols // 2, 2, planes), so that img[r, :, c, :, p] is a 2x2 matrix of values to interpolate into a new pixel value.
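For example, a minimal sketch of one such policy (crop a trailing row/column when a side is odd, then halve as before; this is just one option, not the only one):
def halve_image_any(image):
    rows, cols, planes = image.shape
    image = image[:rows - rows % 2, :cols - cols % 2]  # drop a trailing row/column if odd
    rows, cols = image.shape[:2]
    image = image.astype('uint16')
    image = image.reshape(rows // 2, 2, cols // 2, 2, planes)
    image = image.sum(axis=3).sum(axis=1)
    return ((image + 2) >> 2).astype('uint8')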