How to transform 3rd dimension of Numpy array loaded with ImageIO - python

I have a numpy array of shape (128,128,3) loaded from a PNG using ImageIO.
Dimension 3 seems to represent RGB values. In this instance all values for Dimension 3 are either [255,255,255] or [0,0,0] (i.e. white or black).
I want to get rid of the third dimension and replace it with a single 1D array containing 0 for black, and 1 for white. So the end result shape should be (128,128,1).
I've attempted to use combinations of numpy.reshape and numpy.transpose but I'm really struggling to understand how to do this. I am a beginner to numpy and Python so I may be missing something very simple.

It's not missing, and it is very simple. Just index the channel you want:
im[:, :, 0]
To convert to zeros and ones, you can either make a boolean array:
im[:, :, 0].astype(bool)
or set 255 to one:
im = im[:, :, 0]
im[im > 0] = 1
A more advanced approach to creating a boolean array would be to view the underlying data as a boolean. This will only work well out of the box if the input is uint8:
im[:, :, 0].view(dtype=bool)
Finally, to index the last dimension of an N dimensional array, you can use ellipsis:
im[..., 0]
... (or the actual name Ellipsis) in an index means "use : for all dimensions not listed explicitly." You can use it at most once in an index.
In general, you will want to read the documentation on indexing and later on broadcasting. There are gentler introductions out there, but the numpy documentation is quite comprehensive and straight from the horse's mouth.
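Putting the pieces together for the (128, 128, 1) shape asked for in the question, a minimal sketch might look like the following. The array `im` is synthetic stand-in data (an assumption, in place of the PNG loaded with ImageIO), and slicing with `:1` instead of indexing with `0` is just one way to keep the trailing axis.

```python
import numpy as np

# Synthetic stand-in for the loaded image: every pixel is pure black or pure white.
rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(128, 128), dtype=np.uint8)
im = np.repeat(mask[:, :, np.newaxis] * np.uint8(255), 3, axis=2)

# Slice (rather than index) the last axis so it survives with length 1,
# then map 0 -> 0 and 255 -> 1.
binary = (im[..., :1] > 0).astype(np.uint8)

print(im.shape)      # (128, 128, 3)
print(binary.shape)  # (128, 128, 1)
```

If a plain 2D result is enough, `im[..., 0]` in place of `im[..., :1]` gives shape (128, 128).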

Related

How to make `scikit-image` return a contiguous array without negative strides?

I am using Scikit-Image imread function for reading images for a PyTorch data loader.
I get errors from the function ToTensor(), saying that the strides of the numpy array are negative.
I read about it and using somearray.copy() solves it.
Yet, I'd like to solve it from the root. How can I force Scikit-Image to read the image into a contiguous array with regular strides?
I looked for solutions for this case and they are mostly about creating a new copy of the data, which I want to avoid.
Those are the properties of the array:
print(f'shape: {img.shape}')
print(f'dtype: {img.dtype}')
print(f'strides: {img.strides}')
The output:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
When I run img.base I get the values of the data, though the dimensions are (3024, 4032, 3).
I don't know a lot about image file formats, but I can make some deductions from the data you provided:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
img.base (3024, 4032, 3)
img is a view of its base. The negative strides[1] means that dimension has been reversed, e.g. with ::-1 indexing. The fact that the largest stride is in the middle means the first two dimensions have been swapped (transpose(1,0,2)). I expect img.base.strides is (12096,3,1); 12096 is 3*4032.
jpg is a compressed format, but I assume the base is close in layout to the file, and this view is needed to conform to our normal numpy expectations for an array.
img.copy() will have the same shape, but strides will be (9072,3,1).
If plt.imread produces an array with that shape and strides, it may well have returned that copy rather than the view. It's not necessarily being any more "efficient".
Think about how we print a 2d array - 1st dimension, rows, going down, 2nd, columns, going across, left to right. But think about a common xy plot - x goes left to right, and y goes from bottom up. Or look at what np.meshgrid says about indexing, 'ij' versus 'xy'.
Having the size 3 dimension last is just another convention. That's the color 'channel', 3 for RGB, 4 adds a transparency value, and 1 for b/w. Sometimes arrays have that dimension first.
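The deductions above can be reproduced with a small synthetic experiment. This is only a sketch: the `base` array is made up, and the transpose-plus-reversal is an assumed reconstruction of how such a view could arise, not something confirmed from scikit-image internals. Note that `np.ascontiguousarray` still copies the data when its input is not already C-contiguous.

```python
import numpy as np

# Hypothetical stand-in for the decoded file buffer.
base = np.zeros((3024, 4032, 3), dtype=np.uint8)
print(base.strides)  # (12096, 3, 1)

# Swap the first two axes, then reverse the (new) second axis.
img = base.transpose(1, 0, 2)[:, ::-1, :]
print(img.shape)    # (4032, 3024, 3)
print(img.strides)  # (3, -12096, 1)

# A C-contiguous copy has regular, positive strides.
fixed = np.ascontiguousarray(img)
print(fixed.strides)  # (9072, 3, 1)
```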

Transform a 2x2 array into a 2x2x2 arrays with numpy

I use numpy to do image processing. I wanted to convert the image to black and white, so I computed the luminosity of each cell. But to display it, I have to transform the 2D array into a 3D array that repeats the same value three times.
For example, I have this:
a = np.array([[255, 0], [0, 255]])
#into
b = np.array([[[255,255,255],[0,0,0]],[[0,0,0],[255,255,255]]])
I've been searching for a while but I can't find anything that helps.
PS: sorry if I have made mistakes with my English.
You'll want to use an explicit broadcast: https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html#numpy.broadcast_to
b = np.broadcast_to(a[..., np.newaxis], (2, 2, 3))
Usually you don't need to do it explicitly, maybe try and see if just a[..., np.newaxis] and the standard broadcasting rules are enough.
Another way to do it
np.einsum('ij,k->ijk', a, [1,1,1])
It's a way to create a 3-dimensional array (hence the ijk) from a 2D array (ij) and a 1D array (k). For all indices i, j, k, the result at [i,j,k] is a[i,j] × [1,1,1][k].
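As a rough comparison of the approaches above, the sketch below builds the same 2x2x3 result three ways. The `np.repeat` variant is an addition not mentioned in the answers; it is included because `np.broadcast_to` returns a read-only view, which matters if the result needs to be modified later.

```python
import numpy as np

a = np.array([[255, 0], [0, 255]])

# Read-only view: no data is copied.
view = np.broadcast_to(a[..., np.newaxis], (2, 2, 3))

# Writable copy with the values physically repeated.
copy = np.repeat(a[..., np.newaxis], 3, axis=2)

# Outer product with a vector of ones, as in the einsum answer.
via_einsum = np.einsum('ij,k->ijk', a, [1, 1, 1])

print(view.shape)            # (2, 2, 3)
print(view.flags.writeable)  # False
print(copy.flags.writeable)  # True
```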

Non-consecutive slicing of a multidimensional array in Python

I am trying to perform non-consecutive slicing of a multidimensional array like this (Matlab pseudocode)
A = B(:,:,[1,3],[2,4,6]) %A and B are two 4D matrices
But when I try to write this code in Python:
A = B[:,:,np.array([0,2]),np.array([1,3,5])] #A and B are two 4D arrays
it gives an error: IndexError: shape mismatch: indexing arrays could not be broadcast...
It should be noted that slicing for one dimension each time works fine!
In numpy, if you use more than one fancy index (i.e. array) to index different dimensions of the same array at the same time, the index arrays must broadcast against each other. This is by design, so that indexing can be more powerful. For your situation, the simplest way to solve the problem is to index twice:
B[:, :, [0,2]] [..., [1,3,5]]
where ... stands for as many : as possible.
Indexing twice this way would generate some extra data moving time. If you want to index only once, make sure they broadcast (i.e. put fancy indices on different dimension):
B[:, :, np.array([0,2])[:,None], [1,3,5]]
which will result in a X by Y by 2 by 3 array. On the other hand, you can also do
B[:, :, [0,2], np.array([1,3,5])[:,None]]
which will result in a X by Y by 3 by 2 array. The [1,3,5] axis is transposed before the [0,2] axis.
You don't have to use np.array([0,2]) if you don't need to do fancy operations with it. Simply [0,2] is fine.
np.array([0,2])[:,None] is equivalent to [[0],[2]], where the point of [:,None] is to create an extra dimension such that the shape becomes (2,1). Shape (2,) and (3,) cannot broadcast, while shape (2,1) and (3,) can, which becomes (2,3).
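The three variants above can be checked against each other on a small array. The shape of `B` below is an arbitrary choice for illustration; only the last dimension needs to be at least 6 so that index 5 is valid.

```python
import numpy as np

# Arbitrary 4D test array (X=2, Y=3).
B = np.arange(2 * 3 * 4 * 6).reshape(2, 3, 4, 6)

# Index twice: one fancy index per step.
twice = B[:, :, [0, 2]][..., [1, 3, 5]]

# Index once: shapes (2, 1) and (3,) broadcast to (2, 3).
once = B[:, :, np.array([0, 2])[:, None], [1, 3, 5]]

# Index once, other orientation: shapes (2,) and (3, 1) broadcast to (3, 2).
swapped = B[:, :, [0, 2], np.array([1, 3, 5])[:, None]]

print(twice.shape)    # (2, 3, 2, 3)
print(once.shape)     # (2, 3, 2, 3)
print(swapped.shape)  # (2, 3, 3, 2)
print(np.array_equal(twice, once))  # True
```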

Extracting specific columns from multi-dimensional array

Suppose we have a 3d numpy array in Python of shape (1, 22, 22) (arbitrary dimensions for illustration). If I want to extract the first 2 entries along each of the Y and Z dimensions, then I can do:
new_array = array[:, 0:2, 0:2]
new_array.shape
(1, 2, 2)
But when I try to do the same by explicitly specifying the first two dimensions, as:
new_array = array[:, [0,1], [0,1]]
new_array.shape
(1, 2)
I'm getting a different result. Why's that? How can I select specific dimensions and not a range of dimensions?
Passing a list to a numpy array's __getitem__ uses advanced indexing instead of slicing. See the documentation here.
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.
In your case, you are using the integer array indexing. The chain of integer indices are broadcast and iterated as a single unit. So using
array[:, [0,1], [0,1]]
selects elements (0,0) and (1,1), not the zeroth and first subarray from dimension 1 and the zeroth and first subarray from dimension 2.
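To get the block that the slice version returns while still passing explicit index lists, the index arrays need to broadcast to a 2D grid instead of being paired element by element. A minimal sketch, using `np.ix_` as one possible way to build such a grid (the array contents are made up for illustration):

```python
import numpy as np

array = np.arange(1 * 22 * 22).reshape(1, 22, 22)

# Paired indices: picks elements (0, 0) and (1, 1) only.
paired = array[:, [0, 1], [0, 1]]

# Grid indices: shapes (2, 1) and (2,) broadcast to (2, 2).
block = array[:, [[0], [1]], [0, 1]]

# np.ix_ builds the same kind of open grid from plain lists.
rows, cols = np.ix_([0, 1], [0, 1])
block_ix = array[:, rows, cols]

print(paired.shape)  # (1, 2)
print(block.shape)   # (1, 2, 2)
print(np.array_equal(block, array[:, 0:2, 0:2]))  # True
```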
I read the documentation and played around with my code. The only thing that seemed to work -but doesn't- with respect to my question is:
columns = np.array([[0, 1], [0, 1]], dtype=np.intp)
new_array = my_array[:, columns, 0]
I'm still not quite sure why it works though.
EDIT: doesn't work as expected

Numpy remove a dimension from np array

I have some images I want to work with. The problem is that there are two kinds of images: both are 106 x 106 pixels, but some are in color and some are black and white.
one with only two (2) dimensions:
(106,106)
and one with three (3)
(106,106,3)
Is there a way I can strip this last dimension?
I tried np.delete, but it did not seem to work.
np.shape(np.delete(Xtrain[0], [2] , 2))
Out[67]: (106, 106, 2)
You could use numpy's basic indexing (an extension of Python's built-in slice notation):
x = np.zeros( (106, 106, 3) )
result = x[:, :, 0]
print(result.shape)
prints
(106, 106)
A shape of (106, 106, 3) means you have 3 sets of things that have shape (106, 106). So in order to "strip" the last dimension, you just have to pick one of these (that's what the indexing does).
You can keep any slice you want. I arbitrarily choose to keep the 0th, since you didn't specify what you wanted. So, result = x[:, :, 1] and result = x[:, :, 2] would give the desired shape as well: it all just depends on which slice you need to keep.
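Since the question mixes (106, 106) and (106, 106, 3) images, one option is to branch on the number of dimensions. The helper name `to_2d` is hypothetical, and keeping channel 0 is as arbitrary here as in the answer above.

```python
import numpy as np

def to_2d(image):
    """Return a 2D array; for a 3D image keep only channel 0 (hypothetical helper)."""
    if image.ndim == 3:
        return image[..., 0]
    return image

gray = np.zeros((106, 106))
color = np.zeros((106, 106, 3))

print(to_2d(gray).shape)   # (106, 106)
print(to_2d(color).shape)  # (106, 106)
```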
If you have a multidimensional array, this might help:
pred_mask[0, ...]  # Remove first dim
pred_mask[..., 0]  # Remove last dim
Just take the mean value over the colors dimension (axis=2):
Xtrain_monochrome = Xtrain.mean(axis=2)
When the shape of your array is (106, 106, 3), you can visualize it as a table with 106 rows and 106 columns filled with data points, where each point is an array of 3 numbers which we can represent as [x, y, z]. Therefore, if you want to get the dimensions (106, 106), you must make the data points in your table single numbers rather than arrays. You can achieve this by extracting either the x-component, y-component or z-component of each data point, or by applying a function that aggregates the three components, like the mean, sum, max etc. You can extract any component just like @Matt Messersmith suggested above.
Well, you should be careful when you are trying to reduce the dimensions of an image.
An image is normally a 3-D matrix that contains the RGB values of each pixel. If you want to reduce it to 2-D, what you are really doing is converting a colored RGB image into a grayscale image.
There are several ways to do this: you can take the maximum of the three, the min, the average, the sum, etc., depending on the accuracy you want in your image. The best you can do is take a weighted average of the RGB values using the formula
Y = 0.299R + 0.587G + 0.114B
where R stands for RED, G is GREEN and B is BLUE. In numpy, this can be written as
new_image = img[:, :, 0]*0.299 + img[:, :, 1]*0.587 + img[:, :, 2]*0.114
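The same weighted sum can also be written as a matrix product over the last axis. A small sketch with made-up pixel data, assuming the channels are stored in R, G, B order:

```python
import numpy as np

# Random stand-in for an RGB image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(106, 106, 3), dtype=np.uint8)

# Luminance weights for R, G, B from the formula above.
weights = np.array([0.299, 0.587, 0.114])

# (106, 106, 3) @ (3,) contracts the channel axis.
gray = img @ weights

print(gray.shape)  # (106, 106)
print(gray.dtype)  # float64
```

The result is floating point; cast it back (for example with `.astype(np.uint8)`) if an integer image is needed.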
Actually, np.delete would work if you applied it twice.
If you want to preserve the first channel, for example, you could run the following:
Xtrain = np.delete(Xtrain,2,2) # this will get rid of the 3rd component of the 3 dimensions
print(Xtrain.shape) # will now output (106,106,2)
# again we apply np.delete but on the second component of the 3rd dimension
Xtrain = np.delete(Xtrain,1,2)
print(Xtrain.shape) # will now output (106,106,1)
# you may finally squeeze your output to get a 2d array
Xtrain = Xtrain.squeeze()
print(Xtrain.shape) # will now output (106,106)
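As a possible shortcut, `np.delete` also accepts a list of indices, so both unwanted channels can be dropped in one call. The data below is synthetic, and the comparison with plain indexing is included only to show the two give the same values.

```python
import numpy as np

# Synthetic stand-in for a single color image.
Xtrain = np.arange(106 * 106 * 3).reshape(106, 106, 3)

# Drop channels 1 and 2 at once, then squeeze the length-1 axis.
one_step = np.delete(Xtrain, [1, 2], axis=2).squeeze()

# Plain indexing of channel 0 for comparison.
by_index = Xtrain[..., 0]

print(one_step.shape)  # (106, 106)
print(np.array_equal(one_step, by_index))  # True
```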
