strange behaviour when cropping a numpy/opencv image - python

I'm really puzzled by the way indexing works for numpy multidimensional arrays. My goal is to crop a region from an image I loaded using opencv.
Loading the image works great:
import numpy as np
import cv2
img = cv2.imread(start_filename)
print img.shape
shape is displayed as
(2000L, 4096L, 3L)
Now I want to cut a part from the image which ranges from pixels 550 to 1550 in the first dimension and only consists of the last 782 pixels of the second dimension. I tried
img=img[550:1550][-782:][:]
print img.shape
Now the shape is displayed as
(782L, 4096L, 3L)
I'm confused: what's the correct way of indexing for the crop operation?

The correct way to crop an image is to use slicing. Note that chained brackets like img[550:1550][-782:] don't do what you expect: each successive [...] indexes along the first axis again, so the second bracket took the last 782 rows of your crop rather than the last 782 columns. Both slices belong inside a single bracket, separated by a comma:
import cv2
img = cv2.imread("lenna.png")
crop_img = img[200:400, 100:300] # Crop from x, y, w, h -> 100, 200, 300, 400
# NOTE: it's img[y: y + h, x: x + w] and *not* img[x: x + w, y: y + h]
In your case, the desired crop can be produced as:
crop_img = img[550:1550, -782:]
print crop_img.shape
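To sanity-check the slice, a minimal sketch using a dummy array in place of the question's 2000 x 4096 image:
import numpy as np

img = np.zeros((2000, 4096, 3), dtype=np.uint8)
crop = img[550:1550, -782:]  # rows (y) first, then columns (x)
print(crop.shape)            # -> (1000, 782, 3)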

As mentioned in other answers you could use img[550:1550, -782:, :], but note that basic slicing returns a view of the array: it shares memory with the original image, so writing into the crop also writes into the original. If you want an independent copy that you can modify freely, you can use NumPy's ix_ function for indexing, since fancy indexing always returns a copy.
img = img[np.ix_(range(550, 1550), range(img.shape[1] - 782, img.shape[1]))]
# or, selecting the channel axis explicitly
img = img[np.ix_(range(550, 1550), range(img.shape[1] - 782, img.shape[1]), range(3))]
After this your shape will look like:
(1000, 782, 3)
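To see the difference between the two, a minimal sketch (np.shares_memory reports whether two arrays overlap in memory):
import numpy as np

img = np.zeros((2000, 4096, 3), dtype=np.uint8)  # stand-in for the question's image
view = img[550:1550, -782:]                      # basic slicing: a view
copy = img[np.ix_(range(550, 1550), range(img.shape[1] - 782, img.shape[1]))]
print(np.shares_memory(img, view))               # True: writing to view writes to img
print(np.shares_memory(img, copy))               # False: an independent copy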

Related

OpenCV can't resize() a numpy array created from a pygame.PixelArray, error: src data type = 8 is not supported [duplicate]

I would like to take an image and change the scale of the image, while it is a numpy array.
For example I have this image of a coca-cola bottle:
bottle-1
Which translates to a numpy array of shape (528, 203, 3) and I want to resize that to say the size of this second image:
bottle-2
Which has a shape of (140, 54, 3).
How do I change the size of the image to a certain shape while still maintaining the original image? Other answers suggest stripping every other or third row out, but what I want to do is basically shrink the image the way you would via an image editor, but in Python code. Are there any libraries to do this in numpy/SciPy?
Yes, you can install OpenCV (a library for image processing and computer vision) and use the cv2.resize function, for instance:
import cv2
import numpy as np
img = cv2.imread('your_image.jpg')
res = cv2.resize(img, dsize=(54, 140), interpolation=cv2.INTER_CUBIC)
Here img is thus a numpy array containing the original image, whereas res is a numpy array containing the resized image. An important aspect is the interpolation parameter: there are several ways to resize an image, and the choice matters especially when you scale the image down and the original size is not a multiple of the new size. Possible interpolation schemes are:
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
Like with most options, there is no "best" option in the sense that for every resize schema, there are scenarios where one strategy can be preferred over another.
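As a rough rule of thumb (a sketch with hypothetical target sizes, not a hard rule): INTER_AREA tends to look best when shrinking, while INTER_LINEAR or INTER_CUBIC tend to look best when enlarging:
import cv2

img = cv2.imread('your_image.jpg')
smaller = cv2.resize(img, dsize=(54, 140), interpolation=cv2.INTER_AREA)    # downscale
larger = cv2.resize(img, dsize=(406, 1056), interpolation=cv2.INTER_CUBIC)  # upscale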
While it might be possible to use numpy alone to do this, the operation is not built-in. That said, you can use scikit-image (which is built on numpy) to do this kind of image manipulation.
See the scikit-image rescaling documentation for details.
For example, you could do the following with your image:
from skimage.transform import resize
bottle_resized = resize(bottle, (140, 54))
This will take care of things like interpolation, anti-aliasing, etc. for you.
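Note that skimage.transform.resize returns a float image rescaled to [0, 1] by default; if you want to keep the original value range and control anti-aliasing explicitly, both are exposed as parameters (a minimal sketch, assuming bottle is the (528, 203, 3) array from the question):
from skimage.transform import resize

bottle_resized = resize(bottle, (140, 54), anti_aliasing=True, preserve_range=True)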
One-line numpy solution for downsampling (by 2):
smaller_img = bigger_img[::2, ::2]
And upsampling (by 2):
bigger_img = smaller_img.repeat(2, axis=0).repeat(2, axis=1)
(this assumes an HxWxC shaped image; note this method only allows whole-integer resizing, e.g. 2x but not 1.5x)
For people coming here from Google looking for a fast way to downsample images in numpy arrays for use in Machine Learning applications, here's a super fast method (adapted from here). This method only works when the input dimensions are a multiple of the output dimensions.
The following examples downsample from 128x128 to 64x64 (this can be easily changed).
Channels last ordering
# large image is shape (128, 128, 3)
# small image is shape (64, 64, 3)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((output_size, bin_size,
                                   output_size, bin_size, 3)).max(3).max(1)
Channels first ordering
# large image is shape (3, 128, 128)
# small image is shape (3, 64, 64)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((3, output_size, bin_size,
                                   output_size, bin_size)).max(4).max(2)
For grayscale images just change the 3 to a 1 like this:
Channels first ordering
# large image is shape (1, 128, 128)
# small image is shape (1, 64, 64)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((1, output_size, bin_size,
                                   output_size, bin_size)).max(4).max(2)
This method uses the equivalent of max pooling. It's the fastest way to do this that I've found.
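If average pooling suits your data better than max pooling (max pooling tends to brighten the downsampled image), the same reshape trick works with mean; a small variant sketch for the channels-last case:
# (128, 128, 3) -> (64, 64, 3) by averaging each bin
# note: mean promotes integer inputs to float
small_image = large_image.reshape((output_size, bin_size,
                                   output_size, bin_size, 3)).mean(axis=(3, 1))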
If anyone came here looking for a simple method to scale/resize an image in Python, without using additional libraries, here's a very simple image resize function:
# simple image scaling to (nR x nC) size
def scale(im, nR, nC):
    nR0 = len(im)     # source number of rows
    nC0 = len(im[0])  # source number of columns
    return [[im[int(nR0 * r / nR)][int(nC0 * c / nC)]
             for c in range(nC)] for r in range(nR)]
Example usage: resizing a (30 x 30) image to (100 x 200):
import matplotlib.pyplot as plt

def sqr(x):
    return x * x

def f(r, c, nR, nC):
    return 1.0 if sqr(c - nC / 2) + sqr(r - nR / 2) < sqr(nC / 4) else 0.0

# a red circle on a canvas of size (nR x nC)
def circ(nR, nC):
    return [[[f(r, c, nR, nC), 0, 0]
             for c in range(nC)] for r in range(nR)]

plt.imshow(scale(circ(30, 30), 100, 200))
The output shows the red circle stretched onto the 100 x 200 canvas (image omitted). This works to shrink/scale images, and works fine with numpy arrays.
For people who want to resize (interpolate) a batch of numpy arrays, PyTorch provides a faster function named torch.nn.functional.interpolate; just remember to use np.transpose first to change the channel layout from batch x W x H x 3 to batch x 3 x W x H.
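A minimal sketch of that round trip (assuming a float batch of shape (N, H, W, 3) and a hypothetical target size of 64 x 64):
import numpy as np
import torch
import torch.nn.functional as F

batch = np.random.rand(8, 128, 128, 3).astype(np.float32)  # N x H x W x C
t = torch.from_numpy(batch.transpose(0, 3, 1, 2))          # -> N x C x H x W
t = F.interpolate(t, size=(64, 64), mode='bilinear', align_corners=False)
resized = t.numpy().transpose(0, 2, 3, 1)                  # back to N x H x W x C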
SciPy's imresize() was another resize method, but it has been removed as of SciPy 1.3.0. SciPy refers users to the PIL image resize method instead: Image.resize(size, resample=0)
size – The requested size in pixels, as a 2-tuple: (width, height).
resample – An optional resampling filter. This can be one of PIL.Image.NEAREST (use nearest neighbour), PIL.Image.BILINEAR (linear interpolation), PIL.Image.BICUBIC (cubic spline interpolation), or PIL.Image.LANCZOS (a high-quality downsampling filter). If omitted, or if the image has mode “1” or “P”, it is set to PIL.Image.NEAREST.
Link here:
https://pillow.readthedocs.io/en/3.1.x/reference/Image.html#PIL.Image.Image.resize
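A minimal round-trip sketch on a numpy array (note that PIL's size argument is (width, height), the reverse of numpy's (rows, cols)):
import numpy as np
from PIL import Image

arr = np.zeros((528, 203, 3), dtype=np.uint8)  # H x W x C, like the bottle image
resized = Image.fromarray(arr).resize((54, 140), resample=Image.BICUBIC)
out = np.asarray(resized)                      # shape (140, 54, 3)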
Stumbled back upon this after a few years. It looks like the answers so far fall into one of a few categories:
Use an external library. (OpenCV, SciPy, etc)
Use Power-of-Two Scaling
Use Nearest Neighbor
These solutions are all respectable, so I offer this only for completeness. It has three advantages over the above: (1) it will accept arbitrary resolutions, even non-power-of-two scaling factors; (2) it uses pure Python+Numpy with no external libraries; and (3) it interpolates all the pixels for an arguably 'nicer-looking' result.
It does not make good use of Numpy and, thus, is not fast, especially for large images. If you're only rescaling smaller images, it should be fine. I offer this under Apache or MIT license at the discretion of the user.
import math
import numpy

def resize_linear(image_matrix, new_height: int, new_width: int):
    """Perform a pure-numpy linear-resampled resize of an image."""
    output_image = numpy.zeros((new_height, new_width), dtype=image_matrix.dtype)
    original_height, original_width = image_matrix.shape
    inv_scale_factor_y = original_height / new_height
    inv_scale_factor_x = original_width / new_width
    # This is an ugly serial operation.
    for new_y in range(new_height):
        for new_x in range(new_width):
            # If you had a color image, you could repeat this with all channels here.
            # Find sub-pixel data:
            old_x = new_x * inv_scale_factor_x
            old_y = new_y * inv_scale_factor_y
            x_fraction = old_x - math.floor(old_x)
            y_fraction = old_y - math.floor(old_y)
            # Sample four neighboring pixels:
            left_upper = image_matrix[math.floor(old_y), math.floor(old_x)]
            right_upper = image_matrix[math.floor(old_y), min(image_matrix.shape[1] - 1, math.ceil(old_x))]
            left_lower = image_matrix[min(image_matrix.shape[0] - 1, math.ceil(old_y)), math.floor(old_x)]
            right_lower = image_matrix[min(image_matrix.shape[0] - 1, math.ceil(old_y)), min(image_matrix.shape[1] - 1, math.ceil(old_x))]
            # Interpolate horizontally (x_fraction weights the right column):
            blend_top = (right_upper * x_fraction) + (left_upper * (1.0 - x_fraction))
            blend_bottom = (right_lower * x_fraction) + (left_lower * (1.0 - x_fraction))
            # Interpolate vertically (y_fraction weights the lower row, mirroring the horizontal blend):
            final_blend = (blend_bottom * y_fraction) + (blend_top * (1.0 - y_fraction))
            output_image[new_y, new_x] = final_blend
    return output_image
Sample rescaling (images omitted): the original, a version downscaled by half, and a version upscaled by one and one quarter.
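A usage sketch (the function as written handles a single 2-D channel, so a hypothetical grayscale array is assumed):
gray = numpy.random.rand(100, 100)      # hypothetical grayscale image
half = resize_linear(gray, 50, 50)      # downscaled by half
larger = resize_linear(gray, 125, 125)  # upscaled by one and one quarter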
"Are there any libraries to do this in numpy/SciPy?"
Sure. You can do this without OpenCV, scikit-image or PIL.
Image resizing is basically mapping the coordinates of each pixel from the original image to its resized position.
Since the coordinates of an image must be integers (think of it as a matrix), if the mapped coordinate has decimal values, you should interpolate the pixel value to approximate it to the integer position (e.g. getting the nearest pixel to that position is known as Nearest neighbor interpolation).
All you need is a function that does this interpolation for you. SciPy has interpolate.interp2d.
You can use it to resize an image in numpy array, say arr, as follows:
import numpy as np
from scipy.interpolate import interp2d

H, W = arr.shape[:2]  # note: shape is (rows, cols), i.e. (H, W)
new_W, new_H = (600, 300)
xrange = lambda x: np.linspace(0, 1, x)
f = interp2d(xrange(W), xrange(H), arr, kind="linear")
new_arr = f(xrange(new_W), xrange(new_H))
Of course, if your image is RGB, you have to perform the interpolation for each channel.
If you would like to understand more, I suggest watching Resizing Images - Computerphile.
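For an RGB image, a per-channel sketch of the same idea (a hedged illustration; interp2d only accepts 2-D data, so each channel is interpolated separately):
import numpy as np
from scipy.interpolate import interp2d

def resize_rgb(arr, new_W, new_H):
    # Interpolate each channel independently, then stack them back together.
    xrange = lambda x: np.linspace(0, 1, x)
    H, W = arr.shape[:2]
    channels = [interp2d(xrange(W), xrange(H), arr[:, :, c], kind="linear")(
                    xrange(new_W), xrange(new_H))
                for c in range(arr.shape[2])]
    return np.dstack(channels)  # shape (new_H, new_W, 3)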
import cv2
import numpy as np

image_read = cv2.imread('filename.jpg', 0)  # 0 -> load as grayscale
original_image = np.asarray(image_read)
width, height = 452, 452
resize_image = np.zeros(shape=(width, height))

# Nearest-neighbour resize: map each output pixel back to its source pixel.
for W in range(width):
    for H in range(height):
        src_row = int(W * original_image.shape[0] / width)
        src_col = int(H * original_image.shape[1] / height)
        resize_image[W][H] = original_image[src_row][src_col]

print("Resized image size : ", resize_image.shape)
cv2.imshow('resized', resize_image.astype(np.uint8))  # imshow needs a window name
cv2.waitKey(0)
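The same nearest-neighbour mapping can also be done without Python loops by building integer index arrays once and using numpy fancy indexing (a sketch reusing original_image, width, and height from above):
rows = np.arange(width) * original_image.shape[0] // width
cols = np.arange(height) * original_image.shape[1] // height
resize_image = original_image[rows[:, None], cols]  # shape (width, height)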

Grayscale image as np.array has shape (100, 80, 3), how?

I am currently working on a neural network working with grayscale images in the form of numpy arrays. For some reason, however, I occasionally get images with the shape (.., .., 3) even though I should only be getting arrays with the shape (.., ..). This means that for some reason some images represent their grayscale color like this: [100, 100, 100] instead of just 100. Is there an effective way to fix this, or simply to replace the [x, x, x] with an x?
Here is the code I use to import images and convert them to be black and white:
from PIL import Image
img = Image.open(Filepath)
img.convert("1")
print(np.array(img).shape) # -> (.., .., 3)??
The .convert() method returns a copy of the image, so you will need to assign it to a variable.
from PIL import Image
img = Image.open(filepath)
img = img.convert("1")
# Or img = Image.open(filepath).convert("1")
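As an aside: mode "1" gives a 1-bit bilevel image; if plain 8-bit grayscale is what the network expects, mode "L" may be closer to what you want (values 0-255, shape (H, W)):
from PIL import Image
import numpy as np

img = Image.open(filepath).convert("L")  # 8-bit grayscale
print(np.array(img).shape)               # -> (H, W)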

I want to get the SSIM when comparing two images in python

I'm trying to use the "compare_ssim" function. I currently have two 2xN matrices of x,y coordinates where the first row is all the x coordinates and the second row is all the y coordinates of each of the two images. How can I calculate the SSIM for these two images (if there is a way to do so)?
For example I have:
X = np.array([[1,2,3], [4,5,6]])
Y = np.array([[3,4,5],[5,6,7]])
compare_ssim(X,Y)
But I am getting the error
ValueError: win_size exceeds image extent. If the input is a multichannel (color) image, set multichannel=True.
I'm not sure if I am missing a parameter or if I should convert the matrices in such a way that this function works. Or is there a way that I am supposed to convert my coordinates to a grayscale matrix? I'm a bit confused about what the matrices for the parameters of the function should look like. I know that they are supposed to be ndarrays, but type(X) and type(Y) are both numpy.ndarray.
Since you haven't mentioned which framework/library you are using, I am going with the assumption that you are using skimage's compare_ssim.
The error in question is due to the shape of your inputs.
TL;DR: compare_ssim expects images in (H, W, C) dimensions but your input images have a dimension of (2, 3). So the function is confused which dimension to treat as the channel dimension. When multichannel=True, the last dimension is treated as the channel dimension.
There are two key problems with your code:
compare_ssim expects images as input, so your X and Y matrices should have dimensions like (H, W, C), not (2, 3).
They should be of float datatype.
Below I have shown a bit of demo code (note: in recent versions of scikit-image, compare_ssim has been moved to skimage.metrics.structural_similarity).
import numpy as np
from skimage.metrics import structural_similarity
img1 = np.random.randint(0, 255, size=(200, 200, 3)).astype(np.float32)
img2 = np.random.randint(0, 255, size=(200, 200, 3)).astype(np.float32)
ssim_score = structural_similarity(img1, img2, multichannel=True) #score: 0.0018769083894301646
ssim_score = structural_similarity(img1, img1, multichannel=True) #score: 1.0

Use PIL to convert a grayscale image to a (1, H, W) numpy array

Using PIL to convert an RGB image to an (H, W, 3) numpy array is very fast:
im = np.array(PIL.Image.open(path))
However, I cannot find a fast way to convert a grayscale image to an (H, W, 1) array. I tried two approaches, but they are both much slower than the above:
im = np.array(PIL.Image.open(path)) # returns an (H, W) array
im = np.expand_dims(im, axis=0)
im = im.astype(int)
This approach is slow too:
img = PIL.Image.open(path)
im = np.array(img.getdata()).reshape(img.size[1], img.size[0], 1)
Please advise...
You can use np.asarray() to get the array view, then append a new axis with None/np.newaxis and then use type conversion with copy set to False (in case you were converting from same dtype to save on memory) -
im = np.asarray(PIL.Image.open(path))
im_out = im[None].astype(dtype=int, copy=False)
This appends the new axis at the start, giving an output array of shape (1, H, W). To instead append it at the end, for an array of shape (H, W, 1), use im[..., None] instead of im[None].
A simpler way would be -
im_out = np.asarray(img, dtype=int)[None]
If the input is already in uint8 dtype and we want an output array of the same dtype, use dtype=np.uint8 and that should be pretty fast.

How to use view as blocks in Scikit-Image?

I have a numpy array of shape (12, 224, 224), i.e. 12 images of size (224, 224). When I had a single image, this was simple. The image was of size (x, y). For example, x is an image of size (400, 400), for which I could use view_as_blocks like this:
from skimage.util import view_as_blocks as vablks
xx = vablks(x, block_shape=(8,8))
This would result in a block of shape (50,50,8,8).
Now I would like to know how to apply this when I have a list of images. Either I lose shape (my 12 images are combined into one (224, 224) block broken down into (28, 28, 8, 8)), or I run into a ValueError. Here is the code I tried for iterating over the 12 images and viewing the (224, 224) shaped images:
xx = []
for item_ in x:
    xx.append(blockSplitter(item_))
where x is a list of images.
Here is the error:
ValueError: 'block_shape' is not compatible with 'arr_in'
Overall, I would like to know how to view the images as blocks of 8x8 without losing the images.
Help, Please and Thank You.
You have at least two options:
1) Convert the list to an array, then use view_as_blocks with the correct parameters:
import numpy as np
from skimage.util import view_as_blocks

images = [np.zeros((50, 50)) for i in range(10)]
images = np.array(images)
all_blocks = view_as_blocks(images, block_shape=(1, 10, 10)).squeeze()
2) Convert each item in the list to a windowed view, and then convert the end result to an array:
import numpy as np
from skimage.util import view_as_blocks

images = [np.zeros((50, 50)) for i in range(10)]
image_blocks = [view_as_blocks(image, block_shape=(10, 10)) for image in images]
all_blocks = np.array(image_blocks)
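Applied to the (12, 224, 224) stack from the question, option 1 would look like this sketch; the squeeze() drops the singleton axis left by the size-1 block along the image dimension:
import numpy as np
from skimage.util import view_as_blocks

images = np.zeros((12, 224, 224))  # stand-in for the 12 images
blocks = view_as_blocks(images, block_shape=(1, 8, 8)).squeeze()
print(blocks.shape)  # -> (12, 28, 28, 8, 8)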
