Is there a neat NumPy-only way to downscale a 2D NumPy array (an image) using bilinear filtering?
More specifically, my array has the shape (width, height, 4), as in an RGBA image. The downscaling is also only done in "even" steps, i.e. from (w, h, 4) to (w/2, h/2, 4) to (w/4, h/4, 4), etc.
I've browsed around for quite some time now, but everyone seems to refer to the SciPy/PIL versions of imresize.
I want to minimize the number of dependencies on python packages, hence the numpy only requirement.
I just wanted to check with SO before I go implement it in C++ instead.
I don't think there is any specific solution in numpy, but you should be able to implement it efficiently without leaving the comfort of python. Correct me if I'm wrong, but when the size of the image is divisible by 2, a bilinear filter is basically the same as averaging 4 pixels of the original image to get 1 pixel of the new one, right? Well, if your image size is a power of two, then the following code:
from __future__ import division
import numpy as np
from PIL import Image
def halve_image(image):
    rows, cols, planes = image.shape
    # uint16 so that the sum of four uint8 values cannot overflow
    image = image.astype('uint16')
    # Group pixels into 2x2 blocks: axes 1 and 3 index within each block
    image = image.reshape(rows // 2, 2, cols // 2, 2, planes)
    image = image.sum(axis=3).sum(axis=1)
    # Add 2 before dividing by 4 (>> 2) so the average rounds to nearest
    return ((image + 2) >> 2).astype('uint8')
def mipmap(image):
    img = image.copy()
    rows, cols, planes = image.shape
    # Canvas 1.5x as wide: full image on the left, halved levels stacked on the right
    mipmap = np.zeros((rows, cols * 3 // 2, planes), dtype='uint8')
    mipmap[:, :cols, :] = img
    row = 0
    while rows > 1:
        img = halve_image(img)
        rows = img.shape[0]
        mipmap[row:row + rows, cols:cols + img.shape[1], :] = img
        row += rows
    return mipmap
img = np.asarray(Image.open('lena.png'))
Produces this output:
With an original image of 512x512, it runs on my system in:
In [3]: img.shape
Out[3]: (512, 512, 4)
In [4]: %timeit mipmap(img)
10 loops, best of 3: 154 ms per loop
This will not work if a side ever has odd length. Depending on exactly how you want to handle the downsampling in those cases, you can drop a full row (or column) of pixels and then reshape your image to (rows // 2, 2, cols // 2, 2, planes), so that img[r, :, c, :, p] is the 2x2 matrix of values to interpolate to get one new pixel value.
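A minimal sketch of the "drop a row or column" policy described above, assuming a uint8 (rows, cols, planes) image; cropping the trailing odd row/column is only one possible choice, and `halve_image_cropped` is a hypothetical name:

```python
import numpy as np

def halve_image_cropped(image):
    """Average 2x2 blocks, first dropping a trailing row/column if a side is odd.

    Cropping is an assumed policy for odd sizes; the result has shape
    (rows // 2, cols // 2, planes) and dtype uint8.
    """
    rows, cols, planes = image.shape
    even_rows, even_cols = rows - rows % 2, cols - cols % 2
    cropped = image[:even_rows, :even_cols].astype(np.uint16)
    # Axes 1 and 3 index the pixels inside each 2x2 block
    blocks = cropped.reshape(even_rows // 2, 2, even_cols // 2, 2, planes)
    summed = blocks.sum(axis=(1, 3))
    # Add 2 before dividing by 4 so the integer average rounds to nearest
    return ((summed + 2) >> 2).astype(np.uint8)
```

For a (5, 7, 4) input this returns a (2, 3, 4) array; the last row and column are simply ignored.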
I would like to take an image and change its scale while it is a numpy array.
For example, I have this image of a Coca-Cola bottle:
Which translates to a numpy array of shape (528, 203, 3), and I want to resize it to, say, the size of this second image:
Which has a shape of (140, 54, 3).
How do I change the size of the image to a given shape while keeping its content intact? Other answers suggest stripping out every second or third row, but what I want is to shrink the image the way an image editor would, in Python code. Are there any libraries to do this in numpy/SciPy?
Yes, you can install OpenCV (a library for image processing and computer vision) and use the cv2.resize function, for instance:
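import cv2
import numpy as np
img = cv2.imread('your_image.jpg')  # numpy array of shape (height, width, 3), in BGR order
# Note: dsize is (width, height), the reverse of the numpy shape (140, 54)
res = cv2.resize(img, dsize=(54, 140), interpolation=cv2.INTER_CUBIC)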
Here img is a numpy array containing the original image, and res is a numpy array containing the resized image. An important aspect is the interpolation parameter: there are several ways to resize an image, and the choice matters especially here, because you are scaling the image down and the original size is not a multiple of the target size. Possible interpolation schemes are:
INTER_NEAREST - a nearest-neighbor interpolation
INTER_LINEAR - a bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - a bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 - a Lanczos interpolation over 8x8 pixel neighborhood
Like with most options, there is no "best" option in the sense that for every resize schema, there are scenarios where one strategy can be preferred over another.
While it might be possible to use numpy alone to do this, the operation is not built-in. That said, you can use scikit-image (which is built on numpy) to do this kind of image manipulation.
Scikit-Image rescaling documentation is here.
For example, you could do the following with your image:
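from skimage.transform import resize
# The channel axis is kept, so the result has shape (140, 54, 3); by default it holds floats in [0, 1]
bottle_resized = resize(bottle, (140, 54))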
This will take care of things like interpolation, anti-aliasing, etc. for you.
One-line numpy solution for downsampling (by 2):
smaller_img = bigger_img[::2, ::2]
And upsampling (by 2):
bigger_img = smaller_img.repeat(2, axis=0).repeat(2, axis=1)
(This assumes an HxWxC shaped image. Note that this method only allows whole-integer resizing, e.g. 2x but not 1.5x, and that it simply drops or duplicates pixels without any filtering.)
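The same two one-liners generalise to any integer factor k. A small sketch (the function names are illustrative only):

```python
import numpy as np

def downsample_nearest(img, k):
    """Keep every k-th pixel along both spatial axes of an HxWxC array (no filtering)."""
    return img[::k, ::k]

def upsample_nearest(img, k):
    """Repeat each pixel k times along both spatial axes of an HxWxC array."""
    return img.repeat(k, axis=0).repeat(k, axis=1)
```

Upsampling by k and then downsampling by k returns the original array. If a side is not a multiple of k, slicing with step k keeps ceil(side / k) pixels.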
For people coming here from Google looking for a fast way to downsample images in numpy arrays for use in machine learning applications, here's a very fast method (adapted from here). This method only works when the input dimensions are a multiple of the output dimensions.
The following examples downsample from 128x128 to 64x64 (this can be easily changed).
Channels last ordering
# large image is shape (128, 128, 3)
# small image is shape (64, 64, 3)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((output_size, bin_size,
output_size, bin_size, 3)).max(3).max(1)
Channels first ordering
# large image is shape (3, 128, 128)
# small image is shape (3, 64, 64)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((3, output_size, bin_size,
output_size, bin_size)).max(4).max(2)
For grayscale images just change the 3 to a 1 like this:
Channels first ordering
# large image is shape (1, 128, 128)
# small image is shape (1, 64, 64)
input_size = 128
output_size = 64
bin_size = input_size // output_size
small_image = large_image.reshape((1, output_size, bin_size,
output_size, bin_size)).max(4).max(2)
This method uses the equivalent of max pooling. It's the fastest way to do this that I've found.
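If you would rather average each block than take its maximum (closer to what an area-based image resize does), the same reshape works with mean instead of max. A sketch for channels-last images, assuming, as above, that H and W are exact multiples of the output size:

```python
import numpy as np

def downsample_mean(image, out_h, out_w):
    """Average non-overlapping blocks of a channels-last (H, W, C) image.

    Assumes H % out_h == 0 and W % out_w == 0, as in the max-pooling snippets.
    """
    h, w, c = image.shape
    bin_h, bin_w = h // out_h, w // out_w
    blocks = image.reshape(out_h, bin_h, out_w, bin_w, c)
    # Average over the two within-block axes (1 and 3) instead of taking the max
    return blocks.mean(axis=(1, 3))
```

Note that mean returns floats, so cast back (e.g. with np.rint(...).astype(np.uint8)) if you need 8-bit output.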
If anyone came here looking for a simple method to scale/resize an image in Python, without using additional libraries, here's a very simple image resize function:
# simple image scaling to (nR x nC) size
def scale(im, nR, nC):
    nR0 = len(im)     # source number of rows
    nC0 = len(im[0])  # source number of columns
    return [[im[int(nR0 * r / nR)][int(nC0 * c / nC)]
             for c in range(nC)] for r in range(nR)]
Example usage: resizing a (30 x 30) image to (100 x 200):
import matplotlib.pyplot as plt
def sqr(x):
    return x*x
def f(r, c, nR, nC):
    return 1.0 if sqr(c - nC/2) + sqr(r - nR/2) < sqr(nC/4) else 0.0
# a red circle on a canvas of size (nR x nC)
def circ(nR, nC):
    return [[[f(r, c, nR, nC), 0, 0]
             for c in range(nC)] for r in range(nR)]
plt.imshow(scale(circ(30, 30), 100, 200))
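This works for both shrinking and enlarging, and it also accepts numpy arrays as input (the result is a nested list, which you can wrap in np.array() if you need an array back).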
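For larger images, the same nearest-neighbour rule can be applied with index arrays instead of Python list comprehensions. A sketch, where `scale_nearest` is a hypothetical name and the source index for output row r is computed as (r * nR0) // nR, the integer-arithmetic form of int(nR0 * r / nR):

```python
import numpy as np

def scale_nearest(im, nR, nC):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to (nR, nC)."""
    im = np.asarray(im)
    nR0, nC0 = im.shape[:2]
    rows = (np.arange(nR) * nR0) // nR  # source row for each output row
    cols = (np.arange(nC) * nC0) // nC  # source column for each output column
    # np.ix_ builds an open mesh, so the result is (nR, nC, ...) with channels kept
    return im[np.ix_(rows, cols)]
```

This returns a numpy array directly, e.g. scale_nearest(np.array(circ(30, 30)), 100, 200) has shape (100, 200, 3).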
For people who want to resize (interpolate) a batch of numpy arrays, PyTorch provides a faster function, torch.nn.functional.interpolate. Just remember to convert the array to a tensor (e.g. with torch.from_numpy) and to move the channel axis first, e.g. with np.transpose, from batch x H x W x 3 to batch x 3 x H x W.
SciPy's imresize() was another way to resize, but it was deprecated and then removed in SciPy 1.3.0. SciPy's deprecation notice points to Pillow's resize method instead: Image.resize(size, resample=0)
size – The requested size in pixels, as a 2-tuple: (width, height).
resample – An optional resampling filter. This can be one of PIL.Image.NEAREST (use nearest neighbour), PIL.Image.BILINEAR (linear interpolation), PIL.Image.BICUBIC (cubic spline interpolation), or PIL.Image.LANCZOS (a high-quality downsampling filter). If omitted, or if the image has mode “1” or “P”, it is set PIL.Image.NEAREST. (This describes older Pillow releases; since Pillow 7.0.0 the default filter for resize() is BICUBIC, while NEAREST is still used for “1” and “P” images.)
Stumbled back upon this after a few years. It looks like the answers so far fall into one of a few categories:
Use an external library. (OpenCV, SciPy, etc)
Use power-of-two scaling
Use Nearest Neighbor
These solutions are all respectable, so I offer this only for completeness. It has three advantages over the above: (1) it will accept arbitrary resolutions, even non-power-of-two scaling factors; (2) it uses pure Python+Numpy with no external libraries; and (3) it interpolates all the pixels for an arguably 'nicer-looking' result.
It does not make good use of Numpy and, thus, is not fast, especially for large images. If you're only rescaling smaller images, it should be fine. I offer this under Apache or MIT license at the discretion of the user.
import math
import numpy
def resize_linear(image_matrix, new_height: int, new_width: int):
    """Perform a pure-numpy linear-resampled resize of an image."""
    output_image = numpy.zeros((new_height, new_width), dtype=image_matrix.dtype)
    original_height, original_width = image_matrix.shape
    inv_scale_factor_y = original_height / new_height
    inv_scale_factor_x = original_width / new_width
    # This is an ugly serial operation.
    for new_y in range(new_height):
        for new_x in range(new_width):
            # If you had a color image, you could repeat this with all channels here.
            # Find sub-pixel data:
            old_x = new_x * inv_scale_factor_x
            old_y = new_y * inv_scale_factor_y
            x_fraction = old_x - math.floor(old_x)
            y_fraction = old_y - math.floor(old_y)
            # Sample four neighboring pixels (clamped at the right/bottom border):
            left_upper = image_matrix[math.floor(old_y), math.floor(old_x)]
            right_upper = image_matrix[math.floor(old_y), min(image_matrix.shape[1] - 1, math.ceil(old_x))]
            left_lower = image_matrix[min(image_matrix.shape[0] - 1, math.ceil(old_y)), math.floor(old_x)]
            right_lower = image_matrix[min(image_matrix.shape[0] - 1, math.ceil(old_y)), min(image_matrix.shape[1] - 1, math.ceil(old_x))]
            # Interpolate horizontally:
            blend_top = (right_upper * x_fraction) + (left_upper * (1.0 - x_fraction))
            blend_bottom = (right_lower * x_fraction) + (left_lower * (1.0 - x_fraction))
            # Interpolate vertically: y_fraction is the distance below the upper row,
            # so the lower row gets weight y_fraction and the upper row 1 - y_fraction.
            final_blend = (blend_bottom * y_fraction) + (blend_top * (1.0 - y_fraction))
            output_image[new_y, new_x] = final_blend
    return output_image
Sample rescaling:
Downscaled by Half:
Upscaled by one and one quarter:
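If speed matters, the same scheme can be vectorised with NumPy fancy indexing. The sketch below is not the author's code; it keeps the mapping used by resize_linear (old = new * old_size / new_size, neighbours clamped at the right/bottom edge), also handles an optional trailing channel axis, and returns float64:

```python
import numpy as np

def resize_linear_vectorized(image, new_height, new_width):
    """Bilinear resize of an (H, W) or (H, W, C) array without Python loops."""
    img = np.asarray(image, dtype=np.float64)
    old_h, old_w = img.shape[:2]
    ys = np.arange(new_height) * (old_h / new_height)
    xs = np.arange(new_width) * (old_w / new_width)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, old_h - 1)  # clamp at the bottom border
    x1 = np.minimum(x0 + 1, old_w - 1)  # clamp at the right border
    extra = [1] * (img.ndim - 2)        # broadcast over a channel axis, if any
    wy = (ys - y0).reshape(-1, 1, *extra)  # distance below the upper row
    wx = (xs - x0).reshape(1, -1, *extra)  # distance right of the left column
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Cast the result back with np.rint(...).astype(np.uint8) for 8-bit images.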
Are there any libraries to do this in numpy/SciPy
Sure. You can do this without OpenCV, scikit-image or PIL.
Image resizing is basically mapping the coordinates of each pixel from the original image to its resized position.
Since the coordinates of an image must be integers (think of it as a matrix), if the mapped coordinate has decimal values, you should interpolate the pixel value to approximate it to the integer position (e.g. getting the nearest pixel to that position is known as Nearest neighbor interpolation).
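All you need is a function that does this interpolation for you. SciPy has interpolate.interp2d (note that interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14, so this only works with older versions).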
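You can use it to resize a grayscale image stored in a numpy array, say arr, as follows:
import numpy as np
from scipy.interpolate import interp2d
H, W = arr.shape[:2]  # numpy shape is (rows, cols) = (height, width)
new_W, new_H = (600, 300)
xrange = lambda x: np.linspace(0, 1, x)
# interp2d expects x along the columns and y along the rows: arr.shape == (len(y), len(x))
f = interp2d(xrange(W), xrange(H), arr, kind="linear")
new_arr = f(xrange(new_W), xrange(new_H))  # shape (new_H, new_W)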
Of course, if your image is RGB, you have to perform the interpolation for each channel.
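Since interp2d is gone from recent SciPy, here is a NumPy-only sketch of the same idea, swapping in np.interp: bilinear interpolation on a regular grid is separable, so you can interpolate along the rows and then along the columns. It uses the same [0, 1] grids as the snippet above and handles every channel automatically via np.apply_along_axis (`resize_separable` is a hypothetical name):

```python
import numpy as np

def resize_separable(arr, new_h, new_w):
    """Linear resize of an (H, W) or (H, W, C) array using np.interp along each axis."""
    arr = np.asarray(arr, dtype=np.float64)
    h, w = arr.shape[:2]

    def along(a, axis, old_n, new_n):
        # Both grids span [0, 1], so the first/last input pixels map to the first/last output pixels
        old_pos = np.linspace(0, 1, old_n)
        new_pos = np.linspace(0, 1, new_n)
        return np.apply_along_axis(lambda v: np.interp(new_pos, old_pos, v), axis, a)

    # Rows first, then columns; any channel axis is carried along untouched
    return along(along(arr, 0, h, new_h), 1, w, new_w)
```

apply_along_axis still loops in Python over the 1-D slices, so this is convenient rather than fast.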
If you would like to understand more, I suggest watching Resizing Images - Computerphile.
import cv2
import numpy as np
image_read = cv2.imread('filename.jpg', 0)  # 0 = load as grayscale
original_image = np.asarray(image_read)
width, height = 452, 452  # target size along axis 0 and axis 1
resize_image = np.zeros(shape=(width, height), dtype=original_image.dtype)
# Nearest-neighbour resize: each output pixel copies the source pixel it maps to
for W in range(width):
    for H in range(height):
        src_row = int(W * original_image.shape[0] / width)
        src_col = int(H * original_image.shape[1] / height)
        resize_image[W][H] = original_image[src_row][src_col]
print("Resized image size : ", resize_image.shape)
I am working on an image processing/building problem. I have a smaller image that I want to place into a larger one. As normal the image is represented as a 3d array. This works fine with the following code (both element_pixels and image_pixels are 3d ndarrays with depth 3 representing RGB, element_pixels is equal to or smaller than image_pixels in the other dimensions):
element_pixels = element.get_pixels()
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :] = element_pixels
However I want to treat black pixels in the element as transparent. The simplest way to do this seems to be to mask the element so I don't modify image_pixels where element_pixel is black. I tried the following, but I am tying myself in knots:
element_pixels = element.get_pixels()
b = np.all(element_pixels == [0, 0, 0], axis=-1)
black_pixels_mask = np.dstack([b,b,b])
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :][black_pixels_mask] = element_pixels
This looks to be correctly generating a mask but I can't figure out how to use it. I get the following error:
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :][black_pixels_mask] = element_pixels
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 3 dimensions
The masking kind-of works (i.e. runs without exceptions) if I replace the final = element_pixels with a constant, but I'm struggling to extrapolate this to a solution.
Extra detail of sizes
element_pixels.shape=(40, 40,3)
image_pixels.shape=(100, 100,3)
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :].shape = (40,40,3)
An MRE in 2D
This captures what I'm trying to do without the complexity of the extra dimension.
import numpy as np
bg = np.ones((10,10))*0.5
img = np.concatenate([np.zeros((5,1)),np.ones((5,1))], axis=1)
mask = img == 0
# copy the *non-zero* pixel values of img to a particular location in bg
bg[5:10,5:7][mask] = img # this throws exception
I discovered after some experimentation that the (perhaps obvious in hindsight) answer is that you have to apply the mask to both sides.
So taking my MRE:
import numpy as np
bg = np.ones((10,10))*0.5
img = np.concatenate([np.zeros((5,1)),np.ones((5,1))], axis=1)
mask = img > 0
bg[5:10,5:7][mask] = img[mask]
Or going back to my original code, the only line that changes is:
image_pixels[element.position[0]:element.position[0]+element.height, element.position[1]:element.position[1]+element.width, :][~black_pixels_mask] = element_pixels[~black_pixels_mask]
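An alternative sketch that avoids boolean indexing altogether: compute the pasted region with np.where, broadcasting the 2-D "is black" mask over the RGB channels. Here top and left stand in for element.position[0] and element.position[1], and the function name is hypothetical:

```python
import numpy as np

def paste_with_black_transparent(image_pixels, element_pixels, top, left):
    """Paste element_pixels into image_pixels at (top, left), skipping pure-black pixels."""
    h, w = element_pixels.shape[:2]
    region = image_pixels[top:top + h, left:left + w]  # basic slicing gives a view
    black = np.all(element_pixels == 0, axis=-1)       # (h, w) boolean mask
    # black[..., None] has shape (h, w, 1) and broadcasts across the channels
    region[...] = np.where(black[..., None], region, element_pixels)
    return image_pixels
```

Because region is a view, writing into region[...] modifies image_pixels in place.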
Well, you can use a 2D mask on a 3D array. So something like this will replace all black pixels of img with those of background:
import numpy as np
img = np.random.randint(0, 2, (10, 10, 3))
background = np.random.randint(0, 2, (10, 10, 3))
mask = np.all(img == [0, 0, 0], axis=2)  # shape (10, 10)
img[mask] = background[mask]
I'm not sure I understand what is in image_pixels but I think you can do something similar.
When you use opencv-python (cv2) and read from a VideoCapture device it returns a numpy array representing the image, in my case the dimensions are (480, 640, 3). I read about vectorizing but I haven't been able to really understand it.
This is the function I want to map:
def RGBtoRGChromaticity(pixel):
    r, g, b = pixel
    total = r + g + b
    return r/total, g/total, b/total
Here is my attempt at vectorizing it, but it doesn't work :(
def RGBtoRGChromaticity(pixel):
    r, g, b = pixel
    total = ufunc.add(r, g, b)
    return ufunc.true_divide(r, total), ufunc.true_divide(g, total), ufunc.true_divide(b, total)
I am trying to take an image and find green pixels. I found this article on a color space called RG Chromaticity which, from my understanding, makes it easy to find the dominant color in each pixel. The math on the article seems to follow this idea. My main question here is how to map a function over the numpy array, but if anyone has any advice on color spaces and better ways to approach this project please don't hesitate to share!!
The point of vectorization is that you apply it to the whole array at once, not to each pixel.
Let's say you have
img = np.random.randint(255, size=(480, 640, 3), dtype=np.uint8)
Your function then becomes:
def RGBtoRGC(img):
    # Divide each pixel by the sum of its channels; pure-black pixels (sum 0) give nan
    return img / img.sum(axis=-1, keepdims=True)
If you want the output to be uint8, scale the ratios from [0, 1] to [0, 255] and round:
def RGBtoRGC(img):
    return np.rint(255 * img / img.sum(axis=-1, keepdims=True)).astype(np.uint8)
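Since the goal is to find green pixels, here is a sketch that also avoids the division by zero for black pixels (they get chromaticity 0) and thresholds the green share. The threshold value 0.5 is an arbitrary assumption to tune for your data. Green is channel 1 in both RGB and OpenCV's BGR order, so the mask works on VideoCapture frames as they come:

```python
import numpy as np

def rgb_to_chromaticity(img):
    """Each channel divided by the pixel's channel sum; black pixels map to 0."""
    img = img.astype(np.float64)
    total = img.sum(axis=-1, keepdims=True)
    out = np.zeros_like(img)
    # where= skips the division wherever the channel sum is zero
    np.divide(img, total, out=out, where=total > 0)
    return out

def green_mask(img, threshold=0.5):
    """Boolean (H, W) mask of pixels whose green share exceeds threshold (assumed value)."""
    return rgb_to_chromaticity(img)[..., 1] > threshold
```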
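How about splitting the channels into separate arguments of the mapped function, like this? (Note that np.vectorize is provided mainly for convenience, not performance: it is essentially a Python for loop.)
import numpy as np
def RGBtoRGChromaticity(r, g, b):
    total = r + g + b
    return r / total, g / total, b / total
vfunc = np.vectorize(RGBtoRGChromaticity)
image = np.random.uniform(size=(480, 640, 3))
res = vfunc(image[:, :, 0], image[:, :, 1], image[:, :, 2])  # tuple of three (480, 640) arrays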
I would like to apply a filter/kernel to an image to alter it (for instance, perform vertical edge detection, diagonal blur, etc). I found this wikipedia page with some interesting examples of kernels.
When I look online, filters are implemented using OpenCV or default matplotlib/Pillow functions. I want to be able to modify an image using only numpy arrays and functions like matrix multiplication and such (there doesn't appear to be a default numpy function to perform a 2D convolution). I've tried very hard to figure it out, but I keep making errors and I'm also relatively new to numpy.
I worked out this code to convert an image to greyscale:
import numpy as np
from PIL import Image
img = Image.open("my_path/my_image.jpeg")
img = np.array(img.resize((180, 320)))
grey = np.zeros((320, 180))
grey_avg_array = (np.sum(img,axis=-1,keepdims=False)/3)
grey_avg_array = grey_avg_array.astype(np.uint8)
grey_image = Image.fromarray(grey_avg_array)
I have tried to multiply my image by a numpy array [[1, 0, -1], [1, 0, -1], [1, 0, -1]] to implement edge detection but that gave me a broadcasting error. What would some sample code/useful functions that can do this without errors look like?
Also: a minor problem I've faced all day is that PIL can't display (x, x, 1) shaped arrays as images. Why is this? How do I get it to fix this? (np.squeeze didn't work)
Note: I would highly recommend checking out OpenCV, which has a large variety of built-in image filters.
Also: a minor problem I've faced all day is that PIL can't display (x, x, 1) shaped arrays as images. Why is this? How do I get it to fix this? (np.squeeze didn't work)
I assume the issue here is with processing grayscale float arrays. To fix this issue, you have to convert the float arrays to np.uint8 and use the 'L' mode in PIL.
import numpy as np
from PIL import Image
img_arr = np.random.rand(100, 100)  # Our float array in the range [0, 1)
uint8_img_arr = np.uint8(img_arr * 255)  # Converted to the np.uint8 type
img = Image.fromarray(uint8_img_arr, 'L')  # Create a grayscale PIL Image from uint8_img_arr
As for doing convolutions, SciPy provides functions for doing convolutions with kernels that you may find useful.
But since we're solely using NumPy, let's implement it!
Note: To make this as general as possible, I am adding a few extra parameters that may or may not be important to you.
import numpy as np
# Assuming the image has channels as the last dimension.
# filter.shape -> (kernel_size, kernel_size, channels)
# image.shape -> (width, height, channels)
# Note: the kernel is not flipped, so strictly this computes a cross-correlation
# (identical to a convolution for symmetric kernels such as the box filter below).
def convolve(image, filter, padding=(1, 1)):
    # For this to work neatly, filter and image should have the same number of channels
    # Alternatively, filter could have just 1 channel or 2 dimensions
    if image.ndim == 2:
        image = np.expand_dims(image, axis=-1)  # Convert 2D grayscale images to 3D
    if filter.ndim == 2:
        filter = np.repeat(np.expand_dims(filter, axis=-1), image.shape[-1], axis=-1)  # Same with filters
    if filter.shape[-1] == 1:
        filter = np.repeat(filter, image.shape[-1], axis=-1)  # Give filter the same channel count as the image
    assert image.shape[-1] == filter.shape[-1]
    size_x, size_y = filter.shape[:2]
    width, height = image.shape[:2]
    output_array = np.zeros(((width - size_x + 2*padding[0]) + 1,
                             (height - size_y + 2*padding[1]) + 1,
                             image.shape[-1]))  # Convolution Output: [(W−K+2P)/S]+1
    padded_image = np.pad(image, [
        (padding[0], padding[0]),
        (padding[1], padding[1]),
        (0, 0)
    ])
    for x in range(padded_image.shape[0] - size_x + 1):  # -size_x + 1 is to keep the window within the bounds of the image
        for y in range(padded_image.shape[1] - size_y + 1):
            # Creates the window with the same size as the filter
            window = padded_image[x:x + size_x, y:y + size_y]
            # Sums over the product of the filter and the window
            output_values = np.sum(filter * window, axis=(0, 1))
            # Places the calculated value into the output_array
            output_array[x, y] = output_values
    return output_array
Here is an example of its usage:
Original Image (saved as original.png):
import numpy as np
from PIL import Image
filter = np.array([
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1]
], dtype=np.float32)/9.0  # Box Filter
image = Image.open('original.png').convert('RGB')  # drop any alpha channel so the 'RGB' mode below fits
image_arr = np.array(image)/255.0
convolved_arr = convolve(image_arr, filter, padding=(1, 1))
convolved = Image.fromarray(np.uint8(255 * convolved_arr), 'RGB')  # Convolved Image
Convolved Image:
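The double loop in convolve() gets slow for large images. A vectorised sketch of the same computation (same zero padding, same output size, kernel not flipped), assuming NumPy 1.20 or newer for sliding_window_view and a 2-D kernel applied to every channel:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve_fast(image, kernel, padding=(1, 1)):
    """Loop-free version of convolve() for an (H, W) or (H, W, C) image and a 2-D kernel."""
    if image.ndim == 2:
        image = image[..., np.newaxis]
    padded = np.pad(image, [(padding[0], padding[0]), (padding[1], padding[1]), (0, 0)])
    kh, kw = kernel.shape
    # Window axes are appended at the end: shape (out_h, out_w, C, kh, kw)
    windows = sliding_window_view(padded, (kh, kw), axis=(0, 1))
    # Multiply every window by the kernel and sum over the window axes
    return np.einsum('xycij,ij->xyc', windows, kernel)
```

For the box-filter example, convolve_fast(image_arr, filter) should give the same array as convolve(image_arr, filter, padding=(1, 1)).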
A few things:
OpenCV, SciPy and scikit-image all use Numpy arrays as the standard way to store and manipulate images, and are all largely interoperable with Numpy and each other.
As regards plotting im with shape (x,y,1), you can just take the zeroth plane and plot that, i.e. newim = im[...,0]
When converting an RGB image to greyscale, rather than add all the RGB components up and divide by 3, you could just calculate the mean:
grey = np.mean(im, axis=2)
Actually the recommended weightings in ITU-R 601-2 are
L = 0.299 * Red + 0.587 * Green + 0.114 * Blue
So, you can use np.dot() to do that:
grey = np.dot(RGBimg[...,:3], [0.299, 0.587,0.114]).astype(np.uint8)
As regards finding vertical edges, you can do this with Numpy by subtracting each pixel from the one to its immediate right, i.e. differencing. Here is a little example, I also drew the shapes with Numpy so you can see a way to do that without using OpenCV since it seems to upset you so much ;-)
#!/usr/bin/env python3
import numpy as np
# Create a test image with a white square on black
rect = np.zeros((200,200), dtype=np.uint8)
rect[40:-40,40:-40] = 255
# Create a test image with a white circle on black
xx, yy = np.mgrid[:200, :200]
circle = (xx - 100) ** 2 + (yy - 100) ** 2
circle = (circle<4096).astype(np.uint8)*255
# Concatenate side-by-side to make our test image
im = np.hstack((rect,circle))
That now looks like this:
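# Calculate horizontal differences, only finding increasing brightnesses
# (in uint8 arithmetic, decreases wrap around instead of going negative)
d = im[:,1:] - im[:,0:-1]
# Calculate horizontal differences finding increasing or decreasing brightnesses
# (cast to a signed type first so negative differences survive, then take abs)
d = np.abs(im[:,1:].astype(np.int16) - im[:,0:-1].astype(np.int16))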
Not very efficient, but you could extend your code as follows to detect edges:
edge = np.zeros(grey_avg_array.shape)
for i in range(grey_avg_array.shape[0]-2):
    for j in range(grey_avg_array.shape[1]-2):
        edge[i+1, j+1] = np.sum(grey_avg_array[i:i+3, j:j+3]*[[1, 0, -1], [1, 0, -1], [1, 0, -1]])
# Responses can be negative or exceed 255, so take the magnitude and clip before casting
edge = np.clip(np.abs(edge), 0, 255).astype(np.uint8)
edge_img = Image.fromarray(edge)
To display the image in (say) a Jupyter Notebook, just evaluate the variable name (e.g. edge_img) on its own as the last line of a cell, after calling Image.fromarray().
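The same [[1, 0, -1]] * 3 kernel can be applied with array slicing instead of the double loop, since it just sums "left column minus right column" over three rows. A sketch, with `vertical_edges` a hypothetical name; taking the absolute value and clipping to 0..255 is a choice made here so negative responses do not wrap around in uint8:

```python
import numpy as np

def vertical_edges(grey):
    """Apply the [[1, 0, -1]] * 3 kernel to a 2-D greyscale array; borders stay 0."""
    g = grey.astype(np.int32)
    # Column j minus column j + 2 for every valid j
    diff = g[:, :-2] - g[:, 2:]
    # Sum that difference over three consecutive rows
    response = diff[:-2] + diff[1:-1] + diff[2:]
    edge = np.zeros(g.shape, dtype=np.int32)
    edge[1:-1, 1:-1] = response
    return np.clip(np.abs(edge), 0, 255).astype(np.uint8)
```

Then edge_img = Image.fromarray(vertical_edges(grey_avg_array)).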
I can declare a 3D array like this:
# a 3D array, shape-(2, 2, 2)
array_3d = np.array([[[0, 1],[2, 3]], [[4, 5],[6, 7]]])
So if I have a 10x10-pixel image with 3 RGB channels, I would expect image.shape to be (3, 10, 10).
But I always see image.shape equal to (10, 10, 3), and I don't understand why.
Thanks for your attention.
Usually in numpy and matplotlib the RGB channels are in the last axis. This is just a convention, so there is little you can do about it. If you use a program that uses the other convention (channels first), you can transform the image with:
channels_first_im = np.moveaxis(channels_last_im, -1, 0)
and the other way:
channels_last_im = np.moveaxis(channels_first_im, 0, -1)
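A quick round-trip check of the two conversions on a small random image; np.transpose with an explicit axis order does the same thing:

```python
import numpy as np

channels_last_im = np.random.random((10, 10, 3))                # (H, W, C)
channels_first_im = np.moveaxis(channels_last_im, -1, 0)        # (C, H, W)
same_via_transpose = np.transpose(channels_last_im, (2, 0, 1))  # also (C, H, W)
restored = np.moveaxis(channels_first_im, 0, -1)                # back to (H, W, C)
```

Both conversions return views, so no pixel data is copied.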
If you're confused about why the convention image arrays would be of shape (N, M, 3) instead of (3, N, M), let's look at how indexing would work in both of those scenarios.
Let's assume we have an image called image that represents a randomly colored picture with a width and height of 100 pixels, and let's index it to access the value of the pixel at index (50, 50).
Channels First
import numpy as np
image = np.random.random((3, 100, 100)) #image.shape == (3, 100, 100)
pixel = image[:, 50, 50] #pixel.shape == (3,)
Channels Last
import numpy as np
image = np.random.random((100, 100, 3)) #image.shape == (100, 100, 3)
pixel = image[50, 50] #pixel.shape == (3,)
Having the channels as the last dimension of the array means that all the information for an individual pixel can be indexed directly, whereas in the first case we need to ask for the entire first dimension every time. Both layouts hold the same data, but keeping the channels last lets us index the array less verbosely.