I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem.
I have a set of images of different sizes, each with a width/height ratio less than or equal to 2 (it could be anything, but let's say 2 for now). I want to normalize them, meaning I want all of them to have the same size. Specifically, I am going to do it like this:
Extract the max height above all images
Zoom the image so that each image reaches the max height keeping its ratio
Add a padding to the right with just white pixels until the image has a width/height ratio of 2
Keep in mind the images are represented as numpy matrices of grey scale values [0,255].
This is how I'm doing it now in Python:
max_height = numpy.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
for obs in data:
    if len(obs[0])/len(obs) <= 2:
        new_img = ndimage.zoom(obs, round(max_height/len(obs), 2), order=3)
        missing_cols = max_height * 2 - len(new_img[0])
        norm_img = []
        for row in new_img:
            norm_img.append(np.pad(row, (0, missing_cols), mode='constant', constant_values=255))
        norm_img = np.resize(norm_img, (max_height, max_height*2))
There's a note about this code:
I'm rounding the zoom ratio because it makes the final height equal to max_height. I'm sure this is not the best approach but it's working (any suggestion is appreciated here). What I'd like to do is expand the image, keeping its ratio, until its height equals max_height. This is the only solution I have found so far and it worked right away; the interpolation works pretty well.
So my final questions are:
Is there a better approach to achieve what is explained above (image normalization)? Do you think I could have done this differently? Is there a common good practice I'm not following?
Thanks in advance for your time.
Instead of ndimage.zoom you could use scipy.misc.imresize. This function allows you to specify the target size as a tuple, instead of by zoom factor. Thus you won't have to call np.resize later to get the size exactly as desired.
Note that scipy.misc.imresize calls PIL.Image.resize under the hood, so PIL (or Pillow) is a dependency. It was also deprecated in SciPy 1.0 and removed in SciPy 1.3, so on a recent SciPy you would call PIL.Image.resize directly.
Instead of using np.pad in a for-loop, you could allocate space for the desired array, norm_arr, first:
norm_arr = np.full((max_height, max_width), fill_value=255)
and then copy the resized image, new_arr into norm_arr:
nh, nw = new_arr.shape
norm_arr[:nh, :nw] = new_arr
For example,
from __future__ import division
import numpy as np
from scipy import misc

data = [np.linspace(255, 0, i*10).reshape(i,10)
        for i in range(5, 100, 11)]
max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
max_width = 2*max_height
result = []
for obs in data:
    norm_arr = obs
    h, w = obs.shape
    if float(w)/h <= 2:
        scale_factor = max_height/float(h)
        target_size = (max_height, int(round(w*scale_factor)))
        new_arr = misc.imresize(obs, target_size, interp='bicubic')
        norm_arr = np.full((max_height, max_width), fill_value=255)
        # check the shapes
        # print(obs.shape, new_arr.shape, norm_arr.shape)
        nh, nw = new_arr.shape
        norm_arr[:nh, :nw] = new_arr
    result.append(norm_arr)
    # visually check the result
    # misc.toimage(norm_arr).show()
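If only the padding step is of interest, a single two-dimensional np.pad call should give the same result as preallocating and copying. The following is a minimal sketch, assuming a 2-D grayscale array that has already been resized to the target height; the helper name pad_to_ratio and its parameters are made up for illustration and are not part of the answer above.

```python
import numpy as np

def pad_to_ratio(img, ratio=2, fill=255):
    """Pad a 2-D grayscale array on the right until width == ratio * height.

    Hypothetical helper for illustration only.
    """
    h, w = img.shape
    missing_cols = ratio * h - w
    if missing_cols < 0:
        raise ValueError("image is already wider than ratio * height")
    # ((top, bottom), (left, right)): pad only on the right-hand side
    return np.pad(img, ((0, 0), (0, missing_cols)),
                  mode='constant', constant_values=fill)

# Usage: a 4x5 black image becomes 4x8 with three white columns on the right
img = np.zeros((4, 5), dtype=np.uint8)
padded = pad_to_ratio(img)
```

Padding the whole array at once avoids the per-row Python loop, and because the output shape is computed from the input, no np.resize call is needed afterwards.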
I have a gray scale image that I want to rotate. However, I need to do optimization on it. Therefore, I cannot use pillow or opencv.
I want to reshape this image with numpy.reshape into a one-dimensional vector (using the default C-style order).
Thereafter, I want to rotate this image around a point using matrix multiplication and addition, i.e. it should be something like
rotated_image_vector = A @ vector + b  # (or the equivalent in homogeneous coordinates)
After this operation I want to reshape the result back to two dimensions and have the rotated image.
Ideally it would also use linear interpolation for pixels that do not map exactly onto another pixel.
The mathematical theory says this is possible, and I believe there is a very elegant solution to this problem, but I do not see how to create this matrix. Has anyone already had this problem, or does anyone see an immediate solution?
Thanks a lot,
Eike
I like your approach but there is a slight misconception in it. What you want to transform are not the pixel values themselves but the coordinates. So you don't reshape your image but rather call np.indices on it to obtain the coordinates of each pixel. For those, a rotation around a point looks like
rotation_matrix@(coordinates-fixed_point)+fixed_point
except that I have to transpose a bit to get the dimensions to align. The code below is a slight adaptation of my code in this answer.
As an example I am going to use the Wikipedia-logo-v2 by Nohat. It is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
First I read in the picture, swap x and y axis to not get mad and rotate the coordinates as described above.
import numpy as np
import matplotlib.pyplot as plt
import itertools
image = plt.imread('wikipedia.jpg')
image = np.swapaxes(image,0,1)/255
fixed_point = np.array(image.shape[:2], dtype='float')/2
points = np.moveaxis(np.indices(image.shape[:2]),0,-1).reshape(-1,2)
a = 2*np.pi/8
A = np.array([[np.cos(a),-np.sin(a)],[np.sin(a),np.cos(a)]])
rotated_coordinates = (A@(points-fixed_point.reshape(1,2)).T).T+fixed_point.reshape(1,2)
Now I set up a little class to interpolate between the pixels that do not fit exactly to an other pixel. And finally I swap the axis back and plot it.
class Image_knn():
    def fit(self, image):
        self.image = image.astype('float')

    def predict(self, x, y):
        image = self.image
        weights_x = [(1-(x % 1)).reshape(*x.shape,1), (x % 1).reshape(*x.shape,1)]
        weights_y = [(1-(y % 1)).reshape(*x.shape,1), (y % 1).reshape(*x.shape,1)]
        start_x = np.floor(x)
        start_y = np.floor(y)
        return sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
                          np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')] * weights_x[x]*weights_y[y]
                    for x,y in itertools.product(range(2),range(2))])

image_model = Image_knn()
image_model.fit(image)
transformed_image = image_model.predict(*rotated_coordinates.T).reshape(*image.shape)
plt.imshow(np.swapaxes(transformed_image,0,1))
And I get a result like this
Possible Issue
The artifact in the bottom left, which looks as if the screen needs cleaning, comes from the following problem: when we rotate, it can happen that we don't have enough pixels to paint the lower left. What Image_knn does by default is clip the coordinates to the area where we have information. That means when we ask Image_knn for pixels from outside the image, it gives us the pixels at the boundary of the image. This looks good if there is a background, but if an object touches the edge of the picture it looks odd, like here. Just something to keep in mind when using this.
Thank you for your answer!
But actually it is not a misconception that this rotation can be represented by a matrix multiplication with the reshaped vector.
I used your code to generate such a matrix (it's surely not the most efficient way but it works; most likely you will see a more efficient implementation immediately XD. You see, I really need it as a matrix multiplication :-D).
What I basically did is generate the representation matrix of the linear transformation by computing how each of the 100*100 basis images (i.e. the images with zeros everywhere and a single one) is mapped by your transformation.
import sys
import numpy as np
import matplotlib.pyplot as plt
import itertools
angle = 2*np.pi/6
image_expl = plt.imread('wikipedia.jpg')
image_expl = image_expl[:,:,0]
plt.imshow(image_expl)
plt.title("Image")
plt.show()
image_shape = image_expl.shape
pixel_number = image_shape[0]*image_shape[1]
rot_mat = np.zeros((pixel_number,pixel_number))
for i in range(pixel_number):
    # i-th basis image: zeros everywhere except a single one
    vector = np.zeros(pixel_number)
    vector[i] = 1
    image = vector.reshape(*image_shape)
    fixed_point = np.array(image.shape, dtype='float')/2
    points = np.moveaxis(np.indices(image.shape),0,-1).reshape(-1,2)
    a = -angle
    A = np.array([[np.cos(a),-np.sin(a)],[np.sin(a),np.cos(a)]])
    rotated_coordinates = (A@(points-fixed_point.reshape(1,2)).T).T+fixed_point.reshape(1,2)
    x,y = rotated_coordinates.T
    image = image.astype('float')
    weights_x = [(1-(x % 1)).reshape(*x.shape), (x % 1).reshape(*x.shape)]
    weights_y = [(1-(y % 1)).reshape(*x.shape), (y % 1).reshape(*x.shape)]
    start_x = np.floor(x)
    start_y = np.floor(y)
    transformed_image_returned = sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
                                            np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')] * weights_x[x]*weights_y[y]
                                      for x,y in itertools.product(range(2),range(2))])
    # the rotated basis image is the i-th column of the matrix
    rot_mat[:,i] = transformed_image_returned
    if i%100 == 0: print(int(100*i/pixel_number), "% finished")
plt.imshow((rot_mat @ image_expl.reshape(-1)).reshape(image_shape))
Thank you again :-)
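The same matrix can probably be assembled without looping over every basis image, by writing the four bilinear weights of each output pixel straight into its row. The sketch below is one way to do that under the same conventions as the code above (C-order flattening, rotation about the image centre, coordinates clamped at the border). The function name rotation_as_matrix is hypothetical, and the matrix is dense, so it is only practical for small images.

```python
import numpy as np

def rotation_as_matrix(shape, angle):
    """Return a dense (N, N) matrix M such that M @ img.ravel() is the
    bilinearly interpolated, edge-clamped rotation of a 2-D image.

    Hypothetical helper; each output pixel samples the input at its
    rotated coordinate (inverse mapping).
    """
    h, w = shape
    n = h * w
    center = np.array([h, w], dtype=float) / 2
    # One (row, col) coordinate pair per output pixel, in C order
    out_coords = np.indices((h, w)).reshape(2, -1).T
    c, s = np.cos(angle), np.sin(angle)
    A = np.array([[c, -s], [s, c]])
    # Location in the input image that each output pixel reads from
    src = (out_coords - center) @ A.T + center
    base = np.floor(src)
    frac = src - base
    M = np.zeros((n, n))
    rows = np.arange(n)
    for dx in (0, 1):
        for dy in (0, 1):
            ix = np.clip(base[:, 0] + dx, 0, h - 1).astype(int)
            iy = np.clip(base[:, 1] + dy, 0, w - 1).astype(int)
            wx = frac[:, 0] if dx else 1 - frac[:, 0]
            wy = frac[:, 1] if dy else 1 - frac[:, 1]
            # add.at accumulates when clamping maps two corners to one pixel
            np.add.at(M, (rows, ix * w + iy), wx * wy)
    return M

# Usage: a zero-angle rotation leaves the image unchanged
img = np.arange(12, dtype=float).reshape(3, 4)
M = rotation_as_matrix(img.shape, 0.0)
same = (M @ img.ravel()).reshape(img.shape)
```

Each row holds at most four non-zero entries, so for larger images a sparse matrix format would be the natural next step.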
After training a model (image classification) I would like to see how it performs differently when I evaluate a proper image and various noised versions of it.
The type of noise I'm thinking is a random change in pixels value, I tried with this approach:
# --Inside the generator function that I provide to model.predict_generator--
# dataset is a numpy array with denoised images path
dt = tf.data.Dataset.from_generator(lambda: image_generator(dataset), output_types=(tf.float32))
def image_generator(image_paths):
    for path in image_paths:
        # im is keras.preprocessing image
        img = im.load_img(path,
                          color_mode='rgb',
                          target_size=(224,224))
        img_to_numpy = np.array(img)
        for _ in range (0, 5):
            tmp_numpy_image = img_to_numpy.copy()
            for i in range(tmp_numpy_image.shape[0]):
                for j in range(tmp_numpy_image.shape[1]):
                    # add noise
                    tmp_numpy_image[i][j] = ...
            yield tmp_numpy_image
This process works fine but it is very slow. I also use dataset.batch and dataset.prefetch on dt, and I haven't found a combination of their values that reduces the running time.
Is there a smarter way to do it? I tried yielding the clean images and adding the noise later inside dataset.map. The problem is that inside map I have to manipulate tensors, and I haven't found a way to change each pixel value.
SOLUTION
I used @Marat's approach and it worked like a charm; the whole process went from 20-30 hours to minutes. My noise was a simple +-1 but I didn't want to overflow (255+1 = 0 in uint8), so I only had to use numpy masks
...
tmp_numpy_image = img_to_numpy.copy()
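# randint's upper bound is exclusive, so 2 is needed to draw from {-1, 0, 1}
noise = np.random.randint(-1, 2, img_to_numpy.shape)
# tmp_numpy_image will be promoted to a wider signed integer type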
tmp_numpy_image = tmp_numpy_image + noise
np.putmask(tmp_numpy_image, tmp_numpy_image < 0, 0)
np.putmask(tmp_numpy_image, tmp_numpy_image > 255, 255)
tmp_numpy_image = tmp_numpy_image.astype('uint8')
yield tmp_numpy_image
The biggest overhead here is pixel operations (double for loop). Vectorizing it should result in substantial speedup:
noise_magnitude = 10
...
img_max_value = img_to_numpy.max() * np.ones(img_to_numpy.shape)
for _ in range (0, 5):
    # depending on range of values, you might want to adjust noise magnitude
    noise = np.random.randint(0, noise_magnitude, img_to_numpy.shape)
    # after adding noise, clip values exceeding max values
    # (np.minimum caps at the maximum; np.maximum would raise every pixel to it)
    yield np.minimum(img_to_numpy + noise, img_max_value)
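Both ideas, vectorized noise and overflow-safe clipping, can be combined into one small function. This is only a sketch under the assumption that the images are uint8 arrays in the range 0 to 255; the name add_noise and its parameters are illustrative and do not come from either post.

```python
import numpy as np

def add_noise(img, magnitude=1, rng=None):
    """Add integer noise in [-magnitude, magnitude] to a uint8 image.

    Hypothetical helper: the sum is formed in a wider signed type so that
    255 + 1 or 0 - 1 cannot wrap around, then clipped back into 0..255.
    """
    rng = np.random.default_rng() if rng is None else rng
    # integers() excludes the upper bound, hence magnitude + 1
    noise = rng.integers(-magnitude, magnitude + 1, size=img.shape)
    noisy = img.astype(np.int16) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Usage: the extreme values 0 and 255 stay inside the valid range
img = np.array([[0, 128, 255]], dtype=np.uint8)
noisy = add_noise(img, magnitude=1, rng=np.random.default_rng(0))
```

np.clip replaces the two np.putmask calls, and passing a seeded generator keeps the noised variants reproducible between evaluation runs.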
I am using Python 3.6 to perform basic image manipulation through Pillow. Currently, I am attempting to take 32-bit PNG images (RGBA) of arbitrary color compositions and sizes and quantize them to a known palette of 16 colors. Optimally, this quantization method should be able to leave fully transparent (A = 0) pixels alone, while forcing all semi-transparent pixels to be fully opaque (A = 255). I have already devised working code that performs this, but I wonder if it may be inefficient:
import math
from PIL import Image

# a list of 16 RGBA tuples
palette = [
    (0, 0, 0, 255),
    # ...
]

with Image.open('some_image.png').convert('RGBA') as img:
    for py in range(img.height):
        for px in range(img.width):
            pix = img.getpixel((px, py))
            if pix[3] == 0:  # Ignore fully transparent pixels
                continue
            # Perform exhaustive search for closest Euclidean distance
            dist = 450
            best_fit = (0, 0, 0, 0)
            for c in palette:
                if pix[:3] == c[:3]:  # If pixel matches exactly, break
                    best_fit = c
                    break
                tmp = math.sqrt(pow(pix[0]-c[0], 2) + pow(pix[1]-c[1], 2) + pow(pix[2]-c[2], 2))
                if tmp < dist:
                    dist = tmp
                    best_fit = c
            # keep the palette RGB and force the pixel fully opaque
            img.putpixel((px, py), best_fit[:3] + (255,))
    img.save('quantized.png')
I think of two main inefficiencies of this code:
Image.putpixel() is a slow operation
Calculating the distance function multiple times per pixel is computationally wasteful
Is there a faster method to do this?
I've noted that Pillow has a native function Image.quantize() that seems to do exactly what I want. But as it is coded, it forces dithering in the result, which I do not want. This has been brought up in another StackOverflow question. The answer to that question was simply to extract the internal Pillow code and tweak the control variable for dithering, which I tested, but I find that Pillow corrupts the palette I give it and consistently yields an image where the quantized colors are considerably darker than they should be.
Image.point() is a tantalizing method, but it only works on each color channel individually, where color quantization requires working with all channels as a set. It'd be nice to be able to force all of the channels into a single channel of 32-bit integer values, which seems to be what the ill-documented mode "I" would do, but if I run img.convert('I'), I get a completely greyscale result, destroying all color.
An alternative method seems to be using NumPy and altering the image directly. I've attempted to create a lookup table of RGB values, but the three-dimensional indexing of NumPy's syntax is driving me insane. Ideally I'd like some kind of code that works like this:
img_arr = numpy.array(img)
# Find all unique colors
unique_colors = numpy.unique(arr, axis=0)
# Generate lookup table
colormap = numpy.empty(unique_colors.shape)
for i, c in enumerate(unique_colors):
    dist = 450
    best_fit = None
    for pc in palette:
        tmp = sqrt(pow(c[0] - pc[0], 2) + pow(c[1] - pc[1], 2) + pow(c[2] - pc[2], 2))
        if tmp < dist:
            dist = tmp
            best_fit = pc
    colormap[i] = best_fit
# Hypothetical pseudocode I can't seem to write out
for iy in range(arr.size):
    for ix in range(arr[0].size):
        if arr[iy, ix, 3] == 0: # Skip transparent
            continue
        index = # Find index of matching color in unique_colors, somehow
        arr[iy, ix] = colormap[index]
I note with this hypothetical example that numpy.unique() is another slow operation, since it sorts the output. Since I cannot seem to finish the code the way I want, I haven't been able to test if this method is faster anyway.
I've also considered attempting to flatten the RGBA axis by converting the values to a 32-bit integer and desiring to create a one-dimensional lookup table with the simpler index:
def shift(a):
    return a[0] << 24 | a[1] << 16 | a[2] << 8 | a[3]

img_arr = numpy.apply_along_axis(shift, 1, img_arr)
But this operation seemed noticeably slow on its own.
I would prefer answers that involve only Pillow and/or NumPy, please. Unless using another library demonstrates a dramatic computational speed increase over any PIL- or NumPy-native solution, I don't want to import extraneous libraries to do something these two libraries should be reasonably capable of on their own.
for loops should be avoided for speed.
I think you should make a tensor like:
d2[x,y,color_index,rgb] = distance_squared
where rgb = 0..2 (0 = r, 1 = g, 2 = b).
Then compute the distance:
d[x,y,color_index] = sqrt(sum(rgb,d2))
Then select the color_index with the minimal distance:
c[x,y] = min_index(color_index, d)
Finally replace alpha as needed:
alpha = ceil(orig_image.alpha)
img = c,alpha
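The tensor recipe above maps fairly directly onto NumPy broadcasting. What follows is a minimal sketch, assuming an (H, W, 4) uint8 RGBA array and a palette given as a list of RGBA tuples; the function name quantize_rgba is made up for illustration. The square root is skipped because it does not change which palette entry is nearest.

```python
import numpy as np

def quantize_rgba(arr, palette):
    """Map every non-transparent pixel to its nearest palette colour.

    Hypothetical helper: fully transparent pixels (A == 0) are left
    untouched, all other pixels become fully opaque (A == 255).
    """
    arr = np.asarray(arr, dtype=np.uint8)
    pal = np.asarray(palette, dtype=np.int32)[:, :3]       # (P, 3)
    rgb = arr[..., :3].astype(np.int32)                    # (H, W, 3)
    # (H, W, 1, 3) - (P, 3) broadcasts to (H, W, P, 3); sum over the RGB axis
    d2 = ((rgb[:, :, None, :] - pal) ** 2).sum(axis=-1)    # (H, W, P)
    idx = d2.argmin(axis=-1)                               # (H, W)
    out = np.empty_like(arr)
    out[..., :3] = pal[idx]
    out[..., 3] = 255
    # Restore the pixels that were fully transparent in the input
    transparent = arr[..., 3] == 0
    out[transparent] = arr[transparent]
    return out

# Usage with a two-colour palette and three pixels
palette = [(0, 0, 0, 255), (255, 255, 255, 255)]
arr = np.array([[[10, 10, 10, 128],
                 [250, 240, 245, 255],
                 [7, 8, 9, 0]]], dtype=np.uint8)
out = quantize_rgba(arr, palette)
```

The intermediate distance array holds H * W * P * 3 integers, which is fine for 16 colours but may call for processing the image in strips when it is very large. The result can be handed back to Pillow with Image.fromarray.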
Previously, I had created a Mandelbrot generator in python using turtle. Now, I am re-writing the program to use the Python Imaging Library in order to increase speed and reduce limits on size of images.
However, the program below only outputs RGB nonsense, almost noise. I think it is something to do with a difference in the way NumPy and PIL deal with arrays, since saying l[x,y] = [1,1,1] where l = np.zeros((height,width,3)) doesn't just make 1 pixel white when img = Image.fromarray(l) and img.show() are performed.
def imagebrot(mina=-1.25, maxa=1.25, minb=-1.25, maxb=1.25, width=100, height=100, maxit=300, inf=2):
    l,b = np.zeros((height,width,3), dtype=np.float64), minb
    for y in range(0, height):
        a = mina
        for x in range(0, width):
            ab = mandel(a, b, maxit, inf)
            if ab[0] == maxit:
                l[x,y:] = [1,1,1]
            #if ab[0] < maxit:
                #smoothit = mandelc(ab[0], ab[1], ab[2])
                #l[x, y] = colorsys.hsv_to_rgb(smoothit, 1, 1)
            a += abs(mina-maxa)/width
        b += abs(minb-maxb)/height
    img = Image.fromarray(l, "RGB")
    img.show()

def mandel(re, im, maxit, inf):
    z = complex(re, im)
    c,it = z,0
    for i in range(0, maxit):
        if abs(z) > inf:
            break
        z,it = z*z+c,it+1
    return it,z,inf

def mandelc(it,z,inf):
    return (it+1-log(log(abs(z)))/log(2))
UPDATE 1:
I realised that one of the major errors in this program (I'm sure there are many) is the fact that I was using the x,y coords as the complex coefficients! So, 0 to 100 instead of -1.25 to 1.25! I have changed this so that the code now uses variables a,b to describe them, incremented in a manner I've stolen from some of my code in the turtle version. The code above has been updated accordingly. Since the Smooth Colouring Algorithm code is currently commented out for debugging, the inf variable has been reduced to 2 in size.
UPDATE 2:
I have edited the numpy index with help from a great user. The program now outputs this when set to 200 by 200:
As you can see, it definitely shows some mathematical shape and yet is filled with all these strange red, green and blue pixels! Why could these be here? My program can only set RGB values to [1,1,1] or leave it as a default [0,0,0]. It can't be [1,0,0] or anything like that - this must be a serious flaw...
UPDATE 3:
I think there is an error with NumPy and PIL's integration. If I make l = np.zeros((100, 100, 3)) and then state l[0,0,:] = 1 and finally img = Image.fromarray(l) & img.show(), this is what we get:
Here we get a series of coloured pixels. This calls for another question.
UPDATE 4:
I have no idea what was happening previously, but it seems with a np.uint8 array, Image.fromarray() uses colour values from 0-255. With this piece of wisdom, I move one step closer to understanding this Mandelbug!
Now, I do get something vaguely mathematical, however it still outputs strange things.
This dot is all there is... I get even stranger things if I change to np.uint16, I presume due to the different byte-shape and encoding scheme.
You are indexing the 3D array l incorrectly, try
l[x,y,:] = [1,1,1]
instead. For more details on how to access and modify numpy arrays have a look at numpy indexing
As a side note: the quickstart documentation of numpy actually has an implementation of the mandelbrot set generation and plotting.
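To tie the indexing fix together with the observation in UPDATE 4, here is a small sketch that builds the image as a uint8 array indexed as [row, column, channel], with 255 rather than 1 for white. It assumes the same escape rule as mandel() in the question; the function name mandelbrot_rgb is hypothetical, and the loop over pixels is replaced by a boolean mask.

```python
import numpy as np

def mandelbrot_rgb(mina=-1.25, maxa=1.25, minb=-1.25, maxb=1.25,
                   width=100, height=100, maxit=300, inf=2):
    """Return a (height, width, 3) uint8 array, white inside the set.

    Hypothetical helper mirroring the question's parameters.
    """
    a = np.linspace(mina, maxa, width, endpoint=False)
    b = np.linspace(minb, maxb, height, endpoint=False)
    # Rows follow the imaginary axis, columns the real axis
    c = a[np.newaxis, :] + 1j * b[:, np.newaxis]
    z = c.copy()
    inside = np.ones(c.shape, dtype=bool)
    for _ in range(maxit):
        # Points whose |z| exceeds the bound drop out, as in mandel()
        inside &= np.abs(z) <= inf
        z[inside] = z[inside] ** 2 + c[inside]
    rgb = np.zeros((height, width, 3), dtype=np.uint8)
    rgb[inside] = 255
    return rgb

# Usage: a small render; the origin lies inside the set
rgb = mandelbrot_rgb(maxit=50)
```

An array with this shape and dtype is what Image.fromarray expects for an RGB image, so passing rgb to it should display the set without the stray coloured pixels.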
I have a problem in which I have a bunch of images for which I have to generate histograms. But I have to generate a histogram for each pixel, i.e. for a collection of n images, I have to count the values that pixel 0,0 assumed and generate a histogram, then the same for 0,1, 0,2 and so on. I coded the following method to do this:
class ImageData:
    def generate_pixel_histogram(self, images, bins):
        """
        Generate a histogram of the image for each pixel, counting
        the values assumed for each pixel in a specified bins
        """
        max_value = 0.0
        min_value = 0.0
        for i in range(len(images)):
            image = images[i]
            max_entry = max(max(p[1:]) for p in image.data)
            min_entry = min(min(p[1:]) for p in image.data)
            if max_entry > max_value:
                max_value = max_entry
            if min_entry < min_value:
                min_value = min_entry
        interval_size = (math.fabs(min_value) + math.fabs(max_value))/bins
        for x in range(self.width):
            for y in range(self.height):
                pixel_histogram = {}
                for i in range(bins+1):
                    key = round(min_value+(i*interval_size), 2)
                    pixel_histogram[key] = 0.0
                for i in range(len(images)):
                    image = images[i]
                    value = round(Utils.get_bin(image.data[x][y], interval_size), 2)
                    pixel_histogram[value] += 1.0/len(images)
                self.data[x][y] = pixel_histogram
Each position of the matrix stores a dictionary representing a histogram. But since I do this for each pixel and the calculation takes considerable time, this seems to me a good problem to parallelize. I don't have experience with that, though, and I don't know how to do it.
EDIT:
I tried what @Eelco Hoogendoorn told me and it works perfectly. But when I apply it to my code, where the data are a large number of images generated with this constructor (after the values are calculated and not just 0 anymore), I just get an array of zeros [0 0 0] as h. What I pass to the histogram method is an array of ImageData.
class ImageData(object):
    def __init__(self, width=5, height=5, range_min=-1, range_max=1):
        """
        The ImageData constructor
        """
        self.width = width
        self.height = height
        #The values range each pixel can assume
        self.range_min = range_min
        self.range_max = range_max
        self.data = np.arange(width*height).reshape(height, width)
#Another class, just the method here
def generate_pixel_histogram(realizations, bins):
"""
Generate a histogram of the image for each pixel, counting
the values assumed for each pixel in a specified bins
"""
data = np.array([image.data for image in realizations])
min_max_range = data.min(), data.max()+1
bin_boundaries = np.empty(bins+1)
# Function to wrap np.histogram, passing on only the first return value
def hist(pixel):
h, b = np.histogram(pixel, bins=bins, range=min_max_range)
bin_boundaries[:] = b
return h
# Apply this for each pixel
hist_data = np.apply_along_axis(hist, 0, data)
print hist_data
print bin_boundaries
Now I get:
hist_data = np.apply_along_axis(hist, 0, data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 104, in apply_along_axis
outshape[axis] = len(res)
TypeError: object of type 'NoneType' has no len()
Any help would be appreciated.
Thanks in advance.
As noted by john, the most obvious solution to this is to look for library functionality that will do this for you. It exists, and it will be orders of magnitude more efficient than what you are doing here.
Standard numpy has a histogram function that can be used for this purpose. If you have only few values per pixel, it will be relatively inefficient; and it creates a dense histogram vector rather than the sparse one you produce here. Still, chances are good the code below solves your problem efficiently.
import numpy as np
#some example data; 128 images of 4x4 pixels
voxeldata = np.random.randint(0,100, (128, 4,4))
#we need to apply the same binning range to each pixel to get sensible output
globalminmax = voxeldata.min(), voxeldata.max()+1
#number of output bins
bins = 20
bin_boundaries = np.empty(bins+1)
#function to wrap np.histogram, passing on only the first return value
def hist(pixel):
    h, b = np.histogram(pixel, bins=bins, range=globalminmax)
    bin_boundaries[:] = b #simply overwrite; result should be identical each time
    return h
#apply this for each pixel
histdata = np.apply_along_axis(hist, 0, voxeldata)
print bin_boundaries
print histdata[:,0,0] #print the histogram of an arbitrary pixel
But the more general message I'd like to convey, looking at your code sample and the type of problem you are working on: do yourself a favor, and learn numpy.
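For larger image stacks, the Python-level call per pixel made by np.apply_along_axis can itself become the bottleneck. One possible alternative, sketched below under the same binning convention (a shared range from the global minimum to the global maximum plus one), assigns every sample a bin index in one pass and then counts per bin. The name pixel_histograms is hypothetical and the code uses Python 3 syntax.

```python
import numpy as np

def pixel_histograms(stack, bins):
    """Per-pixel histograms for a stack of images shaped (n, height, width).

    Hypothetical helper: returns counts with shape (bins, height, width)
    and the shared bin edges.
    """
    stack = np.asarray(stack)
    lo, hi = stack.min(), stack.max() + 1
    edges = np.linspace(lo, hi, bins + 1)
    # Bin index of every sample; the clip keeps the right edge in the last bin
    idx = np.clip(np.searchsorted(edges, stack, side='right') - 1, 0, bins - 1)
    counts = np.zeros((bins,) + stack.shape[1:], dtype=np.int64)
    for k in range(bins):
        # Count, for every pixel at once, how many images fall into bin k
        counts[k] = (idx == k).sum(axis=0)
    return counts, edges

# Usage: 128 images of 4x4 pixels, as in the example above
rng = np.random.default_rng(0)
stack = rng.integers(0, 100, size=(128, 4, 4))
counts, edges = pixel_histograms(stack, bins=20)
```

The remaining loop runs once per bin rather than once per pixel, so its cost does not grow with the image size.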
Parallelization certainly would not be my first port of call in optimizing this kind of thing. Your main problem is that you're doing lots of looping at the Python level. Python is inherently slow at this kind of thing.
One option would be to learn how to write Cython extensions and write the histogram bit in Cython. This might take you a while.
Actually, taking a histogram of pixel values is a very common task in computer vision and it has already been efficiently implemented in OpenCV (which has python wrappers). There are also several functions for taking histograms in the numpy python package (though they are slower than the OpenCV implementations).