I'm looking for an efficient way to gamma-blend images.
A regular (additive) blend of pixels A and B with a factor r is expressed as:
C = (1-r) A + r B
Gamma (multiplicative) blend is done as follows:
C = A^(1-r) B^r
This requires raising each pixel channel to a non-integer power, a bit like a gamma correction.
Since I have a large batch of 4K images to process, I need this to be done efficiently (without looping through all pixels in Python and performing the computation on each one individually).
Posting an implementation of the solution @Pascal Mount mentioned in the comments, as he has yet to post his own:
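import numpy as np
from PIL import Image

def blend_gamma_mul(img_A, img_B, r):
    # Work in floating point so the fractional powers are not computed on uint8
    arr_A = np.array(img_A, dtype=np.float64)
    arr_B = np.array(img_B, dtype=np.float64)
    # Element-wise over every pixel and channel at once, no Python loop
    arr_C = arr_A**(1-r) * arr_B**r
    return Image.fromarray(arr_C.astype(np.uint8))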
Use the function like so:
from PIL import Image
img_A = Image.open("A.jpg")
img_B = Image.open("B.jpg")
img_C = blend_gamma_mul(img_A, img_B, 0.7)
Took 3.47s on my computer to blend two 4k images.
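For a large batch it may be convenient to stay in NumPy and convert to images only at the end. The following is a minimal sketch under that assumption; the name blend_gamma_arrays and the choice to scale channels to [0, 1] and round the result are illustrative, not part of the answer above. Note that for 0 < r < 1 any channel that is 0 in either input stays 0 in the output.

```python
import numpy as np

def blend_gamma_arrays(arr_a, arr_b, r):
    """Multiplicative blend C = A^(1-r) * B^r for uint8 arrays of equal shape."""
    # Scale to [0, 1] so the powers are computed on normalized floats
    a = arr_a.astype(np.float64) / 255.0
    b = arr_b.astype(np.float64) / 255.0
    c = a ** (1.0 - r) * b ** r
    # Back to 8-bit, rounding to the nearest integer level
    return np.clip(np.rint(c * 255.0), 0, 255).astype(np.uint8)

a = np.array([[0, 64, 255]], dtype=np.uint8)
b = np.array([[255, 64, 0]], dtype=np.uint8)
blended = blend_gamma_arrays(a, b, 0.5)
```

With r = 0 the function returns the first array unchanged and with r = 1 the second, which is a quick sanity check before running it over a batch.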
While searching the internet for an algorithm to correct luminance, I came across this article about prospective correction and retrospective correction. I'm mostly interested in prospective correction. Basically, we take a picture of the scene with the object in it (the original image) and two other pictures, one bright and one dark, in which only the background of the original picture is visible.
My problem is that I couldn't find any adaptation of these formulas in OpenCV, or any code example. I tried to use the formulas as they were in my code, but then I ran into a problem with data types. This happened when I tried to find the constant C by applying operations on the images.
This is how I implemented the formula in my code:
import cv2 as cv
import numpy as np

def calculate_C(im, im_b):
    fx_mean = cv.mean(im)
    fx_over_bx = np.divide(im,im_b)
    mean_fx_bx = cv.mean(fx_over_bx)
    c = np.divide(fx_mean, mean_fx_bx)
    return c
#Basic image reading and resizing
# Original image
img = cv.imread(image_path)
img = cv.resize(img, (1000,750))
# Bright image
b_img = cv.imread(bright_image_path)
b_img = cv.resize(b_img, (1000,750))
# Calculating C constant from the formula
c_constant = calculate_C(img, b_img)
# Because I have only the bright image I am using second formula from the article
img = np.multiply(np.divide(img,b_img), c_constant)
When I try to run this code I get the error:
img = np.multiply(np.divide(img,b_img), c_constant)
ValueError: operands could not be broadcast together with shapes (750,1000,3) (4,)
So, is there anything I can do to fix my code? Or are there any hints you can share for handling luminance correction with this method, or with better methods?
You are using the cv2.mean function, which returns four values (one mean per channel, shape (4,) once NumPy converts it), while your image has only three channels. You need to drop the last value so that the remaining three broadcast correctly against the image in NumPy.
Or you could use NumPy for the calculations instead of OpenCV.
I took the example images from the article you provided.
Complete example:
import cv2
import numpy as np
from numpy.ma import divide, mean
f = cv2.imread("grain.png")
b = cv2.imread("grain_background.png")
f = f.astype(np.float32)
b = b.astype(np.float32)
C = mean(f) / divide(f, b).mean()
g = divide(f, b) * C
g = g.astype(np.uint8)
cv2.imwrite("grain_out.png", g)
You need to use the masked divide operation, because an ordinary division could divide by zero and produce nan (or inf) values.
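If one constant per colour channel is preferred over the single global constant used above, the shapes have to line up for broadcasting: a (3,) vector broadcasts against an (H, W, 3) image, whereas the (4,) result from the question does not. The following is a minimal NumPy-only sketch; the function name correct_with_background and the choice to fill masked pixels with 0 are assumptions made for illustration.

```python
import numpy as np

def correct_with_background(f, b):
    """Return (corrected uint8 image, per-channel constant C) for H x W x 3 inputs."""
    f = f.astype(np.float64)
    b = b.astype(np.float64)
    # Masked division: pixels where b == 0 are masked instead of becoming inf/nan
    ratio = np.ma.divide(f, b)
    # One mean per channel, shape (3,); masked pixels are ignored in the ratio mean
    c = f.reshape(-1, 3).mean(axis=0) / ratio.reshape(-1, 3).mean(axis=0)
    g = ratio * c                      # (H, W, 3) * (3,) broadcasts per channel
    g = np.ma.filled(g, 0.0)           # assumption: masked pixels become black
    return np.clip(g, 0, 255).astype(np.uint8), np.ma.filled(c, 0.0)

# Synthetic check: when the image equals its background, the result is flat
background = np.full((4, 6, 3), 100, dtype=np.uint8)
background[:, 3:] = 200
corrected, c = correct_with_background(background, background)
```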
Resulting image (output.png):
Given a batch image tensor like B x C x W x H (batchSize,channels,width,height),
I would like to create a new tensor in which the new channels are the channels from nearby pixels (padded with 0s).
For instance, if I choose the nearby pixel size to be 3 x 3 (like a 3 x 3 filter) then there are 9 total nearby pixels and the final tensor size would be B x ( 9 * C ) x W x H.
Any recommendations on doing this, or do I just need to go the brute-force approach through iteration?
If you want to cut the edges short (img is your image tensor):
from skimage.util import view_as_windows
B,C,W,H = img.shape
img_ = view_as_windows(img,(1,1,3,3)).reshape(B,C,W-2,H-2,-1).transpose(0,1,4,2,3).reshape(B,C*9,W-2,H-2)
And if you want to pad them with 0 instead:
import numpy as np
from skimage.util import view_as_windows
img = np.pad(img,((0,0),(0,0),(1,1),(1,1)))
B,C,W,H = img.shape
img_ = view_as_windows(img,(1,1,3,3)).reshape(B,C,W-2,H-2,-1).transpose(0,1,4,2,3).reshape(B,C*9,W-2,H-2)
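The same zero-padded result can probably be obtained with NumPy alone, assuming NumPy 1.20 or newer, where sliding_window_view is available. This is a sketch rather than a drop-in replacement; the function name stack_neighbours and its k parameter are made up for this example. The output channel index is c * k * k + n, where n counts the neighbours row by row, so n = 4 is the centre pixel for k = 3.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stack_neighbours(img, k=3):
    """Map (B, C, W, H) to (B, C*k*k, W, H) using zero padding; k must be odd."""
    p = k // 2
    B, C, W, H = img.shape
    padded = np.pad(img, ((0, 0), (0, 0), (p, p), (p, p)))
    # One k x k window per original pixel: shape (B, C, W, H, k, k)
    windows = sliding_window_view(padded, (k, k), axis=(2, 3))
    # Flatten each window, move it next to the channel axis, then merge the two
    return (windows.reshape(B, C, W, H, k * k)
                   .transpose(0, 1, 4, 2, 3)
                   .reshape(B, C * k * k, W, H))

img = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
out = stack_neighbours(img)
```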
For future readers, if you don't want to break the computation graph (using skimage) or want to use a more efficient implementation by not moving data from/to GPU, you probably want a native PyTorch solution instead.
This problem is very close to inverse PixelShuffle, and has a currently active feature request. The difference is that the poster wants to maintain image resolution while this solution does not.
I am copying the requester's initial code (which is pretty efficient) here:
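# fm: input tensor of shape (b, c, h, w); r: downscaling factor (h and w divisible by r)
b, c, h, w = fm.shape
out_channel = c*(r**2)
out_h = h//r
out_w = w//r
fm_view = fm.contiguous().view(b, c, out_h, r, out_w, r)
fm_prime = fm_view.permute(0,1,3,5,2,4).contiguous().view(b,out_channel, out_h, out_w)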
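To see what this rearrangement does, and why the resolution shrinks, here is a NumPy analogue of the same view/permute steps. It is only an illustration on arrays instead of tensors, and the name space_to_depth is an assumption. Output channel c * r * r + i * r + j holds the input pixels of channel c at row offset i and column offset j within each r x r block.

```python
import numpy as np

def space_to_depth(fm, r):
    """Rearrange (b, c, h, w) into (b, c*r*r, h//r, w//r); h and w must be divisible by r."""
    b, c, h, w = fm.shape
    out_h, out_w = h // r, w // r
    # Split each spatial axis into (block index, offset inside the block)
    view = fm.reshape(b, c, out_h, r, out_w, r)
    # Move both offsets next to the channel axis and merge them into it
    return view.transpose(0, 1, 3, 5, 2, 4).reshape(b, c * r * r, out_h, out_w)

fm = np.arange(1 * 2 * 4 * 6).reshape(1, 2, 4, 6)
out = space_to_depth(fm, 2)
```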
I need to use Gaussian Mixture Models on an RGB image, so the dataset is quite big. This needs to run in real time (from a webcam feed). I first coded this in Matlab and was able to achieve a running time of 0.5 seconds for an image of 1729 × 866. The images for the final application will be smaller, so the timing will be faster.
However, I need to implement this with Python and OpenCV for the final application (I need it to run on an embedded board). I translated all my code and used sklearn.mixture.GMM to replace fitgmdist in Matlab. The line of code that constructs the GMM model takes only 7.7e-05 seconds, but the one that fits the model takes 19 seconds. I have tried other covariance types, such as 'diag' or 'spherical'; the time drops a little, but the results are worse and the time is still nowhere near good enough.
I was wondering if there is any other library I can use, or if it would be worth it to translate the functions from Matlab to Python.
Here is my example:
import cv2
import numpy as np
from sklearn.mixture import GMM

im = cv2.imread('Boat.jpg')
h, w, _ = im.shape  # Height and width of the image
# Extract Blue, Green and Red
imB = im[:,:,0]
imG = im[:,:,1]
imR = im[:,:,2]
# Reshape Blue, Green and Red channels into single-row vectors
imB_V = np.reshape(imB, [1, h * w])
imG_V = np.reshape(imG, [1, h * w])
imR_V = np.reshape(imR, [1, h * w])
# Combine the 3 single-row vectors into a 3-row matrix
im_V = np.vstack((imR_V, imG_V, imB_V))
# Calculate the bimodal GMM
nmodes = 2
GMModel = GMM(n_components = nmodes, covariance_type = 'full', verbose = 0, tol = 1e-3)
GMModel = GMModel.fit(np.transpose(im_V))
Thank you very much for your help
You can try fitting with covariance_type='diag' or 'spherical' instead of 'full'.
I believe it will be much faster.
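Another option, which is not part of the answer above and whose effect on quality would have to be checked on real frames, is to fit on a random subset of the pixels and then apply the fitted model to the whole image. The sketch below only prepares the data with NumPy; sample_pixels and its parameters are hypothetical names, and the rows come out in OpenCV's B, G, R order rather than the R, G, B order built in the question.

```python
import numpy as np

def sample_pixels(im, n_samples, seed=0):
    """Flatten an H x W x 3 image to (H*W, 3) rows and draw a random subset."""
    pixels = im.reshape(-1, 3)                 # one row per pixel, no per-channel copies
    rng = np.random.default_rng(seed)
    n = min(n_samples, pixels.shape[0])
    idx = rng.choice(pixels.shape[0], size=n, replace=False)
    return pixels[idx]

# Stand-in for a frame read with cv2.imread
frame = np.arange(20 * 30 * 3, dtype=np.uint8).reshape(20, 30, 3)
subset = sample_pixels(frame, 100)
# The subset would then be passed to the model's fit method; in current
# scikit-learn the class is GaussianMixture (GMM was removed in version 0.20).
```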
I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem.
I have a set of images of different sizes, each with a width/height ratio of at most 2 (the limit could be anything, but let's say 2 for now). I want to normalize them so that they all have the same size. Specifically, I am going to do it like this:
Find the maximum height across all images
Zoom each image, keeping its aspect ratio, until it reaches the maximum height
Pad the image on the right with white pixels until its width/height ratio is 2
Keep in mind the images are represented as numpy matrices of grey scale values [0,255].
This is how I'm doing it now in Python:
max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
for obs in data:
    if len(obs[0])/len(obs) <= 2:
        new_img = ndimage.zoom(obs, round(max_height/len(obs), 2), order=3)
        missing_cols = max_height * 2 - len(new_img[0])
        norm_img = []
        for row in new_img:
            norm_img.append(np.pad(row, (0, missing_cols), mode='constant', constant_values=255))
        norm_img = np.resize(norm_img, (max_height, max_height*2))
There's a note about this code:
I'm rounding the zoom ratio because it makes the final height equal to max_height. I'm sure this is not the best approach, but it's working (any suggestion is appreciated here). What I'd like to do is expand the image, keeping its ratio, until it reaches a height equal to max_height. This is the only solution I have found so far; it worked right away, and the interpolation works pretty well.
So my final questions are:
Is there a better approach to achieve what is explained above (image normalization)? Do you think I could have done this differently? Is there a common good practice I'm not following?
Thanks in advance for your time.
Instead of ndimage.zoom you could use scipy.misc.imresize. This function allows you to specify the target size as a tuple, instead of by zoom factor. Thus you won't have to call np.resize later to get the size exactly as desired.
Note that scipy.misc.imresize calls PIL's Image.resize under the hood, so PIL (or Pillow) is a dependency.
Instead of using np.pad in a for-loop, you could allocate space for the desired array, norm_arr, first:
norm_arr = np.full((max_height, max_width), fill_value=255)
and then copy the resized image, new_arr into norm_arr:
nh, nw = new_arr.shape
norm_arr[:nh, :nw] = new_arr
For example,
from __future__ import division
import numpy as np
from scipy import misc

data = [np.linspace(255, 0, i*10).reshape(i,10)
        for i in range(5, 100, 11)]
max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
max_width = 2*max_height
result = []
for obs in data:
    norm_arr = obs
    h, w = obs.shape
    if float(w)/h <= 2:
        scale_factor = max_height/float(h)
        target_size = (max_height, int(round(w*scale_factor)))
        new_arr = misc.imresize(obs, target_size, interp='bicubic')
        norm_arr = np.full((max_height, max_width), fill_value=255)
        # check the shapes
        # print(obs.shape, new_arr.shape, norm_arr.shape)
        nh, nw = new_arr.shape
        norm_arr[:nh, :nw] = new_arr
        # visually check the result
        # misc.toimage(norm_arr).show()
    # collect the normalized image (or the untouched original if it was skipped)
    result.append(norm_arr)
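The allocate-then-copy step can be tried on its own with NumPy, independent of whichever resize function is used (scipy.misc.imresize was removed in SciPy 1.3, so another resizer may be needed on current installs). This is a small sketch; pad_right_white is a hypothetical helper name, and it assumes the image already fits inside the target canvas. For comparison, a single np.pad call on the whole 2-D array gives the same result without a per-row loop.

```python
import numpy as np

def pad_right_white(img, height, width):
    """Place a 2-D greyscale image in the top-left corner of a white canvas."""
    canvas = np.full((height, width), 255, dtype=img.dtype)
    h, w = img.shape
    canvas[:h, :w] = img       # everything outside the image stays white (255)
    return canvas

img = np.zeros((3, 4), dtype=np.uint8)
padded = pad_right_white(img, 3, 6)
# Same result with one np.pad call: no rows added, 2 white columns on the right
same = np.pad(img, ((0, 0), (0, 2)), mode='constant', constant_values=255)
```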
I implemented the computation of the average RGB value of a Python Imaging Library image in 2 ways:
1 - using lists
def getAverageRGB(image):
    """
    Given PIL Image, return average value of color as (r, g, b)
    """
    # no. of pixels in image
    npixels = image.size[0]*image.size[1]
    # get colors as [(cnt1, (r1, g1, b1)), ...]
    cols = image.getcolors(npixels)
    # get [(c1*r1, c1*g1, c1*b1),...]
    sumRGB = [(x[0]*x[1][0], x[0]*x[1][1], x[0]*x[1][2]) for x in cols]
    # calculate (sum(ci*ri)/np, sum(ci*gi)/np, sum(ci*bi)/np)
    # the zip gives us [(c1*r1, c2*r2, ..), (c1*g1, c2*g2,...)...]
    avg = tuple([sum(x)/npixels for x in zip(*sumRGB)])
    return avg
2 - using numpy
def getAverageRGBN(image):
    """
    Given PIL Image, return average value of color as (r, g, b)
    """
    # get image as numpy array
    im = np.array(image)
    # get shape (rows, columns, channels)
    h,w,d = im.shape
    # change shape to one row per pixel
    im.shape = (h*w, d)
    # get average
    return tuple(np.average(im, axis=0))
I was surprised to find that #1 runs about 20% faster than #2.
Am I using numpy correctly? Is there a better way to implement the average computation?
Surprising indeed.
You may want to use:
to compute your mean (r,g,b), but I doubt it's gonna improve things a lot. Have you tried to profile getAverageRGBN and find the bottleneck?
One-liner w/o changing dimension or writing getAverageRGBN:
Again, it might not improve any performance.
In PIL or Pillow, in Python 3.4+:
from statistics import mean
average_color = [mean(image.getdata(band)) for band in range(3)]
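For the NumPy route, one plausible form of a one-liner that avoids reshaping is to average over both spatial axes at once. This is offered as an assumption, not as the answerers' original code; average_rgb is a made-up name, and a PIL image can be passed because np.asarray converts it to an H x W x 3 array. Whether it beats the list-based version would need to be measured.

```python
import numpy as np

def average_rgb(pixels):
    """Mean per channel of an H x W x 3 array, returned as a tuple of floats."""
    arr = np.asarray(pixels)
    # Averaging over axes 0 and 1 leaves one value per channel
    return tuple(float(v) for v in arr.mean(axis=(0, 1)))

arr = np.zeros((2, 2, 3), dtype=np.uint8)
arr[..., 0] = 10       # red channel is 10 everywhere
arr[0, 0, 1] = 4       # a single green pixel of value 4
avg = average_rgb(arr)
```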