Given a batch image tensor like B x C x W x H (batchSize,channels,width,height),
I would like to create a new tensor in which the new channels are the channels from nearby pixels (padded with 0s).
For instance, if I choose the nearby pixel size to be 3 x 3 (like a 3 x 3 filter) then there are 9 total nearby pixels and the final tensor size would be B x ( 9 * C ) x W x H.
Any recommendations on doing this, or do I just need to take the brute-force approach and iterate?
If you want to cut the edges short (img is your image tensor):
from skimage.util import view_as_windows
B,C,W,H = img.shape
img_ = view_as_windows(img,(1,1,3,3)).reshape(B,C,W-2,H-2,-1).transpose(0,1,4,2,3).reshape(B,C*9,W-2,H-2)
And if you want to pad them with 0 instead:
import numpy as np
from skimage.util import view_as_windows
img = np.pad(img,((0,0),(0,0),(1,1),(1,1)))
B,C,W,H = img.shape
img_ = view_as_windows(img,(1,1,3,3)).reshape(B,C,W-2,H-2,-1).transpose(0,1,4,2,3).reshape(B,C*9,W-2,H-2)
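If you would rather not depend on skimage, the same zero-padded result can probably be obtained with NumPy alone. This is only a sketch: it assumes NumPy 1.20 or newer (for sliding_window_view), and the function name stack_neighbors and its k parameter are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20


def stack_neighbors(img, k=3):
    """Stack the k x k neighbourhood of every pixel into the channel axis.

    img: array of shape (B, C, W, H); k: odd window size (hypothetical parameter).
    Returns an array of shape (B, C*k*k, W, H), zero-padded at the borders.
    """
    p = k // 2
    B, C, W, H = img.shape
    # np.pad defaults to constant zero padding
    padded = np.pad(img, ((0, 0), (0, 0), (p, p), (p, p)))
    # windows has shape (B, C, W, H, k, k)
    windows = sliding_window_view(padded, (k, k), axis=(2, 3))
    return (windows.reshape(B, C, W, H, k * k)
                   .transpose(0, 1, 4, 2, 3)
                   .reshape(B, C * k * k, W, H))
```

With k=3, output channel c*9 + 4 is the centre of the window, i.e. the original channel c.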
For future readers, if you don't want to break the computation graph (using skimage) or want to use a more efficient implementation by not moving data from/to GPU, you probably want a native PyTorch solution instead.
This problem is very close to inverse PixelShuffle, and has a currently active feature request. The difference is that the poster wants to maintain image resolution while this solution does not.
I am copying the requester's initial code (which is pretty efficient) here:
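# fm: input tensor of shape (b, c, h, w); r: downscale factor (h and w must be divisible by r)
b, c, h, w = fm.shape
out_channel = c*(r**2)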
out_h = h//r
out_w = w//r
fm_view = fm.contiguous().view(b, c, out_h, r, out_w, r)
fm_prime = fm_view.permute(0,1,3,5,2,4).contiguous().view(b,out_channel, out_h, out_w)
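To see what that view/permute sequence does to the layout, here is the same reshaping sketched in NumPy (reshape and transpose standing in for view and permute). The name space_to_depth is mine, not part of the feature request.

```python
import numpy as np


def space_to_depth(fm, r):
    """NumPy analogue of the snippet above: (b, c, h, w) -> (b, c*r*r, h//r, w//r)."""
    b, c, h, w = fm.shape
    out_h, out_w = h // r, w // r
    # split each spatial axis into (block index, offset inside the block)
    fm_view = fm.reshape(b, c, out_h, r, out_w, r)
    # move both offsets next to the channel axis, then merge them into it
    return fm_view.transpose(0, 1, 3, 5, 2, 4).reshape(b, c * r * r, out_h, out_w)
```

Output channel c*r*r + i*r + j holds the pixels at row offset i and column offset j of every r x r block, which is why the resolution drops by a factor of r.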
I'd like to know if there is a way to convert an image from grayscale to RGB in Python using "pure" Keras (i.e. without importing Tensorflow).
What I do now is:
x_rgb = tf.image.grayscale_to_rgb(x_grayscale)
Maybe you would consider this "cheating" (as keras.backend may end up calling Tensorflow behind the scene), but here's a solution:
from keras import backend as K
def grayscale_to_rgb(images, channel_axis=-1):
    images = K.expand_dims(images, axis=channel_axis)
    tiling = [1] * 4  # 4 dimensions: B, H, W, C
    tiling[channel_axis] *= 3
    images = K.tile(images, tiling)
    return images
(supposing your grayscale images have a shape B x H x W and not e.g. B x H x W x 1 ; otherwise just remove the first line of the function)
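For comparison, this is what the expand-and-tile step amounts to on a plain NumPy array. It is only meant to illustrate the shapes, not to replace the Keras version; the function name is made up and channels-last input of shape B x H x W is assumed.

```python
import numpy as np


def grayscale_to_rgb_np(images):
    """images: array of shape (B, H, W); returns shape (B, H, W, 3)."""
    # add a trailing channel axis, then repeat it three times
    return np.repeat(images[..., np.newaxis], 3, axis=-1)
```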
I'm looking for an efficient way to gamma-blend images.
While regular (additive) blend of pixels A and B with a factor r is expressed as this:
C = (1-r) A + r B
Gamma (multiplicative) blend is done as follows:
C = A^(1-r) B^r
This requires a way to raise a pixel's channels to a non-integer power, a bit like a gamma correction.
Since I have a large batch of 4K images to process, I need this to be done efficiently (without looping through all pixels and performing the computation individually).
Thanks!
Posting an implementation of the solution @Pascal Mount mentioned in the comments, since he has yet to post his own:
import numpy as np
from PIL import Image
def blend_gamma_mul(img_A, img_B, r):
    arr_A = np.array(img_A)
    arr_B = np.array(img_B)
    # uint8 ** float is computed in floating point
    arr_C = arr_A**(1-r) * arr_B**r
    return Image.fromarray(np.array(arr_C, dtype=np.uint8))
Use the function like so:
from PIL import Image
img_A = Image.open("A.jpg")
img_B = Image.open("B.jpg")
img_C = blend_gamma_mul(img_A, img_B, 0.7)
img_C.save("C.jpg")
Took 3.47s on my computer to blend two 4k images.
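If memory is a concern, one thing that might be worth trying is doing the arithmetic in float32, since raising a uint8 array to a Python float power produces float64 temporaries. Below is a sketch that works on arrays only (no PIL); the function name is hypothetical, and I have not timed it against the version above. It also rounds to the nearest integer rather than truncating.

```python
import numpy as np


def blend_gamma_mul_arrays(arr_A, arr_B, r):
    """Multiplicative blend C = A^(1-r) * B^r on uint8 arrays, computed in float32."""
    a = np.asarray(arr_A, dtype=np.float32)
    b = np.asarray(arr_B, dtype=np.float32)
    c = a ** np.float32(1 - r) * b ** np.float32(r)
    # round to nearest and clamp before converting back to 8-bit
    return np.clip(np.rint(c), 0, 255).astype(np.uint8)
```

For instance, with A = 100, B = 200 and r = 0.5 the blend is sqrt(100 * 200), about 141.4, which is stored as 141.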
I have the following code, which uses Python and OpenCV. Briefly, I have a stack of images taken at different focal depths. The code picks out, at every (x,y) position, the pixel that has the largest Laplacian of Gaussian (LoG) response among all focal depths (z), thus creating a focus-stacked image. The function get_fmap creates a 2D array in which each pixel contains the index of the focal plane with the largest LoG response. In the following code, the lines that are commented out are my current VIPS implementation. They don't look compatible with the rest of the function definition because they are only a partial solution.
import sys
import cv2
import numpy as np
# from gi.repository import Vips
def get_log_kernel(siz, std):
    x = y = np.linspace(-siz, siz, 2*siz+1)
    x, y = np.meshgrid(x, y)
    arg = -(x**2 + y**2) / (2*std**2)
    h = np.exp(arg)
    h[h < sys.float_info.epsilon * h.max()] = 0
    h = h/h.sum() if h.sum() != 0 else h
    h1 = h*(x**2 + y**2 - 2*std**2) / (std**4)
    return h1 - h1.mean()
def get_fmap(img): # img is a 3-d numpy array.
    log_response = np.zeros_like(img[:, :, 0], dtype='single')
    fmap = np.zeros_like(img[:, :, 0], dtype='uint8')
    log_kernel = get_log_kernel(11, 2)
    # kernel = get_log_kernel(11, 2)
    # kernel = [list(row) for row in kernel]
    # kernel = Vips.Image.new_from_array(kernel)
    # img = Vips.Image.new_from_file("testimg.tif")
    for ii in range(img.shape[2]):
        # img_filtered = img.conv(kernel)
        img_filtered = cv2.filter2D(img[:, :, ii].astype('single'), -1, log_kernel)
        index = img_filtered > log_response
        log_response[index] = img_filtered[index]
        fmap[index] = ii
    return fmap
fmap is then used to pick out pixels from the different focal planes to create a focus-stacked image.
This is done on an extremely large image, and I feel VIPS might do a better job than OpenCV on this. However, the official documentation provides rather scant information on its Python binding. From the information I can find on the internet, I've only been able to make image convolution work (which, in my case, is an order of magnitude faster than OpenCV). I'm wondering how to implement the rest in VIPS, especially these lines:
log_response = np.zeros_like(img[:, :, 0], dtype = 'single')
index = img_filtered > log_response
log_response[index] = img_filtered[index]
fmap[index] = ii
log_response and fmap are initialized as 3D arrays in the question code, whereas the question text states that the output, fmap, is a 2D array. So I am assuming that log_response and fmap are to be initialized as 2D arrays with the same shape as each image. Thus, the edits would be -
log_response = np.zeros_like(img[:,:,0], dtype='single')
fmap = np.zeros_like(img[:,:,0], dtype='uint8')
Now, back to the theme of the question: you are performing 2D filtering on each image one by one and getting the index of the maximum filtered output across all stacked images. In case you didn't know, per the documentation of cv2.filter2D, it can also be applied to a multi-channel array, in which case each channel is filtered independently and the output has the same shape. Then, getting the index of the maximum across all images is as simple as .argmax(2). Thus, the implementation should be very efficient and, keeping the cast to 'single' from the question, is simply -
fmap = cv2.filter2D(img.astype('single'),-1,log_kernel).argmax(2)
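Once fmap is available, the focus-stacked image mentioned in the question could be assembled without a loop. A minimal sketch, assuming img is the (H, W, Z) stack and fmap holds one plane index per pixel; the function name focus_stack is made up.

```python
import numpy as np


def focus_stack(img, fmap):
    """img: (H, W, Z) stack; fmap: (H, W) integer plane index per pixel."""
    # take_along_axis needs an index array with the same number of dimensions as img
    idx = fmap[:, :, np.newaxis].astype(np.intp)
    return np.take_along_axis(img, idx, axis=2)[:, :, 0]
```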
After consulting the Python VIPS manual and some trial and error, I've come up with my own answer. The numpy and OpenCV implementation in my question can be translated into VIPS like this:
import pyvips
img = []
for ii in range(num_z_levels):
    img.append(pyvips.Image.new_from_file("testimg_z" + str(ii) + ".tif"))
def get_fmap(img):
    log_kernel = get_log_kernel(11,2) # get_log_kernel is my own function, which generates a 2-d numpy array.
    log_kernel = [list(row) for row in log_kernel] # Convert to a list of lists, which pyvips.Image.new_from_array accepts.
    log_kernel = pyvips.Image.new_from_array(log_kernel) # Turn the kernel into a Vips image so it can be used by Vips.
    log_response = img[0].conv(log_kernel)
    fmap = pyvips.Image.black(img[0].width, img[0].height) # Plane 0 is the initial best everywhere.
    for ii in range(1, len(img)):
        img_filtered = img[ii].conv(log_kernel)
        index = img_filtered > log_response # Evaluate the mask before log_response is updated.
        fmap = index.ifthenelse(ii, fmap)
        log_response = index.ifthenelse(img_filtered, log_response)
    return fmap
Logical indexing is achieved through the ifthenelse method:
result_img = (test_condition).ifthenelse(value_if_true, value_if_false)
The syntax is rather flexible. The test condition can be a comparison between two images of the same size or between an image and a value, e.g. img1 > img2 or img > 5. Likewise, value_if_true can be a single value or a Vips image.
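For readers coming from numpy, ifthenelse plays roughly the role of np.where. The running-maximum pattern can be sketched with np.where on plain arrays as below. This is an analogy only, not pyvips code; the function name is mine, and responses is assumed to be a list of already-filtered 2-d arrays.

```python
import numpy as np


def fmap_running_max(responses):
    """responses: sequence of 2-d arrays, one filtered image per focal plane."""
    best = responses[0]
    fmap = np.zeros(best.shape, dtype=np.uint8)  # plane 0 is the initial best
    for ii in range(1, len(responses)):
        index = responses[ii] > best           # compute the mask first
        fmap = np.where(index, ii, fmap)       # then record the winning plane
        best = np.where(index, responses[ii], best)
    return fmap
```

The order matters: if best were updated before the mask is used for fmap, the comparison would never be true again.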
I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem.
I have a set of images of different sizes but with a width/height ratio less than or equal to 2 (it could be anything, but let's say 2 for now). I want to normalize them, meaning I want all of them to have the same size. Specifically, I am going to do it like this:
Extract the max height across all images
Zoom each image, keeping its ratio, so that it reaches the max height
Pad the right side with white pixels until the image has a width/height ratio of 2
Keep in mind the images are represented as numpy matrices of grey scale values [0,255].
This is how I'm doing it now in Python:
import numpy as np
from scipy import ndimage
max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
for obs in data:
    if len(obs[0])/len(obs) <= 2:
        new_img = ndimage.zoom(obs, round(max_height/len(obs), 2), order=3)
        missing_cols = max_height * 2 - len(new_img[0])
        norm_img = []
        for row in new_img:
            norm_img.append(np.pad(row, (0, missing_cols), mode='constant', constant_values=255))
        norm_img = np.resize(norm_img, (max_height, max_height*2))
There's a note about this code:
I'm rounding the zoom ratio because it makes the final height equal to max_height. I'm sure this is not the best approach, but it's working (any suggestion is appreciated here). What I'd like to do is expand the image, keeping its ratio, until it reaches a height equal to max_height. This is the only solution I have found so far; it worked right away, and the interpolation works pretty well.
So my final questions are:
Is there a better approach to achieve what is explained above (image normalization)? Do you think I could have done this differently? Is there a common good practice I'm not following?
Thanks in advance for your time.
Instead of ndimage.zoom you could use scipy.misc.imresize. This function allows you to specify the target size as a tuple, instead of by zoom factor. Thus you won't have to call np.resize later to get the size exactly as desired.
Note that scipy.misc.imresize calls PIL.Image.resize under the hood, so PIL (or Pillow) is a dependency. It was deprecated in SciPy 1.0 and removed in SciPy 1.3, so on newer versions you would call PIL.Image.resize directly.
Instead of using np.pad in a for-loop, you could allocate space for the desired array, norm_arr, first:
norm_arr = np.full((max_height, max_width), fill_value=255)
and then copy the resized image, new_arr into norm_arr:
nh, nw = new_arr.shape
norm_arr[:nh, :nw] = new_arr
For example,
from __future__ import division
import numpy as np
from scipy import misc  # misc.imresize needs SciPy < 1.3 and PIL/Pillow
data = [np.linspace(255, 0, i*10).reshape(i,10)
        for i in range(5, 100, 11)]
max_height = np.max([len(obs) for obs in data if len(obs[0])/len(obs) <= 2])
max_width = 2*max_height
result = []
for obs in data:
    norm_arr = obs
    h, w = obs.shape
    if float(w)/h <= 2:
        scale_factor = max_height/float(h)
        target_size = (max_height, int(round(w*scale_factor)))
        new_arr = misc.imresize(obs, target_size, interp='bicubic')
        norm_arr = np.full((max_height, max_width), fill_value=255)
        # check the shapes
        # print(obs.shape, new_arr.shape, norm_arr.shape)
        nh, nw = new_arr.shape
        norm_arr[:nh, :nw] = new_arr
    result.append(norm_arr)
    # visually check the result
    # misc.toimage(norm_arr).show()
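If you just want to try the allocate-then-copy idea without any imaging library, here is a dependency-free sketch. Note that it swaps in a crude nearest-neighbour resize (resize_nearest, a helper I made up) in place of the bicubic interpolation, so the image quality will be worse; only the padding logic is the point here.

```python
import numpy as np


def resize_nearest(arr, new_h, new_w):
    """Nearest-neighbour resize of a 2-d array (stand-in for a bicubic resize)."""
    h, w = arr.shape
    rows = np.minimum(np.arange(new_h) * h // new_h, h - 1)
    cols = np.minimum(np.arange(new_w) * w // new_w, w - 1)
    return arr[rows[:, None], cols]


def normalize(obs, max_height, fill_value=255):
    """Scale obs to max_height rows, then pad on the right to a 2:1 canvas."""
    h, w = obs.shape
    max_width = 2 * max_height
    new_w = min(max_width, int(round(w * max_height / h)))
    new_arr = resize_nearest(obs, max_height, new_w)
    # allocate the white canvas once, then copy the resized image into its left part
    norm_arr = np.full((max_height, max_width), fill_value, dtype=obs.dtype)
    norm_arr[:, :new_w] = new_arr
    return norm_arr
```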
I implemented the computation of the average RGB value of a Python Imaging Library image in 2 ways:
1 - using lists
def getAverageRGB(image):
    """
    Given PIL Image, return average value of color as (r, g, b)
    """
    # no. of pixels in image
    npixels = image.size[0]*image.size[1]
    # get colors as [(cnt1, (r1, g1, b1)), ...]
    cols = image.getcolors(npixels)
    # get [(c1*r1, c1*g1, c1*b1),...]
    sumRGB = [(x[0]*x[1][0], x[0]*x[1][1], x[0]*x[1][2]) for x in cols]
    # calculate (sum(ci*ri)/np, sum(ci*gi)/np, sum(ci*bi)/np)
    # the zip gives us [(c1*r1, c2*r2, ..), (c1*g1, c2*g2,...)...]
    avg = tuple([sum(x)/npixels for x in zip(*sumRGB)])
    return avg
2 - using numpy
import numpy as np
def getAverageRGBN(image):
    """
    Given PIL Image, return average value of color as (r, g, b)
    """
    # get image as numpy array
    im = np.array(image)
    # get shape (numpy orders it as height, width, bands)
    h,w,d = im.shape
    # change shape
    im.shape = (h*w, d)
    # get average
    return tuple(np.average(im, axis=0))
I was surprised to find that #1 runs about 20% faster than #2.
Am I using numpy correctly? Is there a better way to implement the average computation?
Surprising indeed.
You may want to use:
tuple(im.mean(axis=0))
to compute your mean (r,g,b), but I doubt it's gonna improve things a lot. Have you tried to profile getAverageRGBN and find the bottleneck?
One-liner w/o changing dimension or writing getAverageRGBN:
np.array(image).mean(axis=(0,1))
Again, it might not improve performance at all.
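A quick way to convince yourself that the one-liner gives the same numbers as the reshape-based version is a check on a synthetic array (standing in for np.array(image), so no PIL is needed here):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic (height, width, bands) array in place of np.array(image)
im = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# reshape-based average, as in getAverageRGBN
avg_reshape = im.reshape(-1, im.shape[2]).mean(axis=0)
# one-liner over both spatial axes
avg_axes = im.mean(axis=(0, 1))
```

Both give one mean per band, so any speed difference comes from how the data is traversed rather than from the result.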
In PIL or Pillow, in Python 3.4+:
from statistics import mean
average_color = [mean(image.getdata(band)) for band in range(3)]