Interpreting results of the cv2 method: phaseCorrelate?

I have written a simple script which returns the phase correlation between two images. To achieve this, I call the cv2 method cv2.phaseCorrelate.
I understand that it returns the sub-pixel phase shift between two images; however, I am unclear on the specific details of each component of the return object (the method returns a tuple containing another tuple and a floating-point number).
Any and all help is greatly appreciated, thank you in advance.
import cv2
import math
import time
import numpy as np

class CorrelationCalculator(object):
    'TODO: class description'

    version = '0.1'

    def __init__(self, initial_frame, detection_threshold=4):
        self.initial_frame = np.float32(cv2.cvtColor(initial_frame, cv2.COLOR_BGR2GRAY))
        self.detection_threshold = detection_threshold

    def detect_phase_shift(self, current_frame):
        'returns detected sub-pixel phase shift between two arrays'
        self.current_frame = np.float32(cv2.cvtColor(current_frame, cv2.COLOR_BGR2GRAY))
        shift = cv2.phaseCorrelate(self.initial_frame, self.current_frame)
        return shift

# implementation
import cv2

img = cv2.imread('img1.jpg')
img2 = cv2.imread('img2.jpg')

obj = CorrelationCalculator(img)
shift = obj.detect_phase_shift(img2)

print(str(shift))
Output:
((4.3597901057868285, -2.8767423065464186), 0.4815432178477446)

The first tuple tells you the amount of shift between img and img2 in x and y coordinates. For example, consider the two images below. This method is supposed to find the rectangle's shift in pixel values. The second value is the response from the phase correlation process; you can think of it as a measure of the certainty of the calculation. You can find detailed information in the OpenCV documentation under phaseCorrelate.
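For reference, here is a minimal sketch (with synthetic, hypothetical frames) of how that return value can be unpacked; phaseCorrelate expects single-channel float32 or float64 input:

import cv2
import numpy as np

# Two synthetic float32 frames: a bright square, then the same square
# translated by 5 pixels in x and 3 pixels in y.
frame_a = np.zeros((100, 100), dtype=np.float32)
frame_a[40:60, 40:60] = 1.0
frame_b = np.roll(frame_a, shift=(3, 5), axis=(0, 1))  # rows (y), then columns (x)

# phaseCorrelate returns ((shift_x, shift_y), response).
(shift_x, shift_y), response = cv2.phaseCorrelate(frame_a, frame_b)
print(shift_x, shift_y)  # magnitude ~ (5, 3); see the docs for the sign convention
print(response)          # certainty measure; closer to 1 means a more reliable peak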

Related

Luminance Correction (Prospective Correction)

When I was searching the internet for an algorithm to correct luminance, I came across this article about prospective correction and retrospective correction. I'm mostly interested in the prospective correction. Basically, we take a picture of the scene with the object in it (the original one), and two other pictures, one bright and one dark, where we only see the background of the original picture.
My problem is that I couldn't find any adaptation of these formulas in OpenCV or a code example. I tried to use the formulas as they were in my code, but this time I had a problem with data types. This happened when I tried to find the C constant by applying operations on images.
This is how I implemented the formula in my code:
import cv2 as cv
import numpy as np

def calculate_C(im, im_b):
    fx_mean = cv.mean(im)
    fx_over_bx = np.divide(im, im_b)
    mean_fx_bx = cv.mean(fx_over_bx)
    c = np.divide(fx_mean, mean_fx_bx)
    return c
#Basic image reading and resizing
# Original image
img = cv.imread(image_path)
img = cv.resize(img, (1000,750))
# Bright image
b_img = cv.imread(bright_image_path)
b_img = cv.resize(b_img, (1000,750))
# Calculating C constant from the formula
c_constant = calculate_C(img, b_img)
# Because I have only the bright image I am using second formula from the article
img = np.multiply(np.divide(img,b_img), c_constant)
When I try to run this code I get the error:
img = np.multiply(np.divide(img,b_img), c_constant)
ValueError: operands could not be broadcast together with shapes (750,1000,3) (4,)
So, is there anything I can do to fix my code? Or are there any hints you can share with me to handle luminance correction with this method, or with better methods?
You are using the cv2.mean function, which returns an array with shape (4,): the mean value for each channel. You may need to ignore the last channel and broadcast the rest correctly with numpy.
Or you could use numpy for the calculations instead of OpenCV.
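For instance, a minimal sketch of that fix, reusing calculate_C, img, and b_img from the question's code:

import numpy as np

# cv.mean returns one value per channel plus a fourth (alpha) entry;
# keep only the first three so the (3,) vector broadcasts over the
# (750, 1000, 3) image. The masked-divide caveat mentioned below still applies.
c = np.asarray(calculate_C(img, b_img))[:3]
img_corrected = np.multiply(np.divide(img, b_img), c)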
I just took the example images from the provided article.
grain.png:
grain_background.png:
Complete example:
import cv2
import numpy as np
from numpy.ma import divide, mean
f = cv2.imread("grain.png")
b = cv2.imread("grain_background.png")
f = f.astype(np.float32)
b = b.astype(np.float32)
C = mean(f) / divide(f, b).mean()
g = divide(f, b) * C
g = g.astype(np.uint8)
cv2.imwrite("grain_out.png", g)
You need to use the masked divide operation because an ordinary division could lead to division by zero => NaN values.
Resulting image (grain_out.png):

Rotating a reshaped image as a matrix operation

I have a grayscale image that I want to rotate. However, I need to do optimization on it. Therefore, I cannot use pillow or opencv.
I want to reshape this image using python with numpy.reshape into a one-dimensional vector (where I use the default C-style reshape).
And thereafter, I want to rotate this image around a point using matrix multiplication and addition, i.e. it should be something like
rotated_image_vector = A @ vector + b  # (or the equivalent in homogeneous coordinates).
After this operation I want to reshape the outcome back to two dimensions and have the rotated image.
It would be best if it also used linear interpolation for the pixels that do not map exactly onto another pixel.
The mathematical theory says this is possible, and I believe there is a very elegant solution to this problem, but I do not see how to create this matrix. Did anyone already have this problem or see an immediate solution?
Thanks a lot,
Eike
I like your approach, but there is a slight misconception in it. What you want to transform are not the pixel values themselves but the coordinates. So you don't reshape your image but rather do an np.indices on it to obtain the coordinates of each pixel. For those, a rotation around a point looks like
rotation_matrix @ (coordinates - fixed_point) + fixed_point
except that I have to transpose a bit to get the dimensions to align. The code below is a slight adaptation of my code in this answer.
As an example I am going to use the Wikipedia-logo-v2 by Nohat. It is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
First I read in the picture, swap the x and y axes so as not to get confused, and rotate the coordinates as described above.
import numpy as np
import matplotlib.pyplot as plt
import itertools

image = plt.imread('wikipedia.jpg')
image = np.swapaxes(image, 0, 1) / 255
fixed_point = np.array(image.shape[:2], dtype='float') / 2
points = np.moveaxis(np.indices(image.shape[:2]), 0, -1).reshape(-1, 2)
a = 2 * np.pi / 8
A = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
rotated_coordinates = (A @ (points - fixed_point.reshape(1, 2)).T).T + fixed_point.reshape(1, 2)
Now I set up a little class to interpolate between the pixels that do not map exactly onto another pixel. And finally I swap the axes back and plot the result.
class Image_knn():
    def fit(self, image):
        self.image = image.astype('float')

    def predict(self, x, y):
        image = self.image
        weights_x = [(1-(x % 1)).reshape(*x.shape, 1), (x % 1).reshape(*x.shape, 1)]
        weights_y = [(1-(y % 1)).reshape(*x.shape, 1), (y % 1).reshape(*x.shape, 1)]
        start_x = np.floor(x)
        start_y = np.floor(y)
        return sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
                          np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')]
                    * weights_x[x] * weights_y[y]
                    for x, y in itertools.product(range(2), range(2))])

image_model = Image_knn()
image_model.fit(image)
transformed_image = image_model.predict(*rotated_coordinates.T).reshape(*image.shape)
plt.imshow(np.swapaxes(transformed_image, 0, 1))
And I get a result like this
Possible Issue
The artifact in the bottom left, which looks like one needs to clean the screen, comes from the following problem: when we rotate, it can happen that we don't have enough pixels to paint the lower left. What we do by default in Image_knn is to clip the coordinates to an area where we have information, so when we ask it for pixels coming from outside the image, it gives us the pixels at the boundary of the image. This looks fine if there is a uniform background, but if an object touches the edge of the picture it looks odd, like here. Just something to keep in mind when using this.
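One possible mitigation (a sketch, not part of the answer above): flag the coordinates that fall outside the source image and paint those pixels with a constant background instead of the clamped boundary values.

# Using the variables from the code above: mark coordinates that landed
# outside the source image, then overwrite those pixels with white.
x, y = rotated_coordinates.T
valid = (x >= 0) & (x <= image.shape[0] - 1) & (y >= 0) & (y <= image.shape[1] - 1)
transformed_image[~valid.reshape(image.shape[:2])] = 1.0  # image was scaled to [0, 1]
plt.imshow(np.swapaxes(transformed_image, 0, 1))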
Thank you for your answer!
But actually it is not a misconception that you can let this rotation be represented by a matrix multiplication with the reshaped vector.
I used your code to generate such a matrix (it's surely not the most efficient way, but it works; most likely you see a more efficient implementation immediately XD. You see, I really need it as a matrix multiplication :-D).
What I basically did is generate the representation matrix of the linear transformation by computing how each of the 100*100 basis images (i.e. the images that are zero everywhere except for a one at a single pixel) is mapped by your transformation.
import sys
import numpy as np
import matplotlib.pyplot as plt
import itertools

angle = 2*np.pi/6

image_expl = plt.imread('wikipedia.jpg')
image_expl = image_expl[:, :, 0]
plt.imshow(image_expl)
plt.title("Image")
plt.show()

image_shape = image_expl.shape
pixel_number = image_shape[0]*image_shape[1]

rot_mat = np.zeros((pixel_number, pixel_number))
for i in range(pixel_number):
    vector = np.zeros(pixel_number)
    vector[i] = 1
    image = vector.reshape(*image_shape)
    fixed_point = np.array(image.shape, dtype='float')/2
    points = np.moveaxis(np.indices(image.shape), 0, -1).reshape(-1, 2)
    a = -angle
    A = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    rotated_coordinates = (A @ (points - fixed_point.reshape(1, 2)).T).T + fixed_point.reshape(1, 2)
    x, y = rotated_coordinates.T
    image = image.astype('float')
    weights_x = [(1-(x % 1)).reshape(*x.shape), (x % 1).reshape(*x.shape)]
    weights_y = [(1-(y % 1)).reshape(*x.shape), (y % 1).reshape(*x.shape)]
    start_x = np.floor(x)
    start_y = np.floor(y)
    transformed_image_returned = sum([image[np.clip(np.floor(start_x + x), 0, image.shape[0]-1).astype('int'),
                                            np.clip(np.floor(start_y + y), 0, image.shape[1]-1).astype('int')]
                                      * weights_x[x] * weights_y[y]
                                      for x, y in itertools.product(range(2), range(2))])
    rot_mat[:, i] = transformed_image_returned
    if i % 100 == 0:
        print(int(100*i/pixel_number), "% finished")

plt.imshow((rot_mat @ image_expl.reshape(-1)).reshape(image_shape))
Thank you again :-)

PyTorch: How to apply the same random transformation to multiple images?

I am writing a simple transformation for a dataset which contains many pairs of images. As data augmentation, I want to apply some random transformation for each pair, but the images in each pair should be transformed in the same way.
For example, given a pair of two images A and B, if A is flipped horizontally, B must be flipped horizontally like A. Then the next pair C and D should be transformed differently from A and B, but C and D are transformed in the same way. I am trying to do that in the way below:
import random
import numpy as np
import torchvision.transforms as transforms
from PIL import Image

img_a = Image.open("sample_a.jpg")  # note that two images have the same size
img_b = Image.open("sample_b.png")
img_c, img_d = Image.open("sample_c.jpg"), Image.open("sample_d.png")

transform = transforms.RandomChoice(
    [transforms.RandomHorizontalFlip(),
     transforms.RandomVerticalFlip()]
)
random.seed(0)
display(transform(img_a))
display(transform(img_b))
random.seed(1)
display(transform(img_c))
display(transform(img_d))
Yet, the above code does not choose the same transformation and, as I tested, it depends on the number of times transform is called.
Is there any way to force transforms.RandomChoice to use the same transform when specified?
Usually a workaround is to apply the transform on the first image, retrieve the parameters of that transform, then apply a deterministic transform with those parameters on the remaining images. However, here RandomChoice does not provide an API to get the parameters of the applied transform, since it involves a variable number of transforms.
In those cases, I usually implement an overwrite to the original function.
Looking at the torchvision implementation, it's as simple as:
class RandomChoice(RandomTransforms):
    def __call__(self, img):
        t = random.choice(self.transforms)
        return t(img)
Here are two possible solutions.
You can either sample from the transform list on __init__ instead of on __call__:
import random
import torch
import torchvision.transforms as T

class RandomChoice(torch.nn.Module):
    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms
        self.t = random.choice(self.transforms)

    def __call__(self, img):
        return self.t(img)
So you can do:
transform = T.RandomChoice([
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip()
])
display(transform(img_a))  # both img_a and img_b will
display(transform(img_b))  # have the same transform

transform = T.RandomChoice([
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip()
])
display(transform(img_c))  # both img_c and img_d will
display(transform(img_d))  # have the same transform
Or better yet, transform the images in batch:
import random
import torch
import torchvision.transforms as T

class RandomChoice(torch.nn.Module):
    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms

    def __call__(self, imgs):
        t = random.choice(self.transforms)
        return [t(img) for img in imgs]
Which allows to do:
transform = T.RandomChoice([
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip()
])
img_at, img_bt = transform([img_a, img_b])
display(img_at)  # both img_a and img_b will
display(img_bt)  # have the same transform

img_ct, img_dt = transform([img_c, img_d])
display(img_ct)  # both img_c and img_d will
display(img_dt)  # have the same transform
Simply take the randomization part out of PyTorch into an if statement.
The code below uses vflip; it works similarly for horizontal or other transforms.
import random
import torchvision.transforms.functional as TF
if random.random() > 0.5:
    image = TF.vflip(image)
    mask = TF.vflip(mask)
This issue has been discussed in the PyTorch forum. Several solutions' pros and cons were discussed on the official GitHub repository page.
PyTorch maintainers have suggested this simple approach:
Do not use torchvision.transforms.RandomVerticalFlip(p=1). Use torchvision.transforms.functional.vflip instead.
Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don’t contain a random number generator for their parameters. That means you have to specify/generate all parameters, but you can reuse the functional transform.
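For example, a minimal sketch of that pattern (RandomRotation/rotate are chosen here just for illustration): sample the parameter once with get_params, then reuse it deterministically on both images of a pair:

import torchvision.transforms as T
import torchvision.transforms.functional as TF

# Sample one rotation angle, then apply the same deterministic
# transform to both images of the pair.
angle = T.RandomRotation.get_params(degrees=[-30.0, 30.0])
img_a_rot = TF.rotate(img_a, angle)
img_b_rot = TF.rotate(img_b, angle)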
I realize the OP requested a solution using torchvision, and I think @Ivan's answer does a good job addressing this.
However, for those not tied to a specific augmentation library, I wanted to point out that Albumentations appears to handle this kind of situation nicely in a native fashion by allowing the user to pass multiple source images, boxes, etc. into the same transform. The return is structured as a dict.
import albumentations as A
transform = A.Compose(
    transforms=[
        A.VerticalFlip(p=0.5),
        A.HorizontalFlip(p=0.5)],
    additional_targets={'image0': 'image', 'image1': 'image'}
)
transformed = transform(image=image, image0=image0, image1=image1)
Now you can access transformed['image0'], transformed['image1'], etc., and all of them will have had the same random parameters applied.
I don't know of a function to fix the random output.
Maybe try different logic, like creating the randomization yourself so you can reuse the same transformation.
The logic:
generate a random number
based on that number, apply a transformation to both images
generate another random number
do the same for the other two images
Try this:
import random
import numpy as np
import torchvision.transforms.functional as TF
from PIL import Image

img_a = Image.open("sample_a.jpg")  # note that two images have the same size
img_b = Image.open("sample_b.png")
img_c, img_d = Image.open("sample_c.jpg"), Image.open("sample_d.png")

if random.random() > 0.5:
    image_a_flipped = TF.vflip(img_a)
    image_b_flipped = TF.vflip(img_b)
else:
    image_a_flipped = TF.hflip(img_a)
    image_b_flipped = TF.hflip(img_b)

if random.random() > 0.5:
    image_c_flipped = TF.vflip(img_c)
    image_d_flipped = TF.vflip(img_d)
else:
    image_c_flipped = TF.hflip(img_c)
    image_d_flipped = TF.hflip(img_d)

display(image_a_flipped)
display(image_b_flipped)
display(image_c_flipped)
display(image_d_flipped)
Referencing Random transforms for both input and target?, I think this is probably the cleanest way to do it. Save the random state before applying any transformation and then just restore it for each subsequent call:
t = transforms.RandomRotation(degrees=360)
state = torch.get_rng_state()
x = t(x)
torch.set_rng_state(state)
y = t(y)
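One caveat worth adding: transforms like RandomChoice draw from Python's random module (see the random.choice call in the torchvision snippet above) rather than torch's generator, so for those you would save and restore Python's RNG state as well. A sketch, continuing with t, x, y from the snippet above:

import random
import torch

# Save both RNG states, since different transforms use different generators.
torch_state = torch.get_rng_state()
py_state = random.getstate()
x = t(x)
torch.set_rng_state(torch_state)
random.setstate(py_state)
y = t(y)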

OpenCV warpAffine always returns 0 matrix

I am using python opencv version 4.5.
import cv2
import numpy as np

rigidRect = np.float32([[50, -50], [50, 50], [-50, 50]])
shiftRect = np.float32([[50, -30], [50, 70], [-50, 70]])

M = cv2.getAffineTransform(rigidRect, shiftRect)  # this returns [[1, 0, 0], [0, 1, 20]]

validateRect = cv2.warpAffine(rigidRect, M, (2, 3))
and validateRect comes back as a 3-by-2 matrix of zeros.
I thought validateRect would be equal to shiftRect?
warpAffine is used to transform an image using the affine transform matrix. What you are trying to do is transform the given points, which is achieved by the transform function. The documentation of getAffineTransform gives a hint about related functions in the "See also" part.
validateRect = cv2.transform(rigidRect[None,:,:], M)
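For completeness, a quick self-contained check that this reproduces shiftRect (note that transform expects, and returns, an extra leading dimension):

import cv2
import numpy as np

rigidRect = np.float32([[50, -50], [50, 50], [-50, 50]])
shiftRect = np.float32([[50, -30], [50, 70], [-50, 70]])

M = cv2.getAffineTransform(rigidRect, shiftRect)
validateRect = cv2.transform(rigidRect[None, :, :], M)  # shape (1, 3, 2)
print(np.allclose(validateRect[0], shiftRect))          # True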

Maximum intensity projection from image stack

I'm trying to recreate the function
max(array, [], 3)
from MATLAB, which can take my 300x300px image stack of N images (I'm saying "image" here because I'm processing images; really this is just a big double array), 300x300xN, and create a 300x300 array. What I think is happening in this function, if it were to operate inefficiently, is that it parses through each (x, y) point, takes the maximum value from that point across the z-axis, and then normalizes with the maximum and minimum values of the entire array.
I've tried recreating this in python with
# Shape of dataset: (300, 300, 181)
# Type of dataset: <type 'numpy.ndarray'>
for x in range(numpy.size(self.dataset, 0)):
    for y in range(numpy.size(self.dataset, 1)):
        print("Point is", x, y)
        # more would go here to find the maximum (x, y) value over the Z axis in self.dataset
A very simple x, y iterator -- but not only does my IDE crash after a few milliseconds of running this code, it also feels gross and inefficient.
Is there something I'm missing? I'm new to Python, and therefore the answer here isn't clear to me. Is there an existing function that does this operation?
import numpy as np
import matplotlib.pyplot as plt
from skimage import io

path = "test.tif"
IM = io.imread(path)         # multi-page TIFF loads as (N, H, W)
IM_MAX = np.max(IM, axis=0)  # maximum intensity projection along the stack axis
plt.imshow(IM_MAX)
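One caveat: skimage.io.imread loads a multi-page TIFF with the stack axis first, shape (N, 300, 300), which is why axis=0 is used above. For a (300, 300, N) array laid out as in the question (dataset here stands for that array), MATLAB's max(array, [], 3) corresponds to reducing over the last axis:

import numpy as np

# dataset has shape (300, 300, N); MATLAB's dim 3 is numpy's axis 2.
mip = dataset.max(axis=2)  # or np.max(dataset, axis=2)

# Optional normalization with the global min/max, as the question describes.
mip_norm = (mip - dataset.min()) / (dataset.max() - dataset.min())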
