How to resize an image with "varying scaling"? - python

I need to resize an image, but with a "varying scaling" in the y axis, after warping:
Plotted Image
Original input image
Warped output image
The image (left one) was taken at an angle, so I've used the getPerspectiveTransform and warpPerspective OpenCV functions to get the top/plan view of the image (right one).
But, now the top half of the warped image is stretched and the bottom half is squashed, and this amount of stretch/squash is varying continuously as you go down the image. So, I need to do the opposite.
For example: The zebra crossing lines in the warped image are thicker at the top of the image and thinner at the bottom. I want them to all be the same thickness and same vertical distance from each other essentially.
Badly drawn but something like this: (if we ignore the 2 people, I think this is what the final output image should be like.)
predicted output image
My end goal is to measure distance between people's feet in an image (shown by green dots), but I've got that section sorted already.
By vertically scaling the warped image to make it linear, it will allow me to accurately measure the real distance in the x & y direction from a top/plan view, (i.e each pixel in the x or y direction is say 1cm in real distance)
I was thinking of multiplying each row of the image by a factor (e.g. top rows multiply by smaller number like 0.8 or 0.9, and bottom rows multiply by bigger number like 1.1 or 1.2), but I really don't know how to do that.
Code:
import cv2 as cv
from matplotlib import pyplot as plt
import numpy as np
# READ IMAGE
imgOrig = cv.imread('.jpg')
# RESIZE IMAGE
width = int(1000)
ratio = imgOrig.shape[1]/width
height = int(imgOrig.shape[0]/ratio)
dsize = (width, height)
img = cv.resize(imgOrig, dsize)
feetLocation = [[280, 500], [740, 496]]
cv.circle(img,(280, 500),5,(0,255,0),thickness= 10)
cv.circle(img,(740, 496),5,(0,255,0),thickness= 10)
# WARPING
pts1 = np.float32([[0, -0], [width, 0], [-1800, height], [width + 1800, height]])
pts2 = np.float32([[0, 0], [width, 0], [0, height], [width, height]])
M = cv.getPerspectiveTransform(pts1, pts2)
dst = cv.warpPerspective(img, M, (width, height))
#DISPLAY IMAGES
plt.subplot(121),plt.imshow(img),plt.title('Original Image')
plt.subplot(122),plt.imshow(dst),plt.title('Warped Image')
plt.show()

I was working on a solution, before the several edits were applied. I focussed on the actual boxes only. If, instead, you actually need the surrounding, too, the following approach won't help you much, I'm afraid. Also, I assumed the bottom box to be fully included. So, if that one's somehow cut like presented in your new desired final output, additional work would be needed to handle that case.
From the given image, you could mask the gray-ish part around and between the single boxes using the saturation and value channels from the HSV color space:
Following, row-wise sum all pixels, apply some moving average to clean the signal, and detect the peaks in that signal:
The bottom image border must be manually added, since there is no gray-ish border (most likely because the box is somehow cut).
Now, for each of these "peak rows", determine the first and last masked pixels, and build boxes from each two neighbouring "peak rows". Finally, for each of these boxes, apply a distinct perspective transform to a given size. If needed, stack those boxes vertically for example:
That'd be the whole code:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
# Read original image
imgOrig = cv2.cvtColor(cv2.imread('DInAq.jpg'), cv2.COLOR_BGR2RGB)
# Resize image
width = int(1000)
ratio = imgOrig.shape[1] / width
height = int(imgOrig.shape[0] / ratio)
dsize = (width, height)
img = cv2.resize(imgOrig, dsize)
# Mask low saturation and medium to high value (i.e. gray-ish/white-ish colors)
img_gauss = cv2.GaussianBlur(img, (5, 5), -1)
h, s, v = cv2.split(cv2.cvtColor(img_gauss, cv2.COLOR_BGR2HSV))
mask = (s < 24) & (v > 64)
# Row-wise sum mask pixels, apply moving average filter, and find peaks
row_sum = np.sum(mask, axis=1)
row_sum = np.convolve(row_sum, np.ones(5)/5, 'same')
peaks = find_peaks(row_sum, prominence=50)[0]
peaks = np.insert(peaks, 4, img.shape[0]-1)
# Find first and last pixels per "peak row"
x1 = [np.argwhere(mask[p, :]).min() for p in peaks]
x2 = [np.argwhere(mask[p, :]).max() for p in peaks]
# Collect single boxes
boxes = []
for i in np.arange(len(peaks)-1, 0, -1):
boxes.append([[x1[i], peaks[i]],
[x1[i-1], peaks[i-1]],
[x2[i-1], peaks[i-1]],
[x2[i], peaks[i]]])
# Warp each box individually to a given size
warped = []
bw, bh = [400, 400]
for box in reversed(boxes):
pts1 = np.float32(box)
pts2 = np.float32([[0, bh-1], [0, 0], [bw-1, 0], [bw-1, bh-1]])
M = cv2.getPerspectiveTransform(pts1, pts2)
warped.append(cv2.warpPerspective(img, M, (bw, bh)))
# Output
plt.figure(1)
plt.subplot(121), plt.imshow(img), plt.title('Original image')
for box in boxes:
pts = np.array(box)
plt.plot(pts[:, 0], pts[:, 1], 'rx')
plt.subplot(122), plt.imshow(np.vstack(warped)), plt.title('Warped image')
plt.tight_layout(), plt.show()
That's kind of an automated way to detect and extract the single boxes. For better results, you could set up a simple GUI (solely using OpenCV, for example), and let the user click on the exact corners, and build the boxes to be transformed from there.
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
PyCharm: 2021.1
Matplotlib: 3.4.1
NumPy: 1.20.2
OpenCV: 4.5.1
SciPy: 1.6.2
----------------------------------------

Related

affine transformation using nearest neighbor in python

I want to make an affine transformation and afterwards use nearest neighbor interpolation while keeping the same dimensions for input and output images. I use for example the scaling transformation T= [[2,0,0],[0,2,0],[0,0,1]]. Any idea how can I fill the black pixels with nearest neighbor ? I tryied giving them the min value of neighbors' intensities. For ex. if a pixel has neighbors [55,22,44,11,22,55,23,231], I give it the value of min intensity: 11. But the result is not anything clear..
import numpy as np
from matplotlib import pyplot as plt
#Importing the original image and init the output image
img = plt.imread('/home/left/Desktop/computerVision/SET1/brain0030slice150_101x101.png',0)
outImg = np.zeros_like(img)
# Dimensions of the input image and output image (the same dimensions)
(width , height) = (img.shape[0], img.shape[1])
# Initialize the transformation matrix
T = np.array([[2,0,0], [0,2,0], [0,0,1]])
# Make an array with input image (x,y) coordinations and add [0 0 ... 1] row
coords = np.indices((width, height), 'uint8').reshape(2, -1)
coords = np.vstack((coords, np.zeros(coords.shape[1], 'uint8')))
output = T # coords
# Arrays of x and y coordinations of the output image within the image dimensions
x_array, y_array = output[0] ,output[1]
indices = np.where((x_array >= 0) & (x_array < width) & (y_array >= 0) & (y_array < height))
# Final coordinations of the output image
fx, fy = x_array[indices], y_array[indices]
# Final output image after the affine transformation
outImg[fx, fy] = img[fx, fy]
The input image is:
The output image after scaling is:
well you could simply use the opencv resize function
import cv2
new_image = cv2.resize(image, new_dim, interpolation=cv.INTER_AREA)
it'll do the resize and fill in the empty pixels in one go
more on cv2.resize
If you need to do it manually, then you could simply detect dark pixels in resized image and change their value to mean of 4 neighbour pixels (for example - it depends on your required alghoritm)
See: nereast neighbour, bilinear, bicubic, etc.

How to plot centroids on image after kmeans clustering?

I have a color image and wanted to do k-means clustering on it using OpenCV.
This is the image on which I wanted to do k-means clustering.
This is my code:
import numpy as np
import cv2
import matplotlib.pyplot as plt
image1 = cv2.imread("./triangle.jpg", 0)
Z1 = image1.reshape((-1))
Z1 = np.float32(Z1)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K1 = 2
ret, mask, center =cv2.kmeans(Z1,K1,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)
center = np.uint8(center)
print(center)
res_image1 = center[mask.flatten()]
clustered_image1 = res_image1.reshape((image1.shape))
for c in center:
plt.hlines(c, xmin=0, xmax=max(clustered_image1.shape[0], clustered_image1.shape[1]), lw=1.)
plt.imshow(clustered_image1)
plt.show()
This is what I get from the center variable.
[[112]
[255]]
This is the output image
My problem is that I'm unable to understand the output. I have two lists in the center variable because I wanted two classes. But why do they have only one value?
Shouldn't it be something like this (which makes sense because centroids should be points):
[[x1, y1]
[x2, y2]]
instead of this:
[[x]
[y]]
and if I read the image as a color image like this:
image1 = cv2.imread("./triangle.jpg")
Z1 = image1.reshape((-1, 3))
I get this output:
[[255 255 255]
[ 89 173 1]]
Color image output
Can someone explain to me how I can get 2d points instead of lines? Also, how do I interpret the output I got from the center variable when using the color image?
Please let me know if I'm unclear anywhere. Thanks!!
K-Means-clustering finds clusters of similar values. Your input is an array of color values, hence you find the colors that describe the 2 clusters. [255 255 255] is the white color, [ 89 173 1] is the green color. Similar for [112] and [255] in the grayscale version. What you're doing is color quantization
They are correctly the centroids, but their dimension is color, not location. Therefor you cannot plot it anywhere. Well you can, but I looks like this:
See how the 'color location' determines to which class each pixel belongs?
This is not something you can locate in your image. What you can do is find the pixels that belong to the different clusters, and use the locations of the found pixels to determine their centroid or 'average' position.
To get the 'average' position of each color, you have to separate out the pixel coordinates according to the class/color to which they belong. In the code below I used np.where( img <= 240) where 240 is the threshold. I used 240 out of ease, but you could use K-Means to determine where the threshold should be. (inRange() might be useful at some point)) If you sum the coordinates and divide that by the number of pixels found, you'll have what I think you are looking for:
Result:
Code:
import cv2
# load image as grayscale
img = cv2.imread('D21VU.jpg',0)
# get the positions of all pixels that are not full white (= triangle)
triangle_px = np.where( img <= 240)
# dividing the sum of the values by the number of pixels
# to get the average location
ty = int(sum(triangle_px[0])/len(triangle_px[0]))
tx = int(sum(triangle_px[1])/len(triangle_px[1]))
# print location and draw filled black circle
print("Triangle ({},{})".format(tx,ty))
cv2.circle(img, (tx,ty), 10,(0), -1)
# the same process, but now with only white pixels
white_px = np.where( img > 240)
wy = int(sum(white_px[0])/len(white_px[0]))
wx = int(sum(white_px[1])/len(white_px[1]))
# print location and draw white filled circle
print("White: ({},{})".format(wx,wy))
cv2.circle(img, (wx,wy), 10,(255), -1)
# display result
cv2.imshow('Result',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Here is an Imagemagick solution, since I am not proficient with OpenCV.
Basically, I convert your actual image (from your link in the comments) to binary, then use image moments to extract the centroid and other statistics.
I suspect you can do something similar in OpenCV, Skimage, or Python Wand, which is based upon Imagemagick. (See for example:
https://docs.opencv.org/3.4/d3/dc0/group__imgproc__shape.html#ga556a180f43cab22649c23ada36a8a139
https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.moments_coords_central
https://en.wikipedia.org/wiki/Image_moment)
Input:
Your image does not have just two colors. Perhaps this image did not have kmeans clustering applied with 2 colors only. So I will do that with an Imagemagick script that I have built.
kmeans -n 2 -m 5 img.png img2.png
final colors:
count,hexcolor
99234,#65345DFF
36926,#27AD0EFF
Then I convert the two colors to black and white by simply thresholding and stretching the dynamic range to full black and white.
convert img2.png -threshold 50% -auto-level img3.png
Then I get all the image moment statistics for the white pixels, which includes the x,y centroid in pixels relative to the top left corner of the image. It also includes the equivalent ellipse major and minor axes, angle of major axis, eccentricity of the ellipse, and equivalent brightness of the ellipse, plus the 8 Hu image moments.
identify -verbose -moments img3.png
Channel moments:
Gray:
--> Centroid: 208.523,196.302 <--
Ellipse Semi-Major/Minor axis: 170.99,164.34
Ellipse angle: 140.853
Ellipse eccentricity: 0.197209
Ellipse intensity: 106.661 (0.41828)
I1: 0.00149333 (0.380798)
I2: 3.50537e-09 (0.000227937)
I3: 2.10942e-10 (0.00349771)
I4: 7.75424e-13 (1.28576e-05)
I5: 9.78445e-24 (2.69016e-09)
I6: -4.20164e-17 (-1.77656e-07)
I7: 1.61745e-24 (4.44704e-10)
I8: 9.25127e-18 (3.91167e-08)

Detect if an OCR text image is upside down

I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e it does not make difference between (180 and 0) and (90 and 270)). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using Opencv and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score keeping method. Score each image for it's likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability, the full code can be found here.
# Rotate the image around in a circle
angle = 0
while angle <= 360:
# Rotate the source image
img = rotate(src, angle)
# Crop the center 1/3rd of the image (roi is filled with text)
h,w = img.shape
buffer = min(h, w) - int(min(h,w)/1.15)
roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
# Create background to draw transform on
bg = np.zeros((buffer*2, buffer*2), np.uint8)
# Compute the sums of the rows
row_sums = sum_rows(roi)
# High score --> Zebra stripes
score = np.count_nonzero(row_sums)
scores.append(score)
# Image has best rotation
if score <= min(scores):
# Save the rotatied image
print('found optimal rotation')
best_rotation = img.copy()
k = display_data(roi, row_sums, buffer)
if k == 27: break
# Increment angle and try again
angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach; A few lines of you example image
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
line = smooth[start:end]
if np.argmax(line) < len(line)/2:
lower_peaks += 1
print(lower_peaks / len(line_starts))
this prints 0.125 for the given image, so this is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then to use it to deskew images(Taken from the homepage):
from alyn import Deskew
d = Deskew(
input_file='path_to_file',
display_image='preview the image on screen',
output_file='path_for_deskewed image',
r_angle='offest_angle_in_degrees_to_control_orientation')`
d.run()
Note that Alyn is only for deskewing text.

Can I randomly add a small image (13x12) to an array of pixels (random pixels of the same size of my small image)

I have 20 small images (that I want to put in a target area of a background image (13x12). I have already marked my target area with a circle, I have the coordinates of the circle in two arrays of pixels. Now I want to know how I can randomly add my 20 small images in random area in my arrays of pixels which are basically the target area (the drawn circle).
In my code, I was trying for just one image, if it works, I'll pass the folder of my 20 small images.
# Depencies importation
import cv2
import numpy as np
# Saving directory
saving_dir = "../Saved_Images/"
# Read the background image
bgimg = cv2.imread("../Images/background.jpg")
# Resizing the bacground image
bgimg_resized = cv2.resize(bgimg, (2050,2050))
# Read the image that will be put in the background image (exemple of 1)
# I'm just trying with one, if it works, I'll pass the folder of the 20
small_img = cv2.imread("../Images/small.jpg")
# Convert the resized background image to gray
bgimg_gray = cv2.cvtColor(bgimg, cv2.COLOR_BGR2GRAY)
# Convert the grayscale image to a binary image
ret, thresh = cv2.threshold(bgimg_gray,127,255,0)
# Determine the moments of the binary image
M = cv2.moments(thresh)
# calculate x,y coordinate of center
cX = int(M["m10"] / M["m00"])
cY = int(M["m01"] / M["m00"])
# Drawing the circle in the background image
circle = cv2.circle(bgimg, (cX, cY), 930, (0,0,255), 9)
print(circle) # This returns None
# Getting the coordinates of the circle
combined = bgimg[:,:,0] + bgimg[:,:,1] + bgimg[:,:,2]
rows, cols = np.where(combined >= 0)
# I have those pixels in rows and cols, but I don't know
# How to randomly put my small image in those pixel
# Saving the new image
cv2.imwrite(saving_dir+"bgimg"+".jpg", bgimg)
cv2.namedWindow('image', cv2.WINDOW_NORMAL)
cv2.resizeWindow("Test", 1000, 1200)
# Showing the images
cv2.imshow("image", bgimg)
# Waiting for any key to stop the program execution
cv2.waitKey(0)
In the expected results, the small images must be placed in the background image randomly.
If you have the center and the radius of your circle, you can easily generate random coordinates by randomly choosing an angle theta from [0, 2*pi], calculating corresponding x and y values by cos(theta) and sin(theta) and scaling these by some random chosen scaling factors from [0, radius]. I prepared some code for you, see below.
I omitted a lot of code from yours (reading, preprocessing, saving) to focus on the relevant parts (see how to create a minimal, complete, and verifiable example). Hopefully, you can integrate the main idea of my solution into your code on your own. If not, I will provide further explanations.
import cv2
import numpy as np
# (Artificial) Background image (instead of reading an actual image...)
bgimg = 128 * np.ones((401, 401, 3), np.uint8)
# Circle parameters (obtained somehow...)
center = (200, 200)
radius = 100
# Draw circle in background image
cv2.circle(bgimg, center, radius, (0, 0, 255), 3)
# Shape of small image (known before-hand...?)
(w, h) = (13, 12)
for k in range(200):
# (Artificial) Small image (instead of reading an actual image...)
smallimg = np.uint8(np.add(128 * np.random.rand(w, h, 3), (127, 127, 127)))
# Select random angle theta from [0, 2*pi]
theta = 2 * np.pi * np.random.rand()
# Select random distance factors from center
factX = (radius - w/2) * np.random.rand()
factY = (radius - h/2) * np.random.rand()
# Calculate random coordinates for small image from angle and distance factors
(x, y) = np.uint16(np.add((np.cos(theta) * factX - w/2, np.sin(theta) * factY - h/2), center))
# Replace (rather than "add") determined area in background image with small image
bgimg[x:x+smallimg.shape[0], y:y+smallimg.shape[1]] = smallimg
cv2.imshow("bgimg", bgimg)
cv2.waitKey(0)
The exemplary output:
Caveat: I haven't paid attention, if the small images might violate the circle boundary. Therefore, some additional checks or limitations to the scaling factors must be added.
EDIT: I edited my above code. To take the below comment into account, I shift the small image by (width/2, height/2), and limit the radius scale factor accordingly, so that the circle boundary isn't violated, neither top/left nor bottom/right.
Before, it was possible, that the boundary is violated in the bottom/right part (n = 200):
After the edit, this should be prevented (n = 20000):
The touching of the red line in the image is due to the line's thickness. For "safety reasons", one could add another 1 pixel distance.

Insert an image patch into another image at a subpixel location

I have an image patch that I want to insert into another image at a floating point location. In fact, what I need is something like the opposite of what the opencv getRectSubPix function does.
I guess I could implement it by doing a subpixel warp of the patch into another patch and insert this other patch into the target image at an integer location. However, I don't have clear what to do with the empty fraction of the pixels in the warped patch, or how would I blend the border of the new patch with in the target image.
I rather use a library function than implement this operation myself. Does anybody know if there is any library function that can do this type of operation, in opencv or any other image processing library?
UPDATE:
I discovered that opencv warpPerspective can be used with a borderMode = BORDER_TRANSPARENT which means that the pixels in the destination image corresponding to the "outliers" in the source image are not modified by the function. So I thought I could implement this subpixel patch insertion with just a warpPerspective and an adequate transformation matrix. So I wrote this function in python to perform the operation:
def insert_patch_subpixel(im, patch, p):
"""
im: numpy array with source image.
patch: numpy array with patch to be inserted into the source image
p: tuple with the center of the position (can be float) where the patch is to be inserted.
"""
ths = patch.shape[0]/2
xpmin = p[0] - ths
ypmin = p[1] - ths
Ho = np.array([[1, 0, xpmin],
[0, 1, ypmin],
[0, 0, 1]], dtype=float)
h,w = im.shape
im2 = cv2.warpPerspective(patch, Ho, (w,h), dst=im,
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_TRANSPARENT)
return im2
Unfortunately, the interpolation doesn't seem to work for the outlier pixels if BORDER_TRANSPARENT is used. I tested this function with a small 10x10 image (filled with value 30) and inserting a 4x4 patch (filled with value 100) at p=(5,5) (left figure) and p=(5.5,5.5) (middle figure) and we can see in the figures below that there is no interpolation in the border. However, if I change the boderMode to BORDER_CONSTANT the interpolation works (right figure), but that also fills the destination image with 0s for the outlier values.
It's a shame that interpolation doesn't work with BORDER_TRANSPARENT. I'll suggest this as an improvement to the opencv project.
Resize the patch image to the size you want in the destination. Then set alpha along the edges based on 1.0 - fraction for the left edge, fraction for the right edge. Then blend.
It's not quite perfect, because you're not resampling all the pixels properly, but that would also damage resolution. It's probably your best compromise.
Actually you should use getRectSubPix().
Use it to extract your patch from the source image with the fractional part of your desired offset then just set it into the destination image with a simple copy (or blend as needed).
You might want to add a 1 pixel border around be patch where you can do the blend.
This function essentially does a translation only (subpixel) warp.
I found a solution based on what I found in my question update.
As I could see the interpolation happening when using boderMode = BORDER_CONSTANT in the warpPerspective function I thought I could use this as a weighting mask for a blending between the original image and the subpixel inserted patch on a black background. See the new function and test code:
import numpy as np
import matplotlib.pyplot as plt
def insert_patch_subpixel2(im, patch, p):
"""
im: numpy array with source image.
patch: numpy array with patch to be inserted into the source image
p: tuple with the center of the position (can be float) where the patch is to be inserted.
"""
ths = patch.shape[0]/2
xpmin = p[0] - ths
ypmin = p[1] - ths
Ho = np.array([[1, 0, xpmin],
[0, 1, ypmin],
[0, 0, 1]], dtype=float)
h,w = im.shape
im2 = cv2.warpPerspective(patch, Ho, (w,h),
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_CONSTANT)
patch_mask = np.ones_like(patch,dtype=float)
blend_mask = cv2.warpPerspective(patch_mask, Ho, (w,h),
flags=cv2.INTER_LINEAR,
borderMode=cv2.BORDER_CONSTANT)
#I don't multiply im2 by blend_mask because im2 has already
#been interpolated with a zero background.
im3 = im*(1-blend_mask)+im2
im4 = cv2.convertScaleAbs(im3)
return im4
if __name__ == "__main__":
x,y = np.mgrid[0:10:1, 0:10:1]
im =(x+y).astype('uint8')*5
#im = np.ones((10,10), dtype='uint8')*30
patch = np.ones((4,4), dtype='uint8')*100
p=(5.5,5.5)
im = insert_patch_subpixel2(im, patch, p)
plt.gray()
plt.imshow(im, interpolation='none', extent = (0, 10, 10, 0))
ax=plt.gca()
ax.grid(color='r', linestyle='-', linewidth=1)
ax.set_xticks(np.arange(0, 10, 1));
ax.set_yticks(np.arange(0, 10, 1));
def format_coord(x, y):
col = int(x)
row = int(y)
z = im[row,col]
return 'x=%1.4f, y=%1.4f %s'%(x, y, z)
ax.format_coord = format_coord
plt.show()
In the images below we can see the results of a test with a small 10x10 image (filled with value 30) and inserting a 4x4 patch (filled with value 100) at p=(5,5) (left figure) and p=(5.5,5.5) (middle figure) and now we can see in the figures below that there is bilinear interpolation in the border. To show that the interpolation works with an arbitrary background I also show a test with a gradient 10x10 image background (right figure). The test script creates a figure that lets you inspect the pixel values and verify that the correct interpolation is done at each border pixel.

Categories

Resources