I need help, please.
I'm trying to select and crop the overlapping area of two images with the Python Pillow library.
I have the upper-left pixel coordinate of the two pictures. With these, I can find out which one is located above the other.
I wrote a function, taking two images as arguments:
def function(img1, img2):
    x1 = 223  # x coordinate of the first image
    y1 = 197  # y coordinate of the first image
    x2 = 255  # x coordinate of the second image
    y2 = 197  # y coordinate of the second image
    dX = x1 - x2
    dY = y1 - y2
    if y1 <= y2:  # if the first image is above the other
        upper = img1
        lower = img2
        flag = False
    else:
        upper = img2
        lower = img1
        flag = True
    if dX <= 0:  # if the lower image is on the left
        box = (abs(dX), abs(dY), upper.size[0], upper.size[1])
        a = upper.crop(box)
        box = (0, 0, upper.size[0] - abs(dX), upper.size[1] - abs(dY))
        b = lower.crop(box)
    else:
        box = (0, abs(dY), lower.size[0] - abs(dX), upper.size[1])
        a = upper.crop(box)
        box = (abs(dX), 0, lower.size[0], upper.size[1] - abs(dY))
        b = lower.crop(box)
    if flag:
        return b, a  # switch the two images again
    else:
        return a, b
I know for sure that the result is wrong (It's a school assignment).
Thanks for your help.
First of all, I don't quite get what you mean by one picture being "above" the other (shouldn't that be a z-position?), but take a look at the question "How to make rect from the intersection of two?"; its first answer might be a good lead. :)
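The rectangle-intersection idea can be sketched roughly as follows. This is only an illustration under assumptions: `overlap_boxes` is a made-up helper name, positions are the upper-left corners in a shared coordinate system, and sizes are `(width, height)` tuples as given by Pillow's `Image.size`. It returns one crop box per image, expressed in that image's own pixel coordinates, in the `(left, upper, right, lower)` form that `Image.crop` expects.

```python
def overlap_boxes(pos1, size1, pos2, size2):
    """Return (box1, box2) crop boxes for the overlapping area, or None.

    pos*  : (x, y) of each image's upper-left corner in shared coordinates
    size* : (width, height) of each image
    Each box is (left, upper, right, lower) relative to its own image.
    """
    x1, y1 = pos1
    w1, h1 = size1
    x2, y2 = pos2
    w2, h2 = size2

    # Intersection rectangle in shared coordinates
    left = max(x1, x2)
    top = max(y1, y2)
    right = min(x1 + w1, x2 + w2)
    bottom = min(y1 + h1, y2 + h2)

    if right <= left or bottom <= top:
        return None  # the images do not overlap

    # Translate the intersection into each image's local coordinates
    box1 = (left - x1, top - y1, right - x1, bottom - y1)
    box2 = (left - x2, top - y2, right - x2, bottom - y2)
    return box1, box2


# Example with the coordinates from the question and an assumed 100x80 size
boxes = overlap_boxes((223, 197), (100, 80), (255, 197), (100, 80))
```

With Pillow, the result would then be used as `img1.crop(box1)` and `img2.crop(box2)`. Because the intersection is computed with `max` and `min`, it does not matter which image is above or to the left of the other.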
Essentially, my original image has N instances of a certain object. I have the bounding box coordinates and the class for all of them in a text file. This is basically a dataset for YoloV3 and darknet. I want to generate additional images by slicing the original one in a way such that it contains at least 1 instance of one of those objects and if it does, save the image, and the new bounding box coordinates of the objects in that image.
The following is the code for slicing the image:
x1 = random.randint(0, 1200)
width = random.randint(0, 800)
y1 = random.randint(0, 1200)
height = random.randint(30, 800)
slice_img = img[x1:x1+width, y1:y1+height]
plt.imshow(slice_img)
plt.show()
My next step is to use template matching to find if my sliced image is in the original one:
w, h = slice_img.shape[:-1]
res = cv2.matchTemplate(img, slice_img, cv2.TM_CCOEFF_NORMED)
threshold = 0.6
loc = np.where(res >= threshold)
for pt in zip(*loc[::-1]):  # Switch columns and rows
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 5)
cv2.imwrite('result.png', img)
At this stage, I am quite lost and not sure how to proceed any further.
Ultimately, I need many new images with corresponding text files containing the class and coordinates. Any advice would be appreciated. Thank you.
P.S. I cannot share my images with you, unfortunately.
Template matching is way overkill for this. Template matching essentially slides a kernel image over your main image and compares pixels of each, performing many many computations. There's no need to search the image because you already know where the objects are within the image. Essentially, you are trying to determine whether one rectangle (bounding box for an object) overlaps sufficiently with the slice, and you know the exact coordinates of each rectangle. Thus, it's a geometry problem rather than a computer vision problem.
(As an aside: the correct term for what you are calling a slice would probably be crop; slice generally means you're taking an N-dimensional array (say 3 x 4 x 5) and taking a subset of data that is N-1 dimensional by selecting a single index for one dimension, i.e. take index 0 on dimension 0 to get a 1 x 4 x 5 array.)
Here's a brief example of how you might do this. Let x1, x2, y1, y2 be the min and max x and y coordinates for the crop you generate. Let ox1, ox2, oy1, oy2 be the min and max x and y coordinates for an object:
import random
import cv2

# Assumes `image` is the loaded image array and `all_objects` is a list of
# (ox1, ox2, oy1, oy2) bounding boxes.
threshold = 0.7  # at least (nominally) 70% of an object must be within the crop
NO_SUCCESSFUL_CROPS = True
while NO_SUCCESSFUL_CROPS:
    # Generate crop
    x1 = random.randint(0, 1200)
    width = random.randint(0, 800)
    y1 = random.randint(0, 1200)
    height = random.randint(30, 800)
    x2 = x1 + width
    y2 = y1 + height
    # for each bounding box
    for bbox in all_objects:
        # assign bbox to ox1 ox2 oy1 oy2
        ox1, ox2, oy1, oy2 = bbox
        # compute the fraction of the bbox that is within the crop
        minx = max(ox1, x1)
        miny = max(oy1, y1)
        maxx = min(ox2, x2)
        maxy = min(oy2, y2)
        # clamp at zero so a box that misses the crop on both axes
        # does not produce a positive area
        area_in_crop = max(0, maxx - minx) * max(0, maxy - miny)
        area_of_bbox = (ox2 - ox1) * (oy2 - oy1)
        ratio = area_in_crop / area_of_bbox
        if ratio > threshold:
            NO_SUCCESSFUL_CROPS = False
            # crop image as above; the array is indexed [row, column],
            # so y comes first and x second
            crop_image = image[y1:y2, x1:x2]
            cv2.imwrite("output_file.png", crop_image)
            # shift bbox coords since (x1, y1) is the new (0, 0) pixel in crop_image
            ox1 -= x1
            ox2 -= x1
            oy1 -= y1
            oy2 -= y1
            break  # no need to continue (although you could alternately continue until you have N crops, or even make sure you get one crop with each object)
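Since the goal is a new label file per crop, the same geometry can be wrapped in a small function that returns every object that survives the crop, already shifted into the crop's coordinates. This is a sketch, not part of the answer above: `boxes_in_crop` is a hypothetical name, boxes use the same `(x1, x2, y1, y2)` order as the example, and clipping the returned boxes to the crop edges is my own assumption about what the label file should contain.

```python
def boxes_in_crop(crop, boxes, threshold=0.7):
    """Return the boxes that lie mostly inside `crop`, in crop coordinates.

    crop  : (x1, x2, y1, y2) of the crop in the original image
    boxes : iterable of (ox1, ox2, oy1, oy2) object boxes
    A box is kept when more than `threshold` of its area is inside the crop.
    Kept boxes are clipped to the crop and shifted so the crop's upper-left
    corner becomes (0, 0).
    """
    cx1, cx2, cy1, cy2 = crop
    kept = []
    for ox1, ox2, oy1, oy2 in boxes:
        # Intersection of the object box with the crop
        ix1, ix2 = max(ox1, cx1), min(ox2, cx2)
        iy1, iy2 = max(oy1, cy1), min(oy2, cy2)
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = (ox2 - ox1) * (oy2 - oy1)
        if area > 0 and inter / area > threshold:
            kept.append((ix1 - cx1, ix2 - cx1, iy1 - cy1, iy2 - cy1))
    return kept


crop = (100, 300, 50, 250)
objects = [
    (120, 180, 60, 100),   # fully inside the crop
    (280, 380, 60, 100),   # only 20% inside
    (500, 600, 500, 600),  # completely outside
]
print(boxes_in_crop(crop, objects))
```

To carry the class along, each box could be stored together with its class id and the id appended to the kept tuple; a crop is worth saving whenever the returned list is non-empty.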
I'm trying to work on video stabilization using Python and template matching via skimage. The code is supposed to track a single point during the whole video, but the tracking is awfully imprecise and I suspect it's not even working correctly.
This is the track_point function, which is supposed to take a video and the coordinates of a point as input and return an array of tracked points, one for each frame:
import numpy as np
from skimage.feature import match_template
from skimage.color import rgb2gray
def track_point(video, x, y, patch_size = 4, search_size = 40):
    length, height, width, _ = video.shape
    frame = rgb2gray(np.squeeze(video[1, :, :, :])) # convert image to grayscale
    x1 = int(max(1, x - patch_size / 2))
    y1 = int(max(1, y - patch_size / 2))
    x2 = int(min(width, x + patch_size / 2 - 1))
    y2 = int(min(height, y + patch_size / 2 - 1))
    template = frame[y1:y2, x1:x2] # cut the reference patch (template) from the first frame
    track_x = [x]
    track_y = [y]
    #plt.imshow(template)
    half = int(search_size/2)
    for i in range(1, length):
        prev_x = int(track_x[i-1])
        prev_y = int(track_y[i-1])
        frame = rgb2gray(np.squeeze(video[i, :, :, :])) # Extract current frame and convert it grayscale
        image = frame[prev_x-half:prev_x+half,prev_y-half:prev_y+half] # Cut-out a region of search_size x search_size from 'frame' with the center in the point's previous position (i-1)
        result = match_template(image, template, pad_input=False, mode='constant', constant_values=0) # Compare the region to template using match_template
        ij = np.unravel_index(np.argmax(result), result.shape) # Select best match (maximum) and determine its position. Update x and y and append new x,y values to track_x,track_y
        x, y = ij[::-1]
        x += x1
        y += y1
        track_x.append(x)
        track_y.append(y)
    return track_x, track_y
And this is how I call the function:
points = track_point(video, point[0], point[1])
# Draw trajectory on top of the first frame from video
image = np.squeeze(video[1, :, :, :])
figure = plt.figure()
plt.gca().imshow(image)
plt.gca().plot(points[0], points[1])
I expect the plot to be somewhat regular, since the video isn't that shaky, but it's not.
For some reason the graph is plotting almost all of the coordinates of the search template.
EDIT: Here's the link for the video: https://upload-video.net/a11073n9Y11-noau
What am I doing wrong?
I am trying to compare 2 images using PIL, and below is my scenario.
img1:
img2:
img1 = Image.open("img1.png")
img2 = Image.open("img2.png")
I have written a simple diff function which will return -1 if there is a difference or 0 if they are same.
def diff(img1, img2):
    im1 = img1.load()
    im2 = img2.load()
    for i in range(0, img1.size[0]):
        for j in range(0, img1.size[1]):
            if(im1[i,j] != im2[i,j]):
                return -1
    return 0
I am passing the following:
diff(img2, img1.transpose(Image.FLIP_LEFT_RIGHT))
Both are exactly the same image but I get a difference. The difference seems to be at:
[27 84]
Can someone please explain to me why?
"Both are exactly the same image but I get a difference."
But they're not.
You can see this, using the code below for example:
def show_diff(img1, img2):
    diff = Image.new("RGB", img1.size, (255,255,255))
    for x1 in range(img1.size[0]):
        for y1 in range(img1.size[1]):
            x2 = img1.size[0] - 1 - x1
            y2 = img1.size[1] - 1 - y1
            if img1.getpixel((x1,y1)) != img2.getpixel((x2,y2)):
                print(x1,y1,x2,y2)
                diff.putpixel((x1,y1), (255,0,0))
    diff.show()
img_r = Image.open("img/pacman-r.png")
img_l = Image.open("img/pacman-l.png")
show_diff(img_r, img_l)
Which results in
(Here, any pixel that differs between the two images is colored red.)
Or with
def show_delta(img1, img2):
    diff = Image.new("RGB", img1.size, (255,255,255))
    for x1 in range(img1.size[0]):
        for y1 in range(img1.size[1]):
            x2 = img1.size[0] - 1 - x1
            y2 = img1.size[1] - 1 - y1
            p1 = img1.getpixel((x1,y1))
            p2 = img2.getpixel((x2,y2))
            p3 = round((p1[0] / 2) - (p2[0] / 2)) + 128
            diff.putpixel((x1,y1), (p3,p3,p3))
    diff.show()
img_r = Image.open("img/pacman-r.png")
img_l = Image.open("img/pacman-l.png")
show_delta(img_r, img_l)
which results in
(Here, equivalent pixels are gray while a white pixel signifies a pixel in img1 was set (dark) while unset in img2 and a black pixel signifies the opposite.)
It seems like you suspected that PIL's Image.transpose method caused the problem, but the source images aren't just transposed.
Image.transpose works as you'd expect -- so something like:
def diff(img1, img2):
    im1 = img1.load()
    im2 = img2.load()
    images_match = True
    for i in range(0, img1.size[0]):
        for j in range(0, img1.size[1]):
            if(im1[i,j] != im2[i,j]):
                images_match = False
    return images_match
img_r = Image.open("img/pacman-r.png")
# NOTE: **NOT** Using img_l here
print(diff(img_r, img_r.transpose(Image.FLIP_LEFT_RIGHT).transpose(Image.FLIP_LEFT_RIGHT)))
returns True.
(Here, an image is compared to a twice-transposed version of itself)
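For what it's worth, the pixel loop can probably be replaced by Pillow's ImageChops module. The following is a minimal sketch, not the method used above: `images_equal` is a hypothetical helper, and converting both images to RGB first is my own assumption (it ignores any alpha channel and avoids `getbbox` looking only at alpha on RGBA images in newer Pillow versions). The test images are generated in memory, so no files are needed.

```python
from PIL import Image, ImageChops


def images_equal(img1, img2):
    """Return True when both images have the same size and RGB pixel values."""
    if img1.size != img2.size:
        return False
    # difference() yields an all-black image when the inputs are identical;
    # getbbox() returns None for an image with no non-zero pixels.
    delta = ImageChops.difference(img1.convert("RGB"), img2.convert("RGB"))
    return delta.getbbox() is None


# A small asymmetric test image: white with one black pixel on the left edge
original = Image.new("RGB", (4, 3), (255, 255, 255))
original.putpixel((0, 1), (0, 0, 0))

flipped = original.transpose(Image.FLIP_LEFT_RIGHT)
flipped_twice = flipped.transpose(Image.FLIP_LEFT_RIGHT)

print(images_equal(original, flipped))        # asymmetric image differs from its mirror
print(images_equal(original, flipped_twice))  # flipping twice restores the original
```

If the two source files really were mirror images of each other, `images_equal(img_l, img_r.transpose(Image.FLIP_LEFT_RIGHT))` would be True; a False result points at differences in the files themselves rather than at `transpose`.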
I've got a red laser (dot and linear). I want to locate it and, using the least squares method, get the line that lies closest to the image of the laser. I used NumPy's np.linalg.lstsq to get the coefficients, with Python 2.7 and OpenCV 3.1.
So, here's my code:
while loop == 1:
    rval, frame = vc.read()
    frame = imutils.resize(frame, width=640, height=480)
    red, green, blue = cv2.split(frame)
    rbin, thresholdImg = cv2.threshold(red, 240, 255, cv2.THRESH_BINARY)
    new = np.argwhere(thresholdImg == 255) #Get only RED pixels
    if len(new) == 0: #If laser lost
        assistantView(3,assistantImg)
    else:
        xs = []
        ys = []
        for (x,y) in new: #Extract red pixels positions
            xs = np.append(xs,x)
            ys = np.append(ys,y)
        ArrayToResult = np.vstack([xs, np.ones(len(xs))]).T
        m, c = np.linalg.lstsq(ArrayToResult, ys)[0] #Applying least squares method
        A = m
        B = c
        x1 = np.amin(xs) #Take "left" and "right" X-coords
        x2 = np.amax(xs)
        ymin = int(np.amin(ys))
        ymax = int(np.amax(ys))
        y1 = x1*A + B #Get line
        y2 = x2*A + B
        x1 = int(x1)
        x2 = int(x2)
        y1 = int(y1)
        y2 = int(y2)
        print(x1, y1, x2, y2)
        cv2.line(thresholdImg,(x1,y1),(x2,y2),(255,0,0),1) #Draw a line
With the dot laser, I expected to get a straight line passing through the center of the laser image. But here's what I got:
With the help of print(x1, y1, x2, y2), I noticed that the coordinates the line is built from do not correspond to the location of the laser. Moving the camera, I noticed that the line is almost symmetrical to the image of the laser with respect to the line y=x. So I used the inverse function, as follows:
y1 = (x1-B) / A
y2 = (x2-B) / A
And the result is:
Now the Y-coords look like:
4698, 29126, 3726, 805208, 19575, -1671, -2952, 13194....
This is the second day I've been trying to solve this problem. What am I doing wrong?
I have no idea why it works, but it works. I stored the Y-coordinates in the xs array and the X-coordinates in the ys array:
for (x,y) in new:
    xs = np.append(xs,y) #was X
    ys = np.append(ys,x) #was Y
This way everything works fine.
And this is the result:
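A likely explanation, which the post above does not state: np.argwhere returns indices in array order, which for an image is (row, column), i.e. (y, x), whereas cv2.line expects points as (x, y). Unpacking each index as (x, y) therefore swaps the axes, which matches the observed mirroring about y=x. The sketch below illustrates this on a synthetic mask; `fit_line` is a hypothetical helper name and the mask is made up for the example.

```python
import numpy as np


def fit_line(mask):
    """Least-squares fit of y = m*x + c through the non-zero pixels of `mask`."""
    # argwhere yields (row, column) pairs: row is y, column is x
    rows, cols = np.argwhere(mask).T
    xs = cols.astype(float)
    ys = rows.astype(float)
    design = np.vstack([xs, np.ones(len(xs))]).T
    m, c = np.linalg.lstsq(design, ys, rcond=None)[0]
    return m, c


# Synthetic "laser" pixels lying exactly on y = 2*x + 1
mask = np.zeros((8, 4), dtype=np.uint8)
for x in range(3):
    mask[2 * x + 1, x] = 255  # indexed as [y, x]

m, c = fit_line(mask)
print(m, c)
```

Unpacking the columns as x and the rows as y also removes the need for the per-pixel np.append loop.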
What I'm trying to do in this example is wrap an image around a circle, like below.
To wrap the image I simply calculated the x,y coordinates using trig.
The problem is that the calculated X and Y positions are rounded to make them integers. This causes the blank pixels seen in the wrapped image above. The x,y positions have to be integers because they are positions in lists.
I've done this again in the following code, but without any images, to make things easier to see. All I've done is create two arrays with binary values, one black and the other white, then wrapped one onto the other.
The output of the code is.
import math as m
from PIL import Image  # only used for showing output as image
width = 254.0
height = 24.0
Ro = 40.0
img = [[1 for x in range(int(width))] for y in range(int(height))]
cir = [[0 for x in range(int(Ro * 2))] for y in range(int(Ro * 2))]
def shom_im(img):  # for showing data as image
    list_image = [item for sublist in img for item in sublist]
    new_image = Image.new("1", (len(img[0]), len(img)))
    new_image.putdata(list_image)
    new_image.show()
increment = m.radians(360 / width)
rad = Ro - 0.5
for i, row in enumerate(img):
    hyp = rad - i
    for j, column in enumerate(row):
        alpha = j * increment
        x = m.cos(alpha) * hyp + rad
        y = m.sin(alpha) * hyp + rad
        # put value from original image to its position in new image
        cir[int(round(y))][int(round(x))] = img[i][j]
shom_im(cir)
I later found out about the Midpoint Circle Algorithm, but I got worse results with it:
from PIL import Image  # only used for showing output as image
width, height = 254, 24
ro = 40
img = [[(0, 0, 0, 1) for x in range(int(width))]
       for y in range(int(height))]
cir = [[(0, 0, 0, 255) for x in range(int(ro * 2))] for y in range(int(ro * 2))]
def shom_im(img):  # for showing data as image
    list_image = [item for sublist in img for item in sublist]
    new_image = Image.new("RGBA", (len(img[0]), len(img)))
    new_image.putdata(list_image)
    new_image.show()
def putpixel(x0, y0):
    global cir
    cir[y0][x0] = (255, 255, 255, 255)
def drawcircle(x0, y0, radius):
    x = radius
    y = 0
    err = 0
    while (x >= y):
        putpixel(x0 + x, y0 + y)
        putpixel(x0 + y, y0 + x)
        putpixel(x0 - y, y0 + x)
        putpixel(x0 - x, y0 + y)
        putpixel(x0 - x, y0 - y)
        putpixel(x0 - y, y0 - x)
        putpixel(x0 + y, y0 - x)
        putpixel(x0 + x, y0 - y)
        y += 1
        err += 1 + 2 * y
        if (2 * (err - x) + 1 > 0):
            x -= 1
            err += 1 - 2 * x
for i, row in enumerate(img):
    rad = ro - i
    drawcircle(int(ro - 1), int(ro - 1), rad)
shom_im(cir)
Can anybody suggest a way to eliminate the blank pixels?
You are having problems filling up your circle because you are approaching this from the wrong way – quite literally.
When mapping from a source to a target, you need to iterate over your target and map each of its pixels back into the source image. Then there is no chance at all that you miss a pixel and, equally, you will never draw a target pixel more than once.
The following is a bit rough-and-ready; it only serves as a concept example. I first wrote some code to draw a filled circle, top to bottom. Then I added some more code to remove the center part (and added a variable Ri, for "inner radius"). This leads to a solid ring, where all pixels are only drawn once: top to bottom, left to right.
How exactly you draw the ring is not actually important! I used trig at first because I thought of re-using the angle bit, but it can be done with Pythagoras' theorem as well, and even with Bresenham's circle routine. All you need to keep in mind is that you iterate over the target rows and columns, not the source. This provides actual x,y coordinates that you can feed into the remapping procedure.
With the above done and working, I wrote the trig functions to translate from the coordinates I would put a pixel at into the original image. For this, I created a test image containing some text:
and a good thing that was, too, as in the first attempt I got the text twice (once left, once right) and mirrored – that needed a few minor tweaks. Also note the background grid. I added that to check if the 'top' and 'bottom' lines – the outermost and innermost circles – got drawn correctly.
Running my code with this image and Ro,Ri at 100 and 50, I get this result:
You can see that the trig functions make it start at the rightmost point, move clockwise, and have the top of the image pointing outwards. All can be trivially adjusted, but this way it mimics the orientation that you want your image drawn.
This is the result with your iris-image, using 33 for the inner radius:
and here is a nice animation, showing the stability of the mapping:
Finally, then, my code is:
import math as m
from PIL import Image
Ro = 100.0
Ri = 50.0
# img = [[1 for x in range(int(width))] for y in range(int(height))]
cir = [[0 for x in range(int(Ro * 2))] for y in range(int(Ro * 2))]
# image = Image.open('0vWEI.png')
image = Image.open('this-is-a-test.png')
# data = image.convert('RGB')
pixels = image.load()
width, height = image.size
def shom_im(img):  # for showing data as image
    list_image = [item for sublist in img for item in sublist]
    new_image = Image.new("RGB", (len(img[0]), len(img)))
    new_image.putdata(list_image)
    new_image.save("result1.png","PNG")
    new_image.show()
for i in range(int(Ro)):
    # outer_radius = Ro*m.cos(m.asin(i/Ro))
    outer_radius = m.sqrt(Ro*Ro - i*i)
    for j in range(-int(outer_radius),int(outer_radius)):
        if i < Ri:
            # inner_radius = Ri*m.cos(m.asin(i/Ri))
            inner_radius = m.sqrt(Ri*Ri - i*i)
        else:
            inner_radius = -1
        if j < -inner_radius or j > inner_radius:
            # this is the destination
            # solid:
            # cir[int(Ro-i)][int(Ro+j)] = (255,255,255)
            # cir[int(Ro+i)][int(Ro+j)] = (255,255,255)
            # textured:
            x = Ro+j
            y = Ro-i
            # calculate source
            angle = m.atan2(y-Ro,x-Ro)/2
            distance = m.sqrt((y-Ro)*(y-Ro) + (x-Ro)*(x-Ro))
            distance = m.floor((distance-Ri+1)*(height-1)/(Ro-Ri))
            # if distance >= height:
            #     distance = height-1
            cir[int(y)][int(x)] = pixels[int(width*angle/m.pi) % width, height-distance-1]
            y = Ro+i
            # calculate source
            angle = m.atan2(y-Ro,x-Ro)/2
            distance = m.sqrt((y-Ro)*(y-Ro) + (x-Ro)*(x-Ro))
            distance = m.floor((distance-Ri+1)*(height-1)/(Ro-Ri))
            # if distance >= height:
            #     distance = height-1
            cir[int(y)][int(x)] = pixels[int(width*angle/m.pi) % width, height-distance-1]
shom_im(cir)
The commented-out lines draw a solid white ring. Note the various tweaks here and there to get the best result. For instance, the distance is measured from the center of the ring, and so returns a low value for close to the center and the largest values for the outside of the circle. Mapping that directly back onto the target image would display the text with its top "inwards", pointing to the inner hole. So I inverted this mapping with height - distance - 1, where the -1 is to make it map from 0 to height again.
A similar fix is in the calculation of distance itself; without the tweaks Ri+1 and height-1 either the innermost or the outermost row would not get drawn, indicating that the calculation is just one pixel off (which was exactly the purpose of that grid).
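The target-first idea from this answer can also be condensed into a short, image-free sketch that works on plain nested lists. This is not the code above, just an illustration under assumptions: `wrap_to_ring` is a hypothetical name, the ring is centred at `r_outer - 0.5`, the source row 0 is mapped to the outer edge, and unfilled target pixels are left as None.

```python
import math


def wrap_to_ring(src, r_outer, r_inner):
    """Wrap a 2D list `src` (rows x columns) onto a ring by inverse mapping.

    Every target pixel inside the ring looks up exactly one source pixel,
    so no target pixel inside the ring can be left blank.
    """
    height, width = len(src), len(src[0])
    size = 2 * r_outer
    centre = r_outer - 0.5
    out = [[None] * size for _ in range(size)]
    for ty in range(size):            # iterate over the TARGET rows ...
        for tx in range(size):        # ... and columns, not the source
            dx, dy = tx - centre, ty - centre
            r = math.hypot(dx, dy)
            if not (r_inner <= r < r_outer):
                continue              # outside the ring
            # The angle selects the source column, the radius the source row
            angle = math.atan2(dy, dx) % (2 * math.pi)
            sx = int(angle / (2 * math.pi) * width) % width
            sy = min(height - 1,
                     int((r_outer - r) / (r_outer - r_inner) * height))
            out[ty][tx] = src[sy][sx]
    return out


# Source "image" whose pixel value is simply its row index
source = [[row] * 254 for row in range(24)]
ring = wrap_to_ring(source, 40, 16)
```

Because the loop runs over the target, rounding only decides which source pixel is sampled; it cannot open holes in the ring.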
I think what you need is a noise filter. There are many implementations, of which I think a Gaussian filter would give a good result. You can find a list of filters here. If it gets blurred too much:
keep your first calculated image
calculate filtered image
copy fixed pixels from filtered image to first calculated image
Here is a crude average filter written by hand:
cir_R = int(Ro*2)  # outer circle 2*r
inner_r = int(Ro - 0.5 - len(img))  # inner circle r
for i in range(1, cir_R-1):
    for j in range(1, cir_R-1):
        if cir[i][j] == 0:  # missing pixel
            dx = int(i-Ro)
            dy = int(j-Ro)
            pix_r2 = dx*dx + dy*dy  # squared distance to center
            if pix_r2 <= Ro*Ro and pix_r2 >= inner_r*inner_r:
                cir[i][j] = (cir[i-1][j] + cir[i+1][j] + cir[i][j-1] +
                             cir[i][j+1])/4
shom_im(cir)
and the result:
This basically scans the area between the two radii, checks for missing pixels, and replaces each one with the average of the 4 pixels adjacent to it. In this black-and-white case the result is all white.
Hope it helps!