image matching in opencv python

I've been working on a project to recognize a flag shown to the camera using OpenCV in Python.
I've already tried SURF, color histogram matching, and template matching, but none of the three reliably returns the correct answer. What I'd like to know is what the best approach to this problem would be.
Example of the template images:
Here is an example of a flag shown to the camera:
What should I use if these are the kinds of images I want to recognize?
Updated code using matchTemplate:
import operator
import cv2
flags=["Cambodia.jpg","Laos.jpg","Malaysia.jpg","Myanmar.jpg","Philippines.jpg","Singapore.jpg","Thailand.jpg","Vietnam.jpg","Indonesia.jpg","Brunei.jpg"]
while True:
    methods = 'cv2.TM_CCOEFF_NORMED'
    list_of_pics=[]
    for flag in flags:
        template= cv2.imread(flag,0)
        img = cv2.imread('philippines2.jpg',0)
        # generate Gaussian pyramid for A
        G = template.copy()
        gpA = [G]
        for i in range(6):
            G = cv2.pyrDown(G)
            gpA.append(G)
        n=0
        for x in gpA:
            w, h = x.shape[::-1]
            method = eval(methods)#
            # Apply template Match
            res = cv2.matchTemplate(img,x,method)
            matchVal=res[0][0]
            picDict={"matchVal":matchVal,"name":flag}
            list_of_pics.append(picDict)
            n=n+1
    newlist = sorted(list_of_pics, key=operator.itemgetter('matchVal'),reverse=True)
    #print newlist
    matched_image=newlist[0]['name']
    print(matched_image)
    k=cv2.waitKey(10)
    if (k==27):
        break
cv2.destroyAllWindows()

I don't think you can get good results from SURF/SIFT because:
SURF/SIFT need keypoints to detect the object, but you have to detect flags, and most flags are largely uniform and do not provide many keypoints.
Your webcam frame contains several other things besides the flag, and those also contribute keypoints.
Solution: I still think you should use OpenCV's matchTemplate(), which you have already tried. The problem in your version is that you didn't account for the fact that matchTemplate() is not scale or orientation invariant. So the solution is to use a Gaussian pyramid to create different sizes (half, one fourth, double, etc.) of your sample flags. Once you have the same flag in 2-5 different sizes, perform matchTemplate() between every size of the flag and the webcam frame.
Strategy:
Receive the webcam frame
Load the image of a flag.
Using Gaussian pyramid, create smaller and bigger images of that flag (you don't need to store them.)
Perform matchTemplate() between the webcam frame and each size of flag.
Result: the flag image that gives the maximum correlation value is the flag present in your webcam frame.
REMEMBER: matchTemplate is not scale or orientation invariant, so if you rotate the flag or make it larger or smaller in the webcam frame, you won't get good results.

SURF cannot be applied to images that have no corners (when the gradient mostly goes in one direction, as in a striped flag). A color histogram of the whole object may not work since both of your examples have similar colors. However, if you apply a histogram to different parts of the image, it will work better.
What you need to do is split your training image into, say, 4 quadrants and create 4 color histograms. The testing stage will integrate these 4 back-projected histograms and check for the right spatial order of responses. A color histogram is quite robust to rotation, scaling, and perspective. It changes with illumination, so you need liberal matching thresholds. The spatial resolution from the 4 quadrants will help to ameliorate this.
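A simplified sketch of the quadrant idea, using NumPy only. It compares per-quadrant histograms directly by histogram intersection instead of back-projecting them, so it is a stand-in for the approach described above rather than an implementation of it. The function names and the two synthetic flags are assumptions for illustration.

```python
import numpy as np


def color_hist(region, bins=8):
    """Normalised joint colour histogram of an H x W x 3 uint8 region."""
    pixels = region.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / max(hist.sum(), 1)


def quadrant_hists(img, bins=8):
    """Histograms of the top-left, top-right, bottom-left, bottom-right quadrants."""
    h, w = img.shape[:2]
    quads = [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
             img[h // 2:, :w // 2], img[h // 2:, w // 2:]]
    return [color_hist(q, bins) for q in quads]


def quadrant_similarity(a, b, bins=8):
    """Mean histogram intersection over the four quadrants (1.0 = identical)."""
    pairs = zip(quadrant_hists(a, bins), quadrant_hists(b, bins))
    return float(np.mean([np.minimum(ha, hb).sum() for ha, hb in pairs]))


def two_band_flag(top, bottom, size=(40, 60)):
    """Hypothetical demo flag: two horizontal bands of the given RGB colours."""
    flag = np.zeros(size + (3,), np.uint8)
    flag[:size[0] // 2] = top
    flag[size[0] // 2:] = bottom
    return flag


red, white = (255, 0, 0), (255, 255, 255)
red_over_white = two_band_flag(red, white)
white_over_red = two_band_flag(white, red)
query = two_band_flag(red, white, size=(80, 120))  # same flag at another scale

# Whole-image histograms cannot tell the two flags apart...
global_same = np.minimum(color_hist(query), color_hist(red_over_white)).sum()
global_other = np.minimum(color_hist(query), color_hist(white_over_red)).sum()
# ...but the quadrant histograms can.
quad_same = quadrant_similarity(query, red_over_white)
quad_other = quadrant_similarity(query, white_over_red)
print(global_same, global_other, quad_same, quad_other)
```

On camera images a looser colour space (for example hue and saturation only) and a tolerant threshold would likely be needed, as the answer notes about illumination.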
For the future I recommend studying methods in more detail to understand their applicability rather than trying them randomly.

Related

Bokeh-like Blur with Mask as Intensity of Blur Radius

I know how to apply a Gaussian blur with Pillow, but I can't work out how to vary the blur radius per pixel according to the intensity of a mask.
I am using the MiDaS package to produce depth maps from 2D images. What I want to do is blur the original image according to the depth mask, as a pseudo depth of field.
Here is a visual demonstration of the result I'm after with CV2 or Pillow (I don't understand which one can do what I'm after).
Note: I'm sorry if this is considered junk; I've sat on this question for a month. I tried scouring the net for something like this, and all I found was Poor Man's Portrait Mode, which I could not get to work, and which would also be reproducing depth maps when I already have them from my script and use them for the 3D image creation.
Edit:
I did come up with this, using composite. Not sure why I didn't take note of it before. Though I have to say, the results aren't too great. I think I really do need to emulate some sort of shape blur like bokeh.
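from PIL import Image, ImageFilter
sharpen = 3
boxBlur = 5
oimg = Image.open('2.png').convert('RGB')
width, height = oimg.size
# Depth map, resized to the image size and used as the composite mask
mimg = Image.open('2_depth.png').resize((width, height)).convert('L')
bimg = oimg.filter(ImageFilter.BoxBlur(int(boxBlur)))
bimg = bimg.filter(ImageFilter.BLUR)
for i in range(sharpen):
    bimg = bimg.filter(ImageFilter.SHARPEN)
# White mask pixels keep the sharp original, black pixels take the blurred copy
rimg = Image.composite(oimg, bimg, mimg)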
Basically, get your image and mask, and ensure the mask matches the image (I had an issue where the images didn't match even though they were the same size; they had just been saved differently).
Blur your image into a new variable, however you like (Gaussian, etc.). Gaussian was too soft for me. Add whatever extra filtering you want.
Composite the results together, using the depth map as the mask for the composite.
Note: If someone knows how to achieve a different sort of blur that mimics bokeh, I'd like to know, and I have adjusted the question title. I read about a discBlur but couldn't find anything for PIL/CV2.

I've only got a brute-force solution that iterates over pixels: Variable blur intensity.
My code works, but not as efficiently as I would like.
You can try it: open your image as the input and put your depth map in the variable blur_map.

OpenCV: What can cause a mostly black stereovision disparity map?

I have been dipping my toes into OpenCV and the stereovision functions it contains, and am struggling to get good results while following instructions in both the OpenCV documentation and many articles online. Specifically, I believe that at this point I have managed to obtain a decent calibration of my cameras, a decent stereo calibration, and even a decent rectification, but when moving to create the disparity map I seem to get nonsense back.
I am using a set of self-acquired images taken with a Pentax K-3 II camera using a Loreo Lens-in-a-cap CCD splitter, which gives me "two" images taken on one CCD. I can then split the image in half (and trim some of the pixels near the overlap) to have a reliable baseline distance in world coordinates with the camera. I unfortunately have no information on the true focal length of this configuration, but I would guess it is around 9cm.
I have performed camera calibration on each split-image set to get camera matrices, distortion coefficients, and object and image points for use in epipolar geometry. Then, following the procedure laid out in [1,2], I perform stereo calibration and rectification. I do not have the required reputation to embed images, so please click here. By my understanding, the fact that similar features in both images are at similar distances from the true horizontal lines I have drawn across them means that this is a good rectification result and should be usable.
However, when I implement the following code to create the disparity map:
# Settings for cv.StereoSGBM_create
minDisparity = 1
numDisparities = 64
blockSize = 1
disp12MaxDiff = 1
uniquenessRatio = 10
speckleWindowSize = 0
speckleRange = 8
stereo = cv.StereoSGBM_create(minDisparity=minDisparity, numDisparities=numDisparities, blockSize=blockSize, disp12MaxDiff=disp12MaxDiff, uniquenessRatio=uniquenessRatio,
speckleWindowSize=speckleWindowSize, speckleRange=speckleRange)
# Calculate the disparity map
disp = stereo.compute(imgL, imgR).astype(np.float32)
# Normalize the values to spread them across the viewable range
disp = cv.normalize(disp,0,255,cv.NORM_MINMAX)
# Resize for display
disp = cv.resize(disp, (1000,1000))
cv.imshow("disparity",disp)
cv.waitKey(0)
The result is disheartening. Intuitively, seeing a lot of black space surrounding edges which actually are fairly well-defined (such as in the chessboard pattern or near my hands) would suggest that there is very little disparity. However it seems clear to me that the images are quite different in terms of translation, so I am a bit confused. I have been delving through the documentation and run out of ideas. I tried reusing the code that produced the initial set of epipolar lines provided here which seemed to work on the original image quite nicely. However, it produces epipolar lines which are certainly not horizontal. This tells me that something is wrong, but I do not understand what could be, especially given the "visual test" I described above. I suspect I am misapplying that section of the code.
One thought I have is that I need to use an ROI to select the valid parts of the image, but I am unsure how to go about this. I think this is supported by the odd streaking behavior at the right edge of the left image post-rectification.
This is a link to a pastebin of all of my code, aside from the initial camera calibration which has significant runtime due to the size of the images.
I would appreciate any help that can be offered as at this point I am going a bit codeblind. I am limited to only 8 links due to my reputation, so please let me know if I can provide better images or documentation of my work.
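One thing that may be worth checking in the display step of the snippet above: `cv.normalize` takes its arguments in the order `(src, dst, alpha, beta, norm_type, ...)`, so `cv.normalize(disp, 0, 255, cv.NORM_MINMAX)` appears to pass `255` as `alpha` and `cv.NORM_MINMAX` as `beta` while leaving `norm_type` at its default, which would not give a min-max stretch. In addition, `StereoSGBM.compute` returns fixed-point disparities scaled by 16, and `cv.imshow` treats float images as lying in [0, 1]. The NumPy sketch below shows the intended conversion under those assumptions; `disparity_to_uint8` is a hypothetical helper, and the idea that unmatched pixels sit below `minDisparity` should be verified against the OpenCV documentation for your version.

```python
import numpy as np


def disparity_to_uint8(raw_disp, min_disparity=1):
    """Convert raw StereoSGBM output (int16, disparity * 16) to an 8-bit image."""
    disp = raw_disp.astype(np.float32) / 16.0
    valid = disp >= min_disparity  # assumed: unmatched pixels fall below minDisparity
    out = np.zeros(disp.shape, np.uint8)
    if valid.any():
        lo, hi = disp[valid].min(), disp[valid].max()
        scale = 255.0 / max(hi - lo, 1e-6)
        out[valid] = np.clip((disp[valid] - lo) * scale, 0, 255).astype(np.uint8)
    return out


# Tiny synthetic example: disparities of 10, 20 and 15 pixels, the rest unmatched
raw = np.zeros((4, 4), np.int16)
raw[0, 0], raw[0, 1], raw[0, 2] = 10 * 16, 20 * 16, 15 * 16
view = disparity_to_uint8(raw)
print(view)
```

An equivalent OpenCV call for the min-max stretch would be along the lines of `cv.normalize(disp, None, 0, 255, cv.NORM_MINMAX, dtype=cv.CV_8U)`.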

shape detection

I have tried 3 algorithms:
Comparison with compare_ssim.
Difference detection with PIL (ImageChops.difference).
Image subtraction.
The first algorithm:
(score, diff) = compare_ssim(img1, img2, full=True)
diff = (diff * 255).astype("uint8")
The second algorithm:
from PIL import Image ,ImageChops
img1=Image.open("canny1.jpg")
img2=Image.open("canny2.jpg")
diff=ImageChops.difference(img1,img2)
if diff.getbbox():
    diff.show()
The third algorithm:
image3= cv2.subtract(image1,image2)
The problem is that these algorithms are too sensitive. If the images have different noise, they treat the two images as totally different. Any ideas on how to fix that?

These pictures are different in many ways (deformation, lighting, colors, shape) and simple image processing just cannot handle all of this.
I would recommend a higher level method that tries to extract the geometry and color of those tubes, in the form of a simple geometric graph. Then compare the graphs rather than the images.
I acknowledge that this is easier said than done, and will only work with this particular kind of scene.

It is very difficult to help since we don't really know which parameters you can change. For example, can you keep your camera fixed? Will it always be just about tubes? What about the tube colors?
Nevertheless, I think what you are looking for is a framework for image registration, and I suggest you use SimpleElastix. It is mainly used for medical images, so you might have to get familiar with the SimpleITK library. What's interesting is that you have a lot of parameters to control the registration. I think you will have to look into the documentation to find out how to control a specific image frequency, the one that creates the waves and deforms the images. In the code below I did not configure it to allow enough local distortion; you'll have to find the best trade-off, but I think it should be flexible enough.
Anyway, you can get the following result with this code. I don't know if it helps, but I hope so:
import cv2
import numpy as np
import SimpleITK as sitk  # requires a SimpleITK build that includes Elastix (SimpleElastix)
# Load both images as 32-bit float
fixedImage = sitk.ReadImage('1.jpg', sitk.sitkFloat32)
movingImage = sitk.ReadImage('2.jpg', sitk.sitkFloat32)
elastixImageFilter = sitk.ElastixImageFilter()
# Stage 1: global affine registration
affine_registration_parameters = sitk.GetDefaultParameterMap('affine')
affine_registration_parameters["NumberOfResolutions"] = ['6']
affine_registration_parameters["WriteResultImage"] = ['false']
affine_registration_parameters["MaximumNumberOfSamplingAttempts"] = ['4']
parameterMapVector = sitk.VectorOfParameterMap()
parameterMapVector.append(affine_registration_parameters)
# Stage 2: local (non-rigid) B-spline registration with default parameters
parameterMapVector.append(sitk.GetDefaultParameterMap("bspline"))
elastixImageFilter.SetFixedImage(fixedImage)
elastixImageFilter.SetMovingImage(movingImage)
elastixImageFilter.SetParameterMap(parameterMapVector)
elastixImageFilter.Execute()
registeredImage = elastixImageFilter.GetResultImage()
transformParameterMap = elastixImageFilter.GetTransformParameterMap()
# Absolute difference between the registered image and the fixed image
resultImage = sitk.Subtract(registeredImage, fixedImage)
resultImageNp = np.abs(sitk.GetArrayFromImage(resultImage))
cv2.imwrite('gray_1.png', sitk.GetArrayFromImage(fixedImage))
cv2.imwrite('gray_2.png', sitk.GetArrayFromImage(movingImage))
cv2.imwrite('gray_2r.png', sitk.GetArrayFromImage(registeredImage))
cv2.imwrite('gray_diff.png', resultImageNp)
Your first image resized to 256x256:
Your second image:
Your second image registered with the first one:
Here is the difference between the first and second image which could show what's different:

This is one of the classic problems of image processing, and one without a universal answer. The possible answers depend heavily on what type of images you have, what information you want to extract from them, and which differences between them matter.
You can reduce noise in two ways:
a) Take several images of the same object, such that the object does not change, and stack (average) them. Noise is reduced by a factor of the square root of the number of images.
b) Run a blur filter over the image. The more you blur, the more noise is averaged out. Here noise is reduced by a factor of the square root of the number of pixels you average over, but so is detail in the image.
In both cases (a) and (b), run the difference analysis after applying the method.
Probably not applicable to you, as you likely cannot get hold of either: it helps if you have flatfields, which capture the inhomogeneity of the illumination and the pixel sensitivity of your camera and allow you to correct the images before any processing. The same goes for darkfields, which give an estimate of the camera's read-out noise and allow you to correct the images for it.
There is also a third, higher-level option: first run your object analysis at a sufficiently detailed level, then compare the results.
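Options (a) and (b) can be checked numerically with a small NumPy experiment. This is an illustrative sketch on synthetic data: the scene, the noise level `sigma`, the threshold, and the helper names `shot` and `box_blur` are all assumptions chosen for the demo, and the box blur stands in for whatever blur filter is actually used.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
sigma = 10.0
scene = np.zeros((128, 128))
scene[32:96, 32:96] = 100.0


def shot():
    """One noisy exposure of the unchanged scene."""
    return scene + rng.normal(0.0, sigma, scene.shape)


def box_blur(img, k):
    """Mean over every k x k window (output shrinks by k - 1 in each direction)."""
    return sliding_window_view(img, (k, k)).mean(axis=(-2, -1))


# (a) stacking 16 exposures: noise should drop by about sqrt(16) = 4
stacked = np.mean([shot() for _ in range(16)], axis=0)
noise_single = (shot() - scene).std()
noise_stacked = (stacked - scene).std()

# (b) averaging over 4 x 4 = 16 pixels: noise should also drop by about 4
noise_blurred = box_blur(shot() - scene, 4).std()

# Difference analysis on two exposures of the same scene, raw and after blurring
a, b = shot(), shot()
threshold = 15.0
changed_raw = np.mean(np.abs(a - b) > threshold)
changed_blurred = np.mean(np.abs(box_blur(a, 4) - box_blur(b, 4)) > threshold)
print(noise_single, noise_stacked, noise_blurred, changed_raw, changed_blurred)
```

In this setup the raw difference flags a large share of pixels as "changed" although nothing changed, while the blurred difference flags almost none, at the cost of losing detail finer than the blur window.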

Eliminate unwanted keypoints

I would like to eliminate the keypoints detected around the frame of an image (an artwork in a museum gallery). In other words, I want to separate the actual artwork from its frame. The artworks have different types of frames.
(Image: keypoints detected using SIFT)
I have already written a Python wrapper for David Lowe's SIFT implementation to detect keypoints as well as to compute descriptors.
However, my question is: what is the best approach to solve this problem? One of the following, or something else?
Using the Hough transform (with the Python Imaging Library)
Template matching
Your help is highly appreciated
Thanks again

I'd go with Hough transform and try to detect lines which form a quadrilateral.
You might get into trouble if the painting actually does contain a square or something. I'd look for some assumptions like: acceptable aspect ratio, acceptable size. Also find the outermost quadrilateral, and work your way towards the center of the image picking up inner quadrilaterals, if applicable. This would give you the frame and its thickness, so you can disregard any keypoints here or beyond the frame.
P.S. If you got some random replies from me, it's because I accidentally replied to another post in your thread... ^^

For each artwork, do you have a clean, properly framed reference image?
If so, another solution to remove the background clutter is:
to use the ratio test algorithm to compute keypoint correspondences between your frame and the reference image,
to perform a geometric consistency check to filter out false matches.
In addition, the geometric check will provide you with the homography matrix, which you can use to warp your input frame or, alternatively, to project the corners of the reference image.
That way you will directly obtain the artwork area within your frame.
Here's an example of how you can do that with opensift's match tool; below is an illustration.

What algorithm can be used to determine the presence of multiple stripes?

Using Python, what would be the best algorithm or strategy to detect the presence of colored bands like those in the image?
The image is scanned and cropped. The problem is that the crop is not precise, so I cannot rely on a check based on fixed Cartesian coordinates to determine whether the lines are present.
The strips may or may not be present.

You have a number of options at your disposal:
If the strips are going to be the same size, and their orientation is known, then you can use cross-correlation (with working Python source). Your template image could be a single stripe, or a multiple-strip pattern if you know the number of strips and their spacing.
More generally, you could go with morphological image processing and look for rectangles. You'd first have to threshold your image (using Otsu's method or some empirically determined threshold) and then perform contour detection. Here's an example that does something similar, but for ellipses; it's trivial to modify it to look for rectangles. This time the source is in C, but it uses OpenCV like the first example, so it should be trivial to port.
There are other approaches, such as edge detection and Fourier analysis, but I really think that the first two are going to be more than enough for you.
