I have been dipping my toes into OpenCV and the stereovision functions it contains, and am struggling to get good results while following instructions in both the OpenCV documentation and many articles online. Specifically, I believe that at this point I have managed to obtain a decent calibration of my cameras, a decent stereo calibration, and even a decent rectification, but when moving to create the disparity map I seem to get nonsense back.
I am using a set of self-acquired images taken with a Pentax K-3 II camera and a Loreo Lens-in-a-cap CCD splitter, which gives me "two" images taken on one CCD. I can then split the image in half (and trim some of the pixels near the overlap), so I have a reliable baseline distance between the two views in world coordinates. Unfortunately I have no information on the true focal length of this configuration, but I would guess it is around 9cm.
I have performed camera calibration on each split-image set to get camera matrices, distortion coefficients, and object and image points for use in epipolar geometry. Then, following the procedure laid out in [1,2], I perform stereo calibration and rectification. I do not have the required reputation to embed images, so please click here. By my understanding, the fact that matching features in both images sit at similar distances from the true horizontal lines I have drawn across them means that this is a good rectification result and should be usable.
However, when I implement the following code to create the disparity map:
import cv2 as cv
import numpy as np

# Settings for cv.StereoSGBM_create
minDisparity = 1
numDisparities = 64
blockSize = 1
disp12MaxDiff = 1
uniquenessRatio = 10
speckleWindowSize = 0
speckleRange = 8
stereo = cv.StereoSGBM_create(minDisparity=minDisparity, numDisparities=numDisparities,
                              blockSize=blockSize, disp12MaxDiff=disp12MaxDiff,
                              uniquenessRatio=uniquenessRatio,
                              speckleWindowSize=speckleWindowSize, speckleRange=speckleRange)
# Calculate the disparity map (StereoSGBM returns fixed-point disparities scaled by 16)
disp = stereo.compute(imgL, imgR).astype(np.float32) / 16.0
# Normalize the values to spread them across the viewable range and convert to 8-bit for display
disp = cv.normalize(disp, None, 0, 255, cv.NORM_MINMAX).astype(np.uint8)
# Resize for display
disp = cv.resize(disp, (1000, 1000))
cv.imshow("disparity", disp)
cv.waitKey(0)
The result is disheartening. Intuitively, seeing a lot of black space surrounding edges which actually are fairly well defined (such as in the chessboard pattern or near my hands) would suggest that there is very little disparity. However, it seems clear to me that the images are quite different in terms of translation, so I am a bit confused. I have been delving through the documentation and have run out of ideas. I tried reusing the code that produced the initial set of epipolar lines (provided here), which seemed to work on the original images quite nicely. However, it produces epipolar lines which are certainly not horizontal. This tells me that something is wrong, but I do not understand what it could be, especially given the "visual test" I described above. I suspect I am misapplying that section of the code.
One thought I have is that I need to use an ROI to select the valid parts of the image, but I am unsure how to go about this. I think this is supported by the odd streaking behavior at the right edge of the left image post-rectification.
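For reference, this is roughly what I imagine the ROI approach would look like (a sketch only, assuming the outputs of my earlier calibration and stereo calibration are available as mtxL, distL, mtxR, distR, R, T and image_size, and that imgL/imgR are the rectified images):

import cv2 as cv

# stereoRectify also returns the valid-pixel ROIs for the two rectified views
R1, R2, P1, P2, Q, roiL, roiR = cv.stereoRectify(
    mtxL, distL, mtxR, distR, image_size, R, T, alpha=0)

# Crop both rectified images to the intersection of the two valid ROIs so the
# matcher never sees the streaked/black border regions
x = max(roiL[0], roiR[0])
y = max(roiL[1], roiR[1])
w = min(roiL[0] + roiL[2], roiR[0] + roiR[2]) - x
h = min(roiL[1] + roiL[3], roiR[1] + roiR[3]) - y
imgL_valid = imgL[y:y + h, x:x + w]
imgR_valid = imgR[y:y + h, x:x + w]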
This is a link to a pastebin of all of my code, aside from the initial camera calibration which has significant runtime due to the size of the images.
I would appreciate any help that can be offered, as at this point I am going a bit code-blind. I am limited to only 8 links due to my reputation, so please let me know if I can provide better images or documentation of my work.
Related
I am trying to perform image registration on potentially hundreds of aerial images taken from a camera mounted on a UAV. I think it is safe to assume that I know the ordering of the images, and hopefully, sequential images will overlap.
I have read some papers suggesting that using a CNN to find the homography matrix can vastly outperform the old-school feature-descriptor matching with RANSAC song and dance. My issue is that I don't quite understand how to stitch more than 2 images together. It seems to me that to register image 100 in the same coordinate frame as image 1 using the cv2.warpPerspective function, I would warp I100 with the composed transform H1*H2*H3*...*H99. Even if the error in each transform is small, after 100 compositions it seems like it would become huge. My understanding is that the solution to this problem is bundle adjustment.
I have looked into bundle adjustment a little bit, but I'm struggling to see how exactly I can use it. I have read the paper that many related Stack Overflow posts suggest, "Automatic Panoramic Image Stitching using Invariant Features". In the section on bundle adjustment, if I understand correctly, the authors suggest that after building the initial panorama it is likely that image A will eventually overlap with multiple other images. Using the matched feature points in any images that overlap with A, they basically calculate some adjustment...? I think to image A?
My question is: using OpenCV, how do I apply this adjustment? Let's say I have 3 images I1, I2, I3, all overlapping, as a minimal example.
# assuming the CNN model predicts the transform
# I think the first step is to find the homography between all images
H12 = cnnMod.predict(I1, I2)
H13 = cnnMod.predict(I1, I3)
H23 = cnnMod.predict(I2, I3)
outI2 = cv2.warpPerspective(I2, H12, (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
outI3 = cv2.warpPerspective(I3, H13, (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
# now would I do some bundle voodoo?
# what would it look like?
# which of the bundler classes should I use?
# would it look like this?
# or maybe the input is features?
voodoo = cv2.bundleVoodoo([H12, H13, H23])
globallyRectifiedI2 = cv2.warpPerspective(outI2, voodoo[2], (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
The code is my best guess at what a solution might look like but clearly I have no idea what I am doing. I've not been able to find anything that actually shows how the bundle adjustment is done.
The basic idea underlying image alignment through bundle adjustment is that, rather than matching pairs of 2D points (x, x') across pairs of images, you posit the existence of 3D points X that, ideally, project onto tuples of 2D points (x, x', x'', ...) matched across the corresponding tuples of images. You then solve for the locations of the X's and the camera parameters (extrinsics, and intrinsics if the camera is uncalibrated) that minimize the (usually robustified) RMS reprojection error over all 2D points and images.
Depending on your particular setup and scene, you may make some simplifying assumptions, e.g.:
That the X's all belong to the same plane (which you can arbitrarily choose as the world's Z=0 plane). This is useful, for example, when stitching images of a painting, or aerial images of relatively flat ground over a small enough extent that one can ignore the earth's curvature.
Or that the X's are all on the WGS84 ellipsoid.
Both the above assumptions remove one free coordinate from X, effectively reducing the problem's dimensionality.
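To make that concrete, here is a minimal, self-contained sketch of the idea (it is not what OpenCV's stitching module does internally; the residual function, the synthetic scene, and all parameter values are illustrative assumptions): jointly refine a [rvec | tvec] block per camera and the 3D points X by minimizing a robustified reprojection error with scipy.optimize.least_squares, using cv2.projectPoints as the projection model.

import numpy as np
import cv2
from scipy.optimize import least_squares

def reproj_residuals(params, n_cams, n_pts, K, obs_cam, obs_pt, obs_uv):
    # params packs a [rvec | tvec] block per camera followed by the 3D points X
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    X = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for c, p, uv in zip(obs_cam, obs_pt, obs_uv):
        proj, _ = cv2.projectPoints(X[p].reshape(1, 1, 3), cams[c, :3], cams[c, 3:], K, None)
        res.append(proj.ravel() - uv)  # 2D reprojection error of this observation
    return np.concatenate(res)

# Tiny synthetic example: 3 cameras observing 20 points on a roughly flat scene.
# (In a real problem you would also fix one camera to remove the gauge freedom.)
rng = np.random.default_rng(0)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
X_true = np.column_stack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4.9, 5.1, 20)])
cams_true = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                      [0.0, 0.05, 0.0, -0.5, 0.0, 0.0],
                      [0.0, -0.05, 0.0, 0.5, 0.0, 0.0]])
obs_cam, obs_pt, obs_uv = [], [], []
for c in range(3):
    uv, _ = cv2.projectPoints(X_true, cams_true[c, :3], cams_true[c, 3:], K, None)
    for p in range(20):
        obs_cam.append(c)
        obs_pt.append(p)
        obs_uv.append(uv[p, 0] + rng.normal(0, 0.5, 2))  # noisy 2D measurements

# Start from perturbed guesses and refine cameras and points jointly.
x0 = np.concatenate([(cams_true + rng.normal(0, 0.01, cams_true.shape)).ravel(),
                     (X_true + rng.normal(0, 0.05, X_true.shape)).ravel()])
sol = least_squares(reproj_residuals, x0, loss="huber",
                    args=(3, 20, K, obs_cam, obs_pt, obs_uv))
print("RMS reprojection error (pixels):", np.sqrt(np.mean(sol.fun ** 2)))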
I have a photo taken with a camera whose focal length, principal point, and distortion coefficients I know. The photo shows an 8cm x 8cm post-it on a table, and the center of the post-it is the origin (0, 0), again in cm. I've also indicated the positive y-axis on the post-it.
From this information is it possible to compute the location of the camera and the vector in which the camera is looking in Python using OpenCV? If someone has a snippet of code that does that (assuming you know the coordinates of the post-it corners already) that would be amazing!
Use OpenCV's solvePnP, specifying SOLVEPNP_IPPE_SQUARE in the flags. With only 4 points (and a post-it), the solution will be quite sensitive to how accurately you mark their images, so ask yourself whether you really need the camera pose and location for your application, and how accurately. E.g., if you just want to make a flat CG "sticker" stay fixed on the table while the camera moves, all you need is to estimate a homography, a much simpler task.
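A rough sketch of what that could look like (the corner pixel coordinates, intrinsics, and distortion values below are placeholders you would replace with your own):

import numpy as np
import cv2

# 3D corners of the 8cm x 8cm post-it in its own frame (origin at the centre, Z = 0),
# in the order SOLVEPNP_IPPE_SQUARE expects: top-left, top-right, bottom-right, bottom-left.
obj_pts = np.array([[-4.0, 4.0, 0.0],
                    [4.0, 4.0, 0.0],
                    [4.0, -4.0, 0.0],
                    [-4.0, -4.0, 0.0]])

# Corresponding pixel coordinates of the corners in the photo (placeholder values).
img_pts = np.array([[605.0, 310.0], [830.0, 325.0], [820.0, 560.0], [590.0, 540.0]])

K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])  # your known intrinsics
dist = np.zeros(5)                                             # your known distortion

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
R, _ = cv2.Rodrigues(rvec)

# Camera position in post-it (world) coordinates, and the direction it is looking:
cam_pos = (-R.T @ tvec).ravel()             # in cm, same units as obj_pts
look_dir = R.T @ np.array([0.0, 0.0, 1.0])  # camera's optical axis in world coordinates
print("camera position (cm):", cam_pos)
print("viewing direction:", look_dir)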
It does look like you have all the information required. The marker you use can be easily segmented. Shape analysis will provide corners. I did something similar to get basic eyesight tracking:
Here is a complete example.
Segmentation result for the example:
Please note that accuracy really matters, so it might be useful to rely on several sets of points.
I have some SIFT features in two stereo images, and I'm trying to place them in 3D space. I've found triangulatePoints, which seems to be what I want, however, I'm having trouble with the arguments.
triangulatePoints takes 4 arguments, projMatr1 and projMatr2, which is where my issues start, and projPoints1 and projPoints2, which are my feature points. The OpenCV docs suggest using stereoRectify to find the projection matrices.
stereoRectify takes the intrinsic camera matrices (which I've calculated previously with calibrateCamera) and the image size from calibration, as well as two arguments, R (rotation matrix) and T (translation vector), which can be found with stereoCalibrate.
However, stereoCalibrate takes "object points", which I'm pretty sure I can't calculate for images without a reference, which is a bit of a roadblock.
Is this the best way to be calculating 3D positions from pairs of features? If so, how can I calculate projMatr1 and projMatr2 without stereoCalibrate?
As you say, you have no calibration, so let’s forget about rectification. What you want is the depth of the points, so you can project them into 3D (which then uses just the intrinsic calibration of one camera, mainly the focal length).
Since you have no rectification, you cannot expect exact results, so let’s try to get as close as possible:
Depth is focal length times baseline divided by disparity, with disparity and focal length in pixels, and depth and baseline in the same unit (I recommend meters).
For accurate disparity you need a rectified camera pair and correspondences between your features in both images. Since you have no calibration, you have no hope of rectification, so you could try to just use the original images instead. The more parallel the cameras are, the better this will work. If they are not parallel, you will introduce an error here and your results will become less accurate. If this gets too bad, you must find a way to calibrate your camera.
But most importantly, you need correspondences between your features in both images. Running SIFT independently in both images won't do. A better approach is running SIFT in just one image and then finding the corresponding pixels for each of those features in the other image. There are plenty of methods for that; I believe OpenCV has some simple block matching built in.
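As a tiny worked example of the depth formula above (the feature coordinates, focal length and baseline are made-up values):

import numpy as np

f_px = 1200.0        # focal length in pixels, from the intrinsic calibration
baseline_m = 0.09    # camera baseline in meters (measured/assumed)

# x-coordinates of matched features in the left and right image
xL = np.array([512.0, 640.0, 701.5])
xR = np.array([480.0, 611.0, 690.0])

disparity = xL - xR                      # in pixels; must be > 0 for points in front
depth_m = f_px * baseline_m / disparity  # depth = f * B / d
print(depth_m)                           # ~[3.375, 3.724, 9.391] meters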
I am trying to use OpenCV to measure the size of filament (the plastic material used for 3D printing).
What I am trying to do is measure the filament size (that plastic material used for 3D printing). The idea is that I use an LED panel to illuminate the filament, then take an image with a camera, preprocess the image, apply edge detection, and calculate its size. Most filaments are made of a single colour, which is easy to preprocess and gives fine results.
The problem comes with transparent filament, where I am not able to get useful results. I would like to ask for a little help, or for someone to point me in the right direction. I have already tried cropping the image to a height a bit larger than the filament and a width of just a few pixels, then calculating the size from the number of pixels in those crops, but this did not work very well. So now I am here, trying to do it with edge detection.
works well for filaments of single colour
not working for transparent filament
The code below works just fine for common filaments; the problem is when I try to use it for transparent filament. I have tried adjusting the thresholds for the Canny function and I have tried different colour spaces, but I am not able to get good results.
Images that may help to understand:
https://imgur.com/gallery/CIv7fxY
import cv2 as cv

image = cv.imread("../images/img_fil_2.PNG")  # load image
gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)  # convert image to grayscale
edges = cv.Canny(gray, 100, 200)  # detect edges in the image
You can use the assumption that the images are taken under the same conditions.
Your main problem is that the reflections in the transparent filament are detected as edges. But, since the image is relatively simple, without any other edges, you can simply take the upper and the lower edge, and measure the distance between them.
A simple way of doing this is to take 2 vertical lines (e.g. image sides), find the edges that intersect the line (basically traverse a column in the image and find edge pixels), and connect the highest and the lowest points to form the edges of the filament. This also removes the curvature in the filament, which I assume is not needed for your application.
You might want to use 3 or 4 vertical lines, for robustness.
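A rough sketch of that column-scanning idea, reusing the Canny output from the question's code (the sampled column positions and the pixel-to-millimetre factor are assumptions):

import cv2 as cv
import numpy as np

image = cv.imread("../images/img_fil_2.PNG")
gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
edges = cv.Canny(gray, 100, 200)

widths = []
# sample a few columns across the image (positions are arbitrary choices)
for x in (edges.shape[1] // 4, edges.shape[1] // 2, 3 * edges.shape[1] // 4):
    ys = np.flatnonzero(edges[:, x])        # rows where an edge crosses this column
    if ys.size >= 2:
        widths.append(ys.max() - ys.min())  # outermost edges = upper and lower border

if widths:
    width_px = np.median(widths)            # median over the columns, for robustness
    mm_per_px = 0.02                        # assumed calibration factor (mm per pixel)
    print("filament diameter ~ %.3f mm" % (width_px * mm_per_px))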
I've been working on a project for recognizing a flag shown to the camera using OpenCV in Python.
I've already tried using SURF, colour histogram matching, and template matching, but none of these three always returns the correct answer. What I want to know now is: what would be the best solution to this problem of mine?
Example of the template images:
Here is an example of a flag shown to the camera.
What should I use if these are the kinds of images I want to recognize?
Update: code using matchTemplate
flags=["Cambodia.jpg","Laos.jpg","Malaysia.jpg","Myanmar.jpg","Philippines.jpg","Singapore.jpg","Thailand.jpg","Vietnam.jpg","Indonesia.jpg","Brunei.jpg"]
while True:
methods = 'cv2.TM_CCOEFF_NORMED'
list_of_pics=[]
for flag in flags:
template= cv2.imread(flag,0)
img = cv2.imread('philippines2.jpg',0)
# generate Gaussian pyramid for A
G = template.copy()
gpA = [G]
for i in xrange(6):
G = cv2.pyrDown(G)
gpA.append(G)
n=0
for x in gpA:
w, h = x.shape[::-1]
method = eval(methods)#
# Apply template Match
res = cv2.matchTemplate(img,x,method)
matchVal=res[0][0]
picDict={"matchVal":matchVal,"name":flag}
list_of_pics.append(picDict)
n=n+1
newlist = sorted(list_of_pics, key=operator.itemgetter('matchVal'),reverse=True)
#print newlist
matched_image=newlist[0]['name']
print matched_image
k=cv2.waitKey(10)
if (k==27):
break
cv2.destroyAllWindows()
I don't think that you can get good results from SURF/SIFT because:
SURF/SIFT need keypoints to detect the object, but in your case you have to detect flags, and most flags are largely uniform and do not provide many keypoints.
In your webcam frame you have several things besides the flag, and those other things also contribute keypoints.
Solution: I still think that you should use matchTemplate() of OpenCV, which you have already tried, but the problem in your version is that you didn't consider that matchTemplate() is not scale- or orientation-invariant. So the solution is to use a Gaussian pyramid and create different sizes (half, one fourth, double, etc.) of your sample flags. After getting the same flag in 2-5 different sizes, you should perform matchTemplate() between every size of the flag and the webcam frame.
Strategy:
Receive the webcam frame
Load the image of a flag.
Using Gaussian pyramid, create smaller and bigger images of that flag (you don't need to store them.)
Perform matchTemplate() between the webcam frame and each size of flag.
Result: whichever flag image gives the maximum correlation value is the flag present in your webcam frame.
REMEMBER: matchTemplate is not scale- or orientation-invariant, so if you rotate the flag or make it much larger/smaller in the webcam frame, you won't get good results.
SURF cannot be applied to images that have no corners (where the gradient mostly goes in one direction, as in a striped flag). A colour histogram of the whole object may not work either, since both of your examples have similar colours. However, if you apply histograms to different parts of the image it will work better.
What you need to do is split your training image into, say, 4 quadrants and create 4 colour histograms. The testing stage then integrates these 4 back-projected histograms and checks for the right spatial order of responses. Colour histograms are quite robust to rotation, scaling, and perspective. They do change with illumination, so you need liberal matching thresholds; the spatial resolution from the 4 quadrants helps to ameliorate this.
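A minimal sketch of the quadrant idea (using compareHist on per-quadrant HSV histograms rather than full back-projection, for brevity; the file names and bin count are placeholders):

import cv2
import numpy as np

def quadrant_histograms(img, bins=16):
    # Split the image into 4 quadrants and return a normalized hue/saturation
    # histogram for each, in a fixed spatial order (TL, TR, BL, BR).
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]
    quads = [hsv[:h // 2, :w // 2], hsv[:h // 2, w // 2:],
             hsv[h // 2:, :w // 2], hsv[h // 2:, w // 2:]]
    hists = []
    for q in quads:
        hist = cv2.calcHist([q], [0, 1], None, [bins, bins], [0, 180, 0, 256])
        cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)
        hists.append(hist)
    return hists

def flag_similarity(template, frame_region):
    # Compare quadrant by quadrant so the spatial order of colours must match too.
    scores = [cv2.compareHist(ht, hf, cv2.HISTCMP_CORREL)
              for ht, hf in zip(quadrant_histograms(template),
                                quadrant_histograms(frame_region))]
    return float(np.mean(scores))

template = cv2.imread("Philippines.jpg")
candidate = cv2.imread("philippines2.jpg")  # the flag region cut from the webcam frame
print("similarity:", flag_similarity(template, candidate))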
For the future I recommend studying methods in more detail to understand their applicability rather than trying them randomly.