StereoCalibration in OpenCV: Shouldn't this work without ObjectPoints? - python

I have two questions relating to stereo calibration with opencv. I have many pairs of calibration images like these:
Across the set of calibration images the distance of the chessboard away from the camera varies, and it is also rotated in some shots.
From within this scene I would like to map pairs of image coordinates (x,y) and (x',y') onto object coordinates in a global frame: (X,Y,Z).
In order to calibrate the system I have detected pairs of image coordinates of all chessboard corners using cv2.findChessboardCorners(). From reading Hartley's Multiple View Geometry in Computer Vision I gather I should be able to calibrate this system up to a scale factor without actually specifying the object points of the chessboard corners. First question: Is this correct?
Investigating cv2's capabilities, the closest thing I've found is cv2.stereoCalibrate(objectpoints,imagepoints1,imagepoints2).
I have obtained imagepoints1 and imagepoints2 from cv2.findChessboardCorners. From the images shown I could approximately extract (X,Y,Z) relative to a frame fixed on the calibration board (since its geometry is known by design), which would allow me to apply cv2.stereoCalibrate(). However, I think this will introduce error, and it prevents me from using all of the rotated photos of the calibration board which I have. Second question: Can I calibrate without object points using OpenCV?
Thanks!

No. You must specify the object points. Note that they need not change across the image sequence, since you can interpret the change as due to camera motion relative to the target. Also, you can (and should) assume that Z=0 for a planar target like yours. You may specify X and Y up to a scale factor; the translations you obtain from calibration will then also be known only up to that scale.
Clarification: by "need not change across the image sequence" I mean that you can assume the target is fixed in the world frame, and interpret the relative motion as due to the camera only. The world frame itself, absent a better prior, can be defined by the pose of the target in any one of the images (say, the first one). Obviously, I do not mean that the pose of the target relative to the camera does not change - in fact, it must change in order to obtain a calibration. If you do have a better prior, you should use it. For example, if the target moves on a turntable, you should solve directly for the parameters of the cylindrical motion, since there are fewer of them (one constant axis, one constant radius, plus one angle per image, rather than 6 parameters per image).
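For illustration, here is a minimal sketch of such Z=0 object points and a common two-stage calibration. The names pattern_size, square_size, image_size and the corner lists imagepoints1/imagepoints2 are placeholders for your own values; this is one reasonable workflow, not the only one.

import numpy as np
import cv2

# Hypothetical board: 9x6 inner corners, square side 1.0 (arbitrary scale unit)
pattern_size = (9, 6)
square_size = 1.0

# Planar target: Z = 0 for every corner, X/Y on a regular grid (known only up to scale)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

# One copy of the same object-point array per image pair
object_points = [objp] * len(imagepoints1)

# Calibrate each camera individually first (a common workflow) ...
ret1, K1, d1, _, _ = cv2.calibrateCamera(object_points, imagepoints1, image_size, None, None)
ret2, K2, d2, _, _ = cv2.calibrateCamera(object_points, imagepoints2, image_size, None, None)

# ... then solve for the relative pose (R, T) with the intrinsics held fixed
ret, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    object_points, imagepoints1, imagepoints2,
    K1, d1, K2, d2, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)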

Related

How to use cv2.triangulatePoints with a single moving camera

I have a single camera that I can move around. I have the intrinsic parameter matrix and the extrinsic parameter matrix for each camera orientation. For object detection, I use YOLO and I get 2D bounding boxes in the image plane. My plan is to use a temporal pair of images, with the detected object in it, to triangulate the midpoint of the resulting 2D bounding box around the object.
Right now, I use two images that are 5 frames apart. That means, the first frame has the object in it and the second frame has the same object in it after a few milliseconds. I use cv2.triangulatePoints to get the corresponding 3D point for the 2D midpoint of the bounding box.
My main problem is that when the camera is more or less steady, the resulting distance value is accurate (within a few centimeters). However, when I move the camera around, the resulting distance value for the object starts varying quite a bit (the object is static and never moves, only the camera looking at it moves). I can't seem to understand why this is the case.
For cv2.triangulatePoints, I get the relative rotation matrix between the two temporal camera orientations (R = R2 · R1^T) and then the relative translation (t = t2 - R · t1). P1 and P2 are the final projection matrices (P1 for the camera at an earlier position and P2 for the camera at a later position): P1 = K[I|0] and P2 = K[R|t], where K is the 3x3 intrinsic parameter matrix, I is a 3x3 identity matrix, and 0 is a 3x1 vector of zeros.
Should I use a temporal gap of 10 frames or is using this method to localize objects using a single camera never accurate?
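For reference, a minimal sketch of the setup described above; K, R, t and the two box centres pt1/pt2 are placeholders for the values you already compute:

import numpy as np
import cv2

def triangulate_midpoint(K, R, t, pt1, pt2):
    # R, t: relative pose of the second camera w.r.t. the first, as described above
    # pt1, pt2: pixel coordinates of the bounding-box centre in each frame
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # K [I | 0]
    P2 = K @ np.hstack([R, t.reshape(3, 1)])            # K [R | t]
    pts4d = cv2.triangulatePoints(P1, P2,
                                  np.float64(pt1).reshape(2, 1),
                                  np.float64(pt2).reshape(2, 1))
    return (pts4d[:3] / pts4d[3]).ravel()               # dehomogenise to (X, Y, Z)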
The centers of the bounding boxes are not guaranteed to be projections of a single scene (3D) point, even with a perfect track, unless additional constraints are added; for example, that the tracked object is planar, or that the vertices of the bounding boxes track points that lie on a plane. Things get more complicated when tracking errors are present.
If you really need to triangulate the box centers (do you? are you sure you can't achieve your goals using only well-matched projections?), you could use a small area around the center in one box as a pattern, and track it using a point tracker (e.g. one based on the Lucas-Kanade algorithm, or one based on normalized cross-correlation) in the second image, using the second box to constrain the tracker search window.
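A minimal sketch of such a patch track with the pyramidal Lucas-Kanade tracker; img1/img2 and the first box centre (cx, cy) are placeholders, and the window size is an arbitrary choice:

import numpy as np
import cv2

# Track the point at the first box centre from img1 (grayscale) into img2 (grayscale)
p0 = np.array([[[cx, cy]]], dtype=np.float32)
p1, status, err = cv2.calcOpticalFlowPyrLK(
    img1, img2, p0, None,
    winSize=(21, 21),     # small area around the centre used as the pattern
    maxLevel=3)
if status[0, 0] == 1:
    cx2, cy2 = p1[0, 0]   # matched location in the second image
    # optionally reject the match if it falls outside the second bounding box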
Then you may need to address the accuracy of your camera motion estimation - if errors are significant your triangulations will go nowhere. Bundle adjustment may need to become your friend.

How to generate a 3D image based on ChArUco calibration of two 2D images

I'm currently extracting the calibration parameters of two images that were taken in a stereo vision setup via cv2.aruco.calibrateCameraCharucoExtended(). I'm using the cv2.undistortPoints() & cv2.triangulatePoints() functions to convert any two 2D points to a 3D point coordinate, which works perfectly fine.
I'm now looking for a way to convert the 2D images, which can be seen under approach 1, to one 3D image. I need this 3D image because I would like to determine the order of these cups from left to right, to correctly use the triangulatePoints function. If I determine the order of the cups from left to right purely based on the x-coordinates of each of the 2D images, I get different results for each camera (the cup on the front left corner of the table for example is in a different 'order' depending on the camera angle).
Approach 1: Keypoint Feature Matching
I was first thinking about using a keypoint feature extractor like SIFT or SURF, so I tried some keypoint extraction and matching. I tried using both the Brute-Force matcher and the FLANN-based matcher, but the results are not really good:
Brute-Force
FLANN-based
I also tried to swap the images, but it still gives more or less the same results.
Approach 2: ReprojectImageTo3D()
I looked further into the issue and I think I need the cv2.reprojectImageTo3D() [docs] function. However, to use this function, I first need the Q matrix, which has to be obtained with cv2.stereoRectify [docs]. This stereoRectify function in turn expects a couple of parameters that I'm able to provide, but there are two I'm confused about:
R – Rotation matrix between the coordinate systems of the first and the second cameras.
T – Translation vector between the coordinate systems of the cameras.
I do have the rotation and translation matrices for each camera separately, but not between them? Also, do I really need to do this stereoRectify all over again when I already did a full calibration in ChArUco and already have the camera matrix, distortion coefficients, rotation vectors and translations vectors?
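For what it's worth, the R and T that stereoRectify asks for describe the pose of the second camera relative to the first. If rvec1/tvec1 and rvec2/tvec2 (hypothetical names) are the poses of the same board view as seen by camera 1 and camera 2, the standard composition is R = R2 · R1^T and T = t2 - R · t1. A minimal sketch under that assumption, with camera_matrix1/dist_coeffs1 etc. being your per-camera calibration results:

import numpy as np
import cv2

# rvec1/tvec1 and rvec2/tvec2: pose of the SAME board view in camera 1 and camera 2
R1, _ = cv2.Rodrigues(rvec1)
R2, _ = cv2.Rodrigues(rvec2)

R = R2 @ R1.T                                        # rotation from camera 1 to camera 2
T = tvec2.reshape(3, 1) - R @ tvec1.reshape(3, 1)    # translation between the cameras

R1r, R2r, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    camera_matrix1, dist_coeffs1,
    camera_matrix2, dist_coeffs2,
    image_size, R, T)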
Some extra info that might be useful
I'm using 40 calibration images per camera of the ChArUco board to calibrate. I first extract all corners and markers after which I estimate the calibration parameters with the following code:
(ret, camera_matrix, distortion_coefficients0,
 rotation_vectors, translation_vectors,
 stdDeviationsIntrinsics, stdDeviationsExtrinsics,
 perViewErrors) = cv2.aruco.calibrateCameraCharucoExtended(
    charucoCorners=allCorners,
    charucoIds=allIds,
    board=board,
    imageSize=imsize,
    cameraMatrix=cameraMatrixInit,
    distCoeffs=distCoeffsInit,
    flags=flags,
    # termination-criteria flags must be combined with +, not &
    criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_COUNT, 10000, 1e-9))
The board parameter is created with the following settings:
CHARUCO_BOARD = aruco.CharucoBoard_create(
    squaresX=9,
    squaresY=6,
    squareLength=4.4,
    markerLength=3.5,
    dictionary=ARUCO_DICT)
Thanks a lot in advance!

Method to determine polygon surface rotation from top-down camera

I have a webcam looking down on a surface which rotates about a single-axis. I'd like to be able to measure the rotation angle of the surface.
The camera position and the rotation axis of the surface are both fixed. The surface is a distinct solid color right now, but I do have the option to draw features on the surface if it would help.
Here's an animation of the surface moving through its full range, showing the different apparent shapes:
My approach thus far:
Record a series of "calibration" images, where the surface is at a known angle in each image
Threshold each image to isolate the surface.
Find the four corners with cv2.approxPolyDP(). I iterate through various epsilon values until I find one that yields exactly 4 points.
Order the points consistently (top-left, top-right, bottom-right, bottom-left)
Compute the angles between adjacent points with atan2.
Use the angles to fit an sklearn linear_model.LinearRegression() model.
This approach is getting me predictions within about 10% of actual with only 3 training images (covering full positive, full negative, and middle position). I'm pretty new to both opencv and sklearn; is there anything I should consider doing differently to improve the accuracy of my predictions? (Probably increasing the number of training images is a big one??)
I did experiment with cv2.moments directly as my model features, and then some values derived from the moments, but these did not perform as well as the angles. I also tried using a RidgeCV model, but it seemed to perform about the same as the linear model.
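For reference, a minimal sketch of the epsilon search from step 3, assuming mask is the thresholded binary image and OpenCV 4's findContours return signature; the epsilon fractions are arbitrary choices:

import cv2

# mask: binary image with the surface isolated (from the thresholding step)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contour = max(contours, key=cv2.contourArea)

quad = None
for eps_frac in (0.005, 0.01, 0.02, 0.03, 0.05):
    eps = eps_frac * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, eps, True)
    if len(approx) == 4:              # stop at the first epsilon yielding exactly 4 points
        quad = approx.reshape(4, 2)
        break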
If I understand correctly, you want to estimate the rotation of the polygon surface with respect to the camera. If you know the dimensions of the object in 3D, you can use solvePnP to estimate the pose of the object, from which you can get its rotation.
Steps:
Calibrate your webcam and get the intrinsic matrix and distortion coefficients.
Get the 3D measurements of the object corners and find the corresponding points in 2d. Let me assume a rectangular planar object and the corners in 3d will be (0,0,0), (0, 100, 0), (100, 100, 0), (100, 0, 0).
Use solvePnP to get the rotation and translation of the object
The rotation will be the rotation of your object about its axis. Here you can find an example of estimating the pose of a head; you can modify it to suit your application.
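A minimal sketch of that solvePnP step, using the assumed 100 x 100 planar rectangle above; image_pts, camera_matrix and dist_coeffs are placeholders for your detected corners and webcam calibration:

import numpy as np
import cv2

# 3D corners of the assumed 100 x 100 planar rectangle (Z = 0), as listed above
object_pts = np.array([[0, 0, 0],
                       [0, 100, 0],
                       [100, 100, 0],
                       [100, 0, 0]], dtype=np.float64)

# image_pts: the four detected 2D corners as a float array, ordered like object_pts
ret, rvec, tvec = cv2.solvePnP(object_pts, image_pts, camera_matrix, dist_coeffs)

R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation of the object w.r.t. the camera
# the rotation angle about the known axis can then be read off from R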
Your first step is good -- everything after that becomes way way way more complicated than necessary (if I understand correctly).
Don't think of it as 'learning,' just think of it as a reference. Every time you're in a particular position where you DON'T know the angle, take a picture, and find the reference picture that looks most like it. Guess it's THAT angle. You're done! (There may well be indeterminacies, maybe the relationship isn't bijective, but that's where I'd start.)
You can consider this a 'nearest-neighbor classifier,' if you want, but that's just to make it sound better. Measure a simple distance (Euclidean! Why not!) between the uncertain picture, and all the reference pictures -- meaning, between the raw image vectors, nothing fancy -- and choose the angle that corresponds to the minimum distance between observed, and known.
If this isn't working -- and maybe, do this anyway -- stop throwing away so much information! You're stripping things down, then trying to re-estimate them, propagating error all over the place for no obvious (to me) benefit. So when you do a nearest neighbor, reference pictures and all that, why not just use the full picture? (Maybe other elements will change in it? That's a more complicated question, but basically, throw away as little as possible -- it should all be useful in, later, accurately choosing your 'nearest neighbor.')
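A minimal sketch of that nearest-neighbor lookup over raw image vectors; ref_images and ref_angles are placeholders for your reference set (all images assumed to have the same shape):

import numpy as np

# ref_images: list of reference pictures, ref_angles: their known angles
ref_stack = np.stack([img.astype(np.float32).ravel() for img in ref_images])

def estimate_angle(query):
    q = query.astype(np.float32).ravel()
    dists = np.linalg.norm(ref_stack - q, axis=1)   # plain Euclidean distance
    return ref_angles[int(np.argmin(dists))]        # angle of the closest reference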
Another option that is rather easy to implement, especially since you've already done part of the job, is the following (I've used it to compute the orientation of a cylindrical part from 3 images acquired while the tube was rotating):
Threshold each image to isolate the surface.
Find the four corners with cv2.approxPolyDP(), alternatively you could find the four sides of your part with LineSegmentDetector (available from OpenCV 3).
Compute the angle alpha, as depicted in the image below.
When your part is rotating, this angle alpha will follow a sine curve. That is, you will measure alpha(theta) = A sin(theta + B) + C. Given alpha you want to know theta, but first you need to determine A, B and C.
You've acquired many "calibration" or reference images; you can use all of these to fit a sine curve and determine A, B and C.
Once this is done, you can determine theta from alpha.
Notice that you have to deal with the fact that the sine is not one-to-one (sin(x) = sin(Pi - x)), so a single measurement of alpha can correspond to two values of theta. It is not a problem if you acquire more than one image sequentially; if you have a single static image, you have to use an extra mechanism.
Hope I'm clear enough, the implementation really shouldn't be a problem given what you have done already.
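For illustration, a minimal sketch of the fit and the inversion, here using scipy's curve_fit (not something the question used); thetas and alphas are placeholders for your calibration angles and the measured alpha values:

import numpy as np
from scipy.optimize import curve_fit

# thetas: known angles of the calibration images, alphas: measured alpha values
def model(theta, A, B, C):
    return A * np.sin(theta + B) + C

(A, B, C), _ = curve_fit(model, thetas, alphas, p0=(1.0, 0.0, np.mean(alphas)))

def theta_from_alpha(alpha):
    # inverse of the fitted model; note the sine ambiguity discussed above
    return np.arcsin(np.clip((alpha - C) / A, -1.0, 1.0)) - B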

How to mosaic/bend/curve image with curvature in python?

I have an image that represents the elevation of some area. But the drone that made it didn't necessarily go in a straight line (although the image is always rectangular). I also have GPS coordinates generated every 20 cm of the way.
How can I "bend" this rectangular image (curve/mosaic) so that it represents the curved path that the drone actually went through? (in python)
I haven't managed to write any code as I have no idea what this kind of image "warping" is called. Please find the attached image as a wanted end state, and normal horizontal letters as a start state.
There might be a better answer, but I guess you could use the remapping functions of OpenCV for that.
The process would look like this:
From your data, get your warping function. This will be a function that maps (x,y) pixel coordinates in your input image I to (x,y) pixel coordinates in your output image O.
Compute the size needed in the output image to host your whole warped image, and create it.
Create two maps, mapx and mapy, which give the pixel coordinates in I for every pixel in O (that is, in a sense, the inverse of your warping function).
Apply OpenCV's remap function, as sketched below (this is better than simply applying your maps yourself because it interpolates when the output image is larger than the input).
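A minimal sketch of steps 3 and 4, assuming a hypothetical warp_inverse(x, y) function that implements the inverse of your warping function, and out_h/out_w from step 2:

import numpy as np
import cv2

# Build the lookup maps: for every output pixel, where to sample in the input image I
mapx = np.zeros((out_h, out_w), dtype=np.float32)
mapy = np.zeros((out_h, out_w), dtype=np.float32)
for y in range(out_h):
    for x in range(out_w):
        src_x, src_y = warp_inverse(x, y)
        mapx[y, x] = src_x
        mapy[y, x] = src_y

warped = cv2.remap(I, mapx, mapy, interpolation=cv2.INTER_LINEAR,
                   borderMode=cv2.BORDER_CONSTANT)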
Depending on your warping function, this technique might be very simple or close to impossible to apply.
You can find an example with a super simple warping function here: https://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/remap/remap.html
More complex examples can be looked at in OpenCV doc and code when looking at distortion and rectification of camera images.

Image registration b-spline opencv

Here is my question:
My optical system is made of a camera plus a circular plexiglass "lens" that changes its curvature depending on pressure (radial bending).
This curvature induces a deformation of the image captured by the camera.
To correct this deformation, images need to be calibrated.
Calibration can be made with a grid (chessboard, dots, lines); the pressure range has to be discretized with a certain step.
For each pressure step, an image of the grid has to be taken.
Then each image has to be compared to the reference one (P=0), and a transformation matrix has to be computed and stored.
Finally, each image taken during the experiment for a specific pressure has to be corrected by the transformation matrix.
The deformation is non-linear (not just a combination of rotations and translations), and most likely barrel distortion (again, not induced by the camera). It looks like this:
http://en.wikipedia.org/wiki/Distortion_%28optics%29#mediaviewer/File:Barrel_distortion.svg
I found a plugin in ImageJ called BunwarpJ, http://biocomp.cnb.csic.es/~iarganda/bUnwarpJ/
and I basically want to know if there is an equivalent way to produce the same result in Opencv.
(CalibrateCamera won't do the trick)
OpenCV has an undistort function that takes the current image, the camera matrix and the distortion coefficients, and produces a new image corrected for that distortion (optionally using a new camera matrix, if you need to do other transformations on the new image).
I have not used it before, so I can't say exactly what the camera or distortion coefficients are, but as the manual describes:
The function transforms an image to compensate radial and tangential lens distortion. The function is simply a combination of initUndistortRectifyMap() (with unity R) and remap() (with bilinear interpolation).
So checking those two functions out is a good way to find out.
I believe you misunderstood the manual, perhaps because you seem to think that calibrateCamera does this for you. Instead, calibrateCamera actually returns the camera matrix and distortion coefficients, which you need in order to undistort your image.
Each lens has its own constant coefficients, which in your case means that you'll have to run calibrateCamera for a range of pressures (I assume you control that experimentally?) and then call undistort with the different sets of parameters you get out of those experiments.
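For illustration, a minimal sketch of both forms, assuming camera_matrix and dist_coeffs were obtained from calibrateCamera at the relevant pressure step and img is the image to correct:

import cv2

# One-call form
undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)

# Equivalent two-step form, useful if you want to reuse the maps for many images
h, w = img.shape[:2]
mapx, mapy = cv2.initUndistortRectifyMap(camera_matrix, dist_coeffs, None,
                                         camera_matrix, (w, h), cv2.CV_32FC1)
undistorted2 = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)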
A matrix can only capture a linear transformation (or, in homogeneous coordinates, a projective one), not a general distortion.
In my experience any attempt to use a single global transformation formula wouldn't be very accurate (it's not trivial to get just 99.9% accuracy). Even just correcting camera lens distortion this way is difficult if you want high accuracy.
In the past I got good enough results using a sparse global RBF interpolation, but later I moved to an interpolating 2d spline approach; if you can choose your calibration points to be on a regular grid this is the solution I would suggest.
In the end the mapping could be a 2-valued 3d interpolating spline on a regular grid (XY for the image, Z for the pressure; values UV are the pixel coordinates).
Straightening the image once pressure is known is just texture mapping.
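For illustration, a minimal sketch of such a gridded mapping, here with scipy's RegularGridInterpolator as a piecewise-linear stand-in for the interpolating spline; xs, ys, ps, u_grid and v_grid are placeholders for your calibration grid:

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# xs, ys, ps: regular grid of calibration positions (image X, image Y, pressure P)
# u_grid, v_grid: measured distorted pixel coordinates at each grid node,
# both of shape (len(xs), len(ys), len(ps))
interp_u = RegularGridInterpolator((xs, ys, ps), u_grid)
interp_v = RegularGridInterpolator((xs, ys, ps), v_grid)

def distorted_coords(x, y, pressure):
    pt = np.array([[x, y, pressure]])
    return float(interp_u(pt)), float(interp_v(pt))   # (u, v) to sample in the raw image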
