How to use cv2.triangulatePoints with a single moving camera - python

I have a single camera that I can move around. I have the intrinsic parameter matrix and the extrinsic parameter matrix for each camera orientation. For object detection, I use YOLO and I get 2D bounding boxes in the image plane. My plan is to use a temporal pair of images, with the detected object in it, to triangulate the midpoint of the resulting 2D bounding box around the object.
Right now, I use two images that are 5 frames apart. That means the first frame contains the object and the second frame contains the same object a few milliseconds later. I use cv2.triangulatePoints to get the corresponding 3D point for the 2D midpoint of the bounding box.
My main problem is that when the camera is more or less steady, the resulting distance value is accurate (within a few centimeters). However, when I move the camera around, the resulting distance value for the object starts varying quite a bit (the object is static and never moves, only the camera looking at it moves). I can't seem to understand why this is the case.
For cv2.triangulatePoints, I get the relative rotation matrix between the two temporal camera orientations (R = R2 * R1^T) and then get the relative translation (t = t2 - R * t1). P1 and P2 are the final projection matrices (P1 for the camera at the earlier position and P2 for the camera at the later position): P1 = K[I|0] and P2 = K[R|t], where K is the 3x3 intrinsic parameter matrix, I is a 3x3 identity matrix, and 0 is a 3x1 vector of zeros.
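For concreteness, here is a minimal sketch of this construction (assuming K, R1, t1, R2, t2 are the intrinsics and world-to-camera poses as NumPy arrays, and pt1/pt2 are the 2D box midpoints; the function name is illustrative):

```python
import numpy as np
import cv2

def triangulate_midpoint(K, R1, t1, R2, t2, pt1, pt2):
    """Triangulate one point seen in two frames of a moving camera.

    K: 3x3 intrinsics; (R1, t1), (R2, t2): world-to-camera poses of the two
    frames (t as 3x1 columns); pt1, pt2: 2D midpoints as (2, 1) float arrays.
    """
    R = R2 @ R1.T                                        # relative rotation
    t = t2 - R @ t1                                      # relative translation
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])    # P1 = K[I|0]
    P2 = K @ np.hstack([R, t])                           # P2 = K[R|t]
    X_h = cv2.triangulatePoints(P1, P2, pt1, pt2)        # homogeneous 4x1 result
    return (X_h[:3] / X_h[3]).ravel()                    # 3D point in the first camera's frame
```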
Should I use a temporal gap of 10 frames instead, or is this method of localizing objects with a single camera simply never going to be accurate?

The centers of the bounding boxes are not guaranteed to be the projections of a single scene (3D) point, even with a perfect track, unless additional constraints are added: for example, that the tracked object is planar, or that the vertices of the bounding boxes track points that lie on a plane. Things get more complicated when tracking errors are present.
If you really need to triangulate the box centers (do you? are you sure you can't achieve your goals using only well-matched projections?), you could use a small area around the center in one box as a pattern, and track it using a point tracker (e.g. one based on the Lucas-Kanade algorithm, or one based on normalized cross-correlation) in the second image, using the second box to constrain the tracker search window.
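As a rough sketch of the normalized-cross-correlation variant (the function name and the (x, y, w, h) box convention are just for illustration):

```python
import cv2

def refine_center(img1, img2, box1, box2, patch_half=15):
    """Track a small patch around the center of box1 (frame 1) inside box2 (frame 2).

    img1, img2: grayscale frames; box1, box2: (x, y, w, h) bounding boxes.
    Returns the matched patch center in img2 coordinates.
    """
    cx, cy = box1[0] + box1[2] // 2, box1[1] + box1[3] // 2
    patch = img1[cy - patch_half:cy + patch_half, cx - patch_half:cx + patch_half]

    x, y, w, h = box2
    search = img2[y:y + h, x:x + w]                        # search only inside the second box
    res = cv2.matchTemplate(search, patch, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)                  # location of the best NCC score
    return (x + max_loc[0] + patch_half, y + max_loc[1] + patch_half)
```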
Then you may need to address the accuracy of your camera motion estimation - if errors are significant your triangulations will go nowhere. Bundle adjustment may need to become your friend.

Related

How can I calculate the direction vector of a pixel?

So if you take a pinhole camera, make it the origin of our 3D coordinate system, and connect it to a pixel on the image plane with a straight line, that line gives a vector with a direction and a length. Think of this as the path followed by the light reflected from an object into the camera lens. This is what I want to calculate. I think we have to use the camera's intrinsic parameters for this.
Below is a statement that made me think about it all.
In a pinhole camera model, each pixel defines a direction vector in 3D space, specifically the vector from the projection center through the pixel's position on the image plane.
Here is a diagram better explaining this.
I want to calculate the three red lines; the known parameters would be, I guess, the camera position (origin), the image pixel coordinates, and the intrinsic camera parameters.
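For what it's worth, here is a minimal sketch of that statement in code, assuming the intrinsic matrix K is known (the numbers are made-up example values):

```python
import numpy as np

# Example intrinsics (fx, fy, cx, cy) and an example pixel; replace with your own
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
u, v = 100.0, 200.0

ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # direction in camera coordinates
ray /= np.linalg.norm(ray)                       # unit direction vector
# To express it in world coordinates, rotate by the camera-to-world rotation:
# ray_world = R_cam_to_world @ ray
```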

Method to determine polygon surface rotation from top-down camera

I have a webcam looking down on a surface which rotates about a single-axis. I'd like to be able to measure the rotation angle of the surface.
The camera position and the rotation axis of the surface are both fixed. The surface is a distinct solid color right now, but I do have the option to draw features on the surface if it would help.
Here's an animation of the surface moving through its full range, showing the different apparent shapes:
My approach thus far:
Record a series of "calibration" images, where the surface is at a known angle in each image
Threshold each image to isolate the surface.
Find the four corners with cv2.approxPolyDP(). I iterate through various epsilon values until I find one that yields exactly 4 points.
Order the points consistently (top-left, top-right, bottom-right, bottom-left)
Compute the angles between each pair of points with atan2.
Use the angles to fit a sklearn linear_model.LinearRegression()
This approach is getting me predictions within about 10% of actual with only 3 training images (covering full positive, full negative, and middle position). I'm pretty new to both opencv and sklearn; is there anything I should consider doing differently to improve the accuracy of my predictions? (Probably increasing the number of training images is a big one??)
I did experiment with cv2.moments directly as my model features, and then some values derived from the moments, but these did not perform as well as the angles. I also tried using a RidgeCV model, but it seemed to perform about the same as the linear model.
If I understand correctly, you want to estimate the rotation of the polygon with respect to the camera. If you know the dimensions of the object in 3D, you can use solvePnP to estimate the pose of the object, from which you can get its rotation.
Steps:
Calibrate your webcam and get the intrinsic matrix and distortion matrix.
Get the 3D measurements of the object corners and find the corresponding points in 2D. Let me assume a rectangular planar object; the corners in 3D will be (0,0,0), (0, 100, 0), (100, 100, 0), (100, 0, 0).
Use solvePnP to get the rotation and translation of the object
The rotation will be the rotation of your object about its axis. Here you can find an example of estimating the pose of a head; you can modify it to suit your application.
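A minimal sketch of steps 2-3, with made-up image coordinates and placeholder intrinsics standing in for your actual calibration results, could look like this:

```python
import numpy as np
import cv2

# 3D corners of the rectangular surface in its own frame (Z = 0 plane, units arbitrary)
object_pts = np.array([[0, 0, 0], [0, 100, 0], [100, 100, 0], [100, 0, 0]],
                      dtype=np.float64)
# The four ordered corners detected in the image (made-up example values)
image_pts = np.array([[210, 120], [205, 310], [400, 315], [405, 118]],
                     dtype=np.float64)

# Placeholders: in practice these come from your cv2.calibrateCamera step
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation of the object relative to the camera
```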
Your first step is good -- everything after that becomes way way way more complicated than necessary (if I understand correctly).
Don't think of it as 'learning,' just think of it as a reference. Every time you're in a particular position where you DON'T know the angle, take a picture, and find the reference picture that looks most like it. Guess it's THAT angle. You're done! (There may well be indeterminacies, maybe the relationship isn't bijective, but that's where I'd start.)
You can consider this a 'nearest-neighbor classifier,' if you want, but that's just to make it sound better. Measure a simple distance (Euclidean! Why not!) between the uncertain picture, and all the reference pictures -- meaning, between the raw image vectors, nothing fancy -- and choose the angle that corresponds to the minimum distance between observed, and known.
If this isn't working -- and maybe, do this anyway -- stop throwing away so much information! You're stripping things down, then trying to re-estimate them, propagating error all over the place for no obvious (to me) benefit. So when you do a nearest neighbor, reference pictures and all that, why not just use the full picture? (Maybe other elements will change in it? That's a more complicated question, but basically, throw away as little as possible -- it should all be useful in, later, accurately choosing your 'nearest neighbor.')
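A rough sketch of this nearest-neighbour lookup over raw images (the helper name and inputs are illustrative):

```python
import numpy as np

def nearest_angle(query_img, ref_imgs, ref_angles):
    """Return the known angle of the reference image closest (Euclidean) to query_img.

    ref_imgs: list of reference images, all the same shape as query_img;
    ref_angles: the known angle for each reference image.
    """
    q = query_img.astype(np.float64).ravel()
    dists = [np.linalg.norm(q - r.astype(np.float64).ravel()) for r in ref_imgs]
    return ref_angles[int(np.argmin(dists))]
```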
Another option that is rather easy to implement, especially since you've already done part of the work, is the following (I've used it to compute the orientation of a cylindrical part from 3 images acquired while the tube was rotating):
Threshold each image to isolate the surface.
Find the four corners with cv2.approxPolyDP(); alternatively, you could find the four sides of your part with LineSegmentDetector (available from OpenCV 3).
Compute the angle alpha, as depicted on the image hereunder
When your part is rotating, this angle alpha will follow a sine curve. That is, you will measure alpha(theta) = A sin(theta + B) + C. Given alpha you want to know theta, but first you need to determine A, B and C.
You've acquired many "calibration" or reference images, you can use all of these to fit a sine curve and determine A, B and C.
Once this is done, you can determine theta from alpha.
Notice that you have to deal with the ambiguity sin(Pi - a) = sin(a): a given alpha can correspond to two values of theta. It is not a problem if you acquire more than one image sequentially; if you have a single static image, you have to use an extra mechanism to disambiguate.
Hope I'm clear enough, the implementation really shouldn't be a problem given what you have done already.
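A small sketch of the fitting and inversion steps, using scipy.optimize.curve_fit with made-up calibration values:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(theta, A, B, C):
    return A * np.sin(theta + B) + C          # alpha(theta) = A sin(theta + B) + C

# Known angles of the calibration images and the measured alpha values
# (all numbers here are made up for illustration)
thetas = np.radians([-45.0, 0.0, 45.0, 90.0])
alphas = np.array([0.10, 0.62, 1.05, 1.20])
(A, B, C), _ = curve_fit(model, thetas, alphas, p0=[1.0, 0.0, 0.0])

# Invert the model for a new measurement (mind the sine ambiguity discussed above)
alpha_new = 0.8
theta_est = np.arcsin(np.clip((alpha_new - C) / A, -1.0, 1.0)) - B
```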

StereoCalibration in OpenCV: Shouldn't this work without ObjectPoints?

I have two questions relating to stereo calibration with opencv. I have many pairs of calibration images like these:
Across the set of calibration images the distance of the chessboard away from the camera varies, and it is also rotated in some shots.
From within this scene I would like to map pairs of image coordinates (x,y) and (x',y') onto object coordinates in a global frame: (X,Y,Z).
In order to calibrate the system I have detected pairs of image coordinates of all chessboard corners using cv2.findChessboardCorners(). From reading Hartley's Multiple View Geometry in Computer Vision I gather I should be able to calibrate this system up to a scale factor without actually specifying the object points of the chessboard corners. First question: Is this correct?
Investigating cv2's capabilities, the closest thing I've found is cv2.stereoCalibrate(objectpoints,imagepoints1,imagepoints2).
I have obtained imagepoints1 and imagepoints2 from cv2.findChessboardCorners. Apparently from the images shown I can approximately extract (X,Y,Z) relative to the frame on the calibration board (by design), which would allow me to apply cv2.stereoCalibrate(). However, I think this will introduce error, and it prevents me from using all of the rotated photos of the calibration board which I have. Second question: Can I calibrate without object points using opencv?
Thanks!
No. You must specify the object points. Note that they need not change across the image sequence, since you can interpret the change as due to camera motion relative to the target. Also, you can (should) assume that Z=0 for a planar target like yours. You may specify X, Y only up to scale, in which case the translations you obtain after calibration are also determined only up to scale.
No
Clarification: by "need not change across the image sequence" I mean that you can assume the target fixed in the world frame, and interpret the relative motion as due to the camera only. The world frame itself, absent a better prior, can be defined by the pose of the target in any one of the images (say, the first one). Obviously, I do not mean that the pose of the target relative to the camera does not change - in fact, it must change in order to obtain a calibration. If you do have a better prior, you should use if. For example, if the target moves on a turntable, you should solve directly for the parameters of the cylindrical motion, since there is less of them (one constant axis, one constant radius, plus one angle per image, rather than 6 parameters per image).

How to find neighbors in binary image with given horizontal and vertical distance (Python)

I have an image (or several hundred of them) that need to be analyzed. The goal is to find all black spots close to each other.
For example, all black spots within a horizontal distance of 160 pixels and a vertical distance of 40 pixels of each other.
For now I just look at each pixel, and if it is black I call a recursive method to find its neighbours (I can post the code too if you want).
It works, but it's very slow. At the moment the script runs for about 3-4 minutes depending on image size.
Is there some easy/fast way to accomplish this (ideally with a scikit-image method)? I'm using Python.
Edit: I tried skimage.measure.find_contours; now I have an array of arrays containing the contours of the black spots. Now I only need to find the contours in the neighbourhood of these contours.
When you get the coordinates of the different black spots, rather than computing all distances between all pairs of black pixels, you can use a cKDTree (in scipy.spatial, http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html#scipy.spatial.cKDTree). The exact method of cKDTree to use depends on your exact criterion (you can for example use cKDTree.query_ball_tree to know whether there exists a pair of points belonging to two different labels, with a maximal distance that you give).
KDTrees are a great method to reduce the complexity of problems based on neighboring points. If you want to use KDTrees, you'll need to rescale the coordinates so that you can use one of the classical norms to compute the distance between points.
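A small sketch of that rescaling idea for the 160 x 40 pixel criterion from the question (the helper name is illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def spots_are_close(coords_a, coords_b, max_dx=160, max_dy=40):
    """True if any pixel of spot A lies within max_dx horizontally and max_dy
    vertically of any pixel of spot B. coords_* are (N, 2) arrays of (row, col)."""
    scale = np.array([1.0 / max_dy, 1.0 / max_dx])           # rows = vertical, cols = horizontal
    tree_a = cKDTree(coords_a * scale)
    tree_b = cKDTree(coords_b * scale)
    pairs = tree_a.query_ball_tree(tree_b, r=1.0, p=np.inf)  # box-shaped neighbourhood
    return any(len(p) > 0 for p in pairs)
```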
Disclaimer: I'm not proficient with the scikit image library at all, but I've tackled similar problems using MATLAB so I've searched for the equivalent methods in scikit, and I hope my findings below help you.
First you can use skimage.measure.label which returns label_image, i.e. an image where all connected regions are labelled with the same number. I believe you should call this function with background=255 because from your description it seems that the background in your images is the white region (hence the value 255).
This is essentially an image where the background pixels are assigned the value 0 and the pixels that make up each (connected) spot are assigned the value of an integer label, so all the pixels of one spot will be labelled with the value 1, the pixels of another spot will be labelled with the value 2, and so on. Below I'll refer to "spots" and "labelled regions" interchangeably.
You can then call skimage.measure.regionprops, that takes as input the label_image obtained in the previous step. This function returns a list of RegionProperties (one for each labelled region), which is a summary of properties of a labelled region.
Depending on your definition of
The goal is to find all black spots close to each other.
there are different fields of the RegionProperties that you can use to help solve your problem:
bbox gives you the set of coordinates of the bounding box that contains that labelled region,
centroid gives you the coordinates of the centroid pixel of that labelled region,
local_centroid gives you the centroid relative to the bounding box bbox
(Note there are also area and bbox_area properties which you can use to decide whether to throw away very small spots that you might not be interested in, thus reducing computation time when it comes to comparing proximity of each pair of spots)
If you're looking for a coarse comparison, then comparing the centroid or local_centroid between each pair of labelled regions might be enough.
Otherwise you can use the bbox coordinates to measure the exact distance between the outer bounds of any two regions.
If you want to base the decision on the precise distance between the pixel(s) of each pair of regions that are closest to each other, then you'll likely have to use the coords property.
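A rough sketch of the coarse, centroid-based variant (the helper and thresholds are illustrative; spots_mask is assumed to be a boolean mask where spot pixels are True):

```python
from skimage.measure import label, regionprops

def close_spot_pairs(spots_mask, max_dx=160, max_dy=40):
    """Return pairs of region labels whose centroids lie within the given
    horizontal/vertical distances (coarse, centroid-based criterion)."""
    props = regionprops(label(spots_mask))        # background (False) is label 0 and skipped
    pairs = []
    for i in range(len(props)):
        for j in range(i + 1, len(props)):
            (r1, c1), (r2, c2) = props[i].centroid, props[j].centroid
            if abs(c1 - c2) <= max_dx and abs(r1 - r2) <= max_dy:
                pairs.append((props[i].label, props[j].label))
    return pairs
```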
If your input image is binary, you could separate your regions of interest as follows:
"grow" all the regions by the expected distance (actually half of it, as you grow from "both sides of the gap") with binary_dilation, where the structure is a kernel (e.g. rectangular: http://scikit-image.org/docs/dev/api/skimage.morphology.html#skimage.morphology.rectangle) of, let's say, 20x80pixels;
use the resulting mask as an input to skimage.measure.label to assign different values for different regions' pixels;
multiply the labelled image from step 2 by your original binary input, so that the pixels added by dilation are zeroed and only the original spots keep their group labels (see the sketch after the result images below).
Here are the results of the proposed method on your image with kernel = rectangle(5, 5):
Dilated binary image (output of step 1):
Labeled version of the above (output of step 2):
Multiplication results (output of step 3):
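A minimal sketch of steps 1-3, assuming the spots are given as a boolean mask (the helper name and default kernel size are illustrative):

```python
from skimage.measure import label
from skimage.morphology import binary_dilation, rectangle

def label_close_groups(spots_mask, kernel_rows=20, kernel_cols=80):
    """Steps 1-3 above: dilate, label the merged regions, then keep the labels
    only on the original spot pixels.

    spots_mask: boolean array, True on spot pixels; kernel_rows/kernel_cols set
    the rectangular dilation footprint (roughly half the allowed distances).
    """
    grown = binary_dilation(spots_mask, rectangle(kernel_rows, kernel_cols))  # step 1
    lbl = label(grown)                                                        # step 2
    return lbl * spots_mask                                                   # step 3
```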

OpenCV find object's position with solvePnPRansac with not-corresponding points

I am trying to find object's position relative to camera position in real-world coordinates by tracking a known 2D LED pattern on the object.
I did camera calibration. I was able to successfully detect the LEDs in the pattern and find their exact coordinates in the image frame. These points, however, do not correspond 1-to-1 to the known coordinates in the pattern; they come in random order. The correspondence is important in functions like solvePnPRansac or findHomography, which would be my first choice to use.
How can I find the correspondence between these sets of points, or should I use some other function to calculate the transformation just as solvePnPRansac does?
As you did not ask about the way to estimate the relative pose between your object and your camera, I will let this topic aside and focus on the way to find correspondences between each LED and their 2D projections.
In order to obtain a unique 1-to-1 correspondence set, the LED pattern you use should be unambiguous with respect to rotation. For example, you may use a regular NxN grid with the top-left cell containing an additional LED, or LEDs located on a circle with one extra LED underneath a single one, etc. Then, the method to find the correspondences depends on the pattern you chose.
In the case of the circle pattern, you could do the following:
Estimate the center of gravity of the points
Find the disambiguating point, which is the only one not lying on a circle, and define the closest of the other points as the first observed point
Order the remaining points by increasing angle with respect to the center of gravity (i.e. clockwise order)
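A rough sketch of these three steps (the helper is hypothetical; "clockwise" assumes image coordinates with y pointing down):

```python
import numpy as np

def order_circle_pattern(points):
    """Order detected LEDs of the circle pattern described above.

    points: (N, 2) array of 2D LED positions. The point whose distance to the
    centre of gravity deviates most from the median is taken as the off-circle
    disambiguating LED; the remaining points are returned starting from its
    nearest neighbour, sorted by increasing angle around the centre of gravity.
    """
    pts = np.asarray(points, dtype=float)
    cog = pts.mean(axis=0)                                        # centre of gravity
    radii = np.linalg.norm(pts - cog, axis=1)
    extra_idx = int(np.argmax(np.abs(radii - np.median(radii))))  # the off-circle LED
    circle = np.delete(pts, extra_idx, axis=0)
    first_idx = int(np.argmin(np.linalg.norm(circle - pts[extra_idx], axis=1)))
    ang = np.arctan2(circle[:, 1] - cog[1], circle[:, 0] - cog[0])
    rel = (ang - ang[first_idx]) % (2 * np.pi)                    # angle relative to the first point
    return circle[np.argsort(rel)]
```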
In the case of the regular grid pattern, you could try the following:
Find the four corners of the grid (those with min/max coordinates)
Estimate the homography which transforms these four corners to the corners of a regular NxN square (with orthogonal angles)
Transform the other points using this homography
Find the disambiguating point, which is the only one for which X-floor(X) and Y-floor(Y) are close to 0.5, and define the closest of the four initial corners as the first observed point
Order the remaining points by increasing angle with respect to the center of the grid and decreasing distance to the center of the grid
You could also study the algorithm used by the function findChessboardCorners (see calibinit.cpp in the calib3d module), which uses a similar approach to order the detected corners.
