I have a photo taken with a camera whose focal length, principal point, and distortion coefficients I know. The photo shows an 8 cm x 8 cm post-it on a table; the center of the post-it is the origin (0, 0), again in cm. I've also indicated the positive-y axis on the post-it.
From this information is it possible to compute the location of the camera and the vector in which the camera is looking in Python using OpenCV? If someone has a snippet of code that does that (assuming you know the coordinates of the post-it corners already) that would be amazing!
Use OpenCV's solvePnP, specifying SOLVEPNP_IPPE_SQUARE in the flags. With only 4 points (and a post-it) the solution will be quite sensitive to how accurately you mark their images, so ask yourself whether you really need the camera pose and location for your application, and how accurately. E.g., if you just want to make a flat CG "sticker" stay fixed on the table while the camera moves, all you need to estimate is a homography, a much simpler task.
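For completeness, a minimal sketch of that approach; the camera matrix, distortion coefficients, and corner pixel coordinates below are made-up placeholders, so substitute your own calibration and detected corners:
import numpy as np
import cv2

# 3D corners of the 8 cm x 8 cm post-it, centred on the origin, z = 0 (table plane).
# SOLVEPNP_IPPE_SQUARE expects exactly this order: top-left, top-right,
# bottom-right, bottom-left.
half = 4.0  # cm
object_points = np.array([[-half,  half, 0.0],
                          [ half,  half, 0.0],
                          [ half, -half, 0.0],
                          [-half, -half, 0.0]], dtype=np.float64)

# Placeholder intrinsics and clicked/detected corner pixels -- replace with yours.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
image_points = np.array([[310.0, 200.0],
                         [420.0, 205.0],
                         [415.0, 310.0],
                         [305.0, 305.0]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix,
                              dist_coeffs, flags=cv2.SOLVEPNP_IPPE_SQUARE)

# rvec/tvec map post-it coordinates into camera coordinates. The camera's
# position in the post-it frame (in cm) and its viewing direction are then:
R, _ = cv2.Rodrigues(rvec)
camera_position = -R.T @ tvec                              # world coordinates, cm
viewing_direction = R.T @ np.array([[0.0], [0.0], [1.0]])  # camera +z axis in world frame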
It does look like you have all the information required. The marker you use can be segmented easily, and shape analysis will provide the corners. I did something similar for basic eye tracking:
Here is a complete example.
Segmentation result for the example:
Please note that accuracy really matters, so it might be useful to rely on several sets of points.
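For illustration, a rough sketch of that segmentation and corner-finding step; the colour range, filename and OpenCV 4 return signature are assumptions, so adapt them to your own marker and version:
import cv2
import numpy as np

img = cv2.imread("table.jpg")                           # placeholder filename
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (20, 80, 80), (40, 255, 255))   # e.g. a yellow post-it

# The largest external contour is assumed to be the marker.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
marker = max(contours, key=cv2.contourArea)

# Approximate the contour with a polygon; keep the result only if it has 4 corners.
peri = cv2.arcLength(marker, True)
approx = cv2.approxPolyDP(marker, 0.02 * peri, True)
if len(approx) == 4:
    corners = approx.reshape(-1, 2).astype(np.float64)
    # order these consistently, then use them as image points for solvePnP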
Related
I have been dipping my toes into OpenCV and the stereovision functions it contains, and am struggling to get good results while following instructions in both the OpenCV documentation and many articles online. Specifically, I believe that at this point I have managed to obtain a decent calibration of my cameras, a decent stereo calibration, and even a decent rectification, but when moving to create the disparity map I seem to get nonsense back.
I am using a set of self-acquired images taken with a Pentax K-3 ii camera using a Loreo Lens-in-a-cap CCD splitter which gives me "two" images taken on one CCD. I can then split the image in half (and trim some of the pixels near the overlap) to have a reliable baseline distance in world coordinates with the camera. I unfortunately have no information on the true focal length of this configuration but I would guess it is around 9cm.
I have performed camera calibration on each split-image set to get camera matrices, distance coefficients, and object and image points for use in epipolar geometry. Then, following the procedure laid out in [1,2], I perform stereo calibration and rectification. I do not have the required reputation to embed images, so please click here. By my understanding, the fact that similar features in both images are similar distances to the true horizontal lines I have drawn across them means that this is a good rectification result and should be usable.
However, when I implement the following code to create the disparity map:
# Settings for cv.StereoSGBM_create
minDisparity = 1
numDisparities = 64
blockSize = 1
disp12MaxDiff = 1
uniquenessRatio = 10
speckleWindowSize = 0
speckleRange = 8
stereo = cv.StereoSGBM_create(minDisparity=minDisparity, numDisparities=numDisparities, blockSize=blockSize, disp12MaxDiff=disp12MaxDiff, uniquenessRatio=uniquenessRatio,
speckleWindowSize=speckleWindowSize, speckleRange=speckleRange)
# Calculate the disparity map
disp = stereo.compute(imgL, imgR).astype(np.float32)
# Normalize the values to spread them across the viewable range
disp = cv.normalize(disp, None, alpha=0, beta=255, norm_type=cv.NORM_MINMAX, dtype=cv.CV_8U)
# Resize for display
disp = cv.resize(disp, (1000,1000))
cv.imshow("disparity",disp)
cv.waitKey(0)
The result is disheartening. Intuitively, seeing a lot of black space surrounding edges which actually are fairly well-defined (such as in the chessboard pattern or near my hands) would suggest that there is very little disparity. However it seems clear to me that the images are quite different in terms of translation, so I am a bit confused. I have been delving through the documentation and run out of ideas. I tried reusing the code that produced the initial set of epipolar lines provided here which seemed to work on the original image quite nicely. However, it produces epipolar lines which are certainly not horizontal. This tells me that something is wrong, but I do not understand what could be, especially given the "visual test" I described above. I suspect I am misapplying that section of the code.
One thought I have is that I need to use an ROI to select the valid parts of the image, but I am unsure how to go about this. I think this is supported by the odd streaking behavior at the right edge of the left image post-rectification.
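Would it be something along these lines, using the valid-pixel ROIs that cv.stereoRectify returns? (I haven't verified this; roi1/roi2 stand in for its last two return values.)
# Crop the disparity map to the region where it is actually valid:
# getValidDisparityROI intersects the two rectification ROIs and accounts
# for the disparity search range.
x, y, w, h = cv.getValidDisparityROI(roi1, roi2, minDisparity, numDisparities, blockSize)
disp_valid = disp[y:y + h, x:x + w]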
This is a link to a pastebin of all of my code, aside from the initial camera calibration which has significant runtime due to the size of the images.
I would appreciate any help that can be offered as at this point I am going a bit codeblind. I am limited to only 8 links due to my reputation, so please let me know if I can provide better images or documentation of my work.
I am very new to this area and want to improve, so I need your advice. I want to detect objects and find the distances between the objects and my camera using a phone camera. What should I learn in order to achieve this? Any advice would be appreciated.
If you want the following: "take a single picture, with any camera, at any distance, and calculate the distance from that image alone", then I fear that might be impossible, because there is no depth information in a single view. It would be pretty much impossible for a neural network to just guess how far away an object is from how big it appears in the image. From Wikipedia:
Depth perception arises from a variety of depth cues. These are typically classified into binocular cues that are based on the receipt of sensory information in three dimensions from both eyes and monocular cues that can be represented in just two dimensions and observed with just one eye.
Now that this is out of the way: you did say YOUR camera, and using a specific camera changes things. If you know the focal length and angle of view, that would help a lot. Here are some links to illustrate that:
focal length
angle of view
Maybe you can calculate your way out of this, but you will need some constraints or calibration, one way or another (a tiny example of what such a calculation can look like is sketched below). Hope I helped a bit.
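Assuming a calibrated focal length in pixels and an object of known real-world size, similar triangles give a rough distance; all numbers here are made up:
# Pinhole model: distance = focal_length_px * real_size / size_in_pixels
f_px = 1000.0          # focal length in pixels, from calibration
real_height_m = 1.70   # known real height of the object, in metres
pixel_height = 250.0   # height of the object's bounding box in the image, pixels

distance_m = f_px * real_height_m / pixel_height
print(distance_m)      # 6.8 metres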
I have a webcam looking down on a surface which rotates about a single-axis. I'd like to be able to measure the rotation angle of the surface.
The camera position and the rotation axis of the surface are both fixed. The surface is a distinct solid color right now, but I do have the option to draw features on the surface if it would help.
Here's an animation of the surface moving through its full range, showing the different apparent shapes:
My approach thus far:
Record a series of "calibration" images, where the surface is at a known angle in each image
Threshold each image to isolate the surface.
Find the four corners with cv2.approxPolyDP(). I iterate through various epsilon values until I find one that yields exactly 4 points.
Order the points consistently (top-left, top-right, bottom-right, bottom-left)
Compute the angles between the points with atan2.
Use the angles to fit an sklearn linear_model.LinearRegression().
This approach is getting me predictions within about 10% of actual with only 3 training images (covering full positive, full negative, and middle position). I'm pretty new to both opencv and sklearn; is there anything I should consider doing differently to improve the accuracy of my predictions? (Probably increasing the number of training images is a big one??)
I did experiment with cv2.moments directly as my model features, and then some values derived from the moments, but these did not perform as well as the angles. I also tried using a RidgeCV model, but it seemed to perform about the same as the linear model.
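For reference, here is a condensed sketch of the angle/regression part of the approach above, with made-up corner data standing in for my calibration images:
import numpy as np
from sklearn.linear_model import LinearRegression

def edge_angles(corners):
    # Angles (atan2) of the four edges of an ordered TL, TR, BR, BL quadrilateral.
    corners = np.asarray(corners, dtype=float)
    angs = []
    for i in range(4):
        dx, dy = corners[(i + 1) % 4] - corners[i]
        angs.append(np.arctan2(dy, dx))
    return angs

# Placeholder corner sets from 3 calibration images and the known surface
# angle (degrees) of each.
corners_per_image = [
    [(10, 10), (110, 12), (108, 60), (12, 58)],
    [(10, 20), (110, 18), (109, 80), (11, 82)],
    [(10, 30), (110, 25), (110, 100), (10, 105)],
]
known_angles = [-30.0, 0.0, 30.0]

X = np.array([edge_angles(c) for c in corners_per_image])
model = LinearRegression().fit(X, np.array(known_angles))

# For a new image, run the same corner extraction and feature computation:
print(model.predict(X[:1]))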
If I understand correctly, you want to estimate the rotation of the polygon with respect to the camera. If you know the dimensions of the object in 3D, you can use solvePnP to estimate the pose of the object, from which you can get its rotation.
Steps:
Calibrate your webcam and get the intrinsic matrix and distortion matrix.
Get the 3D measurements of the object corners and find the corresponding points in 2D. Assuming a rectangular planar object, the corners in 3D will be (0, 0, 0), (0, 100, 0), (100, 100, 0), (100, 0, 0).
Use solvePnP to get the rotation and translation of the object.
The rotation will be the rotation of your object about its axis. Here you can find an example of estimating the pose of a head; you can modify it to suit your application.
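A minimal sketch of those steps; the intrinsics and the detected image corners below are placeholders:
import numpy as np
import cv2

# Step 2: 3D corners of a hypothetical 100 x 100 planar object (same units as
# your measurement, e.g. mm), plus the corresponding detected 2D corners.
object_points = np.array([(0, 0, 0), (0, 100, 0), (100, 100, 0), (100, 0, 0)],
                         dtype=np.float64)
image_points = np.array([(200.0, 150.0), (210.0, 300.0),
                         (360.0, 305.0), (355.0, 145.0)], dtype=np.float64)

# Step 1: intrinsic and distortion matrices from calibration (placeholders here).
camera_matrix = np.array([[700.0, 0.0, 320.0],
                          [0.0, 700.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

# Step 3: pose of the object relative to the camera.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)

R, _ = cv2.Rodrigues(rvec)         # 3x3 rotation matrix
euler_deg = cv2.RQDecomp3x3(R)[0]  # Euler angles in degrees, easier to read off
print(euler_deg)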
Your first step is good -- everything after that becomes way way way more complicated than necessary (if I understand correctly).
Don't think of it as 'learning,' just think of it as a reference. Every time you're in a particular position where you DON'T know the angle, take a picture, and find the reference picture that looks most like it. Guess it's THAT angle. You're done! (There may well be indeterminacies, maybe the relationship isn't bijective, but that's where I'd start.)
You can consider this a 'nearest-neighbor classifier,' if you want, but that's just to make it sound better. Measure a simple distance (Euclidean! Why not!) between the uncertain picture, and all the reference pictures -- meaning, between the raw image vectors, nothing fancy -- and choose the angle that corresponds to the minimum distance between observed, and known.
If this isn't working -- and maybe, do this anyway -- stop throwing away so much information! You're stripping things down, then trying to re-estimate them, propagating error all over the place for no obvious (to me) benefit. So when you do a nearest neighbor, reference pictures and all that, why not just use the full picture? (Maybe other elements will change in it? That's a more complicated question, but basically, throw away as little as possible -- it should all be useful in, later, accurately choosing your 'nearest neighbor.')
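A toy sketch of that nearest-neighbour lookup; the file names are placeholders, and the reference and query images are assumed to be the same size:
import numpy as np
import cv2

# Reference pictures taken at known angles (placeholder file names / angles).
reference = [(-45.0, cv2.imread("ref_neg45.png", cv2.IMREAD_GRAYSCALE)),
             (0.0,   cv2.imread("ref_0.png", cv2.IMREAD_GRAYSCALE)),
             (45.0,  cv2.imread("ref_pos45.png", cv2.IMREAD_GRAYSCALE))]

query = cv2.imread("unknown.png", cv2.IMREAD_GRAYSCALE)

# Plain Euclidean distance between the raw image vectors -- nothing fancy.
def distance(a, b):
    return np.linalg.norm(a.astype(np.float32).ravel() - b.astype(np.float32).ravel())

best_angle = min(reference, key=lambda pair: distance(pair[1], query))[0]
print(best_angle)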
Another option that is rather easy to implement, especially since you've already done part of the job, is the following (I've used it to compute the orientation of a cylindrical part from 3 images acquired while the tube was rotating):
Threshold each image to isolate the surface.
Find the four corners with cv2.approxPolyDP(); alternatively, you could find the four sides of your part with LineSegmentDetector (available from OpenCV 3).
Compute the angle alpha, as depicted on the image hereunder
When your part is rotating, this angle alpha will follow a sine curve. That is, you will measure alpha(theta) = A sin(theta + B) + C. Given alpha you want to know theta, but first you need to determine A, B and C.
Since you've acquired many "calibration" or reference images, you can use all of these to fit a sine curve and determine A, B and C.
Once this is done, you can determine theta from alpha.
Notice that you have to deal with the ambiguity of the sine, sin(Pi - a) = sin(a). It is not a problem if you acquire more than one image sequentially; if you have a single static image, you have to use an extra mechanism to resolve it.
Hope I'm clear enough; the implementation really shouldn't be a problem given what you have done already. A rough sketch of the fit and the inversion follows.
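Something along these lines, with made-up (theta, alpha) calibration pairs standing in for your measurements:
import numpy as np
from scipy.optimize import curve_fit

# Known angles theta (radians) of the reference images and the alpha measured
# in each one -- the values below are placeholders.
theta_ref = np.radians([-60.0, -30.0, 0.0, 30.0, 60.0])
alpha_ref = 0.5 * np.sin(theta_ref + 0.2) + 0.1

def model(theta, A, B, C):
    return A * np.sin(theta + B) + C

(A, B, C), _ = curve_fit(model, theta_ref, alpha_ref, p0=(1.0, 0.0, 0.0))

# Invert alpha = A*sin(theta + B) + C for a new measurement
# (up to the sine ambiguity discussed above).
alpha_new = 0.3
theta_est = np.arcsin(np.clip((alpha_new - C) / A, -1.0, 1.0)) - B
print(np.degrees(theta_est))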
I have an image captured by an Android camera. Is it possible to calculate the depth of an object in the image? The image contains an object and background only. Any suggestions, explanations or links that you think can help me would be appreciated.
OpenCV is the library you need.
I did some depth identification of water levels against a pure white background a few days ago. Generally, if you want to identify the depth, you can convert the problem into identifying the edge where the colors change. In this case, you can convert the color pictures to grey and identify the changes at the white-black-grey interface. OpenCV is capable of doing the job at high speed.
Hope it helps. Let me know if you need further help.
Edits:
If you want to find the actual depths, you need to project the coordinate system of your pictures to the real world, or vice versa. To do it, you have to know a fixed location as your reference and the relationship between pixels and real distances.
What I did was find the fixed location and set it as zero. Afterwards, I measured the length of an object in the picture and also counted the number of pixels it spans, which gave me the relationship between pixels and real distances.
Note that these procedures may involve errors in the identification. I did it very carefully and the error was acceptable in my case.
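As a tiny illustration of that pixel-to-distance step (the numbers are made up):
# A reference object of known length fixes the scale of the image plane.
known_length_cm = 10.0   # measured length of the reference object
pixel_length = 412.0     # the same object's extent in the image, in pixels
cm_per_pixel = known_length_cm / pixel_length

# Any other pixel measurement in roughly the same plane can then be converted.
water_level_pixels = 135.0
water_level_cm = water_level_pixels * cm_per_pixel
print(water_level_cm)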
With only one image, accurate depth estimation is nearly impossible. However, there are various methods of estimating depth under certain assumptions or given the camera calibration matrix. As mentioned by @WenlongLiu, OpenCV is a very good place to start.
I'd like to determine the position and orientation of a stereo camera relative to its previous position in world coordinates. I'm using a bumblebee XB3 camera and the motion between stereo pairs is on the order of a couple feet.
Would this be on the correct track?
Obtain rectified image for each pair
Detect/match feature points between the rectified images
Compute Fundamental Matrix
Compute Essential Matrix
Thanks for any help!
Well, it sounds like you have a fair understanding of what you want to do! Having a pre-calibrated stereo camera (like the Bumblebee) will deliver up point-cloud data when you need it - but it also sounds like you basically want to use the same images to perform visual odometry (certainly the correct term) and provide absolute orientation from a last known GPS position, when the GPS breaks down.
First things first - I wonder if you've had a look at the literature for some more ideas: As ever, it's often just about knowing what to google for. The whole idea of "sensor fusion" for navigation - especially in built up areas where GPS is lost - has prompted a whole body of research. So perhaps the following (intersecting) areas of research might be helpful to you:
Navigation in 'urban canyons'
Structure-from-motion for navigation
SLAM
Ego-motion
Issues you are going to encounter with all these methods include:
Handling static vs. dynamic scenes (i.e. ones that change purely based on the camera motion - c.f. others that change as a result of independent motion occurring in the scene: trees moving, cars driving past, etc.).
Relating amount of visual motion to real-world motion (the other form of "calibration" I referred to - are objects small or far away? This is where the stereo information could prove extremely handy, as we will see...)
Factorisation/optimisation of the problem - especially with handling accumulated error along the path of the camera over time and with outlier features (all the tricks of the trade: bundle adjustment, ransac, etc.)
So, anyway, pragmatically speaking, you want to do this in python (via the OpenCV bindings)?
If you are using OpenCV 2.4 the (combined C/C++ and Python) new API documentation is here.
As a starting point I would suggest looking at the following sample:
/OpenCV-2.4.2/samples/python2/lk_homography.py
This provides a nice instance of basic ego-motion estimation from optic flow using the function cv2.findHomography.
Of course, this homography H only applies if the points are co-planar (i.e. lying on the same plane under the same projective transform - so it'll work on videos of nice flat roads). BUT - by the same principle we could use the Fundamental matrix F to represent motion in epipolar geometry instead. This can be calculated by the very similar function cv2.findFundamentalMat.
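For example, a bare-bones sketch of estimating F between two frames with a current OpenCV build; the file names, feature detector and thresholds are arbitrary choices here:
import numpy as np
import cv2

# Two frames from one camera of the stereo pair at successive positions
# (placeholder file names).
img1 = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)

# Detect and match features (ORB + brute-force Hamming is one simple choice).
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Robust Fundamental matrix estimate with RANSAC.
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)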
Ultimately, as you correctly specify above in your question, you want the Essential matrix E - since this is the one that operates in actual physical coordinates (not just mapping between pixels along epipoles). I always think of the Fundamental matrix as a generalisation of the Essential matrix by which the (inessential) knowledge of the camera intrinsic calibration (K) is omitted, and vice versa.
Thus, the relationships can be formally expressed as:
E = K'^T F K
So, you'll need to know something of your stereo camera calibration K after all! See the famous Hartley & Zisserman book for more info.
You could then, for example, use the function cv2.decomposeProjectionMatrix to decompose the Essential matrix and recover your R orientation and t displacement.
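Continuing the sketch above, one way to go from F to R and t (cv2.recoverPose is an OpenCV 3+ convenience that handles the cheirality check; in OpenCV 2.4 you would decompose E by hand via SVD):
# Placeholder intrinsics K; with both halves of the rig sharing the same
# calibration, E = K^T F K (the K'^T F K relation above with K' = K).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

E = K.T @ F @ K
inliers1 = pts1[inlier_mask.ravel() == 1]
inliers2 = pts2[inlier_mask.ravel() == 1]
retval, R, t, pose_mask = cv2.recoverPose(E, inliers1, inliers2, K)

# R and t give the relative orientation and a unit-scale displacement between
# the two views; the known stereo baseline can then fix the real-world scale.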
Hope this helps! One final word of warning: this is by no means a "solved problem" for the complexities of real world data - hence the ongoing research!