3D Reconstruction from several images - python

I'm sorry if someone has already answered this question, but I've looked at every page on the Internet without finding the right answer to my problem. I need to reconstruct a 3D model from multiple 2D images. The catch is that I already have the images and I don't have any information about the camera; I only know that a single camera was rotated around the object. In order to reconstruct the 3D shape of the object I need to establish the camera matrix, but I have no idea how to do it. I'm using ORB feature detection, treating two images like a stereo pair to establish correspondences and find the fundamental matrix and the homography, but I can't proceed to find the camera parameters. I'm using Python and OpenCV. Thanks in advance.

It's been a while since you asked this question, but you can extract the focal length of an image from its Exif tags. This only works if the images you have are in JPEG format. The optical center can be approximated as (width/2, height/2); here is good material on the subject: http://phototour.cs.washington.edu/focal.html
Reading Exif data can be done with a plethora of packages/libraries; one example in Python is https://pypi.org/project/ExifRead/
Note: the Exif focal length is given in mm, so you may have to convert it to pixels beforehand using the sensor width (also encoded in an Exif tag):
F(pixels) = F(mm) × ImageWidth(pixels) / SensorWidth(mm)
Once you have the focal length and Cx, Cy determined, you can fit them into the camera matrix K and proceed via SfM/MVS or stereo reconstruction, depending on the images you have at hand.
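As a minimal sketch of that recipe (the exact tag names and the 35 mm-equivalent fallback are assumptions; check which tags your JPEGs actually carry):

import exifread
import numpy as np
import cv2

def approximate_K(jpeg_path):
    img = cv2.imread(jpeg_path)
    h, w = img.shape[:2]
    with open(jpeg_path, "rb") as f:
        tags = exifread.process_file(f)
    # 'EXIF FocalLengthIn35mmFilm' lets us skip the physical sensor width:
    # F(pixels) = F(35mm-equivalent) / 36.0 * ImageWidth, since a full-frame sensor is 36 mm wide
    f35 = tags.get("EXIF FocalLengthIn35mmFilm")
    if f35 is None:
        raise ValueError("no usable focal length tag in this image")
    f_pixels = float(str(f35)) / 36.0 * w
    cx, cy = w / 2.0, h / 2.0  # principal point approximated at the image center
    K = np.array([[f_pixels, 0, cx],
                  [0, f_pixels, cy],
                  [0, 0, 1]], dtype=np.float64)
    return K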

Related

Apply bundle adjustment to rectify images globally in the context of image stitching in python/openCV

I am trying to perform image registration on potentially hundreds of aerial images taken from a camera mounted on a UAV. I think it is safe to assume that I know the ordering of the images, and that sequential images overlap.
I have read some papers suggesting that using a CNN to find the homography matrix can vastly outperform the old-school feature-descriptor-matching-with-RANSAC song and dance. My issue is that I don't quite understand how to stitch more than two images together. It seems to me that to register image 100 in the same coordinate frame as image 1 using cv2.warpPerspective, I would warp I100 by the composed chain H1*H2*H3*...*H99. Even if the error in each transform is small, after 100 applications it seems like it would be huge. My understanding is that the solution to this problem is bundle adjustment.
I have looked into bundle adjustment a little bit but I'm struggling to see how exactly I can use it. I have read the paper that many related Stack Overflow posts suggest, "Automatic Panoramic Image Stitching using Invariant Features". In the section on bundle adjustment, if I understand correctly, the authors suggest that after building the initial panorama it is likely that image A will eventually overlap with multiple other images. Using the matched feature points in any images that overlap with A, they basically calculate some adjustment...? I think to image A?
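For concreteness, the chaining I have in mind looks something like this (variable names are made up; the worry is that small errors compound with every multiplication):

import numpy as np

# Hypothetical pairwise homographies: H_list[0] maps image 2 into the frame of image 1,
# H_list[1] maps image 3 into the frame of image 2, and so on. Composing the whole list
# maps the last image into the frame of image 1, accumulating the error of every pair.
def compose_chain(H_list):
    H_total = np.eye(3)
    for H in H_list:
        H_total = H_total @ H
    return H_total

# H_1_100 = compose_chain([H12, H23, ..., H99_100])
# out = cv2.warpPerspective(I100, H_1_100, (maxWidth, maxHeight))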
My question is using openCV how do I apply this adjustment? Let's say I have 3 images I1, I2, I3 all overlapping for a minimal example.
# assuming the CNN model predicts the pairwise transforms
# I think the first step is to find the homographies between all overlapping images
H12 = cnnMod.predict(I1, I2)
H13 = cnnMod.predict(I1, I3)
H23 = cnnMod.predict(I2, I3)
outI2 = cv2.warpPerspective(I2, H12, (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
outI3 = cv2.warpPerspective(I3, H23, (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
# now would I do some bundle voodoo?
# what would it look like?
# which of the bundler classes should I use?
# would it look like this? or maybe the input is features?
voodoo = cv2.bundleVoodoo([H12, H13, H23])  # made-up placeholder, not a real OpenCV function
globallyRectifiedI2 = cv2.warpPerspective(outI2, voodoo[2], (maxWidth, maxHeight), flags=cv2.INTER_LINEAR)
The code is my best guess at what a solution might look like but clearly I have no idea what I am doing. I've not been able to find anything that actually shows how the bundle adjustment is done.
The basic idea underlying image alignment through bundle adjustment is that, rather than matching pairs of 2D points (x, x') across pairs of images, you posit the existence of 3D points X that, ideally, project onto matched tuples of 2D points (x, x', x'', ...) across the corresponding tuples of images. You then solve for the locations of the X's and the camera parameters (extrinsics, and intrinsics if the camera is uncalibrated) that minimize the (usually robustified) RMS reprojection error over all 2D points and all images.
Depending on your particular setup and scene, you may make some simplifying assumptions, e.g.:
That the X's all belong to the same plane (which you can arbitrarily choose as the world's Z=0 plane). This is useful, for example, when stitching images of a painting, or aerial images on relatively flat ground with relatively small extent so one can ignore the earth's curvature.
Or that the X's are all on the WGS84 ellipsoid.
Both the above assumptions remove one free coordinate from X, effectively reducing the problem's dimensionality.
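To make the objective concrete, here is a minimal sketch using scipy.optimize.least_squares, assuming a single known intrinsic matrix K and angle-axis camera poses; it only illustrates the cost being minimized, it is not OpenCV's stitching-pipeline bundle adjuster:

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, K, observations, n_cams, n_pts):
    # params packs [rvec | tvec] for every camera, followed by the 3D points X
    cam_params = params[:n_cams * 6].reshape(n_cams, 6)
    points3d = params[n_cams * 6:].reshape(n_pts, 3)
    residuals = []
    for cam_idx, pt_idx, xy in observations:      # xy is the observed 2D point
        R = Rotation.from_rotvec(cam_params[cam_idx, :3]).as_matrix()
        t = cam_params[cam_idx, 3:]
        p = K @ (R @ points3d[pt_idx] + t)        # project X into this camera
        residuals.append(p[:2] / p[2] - xy)       # reprojection error in pixels
    return np.concatenate(residuals)

# x0 stacks the initial camera poses and 3D points; observations is a list of
# (camera_index, point_index, observed_xy) triples built from your feature matches.
# result = least_squares(reprojection_residuals, x0, loss='huber',
#                        args=(K, observations, n_cams, n_pts))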

How to Transform Centroid from Pixel to Real World Coordinates

I am working on an application using an IFM 3D camera to identify parts prior to a robot pickup. Currently I am able to find the centroid of these objects using contours from a depth image and from there calculate the center point of these objects in pixel space.
My next task is to transform the 2D centroid coordinates into a 3D point in 'real' space. I am able to train the robot such that its coordinate frame is either at the center of the image or at the traditional (0,0) point of an image (top left).
The 3D camera I am using provides both an intrinsic and extrinsic matrix. I know I need to use some combination of these matrices to project my centroid into three space but the following questions remain:
My current understanding from googling is the intrinsic matrix is used to fix lens distortion (barrel and pinhole warping, etc.) whereas the extrinsic matrix is used to project points into the real world. Is this simplified assumption correct?
How can a camera supply a single extrinsic matrix? I know traditionally these matrices are found using the checkerboard corners method but are these not dependent on the height of the camera?
Is the solution as simple as taking the 3x4 extrinsic matrix and multiplying it by a 3x1 matrix [x, y, 1], and if so, will the returned values be relative to the camera center or to the traditional (0,0) point of the image?
Thanks in advance for any insight! Also if it's any consolation I am doing everything in python and openCV.
No. I suggest you read the basics in Multiple View Geometry by Hartley and Zisserman, freely available on the web. Depending on the camera model, the intrinsics contain different parameters; for the pinhole camera model, these are the focal length and the principal point.
The only reason you might be able to transform your 2D centroid directly to 3D is that you are using a 3D camera. Read the manual of the camera; it should explain how the relation between 2D and 3D coordinates is defined for your specific model.
If you have only image data, you can only compute a 3D point from at least two views.
No, of course not. Please don't be lazy; start by reading the basics of camera projection instead of asking others to explain fundamentals that are written down everywhere on the web and in the literature.
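That said, if the camera turns out to follow a plain pinhole model and the depth image gives you the depth at the centroid pixel, back-projecting with the intrinsics looks roughly like the sketch below (an illustration under those assumptions, not the vendor's documented procedure):

import numpy as np

def pixel_to_camera_frame(u, v, depth, K):
    # Back-project pixel (u, v) with known depth into the camera coordinate frame:
    # X_cam = depth * K^-1 * [u, v, 1]^T
    uv1 = np.array([u, v, 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

# To express the point in the robot/world frame you would then apply the extrinsics,
# e.g. X_world = R.T @ (X_cam - t) if [R|t] maps world coordinates into camera coordinates.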

How getPerspectiveTransform and warpPerspective work? [Python]

I couldn't find a clear explanation of how getPerspectiveTransform and warpPerspective work in OpenCV, specifically in Python. My understanding of the methods is:
Given 4 points from a source image and 4 new points, getPerspectiveTransform returns a (3, 3) matrix that somehow crops the image when passed to warpPerspective. I thought that the 4 source points form a quadrilateral on the image which is removed/cropped, and that this cropped region is then fitted between the 4 new points. I also saw that warpPerspective takes the size of the output image, so I inferred that if the max height/width computed from the new points (imagining them as corners of a rectangle or quadrilateral) is less than the provided width or height, the remaining area would be left blank, i.e. black or white. But this wasn't the case: when the width/height calculated from the new points is less than the provided width and height, the remaining space is filled with part of the source image, essentially the region outside the 4 source points.
I wasn't able to comprehend this behavior...
So am I interpreting these methods incorrectly? If so, please provide the correct interpretation.
PS: I'm pretty new to OpenCV, and it would be great if someone could explain the underlying math used by getPerspectiveTransform and warpPerspective.
Thanks in advance.
These functions are part of an image processing concept called geometric transformations.
When taking a picture in real life, there is always some geometric distortion, which can be removed using geometric transformations. They have other applications too, including construction of mosaics, geographical mapping, and stereo and video processing.
Here's an example from this site :
So basically warpPerspective transforms the source image into the desired version of it, and it does the job using a 3x3 transformation matrix given by getPerspectiveTransform.
See more details here.
Now if you wonder how to find those corresponding sets of 4 points in the source and destination images, you should look at another image processing concept called feature extraction. These are methods that find distinctive regions of an image, which you can match to another image of the same object taken from a different view (check SIFT, SURF, ORB, etc.).
An example of matched features:
So warpPerspective won't just crop your image; it will transform the whole image (not just the region specified by the 4 dots) based on the transformation matrix, and those dots are only used to find the correct matrix.
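A small illustration of that point (the corner coordinates here are made up): the whole source image goes through the warp, and the 4 correspondences only determine the matrix.

import cv2
import numpy as np

img = cv2.imread("document.jpg")  # hypothetical input image
# 4 points in the source image and where they should land in the output
src = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
dst = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])
M = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography mapping src -> dst
# Every pixel of img is mapped through M; pixels that land outside the 300x300
# canvas are cut off, and canvas pixels not covered by the source are filled with black.
warped = cv2.warpPerspective(img, M, (300, 300), flags=cv2.INTER_LINEAR)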

How to generate a 3D image based on ChArUco calibration of two 2D images

I'm currently extracting the calibration parameters of the two cameras in a stereo vision setup via cv2.aruco.calibrateCameraCharucoExtended(). I'm using the cv2.undistortPoints() and cv2.triangulatePoints() functions to convert any pair of corresponding 2D points into a 3D point coordinate, which works perfectly fine.
I'm now looking for a way to convert the 2D images, which can be seen under approach 1, to one 3D image. I need this 3D image because I would like to determine the order of these cups from left to right, to correctly use the triangulatePoints function. If I determine the order of the cups from left to right purely based on the x-coordinates of each of the 2D images, I get different results for each camera (the cup on the front left corner of the table for example is in a different 'order' depending on the camera angle).
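For reference, the undistort-and-triangulate step I mentioned looks roughly like this (variable names are made up; rvec1/tvec1 and rvec2/tvec2 are assumed to express each camera's pose relative to the same board, so the result lives in that common frame):

import cv2
import numpy as np

def triangulate_pair(pts1, pts2, K1, dist1, K2, dist2, rvec1, tvec1, rvec2, tvec2):
    # pts1, pts2: the same physical point(s) seen by camera 1 and camera 2, shape (N, 1, 2)
    norm1 = cv2.undistortPoints(pts1, K1, dist1)  # normalized coordinates, distortion removed
    norm2 = cv2.undistortPoints(pts2, K2, dist2)
    # With normalized points, the projection matrices reduce to [R|t] of each camera
    P1 = np.hstack([cv2.Rodrigues(rvec1)[0], np.asarray(tvec1).reshape(3, 1)])
    P2 = np.hstack([cv2.Rodrigues(rvec2)[0], np.asarray(tvec2).reshape(3, 1)])
    Xh = cv2.triangulatePoints(P1, P2, norm1.reshape(-1, 2).T, norm2.reshape(-1, 2).T)
    return (Xh[:3] / Xh[3]).T  # N x 3 points in the shared (board) coordinate frame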
Approach 1: Keypoint Feature Matching
I was first thinking about using a keypoint feature extractor like SIFT or SURF, so I tried some keypoint extraction and matching. I tried both the Brute-Force matcher and the FLANN-based matcher, but the results are not really good:
Brute-Force
FLANN-based
I also tried to swap the images, but it still gives more or less the same results.
Approach 2: ReprojectImageTo3D()
I looked further into the issue and I think I need the cv2.reprojectImageTo3D() [docs] function. However, to use this function, I first need the Q matrix, which has to be obtained with cv2.stereoRectify [docs]. This stereoRectify function in turn expects a couple of parameters that I'm able to provide, but there are two I'm confused about:
R – Rotation matrix between the coordinate systems of the first and the second cameras.
T – Translation vector between the coordinate systems of the cameras.
I do have the rotation and translation matrices for each camera separately, but not between them. Also, do I really need to do this stereoRectify step all over again when I have already done a full ChArUco calibration and already have the camera matrix, distortion coefficients, rotation vectors and translation vectors?
Some extra info that might be useful
I'm using 40 calibration images of the ChArUco board per camera. I first extract all corners and markers, after which I estimate the calibration parameters with the following code:
(ret, camera_matrix, distortion_coefficients0,
 rotation_vectors, translation_vectors,
 stdDeviationsIntrinsics, stdDeviationsExtrinsics,
 perViewErrors) = cv2.aruco.calibrateCameraCharucoExtended(
    charucoCorners=allCorners,
    charucoIds=allIds,
    board=board,
    imageSize=imsize,
    cameraMatrix=cameraMatrixInit,
    distCoeffs=distCoeffsInit,
    flags=flags,
    criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_COUNT, 10000, 1e-9))
The board parameter is created with the following settings:
CHARUCO_BOARD = aruco.CharucoBoard_create(
    squaresX=9,
    squaresY=6,
    squareLength=4.4,
    markerLength=3.5,
    dictionary=ARUCO_DICT)
Thanks a lot in advance!

Determine The Orientation Of An Image

I am trying to determine the orientation of the following image. The images are given at random, between 140x140 and 150x150 pixels, with no EXIF data. Is there a method to classify each image as 0, 90, 180 or 270 degrees, so that when I get an image in a particular orientation I can match it against my predefined images? I've looked into feature matching with OpenCV using the following tutorial, and it works correctly: it identifies the images as the same no matter their orientation, but I have no clue how to tell the orientations apart.
I've looked into feature matching with opencv using the following tutorial, and it works correctly
So you could establish a valid match between an image of unknown rotation and an image in your database? And the latter one is of a known rotation (i.e. upright)?
In this case you can compute a transformation matrix:
either a homography which defines a full planar transformation (use cv::findHomography)
or an affine transform, which expresses translation, rotation and scaling and thus seems best for your needs (use cv::estimateRigidTransform with fullAffine=false to restrict it to those degrees of freedom); a sketch follows below. You can find more about affine transformations here
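A possible sketch of that second option, using the newer cv2.estimateAffinePartial2D (which estimates the same restricted transform) and reading the rotation angle out of the 2x3 matrix; src_pts and dst_pts are assumed to be matched keypoint coordinates from your feature-matching step:

import cv2
import numpy as np

def estimated_rotation_degrees(src_pts, dst_pts):
    # src_pts, dst_pts: matched keypoint coordinates (N x 2 float arrays) from the
    # unknown-rotation image and the known upright reference image.
    M, inliers = cv2.estimateAffinePartial2D(src_pts, dst_pts, method=cv2.RANSAC)
    # For a similarity transform, M = [[s*cos(a), -s*sin(a), tx], [s*sin(a), s*cos(a), ty]]
    angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
    # Snap to the nearest of 0/90/180/270 for this use case
    return (int(round(angle / 90.0)) % 4) * 90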
If you don't have any known image, then this task seems mathematically unsolvable, but you could try something like an artificial-neural-network-based heuristic, which sounds like a very research-intensive project.
If you have the random image somewhere (say, you're trying to match a certain image against a list of images you have), you could try taking the difference between your random image and each known image four times, rotating the known image by 90 degrees each time. Whichever difference is closest to zero should give you the orientation you want.
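A rough sketch of that idea (it assumes the two images are the same size and square, so the rotations line up pixel for pixel):

import cv2
import numpy as np

def best_rotation(unknown, known):
    # Try the known image at 0/90/180/270 degrees and keep the rotation whose
    # pixel-wise difference to the unknown image is smallest.
    best_angle, best_score = 0, float("inf")
    rotated = known
    for angle in (0, 90, 180, 270):
        score = float(np.sum(cv2.absdiff(unknown, rotated)))
        if score < best_score:
            best_angle, best_score = angle, score
        rotated = cv2.rotate(rotated, cv2.ROTATE_90_CLOCKWISE)
    return best_angle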
If your new image and the images in the list are the same size, you might also be able to just compare the keypoint distances (if the image is a match but all the keypoints are rotated a quadrant clockwise from each other, then it's 90 degrees off, and so on).
If you have no idea what the random image is supposed to be, I can't really think of any way to figure it out, unless you know for sure that a blob of light blue is supposed to be the sky. As far as I know, there has to be something that you know to be up in order to determine what up is.
