I am developing a robot based on StereoPI. I have successfully calibrated the cameras and obtained a fairly accurate depth map. However, I am unable to convert my depth map to a point cloud so that I can obtain the actual distance of an object. I have been trying to use cv2.reprojectImageTo3D, but with no success. May I ask if there is a tutorial or guide that teaches how to convert a disparity map to a point cloud?
I am trying very hard to learn and find reliable sources, but to no avail. Thank you very much in advance.
By calibrating your cameras you compute their interior orientation parameters (IOP, or intrinsic parameters). To compute the XYZ coordinates from the disparity you also need the exterior orientation parameters (EOP).
If you want your point cloud relative to the robot position, the EOP can be simplified; otherwise, you need to take into account the robot's position and rotation, which can be retrieved with a GNSS receiver and an inertial measurement unit (IMU). Note that it is very likely that such data will need to be processed with a Kalman filter.
Then, assuming you have both (i) the IOP and EOP of your cameras and (ii) the disparity map, you can generate the point cloud by intersection. There are several ways to accomplish this; I suggest using the collinearity equations.
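Since the question mentions cv2.reprojectImageTo3D, here is a minimal sketch of that route, assuming the calibration was done with cv2.stereoRectify (the reprojection matrix Q it returns already encodes the intrinsics and the stereo baseline needed for the intersection); the matcher settings and variable names below are placeholders:

```python
import numpy as np
import cv2

def disparity_to_point_cloud(disparity, Q):
    """disparity: float32 disparity map; Q: 4x4 reprojection matrix from cv2.stereoRectify."""
    # Back-project every pixel to (X, Y, Z) in the left camera's frame
    points_3d = cv2.reprojectImageTo3D(disparity, Q)

    # Keep only pixels with a valid disparity
    mask = disparity > disparity.min()
    return points_3d[mask]  # N x 3 array of XYZ coordinates

# Example usage (matcher parameters are placeholders, not tuned values):
# matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
# disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0
# cloud = disparity_to_point_cloud(disparity, Q)
```

The distance to an object is then simply the Z component (or the norm) of its points in the returned cloud.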
Related
I have a point cloud and meshes (vertices=points of the point cloud).
I want to project the point cloud with a certain virtual camera.
Here, since the point cloud is sparse, the rendered result includes the points which should be occluded by foreground objects.
To resolve this issue, I want to use mesh information to identify which points should be occluded.
Is there any smart way to do this in python?
Kind advice will be greatly appreciated.
After hours of searching, I concluded that I would have to implement a new rendering pipeline from scratch to achieve my goal.
So, instead of this, I use a mesh-based renderer to render a depth map.
And then I simply project the points of the point cloud with a projection matrix.
Here, I use the depth map to check whether the point fits with the depth or not.
If the projected point is the one that should be occluded, then the depth of the point would be larger than the depth map value at the corresponding pixel.
So, such points should be ignored while rendering.
I know that this is an inelegant and inefficient trick, but it works very well :)
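For reference, a rough sketch of this depth test might look like the following; the 3x4 projection matrix P, the rendered depth map, and the tolerance eps are assumptions, not the exact code used above:

```python
import numpy as np

def visible_points(points, P, depth_map, eps=1e-2):
    """Keep the points of an N x 3 array that pass the depth test against a rendered depth map."""
    h, w = depth_map.shape

    # Project to homogeneous image coordinates: x = P [X Y Z 1]^T
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    proj = pts_h @ P.T
    depth = proj[:, 2]                  # depth of each point along the viewing axis
    uv = proj[:, :2] / depth[:, None]   # pixel coordinates

    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    # Keep points that land inside the image with positive depth ...
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (depth > 0)

    # ... and whose depth does not exceed the mesh depth at that pixel
    visible = np.zeros(len(points), dtype=bool)
    visible[inside] = depth[inside] <= depth_map[v[inside], u[inside]] + eps
    return points[visible]
```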
I'm currently working on a project to measure the surface of plant leaves. Until now I've successfully implemented an RCNN model to segment individual leaves and also generated a depth map using stereo computer vision, which allows me to calculate distances between any two points.
Now I'm stuck trying to connect everything together in order to calculate the area of a leaf/polygon.
**I have the original RGB images, binary masks containing the leaves, and also the depth information for every pixel.
Can someone please point me in the right direction?**
I reckon the right way would be to use Delaunay triangulation on the polygons in the binary masks and then calculate the surface using the distances between the 3 points of each triangle. I haven't been able to find anything quite similar to my problem implemented in Python.
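Roughly, what I have in mind is something like the sketch below (using SciPy; the back-projected 3D points are placeholders, and I'm aware Delaunay triangulates the convex hull, so triangles falling outside the mask would still need to be discarded):

```python
import numpy as np
from scipy.spatial import Delaunay

def leaf_surface_area(points_3d):
    """points_3d: N x 3 array of (X, Y, Z) for the pixels of one leaf mask,
    e.g. from cv2.reprojectImageTo3D sampled at the mask's pixel locations."""
    # Triangulate in the X-Y plane, then measure each triangle in 3D so the
    # leaf's slope and curvature contribute to the area.
    tri = Delaunay(points_3d[:, :2])
    a = points_3d[tri.simplices[:, 0]]
    b = points_3d[tri.simplices[:, 1]]
    c = points_3d[tri.simplices[:, 2]]
    # Area of a triangle = 0.5 * |(b - a) x (c - a)|
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    return areas.sum()
```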
Thanks so much for your help in advance. I'll upload a picture of an RGB image with the masks plotted.
leaf instance segmentation
Count the pixels inside the outlines (by polygon filling) or use the shoelace formula.
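A minimal sketch of both suggestions, assuming you have the leaf contour as an N x 2 array of (x, y) vertices and the binary mask (pixel_size is a placeholder for whatever real-world size one pixel corresponds to):

```python
import numpy as np

def shoelace_area(contour):
    """Polygon area from an N x 2 array of (x, y) outline vertices (shoelace formula)."""
    x, y = contour[:, 0], contour[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def pixel_count_area(mask, pixel_size=1.0):
    """Area by counting mask pixels; pixel_size is the real-world size of one pixel."""
    return mask.astype(bool).sum() * pixel_size ** 2
```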
I have a photo taken from a camera (whose focal length, principal point, and distortion coefficients I know). The photo has an 8 cm x 8 cm post-it on a table, and the center of the post-it is the origin (0, 0), again in cm. I've also indicated the positive-y axis on the post-it.
From this information is it possible to compute the location of the camera and the vector in which the camera is looking in Python using OpenCV? If someone has a snippet of code that does that (assuming you know the coordinates of the post-it corners already) that would be amazing!
Use OpenCV's solvePnP specifying SOLVEPNP_IPPE_SQUARE in the flags. With only 4 points (and a post-it) the solution will be quite sensitive to how accurately you mark their images, so ask yourself whether you really need the camera pose and location for your application, and how accurately. E.g., if you just want to make a flat CG "sticker" stay fixed on the table while the camera moves, all you need is to estimate a homography, a much simpler task.
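A minimal sketch of that suggestion, assuming you already have the four marked corner pixels, the camera matrix, and the distortion coefficients (the helper name is mine, not part of the answer above):

```python
import numpy as np
import cv2

def camera_pose_from_postit(image_points, camera_matrix, dist_coeffs, side_cm=8.0):
    """image_points: 4 x 2 float array of the post-it corners in the image,
    in the order top-left, top-right, bottom-right, bottom-left."""
    half = side_cm / 2.0

    # 3D corners of the post-it in its own frame (z = 0 plane, centre at the origin),
    # in the corner order that SOLVEPNP_IPPE_SQUARE expects.
    object_points = np.array([[-half,  half, 0],
                              [ half,  half, 0],
                              [ half, -half, 0],
                              [-half, -half, 0]], dtype=np.float64)

    ok, rvec, tvec = cv2.solvePnP(object_points, image_points.astype(np.float64),
                                  camera_matrix, dist_coeffs,
                                  flags=cv2.SOLVEPNP_IPPE_SQUARE)

    # Convert to camera position and viewing direction in post-it coordinates
    R, _ = cv2.Rodrigues(rvec)
    camera_position = -R.T @ tvec                               # in cm
    viewing_direction = R.T @ np.array([[0.0], [0.0], [1.0]])   # camera z-axis in world frame
    return camera_position.ravel(), viewing_direction.ravel()
```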
It does look like you have all the information required. The marker you use can be easily segmented, and shape analysis will provide the corners. I did something similar for basic eye tracking:
Here is a complete example.
Segmentation result for the example:
Please note that accuracy really matters, so it might be useful to rely on several sets of points.
I am working with 3D point clouds acquired from an object and I need to align them into a single global point cloud. I am having a hard time understanding the difference between SLAM and registration, especially since both of them can make use of ICP.
The point clouds have been acquired in spatial and temporal order and have extended overlapping areas; therefore I thought I could use SLAM to align them.
Can anyone clarify this point for me?
Thanks!
anna
Based on your question it sounds like you are working with a depth sensor of some kind moving through an environment. You would like to create a consistent map or point cloud with this sensor. I would have commented this, but my reputation is too low at the moment.
Registration just refers to the alignment of two measurements through some transformation. For image registration this typically means finding some transformation, whether a simple translation or an affine warp, between two images which makes them 'look' similar. Point cloud registration typically refers to finding a rotation and translation which aligns two point clouds.
SLAM, as you probably know, refers to simultaneous localization and mapping. The goal of SLAM is to find the sensor's motion through a scene and map the scene at the same time.
I think the reason you are having a hard time seeing the difference between the two is that, for your application, registration is a way of accomplishing a simple form of SLAM. This is because ICP essentially finds the relative transformation of your depth sensor between two measurements, which acts as odometry for your sensor.
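As an illustration (not something this answer prescribes), a pairwise ICP registration used as odometry could look like the sketch below with Open3D; the distance threshold and the assumption that consecutive scans are roughly pre-aligned are mine:

```python
import numpy as np
import open3d as o3d

def pairwise_icp(source, target, threshold=0.05):
    """source, target: open3d.geometry.PointCloud; returns the 4x4 transform
    that aligns `source` onto `target`."""
    result = o3d.pipelines.registration.registration_icp(
        source, target, threshold, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation

# If T_k aligns scan k onto scan k-1, the pose of scan k in the first scan's
# frame is pose_k = pose_{k-1} @ T_k. Chaining these pairwise transforms is a
# simple, loop-closure-free front end; a full SLAM system would additionally
# detect loop closures and optimise the whole trajectory.
```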
However, registration is not necessarily going to give you a relative sensor pose in all applications. For example a KLT tracker is a form of simple image registration, but it does not give the relative transformation of two cameras directly.
I hope this clears it up.
I'd like to determine the position and orientation of a stereo camera relative to its previous position in world coordinates. I'm using a Bumblebee XB3 camera and the motion between stereo pairs is on the order of a couple of feet.
Would this be on the correct track?
Obtain rectified image for each pair
Detect/match feature points between the rectified images
Compute Fundamental Matrix
Compute Essential Matrix
Thanks for any help!
Well, it sounds like you have a fair understanding of what you want to do! Having a pre-calibrated stereo camera (like the Bumblebee) will deliver point-cloud data whenever you need it - but it also sounds like you basically want to use the same images to perform visual odometry (certainly the correct term) and provide absolute orientation from a last known GPS position when the GPS breaks down.
First things first - I wonder if you've had a look at the literature for some more ideas: As ever, it's often just about knowing what to google for. The whole idea of "sensor fusion" for navigation - especially in built up areas where GPS is lost - has prompted a whole body of research. So perhaps the following (intersecting) areas of research might be helpful to you:
Navigation in 'urban canyons'
Structure-from-motion for navigation
SLAM
Ego-motion
Issues you are going to encounter with all these methods include:
Handling static vs. dynamic scenes (i.e. ones that change purely based on the camera motion - c.f. others that change as a result of independent motion occurring in the scene: trees moving, cars driving past, etc.).
Relating amount of visual motion to real-world motion (the other form of "calibration" I referred to - are objects small or far away? This is where the stereo information could prove extremely handy, as we will see...)
Factorisation/optimisation of the problem - especially with handling accumulated error along the path of the camera over time and with outlier features (all the tricks of the trade: bundle adjustment, RANSAC, etc.)
So, anyway, pragmatically speaking, you want to do this in python (via the OpenCV bindings)?
If you are using OpenCV 2.4 the (combined C/C++ and Python) new API documentation is here.
As a starting point I would suggest looking at the following sample:
/OpenCV-2.4.2/samples/python2/lk_homography.py
Which provides a nice instance of basic ego-motion estimation from optic flow using the function cv2.findHomography.
Of course, this homography H only applies if the points are co-planar (i.e. lying on the same plane under the same projective transform - so it'll work on videos of nice flat roads). BUT - by the same principle we could use the Fundamental matrix F to represent motion in epipolar geometry instead. This can be calculated by the very similar function cv2.findFundamentalMat.
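For example, both estimators take the same matched points, so switching between them is a one-line change; here p0 and p1 are assumed to be tracked feature points (e.g. from cv2.goodFeaturesToTrack followed by cv2.calcOpticalFlowPyrLK, shaped N x 1 x 2, float32):

```python
import cv2

def estimate_motion_models(p0, p1):
    """p0, p1: matched points in two frames (N x 1 x 2 float32 arrays)."""
    # Planar scene (or pure rotation): a homography explains the point motion
    H, h_mask = cv2.findHomography(p0, p1, cv2.RANSAC, 3.0)

    # General scene: the fundamental matrix encodes the epipolar geometry
    F, f_mask = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 3.0, 0.99)
    return H, F
```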
Ultimately, as you correctly specify above in your question, you want the Essential matrix E - since this is the one that operates in actual physical coordinates (not just mapping between pixels along epipoles). I always think of the Fundamental matrix as a generalisation of the Essential matrix in which the (inessential) knowledge of the camera intrinsic calibration (K) is omitted, and vice versa.
Thus, the relationships can be formally expressed as:
E = K'^T F K
So, you'll need to know something of your stereo camera calibration K after all! See the famous Hartley & Zisserman book for more info.
You could then recover your R orientation and t displacement from the Essential matrix - for example via its SVD decomposition, or (in OpenCV 3 and later) with cv2.decomposeEssentialMat or cv2.recoverPose.
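Putting the last two steps together, a rough sketch (assuming OpenCV 3+, a single moving camera so the same K applies on both sides, and the matched points p0, p1 from before) might be:

```python
import cv2

def relative_pose(F, K, p0, p1):
    """Recover the relative rotation R and (unit-scale) translation t between two views."""
    # E = K'^T F K; with one moving camera, K' = K
    E = K.T @ F @ K

    # recoverPose picks the physically valid (R, t) among the four possible
    # decompositions of E via a cheirality check on the matched points.
    _, R, t, mask = cv2.recoverPose(E, p0, p1, K)
    return R, t  # note: from two views alone, t is only known up to scale
```

That unknown scale is exactly where the stereo information mentioned earlier becomes handy, since the calibrated baseline fixes real-world units.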
Hope this helps! One final word of warning: this is by no means a "solved problem" for the complexities of real world data - hence the ongoing research!