I'm trying to find the relative position of the camera to the chessboard (or the other way around). I feel OK with converting between different coordinate systems, e.g. as suggested here. I decided to use the chessboard not only for calibration but for actual position determination as well at this stage, since I can use findChessboardCorners to get the imagePoints (and this works OK).
I've read a lot on this topic and feel that I understand the solvePnP outputs (even though I'm completely new to OpenCV and computer vision in general). Unfortunately, the results I get from solvePnP and from physically measuring the test set-up are different: the translation in the z-direction is off by approx. 25%, and the x and y directions are completely wrong - off by several orders of magnitude and pointing in a different direction than what I've read the camera coordinate system to be (x pointing up the image, y to the right, z away from the camera). The difference persists if I convert tvec and rvec to the camera pose in world coordinates.
My questions are:
What are the directions of camera and world coordinate systems' axes?
Does solvePnP output the translation in the same units as I specify the objectPoints?
I specified the world origin as the first of the objectPoints (one of the chessboard corners). Is that OK and is tvec the translation to exactly that point from the camera coordinates?
This is my code (I attach it pro forma as it does not throw any exceptions etc.). I used grayscale images to get the camera intrinsic matrix and distortion coefficients during calibration, so I decided to perform localisation in grayscale as well. chessCoordinates is a list of chessboard point locations in mm with respect to the origin (one of the corner points). camMatrix and distCoefficients come from calibration (performed using the same chessboard and objectPoints).
import cv2
import numpy as np

camCapture = cv2.VideoCapture(0)  # take a picture of the target to get the imagePoints
ret, tempImg = camCapture.read()
tempImg = cv2.cvtColor(tempImg, cv2.COLOR_BGR2GRAY)

imgPts = []
tgtPts = []

# found_all should be checked before using corners
found_all, corners = cv2.findChessboardCorners(tempImg, chessboardDim)
imgPts.append(corners.reshape(-1, 2))
tgtPts.append(np.array(chessCoordinates, dtype=np.float32))

retval, myRvec, myTvec = cv2.solvePnP(objectPoints=np.array(tgtPts), imagePoints=np.array(imgPts),
                                      cameraMatrix=camMatrix, distCoeffs=distCoefficients)
The camera coordinate system is the same as the image coordinate system: the x axis points to the right as seen from the camera, the y axis points down, and the z axis points in the direction the camera is facing. This is a right-handed system, and the same convention applies to the chessboard, so if you specified the origin in, let's say, the upper right corner of the chessboard, the x axis goes along the longer side to the right and the y axis along the shorter side of the chessboard, and the z axis would point downward, towards the ground.
solvePnP outputs the translation in the same units as the ones in which you specified the length of the chessboard squares in objectPoints; it also uses the camera matrix from calibration, so keep the units consistent with the ones used there.
tvec points to the origin of the world coordinate system in which you placed the calibration object. So if you placed the first object point at (0,0), that's where tvec will point to.
What are the directions of camera and world coordinate systems' axes?
The (0,0,0) corner on the board is chosen so that the X and Y axes point towards the rest of the corner points. The Z axis always points away from the board, which means it usually points roughly in the direction of the camera.
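Not from the original answer, but for concreteness, here is a minimal sketch of how such an objectPoints grid is commonly built (the board dimensions and square size below are placeholders):

import numpy as np

cols, rows = 9, 6          # inner corners per row and column (placeholder)
square_size_mm = 25.0      # chessboard square size (placeholder)

# First corner is (0, 0, 0); all corners lie in the z = 0 plane.
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size_mm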
Does solvePnP output the translation in the same units as I specify the objectPoints?
Yes
I specified the world origin as the first of the objectPoints (one of the chessboard corners). Is that OK and is tvec the translation to exactly that point from the camera coordinates?
Yes, this is pretty common. In most cases, the first chessboard corner is set as (0,0,0) and the subsequent corners are placed in the z = 0 plane (e.g. (1,0,0), (0,1,0), etc.).
The tvec, combined with the rotation, takes you from the board (world) coordinate frame to the camera frame; in other words, tvec is the position of the board origin expressed in camera coordinates. In short, tvec and rvec give you the world -> camera transformation, and with some basic geometry you can invert it to get the camera -> world transformation (the camera pose).
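For example (a minimal sketch, reusing the myRvec and myTvec names from the question's code; not part of the original answer):

import cv2
import numpy as np

R, _ = cv2.Rodrigues(myRvec)      # 3x3 rotation matrix, board (world) -> camera
camera_position = -R.T @ myTvec   # camera centre expressed in board coordinates
camera_rotation = R.T             # camera orientation in the board frame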
Related
I am trying to find the ground-plane coordinates of a contour in an image. I have been able to detect the contour and get its x, y, w and h, but I now need it in a 2D ground-plane coordinate system. I am trying to perform camera calibration to find out data about the camera so that I can compute a homography to get the ground-plane data. However, I am using pre-made videos, so I cannot use the checkerboard solution that I keep finding online. Does anyone have any ideas?
Below is an example image:
For objects far away, the height of the bridge in the foreground is very small compared to their distance from the camera, so a homography computed from points on the bridge will be approximately correct far away, provided the bridge is approximately horizontal (no roll). This is another way of saying that the images of all horizontal planes share the same horizon line.
For objects nearby, but not on the bridge, the above approximation will suffer from parallax error. Unless you have an object of known scale on the plane of interest (the sea), or an estimate of the distance of the bridge from that plane, there is no information available to resolve depth - as far as you can tell the bridge could be 100m or 1mm above the sea.
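Not from the original answer, but to illustrate the approximation described above: if you can identify four image points on the bridge whose ground-plane coordinates you know from some other source (a map, known lane widths, etc. - an assumption here), a plane-to-plane homography maps contour points to approximate ground coordinates:

import cv2
import numpy as np

# Four image points on the bridge (pixels) and their assumed ground-plane
# coordinates in metres -- all values are placeholders.
img_pts = np.float32([[420, 310], [880, 305], [900, 420], [400, 430]])
ground_pts = np.float32([[0, 0], [30, 0], [30, 12], [0, 12]])

H, _ = cv2.findHomography(img_pts, ground_pts)

# Map the bottom-centre of a detected contour's bounding box (x, y, w, h).
x, y, w, h = 500, 350, 40, 60
pixel = np.float32([[[x + w / 2, y + h]]])
ground = cv2.perspectiveTransform(pixel, H)
print(ground)   # approximate ground-plane position of the contour

As noted above, this is only reliable for objects far away or close to the bridge plane; nearby objects on the sea surface will suffer from parallax error.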
So if you take a pinhole camera, make it the origin of our 3D coordinate system, pick a pixel on the image plane, and connect the two with a straight line, you get a vector that has a direction and a length. Think of this as the path followed by the light reflected from an object into the camera lens. This is what I want to calculate, and I think the camera's intrinsic parameters have to be used for it.
Below is a statement that made me think about it all.
In a pinhole camera model, each pixel defines a direction vector in 3D space, specifically the vector from the projection center through the pixel's position on the image plane.
Here is a diagram better explaining this.
I want to calculate the three red lines. The known parameters would be, I guess, the camera position (the origin), the pixel coordinates in the image, and the intrinsic camera parameters.
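A minimal sketch of that computation, assuming the usual 3x3 OpenCV intrinsic matrix K (the values below are placeholders):

import numpy as np

# Hypothetical intrinsics: focal lengths fx, fy and principal point (cx, cy), in pixels.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_ray(u, v, K):
    """Direction of the ray through pixel (u, v), in camera coordinates."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)   # unit vector; the ray is s * d for s > 0

print(pixel_to_ray(100, 200, K))

For a real lens you would first undistort the pixel (e.g. with cv2.undistortPoints), and note that this only gives a direction: the actual length (depth) of the red lines cannot be recovered from a single image without extra information.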
I am trying to come up with a method of converting pixel coordinates to real-world distances and I'm having trouble coming up with the math to "undo" the perspective effects.
My physical setup is a camera facing towards a plate from maybe 2 feet away. The image that the camera would see is diagrammed in the image:
So the overall square is the bounds of the picture, and the dotted lines locate the image centre, marked with an X (this will be the origin of my pixel coordinate grid). The plate has markings at 5 locations which describe a circle and its centre. These locations are visible to the camera (marked P1, P2, P3, P4, Pin), and the radius of the circle they describe is known and labelled R. There is no guarantee that Pin will be overlaid on top of X, and it's quite likely that they won't be aligned perfectly, so I drew it with an offset. I will be assuming that the optical axis is perpendicular to the plate. The dot marked A is a position on the real-world plate (also visible to the camera) with an unknown position. I am trying to determine its radius r from the centre point, Pin.
I am able to extract the 5 points' pixel coordinates from the image with ease, as well as from point A, but the transformation from pixel to real-world coordinate is the part that's being troublesome.
I theorized that I could ignore perspective, find the pixel distance between, say, P1 and P3, equate that to 2R, and get a scaling factor between pixels and real-world distances. This may give alright results, but I want to design the algorithm to handle perspective effects, where locations further away from the optical axis appear smaller than those in the centre of the image.
If useful, I am able to design a calibration test where Pin is centred over X, the distance between the plate and the camera is measured, and then using the P points and R determine the angle to pixel relationship of the camera.
Would anyone be able to help me develop this algorithm or point me to some resources that may help? I have discovered projection matrices used in 3D rendering, but I need to go in the other direction, and none of the resources deal with points on a plane like I have.
I will likely be writing this in python but that isn't required just yet.
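One possible way to set this up (a sketch under stated assumptions, not a finished algorithm): if P1..P4 sit at the ends of two perpendicular diameters, so their plate coordinates relative to Pin are (0, R), (R, 0), (0, -R) and (-R, 0) - an assumption about the marking layout - then a pixel-to-plate homography removes the perspective, and r is just the Euclidean distance between the mapped A and the mapped Pin:

import cv2
import numpy as np

R_mm = 50.0   # known circle radius (placeholder value)

# Pixel coordinates of P1..P4, Pin and A, as extracted from the image (placeholders).
px = np.float32([[512, 180], [690, 366], [500, 540], [322, 350]])
pin_px = np.float32([[[505, 358]]])
pA = np.float32([[[600, 300]]])

# Assumed plate coordinates of P1..P4 (ends of two perpendicular diameters).
plate = np.float32([[0, R_mm], [R_mm, 0], [0, -R_mm], [-R_mm, 0]])

H = cv2.getPerspectiveTransform(px, plate)
A_plate = cv2.perspectiveTransform(pA, H)[0, 0]
pin_plate = cv2.perspectiveTransform(pin_px, H)[0, 0]

r = np.linalg.norm(A_plate - pin_plate)   # radius of A from the centre, in mm
print(r)

Because the homography is estimated from the plate points themselves, this also removes the need for the optical axis to be exactly perpendicular to the plate, as long as A and the P points all lie in the plate plane.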
At the moment I am working on a computer vision project in which I am trying to create a 3D marker tracking system with stereo vision. I am able to calibrate both cameras at the same time using a chessboard pattern, and after that both cameras are stereo calibrated. Once the calibration is complete, the markers are tracked in both camera videos, and the points are undistorted and rectified. The last step is OpenCV's triangulation function applied to both marker positions. This determines the z coordinate with good enough precision. However, when an object gets closer to the camera lens, the measured distance between the markers changes because of the pinhole principle; since body segment lengths should not change, this leads to invalid results.
When comparing how the size of the two markers changes as they get closer, it seems that the change in size correlates precisely with the behaviour of the z coordinate. With this knowledge there should be a way to correct the x and y position when something gets closer to or further away from the lens.
My question is: is there a function in OpenCV which can adjust the scaling of the x and y coordinates according to the changes along the z axis? Otherwise, is there a reliable method of correcting the x and y coordinates?
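I am not aware of a single OpenCV function that applies this correction, but if the x and y being compared are still pixel coordinates while z comes from triangulation (an assumption on my part), the pinhole model itself gives the rescaling. A minimal sketch with placeholder intrinsics:

import numpy as np

# Rectified intrinsics of the reference camera -- placeholder values; take yours
# from the projection matrix P1 returned by cv2.stereoRectify.
fx, fy = 700.0, 700.0
cx, cy = 640.0, 360.0

def pixel_to_metric(u, v, Z):
    """Back-project a rectified pixel (u, v) with known depth Z (pinhole model)."""
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return X, Y, Z

Note that the points returned by cv2.triangulatePoints, once divided by their homogeneous coordinate (e.g. with cv2.convertPointsFromHomogeneous), are already metric in all three axes, so distances computed from those 3D coordinates should not show this scaling effect.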
I'm doing something similar to the tutorial here: http://docs.opencv.org/3.1.0/d7/d53/t... regarding pose estimation. Essentially, I'm creating an axis in the model coordinate system and using ProjectPoints, along with my rvecs, tvecs, and cameraMatrix, to project the axis onto the image plane.
In my case, I'm working in the world coordinate space, and I have an rvec and tvec telling me the pose of an object. I'm creating an axis using world coordinate points (which assumes the object wasn't rotated or translated at all), and then using projectPoints() to draw the axes on the object in the image plane.
I was wondering if it is possible to eliminate the projection and get the world coordinates of those axes once they've been rotated and translated. To test, I've applied the rotation and translation to the axis points manually, and then used projectPoints to project them onto the image plane (passing an identity rotation and a zero translation), but the results seem way off. How can I eliminate the projection step and just get the world coordinates of the axes once they've been rotated and translated? Thanks!
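A hedged sketch of that manual step, assuming rvec and tvec are the model-to-camera transform returned by solvePnP (as in the linked tutorial), so the transformed points end up expressed in the camera frame; the unit axis length is a placeholder:

import cv2
import numpy as np

axis = np.float32([[0, 0, 0],    # origin
                   [1, 0, 0],    # end of the X axis
                   [0, 1, 0],    # end of the Y axis
                   [0, 0, 1]])   # end of the Z axis

R, _ = cv2.Rodrigues(rvec)                        # 3x3 rotation matrix from rvec
axis_cam = (R @ axis.T).T + tvec.reshape(1, 3)    # axis points expressed in the camera frame

# Projecting the transformed points with no further rotation or translation should
# reproduce the direct projection of `axis` with rvec and tvec.
projected, _ = cv2.projectPoints(axis_cam, np.zeros(3), np.zeros(3), cameraMatrix, None)

If the results from this still look off, the usual culprits are the shapes of rvec and tvec (e.g. (3,1) vs (1,3)) or adding tvec before rotating instead of after.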