How to calculate an OpenCV camera projection matrix in Python

I am trying to calculate projection_matrix using OpenCV 2.4 in Python 2.7 for my camera (I am using a PS Eye). I need it for cv2.triangulatePoints(). I already did the calibration using cv2.calibrateCamera() (using calibrate.py from the OpenCV examples), so I have rms, camera_matrix, dist_coefs, rvecs and tvecs.
But I have a problem actually calculating projection_matrix from these parameters (I did not find any Python examples online).
PS: Do I have to calibrate every PS Eye camera? I have 3 and I would like to track an object in 3D space.

If you have only one camera, the projection matrix should be equal to the camera_matrix. There is only one complication.
The cv2.triangulatePoints is defined to work with 2 views from 2 different cameras.
Documentation also states that
The function reconstructs 3-dimensional points (in homogeneous
coordinates) by using their observations with a stereo camera.
Projections matrices can be obtained from stereoRectify().
So yes, you have to calibrate each camera and calibrate every pair of cameras in order to retrieve each camera matrix and the rotation matrix and the translation vector from one camera to the "main camera".
For a given pair of cameras, with K1 and K2 their camera matrices, the following holds.
The projection matrix of the main camera (the camera whose frame is the world reference system) is
P1 = K1*[I | z]
where I is the 3x3 identity matrix and z is a zero vector (0, 0, 0) appended as the fourth column.
You can picture [I | z] as something like
1 0 0 0
0 1 0 0
0 0 1 0
If R is the rotation matrix between the 2 cameras and t the translation vector from the first camera to the second, the second projection matrix is
P2 = K2*[R | t]
In Python, if you cannot obtain the matrices from stereoRectify, one way to build them manually is
import numpy as np
# K is the 3x3 camera matrix, R the 3x3 rotation and t the 3x1 translation column vector
P = np.concatenate((np.dot(K, R), np.dot(K, t)), axis=1)  # equivalent to P = K*[R | t]
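To connect this back to the original question, here is a minimal sketch (not a drop-in solution) of how the two projection matrices could be assembled and fed to cv2.triangulatePoints, assuming K1 and K2 come from cv2.calibrateCamera, R and t come from a pairwise calibration such as cv2.stereoCalibrate, and pts1/pts2 are matching 2xN pixel coordinates; all names are illustrative:
import numpy as np
import cv2

# Assumed inputs (illustrative names):
# K1, K2     : 3x3 intrinsic matrices from cv2.calibrateCamera
# R, t       : 3x3 rotation and 3x1 translation from camera 1 to camera 2
# pts1, pts2 : 2xN arrays of matching pixel coordinates
def triangulate_pair(K1, K2, R, t, pts1, pts2):
    # Camera 1 is the world reference: P1 = K1 * [I | 0]
    P1 = np.dot(K1, np.hstack((np.eye(3), np.zeros((3, 1)))))
    # Camera 2 is offset by (R, t): P2 = K2 * [R | t]
    P2 = np.dot(K2, np.hstack((R, t.reshape(3, 1))))
    # cv2.triangulatePoints returns 4xN homogeneous coordinates
    pts4d = cv2.triangulatePoints(P1, P2, pts1.astype(np.float64), pts2.astype(np.float64))
    return pts4d[:3] / pts4d[3]  # 3xN Euclidean points
With three PS Eye cameras, one would typically pick one as the reference and compute an (R, t) pair for each of the other two relative to it, giving one projection matrix per camera.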

Related

How can I rotate a 2d image using a target image, landmark coordinates, the least squares approach, and a rotation matrix?

I have two 2d images, one is the source image and the other is a target image; I need to rotate the source image to match the target image using python (scikit & numpy). I have 3 landmark coordinates for each image, as follows:
image1_points = [(12,16),(7,4),(25,20)]
image2_points = [(15,22),(1,22),(25,10)]
I believe the following steps are what's needed:
Create rotation matrix using least squares approach using the 3 landmark coordinates
Use the rotation matrix to get theta
Convert theta to degrees (for the angle)
Use the apply_angle method with the angle to rotate the image
I've been trying to use these points and the least squares approach to compute a linear transformation matrix that transforms points from the source to the target image.
I know I need to create a rotation matrix, but having never taken algebra I'm a bit lost. I've done lots of reading, and tried using scipy's built-in procrustes to do an affine transformation below (which may be all wrong).
import numpy as np
import scipy.spatial
from numpy.linalg import norm
from math import atan, cos, radians

m1, m2, d = scipy.spatial.procrustes(target_points, source_points)
a = np.dot(m1.T, m2, out=None) / norm(m1)**2
# separate x and y for the sake of convenience
ref_x = m2[::2]
ref_y = m2[1::2]
x = m1[::2]
y = m1[1::2]
b = np.sum(x*ref_y - ref_x*y) / norm(m1)**2
scale = np.sqrt(a**2 + b**2)
theta = atan(b / max(a.all(), 10**-10))  # avoid dividing by 0
degrees = cos(radians(theta))
apply_angle(source_img, degrees)
However, this is not giving me the result I would expect. It's giving me an angle of around 1 degree, where I would expect around 72 degrees. I suspect that this angle is what I need to pass as the angle parameter to rotate the image.
Any help would be hugely appreciated. Thank you!
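For completeness, a minimal sketch of the least-squares rotation step described in the list above, using the standard Kabsch/Procrustes solution on the three landmark pairs (the function name is illustrative; this is not the asker's code and makes the assumption that image1 is the source and image2 the target):
import numpy as np

def rotation_angle_deg(source_points, target_points):
    # Least-squares (Kabsch) estimate of the 2D rotation taking source to target
    src = np.asarray(source_points, dtype=float)
    dst = np.asarray(target_points, dtype=float)
    # Remove centroids so only rotation (and scale) remains
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # SVD of the cross-covariance gives the optimal rotation
    U, S, Vt = np.linalg.svd(np.dot(src_c.T, dst_c))
    R = np.dot(Vt.T, U.T)
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1, :] *= -1
        R = np.dot(Vt.T, U.T)
    # For a 2x2 rotation matrix the angle is atan2(sin, cos)
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

angle = rotation_angle_deg([(12, 16), (7, 4), (25, 20)],
                           [(15, 22), (1, 22), (25, 10)])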

How to construct camera matrix from known parameters

I wish to project an image taken with a camera, for which I know all parameters (focal length, sensor size, X, Y, Z, rotation (omega, phi, kappa)), onto a 2D plane. I know that I need to construct a camera matrix before being able to do the planar homography, but how?
I've successfully produced a matrix using 4 known pairs of points on each plane following this answer, but it's not the way I want to do it. I've looked at this video that gives me almost all my answers; however, the matrix named "extrinsic parameters" is not entirely described. How should the rotation matrix R and the translation matrix T of the camera position be constructed?
With the final camera matrix in hand, I suppose I will be able to take each parameter and feed them to PIL.Image.transform. I'm also open to using the Python OpenCV library.
Here is some example data:
Original image here (4288 x 2848 pixels)
#Camera position
X: 72003 m
Y: 1070100 m
Z: 1243 m
#Rotation of camera
Omega: 0°
Phi: 27°
Kappa: -38°
Focal length: 26 mm
Pixel size on sensor: 0.00551 mm
The camera matrix P is a 3x4 matrix of the form P = K[R t]:
K is a 3x3 matrix containing the intrinsic parameters (principal point and focal length in pixels)
[R t] is a 3x4 matrix obtained by concatenating R, a 3x3 matrix representing the rotation from the camera frame to the world frame, and t, a 3-vector which represents the position of the origin of the world in the camera frame.
This means that the parameters you have, which seem to be the position of the camera in the world frame, have to be inverted. The inverse of [R t] is [R' t'] where R' = inverse(R) = transpose(R) and t' = -inverse(R)t.
You would first have to know how to compute the 3x3 camera rotation matrix from the three angles you have, and there are many possible representations of a rotation matrix from three angles. The most common are yaw/pitch/roll, and Euler angles with all possible rotation orders.
The 3x3 intrinsics matrix K is [[f 0 cx][0 f cy][0 0 1]], where f = 26/0.00551 = 4719 and (cx,cy) is the principal point, which you can take as the center of the image (4288/2,2848/2).
Then, to compute the homography (3x3 matrix) that maps the plane at world height Z0 to your image, you multiply P by (X, Y, Z0, 1), which gives you an expression of the form X*v1 + Y*v2 + v3, where v1 and v2 are the first two columns of P and v3 = Z0*(third column of P) + (fourth column of P). The matrix H = [v1 v2 v3] is the homography you are looking for. The 8 coefficients for PIL.Image.transform should be the first 8 coefficients of that matrix, divided by the 9th one.
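A rough numpy sketch of those steps with the example data above; the omega-phi-kappa rotation order used here (R = Rz(kappa)·Ry(phi)·Rx(omega)) is only an assumption and should be checked against the convention your data actually uses:
import numpy as np

def rot_x(a):  # rotation about the X axis, angle in radians
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):  # rotation about the Y axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):  # rotation about the Z axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Intrinsics: focal length in pixels, principal point at the image centre
f = 26.0 / 0.00551  # ~4719 px
K = np.array([[f, 0, 4288 / 2.0],
              [0, f, 2848 / 2.0],
              [0, 0, 1.0]])

# Assumed omega-phi-kappa order (camera-to-world); verify against your data
omega, phi, kappa = np.radians([0.0, 27.0, -38.0])
R = rot_z(kappa).dot(rot_y(phi)).dot(rot_x(omega))

# Invert the pose: camera position C in the world becomes [R' | t']
C = np.array([[72003.0], [1070100.0], [1243.0]])
R_inv = R.T
t_inv = -R_inv.dot(C)
P = K.dot(np.hstack((R_inv, t_inv)))  # 3x4 camera matrix

# Homography from the plane Z = Z0 to the image: H = [v1 v2 v3] as described above
Z0 = 0.0
H = np.column_stack((P[:, 0], P[:, 1], Z0 * P[:, 2] + P[:, 3]))
coeffs = (H / H[2, 2]).flatten()[:8]  # 8 coefficients for PIL.Image.transform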

How to triangulate a point in 3D space, given coordinate points in 2 image and extrinsic values of the camera

I'm trying to write a function that, when given two cameras, their rotation and translation matrices, focal point, and the coordinates of a point for each camera, will be able to triangulate the point into 3D space. Basically, I am given all the extrinsic/intrinsic values needed.
I'm familiar with the general idea: to somehow create two rays and find the closest point that satisfies the least squares problem, however, I don't know exactly how to translate the given information to a series of equations to the coordinate point in 3D.
I've arrived a couple years late on my journey. I ran into the exact same issue and found several people asking the same question but never found an answer that was simplified enough for me to understand, so I spent days learning this stuff just so I can simplify it to the essentials and post what I found here for future people.
I'll also give you some code samples at the end to do what you want in python, so stick around.
Some screenshots of my handwritten notes explain the full process: Page 1, Page 2, Page 3.
The equation I start with (the starting projection formula) can be found in https://docs.opencv.org/master/d9/d0c/group__calib3d.html
Once you choose an origin in the real world that is the same for both cameras, you will have two of these equations with the same X, Y, Z values.
Sorry this next part you already have but others might not have gotten this far:
First you need to calibrate your camera which will give you the camera matrix and distortions (intrinsic properties) for each camera.
https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html
You only need those two and can dump the rvecs and tvecs, because these will change when you set up the camera.
Once you choose your real world coordinate system, you can use cv2.solvePnP to get the rotation and translation vectors. To do this you need a set of real world points and their corresponding coordinates in the camera for each camera. My trick was I made some code that would show an image of the field and I would pass in points. Then I would click on locations on the image and create a mapping. The code for this bit is a bit lengthy so I won't share it here unless it is requested.
cv2.solvePnP will give you a rotation vector, so you need to convert it to a 3x3 rotation matrix using the following line:
`R, jac = cv2.Rodrigues(rvec)`
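To make that step concrete, a small sketch of the solvePnP/Rodrigues calls; the point correspondences and intrinsics below are placeholders, not the author's field-mapping code:
import numpy as np
import cv2

# Placeholder world points (your chosen origin/units) and where they appear in the image (pixels)
object_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float32)
image_points = np.array([[322, 410], [612, 408], [598, 610], [310, 615]], dtype=np.float32)

# Placeholder intrinsics; in practice these come from cv2.calibrateCamera
camera_matrix = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
dist_coefs = np.zeros(5)

ret, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coefs)
R, jac = cv2.Rodrigues(rvec)  # 3x3 rotation matrix for this camera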
So now back to the original question:
You have the 3x3 camera matrix for each camera. You have the 3x3 rotation matrix for each camera. You have the 3x1 translation vector for each camera. You have some (u, v) coordinate for where the object of interest is in each camera. The math is explained more in the image of the notes.
import numpy as np

def get_xyz(camera1_coords, camera1_M, camera1_R, camera1_T, camera2_coords, camera2_M, camera2_R, camera2_T):
    # Get the two key equations from camera1
    camera1_u, camera1_v = camera1_coords
    # Put the rotation and translation side by side and then multiply with camera matrix
    camera1_P = camera1_M.dot(np.column_stack((camera1_R, camera1_T)))
    # Get the two linearly independent equations referenced in the notes
    camera1_vect1 = camera1_v * camera1_P[2, :] - camera1_P[1, :]
    camera1_vect2 = camera1_P[0, :] - camera1_u * camera1_P[2, :]
    # Get the two key equations from camera2
    camera2_u, camera2_v = camera2_coords
    # Put the rotation and translation side by side and then multiply with camera matrix
    camera2_P = camera2_M.dot(np.column_stack((camera2_R, camera2_T)))
    # Get the two linearly independent equations referenced in the notes
    camera2_vect1 = camera2_v * camera2_P[2, :] - camera2_P[1, :]
    camera2_vect2 = camera2_P[0, :] - camera2_u * camera2_P[2, :]
    # Stack the 4 rows to create one 4x4 matrix
    full_matrix = np.row_stack((camera1_vect1, camera1_vect2, camera2_vect1, camera2_vect2))
    # The first three columns make up A and the last column is b
    A = full_matrix[:, :3]
    b = full_matrix[:, 3].reshape((4, 1))
    # Solve the overdetermined system. Note b in the Wikipedia article is -b here.
    # https://en.wikipedia.org/wiki/Overdetermined_system
    soln = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(-b)
    return soln
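Calling it is then just a matter of plugging in the calibration products; the arrays below are synthetic placeholders chosen only to show the expected shapes:
# Synthetic placeholder inputs (in practice: calibrateCamera, solvePnP, Rodrigues)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # 3x3 camera matrix
R1, T1 = np.eye(3), np.zeros((3, 1))                          # camera 1 at the world origin
R2, T2 = np.eye(3), np.array([[0.5], [0.0], [0.0]])           # camera 2 shifted sideways
xyz = get_xyz((330, 250), K, R1, T1, (410, 250), K, R2, T2)   # ~[0.0625, 0.0625, 5.0]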
Assume you have two cameras -- camera 1 and camera 2.
For each camera j = 1, 2 you are given:
The distance hj between its center Oj (is "focal point" the right term? Basically the point Oj from which the camera is looking at its screen) and the camera's screen. The camera's coordinate system is centered at Oj, the Oj--->x and Oj--->y axes are parallel to the screen, while the Oj--->z axis is perpendicular to the screen.
The 3 x 3 rotation matrix Uj and the 3 x 1 translation vector Tj which transforms the Cartesian 3D coordinates with respect to the system of camera j (see point 1) to the world-coordinates, i.e. the coordinates with respect to a third coordinate system from which all points in the 3D world are described.
On the screen of camera j, which is the plane parallel to the plane Oj-x-y and at a distance hj from the origin Oj, you have the 2D coordinates (let's say the x,y coordinates only) of point pj, where the two points p1 and p2 are in fact the projected images of the same point P, somewhere in 3D, onto the screens of camera 1 and 2 respectively. The projection is obtained by drawing the 3D line between point Oj and point P and defining point pj as the unique intersection point of this line with the screen of camera j. The equation of the screen in camera j's 3D coordinate system is z = hj, so the coordinates of point pj with respect to the 3D coordinate system of camera j look like pj = (xj, yj, hj) and so the 2D screen coordinates are simply pj = (xj, yj).
Input: You are given the 2D points p1 = (x1, y1), p2 = (x2, y2), the two cameras' focal distances h1, h2, two 3 x 3 rotation matrices U1 and U2, and two 3 x 1 translation column vectors T1 and T2.
Output: The coordinates P = (x0, y0, z0) of point P in the world coordinate system.
One somewhat simple way to do this, avoiding homogeneous coordinates and projection matrices (which is fine too and more or less equivalent), is the following algorithm:
Form Q1 = [x1; y1; h1] and Q2 = [x2; y2; h2] , where they are interpreted as 3 x 1 vector columns;
Transform P1 = U1*Q1 + T1 and P2 = U2*Q2 + T2, where * is matrix multiplication; here a 3 x 3 matrix is multiplied by a 3 x 1 column, giving a 3 x 1 column;
Form the lines X = T1 + t1*(P1 - T1) and X = T2 + t2*(P2 - T2) ;
The two lines from the preceding step 3 either intersect at a common point, which is the point P or they are skew lines, i.e. they do not intersect but are not parallel (not coplanar).
If the lines are skew lines, find the unique point X1 on the first line and the unique point X2 on the second line such that the vector X2 - X1 is perpendicular to both lines, i.e. X2 - X1 is perpendicular to both vectors P1 - T1 and P2 - T2. These two points X1 and X2 are the closest points on the two lines. Then point P = (X1 + X2)/2 can be taken as the midpoint of the segment X1 X2.
In general, the two lines should pass very close to each other, so the two points X1 and X2 should be very close to each other.
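A compact numpy sketch of this midpoint algorithm, assuming U1, U2 are the 3x3 rotation matrices and T1, T2 the 3-vector translations described above (names illustrative):
import numpy as np

def triangulate_midpoint(p1, p2, h1, h2, U1, T1, U2, T2):
    # p1, p2: 2D screen points (x, y); h1, h2: focal distances; U, T: camera-to-world pose
    T1 = np.asarray(T1, dtype=float).ravel()
    T2 = np.asarray(T2, dtype=float).ravel()
    # Step 1: lift the image points onto each camera's screen plane z = h
    Q1 = np.array([p1[0], p1[1], h1], dtype=float)
    Q2 = np.array([p2[0], p2[1], h2], dtype=float)
    # Step 2: transform the screen points to world coordinates
    P1 = U1.dot(Q1) + T1
    P2 = U2.dot(Q2) + T2
    # Step 3: rays X = T + t*(P - T) with direction vectors d1, d2
    d1, d2 = P1 - T1, P2 - T2
    # Closest points: choose t1, t2 so that X1 - X2 is perpendicular to both d1 and d2
    w0 = T1 - T2
    a, b, c = d1.dot(d1), d1.dot(d2), d2.dot(d2)
    d, e = d1.dot(w0), d2.dot(w0)
    denom = a * c - b * b  # zero only if the two rays are parallel
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    X1 = T1 + t1 * d1
    X2 = T2 + t2 * d2
    # Steps 4-5: take the midpoint of the shortest segment between the rays
    return (X1 + X2) / 2.0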

Working out the rotation of a sphere from two points on its surface

I'm working in Nuke and trying to stabilize a spherical panorama shot on a drone with 6 GoPros.
I have been able to 2D-track 2 points on the image (supplied to me as an equirectangular map) and convert them to xyz coordinates in Nuke's 3D space.
Now I have to work out either an expression or a Python script to compute the rotation of the sphere in degrees (x, y, z) on each frame.
I'm ok with python and expressions but am getting quickly out of my depth with the maths.
You need to compute the third vector of an orthonormal frame. It's easy: take the cross product of the vectors defined by the points you're given. Then the rotation will be related to the orthonormal matrix whose rows (or columns) are the three vectors.
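A hedged numpy sketch of that idea: build an orthonormal frame from the two tracked vectors before and after the motion, recover the rotation matrix that maps one frame onto the other, and convert it to per-axis angles. The Euler decomposition below assumes R = Rz·Ry·Rx and should be matched to Nuke's rotation-order setting:
import numpy as np

def frame_from_points(a, b):
    # Orthonormal frame (columns) built from two 3D points on the sphere
    e1 = a / np.linalg.norm(a)
    e3 = np.cross(a, b)
    e3 = e3 / np.linalg.norm(e3)
    e2 = np.cross(e3, e1)
    return np.column_stack((e1, e2, e3))

def sphere_rotation_degrees(a0, b0, a1, b1):
    # Rotation taking tracked points (a0, b0) on one frame to (a1, b1) on the next
    F0 = frame_from_points(np.asarray(a0, dtype=float), np.asarray(b0, dtype=float))
    F1 = frame_from_points(np.asarray(a1, dtype=float), np.asarray(b1, dtype=float))
    R = F1.dot(F0.T)  # maps frame 0 onto frame 1
    # Euler angles assuming R = Rz(rz).Ry(ry).Rx(rx); adjust to Nuke's rotation order
    ry = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    rx = np.arctan2(R[2, 1], R[2, 2])
    rz = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([rx, ry, rz])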

How to calculate 3D object points from 2D image points using stereo triangulation?

I have a stereo-calibrated camera system calibrated using OpenCV and Python. I am trying to use it to calculate the 3D position of image points. I have collected the intrinsic and extrinsic matrices, as well as the E, F, R, and T matrices. I am confused about how to triangulate the 2D image points to 3D object points. I have read the following post but I am confused on the process (In a calibrated stereo-vision rig, how does one obtain the "camera matrices" needed for implementing a 3D triangulation algorithm?). Can someone explain how to get from 2D to 3D? From reading around, I feel that the fundamental matrix (F) is important, but I haven't found a clear way to link it to the projection matrix (P). Can someone please walk me through this process?
I appreciate any help I can get.
If you calibrated your stereo camera, you should have the intrinsics K1, K2 for each camera, and the rotation R12 and translation t12 from the first to the second camera. From these, you can form the camera projection matrices P1 and P2 as follows:
P1 = K1 * [I3 | 0]
P2 = K2 * [R12 | t12]
Here, I3 is the 3x3 identity matrix, and the notation [R | t] means stacking R and t horizontally.
Then, you can use function triangulatePoints (documentation), which implements the sparse stereo triangulation from the two camera matrices.
If you want dense triangulation or depthmap estimation, there are several functions for that. You first need to rectify the two images using stereoRectify (documentation) and then perform stereo matching, for example using StereoBM (documentation).
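For the dense case, a rough sketch of how rectification and block matching might be chained; the calibration variables (K1, d1, K2, d2, R12, t12, the image size and the grayscale images) are assumed to exist already, and StereoBM_create is the OpenCV 3/4 spelling:
import numpy as np
import cv2

# Assumed from stereo calibration: K1, d1, K2, d2, R12, t12 and the image size (w, h)
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, (w, h), R12, t12)

# Undistort/rectify maps for each camera, then remap the grayscale images
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, (w, h), cv2.CV_32FC1)
left_rect = cv2.remap(left_gray, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_gray, map2x, map2y, cv2.INTER_LINEAR)

# Block matching produces a fixed-point disparity map (scaled by 16)
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left_rect, right_rect).astype(np.float32) / 16.0

# Reproject the disparity to a dense 3D point map using the Q matrix from stereoRectify
points_3d = cv2.reprojectImageTo3D(disparity, Q)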
