Stereo vision setup: estimate head pose angles (Python)

I'm looking for a solution to estimate head poses from a stereo camera setup.
I calibrated my cameras via OpenCV and obtained all the matrices needed to compute the disparity of my tracked face points, so that I can get the 3D and 2D coordinates and solve the PnP problem with OpenCV.
The idea was to track the landmarks on both the left and right images (I used the undistorted images via cv.remap(image_L, stereoMap_L)).
On these images I estimated the landmarks, see image.
Landmarks left and right
From the estimated landmarks I computed the disparity of all landmarks.
For the disparity I used this function:
def disparity(points_L, points_R):
    # horizontal (x) disparity between corresponding left and right landmarks
    dispar = points_L[:, 0] - points_R[:, 0]
    return dispar
I followed this paper to estimate all the important matrices and points for my setup:
Three-Dimensional Head Pose Estimation Using a Stereo Camera Arrangement
import cv2
import numpy as np

face_2d_L = np.array(face_2d_L, dtype=np.float64)  # landmarks left, 2D (N x 2)
face_2d_R = np.array(face_2d_R, dtype=np.float64)  # landmarks right, 2D (N x 2)
dispar = disparity(face_2d_L, face_2d_R)

# build [x, y, disparity, 1] rows for reprojection with the Q matrix
face_3d_L = np.hstack([face_2d_L, dispar[:, None], np.ones((len(dispar), 1))])

Q_matrix = np.load('Q_matrix.npy')  # estimated via stereo camera calibration
face_3d_Left = face_3d_L @ Q_matrix.T
face_3d_Left = (face_3d_Left[:, 0:3].T / face_3d_Left[:, 3]).T

cam_matrix_L = np.load('cam_matrix_L.npy')
cam_matrix_R = np.load('cam_matrix_R.npy')
dist_matrix_L = np.load('distorsion_left.npy')
dist_matrix_R = np.load('distorsion_right.npy')

success, rot_vec_L, trans_vec_L = cv2.solvePnP(face_3d_Left, face_2d_L, cam_matrix_L, dist_matrix_L)
To solve the PnP problem I used the 3D points estimated from the landmarks and the disparity, the 2D landmarks, the left camera matrix, and the left distortion coefficients from the stereo calibration.
Then I thought I was more or less done and had all the important matrices to estimate the rotation vectors.
rmat_L, jac_L = cv2.Rodrigues(rot_vec_L)
angles_L, mtxR_L, mtxQ_L, Qx_L, Qy_L, Qz_L = cv2.RQDecomp3x3(rmat_L)
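A quick way to sanity-check the recovered pose would be to reproject the reconstructed 3D points with it and compare them against the tracked 2D landmarks; a minimal sketch reusing the variables from the snippets above:
# sanity-check sketch: reproject the reconstructed 3D points with the recovered pose
proj_L, _ = cv2.projectPoints(face_3d_Left, rot_vec_L, trans_vec_L, cam_matrix_L, dist_matrix_L)
reproj_error = np.mean(np.linalg.norm(proj_L.reshape(-1, 2) - face_2d_L, axis=1))
print("mean reprojection error (px):", reproj_error)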
Unfortunately I think I made some mistakes or misunderstood something.
Perhaps you can help me understand this topic a bit better.
Thank you!

Related

How to place a virtual spherical camera centered inside a 3D object/mesh?

I'm currently working with some 3D models from the ModelNet40 dataset and I'd like to place a virtual spherical camera centered inside the 3D object - which is a 3D triangular mesh - in order to trace rays from the camera and store information related to each triangle that a ray hits.
I've only computed some basic information such as vertices, normals and triangles:
import numpy as np
import open3d as o3d

mesh = o3d.io.read_triangle_mesh(r"...\airplane_0627.off")
print('Vertices:')
print(np.asarray(mesh.vertices),"\n")
print('Triangles:')
print(np.asarray(mesh.triangles),"\n")
print("Computing normal and rendering it:")
mesh.compute_vertex_normals()
print(np.asarray(mesh.triangle_normals))
meshVertices = np.asarray(mesh.vertices)
triangleNormals = np.asarray(mesh.triangle_normals)
meshTriangles = np.asarray(mesh.triangles)
o3d.visualization.draw_geometries([mesh], mesh_show_wireframe=True)
And tried to use the camera positioning from Open3D with some random pose parameters to see where the camera is placed:
vis = o3d.visualization.VisualizerWithKeyCallback()
vis.create_window()
view_ctl = vis.get_view_control()
vis.add_geometry(mesh)
pose = np.array([ 0.23522,    -7.0289e-17, -0.97194,    -129.54,
                  0.97194,    -7.3988e-17,  0.23522,      59.464,
                 -8.8446e-17, -1,            5.0913e-17,    2.11,
                  0,           0,            0,              1]).reshape(4, 4)
cam = view_ctl.convert_to_pinhole_camera_parameters()
cam.extrinsic = pose
view_ctl.convert_from_pinhole_camera_parameters(cam)
vis.run()
vis.destroy_window()
What is the best way to place the camera inside the centre of a 3D object/triangular mesh?
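One option (a sketch, not a definitive answer) is to take the mesh centroid from mesh.get_center() and build the extrinsic so that the camera centre coincides with it; the rotation R below is an arbitrary viewing direction, the file path is reused from the snippet above, and the exact behaviour of convert_from_pinhole_camera_parameters can vary between Open3D versions:
import numpy as np
import open3d as o3d

# sketch: place the visualizer camera at the mesh centroid (same .off file as above)
mesh = o3d.io.read_triangle_mesh(r"...\airplane_0627.off")
mesh.compute_vertex_normals()
center = mesh.get_center()                     # centroid of the mesh vertices

vis = o3d.visualization.Visualizer()
vis.create_window()
vis.add_geometry(mesh)
view_ctl = vis.get_view_control()

cam = view_ctl.convert_to_pinhole_camera_parameters()
R = np.eye(3)                                  # arbitrary viewing direction
extrinsic = np.eye(4)
extrinsic[:3, :3] = R
extrinsic[:3, 3] = -R @ center                 # world-to-camera translation t = -R @ C puts the camera centre at `center`
cam.extrinsic = extrinsic
view_ctl.convert_from_pinhole_camera_parameters(cam)

vis.run()
vis.destroy_window()
From that position, rays can then be cast outwards in different directions to emulate the spherical camera described in the question.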

Camera calibration and point projection from 3D to 2D image

I am trying to project a 3D point from a point cloud onto a 2D panorama image. I have manually picked 8 points on the 2D image and the corresponding 3D point cloud points, and used the OpenCV library to find the camera matrix, translation, and rotation vectors, which are then used in the projection method.
import cv2
import numpy as np

# initial camera matrix (camera_matrix and intrinsic_matrix end up the same here)
camera_matrix = cv2.initCameraMatrix2D([objectPoints_3D], [imagePoints_2D], image_size_2D)
# calibrate the camera
intrinsics_error, intrinsic_matrix, intrinsics_distortion_coeffs, rotation_vectors, translation_vectors = cv2.calibrateCamera([objectPoints_3D], [imagePoints_2D], image_size_2D, camera_matrix, None, flags=cv2.CALIB_USE_INTRINSIC_GUESS)
# project a 3D point in the point cloud onto the 2D image (projectPoints returns the points and the Jacobian)
projected_point_2D, _ = cv2.projectPoints(selected_point_3D, rotation_vectors[0], translation_vectors[0], intrinsic_matrix, intrinsics_distortion_coeffs)
The projected_point_2D seems reasonable for the same image that I used when calibrating, but fails in another set of 2D and 3D images. Am I doing this wrong?
Images contain buildings and trees
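One thing to keep in mind: calibrateCamera only returns extrinsics (rotation_vectors, translation_vectors) for the images it was given, so reusing rotation_vectors[0] / translation_vectors[0] on a different panorama will generally not work. A sketch of estimating a fresh pose for a second image while keeping the intrinsics, where objectPoints_3D_new, imagePoints_2D_new and selected_point_3D_new are hypothetical correspondences and a point picked for that image:
import cv2
import numpy as np

# objectPoints_3D_new (N x 3) / imagePoints_2D_new (N x 2): hypothetical correspondences
# hand-picked on the second panorama and in its point cloud
ok, rvec_new, tvec_new = cv2.solvePnP(objectPoints_3D_new, imagePoints_2D_new,
                                      intrinsic_matrix, intrinsics_distortion_coeffs)

# project a 3D point from the second point cloud using the freshly estimated pose
projected_point_2D_new, _ = cv2.projectPoints(selected_point_3D_new, rvec_new, tvec_new,
                                              intrinsic_matrix, intrinsics_distortion_coeffs)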

Homogeneous transform of point clouds in camera coordinates to world coordinates using OpenCV

I have camera calibration intrinsics and extrinsics (rotations and translations, i.e. rvecs and tvecs) for a set of N camera poses, relative to a fixed ChArUco target.
Additionally, for every camera pose, I have a set of 3D coordinates ("point cloud") defined in the standard OpenCV camera coordinate system.
After a lot of reading, I would have thought that I need to first compute each camera pose relative to the ChArUco board, by constructing a homogeneous transformation matrix like so in Python:
import cv2
import numpy as np

# initialize 4x4 transform
inverted_homogeneous_transform_matrix = np.zeros(shape=(4,4))
inverted_homogeneous_transform_matrix[3,3] = 1.0
# convert Rodrigues vector into a rotation matrix, and then invert (transpose) it
rotation_matrix, _ = cv2.Rodrigues(rvecs)
inverted_rotation = rotation_matrix.transpose()
# add inverted rotation to transform
inverted_homogeneous_transform_matrix[:3,:3] = inverted_rotation
# compute inverted translation, e.g. see http://ksimek.github.io/2012/08/22/extrinsic/
inverted_translation_vector = -inverted_rotation @ np.asarray(tvecs).reshape(3,1)
inverted_homogeneous_transform_matrix[:3,3] = inverted_translation_vector.flatten()
# x_coords, y_coords, z_coords are defined in the camera coordinate system
x_coords = np.asarray([1,2,3,4,5])
y_coords = np.asarray([2,4,6,8,10])
z_coords = np.asarray([3,6,9,12,15])
homogeneous_ones = np.ones(len(x_coords))
homogeneous_points = np.vstack([x_coords, y_coords, z_coords, homogeneous_ones])
# perform the transformation
transformed_points = inverted_homogeneous_transform_matrix @ homogeneous_points
# clean up to extract x, y, z values and save them as 1D arrays
x_coords = np.asarray(transformed_points[0,:]).flatten()
y_coords = np.asarray(transformed_points[1,:]).flatten()
z_coords = np.asarray(transformed_points[2,:]).flatten()
Basically, the above code works, but when I run it on multiple point clouds from different camera perspectives, they do not magically line up as I would expect. I can confirm that my inverse homogeneous transform is indeed the inverse of the homogeneous transform constructed directly from rvecs and tvecs. I have tried breaking it apart into translation first and then rotation, and vice versa, and I have seen a non-inverse transformation make everything almost line up from one perspective (but rotated weirdly from every other perspective).
Any help appreciated!
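For reference, the same camera-to-world mapping can be written more compactly without np.matrix; a sketch assuming rvecs/tvecs describe a single pose and points_cam is an N x 3 array of camera-frame points:
import cv2
import numpy as np

# rvecs, tvecs: one pose from calibration; points_cam: N x 3 camera-frame points (assumed inputs)
R, _ = cv2.Rodrigues(np.asarray(rvecs, dtype=np.float64))
t = np.asarray(tvecs, dtype=np.float64).reshape(3)

T_world_to_cam = np.eye(4)                 # X_cam = R @ X_world + t
T_world_to_cam[:3, :3] = R
T_world_to_cam[:3, 3] = t

T_cam_to_world = np.linalg.inv(T_world_to_cam)

points_h = np.hstack([points_cam, np.ones((len(points_cam), 1))])   # homogeneous N x 4
points_world = (T_cam_to_world @ points_h.T).T[:, :3]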

Depth recognition with opencv2

I cannot fetch a correct disparity map from a couple of simple images, as shown below:
(left image, right image, and the resulting disparity map)
The code:
import cv2
import numpy as np

# frames buffer
frames = []

# image categories
cat_selected = 0
cat_list = ['open']
cat_calib = [np.load('LUMIA_CALIB.npy')]

# load images
def im_load(image, calib):
    frame = cv2.imread(image, 0)
    if calib is not None:
        frame = cv2.undistort(cv2.resize(frame, (640, 480)), *calib[0])
        x, y, w, h = calib[1]
        frame = frame[y : y + h, x : x + w]
    return frame

for idx, im in enumerate(['left', 'right']):
    frames.append(im_load('images/%s/%s.jpg' % (cat_list[cat_selected], im), cat_calib[cat_selected]))
    cv2.namedWindow(im, cv2.cv.CV_WINDOW_NORMAL)
    cv2.imshow(im, frames[idx])
    cv2.imwrite('%s.jpg' % im, frames[idx])

stereo = cv2.StereoBM(1, 16, 15)
disparity = stereo.compute(frames[0], frames[1])

cv2.namedWindow('map', 0)
cv2.imshow('map', cv2.convertScaleAbs(disparity))
cv2.imwrite('disparity.jpg', disparity)

cv2.waitKey(0)
cv2.destroyAllWindows()
Questions
What is wrong with the code and how can I fix it?
What are the effects of the distance between cameras while computing depth?
What is the unit of the members of the disparity matrix's values?
P.S.
The code computes the disparity map for the Tsukuba set of images just fine, though.
I don't know if this is relevant or not but the distance between two cameras is 14.85 cm.
Question 1
You seem to have forgotten to rectify your images from a stereo calibration procedure.
Question 2
The distance between two cameras is called Baseline. In general, the bigger the baseline, the better the accuracy at a distance is, with a sacrifice on field of view. The opposite is true too.
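For intuition: depth follows Z = f * B / d, so with the 14.85 cm baseline mentioned in the question and a hypothetical focal length of 700 px, a disparity of 10 px corresponds to roughly 700 * 0.1485 / 10 ≈ 10.4 m; halving the baseline would halve the depth that the same disparity represents.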
Question 3
The disparity is probably in pixels if you're using Python (I'm no expert in the Python API of OpenCV). In general, if you want distance values from the triangulation (in real-world units), you'll want to use reprojectImageTo3D or its Python equivalent. You will need to first calibrate your stereo rig using a chessboard (knowing the size of the squares).
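A minimal sketch of that last step, written against the current OpenCV Python API and assuming rectified_left / rectified_right (placeholder names) are the rectified grayscale images and Q is the disparity-to-depth matrix from cv2.stereoRectify:
import cv2
import numpy as np

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
raw_disparity = stereo.compute(rectified_left, rectified_right)
# StereoBM returns 16-bit fixed-point disparities scaled by 16
disparity = raw_disparity.astype(np.float32) / 16.0
# per-pixel 3D coordinates, in the same units as the calibration chessboard squares
points_3d = cv2.reprojectImageTo3D(disparity, Q)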

Calculate camera world position with OpenCV Python

I want to calculate my camera's position in world coordinates. This should be fairly easy, but I don't get the results I expect. I believe I've read everything on this topic, but my code isn't working. Here's what I do:
I have a camera looking at an area.
1) I drew a map of the area.
2) I calculated the homography by matching 4 image points to 4 points on my map using cv2.getPerspectiveTransform
3) The H homography transforms every world coordinate to camera coordinate; this is working properly
4) To calculate the camera matrix I followed this:
import numpy as np

# `homography` is the 3x3 matrix from cv2.getPerspectiveTransform in step 2
translation = np.zeros((3,1))
translation[:,0] = homography[:,2]
rotation = np.zeros((3,3))
rotation[:,0] = homography[:,0]
rotation[:,1] = homography[:,1]
rotation[:,2] = np.cross(homography[0:3,0],homography[0:3,1])
cameraMatrix = np.zeros((3,4))
cameraMatrix[:,0:3] = rotation
cameraMatrix[:,3] = homography[:,2]
cameraMatrix = cameraMatrix/cameraMatrix[2][3] #normalize the matrix
5) According to this, the camera's position should be calculated like this:
x,y,z = np.dot(-np.transpose(rotation),translation)
The coordinates I'm getting are totally wrong. The problem should be somewhere in step 4 or 5 I guess. What's wrong with my method?
I think I've got it now. The problem was with the method described in step 4. The camera position cannot be calculated from the homography matrix alone. The camera intrinsics matrix is also necessary. So, the correct procedure is the following:
1) draw a map of the area
2) calibrate the camera using chessboard images with cv2.findChessboardCorners and cv2.calibrateCamera; this yields the camera matrix and the distortion coefficients
3) call solvePnP with the world coordinates (3D) and image coordinates (2D). solvePnP returns the object's origin in the camera's coordinate system, given the 4 corresponding points and the camera matrix.
4) Now I need to calculate the camera's position in world coordinates. The rotation matrix is: rotM = cv2.Rodrigues(rvec)[0]
5) The x,y,z position of the camera is: cameraPosition = -np.matrix(rotM).T * np.matrix(tvec)
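Putting steps 3) to 5) together, a minimal sketch assuming world_points (N x 3), image_points (N x 2), and the camera_matrix / dist_coeffs from the chessboard calibration are already available as NumPy arrays:
import cv2
import numpy as np

# estimate the object pose in the camera frame from known 2D-3D correspondences
ok, rvec, tvec = cv2.solvePnP(world_points, image_points, camera_matrix, dist_coeffs)

rotM, _ = cv2.Rodrigues(rvec)        # rotation: world -> camera
cameraPosition = -rotM.T @ tvec      # camera centre expressed in world coordinates
print(cameraPosition.ravel())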
