Projection of a 3D circle onto a 2D camera image - python

Asked this on math.stackexchange, but no responses so trying here, hopefully the computer vision people are more able to help out.
Assume that I have a 3D circle with a center at (c1, c2, c3) in the circle coordinate frame C. The radius of the circle is r, and there is a unit vector (v1, v2, v3) (also in coordinate frame C) normal to the plane of the circle at the center point.
I have a pinhole camera located at point (k1, k2, k3) in the camera coordinate frame K. I have a known camera-to-circle coordinate frame transformation matrix kTc that transforms any point in C to coordinate frame K so that point_k = np.dot(kTc, point_c) where point_k is a point in the K frame coordinates and point_c is a point in the C frame coordinates. The camera has a known intrinsic camera matrix I.
How do I project the 3D circle onto the image plane of the camera?
Ideally would like to do this in python.

I think you want to use the scipy module in conjunction with numpy. The problem you are solving is transforming the circle into the camera coordinate frame and then projecting it (by rotating and shifting) onto the camera's image plane.
First of all, some information is missing here about the camera plane (phi and/or theta with respect to the origin, or the plane equation) and about the circle (its equation in the circle coordinate frame). Transformations are hard to write down without explicit equations, angles, and plane information. I agree you have given the information as variables, but it helps to have the equations in hand, so the first step would be writing them out (by hand or with Mathematica, etc.).
I will be using this documentation to attempt to answer your question - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html
The following is pseudocode; do not attempt to execute it as is (the ... placeholders need to be filled in with your actual values).
import numpy as np
from scipy.spatial.transform import Rotation
# Define the center of the circle in the circle coordinate frame
c = np.array([c1, c2, c3])
# Define the normal vector of the circle in the circle coordinate frame
n = np.array([v1, v2, v3])
# Define the camera center in the camera coordinate frame
k = np.array([k1, k2, k3])
# Define the intrinsic camera matrix
I = np.array(..., dtype=np.float32)
# Define the rotation matrix part of kTc
R = np.array(..., dtype=np.float32)
# Define the translation vector part of kTc
t = np.array(..., dtype=np.float32)
# Convert R to a scipy Rotation object
R_obj = Rotation.from_matrix(R)  # from_dcm was renamed to from_matrix in scipy 1.4
# Transform the center of the circle to the camera coordinate frame
c_k = R_obj.apply(c) + t
# Project the center of the circle onto the image plane
c_p = I @ c_k
# Normalize the center projection by dividing by its third component
c_p = c_p[:2] / c_p[2]
# Transform the normal vector to the camera coordinate frame
n_k = R_obj.apply(n)
# Project the normal vector onto the image plane
n_p = I @ n_k
# Normalize the normal projection by dividing by its third component
n_p = n_p / n_p[2]
# Calculate the equation of the circle in the image plane
r_squared = r**2 - np.sum(n_p[:2]**2) / n_p[2]**2
And here's an explanation of the equations used in the code:
c = np.array([c1, c2, c3]): The center of the circle in the circle coordinate frame is represented by the vector c = [c1, c2, c3].
n = np.array([v1, v2, v3]): The normal vector of the circle in the circle coordinate frame is represented by the vector n = [v1, v2, v3].
k = np.array([k1, k2, k3]): The camera center in the camera coordinate frame is represented by the vector k = [k1, k2, k3].
I = np.array(..., dtype=np.float32): The intrinsic camera matrix is represented by the 3x3 array I.
R = np.array(..., dtype=np.float32): The rotation matrix part of the kTc transformation matrix is represented by the 3x3 array R.
t = np.array(..., dtype=np.float32): The translation vector part of the kTc transformation matrix is represented by the 3-element array t.
Answer algorithm:
To project the 3D circle onto the image plane of the camera, you'll need to follow these steps:
Transform the 3D circle from the circle coordinate frame (C) to the camera coordinate frame (K) using the transformation matrix kTc. The transformed circle's center will be given by:
center_k = np.dot(kTc, [c1, c2, c3, 1])
Project the transformed circle's center onto the image plane of the camera using the intrinsic camera matrix I. The projection of the center will be given by:
center_p = np.dot(I, center_k[:3])
Transform the normal vector from the circle coordinate frame (C) to the camera coordinate frame (K) using the transformation matrix kTc. The transformed normal vector will be given by:
normal_k = np.dot(kTc[:3, :3], [v1, v2, v3])
Project the transformed normal vector onto the image plane of the camera using the intrinsic camera matrix I. The projection of the normal vector will be given by:
normal_p = np.dot(I, normal_k)
Using the center and normal projections, you can now find the equation of the 2D circle in the image plane of the camera. The equation of the circle in the image plane will be given by:
(x - center_p[0]/center_p[2])^2 + (y - center_p[1]/center_p[2])^2 = r^2 - (normal_p[0]^2 + normal_p[1]^2)/(normal_p[2]^2)
where (x, y) are the image plane coordinates of the projected circle.
Note that you'll need to normalize the center and normal projections by dividing each by their respective third components (center_p[2] and normal_p[2]) to get their actual image plane coordinates.
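Putting those steps together, here is a minimal runnable sketch, assuming kTc is the 4x4 homogeneous transform and I is the 3x3 intrinsic matrix; all numeric values below are illustrative placeholders. Note that under full perspective projection the image of a 3D circle is in general an ellipse, so treat the circle equation above as the approximation this answer proposes.

import numpy as np

# Illustrative inputs: replace with your actual kTc, I, center, normal, and radius
kTc = np.eye(4)                          # 4x4 circle-frame-to-camera-frame transform
I = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])          # 3x3 intrinsic matrix
c = np.array([0.1, 0.2, 2.0])            # circle center (c1, c2, c3) in frame C
n = np.array([0.0, 0.0, 1.0])            # unit normal (v1, v2, v3) in frame C
r = 0.5                                  # circle radius

# Transform the center into the camera frame (homogeneous coordinates)
center_k = kTc @ np.append(c, 1.0)

# Project the center with the intrinsics and normalize by the third component
center_p = I @ center_k[:3]
center_uv = center_p[:2] / center_p[2]

# Rotate the normal into the camera frame (rotation part of kTc only)
normal_k = kTc[:3, :3] @ n

# Project the normal and normalize by its third component
normal_p = I @ normal_k
normal_p = normal_p / normal_p[2]

# Radius term of the image-plane circle equation proposed above
r_squared = r**2 - (normal_p[0]**2 + normal_p[1]**2) / normal_p[2]**2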

Related

Getting all pixels from a 3d image whithin a sphere

Given the sphere center coordinates and radius, and a 3D image, how do I get all pixels within the sphere using numpy in Python?
Assume that for the 3D image we have indicesX, indicesY, indicesZ arrays containing the pixel coordinates along the X, Y, and Z axes, that centerX, centerY, centerZ contain the coordinates of the sphere center, and that r is the radius.
One way you could do this is to construct a boolean mask using the equation for a sphere
mask = (np.square(indicesX - centerX) + np.square(indicesY - centerY) + np.square(indicesZ - centerZ)) <= np.square(r)
indicesX_in_sphere = indicesX[mask]
indicesY_in_sphere = indicesY[mask]
indicesZ_in_sphere = indicesZ[mask]
or you could use the mask directly on the values of the 3D image depending on your needs.
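For example, here is a minimal sketch of how the index arrays can be built with np.indices and the mask applied to the image values; the image shape and sphere parameters below are illustrative.

import numpy as np

# Illustrative 3D image and sphere parameters
image = np.random.rand(64, 64, 64)
centerX, centerY, centerZ, r = 32, 32, 32, 10

# Pixel coordinates along each axis, one array per axis, same shape as the image
indicesX, indicesY, indicesZ = np.indices(image.shape)

# Boolean mask from the sphere equation
mask = (np.square(indicesX - centerX)
        + np.square(indicesY - centerY)
        + np.square(indicesZ - centerZ)) <= np.square(r)

# Either keep the coordinates inside the sphere...
indicesX_in_sphere = indicesX[mask]
# ...or index the image values directly
values_in_sphere = image[mask]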

Projecting a Texture Mask onto an existing 3D Mesh given the camera extrinsics

Given an image mask, I want to project the pixels onto a mesh with respect to the position and orientation of the camera, and convert these pixels into a point cloud. I have the intrinsic and extrinsic parameters of the camera with respect to the world, and the location of the mesh in world coordinates. I know the mapping from world coordinates to the camera image is as follows:
imgpoint = Intrinsic * Extrinsic * worldpoint
So when I want to do the opposite, I use the inverse of the intrinsic and extrinsic matrices:
worldpoint= Intrinsic^(-1) * Extrinsic^(-1) * imgpoint
However, the idea that I had was to obtain two points from one pixel, with different depth values, giving a line, and then look for the closest intersection of that line with the mesh, but I do not know how to properly generate a point away from the original camera plane. How can I find this extra point, and/or am I overcomplicating this problem?
The first equation below shows how to project a point (x, y, z) onto a pixel (u, v):
alpha * [u, v, 1]^T = K * (R * [x, y, z]^T + t)
The extrinsic parameters are the 3x3 rotation matrix R and the translation vector t.
The intrinsic parameters are the focal lengths f_x, f_y and the principal point (c_x, c_y), which form the 3x3 matrix K. The value alpha is the perspective foreshortening term that is divided out.
The second equation reverses the process by describing how to cast a ray from the camera position through the pixel (u, v) out into the scene as the parameter alpha varies from 0 to infinity:
[x, y, z]^T = R^T * (alpha * K^(-1) * [u, v, 1]^T - t)
Now we have converted the problem into a ray casting problem: find the intersection of the ray with your mesh, which is a standard computer graphics problem.
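A minimal sketch of that back-projection in Python, assuming K, R and t are already available as numpy arrays; the intrinsics, pose, pixel, and depth values below are illustrative placeholders.

import numpy as np

# Illustrative intrinsics and world-to-camera extrinsics (replace with calibrated values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 1.0])

u, v = 100.0, 150.0  # pixel of interest

def pixel_to_world(u, v, alpha):
    # Back-project pixel (u, v) at ray parameter alpha into world coordinates
    ray_cam = alpha * np.linalg.inv(K) @ np.array([u, v, 1.0])
    return R.T @ (ray_cam - t)

# Two points along the ray: the camera center (alpha = 0) and a far point
camera_center = pixel_to_world(u, v, 0.0)    # equals -R.T @ t
far_point = pixel_to_world(u, v, 100.0)      # 100 units along the viewing ray

ray_direction = far_point - camera_center
ray_direction /= np.linalg.norm(ray_direction)
# camera_center and ray_direction can now be fed to a standard ray-mesh intersection routine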

Get homography matrix from rotation or orientation matrix

I am getting a rotation matrix and orientation (Euler angles) from a sensor on an Android device. I want to use these in OpenCV for an affine transformation, which uses a homography matrix to do its job. My question is how to convert the rotation matrix or orientation array into a homography matrix that is usable in the affine transformation?
Android code to get rotation matrix and orientation:
final float[] rotationMatrix = new float[9];
SensorManager.getRotationMatrix(rotationMatrix, null, accelerometerReading, magnetometerReading);
final float[] orientationAngles = new float[3];
SensorManager.getOrientation(rotationMatrix, orientationAngles);
OpenCV code for the transform:
homographyMatrix = ...  # to be calculated from the rotation matrix or orientation angles
warped = cv2.warpPerspective(img, homographyMatrix, (cols, 600))
Sample rotation matrix:
[
[-0.39098227, -0.24775778, 0.8864249],
[0.9200034, -0.07699536, 0.38427263],
[-0.026955934, 0.96575755, 0.2580418]
]
Sample euler angles:
[1.3097044 0.0269592 1.97264932]
The input image and the desired transformed result were attached as images (cuts from the left don't matter, I can fix that).
Then I will tile a floor in a segmented image.
A homography matrix is a matrix that projects a point in the 3D world (Pw) to a point in the image plane (Px). For example:
Px = H . Pw
This comes from the original projection equation:
Px = K . (R . Pw + t)
Here K is the camera intrinsic matrix, built from the focal length (f) and the principal point (c), which is usually the center of the image:
K = [
[f, 0, cx],
[0, f, cy],
[0, 0, 1 ],
]
And R is a 3D rotation matrix. To explain this, we assume that the point in the world has z = 0 (it lies on the z = 0 plane). To move it into camera coordinates, the rotation matrix is applied. Because z = 0, the corresponding column of R drops out. The combination of K, the rotation matrix with that column removed, and t (the offset of the point's coordinate frame from the camera frame) is called the homography matrix, as sketched below.
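A minimal sketch of that construction, using the sample rotation matrix from the question; the focal length, principal point, translation, and image below are illustrative placeholders rather than calibrated values.

import numpy as np
import cv2

# Illustrative intrinsics: focal length f and principal point (cx, cy)
f, cx, cy = 500.0, 320.0, 240.0
K = np.array([[f, 0, cx],
              [0, f, cy],
              [0, 0, 1]], dtype=np.float64)

# Sample rotation matrix from the question
R = np.array([[-0.39098227, -0.24775778, 0.8864249],
              [0.9200034, -0.07699536, 0.38427263],
              [-0.026955934, 0.96575755, 0.2580418]])

# Illustrative translation of the plane relative to the camera
t = np.array([0.0, 0.0, 1.0])

# For points with z = 0 the third column of R drops out: H = K . [r1 r2 t]
H = K @ np.column_stack((R[:, 0], R[:, 1], t))
H = H / H[2, 2]  # normalize so the bottom-right element is 1

img = np.zeros((480, 640, 3), dtype=np.uint8)  # replace with your image
rows, cols = img.shape[:2]
warped = cv2.warpPerspective(img, H, (cols, 600))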
TL;DR
Mathematics concepts can be tedious to work through. I found an implementation of a virtual camera (vcam) on GitHub that makes these projections easy:
import cv2
from vcam import vcam, meshGen
img = cv2.imread("chess.png")
H, W = img.shape[:2]
# Create a virtual camera and a planar mesh covering the image
c1 = vcam(H=H, W=W)
plane = meshGen(H, W)
pts3d = plane.getPlane()
# Camera parameters
c1.focus = 390
c1.sx = 101/100
c1.sy = 101/100
c1.KpCoeff[0] = 0
c1.KpCoeff[1] = 0
c1.KpCoeff[2] = 0
c1.KpCoeff[3] = 0
c1.set_tvec(0, 0, -500)
c1.set_rvec(0, 0, -180)
# Project the mesh and remap the image through the virtual camera
pts2d = c1.project(pts3d)
map_x, map_y = c1.getMaps(pts2d)
output = cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
It has an interesting GUI for testing how the parameters affect the output.

Homogeneous transform of point clouds in camera coordinates to world coordinates using OpenCV

I have camera calibration intrinsics and extrinsics (including rotations and translations, i.e. rvecs and tvecs) for a set of N camera poses, relative to a fixed ChArUco target.
Additionally, for every camera pose, I have a set of 3D coordinates ("point cloud") defined in the standard OpenCV camera coordinate system.
After a lot of reading, I would have thought that I need to first compute each camera pose relative to the ChArUCo board, by constructing a homogeneous transformation matrix like so in python:
# initialize 4x4 transform
inverted_homogeneous_transform_matrix = np.zeros(shape=(4,4))
inverted_homogeneous_transform_matrix[3,3] = 1.0
# convert Rodrigues vector into Rodrigues matrix, and then invert it
rotation_matrix = np.zeros(shape=(3,3))
cv2.Rodrigues(rvecs, rotation_matrix)
inverted_rotation = rotation_matrix.transpose()
# add inverted rotation to transform
inverted_homogeneous_transform_matrix[:3,:3] = inverted_rotation
# compute inverted translation, e.g. see http://ksimek.github.io/2012/08/22/extrinsic/
inverted_translation_vector = -np.dot(inverted_rotation, tvecs)
inverted_homogeneous_transform_matrix[:3,3] = np.asarray(inverted_translation_vector).flatten()
# x_coords, y_coords, z_coords are defined in camera coordinate system
x_coords=np.asarray([1,2,3,4,5])
y_coords=np.asarray([2,4,6,8,10])
z_coords=np.asarray([3,6,9,12,15])
homogeneous_ones = np.ones(len(x_coords))
homogeneous_points = np.matrix([x_coords, y_coords, z_coords, homogeneous_ones])
# perform the transformation
transformed_points = inverted_homogeneous_transform_matrix * homogeneous_points
# clean up to extract x,y,z values from matrix and save as 1D array
x_coords = np.asarray(transformed_points[0,:]).flatten()
y_coords = np.asarray(transformed_points[1,:]).flatten()
z_coords = np.asarray(transformed_points[2,:]).flatten()
Basically, the above code works, but when I run it on multiple point clouds from different camera perspectives, they do not magically line up as I would expect. I can confirm that my inverse homogeneous transform is indeed the inverse of the homogeneous transform constructed directly from rvecs and tvecs. I have tried breaking it apart into translation first then rotation, and vice versa, and I have seen a non-inverse transformation make everything almost line up from one perspective (but rotated weirdly from every other perspective).
Any help appreciated!
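For reference, here is a minimal sketch of building the forward board-to-camera transform from a single rvec/tvec pair and checking the hand-built inverse against np.linalg.inv; the pose values below are illustrative placeholders.

import numpy as np
import cv2

# Illustrative pose for one camera view (replace with your calibrated rvec/tvec)
rvec = np.array([[0.1], [-0.2], [0.3]])
tvec = np.array([[0.05], [0.10], [0.50]])

# Forward transform: board/world frame -> camera frame
rotation_matrix, _ = cv2.Rodrigues(rvec)
T = np.eye(4)
T[:3, :3] = rotation_matrix
T[:3, 3] = tvec.flatten()

# Hand-built inverse: camera frame -> board/world frame
T_inv = np.eye(4)
T_inv[:3, :3] = rotation_matrix.T
T_inv[:3, 3] = (-rotation_matrix.T @ tvec).flatten()

# Sanity checks: the two should compose to the identity
assert np.allclose(T @ T_inv, np.eye(4))
assert np.allclose(T_inv, np.linalg.inv(T))

# Transform camera-frame points into the board/world frame
points_cam = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9]], dtype=float)   # N x 3
points_h = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])   # N x 4
points_world = (T_inv @ points_h.T).T[:, :3]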

Aruco Marker World Coordinates

I've been working with Python's OpenCV library, using ArUco for object tracking.
The goal is to get the x/y/z coordinates at the center of the ArUco marker, and the angle in relation to the calibrated camera.
I am able to display axes on the aruco marker with the code I have so far, but cannot find how to get x/y/z coordinates from the rotation and translation vectors (if that's even the right way to go about it).
This is the line of code which defines the rotation/translation vectors:
rvec, tvec, _ = aruco.estimatePoseSingleMarkers(corners, markerLength, camera_matrix, dist_coeffs) # For a single marker
Any ideas on how to get angle/marker position in the camera world?
Thanks!
After some tribulation, I found that the x and y coordinates in aruco can be determined by the average of the corners:
x = (corners[i-1][0][0][0] + corners[i-1][0][1][0] + corners[i-1][0][2][0] + corners[i-1][0][3][0]) / 4
y = (corners[i-1][0][0][1] + corners[i-1][0][1][1] + corners[i-1][0][2][1] + corners[i-1][0][3][1]) / 4
And the angle relative to the camera can be determined via the Rodrigues conversion of the rotation vector; the output matrix must be allocated beforehand:
rotM = np.zeros(shape=(3,3))
cv2.Rodrigues(rvec[i-1], rotM, jacobian = 0)
Finally, yaw, pitch, and roll can be obtained by taking the RQ decomposition of the rotation matrix:
ypr = cv2.RQDecomp3x3(rotM)
As said by chungzuwalla, tvec represents the position of the marker center in the camera coordinate system, and it doesn't change with the rotation of the marker at a given position. If you want to know the location of the corners in the camera coordinate system, then you need both rvec and tvec (see the sketch below).
Here is a perfect explanation
Aruco markers with openCv, get the 3d corner coordinates?
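A minimal sketch of that last point, assuming a single marker pose as returned by aruco.estimatePoseSingleMarkers; the markerLength, rvec, and tvec values below are illustrative. tvec gives the marker center in the camera frame, and the corner positions follow by rotating marker-local corner offsets with rvec and adding tvec.

import numpy as np
import cv2

markerLength = 0.05  # marker side length in meters (illustrative)

# Illustrative pose for one marker (replace with the output of estimatePoseSingleMarkers)
rvec = np.array([[0.1, -0.2, 0.3]])
tvec = np.array([[0.02, 0.01, 0.40]])

# Marker center position in the camera coordinate system is simply tvec
center_cam = tvec.reshape(3)

# Rotation matrix and yaw/pitch/roll angles via the RQ decomposition
rotM, _ = cv2.Rodrigues(rvec)
angles, mtxR, mtxQ, Qx, Qy, Qz = cv2.RQDecomp3x3(rotM)

# Marker corners in the marker's own frame (z = 0 plane), mapped into the camera frame
half = markerLength / 2.0
corners_marker = np.array([[-half,  half, 0.0],
                           [ half,  half, 0.0],
                           [ half, -half, 0.0],
                           [-half, -half, 0.0]])
corners_cam = (rotM @ corners_marker.T).T + tvec.reshape(1, 3)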
