ArUco Marker World Coordinates - Python

I've been working with Python's OpenCV library, using ArUco for object tracking.
The goal is to get the x/y/z coordinates of the center of the ArUco marker, and its angle relative to the calibrated camera.
I am able to display axes on the ArUco marker with the code I have so far, but I cannot figure out how to get the x/y/z coordinates from the rotation and translation vectors (if that's even the right way to go about it).
This is the line of code which defines the rotation/translation vectors:
rvec, tvec, _ = aruco.estimatePoseSingleMarkers(corners, markerLength, camera_matrix, dist_coeffs) # For a single marker
Any ideas on how to get angle/marker position in the camera world?
Thanks!

After some tribulation, I found that the x and y pixel coordinates of the marker center can be determined by averaging the four corners:
x = (corners[i-1][0][0][0] + corners[i-1][0][1][0] + corners[i-1][0][2][0] + corners[i-1][0][3][0]) / 4
y = (corners[i-1][0][0][1] + corners[i-1][0][1][1] + corners[i-1][0][2][1] + corners[i-1][0][3][1]) / 4
The orientation relative to the camera can be determined from the Rodrigues transform of the rotation vector; the output matrix must be allocated beforehand:
rotM = np.zeros(shape=(3,3))
cv2.Rodrigues(rvec[i-1], rotM, jacobian = 0)
Finally, yaw, pitch, and roll can be obtained by taking the RQ decomposition of the rotation matrix:
ypr = cv2.RQDecomp3x3(rotM)
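For reference, a minimal consolidated sketch of that approach, assuming a single calibrated camera and the legacy cv2.aruco interface used in the question (the calibration values and the image path below are placeholders, not part of the original answer):
import cv2
import cv2.aruco as aruco
import numpy as np

# Placeholder calibration -- replace with your own camera_matrix / dist_coeffs
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros((5, 1))
markerLength = 0.05  # marker side length, in the unit you want tvec expressed in

frame = cv2.imread("image_with_marker.jpg")  # placeholder image path
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
corners, ids, _ = aruco.detectMarkers(gray, aruco_dict)

if ids is not None:
    rvec, tvec, _ = aruco.estimatePoseSingleMarkers(
        corners, markerLength, camera_matrix, dist_coeffs)
    for i in range(len(ids)):
        center_px = corners[i][0].mean(axis=0)   # marker centre in pixels
        x, y, z = tvec[i][0]                     # marker centre in camera coordinates
        rotM, _ = cv2.Rodrigues(rvec[i])         # rotation vector -> 3x3 matrix
        ypr = cv2.RQDecomp3x3(rotM)[0]           # yaw/pitch/roll in degrees
        print(center_px, (x, y, z), ypr)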

As chungzuwalla said, tvec represents the position of the marker center in the camera coordinate system, and it doesn't change when the marker rotates in place. If you want to know the location of the corners in the camera coordinate system, you need both rvec and tvec.
Here is a perfect explanation
Aruco markers with openCv, get the 3d corner coordinates?
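A short sketch of that point, assuming rvec, tvec, and markerLength come from estimatePoseSingleMarkers for one marker (the corner ordering below follows the usual ArUco convention; verify it for your OpenCV version):
import cv2
import numpy as np

def corners_in_camera_frame(rvec, tvec, markerLength):
    half = markerLength / 2.0
    # Marker corners in the marker's own frame (the marker lies in its z = 0 plane)
    obj_corners = np.array([[-half,  half, 0.0],
                            [ half,  half, 0.0],
                            [ half, -half, 0.0],
                            [-half, -half, 0.0]])
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=float).reshape(3))
    t = np.asarray(tvec, dtype=float).reshape(3)
    return (R @ obj_corners.T).T + t  # 4x3 array: each corner in camera coordinates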

Related

Projection of a 3D circle onto a 2D camera image

I asked this on math.stackexchange but got no responses, so I'm trying here; hopefully the computer vision people are better able to help.
Assume that I have a 3D circle with a center at (c1, c2, c3) in the circle coordinate frame C. The radius of the circle is r, and there is a unit vector (v1, v2, v3) (also in coordinate frame C) normal to the plane of the circle at the center point.
I have a pinhole camera located at point (k1, k2, k3) in the camera coordinate frame K. I have a known camera-to-circle coordinate frame transformation matrix kTc that transforms any point in C to coordinate frame K so that point_k = np.dot(kTc, point_c) where point_k is a point in the K frame coordinates and point_c is a point in the C frame coordinates. The camera has a known intrinsic camera matrix I.
How do I project the 3D circle onto the image plane of the camera?
Ideally would like to do this in python.
I think you want to use the scipy module in conjunction with numpy. The problem you are solving is the transformation of the circle onto a 2D plane and then transforming that plane (by rotating and shifting) to the camera plane (whichever that is).
First of all, some information is missing here about the camera plane (phi and/or theta w.r.t. the origin, or the equation of the plane) and about the circle (its equation in parametric form). Transformations would be a bit tough without equations, angles, and plane information. I understand you have given the information as variables, but it would help to have the equations at hand, or the first step would be to write them out (by hand, with Mathematica, etc.).
I will be using this documentation to attempt to answer your question - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.transform.Rotation.html
The following is pseudocode; do not attempt to execute it as-is.
import numpy as np
from scipy.spatial.transform import Rotation
# Define the center of the circle in the circle coordinate frame
c = np.array([c1, c2, c3])
# Define the normal vector of the circle in the circle coordinate frame
n = np.array([v1, v2, v3])
# Define the camera center in the camera coordinate frame
k = np.array([k1, k2, k3])
# Define the intrinsic camera matrix
I = np.array(..., dtype=np.float32)
# Define the rotation matrix part of kTc
R = np.array(..., dtype=np.float32)
# Define the translation vector part of kTc
t = np.array(..., dtype=np.float32)
# Convert R to a scipy Rotation object
R_obj = Rotation.from_matrix(R)
# Transform the center of the circle to the camera coordinate frame
c_k = R_obj.apply(c) + t
# Project the center of the circle onto the image plane
c_p = I @ c_k
# Normalize the center projection by dividing by its third component
c_p = c_p[:2] / c_p[2]
# Transform the normal vector to the camera coordinate frame
n_k = R_obj.apply(n)
# Project the normal vector onto the image plane
n_p = I @ n_k
# Normalize the normal projection by dividing by its third component
n_p = n_p / n_p[2]
# Calculate the equation of the circle in the image plane
r_squared = r**2 - np.sum(n_p[:2]**2) / n_p[2]**2
And here's an explanation of the equations used in the code:
c = np.array([c1, c2, c3]): The center of the circle in the circle coordinate frame is represented by the vector c = [c1, c2, c3].
n = np.array([v1, v2, v3]): The normal vector of the circle in the circle coordinate frame is represented by the vector n = [v1, v2, v3].
k = np.array([k1, k2, k3]): The camera center in the camera coordinate frame is represented by the vector k = [k1, k2, k3].
I = np.array(..., dtype=np.float32): The intrinsic camera matrix is represented by the 2D array I.
R = np.array(..., dtype=np.float32): The rotation matrix part of the kTc transformation matrix is represented by the 3x3 array R.
t = np.array(..., dtype=np.float32): The translation vector part of the kTc transformation matrix is represented by the 3-element array t.
Answer algorithm:
To project the 3D circle onto the image plane of the camera, you'll need to follow these steps:
Transform the 3D circle from the circle coordinate frame (C) to the camera coordinate frame (K) using the transformation matrix kTc. The transformed circle's center will be given by:
center_k = np.dot(kTc, [c1, c2, c3, 1])
Project the transformed circle's center onto the image plane of the camera using the intrinsic camera matrix I. The projection of the center will be given by:
center_p = np.dot(I, center_k[:3])
Transform the normal vector from the circle coordinate frame (C) to the camera coordinate frame (K) using the transformation matrix kTc. The transformed normal vector will be given by:
normal_k = np.dot(kTc[:3, :3], [v1, v2, v3])
Project the transformed normal vector onto the image plane of the camera using the intrinsic camera matrix I. The projection of the normal vector will be given by:
normal_p = np.dot(I, normal_k)
Using the center and normal projections, you can now find the equation of the 2D circle in the image plane of the camera. The equation of the circle in the image plane will be given by:
(x - center_p[0]/center_p[2])^2 + (y - center_p[1]/center_p[2])^2 = r^2 - (normal_p[0]^2 + normal_p[1]^2)/(normal_p[2]^2)
where (x, y) are the image plane coordinates of the projected circle.
Note that you'll need to normalize the center and normal projections by dividing each by their respective third components (center_p[2] and normal_p[2]) to get their actual image plane coordinates.
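An alternative, purely numerical way to obtain the projected circle, sketched here under the assumption that kTc is a 4x4 homogeneous matrix and I is the 3x3 intrinsic matrix (the helper below is illustrative and not part of the original answer): sample points on the 3D circle in frame C, map them into frame K with kTc, and project each point through the pinhole model.
import numpy as np

def project_circle_points(c, v, r, kTc, I, n_samples=100):
    # c: circle centre (3,), v: normal vector (3,), both in frame C
    # kTc: 4x4 homogeneous transform C -> K, I: 3x3 intrinsic matrix
    c = np.asarray(c, dtype=float)
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    # Build an orthonormal basis (u1, u2) spanning the circle's plane
    a = np.array([1.0, 0.0, 0.0]) if abs(v[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u1 = np.cross(v, a)
    u1 /= np.linalg.norm(u1)
    u2 = np.cross(v, u1)
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples)
    # 3 x N points on the circle, expressed in frame C
    pts_c = c[:, None] + r * (np.outer(u1, np.cos(theta)) + np.outer(u2, np.sin(theta)))
    # Map into frame K and project through the pinhole model
    pts_k = (kTc @ np.vstack([pts_c, np.ones((1, n_samples))]))[:3]
    proj = I @ pts_k
    return proj[:2] / proj[2]  # 2 x N pixel coordinates of the projected circle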

Projecting a Texture Mask onto an existing 3D Mesh given the camera extrinsics

Given an image mask, I want to project the pixels onto a mesh in respect to the position and orientation of the camera and convert these pixels into a pointcloud. I have the intrinsic and extrinsic parameters of the camera in respect to the world, and the location of the mesh in world coordinates. I know the mapping from world coordinates to camera image is as follow:
imgpoint = Intrinsic * Extrinsic * worldpoint
So when I want to do the opposite, I take the inverse of the intrinsic and extrinsic matrices:
worldpoint = Intrinsic^(-1) * Extrinsic^(-1) * imgpoint
However, my idea was to obtain two points from one pixel with different depth values, form a line from them, and then look for the closest intersection of that line with the mesh. I do not know how to properly generate a point away from the original camera plane, though. How can I find this extra point, and/or am I overcomplicating this problem?
The forward equation shows how to project a point (x, y, z) onto a pixel (u, v):
alpha * [u, v, 1]^T = K * (R * [x, y, z]^T + t)
The extrinsic parameters are the 3x3 rotation matrix R and the translation vector t. The intrinsic parameters are the focal distances f_x, f_y and the principal point (c_x, c_y), which make up the 3x3 matrix K. The value alpha is the perspective foreshortening term that is divided out.
Reversing the process describes a ray cast from the camera position through the pixel (u, v) out into the scene as the parameter alpha varies from 0 to infinity:
[x, y, z]^T = R^(-1) * (alpha * K^(-1) * [u, v, 1]^T - t)
Now we have converted the problem into a ray-casting problem: find the intersection of the ray with your mesh, which is a standard computer graphics problem.
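A minimal sketch of that back-projection, assuming K is the 3x3 intrinsic matrix and R, t the extrinsics from the equations above (with world -> camera mapping x_cam = R @ x_world + t); the mesh-intersection step itself is left to a ray-casting routine from a mesh library:
import numpy as np

def pixel_ray(u, v, K, R, t):
    t = np.asarray(t, dtype=float).reshape(3)
    # Camera centre in world coordinates
    origin = -R.T @ t
    # Direction of the ray through the pixel, expressed in world coordinates
    direction = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    direction /= np.linalg.norm(direction)
    return origin, direction

# A second point "away from the camera plane" is then origin + alpha * direction
# for any alpha > 0; intersecting this ray with the mesh gives the 3D surface point.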

Convert Euler angle between camera & robot coordinate system

My problem is simple, yet confusing, as I have no experience with angles and angle conversions.
Basically, I need to locate the position of an object tagged with a single ArUco marker and then send the 3D coordinate and pose of that object (the marker) to the robot. Note that the robot I use is an industrial model manufactured by ABB, and the 3D coordinate I send has already been converted to the robot coordinate system.
Setting aside the coordinate problem, which I solved using stereo cameras, I found the pose problem much more difficult, especially converting the pose of the ArUco marker w.r.t. the camera into the robot coordinate system. The images below show the two coordinate systems, one for the camera and one for the robot.
The angles I collected from the ArUco marker were converted to Euler angles using methods from the OpenCV library:
def PoseCalculate(rvec, tvec, marker):
    rmat = cv2.Rodrigues(rvec)[0]
    P = np.concatenate((rmat, np.reshape(tvec, (rmat.shape[0], 1))), axis=1)
    # decomposeProjectionMatrix returns Euler angles in degrees
    euler = -cv2.decomposeProjectionMatrix(P)[6]
    eul = np.radians(euler)
    yaw = eul[1, 0]
    pitch = eul[0, 0]
    roll = eul[2, 0]
    return (pitch, yaw, roll)
The result is three angles that represent the pose of the marker: pitch is the rotation of the marker around the camera's X axis, yaw around the camera's Y axis, and roll around the camera's Z axis.
So, how can I convert these three angles to the robot coordinate system?
Thanks for reading this long question, and I wish you all good health in the new year 2021!
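One way to approach the conversion, sketched under the assumption that the fixed camera-to-robot transform (x_robot = R_rc @ x_cam + t_rc) is known from a separate hand-eye or extrinsic calibration that is not shown in the question, is to compose transforms rather than convert the Euler angles directly:
import cv2
import numpy as np

# Placeholders: replace with your actual camera-to-robot calibration
R_rc = np.eye(3)
t_rc = np.zeros(3)

def marker_pose_in_robot_frame(rvec, tvec):
    R_cm, _ = cv2.Rodrigues(rvec)              # marker -> camera rotation
    R_rm = R_rc @ R_cm                         # marker -> robot rotation
    t_rm = R_rc @ np.reshape(tvec, 3) + t_rc   # marker position in robot frame
    return R_rm, t_rm

# Euler angles (in whatever convention the ABB controller expects) can then be
# extracted from R_rm instead of converting the camera-frame angles one by one.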

Reconstructing a flying object's 3D trajectory off a single 2D-video

I am trying to reconstruct the basketball's 3D trajectory, using solely the broadcast feed.
To do this, I had to calculate the homography matrix, so in each frame I successfully tracked the ball and 6 points whose locations are known in the "real world" (4 on the court itself and 2 on the backboard), as seen in the picture.
Using the laws of physics I've also approximated the z-coordinate of the ball in every frame.
Now I want to map the ball's location from 2D pixel coordinates to the real world. The code I have right now (attached below) takes the pixel location (u, v) and height (z) as input and outputs the x, y, z location. It works well for points on the court (meaning z = 0); however, when I need to track something in the air (the ball), the results don't make sense. If anyone can tell me what I need to do to get the mapping right, I would appreciate it a lot.
# Make empty list for ball's 3D location
ball_3d_location = []

# Fixed things
size = frame_list[0].shape
focal_length = size[1]
center = (size[1] / 2, size[0] / 2)
camera_matrix = np.array(
    [[focal_length, 0, center[0]],
     [0, focal_length, center[1]],
     [0, 0, 1]], dtype="double"
)

def groundProjectPoint(image_point, z=0.0):
    camMat = np.asarray(camera_matrix)
    iRot = np.linalg.inv(rotMat)
    iCam = np.linalg.inv(camMat)
    uvPoint = np.ones((3, 1))
    # Image point
    uvPoint[0, 0] = image_point[0]
    uvPoint[1, 0] = image_point[1]
    tempMat = np.matmul(np.matmul(iRot, iCam), uvPoint)
    tempMat2 = np.matmul(iRot, translation_vector)
    s = (z + tempMat2[2, 0]) / tempMat[2, 0]
    wcPoint = np.matmul(iRot, (np.matmul(s * iCam, uvPoint) - translation_vector))
    # wcPoint[2] will not be exactly equal to z, but very close to it
    assert int(abs(wcPoint[2] - z) * (10 ** 8)) == 0
    wcPoint[2] = z
    return wcPoint

dist_coeffs = np.zeros((4, 1))  # Assuming no lens distortion

# The tracked points' coordinates in the "real world"
model_points = np.array([
    (0, 1524 / 2, 0),           # Baseline-sideline
    (0, -244, 0),               # Paint-sideline
    (579, -244, 0),             # Paint-FT
    (579, 1524 / 2, 0),         # Sideline-FT
    (122, -182.9 / 2, 396.32),  # Top left backboard
    (122, 182.9 / 2, 396.32)],  # Top right backboard
    dtype=np.float32)

for i, frame in enumerate(bball_frames):
    f = frame
    # This array has the pixel coordinates of the court & backboard points
    image_points = np.array([f.baseline_sideline,
                             f.paint_sideline,
                             f.paint_ft,
                             f.sideline_ft,
                             f.top_left_backboard,
                             f.top_right_backboard], dtype=np.float32)
    (success, rotation_vector, translation_vector) = cv2.solvePnP(
        model_points, image_points, camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE)
    rotMat, _ = cv2.Rodrigues(rotation_vector)
    # We assume we know the ball's height in each frame due to the laws of physics.
    ball_3d_location += [groundProjectPoint(image_point=ball_2d_location[i],
                                            z=ball_height[i])]
EDIT:
First, I want to clarify the planes of reference:
The video you have is a 2D projection (viewing plane) of the 3D world, as a plane perpendicular to the centerline of the camera lens.
The shot arc is embedded in a plane (shot plane) which is perpendicular to the real-world (3D) floor, defined by the point of release (shooter's hand) and the point of contact (backboard).
The shot arc you see on the video is the projection of that shot plane onto the viewing plane.
I want to make sure we're clear with respect to your most recent comment: "So let's say I can estimate the shooting location on the court (x, y). Using the laws of physics I can say where the ball is in each frame (x, y)-wise, and then from that and the pixel coordinates I can extract the height coordinate?"
You can, indeed, estimate the (x,y) coordinate. However, I would not ascribe my approach to "the laws of physics". I would use analytic geometry.
You can estimate, with good accuracy, the 3D coordinates of both the release point (from the known (x, y, 0) position of the shooter's feet) and the end point on the backboard (whose corners are known).
Drop a perpendicular from each of these points to the floor (z=0). That line on the floor is the vertical projection of the arc to the floor -- these are the (x,y) coordinates of the ball in flight.
For each video frame, drop a projected perpendicular from the ball's image to that line on the floor ... that gives you the (x,y) coordinates of the ball, for what it's worth.
You have the definition (equation) of the view plane, the viewpoint (camera), and the arc plane. To determine the ball's position for each video frame, draw a line from the viewpoint, through the ball's image on the view plane. Determine the intersection of this line with the arc plane. That gives you the 3D coordinates of the ball in that frame.
Does that clarify a useful line of attack?
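A possible sketch of that last step (ray through the ball's pixel intersected with the shot plane), assuming the shot plane has been determined as a point plane_point and unit normal plane_normal in world coordinates (hypothetical names), and reusing camera_matrix, rotMat, and translation_vector from the solvePnP code above:
import numpy as np

def ball_3d_from_pixel(u, v, camera_matrix, rotMat, translation_vector,
                       plane_point, plane_normal):
    # Camera centre and pixel ray direction, expressed in world coordinates
    cam_center = -rotMat.T @ translation_vector.reshape(3)
    direction = rotMat.T @ np.linalg.inv(camera_matrix) @ np.array([u, v, 1.0])
    # Intersect the ray cam_center + s * direction with the shot plane
    denom = plane_normal @ direction
    s = plane_normal @ (plane_point - cam_center) / denom
    return cam_center + s * direction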

Calculate camera world position with OpenCV Python

I want to calculate my camera's position in world coordinates. This should be fairly easy, but I don't get the results I expect. I believe I've read everything on this topic, but my code isn't working. Here's what I do:
I have a camera looking at an area.
1) I drew a map of the area.
2) I calculated the homography by matching 4 image points to 4 points on my map using cv2.getPerspectiveTransform
3) The H homography transforms every world coordinate to camera coordinate; this is working properly
4) To calculate the camera matrix I followed this:
translation = np.zeros((3,1))
translation[:,0] = homography[:,2]
rotation = np.zeros((3,3))
rotation[:,0] = homography[:,0]
rotation[:,1] = homography[:,1]
rotation[:,2] = np.cross(homography[0:3,0],homography[0:3,1])
cameraMatrix = np.zeros((3,4))
cameraMatrix[:,0:3] = rotation
cameraMatrix[:,3] = homography[:,2]
cameraMatrix = cameraMatrix/cameraMatrix[2][3] #normalize the matrix
5) According to this, the camera's position should be calculated like this:
x,y,z = np.dot(-np.transpose(rotation),translation)
The coordinates I'm getting are totally wrong. The problem should be somewhere in step 4 or 5 I guess. What's wrong with my method?
I think I've got it now. The problem was with the method described in step 4. The camera position cannot be calculated from the homography matrix alone. The camera intrinsics matrix is also necessary. So, the correct procedure is the following:
1) draw a map of the area
2) calibrate the camera using chessboard images with cv2.findChessboardCorners and cv2.calibrateCamera; this yields the camera matrix and the distortion coefficients
3) solvePnP with the world coordinates (3D) and image coordinates (2D). solvePnP returns the object's origin in the camera's coordinate system, given the 4 corresponding points and the camera matrix.
4) Now I need to calculate the camera's position in world coordinates. The rotation matrix is: rotM = cv2.Rodrigues(rvec)[0]
5) The x,y,z position of the camera is: cameraPosition = -np.matrix(rotM).T * np.matrix(tvec)
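Putting steps 3-5 together, a minimal sketch (world_points and image_points are placeholder names for the corresponding 3D map coordinates and 2D pixel coordinates; camera_matrix and dist_coeffs come from the chessboard calibration):
import cv2
import numpy as np

# world_points: Nx3 float32 array of map coordinates (z = 0 for a flat map)
# image_points: Nx2 float32 array of the matching pixel coordinates
success, rvec, tvec = cv2.solvePnP(world_points, image_points,
                                   camera_matrix, dist_coeffs)

rotM = cv2.Rodrigues(rvec)[0]      # rotation matrix, world -> camera
cameraPosition = -rotM.T @ tvec    # camera position in world coordinates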
