My problem is simple, yet confusing to me, as I personally have no experience with angles and angle conversions yet.
Basically, I need to locate the position of an object that has a single ArUco marker attached, then send the 3D coordinates and pose of that object (the marker) to the robot. Note that the robot I use is an industrial one manufactured by ABB, and the 3D coordinates I send have already been converted to the robot coordinate system.
Putting the coordinate problem aside, I solved it using stereo cameras. However, I found the pose problem much more difficult, especially converting the pose of the ArUco marker w.r.t. the camera to the robot coordinate system. The images below show the two coordinate systems, one for the camera and one for the robot.
The angles I collected from the ArUco marker were converted to Euler angles using the OpenCV library, as follows:
import cv2
import numpy as np

def PoseCalculate(rvec, tvec, marker):
    # Rotation vector -> 3x3 rotation matrix
    rmat = cv2.Rodrigues(rvec)[0]
    # Build the 3x4 matrix [R | t] expected by decomposeProjectionMatrix
    P = np.concatenate((rmat, np.reshape(tvec, (rmat.shape[0], 1))), axis=1)
    # The 7th output of decomposeProjectionMatrix is the Euler angles (in degrees)
    eul = -cv2.decomposeProjectionMatrix(P)[6]
    pitch = eul[0, 0]  # rotation around the camera X axis
    yaw = eul[1, 0]    # rotation around the camera Y axis
    roll = eul[2, 0]   # rotation around the camera Z axis
    return (pitch, yaw, roll)
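For context, a rough usage sketch with the classic cv2.aruco module API (camera_matrix, dist_coeffs and marker_length are placeholder names, assumed to come from a prior camera calibration and the physical marker size):

import cv2
import cv2.aruco as aruco

def marker_euler_angles(image, camera_matrix, dist_coeffs, marker_length):
    # Example dictionary; use whichever one your markers were generated from
    dictionary = aruco.getPredefinedDictionary(aruco.DICT_6X6_250)
    corners, ids, _ = aruco.detectMarkers(image, dictionary)
    if ids is None:
        return None
    rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(
        corners, marker_length, camera_matrix, dist_coeffs)
    return PoseCalculate(rvecs[0], tvecs[0], marker_length)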
The result is three angles that represent the pose of the marker: pitch is the rotation of the marker around the camera's X axis, yaw around the camera's Y axis, and roll around the camera's Z axis.
So, how can I convert these three angles to the robot coordinate system?
Thanks for reading this long question, and I wish all of you good health in the new year 2021!
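A minimal sketch of one possible direction, assuming the fixed rotation from the camera frame to the robot base frame, R_robot_cam, is known (e.g. from a hand-eye calibration): compose rotation matrices rather than converting the Euler angles directly.

import cv2
import numpy as np

def marker_rotation_in_robot_frame(rvec, R_robot_cam):
    R_cam_marker, _ = cv2.Rodrigues(rvec)        # marker orientation w.r.t. the camera
    R_robot_marker = R_robot_cam @ R_cam_marker  # marker orientation w.r.t. the robot base
    # Decompose into Euler angles (degrees), mirroring PoseCalculate above
    P = np.hstack((R_robot_marker, np.zeros((3, 1))))
    euler = -cv2.decomposeProjectionMatrix(P)[6]
    return euler[0, 0], euler[1, 0], euler[2, 0]  # pitch, yaw, roll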
Related
Given an image mask, I want to project the pixels onto a mesh with respect to the position and orientation of the camera and convert these pixels into a point cloud. I have the intrinsic and extrinsic parameters of the camera with respect to the world, and the location of the mesh in world coordinates. I know the mapping from world coordinates to the camera image is as follows:
imgpoint = Intrinsic * Extrinsic * worldpoint
So when I want to do the opposite, I take the inverse of the intrinsic and extrinsic matrices:
worldpoint = Extrinsic^(-1) * Intrinsic^(-1) * imgpoint
However, the idea I had was to obtain two points from one pixel, with different depth values, to obtain a line, and then look for the closest intersection of that line with the mesh I want, but I do not know how to properly generate a point away from the original camera plane. How can I find this extra point, and/or am I overcomplicating this problem?
The top equation below shows how to project a point (x, y, z) onto a pixel (u, v). The extrinsic parameters are the 3x3 rotation matrix R and the translation t. The intrinsic parameters are the focal lengths f_x, f_y and the principal point (c_x, c_y). The value alpha is the perspective foreshortening term that is divided out.
The bottom equation reverses the process by describing how to project a ray from the camera position through the pixel (u, v) out into the scene as the parameter alpha varies from 0 to infinity.
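In standard pinhole notation (a reconstruction from the description above, with K denoting the intrinsic matrix), the two equations read:

\alpha \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left( R \begin{bmatrix} x \\ y \\ z \end{bmatrix} + t \right), \qquad K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}

\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R^{\top} \left( \alpha\, K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - t \right), \qquad \alpha \in (0, \infty)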
Now we have converted the problem into a ray-casting problem: find the intersection of the ray with your mesh, which is a standard computer graphics problem.
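A minimal NumPy sketch of that back-projection (the ray-mesh intersection itself is left to a standard ray-casting routine; K, R, t are as defined above):

import numpy as np

def pixel_to_world_points(u, v, K, R, t, alphas=(0.1, 10.0)):
    # Back-project the pixel into a direction in camera coordinates
    uv1 = np.array([u, v, 1.0])
    d = np.linalg.inv(K) @ uv1
    # World point for each sampled alpha: X = R^T (alpha * K^-1 [u, v, 1]^T - t)
    return [R.T @ (alpha * d - t) for alpha in alphas]

Two sampled alpha values give two world points, i.e. the line the question asks for.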
Using OpenCV's cv2.stereoCalibrate I have calibrated a pair of cameras, one of them being a time-of-flight camera. So, I have the intrinsic calibration parameters and the essential / fundamental matrix.
Now I would like to project a point from the ToF camera to the 2D camera.
To convert image coordinates to world coordinates in the ToF camera, I did:
p = [(15, 15, 1)]
z = depth[p[0][0], p[0][1]] # measured ToF depth for this single point
# from ToF image coordinates (including ToF depth!) to ToF world coordinates
invR_x_invM_x_uv1 = R_inv * cameraMatrix_inv_3 * p[0]
invR_x_tvec = R_inv * T
wcPoint = (z + invR_x_tvec[2]) / invR_x_invM_x_uv1[2] * invR_x_invM_x_uv1 - invR_x_tvec
wcPoint = wcPoint[:, -1]
So I have the point in world coordinates.
What I do not get is (1) how to transform this point into the world coordinate system of the second camera, and then (2) how to project this point to image coordinates of the second camera. Can anybody point me to an OpenCV function, in particular for (1)?
First, you should get into linear algebra: you can't multiply matrices elementwise, you have to use the matrix (dot) product. The dot product of two matrices A and B is written as either A @ B or A.dot(B) in Python.
If you have a stereo calibration and two fixed cameras, you also have the extrinsic parameters of these cameras. The translation and rotation between these cameras and the world coordinate system (there is only one), which could be located at the position of one of the cameras, enable you to transform your 3D data into the chosen coordinate system. I seriously suggest that you read the basics of camera projection in Hartley/Zisserman, Multiple View Geometry, which is freely available online.
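As a concrete sketch of (1) and (2): assuming R and T are the rotation and translation returned by cv2.stereoCalibrate with the ToF camera passed as the first camera (so they map points from the ToF frame into the second camera's frame), and K2 / dist2 are the second camera's intrinsics, the projection could look like this:

import cv2
import numpy as np

def project_tof_point_to_second_camera(point_tof, R, T, K2, dist2):
    # (1) Change of frame: X_cam2 = R @ X_tof + T
    point_cam2 = R @ np.reshape(point_tof, (3, 1)) + np.reshape(T, (3, 1))
    # (2) Project into the second camera's image; rvec/tvec are zero because
    # the point is already expressed in that camera's coordinate frame.
    img_pts, _ = cv2.projectPoints(point_cam2.reshape(1, 1, 3),
                                   np.zeros(3), np.zeros(3), K2, dist2)
    return img_pts.reshape(2)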
I've been trying to understand the output of the aruco_test.cpp program that is included when you download the Aruco Library.
The output has this format:
22=(236.87,86.4296) (422.581,78.3856) (418.21,228.032) (261.347,228.529) Txyz=0.00813142 -0.0148134 0.140595 Rxyz=-2.14032 0.0777095 0.138929
22 is the unique identifier of the marker, and the next four pairs of numbers are the four corners of the marker. My problem here is with the two vectors Tvec and Rvec.
I've been reading on the Internet that tvec is the translation vector from my camera's center to my object (the marker in this case) and that rvec is the rotation of the object with respect to my camera.
I've got a few questions regarding this:
How can I know the axes of my camera? I mean, is there a way to know where the x, y and z axes are pointing?
How can I get the rotation of the camera from the rotation of the object wrt the camera?
Can someone explain the meaning of the vectors better so I can really understand it? I think my main problem here is that I don't really know what those numbers actually mean.
EDIT: I've been doing some testing to check how the rotation works, and I don't really understand the results:
Moving the camera, marker fixed on the floor:
Initial position: camera looking at the marker; the marker's 'z' axis points toward the camera, 'y' points upwards and 'x' goes to the right: Rxyz=2.40804 -0.0823451 0.23141
Moving the camera on the marker's 'x' axis (tilting the camera up): Rxyz=-1.97658 -0.0506794 -0.020052
Moving the camera on the marker's 'y' axis (inclining the camera to the right): Rxyz=2.74544 -0.118551 -0.973627
Turning the camera 90 degrees (to the right): Rxyz=1.80194 -1.86528 0.746029
Moving the marker instead of the camera, leaving the camera fixed and looking at the marker:
Using the same initial position as in the previous case.
Moving the marker on its 'x' axis: Rxyz=2.23619 -0.0361307 -0.0843008
Moving the marker on its 'y' axis: Rxyz=-2.9065 -0.0291299 -1.13356
Moving the marker on its 'z' axis (a 90° turn to the right): Rxyz=1.78398 1.74161 -0.690203
I've been assuming that each number of the vector is the rotation about the respective axis, but I think I'm assuming wrong, as these values don't make much sense if that were the case.
How can I know the axes of my camera? I mean, is there a way to know where the x, y and z axes are pointing?
This is defined in the OpenCV library: the x-axis increases from left to right of the image, the y-axis increases from top to bottom of the image, and the z-axis increases towards the front of the camera. The image below explains this axis selection.
How can I get the rotation of the camera from the rotation of the object wrt the camera?
rvec is the rotation of the marker with respect to the camera frame. You can convert rvec to a 3x3 rotation matrix using the built-in Rodrigues function. If the marker is aligned with the camera frame, this rotation matrix is the 3x3 identity matrix.
If you get the inverse of this matrix (this is a rotation matrix, so the inverse is the transpose of the matrix), that is the rotation of the camera with respect to the marker.
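A minimal sketch of that inversion (rvec, tvec are assumed to come from an ArUco pose estimate such as estimatePoseSingleMarkers):

import cv2
import numpy as np

def camera_pose_in_marker_frame(rvec, tvec):
    R_marker_to_cam, _ = cv2.Rodrigues(rvec)
    R_cam_to_marker = R_marker_to_cam.T  # inverse of a rotation matrix is its transpose
    t_cam_to_marker = -R_cam_to_marker @ np.reshape(tvec, (3, 1))  # camera position in the marker frame
    return R_cam_to_marker, t_cam_to_marker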
Can someone explain the meaning of the vectors better so I can really understand it? I think my main problem here is that I don't really know what those numbers actually mean.
tvec is the distance from the origin of the camera frame to the center of the detected marker (this is the F_c - P line on the figure). rvec is as described in the answer above.
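In code terms, the camera-to-marker distance is simply the Euclidean norm of tvec (a trivial sketch):

import numpy as np

def camera_to_marker_distance(tvec):
    # Length of the F_c - P segment: norm of the translation vector
    return float(np.linalg.norm(tvec))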
I've been working with Python's OpenCV library, using ArUco for object tracking.
The goal is to get the x/y/z coordinates at the center of the ArUco marker, and the angle in relation to the calibrated camera.
I am able to display axes on the ArUco marker with the code I have so far, but I cannot find out how to get the x/y/z coordinates from the rotation and translation vectors (if that's even the right way to go about it).
This is the line of code which defines the rotation/translation vectors:
rvec, tvec, _ = aruco.estimatePoseSingleMarkers(corners, markerLength, camera_matrix, dist_coeffs) # For a single marker
Any ideas on how to get angle/marker position in the camera world?
Thanks!
After some tribulation, I found that the x and y coordinates of the marker center in the image can be determined by averaging the four corners:
x = (corners[i-1][0][0][0] + corners[i-1][0][1][0] + corners[i-1][0][2][0] + corners[i-1][0][3][0]) / 4
y = (corners[i-1][0][0][1] + corners[i-1][0][1][1] + corners[i-1][0][2][1] + corners[i-1][0][3][1]) / 4
And the angle relative to the camera can be determined from the Rodrigues transform of the rotation vector; the destination matrix must be allocated beforehand:
rotM = np.zeros(shape=(3,3))
cv2.Rodrigues(rvec[i-1], rotM, jacobian = 0)
Finally, yaw, pitch and roll can be obtained by taking the RQ decomposition of the rotation matrix:
ypr = cv2.RQDecomp3x3(rotM)[0]  # the Euler angles (in degrees) are the first element of the returned tuple
As said by chungzuwalla, tvec represents the position of the marker center in the camera coordinate system, and it doesn't change with the rotation of the marker at a given position. If you want to know the location of the corners in the camera coordinate system, you need both rvec and tvec.
Here is a perfect explanation
Aruco markers with openCv, get the 3d corner coordinates?
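For completeness, a minimal sketch of what that linked answer describes: recovering the 3D corner coordinates in the camera frame from rvec, tvec and the physical marker size (marker_length is an assumed name, in the same units as tvec):

import cv2
import numpy as np

def marker_corners_in_camera_frame(rvec, tvec, marker_length):
    half = marker_length / 2.0
    # Corner coordinates in the marker's own frame (the marker lies in its z = 0 plane)
    corners_marker = np.array([[-half,  half, 0.0],
                               [ half,  half, 0.0],
                               [ half, -half, 0.0],
                               [-half, -half, 0.0]])
    R, _ = cv2.Rodrigues(rvec)
    # Rotate, then translate each corner into the camera frame
    return (R @ corners_marker.T).T + np.reshape(tvec, (1, 3))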
I am trying to create a visualization using Python and Mayavi.
The purpose of the visualization is to show a trajectory and camera frustums at different stages along the path.
The thing I struggle with is texturing the camera frustum polygons with actual images.
I am willing to put performance considerations aside for now, and want to find a way to texture a Mayavi-created surface with an image provided as a numpy array.
The most promising suggestions were found there, yet I was unable to construct a surface as I implemented them.
def render_image(self, frustum, timestamp):
    surf = mayavi.mlab.surf(frustum[0, :-1],
                            frustum[1, :-1],
                            frustum[2, :-1],
                            color=(1.0, 1.0, 1.0))
That's the code for the surface creation, where the rows of the numpy array frustum are the x, y, z coordinates respectively, and the last, fifth point is the tip of the pyramid and hence not needed for the mesh.
x [-8.717184671492793, -8.623419637172622, -8.363581977642212, -8.269816943322041]
y [-4.563044562134721, -4.941612408713827, -4.37100415350352, -4.749572000082626]
z [13.614485323873417, 13.703336344550703, 14.059553426925493, 14.148404447602779]
That is an example of function input - four 3D points representing vertices of a desired polygon.
Yet, the surf function fails on that input:
File "/usr/local/lib/python2.7/dist-packages/mayavi/tools/helper_functions.py", line 679, in __call_internal__
aspect_ratios = [(zf - zi) / (xf - xi), (zf - zi) / (yf - yi)]
ZeroDivisionError: float division by zero
Note: I was able to render images with mayavi.mlab.imshow, but I find it error-prone and onerous to specify the image pose and size in terms of axis angles and scale vectors, so I'm reluctant to accept answers pointing in that direction.
Your help is greatly appreciated.
I got to draw textured cameras with Mayavi! Although the way I've done it uses mlab.imshow, so maybe this is the type of answer you don't want. See this code:
obj=mlab.imshow(image.T)
obj.actor.orientation = np.rad2deg(camera.w_Rt_c.euler)
pp = np.array([0, 0, camera.f])[:,None]
w_pp = camera.w_Rt_c.forward(pp)
obj.actor.position = w_pp.ravel()
obj.actor.scale = [0.8, 0.8, 0.8]
image is an (n, m) numpy array; for some reason imshow would show the image rotated 90 degrees, which is why I transpose it.
obj.actor.orientation expects yaw, pitch, roll angles in degrees. The rotation of the image is the product of the individual rotation matrices Rx(yaw)*Ry(pitch)*Rz(roll). In the code I use the camera-to-world Euler angles of my camera class (I can't share that code at the moment).
The position of the image is set to the 3D position where the principal point of my camera is transformed to world coordinates.
Why the scale factor is 0.8 is a mystery to me; if I leave it at 1, the image plane appears larger than the frustum.
I encapsulate the above in a class that expects a camera and an image, and that draws the frustum and the image at the position and orientation of the given camera.
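A rough sketch of the image part of that encapsulation, reusing the snippet above; the camera object's interface (w_Rt_c.euler in radians, w_Rt_c.forward, and the focal length f) is assumed to match the earlier code:

import numpy as np
from mayavi import mlab

class TexturedCameraImage(object):
    def __init__(self, camera, image):
        self.camera = camera
        self.image = image

    def draw(self, scale=0.8):
        # Transposed because imshow otherwise appears rotated by 90 degrees
        obj = mlab.imshow(self.image.T)
        # Camera-to-world Euler angles, converted to degrees for the actor
        obj.actor.orientation = np.rad2deg(self.camera.w_Rt_c.euler)
        # Place the image plane at the world position of the principal point
        pp = np.array([0, 0, self.camera.f])[:, None]
        obj.actor.position = self.camera.w_Rt_c.forward(pp).ravel()
        obj.actor.scale = [scale, scale, scale]
        return obj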