I've been trying to understand the output of the aruco_test.cpp program that is included when you download the Aruco Library.
The output has this format:
22=(236.87,86.4296) (422.581,78.3856) (418.21,228.032) (261.347,228.529) Txyz=0.00813142 -0.0148134 0.140595 Rxyz=-2.14032 0.0777095 0.138929
22 is the unique identifier of the marker, and the next four pairs of numbers are the image coordinates of its four corners. My problem here is with the two vectors Tvec and Rvec.
I've been reading on the Internet that tvec is the translation vector from my camera's center to my object (the marker in this case) and that rvec is the rotation of the object with respect to my camera.
I've got a few questions regarding this:
How can I know the axes of my camera? I mean, is there a way to know where the x, y and z axes are pointing?
How can I get the rotation of the camera from the rotation of the object wrt the camera?
Can someone explain the meaning of the vectors to me better so I can really understand them? I think my main problem here is that I don't really know what those numbers actually mean.
EDIT: I've been doing some testing to check how the rotation works and I don't really understand the results:
Moving the camera, marker fixed on the floor:
Initial position: camera looking at the marker, with the marker's 'z' axis pointing towards the camera, 'y' pointing upwards and 'x' pointing to the right: Rxyz=2.40804 -0.0823451 0.23141
Moving the camera on the 'x' axis of the marker (tilt the camera up): Rxyz=-1.97658 -0.0506794 -0.020052
Moving the camera on the 'y' axis of the marker (incline the camera to the right): Rxyz=2.74544 -0.118551 -0.973627
Turn the camera 90 degrees (to the right): Rxyz=1.80194 -1.86528 0.746029
Moving the marker instead of the camera, leaving the camera fixed and looking at the marker:
Using the same initial position as in the previous case.
Moving the marker on its 'x' axis: Rxyz=2.23619 -0.0361307 -0.0843008
Moving the marker on its 'y' axis: Rxyz=-2.9065 -0.0291299 -1.13356
Moving the marker on its 'z' axis (90° turn to the right): Rxyz=1.78398 1.74161 -0.690203
I've been assuming that each number of the vector was the rotation about the corresponding axis, but I think that assumption is wrong, as these values don't make much sense if that were the case.
How can I know the axes of my camera? I mean, is there a way to know where the x, y and z axes are pointing?
This is defined by the OpenCV convention: the x-axis increases from left to right of the image, the y-axis increases from top to bottom of the image, and the z-axis points out of the camera towards the scene.
How can I get the rotation of the camera from the rotation of the object wrt the camera?
rvec is the rotation of the marker with respect to the camera frame. You can convert rvec to a 3x3 rotation matrix using the built-in Rodrigues function. If the marker is aligned with the camera frame, this rotation matrix is the 3x3 identity matrix.
If you take the inverse of this matrix (it is a rotation matrix, so the inverse is simply its transpose), you get the rotation of the camera with respect to the marker.
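For illustration, here is a minimal sketch (using the rvec/tvec values from the sample output above; the variable names are mine) of converting rvec with Rodrigues and inverting the pose to get the camera's rotation and position with respect to the marker:

import cv2
import numpy as np

# rvec and tvec as reported for marker 22 in the sample output above
rvec = np.array([-2.14032, 0.0777095, 0.138929]).reshape(3, 1)
tvec = np.array([0.00813142, -0.0148134, 0.140595])

R_cm = cv2.Rodrigues(rvec)[0]   # rotation of the marker w.r.t. the camera
R_mc = R_cm.T                   # inverse rotation: camera w.r.t. the marker
t_mc = -R_mc @ tvec             # camera position expressed in the marker frame

print("Camera rotation w.r.t. marker:\n", R_mc)
print("Camera position w.r.t. marker:", t_mc)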
Can someone explain the meaning of the vectors to me better so I can really understand them? I think my main problem here is that I don't really know what those numbers actually mean.
tvec is the translation from the origin of the camera frame F_c to the center P of the detected marker (the F_c-to-P line on the figure). rvec is as described above.
Given an image mask, I want to project the pixels onto a mesh with respect to the position and orientation of the camera and convert these pixels into a point cloud. I have the intrinsic and extrinsic parameters of the camera with respect to the world, and the location of the mesh in world coordinates. I know the mapping from world coordinates to the camera image is as follows:
imgpoint = Intrinsic * Extrinsic * worldpoint
So when I want to do the opposite, I apply the inverses of the intrinsic and extrinsic matrices:
worldpoint = Extrinsic^(-1) * Intrinsic^(-1) * imgpoint
However, the idea that I had was to obtain two points from one pixel, with different depth values, to obtain a line, and then look for the closest intersection of that line with the mesh I want, but I do not know how to properly generate a point away from the original camera plane. How can I find this extra point, and/or am I overcomplicating this problem?
The top equation below shows how to project a point (x,y,z) onto a pixel (u,v);
The extrinsic parameters are the 3x3 rotation matrix R and translation t.
The intrinsic parameters are the focal distances f_x, f_y and the principal point (c_x, c_y). The value alpha is the perspective foreshortening term that is divided out.
The bottom equation reverses the process by describing how to project a ray from the camera position through the pixel (u,v) out into the scene as the parameter alpha varies from 0 to infinity.
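In standard pinhole-camera notation (with K the 3x3 intrinsic matrix built from f_x, f_y and the principal point (c_x, c_y)), the two equations read roughly:

alpha * [u, v, 1]^T = K * (R * [x, y, z]^T + t)

[x, y, z]^T = R^T * (alpha * K^(-1) * [u, v, 1]^T - t)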
Now we have converted the problem into a ray-casting problem: find the intersection of the ray with your mesh, which is a standard computer graphics problem.
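As a rough sketch of that back-projection (assuming K is the 3x3 intrinsic matrix and R, t are the world-to-camera extrinsics, all numpy arrays):

import numpy as np

def pixel_to_ray(u, v, K, R, t):
    """Return the camera center and a unit ray direction, both in world
    coordinates, for pixel (u, v)."""
    origin = -R.T @ t                                   # camera center in world coordinates
    d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])  # direction multiplied by alpha
    return origin, d / np.linalg.norm(d)

# Two points at different depths on the same ray, e.g. to build a line query:
# origin, direction = pixel_to_ray(320, 240, K, R, t)
# p_near = origin + 0.1 * direction
# p_far  = origin + 100.0 * direction

Intersecting that ray with the mesh is then the standard ray/triangle (or ray/mesh) query mentioned above.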
My problem is simple, yet confusing, as I personally have no experience with angles and angle conversions yet.
Basically, I need to locate the position of an object tagged with a single ArUco marker and then send the 3D coordinates and pose of that object (the marker) to the robot. Note that the robot I use is an industrial one manufactured by ABB, and the 3D coordinates I send have already been converted to the robot coordinate system.
Putting the coordinate problem aside, I solved it using stereo cameras. However, I found the pose problem much more difficult, especially converting the pose of the ArUco marker w.r.t. the camera to the robot coordinate system. The images below represent the two coordinate systems, one for the camera and one for the robot.
The angles I collected from the ArUco marker were converted to Euler angles using methods from the OpenCV library:
import cv2
import numpy as np

def PoseCalculate(rvec, tvec, marker):
    # Rotation vector -> 3x3 rotation matrix
    rmat = cv2.Rodrigues(rvec)[0]
    # Build the 3x4 pose matrix [R | t]
    P = np.concatenate((rmat, np.reshape(tvec, (rmat.shape[0], 1))), axis=1)
    # decomposeProjectionMatrix returns the Euler angles (in degrees) as its 7th output
    euler = -cv2.decomposeProjectionMatrix(P)[6]
    eul = np.radians(euler)
    yaw = eul[1, 0]
    pitch = eul[0, 0]
    roll = eul[2, 0]
    return (pitch, yaw, roll)
The result is three angles that represent the pose of the marker: pitch represents the rotation when the marker rotates around the camera's X axis, yaw the rotation around the camera's Y axis, and roll the rotation around the camera's Z axis.
So, how can I convert these three angles to the robot coordinate system?
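One common way to do this, sketched below under the assumption that you know the pose of the camera in the robot's base frame (e.g. from a hand-eye calibration; R_robot_cam and t_robot_cam are placeholders for that calibration), is to compose the poses as rotation matrices instead of converting the Euler angles directly:

import cv2
import numpy as np

def marker_pose_in_robot_frame(rvec, tvec, R_robot_cam, t_robot_cam):
    # Marker pose in the camera frame, from the ArUco detection
    R_cam_marker = cv2.Rodrigues(rvec)[0]
    t_cam_marker = np.reshape(tvec, 3)
    # Compose with the (assumed known) camera pose in the robot base frame
    R_robot_marker = R_robot_cam @ R_cam_marker
    t_robot_marker = R_robot_cam @ t_cam_marker + t_robot_cam
    return R_robot_marker, t_robot_marker

The resulting rotation matrix can then be converted, as a last step, to whatever convention the robot controller expects (quaternions or Euler angles), rather than converting the camera-frame Euler angles on their own.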
Thanks for reading this long question, and I wish you all good health in the new year 2021!
So I have a picture with known coordinates for the four corners (lat, long) and I wish to plot some gps points on top of this.
The problem is that the image borders are not parallel to the lat-long directions, so I can't use imshow's "extent", because that requires the corners to share the same x or y values.
My coordinates for the corners are (starting in bottom left corner, clockwise):
(57.786156, 14.096861) (57.786164, 14.098304)
(57.784925, 14.096857) (57.784928, 14.098310)
Can I rotate the image somehow to get lat and long on the axes? Alternatively, have a shifted coordinate system or something?
How would you do this?
I have tried imshow's 'extent' and tried to transform the coordinates to a [0, 1] x [0, 1] system, which seems difficult.
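One possible approach, sketched below assuming a plain matplotlib imshow of the image ("map.png" and the corner-to-pixel correspondence are placeholders to adjust), is to fit an affine transform from (lat, long) to pixel coordinates using the four known corners and then map the GPS points through it:

import numpy as np
import matplotlib.pyplot as plt

img = plt.imread("map.png")                 # placeholder filename
h, w = img.shape[:2]

# Corner (lat, long) values from above, assumed to be ordered bottom-left,
# top-left, top-right, bottom-right -- adjust to match your actual corners.
geo = np.array([(57.786156, 14.096861),
                (57.786164, 14.098304),
                (57.784925, 14.096857),
                (57.784928, 14.098310)])
pix = np.array([(0, h), (0, 0), (w, 0), (w, h)], dtype=float)

# Least-squares affine fit:  pixel = [lat, long, 1] @ A  with A a 3x2 matrix
G = np.column_stack([geo, np.ones(len(geo))])
A, *_ = np.linalg.lstsq(G, pix, rcond=None)

def to_pixels(latlong):
    latlong = np.atleast_2d(latlong)
    return np.column_stack([latlong, np.ones(len(latlong))]) @ A

plt.imshow(img)
pts = to_pixels([(57.7855, 14.0975)])       # example GPS point
plt.scatter(pts[:, 0], pts[:, 1], c="red")
plt.show()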
I am trying to create visualization with use of Python and Mayavi.
The purpose of that visualization is to show a trajectory and camera frustums at different stages of the path.
The thing I struggle with is texturing the camera frustum polygons with actual images.
I am willing to put performance considerations aside for now, and want to find a way to texture a mayavi-created surface with an image provided by numpy.
The most promising suggestions were found there, yet I was unable to construct a surface as I implemented them.
def render_image(self, frustum, timestamp):
    surf = mayavi.mlab.surf(frustum[0, :-1],
                            frustum[1, :-1],
                            frustum[2, :-1],
                            color=(1.0, 1.0, 1.0))
That's the code for surface creation, where the rows of the numpy array frustum hold the x, y and z coordinates respectively, and the last (fifth) point is the tip of the pyramid and hence not needed for the mesh.
x [-8.717184671492793, -8.623419637172622, -8.363581977642212, -8.269816943322041]
y [-4.563044562134721, -4.941612408713827, -4.37100415350352, -4.749572000082626]
z [13.614485323873417, 13.703336344550703, 14.059553426925493, 14.148404447602779]
That is an example of function input - four 3D points representing vertices of a desired polygon.
Yet, the surf function fails on that input:
File "/usr/local/lib/python2.7/dist-packages/mayavi/tools/helper_functions.py", line 679, in __call_internal__
aspect_ratios = [(zf - zi) / (xf - xi), (zf - zi) / (yf - yi)]
ZeroDivisionError: float division by zero
Note: I was able to render images with mayavi.mlab.imshow, but I find it error-prone and onerous to specify image pose and size in terms of axis angles and scale vectors, so I'm reluctant to accept answers pointing to that direction.
Your help is greatly appreciated.
I got to draw textured cameras with Mayavi! The way I've done it uses mlab.imshow, though, so maybe this is the type of answer you don't want. See this code:
obj=mlab.imshow(image.T)
obj.actor.orientation = np.rad2deg(camera.w_Rt_c.euler)
pp = np.array([0, 0, camera.f])[:,None]
w_pp = camera.w_Rt_c.forward(pp)
obj.actor.position = w_pp.ravel()
obj.actor.scale = [0.8, 0.8, 0.8]
image is an (n, m) numpy array; for some reason imshow shows the image rotated 90 degrees, which is why I transpose it.
obj.actor.orientation expects yaw, pitch, roll angles in degrees. The rotation of the image is the product of the individual rotation matrices Rx(yaw)*Ry(pitch)*Rz(roll). In the code I use the camera-to-world Euler angles from my camera class (I can't share that code at the moment).
The position of the image is set to the 3D position of the camera's principal point transformed into world coordinates.
Why the scale factor is 0.8 is a mystery; if I leave it at 1, the image plane appears larger than the frustum.
I encapsulate the above in a class that expects a camera and an image and draws the frustum and the image at the position and orientation of the given camera.
I have a couple of USB webcams (fixed focal length) setup as a simple stereoscopic rangefinder, spaced N mm apart with each rotated by M degrees towards the centerline, and I've calibrated the cameras to ensure alignment.
When adjusting the angle, how would I measure the coincidence between the images (preferably in Python/PIL/OpenCV) to know when the cameras are focused on an object? Is it as simple as choosing a section of pixels in each image (A rows by B columns) and calculating the sum of the difference between the pixels?
The problem is that you cannot assume pixel-perfect alignment of the cameras.
So let's assume the x-axis is the parallax-shifted axis and the y-axis is aligned. You need to measure the x-axis image shift to detect parallax alignment, even if the cameras are mechanically aligned as well as possible. The absolute difference of individual pixels is not guaranteed to have a clean minimum, so instead of subtracting individual pixels, subtract the average color of the area around each pixel, with a radius/size bigger than the alignment error along the y-axis. Call this radius (or size) r; this way the resulting difference should be minimal when the images are aligned.
Approximation search
You can even speed up the process by refining r:
1. select a big r
2. scan the whole x-range with a step of, for example, 0.25*r
3. choose the x-position with the lowest difference (x0)
4. halve r
5. go to step 2 (but this time the scanned x-range is just <x0-2.0*r, x0+2.0*r>)
6. stop when r is smaller than a few pixels
This way you can search in O(log2(n)) instead of O(n)
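A rough Python sketch of this coarse-to-fine search (assuming two grayscale numpy images; cv2.blur stands in for the "average color of the nearby area", and np.roll wraps around at the border, so crop margins in practice):

import numpy as np
import cv2

def align_score(left, right, shift, r):
    # Sum of absolute differences between blurred images after shifting
    # 'right' by 'shift' pixels along x; the blur hides small y misalignment.
    k = 2 * int(r) + 1
    a = cv2.blur(left, (k, k)).astype(np.float32)
    b = cv2.blur(np.roll(right, int(round(shift)), axis=1), (k, k)).astype(np.float32)
    return float(np.abs(a - b).sum())

def find_shift(left, right, x_range, r0=32.0, min_r=2.0):
    lo, hi, r = -float(x_range), float(x_range), float(r0)
    best = 0.0
    while r >= min_r:
        xs = np.arange(lo, hi + 1e-9, 0.25 * r)
        best = min(xs, key=lambda x: align_score(left, right, x, r))
        lo, hi, r = best - 2.0 * r, best + 2.0 * r, r / 2.0
    return best   # x-shift (in pixels) with the lowest difference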
Computer vision approach
This should be even faster:
1. detect points of interest in both images (specific changes in gradient, etc.)
2. cross-match the points of interest between the images
3. compute the average x-distance between the cross-matched points
4. change the parallax alignment by the found distance
5. go to step 1 until the x-distance is small enough
This way you can avoid scanning the whole x-range, because the alignment distance is obtained directly ... you just need to convert it to an angle, or to whatever you use to adjust the parallax.
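A sketch of the feature-based variant (assuming grayscale images and OpenCV's ORB detector with a brute-force matcher; any detector/matcher pair would do):

import numpy as np
import cv2

def average_x_disparity(left, right, max_matches=100):
    # Detect and cross-match keypoints, then return the average x-distance
    # between matched points -- the parallax estimate used to adjust the rig.
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(left, None)
    kp2, des2 = orb.detectAndCompute(right, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:max_matches]
    dx = [kp1[m.queryIdx].pt[0] - kp2[m.trainIdx].pt[0] for m in matches]
    return float(np.mean(dx))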
[notes]
You do not need to do this on the whole image area; just select a few horizontal lines along the images and scan their neighbourhood.
There are also other ways to detect alignment. For example, at short distances the skew is a significant indicator of alignment, so compare the height of an object on its left and right sides between the cameras ... if they are nearly the same you are aligned; if one is bigger/smaller you are not aligned, and you know which way to turn ...