I'm estimating the translation and rotation of a single camera using the following code.
E, mask = cv2.findEssentialMat(k1, k2,
                               focal=SCALE_FACTOR * 2868,
                               pp=(1920/2 * SCALE_FACTOR, 1080/2 * SCALE_FACTOR),
                               method=cv2.RANSAC,
                               prob=0.999,
                               threshold=1.0)
points, R, t, mask = cv2.recoverPose(E, k1, k2)
where k1 and k2 are my matching sets of key points: Nx2 matrices whose first column holds the x-coordinates and whose second column holds the y-coordinates.
I collect all the translations over several frames and generate a path that the camera traveled like this.
def generate_path(rotations, translations):
    path = []
    current_point = np.array([0, 0, 0])

    for R, t in zip(rotations, translations):
        path.append(current_point)
        # don't care about the rotation of a single point
        current_point = current_point + t.reshape((3,))

    return np.array(path)
So, I have a few issues with this.
The OpenCV camera coordinate system suggests that if I want to view the 2D "top down" view of the camera's path, I should plot the translations along the X-Z plane.
plt.plot(path[:,0], path[:,2])
This is completely wrong.
However, if I write this instead
plt.plot(path[:,0], path[:,1])
I get the following (after doing some averaging)
This path is basically perfect.
So, perhaps I am misunderstanding the coordinate system convention used by cv2.recoverPose? Why should the "birds eye view" of the camera path be along the XY plane and not the XZ plane?
Another, perhaps unrelated issue is that the reported Z-translation appears to decrease linearly, which doesn't really make sense.
I'm pretty sure there's a bug in my code since these issues appear systematic - but I wanted to make sure my understanding of the coordinate system was correct so I can restrict the search space for debugging.
First of all, your method is not actually producing a real path. The translation t produced by recoverPose() is always a unit vector, so in your 'path' every frame moves exactly 1 'meter' from the previous frame. The correct method would be: 1) initialize (featureMatch, findEssentialMat, recoverPose), then 2) track (triangulate, featureMatch, solvePnP). If you would like to dig deeper, tutorials on monocular visual SLAM would help.
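For illustration only, a rough sketch of that two-stage pipeline could look like the following (this is not your code; K is an assumed 3x3 camera matrix and all point arrays are assumed to be pre-matched Nx2 float32 keypoints):
import cv2
import numpy as np

def initialize(pts1, pts2, K):
    # Relative pose between the first two frames + triangulated landmarks.
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)   # |t| == 1, scale is unknown

    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    landmarks = (pts4d[:3] / pts4d[3]).T             # Nx3 triangulated points
    return R, t, landmarks

def track(landmarks, pts_new, K):
    # Pose of a later frame from 2D-3D matches, at a scale consistent with the landmarks.
    ok, rvec, tvec = cv2.solvePnP(landmarks, pts_new, K, None)
    R_new, _ = cv2.Rodrigues(rvec)
    return R_new, tvec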
Secondly, you might have mixed up the camera coordinate system and the world coordinate system. If you want to plot the trajectory, you should use the world coordinate system rather than the camera coordinate system; the results of recoverPose() are also in the world coordinate system. That world coordinate system has the x-axis pointing right, the y-axis pointing forward, and the z-axis pointing up. Thus, when you want to plot the 'bird's-eye view', it is correct to plot along the X-Y plane.
I am working in a discrete 2D grid of points in which there are "shapes" that I would like to create points outside of. I have been able to identify the vertices of these shapes and take convex hulls. So far, this leads to the image below and all is well: the purple here is the shape in question and the red line is the convex contour I have computed.
What I would like to do now is create two neighborhoods of points outside this shape. The first one is a set of points directly outside (as close as the grid size will allow), the second is another set of points but offset some distance away (the distance is not fixed, but rather an input).
I have attempted to write this in Python and get okay results. Here is an example of my current output. The problem is that the offsets are not perfect; for example, look at the bottom-most point in the image I attached: it kinks downwards whereas the original shape does not. It's not too bad in this example, but in other cases where the shape is smaller, or if I take a smaller offset, it gets worse. I also have an issue where the offsets sometimes overlap, even when they are supposed to be some distance apart. I would also like there to be one line in each section of the contour, not two lines (for example in the top left).
My current attempt uses the Shapely package to handle most of the computational geometry. An outline of what I do once I have found the vertices of the convex contour is: offset these vertices by some amount, and interpolate along each pair of vertices to obtain many points along these lines. Afterwards I use a coordinate transform to map all points to the nearest grid point. This is how I obtain my final set of points. Below is the actual code I have written.
How can I improve this so I don't run into the issues I described?
Function #1 - Computes the offset points
import math
import numpy as np
import shapely.geometry as sg
from shapely.geometry import LinearRing

def OutsidePoints(vertices, dist):
    # offset the convex contour outward by `dist`
    poly_line = LinearRing(vertices)
    poly_line_offset = poly_line.buffer(dist, resolution=1, join_style=2, mitre_limit=1).exterior
    new_vertices = np.asarray(list(poly_line_offset.coords))
    shape = sg.Polygon(new_vertices)

    # walk along the offset boundary and sample a point every step_size
    # (step_size is defined elsewhere in my code)
    points = []
    for t in np.arange(0, shape.length, step_size):
        temp_points = np.transpose(shape.exterior.interpolate(t).xy)
        points.append(temp_points[0])
    points = np.unique(np.array(points), axis=0)
    return points
Function #2 - Transforming these points into points that are on my grid
def IndexFinder(points):
    # invCoordinateTransform is my own helper that maps real coordinates to grid indices
    index_points = invCoordinateTransform(points)
    for i in range(len(index_points)):
        for j in range(2):
            index_points[i][j] = math.floor(index_points[i][j])
    index_points = np.unique(index_points, axis=0)
    return index_points
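For context, the two functions are chained roughly like this (grid_spacing and offset_dist are just placeholder names standing in for my actual inputs):
inner_ring = IndexFinder(OutsidePoints(vertices, grid_spacing))                # points directly outside the shape
outer_ring = IndexFinder(OutsidePoints(vertices, grid_spacing + offset_dist))  # points offset further away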
Many thanks!
I am asking this question as a trimmed-down version of my previous question. I now have a face looking at some position on the screen, and also the gaze coordinates (pitch and yaw) of both eyes. Let us say
Left_Eye = [-0.06222888 -0.06577308]
Right_Eye = [-0.04176027 -0.44416167]
I want to identify the screen coordinates where the person is probably looking. Is this possible? Please help!
What you need is:
3D position and direction for each eye
You claim you have these, but pitch and yaw are just Euler angles; you also need a reference frame and an order of transforms to convert them back into a 3D vector. It's better to leave the direction in vector form (which I suspect you had in the first place). Along with the direction you also need the position in 3D in the same coordinate system...
3D definition of your projection plane
so you need at least a start position and 2 basis vectors defining your planar rectangle. Much better is to use a 4x4 homogeneous transform matrix for this, because that allows very easy transforms into and out of its local coordinate system...
So I see it like this:
So now it's just a matter of finding the intersection between the rays and the plane:
P(s)   = R0 + s*R         // right eye ray
P(t)   = L0 + t*L         // left eye ray
P(u,v) = P0 + u*U + v*V   // your projection plane
Solving this system will lead to acquiring u,v, which is also the 2D coordinate inside your plane that you are looking at. Of course, because of inaccuracies, this will not be solvable algebraically, so it's better to convert the rays into plane-local coordinates, compute the point on each ray with w=0.0 (making this a simple linear equation with a single unknown), and then average the position from the left eye and the right eye (in case they do not align perfectly).
So if R0',R',L0',L' are the converted values in UVW local coordinates, then:
R0z' + s*Rz' = 0.0
s = -R0z'/Rz'
// so...
R1 = R0' - R'*R0z'/Rz'
L1 = L0' - L'*L0z'/Lz'
P = 0.5 * (R1 + L1)
Where P is the point you are looking at in the UVW coordinates...
The conversion is done easily: depending on your notation, you multiply either the direct or the inverse matrix representing the plane by (R0,1),(L0,1),(R,0),(L,0). The fourth coordinate (0 or 1) just tells whether you are transforming a vector (0) or a point (1).
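Here is a small self-contained sketch of that conversion and the w=0.0 intersection in Python/numpy (all numeric values are made up, and the matrix layout is just one possible convention):
import numpy as np

# Plane defined by origin P0 and basis vectors U, V (W is the normal).
P0 = np.array([0.0, 0.0, 1.0])
U  = np.array([1.0, 0.0, 0.0])
V  = np.array([0.0, 1.0, 0.0])
W  = np.cross(U, V)

# 4x4 homogeneous matrix whose columns are U, V, W, P0 (plane local -> world).
M = np.eye(4)
M[:3, 0], M[:3, 1], M[:3, 2], M[:3, 3] = U, V, W, P0
M_inv = np.linalg.inv(M)                     # world -> plane local (UVW)

def to_local(p, w):
    # w = 1.0 for points, w = 0.0 for direction vectors
    return (M_inv @ np.append(p, w))[:3]

def hit_uv(origin, direction):
    # Intersect one eye ray with the plane; returns (u, v) in plane coordinates.
    o = to_local(origin, 1.0)
    d = to_local(direction, 0.0)
    s = -o[2] / d[2]                         # solve o_w + s*d_w = 0
    return (o + s * d)[:2]

# Made-up eye positions R0, L0 and gaze directions R, L:
R0, R = np.array([0.03, 0.0, 0.0]),  np.array([0.0, 0.1, 1.0])
L0, L = np.array([-0.03, 0.0, 0.0]), np.array([0.0, 0.1, 1.0])

P = 0.5 * (hit_uv(R0, R) + hit_uv(L0, L))    # average of both eyes, in UVW
print(P)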
Without knowing more about your coordinate systems, data accuracy, and which knowns and unknowns you have, it is hard to be more specific than this.
If your plane is the camera projection plane, then U,V are the x and y axes of the image taken from the camera and W is its normal (the direction is just a matter of notation).
As you are using camera input, which uses a perspective projection, I hope your positions and vectors are corrected for it.
I have the coordinates of 6 points in an image
(170.01954650878906, 216.98866271972656)
(201.3812255859375, 109.42137145996094)
(115.70114135742188, 210.4272918701172)
(45.42426300048828, 97.89037322998047)
(167.0367889404297, 208.9329833984375)
(70.13690185546875, 140.90538024902344)
I have a point as the center: [89.2458, 121.0896]. I am trying to re-calculate the position of the points in Python using 4 rotation angles (0, 90, -90, 180 degrees) and several scaling factors (0.5, 0.75, 1, 1.10, 1.25, 1.35, 1.5).
My question is: how can I rotate and scale the above points relative to the center point and get the new coordinates of those 6 points?
Your help is really appreciated.
Mathematics
A mathematical approach would be to represent this data as vectors from the center to the image-points, translate these vectors to the origin, apply the transformation and relocate them around the center point. Let's look at how this works in detail.
Representation as vectors
We can show these vectors in a grid; this produces the following image:
This image provides a nice way to look at these points, so we can see our actions happening in a visual way. The center point is marked with a dot at the beginning of all the arrows, and the end of each arrow is the location of one of the points supplied in the question.
A vector can be seen as a list of the values of the coordinates of the point so
my_vector = [point[0], point[1]]
could be a representation of a vector in Python; it just holds the coordinates of a point, so the format in the question can be used as is! Notice that I will use position 0 for the x-coordinate and 1 for the y-coordinate throughout my answer.
I have only added this representation as a visual aid; we can look at any set of two points as being a vector. No calculation is needed, this is only a different way of looking at those points.
Translation to origin
The first calculations happen here. We need to translate all these vectors to the origin. We can very easily do this by subtracting the location of the center point from all the other points, for example (can be done in a simple loop):
point_origin_x = point[0] - center_point[0] # Xvalue point - Xvalue center
point_origin_y = point[1] - center_point[1] # Yvalue point - Yvalue center
The resulting points can now be rotated around the origin and scaled with respect to the origin. The new points (as vectors) look like this:
In this image, I deliberately left the scale untouched, so that it is clear that these are exactly the same vectors (arrows), in size and orientation, only shifted to be around (0, 0).
Why the origin
So why translate these points to the origin? Well, rotations and scaling actions are easy to do (mathematically) around the origin and not as easy around other points.
Also, from now on, I will only include the 1st, 2nd and 4th point in these images to save some space.
Scaling around the origin
A scaling operation is very easy around the origin. Just multiply the coordinates of the point with the factor of the scaling:
scaled_point_x = point[0] * scaling_factor
scaled_point_y = point[1] * scaling_factor
In a visual way, that looks like this (scaling all by 1.5):
Where the blue arrows are the original vectors and the red ones are the scaled vectors.
Rotating
Now for rotating. This is a little bit harder, because a rotation is most generally described by a matrix multiplication with this vector.
The matrix to multiply with is the standard 2D rotation matrix (from Wikipedia: Rotation matrix):
R(t) = [[cos(t), -sin(t)],
        [sin(t),  cos(t)]]
So if V is the vector, then we need to perform V_r = R(t) * V to get the rotated vector V_r. This rotation will always be counterclockwise! In order to rotate clockwise, we simply need to use R(-t).
Because only multiples of 90° are needed in the question, the matrix becomes almost trivial. For a rotation of 90° counterclockwise, the matrix is:
R(90°) = [[0, -1],
          [1,  0]]
Which is basically in code:
rotated_point_x = -point[1] # new x is negative of old y
rotated_point_y = point[0] # new y is old x
Again, this can be nicely shown in a visual way:
Where I have matched the colors of the vectors.
A rotation of 90° clockwise will then be:
rotated_clockwise_point_x = point[1]   # new x is old y
rotated_clockwise_point_y = -point[0]  # new y is negative of old x
A rotation of 180° is just taking the negative of both coordinates; or you could just scale by a factor of -1, which is essentially the same.
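In code, following the same pattern as the snippets above:
rotated_point_x = -point[0]  # 180°: new x is negative of old x
rotated_point_y = -point[1]  # 180°: new y is negative of old y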
As a last note on these operations: you can scale and/or rotate as many times as you want in sequence to get the desired result.
Translating back to the center point
After the scaling and/or rotation, the only thing left is to translate the vectors back to the center point:
retranslated_point_x = new_point[0] + center_point_x
retranslated_point_y = new_point[1] + center_point_y
And all is done.
Just a recap
So to recap this long post:
Subtract the coordinates of the center point from the coordinates of the image-point
Scale by a factor with a simple multiplication of the coordinates
Use the idea of the matrix multiplication to think about the rotation (you can easily find these things on Google or Wikipedia).
Add the coordinates of the center point to the new coordinates of the image-point
I realize now that I could have just given this recap, but now there is at least some visual aid and a bit of mathematical background in this post, which is also nice. I really believe that such problems should be looked at from a mathematical angle; the mathematical description can help a lot.
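To tie the recap together, here is a small self-contained sketch applying these steps to the points and center from the question (angles in degrees, counterclockwise; this is just one possible implementation):
import numpy as np

points = np.array([
    [170.01954650878906, 216.98866271972656],
    [201.3812255859375, 109.42137145996094],
    [115.70114135742188, 210.4272918701172],
    [45.42426300048828, 97.89037322998047],
    [167.0367889404297, 208.9329833984375],
    [70.13690185546875, 140.90538024902344],
])
center = np.array([89.2458, 121.0896])

def transform(points, center, angle_deg, scale):
    # 1) translate to the origin, 2) scale and rotate, 3) translate back
    t = np.radians(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    shifted = points - center
    transformed = scale * shifted @ R.T
    return transformed + center

print(transform(points, center, 90, 1.5))   # e.g. rotate 90° counterclockwise and scale by 1.5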
I'm trying to use gdal_grid to make an elevation grid from a surface in a geojson. I use this command:
gdal_grid -a linear:radius=0 inputSurface.geojson outputFile.tif
It seems to give the correct pixel values, but if I open the result in Global Mapper or QGIS, the image is flipped/mirrored about a horizontal axis, such that the tif is directly below the surface and upside-down.
What is the reason for this and how do I fix it?
Update
I already tried changing the geotransform, but it hasn't totally fixed my problem.
I looked at the resulting image with gdalinfo and found out that the upper left corner is actually the lower left corner, so I set it using SetGeoTransform. This moved it to the correct location, but it is still upside-down. (This may be dependent on the projection, which might cause problems later.)
I also tried looking at the pixel size in the geotransform, as mentioned below:
Xgeo = GT[0] + Xpixel*GT[1] + Yline*GT[2]
Ygeo = GT[3] + Xpixel*GT[4] + Yline*GT[5]
The image returned by gdal_grid has a positive GT[5], but unfortunately changing it to -GT[5] doesn't change anything.
The code I used to change the geotransform:
transform = list(ds.GetGeoTransform())
transform = [upperLeftX, transform[1], 0, upperLeftY, 0, -transform[5]]
ds.SetGeoTransform(transform)
GDAL's georeferencing is commonly specified by two sets of parameters. The first is the spatial reference, which defines the coordinate system (UTM, WGS84, something more localized). The spatial reference for a raster is set using gdal.Dataset.SetProjection(). The second piece of georeferencing is the GeoTransform, which translates (row, column) pixel indices into coordinates in that coordinate system. It is likely the GeoTransform that you need to update to make your image "unflipped".
The GeoTransform is a tuple of 6 values which relate raster indices to coordinates:
Xgeo = GT[0] + Xpixel*GT[1] + Yline*GT[2]
Ygeo = GT[3] + Xpixel*GT[4] + Yline*GT[5]
Because these are raster images, the (line, pixel) or (row, col) coordinates start from the top left of the image.
[ ]----> column
|
|
v row
This means that GT[1] will be positive when the image is positioned "upright" in the coordinate system. Similarly, and sometimes counter-intuitively, GT[5] will be negative because the y value should decrease for every increasing row in the image. This isn't a requirement, but it is very common.
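For reference, a typical "north-up" GeoTransform looks something like this (the numbers are purely illustrative):
#    (origin_x, pixel_width, row_rotation, origin_y,  col_rotation, pixel_height)
gt = (440720.0, 30.0,        0.0,          3751320.0, 0.0,          -30.0)  # note the negative GT[5]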
Modifying the GeoTransform
You state that the image is upside down and below where it should be. This isn't guaranteed to be a fix, but it will get you started. It's easier if you have the image in front of you and can experiment or compare coordinates...
from osgeo import gdal

# open dataset as readable/writable
ds = gdal.Open('input.tif', gdal.GA_Update)

# get the GeoTransform as a tuple
gt = ds.GetGeoTransform()

# change gt[5] to its negative, flipping the image
gt_new = (gt[0], gt[1], gt[2], gt[3], gt[4], -1 * gt[5])

# set the new GeoTransform, effectively flipping the image
ds.SetGeoTransform(gt_new)

# delete the dataset reference, flushing the cache of changes
del ds
I ended up having more problems with gdal_grid, which just crashes at seemingly random places, so I am using the scipy.interpolate function called griddata instead. This uses a meshgrid to get the coordinates in the grid, and I had to tile it up because of the memory requirements of meshgrid.
import scipy.interpolate as il  # for griddata
import numpy as np

# (inside a loop over tile rows r and tile columns c)
# meshgrid of coords in this tile; yi is reversed so row 0 is the northern edge
gridX, gridY = np.meshgrid(xi[c*tcols:(c+1)*tcols], yi[r*trows:(r+1)*trows][::-1])

## Creating the DEM in this tile
# fill_value prevents NaN at the polygon outline
zi = il.griddata((coordsT[0], coordsT[1]), coordsT[2], (gridX, gridY),
                 method='linear', fill_value=nodata)
raster.GetRasterBand(1).WriteArray(zi, c*tcols, nrows - r*trows - rtrows)
The linear interpolation seems to do the same as gdal_grid is supposed to. Getting the correct orientation was achieved by making the 5th element of the geotransform negative, as described in the question update.
See description at scipy.interpolate.griddata.
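For anyone wanting to try this without my surrounding tiling code, a minimal self-contained version of the same idea (with made-up sample points) is:
import numpy as np
from scipy.interpolate import griddata

# scattered sample points with known values
x = np.random.rand(200) * 10
y = np.random.rand(200) * 10
z = np.sin(x) + np.cos(y)

# regular grid; note yi is reversed so that row 0 is the largest y (the "top")
xi = np.linspace(0, 10, 50)
yi = np.linspace(0, 10, 50)
gridX, gridY = np.meshgrid(xi, yi[::-1])

zi = griddata((x, y), z, (gridX, gridY), method='linear', fill_value=-9999)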
A few things to note:
The point used in the geotransform should be upper-left
The resolution in y-direction should be negative
In the projection (at least the ones I use) positive y-direction is up
In numpy arrays positive y-direction is down
When using GDAL's WriteArray, the offsets are measured from the upper-left corner
Hope this helps clear up other people's confusion.
I've solved a similar issue by simply re-projecting the results of the gdal_grid. Give this a try (replacing the epsg code with your projection and replacing the input/output filepaths):
gdalwarp -s_srs epsg:4326 -t_srs epsg:4326 gdal_grid_result.tif inverted_output.tif
It does not. It is simply how the tool renders it by default; try opening it in QGIS and you'll notice it is right side up.
I am performing motion tracking of an object, and I am trying to identify the front and back of the object. The object is asymmetrical, which means that the centroid of the contour is closer to the front than the back. Using this information, I am approaching this as follows:
Draw contours of object
Find centroid
centroidx, centroidy = int(moments['m10']/moments['m00']), int(moments['m01']/moments['m00'])
Draw bounding ellipse
cv2.fitEllipse(contour)
Calculate major axis length as follows (and as shown in the figure)
MAx, MAy = int(0.5 * ellipseMajorAxisx*math.sin(ellipseAngle)), int(0.5 * ellipseMajorAxisy*math.cos(ellipseAngle))
Calculate beginning and ending x, y coordinates of the major axis
MAxtop, MAytop = int(ellipseCentrex + MAx), int(ellipseCentrey + MAy)
MAxbot, MAybot = int(ellipseCentrex - MAx), int(ellipseCentrey - MAy)
Identify which of the points is closer to the centroid of the contour
distancetop = math.sqrt((centroidx - MAxtop)**2 + (centroidy - MAytop)**2)
distancebot = math.sqrt((centroidx - MAxbot)**2 + (centroidy - MAybot)**2)
min(distancetop, distancebot)
The problem I am encountering is that, while I get the "front" end of the ellipse correct most of the time, occasionally the point is a little bit off. As far as I have observed, this seems to happen such that the x value is correct but the y value is different (in effect, I think this represents the major axis of an ellipse that is perpendicular to mine). I am not sure if this is an issue with OpenCV's calculation of angles or (more likely) if my calculations are incorrect. I do realize this is a complicated example; I hope my figures help!
EDIT: When I get the wrong point, it is not from a perpendicular ellipse, but from a mirror image of my ellipse. And it happens with the x values too, not just y.
After following ssm's suggestion below, I am getting the desired point most of the time. The point still goes wrong occasionally, but "snaps back" into place soon after. For example, this is a few frames when this happens:
By the way, the above images are after "correcting" for angle by using this code:
if angle > 90:
    angle = 180 - angle
If I do not do the correction, I get the wrong point at other times, as shown below for the same frames.
So it looks like I get it right for some angles with angle correction and the other angles without correction. How do I get all the right points in both conditions?
(White dot inside the ellipse is the centroid of the contour, whereas the dot on or outside the ellipse is the point I am getting)
I think your only problem is MAytop. You can consider doing the following:
if ycen < yc:
    # switch MAytop and MAybot
    MAytop, MAybot = MAybot, MAytop
You may have to do a similar check on the x - scale
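Purely as a sketch of that last remark, with xcen and xc as the (assumed) centroid and ellipse-centre x values mirroring the names above, the analogous check would be:
if xcen < xc:
    # switch MAxtop and MAxbot the same way
    MAxtop, MAxbot = MAxbot, MAxtop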