Perspective Transformation of a Point in 2D Plane in Python

I have the following image, where the four corners of the cattle housing are [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]. The camera that captured the image was positioned midway between (x1,y1) and (x4,y4). Because (x3,y3) and (x2,y2) are farther from the camera, in the image x1-x4 is not equal to x2-x3.
I need to reproject the housing onto a 2D rectangular plane with corners [(x1,y1), (x2,y2), (x3,y3), (x4,y4)], and unlike the original image, this new plane will have x1-x4 = x2-x3. Is there any viable option to do that? OpenCV comes with a perspective transformation function that can only be applied to an image. But in this case, I will have some x,y locations of cattle on the original plane, which need to be converted and drawn onto the rectangular 2D plane to show the cattle positions.

This problem is linear algebra more than programming. You have a linear transformation on a simple quadrilateral. The math is simpler, because you have two edges parallel to edges of the image.
First of all, we need to redefine some notation: for instance, you've used (x2, y2) to refer to both points on the posted image, and to the desired position of the upper-left corner of the transformed image. I will simplify this by declaring the transformed points to be A = (x1, y2) and B = (x4, y3): we're horizontally stretching the top of the trapezoid to form a rectangle.
Also note that y1=y4 and y2=y3 from the start; this simplifies the calculations. Visualize the new and old images overlaid, with a point of question Q in the interior, its coordinates marked on the boundary. We need to find the general equation for Q's transformed point, R, after the "stretching".
I have also marked the median of the original image, MN. Points on this line will not move during the stretching. As a side note, points along the bottom edge 1-4 will not move. Points on the outer fringes of 2-3 will move most. Let C be the point on edge 1-2 with the same y-coordinate as Q (and, later, R); let D be the corresponding point on MN.
A-----2-----M-----3---B
| |
Qy CR Q D |
| |
| |
1----Rx-Qx-N----------4
We merely need to pro-rate the amount that a chosen point moves. The equations of lines MN and 1-2 follow from the well-known two-point formula. Substitute Qy into each of those equations to obtain Cx and Dx.
The "stretch" factor, in transforming segment CD into the segment from (x1, Qy) to D, is the ratio of their lengths: (Dx-x1) / (Dx-Cx). C itself moves left by (Cx-x1). Q moves left by a proportion of that amount, according to its distance left of D: (Dx-Qx) / (Dx-Cx). Multiply those to get the distance Q moves. Subtract that amount from Qx to get Rx, which is the same as Rx = Dx - (Dx-Qx) * (Dx-x1) / (Dx-Cx).
Yes, you now have several constants in the final, combined equation: x1, x2, x3, x4, y1, y2. You also have variables Qx and Qy. This is as it should be. This leaves you with a general equation to convert Qx => Rx for any point in the image.
If you plan to stretch vertically as well, the same proportioning will apply in the vertical direction. I suggest that you do one stretch at a time; this will keep the math modular: easier to check and debug in separate stages.
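The horizontal stretch described above can be sketched as follows. This is a minimal sketch, not the answerer's own code: the helper names `x_on_line` and `stretch_point` are made up for illustration, the corners are assumed to satisfy y1 = y4 and y2 = y3, and points right of MN are handled by mirroring the same reasoning onto edge 4-3. Note that this is the row-by-row stretch from the answer, so the y-coordinate is left unchanged.

```python
def x_on_line(p, q, y):
    """x-coordinate at height y on the line through p and q (two-point formula)."""
    (px, py), (qx, qy) = p, q
    return px + (qx - px) * (y - py) / (qy - py)


def stretch_point(Q, c1, c2, c3, c4):
    """Map point Q from the trapezoid 1-2-3-4 onto the rectangle above edge 1-4.

    Assumes y1 == y4 and y2 == y3 (hypothetical helper, not from the answer).
    """
    qx, qy = Q
    # M and N are the midpoints of the top edge 2-3 and the bottom edge 1-4.
    M = ((c2[0] + c3[0]) / 2, (c2[1] + c3[1]) / 2)
    N = ((c1[0] + c4[0]) / 2, (c1[1] + c4[1]) / 2)
    dx = x_on_line(M, N, qy)              # D: point on MN at Q's height
    if qx <= dx:
        edge_x = x_on_line(c1, c2, qy)    # C: point on edge 1-2 at Q's height
        target_x = c1[0]                  # C is stretched out to x1
    else:
        edge_x = x_on_line(c4, c3, qy)    # mirror case on edge 4-3
        target_x = c4[0]
    stretch = (dx - target_x) / (dx - edge_x)
    # Q's distance from D grows by the stretch factor; D itself stays put.
    rx = dx - (dx - qx) * stretch
    return rx, qy


# Made-up corner coordinates, for illustration only.
c1, c2, c3, c4 = (0, 0), (2, 10), (8, 10), (10, 0)
print(stretch_point((2, 10), c1, c2, c3, c4))   # corner 2 moves out to x1
print(stretch_point((5, 5), c1, c2, c3, c4))    # a point on MN does not move
```

A point on an outer edge lands on the rectangle's side, and a point on MN stays where it is, which matches the behaviour described in the answer.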
Does that get you moving?
Note that D will not move, since it lies on MN.

Related

How to triangulate a point in 3D space, given coordinate points in 2 image and extrinsic values of the camera

I'm trying to write a function that, when given two cameras, their rotation and translation matrices, focal point, and the coordinates of a point for each camera, will be able to triangulate the point into 3D space. Basically, it is given all the extrinsic/intrinsic values needed.
I'm familiar with the general idea: to somehow create two rays and find the closest point that satisfies the least squares problem, however, I don't know exactly how to translate the given information to a series of equations to the coordinate point in 3D.

I've arrived a couple years late on my journey. I ran into the exact same issue and found several people asking the same question but never found an answer that was simplified enough for me to understand, so I spent days learning this stuff just so I can simplify it to the essentials and post what I found here for future people.
I'll also give you some code samples at the end to do what you want in python, so stick around.
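My handwritten notes (originally attached as screenshots, pages 1 to 3) explain the full process.
The equation I start with can be found at https://docs.opencv.org/master/d9/d0c/group__calib3d.html
Starting formula: s * [u, v, 1]^T = M * [R | T] * [X, Y, Z, 1]^T, where M is the 3x3 camera matrix, [R | T] is the 3x4 rotation and translation, (u, v) is the pixel coordinate and s is a scale factor.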
Once you choose an origin in the real world that is the same for both cameras, you will have two of these equations with the same X, Y, Z values.
Sorry this next part you already have but others might not have gotten this far:
First you need to calibrate your camera which will give you the camera matrix and distortions (intrinsic properties) for each camera.
https://docs.opencv.org/master/dc/dbb/tutorial_py_calibration.html
You only need those two and can discard the rvecs and tvecs from calibration, because these will change when you set up the camera.
Once you choose your real world coordinate system, you can use cv2.solvePnP to get the rotation and translation vectors. To do this you need a set of real world points and their corresponding coordinates in the camera for each camera. My trick was I made some code that would show an image of the field and I would pass in points. Then I would click on locations on the image and create a mapping. The code for this bit is a bit lengthy so I won't share it here unless it is requested.
cv2.solvePnP will give you a vector for the rotation matrix, so you need to convert this to a 3x3 matrix using the following line:
`R, jac = cv2.Rodrigues(rvec)`
So now back to the original question:
You have the 3x3 camera matrix for each camera. You have the 3x3 rotation matrix for each camera. You have the 3x1 translation vector for each camera. You have some (u, v) coordinate for where the object of interest is in each camera. The math is explained more in the image of the notes.
import numpy as np

def get_xyz(camera1_coords, camera1_M, camera1_R, camera1_T, camera2_coords, camera2_M, camera2_R, camera2_T):
    # Get the two key equations from camera1
    camera1_u, camera1_v = camera1_coords
    # Put the rotation and translation side by side and then multiply with camera matrix
    camera1_P = camera1_M.dot(np.column_stack((camera1_R, camera1_T)))
    # Get the two linearly independent equations referenced in the notes
    camera1_vect1 = camera1_v*camera1_P[2,:] - camera1_P[1,:]
    camera1_vect2 = camera1_P[0,:] - camera1_u*camera1_P[2,:]
    # Get the two key equations from camera2
    camera2_u, camera2_v = camera2_coords
    # Put the rotation and translation side by side and then multiply with camera matrix
    camera2_P = camera2_M.dot(np.column_stack((camera2_R, camera2_T)))
    # Get the two linearly independent equations referenced in the notes
    camera2_vect1 = camera2_v*camera2_P[2,:] - camera2_P[1,:]
    camera2_vect2 = camera2_P[0,:] - camera2_u*camera2_P[2,:]
    # Stack the 4 rows to create one 4x4 matrix (np.vstack replaces the deprecated np.row_stack)
    full_matrix = np.vstack((camera1_vect1, camera1_vect2, camera2_vect1, camera2_vect2))
    # The first three columns make up A and the last column is b
    A = full_matrix[:, :3]
    b = full_matrix[:, 3].reshape((4, 1))
    # Solve overdetermined system. Note b in the wikipedia article is -b here.
    # https://en.wikipedia.org/wiki/Overdetermined_system
    soln = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(-b)
    return soln
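One way to check this kind of triangulation is to project a known 3D point through two synthetic cameras and confirm that it is recovered. The sketch below is a compact variant of the same linear system, solved with `np.linalg.lstsq` instead of the explicit normal equations. The names `triangulate_dlt` and `project` and all camera values are made up for illustration, and lens distortion is assumed to be already removed.

```python
import numpy as np


def triangulate_dlt(uv1, P1, uv2, P2):
    """Least-squares 3D point from two pixel coordinates and two 3x4 projection matrices."""
    rows = []
    for (u, v), P in ((uv1, P1), (uv2, P2)):
        rows.append(v * P[2] - P[1])   # same two equations per camera as in get_xyz
        rows.append(P[0] - u * P[2])
    full = np.vstack(rows)             # 4x4: columns 0..2 form A, column 3 is b
    A, b = full[:, :3], full[:, 3]
    xyz, *_ = np.linalg.lstsq(A, -b, rcond=None)
    return xyz


def project(P, xyz):
    """Pinhole projection of a 3D point to pixel coordinates (u, v)."""
    h = P @ np.append(xyz, 1.0)
    return h[:2] / h[2]


# Hypothetical intrinsics and extrinsics, for illustration only.
M = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = M @ np.column_stack((np.eye(3), np.zeros(3)))
P2 = M @ np.column_stack((np.eye(3), np.array([-1.0, 0.0, 0.0])))

point = np.array([0.3, -0.2, 5.0])
recovered = triangulate_dlt(project(P1, point), P1, project(P2, point), P2)
print(recovered)
```

With noise-free pixel coordinates the recovered point should agree with the original up to floating-point error; with real detections the least-squares solution only approximates it.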

Assume you have two cameras -- camera 1 and camera 2.
For each camera j = 1, 2 you are given:
1. The distance hj between its center Oj (is "focal point" the right term? Basically the point Oj from which the camera is looking at its screen) and the camera's screen. The camera's coordinate system is centered at Oj; the Oj--->x and Oj--->y axes are parallel to the screen, while the Oj--->z axis is perpendicular to the screen.
2. The 3 x 3 rotation matrix Uj and the 3 x 1 translation vector Tj, which transform the Cartesian 3D coordinates with respect to the system of camera j (see point 1) to the world coordinates, i.e. the coordinates with respect to a third coordinate system from which all points in the 3D world are described.
3. On the screen of camera j, which is the plane parallel to the plane Oj-x-y and at a distance hj from the origin Oj, you have the 2D coordinates (let's say the x,y coordinates only) of point pj, where the two points p1 and p2 are in fact the projected images of the same point P, somewhere in 3D, onto the screens of camera 1 and 2 respectively. The projection is obtained by drawing the 3D line between point Oj and point P and defining point pj as the unique intersection point of this line with the screen of camera j. The equation of the screen in camera j's 3D coordinate system is z = hj, so the coordinates of point pj with respect to the 3D coordinate system of camera j look like pj = (xj, yj, hj), and so the 2D screen coordinates are simply pj = (xj, yj).
Input: You are given the 2D points p1 = (x1, y1), p2 = (x2, y2), the two cameras' focal distances h1, h2, two 3 x 3 rotation matrices U1 and U2, and two 3 x 1 translation column vectors T1 and T2.
Output: The coordinates P = (x0, y0, z0) of point P in the world coordinate system.
One somewhat simple way to do this, avoiding homogeneous coordinates and projection matrices (which is fine too and more or less equivalent), is the following algorithm:
1. Form Q1 = [x1; y1; h1] and Q2 = [x2; y2; h2], where they are interpreted as 3 x 1 column vectors;
2. Transform P1 = U1*Q1 + T1 and P2 = U2*Q2 + T2, where * is matrix multiplication, here a 3 x 3 matrix multiplied by a 3 x 1 column, giving a 3 x 1 column;
3. Form the lines X = T1 + t1*(P1 - T1) and X = T2 + t2*(P2 - T2);
4. The two lines from the preceding step 3 either intersect at a common point, which is the point P, or they are skew lines, i.e. they do not intersect but are not parallel (not coplanar).
5. If the lines are skew lines, find the unique point X1 on the first line and the unique point X2 on the second line such that the vector X2 - X1 is perpendicular to both lines, i.e. X2 - X1 is perpendicular to both vectors P1 - T1 and P2 - T2. These two points X1 and X2 are the closest points on the two lines. Then point P = (X1 + X2)/2 can be taken as the midpoint of the segment X1 X2.
In general, the two lines should pass very close to each other, so the two points X1 and X2 should be very close to each other.
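The steps of this algorithm can be sketched in NumPy as follows. This is a minimal sketch under the answer's conventions (Uj and Tj map camera coordinates to world coordinates); the function name `triangulate_midpoint` is made up for illustration, and the 2 x 2 system becomes singular if the two rays are parallel.

```python
import numpy as np


def triangulate_midpoint(p1, h1, U1, T1, p2, h2, U2, T2):
    """Midpoint of the shortest segment between the two viewing rays."""
    # Steps 1 and 2: screen points as 3D vectors, moved into world coordinates.
    P1 = U1 @ np.array([p1[0], p1[1], h1]) + T1
    P2 = U2 @ np.array([p2[0], p2[1], h2]) + T2
    # Step 3: ray directions; the rays start at the camera centers T1 and T2.
    d1 = P1 - T1
    d2 = P2 - T2
    # Step 5: require (X1 - X2) to be perpendicular to d1 and d2,
    # with X1 = T1 + t1*d1 and X2 = T2 + t2*d2. This gives a 2x2 linear system.
    r = T2 - T1
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([r @ d1, r @ d2])
    t1, t2 = np.linalg.solve(A, b)
    X1 = T1 + t1 * d1
    X2 = T2 + t2 * d2
    return (X1 + X2) / 2, X1, X2


# Made-up example: two identical, unrotated cameras one unit apart along x.
U = np.eye(3)
T1 = np.array([0.0, 0.0, 0.0])
T2 = np.array([1.0, 0.0, 0.0])
# Screen coordinates of the world point (0.5, 0.2, 4) with h1 = h2 = 1.
P, X1, X2 = triangulate_midpoint((0.125, 0.05), 1.0, U, T1,
                                 (-0.125, 0.05), 1.0, U, T2)
print(P)
```

The distance between X1 and X2 is a useful diagnostic: if it is large, the calibration or the point correspondence is probably off.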

How would I remove a sector/slice from a 2D numpy (lat,lon) array?

I have a two-dimensional array of mesh-grided (lat,lon) data in a numpy array. From a single specified point in this array, I want to extend two lines in different directions, such that the area between these two lines creates the sector of a circle. This is best explained in the visualisation below:
The numbers in the image have no meaning, they're just for visualisation.
I wish to convert all the points within the sector to NaN values, such that the indices of the red zeros can be captured. (In the image it was easier to draw these as red zeros, but NaNs are preferable. It's really their index that I'm after.) The inputs to this will be the array, the centerpoint of the arc, and and the angles of each line relative to the horizontal (or vertical). The lines should extend beyond the edge of the (lat,lon) region, as in the diagram.
Can anyone suggest a way to get me started, and which numpy routines would be most helpful? I'm admittedly a little stumped.
EDIT: I also have a matching array of meshgridded latitudes and mesgridded longitudes. The integer index of the centrepoint is known (since I know the lat/lon of the centerpoint). "Angles" and "lines" in this context refer to literal geographic space.

Suppose the center has indices cx, cy.
Precalculate values for the starting and ending angles of the sector (angles in radians):
S_Cos = math.cos(Start)
S_Sin = math.sin(Start)
E_Cos = math.cos(End)
E_Sin = math.sin(End)
Then do a flood fill with zeros, starting from the center, using these border conditions:
(x-cx) * S_Sin - (y-cy) * S_Cos >= 0  # point is to the left of the starting ray
(x-cx) * E_Sin - (y-cy) * E_Cos <= 0  # point is to the right of the ending ray
x >= minx, y >= miny, x <= maxx, y <= maxy  # coordinate is inside the array
Former approach:
For small arc angles (< 90 degrees):
Choose filling direction - for most cases horizontal line is good choice, while for some start/end directions vertical filling is more convenient (for example: 350 degrees - 10 degrees)
Make traversal along rays from the center using Bresenham line algorithm. For each Y-step fill with zeros horizontal line between rays or between ray and rectangle (array) border
For larger arc - divide arc into some smaller by OX, OY axes.
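As an alternative to a flood fill, a boolean mask over the whole grid can be built with NumPy. Note that this is a different, vectorized technique rather than the approach described above. It is a sketch under stated assumptions: the function name `sector_mask` is made up, the lat/lon grid is treated as a flat plane (no great-circle geometry), and angles are measured in degrees counterclockwise from the +x (longitude) axis.

```python
import numpy as np


def sector_mask(x, y, cx, cy, start_deg, end_deg):
    """True where the point (x, y) lies in the sector swept counterclockwise
    from start_deg to end_deg around (cx, cy). Works across the 0/360 wrap."""
    angle = np.degrees(np.arctan2(y - cy, x - cx))
    span = (end_deg - start_deg) % 360.0
    rel = (angle - start_deg) % 360.0
    return rel <= span


# Made-up grid: longitudes and latitudes from -5 to 5 in steps of 1.
lon, lat = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
data = np.ones(lat.shape)            # float array, so it can hold NaN

mask = sector_mask(lon, lat, 0, 0, 30, 60)
data[mask] = np.nan                  # blank out the sector
indices = np.argwhere(mask)          # (row, col) index pairs inside the sector
print(len(indices))
```

Because the mask covers every grid cell, the sector automatically extends to the edge of the array. The center cell itself has an undefined direction (it is treated as angle 0 here), so include or exclude it explicitly if that matters.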

Convert Eye Gaze (Pitch and yaw) into screen coordinates (Where the person is looking at?)

I am asking this question as a trimmed version of my previous question. I have a face looking at some position on the screen, and also the gaze angles (pitch and yaw) of both eyes. Let us say
Left_Eye = [-0.06222888 -0.06577308]
Right_Eye = [-0.04176027 -0.44416167]
I want to identify the screen coordinates the person is probably looking at. Is this possible? Please help!

What you need is:
3D position and direction for each eye
You claim you have it, but pitch and yaw are just Euler angles, and you also need a reference frame and the order of transforms to convert them back into a 3D vector. It's better to leave the direction in vector form (which I suspect you had in the first place). Along with the direction, you also need the position in 3D in the same coordinate system.
3D definition of your projection plane
So you need at least a start position and 2 basis vectors defining your planar rectangle. Much better is to use a 4x4 homogeneous transform matrix for this, because that allows very easy transforms from and into its local coordinate system.
So I see it like this:
So now it's just a matter of finding the intersection between the rays and the plane:
P(s) = R0 + s*R
P(t) = L0 + t*L
P(u,v) = P0 + u*U +v*V
Solving this system gives u,v, which is also the 2D coordinate inside your plane that you are looking at. Of course, because of inaccuracies, this will not be solvable algebraically. So it's better to convert the rays into plane-local coordinates, compute the point on each ray with w=0.0 (making this a simple linear equation with a single unknown), and then take the average of the position for the left eye and the one for the right eye (in case they do not align perfectly).
So if R0',R',L0',L' are the converted values in UVW local coordinates, then:
R0z' + s*Rz' = 0.0
s = -R0z'/Rz'
// so...
R1 = R0' - R'*R0z'/Rz'
L1 = L0' - L'*L0z'/Lz'
P = 0.5 * (R1 + L1)
Where P is the point you are looking at in the UVW coordinates...
The conversion is easy: depending on your notation, you multiply either the inverse or the direct matrix representing the plane by (R,0),(L,0),(R0,1),(L0,1). The fourth coordinate (0 or 1) just tells whether you are transforming a vector (0) or a point (1).
Without knowing more about your coordinate systems, data accuracy, and which knowns and unknowns you have, it is hard to be more specific than this.
If your plane is the camera projection plane, then U,V are the x and y axes of the image taken from the camera and W is normal to it (the direction is just a matter of notation).
As you are using camera input, which uses a perspective projection, I hope your positions and vectors are corrected for it.
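The ray/plane computation above can be sketched with NumPy. This is a minimal sketch, assuming the eye positions and gaze directions are already available as 3D vectors in one common coordinate system and that U and V are perpendicular. The names `plane_matrix` and `gaze_point` and all numeric values are made up for illustration.

```python
import numpy as np


def plane_matrix(P0, U, V):
    """4x4 matrix (local UVW -> world) built from the plane origin and basis vectors."""
    U = np.asarray(U, dtype=float) / np.linalg.norm(U)
    V = np.asarray(V, dtype=float) / np.linalg.norm(V)
    W = np.cross(U, V)                    # plane normal
    M = np.eye(4)
    M[:3, 0], M[:3, 1], M[:3, 2], M[:3, 3] = U, V, W, P0
    return M


def gaze_point(M, R0, R, L0, L):
    """Average of the two ray/plane intersections, in plane-local (u, v)."""
    inv = np.linalg.inv(M)                # world -> local UVW

    def hit(origin, direction):
        o = inv @ np.append(origin, 1.0)      # point: fourth coordinate 1
        d = inv @ np.append(direction, 0.0)   # vector: fourth coordinate 0
        s = -o[2] / d[2]                      # solve o_w + s*d_w = 0
        return (o + s * d)[:2]

    return 0.5 * (hit(R0, R) + hit(L0, L))


# Made-up setup: screen in the z = 0 plane, eyes half a unit in front of it.
M = plane_matrix([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
R0 = np.array([0.03, 0.0, 0.5])
L0 = np.array([-0.03, 0.0, 0.5])
target = np.array([0.1, 0.2, 0.0])
uv = gaze_point(M, R0, target - R0, L0, target - L0)
print(uv)
```

If a gaze direction is parallel to the plane, its W component is zero and the division fails, so real code should guard against that case. Because U and V are normalized here, the result is in world units along the plane axes; converting to pixels needs the physical screen size and resolution.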

How to recalculate the coordinates of a point after scaling and rotation?

I have the coordinates of 6 points in an image
(170.01954650878906, 216.98866271972656)
(201.3812255859375, 109.42137145996094)
(115.70114135742188, 210.4272918701172)
(45.42426300048828, 97.89037322998047)
(167.0367889404297, 208.9329833984375)
(70.13690185546875, 140.90538024902344)
I have a center point [89.2458, 121.0896]. I am trying to recalculate the positions of the points in Python using 4 rotation angles (0, 90, -90, 180 degrees) and several scaling factors (0.5, 0.75, 1, 1.10, 1.25, 1.35, 1.5).
My question is: how can I rotate and scale the above points relative to the center point and get the new coordinates of those 6 points?
Your help is really appreciated.

Mathematics
A mathematical approach would be to represent this data as vectors from the center to the image-points, translate these vectors to the origin, apply the transformation and relocate them around the center point. Let's look at how this works in detail.
Representation as vectors
We can show these vectors in a grid, this will produce following image
This image provides a nice way to look at these points, so we can see our actions happening in a visual way. The center point is marked with a dot at the beginning of all the arrows, and the end of each arrow is the location of one of the points supplied in the question.
A vector can be seen as a list of the values of the coordinates of the point so
my_vector = [point[0], point[1]]
could be a representation for a vector in python, it just holds the coordinates of a point, so the format in the question could be used as is! Notice that I will use the position 0 for the x-coordinate and 1 for the y-coordinate throughout my answer.
I have only added this representation as a visual aid, we can look at any set of two points as being a vector, no calculation is needed, this is only a different way of looking at those points.
Translation to origin
The first calculations happen here. We need to translate all these vectors to the origin. We can very easily do this by subtracting the location of the center point from all the other points, for example (can be done in a simple loop):
point_origin_x = point[0] - center_point[0] # Xvalue point - Xvalue center
point_origin_y = point[1] - center_point[1] # Yvalue point - Yvalue center
The resulting points can now be rotated around the origin and scaled with respect to the origin. The new points (as vectors) look like this:
In this image, I deliberately left the scale untouched, so that it is clear that these are exactly the same vectors (arrows), in size and orientation, only shifted to be around (0, 0).
Why the origin
So why translate these points to the origin? Well, rotations and scaling actions are easy to do (mathematically) around the origin and not as easy around other points.
Also, from now on, I will only include the 1st, 2nd and 4th point in these images to save some space.
Scaling around the origin
A scaling operation is very easy around the origin. Just multiply the coordinates of the point with the factor of the scaling:
scaled_point_x = point[0] * scaling_factor
scaled_point_y = point[1] * scaling_factor
In a visual way, that looks like this (scaling all by 1.5):
Where the blue arrows are the original vectors and the red ones are the scaled vectors.
Rotating
Now for rotating. This is a little bit harder, because a rotation is most generally described by a matrix multiplication with this vector.
The matrix to multiply with is the following (from Wikipedia: Rotation matrix):
R(t) = [[cos(t), -sin(t)], [sin(t), cos(t)]]
So if V is the vector, then we need to perform V_r = R(t) * V to get the rotated vector V_r. This rotation will always be counterclockwise! In order to rotate clockwise, we simply need to use R(-t).
Because only multiples of 90° are needed in the question, the matrix becomes almost trivial. For a rotation of 90° counterclockwise, the matrix is R(90°) = [[0, -1], [1, 0]].
Which is basically in code:
rotated_point_x = -point[1] # new x is negative of old y
rotated_point_y = point[0] # new y is old x
Again, this can be nicely shown in a visual way:
Where I have matched the colors of the vectors.
A rotation of 90° clockwise will then be
rotated_clockwise_point_x = point[1] # x is old y
rotated_clockwise_point_y = -point[0] # y is negative of old x
A rotation of 180° just negates both coordinates; you could also scale by a factor of -1, which is essentially the same.
As a last point on these operations, note that you can scale and/or rotate as many times as you want, in sequence, to get the desired result.
Translating back to the center point
After the scaling actions and/or rotations, the only thing left is to translate the vectors back to the center point.
retranslated_point_x = new_point[0] + center_point[0]
retranslated_point_y = new_point[1] + center_point[1]
And all is done.
Just a recap
So to recap this long post:
Subtract the coordinates of the center point from the coordinates of the image-point
Scale by a factor with a simply multiplication of the coordinates
Use the idea of the matrix multiplication to think about the rotation (you can easily find these things on Google or Wikipedia).
Add the coordinates of the center point to the new coordinates of the image-point
I realize now that I could have just given this recap, but now there is at least some visual aid and a slight mathematical background in this post, which is also nice. I really believe that such problems should be looked at from a mathematical angle, the mathematical description can help a lot.
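Put together, the four steps from the recap can be sketched as one function. This is a minimal sketch; the name `transform_points` is made up for illustration, and it uses the general rotation matrix, so it also works for angles that are not multiples of 90°.

```python
import math


def transform_points(points, center, angle_deg, scale):
    """Scale and rotate points about center (counterclockwise for positive angles)."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    cx, cy = center
    result = []
    for x, y in points:
        # 1. translate to the origin, 2. scale
        dx, dy = (x - cx) * scale, (y - cy) * scale
        # 3. rotate with R(t), 4. translate back to the center
        result.append((cx + cos_t * dx - sin_t * dy,
                       cy + sin_t * dx + cos_t * dy))
    return result


# Points and center taken from the question.
points = [(170.01954650878906, 216.98866271972656),
          (201.3812255859375, 109.42137145996094),
          (115.70114135742188, 210.4272918701172),
          (45.42426300048828, 97.89037322998047),
          (167.0367889404297, 208.9329833984375),
          (70.13690185546875, 140.90538024902344)]
center = (89.2458, 121.0896)

for angle in (0, 90, -90, 180):
    for scale in (0.5, 0.75, 1, 1.10, 1.25, 1.35, 1.5):
        new_points = transform_points(points, center, angle, scale)
```

One caveat: in image coordinates the y-axis usually points down, so a mathematically counterclockwise rotation appears clockwise on screen. Negate the angle if the opposite direction is wanted.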

Matching Angles between Edges

I'm scripting in python for starters.
To keep this example simple: I have one edge with UV coordinates ([0,0],[1,1]), so it's at a 45 degree angle. I have another edge that is ([0,0],[0,1]), so its angle is 0/360 degrees. My goal is to compare the angles of those two edges to get the difference, so I can modify the angle of the second edge to match the angle of the first edge. Is there a way to do this via vector math?

Easiest to reconstruct, and thus to remember, is IMO the complex picture. To compute the angle from a=a.x+i*a.y to b=b.x+i*b.y, rotate b back by multiplying with the conjugate of a, which gives an angle measured from the zero angle, i.e. the positive real axis:
arg((a.x-i*a.y)*(b.x+i*b.y))
=arg((a.x*b.x+a.y*b.y)+i*(a.x*b.y-a.y*b.x))
=atan2( a.x*b.y-a.y*b.x , a.x*b.x+a.y*b.y )
Note that screen coordinates use the opposite orientation to the Cartesian/complex plane, so switch the sign, going from atan2(y,x) to atan2(-y,x), to get an angle in the usual direction.
To produce a vector b rotated by an angle w (in radians) from a, multiply a by cos(w)+i*sin(w) to obtain
b.x = cos(w)*a.x - sin(w)*a.y
b.y = cos(w)*a.y + sin(w)*a.x
You will have to rescale to get a specified length of b.
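The two formulas above translate directly into Python. This is a minimal sketch, assuming each edge is represented by its direction vector (end point minus start point); the helper names `signed_angle` and `rotate` are made up for illustration.

```python
import math


def signed_angle(a, b):
    """Angle in radians from vector a to vector b, positive counterclockwise."""
    cross = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1]
    return math.atan2(cross, dot)


def rotate(a, w):
    """Vector a rotated by w radians (counterclockwise for positive w)."""
    return (math.cos(w) * a[0] - math.sin(w) * a[1],
            math.cos(w) * a[1] + math.sin(w) * a[0])


# Edges from the question, as direction vectors.
first = (1.0, 1.0)     # edge ([0,0],[1,1])
second = (0.0, 1.0)    # edge ([0,0],[0,1])

w = signed_angle(second, first)   # how far the second edge must turn
matched = rotate(second, w)       # now points along the first edge
print(math.degrees(w), matched)
```

Rotation preserves length, so `matched` has the length of the second edge, not the first; rescale it if the two edges must also have the same length.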
