Understanding the functions projectPoints() and undistortPoints() of OpenCV - Python

While trying to check whether I am using both projectPoints() and undistortPoints() correctly, I obtained results that I find hard to understand.
Having defined the following intrinsic parameters:
# calibration matrix
K = np.array([
    [500.0, 0.0, 300.0],
    [0.0, 500.0, 250.0],
    [0.0, 0.0, 1.0]
])
# distortion coefficients (k1, k2, p1, p2, k3)
distCoeffs = np.array([1.5, -0.95, -0.005, 0.0025, 1.16])
and the following pointcloud in the camera reference (with all the points being in the same plane):
# Coordinates of points in plane, representing a pointcloud in camera reference (c)
# plane
H, W = 1, 2
X, Y = np.meshgrid(np.arange(-W, W, 0.2), np.arange(-H, H, 0.2))
X, Y = X.reshape(1, -1), Y.reshape(1, -1)
# add depth. Pointcloud of n points represented as a (3, n) array:
Z = 5
P_c = np.concatenate((X, Y, Z * np.ones_like(X)), axis=0)
I was expecting that the following process would yield the original pointcloud:
1. Projecting the points while accounting for the distortion, i.e. obtaining the distorted coordinates in the image plane:
# project points, including with lens distortion
U_dist, _ = cv2.projectPoints(P_c, np.zeros((3,)), np.zeros((3,)), K, distCoeffs)
# projections as (2, n) array.
U_dist = U_dist[:, 0].T
2. Undistorting the image coordinates to get the normalized coordinates in the camera reference:
# get normalized coordinates, in camera reference, as a (2, n) array
xn_u = cv2.undistortPoints(U_dist.T, K, distCoeffs, None, None)[:, 0].T
3. Multiplying the previous normalized coordinates by the depth of the plane in the camera reference to get the original pointcloud back:
# add depth.
P_c2 = Z * np.concatenate((xn_u, np.ones_like(X)))
[Complete code]
import numpy as np
import cv2
# calibration matrix
K = np.array([
    [500.0, 0.0, 300.0],
    [0.0, 500.0, 250.0],
    [0.0, 0.0, 1.0]
])
# distortion coefficients (k1, k2, p1, p2, k3)
distCoeffs = np.array([1.5, -0.95, -0.005, 0.0025, 1.16])
# Coordinates of points in plane, representing a pointcloud in camera reference (c)
# plane
H, W = 1, 2
X, Y = np.meshgrid(np.arange(-W, W, 0.2), np.arange(-H, H, 0.2))
X, Y = X.reshape(1, -1), Y.reshape(1, -1)
# add depth. Pointcloud of n points represented as a (3, n) array:
Z = 5
P_c = np.concatenate((X, Y, Z * np.ones_like(X)), axis=0)
# ---------------------------------------------
# PROJECTION WITH DISTORTION
# project points, including with lens distortion
U_dist, _ = cv2.projectPoints(P_c, np.zeros((3,)), np.zeros((3,)), K, distCoeffs)
# projections as (2, n) array.
U_dist = U_dist[:, 0].T
#-----------------------------
# UNPROJECTION accounting for distortion
# get normalized coordinates, in camera reference, as a (2, n) array
xn_u = cv2.undistortPoints(U_dist.T, K, distCoeffs, None, None)[:, 0].T
# add depth.
P_c2 = Z * np.concatenate((xn_u, np.ones_like(X)))
# check equality (raises error)
assert np.allclose(P_c, P_c2), f'max difference: {np.abs(P_c - P_c2).max()}'
However, this is not the case: the resulting pointcloud is significantly different from the original one.
I feel this may be due to a misunderstanding in the use of the previous functions.
Any help in understanding where I am taking the wrong step(s) is highly appreciated.
EDIT
After some more experimentation, I believe the issue lies with undistortPoints() rather than with projectPoints(). The latter simply evaluates the distortion model and is deterministic, while the former has to solve a non-linear problem to invert it. Empirically, as the distortion increases, undistortPoints() tends to give worse results; at lower distortion levels it recovers the coordinates correctly, as the sketch below illustrates.
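A minimal sketch of that check (K and the pointcloud are the same as above; the mild distortion coefficients are made-up values for illustration): the same project/undistort round trip is run with a mild and with the strong distortion, and only the mild case reproduces the original points closely.
import numpy as np
import cv2

K = np.array([[500.0, 0.0, 300.0],
              [0.0, 500.0, 250.0],
              [0.0, 0.0, 1.0]])

# same planar pointcloud as above, as a (3, n) array
X, Y = np.meshgrid(np.arange(-2, 2, 0.2), np.arange(-1, 1, 0.2))
Z = 5
P_c = np.stack((X.ravel(), Y.ravel(), Z * np.ones(X.size)), axis=0)

for dist in (np.array([0.05, -0.01, 0.0, 0.0, 0.0]),         # mild (made-up) distortion
             np.array([1.5, -0.95, -0.005, 0.0025, 1.16])):  # strong distortion from above
    # project with distortion, then try to undo it with undistortPoints
    U, _ = cv2.projectPoints(P_c, np.zeros(3), np.zeros(3), K, dist)
    xn = cv2.undistortPoints(U, K, dist)[:, 0].T              # (2, n) normalized coords
    P_back = Z * np.vstack((xn, np.ones((1, xn.shape[1]))))
    print('k1, k2 =', dist[:2], '-> max abs error:', np.abs(P_c - P_back).max())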

Related

The coordinates of the reconstructed 3D points are different after the virtual camera intrinsic K has also changed proportionally after image resize?

As far as I know, after an image resize the corresponding intrinsic matrix K also changes proportionally, so why are the coordinates of the 3D reconstruction of the same point not the same?
The following Python program is a simple experiment: the original image size is 1920×1080 and after the resize it becomes 640×480; the intrinsic matrix K1 corresponds to the original image and K2 to the resized one; RT1 and RT2 are the extrinsic matrices of the two cameras ([R, T], 3×4; they should remain unchanged?). Without considering the effects of camera skew and distortion, why is there a difference in the reconstructed 3D points?
import cv2
import numpy as np
fx = 1040
fy = 1040
cx = 1920 / 2
cy = 1080 / 2
K1 = np.array([[fx, 0, cx],
               [0, fy, cy],
               [0, 0, 1]])
RT1 = np.array([[1, 0, 0, 4],
                [0, 1, 0, 5],
                [0, 0, 1, 6]])  # just a random pose
theta = np.pi / 6
RT2 = np.array([[np.cos(theta), -np.sin(theta), 0, 40],
                [np.sin(theta), np.cos(theta), 0, 50],
                [0, 0, 1, 60]])  # just a random pose
p1 = np.matmul(K1, RT1)  # projection matrix of camera 1
p2 = np.matmul(K1, RT2)  # projection matrix of camera 2
pt1 = np.array([100.0, 200.0])
pt2 = np.array([300.0, 400.0])
point3d1 = cv2.triangulatePoints(p1, p2, pt1, pt2)
# Remember to divide out the 4th row. Make it homogeneous
point3d1 = point3d1 / point3d1[3]
print(point3d1)
[[-260.07160113]
[ -27.39546108]
[ 273.95189881]
[ 1. ]]
Then the image is resized to test the 3D reconstruction and see whether it is numerically equal:
rx = 640.0 / 1920.0
ry = 480.0 / 1080.0
fx = fx * rx
fy = fy * ry
cx = cx * rx
cy = cy * ry
K2 = np.array([[fx, 0, cx],
               [0, fy, cy],
               [0, 0, 1]])
p1 = np.matmul(K2, RT1)
p2 = np.matmul(K2, RT2)
pt1 = np.array([pt1[0] * rx, pt1[1] * ry])
pt2 = np.array([pt2[0] * rx, pt2[1] * ry])
point3d2 = cv2.triangulatePoints(p1, p2, pt1, pt2)
# Remember to divide out the 4th row. Make it homogeneous
point3d2 = point3d2 / point3d2[3]
print(point3d2)
[[-193.03965985]
[ -26.72133393]
[ 189.12512305]
[ 1. ]]
As you can see, point3d1 and point3d2 are not the same. Why?
After careful consideration, I arrived at a plausible explanation, which I state below in the hope that it helps others.
In short:
Image scaling must use a uniform scale factor (the same factor in x and y) in order to derive a consistent intrinsic matrix K; otherwise the x and y focal lengths become inconsistent with the original image, and this directly leads to deviations in the triangulated 3D points!
Returning to the problem at the beginning: the given image size is 1920×1080 and its focal length is 1040 pixels, i.e. fx = fy = 1040. By definition fx = f/dx and fy = f/dy, where dx and dy are the physical pixel sizes and f is the physical focal length; fx = fy therefore implies dx = dy (square pixels), and this convention should also be respected when the image is later scaled.
If the scaled image's fx and fy were obtained with different scale factors, dx and dy would no longer be equal, which distorts the image; moreover, since the projection matrix is P = K*[R,t], fx and fy in K would change disproportionately, leading to a deviation in the computed P!
By the way, I have also put a reference answer for the same experiment done in MATLAB at this link.
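A minimal numerical check of this conclusion (a sketch reusing the matrices from the experiment above; the 0.5 factor is just an illustrative uniform scale): scaling K and the pixel coordinates by the same factor in x and y reproduces the original triangulation, while the non-uniform 640/1920 and 480/1080 factors do not.
import numpy as np
import cv2

K1 = np.array([[1040., 0., 960.],
               [0., 1040., 540.],
               [0., 0., 1.]])
RT1 = np.array([[1., 0., 0., 4.],
                [0., 1., 0., 5.],
                [0., 0., 1., 6.]])
theta = np.pi / 6
RT2 = np.array([[np.cos(theta), -np.sin(theta), 0., 40.],
                [np.sin(theta), np.cos(theta), 0., 50.],
                [0., 0., 1., 60.]])
pt1 = np.array([100.0, 200.0])
pt2 = np.array([300.0, 400.0])

def triangulate(sx, sy):
    # scale the intrinsics and the pixel coordinates by (sx, sy)
    S = np.diag([sx, sy, 1.0])
    P1, P2 = S @ K1 @ RT1, S @ K1 @ RT2
    q1 = np.array([pt1[0] * sx, pt1[1] * sy])
    q2 = np.array([pt2[0] * sx, pt2[1] * sy])
    X = cv2.triangulatePoints(P1, P2, q1, q2)
    return (X / X[3])[:3].ravel()

print(triangulate(1.0, 1.0))                # reference reconstruction
print(triangulate(0.5, 0.5))                # uniform scaling: same 3D point
print(triangulate(640 / 1920, 480 / 1080))  # non-uniform scaling: deviates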

Interpolate image at specific coordinates

Given an image (array) in rectangular form, how do I interpolate specific pixel positions? The following code produces a 20x30 grid, with each pixel filled with a value (zg). The code then constructs an interpolator with scipy's interp2d method. What I want is to obtain interpolated values at specific coordinates. In the given example, at x = [1.5, 2.4, 5.8], y = [0.5, 7.2, 2.2], so for a total of 3 positions. However, the function returns a 3x3 array for some reason. Why? And how would I change the code so that only these three coordinates are evaluated?
import numpy as np
from scipy.interpolate import interp2d
# Rectangular grid
x = np.arange(20)
y = np.arange(30)
xg, yg = np.meshgrid(x, y)
zg = np.exp(-(2*xg)**2 - (yg/2)**2)
# Define interpolator
interp = interp2d(yg, xg, zg)
# Interpolate pixel value
zi = interp([1.5, 2.4, 5.8], [0.5, 7.2, 2.2])
print(zi.shape) # = (3, 3)
Your code is fine. The interp interpolation function is computing all the possible combinations of coordinates, i.e. 3 × 3 = 9. For instance:
>>> interp(1.5, 0.5)
array([0.04635516])
>>> interp(1.5, 7.2)
array([0.02152198])
>>> interp(5.8, 2.2)
array([0.03073694])
>>> interp(2.4, 2.2)
array([0.03810408])
Indeed you can find these values in the returned matrix:
>>> interp([1.5, 2.4, 5.8], [0.5, 7.2, 2.2])
array([[0.04635516, 0.04409826, 0.03557219],
       [0.0400542 , 0.03810408, 0.03073694],
       [0.02152198, 0.02047414, 0.01651562]])
The documentation states that the return value is a
2-D array with shape (len(y), len(x))
If you just want the coordinates you need, you can do the following:
xe = [1.5, 2.4, 5.8]
ye = [0.5, 7.2, 2.2]
>>> [interp(x, y)[0] for x, y in zip(xe, ye)]
[0.04635515780224686, 0.020474138863349815, 0.030736938802464715]
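If you prefer a single vectorized call that returns exactly one value per coordinate pair, here is a sketch with RegularGridInterpolator (assuming the same regular grid, with zg indexed as zg[iy, ix]; note that interp2d is deprecated in recent SciPy versions, and this linear interpolant may differ slightly from interp2d's spline):
import numpy as np
from scipy.interpolate import RegularGridInterpolator

x = np.arange(20)
y = np.arange(30)
xg, yg = np.meshgrid(x, y)
zg = np.exp(-(2*xg)**2 - (yg/2)**2)     # shape (30, 20) == (len(y), len(x))

# axes are given in the order of zg's dimensions: (y, x)
rgi = RegularGridInterpolator((y, x), zg)

xe = [1.5, 2.4, 5.8]
ye = [0.5, 7.2, 2.2]
zi = rgi(np.column_stack([ye, xe]))     # one (y, x) pair per row
print(zi.shape)                         # (3,)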

Dividing circumference into equal parts and returning coordinates

I have created several circles with different origins using Python and I am trying to implement a function that will divide each circle into n number of equal parts along the circumference. I am trying to populate an array that contains the starting [x,y] coordinate for each part on the circumference.
My code is as follows:
def fnCalculateArcCoordinates(self, intButtonCount, radius, center):
    lstButtonCoord = []
    #for degrees in range(0,360,intAngle):
    for arc in range(1, intButtonCount + 1):
        degrees = arc * 360 / intButtonCount
        xDegreesCoord = int(center[0] + radius * math.cos(math.radians(degrees)))
        yDegreesCoord = int(center[1] + radius * math.sin(math.radians(degrees)))
        lstButtonCoord.append([xDegreesCoord, yDegreesCoord])
    return lstButtonCoord
When I run the code for 3 parts, an example of the set of coordinates that is returned is:
[[157, 214], [157, 85], [270, 149]]
This means the segments are of different sizes. Could someone please help me identify where my error is?
The results of such trigonometric calculations are rarely exact integers. By truncating them to int, you lose some precision. The (squared Pythagorean) distance checks below suggest that your math is otherwise correct:
(270-157)**2 + (149-85)**2
# 16865
(270-157)**2 + (214-149)**2
# 16994
(157-157)**2 + (214-85)**2
# 16641
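For reference, a minimal fix (a sketch of the same method with the int() truncation removed, keeping full float precision) would be:
import math

def fnCalculateArcCoordinates(self, intButtonCount, radius, center):
    lstButtonCoord = []
    for arc in range(1, intButtonCount + 1):
        degrees = arc * 360 / intButtonCount
        # keep the full float precision instead of truncating to int
        xDegreesCoord = center[0] + radius * math.cos(math.radians(degrees))
        yDegreesCoord = center[1] + radius * math.sin(math.radians(degrees))
        lstButtonCoord.append([xDegreesCoord, yDegreesCoord])
    return lstButtonCoord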
Furthermore, you can use the built-in complex number type and the cmath module. In particular cmath.rect converts polar coordinates (a radius and an angle) into rectangular coordinates:
import cmath
def calc(count, radius, center):
    x, y = center
    for i in range(count):
        r = cmath.rect(radius, (2*cmath.pi)*(i/count))
        yield [round(x + r.real, 2), round(y + r.imag, 2)]
list(calc(4, 2, [0, 0]))
# [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [-0.0, -2.0]]
list(calc(6, 1, [0, 0]))
# [[1.0, 0.0], [0.5, 0.87], [-0.5, 0.87], [-1.0, 0.0], [-0.5, -0.87], [0.5, -0.87]]
You may want to adjust the rounding as you see fit.

row-wise matrix multiplication using numpy

I want to implement a "row wise" matrix multiplication.
More specifically, I want to plot a set of arrows whose directions range over (-pi, pi). The following code is how I implemented it.
import numpy as np
import matplotlib.pyplot as plt

scan_phi = np.linspace(-np.pi*0.5, np.pi*0.5, 450)
points = np.ones((450, 2), dtype=float)  # np.float was removed in newer NumPy; use the builtin float
points[..., 0] = 0.0
n_pts = len(points)
sin = np.sin(scan_phi)
cos = np.cos(scan_phi)
rot = np.append(np.expand_dims(np.vstack([cos, -sin]).T, axis=1),
                np.expand_dims(np.vstack([sin, cos]).T, axis=1),
                axis=1)
points_rot = []
for idx, p in enumerate(points):
    points_rot.append(np.matmul(rot[idx], p.T))
points_rot = np.array(points_rot)
sample = points_rot[::10]
ax = plt.axes()
ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
for idx, p in enumerate(sample):
    if idx == 0:
        ax.arrow(0, 0, p[0], p[1], head_width=0.05, head_length=0.1, color='red')
    else:
        ax.arrow(0, 0, p[0], p[1], head_width=0.05, head_length=0.1, fc='k', ec='k')
plt.show()
In my code, rot ends up being an array of shape (450, 2, 2), meaning that for each arrow I have created a corresponding rotation matrix. I have 450 points stored in points (shape (450, 2)) that I want to draw arrows with. (Here the arrows are all initialized to [0, 1]; however, they could be initialized with different values, which is why I want 450 individual points instead of just rotating a single point by 450 different angles.)
The way I did it is with a for-loop, i.e. I transform each arrow individually:
points_rot = []
for idx, p in enumerate(points):
    points_rot.append(np.matmul(rot[idx], p.T))
points_rot = np.array(points_rot)
However, I wonder if there is a nicer and easier way to do this entirely in numpy, such as an operation that performs matrix multiplication row-wise. Any ideas would be appreciated, thanks in advance!
This is a nice use-case for np.einsum:
aa = np.random.normal(size=(450, 2, 2))
bb = np.random.normal(size=(450, 2))
cc = np.einsum('ijk,ik->ij', aa, bb)
So that each row of cc is the product of corresponding rows of aa and bb:
np.allclose(aa[3].dot(bb[3]), cc[3]) # returns True
Explanation: the Einstein notation ijk,ik->ij is saying:
cc[i,j] = sum(aa[i,j,k] * bb[i,k] for k in range(2))
I.e., all indices that do not appear on the right-hand side are summed over.
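An equivalent without einsum (a sketch using broadcasted matrix multiplication, treating each row of bb as a column vector):
import numpy as np

aa = np.random.normal(size=(450, 2, 2))
bb = np.random.normal(size=(450, 2))

# (450, 2, 2) @ (450, 2, 1) -> (450, 2, 1), then drop the trailing axis
cc = (aa @ bb[:, :, None])[:, :, 0]
print(np.allclose(cc, np.einsum('ijk,ik->ij', aa, bb)))  # True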

Scikit-learn and data visualisation: Why do I have to use ravel when I use predict?

I have a function here that visualises the classification made by a certain classifier, like logistic regression or simply the perceptron model. But there are several things I don't get. X has n examples and just 2 features.
1. Why do I have to use xx1.ravel() and xx2.ravel() and then transpose the entire array for classifier.predict? Why can't I simply predict the outcomes using the original dimensions?
2. Why do I need to reshape Z back to the original xx1 shape?
3. Why is there a need to create a meshgrid for plotting a scatter plot? Do the specific points in the meshgrid act like 'pixels' that represent a certain point on the grid? Why is this needed anyway?
4. What is the idx value in idx, cl in enumerate(np.unique(y)), when all I get when I use unique is simply the unique ids of the outcomes?
5. What is the use of c = cmap(idx) in the scatter function? Why can cmap take in an argument?
I apologise for the latter questions that may not fit with the topic question.
The code is taken from the Python Machine Learning book.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.002):
    # Setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'green', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])
    # MESHGRID - plot decision surface
    x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
    x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    # print 'meshgrid:', xx1, xx2
    # CLASSIFIER PREDICT
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                    c=cmap(idx), marker=markers[idx], label=cl)
    # highlight test samples
    if test_idx:
        XTest, yTest = X[test_idx, :], y[test_idx]
        plt.scatter(XTest[:, 0], XTest[:, 1], c='', alpha=1.0,
                    linewidth=1, marker='o', s=55, label='test set')
This business with meshgrid and ravel is simply a way of taking the cartesian product of the coordinate ranges in order to get a set of (x, y) coordinate pairs representing individual points in a region.
The classifier expects its input to be an Nx2 array, where N is the number of samples (i.e., cases whose class you want to predict). It wants two columns because there are two features.
Meshgrid produces two arrays, one containing the X coordinates of points in a specified rectangular region, and the other containing the Y coordinates of those points. By using .ravel(), you roll out these arrays into lists of coordinates. This is just a somewhat confusing way of taking the cartesian product of the desired coordinate ranges. In other words, this:
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))
coord1, coord2 = xx1.ravel(), xx2.ravel()
Is effectively the same as this:
coord1, coord2 = zip(*itertools.product(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution)))
You can see this with a simple example:
>>> xx1, xx2 = np.meshgrid(np.arange(3), np.arange(2))
>>> coord1, coord2 = xx1.ravel(), xx2.ravel()
>>> coord1
array([0, 1, 2, 0, 1, 2])
>>> coord2
array([0, 0, 0, 1, 1, 1])
>>> coord1, coord2 = zip(*itertools.product(np.arange(3), np.arange(2)))
>>> coord1
(0, 0, 1, 1, 2, 2)
>>> coord2
(0, 1, 0, 1, 0, 1)
You can see that the same x/y pairs are generated there (although they are generated in different orders).
The meshgrid approach was probably chosen here because it's needed for contourf. contourf essentially takes an "XY plane" as input (consisting of arrays of X and Y coordinates) along with an array of Z values for each point in that plane.
The upshot is that the classifier and the contour plot expect input in different formats. The classifier takes two individual values (the two input features) and returns a single value (the class it predicts). contourf requires a rectangular grid of points. In other words, loosely speaking, predict wants one X coordinate and one Y coordinate at a time, but contourf wants all the X coordinates first and then all the Y coordinates. The code you posted is doing some reshaping to convert between these two formats. You generate X and Y in the format contourf wants, and reshape it into the format predict wants so you can pass it to predict. predict gives you the Z data in the shape predict likes, and then you reshape that back into the format contourf wants.
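To make that conversion concrete, here is a minimal sketch of the round trip with a made-up classifier (LogisticRegression on random data; the data and names are illustrative, not from the book):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 2)                      # 100 samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # made-up labels
clf = LogisticRegression().fit(X, y)

# grid in the format contourf wants: two 2-D coordinate arrays
xx1, xx2 = np.meshgrid(np.arange(-3, 3, 0.05), np.arange(-3, 3, 0.05))

# reshape to the (N, 2) format predict wants, predict, then reshape back
grid = np.array([xx1.ravel(), xx2.ravel()]).T
Z = clf.predict(grid).reshape(xx1.shape)

plt.contourf(xx1, xx2, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.show()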
