I am reading magnetic field data from a text file. My goal is to correctly and efficiently load the mesh points (in 3 dimensions) and the associated fields (for simplicity I will assume below that I have a scalar field).
I managed to make it work, but I suspect some steps are unnecessary. In particular, reading the NumPy docs, it seems that "broadcasting" could work its magic to my advantage here.
import numpy as np
from scipy import interpolate
# Loaded from a text file, here the sampling over each dimension is identical but it is not required
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
z = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
# Create a mesh explicitly
mx, my, mz = np.meshgrid(x, y, z, indexing='ij') # I have to switch from 'xy' to 'ij'
# These 3 lines seem odd
mx = mx.reshape(np.prod(mx.shape))
my = my.reshape(np.prod(my.shape))
mz = mz.reshape(np.prod(mz.shape))
# Loaded from a text file
field = np.random.rand(len(mx))
# Put it all together
data = np.array([mx, my, mz, field]).T
# Interpolate
interpolation_points = np.array([[0, 0, 0]])
interpolate.griddata(data[:, 0:3], data[:, 3], interpolation_points, method='linear')
Is it really necessary to construct the mesh like this? Is it possible to make it more efficient?
Here's an approach using broadcast assignment to generate data directly from x, y, z, which avoids the memory overhead of creating all the mesh grids and should hopefully perform better:
m, n, r = len(x), len(y), len(z)
# Allocate the output once; the last axis holds (x, y, z, field)
out = np.empty((m, n, r, 4))
out[..., 0] = x[:, None, None]  # broadcast x along axes 1 and 2
out[..., 1] = y[:, None]        # broadcast y along axes 0 and 2
out[..., 2] = z                 # broadcast z along axes 0 and 1
out[..., 3] = np.random.rand(m, n, r)  # the field, loaded from a text file in practice
data_out = out.reshape(-1, out.shape[-1])
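As a quick sanity check (a minimal sketch, assuming both snippets above ran in the same session), the coordinate columns of data_out should match the meshgrid-based data exactly; only the field columns differ, since each snippet draws fresh random values:

# The broadcast construction reproduces the meshgrid ('ij') ordering
assert np.allclose(data_out[:, :3], data[:, :3])
# data_out can then be fed to griddata exactly like data
interpolate.griddata(data_out[:, :3], data_out[:, 3],
                     np.array([[0, 0, 0]]), method='linear')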
While trying to check that I was using both projectPoints() and undistortPoints() correctly, I am obtaining results that I find hard to understand.
Having defined the following intrinsic parameters:
# calibration matrix
K = np.array([
[500.0, 0.0, 300.0],
[0.0, 500.0, 250.0],
[0.0, 0.0, 1.0]
])
# distortion coefficients (k1, k2, p1, p2, k3)
distCoeffs = np.array([1.5, -0.95, -0.005, 0.0025, 1.16])
and the following pointcloud in the camera reference (with all the points being in the same plane):
# Coordinates of points in plane, representing a pointcloud in camera reference (c)
# plane
H, W = 1, 2
X, Y = np.meshgrid(np.arange(-W, W, 0.2), np.arange(-H, H, 0.2))
X, Y = X.reshape(1, -1), Y.reshape(1, -1)
# add depth. Pointcloud of n points represented as a (3, n) array:
Z = 5
P_c = np.concatenate((X, Y, Z * np.ones_like(X)), axis=0)
I was expecting that the following process would yield the original pointcloud:
Projecting the points, while accounting for the distortion, i.e. obtaining the distorted coordinates in the image plane:
# project points, including with lens distortion
U_dist, _ = cv2.projectPoints(P_c, np.zeros((3,)), np.zeros((3,)), K, distCoeffs)
# projections as (2, n) array.
U_dist = U_dist[:, 0].T
Undistorting the image coordinates to get the normalized coordinates in the camera reference:
# get normalized coordinates, in camera reference, as a (2, n) array
xn_u = cv2.undistortPoints(U_dist.T, K, distCoeffs, None, None)[:, 0].T
Multiplying the previous normalized coordinates with the depth of the plane in the camera reference to get the original pointcloud:
# add depth.
P_c2 = Z * np.concatenate((xn_u, np.ones_like(X)))
[Complete code]
import numpy as np
import cv2
# calibration matrix
K = np.array([
[500.0, 0.0, 300.0],
[0.0, 500.0, 250.0],
[0.0, 0.0, 1.0]
])
# distortion coefficients (k1, k2, p1, p2, k3)
distCoeffs = np.array([1.5, -0.95, -0.005, 0.0025, 1.16])
# Coordinates of points in plane, representing a pointcloud in camera reference (c)
# plane
H, W = 1, 2
X, Y = np.meshgrid(np.arange(-W, W, 0.2), np.arange(-H, H, 0.2))
X, Y = X.reshape(1, -1), Y.reshape(1, -1)
# add depth. Pointcloud of n points represented as a (3, n) array:
Z = 5
P_c = np.concatenate((X, Y, Z * np.ones_like(X)), axis=0)
# ---------------------------------------------
# PROJECTION WITH DISTORTION
# project points, including with lens distortion
U_dist, _ = cv2.projectPoints(P_c, np.zeros((3,)), np.zeros((3,)), K, distCoeffs)
# projections as (2, n) array.
U_dist = U_dist[:, 0].T
#-----------------------------
# UNPROJECTION accounting for distortion
# get normalized coordinates, in camera reference, as a (2, n) array
xn_u = cv2.undistortPoints(U_dist.T, K, distCoeffs, None, None)[:, 0].T
# add depth.
P_c2 = Z * np.concatenate((xn_u, np.ones_like(X)))
# check equality (raises error)
assert np.allclose(P_c, P_c2), f'max difference: {np.abs(P_c - P_c2).max()}'
However, this is not the case: the resulting pointcloud is significantly different from the original one.
I feel this may be due to a misunderstanding in the use of the functions above.
Any help in understanding where I'm taking the wrong step(s) is highly appreciated.
EDIT
After some more experimentation, I believe the issue is with undistortPoints() rather than with projectPoints(). The latter is deterministic, while the former needs to solve a non-linear optimization problem. Empirically, as I increase the distortion, undistortPoints() tends to give worse results; at lower distortion levels, it correctly undoes the distortion.
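One way to probe this (a minimal diagnostic sketch, reusing K, distCoeffs, P_c2 and U_dist from the code above) is to re-project the recovered points through the same distortion model; a large round-trip error indicates that the iterative undistortion failed to converge:

# Re-project the recovered pointcloud through the same distortion model.
# If undistortPoints() converged, this round trip should reproduce U_dist.
U_rt, _ = cv2.projectPoints(P_c2, np.zeros((3,)), np.zeros((3,)), K, distCoeffs)
U_rt = U_rt[:, 0].T
print('round-trip max pixel difference:', np.abs(U_dist - U_rt).max())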
Given an image (array) in rectangular form, how do I interpolate specific pixel positions? The following code produces a 20x30 grid, with each pixel filled with a value (zg). The code then constructs an interpolator with scipy's interp2d method. What I want is to obtain interpolated values at specific coordinates: in the given example, at x = [1.5, 2.4, 5.8], y = [0.5, 7.2, 2.2], so a total of 3 positions. However, the function returns a 3x3 array for some reason. Why? And how would I change the code so that only these three coordinates are evaluated?
import numpy as np
from scipy.interpolate import interp2d
# Rectangular grid
x = np.arange(20)
y = np.arange(30)
xg, yg = np.meshgrid(x, y)
zg = np.exp(-(2*xg)**2 - (yg/2)**2)
# Define interpolator
interp = interp2d(yg, xg, zg)
# Interpolate pixel value
zi = interp([1.5, 2.4, 5.8], [0.5, 7.2, 2.2])
print(zi.shape) # = (3, 3)
Your code is fine. The interp interpolation function computes all possible combinations of the coordinates, i.e. 3 × 3 = 9 points. For instance:
>>> interp(1.5, 0.5)
array([0.04635516])
>>> interp(1.5, 7.2)
array([0.02152198])
>>> interp(5.8, 2.2)
array([0.03073694])
>>> interp(2.4, 2.2)
array([0.03810408])
Indeed you can find these values in the returned matrix:
>>> interp([1.5, 2.4, 5.8], [0.5, 7.2, 2.2])
array([[0.04635516, 0.04409826, 0.03557219],
[0.0400542 , 0.03810408, 0.03073694],
[0.02152198, 0.02047414, 0.01651562]])
The documentation states that the return value is a
2-D array with shape (len(y), len(x))
If you just want the coordinates you need, you can do the following:
xe = [1.5, 2.4, 5.8]
ye = [0.5, 7.2, 2.2]
>>> [interp(x, y)[0] for x, y in zip(xe, ye)]
[0.04635515780224686, 0.020474138863349815, 0.030736938802464715]
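If you only need pointwise evaluation, an alternative worth considering (a sketch, not a drop-in replacement for the code above, and note that interp2d is deprecated in recent SciPy releases) is scipy.interpolate.RectBivariateSpline, whose ev method evaluates at coordinate pairs rather than at the grid product. It expects the grid axes in (first-axis, second-axis) order:

import numpy as np
from scipy.interpolate import RectBivariateSpline

x = np.arange(20)
y = np.arange(30)
zg = np.exp(-(2 * x[None, :])**2 - (y[:, None] / 2)**2)  # shape (30, 20)

# zg's first axis runs along y, so y comes first here
interp_spline = RectBivariateSpline(y, x, zg)

# ev evaluates pointwise: one output value per (y, x) pair
zi = interp_spline.ev([0.5, 7.2, 2.2], [1.5, 2.4, 5.8])
print(zi.shape)  # (3,)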
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

def derivative(X, t, A, B, C, D):
    x, y = X
    dotx = x * (A - B * y)
    doty = y * (-D + C * x)
    return np.array([dotx, doty])

def integration(t, A, B, C, D, X0):
    res = odeint(derivative, X0, t, args=(A, B, C, D))
    return res
X0 = [30, 4]
X = np.array([[30. , 4. ],
[47.2, 6.1],
[70.2, 9.8],
[77.4, 35.2],
[36.3, 59.4],
[20.6, 41.7],
[18.1, 19. ],
[21.4, 13. ],
[22. , 8.3],
[25.4, 9.1],
[27.1, 7.4],
[40.3, 8. ],
[57. , 12.3],
[76.6, 19.5],
[52.3, 45.7],
[19.5, 51.1],
[11.2, 29.7],
[ 7.6, 15.8],
[14.6, 9.7],
[16.2, 10.1],
[24.7, 8.6]])
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
XData = t
YData = X
curve_fit(integration,XData,YData)
So X is my data, the first column is species x, and second column is species y.
I tried to infer the parameters of this Lotka-Volterra model using odeint and curve_fit.
The error says "not enough values to unpack (expected 2, got 1)".
I am actually not even sure whether I should infer parameters this way.
Can anyone help me with this? Are there any better methods of inferring the parameters?
Thanks in advance!
Note that ydata is required to be a flat array. While it is strongly suggested that xdata contain one input value or vector per element of ydata, there is no requirement for it. xdata is a constant that could also have been passed some other way; it is there just for convenience in standard regression tasks.
Thus it is also no problem to have ydata twice as long as xdata. Just apply .flatten() to the 2-dimensional arrays.
Next, the parameter list has to be a list of scalars, so add Y0 and pass the initial vector [X0,Y0].
Together these corrections lead to a result, though not a very convincing one.
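Putting those corrections together (a minimal sketch, assuming the t, X, and derivative definitions from the question; the starting guesses in p0 are hypothetical):

def integration(t, A, B, C, D, X0, Y0):
    # every parameter, including the initial state, is a scalar
    res = odeint(derivative, [X0, Y0], t, args=(A, B, C, D))
    return res.flatten()  # curve_fit expects a flat ydata

params, cov = curve_fit(integration, t, X.flatten(),
                        p0=[0.5, 0.02, 0.02, 0.8, 30.0, 4.0])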
I got a better result, though still not perfect, by using a multiple-shooting approach: take the points in X[:-1], integrate each over a time step of 1, and compare the collected list of end-points to X[1:]. This works better at finding parameters that match amplitude and frequency, but produces a slight speed difference that looks better with a 3% correction of the coefficients.
One would probably need a mix of both approaches to get the local as well as global characteristics respected.
And indeed it works, giving parameters
A,B = 0.5215206964006734, 0.02567364947581818
C,D = 0.02493663631623848, 0.8476224408838039
X0,Y0 = 34.53872014350661, 4.653177640949391
Code for that combined fitting program: for the residual computation, first encapsulate the solver to avoid repeating the solver parameters. Then use it to integrate first over the full interval from the variable initial point, and then over the time-step-1 segments.
def solver(XY, t, para):
    return odeint(derivative, XY, t, args=para, atol=1e-8, rtol=1e-11)

def integration(XY_arr, *para):
    XY0 = para[4:]
    para = para[:4]
    T = np.arange(len(XY_arr))
    # full-interval integration from the (fitted) initial point
    res0 = solver(XY0, T, para)
    # one-step integrations from each data point (multiple shooting)
    res1 = [solver(XY, [t, t+1], para)[-1]
            for t, XY in enumerate(XY_arr[:-1])]
    return np.concatenate([res0, res1]).flatten()
This obviously needs the reference array prepared in the same fashion:
XData = X
YData = np.concatenate([ X,X[1:]]).flatten()
p0 =[ 0.5215, 0.02567,
0.02493, 0.8476,
34.53, 4.653]
After that, the curve-fitting call itself remains the same; all the changes happened before it:
params, info = curve_fit(integration,XData,YData,p0=p0)
XY0, para = params[4:], params[:4]
print(XY0,tuple(para))
t_plot = np.linspace(0,len(X),500)
x_plot = solver(XY0, t_plot, tuple(para))
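To inspect the result visually, one can overlay the fitted trajectory on the data (a small plotting sketch, assuming the variables above):

import matplotlib.pyplot as plt
plt.plot(t_plot, x_plot[:, 0], label='x (fit)')
plt.plot(t_plot, x_plot[:, 1], label='y (fit)')
plt.plot(range(len(X)), X[:, 0], 'o', label='x (data)')
plt.plot(range(len(X)), X[:, 1], 's', label='y (data)')
plt.legend()
plt.show()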
I have a function here that visualises the classification regions produced by a classifier such as logistic regression or a simple perceptron. But there are several things I don't get:
X has n examples and just 2 features.
1. Why do I have to use xx1.ravel() and xx2.ravel() and then transpose the entire array for classifier.predict? Why can't I simply predict the outcomes using the original dimensions?
2. Why do I need to reshape Z back to the original xx1 shape?
3. Why is there a need to create a meshgrid for plotting a scatter plot? Do the specific points in the meshgrid act like 'pixels' that represent certain points on the grid? Why is this needed anyway?
4. What is the idx value in idx, cl in enumerate(np.unique(y)), when all I get from unique is simply the unique values of the outcomes?
5. What is the use of c = cmap(idx) in the scatter function? Why can cmap take an argument?
I apologise for the latter questions that may not fit with the topic question.
The code is taken from the Python Machine Learning book.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.002):
    # Setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'green', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # MESHGRID - plot decision surface
    x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
    x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    # print('meshgrid:', xx1, xx2)

    # CLASSIFIER PREDICT
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                    c=cmap(idx), marker=markers[idx], label=cl)

    # highlight test samples
    if test_idx:
        XTest, yTest = X[test_idx, :], y[test_idx]
        plt.scatter(XTest[:, 0], XTest[:, 1], c='', alpha=1.0,
                    linewidth=1, marker='o', s=55, label='test set')
This business with meshgrid and ravel is simply a way of taking the cartesian product of the coordinate ranges in order to get a set of (x, y) coordinate pairs representing individual points in a region.
The classifier expects its input to be an Nx2 array, where N is the number of samples (i.e., cases whose class you want to predict). It wants two columns because there are two features.
Meshgrid produces two arrays, one containing the X coordinates of points in a specified rectangular region, and the other containing the Y coordinates of those points. Using .ravel(), you flatten these arrays into 1-D lists of coordinates. This is just a somewhat confusing way of taking the cartesian product of the desired coordinate ranges. In other words, this:
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution))
coord1, coord2 = xx1.ravel(), xx2.ravel()
Is effectively the same as this:
coord1, coord2 = zip(*itertools.product(np.arange(x1_min, x1_max, resolution), np.arange(x2_min, x2_max, resolution)))
You can see this with a simple example:
>>> xx1, xx2 = np.meshgrid(np.arange(3), np.arange(2))
>>> coord1, coord2 = xx1.ravel(), xx2.ravel()
>>> coord1
array([0, 1, 2, 0, 1, 2])
>>> coord2
array([0, 0, 0, 1, 1, 1])
>>> coord1, coord2 = zip(*itertools.product(np.arange(3), np.arange(2)))
>>> coord1
(0, 0, 1, 1, 2, 2)
>>> coord2
(0, 1, 0, 1, 0, 1)
You can see that the same x/y pairs are generated there (although they are generated in different orders).
The meshgrid approach was probably chosen here because it's needed for contourf. contourf essentially takes an "XY plane" as input (consisting of arrays of X and Y coordinates) along with an array of Z values for each point in that plane.
The upshot is that the classifier and the contour plot expect input in different formats. The classifier takes two individual values (the two input features) and returns a single value (the class it predicts). contourf requires a rectangular grid of points. In other words, loosely speaking, predict wants one X coordinate and one Y coordinate at a time, but contourf wants all the X coordinates first and then all the Y coordinates. The code you posted is doing some reshaping to convert between these two formats. You generate X and Y in the format contourf wants, and reshape it into the format predict wants so you can pass it to predict. predict gives you the Z data in the shape predict likes, and then you reshape that back into the format contourf wants.
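A toy sketch of that round trip, with a stand-in "classifier" that just sums its two inputs (not the book's model):

import numpy as np

xx1, xx2 = np.meshgrid(np.arange(3), np.arange(2))  # grid format, shape (2, 3)
pairs = np.array([xx1.ravel(), xx2.ravel()]).T       # predict format, shape (6, 2)
Z = pairs.sum(axis=1)                                # one "prediction" per point
Z = Z.reshape(xx1.shape)                             # back to grid format for contourf
print(Z)
# [[0 1 2]
#  [1 2 3]]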
I am using NumPy to do matrix multiplication.
If I use t = t * x, it works just fine, but if I use t *= x, it doesn't.
Do I need to use t = t * x?
import numpy as np

if __name__ == '__main__':
    x = [
        [0.9, 0.075, 0.025],
        [0.15, 0.8, 0.05],
        [0.25, 0.25, 0.5]
    ]
    t = [1, 0, 0]
    x = np.matrix(x)
    t = np.matrix(t)
    t = t * x  # works: [[ 0.9  0.075  0.025]]
    # t *= x   # doesn't work? always [[0 0 0]]
    print(t)
You filled t with ints rather than floats, so NumPy decides you want a matrix of integer dtype. When you do t *= x, this requests that the operation be performed in place, reusing the t object to store the result. This forces the results to be cast to integers, so they can be stored in t.
Initialize t with floats:
t = np.matrix([1.0, 0.0, 0.0])
I would also recommend switching to plain arrays rather than matrices. The convenience of * over dot isn't worth the inconsistencies matrix causes. If you're on Python 3.5 or later, you can even use @ for matrix multiplication with regular arrays.
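For completeness, a quick sketch of that recommendation with plain arrays and the @ operator:

x = np.array([[0.9, 0.075, 0.025],
              [0.15, 0.8, 0.05],
              [0.25, 0.25, 0.5]])
t = np.array([1.0, 0.0, 0.0])
t = t @ x   # out-of-place matrix-vector product
print(t)    # [0.9   0.075 0.025]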