How to smooth z values and reduce its variation in python? - python

I have a 3D point cloud file containing x, y,z values. My ultimate goal is to cluster the point cloud. But the point cloud had some noise and I removed the noise using PCA. Now I want to smoothen the z values and cluster it based on the x and y.
I know the smoothing can be done using "gridfit" in matlab, (gridfit.m) but I am looking for a solution that can be done in Python.
https://au.mathworks.com/matlabcentral/fileexchange/8998-surface-fitting-using-gridfit
Is there a method that works similar to gridfit for reducing the variation of z?

this function is open source, it take some time to convert it but it is not impossible.
You just need theses functions:
def histc(X, bins):
map_to_bins = numpy.digitize(X,bins)
r = numpy.zeros(bins.shape)
for i in map_to_bins:
r[i-1] += 1
return r, map_to_bins
def sparse(i, j, v, m, n):
"""
Create and compressing a matrix that have many zeros
Parameters:
i: 1-D array representing the index 1 values
Size n1
j: 1-D array representing the index 2 values
Size n1
v: 1-D array representing the values
Size n1
m: integer representing x size of the matrix
n: integer representing y size of the matrix
Returns:
s: 2-D array
Matrix full of zeros excepting values v at indexes i, j
"""
return scipy.sparse.csr_matrix((v, (i, j)), shape=(m, n))
def repmat(A, reps):
"""
Repeting A list according to list dimension
Parameters:
A: array_like
The input array.
reps: array_like
The number of repetitions of A along each axis.
Returns:
c: ndarray
The tiled output array.
"""
return numpy.tile(A, numpy.array(reps))
def meshgrid(a, b):
"""
Return coordinate matrices from coordinate vectors.
Make N-D coordinate arrays for vectorized evaluations
of N-D scalar/vector fields over N-D grids,
given one-dimensional coordinate arrays a, b
Parameters:
a: array_like
1-D arrays representing the x coordinates of a grid.
b: array_like
1-D arrays representing the y coordinates of a grid.
Returns:
c: 2-D arrays representing the x * ycoordinates of grid.
"""
return numpy.meshgrid(a, b)
Otherwise you can use griddata and apply a blur on it by considering the result as an grayscale image but don't forget to cut off edges to avoid sharp edges:
You have the mathematical description here.
Don't forget to checkout source code here.
Here is the default parameter code in matlab to translate:
function [zgrid,xgrid,ygrid] = gridfit(x,y,z,xnodes,ynodes,varargin)
params.smoothness = 1;
params.interp = 'triangle';
params.regularizer = 'gradient';
params.solver = 'backslash';
params.maxiter = [];
params.extend = 'warning';
params.tilesize = inf;
params.overlap = 0.20;
params.mask = [];
params.autoscale = 'on';
params.xscale = 1;
params.yscale = 1;
params = check_params(params);
xmin = min(x);
xmax = max(x);
ymin = min(y);
ymax = max(y);
xnodes=xnodes(:);
ynodes=ynodes(:);
dx = diff(xnodes);
dy = diff(ynodes);
nx = length(xnodes);
ny = length(ynodes);
ngrid = nx*ny;
params.xscale = mean(dx);
params.yscale = mean(dy);
n = length(x);
[junk,indx] = histc(x,xnodes);
[junk,indy] = histc(y,ynodes);
k=(indx==nx);
indx(k)=indx(k)-1;
k=(indy==ny);
indy(k)=indy(k)-1;
ind = indy + ny*(indx-1);
tx = min(1,max(0,(x - xnodes(indx))./dx(indx)));
ty = min(1,max(0,(y - ynodes(indy))./dy(indy)));
A = sparse(
repmat((1:n)',1,4),
[ind,ind+1,ind+ny,ind+ny+1],
[(1-tx).*(1-ty), (1-tx).*ty, tx.*(1-ty), tx.*ty],
n,
ngrid
);
rhs = z;
smoothparam = params.smoothness;
xyRelativeStiffness = [1;1];
[i,j] = meshgrid(1:nx,2:(ny-1));
ind = j(:) + ny*(i(:)-1);
dy1 = dy(j(:)-1)/params.yscale;
dy2 = dy(j(:))/params.yscale;
Areg = sparse(
repmat(ind,1,3),
[ind-1,ind,ind+1],
xyRelativeStiffness(2)*[-2./(dy1.*(dy1+dy2)),
2./(dy1.*dy2),
-2./(dy2.*(dy1+dy2))],
ngrid,
ngrid
);
[i,j] = meshgrid(1:nx,2:(ny-1));
ind = j(:) + ny*(i(:)-1);
dy1 = dy(j(:)-1)/params.yscale;
dy2 = dy(j(:))/params.yscale;
Areg = sparse(repmat(ind,1,3),[ind-1,ind,ind+1], ...
xyRelativeStiffness(2)*[-2./(dy1.*(dy1+dy2)), ...
2./(dy1.*dy2), -2./(dy2.*(dy1+dy2))],ngrid,ngrid);
[i,j] = meshgrid(2:(nx-1),1:ny);
ind = j(:) + ny*(i(:)-1);
dx1 = dx(i(:)-1)/params.xscale;
dx2 = dx(i(:))/params.xscale;
Areg = [Areg;sparse(repmat(ind,1,3),[ind-ny,ind,ind+ny], ...
xyRelativeStiffness(1)*[-2./(dx1.*(dx1+dx2)), ...
2./(dx1.*dx2), -2./(dx2.*(dx1+dx2))],ngrid,ngrid)];
nreg = size(Areg,1);
NA = norm(A,1);
NR = norm(Areg,1);
A = [A;Areg*(smoothparam*NA/NR)];
rhs = [rhs;zeros(nreg,1)];
zgrid = reshape((A'*A)\(A'*rhs),ny,nx);

Related

Multigrid Poisson Solver

I am trying to make my own CFD solver and one of the most computationally expensive parts is solving for the pressure term. One way to solve Poisson differential equations faster is by using a multigrid method. The basic recursive algorithm for this is:
function phi = V_Cycle(phi,f,h)
% Recursive V-Cycle Multigrid for solving the Poisson equation (\nabla^2 phi = f) on a uniform grid of spacing h
% Pre-Smoothing
phi = smoothing(phi,f,h);
% Compute Residual Errors
r = residual(phi,f,h);
% Restriction
rhs = restriction(r);
eps = zeros(size(rhs));
% stop recursion at smallest grid size, otherwise continue recursion
if smallest_grid_size_is_achieved
eps = smoothing(eps,rhs,2*h);
else
eps = V_Cycle(eps,rhs,2*h);
end
% Prolongation and Correction
phi = phi + prolongation(eps);
% Post-Smoothing
phi = smoothing(phi,f,h);
end
I've attempted to implement this algorithm myself (also at the end of this question) however it is very slow and doesn't give good results so evidently it is doing something wrong. I've been trying to find why for too long and I think it's just worthwhile seeing if anyone can help me.
If I use a grid size of 2^5 by 2^5 points, then it can solve it and give reasonable results. However, as soon as I go above this it takes exponentially longer to solve and basically get stuck at some level of inaccuracy, no matter how many V-Loops are performed. at 2^7 by 2^7 points, the code takes way too long to be useful.
I think my main issue is that my implementation of a jacobian iteration is using linear algebra to calculate the update at each step. This should, in general, be fast however, the update matrix A is an n*m sized matrix, and calculating the dot product of a 2^7 * 2^7 sized matrix is expensive. As most of the cells are just zeros, should I calculate the result using a different method?
if anyone has any experience in multigrid methods, I would appreciate any advice!
Thanks
my code:
# -*- coding: utf-8 -*-
"""
Created on Tue Dec 29 16:24:16 2020
#author: mclea
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import griddata
from matplotlib import cm
def restrict(A):
"""
Creates a new grid of points which is half the size of the original
grid in each dimension.
"""
n = A.shape[0]
m = A.shape[1]
new_n = int((n-2)/2+2)
new_m = int((m-2)/2+2)
new_array = np.zeros((new_n, new_m))
for i in range(1, new_n-1):
for j in range(1, new_m-1):
ii = int((i-1)*2)+1
jj = int((j-1)*2)+1
# print(i, j, ii, jj)
new_array[i,j] = np.average(A[ii:ii+2, jj:jj+2])
new_array = set_BC(new_array)
return new_array
def interpolate_array(A):
"""
Creates a grid of points which is double the size of the original
grid in each dimension. Uses linear interpolation between grid points.
"""
n = A.shape[0]
m = A.shape[1]
new_n = int((n-2)*2 + 2)
new_m = int((m-2)*2 + 2)
new_array = np.zeros((new_n, new_m))
i = (np.indices(A.shape)[0]/(A.shape[0]-1)).flatten()
j = (np.indices(A.shape)[1]/(A.shape[1]-1)).flatten()
A = A.flatten()
new_i = np.linspace(0, 1, new_n)
new_j = np.linspace(0, 1, new_m)
new_ii, new_jj = np.meshgrid(new_i, new_j)
new_array = griddata((i, j), A, (new_jj, new_ii), method="linear")
return new_array
def adjacency_matrix(rows, cols):
"""
Creates the adjacency matrix for an n by m shaped grid
"""
n = rows*cols
M = np.zeros((n,n))
for r in range(rows):
for c in range(cols):
i = r*cols + c
# Two inner diagonals
if c > 0: M[i-1,i] = M[i,i-1] = 1
# Two outer diagonals
if r > 0: M[i-cols,i] = M[i,i-cols] = 1
return M
def create_differences_matrix(rows, cols):
"""
Creates the central differences matrix A for an n by m shaped grid
"""
n = rows*cols
M = np.zeros((n,n))
for r in range(rows):
for c in range(cols):
i = r*cols + c
# Two inner diagonals
if c > 0: M[i-1,i] = M[i,i-1] = -1
# Two outer diagonals
if r > 0: M[i-cols,i] = M[i,i-cols] = -1
np.fill_diagonal(M, 4)
return M
def set_BC(A):
"""
Sets the boundary conditions of the field
"""
A[:, 0] = A[:, 1]
A[:, -1] = A[:, -2]
A[0, :] = A[1, :]
A[-1, :] = A[-2, :]
return A
def create_A(n,m):
"""
Creates all the components required for the jacobian update function
for an n by m shaped grid
"""
LaddU = adjacency_matrix(n,m)
A = create_differences_matrix(n,m)
invD = np.zeros((n*m, n*m))
np.fill_diagonal(invD, 1/4)
return A, LaddU, invD
def calc_RJ(rows, cols):
"""
Calculates the jacobian update matrix Rj for an n by m shaped grid
"""
n = int(rows*cols)
M = np.zeros((n,n))
for r in range(rows):
for c in range(cols):
i = r*cols + c
# Two inner diagonals
if c > 0: M[i-1,i] = M[i,i-1] = 0.25
# Two outer diagonals
if r > 0: M[i-cols,i] = M[i,i-cols] = 0.25
return M
def jacobi_update(v, f, nsteps=1, max_err=1e-3):
"""
Uses a jacobian update matrix to solve nabla(v) = f
"""
f_inner = f[1:-1, 1:-1].flatten()
n = v.shape[0]
m = v.shape[1]
A, LaddU, invD = create_A(n-2, m-2)
Rj = calc_RJ(n-2,m-2)
update=True
step = 0
while update:
v_old = v.copy()
step += 1
vt = v_old[1:-1, 1:-1].flatten()
vt = np.dot(Rj, vt) + np.dot(invD, f_inner)
v[1:-1, 1:-1] = vt.reshape((n-2),(m-2))
err = v - v_old
if step == nsteps or np.abs(err).max()<max_err:
update=False
return v, (step, np.abs(err).max())
def MGV(f, v):
"""
Solves for nabla(v) = f using a multigrid method
"""
# global A, r
n = v.shape[0]
m = v.shape[1]
# If on the smallest grid size, compute the exact solution
if n <= 6 or m <=6:
v, info = jacobi_update(v, f, nsteps=1000)
return v
else:
# smoothing
v, info = jacobi_update(v, f, nsteps=10, max_err=1e-1)
A = create_A(n, m)[0]
# calculate residual
r = np.dot(A, v.flatten()) - f.flatten()
r = r.reshape(n,m)
# downsample resitdual error
r = restrict(r)
zero_array = np.zeros(r.shape)
# interploate the correction computed on a corser grid
d = interpolate_array(MGV(r, zero_array))
# Add prolongated corser grid solution onto the finer grid
v = v - d
v, info = jacobi_update(v, f, nsteps=10, max_err=1e-6)
return v
sigma = 0
# Setting up the grid
k = 6
n = 2**k+2
m = 2**(k)+2
hx = 1/n
hy = 1/m
L = 1
H = 1
x = np.linspace(0, L, n)
y = np.linspace(0, H, m)
XX, YY = np.meshgrid(x, y)
# Setting up the initial conditions
f = np.ones((n,m))
v = np.zeros((n,m))
# How many V cyles to perform
err = 1
n_cycles = 10
loop = True
cycle = 0
# Perform V cycles until converged or reached the maximum
# number of cycles
while loop:
cycle += 1
v_new = MGV(f, v)
if np.abs(v - v_new).max() < err:
loop = False
if cycle == n_cycles:
loop = False
v = v_new
print("Number of cycles " + str(cycle))
plt.contourf(v)
I realize that I'm not answering your question directly, but I do note that you have quite a few loops that will contribute some overhead cost. When optimizing code, I have found the following thread useful - particularly the line profiler thread. This way you can focus in on "high time cost" lines and then start to ask more specific questions regarding opportunities to optimize.
How do I get time of a Python program's execution?

Coding Isomap (& MDS) function using only numpy and scipy in python

I have coded Isomap function starting with computing the eulidean distance matrix (using scipy.spatial.distance.cdist), next basing on K-nearest neighbors method and Dijkstra algorithm (to determinate the shortest path) I have Computed the full distance matrix over all paths, finally I have did map computations, following by the dimensionality reduction.
BUT, I want to use epsilon instead of K-nearest neighbors like in the following :
Y = isomap (X, epsilon, d)
• X is an n × m matrix which corresponds to n points with m attributes.
• epsilon is an anonymous function of the distance matrix used to find the parameters of neighborhood. (The neighborhood graph must be formed by eliminating the edges whose width is greater to epsilon of the complete distance graph).
• d is a parameter which signifies the output dimension.
• Y is an n × d matrix, which signifies the embedding resulting from isomap.
THANKS in advance
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
def distance_Matrix(X):
return cdist(X,X,'euclidean')
def Dijkstra(h):
q = h.copy()
for i in range(ndata):
for j in range(ndata):
k = np.argmin(q[i,:])
while not(np.isinf(q[i,k])):
q[i,k] = np.inf
for l in neighbours[k,:]:
possible = h[i,l] + h[l,k]
if possible < h[i,k]:
h[i,k] = possible
k = np.argmin(q[i,:])
return h
def MDS(D,newdim=2):
n = D.shape[0]
# Torgerson formula
I = np.eye(n)
J = np.ones(D.shape)
J = I-(1/n)*J
B = (-1/2)*np.dot(np.dot(J,D),np.dot(D,J)) # B = -(1/2).JD²J
#
eigenval, eigenvec = np.linalg.eig(B)
indices = np.argsort(eigenval)[::-1]
eigenval = eigenval[indices]
eigenvec = eigenvec[:, indices]
# dimension reduction
K = eigenvec[:, :newdim]
L = np.diag(eigenval[:newdim])
# result
Y = K # L **(1/2)
return np.real(Y)
def isomap(data,newdim=2,K=12):
ndata = np.shape(data)[0]
ndim = np.shape(data)[1]
d = distance_Matrix(X)
# replace begin
# K-nearest neighbours
indices = d.argsort()
#notneighbours = indices[:,K+1:]
neighbours = indices[:,:K+1]
# replace end
h = np.ones((ndata,ndata),dtype=float)*np.inf
for i in range(ndata):
h[i,neighbours[i,:]] = d[i,neighbours[i,:]]
h = Dijkstra(h)
return MDS(h,newdim)
Try sklearn.neighbors.radius_neighbors_graph for your distance matrix

Vecrtorized evluation of function defined by matrix over grid

I'm looking to plot the value of a function defined by a matrix over a grid of values.
Let S be an invertable 2x2 matrix and let x be a 2-dimensional vector. How can vectorize the evaluation of x#S#x over a two dimensional grid?
Here is how I currently do it. It works, but takes a beat to perform the computation since the grid is so fine.
#Initialize Matrix
S = np.zeros(shape = (2,2))
while np.linalg.matrix_rank(S)<S.shape[1]:
S = np.random.randint(-5,5+1, size = (2,2))
X,Y = [j.ravel() for j in np.meshgrid(np.linspace(-2,2,1001),np.linspace(-2,2,1001))]
Z = np.zeros_like(X)
for i,v in enumerate(zip(X,Y)):
v = np.array(v)
Z[i] = v#S#v
n = int(np.sqrt(X.size))
Z = Z.reshape(n,n)
X = X.reshape(n,n)
Y = Y.reshape(n,n)
plt.contour(X,Y,Z)
Simplest would be with stacking those X,Y into a 2-column 2D array and then using np.einsum to replace the loopy matrix-multiplications -
p = np.column_stack((X,Y)) # or np.stack((X,Y)).T
Zout = np.einsum('ij,jk,ik->i',p,S,p,optimize=True)

Integrating an array that returns the same size array instead of a tuple in python?

My apologize for the lengthy title. I have been working on a project for a a while and i'm in a rut for a certain part in my code. Ill do my best to be thorough.
I have an numpy array of masses, M, of size and shape 167197.
## Non-constant
M = data['m200'] # kg // Mass of dark matter haloes
R = [] # Km // Radius of sphere
for masses in M:
R.append(((3*masses)/(RHO_C*4*(3.14))**(1.0/3.0)))
I have fitting function with independent values of k that are part of my question. k is defined value in my code.
def T(k): # Fitting Function // Assuming a lambdaCDM model
q = k/((OMEGA_M)*H**2)*((T_CMB)/27)**2
L = np.log(euler+1.84*q)
C = 14.4 + 325/(1+60.5*q**1.11)
return L/(L+C*q**2)
##############################################################################
def P(k): # Linear Power Spectrum
A = 0.75 # LambdaCDM Power Normalization
n = 0.95 # current constraints from WMAP+LSS
return A*k**n*T(k)**2
* For the actual problem *
I have a Fourier transfrom W(kR)
def W(R):# Fourier Transfrom of Top Hat function
return (3*(np.sin(k*R)-(k*R)*np.cos(k*R)))/(k*R)**(3)
W_a = []
for radii in R:
W_a.append(W(radii))
In this condition, i'm treating R as the independent value instead of kR combined
printing the length of W_a gives me the exact same size as mu numpy array, so all is well.
This function will play a part for a integral along with the is included in this function of sigma
def sigma(R): # Mass Varience
k1 = lambda k: k**2*P(k)*W(R)**2
norm1 = 1/(2*np.pi**2)
return (integrate.quad(k1, 0, np.Inf))
sigma_a = []
for radii in R:
sigma_a.extend(sigma(radii))
The integral will create a tuple, of course. But for each value inside R. I'm wanting to create a list, or an array. So, when using .extend(), the length of my array is now doubled with a length of now 334394.
How do i correct it to where the integral evaluates each R in W(kR) returning an array of the same size, 167197?
First just a Python note:
R = [] # Km // Radius of sphere
for masses in M:
R.append(((3*masses)/(RHO_C*4*(3.14))**(1.0/3.0)))
can be expressed as:
R = [((3*masses)/(RHO_C*4*(3.14))**(1.0/3.0)) for masses in M]
In:
return (integrate.quad(k1, 0, np.Inf))
the outer set of () doesn't make a difference.
return integrate.quad(k1, 0, np.Inf)
should return the same thing.
Now where does the doubling come from? In the quad docs we see it returns 2 values, the integeral and an error term. That's shown as a tuple in some examples, but it is also unpacked in others:
y, err = integrate.quad(f, 0, 1, args=(3,))
If you want just integral, and not err, you could index, integate...()[0].
sigma_a = []
for radii in R:
sigma_a.append(sigma(radii)[0])
or
sigma_a = [sigma(radii)[0] for radii in R]
or
def sigma1(R): # Mass Varience
k1 = lambda k: k**2*P(k)*W(R)**2
norm1 = 1/(2*np.pi**2)
y, err = integrate.quad(k1, 0, np.Inf)
return y # return just the integral
sigma_a = [sigma1(radii) for radii in R]
If you want to collect both y and err, but in separate lists, use zip* to repack them (something like the numpy transpose).
ll = [sigma(radii) for radii in R]
# [(y0,err0),(y1,err1), ...]
ys, errs = zip(*ll)

How to perform cubic spline interpolation in python?

I have two lists to describe the function y(x):
x = [0,1,2,3,4,5]
y = [12,14,22,39,58,77]
I would like to perform cubic spline interpolation so that given some value u in the domain of x, e.g.
u = 1.25
I can find y(u).
I found this in SciPy but I am not sure how to use it.
Short answer:
from scipy import interpolate
def f(x):
x_points = [ 0, 1, 2, 3, 4, 5]
y_points = [12,14,22,39,58,77]
tck = interpolate.splrep(x_points, y_points)
return interpolate.splev(x, tck)
print(f(1.25))
Long answer:
scipy separates the steps involved in spline interpolation into two operations, most likely for computational efficiency.
The coefficients describing the spline curve are computed,
using splrep(). splrep returns an array of tuples containing the
coefficients.
These coefficients are passed into splev() to actually
evaluate the spline at the desired point x (in this example 1.25).
x can also be an array. Calling f([1.0, 1.25, 1.5]) returns the
interpolated points at 1, 1.25, and 1,5, respectively.
This approach is admittedly inconvenient for single evaluations, but since the most common use case is to start with a handful of function evaluation points, then to repeatedly use the spline to find interpolated values, it is usually quite useful in practice.
In case, scipy is not installed:
import numpy as np
from math import sqrt
def cubic_interp1d(x0, x, y):
"""
Interpolate a 1-D function using cubic splines.
x0 : a float or an 1d-array
x : (N,) array_like
A 1-D array of real/complex values.
y : (N,) array_like
A 1-D array of real values. The length of y along the
interpolation axis must be equal to the length of x.
Implement a trick to generate at first step the cholesky matrice L of
the tridiagonal matrice A (thus L is a bidiagonal matrice that
can be solved in two distinct loops).
additional ref: www.math.uh.edu/~jingqiu/math4364/spline.pdf
"""
x = np.asfarray(x)
y = np.asfarray(y)
# remove non finite values
# indexes = np.isfinite(x)
# x = x[indexes]
# y = y[indexes]
# check if sorted
if np.any(np.diff(x) < 0):
indexes = np.argsort(x)
x = x[indexes]
y = y[indexes]
size = len(x)
xdiff = np.diff(x)
ydiff = np.diff(y)
# allocate buffer matrices
Li = np.empty(size)
Li_1 = np.empty(size-1)
z = np.empty(size)
# fill diagonals Li and Li-1 and solve [L][y] = [B]
Li[0] = sqrt(2*xdiff[0])
Li_1[0] = 0.0
B0 = 0.0 # natural boundary
z[0] = B0 / Li[0]
for i in range(1, size-1, 1):
Li_1[i] = xdiff[i-1] / Li[i-1]
Li[i] = sqrt(2*(xdiff[i-1]+xdiff[i]) - Li_1[i-1] * Li_1[i-1])
Bi = 6*(ydiff[i]/xdiff[i] - ydiff[i-1]/xdiff[i-1])
z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
i = size - 1
Li_1[i-1] = xdiff[-1] / Li[i-1]
Li[i] = sqrt(2*xdiff[-1] - Li_1[i-1] * Li_1[i-1])
Bi = 0.0 # natural boundary
z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
# solve [L.T][x] = [y]
i = size-1
z[i] = z[i] / Li[i]
for i in range(size-2, -1, -1):
z[i] = (z[i] - Li_1[i-1]*z[i+1])/Li[i]
# find index
index = x.searchsorted(x0)
np.clip(index, 1, size-1, index)
xi1, xi0 = x[index], x[index-1]
yi1, yi0 = y[index], y[index-1]
zi1, zi0 = z[index], z[index-1]
hi1 = xi1 - xi0
# calculate cubic
f0 = zi0/(6*hi1)*(xi1-x0)**3 + \
zi1/(6*hi1)*(x0-xi0)**3 + \
(yi1/hi1 - zi1*hi1/6)*(x0-xi0) + \
(yi0/hi1 - zi0*hi1/6)*(xi1-x0)
return f0
if __name__ == '__main__':
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 11)
y = np.sin(x)
plt.scatter(x, y)
x_new = np.linspace(0, 10, 201)
plt.plot(x_new, cubic_interp1d(x_new, x, y))
plt.show()
If you have scipy version >= 0.18.0 installed you can use CubicSpline function from scipy.interpolate for cubic spline interpolation.
You can check scipy version by running following commands in python:
#!/usr/bin/env python3
import scipy
scipy.version.version
If your scipy version is >= 0.18.0 you can run following example code for cubic spline interpolation:
#!/usr/bin/env python3
import numpy as np
from scipy.interpolate import CubicSpline
# calculate 5 natural cubic spline polynomials for 6 points
# (x,y) = (0,12) (1,14) (2,22) (3,39) (4,58) (5,77)
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([12,14,22,39,58,77])
# calculate natural cubic spline polynomials
cs = CubicSpline(x,y,bc_type='natural')
# show values of interpolation function at x=1.25
print('S(1.25) = ', cs(1.25))
## Aditional - find polynomial coefficients for different x regions
# if you want to print polynomial coefficients in form
# S0(0<=x<=1) = a0 + b0(x-x0) + c0(x-x0)^2 + d0(x-x0)^3
# S1(1< x<=2) = a1 + b1(x-x1) + c1(x-x1)^2 + d1(x-x1)^3
# ...
# S4(4< x<=5) = a4 + b4(x-x4) + c5(x-x4)^2 + d5(x-x4)^3
# x0 = 0; x1 = 1; x4 = 4; (start of x region interval)
# show values of a0, b0, c0, d0, a1, b1, c1, d1 ...
cs.c
# Polynomial coefficients for 0 <= x <= 1
a0 = cs.c.item(3,0)
b0 = cs.c.item(2,0)
c0 = cs.c.item(1,0)
d0 = cs.c.item(0,0)
# Polynomial coefficients for 1 < x <= 2
a1 = cs.c.item(3,1)
b1 = cs.c.item(2,1)
c1 = cs.c.item(1,1)
d1 = cs.c.item(0,1)
# ...
# Polynomial coefficients for 4 < x <= 5
a4 = cs.c.item(3,4)
b4 = cs.c.item(2,4)
c4 = cs.c.item(1,4)
d4 = cs.c.item(0,4)
# Print polynomial equations for different x regions
print('S0(0<=x<=1) = ', a0, ' + ', b0, '(x-0) + ', c0, '(x-0)^2 + ', d0, '(x-0)^3')
print('S1(1< x<=2) = ', a1, ' + ', b1, '(x-1) + ', c1, '(x-1)^2 + ', d1, '(x-1)^3')
print('...')
print('S5(4< x<=5) = ', a4, ' + ', b4, '(x-4) + ', c4, '(x-4)^2 + ', d4, '(x-4)^3')
# So we can calculate S(1.25) by using equation S1(1< x<=2)
print('S(1.25) = ', a1 + b1*0.25 + c1*(0.25**2) + d1*(0.25**3))
# Cubic spline interpolation calculus example
# https://www.youtube.com/watch?v=gT7F3TWihvk
Just putting this here if you want a dependency-free solution.
Code taken from an answer above: https://stackoverflow.com/a/48085583/36061
def my_cubic_interp1d(x0, x, y):
"""
Interpolate a 1-D function using cubic splines.
x0 : a 1d-array of floats to interpolate at
x : a 1-D array of floats sorted in increasing order
y : A 1-D array of floats. The length of y along the
interpolation axis must be equal to the length of x.
Implement a trick to generate at first step the cholesky matrice L of
the tridiagonal matrice A (thus L is a bidiagonal matrice that
can be solved in two distinct loops).
additional ref: www.math.uh.edu/~jingqiu/math4364/spline.pdf
# original function code at: https://stackoverflow.com/a/48085583/36061
This function is licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
https://creativecommons.org/licenses/by-sa/3.0/
Original Author raphael valentin
Date 3 Jan 2018
Modifications made to remove numpy dependencies:
-all sub-functions by MR
This function, and all sub-functions, are licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
Mod author: Matthew Rowles
Date 3 May 2021
"""
def diff(lst):
"""
numpy.diff with default settings
"""
size = len(lst)-1
r = [0]*size
for i in range(size):
r[i] = lst[i+1] - lst[i]
return r
def list_searchsorted(listToInsert, insertInto):
"""
numpy.searchsorted with default settings
"""
def float_searchsorted(floatToInsert, insertInto):
for i in range(len(insertInto)):
if floatToInsert <= insertInto[i]:
return i
return len(insertInto)
return [float_searchsorted(i, insertInto) for i in listToInsert]
def clip(lst, min_val, max_val, inPlace = False):
"""
numpy.clip
"""
if not inPlace:
lst = lst[:]
for i in range(len(lst)):
if lst[i] < min_val:
lst[i] = min_val
elif lst[i] > max_val:
lst[i] = max_val
return lst
def subtract(a,b):
"""
returns a - b
"""
return a - b
size = len(x)
xdiff = diff(x)
ydiff = diff(y)
# allocate buffer matrices
Li = [0]*size
Li_1 = [0]*(size-1)
z = [0]*(size)
# fill diagonals Li and Li-1 and solve [L][y] = [B]
Li[0] = sqrt(2*xdiff[0])
Li_1[0] = 0.0
B0 = 0.0 # natural boundary
z[0] = B0 / Li[0]
for i in range(1, size-1, 1):
Li_1[i] = xdiff[i-1] / Li[i-1]
Li[i] = sqrt(2*(xdiff[i-1]+xdiff[i]) - Li_1[i-1] * Li_1[i-1])
Bi = 6*(ydiff[i]/xdiff[i] - ydiff[i-1]/xdiff[i-1])
z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
i = size - 1
Li_1[i-1] = xdiff[-1] / Li[i-1]
Li[i] = sqrt(2*xdiff[-1] - Li_1[i-1] * Li_1[i-1])
Bi = 0.0 # natural boundary
z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
# solve [L.T][x] = [y]
i = size-1
z[i] = z[i] / Li[i]
for i in range(size-2, -1, -1):
z[i] = (z[i] - Li_1[i-1]*z[i+1])/Li[i]
# find index
index = list_searchsorted(x0,x)
index = clip(index, 1, size-1)
xi1 = [x[num] for num in index]
xi0 = [x[num-1] for num in index]
yi1 = [y[num] for num in index]
yi0 = [y[num-1] for num in index]
zi1 = [z[num] for num in index]
zi0 = [z[num-1] for num in index]
hi1 = list( map(subtract, xi1, xi0) )
# calculate cubic - all element-wise multiplication
f0 = [0]*len(hi1)
for j in range(len(f0)):
f0[j] = zi0[j]/(6*hi1[j])*(xi1[j]-x0[j])**3 + \
zi1[j]/(6*hi1[j])*(x0[j]-xi0[j])**3 + \
(yi1[j]/hi1[j] - zi1[j]*hi1[j]/6)*(x0[j]-xi0[j]) + \
(yi0[j]/hi1[j] - zi0[j]*hi1[j]/6)*(xi1[j]-x0[j])
return f0
Minimal python3 code:
from scipy import interpolate
if __name__ == '__main__':
x = [ 0, 1, 2, 3, 4, 5]
y = [12,14,22,39,58,77]
# tck : tuple (t,c,k) a tuple containing the vector of knots,
# the B-spline coefficients, and the degree of the spline.
tck = interpolate.splrep(x, y)
print(interpolate.splev(1.25, tck)) # Prints 15.203125000000002
print(interpolate.splev(...other_value_here..., tck))
Based on comment of cwhy and answer by youngmit
In my previous post, I wrote a code based on a Cholesky development to solve the matrix generated by the cubic algorithm. Unfortunately, due to the square root function, it may perform badly on some sets of points (typically a non-uniform set of points).
In the same spirit than previously, there is another idea using the Thomas algorithm (TDMA) (see https://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm) to solve partially the tridiagonal matrix during its definition loop. However, the condition to use TDMA is that it requires at least that the matrix shall be diagonally dominant. However, in our case, it shall be true since |bi| > |ai| + |ci| with ai = h[i], bi = 2*(h[i]+h[i+1]), ci = h[i+1], with h[i] unconditionally positive. (see https://www.cfd-online.com/Wiki/Tridiagonal_matrix_algorithm_-TDMA(Thomas_algorithm)
I refer again to the document from jingqiu (see my previous post, unfortunately the link is broken, but it is still possible to find it in the cache of the web).
An optimized version of the TDMA solver can be described as follows:
def TDMAsolver(a,b,c,d):
""" This function is licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
https://creativecommons.org/licenses/by-sa/3.0/
Author raphael valentin
Date 25 Mar 2022
ref. https://www.cfd-online.com/Wiki/Tridiagonal_matrix_algorithm_-_TDMA_(Thomas_algorithm)
"""
n = len(d)
w = np.empty(n-1,float)
g = np.empty(n, float)
w[0] = c[0]/b[0]
g[0] = d[0]/b[0]
for i in range(1, n-1):
m = b[i] - a[i-1]*w[i-1]
w[i] = c[i] / m
g[i] = (d[i] - a[i-1]*g[i-1]) / m
g[n-1] = (d[n-1] - a[n-2]*g[n-2]) / (b[n-1] - a[n-2]*w[n-2])
for i in range(n-2, -1, -1):
g[i] = g[i] - w[i]*g[i+1]
return g
When it is possible to get each individual for ai, bi, ci, di, it becomes easy to combine the definitions of the natural cubic spline interpolator function within these 2 single loops.
def cubic_interpolate(x0, x, y):
""" Natural cubic spline interpolate function
This function is licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
https://creativecommons.org/licenses/by-sa/3.0/
Author raphael valentin
Date 25 Mar 2022
"""
xdiff = np.diff(x)
dydx = np.diff(y)
dydx /= xdiff
n = size = len(x)
w = np.empty(n-1, float)
z = np.empty(n, float)
w[0] = 0.
z[0] = 0.
for i in range(1, n-1):
m = xdiff[i-1] * (2 - w[i-1]) + 2 * xdiff[i]
w[i] = xdiff[i] / m
z[i] = (6*(dydx[i] - dydx[i-1]) - xdiff[i-1]*z[i-1]) / m
z[-1] = 0.
for i in range(n-2, -1, -1):
z[i] = z[i] - w[i]*z[i+1]
# find index (it requires x0 is already sorted)
index = x.searchsorted(x0)
np.clip(index, 1, size-1, index)
xi1, xi0 = x[index], x[index-1]
yi1, yi0 = y[index], y[index-1]
zi1, zi0 = z[index], z[index-1]
hi1 = xi1 - xi0
# calculate cubic
f0 = zi0/(6*hi1)*(xi1-x0)**3 + \
zi1/(6*hi1)*(x0-xi0)**3 + \
(yi1/hi1 - zi1*hi1/6)*(x0-xi0) + \
(yi0/hi1 - zi0*hi1/6)*(xi1-x0)
return f0
This function gives the same results as the function/class CubicSpline from scipy.interpolate, as we can see in the next plot.
It is possible to implement as well the first and second analytical derivatives that can be described such way:
f1p = -zi0/(2*hi1)*(xi1-x0)**2 + zi1/(2*hi1)*(x0-xi0)**2 + (yi1/hi1 - zi1*hi1/6) + (yi0/hi1 - zi0*hi1/6)
f2p = zi0/hi1 * (xi1-x0) + zi1/hi1 * (x0-xi0)
Then, it is easy to verify that f2p[0] and f2p[-1] are equal to 0, then that the interpolator function yields natural splines.
An additional reference concerning natural spline:
https://faculty.ksu.edu.sa/sites/default/files/numerical_analysis_9th.pdf#page=167
An example of use:
import matplotlib.pyplot as plt
import numpy as np
x = [-8,-4.19,-3.54,-3.31,-2.56,-2.31,-1.66,-0.96,-0.22,0.62,1.21,3]
y = [-0.01,0.01,0.03,0.04,0.07,0.09,0.16,0.28,0.45,0.65,0.77,1]
x = np.asfarray(x)
y = np.asfarray(y)
plt.scatter(x, y)
x_new= np.linspace(min(x), max(x), 10000)
y_new = cubic_interpolate(x_new, x, y)
plt.plot(x_new, y_new)
from scipy.interpolate import CubicSpline
f = CubicSpline(x, y, bc_type='natural')
plt.plot(x_new, f(x_new), label='ref')
plt.legend()
plt.show()
In a conclusion, this updated algorithm shall perform interpolation with better stability and faster than the previous code (O(n)). Associated with numba or cython, it shall be even very fast. Finally, it is totally independent of Scipy.
Important, note that as most of algorithms, it is sometimes useful to normalize the data (e.g. against large or small number values) to get the best results. As well, in this code, I do not check nan values or ordered data.
Whatever, this update was a good lesson learning for me and I hope it can help someone. Let me know if you find something strange.
If you want to get the value
from scipy.interpolate import CubicSpline
import numpy as np
x = [-5,-4.19,-3.54,-3.31,-2.56,-2.31,-1.66,-0.96,-0.22,0.62,1.21,3]
y = [-0.01,0.01,0.03,0.04,0.07,0.09,0.16,0.28,0.45,0.65,0.77,1]
value = 2
#ascending order
if np.any(np.diff(x) < 0):
indexes = np.argsort(x).astype(int)
x = np.array(x)[indexes]
y = np.array(y)[indexes]
f = CubicSpline(x, y, bc_type='natural')
specificVal = f(value).item(0) #f(value) is numpy.ndarray!!
print(specificVal)
If you want to plot the interpolated function.
np.linspace third parameter increase the "accuracy".
from scipy.interpolate import CubicSpline
import numpy as np
import matplotlib.pyplot as plt
x = [-5,-4.19,-3.54,-3.31,-2.56,-2.31,-1.66,-0.96,-0.22,0.62,1.21,3]
y = [-0.01,0.01,0.03,0.04,0.07,0.09,0.16,0.28,0.45,0.65,0.77,1]
#ascending order
if np.any(np.diff(x) < 0):
indexes = np.argsort(x).astype(int)
x = np.array(x)[indexes]
y = np.array(y)[indexes]
f = CubicSpline(x, y, bc_type='natural')
x_new = np.linspace(min(x), max(x), 100)
y_new = f(x_new)
plt.plot(x_new, y_new)
plt.scatter(x, y)
plt.title('Cubic Spline Interpolation')
plt.show()
output:
Yes, as others have already noted, it should be as simple as
>>> from scipy.interpolate import CubicSpline
>>> CubicSpline(x,y)(u)
array(15.203125)
(you can, for example, convert it to float to get the value from a 0d NumPy array)
What has not been described yet is boundary conditions: the default ‘not-a-knot’ boundary conditions work best if you have zero knowledge about the data you’re going to interpolate.
If you see the following ‘features’ on the plot, you can fine-tune the boundary conditions to get a better result:
the first derivative vanishes at boundaries => bc_type=‘clamped’
the second derivative vanishes at boundaries => bc_type='natural'
the function is periodic => bc_type='periodic'
See my article for more details and an interactive demo.

Categories

Resources