Numpy - More efficient code to calculate metric - python

I am trying to implement this metric
I already managed to calculate NUBN with numpy operations, so that part is fast, but I can't find a way to escape Python's slow loops to calculate the DRD part. Here is my current calculation of DRD:
import math
import numpy as np
def drd(im, im_gt):
    height, width = im.shape
    W = np.array([[1/math.sqrt(x**2+y**2) if x != 0 or y != 0 else 0 for x in range(-2, 3)] for y in range(-2, 3)])
    W /= W.sum()
    drd = 0
    s = []
    for y, x in zip(*np.where(im_gt != im)):
        if x > 1 and y > 1 and x + 2 < width and y + 2 < height:
            s.append(im_gt[y-2:y+3, x-2:x+3] == im_gt[y, x])
        else:
            for yy in range(y-2, y+3):
                for xx in range(x-2, x+3):
                    if xx > 1 and yy > 1 and xx < width - 1 and yy < height - 1:
                        drd += abs(im_gt[yy, xx] - im[y, x]) * W[yy-y+2, xx-x+2]
    return drd + np.sum(s * W)
drd(np.random.choice([0, 1], size=(100, 100)), np.random.choice([0, 1], size=(100, 100)))
Can anyone think of a faster way to do this? Timings on 1000x1000:

The first step in speeding things up with numpy is to break up your sequence of operations into something that can be applied to an entire array. Let's start with an easy one: removing the comprehensions in the computation of W:
W = np.hypot(np.arange(-2, 3), np.arange(-2, 3)[:, None])
np.reciprocal(W, where=W.astype(bool), out=W)
W /= W.sum()
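As a quick sanity check (a sketch; W_loop and W_vec are just illustrative names), the comprehension from the question and the broadcasting version above should produce the same normalised 5x5 weight matrix, with a zero weight at the centre:

```python
import math
import numpy as np

# Original construction: explicit comprehension over the 5x5 neighbourhood
W_loop = np.array([[1 / math.sqrt(x**2 + y**2) if x != 0 or y != 0 else 0
                    for x in range(-2, 3)] for y in range(-2, 3)])
W_loop /= W_loop.sum()

# Vectorised construction: hypot builds the distance grid by broadcasting
W_vec = np.hypot(np.arange(-2, 3), np.arange(-2, 3)[:, None])
# Invert only the non-zero distances; the centre (distance 0) stays 0
np.reciprocal(W_vec, where=W_vec.astype(bool), out=W_vec)
W_vec /= W_vec.sum()

print(np.allclose(W_loop, W_vec))
```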
The next thing (which is hinted at above with where=W.astype(bool)) is to use masking where appropriate to apply a condition to an entire array. Your algorithm is as follows:
For each location that does not match between im and im_gt, compute the sum of the elements of W centered on that location, taken over the neighbours where im_gt differs from im at the centre.
You can compute this with a convolution with W. Locations where im == im_gt are simply discarded. Locations where im_gt == 1 need to be flipped by subtracting from W.sum(), since you need to sum the zeros, not the ones for those elements. Convolution is implemented in scipy.signal.convolve2d. You get the same edge effects by using mode='same' and adjusting the edge pixels carefully. You can cheat and get the edge sums by convolving with an array of ones:
import numpy as np
from scipy.signal import convolve2d
# Compute this once outside the function
W = np.hypot(np.arange(-2, 3), np.arange(-2, 3)[:, None])
np.reciprocal(W, where=W.astype(bool), out=W)
W /= W.sum()
def drd(im, im_gt):
    m0 = im != im_gt
    m1 = im_gt == 0
    m2 = im_gt == 1
    s1 = convolve2d(m1, W, mode='same')[m0 & m1].sum()
    s2 = convolve2d(m2, W, mode='same')[m0 & m2].sum()
    return s1 + s2
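If SciPy is not available, the same masked sum can be sketched with NumPy alone by comparing im_gt against 25 shifted copies of itself. This sketch assumes binary 0/1 images and treats neighbours outside the image as contributing nothing, which matches the default zero fill of convolve2d(..., mode='same') but not the question's exact border rule. drd_shifted and drd_loop are hypothetical names; the loop version is only a slow reference for checking.

```python
import numpy as np

# Same normalised 5x5 inverse-distance weights as above (centre weight is 0)
W = np.hypot(np.arange(-2, 3), np.arange(-2, 3)[:, None])
np.reciprocal(W, where=W.astype(bool), out=W)
W /= W.sum()

def drd_shifted(im, im_gt):
    """DRD sum via 25 shifted comparisons; out-of-image neighbours add nothing."""
    im = np.asarray(im, dtype=np.int8)
    gt = np.asarray(im_gt, dtype=np.int8)
    h, w = gt.shape
    mismatch = im != gt
    # Pad with -1 so an out-of-image neighbour never equals a 0/1 centre value
    padded = np.pad(gt, 2, mode='constant', constant_values=-1)
    total = 0.0
    for dy in range(5):
        for dx in range(5):
            neighbour = padded[dy:dy + h, dx:dx + w]  # gt shifted by (dy-2, dx-2)
            # For binary images |gt[neighbour] - im[centre]| == 1 exactly when
            # the neighbour equals gt at a mismatched centre
            total += W[dy, dx] * np.count_nonzero(mismatch & (neighbour == gt))
    return total

def drd_loop(im, im_gt):
    """Plain-Python reference with the same border handling."""
    h, w = im_gt.shape
    total = 0.0
    for y, x in zip(*np.nonzero(im != im_gt)):
        for yy in range(y - 2, y + 3):
            for xx in range(x - 2, x + 3):
                if 0 <= yy < h and 0 <= xx < w:
                    total += abs(int(im_gt[yy, xx]) - int(im[y, x])) * W[yy - y + 2, xx - x + 2]
    return total

rng = np.random.default_rng(0)
im = rng.integers(0, 2, size=(30, 40))
im_gt = rng.integers(0, 2, size=(30, 40))
print(drd_shifted(im, im_gt), drd_loop(im, im_gt))
```

The Python-level loop only runs 25 times, regardless of image size; all per-pixel work happens inside NumPy.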

Related

Use loop for np.random or reshape array to matrix?

I am new to programming in general, but I am working hard on a project where I need to randomly choose outcomes according to their probabilities, for lotteries that I have generated, and I would like to use a loop to get new random numbers each time.
This is my code:
import numpy as np
p = np.arange(0.01, 1, 0.001, dtype = float)
alpha = 0.5
alpha = float(alpha)
alpha = np.zeros((1, len(p))) + alpha
def w(alpha, p):
    return np.exp(-(-np.log(p))**alpha)
w = w(alpha, p)
def P(w):
    return np.exp(np.log2(w))
prob_win = P(w)
prob_lose = 1 - prob_win
E = 10
E = float(E)
E = np.zeros((1, len(p))) + E
b = 0
b = float(b)
b = np.zeros((1, len(p))) + b
def A(E, b, prob_win):
    return (E - b * (1 - prob_win)) / prob_win
a = A(E, b, prob_win)
a = a.squeeze()
prob_array = (prob_win, prob_lose)
prob_matrix = np.vstack(prob_array).T.squeeze()
outcomes_array = (a, b)
outcomes_matrix = np.vstack(outcomes_array).T
outcome_pairs = np.vsplit(outcomes_matrix, len(p))
outcome_pairs = np.array(outcome_pairs).astype(float)  # np.float was removed in NumPy 1.24
prob_pairs = np.vsplit(prob_matrix, len(p))
prob_pairs = np.array(prob_pairs)
nominalized_prob_pairs = [outcome_pairs / np.sum(outcome_pairs) for
outcome_pairs in np.vsplit(prob_pairs, len(p)) ]
The code works fine, but for the next line I would like to use a loop or something similar, because I want 5 realizations for each row/pair of probabilities. When I use size=5 I just get a really long list, and I no longer know which values belong to which pair, as I do with size=1:
realisations = np.concatenate([np.random.choice(outcome_pairs[i].ravel(),
size=1 , p=nominalized_prob_pairs[i].ravel()) for i in range(len(outcome_pairs))])
Or, if I use size=5 as below, how can I match the realizations to the initial probabilities? Do I need to cut the array after every 5th element and then store the values in a matrix with 5 columns and a new row for every 5th element of the initial array? If yes, how could I do this?
realisations = np.concatenate([np.random.choice(outcome_pairs[i].ravel(),
size=5 , p=nominalized_prob_pairs[i].ravel()) for i in range(len(outcome_pairs))])
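One way to keep the draws aligned, sketched with made-up toy data (the arrays below are stand-ins for outcome_pairs and nominalized_prob_pairs, already flattened to one row per lottery, so no .ravel() is needed): build one row of size=5 draws per lottery with np.stack, or concatenate as above and then cut the flat result every 5 elements with reshape(-1, 5). Either way, row i of the result belongs to probability pair i.

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy stand-ins: one row per lottery, columns = (win amount, loss amount)
outcome_pairs = np.array([[20.0, 0.0], [15.0, 0.0], [12.0, 0.0]])
prob_pairs = np.array([[0.5, 0.5], [0.7, 0.3], [0.9, 0.1]])  # each row sums to 1
n_draws = 5

# Option 1: np.stack keeps one row per lottery -> shape (n_lotteries, n_draws)
realisations = np.stack([
    rng.choice(outcome_pairs[i], size=n_draws, p=prob_pairs[i])
    for i in range(len(outcome_pairs))
])

# Option 2: concatenate into one long array, then cut it every n_draws elements
flat = np.concatenate([
    rng.choice(outcome_pairs[i], size=n_draws, p=prob_pairs[i])
    for i in range(len(outcome_pairs))
])
realisations_2 = flat.reshape(-1, n_draws)

print(realisations)
```

np.concatenate keeps the draws of lottery i in positions i*5 to i*5+4, which is why the reshape recovers the same row structure.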
What exactly are you trying to produce? Please be more concise.
Here is some clean starter code that generates linear data:
import numpy as np
def generate_data(n_samples, variance):
    # generate 2D data
    X = np.random.random((n_samples, 1))
    # add a column of ones (bias term) to simplify the calculation
    X = np.concatenate((np.ones((n_samples, 1)), X), axis=1)
    # generate two random coefficients
    W = np.random.random((2, 1))
    # construct targets from our data and weights (matrix product)
    y = X @ W
    # add some noise to our data
    y += np.random.normal(0, variance, (n_samples, 1))
    return X, y, W
if __name__ == "__main__":
    X, Y, W = generate_data(10, 0.5)
    # check the random value of x, for example
    for x in X:
        print(x, end=' --> ')
        if x[1] <= 0.4:
            print('prob <= 0.4')
        else:
            print('prob > 0.4')

Need to speed up very slow loop for image manipulation on Python

I am currently completing a program in Python (3.6) as per an internal requirement. As part of it, I have to loop through a colour image (3 bytes per pixel: R, G and B) and distort the image pixel by pixel.
I have the same code in other languages (C++, C#), and non-optimized code executes in about two seconds, while optimized code executes in less than a second. By non-optimized code I mean that the matrix multiplication is performed by a 10 line function I implemented. The optimized version just uses external libraries for multiplication.
In Python, this code takes close to 300 seconds. I can't think of a way to vectorize this logic or speed it up, as there are a couple of "if"s inside the nested loop. Any help would be greatly appreciated.
import numpy as np
#for test purposes:
#roi = rect.rect(0, 0, 1200, 1200)
#input = DCImage.DCImage(1200, 1200, 3)
#correctionImage = DCImage.DCImage(1200,1200,3)
#siteToImage= np.zeros((3,3), np.float32)
#worldToSite= np.zeros ((4, 4))
#r11 = r12 = r13 = r21 = r22 = r23 = r31 = r32 = r33 = 0.0
#xMean = yMean = zMean = 0
#tx = ty = tz = 0
#epsilon = np.finfo(float).eps
#fx = fy = cx = cy = k1 = k2 = p1 = p2 = 0
for i in range(roi.x, roi.x + roi.width):
    for j in range(roi.y, roi.y + roi.height):
        if (input.pixels[i][j] == [255, 0, 0]).all():
            #Coordinates conversion
            siteMat = np.matmul(siteToImage, [i, j, 1])
            world = np.matmul(worldToSite, [siteMat[0], siteMat[1], 0.0, 1.0])
            xLocal = world[0] - xMean
            yLocal = world[1] - yMean
            zLocal = z_ortho - zMean
            #From World to camera
            xCam = r11*xLocal + r12*yLocal + r13*zLocal + tx
            yCam = r21*xLocal + r22*yLocal + r23*zLocal + ty
            zCam = r31*xLocal + r32*yLocal + r33*zLocal + tz
            if (zCam > epsilon or zCam < -epsilon):
                xCam = xCam / zCam
                yCam = yCam / zCam
            #// DISTORTIONS
            r2 = xCam*xCam + yCam*yCam
            a1 = 2*xCam*yCam
            a2 = r2 + 2*xCam*xCam
            a3 = r2 + 2*yCam*yCam
            cdist = 1 + k1*r2 + k2*r2*r2
            u = int((xCam * cdist + p1 * a1 + p2 * a2) * fx + cx + 0.5)
            v = int((yCam * cdist + p1 * a3 + p2 * a1) * fy + cy + 0.5)
            if (u>=0 and u<correctionImage.width and v>=0 and v < correctionImage.height):
                input.pixels[i][j] = correctionImage.pixels[u][v]
You normally vectorize this kind of thing by making a displacement map.
Make a complex image where each pixel has the value of its own coordinate, apply the usual math operations to compute whatever transform you want, then apply the map to your source image.
For example, in pyvips you might write:
import sys
import pyvips
image = pyvips.Image.new_from_file(sys.argv[1])
# this makes an image where pixel (0, 0) (at the top-left) has value [0, 0],
# and pixel (image.width, image.height) at the bottom-right has value
# [image.width, image.height]
index = pyvips.Image.xyz(image.width, image.height)
# make a version with (0, 0) at the centre, negative values up and left,
# positive down and right
centre = index - [image.width / 2, image.height / 2]
# to polar space, so each pixel is now distance and angle in degrees
polar = centre.polar()
# scale sin(distance) by 1/distance to make a wavey pattern
d = 10000 * (polar[0] * 3).sin() / (1 + polar[0])
# and back to rectangular coordinates again to make a set of vectors we can
# apply to the original index image
distort = index + d.bandjoin(polar[1]).rect()
# distort the image
distorted = image.mapim(distort)
# pick pixels from either the distorted image or the original, depending on some
# condition
result = ((d.abs() > 10) | (image[2] > 100)).ifthenelse(distorted, image)
result.write_to_file(sys.argv[2])
That's just a silly wobble pattern, but you can swap it for any distortion you want. Then run as:
$ /usr/bin/time -f %M:%e ./wobble.py ~/pics/horse1920x1080.jpg x.jpg
54572:0.31
That takes 300 ms and 55 MB of memory on this two-core 2015 laptop.
After much testing, the only way to speed up the function without writing it in C++ was to disassemble it and vectorize it. In this particular case, the way to do it is to create an array with the valid indexes at the beginning of the function and use them as tuples to index the final solution.
subArray[roi.y:roi.y+roi.height,roi.x:roi.x+roi.width,] = input.pixels[roi.y:roi.y+roi.height,roi.x:roi.x+roi.width,]
#Calculate valid XY indexes
y_index, x_index = np.where(np.all(subArray== np.array([255,0,0]), axis=-1))
#....
#do stuff
#....
#Join result values with XY indexes
ij_xy = np.column_stack((i, j, y_index, x_index))
#Only keep valid ij values
valids_ij_xy = ij_xy [(ij_xy [:,0] >= 0) & (ij_xy [:,0] < correctionImage.height) & (ij_xy [:,1] >= 0) & (ij_xy [:,1] < correctionImage.width)]
#Assign values
input.pixels [tuple(np.array(valids_ij_xy [:,2:]).T)] = correctionImage.pixels[tuple(np.array(valids_ij_xy [:,:2]).T)]
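Putting the displacement-map idea and the index-masking idea together, here is a plain NumPy sketch applied to the question's DISTORTIONS block: compute (u, v) for every pixel at once, then copy from the correction image only where the pixel is red and (u, v) lies inside it. distortion_map and remap_masked are hypothetical helpers; the arrays are assumed to be indexed [i, j] with the first axis matching correctionImage.width, as in the question's loop. The world/camera transform that produces x_cam and y_cam is not shown; it is assumed to be written the same way, with @ on stacked coordinate arrays.

```python
import numpy as np

def distortion_map(x_cam, y_cam, fx, fy, cx, cy, k1, k2, p1, p2):
    """Vectorised version of the question's DISTORTIONS block.

    x_cam, y_cam: arrays of normalised camera coordinates (any shape).
    Returns integer target indices (u, v) with the same shape.
    """
    r2 = x_cam * x_cam + y_cam * y_cam
    a1 = 2 * x_cam * y_cam
    a2 = r2 + 2 * x_cam * x_cam
    a3 = r2 + 2 * y_cam * y_cam
    cdist = 1 + k1 * r2 + k2 * r2 * r2
    # astype(int) truncates toward zero, like int() in the original loop
    u = ((x_cam * cdist + p1 * a1 + p2 * a2) * fx + cx + 0.5).astype(int)
    v = ((y_cam * cdist + p1 * a3 + p2 * a1) * fy + cy + 0.5).astype(int)
    return u, v

def remap_masked(pixels, correction, mask, u, v):
    """Copy of pixels with pixels[i, j] = correction[u, v] wherever mask is True
    and (u, v) lies inside correction; all other pixels are unchanged."""
    i, j = np.nonzero(mask)
    u, v = u[mask], v[mask]  # same row-major order as np.nonzero
    inside = (u >= 0) & (u < correction.shape[0]) & (v >= 0) & (v < correction.shape[1])
    out = pixels.copy()
    out[i[inside], j[inside]] = correction[u[inside], v[inside]]
    return out

# Toy usage with hypothetical values: pinhole back-projection, no distortion
w, h = 6, 5
pixels = np.zeros((w, h, 3), dtype=np.uint8)
pixels[1, 2] = [255, 0, 0]  # one "red" pixel to correct
correction = np.arange(w * h * 3, dtype=np.uint8).reshape(w, h, 3)
ii, jj = np.indices((w, h))  # coordinates of every pixel
u, v = distortion_map(ii / 100.0, jj / 100.0, 100.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
mask = np.all(pixels == [255, 0, 0], axis=-1)
result = remap_masked(pixels, correction, mask, u, v)
```

Because each red pixel only writes to itself, processing all of them at once gives the same result as the sequential loop.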

How to vectorize hinge loss gradient computation

I'm computing thousands of gradients and would like to vectorize the computations in Python. The context is SVM and the loss function is Hinge Loss. Y is Mx1, X is MxN and w is Nx1.
L(w) = lam/2 * ||w||^2 + 1/m Sum i=1:m ( max(0, 1-y[i]X[i]w) )
The gradient of this is
grad = lam*w + 1/m Sum i=1:m {-y[i]X[i].T if y[i]*X[i]*w < 1, else 0}
Instead of looping through each element of the sum and evaluating the max function, is it possible to vectorize this? I want to use something like np.where like the following
grad = np.where(y*X.dot(w) < 1, -X.T.dot(y), 0)
This does not work because where the condition is true, -X.T*y is the wrong dimension.
edit: list comprehension version, would like to know if there's a cleaner or more optimal way
def grad(X, y, w, lam):
    # assumes y has shape (M,) and w has shape (N,)
    # cache y[i]*X[i].dot(w), each row of Xw is multiplied by a single element of y
    yXw = y*X.dot(w)
    # cache y[i]*X[i], note each row of X is multiplied by a single element of y
    yX = X*y[:, np.newaxis]
    # return the average of this max function over the M samples
    return lam*w + np.mean([-yX[i] if yXw[i] < 1 else np.zeros_like(yX[i]) for i in range(len(y))], axis=0)
You have two vectors A and B, and you want to return an array C such that C[i] = A[i] if B[i] < 1, and 0 otherwise. Consequently, all you need to do is
C := A * sign(max(0, 1-B)) # surprisingly similar to the original hinge loss, right? :)
since
if B < 1 then 1-B > 0, thus max(0, 1-B) > 0 and sign(max(0, 1-B)) == 1
if B >= 1 then 1-B <= 0, thus max(0, 1-B) = 0 and sign(max(0, 1-B)) == 0
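so in your code it will be something like this, with A holding the per-sample terms -y[i]*X[i] (one row per sample) and B holding the margins y[i]*X[i].dot(w):
A = -X*y[:, np.newaxis]
B = y*X.dot(w)
C = A * np.sign(np.maximum(0, 1-B))[:, np.newaxis]
grad = lam*w + C.mean(axis=0)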
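For comparison, a sketch of the whole gradient without any Python-level loop: a boolean margin test replaces the per-row if, and the masked sum of -y[i]*X[i] becomes a single matrix-vector product. It assumes y has shape (M,) with labels ±1 and w has shape (N,); hinge_grad and hinge_grad_loop are illustrative names, the latter being a direct transcription of the question's formula, kept only for checking.

```python
import numpy as np

def hinge_grad(X, y, w, lam):
    """Subgradient of lam/2*||w||^2 + mean_i max(0, 1 - y[i] * X[i] @ w)."""
    m = X.shape[0]
    margins = y * (X @ w)   # shape (M,)
    active = margins < 1    # True where the hinge term is non-zero
    # Sum over active rows of -y[i]*X[i], computed as one matrix-vector product
    return lam * w - X.T @ (y * active) / m

def hinge_grad_loop(X, y, w, lam):
    """Direct transcription of the question's formula, for checking."""
    m, n = X.shape
    g = np.zeros(n)
    for i in range(m):
        if y[i] * X[i].dot(w) < 1:
            g -= y[i] * X[i]
    return lam * w + g / m

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.choice([-1.0, 1.0], size=200)
w = rng.normal(size=5)
print(np.allclose(hinge_grad(X, y, w, 0.1), hinge_grad_loop(X, y, w, 0.1)))
```

Multiplying y by the boolean mask zeroes the inactive samples, so no np.where or sign trick is needed.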

Rows and columns restrictions on python

I have a list of lists m which I need to modify
I need the sum of each row to be greater than A and the sum of each column to be less than B.
I have something like this
from random import randint
x = 5 #or other number, not relevant
rows = len(m)
cols = len(m[0])
for r in range(rows):
    while sum(m[r]) < A:
        c = randint(0, cols-1)
        m[r][c] += x
for c in range(cols):
    cant = sum([m[r][c] for r in range(rows)])
    while cant > B:
        r = randint(0, rows-1)
        if m[r][c] >= x: #I don't want negatives
            m[r][c] -= x
            cant -= x # keep the column sum up to date, otherwise the loop never ends
My problem is that I need to satisfy both conditions, and with this approach, after the second loop I can't be sure that the first condition is still met.
Any suggestions on how to satisfy both conditions and, of course, with the best execution? I could definitely consider the use of numpy
Edit (an example)
#input
m = [[0,0,0],
[0,0,0]]
A = 20
B = 25
# one desired output (since it chooses random positions)
m = [[10,0,15],
[15,0,5]]
I should add: this is for generating the random initial population of a genetic algorithm. The restrictions make each matrix a feasible solution, and I would need to run this about 80 times to get different feasible solutions.
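A different technique from the two-pass approach above, sketched under the assumption that entries are non-negative multiples of x (called step here): only ever add to a row that is still too small, and only in a column that still has room, so the column condition can never be broken and nothing has to be undone. random_feasible is a hypothetical name; if the greedy fill gets stuck, it restarts from zeros and gives up after max_tries restarts.

```python
import numpy as np

def random_feasible(n_rows, n_cols, A, B, step, rng=None, max_tries=100):
    """Random non-negative matrix (entries are multiples of `step`) with every
    row sum > A and every column sum < B."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        m = np.zeros((n_rows, n_cols), dtype=int)
        while True:
            short_rows = np.flatnonzero(m.sum(axis=1) <= A)
            if short_rows.size == 0:
                return m  # every row sum > A, every column sum < B
            open_cols = np.flatnonzero(m.sum(axis=0) + step < B)
            if open_cols.size == 0:
                break  # greedy fill got stuck: start again from zeros
            m[rng.choice(short_rows), rng.choice(open_cols)] += step
    raise ValueError("no feasible matrix found; check A, B and step")

# Example: the 2x3 case from the question, a population of 80 candidates
population = [random_feasible(2, 3, A=20, B=25, step=5) for _ in range(80)]
print(population[0])
```

Each step adds step to the matrix total, which can never exceed n_cols * B, so the inner loop always terminates.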
Something like this should do the trick:
import numpy
from scipy.optimize import linprog
A = 10
B = 20
m = 2
n = m * m
# the coefficients of a linear function to minimize.
# setting this to all ones minimizes the sum of all variable
# values in the matrix, which solves the problem, but see below.
c = numpy.ones(n)
# the constraint matrix.
# This is matrix-multiplied with the current solution candidate
# to form the left hand side of a set of normalized
# linear inequality constraint equations, i.e.
#
# x_0 * A_ub[0][0] + x_1 * A_ub[0][1] <= b_0
# x_0 * A_ub[1][0] + x_1 * A_ub[1][1] <= b_1
# ...
A_ub = numpy.zeros((2 * m, n))
# row sums. Since the <= inequality is a fixed component,
# we just multiply everything by (-1), i.e. we demand that
# the negative sums are smaller than the negative limit -A.
#
# Assign row ranges all at once, because numpy can do this.
for r in range(0, m):
    A_ub[r][r * m:(r + 1) * m] = -1
# We want that the sum of the x in each (flattened)
# column is smaller than B
#
# The manual stepping for the column sums in row-major encoding
# is a little bit annoying here.
for r in range(0, m):
    for j in range(0, m):
        A_ub[r + m][r + m * j] = 1
# the actual upper limits for the normalized inequalities.
b_ub = [-A] * m + [B] * m
# hand the linear program to scipy
solution = linprog(c, A_ub=A_ub, b_ub=b_ub)
# bring the solution into the desired matrix form
print(numpy.reshape(solution.x, (m, m)))
Caveats
I use <=, not < as indicated in your question, because that's what linprog supports.
This minimizes the total sum of all values in the target vector.
For your use case, you probably want to minimize the distance to the original sample, which the linear program cannot handle, since neither the squared error nor the absolute difference can be expressed as a linear combination (which is what c stands for). For that, you will probably need to use the full scipy.optimize.minimize().
Still, this should give you a rough idea.
A NumPy solution:
import numpy as np
val = B / len(m) # column sums <= B
assert val * len(m[0]) >= A # row sums >= A
# create a float array shaped like m, filled with val
# (np.empty_like(m) would be an int array and truncate val)
arr = np.full(np.shape(m), val)
I chose to ignore the original content of m - it's all zero in your example anyway.
from random import *
m = [[0,0,0],
[0,0,0]]
A = 20
B = 25
x = 1 #or other number, not relevant
rows = len(m)
cols = len(m[0])
def runner(list1, a1, b1, x1):
    # deep copy: list(list1) would share the inner row lists
    list1_backup = [row[:] for row in list1]
    rows = len(list1)
    cols = len(list1[0])
    for r in range(rows):
        while sum(list1[r]) <= a1:
            c = randint(0, cols-1)
            list1[r][c] += x1
    for c in range(cols):
        cant = sum([list1[r][c] for r in range(rows)])
        while cant >= b1:
            r = randint(0, rows-1)
            if list1[r][c] >= x1: #I don't want negatives
                list1[r][c] -= x1
                cant -= x1 # keep the column sum up to date
    good_a_int = 0
    for r in range(rows):
        test1 = sum(list1[r]) > a1
        good_a_int += 0 if test1 else 1
    if good_a_int == 0:
        return list1
    else:
        return runner(list1=list1_backup, a1=a1, b1=b1, x1=x1)
m2 = runner(m, A, B, x)
for row in m2:
    print(','.join(map(lambda x: "{:>3}".format(x), row)))

How to perform cubic spline interpolation in python?

I have two lists to describe the function y(x):
x = [0,1,2,3,4,5]
y = [12,14,22,39,58,77]
I would like to perform cubic spline interpolation so that given some value u in the domain of x, e.g.
u = 1.25
I can find y(u).
I found this in SciPy but I am not sure how to use it.
Short answer:
from scipy import interpolate
def f(x):
    x_points = [ 0, 1, 2, 3, 4, 5]
    y_points = [12,14,22,39,58,77]
    tck = interpolate.splrep(x_points, y_points)
    return interpolate.splev(x, tck)
print(f(1.25))
Long answer:
scipy separates the steps involved in spline interpolation into two operations, most likely for computational efficiency.
The coefficients describing the spline curve are computed using splrep(). splrep() returns a tuple (t, c, k) containing the knots, the B-spline coefficients and the degree of the spline.
These coefficients are then passed into splev() to actually evaluate the spline at the desired point x (in this example 1.25). x can also be an array: calling f([1.0, 1.25, 1.5]) returns the interpolated points at 1, 1.25 and 1.5, respectively.
This approach is admittedly inconvenient for single evaluations, but since the most common use case is to start with a handful of function evaluation points, then to repeatedly use the spline to find interpolated values, it is usually quite useful in practice.
In case SciPy is not installed:
import numpy as np
from math import sqrt
def cubic_interp1d(x0, x, y):
    """
    Interpolate a 1-D function using cubic splines.
      x0 : a float or an 1d-array
      x : (N,) array_like
          A 1-D array of real/complex values.
      y : (N,) array_like
          A 1-D array of real values. The length of y along the
          interpolation axis must be equal to the length of x.
    Implement a trick to generate at first step the Cholesky factor L of
    the tridiagonal matrix A (thus L is a bidiagonal matrix that
    can be solved in two distinct loops).
    additional ref: www.math.uh.edu/~jingqiu/math4364/spline.pdf
    """
    x = np.asarray(x, dtype=float) # np.asfarray was removed in NumPy 2.0
    y = np.asarray(y, dtype=float)
    # remove non finite values
    # indexes = np.isfinite(x)
    # x = x[indexes]
    # y = y[indexes]
    # check if sorted
    if np.any(np.diff(x) < 0):
        indexes = np.argsort(x)
        x = x[indexes]
        y = y[indexes]
    size = len(x)
    xdiff = np.diff(x)
    ydiff = np.diff(y)
    # allocate buffer matrices
    Li = np.empty(size)
    Li_1 = np.empty(size-1)
    z = np.empty(size)
    # fill diagonals Li and Li-1 and solve [L][y] = [B]
    Li[0] = sqrt(2*xdiff[0])
    Li_1[0] = 0.0
    B0 = 0.0 # natural boundary
    z[0] = B0 / Li[0]
    for i in range(1, size-1, 1):
        Li_1[i] = xdiff[i-1] / Li[i-1]
        Li[i] = sqrt(2*(xdiff[i-1]+xdiff[i]) - Li_1[i-1] * Li_1[i-1])
        Bi = 6*(ydiff[i]/xdiff[i] - ydiff[i-1]/xdiff[i-1])
        z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
    i = size - 1
    Li_1[i-1] = xdiff[-1] / Li[i-1]
    Li[i] = sqrt(2*xdiff[-1] - Li_1[i-1] * Li_1[i-1])
    Bi = 0.0 # natural boundary
    z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
    # solve [L.T][x] = [y]
    i = size-1
    z[i] = z[i] / Li[i]
    for i in range(size-2, -1, -1):
        z[i] = (z[i] - Li_1[i-1]*z[i+1])/Li[i]
    # find index (np.clip without out= also works when x0 is a scalar)
    index = np.clip(x.searchsorted(x0), 1, size-1)
    xi1, xi0 = x[index], x[index-1]
    yi1, yi0 = y[index], y[index-1]
    zi1, zi0 = z[index], z[index-1]
    hi1 = xi1 - xi0
    # calculate cubic
    f0 = zi0/(6*hi1)*(xi1-x0)**3 + \
         zi1/(6*hi1)*(x0-xi0)**3 + \
         (yi1/hi1 - zi1*hi1/6)*(x0-xi0) + \
         (yi0/hi1 - zi0*hi1/6)*(xi1-x0)
    return f0
if __name__ == '__main__':
    import matplotlib.pyplot as plt
    x = np.linspace(0, 10, 11)
    y = np.sin(x)
    plt.scatter(x, y)
    x_new = np.linspace(0, 10, 201)
    plt.plot(x_new, cubic_interp1d(x_new, x, y))
    plt.show()
If you have SciPy version >= 0.18.0 installed, you can use the CubicSpline class from scipy.interpolate for cubic spline interpolation.
You can check the SciPy version by running the following commands in Python:
#!/usr/bin/env python3
import scipy
print(scipy.__version__)
If your SciPy version is >= 0.18.0, you can run the following example code for cubic spline interpolation:
#!/usr/bin/env python3
import numpy as np
from scipy.interpolate import CubicSpline
# calculate 5 natural cubic spline polynomials for 6 points
# (x,y) = (0,12) (1,14) (2,22) (3,39) (4,58) (5,77)
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([12,14,22,39,58,77])
# calculate natural cubic spline polynomials
cs = CubicSpline(x,y,bc_type='natural')
# show values of interpolation function at x=1.25
print('S(1.25) = ', cs(1.25))
## Additional - find polynomial coefficients for different x regions
# if you want to print polynomial coefficients in form
# S0(0<=x<=1) = a0 + b0(x-x0) + c0(x-x0)^2 + d0(x-x0)^3
# S1(1< x<=2) = a1 + b1(x-x1) + c1(x-x1)^2 + d1(x-x1)^3
# ...
# S4(4< x<=5) = a4 + b4(x-x4) + c4(x-x4)^2 + d4(x-x4)^3
# x0 = 0; x1 = 1; x4 = 4; (start of x region interval)
# show values of a0, b0, c0, d0, a1, b1, c1, d1 ...
print(cs.c)
# Polynomial coefficients for 0 <= x <= 1
a0 = cs.c.item(3,0)
b0 = cs.c.item(2,0)
c0 = cs.c.item(1,0)
d0 = cs.c.item(0,0)
# Polynomial coefficients for 1 < x <= 2
a1 = cs.c.item(3,1)
b1 = cs.c.item(2,1)
c1 = cs.c.item(1,1)
d1 = cs.c.item(0,1)
# ...
# Polynomial coefficients for 4 < x <= 5
a4 = cs.c.item(3,4)
b4 = cs.c.item(2,4)
c4 = cs.c.item(1,4)
d4 = cs.c.item(0,4)
# Print polynomial equations for different x regions
print('S0(0<=x<=1) = ', a0, ' + ', b0, '(x-0) + ', c0, '(x-0)^2 + ', d0, '(x-0)^3')
print('S1(1< x<=2) = ', a1, ' + ', b1, '(x-1) + ', c1, '(x-1)^2 + ', d1, '(x-1)^3')
print('...')
print('S4(4< x<=5) = ', a4, ' + ', b4, '(x-4) + ', c4, '(x-4)^2 + ', d4, '(x-4)^3')
# So we can calculate S(1.25) by using equation S1(1< x<=2)
print('S(1.25) = ', a1 + b1*0.25 + c1*(0.25**2) + d1*(0.25**3))
# Cubic spline interpolation calculus example
# https://www.youtube.com/watch?v=gT7F3TWihvk
Just putting this here if you want a dependency-free solution.
Code taken from an answer above: https://stackoverflow.com/a/48085583/36061
from math import sqrt
def my_cubic_interp1d(x0, x, y):
    """
    Interpolate a 1-D function using cubic splines.
      x0 : a 1d-array of floats to interpolate at
      x : a 1-D array of floats sorted in increasing order
      y : A 1-D array of floats. The length of y along the
          interpolation axis must be equal to the length of x.
    Implement a trick to generate at first step the Cholesky factor L of
    the tridiagonal matrix A (thus L is a bidiagonal matrix that
    can be solved in two distinct loops).
    additional ref: www.math.uh.edu/~jingqiu/math4364/spline.pdf
    # original function code at: https://stackoverflow.com/a/48085583/36061
    This function is licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
    https://creativecommons.org/licenses/by-sa/3.0/
    Original Author raphael valentin
    Date 3 Jan 2018
    Modifications made to remove numpy dependencies:
        -all sub-functions by MR
    This function, and all sub-functions, are licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
    Mod author: Matthew Rowles
    Date 3 May 2021
    """
    def diff(lst):
        """
        numpy.diff with default settings
        """
        size = len(lst)-1
        r = [0]*size
        for i in range(size):
            r[i] = lst[i+1] - lst[i]
        return r
    def list_searchsorted(listToInsert, insertInto):
        """
        numpy.searchsorted with default settings
        """
        def float_searchsorted(floatToInsert, insertInto):
            for i in range(len(insertInto)):
                if floatToInsert <= insertInto[i]:
                    return i
            return len(insertInto)
        return [float_searchsorted(i, insertInto) for i in listToInsert]
    def clip(lst, min_val, max_val, inPlace = False):
        """
        numpy.clip
        """
        if not inPlace:
            lst = lst[:]
        for i in range(len(lst)):
            if lst[i] < min_val:
                lst[i] = min_val
            elif lst[i] > max_val:
                lst[i] = max_val
        return lst
    def subtract(a,b):
        """
        returns a - b
        """
        return a - b
    size = len(x)
    xdiff = diff(x)
    ydiff = diff(y)
    # allocate buffer matrices
    Li = [0]*size
    Li_1 = [0]*(size-1)
    z = [0]*(size)
    # fill diagonals Li and Li-1 and solve [L][y] = [B]
    Li[0] = sqrt(2*xdiff[0])
    Li_1[0] = 0.0
    B0 = 0.0 # natural boundary
    z[0] = B0 / Li[0]
    for i in range(1, size-1, 1):
        Li_1[i] = xdiff[i-1] / Li[i-1]
        Li[i] = sqrt(2*(xdiff[i-1]+xdiff[i]) - Li_1[i-1] * Li_1[i-1])
        Bi = 6*(ydiff[i]/xdiff[i] - ydiff[i-1]/xdiff[i-1])
        z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
    i = size - 1
    Li_1[i-1] = xdiff[-1] / Li[i-1]
    Li[i] = sqrt(2*xdiff[-1] - Li_1[i-1] * Li_1[i-1])
    Bi = 0.0 # natural boundary
    z[i] = (Bi - Li_1[i-1]*z[i-1])/Li[i]
    # solve [L.T][x] = [y]
    i = size-1
    z[i] = z[i] / Li[i]
    for i in range(size-2, -1, -1):
        z[i] = (z[i] - Li_1[i-1]*z[i+1])/Li[i]
    # find index
    index = list_searchsorted(x0,x)
    index = clip(index, 1, size-1)
    xi1 = [x[num] for num in index]
    xi0 = [x[num-1] for num in index]
    yi1 = [y[num] for num in index]
    yi0 = [y[num-1] for num in index]
    zi1 = [z[num] for num in index]
    zi0 = [z[num-1] for num in index]
    hi1 = list( map(subtract, xi1, xi0) )
    # calculate cubic - all element-wise multiplication
    f0 = [0]*len(hi1)
    for j in range(len(f0)):
        f0[j] = zi0[j]/(6*hi1[j])*(xi1[j]-x0[j])**3 + \
                zi1[j]/(6*hi1[j])*(x0[j]-xi0[j])**3 + \
                (yi1[j]/hi1[j] - zi1[j]*hi1[j]/6)*(x0[j]-xi0[j]) + \
                (yi0[j]/hi1[j] - zi0[j]*hi1[j]/6)*(xi1[j]-x0[j])
    return f0
Minimal python3 code:
from scipy import interpolate
if __name__ == '__main__':
    x = [ 0, 1, 2, 3, 4, 5]
    y = [12,14,22,39,58,77]
    # tck : tuple (t,c,k) a tuple containing the vector of knots,
    # the B-spline coefficients, and the degree of the spline.
    tck = interpolate.splrep(x, y)
    print(interpolate.splev(1.25, tck)) # Prints 15.203125000000002
    # print(interpolate.splev(other_value, tck)) # replace other_value with any x in the domain
Based on comment of cwhy and answer by youngmit
In my previous post, I wrote code based on a Cholesky decomposition to solve the matrix generated by the cubic algorithm. Unfortunately, due to the square root function, it may perform badly on some sets of points (typically a non-uniform set of points).
In the same spirit as before, another idea is to use the Thomas algorithm (TDMA) (see https://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm) to partially solve the tridiagonal system during its definition loop. The condition for using TDMA is that the matrix must at least be diagonally dominant. In our case this holds, since |bi| > |ai| + |ci| with ai = h[i], bi = 2*(h[i]+h[i+1]), ci = h[i+1], where h[i] is always positive (see https://www.cfd-online.com/Wiki/Tridiagonal_matrix_algorithm_-_TDMA_(Thomas_algorithm)).
I refer again to the document from jingqiu (see my previous post; unfortunately the link is broken, but the document can still be found in web caches).
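To see the diagonal-dominance condition concretely, here is a sketch that builds the natural-spline tridiagonal system for the interior second derivatives (with z[0] = z[-1] = 0), checks strict dominance row by row, and solves it densely with np.linalg.solve as a reference against which any TDMA implementation can be cross-checked. natural_spline_system is a hypothetical helper; the data is the one from the usage example in this answer.

```python
import numpy as np

def natural_spline_system(x, y):
    """Tridiagonal system for the interior second derivatives z[1..n-2] of a
    natural cubic spline. Returns (a, b, c, d): sub-diagonal, diagonal,
    super-diagonal and right-hand side."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    h = np.diff(x)               # interval widths, all > 0 for ascending x
    dydx = np.diff(y) / h
    a = h[1:-1]                  # coefficient of z[i-1] in row i
    b = 2 * (h[:-1] + h[1:])     # coefficient of z[i]
    c = h[1:-1]                  # coefficient of z[i+1]
    d = 6 * np.diff(dydx)
    return a, b, c, d

x = [-8, -4.19, -3.54, -3.31, -2.56, -2.31, -1.66, -0.96, -0.22, 0.62, 1.21, 3]
y = [-0.01, 0.01, 0.03, 0.04, 0.07, 0.09, 0.16, 0.28, 0.45, 0.65, 0.77, 1]
a, b, c, d = natural_spline_system(x, y)
M = np.diag(b) + np.diag(a, -1) + np.diag(c, 1)  # dense copy, fine for small n

# Strict diagonal dominance: |diagonal| > sum of |off-diagonal| in every row
off_diag = np.abs(M).sum(axis=1) - np.abs(np.diag(M))
print("diagonally dominant:", bool(np.all(np.abs(np.diag(M)) > off_diag)))

# Dense O(n^3) reference solution for the interior z values
z_inner = np.linalg.solve(M, d)
```

Since b = 2*(a + c) row by row (apart from the missing neighbours in the first and last rows), dominance holds for any strictly increasing x.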
An optimized version of the TDMA solver can be described as follows:
import numpy as np
def TDMAsolver(a,b,c,d):
    """ This function is licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
        https://creativecommons.org/licenses/by-sa/3.0/
        Author raphael valentin
        Date 25 Mar 2022
        ref. https://www.cfd-online.com/Wiki/Tridiagonal_matrix_algorithm_-_TDMA_(Thomas_algorithm)
    """
    n = len(d)
    w = np.empty(n-1,float)
    g = np.empty(n, float)
    w[0] = c[0]/b[0]
    g[0] = d[0]/b[0]
    for i in range(1, n-1):
        m = b[i] - a[i-1]*w[i-1]
        w[i] = c[i] / m
        g[i] = (d[i] - a[i-1]*g[i-1]) / m
    g[n-1] = (d[n-1] - a[n-2]*g[n-2]) / (b[n-1] - a[n-2]*w[n-2])
    for i in range(n-2, -1, -1):
        g[i] = g[i] - w[i]*g[i+1]
    return g
Since each ai, bi, ci and di can be computed individually, it is easy to merge the definitions of the natural cubic spline interpolator into these two single loops:
def cubic_interpolate(x0, x, y):
    """ Natural cubic spline interpolate function
        This function is licenced under: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
        https://creativecommons.org/licenses/by-sa/3.0/
        Author raphael valentin
        Date 25 Mar 2022
    """
    xdiff = np.diff(x)
    dydx = np.diff(y)
    dydx /= xdiff # in-place division: x and y must be float arrays
    n = size = len(x)
    w = np.empty(n-1, float)
    z = np.empty(n, float)
    w[0] = 0.
    z[0] = 0.
    for i in range(1, n-1):
        m = xdiff[i-1] * (2 - w[i-1]) + 2 * xdiff[i]
        w[i] = xdiff[i] / m
        z[i] = (6*(dydx[i] - dydx[i-1]) - xdiff[i-1]*z[i-1]) / m
    z[-1] = 0.
    for i in range(n-2, -1, -1):
        z[i] = z[i] - w[i]*z[i+1]
    # find index (it requires x to be sorted in ascending order)
    index = np.clip(x.searchsorted(x0), 1, size-1)
    xi1, xi0 = x[index], x[index-1]
    yi1, yi0 = y[index], y[index-1]
    zi1, zi0 = z[index], z[index-1]
    hi1 = xi1 - xi0
    # calculate cubic
    f0 = zi0/(6*hi1)*(xi1-x0)**3 + \
         zi1/(6*hi1)*(x0-xi0)**3 + \
         (yi1/hi1 - zi1*hi1/6)*(x0-xi0) + \
         (yi0/hi1 - zi0*hi1/6)*(xi1-x0)
    return f0
This function gives the same results as CubicSpline from scipy.interpolate with bc_type='natural', as the comparison plot produced by the example below shows.
The first and second analytical derivatives can also be implemented as follows:
f1p = -zi0/(2*hi1)*(xi1-x0)**2 + zi1/(2*hi1)*(x0-xi0)**2 + (yi1/hi1 - zi1*hi1/6) - (yi0/hi1 - zi0*hi1/6)
f2p = zi0/hi1 * (xi1-x0) + zi1/hi1 * (x0-xi0)
It is then easy to verify that f2p[0] and f2p[-1] are equal to 0, i.e. that the interpolator yields a natural spline.
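A self-contained sketch that solves for the second derivatives z with the same two loops as cubic_interpolate, then evaluates the spline together with its first and second derivatives. Differentiating the last cubic term (yi0/hi1 - zi0*hi1/6)*(xi1-x0) with respect to x0 gives -(yi0/hi1 - zi0*hi1/6), so that term enters the first derivative with a minus sign. The helper names natural_second_derivs and spline_with_derivs are hypothetical.

```python
import numpy as np

def natural_second_derivs(x, y):
    """Second derivatives z of the natural cubic spline at the knots (x ascending)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    h = np.diff(x)
    dydx = np.diff(y) / h
    n = len(x)
    w = np.zeros(n - 1)
    z = np.zeros(n)                    # z[0] = z[-1] = 0: natural boundary
    for i in range(1, n - 1):          # forward sweep of the Thomas algorithm
        m = h[i - 1] * (2 - w[i - 1]) + 2 * h[i]
        w[i] = h[i] / m
        z[i] = (6 * (dydx[i] - dydx[i - 1]) - h[i - 1] * z[i - 1]) / m
    for i in range(n - 2, -1, -1):     # back substitution
        z[i] -= w[i] * z[i + 1]
    return z

def spline_with_derivs(x0, x, y, z):
    """Spline value f, first derivative f1 and second derivative f2 at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    idx = np.clip(np.searchsorted(x, x0), 1, len(x) - 1)
    xi1, xi0 = x[idx], x[idx - 1]
    yi1, yi0 = y[idx], y[idx - 1]
    zi1, zi0 = z[idx], z[idx - 1]
    h = xi1 - xi0
    f = (zi0 / (6 * h) * (xi1 - x0) ** 3 + zi1 / (6 * h) * (x0 - xi0) ** 3
         + (yi1 / h - zi1 * h / 6) * (x0 - xi0) + (yi0 / h - zi0 * h / 6) * (xi1 - x0))
    f1 = (-zi0 / (2 * h) * (xi1 - x0) ** 2 + zi1 / (2 * h) * (x0 - xi0) ** 2
          + (yi1 / h - zi1 * h / 6) - (yi0 / h - zi0 * h / 6))
    f2 = zi0 / h * (xi1 - x0) + zi1 / h * (x0 - xi0)
    return f, f1, f2

x = np.linspace(0, 10, 11)
y = np.sin(x)
z = natural_second_derivs(x, y)
f, f1, f2 = spline_with_derivs(x, x, y, z)
print(f2[0], f2[-1])
```

With these inputs the spline reproduces y at the knots, f2 is 0 at both ends, and f1 should agree closely with a central finite difference of f inside each interval.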
An additional reference concerning natural spline:
https://faculty.ksu.edu.sa/sites/default/files/numerical_analysis_9th.pdf#page=167
An example of use:
import matplotlib.pyplot as plt
import numpy as np
x = [-8,-4.19,-3.54,-3.31,-2.56,-2.31,-1.66,-0.96,-0.22,0.62,1.21,3]
y = [-0.01,0.01,0.03,0.04,0.07,0.09,0.16,0.28,0.45,0.65,0.77,1]
x = np.asarray(x, dtype=float)
y = np.asarray(y, dtype=float)
plt.scatter(x, y)
x_new= np.linspace(min(x), max(x), 10000)
y_new = cubic_interpolate(x_new, x, y)
plt.plot(x_new, y_new, label='cubic_interpolate')
from scipy.interpolate import CubicSpline
f = CubicSpline(x, y, bc_type='natural')
plt.plot(x_new, f(x_new), label='ref')
plt.legend()
plt.show()
In conclusion, this updated algorithm should interpolate with better stability and run faster than the previous code (O(n)). Combined with numba or Cython, it should be very fast. Finally, it is completely independent of SciPy.
Importantly, note that, as with most algorithms, it is sometimes useful to normalize the data (e.g. when values are very large or very small) to get the best results. Also, this code does not check for NaN values or unsorted data.
In any case, this update was a good learning experience for me and I hope it can help someone. Let me know if you find something strange.
If you want to get the value at a single point:
from scipy.interpolate import CubicSpline
import numpy as np
x = [-5,-4.19,-3.54,-3.31,-2.56,-2.31,-1.66,-0.96,-0.22,0.62,1.21,3]
y = [-0.01,0.01,0.03,0.04,0.07,0.09,0.16,0.28,0.45,0.65,0.77,1]
value = 2
#ascending order
if np.any(np.diff(x) < 0):
indexes = np.argsort(x).astype(int)
x = np.array(x)[indexes]
y = np.array(y)[indexes]
f = CubicSpline(x, y, bc_type='natural')
specificVal = f(value).item(0) #f(value) is numpy.ndarray!!
print(specificVal)
If you want to plot the interpolated function (the third parameter of np.linspace sets the number of sample points, so a larger value gives a smoother curve):
from scipy.interpolate import CubicSpline
import numpy as np
import matplotlib.pyplot as plt
x = [-5,-4.19,-3.54,-3.31,-2.56,-2.31,-1.66,-0.96,-0.22,0.62,1.21,3]
y = [-0.01,0.01,0.03,0.04,0.07,0.09,0.16,0.28,0.45,0.65,0.77,1]
#ascending order
if np.any(np.diff(x) < 0):
indexes = np.argsort(x).astype(int)
x = np.array(x)[indexes]
y = np.array(y)[indexes]
f = CubicSpline(x, y, bc_type='natural')
x_new = np.linspace(min(x), max(x), 100)
y_new = f(x_new)
plt.plot(x_new, y_new)
plt.scatter(x, y)
plt.title('Cubic Spline Interpolation')
plt.show()
Yes, as others have already noted, it should be as simple as
>>> from scipy.interpolate import CubicSpline
>>> CubicSpline(x,y)(u)
array(15.203125)
(you can, for example, convert it to float to get the value from a 0d NumPy array)
What has not been described yet is boundary conditions: the default ‘not-a-knot’ boundary conditions work best if you have zero knowledge about the data you’re going to interpolate.
If you see the following ‘features’ on the plot, you can fine-tune the boundary conditions to get a better result:
the first derivative vanishes at boundaries => bc_type='clamped'
the second derivative vanishes at boundaries => bc_type='natural'
the function is periodic => bc_type='periodic'
See my article for more details and an interactive demo.
