Spline in 3D can not be differentiated due to an AttributeError - python

I am trying to fit a smoothing B-spline to some data and I found this very helpful post on here. However, I not only need the spline, but also its derivatives, so I tried to add the following code to the example:
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
For some reason this does not seem to work due to some data type issues. I get the following traceback:
Traceback (most recent call last):
File "interpolate_point_trace.py", line 31, in spline_example
tck_der = interpolate.splder(tck, n=1)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/fitpack.py", line 657, in splder
return _impl.splder(tck, n)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/_fitpack_impl.py", line 1206, in splder
sh = (slice(None),) + ((None,)*len(c.shape[1:]))
AttributeError: 'list' object has no attribute 'shape'
The reason for this seems to be that the second argument of the tck tuple contains a list of numpy arrays. I thought turning the input data to be a numpy array as well would help, but it does not change the data types of tck.
Does this behavior reflect an error in scipy, or is the input malformed?
I tried manually turning the list into an array:
tck[1] = np.array(tck[1])
but this (which didn't surprise me) also gave an error:
ValueError: operands could not be broadcast together with shapes (0,8) (7,1)
Any ideas of what the problem could be? I have used scipy before and on 1D splines the splder function works just fine, so I assume it has something to do with the spline being a line in 3D.
------- edit --------
Here is a minimum working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from mpl_toolkits.mplot3d import Axes3D
total_rad = 10
z_factor = 3
noise = 0.1
num_true_pts = 200
s_true = np.linspace(0, total_rad, num_true_pts)
x_true = np.cos(s_true)
y_true = np.sin(s_true)
z_true = s_true / z_factor
num_sample_pts = 80
s_sample = np.linspace(0, total_rad, num_sample_pts)
x_sample = np.cos(s_sample) + noise * np.random.randn(num_sample_pts)
y_sample = np.sin(s_sample) + noise * np.random.randn(num_sample_pts)
z_sample = s_sample / z_factor + noise * np.random.randn(num_sample_pts)
tck, u = interpolate.splprep([x_sample, y_sample, z_sample], s=2)
x_knots, y_knots, z_knots = interpolate.splev(tck[0], tck)
u_fine = np.linspace(0, 1, num_true_pts)
x_fine, y_fine, z_fine = interpolate.splev(u_fine, tck)
# this is the part of the code I inserted: the line under this causes the crash
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
# end of the inserted code
fig2 = plt.figure(2)
ax3d = fig2.add_subplot(111, projection='3d')
ax3d.plot(x_true, y_true, z_true, 'b')
ax3d.plot(x_sample, y_sample, z_sample, 'r*')
ax3d.plot(x_knots, y_knots, z_knots, 'go')
ax3d.plot(x_fine, y_fine, z_fine, 'g')
fig2.show()
plt.show()

Stumbled into the same problem...
I circumvented the error by not using interpolate.splder(tck, n=1); instead I used interpolate.splev(spline_ev, tck, der=1), which returns the derivatives at the points spline_ev (see the SciPy docs).
If you need the derivative as a spline, I think you can then call interpolate.splprep() again on those derivative values.
In total something like:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
spline_ev = np.linspace(0.0, 1.0, 100, endpoint=True)
spline_points = interpolate.splev(spline_ev, tck)
# Calculate derivative
spline_der_points = interpolate.splev(spline_ev, tck, der=1)
# splev returns a list of arrays (one per dimension), which splprep accepts directly
spline_der = interpolate.splprep(spline_der_points, s=0, k=3, full_output=True)
# Plot the data and derivative
fig = plt.figure()
plt.plot(points[:,0], points[:,1], '.-', label="points")
plt.plot(spline_points[0], spline_points[1], '.-', label="tck")
plt.plot(spline_der_points[0], spline_der_points[1], '.-', label="tck_der")
# Show tangent
plt.arrow(spline_points[0][23]-spline_der_points[0][23], spline_points[1][23]-spline_der_points[1][23], 2.0*spline_der_points[0][23], 2.0*spline_der_points[1][23])
plt.legend()
plt.show()
EDIT:
I also opened an issue on GitHub, and according to ev-br the usage of interpolate.splprep is discouraged and one should use make_interp_spline / BSpline instead.

As noted in other answers, splprep output is incompatible with splder, but is compatible with splev. And the latter can evaluate the derivatives.
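For the 3D smoothing case from the question, a minimal sketch of two ways to get derivatives might look like this. The helix data and the seed are made up for illustration, and wrapping the splprep output in a BSpline object is an assumption about a convenient workaround, not something stated in this thread.

```python
import numpy as np
from scipy import interpolate

# Noisy helix, similar in spirit to the question's example (synthetic data)
rng = np.random.default_rng(0)
s = np.linspace(0, 10, 80)
x = np.cos(s) + 0.1 * rng.standard_normal(s.size)
y = np.sin(s) + 0.1 * rng.standard_normal(s.size)
z = s / 3.0 + 0.1 * rng.standard_normal(s.size)

tck, u = interpolate.splprep([x, y, z], s=2)
u_fine = np.linspace(0, 1, 200)

# Route 1: let splev evaluate the first derivative directly
x_der, y_der, z_der = interpolate.splev(u_fine, tck, der=1)
der_splev = np.column_stack([x_der, y_der, z_der])

# Route 2: stack the per-dimension coefficient arrays into one (n, 3) array
# and build a vector-valued BSpline, which can be differentiated as an object
t, c, k = tck
bspl = interpolate.BSpline(t, np.asarray(c).T, k)
der_bspl = bspl.derivative()(u_fine)  # shape (200, 3)
```

Both routes describe the same curve, so the two derivative arrays should agree up to rounding.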
However, for interpolation, there is an alternative approach, which avoids splprep altogether. I'm basically copying a reply on the SciPy issue tracker (https://github.com/scipy/scipy/issues/10389):
Here's an example of replicating the splprep outputs. First let's make sense out of the splprep output:
# start with the OP example
import numpy as np
from scipy import interpolate
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
# check the meaning of the `u` array: evaluation of the spline at `u`
# gives back the original points (up to a list/transpose)
xy = interpolate.splev(u, tck)
xy = np.asarray(xy)
np.allclose(xy.T, points)
Next, let's replicate it without splprep. First, build the u array: the curve is represented parametrically, and u is essentially an approximation for the arc length. Other parametrizations are possible, but here let's stick to what splprep does. Translating the pseudocode from the doc page, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splprep.html
vv = np.sum((points[1:, :] - points[:-1, :])**2, axis=1)
vv = np.sqrt(vv).cumsum()
vv/= vv[-1]
vv = np.r_[0, vv]
# check:
np.allclose(u, vv)
Now, interpolate along the parametric curve: points vs vv:
spl = interpolate.make_interp_spline(vv, points)
# check spl.t vs knots from splprep
spl.t - tck[0]
The result, spl, is a BSpline object which you can evaluate, differentiate etc in a usual way:
np.allclose(points, spl(vv))
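# differentiate: derivative() takes the derivative order (default 1) and
# returns a new BSpline, which is then evaluated at vv
spl_derivative = spl.derivative()
tangents = spl_derivative(vv)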
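Applied to a curve in 3D, the same recipe could be sketched as follows. The noise-free helix is a made-up test curve, and the unit-tangent step at the end is an illustrative extra rather than part of the issue-tracker reply.

```python
import numpy as np
from scipy import interpolate

# Hypothetical test data: 40 points on a helix, shape (40, 3)
s = np.linspace(0, 10, 40)
points = np.column_stack([np.cos(s), np.sin(s), s / 3.0])

# Chord-length parametrization, normalized to [0, 1] (what splprep does by default)
seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
u = np.concatenate([[0.0], np.cumsum(seg)]) / seg.sum()

# Interpolating cubic B-spline through all three coordinates at once
spl = interpolate.make_interp_spline(u, points, k=3)
dspl = spl.derivative()  # first derivative with respect to u, also a BSpline

u_fine = np.linspace(0, 1, 200)
curve = spl(u_fine)      # shape (200, 3)
tangents = dspl(u_fine)  # shape (200, 3)
unit_tangents = tangents / np.linalg.norm(tangents, axis=1, keepdims=True)
```

Note that the derivative is taken with respect to the parameter u, not the true arc length, which is why the tangents are normalized before being used as directions.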

Related

Estimate joint density with 2d Gaussian kernel

I have the following data set, for which I have to estimate the joint density of 'bwt' and 'age' using kernel density estimation with a 2-dimensional Gaussian kernel and bandwidth h=5. I can't use modules such as scipy that provide ready-made functions for this, so I have to build my own functions to calculate the density. Here's what I've got so far.
import numpy as np
import pandas as pd
babies_full = pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
x=np.array(babies_full1,'int')
#2d Gaussian kernel
def k_2dgauss(x):
    return np.exp(-np.sum(x**2, 1)/2) / np.sqrt(2*np.pi)
#Multivariate kernel density
def mv_kernel_density(t, x, h):
    d = x.shape[1]
    return np.mean(k_2dgauss((t - x)/h))/h**d
t = np.linspace(1.0, 5.0, 50)
h=5
print(mv_kernel_density(t, x, h))
However, I get the error 'ValueError: operands could not be broadcast together with shapes (50,) (1173,2)', which I think is caused by the different shapes of the arrays. I also don't understand why k_2dgauss(x) returns an array of zeros for me, since it should only return one value. In general, I am new to kernel density estimation and I don't really know whether I've written the functions correctly, so any hints would help!
Following on from my comments on your original post, I think this is what you want to do, but if not then come back to me and we can try again.
# info supplied by OP
import numpy as np
import pandas as pd
babies_full = \
    pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
x=np.array(babies_full1,'int')
# my contributions
from math import floor, ceil
def binMaker(arr, base):
    """function I already use for this sort of thing.
    arr is the array I want to make bins for
    base is the bin separation, but does require you to import floor and ceil
    otherwise you can make these bins manually yourself"""
    binMin = floor(arr.min() / base) * base
    binMax = ceil(arr.max() / base) * base
    return np.arange(binMin, binMax + base, base)
bins1 = binMaker(x[:,0], 20.) # bins from 140. to 360. spaced 20 apart
bins2 = binMaker(x[:,1], 5.) # bins from 15. to 45. spaced 5. apart
counts = np.zeros((len(bins1)-1, len(bins2)-1)) # empty array for counts to go in
for i in range(0, len(bins1)-1): # loop over the intervals, hence the -1
    boo = (x[:,0] >= bins1[i]) * (x[:,0] < bins1[i+1])
    for j in range(0, len(bins2)-1): # loop over the intervals, hence the -1
        counts[i,j] = np.count_nonzero((x[boo,1] >= bins2[j]) *
                                       (x[boo,1] < bins2[j+1]))
# if you want your PDF to be a fraction of the total
# rather than the number of counts, do the next line
counts /= x.shape[0]
# plotting
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# setting the levels so that each number in counts has its own colour
levels = np.linspace(-0.5, counts.max()+0.5, int(counts.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, counts, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('Manually making a 2D (joint) PDF')
If this is what you wanted, then there is an easier way with np.histogram2d, although I think you specified that you had to use your own methods and not built-in functions. I've included it anyway for completeness' sake.
pdf = np.histogram2d(x[:,0], x[:,1], bins=(bins1,bins2))[0]
pdf /= x.shape[0] # again for normalising and making a percentage
levels = np.linspace(-0.5, pdf.max()+0.5, int(pdf.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, pdf, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('using np.histogram2d to make a 2D (joint) PDF')
Final note - in this example, the only place where counts doesn't equal pdf is the bin with 40 <= age < 45 and 280 <= gestation < 300, which I think is due to how, in my manual case, I've used <= and <, and I'm a little unsure how np.histogram2d handles values outside the bin ranges, or on the bin edges etc. We can see the element of x that is responsible:
>>> print(x[1011])
[280 45]
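For the kernel density estimate the question actually asks about, a NumPy-only sketch could look like the following. It assumes a product Gaussian kernel with a single bandwidth h for both columns; the function names and the synthetic stand-in data are hypothetical (the real data would be the (1173, 2) array x from the question). Two things differ from the question's attempt: the 2D standard normal density is normalized by 2*pi rather than sqrt(2*pi), and the evaluation points are 2D, so that they broadcast against the samples.

```python
import numpy as np

def gauss_kernel_2d(z):
    """Standard bivariate normal density; z has shape (..., 2)."""
    return np.exp(-0.5 * np.sum(z**2, axis=-1)) / (2.0 * np.pi)

def kde_2d(t, x, h):
    """Kernel density estimate at points t (m, 2) from samples x (n, 2)."""
    t = np.atleast_2d(t)
    d = x.shape[1]
    # Pairwise scaled differences between evaluation points and samples: (m, n, 2)
    diff = (t[:, None, :] - x[None, :, :]) / h
    # Average the kernel over the samples and rescale by h**d
    return gauss_kernel_2d(diff).mean(axis=1) / h**d

# Synthetic stand-in for the (gestation, age) columns, for illustration only
rng = np.random.default_rng(0)
x_demo = np.column_stack([rng.uniform(250, 310, 200), rng.uniform(20, 40, 200)])

# Density at a single (gestation, age) point; the result has shape (1,)
density_at_point = kde_2d(np.array([280.0, 30.0]), x_demo, h=5.0)
```

To get a surface, build a grid with np.meshgrid, stack it into an (m, 2) array and pass that as t. With h=5 applied to both columns, the smoothing is quite coarse for age and fairly fine for gestation, since the two variables have different spreads.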

Curve_Fit not accurate

I am trying to fit strongly fluctuating time-series data as well as possible. First I smooth the data, which works fine. The smoothed data should then be represented by a fit so that more of the remaining peaks are removed. As you can see in the code, I want to use a log-tanh function to fit the data. I am well aware that this problem has already come up in other threads, but I have tried their suggestions, and the data values are neither very small nor very large, which I know can also cause problems.
The polynomial fit I tried also works pretty well, as you can see, but it does not eliminate all the waviness. That causes problems for the subsequent derivative, which turns out very badly.
import tkinter as tk
from tkinter import filedialog
import numpy as np
import scipy.signal
from scipy.optimize import curve_fit
from numpy import diff
import matplotlib.pyplot as plt
from lmfit.models import StepModel, LinearModel
def loghypfunc(x, A, B, C, D, E):
    return A*np.log(1+x)+B*np.tanh(C*x)+D*x+E
def expfunc(t, c0, c1, c2, c3):
    return c0+c1*t-c2*np.exp(-c3*t)
def expdecay(x, a, b, c):
    return a * np.exp(-b * x) + c
path="C:/Users/Sammy/Documents/Masterarbeit WT/CSM und Kriechdaten/Kriechen/Creep_10mN_00008_LC_20210406_2121_DYN.txt"
dataFile = np.loadtxt(path, delimiter='\t', skiprows=2, usecols=(0, 1, 2, 3, 29, 30), dtype=float)
num_rows, num_cols = dataFile.shape
# time column
time = dataFile[:, [0]].transpose()
time = time.flatten()
refTime = time[0] # get first time in column (reference)
# zeroed test time
timeNull = time - refTime
print("time", time)
flatTimeNull = timeNull.flatten() # now a 1D array (one row)
##################################################################################
# indent displacement column
indentDis = dataFile[:, [4]].transpose()
indentDis = indentDis.flatten()
indentDis = indentDis - indentDis[0]
# the indent data has to be smoothed so there is not such a big fluctuation
indentSmooth = scipy.signal.savgol_filter(indentDis, 2001, 3)
# null the indent Smooth data
indentSmooth_Null = indentSmooth - indentSmooth[0]
hind_Smooth_flat = indentSmooth_Null.flatten() # now a 1D array
print('indent smooth', indentSmooth)
######################################################################
p0 = [100, 0.1, 100, 0.1]
c, cov = curve_fit(expfunc, time, indentSmooth, p0)
y_indent = expfunc(indentSmooth, *c)
p0 = [70, 0.5, 50, 0.1, 100]
popt, pcov = curve_fit(loghypfunc, time, indentSmooth, p0, maxfev = 10000)
y_indentTan = loghypfunc(indentSmooth, *popt)
modelh_t = np.poly1d(np.polyfit(time, indentSmooth, 8))
plt.plot(time, indentSmooth, 'r', label="Data smoothed")
plt.scatter(time, modelh_t(time), s=0.1, label="Polyfit")
plt.plot(time, y_indentTan, label="Curve fit Tangens function")
plt.plot(time, y_indent, label="Curve fit exp function")
plt.legend(loc="lower right")
plt.xlabel("time")
plt.ylabel("indent")
plt.show()
These are the two arrays I get the data from:
time [ 6.299596 6.349592 6.399589 ... 608.0109 608.060897 608.110894]
indent smooth [120.81411822 121.07093706 121.32748184 ... 476.78825661 476.89357473 476.99915287]
Here are the plots (image not included).
The question for me now is how to fix this. Is it because the optimized fit parameters are wrong? I would have expected Python to find sufficiently good ones automatically.
My second guess was that the data is too densely sampled along the time axis, as the array is about 12000 values long. Could this be a reason?
I would be very grateful for any advice regarding the fits.
Regards
Hndrx

plotting vector equations with increasing variables on both axes python

I have been attempting a vector plot (using quiver) in which every location on the grid is assigned a vector that depends on the location and the equations, but I am stuck trying to use a range of values for both axis parameters (x1 and x3), and I get the error: TypeError: only length-1 arrays can be converted to Python scalars
This is the code as built so far; any help would be amazing:
def SVmotion(t,A,beta,f,j):
    x1= np.arange(0,10001,100)
    x3= np.arange(0,10001,100)
    w=2*np.pi*f
    k=w/beta
    k1=k*np.sin(j)
    k3=k*np.cos(j)
    k_beta_x = k1*x1+k3*x3
    theta = k_beta_x-w*t
    Usvx1 = k3*A*complex(-np.sin(theta),np.cos(theta))
    Usvx3 = k1*A*complex(-np.sin(theta),np.cos(theta))
    Usvx1_real=Usvx1.real
    Usvx3_real=Usvx3.real
    return Usvx1_real, Usvx3_real
fig ,ax = plt.subplots()
ax.quiver(x1,x3,Usvx1_real,Usvx3_real)
SVmotion(0,1,3000,2,0)
The issue is that 'theta' is an array, while the built-in complex() only accepts scalars. Please check if the following helps.
import numpy as np
def SVmotion(t,A,beta,f,j):
    x1= np.arange(0,10001,100)
    x3= np.arange(0,10001,100)
    w=2*np.pi*f
    k=w/beta
    k1=k*np.sin(j)
    k3=k*np.cos(j)
    k_beta_x = k1*x1+k3*x3
    theta = k_beta_x-w*t
    for t in theta:
        Usvx1 = k3*A*complex(-np.sin(t),np.cos(t))
        Usvx3 = k1*A*complex(-np.sin(t),np.cos(t))
        Usvx1_real=Usvx1.real
        Usvx3_real=Usvx3.real
    return Usvx1_real, Usvx3_real
SVmotion(0,1,3000,2,0)
#(0.003627598728468422, 0.0)
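Note that the loop above keeps only the values for the last element of theta, so it returns a single vector rather than a field. For a quiver plot one vector per grid point is needed, and a vectorized sketch might look like this. It assumes the grid is meant to be the full x1-x3 mesh (the question only builds two 1D ranges), and it uses the identity -sin(theta) + 1j*cos(theta) = 1j*exp(1j*theta) in place of complex(). The function name sv_motion and its extra arguments are hypothetical.

```python
import numpy as np

def sv_motion(t, A, beta, f, j, x1, x3):
    """Real parts of the two displacement components on the x1-x3 grid."""
    w = 2 * np.pi * f
    k = w / beta
    k1 = k * np.sin(j)
    k3 = k * np.cos(j)
    X1, X3 = np.meshgrid(x1, x3)          # 2D grids of coordinates
    theta = k1 * X1 + k3 * X3 - w * t     # phase at every grid point
    phase = 1j * np.exp(1j * theta)       # equals -sin(theta) + 1j*cos(theta)
    U1 = (k3 * A * phase).real
    U3 = (k1 * A * phase).real
    return X1, X3, U1, U3

x1 = np.arange(0, 10001, 100)
x3 = np.arange(0, 10001, 100)
X1, X3, U1, U3 = sv_motion(0, 1, 3000, 2, 0, x1, x3)
```

The four returned arrays all have the same 2D shape, so they can be passed straight to ax.quiver(X1, X3, U1, U3).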

scipy.optimize.curve_fit fails when using bounds

I'm trying to fit a set of data with a function (see the example below) using scipy.optimize.curve_fit,
but when I use bounds (see the documentation) the fit fails and I simply get
the initial guess parameters as output.
As soon as I substitute -np.inf and np.inf as bounds for the second parameter
(dt in the function), the fit works.
What am I doing wrong?
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
#Generate data
crc=np.array([-1.4e-14, 7.3e-14, 1.9e-13, 3.9e-13, 6.e-13, 8.0e-13, 9.2e-13, 9.9e-13,
1.e-12, 1.e-12, 1.e-12, 1.0e-12, 1.1e-12, 1.1e-12, 1.1e-12, 1.0e-12, 1.1e-12])
time=np.array([0., 368., 648., 960., 1520.,1864., 2248., 2655., 3031.,
3384., 3688., 4048., 4680., 5343., 6055., 6928., 8120.])
#Define the function for the fit
def testcurve(x, Dp, dt):
    k = -Dp*(x+dt)*2e11
    curve = 1e-12 * (1+2*(-np.exp(k) + np.exp(4*k) - np.exp(9*k) + np.exp(16*k)))
    curve[0]= 0
    return curve
#Set fit bounds
dtmax=time[2]
param_bounds = ((-np.inf, -dtmax),(np.inf, dtmax))
#Perform fit
(par, par_cov) = opt.curve_fit(testcurve, time, crc, p0 = (5e-15, 0), bounds = param_bounds)
#Print and plot output
print(par)
plt.plot(time, crc, 'o')
plt.plot(time, testcurve(time, par[0], par[1]), 'r-')
plt.show()
I encountered the same behavior today in a different fitting problem. After some searching online, I found this link quite helpful: Why does scipy.optimize.curve_fit not fit to the data?
The short answer is that using extremely small (or large) numbers in numerical fitting is not robust, and scaling them leads to a much better fit.
In your case, both crc and Dp are extremely small numbers that could be scaled up. You can play with the scale factors; within a certain range the fit looks quite robust. Full example:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt
#Generate data
crc=np.array([-1.4e-14, 7.3e-14, 1.9e-13, 3.9e-13, 6.e-13, 8.0e-13, 9.2e-13, 9.9e-13,
1.e-12, 1.e-12, 1.e-12, 1.0e-12, 1.1e-12, 1.1e-12, 1.1e-12, 1.0e-12, 1.1e-12])
time=np.array([0., 368., 648., 960., 1520.,1864., 2248., 2655., 3031.,
3384., 3688., 4048., 4680., 5343., 6055., 6928., 8120.])
# add scale factors to the data as well as the fitting parameter
scale_factor_1 = 1e12 # 1./np.mean(crc) also works if you don't want to set the scale factor manually
scale_factor_2 = 1./2e11
#Define the function for the fit
def testcurve(x, Dp, dt):
    k = -Dp*(x+dt)*2e11 * scale_factor_2
    curve = 1e-12 * (1+2*(-np.exp(k) + np.exp(4*k) - np.exp(9*k) + np.exp(16*k))) * scale_factor_1
    curve[0]= 0
    return curve
#Set fit bounds
dtmax=time[2]
param_bounds = ((-np.inf, -dtmax),(np.inf, dtmax))
#Perform fit
(par, par_cov) = opt.curve_fit(testcurve, time, crc*scale_factor_1, p0 = (5e-15/scale_factor_2, 0), bounds = param_bounds)
#Print and plot output
print(par[0]*scale_factor_2, par[1])
plt.plot(time, crc*scale_factor_1, 'o')
plt.plot(time, testcurve(time, par[0], par[1]), 'r-')
plt.show()
Fitting results: [6.273102923176595e-15, -21.12202697564494], which gives a reasonable fitting and also is very close to the result without any bounds: [6.27312512e-15, -2.11307470e+01]
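The same rescaling idea can be written as a generic pattern: fit a wrapper whose data and parameters are all of order one, then convert the result back to physical units. The sketch below uses a made-up saturating model and noiseless synthetic data, so the model, the scale constants and the bounds are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model in physical units: amplitude ~1e-12, rate ~1e-3
def model(x, amp, rate):
    return amp * (1.0 - np.exp(-rate * x))

x = np.linspace(0.0, 8000.0, 17)
y = model(x, 1.0e-12, 1.0e-3)  # noiseless synthetic data

Y_SCALE = 1e12      # multiplies the data so that it is of order 1
RATE_SCALE = 1e-3   # the fitted rate parameter is expressed in these units

def scaled_model(x, amp_s, rate_s):
    # Same model, but taking order-one parameters and returning order-one values
    return Y_SCALE * model(x, amp_s / Y_SCALE, rate_s * RATE_SCALE)

p_scaled, cov_scaled = curve_fit(
    scaled_model, x, y * Y_SCALE,
    p0=(0.5, 0.5), bounds=((0.0, 0.0), (10.0, 10.0)),
)

# Convert back to physical units
amp_fit = p_scaled[0] / Y_SCALE
rate_fit = p_scaled[1] * RATE_SCALE
```

Bounds and the initial guess have to be given in the scaled units as well, which is easy to forget when retrofitting this onto existing code.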

Finding first derivative using DFT in Python

I want to find the first derivative of exp(sin(x)) on the interval [0, 2π] using a discrete Fourier transform. The basic idea is to first evaluate the DFT of exp(sin(x)) on the given interval, giving you say v_k, followed by computing the inverse DFT of i*k*v_k, giving you the desired answer. In reality, due to the implementations of Fourier transforms in programming languages, you might need to reorder the output somewhere and/or multiply by different factors here and there.
I first did it in Mathematica, where there is an option FourierParameters, which enables you to specify a convention for the transform. Firstly, I obtained the Fourier series of a Gaussian, in order to see what the normalisation factors are that I have to multiply by and then went on finding the derivative. Unfortunately, translating my Mathematica code into Python thereafter (whereby again I first did the Fourier series of a Gaussian - this was successful), I didn't get the same results. Here is my code:
N=1000
xmin=0
xmax=2.0*np.pi
step = (xmax-xmin)/(N)
xdata = np.linspace(xmin, xmax-step, N)
v = np.exp(np.sin(xdata))
derv = np.cos(xdata)*v
vhat = np.fft.fft(v)
kvals1 = np.arange(0, N/2.0, 1)
kvals2 = np.arange(-N/2.0, 0, 1)
what1 = np.zeros(kvals1.size+1)
what2 = np.empty(kvals2.size)
it = np.nditer(kvals1, flags=['f_index'])
while not it.finished:
    np.put(what1, it.index, 1j*(2.0*np.pi)/((xmax-xmin))*it[0]*vhat[[int(it[0])]])
    it.iternext()
it = np.nditer(kvals2, flags=['f_index'])
while not it.finished:
    np.put(what2, it.index, 1j*(2.0*np.pi)/((xmax-xmin))*it[0]*vhat[[int(it[0])]])
    it.iternext()
xdatafull = np.concatenate((xdata, [2.0*np.pi]))
what = np.concatenate((what1, what2))
w = np.real(np.fft.ifft(what))
fig = plt.figure()
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data',0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data',0))
plt.plot(xdata, derv, color='blue')
plt.plot(xdatafull, w, color='red')
plt.show()
I can post the Mathematica code, if people want me to.
Turns out the problem is that np.zeros gives you an array of real zeroes and not complex ones, hence the assignments after that don't change anything, as they are imaginary.
Thus the solution is quite simply
import numpy as np
N=100
xmin=0
xmax=2.0*np.pi
step = (xmax-xmin)/(N)
xdata = np.linspace(step, xmax, N)
v = np.exp(np.sin(xdata))
derv = np.cos(xdata)*v
vhat = np.fft.fft(v)
what = 1j*np.zeros(N)
# slice indices must be integers in Python 3, hence the integer division
what[0:N//2] = 1j*np.arange(0, N//2, 1)
what[N//2+1:] = 1j*np.arange(-(N//2) + 1, 0, 1)
what = what*vhat
w = np.real(np.fft.ifft(what))
# Then plotting
whereby the np.zeros is replaced by 1j*np.zeros
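A more compact variant lets np.fft.fftfreq build the wavenumbers in the ordering that np.fft.fft uses, which avoids the manual index bookkeeping. This is a sketch under the assumption that the function is periodic on the interval and sampled without the repeated endpoint; the helper name spectral_derivative is hypothetical.

```python
import numpy as np

def spectral_derivative(v, length):
    """First derivative of periodic samples v taken over one period of the given length."""
    n = v.size
    # Angular wavenumbers in FFT order: 0, 1, ..., then the negative ones
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    if n % 2 == 0:
        k[n // 2] = 0.0  # drop the unmatched Nyquist mode for an odd-order derivative
    return np.real(np.fft.ifft(1j * k * np.fft.fft(v)))

N = 100
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
v = np.exp(np.sin(x))
w = spectral_derivative(v, 2.0 * np.pi)
exact = np.cos(x) * v
max_error = np.max(np.abs(w - exact))
```

Because exp(sin(x)) is smooth and periodic, its Fourier coefficients decay very quickly, so even N=100 samples should bring max_error close to machine precision.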
