Orthogonal Distance Regression of a trajectory in space (3D) - python

I'm using scipy ODR to perform an orthogonal linear regression.
I have a matrix of shape (nlines, 3) representing trajectory coordinates, i.e. the columns are the x, y, z coordinates of a point moving in space.
The goal is to find the straight line that best fits the trajectory (similar to what is asked here). Hence, the output straight line has the same shape as the input: (nlines, 3).
Problem:
What Model should I use for this goal? I'm trying odr.multilinear but I get an error.
I am following the example given in the documentation with minor modifications:
import numpy as np
from scipy import odr

# traj_data is my 2D data matrix with x, y, z coordinates

# Create the array for the independent variable:
nsamples = np.arange(0, traj_data.shape[0])
nsamples_3d = np.column_stack((nsamples, nsamples, nsamples))
# Define the function you want to fit against:
linear = odr.multilinear  # is this the correct Model to use?
# Create a Data instance:
mydata = odr.Data(nsamples_3d, traj_data, wd=1, we=1)
# Instantiate ODR with your data, model and initial parameter estimate:
myodr = odr.ODR(mydata, linear, beta0=[1., 2.])
# Run the fit:
myoutput = myodr.run()
However, execution stops at
myodr = odr.ODR(mydata, linear, beta0=[1., 2.])
with the error
scipy.odr.odrpack.OdrError: fcn does not output [11700, 3]-shaped array
(where 11700 is nlines in the specific example I ran)
I have no problem when using 1D data.
Am I doing something wrong?
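For context, a minimal sketch of one way to phrase this for scipy.odr (an assumption about the intended model, not the multilinear one): fit a parametric line r(t) = p0 + t*d with a custom odr.Model. For a multi-response model, scipy.odr expects y as a (3, n)-shaped array rather than (n, 3), which is consistent with the error message above. The trajectory below is a random placeholder.
import numpy as np
from scipy import odr

def line3d(beta, t):
    # beta = [x0, y0, z0, dx, dy, dz]; returns a (3, len(t)) array,
    # the shape scipy.odr expects from a 3-response model function
    p0 = np.asarray(beta[:3]).reshape(3, 1)
    d = np.asarray(beta[3:]).reshape(3, 1)
    return p0 + d * t

traj_data = np.cumsum(np.random.randn(100, 3), axis=0)  # placeholder trajectory
t = np.arange(traj_data.shape[0], dtype=float)

model = odr.Model(line3d)
mydata = odr.Data(t, traj_data.T)  # y transposed to shape (3, n)
fit = odr.ODR(mydata, model, beta0=[0., 0., 0., 1., 1., 1.]).run()
line_fit = line3d(fit.beta, t).T   # back to (nlines, 3), same shape as input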


How to reuse the coefficients obtained from scipy.interpolate.RectBivariateSpline.get_coeffs() to reconstruct the interpolation function?

I have a huge 3D (x, y, z) dataset where (x, y) are the inputs and z is the output. The dataset is very large, and I need to use that information in real time with minimal delay.
Therefore, an indexing/look-up table might be too slow. My idea is to interpolate the dataset and, in real time, calculate the value instead of looking it up. Then I don't have to store the original dataset; I can instead store the coefficients, which would hopefully be smaller than the original data set.
I used scipy.interpolate.RectBivariateSpline to perform the interpolation. I was able to fit the data and also obtain the coefficients. But I am not sure how to reconstruct the interpolation function from the coefficients.
I want to emphasize that the interpolation function will only be evaluated at the input (x, y). Generalization is not a concern here.
from scipy import interpolate
import numpy as np

x = np.arange(1, 500)
y = np.arange(2, 200)
X, Y = np.meshgrid(x, y)
z = np.sin(X + Y).T
a = interpolate.RectBivariateSpline(x, y, z)
# print(len(a.get_coeffs()))
# coefficients can be obtained by a.get_coeffs()
# I want to have the following:
# f = construct_spline_from_coefficient(a.get_coeffs())
# z = f(x_old, y_old)
Another approach I had in mind is to use a deep neural network. Can anyone shed some light here? Is this overkill?
The solution is in the official scipy docs (link).
Use the bisplrep function (rep stands for representation) to obtain the interpolation output tck (see the docstring for bisplrep).
The output tck is an array and can be stored in a file.
Use bisplev (ev stands for evaluation) to construct a function.
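For the grid case in the question, a minimal sketch of rebuilding the evaluator from the knots and coefficients alone, assuming RectBivariateSpline's default cubic degrees (kx = ky = 3):
from scipy import interpolate
import numpy as np

x = np.arange(1, 500)
y = np.arange(2, 200)
z = np.sin(np.add.outer(x, y))  # same z as in the question, shape (499, 198)
a = interpolate.RectBivariateSpline(x, y, z)

# Knots + coefficients + degrees form a tck tuple that bisplev can evaluate;
# only these arrays need to be stored, not the original dataset.
tx, ty = a.get_knots()
c = a.get_coeffs()
tck = (tx, ty, c, 3, 3)  # kx = ky = 3 are the RectBivariateSpline defaults

z_rebuilt = interpolate.bisplev(x, y, tck)  # approximately equal to a(x, y)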
For using a neural network for interpolation, see this state-of-the-art paper:
Training Neural Networks for and by Interpolation.

Fit in parallel but predict one by one - scikit-learn

I have some code that fits a linear regression to some data. In particular I have only one feature (one-dimensional x) but several variables to fit on the same x-values (two-dimensional y). I can then take advantage of parallelisation by fitting the whole y matrix at once.
The issue is that I would then like to store the predictors independently, so that I can predict only one selected variable and not all the variables I fitted.
Here is some example code:
import numpy as np
from sklearn import linear_model

# Generation of x and y
x = np.linspace(0, 10, num=11).reshape(-1, 1)
y = []
for i in range(5):
    coef = np.random.rand(2) * 10
    y.append(x * coef[0] + coef[1])
y = np.concatenate(y, axis=1)

# Linear regression fit
lin_reg = linear_model.LinearRegression(n_jobs=-1)
lin_reg.fit(X=x, y=y)
If I run lin_reg.predict(x) I get all the variables predicted (a 2D matrix). I would like to be able to save the sub-prediction functions somewhere (in a DataFrame, but that's not the issue), as if I could store lin_reg[i] or lin_reg.predict[i], which I could call to predict only the 1D array corresponding to the selected variable.
Is that possible?
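One possible approach, sketched as a continuation of the snippet above (there is no built-in lin_reg.predict[i]; this just reuses the stored coefficients): a multi-output LinearRegression exposes coef_ with shape (n_targets, n_features) and intercept_ with shape (n_targets,), so each variable can be predicted on its own.
# Predict only target i from the stored coefficients
i = 2
y_i = x @ lin_reg.coef_[i] + lin_reg.intercept_[i]

# Or keep one lightweight predictor per variable, e.g. in a DataFrame column
predictors = [
    (lambda c, b: (lambda X: X @ c + b))(lin_reg.coef_[j], lin_reg.intercept_[j])
    for j in range(lin_reg.coef_.shape[0])
]
y_i_again = predictors[i](x)  # same result as the manual line above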

How do I get the corresponding frequency values to my data after FFT?

I have the following dataset in normal space, let's call it func.
I transformed it to Fourier space using the numpy FFT algorithm (from numpy.fft import fft as fourier). I obtained the Fourier transform using func_fourier = np.fft.fftshift(fourier(func)) and plotted the absolute values with plt.plot(np.abs(func_fourier)).
I now want to fit a Gaussian model to this function in Fourier space. The problem is that I don't have x-values (frequencies) to plot my func_fourier over. How do I create the correct frequency array in Fourier space, which I also need for fitting the Gaussian model to my transformed function?
The default x-values are created as follows:
frequencies = list(range(len(y)))
Note: according to your explanation, your Fourier-transformed values are stored in func_fourier, so y here is func_fourier.
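If you want physical frequencies rather than bare sample indices, here is a minimal sketch, continuing from the question's variables and assuming a sampling interval dt (not given in the question):
import numpy as np

dt = 1.0  # hypothetical sampling interval of func
freqs = np.fft.fftshift(np.fft.fftfreq(len(func), d=dt))
# plt.plot(freqs, np.abs(func_fourier)) now has frequency on the x-axis,
# and freqs can serve as the x-values for the Gaussian fit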

Python code for curve fitting by convolution of a Gaussian and multi-exponential decay

I'm developing code for fitting data with a model that is a convolution of two functions (a Gaussian with a multi-exponential decay: exp(Ax) + exp(Bx) + ...). Fitting with only a Gaussian and/or an exponentially modified Gaussian (https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution) works perfectly fine in lmfit, but when the built-in convolution is used (i.e. np.convolve of the two functions), lmfit doesn't work.
I have tried many examples from the internet. So far I realized that my function returns inf or nan values, and also that my data is not equally spaced, which matters for convolution. I found a workaround by using the analytic expression of the convolution together with scipy.optimize.curve_fit, but it is very clumsy and time consuming. I would like to make it more elegant and general by using a convolution of two functions with lmfit, where I can control the parameters much more easily.
The data set is also included in the comments for reference.
import numpy as np
import scipy.special
from scipy.optimize import curve_fit

w = 0.1  # constant width of the Gaussian

def CONVSum(x, *p):
    # w is taken from the constant above; only A, B, C are fitted
    n = int(len(p) / 3)
    A = p[:n]
    B = p[n:2 * n]
    C = p[2 * n:3 * n]
    # =========================================================================
    # The formula below is the analytic expression of multi-exponential
    # components convolved with a Gaussian distribution, derived following the
    # instructions in
    # http://www.np.ph.bham.ac.uk/research_resources/programs/halflife/gauss_exp_conv.pdf
    # =========================================================================
    fnct = sum(np.float64([
        A[i] * np.exp(-B[i] * ((x - C[i]) - 0.5 * np.square(w) * B[i]))
        * (1 + scipy.special.erf(((x - C[i]) - np.square(w) * B[i])
                                 / (np.sqrt(2) * w)))
        for i in range(n)
    ]))
    fnct[np.isnan(fnct)] = 0
    fnct[fnct < 1e-12] = 0
    return fnct

N = 4  # number of exponential functions to be fitted
params = np.linspace(1, 0.0001, N * 3)  # initial parameters for the multi-exponential
# x, y below are the measured data
popt, pcov = curve_fit(CONVSum, x, y, p0=params,
                       bounds=((0, 0, 0, 0, -np.inf, -np.inf, -np.inf, -np.inf, -3, -3, -3, -3),
                               (1, 1, 1, 1, np.inf, np.inf, np.inf, np.inf, 3, 3, 3, 3)),
                       maxfev=1000000)
fitted data with curve_fit
Any help or hint regarding fitting with a convolution of a Gaussian and multiple exponential decays is highly appreciated. I prefer using lmfit, since I can identify the parameters very nicely and also relate them to each other.
Ideally I want to fit my data with parameters where some are shared among the data sets and some are delayed (+offset).
Well, your script is a bit hard to read and follow closely, with lots of stuff that is not related to your question. Your exgauss function is not guarding against infinities: np.exp(x) for x > ~710 will give inf, and the fit will not be able to proceed.
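A minimal sketch of one way to add such a guard (safe_exp is a hypothetical helper, not part of the original code): clip the exponent before calling np.exp.
import numpy as np

def safe_exp(arg, cap=700.0):
    # np.exp overflows to inf for arguments above ~709.78;
    # clipping the exponent keeps the fit from producing inf/nan
    return np.exp(np.clip(arg, -cap, cap))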
Here is the equivalent of the curve fitting code given in the question. I managed to create this using the very helpful instructions and information here and here, but it still needs further development.
import numpy as np
import scipy.special
from lmfit import Parameters, minimize, report_fit
import matplotlib.pyplot as plt

# =============================================================================
# The formula below is the analytic expression of multi-exponential components
# convolved with a Gaussian distribution, derived following the instructions in
# http://www.np.ph.bham.ac.uk/research_resources/programs/halflife/gauss_exp_conv.pdf
# =============================================================================
def CONVSum(x, params):
    fnct = sum(np.float64([
        params['amp%s_%s' % (n, i)].value
        * np.exp(-params['dec%s_%s' % (n, i)].value
                 * ((x - params['cen%s_%s' % (n, i)].value)
                    - 0.5 * np.square(params['sig%s_%s' % (n, i)].value)
                    * params['dec%s_%s' % (n, i)].value))
        * (1 + scipy.special.erf(
            ((x - params['cen%s_%s' % (n, i)].value)
             - np.square(params['sig%s_%s' % (n, i)].value)
             * params['dec%s_%s' % (n, i)].value)
            / (np.sqrt(2) * params['sig%s_%s' % (n, i)].value)))
        for n in range(N) for i in wav
    ]))
    fnct = fnct / fnct.max()
    return fnct
# =============================================================================
# This global fit was adapted from
# https://stackoverflow.com/questions/20339234/python-and-lmfit-how-to-fit-multiple-datasets-with-shared-parameters/20341726#20341726
# It is very important that we can identify the shared parameters for the data sets.
# =============================================================================
def objective(params, x, data):
    """Calculate the total residual for fits to several data sets."""
    # residual per data set
    resid = data - CONVSum(x, params)
    # now flatten this to a 1D array, as minimize() needs
    return resid.flatten()

# select the data sets (df is the DataFrame holding the measured data)
x = df[949].index
data = df[949].values

# create the required sets of parameters, one per data set
N = 4        # number of exponential decays
wav = [949]  # the desired data to be fitted
fit_params = Parameters()
for i in wav:
    for n in range(N):
        fit_params.add('amp%s_%s' % (n, i), value=1, min=0.0, max=1)
        fit_params.add('dec%s_%s' % (n, i), value=0.5, min=-1e10, max=1e10)
        fit_params.add('cen%s_%s' % (n, i), value=0.1, min=-3.0, max=1000)
        fit_params.add('sig%s_%s' % (n, i), value=0.1, min=0.05, max=0.5)

# now we constrain some values to have the same value,
# for example assigning sig1, sig2, sig3 to be equal to sig0
for i in wav:
    for n in (1, 2, 3):
        fit_params['sig%s_%s' % (n, i)].expr = 'sig0_949'
        fit_params['cen%s_%s' % (n, i)].expr = 'cen0_949'

# run the global fit on all the data sets
result = minimize(objective, fit_params, args=(x, data))
report_fit(result.params)

# plot the data sets and fits
plt.close('all')
plt.figure()
for i in wav:
    y_fit = CONVSum(x, result.params)
    plt.plot(x, data, 'o-', x, y_fit, '-')
plt.xscale('symlog')
plt.show()
fitted data with convolution of multi-exponential and Gaussian
Unfortunately the fitted results are not very satisfying; I am still looking for advice to improve this.

plotting a line in matplotlib in python [duplicate]

I am trying to plot the decision boundary of a perceptron algorithm and am really confused about a few things. My input instances are of the form [(x1, x2), target_value], i.e. a 2-d input instance and a 2-class target value [1 or 0].
My weight vector is hence of the form [w1, w2]. Now I have to incorporate an additional bias parameter w0, so does my weight vector become a 3x1 vector? Or is it a 1x3 vector? I think it should be 1x3, since a vector has only 1 row and n columns.
Now let's say I instantiate [w0, w1, w2] to random values, how would I plot the decision boundary for this? What does w0 signify here? Is w0/norm(w) the distance of the decision region from the origin? If so, how do I capture this and plot it in python using matplotlib.pyplot or its matlab equivalent? I would really appreciate even a little help regarding this matter.
from pylab import norm
import matplotlib.pyplot as plt

n = norm(weight_vector)  # weight_vector is of the form [w0, w1, w2]; w0 is the bias parameter
ww = weight_vector / n   # unit vector in the direction of weight_vector
ww1 = [ww[1], -ww[0]]
ww2 = [-ww[1], ww[0]]
plt.plot([ww1[0], ww2[0]], [ww1[1], ww2[1]], '--k')
Here I want to incorporate the w0 parameter to indicate the displacement of the decision boundary from the origin, since that's what w0/norm(w) indicates.
When I plot the vector as mentioned in the comments below, I get a vector of really small length; how can I extend this decision boundary in both directions?
The small dashed line near [0, 0] in the figure is my decision region; how can I make it longer in both directions? If I try to multiply each of its components, the figure scale changes. I am using the matplotlib.pyplot.plot() function to achieve this.
First of all, you shouldn't add the bias to the input vectors; you only need to add or subtract the bias term when evaluating the decision function.
For plotting, you might want to try plotting the linear function that passes through the two weight points.
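A minimal sketch of that idea, with purely illustrative weights: the decision boundary is the set of points satisfying w0 + w1*x1 + w2*x2 = 0, so solving for x2 over any x1 range draws the line as long as you like.
import numpy as np
import matplotlib.pyplot as plt

w0, w1, w2 = 0.5, 1.0, -2.0   # hypothetical [bias, w1, w2]
x1 = np.linspace(-5, 5, 100)  # extend this range to lengthen the line
x2 = -(w0 + w1 * x1) / w2     # points satisfying w0 + w1*x1 + w2*x2 = 0
plt.plot(x1, x2, '--k')
plt.show()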
