Non-linear fit in Python 3.3 - python

I would like to know how to do a non-linear fit in Python 3.3. I am not finding any easy examples online, and I am not very familiar with these fitting techniques.
Any help will be welcome!
Thanks in advance.

For an easy example to follow, visit http://www.walkingrandomly.com/?p=5215
Here is the code with explanations!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xdata = np.array([-2, -1.64, -1.33, -0.7, 0, 0.45, 1.2, 1.64, 2.32, 2.9])
ydata = np.array([0.69, 0.70, 0.69, 1.0, 1.9, 2.4, 1.9, 0.9, -0.7, -1.4])

def func(x, p1, p2):
    return p1*np.cos(p2*x) + p2*np.sin(p1*x)

# Here you give the initial parameters p0, which curve_fit then iterates on
# to find the best fit
popt, pcov = curve_fit(func, xdata, ydata, p0=(1.0, 0.3))
print(popt)  # This contains your two best-fit parameters

# Residual sum of squares
p1, p2 = popt
residuals = ydata - func(xdata, p1, p2)
fres = np.sum(residuals**2)
print(fres)

xaxis = np.linspace(-2, 3, 100)  # we could plot with xdata, but the fitted curve would look jagged
curve_y = func(xaxis, p1, p2)
plt.plot(xdata, ydata, '*')
plt.plot(xaxis, curve_y, '-')
plt.show()
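If you also want rough one-sigma uncertainties on the fitted parameters, a common approach (a minimal sketch, reusing the popt and pcov returned above) is to take the square roots of the diagonal of the covariance matrix:
# One-standard-deviation uncertainties on p1 and p2, estimated from the
# covariance matrix returned by curve_fit (assumes the fit converged)
perr = np.sqrt(np.diag(pcov))
print(popt[0], '+/-', perr[0])
print(popt[1], '+/-', perr[1])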

Related

Displaying chi squared as the uncertainty of fit parameters in scipy.optimize

I am doing some curve fitting in Python with the aid of scipy.optimize.curve_fit. Normally I am satisfied with scipy's default results.
However, this time I would like to display the fitted function with chi_squared as the uncertainty of my fit parameters, and I don't know how to deal with this.
I have tried using the parameter absolute_sigma=True instead of the default absolute_sigma=False. However, according to my separate calculations, for absolute_sigma=False the uncertainty of the parameters is equal to reduced_chi_squared, but for absolute_sigma=True it is not equal to chi_squared.
chi_squared is mentioned a few times in the documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html), but it is not written explicitly how to display it within the plot or how to use it as the uncertainty of the fit parameters.
My code is as follows:
# Necessary libraries
# Numpy will be used for the actual function (in this example it's not strictly necessary)
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from uncertainties import *

# Some shortened data
Wave_lengths_500_est = [380, 447.1, 486, 492.2, 656, 700, 706.5]
Avraged_deflection_angles_500 = [11.965, 12.8, 13.93, 14.325, 19.37, 18.26, 18.335]

# The function to which the data are to be fitted
def lin_function(x, a, b):
    return a*x + b

def line_fit_lamp_1_500():
    # Angular uncertainty of 2 arc minutes, expressed in degrees
    angle_error = 0.03333

    # Add the data points to the plot
    plt.scatter(Wave_lengths_500_est, Avraged_deflection_angles_500, color='b')
    plt.errorbar(Wave_lengths_500_est, Avraged_deflection_angles_500, yerr=angle_error, fmt='o')

    # Fitting of the function "lin_function" to the data points
    Wave_length = np.array(Wave_lengths_500_est)
    Defletion_angle = np.array(Avraged_deflection_angles_500)
    popt, pcov = curve_fit(lin_function, Wave_length, Defletion_angle, absolute_sigma=True)
    perr = np.sqrt(np.diag(pcov))
    y = lin_function(Wave_length, *popt)

    # Graph looks
    plt.plot(Wave_length, y, '--', color='g', label="fit with: $a={:.3f}\pm{:.5f}$, $b={:.3f}\pm{:.5f}$".format(popt[0], perr[0], popt[1], perr[1]))
    plt.legend()
    plt.xlabel('Wavelength [nm]')
    plt.ylabel('Deflection angle [degrees]')
    plt.show()

# Function call
line_fit_lamp_1_500()
Toggling between absolute_sigma=True/False, the uncertainty of the parameters changes from a ± 0.00308 and b ± 1.74571 for absolute_sigma=True to a ± 0.022 and b ± 1.41679 for absolute_sigma=False, versus expected values of a ± 0.0001027 and b ± 0.058132 based on chi_squared, and a ± 0.002503 and b ± 1.4167 based on reduced_chi_squared.
Additionally, could you please explain what the expression .format(popt[0], perr[0], popt[1], perr[1]) does?
All help is appreciated, thanks in advance.
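For reference, here is a minimal sketch of how chi_squared and reduced_chi_squared are usually computed from such a fit; the sigma array below is my own assumption (the constant angle_error applied to every point) and is not part of the code above:
import numpy as np
from scipy.optimize import curve_fit

Wave_length = np.array(Wave_lengths_500_est)
Defletion_angle = np.array(Avraged_deflection_angles_500)
sigma = np.full(Wave_length.shape, 0.03333)  # assumed per-point uncertainty (angle_error)

popt, pcov = curve_fit(lin_function, Wave_length, Defletion_angle,
                       sigma=sigma, absolute_sigma=True)
residuals = Defletion_angle - lin_function(Wave_length, *popt)
chi_squared = np.sum((residuals / sigma)**2)
reduced_chi_squared = chi_squared / (len(Wave_length) - len(popt))
print(chi_squared, reduced_chi_squared)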

Calculating scale/dispersion of Gamma GLM using statsmodels

I'm having trouble obtaining the dispersion parameter of simulated data using statsmodels' GLM function.
import statsmodels.api as sm
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np
np.random.seed(1)
# Generate data
x=np.random.uniform(0, 100,50000)
x2 = sm.add_constant(x)
a = 0.5
b = 0.2
y_true = 1/(a+(b*x))
# Add error
scale = 2 # the scale parameter I'm trying to obtain
shape = y_true/scale # given that, for Gamma, mu = scale*shape
y = np.random.gamma(shape=shape, scale=scale)
# Run model
model = sm.GLM(y, x2, family=sm.families.Gamma()).fit()
model.summary()
Here's the summary from above:
Note that the coefficient estimates are correct (0.5 and 0.2), but the scale (21.995) is way off the scale I set (2).
Can someone point out what it is I'm misunderstanding/doing wrong? Thanks!
As Josef noted in the comments, statsmodels uses a different kind of parameterization.
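A minimal sketch of how the two parameterizations relate (my own illustration, not code from the answer): numpy's np.random.gamma(shape, scale) has mean shape*scale and variance shape*scale**2, whereas the GLM dispersion phi is defined through Var(y) = phi*mu**2, so phi = 1/shape. To simulate data whose GLM-style dispersion really is 2, keep the shape fixed and let numpy's scale vary with the mean:
import numpy as np
import statsmodels.api as sm

np.random.seed(1)
x = np.random.uniform(0, 100, 50000)
x2 = sm.add_constant(x)
mu = 1/(0.5 + 0.2*x)             # same mean function as y_true above

phi = 2                          # GLM dispersion: Var(y) = phi * mu**2
shape = 1/phi                    # constant shape <=> constant dispersion
scale_np = mu*phi                # numpy scale varies with the mean (mean = shape*scale_np = mu)
y = np.random.gamma(shape=shape, scale=scale_np)

res = sm.GLM(y, x2, family=sm.families.Gamma()).fit()
print(res.scale)                 # should now be close to 2, up to sampling noise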

How do you get scipy curve_fit to find a reasonable result if you don't have a good initial parameter guess?

I am trying to get a simple fit to my data of an exponential decay of the form a*(x-x0)**b, where I know a and b must be negative. So if you plot it on a log-log plot, I should see a linear trend for the obtained data.
As such, I'm giving scipy.optimize initial guesses where a and b are negative, but it keeps ignoring them and giving me the warning
OptimizeWarning: Covariance of the parameters could not be estimated
along with values for a and b that are positive. It then also does not give me an exponential decay, but a parabola that bottoms out and begins to increase.
I have tried many different guesses as to the initial parameters over a large range of values (one such is in the code below), but none worked without giving me the nonsensical return and the error. This has made me start to wonder if my code is wrong, or if there's just some obvious way to get good initial guesses into the code that won't be rejected.
import math
import numpy as np
import sys
import matplotlib.pyplot as plt
import scipy as sp
import scipy.optimize
from scipy.optimize import curve_fit
import numpy.polynomial.polynomial as poly
x= [1987, 1993.85, 2003, 2010.45, 2009.3, 2019.4]
t= [31, 8.6, 4.84, 1.96, 3.9, 1.875]
def model_func(x, a, b, x0):
    return a*(x - x0)**b
# curve fit
p0 = (-.0005,-.0005,100)
opt, pcov = curve_fit(model_func, x, t,p0)
a, b, x0 = opt
# test result
x2 = np.linspace(1980, 2020, 100)
y2 = model_func(x2, a, b,x0)
coefs, cov = poly.polyfit(x, t, 2,full=True)
ffit = poly.polyval(x2, coefs)
plt.loglog(x,t,'.')
plt.loglog(x2, ffit,'--', color="#1f77b4")
print('S = (',coefs[0],'*(t-',coefs[2],')^',coefs[1])
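One thing worth noting: curve_fit accepts a bounds argument that can keep parameters in a chosen range. A minimal sketch (my own illustration, constraining only b to be negative since the t values are positive, and keeping x0 below min(x) so that (x - x0)**b stays real for non-integer b):
# Sketch: constrain the sign of b via curve_fit's bounds argument
p0 = (30.0, -0.5, 1980.0)
bounds = ([-np.inf, -np.inf, -np.inf], [np.inf, 0.0, min(x) - 1.0])
opt, pcov = curve_fit(model_func, x, t, p0=p0, bounds=bounds)
print('a, b, x0 =', opt)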

matplotlib exact solution to a differential equation

I'm trying to plot the exact solution to a differential equation (a radioactive leak model) in Python 2.7 with matplotlib. When plotting the graph with Euler's method or with SciPy I get the expected results, but with the exact solution the output is a straight-line graph (it should be a logarithmic curve).
Here is my code:
import math
import numpy as np
import matplotlib.pyplot as plt
#define parameters
r = 1
beta = 0.0864
x0 = 0
maxt = 100.0
tstep = 1.0
#Make arrays for time and radioactivity
t = np.zeros(1)
#Implementing model with Exact solution where Xact = Exact Solution
Xact = np.zeros(1)
e = math.exp(-(beta/t))
while (t[-1] < maxt):
    t = np.append(t, t[-1]+tstep)
    Xact = np.append(Xact, Xact[-1] + ((r/beta)+(x0-r/beta)*e))
#plot results
plt.plot(t, Xact,color="green")
I realise that my problem may be due to an incorrect equation, but I'd be very grateful if someone could point out an error in my code. Cheers.
You probably want e to depend on t, as in
def e(t): return np.exp(-t/beta)
and then use
Xact = np.append(Xact, (r/beta) + (x0 - r/beta)*e(t[-1]))
But you can write all of that more compactly as
t = np.arange(0, maxt+tstep/2, tstep)
plt.plot(t, (r/beta)+(x0-r/beta)*np.exp(-t/beta), color="green" )

Python: two normal distributions

I have two data sets in which two values were measured. I am interested in the difference between the values and the standard deviation of the difference. I made a histogram, to which I would like to fit two normal distributions in order to calculate the difference between the maxima. I would also like to evaluate the effect of having much less data on one of the values in one data set. I've already looked at this link, but it is not really what I need:
Python: finding the intersection point of two gaussian curves
import numpy as np
import matplotlib.pyplot as plt

for ii in range(2, 8):
    # Kanal = ii - 1
    file = filepath + '\Mappe1.txt'   # filepath is defined elsewhere in the script
    data = np.loadtxt(file, delimiter='\t', skiprows=1)
    data = data[:, ii]
    plt.hist(data, bins=100)
    plt.xlabel("bins")
    plt.ylabel("Counts")
    plt.tight_layout()
    plt.grid()
    plt.figure()
    plt.show()
Quick and dirty fitting can be readily achieved using scipy:
import numpy as np
from scipy.optimize import curve_fit  # non-linear curve fitting tool
from matplotlib import pyplot as plt

def func2fit(x, m_1, m_2, std_1, std_2, height1, height2):  # define a sum of two Gaussian curves
    return (height1*np.exp(-(x - m_1)**2/2/std_1**2)
            + height2*np.exp(-(x - m_2)**2/2/std_2**2))

init_guess = (-.3, .3, .5, .5, 3000, 3000)
# contains the initial guesses for the parameters (m_1, m_2, std_1, std_2, height1, height2), taken from your first figure

# do the fitting
fit_pars, pcov = curve_fit(func2fit, xdata, ydata, init_guess)
# fit_pars contains the means, the heights and the SD values; pcov contains the estimated covariance of these parameters
plt.plot(xdata, func2fit(xdata, *fit_pars), label='fit')  # plot the fit
For further reference consult the scipy manual page:
curve_fit
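The xdata and ydata used in that snippet are not defined in the question; a minimal sketch of one way to build them from the histogram (an assumption on my part, reusing the data array from the question):
counts, bin_edges = np.histogram(data, bins=100)
bin_centres = 0.5*(bin_edges[:-1] + bin_edges[1:])  # histogram bin centres
xdata, ydata = bin_centres, counts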
Assuming that the two samples are independent there is no need to handle this problem using curve fitting. It's basic statistics. Here's some code that does the calculations required, with the source attributed in a comment.
## adapted from http://onlinestatbook.com/2/estimation/difference_means.html
from random import gauss
from numpy import sqrt
sample_1 = [ gauss(0,1) for _ in range(10) ]
sample_2 = [ gauss(1,.5) for _ in range(20) ]
n_1 = len(sample_1)
n_2 = len(sample_2)
mean_1 = sum(sample_1)/n_1
mean_2 = sum(sample_2)/n_2
SSE = sum([(_-mean_1)**2 for _ in sample_1]) + sum([(_-mean_2)**2 for _ in sample_2])
df = (n_1-1) + (n_2-1)
MSE = SSE/df
n_h = 2 / ( 1/n_1 + 1/n_2 )
s_mean_diff = sqrt( 2* MSE / n_h )
print('difference between means', abs(mean_1 - mean_2))
print('std dev of this difference', s_mean_diff)
