Python: curve fit looks scrambled

I am trying to fit a curve to some data, but the resulting curve looks like a scrambled mess, and I can't tell whether the coefficients are accurate. With this sample data set the plot looks something like a triangle, and with my original data set it looks even worse. The code is mostly taken from a tutorial. I tried removing the sympy code, following an alternate tutorial, but that made no difference.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
x = [0.0009425070688029959,
0.0009398496240601303,
0.0018779342723004293,
0.004694835680751241,
0.0009425070688029959,
0.004734848484848552,
0.0018993352326685255,
0.0009460737937558928]
y = [0.0028301886792453904,
0.003762935089369628,
0.001881467544684814,
0.0009433962264150743,
0.0028301886792453904,
0.0019029495718363059,
0.0038058991436727804,
0.0018939393939393534]
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brutal force to avoid errors
"""
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, b, c, d):
    return b * x * x + c * x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = b, popt[1] = c and popt[2] = d of the function,
so f(x) = popt[0]*x**2 + popt[1]*x + popt[2].
"""
print("b = " + str(popt[0]) + " c = " + str(popt[1]) + " d = " + str(popt[2]))
"""
Use sympy to generate the LaTeX syntax of the function
"""
xs = sym.Symbol(r'\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the function.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line below \/
#plt.plot(x, popt[0]*x**2 + popt[1]*x + popt[2], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()

This is because Matplotlib will only draw lines between the few points in your original data (in the x and y arrays) and in the order they are defined. There are only 3 unique x values (plus some noise) which is why you see what looks like a triangle.
The fix is to create a new array with evenly spread, and ordered, x values across the range you're interested in. You can do that with the linspace function in numpy.
For example, try this for your second plot command:
x_eval = np.linspace(min(x), max(x), 100)
plt.plot(x_eval, func(x_eval, *popt), label="Fitted Curve")
x_eval above is a NumPy array of 100 evenly spaced values between the minimum and maximum x values in your original data.
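As a minimal sketch of the idea above, assuming the same quadratic model and a shortened, deliberately unsorted sample (the values are rounded from the question, and `y_eval` is a name introduced here for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, b, c, d):
    return b * x * x + c * x + d

# Unsorted sample data, rounded from the question (illustrative only)
x = np.array([0.00094, 0.00188, 0.00469, 0.00094, 0.00473, 0.00190, 0.00095])
y = np.array([0.00283, 0.00188, 0.00094, 0.00376, 0.00190, 0.00381, 0.00189])

# The fit itself does not depend on the order of the points
popt, pcov = curve_fit(func, x, y)

# Evaluate the fitted model on a sorted, dense grid instead of the raw x values
x_eval = np.linspace(x.min(), x.max(), 100)
y_eval = func(x_eval, *popt)
```

Passing `x_eval` and `y_eval` to `plt.plot` then draws one smooth line from left to right, because consecutive points are neighbours on the x axis.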

It looks like you need to sort the data by x.
Try inserting this line before the data is converted to NumPy arrays:
x,y = zip(*sorted(zip(x, y)))
so that the full script reads:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
x = [0.0009425070688029959,
0.0009398496240601303,
0.0018779342723004293,
0.004694835680751241,
0.0009425070688029959,
0.004734848484848552,
0.0018993352326685255,
0.0009460737937558928]
y = [0.0028301886792453904,
0.003762935089369628,
0.001881467544684814,
0.0009433962264150743,
0.0028301886792453904,
0.0019029495718363059,
0.0038058991436727804,
0.0018939393939393534]
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brutal force to avoid errors
"""
x,y = zip(*sorted(zip(x, y)))
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, b, c, d):
    return b * x * x + c * x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = b, popt[1] = c and popt[2] = d of the function,
so f(x) = popt[0]*x**2 + popt[1]*x + popt[2].
"""
print("b = " + str(popt[0]) + " c = " + str(popt[1]) + " d = " + str(popt[2]))
"""
Use sympy to generate the LaTeX syntax of the function
"""
xs = sym.Symbol(r'\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the function.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line below \/
#plt.plot(x, popt[0]*x**2 + popt[1]*x + popt[2], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
The plotted curve from the data above.
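If the data is already in NumPy arrays, the same pairing-preserving sort can be sketched with `np.argsort`. This is an alternative to the `zip`/`sorted` line, not what the answer above uses, and the sample values and the names `order`, `x_sorted`, `y_sorted` are made up for illustration:

```python
import numpy as np

# Small unsorted sample (illustrative values, not the question's data)
x = np.array([0.0047, 0.0009, 0.0019, 0.0047, 0.0009])
y = np.array([0.0009, 0.0028, 0.0019, 0.0019, 0.0038])

# Indices that would sort x; a stable sort keeps tied x values in input order
order = np.argsort(x, kind="stable")

# Apply the same permutation to both arrays so each (x, y) pair stays together
x_sorted = x[order]
y_sorted = y[order]
```

One difference worth knowing: `sorted(zip(x, y))` breaks ties in x by comparing y, while the stable `argsort` leaves tied points in their original order. For plotting a fitted curve either behaviour works.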

Related

Quadratic fit with matplotlib not really working

I was following a tutorial on data fitting, and when I replaced the original data with my own, the fitted curve no longer looked quadratic.
Here is my code; thanks a lot for any help:
# fit a second degree polynomial to the economic data
import numpy as np
from numpy import arange
from pandas import read_csv
from scipy.optimize import curve_fit
from matplotlib import pyplot
x = np.array([1,2,3,4,5,6])
y = np.array([1,4,12,29,54,104])
# define the true objective function
def objective(x, a, b, c):
    return a * x + b * x**2 + c
# load the dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/longley.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
# choose the input and output variables
# curve fit
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c = popt
print('y = %.5f * x + %.5f * x^2 + %.5f' % (a, b, c))
# plot input vs output
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
I tried a quadratic fit with matplotlib and expected a quadratic curve, but the plotted line does not look quadratic.
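One thing that can be checked directly in the code above: `arange(min(x), max(x), 1)` yields only the values 1 to 5, so the dashed line is drawn through five points with straight segments between them and stops before the last data point. The sketch below, assuming the same data and model, evaluates the fitted parabola on a dense grid instead. `np.linspace` is swapped in for `arange` here, and the name `coarse` is introduced only for comparison:

```python
import numpy as np
from scipy.optimize import curve_fit

def objective(x, a, b, c):
    return a * x + b * x**2 + c

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 4, 12, 29, 54, 104], dtype=float)

popt, _ = curve_fit(objective, x, y)

# The grid used in the question: integer steps, upper end excluded
coarse = np.arange(x.min(), x.max(), 1)

# A dense grid that includes both end points
x_line = np.linspace(x.min(), x.max(), 200)
y_line = objective(x_line, *popt)
```

Plotting `x_line` against `y_line` shows the curvature of the fitted parabola. Whether a quadratic is a good model for this data is a separate question that the plot can help answer.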

How to show standard error with curve_fit from scipy in python?

I am trying to fit multiple curve equations to my data to determine what kind of decay curve best represents my data. I am using the curve_fit function within scipy in python.
Here is my example data:
import pandas as pd
data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Y': [55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0]}
df = pd.DataFrame(data)
I then want to try fitting a linear, logarithmic, second-degree polynomial, and exponential decay curve to my data points and then plotting. I am trying out the following code:
import pandas as pd
import numpy as np
from numpy import array, exp
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# load the dataset
data = df.values
# choose the input and output variables
x, y = data[:, 0], data[:, 1]
def func1(x, a, b):
    return a * x + b
def func2(x, a, b):
    return a * np.log(x) + b
def func3(x, a, b, c):
    return a*x**2+b*x+c
def func4(x, a, b, c):
    return a*exp(b*x)+c
params, covs = curve_fit(func1, x, y)
params, _ = curve_fit(func1, x, y)
a, b = params[0], params[1]
yfit1 = a * x + b
print('Linear decay fit:')
print('y = %.5f * x + %.5f' % (a, b))
params, _ = curve_fit(func2, x, y)
a, b = params[0], params[1]
yfit2 = a * np.log(x) + b
print('Logarithmic decay fit:')
print('y = %.5f * ln(x)+ %.5f' % (a, b))
params, _ = curve_fit(func3, x, y)
a, b, c = params[0], params[1], params[2]
yfit3 = a*x**2+b*x+c
print('Polynomial decay fit:')
print('y = %.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
params, _ = curve_fit(func4, x, y)
a, b, c = params[0], params[1], params[2]
yfit4 = a*exp(x*b)+c
print('Exponential decay fit:')
print('y = %.5f * exp(x*%.5f)+%.5f' % (a, b, c))
plt.plot(x, y, 'bo', label="y-original")
plt.plot(x, yfit1, label="y=a * x + b")
plt.plot(x, yfit2, label="y=a * np.log(x) + b")
plt.plot(x, yfit3, label="y=a*x^2 + bx + c")
plt.plot(x, yfit4, label="y=a*exp(x*b)+c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='best', fancybox=True, shadow=True)
plt.grid(True)
plt.show()
This produces the following text and visual plotting output:
I had originally tried to find a way to show the R-squared value for each curve fit, but I found out that R-squared is not suitable for non-linear curves, and that I should instead look at the standard error of each fit to judge which equation best describes the decay in my data points. How can I take the fits above and output a standard error for each one?
Ultimately I want to be able to say "as the x-axis value increases, the y-axis value decays ..." and fill in the blank with "linearly", "logarithmically", "quadratically", or "exponentially". Visually, the exponential decay equation fits my data best, but I have no quantitative basis for that call. That is why I want to print the standard error for each fit and pick the equation with the lowest one. I have been unable to work out from the documentation how to derive it.

You can use Mean Absolute Scaled Error (MASE) for comparing the goodness of fit. Define a function for MASE as follows:
import numpy as np
def mase(actual: np.ndarray, predicted: np.ndarray):
    # mean absolute error of the fit
    forecast_error = np.mean(np.abs(actual - predicted))
    # mean absolute error of a naive forecast that repeats the previous value
    naive_forecast = np.mean(np.abs(np.diff(actual)))
    return forecast_error / naive_forecast
Then simply compare all of the predicted values with the actual values (both must be NumPy arrays to use the function above). You can then select the model/equation with the lowest MASE value. For example:
>>> actual = np.array([1,2,3])
>>> predicted1 = np.array([1.1, 2.2, 3.3])
>>> mase(actual, predicted1)
0.20000000000000004
>>>
>>> predicted2 = np.array([1.1, 2.1, 3.1])
>>> mase(actual, predicted2)
0.10000000000000009
Here predicted2 has the lower MASE value and is therefore the better fit.
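Since the question asks for a standard error specifically, here is a hedged sketch that reports three numbers per model on the question's data: the parameter standard errors from the covariance matrix returned by `curve_fit` (`np.sqrt(np.diag(pcov))`), a residual standard error, and the MASE defined above. The names `results`, `residual_std_err` and `param_std_err` are introduced here. The exponential model is left out because it may need a starting guess `p0` to converge, and the source does not give one.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(1, 16, dtype=float)
y = np.array([55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0], dtype=float)

def linear(x, a, b):
    return a * x + b

def logarithmic(x, a, b):
    return a * np.log(x) + b

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

def mase(actual, predicted):
    return np.mean(np.abs(actual - predicted)) / np.mean(np.abs(np.diff(actual)))

results = {}
for name, model in [("linear", linear), ("logarithmic", logarithmic), ("quadratic", quadratic)]:
    popt, pcov = curve_fit(model, x, y)
    residuals = y - model(x, *popt)
    dof = len(x) - len(popt)  # degrees of freedom: points minus fitted parameters
    results[name] = {
        # one standard deviation error on each fitted parameter
        "param_std_err": np.sqrt(np.diag(pcov)),
        # typical size of a residual, penalised for the number of parameters
        "residual_std_err": np.sqrt(np.sum(residuals**2) / dof),
        "mase": mase(y, model(x, *popt)),
    }

for name, r in results.items():
    print(name, "residual std err = %.4f" % r["residual_std_err"], "MASE = %.4f" % r["mase"])
```

The residual standard error is in the units of y, so it can be compared across models fitted to the same data. The parameter standard errors describe how well each coefficient is determined, which is a different question from which model fits best.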

Python code to create a diagram of Failure Assessment Diagram

I would like to build a Monte Carlo probabilistic model for a structural analysis. To do so, I need to graph this model:
I worked out the following code, but it still needs a lot of work:
import pandas as pd
from matplotlib import pyplot
import numpy as np
from scipy.optimize import curve_fit
from numpy import arange
%matplotlib inline
# define the true objective function
def objective(x, a, b, c, d, e, f):
    return (a * x) + (b * x**2) + (c * x**3) + (d * x**4) + (e * x**5) + f
y = np.array([1,0.99,0.97,0.93,0.9,0.81,0.7,0.57,0.5,0.32,0.25])
x = np.array([0,0.2,0.4,0.6,0.67,.8,0.9,1.0,1.05,1.2,1.32])
popt, _ = curve_fit(objective, x, y)
a, b, c, d, e, f = popt
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 0.1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e, f)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
Can you help me do the code properly to create a curve_fit?
How can I determine whether a random number will be inside the curve?

To make the curve attach to the Y axis on the left, one way is to set the X axis minimum to the smallest X value in your data (in this case, zero) with matplotlib.pyplot.xlim.
To close the right side of the plot, you can draw a vertical line at the largest X value with matplotlib.pyplot.vlines.
As for testing a point, one way, though perhaps an overly simplistic view of your problem, is to compare the value in question against the ranges of your data set:
min(y) <= a[1] <= max(y)
The following code shows each of these steps. It is written literally for illustration rather than to be as Pythonic as possible.
Code:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
from matplotlib import pyplot
import numpy as np
from scipy.optimize import curve_fit
from numpy import arange
# %matplotlib inline
# define the true objective function
def objective(x, a, b, c, d, e, f):
    return (a * x) + (b * x**2) + (c * x**3) + (d * x**4) + (e * x**5) + f
y = np.array([1,0.99,0.97,0.93,0.9,0.81,0.7,0.57,0.5,0.32,0.25])
x = np.array([0,0.2,0.4,0.6,0.67,.8,0.9,1.0,1.05,1.2,1.32])
popt, _ = curve_fit(objective, x, y)
a, b, c, d, e, f = popt
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 0.1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e, f)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
# Set X axis limits
pyplot.xlim(min(x),)
# Set Y axis limits
pyplot.ylim(0,)
# Close the curve on the right
pyplot.vlines(max(x), min(y), 0, linestyles='--', color='red')
# Value within range?
a = (0.15, 0.63)
a1 = min(x) <= a[0] <= max(x)
a2 = min(y) <= a[1] <= max(y)
if a1 and a2:
    print('True')
# Plot test point
pyplot.plot(a[0], a[1], marker='o', markersize=5, color="blue")
pyplot.show()
Output:
Shell output: True
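The range check above tests the bounding box of the data rather than the curve itself. A stricter variant, sketched here under the assumption that "inside" means between the X axis and the fitted curve within the fitted X range, evaluates the fitted polynomial at the point's X value. The helper `is_inside` and the random sample are hypothetical additions, not part of the answer above:

```python
import numpy as np
from scipy.optimize import curve_fit

def objective(x, a, b, c, d, e, f):
    return (a * x) + (b * x**2) + (c * x**3) + (d * x**4) + (e * x**5) + f

y = np.array([1, 0.99, 0.97, 0.93, 0.9, 0.81, 0.7, 0.57, 0.5, 0.32, 0.25])
x = np.array([0, 0.2, 0.4, 0.6, 0.67, 0.8, 0.9, 1.0, 1.05, 1.2, 1.32])
popt, _ = curve_fit(objective, x, y)

def is_inside(px, py):
    """Hypothetical helper: True if (px, py) lies between the X axis and the fitted curve."""
    in_x_range = x.min() <= px <= x.max()
    return bool(in_x_range and 0 <= py <= objective(px, *popt))

print(is_inside(0.15, 0.63))  # the test point used in the answer above
print(is_inside(1.2, 0.9))    # inside the bounding box, but above the curve

# Monte Carlo style use: fraction of uniformly drawn points that fall under the curve
rng = np.random.default_rng(0)
px = rng.uniform(x.min(), x.max(), 10_000)
py = rng.uniform(0, 1, 10_000)
inside_fraction = np.mean(py <= objective(px, *popt))
```

The second test point shows where the two checks disagree: it passes the min/max comparison on both axes yet sits above the fitted curve.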

Curve fitting of complex data

I want to fit a complex data set with two functions that share the same parameters. For this I used
def funcReal(x,a,b,c,d):
    return np.real((a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))
def funcImag(x,a,b,c,d):
    return np.imag((a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))
poptReal, pcovReal = curve_fit(funcReal, x, yReal)
poptImag, pcovImag = curve_fit(funcImag, x, yImag)
Here funcReal is the real part of my model, funcImag is the imaginary part, yReal is the real part of the data, and yImag is the imaginary part of the data.
However, the two fits do not give me the same parameters for the real and imaginary parts.
My question: is there a package or a method that lets me fit multiple data sets with multiple functions that share parameters?

To fit both parts of the complex function given above, we can treat the real and imaginary components as the coordinates of a point, or as a vector. Since curve_fit does not care about the order in which data points appear in the vectors x (independent data) and y (dependent data), as long as the two stay paired, we can simply split the complex data and stack the real and imaginary components using hstack. See the example below.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Assumption: the original post does not define k; this value is only a placeholder for the demo
k = 1.0
kappa1 = np.pi
kappa2 = -0.01
def long_function(x, a, b, c, d):
    return (a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x))
def funcBoth(x, a, b, c, d):
    # the first half of x belongs to the real part, the second half to the imaginary part
    N = len(x)
    x_real = x[:N//2]
    x_imag = x[N//2:]
    y_real = np.real(long_function(x_real, a, b, c, d))
    y_imag = np.imag(long_function(x_imag, a, b, c, d))
    return np.hstack([y_real, y_imag])
# Create an independent variable with 100 measurements
N = 100
x = np.linspace(0, 10, N)
# True values of the dependent variable
y = long_function(x, a=1.1, b=0.3, c=-0.2, d=0.23)
# Add uniform complex noise (real + imaginary)
noise = (np.random.rand(N) + 1j * np.random.rand(N) - 0.5 - 0.5j) * 0.1
yNoisy = y + noise
# Split the measurements into a real and imaginary part
yReal = np.real(yNoisy)
yImag = np.imag(yNoisy)
yBoth = np.hstack([yReal, yImag])
# Find the best-fit solution
poptBoth, pcovBoth = curve_fit(funcBoth, np.hstack([x, x]), yBoth)
# Compute the best-fit solution
yFit = long_function(x, *poptBoth)
print(poptBoth)
# Plot the results
plt.figure(figsize=(9, 4))
plt.subplot(121)
plt.plot(x, np.real(yNoisy), "k.", label="Noisy y")
plt.plot(x, np.real(y), "r--", label="True y")
plt.plot(x, np.real(yFit), label="Best fit")
plt.ylabel("Real part of y")
plt.xlabel("x")
plt.legend()
plt.subplot(122)
plt.plot(x, np.imag(yNoisy), "k.")
plt.plot(x, np.imag(y), "r--")
plt.plot(x, np.imag(yFit))
plt.ylabel("Imaginary part of y")
plt.xlabel("x")
plt.tight_layout()
plt.show()
Result:
The best-fit parameters that were found in this example were a = 1.14, b = 0.375, c = -0.236, and d = 0.163, which are close enough to the true parameter values given the amplitude of the noise that I inserted here.
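The same shared-parameter idea can also be written with `scipy.optimize.least_squares`, where the residual function returns the real and imaginary residuals concatenated into one vector. This is a sketch of an alternative, not the method used in the answer above. The value of `k` is an assumption (the question does not give one), the data is noise-free and synthetic, and the names `model` and `residuals` are introduced here:

```python
import numpy as np
from scipy.optimize import least_squares

# Assumed constants: the question does not state values for k, kappa1 or kappa2
k, kappa1, kappa2 = 1.0, np.pi, -0.01

def model(x, a, b, c, d):
    return ((a + 1j*b) * (np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x))
            + (c + 1j*d) * (np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))

def residuals(params, x, y_complex):
    # One residual vector for both parts, so a single parameter set must explain both
    diff = model(x, *params) - y_complex
    return np.concatenate([diff.real, diff.imag])

x = np.linspace(0, 10, 100)
true_params = np.array([1.1, 0.3, -0.2, 0.23])
y_complex = model(x, *true_params)  # noise-free synthetic data

result = least_squares(residuals, x0=np.ones(4), args=(x, y_complex))
print(result.x)
```

Because the residual vector can be assembled from any number of data sets and model functions, this form extends to more than two coupled fits without the half-and-half slicing of x.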

Scipy.optimize.curve_fit does not fit

Say I want to fit a sine function using scipy.optimize.curve_fit. I don't know any of the function's parameters. To get the frequency, I take a Fourier transform, and I guess all the other parameters: amplitude, phase, and offset. When I run my program, I do get a fit, but it does not make sense. What is the problem? Any help will be appreciated.
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
ampl = 1
freq = 24.5
phase = np.pi/2
offset = 0.05
t = np.arange(0,10,0.001)
func = np.sin(2*np.pi*t*freq + phase) + offset
fastfft = np.fft.fft(func)
freq_array = np.fft.fftfreq(len(t),t[0]-t[1])
max_value_index = np.argmax(abs(fastfft))
frequency = abs(freq_array[max_value_index])
def fit(a, f, p, o, t):
    return a * np.sin(2*np.pi*t*f + p) + o
guess = (0.9, frequency, np.pi/4, 0.1)
params, fit = sp.optimize.curve_fit(fit, t, func, p0=guess)
a, f, p, o = params
fitfunc = lambda t: a * np.sin(2*np.pi*t*f + p) + o
plt.plot(t, func, 'r-', t, fitfunc(t), 'b-')

The main problem in your program was a misunderstanding of how scipy.optimize.curve_fit is designed and what it assumes about the fit function:
ydata = f(xdata, *params) + eps
This means that the fit function must take the array of x values as its first parameter, followed by the fit parameters in whatever order you choose (p0 and the returned params follow that same order), and must return an array of y values. Here is an example of how to do this:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
#t has to be the first parameter of the fit function
def fit(t, a, f, p, o):
    return a * np.sin(2*np.pi*t*f + p) + o
ampl = 1
freq = 2
phase = np.pi/2
offset = 0.5
t = np.arange(0,10,0.01)
#is the same as fit(t, ampl, freq, phase, offset)
func = np.sin(2*np.pi*t*freq + phase) + offset
fastfft = np.fft.fft(func)
freq_array = np.fft.fftfreq(len(t),t[0]-t[1])
max_value_index = np.argmax(abs(fastfft))
frequency = abs(freq_array[max_value_index])
guess = (0.9, frequency, np.pi/4, 0.1)
#renamed the covariance matrix
params, pcov = scipy.optimize.curve_fit(fit, t, func, p0=guess)
a, f, p, o = params
#calculate the fit plot using the fit function
plt.plot(t, func, 'r-', t, fit(t, *params), 'b-')
plt.show()
As you can see, I have also changed the way the fitted curve for the plot is calculated. You don't need another function; just call the fit function with the parameter list that the fit procedure returns.
The other problem was that you named the covariance array fit, which overwrote the previously defined function fit. I fixed that as well.
P.S.: Of course, you now see only one curve, because the perfect fit covers your data points.
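A related detail when guessing the frequency from a Fourier transform: the zero-frequency bin of the raw spectrum has magnitude proportional to the offset, so with a large offset `argmax` can pick frequency 0 instead of the sine. The sketch below, assuming a uniformly sampled signal, subtracts the mean before the transform and also reads amplitude and phase guesses off the peak bin. The names `sine_model`, `centered` and `peak` are introduced here and do not appear in the answer above:

```python
import numpy as np
from scipy.optimize import curve_fit

def sine_model(t, a, f, p, o):
    return a * np.sin(2 * np.pi * f * t + p) + o

# Synthetic signal with the same parameters as the example above
t = np.linspace(0, 10, 1000, endpoint=False)
y = sine_model(t, 1.0, 2.0, np.pi / 2, 0.5)

dt = t[1] - t[0]
centered = y - y.mean()                    # remove the offset so bin 0 cannot win
spectrum = np.fft.rfft(centered)
freqs = np.fft.rfftfreq(len(t), d=dt)      # non-negative frequencies only
peak = np.argmax(np.abs(spectrum[1:])) + 1

f_guess = freqs[peak]
a_guess = 2 * np.abs(spectrum[peak]) / len(t)
p_guess = np.angle(spectrum[peak]) + np.pi / 2   # a sine lags a cosine by pi/2
o_guess = y.mean()

params, pcov = curve_fit(sine_model, t, y, p0=(a_guess, f_guess, p_guess, o_guess))
```

The amplitude and phase estimates are exact only when the signal frequency falls on an FFT bin, as it does here. For real data they are starting values for `curve_fit`, not results.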
