I am trying to fit multiple curve equations to my data to determine which kind of decay curve best represents it. I am using the curve_fit function from scipy in Python.
Here is my example data:
data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        'Y': [55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0]}
df = pd.DataFrame(data)
I then want to try fitting linear, logarithmic, second-degree polynomial, and exponential decay curves to my data points and then plot the results. I am trying the following code:
import pandas as pd
import numpy as np
from numpy import array, exp
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# extract the values from the dataframe
data = df.values
# choose the input and output variables
x, y = data[:, 0], data[:, 1]
def func1(x, a, b):
    return a * x + b

def func2(x, a, b):
    return a * np.log(x) + b

def func3(x, a, b, c):
    return a * x**2 + b * x + c

def func4(x, a, b, c):
    return a * exp(b * x) + c
params, _ = curve_fit(func1, x, y)
a, b = params[0], params[1]
yfit1 = a * x + b
print('Linear decay fit:')
print('y = %.5f * x + %.5f' % (a, b))
params, _ = curve_fit(func2, x, y)
a, b = params[0], params[1]
yfit2 = a * np.log(x) + b
print('Logarithmic decay fit:')
print('y = %.5f * ln(x)+ %.5f' % (a, b))
params, _ = curve_fit(func3, x, y)
a, b, c = params[0], params[1], params[2]
yfit3 = a*x**2+b*x+c
print('Polynomial decay fit:')
print('y = %.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
params, _ = curve_fit(func4, x, y)
a, b, c = params[0], params[1], params[2]
yfit4 = a*exp(x*b)+c
print('Exponential decay fit:')
print('y = %.5f * exp(x*%.5f)+%.5f' % (a, b, c))
plt.plot(x, y, 'bo', label="y-original")
plt.plot(x, yfit1, label="y=a * x + b")
plt.plot(x, yfit2, label="y=a * np.log(x) + b")
plt.plot(x, yfit3, label="y=a*x^2 + bx + c")
plt.plot(x, yfit4, label="y=a*exp(x*b)+c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='best', fancybox=True, shadow=True)
plt.grid(True)
plt.show()
This runs and produces the printed fit equations together with a plot of the original data and all four fitted curves.
I had originally tried to show the r-squared value for each curve fit, but I found out that R-squared is not suitable for non-linear curves, and that I should instead use the standard error of each curve fit to judge which equation best describes the decay I am seeing in my data points.

My question is: how can I take my code and the fits of each equation I attempted and output a standard error reading for each curve fit, so that I can judge which equation most accurately represents my data? I am ultimately trying to make the judgement call of saying "I have discovered that as the x-axis value increases, the y-axis value decays ..." and filling in that blank with "linearly", "logarithmically", "second-degree quadratically", or "exponentially". Visually, the exponential decay equation fits my data best, but I have no quantitative basis for that call, which is why I want to print the standard error and judge the best-fitting equation as the one with the lowest standard error. I have been unable to work out from the documentation how to derive this from my calculations.
You can use Mean Absolute Scaled Error (MASE) for comparing the goodness of fit. Define a function for MASE as follows:
import numpy as np
def mase(actual: np.ndarray, predicted: np.ndarray):
    forecast_error = np.mean(np.abs(actual - predicted))
    naive_forecast = np.mean(np.abs(np.diff(actual)))
    return forecast_error / naive_forecast
Then simply compare all of the predicted values with the actual values (both must be NumPy arrays to use the function above). You can then select the model/equation with the lowest MASE value. For example:
>>> actual = np.array([1,2,3])
>>> predicted1 = np.array([1.1, 2.2, 3.3])
>>> mase(actual, predicted1)
0.20000000000000004
>>>
>>> predicted2 = np.array([1.1, 2.1, 3.1])
>>> mase(actual, predicted2)
0.10000000000000009
Here we can safely say that predicted2 has the lower MASE value and is therefore the better fit.
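Applied to the fits from your code (a sketch, assuming the y array and the yfit1 through yfit4 arrays computed above; the fits dictionary is just an illustrative name), you can compare all four models at once:

fits = {'linear': yfit1, 'logarithmic': yfit2,
        'polynomial': yfit3, 'exponential': yfit4}
for name, yfit in fits.items():
    # a lower MASE means the model tracks the data more closely
    print('%-12s MASE = %.5f' % (name, mase(y, yfit)))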
I would like to build a Monte Carlo probabilistic model for a structural analysis. In order to do so, I need to graph this model (figure omitted):
I worked out following code, but it still needs a lot of work:
import pandas as pd
from matplotlib import pyplot
import numpy as np
from scipy.optimize import curve_fit
from numpy import arange
%matplotlib inline
# define the true objective function
def objective(x, a, b, c, d, e, f):
    return (a * x) + (b * x**2) + (c * x**3) + (d * x**4) + (e * x**5) + f
y = np.array([1,0.99,0.97,0.93,0.9,0.81,0.7,0.57,0.5,0.32,0.25])
x = np.array([0,0.2,0.4,0.6,0.67,.8,0.9,1.0,1.05,1.2,1.32])
popt, _ = curve_fit(objective, x, y)
a, b, c, d, e, f = popt
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 0.1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e, f)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
Can you help me fix the code so that curve_fit produces a proper curve?
How can I determine whether a randomly generated point will be inside the curve?
To get the curve to attach to the Y axis on the left, one way would be to set the X axis minimum to the smallest X value that you have (in this case, zero); see matplotlib.pyplot.xlim.
To close the right side of the plot, you can draw a vertical line based on the min/max of your data set; see matplotlib.pyplot.vlines.
While perhaps an overly simplistic view of your problem, one way would be to simply compare the value in question to the ranges of your dataset.
min(y) <= a[1] <= max(y)
The following code shows each of these, written literally for illustration rather than made as Pythonic as it could be.
Code:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
from matplotlib import pyplot
import numpy as np
from scipy.optimize import curve_fit
from numpy import arange
# %matplotlib inline
# define the true objective function
def objective(x, a, b, c, d, e, f):
    return (a * x) + (b * x**2) + (c * x**3) + (d * x**4) + (e * x**5) + f
y = np.array([1,0.99,0.97,0.93,0.9,0.81,0.7,0.57,0.5,0.32,0.25])
x = np.array([0,0.2,0.4,0.6,0.67,.8,0.9,1.0,1.05,1.2,1.32])
popt, _ = curve_fit(objective, x, y)
a, b, c, d, e, f = popt
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 0.1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e, f)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
# Set X axis limits
pyplot.xlim(min(x),)
# Set Y axis limits
pyplot.ylim(0,)
# Close the curve on the right
pyplot.vlines(max(x), min(y), 0, linestyles='--', color='red')
# Value within range?
pt = (0.15, 0.63)  # renamed from 'a' so the fitted coefficient a is not shadowed
in_x = min(x) <= pt[0] <= max(x)
in_y = min(y) <= pt[1] <= max(y)
if in_x and in_y:
    print('True')
# Plot test point
pyplot.plot(pt[0], pt[1], marker='o', markersize=5, color="blue")
pyplot.show()
Output:
Shell output: True
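If the bounding-box test is too coarse for the Monte Carlo use case, a tighter check (a sketch, reusing the fitted objective function, the coefficients a through f, and the test point pt from the code above; inside_curve is a hypothetical helper name) is to ask whether the point lies below the fitted curve itself:

def inside_curve(px, py):
    # "inside" here means: within the fitted x range, above the x axis,
    # and below the fitted quintic at that x
    return min(x) <= px <= max(x) and 0 <= py <= objective(px, a, b, c, d, e, f)

print(inside_curve(*pt))  # should print True for the test point (0.15, 0.63)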
I am trying to fit a curve to some data, but the resulting curve looks like a scrambled mess, and I don't know whether the coefficients are accurate. With this sample data set it plots something like a triangle, and with my original data set it looks even worse. The code is mostly from a tutorial; I tried removing the sympy code following an alternate tutorial, but that accomplished nothing.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
x = [0.0009425070688029959,
0.0009398496240601303,
0.0018779342723004293,
0.004694835680751241,
0.0009425070688029959,
0.004734848484848552,
0.0018993352326685255,
0.0009460737937558928]
y = [0.0028301886792453904,
0.003762935089369628,
0.001881467544684814,
0.0009433962264150743,
0.0028301886792453904,
0.0019029495718363059,
0.0038058991436727804,
0.0018939393939393534]
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brutal force to avoid errors
"""
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. a, b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, b, c, d):
    return b * x * x + c * x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = a , popt[1] = b, popt[2] = c and popt[3] = d of the function,
so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
"""
print("b = " + str(popt[0]) + " c = " + str(popt[1]) + " d = " + str(popt[2]))
"""t
Use sympy to generate the latex sintax of the function
"""
xs = sym.Symbol('\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the funcion.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line above \/
#plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
This is because Matplotlib only draws lines between the few points in your original data (in the x and y arrays), in the order they are defined. There are only 3 unique x values (plus some noise), which is why you see what looks like a triangle.
The fix is to create a new array of evenly spaced, ordered x values across the range you're interested in. You can do that with numpy's linspace function.
For example, try this for your second plot command:
x_eval = np.linspace(min(x), max(x), 100)
plt.plot(x_eval, func(x_eval, *popt), label="Fitted Curve")
x_eval above is an array of 100 evenly spaced values between the minimum and maximum x values in your original data.
Looks like you need to sort on xdata.
Try inserting this:
x,y = zip(*sorted(zip(x, y)))
so the full script becomes:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
x = [0.0009425070688029959,
0.0009398496240601303,
0.0018779342723004293,
0.004694835680751241,
0.0009425070688029959,
0.004734848484848552,
0.0018993352326685255,
0.0009460737937558928]
y = [0.0028301886792453904,
0.003762935089369628,
0.001881467544684814,
0.0009433962264150743,
0.0028301886792453904,
0.0019029495718363059,
0.0038058991436727804,
0.0018939393939393534]
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brutal force to avoid errors
"""
x,y = zip(*sorted(zip(x, y)))
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. a, b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, b, c, d):
    return b * x * x + c * x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = a , popt[1] = b, popt[2] = c and popt[3] = d of the function,
so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
"""
print("b = " + str(popt[0]) + " c = " + str(popt[1]) + " d = " + str(popt[2]))
"""t
Use sympy to generate the latex sintax of the function
"""
xs = sym.Symbol('\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the funcion.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line above \/
#plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
The plotted curve from the data above.
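As an aside, an equivalent NumPy-idiomatic way to do the sort-and-convert step (a sketch operating on the x and y lists defined above; the order variable is just an illustrative name) is to index both arrays by the argsort of x:

import numpy as np

x = np.asarray(x, dtype=float)
y = np.asarray(y, dtype=float)
order = np.argsort(x)      # indices that put x in ascending order
x, y = x[order], y[order]  # reorder both arrays consistently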
I have two equations that originate from one single matrix equation:
[x, y] = [[cos(s), -sin(s)], [sin(s), cos(s)]] * [x', y'],
where x' = A*cos(w1*t + p1) and y' = B*cos(w2*t + p2).
This is just a single matrix equation for the vector [x, y], but it can be decomposed into two scalar equations: x = A*cos(s)*cos(w1*t + p1) - B*sin(s)*cos(w2*t + p2) and y = A*sin(s)*cos(w1*t + p1) + B*cos(s)*cos(w2*t + p2).
So I am fitting two datasets, x vs. t and y vs. t, but there are some shared parameters in these fits, namely A, B and s.
1) Can I fit directly the matrix equation, or do I have to decompose it into scalar ones? The former would be more elegant.
2) Can I fit with shared parameters on curve_fit? All relevant questions use other packages.
The below example code fits two different equations with one shared parameter using curve_fit. This is not an answer to your question, but I cannot format code in a comment so I'm posting it here.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y1 = np.array([ 16.00, 18.42, 20.84, 23.26, 25.68])
y2 = np.array([-20.00, -25.50, -31.00, -36.50, -42.00])
comboY = np.append(y1, y2)
h = np.array([5.0, 6.1, 7.2, 8.3, 9.4])
comboX = np.append(h, h)
def mod1(data, a, b, c):  # not all parameters are used here
    return a * data + c

def mod2(data, a, b, c):  # not all parameters are used here
    return b * data + c

def comboFunc(comboData, a, b, c):
    # single data set passed in, extract separate data
    extract1 = comboData[:len(y1)]  # first data
    extract2 = comboData[len(y2):]  # second data
    result1 = mod1(extract1, a, b, c)
    result2 = mod2(extract2, a, b, c)
    return np.append(result1, result2)
# some initial parameter values
initialParameters = np.array([1.0, 1.0, 1.0])
# curve fit the combined data to the combined function
fittedParameters, pcov = curve_fit(comboFunc, comboX, comboY, initialParameters)
# values for display of fitted function
a, b, c = fittedParameters
y_fit_1 = mod1(h, a, b, c) # first data set, first equation
y_fit_2 = mod2(h, a, b, c) # second data set, second equation
plt.plot(comboX, comboY, 'D') # plot the raw data
plt.plot(h, y_fit_1) # plot the equation using the fitted parameters
plt.plot(h, y_fit_2) # plot the equation using the fitted parameters
plt.show()
print(fittedParameters)
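If you also want uncertainty estimates for the parameters, a common follow-up (a sketch reusing the pcov returned by curve_fit above) is to take the square root of the diagonal of the covariance matrix:

perr = np.sqrt(np.diag(pcov))  # one-sigma standard error of each parameter
for name, value, error in zip(['a', 'b', 'c'], fittedParameters, perr):
    print('%s = %.4f +/- %.4f' % (name, value, error))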
I want to fit a complex data set with two functions that share the same parameters. For this I used:
def funcReal(x, a, b, c, d):
    return np.real((a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))

def funcImag(x, a, b, c, d):
    return np.imag((a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x)))
poptReal, pcovReal = curve_fit(funcReal, x, yReal)
poptImag, pcovImag = curve_fit(funcImag, x, yImag)
Here funcReal is the real part of my model, funcImag the imaginary part, yReal the real part of the data and yImag the imaginary part of the data.
However, the two fits do not give me the same parameters for the real and imaginary parts.
My question: is there a package or a method that lets me run simultaneous fits of multiple functions to multiple data sets with shared parameters?
To fit the complex function given above, we can treat the real and imaginary components as a coordinate point, or as a vector. Since curve_fit doesn't care about the order in which data points appear in the vectors x (independent data) and y (dependent data), we can simply split the complex data and stack the real and imaginary components using hstack. See the example below.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
k = 1.0  # assumed value; k was not defined in the question's snippet
kappa1 = np.pi
kappa2 = -0.01

def long_function(x, a, b, c, d):
    return (a + 1j*b)*(np.exp(1j*k*x - kappa1*x) - np.exp(kappa2*x)) + (c + 1j*d)*(np.exp(-1j*k*x - kappa1*x) - np.exp(-kappa2*x))

def funcBoth(x, a, b, c, d):
    N = len(x)
    x_real = x[:N//2]
    x_imag = x[N//2:]
    y_real = np.real(long_function(x_real, a, b, c, d))
    y_imag = np.imag(long_function(x_imag, a, b, c, d))
    return np.hstack([y_real, y_imag])
# Create an independent variable with 100 measurements
N = 100
x = np.linspace(0, 10, N)
# True values of the dependent variable
y = long_function(x, a=1.1, b=0.3, c=-0.2, d=0.23)
# Add uniform complex noise (real + imaginary)
noise = (np.random.rand(N) + 1j * np.random.rand(N) - 0.5 - 0.5j) * 0.1
yNoisy = y + noise
# Split the measurements into a real and imaginary part
yReal = np.real(yNoisy)
yImag = np.imag(yNoisy)
yBoth = np.hstack([yReal, yImag])
# Find the best-fit solution
poptBoth, pcovBoth = curve_fit(funcBoth, np.hstack([x, x]), yBoth)
# Compute the best-fit solution
yFit = long_function(x, *poptBoth)
print(poptBoth)
# Plot the results
plt.figure(figsize=(9, 4))
plt.subplot(121)
plt.plot(x, np.real(yNoisy), "k.", label="Noisy y")
plt.plot(x, np.real(y), "r--", label="True y")
plt.plot(x, np.real(yFit), label="Best fit")
plt.ylabel("Real part of y")
plt.xlabel("x")
plt.legend()
plt.subplot(122)
plt.plot(x, np.imag(yNoisy), "k.")
plt.plot(x, np.imag(y), "r--")
plt.plot(x, np.imag(yFit))
plt.ylabel("Imaginary part of y")
plt.xlabel("x")
plt.tight_layout()
plt.show()
Result: the best-fit parameters found in this example were a = 1.14, b = 0.375, c = -0.236, and d = 0.163, which are close to the true parameter values given the amplitude of the noise inserted here. (The resulting figure shows the noisy data, the true curve, and the best fit for both the real and imaginary parts.)