I have managed to create an exponential regression based on some data from an experiment. However, I would like the regression to stop when the y-values start plateauing (around x = 42000 seconds). See attached image of plot.
This is the code so far:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.optimize as opt

# This is the function we are trying to fit to the data.
def func(x, a, b):
    return a * b**x
dataC = pd.read_csv("yeastdata1cropped.txt")
data = pd.read_csv("yeastdata1.txt")
xdata = np.array(data.iloc[:, 1])
ydata = np.array(data.iloc[:, 0])
xdatac = np.array(dataC.iloc[:, 1])
ydatac = np.array(dataC.iloc[:, 0])

# Plot the actual data
plt.plot(xdata, ydata, ".", label="Data")

# The actual curve fitting happens here
optimizedParameters, pcov = opt.curve_fit(func, xdatac, ydatac)

# Use the optimized parameters to plot the best fit
plt.plot(xdata, func(xdata, *optimizedParameters), label="fit")

# Show the graph
plt.legend()
plt.show()
You just need to pass only the relevant values to the fit. You can use NumPy boolean indexing to select the values of x below 42000: the expression xdatac < 42000 produces a boolean mask that is True wherever the condition holds, and indexing with it keeps only those elements. The rest of the code remains the same.
optimizedParameters, pcov = opt.curve_fit(func, xdatac[xdatac < 42000],
                                          ydatac[xdatac < 42000])
This way, the fit is performed only on the data below x = 42000, and you can still plot the fitted line over the full range by passing the complete x data.
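For illustration, here is a minimal, self-contained sketch of how the boolean mask behaves (the array values are made up for the example):

import numpy as np

x = np.array([10000, 30000, 50000, 60000])
mask = x < 42000   # array([ True,  True, False, False])
x[mask]            # array([10000, 30000]) -- only the values below 42000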
I'm trying to generate a fit for the data I have (see the attached "The Data" image). The sample, when plotted directly, looks like the "Sample Data" image. I've been trying to generate a polynomial fit for this data, where T is time in days and IC/IC100 is the corresponding measurement. I've used two methods to generate the polynomial fit.
1. Using polyfit & poly1d
Here is my code for this approach:
import math
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
sns.set(style="darkgrid")
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 100
from scipy.stats import sem
from scipy import optimize
from scipy.optimize import curve_fit
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

IC_M = pd.read_csv("TvsC100_MES.csv")
IC_M.set_index('Group/#/', inplace=True)
IICM_1 = IC_M[0:5]
IICM_1

# DEGREE = 2
mymodel = np.poly1d(np.polyfit(IICM_1["IC/IC100"], IICM_1["T"], 2))

figure(figsize=(12, 8), dpi=100)
plt.plot(IICM_1["T"], IICM_1["IC/IC100"], marker='o', label='Original Plot', c='blue')
plt.plot(mymodel(IICM_1["IC/IC100"]), IICM_1["IC/IC100"], marker='x', label='New Y', color='red')
#plt.plot(mymodel(new_y), new_y, marker='x', label='New Y', color='red')
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
When I plot the graph, one point is off; it's not supposed to be like that, and I haven't been able to fix it. The behavior of this correlation isn't the same as what was recorded during experimentation. (See the attached "The Error" image.)
The second method I used was PolynomialFeatures with fit_transform & predict.
In this method, coefficients seem to be generated per point, so the fit follows each point rather than the line as a whole. For the equation Y(X) = AX^2 + BX + C, the coefficients A, B, C should remain constant for all points; that is the fit I am looking for. If I were to extend the values of Y, I should get the next predicted value according to the sample data, but unfortunately this isn't the case.
Here is my code; only the main part differs after loading the data as before:
# Poly creation
# The degree here is in the format (min degree, max degree); with our selection
# of (2, 2) we remove every term except the degree-2 one.
poly = PolynomialFeatures(degree=(2, 2), include_bias=False)

# Transform the data, applying the polynomial feature expansion to it
poly_features = poly.fit_transform(np.array(IICM_1["IC/IC100"]).reshape(-1, 1))  # Y

# Create an instance of the linear regression model
poly_reg_model = LinearRegression(fit_intercept=False, positive=True)

# Fitting trains the model on X (input) & Y (response) to solve for the coefficients
# y = A*(X) + C
poly_reg_model.fit(poly_features, np.array(IICM_1["T"]).reshape(-1, 1))  # X
y_predicted = poly_reg_model.predict(poly_features)

figure(figsize=(12, 8), dpi=100)
# points + curve
plt.plot(IICM_1["T"], IICM_1["IC/IC100"], marker='o', label="Samp: C/3005-1", color="blue")
plt.plot(y_predicted, IICM_1["IC/IC100"], marker='x', label="Samp: Prediction", color="red")
plt.legend()
plt.xlabel("T")
plt.ylabel("IC/IC100")
plt.show()
The output I get looks incorrect to me. I tried changing the input order for the functions, thinking it would treat the points as a single curve rather than as individual segments, but the results were worse. Either my understanding of polynomial fitting is incorrect, or I'm using these functions incorrectly, or something else is wrong. How can I approach and fix this issue?
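For what it's worth, a single global quadratic with constant coefficients can be obtained by treating T as the independent variable and evaluating the model on a sorted grid before plotting. This is a minimal sketch under those assumptions (column names as in the question), not the original poster's code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

IC_M = pd.read_csv("TvsC100_MES.csv")
IC_M.set_index('Group/#/', inplace=True)
IICM_1 = IC_M[0:5]

# One quadratic y(x) = A*x^2 + B*x + C fitted over all points at once:
# T is treated as x, IC/IC100 as y.
coeffs = np.polyfit(IICM_1["T"], IICM_1["IC/IC100"], 2)
model = np.poly1d(coeffs)

# Evaluate on a sorted, dense grid so the curve is drawn left to right.
x_dense = np.linspace(IICM_1["T"].min(), IICM_1["T"].max(), 200)
plt.plot(IICM_1["T"], IICM_1["IC/IC100"], 'o', label='data')
plt.plot(x_dense, model(x_dense), '-', label='quadratic fit')
plt.legend()
plt.show()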
I am trying to fit a curve through a set of points using the numpy and scipy libraries, but I am getting a closed curve, as shown below.
Could anyone let me know how to fit the curve without closing it?
The code I followed is:
import numpy as np
from scipy.interpolate import splprep, splev
import matplotlib.pyplot as plt
coords = np.array([(3,8),(3,9),(4,10),(5,11),(6,11), (7,13), (9,13),(10,14),(11,14),(12,14),(14,16),(16,17),(17,18),(18,18),(19,18), (20,19),
(21,19),(22,20),(23,20),(24,21),(26,21),(27,21),(28,21),(30,21),(32,20),(33,20),(32,17),(33,16),(33,15),(34,12), (34,10),(33,10),
(33,9),(33,8),(33,6),(34,6),(34,5)])
tck, u = splprep(coords.T, u=None, s=0.0, per=1)
u_new = np.linspace(u.min(), u.max(), 1000)
x_new, y_new = splev(u_new, tck, der=0)
plt.plot(coords[:,1], coords[:,0], 'ro')
plt.plot(y_new, x_new, 'b--')
plt.show()
Output:
I need the output without joining the first and last points.
Thank you.
Just set the per parameter to 0 in scipy.interpolate.splprep. With per=1 the data points are treated as periodic, which forces the spline to close; per=0 fits an open curve:
tck, u = splprep(coords.T, u=None, s=0.0, per=0)
I have some data I'm trying to model with lmfit's Model.
Specifically, I'm measuring superconducting resistors. I'm trying to fit the experimental data (resistance vs. temperature) to a model which incorporates the critical temperature Tc (material dependent), the resistance below Tc (nominally 0), and the resistance above Tc (structure dependent).
Here's a simplified version (with simulated data) of the code I'm using to plot my data, along with the output plot.
I'm not getting any errors but, as you can see, I'm also not getting a fit that matches my data.
What am I doing wrong? This is my first time using lmfit and Model, so I may be making a newbie mistake. I thought I was following the lmfit example but, as I said, I'm obviously doing something wrong.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lmfit import Model

def main():
    x = np.linspace(0, 12, 50)
    x_ser = pd.Series(x)  # Simulated temperature data
    y1 = [0] * 20
    y2 = [10] * 30
    y1_ser = pd.Series(y1)  # Simulated resistance data below Tc
    y2_ser = pd.Series(y2)  # Simulated resistance data above Tc
    y_ser = pd.concat([y1_ser, y2_ser], ignore_index=True)
    xcrit_model = Model(data_equation)
    params = xcrit_model.make_params(y1_guess=0, y2_guess=12, xcrit_guess=9)
    print('params: {}'.format(params))
    result = xcrit_model.fit(y_ser, params, x=x_ser)
    print(result.fit_report())
    plt.plot(x_ser, y_ser, 'bo', label='simulated data')
    plt.plot(x_ser, result.init_fit, 'k.', label='initial fit')
    plt.plot(x_ser, result.best_fit, 'r:', label='best fit')
    plt.legend()
    plt.show()

def data_equation(x, y1_guess, y2_guess, xcrit_guess):
    x_lt_xcrit = x[x < xcrit_guess]
    x_ge_xcrit = x[x >= xcrit_guess]
    y1 = [y1_guess] * x_lt_xcrit.size
    y1_ser = pd.Series(data=y1)
    y2 = [y2_guess] * x_ge_xcrit.size
    y2_ser = pd.Series(data=y2)
    y = pd.concat([y1_ser, y2_ser], ignore_index=True)
    return y

if __name__ == '__main__':
    main()
lmfit (and basically all similar solvers) works with continuous variables, exploring how tiny changes in the parameter values affect the fit.
But your xcrit_guess parameter is used only as a discrete threshold: if its value changes from 9.0000 to 9.00001, the fit will not change at all, so the solver sees no gradient to follow.
So, basically, don't do:
x_lt_xcrit = x[x < xcrit_guess]
x_ge_xcrit = x[x >= xcrit_guess]
Instead, you should use a smoother sigmoidal step function. In fact, lmfit has one of these built-in. So you might try something like this (note, there is no point in converting numpy.arrays to pandas.Series - the code will just turn these back to numpy arrays anyway):
import numpy as np
from lmfit.models import StepModel
import matplotlib.pyplot as plt
x = np.linspace(0, 12, 50)
y = 9.5*np.ones(len(x))
y[:26] = 0.0
y = y + np.random.normal(size=len(y), scale=0.0002)
xcrit_model = StepModel(form='erf')
params = xcrit_model.make_params(amplitude=4, center=5, sigma=1)
result = xcrit_model.fit(y, params, x=x)
print(result.fit_report())
plt.plot(x, y, 'bo', label='simulated data')
plt.plot(x, result.init_fit, 'k', label='initial fit')
plt.plot(x, result.best_fit, 'r:', label='best fit')
plt.legend()
plt.show()
I have the following csv data:
Dataset Size,MAPE,MAE,STD MAPE,STD MAE
35000,0.0715392337,23.38300578,0.9078698348,2.80407539
26250,0.06893431034,22.34732326,0.9833948236,1.926517044
17500,0.0756695622,26.0900766,0.6055443674,8.842862631
8750,0.07176532526,23.02646184,0.8284005282,2.190506033
4200,0.08661127364,29.89234607,0.9395831421,7.587818412
2100,0.08072315267,27.20110884,0.03956974712,4.948606892
1050,0.07505202908,27.04025924,0.841966778,4.550482956
700,0.07703248113,26.17923045,0.4468447145,1.523638508
350,0.08695408769,32.35331585,0.7891190087,4.18648457
200,0.09770903032,30.96197823,0.04648972591,3.892800694
170,0.1202382169,41.87828814,0.7257680584,6.70453713
150,0.1960949784,77.20321559,0.5661066006,21.57418682
From the above data, I would like to generate the following plot using matplotlib or similar (seaborn, pandas, etc.):
from pathlib import Path
from matplotlib import animation
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit
nr_datapoints = 10

def exponenial_func(x, a, b, c):
    return a*np.exp(-b*x) + c

def myplot(data_file):
    df = pd.read_csv(data_file)
    print(df.head())
    fig, ax = plt.subplots()
    # Exponential line fit
    popt, pcov = curve_fit(exponenial_func,
                           np.array([float(i) for i in range(len(df['Dataset Size']))]),
                           df['MAPE'], p0=(0, 0.0145, 0.0823))
    xp = np.linspace(0, len(df['Dataset Size']), 100)
    plt.plot(xp, exponenial_func(xp, *popt), color='g')
    # Bar plot with error bars
    ax.bar([str(s) for s in df['Dataset Size']], df['MAPE'], yerr=df['STD MAPE'])
    plt.title('Accuracy of Model vs. Dataset Size')
    plt.xlabel('Dataset Size')
    plt.ylabel('Mean Absolute Percentage Error')
    fig.tight_layout()
    plt.show()
The plot that I get looks as follows:
Why do I end up with a line rather than a curve, despite fitting an exponential function to the data? (Google Sheets does the same thing, i.e. fits an exponential curve to this data, and its trendline comes out curved.)
I played around with some functions, and I think I can say with some degree of certainty that the Google Sheets exponential function has a form close to this:

def sheetey_exponential_function(x, a, b, c):
    return a * b ** (x + c)
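Worth noting (an observation, not part of the original answer): since a * b**(x + c) = (a * b**c) * b**x, the c parameter is redundant with a, so the two-parameter form a * b**x spans exactly the same family of curves.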
The problem is that the horizontal axis is not linear; it is roughly inverse-linear (the dataset size shrinks as the bar index grows). So if you want your fit to look like an exponential function in these coordinates, you need to replace x with 1/x:

def exponenial_func(x, a, b, c):
    return a*np.exp(-b/x) + c
The result is the following:
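As a usage note (a sketch of my own, not from the original answer): exp(-b/x) is undefined at x = 0, so the index passed to curve_fit should start at 1 rather than 0, and the starting guess p0 may need retuning for the new functional form. Assuming the CSV above is saved under a hypothetical name results.csv:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def exponenial_func(x, a, b, c):
    return a*np.exp(-b/x) + c

df = pd.read_csv("results.csv")  # hypothetical file name; same columns as the CSV above

xi = np.arange(1, len(df) + 1, dtype=float)  # 1-based index so x is never 0
popt, pcov = curve_fit(exponenial_func, xi, df['MAPE'], p0=(0.1, 1.0, 0.07))

fig, ax = plt.subplots()
ax.bar([str(s) for s in df['Dataset Size']], df['MAPE'], yerr=df['STD MAPE'])
# The categorical bars sit at positions 0..N-1, so shift the curve by 1 to line up.
xp = np.linspace(1, len(df), 100)
ax.plot(xp - 1, exponenial_func(xp, *popt), color='g')
plt.show()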
I've never tried implementing error bars based on confidence intervals, and since that's what I want to do, I'm unsure how to proceed.
I have a large data array of roughly 1000 elements. Its histogram looks reasonably like a Maxwell-Boltzmann distribution.
Let's say my data is called x, to which I apply the fit as:
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

maxwell = stats.maxwell

## Scale parameter (loc fixed at 0)
params = maxwell.fit(x, floc=0)
print(params)

## Mean: 2*a*sqrt(2/pi) for scale a
mean = 2*params[1]*np.sqrt(2/np.pi)
print(mean)

## Variance (note: the Maxwell variance is a**2*(3*pi - 8)/pi;
## as written, this computes a**(3*pi - 8)/pi instead)
sig = (params[1])**(3*np.pi - 8)/np.pi
print(sig)

>>> (0, 178.17597215151301)
>>> 284.327714571
>>> 512.637498406
which I then plot:

fig = plt.figure(figsize=(7, 7))
ax = fig.add_subplot(111)
xd = np.argsort(x)
ax.plot(x[xd], maxwell.pdf(x, *params)[xd])
ax.hist(x[xd], bins=75, histtype="stepfilled", linewidth=1.5, facecolor='none',
        alpha=0.55, edgecolor='black', density=True)  # density=True replaces the removed normed kwarg
How do you go about implementing confidence intervals with this curve fit? I can use

conf = maxwell.interval(0.90, loc=mean, scale=sig)
>>> (588.40702793225228, 1717.3973740895271)

but I have no clue what to do with this.
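One possible direction (a sketch of my own, not from the original post): interval(0.90, ...) returns the x-range containing 90% of the probability mass of the given distribution, so evaluating it with the fitted loc and scale (rather than the distribution's mean and variance) gives bounds you can mark directly on the plot:

# Assumes x, params, and ax exist as in the code above.
lo, hi = maxwell.interval(0.90, loc=params[0], scale=params[1])
ax.axvspan(lo, hi, color='gray', alpha=0.2, label='90% of fitted distribution')
ax.legend()
plt.show()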