curve_fit not optimizing one of the parameters - python

I need to fit some data that look like the points in the figure with scipy.optimize.curve_fit. I use a function y(x) (see the def below) that is a constant y(x) = c for x < x0 and a polynomial otherwise (e.g., a tilted line y1 = m*x + q).
I give a reasonable initial guess for the parameters (x0, c, m, q), as shown in the figure. The fit result shows that all the parameters are optimized except for the first one, x0.
Why is that?
Is it because of how I define the function testfit(x, *p), where x0 (= p[0]) appears inside another function?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generate some data:
x = np.linspace(0,100,1000)
y1 = np.repeat(0, 500)
y2 = x[500:] - 50
y = np.concatenate((y1,y2))
y = y + np.random.randn(len(y))
def testfit(x, *p):
    ''' piecewise function used to fit
    it's a constant (=p[1]) for x < p[0]
    or a polynomial for x >= p[0]
    '''
    x = x.astype(float)
    y = np.piecewise(x, [x < p[0], x >= p[0]],
                     [p[1], lambda x: np.poly1d(p[2:])(x)])
    return y
# initial guess, one horizontal and one tilted line:
p0_guess = (30, 5, 0.3, -10)
popt, pcov = curve_fit(testfit, x, y, p0=p0_guess)
print('params guessed : '+str(p0_guess))
print('params from fit : '+str(popt))
plt.plot(x,y, '.')
plt.plot(x, testfit(x, *p0_guess), label='initial guess')
plt.plot(x, testfit(x, *popt), label='final fit')
plt.legend()
Output
params guessed : (30, 5, 0.3, -10)
params from fit : [ 30. 0.04970411 0.80106256 -34.17194401]
OptimizeWarning: Covariance of the parameters could not be estimated
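A likely explanation: curve_fit estimates the Jacobian by finite differences, and a tiny perturbation of x0 does not move any sample of the discrete x grid across the breakpoint, so the residuals do not change at all and the derivative with respect to x0 is numerically zero. The optimizer therefore never updates x0, and that flat direction is also why the covariance cannot be estimated. A quick check (a sketch reusing testfit and x from above):
# perturb x0 by a step of the order used by the finite differences
delta = 1e-8
same = np.allclose(testfit(x, 30.0, 5, 0.3, -10),
                   testfit(x, 30.0 + delta, 5, 0.3, -10))
print(same)  # True -> the model is locally flat in x0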

As suggested by kazemakase, I solved the problem with a smooth transition between the two functions I use to fit (one horizontal line followed by a polynomial). The trick was to multiply one function by sigmoid(x) and the other by 1 - sigmoid(x) (sigmoid is defined below). This makes the model differentiable in x0, so the optimizer can move it.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.linspace(0,100,1000)
y1 = np.repeat(0, 500)
y2 = x[500:] - 50
y = np.concatenate((y1,y2))
y = y + np.random.randn(len(y))
def testfit(x, *p):
    ''' function to fit the indentation curve
    p = [x0, c, poly1d_coeffs]
    '''
    x = x.astype(float)
    y = p[1]*(1 - sigmoid(x - p[0], k=1)) + np.poly1d(p[2:])(x) * sigmoid(x - p[0], k=1)
    return y
def sigmoid(x, k=1):
    return 1/(1 + np.exp(-k*x))
p0_guess = (30, 5, 0.3, -10 )
popt, pcov = curve_fit(testfit, x, y, p0=p0_guess)
print('params guessed : '+str(p0_guess))
print('params from fit : '+str(popt))
plt.figure(1)
plt.clf()
plt.plot(x,y, 'y.')
plt.plot(x, testfit(x, *p0_guess), label='initial guess')
plt.plot(x, testfit(x, *popt), 'k', label='final fit')
plt.legend()

I had a similar problem. I ended up using np.gradient and a convolution to smooth the curve, then plotting it. Something like:
def mov_avg(n, data):
    return np.convolve(data, np.ones((n,))/n, mode='valid')
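For instance (a sketch; the 20-point window is an arbitrary choice), the gradient of the question's y data can be smoothed and plotted like this:
smoothed = mov_avg(20, np.gradient(y))  # smooth the gradient of the noisy data
plt.plot(smoothed)
plt.show()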
If you want a more direct approach, you can try this:
def find_change(data):
    def test_flag(pos):
        grad = np.gradient(data) - np.gradient(data).mean()
        return (grad[:pos] < 0).sum() + (grad[pos:] > 0).sum()
    return np.vectorize(test_flag)(np.arange(len(data)-1)).argmax()
def find_gradient(pos, data):
    return np.gradient(data[:pos]).mean(), np.gradient(data[pos:]).mean()
pos = find_change(y)
print(pos, find_gradient(pos, y))
The first function finds the point at which the gradient changes: it compares each point's gradient against the mean gradient and picks the position from which the gradients are "mostly positive".
Hope it helps

Related

Fit a simple S-curve and find the midpoint in python

Let's say I have S-curved shaped data like below :
[figure: S-curved data]
I would like to find the simplest way to fit this kind of curve AND use this fit to find the midpoint (aka the point where y = 0.5). The fact is that I don't know beforehand where the midpoint is.
Thanks a lot for your answers,
Cheers
This is clearly a case of fitting a logistic curve with L=1:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
data = np.loadtxt(r"\data.txt", delimiter=",")
x = data[:, 0]
y = data[:, 1]
def f(x: np.ndarray, k: float, x0: float):
    return 1 / (1 + np.exp(-k*(x - x0)))
popt, pcov = curve_fit(f, x, y, p0 = [1, 120])
fig, ax = plt.subplots(figsize=(8, 5.6))
plt.scatter(x, y)
plt.plot(x, f(x, *popt), color="red")
plt.show()
x0 is given by popt[1], i.e. 121.18. Since f(x0) = 1/(1 + e^0) = 0.5, the fitted x0 is exactly the midpoint where y = 0.5.

Finding point on line that lies a minimum distance from other point not on line Python

I have used scipy.optimize.fmin_cobyla to find the minimum distance between a point and a line on a graph. Is there a way to get fmin_cobyla to return the point on the line (not the distance!) at which that minimum distance occurs? Currently, this is my code:
from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import curve_fit, fmin_cobyla
P = (0.5, 2)
def logifunc(x, A, x0, k):
    return A / (1 + np.exp(-k*(x-x0)))
x, y = ecdf_xy(gene='Cd44', cluster_num=6, group='TREAT')
x[0] = 0
popt, pcov = curve_fit(logifunc, x, y, p0=[min(y), np.median(y), max(y)])
def f(x):
    return logifunc(x, *popt)
def objective(X):
    x, y = X
    return np.sqrt((x - P[0])**2 + (y - P[1])**2)
def c1(X):
    x, y = X
    return f(x) - y
X = fmin_cobyla(objective, x0=[0.5, 0.5], cons=[c1])
print('The minimum distance is {0:1.2f}'.format(objective(X)))
fig, ax = plt.subplots()
plt.scatter(x, logifunc(x, *popt))
plt.plot(x, logifunc(x, *popt))
plt.plot(P[0], P[1], marker='o')
circle = plt.Circle(P, objective(X))
ax.add_patch(circle)
If there isn't a way to use fmin_cobyla to do this, how else might I find the point on the line that lies the minimum distance from P? That is the main goal of my question.
Any other comments or suggestions about my code are of course welcome.
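Note that fmin_cobyla already returns what the question asks for: the optimized vector X is itself the (x, y) point on the curve at which the minimum distance occurs (subject to the constraint), so it can be read off directly. A minimal sketch reusing X from the code above:
x_min, y_min = X
print('closest point on the line: ({0:.3f}, {1:.3f})'.format(x_min, y_min))
plt.plot(x_min, y_min, marker='s')  # mark the closest point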

Gaussian fitted curve showing a tail that does not go back to base-level

Based upon existing topics on Stack Overflow, I have managed to fit a Gaussian curve to my dataset. However, the fitted Gaussian shows one tail that does not go back to base-level (i.e., in the example below, the right tail suddenly stops at a higher y-value than the left tail). This surprises me, as by definition a Gaussian should be a perfectly symmetrical bell-shaped curve. How can I generate a Gaussian curve whose tails are equally long (i.e., stop at the same width measured from the plume center-line) and end at the same base-level (i.e., the same y-value)? The reason I would like this is that in my data a second peak sometimes starts to rise before the first peak has returned to base-level. I would like to separate these peaks by fitting a Gaussian that goes back to base-level, as theoretically each peak should return to its base-level. Thanks a lot in advance!
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
x = np.array([-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0])
y = np.array([1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532])
def gaussian(x, amp, cen, wid):
    return (amp / (np.sqrt(2*np.pi) * wid)) * np.exp(-(x-cen)**2 / (2*wid**2))
def line(x, slope, intercept):
    return slope*x + intercept
peak_index = find_peaks(y,height=27.6)[0][0]
mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian) + Model(line)
pars = mod.make_params(amp=max(y), cen=x[peak_index],
wid=np.sqrt(sum((x-mean)**2 * y)/sum(y)), slope=0, intercept=1)
result = mod.fit(y, pars, x=x)
comps = result.eval_components()
plt.plot(x, y, 'bo')
plt.plot(x, comps['gaussian'], 'k--')
Edit: The following example hopefully illustrates why I am interested in this. I have a long dataset in which the signals of different sources are measured. The dataset is processed such that it generates the arrays x_measured and y_measured, which contain the measured values belonging to one source. My program automatically detects the plume that occurs within the measured values and stores the values of this plume in arrays called x and y. I then perform a Gaussian fit to these x and y arrays.
However, sometimes the measured values show that two plumes overlap, so there is no measured plume rising from and returning to base-level. An example is given in the code below. For these measured values my program now gives a Gaussian fit whose right tail goes to around y = 0, but whose left tail stops around y = 4.5. I would like the left tail to also go back to around y = 0, because theoretically I know that each plume should start from and return to the same base-level, and I want to compute the plume-width of such a Gaussian plume. For the example below the left tail does not return to around y = 0, hence I cannot determine the width of the plume. I would like a Gaussian fit of which both tails go back to the same base-level of y = 0, so that I can determine the width of the plume.
x_measured = np.arange(-20,245,3)
y_measured = np.array([38.7586,38.2323,37.2958,35.9924,34.4196,32.7123,31.0257,29.5169,28.3244,27.5502,27.2458,27.4078,27.9815,28.8728,29.9643,31.1313,32.2545,33.2276,33.9594,34.373,34.4041,34.0009,33.1267,31.7649,29.9247,27.6458,24.9992,22.0845,19.0215,15.9397,12.966,10.2127,7.76834,5.69046,4.00296,2.69719,1.73733,1.06907,0.629744,0.358021,0.201123,0.11878,0.0839719,0.0813392,0.104295,0.151634,0.224209,0.321912,0.441478,0.575581,0.713504,0.843351,0.954777,1.04109,1.09974,1.13118,1.13683,1.11758,1.07369,1.0059,0.917066,0.81321,0.703288,0.597775,0.506678,0.437843,0.396256,0.384633,0.405147,0.461496,0.560387,0.71144,0.925262,1.21022,1.56925,1.99788,2.48458,3.01314,3.56626,4.12898,4.69031,5.24283,5.78014,6.29365,6.77004,7.19071,7.53399,7.78019,7.91889])
x = np.arange(10,104,3)
y = np.array([22.4548,23.4302,25.3389,27.9929,30.486,32.0528,33.5527,35.1304,35.9941,36.8606,37.1889,37.723,36.4069,35.9751,33.8824,31.0909,27.4247,23.3213,18.8772,14.3363,11.1075,7.68792,4.54899,2.2057,0,0,0,0,0,0,0.179834,0])
def gaussian(x, amp, cen, wid):
    return (amp / (np.sqrt(2*np.pi) * wid)) * np.exp(-(x-cen)**2 / (2*wid**2))
def line(x, slope, intercept):
    return slope*x + intercept
peak_index = find_peaks(y,height=27.6)[0][0]
mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian) + Model(line)
pars = mod.make_params(amp=max(y), cen=x[peak_index],
wid=np.sqrt(sum((x-mean)**2 * y)/sum(y)), slope=0, intercept=1)
result = mod.fit(y, pars, x=x)
comps = result.eval_components()
plt.plot(x, y, 'bo')
plt.plot(x, comps['gaussian'], 'k--')
plt.plot(x_measured,y_measured)
It is unclear why you expect a bimodal fit with the model you defined. Use two different Gaussian functions for your fit, then evaluate the fitted functions for a longer interval x_fit to see the curves returning to baseline:
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
x = np.array([-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0])
y = np.array([1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532])
def gaussian1(x, amp1, cen1, wid1):
    return (amp1 / (np.sqrt(2*np.pi) * wid1)) * np.exp(-(x-cen1)**2 / (2*wid1**2))
def gaussian2(x, amp2, cen2, wid2):
    return (amp2 / (np.sqrt(2*np.pi) * wid2)) * np.exp(-(x-cen2)**2 / (2*wid2**2))
#peak_index = find_peaks(y,height=27.6)[0][0]
#mean = sum(x*y)/np.sum(y) #weighted arithmetic mean
mod = Model(gaussian1) + Model(gaussian2)
#I just filled in some start values, the details of educated guesses can be filled in later by you
pars = mod.make_params(amp1=30, amp2=40, cen1=20, cen2=40, wid1=2, wid2=2)
result = mod.fit(y, pars, x=x)
print(result.params)
x_fit=np.linspace(-30, 120, 500)
comps_elem = result.eval_components(x=x_fit)
comps_comb = result.eval(x=x_fit)
plt.plot(x, y, 'bo')
plt.plot(x_fit, comps_comb, 'k')
plt.plot(x_fit, comps_elem['gaussian1'], 'k-.')
plt.plot(x_fit, comps_elem['gaussian2'], 'k--')
plt.show()
Sample output: [figure: data with the combined fit and both Gaussian components]
The corresponding scipy.optimize.curve_fit version would look like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
x = [-20.0,-17.0,-14.0,-11.0,-8.0,-5.0,-2.0,1.0,4.0,7.0,10.0,13.0,16.0,19.0,22.0,25.0,28.0,31.0,34.0,37.0,40.0,43.0,46.0,49.0,52.0,55.0,58.0,61.0,64.0,67.0,70.0,73.0,76.0,79.0,82.0]
y = [1.90269,1.93535,2.62402,3.08949,2.82409,3.07588,3.22015,3.18884,5.14053,10.5111,18.6118,28.6343,37.7625,46.3641,53.9163,60.7622,66.5765,71.0596,74.4948,77.7177,80.373,82.5833,83.9021,83.4652,79.0229,71.4679,61.93,52.113,43.8517,36.211,29.3815,23.8966,19.31,15.5209,12.4532]
def gauss(x, mu, sigma, A):
    return A*np.exp(-(x-mu)**2/2/sigma**2)
def bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2)
expected = (20, 2, 30, 40, 2, 40)
params, cov = curve_fit(bimodal, x, y, expected)
sigma=np.sqrt(np.diag(cov))
x_fit = np.linspace(-20, 120, 500)
plt.plot(x_fit, bimodal(x_fit, *params), color='red', lw=3, label='model')
plt.plot(x_fit, gauss(x_fit, *params[:3]), color='red', lw=1, ls="--", label='distribution 1')
plt.plot(x_fit, gauss(x_fit, *params[3:]), color='red', lw=1, ls=":", label='distribution 2')
plt.scatter(x, y, marker="X", color="black", label="original data")
plt.legend()
print(pd.DataFrame(data={'params': params, 'sigma': sigma}, index=bimodal.__code__.co_varnames[1:]))
plt.show()
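Since the question ultimately wants the plume width: for a Gaussian with standard deviation sigma, the full width at half maximum is FWHM = 2*sqrt(2*ln(2))*sigma, about 2.355*sigma. A minimal sketch using the params array from the curve_fit version above (parameter order mu, sigma, A as in gauss):
fwhm_factor = 2 * np.sqrt(2 * np.log(2))  # ~2.3548
mu1, sigma1, A1, mu2, sigma2, A2 = params
print('peak 1: center = {0:.2f}, FWHM = {1:.2f}'.format(mu1, fwhm_factor * abs(sigma1)))
print('peak 2: center = {0:.2f}, FWHM = {1:.2f}'.format(mu2, fwhm_factor * abs(sigma2)))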

Sigmoidal curve fit, how to get the value of x when y=0.5

I want to fit the following function and, after fitting, get the value of x when y = 0.5.
The function:
import numpy as np
from scipy.optimize import curve_fit
def sigmoid(x, b, c):
    y = 1 / (1 + c*np.exp(-b*x))
    return y
x_data = [4, 6, 8, 10]
y_data = [0.86, 0.73, 0.53, 0.3]
popt, pcov = curve_fit(sigmoid, x_data, y_data,(28.14,-0.25))
Please explain how you would carry this out using Python!
Thanks!
When I run your code I get a warning, and popt is the same as your initial guess, (28.14, -0.25). If you try plotting this you'll see that it's essentially a straight line at y == 1 that doesn't fit your data well at all:
from matplotlib import pyplot as plt
x = np.linspace(4, 10, 1000)
y = sigmoid(x, *popt)
fig, ax = plt.subplots(1, 1)
ax.scatter(x_data, y_data, s=50, zorder=20)
ax.plot(x, y, '-k', lw=2)
The problem is the initial guess. With b = 28.14, the exponent -b*x is a large negative number for every x in your data, so exp(-b*x) is essentially zero and the model is a flat line at y == 1 no matter what c is; the fit has no gradient to follow and popt never moves off the guess. Since your data decrease with x, you instead want a small negative b and a small positive c:
popt2, pcov2 = curve_fit(sigmoid, x_data, y_data, (-0.5, 0.1))
y2 = sigmoid(x, *popt2)
ax.plot(x, y2, '-r', lw=2)
To get the value of x at y == 0.5 using nonlinear optimization you need to define an objective function, which could be the square of the difference between 0.5 and sigmoid(x, b, c):
def objective(x, b, c):
    return (0.5 - sigmoid(x, b, c)) ** 2
You can then use scipy.optimize.minimize or scipy.optimize.minimize_scalar to find the value of x that minimizes the objective function:
from scipy.optimize import minimize_scalar
res = minimize_scalar(objective, bracket=(4, 10), args=tuple(popt2))
ax.annotate("$y = 0.5$", (res.x, 0.5), (30, 30), textcoords='offset points',
arrowprops=dict(facecolor='black', shrink=0.05), fontsize='x-large')
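The crossing point itself is then res.x; a quick check:
print('y = 0.5 at x = {0:.2f}'.format(res.x))
print(sigmoid(res.x, *popt2))  # should be close to 0.5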

Matplotlib regression scattered plot using Python?

My Question
I tried regression using the curve_fit function from the scipy module, and I am getting a scattered kind of plot. I can't figure out why I am getting a scattered plot here.
My Code
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def scipyFunction(x, y):
    plt.plot(x, y, 'o', label='Original data', markersize=10)
    x = [float(xn) for xn in x]  # every element (xn) in x becomes a float
    y = [float(yn) for yn in y]  # every element (yn) in y becomes a float
    x = np.array(x)  # transform data into numpy array
    y = np.array(y)  # transform data into numpy array
    def functionForScipy(x, a, b, c, d):
        return a*x**3 + b*x**2 + c*x + d
    # make the curve_fit
    popt, pcov = curve_fit(functionForScipy, x, y)
    '''
    The result is:
    popt[0] = a, popt[1] = b, popt[2] = c, popt[3] = d of the function,
    so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
    '''
    print(popt)
    plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")  # same as the line above
    plt.legend(loc='upper left')
    plt.show()
The plot of x and y looks like this: [figure]
I suspect this occurs because the values in your x array are not monotonically increasing (that is to say, each subsequent value being larger than the last).
You need to sort your x values before you plot them otherwise they will be all over the place, as shown in the example below.
import numpy as np
import matplotlib.pyplot as plt
def func(x):
    return x**2
x = np.array([0, 5, 2, 1, 3, 4])
y = func(x)
plt.plot(x, y, 'b-', label='Unsorted')
x.sort()
y = func(x)
plt.plot(x, y, 'r-', label='Sorted')
plt.legend()
plt.show()
Probably your x and y are not sorted.
Don't forget to apply the same sorting you do on x to y as well. To achieve this, zip is very handy. You could add the following at the beginning of your function:
comb = sorted(zip(x, y), key=lambda pair: pair[0])  # sort according to x
x, y = zip(*comb)  # both x and y are now sorted according to x
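Equivalently, once x and y are NumPy arrays (as in the question's code), np.argsort does the same in one step; a short sketch:
order = np.argsort(x)
x, y = x[order], y[order]  # reorder both arrays by ascending x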
