I have to make a fit using curve_fit. My problem is that, instead of a continuous curve, I obtain a broken line, as shown in the figure. Here is my code:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
N=np.array([66851,200522,401272,801832,1200951])
e=np.array([2.88,1.75,1.17,0.80,0.71])
def er_func(x,A,c):
    return A/np.sqrt(x)+c
popt, pcov=curve_fit(er_func,N,e,p0=[10,1000])
plt.plot(N,er_func(N,*popt),"b")
plt.plot(N,e,"xr")
plt.xlabel("Number of events")
plt.ylabel("Error [Chn]")
[https://i.stack.imgur.com/BZtnN.png][1]
I think this happens because I'm plotting the fit function evaluated only at my data points, so matplotlib connects the five points with straight lines. How can I obtain a correct fit?
Thanks for any help you can provide.
I am only showing the relevant part of the code. You need to define a fine mesh (N_mesh below) on which to plot the continuous fit curve. The added or modified lines are marked with a comment.
N=np.array([66851,200522,401272,801832,1200951])
N_mesh = np.linspace(N[0], N[-1], 100) # Added (A mesh of 100 x-points)
e=np.array([2.88,1.75,1.17,0.80,0.71])
def er_func(x,A,c):
    return A/np.sqrt(x)+c
from scipy.optimize import curve_fit
popt, pcov=curve_fit(er_func,N,e,p0=[10,1000])
plt.plot(N_mesh,er_func(N_mesh,*popt),"b", label='Fit') # Modified
plt.plot(N,e,"xr", label='Actual data') # Modified
plt.legend(fontsize=14) # Added
Output
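As a complement, here is a minimal self-contained sketch of the same idea that also reads one-sigma parameter uncertainties off the covariance matrix returned by curve_fit. The names perr and fit_curve and the 200-point mesh are illustrative choices, not part of the original code; the initial guess p0 is kept from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

N = np.array([66851, 200522, 401272, 801832, 1200951])
e = np.array([2.88, 1.75, 1.17, 0.80, 0.71])

def er_func(x, A, c):
    # Model: error falls off as 1/sqrt(N) plus a constant offset
    return A / np.sqrt(x) + c

popt, pcov = curve_fit(er_func, N, e, p0=[10, 1000])
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties on A and c

# Evaluate the fitted model on a dense mesh so the plotted curve looks smooth
N_mesh = np.linspace(N.min(), N.max(), 200)
fit_curve = er_func(N_mesh, *popt)

fig, ax = plt.subplots()
ax.plot(N_mesh, fit_curve, "b", label="Fit")
ax.plot(N, e, "xr", label="Actual data")
ax.set_xlabel("Number of events")
ax.set_ylabel("Error [Chn]")
ax.legend()
# Call plt.show() (or fig.savefig(...)) to display or save the figure
```

The fit itself still uses only the five measured points; the mesh only affects where the fitted function is evaluated for plotting.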
So I have this task where I'm supposed to interpolate a function with polynomials. The entire interval is divided into N subintervals, and the polynomial interpolating in each subinterval is of order k. I generate all my interpolating points, but I am running into two problems.
I) For k=1, i.e. first-order polynomials, I've tried solving the task by having a loop generate a first-order polynomial in each subinterval using scipy's interp1d, but I'd like to get all the different polynomials in a single plot.
This is my code. I tried to include only the necessary bits, sorry if something is missing. intpoint holds the interpolation points, and funky(x) is just the arbitrary function I'm approximating.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as sc
intpoint=np.array([-3,-2,-1,0,1,2,3])
for i in range(len(intpoint)):
    intleng=[intpoint[i],intpoint[i+1]]
    myinterval=np.linspace(intpoint[i],intpoint[i+1],1000)
    mypol=sc.interp1d(intleng,np.sin(intleng),1)
    plt.plot(intleng, mypol(intleng))
    plt.plot(myinterval,np.sin(myinterval))
    plt.show()
Apologies in advance if anything is unclear, or my code is hard to follow/untidy.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as sc
intpoint=np.array([-3,-2,-1,0,1,2,3])
for i in range(len(intpoint)-1):
    intleng=[intpoint[i],intpoint[i+1]]
    myinterval=np.linspace(intpoint[i],intpoint[i+1],1000)
    mypol=sc.interp1d(intleng,np.sin(intleng),1)
    plt.plot(myinterval,mypol(myinterval))
    plt.plot(myinterval,np.sin(myinterval))
plt.show()
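I think this is what you want. There were mistakes in the plotting: the loop has to stop at len(intpoint)-1 so that intpoint[i+1] stays in range, the interpolant should be evaluated on myinterval rather than only at the two endpoints, and plt.show() should be called only once, after the loop, to get a single plot.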
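For the k=1 case, the per-interval loop can also be avoided, because a piecewise linear interpolant through all nodes can be built in one call. The following is only a sketch using np.interp instead of interp1d; the names nodes, x_fine and max_error are illustrative and not from the original post.

```python
import numpy as np

nodes = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)
x_fine = np.linspace(nodes[0], nodes[-1], 601)

# np.interp builds the piecewise linear interpolant through (nodes, sin(nodes))
y_linear = np.interp(x_fine, nodes, np.sin(nodes))

# Largest deviation from the true function on the fine grid
max_error = np.max(np.abs(y_linear - np.sin(x_fine)))
print(f"max interpolation error: {max_error:.4f}")
```

Plotting x_fine against y_linear then gives all the linear pieces as a single curve.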
I am trying to construct a function which gives me interpolated values of a piecewise linear function. I tried linear spline interpolation (which should be able to do exactly this?), but without any luck. The problem is most visible on a log-scale plot. Below is the code of a small example I prepared:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
from scipy import interpolate
#Original Data
pwl_data = np.array([[0,1e3, 1e5, 1e8], [-90,-90, -90, -130]])
#spline interpolation
pwl_spline = interpolate.splrep(pwl_data[0], pwl_data[1])
spline_x = np.linspace (0,1e8, 10000)
legend = []
plt.plot(pwl_data[0],pwl_data[1])
plt.plot(spline_x,interpolate.splev(spline_x,pwl_spline ),'*')
legend.append("Data")
legend.append("Interpolated Data")
plt.xscale('log')
plt.legend(legend)
plt.grid(True)
plt.grid(True, which='minor', linestyle='--')
plt.show()
What am I doing wrong?
The spline fitting has to be performed on the linearized data, i.e. using log(x) instead of x:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
#Original Data
pwl_data = np.array([[1, 1e3, 1e5, 1e8], [-90, -90, -90, -130]])
x = pwl_data[0]
y = pwl_data[1]
log_x = np.log(x)
#spline interpolation
pwl_spline = interpolate.splrep(log_x, y)
spline_log_x = np.linspace(0, 18, 30)
spline_y = interpolate.splev(spline_log_x, pwl_spline )
plt.plot(log_x, y, '-o')
plt.plot(spline_log_x, spline_y, '-*')
plt.xlabel('log(x)');
Note: I removed the zero from the data, since log(0) is undefined. Also, a spline fit may not be the best choice if you want a piecewise linear function; you could have a look at this question, for example: https://datascience.stackexchange.com/q/8457/53362
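If a strictly piecewise linear function (in log(x)) is what is needed, one possible sketch is to interpolate linearly between the breakpoints instead of fitting a cubic spline. The helper pwl and the variable names below are hypothetical, and base-10 logs are used here purely for readability.

```python
import numpy as np
from scipy import interpolate

x = np.array([1, 1e3, 1e5, 1e8])
y = np.array([-90.0, -90.0, -90.0, -130.0])
log_x = np.log10(x)

def pwl(x_new):
    # Linear interpolation between the breakpoints, done in log10(x)
    return np.interp(np.log10(x_new), log_x, y)

# Same idea with SciPy: a degree-1 interpolating spline (k=1, s=0)
tck = interpolate.splrep(log_x, y, k=1, s=0)

x_new = np.logspace(0, 8, 50)
y_interp = pwl(x_new)
y_spline = interpolate.splev(np.log10(x_new), tck)
```

Both variants pass exactly through the data points and stay flat at -90 between 1 and 1e5, which a cubic spline through only four points is not guaranteed to do.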
For plotting with matplotlib, consider matplotlib's step function, which internally performs a piecewise constant interpolation:
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.step.html
Given your inputs x and y, you can invoke it simply via:
plt.step(x,y)
In plotly, the argument line_shape='hv' for the Scatter plot achieves similar results; see https://plotly.com/python/line-charts/
I have some data that I've plotted on a log-log plot and now I want to fit a straight line through these points. I have tried various methods and can't get what I'm after. Example code:
import numpy as np
import matplotlib.pyplot as plt
import random
x= np.linspace(1,100,10)
y = np.log10(x)+np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x),np.log10(y),1)
polynomial=np.poly1d(coefficients)
y_fit = polynomial(y)
plt.plot(x,y,'o')
plt.plot(x,y_fit,'-')
plt.yscale('log')
plt.xscale('log')
This gives me an ideal 'straight' line in log-log space, offset by a random number, to which I then fit a first-degree polynomial. The output is:
Ignoring the offset, which I can deal with, this is not quite what I require: it has basically plotted a straight line between each pair of points and joined them up, whereas I need a 'line of best fit' through the middle of them all so that I can measure its gradient.
What is the best way to achieve this?
One problem is
y_fit = polynomial(y)
You must plug in the x values, not y, to get y_fit.
Also, you fit log10(y) against log10(x), so to evaluate the fitted polynomial you must plug in log10(x), and the result will be the base-10 log of the y values.
Here's a modified version of your script, followed by the plot it generates.
import numpy as np
import matplotlib.pyplot as plt
import random
x = np.linspace(1,100,10)
y = np.log10(x) + np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x), np.log10(y), 1)
polynomial = np.poly1d(coefficients)
log10_y_fit = polynomial(np.log10(x)) # <-- Changed
plt.plot(x, y, 'o-')
plt.plot(x, 10**log10_y_fit, '*-') # <-- Changed
plt.yscale('log')
plt.xscale('log')
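To read the gradient off directly, the fit can also be run on data that is a straight line in log-log space by construction. This is only a sketch: the power-law parameters a_true and b_true, the noise level and the seed are made up for illustration. Note that in the script above y can contain zero or negative values (log10(1) is 0 and the random offset can be negative), in which case log10(y) is undefined, so the sketch uses strictly positive data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic power law y = a * x**b with multiplicative noise (illustrative values)
a_true, b_true = 3.0, 1.5
x = np.geomspace(1, 100, 20)
y = a_true * x**b_true * np.exp(rng.normal(0.0, 0.05, x.size))

# A straight line in log-log space: log10(y) = slope * log10(x) + intercept
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)

# Evaluate the fit at log10(x), then undo the log to get back to y units
y_fit = 10 ** (slope * np.log10(x) + intercept)

print(f"gradient = {slope:.3f}, prefactor = {10**intercept:.3f}")
```

Here slope is the gradient of the best-fit line on the log-log plot, and 10**intercept is the multiplicative prefactor of the power law.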
How would I calculate the confidence intervals for a LOWESS regression in Python? I would like to add these as a shaded region to the LOESS plot created with the following code (other packages than statsmodels are fine as well).
import numpy as np
import pylab as plt
import statsmodels.api as sm
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2
lowess = sm.nonparametric.lowess(y, x, frac=0.1)
plt.plot(x, y, '+')
plt.plot(lowess[:, 0], lowess[:, 1])
plt.show()
I've added an example plot with a confidence interval below, taken from the weblog Serious Stats (it was created using ggplot in R).
LOESS doesn't have an explicit concept of standard error. It just doesn't mean anything in this context. Since that's out, you're stuck with the brute-force approach.
Bootstrap your data. You're going to fit a LOESS curve to each bootstrapped sample. See the middle of this page for a pretty picture of what you're doing: http://statweb.stanford.edu/~susan/courses/s208/node20.html
Once you have a large number of different LOESS curves, you can find the top and bottom Xth percentiles.
This is a very old question but it's one of the first that pops up on google search. You can do this using the loess() function from scikit-misc. Here's an example (I tried to keep your original variable names, but I bumped up the noise a bit to make it more visible)
import numpy as np
import pylab as plt
from skmisc.loess import loess
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.4
l = loess(x,y)
l.fit()
pred = l.predict(x, stderror=True)
conf = pred.confidence()
lowess = pred.values
ll = conf.lower
ul = conf.upper
plt.plot(x, y, '+')
plt.plot(x, lowess)
plt.fill_between(x,ll,ul,alpha=.33)
plt.show()
result:
For a project of mine, I needed to create intervals for time-series modeling, and to make the procedure more efficient I created tsmoothie: a Python library for time-series smoothing and outlier detection in a vectorized way.
It provides different smoothing algorithms together with the possibility to compute intervals.
In the case of LowessSmoother:
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.smoother import *
from tsmoothie.utils_func import sim_randomwalk
# generate 10 randomwalks of length 200
np.random.seed(33)
data = sim_randomwalk(n_series=10, timesteps=200,
                      process_noise=10, measure_noise=30)
# operate smoothing
smoother = LowessSmoother(smooth_fraction=0.1, iterations=1)
smoother.smooth(data)
# generate intervals
low, up = smoother.get_intervals('prediction_interval', confidence=0.05)
# plot the first smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.plot(smoother.data[0], '.k')
plt.fill_between(range(len(smoother.data[0])), low[0], up[0], alpha=0.3)
I'll also point out that tsmoothie can smooth multiple time series in a vectorized way. I hope this helps someone.
I've fitted a frechet distribution in R and would like to use this in a python script. However inputting the same distribution parameters in scipy.stats.frechet_r gives me a very different curve. Is this a mistake in my implementation or a fault in scipy ?
R distribution:
vs Scipy distribution:
R frechet parameters: loc=17.440, shape=0.198, scale=8.153
python code:
from scipy.stats import frechet_r
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
F=frechet_r(loc=17.440 ,scale= 8.153, c= 0.198)
x=np.arange(0.01,120,0.01)
ax.plot(x, F.pdf(x), 'k-', lw=2)
plt.show()
edit - relevant documentation.
The Frechet parameters were calculated in R using the fgev function in the 'evd' package http://cran.r-project.org/web/packages/evd/evd.pdf (page 40)
Link to the scipy documentation:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.frechet_r.html#scipy.stats.frechet_r
I haven't used the frechet_r function from scipy.stats (when I quickly tested it I got the same plot as you), but you can get the required behaviour from genextreme in scipy.stats. It is worth noting that for genextreme the Frechet and Weibull shape parameters have the 'opposite' sign to the usual convention. That is, in your case you would need to use a shape parameter of -0.198:
from scipy.stats import genextreme as gev
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
x=np.arange(0.01,120,0.01)
# The order for this is array, shape, loc, scale
F=gev.pdf(x,-0.198,loc=17.44,scale=8.153)
plt.plot(x,F,'g',lw=2)
plt.show()
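As a cross-check, the same density can be written as a three-parameter Frechet distribution using scipy.stats.invweibull. The parameter mapping below follows from equating the two CDFs and is a sketch, not something taken from the original answer. As far as I know, scipy.stats.frechet_r was just another name for weibull_min (and has since been deprecated and removed from SciPy), which would explain why it produced a very different curve for these parameters.

```python
import numpy as np
from scipy.stats import genextreme, invweibull

# Parameters from R's evd::fgev (xi > 0 is the Frechet-type case)
loc, scale, xi = 17.440, 8.153, 0.198
x = np.arange(0.01, 120, 0.01)

# SciPy's genextreme uses c = -xi
pdf_gev = genextreme.pdf(x, -xi, loc=loc, scale=scale)

# Equivalent three-parameter Frechet via invweibull:
# shape 1/xi, scale sigma/xi, location mu - sigma/xi
pdf_frechet = invweibull.pdf(x, 1 / xi, loc=loc - scale / xi, scale=scale / xi)

print(np.allclose(pdf_gev, pdf_frechet))
```

Either pdf_gev or pdf_frechet can be passed to plt.plot(x, ...) in place of F above.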