Chi square numpy.polyfit (numpy)

Could someone explain how to get Chi^2/doF using numpy.polyfit?

Assume you have some data points
x = numpy.array([0.0, 1.0, 2.0, 3.0])
y = numpy.array([3.6, 1.3, 0.2, 0.9])
To fit a parabola to those points, use numpy.polyfit():
p = numpy.polyfit(x, y, 2)
To get the chi-squared value for this fit, evaluate the polynomial at the x values of your data points, subtract the y values, square and sum:
chi_squared = numpy.sum((numpy.polyval(p, x) - y) ** 2)
You can divide this number by the number of degrees of freedom if you like.

NumPy's polyfit has supported a full parameter since at least release 1.3. If it is set to True, polyfit returns a few more values, including the sum of the squared residuals, which is chi-squared (not normalized by the degrees of freedom).
So a simple example would be
p, residuals, _, _, _ = numpy.polyfit(x, y, 2, full=True)
chisq_dof = residuals / (len(x) - 3)
I have not tried this myself with weights, but I assume polyfit does the right thing here (since numpy 1.7, polyfit accepts a parameter w to provide weights for the fit).
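As a minimal sketch of the weighted case: polyfit's w multiplies the unsquared residuals, so passing w = 1/sigma should make the returned residual sum equal to the usual weighted chi-squared. The per-point uncertainties sigma below are made-up values for illustration; they are not part of the question.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([3.6, 1.3, 0.2, 0.9])
sigma = np.array([0.2, 0.1, 0.1, 0.3])  # hypothetical 1-sigma errors on y

deg = 2
# w multiplies (y - fit) before squaring, hence w = 1/sigma rather than 1/sigma**2
p, residuals, rank, sv, rcond = np.polyfit(x, y, deg, w=1.0 / sigma, full=True)

# The same quantity computed by hand
chi_squared = np.sum(((np.polyval(p, x) - y) / sigma) ** 2)

dof = len(x) - (deg + 1)          # data points minus fitted coefficients
chisq_dof = chi_squared / dof
print(chi_squared, residuals[0], chisq_dof)
```

Note that residuals comes back as an empty array when the fit is exactly determined (as many coefficients as points), so the hand computation is the safer route in general.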

Related

Creating a best fit probability distribution from pdf sample coordinates with scipy

Problem: I have data points indicating coordinates sampled from a probability distribution (in this case we will assume a discrete probability distribution function). We are essentially forming a 'best fit of a pdf' from pdf data here.
Given: sample coordinates of a pdf and the type of pdf to fit to it (e.g. lognorm)
Return: Ideally the pdf parameters, or, alternatively the coordinates of the best fit distribution.
I have not found a question on stackoverflow with this question/answer and I understand it may be poor practice. It seems that scipy explicitly likes the original data to build the pdf parameters from, not sample coordinates from a pdf.
I have vectors whereby:
x = list(range(40))
y = [0.032935611986072325,
0.15399668951796566,
0.19217568076280733,
0.16189644686218774,
0.11504756998080325,
0.09474568682103104,
0.08971162676825704,
0.06198299715985481,
0.04408241680044377,
0.026817519111333753,
0.013562814925870696,
0.007007365243147507,
0.003909173588759217,
0.0015053452905258473,
0.00037481359597322736,
0.0001378624720821066,
5.734365756863486e-05,
2.9711739672867803e-05,
8.022169711674307e-06,
5.942347934573561e-06,
2.228380475465085e-06,
3.7139674591084754e-06,
8.913521901860341e-07,
8.913521901860341e-07,
5.94234793457356e-07,
2.97117396728678e-07,
2.97117396728678e-07,
2.97117396728678e-07,
1.48558698364339e-07,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0]

Calling your PDF f(x):
If your data really represents {x, f(x)} then you could try simply optimizing for the parameters of f using e.g. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq
If your data on the other hand are samples from the probability distribution, i.e. your data looks like {x} but each x is chosen with probability f(x), then you should try Markov Chain Monte Carlo to estimate f. There are several choices for Python:
https://pystan.readthedocs.io/en/latest/
http://docs.pymc.io/notebooks/getting_started.html#Model-fitting
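The first suggestion above (optimizing the parameters of f directly against {x, f(x)} pairs) can be sketched as follows. This is only an illustration under assumptions: it uses scipy.optimize.curve_fit instead of the linked leastsq, a two-parameter lognormal with loc fixed at 0, and synthetic noise-free data rather than the vectors from the question. The names lognorm_pdf, x_demo and y_demo are made up for the example.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import lognorm

def lognorm_pdf(x, s, scale):
    # Two-parameter lognormal pdf; loc is fixed at 0 (an assumption of this sketch)
    return lognorm.pdf(x, s, loc=0.0, scale=scale)

# Synthetic {x, f(x)} pairs generated from known parameters
x_demo = np.linspace(0.5, 20.0, 40)
true_s, true_scale = 0.5, 4.0
y_demo = lognorm_pdf(x_demo, true_s, true_scale)

# Bounds keep both parameters positive so the pdf stays defined during the search
popt_demo, pcov_demo = curve_fit(
    lognorm_pdf, x_demo, y_demo,
    p0=[0.8, 3.0],
    bounds=([1e-6, 1e-6], [np.inf, np.inf]),
)
print(popt_demo)
```

With real data the starting guess p0 matters, and a free loc parameter can be added to the model function if the distribution is shifted.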

I think your data represents a pdf {x, y = pdf(x)} since sum(y) = 1.
When we plot your data with a slight correction x = list(range(39)) we get a curve similar to a lognormal (?).
import matplotlib.pyplot as plt
x = list(range(39))
plt.plot(x, y)
One trick you could use to avoid optimisation algorithms is to transform your data into a sample since each y[i] is proportional to the frequency of x[i] . In other words, if you want a 'perfect' sample S of size N, each x[i] will appear N * y[i] times.
import numpy as np
N = 20000  # sample size (writing 20.000 would give the float 20.0 in Python)
n_times = [int(y[i] * N) for i in range(len(y))]
S = np.repeat(x, n_times)
All that remains to be done is to fit a LogNormal distribution to S. Personally, I am used to OpenTURNS library. You just need to format S as an ot.Sample by reshaping into N points of dimension 1
import openturns as ot
sample = ot.Sample([[p] for p in S])
fitdist = ot.LogNormalFactory().build(sample)
fitdist is an "ot.Distribution", you can print to see its parameters
print(fitdist)
>>> LogNormal(muLog = 1.62208, sigmaLog = 0.45679, gamma = -1.79583)
or plot both curves using fitdist.computePDF built-in method which takes as argument ot.Sample format
plt.plot(x, y)
plt.plot(x, fitdist.computePDF(ot.Sample([[p] for p in x])))

numpy.polyfit gives useful fit, but infinite covariance matrix

I am trying to fit a polynomial to a set of data. Sometimes the covariance matrix returned by numpy.polyfit consists only of inf, although the fit itself seems to be useful. There are no numpy.inf or numpy.nan values in the data!
Example:
import numpy as np
# sample data, does not contain really x**2-like behaviour,
# but that should be visible in the fit results
x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]
fit, cov = np.polyfit(x, y, 2, cov=True)
print('fit: ', fit)
print('cov: ', cov)
Result:
fit: [ 1.67867158e-06 5.69199547e-04 8.85146009e-01]
cov: [[ inf inf inf]
[ inf inf inf]
[ inf inf inf]]
np.cov(x,y) gives
[[ 6.25000000e+01 -6.07388099e-02]
[ -6.07388099e-02 5.92268942e-05]]
So np.cov is not the same as the covariance returned from np.polyfit. Has anybody an idea what's going on?
EDIT:
I now got the point that numpy.cov is not what I want. I need the variances of the polynomial coefficients, but I don't get them if (len(x) - order - 2.0) == 0. Is there another way to get the variances of the fitted polynomial coefficients?

As rustil's answer says, this is caused by the bias correction applied to the denominator of the covariance equation, which results in a zero divide for this input. The reasoning behind this correction is similar to that behind Bessel's Correction. This is really a sign that there are too few datapoints to estimate covariance in a well-defined way.
How to skirt this problem? Well, this version of polyfit accepts weights. You could add another datapoint but weight it at epsilon. This is equivalent to reducing the 2.0 in this formula to a 1.0.
import sys
import numpy as np
x = [-449., -454., -459., -464., -469.]
y = [ 0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]
# duplicate the last point and give the copy a negligible weight
x_extra = x + x[-1:]
y_extra = y + y[-1:]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, sys.float_info.epsilon]
fit, cov = np.polyfit(x, y, 2, cov=True)
fit_extra, cov_extra = np.polyfit(x_extra, y_extra, 2, w=weights, cov=True)
print(fit == fit_extra)
print(cov_extra)
The output. Note that the fit values are identical:
>>> print(fit == fit_extra)
[ True True True]
>>> print(cov_extra)
[[ 8.84481850e-11 8.11954338e-08 1.86299297e-05]
[ 8.11954338e-08 7.45405039e-05 1.71036963e-02]
[ 1.86299297e-05 1.71036963e-02 3.92469307e+00]]
I am very uncertain that this will be especially meaningful, but it's a way to work around the problem. It's a bit of a kludge though. For something more robust, you could modify polyfit to accept its own ddof parameter, perhaps in lieu of the boolean that cov currently accepts. (I just opened an issue to suggest as much.)
A quick final note about the calculation of cov: If you look at the wikipedia page on least squares regression, you'll see that the simplified formula for the covariance of the coefficients is inv(dot(dot(X.T, W), X)), which has a corresponding line in the numpy code -- at least roughly speaking. In this case, X is the Vandermonde matrix, and the weights have already been multiplied in. The numpy code also does some scaling (which I understand; it's part of a strategy to minimize numerical error) and multiplies the result by the norm of the residuals (which I don't understand; I can only guess that it's part of another version of the covariance formula).
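The covariance formula described above can be sketched by hand. This is an illustration under assumptions, not a copy of the numpy source: the x values are centered first (my choice, to keep the Vandermonde matrix well conditioned, which means the coefficients refer to powers of x - mean(x)), the fit is unweighted, and the residual sum of squares is divided by len(x) - (deg + 1). The numpy version discussed here used len(x) - order - 2.0 instead, and as far as I know more recent releases use the smaller correction shown below.

```python
import numpy as np

x = np.array([-449., -454., -459., -464., -469.])
y = np.array([0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293])
deg = 2

# Center x for numerical conditioning (assumption of this sketch)
x_c = x - x.mean()
coeffs = np.polyfit(x_c, y, deg)

# X is the Vandermonde matrix; with unit weights the formula is inv(X.T @ X)
V = np.vander(x_c, deg + 1)
resid_ss = np.sum((V @ coeffs - y) ** 2)

# Scale by the residual variance estimate; the denominator is the assumption noted above
dof = len(x) - (deg + 1)
cov_manual = np.linalg.inv(V.T @ V) * resid_ss / dof

coeff_std = np.sqrt(np.diag(cov_manual))
print(cov_manual)
print(coeff_std)
```

With five points and three coefficients this leaves two degrees of freedom, so the result is finite, though still based on very little data.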

The difference should be in the degrees of freedom. The polyfit method already takes into account that your degree is 2, thus causing:
RuntimeWarning: divide by zero encountered in true_divide
fac = resids / (len(x) - order - 2.0)
You can pass np.cov a ddof= keyword (ddof = delta degrees of freedom) and you'll run into the same problem.

SciPy + Numpy: Finding the slope of a sigmoid curve

I have some data that follow a sigmoid distribution as you can see in the following image:
After normalizing and scaling my data, I have adjusted the curve at the bottom using scipy.optimize.curve_fit and some initial parameters:
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0 = [0.05, 0.05, 0.05])
>>> print popt
[ 2.82019932e+02 -1.90996563e-01 5.00000000e-02]
So popt, according to the documentation, returns "Optimal values for the parameters so that the sum of the squared error of f(xdata, popt) - ydata is minimized". I understand from this that curve_fit does not calculate the slope, because I do not think the slope of this gentle curve is 282, nor is it negative.
Then I tried with scipy.optimize.leastsq, because the documentation says it returns "The solution (or the result of the last iteration for an unsuccessful call).", so I thought the slope would be returned. Like this:
p, cov, infodict, mesg, ier = leastsq(residuals, p_guess, args = (nxdata, nydata), full_output=True)
>>> print p
Param(x0=281.73193626250207, y0=-0.012731420027056234, c=1.0069006606656596, k=0.18836680131910222)
But again, I did not get what I expected. curve_fit and leastsq returned almost the same values, which is not surprising I guess, as curve_fit uses an implementation of the least squares method internally to find the curve. But no slope back... unless I overlooked something.
So, how to calculate the slope in a point, say, where X = 285 and Y = 0.5?
I am trying to avoid manual methods, like calculating the derivative in, say, (285.5, 0.55) and (284.5, 0.45) and subtract and divide results and so. I would like to know if there is a more automatic method for this.
Thank you all!
EDIT #1
This is my "sigmoid_function", used by curve_fit and leastsq methods:
def sigmoid_function(xdata, x0, k, p0): # p0 not used anymore, only its components (x0, k)
    # This function is called by two different methods: curve_fit and leastsq,
    # this last one through function "residuals". I don't know if it makes sense
    # to use a single function for two (somewhat similar) methods, but there
    # it goes.
    # p0:
    # + Is the initial parameter for scipy.optimize.curve_fit.
    # + For residuals calculation is left empty
    # + It is initialized to [0.05, 0.05, 0.05]
    # x0:
    # + Is the convergence parameter in X-axis and also the shift
    # + It starts with 0.05 and ends up being around ~282 (days in a year)
    # k:
    # + Set up either by curve_fit or leastsq
    # + In least squares it is initially fixed at 0.5 and in curve_fit
    # + to 0.05. Why? Just did this approach in two different ways and
    # + it seems it is working.
    # + But honestly, I have no clue on what it represents
    # xdata:
    # + Positions in X-axis. In this case from 240 to 365
    # Finally I changed those parameters as suggested in the answer.
    # Sigmoid curve has 2 degrees of freedom, therefore, the initial
    # guess only needs to be this size. In this case, p0 = [282, 0.5]
    y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
    return y
def residuals(p_guess, xdata, ydata):
    # For the residuals calculation, there is no need of setting up the initial parameters
    # After fixing the initial guess and sigmoid_function header, remove []
    # return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
    return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])
I am sorry if I made mistakes while describing the parameters or confused technical terms. I am very new with numpy and I have not studied maths for years, so I am catching up again.
So, again, what is your advice to calculate the slope of X = 285, Y = 0.5 (more or less the midpoint) for this dataset? Thanks!!
EDIT #2
Thanks to Oliver W., I updated my code as he suggested and understood a bit better the problem.
There is a final detail I do not fully get. Apparently, curve_fit returns a popt array (x0, k) with the optimum parameters for the fitting:
x0 seems to indicate how far the curve is shifted, by giving the central point of the curve
k parameter is the slope when y = 0.5, also in the center of the curve (I think!)
If the sigmoid function is a growing one, why is the derivative/slope in popt negative? Does it make sense?
I used sigmoid_derivative to calculate the slope and, yes, I obtained the same results that popt but with positive sign.
# Year 2003, 2005, 2007. Slope in midpoint.
k = [-0.1910, -0.2545, -0.2259] # Values coming from popt
slope = [0.1910, 0.2545, 0.2259] # Values coming from sigmoid_derivative function
I know this is being a bit picky because I could use both. The relevant data is in there but with negative sign, but I was wondering why this is happening.
So, the calculation of the derivative function as you suggested, is only required if I need to know the slope in other points than y = 0.5. Only for midpoint, I can use popt.
Thanks for your help, it saved me a lot of time. :-)

You're never using the parameter p0 you're passing to your sigmoid function. Hence, curve fitting will not have any good measure to find convergence, because it can take any value for this parameter. You should first rewrite your sigmoid function like this:
def sigmoid_function(xdata, x0, k):
    y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
    return y
This means your model (the sigmoid) has only two degrees of freedom. This will be returned in popt:
initial_guess = [282, 1] # (x0, k): at x0, the sigmoid reaches 50%, k is slope related
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0=initial_guess)
Now popt will be a tuple (or array of 2 values), being the best possible x0 and k.
To get the slope of this function at any point, to be honest, I would just calculate the derivative symbolically as the sigmoid is not such a hard function. With f = exp(-k*(x-x0)) the model is y = f / (1 + f), so dy/dx = -k*f / (1 + f)**2, and you will end up with:
def sigmoid_derivative(x, x0, k):
    f = np.exp(-k*(x-x0))
    return -k * f / (1 + f)**2
If you have the results from your curve fitting stored in popt, you could pass this easily to this function:
print(sigmoid_derivative(285, *popt))
which will return for you the derivative at x=285. But, because you ask specifically for the midpoint, so when x==x0 and y==.5, you'll see (from the sigmoid_derivative, where f equals 1) that the derivative there is just -k/4, which can be read off immediately from the curve_fit output you've already obtained. In the output you've shown, k is about -0.19, so the slope at the midpoint is about 0.048.
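A quick way to sanity-check any hand-derived slope formula is to compare it with a central finite difference. The sketch below does this for the model y = f / (1 + f) with f = exp(-k*(x - x0)), whose derivative works out to -k*f / (1 + f)**2. The parameter values are hypothetical, chosen close to the popt printed in the question, and sigmoid_slope is a name made up for the example.

```python
import numpy as np

def sigmoid_function(xdata, x0, k):
    f = np.exp(-k * (xdata - x0))
    return f / (1 + f)

def sigmoid_slope(x, x0, k):
    # Analytic derivative of f / (1 + f) with f = exp(-k * (x - x0))
    f = np.exp(-k * (x - x0))
    return -k * f / (1 + f) ** 2

x0_demo, k_demo = 282.0, -0.19   # hypothetical fitted values

# Central difference approximation of the slope at x = 285
h = 1e-5
numeric = (sigmoid_function(285.0 + h, x0_demo, k_demo)
           - sigmoid_function(285.0 - h, x0_demo, k_demo)) / (2 * h)
analytic = sigmoid_slope(285.0, x0_demo, k_demo)

midpoint_slope = sigmoid_slope(x0_demo, x0_demo, k_demo)  # equals -k / 4
print(numeric, analytic, midpoint_slope)
```

A negative k with this parameterization describes a rising curve, which is why the fitted k and the slope have opposite signs.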

gaussian sum filter for irregular spaced points

I have a set of points (x,y) as two vectors
x,y for example:
from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')
Now I would like to smooth this data with a Gaussian and evaluate it only at certain (regularly spaced) points on the x-axis. Let's say:
x_eval = linspace(0,1,11)
I got the tip that this method is called a "Gaussian sum filter", but so far I have not found any implementation in numpy/scipy for that, although it seems like a standard problem at first glance.
As the x values are not equally spaced I can't use the scipy.ndimage.gaussian_filter1d.
Usually this kind of smoothing is done by going through Fourier space and multiplying with the kernel, but I don't really know if this will be possible with irregularly spaced data.
Thanks for any ideas

This will blow up for very large datasets, but the proper calculation you are asking for would be done as follows:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0) # for repeatability
x = np.random.rand(30)
x.sort()
y = np.random.rand(30)
x_eval = np.linspace(0, 1, 11)
sigma = 0.1
delta_x = x_eval[:, None] - x
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)
weights /= np.sum(weights, axis=1, keepdims=True)
y_eval = np.dot(weights, y)
plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
plt.show()
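The memory problem comes from the full weight matrix, which has one row per evaluation point and one column per data point. One possible way around it, sketched here as an assumption rather than a tested recipe, is to process the evaluation points in chunks so that only a slice of that matrix exists at any time. The function name and the chunk parameter are made up for the example.

```python
import numpy as np

def gaussian_smooth_chunked(x, y, x_eval, sigma, chunk=1024):
    """Gaussian-weighted average of y at x_eval, computed chunk by chunk."""
    out = np.empty(len(x_eval))
    for start in range(0, len(x_eval), chunk):
        xe = x_eval[start:start + chunk]
        # Weight matrix for this chunk only: shape (len(xe), len(x))
        w = np.exp(-0.5 * ((xe[:, None] - x) / sigma) ** 2)
        # The Gaussian prefactor cancels in the normalized average
        out[start:start + chunk] = (w @ y) / w.sum(axis=1)
    return out

rng = np.random.default_rng(0)
x = np.sort(rng.random(30))
y = rng.random(30)
x_eval = np.linspace(0, 1, 11)
sigma = 0.1
y_eval = gaussian_smooth_chunked(x, y, x_eval, sigma, chunk=4)
print(y_eval)
```

The result should match the unchunked computation; the chunk size only trades memory for loop overhead.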

I'll preface this answer by saying that this is more of a DSP question than a programming question...
...that being said, there is a simple two-step solution to your problem.
Step 1: Resample the data
So to illustrate this we can create a random data set with unequal sampling:
import numpy as np
x = np.cumsum(np.random.randint(0,100,100))
y = np.random.normal(0,1,size=100)
This gives something like:
We can resample this data using simple linear interpolation:
nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x
This converts our data to:
Step 2: Apply filter
At this stage you can use some of the tools available through scipy to apply a Gaussian filter to the data with a given sigma value:
from scipy.ndimage import gaussian_filter1d  # the scipy.ndimage.filters namespace is deprecated
fx = gaussian_filter1d(ny, sigma=100)
Plotting this up against the original data we get:
The choice of the sigma value determines the width of the filter.

Based on @Jaime's answer I wrote a function that implements this with some additional documentation and the ability to discard estimates far from the datapoints.
I think confidence intervals could be obtained on this estimate by bootstrapping, but I haven't done this yet.
def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply gaussian sum filter to data.
    xdata, ydata : array
        Arrays of x- and y-coordinates of data.
        Must be 1d and have the same length.
    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result
    sigma : float
        Standard deviation of the Gaussian to apply to each data point
        Larger values yield a smoother curve.
    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval
    # each row corresponds to a value in xeval
    # each col corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata
    # Calculate weight of every value in delta_x using Gaussian
    # Maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))
    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)
    # Nullify the result when the total weight is below threshold
    # This happens at evaluation points far from any data
    # 1-sigma away from a data point has a weight of ~0.6
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan
    # Normalize by dividing by the total weight at each evaluation point
    # Nullification above avoids divide by zero warnings here
    smoothed = smoothed / weights.sum(1)
    return smoothed
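The bootstrapping idea mentioned above could look roughly like the following. This is a sketch under assumptions, not something the answer tested: it resamples (x, y) pairs with replacement, uses a compact smoother without the null_thresh logic, and takes pointwise percentiles as an approximate 95% band. The names smooth, n_boot, lower and upper are made up for the example.

```python
import numpy as np

def smooth(xdata, ydata, xeval, sigma):
    # Normalized Gaussian-weighted average (same idea as above, without NaN masking)
    w = np.exp(-0.5 * ((xeval[:, None] - xdata) / sigma) ** 2)
    return (w @ ydata) / w.sum(axis=1)

rng = np.random.default_rng(0)
xdata = np.sort(rng.random(30))
ydata = rng.random(30)
xeval = np.linspace(0, 1, 11)
sigma = 0.1

n_boot = 200
boot = np.empty((n_boot, xeval.size))
for b in range(n_boot):
    # Resample data pairs with replacement and smooth the resampled set
    idx = rng.integers(0, xdata.size, xdata.size)
    boot[b] = smooth(xdata[idx], ydata[idx], xeval, sigma)

estimate = smooth(xdata, ydata, xeval, sigma)
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
print(estimate)
print(lower)
print(upper)
```

Percentile bands of this kind ignore the smoothing bias, so they describe the variability of the estimate rather than guaranteed coverage of the underlying curve.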

How to do linear regression, taking errorbars into account?

I am doing a computer simulation of a physical system of finite size, and afterwards I extrapolate to infinity (the thermodynamic limit). Some theory says that the data should scale linearly with system size, so I am doing linear regression.
The data I have is noisy, but for each data point I can estimate errorbars. So, for example, the data points look like:
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587, 0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715, 0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]
Let's say I am trying to do this in Python.
First way that I know is:
m, c, r_value, p_value, std_err = scipy.stats.linregress(x_list, y_list)
I understand this gives me errorbars of the result, but this does not take into account errorbars of the initial data.
Second way that I know is:
c, m = numpy.polynomial.polynomial.polyfit(x_list, y_list, 1, w = [1.0 / ty for ty in y_err], full=False)  # coefficients come back lowest degree first: intercept c, then slope m
Here we use the inverse of the errorbar of each point as a weight in the least-squares fit. So if a point is not very reliable, it will not influence the result much, which is reasonable.
But I cannot figure out how to get something that combines both of these methods.
What I really want is what the second method does, meaning a regression in which every point influences the result with a different weight. But at the same time I want to know how accurate my result is, meaning I want to know the errorbars of the resulting coefficients.
How can I do this?

Not entirely sure if this is what you mean, but…using pandas, statsmodels, and patsy, we can compare an ordinary least-squares fit and a weighted least-squares fit which uses the inverse of the noise you provided as a weight matrix (statsmodels will complain about sample sizes < 20, by the way).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import statsmodels.formula.api as sm
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587, 0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715, 0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]
# put x and y into a pandas DataFrame, and the weights into a Series
ws = pd.DataFrame({
'x': x_list,
'y': y_list
})
weights = pd.Series(y_err)
wls_fit = sm.wls('x ~ y', data=ws, weights=1 / weights).fit()
ols_fit = sm.ols('x ~ y', data=ws).fit()
# show the fit summary by calling wls_fit.summary()
# wls fit r-squared is 0.754
# ols fit r-squared is 0.701
# let's plot our data
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='w')
ws.plot(
kind='scatter',
x='x',
y='y',
style='o',
alpha=1.,
ax=ax,
title='x vs y scatter',
edgecolor='#ff8300',
s=40
)
# weighted prediction
wp, = ax.plot(
wls_fit.predict(),
ws['y'],
color='#e55ea2',
lw=1.,
alpha=1.0,
)
# unweighted prediction
op, = ax.plot(
ols_fit.predict(),
ws['y'],
color='k',
ls='solid',
lw=1,
alpha=1.0,
)
leg = plt.legend(
(op, wp),
('Ordinary Least Squares', 'Weighted Least Squares'),
loc='upper left',
fontsize=8)
plt.tight_layout()
fig.set_size_inches(6.40, 5.12)
plt.show()
WLS residuals:
[0.025624005084707302,
0.013611438189866154,
-0.033569595462217161,
0.044110895217014695,
-0.025071632845910546,
-0.036308252199571928,
-0.010335514810672464,
-0.0081511479431851663]
The mean squared error of the residuals for the weighted fit (wls_fit.mse_resid or wls_fit.scale) is 0.22964802498892287, and the r-squared value of the fit is 0.754.
You can obtain a wealth of data about the fits by calling their summary() method, and/or doing dir(wls_fit), if you need a list of every available property and method.

I wrote a concise function to perform the weighted linear regression of a data set, which is a direct translation of GSL's "gsl_fit_wlinear" function. This is useful if you want to know exactly what your function is doing when it performs the fit.
def wlinear_fit (x,y,w) :
    """
    Fit (x,y,w) to a linear function, using exact formulae for weighted linear
    regression. This code was translated from the GNU Scientific Library (GSL),
    it is an exact copy of the function gsl_fit_wlinear.
    """
    # compute the weighted means and weighted deviations from the means
    # wm denotes a "weighted mean", wm(f) = (sum_i w_i f_i) / (sum_i w_i)
    W = np.sum(w)
    wm_x = np.average(x,weights=w)
    wm_y = np.average(y,weights=w)
    dx = x-wm_x
    dy = y-wm_y
    wm_dx2 = np.average(dx**2,weights=w)
    wm_dxdy = np.average(dx*dy,weights=w)
    # In terms of y = a + b x
    b = wm_dxdy / wm_dx2
    a = wm_y - wm_x*b
    cov_00 = (1.0/W) * (1.0 + wm_x**2/wm_dx2)
    cov_11 = 1.0 / (W*wm_dx2)
    cov_01 = -wm_x / (W*wm_dx2)
    # Compute chi^2 = \sum w_i (y_i - (a + b * x_i))^2
    chi2 = np.sum (w * (y-(a+b*x))**2)
    return a,b,cov_00,cov_11,cov_01,chi2
To perform your fit, you would do
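# the lists from the question must be converted to arrays for the arithmetic to work
a,b,cov_00,cov_11,cov_01,chi2 = wlinear_fit(np.asarray(x_list), np.asarray(y_list), 1.0/np.asarray(y_err)**2)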
Which will return the best estimate for the coefficients a (the intercept) and b (the slope) of the linear regression, along with the elements of the covariance matrix cov_00, cov_01 and cov_11. The best estimate on the error on a is then the square root of cov_00 and the one on b is the square root of cov_11. The weighted sum of the residuals is returned in the chi2 variable.
IMPORTANT: this function accepts inverse variances, not the inverse standard deviations as the weights for the data points.
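For comparison, numpy.polyfit itself can report coefficient uncertainties. The sketch below uses the data from the question and rests on two assumptions worth stating: polyfit expects w = 1/sigma (inverse standard deviations, unlike the inverse variances used above), and cov='unscaled' (available in recent NumPy releases, from 1.16 onward as far as I know) treats the error bars as absolute rather than rescaling the covariance by chi2/dof.

```python
import numpy as np

x = np.array([0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587,
              0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666])
y = np.array([0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715,
              0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555])
y_err = np.array([0.003306749165349316, 0.003818446389148108, 0.0056036878203831785,
                  0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223,
                  0.002981084130692832, 0.0034913019065973983])

# numpy.polyfit returns the highest degree first: slope, then intercept
coeffs, cov = np.polyfit(x, y, 1, w=1.0 / y_err, cov='unscaled')
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))

# Weighted chi-squared per degree of freedom as a goodness-of-fit check
chi2 = np.sum(((y - (slope * x + intercept)) / y_err) ** 2)
chi2_dof = chi2 / (len(x) - 2)
print(slope, slope_err, intercept, intercept_err, chi2_dof)
```

If chi2_dof comes out much larger than 1, the error bars are probably underestimated or the linear model is inadequate, and the default scaled covariance (cov=True) may be the more honest choice.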

sklearn.linear_model.LinearRegression supports specification of weights during fit:
import numpy as np
from sklearn.linear_model import LinearRegression
x_data = np.array(x_list).reshape(-1, 1) # The model expects shape (n_samples, n_features).
y_data = np.array(y_list)
y_err = np.array(y_err)
model = LinearRegression()
model.fit(x_data, y_data, sample_weight=1/y_err)
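Here the sample weight is specified as 1 / y_err. Different versions are possible (scikit-learn applies sample_weight to the squared residuals, so 1 / y_err**2 gives inverse-variance weighting), and often it's a good idea to clip these sample weights to a maximum value in case y_err varies strongly or has small outliers: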
sample_weight = 1 / y_err
sample_weight = np.minimum(sample_weight, MAX_WEIGHT)
where MAX_WEIGHT should be determined from your data (by looking at the y_err or 1 / y_err distributions, e.g. if they have outliers they can be clipped).

I found this document helpful in understanding and setting up my own weighted least squares routine (applicable for any programming language).
Typically learning and using optimized routines is the best way to go but there are times where understanding the guts of a routine is important.
