How to fit multiple exponential curves using Python

For a single exponential curve such as the one shown in the first linked image (curve_fit for a single exponential curve), I am able to fit the data using scipy.optimize.curve_fit. However, I am unsure how to realize a fit for a similar dataset composed of multiple exponential curves, as shown in the second linked image (double exponential curves).
I achieved the fit for the single curve using the following approach:
def exp_decay(x, a, r):
    return a * ((1 - r)**x)

x = np.linspace(0, 50, 50)
y = exp_decay(x, 400, 0.06)
y1 = exp_decay(x, 550, 0.06)  # to be appended to y to generate two curves

pars, cov = curve_fit(exp_decay, x, y, p0=[0, 0])
plt.scatter(x, y)
plt.plot(x, exp_decay(x, *pars), 'r-')  # this realizes the fit for a single curve

yx = np.append(y, y1)  # this produces the two exponential curves (shown above as double exponential curves) that I still need to fit a model to
Can someone help describe how to achieve this for a dataset of two curves? My actual dataset comprises multiple exponential curves, but I think that if I can realize a fit for two curves, I should be able to replicate it for my dataset. This does not have to be done with scipy's curve_fit; any implementation that works is fine.
PLEASE HELP !!!

Your problem can be tackled by splitting the dataset using a simple criterion, such as a first-derivative estimate, and then applying a standard curve-fitting procedure to each sub-dataset.
Trial Dataset
First, let's import some packages and create a synthetic dataset with three curves to represent your problem.
We use a two-parameter exponential model, since the time-origin shift will be handled by the splitting methodology (note that the a*((1-r)**x) model in the question is equivalent to a*exp(b*x) with b = log(1-r)). We also add noise, as there is always noise in real-world data:
import numpy as np
import pandas as pd
from scipy import optimize
import matplotlib.pyplot as plt
def func(x, a, b):
    return a * np.exp(b * x)
N = 1001
n1 = N//3
n2 = 2*n1
t = np.linspace(0, 10, N)
x0 = func(t[:n1], 1, -0.2)
x1 = func(t[n1:n2]-t[n1], 5, -0.4)
x2 = func(t[n2:]-t[n2], 2, -1.2)
x = np.hstack([x0, x1, x2])
xr = x + 0.025*np.random.randn(x.size)
Graphically, it renders as follows:
Dataset Splitting
We can split the dataset into three sub-datasets using a simple criterion: a first-derivative estimate computed from first differences. The goal is to detect where the curve jumps up or down drastically, which is where the dataset should be split. The first derivative is estimated as follows:
dxrdt = np.abs(np.diff(xr)/np.diff(t))
The criterion requires an extra parameter (a threshold) that must be tuned according to your signal's characteristics. The criterion is equivalent to:
xcrit = 20
q = np.where(dxrdt > xcrit) # (array([332, 665], dtype=int64),)
And the split indices are:
idx = [0] + list(q[0]+1) + [t.size] # [0, 333, 666, 1001]
The criterion threshold is mainly affected by the nature and power of the noise in your data and by the magnitude of the gaps between two curves. The usefulness of this methodology depends on the ability to detect the gaps between curves in the presence of noise; it will break down when the noise power has the same magnitude as the gap we want to detect. You may also observe false split indices if the noise is heavy-tailed (a few strong outliers).
In this MCVE, we have set the threshold to 20 [Signal Units/Time Units]:
An alternative to this hand-crafted criterion is to delegate the detection to scipy's excellent find_peaks function, but it will not remove the need to tune the detection to your signal's characteristics.
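As a minimal sketch of that delegation (reusing the t and dxrdt arrays defined above; the prominence value is an assumption that would still need tuning on real data):
from scipy.signal import find_peaks

# Peaks of the absolute derivative mark the jumps between curves.
# The prominence threshold plays the same role as xcrit above and must be tuned.
peaks, _ = find_peaks(dxrdt, prominence=10)

# Rebuild the split indices in the same format as the idx list above.
idx = [0] + list(peaks + 1) + [t.size]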
Fit origin-shifted dataset
Now we can apply the curve fitting on each sub-dataset (with origin shifted time), collect parameters and statistics and plot the result:
trials = []
fig, axe = plt.subplots()
for k, (i, j) in enumerate(zip(idx[:-1], idx[1:])):
    p, s = optimize.curve_fit(func, t[i:j]-t[i], xr[i:j])
    axe.plot(t[i:j], xr[i:j], '.', label="Data #{}".format(k+1))
    axe.plot(t[i:j], func(t[i:j]-t[i], *p), label="Data Fit #{}".format(k+1))
    trials.append({"n0": i, "n1": j, "t0": t[i], "a": p[0], "b": p[1],
                   "s_a": s[0,0], "s_b": s[1,1], "s_ab": s[0,1]})
axe.set_title("Curve Fits")
axe.set_xlabel("Time, $t$")
axe.set_ylabel(r"Signal Estimate, $\hat{g}(t)$")
axe.legend()
axe.grid()
df = pd.DataFrame(trials)
It returns the following fitting results:
n0 n1 t0 a b s_a s_b s_ab
0 0 333 0.00 0.998032 -0.199102 0.000011 4.199937e-06 -0.000005
1 333 666 3.33 5.001710 -0.399537 0.000013 3.072542e-07 -0.000002
2 666 1001 6.66 2.002495 -1.203943 0.000030 2.256274e-05 -0.000018
This agrees with our original parameters (see the Trial Dataset section).
Graphically, we can check the goodness of the fits:

Related

Fitting a model with some known parameters to an experimental dataset in python, in order to optimise other parameters

I have an experimental dataset 1 which plots intensity as a function of energy. These are arrays of 1800 datapoints.
I have been trying to fit a model to this data, given by the equation below:
Imodel = I0 * ((math.cos(phi) + beta*f1)**2 + (math.sin(phi) + beta*f2)**2) + Ioff
I have 2 other datasets of f1 vs. energy and f2 vs. energy 2. These are arrays of 700 datapoints, albeit over the same energy range as the first dataset.
I want to use this model function together with the f1 and f2 data to find optimal values of the other 4 parameters (I0, phi, beta, Ioff) where this model function fits the experimental dataset exactly.
I have been looking into curve_fit and least_squares from the scipy.optimize package, as well as linear regression packages such as lmfit and scikit, but to no avail.
Can anyone help? Thanks.
At present I have no representative data from Ayrtonb1 with which to test the method proposed below. The method seems sound on a theoretical basis, but one cannot be sure that it will be satisfactory on the OP's data.
Nevertheless, a preliminary test was carried out with "toy" data (shown below).
I suppose that the screen copy below is sufficient to understand the method and to reproduce the calculation with real data.
The results of this preliminary test are rather good:
LRMSE < 2 for a range up to 600 (Least Root Mean Square Error).
LRMSRE < 2% (Least Root Mean Square Relative Error).
Your data and formula look suspiciously like resonant (or anomalous) X-ray diffraction data, with measurements of scattered intensity going across the Zn K-edge. Although you do not say this, the discussion here will assume that. You say you have 1800 measurements, presumably as a function of X-ray energy in eV. The resonant scattering factors (f1, f2) you show seem to be idealized and possibly "typical", and perhaps not specifically for the Zn K-edge -- at the very least the energy scale shown is not the same as your data.
You will want to treat the data and model the intensity as a function of X-ray energy. And you will want realistic values for f1 and f2 for the element of interest, and at the actual energy points for your data. I recommend using xraydb (full disclosure: I am the lead author) [pip install xraydb], so that you can do
import numpy as np
import xraydb
#edata, idata = function_to_extract_your_data()
# or perhaps testing with
edata = np.linspace(9500, 10500, 501)
f1 = xraydb.f1_chantler('Zn', edata)
f2 = xraydb.f2_chantler('Zn', edata)
As written, your intensity function does not directly depend on energy, though it might at a later date, say to make that offset be linear in energy, not just a constant. You might write a function like:
def intensity(en, phi, beta, scale=1, slope=0, offset=0, f1=-1, f2=1):
    costerm = np.cos(phi) + beta*f1
    sinterm = np.sin(phi) + beta*f2
    return scale * (costerm**2 + sinterm**2) + slope*en + offset
With that, you can start trying out some values to get a feel for the function and how it compares to your data:
import matplotlib.pyplot as plt
beta = 0.025 # Wild Guess!
for phi in np.pi*np.arange(20)/10:
    plt.plot(edata, intensity(edata, phi, beta, f1=f1, f2=f2), label='%.1f' % phi)
plt.legend()
plt.show()
It kind of looks like your value for phi would be around 5.5 to 6 (or -0.8 to -0.3). You could also try different values of beta and plot that with your actual data.
With that model for intensity and a feel for what the range of parameters is, you could then try to fit your data. To do that, I would recommend using lmfit (full disclosure: I am the lead author) [pip install lmfit], where you can create a model from your intensity model function - this will use the names of the function arguments to name the fitting parameters.
from lmfit import Model
imodel = Model(intensity, independent_vars=['en', 'f1', 'f2'])
params = imodel.make_params(scale=1, offset=0, slope=0, beta=0.1, phi=5.5)
That independent_vars will tell Model to not make fitting Parameters for the function arguments f1 and f2 and to expect them to be passed into any evaluation or fit. The other function arguments (scale, phi, etc) will be expected to become fitting variables. You do have to create a "Parameters" object and must give initial values for all parameters. You can put bounds on these or fix them so they are not altered in the fit, as with
params['scale'].min = 0 # force scale to be positive
params['slope'].vary = False # slope will be fixed at 0.
You can then evaluate the model with
init_value = imodel.eval(params, en=edata, f1=f1, f2=f2)
And then eventually do a fit with
result = imodel.fit(idata, params, en=edata, f1=f1, f2=f2)
print(result.fit_report())
plt.plot(edata, idata, label='data')
plt.plot(edata, init_value, label='initial fit')
plt.plot(edata, result.best_fit, label='best fit')
plt.legend()
plt.show()
Finally, for analysis of X-ray resonant scattering be sure to consider including absorption corrections in that intensity calculation. As you go across the Zn K edge, the absorption depth of the sample may change dramatically if the Zn concentration is high.
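As a rough, hedged sketch of that last point (the density, thickness and the use of xraydb.mu_elam below are illustrative assumptions, not a prescription for your experiment), one could at least inspect how strongly the absorption changes across the edge:
import numpy as np
import xraydb

edata = np.linspace(9500, 10500, 501)   # energies in eV, as above
mu = xraydb.mu_elam('Zn', edata)        # mass attenuation coefficient, cm^2/g

density = 7.14      # placeholder: bulk Zn density in g/cm^3 -- use your sample's value
thickness = 1e-4    # placeholder sample thickness in cm

# Simple transmission-style attenuation factor exp(-mu*rho*t); a real
# self-absorption correction also depends on the measurement geometry
# (incidence/exit angles) and on the Zn concentration in the sample.
attenuation = np.exp(-mu * density * thickness)
Whether and how such a factor enters the intensity model depends on the sample and geometry, so treat this only as a starting point.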

Fourier series data fit with numpy: fft vs coding

Suppose I have some data, y, to which I would like to fit a Fourier series. On this post, a solution was posted by Mermoz using the complex format of the series and "calculating the coefficient with a riemann sum". On this other post, the series is obtained through the FFT and an example is written down.
I tried implementing both approaches (image and code below; note that every time the code is run, different data will be generated due to the use of numpy.random.normal), but I wonder why I am getting different results: the Riemann approach seems "wrongly shifted" while the FFT approach seems "squeezed". I am also not sure about my definition of the period "tau" for the series. I appreciate the attention.
I am using Spyder with Python 3.7.1 on Windows 7
Example
import matplotlib.pyplot as plt
import numpy as np
# Assume x (independent variable) and y are the data.
# Arbitrary numerical values for question purposes:
start = 0
stop = 4
mean = 1
sigma = 2
N = 200
terms = 30 # number of terms for the Fourier series
x = np.linspace(start,stop,N,endpoint=True)
y = np.random.normal(mean, sigma, len(x))
# Fourier series
tau = (max(x)-min(x)) # assume that signal length = 1 period (tau)
# From ref 1
def cn(n):
    c = y*np.exp(-1j*2*n*np.pi*x/tau)
    return c.sum()/c.size

def f(x, Nh):
    f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(1, Nh+1)])
    return f.sum()
y_Fourier_1 = np.array([f(t,terms).real for t in x])
# From ref 2
Y = np.fft.fft(y)
np.put(Y, range(terms+1, len(y)), 0.0) # zero-ing coefficients above "terms"
y_Fourier_2 = np.fft.ifft(Y)
# Visualization
f, ax = plt.subplots()
ax.plot(x,y, color='lightblue', label = 'artificial data')
ax.plot(x, y_Fourier_1, label = ("'Riemann' series fit (%d terms)" % terms))
ax.plot(x,y_Fourier_2, label = ("'FFT' series fit (%d terms)" % terms))
ax.grid(True, color='dimgray', linestyle='--', linewidth=0.5)
ax.set_axisbelow(True)
ax.set_ylabel('y')
ax.set_xlabel('x')
ax.legend()
Two small modifications are sufficient to make the sums nearly match the output of np.fft; the FFT routine indeed computes these same sums.
1) The average of the signal, c[0], must be accounted for:
f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(0,Nh+1)]) # here : 0, not 1
2) The output must be scaled.
y_Fourier_1=y_Fourier_1*0.5
The output seems "squeezed" because the high frequency components have been filtered. Indeed, the high frequency oscillations of the input have been cleared and the output looks like a moving average.
Here, tau is actually defined as stop-start: it corresponds to the length of the frame. It is the expected period of the signal.
If the frame does not correspond to a period of the signal, you can guess its period by convolving the signal with itself (i.e., computing its autocorrelation) and finding the first maximum. See: Find period of a signal out of the FFT. Nevertheless, this is unlikely to work properly with a dataset generated by numpy.random.normal: that is additive white Gaussian noise, and since it features a constant power spectral density, it can hardly be described as periodic!
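As a rough sketch of that period-guessing idea (for a signal that actually is periodic; it assumes an evenly spaced x grid and NumPy arrays, and the helper name guess_period is just for illustration):
import numpy as np

def guess_period(x, signal):
    """Rough period estimate from the autocorrelation (assumes evenly spaced x)."""
    sig = signal - signal.mean()
    acf = np.correlate(sig, sig, mode='full')[sig.size - 1:]   # keep lags >= 0
    # Indices of local maxima of the autocorrelation, excluding lag 0.
    peaks = np.where((acf[1:-1] > acf[:-2]) & (acf[1:-1] > acf[2:]))[0] + 1
    dt = x[1] - x[0]
    return peaks[0] * dt if peaks.size else None
For the white-noise data generated above this will mostly return meaningless lags, which is exactly the point being made.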

sklearn LogisticRegression - plot displays too small coefficient

I am attempting to fit a logistic regression model to sklearn's iris dataset. I get a probability curve that looks like it is too flat, i.e. the coefficient is too small. I would expect a probability over ninety percent for sepal length > 7:
Is this probability curve indeed wrong? If so, what might cause that in my code?
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
import math
from sklearn.linear_model import LogisticRegression
data = datasets.load_iris()
# get relevant data
lengths = data.data[:100, :1]
is_setosa = data.target[:100]
#fit model
lgs = LogisticRegression()
lgs.fit(lengths, is_setosa)
m = lgs.coef_[0,0]
b = lgs.intercept_[0]
#generate values for curve overlay
lgs_curve = lambda x: 1/(1 + math.e**(-(m*x+b)))
x_values = np.linspace(2, 10, 100)
y_values = lgs_curve(x_values)
#plot it
plt.plot(x_values, y_values)
plt.scatter(lengths, is_setosa, c='r', s=2)
plt.xlabel("Sepal Length")
plt.ylabel("Probability is Setosa")
If you refer to http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression, you will find a regularization parameter C that can be passed as argument while training the logistic regression model.
C : float, default: 1.0 — Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
Now, if you try different values of this regularization parameter, you will find that larger values of C lead to fitted curves with sharper transitions from 0 to 1 in the output (response) binary variable. Still larger values fit models with higher variance (they model the training-data transition more closely, which I think is what you are expecting; you might try setting C as high as 10 and plotting), but at the same time they risk overfitting. The default value C=1 and smaller values lead to higher bias and are likely to underfit, and here comes the famous bias-variance trade-off in machine learning.
You can always use techniques like cross-validation to choose the C value that is right for you. The following code / figure shows the probability curve fitted with models of different complexity (i.e., with different values of the regularization parameter C, from 1 to 10):
x_values = np.linspace(2, 10, 100)
x_test = np.reshape(x_values, (100, 1))
C = list(range(1, 11))
labels = [str(c) for c in C]  # map() is not subscriptable in Python 3
for i in range(len(C)):
    lgs = LogisticRegression(C=C[i])  # pass a value for the regularization parameter C
    lgs.fit(lengths, is_setosa)
    y_values = lgs.predict_proba(x_test)[:, 1]  # compute the probabilities directly
    plt.plot(x_values, y_values, label=labels[i])
plt.scatter(lengths, is_setosa, c='r', s=2)
plt.xlabel("Sepal Length")
plt.ylabel("Probability is Setosa")
plt.legend()
plt.show()
Predicted probs with models fitted with different values of C
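If you would rather let cross-validation choose C than eyeball the curves, a minimal sketch using scikit-learn's LogisticRegressionCV (reusing lengths and is_setosa from the question; the Cs grid is just an example) could look like:
from sklearn.linear_model import LogisticRegressionCV

# Search a grid of C values with 5-fold cross-validation.
lgs_cv = LogisticRegressionCV(Cs=[0.01, 0.1, 1, 10, 100], cv=5)
lgs_cv.fit(lengths, is_setosa)
print("chosen C:", lgs_cv.C_[0])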
Although you do not describe what you want to plot, I assume you want to plot the separating line. It seems that you are confused with respect to the Logistic/sigmoid function. The decision function of Logistic Regression is a line.
Your probability graph looks flat because you have, in a sense, "zoomed in" too much.
If you look at the middle of a sigmoid function, it gets to be almost linear, as the second derivative gets to be almost 0 (see for example a Wolfram Alpha graph).
Please note that the values we are talking about are the results of -(m*x+b).
When we reduce the limits of your graph, say by using
x_values = np.linspace(4, 7, 100), we get something which looks like a line:
But on the other hand, if we go crazy with the limits, say by using x_values = np.linspace(-10, 20, 100), we get the clearer sigmoid:

gaussian sum filter for irregular spaced points

I have a set of points (x,y) as two vectors
x,y for example:
from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')
Now I would like to smooth this data with a Gaussian and evaluate it only at certain (regularly spaced) points on the x-axis. Let's say, for:
x_eval = linspace(0,1,11)
I got the tip that this method is called a "Gaussian sum filter", but so far I have not found any implementation in numpy/scipy for that, although it seems like a standard problem at first glance.
As the x values are not equally spaced I can't use the scipy.ndimage.gaussian_filter1d.
Usually this kind of smoothing is done by going through Fourier space and multiplying with the kernel, but I don't really know whether this is possible with irregularly spaced data.
Thanks for any ideas
This will blow up for very large datasets, but the proper calculation you are asking for would be done as follows:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0) # for repeatability
x = np.random.rand(30)
x.sort()
y = np.random.rand(30)
x_eval = np.linspace(0, 1, 11)
sigma = 0.1
delta_x = x_eval[:, None] - x
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)
weights /= np.sum(weights, axis=1, keepdims=True)
y_eval = np.dot(weights, y)
plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
plt.show()
I'll preface this answer by saying that this is more of a DSP question than a programming question...
...that being said, there is a simple two-step solution to your problem.
Step 1: Resample the data
So to illustrate this we can create a random data set with unequal sampling:
import numpy as np
x = np.cumsum(np.random.randint(0,100,100))
y = np.random.normal(0,1,size=100)
This gives something like:
We can resample this data using simple linear interpolation:
nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x
This converts our data to:
Step 2: Apply filter
At this stage you can use some of the tools available through scipy to apply a Gaussian filter to the data with a given sigma value:
import scipy.ndimage.filters as filters
fx = filters.gaussian_filter1d(ny,sigma=100)
Plotting this up against the original data we get:
The choice of the sigma value determines the width of the filter.
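As a quick illustration of that (a sketch reusing the x, y, nx, ny arrays from the resampling step above):
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d

# Larger sigma -> wider kernel -> smoother curve.
for s in (10, 50, 100, 300):
    plt.plot(nx, gaussian_filter1d(ny, sigma=s), label="sigma=%d" % s)
plt.plot(x, y, 'k.', alpha=0.3, label="original samples")
plt.legend()
plt.show()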
Based on @Jaime's answer, I wrote a function that implements this with some additional documentation and the ability to discard estimates far from the data points.
I think confidence intervals could be obtained on this estimate by bootstrapping, but I haven't done this yet (a rough sketch of that idea follows after the function).
def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply a Gaussian sum filter to data.

    xdata, ydata : array
        Arrays of x- and y-coordinates of data.
        Must be 1d and have the same length.
    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result.
    sigma : float
        Standard deviation of the Gaussian to apply to each data point.
        Larger values yield a smoother curve.
    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval
    # each row corresponds to a value in xeval
    # each col corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata

    # Calculate weight of every value in delta_x using Gaussian
    # Maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))

    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)

    # Nullify the result when the total weight is below threshold
    # This happens at evaluation points far from any data
    # 1-sigma away from a data point has a weight of ~0.6
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan

    # Normalize by dividing by the total weight at each evaluation point
    # Nullification above avoids divide-by-zero warnings here
    smoothed = smoothed / weights.sum(1)

    return smoothed
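And here is a rough sketch of the bootstrap idea mentioned above (not part of the original function; it assumes xdata and ydata are NumPy arrays and simply resamples the data points with replacement):
def bootstrap_smooth_ci(xdata, ydata, xeval, sigma, n_boot=500, alpha=0.05):
    """Rough bootstrap confidence band for gaussian_sum_smooth (illustrative only)."""
    rng = np.random.default_rng(0)
    n = len(xdata)
    estimates = np.empty((n_boot, len(xeval)))
    for b in range(n_boot):
        # Resample data points with replacement and re-run the smoother.
        take = rng.integers(0, n, n)
        estimates[b] = gaussian_sum_smooth(xdata[take], ydata[take], xeval, sigma)
    lower = np.nanpercentile(estimates, 100 * alpha / 2, axis=0)
    upper = np.nanpercentile(estimates, 100 * (1 - alpha / 2), axis=0)
    return lower, upper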

How to do linear regression, taking errorbars into account?

I am doing a computer simulation of some physical system of finite size, and after this I am extrapolating to infinity (the thermodynamic limit). Some theory says that the data should scale linearly with system size, so I am doing linear regression.
The data I have is noisy, but for each data point I can estimate error bars. So, for example, the data points look like:
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587, 0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715, 0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]
Let's say I am trying to do this in Python.
First way that I know is:
m, c, r_value, p_value, std_err = scipy.stats.linregress(x_list, y_list)
I understand this gives me error bars on the result, but it does not take into account the error bars of the initial data.
Second way that I know is:
m, c = numpy.polynomial.polynomial.polyfit(x_list, y_list, 1, w = [1.0 / ty for ty in y_err], full=False)
Here we use the inverse of the error bar for each point as a weight in the least-squares approximation, so if a point is not really that reliable it will not influence the result much, which is reasonable.
But I cannot figure out how to get something that combines both of these methods.
What I really want is what the second method does, meaning a regression in which every point influences the result with a different weight. But at the same time I want to know how accurate my result is, that is, I want to know the error bars of the resulting coefficients.
How can I do this?
Not entirely sure if this is what you mean, but…using pandas, statsmodels, and patsy, we can compare an ordinary least-squares fit and a weighted least-squares fit which uses the inverse of the noise you provided as a weight matrix (statsmodels will complain about sample sizes < 20, by the way).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import statsmodels.formula.api as sm
x_list = [0.3333333333333333, 0.2886751345948129, 0.25, 0.23570226039551587, 0.22360679774997896, 0.20412414523193154, 0.2, 0.16666666666666666]
y_list = [0.13250359351851854, 0.12098339583333334, 0.12398501145833334, 0.09152715, 0.11167239583333334, 0.10876248333333333, 0.09814170444444444, 0.08560799305555555]
y_err = [0.003306749165349316, 0.003818446389148108, 0.0056036878203831785, 0.0036635292592592595, 0.0037034897788415424, 0.007576672222222223, 0.002981084130692832, 0.0034913019065973983]
# put x and y into a pandas DataFrame, and the weights into a Series
ws = pd.DataFrame({
    'x': x_list,
    'y': y_list
})
weights = pd.Series(y_err)
wls_fit = sm.wls('x ~ y', data=ws, weights=1 / weights).fit()
ols_fit = sm.ols('x ~ y', data=ws).fit()
# show the fit summary by calling wls_fit.summary()
# wls fit r-squared is 0.754
# ols fit r-squared is 0.701
# let's plot our data
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='w')
ws.plot(
    kind='scatter',
    x='x',
    y='y',
    style='o',
    alpha=1.,
    ax=ax,
    title='x vs y scatter',
    edgecolor='#ff8300',
    s=40
)
# weighted prediction
wp, = ax.plot(
    wls_fit.predict(),
    ws['y'],
    color='#e55ea2',
    lw=1.,
    alpha=1.0,
)
# unweighted prediction
op, = ax.plot(
    ols_fit.predict(),
    ws['y'],
    color='k',
    ls='solid',
    lw=1,
    alpha=1.0,
)
leg = plt.legend(
    (op, wp),
    ('Ordinary Least Squares', 'Weighted Least Squares'),
    loc='upper left',
    fontsize=8)
plt.tight_layout()
fig.set_size_inches(6.40, 5.12)
plt.show()
WLS residuals:
[0.025624005084707302,
0.013611438189866154,
-0.033569595462217161,
0.044110895217014695,
-0.025071632845910546,
-0.036308252199571928,
-0.010335514810672464,
-0.0081511479431851663]
The mean squared error of the residuals for the weighted fit (wls_fit.mse_resid or wls_fit.scale) is 0.22964802498892287, and the r-squared value of the fit is 0.754.
You can obtain a wealth of data about the fits by calling their summary() method, and/or doing dir(wls_fit), if you need a list of every available property and method.
I wrote a concise function to perform the weighted linear regression of a data set, which is a direct translation of GSL's "gsl_fit_wlinear" function. This is useful if you want to know exactly what your function is doing when it performs the fit.
def wlinear_fit(x, y, w):
    """
    Fit (x, y, w) to a linear function, using exact formulae for weighted linear
    regression. This code was translated from the GNU Scientific Library (GSL);
    it is an exact copy of the function gsl_fit_wlinear.
    """
    # compute the weighted means and weighted deviations from the means
    # wm denotes a "weighted mean", wm(f) = (sum_i w_i f_i) / (sum_i w_i)
    W = np.sum(w)
    wm_x = np.average(x, weights=w)
    wm_y = np.average(y, weights=w)
    dx = x - wm_x
    dy = y - wm_y
    wm_dx2 = np.average(dx**2, weights=w)
    wm_dxdy = np.average(dx*dy, weights=w)
    # In terms of y = a + b x
    b = wm_dxdy / wm_dx2
    a = wm_y - wm_x*b
    cov_00 = (1.0/W) * (1.0 + wm_x**2/wm_dx2)
    cov_11 = 1.0 / (W*wm_dx2)
    cov_01 = -wm_x / (W*wm_dx2)
    # Compute chi^2 = \sum w_i (y_i - (a + b * x_i))^2
    chi2 = np.sum(w * (y - (a + b*x))**2)
    return a, b, cov_00, cov_11, cov_01, chi2
To perform your fit, you would do
a, b, cov_00, cov_11, cov_01, chi2 = wlinear_fit(np.array(x_list), np.array(y_list), 1.0/np.array(y_err)**2)
This will return the best estimates for the coefficients a (the intercept) and b (the slope) of the linear regression, along with the elements of the covariance matrix cov_00, cov_01 and cov_11. The best estimate of the error on a is then the square root of cov_00, and that on b is the square root of cov_11. The weighted sum of squared residuals is returned in the chi2 variable.
IMPORTANT: this function accepts inverse variances, not the inverse standard deviations as the weights for the data points.
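For comparison, scipy.optimize.curve_fit can perform the same weighted fit and return the parameter covariance directly when the measurement errors are passed as sigma with absolute_sigma=True; a minimal sketch with the x_list, y_list and y_err data from the question:
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a + b * x

popt, pcov = curve_fit(line, x_list, y_list,
                       sigma=y_err, absolute_sigma=True)
a_err, b_err = np.sqrt(np.diag(pcov))  # 1-sigma errors on intercept and slope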
sklearn.linear_model.LinearRegression supports specification of weights during fit:
from sklearn.linear_model import LinearRegression

x_data = np.array(x_list).reshape(-1, 1)  # the model expects shape (n_samples, n_features)
y_data = np.array(y_list)
y_err = np.array(y_err)
model = LinearRegression()
model.fit(x_data, y_data, sample_weight=1/y_err)
Here the sample weight is specified as 1 / y_err. Different versions are possible and often it's a good idea to clip these sample weights to a maximum value in case the y_err varies strongly or has small outliers:
sample_weight = 1 / y_err
sample_weight = np.minimum(sample_weight, MAX_WEIGHT)
where MAX_WEIGHT should be determined from your data (by looking at the y_err or 1 / y_err distributions, e.g. if they have outliers they can be clipped).
I found this document helpful in understanding and setting up my own weighted least squares routine (applicable for any programming language).
Typically learning and using optimized routines is the best way to go but there are times where understanding the guts of a routine is important.
