nonlinear curve fitting in python with two variables - python

I am trying to define a function that fits input x and y data of the form:
def nlvh(x,y, xi, yi, H,C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
The x and y data are 1-D numpy arrays of the same length. I would like to slice the data so that I can select the first 5 points of x and y, fit those by optimizing C and H in the model, and then move one point ahead and repeat. I have some code that does this for a linear fit over the same data:
for i in np.arange(len(x)):
    xdata = x[i:i + window]
    ydata = y[i:i+window]
    a[i], b[i] = np.polyfit(xdata, ydata,1)
    xdata_avg[i] = np.mean(xdata)
    if i == (lenx - window):
        break
but doing the same thing with the equation defined above appears to be a bit trickier. x and y are the independent and dependent variables, but there are also the parameters xi and yi, which are the first values of x and y in each window.
The end result I would like are two new arrays with H[i] and C[i], where i designates each subsequent window. Does anybody have some insight as to how I can get started?

Following your comment on my previous answer (where you said that you would like xi and yi to be the initial values of each "sliced" x and y array), I am adding another answer. This answer changes the function nlvh so that only H and C are fit parameters. As in my previous answer, we will use curve_fit from scipy.optimize.
In the code below, I use Python's globals() function to define xi and yi. For every sliced x and y array, xi and yi store the first value of the respective slice. This is the revamped code:
from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit
def nlvh(x, H, C):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
xdata = np.arange(1,21) #Choose an array for x.
#Choose an array for y.
ydata = np.array([-0.1404996, -0.04353953, 0.35002257, 0.12939468, -0.34259184, -0.2906065,
-0.37508709, -0.41583238, -0.511851, -0.39465581, -0.32631751, -0.34403938,
-0.592997, -0.34312689, -0.4838437, -0.19311436, -0.20962735, -0.31134191,
-0.09487793, -0.55578775])
H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    globals()['xi'] = xnew[0]
    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    globals()['yi'] = ynew[0]
    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)
    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]]
    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]
H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.
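Your output for H_arr and C_arr will now be the following:
print(H_arr)
>>>[1.0, 1.0, -23.041138662879327, -34.58915200575536, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(C_arr)
>>>[1.0, 1.0, -8.795855063863234, -9.271561975595562, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
Note that the entries equal to 1.0 coincide with curve_fit's default initial guess. They occur in exactly those windows whose first y value (yi) is negative, where np.log(yi) is NaN, so the optimizer could not improve on the starting values there.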
Following are the plots that you get for the data selected above (xdata, ydata).
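The sliding-window fit can also be written without globals(), by building a fresh two-parameter model for each window with a closure. The following is only a minimal sketch under that assumption: make_nlvh and sliding_fit are illustrative names that do not appear in the answer, the synthetic x and y arrays are made up (y is kept positive so that np.log(yi) stays finite), and the loop here visits every full window, that is len(x) - window + 1 of them.

```python
import numpy as np
from scipy.optimize import curve_fit

R = 8.314  # same constant that is hard-coded in nlvh above


def make_nlvh(xi, yi):
    """Return a model f(x, H, C) with xi and yi frozen for one window."""
    def nlvh(x, H, C):
        return (((H - xi * C) / R) * ((1 / xi) - x)
                + (C / R) * np.log((1 / x) / xi)
                + np.log(yi))
    return nlvh


def sliding_fit(x, y, window=5):
    """Fit H and C on every full window of `window` consecutive points."""
    H_fit, C_fit = [], []
    for i in range(len(x) - window + 1):
        xw, yw = x[i:i + window], y[i:i + window]
        model = make_nlvh(xw[0], yw[0])  # xi, yi = first point of the window
        popt, _ = curve_fit(model, xw, yw)
        H_fit.append(popt[0])
        C_fit.append(popt[1])
    return np.array(H_fit), np.array(C_fit)


# Made-up, strictly positive data, purely for illustration.
x = np.linspace(1.0, 3.0, 20)
y = 1.0 + 0.5 * np.exp(-0.1 * x)
H_fit, C_fit = sliding_fit(x, y)
print(H_fit.shape, C_fit.shape)
```

Because xi and yi are captured per window, nothing is written to module-level state, and curve_fit still sees a function of exactly two fit parameters.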

You can use curve_fit from scipy.optimize. It will use non-linear least squares to fit the parameters (H, C, xi, yi) of your function nlvh to given input data for x and y.
Try the following code. Here H_arr and C_arr are numpy arrays that hold the fitted values of H and C when the function nlvh is fitted to windows of 5 consecutive points of xdata and ydata. (xdata and ydata are arrays that I have chosen for x and y; you can choose different arrays here.)
from __future__ import division #For decimal division.
import numpy as np
from scipy.optimize import curve_fit
def nlvh(x, H, C, xi, yi):
    return ((H-xi*C)/8.314)*((1/xi) - x) + (C/8.314)*np.log((1/x)/xi) + np.log(yi)
xdata = np.arange(1,21) #Choose an array for x
#Find an array yy for chosen values of parameters (H, C, xi, yi)
yy = nlvh(xdata, H=1.0, C=1.0, xi=1.0, yi=1.0)
print(yy)
>>>[ 0. -0.08337108 -0.13214004 -0.16674217 -0.19358166 -0.21551112 -0.23405222 -0.25011325 -0.26428008 -0.27695274 -0.28841656 -0.2988822 -0.30850967 -0.3174233 -0.3257217 -0.33348433 -0.3407762 -0.34765116 -0.35415432 -0.36032382]
#Add noise to the initially chosen array yy.
y_noise = 0.2 * np.random.normal(size=xdata.size)
ydata = yy + y_noise
print(ydata)
>>>[-0.1404996 -0.04353953 0.35002257 0.12939468 -0.34259184 -0.2906065 -0.37508709 -0.41583238 -0.511851 -0.39465581 -0.32631751 -0.34403938 -0.592997 -0.34312689 -0.4838437 -0.19311436 -0.20962735 -0.31134191 -0.09487793 -0.55578775]
H_lst, C_lst = [], []
for i in range( len(xdata)-5 ):
    #Select 5 consecutive points of xdata (from index i to i+4).
    xnew = xdata[i: i+5]
    #Select 5 consecutive points of ydata (from index i to i+4).
    ynew = ydata[i: i+5]
    #Fit function nlvh to data using scipy.optimize.curve_fit
    popt, pcov = curve_fit(nlvh, xnew, ynew, maxfev=100000)
    #Optimal values for H from minimization of sum of the squared residuals.
    H_lst += [popt[0]]
    #Optimal values for C from minimization of sum of the squared residuals.
    C_lst += [popt[1]]
H_arr, C_arr = np.asarray(H_lst), np.asarray(C_lst) #Convert list to numpy arrays.
Following will be your output of H_arr and C_arr for the chosen values of xdata and ydata.
print(H_arr)
>>>[ -11.5317468 -18.44101926 20.30837781 31.47360697 -14.45018355 24.17226837 39.96761325 15.28776756 -113.15255865 15.71324201 51.56631241 159.38292301 -28.2429133 -60.97509922 -89.48216973]
print(C_arr)
>>>[0.70339652 0.34734507 0.2664654 0.2062776 0.30740565 0.19066498 0.1812445 0.30169133 0.11654544 0.21882872 0.11852967 0.09968506 0.2288574 0.128909 0.11658227]


Least Squares Method for a sum of functions

I would like to use the curve_fit function from the scipy.optimize module to determine amplitudes, frequencies, phases of sum of sine functions (and one y0). It's easy to do when I know a number of sines to use. For example when I know two frequencies from the DFT (Discrete Fourier Transform): 1.152 and 0.432 I can define a function:
def func(x, amp1, amp2, freq1 , freq2, phase1, phase2, y0):
    return amp1*np.sin(freq1*x + phase1) + amp2*np.sin(freq2*x + phase2) + y0
Then, using the curve_fit and constraining intervals of frequencies I can find a good fitting:
param, _ = curve_fit(func, t, data, bounds=([-np.inf, -np.inf, 1.14, 0.43, -np.inf, -np.inf, -np.inf], [np.inf, np.inf, 1.16, 0.44, np.inf, np.inf, np.inf]))
It looks great:
But in this case I prepared the data myself, so I knew the number of frequencies. Do you know how to define func only once and handle all cases (for example, five sine functions)? I have tried putting the parameters into lists, e.g. amp = [amp1, amp2, ... ], and iterating over their length, but then I do not know how to define bounds for the parameter lists. bounds is very important for keeping the model realistic.
The solution does not have to be based on curve_fit.
Assuming you know the frequencies beforehand, the problem is simple. For each phase offset, you can set the lower bound to 0 and the upper bound to 2 * pi * freq. For the amplitudes, set any number (or np.inf if you want no bound).
You can write the function in the form lambda x, amp1, phase1, amp2, phase2... : y. curve_fit accepts a function with a variable number of arguments as long as you supply an initial guess of the right length.
A sample code for five frequencies:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.linspace(0,10,60)
w = [1,2,3,4,5]
a = [1,4,2,3,0.1]
x0 = [0,1,0,1,0.5]
y = sum(a_i * np.sin(w_i * x - x0_i) for w_i, a_i, x0_i in zip(w,a, x0)) #base_data
yr = y + np.random.normal(0,0.5, size=x.size) #noisy data
def func(x, *args):
    """ function of the form lambda x, amp1, phase1, amp2, phase2...."""
    # built-in sum is used because np.sum over a generator is deprecated
    return sum(a_i * np.sin(w_i * (x-x0)) for w_i, a_i, x0
               in zip(w,args[::2], args[1::2]))
ubounds = np.zeros(len(w) * 2)
ubounds[::2] = 10 #setting amp max value to 10 (arbitrary)
ubounds[1::2] = np.asarray(w) * 2 * np.pi
p0 = [0] * 10 # note p0 size
popt, pcov = curve_fit(func, x, yr, p0, bounds=(0, ubounds))
amps, phases = popt[::2], popt[1::2]
plt.plot(x,func(x, *popt))
plt.plot(x,yr, 'go')
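If the frequencies are only known to lie inside intervals, as in the question, one possible way to keep a single model function is to build the bounds and the initial guess programmatically. The sketch below is an illustration rather than part of the answer: sum_of_sines and make_bounds_and_guess are hypothetical helper names, the parameter layout (all amplitudes, then all frequencies, then all phases, then y0) is an arbitrary choice, and the noise-free test signal is made up around the two frequencies quoted in the question.

```python
import numpy as np
from scipy.optimize import curve_fit


def sum_of_sines(x, *params):
    """Sum of n sines plus an offset.

    params = [amp_1..amp_n, freq_1..freq_n, phase_1..phase_n, y0]
    """
    n = (len(params) - 1) // 3
    amps, freqs, phases = params[:n], params[n:2 * n], params[2 * n:3 * n]
    y = np.full(np.shape(x), params[-1], dtype=float)
    for amp, freq, phase in zip(amps, freqs, phases):
        y = y + amp * np.sin(freq * x + phase)
    return y


def make_bounds_and_guess(freq_windows):
    """Bounds and p0 for n sines; only the frequencies are constrained."""
    n = len(freq_windows)
    lows = [lo for lo, _ in freq_windows]
    highs = [hi for _, hi in freq_windows]
    lower = [-np.inf] * n + lows + [-np.inf] * (n + 1)
    upper = [np.inf] * n + highs + [np.inf] * (n + 1)
    mids = [(lo + hi) / 2 for lo, hi in freq_windows]
    p0 = [1.0] * n + mids + [0.0] * (n + 1)
    return (lower, upper), p0


# Made-up signal with the two frequencies mentioned in the question.
t = np.linspace(0, 50, 500)
data = sum_of_sines(t, 2.0, 1.0, 1.152, 0.432, 0.3, 1.0, 0.5)

bounds, p0 = make_bounds_and_guess([(1.14, 1.16), (0.43, 0.44)])
param, _ = curve_fit(sum_of_sines, t, data, p0=p0, bounds=bounds)
rms = np.sqrt(np.mean((sum_of_sines(t, *param) - data) ** 2))
print(param, rms)
```

The number of sines is inferred from the length of p0, so handling five frequencies only requires passing five intervals to make_bounds_and_guess.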

Fit an arbitrary number of parameters when calling curve_fit

Closest I found to this question was here: Fitting only one parameter of a function with many parameters in python. I have a multi-parameter function that I want to be able to call with a different subset of parameters being optimised in different parts of the code (useful because for some datasets, I may be able to fix some parameters based on ancillary data). Simplified demonstration of the problem below.
from scipy.optimize import curve_fit
import numpy as np
def wrapper_func(**kwargs):
    a = kwargs['a'] if 'a' in kwargs else None
    b = kwargs['b'] if 'b' in kwargs else None
    c = kwargs['c'] if 'c' in kwargs else None
    return lambda x, a, c: func(x, a, b, c)
def func(x, a, b, c):
    return a * x**2 + b * x + c
# Set parameters
a = 0.3
b = 5
c = 17
# Make some fake data
x_vals = np.arange(100)
y_vals = a * x_vals**2 + b * x_vals + c
noise = np.random.randn(100) * 20
# Get fit
popt, pcov = curve_fit(lambda x, a_, c_: func(x, a_, b, c_),
                       x_vals, y_vals + noise)
# Get fit using separate function
alt_popt, alt_cov = curve_fit(wrapper_func(b=5), x_vals, y_vals + noise)
So this works, but I want to be able to pass any combination of parameters to be fixed. So here parameters a and c are optimised, and b is fixed, but if I want to fix a and optimise b and c (or any other combination), is there a way to do this neatly? I made a start with wrapper_func() above, but the same problem arises: there seems to be no way to vary which parameters are optimised, except by writing multiple lambdas (conditional on what fixed parameter values are passed). This gets ugly quickly because the equations I am working with have 4-6 parameters. I can make a version work using eval, but gather this is not recommended. As it stands I have been groping around trying to use *args with lambda, but haven't managed to get it to work.
Any tips greatly appreciated!
lmfit does exactly this. Instead of creating an array of floating point values for the parameters in the fit, one creates a Parameters object -- an ordered dictionary of Parameter objects that are used to parametrize the model for the data. Each Parameter can be fixed or varied in the fit, can have max/min bounds, or can be defined as a simple mathematical expression in terms of other Parameters in the fit.
That is, with lmfit (and its Model class that is especially useful for curve-fitting), one creates Parameters and can then decide which will be optimized and which will be held fixed.
As an example, here is a variation on the problem you pose:
import numpy as np
from lmfit import Model
import matplotlib.pylab as plt
# starting parameters
a, b, c = 0.3, 5, 17
x_vals = np.arange(100)
noise = np.random.normal(size=100, scale=0.25)
y_vals = a * x_vals**2 + b * x_vals + c + noise
def func(x, a, b, c):
return a * x**2 + b * x + c
# create a Model from this function
model = Model(func)
# create parameters with initial values. Model will know to
# turn function args `a`, `b`, and `c` into Parameters:
params = model.make_params(a=0.25, b=4, c=10)
# you can alter each parameter, for example, fix b or put bounds on a
params['b'].vary = False
params['b'].value = 5.3
params['a'].min = -1
params['a'].max = 1
# run fit
result = model.fit(y_vals, params, x=x_vals)
# print and plot results
print(result.fit_report())
result.plot()
plt.show()
will print out:
[[Fit Statistics]]
# function evals = 12
# data points = 100
# variables = 2
chi-square = 475.843
reduced chi-square = 4.856
Akaike info crit = 159.992
Bayesian info crit = 165.202
a: 0.29716481 +/- 7.46e-05 (0.03%) (init= 0.25)
b: 5.3 (fixed)
c: 11.4708897 +/- 0.329508 (2.87%) (init= 10)
[[Correlations]] (unreported correlations are < 0.100)
C(a, c) = -0.744
(You will find that b and c are highly and negatively correlated) and show a plot like
Furthermore, the fit results including the parameters are held in result, so if you want to change what parameters are fixed, you can simply change the starting values (which have not been updated by the fit):
params['b'].vary = True
params['a'].value = 0.285
params['a'].vary = False
newresult = model.fit(y_vals, params, x=x_vals)
and then compare/contrast the two results.
Here is my solution. I am not sure how to do it with curve_fit, but it works with leastsq. It uses a wrapper function that takes the free and fixed parameters as well as a list of the positions of the free parameters. Because leastsq calls the function with the free parameters first, the wrapper has to rearrange the order.
from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import leastsq
def func(x,a,b,c,d,e):
    return a+b*x+c*x**2+d*x**3+e*x**4
#takes x, the 5 parameters and a list
# the first n parameters are free
# the list of length n gives their position, e.g. 2 parameters, 1st and 3rd order ->[1,3]
# the remaining parameters are in order, i.e. in this example it would be f(x,b,d,a,c,e)
def expand_parameters(*args):
    callArgs=args[1:6]
    freeList=list(args[-1])
    fixedList=list(range(5))
    for item in freeList:
        fixedList.remove(item)
    callList=[0,0,0,0,0]
    for val,pos in zip(callArgs, freeList+fixedList):
        callList[pos]=val
    return func(args[0],*callList)
def residuals(parameters,dataPoint,fixedParameterValues=None,freeParametersPosition=None):
    if fixedParameterValues is None:
        a,b,c,d,e = parameters
        dist = [y -func(x,a,b,c,d,e) for x,y in dataPoint]
    else:
        assert len(fixedParameterValues)==5-len(freeParametersPosition)
        assert len(fixedParameterValues)>0
        assert len(fixedParameterValues)<5 # doesn't make sense to fix all
        extraIn=list(parameters)+list(fixedParameterValues)+[freeParametersPosition]
        dist = [y -expand_parameters(x,*extraIn) for x,y in dataPoint]
    return dist
if __name__=="__main__":
    fList=np.fromiter( (func(s,1.1,-.9,-.7,.5,.1) for s in xList), float)
    ###some test
    print(residuals([1.1,-.9,-.7,.5,.1],dataTupel))
    print(residuals([1.1,-.9,-.7,.5],dataTupel,fixedParameterValues=[.1],freeParametersPosition=[0,1,2,3]))
    #exact fit
    bestFitValuesAll, ier = leastsq(residuals, [1,1,1,1,1],args=(dataTupel))
    print(bestFitValuesAll)
    ###Only a constant
    bestFitValuesConstOnly, ier = leastsq(residuals, guess,args=(dataTupel,[0,0,0,0],[0]))
    print(bestFitValuesConstOnly)
    fConstList=np.fromiter(( func(x,*np.append(bestFitValuesConstOnly,[0,0,0,0])) for x in xList),float)
    ###Only 2nd and 4th
    bestFitValues_1_3, ier = leastsq(residuals, guess,args=(dataTupel,[0,0,0],[2,4]))
    print(bestFitValues_1_3)
    f_1_3_List=np.fromiter(( expand_parameters(x, *(list(bestFitValues_1_3)+[0,0,0]+[[2,4]] ) ) for x in xList),float)
    ###Only 2nd and 4th with closer values
    bestFitValues_1_3_closer, ier = leastsq(residuals, guess,args=(dataTupel,[1.2,-.8,0],[2,4]))
    print(bestFitValues_1_3_closer)
    f_1_3_closer_List=np.fromiter(( expand_parameters(x, *(list(bestFitValues_1_3_closer)+[1.2,-.8,0]+[[2,4]] ) ) for x in xList),float)
    ax.plot(xList,f_1_3_closer_List,linestyle='',marker='o',label='1,3 c')
>>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>[ 1.1 -0.9 -0.7 0.5 0.1]
>>[ 2.64880466]
>>[-0.14065838 0.18305123]
>>[-0.31708629 0.2227272 ]
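For readers who would rather stay with curve_fit, the same idea of choosing the fixed parameters at call time can be sketched with a closure that accepts the free parameters through *args. This is an illustration under stated assumptions, not something taken from the answers: fit_with_fixed is a hypothetical helper, the model's parameters are passed by keyword, and the data are noise-free so that the result is easy to check.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    return a * x**2 + b * x + c


def fit_with_fixed(f, x, y, guess, fixed):
    """Fit f(x, **params), holding the entries of `fixed` constant.

    guess: dict of starting values for every parameter of f
    fixed: dict {name: value} of parameters that must not be optimised
    """
    free_names = [name for name in guess if name not in fixed]

    def wrapped(x, *free_values):
        kwargs = dict(fixed)
        kwargs.update(zip(free_names, free_values))
        return f(x, **kwargs)

    # p0 is required: it tells curve_fit how many free parameters there are.
    p0 = [guess[name] for name in free_names]
    popt, pcov = curve_fit(wrapped, x, y, p0=p0)
    return dict(zip(free_names, popt)), pcov


x_vals = np.arange(100, dtype=float)
y_vals = func(x_vals, 0.3, 5, 17)
guess = {"a": 1.0, "b": 1.0, "c": 1.0}

fit_ac, _ = fit_with_fixed(func, x_vals, y_vals, guess, fixed={"b": 5})
fit_bc, _ = fit_with_fixed(func, x_vals, y_vals, guess, fixed={"a": 0.3})
print(fit_ac, fit_bc)
```

Switching which parameters are optimised then only means changing the fixed dictionary, without writing one lambda per combination.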

Measuring the similarity between two irregular plots

I have two irregular lines, each given as a list of [x,y] coordinates, with peaks and troughs. The lengths of the lists may differ slightly. I want to measure their similarity, that is, to check whether peaks and troughs of similar height or depth occur at matching intervals, and to get a similarity measure. I want to do this in Python. Is there any built-in function for this?
I don't know of any builtin functions in Python to do this.
I can give you a list of possible functions in the Python ecosystem you can use. This is in no way a complete list of functions, and there are probably quite a few methods out there that I am not aware of.
If the data is ordered, but you don't know which data point is the first and which data point is last:
Use the directed Hausdorff distance
If the data is ordered, and you know the first and last points are correct:
Discrete Fréchet distance *
Dynamic Time Warping (DTW) *
Partial Curve Mapping (PCM) **
A Curve-Length distance metric (uses arc length distance from beginning to end) **
Area between two curves **
* Generally mathematical method used in a variety of machine learning tasks
** Methods I've used to identify unique material hysteresis responses
First let's assume we have two identical sets of random X-Y data. Note that all of these methods will then return zero. You can install similaritymeasures from pip if you do not have it.
import numpy as np
from scipy.spatial.distance import directed_hausdorff
import similaritymeasures
import matplotlib.pyplot as plt
# Generate random experimental data
x = np.random.random(100)
y = np.random.random(100)
P = np.array([x, y]).T
# Generate an exact copy of P, Q, which we will use to compare
Q = P.copy()
dh, ind1, ind2 = directed_hausdorff(P, Q)
df = similaritymeasures.frechet_dist(P, Q)
dtw, d = similaritymeasures.dtw(P, Q)
pcm = similaritymeasures.pcm(P, Q)
area = similaritymeasures.area_between_two_curves(P, Q)
cl = similaritymeasures.curve_length_measure(P, Q)
# all methods will return 0.0 when P and Q are the same
print(dh, df, dtw, pcm, cl, area)
The printed output is
0.0, 0.0, 0.0, 0.0, 0.0, 0.0
This is because the curves P and Q are exactly the same!
Now let's assume P and Q are different.
# Generate random experimental data
x = np.random.random(100)
y = np.random.random(100)
P = np.array([x, y]).T
# Generate random Q
x = np.random.random(100)
y = np.random.random(100)
Q = np.array([x, y]).T
dh, ind1, ind2 = directed_hausdorff(P, Q)
df = similaritymeasures.frechet_dist(P, Q)
dtw, d = similaritymeasures.dtw(P, Q)
pcm = similaritymeasures.pcm(P, Q)
area = similaritymeasures.area_between_two_curves(P, Q)
cl = similaritymeasures.curve_length_measure(P, Q)
# P and Q now differ, so each measure is nonzero
print(dh, df, dtw, pcm, cl, area)
The printed output is
0.107, 0.743, 37.69, 21.5, 6.86, 11.8
which quantify how different P is from Q according to each method.
You now have many methods to compare the two curves. I would start with DTW, since this has been used in many time series applications which look like the data you have uploaded.
We can visualize what P and Q look like with the following code.
plt.plot(P[:, 0], P[:, 1])
plt.plot(Q[:, 0], Q[:, 1])
Since your arrays are not the same size (and I am assuming they cover the same span of real time), you need to interpolate them so that you can compare them across a matching set of points.
The following code does that, and calculates correlation measures:
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt
import scipy.spatial.distance as ssd
import scipy.stats as ss
x = np.linspace(0, 10, num=11)
x2 = np.linspace(1, 11, num=13)
y = 2*np.cos( x) + 4 + np.random.random(len(x))
y2 = 2* np.cos(x2) + 5 + np.random.random(len(x2))
# Interpolating now, using linear, but you can do better based on your data
f = interp1d(x, y)
f2 = interp1d(x2,y2)
points = 15
xnew = np.linspace ( min(x), max(x), num = points)
xnew2 = np.linspace ( min(x2), max(x2), num = points)
ynew = f(xnew)
ynew2 = f2(xnew2)
plt.plot(x,y, 'r', x2, y2, 'g', xnew, ynew, 'r--', xnew2, ynew2, 'g--')
# Now compute correlations
print(ssd.correlation(ynew, ynew2)) # Computes a distance measure based on correlation between the two vectors
print(np.correlate(ynew, ynew2, mode='valid')) # Does a cross-correlation of same sized arrays and gives back correlation
print(np.corrcoef(ynew, ynew2)) # Gives back the correlation matrix for the two arrays
print(ss.spearmanr(ynew, ynew2)) # Gives the spearman correlation for the two arrays
[ 363.48984942]
[[ 1. 0.50097173]
[ 0.50097173 1. ]]
SpearmanrResult(correlation=0.45357142857142857, pvalue=0.089485900143027278)
Remember that ssd.correlation and np.corrcoef are Pearson-type measures, which capture linear association between the two arrays. If that is not appropriate, and you only expect the arrays to rise and fall together (a monotonic relationship), you can use Spearman's rank correlation as in the last example.
I'm not aware of a built-in function, but it sounds like you can modify the Levenshtein distance. The following code is adapted from the code at wikibooks.
def point_distance(p1, p2):
    # Define distance, if they are the same, then the distance should be 0
    raise NotImplementedError("define a distance between two points")
def levenshtein_point(l1, l2):
    if len(l1) < len(l2):
        return levenshtein_point(l2, l1)
    # len(l1) >= len(l2)
    if len(l2) == 0:
        return len(l1)
    previous_row = range(len(l2) + 1)
    for i, p1 in enumerate(l1):
        current_row = [i + 1]
        for j, p2 in enumerate(l2):
            print('{},{}'.format(p1, p2))
            insertions = previous_row[j + 1] + 1 # j+1 instead of j since previous_row and current_row are one character longer
            deletions = current_row[j] + 1 # than l2
            substitutions = previous_row[j] + point_distance(p1, p2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]
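The answer leaves the point distance open. As one possible filling-in, the sketch below uses the Euclidean distance between two [x, y] points as the substitution cost, while insertions and deletions keep a cost of 1. That choice is an assumption made here for illustration (it implicitly assumes coordinates scaled so that a distance of 1 is "large"), and the passing of the distance function as an argument is likewise not part of the original code.

```python
import math


def point_distance(p1, p2):
    """Euclidean distance between two (x, y) points; 0 for identical points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def levenshtein_point(l1, l2, dist=point_distance):
    """Edit distance between two point lists.

    Insertions and deletions cost 1; substituting p1 by p2 costs dist(p1, p2).
    """
    if len(l1) < len(l2):
        return levenshtein_point(l2, l1, dist)
    if len(l2) == 0:
        return float(len(l1))
    previous_row = list(range(len(l2) + 1))
    for i, p1 in enumerate(l1):
        current_row = [i + 1]
        for j, p2 in enumerate(l2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + dist(p1, p2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


curve_a = [(0, 0), (1, 1), (2, 0)]
curve_b = [(0, 0), (1, 1), (2, 0), (3, 1)]   # one extra point at the end
curve_c = [(0, 0), (1, 1.5), (2, 0)]         # middle peak is 0.5 higher

print(levenshtein_point(curve_a, curve_a))
print(levenshtein_point(curve_a, curve_b))
print(levenshtein_point(curve_a, curve_c))
```

With this cost model, an extra or missing point adds 1 to the score, while a peak of slightly different height adds only the size of the difference.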

how to find 50% point after curve fitting using numpy

I have used numpy in Python to fit my data to a sigmoidal curve. How can I find the value of x at the y = 50% point of the curve after the data has been fitted?
import numpy as np
import pylab
from scipy.optimize import curve_fit
def sigmoid(x, x0, k):
    y = 1 / (1 + np.exp(-k*(x-x0)))
    return y
xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])
popt, pcov = curve_fit(sigmoid, xdata, ydata)
print(popt)
x = np.linspace(-1, 15, 50)
y = sigmoid(x, *popt)
pylab.plot(xdata, ydata, 'o', label='data')
pylab.plot(x,y, label='fit')
pylab.ylim(0, 1.05)
You just need to solve the function you found for y(x) = 0.50. You can use one of the root finding tools of scipy, though these only solve for zero, so you need to give your function an offset:
def sigmoid(x, x0, k, y0=0):
    y = 1 / (1 + np.exp(-k*(x-x0))) + y0
    return y
Then it's just a matter of calling the root finding method of choice:
from scipy.optimize import brentq
a = np.min(xdata)
b = np.max(xdata)
x0, k = popt
y0 = -0.50
solution = brentq(sigmoid, a, b, args=(x0, k, y0)) # = 7.142
In response to your comment:
My code above uses the original popt that was calculated with your code. If you do the curve fitting with the updated sigmoid function (with the offset), popt will also contain a fitted parameter for y0.
You probably don't want this; you'll want the curve fitted with y0=0. This can be done by supplying curve_fit with an initial guess of only two values. That way the default value of y0 in the sigmoid function is used:
popt, pcov = curve_fit(sigmoid, xdata, ydata, p0 = (1,1))
Alternatively, just declare two separate sigmoid functions, one with the offset and one without it.
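For this particular sigmoid the root finder is not strictly necessary, because the function can be inverted by hand: solving y = 1 / (1 + exp(-k*(x-x0))) for x gives x = x0 - ln(1/y - 1)/k, so the 50% point is simply the fitted x0. The sketch below checks that against brentq; sigmoid_inverse is a helper name introduced here for illustration, and the data are the ones from the question.

```python
import numpy as np
from scipy.optimize import brentq, curve_fit


def sigmoid(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))


def sigmoid_inverse(y, x0, k):
    """x at which the sigmoid equals y, valid for 0 < y < 1."""
    return x0 - np.log(1 / y - 1) / k


xdata = np.array([0.0, 1.0, 3.0, 4.3, 7.0, 8.0, 8.5, 10.0, 12.0])
ydata = np.array([0.01, 0.02, 0.04, 0.11, 0.43, 0.7, 0.89, 0.95, 0.99])
popt, pcov = curve_fit(sigmoid, xdata, ydata)

# Closed-form 50% point: ln(1/0.5 - 1) = ln(1) = 0, so this is just x0.
x_half = sigmoid_inverse(0.5, *popt)

# Same point from a numerical root search, for comparison.
x_root = brentq(lambda x: sigmoid(x, *popt) - 0.5, xdata.min(), xdata.max())
print(x_half, x_root)
```

The same helper gives any other level, for example sigmoid_inverse(0.9, *popt) for the 90% point.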

Compute divergence of vector field using python

Is there a function for computing the divergence of a vector field, like the one in MATLAB? I would expect it to exist in numpy/scipy, but I cannot find it using Google.
I need to calculate div[A * grad(F)], where
F = np.array([[1,2,3,4],[5,6,7,8]]) # (2D numpy ndarray)
A = np.array([[1,2,3,4],[1,2,3,4]]) # (2D numpy ndarray)
so grad(F) is a list of 2D ndarrays
I know I can calculate divergence like this but do not want to reinvent the wheel. (I would also expect something more optimized) Does anyone have suggestions?
Just a hint for everybody reading that:
the functions above do not compute the divergence of a vector field. they sum the derivatives of a scalar field A:
result = dA/dx + dA/dy
in contrast to a vector field (with three dimensional example):
result = sum dAi/dxi = dAx/dx + dAy/dy + dAz/dz
Vote down for all! It is mathematically simply wrong.
import numpy as np
def divergence(field):
    "return the divergence of a n-D field"
    return np.sum(np.gradient(field),axis=0)
Based on Juh_'s answer, but modified for the correct divergence of a vector field formula
def divergence(f):
    """
    Computes the divergence of the vector field f, corresponding to dFx/dx + dFy/dy + ...
    :param f: List of ndarrays, where every item of the list is one dimension of the vector field
    :return: Single ndarray of the same shape as each of the items in f, which corresponds to a scalar field
    """
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[i], axis=i) for i in range(num_dims)])
Matlab's documentation uses this exact formula (scroll down to Divergence of a Vector Field)
The answer of #user2818943 is good, but it can be optimized a little:
from functools import reduce # reduce is no longer a builtin in Python 3
def divergence(F):
    """ compute the divergence of n-D scalar field `F` """
    return reduce(np.add,np.gradient(F))
F = np.random.rand(100,100)
timeit reduce(np.add,np.gradient(F))
# 1000 loops, best of 3: 318 us per loop
timeit np.sum(np.gradient(F),axis=0)
# 100 loops, best of 3: 2.27 ms per loop
About 7 times faster:
np.sum implicitly constructs a 3d array from the list of gradient fields returned by np.gradient. Using reduce avoids this.
Now, in your question, what do you mean by div[A * grad(F)]?
1. About A * grad(F): A is a 2d array, and grad(F) is a list of 2d arrays. So I take it to mean multiplying each gradient field by A.
2. About applying the divergence to the gradient field (scaled by A): this is unclear. By definition, div(F) = d(F)/dx + d(F)/dy + .... I guess this is just an error of formulation.
For 1, multiplying the summed elements Bi by the same factor A can be factorized:
Sum(A*Bi) = A*Sum(Bi)
Thus, you can get this weighted gradient simply with: A*divergence(F)
If A is instead a list of factors, one for each dimension, then the solution would be:
def weighted_divergence(W,F):
    """
    Return the divergence of n-D array `F` with gradient weighted by `W`
    `W` is a list of factors for each dimension of F: the gradient of `F` over
    the `i`th dimension is multiplied by `W[i]`. Each `W[i]` can be a scalar
    or an array with same (or broadcastable) shape as `F`.
    """
    wGrad = map(np.multiply, W, np.gradient(F))
    return reduce(np.add,wGrad)
result = weighted_divergence(A,F)
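If the expression in the question is instead read literally, as the divergence of the vector field A * grad(F), then a different computation is needed: each component of the gradient is scaled by A first, and only then differentiated along its own axis. The sketch below illustrates that reading. It is an assumption about what the asker meant, div_of_scaled_grad is a hypothetical name, the grid is built with indexing='ij' so that axis i corresponds to coordinate i, and F must have at least two dimensions so that np.gradient returns one array per axis.

```python
import numpy as np


def div_of_scaled_grad(A, F, spacing):
    """Compute div(A * grad(F)) on a regular grid.

    A, F    : arrays of the same shape (F.ndim >= 2), axis i = coordinate i
    spacing : grid step along each axis, e.g. [dx, dy]
    """
    grads = np.gradient(F, *spacing)            # [dF/dx, dF/dy, ...]
    return sum(np.gradient(A * g, h, axis=i)    # d(A * dF/dx_i) / dx_i
               for i, (g, h) in enumerate(zip(grads, spacing)))


# Check against a case that can be done by hand:
# F = x^2 + y^2, A = x  ->  A*grad(F) = (2x^2, 2xy)  ->  divergence = 6x
x = np.linspace(-1.0, 1.0, 21)
y = np.linspace(-1.0, 1.0, 21)
X, Y = np.meshgrid(x, y, indexing="ij")
F = X**2 + Y**2
A = X
spacing = [x[1] - x[0], y[1] - y[0]]

result = div_of_scaled_grad(A, F, spacing)
expected = 6 * X

# Points near the border use one-sided differences, so compare the interior.
interior_error = np.max(np.abs(result - expected)[2:-2, 2:-2])
print(interior_error)
```

Note that this is generally not equal to A * divergence(grad(F)) when A varies in space, because the derivative also acts on A.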
What Daniel modified is the right answer. Let me explain the self-defined function divergence in more detail:
np.gradient() returns one array per axis: np.gradient(f) = [df/dx, df/dy, df/dz, ...]
But the divergence needs one derivative per component: divergence(f) = dfx/dx + dfy/dy + dfz/dz + ... = np.gradient(fx, axis=0) + np.gradient(fy, axis=1) + np.gradient(fz, axis=2) + ...
Let's test, compare with example of divergence in matlab
import numpy as np
import matplotlib.pyplot as plt
NY = 50
ymin = -2.
ymax = 2.
dy = (ymax -ymin )/(NY-1.)
NX = NY
xmin = -2.
xmax = 2.
dx = (xmax -xmin)/(NX-1.)
def divergence(f):
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[i], axis=i) for i in range(num_dims)])
y = np.array([ ymin + float(i)*dy for i in range(NY)])
x = np.array([ xmin + float(i)*dx for i in range(NX)])
x, y = np.meshgrid( x, y, indexing = 'ij', sparse = False)
Fx = np.cos(x + 2*y)
Fy = np.sin(x - 2*y)
F = [Fx, Fy]
g = divergence(F)
plt.pcolormesh(x, y, g)
plt.savefig( 'Div' + str(NY) +'.png', format = 'png')
---------- UPDATED VERSION: Include the differential Steps----------------
Thanks to the comment from #henry: np.gradient takes a default step of 1, so the results may not match the analytical values. We can provide our own differential steps.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
NY = 50
ymin = -2.
ymax = 2.
dy = (ymax -ymin )/(NY-1.)
NX = NY
xmin = -2.
xmax = 2.
dx = (xmax -xmin)/(NX-1.)
def divergence(f,h):
    """
    div(F) = dFx/dx + dFy/dy + ...
    g = np.gradient(Fx,dx, axis=1)+ np.gradient(Fy,dy, axis=0) #2D
    g = np.gradient(Fx,dx, axis=2)+ np.gradient(Fy,dy, axis=1) +np.gradient(Fz,dz,axis=0) #3D
    (the axis numbers above assume arrays laid out with x along the last axis;
    with indexing='ij', as used below, component i varies along axis i)
    """
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[i], h[i], axis=i) for i in range(num_dims)])
y = np.array([ ymin + float(i)*dy for i in range(NY)])
x = np.array([ xmin + float(i)*dx for i in range(NX)])
x, y = np.meshgrid( x, y, indexing = 'ij', sparse = False)
Fx = np.cos(x + 2*y)
Fy = np.sin(x - 2*y)
F = [Fx, Fy]
h = [dx, dy]
rows = 1
cols = 2
#g = np.gradient(Fx,dx, axis=0)+np.gradient(Fy,dy, axis=1) # equivalent to our func with indexing='ij'
g = divergence(F,h)
ax = plt.subplot(rows,cols,1,aspect='equal',title='div numerical')
#im=plt.pcolormesh(x, y, g)
im = plt.pcolormesh(x, y, g, shading='nearest', cmap='coolwarm')
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
cbar = plt.colorbar(im, cax = cax,format='%.1f')
g = -np.sin(x+2*y) -2*np.cos(x-2*y)
ax = plt.subplot(rows,cols,2,aspect='equal',title='div analytical')
#im=plt.pcolormesh(x, y, g)
im = plt.pcolormesh(x, y, g, shading='nearest', cmap='coolwarm')
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
cbar = plt.colorbar(im, cax = cax,format='%.1f')
plt.savefig( 'divergence.png', format = 'png')
Based on #paul_chen's answer, with some additions for Matplotlib 3.3.0 (a shading parameter needs to be passed, and I guess the default colormap has changed).
import numpy as np
import matplotlib.pyplot as plt
NY = 20; ymin = -2.; ymax = 2.
dy = (ymax -ymin )/(NY-1.)
NX = NY
xmin = -2.; xmax = 2.
dx = (xmax -xmin)/(NX-1.)
def divergence(f):
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[i], axis=i) for i in range(num_dims)])
y = np.array([ ymin + float(i)*dy for i in range(NY)])
x = np.array([ xmin + float(i)*dx for i in range(NX)])
x, y = np.meshgrid( x, y, indexing = 'ij', sparse = False)
Fx = np.cos(x + 2*y)
Fy = np.sin(x - 2*y)
F = [Fx, Fy]
g = divergence(F)
plt.pcolormesh(x, y, g, shading='nearest', cmap='coolwarm')
plt.savefig( 'Div.png', format = 'png')
The divergence as a built-in function is included in matlab, but not numpy. This is the sort of thing that it may perhaps be worthwhile to contribute to pylab, an effort to create a viable open-source alternative to matlab.
Edit: Now called
As far as I can tell, the answer is that there is no native divergence function in numpy. Therefore, the best method for calculating divergence is to sum the components of the gradient vector i.e. calculate the divergence.
I don't think the answer by #Daniel is correct, especially when the input is in order [Fx, Fy, Fz, ...].
A simple test case
See the MATLAB code:
a = [1 2 3;1 2 3; 1 2 3];
b = [[7 8 9] ;[1 5 8] ;[2 4 7]];
divergence(a,b)
which gives the result:
ans =
-5.0000 -2.0000 0
-1.5000 -1.0000 0
2.0000 0 0
and Daniel's solution:
def divergence(f):
    """
    Daniel's solution
    Computes the divergence of the vector field f, corresponding to dFx/dx + dFy/dy + ...
    :param f: List of ndarrays, where every item of the list is one dimension of the vector field
    :return: Single ndarray of the same shape as each of the items in f, which corresponds to a scalar field
    """
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[i], axis=i) for i in range(num_dims)])
if __name__ == '__main__':
    a = np.array([[1, 2, 3]] * 3)
    b = np.array([[7, 8, 9], [1, 5, 8], [2, 4, 7]])
    div = divergence([a, b])
    print(div)
which gives:
[[1. 1. 1. ]
[4. 3.5 3. ]
[2. 2.5 3. ]]
The mistake in Daniel's solution is that, for arrays laid out as in this test (rows along y, columns along x), the x axis is the last NumPy axis rather than the first. np.gradient(x, axis=0) therefore gives the gradient in the y direction (when x is a 2d array).
My solution
Here is my solution, based on Daniel's answer.
def divergence(f):
    """
    Computes the divergence of the vector field f, corresponding to dFx/dx + dFy/dy + ...
    :param f: List of ndarrays, where every item of the list is one dimension of the vector field
    :return: Single ndarray of the same shape as each of the items in f, which corresponds to a scalar field
    """
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[num_dims - i - 1], axis=i) for i in range(num_dims)])
which gives the same result as MATLAB divergence in my test case.
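Whether component i belongs with axis i or with the reversed axis depends on how the grid was built, and this can be checked numerically against a field whose divergence is known. The sketch below is an illustration, not code from any of the answers: it uses the field Fx = cos(x + 2y), Fy = sin(x - 2y) that appears elsewhere in this thread, builds the grid once with indexing='ij' and once with NumPy's default indexing='xy', and passes the grid spacing to np.gradient explicitly. The function names are made up for this example.

```python
import numpy as np


def divergence_ij(field, spacing):
    """Divergence for grids built with meshgrid(..., indexing='ij'):
    component i is differentiated along axis i."""
    return sum(np.gradient(comp, h, axis=i)
               for i, (comp, h) in enumerate(zip(field, spacing)))


def divergence_xy(field, spacing):
    """Divergence for MATLAB-style layouts (x along the last axis):
    component i is differentiated along axis ndim - 1 - i."""
    n = len(field)
    return sum(np.gradient(comp, h, axis=n - 1 - i)
               for i, (comp, h) in enumerate(zip(field, spacing)))


n_points = 201
x = np.linspace(-2.0, 2.0, n_points)
y = np.linspace(-2.0, 2.0, n_points)
dx, dy = x[1] - x[0], y[1] - y[0]


def analytic(X, Y):
    # d/dx cos(x + 2y) + d/dy sin(x - 2y)
    return -np.sin(X + 2 * Y) - 2 * np.cos(X - 2 * Y)


# Layout 1: indexing='ij', x varies along axis 0.
X, Y = np.meshgrid(x, y, indexing="ij")
num_ij = divergence_ij([np.cos(X + 2 * Y), np.sin(X - 2 * Y)], [dx, dy])
err_ij = np.max(np.abs(num_ij - analytic(X, Y))[1:-1, 1:-1])

# Layout 2: default indexing='xy', x varies along axis 1.
X2, Y2 = np.meshgrid(x, y)
num_xy = divergence_xy([np.cos(X2 + 2 * Y2), np.sin(X2 - 2 * Y2)], [dx, dy])
err_xy = np.max(np.abs(num_xy - analytic(X2, Y2))[1:-1, 1:-1])

print(err_ij, err_xy)
```

Both errors should be small on this grid, which suggests that the two formulas in the thread are each correct for one of the two array layouts rather than one being right and the other wrong.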
Somehow the previous attempts to compute the divergence are wrong! Let me show you:
We have the following vector field F:
F_x = cos(x+2y)
F_y = sin(x-2y)
If we compute the divergence (using Mathematica):
Div[{Cos[x + 2*y], Sin[x - 2*y]}, {x, y}]
we get:
-2 Cos[x - 2 y] - Sin[x + 2 y]
which has a maximum value in the range of y [-1,2] and x [-2,2]:
N[Max[Table[-2 Cos[x - 2 y] - Sin[x + 2 y], {x, -2, 2 }, {y, -2, 2}]]] = 2.938
Using the divergence equation given here:
def divergence(f):
    num_dims = len(f)
    return np.ufunc.reduce(np.add, [np.gradient(f[i], axis=i) for i in range(num_dims)])
we get a maximum value of about 0.625
Correct divergence function: Compute divergence with python

