I have a fairly well-known issue, but so far I have been unable to fix it. It concerns the curve_fit function; I get the error:
File "scipy/optimize/minpack.py", line 765, in curve_fit
    raise ValueError("sigma has incorrect shape.")
Here is the code. Don't pay attention to the for loop; it is only there because I want 5 different histograms:
for i in range(5):
    mean_o[i] = np.mean(y3[:, i])
    sigma_o[i] = np.std(y3[:, i])

## Histograms
# Number of bins
Nbins = 100
binwidth = np.zeros(5)

# Fitting curves
def gaussian(x, a, mean, sigma):
    return a * np.exp(-((x - mean)**2 / (2 * sigma**2)))

for i in range(5):
    binwidth[i] = (max(y3[:, i]) - min(y3[:, i])) / Nbins
    bins_plot = np.arange(min(y3[:, i]), max(y3[:, i]) + binwidth[i], binwidth[i])
    plt.title('Distribution of O observable for redshift bin = ' + str(z_ph_fid[i]))
    plt.hist(y3[:, i], bins=bins_plot, label='bin ' + str(z_ph_fid[i]))
    plt.legend(loc='upper right')
    # Fitting and plot
    range_fit = np.linspace(min(y3[:, i]), max(y3[:, i]), len(y3[:, i]))
    popt, pcov = curve_fit(gaussian, range_fit, y3[:, i], mean_o[i], sigma_o[i])
    plt.plot(range_fit, gaussian(range_fit, *popt))
    # Save figure
    plt.savefig('chi2_gaussian_bin_' + str(i + 1) + '.png')
    plt.close()
The first histogram (i = 0) looks like this:
I would like to plot a red Gaussian fit over the histogram.
There are two problems with the OP's approach.
The first problem is that the code tries to fit the random samples themselves with a normal distribution, treating them as if they were (x, y) data pairs. This is wrong. One can fit the output of the histogram, though; this is shown in the code below. Better yet is scipy.stats.norm.fit(), which fits the distribution directly to the random samples. This is also shown.
The second problem is the sigma shape. In curve_fit, sigma stands for the errors on the y-data, so it naturally needs the shape of the y-data. In the OP's call, mean_o[i] and sigma_o[i] are passed as the fourth and fifth positional arguments, so the scalar sigma_o[i] lands in the sigma slot, hence the error. What should have been done is to provide start values for the fit via p0. This is shown below as well.
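For illustration, a minimal sketch of where the start values belong (xdata, ydata and the start values a0, mean0, sigma0 are placeholders):
# start values go into the keyword p0 as one tuple;
# sigma, if given at all, must be an array with the shape of ydata
popt, pcov = curve_fit(gaussian, xdata, ydata, p0=(a0, mean0, sigma0))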
Code looks like:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

mean_o = list()
sigma_o = list()
y3 = list()

### generate some data
for i in range(5):
    y3.append(norm.rvs(size=150000))
y3 = np.transpose(y3)

for i in range(5):
    mean_o.append(np.mean(y3[:, i]))
    sigma_o.append(np.std(y3[:, i]))

## Histograms
# Number of bins
Nbins = 100
binwidth = np.zeros(5)

# Fitting curves
def gaussian(x, a, mean, sigma):
    return a * np.exp(-((x - mean)**2 / (2 * sigma**2)))

fig = plt.figure()
ax = {i: fig.add_subplot(2, 3, i + 1) for i in range(5)}

for i in range(5):
    ymin = min(y3[:, i])
    ymax = max(y3[:, i])
    binwidth[i] = (ymax - ymin) / Nbins
    bins_plot = np.arange(ymin, ymax + binwidth[i], binwidth[i])
    histdata = ax[i].hist(
        y3[:, i],
        bins=bins_plot,
        label='bin ' + str(i)
    )
    range_fit = np.linspace(ymin, ymax, 250)
    # Fitting and plot version 1: fit the histogram output
    popt, pcov = curve_fit(
        gaussian,
        0.5 * (histdata[1][:-1] + histdata[1][1:]),  # bin centres
        histdata[0],                                 # bin counts
        p0=(max(histdata[0]), mean_o[i], sigma_o[i])
    )
    ax[i].plot(range_fit, gaussian(range_fit, *popt))
    ax[i].axvline(x=mean_o[i], ls=':', c='r')
    # Fitting and plot version 2: fit the samples directly
    params = norm.fit(y3[:, i], loc=mean_o[i], scale=sigma_o[i])
    # scale the pdf amplitude to counts: N * binwidth / (sigma * sqrt(2 pi))
    nth = gaussian(
        range_fit,
        len(y3[:, i]) * binwidth[i] / (params[1] * np.sqrt(2 * np.pi)),
        *params
    )
    ax[i].plot(range_fit, nth, ls="--")

plt.tight_layout()
plt.show()
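For reference, the amplitude passed to gaussian() in version 2 follows from converting the fitted pdf into expected bin counts: with N samples and bin width Δx, counts(x) ≈ N Δx pdf(x) = N Δx / (σ √(2π)) · exp(−(x − μ)² / (2σ²)), which is where the prefactor N · binwidth / (σ √(2π)) comes from.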
I want to fit my data with two linear functions (a broken power law) with one breaking point, which is user-given. Currently I'm using the curve_fit function from the scipy.optimize module. Here are my datasets: frequencies, binned data, errors.
Here is my code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

freqs = np.loadtxt('binf11.dat')
binys = np.loadtxt('binp11.dat')
errs = np.loadtxt('bine11.dat')

def brkPowLaw(xArray, breakp, slopeA, offsetA, slopeB):
    returnArray = []
    for x in xArray:
        if x <= breakp:
            returnArray.append(slopeA * x + offsetA)
        elif x > breakp:
            returnArray.append(slopeB * x + offsetA)
    return returnArray

# define initial guesses, breakpoint = -3.2
a_fit, cov = curve_fit(brkPowLaw, freqs, binys, sigma=errs, p0=(-3.2, -2.0, -2.0, -2.0))

modelPredictions = brkPowLaw(freqs, *a_fit)

plt.errorbar(freqs, binys, yerr=errs, fmt='kp', fillstyle='none', elinewidth=1)
plt.xlim(-5, -2)
plt.plot(freqs, modelPredictions, 'r')
The offset of the second linear function is set equal to the offset of the first one.
It looks like this works, but I get this fit:
I thought that the condition in my brkPowLaw function should suffice, but it does not. What I want is for the first linear function to fit the data up to a chosen breaking point, and for a second linear fit to take over from that point on, without the hump that shows up in the plot: there it looks as if there were two breaking points and three linear segments, which is not what I expected or wanted. In other words, the second linear fit should start from the point where the first one ended.
I have tried the numpy.piecewise function with no plausible result, and I looked into some topics like this or this, but I did not manage to make my script work.
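For reference, the kind of continuous broken line I am after would look something like this sketch, with the second offset eliminated through the continuity condition slopeA * breakp + offsetA == slopeB * breakp + offsetB:
import numpy as np

def brokenLine(x, breakp, slopeA, offsetA, slopeB):
    # continuity at the breakpoint fixes the second offset
    offsetB = offsetA + (slopeA - slopeB) * breakp
    return np.piecewise(x, [x <= breakp, x > breakp],
                        [lambda t: slopeA * t + offsetA,
                         lambda t: slopeB * t + offsetB])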
Thank you for your time
This would be my approach: not with linear but with quadratic functions, using a smooth step that is sharpened iteratively.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def soft_step(x, s):  ### not my usual np.tanh()
    return 1 + 0.5 * s * x / np.sqrt(1 + (s * x)**2)

### for the looks of the data I decided to go for two parabolas with
### one discontinuity
def fit_func(x, a0, b0, c0, a1, b1, c1, x0, s):
    out = (a0 * x**2 + b0 * x + c0) * (1 - soft_step(x - x0, s))
    out += (a1 * x**2 + b1 * x + c1) * soft_step(x - x0, s)
    return out

### with global parameter for iterative fit
### if using least_squares one could avoid globals
def fit_short(x, a0, b0, c0, a1, b1, c1, x0):
    global stepwidth
    return fit_func(x, a0, b0, c0, a1, b1, c1, x0, stepwidth)

### getting data
xl = np.loadtxt("binf11.dat")
yl = np.loadtxt("binp11.dat")
el = np.loadtxt("bine11.dat")

### check for initial values
p0 = [0, -2, -11, 0, -2, -9, -3, 10]
xth = np.linspace(-5.5, -1.5, 250)
yth = np.fromiter((fit_func(x, *p0) for x in xth), float)

### initial fit
sol, pcov = curve_fit(fit_func, xl, yl, sigma=el, p0=p0, absolute_sigma=True)
yft = np.fromiter((fit_func(x, *sol) for x in xth), float)
sol = sol[:-1]

### iterating with fixed and decreasing softness in the step
for stepwidth in range(10, 55, 5):
    sol, pcov = curve_fit(fit_short, xl, yl, sigma=el, p0=sol, absolute_sigma=True)
    ### printing the step position
    print(sol[-1])
yiter = np.fromiter((fit_short(x, *sol) for x in xth), float)
print(sol)

### plotting
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# ~ ax.plot(xth, yth)  ### no need to show start parameters
ax.plot(xth, yft)    ### first fit with variable softness
ax.plot(xth, yiter)  ### last fit with fixed softness of 50
ax.errorbar(xl, yl, el, marker='o', ls='')  ### data
plt.show()
This gives:
-3.1762721614559712
-3.1804393481217477
-3.1822672190583603
-3.183493292415725
-3.1846976088390333
-3.185974760198917
-3.1872472903175266
-3.188427041827035
-3.1894705102541843
[ -0.78797351 -5.33255174 -12.48258537 0.53024954 1.14252783 -4.44589397 -3.18947051]
and the following plot, which puts the jump at -3.189.
I have a list of over 500 points, given in Latitude and Longitudes. These points represent craters, and I want to plot a heatmap of these craters. For example, I want an area with a lot of craters to be considered "hot" and fewer craters to be "cold". I have looked at KDE using SciPy, and also tried using ListSliceDensityPlot3D in Mathematica, but I have been unable to create a graph that is adequate.
I converted each point from latitude/longitude into Cartesian [x,y,z] coordinates, and plotted them on the surface of a sphere, but I don't know how I would take the list of points and calculate the density in a given area, and then plot it on a 3D surface.
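For reference, the conversion has this form (a sketch; radius R and angles in radians assumed):
import numpy as np

def latlon_to_xyz(lat, lon, R=1.0):
    # latitude lat in [-pi/2, pi/2], longitude lon in [-pi, pi]
    x = R * np.cos(lat) * np.cos(lon)
    y = R * np.cos(lat) * np.sin(lon)
    z = R * np.sin(lat)
    return np.array([x, y, z])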
The idea being that I obtain a plot something like this image of Ceres!
Thanks in advance, please ask questions if needed, sorry if I didn't post enough information initially.
This is sort of a brute-force method, but it works up to a certain point.
It will become problematic if you make the mesh extremely fine or have thousands of craters. If the bin size is small enough, there is no big difference between the distance on the surface and the 3D distance, so I took the latter, as it is easier to calculate; one may want to change this, though (see the sketch below).
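If one does want the surface distance, the 3D chord converts to the great-circle arc in closed form; a minimal sketch (sphere radius R assumed):
import numpy as np

def arc_from_chord(chord, R=10.0):
    # a chord of length 2*R*sin(theta/2) subtends the arc R*theta,
    # hence arc = 2*R*arcsin(chord / (2*R)); for small bins arc and chord nearly coincide
    return 2.0 * R * np.arcsin(chord / (2.0 * R))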
Code looks like:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm

def random_point(r=1):
    ct = 2 * np.random.rand() - 1
    st = np.sqrt(1 - ct**2)
    phi = 2 * np.pi * np.random.rand()
    x = r * st * np.cos(phi)
    y = r * st * np.sin(phi)
    z = r * ct
    return np.array([x, y, z])

def near(p, pntList, d0):
    cnt = 0
    for pj in pntList:
        dist = np.linalg.norm(p - pj)
        if dist < d0:
            cnt += 1 - dist / d0
    return cnt

"""
https://stackoverflow.com/questions/22128909/plotting-the-temperature-distribution-on-a-sphere-with-python
"""
pointList = np.array([random_point(10.05) for i in range(65)])

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')

u = np.linspace(0, 2 * np.pi, 120)
v = np.linspace(0, np.pi, 60)

# create the sphere surface
XX = 10 * np.outer(np.cos(u), np.sin(v))
YY = 10 * np.outer(np.sin(u), np.sin(v))
ZZ = 10 * np.outer(np.ones(np.size(u)), np.cos(v))

WW = XX.copy()
for i in range(len(XX)):
    for j in range(len(XX[0])):
        x = XX[i, j]
        y = YY[i, j]
        z = ZZ[i, j]
        WW[i, j] = near(np.array([x, y, z]), pointList, 3)
WW = WW / np.amax(WW)
myheatmap = WW

# ~ ax.scatter(*zip(*pointList), color='#dd00dd')
ax.plot_surface(XX, YY, ZZ, cstride=1, rstride=1, facecolors=cm.jet(myheatmap))
plt.show()
The outcome looks like this:
You can also modify the distance function to account for crater size, maybe.
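A hypothetical variant of near() weighting each crater by a size radList[j] could look like this (names are made up):
def near_weighted(p, pntList, radList, d0):
    cnt = 0
    for pj, rj in zip(pntList, radList):
        dist = np.linalg.norm(p - pj)
        if dist < d0:
            cnt += rj * (1 - dist / d0)  # larger craters contribute more
    return cnt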
There are two separate problems here: defining suitable bin sizes and count functions on a sphere so that you can build an appropriate colour function, and then plotting that colour function on a 3D sphere. I'll provide a Mathematica solution for both.
1. Making up example data
To start with, here's some example data:
data = Map[
  Apply@Function[{x, y, z},
    {
     Mod[ArcTan[x, y], 2 π],
     ArcSin[z/Sqrt[x^2 + y^2 + z^2]]
    }
  ],
  Map[
   #/Norm[#] &,
   Select[
    RandomReal[{-1, 1}, {200000, 3}],
    And[
      Norm[#] > 0.01,
      Or[
       Norm[# - {0, 0.3, 0.2}] < 0.6,
       Norm[# - {-0.3, -0.15, -0.3}] < 0.3
      ]
     ] &
   ]
  ]
]
I've made it a bit lumpy so that it will have more interesting features when it comes to plot time.
2. Building a colour function
To build a colour function within Mathematica, the cleanest solution is to use HistogramList, but its output needs to be adjusted for the fact that bins at high latitude cover a smaller area, so the density needs to be rescaled accordingly.
Nevertheless, the in-built histogram-building tools are pretty good:
DensityHistogram[
data,
{5°}
, AspectRatio -> Automatic
, PlotRangePadding -> None
, ImageSize -> 700
]
You can get the raw data via
{{ϕbins, θbins}, counts} = HistogramList[data, {15°}]
and then for convenience let's define
ϕcenters = 1/2 (Most[ϕbins] + Rest[ϕbins])
θcenters = 1/2 (Most[θbins] + Rest[θbins])
with the bin area calculated using
SectorArea[ϕmin_, ϕmax_, θmin_, θmax_] = (Abs[ϕmax - ϕmin]/(4 π)) *
Integrate[Sin[θ], {θ, θmin, θmax}]
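For reference, the integral evaluates in closed form, so each bin weight is
SectorArea(ϕmin, ϕmax, θmin, θmax) = |ϕmax - ϕmin| (Cos[θmin] - Cos[θmax]) / (4 π),
i.e. the bin's solid angle as a fraction of the full 4π.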
This then allows you to define your own color function as
function[ϕ_, θ_] := With[{
iϕ = First[Nearest[ϕcenters -> Range[Length[ϕcenters]], ϕ]],
iθ = First[Nearest[θcenters -> Range[Length[θcenters]], θ]]
},
(N#counts[[iϕ, iθ]]/
SectorArea[ϕbins[[iϕ]], ϕbins[[iϕ + 1]], θbins[[iθ]], θbins[[iθ + 1]]])/max
]
So, here's that function in action:
texture = ListDensityPlot[
Flatten[
Table[
{
ϕcenters[[iϕ]],
θcenters[[iθ]],
function[ϕcenters[[iϕ]], θcenters[[iθ]]]
}
, {iϕ, Length[ϕbins] - 1}
, {iθ, Length[θbins] - 1}
], 1]
, InterpolationOrder -> 0
, AspectRatio -> Automatic
, ColorFunction -> ColorData["GreenBrownTerrain"]
, Frame -> None
, PlotRangePadding -> None
]
3. Plotting
To plot the data on a sphere, I see two main options: you can make a surface plot and then wrap that around a parametric plot as a Texture, as in
ParametricPlot3D[
{Cos[ϕ] Sin[θ], Sin[ϕ] Sin[θ],
Cos[θ]}
, {ϕ, 0, 2 π}, {θ, 0, π}
, Mesh -> None
, Lighting -> "Neutral"
, PlotStyle -> Directive[
Specularity[White, 30],
Texture[texture]
]
]
Or you can define it as an explicit ColorFunction in that same parametric plot:
ParametricPlot3D[
{Cos[ϕ] Sin[θ], Sin[ϕ] Sin[θ],
Cos[θ]}
, {ϕ, 0, 2 π}, {θ, 0, π}
, ColorFunctionScaling -> False
, ColorFunction -> Function[{x, y, z, ϕ, θ},
ColorData["GreenBrownTerrain"][function[ϕ, θ]]
]
]
All of the above is of course very modular, so you're free to mix-and-match to your advantage.
I'm trying to fit some data to a power law using Python. The problem is that some of my points are upper limits, which I don't know how to include in the fitting routine.
In the data, I have encoded the upper limits as errors in y equal to 1, while the rest of the errors are much smaller. You can set these errors to 0 and change the uplims list generator, but then the fit is terrible.
The code is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.odr import *

# Initiate some data
x = [1.73e-04, 5.21e-04, 1.57e-03, 4.71e-03, 1.41e-02, 4.25e-02, 1.28e-01, 3.84e-01, 1.15e+00]
x_err = [1e-04, 1e-04, 1e-03, 1e-03, 1e-02, 1e-02, 1e-01, 1e-01, 1e-01]
y = [1.26e-05, 8.48e-07, 2.09e-08, 4.11e-09, 8.22e-10, 2.61e-10, 4.46e-11, 1.02e-11, 3.98e-12]
y_err = [1, 1, 2.06e-08, 2.5e-09, 5.21e-10, 1.38e-10, 3.21e-11, 1, 1]

# Define upper limits
uplims = np.ones(len(y_err), dtype='bool')
for i in range(len(y_err)):
    if y_err[i] < 1:
        uplims[i] = 0
    else:
        uplims[i] = 1
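# (equivalently, one could build the mask in one line: uplims = np.asarray(y_err) >= 1)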
# Define a function (power law in our case) to fit the data with.
def function(p, x):
    m, c = p
    return m * x**(-c)

# Create a model for fitting.
model = Model(function)

# Create a RealData object using our initiated data from above.
data = RealData(x, y, sx=x_err, sy=y_err)

# Set up ODR with the model and data.
odr = ODR(data, model, beta0=[1e-09, 2])
odr.set_job(fit_type=0)  # 0 is full ODR and 2 is least squares; AFAIK, it doesn't change within errors
# more details in https://docs.scipy.org/doc/scipy/reference/generated/scipy.odr.ODR.set_job.html

# Run the regression.
out = odr.run()

# Use the in-built pprint method to give us results.
# out.pprint()  # this prints much information, but normally we don't need it, just the
# parameters and errors; the residual variation is the reduced chi square estimator
print('amplitude = %5.2e +/- %5.2e \nindex = %5.2f +/- %5.2f \nchi square = %12.8f'
      % (out.beta[0], out.sd_beta[0], out.beta[1], out.sd_beta[1], out.res_var))

# Generate fitted data.
x_fit = np.linspace(x[0], x[-1], 1000)  # to do the fit only within the x interval; we can always extrapolate it, of course
y_fit = function(out.beta, x_fit)

# Generate a plot to show the data, errors, and fit.
fig, ax = plt.subplots()
ax.errorbar(x, y, xerr=x_err, yerr=y_err, uplims=uplims, linestyle='None', marker='x')
ax.loglog(x_fit, y_fit)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$f(x) = m \cdot x^{-c}$')
ax.set_title('Power Law fit')
plt.show()
The result of the fit is:
amplitude = 3.42e-12 +/- 5.32e-13
index = 1.33 +/- 0.04
chi square = 0.01484021
As you can see in the plot, the first two and the last two points are upper limits, and the fit is not taking them into account. Moreover, at the penultimate point the fit goes above it, even though that should be strictly forbidden.
I need the fit to know that these limits are very strict: it should not try to fit the points themselves but consider them only as limits. How could I do this with the odr routine (or any other code that does the fit and gives me a chi-square-like estimator)?
Please take into account that I need to be able to change the function to other generalizations easily, so things like the powerlaw module are not desirable.
Thanks!
This answer is related to this post, where I discuss fitting with x and y errors. It hence does not require the ODR module but can be done manually, so one can use leastsq or minimize.

Concerning the constraints, I made clear in other posts that I try to avoid them if possible, and that is possible here too, although the details of the programming and maths are a little cumbersome, especially if the result is supposed to be stable and foolproof. I will just give a rough idea. Say we want y0 > m * x0**(-c). In log-form we can write this as eta0 > mu - c * xeta0, i.e. there is an alpha such that eta0 = mu - c * xeta0 + alpha**2. The same holds for the other inequalities. For the second upper limit you get a beta**2, but you can decide which of the two is the smaller one, so you automatically fulfil the other condition. The same works for the lower limits with a gamma**2 and a delta**2. Say we can work with alpha and gamma. We can combine the inequality conditions to relate those two as well. At the end we can fit a sigma, with alpha = sqrt(s - t) * sigma / sqrt(sigma**2 + 1), where s and t are derived from the inequalities. The sigma / sqrt(sigma**2 + 1) function is just one option to let alpha vary in a certain range, i.e. alpha**2 < s - t; the fact that the radicand may become negative shows that there are cases without a solution. With alpha known, mu, and therefore m, is calculated. So the fit parameters are c and sigma, which takes the inequalities into account and makes m a dependent quantity. I tried it and it works, but the version at hand is not the most stable one. I'd post it upon request.
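A minimal sketch of this reparametrisation for a single upper limit (the function name is made up):
import numpy as np

# upper limit y0 >= m * x0**(-c); in log-form log(m) <= log(y0) + c*log(x0),
# so setting log(m) = log(y0) + c*log(x0) - alpha**2 fulfils it for any real alpha
def m_from_alpha(c, alpha, x0, y0):
    return np.exp(np.log(y0) + c * np.log(x0) - alpha**2)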
As we already have a handmade residual function, there is a second option, though: we introduce our own chi**2 function and use minimize, which allows constraints. As minimize and its constraints keyword are very flexible, and as the residual function is easily modified for functions other than m * x**(-c), the overall construction is quite flexible. It looks as follows:
import matplotlib.pyplot as plt
import numpy as np
from random import random, seed
from scipy.optimize import minimize, leastsq

seed(7563)
fig1 = plt.figure(1)

### for Gaussian distributed errors
def boxmuller(x0, sigma):
    u1 = random()
    u2 = random()
    ll = np.sqrt(-2 * np.log(u1))
    z0 = ll * np.cos(2 * np.pi * u2)
    z1 = ll * np.sin(2 * np.pi * u2)
    return sigma * z0 + x0, sigma * z1 + x0

### for plotting ellipses
def ell_data(a, b, x0=0, y0=0):
    tList = np.linspace(0, 2 * np.pi, 150)
    k = float(a) / float(b)
    rList = [a / np.sqrt((np.cos(t))**2 + (k * np.sin(t))**2) for t in tList]
    xyList = np.array([[x0 + r * np.cos(t), y0 + r * np.sin(t)] for t, r in zip(tList, rList)])
    return xyList

### function to fit
def f(x, m, c):
    y = abs(m) * abs(x)**(-abs(c))
    return y

### how to rescale the ellipse to make the fit function a tangent
def elliptic_rescale(x, m, c, x0, y0, sa, sb):
    y = f(x, m, c)
    r = np.sqrt((x - x0)**2 + (y - y0)**2)
    kappa = float(sa) / float(sb)
    tau = np.arctan2(y - y0, x - x0)
    new_a = r * np.sqrt(np.cos(tau)**2 + (kappa * np.sin(tau))**2)
    return new_a

### residual function to calculate chi-square
def residuals(parameters, dataPoint):  # data point is (x, y, sx, sy)
    m, c = parameters
    theData = np.array(dataPoint)
    best_t_List = []
    for i in range(len(dataPoint)):
        x, y, sx, sy = dataPoint[i][0], dataPoint[i][1], dataPoint[i][2], dataPoint[i][3]
        ### get the point on the graph where it is tangent to an error-ellipse
        ed_fit = minimize(elliptic_rescale, x, args=(m, c, x, y, sx, sy))
        best_t = ed_fit['x'][0]
        best_t_List += [best_t]
    best_y_List = [f(t, m, c) for t in best_t_List]
    ## weighted distance, not squared yet, as this is done by scipy.optimize.leastsq
    weighted_dx_List = [(x_b - x_f) / sx for x_b, x_f, sx in zip(best_t_List, theData[:, 0], theData[:, 2])]
    weighted_dy_List = [(y_b - y_f) / sy for y_b, y_f, sy in zip(best_y_List, theData[:, 1], theData[:, 3])]
    return weighted_dx_List + weighted_dy_List

def chi2(params, pnts):
    r = np.array(residuals(params, pnts))
    s = sum(x**2 for x in r)
    return s

def myUpperIneq(params, pnt):
    m, c = params
    x, y = pnt
    return y - f(x, m, c)

def myLowerIneq(params, pnt):
    m, c = params
    x, y = pnt
    return f(x, m, c) - y

### to create some test data
def test_data(m, c, xList, const_sx, rel_sx, const_sy, rel_sy):
    yList = [f(x, m, c) for x in xList]
    xErrList = [boxmuller(x, const_sx + x * rel_sx)[0] for x in xList]
    yErrList = [boxmuller(y, const_sy + y * rel_sy)[0] for y in yList]
    return xErrList, yErrList

### some start values
mm_0 = 2.3511
expo_0 = .3588
csx, rsx = .01, .07
csy, rsy = .04, .09

limitingPoints = dict()
limitingPoints[0] = np.array([[.2, 5.4], [.5, 5.0], [5.1, .9], [5.7, .9]])
limitingPoints[1] = np.array([[.2, 5.4], [.5, 5.0], [5.1, 1.5], [5.7, 1.2]])
limitingPoints[2] = np.array([[.2, 3.4], [.5, 5.0], [5.1, 1.1], [5.7, 1.2]])
limitingPoints[3] = np.array([[.2, 3.4], [.5, 5.0], [5.1, 1.7], [5.7, 1.2]])

#### some data
xThData = np.linspace(.2, 5, 15)
yThData = [f(x, mm_0, expo_0) for x in xThData]

### some noisy data
xNoiseData, yNoiseData = test_data(mm_0, expo_0, xThData, csx, rsx, csy, rsy)
xGuessdError = [csx + rsx * x for x in xNoiseData]
yGuessdError = [csy + rsy * y for y in yNoiseData]

for testing in range(4):
    ### now fitting with limits
    zipData = list(zip(xNoiseData, yNoiseData, xGuessdError, yGuessdError))
    estimate = [2.4, .3]
    con0 = {'type': 'ineq', 'fun': myUpperIneq, 'args': (limitingPoints[testing][0],)}
    con1 = {'type': 'ineq', 'fun': myUpperIneq, 'args': (limitingPoints[testing][1],)}
    con2 = {'type': 'ineq', 'fun': myLowerIneq, 'args': (limitingPoints[testing][2],)}
    con3 = {'type': 'ineq', 'fun': myLowerIneq, 'args': (limitingPoints[testing][3],)}
    myResult = minimize(chi2, estimate, args=(zipData,), constraints=[con0, con1, con2, con3])
    print("############")
    print(myResult)

    ### plot that
    ax = fig1.add_subplot(4, 2, 2 * testing + 1)
    ax.plot(xThData, yThData)
    ax.errorbar(xNoiseData, yNoiseData, xerr=xGuessdError, yerr=yGuessdError, fmt='none', ecolor='r')

    testX = np.linspace(.2, 6, 25)
    testY = np.fromiter((f(x, myResult.x[0], myResult.x[1]) for x in testX), float)

    bx = fig1.add_subplot(4, 2, 2 * testing + 2)
    bx.plot(xThData, yThData)
    bx.errorbar(xNoiseData, yNoiseData, xerr=xGuessdError, yerr=yGuessdError, fmt='none', ecolor='r')

    ax.plot(limitingPoints[testing][:, 0], limitingPoints[testing][:, 1], marker='x', linestyle='')
    bx.plot(limitingPoints[testing][:, 0], limitingPoints[testing][:, 1], marker='x', linestyle='')
    ax.plot(testX, testY, linestyle='--')
    bx.plot(testX, testY, linestyle='--')
    bx.set_xscale('log')
    bx.set_yscale('log')

plt.show()
This gives the following results:
############
status: 0
success: True
njev: 8
nfev: 36
fun: 13.782127248002116
x: array([ 2.15043226, 0.35646436])
message: 'Optimization terminated successfully.'
jac: array([-0.00377715, 0.00350225, 0. ])
nit: 8
############
status: 0
success: True
njev: 7
nfev: 32
fun: 41.372277637885716
x: array([ 2.19005695, 0.23229378])
message: 'Optimization terminated successfully.'
jac: array([ 123.95069313, -442.27114677, 0. ])
nit: 7
############
status: 0
success: True
njev: 5
nfev: 23
fun: 15.946621924326545
x: array([ 2.06146362, 0.31089065])
message: 'Optimization terminated successfully.'
jac: array([-14.39131606, -65.44189298, 0. ])
nit: 5
############
status: 0
success: True
njev: 7
nfev: 34
fun: 88.306027468763432
x: array([ 2.16834392, 0.14935514])
message: 'Optimization terminated successfully.'
jac: array([ 224.11848736, -791.75553417, 0. ])
nit: 7
I checked four different sets of limiting points (rows). The results are displayed on a normal and on a logarithmic scale (columns). With some additional work you could get errors as well.
Update on asymmetric errors
To be honest, at the moment I do not know how to handle this property. Naively, I'd define my own asymmetric loss function similar to this post.
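Such a naive asymmetric loss, for pure y errors, might look like this sketch (names are made up):
def asymmetric_residual(y_data, y_model, s_minus, s_plus):
    # different sigmas above and below the model scale the residual differently
    r = y_data - y_model
    return r / s_plus if r > 0 else r / s_minus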
With x and y errors I handle this by quadrant, instead of just checking the positive or negative side; my error ellipse hence changes to four connected pieces.
Nevertheless, it is somewhat reasonable. For testing, and to show how it works, I made an example with a linear function. I guess the OP can combine the two pieces of code according to his requirements.
In case of a linear fit it looks like this:
import matplotlib.pyplot as plt
import numpy as np
from random import random, seed
from scipy.optimize import minimize, leastsq

# ~ seed(7563)
fig1 = plt.figure(1)
ax = fig1.add_subplot(2, 1, 1)
bx = fig1.add_subplot(2, 1, 2)

### function to fit, here only linear for testing
def f(x, m, y0):
    y = m * x + y0
    return y

### for Gaussian distributed errors
def boxmuller(x0, sigma):
    u1 = random()
    u2 = random()
    ll = np.sqrt(-2 * np.log(u1))
    z0 = ll * np.cos(2 * np.pi * u2)
    z1 = ll * np.sin(2 * np.pi * u2)
    return sigma * z0 + x0, sigma * z1 + x0

### for plotting ellipse quadrants
def ell_data(aN, aP, bN, bP, x0=0, y0=0):
    tPPList = np.linspace(0, 0.5 * np.pi, 50)
    kPP = float(aP) / float(bP)
    rPPList = [aP / np.sqrt((np.cos(t))**2 + (kPP * np.sin(t))**2) for t in tPPList]

    tNPList = np.linspace(0.5 * np.pi, 1.0 * np.pi, 50)
    kNP = float(aN) / float(bP)
    rNPList = [aN / np.sqrt((np.cos(t))**2 + (kNP * np.sin(t))**2) for t in tNPList]

    tNNList = np.linspace(1.0 * np.pi, 1.5 * np.pi, 50)
    kNN = float(aN) / float(bN)
    rNNList = [aN / np.sqrt((np.cos(t))**2 + (kNN * np.sin(t))**2) for t in tNNList]

    tPNList = np.linspace(1.5 * np.pi, 2.0 * np.pi, 50)
    kPN = float(aP) / float(bN)
    rPNList = [aP / np.sqrt((np.cos(t))**2 + (kPN * np.sin(t))**2) for t in tPNList]

    tList = np.concatenate([tPPList, tNPList, tNNList, tPNList])
    rList = rPPList + rNPList + rNNList + rPNList
    xyList = np.array([[x0 + r * np.cos(t), y0 + r * np.sin(t)] for t, r in zip(tList, rList)])
    return xyList

### how to rescale the ellipse to touch the fit function at point (x, y)
def elliptic_rescale_asymmetric(x, m, c, x0, y0, saN, saP, sbN, sbP, getQuadrant=False):
    y = f(x, m, c)
    ### distance to function
    r = np.sqrt((x - x0)**2 + (y - y0)**2)
    ### angle to function
    tau = np.arctan2(y - y0, x - x0)
    quadrant = 0
    if tau > 0:
        if tau < 0.5 * np.pi:  ## PP
            kappa = float(saP) / float(sbP)
            quadrant = 1
        else:                  ## NP
            kappa = float(saN) / float(sbP)
            quadrant = 2
    else:
        if tau < -0.5 * np.pi:  ## NN
            kappa = float(saN) / float(sbN)
            quadrant = 3
        else:                   ## PN
            kappa = float(saP) / float(sbN)
            quadrant = 4
    new_a = r * np.sqrt(np.cos(tau)**2 + (kappa * np.sin(tau))**2)
    if quadrant == 1 or quadrant == 4:
        rel_a = new_a / saP
    else:
        rel_a = new_a / saN
    if getQuadrant:
        return rel_a, quadrant, tau
    else:
        return rel_a

### residual function to calculate chi-square
def residuals(parameters, dataPoint):  # data point is (x, y, sxN, sxP, syN, syP)
    m, c = parameters
    bestTList = []
    qqList = []
    weightedDistanceList = []
    for i in range(len(dataPoint)):
        x, y, sxN, sxP, syN, syP = dataPoint[i][0], dataPoint[i][1], dataPoint[i][2], dataPoint[i][3], dataPoint[i][4], dataPoint[i][5]
        ### get the point on the graph where it is tangent to an error-ellipse,
        ### i.e. the smallest ellipse touching the graph
        edFit = minimize(elliptic_rescale_asymmetric, x, args=(m, c, x, y, sxN, sxP, syN, syP))
        bestT = edFit['x'][0]
        bestTList += [bestT]
        bestA, qq, tau = elliptic_rescale_asymmetric(bestT, m, c, x, y, sxN, sxP, syN, syP, True)
        qqList += [qq]
    bestYList = [f(t, m, c) for t in bestTList]
    ### weighted distance, not squared yet, as this is done by scipy.optimize.leastsq or the manual chi2 function
    for counter in range(len(dataPoint)):
        xb = bestTList[counter]
        xf = dataPoint[counter][0]
        yb = bestYList[counter]
        yf = dataPoint[counter][1]
        sxN, sxP, syN, syP = dataPoint[counter][2], dataPoint[counter][3], dataPoint[counter][4], dataPoint[counter][5]
        quadrant = qqList[counter]
        if quadrant == 1:
            sx, sy = sxP, syP
        elif quadrant == 2:
            sx, sy = sxN, syP
        elif quadrant == 3:
            sx, sy = sxN, syN
        elif quadrant == 4:
            sx, sy = sxP, syN
        else:
            assert 0
        weightedDistanceList += [(xb - xf) / sx, (yb - yf) / sy]
    return weightedDistanceList

def chi2(params, pnts):
    r = np.array(residuals(params, pnts))
    s = np.fromiter((x**2 for x in r), float).sum()
    return s

### ...to make data with asymmetric errors (fixed); for testing only
def noisy_data(xList, m0, y0, sxN, sxP, syN, syP):
    yList = [f(x, m0, y0) for x in xList]
    gNList = [boxmuller(0, 1)[0] for dummy in range(len(xList))]
    xerrList = []
    for x, err in zip(xList, gNList):
        if err < 0:
            xerrList += [sxP * err + x]
        else:
            xerrList += [sxN * err + x]
    gNList = [boxmuller(0, 1)[0] for dummy in range(len(xList))]
    yerrList = []
    for y, err in zip(yList, gNList):
        if err < 0:
            yerrList += [syP * err + y]
        else:
            yerrList += [syN * err + y]
    return xerrList, yerrList

### some start values
m0 = 1.3511
y0 = -2.2
aN, aP, bN, bP = .2, .5, 0.9, 1.6

#### some data
xThData = np.linspace(.2, 5, 15)
yThData = [f(x, m0, y0) for x in xThData]
xThData0 = np.linspace(-1.2, 7, 3)
yThData0 = [f(x, m0, y0) for x in xThData0]

### some noisy data
xErrList, yErrList = noisy_data(xThData, m0, y0, aN, aP, bN, bP)

### ...and the fit
dataToFit = list(zip(xErrList, yErrList, len(xThData) * [aN], len(xThData) * [aP], len(xThData) * [bN], len(xThData) * [bP]))
fitResult = minimize(chi2, (m0, y0), args=(dataToFit,))
fittedM, fittedY = fitResult.x
yThDataF = [f(x, fittedM, fittedY) for x in xThData0]

### plot that
for cx in [ax, bx]:
    cx.plot([-2, 7], [f(x, m0, y0) for x in [-2, 7]])

ax.errorbar(xErrList, yErrList, xerr=[len(xThData) * [aN], len(xThData) * [aP]], yerr=[len(xThData) * [bN], len(xThData) * [bP]], fmt='ro')
for x, y in zip(xErrList, yErrList):
    xEllList, yEllList = zip(*ell_data(aN, aP, bN, bP, x, y))
    ax.plot(xEllList, yEllList, c='#808080')
    ### rescaled...
    ### ...a scaled version that touches the original graph; this gives the shortest error distance to that graph
    ed_fit = minimize(elliptic_rescale_asymmetric, 0, args=(m0, y0, x, y, aN, aP, bN, bP))
    best_t = ed_fit['x'][0]
    best_a, qq, tau = elliptic_rescale_asymmetric(best_t, m0, y0, x, y, aN, aP, bN, bP, True)
    xEllList, yEllList = zip(*ell_data(aN * best_a, aP * best_a, bN * best_a, bP * best_a, x, y))
    ax.plot(xEllList, yEllList, c='#4040a0')

### plot the fit
bx.plot(xThData0, yThDataF)
bx.errorbar(xErrList, yErrList, xerr=[len(xThData) * [aN], len(xThData) * [aP]], yerr=[len(xThData) * [bN], len(xThData) * [bP]], fmt='ro')
for x, y in zip(xErrList, yErrList):
    xEllList, yEllList = zip(*ell_data(aN, aP, bN, bP, x, y))
    bx.plot(xEllList, yEllList, c='#808080')
    #### rescaled
    #### ...a scaled version that touches the fitted graph
    ed_fit = minimize(elliptic_rescale_asymmetric, 0, args=(fittedM, fittedY, x, y, aN, aP, bN, bP))
    best_t = ed_fit['x'][0]
    best_a, qq, tau = elliptic_rescale_asymmetric(best_t, fittedM, fittedY, x, y, aN, aP, bN, bP, True)
    xEllList, yEllList = zip(*ell_data(aN * best_a, aP * best_a, bN * best_a, bP * best_a, x, y))
    bx.plot(xEllList, yEllList, c='#4040a0')

plt.show()
which plots
The upper graph shows the original linear function and some data generated from it using asymmetric Gaussian errors. Error bars are plotted, as well as the piecewise error ellipses (grey) and their rescaled versions touching the linear function (blue). The lower graph additionally shows the fitted function, as well as the rescaled piecewise ellipses touching the fitted function.
I've been trying to use scipy.interpolate.bisplrep() and scipy.interpolate.interp2d() to find interpolants for data on my (218 x 135) 2D spherical-polar grid. To these I pass 2D arrays, X and Y, of the Cartesian positions of my grid nodes. I keep getting errors like the following (for linear interpolation with interp2d):
"Warning: No more knots can be added because the additional knot would coincide
with an old one. Probably cause: s too small or too large a weight
to an inaccurate data point. (fp>s)
kx,ky=1,1 nx,ny=4,5 m=29430 fp=1390609718.902140 s=0.000000"
I get a similar result for bivariate splines with the default value of the smoothing parameter s etc. My data are smooth. I've attached my code below in case I'm doing something obviously wrong.
Any ideas?
Thanks!
Kyle
import numpy as np
from numpy import sin, cos
from scipy import interpolate

class Field(object):
    Nr = 0
    Ntheta = 0
    grid = np.array([])

    def __init__(self, Nr, Ntheta, f):
        self.Nr = Nr
        self.Ntheta = Ntheta
        self.grid = np.empty([Nr, Ntheta])
        for i in range(Nr):
            for j in range(Ntheta):
                self.grid[i, j] = f[i * Ntheta + j]

def calculate_lines(filename):
    ri, ti, r, t, Br, Bt, Bphi, Bmag = np.loadtxt(filename, skiprows=3,
                                                  usecols=(1, 2, 3, 4, 5, 6, 7, 9), unpack=True)
    Nr = int(max(ri)) + 1
    Ntheta = int(max(ti)) + 1

    ### Initialise coordinate grids ###
    X = np.empty([Nr, Ntheta])
    Y = np.empty([Nr, Ntheta])
    for i in range(Nr):
        for j in range(Ntheta):
            indx = i * Ntheta + j
            X[i, j] = r[indx] * sin(t[indx])
            Y[i, j] = r[indx] * cos(t[indx])

    ### Initialise field objects ###
    Bradial = Field(Nr=Nr, Ntheta=Ntheta, f=Br)

    ### Interpolate the fields ###
    intp_Br = interpolate.interp2d(X, Y, Bradial.grid, kind='linear')
    # rbf_0 = interpolate.Rbf(X, Y, Bradial.grid, epsilon=2)
    return
Added 27 Aug: Kyle followed this up on a scipy-user thread.
30 Aug: @Kyle, it looks as though there's a mixup between Cartesian X,Y and polar Xnew,Ynew.
See "polar" in the too-long notes below.
# griddata vs SmoothBivariateSpline
# http://stackoverflow.com/questions/3526514/
# problem-with-2d-interpolation-in-scipy-non-rectangular-grid
# http://www.scipy.org/Cookbook/Matplotlib/Gridding_irregularly_spaced_data
# http://en.wikipedia.org/wiki/Natural_neighbor
# http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
from __future__ import division
import sys
import numpy as np
from scipy.interpolate import SmoothBivariateSpline  # $scipy/interpolate/fitpack2.py
from matplotlib.mlab import griddata  # removed in matplotlib 3.1; needs an older matplotlib

__date__ = "2010-10-08 Oct"  # plot diffs, ypow
    # "2010-09-13 Sep"  # smooth relative

def avminmax(X):
    absx = np.abs(X[~np.isnan(X)])
    av = np.mean(absx)
    m, M = np.nanmin(X), np.nanmax(X)
    histo = np.histogram(X, bins=5, range=(m, M))[0]
    return "av %.2g  min %.2g  max %.2g  histo %s" % (av, m, M, histo)

def cosr(x, y):
    return 10 * np.cos(np.hypot(x, y) / np.sqrt(2) * 2 * np.pi * cycle)

def cosx(x, y):
    return 10 * np.cos(x * 2 * np.pi * cycle)

def dipole(x, y):
    r = .1 + np.hypot(x, y)
    t = np.arctan2(y, x)
    return np.cos(t) / r**3

#...............................................................................
testfunc = cosx
Nx = Ny = 20  # interpolate random Nx x Ny points -> Newx x Newy grid
Newx = Newy = 100
cycle = 3
noise = 0
ypow = 2  # denser => smaller error
imclip = (-5., 5.)  # plot trierr, splineerr to same scale
kx = ky = 3
smooth = .01  # Spline s = smooth * z2sum, see note
    # s is a target for sum (Z() - spline())**2 ~ Ndata and Z**2;
    # smooth is relative, s absolute
    # s too small => interpolate/fitpack2.py:580: UserWarning: ier=988, junk out
    # grr error message once only per ipython session
seed = 1
plot = 0

exec("\n".join(sys.argv[1:]))  # run this.py N= ...
np.random.seed(seed)
np.set_printoptions(1, threshold=100, suppress=True)  # .1f

print(80 * "-")
print("%s  Nx %d Ny %d -> Newx %d Newy %d  cycle %.2g noise %.2g  kx %d ky %d smooth %s" % (
    testfunc.__name__, Nx, Ny, Newx, Newy, cycle, noise, kx, ky, smooth))

#...............................................................................
# interpolate X Y Z to xnew x ynew --
X, Y = np.random.uniform(size=(Nx * Ny, 2)).T
Y **= ypow
    # 1d xlin ylin -> 2d X Y Z, Ny x Nx --
    # xlin = np.linspace( 0, 1, Nx )
    # ylin = np.linspace( 0, 1, Ny )
    # X, Y = np.meshgrid( xlin, ylin )
Z = testfunc(X, Y)  # Ny x Nx
if noise:
    Z += np.random.normal(0, noise, Z.shape)
# print("Z:\n", Z)
z2sum = np.sum(Z**2)

xnew = np.linspace(0, 1, Newx)
ynew = np.linspace(0, 1, Newy)
Zexact = testfunc(*np.meshgrid(xnew, ynew))
if imclip is None:
    imclip = np.min(Zexact), np.max(Zexact)
xflat, yflat, zflat = X.flatten(), Y.flatten(), Z.flatten()

#...............................................................................
print("SmoothBivariateSpline:")
fit = SmoothBivariateSpline(xflat, yflat, zflat, kx=kx, ky=ky, s=smooth * z2sum)
Zspline = fit(xnew, ynew).T  # .T ??

splineerr = Zspline - Zexact
print("Zspline - Z:", avminmax(splineerr))
print("Zspline:    ", avminmax(Zspline))
print("Z:          ", avminmax(Zexact))
res = fit.get_residual()
print("residual %.0f  res/z2sum %.2g" % (res, res / z2sum))
# print("knots:", fit.get_knots())
# print("Zspline:", Zspline.shape, "\n", Zspline)
print("")

#...............................................................................
print("griddata:")
Ztri = griddata(xflat, yflat, zflat, xnew, ynew)
    # 1d x y z -> 2d Ztri on meshgrid(xnew,ynew)

nmask = np.ma.count_masked(Ztri)
if nmask > 0:
    print("info: griddata: %d of %d points are masked, not interpolated" % (
        nmask, Ztri.size))
    Ztri = Ztri.data  # NaNs outside convex hull
trierr = Ztri - Zexact
print("Ztri - Z:", avminmax(trierr))
print("Ztri:    ", avminmax(Ztri))
print("Z:       ", avminmax(Zexact))
print("")

#...............................................................................
if plot:
    import pylab as pl
    nplot = 2
    fig = pl.figure(figsize=(10, 10 / nplot + .5))
    pl.suptitle("Interpolation error: griddata - %s, BivariateSpline - %s" % (
        testfunc.__name__, testfunc.__name__), fontsize=11)

    def subplot(z, jplot, label):
        ax = pl.subplot(1, nplot, jplot)
        im = pl.imshow(
            np.clip(z, *imclip),  # plot to same scale
            cmap=pl.cm.RdYlBu,
            interpolation="nearest")
            # nearest: squares, else imshow interpolates too
            # todo: centre the pixels
        ny, nx = z.shape
        pl.scatter(X * nx, Y * ny, edgecolor="y", s=1)  # for random XY
        pl.xlabel(label)
        return [ax, im]

    subplot(trierr, 1,
            "griddata, Delaunay triangulation + Natural neighbor: max %.2g" %
            np.nanmax(np.abs(trierr)))

    ax, im = subplot(splineerr, 2,
                     "SmoothBivariateSpline kx %d ky %d smooth %.3g: max %.2g" % (
                         kx, ky, smooth, np.nanmax(np.abs(splineerr))))

    pl.subplots_adjust(.02, .01, .92, .98, .05, .05)  # l b r t
    cax = pl.axes([.95, .05, .02, .9])  # l b w h
    pl.colorbar(im, cax=cax)  # -1.5 .. 9 ??
    if plot >= 2:
        pl.savefig("tmp.png")
    pl.show()
Notes on 2d interpolation, BivariateSpline vs. griddata.
scipy.interpolate.*BivariateSpline and matplotlib.mlab.griddata
both take 1d arrays as arguments:
Znew = griddata( X,Y,Z, Xnew,Ynew )
# 1d X Y Z Xnew Ynew -> interpolated 2d Znew on meshgrid(Xnew,Ynew)
assert X.ndim == Y.ndim == Z.ndim == 1 and len(X) == len(Y) == len(Z)
The inputs X,Y,Z describe a surface or cloud of points in 3-space:
X,Y (or latitude,longitude or ...) points in a plane,
and Z a surface or terrain above that.
X,Y may fill most of the rectangle [Xmin .. Xmax] x [Ymin .. Ymax],
or may be just a squiggly S or Y inside it.
The Z surface may be smooth, or smooth + a bit of noise,
or not smooth at all, rough volcanic mountains.
Xnew and Ynew are usually also 1d, describing a rectangular grid
of |Xnew| x |Ynew| points where you want to interpolate or estimate Z.
Znew = griddata(...) returns a 2d array over this grid, np.meshgrid(Xnew,Ynew):
Znew[Xnew0,Ynew0], Znew[Xnew1,Ynew0], Znew[Xnew2,Ynew0] ...
Znew[Xnew0,Ynew1] ...
Znew[Xnew0,Ynew2] ...
...
Xnew,Ynew points far from any of the input X,Y points spell trouble.
griddata checks this:
A masked array is returned if any grid points are outside convex
hull defined by input data (no extrapolation is done).
("Convex hull" is the area inside an imaginary
rubber band stretched around all the X,Y points.)
griddata works by first constructing a Delaunay triangulation
of the input X,Y, then doing
Natural neighbor
interpolation. This is robust and quite fast.
BivariateSpline, though, can extrapolate,
generating wild swings without warning.
Furthermore, all the *Spline routines in Fitpack
are very sensitive to smoothing parameter S.
Dierckx's book (books.google isbn 019853440X p. 89) says:
if S is too small, the spline approximation is too wiggly
and picks up too much noise (overfit);
if S is too large the spline will be too smooth
and signal will be lost (underfit).
Interpolation of scattered data is hard, smoothing not easy, both together really hard.
What should an interpolator do with big holes in XY, or with very noisy Z ?
("If you want to sell it, you're going to have to describe it.")
Yet more notes, fine print:
1d vs 2d: Some interpolators take X,Y,Z either 1d or 2d.
Others take 1d only, so flatten before interpolating:
Xmesh, Ymesh = np.meshgrid( np.linspace(0,1,Nx), np.linspace(0,1,Ny) )
Z = f( Xmesh, Ymesh ) # Nx x Ny
Znew = griddata( Xmesh.flatten(), Ymesh.flatten(), Z.flatten(), Xnew, Ynew )
On masked arrays: matplotlib handles them just fine,
plotting only unmasked / non-NaN points.
But I wouldn't bet that a bozo numpy/scipy function would work at all.
Check for interpolation outside the convex hull of X,Y like this:
Znew = griddata(...)
nmask = np.ma.count_masked(Znew)
if nmask > 0:
    print("info: griddata: %d of %d points are masked, not interpolated" % (
        nmask, Znew.size))
    # Znew = Znew.data  # array with NaNs
On polar coordinates:
X,Y and Xnew,Ynew should be in the same space,
both Cartesian, or both in [rmin .. rmax] x [tmin .. tmax].
To plot (r, theta, z) points in 3d:
from mpl_toolkits.mplot3d import Axes3D
Znew = griddata( R,T,Z, Rnew,Tnew )
ax = Axes3D(fig)
ax.plot_surface( Rnew * np.cos(Tnew), Rnew * np.sin(Tnew), Znew )
See also (haven't tried this):
ax = subplot(1,1,1, projection="polar", aspect=1.)
ax.pcolormesh(theta, r, Z)
Two tips for the wary programmer:
check for outliers, or funny scaling:
def minavmax(X):
    m = np.nanmin(X)
    M = np.nanmax(X)
    av = np.mean(X[~np.isnan(X)])  # masked ?
    histo = np.histogram(X, bins=5, range=(m, M))[0]
    return "min %.2g  av %.2g  max %.2g  histo %s" % (m, av, M, histo)

for nm, x in zip("X Y Z  Xnew Ynew Znew".split(),
                 (X, Y, Z, Xnew, Ynew, Znew)):
    print(nm, minavmax(x))
check interpolation with simple data:
interpolate( X,Y,Z, X,Y ) -- interpolate at the same points
interpolate( X,Y, np.ones(len(X)), Xnew,Ynew ) -- constant 1 ?