I have observations of several optical emission lines, and I have a model that predicts several (flux) ratios of those lines, based on two parameters, q and z, which I want to infer.
I have created @pymc.deterministic objects that take values of q and z (each of which has an uninformative prior over some physically interesting region) and turn them into a "predicted" ratio. There are about 7 ratios, and they have the form:
@pymc.deterministic(observed=True, value=NII_SII)
def NII_SII_th(q=q, z=z):
    return NII_SII_g(np.array([q, z]))
I can also define the ratios derived from observations, such as
@pymc.deterministic
def NII_SII(NII_6584=NII_6584, SII_6717=SII_6717,
            rcf_NII_6584=rcf_NII_6584, rcf_SII_6717=rcf_SII_6717):
    return np.log10(
        (rcf_NII_6584*NII_6584) /
        (rcf_SII_6717*SII_6717))
where, for instance, NII_6584 is the observed flux of one of the lines and rcf_NII_6584 is the flux correction for that same line. These corrections are themselves determined by the line wavelengths (known with infinite precision), and by a parameter EBV, which can be calculated from the observed flux ratio of two lines that are supposed to have a fixed ratio r:
@pymc.deterministic
def EBV(Ha=Ha, Hb=Hb, r=r, R_V=R_V, Ha_l=Ha_l, Hb_l=Hb_l):
    # evaluate the attenuation curve at each line's own wavelength
    kHb = gas_meas.calzetti_k(lams=np.array([Hb_l]), Rv=R_V)
    kHa = gas_meas.calzetti_k(lams=np.array([Ha_l]), Rv=R_V)
    return 2.5 / (kHb - kHa) * np.log10((Ha/Hb) / r)
I also have a prior on the value of R_V.
The measurements themselves are expressed as Normal distributions, such as
NII_6584 = pymc.Normal(
    'NII_6584', mu=f_row['[NII]6584'],
    tau=1./e_row['[NII]6584']**2.,
    observed=True, value=f_row['[NII]6584'])
I would like to get estimates of R_V, EBV, q, and z. However, when I make a pymc Model from all these, I am told that Deterministic objects cannot have observed values:
TypeError: __init__() got an unexpected keyword argument 'value'
First, am I misunderstanding the nature of Deterministic objects? If so, how else do I infer based on values that are not directly observed?
Second, am I constructing the observations correctly? It seems odd that I'd have to specify the observed flux as both the mean and the value argument, but it's not clear to me what else to do, other than also model the flux means and variances, which seems unnecessarily complicated.
Any advice would be appreciated!
I don't think you're constructing your observations correctly. This is not a minimal working example, but maybe we can clear up some confusion.
First off, I don't think the @deterministic decorator takes an argument value=<something>. It's not clear which of your deterministic statements is the actual model, but try to translate your code into the following template:
# Define your randomly distributed variables (I'm assuming they're normal)
q = pymc.Normal(name, mu=mu, tau=tau)
z = pymc.Normal(name2, mu=mu2, tau=tau2)
# Define how you think they generate your data
@pymc.deterministic
def NII_SII_th(q=q, z=z):
    return NII_SII_g(np.array([q, z]))  # this function is defined somewhere else
# Your data array
f_row['[NII]6584'] = [...]
# Now link your model and your data
obs = pymc.Normal(modelname, mu=NII_SII_th,
                  tau=tau_obs,  # placeholder: precision of the measurement
                  observed=True, value=f_row['[NII]6584'])
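To make the idea behind the template concrete without depending on PyMC, here is a minimal NumPy sketch of the same structure: a deterministic prediction from (q, z) is compared to observed ratios through a Gaussian likelihood, and flat priors make the posterior proportional to that likelihood. The function predict_ratios, its linear form, the grid ranges and the error values are all hypothetical stand-ins (for NII_SII_g and friends), not the real model.

```python
import numpy as np

def predict_ratios(q, z):
    """Hypothetical linear stand-in for the model grids (e.g. NII_SII_g)."""
    return np.stack([0.5 * q - 1.2 * z, 0.3 * q + 0.8 * z])

# Assumed "true" parameters and measurement errors, for illustration only.
q_true, z_true = 7.5, 0.2
sigma = np.array([0.05, 0.05])
observed = predict_ratios(q_true, z_true)  # noise-free synthetic observations

# Flat priors over a box: the posterior is proportional to the likelihood.
q_grid = np.linspace(6.5, 8.5, 201)
z_grid = np.linspace(-1.0, 1.0, 201)
Q, Z = np.meshgrid(q_grid, z_grid, indexing="ij")

pred = predict_ratios(Q, Z)  # shape (2, 201, 201)
resid = (observed[:, None, None] - pred) / sigma[:, None, None]
log_post = -0.5 * np.sum(resid**2, axis=0)  # Gaussian log-likelihood up to a constant

# Maximum a posteriori point on the grid.
i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
q_map, z_map = q_grid[i], z_grid[j]
print(q_map, z_map)
```

The point is only that the observed quantity enters through a stochastic node (here the Gaussian term), while the deterministic function just maps parameters to a prediction.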
I know the curve_fit function from scipy and its power for fitting curves. I have read many examples here and in the documentation, but I cannot solve my problem.
For example, I have 10 files (chemical structures, but that does not matter) and ten experimental energy values. I have a function inside a class that calculates the theoretical energy of each structure for some parameters and returns a numpy array with the theoretical energy values.
I want to find the parameters that bring the theoretical values closest to the experimental ones. Here is a minimal example of my code.
This is the class function that reads the experimental energy files, extracts the correct substring and returns the values as a numpy array. self.path is just the directory and self.nPoints = 10. It is not so important, but I include it for the sake of completeness.
def experimentalValues(self):
    os.chdir(self.path)
    energy = np.zeros(self.nPoints)
    # start at 0 so that p_1.xyz is read too, matching theoreticalEnergy
    for i in range(self.nPoints):
        with open("p_" + str(i + 1) + ".xyz", "r") as f:
            energy[i] = float(f.readlines()[1].split()[1])
    os.chdir('..')
    return energy
I calculate the theoretical value with this class function, which takes two numpy arrays as arguments, let's say
sigma = np.full(nSubstrate, 2.)
epsilon = np.full(nSubstrate, 0.15)
where nSubstrate = 9
Here is the class function. It reads the files and uses two nested loops to calculate the theoretical value for each file, returning the values in a numpy array.
def theoreticalEnergy(self, epsilon, sigma):
    os.chdir(self.path)
    cE = np.zeros(self.nPoints)
    for n in range(0, self.nPoints):
        filenameXYZ = "p_" + str(n + 1) + "_extended.xyz"
        allCoordinates = np.loadtxt(filenameXYZ, skiprows = 0, usecols = (1, 2, 3))
        substrate = allCoordinates[0:self.nSubstrate]
        surface = allCoordinates[self.nSubstrate:]
        for i in range(0, substrate.shape[0]):
            positionAtomI = np.array(substrate[i][:])
            for j in range(0, surface.shape[0]):
                positionAtomJ = np.array(surface[j][:])
                distanceIJ = self.distance(positionAtomI, positionAtomJ)
                cE[n] += self.LennardJones(distanceIJ, epsilon[i], sigma[i])
    os.chdir('..')
    return cE
Again, for the sake of completeness the Lennard Jones class function is defined as
def LennardJones(self, distance, epsilon, sigma):
    repulsive = (sigma/distance) ** 12.
    attractive = (sigma/distance) ** 6.
    potential = 4. * epsilon * (repulsive - attractive)
    return potential
where, in this case, all the arguments and the return value are scalars.
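As an aside, the two nested loops over substrate and surface atoms could probably be replaced by NumPy broadcasting, which matters once an optimizer calls the energy function many times. The sketch below is only an illustration under assumptions: the coordinates are random placeholders rather than data read from the .xyz files, and total_energy is a hypothetical free function, not part of the class shown here.

```python
import numpy as np

def lennard_jones(distance, epsilon, sigma):
    """Lennard-Jones potential; works on scalars or broadcastable arrays."""
    sr6 = (sigma / distance) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)

def total_energy(substrate, surface, epsilon, sigma):
    """Sum the pair potential over all substrate/surface atom pairs."""
    # diff has shape (n_substrate, n_surface, 3)
    diff = substrate[:, None, :] - surface[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # one (epsilon, sigma) pair per substrate atom, broadcast along the surface axis
    return np.sum(lennard_jones(dist, epsilon[:, None], sigma[:, None]))

# Placeholder geometry: 9 substrate atoms above a slab of 20 surface atoms.
rng = np.random.default_rng(0)
substrate = rng.uniform(0.0, 5.0, size=(9, 3))
surface = rng.uniform(0.0, 5.0, size=(20, 3)) + np.array([0.0, 0.0, 8.0])
epsilon = np.full(9, 0.15)
sigma = np.full(9, 2.0)

energy = total_energy(substrate, surface, epsilon, sigma)
print(energy)
```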
To conclude the problem presentation I have 3 ingredients:
a numpy array with the experimental data
two numpy arrays with a guess for the parameters sigma and epsilon
a function that takes the last parameters and returns a numpy vector with the values to be fitted.
How can I solve this problem like the approach described in the documentation https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html?
Curve fitting
curve_fit fits a function f(w, x[i]) to points y[i] by finding the w that minimizes sum((f(w, x[i]) - y[i])**2 for i in range(n)). As you will read in the first line after the function definition:
[It uses] non-linear least squares to fit a function, f, to data.
It refers to least_squares where it states
Given the residuals f(x) (an m-D real function of n real variables) and the loss function rho(s) (a scalar function), least_squares finds a local minimum of the cost function F(x):
minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1), subject to lb <= x <= ub
Curve fitting is a kind of convex-cost multi-objective optimization. Since each individual cost is convex, you can add all of them and the result will still be a convex function. Notice that the decision variables (the parameters to be optimized) are the same at every point.
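A small sketch may help connect the two descriptions: curve_fit on a model and least_squares on the corresponding residual vector should land on essentially the same parameters, because both minimize the same sum of squares. The exponential model, the synthetic data and the starting point are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit, least_squares

def model(x, a, b):
    """Toy model (an assumption for this sketch): exponential decay."""
    return a * np.exp(-b * x)

# Synthetic data with a little Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 50)
y = model(x, 2.5, 1.3) + 0.02 * rng.standard_normal(x.size)

# curve_fit: the parameters are passed to the model as separate arguments.
popt, _ = curve_fit(model, x, y, p0=[1.0, 1.0])

# least_squares: we supply the residual vector; the parameters arrive as one array.
def residuals(w):
    return model(x, *w) - y

res = least_squares(residuals, x0=[1.0, 1.0])

print(popt, res.x)
```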
Your problem
In my understanding, you have a different set of parameters for each energy level. If you write it as a curve fitting problem, the objective function could be expressed as sum((f(w[i], x[i]) - y[i])**2 ...), where y[i] is determined by the energy level. Since each term in the sum is independent of the other terms, this is equivalent to finding each group of parameters w[i] separately by minimizing (f(w[i], x[i]) - y[i])**2.
Convexity
Convexity is a very convenient property for optimization because it ensures that any local minimum is also a global minimum in the parameter space. I am not doing a detailed analysis, but I have reasonable doubts about the convexity of your energy function.
The Lennard-Jones function is the difference between a repulsive and an attractive term, both with negative even exponents on the distance. This alone is very unlikely to be convex.
The sum of multiple local functions centered at different positions has no defined convexity.
Molecular energy, or crystal energy, or protein folding are well known to be non-convex.
A few days ago (on a bike ride) I was thinking about this, how the molecules will be configured in a global minimum energy, and I was wondering if it finds that configuration so rapidly because of quantum tunneling effects.
Non-convex optimization
Non-convex (global) optimization differs from (non-linear) least squares in that the process does not return as soon as a local minimum is found; it starts making new attempts in different regions of the search space. If the function is smooth you can still take advantage of a gradient-based local optimization method, but the global problem remains NP-hard in general.
A classic global optimization method is simulated annealing; if you have a chemical background, I think reading about it will give you some insights. Once upon a time, simulated annealing was provided in scipy.optimize.
You will find a few global optimization methods in scipy.optimize. I would encourage you to try Basin hopping, since it was successfully applied to similar problems, as you can read in the references.
I hope this puts you on the right path to your solution. But be aware that you will probably need to spend some time learning how to use the function, and you will need to make some decisions. You will need to find a balance between accuracy, simplicity and efficiency.
If you want a better solution, take the time to derive the gradient of the cost function (you can return two values, f and df, where df is the gradient of f with respect to the decision variables).
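As a rough sketch of that last suggestion, scipy.optimize.basinhopping can be given a function that returns both the cost and its gradient by passing jac=True to the local minimizer. The double-well cost below is a toy stand-in for the real energy misfit, and the stepsize and niter values are guesses suited only to this toy problem.

```python
import numpy as np
from scipy.optimize import basinhopping

def cost_and_grad(x):
    """Toy non-convex cost (a tilted double well) and its analytic gradient."""
    f = x[0] ** 4 - 2.0 * x[0] ** 2 + 0.3 * x[0]
    df = np.array([4.0 * x[0] ** 3 - 4.0 * x[0] + 0.3])
    return f, df

# Start in the shallower well near x = +1; the deeper one is near x = -1.04.
result = basinhopping(
    cost_and_grad,
    x0=[1.0],
    niter=100,
    stepsize=2.0,  # large enough to hop over the barrier between the wells
    minimizer_kwargs={"method": "L-BFGS-B", "jac": True},
)

print(result.x, result.fun)
```

A purely local method started at x = 1 would be expected to stay in the shallower well; the random hops are what allow the search to reach the deeper one.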
I'm hoping to make an animation about how the least-squares regression analysis provided by scipy.optimize.leastsq() converges on a specific result. Is there any way to get the function to, say, append to a list a tuple of guess values for each iteration until the function converges to the local minima? Or, is there a different library which includes this feature?
Below is what I have:
# initial guess for gaussian distributions to optimize [height, position, width].
# if more than 2 distributions required, add a new set of [h,p,w] initial parameters to 'initials' for each new distribution.
# new parameters should be of the same format for consistency; i.e. [h,p,w],[h,p,w],[h,p,w]... etc.
# A 'w' guess of 1 is typically a sufficient estimation.
initials = [6.5,13,1],[4.5,19,1]
# determines the number of gaussian functions to compute from the initial guesses
n = len(initials)
# formats initials into a 1D array
var = np.concatenate(initials)
# data matrix
M = np.array(master)
# defines a typical gaussian function, of independent variable x,
# amplitude a, position b, and width parameter c.
def gaussian(x, a, b, c):
    return a * np.exp(-(x - b)**2.0 / c**2.0)
# defines the expected resultant as a sum of intrinsic gaussian functions
def GaussSum(x, p):
    return sum(gaussian(x, p[3*k], p[3*k+1], p[3*k+2]) for k in range(n))
# defines the residuals between the data (y) and the function 'GaussSum(x,p)';
# leastsq squares and sums them itself, so they are returned unsquared
def residuals(p, y, x):
    return y - GaussSum(x, p)
# executes least-squares regression analysis to optimize initial parameters
cnsts = leastsq(residuals, var, args=(M[:,1], M[:,0]))[0]
What I'm eventually hoping for is for 'cnsts' to be a list of tuples of every guess from the initial guess to the final guess.
If I'm understanding your question correctly, you want to make a guess at each of the different coefficients while fitting a linear regression line, then have a list of all the coefficients that have been guessed? Similar to how a NN will back-propagate the error to better fit a model?
Linear regression isn't guessing the different coefficients. It's just calculating them... https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/#FindaLinear
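For the non-linear case in the question, where leastsq does iterate, one possible approach is to record the parameter vector inside the residual function each time it is called. This is a sketch on synthetic single-Gaussian data, not the asker's dataset. Note that, as far as I know, leastsq also calls the function while estimating the Jacobian by finite differences, so the list contains those probing evaluations as well as the accepted steps.

```python
import numpy as np
from scipy.optimize import leastsq

def gaussian(x, a, b, c):
    return a * np.exp(-((x - b) ** 2) / c**2)

# Synthetic data standing in for the measured spectrum.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 30.0, 300)
y = gaussian(x, 6.0, 13.5, 2.0) + 0.05 * rng.standard_normal(x.size)

history = []  # every parameter vector the optimizer tries

def residuals(p, y, x):
    history.append(tuple(p))  # copy the values; p may be reused by the optimizer
    return y - gaussian(x, *p)

p0 = np.array([6.5, 13.0, 1.0])
best, ier = leastsq(residuals, p0, args=(y, x))

print(len(history), history[0], best)
```

The entries of history could then be plotted frame by frame for the animation.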
I'm trying to understand if there is any meaningful difference between the ways of passing data into a model, either aggregated or as single trials (note this will only be a sensible question for certain distributions, e.g. Binomial).
Predicting p for a yes/no trial, using a simple model with a Binomial distribution.
What is the difference in the computation/results of the following models (if any)?
I chose the two extremes, either passing in a single trial at a time (reducing to Bernoulli) or passing in the sum of the entire series of trials, to exemplify my meaning, though I am also interested in the difference between these extremes.
# set up constants
p_true = 0.1
N = 3000
observed = scipy.stats.bernoulli.rvs(p_true, size=N)
Model 1: combining all observations into a single data point
with pm.Model() as binomial_model1:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', N, p, observed=np.sum(observed))
    trace1 = pm.sample(40000)
Model 2: using each observation individually
with pm.Model() as binomial_model2:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', 1, p, observed=observed)
    trace2 = pm.sample(40000)
There isn't any noticeable difference in the trace or posteriors in this case. I attempted to dig into the pymc3 source code to see how the observations were being processed but couldn't find the right part.
Possible expected answers:
pymc3 aggregates the observations under the hood for Binomial anyway, so there is no difference
the resultant posterior surface (which is explored in the sample process) is identical in each case -> there is no meaningful/statistical difference in the two models
there are differences in the resultant statistics because of this and that...
This is an interesting example! Your second suggestion is correct: you can actually work out the posterior analytically, and with the uniform prior on p it will be distributed according to
Beta(1 + sum(observed), 1 + N - sum(observed))
in either case.
The difference in modelling approach would show up if you used, for example, pm.sample_ppc, in that the first would be distributed according to Binomial(N, p) and the second would be N draws of Binomial(1, p).
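One way to check the claim numerically, without running a sampler, is to compare the two log-likelihoods as functions of p: they should differ only by a constant (the log of the binomial coefficient), which cannot affect the posterior. This sketch regenerates synthetic data with NumPy rather than reusing the arrays from the question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N = 3000
observed = rng.binomial(1, 0.1, size=N)  # synthetic yes/no trials
k = int(observed.sum())

p_grid = np.linspace(0.01, 0.99, 99)

# Model 1: one aggregated Binomial(N, p) observation.
loglik_binomial = stats.binom.logpmf(k, N, p_grid)

# Model 2: N separate Bernoulli(p) observations.
loglik_bernoulli = np.array(
    [stats.bernoulli.logpmf(observed, p).sum() for p in p_grid]
)

# The difference does not depend on p, so both models give the same posterior.
offset = loglik_binomial - loglik_bernoulli
print(offset.min(), offset.max())

# Analytic posterior under the Uniform(0, 1) prior.
posterior = stats.beta(1 + k, 1 + N - k)
print(posterior.mean())
```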
I have three arrays x, y, z, and I want to smooth the z data. I used the SmoothBivariateSpline function, but when I evaluate the result I get completely different values compared to my original z data. Below is my code:
def envinterpolate(x,y,z):
    x_interp = np.linspace(min(x),max(x),len(x)*4)
    y_interp = np.linspace(min(y),max(y),len(x)*4)
    sbsp = SmoothBivariateSpline(x,y,z)
    z_interp = sbsp.ev(x_interp,y_interp)
    return z_interp
Is there anything wrong in my code when evaluating the values of the spline?
I am attaching the plot obtained after trying the s=0 parameter (red line: my actual z data; black line: the z_interp data).
By convention, "smoothing" refers specifically to cases where you don't want the interpolant to pass exactly through your input data points (for example if you know that your input data is noisy).
SmoothBivariateSpline takes a parameter s that controls the degree of smoothing that is applied to the interpolant:
s : float, optional
Positive smoothing factor defined for estimation condition: sum((w[i]*(z[i]-s(x[i], y[i])))**2, axis=0) <= s Default s=len(w) which should be a good value if 1/w[i] is an estimate of the standard deviation of z[i].
If you don't want any smoothing you could simply set s=0.
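To illustrate the effect of s, here is a small sketch on synthetic scattered data (the surface, noise level and sample size are assumptions). It fits the spline once with the default s and once with a smaller s matched to the noise, then evaluates both at the original (x, y) points with ev, which pairs its inputs element by element.

```python
import numpy as np
from scipy.interpolate import SmoothBivariateSpline

# Synthetic noisy samples of a smooth surface at scattered points.
rng = np.random.default_rng(0)
n = 400
noise = 0.05
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, 1.0, n)
z = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y) + noise * rng.standard_normal(n)

# Default smoothing (s = number of points with unit weights): heavy smoothing here.
loose = SmoothBivariateSpline(x, y, z)

# Smaller s, roughly the expected sum of squared noise: follows the data more closely.
tight = SmoothBivariateSpline(x, y, z, s=n * noise**2)

# ev evaluates at the paired points (x[i], y[i]), not on a grid.
err_loose = np.sqrt(np.mean((loose.ev(x, y) - z) ** 2))
err_tight = np.sqrt(np.mean((tight.ev(x, y) - z) ** 2))
print(err_loose, err_tight)
```

Because ev works on paired coordinates, comparing against the input z only makes sense when the spline is evaluated at the original x and y, not at two independent linspace arrays.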
Given that the fitting function is of the type:
$$f(x) = \sum_i r_i \, \frac{(s_i x)^2}{1 + (s_i x)^2}$$
I intend to fit such function to the experimental data (x,y=f(x)) that I have. But then I have some doubts:
How do I define my fitting function when there's a summation involved?
Once the function defined, i.e. def func(..) return ... is it still possible to use curve_fit from scipy.optimize? Because now there's a set of parameters s_i and r_i involved compared to the usual fitting cases where one has few single parameters.
Finally are such cases treated completely differently?
Feel a bit lost here, thanks for any help.
This is very well within reach of scipy.optimize.curve_fit (or just scipy.optimize.leastsq). The fact that a sum is involved does not matter at all, nor that you have arrays of parameters. The only thing to note is that curve_fit wants to give your fit function the parameters as individual arguments, while leastsq gives a single vector.
Here's a solution:
import numpy as np
from scipy.optimize import curve_fit, leastsq
def f(x,r,s):
    """ The fit function, applied to every x_k for the vectors r_i and s_i. """
    x = x[...,np.newaxis] # add an axis for the summation
    # by virtue of numpy's fantastic broadcasting rules,
    # the following will be evaluated for every combination of k and i.
    x2s2 = (x*s)**2
    return np.sum(r * x2s2 / (1 + x2s2), axis=-1)
# fit using curve_fit
popt,pcov = curve_fit(
    lambda x,*params: f(x,params[:N],params[N:]),
    X,Y,
    np.r_[R0,S0],
)
R = popt[:N]
S = popt[N:]
# fit using leastsq
popt,ier = leastsq(
    lambda params: f(X,params[:N],params[N:]) - Y,
    np.r_[R0,S0],
)
R = popt[:N]
S = popt[N:]
A few things to note:
To start, we need the 1d arrays X and Y of measurements to fit to, the 1d arrays R0 and S0 as initial guesses, and N, the length of those two arrays.
I separated the implementation of the actual model f from the objective functions supplied to the fitters. Those I implemented using lambda functions. Of course, one could also have ordinary def ... functions and combine them into one.
The model function f uses numpy's broadcasting to simultaneously sum over a set of parameters (along the last axis), and calculate in parallel for many x (along any axes before the last, though both fit functions would complain if there is more than one... .ravel() to help there)
We concatenate the fit parameters R and S into a single parameter vector using numpy's shorthand np.r_[R,S].
curve_fit supplies every single parameter as a distinct parameter to the objective function. We want them as a vector, so we use *params: It catches all remaining parameters in a single list.
leastsq gives a single params vector. However, it neither supplies x, nor does it compare it to y. Those are directly bound into the objective function.
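For a quick end-to-end check of the curve_fit variant, the inputs listed above can be filled with synthetic values. Everything in this sketch (the "true" R and S, the x range, the initial guesses) is made up for illustration. The model is restated with np.asarray so that the tuple slices coming from *params are handled explicitly.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, r, s):
    """Sum over i of r_i * (x*s_i)**2 / (1 + (x*s_i)**2), for every x."""
    x = np.asarray(x)[..., np.newaxis]  # extra axis for the summation
    x2s2 = (x * np.asarray(s)) ** 2
    return np.sum(np.asarray(r) * x2s2 / (1 + x2s2), axis=-1)

# Made-up "true" parameters and noise-free synthetic measurements.
N = 2
R_true = np.array([1.0, 3.0])
S_true = np.array([0.2, 2.0])
X = np.linspace(0.1, 20.0, 200)
Y = f(X, R_true, S_true)

# Initial guesses, deliberately a little off.
R0 = np.array([0.8, 2.5])
S0 = np.array([0.3, 1.5])

popt, pcov = curve_fit(
    lambda x, *params: f(x, params[:N], params[N:]),
    X, Y,
    p0=np.r_[R0, S0],
)
R, S = popt[:N], popt[N:]
print(R, S)
```

Since only s_i**2 enters the model, the sign of each fitted s_i is not determined, and the terms may come back in a different order than the "true" ones.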
In order to use scipy.optimize.leastsq to estimate multiple parameters, you need to pack them into an array and unpack them inside your function. You can then do anything you want with them. For example, if your s_i are the first 3 and your r_i are the next three parameters in your array p, you would just set ssum=p[:3].sum() and rsum=p[3:6].sum(). But again, your parameters are not identified (according to your comment), so estimation is pointless.
For an example of using leastsq, see the Cookbook's Fitting Data example.