Tensorflow probability not giving the same results as PyMC3

Tensorflow probability not giving the same results as PyMC3 - python

I have previousely used PyMC3 and am now looking to use tensorflow probability.
I have built some model in both, but unfortunately, I am not getting the same answer. In fact, the answer is not that close.
# definition of the joint_log_prob to evaluate samples
def joint_log_prob(data, proposal):
prior = tfd.Normal(mu_0, sigma_0, name='prior')
likelihood = tfd.Normal(proposal, sigma, name='likelihood')
return (prior.log_prob(proposal) + tf.reduce_mean(likelihood.log_prob(data)))
proposal = 0
# define a closure on joint_log_prob
def unnormalized_log_posterior(proposal):
return joint_log_prob(data=observed, proposal=proposal)
# define how to propose state
rwm = tfp.mcmc.NoUTurnSampler(
target_log_prob_fn=unnormalized_log_posterior,
max_tree_depth = 100,
step_size = 0.1
)
# define initial state
initial_state = tf.constant(0., name='initial_state')
#tf.function
def run_chain(initial_state, num_results=7000, num_burnin_steps=2000,adaptation_steps = 1):
adaptive_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
rwm, num_adaptation_steps=adaptation_steps,
step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size),
step_size_getter_fn=lambda pkr: pkr.step_size,
log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
)
return tfp.mcmc.sample_chain(
num_results=num_results,
num_burnin_steps= num_burnin_steps,
current_state=initial_state,
kernel=adaptive_kernel,
trace_fn=lambda cs, kr: kr)
trace, kernel_results = run_chain(initial_state)
I am using NoUTurns sampler, I have added some stepsize adaptation, without it, the result is pretty much the same.
I dont really know how to move forward?
Maybe be joint log probability is wrong?

You should use reduce_sum in your log_prob instead of reduce_mean. Otherwise you are effectively downweighting the likelihood by a factor equal to the size of your data set. This would cause the samples to look a lot more like the prior, which might be what you’re seeing in the plot.

Related

Specification of Multinomial model in Tensorflow Probability

I am playing with a mixed multinomial discrete choice model in Tensorflow Probability. The function should take an input of a choice among 3 alternatives. The chosen alternative is specified by CHOSEN (a # observationsx3 tensor). Below is an update to the code to reflect my progress on the problem (but the problem remains).
I currently get the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [6768,3] vs. [1,3,6768] [Op:Mul]
with the traceback suggesting the issue is in the call to log_prob() for the final component of the joint distrubtion (i.e., tfp.Independent(tfp.Multinomial(...))
The main components are (thank you to Padarn Wilson for helping to fix the joint distribution definition):
#tf.function
def affine(x, kernel_diag, bias=tf.zeros([])):
"""`kernel_diag * x + bias` with broadcasting."""
kernel_diag = tf.ones_like(x) * kernel_diag
bias = tf.ones_like(x) * bias
return x * kernel_diag + bias
def mmnl_func():
adj_AV_train = (tf.ones(num_idx) - AV[:,0]) * tf.constant(-9999.)
adj_AV_SM = (tf.ones(num_idx) - AV[:,1]) * tf.constant(-9999.)
adj_AV_car = (tf.ones(num_idx) - AV[:,2]) * tf.constant(-9999.)
return tfd.JointDistributionSequential([
tfd.Normal(loc=0., scale=1e5), # mu_b_time
tfd.HalfCauchy(loc=0., scale=5), # sigma_b_time
lambda sigma_b_time,mu_b_time: tfd.MultivariateNormalDiag( # b_time
loc=affine(tf.ones([num_idx]), mu_b_time[..., tf.newaxis]),
scale_diag=sigma_b_time*tf.ones(num_idx)),
tfd.Normal(loc=0., scale=1e5), # a_train
tfd.Normal(loc=0., scale=1e5), # a_car
tfd.Normal(loc=0., scale=1e5), # b_cost
lambda b_cost,a_car,a_train,b_time: tfd.Independent(tfd.Multinomial(
total_count=1,
logits=tf.stack([
affine(DATA[:,0], tf.gather(b_time, IDX[:,0], axis=-1), (a_train + b_cost * DATA[:,1] + adj_AV_train)),
affine(DATA[:,2], tf.gather(b_time, IDX[:,0], axis=-1), (b_cost * DATA[:,3] + adj_AV_SM)),
affine(DATA[:,4], tf.gather(b_time, IDX[:,0], axis=-1), (a_car + b_cost * DATA[:,5] + adj_AV_car))
], axis=1)
),reinterpreted_batch_ndims=1)
])
#tf.function
def mmnl_log_prob(mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost):
return mmnl_func().log_prob(
[mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost, CHOICE])
# Based on https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args
# change constant values to tf.constant()
nuts_samples = tf.constant(1000)
nuts_burnin = tf.constant(500)
num_chains = tf.constant(1)
## Initial step size
init_step_size= tf.constant(0.3)
# Set the chain's start state.
initial_state = [
tf.zeros([num_chains], dtype=tf.float32, name="init_mu_b_time"),
tf.zeros([num_chains], dtype=tf.float32, name="init_sigma_b_time"),
tf.zeros([num_chains, num_idx], dtype=tf.float32, name="init_b_time"),
tf.zeros([num_chains], dtype=tf.float32, name="init_a_train"),
tf.zeros([num_chains], dtype=tf.float32, name="init_a_car"),
tf.zeros([num_chains], dtype=tf.float32, name="init_b_cost")
]
## NUTS (using inner step size averaging step)
##
#tf.function
def nuts_sampler(init):
nuts_kernel = tfp.mcmc.NoUTurnSampler(
target_log_prob_fn=mmnl_log_prob,
step_size=init_step_size,
)
adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
inner_kernel=nuts_kernel,
num_adaptation_steps=nuts_burnin,
step_size_getter_fn=lambda pkr: pkr.step_size,
log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
)
samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
num_results=nuts_samples,
current_state=initial_state,
kernel=adapt_nuts_kernel,
num_burnin_steps=tf.constant(100),
parallel_iterations=tf.constant(5))
return samples_nuts_, stats_nuts_
samples_nuts, stats_nuts = nuts_sampler(initial_state)

More than likely this is an issue with your initial state and number of chains. You can try to initialize your kernel outside of the sampler call:
nuts_kernel = tfp.mcmc.NoUTurnSampler(
target_log_prob_fn=mmnl_log_prob,
step_size=init_step_size,
)
adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
inner_kernel=nuts_kernel,
num_adaptation_steps=nuts_burnin,
step_size_getter_fn=lambda pkr: pkr.step_size,
log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
)
and then do
nuts_kernel.bootstrap_results(initial_state)
and investigate the shapes of logL, and proposal states are being returned.
Another thing to do is to feed your initial state into your log-likelihood/posterior and see if the dimensions of the returned log-likelihoods match what you think it should be (if you are doing multiple chains then maybe it should be returning # chains log likelihoods).
It is my understanding that the batch dimension (# chains) has to be the first one in all your vectorized calculations.
The very last part of my blog post on tensorflow and custom likelihoods has working code for an example that does this.

I was able to get reasonable results from my model. Thank you to everyone for the help! The following points helped solve the various issues.
Use of JointDistributionSequentialAutoBatched() to produce consistent batch shapes. You need tf-nightly installed for access.
More informative priors for hyperparameters. The exponential transformation in the Multinomial() distribution means that uninformative hyperparameters (i.e., with sigma = 1e5) mean you quickly have large positive numbers entering the exp(), leading to infinities.
Setting the step size, etc. was also important.
I found an answer by Christopher Suter to a recent question on the Tensorflow Probability forum specifying a model from STAN useful. I took the use of taking a sample from my prior as a starting point for the initial likelihood parameters useful.
Despite JointDistributionSequentialAutoBatched() correcting the batch shapes, I went back and corrected my joint distribution shapes so that printing log_prob_parts() gives consistent shapes (i.e., [10,1] for 10 chains). I still get a shape error without using JointDistributionSequentialAutoBatched() but the combination seem to work.
I separated my affine() into two functions. They do the same thing but remove retracing warnings. Basically, affine() was able to broadcast the inputs but they differed and it was easier to write two functions that setup the inputs with consistent shapes. Differently shaped inputs causes Tensorflow to trace the function multiple times.

Tensorflow Probability Error: OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed

I am trying to estimate a model in tensorflow using NUTS by providing it a likelihood function. I have checked the likelihood function is returning reasonable values. I am following the setup here for setting up NUTS:
https://rlhick.people.wm.edu/posts/custom-likes-tensorflow.html
and some of the examples here for setting up priors, etc.:
https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Multilevel_Modeling_Primer.ipynb
My code is in a colab notebook here:
https://drive.google.com/file/d/1L9JQPLO57g3OhxaRCB29do2m808ZUeex/view?usp=sharing
I get the error: OperatorNotAllowedInGraphError: iterating overtf.Tensoris not allowed: AutoGraph did not convert this function. Try decorating it directly with #tf.function. This is my first time using tensorflow and I am quite lost interpreting this error. It would also be ideal if I could pass the starting parameter values as a single input (example I am working off doesn't do it, but I assume it is possible).
Update
It looks like I had to change the position of the #tf.function decorator. The sampler now runs, but it gives me the same value for all samples for each of the parameters. Is it a requirement that I pass a joint distribution through the log_prob() function? I am clearly missing something. I can run the likelihood through bfgs optimization and get reasonable results (I've estimated the model via maximum likelihood with fixed parameters in other software). It looks like I need to define the function to return a joint distribution and call log_prob(). I can do this if I set it up as a logistic regression (logit choice model is logistically distributed in differences). However, I lose the standard closed form.
My function is as follows:
#tf.function
def mmnl_log_prob(init_mu_b_time,init_sigma_b_time,init_a_car,init_a_train,init_b_cost,init_scale):
# Create priors for hyperparameters
mu_b_time = tfd.Sample(tfd.Normal(loc=init_mu_b_time, scale=init_scale),sample_shape=1).sample()
# HalfCauchy distributions are too wide for logit discrete choice
sigma_b_time = tfd.Sample(tfd.Normal(loc=init_sigma_b_time, scale=init_scale),sample_shape=1).sample()
# Create priors for parameters
a_car = tfd.Sample(tfd.Normal(loc=init_a_car, scale=init_scale),sample_shape=1).sample()
a_train = tfd.Sample(tfd.Normal(loc=init_a_train, scale=init_scale),sample_shape=1).sample()
# a_sm = tfd.Sample(tfd.Normal(loc=init_a_sm, scale=init_scale),sample_shape=1).sample()
b_cost = tfd.Sample(tfd.Normal(loc=init_b_cost, scale=init_scale),sample_shape=1).sample()
# Define a heterogeneous random parameter model with MultivariateNormalDiag()
# Use MultivariateNormalDiagPlusLowRank() to define nests, etc.
b_time = tfd.Sample(tfd.MultivariateNormalDiag( # b_time
loc=mu_b_time,
scale_diag=sigma_b_time),sample_shape=num_idx).sample()
# Definition of the utility functions
V1 = a_train + tfm.multiply(b_time,TRAIN_TT_SCALED) + b_cost * TRAIN_COST_SCALED
V2 = tfm.multiply(b_time,SM_TT_SCALED) + b_cost * SM_COST_SCALED
V3 = a_car + tfm.multiply(b_time,CAR_TT_SCALED) + b_cost * CAR_CO_SCALED
print("Vs",V1,V2,V3)
# Definition of loglikelihood
eV1 = tfm.multiply(tfm.exp(V1),TRAIN_AV_SP)
eV2 = tfm.multiply(tfm.exp(V2),SM_AV_SP)
eV3 = tfm.multiply(tfm.exp(V3),CAR_AV_SP)
eVD = eV1 + eV2 +
eV3
print("eVs",eV1,eV2,eV3,eVD)
l1 = tfm.multiply(tfm.truediv(eV1,eVD),tf.cast(tfm.equal(CHOICE,1),tf.float32))
l2 = tfm.multiply(tfm.truediv(eV2,eVD),tf.cast(tfm.equal(CHOICE,2),tf.float32))
l3 = tfm.multiply(tfm.truediv(eV3,eVD),tf.cast(tfm.equal(CHOICE,3),tf.float32))
ll = tfm.reduce_sum(tfm.log(l1+l2+l3))
print("ll",ll)
return ll
The function is called as follows:
nuts_samples = 1000
nuts_burnin = 500
chains = 4
## Initial step size
init_step_size=.3
init = [0.,0.,0.,0.,0.,.5]
##
## NUTS (using inner step size averaging step)
##
#tf.function
def nuts_sampler(init):
nuts_kernel = tfp.mcmc.NoUTurnSampler(
target_log_prob_fn=mmnl_log_prob,
step_size=init_step_size,
)
adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
inner_kernel=nuts_kernel,
num_adaptation_steps=nuts_burnin,
step_size_getter_fn=lambda pkr: pkr.step_size,
log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
)
samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
num_results=nuts_samples,
current_state=init,
kernel=adapt_nuts_kernel,
num_burnin_steps=100,
parallel_iterations=5)
return samples_nuts_, stats_nuts_
samples_nuts, stats_nuts = nuts_sampler(init)

I have an answer to my question! It is simply a matter of different nomenclature. I need to define my model as a softmax function, which I knew was what I would call a "logit model", but it just wasn't clicking for me. The following blog post gave me the epiphany:
http://khakieconomics.github.io/2019/03/17/Putting-it-all-together.html

How do I properly train and predict value like biomass using GRU RNN?

My first time trying to train a dataset containing 8 variables in a time-series of 20 years or so using GRU RNN. The biomass value is what I'm trying to predict based on the other variables. I'm trying first with 1 layer GRU. I'm not using softmax for the output layer. MSE is used for my cost function.
It is basic GRU with forward propagation and backward gradient update. Here are the main function I defined:
'x_t is the input training dataset with a dimension of 7572x8. So T = 7572, input_dim = 8, hidden_dim =128. y_train is my train label.'
def forward_prop_step(self, x_t,y_train, s_t1_prev,V, U, W, b, c,learning_rate):
T = x_t.shape[0]
z_t1 = np.zeros((T,self.hidden_dim))
r_t1 = np.zeros((T,self.hidden_dim))
h_t1 = np.zeros((T,self.hidden_dim))
s_t1 = np.zeros((T+1,self.hidden_dim))
o_s = np.zeros((T,self.input_dim))
for i in xrange(T):
x_e = x_t[i].T
z_t1[i] = sigmoid(U[0].dot(x_e) + W[0].dot(s_t1[i]) + b[0])#128x1
r_t1[i] = sigmoid(U[1].dot(x_e) + W[1].dot(s_t1[i]) + b[1])#128x1
h_t1[i] = np.tanh(U[2].dot(x_e) + W[2].dot(s_t1[i] * r_t1[i]) + b[2])#128x1
s_t1[i+1] = (np.ones_like(z_t1[i]) - z_t1[i]) * h_t1[i] + z_t1[i] * s_t1[i]#128x1
o_s[i] = np.dot(V,s_t1[i+1]) + c#8x1
return [o_s,z_t1,r_t1,h_t1,s_t1]
def bptt(self, x,y_train,o,z_t1,r_t1,h_t1,s_t1,V, U, W, b, c):
bptt_truncate = 360
T = x.shape[0]#length of time scale of input data (train)
dLdU = np.zeros(U.shape)
dLdV = np.zeros(V.shape)
dLdW = np.zeros(W.shape)
dLdb = np.zeros(b.shape)
dLdc = np.zeros(c.shape)
y_train_sp = np.repeat(y_train,self.input_dim)
for t in np.arange(T)[::-1]:
dLdy = 2 * (o[t] - y_train_sp[t])
dydV = s_t1[t]
dydc = 1.0
dLdV += np.outer(dLdy,dydV)
dLdc += dLdy*dydc
for i in np.arange(max(0, t-bptt_truncate), t+1)[::-30]:#every month in the past year
s_t1_pre = s_t1[i]
dydst1 = V #8x128
dst1dzt1 = -h_t1[i] + s_t1_pre #128x1
dst1dht1 = np.ones_like(z_t1[i]) - z_t1[i] #128x1
dzt1dU = np.outer(z_t1[i]*(1.0-z_t1[i]),x[i]) #128x8
#print dzt1dU.shape
dzt1dW = np.outer(z_t1[i]*(1.0-z_t1[i]),s_t1_pre) #128x128
dzt1db = z_t1[i]*(1.0-z_t1[i]) #128x1
dht1dU = np.outer((1.0-h_t1[i] ** 2),x[i]) #128x8
dht1dW = np.outer((1.0-h_t1[i] ** 2),s_t1_pre * r_t1[i]) #128x128
dht1db = 1.0-h_t1[i] ** 2 #128x1
dht1drt1 = (1.0-h_t1[i] ** 2)*(W[2].dot(s_t1_pre))#128x1
drt1dU = np.outer((r_t1[i]*(1.0-r_t1[i])),x[i]) #128x8
drt1dW = np.outer((r_t1[i]*(1.0-r_t1[i])),s_t1_pre) #128x128
drt1db = (r_t1[i]*(1.0-r_t1[i]))#128x1
dLdW[0] += np.outer(dydst1.T.dot(dLdy),dzt1dW.dot(dst1dzt1)) #128x128
dLdU[0] += np.outer(dydst1.T.dot(dLdy),dst1dzt1.dot(dzt1dU)) #128x8
dLdb[0] += (dydst1.T.dot(dLdy))*dst1dzt1*dzt1db#128x1
dLdW[1] += np.outer(dydst1.T.dot(dLdy),dst1dht1*dht1drt1).dot(drt1dW)#128x128
dLdU[1] += np.outer(dydst1.T.dot(dLdy),dst1dht1*dht1drt1).dot(drt1dU) #128x8
dLdb[1] += (dydst1.T.dot(dLdy))*dst1dht1*dht1drt1*drt1db#128x1
dLdW[2] += np.outer(dydst1.T.dot(dLdy),dht1dW.dot(dst1dht1)) #128x128
dLdU[2] += np.outer(dydst1.T.dot(dLdy),dst1dht1.dot(dht1dU))#128x8
dLdb[2] += (dydst1.T.dot(dLdy))*dst1dht1*dht1db#128x1
return [ dLdV,dLdU, dLdW, dLdb, dLdc ]
def predict( self, x):
pred = np.amax(x, axis = 1)
pred_f = relu(pred)
return pred_f
Parameters V,U,W,b,c are updated by gradient dLdV,dLdU,dLdW,dLdb,dLdc calculated by bptt.
I have tried different weight initialization (xavier or just random), tried different time truncation. But all lead to the same outcome. Probably the weight update wasn't right? The network set-up seems simple though. Really struggle on understanding the predication and translate to actual biomass too. The function predict is what I defined to translate the output layer from the GRU network to biomass value by taking the maximum value. But the output layer gives similar value for almost all time iterations. Not sure the best way to do the job though. Thanks for any help or suggestions in advance.

I doubt anyone on stackoverflow is going to debug a custom implementation of GRU for you. If you were using Tensorflow or another high level library, I might take a stab at it, or if it was a simple fully connected network, but as it is all I can do is give some advice on how to proceed with debugging.
First, it sounds like you're running a brand new implementation on your own data set right off the bat. Instead, try starting out by testing your network on trivial, synthetic data sets first. Can it learn an identity function? A response which is simply the weighted average of the three previous time stamps? And so on. It's easier to debug small simple problems. Once you know your implementation can learn things that a GRU based recurrent network should be able to learn, then you can start using your own data.
Second, this comment of yours was very insightful:
Probably the weight update wasn't right?
While it's impossible to say for sure, this is a very common - perhaps the most common - source of bugs in for backprop implementations. Andrew Ng recommends gradient checking to debug an implementation like this. Essentially, this involves numerically approximating the gradient. It's computationally inefficient but relies only on a correct implementation of forward propagation, which makes it very useful for debugging. For one, if the algorithm converges when the numerically approximated gradient is used, you can be more sure that your forward prop is correct and focus on debugging backprop. (On the other hand, if it is still does not succeed, it is likely the issue in your forward prop function.) For another, once the algorithm is working with the numerically approximated gradient then you can compare the output of your analytic gradient function with it and debug any discrepancies. This makes it a lot easier because you now know the correct answer that it should return.

Convert numpy function to theano

I am using PyMC3 to calculate something which I won't get into here but you can get the idea from this link if interested.
The '2-lambdas' case is basically a switch function, which needs to be compiled to a Theano function to avoid dtype errors and looks like this:
import theano
from theano.tensor import lscalar, dscalar, lvector, dvector, argsort
#theano.compile.ops.as_op(itypes=[lscalar, dscalar, dscalar], otypes=[dvector])
def lambda_2_distributions(tau, lambda_1, lambda_2):
"""
Return values of `lambda_` for each observation based on the
transition value `tau`.
"""
out = zeros(num_observations)
out[: tau] = lambda_1 # lambda before tau is lambda1
out[tau:] = lambda_2 # lambda after (and including) tau is lambda2
return out
I am trying to generalize this to apply to 'n-lambdas', where taus.shape[0] = lambdas.shape[0] - 1, but I can only come up with this horribly slow numpy implementation.
#theano.compile.ops.as_op(itypes=[lvector, dvector], otypes=[dvector])
def lambda_n_distributions(taus, lambdas):
out = zeros(num_observations)
np_tau_indices = argsort(taus).eval()
num_taus = taus.shape[0]
for t in range(num_taus):
if t == 0:
out[: taus[np_tau_indices[t]]] = lambdas[t]
elif t == num_taus - 1:
out[taus[np_tau_indices[t]]:] = lambdas[t + 1]
else:
out[taus[np_tau_indices[t]]: taus[np_tau_indices[t + 1]]] = lambdas[t]
return out
Any ideas on how to speed this up using pure Theano (avoiding the call to .eval())? It's been a few years since I've used it and so don't know the right approach.

Using a switch function is not recommended, as it breaks the nice geometry of the parameters space and makes sampling using modern sampler like NUTS difficult.
Instead, you can try model it using a continuous relaxation of a switch function. The main idea here would be to model the rate before the first switch point as a baseline; and add the prediction from a logistic function after each switch point:
def logistic(L, x0, k=500, t=np.linspace(0., 1., 1000)):
return L/(1+tt.exp(-k*(t_-x0)))
with pm.Model() as m2:
lambda0 = pm.Normal('lambda0', mu, sd=sd)
lambdad = pm.Normal('lambdad', 0, sd=sd, shape=nbreak-1)
trafo = Composed(pm.distributions.transforms.LogOdds(), Ordered())
b = pm.Beta('b', 1., 1., shape=nbreak-1, transform=trafo,
testval=[0.3, 0.5])
theta_ = pm.Deterministic('theta', tt.exp(lambda0 +
logistic(lambdad[0], b[0]) +
logistic(lambdad[1], b[1])))
obs = pm.Poisson('obs', theta_, observed=y)
trace = pm.sample(1000, tune=1000)
There are a few tricks I used here as well, for example, the composite transformation that is not on the PyMC3 code base yet. You can have a look at the full code here: https://gist.github.com/junpenglao/f7098c8e0d6eadc61b3e1bc8525dd90d
If you have more question, please post to https://discourse.pymc.io with your model and (simulated) data. I check and answer on the PyMC3 discourse much more regularly.

Fixing fit parameters in curve_fit

I have a function Imaginary which describes a physics process and I want to fit this to a dataset x_interpolate, y_interpolate. The function is a form of a Lorentzian peak function and I have some initial values that are user given, except for f_peak (the peak location) which I find using a peak finding algorithm. All of the fit parameters, except for the offset, are expected to be positive and thus I have set bounds_I accordingly.
def Imaginary(freq, alpha, res, Ms, off):
numerator = (2*alpha*freq*res**2)
denominator = (4*(alpha*res*freq)**2) + (res**2 - freq**2)**2
Im = Ms*(numerator/denominator) + off
return Im
pI = np.array([alpha_init, f_peak, Ms_init, 0])
bounds_I = ([0,0,0,0, -np.inf], [np.inf,np.inf,np.inf, np.inf])
poptI, pcovI = curve_fit(Imaginary, x_interpolate, y_interpolate, pI, bounds=bounds_I)
In some situations I want to keep the parameter f_peak fixed during the fitting process. I tried an easy solution by changing bounds_I to:
bounds_I = ([0,f_peak+0.001,0,0, -np.inf], [np.inf,f_peak-0.001,np.inf, np.inf])
This is for many reasons not an optimal way of doing this so I was wondering if there is a more Pythonic way of doing this? Thank you for your help

If a parameter is fixed, it is not really a parameter, so it should be removed from the list of parameters. Define a model that has that parameter replaced by a fixed value, and fit that. Example below, simplified for brevity and to be self-contained:
x = np.arange(10)
y = np.sqrt(x)
def parabola(x, a, b, c):
return a*x**2 + b*x + c
fit1 = curve_fit(parabola, x, y) # [-0.02989396, 0.56204598, 0.25337086]
b_fixed = 0.5
fit2 = curve_fit(lambda x, a, c: parabola(x, a, b_fixed, c), x, y)
The second call to fit returns [-0.02350478, 0.35048631], which are the optimal values of a and c. The value of b was fixed at 0.5.
Of course, the parameter should be removed from the initial vector pI and the bounds as well.

You might find lmfit (https://lmfit.github.io/lmfit-py/) helpful. This library adds a higher-level interface to the scipy optimization routines, aiming for a more Pythonic approach to optimization and curve fitting. For example, it uses Parameter objects to allow setting bounds and fixing parameters without having to modify the objective or model function. For curve-fitting, it defines high level Model functions that can be used.
For you example, you could use your Imaginary function as you've written it with
from lmfit import Model
lmodel = Model(Imaginary)
and then create Parameters (lmfit will name the Parameter objects according to your function signature), providing initial values:
params = lmodel.make_params(alpha=alpha_init, res=f_peak, Ms=Ms_init, off=0)
By default all Parameters are unbound and will vary in the fit, but you can modify these attributes (without rewriting the model function):
params['alpha'].min = 0
params['res'].min = 0
params['Ms'].min = 0
You can set one (or more) of the parameters to not vary in the fit as with:
params['res'].vary = False
To be clear: this does not require altering the model function, making it much easier to change with is fixed, what bounds might be imposed, and so forth.
You would then perform the fit with the model and these parameters:
result = lmodel.fit(y_interpolate, params, freq=x_interpolate)
you can get a report of fit statistics, best-fit values and uncertainties for parameters with
print(result.fit_report())
The best fit Parameters will be held in result.params.
FWIW, lmfit also has builtin Models for many common forms, including Lorentzian and a Constant offset. So, you could construct this model as
from lmfit.models import LorentzianModel, ConstantModel
mymodel = LorentzianModel(prefix='l_') + ConstantModel()
params = mymodel.make_params()
which will have Parameters named l_amplitude, l_center, l_sigma, and c (where c is the constant) and the model will use the name x for the independent variable (your freq). This approach can become very convenient when you may want to change the functional form of the peaks or background, or when fitting multiple peaks to a spectrum.

I was able to solve this issue regarding arbitrary number of parameters and arbitrary positioning of the fixed parameters:
def d_fit(x, y, param, boundMi, boundMx, listparam):
Sparam, SboundMi, SboundMx = asarray([]), asarray([]), asarray([])
Nparam, NboundMi, NboundMx = asarray([]), asarray([]), asarray([])
for i in range(len(param)):
if(listparam[i] == 1):
Sparam = append(Sparam,asarray(param[i]))
SboundMi = append(SboundMi,asarray(boundMi[i]))
SboundMx = append(SboundMx,asarray(boundMx[i]))
else:
Nparam = append(Nparam,asarray(param[i]))
def funF(x, Sparam):
j = 0
for i in range(len(param)):
if(listparam[i] == 1):
param[i] = Sparam[i-j]
else:
param[i] = Nparam[j]
j = j + 1
return fun(x, param)
return curve_fit(lambda x, *Sparam: funF(x, Sparam), x, y, p0 = Sparam, bounds = (SboundMi,SboundMx))
In this case:
param = [a,b,c,...] # parameters array (any size)
boundMi = [min_a, min_b, min_c,...] # minimum allowable value of each parameter
boundMx = [max_a, max_b, max_c,...] # maximum allowable value of each parameter
listparam = [0,1,1,0,...] # 1 = fit and 0 = fix the corresponding parameter in the fit routine
and the root function is define as
def fun(x, param):
a,b,c,d.... = param
return a*b/c... # any function of the params a,b,c,d...
This way, you can change the root function and the number of parameters without changing the fit routine.
And, at any time, you can fix or let fit any parameter by changing "listparam".
Use like this:
popt, pcov = d_fit(x, y, param, boundMi, boundMx, listparam)
"popt" and "pcov" are 1D arrays of the size of the number of "1" in "listparam" bringing the results of the fitted parameters (best value and err matrix)
"param" will ramain an 1D array of the same size of the original (input) "param", HOWEVER IT WILL BE UPDATED AUTOMATICALLY TO THE FITTED VALUES (same as "popt") for the fitted values, keeping the fixed values according to "listparam"
Hope can be usefull!
Obs1: x = 1D-array independent values and y = 1D-array dependent values
Obs2: This is my first post. Please let me know if I can improove it!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.