This is perhaps a silly question.
I'm trying to fit data to a very strange PDF using MCMC evaluation in PyMC. For this example I just want to figure out how to fit to a normal distribution where I manually input the normal PDF. My code is:
import pymc as mc
import numpy as np
import random

data = []
for count in range(1000):
    data.append(random.gauss(-200, 15))

mean = mc.Uniform('mean', lower=min(data), upper=max(data))
std_dev = mc.Uniform('std_dev', lower=0, upper=50)

# @mc.potential
# def density(x=data, mu=mean, sigma=std_dev):
#     return (1./(sigma*np.sqrt(2*np.pi))*np.exp(-((x-mu)**2/(2*sigma**2))))

process = mc.Normal('process', mu=mean, tau=1./std_dev**2, value=data, observed=True)

model = mc.MCMC([mean, std_dev, process])
model.sample(iter=5000)

print("!")
print(model.stats()['mean']['mean'])
print(model.stats()['std_dev']['mean'])
The examples I've found all use something like mc.Normal, or mc.Poisson or whatnot, but I want to fit to the commented out density function.
Any help would be appreciated.
An easy way is to use the stochastic decorator:
import pymc as mc
import numpy as np
data = np.random.normal(-200,15,size=1000)
mean = mc.Uniform('mean', lower=min(data), upper=max(data))
std_dev = mc.Uniform('std_dev', lower=0, upper=50)
@mc.stochastic(observed=True)
def custom_stochastic(value=data, mean=mean, std_dev=std_dev):
    return np.sum(-np.log(std_dev) - 0.5*np.log(2) -
                  0.5*np.log(np.pi) -
                  (value-mean)**2 / (2*(std_dev**2)))

model = mc.MCMC([mean, std_dev, custom_stochastic])
model.sample(iter=5000)

print("!")
print(model.stats()['mean']['mean'])
print(model.stats()['std_dev']['mean'])
Note that my custom_stochastic function returns the log likelihood, not the likelihood, and that it is the log likelihood for the entire sample.
There are a few other ways to create custom stochastic nodes. This doc gives more details, and this gist contains an example using pymc.Stochastic to create a node with a kernel density estimator.
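For completeness, here is a minimal sketch of the potential-based route that the question's commented-out code was aiming for (PyMC 2.x, reusing the data, mean, and std_dev defined above). Like the stochastic above, it must return the log density, summed over the whole sample, not the density itself:

@mc.potential
def custom_potential(x=data, mu=mean, sigma=std_dev):
    # log of the normal density, summed over the whole sample
    return np.sum(-np.log(sigma) - 0.5*np.log(2*np.pi)
                  - (x - mu)**2 / (2.*sigma**2))

model = mc.MCMC([mean, std_dev, custom_potential])
model.sample(iter=5000)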
I am trying to estimate a model in tensorflow using NUTS by providing it a likelihood function. I have checked the likelihood function is returning reasonable values. I am following the setup here for setting up NUTS:
https://rlhick.people.wm.edu/posts/custom-likes-tensorflow.html
and some of the examples here for setting up priors, etc.:
https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Multilevel_Modeling_Primer.ipynb
My code is in a colab notebook here:
https://drive.google.com/file/d/1L9JQPLO57g3OhxaRCB29do2m808ZUeex/view?usp=sharing
I get the error: "OperatorNotAllowedInGraphError: iterating over tf.Tensor is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function." This is my first time using TensorFlow and I am quite lost interpreting this error. It would also be ideal if I could pass the starting parameter values as a single input (the example I am working off doesn't do it, but I assume it is possible).
Update
It looks like I had to change the position of the @tf.function decorator. The sampler now runs, but it gives me the same value for every sample of each parameter. Is it a requirement that I pass a joint distribution through the log_prob() function? I am clearly missing something. I can run the likelihood through BFGS optimization and get reasonable results (I've estimated the model via maximum likelihood with fixed parameters in other software). It looks like I need to define the function to return a joint distribution and call log_prob(). I can do this if I set it up as a logistic regression (the logit choice model is logistically distributed in differences), but then I lose the standard closed form.
My function is as follows:
@tf.function
def mmnl_log_prob(init_mu_b_time, init_sigma_b_time, init_a_car,
                  init_a_train, init_b_cost, init_scale):

    # Create priors for hyperparameters
    mu_b_time = tfd.Sample(tfd.Normal(loc=init_mu_b_time, scale=init_scale), sample_shape=1).sample()
    # HalfCauchy distributions are too wide for logit discrete choice
    sigma_b_time = tfd.Sample(tfd.Normal(loc=init_sigma_b_time, scale=init_scale), sample_shape=1).sample()

    # Create priors for parameters
    a_car = tfd.Sample(tfd.Normal(loc=init_a_car, scale=init_scale), sample_shape=1).sample()
    a_train = tfd.Sample(tfd.Normal(loc=init_a_train, scale=init_scale), sample_shape=1).sample()
    # a_sm = tfd.Sample(tfd.Normal(loc=init_a_sm, scale=init_scale), sample_shape=1).sample()
    b_cost = tfd.Sample(tfd.Normal(loc=init_b_cost, scale=init_scale), sample_shape=1).sample()

    # Define a heterogeneous random parameter model with MultivariateNormalDiag()
    # Use MultivariateNormalDiagPlusLowRank() to define nests, etc.
    b_time = tfd.Sample(tfd.MultivariateNormalDiag(  # b_time
        loc=mu_b_time,
        scale_diag=sigma_b_time), sample_shape=num_idx).sample()

    # Definition of the utility functions
    V1 = a_train + tfm.multiply(b_time, TRAIN_TT_SCALED) + b_cost * TRAIN_COST_SCALED
    V2 = tfm.multiply(b_time, SM_TT_SCALED) + b_cost * SM_COST_SCALED
    V3 = a_car + tfm.multiply(b_time, CAR_TT_SCALED) + b_cost * CAR_CO_SCALED
    print("Vs", V1, V2, V3)

    # Definition of the log-likelihood
    eV1 = tfm.multiply(tfm.exp(V1), TRAIN_AV_SP)
    eV2 = tfm.multiply(tfm.exp(V2), SM_AV_SP)
    eV3 = tfm.multiply(tfm.exp(V3), CAR_AV_SP)
    eVD = eV1 + eV2 + eV3
    print("eVs", eV1, eV2, eV3, eVD)

    l1 = tfm.multiply(tfm.truediv(eV1, eVD), tf.cast(tfm.equal(CHOICE, 1), tf.float32))
    l2 = tfm.multiply(tfm.truediv(eV2, eVD), tf.cast(tfm.equal(CHOICE, 2), tf.float32))
    l3 = tfm.multiply(tfm.truediv(eV3, eVD), tf.cast(tfm.equal(CHOICE, 3), tf.float32))
    ll = tfm.reduce_sum(tfm.log(l1 + l2 + l3))
    print("ll", ll)

    return ll
The function is called as follows:
nuts_samples = 1000
nuts_burnin = 500
chains = 4
## Initial step size
init_step_size=.3
init = [0.,0.,0.,0.,0.,.5]
##
## NUTS (using inner step size averaging step)
##
@tf.function
def nuts_sampler(init):
    nuts_kernel = tfp.mcmc.NoUTurnSampler(
        target_log_prob_fn=mmnl_log_prob,
        step_size=init_step_size,
        )
    adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
        inner_kernel=nuts_kernel,
        num_adaptation_steps=nuts_burnin,
        step_size_getter_fn=lambda pkr: pkr.step_size,
        log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
        step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
        )

    samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
        num_results=nuts_samples,
        current_state=init,
        kernel=adapt_nuts_kernel,
        num_burnin_steps=100,
        parallel_iterations=5)
    return samples_nuts_, stats_nuts_

samples_nuts, stats_nuts = nuts_sampler(init)
samples_nuts, stats_nuts = nuts_sampler(init)
I have an answer to my question! It is simply a matter of different nomenclature. I need to define my model as a softmax function, which I knew was what I would call a "logit model", but it just wasn't clicking for me. The following blog post gave me the epiphany:
http://khakieconomics.github.io/2019/03/17/Putting-it-all-together.html
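For reference, here is a minimal sketch of that reformulation (not the notebook's exact code; V1, V2, V3 and CHOICE are assumed to be defined as in the question, and the availability masks are ignored for brevity): stack the utilities into logits and let a Categorical distribution supply the log probability, rather than building the softmax by hand:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def choice_log_lik(V1, V2, V3, choice):
    # logits of shape (n_obs, 3); Categorical applies the softmax internally,
    # which is numerically safer than exp()/sum()
    logits = tf.stack([V1, V2, V3], axis=-1)
    labels = tf.cast(choice, tf.int32) - 1  # CHOICE is coded 1/2/3 in the question
    return tf.reduce_sum(tfd.Categorical(logits=logits).log_prob(labels))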
Does anyone know how I can see the final acceptance rate in PyMC3 (Metropolis-Hastings)? Or, in general, how can I see all the information that pymc3.sample() returns?
Thanks
As an example, first set up the model:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import pymc3 as pm3

sigma = 3          # Note this is the std of our data
data = norm(10, sigma).rvs(100)
mu_prior = 8
sigma_prior = 1.5  # Note this is our prior on the std of mu

plt.hist(data, bins=20)
plt.show()

basic_model = pm3.Model()
with basic_model:
    # Priors for unknown model parameters
    mu = pm3.Normal('Mean of Data', mu_prior, sigma_prior)
    # Likelihood (sampling distribution) of observations
    data_in = pm3.Normal('Y_obs', mu=mu, sd=sigma, observed=data)
Second, perform the simulation:
chain_length = 10000
with basic_model:
    # obtain starting values via MAP
    startvals = pm3.find_MAP(model=basic_model)
    # instantiate sampler
    step = pm3.Metropolis()
    # draw chain_length posterior samples
    trace = pm3.sample(chain_length, step=step, start=startvals)
Using the above example, the acceptance rate can be calculated this way:
accept = np.sum(trace['Mean of Data'][1:] != trace['Mean of Data'][:-1])
print("Acceptance Rate: ", accept/trace['Mean of Data'].shape[0])
(I found this solution in an online tutorial, but I don't quite understand it; the idea appears to be that a rejected Metropolis proposal repeats the previous value, so the fraction of consecutive draws that differ estimates the acceptance rate.)
Reference: Introduction to PyMC3
For the NUTS algorithm, I found the solution on the PyMC3 forum:
trace.mean_tree_accept.mean()
If step = pymc3.Metropolis() is our sampler, we can get the final acceptance rate through
step.accepted
Just for PyMC3 beginners like myself: put a "." after each variable/object and hit the Tab key; you will see some interesting suggestions ;)
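As a small sketch of the step.accepted approach above (reusing basic_model, chain_length, and startvals from the earlier example; the trace-based sampler stats mentioned in the comment are an assumption about your PyMC3 version):

with basic_model:
    step = pm3.Metropolis()
    trace = pm3.sample(chain_length, step=step, start=startvals)

# raw count of proposals the step method accepted
print(step.accepted)
# many PyMC3 versions also record per-draw sampler statistics in the trace, e.g.
# print(trace.get_sampler_stats('accepted').mean())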
I want to use sklearn.mixture.GaussianMixture to store a Gaussian mixture model so that I can later use it to generate samples, or to evaluate it at a sample point with the score_samples method. Here is an example where the components have the following weights, means, and covariances:
import numpy as np
weights = np.array([0.6322941277066596, 0.3677058722933399])
mu = np.array([[ 0.9148052872961359,  1.9792961751316835],
               [-1.0917396392992502, -0.9304220945910037]])
sigma = np.array([[[ 2.267889129267119,   0.6553245618368836],
                   [ 0.6553245618368835,  0.6571014653342457]],
                  [[ 0.9516607767206848, -0.7445831474157608],
                   [-0.7445831474157608,  1.006599716443763]]])
Then I initialised the mixture as follows:
from sklearn import mixture
gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmix.weights_ = weights # mixture weights (n_components,)
gmix.means_ = mu # mixture means (n_components, 2)
gmix.covariances_ = sigma # mixture cov (n_components, 2, 2)
Finally, I tried to generate a sample based on these parameters, which resulted in an error:
x = gmix.sample(1000)
NotFittedError: This GaussianMixture instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
As I understand it, GaussianMixture is intended to fit a sample using a mixture of Gaussians, but is there a way to provide it with the final values and continue from there?
You rock, J.P.Petersen!
After seeing your answer, I compared the changes introduced by using the fit method. It seems the initial instantiation does not create all the attributes of gmix. Specifically, it is missing the following attributes:
covariances_
means_
weights_
converged_
lower_bound_
n_iter_
precisions_
precisions_cholesky_
The first three are introduced when the given inputs are assigned. Among the rest, the only attribute I need for my application is precisions_cholesky_, which is the Cholesky decomposition of the inverse covariance matrices. As a minimum requirement, I added it as follows:
gmix.precisions_cholesky_ = np.linalg.cholesky(np.linalg.inv(sigma)).transpose((0, 2, 1))
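A quick sketch of the usage this then enables (assuming the gmix, weights, mu, and sigma defined above):

X, y = gmix.sample(1000)          # y holds the component index of each sample
log_dens = gmix.score_samples(X)  # per-point log-likelihood under the mixture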
It seems that it has a check that makes sure that the model has been trained. You could trick it by training the GMM on a very small data set before setting the parameters. Like this:
from numpy.random import rand

gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmix.fit(rand(10, 2))          # Now it thinks it is trained
gmix.weights_ = weights        # mixture weights (n_components,)
gmix.means_ = mu               # mixture means (n_components, 2)
gmix.covariances_ = sigma      # mixture cov (n_components, 2, 2)
x = gmix.sample(1000)          # Should work now
To understand what is happening: when you call sample, GaussianMixture first checks that it has been fitted:
self._check_is_fitted()
Which triggers the following check:
def _check_is_fitted(self):
    check_is_fitted(self, ['weights_', 'means_', 'precisions_cholesky_'])
And finally the last function call:
def check_is_fitted(estimator, attributes, msg=None, all_or_any=all):
which only checks that the classifier already has the attributes.
So in short, the only thing missing to make it work (without having to fit it) is to set the precisions_cholesky_ attribute:
gmix.precisions_cholesky_ = 0
should do the trick (I can't try it, so I'm not 100% sure :P)
However, if you want to play it safe and have a consistent solution in case scikit-learn updates its constraints, the solution of @J.P.Petersen is probably the best way to go.
As a slight alternative to @hashmuke's answer, you can use the precision computation that is used inside GaussianMixture directly:
import numpy as np
from scipy.stats import invwishart as IW
from sklearn.mixture import GaussianMixture as GMM
from sklearn.mixture._gaussian_mixture import _compute_precision_cholesky
n_dims = 5
mu1 = np.random.randn(n_dims)
mu2 = np.random.randn(n_dims)
Sigma1 = IW.rvs(n_dims, 0.1 * np.eye(n_dims))
Sigma2 = IW.rvs(n_dims, 0.1 * np.eye(n_dims))
gmm = GMM(n_components=2)
gmm.weights_ = np.array([0.2, 0.8])
gmm.means_ = np.stack([mu1, mu2])
gmm.covariances_ = np.stack([Sigma1, Sigma2])
gmm.precisions_cholesky_ = _compute_precision_cholesky(gmm.covariances_, 'full')
X, y = gmm.sample(1000)
Depending on your covariance type, you should change 'full' accordingly as the input to _compute_precision_cholesky (it will be one of 'full', 'diag', 'tied', or 'spherical').
I am completely new to pymc3, so please excuse the fact that this is likely trivial. I have a very simple model where I am predicting a binary response function. The model is almost a verbatim copy of this example: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/gelman_bioassay.py
I get back the model parameters (alpha, beta, and theta), but I can't seem to figure out how to overplot the predictions of the model vs. the input data. I tried doing this (using the parlance of the bioassay model):
from scipy.stats import binom
mean_alpha = np.mean(trace['alpha'])
mean_beta = np.mean(trace['beta'])
pred_death = binom.rvs(n, 1./(1. + np.exp(-(mean_alpha + mean_beta * dose))))
and then plotting dose vs. pred_death, but this is manifestly not correct as I get different draws of the binomial distribution every time.
Related to this is another question, how do I evaluate the goodness of fit? I couldn't seem to find anything to that effect in the "getting started" pymc3 tutorial.
Thanks very much for any advice!
Hi, a simple way to do it is as follows:
import numpy as np
import matplotlib.pyplot as plt
from pymc3 import *

# Samples for each dose level
n = 5 * np.ones(4, dtype=int)
# Log-dose
dose = np.array([-.86, -.3, -.05, .73])

def invlogit(x):
    return np.exp(x) / (1 + np.exp(x))

with Model() as model:
    # Logit-linear model parameters
    alpha = Normal('alpha', 0, 0.01)
    beta = Normal('beta', 0, 0.01)

    # Calculate probabilities of death
    theta = Deterministic('theta', invlogit(alpha + beta * dose))

    # Data likelihood
    deaths = Binomial('deaths', n=n, p=theta, observed=[0, 1, 3, 5])

    start = find_MAP()
    step = NUTS(scaling=start)
    trace = sample(2000, step, start=start, progressbar=True)

death_fit = np.percentile(trace['theta'], 50, axis=0)
plt.plot(dose, death_fit, 'g', marker='.', lw=1.25, ls='-', ms=5, mew=1)
plt.show()
If you want to plot dose vs pred_death, where pred_death is computed from the mean estimated values of alpha and beta, then do:
pred_death = 1./(1. + np.exp(-(mean_alpha + mean_beta * dose)))
plt.plot(dose, pred_death)
Instead, if you want to plot dose vs pred_death where pred_death is computed taking the uncertainty in the posterior for alpha and beta into account, then probably the easiest way is to use the function sample_ppc:
Maybe something like:
import pymc3 as pm

ppc = pm.sample_ppc(trace, samples=100, model=model)
for i in range(100):
    plt.plot(dose, ppc['deaths'][i], 'bo', alpha=0.5)
Using Posterior Predictive Checks (ppc) is a way to check how well your model behaves by comparing the predictions of the model to your actual data. Here you have an example of sample_ppc
Other options could be to plot the mean value plus some interval of interest.
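For instance, a sketch of that last option (assuming the trace and dose from the model above), plotting the posterior median of theta together with a 95% credible band:

theta_med = np.percentile(trace['theta'], 50, axis=0)
theta_lo, theta_hi = np.percentile(trace['theta'], [2.5, 97.5], axis=0)
plt.plot(dose, theta_med, 'g-')
plt.fill_between(dose, theta_lo, theta_hi, color='g', alpha=0.3)
plt.show()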
I have a data set that I know has a Pareto distribution. Can someone point me to how to fit this data set in Scipy? I got the below code to run but I have no idea what is being returned to me (a,b,c). Also, after obtaining a,b,c, how do I calculate the variance using them?
import scipy.stats as ss
import scipy as sp
a,b,c=ss.pareto.fit(data)
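For what it's worth, a sketch of how those return values can be read (assuming data is the array from the question): scipy's pareto.fit returns the shape parameter followed by loc and scale, and the distribution machinery can compute the variance from them directly:

import scipy.stats as ss

shape, loc, scale = ss.pareto.fit(data)                 # the a, b, c in the question
variance = ss.pareto.var(shape, loc=loc, scale=scale)   # infinite if shape <= 2
print(shape, loc, scale, variance)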
Be very careful fitting power laws!! Many reported power laws are actually badly fitted by a power law. See Clauset et al. for all the details (also on arXiv if you don't have access to the journal). They have a companion website for the article which now links to a Python implementation. I don't know whether it uses SciPy, because I used their R implementation when I last worked with it.
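If that implementation is the widely used powerlaw package (an assumption on my part; installable with pip install powerlaw), a minimal fit-and-compare sketch looks like this:

import numpy as np
import powerlaw

data = np.random.pareto(2.5, size=5000) + 1   # synthetic heavy-tailed sample
fit = powerlaw.Fit(data)
print(fit.power_law.alpha, fit.power_law.xmin)
# likelihood-ratio comparison of the power law against another heavy-tailed family
R, p = fit.distribution_compare('power_law', 'lognormal')
print(R, p)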
Here's a quickly written version, taking some hints from the Reference page that Rupert gave.
This is currently work in progress in scipy and statsmodels and requires MLE with some fixed or frozen parameters, which is only available in the trunk versions.
No standard errors on the parameter estimates or other result statistics are available yet.
'''estimating pareto with 3 parameters (shape, loc, scale) with nested
minimization, MLE inside minimizing Kolmogorov-Smirnov statistic

running some examples looks good

Author: josef-pktd
'''
import numpy as np
from scipy import stats, optimize
# the following adds my frozen fit method to the distributions
# scipy trunk also has a fit method with some parameters fixed.
import scikits.statsmodels.sandbox.stats.distributions_patch

true = (0.5, 10, 1.)   # try different values
shape, loc, scale = true
rvs = stats.pareto.rvs(shape, loc=loc, scale=scale, size=1000)

rvsmin = rvs.min()     # for starting value to fmin

def pareto_ks(loc, rvs):
    # MLE of shape and scale with loc frozen, scored by the KS statistic
    est = stats.pareto.fit_fr(rvs, 1., frozen=[np.nan, loc, np.nan])
    args = (est[0], loc, est[1])
    return stats.kstest(rvs, 'pareto', args)[0]

locest = optimize.fmin(pareto_ks, rvsmin * 0.7, (rvs,))
est = stats.pareto.fit_fr(rvs, 1., frozen=[np.nan, locest, np.nan])
args = (est[0], locest[0], est[1])

print('estimate')
print(args)
print('kstest')
print(stats.kstest(rvs, 'pareto', args))
print('estimation error', args - np.array(true))
Let's say your data is formatted like this:
import openturns as ot

data = [
    2.7018013,
    8.53280352,
    1.15643882,
    1.03359467,
    1.53152735,
    32.70434285,
    12.60709624,
    2.012235,
    1.06747063,
    1.41394096,
]
sample = ot.Sample([[v] for v in data])
You can easily fit a Pareto distribution using ParetoFactory of OpenTURNS library:
distribution = ot.ParetoFactory().build(sample)
You can of course print it:
print(distribution)
>>> Pareto(beta = 0.00317985, alpha=0.147365, gamma=1.0283)
or plot its PDF:
from openturns.viewer import View
pdf_graph = distribution.drawPDF()
pdf_graph.setTitle(str(distribution))
View(pdf_graph, add_legend=False)
More details on the ParetoFactory are provided in the documentation.
Before passing the data to the build() function in OpenTURNS, make sure to convert it this way:
data = [[i] for i in data]
because the Sample() constructor may otherwise raise an error.
FYI @Tropilio