I am trying to incorporate different types and replicates of measurements into one model in PyMC3.
Consider the following model: P(t)=P0*exp(-kBt) where P(t), P0, and B are concentrations. k is a rate. We measure P(t) at different times and B once, all through counting of particles. k is the parameter of interest we are trying to infer.
My question has two parts:
(1) How to incorporate measurements on P(t) and B into one model?
(2) How to use a variable number of replicate experiments to inform on the value of k?
I think I can answer part (1), but am unsure about whether it is right or done in the right flavour. I failed to generalise the code to include a variable number of replicates.
For one experiment (one replicate):
ts=np.asarray([time0,time1,...])
counts=np.asarray([countforB,countforPattime0,countforPattime1,...])
basic_model = pm.Model()
with basic_model:
    k = pm.Uniform('k', 0, 20)
    B = pm.Uniform('B', 0, 1000)
    P = pm.Uniform('P', 0, 1000)
    exprate = pm.Deterministic('exprate', k*B)
    modelmu = pm.math.concatenate([B*np.asarray([1.0]), P*pm.math.exp(-exprate*ts)])
    Y_obs = pm.Poisson('Y_obs', mu=modelmu, observed=counts)
I tried to include different replicates along the lines of the above, but to no avail:
...
k=pm.Uniform('k',0,20) # same within replicates
B=pm.Uniform('B',0,1000,shape=numrepl) # can vary between expts.
P=pm.Uniform('P',0,1000,shape=numrepl) # can vary between expts.
exprate=???
modelmu=???
Multiple Observables
PyMC3 supports multiple observables, that is, you can add multiple RandomVariable objects to the graph with the observed argument set.
Single Trial
In your first case, this would lend some clarity to the model:
counts = [countforPattime0, countforPattime1, ...]

with pm.Model() as single_trial:
    # priors
    k = pm.Uniform('k', 0, 20)
    B = pm.Uniform('B', 0, 1000)
    P = pm.Uniform('P', 0, 1000)

    # transformed RVs
    rate = pm.Deterministic('exprate', k*B)
    mu = P*pm.math.exp(-rate*ts)

    # observations
    B_obs = pm.Poisson('B_obs', mu=B, observed=countforB)
    Y_obs = pm.Poisson('Y_obs', mu=mu, observed=counts)
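If it helps, inference on this model can then be run in the usual way (a minimal sketch, assuming ts, countforB and counts are defined as above):

with single_trial:
    trace = pm.sample(2000, tune=1000)

pm.summary(trace)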
Multiple Trials
With this additional flexibility, the transition to multiple trials is hopefully more obvious. It should go something like:
B_cts = np.array(...)  # shape (N, 1)
Y_cts = np.array(...)  # shape (N, M)
ts = np.array(...)     # shape (1, M)

with pm.Model() as multi_trial:
    # priors
    k = pm.Uniform('k', 0, 20)
    B = pm.Uniform('B', 0, 1000, shape=B_cts.shape)
    P = pm.Uniform('P', 0, 1000, shape=B_cts.shape)

    # transformed RVs
    rate = pm.Deterministic('exprate', k*B)
    mu = P*pm.math.exp(-rate*ts)

    # observations
    B_obs = pm.Poisson('B_obs', mu=B, observed=B_cts)
    Y_obs = pm.Poisson('Y_obs', mu=mu, observed=Y_cts)
There might be some extra work needed to get the arrays broadcasting correctly, but this at least includes the correct shapes; a quick check is sketched below.
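As a quick sanity check of the intended broadcasting (a sketch with made-up shapes and dummy counts):

import numpy as np

N, M = 3, 4                                  # trials, time points (dummy values)
B_cts = np.random.poisson(500, size=(N, 1))  # one B count per trial
Y_cts = np.random.poisson(300, size=(N, M))  # P(t) counts per trial and time
ts = np.linspace(0.0, 1.0, M).reshape(1, M)  # shared time grid

# (N, 1) parameters times a (1, M) time grid broadcast to (N, M),
# matching the shape of Y_cts:
print((B_cts * ts).shape)  # (3, 4)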
Priors
Once you get that setup working, it would be in your interest to reconsider the priors. I suspect you have more information about the typical values for those than is currently included, especially since this seems like a chemical or physical model.
For instance, right now the model says,
We believe the true value of B remains fixed for the duration of a trial, but across trials is a completely arbitrary value between 0 and 1000, and measuring it repeatedly within a trial would be Poisson distributed.
Typically, one should avoid truncations unless they are excluding meaningless values. Hence, a lower bound of 0 is fine, but the upper bounds are arbitrary. I'd recommend having a look at the Stan Wiki on choosing priors.
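For example (just a sketch; the actual numbers should come from your domain knowledge), weakly informative priors could look something like:

with pm.Model() as informative_priors:
    # HalfNormal keeps the rate positive without an arbitrary hard upper bound
    k = pm.HalfNormal('k', 10)
    # if typical concentrations are around a few hundred, a Gamma centred there
    # (mean = alpha/beta = 500 here) says more than Uniform(0, 1000)
    B = pm.Gamma('B', alpha=5, beta=0.01)
    P = pm.Gamma('P', alpha=5, beta=0.01)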
Related
Imagine we tossed a biased coin 8 times (we don't know how biased it is), and so far we have recorded 5 heads (H) and 3 tails (T). What is the probability that the next 3 tosses will all be tails? In other words, we want the expected probability of having 5 Hs and 6 Ts after the 11th toss.
I want to build an MCMC simulation model using PyMC3 to find the Bayesian solution. There is also an analytical solution within the Bayesian approach for this problem, so I will be able to compare the results from the simulation, the analytical approach, and the classical frequentist approach. Let me briefly explain what I have done so far:
1. Frequentist solution:
If we consider the probability for a single toss:
E(T) = p = 3/8 = 0.375
Then, the ultimate answer is E({T,T,T}) = p^3 = (3/8)^3 ≈ 0.053.
2.1. Bayesian solution, analytical:
Assume the unknown parameter "p" represents the bias of the coin (here, the probability of tails).
If we consider the probability for a single toss:
E(T) = ∫_0^1 p * P(p | H = 5, T = 3) dp = 0.400 (I calculated the result after some algebraic manipulation)
Similarly, the ultimate answer is:
E({T,T,T}) = ∫_0^1 p^3 * P(p | H = 5, T = 3) dp = 1/11 ≈ 0.091.
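These integrals can be cross-checked numerically with scipy (a sketch; with a uniform prior, the posterior over the tail probability p is Beta(4, 6)):

from scipy import stats
from scipy.integrate import quad

posterior = stats.beta(a=4, b=6)  # 1 + 3 tails observed, 1 + 5 heads observed

E_T = quad(lambda p: p * posterior.pdf(p), 0, 1)[0]
E_TTT = quad(lambda p: p**3 * posterior.pdf(p), 0, 1)[0]
print(E_T)    # 0.400
print(E_TTT)  # 0.0909... = 1/11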
2.2. Bayesian solution with MCMC simulation:
When we consider the probability for a single toss, I built the model in pyMC3 as below:
# Encoding: Head = 0, Tail = 1
data = [0, 0, 0, 0, 0, 1, 1, 1]

import pymc3 as pm

with pm.Model() as coin_flipping:
    p = pm.Uniform('p', lower=0, upper=1)
    y = pm.Bernoulli('y', p=p, observed=data)
    trace = pm.sample(1000)

pm.traceplot(trace)
After I run this code, the posterior mean is E(T) = 0.398, which is very close to the result of the analytical solution (0.400). I am happy so far, but this is not the ultimate answer. I need a model that simulates the probability of E({T,T,T}). I would appreciate it if someone could help me with this step.
One way to do this empirically is with PyMC3's posterior predictive sampling. That is, once you have a posterior sampling, you can generate samples from random parameterizations of the model. The pymc3.sample_posterior_predictive() method will generate new samples the size of your original observed data. Since you are only interested in three flips, we can just ignore the additional flips it generates. For example, if you wanted 10000 random sets of predicted flips, you would do
with pm.Model() as coin_flipping:
    # this is still uniform, but I always prefer Beta for proportions
    p = pm.Beta('p', alpha=1, beta=1)

    pm.Bernoulli('y', p=p, observed=data)

    # chains looked a bit waggly at 1K; 10K looks smoother
    trace = pm.sample(10000, random_seed=2019, chains=4)

    # note this generates (10000, 8) observations
    post_pred = pm.sample_posterior_predictive(trace, samples=10000, random_seed=2019)
To then see how frequently the next three flips come up as (1,1,1), we can do
np.mean(np.sum(post_pred['y'][:,:3], axis=1) == 3)
# 0.0919
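Alternatively, since the three future flips are conditionally independent given p, you could skip the predictive sampling and average p^3 directly over the posterior draws (a sketch, assuming trace['p'] holds those draws):

np.mean(trace['p']**3)
# ~0.0909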
Analytic Solution
In this example, since we have an analytic posterior predictive distribution (Beta-Binomial[k | n, a=4, b=6] - see the Wikipedia table of conjugate distributions for details), we can exactly calculate the probability of observing three tails in the next three flips as follows:
from scipy.special import comb, beta as beta_fn
n, k = 3, 3 # flips, tails
a, b = 4, 6 # 1 + observed tails, 1 + observed heads
comb(n, k) * beta_fn(n + a, n - k + b) / beta_fn(a, b)
# 0.09090909090909091
Note that the beta_fn is the Euler Beta function, as distinct from the Beta distribution.
I'm trying to understand whether there is any meaningful difference between the ways of passing data into a model - either aggregated or as single trials (note this is only a sensible question for certain distributions, e.g. Binomial).
Predicting p for a yes/no trial, using a simple model with a Binomial distribution.
What is the difference in the computation/results of the following models (if any)?
I chose the two extremes, either passing in each trial individually (reducing to Bernoulli) or passing in the sum of the entire series of trials, to exemplify my meaning, though I am also interested in the behaviour between these extremes.
import numpy as np
import scipy.stats
import pymc3 as pm

# set up constants
p_true = 0.1
N = 3000
observed = scipy.stats.bernoulli.rvs(p_true, size=N)
Model 1: combining all observations into a single data point
with pm.Model() as binomial_model1:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', N, p, observed=np.sum(observed))
    trace1 = pm.sample(40000)
Model 2: using each observation individually
with pm.Model() as binomial_model2:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', 1, p, observed=observed)
    trace2 = pm.sample(40000)
There isn't any noticeable difference in the traces or posteriors in this case. I attempted to dig into the pymc3 source code to see how the observations are processed, but couldn't find the relevant part.
Possible expected answers:
pymc3 aggregates the observations under the hood for Binomial anyway, so there is no difference
the resultant posterior surface (which is explored in the sample process) is identical in each case -> there is no meaningful/statistical difference in the two models
there are differences in the resultant statistics because of this and that...
This is an interesting example! Your second suggestion is correct: you can actually work out the posterior analytically, and it will be distributed according to
Beta(1 + sum(observed), 1 + N - sum(observed))
in either case.
The difference in modelling approach would show up if you used, for example, pm.sample_ppc, in that the first would be distributed according to Binomial(N, p) and the second would be N draws of Binomial(1, p).
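To see this empirically (a sketch, assuming the traces from the two models above are available), you can overlay both sampled posteriors on the analytic Beta density:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

k = np.sum(observed)
grid = np.linspace(0.05, 0.15, 500)

plt.hist(trace1['p'], bins=100, density=True, alpha=0.5, label='model 1')
plt.hist(trace2['p'], bins=100, density=True, alpha=0.5, label='model 2')
plt.plot(grid, scipy.stats.beta(1 + k, 1 + N - k).pdf(grid), 'k', label='analytic posterior')
plt.legend()
plt.show()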
I have a list of n observations, each of which is the sum of two Weibull-distributed variables:
x[i] = t1[i] + t2[i]
t1[i] ~ Weibull(shape1, scale1)
t2[i] ~ Weibull(shape2, scale2)
My goal is:
1) Estimate the shape and scale parameters for both Weibull distributions (shape1, scale1, shape2, scale2),
2) For each observation x[i], estimate t1[i] (and t2[i] follows from this).
(Aside: Each observation x[i] is the age of cancer diagnosis, and t1[i] and t2[i] are two different time periods in the development of the tumor. The actual model involves mutation data as well, but before I try that out, I want to make sure that I can use PyMC for this simpler problem.)
I am using PyMC2 to make these estimates, and it looks like the run converges, but to incorrect results. I do not know whether there is a problem with my PyMC model syntax, with the MCMC settings, or both. I tried adapting this advice on using Potentials to model latent variables. First I define x[i] and t1[i] for each observation:
for i in xrange(n):
    x[i] = pm.Index('x_%i'%i, x=data, index=i)  # data is a list of observations
    t1[i] = pm.Weibull('t1_%i'%i, alpha=shape1, beta=scale1)
    # Ensure that initial guess for t1 is not more than the observed sum:
    if t1[i].value >= x[i].value:
        t1[i].value = 0.95 * x[i].value
Then I define a Deterministic for t2[i] = x[i] - t1[i]:
for i in xrange(n):
    def subtractfunc(t1=t1, x=x, ii=i):
        return x[ii] - t1[ii]
    t2[i] = pm.Lambda('t2_%i'%i, subtractfunc)
And last I define the Potential for t2[i]:
t2dist = np.empty(n, dtype=object)
for i in xrange(n):
    def weibfunc(t2=t2, shape2=shape2, scale2=scale2, ii=i):
        return pm.weibull_like(t2[ii], alpha=shape2, beta=scale2)
    t2dist[i] = pm.Potential(logp = weibfunc,
                             name = 't2dist_%i'%i,
                             parents = {'shape2':shape2, 'scale2':scale2, 't2':t2},
                             doc = 'weibull potential for t2',
                             verbose = 0,
                             cache_depth = 2)
You can see my full code here. I test by simulating 60 independent observations, with shape1 = 1, scale1 = 30, shape2 = 6.5, scale2 = 10, and I run 1e5 iterations of AdaptiveMetropolis. The results converge to a mean of shape1=1.94, scale1=37.9, shape2=0.55, scale2=36.1, and the 95% HPDs do not include the true values. This resulting distribution is not even in the right ballpark, as this histogram shows. (Blue shows the simulated data x[i] that I used, while the red shows the completely different inferred distribution from a representative iteration in the MCMC run.)
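For reference, test data of this kind can be simulated with numpy along these lines (a sketch; numpy's weibull draws are on unit scale, so each draw is multiplied by the corresponding scale parameter):

import numpy as np

np.random.seed(0)
n = 60
shape1, scale1 = 1.0, 30.0
shape2, scale2 = 6.5, 10.0

t1_true = scale1 * np.random.weibull(shape1, size=n)
t2_true = scale2 * np.random.weibull(shape2, size=n)
data = t1_true + t2_true  # the observed x[i]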
Running again with a different random seed, I get shape1=4.65, scale1=23.3, shape2=0.83, scale2=21.3. This distribution is somewhat closer to the truth. Is there some way to change the MCMC settings to consistently get decent results for this sort of problem? Any advice about using PyMC more effectively is much appreciated.
Update -- tried an "assisted" MCMC run:
I also tried assisting the MCMC run by initializing population-level parameters with values close to the truth. The results are somewhat better, but I now find a systematic bias. The histogram below shows the true distribution of observations (blue) against the fitted distribution (red). The right tail fits nicely, but the fit fails to capture the sharp peak at the left side. This bias occurs consistently, for population sizes n = 60 and 100. I am not sure if this is more of a PyMC question or a general MCMC algorithm issue.
I am trying to learn the parameters of a simple discrete HMM using PyMC. I am modeling the rainy-sunny model from the Wikipedia page on HMMs, in which hidden rainy/sunny weather states emit the observed activities walk, shop, and clean.
I am using the following priors.
theta_start_state ~ Beta(20, 10)
theta_transition_rainy ~ Beta(8, 2)
theta_transition_sunny ~ Beta(2, 8)
theta_emission_rainy ~ Dirichlet(3, 4, 3)
theta_emission_sunny ~ Dirichlet(10, 6, 4)
Initially, I use this setup to create a training set as follows.
## Some not so informative priors!
# Prior on start state
theta_start_state = pm.Beta('theta_start_state',12,8)
# Prior on transition from rainy
theta_transition_rainy = pm.Beta('transition_rainy',8,2)
# Prior on transition from sunny
theta_transition_sunny = pm.Beta('transition_sunny',2,8)
# Prior on emission from rainy
theta_emission_rainy = pm.Dirichlet('emission_rainy',[3,4,3])
# Prior on emission from sunny
theta_emission_sunny = pm.Dirichlet('emission_sunny',[10,6,4])
# Start state
x_train_0 = pm.Categorical('x_0',[theta_start_state, 1-theta_start_state])
N = 100
# Create a train set for hidden states
x_train = np.empty(N, dtype=object)
# Creating a train set of observations
y_train = np.empty(N, dtype=object)
x_train[0] = x_train_0
for i in xrange(1, N):
    if x_train[i-1].value == 0:
        x_train[i] = pm.Categorical('x_train_%d'%i, [theta_transition_rainy, 1 - theta_transition_rainy])
    else:
        x_train[i] = pm.Categorical('x_train_%d'%i, [theta_transition_sunny, 1 - theta_transition_sunny])

for i in xrange(0, N):
    if x_train[i].value == 0:
        # Rain
        y_train[i] = pm.Categorical('y_train_%d' %i, theta_emission_rainy)
    else:
        y_train[i] = pm.Categorical('y_train_%d' %i, theta_emission_sunny)
However, I am not able to understand how to learn these parameters using PyMC. I made a start as follows.
@pm.observed
def y(x=x_train, value=y_train):
    N = len(x)
    out = np.empty(N, dtype=object)
    for i in xrange(0, N):
        if x[i].value == 0:
            # Rain
            out[i] = pm.Categorical('y_%d' %i, theta_emission_rainy)
        else:
            out[i] = pm.Categorical('y_%d' %i, theta_emission_sunny)
    return out
The complete notebook containing this code can be found here.
Aside: The gist containing HMM code for a Gaussian is really hard to understand! (not documented)
Update
Based on the answers below, I tried changing my code as follows:
@pm.stochastic(observed=True)
def y(value=y_train, hidden_states=x_train):
    def logp(value, hidden_states):
        logprob = 0
        for i in xrange(0, len(hidden_states)):
            if hidden_states[i].value == 0:
                # Rain
                logprob = logprob + pm.categorical_like(value[i], theta_emission_rainy)
            else:
                # Sunny
                logprob = logprob + pm.categorical_like(value[i], theta_emission_sunny)
        return logprob
The next step would be to create a model and then run the MCMC algorithm. However, the edited code above does not work either; it gives a ZeroProbability error.
I am not sure if I have interpreted the answers correctly.
Just some thoughts on this:
Sunny and Rainy are mutually exclusive and exhaustive hidden states. Why don't you encode them as a single categorical weather variable which can take one of two values (coding for sunny, rainy)?
In your likelihood function, you seem to observe Rainy / Sunny. The way I see it in your model graph, these seem to be the hidden, not the observed variables (that would be "walk", "shop" and "clean")
In your likelihood function, you need to sum (over all time steps t) the log-probability of the observed values (of walk, shop and clean respectively) given the current (sampled) values of rainy/sunny (i.e., the weather) at the same time step t; see the sketch after these points.
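In plain numpy terms, the log-likelihood described in the last point is just the following (a conceptual sketch with hypothetical names; emission_rainy and emission_sunny are assumed to be full probability vectors over walk/shop/clean):

import numpy as np

def weather_loglik(observations, weather, emission_rainy, emission_sunny):
    """Sum over time steps of log P(observation_t | weather_t)."""
    logprob = 0.0
    for obs_t, weather_t in zip(observations, weather):
        emission = emission_rainy if weather_t == 0 else emission_sunny
        logprob += np.log(emission[obs_t])
    return logprob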
Learning
If you want to learn parameters for the model, you might want to consider switching to PyMC3, which would be better suited to automatically computing gradients for your logp function. But in this case (since you chose conjugate priors) this is not really necessary. If you don't know what conjugate priors are, or need an overview, Wikipedia's list of conjugate priors is a great article on that.
Depending on what you want to do, you have a few choices here. If you want to sample from the posterior distribution of all parameters, just specify your MCMC model as you did and press the inference button; after that, plot and summarize the marginal distributions of the parameters you're interested in, and you are done.
If you are not interested in the marginal posterior distributions, but rather in finding the joint MAP parameters, you might consider using Expectation Maximization (EM) learning or Simulated Annealing. Both should work reasonably well within the MCMC framework.
For EM Learning simply repeat these steps until convergence:
E (Expectation) Step: Run the MCMC chain for a while and collect samples
M (Maximization) Step: Update the hyperparameters for your Beta and Dirichlet priors as if these samples had been actual observations. Look up the Beta and Dirichlet distributions if you don't know how to do that.
I would use a small learning rate factor so you don't fall into the first local optimum (now we're approaching Simulated Annealing): Instead of treating the N samples you generated from the MCMC chain as actual observations, treat them as K observations (for a value K << N) by scaling the updates to the hyperparameters down by a learning rate factor of K/N.
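A rough sketch of such a damped M-step update for one of the Beta priors (a hypothetical helper; the 0/1 transition samples collected during the E step are passed in):

import numpy as np

def damped_beta_update(alpha, beta, samples, learning_rate):
    """Conjugate Beta update treating MCMC samples as pseudo-observations,
    scaled down by the learning rate (roughly K/N)."""
    samples = np.asarray(samples)
    successes = samples.sum()
    failures = samples.size - successes
    return alpha + learning_rate * successes, beta + learning_rate * failures

# e.g. updating the Beta(8, 2) prior on the rainy->rainy transition:
# alpha, beta = damped_beta_update(8.0, 2.0, rainy_transition_samples, 0.1)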
The first thing that pops out at me is the return value of your likelihood. PyMC expects a scalar return value, not a list/array. You need to sum the array before returning it.
Also, when you use a Dirichlet as a prior for the Categorical, PyMC detects this and fills in the last probability. Here's how I would code your x_train/y_train loops:
p = []
for i in xrange(1, N):
    # This will return the first variable if prev=0, and the second otherwise
    p.append(pm.Lambda('p_%i' % i,
                       lambda prev=x_train[i-1]: (theta_transition_rainy, theta_transition_sunny)[bool(prev)]))
    x_train[i] = pm.Categorical('x_train_%i' % i, p[-1])
So, you grab the appropriate probabilities with a Lambda, and use it as the argument for the Categorical.
I'm trying to automate a process that at some point needs to draw samples from a truncated multivariate normal. That is, it's an ordinary multivariate normal distribution (i.e. Gaussian), but the variables are constrained to a cuboid. My given inputs are the mean and covariance of the full multivariate normal, but I need samples in my box.
Up to now, I'd just been rejecting samples outside the box and resampling as necessary, but I'm starting to find that my process sometimes gives me (a) large covariances and (b) means that are close to the edges. These two events conspire against the speed of my system.
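For concreteness, a minimal sketch of such a reject-and-resample loop:

import numpy as np

def rejection_sample_box(mean, cov, lower, upper, n_samples):
    """Draw from the full multivariate normal and keep only points inside the
    box; very slow when the box contains little of the probability mass."""
    kept = []
    while len(kept) < n_samples:
        draws = np.random.multivariate_normal(mean, cov, size=n_samples)
        inside = np.all((draws >= lower) & (draws <= upper), axis=1)
        kept.extend(draws[inside])
    return np.array(kept[:n_samples])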
So what I'd like to do is sample the distribution correctly in the first place. Googling led only to this discussion or the truncnorm distribution in scipy.stats. The former is inconclusive and the latter seems to be for one variable. Is there any native multivariate truncated normal? And is it going to be any better than rejecting samples, or should I do something smarter?
I'm going to start working on my own solution, which would be to rotate the untruncated Gaussian to its principal axes (with an SVD decomposition or something), use a product of truncated Gaussians to sample the distribution, then rotate that sample back, and reject/resample as necessary. If the truncated sampling is more efficient, I think this should sample the desired distribution faster.
So, according to the Wikipedia article, sampling a multivariate truncated normal distribution (MTND) is more difficult. I ended up taking a relatively easy way out and using an MCMC sampler to relax an initial guess towards the MTND as follows.
I used emcee to do the MCMC work. I find this package phenomenally easy to use. It only requires a function that returns the log-probability of the desired distribution. So I defined this function:
import numpy as np
from numpy.linalg import inv

def lnprob_trunc_norm(x, mean, bounds, C):
    if np.any(x < bounds[:,0]) or np.any(x > bounds[:,1]):
        return -np.inf
    else:
        return -0.5*(x-mean).dot(inv(C)).dot(x-mean)
Here, C is the covariance matrix of the multivariate normal. Then, you can run something like
S = emcee.EnsembleSampler(Nwalkers, Ndim, lnprob_trunc_norm, args = (mean, bounds, C))
pos, prob, state = S.run_mcmc(pos, Nsteps)
for given mean, bounds and C. You need an initial guess for the walkers' positions pos, which could be a ball around the mean,
pos = emcee.utils.sample_ball(mean, np.sqrt(np.diag(C)), size=Nwalkers)
or sampled from an untruncated multivariate normal,
pos = np.random.multivariate_normal(mean, C, size=Nwalkers)
and so on. I personally do several thousand steps of sample discarding first, because it's fast, then force the remaining outliers back within the bounds, then run the MCMC sampling.
The number of steps for convergence is up to you.
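Once the chain has run, the kept samples can be extracted and flattened (a sketch assuming emcee's EnsembleSampler.chain attribute of shape (Nwalkers, Nsteps, Ndim)):

Nburn = 1000  # however many steps you decide to discard as burn-in
samples = S.chain[:, Nburn:, :].reshape(-1, Ndim)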
Note also that emcee easily supports basic parallelization by adding the argument threads=Nthreads to the EnsembleSampler initialization. So you can make this blazing fast.
I have reimplemented an algorithm which does not depend on MCMC but creates independent and identically distributed (iid) samples from the truncated multivariate normal distribution. Having iid samples can be very useful! I used to also use emcee as described in the answer by Warrick, but for convergence the number of samples needed exploded in higher dimensions, making it impractical for my use case.
The algorithm was introduced by Botev (2016) and uses an accept-reject algorithm based on minimax exponential tilting. It was originally implemented in MATLAB but reimplementing it for Python increased the performance significantly compared to running it using the MATLAB engine in Python. It also works well and is fast at higher dimensions.
The code is available at: https://github.com/brunzema/truncated-mvn-sampler.
An Example:
import numpy as np

# TruncatedMVN is the sampler class provided by the repository linked above
d = 10  # dimensions

# random mu and cov
mu = np.random.rand(d)
cov = 0.5 - np.random.rand(d ** 2).reshape((d, d))
cov = np.triu(cov)
cov += cov.T - np.diag(cov.diagonal())
cov = np.dot(cov, cov)

# constraints
lb = np.zeros_like(mu) - 1
ub = np.ones_like(mu) * np.inf

# create truncated normal and sample from it
n_samples = 100000
tmvn = TruncatedMVN(mu, cov, lb, ub)
samples = tmvn.sample(n_samples)
Plotting a histogram of the first dimension of the samples shows the expected truncation at the lower bound of -1.
Reference:
Botev, Z. I. (2016). The normal law under linear restrictions: simulation and estimation via minimax tilting. Journal of the Royal Statistical Society: Series B, 79(1), 125-148.
Simulating truncated multivariate normal can be tricky and usually involves some conditional sampling by MCMC.
My short answer is, you can use my code (https://github.com/ralphma1203/trun_mvnt)!!! It implements a Gibbs sampler algorithm that can handle general linear constraints of the form lower <= D x <= upper, even when you have a non-full-rank D and more constraints than the dimensionality.
import numpy as np
from trun_mvnt import rtmvn, rtmvt
########## Traditional problem, probably what you need... ##########
##### lower < X < upper #####
# So D = identity matrix
D = np.diag(np.ones(4))
lower = np.array([-1,-2,-3,-4])
upper = -lower
Mean = np.zeros(4)
Sigma = np.diag([1,2,3,4])
n = 10 # number of final samples to keep
burn = 100 # burn-in first 100 iterates
thin = 1 # thinning for Gibbs
random_sample = rtmvn(n, Mean, Sigma, D, lower, upper, burn, thin)
# Numpy array n-by-p as result!
random_sample
########## Non-full rank problem (more constraints than dimension) ##########
Mean = np.array([0,0])
Sigma = np.array([1, 0.5, 0.5, 1]).reshape((2,2)) # bivariate normal
D = np.array([1,0,0,1,1,-1]).reshape((3,2)) # non-full rank problem
lower = np.array([-2,-1,-2])
upper = np.array([2,3,5])
n = 500 # want 500 final sample
burn = 100 # burn-in first 100 iterates
thin = 1 # thinning for Gibbs
random_sample = rtmvn(n, Mean, Sigma, D, lower, upper, burn, thin) # Numpy array n-by-p as result!
A little late I guess, but for the record: you could use Hamiltonian Monte Carlo. A MATLAB module named "HMC exact" exists for this. It shouldn't be too difficult to translate it to Python.