pymc3: how to model correlated intercept and slope in multilevel linear regression - python

In the Pymc3 example for multilevel linear regression (the example is here, with the radon data set from Gelman et al.’s (2007)), the intercepts (for different counties) and slopes (for apartment with and without basement) each have a Normal prior. How can I model them together with a multivariate normal prior, so that I can examine the correlation between them?
The hierarchical model given in the example is like this:
with pm.Model() as hierarchical_model:
# Hyperpriors for group nodes
mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
sigma_a = pm.HalfCauchy('sigma_a', 5)
mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
sigma_b = pm.HalfCauchy('sigma_b', 5)
# Intercept for each county, distributed around group mean mu_a
# Above we just set mu and sd to a fixed value while here we
# plug in a common group distribution for all a and b (which are
# vectors of length n_counties).
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_counties)
# Intercept for each county, distributed around group mean mu_a
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_counties)
# Model error
eps = pm.HalfCauchy('eps', 5)
radon_est = a[county_idx] + b[county_idx] * data.floor.values
# Data likelihood
radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
hierarchical_trace = pm.sample(2000)
And I'm trying to make some change to the priors
with pm.Model() as correlation_model:
# Hyperpriors for group nodes
mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
# here I want to model a and b together
# I borrowed some code from a multivariate normal model
# but the code does not work
sigma = pm.HalfCauchy('sigma', 5, shape=2)
C_triu = pm.LKJCorr('C_triu', n=2, p=2)
C = T.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1)
cov = pm.Deterministic('cov', T.nlinalg.matrix_dot(sigma, C, sigma))
tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))
a, b = pm.MvNormal('mu', mu=(mu_a, mu_b), tau=tau,
shape=(n_counties, n_counties))
# Model error
eps = pm.HalfCauchy('eps', 5)
radon_est = a[county_idx] + b[county_idx] * data.floor.values
# Data likelihood
radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
correlation_trace = pm.sample(2000)
Here is the error message I got:
File "<ipython-input-108-ce400c54cc39>", line 14, in <module>
tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))
File "/home/olivier/anaconda3/lib/python3.5/site-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/olivier/anaconda3/lib/python3.5/site-packages/theano/tensor/nlinalg.py", line 73, in make_node
assert x.ndim == 2
AssertionError
Clearly I've made some mistakes about the covariance matrix, but I'm new to pymc3 and completely new to theano so have no idea how to fix it. I gather this should be a rather common use case so maybe there have been some examples on it? I just can't find them.
The full replicable code and data can be seen on the example page (link given above). I didn't include it here because it's too long and also I thought those familiar with pymc3 are very likely already quite familiar with it:)

You forgot to add one line when creating the covariance matrix you miss-specified the shape of the MvNormal. Your model should look something like this:
with pm.Model() as correlation_model:
mu = pm.Normal('mu', mu=0., sd=10, shape=2)
sigma = pm.HalfCauchy('sigma', 5, shape=2)
C_triu = pm.LKJCorr('C_triu', n=2, p=2)
C = tt.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1.)
sigma_diag = tt.nlinalg.diag(sigma) # this line
cov = tt.nlinalg.matrix_dot(sigma_diag, C, sigma_diag)
tau = tt.nlinalg.matrix_inverse(cov)
ab = pm.MvNormal('ab', mu=mu, tau=tau, shape=(n_counties, 2))
eps = pm.HalfCauchy('eps', 5)
radon_est = ab[:,0][county_idx] + ab[:,1][county_idx] * data.floor.values
radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)
trace = pm.sample(2000)
Notice that alternatively, you can evaluate the correlation of the intercept and the slope from the posterior of hierarchical_model. You can use a frequentist method or build another Bayesian model, that takes as the observed data the result of hierarchical_model. May be this could be faster.
EDIT
If you want to evaluate the correlation of two variables from the posterior you can do something like.
chain = hierarchical_trace[100:]
x_0 = chain['mu_a']
x_1 = chain['mu_b']
X = np.vstack((x_0, x_1)).T
and then you can run the following model:
with pm.Model() as correlation:
mu = pm.Normal('mu', mu=0., sd=10, shape=2)
sigma = pm.HalfCauchy('sigma', 5, shape=2)
C_triu = pm.LKJCorr('C_triu', n=2, p=2)
C = tt.fill_diagonal(C_triu[np.zeros((2,2), 'int')], 1.)
sigma_diag = tt.nlinalg.diag(sigma)
cov = tt.nlinalg.matrix_dot(sigma_diag, C, sigma_diag)
tau = tt.nlinalg.matrix_inverse(cov)
yl = pm.MvNormal('yl', mu=mu, tau=tau, shape=(2, 2), observed=X)
trace = pm.sample(5000, pm.Metropolis())
You can replace x_0 and x_1 according to your needs. For example you may want to do:
x_0 = np.random.normal(chain['mu_a'], chain['sigma_a'])
x_1 = np.random.normal(chain['mu_b'], chain['sigma_b'])

Related

How to calculate Bayesian model selection with Python's pymcmcstat library

I am lost within the pymcmcstat documentation of Python. I managed to plot the parameter distributions etc, but when it comes to the Bayes factor, I need to calculate the integral over the parameter space of likelihood for each model.
I followed this video. Each model has a different model function with different parameters. According to this link, I am supposed to compare the model evidences for model selection. All I have in my hand is the chain results after burnin that returns the distribution for each parameters, chain for sum-of-squares error (SSE) and variances. How do I compare the models with mcmc chain results I have?
Where do I go from here?
Here is my code for one model; for each model, the test_modelfun is changed and the chain results are saved for further comparison of different models;
# Data related lines: input omega and output fm
x = (np.array([76.29395, 152.5879, 305.1758, 610.3516, 1220.703, 2441.406, 4882.813, 9765.625, 19531.25, 39062.5, 78125, 156250, 312500, 625000]))
y = np.array([155.6412886 -63.3826188j , 113.9114436 -79.90544719j, 64.97809441-77.65152741j, 26.87482243-57.38474656j, 7.44462341-34.02438426j, 2.32954856-16.17918216j, 2.30747953 -6.72487436j, 3.39658859 -2.72444011j, 4.0084345 -1.2029167j , 4.25877486 -0.70276446j, 4.11761329 -0.69591231j, 3.83339489 -0.65244854j, 3.47289164 -0.6079278j , 3.07027319 -0.14914359j])
#import mcmc library and add data to the library in the second line below
mcstat = MCMC()
mcstat.data.add_data_set(x,y)
##define transfer function model calculated with theta parameters
def test_modelfun(xdata, theta):
K, alpha_0, alpha_1, Tp_1, Tp_2, Tz_1 = 10**theta[0], 10**theta[1], 10**theta[2], 10**theta[3], 10**theta[4], 10**theta[5]
#####################
Pz_0 = (omega**(alpha_0))
Pz_1 = (np.sqrt(((Tp_1**2)*(omega**(2*alpha_1))) + (2*Tp_1*(omega**alpha_1)*cos(alpha_1*pi/2)) +1))
Pz_2 = (np.sqrt(((Tp_2**2)*(omega**(2*alpha_1))) + (2*Tp_2*(omega**alpha_1)*cos(alpha_1*pi/2)) +1))
Zz_1 = (np.sqrt(((Tz_1**2)*(omega**(2*alpha_1))) + (2*Tz_1*(omega**alpha_1)*cos(alpha_1*pi/2)) +1))
Pp_0 = np.array([(-1*pi*alpha_0)/2]*len(omega)).T#[0]
Pp_1 = np.array([math.atan((Tp_1*(omega[i]**alpha_1)*sin(pi*alpha_1/2))/(1+(Tp_1*(omega[i]**alpha_1)*cos(pi*alpha_1/2)))) for i in range(len(omega))])
Pp_2 = np.array([math.atan((Tp_2*(omega[i]**alpha_1)*sin(pi*alpha_1/2))/(1+(Tp_2*(omega[i]**alpha_1)*cos(pi*alpha_1/2)))) for i in range(len(omega))])
Zp_1 = np.array([math.atan((Tz_1*(omega[i]**alpha_1)*sin(pi*alpha_1/2))/(1+(Tz_1*(omega[i]**alpha_1)*cos(pi*alpha_1/2)))) for i in range(len(omega))])
#####################
Z_est = (K*Zz_1)/(Pz_0*Pz_1*Pz_2)
P_est = Zp_1 + Pp_0 - Pp_1 - Pp_2
#####################
R_est = np.real([cmath.rect(Z_est[i], P_est[i]) for i in range(len(omega))])#abs()#[:,0]
X_est = np.imag([cmath.rect(Z_est[i], P_est[i]) for i in range(len(omega))])#abs()#[:,0]
RX_est = (R_est + 1j*X_est)
return RX_est
def modelfun(xdata, theta):
ymodel = test_modelfun(xdata,theta)
Zest = 20*log10(np.abs(ymodel))
return Zest
##define sum of squares function for the error in evaluating the likelihood function L(Fobs(i)|q)
def test_ssfun(theta,data):
xdata = data.xdata[0]
ydata = data.ydata[0]
ymodel = test_modelfun(xdata,theta)
return (1/len(omega))*(sum((real(fm)- real(ymodel))**2 + (imag(fm)-imag(ymodel))**2))
#sumsquares = sum((ymodel[:,0]-ydata[:,0])**2)
##import mcmc library and add data to the library in the second line below
itr = 50.0e4
verb = 1
wbar = 1
mcstat = MCMC()
mcstat.data.add_data_set(x,y)
## add model parameters
mcstat.parameters.add_model_parameter(name='th_1',theta0=1, minimum=-2,maximum=3) #m_k, M_k = -2, 3
mcstat.parameters.add_model_parameter(name='th_2',theta0=-1, minimum=-4,maximum=0) #m_a0, M_a0 = -4, 0
mcstat.parameters.add_model_parameter(name='th_3',theta0=-1, minimum=-3,maximum=0) #m_a1, M_a1 = -3, 0
mcstat.parameters.add_model_parameter(name='th_4',theta0=-4, minimum=-9,maximum=0) #m_p1, M_p1 = -9, 0
mcstat.parameters.add_model_parameter(name='th_5',theta0=-4, minimum=-9,maximum=0) #m_p2, M_p2 = -9, 0
mcstat.parameters.add_model_parameter(name='th_6',theta0=-4, minimum=-9,maximum=0) #m_z1, M_z1 = -9, 0
## define simulation options: mh=metropolis-hastings, am=adaptive metropolis, dr=delayed rejection, dram=dr+am
mcstat.simulation_options.define_simulation_options(nsimu=int(itr), updatesigma=1, method='dr', adaptint=100, verbosity=verb, waitbar=wbar)
## define model settings
mcstat.model_settings.define_model_settings(sos_function=test_ssfun)
mcstat.run_simulation()
## extract results
results=mcstat.simulation_results.results
chain = results['chain']# chain for each parameter sampled during simulation. s2
s2chain = results['s2chain']# chain for error variances. if updatesigma=0 then s2chain is an empty list
sschain = results['sschain']# chain for sum-of-squares error calculated using each set of parameter values in the cahin
names = results['names']
burnin = int(itr/2)
## display chain statistics
mcstat.chainstats(chain[burnin:,:],results)
mcpl = mcstat.mcmcplot
figcp = mcpl.plot_chain_panel(chain, names, figsizeinches = (7,6))
axes = figcp.get_axes()
for ii, ax in enumerate(axes):
ch = chain[:, ii]
ax.plot([burnin, burnin], [ch.min(), ch.max()], 'r')
figpd = mcpl.plot_density_panel(chain[burnin:,:], names, figsizeinches=(7,6))
figpc = mcpl.plot_pairwise_correlation_panel(chain[burnin:,:], names, figsizeinches = (7,6))
mcstat.PI.setup_prediction_interval_calculation(results=results, data=mcstat.data, modelfunction=modelfun, burnin=burnin)
mcstat.PI.generate_prediction_intervals(calc_pred_int=True, waitbar=False)
fg, ax = mcstat.PI.plot_prediction_intervals(adddata=True, plot_pred_int=True, figsizeinches = (7,5), data_display=dict(color='k'))

Estimating the p value of the difference between two proportions using statsmodels and PyMC3 (MCMC simulation) in Python

In Probabilistic-Programming-and-Bayesian-Methods-for-Hackers, a method is proposed to compute the p value that two proportions are different.
(You can find the jupyter notebook here containing the entire chapter
http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb)
The code is the following:
import pymc3 as pm
figsize(12, 4)
#these two quantities are unknown to us.
true_p_A = 0.05
true_p_B = 0.04
N_A = 1700
N_B = 1700
#generate some observations
observations_A = bernoulli.rvs(true_p_A, size=N_A)
observations_B = bernoulli.rvs(true_p_B, size=N_B)
print(np.mean(observations_A))
print(np.mean(observations_B))
0.04058823529411765
0.03411764705882353
# Set up the pymc3 model. Again assume Uniform priors for p_A and p_B.
with pm.Model() as model:
p_A = pm.Uniform("p_A", 0, 1)
p_B = pm.Uniform("p_B", 0, 1)
# Define the deterministic delta function. This is our unknown of interest.
delta = pm.Deterministic("delta", p_A - p_B)
# Set of observations, in this case we have two observation datasets.
obs_A = pm.Bernoulli("obs_A", p_A, observed=observations_A)
obs_B = pm.Bernoulli("obs_B", p_B, observed=observations_B)
# To be explained in chapter 3.
step = pm.Metropolis()
trace = pm.sample(20000, step=step)
burned_trace=trace[1000:]
p_A_samples = burned_trace["p_A"]
p_B_samples = burned_trace["p_B"]
delta_samples = burned_trace["delta"]
# Count the number of samples less than 0, i.e. the area under the curve
# before 0, represent the probability that site A is worse than site B.
print("Probability site A is WORSE than site B: %.3f" % \
np.mean(delta_samples < 0))
print("Probability site A is BETTER than site B: %.3f" % \
np.mean(delta_samples > 0))
Probability site A is WORSE than site B: 0.167
Probability site A is BETTER than site B: 0.833
However, if we compute the p value using statsmodels, we get a very different result:
from scipy.stats import norm, chi2_contingency
import statsmodels.api as sm
s1 = int(1700 * 0.04058823529411765)
n1 = 1700
s2 = int(1700 * 0.03411764705882353)
n2 = 1700
p1 = s1/n1
p2 = s2/n2
p = (s1 + s2)/(n1+n2)
z = (p2-p1)/ ((p*(1-p)*((1/n1)+(1/n2)))**0.5)
z1, p_value1 = sm.stats.proportions_ztest([s1, s2], [n1, n2])
print('z1 is {0} and p is {1}'.format(z1, p))
z1 is 0.9948492584166934 and p is 0.03735294117647059
With MCMC, the p value seems to be 0.167, but using statsmodels, we get a p value 0.037.
How can I understand this?
Looks like you printed the wrong value. Try this instead:
print('z1 is {0} and p is {1}'.format(z1, p_value1))
Also, if you want to test the hypothesis p_A > p_B then you should set the alternative parameter in the function call to larger like so:
z1, p_value1 = sm.stats.proportions_ztest([s1, s2], [n1, n2], alternative='larger')
The docs have more examples on how to use it.

Multiple levels in hierarchical linear regression using PYMC3

I am trying to set up a hierarchical linear regression model using PYMC3. In my particular case, I want to see whether postal codes provide a meaningful structure for other features. Suppose I use the following mock data:
import pandas as pd
import numpy as np
import pymc3 as pm
data = pd.DataFrame({"postalcode": np.floor(np.random.uniform(low=10, high=99, size=1000)),
"x": np.random.normal(size=1000),
"y": np.random.normal(size=1000)})
data["postalcode"] = data["postalcode"].astype(int)
I generate postal codes from 10 to 99, as well as a normally distributed feature x and a target value y. Now I set up my indices for postal code level 1 and level 2:
def create_pc_index(level):
pc = data["postalcode"].astype(str).str[0:level]
unique_pc = pc.unique()
pc_dict = dict(zip(unique_pc, range(0, len(unique_pc))))
return pc_dict, pc.apply(lambda x: pc_dict[x]).values
pc1_dict, pc1_index = create_pc_index(1)
pc2_dict, pc2_index = create_pc_index(2)
Using the first digit of the postal code as hierarchical attribute works fine:
number_of_samples = 1000
x = data["x"]
y = data["y"]
with pm.Model() as model:
sigma = pm.HalfCauchy('sigma', beta=10, testval=0.5, shape=1)
mu_i = pm.Normal("mu_i", 5, sd=25, shape=1)
intercept = pm.Normal('Intercept', mu_i, sd=1, shape=len(pc1_dict))
mu_s = pm.Normal("mu_x", 0, sd=3, shape=1)
x_coeffs = pm.Normal("x", mu_s, 1, shape=len(pc1_dict))
mean = intercept[pc1_index] + x_coeffs[pc1_index] * x
likelihood_mean = pm.Deterministic("mean", mean)
likelihood = pm.Normal('y', mu=likelihood_mean, sd=sigma, observed=y)
trace = pm.sample(number_of_samples)
burned_trace = trace[number_of_samples/2:]
However, if I want to add a second level to my hierarchy (in this case only on the intercept, ignoring x for the moment), I run into shape problems
with pm.Model() as model:
sigma = pm.HalfCauchy('sigma', beta=10, testval=0.5, shape=1)
mu_i_level_1 = pm.Normal("mu_i", 0, sd=25, shape=1)
mu_i_level_2 = pm.Normal("mu_i_level_2", mu_i_level_1, sd=1, shape=len(pc1_dict))
intercept = pm.Normal('Intercept', mu_i_level_2[pc1_index], sd=1, shape=len(pc2_dict))
mu_s = pm.Normal("mu_x", 0, sd=3, shape=1)
x_coeffs = pm.Normal("x", mu_s, 1, shape=len(pc1_dict))
mean = intercept[pc2_index] + x_coeffs[pc1_index] * x
likelihood_mean = pm.Deterministic("mean", mean)
likelihood = pm.Normal('y', mu=likelihood_mean, sd=sigma, observed=y)
trace = pm.sample(number_of_samples)
burned_trace = trace[number_of_samples/2:]
The error message is:
operands could not be broadcast together with shapes (89,) (1000,)
How do I model multiple levels in my regression correctly? Is this just an issue with the correct shape size or is there a more fundamental error on my part?
Thanks in advance!
I don't think intercept can have a shape of len(pc2_dict) but a mu of len(pc1_dict). The contradiction is here:
intercept = pm.Normal('Intercept', mu_i_level_2[pc1_index], sd=1, shape=len(pc2_dict))

How to fit multiple datasets which have a combination of shared and non-shared parameters

I'm trying to fit multiple dataset which should have some variables which are shared by between datasets and others which are not. However I'm unsure of the steps I need to take to do this. Below I've shown the approach that I'm trying to use (from 'Issues begin here' doesn't work, and it just for illustrative purposes).
In this answer somebody is able to share parameters arcoss datasets is there some way that this could be adapted so that I can also have some non-shared parameters?
Does anybody have an idea how I could achieve this, or could somebody suggegst a better approach to acheive the same result? Thanks.
import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt
import pandas as pd
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit, Model
# Create datasets to fit
a = 1.99
start = gamma.ppf(0.001, a)
stop = gamma.ppf(.99, a)
xvals = np.linspace(start, stop, 100)
yvals = gamma.pdf(xvals, a)
data_dict = {}
for dataset in range(4):
name = 'dataset_' + str(dataset)
rand_offset = np.random.uniform(-.1, .1)
noise = np.random.uniform(-.05, .05,len(yvals)) + rand_offset
data_dict[name] = yvals + noise
df = pd.DataFrame(data_dict)
# Create some non-shared parameters
non_shared_param1 = np.random.uniform(0.009, .21, 4)
non_shared_param2 = np.random.uniform(0.01, .51, 4)
# Create the independent variable
ind_var = np.linspace(.001,100,100)
# Create a model
def model_func(time, Qi, at, vw, R, rhob_cb, al, NSP1, NSP2):
Dt = at * vw
Dl = al * vw
t = time
first_bot = 8 * np.pi * t * rhob_cb
sec_bot = np.sqrt((np.pi * (Dl * R) * t))
exp_top = R * np.power((NSP1 - ((t * vw)/R)), 2)
exp_bot = 4 * Dl * t
exp_top2 = R * np.power(NSP2, 2)
exp_bot2 = 4 * Dt * t
return (Qi / first_bot * sec_bot) * np.exp(- (exp_top / exp_bot) - (exp_top2 / exp_bot2))
model = Model(model_func)
### Issues begin here ###
all_results = {}
index = 0
for col in df:
# This block assigns the correct non-shared parameter for the particular fit
nsp1 = non_shared_param1[index]
nsp2 = non_shared_param2[index]
index += 1
params = Parameters()
at = 0.1
al = 0.15
vw = 10**-4
Dt = at * vw
Dl = al * vw
# Non-shared parameters
model.set_param_hint('NSP1', value = nsp1)
model.set_param_hint('NSP2', value = nsp2)
# Shared and varying parameters
model.set_param_hint('vw', value =10**-4, min=10**-10)
model.set_param_hint('at', value =0.1)
model.set_param_hint('al', value =0.15)
# Shared and fixed parameters
model.set_param_hint('Qi', value = 1000, vary = True)
model.set_param_hint('R', value = 1.7, vary = True)
model.set_param_hint('rhob_cb', value =2895, vary = True)
# One set of parameters should be returned
result = model.fit(df[col], time = ind_var)
all_results[index] = result
A fit with lmfit always uses a single instance of a Parameters object, it does not take multiple Parameters objects.
In order to simultaneously fit multiple data sets with the similar models (perhaps the same mathematical model, but expecting different parameter values for each model), you need to have an objective function that concatenates the residuals from the different component models. And each of those models has to have parameters that are taken from the single instance of Parameters(), each parameter having a unique name.
So, to fit 2 data sets with the same function (let's use Gaussian, with parameters "center", "amplitude", and "sigma"), you might define the Parameters as
params = Parameters()
params.add('center_1', 5., vary=True)
params.add('amplitude_1', 10., vary=True)
params.add('sigma_1', 1.0, vary=True
params.add('center_2', 8., vary=True)
params.add('amplitude_2', 3., vary=True)
params.add('sigma_2', 2.0, vary=True)
Then use 'center_1', 'amplitude_1', and 'sigma_1' to calculate the model for the first data set and 'center_2', etc to calculate the the model for the second, perhaps as
def residual(params, x, datasets):
model1 = params['amplitude_1'] * gaussian(x, params['center_1'], params['sigma_1'])
model2 = params['amplitude_2'] * gaussian(x, params['center_2'], params['sigma_2']
resid1 = datasets[0] - model1
resid2 = datasets[1] - model2
return np.concatenate((resid1, resid2))
fit = lmfit.minimize(residual, params, fcn_args=(x, datasets))
As you might be able to see from this, Parameter values are independent by default. In order to share parameter values to be used in different datasets, you have to do that explicitly (as shown in the linked answer you provide).
For example, if you want to require the sigma values to be the same, would not change the residual function, just the Parameter definition above to:
params.add('sigma_2', expr='sigma_1')
You could require the two amplitudes to add to some value:
params.add('amplitude_2', expr='10 - amplitude_1')
or perhaps you would want to ensure that 'center_2' is larger than 'center_1', but by an amount to be determined in the fit:
params.add('center_offset', value=0.5, min=0)
params.add('center_2', expr='center_1 + center_offset')
Those are all ways to tie parameter values. By default, they're independent. Of course, you can also have some parameters that get used in all the models (say, just call the parameter 'sigma' and use it in for all models).

Modified BPMF in PyMC3 using `LKJCorr` priors: PositiveDefiniteError using `NUTS`

I previously implemented the original Bayesian Probabilistic Matrix Factorization (BPMF) model in pymc3. See my previous question for reference, data source, and problem setup. Per the answer to that question from #twiecki, I've implemented a variation of the model using LKJCorr priors for the correlation matrices and uniform priors for the standard deviations. In the original model, the covariance matrices are drawn from Wishart distributions, but due to current limitations of pymc3, the Wishart distribution cannot be sampled from properly. This answer to a loosely related question provides a succinct explanation for the choice of LKJCorr priors. The new model is below.
import pymc3 as pm
import numpy as np
import theano.tensor as t
n, m = train.shape
dim = 10 # dimensionality
beta_0 = 1 # scaling factor for lambdas; unclear on its use
alpha = 2 # fixed precision for likelihood function
std = .05 # how much noise to use for model initialization
# We will use separate priors for sigma and correlation matrix.
# In order to convert the upper triangular correlation values to a
# complete correlation matrix, we need to construct an index matrix:
n_elem = dim * (dim - 1) / 2
tri_index = np.zeros([dim, dim], dtype=int)
tri_index[np.triu_indices(dim, k=1)] = np.arange(n_elem)
tri_index[np.triu_indices(dim, k=1)[::-1]] = np.arange(n_elem)
logging.info('building the BPMF model')
with pm.Model() as bpmf:
# Specify user feature matrix
sigma_u = pm.Uniform('sigma_u', shape=dim)
corr_triangle_u = pm.LKJCorr(
'corr_u', n=1, p=dim,
testval=np.random.randn(n_elem) * std)
corr_matrix_u = corr_triangle_u[tri_index]
corr_matrix_u = t.fill_diagonal(corr_matrix_u, 1)
cov_matrix_u = t.diag(sigma_u).dot(corr_matrix_u.dot(t.diag(sigma_u)))
lambda_u = t.nlinalg.matrix_inverse(cov_matrix_u)
mu_u = pm.Normal(
'mu_u', mu=0, tau=beta_0 * lambda_u, shape=dim,
testval=np.random.randn(dim) * std)
U = pm.MvNormal(
'U', mu=mu_u, tau=lambda_u,
shape=(n, dim), testval=np.random.randn(n, dim) * std)
# Specify item feature matrix
sigma_v = pm.Uniform('sigma_v', shape=dim)
corr_triangle_v = pm.LKJCorr(
'corr_v', n=1, p=dim,
testval=np.random.randn(n_elem) * std)
corr_matrix_v = corr_triangle_v[tri_index]
corr_matrix_v = t.fill_diagonal(corr_matrix_v, 1)
cov_matrix_v = t.diag(sigma_v).dot(corr_matrix_v.dot(t.diag(sigma_v)))
lambda_v = t.nlinalg.matrix_inverse(cov_matrix_v)
mu_v = pm.Normal(
'mu_v', mu=0, tau=beta_0 * lambda_v, shape=dim,
testval=np.random.randn(dim) * std)
V = pm.MvNormal(
'V', mu=mu_v, tau=lambda_v,
testval=np.random.randn(m, dim) * std)
# Specify rating likelihood function
R = pm.Normal(
'R', mu=t.dot(U, V.T), tau=alpha * np.ones((n, m)),
observed=train)
# `start` is the start dictionary obtained from running find_MAP for PMF.
# See the previous post for PMF code.
for key in bpmf.test_point:
if key not in start:
start[key] = bpmf.test_point[key]
with bpmf:
step = pm.NUTS(scaling=start)
The goal with this reimplementation was to produce a model that could be estimated using the NUTS sampler. Unfortunately, I'm still getting the same error at the last line:
PositiveDefiniteError: Scaling is not positive definite. Simple check failed. Diagonal contains negatives. Check indexes [ 0 1 2 3 ... 1030 1031 1032 1033 1034 ]
I've made all the code for PMF, BPMF, and this modified BPMF available in this gist to make it simple to replicate the error. All you need to do is download the data (also referenced in the gist).
It looks like you are passing the complete precision matrix into the normal distribution:
mu_u = pm.Normal(
'mu_u', mu=0, tau=beta_0 * lambda_u, shape=dim,
testval=np.random.randn(dim) * std)
I assume you only want to pass the diagonal values:
mu_u = pm.Normal(
'mu_u', mu=0, tau=beta_0 * t.diag(lambda_u), shape=dim,
testval=np.random.randn(dim) * std)
Does this change to mu_u and mu_v fix it for you?

Categories

Resources