Estimating the p value of the difference between two proportions using statsmodels and PyMC3 (MCMC simulation) in Python

In Probabilistic-Programming-and-Bayesian-Methods-for-Hackers, a method is proposed to compute the p value that two proportions are different.
(The Jupyter notebook containing the entire chapter is here:
http://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC2.ipynb)
The code is the following:
import pymc3 as pm
import numpy as np
from scipy.stats import bernoulli

figsize(12, 4)
#these two quantities are unknown to us.
true_p_A = 0.05
true_p_B = 0.04
N_A = 1700
N_B = 1700
#generate some observations
observations_A = bernoulli.rvs(true_p_A, size=N_A)
observations_B = bernoulli.rvs(true_p_B, size=N_B)
print(np.mean(observations_A))
print(np.mean(observations_B))
0.04058823529411765
0.03411764705882353
# Set up the pymc3 model. Again assume Uniform priors for p_A and p_B.
with pm.Model() as model:
    p_A = pm.Uniform("p_A", 0, 1)
    p_B = pm.Uniform("p_B", 0, 1)

    # Define the deterministic delta function. This is our unknown of interest.
    delta = pm.Deterministic("delta", p_A - p_B)

    # Set of observations, in this case we have two observation datasets.
    obs_A = pm.Bernoulli("obs_A", p_A, observed=observations_A)
    obs_B = pm.Bernoulli("obs_B", p_B, observed=observations_B)

    # To be explained in chapter 3.
    step = pm.Metropolis()
    trace = pm.sample(20000, step=step)
    burned_trace = trace[1000:]

p_A_samples = burned_trace["p_A"]
p_B_samples = burned_trace["p_B"]
delta_samples = burned_trace["delta"]
# Count the number of samples less than 0, i.e. the area under the curve
# before 0, which represents the probability that site A is worse than site B.
print("Probability site A is WORSE than site B: %.3f" % \
    np.mean(delta_samples < 0))
print("Probability site A is BETTER than site B: %.3f" % \
    np.mean(delta_samples > 0))
Probability site A is WORSE than site B: 0.167
Probability site A is BETTER than site B: 0.833
However, if we compute the p value using statsmodels, we get a very different result:
from scipy.stats import norm, chi2_contingency
import statsmodels.api as sm
s1 = int(1700 * 0.04058823529411765)
n1 = 1700
s2 = int(1700 * 0.03411764705882353)
n2 = 1700
p1 = s1/n1
p2 = s2/n2
p = (s1 + s2)/(n1+n2)
z = (p2-p1)/ ((p*(1-p)*((1/n1)+(1/n2)))**0.5)
z1, p_value1 = sm.stats.proportions_ztest([s1, s2], [n1, n2])
print('z1 is {0} and p is {1}'.format(z1, p))
z1 is 0.9948492584166934 and p is 0.03735294117647059
With MCMC, the p value seems to be 0.167, but using statsmodels we get a p value of 0.037.
How can I understand this?

Looks like you printed the wrong value. Try this instead:
print('z1 is {0} and p is {1}'.format(z1, p_value1))
Also, if you want to test the hypothesis p_A > p_B, then you should set the alternative parameter in the function call to 'larger', like so:
z1, p_value1 = sm.stats.proportions_ztest([s1, s2], [n1, n2], alternative='larger')
The docs have more examples on how to use it.
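To make the comparison concrete, here is a minimal sketch (the success counts s1 = 69 and s2 = 58 are reconstructed from the printed sample means, so treat the exact numbers as an assumption). The 0.037 printed in the question is the pooled proportion p, not a p-value; the one-sided p-value for the alternative p_A > p_B lands close to the posterior probability that site A is worse, which is why the two approaches only look contradictory:
import statsmodels.api as sm

s1, n1 = 69, 1700   # site A: successes reconstructed from 0.0405882... * 1700
s2, n2 = 58, 1700   # site B: successes reconstructed from 0.0341176... * 1700

# two-sided test (the default, what the question ran)
z_two, p_two = sm.stats.proportions_ztest([s1, s2], [n1, n2])

# one-sided test of the alternative p_A > p_B
z_one, p_one = sm.stats.proportions_ztest([s1, s2], [n1, n2], alternative='larger')

print('two-sided:', z_two, p_two)   # z is approx. 0.99, p is approx. 0.32
print('one-sided:', z_one, p_one)   # p is approx. 0.16, close to the MCMC estimate of 0.167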

Constraining log-likelihood using pymc3.Potential?

Problem description
I am new to probabilistic programming and working through PyMC3's Gaussian Mixture Model sample notebook:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
# simulate data from a known mixture distribution
np.random.seed(12345) # set random seed for reproducibility
k = 3
ndata = 500
spread = 5
centers = np.array([-spread, 0, spread])
# simulate data from mixture distribution
v = np.random.randint(0, k, ndata)
data = centers[v] + np.random.randn(ndata)
plt.hist(data);
# setup model
model = pm.Model()
with model:
    # cluster sizes
    p = pm.Dirichlet("p", a=np.array([1.0, 1.0, 1.0]), shape=k)
    # ensure all clusters have some points
    p_min_potential = pm.Potential("p_min_potential", tt.switch(tt.min(p) < 0.1, -np.inf, 0))

    # cluster centers
    means = pm.Normal("means", mu=[0, 0, 0], sigma=15, shape=k)
    # break symmetry
    order_means_potential = pm.Potential(
        "order_means_potential",
        tt.switch(means[1] - means[0] < 0, -np.inf, 0)
        + tt.switch(means[2] - means[1] < 0, -np.inf, 0),
    )

    # measurement error
    sd = pm.Uniform("sd", lower=0, upper=20)

    # latent cluster of each observation
    category = pm.Categorical("category", p=p, shape=ndata)

    # likelihood for each observed value
    points = pm.Normal("obs", mu=means[category], sigma=sd, observed=data)

# fit model
with model:
    step1 = pm.Metropolis(vars=[p, sd, means])
    step2 = pm.ElemwiseCategorical(vars=[category], values=[0, 1, 2])
    tr = pm.sample(10000, step=[step1, step2], tune=5000)
I am struggling with the following expressions:
# ensure all clusters have some points
p_min_potential = pm.Potential("p_min_potential", tt.switch(tt.min(p) < 0.1, -np.inf, 0))
and
# break symmetry
order_means_potential = pm.Potential(
    "order_means_potential",
    tt.switch(means[1] - means[0] < 0, -np.inf, 0)
    + tt.switch(means[2] - means[1] < 0, -np.inf, 0),
)
What I researched
From looking at related questions and PyMC3's and Theano's documentation, I think I understand that pm.Potential() is a way of adding a term to the model's log-likelihood during sampling without tying it to observations, and that tt.switch() checks whether a certain condition is met and returns one of two values accordingly.
Thus p_min_potential ensures that all values in p are greater than 0.1 by setting the log-likelihood to negative infinity whenever any value in p falls below 0.1; similarly, order_means_potential ensures that the values in means differ from one another and that their ordering stays the same during sampling.
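For reference, here is a minimal standalone sketch (a toy model, separate from the notebook above, so the variable names are illustrative) of what tt.switch evaluates to and of how a pm.Potential term enters the model's log-probability:
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# tt.switch picks between two values based on a condition
print(tt.switch(tt.lt(0.05, 0.1), -np.inf, 0).eval())  # -inf: condition is True
print(tt.switch(tt.lt(0.5, 0.1), -np.inf, 0).eval())   # 0.0: condition is False

with pm.Model() as toy:
    x = pm.Normal("x", 0, 1)
    # adds -inf to the total log-probability whenever x < 0,
    # so such proposals are always rejected during sampling
    pm.Potential("positive_x", tt.switch(x < 0, -np.inf, 0))

print(toy.logp({"x": 1.0}))   # finite
print(toy.logp({"x": -1.0}))  # -inf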
Questions
Unfortunately neither related questions nor the documentations could answer the following questions:
How are the results of these expressions fed back into the model, as neither p_min_potential nor order_means_potential occur as input to any other expression?
Am I right about how tt.switch() works, i.e. if the condition tt.min(p) < 0.1 is met, -np.inf is returned as that event's log-likelihood, and 0 in any other case?
Any help would be greatly appreciated, I would like to understand how this example works to the degree where I'll be able to alter and expand it. Specifically I want to implement a mixture model for two or more beta distributions.

Bjontegaard calculation using only one pair of PSNR and BitRate

I want to calculate the BD-Rate for two different video encoding settings using the python script below.
Using 4 RD points (R1 and PSNR1 are the reference RD points of Video1, while R2 and PSNR2 are the new tests with different video settings for Video2), the script works fine, i.e.
from bjontegaard_metric import *
R1 = np.array([686.76, 309.58, 157.11, 85.95])
PSNR1 = np.array([40.28, 37.18, 34.24, 31.42])
R2 = np.array([893.34, 407.8, 204.93, 112.75])
PSNR2 = np.array([40.39, 37.21, 34.17, 31.24])
print 'BD-PSNR: ', BD_PSNR(R1, PSNR1, R2, PSNR2)
print 'BD-RATE: ', BD_RATE(R1, PSNR1, R2, PSNR2)
But with just 1 RD point, i.e.
from bjontegaard_metric import *
R1 = np.array([686.76])
PSNR1 = np.array([40.28])
R2 = np.array([893.34])
PSNR2 = np.array([40.39])
print 'BD-PSNR: ', BD_PSNR(R1, PSNR1, R2, PSNR2)
print 'BD-RATE: ', BD_RATE(R1, PSNR1, R2, PSNR2)
I get a warning: RankWarning: Polyfit may be poorly conditioned. Each video encoder run returns just one pair of PSNR and bitrate as a result, so I want to compare two pairs of PSNR/bitrate (reference video & modified video). Is there any way to fix this warning? Are the results I get using only 1 RD point reliable?
import numpy as np
import scipy.interpolate
def BD_PSNR(R1, PSNR1, R2, PSNR2, piecewise=0):
    lR1 = np.log(R1)
    lR2 = np.log(R2)

    PSNR1 = np.array(PSNR1)
    PSNR2 = np.array(PSNR2)

    p1 = np.polyfit(lR1, PSNR1, 3)
    p2 = np.polyfit(lR2, PSNR2, 3)

    # integration interval
    min_int = max(min(lR1), min(lR2))
    max_int = min(max(lR1), max(lR2))

    # find integral
    if piecewise == 0:
        p_int1 = np.polyint(p1)
        p_int2 = np.polyint(p2)
        int1 = np.polyval(p_int1, max_int) - np.polyval(p_int1, min_int)
        int2 = np.polyval(p_int2, max_int) - np.polyval(p_int2, min_int)
    else:
        # See https://chromium.googlesource.com/webm/contributor-guide/+/master/scripts/visual_metrics.py
        lin = np.linspace(min_int, max_int, num=100, retstep=True)
        interval = lin[1]
        samples = lin[0]
        v1 = scipy.interpolate.pchip_interpolate(np.sort(lR1), PSNR1[np.argsort(lR1)], samples)
        v2 = scipy.interpolate.pchip_interpolate(np.sort(lR2), PSNR2[np.argsort(lR2)], samples)
        # Calculate the integral using the trapezoid method on the samples.
        int1 = np.trapz(v1, dx=interval)
        int2 = np.trapz(v2, dx=interval)

    # find avg diff
    avg_diff = (int2 - int1) / (max_int - min_int)
    return avg_diff


def BD_RATE(R1, PSNR1, R2, PSNR2, piecewise=0):
    lR1 = np.log(R1)
    lR2 = np.log(R2)

    # rate method
    p1 = np.polyfit(PSNR1, lR1, 3)
    p2 = np.polyfit(PSNR2, lR2, 3)

    # integration interval
    min_int = max(min(PSNR1), min(PSNR2))
    max_int = min(max(PSNR1), max(PSNR2))

    # find integral
    if piecewise == 0:
        p_int1 = np.polyint(p1)
        p_int2 = np.polyint(p2)
        int1 = np.polyval(p_int1, max_int) - np.polyval(p_int1, min_int)
        int2 = np.polyval(p_int2, max_int) - np.polyval(p_int2, min_int)
    else:
        lin = np.linspace(min_int, max_int, num=100, retstep=True)
        interval = lin[1]
        samples = lin[0]
        v1 = scipy.interpolate.pchip_interpolate(np.sort(PSNR1), lR1[np.argsort(PSNR1)], samples)
        v2 = scipy.interpolate.pchip_interpolate(np.sort(PSNR2), lR2[np.argsort(PSNR2)], samples)
        # Calculate the integral using the trapezoid method on the samples.
        int1 = np.trapz(v1, dx=interval)
        int2 = np.trapz(v2, dx=interval)

    # find avg diff
    avg_exp_diff = (int2 - int1) / (max_int - min_int)
    avg_diff = (np.exp(avg_exp_diff) - 1) * 100
    return avg_diff
According to the IETF draft at https://tools.ietf.org/id/draft-ietf-netvc-testing-06.html#rfc.section.4.2, item 2: "At least four points must be computed. These points should be the same quantizers when comparing two versions of the same codec." So fewer than four points will not give reliable results (see the sketch after the list below).
1. Rate/distortion points are calculated for the reference and test codec.
2. At least four points must be computed. These points should be the same quantizers when comparing two versions of the same codec.
3. Additional points outside of the range should be discarded.
4. The rates are converted into log-rates.
5. A piecewise cubic hermite interpolating polynomial is fit to the points for each codec to produce functions of log-rate in terms of distortion.
Metric score ranges are computed:
1. If comparing two versions of the same codec, the overlap is the intersection of the two curves, bound by the chosen quantizer points.
2. If comparing dissimilar codecs, a third anchor codec’s metric scores at fixed quantizers are used directly as the bounds.
3. The log-rate is numerically integrated over the metric range for each curve, using at least 1000 samples and trapezoidal integration.
4. The resulting integrated log-rates are converted back into linear rate, and then the percent difference is calculated from the reference to the test codec.
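As a rough safeguard (this wrapper is my own illustration, not part of the original script), you can refuse to compute the metric with fewer than four RD points per curve. With a single point the cubic fit is under-determined, which is exactly what the RankWarning is telling you, and the integration interval collapses to a single value, so the output is not meaningful:
import numpy as np

def bd_rate_checked(R1, PSNR1, R2, PSNR2, piecewise=0):
    """Wrap BD_RATE with the IETF minimum-point check (illustrative only)."""
    R1, PSNR1, R2, PSNR2 = map(np.asarray, (R1, PSNR1, R2, PSNR2))
    if min(len(R1), len(PSNR1), len(R2), len(PSNR2)) < 4:
        raise ValueError("BD-Rate needs at least four RD points per curve; "
                         "a single PSNR/bitrate pair cannot define a curve.")
    return BD_RATE(R1, PSNR1, R2, PSNR2, piecewise=piecewise)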

Checking if Frequentist approach is correct? Bayesian approach using MCMC for AB test. How to calculate Bayes Factors in Python?

I've been trying to get my head around Frequentist and Bayesian approaches for a toy data AB test problem.
The results don't really make sense to me, and I am struggling to tell whether I have computed them correctly (an error on my part is quite likely). Furthermore, after much research, I am still somewhat lost as to how to compute Bayes factors. I've seen packages in R that make this look somewhat easy. Alas, I am not familiar with R and would prefer to be able to solve this problem in Python.
I would greatly appreciate any help and guidance regarding this!
Here is the data:
# imports
import pingouin as pg
import pymc3 as pm
import pandas as pd
import numpy as np
import scipy.stats as scs
import statsmodels.stats.api as sms
import math
import matplotlib.pyplot as plt
# A = control -- B = treatment
a_success = 10730
a_failure = 61988
a_total = a_success + a_failure
a_cr = a_success / a_total
b_success = 10966
b_failure = 60738
b_total = b_success + b_failure
b_cr = b_success / b_total
I started by doing some power analysis to determine the number of required samples, with a power of 0.8, an alpha of 0.05 and a practical significance of 2%. I'm not sure whether the expected conversion rates should be supplied, or the baseline plus some proportion of it. Depending on the effect size, the required number of samples changes dramatically.
# determine required sample size
baseline_rate = a_cr
practical_significance = 0.02
alpha = 0.05
power = 0.8
nobs1 = None
# is this how to calculate effect size?
effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + practical_significance) # 5204
# # or this?
# effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + baseline_rate * practical_significance) # 228583
sample_size = sms.NormalIndPower().solve_power(effect_size=effect_size,
                                               power=power,
                                               alpha=alpha,
                                               nobs1=nobs1,
                                               ratio=1)
I continued trying to determine if the null hypothesis could be rejected:
# calculate pooled probability
pooled_probability = (a_success + b_success) / (a_total + b_total)
# calculate pooled standard error and margin of error
se_pooled = math.sqrt(pooled_probability * (1 - pooled_probability) * (1 / b_total + 1 / a_total))
z_score = scs.norm.ppf(1 - alpha / 2)
margin_of_error = se_pooled * z_score
# the estimated difference between probability of conversions of both groups
d_hat = (b_success / b_total) - (a_success / a_total)

# test if null hypothesis can be rejected
lower_bound = d_hat - margin_of_error
upper_bound = d_hat + margin_of_error

if practical_significance < lower_bound:
    print("reject null hypothesis -- groups do not have the same conversion rates")
else:
    print("do not reject the null hypothesis -- groups have the same conversion rates")
which evaluates to 'do not reject the null ...' despite group B (treatment) showing a 3.65% relative improvement in conversion rate over group A (control), which seems... odd?
I tried a slightly different approach (I guess a slightly different hypothesis?):
successes = [a_success, b_success]
nobs = [a_total, b_total]
z_stat, p_value = sms.proportions_ztest(successes, nobs=nobs)
(lower_a, lower_b), (upper_a, upper_b) = sms.proportion_confint(successes, nobs=nobs, alpha=alpha)
if p_value < alpha:
    print("reject null hypothesis -- groups do not have the same conversion rates")
else:
    print("do not reject the null hypothesis -- groups have the same conversion rates")
which evaluates to 'reject null hypothesis ...' with a p-value of 0.004236. This seems highly contradictory, especially since the p-value is < 0.01.
On to Bayes... I created some arrays of successes and failures and, due to how long sampling takes, only used the first 1000 observations. I then ran the following:
# generate lists of 1, 0
obs_a = np.repeat([1, 0], [a_success, a_failure])
obs_b = np.repeat([1, 0], [b_success, b_failure])

for _ in range(10):
    np.random.shuffle(obs_a)
    np.random.shuffle(obs_b)

with pm.Model() as model:
    p_A = pm.Beta("p_A", 1, 1)
    p_B = pm.Beta("p_B", 1, 1)

    delta = pm.Deterministic("delta", p_A - p_B)

    obs_A = pm.Bernoulli("obs_A", p_A, observed=obs_a[:1000])
    obs_B = pm.Bernoulli("obs_B", p_B, observed=obs_b[:1000])

    step = pm.NUTS()
    trace = pm.sample(1000, step=step, chains=2)
Firstly, I understand that you are supposed to burn some proportion of the trace -- how do you determine an appropriate number of indices to burn?
In trying to evaluate the posterior probabilities, is the following code the correct way to do this?
b_lift = (trace['p_B'].mean() - trace['p_A'].mean()) / trace['p_A'].mean() * 100
b_prob = np.mean(trace["delta"] > 0)
a_lift = (trace['p_A'].mean() - trace['p_B'].mean()) / trace['p_B'].mean() * 100
a_prob = np.mean(trace["delta"] < 0)
# is the Bayes Factor just the ratio of the posterior probabilities for these two models?
BF = (trace['p_B'] / trace['p_A']).mean()
print(f'There is {b_prob} probability B outperforms A by a magnitude of {round(b_lift, 2)}%')
print(f'There is {a_prob} probability A outperforms B by a magnitude of {round(a_lift, 2)}%')
print('BF:', BF)
-- output:
There is 0.666 probability B outperforms A by a magnitude of 1.29%
There is 0.334 probability A outperforms B by a magnitude of -1.28%
BF: 1.013357654428127
I suspect that this is not the correct way to calculate Bayes Factors. How can the Bayes Factor be calculated?
I really hope you can help me understand all of the above... I realize it's an exceptionally long post. But I've tried every resource I can find and am still stuck!
Kind regards.
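One common way to get an actual Bayes factor in this setting, shown only as a minimal sketch (the choice of hypotheses, "different rates" vs. "one shared rate", and the Beta(1, 1) priors are assumptions on my part, not something fixed by the question), is the closed-form marginal-likelihood ratio for Bernoulli data. With a flat Beta(1, 1) prior, the marginal likelihood of s successes and f failures is the Beta function B(s + 1, f + 1), so the ratio is exact:
from scipy.special import betaln
import numpy as np

a_success, a_failure = 10730, 61988
b_success, b_failure = 10966, 60738

# H1: group A and group B have different conversion rates (independent Beta(1, 1) priors)
log_ml_h1 = (betaln(a_success + 1, a_failure + 1)
             + betaln(b_success + 1, b_failure + 1))

# H0: both groups share a single conversion rate (one Beta(1, 1) prior)
log_ml_h0 = betaln(a_success + b_success + 1, a_failure + b_failure + 1)

bf_10 = np.exp(log_ml_h1 - log_ml_h0)
print("BF_10 (evidence for different rates):", bf_10)
The ratio of posterior means of p_B and p_A, as computed above, is not a Bayes factor; a Bayes factor compares the marginal likelihoods of two competing models.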

pymc3: how to model correlated intercept and slope in multilevel linear regression

In the PyMC3 example for multilevel linear regression (the example is here, with the radon data set from Gelman et al. (2007)), the intercepts (for different counties) and slopes (for apartments with and without basements) each have a Normal prior. How can I model them together with a multivariate normal prior, so that I can examine the correlation between them?
The hierarchical model given in the example is like this:
with pm.Model() as hierarchical_model:
    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # Intercept for each county, distributed around group mean mu_a
    # Above we just set mu and sd to a fixed value while here we
    # plug in a common group distribution for all a and b (which are
    # vectors of length n_counties).
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_counties)
    # Slope for each county, distributed around group mean mu_b
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_counties)

    # Model error
    eps = pm.HalfCauchy('eps', 5)

    radon_est = a[county_idx] + b[county_idx] * data.floor.values

    # Data likelihood
    radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)

    hierarchical_trace = pm.sample(2000)
And I'm trying to make some changes to the priors:
with pm.Model() as correlation_model:
    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
    mu_b = pm.Normal('mu_b', mu=0., sd=100**2)

    # here I want to model a and b together
    # I borrowed some code from a multivariate normal model
    # but the code does not work
    sigma = pm.HalfCauchy('sigma', 5, shape=2)
    C_triu = pm.LKJCorr('C_triu', n=2, p=2)
    C = T.fill_diagonal(C_triu[np.zeros((2, 2), 'int')], 1)
    cov = pm.Deterministic('cov', T.nlinalg.matrix_dot(sigma, C, sigma))
    tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))

    a, b = pm.MvNormal('mu', mu=(mu_a, mu_b), tau=tau,
                       shape=(n_counties, n_counties))

    # Model error
    eps = pm.HalfCauchy('eps', 5)

    radon_est = a[county_idx] + b[county_idx] * data.floor.values

    # Data likelihood
    radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)

    correlation_trace = pm.sample(2000)
Here is the error message I got:
File "<ipython-input-108-ce400c54cc39>", line 14, in <module>
tau = pm.Deterministic('tau', T.nlinalg.matrix_inverse(cov))
File "/home/olivier/anaconda3/lib/python3.5/site-packages/theano/gof/op.py", line 611, in __call__
node = self.make_node(*inputs, **kwargs)
File "/home/olivier/anaconda3/lib/python3.5/site-packages/theano/tensor/nlinalg.py", line 73, in make_node
assert x.ndim == 2
AssertionError
Clearly I've made some mistakes with the covariance matrix, but I'm new to pymc3 and completely new to theano, so I have no idea how to fix it. I gather this should be a rather common use case, so maybe there are some examples of it? I just can't find them.
The full replicable code and data can be seen on the example page (link given above). I didn't include it here because it's too long, and I thought those familiar with pymc3 are very likely already quite familiar with it :)
You forgot to add one line when creating the covariance matrix, and you mis-specified the shape of the MvNormal. Your model should look something like this:
with pm.Model() as correlation_model:
    mu = pm.Normal('mu', mu=0., sd=10, shape=2)
    sigma = pm.HalfCauchy('sigma', 5, shape=2)
    C_triu = pm.LKJCorr('C_triu', n=2, p=2)
    C = tt.fill_diagonal(C_triu[np.zeros((2, 2), 'int')], 1.)
    sigma_diag = tt.nlinalg.diag(sigma)  # this line
    cov = tt.nlinalg.matrix_dot(sigma_diag, C, sigma_diag)
    tau = tt.nlinalg.matrix_inverse(cov)

    ab = pm.MvNormal('ab', mu=mu, tau=tau, shape=(n_counties, 2))

    eps = pm.HalfCauchy('eps', 5)

    radon_est = ab[:, 0][county_idx] + ab[:, 1][county_idx] * data.floor.values

    radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)

    trace = pm.sample(2000)
Notice that, alternatively, you can evaluate the correlation of the intercept and the slope from the posterior of hierarchical_model. You can use a frequentist method, or build another Bayesian model that takes the result of hierarchical_model as its observed data. Maybe this could be faster.
EDIT
If you want to evaluate the correlation of two variables from the posterior, you can do something like this:
chain = hierarchical_trace[100:]
x_0 = chain['mu_a']
x_1 = chain['mu_b']
X = np.vstack((x_0, x_1)).T
and then you can run the following model:
with pm.Model() as correlation:
    mu = pm.Normal('mu', mu=0., sd=10, shape=2)
    sigma = pm.HalfCauchy('sigma', 5, shape=2)
    C_triu = pm.LKJCorr('C_triu', n=2, p=2)
    C = tt.fill_diagonal(C_triu[np.zeros((2, 2), 'int')], 1.)
    sigma_diag = tt.nlinalg.diag(sigma)
    cov = tt.nlinalg.matrix_dot(sigma_diag, C, sigma_diag)
    tau = tt.nlinalg.matrix_inverse(cov)

    yl = pm.MvNormal('yl', mu=mu, tau=tau, shape=(2, 2), observed=X)

    trace = pm.sample(5000, pm.Metropolis())
You can replace x_0 and x_1 according to your needs. For example you may want to do:
x_0 = np.random.normal(chain['mu_a'], chain['sigma_a'])
x_1 = np.random.normal(chain['mu_b'], chain['sigma_b'])
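For the frequentist route mentioned above, a minimal sketch (my own illustration; it assumes hierarchical_trace from the first model and simply correlates the per-county posterior means of a and b) would be:
import numpy as np

chain = hierarchical_trace[100:]

# posterior means of the county-level intercepts and slopes
a_means = chain['a'].mean(axis=0)   # shape (n_counties,)
b_means = chain['b'].mean(axis=0)   # shape (n_counties,)

# Pearson correlation between intercepts and slopes
r = np.corrcoef(a_means, b_means)[0, 1]
print("correlation between intercepts and slopes:", r)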

Porting pyMC2 Bayesian A/B testing example to pyMC3

I am working to learn PyMC3 and having some trouble. Since there are limited tutorials for PyMC3, I am working from Bayesian Methods for Hackers. I'm trying to port the PyMC2 code to PyMC3 in the Bayesian A/B testing example, with no success. From what I can see, the model isn't taking the observations into account at all.
I've had to make a few changes from the example, as pyMC 3 is quite different, so what should look like this:
import pymc as pm
# The parameters are the bounds of the Uniform.
p = pm.Uniform('p', lower=0, upper=1)
# set constants
p_true = 0.05 # remember, this is unknown.
N = 1500
# sample N Bernoulli random variables from Ber(0.05).
# each random variable has a 0.05 chance of being a 1.
# this is the data-generation step
occurrences = pm.rbernoulli(p_true, N)
print occurrences # Remember: Python treats True == 1, and False == 0
print occurrences.sum()
# Occurrences.mean is equal to n/N.
print "What is the observed frequency in Group A? %.4f" % occurrences.mean()
print "Does this equal the true frequency? %s" % (occurrences.mean() == p_true)
# include the observations, which are Bernoulli
obs = pm.Bernoulli("obs", p, value=occurrences, observed=True)
# To be explained in chapter 3
mcmc = pm.MCMC([p, obs])
mcmc.sample(18000, 1000)
figsize(12.5, 4)
plt.title("Posterior distribution of $p_A$, the true effectiveness of site A")
plt.vlines(p_true, 0, 90, linestyle="--", label="true $p_A$ (unknown)")
plt.hist(mcmc.trace("p")[:], bins=25, histtype="stepfilled", normed=True)
plt.legend()
instead looks like:
import pymc as pm
import random
import numpy as np
import matplotlib.pyplot as plt

with pm.Model() as model:
    # Prior is uniform: all cases are equally likely
    p = pm.Uniform('p', lower=0, upper=1)

    # set constants
    p_true = 0.05  # remember, this is unknown.
    N = 1500

    # sample N Bernoulli random variables from Ber(0.05).
    # each random variable has a 0.05 chance of being a 1.
    # this is the data-generation step
    occurrences = []  # pm.rbernoulli(p_true, N)
    for i in xrange(N):
        occurrences.append((random.uniform(0.0, 1.0) <= p_true))
    occurrences = np.array(occurrences)
    obs = pm.Bernoulli('obs', p_true, observed=occurrences)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(18000, step, start)

    pm.traceplot(trace);
    plt.show()
Apologies for the lengthy post, but in my adaptation there have been a number of small changes, e.g. manually generating the observations because pm.rbernoulli no longer exists. I'm also not sure whether I should be finding the start point prior to running the trace. How should I change my implementation so that it runs correctly?
You were indeed close. However, this line:
obs = pm.Bernoulli('obs', p_true, observed=occurrences)
is wrong because you are just passing a constant value for p (p_true == 0.05). Thus, your random variable p, defined above to have a uniform prior, is not constrained by the likelihood, and your plot shows that you are just sampling from the prior. If you replace p_true with p in your code it should work. Here is the fixed version:
import pymc as pm
import random
import numpy as np
import matplotlib.pyplot as plt

with pm.Model() as model:
    # Prior is uniform: all cases are equally likely
    p = pm.Uniform('p', lower=0, upper=1)

    # set constants
    p_true = 0.05  # remember, this is unknown.
    N = 1500

    # sample N Bernoulli random variables from Ber(0.05).
    # each random variable has a 0.05 chance of being a 1.
    # this is the data-generation step
    occurrences = []  # pm.rbernoulli(p_true, N)
    for i in xrange(N):
        occurrences.append((random.uniform(0.0, 1.0) <= p_true))
    occurrences = np.array(occurrences)
    obs = pm.Bernoulli('obs', p, observed=occurrences)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(18000, step, start)

    pm.traceplot(trace);
This worked for me. I generated the observations before setting up the model.
true_p_A = 0.05
true_p_B = 0.04
N_A = 1500
N_B = 750

obs_A = np.random.binomial(1, true_p_A, size=N_A)
obs_B = np.random.binomial(1, true_p_B, size=N_B)

with pm.Model() as ab_model:
    p_A = pm.Uniform('p_A', 0, 1)
    p_B = pm.Uniform('p_B', 0, 1)
    delta = pm.Deterministic('delta', p_A - p_B)
    obs_A = pm.Bernoulli('obs_A', p_A, observed=obs_A)
    obs_B = pm.Bernoulli('obs_B', p_B, observed=obs_B)

with ab_model:
    trace = pm.sample(2000)

pm.traceplot(trace)
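As a quick follow-up (a small sketch assuming the trace produced above), the posterior of delta can then be summarized just as in the book's example:
import numpy as np

# probability that site A converts worse/better than site B, from the posterior of delta
print("P(p_A < p_B):", np.mean(trace['delta'] < 0))
print("P(p_A > p_B):", np.mean(trace['delta'] > 0))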
You were very close - you just need to unindent the last two lines, which produce the traceplot. You can think of plotting the traceplot as a diagnostic that should occur after you finish sampling. The following works for me:
import pymc as pm
import random
import numpy as np
import matplotlib.pyplot as plt

with pm.Model() as model:
    # Prior is uniform: all cases are equally likely
    p = pm.Uniform('p', lower=0, upper=1)

    # set constants
    p_true = 0.05  # remember, this is unknown.
    N = 1500

    # sample N Bernoulli random variables from Ber(0.05).
    # each random variable has a 0.05 chance of being a 1.
    # this is the data-generation step
    occurrences = []  # pm.rbernoulli(p_true, N)
    for i in xrange(N):
        occurrences.append((random.uniform(0.0, 1.0) <= p_true))
    occurrences = np.array(occurrences)
    obs = pm.Bernoulli('obs', p_true, observed=occurrences)

    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(18000, step, start)

# Now plot
pm.traceplot(trace)
plt.show()
