Supplying test values in pymc 3

Supplying test values in pymc 3 - python

I am exploring the use of bounded distributions in pymc. I am trying to bound a Gamma prior distribution between two values. The model specification seems to fail due to the absence of test values. How may I pass a testval argument such that I am able to specify these sorts of models?
For completeness I have included the error, as well as a minimal example below. Thank you!
AttributeError: <pymc.quickclass.Gamma object at 0x110a62890> has no default value to use, checked for: ['median', 'mean', 'mode'] pass testval argument or provide one of these.
import pymc as pm
import numpy as np
ndims = 2
nobs = 20
zdata = np.random.normal(loc=0, scale=0.75, size=(ndims, nobs))
BoundedGamma = pm.Bound(pm.Gamma, 0.5, 2)
with pm.Model() as model:
xbound = BoundedGamma('xbound', alpha=1, beta=2)
z = pm.Normal('z', mu=0, tau=xbound, shape=(ndims, 1), observed=zdata)
edit: for reference purposes, here is a simple working model utilizing a bounded gamma prior distribution:
import pymc as pm
import numpy as np
ndims = 2
nobs = 20
zdata = np.random.normal(loc=0, scale=0.75, size=(ndims, nobs))
BoundedGamma = pm.Bound(pm.Gamma, 0.5, 2)
with pm.Model() as model:
xbound = BoundedGamma('xbound', alpha=1, beta=2, testval=2)
z = pm.Normal('z', mu=0, tau=xbound, shape=(ndims, 1), observed=zdata)
with model:
start = pm.find_MAP()
with model:
step = pm.NUTS()
with model:
trace = pm.sample(3000, step, start)
pm.traceplot(trace);

Use that line:
xbound = BoundedGamma('xbound', alpha=1, beta=2, testval=1)

Related

PyMC3 has no attribute hpd

I'm trying to use the hpd function but I keep getting an attribute error.
import pymc3 as pm
pm.__version__
>>'3.11.4'
pm.stats.hpd()
>>AttributeError: module 'pymc3.stats' has no attribute 'hpd'
The complete code (properly passing arguments):
def posterior_grid_approx(grid_points=100, success=6, tosses=9):
"""
"""
# define grid
p_grid = np.linspace(0, 1, grid_points)
# define prior
prior = np.repeat(5, grid_points) # uniform
#prior = (p_grid >= 0.5).astype(int) # truncated
#prior = np.exp(- 5 * abs(p_grid - 0.5)) # double exp
# compute likelihood at each point in the grid
likelihood = stats.binom.pmf(success, tosses, p_grid)
# compute product of likelihood and prior
unstd_posterior = likelihood * prior
# standardize the posterior, so it sums to 1
posterior = unstd_posterior / unstd_posterior.sum()
return p_grid, posterior
p_grid, posterior = posterior_grid_approx(grid_points=100, success=6, tosses=9)
samples = np.random.choice(p_grid, p=posterior, size=int(1e4), replace=True)
pm.stats.hpd(samples, alpha=0.5)
>> AttributeError: module 'pymc3.stats' has no attribute 'hpd'
I tried pm.hpd and pm.stats.hpd both with no succcess

It looks like this was moved to arviz and the code below seems to work
import arviz as az
# Calculate HPD credible interval of 95%
ci_95 = az.hdi(posterior_draws, hdi_prob=0.95)

Old PyMC3 style grouping traceplot plotted with Arviz

I have an old blogpost where I am training a PyMC3 model. You can find the blogpost here but the gist of the model is shown below.
with pm.Model() as model:
mu_intercept = pm.Normal('mu_intercept', mu=40, sd=5)
mu_slope = pm.HalfNormal('mu_slope', 10, shape=(n_diets,))
mu = mu_intercept + mu_slope[df.diet-1] * df.time
sigma_intercept = pm.HalfNormal('sigma_intercept', sd=2)
sigma_slope = pm.HalfNormal('sigma_slope', sd=2, shape=n_diets)
sigma = sigma_intercept + sigma_slope[df.diet-1] * df.time
weight = pm.Normal('weight', mu=mu, sd=sigma, observed=df.weight)
approx = pm.fit(20000, random_seed=42, method="fullrank_advi")
In this dataset I'm estimating the effect of Diet on the weight of chickens. This is what the traceplot looks like.
Look at how pretty it is! Each diet has its own line! Beautiful!
Arviz Changes
This traceplot was made using the older PyMC3 API. Nowadays this functionality has moved to arviz. So tried redo-ing this work but ... the plot looks very different.
The code that I'm using here is slightly different. I'm using pm.Data now but I doubt that's supposed to cause this difference.
with pm.Model() as mod:
time_in = pm.Data("time_in", df['time'].astype(float))
diet_in = pm.Data("diet_in", dummies)
intercept = pm.Normal("intercept", 0, 2)
time_effect = pm.Normal("time_weight_effect", 0, 2, shape=(4,))
diet = pm.Categorical("diet", p=[0.25, 0.25, 0.25, 0.25], shape=(4,), observed=diet_in)
sigma = pm.HalfNormal("sigma", 2)
sigma_time_effect = pm.HalfNormal("time_sigma_effect", 2, shape=(4,))
weight = pm.Normal("weight",
mu=intercept + time_effect.dot(diet_in.T)*time_in,
sd=sigma + sigma_time_effect.dot(diet_in.T)*time_in,
observed=df.weight)
trace = pm.sample(5000, return_inferencedata=True)
What do I need to do to get the different colors per DIET back in?

There's a parameter for it in the new plot_trace function. This does the trick;
az.plot_trace(trace, compact=True)

lmfit minimize (or scipy.optimize leastsq) on complex equation/data

Edit:
Modeling and fitting with this approach work fine, the data in here is not good.-------------------
I want to do a curve-fitting on a complex dataset. After thorough reading and searching, I found that i can use a couple of methods (e.g. lmfit optimize, scipy leastsq).
But none gives me a good fit at all.
here is the fit equation:
here is the data to be fitted (list of y values):
[(0.00011342104914066835+8.448890220616275e-07j),
(0.00011340386404065371+7.379293582429708e-07j),
(0.0001133540327309949+6.389834505824625e-07j),
(0.00011332170913939336+5.244566142401774e-07j),
(0.00011331311156154074+4.3841061618015007e-07j),
(0.00011329383047059048+3.6163513508002877e-07j),
(0.00011328700094846502+3.0542249453666894e-07j),
(0.00011327650033983806+2.548725558622188e-07j),
(0.00011327702539337786+2.2508174567697671e-07j),
(0.00011327342238146558+1.9607648998100523e-07j),
(0.0001132710747364799+1.721721661949941e-07j),
(0.00011326933241850936+1.5246061350710235e-07j),
(0.00011326798040984542+1.3614817802178457e-07j),
(0.00011326752037650585+1.233483784504962e-07j),
(0.00011326758290166552+1.1258801448459512e-07j),
(0.00011326813100914905+1.0284749122099354e-07j),
(0.0001132684076390416+9.45791423595816e-08j),
(0.00011326982474882009+8.733105218572698e-08j),
(0.00011327158639135678+8.212191452217794e-08j),
(0.00011327366823516856+7.747920115589205e-08j),
(0.00011327694366034208+7.227069986108343e-08j),
(0.00011327915327873038+6.819405851172907e-08j),
(0.00011328181165961218+6.468392148750885e-08j),
(0.00011328531688122571+6.151393311227958e-08j),
(0.00011328857849500441+5.811704586613896e-08j),
(0.00011329241716561626+5.596645863242474e-08j),
(0.0001132970129528527+5.4722461511610696e-08j),
(0.0001133002881788021+5.064523218904898e-08j),
(0.00011330507671740223+5.0307457368330284e-08j),
(0.00011331106068787993+4.7703959367963307e-08j),
(0.00011331577350707601+4.634615394867111e-08j),
(0.00011332064001939156+4.6914747648361504e-08j),
(0.00011333034985824086+4.4992151257444304e-08j),
(0.00011334188526870483+4.363662798446445e-08j),
(0.00011335491299924776+4.364164366097129e-08j),
(0.00011337451201475147+4.262881852644385e-08j),
(0.00011339778209066752+4.275096587356569e-08j),
(0.00011342832992628646+4.4463907608604945e-08j),
(0.00011346526768580432+4.35706649329342e-08j),
(0.00011351108008292451+4.4155812379491554e-08j),
(0.00011356967192325835+4.327004709646922e-08j),
(0.00011364164970635006+4.420660396556604e-08j),
(0.00011373150199883139+4.3672898914161596e-08j),
(0.00011384660942003356+4.326171366194325e-08j),
(0.00011399193321804955+4.1493065523925126e-08j),
(0.00011418043916260295+4.0762418512759096e-08j),
(0.00011443271767970721+3.91359909722939e-08j),
(0.00011479600563688605+3.845666332695652e-08j),
(0.0001153652105925112+3.6224677316584614e-08j),
(0.00011638635682516399+3.386843079212692e-08j),
(0.00011836223959714231+3.6692295450490655e-08j)]
here is the list of x values:
[999.9999960000001,
794.328231,
630.957342,
501.18723099999994,
398.107168,
316.22776400000004,
251.188642,
199.52623,
158.489318,
125.89254,
99.999999,
79.432823,
63.095734,
50.118722999999996,
39.810717,
31.622776,
25.118864000000002,
19.952623000000003,
15.848932000000001,
12.589253999999999,
10.0,
7.943282000000001,
6.309573,
5.011872,
3.981072,
3.1622779999999997,
2.511886,
1.9952619999999999,
1.584893,
1.258925,
1.0,
0.7943279999999999,
0.630957,
0.5011869999999999,
0.398107,
0.316228,
0.251189,
0.199526,
0.15848900000000002,
0.125893,
0.1,
0.079433,
0.063096,
0.050119,
0.039811,
0.031623000000000005,
0.025119,
0.019953,
0.015849000000000002,
0.012589,
0.01]
and here is the code which works but not the way I want:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters
#%% the equation
def ColeCole(params, fr): #fr is x values array and params are the fitting parameters
sig0 = params['sig0']
m = params['m']
tau = params['tau']
c = params['c']
w = fr*2*np.pi
num = 1
denom = 1+(1j*w*tau)**c
sigComplex = sig0*(1.0+(m/(1-m))*(1-num/denom))
return sigComplex
def res(params, fr, data): #calculating reseduals of fit
resedual = ColeCole(params, fr) - data
return resedual.view(np.float)
#%% Adding model parameters and fitting
params = Parameters()
params.add('sig0', value=0.00166)
params.add('m', value=0.19,)
params.add('tau', value=0.05386)
params.add('c', value=0.80)
params['tau'].min = 0 # these conditions must be met but even if I remove them the fit is ugly!!
params['m'].min = 0
out= minimize(res, params , args= (np.array(fr2), np.array(data)))
#%%plotting Imaginary part
fig, ax = plt.subplots()
plotX = fr2
plotY = data.imag
fitplot = ColeCole(out.params, fr2)
ax.semilogx(plotX,plotY,'o',label='imc')
ax.semilogx(plotX,fitplot.imag,label='fit')
#%%plotting real part
fig2, ax2 = plt.subplots()
plotX2 = fr2
plotY2 = data.real
fitplot2 = ColeCole(out.params, fr2)
ax2.semilogx(plotX2,plotY2,'o',label='imc')
ax2.semilogx(plotX2,fitplot2.real,label='fit')
I might be doing it completely wrong, please help me if you know the proper solution to do a curve fitting on complex data.

I would suggest first converting the complex data to numpy arrays and get real, imag pairs separately and then using lmfit Model to model that same sort of data. Perhaps something like this:
cdata = np.array((0.00011342104914066835+8.448890220616275e-07j,
0.00011340386404065371+7.379293582429708e-07j,
0.0001133540327309949+6.389834505824625e-07j,
0.00011332170913939336+5.244566142401774e-07j,
0.00011331311156154074+4.3841061618015007e-07j,
0.00011329383047059048+3.6163513508002877e-07j,
0.00011328700094846502+3.0542249453666894e-07j,
0.00011327650033983806+2.548725558622188e-07j,
0.00011327702539337786+2.2508174567697671e-07j,
0.00011327342238146558+1.9607648998100523e-07j,
0.0001132710747364799+1.721721661949941e-07j,
0.00011326933241850936+1.5246061350710235e-07j,
0.00011326798040984542+1.3614817802178457e-07j,
0.00011326752037650585+1.233483784504962e-07j,
0.00011326758290166552+1.1258801448459512e-07j,
0.00011326813100914905+1.0284749122099354e-07j,
0.0001132684076390416+9.45791423595816e-08j,
0.00011326982474882009+8.733105218572698e-08j,
0.00011327158639135678+8.212191452217794e-08j,
0.00011327366823516856+7.747920115589205e-08j,
0.00011327694366034208+7.227069986108343e-08j,
0.00011327915327873038+6.819405851172907e-08j,
0.00011328181165961218+6.468392148750885e-08j,
0.00011328531688122571+6.151393311227958e-08j,
0.00011328857849500441+5.811704586613896e-08j,
0.00011329241716561626+5.596645863242474e-08j,
0.0001132970129528527+5.4722461511610696e-08j,
0.0001133002881788021+5.064523218904898e-08j,
0.00011330507671740223+5.0307457368330284e-08j,
0.00011331106068787993+4.7703959367963307e-08j,
0.00011331577350707601+4.634615394867111e-08j,
0.00011332064001939156+4.6914747648361504e-08j,
0.00011333034985824086+4.4992151257444304e-08j,
0.00011334188526870483+4.363662798446445e-08j,
0.00011335491299924776+4.364164366097129e-08j,
0.00011337451201475147+4.262881852644385e-08j,
0.00011339778209066752+4.275096587356569e-08j,
0.00011342832992628646+4.4463907608604945e-08j,
0.00011346526768580432+4.35706649329342e-08j,
0.00011351108008292451+4.4155812379491554e-08j,
0.00011356967192325835+4.327004709646922e-08j,
0.00011364164970635006+4.420660396556604e-08j,
0.00011373150199883139+4.3672898914161596e-08j,
0.00011384660942003356+4.326171366194325e-08j,
0.00011399193321804955+4.1493065523925126e-08j,
0.00011418043916260295+4.0762418512759096e-08j,
0.00011443271767970721+3.91359909722939e-08j,
0.00011479600563688605+3.845666332695652e-08j,
0.0001153652105925112+3.6224677316584614e-08j,
0.00011638635682516399+3.386843079212692e-08j,
0.00011836223959714231+3.6692295450490655e-08j))
fr = np.array((999.9999960000001, 794.328231, 630.957342,
501.18723099999994, 398.107168, 316.22776400000004,
251.188642, 199.52623, 158.489318, 125.89254, 99.999999,
79.432823, 63.095734, 50.118722999999996, 39.810717,
31.622776, 25.118864000000002, 19.952623000000003,
15.848932000000001, 12.589253999999999, 10.0,
7.943282000000001, 6.309573, 5.011872, 3.981072,
3.1622779999999997, 2.511886, 1.9952619999999999, 1.584893,
1.258925, 1.0, 0.7943279999999999, 0.630957,
0.5011869999999999, 0.398107, 0.316228, 0.251189, 0.199526,
0.15848900000000002, 0.125893, 0.1, 0.079433, 0.063096,
0.050119, 0.039811, 0.031623000000000005, 0.025119, 0.019953,
0.015849000000000002, 0.012589, 0.01))
data = np.concatenate((cdata.real, cdata.imag))
# model function for lmfit
def colecole_function(x, sig0, m, tau, c):
w = x*2*np.pi
denom = 1+(1j*w*tau)**c
sig = sig0*(1.0+(m/(1.0-m))*(1-1.0/denom))
return np.concatenate((sig.real, sig.imag))
mod = Model(colecole_function)
params = mod.make_params(sig0=0.002, m=-0.19, tau=0.05, c=0.8)
params['tau'].min = 0
result = mod.fit(data, params, x=fr)
print(result.fit_report())
You would then want to plot the results like
nf = len(fr)
plt.plot(fr, data[:nf], label='data(real)')
plt.plot(fr, result.best_fit[:nf], label='fit(real)')
and similarly
plt.plot(fr, data[nf:], label='data(imag)')
plt.plot(fr, result.best_fit[nf:], label='fit(imag)')
Note that I think you're going to want to allow m to be negative (or maybe I misuderstand your model). I did not work carefully on getting a great fit, but I think this should get you started.

Multiple levels in hierarchical linear regression using PYMC3

I am trying to set up a hierarchical linear regression model using PYMC3. In my particular case, I want to see whether postal codes provide a meaningful structure for other features. Suppose I use the following mock data:
import pandas as pd
import numpy as np
import pymc3 as pm
data = pd.DataFrame({"postalcode": np.floor(np.random.uniform(low=10, high=99, size=1000)),
"x": np.random.normal(size=1000),
"y": np.random.normal(size=1000)})
data["postalcode"] = data["postalcode"].astype(int)
I generate postal codes from 10 to 99, as well as a normally distributed feature x and a target value y. Now I set up my indices for postal code level 1 and level 2:
def create_pc_index(level):
pc = data["postalcode"].astype(str).str[0:level]
unique_pc = pc.unique()
pc_dict = dict(zip(unique_pc, range(0, len(unique_pc))))
return pc_dict, pc.apply(lambda x: pc_dict[x]).values
pc1_dict, pc1_index = create_pc_index(1)
pc2_dict, pc2_index = create_pc_index(2)
Using the first digit of the postal code as hierarchical attribute works fine:
number_of_samples = 1000
x = data["x"]
y = data["y"]
with pm.Model() as model:
sigma = pm.HalfCauchy('sigma', beta=10, testval=0.5, shape=1)
mu_i = pm.Normal("mu_i", 5, sd=25, shape=1)
intercept = pm.Normal('Intercept', mu_i, sd=1, shape=len(pc1_dict))
mu_s = pm.Normal("mu_x", 0, sd=3, shape=1)
x_coeffs = pm.Normal("x", mu_s, 1, shape=len(pc1_dict))
mean = intercept[pc1_index] + x_coeffs[pc1_index] * x
likelihood_mean = pm.Deterministic("mean", mean)
likelihood = pm.Normal('y', mu=likelihood_mean, sd=sigma, observed=y)
trace = pm.sample(number_of_samples)
burned_trace = trace[number_of_samples/2:]
However, if I want to add a second level to my hierarchy (in this case only on the intercept, ignoring x for the moment), I run into shape problems
with pm.Model() as model:
sigma = pm.HalfCauchy('sigma', beta=10, testval=0.5, shape=1)
mu_i_level_1 = pm.Normal("mu_i", 0, sd=25, shape=1)
mu_i_level_2 = pm.Normal("mu_i_level_2", mu_i_level_1, sd=1, shape=len(pc1_dict))
intercept = pm.Normal('Intercept', mu_i_level_2[pc1_index], sd=1, shape=len(pc2_dict))
mu_s = pm.Normal("mu_x", 0, sd=3, shape=1)
x_coeffs = pm.Normal("x", mu_s, 1, shape=len(pc1_dict))
mean = intercept[pc2_index] + x_coeffs[pc1_index] * x
likelihood_mean = pm.Deterministic("mean", mean)
likelihood = pm.Normal('y', mu=likelihood_mean, sd=sigma, observed=y)
trace = pm.sample(number_of_samples)
burned_trace = trace[number_of_samples/2:]
The error message is:
operands could not be broadcast together with shapes (89,) (1000,)
How do I model multiple levels in my regression correctly? Is this just an issue with the correct shape size or is there a more fundamental error on my part?
Thanks in advance!

I don't think intercept can have a shape of len(pc2_dict) but a mu of len(pc1_dict). The contradiction is here:
intercept = pm.Normal('Intercept', mu_i_level_2[pc1_index], sd=1, shape=len(pc2_dict))

Getting started with PYMC for linear regression

Thought I'd start off following this example:
http://www.databozo.com/2014/01/17/Exploring_PyMC3.html
But when I follow the example precisely using pymc 2.3 I get an exit and told that the API has changed
UserWarning: The MCMC() syntax is deprecated. Please pass in nodes explicitly via M = MCMC(input).
'The MCMC() syntax is deprecated. Please pass in nodes explicitly via M = MCMC(input).') but I don't have a good idea how to change the example to provide exactly what to the model function and how to do with the 'with' clause?
The code in question is:
%pylab inline
import scipy
import numpy as np
x = np.array(range(0,50))
y = np.random.uniform(low=0.0, high=40.0, size=200)
y = map((lambda a: a[0] + a[1]), zip(x,y))
import matplotlib.pyplot as plt
plt.scatter(x,y)
Above sample data generator works fine
import pymc as pm
import numpy as np
trace = None
with pm.Model() as model: <<<<<<indicated as the error line
alpha = pm.Normal('alpha', mu=0, sd=20)
beta = pm.Normal('beta', mu=0, sd=20)
sigma = pm.Uniform('sigma', lower=0, upper=20)
y_est = alpha + beta * x
likelihood = pm.Normal('y', mu=y_est, sd=sigma, observed=y)
start = pm.find_MAP()
step = pm.NUTS(state=start)
trace = pm.sample(2000, step, start=start, progressbar=False)
pm.traceplot(trace);

Package author #fonnesbeck informed me that I needed the development version 3 from Github and not the pypi version 2.3. Installed that, via github and I'm good to go. Open source is great!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Supplying test values in pymc 3 - python

Use that line: xbound = BoundedGamma('xbound', alpha=1, beta=2, testval=1)

Related

PyMC3 has no attribute hpd

Old PyMC3 style grouping traceplot plotted with Arviz

lmfit minimize (or scipy.optimize leastsq) on complex equation/data

Multiple levels in hierarchical linear regression using PYMC3

Getting started with PYMC for linear regression

Categories

Resources