ArviZ plot_trace does not properly plot multidimensional variables


I'm attempting to run a basic test model using PyMC3, but I've found the ArviZ plot_trace function won't properly show my traces.
Code
from scipy import stats
import arviz as az
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import seaborn as sns
import pandas as pd
from theano import shared
from sklearn import preprocessing
# X1, X2 (predictors) and Y (observed outcome) are assumed to be defined before this point.
if __name__ == "__main__":
    with pm.Model() as basic_model:
        # Priors for unknown model parameters
        alpha = pm.Normal('alpha', mu=0, sigma=10)
        beta = pm.Normal('beta', mu=0, sigma=10, shape=2)
        sigma = pm.HalfNormal('sigma', sigma=1)
        # Expected value of outcome
        mu = alpha + beta[0]*X1 + beta[1]*X2
        # Likelihood (sampling distribution) of observations
        Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)
        # Draw 5000 posterior samples
        trace = pm.sample(5000)
        az.plot_trace(trace, compact=False)
The beta parameter is multidimensional, and has both beta[0] and beta[1], but the ArviZ trace only shows beta[0]:
Trace Plot
If I run the trace plot as az.plot_trace(trace, compact = True), then I do see both dimensions of beta properly overlaid. I only observed this issue when trying to plot the dimensions in separate axes with compact = False.
Versions
ArviZ: 0.6.1
Numpy: 1.18.1
SciPy: 1.4.1
xarray: 0.15.0
Matplotlib: 3.1.3

It looks like you are running into this bug. I'd recommend updating ArviZ to its latest version (0.7.0 at the time of writing) which already includes a fix for this particular bug.
If a version constraint prevents you from upgrading, disabling numba should fix the issue.
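For reference, a minimal sketch of turning numba off from within a session. It assumes the installed ArviZ version exposes the az.Numba helper class, so check the documentation for your version before relying on it:

```python
import arviz as az

# Ask ArviZ to skip its numba-accelerated code paths for this session.
az.Numba.disable_numba()
print(az.Numba.numba_flag)  # False once numba use is disabled

# ... then call az.plot_trace(trace, compact=False) as before ...
```

az.Numba.enable_numba() switches the accelerated paths back on.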

Related

Python random library: Simulating from Pareto distribution (using shape & scale params)

According to the Python docs, random.paretovariate(alpha) simulates from the Pareto distribution where alpha is the shape parameter. But the Pareto distribution takes both a shape and scale parameter.
How can I simulate from this distribution specifying both parameters?

You can use NumPy instead:
from numpy import random
pareto = random.pareto(a=4, size=(4, 8))
print(pareto)
>>>[[0.32803729 0.03626127 0.73736579 0.53301595 0.33443536 0.12561402
0.00816275 0.0134468 ]
[0.21536643 0.15798882 0.52957712 0.06631794 0.03728101 0.80383849
0.01727098 0.03910042]
[0.24481661 0.13497905 0.00665971 0.41875676 0.20252262 0.13701287
0.06929994 0.05350275]
[0.93898544 0.02621125 0.0873763 0.15660287 0.31329102 3.95332518
0.09149938 0.08415795]]
You can also nicely graph the data using matplotlib and seaborn:
from numpy import random
import matplotlib.pyplot as plt
import seaborn
seaborn.distplot(random.pareto(a=4, size=1000), kde=False)
plt.show()
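Neither snippet above sets a scale. One way to get both parameters, shown here as a minimal sketch, relies on the fact that random.paretovariate(alpha) draws from a Pareto distribution with scale 1 (all values are at least 1), so multiplying each draw by a scale moves the minimum to that scale. The helper name pareto_with_scale and the parameter values are made up for illustration:

```python
import random

def pareto_with_scale(shape, scale, size, seed=None):
    """Draw `size` samples from a Pareto distribution with the given shape and scale.

    Hypothetical helper: random.paretovariate(shape) has scale 1,
    so multiplying each draw by `scale` moves the minimum to `scale`.
    """
    rng = random.Random(seed)
    return [scale * rng.paretovariate(shape) for _ in range(size)]

samples = pareto_with_scale(shape=4, scale=2.5, size=10_000, seed=0)
print(min(samples))                 # never below the scale
print(sum(samples) / len(samples))  # should be near shape*scale/(shape-1)
```

With NumPy the same idea is (np.random.pareto(a, size) + 1) * scale, because numpy.random.pareto samples the Lomax (Pareto II) distribution, which is the classical Pareto shifted to start at zero.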

PYMC3 - Random Walk Forecasting

I was hoping someone could clarify something for me. I am trying to do time series forecasting with the GaussianRandomWalk function in PyMC3. It has been suggested that my code is wrong because I've modeled it so that the standard deviation of the latent walk is the same as the observation noise, which seems like it might be a mistake. Is it a mistake? How would I change it?
import pymc3 as pm
#import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate a random walk
sd = .1
N = 200
deltas = np.random.normal(scale=sd, size=N)
y = np.cumsum(deltas)
x = np.arange(N)
df = pd.DataFrame({'y': y})
# extend the index to 250 rows; rows 200-249 are NaN (missing)
df = df.reindex(np.arange(250))
with pm.Model() as model:
    # note: this single sd is used for both the walk and the observation noise
    sd = pm.HalfNormal('sd')
    mu = pm.Uniform("mu", 0, 100)
    prior = pm.GaussianRandomWalk('prior', mu=mu, sd=sd, shape=len(df))
    obs = pm.Normal("obs", mu=prior, sd=sd, observed=df["y"])
    # graph = pm.model_to_graphviz(model)
    # print(graph)
    trace = pm.sample(2000, chains=1)
pm.traceplot(trace)
plt.show()
with model:
    ppc = pm.sample_posterior_predictive(trace)
pm.traceplot(ppc)
plt.show()
print(ppc)

GaussianRandomWalk is purely random, without any trend or inertia. You might want to look into tfp.sts.LocalLinearTrend or pm.AR, which have some "inertia" in them.
I don't know more about how to model time series.
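To make the distinction raised in the question concrete, here is a minimal pure-Python simulation of the generative process, with one standard deviation for the latent walk steps and a separate one for the observation noise. The names walk_sd and obs_sd and their values are illustrative assumptions. In the PyMC3 model, the analogous change would presumably be to give GaussianRandomWalk and the Normal likelihood two separate sd priors instead of sharing one.

```python
import random

def simulate_noisy_walk(n, walk_sd, obs_sd, drift=0.0, seed=None):
    """Simulate a latent Gaussian random walk and noisy observations of it."""
    rng = random.Random(seed)
    latent, observed = [], []
    level = 0.0
    for _ in range(n):
        # latent state: previous level plus drift plus a random innovation
        level += drift + rng.gauss(0.0, walk_sd)
        latent.append(level)
        # observation: latent state plus independent measurement noise
        observed.append(level + rng.gauss(0.0, obs_sd))
    return latent, observed

latent, observed = simulate_noisy_walk(n=200, walk_sd=0.1, obs_sd=0.5, seed=1)
residuals = [o - l for o, l in zip(observed, latent)]
print(len(latent), len(observed))
```

Here the spread of the residuals is governed by obs_sd while the step-to-step movement of the latent series is governed by walk_sd, which is why tying both to one parameter constrains the model.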

Why is the error on the fitted parameters so large even though the curve seems to pass through most of the points? How can I reduce that error?

I want to fit a power law to my data points because I need to calculate the value of v. But the error on my fitting parameters is too large, although the curve seems to pass through all the data points. How can I reduce this error?
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import math
import scipy
from scipy import optimize
x_data = np.array([30, 45, 60, 75])
y_data = np.array([0.42597867, 0.26249343, 0.19167837, 0.08116507])
fig = plt.figure()
ax = fig.add_subplot(111)
# power-law model: L**(-1/v) plus a constant offset c
def ff(L, v, c):
    return (L**(-1/v) + c)
ax.scatter(x_data, y_data, marker='s', s=4**2)
pfit, pcov = optimize.curve_fit(ff, x_data, y_data)
print("pfit: ", pfit)
print("pcov: ", pcov.shape)
#print(pcov)
# one-standard-deviation errors on the fitted parameters
perr = np.sqrt(np.diag(pcov))
x = np.linspace(20, 85, 1000)
ax.plot(x, ff(x, *pfit), color='red')

How to pull samples with a tweedie distribution using numpy

I'm trying to plot a CDF of random samples to compare against a target in a dataset that follows a Tweedie distribution. I know the following code will draw random samples from a Poisson distribution:
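import numpy as np
import matplotlib.pyplot as plt
# coll_df is assumed to be an existing DataFrame with a 'pure_premium' column
# np.sort returns a sorted copy; ndarray.sort() sorts in place and returns None
x_r = np.sort(np.random.poisson(lam=coll_df['pure_premium'].mean(), size=len(coll_df['pure_premium'])))
y_r = np.arange(1, len(x_r)+1)/len(x_r)
_ = plt.plot(x_r, y_r, color='red')
_ = plt.xlabel('Percent of Pure Premium')
_ = plt.ylabel('ECDF')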
However, there is no tweedie distribution option on the random sampling. Anyone know how to hack this together?

PyPI has a tweedie package. A minimal example drawing a sample would be:
import tweedie, seaborn as sns, matplotlib.pyplot as plt
tvs = tweedie.tweedie(mu=10, p=1.5, phi=20).rvs(100000)
sns.distplot(tvs)
plt.show()
The package's GitHub page has a fancier example. The package implements rv_continuous, so you get a bunch of other functionality besides rvs(). Also, while there seem to be no nice online docs, help(tweedie.tweedie) gives lots of detail.
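If adding a dependency is not an option, a Tweedie variable with power 1 < p < 2 can be sampled as a compound Poisson-gamma sum, that is, a Poisson number of gamma-distributed jumps. The sketch below uses only the standard library. The function names are made up, the Poisson sampler is Knuth's simple method (adequate only for small rates), and the mapping from mean mu, power p and dispersion phi to the Poisson and gamma parameters is the usual one for this family. Treat it as an illustration, not a replacement for the tweedie package.

```python
import math
import random

def poisson_knuth(lam, rng):
    """Knuth's multiplication method; fine for small lam, slow for large."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def tweedie_sample(mu, p, phi, size, seed=None):
    """Draw Tweedie samples for 1 < p < 2 (hypothetical helper)."""
    if not 1 < p < 2:
        raise ValueError("compound Poisson-gamma form requires 1 < p < 2")
    rng = random.Random(seed)
    lam = mu ** (2 - p) / (phi * (2 - p))   # Poisson rate (expected jump count)
    alpha = (2 - p) / (p - 1)               # gamma shape of a single jump
    theta = phi * (p - 1) * mu ** (p - 1)   # gamma scale of a single jump
    out = []
    for _ in range(size):
        n = poisson_knuth(lam, rng)
        # the sum of n gamma(alpha, theta) jumps is gamma(n * alpha, theta)
        out.append(rng.gammavariate(n * alpha, theta) if n > 0 else 0.0)
    return out

samples = tweedie_sample(mu=10, p=1.5, phi=20, size=50_000, seed=0)
print(sum(samples) / len(samples))                  # should be close to mu
print(sum(s == 0 for s in samples) / len(samples))  # share of exact zeros
```

The exact zeros are expected: whenever the Poisson count is zero the sum is empty, which is what gives this family its point mass at zero. With NumPy the same two steps would use a Generator's poisson and gamma methods.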

PyMC3 traceplot not displaying

I am trying to get the PyMC3 examples from Osvaldo Martin's Bayesian Analysis with Python working. On Windows 10, while the following code using matplotlib works fine (i.e. a chart is displayed):
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
def posterior_grid(grid_points=100, heads=6, tosses=9):
    """
    A grid implementation for the coin-flip problem
    """
    grid = np.linspace(0, 1, grid_points)
    prior = 0.5 - abs(grid - 0.5)
    likelihood = stats.binom.pmf(heads, tosses, grid)
    unstd_posterior = likelihood * prior
    posterior = unstd_posterior / unstd_posterior.sum()
    return grid, posterior
if __name__ == "__main__":
    points = 100
    h, n = 1, 4
    grid, posterior = posterior_grid(points, h, n)
    plt.plot(grid, posterior, 'o-', label='heads = {}\ntosses = {}'.format(h, n))
    plt.xlabel(r'$\theta$')
    plt.legend(loc=0)
    plt.show()
...I cannot get the following - which uses PyMC3's traceplot - to display a chart:
import pymc3 as pm
import numpy as np
import scipy.stats as stats
if __name__ == "__main__":
    np.random.seed(123)
    n_experiments = 4
    theta_real = 0.35
    data = stats.bernoulli.rvs(p=theta_real, size=n_experiments)
    print(data)
    with pm.Model() as our_first_model:
        theta = pm.Beta('theta', alpha=1, beta=1)
        y = pm.Bernoulli('y', p=theta, observed=data)
        start = pm.find_MAP()
        step = pm.Metropolis()
        trace = pm.sample(1000, step=step, start=start)
    burnin = 100
    chain = trace[burnin:]
    pm.traceplot(chain, lines={'theta':theta_real});
The code runs and exits fine, but no chart is displayed.
I have tried in IntelliJ IDEA with the Python plugin, from an Anaconda console window for my root environment, and from IPython.
In IPython, I get the following output on the console:
Out[3]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000024BDD622F60>,
<matplotlib.axes._subplots.AxesSubplot object at 0x0000024BDD667208>]], dtype=object)
...so obviously something is happening. But how can I display the results as a chart?
I have also tried the exact library versions listed in the book with Python 3.5, but still no traceplot chart:
IPython 5.0
NumPy 1.11.1
SciPy 0.18.1
Pandas 0.18.1
Matplotlib 1.5.3
Seaborn 0.7.1
PyMC3 3.0

Various further Googling got me to the following answers.
With IPython, you must invoke with ipython --pylab auto to give matplotlib a suitable backend (on Windows at least).
With IntelliJ IDEA / PyCharm, you need to add
import matplotlib.pyplot as plt
and then
plt.show()
after the traceplot line to show the plot.
