The issue
Tl;dr: I would like a function that randomly returns a float (or optionally an ndarray of floats) in an interval, following a probability distribution that resembles the sum of a "Gaussian" and a uniform distribution.
The function (or class) - let's say custom_distr() - should have as inputs (with default values already given):
the lower and upper bounds of the interval: low=0.0, high=1.0
the mean and standard deviation parameters of the "Gaussian": loc=0.5, scale=0.02
the size of the output: size=None
size can be an integer or a tuple of integers. If it is given, loc and scale can either both be scalars or both be ndarrays whose shape corresponds to size.
The output is a scalar or an ndarray, depending on size.
The output has to be normalized so that the total (cumulative) probability over the interval equals 1 (I'm uncertain how to do this; see the sketch below).
Note that I'm following numpy.random.Generator's naming convention, using its uniform and normal distributions as a reference, but the nomenclature and the packages used do not really matter to me.
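To make the normalization requirement concrete, here is a rough sketch of how I picture the density (only an illustration of the density, not the sampler I'm asking for): treat it as an equal-weight mixture and renormalize for the part of the Gaussian that falls outside [low, high].
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, low=0.0, high=1.0, loc=0.5, scale=0.02):
    # Equal-weight mixture of uniform(low, high) and N(loc, scale**2)...
    raw = 0.5 / (high - low) + 0.5 * norm.pdf(x, loc, scale)
    # ...renormalized by the total mass actually inside [low, high], so the
    # cumulative probability reaches exactly 1 at x = high even though the
    # Gaussian leaks a little outside the interval.
    mass = 0.5 + 0.5 * (norm.cdf(high, loc, scale) - norm.cdf(low, loc, scale))
    return raw / mass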
What I've tried
Since I couldn't find a way to "add" numpy.random.Generator's uniform and Gaussian distributions directly, I've tried subclassing scipy.stats.rv_continuous, but I'm stuck on how to define the _rvs method, or the _ppf method, to make it fast.
From what I've understood of the rv_continuous class definition on GitHub, _rvs uses numpy's random.RandomState (the legacy interface, superseded by random.Generator) to draw the samples. This seems to defeat the purpose of subclassing scipy.stats.rv_continuous.
Another option would be to define _ppf, the percent-point function of my custom distribution, since according to the rv_generic class definition on GitHub, the default _rvs uses _ppf. But I'm having trouble defining this function by hand.
Below is an MWE, tested using low=0.0, high=1.0, loc=0.3 and scale=0.02. The names differ from those in "The issue" section because numpy and scipy use different terminology.
import numpy as np
from scipy.stats import rv_continuous
import scipy.special as sc
import matplotlib.pyplot as plt
import time
# The class definition
class custom_distr(rv_continuous):
    def __init__(self, my_loc=0.5, my_scale=0.5, a=0.0, b=1.0, *args, **kwargs):
        super(custom_distr, self).__init__(a, b, *args, **kwargs)
        self.a = a
        self.b = b
        self.my_loc = my_loc
        self.my_scale = my_scale

    def _pdf(self, x):
        # uniform distribution
        aux = 1/(self.b-self.a)
        # gaussian distribution
        aux += 1/np.sqrt(2*np.pi*self.my_scale**2) * \
            np.exp(-(x-self.my_loc)**2/2/self.my_scale**2)
        return aux/2  # divide by 2?

    def _cdf(self, x):
        # uniform distribution
        aux = (x-self.a)/(self.b-self.a)
        # gaussian distribution
        aux += 0.5*(1+sc.erf((x-self.my_loc)/(self.my_scale*np.sqrt(2))))
        return aux/2  # divide by 2?
# Testing the class
if __name__ == "__main__":
    my_cust_distr = custom_distr(name="my_dist", my_loc=0.3, my_scale=0.02)

    x = np.linspace(0.0, 1.0, 10000)

    start_t = time.time()
    the_pdf = my_cust_distr.pdf(x)
    print("PDF calc time: {:4.4f}".format(time.time()-start_t))
    plt.plot(x, the_pdf, label='pdf')

    start_t = time.time()
    the_cdf = my_cust_distr.cdf(x)
    print("CDF calc time: {:4.4f}".format(time.time()-start_t))
    plt.plot(x, the_cdf, 'r', alpha=0.8, label='cdf')

    # Get 10000 random values according to the custom distribution
    start_t = time.time()
    r = my_cust_distr.rvs(size=10000)
    print("RVS calc time: {:4.4f}".format(time.time()-start_t))

    plt.hist(r, density=True, histtype='stepfilled', alpha=0.3, bins=40)
    plt.ylim([0.0, the_pdf.max()])
    plt.grid(which='both')
    plt.legend()

    print("Maximum of CDF is: {:2.1f}".format(the_cdf[-1]))
    plt.show()
The generated image is:
The output is:
PDF calc time: 0.0010
CDF calc time: 0.0010
RVS calc time: 11.1120
Maximum of CDF is: 1.0
The time computing the RVS method is too slow in my approach.
According to Wikipedia, the ppf, or percent-point function (also called the Quantile function), can be written as the inverse function of the cumulative distribution function (cdf), when the cdf increases monotonically.
From the figure shown in the question, the cdf of my custom distribution function does, indeed, increase monotonically - as is expected, since the cdf's of Gaussian and uniform distributions do so too.
The ppf of the general normal distribution can be found on this Wikipedia page under "Quantile function". And the ppf of a uniform distribution defined between a and b can be calculated simply as p*(b-a)+a, where p is the desired probability.
But the inverse of the sum of two functions cannot (in general) be trivially written in terms of the individual inverses! See this Mathematics Stack Exchange post for more information.
Therefore, the partial "solution" I have found thus far is to save an array containing the cdf of my custom distribution when instantiating an object, and then to find the ppf (i.e. the inverse of the cdf) via 1D interpolation. This only works as long as the cdf is indeed a monotonically increasing function.
NOTE 1: I still haven't fixed the bounds-check issue mentioned by Peter O.
NOTE 2: The proposed solution is unviable if an ndarray of loc's were given, because of the lack of a closed-form expression for the quantile function. Therefore, the original question is still open.
The working code is now:
import numpy as np
from scipy.stats import rv_continuous
import scipy.special as sc
import matplotlib.pyplot as plt
import time
# The class definition
class custom_distr(rv_continuous):
    def __init__(self, my_loc=0.5, my_scale=0.5, a=0.0, b=1.0,
                 init_ppf=1000, *args, **kwargs):
        super(custom_distr, self).__init__(a, b, *args, **kwargs)
        self.a = a
        self.b = b
        self.my_loc = my_loc
        self.my_scale = my_scale
        self.x = np.linspace(a, b, init_ppf)
        self.cdf_arr = self._cdf(self.x)

    def _pdf(self, x):
        # uniform distribution
        aux = 1/(self.b-self.a)
        # gaussian distribution
        aux += 1/np.sqrt(2*np.pi)/self.my_scale * \
            np.exp(-0.5*((x-self.my_loc)/self.my_scale)**2)
        return aux/2  # divide by 2?

    def _cdf(self, x):
        # uniform distribution
        aux = (x-self.a)/(self.b-self.a)
        # gaussian distribution
        aux += 0.5*(1+sc.erf((x-self.my_loc)/(self.my_scale*np.sqrt(2))))
        return aux/2  # divide by 2?

    def _ppf(self, p):
        if np.any((p < 0.0) | (p > 1.0)):
            raise RuntimeError("Quantile function accepts only values between 0 and 1")
        return np.interp(p, self.cdf_arr, self.x)
# Testing the class
if __name__ == "__main__":
    a = 1.0
    b = 3.0
    my_loc = 1.5
    my_scale = 0.02

    my_cust_distr = custom_distr(name="my_dist", a=a, b=b,
                                 my_loc=my_loc, my_scale=my_scale)

    x = np.linspace(a, b, 10000)

    start_t = time.time()
    the_pdf = my_cust_distr.pdf(x)
    print("PDF calc time: {:4.4f}".format(time.time()-start_t))
    plt.plot(x, the_pdf, label='pdf')

    start_t = time.time()
    the_cdf = my_cust_distr.cdf(x)
    print("CDF calc time: {:4.4f}".format(time.time()-start_t))
    plt.plot(x, the_cdf, 'r', alpha=0.8, label='cdf')

    start_t = time.time()
    r = my_cust_distr.rvs(size=10000)
    print("RVS calc time: {:4.4f}".format(time.time()-start_t))

    plt.hist(r, density=True, histtype='stepfilled', alpha=0.3, bins=100)
    plt.ylim([0.0, the_pdf.max()])
    # plt.xlim([a, b])
    plt.grid(which='both')
    plt.legend()

    print("Maximum of CDF is: {:2.1f}".format(the_cdf[-1]))
    plt.show()
The resulting image is:
And the output is:
PDF calc time: 0.0010
CDF calc time: 0.0010
RVS calc time: 0.0010
Maximum of CDF is: 1.0
The code is faster than before, at the cost of using a bit more memory.
Related
Following this post, I tried to create a logit-normal distribution by creating the LogitNormal class:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import logit
from scipy.stats import norm, rv_continuous
class LogitNormal(rv_continuous):
    def _pdf(self, x, **kwargs):
        return norm.pdf(logit(x), **kwargs)/(x*(1-x))

class OtherLogitNormal:
    def pdf(self, x, **kwargs):
        return norm.pdf(logit(x), **kwargs)/(x*(1-x))
fig, ax = plt.subplots()
values = np.linspace(10e-10, 1-10e-10, 1000)
sigma, mu = 1.78, 0
ax.plot(
    values, LogitNormal().pdf(values, loc=mu, scale=sigma), label='subclassed'
)
ax.plot(
    values, OtherLogitNormal().pdf(values, loc=mu, scale=sigma),
    label='not subclassed'
)
ax.legend()
fig.show()
However, the LogitNormal class does not produce the desired results. When I don't subclass rv_continuous it works. Why is that? I need the subclassing to work because I also need the other methods that come with it like rvs.
Btw, the only reason I am creating my own logit-normal distribution in Python is that the only implementations of that distribution I could find were in the PyMC3 and TensorFlow packages, both of which are pretty heavy/overkill if you only need them for that one function. I already tried PyMC3, but apparently it doesn't play well with scipy; it always crashed for me. But that's a whole different story.
Foreword
I came across this problem this week and the only relevant issue I have found about it is this post. I have almost the same requirement as the OP:
Having a random variable for the Logit-Normal distribution.
But I also need:
To be able to perform statistical tests as well;
While being compliant with the scipy random variable interface.
As @Jacques Gaudin pointed out, the interface for rv_continuous (see distribution architecture for details) does not ensure follow-up of the loc and scale parameters when inheriting from this class. This is somewhat misleading and unfortunate.
Implementing the __init__ method does of course allow you to create the missing binding, but the trade-off is that it breaks the pattern scipy currently uses to implement random variables (see an example implementation for the lognormal).
So I took the time to dig into the scipy code and created an MCVE for this distribution. Although it is not totally complete (it mainly lacks moment overrides), it fits the bill for both the OP's purposes and mine, with satisfying accuracy and performance.
MCVE
An interface compliant implementation of this random variable could be:
class logitnorm_gen(stats.rv_continuous):

    def _argcheck(self, m, s):
        return (s > 0.) & (m > -np.inf)

    def _pdf(self, x, m, s):
        return stats.norm(loc=m, scale=s).pdf(special.logit(x))/(x*(1-x))

    def _cdf(self, x, m, s):
        return stats.norm(loc=m, scale=s).cdf(special.logit(x))

    def _rvs(self, m, s, size=None, random_state=None):
        return special.expit(m + s*random_state.standard_normal(size))

    def fit(self, data, **kwargs):
        return stats.norm.fit(special.logit(data), **kwargs)

logitnorm = logitnorm_gen(a=0.0, b=1.0, name="logitnorm")
This implementation unlocks most of the potential of scipy random variables:
N = 1000
law = logitnorm(0.24, 1.31) # Defining a RV
sample = law.rvs(size=N) # Sampling from RV
params = logitnorm.fit(sample) # Infer parameters w/ MLE
check = stats.kstest(sample, law.cdf) # Hypothesis testing
bins = np.arange(0.0, 1.1, 0.1) # Bin boundaries
expected = np.diff(law.cdf(bins)) # Expected bin counts
As it relies on the scipy normal distribution, we may assume the underlying functions have the same accuracy and performance as the normal random-variable object. But it might indeed be subject to floating-point inaccuracies, especially when dealing with highly skewed distributions at the support boundary.
Tests
To check how it performs, we draw samples from some distributions of interest and check them.
Let's create some fixtures:
def generate_fixtures(
    locs=[-2.0, -1.0, 0.0, 0.5, 1.0, 2.0],
    scales=[0.32, 0.56, 1.00, 1.78, 3.16],
    sizes=[100, 1000, 10000],
    seeds=[789, 123456, 999999]
):
    for (loc, scale, size, seed) in itertools.product(locs, scales, sizes, seeds):
        yield {"parameters": {"loc": loc, "scale": scale}, "size": size, "random_state": seed}
And perform checks on related distributions and samples:
eps = 1e-8
x = np.linspace(0. + eps, 1. - eps, 10000)
for fixture in generate_fixtures():
    # Reference:
    parameters = fixture.pop("parameters")
    normal = stats.norm(**parameters)
    sample = special.expit(normal.rvs(**fixture))

    # Logit Normal Law:
    law = logitnorm(m=parameters["loc"], s=parameters["scale"])
    check = law.rvs(**fixture)

    # Fit:
    p = logitnorm.fit(sample)
    trial = logitnorm(*p)
    resample = trial.rvs(**fixture)

    # Hypothesis Tests:
    ks = stats.kstest(check, trial.cdf)
    bins = np.histogram(resample)[1]
    obs = np.diff(trial.cdf(bins))*fixture["size"]
    ref = np.diff(law.cdf(bins))*fixture["size"]
    chi2 = stats.chisquare(obs, ref, ddof=2)
Some fits with n=1000, seed=789 (this sample is quite normal) are shown below:
If you look at the source code of the pdf method, you will notice that _pdf is called without the scale and loc keyword arguments.
if np.any(cond):
    goodargs = argsreduce(cond, *((x,)+args+(scale,)))
    scale, goodargs = goodargs[-1], goodargs[:-1]
    place(output, cond, self._pdf(*goodargs) / scale)
As a result, kwargs in your overriding _pdf method is always an empty dictionary.
If you look a bit closer at the code, you will also notice that the scaling and location are handled by pdf as opposed to _pdf.
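Schematically, and glossing over the argument-reduction machinery, pdf does something like the following before delegating to _pdf (a simplified sketch, not the actual scipy source):
# Simplified view of rv_continuous.pdf (illustration only):
def pdf(self, x, loc=0, scale=1):
    y = (x - loc) / scale          # standardization happens here, in pdf
    return self._pdf(y) / scale    # _pdf only ever sees the standardized y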
In your case, the _pdf method calls norm.pdf so the loc and scale parameters must somehow be available in LogitNormal._pdf.
You could for example pass scale and loc when creating an instance of LogitNormal and store the values as class attributes:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import logit
from scipy.stats import norm, rv_continuous
class LogitNormal(rv_continuous):
    def __init__(self, scale=1, loc=0):
        super().__init__()
        self.scale = scale
        self.loc = loc

    def _pdf(self, x):
        return norm.pdf(logit(x), loc=self.loc, scale=self.scale)/(x*(1-x))
fig, ax = plt.subplots()
values = np.linspace(10e-10, 1-10e-10, 1000)
sigma, mu = 1.78, 0
ax.plot(
    values, LogitNormal(scale=sigma, loc=mu).pdf(values), label='subclassed'
)
ax.legend()
fig.show()
I am building a neural network that makes use of t-distribution noise. I am using np.random.standard_t from the numpy library and tf.distributions.StudentT from tensorflow. The documentation of the first function is linked here and that of the second one here. I am using the said functions like below:
a = np.random.standard_t(df=3, size=10000) # numpy's function
t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
sess = tf.Session()
b = sess.run(t_dist.sample(10000))
In the documentation provided for the TensorFlow implementation, there's a parameter called scale whose description reads:
The scaling factor(s) for the distribution(s). Note that scale is not technically the standard deviation of this distribution but has semantics more similar to standard deviation than variance.
I have set scale to be 1.0 but I have no way of knowing for sure if these refer to the same distribution.
Can someone help me verify this? Thanks
I would say they are, as their sampling is defined in almost the exact same way in both cases. This is how the sampling of tf.distributions.StudentT is defined:
def _sample_n(self, n, seed=None):
    # The sampling method comes from the fact that if:
    #   X ~ Normal(0, 1)
    #   Z ~ Chi2(df)
    #   Y = X / sqrt(Z / df)
    # then:
    #   Y ~ StudentT(df).
    seed = seed_stream.SeedStream(seed, "student_t")
    shape = tf.concat([[n], self.batch_shape_tensor()], 0)
    normal_sample = tf.random.normal(shape, dtype=self.dtype, seed=seed())
    df = self.df * tf.ones(self.batch_shape_tensor(), dtype=self.dtype)
    gamma_sample = tf.random.gamma([n],
                                   0.5 * df,
                                   beta=0.5,
                                   dtype=self.dtype,
                                   seed=seed())
    samples = normal_sample * tf.math.rsqrt(gamma_sample / df)
    return samples * self.scale + self.loc  # Abs(scale) not wanted.
So it is a standard normal sample divided by the square root of a chi-square sample (with parameter df) that has itself been divided by df. The chi-square sample is drawn as a gamma sample with shape 0.5 * df and rate 0.5, which is equivalent (the chi-square is a special case of the gamma). The scale value, like loc, only comes into play in the last line, as a way to shift and scale the distribution sample. When scale is one and loc is zero, they do nothing.
Here is the implementation for np.random.standard_t:
double legacy_standard_t(aug_bitgen_t *aug_state, double df) {
  double num, denom;

  num = legacy_gauss(aug_state);
  denom = legacy_standard_gamma(aug_state, df / 2);
  return sqrt(df / 2) * num / sqrt(denom);
}
So it is essentially the same thing, slightly rephrased. Here we also have a gamma with shape df / 2, but it is standard (rate one). However, the missing factor of 0.5 is accounted for by the df / 2 inside the sqrt in the numerator. So it's just moving the numbers around. Here there is no scale or loc, though.
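To make the parallel concrete, here is a small NumPy sketch (mine, not taken from either code base) mirroring the construction both implementations use:
import numpy as np

rng = np.random.default_rng(0)
df, n = 3, 10000

# t-sample built "by hand": N(0, 1) / sqrt(Chi2(df) / df)
z = rng.standard_normal(n)
chi2 = rng.gamma(df / 2, scale=2.0, size=n)  # Chi2(df) is Gamma(shape=df/2, scale=2)
t_manual = z / np.sqrt(chi2 / df)

# Same distribution drawn directly
t_direct = rng.standard_t(df, size=n)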
In truth, the difference is that in the case of TensorFlow the distribution is the more general location-scale t-distribution (shifted by loc and scaled by scale), while NumPy only gives you the standard one. A simple empirical check that they are the same for loc=0.0 and scale=1.0 is to plot histograms of both and see how close they look.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
np.random.seed(0)
t_np = np.random.standard_t(df=3, size=10000)
with tf.Graph().as_default(), tf.Session() as sess:
    tf.random.set_random_seed(0)
    t_dist = tf.distributions.StudentT(df=3.0, loc=0.0, scale=1.0)
    t_tf = sess.run(t_dist.sample(10000))
plt.hist((t_np, t_tf), np.linspace(-10, 10, 20), label=['NumPy', 'TensorFlow'])
plt.legend()
plt.tight_layout()
plt.show()
Output:
That looks pretty close. Obviously, from the point of view of statistical samples, this is not any kind of proof. If you are still not convinced, there are statistical tools for testing whether a sample comes from a certain distribution, or whether two samples come from the same distribution.
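For instance, a two-sample Kolmogorov-Smirnov test on the two samples drawn above (a quick follow-up sketch reusing t_np and t_tf from the script):
from scipy import stats

# A large p-value means there is no evidence that the two samples
# come from different distributions.
stat, p_value = stats.ks_2samp(t_np, t_tf)
print(stat, p_value)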
I want to get kernel density estimation for positive data points. Using Python Scipy Stats package, I came up with the following code.
def get_pdf(data):
    a = np.array(data)
    ag = st.gaussian_kde(a)
    x = np.linspace(0, max(data), max(data))
    y = ag(x)
    return x, y
This works perfectly for most data sets, but it gives an erroneous result for "all positive" data points. To make sure this works correctly, I use numerical integration to compute the area under this curve.
def trapezoidal_2(ag, a, b, n):
    h = float(b - a) / n
    s = 0.0
    s += ag(a)[0]/2.0
    for i in range(1, n):
        s += ag(a + i*h)[0]
    s += ag(b)[0]/2.0
    return s * h
Since the data is spread over the region (0, int(max(data))), we should get a value close to 1 when executing the following lines.
b = 1
data = st.pareto.rvs(b, size=10000)
data = list(data)
a = np.array(data)
ag = st.gaussian_kde(a)
trapezoidal_2(ag, 0, int(max(data)), int(max(data))*2)
But it gives a value close to 0.5 when I test.
But when I integrate from -100 to max(data), it provides a value close to 1.
trapezoidal_2(ag, -100, int(max(data)), int(max(data))*2+200)
The reason is that ag (the KDE) assigns density to values less than 0, even though the original data set contains only positive values.
So how can I get a kernel density estimate that considers only positive values, such that the area under the curve in the region (0, max(data)) is close to 1?
The choice of bandwidth is quite important when performing kernel density estimation. I think Scott's Rule and Silverman's Rule work well for distributions similar to a Gaussian. However, they do not work well for the Pareto distribution.
Quote from the doc:
Bandwidth selection strongly influences the estimate obtained from the KDE (much more so than the actual shape of the kernel). Bandwidth selection can be done by a "rule of thumb", by cross-validation, by "plug-in methods" or by other means; see [3], [4] for reviews. gaussian_kde uses a rule of thumb, the default is Scott's Rule.
Try with different bandwidth values, for example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
b = 1
sample = stats.pareto.rvs(b, size=3000)
kde_sample_scott = stats.gaussian_kde(sample, bw_method='scott')
kde_sample_scalar = stats.gaussian_kde(sample, bw_method=1e-3)
# Compute the integrale:
print('integrale scott:', kde_sample_scott.integrate_box_1d(0, np.inf))
print('integrale scalar:', kde_sample_scalar.integrate_box_1d(0, np.inf))
# Graph:
x_span = np.logspace(-2, 1, 550)
plt.plot(x_span, stats.pareto.pdf(x_span, b), label='theoretical pdf')
plt.plot(x_span, kde_sample_scott(x_span), label="estimated pdf 'scott'")
plt.plot(x_span, kde_sample_scalar(x_span), label="estimated pdf 'scalar'")
plt.xlabel('X'); plt.legend();
gives:
integrale scott: 0.5572130540733236
integrale scalar: 0.9999999999968957
and:
We see that the KDE obtained with Scott's rule is wrong here.
I am trying to use scipy's kstest on some data and a few different distributions. I got to testing my data against a log-normal distribution and got confused, so I made a test.
I am parameterizing the log-normal by its own mean and standard deviation (instead of scipy's parameterization, where scale is the exponential of the mean of the corresponding normal and s is the standard deviation of the corresponding normal).
Explained here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html
I have written a function that takes in my parameters, converts them to scipy's parameters, and then samples. Here it is:
import math
from scipy import stats

def lognormal_samples(M_y, Sig_y):
    m_x = (2*math.log(M_y)) - (.5)*(math.log(math.pow(Sig_y,2) + math.pow(M_y,2)))
    scale = math.exp(m_x)
    sigma2 = -2 * math.log(M_y) + math.log(math.pow(Sig_y,2) + math.pow(M_y,2))
    s = math.sqrt(sigma2)
    result = stats.lognorm(s, scale=scale).rvs(size=10000)
    return result, s, scale
To test I want to see if the ks statistic is near 0 if I fit these samples to a scipy.stats.lognormal. Here I try to do that:
def lognormal_test_of_ks_test():
    samples, my_s, my_scale = lognormal_samples(1, .25)
    ks = stats.kstest(samples, 'lognorm', args=[my_s, my_scale])[0]
    print('ks: ', ks)
When I run this I get ks: 0.958038612187, which is ridiculously high. I believe my problem is that when I pass [my_s, my_scale] to args, these values are not actually getting passed to s and scale of lognorm inside kstest. How do I pass my two parameters to kstest so that they actually parameterize lognorm? I would imagine it would be something like:
my_s = 's=' + str(my_s)
my_scale = 'scale=' + str(my_scale)
my_args = [my_s, my_scale]
ks = stats.kstest(samples, 'lognorm', args=my_args)[0]
But this doesn't work either.
kstest ends up calling lognorm.cdf, which according to the doc takes the following arguments:
cdf(x, s, loc=0, scale=1)
So you need to pass:
my_args = [my_s, 0, my_scale]
ks = stats.kstest(samples, 'lognorm', args=my_args)[0]
which outputs:
ks: 0.007790356168134116
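As a side note (a small variant, not part of the fix above), you can avoid the positional-args question entirely by freezing the distribution and handing its cdf to kstest:
from scipy import stats

# Freeze lognorm with the desired shape and scale, then pass the callable cdf
frozen = stats.lognorm(my_s, scale=my_scale)
ks = stats.kstest(samples, frozen.cdf)[0]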
I am completely new to pymc3, so please excuse the fact that this is likely trivial. I have a very simple model where I am predicting a binary response function. The model is almost a verbatim copy of this example: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/gelman_bioassay.py
I get back the model parameters (alpha, beta, and theta), but I can't seem to figure out how to overplot the predictions of the model vs. the input data. I tried doing this (using the parlance of the bioassay model):
from scipy.stats import binom
mean_alpha = mean(trace['alpha'])
mean_beta = mean(trace['beta'])
pred_death = binom.rvs(n, 1./(1.+np.exp(-(mean_alpha + mean_beta * dose))))
and then plotting dose vs. pred_death, but this is manifestly not correct as I get different draws of the binomial distribution every time.
Related to this is another question, how do I evaluate the goodness of fit? I couldn't seem to find anything to that effect in the "getting started" pymc3 tutorial.
Thanks very much for any advice!
A simple way to do it is as follows:
from pymc3 import *
from numpy import ones, array
import numpy as np

# Samples for each dose level
n = 5 * ones(4, dtype=int)
# Log-dose
dose = array([-.86, -.3, -.05, .73])

def invlogit(x):
    return np.exp(x) / (1 + np.exp(x))

with Model() as model:
    # Logit-linear model parameters
    alpha = Normal('alpha', 0, 0.01)
    beta = Normal('beta', 0, 0.01)

    # Calculate probabilities of death
    theta = Deterministic('theta', invlogit(alpha + beta * dose))

    # Data likelihood
    deaths = Binomial('deaths', n=n, p=theta, observed=[0, 1, 3, 5])

    start = find_MAP()
    step = NUTS(scaling=start)
    trace = sample(2000, step, start=start, progressbar=True)

import matplotlib.pyplot as plt

death_fit = np.percentile(trace['theta'], 50, axis=0)
plt.plot(dose, death_fit, 'g', marker='.', lw=1.25, ls='-', ms=5, mew=1)
plt.show()
If you want to plot dose vs pred_death, where pred_death is computed from the mean estimated values of alpha and beta, then do:
pred_death = 1./(1. + np.exp(-(mean_alpha + mean_beta * dose)))
plt.plot(dose, pred_death)
Instead, if you want to plot dose vs pred_death where pred_death is computed taking into account the uncertainty in the posterior for alpha and beta, then probably the easiest way is to use the function sample_ppc. Maybe something like:
ppc = pm.sample_ppc(trace, samples=100, model=pmmodel)
for i in range(100):
    plt.plot(dose, ppc['deaths'][i], 'bo', alpha=0.5)
Using posterior predictive checks (PPC) is a way to check how well your model behaves by comparing its predictions to your actual data. Here you have an example of sample_ppc.
Other options could be to plot the mean value plus some interval of interest.
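For instance, a minimal sketch of that last option (assuming the trace and dose from the model above):
import numpy as np
import matplotlib.pyplot as plt

# Posterior median of theta with a 95% credible band, one value per dose level
lo, med, hi = np.percentile(trace['theta'], [2.5, 50, 97.5], axis=0)
plt.plot(dose, med, 'g.-', label='posterior median')
plt.fill_between(dose, lo, hi, color='g', alpha=0.2, label='95% interval')
plt.legend()
plt.show()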