sampling multinomial from small log probability vectors in numpy/scipy - python

Is there a function in numpy/scipy that lets you sample multinomial from a vector of small log probabilities, without losing precision? example:
# sample element randomly from these log probabilities
l = [-900, -1680]
the naive method fails because of underflow:
import scipy
import numpy as np
# this makes a all zeroes
a = np.exp(l) / scipy.misc.logsumexp(l)
r = np.random.multinomial(1, a)
this is one attempt:
def s(l):
    m = np.max(l)
    norm = m + np.log(np.sum(np.exp(l - m)))
    p = np.exp(l - norm)
    return np.where(np.random.multinomial(1, p) == 1)[0][0]
is this the best/fastest method and can np.exp() in the last step be avoided?

First of all, I believe the problem you're encountering is because you're normalizing your probabilities incorrectly. This line is incorrect:
a = np.exp(l) / scipy.misc.logsumexp(l)
You're dividing a probability by a log probability, which makes no sense. Instead you probably want
a = np.exp(l - scipy.misc.logsumexp(l))
If you do that, you find a = [1, 0] and your multinomial sampler works as expected up to floating point precision in the second probability.
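A quick check (note that logsumexp lives in scipy.special in recent SciPy versions, so I import it from there):
from scipy.special import logsumexp
import numpy as np

l = np.array([-900., -1680.])
a = np.exp(l - logsumexp(l))
# a -> array([1., 0.]): the second probability underflows to zero, as expected
np.random.multinomial(1, a)
# array([1, 0])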
A Solution for Small N: Histograms
That said, if you still need more precision and performance is not as much of a concern, one way you could make progress is by implementing a multinomial sampler from scratch, and then modifying this to work at higher precision.
NumPy's multinomial function is implemented in Cython, and essentially performs a loop over a number of binomial samples, combining them into a multinomial sample. You can call it like this:
np.random.multinomial(10, [0.1, 0.2, 0.7])
# [0, 1, 9]
(Note that the precise output values here & below are random, and will change from call to call).
Another way you might implement a multinomial sampler is to generate N uniform random values, then compute the histogram with bins defined by the cumulative probabilities:
def multinomial(N, p):
    rand = np.random.uniform(size=N)
    p_cuml = np.cumsum(np.hstack([[0], p]))
    p_cuml /= p_cuml[-1]
    return np.histogram(rand, bins=p_cuml)[0]
multinomial(10, [0.1, 0.2, 0.7])
# [1, 1, 8]
With this method in mind, we can think about doing things to higher precision by keeping everything in log-space. The main trick is to realize that the log of uniform random deviates is equivalent to the negative of exponential random deviates, and so you can do everything above without ever leaving log space:
def multinomial_log(N, logp):
    log_rand = -np.random.exponential(size=N)
    logp_cuml = np.logaddexp.accumulate(np.hstack([[-np.inf], logp]))
    logp_cuml -= logp_cuml[-1]
    return np.histogram(log_rand, bins=logp_cuml)[0]
multinomial_log(10, np.log([0.1, 0.2, 0.7]))
# [1, 2, 7]
The resulting multinomial draws will maintain precision even for very small values in the p array.
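For instance, with the log-probabilities from the question, the draws are dominated by the first outcome, as they should be (the second outcome has relative probability exp(-780)):
multinomial_log(5, [-900, -1680])
# [5, 0]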
Unfortunately, these histogram-based solutions will be much slower than the native numpy.multinomial function, so if performance is an issue you may need another approach. One option would be to adapt the Cython code linked above to work in log-space, using similar mathematical tricks as I used here.
A Solution for Large N: Poisson Approximation
The problem with the above solution is that as N grows large, it becomes very slow.
I was thinking about this and realized there's a more efficient way forward, despite np.random.multinomial failing for probabilities smaller than 1E-16 or so.
Here's an example of that failure: on a 64-bit machine, this will always give zero for the first entry because of the way the code is implemented, when in reality it should give something near 10:
np.random.multinomial(1E18, [1E-17, 1])
# array([ 0, 1000000000000000000])
If you dig into the source, you can trace this issue to the binomial function upon which the multinomial function is built. The cython code internally does something like this:
def multinomial_basic(N, p, size=None):
    results = np.array([np.random.binomial(N, pi, size) for pi in p])
    results[-1] = int(N) - results[:-1].sum(0)
    return np.rollaxis(results, 0, results.ndim)
multinomial_basic(1E18, [1E-17, 1])
# array([ 0, 1000000000000000000])
The problem is that the binomial function chokes on very small values of p – this is because the algorithm computes the value (1 - p), so the value of p is limited by floating-point precision.
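You can see the limit directly: below roughly 1E-16, p vanishes entirely when the algorithm forms (1 - p) in float64:
1 - 1e-17 == 1.0
# True: the algorithm can no longer distinguish p from zero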
So what can we do? Well, it turns out that for small values of p, the Poisson distribution is an extremely good approximation of the binomial distribution, and the implementation doesn't have these issues. So we can build a robust multinomial function based on a robust binomial sampler that switches to a Poisson sampler at small p:
def binomial_robust(N, p, size=None):
    if p < 1E-7:
        return np.random.poisson(N * p, size)
    else:
        return np.random.binomial(N, p, size)

def multinomial_robust(N, p, size=None):
    results = np.array([binomial_robust(N, pi, size) for pi in p])
    results[-1] = int(N) - results[:-1].sum(0)
    return np.rollaxis(results, 0, results.ndim)
multinomial_robust(1E18, [1E-17, 1])
# array([ 12, 999999999999999988])
The first entry is nonzero and near 10 as expected! Note that we can't use N larger than 1E18, because it will overflow the long integer.
But we can confirm that our approach works for smaller probabilities using the size parameter, and averaging over results:
p = [1E-23, 1E-22, 1E-21, 1E-20, 1]
size = int(1E6)
multinomial_robust(1E18, p, size).mean(0)
# array([ 1.70000000e-05, 9.00000000e-05, 9.76000000e-04,
# 1.00620000e-02, 1.00000000e+18])
We see that even for these very small probabilities, the multinomial values are turning up in the right proportion. The result is a very robust and very fast approximation to the multinomial distribution for small p.

Related

Efficient sampling from a 'partial' binomial distribution

I want to sample from the binomial distribution B(n,p), but with an additional constraint: the sampled value must lie in the range [a,b] (instead of the usual 0 to n range). In other words, I have to sample a value from the binomial distribution given that it lies in the range [a,b]. Mathematically, I can write the pmf of this distribution (f(x)) in terms of the pmf of the binomial distribution, bin(x) = nCx * p^x * (1-p)^(n-x), as
sum = 0
for i in range(a, b+1):
    sum += bin(i)
f(x) = bin(x)/sum
One way of sampling from this distribution is to sample a uniformly distributed number and apply the inverse of the CDF (obtained using the pmf). However, I don't think this is a good idea, as the pmf calculation would easily get very time-consuming.
The values of n, x, a, b are quite large in my case, and this way of computing the pmf and then using a uniform random variable to generate the sample seems extremely inefficient due to the factorial terms in nCx.
What's a nice/efficient way to achieve this?
This is a way to collect all the values of bin in a pretty short time:
from scipy.special import comb
import numpy as np
def distribution(n, p=0.5):
    x = np.arange(n+1)
    return comb(n, x, exact=False) * p ** x * (1 - p) ** (n - x)
It can be done in a quarter of a microsecond for n=1000.
Sample run:
>>> distribution(4)
array([0.0625, 0.25 , 0.375 , 0.25 , 0.0625])
You can sum specific parts of this array like so:
>>> np.sum(distribution(4)[2:4])
0.625
Remark: For n>1000, computing the middle values of this distribution requires multiplying extremely large numbers, so a RuntimeWarning is raised.
Bugfix
You can use scipy.stats.binom equivalently:
from scipy.stats import binom
def distribution(n, p):
    return binom.pmf(np.arange(n+1), n, p)
This does the same as the above-mentioned method quite efficiently (n=1000000 in about a third of a second). Alternatively, you can use binom.cdf(np.arange(n+1), n, p), which calculates the cumulative sum of binom.pmf. Then subtracting the a-th item of this array from the b-th gives an output which is very close to what you expect.
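For instance (an illustrative sketch with made-up n, p, a, b), the normalizing sum over [a, b] comes straight from the CDF, with no explicit factorials:
from scipy.stats import binom
n, p, a, b = 10**6, 0.3, 299000, 301000  # hypothetical values
cdf = binom.cdf([a - 1, b], n, p)
norm = cdf[1] - cdf[0]  # equals the sum of bin(i) for i in [a, b]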
Another way would be to use the CDF and its inverse, something like:
from scipy import stats
dist = stats.binom(100, 0.5)
# limit ourselves to [60, 100]
lo, hi = dist.cdf([60, 100])
# draw a sample
x = dist.ppf(stats.uniform(lo, hi-lo).rvs())
This should give us values in the range. Note that due to floating point precision, this might give you values outside of what you want; it gets worse above the mean of the distribution.
Note that for large values you might as well use the normal approximation.
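A minimal sketch of that normal approximation (my own illustration, assuming n*p and n*(1-p) are large; sample_truncated_binom is a hypothetical helper name):
import numpy as np
from scipy import stats

def sample_truncated_binom(n, p, a, b, size=1):
    mu = n * p
    sd = np.sqrt(n * p * (1 - p))
    # truncnorm takes its bounds in standard-deviation units
    lo, hi = (a - mu) / sd, (b - mu) / sd
    return np.rint(stats.truncnorm(lo, hi, loc=mu, scale=sd).rvs(size)).astype(int)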

Mean and standard deviation of lognormal distribution do not match analytic values

As part of my research, I measure the mean and standard deviation of draws from a lognormal distribution. Given a value of the underlying normal distribution, it should be possible to analytically predict these quantities (as given at https://en.wikipedia.org/wiki/Log-normal_distribution).
However, as can be seen in the plots below, this does not seem to be the case. The first plot shows the mean of the lognormal data against the sigma of the gaussian, while the second plot shows the sigma of the lognormal data against that of the gaussian. Clearly, the "calculated" lines deviate from the "analytic" ones very significantly.
I take the mean of the gaussian distribution to be related to sigma by mu = -0.5*sigma**2, as this ensures the lognormal field has a mean of 1. Note, this is motivated by the area of physics that I work in: the deviation from analytic values still occurs if you set mu=0.0, for example.
By copying and pasting the code at the bottom of the question, it should be possible to reproduce the plots below. Any advice as to what might be causing this would be much appreciated!
Mean of lognormal vs sigma of gaussian:
Sigma of lognormal vs sigma of gaussian:
Note, to produce the plots above, I used N=10000, but have put N=1000 in the code below for speed.
import numpy as np
import matplotlib.pyplot as plt
mean_calc = []
sigma_calc = []
mean_analytic = []
sigma_analytic = []
ss = np.linspace(1.0,10.0,46)
N = 1000
for s in ss:
    mu = -0.5*s*s
    ln = np.random.lognormal(mean=mu, sigma=s, size=(N,N))
    mean_calc += [np.average(ln)]
    sigma_calc += [np.std(ln)]
    mean_analytic += [np.exp(mu+0.5*s*s)]
    sigma_analytic += [np.sqrt((np.exp(s**2)-1)*(np.exp(2*mu + s*s)))]
plt.loglog(ss,mean_calc,label='calculated')
plt.loglog(ss,mean_analytic,label='analytic')
plt.legend();plt.grid()
plt.xlabel(r'$\sigma_G$')
plt.ylabel(r'$\mu_{LN}$')
plt.show()
plt.loglog(ss,sigma_calc,label='calculated')
plt.loglog(ss,sigma_analytic,label='analytic')
plt.legend();plt.grid()
plt.xlabel(r'$\sigma_G$')
plt.ylabel(r'$\sigma_{LN}$')
plt.show()
TL;DR
The lognormal is a positively skewed, heavy-tailed distribution. When performing float arithmetic operations (such as sum, mean or std) on a sample drawn from a highly skewed distribution, the sampling vector contains values whose magnitudes differ by several orders of magnitude (many decades). This makes the computation inaccurate.
The problem comes from those two lines:
mean_calc += [np.average(ln)]
sigma_calc += [np.std(ln)]
Because ln contains both very low and very high values, whose spread in magnitude exceeds float precision.
The problem can easily be detected, to warn the user that the computation is wrong, using the following predicate:
(max(ln) + min(ln)) <= max(ln)
which is obviously false over the strictly positive reals, but must be considered when using finite-precision arithmetic.
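As a quick illustration (my own demo, with an arbitrary sigma), the predicate fires for a heavy-tailed sample:
import numpy as np
xs = np.random.lognormal(mean=0.0, sigma=8.0, size=100000)
(np.max(xs) + np.min(xs)) <= np.max(xs)
# typically True: the smallest value is completely absorbed by the largest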
Modifying your MCVE
If we slightly modify your MCVE to:
from scipy import stats
for s in ss:
    mu = -0.5*s*s
    ln = stats.lognorm(s, scale=np.exp(mu)).rvs(N*N)
    f = stats.lognorm.fit(ln, floc=0)
    mean_calc += [f[2]*np.exp(0.5*s*s)]
    sigma_calc += [np.sqrt((np.exp(f[0]**2)-1)*(np.exp(2*mu + s*s)))]
    mean_analytic += [np.exp(mu+0.5*s*s)]
    sigma_analytic += [np.sqrt((np.exp(s**2)-1)*(np.exp(2*mu + s*s)))]
It gives reasonably correct mean and standard deviation estimates, even for high values of sigma.
The key is that fit uses an MLE algorithm to estimate the parameters. This differs completely from your original approach, which directly takes the mean of the sample.
The fit method returns a tuple (sigma, loc=0, scale=exp(mu)), which are the parameters of the scipy.stats.lognorm object as specified in the documentation.
I think you should investigate how you are estimating the mean and standard deviation. The divergence probably comes from this part of your algorithm.
There might be several reasons why it diverges, at least consider:
Biased estimator: are your estimators correct and unbiased? The mean is an unbiased estimator (see next section), but maybe not an efficient one;
Sampled outliers from the pseudo-random generator may not be as extreme as the theoretical distribution suggests: maybe MLE is less sensitive than your estimator. The new MCVE below does not support this hypothesis, but float arithmetic error can explain why your estimators are underestimated;
Float arithmetic error: the new MCVE below highlights that this is part of your problem.
A scientific quote
It seems the naive mean estimator (simply taking the mean), even if unbiased, is inefficient at estimating the mean for large sigma (see Qi Tang, Comparison of Different Methods for Estimating Log-normal Means, p. 11):
The naive estimator is easy to calculate and it is unbiased. However,
this estimator can be inefficient when variance is large and sample
size is small.
The thesis reviews several methods of estimating the mean of a lognormal distribution and uses MLE as the reference for comparison. This explains why your method drifts as sigma increases while MLE sticks closer, though it is not time-efficient for large N. A very interesting paper.
Statistical considerations
Recalling that:
The lognormal is a heavy- and long-tailed, positively skewed distribution. One consequence: as the shape parameter sigma grows, the asymmetry and skewness grow, and so does the strength of outliers.
Effect of sample size: as the number of samples drawn from a distribution grows, the expectation of having an outlier increases (and so does its extent).
Building a new MCVE
Let's build a new MCVE to make it clearer. The code below draws samples of different sizes (N ranges between 100 and 100,000) from a lognormal distribution whose shape parameter varies (sigma ranges between 0.1 and 10) and whose scale parameter is set to unity.
import warnings
import numpy as np
from scipy import stats
# Make computation reproducible among batches:
np.random.seed(123456789)
# Parameters ranges:
sigmas = np.arange(0.1, 10.1, 0.1)
sizes = np.logspace(2, 5, 21, base=10).astype(int)
# Placeholders:
rv = np.empty((sigmas.size,), dtype=object)
xmean = np.full((3, sigmas.size, sizes.size), np.nan)
xstd = np.full((3, sigmas.size, sizes.size), np.nan)
xextent = np.full((2, sigmas.size, sizes.size), np.nan)
eps = np.finfo(np.float64).eps
# Iterate Shape Parameter:
for (i, s) in enumerate(sigmas):
    # Create Random Variable:
    rv[i] = stats.lognorm(s, loc=0, scale=1)
    # Iterate Sample Size:
    for (j, N) in enumerate(sizes):
        # Draw Samples:
        xs = rv[i].rvs(N)
        # Sample Extent:
        xextent[:,i,j] = [np.min(xs), np.max(xs)]
        # Check (max(x) + min(x)) <= max(x)
        if (xextent[0,i,j] + xextent[1,i,j]) - xextent[1,i,j] < eps:
            warnings.warn("Potential Float Arithmetic Errors: logN(mu=%.2f, sigma=%2f).sample(%d)" % (0, s, N))
        # Generate different Estimators:
        # Fit Parameters using MLE:
        fit = stats.lognorm.fit(xs, floc=0)
        xmean[0,i,j] = fit[2]
        xstd[0,i,j] = fit[0]
        # Naive (Bad Estimators because of Float Arithmetic Error):
        xmean[1,i,j] = np.mean(xs)*np.exp(-0.5*s**2)
        xstd[1,i,j] = np.sqrt(np.log(np.std(xs)**2*np.exp(-s**2)+1))
        # Log-transform:
        xmean[2,i,j] = np.exp(np.mean(np.log(xs)))
        xstd[2,i,j] = np.std(np.log(xs))
Observation: The new MCVE starts to raise warnings when sigma > 4.
MLE as Reference
Estimating shape and scale parameters using MLE performs well:
The two figures above show that:
Error in the estimation grows with the shape parameter;
Error in the estimation shrinks as the sample size increases (CLT).
Note that MLE also fits the shape parameter well:
Float Arithmetic
It is worth plotting the extent of the drawn samples versus shape parameter and sample size:
Or the decimal magnitude between the smallest and largest number in the sample:
On my setup:
np.finfo(np.float64).precision # 15
np.finfo(np.float64).eps # 2.220446049250313e-16
It means we have at most 15 significant figures to work with; if the difference in magnitude between two numbers exceeds this, the largest number absorbs the smaller ones.
A basic example: what is the result of 1 + 1e6 if we can only keep four significant figures?
The exact result is 1,000,001.0, but it must be rounded off to 1.000e6. This implies that the result of the operation equals the largest operand, because of the rounding precision. It is inherent to finite-precision arithmetic.
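The same absorption happens in float64, just with roughly 16 significant digits instead of four:
1e16 + 1 == 1e16
# True: the added 1 falls below the last representable digit of 1e16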
The two previous figures, in conjunction with the statistical considerations, support your observation that increasing N does not improve the estimation for large values of sigma in your MCVE.
The figures above and below show that when sigma > 3 we no longer have enough significant figures (fewer than 5) to perform valid computations.
Furthermore, we can say the estimator will be an underestimate, because the largest numbers absorb the smallest, and the underestimated sum is then divided by N, biasing the estimator by construction.
When the shape parameter becomes sufficiently large, computations are strongly biased because of float arithmetic errors.
It means that using quantities such as:
np.mean(xs)
np.std(xs)
when computing estimates will carry huge float arithmetic errors, because of the large discrepancy among the values stored in xs. The figures below reproduce your issue:
As stated, the estimates err low (not high), because the few high values (outliers) in the sampled vector absorb the small values (the majority of the sample).
Logarithmic Transformation
If we apply a logarithmic transformation, we can drastically reduce this phenomenon:
xmean[2,i,j] = np.exp(np.mean(np.log(xs)))
xstd[2,i,j] = np.std(np.log(xs))
Then the naive estimate of the mean is correct, and far less affected by float arithmetic error, because all sample values lie within a few decades of each other rather than spanning relative magnitudes beyond the float arithmetic precision.
In fact, taking the log-transform returns the same mean and std estimates as MLE for every N and sigma:
np.allclose(xmean[0,:,:], xmean[2,:,:]) # True
np.allclose(xstd[0,:,:], xstd[2,:,:]) # True
Reference
If you are looking for complete and detailed explanations of this kind of issue when performing scientific computing, I recommend reading the excellent book: N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Second Edition, 2002.
Bonus
Here is an example of the figure-generation code:
import matplotlib.pyplot as plt
fig, axe = plt.subplots()
idx = slice(None, None, 5)
axe.loglog(sigmas, xmean[0,:,idx])
axe.axhline(1, linestyle=':', color='k')
axe.set_title(r"MLE: $x \sim \log\mathcal{N}(\mu=0,\sigma)$")
axe.set_xlabel(r"Standard Deviation, $\sigma$")
axe.set_ylabel(r"Mean Estimation, $\hat{\mu}$")
axe.set_ylim([0.1,10])
lgd = axe.legend([r"$N = %d$" % s for s in sizes[idx]] + ['Exact'], bbox_to_anchor=(1,1), loc='upper left')
axe.grid(which='both')
fig.savefig('Lognorm_MLE_Emean_Sigma.png', dpi=120, bbox_extra_artists=(lgd,), bbox_inches='tight')

Differentiable round function in Tensorflow?

So the output of my network is a list of probabilities, which I then round using tf.round() to be either 0 or 1; this is crucial for this project.
I then found out that tf.round isn't differentiable, so I'm kinda lost there.. :/
Something along the lines of x - sin(2pi x)/(2pi)?
I'm sure there's a way to squish the slope to be a bit steeper.
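A minimal sketch of that idea (my own illustration, not from the original answer; iterating the map is one way to sharpen the step, since integers are its attracting fixed points):
import numpy as np
import tensorflow as tf

def sin_round(x, iterations=1):
    # x - sin(2*pi*x)/(2*pi) is smooth and flattens near integers;
    # applying it repeatedly pushes values toward the nearest integer
    for _ in range(iterations):
        x = x - tf.sin(2 * np.pi * x) / (2 * np.pi)
    return x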
You can use the fact that tf.maximum() and tf.minimum() are differentiable, and the inputs are probabilities from 0 to 1
# round numbers less than 0.5 to zero;
# by making them negative and taking the maximum with 0
differentiable_round = tf.maximum(x-0.499,0)
# scale the remaining numbers (0 to 0.5) to greater than 1
# the other half (zeros) is not affected by multiplication
differentiable_round = differentiable_round * 10000
# take the minimum with 1
differentiable_round = tf.minimum(differentiable_round, 1)
Example:
[0.1, 0.5, 0.7]
[-0.399, 0.001, 0.201] # x - 0.499
[0, 0.001, 0.201] # max(x-0.499, 0)
[0, 10, 2010] # max(x-0.499, 0) * 10000
[0, 1.0, 1.0] # min(max(x-0.499, 0) * 10000, 1)
This works for me (it's the "straight-through estimator": the forward pass uses the rounded value, while the gradient of the rounding step is replaced by the identity):
x_rounded_NOT_differentiable = tf.round(x)
x_rounded_differentiable = x - tf.stop_gradient(x - x_rounded_NOT_differentiable)
Rounding is a fundamentally nondifferentiable function, so you're out of luck there. The normal procedure for this kind of situation is to find a way to either use the probabilities, say by using them to calculate an expected value, or to take the maximum probability that is output and choose that one as the network's prediction. If you aren't using the output to calculate your loss function, though, you can go ahead and just apply it to the result, and it doesn't matter whether it's differentiable. Now, if you want an informative loss function for the purpose of training the network, maybe you should consider whether keeping the output in the format of probabilities might actually be to your advantage (it will likely make your training process smoother). That way you can just convert the probabilities to actual estimates outside of the network, after training.
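For example (a minimal sketch with stand-in tensors; in practice probs would be your network's output), you can keep the loss on the raw probabilities and round only for reporting:
import tensorflow as tf

# stand-ins for the network's sigmoid outputs and the true labels:
probs = tf.constant([0.1, 0.5, 0.7])
labels = tf.constant([0.0, 1.0, 1.0])

loss = tf.keras.losses.binary_crossentropy(labels, probs)  # differentiable, used for training
predictions = tf.round(probs)  # hard 0/1, used only outside the gradient path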
Building on a previous answer, a way to get an arbitrarily good approximation is to approximate round() using a finite Fourier approximation and use as many terms as you need. Fundamentally, you can think of round(x) as adding a reverse (i.e. descending) sawtooth wave to x. So, using the Fourier expansion of the sawtooth wave, we get round(x) ≈ x - sum_{n=1}^{N} (-1)^(n+1) * sin(2*pi*n*x) / (pi*n).
With N = 5, we get a pretty nice approximation:
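The truncated series is straightforward to write in code (a sketch of the formula above; fourier_round is my own illustrative name):
import numpy as np
import tensorflow as tf

def fourier_round(x, terms=5):
    # round(x) ~ x - sum_{n=1..N} (-1)^(n+1) * sin(2*pi*n*x) / (pi*n)
    approx = x
    for n in range(1, terms + 1):
        approx -= (-1.0) ** (n + 1) * tf.sin(2 * np.pi * n * x) / (np.pi * n)
    return approx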
Kind of an old question, but I just solved this problem for TensorFlow 2.0. I am using the following round function in my audio auto-encoder project. I basically want to create a discrete representation of sound which is compressed in time. I use the round function to clamp the output of the encoder to integer values. It has been working well for me so far.
@tf.custom_gradient
def round_with_gradients(x):
    def grad(dy):
        return dy
    return tf.round(x), grad
In the range 0 to 1, translating and scaling a sigmoid can be a solution:
slope = 1000
center = 0.5
e = tf.exp(slope*(x-center))
round_diff = e/(e+1)
In tensorflow 2.10, there is a function called soft_round which achieves exactly this.
Fortunately, for those who are using lower versions, the source code is really simple, so I just copy-pasted those lines, and it works like a charm:
def soft_round(x, alpha, eps=1e-3):
    """Differentiable approximation to `round`.

    Larger alphas correspond to closer approximations of the round function.
    If alpha is close to zero, this function reduces to the identity.

    This is described in Sec. 4.1. in the paper
    > "Universally Quantized Neural Compression"
    > Eirikur Agustsson & Lucas Theis
    > https://arxiv.org/abs/2006.09952

    Args:
      x: `tf.Tensor`. Inputs to the rounding function.
      alpha: Float or `tf.Tensor`. Controls smoothness of the approximation.
      eps: Float. Threshold below which `soft_round` will return identity.

    Returns:
      `tf.Tensor`
    """
    # This guards the gradient of tf.where below against NaNs, while maintaining
    # correctness, as for alpha < eps the result is ignored.
    alpha_bounded = tf.maximum(alpha, eps)
    m = tf.floor(x) + .5
    r = x - m
    z = tf.tanh(alpha_bounded / 2.) * 2.
    y = m + tf.tanh(alpha_bounded * r) / z
    # For very low alphas, soft_round behaves like identity
    return tf.where(alpha < eps, x, y, name="soft_round")
alpha sets how soft the function is. Greater values lead to better approximations of the round function, but the model then becomes harder to fit, since gradients vanish:
import matplotlib.pyplot as plt

x = tf.convert_to_tensor(np.arange(-2, 2, .1).astype(np.float32))
for alpha in [3., 7., 15.]:
    y = soft_round(x, alpha)
    plt.plot(x.numpy(), y.numpy(), label=f'alpha={alpha}')
plt.legend()
plt.title('Soft round function for different alphas')
plt.grid()
In my case, I tried different values for alpha, and 3. looks like a good choice.

Truncated multivariate normal in SciPy?

I'm trying to automate a process that at some point needs to draw samples from a truncated multivariate normal. That is, it's a multivariate normal distribution (i.e. Gaussian), but the variables are constrained to a cuboid. My given inputs are the mean and covariance of the full multivariate normal, but I need samples in my box.
Up to now, I'd just been rejecting samples outside the box and resampling as necessary, but I'm starting to find that my process sometimes gives me (a) large covariances and (b) means that are close to the edges. These two events conspire against the speed of my system.
So what I'd like to do is sample the distribution correctly in the first place. Googling led only to this discussion or the truncnorm distribution in scipy.stats. The former is inconclusive and the latter seems to be for one variable. Is there any native multivariate truncated normal? And is it going to be any better than rejecting samples, or should I do something smarter?
I'm going to start working on my own solution, which would be to rotate the untruncated Gaussian to its principal axes (with an SVD decomposition or something), use a product of truncated Gaussians to sample the distribution, then rotate that sample back, and reject/resample as necessary. If the truncated sampling is more efficient, I think this should sample the desired distribution faster.
So, according to the Wikipedia article, sampling a multivariate truncated normal distribution (MTND) is more difficult. I ended up taking a relatively easy way out and using an MCMC sampler to relax an initial guess towards the MTND as follows.
I used emcee to do the MCMC work. I find this package phenomenally easy-to-use. It only requires a function that returns the log-probability of the desired distribution. So I defined this function
import numpy as np
from numpy.linalg import inv

def lnprob_trunc_norm(x, mean, bounds, C):
    if np.any(x < bounds[:,0]) or np.any(x > bounds[:,1]):
        return -np.inf
    else:
        return -0.5*(x-mean).dot(inv(C)).dot(x-mean)
Here, C is the covariance matrix of the multivariate normal. Then, you can run something like
S = emcee.EnsembleSampler(Nwalkers, Ndim, lnprob_trunc_norm, args = (mean, bounds, C))
pos, prob, state = S.run_mcmc(pos, Nsteps)
for given mean, bounds and C. You need an initial guess for the walkers' positions pos, which could be a ball around the mean,
pos = emcee.utils.sample_ball(mean, np.sqrt(np.diag(C)), size=Nwalkers)
or sampled from an untruncated multivariate normal,
pos = np.random.multivariate_normal(mean, C, size=Nwalkers)
and so on. I personally do several thousand steps of sample discarding first, because it's fast, then force the remaining outliers back within the bounds, then run the MCMC sampling.
The number of steps for convergence is up to you.
Note also that emcee easily supports basic parallelization by adding the argument threads=Nthreads to the EnsembleSampler initialization. So you can make this blazing fast.
I have reimplemented an algorithm which does not depend on MCMC but creates independent and identically distributed (iid) samples from the truncated multivariate normal distribution. Having iid samples can be very useful! I used to also use emcee as described in the answer by Warrick, but for convergence the number of samples needed exploded in higher dimensions, making it impractical for my use case.
The algorithm was introduced by Botev (2016) and uses an accept-reject algorithm based on minimax exponential tilting. It was originally implemented in MATLAB but reimplementing it for Python increased the performance significantly compared to running it using the MATLAB engine in Python. It also works well and is fast at higher dimensions.
The code is available at: https://github.com/brunzema/truncated-mvn-sampler.
An Example:
d = 10 # dimensions
# random mu and cov
mu = np.random.rand(d)
cov = 0.5 - np.random.rand(d ** 2).reshape((d, d))
cov = np.triu(cov)
cov += cov.T - np.diag(cov.diagonal())
cov = np.dot(cov, cov)
# constraints
lb = np.zeros_like(mu) - 1
ub = np.ones_like(mu) * np.inf
# create truncated normal and sample from it
n_samples = 100000
tmvn = TruncatedMVN(mu, cov, lb, ub)
samples = tmvn.sample(n_samples)
Plotting the first dimension results in:
Reference:
Botev, Z. I., (2016), The normal law under linear restrictions: simulation and estimation via minimax tilting, Journal of the Royal Statistical Society Series B, 79, issue 1, p. 125-148
Simulating truncated multivariate normal can be tricky and usually involves some conditional sampling by MCMC.
My short answer is: you can use my code (https://github.com/ralphma1203/trun_mvnt)!!! It implements a Gibbs sampler algorithm that can handle general linear constraints of the form lower ≤ Dx ≤ upper, even when you have a non-full-rank D and more constraints than the dimensionality.
import numpy as np
from trun_mvnt import rtmvn, rtmvt
########## Traditional problem, probably what you need... ##########
##### lower < X < upper #####
# So D = identity matrix
D = np.diag(np.ones(4))
lower = np.array([-1,-2,-3,-4])
upper = -lower
Mean = np.zeros(4)
Sigma = np.diag([1,2,3,4])
n = 10 # want 10 final samples
burn = 100 # burn-in first 100 iterates
thin = 1 # thinning for Gibbs
random_sample = rtmvn(n, Mean, Sigma, D, lower, upper, burn, thin)
# Numpy array n-by-p as result!
random_sample
########## Non-full rank problem (more constraints than dimension) ##########
Mean = np.array([0,0])
Sigma = np.array([1, 0.5, 0.5, 1]).reshape((2,2)) # bivariate normal
D = np.array([1,0,0,1,1,-1]).reshape((3,2)) # non-full rank problem
lower = np.array([-2,-1,-2])
upper = np.array([2,3,5])
n = 500 # want 500 final sample
burn = 100 # burn-in first 100 iterates
thin = 1 # thinning for Gibbs
random_sample = rtmvn(n, Mean, Sigma, D, lower, upper, burn, thin) # Numpy array n-by-p as result!
A little late I guess, but for the record: you could use Hamiltonian Monte Carlo. A Matlab module named HMC exact exists. It shouldn't be too difficult to translate into Python.

Equivalent python command for quantile in matlab

I'm trying to replicate some Matlab code in Python. I could not find an exact equivalent to the Matlab function quantile. The closest thing I found is Python's mquantiles.
Matlab example:
quantile( [ 8.60789925e-05, 1.98989354e-05 , 1.68308882e-04, 1.69379370e-04], 0.8)
...gives: 0.00016958
Same example in python:
scipy.stats.mstats.mquantiles( [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04], 0.8)
...gives 0.00016912
Does anyone know how to exactly replicate Matlab's quantile function?
The documentation for quantile (under the More About => Algorithms section) gives the exact algorithm used. Here's some python code that does it for a single quantile for a flat array, using bottleneck to do partial sorting:
import numpy as np
import bottleneck as bn

def quantile(a, prob):
    """
    Estimates the prob'th quantile of the values in a data array.

    Uses the algorithm of matlab's quantile(), namely:
        - Remove any nan values
        - Take the sorted data as the (.5/n), (1.5/n), ..., (1-.5/n) quantiles.
        - Use linear interpolation for values between (.5/n) and (1 - .5/n).
        - Use the minimum or maximum for quantiles outside that range.

    See also: scipy.stats.mstats.mquantiles
    """
    a = np.asanyarray(a)
    a = a[np.logical_not(np.isnan(a))].ravel()
    n = a.size

    if prob >= 1 - .5/n:
        return a.max()
    elif prob <= .5 / n:
        return a.min()

    # find the two bounds we're interpolating between:
    # that is, find i such that (i+.5) / n <= prob <= (i+1.5)/n
    t = n * prob - .5
    i = int(np.floor(t))

    # partial sort so that the ith element is at position i, with bigger ones
    # to the right and smaller to the left (partsort was renamed partition in
    # bottleneck >= 1.0)
    a = bn.partition(a, i)

    if i == t:  # did we luck out and get an integer index?
        return a[i]
    else:
        # we'll linearly interpolate between this and the next index
        smaller = a[i]
        larger = a[i+1:].min()
        if np.isinf(smaller):
            return smaller  # avoid inf - inf
        return smaller + (larger - smaller) * (t - i)
I only did the single-quantile, 1d case because that's all I needed. If you want several quantiles, it's probably worth just doing the full sort; to do it per-axis, if you knew you didn't have any nans, all you should need to do is add an axis argument to the sort and vectorize the linear interpolation bit. Doing it per-axis with nans would be a little trickier.
This code gives:
>>> quantile([ 8.60789925e-05, 1.98989354e-05 , 1.68308882e-04, 1.69379370e-04], 0.8)
0.00016905822360000001
and the matlab code gave 0.00016905822359999999; the difference is ~3e-20, which is less than machine precision at this magnitude.
Your input vector only has 4 values, which is far too few to get a good approximation of the quantiles of the underlying distribution. The discrepancy is probably the result of Matlab and SciPy using different heuristics to compute quantiles on undersampled distributions.
A bit late, but:
mquantiles is very flexible. You just need to provide alphap and betap parameters.
Here, since MATLAB does a linear interpolation, you need to set the parameters to (0.5,0.5).
In [9]: scipy.stats.mstats.mquantiles( [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04], 0.8, alphap=0.5, betap=0.5)
EDIT: MATLAB says that it does linear interpolation; however, it actually calculates the quantile through piecewise linear interpolation, which is equivalent to the type 5 quantile in R, and to (0.5, 0.5) in scipy.
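For what it's worth (my addition, assuming NumPy >= 1.22, where the Hyndman & Fan estimators were added), np.quantile now exposes this estimator directly as method='hazen':
import numpy as np
np.quantile([8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04],
            0.8, method='hazen')
# 0.00016905822360000001, matching MATLAB's quantile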
