Sample more than one element from a multivariate normal distribution - python

I have a 2D matrix of means with shape n*m, where n is the number of samples and m is the dimension of the data.
I also have n matrices of shape m*m, i.e. my covariance matrix sigma has shape n*m*m.
I wish to draw n samples from the distributions above, such that x_i ~ N(mean[i], sigma[i]).
Is there any way to do that in numpy or any other standard lib without running a for loop?
The only option I could think of is using np.random.multivariate_normal() by flattening the means matrix into one vector and flattening the 3D sigma into a 2D block-diagonal matrix, and of course reshaping afterwards. But that means sampling with a sigma of shape (n*m)*(n*m), which can easily be ridiculously huge, and just computing and allocating that matrix (if it is possible at all) can take longer than running the for loop.
In my specific task, Sigma is currently the same matrix for all the samples, meaning I can express Sigma as a single m*m matrix shared by all n points. But I am interested in a general solution.
Appreciate your help.

Difficult to tell without testable code, but this should be close:
A = np.linalg.cholesky(sigma)                 # => shape (n, m, m), same as sigma
Z = np.random.normal(size=(n, m))             # shape (n, m)
X = np.einsum('ijk, ik -> ij', A, Z) + mean   # shape (n, m)
What's going on:
We're manually sampling multivariate normal distributions according to the standard Cholesky decomposition method outlined here. A is built such that A @ A.T = sigma. Then X (the multivariate normal) can be formed as the dot product of A and a vector Z of univariate N(0, 1) samples, plus the mean.
You keep the extra sample dimension throughout the calculation in the first (index 0, 'i' in the einsum) axis, while contracting the last ('k') axis, which forms the dot product.
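As a quick self-contained check of the above (the sizes and the way mean and sigma are built here are just illustrative; each sigma[i] only needs to be positive definite):
import numpy as np

n, m = 1000, 3
rng = np.random.default_rng(0)
mean = rng.standard_normal((n, m))

# build n random positive-definite covariance matrices, shape (n, m, m)
B = rng.standard_normal((n, m, m))
sigma = B @ np.swapaxes(B, 1, 2) + m * np.eye(m)

A = np.linalg.cholesky(sigma)                 # batched Cholesky, shape (n, m, m)
Z = rng.standard_normal((n, m))               # standard normal draws, shape (n, m)
X = np.einsum('ijk, ik -> ij', A, Z) + mean   # one sample per distribution, shape (n, m)
print(X.shape)  # (1000, 3)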

Related

Pytorch: Efficiently compute unbiased estimator of mean to the power of four

Let w, x, y, z be torch tensors of shape (m, n). We wish to compute the following unbiased estimator row-wise efficiently (without for loops), i.e. for every row i = 1, ..., m:
1/(n(n-1)(n-2)(n-3)) * Σ_{j,k,l,p pairwise distinct} w_ij * x_ik * y_il * z_ip
In the case of only the unbiased estimator of the square of the mean, i.e. 1/(n(n-1)) * Σ_{j≠k} x_ij * y_ik,
this is possible, e.g., using torch.einsum:
batch_outer = torch.einsum('bi, bj -> bij', x, y)
zero_diag = 1-torch.eye(batch_outer.shape[1])
return (batch_outer * zero_diag).sum(dim=2).sum(dim=1) / (n * (n-1))
However, for the power-of-four case this is not so easily doable, mostly because these are not square tensors and, in particular, because zeroing out the diagonals becomes very tedious.
My questions:
1.) How can this be implemented efficiently, omitting any for loops?
2.) Which time and memory complexity would that solution have in big O notation?
3.) Can this solution also be used to do it with four 3D tensors of shape (m, k, n), where again we only want to do the computations along the axes of length n (dim=2)?
4.) If I want to do it in log-space for numerical stability, i.e., use logsumexp for summations and sums for multiplications (because log(xy) = log(x) + log(y)), any solution with einsum wouldn't work anymore. How could that computation then be done in log-space?
This implementation seems to work, if I didn't make a mess with the diagonal dimensions.
import math
import numpy as np
import torch as th

x = np.array([1, 4, 5, 3])
y = np.array([5, 2, 4, 5])[np.newaxis]
z = np.array([5, 7, 4, 5])[np.newaxis][np.newaxis]
w = np.array([3, 9, 5, 1])[np.newaxis][np.newaxis][np.newaxis]
xth = th.Tensor(x)
yth = th.Tensor(y)
zth = th.Tensor(z)
wth = th.Tensor(w)
# broadcast the four vectors along different axes to build a 4D outer product
tensor = xth * th.transpose(yth, 0, 1) * th.transpose(zth, 0, 2) * th.transpose(wth, 0, 3)
diag = th.diagonal(tensor, dim1=-2, dim2=-1)
result = th.sum(tensor) - th.sum(diag)
result /= math.factorial(len(x))
print(result)
The order is between O(n^2.37..) and O(n^3), depending on the PyTorch implementation of matrix multiplication.
I don't see why not; just choose the dimensions to transpose and take the diagonal over appropriately.
I don't see why this solution wouldn't work in log-space.
PS: my knowledge of PyTorch is quite limited, but I'm sure you can define x, y, z, w in a more elegant way.
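For reference, here is a brute-force but fully vectorized sketch of the row-wise power-of-four estimator from question 1 (my own illustration, not part of the answer above): build the 4-way outer product per row with einsum and mask out every index combination that repeats an index. It is O(m*n^4) in time and memory, so it is only practical for small n.
import torch

m, n = 3, 5
w, x, y, z = (torch.randn(m, n) for _ in range(4))

# outer[b, j, k, l, p] = w[b, j] * x[b, k] * y[b, l] * z[b, p]
outer = torch.einsum('bj,bk,bl,bp->bjklp', w, x, y, z)

# mask keeping only pairwise-distinct index combinations (j, k, l, p)
idx = torch.arange(n)
j, k, l, p = torch.meshgrid(idx, idx, idx, idx, indexing='ij')
distinct = ((j != k) & (j != l) & (j != p) &
            (k != l) & (k != p) & (l != p)).to(outer.dtype)

est = (outer * distinct).sum(dim=(1, 2, 3, 4)) / (n * (n - 1) * (n - 2) * (n - 3))
print(est.shape)  # torch.Size([3]) -- one estimate per row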

In which scenario would one use another matrix than the identity matrix for finding eigenvalues?

The scipy.linalg.eigh function can take two matrices as arguments: first the matrix a, of which we will find eigenvalues and eigenvectors, but also the matrix b, which is optional and defaults to the identity matrix when left out.
In what scenario would someone like to use this b matrix?
Some more context: I am trying to use xdawn covariances from the pyRiemann package. This uses the scipy.linalg.eigh function with a covariance matrix a and a baseline covariance matrix b. You can find the implementation here. This yields an error, as the b matrix in my case is not positive definite and thus not usable in the scipy.linalg.eigh function. Removing this matrix and just using the identity matrix, however, solves the problem and yields relatively nice results... The problem is that I do not really understand what I changed, and maybe I am doing something I should not be doing.
This is the code from the pyRiemann package I am using (modified to avoid using functions defined in other parts of the package):
# X are samples (EEG data), y are labels
# shape of X is (1000, 64, 2459)
# shape of y is (1000,)
from scipy.linalg import eigh

Ne, Ns, Nt = X.shape
tmp = X.transpose((1, 2, 0))
b = np.matrix(sklearn.covariance.empirical_covariance(tmp.reshape(Ne, Ns * Nt).T))
for c in self.classes_:
    # Prototyped response for each class
    P = np.mean(X[y == c, :, :], axis=0)
    # Covariance matrix of the prototyped response & signal
    a = np.matrix(sklearn.covariance.empirical_covariance(P.T))
    # Spatial filters
    evals, evecs = eigh(a, b)
    # and I am now using the following, disregarding the b matrix:
    # evals, evecs = eigh(a)
If A and B are both symmetric matrices, that doesn't necessarily imply that inv(B)*A is a symmetric matrix. And so, if I had to solve a generalised eigenvalue problem Ax = lambda Bx, I would use eig(A, B) rather than eig(inv(B)*A), so that the symmetry isn't lost.
One practical application is finding the natural frequencies of a dynamic mechanical system from differential equations of the form M (d²x/dt²) = Kx, where M is a positive definite matrix known as the mass matrix, K is the stiffness matrix, x is the displacement vector, and d²x/dt² is the acceleration vector (the second derivative of the displacement vector). To find the natural frequencies, x can be substituted with x0 sin(ωt), where ω is the natural frequency. The equation reduces to Kx = ω²Mx. Now, one could use eig(inv(M)*K), but that might break the symmetry of the resultant matrix, so I would use eig(K, M) instead.
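As a small numerical illustration (the names and sizes below are my own, not from the pyRiemann code): eigh(a, b) solves a @ v = lambda * b @ v directly and keeps the symmetric structure, whereas forming inv(b) @ a gives a generally non-symmetric matrix with the same eigenvalues.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
m = rng.standard_normal((4, 4))
a = m @ m.T                          # symmetric
c = rng.standard_normal((4, 4))
b = c @ c.T + 4 * np.eye(4)          # symmetric positive definite

evals, evecs = eigh(a, b)            # generalized problem: a @ v = lambda * b @ v

# inv(b) @ a is not symmetric, but it has the same eigenvalues
evals_check = np.sort(np.linalg.eigvals(np.linalg.inv(b) @ a).real)
print(np.allclose(evals, evals_check))  # True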
In the generalized problem (A - lambda B)x = 0, it means that x is not expressed in the same basis as the covariance matrix.
If the B matrix is not positive definite, it means that there are vectors that can be flipped by your B.
I hope it was helpful.

Multiplying subarrays of tensor

I am trying to implement a multivariate Gaussian Mixture Model and am trying to calculate the probability density function using tensors. There are n data points, k clusters, and d dimensions. So far, I have two tensors. One is an (n, k, d) tensor of centered data points and the other is a k x d x d tensor of covariance matrices. I can compute an n x k matrix of probabilities by doing
centered = np.repeat(points[:, np.newaxis, :], K, axis=1) - mu[np.newaxis, :]  # NxKxD
prob = np.zeros((n, k))
constant = 1 / 2 / np.power(np.pi, d / 2)
for n in range(centered.shape[0]):
    for k in range(centered.shape[1]):
        p = centered[n, k, :][np.newaxis]  # 1xD
        power = -1/2 * (p @ np.linalg.inv(sigma[k, :, :]) @ p.T)
        prob[n, k] = constant * np.linalg.det(sigma[k, :, :]) * np.exp(power)
where sigma is the triangularized k x d x d matrix of covariances and centered are my points. What is a more pythonic way of doing this using numpy's tensor capabilities?
Just a couple of quick observations:
I don't see you using p in the loop; is this a mistake? Using n instead?
The T in centered[n,k,:].T does nothing; with that index the array is 1d
I'm not sure if np.linalg.inv can handle batches of arrays, allowing np.linalg.inv(sigma).
@ allows batches, just so long as the last 2 dimensions are the ones entering into the dot (with the usual last-of-A, 2nd-to-last-of-B rule); einsum can also be used.
Again, does np.linalg.det handle batches?
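To answer the batching questions and sketch a vectorized version: both np.linalg.inv and np.linalg.det do operate on stacks of matrices, so the whole n x k table can be built with one einsum. The names and shapes below are illustrative, and the normalization follows the standard multivariate normal density rather than the constant used in the question:
import numpy as np

n, k, d = 100, 3, 2
rng = np.random.default_rng(0)
centered = rng.standard_normal((n, k, d))          # (n, k, d) centered points

B = rng.standard_normal((k, d, d))
sigma = B @ np.swapaxes(B, 1, 2) + d * np.eye(d)   # (k, d, d) SPD covariances

inv_sigma = np.linalg.inv(sigma)                   # batched inverse, (k, d, d)
det_sigma = np.linalg.det(sigma)                   # batched determinant, (k,)

# Mahalanobis term for every (point, cluster) pair -> shape (n, k)
maha = np.einsum('nkd,kde,nke->nk', centered, inv_sigma, centered)

# Gaussian density for every (point, cluster) pair -> shape (n, k)
prob = np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * det_sigma)
print(prob.shape)  # (100, 3)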

What does the MNIST tensorflow tutorial mean with matmul flipping trick?

The tutorial MNIST For ML Beginners, in the section Implementing the Regression, shows how to write the regression in a single line, followed by an explanation that mentions the use of a trick (emphasis mine):
y = tf.nn.softmax(tf.matmul(x, W) + b)
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.
What is the trick here, and why are we using it?
Well, there's no trick here. That line basically refers to the multiplication order in the previous equation.
# Here is the order of W and x; this equation is for a single example
y = Wx + b
# if you want to use a batch of examples you need to change the order of
# multiplication, instead of using another transpose op
y = xW + b
# hence
y = tf.matmul(x, W)
Ok, I think the main point is that if you train in batches (i.e. train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indicates the number of events per batch.
Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) with an NxM matrix (and, optionally, adding a bias of dimension N, also a column vector), i.e.
y = W*x + b, where y is also a column vector.
This is perfectly alright seen from the perspective of linear algebra. But now comes the point with the training in batches, i.e. training with several training instances at once.
To get to understand this, it might be helpful to not view x (and y) as vectors of dimension M (and N), but as matrices with the dimensions Mx1 (and Nx1 for y).
Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here since the zeroth dimension is occupied by the different elements of one single instance.
The trick is then to transpose the above equation (remember that transposition of a product also switches the order of the two transposed objects):
y^T = x^T * W^T + b^T
This is pretty much what has been described in short within the tutorial.
Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector). W^T is a matrix of dimension MxN. In the tutorial, they did not write x^T or y^T, but simply defined the placeholders according to this transposed equation. The only point that is not clear to me is why they did not define b the "transposed way". I assume that the + operator automatically broadcasts b if that is necessary in order to get the correct dimensions.
The rest is now pretty easy: if you have batches of more than 1 instance, you just "stack" multiple x (1xM) matrices, say into a matrix of dimension (AxM), where A is the batch size. b will hopefully be broadcast automatically to this number of events (that is, to a matrix of dimension (AxN)). If you then use
y^T = x^T * W^T + b^T,
you will get a (AxN) matrix of the targets for each element of the batch.
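A quick shape check of the transposed form, using numpy as a stand-in for the TensorFlow ops (the sizes are just illustrative):
import numpy as np

A, M, N = 32, 784, 10          # batch size, input dim, output dim (MNIST-like)
x = np.random.rand(A, M)       # a batch of row vectors, one example per row
W = np.random.rand(M, N)       # plays the role of W^T in the derivation above
b = np.random.rand(N)          # broadcast across the batch

y = x @ W + b                  # shape (A, N): one output row per example
print(y.shape)                 # (32, 10)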

Python: how to use Python to generate a random sparse symmetric matrix?

How can I use Python to generate a random sparse symmetric matrix?
In MATLAB, we have the function sprandsym(size, density).
But how do we do that in Python?
If you have scipy, you could use sparse.random. The sprandsym function below generates a sparse random matrix X, takes its upper triangular half, and adds its transpose to itself to form a symmetric matrix. Since this doubles the diagonal values, the diagonal is subtracted once.
The non-zero values are normally distributed with mean 0 and standard deviation 1. The Kolmogorov-Smirnov test is used to check that the non-zero values are consistent with a draw from a normal distribution, and a histogram and QQ-plot are generated to visualize the distribution.
import numpy as np
import scipy.stats as stats
import scipy.sparse as sparse
import matplotlib.pyplot as plt

np.random.seed((3, 14159))

def sprandsym(n, density):
    rvs = stats.norm().rvs
    X = sparse.random(n, n, density=density, data_rvs=rvs)
    upper_X = sparse.triu(X)
    result = upper_X + upper_X.T - sparse.diags(X.diagonal())
    return result

M = sprandsym(5000, 0.01)
print(repr(M))
# <5000x5000 sparse matrix of type '<class 'numpy.float64'>'
#   with 249909 stored elements in Compressed Sparse Row format>

# check that the matrix is symmetric. The difference should have no non-zero elements
assert (M - M.T).nnz == 0

statistic, pval = stats.kstest(M.data, 'norm')
# The null hypothesis is that M.data was drawn from a normal distribution.
# A small p-value (say, below 0.05) would indicate reason to reject the null hypothesis.
# Since `pval` below is > 0.05, kstest gives no reason to reject the hypothesis
# that M.data is normally distributed.
print(statistic, pval)
# 0.0015998040114 0.544538788914

fig, ax = plt.subplots(nrows=2)
ax[0].hist(M.data, density=True, bins=50)
stats.probplot(M.data, dist='norm', plot=ax[1])
plt.show()
PS. I used
upper_X = sparse.triu(X)
result = upper_X + upper_X.T - sparse.diags(X.diagonal())
instead of
result = (X + X.T)/2.0
because I could not convince myself that the non-zero elements in (X + X.T)/2.0 have the right distribution. First, if X were dense and normally distributed with mean 0 and variance 1, i.e. N(0, 1), then (X + X.T)/2.0 would be N(0, 1/2). Certainly we could fix this by using
result = (X + X.T)/sqrt(2.0)
instead. Then result would be N(0, 1). But there is yet another problem: If X is sparse, then at nonzero locations, X + X.T would often be a normally distributed random variable plus zero. Dividing by sqrt(2.0) will squash the normal distribution closer to 0 giving you a more tightly spiked distribution. As X becomes sparser, this may be less and less like a normal distribution.
Since I didn't know what distribution (X + X.T)/sqrt(2.0) generates, I opted for copying the upper triangular half of X (thus repeating what I know to be normally distributed non-zero values).
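A quick numeric check of that point (my own illustration): with a sparse X, most non-zero entries of (X + X.T)/sqrt(2) are a single N(0, 1) value divided by sqrt(2), so a KS test against the standard normal should reject them.
import numpy as np
import scipy.sparse as sparse
import scipy.stats as stats

rvs = stats.norm().rvs
X = sparse.random(5000, 5000, density=0.01, data_rvs=rvs)
Y = (X + X.T) / np.sqrt(2.0)

# expect a very small p-value here: the non-zero values are not N(0, 1)
print(stats.kstest(Y.tocsr().data, 'norm'))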
The matrix needs to be symmetric too, which seems to be glossed over by the two answers here:
import scipy.sparse

def sparseSym(rank, density=0.01, format='coo', dtype=None, random_state=None):
    density = density / (2.0 - 1.0 / rank)
    A = scipy.sparse.rand(rank, rank, density=density, format=format,
                          dtype=dtype, random_state=random_state)
    return (A + A.transpose()) / 2
This creates a sparse matrix and then adds its transpose to itself to make it symmetric.
It takes into account the fact that the density will increase as you add the two together, and the fact that there is no additional increase in density from the diagonal terms.
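A quick usage check of sparseSym (illustrative numbers):
S = sparseSym(1000, density=0.01)
assert (S - S.T).nnz == 0            # symmetric
print(S.nnz / (1000 * 1000))         # close to the requested 0.01 density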
unutbu's answer is the best one for performance and extensibility - numpy and scipy, together, have a lot of the functionality of MATLAB.
If you can't use them for whatever reason, or you're looking for a pure Python solution, you could try
from random import gauss, randint

N = 1000          # example matrix size
num_terms = 100   # example number of entries to fill

sparse = [[0 for i in range(N)] for j in range(N)]
# alternatively, if you have numpy but not scipy:
# sparse = numpy.zeros((N, N))
for _ in range(num_terms):
    (i, j) = (randint(0, N - 1), randint(0, N - 1))
    x = gauss(0, 1)
    sparse[i][j] = x
    sparse[j][i] = x
Although it might give you a little more control than unutbu's solution, you should expect it to be significantly slower; scipy is a dependency you probably don't want to avoid.
