I know that numpy.cov calculates the covariance matrix given an N-dimensional array.
I can see from the documentation on GitHub that the normalisation is done by 1/(N-1). But for my specific case, the covariance matrix is given by:

C_ij = (N-1)/N * Σ_k (x_i^k - <x_i>) * (x_j^k - <x_j>)

where x_i is the quantity, i and j are the bins, and the sum runs over the N samples k. As you can see from the above equation, this covariance matrix is normalised by (N-1)/N rather than 1/(N-1).
To get the above normalisation: can I simply multiply the covariance matrix obtained from numpy.cov by (N-1)**2 / N? Is that correct? Or should I use the bias parameter of numpy.cov, and if so, how?
There are two ways of doing this. Either call np.cov with bias=1 and multiply the result by N-1, or multiply the covariance matrix obtained with the default settings by (N-1)**2 / N. Both give the (N-1)/N normalisation above.
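As a quick sanity check that the two routes agree (a minimal sketch; the array x with 5 bins and N = 100 samples is made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 100))   # 5 bins (rows), N = 100 samples (columns)
N = x.shape[1]

# Way 1: bias=1 normalises by N, so scale by N-1 to get (N-1)/N
c1 = np.cov(x, bias=1) * (N - 1)

# Way 2: the default normalises by N-1, so scale by (N-1)**2 / N
c2 = np.cov(x) * (N - 1)**2 / N

print(np.allclose(c1, c2))  # True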
I'm trying to fit a Gaussian to a set of data: a 2D matrix of values (a probability distribution) that looks roughly Gaussian when plotted as a 3D surface.
As far as I understand from this other question (https://mathematica.stackexchange.com/questions/27642/fitting-a-two-dimensional-gaussian-to-a-set-of-2d-pixels), I need to compute the mean and the covariance matrix of my data, and the Gaussian I need will be exactly the one defined by that mean and covariance matrix. However, I cannot properly follow the code in that question (it is written in Mathematica), and I am pretty stuck with the statistics.
How would I compute the mean and the covariance matrix of this Gaussian in Python (NumPy, PyTorch...)? I'm trying to avoid optimization frameworks (LSQ, KDE), as I think the solution is much simpler and computational cost is something I have to take into account...
Thanks!
Let's call your data matrix D with shape d x n where d is the data dimension and n is the number of samples. I will assume that in your example, d=5 and n=6, although you will need to determine for yourself which is the data dimension and which is the sample dimension. In that case, we can find the mean and covariance using the following code:
import numpy as np

n = 6   # number of samples
d = 5   # data dimension
D = np.random.random([d, n])

mean = D.mean(axis=1)    # average over the sample axis; shape (d,)
covariance = np.cov(D)   # np.cov treats rows as variables; shape (d, d)
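If you then need the fitted Gaussian itself, it can be built directly from these two quantities; a sketch continuing from the code above, using scipy.stats (the evaluation point is just an example):

from scipy.stats import multivariate_normal

# allow_singular guards against a rank-deficient sample covariance
gaussian = multivariate_normal(mean=mean, cov=covariance, allow_singular=True)
value = gaussian.pdf(np.zeros(d))  # evaluate the density at an example point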
I have a basis set of square matrices and a data set, and I need to find the coefficients for which my data is a linear sum of the basis set.

def basis(a, b, c):
    return a*gam1 + b*gam2 + c*kapp + allsky

So data = basis(a, b, c) and I need the best-fit (least-squares) values of the coefficients a, b and c. The basis matrices and the data matrix are all square 89x89 matrices. I have tried np.linalg.lstsq, but since my A matrix would need to stack the basis matrices, it is no longer 2-dimensional and an error is thrown stating the array dimension must be 2. Any help is appreciated.
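One standard way around the dimension error is to flatten each matrix into a vector so that np.linalg.lstsq sees an ordinary 2-D problem. A minimal sketch, assuming gam1, gam2, kapp, allsky and data are the question's 89x89 arrays (random placeholders here):

import numpy as np

# Placeholder 89x89 arrays standing in for the question's matrices
gam1, gam2, kapp, allsky, data = (np.random.random((89, 89)) for _ in range(5))

# Flatten each basis matrix into a column: A has shape (89*89, 3)
A = np.column_stack([gam1.ravel(), gam2.ravel(), kapp.ravel()])

# Move the constant allsky term to the data side
rhs = (data - allsky).ravel()

coeffs, residuals, rank, sv = np.linalg.lstsq(A, rhs, rcond=None)
a, b, c = coeffs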
Assume 3 matrices Mean, Variance and Sample, all with the same shape. Is there a one-line solution to generate the Sample matrix in numpy such that:
Sample[i,j] is drawn from NormalDistribution(Mean[i,j], Variance[i,j])
Using linearity of the mean and Var(aX + b) = a**2 Var(X): generate a standard-normal (centered and reduced) 2D array with np.random.randn(), multiply it pointwise by the standard deviation (np.sqrt(Variance)), and add (still pointwise) the Mean.
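In one line, assuming Mean and Variance are same-shape NumPy arrays (the example values are made up):

import numpy as np

Mean = np.array([[0.0, 1.0], [2.0, 3.0]])
Variance = np.array([[1.0, 0.5], [0.25, 4.0]])

# standard-normal draws, scaled by the pointwise std and shifted by the mean
Sample = Mean + np.sqrt(Variance) * np.random.randn(*Mean.shape)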
I have a large sparse matrix and I want to find its eigenvectors with a specific eigenvalue. The documentation for scipy.sparse.linalg.eigs says of the required argument k:
"k is the number of eigenvalues and eigenvectors desired. k must be smaller than N-1. It is not possible to compute all eigenvectors of a matrix".
The problem is that I don't know how many eigenvectors corresponding to the eigenvalue I want. What should I do in this case?
I'd suggest using Singular Value Decomposition (SVD) instead. SciPy provides a version that can handle sparse matrices (from scipy.sparse.linalg import svds). You can find the eigenvalues (here, singular values) and eigenvectors as follows:
U, Sigma, VT = svds(X, k=n_components, tol=tol)
where X can be a sparse CSR matrix, and U and VT hold the left and right singular vectors corresponding to the singular values in Sigma. Here you can control the number of components: I'd start with a small n_components and increase it. If you rank the values in Sigma and look at their distribution, there will typically be a few large singular values after which they drop off quickly; you can then set a threshold on the singular values to decide how many singular vectors to keep.
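A runnable sketch of this (the matrix X and the choice of k = 10 are placeholders):

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A random sparse CSR matrix standing in for the real data
X = sparse_random(1000, 800, density=0.01, format='csr', random_state=0)

U, Sigma, VT = svds(X, k=10)

# svds returns singular values in ascending order; sort descending to rank them
order = np.argsort(Sigma)[::-1]
Sigma, U, VT = Sigma[order], U[:, order], VT[order]
print(Sigma)  # inspect the drop-off to choose how many components to keep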
If you want to use scikit-learn, the class sklearn.decomposition.TruncatedSVD lets you do what I explained.
By definition, a square matrix that has a zero determinant should not be invertible. However, for some reason, after generating a covariance matrix, I take the inverse of it successfully, but taking the determinant of the covariance matrix ends up with an output of 0.0.
What could be potentially going wrong? Should I not trust the determinant output, or should I not trust the inverse covariance matrix? Or both?
Snippet of my code:
cov_matrix = np.cov(data)
adjusted_cov = cov_matrix + weight*np.identity(cov_matrix.shape[0]) # add small weight to ensure cov_matrix is non-singular
inv_cov = np.linalg.inv(adjusted_cov) # runs with no error, outputs a matrix
det = np.linalg.det(adjusted_cov) # ends up being 0.0
The numerical inversion of matrices does not involve computing the determinant. (Cramer's formula for the inverse is not practical for large matrices.) So the fact that the determinant evaluates to 0 (because it is too small to be represented in double-precision floats) is not an obstacle for the matrix inversion routine.
Following up on the comments by BobChao87, here is a simplified test case (Python 3.4 console, numpy imported as np)
A = 0.2*np.identity(500)
np.linalg.inv(A)
Output: a matrix with 5 on the main diagonal, which is the correct inverse of A.
np.linalg.det(A)
Output: 0.0, because the determinant (0.2^500) is too small to be represented in double precision.
A possible solution is a kind of pre-conditioning (here, just rescaling): before computing the determinant, multiply the matrix by a factor that will make its entries closer to 1 on average. In my example, np.linalg.det(5*A) returns 1.
Of course, using the factor of 5 here is cheating, but np.linalg.det(3*A) also returns a nonzero value (about 1.19e-111). If you try np.linalg.det(2**k*A) for k running through modest positive integers, you will likely hit one that will return nonzero. Then you will know that the determinant of the original matrix was approximately 2**(-k*n) times the output, where n is the matrix size.
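A short sketch of this rescaling trick on the test case above:

import numpy as np

A = 0.2 * np.identity(500)
n = A.shape[0]
print(np.linalg.det(A))  # 0.0: 0.2**500 underflows double precision

# Scale by powers of 2 until the determinant stops underflowing
for k in range(1, 20):
    d = np.linalg.det(2**k * A)
    if d != 0.0:
        print(f"det(A) is approximately 2**(-{k}*{n}) times {d}")
        break

Note that NumPy also provides np.linalg.slogdet, which returns the sign and the logarithm of the determinant and sidesteps the underflow altogether.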