Numpy - Matrix multiplication faster with many zero-entries? - python

Problem: In numpy, I have a matrix M1 that I am multiplying with another matrix M2.
I know that I can spare half of the values in M1 because the resulting matrix will be symmetric and I only need the top k values.
So I am thinking of using numpy.tril to zero out half the values, hoping that the underlying C functions will be faster for multiplications a*b where a == 0, since they could stop as soon as they see a == 0 instead of doing the whole floating-point operation.
I know I can time this but I think this is a question of general interest.
Note that M1 is not sparse, just that half of it needs not be considered.
Perhaps there is even a better way to save 50% computation?
Background: This has to do with
p(A|B)*p(B) == p(B|A)*p(A)
... if you see what I mean.
Example: This is only one point where it happens, but in the very end, we have
a |A| x |A| matrix p(A|B) (A and B are the same variables)
a 1 x |A| matrix p(B)
the result is a |A| x |A| matrix p(A,B) = p(A|B)*p(B), where we do not care about the diagonal (it is the probability of a value given itself) or about the part above or below the diagonal (it duplicates the other half). Nice for a sanity check, but unnecessary after all.
Note that here it is actually not a dot product. But I guess half the computations leading to p(A|B) are also unnecessary.
Update: I will pursue a more reasonable approach for this application, which is to limit A and B to be disjoint. Then all matrices are reduced in size. It can be done elegantly in numpy, but it adds some complexity to reading the code.
That did not make sense after all. The only option would be to create M1.shape[0]-1 submatrices that recreate the triangle, but that would certainly produce too much overhead.
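For reference, a minimal sketch of the tril idea described above, with illustrative shapes and names (not from the original code); since dense numpy kernels generally do not short-circuit on zero entries, timing both variants is the way to check whether the zeros actually help:
import numpy as np

rng = np.random.default_rng(0)
n = 1000                     # stands in for |A|
M1 = rng.random((n, n))      # plays the role of p(A|B)
M2 = rng.random((1, n))      # plays the role of p(B)

full = M1 * M2               # broadcasted product p(A,B) = p(A|B) * p(B)
half = np.tril(M1) * M2      # zero out one triangle of M1 first, as proposed

# Only the kept (lower) triangle should agree with the full result.
assert np.allclose(np.tril(full), np.tril(half))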

Related

Calculating roots of multiple polynomials in numpy without using a loop

I can use the polyfit() method with a 2D array as input, to calculate polynomials on multiple data sets in a fast manner. After getting these multiple polynomials, I want to calculate the roots of all of these polynomials, in a fast manner.
There is the numpy.roots() method for finding the roots of a single polynomial, but it does not work with 2D inputs (i.e., multiple polynomials). I am working with millions of polynomials, so I would like to avoid looping over all of them with a for loop, map, or comprehension, because that takes minutes. I would prefer a vectorized numpy operation or a series of vectorized operations.
An example of the inefficient calculation:
import numpy as np

POLYNOMIAL_COUNT = 1000000
# Create a polynomial of second order with coefficients 2, 3 and 4
coefficients = np.array([[2, 3, 4]])
# Let's say we have the same polynomial multiple times, represented as a 2D array.
# In reality the polynomial coefficients will be different from each other,
# but they will be of the same order.
coefficients = coefficients.repeat(POLYNOMIAL_COUNT, axis=0)
# Calculate roots of these same-order polynomials.
# Looping here takes too much time.
roots = []
for i in range(POLYNOMIAL_COUNT):
    roots.append(np.roots(coefficients[i]))
Is there a way to find the roots of multiple same-order polynomials using numpy, but without looping?
For the special case of polynomials up to the fourth order, you can solve in a vectorized manner. Anything higher than that does not have an analytical solution, so it requires iterative optimization, which is fundamentally unlikely to be vectorizable since different rows may require a different number of iterations. As @John Coleman suggests, you might be able to get away with using the same number of steps for each one, but you will likely have to sacrifice accuracy to do so.
That being said, here is an example of how to vectorize the second order case:
# Vectorized quadratic formula: d is the discriminant b**2 - 4*a*c for each row
d = coefficients[:, 1:-1]**2 - 4.0 * coefficients[:, ::2].prod(axis=1, keepdims=True)
# Both roots at once: (-b -/+ sqrt(d)) / (2*a), one row of two roots per polynomial
roots = -0.5 * (coefficients[:, 1:-1] + [1, -1] * np.emath.sqrt(d)) / coefficients[:, :1]
If I got the order of the coefficients wrong, replace coefficients[:, :1] with coefficients[:, -1:] in the denominator of the last assignment. Using np.emath.sqrt is nice because it will return a complex128 result automatically when your discriminant d is negative anywhere, and a normal float64 result when all roots are real.
You can implement a third order solution or a fourth order solution in a similar manner.
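A quick sanity check of the vectorized quadratic formula (assuming the [a, b, c] coefficient order used in the question's example):
import numpy as np

coefficients = np.array([[2.0, 3.0, 4.0],    # 2x^2 + 3x + 4: complex roots
                         [1.0, -3.0, 2.0]])  # x^2 - 3x + 2: roots 1 and 2
d = coefficients[:, 1:-1]**2 - 4.0 * coefficients[:, ::2].prod(axis=1, keepdims=True)
roots = -0.5 * (coefficients[:, 1:-1] + [1, -1] * np.emath.sqrt(d)) / coefficients[:, :1]

print(roots[1])                    # [1.+0.j 2.+0.j] (complex dtype because row 0 has d < 0)
print(np.roots(coefficients[1]))   # [2. 1.] from the looped reference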

best way to store numbers in a multidimensional (sparse) array in python

What is the best container object for a calculation in N dimensions, when the problem is symmetric so that only some numbers need to be calculated?
Concretely, for N=4 I have:
import numpy as np

M = 50
results = np.zeros((M, M, M, M))
for ii in range(M):
    for jj in range(ii, M):
        for kk in range(jj, M):
            for ll in range(kk, M):
                res = 1  # really some calculation
                results[ii, jj, kk, ll] = res
Many elements in this array are completely redundant and aren't even accessed. This is even more true for higher N (I'd like to go up to N=10 or ideally N=15).
Is it better to use lists and append in each step for such a problem, or a dictionary, or sparse matrices? I tried a sparse matrix, but it keeps warning me that I shouldn't frequently change elements in a sparse matrix, so presumably this is not a good idea.
The only functionality that I'd need to retain is finding maxima (ideally along each dimension).
Any insights would be appreciated!
The "density" of the matrix will by 1 / D**2, where D is the number of dimensions - so you can see that the payoff in space is exponential, while the performance penalty comparing to lists or dense matrices is constant.
So, when the number of dimensions is high, sparse matrices will provide HUGE advantage in space used, and they're still faster than just lists. If the number of dimensions is small, dense matrices will be slightly bigger but also only slightly faster (slightly here: few times faster, but since the total execution time is small, the absolute difference is still small).
Overall, unless the number of dimensions is fixed, it makes more sense to stick with sparse matrices. However, if D is fixed, it's better to just benchmark for this specific case.
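As a concrete illustration of the dictionary option mentioned in the question (a sketch only; all names here are illustrative):
import itertools
import numpy as np

M, N = 50, 4
results = {}
# Visit exactly the index tuples ii <= jj <= kk <= ll from the nested loops above.
for idx in itertools.combinations_with_replacement(range(M), N):
    results[idx] = 1.0  # really some calculation

# Overall maximum, and the maximum along the first index as an example.
max_val = max(results.values())
max_along_first = {}
for idx, val in results.items():
    i = idx[0]
    max_along_first[i] = max(val, max_along_first.get(i, -np.inf))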

numpy.sum transition to kahan but with masked arrays for increased precision

I have a multi-array stack of data that is masked to exclude 'bad' or problematic values - the stacking is along the 3rd dimension. The current code uses np.sum, but the level of precision (for both large and small numbers) has negatively impacted the results. I've attempted to implement the kahan_sum referenced here, but it doesn't account for masked arrays, and the results are not similar (because of the masking). My hope is that the added precision retained by a Kahan summation with an accumulator will allow downstream operations to carry less error.
Source/research:
https://github.com/numpy/numpy/issues/8786
Kahan summation
Python floating point precision sum (I've jacked up the precision as far as possible but it doesn't help)
import numpy as np
import numpy.ma as ma

def kahan_sum(a, axis=0):
    s = np.zeros(a.shape[:axis] + a.shape[axis+1:])
    c = np.zeros(s.shape)
    for i in range(a.shape[axis]):
        # http://stackoverflow.com/a/42817610/353337
        y = a[(slice(None),) * axis + (i,)] - c
        t = s + y
        c = (t - s) - y
        s = t.copy()
    return s

data = np.random.rand(5, 5, 5)
dd = np.ma.masked_array(data=data, mask=np.random.rand(5, 5, 5) < 0.2)
I want to sum along the 3rd (axis=2) as that's essentially my 'stack' of photos.
The masks are not coming out as I expected. It's possible I'm just overtired...
np.sum(dd, axis=2)
kahan_sum(dd, axis=2)
np.sum provides a fully populated array of data and excludes the 'masked' values.
kahan_sum essentially ORs together all of the masks, and I've been unable to come up with a pattern for it.
Printing the mask makes it pretty evident that that's where the problem is; I'm just not figuring out how to fix it or why it's operating the way it is.
Thank you.
If you really need more precision, consider using math.fsum, which is accurate to full floating-point resolution. If A is your 3D masked array, something like:
import math
import numpy as np

i, j, k = A.shape
np.frompyfunc(lambda i, j: math.fsum(A[i, j].compressed().tolist()), 2, 1)(*np.ogrid[:i, :j])
But before that I'd triple-check that np.sum really isn't good enough. As far as I know it uses pairwise summation along contiguous axes, which in practice tends to be pretty good.
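If the goal is to keep the compensated summation, one possible fix (a sketch only, not tested against the original data; the function name is mine) is to replace masked entries with exact zeros before summing, so they contribute nothing:
import numpy as np

def kahan_sum_masked(a, axis=0):
    filled = np.ma.filled(a, 0.0)  # masked entries become exact zeros
    s = np.zeros(filled.shape[:axis] + filled.shape[axis+1:])
    c = np.zeros(s.shape)
    for i in range(filled.shape[axis]):
        y = filled[(slice(None),) * axis + (i,)] - c
        t = s + y
        c = (t - s) - y
        s = t
    return s

# kahan_sum_masked(dd, axis=2) should then agree with np.sum(dd, axis=2).filled(0.0)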

numpy vector multiplication speed?

I'm new to numpy, and found some behavior that seems strange to me.
I'm implementing a logistic regression cost function; here I have 2 column vectors with the same dimensions and the same dtype (float). y contains a bunch of zeros and ones, and a contains floats in the range (-1, 1).
At some point I need the dot product, so I transpose one and multiply them:
x = y.T @ a
But when I use
x = y @ a.T
performance occasionally decreases by about 3 times, while the results are the same.
Why is this so? Aren't the operations the same?
Thanks.
The performance decreases, and you get a very different answer!
For vector multiplication (unlike scalar multiplication), a @ b != b @ a. In your case (assuming column vectors of shape (n, 1)), a.T @ b is a single number, but a @ b.T is a full-blown matrix! So if your vectors both have shape (n, 1), the second operation results in an (n, n) matrix, which may be pretty huge. Of course, it'll take way more time to compute such a matrix (i.e., add a whole lot of numbers and produce a whole lot of numbers) than to add a bunch of numbers and produce one single number.
That's how matrix (or vector) multiplication works.
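A small illustration of the shapes involved (assuming column vectors of shape (n, 1), as above):
import numpy as np

n = 10000
y = np.ones((n, 1))
a = np.ones((n, 1))

print((y.T @ a).shape)  # (1, 1): one inner product, about n multiply-adds
print((y @ a.T).shape)  # (n, n): an outer product, n*n entries to fill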

pseudo inverse of sparse matrix in python

I am working with data from neuroimaging and because of the large amount of data, I would like to use sparse matrices for my code (scipy.sparse.lil_matrix or csr_matrix).
In particular, I will need to compute the pseudo-inverse of my matrix to solve a least-square problem.
I have found the method sparse.lsqr, but it is not very efficient. Is there a method to compute the Moore-Penrose pseudo-inverse (the equivalent of pinv for dense matrices)?
The size of my matrix A is about 600,000 x 2,000, and every row of the matrix has from 0 up to 4 non-zero values. The matrix size is given by voxel x fiber bundle (white matter fiber tracts), and we expect at most 4 tracts to cross in a voxel. In most of the white-matter voxels we expect to have at least 1 tract, but I would say that around 20% of the rows could be all zeros.
The vector b is not sparse; b contains the measurement for each voxel, which is in general non-zero.
I would need to minimize the error, but there are also some conditions on the vector x. As I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0 …).
Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?
Thanks
Update 1st June:
thanks again for the help.
I can't really show you anything about my data, because the code in Python gives me some problems. However, in order to understand how I could choose a good k, I've tried to create a test case in Matlab.
The code is as follows:
F = zeros(100000, 1000);
for k = 1:150000
    p = rand(1);
    a = 0;
    b = 0;
    while a <= 0 || b <= 0
        a = random('Binomial', 100000, p);
        b = random('Binomial', 1000, p);
    end
    F(a, b) = rand(1);
end
solution = repmat([0.5, 0.5, 0.8, 0.7, 0.9, 0.4, 0.7, 0.7, 0.9, 0.6], 1, 100);
size(solution)
solution = solution';
measure = F * solution;
% check = pinvF * measure;
k = 250;
F = sparse(F);
[U, S, V] = svds(F, k);
s = svds(F, k);
plot(s)
max(max(U*S*V' - F))
for s = 1:k
    if S(s, s) ~= 0
        S(s, s) = 1/S(s, s);
    end
end
inv = V*S'*U';
inv*measure
max(inv*measure - solution)
Do you have any idea of what k should be compared to the size of F? I've taken 250 (out of 1000) and the results are not satisfactory (the waiting time is acceptable, but not short).
Also, now I can compare the results with the known solution, but how could one choose k in general?
I also attached a plot of the 250 singular values that I get and their normalized squares. I don't know exactly how to do a better scree plot in Matlab. I'm now proceeding with bigger k to see if the values suddenly become much smaller.
Thanks again,
Jennifer
You could study the alternatives offered in scipy.sparse.linalg in more detail.
Anyway, please note that the pseudo-inverse of a sparse matrix is most likely to be a (very) dense one, so it's not really a fruitful avenue (in general) to follow when solving sparse linear systems.
You may like to describe your particular problem in a slightly more detailed manner (dot(A, x) = b + e). At least specify:
'typical' size of A
'typical' percentage of nonzero entries in A
least squares implies that norm(e) is minimized, but please indicate whether your main interest is in x_hat or in b_hat, where e = b - b_hat and b_hat = dot(A, x_hat)
Update: If you have some idea of the rank of A (and it's much smaller than the number of columns), you could try the total least squares method. Here is a simple implementation, where k is the number of leading singular values and vectors to use (i.e., the 'effective' rank).
from scipy.sparse import hstack
from scipy.sparse.linalg import svds

def tls(A, b, k=6):
    """A TLS solution of Ax = b, for sparse A."""
    u, s, v = svds(hstack([A, b]), k)
    return v[-1, :-1] / -v[-1, -1]
Regardless of the answer to my comment, I would think you could accomplish this fairly easily using the Moore-Penrose SVD representation. Find the SVD with scipy.sparse.linalg.svds, replace Sigma by its pseudoinverse, and then multiply V*Sigma_pi*U' to find the pseudoinverse of your original matrix.
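A rough sketch of that SVD route in scipy (the cutoff k and the tolerance are assumptions; as noted above, the result is dense, so this only pays off when a truncated SVD with small k is acceptable):
import numpy as np
from scipy.sparse.linalg import svds

def approx_pinv(A, k=6, tol=1e-12):
    # Truncated SVD of the sparse matrix: A ~ u @ diag(s) @ vt
    u, s, vt = svds(A, k=k)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)  # invert only the non-negligible singular values
    return vt.T @ np.diag(s_inv) @ u.T       # approximate Moore-Penrose pseudo-inverse (dense)

# x_hat = approx_pinv(A, k=250) @ b  # analogous to the Matlab experiment above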
