As part of a complex task, I need to compute matrix cofactors. I did this in a straightforward way using this nice code for computing matrix minors. Here is my code:
import numpy as np

def matrix_cofactor(matrix):
    C = np.zeros(matrix.shape)
    nrows, ncols = C.shape
    for row in range(nrows):
        for col in range(ncols):
            # rows/columns of the minor: everything except `row` and `col`
            minor = matrix[np.array(list(range(row)) + list(range(row + 1, nrows)))[:, np.newaxis],
                           np.array(list(range(col)) + list(range(col + 1, ncols)))]
            C[row, col] = (-1) ** (row + col) * np.linalg.det(minor)
    return C
It turns out that this matrix cofactor code is the bottleneck, and I would like to optimize the code snippet above. Any ideas as to how to do this?
If your matrix is invertible, the cofactor matrix is related to the inverse:
def matrix_cofactor(matrix):
    return np.linalg.inv(matrix).T * np.linalg.det(matrix)
This gives large speedups (~ 1000x for 50x50 matrices). The main reason is fundamental: this is an O(n^3) algorithm, whereas the minor-det-based one is O(n^5).
This probably means that there is also some clever way to calculate the cofactor of non-invertible matrices (i.e., not using the mathematical formula you use above, but some other equivalent definition).
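A rough way to check the speedup claim on your own machine (a sketch of mine; it assumes the minor-based function from the question is saved as matrix_cofactor_minor and the inverse-based one above as matrix_cofactor_inv):

import timeit

import numpy as np

A = np.random.randn(50, 50)

t_minor = timeit.timeit(lambda: matrix_cofactor_minor(A), number=3)
t_inv = timeit.timeit(lambda: matrix_cofactor_inv(A), number=100)
print("minor/det: %.4f s per call" % (t_minor / 3))
print("inv/det:   %.6f s per call" % (t_inv / 100))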
If you stick with the det-based approach, what you can do is the following:
The majority of the time seems to be spent inside det. (Check out line_profiler to find this out yourself.) You can try to speed that part up by linking Numpy with the Intel MKL, but other than that, there is not much that can be done.
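For reference, one way to get the per-line timings with line_profiler is to decorate the function and run the script through kernprof (the file name profile_cofactor.py is just a placeholder; the body is the question's function):

# profile_cofactor.py
import numpy as np

@profile  # injected by kernprof at run time; no import is needed for it
def matrix_cofactor(matrix):
    C = np.zeros(matrix.shape)
    nrows, ncols = C.shape
    for row in range(nrows):
        for col in range(ncols):
            minor = matrix[np.array(list(range(row)) + list(range(row + 1, nrows)))[:, np.newaxis],
                           np.array(list(range(col)) + list(range(col + 1, ncols)))]
            C[row, col] = (-1) ** (row + col) * np.linalg.det(minor)
    return C

matrix_cofactor(np.random.randn(50, 50))

Running kernprof -l -v profile_cofactor.py then prints how much time each line takes, which should confirm that the np.linalg.det call dominates.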
You can speed up the other part of the code like this:
minor = np.zeros([nrows - 1, ncols - 1])
for row in range(nrows):
    for col in range(ncols):
        # copy everything except row `row` and column `col` into the
        # preallocated minor using plain slices
        minor[:row, :col] = matrix[:row, :col]
        minor[row:, :col] = matrix[row+1:, :col]
        minor[:row, col:] = matrix[:row, col+1:]
        minor[row:, col:] = matrix[row+1:, col+1:]
        ...
This gains some 10-50% of total runtime depending on the size of your matrices. The original code has Python range and list manipulations, which are slower than direct slice indexing. You could also try to be more clever and copy only the parts of the minor that actually change; however, already after the above change, close to 100% of the time is spent inside numpy.linalg.det, so further optimization of the other parts does not make much sense.
The calculation of np.array(list(range(row)) + list(range(row+1, nrows)))[:, np.newaxis] does not depend on col, so you could move that outside the inner loop and cache the value. Depending on the number of columns you have, this might give a small speedup.
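A minimal sketch of that change (the function name matrix_cofactor_cached is mine):

import numpy as np

def matrix_cofactor_cached(matrix):
    nrows, ncols = matrix.shape
    C = np.zeros(matrix.shape)
    for row in range(nrows):
        # the row selector does not depend on col, so build it once per row
        row_idx = np.array(list(range(row)) + list(range(row + 1, nrows)))[:, np.newaxis]
        for col in range(ncols):
            col_idx = np.array(list(range(col)) + list(range(col + 1, ncols)))
            C[row, col] = (-1) ** (row + col) * np.linalg.det(matrix[row_idx, col_idx])
    return C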
Instead of using the inverse and determinant, I'd suggest using the SVD:
import numpy as np

def cofactors(A):
    U, sigma, Vt = np.linalg.svd(A)
    N = len(sigma)
    g = np.tile(sigma, N)
    g[::(N + 1)] = 1  # in row i of the reshaped array, replace sigma_i by 1
    # product of all singular values except the i-th, with the overall sign of
    # det(A) carried by det(U) * det(Vt)
    G = np.diag(np.linalg.det(U) * np.linalg.det(Vt) *
                np.prod(np.reshape(g, (N, N)), 1))
    return U @ G @ Vt
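A quick sanity check of the SVD route against the inverse-based formula, on a random (almost surely invertible) matrix; this test snippet is my own:

import numpy as np

A = np.random.randn(6, 6)
C_svd = cofactors(A)
C_inv = np.linalg.inv(A).T * np.linalg.det(A)
print(np.allclose(C_svd, C_inv))  # expected: True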
You can also get the cofactor matrix symbolically with SymPy; the adjugate is the transpose of the cofactor matrix, so transposing it back gives the cofactors:
from sympy import Matrix

A = Matrix([[1, 2, 0], [0, 3, 0], [0, 7, 1]])
A.adjugate().T
And the output (which is the cofactor matrix) is:
Matrix([
[ 3, 0, 0],
[-2, 1, -7],
[ 0, 0, 3]])
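If the result is needed as a plain NumPy array for further numeric work, it can be converted; this extra step is my addition:

import numpy as np

C = np.array(A.adjugate().T, dtype=float)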
I have 2 arrays of sets of signals, both 16x90000 arrays. In other words, 2 arrays with 16 signals in each. I want to perform matched filtering on the signals, row by row, correlating row 1 of array 1 with row 1 of array 2, and so forth. I've tried using scipy.signal.convolve2d, but it is extremely slow, taking tens of seconds to convolve even a 2x90000 array. I'm not sure if I am simply implementing it wrong, or if there is a more efficient way of achieving what I want. I know the arrays are long, but I feel it should still be achievable. I have a feeling convolve2d is actually doing a full 2-D convolution (rows against columns as well), which would be far more work than I want, but I may be misunderstanding.
My implementation:
A.shape = (16,90000) # an array of 16 signals each 90000 samples long
B.shape = (16,90000) # another array of 16 signals each 90000 samples long
corr = sig.convolve2d(A,B,mode='same')
I haven't had much coffee yet so there's every chance I'm being stupid right now.
Please no for loops.
Since you need to correlate the signals row by row, the most basic solution would be:
import numpy as np
from scipy.signal import correlate
# sample inputs: A and B both have n signals of length m
n, m = 2, 5
A = np.random.randn(n, m)
B = np.random.randn(n, m)
C = np.vstack([correlate(a, b, mode="same") for a, b in zip(A, B)])
# [[-0.98455996  0.86994062 -1.1446486  -2.1751074  -0.59270322]
#  [ 1.7945015   1.51317292  1.74286042 -0.57750712 -1.9178488 ]]
One way to avoid a looped solution is to piggyback on a deep learning library like PyTorch. Torch's Conv1d (though named "conv", it effectively performs cross-correlation) can handle this scenario.
import torch
import torch.nn.functional as F

# Convert A and B to torch tensors
P = torch.from_numpy(A).unsqueeze(0)  # (1, n, m): batch of 1 with n channels
Q = torch.from_numpy(B).unsqueeze(1)  # (n, 1, m): n groups of 1 channel each

# Use conv1d with groups=n so each row of A is matched against its own row of B
def torch_correlate(x, w, n):
    with torch.no_grad():
        return F.conv1d(x, w, bias=None, stride=1, groups=n, padding="same").squeeze(0).numpy()

R = torch_correlate(P, Q, n)
# [[-0.98455996 0.86994062 -1.1446486 -2.1751074 -0.59270322]
# [ 1.7945015 1.51317292 1.74286042 -0.57750712 -1.9178488 ]]
However, I believe there shouldn't be any significant difference in performance, since grouped convolution may well use some form of iteration internally too. (Plus there is the overhead of converting from NumPy to Torch and back to consider.)
I would suggest using the first method in general, unless you are working on really large signals, in which case you could theoretically use the PyTorch version to run it really fast on a GPU, which you won't be able to do with the regular SciPy one.
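Another loop-free option I would consider (my suggestion, not part of the answers above) is scipy.signal.fftconvolve with its axes argument, using the fact that correlating with b is the same as convolving with the time-reversed b:

import numpy as np
from scipy.signal import fftconvolve

n, m = 16, 90000
A = np.random.randn(n, m)
B = np.random.randn(n, m)

# row-wise cross-correlation: convolve each row of A with the reversed
# corresponding row of B, batched over axis 0
corr = fftconvolve(A, B[:, ::-1], mode="same", axes=1)

For real-valued signals this should match the row-by-row scipy.signal.correlate(..., mode="same") result, and the FFT-based computation scales much better for 90000-sample signals than a direct convolution.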
I have a matrix which is supposed to be symmetric (it's the inverse of a symmetric matrix), but it is not exactly symmetric due to numerical errors in the inversion, etc.
So I add a step that symmetrizes the matrix (a = .5*(a + a')), and I see a numerical disaster if I do it in-place (out-of-place is OK). Code:
import numpy as np
def check_sym(x):
    print("||a-a'||^2 = %e" % np.sum((x - x.T)**2))
# make a symmetric matrix
dim = 100
a = np.random.randn(dim,dim)
a = np.matmul(a, a.T)
b = a.copy()
check_sym(a)
print("symmetrizing in-place")
a += a.T
a *= .5
check_sym(a)
print("symmetrizing out-of-place")
b = .5 * (b + b.T)
check_sym(b)
And the output is:
||a-a'||^2 = 1.184044e-26
symmetrizing in-place
||a-a'||^2 = 7.313593e+04
symmetrizing out-of-place
||a-a'||^2 = 0.000000e+00
Note that for lower dimension (e.g. dim=10) the problem does not appear.
EDIT: Some more info is given by looking at a - a' after the in-place version (plot not reproduced here).
The error comes from the line a += a.T. It is a known problem with in-place operations on views (I cannot find the proper piece of NumPy documentation right now), but to quote the SciPy lecture notes:
The transposition is a view.
As a result, the following code is wrong and will not make a matrix symmetric:
a += a.T
It will work for small arrays (because of buffering) but fail for large ones, in unpredictable ways.
The reason is that while a is being updated with a.T, a.T itself is changing (since it is a view of a), so some entries of a get updated from values that have already been modified.
If you want to symmetrize a matrix in-place, you could do the following:
a = np.random.rand(4,4)
a[np.tril_indices_from(a)] = a.T[np.tril_indices_from(a)]
Or, if you want to stick to your notation:
a += a.T.copy()
since copy will create a temporary copy of a.T which is not going to be updated.
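Re-running the check from the question with the copy-based fix (my snippet; check_sym and dim are as defined above):

a = np.random.randn(dim, dim)
a = np.matmul(a, a.T)

a += a.T.copy()
a *= .5
check_sym(a)  # should now print 0.000000e+00, matching the out-of-place version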
I need to iteratively construct a huge sparse matrix in numpy/scipy. The initialization is done within a loop:
from scipy.sparse import dok_matrix, csr_matrix
def foo(*args):
    dim_x = 256*256*1024
    dim_y = 128*128*512
    matrix = dok_matrix((dim_x, dim_y))
    for i in range(dim_x):
        # compute stuff in order to get j
        matrix[i, j] = 1.
    return matrix.tocsr()
Then I need to convert it to a csr_matrix, because of further computations like:
matrix = foo(...)
result = matrix.T.dot(x)
At the beginning this was working fine, but my matrices are getting bigger and bigger and my computer starts to crash. Is there a more elegant way of storing the matrix?
Basically, I have the following requirements:
The matrix needs to store float values from 0. to 1.
I need to compute the transpose of the matrix
I need to compute the dot product with a x_dimensional vector
The matrix dimensions can be around 1*10^9 x 1*10^8
I am running out of RAM. I have read several posts on Stack Overflow and the rest of the internet ;) and found PyTables, which isn't really made for matrix computations. Is there a better way?
For your case, where every stored value is 1., I would recommend the data type np.int8 (or np.uint8), which requires only one byte per element:
matrix = dok_matrix((dim_x, dim_y), dtype=np.int8)
Directly constructing the csr_matrix will also allow you to go further with the maximum matrix size:
import numpy as np
from scipy.sparse import csr_matrix

def foo(*args):
    dim_x = 256*256*1024
    dim_y = 128*128*512
    row = []
    col = []
    for i in range(dim_x):
        # compute stuff in order to get j
        row.append(i)
        col.append(j)
    data = np.ones_like(row, dtype=np.int8)
    return csr_matrix((data, (row, col)), shape=(dim_x, dim_y), dtype=np.int8)
You may have hit the limits of what Python can do for you, or you may be able to do a little more. Try setting a dtype of np.float32; if you're on a 64-bit machine, this reduced precision may reduce your memory consumption. np.float16 may help with memory even further, but your calculations may slow down (I've seen examples where processing takes 10x the amount of time):
matrix = dok_matrix((dim_x, dim_y), dtype=np.float32)
or possibly much slower, but even less memory consumption:
matrix = dok_matrix((dim_x, dim_y), dtype=np.float16)
Another option: buy more system memory.
Finally, if you can avoid creating your matrix with dok_matrix, and can create it instead with csr_matrix (I don't know if this is possible for your calculations) you may save a little overhead on the dict that dok_matrix uses.
I have a 2-D NumPy array X of shape (xrow, xcol), and I want to apply the dot product to each pair of rows of the array to obtain another array P of shape (xrow, xrow).
The code looks like the following:
P = np.zeros((xrow, xrow))
for i in range(xrow):
    for j in range(xrow):
        P[i, j] = np.dot(X[i], X[j])
which works well if the array X is small but takes a lot of time for huge X. Is there any way to make it faster or do it more pythonically so that it is fast?
That is obtained by doing result = X.dot(X.T)
When the array becomes large, it can be done by blocks (see the sketch below), but depending on your NumPy backend this should already parallelize across threads as much as possible. It seems that this is what you are looking for.
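A minimal sketch of the blocked, single-process version (accumulating partial products over column blocks):

import numpy as np

X = np.random.randn(1000, 100000)
block_size = 10000

P = np.zeros((X.shape[0], X.shape[0]))
for pos in range(0, X.shape[1], block_size):
    block = X[:, pos:pos + block_size]
    P += block @ block.T  # partial product for this block of columns

# P now equals X @ X.T, up to floating-point rounding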
If for some reason you don't want to rely on the backend's threading, and you do end up resorting to multiprocessing, you can try something along the lines of:
import numpy as np
X = np.random.randn(1000, 100000)
block_size = 10000
from joblib import Parallel, delayed

products = Parallel(n_jobs=10)(
    delayed(np.dot)(X[:, pos:pos + block_size], X.T[pos:pos + block_size])
    for pos in range(0, X.shape[1], block_size))
product = np.sum(products, axis=0)
I don't think this is useful for relatively small arrays. And threading can sometimes take care of this better as well.
This is about 10% faster on my machine, as it avoids Python loops:
numpy.matrix(X) * numpy.matrix(X.T)
but there is still ~50% redundant work, since the result is symmetric and both triangles are computed.
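If that redundancy matters, one option (my suggestion, not from the answer above, and assuming I remember the SciPy BLAS wrapper's conventions correctly) is to call the BLAS syrk routine, which computes X @ X.T while filling only one triangle:

import numpy as np
from scipy.linalg.blas import dsyrk

X = np.random.randn(1000, 2000)

# the upper triangle of C holds X @ X.T; the lower triangle is left as zeros
C = dsyrk(1.0, X)
P = C + np.triu(C, 1).T  # mirror the strict upper triangle to get the full symmetric matrix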
In Python's NumPy module, is there a function that can evaluate long/advanced math expressions on an array? I have heard of the numexpr module but want to steer clear of further dependencies.
Better yet, can I limit these expressions to, say, only the first or second element of the sub-arrays within my array, without having to unpack them as separate arrays?
Here is my specific problem. I have an array of arrays containing geographic point coordinates looking like this: [[x1,y1],[x2,y2],[x3,y3],etc...]. What I want is to transform these geocoords to pixel coordinates so they can be drawn on an image. I therefore want to run the following expression/calculation on the first element of each subarray, ie the xs:
((180+X)/360)*screenwidthpixels
And on the second element, ie the ys:
((-90+Y)/180)*-screenheightpixels
These expressions would work in a Python for-loop, but that is too slow, which is why I'm turning to NumPy. I have tried chaining NumPy's individual math operations one after another, but that is still too slow, and besides, to do that I first had to unpack all the xs and ys into separate arrays and repack them together after the calculation, making it even slower.
So I guess I'm looking for a more direct Numpy way using less steps to transform my coordinate array using the expressions above. Any ideas?
import numpy as np

points = np.random.rand(10, 2)            # [[x, y], ...] geographic coordinates
translation = np.array([180, -90])        # shift applied to (x, y)
scaling = np.array([1024, -768]) / np.array([360, 180])  # pixels per degree, for an example 1024x768 screen
transformed_points = (points + translation) * scaling
This will do what you are looking for. It relies on numpy broadcasting rules to achieve expressiveness and performance.
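Tying this back to the variable names in the question (screenwidthpixels and screenheightpixels; the concrete values here are just an example), the same broadcasting idea gives:

import numpy as np

screenwidthpixels, screenheightpixels = 1024, 768  # example screen size
points = np.random.rand(10, 2)                     # [[x, y], ...]

translation = np.array([180, -90])
scaling = np.array([screenwidthpixels, -screenheightpixels]) / np.array([360, 180])
pixel_coords = (points + translation) * scaling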
But rather than explaining exactly how that works, I think you are better off finding yourself a good NumPy primer and starting at the top. NumPy is one of the best things about Python, and you can't go wrong learning a little more about it. Suffice to say, NumPy is certainly up to the kind of task you are facing.
I'm a little confused because I'm not sure exactly what you're saying you already tried, or what the speed condition for success is.
Are you saying you already tried something like the following, but it is too slow?
arr = whatever
arr[:,0] = (arr[:,0] + 180) / (360 * screenwidthpixels)
arr[:,1] = 180 - (arr[:,1] - 90) / (180 * screenheightpixels)
I'm not sure what you mean by "having to unpack" to X and Y. Here's how you avoid unpacking (if I understand correctly):
arr = np.array([ [x1,y1], [x2,y2], [x3,y3] ])
arr.shape
=> (3, 2)
X = arr[:,0] # fast, creates a view
Y = arr[:,1] # fast too
((X+180)/360)/screenwidthpixels
Further speed up can be achieved by rewriting/simplifying your expressions.
((X+180)/360)/s => (X+180)/(360*s)
(180-((Y+90)/180))/s => (180/s - 1/(2*s)) - Y/(180*s)
In the first rewrite, the array is traversed twice instead of three times, and in the second, it is traversed twice instead of four times.
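As code, the rewritten forms would look roughly like this (following the expressions above, with s split into a width and a height value; the concrete numbers are just placeholders):

import numpy as np

s_w, s_h = 1024.0, 768.0                 # placeholder screen dimensions
X = np.random.rand(1000) * 360 - 180     # example longitudes
Y = np.random.rand(1000) * 180 - 90      # example latitudes

x_out = (X + 180) / (360 * s_w)                          # two array passes instead of three
y_out = (180 / s_h - 1 / (2 * s_h)) - Y / (180 * s_h)    # two array passes instead of four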
In [235]: xs=arange(1000)
In [236]: ys=arange(1, 1001)
In [237]: a=array([xs, ys]).T
In [238]: a
Out[238]:
array([[ 0, 1],
[ 1, 2],
[ 2, 3],
...,
[ 997, 998],
[ 998, 999],
[ 999, 1000]])
In [240]: a[:, 0]=(a[:, 0]+180)/360/1024
a[:, 0] offers a view of the first column of a, so the operation is fast and memory-saving (see the NumPy indexing documentation).