Is there an "enhanced" numpy/scipy dot method?

Problem
I would like to compute the following using numpy or scipy:
Y = A**T * Q * A
where A is an m x n matrix, A**T is the transpose of A, and Q is an m x m diagonal matrix.
Since Q is a diagonal matrix I store only its diagonal elements as a vector.
Ways of solving for Y
Currently I can think of two ways of how to calculate Y:
Y = np.dot(np.dot(A.T, np.diag(Q)), A) and
Y = np.dot(A.T * Q, A).
Clearly option 2 is better than option 1 since no real matrix has to be created with diag(Q) (if this is what numpy really does...)
However, both methods have to allocate more memory than is really necessary, since A.T * Q and np.dot(A.T, np.diag(Q)) have to be stored along with A in order to calculate Y.
Question
Is there a method in numpy/scipy that would eliminate the unnecessary allocation of extra memory where you would only pass two matrices A and B (in my case B is A.T) and a weighting vector Q along with it?

(Regarding the last sentence of the question: I am not aware of such a numpy/scipy method. Regarding the question in the title, i.e., improving NumPy dot performance, what's below should be of some help. In other words, my answer is directed at improving the performance of most of the steps that make up your function for Y.)
First, this should give you a noticeable boost over the vanilla NumPy dot method:
>>> from scipy.linalg import blas as FB
>>> vx = FB.dgemm(alpha=1., a=v1, b=v2, trans_b=True)
Note that the two arrays, v1 and v2, are both in Fortran (column-major) order.
You can check the memory order of a NumPy array through the array's flags attribute like so:
>>> import numpy as NP
>>> c = NP.ones((4, 3))
>>> c.flags
C_CONTIGUOUS : True # refers to C-contiguous order
F_CONTIGUOUS : False # fortran-contiguous
OWNDATA : True
MASKNA : False
OWNMASKNA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
To change the order of one of the arrays so that both match, call the NumPy array constructor, pass in the array, and set the order argument (here "F" for Fortran order):
>>> c = NP.array(c, order="F")
>>> c.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
MASKNA : False
OWNMASKNA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
You can further optimize by exploiting array-order alignment to reduce excess memory consumption caused by copying the original arrays.
But why are the arrays copied before being passed to dot?
The dot product relies on BLAS operations. These operations require arrays stored in C-contiguous order--it's this constraint that causes the arrays to be copied.
On the other hand, the transpose does not make a copy, though unfortunately it returns the result in Fortran order.
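A small check of that behaviour may help. This is only a sketch: the array A and the variable names are made up for illustration, and np.shares_memory is used here simply as a convenient way to see whether a copy was made.

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(4, 3)   # C-contiguous by default
At = A.T                                        # transpose: a view on the same data

shares_memory = np.shares_memory(A, At)         # True when no copy was made
t_is_fortran = At.flags['F_CONTIGUOUS'] and not At.flags['C_CONTIGUOUS']

# Forcing C order on the transpose is what triggers a real copy.
At_c = np.ascontiguousarray(At)
copied = not np.shares_memory(A, At_c)

print(shares_memory, t_is_fortran, copied)
```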
Therefore, to remove the performance bottleneck, you need to eliminate the preceding array-copying step; to do that just requires passing both arrays to dot in C-contiguous order.
So to calculate dot(A.T, A) without making an extra copy:
>>> import scipy.linalg.blas as FB
>>> vx = FB.dgemm(alpha=1.0, a=A.T, b=A.T, trans_b=True)
In sum, the expression just above (along with the preceding import statement) can substitute for dot, supplying the same functionality but better performance.
You can bind that expression to a function like so (note that, as written, super_dot(v, w) computes dot(v.T, w)):
>>> super_dot = lambda v, w: FB.dgemm(alpha=1., a=v.T, b=w.T, trans_b=True)
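As a sanity check, the dgemm call can be compared against the plain dot result. This is a sketch under assumptions: the random matrix A and its shape are invented for the example, and it only confirms that the two results agree, not that one is faster on your machine.

```python
import numpy as np
from scipy.linalg import blas as FB

rng = np.random.default_rng(0)
A = rng.random((5, 3))      # C-contiguous, so A.T is Fortran-contiguous

# dgemm computes alpha * a @ b, using b transposed when trans_b=True,
# so a=A.T, b=A.T, trans_b=True yields A.T @ A.
vx = FB.dgemm(alpha=1.0, a=A.T, b=A.T, trans_b=True)

matches = np.allclose(vx, np.dot(A.T, A))
print(vx.shape, matches)
```

Timing both variants on your own arrays (for example with timeit) is the only reliable way to see whether the BLAS call pays off for your NumPy version.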

I just wanted to put that up on SO, but this pull request should be helpful and remove the need for a separate function for numpy.dot
https://github.com/numpy/numpy/pull/2730
This should be available in numpy 1.7
In the meantime, I used the example above to write a function that can replace numpy dot, whatever the order of your arrays is, and that makes the right call to fblas.dgemm.
http://pastebin.com/M8TfbURi
Hope this helps.

numpy.einsum is what you're looking for:
numpy.einsum('ij, i, ik -> jk', A, Q, A)
This should not need any additional memory (though einsum is usually slower than BLAS operations).
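To see that the einsum call really computes Y = A**T * Q * A, it can be checked against the two dot-based variants from the question. The sizes m and n and the random data are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.random((m, n))
Q = rng.random(m)            # diagonal of the m x m weighting matrix

Y_diag = np.dot(np.dot(A.T, np.diag(Q)), A)    # builds the full m x m matrix
Y_scaled = np.dot(A.T * Q, A)                   # builds an n x m temporary
Y_einsum = np.einsum('ij,i,ik->jk', A, Q, A)    # sums A[i,j] * Q[i] * A[i,k] over i

print(np.allclose(Y_diag, Y_scaled), np.allclose(Y_scaled, Y_einsum))
```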

Related

How can a python function handle both numpy matrix and scalar?

There is a simple function which is intended to accept a scalar parameter, but it also works for a numpy matrix. Why does the function fun work for a matrix?
>>> import numpy as np
>>> def fun(a):
...     return 1.0 / a
>>> b = 2
>>> c = np.mat([1,2,3])
>>> c
matrix([[1, 2, 3]])
>>> fun(b)
0.5
>>> fun(c)
matrix([[ 1. , 0.5 , 0.33333333]])
>>> v_fun = np.vectorize(fun)
>>> v_fun(b)
array(0.5)
>>> v_fun(c)
matrix([[ 1. , 0.5 , 0.33333333]])
It seems like fun is vectorized somehow, because the explicitly vectorized function v_fun behaves the same on matrix c. But they give different outputs on scalar b. Could anybody explain it? Thanks.

What happens in the case of fun is called broadcasting.
General Broadcasting Rules
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError is raised (reported as "frames are not aligned" in older NumPy versions and as "operands could not be broadcast together" in current ones), indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
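The rules above can be tried out directly. The shapes below are arbitrary choices for illustration.

```python
import numpy as np

a = np.ones((4, 3))
row = np.arange(3)                  # shape (3,): trailing dimension equals 3
col = np.arange(4).reshape(4, 1)    # shape (4, 1): the 1 is stretched to 3

shape_row = (a + row).shape
shape_col = (a + col).shape

try:
    a + np.arange(4)                # shape (4,) against trailing dimension 3
    failed = False
except ValueError:
    failed = True                   # incompatible shapes raise ValueError

print(shape_row, shape_col, failed)
```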

fun already works for both scalars and arrays, because elementwise division is defined for both (through their own methods). fun(b) does not involve numpy at all; that's just a Python operation.
np.vectorize is meant to take a function that only works with scalars and feed it elements from an array. In your example it first converts b into an array, np.array(b). For both c and this modified b, the result is an array of matching size. c is a 2d np.matrix, and the result is the same. Notice that v_fun(b) is of type array, not matrix.
This is not a good example of using np.vectorize, nor an example of broadcasting. np.vectorize is a rather 'simple minded' function and doesn't handle scalars in a special way.
1/c or even b/c works because c, an array, 'knows' about division. Similarly, array multiplication and addition are defined: 1+c or 2*c.
I'm tempted to mark this as a duplicate of
Python function that handles scalar or arrays
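A case where np.vectorize is actually needed might look like the following sketch. The function scalar_only is a made-up example; it wraps math.sqrt, which accepts a single number and not an array.

```python
import math
import numpy as np

def scalar_only(x):
    # math.sqrt works on one number at a time, not on arrays
    return math.sqrt(x)

v_sqrt = np.vectorize(scalar_only)

out = v_sqrt(np.array([1.0, 4.0, 9.0]))   # applied element by element
scalar_out = v_sqrt(4.0)                  # scalar input is converted to an array first

print(out, scalar_out)
```

Calling scalar_only directly on the array would raise a TypeError, which is the difference from fun above: 1.0 / a already works elementwise, so wrapping it in np.vectorize adds nothing.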

scipy sparse matrix -- accessing multiple elements of a path

I have a scipy sparse matrix A and a (long) list of coordinates
myrows=[i1,i2,...] mycols=[j1,j2,...]. I need a list of their values [A[i1,j1],A[i2,j2],...]. How can I do this quickly? A loop is too slow.
I've thought about cython.inline() (which I use in other places in my code) or weave, but I don't see how to use the sparse type efficiently in cython or C++. Am I missing something simple?
Currently I'm using a hack that seems inefficient and possibly wrong sometimes -- which I flag with an error message. Here is my badly written code. Note that it relies on the ordering of elements to be preserved under addition and assumes that the elements in myrows,mycols are in A.
import numpy as np
import scipy.sparse as sps
def getmatvals(A, myrows, mycols):  # A is a coo_matrix
    # B holds the 1-based position of each stored element of A
    B = sps.coo_matrix((range(1, 1 + A.nnz), (A.row, A.col)), shape=A.shape)
    T = sps.coo_matrix(([A.nnz + 1] * len(myrows), (myrows, mycols)), shape=A.shape)
    G = B - T  # signify my elements in G by negatives and others by 0's
    H = np.minimum([0] * A.nnz, G.data)  # remove extra elements
    H = H[np.nonzero(H)]
    H = H + A.nnz
    return A.data[H]
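One possible way to do the lookup without a Python loop is to convert to CSR and index with the two coordinate lists at once. This is a sketch, not a benchmarked solution: getmatvals_csr is a hypothetical helper name, the small matrix is invented for the example, and coordinates that are not stored in A simply come back as 0.

```python
import numpy as np
import scipy.sparse as sps

def getmatvals_csr(A, myrows, mycols):
    """Hypothetical helper: return [A[i1, j1], A[i2, j2], ...] as a 1-D array."""
    A_csr = A.tocsr()                    # COO itself does not support indexing
    vals = A_csr[myrows, mycols]         # paired (fancy) indexing, one value per pair
    return np.asarray(vals).ravel()      # flatten the result to 1-D

A = sps.coo_matrix(([10.0, 20.0, 30.0], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))
values = getmatvals_csr(A, [2, 0, 1, 0], [0, 1, 2, 0])
print(values)
```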

Arrays product in Python

I have defined couple of arrays in Python but I am having problem in calculation of the product.
import numpy as np
phi = np.array([[ 1., 1.],[ 0., 1.]])
P = np.array([[ 999., 0.],[ 0., 999.]])
np.dot(phi, P, phi.T)
I get the error:
ValueError: output array is not acceptable (must have the right type, nr dimensions, and be a C-Array)
But I do not know what is the problem, since the size of matrix or array is 2 by 2

As the documentation explains, numpy.dot only multiplies two matrices. The third, optional argument is an array in which to store the results. If you want to multiply three matrices, you will need to call dot twice:
numpy.dot(numpy.dot(phi, P), phi.T)
Note that arrays have a dot method that does the same thing as numpy.dot, which can make things easier to read:
phi.dot(P).dot(phi.T)
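For reference, the variants can be compared side by side. The @ operator (Python 3.5+) and numpy.linalg.multi_dot are not mentioned above; they are included here only as further ways to chain the same product.

```python
import numpy as np

phi = np.array([[1., 1.], [0., 1.]])
P = np.array([[999., 0.], [0., 999.]])

nested = np.dot(np.dot(phi, P), phi.T)          # two explicit dot calls
chained = phi.dot(P).dot(phi.T)                 # method form of the same thing
with_operator = phi @ P @ phi.T                 # matrix-multiplication operator
multi = np.linalg.multi_dot([phi, P, phi.T])    # chains any number of matrices

print(nested)
```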

phi.T is the same as phi.transpose() (as stated in the docs). It is basically a return value of a class method. Therefore you can't use it as an output storage for the dot product.
Update
It appears that there is an additional problem here, that can be seen if saving the transposed matrix into new variable and using it as an output:
>>> g = phi.T
>>> np.dot(phi, P, g)
is still giving an error. The problem seems to be with the way the result of the transpose is stored in memory. The output parameter for the dot product has to be a C-contiguous array, but in this case g is not. To overcome this issue, numpy.ascontiguousarray can be used, which solves the problem:
>>> g = np.ascontiguousarray(phi.T)
>>> np.dot(phi, P, g)
array([[ 999., 999.],
       [   0., 999.]])

The error message points out that there can be 3 reasons why it cannot perform np.dot(phi, P, out=phi.T):
"must have the right type": That is OK in the first example, since all the elements of P and phi are floating-point numbers. But not in the other example mentioned in the comments, where the c[0,0] element is a floating-point number, but the output array wants to be an integer at all positions since both 'a' and 'b' contain integers everywhere.
"nr dimensions": 2x2 is the expected dimension of the output array, so the problem is definitely not with the dimensions.
"must be a C-Array": This actually means that the output array must be C-contiguous. There is a very good description of what C- and F-contiguous actually mean: difference between C and F contiguous arrays. To make a long story short, if phi is C-contiguous (and by default it is) then phi.T will be F-contiguous.
You can verify this by inspecting the flags attribute:
>>> phi.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
...
>>> phi.T.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
...
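A short sketch of the out argument in action, assuming a freshly allocated output buffer (the name out_buf is made up for the example):

```python
import numpy as np

phi = np.array([[1., 1.], [0., 1.]])
P = np.array([[999., 0.], [0., 999.]])

out_buf = np.empty((2, 2), dtype=phi.dtype)   # freshly allocated, C-contiguous
result = np.dot(phi, P, out=out_buf)          # the product is written into out_buf

try:
    np.dot(phi, P, out=phi.T)                 # F-contiguous view: not acceptable
    rejected = False
except ValueError:
    rejected = True

print(result is out_buf, rejected)
```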

Sum ndarray values

Is there an easier way to get the sum of all values (assuming they are all numbers) in an ndarray :
import numpy as np
m = np.array([[1,2],[3,4]])
result = 0
(dim0,dim1) = m.shape
for i in range(dim0):
    for j in range(dim1):
        result += m[i,j]
print(result)
The above code seems somewhat verbose for a straightforward mathematical operation.
Thanks!

Just use numpy.sum():
result = np.sum(matrix)
or equivalently, the .sum() method of the array:
result = matrix.sum()
By default this sums over all elements in the array - if you want to sum over a particular axis, you should pass the axis argument as well, e.g. matrix.sum(0) to sum over the first axis.
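For the 2 x 2 array from the question, the effect of the axis argument looks like this:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

total = m.sum()             # all elements
col_sums = m.sum(axis=0)    # sum down the first axis: one value per column
row_sums = m.sum(axis=1)    # sum along the second axis: one value per row

print(total, col_sums, row_sums)
```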
As a side note your "matrix" is actually a numpy.ndarray, not a numpy.matrix - they are different classes that behave slightly differently, so it's best to avoid confusing the two.

Yes, just use the sum method:
result = m.sum()
For example,
In [17]: m = np.array([[1,2],[3,4]])
In [18]: m.sum()
Out[18]: 10
By the way, NumPy has a matrix class which is different than "regular" numpy arrays. So calling a regular ndarray matrix causes some cognitive dissonance. To help others understand your code, you may want to change the name matrix to something else.

Array division- translating from MATLAB to Python

I have this line of code in MATLAB, written by someone else:
c=a.'/b
I need to translate it into Python. a, b, and c are all arrays. The dimensions that I am currently using to test the code are:
a: 18x1,
b: 25x18,
which gives me c with dimensions 1x25.
The arrays are not square, but I would not want the code to fail if they were. Can someone explain exactly what this line is doing (mathematically), and how to do it in Python? (i.e., the equivalent for the built-in mrdivide function in MATLAB if it exists in Python?)

The line
c = a.' / b
computes the solution of the equation c b = a^T for c. NumPy does not have an operator that does this directly. Instead, you should solve b^T c^T = a for c^T and transpose the result:
c = numpy.linalg.lstsq(b.T, a)[0].T

The symbol / is the matrix right division operator in MATLAB, which calls the mrdivide function. From the documentation, matrix right division is related to matrix left division in the following way:
B/A = (A'\B')'
If A is a square matrix, B/A is roughly equal to B*inv(A) (although it's computed in a different, more robust way). Otherwise, x = B/A is the solution in the least squares sense to the under- or over-determined system of equations x*A = B. More detail about the algorithms used for solving the system of equations is given here. Typically packages like LAPACK or BLAS are used under the hood.
The NumPy package for Python contains a routine lstsq for computing the least-squares solution to a system of equations. This routine will likely give you comparable results to using the mrdivide function in MATLAB, but it is unlikely to be exact. Any differences in the underlying algorithms used by each function will likely result in answers that differ slightly from one another (i.e. one may return a value of 1.0, whereas the other may return a value of 0.999). The relative size of this error could end up being larger, depending heavily on the specific system of equations you are solving.
To use lstsq, you may have to adjust your problem slightly. It appears that you want to solve an equation of the form cB = a, where B is 25-by-18, a is 1-by-18, and c is 1-by-25. Applying a transpose to both sides gives you the equation B^T c^T = a^T, which is a more standard form (i.e. Ax = b). The arguments to lstsq should be (in this order) B^T (an 18-by-25 array) and a^T (an 18-element array). lstsq should return a 25-element array (c^T).
Note: while a 1-D NumPy array makes no distinction between a 1-by-N and an N-by-1 layout, MATLAB certainly does, and will yell at you if you don't use the proper one.
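With the dimensions from the question, the lstsq route might be sketched as follows. The random data is an assumption for the example, a is kept as a 2-D 18-by-1 array, and rcond=None merely selects NumPy's current default cutoff.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((18, 1))     # same shapes as in the question
b = rng.random((25, 18))

# Solve b.T @ c.T = a in the least-squares sense, then transpose back.
c_T, residuals, rank, singular_values = np.linalg.lstsq(b.T, a, rcond=None)
c = c_T.T

print(c.shape, np.allclose(c @ b, a.T))
```

Because this system has more unknowns (25) than equations (18), it has many exact solutions; lstsq returns the one with the smallest norm, so individual entries need not coincide with what MATLAB prints even though c*b reproduces a.' in both cases.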

In MATLAB, A.' means the transpose of the matrix A. So mathematically, what the code computes is A^T/B.
How to go about implementing matrix division in Python (or any language) (Note: Let's go over a simple division of the form A/B; for your example you would need to do A^T first and then A^T/B next, and it's pretty easy to do the transpose operation in Python |left-as-an-exercise :)|)
You have a matrix equation
C*B = A (You want to find C as A/B)
RIGHT DIVISION (/) is as follows:
C*(B*B^T) = A*B^T
You then isolate C by inverting (B*B^T), which requires B*B^T to be invertible (i.e., B must have full row rank),
i.e.,
C = A*B^T*(B*B^T)^-1 ----- [1]
Therefore, to implement matrix division in Python (or any language), get the following three methods.
Matrix multiplication
Matrix transpose
Matrix inverse
Then apply them iteratively to achieve division as in [1].
Only, you need to do A^T/B, therefore your final operation after implementing the three basic methods should be:
A^T*B^T*(B*B^T)^-1
Note: Don't forget the basic rules of operator precedence :)

You can also approach this using the pseudo-inverse of b and multiplying the transpose of a by it. Try using numpy.linalg.pinv, then combine this with matrix multiplication via numpy.dot:
c = numpy.dot(a.T, numpy.linalg.pinv(b))
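A quick check of the pseudo-inverse route, again with made-up random data in the shapes from the question. Here a is assumed to be a 2-D 18-by-1 array, so it is transposed before the multiplication; the lstsq line is included only for comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((18, 1))
b = rng.random((25, 18))

c_pinv = np.dot(a.T, np.linalg.pinv(b))                 # shape (1, 25)
c_lstsq = np.linalg.lstsq(b.T, a, rcond=None)[0].T      # same system solved via lstsq

print(c_pinv.shape, np.allclose(c_pinv, c_lstsq), np.allclose(c_pinv @ b, a.T))
```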

[edited] As Suvesh pointed out, I was completely wrong before. However, NumPy can still easily do the procedure he gives in his post:
A = numpy.matrix(numpy.random.random((18, 1))) # as noted by others, your dimensions are off
B = numpy.matrix(numpy.random.random((25, 18)))
C = A.T * B.T * (B * B.T).I
