I have defined a couple of arrays in Python, but I am having a problem calculating their product.
import numpy as np
phi = np.array([[ 1., 1.],[ 0., 1.]])
P = np.array([[ 999., 0.],[ 0., 999.]])
np.dot(phi, P, phi.T)
I get the error:
ValueError: output array is not acceptable (must have the right type, nr dimensions, and be a C-Array)
But I do not know what the problem is, since both arrays are 2 by 2.
As the documentation explains, numpy.dot only multiplies two matrices. The third, optional argument is an array in which to store the results. If you want to multiply three matrices, you will need to call dot twice:
numpy.dot(numpy.dot(phi, P), phi.T)
Note that arrays have a dot method that does the same thing as numpy.dot, which can make things easier to read:
phi.dot(P).dot(phi.T)
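If you are on Python 3.5+ with a NumPy recent enough to support the @ matrix-multiplication operator (an assumption about your setup), the same computation reads even more naturally:
phi @ P @ phi.T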
phi.T is the same as phi.transpose() (as stated in the docs). It is basically the return value of a method call. Therefore you can't use it as output storage for the dot product.
Update
It appears that there is an additional problem here, which can be seen by saving the transposed matrix into a new variable and using it as the output:
>>> g = phi.T
>>> np.dot(phi, P, g)
still gives an error. The problem seems to be the way the result of the transpose is stored in memory. The output parameter for the dot product has to be a C-contiguous array, and in this case g is not. To overcome this, the numpy.ascontiguousarray function can be used, which solves the problem:
>>> g = np.ascontiguousarray(phi.T)
>>> np.dot(phi, P, g)
array([[ 999.,  999.],
       [   0.,  999.]])
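Note, though, that np.ascontiguousarray had to copy the data here (phi.T is not C-contiguous), so g no longer shares memory with phi; the result of the dot product ends up in the new buffer g, not back inside phi:
>>> np.may_share_memory(g, phi)
False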
The error message points out that there are 3 reasons why np.dot(phi, P, out=phi.T) cannot be performed:
"must have the right type": That is ok in the first example, since all the elements of P and phi are floating numbers. But not with the other example mentioned at the comments, where the c[0,0] element is floating point number, but the output array wants to be an integer at all positions since both 'a' and 'b' contains integers everywhere.
"nr dimensions":2x2 is the expected dimension of the output array, so the problem is definitely not with the dimensions.
"must be a C-Array": This actually means that the output array must be C-contingous. There is a very good description what actually C and F contingous mean: difference between C and F contingous arrays. To make the long story short, if phi is C-contingous (and by default it is) than phi.T will be F-contingous.
You can check it by checking the flag attributes:
>>> phi.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
...
>>> phi.T.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
...
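If the goal is simply to store the result in a preallocated array, a minimal sketch (using a fresh C-contiguous buffer rather than phi.T itself) is:
>>> out = np.empty_like(phi)   # C-contiguous by default, same shape and dtype as phi
>>> np.dot(phi, P, out=out)
array([[ 999.,  999.],
       [   0.,  999.]])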
As far as I understand the documentation of numpy's negative function, its where option allows you to leave some array components unnegated:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.,  2.],
       [-3.,  4.],
       [-5.,  6.]])
However, when I try it, it seems that those values are (almost) zeroed instead:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.00000000e+000,  6.92885436e-310],
       [-3.00000000e+000,  6.92885377e-310],
       [-5.00000000e+000,  6.92885375e-310]])
So how should I interpret the where option?
The documentation describes where like this:
Values of True indicate to calculate the ufunc at that position, values of False indicate to leave the value in the output alone.
Let's try an example using the out parameter:
x = np.ones(3)
np.negative(np.array([4.,5.,6.]), where=np.array([False,True,False]), out=x)
This sets x to [1., -5., 1.], and returns the same.
This makes some amount of sense once you realize that "leave the value in the output alone" literally means the output value is "don't care", rather than "same as the input" (the latter interpretation was how I read it the first time, too).
The problem comes in when you specify where but not out. Apparently the "ufunc machinery" (which is not visible in the implementation of np.negative()) creates an empty output array, meaning the values are indeterminate. So the locations at which where is False will have uninitialized values, which could be anything.
This seems pretty wrong to me, but there was a NumPy issue filed about it last year, and closed. It seems unlikely to change, so you'll have to work around it (e.g. by creating the output array yourself using zeros).
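As a sketch of that workaround, preallocating the output with zeros makes the positions where where is False well defined:
>>> out = np.zeros((3, 2))
>>> np.negative(np.array([[1., 2.], [3., 4.], [5., 6.]]), where=[True, False], out=out)
array([[-1.,  0.],
       [-3.,  0.],
       [-5.,  0.]])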
There is a simple function that is intended to accept a scalar parameter, but it also works for a numpy matrix. Why does the function fun work for a matrix?
>>> import numpy as np
>>> def fun(a):
...     return 1.0 / a
>>> b = 2
>>> c = np.mat([1,2,3])
>>> c
matrix([[1, 2, 3]])
>>> fun(b)
0.5
>>> fun(c)
matrix([[ 1. , 0.5 , 0.33333333]])
>>> v_fun = np.vectorize(fun)
>>> v_fun(b)
array(0.5)
>>> v_fun(c)
matrix([[ 1. , 0.5 , 0.33333333]])
It seems like fun is vectorized somehow, because the explicitly vectorized function v_fun behaves the same on the matrix c. But they give different outputs for the scalar b. Could anybody explain it? Thanks.
What happens in the case of fun is called broadcasting.
General Broadcasting Rules
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError: frames are not aligned exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
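For example, under these rules a shape-(3, 1) array and a shape-(4,) array broadcast to a shape-(3, 4) result:
>>> x = np.ones((3, 1))
>>> y = np.arange(4)        # shape (4,) is treated as (1, 4)
>>> (x + y).shape
(3, 4)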
fun already works for both scalars and arrays, because elementwise division is defined for both (via their own methods). fun(b) does not involve numpy at all; that is just a Python operation.
np.vectorize is meant to take a function that only works with scalars and feed it elements from an array. In your example it first converts b into an array, np.array(b). For both c and this modified b, the result is an array of matching size. c is a 2d np.matrix, and the result is the same. Notice that v_fun(b) is of type array, not matrix.
This is not a good example of using np.vectorize, nor an example of broadcasting. np.vectorize is a rather 'simple minded' function and doesn't handle scalars in a special way.
1/c or even b/c works because c, an array 'knows' about division. Similarly array multiplication and addition are defined: 1+c or 2*c.
I'm tempted to mark this as a duplicate of
Python function that handles scalar or arrays
Given the following code, I expect the last two lines to behave the same; however, they don't.
import numpy as np
C = np.matrix(np.zeros((4,4)))
C[0, 0:2] = np.matrix([[1, 2]]) # Works as expected.
C[0, [0,1]] = np.matrix([[1, 2]]) # Throws an "array is not broadcastable to correct shape" error.
When using an ndarray instead, things work as expected (adjusting the right-hand-side of the assignment to a one-dimensional ndarray):
D = np.zeros((4,4))
D[0, 0:2] = np.array([1, 2]) # Works as expected.
D[0, [0,1]] = np.array([1, 2]) # Works too.
And to make things even weirder, if one is only indexing the matrix C (as opposed to assigning to it), it seems that using slice indices or a list returns the same thing:
C[0, 0:2] # => matrix([[ 1., 2.]])
C[0, [0, 1]] # => matrix([[ 1., 2.]])
The question is, why is the behavior of the two approaches in assignment different? What am I missing?
(Edit: typo)
It appears to be a bug in numpy: http://projects.scipy.org/numpy/ticket/803 . The solution is to assign an ordinary list or numpy array instead of assigning a matrix to the selected elements.
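For instance, a sketch of that workaround (assigning a plain list or a 1-D array instead of a matrix):
C[0, [0, 1]] = [1, 2]              # a plain list works
C[0, [0, 1]] = np.array([1, 2])    # so does a 1-D array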
Edit: I had to realize that, while what I write below is true, the fact that D[0, 0:2] = ... is handled differently from D[0, [0,1]] = ... (i.e. even for plain arrays) is maybe a real inconsistency (and related).
Maybe an explanation of why this happens, as far as I can see. Check this:
D[0,[0,1]] = np.array([[1,2]])
Gives the same error. The thing is that internally the slicing operation takes place before the matrix shape is "fixed" to 2D again. Since matrix is a subclass, that fixing occurs whenever a new view is created, but here no view is created, because normally none is needed.
This means that when you are setting elements like this, it always behaves like:
C.A[0, [0, 1]] = np.matrix([[1, 2]])  # Note: C.A gives the plain ndarray view of C.
Which fails because the right-hand side matrix is 2D while C.A[0, [0, 1]] is 1D (it is not "fixed" to be at least 2D by the matrix object). One could argue that, since this only requires dropping a length-1 axis from the right-hand side, numpy could tolerate it; but as long as it doesn't, supporting this would require the matrix object to implement a full custom set of in-place/assignment operators, which would probably not be very elegant either.
But maybe the use of C.A, etc. can help you get around this inconvenience. On a general note, however, in numpy it is better to always use base-class arrays unless you are doing a lot of matrix multiplications, etc. (and if that is limited to one part of the program, it is likely better to just view your arrays as matrices there and work with plain arrays in the rest).
I want to assign an array of N values to the N entries in a column of a much longer array that are selected by a boolean mask. I am doing it wrong, because the large array remains unchanged. Please see the following example:
Each entry in the large array contains a timestamp, a valid flag, and an empty field to be filled with the time since the previous valid entry. I want to compute these time lapses:
a = np.array([(0,0,0),
(1,0,0),
(2,1,0),
(3,1,0),
(4,1,0),
(5,0,0),
(6,0,0),
(7,0,0),
(8,1,0),
(9,1,0)],
dtype=np.dtype([('time', '<i4'), ('ena', '|b1'), ('elapsed', '<i4')]))
To calculate the time differences between consecutive valid entries:
elapsed = a[a['ena']]['time'][1:] - a[a['ena']]['time'][0:-1]
elapsed will be [1, 1, 4, 1] (which is what I wanted).
Now I want to write the elapsed seconds back to the original array:
a[a['ena']]['elapsed'][1:] = elapsed
there is no warning or error, but a remains unchanged, although I expected:
a = np.array([
(0,0,0),
(1,0,0),
(2,1,0),
(3,1,1),
(4,1,1),
(5,0,0),
(6,0,0),
(7,0,0),
(8,1,4),
(9,1,1)])
How should I do it? Many thanks.
The numpy folks have done some amazing magic to make fancy indexing (which includes boolean indexing) work as well as it does. This magic is pretty impressive, but it still cannot handle fancy indexing followed by more indexing on the left-hand side of an assignment, for example a[fancy][index2] = something. Here is a simple example:
>>> a = np.zeros(3)
>>> b = np.array([True, False, True])
>>> a[b][1:] = 2
>>> a
array([ 0.,  0.,  0.])
>>> a[1:][b[1:]] = 2
>>> a
array([ 0.,  0.,  2.])
I think this is a bug, and I wonder if it is possible to catch it and raise an error instead of letting it silently fail. But getting back to your question, the easiest solution seems to be to replace:
a[a['ena']]['elapsed'][1:] = elapsed
with:
tmp = a['ena'][1:]
a['elapsed'][1:][tmp] = elapsed
or maybe:
a['elapsed'][1:][a['ena'][1:]] = elapsed
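Alternatively, here is a sketch that avoids chained fancy indexing altogether: since a['elapsed'] is a view of the field, a single assignment through integer indices (obtained, say, with np.flatnonzero) modifies a in place:
idx = np.flatnonzero(a['ena'])[1:]   # positions of the valid entries, skipping the first
a['elapsed'][idx] = elapsed          # one indexing operation, so nothing is assigned into a copy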
Problem
I would like to compute the following using numpy or scipy:
Y = A^T * Q * A
where A is an m x n matrix, A^T is the transpose of A, and Q is an m x m diagonal matrix.
Since Q is a diagonal matrix I store only its diagonal elements as a vector.
Ways of solving for Y
Currently I can think of two ways to calculate Y:
Y = np.dot(np.dot(A.T, np.diag(Q)), A) and
Y = np.dot(A.T * Q, A).
Clearly option 2 is better than option 1 since no real matrix has to be created with diag(Q) (if this is what numpy really does...)
However, both methods suffer from the defect of having to allocate more memory than is really necessary, since A.T * Q and np.dot(A.T, np.diag(Q)) have to be stored along with A in order to calculate Y.
Question
Is there a method in numpy/scipy that would eliminate the unnecessary allocation of extra memory where you would only pass two matrices A and B (in my case B is A.T) and a weighting vector Q along with it?
(Regarding the last sentence of the OP: I am not aware of such a numpy/scipy method, but regarding the question in the OP title (i.e., improving NumPy dot performance), what's below should be of some help. In other words, my answer is directed at improving the performance of most of the steps that make up your calculation of Y.)
First, this should give you a noticeable boost over the vanilla NumPy dot method:
>>> from scipy.linalg import blas as FB
>>> vx = FB.dgemm(alpha=1., a=v1, b=v2, trans_b=True)
Note that the two arrays, v1 and v2, should both be in the same memory order here.
You can check the memory layout of a NumPy array through its flags attribute, like so:
>>> c = NP.ones((4, 3))
>>> c.flags
C_CONTIGUOUS : True # refers to C-contiguous order
F_CONTIGUOUS : False # fortran-contiguous
OWNDATA : True
MASKNA : False
OWNMASKNA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
To change the order of one of the arrays so that both are aligned, just call the NumPy array constructor, pass in the array, and set the appropriate order argument:
>>> c = NP.array(c, order="F")
>>> c.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
MASKNA : False
OWNMASKNA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
You can further optimize by exploiting array-order alignment to reduce excess memory consumption caused by copying the original arrays.
But why are the arrays copied before being passed to dot?
The dot product relies on BLAS operations. These operations require the arrays to be stored in a contiguous memory layout that the routine can use directly; it's this constraint that causes the arrays to be copied.
On the other hand, the transpose does not effect a copy, though it unfortunately returns the result in Fortran order.
Therefore, to remove the performance bottleneck, you need to eliminate the preceding array-copying step; to do that, you just have to pass both arrays to dot in a contiguous order it can use directly.
So to calculate dot(A.T, A) without making an extra copy:
>>> import scipy.linalg.blas as FB
>>> vx = FB.dgemm(alpha=1.0, a=A.T, b=A.T, trans_b=True)
In sum, the expression just above (along with the preceding import statement) can substitute for dot, supplying the same functionality with better performance.
You can bind that expression to a function like so:
>>> super_dot = lambda v, w: FB.dgemm(alpha=1., a=v.T, b=w.T, trans_b=True)
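As a quick sanity check of what that lambda computes (it multiplies v.T by w, so dot(A.T, A) corresponds to super_dot(A, A); NP is numpy as imported above):
>>> A = NP.random.rand(6, 4)
>>> NP.allclose(super_dot(A, A), NP.dot(A.T, A))
True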
I just wanted to put that up on SO, but this pull request should be helpful and remove the need for a separate function for numpy.dot
https://github.com/numpy/numpy/pull/2730
This should be available in numpy 1.7
In the meantime, I used the example above to write a function that can replace numpy dot, whatever the order of your arrays are, and make the right call to fblas.dgemm.
http://pastebin.com/M8TfbURi
Hope this helps,
numpy.einsum is what you're looking for:
numpy.einsum('ij, i, ik -> jk', A, Q, A)
This should not need any additional memory (though einsum is usually slower than BLAS operations).
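As a quick sanity check (a small sketch with made-up shapes), the einsum expression agrees with the dot-based formulation from the question:
>>> import numpy as np
>>> m, n = 5, 3
>>> A = np.random.rand(m, n)
>>> Q = np.random.rand(m)    # diagonal of the m x m weight matrix, stored as a vector
>>> np.allclose(np.dot(A.T * Q, A), np.einsum('ij, i, ik -> jk', A, Q, A))
True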