Say I have two matrices B and M and I want to execute the following statement:
B += 3*M
I execute this instruction repeatedly so I don't want to build each time the matrix 3*M (3 may change, it is just to make cleat that I only do a scalar-matrix product). Is it a numpy-function which makes this computation "in place"?
More precisely, I have a list of scalars as and a list of matrices Ms, I would like to perform the "dot product" (which is not really one since the two operands are of different type) of the two, that is to say:
sum(a*M for a, M in zip(as, Ms))
The np.dot function does not do what I except...
You can use np.tensordot -
np.tensordot(As,Ms,axes=(0,0))
Or np.einsum -
np.einsum('i,ijk->jk',As,Ms)
Sample run -
In [41]: As = [2,5,6]
In [42]: Ms = [np.random.rand(2,3),np.random.rand(2,3),np.random.rand(2,3)]
In [43]: sum(a*M for a, M in zip(As, Ms))
Out[43]:
array([[ 6.79630284, 5.04212877, 10.76217631],
[ 4.91927651, 1.98115548, 6.13705742]])
In [44]: np.tensordot(As,Ms,axes=(0,0))
Out[44]:
array([[ 6.79630284, 5.04212877, 10.76217631],
[ 4.91927651, 1.98115548, 6.13705742]])
In [45]: np.einsum('i,ijk->jk',As,Ms)
Out[45]:
array([[ 6.79630284, 5.04212877, 10.76217631],
[ 4.91927651, 1.98115548, 6.13705742]])
Another way you could do this, particularly if you favour readability, is to make use of broadcasting.
So you could make a 3D array from the 1D and 2D arrays and then sum over the appropriate axis:
>>> Ms = np.random.randn(4, 2, 3) # 4 arrays of size 2x3
>>> As = np.random.randn(4)
>>> np.sum(As[:, np.newaxis, np.newaxis] * Ms)
array([[-1.40199248, -0.40337845, -0.69986566],
[ 3.52724279, 0.19547118, 2.1485559 ]])
>>> sum(a*M for a, M in zip(As, Ms))
array([[-1.40199248, -0.40337845, -0.69986566],
[ 3.52724279, 0.19547118, 2.1485559 ]])
However, it's worth noting that np.einsum and np.tensordot are usually much more efficient:
>>> %timeit np.sum(As[:, np.newaxis, np.newaxis] * Ms, axis=0)
The slowest run took 7.38 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.58 µs per loop
>>> %timeit np.einsum('i,ijk->jk', As, Ms)
The slowest run took 19.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.44 µs per loop
And this is also true for larger numbers:
>>> Ms = np.random.randn(100, 200, 300)
>>> As = np.random.randn(100)
>>> %timeit np.einsum('i,ijk->jk', As, Ms)
100 loops, best of 3: 5.03 ms per loop
>>> %timeit np.sum(As[:, np.newaxis, np.newaxis] * Ms, axis=0)
100 loops, best of 3: 14.8 ms per loop
>>> %timeit np.tensordot(As,Ms,axes=(0,0))
100 loops, best of 3: 2.79 ms per loop
So np.tensordot works best in this case.
The only good reason to use np.sum and broadcasting is to make the code a little more readable (helps when you have small matrices).
Related
I want to compute the output error for a neural network for each input by compare output signal and its true output value so I need two matrix to compute this task.
I have output matrix in shape of (n*1) but in the label I just have the index of neuron that should be activated, so I need a matrix in the same shape with all element equal to zero except the one which it's index is equal to the label. I could do that with a function but I wonder is there a built in method in numpy python that can do that for me?
You can do that multiple ways using numpy or standard libraries, one way is to create an array of zeros, and set the value corresponding to index as 1.
n = len(result)
a = np.zeros((n,));
a[id] = 1
It probably is going to be the fastest one as well:
>> %timeit a = np.zeros((n,)); a[id] = 1
1000000 loops, best of 3: 634 ns per loop
Alternatively you can use numpy.pad to pad [ 1 ] array with zeros. But this will almost definitely will be slower due to padding logic.
np.lib.pad([1],(id,n-id),'constant', constant_values=(0))
As expected order of magnitude slower:
>> %timeit np.lib.pad([1],(id,n-id),'constant', constant_values=(0))
10000 loops, best of 3: 47.4 µs per loop
And you can try list comprehension as suggested by the comments:
results = [7]
np.matrix([1 if x == id else 0 for x in results])
But it is much slower than the first method as well:
>> %timeit np.matrix([1 if x == id else 0 for x in results])
100000 loops, best of 3: 7.25 µs per loop
Edit:
But in my opinion, if you want to compute the neural networks error. You should just use np.argmax and compute whether it was successful or not. That error calculation may give you more noise than it is useful. You can make a confusion matrix if you feel your network is prone to similarities.
A few other methods that also seem to be slower than #umutto's above:
%timeit a = np.zeros((n,)); a[id] = 1 #umutto's method
The slowest run took 45.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.53 µs per loop
Boolean construction:
%timeit a = np.arange(n) == id
The slowest run took 13.98 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.76 µs per loop
Boolean construction to integer:
%timeit a = (np.arange(n) == id).astype(int)
The slowest run took 15.31 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.47 µs per loop
List construction:
%timeit a = [0]*n; a[id] = 1; a=np.asarray(a)
10000 loops, best of 3: 77.3 µs per loop
Using scipy.sparse
%timeit a = sparse.coo_matrix(([1], ([id],[0])), shape=(n,1))
10000 loops, best of 3: 51.1 µs per loop
Now what's actually faster may depend on what's being cached, but it seems like constructing the zero array is probably fastest, especially if you can use np.zeros_like(result) instead of np.zeros(len(result))
One liner:
x = np.identity(n)[id]
I have a list of floats I get from a machine learning algorithm. All these floats are between 0 and 1:
probs = [proba[0] for proba in self.classifier.predict_proba(x_test)]
probs is my list of floats. The predict_proba() function normally returns a numpy array. It takes about 9 seconds to get the list, and the list finally contains about 60k values.
I would like to scale, or normalize, all the values in the list against the highest value in the list.
Normally, I would do that:
maximum = max(probs)
list_values = [proba / maximum for proba in probs]
But for 60k values, it takes about 2 minutes. I would like to make it shorter.
Do you have any idea about how I could attend better performances ?
If you don't mind using an external library, numpy might be worth looking into:
import numpy
probs = numpy.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
list_values = probs/maximum
Another approach using numpy, potentially faster if your list of probabilities is large, is to convert your whole probabilities to a numpy array, and then operate over it:
import numpy as np
probs = np.asarray(self.classifier.predict_proba(x_test))
list_values = probs[:, 0] / probs.max()
The first line will convert all your probabilities to a N x M array (where N is your samples and M your number of classes).
The second line will select all the probabilities for the first class ([:, 0] means all rows of the column 0, which yields a vector of size N) and divide it for the maximum.
You can potentially extend this to all your probabilities:
all_probs = probs / probs.max()
The above will normalize all your probabilities for all the classes. And later you can access them like all_probs[:, i] where i is the class of interest.
You should use Scikit learn's normalize.
from sklearn.preprocessing import normalize
If you want your end results to be numpy.array , then it would be to faster to convert your list to numpy array before hand and to use array division directly , than list comprehension. Example -
import numpy as np
probsnp = np.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
maximum = probs.max()
list_values = probs/maximum
Examples of timing tests -
In [46]: import numpy.random as ndr
In [47]: probs = ndr.random_sample(1000)
In [48]: probs.shape
Out[48]: (1000,)
In [49]: def func1(probs):
....: maximum = max(probs)
....: probsnew = [i/maximum for i in probs]
....: return probsnew
....:
In [50]: def func2(probs):
....: maximum = probs.max()
....: probsnew = probs/maximum
....: return probsnew
....:
In [51]: %timeit func1(probs)
The slowest run took 229.79 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 279 µs per loop
In [52]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [53]: %timeit func2(probs)
The slowest run took 356.45 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 81 µs per loop
In [54]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [55]: %timeit func2(probs)
10000 loops, best of 3: 81.5 µs per loop
The numpy method takes only 1/3rd time as that of list comprehension.
Timing tests with numpy.array() conversion as part of func2 (in above example) -
In [60]: probslist = [p for p in probs]
In [61]: def func2(probs):
....: probsnp = np,array(probs)
....: maxprobs = probsnp.max()
....: probsnew = probsnp/maxprobs
....: return probsnew
....:
In [65]: %timeit func1(probslist)
1000 loops, best of 3: 212 µs per loop
In [66]: %timeit func2(probslist)
10000 loops, best of 3: 198 µs per loop
In [67]: probs = ndr.random_sample(60000)
In [68]: probslist = [p for p in probs]
In [74]: %timeit func1(probslist)
100 loops, best of 3: 11.5 ms per loop
In [75]: %timeit func2(probslist)
100 loops, best of 3: 5.79 ms per loop
In [76]: %timeit func1(probslist)
100 loops, best of 3: 11.4 ms per loop
In [77]: %timeit func2(probslist)
100 loops, best of 3: 5.81 ms per loop
Seems like its still a little faster to use numpy array.
In Matlab, the builtin isequal does a check if two arrays are equal. If they are not equal, this might be very fast, as the implementation presumably stops checking as soon as there is a difference:
>> A = zeros(1e9, 1, 'single');
>> B = A(:);
>> B(1) = 1;
>> tic; isequal(A, B); toc;
Elapsed time is 0.000043 seconds.
Is there any equavalent in Python/numpy? all(A==B) or all(equal(A, B)) is far slower, because it compares all elements, even if the initial one differs:
In [13]: A = zeros(1e9, dtype='float32')
In [14]: B = A.copy()
In [15]: B[0] = 1
In [16]: %timeit all(A==B)
1 loops, best of 3: 612 ms per loop
Is there any numpy equivalent? It should be very easy to implement in C, but slow to implement in Python because this is a case where we do not want to broadcast, so it would require an explicit loop.
Edit:
It appears array_equal does what I want. However, it is not faster than all(A==B), because it's not a built-in, but just a short Python function doing A==B. So it does not meet my need for a fast check.
In [12]: %timeit array_equal(A, B)
1 loops, best of 3: 623 ms per loop
First, it should be noted that in the OP's example the arrays have identical elements because B=A[:] is just a view onto the array, so:
>>> print A[0], B[0]
1.0, 1.0
But, although the test isn't a fit one, the basic complaint is true: Numpy does not have a short-circuiting equivalency check.
One can easily see from the source that all of allclose, array_equal, and array_equiv are just variations upon all(A==B) to match their respective details, and are not notable faster.
An advantage of numpy though is that slices are just views, and are therefore very fast, so one could write their own short-circuiting comparison fairly easily (I'm not saying this is ideal, but it does work):
from numpy import *
A = zeros(1e8, dtype='float32')
B = A[:]
B[0] = 1
C = array(B)
C[0] = 2
D = array(A)
D[-1] = 2
def short_circuit_check(a, b, n):
L = len(a)/n
for i in range(n):
j = i*L
if not all(a[j:j+L]==b[j:j+L]):
return False
return True
In [26]: %timeit short_circuit_check(A, C, 100) # 100x faster
1000 loops, best of 3: 1.49 ms per loop
In [27]: %timeit all(A==C)
1 loops, best of 3: 158 ms per loop
In [28]: %timeit short_circuit_check(A, D, 100)
10 loops, best of 3: 144 ms per loop
In [29]: %timeit all(A==D)
10 loops, best of 3: 160 ms per loop
My code for slicing a numpy array (via fancy indexing) is very slow. It is currently a bottleneck in program.
a.shape
(3218, 6)
ts = time.time(); a[rows][:, cols]; te = time.time(); print('%.8f' % (te-ts));
0.00200009
What is the correct numpy call to get an array consisting of the subset of rows 'rows' and columns 'col' of the matrix a? (in fact, I need the transpose of this result)
Let my try to summarize the excellent answers by Jaime and TheodrosZelleke and mix in some comments.
Advanced (fancy) indexing always returns a copy, never a view.
a[rows][:,cols] implies two fancy indexing operations, so an intermediate copy a[rows] is created and discarded. Handy and readable, but not very efficient. Moreover beware that [:,cols] usually generates a Fortran contiguous copy form a C-cont. source.
a[rows.reshape(-1,1),cols] is a single advanced indexing expression basing on the fact that rows.reshape(-1,1) and cols are broadcast to the shape of the intended result.
A common experience is that indexing in a flattened array can be more efficient than fancy indexing, so another approach is
indx = rows.reshape(-1,1)*a.shape[1] + cols
a.take(indx)
or
a.take(indx.flat).reshape(rows.size,cols.size)
Efficiency will depend on memory access patterns and whether the starting array is C-countinous or Fortran continuous, so experimentation is needed.
Use fancy indexing only if really needed: basic slicing a[rstart:rstop:rstep, cstart:cstop:cstep] returns a view (although not continuous) and should be faster!
To my surprise this, kind of lenghty expression, which calculates first linear 1D-indices, is more than 50% faster than the consecutive array indexing presented in the question:
(a.ravel()[(
cols + (rows * a.shape[1]).reshape((-1,1))
).ravel()]).reshape(rows.size, cols.size)
UPDATE: OP updated the description of the shape of the initial array. With the updated size the speedup is now above 99%:
In [93]: a = np.random.randn(3218, 1415)
In [94]: rows = np.random.randint(a.shape[0], size=2000)
In [95]: cols = np.random.randint(a.shape[1], size=6)
In [96]: timeit a[rows][:, cols]
10 loops, best of 3: 186 ms per loop
In [97]: timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
1000 loops, best of 3: 1.56 ms per loop
INITAL ANSWER:
Here is the transcript:
In [79]: a = np.random.randn(3218, 6)
In [80]: a.shape
Out[80]: (3218, 6)
In [81]: rows = np.random.randint(a.shape[0], size=2000)
In [82]: cols = np.array([1,3,4,5])
Time method 1:
In [83]: timeit a[rows][:, cols]
1000 loops, best of 3: 1.26 ms per loop
Time method 2:
In [84]: timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
1000 loops, best of 3: 568 us per loop
Check that results are actually the same:
In [85]: result1 = a[rows][:, cols]
In [86]: result2 = (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
In [87]: np.sum(result1 - result2)
Out[87]: 0.0
You can get some speed up if you slice using fancy indexing and broadcasting:
from __future__ import division
import numpy as np
def slice_1(a, rs, cs) :
return a[rs][:, cs]
def slice_2(a, rs, cs) :
return a[rs[:, None], cs]
>>> rows, cols = 3218, 6
>>> rs = np.unique(np.random.randint(0, rows, size=(rows//2,)))
>>> cs = np.unique(np.random.randint(0, cols, size=(cols//2,)))
>>> a = np.random.rand(rows, cols)
>>> import timeit
>>> print timeit.timeit('slice_1(a, rs, cs)',
'from __main__ import slice_1, a, rs, cs',
number=1000)
0.24083110865
>>> print timeit.timeit('slice_2(a, rs, cs)',
'from __main__ import slice_2, a, rs, cs',
number=1000)
0.206566124519
If you think in term of percentages, doing something 15% faster is always good, but in my system, for the size of your array, this is taking 40 us less to do the slicing, and it is hard to believe that an operation taking 240 us will be your bottleneck.
Using np.ix_ you can a similar speed to ravel/reshape, but with code that is more clear:
a = np.random.randn(3218, 1415)
rows = np.random.randint(a.shape[0], size=2000)
cols = np.random.randint(a.shape[1], size=6)
a = np.random.randn(3218, 1415)
rows = np.random.randint(a.shape[0], size=2000)
cols = np.random.randint(a.shape[1], size=6)
%timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
#101 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit ix_ = np.ix_(rows, cols); a[ix_]
#135 µs ± 7.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ix_ = np.ix_(rows, cols)
result1 = a[ix_]
result2 = (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
np.sum(result1 - result2)
0.0
I have this array
A = array([[-0.49740509, -0.48618909, -0.49145315],
[-0.48959259, -0.48618909, -0.49145315],
[-0.49740509, -0.47837659, -0.49145315],
...,
[ 0.03079315, -0.01194593, -0.06872366],
[ 0.03054901, -0.01170179, -0.06872366],
[ 0.03079315, -0.01170179, -0.06872366]])
which is a collection of 3D vector. I was wondering if I could use a vectorial operation to get an array with the norm of each of my vector.
I tried with norm(A) but it didn't work.
Doing it manually might be fastest (although there's always some neat trick someone posts I didn't think of):
In [75]: from numpy import random, array
In [76]: from numpy.linalg import norm
In [77]:
In [77]: A = random.rand(1000,3)
In [78]: timeit normedA_0 = array([norm(v) for v in A])
100 loops, best of 3: 16.5 ms per loop
In [79]: timeit normedA_1 = array(map(norm, A))
100 loops, best of 3: 16.9 ms per loop
In [80]: timeit normedA_2 = map(norm, A)
100 loops, best of 3: 16.7 ms per loop
In [81]: timeit normedA_4 = (A*A).sum(axis=1)**0.5
10000 loops, best of 3: 46.2 us per loop
This assumes everything's real. Could multiply by the conjugate instead if that's not true.
Update: Eric's suggestion of using math.sqrt won't work -- it doesn't handle numpy arrays -- but the idea of using sqrt instead of **0.5 is a good one, so let's test it.
In [114]: timeit normedA_4 = (A*A).sum(axis=1)**0.5
10000 loops, best of 3: 46.2 us per loop
In [115]: from numpy import sqrt
In [116]: timeit normedA_4 = sqrt((A*A).sum(axis=1))
10000 loops, best of 3: 45.8 us per loop
I tried it a few times, and this was the largest difference I saw.
just had the same prob, maybe late to answer but this should help others.
You can use the axis argument in the norm function
norm(A, axis=1)
Having never used numpy, I'm going to guess:
normedA = array(norm(v) for v in A)
How about this method?
Also you might want to add a [numpy] tag to the post.