My code for slicing a numpy array (via fancy indexing) is very slow. It is currently a bottleneck in my program.
a.shape
(3218, 6)
ts = time.time(); a[rows][:, cols]; te = time.time(); print('%.8f' % (te-ts));
0.00200009
What is the correct numpy call to get an array consisting of the subset of rows 'rows' and columns 'cols' of the matrix a? (In fact, I need the transpose of this result.)
Let me try to summarize the excellent answers by Jaime and TheodrosZelleke and mix in some comments.
Advanced (fancy) indexing always returns a copy, never a view.
a[rows][:,cols] implies two fancy indexing operations, so an intermediate copy a[rows] is created and discarded. Handy and readable, but not very efficient. Moreover, beware that [:,cols] usually generates a Fortran-contiguous copy from a C-contiguous source.
a[rows.reshape(-1,1),cols] is a single advanced indexing expression, based on the fact that rows.reshape(-1,1) and cols are broadcast to the shape of the intended result.
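For concreteness, a minimal sketch of this single-expression form (array and index names follow the question; the sizes are assumptions):
import numpy as np
a = np.random.randn(3218, 6)
rows = np.random.randint(a.shape[0], size=2000)
cols = np.array([1, 3, 4, 5])
# rows.reshape(-1,1) has shape (2000, 1) and cols has shape (4,),
# so together they broadcast to the (2000, 4) shape of the result.
sub = a[rows.reshape(-1, 1), cols]
assert np.array_equal(sub, a[rows][:, cols])  # same values, but a single copy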
A common experience is that indexing in a flattened array can be more efficient than fancy indexing, so another approach is
indx = rows.reshape(-1,1)*a.shape[1] + cols
a.take(indx)
or
a.take(indx.flat).reshape(rows.size,cols.size)
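A quick check (a sketch, reusing a, rows and cols from above) that both take variants reproduce the broadcast result:
indx = rows.reshape(-1, 1) * a.shape[1] + cols
flat1 = a.take(indx)                                     # 2D indices give a 2D result
flat2 = a.take(indx.flat).reshape(rows.size, cols.size)  # 1D indices, then reshape
assert np.array_equal(flat1, a[rows.reshape(-1, 1), cols])
assert np.array_equal(flat1, flat2)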
Efficiency will depend on memory access patterns and whether the starting array is C-contiguous or Fortran-contiguous, so experimentation is needed.
Use fancy indexing only if really needed: basic slicing a[rstart:rstop:rstep, cstart:cstop:cstep] returns a view (although not contiguous) and should be faster!
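One way to see the view/copy distinction directly (a sketch; np.shares_memory is available in NumPy 1.11+):
b = a[10:100:2, 1:5]                  # basic slicing: a view, no data copied
print(np.shares_memory(a, b))         # True
c = a[rows.reshape(-1, 1), cols]      # fancy indexing: always a copy
print(np.shares_memory(a, c))         # False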
To my surprise, this somewhat lengthy expression, which first calculates linear 1D indices, is more than 50% faster than the consecutive array indexing presented in the question:
(a.ravel()[(
cols + (rows * a.shape[1]).reshape((-1,1))
).ravel()]).reshape(rows.size, cols.size)
UPDATE: OP updated the description of the shape of the initial array. With the updated size the speedup is now above 99%:
In [93]: a = np.random.randn(3218, 1415)
In [94]: rows = np.random.randint(a.shape[0], size=2000)
In [95]: cols = np.random.randint(a.shape[1], size=6)
In [96]: timeit a[rows][:, cols]
10 loops, best of 3: 186 ms per loop
In [97]: timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
1000 loops, best of 3: 1.56 ms per loop
INITIAL ANSWER:
Here is the transcript:
In [79]: a = np.random.randn(3218, 6)
In [80]: a.shape
Out[80]: (3218, 6)
In [81]: rows = np.random.randint(a.shape[0], size=2000)
In [82]: cols = np.array([1,3,4,5])
Time method 1:
In [83]: timeit a[rows][:, cols]
1000 loops, best of 3: 1.26 ms per loop
Time method 2:
In [84]: timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
1000 loops, best of 3: 568 µs per loop
Check that results are actually the same:
In [85]: result1 = a[rows][:, cols]
In [86]: result2 = (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
In [87]: np.sum(result1 - result2)
Out[87]: 0.0
You can get some speed up if you slice using fancy indexing and broadcasting:
from __future__ import division
import numpy as np
def slice_1(a, rs, cs):
    return a[rs][:, cs]

def slice_2(a, rs, cs):
    return a[rs[:, None], cs]
>>> rows, cols = 3218, 6
>>> rs = np.unique(np.random.randint(0, rows, size=(rows//2,)))
>>> cs = np.unique(np.random.randint(0, cols, size=(cols//2,)))
>>> a = np.random.rand(rows, cols)
>>> import timeit
>>> print timeit.timeit('slice_1(a, rs, cs)',
'from __main__ import slice_1, a, rs, cs',
number=1000)
0.24083110865
>>> print timeit.timeit('slice_2(a, rs, cs)',
'from __main__ import slice_2, a, rs, cs',
number=1000)
0.206566124519
If you think in terms of percentages, doing something 15% faster is always good, but on my system, for the size of your array, this takes 40 µs less to do the slicing, and it is hard to believe that an operation taking 240 µs will be your bottleneck.
Using np.ix_ you can get similar speed to ravel/reshape, but with code that is clearer:
a = np.random.randn(3218, 1415)
rows = np.random.randint(a.shape[0], size=2000)
cols = np.random.randint(a.shape[1], size=6)
%timeit (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
#101 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit ix_ = np.ix_(rows, cols); a[ix_]
#135 µs ± 7.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ix_ = np.ix_(rows, cols)
result1 = a[ix_]
result2 = (a.ravel()[(cols + (rows * a.shape[1]).reshape((-1,1))).ravel()]).reshape(rows.size, cols.size)
np.sum(result1 - result2)
0.0
Related
I'm wondering what the most efficient way is to do an argsort of an array given a condition, while preserving the original indices.
import numpy as np
x = np.array([0.63, 0.5, 0.7, 0.65])
np.argsort(x)
Out[99]: array([1, 0, 3, 2])
I want to argsort this array with the condition that x > 0.6. Since 0.5 < 0.6, index 1 should not be included.
x = np.array([0.63, 0.5, 0.7, 0.65])
index = x.argsort()
list(filter(lambda i: x[i] > 0.6, index))
[0,3,2]
This is inefficient since it's not vectorized.
EDIT:
The filter will eliminate most of the elements. So ideally, it would filter first, then sort, while preserving the original indices.
Method 1 (Same idea as Tai's method but using integer indexing)
Too late to the party too; if my solution is a repeat of an already posted solution, ping me and I will delete it.
def meth_agn_v1(x, thresh):
    idx = np.arange(x.size)[x > thresh]
    return idx[np.argsort(x[idx])]
Then,
In [143]: meth_agn_v1(x, 0.5)
Out[143]: array([0, 3, 2])
Method 2 (significant performance improvement)
This uses the same idea expressed in the last section of my answer (comparison to Tai's method) that integer indexing is faster than boolean indexing (for small number of expected elements to be selected) and avoids creating an initial index at all.
def meth_agn_v2(x, thresh):
    idx, = np.where(x > thresh)
    return idx[np.argsort(x[idx])]
Timing
In [144]: x = np.random.rand(100000)
In [145]: timeit meth_jp(x, 0.99)
100 loops, best of 3: 7.43 ms per loop
In [146]: timeit meth_alex(x, 0.99)
1000 loops, best of 3: 498 µs per loop
In [147]: timeit meth_tai(x, 0.99)
1000 loops, best of 3: 298 µs per loop
In [148]: timeit meth_agn_v1(x, 0.99)
1000 loops, best of 3: 232 µs per loop
In [161]: timeit meth_agn_v2(x, 0.99)
10000 loops, best of 3: 95 µs per loop
Comparison of v1 to Tai's method
My first version of the answer is very similar to Tai's answer but not identical.
Tai's method as published originally:
def meth_tai(x, thresh):
    y = np.arange(x.shape[0])
    y = y[x > thresh]
    x = x[x > thresh]  # x = x[y] is used in my method
    y[np.argsort(x)]
So, my method is different in using integer array indexing instead of the boolean indexing used by Tai. For a small number of selected elements integer indexing is faster than boolean indexing making this method more efficient than Tai's method even after Tai optimized his code.
I come a bit late to the party. The idea is that we can sort one array based on the sorted indices of another array.
y = np.arange(x.shape[0]) # y for preserving the indices
mask = x > thresh
y = y[mask]
x = x[mask]
ans = y[np.argsort(x)] # change order of y based on sorted indices of x
The method is to add an array y just for recording the indices of x. We then filter both arrays with the boolean index x > thresh. Then we sort x with argsort. Finally, we use the indices returned by argsort to change the order of y!
Method 1 (@jp_data_analysis' answer)
You should use this one unless you have a reason not to: since argsort orders ascending, the (x <= thresh).sum() elements at or below the threshold occupy the first positions of the result and can simply be dropped.
def meth1(x, thresh):
    return np.argsort(x)[(x <= thresh).sum():]
Method 2
If the filter will greatly reduce the number of elements in the array and the array is large, then the following may help:
def meth2(x, thresh):
    m = x > thresh
    idxs = np.argsort(x[m])
    offsets = (~m).cumsum()
    return idxs + offsets[m][idxs]
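A quick sanity check on the example array from the question (a sketch, reusing meth1 and meth2 from above):
x = np.array([0.63, 0.5, 0.7, 0.65])
print(meth1(x, 0.6))   # [0 3 2]
print(meth2(x, 0.6))   # [0 3 2]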
Speed comparison
x = np.random.rand(10000000)
%timeit meth1(x, 0.99)
# 2.81 s ± 244 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit meth2(x, 0.99)
# 104 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Here is one more hackish approach, which modifies the original array in place with some arbitrary large number that is unlikely to occur in the original array.
In [50]: x = np.array([0.63, 0.5, 0.7, 0.65])
In [51]: invmask = ~(x > 0.6)
# replace it with some huge number which will not occur in your original array
In [52]: x[invmask] = 9999.0
In [53]: np.argsort(x)[:-sum(invmask)]
Out[53]: array([0, 3, 2])
So I have a (seemingly) simple problem, which I am currently solving via a for loop.
Basically, I want to increment specific cells in a numpy matrix, but I want to do it without a for-loop if possible.
To give more details: I have a 100 x 100 numpy matrix X. I also have a 2 x 1000 numpy matrix P. P just stores indices into X; for example, each column of P has the row-column index of a cell that I want to increment in X.
What I do right now is this:
for p in range(P.shape[1]):
    X[P[0,p], P[1,p]] += 1
My question is, is there a way to do this without a for-loop?
Thanks!
Use the at method of the add ufunc with advanced indexing:
numpy.add.at(X, (P[0], P[1]), 1)
or just advanced indexing if P is guaranteed to never select the same cell of X twice:
X[P[0], P[1]] += 1
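The caveat matters because with +=, repeated indices are buffered and each cell is incremented only once; a minimal sketch of the difference (hypothetical small example):
import numpy as np
P = np.array([[0, 0], [1, 1]])    # the cell (0, 1) appears twice
X = np.zeros((3, 3), dtype=int)
X[P[0], P[1]] += 1
print(X[0, 1])                    # 1 -- the repeated index only counts once
X = np.zeros((3, 3), dtype=int)
np.add.at(X, (P[0], P[1]), 1)
print(X[0, 1])                    # 2 -- every occurrence is counted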
Using linear-indices and bincount -
lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
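A quick check (a sketch) that the bincount route agrees with np.add.at, including repeated indices:
X1 = np.zeros((100, 100), dtype=int)
X2 = X1.copy()
P = np.random.randint(0, 100, (2, 1000))   # repeated cells are likely here
np.add.at(X1, (P[0], P[1]), 1)
lidx = np.ravel_multi_index(P, X2.shape)
X2 += np.bincount(lidx, minlength=X2.size).reshape(X2.shape)
assert np.array_equal(X1, X2)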
Benchmarking
For the case when indices are not repeated, the advanced indexing based approach suggested in @user2357112's post seems to be very efficient.
For the repeated-indices case, we have np.add.at and np.bincount, and the performance numbers seem to depend on the size of the indices array relative to the size of the input array.
Approaches -
def app0(X, P):  # @user2357112's soln
    np.add.at(X, (P[0], P[1]), 1)

def app1(X, P):  # proposed in this post
    lidx = np.ravel_multi_index(P, X.shape)
    X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Here are a few timing tests to suggest that -
Case #1 :
In [141]: X = np.random.randint(0,9,(100,100))
...: P = np.random.randint(0,100,(2,1000))
...:
In [142]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
10000 loops, best of 3: 68.9 µs per loop
100000 loops, best of 3: 15.1 µs per loop
Case #2 :
In [143]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,10000))
...:
In [144]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
1000 loops, best of 3: 687 µs per loop
1000 loops, best of 3: 1.48 ms per loop
Case #3 :
In [145]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,100000))
...:
In [146]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
100 loops, best of 3: 11.3 ms per loop
100 loops, best of 3: 2.51 ms per loop
Suppose I have the following Numpy array, in which I have one and only one continuous slice of 1s:
import numpy as np
x = np.array([0,0,0,0,1,1,1,0,0,0], dtype=int)
and I want to find the index of the 1D center of mass of the 1 elements. I could type the following:
idx = np.where( x )[0]
idx_center_of_mass = int(0.5*(idx.max() + idx.min()))
# this would give 5
(Of course this would lead to a rough approximation when the number of elements of the 1s slice is even.)
Is there any better way to do this, like a computationally more efficient one-liner?
Can't you simply do the following?
center_of_mass = (x*np.arange(len(x))).sum()/x.sum() # 5
%timeit center_of_mass = (x*np.arange(len(x))).sum()/x.sum()
# 100000 loops, best of 3: 10.4 µs per loop
As one approach we can get the non-zero indices and get the mean of those as the center of mass, like so -
np.flatnonzero(x).mean()
Here's another approach using shifted array comparison to get the start and stop indices of that slice and getting the mean of those indices for determining the center of mass, like so -
np.flatnonzero(x[:-1] != x[1:]).mean()+0.5
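On the question's array both approaches give 5.0 (a quick check):
x = np.array([0,0,0,0,1,1,1,0,0,0])
print(np.flatnonzero(x).mean())                      # 5.0, mean of [4, 5, 6]
print(np.flatnonzero(x[:-1] != x[1:]).mean() + 0.5)  # 5.0, mean of [3, 6] plus 0.5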
Runtime test -
In [72]: x = np.zeros(10000,dtype=int)
In [73]: x[100:2000] = 1
In [74]: %timeit np.flatnonzero(x).mean()
10000 loops, best of 3: 115 µs per loop
In [75]: %timeit np.flatnonzero(x[:-1] != x[1:]).mean()+0.5
10000 loops, best of 3: 38.7 µs per loop
We can improve the performance by some margin here with the use of np.nonzero()[0] to replace np.flatnonzero and np.sum in place of np.mean -
In [107]: %timeit (np.nonzero(x[:-1] != x[1:])[0].sum()+1)/2.0
10000 loops, best of 3: 30.6 µs per loop
Alternatively, for the second approach, we can store the start and stop indices and then simply add them to get the center of mass for a bit more efficient approach as we would avoid the function call to np.mean, like so -
start,stop = np.flatnonzero(x[:-1] != x[1:])
out = (stop + start + 1)/2.0
Timings -
In [90]: %timeit start,stop = np.flatnonzero(x[:-1] != x[1:])
10000 loops, best of 3: 21.3 µs per loop
In [91]: %timeit (stop + start + 1)/2.0
100000 loops, best of 3: 4.45 µs per loop
Again, we can experiment with np.nonzero()[0] here.
Say I have two matrices B and M and I want to execute the following statement:
B += 3*M
I execute this instruction repeatedly, so I don't want to build the matrix 3*M each time (3 may change; it is just to make clear that I only do a scalar-matrix product). Is there a numpy function which makes this computation "in place"?
More precisely, I have a list of scalars As and a list of matrices Ms, and I would like to perform the "dot product" (which is not really one, since the two operands are of different types) of the two, that is to say:
sum(a*M for a, M in zip(As, Ms))
The np.dot function does not do what I expect...
You can use np.tensordot -
np.tensordot(As,Ms,axes=(0,0))
Or np.einsum -
np.einsum('i,ijk->jk',As,Ms)
Sample run -
In [41]: As = [2,5,6]
In [42]: Ms = [np.random.rand(2,3),np.random.rand(2,3),np.random.rand(2,3)]
In [43]: sum(a*M for a, M in zip(As, Ms))
Out[43]:
array([[ 6.79630284, 5.04212877, 10.76217631],
[ 4.91927651, 1.98115548, 6.13705742]])
In [44]: np.tensordot(As,Ms,axes=(0,0))
Out[44]:
array([[ 6.79630284, 5.04212877, 10.76217631],
[ 4.91927651, 1.98115548, 6.13705742]])
In [45]: np.einsum('i,ijk->jk',As,Ms)
Out[45]:
array([[ 6.79630284, 5.04212877, 10.76217631],
[ 4.91927651, 1.98115548, 6.13705742]])
Another way you could do this, particularly if you favour readability, is to make use of broadcasting.
So you could make a 3D array from the 1D and 2D arrays and then sum over the appropriate axis:
>>> Ms = np.random.randn(4, 2, 3) # 4 arrays of size 2x3
>>> As = np.random.randn(4)
>>> np.sum(As[:, np.newaxis, np.newaxis] * Ms, axis=0)
array([[-1.40199248, -0.40337845, -0.69986566],
[ 3.52724279, 0.19547118, 2.1485559 ]])
>>> sum(a*M for a, M in zip(As, Ms))
array([[-1.40199248, -0.40337845, -0.69986566],
[ 3.52724279, 0.19547118, 2.1485559 ]])
However, it's worth noting that np.einsum and np.tensordot are usually much more efficient:
>>> %timeit np.sum(As[:, np.newaxis, np.newaxis] * Ms, axis=0)
The slowest run took 7.38 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.58 µs per loop
>>> %timeit np.einsum('i,ijk->jk', As, Ms)
The slowest run took 19.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.44 µs per loop
And this is also true for larger numbers:
>>> Ms = np.random.randn(100, 200, 300)
>>> As = np.random.randn(100)
>>> %timeit np.einsum('i,ijk->jk', As, Ms)
100 loops, best of 3: 5.03 ms per loop
>>> %timeit np.sum(As[:, np.newaxis, np.newaxis] * Ms, axis=0)
100 loops, best of 3: 14.8 ms per loop
>>> %timeit np.tensordot(As,Ms,axes=(0,0))
100 loops, best of 3: 2.79 ms per loop
So np.tensordot works best in this case.
The only good reason to use np.sum and broadcasting is to make the code a little more readable (helps when you have small matrices).
I have a list of floats I get from a machine learning algorithm. All these floats are between 0 and 1:
probs = [proba[0] for proba in self.classifier.predict_proba(x_test)]
probs is my list of floats. The predict_proba() function normally returns a numpy array. It takes about 9 seconds to get the list, and the list finally contains about 60k values.
I would like to scale, or normalize, all the values in the list against the highest value in the list.
Normally, I would do that:
maximum = max(probs)
list_values = [proba / maximum for proba in probs]
But for 60k values, it takes about 2 minutes. I would like to make it shorter.
Do you have any idea how I could achieve better performance?
If you don't mind using an external library, numpy might be worth looking into:
import numpy
probs = numpy.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
maximum = probs.max()
list_values = probs/maximum
Another approach using numpy, potentially faster if your list of probabilities is large, is to convert your whole probabilities to a numpy array, and then operate over it:
import numpy as np
probs = np.asarray(self.classifier.predict_proba(x_test))
list_values = probs[:, 0] / probs.max()
The first line will convert all your probabilities to a N x M array (where N is your samples and M your number of classes).
The second line will select all the probabilities for the first class ([:, 0] means all rows of column 0, which yields a vector of size N) and divide it by the maximum.
You can potentially extend this to all your probabilities:
all_probs = probs / probs.max()
The above will normalize all your probabilities for all the classes. And later you can access them like all_probs[:, i] where i is the class of interest.
You should use scikit-learn's normalize.
from sklearn.preprocessing import normalize
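For instance (a sketch; normalize expects a 2D array, and norm='max' divides each row by its maximum absolute value, which matches the scaling asked for here):
import numpy as np
from sklearn.preprocessing import normalize
probs = np.array([0.2, 0.4, 0.8])
scaled = normalize(probs.reshape(1, -1), norm='max')[0]
print(scaled)   # [0.25 0.5  1.  ]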
If you want your end result to be a numpy.array, then it would be faster to convert your list to a numpy array beforehand and use array division directly, rather than a list comprehension. Example -
import numpy as np
probsnp = np.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
maximum = probsnp.max()
list_values = probsnp/maximum
Examples of timing tests -
In [46]: import numpy.random as ndr
In [47]: probs = ndr.random_sample(1000)
In [48]: probs.shape
Out[48]: (1000,)
In [49]: def func1(probs):
....: maximum = max(probs)
....: probsnew = [i/maximum for i in probs]
....: return probsnew
....:
In [50]: def func2(probs):
....: maximum = probs.max()
....: probsnew = probs/maximum
....: return probsnew
....:
In [51]: %timeit func1(probs)
The slowest run took 229.79 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 279 µs per loop
In [52]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [53]: %timeit func2(probs)
The slowest run took 356.45 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 81 µs per loop
In [54]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [55]: %timeit func2(probs)
10000 loops, best of 3: 81.5 µs per loop
The numpy method takes only about a third of the time of the list comprehension.
Timing tests with numpy.array() conversion as part of func2 (in above example) -
In [60]: probslist = [p for p in probs]
In [61]: def func2(probs):
....: probsnp = np.array(probs)
....: maxprobs = probsnp.max()
....: probsnew = probsnp/maxprobs
....: return probsnew
....:
In [65]: %timeit func1(probslist)
1000 loops, best of 3: 212 µs per loop
In [66]: %timeit func2(probslist)
10000 loops, best of 3: 198 µs per loop
In [67]: probs = ndr.random_sample(60000)
In [68]: probslist = [p for p in probs]
In [74]: %timeit func1(probslist)
100 loops, best of 3: 11.5 ms per loop
In [75]: %timeit func2(probslist)
100 loops, best of 3: 5.79 ms per loop
In [76]: %timeit func1(probslist)
100 loops, best of 3: 11.4 ms per loop
In [77]: %timeit func2(probslist)
100 loops, best of 3: 5.81 ms per loop
It seems like it's still a little faster to use a numpy array.