I have a list of floats I get from a machine learning algorithm. All these floats are between 0 and 1:
probs = [proba[0] for proba in self.classifier.predict_proba(x_test)]
probs is my list of floats. The predict_proba() function normally returns a numpy array. It takes about 9 seconds to get the list, and the resulting list contains about 60k values.
I would like to scale, or normalize, all the values in the list against the highest value in the list.
Normally, I would do this:
maximum = max(probs)
list_values = [proba / maximum for proba in probs]
But for 60k values, it takes about 2 minutes. I would like to make it shorter.
Do you have any idea how I could achieve better performance?
If you don't mind using an external library, numpy might be worth looking into:
import numpy
probs = numpy.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
maximum = probs.max()
list_values = probs / maximum
Another approach using numpy, potentially faster if your list of probabilities is large, is to convert the whole probability matrix to a numpy array and then operate on it:
import numpy as np
probs = np.asarray(self.classifier.predict_proba(x_test))
list_values = probs[:, 0] / probs.max()
The first line will convert all your probabilities to an N x M array (where N is the number of samples and M the number of classes).
The second line selects all the probabilities of the first class ([:, 0] means all rows of column 0, which yields a vector of size N) and divides it by the maximum. Note that probs.max() is the maximum over the whole array; if you want the maximum of the first class only, use probs[:, 0].max() instead.
You can potentially extend this to all your probabilities:
all_probs = probs / probs.max()
The above will normalize all your probabilities for all the classes. And later you can access them like all_probs[:, i] where i is the class of interest.
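For concreteness, here is a minimal self-contained sketch of this approach, using random numbers as a stand-in for the predict_proba() output (the 60000 x 3 shape is an assumption):
import numpy as np

# random stand-in for self.classifier.predict_proba(x_test): 60k samples, 3 classes
probs = np.random.rand(60000, 3)

all_probs = probs / probs.max()   # scale every entry by the global maximum
class_0 = all_probs[:, 0]         # probabilities of class 0, shape (60000,)

print(class_0.shape)              # (60000,)
print(all_probs.max())            # 1.0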
You should use scikit-learn's normalize.
from sklearn.preprocessing import normalize
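The answer above only gives the import; a minimal sketch of how normalize could be applied here is below. It reshapes the probabilities into a single row and uses norm='max', which divides each sample by its maximum absolute value (for non-negative probabilities this is the same as dividing by the maximum); the random data is a placeholder:
import numpy as np
from sklearn.preprocessing import normalize

probs = np.random.rand(60000)                            # placeholder probabilities
scaled = normalize(probs.reshape(1, -1), norm='max')[0]  # one row, scaled by its max

# for non-negative values this matches plain division by the maximum
assert np.allclose(scaled, probs / probs.max())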
If you want your end result to be a numpy.array, then it would be faster to convert your list to a numpy array beforehand and use array division directly, rather than a list comprehension. Example -
import numpy as np
probsnp = np.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
maximum = probsnp.max()
list_values = probsnp/maximum
Examples of timing tests -
In [46]: import numpy.random as ndr
In [47]: probs = ndr.random_sample(1000)
In [48]: probs.shape
Out[48]: (1000,)
In [49]: def func1(probs):
....:     maximum = max(probs)
....:     probsnew = [i/maximum for i in probs]
....:     return probsnew
....:
In [50]: def func2(probs):
....:     maximum = probs.max()
....:     probsnew = probs/maximum
....:     return probsnew
....:
In [51]: %timeit func1(probs)
The slowest run took 229.79 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 279 µs per loop
In [52]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [53]: %timeit func2(probs)
The slowest run took 356.45 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 81 µs per loop
In [54]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [55]: %timeit func2(probs)
10000 loops, best of 3: 81.5 µs per loop
The numpy method takes only about a third of the time of the list comprehension.
Timing tests with the numpy.array() conversion included as part of func2 (in the above example) -
In [60]: probslist = [p for p in probs]
In [61]: def func2(probs):
....:     probsnp = np.array(probs)
....:     maxprobs = probsnp.max()
....:     probsnew = probsnp/maxprobs
....:     return probsnew
....:
In [65]: %timeit func1(probslist)
1000 loops, best of 3: 212 µs per loop
In [66]: %timeit func2(probslist)
10000 loops, best of 3: 198 µs per loop
In [67]: probs = ndr.random_sample(60000)
In [68]: probslist = [p for p in probs]
In [74]: %timeit func1(probslist)
100 loops, best of 3: 11.5 ms per loop
In [75]: %timeit func2(probslist)
100 loops, best of 3: 5.79 ms per loop
In [76]: %timeit func1(probslist)
100 loops, best of 3: 11.4 ms per loop
In [77]: %timeit func2(probslist)
100 loops, best of 3: 5.81 ms per loop
It seems like it's still a little faster to use a numpy array.
So I have a (seemingly) simple problem, which I am currently solving with a for loop.
Basically, I want to increment specific cells in a numpy matrix, but I want to do it without a for-loop if possible.
To give more details: I have a 100 x 100 numpy matrix, X. I also have a 2 x 1000 numpy matrix, P. P just stores indices into X; for example, each column of P holds the row-column index of a cell that I want to increment in X.
What I do right now is this:
for p in range(P.shape[1]):
    X[P[0,p], P[1,p]] += 1
My question is, is there a way to do this without a for-loop?
Thanks!
Use the at method of the add ufunc with advanced indexing:
numpy.add.at(X, (P[0], P[1]), 1)
or just advanced indexing if P is guaranteed to never select the same cell of X twice:
X[P[0], P[1]] += 1
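To make the duplicate-index caveat concrete, here is a small illustration with toy data (not taken from the question):
import numpy as np

X = np.zeros((3, 3), dtype=int)
P = np.array([[0, 0, 1],    # row indices: cell (0, 0) appears twice
              [0, 0, 2]])   # column indices

np.add.at(X, (P[0], P[1]), 1)
print(X[0, 0])              # 2 -- np.add.at accumulates repeated indices

Y = np.zeros((3, 3), dtype=int)
Y[P[0], P[1]] += 1
print(Y[0, 0])              # 1 -- buffered fancy-index assignment counts the duplicate only once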
Using linear-indices and bincount -
lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Benchmarking
For the case when indices are not repeated, the advanced-indexing-based approach suggested in @user2357112's post seems to be very efficient.
For the case with repeated indices, we have np.add.at and np.bincount, and the performance numbers seem to depend on the size of the indices array relative to the size of the input array.
Approaches -
def app0(X, P):  # @user2357112's soln1
    np.add.at(X, (P[0], P[1]), 1)

def app1(X, P):  # proposed in this post
    lidx = np.ravel_multi_index(P, X.shape)
    X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Here are a few timing tests to back that up -
Case #1 :
In [141]: X = np.random.randint(0,9,(100,100))
...: P = np.random.randint(0,100,(2,1000))
...:
In [142]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
10000 loops, best of 3: 68.9 µs per loop
100000 loops, best of 3: 15.1 µs per loop
Case #2 :
In [143]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,10000))
...:
In [144]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
1000 loops, best of 3: 687 µs per loop
1000 loops, best of 3: 1.48 ms per loop
Case #3 :
In [145]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,100000))
...:
In [146]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
100 loops, best of 3: 11.3 ms per loop
100 loops, best of 3: 2.51 ms per loop
I want to compute the output error of a neural network for each input by comparing the output signal with its true output value, so I need two matrices for this task.
I have an output matrix of shape (n*1), but in the label I just have the index of the neuron that should be activated, so I need a matrix of the same shape with all elements equal to zero except the one whose index is equal to the label. I could do that with a function, but I wonder: is there a built-in method in numpy that can do that for me?
You can do that in multiple ways using numpy or the standard library; one way is to create an array of zeros and set the value at the given index to 1.
n = len(result)
a = np.zeros((n,));
a[id] = 1
It probably is going to be the fastest one as well:
>> %timeit a = np.zeros((n,)); a[id] = 1
1000000 loops, best of 3: 634 ns per loop
Alternatively, you can use numpy.pad to pad a [ 1 ] array with zeros. But this will almost definitely be slower due to the padding logic (pad with id zeros before the 1 and n-id-1 zeros after it to get a length-n result).
np.lib.pad([1], (id, n-id-1), 'constant', constant_values=(0))
As expected, an order of magnitude slower:
>> %timeit np.lib.pad([1], (id, n-id-1), 'constant', constant_values=(0))
10000 loops, best of 3: 47.4 µs per loop
And you can try list comprehension as suggested by the comments:
results = [7]
np.matrix([1 if x == id else 0 for x in results])
But it is much slower than the first method as well:
>> %timeit np.matrix([1 if x == id else 0 for x in results])
100000 loops, best of 3: 7.25 µs per loop
Edit:
But in my opinion, if you want to compute the neural network's error, you should just use np.argmax and check whether the prediction was correct or not. That error calculation may give you more noise than useful signal. You can build a confusion matrix if you feel your network is prone to confusing similar classes.
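As a rough sketch of that suggestion (the arrays below are made-up examples, not from the question), the batch accuracy can be computed with np.argmax like this:
import numpy as np

# hypothetical network outputs (one row per input) and the true class labels
outputs = np.array([[0.1, 0.7, 0.2],
                    [0.8, 0.1, 0.1]])
labels = np.array([1, 2])

predictions = outputs.argmax(axis=1)        # index of the most active output neuron
accuracy = (predictions == labels).mean()   # fraction of correct predictions
print(predictions, accuracy)                # [1 0] 0.5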
A few other methods that also seem to be slower than @umutto's above:
%timeit a = np.zeros((n,)); a[id] = 1 # @umutto's method
The slowest run took 45.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.53 µs per loop
Boolean construction:
%timeit a = np.arange(n) == id
The slowest run took 13.98 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.76 µs per loop
Boolean construction to integer:
%timeit a = (np.arange(n) == id).astype(int)
The slowest run took 15.31 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.47 µs per loop
List construction:
%timeit a = [0]*n; a[id] = 1; a=np.asarray(a)
10000 loops, best of 3: 77.3 µs per loop
Using scipy.sparse
%timeit a = sparse.coo_matrix(([1], ([id],[0])), shape=(n,1))
10000 loops, best of 3: 51.1 µs per loop
Now, what's actually faster may depend on what's being cached, but it seems like constructing the zero array is probably the fastest, especially if you can use np.zeros_like(result) instead of np.zeros(len(result)).
One liner:
x = np.identity(n)[id]
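Note that this builds a full n x n identity matrix just to take one row, so it uses far more memory than the zeros-based approach; a small sketch of the idiom, plus its natural extension to a whole batch of labels via np.eye (which is not part of the original answer):
import numpy as np

n, idx = 5, 2                  # 'idx' instead of 'id' to avoid shadowing the builtin
print(np.identity(n)[idx])     # [0. 0. 1. 0. 0.]

# the same row-selection trick extends to a batch of labels at once
labels = np.array([2, 0, 4])
print(np.eye(n)[labels])       # one one-hot row per label, shape (3, 5)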
Suppose I have the following Numpy array, in which I have one and only one continuous slice of 1s:
import numpy as np
x = np.array([0,0,0,0,1,1,1,0,0,0], dtype=int)
and I want to find the index of the 1D center of mass of the 1 elements. I could type the following:
idx = np.where( x )[0]
idx_center_of_mass = int(0.5*(idx.max() + idx.min()))
# this would give 5
(Of course this would only give a rough approximation when the number of elements in the slice of 1s is even.)
Is there any better way to do this, like a computationally more efficient oneliner?
Can't you simply do the following?
center_of_mass = (x*np.arange(len(x))).sum()/x.sum() # 5
%timeit center_of_mass = (x*np.arange(len(x))).sum()/x.sum()
# 100000 loops, best of 3: 10.4 µs per loop
As one approach, we can get the non-zero indices and take their mean as the center of mass, like so -
np.flatnonzero(x).mean()
Here's another approach that uses a shifted-array comparison to get the start and stop indices of that slice and takes the mean of those indices to determine the center of mass, like so -
np.flatnonzero(x[:-1] != x[1:]).mean()+0.5
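As a quick sanity check (not a benchmark), both expressions give the same result on the example array from the question:
import numpy as np

x = np.array([0, 0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=int)

print(np.flatnonzero(x).mean())                      # 5.0 -- mean of all non-zero indices
print(np.flatnonzero(x[:-1] != x[1:]).mean() + 0.5)  # 5.0 -- uses only the two run edges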
Runtime test -
In [72]: x = np.zeros(10000,dtype=int)
In [73]: x[100:2000] = 1
In [74]: %timeit np.flatnonzero(x).mean()
10000 loops, best of 3: 115 µs per loop
In [75]: %timeit np.flatnonzero(x[:-1] != x[1:]).mean()+0.5
10000 loops, best of 3: 38.7 µs per loop
We can improve the performance by some margin here with the use of np.nonzero()[0] to replace np.flatnonzero and np.sum in place of np.mean -
In [107]: %timeit (np.nonzero(x[:-1] != x[1:])[0].sum()+1)/2.0
10000 loops, best of 3: 30.6 µs per loop
Alternatively, for the second approach, we can store the start and stop indices and simply add them to get the center of mass; this is a bit more efficient since it avoids the function call to np.mean, like so -
start,stop = np.flatnonzero(x[:-1] != x[1:])
out = (stop + start + 1)/2.0
Timings -
In [90]: %timeit start,stop = np.flatnonzero(x[:-1] != x[1:])
10000 loops, best of 3: 21.3 µs per loop
In [91]: %timeit (stop + start + 1)/2.0
100000 loops, best of 3: 4.45 µs per loop
Again, we can experiment with np.nonzero()[0] here.
I have a regular list called a, and a NumPy array of indices b.
(No, it is not possible for me to convert a to a NumPy array.)
Is there any way for me to achieve the same effect as a[b] efficiently? To be clear, this means I don't want to extract every individual int in b due to the performance implications.
(Yes, this is a bottleneck in my code. That's why I'm using NumPy arrays to begin with.)
a = list(range(1000000))
b = np.random.randint(0, len(a), 10000)
%timeit np.array(a)[b]
10 loops, best of 3: 84.8 ms per loop
%timeit [a[x] for x in b]
100 loops, best of 3: 2.93 ms per loop
%timeit operator.itemgetter(*b)(a)
1000 loops, best of 3: 1.86 ms per loop
%timeit np.take(a, b)
10 loops, best of 3: 91.3 ms per loop
I had high hopes for numpy.take() but it is far from optimal. I tried some Numba solutions as well, and they yielded similar times--around 92 ms.
So a simple list comprehension is not far from the best here, but operator.itemgetter() wins, at least for input sizes at these orders of magnitude.
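One caveat worth adding (this is standard-library behaviour, not something mentioned in the answers): operator.itemgetter(*b)(a) returns a tuple rather than a list, and with exactly one index it returns the bare item, so a conversion or a special case may be needed:
import operator
import numpy as np

a = list(range(1000000))
b = np.random.randint(0, len(a), 10000)

picked = operator.itemgetter(*b)(a)   # a tuple, not a list
picked = list(picked)                 # convert if a list is required downstream

# with exactly one index, itemgetter returns the bare item instead of a 1-tuple
single = operator.itemgetter(b[0])(a)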
Write a cython function:
import cython
from cpython cimport PyList_New, PyList_SET_ITEM, Py_INCREF
@cython.wraparound(False)
@cython.boundscheck(False)
def take(list alist, Py_ssize_t[:] arr):
    cdef:
        Py_ssize_t i, idx, n = arr.shape[0]
        list res = PyList_New(n)
        object obj
    for i in range(n):
        idx = arr[i]
        obj = alist[idx]
        PyList_SET_ITEM(res, i, obj)
        Py_INCREF(obj)
    return res
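The answer does not show the build step; assuming the function above is saved as take_fast.pyx (the filename is an assumption), a minimal build script could look like this:
# setup.py -- minimal build script for the Cython module sketched above
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("take_fast.pyx"))
Build it with python setup.py build_ext --inplace and then import the function with from take_fast import take.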
The result of %timeit:
import numpy as np
al = list(range(10000))
aa = np.array(al)
ba = np.random.randint(0, len(al), 10000)
bl = ba.tolist()
%timeit [al[i] for i in bl]
%timeit np.take(aa, ba)
%timeit take(al, ba)
1000 loops, best of 3: 1.68 ms per loop
10000 loops, best of 3: 51.4 µs per loop
1000 loops, best of 3: 254 µs per loop
numpy.take() is the fastest if both of the arguments are ndarray objects. The Cython version is about 6-7x faster than the list comprehension (254 µs vs 1.68 ms).
I know that in Python, the in-place operators use the __iadd__ method. For immutable types, __iadd__ falls back to __add__, effectively tmp = a + b; a = tmp, but mutable types (like lists) are modified in place, which gives a slight speed boost.
However, if I have a NumPy array and modify its contained immutable types, e.g., integers or floats, in place, there is an even more significant speed boost. How does this work? I did some example benchmarks below:
import numpy as np
def inplace(a, b):
    a += b
    return a

def assignment(a, b):
    a = a + b
    return a
int1 = 1
int2 = 1
list1 = [1]
list2 = [1]
npary1 = np.ones((1000,1000))
npary2 = np.ones((1000,1000))
print('Python integers')
%timeit inplace(int1, 1)
%timeit assignment(int2, 1)
print('\nPython lists')
%timeit inplace(list1, [1])
%timeit assignment(list2, [1])
print('\nNumPy Arrays')
%timeit inplace(npary1, 1)
%timeit assignment(npary2, 1)
What I would expect is a difference similar to the one for the Python integers when using the in-place operators on NumPy arrays; however, the results are completely different:
Python integers
1000000 loops, best of 3: 265 ns per loop
1000000 loops, best of 3: 249 ns per loop
Python lists
1000000 loops, best of 3: 449 ns per loop
1000000 loops, best of 3: 638 ns per loop
NumPy Arrays
100 loops, best of 3: 3.76 ms per loop
100 loops, best of 3: 6.6 ms per loop
Each call to assignment(npary2, 1) requires creating a new one-million-element array. Consider how much time it takes just to allocate a (1000, 1000)-shaped array of ones:
In [21]: %timeit np.ones((1000, 1000))
100 loops, best of 3: 3.84 ms per loop
Allocating this new temporary array takes about 3.84 ms on my machine, and that is the right order of magnitude to explain the entire difference between inplace(npary1, 1) and assignment(npary2, 1):
In [12]: %timeit inplace(npary1, 1)
1000 loops, best of 3: 1.8 ms per loop
In [13]: %timeit assignment(npary2, 1)
100 loops, best of 3: 4.04 ms per loop
So, given that allocation is a relatively slow process, it makes sense that in-place addition is significantly faster than assignment to a new array.
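To see the same point without the helper functions, the in-place form writes into the existing buffer, which can also be expressed with the ufunc's out= argument (a small illustrative sketch, not part of the original answer):
import numpy as np

a = np.ones((1000, 1000))
b = np.ones((1000, 1000))

a += b               # writes the result into a's existing buffer
np.add(a, b, out=a)  # equivalent explicit form: no temporary array is allocated

c = a + b            # allocates a brand-new (1000, 1000) array for the result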
NumPy operations on NumPy arrays may be fast, but creation of NumPy arrays is relatively slow. Consider, for example, how much more time it takes to create a NumPy array than a Python list:
In [14]: %timeit list()
10000000 loops, best of 3: 106 ns per loop
In [15]: %timeit np.array([])
1000000 loops, best of 3: 563 ns per loop
This is one reason why it is generally better to use one large NumPy array (allocated once) rather than thousands of small NumPy arrays.
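A rough sketch of that advice (the shapes here are arbitrary): preallocate one large array once and fill it in place, rather than creating a small array on every iteration.
import numpy as np

# preallocate one large result buffer once ...
results = np.empty((1000, 3))

# ... and fill it row by row, instead of creating 1000 small arrays in the loop
for i in range(1000):
    results[i] = (i, i + 1, i + 2)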