How to optimize MAPE code in Python?

How to optimize MAPE code in Python? - python

I need to have a MAPE function, however I was not able to find it in standard packages ... Below, my implementation of this function.
def mape(actual, predict):
tmp, n = 0.0, 0
for i in range(0, len(actual)):
if actual[i] <> 0:
tmp += math.fabs(actual[i]-predict[i])/actual[i]
n += 1
return (tmp/n)
I don't like it, it's super not optimal in terms of speed. How to rewrite the code to be more Pythonic way and boost the speed?

Here's one vectorized approach with masking -
def mape_vectorized(a, b):
mask = a <> 0
return (np.fabs(a[mask] - b[mask])/a[mask]).mean()
Probably a faster one with masking after division computation -
def mape_vectorized_v2(a, b):
mask = a <> 0
return (np.fabs(a - b)/a)[mask].mean()
Runtime test -
In [217]: a = np.random.randint(-10,10,(10000))
...: b = np.random.randint(-10,10,(10000))
...:
In [218]: %timeit mape(a,b)
100 loops, best of 3: 11.7 ms per loop
In [219]: %timeit mape_vectorized(a,b)
1000 loops, best of 3: 273 µs per loop
In [220]: %timeit mape_vectorized_v2(a,b)
1000 loops, best of 3: 220 µs per loop

Another similar way of doing it using masked_Arrays to mask division by zero is:
import numpy.ma as ma
masked_actual = ma.masked_array(actual, mask=actual==0)
MAPE = (np.fabs(masked_actual - predict)/masked_actual).mean()

Related

Looking to speed up loop including pandas operations

I need help speeding up this loop and I am not sure how to go about it
import numpy as np
import pandas as pd
import timeit
n = 1000
df = pd.DataFrame({0:np.random.rand(n),1:np.random.rand(n)})
def loop():
result = pd.DataFrame(index=df.index,columns=['result'])
for i in df.index:
last_index_to_consider = df.index.values[::-1][i]
tdf = df.loc[:last_index_to_consider] - df.shift(-i).loc[:last_index_to_consider]
tdf = tdf.apply(lambda x: x**2)
tsumdf = tdf.sum(axis=1)
result.loc[i,'result'] = tsumdf.mean()
return result
print(timeit.timeit(loop, number=10))
Is it possible to tweak the for loop to make it faster or are there options using numba or can I go ahead and use multiple threads to speed this loop up?
What would be the most sensible way to get more performance than just simply evaluating this code straight away?

There's a lot of compute happening per iteration. Keeping it that way, we could leverage underlying array data alongwith np.einsum for the squared-sum-reductions could bring about speedups. Here's an implementation that goes along those lines -
def array_einsum_loop(df):
a = df.values
l = len(a)
out = np.empty(l)
for i in range(l):
d = a[:l-i] - a[i:]
out[i] = np.einsum('ij,ij->',d,d)
df_out = pd.DataFrame({'result':out/np.arange(l,0,-1)})
return df_out
Runtime test -
In [153]: n = 1000
...: df = pd.DataFrame({0:np.random.rand(n),1:np.random.rand(n)})
In [154]: %timeit loop(df)
1 loop, best of 3: 1.43 s per loop
In [155]: %timeit array_einsum_loop(df)
100 loops, best of 3: 5.61 ms per loop
In [156]: 1430/5.61
Out[156]: 254.9019607843137
Not bad for a 250x+ speedup without breaking any loop or bank!

Just for fun, the ultimate speedup with numba :
import numba
#numba.njit
def numba(d0,d1):
n=len(d0)
result=np.empty(n,np.float64)
for i in range(n):
s=0
k=i
for j in range(n-i):
u = d0[j]-d0[k]
v = d1[j]-d1[k]
k+=1
s += u*u + v*v
result[i] = s/(j+1)
return result
def loop2(df):
return pd.DataFrame({'result':numba(*df.values.T)})
For a 2500x+ factor.
In [519]: %timeit loop2(df)
583 µs ± 5.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Accumulating numbers in an array without a loop. (python)

So I have a (seemingly) simple problem, which I am currently doing now via a for loop.
Basically, I want to increment specific cells in a numpy matrix, but I want to do it without a for-loop if possible.
To give more details: I have 100 x 100 numpy matrix, X. I also have a 2x1000 numpy matrix P. P just stores indices into X, so for example, each column of P, has the row-column index of the cell, that I want to increment in X.
What I do right now is this:
for p in range(P.shape[1]):
X[P[0,p], P[1,p]] += 1
My question is, is there a way to do this without a for-loop?
Thanks!

Use the at method of the add ufunc with advanced indexing:
numpy.add.at(X, (P[0], P[1]), 1)
or just advanced indexing if P is guaranteed to never select the same cell of X twice:
X[P[0], P[1]] += 1

Using linear-indices and bincount -
lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Benchmarking
For the case when indices are not repeated, advanced indexing based approach as suggested in #user2357112's post seems to be very efficient.
For repeated ones case, we have np.add.at and np.bincount and the performance numbers seem to be dependent on the size of indices array relative to the size of input array.
Approaches -
def app0(X,P): # #user2357112's soln1
np.add.at(X, (P[0], P[1]), 1)
def app1(X, P): # Proposed in this ppst
lidx = np.ravel_multi_index(P, X.shape)
X += np.bincount(lidx, minlength=X.size).reshape(X.shape)
Here's few timing tests to suggest that -
Case #1 :
In [141]: X = np.random.randint(0,9,(100,100))
...: P = np.random.randint(0,100,(2,1000))
...:
In [142]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
10000 loops, best of 3: 68.9 µs per loop
100000 loops, best of 3: 15.1 µs per loop
Case #2 :
In [143]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,10000))
...:
In [144]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
1000 loops, best of 3: 687 µs per loop
1000 loops, best of 3: 1.48 ms per loop
Case #3 :
In [145]: X = np.random.randint(0,9,(1000,1000))
...: P = np.random.randint(0,1000,(2,100000))
...:
In [146]: %timeit app0(X, P)
...: %timeit app1(X, P)
...:
100 loops, best of 3: 11.3 ms per loop
100 loops, best of 3: 2.51 ms per loop

Code optimization python

I wrote the below function to estimate the orientation from a 3 axes accelerometer signal (X,Y,Z)
X.shape
Out[4]: (180000L,)
Y.shape
Out[4]: (180000L,)
Z.shape
Out[4]: (180000L,)
def estimate_orientation(self,X,Y,Z):
sigIn=np.array([X,Y,Z]).T
N=len(sigIn)
sigOut=np.empty(shape=(N,3))
sigOut[sigOut==0]=None
i=0
while i<N:
sigOut[i,:] = np.arccos(sigIn[i,:]/np.linalg.norm(sigIn[i,:]))*180/math.pi
i=i+1
return sigOut
Executing this function with a signal of 180000 samples takes quite a while (~2.2 seconds)... I know that it is not written in a "pythonic way"... Could you help me to optimize the execution time?
Thanks!

Starting approach
One approach following an usage of broadcasting, would be like so -
np.arccos(sigIn/np.linalg.norm(sigIn,axis=1,keepdims=1))*180/np.pi
Further optimization - I
We could use np.einsum to replace np.linalg.norm part. Thus :
np.linalg.norm(sigIn,axis=1,keepdims=1)
could be replaced by :
np.sqrt(np.einsum('ij,ij->i',sigIn,sigIn))[:,None]
Further optimization - II
Further boost could be brought in with numexpr module, which works really well with huge arrays and with operations involving trigonometrical functions. In our case that would be arcccos. So, we will use the einsum part as used in the previous optimization section and then use arccos from numexpr on it.
Thus, the implementation would look something like this -
import numexpr as ne
pi_val = np.pi
s = np.sqrt(np.einsum('ij,ij->i',signIn,signIn))[:,None]
out = ne.evaluate('arccos(signIn/s)*180/pi_val')
Runtime test
Approaches -
def original_app(sigIn):
N=len(sigIn)
sigOut=np.empty(shape=(N,3))
sigOut[sigOut==0]=None
i=0
while i<N:
sigOut[i,:] = np.arccos(sigIn[i,:]/np.linalg.norm(sigIn[i,:]))*180/math.pi
i=i+1
return sigOut
def broadcasting_app(signIn):
s = np.linalg.norm(signIn,axis=1,keepdims=1)
return np.arccos(signIn/s)*180/np.pi
def einsum_app(signIn):
s = np.sqrt(np.einsum('ij,ij->i',signIn,signIn))[:,None]
return np.arccos(signIn/s)*180/np.pi
def numexpr_app(signIn):
pi_val = np.pi
s = np.sqrt(np.einsum('ij,ij->i',signIn,signIn))[:,None]
return ne.evaluate('arccos(signIn/s)*180/pi_val')
Timings -
In [115]: a = np.random.rand(180000,3)
In [116]: %timeit original_app(a)
...: %timeit broadcasting_app(a)
...: %timeit einsum_app(a)
...: %timeit numexpr_app(a)
...:
1 loops, best of 3: 1.38 s per loop
100 loops, best of 3: 15.4 ms per loop
100 loops, best of 3: 13.3 ms per loop
100 loops, best of 3: 4.85 ms per loop
In [117]: 1380/4.85 # Speedup number
Out[117]: 284.5360824742268
280x speedup there!

Python - Best way to find the 1d center of mass in a binary numpy array

Suppose I have the following Numpy array, in which I have one and only one continuous slice of 1s:
import numpy as np
x = np.array([0,0,0,0,1,1,1,0,0,0], dtype=1)
and I want to find the index of the 1D center of mass of the 1 elements. I could type the following:
idx = np.where( x )[0]
idx_center_of_mass = int(0.5*(idx.max() + idx.min()))
# this would give 5
(Of course this would lead to rough approximation when the number of elements of the 1s slice is even.)
Is there any better way to do this, like a computationally more efficient oneliner?

Can't you simply do the following?
center_of_mass = (x*np.arange(len(x))).sum()/x.sum() # 5
%timeit center_of_mass = (x*arange(len(x))).sum()/x.sum()
# 100000 loops, best of 3: 10.4 µs per loop

As one approach we can get the non-zero indices and get the mean of those as the center of mass, like so -
np.flatnonzero(x).mean()
Here's another approach using shifted array comparison to get the start and stop indices of that slice and getting the mean of those indices for determining the center of mass, like so -
np.flatnonzero(x[:-1] != x[1:]).mean()+0.5
Runtime test -
In [72]: x = np.zeros(10000,dtype=int)
In [73]: x[100:2000] = 1
In [74]: %timeit np.flatnonzero(x).mean()
10000 loops, best of 3: 115 µs per loop
In [75]: %timeit np.flatnonzero(x[:-1] != x[1:]).mean()+0.5
10000 loops, best of 3: 38.7 µs per loop
We can improve the performance by some margin here with the use of np.nonzero()[0] to replace np.flatnonzero and np.sum in place of np.mean -
In [107]: %timeit (np.nonzero(x[:-1] != x[1:])[0].sum()+1)/2.0
10000 loops, best of 3: 30.6 µs per loop
Alternatively, for the second approach, we can store the start and stop indices and then simply add them to get the center of mass for a bit more efficient approach as we would avoid the function call to np.mean, like so -
start,stop = np.flatnonzero(x[:-1] != x[1:])
out = (stop + start + 1)/2.0
Timings -
In [90]: %timeit start,stop = np.flatnonzero(x[:-1] != x[1:])
10000 loops, best of 3: 21.3 µs per loop
In [91]: %timeit (stop + start + 1)/2.0
100000 loops, best of 3: 4.45 µs per loop
Again, we can experiment with np.nonzero()[0] here.

Fastest way to scale a list of floats

I have a list of floats I get from a machine learning algorithm. All these floats are between 0 and 1:
probs = [proba[0] for proba in self.classifier.predict_proba(x_test)]
probs is my list of floats. The predict_proba() function normally returns a numpy array. It takes about 9 seconds to get the list, and the list finally contains about 60k values.
I would like to scale, or normalize, all the values in the list against the highest value in the list.
Normally, I would do that:
maximum = max(probs)
list_values = [proba / maximum for proba in probs]
But for 60k values, it takes about 2 minutes. I would like to make it shorter.
Do you have any idea about how I could attend better performances ?

If you don't mind using an external library, numpy might be worth looking into:
import numpy
probs = numpy.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
list_values = probs/maximum

Another approach using numpy, potentially faster if your list of probabilities is large, is to convert your whole probabilities to a numpy array, and then operate over it:
import numpy as np
probs = np.asarray(self.classifier.predict_proba(x_test))
list_values = probs[:, 0] / probs.max()
The first line will convert all your probabilities to a N x M array (where N is your samples and M your number of classes).
The second line will select all the probabilities for the first class ([:, 0] means all rows of the column 0, which yields a vector of size N) and divide it for the maximum.
You can potentially extend this to all your probabilities:
all_probs = probs / probs.max()
The above will normalize all your probabilities for all the classes. And later you can access them like all_probs[:, i] where i is the class of interest.

You should use Scikit learn's normalize.
from sklearn.preprocessing import normalize

If you want your end results to be numpy.array , then it would be to faster to convert your list to numpy array before hand and to use array division directly , than list comprehension. Example -
import numpy as np
probsnp = np.array([proba[0] for proba in self.classifier.predict_proba(x_test)])
maximum = probs.max()
list_values = probs/maximum
Examples of timing tests -
In [46]: import numpy.random as ndr
In [47]: probs = ndr.random_sample(1000)
In [48]: probs.shape
Out[48]: (1000,)
In [49]: def func1(probs):
....: maximum = max(probs)
....: probsnew = [i/maximum for i in probs]
....: return probsnew
....:
In [50]: def func2(probs):
....: maximum = probs.max()
....: probsnew = probs/maximum
....: return probsnew
....:
In [51]: %timeit func1(probs)
The slowest run took 229.79 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 279 µs per loop
In [52]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [53]: %timeit func2(probs)
The slowest run took 356.45 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 81 µs per loop
In [54]: %timeit func1(probs)
1000 loops, best of 3: 278 µs per loop
In [55]: %timeit func2(probs)
10000 loops, best of 3: 81.5 µs per loop
The numpy method takes only 1/3rd time as that of list comprehension.
Timing tests with numpy.array() conversion as part of func2 (in above example) -
In [60]: probslist = [p for p in probs]
In [61]: def func2(probs):
....: probsnp = np,array(probs)
....: maxprobs = probsnp.max()
....: probsnew = probsnp/maxprobs
....: return probsnew
....:
In [65]: %timeit func1(probslist)
1000 loops, best of 3: 212 µs per loop
In [66]: %timeit func2(probslist)
10000 loops, best of 3: 198 µs per loop
In [67]: probs = ndr.random_sample(60000)
In [68]: probslist = [p for p in probs]
In [74]: %timeit func1(probslist)
100 loops, best of 3: 11.5 ms per loop
In [75]: %timeit func2(probslist)
100 loops, best of 3: 5.79 ms per loop
In [76]: %timeit func1(probslist)
100 loops, best of 3: 11.4 ms per loop
In [77]: %timeit func2(probslist)
100 loops, best of 3: 5.81 ms per loop
Seems like its still a little faster to use numpy array.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to optimize MAPE code in Python? - python

Another similar way of doing it using masked_Arrays to mask division by zero is: import numpy.ma as ma masked_actual = ma.masked_array(actual, mask=actual==0) MAPE = (np.fabs(masked_actual - predict)/masked_actual).mean()

Related

Looking to speed up loop including pandas operations

Accumulating numbers in an array without a loop. (python)

Code optimization python

Python - Best way to find the 1d center of mass in a binary numpy array

Fastest way to scale a list of floats

Categories

Resources