Numba slower when optionally returning arrays - python

I am trying to write a numba function that optionally returns arrays created inside the function. I found that merely implementing this option slows down execution compared to the same function without it.
Here is an example:
import numpy as np
from numba import jit
@jit(nopython=True, nogil=True, cache=True)
def testfun1(array_in, arrays_out=np.empty((0, 0), np.float64)):
    array_add_1 = array_in + 1
    array_add_2 = array_in + 2
    array_sum_1 = sum(array_add_1)
    array_sum_2 = sum(array_add_2)
    if arrays_out.shape == (10, len(array_in)):
        # write to an array
        arrays_out[0, :] = array_add_1
        arrays_out[1, :] = array_add_1
        arrays_out[2, :] = array_add_1
        arrays_out[3, :] = array_add_1
        arrays_out[4, :] = array_add_1
        arrays_out[5, :] = array_add_2
        arrays_out[6, :] = array_add_2
        arrays_out[7, :] = array_add_2
        arrays_out[8, :] = array_add_2
        arrays_out[9, :] = array_add_2
    return (array_sum_1, array_sum_2)

@jit(nopython=True, nogil=True, cache=True)
def testfun2(array_in, arrays_out=np.empty((0, 0), np.float64)):
    array_add_1 = array_in + 1
    array_add_2 = array_in + 2
    array_sum_1 = sum(array_add_1)
    array_sum_2 = sum(array_add_2)
    if arrays_out.shape == (10, len(array_in)):
        array_add_1 = 0  # do something
        array_add_2 = 0  # do something
    return (array_sum_1, array_sum_2)
array_in = np.arange(0,10000)
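For reference, a minimal sketch of how the optional output path is meant to be exercised (buf is a hypothetical name of mine; this call is not part of the timings below):

buf = np.empty((10, array_in.size), np.float64)  # buffer matching the expected (10, len(array_in)) shape
sums = testfun1(array_in, buf)  # sums returned as before, intermediates written into buf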
Timing gives me:
%timeit testfun1(array_in) # 70.1 µs ± 346 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit testfun2(array_in) # 69.6 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(Please note that this is a really small-scale example. In my actual code, the time difference is much more noticeable because there are more and bigger arrays to optionally return.)
So basically both functions do the same thing here, since no arrays_out of the correct size is passed to either. But why is testfun1 slower?
Do you know an alternative way to code a function that optionally returns arrays but doesn't slow down execution when the arrays are not requested?
Thanks and best regards,
Scooba

Related

How to parse and evaluate a math expression with Pandas Dataframe columns?

What I would like to do is to parse an expression such this one:
result = A + B + sqrt(B + 4)
Where A and B are columns of a dataframe. So I would have to parse the expression like this in order to get the result:
new_col = df.B + 4
result = df.A + df.B + new_col.apply(sqrt)
Where df is the dataframe.
I have tried re.sub, but it would only be good for replacing the column variables (not the functions), like this:
import re

def repl(match):
    inner_word = match.group(1)
    new_var = "df['{}']".format(inner_word)
    return new_var

eq = 'A + 3 / B'
new_eq = re.sub('([a-zA-Z_]+)', repl, eq)
result = eval(new_eq)
So, my questions are:
Is there a Python library to do this? If not, how can I achieve it in a simple way?
Could a recursive function be the solution?
Would using reverse Polish notation simplify the parsing?
Would I have to use the ast module?
Pandas DataFrames do have an eval function. Using your example equation:
import pandas as pd
# create an example DataFrame to work with
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# define equation
eq = 'A + 3 / B'
# actual computation
df.eval(eq)
# more complicated equation
eq = "A + B + sqrt(B + 4)"
df.eval(eq)
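A related usage sketch, assuming a reasonably recent pandas: df.eval can also assign the computed expression to a new column directly:

# returns a new DataFrame with an extra column C
df = df.eval("C = A + B + sqrt(B + 4)")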
Warning
Keep in mind that eval runs arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.
Following the example provided by @uuazed, a faster way would be to use numexpr:
import pandas as pd
import numpy as np
import numexpr as ne
df = pd.DataFrame(np.random.randn(int(1e6), 2), columns=['A', 'B'])
eq = "A + B + sqrt(B + 4)"
%timeit df.eval(eq)
# 15.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit A=df.A; B=df.B; ne.evaluate(eq)
# 6.24 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
numexpr may also have more supported operations
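If relying on ne.evaluate picking A and B up from the calling frame feels fragile, here is a sketch that passes the columns explicitly via numexpr's local_dict argument:

result = ne.evaluate(eq, local_dict={'A': df['A'].values, 'B': df['B'].values})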

Euclidean distance between matrix and vector

Calculate the Euclidean distance of a vector from each column of a matrix.
Is this correct?
distances=np.sqrt(np.sum(np.square(new_v-val.reshape(10,1)),axis=0))
new_v is a matrix.
val.reshape(10,1) is a column vector.
Are there other/better ways to do it?
What you have is correct. There is a simpler method available in numpy.linalg:
from numpy.linalg import norm
norm(new_v.T-val, axis=1, ord=2)
You can make use of the efficient np.einsum -
subs = new_v - val[:,None]
out = np.sqrt(np.einsum('ij,ij->j',subs,subs))
Alternatively, using the identity (a-b)^2 = a^2 + b^2 - 2ab -
out = np.sqrt(np.einsum('ij,ij->j',new_v, new_v) + val.dot(val) - 2*val.dot(new_v))
If the second axis of new_v is large, we can also use the numexpr module to compute the sqrt part at the end.
Runtime test
Approaches -
import numexpr as ne

def einsum_based(new_v, val):
    subs = new_v - val[:,None]
    return np.sqrt(np.einsum('ij,ij->j', subs, subs))

def dot_based(new_v, val):
    return np.sqrt(np.einsum('ij,ij->j', new_v, new_v) +
                   val.dot(val) - 2*val.dot(new_v))

def einsum_numexpr_based(new_v, val):
    subs = new_v - val[:,None]
    sq_dists = np.einsum('ij,ij->j', subs, subs)
    return ne.evaluate('sqrt(sq_dists)')

def dot_numexpr_based(new_v, val):
    sq_dists = np.einsum('ij,ij->j', new_v, new_v) + val.dot(val) - 2*val.dot(new_v)
    return ne.evaluate('sqrt(sq_dists)')
Timings -
In [85]: # Inputs
...: new_v = np.random.randint(0,9,(10,100000))
...: val = np.random.randint(0,9,(10))
In [86]: %timeit np.sqrt(np.sum(np.square(new_v-val.reshape(10,1)),axis=0))
...: %timeit einsum_based(new_v, val)
...: %timeit dot_based(new_v, val)
...: %timeit einsum_numexpr_based(new_v, val)
...: %timeit dot_numexpr_based(new_v, val)
...:
100 loops, best of 3: 2.91 ms per loop
100 loops, best of 3: 2.1 ms per loop
100 loops, best of 3: 2.12 ms per loop
100 loops, best of 3: 2.26 ms per loop
100 loops, best of 3: 2.43 ms per loop
In [87]: from numpy.linalg import norm
# @wim's solution
In [88]: %timeit norm(new_v.T-val, axis=1, ord=2)
100 loops, best of 3: 5.88 ms per loop
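Another option, sketched here without timings: scipy.spatial.distance.cdist computes the same column-wise distances if each column of new_v is treated as a point:

from scipy.spatial.distance import cdist

# rows of new_v.T are the points; val[None, :] is the single reference point
out = cdist(new_v.T, val[None, :]).ravel()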

convert time string XhYmZs to seconds in python

I have a string which comes in three forms:
XhYmZs or YmZs or Zs
where h, m, s stand for hours, minutes, seconds and X, Y, Z are the corresponding values.
How do I efficiently convert these strings to seconds in python2.7?
I guess I can do something like:
s="XhYmZs"
if "h" in s:
hours=s.split("h")
elif "m" in s:
mins=s.split("m")[0][-1]
... but this does not seem very efficient to me :(
Split on the delimiters you're interested in, then parse each resulting element into an integer and multiply as needed:
import re

def hms(s):
    l = list(map(int, re.split('[hms]', s)[:-1]))
    if len(l) == 3:
        return l[0]*3600 + l[1]*60 + l[2]
    elif len(l) == 2:
        return l[0]*60 + l[1]
    else:
        return l[0]
This produces a duration normalized to seconds.
>>> hms('3h4m5s')
11045
>>> 3*3600+4*60+5
11045
>>> hms('70m5s')
4205
>>> 70*60+5
4205
>>> hms('300s')
300
You can also make this one line by turning the re.split() result around and multiplying by 60 raised to an incrementing power based on the element's position in the list:
def hms2(s):
    return sum(int(x)*60**i for i, x in enumerate(re.split('[hms]', s)[-2::-1]))
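A quick check that the one-liner agrees with hms:

>>> hms2('3h4m5s')
11045
>>> hms2('300s')
300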
>>> import datetime
>>> datetime.datetime.strptime('3h4m5s', '%Hh%Mm%Ss').time()
datetime.time(3, 4, 5)
Since it varies which fields are in your strings, you may have to build a matching format string.
>>> def parse(s):
...     fmt = ''.join('%' + c.upper() + c for c in 'hms' if c in s)
...     return datetime.datetime.strptime(s, fmt).time()
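For example, a minutes-and-seconds string picks up the shorter format automatically:

>>> parse('4m5s')
datetime.time(0, 4, 5)

Note that strptime validates field ranges, so a string like '70m5s' raises ValueError with this approach, unlike the arithmetic solutions above.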
The datetime module is the standard library way to handle times.
Asking to do this "efficiently" is a bit of a fool's errand. String parsing in an interpreted language isn't fast; aim for clarity. In addition, seeming efficient isn't very meaningful; either analyze the algorithm or benchmark, otherwise it's speculation.
I do not know how efficient this is, but this is how I would do it:
import re

test_data = [
    '1h2m3s',
    '1m2s',
    '1s',
    '3s1h2m',
]

HMS_REGEX = re.compile(r'^(\d+)h(\d+)m(\d+)s$')
MS_REGEX = re.compile(r'^(\d+)m(\d+)s$')
S_REGEX = re.compile(r'^(\d+)s$')

def total_seconds(hms_string):
    found = HMS_REGEX.match(hms_string)
    if found:
        return 3600 * int(found.group(1)) + \
               60 * int(found.group(2)) + \
               int(found.group(3))
    found = MS_REGEX.match(hms_string)
    if found:
        return 60 * int(found.group(1)) + int(found.group(2))
    found = S_REGEX.match(hms_string)
    if found:
        return int(found.group(1))
    raise ValueError('Could not convert ' + hms_string)

for datum in test_data:
    try:
        print(total_seconds(datum))
    except ValueError as exc:
        print(exc)
or going to a single match and riffing on TigerhawkT3's one-liner, but retaining the error checking of non-matching strings:

HMS_REGEX = re.compile(r'^(\d+)h(\d+)m(\d+)s$|^(\d+)m(\d+)s$|^(\d+)s$')

def total_seconds(hms_string):
    found = HMS_REGEX.match(hms_string)
    if found:
        return sum(
            int(x or 0) * 60 ** i for i, x in enumerate(
                y for y in reversed(found.groups()) if y is not None))
    raise ValueError('Could not convert ' + hms_string)
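A quick sanity check of the single-regex version:

print(total_seconds('1h2m3s'))  # 3723
print(total_seconds('1m2s'))    # 62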
My fellow Pythonistas, please stop using regular expressions for everything. A regular expression is not needed for such a simple task. Python is considered a slow language not because of the GIL or the interpreter, but because of this kind of misuse.
In [1]: import re
   ...: def hms(s):
   ...:     l = list(map(int, re.split('[hms]', s)[:-1]))
   ...:     if len(l) == 3:
   ...:         return l[0]*3600 + l[1]*60 + l[2]
   ...:     elif len(l) == 2:
   ...:         return l[0]*60 + l[1]
   ...:     else:
   ...:         return l[0]
In [2]: %timeit hms("6h7m8s")
5.62 µs ± 722 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: def ehms(s):
   ...:     bases = dict(h=3600, m=60, s=1)
   ...:     secs = 0
   ...:     num = 0
   ...:     for c in s:
   ...:         if c.isdigit():
   ...:             num = num * 10 + int(c)
   ...:         else:
   ...:             secs += bases[c] * num
   ...:             num = 0
   ...:     return secs
In [7]: %timeit ehms("6h7m8s")
2.07 µs ± 70.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit hms("8s")
2.35 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: %timeit ehms("8s")
1.06 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [10]: bases = dict(h=3600, m=60, s=1)
In [15]: a = ord('0')  # so ord(c) - a recovers the digit value
In [16]: def eehms(s):
    ...:     secs = 0
    ...:     num = 0
    ...:     for c in s:
    ...:         if c.isdigit():
    ...:             num = num * 10 + ord(c) - a
    ...:         else:
    ...:             secs += bases[c] * num
    ...:             num = 0
    ...:     return secs
In [17]: %timeit eehms("6h7m8s")
1.45 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
see, almost 4 times as fast.
There's a library, python-dateutil (pip install python-dateutil), that takes a string and returns a datetime.datetime.
It can parse values such as 5h 30m, 0.5h 30m, or 0.5h, with or without spaces.
from datetime import datetime
from dateutil import parser
time = '5h15m50s'
midnight_plus_time = parser.parse(time)
midnight: datetime = datetime.combine(datetime.today(), datetime.min.time())
timedelta = midnight_plus_time - midnight
print(timedelta.seconds) # 18950
It can't parse more than 24h at once though.

How to turn a 1D radial profile into a 2D array in python

I have a list that models a phenomenon that is a function of radius. I want to convert this to a 2D array. I wrote some code that does exactly what I want, but since it uses nested for loops, it is quite slow.
l = len(profile1D)/2
critDim = int((l**2 /2.)**(1/2.))
profile2D = np.empty([critDim, critDim])
for x in xrange(0, critDim):
    for y in xrange(0, critDim):
        r = ((x**2 + y**2)**(1/2.))
        profile2D[x,y] = profile1D[int(l+r)]
Is there a more efficient way to do the same thing by avoiding these loops?
Here's a vectorized approach using broadcasting -
a = np.arange(critDim)**2
r2D = np.sqrt(a[:,None] + a)
out = profile1D[(l+r2D).astype(int)]
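To see what the broadcasting step does, here is a toy illustration (shapes assumed as above): a[:,None] has shape (critDim, 1), so adding a, of shape (critDim,), yields the full (critDim, critDim) grid of x**2 + y**2:

a = np.arange(3)**2   # [0, 1, 4]
a[:,None] + a         # [[0, 1, 4], [1, 2, 5], [4, 5, 8]]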
If there are many repeated indices generated by l+r2D, we can use np.take for some further performance boost, like so -
out = np.take(profile1D,(l+r2D).astype(int))
Runtime test
Function definitions -
def org_app(profile1D, l, critDim):
    profile2D = np.empty([critDim, critDim])
    for x in xrange(0, critDim):
        for y in xrange(0, critDim):
            r = ((x**2 + y**2)**(1/2.))
            profile2D[x,y] = profile1D[int(l+r)]
    return profile2D

def vect_app1(profile1D, l, critDim):
    a = np.arange(critDim)**2
    r2D = np.sqrt(a[:,None] + a)
    out = profile1D[(l+r2D).astype(int)]
    return out

def vect_app2(profile1D, l, critDim):
    a = np.arange(critDim)**2
    r2D = np.sqrt(a[:,None] + a)
    out = np.take(profile1D, (l+r2D).astype(int))
    return out
Timings and verification -
In [25]: # Setup input array and params
...: profile1D = np.random.randint(0,9,(1000))
...: l = len(profile1D)/2
...: critDim = int((l**2 /2.)**(1/2.))
...:
In [26]: np.allclose(org_app(profile1D,l,critDim),vect_app1(profile1D,l,critDim))
Out[26]: True
In [27]: np.allclose(org_app(profile1D,l,critDim),vect_app2(profile1D,l,critDim))
Out[27]: True
In [28]: %timeit org_app(profile1D,l,critDim)
10 loops, best of 3: 154 ms per loop
In [29]: %timeit vect_app1(profile1D,l,critDim)
1000 loops, best of 3: 1.69 ms per loop
In [30]: %timeit vect_app2(profile1D,l,critDim)
1000 loops, best of 3: 1.68 ms per loop
In [31]: # Setup input array and params
...: profile1D = np.random.randint(0,9,(5000))
...: l = len(profile1D)/2
...: critDim = int((l**2 /2.)**(1/2.))
...:
In [32]: %timeit org_app(profile1D,l,critDim)
1 loops, best of 3: 3.76 s per loop
In [33]: %timeit vect_app1(profile1D,l,critDim)
10 loops, best of 3: 59.8 ms per loop
In [34]: %timeit vect_app2(profile1D,l,critDim)
10 loops, best of 3: 59.5 ms per loop

Faster loop operating on two values of an array

Consider the following function:
import numpy

def dostuff(n, f):
    array = numpy.arange(0, n)
    for i in range(1, n):                   # Line 1
        array[i] = f(array[i-1], array[i])  # Line 2
    return numpy.sum(array)
How can I rewrite Line 1/Line 2 to make the loop faster in Python 3 (without using Cython)?
I encourage you to check this question on SO: generalized cumulative functions in NumPy/SciPy?, since you want a generalized cumulative function.
Also check the numpy documentation for np.frompyfunc.
func = np.frompyfunc(f, 2, 1)

def dostuff(n, f):
    final_array = func.accumulate(np.arange(0, n), dtype=object).astype(int)
    return np.sum(final_array)
Example
In [86]:
def f(num1, num2):
    return num1 + num2
In [87]:
func = np.frompyfunc(f, 2, 1)
In [88]:
def dostuff(n, f):
    final_array = func.accumulate(np.arange(0, n), dtype=object).astype(int)
    return np.sum(final_array)
In [108]:
dostuff(15,f)
Out[108]:
560
In [109]:
dostuff(10,f)
Out[109]:
165
Benchmarks
def dostuff1(n, f):
    array = np.arange(0, n)
    for i in range(1, n):                   # Line 1
        array[i] = f(array[i-1], array[i])  # Line 2
    return np.sum(array)

def dostuff2(n, f):
    final_array = func.accumulate(np.arange(0, n), dtype=object).astype(int)
    return np.sum(final_array)
In [126]:
%timeit dostuff1(100,f)
10000 loops, best of 3: 40.6 µs per loop
In [127]:
%timeit dostuff2(100,f)
The slowest run took 4.98 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 23.8 µs per loop
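As an aside, when f is plain addition, as in this benchmark, the recurrence is exactly a cumulative sum, and a hypothetical dostuff_cumsum built on np.cumsum avoids the Python-level callback entirely:

def dostuff_cumsum(n):
    # valid only when f is addition: the loop reduces to a cumulative sum
    return np.cumsum(np.arange(0, n)).sum()

dostuff_cumsum(15)  # 560, matching dostuff(15, f) above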
