Reduce function calls - python

I profiled my python program and found that the following function was taking too long to run. Perhaps I can use a different algorithm and make it run faster. However, I have read that I can also possibly increase the speed by reducing function calls, especially when a function gets called repeatedly within a loop. I am a python newbie and would like to learn how to do this and see how much faster the code can get. Currently, the function is:
def potentialActualBuyers(setOfPeople,theCar,price):
    count=0
    for person in setOfPeople:
        if person.getUtility(theCar) >= price and person.periodCarPurchased==None:
            count += 1
    return count
where setOfPeople is a list of person objects. I tried the following:
def potentialActualBuyers(setOfPeople,theCar,price):
    count=0
    Utility=person.getUtility
    for person in setOfPeople:
        if Utility(theCar) >= price and person.periodCarPurchased==None:
            count += 1
    return count
This, however, gives me an error saying local variable 'person' referenced before assignment
Any suggestions on how I can reduce function calls, or any other changes that could make the code faster?
Again, I am a python newbie and even though I may possibly be able to use a better algorithm, it is still worthwhile learning the answer to the above question.
Thanks very much.
***** EDIT *****
Adding the getUtility method:
def getUtility(self,theCar):
    if theCar in self.utility.keys():
        return self.utility[theCar]
    else:
        self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
        return self.utility[theCar]
***** EDIT: asking for new ideas *****
Any ideas on how to speed this up further? I used the method suggested by Alex to cut the time in half. Can I speed it up further?
Thanks.

I doubt you can get much speedup in this case by hoisting the lookup of person.getUtility (per class, not per instance, as other answers have pointed out). Maybe...:
return sum(1 for p in setOfPeople
           if p.periodCarPurchased is None
           and p.getUtility(theCar) >= price)
but I suspect most of the time is actually spent in the execution of getUtility (and possibly in the lookup of p.periodCarPurchased if that's some fancy property as opposed to a plain old attribute -- I moved the latter before the and just in case it is a plain attribute and can save a number of the getUtility calls). What does your profiling say wrt the fraction of time spent in this function (net of its calls to others) vs the method (and possibly property) in question?
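If the split is not obvious from your current profiler output, a quick way to see it (a sketch; it assumes setOfPeople, theCar and price are module-level names) is:

import cProfile
import pstats

# Profile one call and rank by cumulative time: getUtility's row shows how much
# of potentialActualBuyers' time is really spent inside the method.
cProfile.run("potentialActualBuyers(setOfPeople, theCar, price)", "buyers.prof")
pstats.Stats("buyers.prof").sort_stats("cumulative").print_stats(10)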

Try instead (that's assuming all persons are of the same type Person):
Utility = Person.getUtility
for person in setOfPeople:
    if Utility(person, theCar) >= ...
Also, using is None instead of == None should be marginally faster. Try whether swapping the and terms helps.
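Putting these suggestions together (assuming every element of setOfPeople is an instance of the same class Person), the whole function would look something like this sketch:

def potentialActualBuyers(setOfPeople, theCar, price):
    count = 0
    utility = Person.getUtility  # hoist the attribute lookup out of the loop
    for person in setOfPeople:
        # cheap plain-attribute test first, method call second
        if person.periodCarPurchased is None and utility(person, theCar) >= price:
            count += 1
    return count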

Methods are just functions bound to an object:
Utility = Person.getUtility
for person in setOfPeople:
    if Utility(person, theCar) ...
This doesn't eliminate a function call though, it eliminates an attribute lookup.

This one line made my eyes bleed:
self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
Let's make it legible and PEP8able and then see if it can be faster. First some spaces:
self.utility[theCar] = self.A * (math.pow(theCar.mpg, self.alpha)) * (math.pow(theCar.hp, self.beta)) * (math.pow(theCar.pc, self.gamma))
Now we can see there are very redundant parentheses; remove them:
self.utility[theCar] = self.A * math.pow(theCar.mpg, self.alpha) * math.pow(theCar.hp, self.beta) * math.pow(theCar.pc, self.gamma)
Hmmm: 3 lookups of math.pow and 3 function calls. You have three choices for powers: x ** y, the built-in pow(x, y[, z]), and math.pow(x, y). Unless you have good reason for using one of the others, it's best (IMHO) to choose x ** y; you save both the attribute lookup and the function call.
self.utility[theCar] = self.A * theCar.mpg ** self.alpha * theCar.hp ** self.beta * theCar.pc ** self.gamma
annnnnnd while we're here, let's get rid of the horizontal scroll-bar:
self.utility[theCar] = (self.A
                        * theCar.mpg ** self.alpha
                        * theCar.hp ** self.beta
                        * theCar.pc ** self.gamma)
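If you want to check the difference on your own machine, a quick (and machine-dependent) comparison:

import timeit

# Use variables in the timed expression so the peephole optimizer
# cannot constant-fold the power away.
setup = "import math; x = 1.234; y = 5.678"
print(timeit.timeit("math.pow(x, y)", setup=setup))
print(timeit.timeit("pow(x, y)", setup=setup))
print(timeit.timeit("x ** y", setup=setup))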
A possibility that would require quite a rewrite of your existing code and may not help anyway (in Python) would be to avoid most of the power calculations by taking logs everywhere and working with log_utility = log_A + log_mpg * alpha ...
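A rough sketch of that log-space idea, with standalone numbers standing in for self.A, theCar.mpg and so on (whether it actually wins in Python is something to measure, not assume):

import math

# Hypothetical values in place of self.A, self.alpha, ..., theCar.mpg, ...
A, alpha, beta, gamma = 2.0, 0.5, 0.3, 0.2
mpg, hp, pc = 30.0, 150.0, 20000.0

# One log per factor instead of one pow per factor; comparisons can then
# stay entirely in log space (assumes all quantities are positive).
log_utility = math.log(A) + alpha*math.log(mpg) + beta*math.log(hp) + gamma*math.log(pc)

price = 900.0
is_potential_buyer = log_utility >= math.log(price)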

Related

Issues when profiling list reversal in Python vs Erlang

I was profiling Erlang's lists:reverse Built in Function (BIF) to see how well it scales with the size of the input. More specifically, I tried:
1> X = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
2> timer:tc(lists, reverse, [X]).
{57737,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
3> timer:tc(lists, reverse, [X]).
{46896,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
4> Y = lists:seq(1, 10000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
5> timer:tc(lists, reverse, [Y]).
{434079,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
6> timer:tc(lists, reverse, [Y]).
{214173,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
Ok, so far it seems like the reverse BIF scales approximately linearly with the size of the input (e.g. multiply the size of the input by 10 and the time taken also increases by a factor of 10). In pure Erlang that would make sense, since we would use something like tail recursion to reverse the list. I guess that even as a BIF implemented in C, the algorithm for reversing a list seems to be the same (maybe because of the way lists are represented in Erlang?).
Now I wanted to compare this with another language - perhaps another dynamically typed language that I already use. So I tried a similar thing in Python - taking care, very explicitly, to use actual lists instead of generators, which I anticipate would affect the performance of Python positively in this test, giving it an unfair advantage.
import time

ms_conv_factor = 10**6

def profile(func, *args):
    start = time.time()
    func(args)
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)

x = list([i for i in range(0, 1000000)])
y = list([i for i in range(0, 10000000)])
z = list([i for i in range(0, 100000000)])

def f(m):
    return m[::-1]

def g(m):
    return reversed(m)

if __name__ == "__main__":
    print("All done loading the lists, starting now.", flush=True)
    print("f:")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, z)
    print("")
    print("g:")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, z)
This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs.
Exact result:
All done loading the lists, starting now.
f:
1.430511474609375
0.7152557373046875
0.7152557373046875
0.2384185791015625
0.476837158203125
g:
1.9073486328125
0.7152557373046875
0.2384185791015625
0.2384185791015625
0.476837158203125
My first, naive, guess was that Python might be able to recognize the reversal construct and create something like a reverse iterator and return that (Python can work with references, right? Maybe it was using some kind of optimization here). But I don't think that theory makes sense, since the original list and the returned list are not the same (changing one shouldn't change the other).
So my question(s) here is(are):
Is my profiling technique here flawed? Have I written the tests in a way that favor one language over the other?
What is the difference in the implementation of lists and their reversal in Erlang vs Python that makes this situation (of Python being WAY faster) possible?
Thanks for your time (in advance).
This seems to suggest that after the function has been loaded and run
once, the length of the input makes no difference and the reversal
times are incredibly fast - in the range of ~0.7µs.
Because your profiling function is incorrect. It accepts variable positional arguments, but when it passes them to the function, it doesn't unpack them so you are only ever working with a tuple of length one. You need to do the following:
def profile(func, *args):
    start = time.time()
    func(*args)  # Make sure to unpack the args!
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)
So notice the difference:
>>> def foo(*args):
...     print(args)
...     print(*args)
...
>>> foo(1,2,3)
(1, 2, 3)
1 2 3
Also note, reversed(m) creates a reversed iterator, so it doesn't actually do anything until you iterate over it. So g will still be constant time.
But rest assured, reversing a list in Python takes linear time.
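For instance, with the corrected profile function the linear scaling shows up right away (a sketch; absolute numbers will differ on your machine):

import time

def profile(func, *args):
    start = time.time()
    func(*args)  # unpacked, so the real list is passed
    print((time.time() - start) * 10**6)

x = list(range(1000000))
y = list(range(10000000))

profile(lambda m: m[::-1], x)            # roughly t microseconds
profile(lambda m: m[::-1], y)            # roughly 10 * t
profile(lambda m: list(reversed(m)), y)  # forcing the iterator is linear too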

using numba to speedup solve_ivp

I am using the method of lines with `solve_ivp` to solve a nonlinear PDE:
@njit(fastmath=True,error_model="numpy",cache=True)
def thinFilmEq(t,h,dx,Ma,phiFun,tempFun):
    phi = phiFun(h)
    temperature = tempFun(h)
    hxx = (np.roll(h,1) - 2*h + np.roll(h,-1))/dx**2  # use np.roll as I'm implementing periodic BC
    p = phi - hxx
    px = (np.roll(p,-1) - np.roll(p,1))/(2*dx)
    Tx = (np.roll(temperature,-1) - np.roll(temperature,1))/(2*dx)
    flux = h**3*px/3 + Ma*h**2*Tx/2
    dhdt = (np.roll(flux,-1) - np.roll(flux,1))/(2*dx)
    return dhdt
I get the following error:
TypingError: non-precise type pyobject
[1] During: typing of argument at C:/Users/yhcha/method_of_lines/test_01_thinFilmEq.py (28)
I suspect it is due to phiFun and tempFun. They are functions which I supply at call time; I make them arguments of the dhdt function just to keep things more general. When I remove phiFun and tempFun and write the function forms explicitly inside thinFilmEq, the error goes away.
Then I see the following error: TypingError: Use of unsupported NumPy function 'numpy.roll' or unsupported use of the function. I thought maybe np.roll is not supported, although it is listed on the official website. I tried to 'enlarge' the array and manually apply the same thing as np.roll when handling the finite difference for the periodic BC:
def augment(x):
    x2 = np.empty(len(x)+2)
    x2[1:-1] = x
    x2[0] = x[-1]
    x2[-1] = x[0]
    return x2

H = augment(h)
hx = (H[2:] - H[:-2])/dx  # use this instead of hx = (np.roll(h,-1) - np.roll(h,1))/dx
My questions are:
It seems that I can get numba to work, at the expense of making the code less general (I cannot supply an arbitrary function like phiFun) and less elegant (e.g. I cannot use a one-liner with np.roll). Are there ways to get around this, or is that just the price I need to pay when using numba to 'compile' the code?
The original version without numba is close to 10x slower than the Matlab version I coded, and the numba version is still around 3-4 times slower than Matlab. I don't really expect scipy to outperform Matlab, but are there other ways to speed up the code to bridge the gap?
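One way to keep phiFun and tempFun user-supplied (question 1) is to @njit them as well and pass the compiled functions in as arguments; recent numba releases accept jitted functions as arguments to other jitted functions. A minimal sketch, with placeholder function bodies and the roll-free periodic differences from the question (not the author's actual model):

import numpy as np
from numba import njit

@njit(cache=True)
def phi_fun(h):
    return 1.0/h**3  # placeholder

@njit(cache=True)
def temp_fun(h):
    return 1.0/(1.0 + h)  # placeholder

@njit(cache=True)
def augment(f):
    # pad with periodic ghost cells, as in the question
    f2 = np.empty(f.size + 2)
    f2[1:-1] = f
    f2[0] = f[-1]
    f2[-1] = f[0]
    return f2

@njit(fastmath=True, error_model="numpy", cache=True)
def thin_film_rhs(t, h, dx, Ma, phi_fun, temp_fun):
    H = augment(h)
    hxx = (H[2:] - 2.0*h + H[:-2])/dx**2
    p = phi_fun(h) - hxx
    P = augment(p)
    T = augment(temp_fun(h))
    px = (P[2:] - P[:-2])/(2.0*dx)
    Tx = (T[2:] - T[:-2])/(2.0*dx)
    flux = h**3*px/3.0 + Ma*h**2*Tx/2.0
    F = augment(flux)
    return (F[2:] - F[:-2])/(2.0*dx)

h0 = 1.0 + 0.01*np.cos(np.linspace(0.0, 2*np.pi, 200))
print(thin_film_rhs(0.0, h0, 0.05, 1.0, phi_fun, temp_fun)[:5])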

Python/Numba - Custom class object as input type

I'm starting with numba and my first goal is to try and accelerate a not so complicated function with a nested loop.
Given the following class:
class TestA:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def get_mult(self):
        return self.a * self.b
and a numpy ndarray that contains TestA objects, of dimension (N,) where N is usually ~3 million.
Now given the following function:
def test_no_jit(custom_class_obj_container):
    container_length = len(custom_class_obj_container)
    sum = 0
    for i in range(container_length):
        for j in range(i + 1, container_length):
            obj_i = custom_class_obj_container[i]
            obj_j = custom_class_obj_container[j]
            sum += (obj_i.get_mult() + obj_j.get_mult())
    return sum
I've tried to play around with numba to get it to work with the function above, however I cannot seem to get it to work with the nopython=True flag, and if it's set to False, then the runtime is higher than for the no-jit function.
Here is my latest try in trying to jit the function (also using nb.prange):
@nb.jit(nopython=False, parallel=True)
def test_jit(custom_class_obj_container):
    container_length = len(custom_class_obj_container)
    sum = 0
    for i in nb.prange(container_length):
        for j in nb.prange(i + 1, container_length):
            obj_i = custom_class_obj_container[i]
            obj_j = custom_class_obj_container[j]
            sum += (obj_i.get_mult() + obj_j.get_mult())
    return sum
I've tried to search around but I cannot seem to find a tutorial on how to define a custom class in the signature, nor how I would go about accelerating a function of this sort and getting it to run on the GPU with the CUDA libraries, which are installed and ready to use (previously used with tensorflow). Any info regarding that matter would be highly appreciated.
The numba docs give an example of creating a custom type, even for nopython mode: https://numba.pydata.org/numba-doc/latest/extending/interval-example.html
In your case though, unless this is a really slimmed down version of what you actually want to do, it seems like the easiest approach would be to re-use existing types. Additionally, the construction of a 3M length object array is going to be slow, and produce fragmented memory (as the objects are not being stored in contiguous blocks).
An example of how using record arrays might be used to solve the problem:
import numpy as np
import numba

x_dt = np.dtype([('a', np.float64),
                 ('b', np.float64)])
n = 30000
buf = np.arange(n*2).reshape((n, 2)).astype(np.float64)
vec3 = np.recarray(n, dtype=x_dt, buf=buf)

@numba.njit
def mult(a):
    return a.a * a.b

@numba.jit(nopython=True, parallel=True)
def sum_of_prod(vector):
    sum = 0
    vector_len = len(vector)
    for i in numba.prange(vector_len):
        for j in numba.prange(i + 1, vector_len):
            sum += mult(vector[i]) + mult(vector[j])
    return sum

sum_of_prod(vec3)
FWIW, I'm no numba expert. I found this question when searching for how to implement a custom type in numba for non-numerical stuff. In your case, because this is highly numerical, I think a custom type is probably overkill.

TypeError: 'function' object has no attribute '__getitem__'

Currently, I am working on my graduation project, but I am having some trouble with my code. Could anybody help me solve this error? I am trying to optimize the profit generated from water.
The error I get is the following one (appears in the if line of the constraint):
TypeError: 'function' object has no attribute '__getitem__'
My code:
import numpy as np
from scipy.optimize import minimize

#constants
Ymax=8 #tonne/ha
ky=1.25
Numbas=3 #3 subbasins
Nummon=12 #12 months is a year.
c_hydro=0.9 #Conversion rate m3 to kWh
LBPmaize=316413 #LBP/tonne
LBPhydro=55 #LBP/kWh
alpha=0.7
p1=0.35 #soil moisture depletion factor for no stress

#parameters
S0=[207.112, 150, 161.398]
A=[74571.9, 1537.8, 6645.7] #total area per subbasin
a=[0.423, 0.959, 0.473] #part of area used for irrigation
R=[0.2, 0.3, 0.5]
Qhydromatrix=[0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,
              1,1,1,1,1,0,0,0,0,1,1,1]

#Definitions
def profit(x,sign=-1):
    x=[]*(Numbas*Nummon)
    return sign * (alpha*Yasum*LBPmaize)

def ETa(x):
    for i in range(0,(Nummon*Numbas)):
        ETa=[]*(Numbas*Nummon)
        if np.multiply(x,p1)>ETmax_maize:
            ETa[i]=ETmax_maize[i]
        else:
            ETa[i]=np.multiply(x[i],p1)
    return ETa

def Yasum(ETa):
    Yasum=0
    for j in range(0,Numbas):
        for i in range(0,Nummon):
            ETasum=sum(ETa[j*Nummon:(j+1)*Nummon])
            ETmaxsum=sum(ETmax_maize[j*Nummon:(j+1)*Nummon])
            Ya=((-1*Ymax*ky)*(1-(ETasum/ETmaxsum))+Ymax)*A[j]*a[j]
            Yasum=Yasum+Ya
    return Yasum

def constraint(x):
    for j in range(0,Numbas):
        for i in range(0,Nummon):
            if (i-(j*Nummon))==0:
                x[i+(j*Nummon)]-((1-R[j])*S0[j])+ETa[i+(j*Nummon)+11]-P[i+(j*Nummon)+11]-Rdown[i+(j*Nummon)]
            else:
                ETa[i+(j*Nummon)-1]-((1-R[j])*x[i+(j*Nummon)-1])+x[i+(j*Nummon)]-P[i+(j*Nummon)-1]-Rdown[i+(j*Nummon)]
    return x

con2=({'type':'ineq','fun':constraint})
x0=[100]*(Nummon*Numbas)
sol=minimize(profit, x0,method='SLSQP', constraints=con2)
In general, this error means you have an expression f[x] where f is a function.
You have several problems in your code, but the error comes from this line:
ETa[i+(j*Nummon)-1]-((1-R[j])*x[i+(j*Nummon)-1])+x[i+(j*Nummon)]-P[i+(j*Nummon)-1]-Rdown[i+(j*Nummon)]
# ^ this is a function
You defined a function called ETa. You also use local variables named ETa. This is not a good idea in general. In the last function, you don't have a local variable with this name, so the name is looked up at the global scope - and that's a function; functions cannot be indexed this way.
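A stripped-down reproduction of the error:

def ETa(x):
    return x

ETa[0]
# Python 2: TypeError: 'function' object has no attribute '__getitem__'
# Python 3: TypeError: 'function' object is not subscriptable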
Other issues with your code:
In the function profit you are assigning the parameter, and not using it: x=[]*(Numbas*Nummon). This change is local, and has no effect on the argument passed to the function.
You are using expressions of the form [] * some_number in two places. This is meaningless - repeating the empty list returns the empty list.
In the function constraint you are returning x, but not changing it in any way. Did you intend to change it from within profit? Because you can't.
In the same function, both the if and the else branch just evaluate an expression, without any effect.
Note that those problems are unrelated to the logic of your code.
Important style issues:
Please use snake_case variable names. This does not affect the behavior of the code, but helps other people read it, so it will help you in getting feedback. Similarly, put spaces around operators such as =, +, etc.
If expressions get too long, break them into several statements using local variables.
If an expression repeats itself (such as j * Nummon above), put its value in a variable (see the sketch below).
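Purely to illustrate those last two points (this sketch does not repair the logic, and ETa_values, P and Rdown are placeholder names for whatever arrays you actually intend to index):

def constraint(x):
    for j in range(Numbas):
        offset = j * Nummon  # compute the repeated product once
        for i in range(Nummon):
            idx = i + offset
            if i == offset:  # same test as (i - (j * Nummon)) == 0
                value = (x[idx] - (1 - R[j]) * S0[j]
                         + ETa_values[idx + 11] - P[idx + 11] - Rdown[idx])
            else:
                value = (ETa_values[idx - 1] - (1 - R[j]) * x[idx - 1]
                         + x[idx] - P[idx - 1] - Rdown[idx])
            # the original code discards this value; decide what it should feed into
    return x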

multiple functions as arguments in python

I have the following problem: I have two sets of data (set T and set F) and the following functions:
x(T) = arctan(T - c0)
A(x(T)) = arctan(x(T) - c1)
B(x(T)) = arctan(x(T) - c2)
Y(x(T), F) = (A(x(T)) - B(x(T)))/2 - A(x(T))*arctan(F - c3) + B(x(T))*arctan(F - c4)
where c0, c1, c2, c3, c4 are constants.
Now I want to create a surface plot of Y, and for that I would like to implement Y as a python (numpy) function, which turns out to be quite complicated because Y takes other functions as input.
Another idea of mine was to evaluate x, B and A on the data separately and store the results in numpy arrays. With those I could also get the output of Y, but I don't know which way is better for plotting the data, and I would really like to know how to write Y as a python function.
Thank you very much for your help
It is absolutely possible to use functions as input parameters to other functions. A use case could look like:
def plus_one(standard_input_parameter_like_int):
    return standard_input_parameter_like_int + 1

def apply_function(function_as_input, standard_input_parameter):
    return function_as_input(standard_input_parameter)

if __name__ == '__main__':
    print(apply_function(plus_one, 1))
I hope that helps to solve your specific problem.
[...] something like def s(x,y,z,*args,*args2): will yield an error.
This is perfectly normal: only one variable-length non-keyword argument list is allowed per function (the name args is just a convention; the single asterisk is what matters). So if you remove the asterisk from args2 you should actually be able to define s properly.
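If you genuinely need a second open-ended group of arguments, the usual pattern is a single *args plus a single **kwargs:

def s(x, y, z, *args, **kwargs):
    print(x, y, z, args, kwargs)

s(1, 2, 3, 4, 5, extra=6)  # -> 1 2 3 (4, 5) {'extra': 6}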
Regarding your initial question you could do something like:
import numpy as np

c = [0.2, -0.2, 0, 0, 0, 0]

def x(T):
    return np.arctan(T - c[0])

def A(xfunc, T):
    return np.arctan(xfunc(T) - c[1])

def B(xfunc, T):
    return np.arctan(xfunc(T) - c[2])

def Y(xfunc, Afunc, Bfunc, t, f):
    return (Afunc(xfunc, t) - Bfunc(xfunc, t))/2.0 - Afunc(xfunc, t)*np.arctan(f - c[3]) + Bfunc(xfunc, t)*np.arctan(f - c[4])

_tSet = np.linspace(-1, 1, 20)
_fSet = np.linspace(-1, 1, 20)  # linspace (not arange) so _fSet matches _tSet in length
print(Y(x, A, B, _tSet, _fSet))
As you can see (and have probably already tested yourself, judging from your comment), you can use functions as arguments. And as long as you don't use any 'if' conditions or other non-vectorized operations in your 'sub'-functions, the top-level function should already be vectorized.
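To get the surface plot the question asks about, the vectorized Y above can be fed a meshgrid directly. A sketch, assuming matplotlib is installed:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- only needed on older matplotlib

# reuse Y, x, A, B from the snippet above
T, F = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
Z = Y(x, A, B, T, F)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(T, F, Z)
ax.set_xlabel('T')
ax.set_ylabel('F')
ax.set_zlabel('Y')
plt.show()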
