I was profiling Erlang's lists:reverse built-in function (BIF) to see how well it scales with the size of the input. More specifically, I tried:
1> X = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
2> timer:tc(lists, reverse, [X]).
{57737,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
3> timer:tc(lists, reverse, [X]).
{46896,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
4> Y = lists:seq(1, 10000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
5> timer:tc(lists, reverse, [Y]).
{434079,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
6> timer:tc(lists, reverse, [Y]).
{214173,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
OK, so far it seems like the reverse BIF scales approximately linearly with the size of the input (e.g. multiply the size of the input by 10 and the time taken also increases by a factor of 10). In pure Erlang that would make sense, since we would use something like tail recursion to reverse the list. I guess that even as a BIF implemented in C, the algorithm for reversing a list seems to be the same (maybe because of the way lists are represented in Erlang?).
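For reference, the accumulator-based reversal described above can be sketched in Python on (head, tail) cons cells; this is my own illustration of the idea, not Erlang's actual BIF code. One pass over the list, one O(1) cons per element, so O(n) overall:

```python
# Sketch of accumulator-based list reversal, the way a tail-recursive
# Erlang reverse works, using (head, tail) tuples as cons cells.

def from_list(xs):
    """Build a cons-cell list from a Python list (demo helper)."""
    node = None
    for item in reversed(xs):
        node = (item, node)
    return node

def to_list(node):
    """Flatten a cons-cell list back into a Python list (demo helper)."""
    out = []
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out

def reverse_acc(node):
    """Walk the list once, consing each head onto an accumulator."""
    acc = None  # the empty list
    while node is not None:
        head, tail = node
        acc = (head, acc)  # O(1) cons
        node = tail
    return acc

print(to_list(reverse_acc(from_list([1, 2, 3, 4]))))  # [4, 3, 2, 1]
```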
Now I wanted to compare this with another language, perhaps another dynamically typed language that I already use. So I tried a similar thing in Python, taking care to, very explicitly, use actual lists instead of generators, which I anticipated would skew the results in Python's favour and give it an unfair advantage.
import time

ms_conv_factor = 10**6

def profile(func, *args):
    start = time.time()
    func(args)
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)

x = list([i for i in range(0, 1000000)])
y = list([i for i in range(0, 10000000)])
z = list([i for i in range(0, 100000000)])

def f(m):
    return m[::-1]

def g(m):
    return reversed(m)

if __name__ == "__main__":
    print("All done loading the lists, starting now.", flush=True)
    print("f:")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, z)
    print("")
    print("g:")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, z)
This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs.
Exact result:
All done loading the lists, starting now.
f:
1.430511474609375
0.7152557373046875
0.7152557373046875
0.2384185791015625
0.476837158203125
g:
1.9073486328125
0.7152557373046875
0.2384185791015625
0.2384185791015625
0.476837158203125
My first, naive, guess was that Python might be able to recognize the reverse construct and create something like a reverse iterator and return that (Python can work with references, right? Maybe it was using some kind of optimization here). But I don't think that theory makes sense, since the original list and the returned list are not the same (changing one shouldn't change the other).
So my question(s) here is(are):
Is my profiling technique here flawed? Have I written the tests in a way that favor one language over the other?
What is the difference in the implementation of lists and their reversal in Erlang vs Python that makes this situation (Python being WAY faster) possible?
Thanks for your time (in advance).
"This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs."
Because your profiling function is incorrect. It accepts variable positional arguments, but when it passes them to the function it doesn't unpack them, so you are only ever calling the function with a single argument: a tuple of length one. You need to do the following:
def profile(func, *args):
    start = time.time()
    func(*args)  # Make sure to unpack the args!
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)
So notice the difference:
>>> def foo(*args):
... print(args)
... print(*args)
...
>>> foo(1,2,3)
(1, 2, 3)
1 2 3
Also note, reversed(m) creates a reversed iterator, so it doesn't actually do anything until you iterate over it. So g will still be constant time.
But rest assured, reversing a list in Python takes linear time.
I am using the method of lines with `solve_ivp` to solve a nonlinear PDE:
@njit(fastmath=True, error_model="numpy", cache=True)
def thinFilmEq(t, h, dx, Ma, phiFun, tempFun):
    phi = phiFun(h)
    temperature = tempFun(h)
    hxx = (np.roll(h,1) - 2*h + np.roll(h,-1))/dx**2  # use np.roll as I'm implementing periodic BC
    p = phi - hxx
    px = (np.roll(p,-1) - np.roll(p,1))/(2*dx)
    Tx = (np.roll(temperature,-1) - np.roll(temperature,1))/(2*dx)
    flux = h**3*px/3 + Ma*h**2*Tx/2
    dhdt = (np.roll(flux,-1) - np.roll(flux,1))/(2*dx)
    return dhdt
I get the following error: TypingError: non-precise type pyobject
[1] During: typing of argument at C:/Users/yhcha/method_of_lines/test_01_thinFilmEq.py (28). I suspect it is due to phiFun and tempFun, the functions I supply at call time. I made them arguments of thinFilmEq just to keep things more general. When I remove phiFun and tempFun and write the function forms explicitly inside thinFilmEq, the error goes away.
Then, I see the following error: TypingError: Use of unsupported NumPy function 'numpy.roll' or unsupported use of the function. I thought maybe np.roll is not supported, although it is listed on the official website. So I tried to 'enlarge' the array, manually applying the same shift as np.roll when computing the finite differences for the periodic BC:
def augment(x):
    x2 = np.empty(len(x)+2)
    x2[1:-1] = x
    x2[0] = x[-1]
    x2[-1] = x[0]
    return x2

H = augment(h)
hx = (H[2:]-H[:-2])/(2*dx)  # use this instead of hx = (np.roll(h,-1)-np.roll(h,1))/(2*dx)
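The augmented-array stencil can be sanity-checked against the np.roll version outside numba; this is a quick check of my own, not part of the original code (the sine sample and dx are arbitrary):

```python
import numpy as np

def augment(x):
    # Pad one ghost cell on each side for periodic boundary conditions.
    x2 = np.empty(len(x) + 2)
    x2[1:-1] = x
    x2[0] = x[-1]
    x2[-1] = x[0]
    return x2

h = np.sin(np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False))
dx = 0.1
H = augment(h)
hx_aug = (H[2:] - H[:-2]) / (2 * dx)                   # ghost-cell stencil
hx_roll = (np.roll(h, -1) - np.roll(h, 1)) / (2 * dx)  # np.roll stencil
print(np.allclose(hx_aug, hx_roll))  # True — same periodic central difference
```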
My questions are:
It seems that I can get numba to work, at the expense of making the code less general (I cannot supply an arbitrary function like phiFun) and less elegant (e.g. I cannot use a one-liner with np.roll). Are there ways to get around this, or is that just the price I need to pay when using numba to 'compile' the code?
The original version without numba is close to 10x slower than the Matlab version I wrote, and the numba version is still around 3-4 times slower than Matlab. I don't really expect scipy to outperform Matlab, but are there other ways to speed up the code and bridge the gap?
I'm starting with numba, and my first goal is to try to accelerate a not-so-complicated function with a nested loop.
Given the following class:
class TestA:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def get_mult(self):
        return self.a * self.b
and a numpy ndarray that contains TestA objects, of dimension (N,), where N is usually ~3 million in length.
Now given the following function:
def test_no_jit(custom_class_obj_container):
    container_length = len(custom_class_obj_container)
    sum = 0
    for i in range(container_length):
        for j in range(i + 1, container_length):
            obj_i = custom_class_obj_container[i]
            obj_j = custom_class_obj_container[j]
            sum += (obj_i.get_mult() + obj_j.get_mult())
    return sum
I've tried to play around with numba to get it to work with the function above; however, I cannot seem to get it to work with the nopython=True flag, and if it's set to False, then the runtime is higher than the no-jit function.
Here is my latest attempt at jitting the function (also using nb.prange):
@nb.jit(nopython=False, parallel=True)
def test_jit(custom_class_obj_container):
    container_length = len(custom_class_obj_container)
    sum = 0
    for i in nb.prange(container_length):
        for j in nb.prange(i + 1, container_length):
            obj_i = custom_class_obj_container[i]
            obj_j = custom_class_obj_container[j]
            sum += (obj_i.get_mult() + obj_j.get_mult())
    return sum
I've tried to search around but I cannot seem to find a tutorial on how to define a custom class in the signature, or how I would go about accelerating a function of this sort and getting it to run on the GPU with the CUDA libraries, which are installed and ready to use (previously used with TensorFlow). Any info regarding that matter would be highly appreciated.
The numba docs give an example of creating a custom type, even for nopython mode: https://numba.pydata.org/numba-doc/latest/extending/interval-example.html
In your case, though, unless this is a really slimmed-down version of what you actually want to do, it seems like the easiest approach would be to reuse existing types. Additionally, constructing a 3M-length object array is going to be slow and will produce fragmented memory (since the objects are not stored in contiguous blocks).
An example of how record arrays might be used to solve the problem:
x_dt = np.dtype([('a', np.float64),
                 ('b', np.float64)])
n = 30000
buf = np.arange(n*2).reshape((n, 2)).astype(np.float64)
vec3 = np.recarray(n, dtype=x_dt, buf=buf)

@numba.njit
def mult(a):
    return a.a * a.b

@numba.jit(nopython=True, parallel=True)
def sum_of_prod(vector):
    sum = 0
    vector_len = len(vector)
    for i in numba.prange(vector_len):
        for j in numba.prange(i + 1, vector_len):
            sum += mult(vector[i]) + mult(vector[j])
    return sum

sum_of_prod(vec3)
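As a side note (my own observation, not from the original post): because every product appears in exactly n-1 of the (i, j) pairs, this particular double loop collapses to a closed form, which is handy for cross-checking the parallel version on small inputs:

```python
import numpy as np

# sum over i<j of (p_i + p_j) equals (n - 1) * sum(p), since each p_k
# is paired with each of the other n-1 elements exactly once.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])
p = a * b  # [5, 12, 21, 32]
n = len(p)

brute = sum(p[i] + p[j] for i in range(n) for j in range(i + 1, n))
closed = (n - 1) * p.sum()
print(brute, closed)  # both 210.0
```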
FWIW, I'm no numba expert. I found this question when searching for how to implement a custom type in numba for non-numerical stuff. In your case, because this is highly numerical, I think a custom type is probably overkill.
Currently, I am working on my graduation project, but I am having some trouble with my code. Is there anybody who could help me solve my error? I am trying to optimize the profit generated from water.
The error I get is the following (it appears on the if line of the constraint):
TypeError: 'function' object has no attribute '__getitem__'
My code:
#constants
Ymax=8 #tonne/ha
ky=1.25
Numbas=3 #3 subbasins
Nummon=12 #12 months in a year
c_hydro=0.9 #Conversion rate m3 to kWh
LBPmaize=316413 #LBP/tonne
LBPhydro=55 #LBP/kWh
alpha=0.7
p1=0.35 #soil moisture depletion factor for no stress

#parameters
S0=[207.112, 150, 161.398]
A=[74571.9, 1537.8, 6645.7] #total area per subbasin
a=[0.423, 0.959, 0.473] #part of area used for irrigation
R=[0.2, 0.3, 0.5]
Qhydromatrix=[0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,
              1,1,1,1,1,0,0,0,0,1,1,1]

#Definitions
def profit(x,sign=-1):
    x=[]*(Numbas*Nummon)
    return sign * (alpha*Yasum*LBPmaize)

def ETa(x):
    for i in range(0,(Nummon*Numbas)):
        ETa=[]*(Numbas*Nummon)
        if np.multiply(x,p1)>ETmax_maize:
            ETa[i]=ETmax_maize[i]
        else:
            ETa[i]=np.multiply(x[i],p1)
    return ETa

def Yasum(ETa):
    Yasum=0
    for j in range(0,Numbas):
        for i in range(0,Nummon):
            ETasum=sum(ETa[j*Nummon:(j+1)*Nummon])
            ETmaxsum=sum(ETmax_maize[j*Nummon:(j+1)*Nummon])
            Ya=((-1*Ymax*ky)*(1-(ETasum/ETmaxsum))+Ymax)*A[j]*a[j]
            Yasum=Yasum+Ya
    return Yasum

def constraint(x):
    for j in range(0,Numbas):
        for i in range(0,Nummon):
            if (i-(j*Nummon))==0:
                x[i+(j*Nummon)]-((1-R[j])*S0[j])+ETa[i+(j*Nummon)+11]-P[i+(j*Nummon)+11]-Rdown[i+(j*Nummon)]
            else:
                ETa[i+(j*Nummon)-1]-((1-R[j])*x[i+(j*Nummon)-1])+x[i+(j*Nummon)]-P[i+(j*Nummon)-1]-Rdown[i+(j*Nummon)]
    return x

con2=({'type':'ineq','fun':constraint})
x0=[100]*(Nummon*Numbas)
sol=minimize(profit, x0,method='SLSQP', constraints=con2)
In general, this error means you have an expression f[x] where f is a function.
You have several problems in your code, but the error comes from this line:
ETa[i+(j*Nummon)-1]-((1-R[j])*x[i+(j*Nummon)-1])+x[i+(j*Nummon)]-P[i+(j*Nummon)-1]-Rdown[i+(j*Nummon)]
# ^ this is a function
You defined a function called ETa. You also use local variables named ETa. This is not a good idea in general. In the last function, you don't have a local variable with that name, so the name is looked up in the global scope - and there it's a function; functions cannot be indexed this way.
Other issues with your code:
In the function profit you are assigning to the parameter, not using it: x=[]*(Numbas*Nummon). This assignment is local and has no effect on the argument passed to the function.
You are using expressions of the form [] * some_number in two places. This is meaningless - repeating the empty list returns the empty list.
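A minimal demonstration: repeating the empty list yields the empty list again; to pre-allocate n slots you repeat a placeholder element instead.

```python
n = 4
print([] * n)      # []
print([0] * n)     # [0, 0, 0, 0]
print([None] * n)  # [None, None, None, None]
```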
In the function constraint you are returning x, but not changing it in any way. Did you intend to change it from within profit? Because you can't.
In the same function, both the if and the else branches go on to evaluate an expression that has no effect.
Note that those problems are unrelated to the logic of your code.
Important style issues:
Please use snake_case variable names. This does not affect the behavior of the code, but helps other people read it, so it will help you in getting feedback. Similarly, put spaces around operators such as =, +, etc.
If expressions get too long, break them into several statements using local variables.
If an expression repeats itself (such as j * Nummon above), put its value in a variable.
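For example, the flat index that appears throughout constraint can be computed once per iteration (the name k is my own choice for illustration):

```python
Nummon = 12
Numbas = 3

indices = []
for j in range(Numbas):
    for i in range(Nummon):
        k = i + j * Nummon  # compute the repeated index expression once...
        indices.append(k)   # ...then reuse k instead of re-deriving it

print(indices == list(range(Numbas * Nummon)))  # True: k walks the flat array in order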
I have the following problem: I have two sets of data (set T and set F), and the following functions:
x(T) = arctan(T - c0)
A(x(T)) = arctan(x(T) - c1)
B(x(T)) = arctan(x(T) - c2)
Y(x(T), F) = (A(x(T)) - B(x(T)))/2 - A(x(T))*arctan(F - c3) + B(x(T))*arctan(F - c4)
# where c0, c1, c2, c3, c4 are constants
Now I want to create a surface plot of Y, and for that I would like to implement Y as a Python (numpy) function, which turns out to be quite complicated because Y takes other functions as input.
Another idea of mine was to evaluate x, A and B on the data separately and store the results in numpy arrays. With those I could also get the output of Y, but I don't know which way is better for plotting the data, and I really would like to know how to write Y as a Python function.
Thank you very much for your help
It is absolutely possible to use functions as input parameters to other functions. A use case could look like:
def plus_one(standard_input_parameter_like_int):
    return standard_input_parameter_like_int + 1

def apply_function(function_as_input, standard_input_parameter):
    return function_as_input(standard_input_parameter)

if __name__ == '__main__':
    print(apply_function(plus_one, 1))
I hope that helps to solve your specific problem.
"[...] something like def s(x,y,z,*args,*args2): will yield an error."
This is perfectly normal: only one variable-length non-keyword argument list is allowed per function signature (the name args is just a convention; any name after a single * works). So if you remove the asterisk from the second one, turning *args2 into a regular parameter (or into **args2 to collect keyword arguments), you should actually be able to define s properly.
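A short illustration (the function s here is hypothetical, just to show the rule): one *args may be combined with **kwargs, but a second starred positional list such as def s(x, *args, *args2) is a SyntaxError.

```python
# One variadic positional parameter (*args) plus a variadic keyword
# parameter (**kwargs) is allowed; a second *-parameter is not.
def s(x, y, z, *args, **kwargs):
    return x + y + z + sum(args) + sum(kwargs.values())

print(s(1, 2, 3, 4, 5, w=6))  # 21
```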
Regarding your initial question you could do something like:
import numpy as np

c = [0.2,-0.2,0,0,0,0]

def x(T):
    return np.arctan(T-c[0])

def A(xfunc,T):
    return np.arctan(xfunc(T) - c[1])

def B(xfunc,T):
    return np.arctan(xfunc(T) - c[2])

def Y(xfunc,Afunc,Bfunc,t,f):
    return (Afunc(xfunc,t) - Bfunc(xfunc,t))/2.0 - Afunc(xfunc,t) * np.arctan(f - c[3]) + Bfunc(xfunc,t)*np.arctan(f-c[4])

_tSet = np.linspace(-1,1,20)
_fSet = np.linspace(-1,1,20)  # note: np.arange(-1,1,20) would yield only a single element
print(Y(x,A,B,_tSet,_fSet))
As you can see (and as you probably already tested yourself, judging from your comment), you can use functions as arguments. And as long as you don't use any if conditions or other non-vectorized operations in your 'sub'-functions, the top-level function should already be vectorized.
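Building on that, here is a sketch of evaluating Y over a grid for the surface plot; the constants are placeholders of mine, not the asker's real values, and the function names follow the question's notation:

```python
import numpy as np

c = [0.2, -0.2, 0.0, 0.0, 0.0]

def x(T):
    return np.arctan(T - c[0])

def A(T):
    return np.arctan(x(T) - c[1])

def B(T):
    return np.arctan(x(T) - c[2])

def Y(t, f):
    # NumPy broadcasting lets t and f be 2-D meshgrid arrays.
    return (A(t) - B(t)) / 2.0 - A(t) * np.arctan(f - c[3]) + B(t) * np.arctan(f - c[4])

T, F = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))
Z = Y(T, F)
print(Z.shape)  # (20, 20) — ready for e.g. matplotlib's plot_surface
```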