Issues when profiling list reversal in Python vs Erlang - python
I was profiling Erlang's lists:reverse Built in Function (BIF) to see how well it scales with the size of the input. More specifically, I tried:
1> X = lists:seq(1, 1000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
2> timer:tc(lists, reverse, [X]).
{57737,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
3> timer:tc(lists, reverse, [X]).
{46896,
[1000000,999999,999998,999997,999996,999995,999994,999993,
999992,999991,999990,999989,999988,999987,999986,999985,
999984,999983,999982,999981,999980,999979,999978,999977,
999976,999975,999974|...]}
4> Y = lists:seq(1, 10000000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29|...]
5> timer:tc(lists, reverse, [Y]).
{434079,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
6> timer:tc(lists, reverse, [Y]).
{214173,
[10000000,9999999,9999998,9999997,9999996,9999995,9999994,
9999993,9999992,9999991,9999990,9999989,9999988,9999987,
9999986,9999985,9999984,9999983,9999982,9999981,9999980,
9999979,9999978,9999977,9999976,9999975,9999974|...]}
Ok, so far it seems like the reverse BIF scales approximately linearly with the size of the input (e.g. multiply the size of the input by 10 and the time taken also increases by a factor of 10). In pure Erlang that would make sense, since we would use something like tail recursion to reverse the list. I guess that even as a BIF implemented in C, the algorithm for reversing a list seems to be the same (maybe because of the way lists are represented in Erlang?).
Now I wanted to compare this with another language - perhaps another dynamically typed language that I already use. So I tried a similar thing in Python - taking care to, very explicitly, use actual lists instead of generators, which I anticipated would skew Python's performance positively in this test and give it an unfair advantage.
import time

ms_conv_factor = 10**6

def profile(func, *args):
    start = time.time()
    func(args)
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)

x = list([i for i in range(0, 1000000)])
y = list([i for i in range(0, 10000000)])
z = list([i for i in range(0, 100000000)])

def f(m):
    return m[::-1]

def g(m):
    return reversed(m)

if __name__ == "__main__":
    print("All done loading the lists, starting now.", flush=True)
    print("f:")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, x)
    profile(f, y)
    print("")
    profile(f, z)
    print("")
    print("g:")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, x)
    profile(g, y)
    print("")
    profile(g, z)
This seems to suggest that after the function has been loaded and run once, the length of the input makes no difference and the reversal times are incredibly fast - in the range of ~0.7µs.
Exact result:
All done loading the lists, starting now.
f:
1.430511474609375
0.7152557373046875
0.7152557373046875
0.2384185791015625
0.476837158203125
g:
1.9073486328125
0.7152557373046875
0.2384185791015625
0.2384185791015625
0.476837158203125
My first, naive, guess was that Python might be able to recognize the reverse construct and return something like a reverse iterator (Python can work with references, right? Maybe it was using some kind of optimization here). But I don't think that theory makes sense, since the original list and the returned list are not the same (changing one shouldn't change the other).
So my question(s) here is(are):
Is my profiling technique here flawed? Have I written the tests in a way that favor one language over the other?
What is the difference in implementation of lists and their reversal in Erlang vs Python that make this situation (of Python being WAY faster) possible?
Thanks for your time (in advance).
This seems to suggest that after the function has been loaded and run
once, the length of the input makes no difference and the reversal
times are incredibly fast - in the range of ~0.7µs.
Because your profiling function is incorrect. It accepts variable positional arguments, but when it passes them to the function, it doesn't unpack them so you are only ever working with a tuple of length one. You need to do the following:
def profile(func, *args):
    start = time.time()
    func(*args)  # Make sure to unpack the args!
    end = time.time()
    elapsed_seconds = end - start
    print(elapsed_seconds * ms_conv_factor, flush=True)
So notice the difference:
>>> def foo(*args):
... print(args)
... print(*args)
...
>>> foo(1,2,3)
(1, 2, 3)
1 2 3
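Incidentally, the suspiciously quantized values in the question's output (all multiples of ~0.2384 µs) look like the granularity of time.time on that platform; time.perf_counter avoids the problem. A sketch of the fixed profiler using it (the lambda stands in for the question's f):

```python
import time

def profile(func, *args):
    # perf_counter has much finer resolution than time.time
    start = time.perf_counter()
    result = func(*args)  # unpacked, so func receives the list itself
    elapsed = time.perf_counter() - start
    print(f"{elapsed * 1e6:.1f} µs", flush=True)
    return result

x = list(range(1_000_000))
rev = profile(lambda m: m[::-1], x)
```

With the args unpacked, the printed time grows with the input size instead of collapsing to a sub-microsecond constant.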
Also note, reversed(m) creates a reversed iterator, so it doesn't actually do anything until you iterate over it. So g will still be constant time.
But rest assured, reversing a list in Python takes linear time.
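A quick timeit sketch (numbers will vary by machine) that makes the linear scaling visible once the argument is actually passed through:

```python
import timeit

for n in (100_000, 1_000_000):
    m = list(range(n))
    # average per-call cost of the slice reversal; expect roughly 10x growth
    per_call = timeit.timeit(lambda: m[::-1], number=10) / 10
    print(f"n={n}: {per_call * 1e3:.3f} ms")

# The slice really is a new, reversed list, not a view of the original
m = [1, 2, 3]
r = m[::-1]
```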
Related
Parse list of strings for speed
Background

I have a function called get_player_path that takes in a list of strings player_file_list and an int value total_players. For the sake of example I have reduced the list of strings and also set the int value to a very small number. Each string in player_file_list has either the form year-date/player_id/some_random_file.file_extension or year-date/player_id/IDATs/some_random_number/some_random_file.file_extension

Issue

What I am essentially trying to achieve here is to go through this list and store every unique year-date/player_id path in a set until its length reaches the value of total_players. My current approach does not seem the most efficient to me, and I am wondering if I can speed up my function get_player_path in any way?

Code

def get_player_path(player_file_list, total_players):
    player_files_to_process = set()
    for player_file in player_file_list:
        player_file = player_file.split("/")
        file_path = f"{player_file[0]}/{player_file[1]}/"
        player_files_to_process.add(file_path)
        if len(player_files_to_process) == total_players:
            break
    return sorted(player_files_to_process)

player_file_list = [
    "2020-10-27/31001804320549/31001804320549.json",
    "2020-10-27/31001804320549/IDATs/204825150047/foo_bar_Red.idat",
    "2020-10-28/31001804320548/31001804320549.json",
    "2020-10-28/31001804320548/IDATs/204825150123/foo_bar_Red.idat",
    "2020-10-29/31001804320547/31001804320549.json",
    "2020-10-29/31001804320547/IDATs/204825150227/foo_bar_Red.idat",
    "2020-10-30/31001804320546/31001804320549.json",
    "2020-10-30/31001804320546/IDATs/123455150047/foo_bar_Red.idat",
    "2020-10-31/31001804320545/31001804320549.json",
    "2020-10-31/31001804320545/IDATs/597625150047/foo_bar_Red.idat",
]

print(get_player_path(player_file_list, 2))

Output

['2020-10-27/31001804320549/', '2020-10-28/31001804320548/']
Let's analyze your function first: your loop should take linear time (O(n)) in the length of the input list, assuming the path lengths are bounded by a relatively "small" number; the sorting takes O(n log n) comparisons. Thus the sorting has the dominant cost when the list becomes big. You can micro-optimize your loop as much as you want, but as long as you keep that sort at the end, your effort won't make much of a difference with big lists. Your approach is fine if you're just writing a Python script. If you really needed performance with huge lists, you would probably be using some other language. Nonetheless, if you really care about performance (or just want to learn new stuff), you could try one of the following approaches:

replace the generic sorting algorithm with something specific for strings; see here for example
use a trie, removing the need for sorting; this could be theoretically better but is probably worse in practice.

Just for completeness, as a micro-optimization, assuming the date has a fixed length of 10 characters:

def get_player_path(player_file_list, total_players):
    player_files_to_process = set()
    for player_file in player_file_list:
        end = player_file.find('/', 12)  # <--- len(date) + len('/') + 1
        file_path = player_file[:end]    # <---
        player_files_to_process.add(file_path)
        if len(player_files_to_process) == total_players:
            break
    return sorted(player_files_to_process)

If the IDs have fixed length too, as in your example list, then you don't need any split or find, just:

LENGTH = DATE_LENGTH + ID_LENGTH + 1  # 1 is for the slash between date and id
...
for player_file in player_file_list:
    file_path = player_file[:LENGTH]
    ...

EDIT: fixed the LENGTH initialization, I had forgotten to add 1
I'll leave this solution here, which can be further improved; hope it helps.

player_file_list = (
    "2020-10-27/31001804320549/31001804320549.json",
    "2020-10-27/31001804320549/IDATs/204825150047/foo_bar_Red.idat",
    "2020-10-28/31001804320548/31001804320549.json",
    "2020-10-28/31001804320548/IDATs/204825150123/foo_bar_Red.idat",
    "2020-10-29/31001804320547/31001804320549.json",
    "2020-10-29/31001804320547/IDATs/204825150227/foo_bar_Red.idat",
    "2020-10-30/31001804320546/31001804320549.json",
    "2020-10-30/31001804320546/IDATs/123455150047/foo_bar_Red.idat",
    "2020-10-31/31001804320545/31001804320549.json",
    "2020-10-31/31001804320545/IDATs/597625150047/foo_bar_Red.idat",
)

def get_player_path(l, n):
    pfl = set()
    for i in l:
        i = "/".join(i.split("/")[0:2])
        if i not in pfl:
            pfl.add(i)
            if len(pfl) == n:
                return pfl
    if n > len(pfl):
        print("not enough matches")
        return

print(get_player_path(player_file_list, 2))
# {'2020-10-27/31001804320549', '2020-10-28/31001804320548'}

Python Demo
Use a dict so that you don't have to sort, since your list is already sorted. If you still need to sort you can always use sorted in the return statement. Add import re and replace your function as follows:

def get_player_path(player_file_list, total_players):
    dct = {re.search(r'^\w+-\w+-\w+/\w+', pf).group(): 1 for pf in player_file_list}
    return [k for i, k in enumerate(dct.keys()) if i < total_players]
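As a variation on the dict idea (a sketch, not the answer's code): since Python 3.7, dicts preserve insertion order, so dict.fromkeys can deduplicate the path prefixes without a regex or a sort:

```python
def get_player_path(player_file_list, total_players):
    # dict.fromkeys deduplicates while keeping first-seen order (Python 3.7+)
    prefixes = dict.fromkeys("/".join(pf.split("/")[:2]) for pf in player_file_list)
    return list(prefixes)[:total_players]

files = [
    "2020-10-27/31001804320549/31001804320549.json",
    "2020-10-27/31001804320549/IDATs/204825150047/foo_bar_Red.idat",
    "2020-10-28/31001804320548/31001804320549.json",
]
print(get_player_path(files, 2))
# ['2020-10-27/31001804320549', '2020-10-28/31001804320548']
```

Note this builds the full dict before truncating; if early exit matters for very long lists, the original loop-with-break keeps that property.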
multiple functions as arguments in python
I have the following problem: I have two sets of data (set T and set F), and the following functions:

x(T) = arctan(T - c0)
A(x(T)) = arctan(x(T) - c1)
B(x(T)) = arctan(x(T) - c2)
Y(x(T), F) = (A(x(T)) - B(x(T)))/2 - A(x(T))*arctan(F - c3) + B(x(T))*arctan(F - c4)
# where c0, c1, c2, c3, c4 are constants

Now I want to create a surface plot of Y, and for that I would like to implement Y as a Python (numpy) function, which turns out to be quite complicated because Y takes other functions as input. Another idea of mine was to evaluate x, B and A on the data separately and store the results in numpy arrays. With those I could also get the output of the function Y, but I don't know which way is better for plotting the data, and I really would like to know how to write Y as a Python function. Thank you very much for your help.
It is absolutely possible to use functions as input parameters to other functions. A use case could look like:

def plus_one(standard_input_parameter_like_int):
    return standard_input_parameter_like_int + 1

def apply_function(function_as_input, standard_input_parameter):
    return function_as_input(standard_input_parameter)

if __name__ == '__main__':
    print(apply_function(plus_one, 1))

I hope that helps to solve your specific problem.
[...] something like def s(x,y,z,*args,*args2): will yield an error. This is perfectly normal, as only one variable-length non-keyword argument list is allowed per function (conventionally written *args, though the name after the asterisk is arbitrary). So if you remove the asterisk from args2 you should actually be able to run s properly. Regarding your initial question, you could do something like:

import numpy as np

c = [0.2, -0.2, 0, 0, 0, 0]

def x(T):
    return np.arctan(T - c[0])

def A(xfunc, T):
    return np.arctan(xfunc(T) - c[1])

def B(xfunc, T):
    return np.arctan(xfunc(T) - c[2])

def Y(xfunc, Afunc, Bfunc, t, f):
    return ((Afunc(xfunc, t) - Bfunc(xfunc, t)) / 2.0
            - Afunc(xfunc, t) * np.arctan(f - c[3])
            + Bfunc(xfunc, t) * np.arctan(f - c[4]))

_tSet = np.linspace(-1, 1, 20)
_fSet = np.linspace(-1, 1, 20)
print(Y(x, A, B, _tSet, _fSet))

(Note: the original used np.arange(-1, 1, 20) for _fSet, which yields a single element because 20 is the step, not the count; np.linspace gives the intended 20 points.) As you can see (and have probably already tested yourself, judging from your comment) you can use functions as arguments. And as long as you don't use any if conditions or other non-vectorized functions in your 'sub'-functions, the top-level function is already vectorized.
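For the surface plot the question asks about, Y has to be evaluated on a 2-D grid over T and F. A hedged sketch (the constants and grid sizes are made up) folding the intermediate functions into one array-valued function and using np.meshgrid:

```python
import numpy as np

c = [0.2, -0.2, 0.0, 0.0, 0.0]  # made-up constants

def Y(T, F):
    # evaluate the intermediate functions directly on the arrays
    xT = np.arctan(T - c[0])
    A = np.arctan(xT - c[1])
    B = np.arctan(xT - c[2])
    return (A - B) / 2.0 - A * np.arctan(F - c[3]) + B * np.arctan(F - c[4])

t = np.linspace(-1.0, 1.0, 20)
f = np.linspace(-1.0, 1.0, 30)
TT, FF = np.meshgrid(t, f)  # 2-D grids, both of shape (30, 20)
Z = Y(TT, FF)               # surface values, same shape; pass to plot_surface
print(Z.shape)
```

Z can then go straight into matplotlib's Axes3D.plot_surface(TT, FF, Z).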
Operations in Python
My question is fairly basic yet might need a challenging solution. Essentially, I have an arbitrary function, which we will call some_function.

def some_function(n):
    for i in range(n):
        i + i
    r = 1
    r = r + 1

And I want to count the number of operations that take place when an arbitrary call to this function is executed (e.g. for some_function(5), 7 operations take place). How would one count the number of operations that took place within a function call? I cannot modify some_function.
I think you're really after what others already told you - the big-O notation. But if you really want to know the actual number of instructions executed, you can use this on Linux:

perf stat -e instructions:u python yourscript.py

Which will output:

 Performance counter stats for 'python yourscript.py':

        22,260,577      instructions:u

       0.014450363 seconds time elapsed

Note though that it includes all the instructions for executing Python itself. So you'd have to find your own reference.
Using byteplay:

Example:

#!/usr/bin/env python
from byteplay import Code

def some_function(n):
    for i in range(n):
        i + i
    r = 1
    r = r + 1

def no_of_bytecode_instructions(f):
    code = Code.from_code(f.func_code)
    return len(code.code)

print(no_of_bytecode_instructions(some_function))

Output:

$ python -i foo.py
28
>>>

NB: This still gives you no idea how complex f is here. "Number of instructions" != "algorithm complexity" (not by itself). See: Big O

Algorithm complexity is a measure of the number of instructions executed relative to the size of your input data set(s). Some naive examples of "complexity" and Big O:

def func1(just_a_list):
    """O(n)"""
    for i in just_a_list:
        ...

def func2(list_of_lists):
    """O(n^2)"""
    for i in list_of_lists:
        for j in i:
            ...

def func3(a_dict, a_key):
    """O(1)"""
    return a_dict[a_key]
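byteplay (and f.func_code) is Python 2-only; on Python 3 the standard library's dis module can produce a comparable static instruction count - a sketch, with the same caveat that bytecode count is not runtime operation count:

```python
import dis

def some_function(n):
    for i in range(n):
        i + i
    r = 1
    r = r + 1

def no_of_bytecode_instructions(f):
    # static count of compiled bytecode instructions, not operations executed
    return sum(1 for _ in dis.get_instructions(f))

print(no_of_bytecode_instructions(some_function))
```

The exact number varies between CPython versions, since the compiler and instruction set change.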
Only index needed: enumerate or (x)range?
If I want to use only the index within a loop, should I better use the range/xrange function in combination with len()?

a = [1,2,3]
for i in xrange(len(a)):
    print i

Or enumerate? Even if I won't use p at all?

for i, p in enumerate(a):
    print i
I would use enumerate as it's more generic - e.g. it will work on iterables and sequences, and the overhead of just returning a reference to an object isn't that big a deal - while xrange(len(something)), although (to me) more easily readable as your intent, will break on objects with no support for len.
Using xrange with len is quite a common use case, so yes, you can use it if you only need to access values by index. But if you prefer to use enumerate for some reason, you can use an underscore (_); it's just a frequently seen convention showing that you won't use the variable in any meaningful way:

for i, _ in enumerate(a):
    print i

There's also a pitfall that may happen when using the underscore (_). It's also common to name 'translating' functions _ in i18n libraries and systems, so beware of using it with gettext or some other library of that kind (thanks to @lazyr).
That's a rare requirement – the only information used from the container is its length! In this case, I'd indeed make this fact explicit and use the first version.
xrange should be a little faster, but enumerate will mean you don't need to change it when you realise that you need p after all.
I ran a time test and found out range is about 2x faster than enumerate (on Python 3.6 for Win32).

Best of 3, for len(a) = 1M:

enumerate(a): 0.125s
range(len(a)): 0.058s

Hope it helps.

FYI: I initially started this test to compare Python vs VBA's speed... and found out VBA is actually 7x faster than the range method... is it because of my poor Python skills? Surely Python can do better than VBA somehow.

Script for enumerate:

import time
a = [0]
a = a * 1000000
time.perf_counter()
for i, j in enumerate(a):
    pass
print(time.perf_counter())

Script for range:

import time
a = [0]
a = a * 1000000
time.perf_counter()
for i in range(len(a)):
    pass
print(time.perf_counter())

Script for VBA (0.008s):

Sub timetest_for()
Dim a(1000000) As Byte
Dim i As Long
tproc = Timer
For i = 1 To UBound(a)
Next i
Debug.Print Timer - tproc
End Sub
I wrote this because I wanted to test it. So it depends on whether you need the values to work with.

Code:

testlist = []
for i in range(10000):
    testlist.append(i)

def rangelist():
    a = 0
    for i in range(len(testlist)):
        a += i
        a = testlist[i] + 1  # Comment this line for example for testing

def enumlist():
    b = 0
    for i, x in enumerate(testlist):
        b += i
        b = x + 1  # Comment this line for example for testing

import timeit
t = timeit.Timer(lambda: rangelist())
print("range(len()):")
print(t.timeit(number=10000))
t = timeit.Timer(lambda: enumlist())
print("enum():")
print(t.timeit(number=10000))

Now you can run it, and you will most likely get the result that enum() is faster. When you comment out the lines a = testlist[i] + 1 and b = x + 1 you will see that range(len()) is faster.

For the code above I get:

range(len()):
18.766527627612255
enum():
15.353173553868345

Now when commenting as stated above I get:

range(len()):
8.231641875551514
enum():
9.974262515773656
Based on your sample code,

res = [[profiel.attr[i].x for i, p in enumerate(profiel.attr)] for profiel in prof_obj]

I would replace it with

res = [[p.x for p in profiel.attr] for profiel in prof_obj]
Just use range(). If you're going to use all the indexes anyway, xrange() provides no real benefit (unless len(a) is really large). And enumerate() creates a richer datastructure that you're going to throw away immediately.
Reduce function calls
I profiled my Python program and found that the following function was taking too long to run. Perhaps I can use a different algorithm and make it run faster. However, I have read that I can also possibly increase the speed by reducing function calls, especially when a function gets called repeatedly within a loop. I am a Python newbie and would like to learn how to do this and see how much faster the code can get. Currently, the function is:

def potentialActualBuyers(setOfPeople, theCar, price):
    count = 0
    for person in setOfPeople:
        if person.getUtility(theCar) >= price and person.periodCarPurchased == None:
            count += 1
    return count

where setOfPeople is a list of person objects. I tried the following:

def potentialActualBuyers(setOfPeople, theCar, price):
    count = 0
    Utility = person.getUtility
    for person in setOfPeople:
        if Utility(theCar) >= price and person.periodCarPurchased == None:
            count += 1
    return count

This, however, gives me an error saying

local variable 'person' referenced before assignment

Any suggestions how I can reduce function calls, or any other changes that can make the code faster? Again, I am a Python newbie, and even though I may possibly be able to use a better algorithm, it is still worthwhile learning the answer to the above question. Thanks very much.

***** EDIT *****

Adding the getUtility method:

def getUtility(self, theCar):
    if theCar in self.utility.keys():
        return self.utility[theCar]
    else:
        self.utility[theCar] = self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
        return self.utility[theCar]

***** EDIT: asking for new ideas *****

Any ideas how to speed this up further? I used the method suggested by Alex to cut the time in half. Can I speed this up further? Thanks.
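The getUtility method above hand-rolls memoization with a dict. On Python 3, functools.lru_cache does the same job declaratively. A sketch only - the Car namedtuple and the constants are made up, and the real Person class surely differs:

```python
from functools import lru_cache
from collections import namedtuple

Car = namedtuple("Car", "mpg hp pc")  # hypothetical stand-in for the question's car class

class Person:
    # illustrative constants; the question's Person sets these per instance
    A, alpha, beta, gamma = 1.0, 0.5, 0.3, 0.2

    @lru_cache(maxsize=None)  # memoizes per (self, car) pair, replacing the hand-rolled dict
    def getUtility(self, car):
        return (self.A * car.mpg ** self.alpha
                * car.hp ** self.beta
                * car.pc ** self.gamma)

p = Person()
c = Car(mpg=30.0, hp=150.0, pc=20000.0)
print(p.getUtility(c))  # computed once...
print(p.getUtility(c))  # ...then served from the cache
```

One caveat: lru_cache on a method keys the cache on self as well, so it keeps instances alive for the lifetime of the cache; for long-lived programs the question's per-instance dict avoids that.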
I doubt you can get much speedup in this case by hoisting the lookup of person.getUtility (by class, not by instance, as other answers have pointed out). Maybe...:

return sum(1 for p in setOfPeople
           if p.periodCarPurchased is None
           and p.getUtility(theCar) >= price)

but I suspect most of the time is actually spent in the execution of getUtility (and possibly in the lookup of p.periodCarPurchased if that's some fancy property as opposed to a plain old attribute -- I moved the latter before the and just in case it is a plain attribute and can save a number of the getUtility calls). What does your profiling say wrt the fraction of time spent in this function (net of its calls to others) vs the method (and possibly property) in question?
Try instead (that's assuming all persons are of the same type Person):

Utility = Person.getUtility
for person in setOfPeople:
    if Utility(person, theCar) >= ...

Also, instead of == None, using is None should be marginally faster. Try whether swapping the and terms helps.
Methods are just functions bound to an object:

Utility = Person.getUtility
for person in setOfPeople:
    if Utility(person, theCar) ...

This doesn't eliminate a function call though; it eliminates an attribute lookup.
This one line made my eyes bleed:

self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))

Let's make it legible and PEP8able, and then see if it can be faster. First some spaces:

self.utility[theCar] = self.A * (math.pow(theCar.mpg, self.alpha)) * (math.pow(theCar.hp, self.beta)) * (math.pow(theCar.pc, self.gamma))

Now we can see there are very redundant parentheses; remove them:

self.utility[theCar] = self.A * math.pow(theCar.mpg, self.alpha) * math.pow(theCar.hp, self.beta) * math.pow(theCar.pc, self.gamma)

Hmmm: 3 lookups of math.pow and 3 function calls. You have three choices for powers: x ** y, the built-in pow(x, y[, z]), and math.pow(x, y). Unless you have good reason for using one of the others, it's best (IMHO) to choose x ** y; you save both the attribute lookup and the function call.

self.utility[theCar] = self.A * theCar.mpg ** self.alpha * theCar.hp ** self.beta * theCar.pc ** self.gamma

annnnnnd while we're here, let's get rid of the horizontal scroll-bar:

self.utility[theCar] = (self.A
                        * theCar.mpg ** self.alpha
                        * theCar.hp ** self.beta
                        * theCar.pc ** self.gamma)

A possibility that would require quite a rewrite of your existing code, and may not help anyway (in Python), would be to avoid most of the power calculations by taking logs everywhere and working with log_utility = log_A + log_mpg * alpha ...
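The log rewrite hinted at in the last paragraph can be sanity-checked numerically; all names and constants below are made up for illustration:

```python
import math

A, alpha, beta, gamma = 2.0, 0.5, 0.3, 0.2
mpg, hp, pc = 30.0, 150.0, 20000.0

# Direct form: three power operations
direct = A * mpg ** alpha * hp ** beta * pc ** gamma

# Log form: the product of powers becomes a sum of scaled logs,
# with a single exp at the end
log_utility = (math.log(A) + alpha * math.log(mpg)
               + beta * math.log(hp) + gamma * math.log(pc))
via_logs = math.exp(log_utility)

print(direct, via_logs)
```

If the logs of mpg, hp, and pc can be precomputed once per car, each utility evaluation reduces to three multiplies, three adds, and one exp.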