I am looking for a way to partially apply functions in Python that is simple to understand, readable, reusable, and as little prone to coder mistakes as possible. Most of all I want the style to be as performant as possible - fewer frames on the stack is nice, and a smaller memory footprint for the partially applied functions is also desirable. I have considered 4 styles and written examples below:
import functools

def multiplier(m):
    def inner(x):
        return m * x
    return inner

def divide(n, d):
    return n / d

def divider(d):
    return functools.partial(divide, d=d)

times2 = multiplier(2)
print(times2(3))  # 6

by2 = divider(2)
print(by2(6))  # 3.0

by3 = functools.partial(divide, d=3)
print(by3(9))  # 3.0

by4 = lambda n: divide(n, 4)
print(by4(12))  # 3.0
My analysis of them is:
times2 is a nested function. I guess Python makes a closure with m bound, and everything is nice. The code is readable (I think) and simple to understand. No external libraries. This is the style I use today.
by2 has an explicit named function, which makes it simple for the user. It uses functools, so it costs you an extra import. I like this style to some extent since it is transparent, and it gives me the option to use divide in other ways if I want to. Contrast this with inner, which is not reachable.
by3 is like by2, but forces the reader of the code to be comfortable with functools.partial, since they have it right in their face. What I like less is that PyCharm can't give me tooltips for what the arguments to functools.partial should be, since they are effectively arguments to by3. I have to know the signature of divide myself every time I define a new partial application.
by4 is simple to type, since I can get autocompletion. It needs no import of functools. I think it looks non-Pythonic, though. Also, I always feel uncomfortable about how the scoping of variables/closures behaves with lambdas in Python - never sure about that (see the sketch below).
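For reference, the lambda scoping worry is about late binding: a lambda (like any closure) looks up its free variables when it is called, not when it is defined. A minimal sketch of the classic gotcha:

# Each lambda closes over the *variable* i, not its value at definition time,
# so every one of them sees the final value of i.
fs = [lambda x: x * i for i in range(3)]
print([f(10) for f in fs])  # [20, 20, 20], not [0, 10, 20]

# Binding i as a default argument captures the current value instead.
fs = [lambda x, i=i: x * i for i in range(3)]
print([f(10) for f in fs])  # [0, 10, 20]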
What is the logical difference between the styles and how does that affect memory and CPU?
The first way seems to be the most efficient. I tweaked your code so that all 4 functions compute exactly the same mathematical function:
import functools
import timeit

def multiplier(m):
    def inner(x):
        return m * x
    return inner

def mult(x, m):
    return m * x

def multer(m):
    return functools.partial(mult, m=m)

f1 = multiplier(2)
f2 = multer(2)
f3 = functools.partial(mult, m=2)
f4 = lambda x: mult(x, 2)

print(timeit.timeit('f1(10)', setup='from __main__ import f1'))
print(timeit.timeit('f2(10)', setup='from __main__ import f2'))
print(timeit.timeit('f3(10)', setup='from __main__ import f3'))
print(timeit.timeit('f4(10)', setup='from __main__ import f4'))
Typical output (on my machine):
0.08207898699999999
0.19439769299999998
0.20093803199999993
0.1442435820000001
The two functools.partial approaches are identical in speed (since one of them is just a wrapper for the other), the closure is twice as fast, and the lambda is somewhere in between (but closer to the closure). There is a clear overhead in using functools over a straightforward closure. Since the closure approach is arguably more readable as well (and more flexible than the lambda, which doesn't extend well to more complicated functions), I would just go with it.
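If you want to see concretely where each style stores the bound value, a quick sketch using the objects defined above (all of this is standard introspection):

print(f1.__closure__[0].cell_contents)  # 2 -- the closure keeps m in a cell
print(f3.func, f3.keywords)             # mult and {'m': 2} -- partial keeps them as attributes
print(f4.__closure__)                   # None -- the lambda closes over nothing; 2 is a constant in its code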
Technically, you're missing one other option: operator.mul does the same thing you're looking to do, and you can just use functools.partial on it to get a default first argument without having to reinvent the wheel.
Not only is it the fastest option, it also uses the least space compared to a custom function or a lambda. The fact that it's a partial is why it uses the same space as the other partials, and I think that's the best route here.
from timeit import timeit
from functools import partial
from sys import getsizeof
from operator import mul

def multiplier(m):
    def inner(x):
        return m * x
    return inner

def mult(x, m):
    return m * x

def multer(m):
    return partial(mult, m=m)

f1 = multiplier(2)
f2 = multer(2)
f3 = partial(mult, m=2)
f4 = lambda n: mult(n, 2)
f5 = partial(mul, 2)

from_main = 'from __main__ import {}'.format
print(timeit('f1(10)', from_main('f1')), getsizeof(f1))
print(timeit('f2(10)', from_main('f2')), getsizeof(f2))
print(timeit('f3(10)', from_main('f3')), getsizeof(f3))
print(timeit('f4(10)', from_main('f4')), getsizeof(f4))
print(timeit('f5(10)', from_main('f5')), getsizeof(f5))
Output
0.5278953390006791 144
1.0804575479996856 96
1.0762036349988193 96
0.9348237040030654 144
0.3904160970050725 96
This should answer your question as far as memory usage and speed are concerned.
Intro:
Hello. I am exploring the Python rxpy library for my use case, where I am building an execution pipeline using reactive programming concepts. This way I expect I won't have to manipulate too much state. Though my solution seems to be functional, I am having trouble trying to compose a new Observable from other Observables.
The problem is that the way I am composing my observables causes some expensive calculations to be performed twice. For performance, I really want to prevent triggering the expensive calculations more than once.
I am very new to reactive programming. I have been scratching my head and have looked through internet resources and reference documentation, but they seem a little too terse for me to grasp. Please advise.
Following is a toy example which illustrates what I am doing:
import rx
from rx import operators as op
from rx.subject import Subject
root = Subject()

foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r)))
)

bar_foo = foo.pipe(
    op.map(lambda x: x * 2),
    op.do_action(lambda r: print("bar(foo(x)) = %s" % str(r)))
)

bar_foo.pipe(
    op.zip(foo),
    op.map(lambda i: i[0] + i[1]),
    op.do_action(lambda r: print("foo(x) + bar(foo(x)) = %s" % str(r)))
).subscribe()
print("-------------")
root.on_next(10)
print("-------------")
Output:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) = 11 (expensive)
foo(x) + bar(foo(x)) = 33
-------------
You can think of foo() and bar() as expensive and complex operations. I first build an observable foo. Then I compose a new observable bar_foo that incorporates foo. Later both are zipped together to calculate the final result foo(x) + bar(foo(x)).
Question:
What can I do to prevent foo() from getting triggered more than once for a single input?
I have really strong reasons to keep foo() and bar() separate. Also, I do not want to explicitly memoize foo().
Could anyone with experience using rxpy in production share their experiences? Will using rxpy lead to better performance or to slowdowns compared to equivalent hand-crafted (but unmaintainable) code?
Adding op.share() right after the expensive calculation in the foo pipeline could be useful here. So changing the foo pipeline to:
foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r))),
    op.share()  # added to pipeline
)
will result in:
-------------
foo(x) = 11 (expensive)
bar(foo(x)) = 22
foo(x) + bar(foo(x)) = 33
-------------
I believe that share() causes the emitted events of the expensive operation to be shared among downstream subscribers, so the result of a single expensive calculation can be used multiple times.
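As I understand it (an assumption on my part based on the RxPY docs, not something I have verified in the source), share() is essentially shorthand for multicasting through an internal Subject, i.e. publish() followed by ref_count(). The pipeline above written out that way would look roughly like:

foo = root.pipe(
    op.map(lambda x: x + 1),
    op.do_action(lambda r: print("foo(x) = %s (expensive)" % str(r))),
    op.publish(),    # multicast through an internal Subject
    op.ref_count(),  # stay connected while at least one subscriber is present
)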
Regarding your second question: I am new to RxPy as well, so I am interested in the answers of more experienced users. So far, I've noticed that as a beginner you can easily create (bad) pipelines where messages and calculations are repeated in the background. share() seems to reduce this to some extent, but I am not sure what exactly is happening in the background.
I'm trying to repeatedly run a function that requires a few positional arguments and involves random number generation (to generate many samples of a distribution). For a MWE, I think this captures everything:
import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, k):
    return np.random.rand(xsize, ysize)

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    np.random.seed()
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize), range(nsamp))
    p.close()
    return np.array(out.get())
Note that the final positional argument of rarr() is just a dummy variable, since I am using map_async(), which requires an iterable. Now if I run %timeit clever_array(500, ncores=1) I get 208 ms, whereas %timeit clever_array(500, ncores=5) takes 149 ms. So there is definitely some kind of parallelism happening (the speedup isn't terribly impressive for this MWE, but it is decent in my real code).
However, I'm wondering a few things: is there a more natural implementation than the dummy variable for rarr() passed as an iterable to map_async() to run this many times? Is there any obvious way to pass the xsize and ysize args to rarr() other than partial()? And is there any way to ensure different results from the different cores other than initializing a different random seed every time?
Thanks for any help!
Typically when we use multiprocessing, we would expect different results from each invocation of a function; otherwise it doesn't quite make sense to call the same function many times. To ensure the randomness of the sampling output, it is best to separate the random state (seed) from the function itself. The approach recommended by the official NumPy documentation is to use a np.random.Generator object, created via np.random.default_rng([seed]). With that we can modify your code to:
import numpy as np
import multiprocessing as mup
from functools import partial

def rarr(xsize, ysize, rng):
    return rng.random((xsize, ysize))

def clever_array(nsamp, xsize=100, ysize=100, ncores=None):
    if ncores is None:
        p = mup.Pool()
    else:
        p = mup.Pool(ncores)
    out = p.map_async(partial(rarr, xsize, ysize),
                      map(np.random.default_rng, range(nsamp)))
    p.close()
    return np.array(out.get())
I have the following problem: I have two sets of data (set T and set F) and the following functions:
x(T) = arctan(T - c0)
A(x(T)) = arctan(x(T) - c1)
B(x(T)) = arctan(x(T) - c2)
Y(x(T), F) = (A(x(T)) - B(x(T))) / 2 - A(x(T)) * arctan(F - c3) + B(x(T)) * arctan(F - c4)
# where c0, c1, c2, c3, c4 are constants
Now I want to create a surface plot of Y. For that I would like to implement Y as a Python (numpy) function, which turns out to be quite complicated because Y takes other functions as input.
Another idea of mine was to evaluate x, A and B on the data separately and store the results in numpy arrays. With those I could also get the output of the function Y, but I don't know which way is better for plotting the data, and I really would like to know how to write Y as a Python function.
Thank you very much for your help
It is absolutely possible to use functions as input parameters to other functions. A use case could look like:
def plus_one(standard_input_parameter_like_int):
    return standard_input_parameter_like_int + 1

def apply_function(function_as_input, standard_input_parameter):
    return function_as_input(standard_input_parameter)

if __name__ == '__main__':
    print(apply_function(plus_one, 1))
I hope that helps to solve your specific problem.
[...] something like def s(x,y,z,*args,*args2): will yield an error.
This is perfectly normal, as (at least as far as I know) there is only one variable-length non-keyword argument list allowed per function (conventionally named *args, though the name itself is arbitrary). So if you remove the asterisk from args2 you should actually be able to run s properly.
Regarding your initial question you could do something like:
import numpy as np

c = [0.2, -0.2, 0, 0, 0, 0]

def x(T):
    return np.arctan(T - c[0])

def A(xfunc, T):
    return np.arctan(xfunc(T) - c[1])

def B(xfunc, T):
    return np.arctan(xfunc(T) - c[2])

def Y(xfunc, Afunc, Bfunc, t, f):
    return ((Afunc(xfunc, t) - Bfunc(xfunc, t)) / 2.0
            - Afunc(xfunc, t) * np.arctan(f - c[3])
            + Bfunc(xfunc, t) * np.arctan(f - c[4]))

_tSet = np.linspace(-1, 1, 20)
_fSet = np.linspace(-1, 1, 20)  # note: np.arange(-1, 1, 20) would give a single-element array
print(Y(x, A, B, _tSet, _fSet))
As you can see (and as you probably already tested yourself, judging from your comment), you can use functions as arguments. And as long as you don't use any if conditions or other non-vectorized operations in your 'sub'-functions, the top-level function should already be vectorized.
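Since the original goal was a surface plot of Y, here is a minimal sketch of how the vectorized function above could be plotted (this assumes matplotlib; the grid sizes are arbitrary):

import numpy as np
import matplotlib.pyplot as plt

# Evaluate Y on a 2-D grid so it can be drawn as a surface
T_grid, F_grid = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))
Z = Y(x, A, B, T_grid, F_grid)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(T_grid, F_grid, Z)
plt.show()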
Setup: I have a function preprocess(data, predicate) and a list of predicates that might look like this:
preds = [lambda x: x < 1,
         lambda x: x < 2,
         lambda x: x < 3,
         lambda x: x < 42]
EDIT: I probably should have been more precise; I thought 1, 2, 3, 42 were obviously identifiable as examples, but it seems that was too implicit. Actually, I'm doing some NLP; data is a list of words, and one predicate looks like lambda w: (w.lower() not in stopwords.words('english') and re.search("[a-z]", w.lower())). I want to test different predicates to evaluate which performs best.
Here is what I actually want to do: call preprocess with every predicate, in parallel.
EDIT: Because this is a preprocessing step, I need what is being returned by preprocess so I can continue to work with it.
What I hoped I could do but sadly can't:
pool = Pool(processes=4)
pool.map(lambda p: preprocess(data, p), preds)
As far as I understand, this is because everything passed to pool.map has to be picklable. In this question two solutions are suggested, of which the first (the accepted answer) seems impractical and the second doesn't seem to work in Python 2.7, which I'm using, even though pythonic metaphor suggested in the comments that it does.
My question is whether pool.map is the right way to go, and if so, how to do it. Or should I try a different approach?
I know there are quite a lot of questions regarding pool.map, and even though I spent some time searching I didn't find an answer. Also, if my code style is awkward, feel free to point it out. I read that lambda looks strange to some and that I should probably use functools.partial.
Thanks in advance.
In this simple case you can just modify the preprocess function to accept a threshold argument. Something like:
def preprocess(data, threshold):
    def predicate(x):
        return x < threshold
    return old_preprocess(data, predicate)
Now in your preds list you can simply put the integers, which are picklable, and pass the shared data in with functools.partial (note that zip(data, preds) would incorrectly pair each data element with one threshold and pass the pair as a single argument):
from functools import partial

preds = [1, 2, 3, 42]
pool = Pool(processes=4)
results = pool.map(partial(preprocess, data), preds)  # one call per threshold
You can extend it to choose the operator by using the operator module:
def preprocess(data, pred):
    threshold, oper = pred  # renamed from op to avoid shadowing the operator module
    def predicate(x):
        return oper(x, threshold)
    return old_preprocess(data, predicate)

import operator as op
from functools import partial

preds = [(1, op.lt), (2, op.gt), (3, op.ge), (42, op.lt)]
pool = Pool(processes=4)
results = pool.map(partial(preprocess, data), preds)
To extend it to arbitrary predicates, things get harder. Probably the easiest way is to use the marshal module, which can convert the code object of a function into bytes and back (note that this only round-trips the code itself, so it won't capture closed-over variables; the lambdas here don't have any).
Something like:
real_preds = [marshal.dumps(pred.__code__) for pred in preds]
And then preprocess should re-build the predicate functions:
import marshal
import types

def preprocess(data, pred):
    pred = types.FunctionType(marshal.loads(pred), globals())
    # ... proceed as before, using pred as the predicate ...
Here's a MWE for this last suggestion:
>>> from multiprocessing import Pool
>>> import marshal
>>> import types
>>> def preprocess(pred):
... pred = types.FunctionType(marshal.loads(pred), globals())
... return pred(2)
...
>>> preds = [lambda x: x < 1,
... lambda x: x < 2,
... lambda x: x < 3,
... lambda x: x < 42]
>>> real_preds = [marshal.dumps(pred.__code__) for pred in preds]
>>> pool = Pool(processes=4)
>>> pool.map(preprocess, real_preds)
[False, False, True, True]
Note that the arguments to pool.map must be picklable, which means you cannot use a lambda as the first argument to Pool.map:
>>> pool.map(lambda x: preprocess(x), real_preds)
Exception in thread Thread-5:
Traceback (most recent call last):
File "/usr/lib/python3.3/threading.py", line 639, in _bootstrap_inner
self.run()
File "/usr/lib/python3.3/threading.py", line 596, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.3/multiprocessing/pool.py", line 351, in _handle_tasks
put(task)
File "/usr/lib/python3.3/multiprocessing/connection.py", line 206, in send
ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
Regarding the question "is Pool.map the right tool?": I believe it depends highly on the size of the data. Using multiprocessing adds quite a lot of overhead, so even if you "make it work" there is a good chance it isn't worth it. In particular, in your edited question you give a more "real world" scenario for the predicates:
lambda w: (w.lower() not in stopwords.words('english') and re.search("[a-z]", w.lower()))
I believe that this predicate doesn't take enough time to make Pool.map worth it. Obviously it depends on the size of w and the number of elements to map.
Doing really quick tests with this predicate, I see that Pool.map starts to become faster when w is around 35000 characters in length. If w is less than 1000 characters, then using Pool is about 15 times slower than a plain map (with 256 strings to check; if the strings are 60000 characters long, then Pool is a bit faster).
Notice that if w is quite long, then it is worth using a def instead of a lambda to avoid computing w.lower() twice, whether you use plain map or Pool.map; a sketch of that follows.
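For example, a rough sketch of that def (assuming the nltk stopwords corpus from the question; hoisting the stopword list out of the predicate is an extra optimization on top of caching w.lower()):

import re
from nltk.corpus import stopwords

english_stopwords = set(stopwords.words('english'))  # computed once, not per word

def predicate(w):
    lw = w.lower()  # call lower() once instead of twice
    return lw not in english_stopwords and re.search("[a-z]", lw) is not None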
You can do this with Pool.map, you just have to organize what you're mapping properly. Maps basically work like this:
result = map(function, things)
is equivalent to
result = []
for thing in things:
    result.append(function(thing))
or, more concisely,
result = [function(thing) for thing in things]
You can structure your function so that it accepts an argument (the upper bound) and does the comparison itself:
def mapme(bound):
    p = lambda x: x < bound
    return preprocess(data, p)
From there, it doesn't matter whether you're doing a parallel map or a single-threaded one. As long as preprocess doesn't have side effects, you can use a map.
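For instance, a minimal sketch of the parallel version (the bounds list here is just the example thresholds from the question; this works because mapme is a plain module-level def and therefore picklable):

from multiprocessing import Pool

bounds = [1, 2, 3, 42]
pool = Pool(processes=4)
results = pool.map(mapme, bounds)  # each worker builds its own lambda locally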
If you're using the functions for their side effects and don't need to use the unified output of pool.map(), you can just simulate it using os.fork() (at least on unix-like systems).
You could try something like this:
import numpy as np
import os

nprocs = 4
funcs = np.array_split(np.array(preds), nprocs)

# Fork the program into nprocs processes, each with a procid from 0 to nprocs-1
procid = 0
for x in range(1, nprocs):
    if os.fork() == 0:
        procid = x
        break

for p in funcs[procid]:  # a bare map() would be lazy in Python 3 and never run
    preprocess(data, p)
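One caveat with raw fork (my addition, not part of the sketch above): each child should exit explicitly once its share of the work is done, and the parent should reap the children; otherwise all processes keep executing whatever follows this point. Roughly:

if procid != 0:
    os._exit(0)   # children stop here instead of running the rest of the program
for _ in range(1, nprocs):
    os.wait()     # the parent waits for every child to finish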
I profiled my Python program and found that the following function was taking too long to run. Perhaps I can use a different algorithm and make it run faster. However, I have read that I can also possibly increase the speed by reducing function calls, especially when a function gets called repeatedly within a loop. I am a Python newbie and would like to learn how to do this and see how much faster the code can get. Currently, the function is:
def potentialActualBuyers(setOfPeople, theCar, price):
    count = 0
    for person in setOfPeople:
        if person.getUtility(theCar) >= price and person.periodCarPurchased == None:
            count += 1
    return count
where setOfPeople is a list of person objects. I tried the following:
def potentialActualBuyers(setOfPeople, theCar, price):
    count = 0
    Utility = person.getUtility
    for person in setOfPeople:
        if Utility(theCar) >= price and person.periodCarPurchased == None:
            count += 1
    return count
This, however, gives me an error saying local variable 'person' referenced before assignment.
Any suggestions on how I can reduce function calls, or any other changes that can make the code faster?
Again, I am a Python newbie, and even though I may be able to use a better algorithm, it is still worthwhile to learn the answer to the question above.
Thanks very much.
***** EDIT *****
Adding the getUtility method:
def getUtility(self, theCar):
    if theCar in self.utility.keys():
        return self.utility[theCar]
    else:
        self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
        return self.utility[theCar]
***** EDIT: asking for new ideas *****
Any ideas on how to speed this up further? I used the method suggested by Alex to cut the time in half. Can I speed it up even more?
Thanks.
I doubt you can get much of a speedup in this case by hoisting the lookup of person.getUtility (by class, not by instance, as other answers have pointed out). Maybe...:
return sum(1 for p in setOfPeople
           if p.periodCarPurchased is None
           and p.getUtility(theCar) >= price)
but I suspect most of the time is actually spent in the execution of getUtility (and possibly in the lookup of p.periodCarPurchased, if that's some fancy property as opposed to a plain old attribute -- I moved the latter before the and just in case it is a plain attribute and can save a number of the getUtility calls). What does your profiling say with respect to the fraction of time spent in this function (net of its calls to others) versus the method (and possibly property) in question?
Try instead (that's assuming all persons are of the same type Person):
Utility = Person.getUtility
for person in setOfPeople:
    if Utility(person, theCar) >= ...
Also, using is None instead of == None should be marginally faster. Try whether swapping the operands of the and helps.
Methods are just functions bound to an object:
Utility = Person.getUtility
for person in setOfPeople:
    if Utility(person, theCar) ...
This doesn't eliminate a function call though, it eliminates an attribute lookup.
This one line made my eyes bleed:
self.utility[theCar]=self.A*(math.pow(theCar.mpg,self.alpha))*(math.pow(theCar.hp,self.beta))*(math.pow(theCar.pc,self.gamma))
Let's make it legible and PEP8able and then see if it can be faster. First some spaces:
self.utility[theCar] = self.A * (math.pow(theCar.mpg, self.alpha)) * (math.pow(theCar.hp, self.beta)) * (math.pow(theCar.pc, self.gamma))
Now we can see there are very redundant parentheses; remove them:
self.utility[theCar] = self.A * math.pow(theCar.mpg, self.alpha) * math.pow(theCar.hp, self.beta) * math.pow(theCar.pc, self.gamma)
Hmmm: 3 lookups of math.pow and 3 function calls. You have three choices for powers: x ** y, the built-in pow(x, y[, z]), and math.pow(x, y). Unless you have good reason for using one of the others, it's best (IMHO) to choose x ** y; you save both the attribute lookup and the function call.
self.utility[theCar] = self.A * theCar.mpg ** self.alpha * theCar.hp ** self.beta * theCar.pc ** self.gamma
annnnnnd while we're here, let's get rid of the horizontal scroll-bar:
self.utility[theCar] = (self.A
                        * theCar.mpg ** self.alpha
                        * theCar.hp ** self.beta
                        * theCar.pc ** self.gamma)
A possibility that would require quite a rewrite of your existing code, and may not help anyway (in Python), would be to avoid most of the power calculations by taking logs everywhere and working with log_utility = log_A + alpha * log_mpg + ...
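A minimal sketch of that idea (a hypothetical standalone helper, not a drop-in replacement for the method above; it only pays off if the logs can be precomputed and cached rather than recomputed per call):

import math

def utility_log_space(A, mpg, hp, pc, alpha, beta, gamma):
    # Same value as A * mpg**alpha * hp**beta * pc**gamma, computed in log space
    log_utility = (math.log(A) + alpha * math.log(mpg)
                   + beta * math.log(hp) + gamma * math.log(pc))
    return math.exp(log_utility)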