Python: strategies for persistently memoizing functions with function arguments?

I have written a little class to persistently memoize some expensive functions that do various statistical analyses of random networks.
These are all pure functions; all the data is immutable. However, some of the functions take functions as arguments.
Making keys based on these arguments is a small problem, since in Python function object equality is equivalent to function object identity, which does not persist between sessions, even if the function implementation does not change.
I am hacking around this for the time being by using the function name as a string, but this raises its own swarm of issues when one starts thinking about changing the implementation of the function or anonymous functions and so on. But I am probably not the first to worry about such things.
Does anybody have any strategies for persistently memoizing functions with function arguments in Python?

One option would be to use marshal.dumps(function.__code__) (function.func_code on Python 2).
It'll produce a byte-string representation of the function's compiled code. That should handle changed implementations and anonymous functions.
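A minimal sketch of a persistent cache key built this way (the names here are illustrative, not from the question):

import marshal

def cache_key(func, *args):
    # Key on the function's compiled bytecode plus its arguments, so the
    # key changes whenever the implementation changes.
    # Use func.func_code instead of func.__code__ on Python 2.
    return (marshal.dumps(func.__code__), args)

One caveat: the marshaled code object includes metadata such as the source file name and first line number, so merely moving a function around in a file invalidates its old cache entries.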

Have a look at using this as the identity of the function:

[getattr(func.__code__, s)
 for s in ['co_argcount', 'co_cellvars', 'co_code', 'co_consts',
           'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars',
           'co_lnotab', 'co_name', 'co_names', 'co_nlocals',
           'co_stacksize', 'co_varnames']]

That should correctly handle a change to the implementation in any way...
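If you need something compact to use as a dictionary or file-name key, you could reduce that attribute list to a digest. A sketch, assuming the function contains no nested code objects (nested functions and lambdas appear in co_consts as code objects, whose repr is not stable across runs, and would need recursive handling):

import hashlib

def code_fingerprint(func):
    # Note: co_lnotab is deprecated on newer Pythons in favour of co_lines().
    attrs = ['co_argcount', 'co_cellvars', 'co_code', 'co_consts',
             'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars',
             'co_lnotab', 'co_name', 'co_names', 'co_nlocals',
             'co_stacksize', 'co_varnames']
    raw = repr([getattr(func.__code__, s) for s in attrs])
    return hashlib.sha256(raw.encode('utf-8')).hexdigest()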

Related

How do I get a unique hash for the fully inlined code of a function?

My idea is to build some kind of caching mechanism for functions. To that end, I need to determine if a function needs to be evaluated. The result of a function depends - for the sake of this example - on its parameters and the actual code of the function. There may be calls to other functions inside the function. Therefore, only the fully inlined code of the function is a useful "value" for determining whether a function needs to be reevaluated.
Is there a good way to get this fully inlined code of a function in python?
Not possible. The "fully inlined code" of a Python function isn't a well-defined concept, for multiple reasons.
First, almost anything a Python function refers to can be redefined at runtime, which invalidates ahead-of-time inlining. You can even replace built-ins like print.
Second, even with no such rebinding, it is impossible to "fully inline" a recursive or indirectly recursive function.
Third, a function can and almost always will invoke code dynamically based on the provided parameters. def f(x): return x.something() requires a concrete value of x to determine what something is. Even something like def f(x, y): return x + y dynamically invokes an __add__ or __radd__ callback that can't be determined until the actual values of x and y are known.
Fourth, Python functions can invoke C code, which cannot be inlined at Python level.
Finally, even if all the problems with "fully inlining" a function didn't exist, a "fully inlined" version of a function still wouldn't be enough. There is no general way to determine if a Python function performs state mutation, or depends on mutable state that has been mutated, both of which are major issues for caching.
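The first point is easy to demonstrate; the behaviour of f below changes without a single byte of its code changing:

def helper():
    return 1

def f():
    return helper()   # which 'helper' runs is looked up at call time

print(f())            # 1
helper = lambda: 2
print(f())            # 2 -- same bytecode for f, different result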
Although such redefinition is possible in Python, code is not generally in a state of flux, so worrying that it might change behind your back is usually unwarranted. There might be something in the dis module that could help you.
Otherwise you could use memoization to cache function results by mapping them to their parameters. You could use the @functools.lru_cache() decorator, but writing such a decorator yourself is also pretty easy. See What is memoization and how can I use it in Python?
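For example, with the standard-library decorator:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(100)                 # each distinct n is computed only once
print(fib.cache_info())  # hit/miss statistics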
Apart from mutating code (which might even affect pure functions), the value of any function that relies on changing data is also indeterminate, e.g. some function that returns the latest stock price for a company. Memoization would not help here, nor would producing a function code signature/hash to detect changes in code.

In Python, functions are blocks of code that perform a desired action, whereas methods are functions specific to some objects. Is this statement true?

I'm learning Python. I have knowledge of other languages. The difference between methods and functions in Python confuses me; it seems very minute. Is my above conclusion on functions and methods true? In what better way can they be differentiated?
Most of the answer is here : https://wiki.python.org/moin/FromFunctionToMethod
To make a long story short: a method is the partial application of a function to an object.
Both are logically types of functions, but method or member function specifically refers to the subset of functions that are defined on classes and that operate on specific instances of the class.
In Python, specifically, it may also refer to functions where the self parameter has already been bound to a specific object (as opposed to the free-standing form where self isn't bound).
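A quick illustration of that binding:

class Greeter:
    def greet(self):
        return "hi from %r" % self

g = Greeter()
f = Greeter.greet     # plain function: self not yet bound
m = g.greet           # bound method: self is fixed to g
print(f(g) == m())    # True -- the method is the function partially applied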
Yes, in Python functions and methods are different but similar. A method must take self (a reference to the calling object) as its first parameter, whereas a function takes zero or more parameters.

When a function is used in python, what objects need to be created?

I read somewhere that it is bad to define functions inside of functions in Python, because it makes Python create a new function object every time the outer function is called. Someone basically said this:
# bad
def f():
    def h():
        return 4
    return h()

# faster
def h():
    return 4

def f(h=h):
    return h()
Is this true? Also, what about a case when I have a ton of constants like this:
x = # long tuple of strings
# and several more similar tuples
# which are used to build up data structures
def f(x):
    # parse x using the constants above
    return parsed_dictionary
Is it faster if I put all the constants inside the definition of f? Or should I leave them outside and bind them to local names in a keyword argument? I don't have any data to do timings with unfortunately, so I guess I'm asking about your experiences with similar things.
Short answers to your questions: it is true. Inner functions are created each time the outer function is called, and this takes some time. Access to objects defined outside of a function is also slower compared to access to local variables.
However, you also asked a more important question: "should I care?". The answer to this is, almost always, no. The performance difference will be really minor, and the readability of your code is much more valuable.
So, if you think that this function belongs to body of other function and makes no sense elsewhere - just put it inside and do not care about performance (at least, until your profiler tells you otherwise).
When a function is executed, all the code inside it needs to be executed. So, put very simply, the more you put in the function, the more effort it takes Python to execute it. In particular, when you have constant things that do not need to be constructed at run time of the function, you can save quite a bit by putting them in an upper scope, so that Python only needs to look them up instead of generating them again and allocating (temporary) memory for them during the short run time of the function.
So in your case, if you have a large tuple or anything that does not depend on the input x to the function f, then yes, you should store it outside.
Now the other thing you mentioned is a scope lookup for functions or constants using a keyword argument. In general, looking up variables in an outer scope is more expensive than looking it up in the most local scope. So yes, when you define those constants on module level and you access them inside a function, the lookup will be more expensive than when the constants would be defined inside the function. However, actually defining them inside the function (with memory allocation and actual generation of the data) is likely to be more expensive, so it’s really not a good option.
Now you could pass the constants as a keyword argument to the function, so the lookup inside the function would be a local scope lookup. But very often, you don’t need those constants a lot. You maybe access it once or twice in the function and that’s absolutely not worth adding the overhead of another argument to the function and the possibility to pass something different/incompatible to it (breaking the function).
If you know that you access some global stuff multiple times, then create a local variable at the top of the function which looks that global stuff up once and then use the local variable in all further places. This also applies to member lookup, which can be expensive too.
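As an illustration of that last trick (the function name and data are invented for the example):

import math

def norms(points):
    sqrt = math.sqrt          # one global-and-attribute lookup, done once
    return [sqrt(x * x + y * y) for x, y in points]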
In general though, these are all rather micro optimizations and you are unlikely to run into any problems if you do it one or the other way. So I’d suggest you to write clear code first and make sure that the rest of it works well, and if you indeed run into performance problems later, then you can check where the issues are.
In my tests, the fastest way to do what I needed was to define all the constants outside, then build the list of functions that need those constants outside, and then pass that list of functions to the main function. I used dis.dis, cProfile.run, and timeit.timeit for my tests, but I can't find the benchmarking script and can't be bothered to rewrite it and put up the results.

What is the equivalent of passing functions as arguments using an object oriented approach

I have a program in python that includes a class that takes a function as an argument to the __init__ method. This function is stored as an attribute and used in various places within the class. The functions passed in can be quite varied, and passing in a key and then selecting from a set of predefined functions would not give the same degree of flexibility.
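For concreteness, the class is shaped roughly like this (all names invented for the example):

class Analyzer:
    def __init__(self, metric):   # metric: any callable applied to the data
        self.metric = metric

    def run(self, data):
        return self.metric(data)

print(Analyzer(sum).run([1, 2, 3]))   # 6
print(Analyzer(max).run([1, 2, 3]))   # 3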
Now, apologies if a long list of questions like this is not cool, but...
Is there a standard way to achieve this in a language where functions aren't first-class objects?
Do blocks, like in smalltalk or objective-C, count as functions in this respect?
Would blocks be the best way to do this in those languages?
What if there are no blocks?
Could you add a new method at runtime?
In which languages would this be possible (and easy)?
Or would it be better to create an object with a single method that performs the desired operation?
What if I wanted to pass lots of functions, would I create lots of singleton objects?
Would this be considered a more object oriented approach?
Would anyone consider doing this in python, where functions are first class objects?
I don't understand what you mean by "equivalent... using an object oriented approach". In Python, since functions are (as you say) first-class objects, how is it not "object-oriented" to pass functions as arguments?
a standard way to achieve this in a language where functions aren't first class objects?
Only to the extent that there is a standard way of functions failing to be first-class objects, I would say.
In C++, it is common to create another class, often called a functor or functionoid, which defines an overload for operator(), allowing instances to be used like functions syntactically. However, it's also often possible to get by with plain old function-pointers. Neither the pointer nor the pointed-at function is a first-class object, but the interface is rich enough.
This meshes well with "ad-hoc polymorphism" achieved through templates; you can write functions that don't actually care whether you pass an instance of a class or a function pointer.
Similarly, in Python, you can make objects register as callable by defining a __call__ method for the class.
Do blocks, like in smalltalk or objective-C, count as functions in this respect?
I would say they do. At least as much as lambdas count as functions in Python, and actually more so because they aren't crippled the way Python's lambdas are.
Would blocks be the best way to do this in those languages?
It depends on what you need.
Could you add a new method at runtime? In which languages would this be possible (and easy)?
Languages that offer introspection and runtime access to their own compiler. Python qualifies.
However, there is nothing about the problem, as presented so far, which suggests a need to jump through such hoops. Of course, some languages have more required boilerplate than others for a new class.
Or would it be better to create an object with a single method that performs the desired operation?
That is pretty standard.
What if I wanted to pass lots of functions, would I create lots of singleton objects?
You say this as if you might somehow accidentally create more than one instance of the class if you don't write tons of boilerplate in an attempt to prevent yourself from doing so.
Would this be considered a more object oriented approach?
Again, I can't fathom your understanding of the term "object-oriented". It doesn't mean "creating lots of objects".
Would anyone consider doing this in python, where functions are first class objects?
Not without a need for the extra things that a class can do and a function can't. With duck typing, why on earth would you bother?
I'm just going to answer some of your questions.
As they say in the Scheme community, "objects are a poor man's closures" (closures being first-class functions). Blocks are usually just syntactic sugar for closures. For languages that do not have closures, there exist various solutions.
One of the common solutions is to use operator overloading: C++ has a notion of function objects, which define a member operator() ("operator function call"). Python has a similar overloading mechanism, where you define __call__:
class Greeter(object):
    def __init__(self, who):
        self.who = who

    def __call__(self):
        print("Hello, %s!" % self.who)

hello = Greeter("world")
hello()
Yes, you might consider using this in Python instead of storing functions in objects, since function objects don't pickle well: pickle stores a top-level function only as a reference to its name, and lambdas and nested functions can't be pickled at all, whereas an instance of a class like this can carry its state through pickling.
In languages without operator overloading, you'll see things like Guava's Function interface.
You could use the strategy pattern. Basically you pass in an object with a known interface but different behaviour. It's like passing a function, but one that's wrapped up in an object.
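A minimal Python sketch of the pattern, with made-up names:

class Strategy:
    def execute(self, x):
        raise NotImplementedError

class Double(Strategy):
    def execute(self, x):
        return 2 * x

class Square(Strategy):
    def execute(self, x):
        return x * x

def apply_strategy(strategy, value):
    return strategy.execute(value)   # known interface, varying behaviour

print(apply_strategy(Double(), 5))   # 10
print(apply_strategy(Square(), 5))   # 25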
In Smalltalk you'd mostly be using blocks. You can also create classes and instances at runtime.

When is using __call__ a good idea?

What are people's opinions on using __call__? I've only very rarely seen it used, but I think it's a very handy tool when you know that a class is going to be used for some default behaviour.
I think your intuition is about right.
Historically, callable objects (or what I've sometimes heard called "functors") have been used in the OO world to simulate closures. In C++ they're frequently indispensable.
However, __call__ has quite a bit of competition in the Python world:
A regular named method, whose behavior can sometimes be a lot more easily deduced from the name. Can convert to a bound method, which can be called like a function.
A closure, obtained by returning a function that's defined in a nested block.
A lambda, which is a limited but quick way of making a closure.
Generators and coroutines, whose bodies hold accumulated state much like a functor can.
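For contrast, the closure option from that list covers much of the same ground as a one-method class; a minimal sketch:

def make_accumulator(start=0):
    total = start                # state captured by the closure
    def add(amount):
        nonlocal total
        total += amount
        return total
    return add

acc = make_accumulator()
print(acc(10))   # 10
print(acc(5))    # 15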
I'd say the time to use __call__ is when you're not better served by one of the options above. Check the following criteria, perhaps:
Your object has state.
There is a clear "primary" behavior for your class that's kind of silly to name. E.g. if you find yourself writing run() or doStuff() or go() or the ever-popular and ever-redundant doRun(), you may have a candidate.
Your object has state that exceeds what would be expected of a generator function.
Your object wraps, emulates, or abstracts the concept of a function.
Your object has other auxiliary methods that conceptually belong with your primary behavior.
One example I like is UI command objects. They are designed so that their primary task is to execute the command, but they carry extra methods to control things like their display as a menu item; this seems to me to be the sort of thing you'd still want a callable object for.
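A sketch of that idea (all names invented):

class Document:
    def __init__(self):
        self.is_dirty = True
    def save(self):
        self.is_dirty = False

class SaveCommand:
    # Callable command object: calling it executes the command;
    # the extra method and attribute serve the UI.
    menu_label = "Save"
    def __call__(self, document):
        document.save()
    def enabled(self, document):
        return document.is_dirty

save = SaveCommand()
doc = Document()
if save.enabled(doc):
    save(doc)        # executes the command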
Use it if you need your objects to be callable; that's what it's there for.
I'm not sure what you mean by "default behaviour", though.
One place I have found it particularly useful is when using a wrapper or somesuch where the object is called deep inside some framework/library.
More generally, Python has a lot of double-underscore methods. They're there for a reason: they are the Python way of overloading operators. For instance, if you want a new class in which addition, I don't know, prints "foo", you define the __add__ and __radd__ methods. There's nothing inherently good or bad about this, any more than there's anything good or bad about using for loops.
In fact, using __call__ is often the more Pythonic approach, because it encourages clarity of code. You could replace MyCalculator.calculateValues(foo) with MyCalculator(foo), say.
It's usually used when a class is used as a function with some instance context, like a DecoratorClass used as @DecoratorClass('some param'): 'some param' is stored in the instance's namespace, and the instance is then called as the actual decorator.
It is not very useful when your class provides several different methods, since it's usually not obvious what the call would do, and explicit is better than implicit in those cases.
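A sketch of that decorator-class idiom (names invented):

class Tagged:
    def __init__(self, label):
        self.label = label            # 'some param' lives on the instance
    def __call__(self, func):         # the instance is called as the decorator
        def wrapper(*args, **kwargs):
            print("[%s] calling %s" % (self.label, func.__name__))
            return func(*args, **kwargs)
        return wrapper

@Tagged('debug')
def add(a, b):
    return a + b

print(add(2, 3))   # prints "[debug] calling add", then 5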
