Determine if a python function has changed - python

Context
I am trying to cache executions in a data processing framework (kedro). For this, I want to compute a unique hash for a Python function to determine whether anything in the function body (or in the functions and modules this function calls) has changed. I looked into __code__.co_code. While that nicely ignores comments, spacing, etc., it also stays the same when two functions are obviously different. E.g.
def a():
    a = 1
    return a
def b():
    b = 2
    return b
assert a.__code__.co_code != b.__code__.co_code
fails. So the byte code for these two functions is equal.
The ultimate goal: Determine if either a function's code or any of its data inputs have changed. If not and the result already exists, skip execution to save runtime.
Question: How can one get a fingerprint of a function's code in Python?
Another idea brought forward by a colleague was this:
import dis
def compare_instructions(func1, func2):
    """compare instructions of two functions"""
    func1_instructions = list(dis.get_instructions(func1))
    func2_instructions = list(dis.get_instructions(func2))
    # compare every attribute of instructions except for starts_line
    for line1, line2 in zip(func1_instructions, func2_instructions):
        assert line1.opname == line2.opname
        assert line1.opcode == line2.opcode
        assert line1.arg == line2.arg
        assert line1.argval == line2.argval
        assert line1.argrepr == line2.argrepr
        assert line1.offset == line2.offset
    return True
This seems rather like a hack. Other tools like pytest-testmon try to solve this as well but they appear to be using a number of heuristics.

__code__.co_code returns the raw bytecode, which refers to constants only by their index, not by their value. Ignore the constants in your functions and they are the same.
__code__.co_consts contains the values of the constants, so it needs to be accounted for in your comparison.
assert a.__code__.co_code != b.__code__.co_code \
or a.__code__.co_consts != b.__code__.co_consts
Looking at inspect highlights a few other considerations for 'sameness'. For example, to ensure the functions below are considered different, default arguments must be accounted for.
def a(a1, a2=1):
    return a1 * a2
def b(b1, b2=2):
    return b1 * b2
One way to fingerprint is to use the built-in hash function. Assume the same function definitions as in the OP's example:
def finger_print(func):
    return hash(func.__code__.co_consts) + hash(func.__code__.co_code)
assert finger_print(a) != finger_print(b)
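For a cache that has to survive between runs, note that the built-in hash() of str and bytes objects is salted per process by default, so its values are not comparable across interpreter sessions. A minimal sketch of a stable variant using hashlib follows. The names fingerprint and _feed are made up for this illustration, and it assumes constants and default arguments have a deterministic repr (plain numbers, strings, tuples); it does not follow calls into other functions or modules.

```python
import hashlib


def _feed(h, code):
    """Add one code object (and any nested code objects) to the hash."""
    h.update(code.co_code)
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            # Nested function, lambda or comprehension: recurse instead of
            # using repr(), which would include a memory address.
            _feed(h, const)
        else:
            h.update(repr(const).encode("utf-8"))
    # Names of globals and attributes the code refers to.
    h.update(repr(code.co_names).encode("utf-8"))


def fingerprint(func):
    """Return a hex digest that is stable across interpreter runs."""
    h = hashlib.sha256()
    _feed(h, func.__code__)
    # Default arguments live on the function object, not on the code object.
    h.update(repr(func.__defaults__).encode("utf-8"))
    h.update(repr(func.__kwdefaults__).encode("utf-8"))
    return h.hexdigest()


def f_one():
    a = 1
    return a


def f_two():
    b = 2
    return b


assert fingerprint(f_one) != fingerprint(f_two)
```

Bytecode can differ between Python versions, so such a digest is only meaningful when compared against digests produced by the same interpreter version.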

Related

How do we check if a function returns multiple values in Python?

This question was already asked, but I wish to ask something subtly different.
How do we determine if a python function returns multiple values, without calling the function? Is there some way to find out at something more like compile-time instead of at run-time? (I realize that python is an interpreted language)
The following is out of the question:
r = demo_function(data) # takes more than 5 minutes to run
if (not len(r) == 2) or (not isinstance(r, tuple)):
    raise ValueError("I was supposed to return EXACTLY two things")
So is:
try:
    i, j = demo_function(data)
    # I throw TypeError: 'int' object is not iterable
except ValueError:
    raise ValueError("Hey! I was expecting two values.")
except TypeError:
    s1 = "Hey! I was expecting two values."
    s2 = "Also, TypeError was thrown, not ValueError"
    raise ValueError(s1 + s2)
The following sort of works, but is extremely inelegant:
r = demo_function(extremely_pruned_down_toy_data) # runs fast
if len(r) != 2:
    raise ValueError("There are supposed to be two outputs")
# Now we run it for real
r = demo_function(data) # takes more than 5 minutes to run
There are tools already in python which do similar things. For example, we can find out if a class object has a certain attribute:
prop_str = 'property'
if not hasattr(obj, prop_str):
    raise ValueError("There is no attribute named" + prop_str + " NOOOOoooo! ")
Also, we can find out how many INPUTS a function has:
from inspect import signature
sig = signature(demo_function)
p = sig.parameters
if len(p) != 2:
    raise ValueError("The function is supposed to have 2 inputs, but it has" + str(p))
I basically want the following:
p = nargout(demo_function)
if p != 2:
    raise ValueError("The function is supposed to have 2 outputs, but it has" + str(p))
Asking what a function returns is one of the most basic questions one can ask about a function. It feels really weird that I'm having trouble finding out.
EDIT:
juanpa.arrivillaga wrote,
[...] fundamentally, this points to a deeper, underlying design flaw: why do you have functions that can return different length containers when you are expecting a specific length?
Well, let me explain. I have something like this:
def process_data(data_processor, data):
    x, y = data_processor(data)
    return x, y
A precondition of the process_data function is that the input data_processor MUST return two items. As such, I want to write some error checking code to enforce the precondition.
def process_data(data_processor, data):
    # make sure data_processor returns exactly two things!
    verify_data_processor(data_processor)
    x, y = data_processor(data)
    return x, y
However, it looks like that's not easily doable.
A function really only has a single return value. It can return a container, such as a tuple, of whatever length. But there is no inherent way for a Python function to know the length of such a value; Python is much too dynamic. Indeed, in general, the interpreter does not know the nature of the return value of a function. But even if we stick to just considering functions that return containers, consider the following function:
def howmany(n):
    return n*('foo',)
Well, what should nargout(howmany) return?
And python does not special case something like return x, y, nor should it, because then what should be the behavior when the length of the returned container is indeterminate, such as return n*(1,)? No, it is up to the caller to deal with the case of a function returning a container, in one of the ways you've already illustrated.
And fundamentally, this points to a deeper, underlying design flaw: why do you have functions that can return different length containers when you are expecting a specific length?
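Although the number of returned items cannot be determined in general, a partial check without calling the function is possible if the data processors carry return annotations. The following is only a sketch under that assumption: declared_nargout and split_stats are hypothetical names, and the check trusts the annotation rather than verifying what the function actually returns.

```python
from typing import Optional, Tuple, get_args, get_origin, get_type_hints


def declared_nargout(func) -> Optional[int]:
    """Return the tuple length declared in func's return annotation.

    Returns None when there is no annotation, when the annotation is not a
    fixed-length tuple, or when it is a variable-length tuple (Tuple[X, ...]).
    """
    ret = get_type_hints(func).get("return")
    if ret is None or get_origin(ret) is not tuple:
        return None
    args = get_args(ret)
    if len(args) == 2 and args[1] is Ellipsis:
        return None
    return len(args)


def split_stats(data) -> Tuple[float, float]:
    # Hypothetical data processor that declares two return values.
    return min(data), max(data)


if declared_nargout(split_stats) != 2:
    raise ValueError("data_processor must declare exactly two outputs")
```

An unannotated function such as howmany above yields None here, which is consistent with the point that its length is not knowable ahead of time.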

Decorator to define function-local statics - fine details of AST-munging

I am trying to produce a better answer to the frequently-asked question "How do I do function-local static variables in Python?" (1, 2, 3, ...) "Better" means completely encapsulated in a decorator, that can be used in any context where a function definition may appear. In particular, it must DTRT when applied to methods and nested functions; it must play nice with other decorators applied to the same function (in any order); it must accept arbitrary initializers for the static variables, and it must not modify the formal parameter list of the decorated function. Basically, if this were to be proposed for inclusion in the standard library, nobody should be able to object on quality-of-implementation grounds.
Ideal surface syntax would be
@static_vars(a=0, b=[])
def test():
    b.append(a)
    a += 1
    sys.stdout.write(repr(b) + "\n")
I would also accept
@static_vars(a=0, b=[])
def test():
    static.b.append(static.a)
    static.a += 1
    sys.stdout.write(repr(static.b) + "\n")
or similar, as long as the namespace for the static variables is not the name of the function! (I intend to use this in functions that may have very long names.)
A slightly more motivated example involves precompiled regular expressions that are only relevant to one function:
@static_vars(encode_re = re.compile(
        br'[\x00-\x20\x7F-\xFF]|'
        br'%(?!(?:[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}))'))
def encode_nonascii_and_percents(segment):
    segment = segment.encode("utf-8", "surrogateescape")
    return encode_re.sub(
        lambda m: "%{:02X}".format(ord(m.group(0))).encode("ascii"),
        segment).decode("ascii")
Now, I already have a mostly-working implementation. The decorator rewrites each function definition as if it had read like so (using the first example):
def _wrap_test_():
    a = 0
    b = []
    def test():
        nonlocal a, b
        b.append(a)
        a += 1
        sys.stdout.write(repr(b) + "\n")
    return test
test = _wrap_test_()
del _wrap_test_
It seems that the only way to accomplish this is to munge the AST. I have code that works for simple cases (see below) but I strongly suspect it is wrong in more complicated cases. For instance, I think it will break if applied to a method definition, and of course it also breaks in any situation where inspect.getsource() fails.
So the question is, first, what should I do to make it work in more cases, and second, is there a better way to define a decorator with the same black-box effects?
Note 1: I only care about Python 3.
Note 2: Please assume that I have read all of the proposed solutions in all of the linked questions and found all of them inadequate.
#! /usr/bin/python3
import ast
import functools
import inspect
import textwrap
def function_skeleton(name, args):
    """Return the AST of a function definition for a function named NAME,
    which takes keyword-only args ARGS, and does nothing. Its
    .body field is guaranteed to be an empty array.
    """
    fn = ast.parse("def foo(*, {}): pass".format(",".join(args)))
    # The return value of ast.parse, as used here, is a Module object.
    # We want the function definition that should be the Module's
    # sole descendant.
    assert isinstance(fn, ast.Module)
    assert len(fn.body) == 1
    assert isinstance(fn.body[0], ast.FunctionDef)
    fn = fn.body[0]
    # Remove the 'pass' statement.
    assert len(fn.body) == 1
    assert isinstance(fn.body[0], ast.Pass)
    fn.body.clear()
    fn.name = name
    return fn
class static_vars:
    """Decorator which provides functions with static variables.
    Usage:
        @static_vars(foo=1, bar=2, ...)
        def fun():
            foo += 1
            return foo + bar
    The variables are implemented as upvalues defined by a wrapper
    function.
    Uses introspection to recompile the decorated function with its
    context changed, and therefore may not work in all cases.
    """
    def __init__(self, **variables):
        self._variables = variables
    def __call__(self, func):
        if func.__name__ in self._variables:
            raise ValueError(
                "function name {} may not be the same as a "
                "static variable name".format(func.__name__))
        fname = inspect.getsourcefile(func)
        lines, first_lineno = inspect.getsourcelines(func)
        mod = ast.parse(textwrap.dedent("".join(lines)), filename=fname)
        # The return value of ast.parse, as used here, is a Module
        # object. Save that Module for use later and extract the
        # function definition that should be its sole descendant.
        assert isinstance(mod, ast.Module)
        assert len(mod.body) == 1
        assert isinstance(mod.body[0], ast.FunctionDef)
        inner_fn = mod.body[0]
        mod.body.clear()
        # Don't apply decorators twice.
        inner_fn.decorator_list.clear()
        # Fix up line numbers. (Why the hell doesn't ast.parse take a
        # starting-line-number argument?)
        ast.increment_lineno(inner_fn, first_lineno - inner_fn.lineno)
        # Inject a 'nonlocal' statement declaring the static variables.
        svars = sorted(self._variables.keys())
        inner_fn.body.insert(0, ast.Nonlocal(svars))
        # Synthesize the wrapper function, which will take the static
        # variables as arguments.
        outer_fn_name = ("_static_vars_wrapper_" +
                         inner_fn.name + "_" +
                         hex(id(self))[2:])
        outer_fn = function_skeleton(outer_fn_name, svars)
        outer_fn.body.append(inner_fn)
        outer_fn.body.append(
            ast.Return(value=ast.Name(id=inner_fn.name, ctx=ast.Load())))
        mod.body.append(outer_fn)
        ast.fix_missing_locations(mod)
        # The new function definition must be evaluated in the same context
        # as the original one. FIXME: supply locals if appropriate.
        context = func.__globals__
        exec(compile(mod, filename="<static-vars>", mode="exec"),
             context)
        # extract the function we just defined
        outer_fn = context[outer_fn_name]
        del context[outer_fn_name]
        # and call it, supplying the static vars' initial values; this
        # returns the adjusted inner function
        adjusted_fn = outer_fn(**self._variables)
        functools.update_wrapper(adjusted_fn, func)
        return adjusted_fn
if __name__ == "__main__":
    import sys
    @static_vars(a=0, b=[])
    def test():
        b.append(a)
        a += 1
        sys.stdout.write(repr(b) + "\n")
    test()
    test()
    test()
    test()
Isn't this what classes are for?
import sys
class test_class:
    a=0
    b=[]
    def test(self):
        test_class.b.append(test_class.a)
        test_class.a += 1
        sys.stdout.write(repr(test_class.b) + "\n")
t = test_class()
t.test()
t.test()
[0]
[0, 1]
Here is a version of your regexp encoder:
import re
class encode:
    encode_re = re.compile(
        br'[\x00-\x20\x7F-\xFF]|'
        br'%(?!(?:[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}))')
    def encode_nonascii_and_percents(self, segment):
        segment = segment.encode("utf-8", "surrogateescape")
        return encode.encode_re.sub(
            lambda m: "%{:02X}".format(ord(m.group(0))).encode("ascii"),
            segment).decode("ascii")
e = encode()
print(e.encode_nonascii_and_percents('foo bar'))
foo%20bar
There is always the singleton class.
Is there a simple, elegant way to define Singletons in Python?

Switch in Python [duplicate]

I have tried making a switch like statement in python, instead of having a lot of if statements.
The code looks like this:
def findStuff(cds):
    L=[]
    c=0
    for i in range(0, len(cds), 3):
        a=differencesTo(cds[i:i+3])
        result = {
            a[2][0]==1: c=i+1,
            a[2][1]==1: c=i+2,
            a[2][2]==1: c=i+3,
            a[1]==1: L.append((cds[i:i+3], a[0], c))
        }
    return L
My problem is, that this does not work. (Works with if statements, but this would in my opinion be more pretty).
I have found some examples of switches in Python, and they follow this structure. Can anyone help me?
(a) I fail to see what is wrong with if...elif...else
(b) I assume that python does not have a switch statement for the same reason that Smalltalk doesn't: it's almost completely redundant, and in the case where you want to switch on types, you can add an appropriate method to your classes; and likewise switching on values should be largely redundant.
Note: I am informed in the comments that whatever Guido's reason for not creating a switch in the first place, PEPs to have it added were rejected on the basis that support for adding such a statement is extremely limited. See: http://www.python.org/dev/peps/pep-3103/
(c) If you really need switching behaviour, use a hashtable (dict) to store callables. The structure is:
switch_dict = {
    Foo: self.doFoo,
    Bar: self.doBar,
}
func = switch_dict[switch_var]
result = func() # or if they take args, pass args
There's nothing wrong with a long if:
if switch == 'case0':
    do_case0()
elif switch == 'case1':
    do_case1()
elif switch == 'case2':
    do_case2()
...
If that's too long winded, or if you have a lot of cases, put them in a dictionary:
switch = {'case0': do_case0, 'case1': do_case1, 'case2': do_case2, ...}
switch[case_variable]()
# Alternative:
(switch[case_variable]).__call__()
If your conditions are a bit more complex, you need to think a little about your data structures. e.g.:
switch = {
    (0,21): 'never have a pension',
    (21,50): 'might have a pension',
    (50,65): 'definitely have a pension',
    (65, 200): 'already collecting pension'
}
for key, value in switch.items():
    if key[0] <= case_var < key[1]:
        print(value)
The other answers are suitable for older versions of Python. For Python 3.10+ you can use match/case, which is more powerful than a general switch/case construct.
def something(val):
    match val:
        case "A":
            return "A"
        case "B":
            return "B"
        case "C":
            return "C"
        case _:
            return "Default"
something("A")
Assignment in Python is a statement and cannot be part of an expression. Also, using a dict literal in this way evaluates everything at once, which is probably not what you want. Just use ifs; you won't gain any readability by using this.
I don't know which article suggested doing something like this, but it is really messy: the whole result dictionary will always be evaluated, and instead of doing only part of the work (as a switch / if does), you'll do the whole work every time (even if you use only a part of the result).
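The eager evaluation described above can be made visible with a small sketch. The helper record and the two dictionaries are made up for this illustration; the point is that every value expression in a dict literal runs when the dict is built, whereas wrapping each value in a lambda defers the work until the chosen entry is called.

```python
calls = []


def record(name):
    """Hypothetical stand-in for real work; remembers that it ran."""
    calls.append(name)
    return name


# Eager: both record() calls run while the literal is being built.
eager = {"a": record("a"), "b": record("b")}
eager_calls = list(calls)

# Lazy: nothing runs until an entry is looked up and called.
calls.clear()
lazy = {"a": lambda: record("a"), "b": lambda: record("b")}
result = lazy["a"]()
lazy_calls = list(calls)

print(eager_calls)  # both branches ran
print(lazy_calls)   # only the selected branch ran
```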
Really, a fast switch statement in Python is using "if":
if case == 1:
    pass
elif case == 2:
    pass
elif case == 3:
    pass
else:
    # default case
    pass
With the "get" method, you can have the same effect as "switch..case" in C.
Marcin's example:
switch_dict = {
    Foo: self.doFoo,
    Bar: self.doBar,
}
func = switch_dict.get(switch_var, self.dodefault)
result = func() # or if they take args, pass args
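Since the snippet above depends on a surrounding class, here is a standalone sketch of the same dict.get dispatch with a default. The handler names (do_foo, do_bar, do_default) and the string keys are assumptions chosen only for illustration.

```python
def do_foo():
    return "foo handled"


def do_bar():
    return "bar handled"


def do_default():
    return "default handled"


# Map each case value to the callable that handles it.
switch_dict = {
    "foo": do_foo,
    "bar": do_bar,
}


def dispatch(switch_var):
    # dict.get returns the fallback callable when the key is missing,
    # which plays the role of the "default:" label in a C switch.
    func = switch_dict.get(switch_var, do_default)
    return func()


print(dispatch("foo"))
print(dispatch("unknown"))
```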
You can do something like what you want, but you shouldn't. That said, here's how; you can see how it does not improve things.
The biggest problem with the way you have it is that Python will evaluate your tests and results once, at the time you declare the dictionary. What you'd have to do instead is make all conditions and the resulting statements functions; this way, evaluation is deferred until you call them. Fortunately there is a way to do this inline for simple functions using the lambda keyword. Secondly, the assignment statement can't be used as a value in Python, so our action functions (which are executed if the corresponding condition function returns a truthy value) have to return a value that will be used to increment c; they can't assign to c themselves.
Also, before Python 3.7 the items in a dictionary aren't guaranteed to be ordered, so your tests won't necessarily be performed in the order you define them; you should probably use something that preserves order, such as a tuple or a list. I am assuming you only ever want one case to execute.
So, here we go:
def findStuff(cds):
    cases = [ (lambda: a[2][0] == 1, lambda: i + 1),
              (lambda: a[2][1] == 1, lambda: i + 2),
              (lambda: a[2][2] == 1, lambda: i + 3),
              # list.append takes a single argument, so pass one tuple
              (lambda: a[1] == 1, lambda: L.append((cds[i:i+3], a[0], c)) or 0)
            ]
    L=[]
    c=0
    for i in range(0, len(cds), 3):
        a=differencesTo(cds[i:i+3])
        for condition, action in cases:
            if condition():
                c += action()
                break
    return L
Is this more readable than a sequence of if/elif statements? Nooooooooooooo. In particular, the fourth case is far less comprehensible than it should be because we are having to rely on a function that returns the increment for c to modify a completely different variable, and then we have to figure out how to get it to return a 0 so that c won't actually be modified. Uuuuuugly.
Don't do this. In fact this code probably won't even run as-is, as I deemed it too ugly to test.
While there is nothing wrong with if..else, I find "switch in Python" still an intriguing problem statement. On that, I think Marcin's (deprecated) option (c) and/or Snim2's second variant can be written in a more readable way.
For this we can declare a switch class, and exploit the __init__() to declare the case we want to switch, while __call__() helps to hand over a dict listing the (case, function) pairs:
class switch(object):
    def __init__(self, case):
        self._case = case
    def __call__(self, dict_):
        try:
            return dict_[self._case]()
        except KeyError:
            if 'else' in dict_:
                return dict_['else']()
            raise Exception('Given case wasn\'t found.')
Or, respectively, since a class with only two methods, of which one is __init__(), isn't really a class:
def switch(case):
    def cases(dict_):
        try:
            return dict_[case]()
        except KeyError:
            if 'else' in dict_:
                return dict_['else']()
            raise Exception('Given case wasn\'t found.')
    return cases
(note: choose something smarter than Exception)
With for example
def case_a():
    print('hello world')
def case_b():
    print('sth other than hello')
def default():
    print('last resort')
you can call
switch('c') ({
    'a': case_a,
    'b': case_b,
    'else': default
})
which, for this particular example would print
last resort
This doesn't behave like a C switch in that there is no break for the different cases, because each case executes only the function declared for that particular case (i.e. break is implicitly always called). Secondly, each case can list exactly one function that will be executed when the case matches.

Is this a "pythonic" method of executing functions as a python switch statement for tuple values?

I have a situation where I have six possible situations which can relate to four different results. Instead of using an extended if/else statement, I was wondering if it would be more pythonic to use a dictionary to call the functions that I would call inside the if/else as a replacement for a "switch" statement, like one might use in C# or php.
My switch statement depends on two values which I'm using to build a tuple, which I'll in turn use as the key to the dictionary that will function as my "switch". I will be getting the values for the tuple from two other functions (database calls), which is why I have the example one() and zero() functions.
This is the code pattern I'm thinking of using which I stumbled on with playing around in the python shell:
def one():
    #Simulated database value
    return 1
def zero():
    return 0
def run():
    #Shows the correct function ran
    print("RUN")
    return 1
def walk():
    print("WALK")
    return 1
def main():
    switch_dictionary = {}
    #These are the values that I will want to use to decide
    #which functions to use
    switch_dictionary[(0,0)] = run
    switch_dictionary[(1,1)] = walk
    #These are the tuples that I will build from the database
    zero_tuple = (zero(), zero())
    one_tuple = (one(), one())
    #These actually run the functions. In practice I will simply
    #have the one tuple which is dependent on the database information
    #to run the function that I defined before
    switch_dictionary[zero_tuple]()
    switch_dictionary[one_tuple]()
I don't have the actual code written or I would post it here, as I would like to know if this method is considered a python best practice. I'm still a python learner in university, and if this is a method that's a bad habit, then I would like to kick it now before I get out into the real world.
Note, the result of executing the code above is as expected, simply "RUN" and "WALK".
edit
For those of you who are interested, this is how the relevant code turned out. It's being used on a google app engine application. You should find the code is considerably tidier than my rough example pattern. It works much better than my prior convoluted if/else tree.
def GetAssignedAgent(self):
    tPaypal = PaypalOrder() #Parent class for this function
    tAgents = []
    Switch = {}
    #These are the different methods for the actions to take
    Switch[(0,0)] = tPaypal.AssignNoAgent
    Switch[(0,1)] = tPaypal.UseBackupAgents
    Switch[(0,2)] = tPaypal.UseBackupAgents
    Switch[(1,0)] = tPaypal.UseFullAgents
    Switch[(1,1)] = tPaypal.UseFullAndBackupAgents
    Switch[(1,2)] = tPaypal.UseFullAndBackupAgents
    Switch[(2,0)] = tPaypal.UseFullAgents
    Switch[(2,1)] = tPaypal.UseFullAgents
    Switch[(2,2)] = tPaypal.UseFullAgents
    #I'm only interested in the number up to 2, which is why
    #I can consider the Switch dictionary to be all options available.
    #The "state" is the current status of the customer agent system
    tCurrentState = (tPaypal.GetNumberofAvailableAgents(),
                     tPaypal.GetNumberofBackupAgents())
    tAgents = Switch[tCurrentState]()
Consider this idiom instead:
>>> def run():
...     print('run')
...
>>> def walk():
...     print('walk')
...
>>> def talk():
...     print('talk')
...
>>> switch={'run':run,'walk':walk,'talk':talk}
>>> switch['run']()
run
I think it is a little more readable than the direction you are heading.
edit
And this works as well:
>>> switch={0:run,1:walk}
>>> switch[0]()
run
>>> switch[max(0,1)]()
walk
You can even use this idiom for a switch / default type structure:
>>> default_value=1
>>> try:
...     switch[49]()
... except KeyError:
...     switch[default_value]()
Or (the less readable, more terse):
>>> switch.get(49, switch[default_value])()
walk
edit 2
Same idiom, extended to your comment:
>>> def get_t1():
...     return 0
...
>>> def get_t2():
...     return 1
...
>>> switch={(get_t1(),get_t2()):run}
>>> switch
{(0, 1): <function run at 0x100492d70>}
Readability matters
It is a reasonably common python practice to dispatch to functions based on a dictionary or sequence lookup.
Given your use of indices for lookup, a list of lists would also work:
switch_list = [[run, None], [None, walk]]
...
i, j = zero_tuple
switch_list[i][j]()
What is considered most Pythonic is that which maximizes clarity while meeting other operational requirements. In your example, the lookup tuple doesn't appear to have intrinsic meaning, so the operational intent is being lost behind a magic constant. Try to make sure the business logic doesn't get lost in your dispatch mechanism. Using meaningful names for the constants would likely help.
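One possible way to give the lookup keys meaningful names is an Enum. This is only a sketch: AgentLevel, its members, and the two handlers are hypothetical names, not taken from the original code, and the mapping from database values 0 and 1 is an assumption.

```python
from enum import Enum


class AgentLevel(Enum):
    """Hypothetical names for the raw 0/1 values read from the database."""
    NONE = 0
    SOME = 1


def run():
    return "RUN"


def walk():
    return "WALK"


# The keys now state what each combination means.
dispatch = {
    (AgentLevel.NONE, AgentLevel.NONE): run,
    (AgentLevel.SOME, AgentLevel.SOME): walk,
}

# Convert raw database values into named members before the lookup.
state = (AgentLevel(0), AgentLevel(0))
outcome = dispatch[state]()
print(outcome)
```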

The Matlab equivalent of Python's "None"

Is there a keyword in Matlab that is roughly equivalent to None in python?
I am trying to use it to mark an optional argument to a function. I am translating the following Python code
def f(x,y=None):
    if y is None:
        return g(x)
    else:
        return h(x,y)
into Matlab
function rtrn = f(x,y)
    if isempty(y)
        rtrn = g(x);
    else
        rtrn = h(x,y);
    end;
end
As you can see currently I am using [] as None. Is there a better way to do this?
In your specific case, you may use nargin to determine how many input arguments were provided when calling the function.
From the MATLAB documentation:
The nargin and nargout functions enable you to determine how many input and output arguments a function is called with. You can then use conditional statements to perform different tasks depending on the number of arguments. For example,
function c = testarg1(a, b)
    if (nargin == 1)
        c = a .^ 2;
    elseif (nargin == 2)
        c = a + b;
    end
Given a single input argument, this function squares the input value. Given two inputs, it adds them together.
NaN, while not equivalent, often serves a similar purpose.
nargin is definitely the easiest way of doing it. Also, it is usually good practice to validate the number of input arguments using nargchk:
function e = testFunc(a,b,c,d)
    error( nargchk(2, 4, nargin, 'struct') );
    % set default values
    if nargin<4, d = 0; end
    if nargin<3, c = 0; end
    % ..
    e = a*b + c*d;
end
... which acts as a way to ensure the correct number of arguments is passed. In this case, a minimum of two arguments are required, with a maximum of four.
If nargchk detects no error, execution resumes normally, otherwise an error is generated. For example, calling testFunc(1) generates:
Not enough input arguments.
UPDATE: A new function, narginchk, was introduced in R2011b; it replaces the deprecated nargchk+error combination seen above:
narginchk(2,4);
You can use functions like: exist and isempty to check whether a variable exists and whether it is empty respectively:
if ~exist('c','var') || isempty(c)
    c = 10;
end
which allows you to call your function such as: testFunc(1,2,[],4) telling it to use the default value for c but still giving a value for d
You could also use varargin to accept a variable number of arguments.
Finally a powerful way to parse and validate named inputs is to use inputParser
To see examples and other alternatives of passing arguments and setting default values, check out this post and its comments as well.
The equivalent to Python None in MATLAB is string(missing)
To test, type the following in your command window : py.type( string(missing) )
It returns <class 'NoneType'>
MATLAB to python data types documentation here
If you want to pass None into a Python function that you are calling from MATLAB, then you would pass in string(missing). This argument would show up as None in the Python function, for example, if you are detecting for None such as if arg1 == None.
