using function attributes to store results for lazy (potential) processing - python

I'm doing some collision detection and I would very much like to use the same function in two different contexts. In one context, I would like for it to be something like
def detect_collisions(item, others):
return any(collides(item, other) for other in others)
and in another, I would like it to be
def get_collisions(item, others):
return [other for other in others if collides(item, other)]
I really hate the idea of writing two functions here. Just keeping their names straight is one turnoff and complicating the interface to the collision detecting is another. so I was thinking:
def peek(gen):
try:
first = next(gen)
except StopIteration:
return False
else:
return it.chain((first,), gen)
def get_collisions(item, others):
get_collisions.all = peek(other for other in others if collides(item, other))
return get_collisions.all
Now when I just want to do a check, I can say:
if get_collisions(item, others):
# aw snap
or
if not get_collisions(item, others):
# w00t
and in the other context where I actually want to examine them, I can do:
if get_collisions(item, others):
for collision in get_collisions.all:
# fix it
and in both cases, I don't do any more processing than I need to.
I recognize that this is more code than the first two functions but it also has the advantage of:
Keeping my interface to the collision detection as a tree with the node at the top level instead of a mid level. This seems simpler.
Hooking myself up with a handy peek function. If I use it one other time, then I'm actually writing less code. (In response to YAGNI, if I have it, I will)
So. If you were the proverbial homicidal maniac that knows where I live, would I be expecting a visit from you if I wrote the above code? If so, how would you approach this situation?

Just make get_collisions return a generator:
def get_collisions(item, others):
return (other for other in others if collides(item, other))
Then, if you want to do a check:
for collision in get_collisions(item, others):
    print 'Collision!'
break
else:
    print 'No collisions!'

This is very similar to what we were discussing in "pythonic way to rewrite an assignment in an if statement", but this version only handles one positional or one keyword argument per call so one can always retrieve the actual valued cached (not just whether is a True value in the Pythonian boolean sense or not).
I never really cared for your proposal to accept multiple keywords and have differently named functions depending on whether you wanted all the results put through any() or all() -- but liked the idea of using keyword arguments which would allow a single function to be used in two or more spots simultaneously. Here's what I ended up with:
# can be called with a single unnamed value or a single named value
def cache(*args, **kwargs):
if len(args)+len(kwargs) == 1:
if args:
name, value = 'value', args[0] # default attr name 'value'
else:
name, value = kwargs.items()[0]
else:
raise NotImplementedError('"cache" calls require either a single value argument '
'or a name=value argument identifying an attribute.')
setattr(cache, name, value)
return value
# add a sub-function to clear the cache attributes (optional and a little weird)
cache.clear = lambda: cache.func_dict.clear()
# you could then use it either of these two ways
if get_collisions(item, others):
# no cached value
if cache(collisions=get_collisions(item, others)):
for collision in cache.collisions:
# fix them
By putting all the ugly details in a separate function, it doesn't affect the code in get_collisions() one way or the other, and is also available for use elsewhere.

Related

Automatically return from a function based on another function call

Lets say I have a function myFunc defined as
def myFunc(value):
return value if isinstance(value, int) else None
Now wherever in my project I use myFunc the enclosing funciton should return automatically if the value returned from myFunc is None and should continue if some integer value is returned
For example:
def dumbFunc():
# some code
# goes here..
result = myFunc('foo')
# some code
# goes here..
This funciton should automatically behave like..
def dumbFunc():
# some code
# goes here..
result = myFunc('foo')
if not result:
return
# some code
# goes here..
PS - I don't know whether this thing even possible or not.
This is simply not possible.
Apart from exceptions, you cannot give a called function the ability to impact the control flow of the calling scope. So a function call foo() can never interrupt the control flow without throwing an exception. As a consumer of the function, you (the calling function) always have the responsibility yourself to handle such cases and decide about your own control flow.
And it is a very good idea to do it like that. Just the possibility that a function call might interrupt my control flow without having a possibility to react on it first sounds like a pure nightmare. Just alone for the ability to release and cleanup resources, it is very important that the control flow is not taken from me.
Exceptions are the notable exception from this, but of course this is a deeply rooted language feature which also still gives me the ability to act upon it (by catching exceptions, and even by having finally blocks to perform clean up tasks). Exceptions are deliberately not silent but very loud, so that interruptions from the deterministic control flow are clearly visible and have a minimum impact when properly handled.
But having a silent feature that does neither give any control nor feedback would be just a terrible idea.
If myFunc is used at 100 places in my project, everywhere I need to put an if condition after it.
If your code is like that that you could just return nothing from any function that calls myFunc without having to do anything, then either you are building an unrealistic fantasy project, or you simply are not aware of the implications this can have to the calling code of the functions that would be returned that way.
ok, I'll bite.
on the one hand, this isn't really possible. if you want to check something you have to have a line in your code that checks it.
there are a few ways you could achieve something like this, but i think you may have already found the best one.
you already have this function:
def myFunc(value):
return value if isinstance(value, int) else None
I would probably have done:
def myFunc(value):
return isinstance(value, int)
but either way you could use it:
def dumb_func():
value = do_something()
if myFunc(value):
return
do_more()
return value
alternately you could use try and except
I would raise a TypeError, seeing as that seems to be what you are checking:
def myFunc(value):
if not isinstance(value, int):
raise TypeError('myFunc found that {} is not an int'.format(value))
then you can use this as such
def dumb_func():
value = do_something()
try:
myFunc(value):
Except TypeError as e:
print e # some feedback that this has happened, but no error raised
return
do_more()
return value
for bonus points you could define a custom exception (which is safer because then when you catch that specific error you know it wasn't raised by anything else in your code, also if you did that you could be lazier eg:)
Class CustomTypeError(TypeError):
pass
def dumb_func():
try:
value = do_something()
myFunc(value):
do_more()
return value
Except CustomTypeError as e:
print e # some feedback that this has happened, but no error raised
return
but none of this gets around the fact that if you want to act based on the result of a test, you have to check that result.
Python has a ternary conditional operator, and the syntax you used is right, so this will work:
def myFunc(value):
return value if isinstance(value, int) else None
def dumbFunc():
print("Works?")
result = myFunc(5)
print(result)
dumbFunc()
Result:
Works?
5
I want the function to return automatically in that case
This is not possible. To do that, you have to check the return value of myFunc() and act upon it.
PS: You could do that with a goto statement, but Python, fortunately, doesn't support this functionality.

How do I identify executed-but-dead variable assignments in python

I frequently use tools like pyflakes and pep8, as well as coverage/coveralls etc. But in my project we have a handful of 1000+ line files with things like
FOO = 3
BAR['foo'] = 5 # Actually overwritten/dead-code
BAR['bat'] = 6
BAR['bazooka']['aimed'] = True # Actually Overwritten/dead-code
for i in BAR['bazooka'].keys():
BAR['bazooka'][i] = False
if not BAR['bazooka']['aimed']:
BAR['foo'] = 500
etc, in that same file we also have things that try to do smart things with for loops, and replacing parts of the dict stack depending on some other variables.
I'm interested if python has any way (at-execution is fine, but without needing to modify every var assignment) of going a "well the assigned on line 3 was actually overwritten by the assignment in the loop on line 398)
Basically a way of going "well the variable assigned here was gc()'d so we have a new value for it"
coverage can't identify because the assignments are executed code, we can't sort the file alphabetically due to said loops (and the loop changes are where this holds the most value)
You could define BAR to be an instance of a class that keeps track of what items are re-assigned without ever having been referenced, for example. Subclassing dict for brevity (but it's better to hold a dict and subclass collections.Mapping) -- also a simplified proof of concept in other ways (items could be "used" in other ways than just being obtained by indexing, etc):
class Bardict(dict):
def __init__(*a, **k):
dict.__init__(self, *a, **k)
self.assigned_not_used = set(self)
def __getitem__(self, key):
self.assigned_not_used.discard(key)
return dict.__getitem__(self, key)
def __setitem__(self, key, value):
if key in self.assigned_not_used:
print('Assigned, never used: {}'.format(key)
self.assigned_not_used.add(key)
return dict.__setitem__(self, key)
BAR = Bardict()
This also gives what I consider a "false positive" in cases such as:
BAR['foo'] = 'zap'
if some_condition: BAR['foo'] = 'zip'
which is actually a perfectly fine way to initialize BAR['foo'] (though an if/else is fine too, of course). But it's really hard to be quiet in this case and speak up in other cases you appear to want to flag...

How do I retain the method attributes of the functions generated through yield in python 2.7?

I have been doing a lot of searching, and I don't think I've really found what I have been looking for. I will try my best to explain what I am trying to do, and hopefully there is a simple solution, and I'll be glad to have learned something new.
This is ultimately what I am trying to accomplish: Using nosetests, decorate some test cases using the attribute selector plugin, then execute test cases that match a criteria by using the -a switch during commandline invocation. The attribute values for the tests that are executed are then stored in an external location. The command line call I'm using is like below:
nosetests \testpath\ -a attribute='someValue'
I have also created a customized nosetest plugin, which stores the test cases' attributse, and writes them to an external location. The idea is that I can select a batch of tests, and by storing the attributes of these tests, I can do filtering on these results later for reporting purposes. I am accessing the method attributes in my plugin by overriding the "wantMethod" method with the code similar to the following:
def set_attribs(self, method, attribute):
if hasattr(method, attribute):
if not self.method_attributes.has_key(method.__name__):
self.method_attributes[method.__name__] = {}
self.method_attributes[method.__name__][attribute] = getattr(method, attribute)
def wantMethod(self, method):
self.set_attribs(method, "attribute1")
self.set_attribs(method, "attribute2")
pass
I have this working for pretty much all the tests, except for one case, where the test is uing the "yield" keyword. What is happening is that the methods that are generated are being executed fine, but then the method attributes are empty for each of the generated functions.
Below is the example of what I am trying to achieve. The test below retreives a list of values, and for each of those values, yields the results from another function:
#attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
for (key, value) in _input_dictionary.items()
f = partial(self._do_test, key, value)
f.attribute1='someValue'
yield (lambda x: f(), key)
def _do_test(self, input1, input2):
# Some code
From what I have read, and think I understand, when yield is called, it would create a new callable function which then gets executed. I have been trying to figure out how to retain the attribute values from my sample_test_generator method, but I have not been successful. I thought I could create a partial method, and then add the attribute to the method, but no luck. The tests execute without errors at all, it just seems that from my plugin's perspective, the method attributes aren't present, so they don't get recorded.
I realize this a pretty involved question, but I wanted to make sure that the context for what I am trying to achieve is clear. I have been trying to find information that could help me for this particular case, but I feel like I've reached a stumbling block now, so I would really like to ask the experts for some advice.
Thanks.
** Update **
After reading through the feedback and playing around some more, it looks like if I modified the lambda expression, it would achieve what I am looking for. In fact, I didn't even need to create the partial function:
def sample_test_generator(self):
for (key, value) in _input_dictionary.items()
yield (lambda: self._do_test)
The only downside to this approach is that the test name will not change. As I am playing around more, it looks like in nosetests, when a test generator is used, it would actually change the test name in the result based on the keywords it contains. Same thing was happening when I was using the lambda expression with a parameter.
For example:
Using lamdba expression with a parameter:
yield (lambda x: self._do_test, "value1")
In nosetests plugin, when you access the test case name, it would be displayed as "sample_test_generator(value1)
Using lambda expression without a parameter:
yield (lambda: self._do_test)
The test case name in this case would be "sample_test_generator". In my example above, if there are multiple values in the dictionary, then the yield call would occur multiple times. However, the test name would always remain as "sample_test_generator". This is not as bad as when I would get the unique test names, but then not be able to store the attribute values at all. I will keep playing around, but thanks for the feedback so far!
EDIT
I forgot to come back and provide my final update on how I was able to get this to work in the end, there was a little confusion on my part at first, and after I looked through it some more, I figured out that it had to do with how the tests are recognized:
My original implementation assumed that every test that gets picked up for execution goes through the "wantMethod" call from the plugin's base class. This is not true when "yield" is used to generate the test, because at this point, the test method has already passed the "wantMethod" call.
However, once the test case is generated through the "yeild" call, it does go through the "startTest" call from the plug-in base class, and this is where I was finally able to store the attribute successfully.
So in a nut shell, my test execution order looked like this:
nose -> wantMethod(method_name) -> yield -> startTest(yielded_test_name)
In my override of the startTest method, I have the following:
def startTest(self, test):
# If a test is spawned by using the 'yield' keyword, the test names would be the parent test name, appended by the '(' character
# example: If the parent test is "smoke_test", the generated test from yield would be "smoke_test('input')
parent_test_name = test_name.split('(')[0]
if self.method_attributes.has_key(test_name):
self._test_attrib = self.method_attributes[test_name]
elif self.method_attributes.has_key(parent_test_name):
self._test_attrib = self.method_attributes[parent_test_name]
else:
self._test_attrib = None
With this implementation, along with my overide of wantMethod, each test spawned by the parent test case also inherits attributes from the parent method, which is what I needed.
Again, thanks to all who send replies. This was quite a learning experience.
Would this fix your name issue?
def _actual_test(x, y):
assert x == y
def test_yield():
_actual_test.description = "test_yield_%s_%s" % (5, 5)
yield _actual_test, 5, 5
_actual_test.description = "test_yield_%s_%s" % (4, 8) # fail
yield _actual_test, 4, 8
_actual_test.description = "test_yield_%s_%s" % (2, 2)
yield _actual_test, 2, 2
Rename survives #attr too.
does this work?
#attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
def get_f(f, key):
return lambda x: f(), key
for (key, value) in _input_dictionary.items()
f = partial(self._do_test, key, value)
f.attribute1='someValue'
yield get_f(f, key)
def _do_test(self, input1, input2):
# Some code
The Problem ist that the local variables change after you created the lambda.

Dictionary or If statements, Jython

I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
extractTitle(dom)
if type == 'extractMetaTags':
extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
'extractTitle': extractTitle,
'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be built, but at the same time if I were to use the if statements the script would have to check through all of them until it hits the correct one. What I am really wondering, which one performs better or is generally better practice to use?
Update: #Brian - Thanks for the great reply. I have a question, if any of the extract methods require more than one object, e.g.
handle_extractTag(self, dom, anotherObject)
# Do something
How would you make the appropriate changes to the handle method to implemented this? Hope you know what I mean :)
Cheers
To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type. Eg
class MyHandler(object):
def handle_extractTitle(self, dom):
# do something
def handle_extractMetaTags(self, dom):
# do something
def handle(self, type, dom):
func = getattr(self, 'handle_%s' % type, None)
if func is None:
raise Exception("No handler for type %r" % type)
return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
func = getattr(self, 'handle_%s' % type, None)
if func is None:
raise Exception("No handler for type %r" % type)
return func(*args, **kwargs)
With your code you're running your functions all get called.
handlers = {
'extractTitle': extractTitle,
'extractMetaTags': extractMetaTags
}
handlers[type](dom)
Would work like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, then it will be more efficient than using a dictionary.
However, as always, I strongly advice you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless one discarded. What is usually done is more something like:
switch_dict = {'extractTitle': extractTitle,
'extractMetaTags': extractMetaTags}
switch_dict[type](dom)
And that way is facter and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, the if-statements have to be evaluated one at a time. Dictionaries tend to be quicker.
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle( object ):
def process( dom ):
return something
class ExtractMetaTags( object ):
def process( dom ):
return something
Instead of setting type="extractTitle", you'd do this.
type= ExtractTitle() # or ExtractMetaTags() or ExtractWhatever()
type.process( dom )
Then, you wouldn't be building this particular dictionary or if-statement.

A simple freeze behavior decorator

I'm trying to write a freeze decorator for Python.
The idea is as follows :
(In response to the two comments)
I might be wrong but I think there is two main use of
test case.
One is the test-driven development :
Ideally , developers are writing case before writing the code.
It usually helps defining the architecture because this discipline
forces to define the real interfaces before development.
One may even consider that in some case the person who
dispatches job between dev is writing the test case and
use it to illustrate efficiently the specification he has in mind.
I don't have any experience of the use of test case like that.
The second is the idea that all project with a decent
size and a several programmers is suffering from broken code.
Something that use to work may get broken from a change
that looked like an innocent refactoring.
Though good architecture, loose couple between component may
help to fight against this phenomenon ; you will sleep better
at night if you have written some test case to make sure
that nothing will break your program's behavior.
HOWEVER,
Nobody can deny the overhead of writting test cases. In the
first case one may argue that test case is actually guiding
development and is therefore not to be considered as an overhead.
Frankly speaking, I'm a pretty young programmer and if I were
you, my word on this subject is not really valuable...
Anyway, I think that mosts company/projects are not working
like that, and that unit tests are mainly used in the second
case...
In other words, rather than ensuring that the program is
working correctly, it is aiming at checking that it will
work the same in the future.
This needs can be met without the cost of writing tests,
by using this freezing decorator.
Let's say you have a function
def pow(n,k):
if n == 0: return 1
else: return n * pow(n,k-1)
It is perfectly nice, and you want to rewrite it as an optimized version.
It is part of a big project. You want it to give back the same result
for a few value.
Rather than going through the pain of test cases, one could use some
kind of freeze decorator.
Something such that the first time the decorator is run,
the decorator run the function with the defined args (below 0, and 7)
and saves the result in a map ( f --> args --> result )
#freeze(2,0)
#freeze(1,3)
#freeze(3,5)
#freeze(0,0)
def pow(n,k):
if n == 0: return 1
else: return n * pow(n,k-1)
Next time the program is executed, the decorator will load this map and check
that the result of this function for these args as not changed.
I already wrote quickly the decorator (see below), but hurt a few problems about
which I need your advise...
from __future__ import with_statement
from collections import defaultdict
from types import GeneratorType
import cPickle
def __id_from_function(f):
return ".".join([f.__module__, f.__name__])
def generator_firsts(g, N=100):
try:
if N==0:
return []
else:
return [g.next()] + generator_firsts(g, N-1)
except StopIteration :
return []
def __post_process(v):
specialized_postprocess = [
(GeneratorType, generator_firsts),
(Exception, str),
]
try:
val_mro = v.__class__.mro()
for ( ancestor, specialized ) in specialized_postprocess:
if ancestor in val_mro:
return specialized(v)
raise ""
except:
print "Cannot accept this as a value"
return None
def __eval_function(f):
def aux(args, kargs):
try:
return ( True, __post_process( f(*args, **kargs) ) )
except Exception, e:
return ( False, __post_process(e) )
return aux
def __compare_behavior(f, past_records):
for (args, kargs, result) in past_records:
assert __eval_function(f)(args,kargs) == result
def __record_behavior(f, past_records, args, kargs):
registered_args = [ (a, k) for (a, k, r) in past_records ]
if (args, kargs) not in registered_args:
res = __eval_function(f)(args, kargs)
past_records.append( (args, kargs, res) )
def __open_frz():
try:
with open(".frz", "r") as __open_frz:
return cPickle.load(__open_frz)
except:
return defaultdict(list)
def __save_frz(past_records):
with open(".frz", "w") as __open_frz:
return cPickle.dump(past_records, __open_frz)
def freeze_behavior(*args, **kvargs):
def freeze_decorator(f):
past_records = __open_frz()
f_id = __id_from_function(f)
f_past_records = past_records[f_id]
__compare_behavior(f, f_past_records)
__record_behavior(f, f_past_records, args, kvargs)
__save_frz(past_records)
return f
return freeze_decorator
Dumping and Comparing of results is not trivial for all type. Right now I'm thinking about using a function (I call it postprocess here), to solve this problem.
Basically instead of storing res I store postprocess(res) and I compare postprocess(res1)==postprocess(res2), instead of comparing res1 res2.
It is important to let the user overload the predefined postprocess function.
My first question is :
Do you know a way to check if an object is dumpable or not?
Defining a key for the function decorated is a pain. In the following snippets
I am using the function module and its name.
** Can you think of a smarter way to do that. **
The snippets below is kind of working, but opens and close the file when testing and when recording. This is just a stupid prototype... but do you know a nice way to open the file, process the decorator for all function, close the file...
I intend to add some functionalities to this. For instance, add the possibity to define
an iterable to browse a set of argument, record arguments from real use, etc.
Why would you expect from such a decorator?
In general, would you use such a feature, knowing its limitation... Especially when trying to use it with POO?
"In general, would you use such a feature, knowing its limitation...?"
Frankly speaking -- never.
There are no circumstances under which I would "freeze" results of a function in this way.
The use case appears to be based on two wrong ideas: (1) that unit testing is either hard or complex or expensive; and (2) it could be simpler to write the code, "freeze" the results and somehow use the frozen results for refactoring. This isn't helpful. Indeed, the very real possibility of freezing wrong answers makes this a bad idea.
First, on "consistency vs. correctness". This is easier to preserve with a simple mapping than with a complex set of decorators.
Do this instead of writing a freeze decorator.
print "frozen_f=", dict( (i,f(i)) for i in range(100) )
The dictionary object that's created will work perfectly as a frozen result set. No decorator. No complexity to speak of.
Second, on "unit testing".
The point of a unit test is not to "freeze" some random results. The point of a unit test is to compare real results with results developed another (simpler, more obvious, poorly-performing way). Usually unit tests compare hand-developed results. Other times unit tests use obvious but horribly slow algorithms to produce a few key results.
The point of having test data around is not that it's a "frozen" result. The point of having test data is that it is an independent result. Done differently -- sometimes by different people -- that confirms that the function works.
Sorry. This appears to me to be a bad idea; it looks like it subverts the intent of unit testing.
"HOWEVER, Nobody can deny the overhead of writting test cases"
Actually, many folks would deny the "overhead". It isn't "overhead" in the sense of wasted time and effort. For some of us, unittests are essential. Without them, the code may work, but only by accident. With them, we have ample evidence that it actually works; and the specific cases for which it works.
Are you looking to implement invariants or post conditions?
You should specify the result explicitly, this wil remove most of you problems.

Categories

Resources