Dictionary or If statements, Jython - python

I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be built, but at the same time, if I were to use the if statements the script would have to check through all of them until it hits the correct one. What I am really wondering is which one performs better, or which is generally better practice?
Update: @Brian - Thanks for the great reply. I have a question: if any of the extract methods require more than one object, e.g.
def handle_extractTag(self, dom, anotherObject):
    # Do something
    pass
How would you make the appropriate changes to the handle method to implement this? Hope you know what I mean :)
Cheers

To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type, e.g.:
class MyHandler(object):
    def handle_extractTitle(self, dom):
        # do something
        pass

    def handle_extractMetaTags(self, dom):
        # do something
        pass

    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
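Applied to the update in the question, a hypothetical handle_extractTag method taking the extra object would then be invoked like this (sketch only; anotherObject stands for whatever extra object you need to pass):
handler = MyHandler()
# extra positional and keyword arguments are forwarded untouched to handle_extractTag
handler.handle('extractTag', dom, anotherObject)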

With your code as posted, your functions all get called. Something like this:
handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}
handlers[type](dom)
This would work like your original if code.

It depends on how many if statements we're talking about; if it's a very small number, the if chain may be more efficient than using a dictionary.
However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
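If you do want numbers rather than guesses, a quick timeit sketch along these lines (with dummy extract functions standing in for your real ones) tells you more than speculation; the exact figures will vary by machine and interpreter:
import timeit

def extractTitle(dom):
    pass

def extractMetaTags(dom):
    pass

handlers = {'extractTitle': extractTitle, 'extractMetaTags': extractMetaTags}

def if_dispatch(kind, dom):
    if kind == 'extractTitle':
        extractTitle(dom)
    elif kind == 'extractMetaTags':
        extractMetaTags(dom)

def dict_dispatch(kind, dom):
    handlers[kind](dom)

print(timeit.timeit(lambda: if_dispatch('extractMetaTags', None)))
print(timeit.timeit(lambda: dict_dispatch('extractMetaTags', None)))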

Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless ones discarded. What is usually done is something more like:
switch_dict = {'extractTitle': extractTitle,
               'extractMetaTags': extractMetaTags}
switch_dict[type](dom)
That way is faster and more extensible if you have a large (or variable) number of items.

The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique; the if statements have to be evaluated one at a time. Dictionaries tend to be quicker.
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle(object):
    def process(self, dom):
        return something

class ExtractMetaTags(object):
    def process(self, dom):
        return something

Instead of setting type="extractTitle", you'd do this.
type = ExtractTitle()  # or ExtractMetaTags() or ExtractWhatever()
type.process(dom)
Then, you wouldn't be building this particular dictionary or if-statement.


Python: More elegant way to add optional parameters to method call

This will seem trivial perhaps, but it is a condition that I run into fairly frequently and would like to find a more elegant way of writing this code. The method, while not terribly relevant to the question, takes a text value and an optional is_checked value to create a radio button (using dominate). In this case, I can't set 'checked' to None, or false - it either has to be there or not. It doesn't seem like I should have to write the 'input' line twice though, just to optionally add an argument.
def _get_radio_button(text: str, is_checked=False):
    with label(text, cls="radio-inline") as lbl:
        if is_checked:
            input(text, type="radio", name="optradio", checked='checked')
        else:
            input(text, type="radio", name="optradio")
    return lbl
This would be my second approach, but it is the same lines of code and less readable - though perhaps a tiny bit more DRY.
a = dict(type='radio', name='optradio')
if is_checked:
    a['checked'] = 'checked'
with label(text, cls="radio-inline") as lbl:
    input(text, **a)
Question: How can I handle this code case with the fewest lines possible without sacrificing readability?
Your code looks fine, except obviously for the naming of a, which could be input_opts or something like that.
Another possibility to make it a bit clearer is to use direct keyword arguments for the common stuff and just inject the optional ones using **. When only one is optional, this can be quite short, e.g.:
checked_arg = {'checked': 'checked'} if is_checked else {}
with label(text, cls="radio-inline") as lbl:
    input(text, type="radio", name="optradio", **checked_arg)
Only as a concept :) You can decorate your own functions or alien (library) functions this way. You could even make the decorator a class (with a __call__ method that decorates the underlying function), parameterized with simple "morphisms" of the underlying function's arguments (for example, a list of functions passed to the decorator class's constructor). You could also write a more declarative decorator that inspects the underlying function's arguments (for default values, for example) - you are limited only by your own imagination :) So:
from functools import wraps

def adapt_gui_args(callable):
    @wraps(callable)
    def w(*args, **kwargs):
        if kwargs.pop('is_checked', False):
            kwargs['checked'] = 'checked'
        return callable(*args, **kwargs)
    return w

# may be decorated with @adapt_gui_args if it's your function
def input(*args, **kwargs):
    print("args: ", args)
    print("kwargs: ", kwargs)

# decorate the input function outside its source body
input = adapt_gui_args(input)

def test(is_checked=False):
    input(1, 2, type="radio", is_checked=is_checked)

test(False)
test(True)

Python: Need the result of functools.partial(function) to be known as something else

My software supports Python to automate tasks (Maya). When I undo or redo in this software it prints the last command; unfortunately for Python this is the memory address of the function rather than something actually useful. So the user sees the output Undo: <functools.partial object at 0x000002235DEDDF48> instead of something actually useful like Undo: Set Key on something at frame x
There appears to be no option to make Maya print a useful result from within its own functionality, so now I want to ask if there's some obscure way to cheese it with Python and have that instance call itself something useful, in a way the software will print, while hopefully not interfering with the functionality. I'll try anything at this point!
def testFunc():
    pass

test = partial(testFunc)
test results in <functools.partial object at 0x000002235DEA95E8>
If anyone can think of a more accurate title please edit / suggest.
Thanks to kindall giving me a lead in the comments, I was able to find an answer. Subclassing partial and defining __repr__() is the key.
By grabbing *args in __init__() and storing the last argument as self.result, we can return it from __repr__(), so that is what Maya prints when using Undo/Redo.
class rpartial(partial):
    def __init__(self, *args):
        self.result = args[-1]

    def __repr__(self):
        return self.result

rpartial(function, arg1, arg2, undoredo)
The string given to rpartial on the last line for undoredo is what will be printed by Maya when using Undo/Redo.
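For example, a rough sketch of how that plays out (set_key and its arguments are invented for illustration; the real callable would be whatever you hand to Maya):
def set_key(node, frame, _label=None):
    print("keyed %s at frame %s" % (node, frame))

cmd = rpartial(set_key, "pSphere1", 12, "Set Key on pSphere1 at frame 12")
print(repr(cmd))   # -> Set Key on pSphere1 at frame 12, which is what Maya would echo
cmd()              # still callable; the label string simply rides along as the last positional argument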

Is there a reason not to send super().__init__() a dictionary instead of **kwds?

I just started building a text based game yesterday as an exercise in learning Python (I'm using 3.3). I say "text based game," but I mean more of a MUD than a choose-your-own adventure. Anyway, I was really excited when I figured out how to handle inheritance and multiple inheritance using super() yesterday, but I found that the argument-passing really cluttered up the code, and required juggling lots of little loose variables. Also, creating save files seemed pretty nightmarish.
So, I thought, "What if certain class hierarchies just took one argument, a dictionary, and just passed the dictionary back?" To give you an example, here are a few classes trimmed down to their __init__ methods:
class Actor:
    def __init__(self, in_dict, **kwds):
        super().__init__(**kwds)
        self._everything = in_dict
        self._name = in_dict["name"]
        self._size = in_dict["size"]
        self._location = in_dict["location"]
        self._triggers = in_dict["triggers"]
        self._effects = in_dict["effects"]
        self._goals = in_dict["goals"]
        self._action_list = in_dict["action list"]
        self._last_action = ''
        self._current_action = ''  # both ._last_action and ._current_action get updated by .update_action()

class Item(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._can_contain = in_dict["can contain"]  # boolean entry
        self._inventory = in_dict["inventory"]      # either a list or dict entry

class Player(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._inventory = in_dict["inventory"]  # entry should be a Container object
        self._stats = in_dict["stats"]
Example dict that would be passed:
playerdict = {'name': '', 'size': '0', 'location': '', 'triggers': None, 'effects': None,
              'goals': None, 'action list': None, 'inventory': Container(), 'stats': None}
(The None's get replaced by {} once the dictionary has been passed.)
So, in_dict gets passed to the previous class instead of a huge payload of **kwds.
I like this because:
It makes my code a lot neater and more manageable.
As long as the dicts have at least some entry for the key called, it doesn't break the code. Also, it doesn't matter if a given argument never gets used.
It seems like file IO just got a lot easier (dictionaries of player data stored as dicts, dictionaries of item data stored as dicts, etc.)
I get the point of **kwds (EDIT: apparently I didn't), and it hasn't seemed cumbersome when passing fewer arguments. This just appears to be a comfortable way of dealing with a need for a large number of attributes at the creation of each instance.
That said, I'm still a major python noob. So, my question is this: Is there an underlying reason why passing the same dict repeatedly through super() to the base class would be a worse idea than just toughing it out with nasty (big and cluttered) **kwds passes? (e.g. issues with the interpreter that someone at my level would be ignorant of.)
EDIT:
Previously, creating a new Player might have looked like this, with an argument passed for each attribute.
bob = Player('bob', Location = 'here', ... etc.)
The number of arguments needed blew up, and I only included the attributes that really needed to be present to not break method calls from the Engine object.
This is the impression I'm getting from the answers and comments thus far:
There's nothing "wrong" with sending the same dictionary along, as long as nothing has the opportunity to modify its contents (Kirk Strauser) and the dictionary always has what it's supposed to have (goncalopp). The real answer is that the question was amiss, and using in_dict instead of **kwds is redundant.
Would this be correct? (Also, thanks for the great and varied feedback!)
I'm not sure I understand your question exactly, because I don't see how the code looked before you made the change to use in_dict. It sounds like you have been listing out dozens of keywords in the call to super (which is understandably not what you want), but this is not necessary. If your child class has a dict with all of this information, it can be turned into kwargs when you make the call with **in_dict. So:
class Actor:
    def __init__(self, **kwds):
        # pull out whatever this class needs from kwds here
        pass

class Item(Actor):
    def __init__(self, **kwds):
        self._everything = kwds
        super().__init__(**kwds)
I don't see a reason to add another dict for this, since you can just manipulate and pass the dict created for kwds anyway.
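To make the call-site side of that concrete, here is a hedged sketch of what the answer describes, using the Item class above; the dict contents are invented example data:
sword_dict = {'name': 'sword', 'size': '1'}   # hypothetical example data
sword = Item(**sword_dict)                    # inside __init__, kwds is {'name': 'sword', 'size': '1'} again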
Edit:
As for the question of the efficiency of using the ** expansion of the dict versus listing the arguments explicitly, I did a very unscientific timing test with this code:
import time

def some_func(**kwargs):
    for k, v in kwargs.items():
        pass

def main():
    name = 'felix'
    location = 'here'
    user_type = 'player'
    kwds = {'name': name,
            'location': location,
            'user_type': user_type}

    start = time.time()
    for i in range(10000000):
        some_func(**kwds)
    end = time.time()
    print 'Time using expansion:\t{0}s'.format(start - end)

    start = time.time()
    for i in range(10000000):
        some_func(name=name, location=location, user_type=user_type)
    end = time.time()
    print 'Time without expansion:\t{0}s'.format(start - end)

if __name__ == '__main__':
    main()
Running this 10,000,000 times gives a slight (and probably statistically meaningless) advantage to passing around a dict and using **.
Time using expansion: -7.9877269268s
Time without expansion: -8.06108212471s
If we print the IDs of the dict objects (kwds outside and kwargs inside the function), we may well see the same ID repeated across calls; that does not mean the function keeps a single dict forever, though. A fresh kwargs dict is created for each call in either case, and CPython simply tends to reuse the just-freed memory, so the id() values repeat. (See also this enlightening SO question about how mutable default parameters are handled in Python, which is somewhat related.)
So from a performance perspective, you can pick whichever makes sense to you. It should not meaningfully impact how python operates behind the scenes.
I've done that myself where in_dict was a dict with lots of keys, or a settings object, or some other "blob" of something with lots of interesting attributes. That's perfectly OK if it makes your code cleaner, particularly if you name it clearly like settings_object or config_dict or similar.
That shouldn't be the usual case, though. Normally it's better to explicitly pass a small set of individual variables. It makes the code much cleaner and easier to reason about. It's possible that a client could pass in_dict = None by accident and you wouldn't know until some method tried to access it. Suppose Actor.__init__ didn't peel apart in_dict but just stored it like self.settings = in_dict. Sometime later, Actor.method comes along and tries to access it, then boom! Dead process. If you're calling Actor.__init__(var1, var2, ...), then the caller will raise an exception much earlier and provide you with more context about what actually went wrong.
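A tiny illustration of that failure-mode difference (class and attribute names here are invented for the example):
class BlobActor(object):
    def __init__(self, settings=None):
        self.settings = settings            # nothing complains yet, even if settings is None

    def describe(self):
        return self.settings['name']        # TypeError happens here, far from the real mistake

class ExplicitActor(object):
    def __init__(self, name, size):
        self.name = name
        self.size = size

# BlobActor(None).describe()   # blows up only when describe() finally runs
# ExplicitActor('bob')         # TypeError immediately: missing required argument 'size'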
So yes, by all means: feel free to do that when it's appropriate. Just be aware that it's not appropriate very often, and the desire to do it might be a smell telling you to restructure your code.
This is not python specific, but the greatest problem I can see with passing arguments like this is that it breaks encapsulation. Any class may modify the arguments, and it's much more difficult to tell which arguments are expected in each class - making your code difficult to understand, and harder to debug.
Consider explicitly consuming the arguments each class cares about, and passing the rest along to the superclass's __init__ via *args/**kwargs; you don't need to spell the remaining ones out:
class ClassA(object):
    def __init__(self, arg1, arg2=""):
        pass

class ClassB(ClassA):
    def __init__(self, arg3, arg4="", *args, **kwargs):
        ClassA.__init__(self, *args, **kwargs)

ClassB(3, 4, 1, 2)
You can also leave the variables uninitialized and use methods to set them. You can then use different methods in the different classes, and all subclasses will have access to the superclass methods.
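For that last suggestion, a minimal sketch of what "set via methods" could look like (the names are invented for illustration):
class Actor(object):
    def set_name(self, name):
        self._name = name
        return self            # returning self is optional, but allows chaining

class Player(Actor):
    def set_stats(self, stats):
        self._stats = stats
        return self

bob = Player().set_name('bob').set_stats({'hp': 10})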

How do I retain the method attributes of the functions generated through yield in python 2.7?

I have been doing a lot of searching, and I don't think I've really found what I have been looking for. I will try my best to explain what I am trying to do, and hopefully there is a simple solution, and I'll be glad to have learned something new.
This is ultimately what I am trying to accomplish: Using nosetests, decorate some test cases using the attribute selector plugin, then execute test cases that match a criteria by using the -a switch during commandline invocation. The attribute values for the tests that are executed are then stored in an external location. The command line call I'm using is like below:
nosetests \testpath\ -a attribute='someValue'
I have also created a customized nosetest plugin which stores the test cases' attributes and writes them to an external location. The idea is that I can select a batch of tests, and by storing the attributes of these tests, I can filter on these results later for reporting purposes. I am accessing the method attributes in my plugin by overriding the "wantMethod" method with code similar to the following:
def set_attribs(self, method, attribute):
    if hasattr(method, attribute):
        if not self.method_attributes.has_key(method.__name__):
            self.method_attributes[method.__name__] = {}
        self.method_attributes[method.__name__][attribute] = getattr(method, attribute)

def wantMethod(self, method):
    self.set_attribs(method, "attribute1")
    self.set_attribs(method, "attribute2")
    pass
I have this working for pretty much all the tests, except for one case where the test is using the "yield" keyword. What is happening is that the methods that are generated are executed fine, but the method attributes are empty for each of the generated functions.
Below is an example of what I am trying to achieve. The test below retrieves a list of values, and for each of those values, yields the results from another function:
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield (lambda x: f(), key)

def _do_test(self, input1, input2):
    # Some code
    pass
From what I have read, and think I understand, when yield is called, it would create a new callable function which then gets executed. I have been trying to figure out how to retain the attribute values from my sample_test_generator method, but I have not been successful. I thought I could create a partial method, and then add the attribute to the method, but no luck. The tests execute without errors at all, it just seems that from my plugin's perspective, the method attributes aren't present, so they don't get recorded.
I realize this a pretty involved question, but I wanted to make sure that the context for what I am trying to achieve is clear. I have been trying to find information that could help me for this particular case, but I feel like I've reached a stumbling block now, so I would really like to ask the experts for some advice.
Thanks.
** Update **
After reading through the feedback and playing around some more, it looks like if I modified the lambda expression, it would achieve what I am looking for. In fact, I didn't even need to create the partial function:
def sample_test_generator(self):
    for (key, value) in _input_dictionary.items():
        yield (lambda: self._do_test)
The only downside to this approach is that the test name will not change. As I am playing around more, it looks like in nosetests, when a test generator is used, it would actually change the test name in the result based on the keywords it contains. Same thing was happening when I was using the lambda expression with a parameter.
For example:
Using a lambda expression with a parameter:
yield (lambda x: self._do_test, "value1")
In the nosetests plugin, when you access the test case name, it is displayed as "sample_test_generator(value1)".
Using lambda expression without a parameter:
yield (lambda: self._do_test)
The test case name in this case would be "sample_test_generator". In my example above, if there are multiple values in the dictionary, the yield call occurs multiple times, yet the test name always remains "sample_test_generator". This is not as bad as getting unique test names but not being able to store the attribute values at all. I will keep playing around, but thanks for the feedback so far!
EDIT
I forgot to come back and provide my final update on how I was able to get this to work in the end. There was a little confusion on my part at first, and after I looked through it some more, I figured out that it had to do with how the tests are recognized:
My original implementation assumed that every test that gets picked up for execution goes through the "wantMethod" call from the plugin's base class. This is not true when "yield" is used to generate the test, because at this point, the test method has already passed the "wantMethod" call.
However, once the test case is generated through the "yield" call, it does go through the "startTest" call from the plug-in base class, and this is where I was finally able to store the attribute successfully.
So in a nutshell, my test execution order looked like this:
nose -> wantMethod(method_name) -> yield -> startTest(yielded_test_name)
In my override of the startTest method, I have the following:
def startTest(self, test):
    # If a test is spawned by using the 'yield' keyword, the test name is the parent test name
    # followed by a '(' character.
    # Example: if the parent test is "smoke_test", the generated test from yield is "smoke_test('input')"
    test_name = str(test)  # assumption: the test's display name; adapt to however your plugin obtains it
    parent_test_name = test_name.split('(')[0]
    if self.method_attributes.has_key(test_name):
        self._test_attrib = self.method_attributes[test_name]
    elif self.method_attributes.has_key(parent_test_name):
        self._test_attrib = self.method_attributes[parent_test_name]
    else:
        self._test_attrib = None
With this implementation, along with my override of wantMethod, each test spawned by the parent test case also inherits attributes from the parent method, which is what I needed.
Again, thanks to all who sent replies. This was quite a learning experience.
Would this fix your name issue?
def _actual_test(x, y):
    assert x == y

def test_yield():
    _actual_test.description = "test_yield_%s_%s" % (5, 5)
    yield _actual_test, 5, 5
    _actual_test.description = "test_yield_%s_%s" % (4, 8)  # fail
    yield _actual_test, 4, 8
    _actual_test.description = "test_yield_%s_%s" % (2, 2)
    yield _actual_test, 2, 2
The rename survives @attr too.
Does this work?
@attr(attribute1='someValue', attribute2='anotherValue')
def sample_test_generator(self):
    def get_f(f, key):
        return lambda x: f(), key

    for (key, value) in _input_dictionary.items():
        f = partial(self._do_test, key, value)
        f.attribute1 = 'someValue'
        yield get_f(f, key)

def _do_test(self, input1, input2):
    # Some code
    pass
The problem is that the local variables change after you create the lambda.
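A minimal, self-contained illustration of that late-binding behaviour (independent of nose/partial):
funcs = [lambda: i for i in range(3)]
print([f() for f in funcs])        # [2, 2, 2] -- every closure sees the final value of i

funcs = [lambda i=i: i for i in range(3)]
print([f() for f in funcs])        # [0, 1, 2] -- the default argument freezes each value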

Multiple 'endings' to a function - return an object, return a string, raise an exception?

This one's a structure design problem, I guess. Back for some advice.
To start: I'm writing a module. Hence the effort of making it as usable to potential developers as possible.
Inside an object (let's call it Swoosh) I have a method which, when called, may result in either success (a new object is returned -- for insight: it's an httplib.HTTPResponse) or failure (surprising, isn't it?).
I'm having trouble deciding how to handle failures. There are two main cases here:
user supplied data that was incorrect
data was okay, but user interaction will be needed - I need to pass back to the user a string that he or she will need to use in some way.
In (1) I decided to raise ValueError() with an appropriate description.
In (2), as I need to actually pass a str back to the user, I'm not sure whether it would be best to just return a string and leave it to the user to check what the function returned (httplib.HTTPResponse or str), or to raise a custom exception. Is passing data through raising exceptions a good idea? I don't think I've seen this done anywhere, but on the other hand - I haven't seen much.
What would you, as a developer, expect from an object/function like this?
Or perhaps you find the whole design ridiculous - let me know, I'll happily learn.
As much as I like the approach of handling both cases with specifically-typed exceptions, I'm going to offer a different approach in case it helps: callbacks.
Callbacks tend to work better if you're already using an asynchronous framework like Twisted, but that's not their only place. So you might have a method that takes a function for each outcome, like this:
def do_request(on_success, on_interaction_needed, on_failure):
    """
    Submits the swoosh request, and awaits a response.

    If no user interaction is needed, calls on_success with a
    httplib.HTTPResponse object.

    If user interaction is needed, on_interaction_needed is
    called with a single string parameter.

    If the request failed, a ValueError is passed to on_failure.
    """
    response = submit_request()
    if response.is_fine():
        on_success(response)
    elif response.is_partial():
        on_interaction_needed(response.message)
    else:
        on_failure(ValueError(response.message))
Being Python, there are a million ways to do this. You might not like passing an exception to a function, so you maybe just take a callback for the user input scenario. Also, you might pass the callbacks in to the Swoosh initialiser instead.
But there are drawbacks to this too, such as:
Carelessness may result in spaghetti code
You're allowing your caller to inject logic into your function (eg. exceptions raised in the callback will propagate out of Swoosh)
My example here is simple, your actual function might not be
As usual, careful consideration and good documentation should avoid these problems. In theory.
I think raising an exception may actually be a pretty good idea in this case. Squashing multiple signals into a single return value of a function isn't ideal in Python, due to duck typing. It's not very Pythonic; every time you need to do something like:
result = some_function(...)
if isinstance(result, TypeA):
    do_something(result)
elif isinstance(result, TypeB):
    do_something_else(result)
you should be thinking about whether it's really the best design (as you're doing).
In this case, if you implement a custom exception, then the code that calls your function can just treat the returned value as a HTTPResponse. Any path where the function is unable to return something its caller can treat that way is handled by throwing an exception.
Likewise, the code that catches the exception and prompts the user with the message doesn't have to worry about the exact type of the thing it's getting. It just knows that it's been explicitly instructed (by the exception) to show something to the user.
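For instance, a hedged sketch of that shape; the exception class and the surrounding names (swoosh, handle_bad_input, show_to_user) are invented placeholders, not from the question:
class UserInteractionNeeded(Exception):
    """Raised when the caller must show a string to the user before continuing."""
    def __init__(self, message):
        super(UserInteractionNeeded, self).__init__(message)
        self.message = message

# Hypothetical call site:
try:
    response = swoosh.method()        # on success, always an httplib.HTTPResponse
except ValueError as err:
    handle_bad_input(err)             # case (1): user supplied incorrect data
except UserInteractionNeeded as err:
    show_to_user(err.message)         # case (2): a string the user has to act on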
If the user interaction case means the calling code has to show a prompt, get some input and then pass control back to your function, it might be ugly trying to handle that with an exception. E.g.:
try:
    Swoosh.method()
except UserInteraction, ex:
    # do some user interaction stuff
    # pass it back to Swoosh.method()?
    # did Swoosh need to save some state from the last call?
    pass
except ValueError:
    pass  # whatever
If this user interaction is a normal part of the control flow, it might be cleaner to pass a user-interaction function into your method in the first place - then it can return a result to the Swoosh code. For example:
# in Swoosh
def method(self, userinteractor):
    if more_info_needed:
        more_info = userinteractor.prompt("more info")
    ...

ui = MyUserInteractor(self)  # or other state
Swoosh.method(ui)
You can return a tuple of (httplib.HTTPResponse, str) with the str being optionally None.
Definitely raise an exception for 1).
If you don't like returning a tuple, you can also create a "response object", i.e. an instance of a new class (let's say SomethingResponse) that encapsulates the HTTPResponse with optional messages to the end user (in the simplest case, just a str).
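A minimal sketch of that response-object idea (the class and attribute names here are invented for illustration):
class SwooshResponse(object):
    def __init__(self, http_response=None, user_message=None):
        self.http_response = http_response   # the httplib.HTTPResponse on success, else None
        self.user_message = user_message     # str the caller must show the user, else None

    @property
    def needs_interaction(self):
        return self.user_message is not None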
