I want to write a number of related parse functions, that take text and return objects or raise exceptions, rather like int() and float() do. I do anticipate being able to supply these recursively to higher level parsers. I want to be able to configure these at run time, and have either their docstrings, or some other attribute, settable to report how they've been configured.
Python's 'There should be one—and preferably only one—obvious way to do it' has let me down here.
I appear to be able to do exactly the same thing with either a class with a call method, or a function that returns a function.
For instance, my two attempts at a toy range-constrained number parser are below.
class Parser():
def __init__(self, nType=int, nRange=None):
self.nType = nType
self.nRange = nRange
self.__doc__ = 'class - range is {}'.format(str(nRange))
def __call__(self, inStr):
x = self.nType(inStr)
if self.nRange:
if not self.nRange[0] <= x <= self.nRange[1]:
raise ValueError('{} is out of range (class)'.format(inStr))
return x
def parserFactory(nType=int, nRange=None):
def parser(inStr):
x = nType(inStr)
if nRange:
if not nRange[0] <= x <= nRange[1]:
raise ValueError('{} is out of range (factory)'.format(inStr))
return x
parser.__doc__ = 'factory - range is {}'.format(str(nRange))
return parser
a = Parser()
b = Parser(nRange=(3,6), nType=float)
c = parserFactory(nType=float)
d = parserFactory(nRange=(3, 6))
for string in ['4', '14']:
for x in [a,b,c,d,int]:
print(x.__doc__[:35])
try:
print(string, x(string))
except ValueError as error:
print(error)
Both do what I want. Both have more or less the same complexity, and essentially the same statements, albeit in a different order. The factory is slightly shorter. I don't anticipate needing to use any other class methods. I don't see any clear way to choose which is 'better'.
Is one or the other more pythonic?
Is one or the other more likely to run me into difficulty if (when) I try to modify them in yet unanticipated ways?
What do most people do?
I'm a fairly inexperienced programmer. I've read wikipedia's entry on 'factory method pattern' and the subtleties in it go straight over my head.
(edit) Having read comments, answers and links, I think one of the problems is that neither is a good fit. You would not expect a class to have so few methods, even though it can. You would not expect a function to be carrying an attribute, even though it can. As the syntax is so similar, it probably doesn't matter which I use initially, as I can switch without a change in behaviour. (/edit)
You can think of functions as syntactic sugar for classes with only a __init__ and __call__. That would also be true for generators vs classes, context managers vs classes, ...
If you are only passing the parser around and calling it someplace(i.e. doing function things), then you should use the factory. It also allows you to migrate to the class later easily, your factory can simply return the class.
If, besides calling it, you need to inspect or change the values of the parser in other parts of your code, then you should go with classes.
All that said, in this specific case you showed here, I think I would use functools.partial
Related
I used naive approach to write a wrapper. Get all *args and **kwargs and pass them to the enclosing function. But something went wrong. So I simplified example to the core to illustrate my troubles.
# simplies wrapper possible: just pass the args
def wraps(f):
def call(*argv, **kw):
# add some meaningful manipulations later
return f(*argv, **kw)
return call
# check the wrapper behaves identically
class M:
def __init__(this, param):
this.param = param
M.__new__ = M.__new__
m1 = M(1)
M.__new__ = wraps(M.__new__)
m2 = M(2)
m1 was instantiated normally, but m2 fails with the following error description
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
The question is how to define wraps and call function properly so they would behave identically to the function being wrapped regardless of the wrapped function.
It is not the end objective obviously, since primitive lambda x: x would suffice. It is a starting point from which I could introduce further complications.
The short answer: It's impossible. You could not define a perfect wrapper in python (and in many other languages too).
Slightly longer version. Python function is a first-class object and all manipulations acceptable for objects could be performed with a function too. So you could not presume that some complex procedure would limit itself with only calling the function passed as argument and would not use the function object in other unobvious ways
Much more verbose speculation with examples
Functions defined only at part of the domain are pretty common
def half(i):
if i < 0:
raise ValueError
if i & 1:
raise ValueError
return i / 2
Pretty straight. No we could get a little more confusing:
class Veggy:
def __init__(this, kind):
this.kind = kind
def pr(this):
print(this.kind)
def assess(v):
if v.kind in ['tomato', 'carrot']:
raise ValueError
v.pr()
Here Veggy used as a function proxy but also have public property kind which the assess function check before executing.
The same thing could be done with a function object since it also have additional properties besides calling.
def test(x):
return x + x
def assess4(f, *argv, **kw):
if f.__name__ != 'test':
raise ValueError
if f.__module__ != '__main__':
raise ValueError
if len(f.__code__.co_code) % 8 == 4:
raise ValueError
return f(*argv, **kw)
Writing correct wrapper becomes a challenge. That challenge could be complicated further:
def assess0(f, *argv, **kw):
if len(f.__code__.co_code) % 8 == 0:
kw['arg'] = True
return f(*argv[1:], kw)
else
kw['arg'] = False
return f(*argv[:-1], **kw)
Universal wrapper should handle both assess0 and assess4 correctly which is pretty impossible. And we have not touched id magic. Checking id would cast acceptable function in stone.
Coding etiquette
So you could not write a wrapper. Why someone bother to write one? Why function are so common when they could not guarantee behavior equivalence and could possible introduce non-trivial changes in code flow?
The simple answer is coding conventions. The famous substitution principle. Code should keep behavior properties when some object is substituted with another of the same type. Python put little focus on type nomination and enforcing. Rigorous type system is not a must, you could establish APIs and protocols through documentation and type annotation like the python language does.
Programs must be written for people to read, and only incidentally for machines to execute. OOP conventions are all in people minds. So python developers broke conventions requiring some non-stadard behavior for overriding object methods. This non-conventional OOP treatment make impossible to use decorators for transforming __init__ and __new__ methods.
The final solution
If python treats __new__ so special then generic wrapper should do the same.
# simplest wrapper possible: just pass the args
def wraps(f):
def call(*argv, **kw):
# add some meaningful manipulations later
return f(*argv, **kw)
def call_new(*argv, **kw):
# add some meaningful manipulations later
return f(argv[0])
if f is object.__new__:
return call_new
# elif other_special_case: pass
else:
return call
Now it could successfully pass the test
# check the wrapper behaves identically
class M:
def __init__(this, param):
this.param = param
M.__new__ = M.__new__
m1 = M(1)
M.__new__ = wraps(M.__new__)
m2 = M(2)
The drawback is that you should implement distinct workaround for any other convention breaking functions besides __new__ to make your function wrapper semi-applicable in universal context. But it is the best you could get out of python.
I'm trying to efficiently add optional features so some class functions. Pretty new to python, so I'm trying to learn good habits. I'm implementing this through **kwargs (for better or worse).
One such example is adding a 'verbose' option which adds several conditional print statements throughout the functions, and will suffice as a unit test here. There's a lot of SO Q+As (and other tutorials) about **kwargs usage for variables but less about usage as execution flags.
My frame of reference here is thinking in terms of overloads and switch statements so I was on the fence with using a try statement if various **kwargs existed.
Can't copy and paste, so here's some basic example:
class Fruit:
def __init__(self, name):
self.name = name
class Bowl:
def __init__(self):
self.contents = []
def fill_bowl(self, *fruit, **options):
self.fruit_list = []
for x in options:
if options.get(x) == 'verbose':
verbose = 1
else:
verbose = 0
From here, I'd add several if verbose == 1: print... various attributes of the Fruit I'm adding to the bowl, number of contents, etc., etc. to help with sanity checks without going through the debugger in-depth.
This is functional, barring some transcription error in typing this. Am I on the right track, or is there a more intuitive way to accomplish this?
You shouldn't be using kwargs here. This are for when you need to accept unknown options. Here i you know what you want to accept: a verbose parameter. To make it optional you can give it a default value.
def fill_bowl(self, *fruit, verbose=0):
Although you should probably use True and False rather than 1 and 0.
(And note even if you did want to use kwargs, trees no reason to iterate through the dict like that; you would just do if kwargs.get("verbose").)
I just started building a text based game yesterday as an exercise in learning Python (I'm using 3.3). I say "text based game," but I mean more of a MUD than a choose-your-own adventure. Anyway, I was really excited when I figured out how to handle inheritance and multiple inheritance using super() yesterday, but I found that the argument-passing really cluttered up the code, and required juggling lots of little loose variables. Also, creating save files seemed pretty nightmarish.
So, I thought, "What if certain class hierarchies just took one argument, a dictionary, and just passed the dictionary back?" To give you an example, here are two classes trimmed down to their init methods:
class Actor:
def __init__(self, in_dict,**kwds):
super().__init__(**kwds)
self._everything = in_dict
self._name = in_dict["name"]
self._size = in_dict["size"]
self._location = in_dict["location"]
self._triggers = in_dict["triggers"]
self._effects = in_dict["effects"]
self._goals = in_dict["goals"]
self._action_list = in_dict["action list"]
self._last_action = ''
self._current_action = '' # both ._last_action and ._current_action get updated by .update_action()
class Item(Actor):
def __init__(self,in_dict,**kwds)
super().__init__(in_dict,**kwds)
self._can_contain = in_dict("can contain") #boolean entry
self._inventory = in_dict("can contain") #either a list or dict entry
class Player(Actor):
def __init__(self, in_dict,**kwds):
super().__init__(in_dict,**kwds)
self._inventory = in_dict["inventory"] #entry should be a Container object
self._stats = in_dict["stats"]
Example dict that would be passed:
playerdict = {'name' : '', 'size' : '0', 'location' : '', 'triggers' : None, 'effects' : None, 'goals' : None, 'action list' = None, 'inventory' : Container(), 'stats' : None,}
(The None's get replaced by {} once the dictionary has been passed.)
So, in_dict gets passed to the previous class instead of a huge payload of **kwds.
I like this because:
It makes my code a lot neater and more manageable.
As long as the dicts have at least some entry for the key called, it doesn't break the code. Also, it doesn't matter if a given argument never gets used.
It seems like file IO just got a lot easier (dictionaries of player data stored as dicts, dictionaries of item data stored as dicts, etc.)
I get the point of **kwds (EDIT: apparently I didn't), and it hasn't seemed cumbersome when passing fewer arguments. This just appears to be a comfortable way of dealing with a need for a large number of attributes at the the creation of each instance.
That said, I'm still a major python noob. So, my question is this: Is there an underlying reason why passing the same dict repeatedly through super() to the base class would be a worse idea than just toughing it out with nasty (big and cluttered) **kwds passes? (e.g. issues with the interpreter that someone at my level would be ignorant of.)
EDIT:
Previously, creating a new Player might have looked like this, with an argument passed for each attribute.
bob = Player('bob', Location = 'here', ... etc.)
The number of arguments needed blew up, and I only included the attributes that really needed to be present to not break method calls from the Engine object.
This is the impression I'm getting from the answers and comments thus far:
There's nothing "wrong" with sending the same dictionary along, as long as nothing has the opportunity to modify its contents (Kirk Strauser) and the dictionary always has what it's supposed to have (goncalopp). The real answer is that the question was amiss, and using in_dict instead of **kwds is redundant.
Would this be correct? (Also, thanks for the great and varied feedback!)
I'm not sure I understand your question exactly, because I don't see how the code looked before you made the change to use in_dict. It sounds like you have been listing out dozens of keywords in the call to super (which is understandably not what you want), but this is not necessary. If your child class has a dict with all of this information, it can be turned into kwargs when you make the call with **in_dict. So:
class Actor:
def __init__(self, **kwds):
class Item(Actor):
def __init__(self, **kwds)
self._everything = kwds
super().__init__(**kwds)
I don't see a reason to add another dict for this, since you can just manipulate and pass the dict created for kwds anyway
Edit:
As for the question of the efficiency of using the ** expansion of the dict versus listing the arguments explicitly, I did a very unscientific timing test with this code:
import time
def some_func(**kwargs):
for k,v in kwargs.items():
pass
def main():
name = 'felix'
location = 'here'
user_type = 'player'
kwds = {'name': name,
'location': location,
'user_type': user_type}
start = time.time()
for i in range(10000000):
some_func(**kwds)
end = time.time()
print 'Time using expansion:\t{0}s'.format(start - end)
start = time.time()
for i in range(10000000):
some_func(name=name, location=location, user_type=user_type)
end = time.time()
print 'Time without expansion:\t{0}s'.format(start - end)
if __name__ == '__main__':
main()
Running this 10,000,000 times gives a slight (and probably statistically meaningless) advantage passing around a dict and using **.
Time using expansion: -7.9877269268s
Time without expansion: -8.06108212471s
If we print the IDs of the dict objects (kwds outside and kwargs inside the function), you will see that python creates a new dict for the function to use in either case, but in fact the function only gets one dict forever. After the initial definition of the function (where the kwargs dict is created) all subsequent calls are just updating the values of that dict belonging to the function, no matter how you call it. (See also this enlightening SO question about how mutable default parameters are handled in python, which is somewhat related)
So from a performance perspective, you can pick whichever makes sense to you. It should not meaningfully impact how python operates behind the scenes.
I've done that myself where in_dict was a dict with lots of keys, or a settings object, or some other "blob" of something with lots of interesting attributes. That's perfectly OK if it makes your code cleaner, particularly if you name it clearly like settings_object or config_dict or similar.
That shouldn't be the usual case, though. Normally it's better to explicitly pass a small set of individual variables. It makes the code much cleaner and easier to reason about. It's possible that a client could pass in_dict = None by accident and you wouldn't know until some method tried to access it. Suppose Actor.__init__ didn't peel apart in_dict but just stored it like self.settings = in_dict. Sometime later, Actor.method comes along and tries to access it, then boom! Dead process. If you're calling Actor.__init__(var1, var2, ...), then the caller will raise an exception much earlier and provide you with more context about what actually went wrong.
So yes, by all means: feel free to do that when it's appropriate. Just be aware that it's not appropriate very often, and the desire to do it might be a smell telling you to restructure your code.
This is not python specific, but the greatest problem I can see with passing arguments like this is that it breaks encapsulation. Any class may modify the arguments, and it's much more difficult to tell which arguments are expected in each class - making your code difficult to understand, and harder to debug.
Consider explicitly consuming the arguments in each class, and calling the super's __init__ on the remaining. You don't need to make them explicit:
class ClassA( object ):
def __init__(self, arg1, arg2=""):
pass
class ClassB( ClassA ):
def __init__(self, arg3, arg4="", *args, **kwargs):
ClassA.__init__(self, *args, **kwargs)
ClassB(3,4,1,2)
You can also leave the variables uninitialized and use methods to set them. You can then use different methods in the different classes, and all subclasses will have access to the superclass methods.
I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
extractTitle(dom)
if type == 'extractMetaTags':
extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
'extractTitle': extractTitle,
'extractMetaTags': extractMetaTags
}[type](dom)
I know that each time I run the script the dictionary will be built, but at the same time if I were to use the if statements the script would have to check through all of them until it hits the correct one. What I am really wondering, which one performs better or is generally better practice to use?
Update: #Brian - Thanks for the great reply. I have a question, if any of the extract methods require more than one object, e.g.
handle_extractTag(self, dom, anotherObject)
# Do something
How would you make the appropriate changes to the handle method to implemented this? Hope you know what I mean :)
Cheers
To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type. Eg
class MyHandler(object):
def handle_extractTitle(self, dom):
# do something
def handle_extractMetaTags(self, dom):
# do something
def handle(self, type, dom):
func = getattr(self, 'handle_%s' % type, None)
if func is None:
raise Exception("No handler for type %r" % type)
return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
func = getattr(self, 'handle_%s' % type, None)
if func is None:
raise Exception("No handler for type %r" % type)
return func(*args, **kwargs)
With your code you're running your functions all get called.
handlers = {
'extractTitle': extractTitle,
'extractMetaTags': extractMetaTags
}
handlers[type](dom)
Would work like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, then it will be more efficient than using a dictionary.
However, as always, I strongly advice you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct. In your implementation, all methods will be called and all the useless one discarded. What is usually done is more something like:
switch_dict = {'extractTitle': extractTitle,
'extractMetaTags': extractMetaTags}
switch_dict[type](dom)
And that way is facter and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, the if-statements have to be evaluated one at a time. Dictionaries tend to be quicker.
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle( object ):
def process( dom ):
return something
class ExtractMetaTags( object ):
def process( dom ):
return something
Instead of setting type="extractTitle", you'd do this.
type= ExtractTitle() # or ExtractMetaTags() or ExtractWhatever()
type.process( dom )
Then, you wouldn't be building this particular dictionary or if-statement.
I'm trying to write a freeze decorator for Python.
The idea is as follows :
(In response to the two comments)
I might be wrong but I think there is two main use of
test case.
One is the test-driven development :
Ideally , developers are writing case before writing the code.
It usually helps defining the architecture because this discipline
forces to define the real interfaces before development.
One may even consider that in some case the person who
dispatches job between dev is writing the test case and
use it to illustrate efficiently the specification he has in mind.
I don't have any experience of the use of test case like that.
The second is the idea that all project with a decent
size and a several programmers is suffering from broken code.
Something that use to work may get broken from a change
that looked like an innocent refactoring.
Though good architecture, loose couple between component may
help to fight against this phenomenon ; you will sleep better
at night if you have written some test case to make sure
that nothing will break your program's behavior.
HOWEVER,
Nobody can deny the overhead of writting test cases. In the
first case one may argue that test case is actually guiding
development and is therefore not to be considered as an overhead.
Frankly speaking, I'm a pretty young programmer and if I were
you, my word on this subject is not really valuable...
Anyway, I think that mosts company/projects are not working
like that, and that unit tests are mainly used in the second
case...
In other words, rather than ensuring that the program is
working correctly, it is aiming at checking that it will
work the same in the future.
This needs can be met without the cost of writing tests,
by using this freezing decorator.
Let's say you have a function
def pow(n,k):
if n == 0: return 1
else: return n * pow(n,k-1)
It is perfectly nice, and you want to rewrite it as an optimized version.
It is part of a big project. You want it to give back the same result
for a few value.
Rather than going through the pain of test cases, one could use some
kind of freeze decorator.
Something such that the first time the decorator is run,
the decorator run the function with the defined args (below 0, and 7)
and saves the result in a map ( f --> args --> result )
#freeze(2,0)
#freeze(1,3)
#freeze(3,5)
#freeze(0,0)
def pow(n,k):
if n == 0: return 1
else: return n * pow(n,k-1)
Next time the program is executed, the decorator will load this map and check
that the result of this function for these args as not changed.
I already wrote quickly the decorator (see below), but hurt a few problems about
which I need your advise...
from __future__ import with_statement
from collections import defaultdict
from types import GeneratorType
import cPickle
def __id_from_function(f):
return ".".join([f.__module__, f.__name__])
def generator_firsts(g, N=100):
try:
if N==0:
return []
else:
return [g.next()] + generator_firsts(g, N-1)
except StopIteration :
return []
def __post_process(v):
specialized_postprocess = [
(GeneratorType, generator_firsts),
(Exception, str),
]
try:
val_mro = v.__class__.mro()
for ( ancestor, specialized ) in specialized_postprocess:
if ancestor in val_mro:
return specialized(v)
raise ""
except:
print "Cannot accept this as a value"
return None
def __eval_function(f):
def aux(args, kargs):
try:
return ( True, __post_process( f(*args, **kargs) ) )
except Exception, e:
return ( False, __post_process(e) )
return aux
def __compare_behavior(f, past_records):
for (args, kargs, result) in past_records:
assert __eval_function(f)(args,kargs) == result
def __record_behavior(f, past_records, args, kargs):
registered_args = [ (a, k) for (a, k, r) in past_records ]
if (args, kargs) not in registered_args:
res = __eval_function(f)(args, kargs)
past_records.append( (args, kargs, res) )
def __open_frz():
try:
with open(".frz", "r") as __open_frz:
return cPickle.load(__open_frz)
except:
return defaultdict(list)
def __save_frz(past_records):
with open(".frz", "w") as __open_frz:
return cPickle.dump(past_records, __open_frz)
def freeze_behavior(*args, **kvargs):
def freeze_decorator(f):
past_records = __open_frz()
f_id = __id_from_function(f)
f_past_records = past_records[f_id]
__compare_behavior(f, f_past_records)
__record_behavior(f, f_past_records, args, kvargs)
__save_frz(past_records)
return f
return freeze_decorator
Dumping and Comparing of results is not trivial for all type. Right now I'm thinking about using a function (I call it postprocess here), to solve this problem.
Basically instead of storing res I store postprocess(res) and I compare postprocess(res1)==postprocess(res2), instead of comparing res1 res2.
It is important to let the user overload the predefined postprocess function.
My first question is :
Do you know a way to check if an object is dumpable or not?
Defining a key for the function decorated is a pain. In the following snippets
I am using the function module and its name.
** Can you think of a smarter way to do that. **
The snippets below is kind of working, but opens and close the file when testing and when recording. This is just a stupid prototype... but do you know a nice way to open the file, process the decorator for all function, close the file...
I intend to add some functionalities to this. For instance, add the possibity to define
an iterable to browse a set of argument, record arguments from real use, etc.
Why would you expect from such a decorator?
In general, would you use such a feature, knowing its limitation... Especially when trying to use it with POO?
"In general, would you use such a feature, knowing its limitation...?"
Frankly speaking -- never.
There are no circumstances under which I would "freeze" results of a function in this way.
The use case appears to be based on two wrong ideas: (1) that unit testing is either hard or complex or expensive; and (2) it could be simpler to write the code, "freeze" the results and somehow use the frozen results for refactoring. This isn't helpful. Indeed, the very real possibility of freezing wrong answers makes this a bad idea.
First, on "consistency vs. correctness". This is easier to preserve with a simple mapping than with a complex set of decorators.
Do this instead of writing a freeze decorator.
print "frozen_f=", dict( (i,f(i)) for i in range(100) )
The dictionary object that's created will work perfectly as a frozen result set. No decorator. No complexity to speak of.
Second, on "unit testing".
The point of a unit test is not to "freeze" some random results. The point of a unit test is to compare real results with results developed another (simpler, more obvious, poorly-performing way). Usually unit tests compare hand-developed results. Other times unit tests use obvious but horribly slow algorithms to produce a few key results.
The point of having test data around is not that it's a "frozen" result. The point of having test data is that it is an independent result. Done differently -- sometimes by different people -- that confirms that the function works.
Sorry. This appears to me to be a bad idea; it looks like it subverts the intent of unit testing.
"HOWEVER, Nobody can deny the overhead of writting test cases"
Actually, many folks would deny the "overhead". It isn't "overhead" in the sense of wasted time and effort. For some of us, unittests are essential. Without them, the code may work, but only by accident. With them, we have ample evidence that it actually works; and the specific cases for which it works.
Are you looking to implement invariants or post conditions?
You should specify the result explicitly, this wil remove most of you problems.