Method Refactor: from many kwargs to one arg-object - python

Sometimes the number of kwargs of a method increases to a level where I think it should be refactored.
Example:
def foo(important=False, debug=False, dry_run=False, ...):
    ...
    sub_foo(important=important, debug=debug, dry_run=dry_run, ...)
My current preferred solution:
class Args(object):
    ...

def foo(args):
    sub_foo(args)
First question: What should Args be called? Is there a well-known description or design pattern for this?
Second question: Does Python have something which I could use as base class for Args?
Update
I have been using Python at work daily for 13 years. I have used and written methods with many kwargs. During the last few weeks I read the book "Clean Code" and I liked it. Somehow it is like wearing another pair of glasses now. My old code works, but it is not nice to look at. Splitting long methods into several smaller methods is easy, but I am not sure how to handle methods with kwargs bloat.

I think what you've described is an example of the "Context" design pattern.
I usually call your "Args" a "Context" (or a "FooContext" if it's foo-specific enough).
I think the best explanation I saw was here: http://accu.org/index.php/journals/246 ("The Encapsulate Context Pattern", by Allen Kelly in Overload Journal #63 - Oct 2004, which I saw from another SO answer: https://stackoverflow.com/a/9458244/3427357).
There are also some decent papers that elaborate further if you want an in-depth exploration:
http://www.two-sdg.demon.co.uk/curbralan/papers/europlop/ContextEncapsulation.pdf
https://www.dre.vanderbilt.edu/~schmidt/PDF/Context-Object-Pattern.pdf
As pointed out by yet another SO answer (https://stackoverflow.com/a/1135454/3427357), the Context pattern is considered dangerous by some (c.f. http://misko.hevery.com/2008/07/18/breaking-the-law-of-demeter-is-like-looking-for-a-needle-in-the-haystack/).
But I think the "Law of Demeter" warnings are more about not over-complicating your early design than about cleaning up the cruft that accidentally grew while you were solving other problems. If you're passing an "important" boolean through multiple function call layers you're already headed for testing hell, and in that situation the refactor you've described is generally a pure win in my experience.
I don't think there's a standard base class for this in python, unless maybe you're lazy enough to pass an argparse.Namespace as your context object just because you already had your parameter values there.
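For illustration, a minimal sketch of what a foo-specific context could look like (the class and attribute names here are made up, not taken from the pattern papers):

class FooContext(object):
    """Bundles the flags that foo() and its helpers all need."""
    def __init__(self, important=False, debug=False, dry_run=False):
        self.important = important
        self.debug = debug
        self.dry_run = dry_run

def sub_foo(ctx):
    if ctx.debug:
        print('sub_foo called, dry_run=%s' % ctx.dry_run)

def foo(ctx):
    if ctx.important:
        print('this run is important')
    sub_foo(ctx)

foo(FooContext(debug=True, dry_run=True))

The signatures stay short, and adding a new flag only touches the context class and the code that actually reads it.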

def foo(*args, **kwargs):
    sub_foo(*args, **kwargs)

Better would be to use introspection to call through to the subfunction.
You just need a way to get information on the function. You could do something like this:
import inspect

def passthru(func):
    l = inspect.stack()[1][0].f_locals
    args = inspect.getargspec(func)[0]
    kwargs = dict((x, l[x]) for x in args if x in l)
    return func(**kwargs)
def f(x=1, y=2):
    print x, y

def g(x=4):
    passthru(f)

>>> f()
1 2
>>> g()
4 2
>>> g(6)
6 2
It seems to have some overhead, though.

I'm not exactly sure what you are looking for, so perhaps an edit to add in some additional information might be helpful (e.g. what do you mean by clean code, and why doesn't *args, **kwargs satisfy that, what is the ultimate goal you are trying to accomplish, etc).
I'll throw out one additional idea not mentioned yet. You could create a dictionary and pass it in as the keyword arguments by using **:
def foo(important=False, debug=False, dry_run=False):
    print important, debug, dry_run

args = dict()
args['important'] = True
args['debug'] = True
args['dry_run'] = False

foo(**args)
Or since you wanted to involve OOP, you could perhaps use an object.
class Args(object):
    pass

def foo(important=False, debug=False, dry_run=False):
    print important, debug, dry_run

args = Args()
args.important = True
args.debug = True
args.dry_run = False

foo(**args.__dict__)
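If you are on Python 3.3 or newer, types.SimpleNamespace gives you this kind of attribute bag without writing an empty class yourself (vars() exposes its attribute dict):

from types import SimpleNamespace

def foo(important=False, debug=False, dry_run=False):
    print(important, debug, dry_run)

args = SimpleNamespace(important=True, debug=True, dry_run=False)
foo(**vars(args))  # vars(args) is the namespace's attribute dict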

I don't understand why you would do that. Generally, if a method has that many arguments the problem is that method is doing too much, not that you need to wrap the arguments up in some object. If you just want to be able to pass the arguments around you can use **kwargs.
That said, if you have some strange use-case and really need this you could use collections.namedtuple:
import collections

def foo(a=1, b=2, c=3, d=4, e=5, f=6, g=7):  # kwarg way
    do_things(a, 7, b, 12, c, 3, d, e, f, g)  # or whatever

FooArgs = collections.namedtuple('FooArgs', ['a', 'b', 'c', 'd', 'e', 'f', 'g'])
foo_args = FooArgs(1, 2, 3, 4, 5, 6, 7)
foo_args.a  # 1
foo_args.e  # 5

def foo(args):  # namedtuple way
    do_things(args.a, 7, args.b, 12, args.c, 3, args.d, args.e, args.f, args.g)

I see a few ways out:
automagic, e.g. thread-local storage or some other context from which these values can be fetched. Web frameworks often follow this, e.g. https://stackoverflow.com/a/19484699/705086. I find this the most pythonic, in the sense that it's easier to read; call it poor man's context-oriented programming. It's similar to giving direct access to sys.argv, but more precise (see the sketch after this list).
It's best for cross-cutting concerns: authorization, logging, usage limits, retries…
collections.namedtuple, especially useful if the same set of arguments is often repeated exactly, or if multiple instances of this kind are common, for example:
job = collections.namedtuple("job", "id started foo bar")
todo = [job(*record) for record in db.select(…)]
**kwargs, anonymous and bug-prone when an unexpected keyword argument is passed in.
self, if you keep passing arguments from one function to the next, perhaps these should be class/object members
You can also mix and match these, in your example:
debug ⇾ automagic context
dry_run ⇾ automagic context
important ⇾ keep a named kwarg, for explicit is better than implicit
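As a rough illustration of the "automagic context" idea mentioned above (the names here are made up, not from any framework), a threading.local can hold the cross-cutting flags so they never appear in the signatures at all:

import threading

_context = threading.local()

def set_context(debug=False, dry_run=False):
    _context.debug = debug
    _context.dry_run = dry_run

def sub_foo():
    if getattr(_context, 'debug', False):
        print('sub_foo: dry_run=%s' % getattr(_context, 'dry_run', False))

def foo(important=False):
    # 'important' stays an explicit kwarg; debug/dry_run come from the context
    sub_foo()

set_context(debug=True, dry_run=True)
foo(important=True)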

I believe that you're just wasting your time and making your code more complex. As a Python developer I'd rather see a function with 20 arguments than a function that takes a complex Args object.

Related

initialize function and binding by iteration python [duplicate]

Do I have to formally define a function before I can use it as an element of a dictionary?
def my_func():
    print 'my_func'

d = {
    'function': my_func
}
I would rather define the function inline. I just tried to type out what I want to do, but the whitespace policies of python syntax make it very hard to define an inline func within a dict. Is there any way to do this?
The answer seems to be that there is no way to declare a function inline in a dictionary definition in python. Thanks to everyone who took the time to contribute.
Do you really need a dictionary, or just getitem access?
If the latter, then use a class:
>>> class Dispatch(object):
...     def funcA(self, *args):
...         print('funcA%r' % (args,))
...     def funcB(self, *args):
...         print('funcB%r' % (args,))
...     def __getitem__(self, name):
...         return getattr(self, name)
...
>>> d = Dispatch()
>>>
>>> d['funcA'](1, 2, 3)
funcA(1, 2, 3)
You could use a decorator:
func_dict = {}

def register(func):
    func_dict[func.__name__] = func
    return func

@register
def a_func():
    pass

@register
def b_func():
    pass
The func_dict will end up mapping each function's full name to the function object itself:
>>> func_dict
{'a_func': <function a_func at 0x000001F6117BC950>, 'b_func': <function b_func at 0x000001F6117BC8C8>}
You can modify the key used by register as desired. The trick is that we use the __name__ attribute of the function to get the appropriate string.
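Dispatching through the registry is then an ordinary dictionary lookup followed by a call:

name = 'a_func'
func_dict[name]()  # looks up a_func by its string name and calls it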
Consider using lambdas, but note that lambdas can only consist of one expression and cannot contain statements (see http://docs.python.org/reference/expressions.html#lambda).
e.g.
d = {'func': lambda x: x + 1}
# calling d['func'](2) will return 3
Also, note that in Python 2, print is not a function. So you have to do either:
from __future__ import print_function
d = {
    'function': print
}
or use sys.stdout.write instead
import sys

d = {
    'function': sys.stdout.write
}
Some functions can be easily 'inlined' anonymously with lambda expressions, e.g.:
>>> d={'function': lambda x : x**2}
>>> d['function'](5)
25
But for anything semi-complex (or using statements) you probably just should define them beforehand.
There is no good reason to want to write this using a dictionary in Python. It's strange and is not a common way to namespace functions.
The Python philosophies that apply here are:
There should be one-- and preferably only one --obvious way to do it.
Combined with
Readability counts.
Doing it this way also makes things hard to understand and read for the typical Python user.
The good things the dictionary does in this case is map strings to functions and namespace them within a dictionary, but this functionality is already provided by both modules and classes and it's much easier to understand by those familiar with Python.
Examples:
Module method:
# cool.py
def cool():
    print 'cool'
Now use the module like you would be using your dict:
import cool
#cool.__dict__['cool']()
#update - to the more correct idiom vars
vars(cool)['cool']()
Class method:
class Cool():
    def cool():
        print 'cool'

# Cool.__dict__['cool']()
# update - to the more correct idiom vars
vars(Cool)['cool']()
Edit after comment below:
argparse seems like a good fit for this problem, so you don't have to reinvent the wheel. If you do decide to implement it completely yourself, though, the argparse source should give you some good direction. Anyway, the sections below seem to apply to this use case:
15.4.4.5. Beyond sys.argv
Sometimes it may be useful to have an ArgumentParser parse arguments
other than those of sys.argv. This can be accomplished by passing a
list of strings to parse_args(). This is useful for testing at the
interactive prompt:
15.4.5.1. Sub-commands
ArgumentParser.add_subparsers()
Many programs split up their functionality into a number of sub-commands, for example, the svn program can invoke sub-commands
like svn checkout, svn update, and svn commit.
15.4.4.6. The Namespace object
It may also be useful to have an ArgumentParser assign attributes to
an already existing object, rather than a new Namespace object. This
can be achieved by specifying the namespace= keyword argument:
Update, here's an example using argparse
import argparse

strategizer = argparse.ArgumentParser()
strat_subs = strategizer.add_subparsers()
math = strat_subs.add_parser('math')
math_subs = math.add_subparsers()
math_max = math_subs.add_parser('max')
math_sum = math_subs.add_parser('sum')
math_max.set_defaults(strategy=max)
math_sum.set_defaults(strategy=sum)

strategizer.parse_args('math max'.split())
Out[46]: Namespace(strategy=<built-in function max>)
strategizer.parse_args('math sum'.split())
Out[47]: Namespace(strategy=<built-in function sum>)
I would like to note the reasons I would recommend argparse:
Mainly the requirement to use strings that represent options and sub-options to map to functions.
It's dead simple (after getting past the feature-filled argparse module).
Uses a Python standard library module. This lets others familiar with Python grok what you're doing without getting into implementation details, and it is very well documented for those who aren't.
Many extra features could be taken advantage of out of the box (not the best reason!).
Using argparse and Strategy Pattern together
For the plain and simple implementation of the Strategy Pattern, this has already been answered very well.
How to write Strategy Pattern in Python differently than example in Wikipedia?
# continuing from the above example
class MathStudent():
    def do_math(self, numbers):
        return self.strategy(numbers)

maximus = strategizer.parse_args('math max'.split(),
                                 namespace=MathStudent())
sumera = strategizer.parse_args('math sum'.split(),
                                namespace=MathStudent())
maximus.do_math([1, 2, 3])
Out[71]: 3
sumera.do_math([1, 2, 3])
Out[72]: 6
The point of inlining functions is to blur the distinction between dictionaries and class instances. In javascript, for example, this technique makes it very pleasant to write control classes that have little reusability. Also, very helpfully, the API then conforms to the well-known dictionary protocols, being self explanatory (pun intended).
You can do this in python - it just doesn't look like a dictionary! In fact, you can use the class keyword in ANY scope (i.e. a class def in a function, or a class def inside of a class def), and its children can be the dictionary you are looking for; just inspect the attributes of a definition as if it were a javascript dictionary.
Example as if it was real:
somedict = {
    "foo": 5,
    "one_function": your method here,
    "two_function": your method here,
}
Is actually accomplished as
class somedict:
    foo = 5

    @classmethod
    def one_method(self):
        print self.foo
        self.foo *= 2

    @classmethod
    def two_method(self):
        print self.foo
So that you can then say:
somedict.foo #(prints 5)
somedict.one_method() #(prints 5)
somedict.two_method() #(prints 10)
And in this way, you get the same logical groupings as you would with your "inlining".

I want a configurable python callable. Class versus function factory?

I want to write a number of related parse functions, that take text and return objects or raise exceptions, rather like int() and float() do. I do anticipate being able to supply these recursively to higher level parsers. I want to be able to configure these at run time, and have either their docstrings, or some other attribute, settable to report how they've been configured.
Python's 'There should be one—and preferably only one—obvious way to do it' has let me down here.
I appear to be able to do exactly the same thing with either a class with a __call__ method, or a function that returns a function.
For instance, my two attempts at a toy range-constrained number parser are below.
class Parser():
    def __init__(self, nType=int, nRange=None):
        self.nType = nType
        self.nRange = nRange
        self.__doc__ = 'class - range is {}'.format(str(nRange))

    def __call__(self, inStr):
        x = self.nType(inStr)
        if self.nRange:
            if not self.nRange[0] <= x <= self.nRange[1]:
                raise ValueError('{} is out of range (class)'.format(inStr))
        return x

def parserFactory(nType=int, nRange=None):
    def parser(inStr):
        x = nType(inStr)
        if nRange:
            if not nRange[0] <= x <= nRange[1]:
                raise ValueError('{} is out of range (factory)'.format(inStr))
        return x
    parser.__doc__ = 'factory - range is {}'.format(str(nRange))
    return parser
a = Parser()
b = Parser(nRange=(3, 6), nType=float)
c = parserFactory(nType=float)
d = parserFactory(nRange=(3, 6))

for string in ['4', '14']:
    for x in [a, b, c, d, int]:
        print(x.__doc__[:35])
        try:
            print(string, x(string))
        except ValueError as error:
            print(error)
Both do what I want. Both have more or less the same complexity, and essentially the same statements, albeit in a different order. The factory is slightly shorter. I don't anticipate needing to use any other class methods. I don't see any clear way to choose which is 'better'.
Is one or the other more pythonic?
Is one or the other more likely to run me into difficulty if (when) I try to modify them in yet unanticipated ways?
What do most people do?
I'm a fairly inexperienced programmer. I've read wikipedia's entry on 'factory method pattern' and the subtleties in it go straight over my head.
(edit) Having read comments, answers and links, I think one of the problems is that neither is a good fit. You would not expect a class to have so few methods, even though it can. You would not expect a function to be carrying an attribute, even though it can. As the syntax is so similar, it probably doesn't matter which I use initially, as I can switch without a change in behaviour. (/edit)
You can think of functions as syntactic sugar for classes with only an __init__ and a __call__. That would also be true for generators vs classes, context managers vs classes, ...
If you are only passing the parser around and calling it someplace (i.e. doing function things), then you should use the factory. It also allows you to migrate to the class later easily; your factory can simply return the class.
If, besides calling it, you need to inspect or change the values of the parser in other parts of your code, then you should go with classes.
All that said, in this specific case you showed here, I think I would use functools.partial
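A rough sketch of what the functools.partial route could look like, reusing the same range check as the examples above (parse_number is just a made-up name):

import functools

def parse_number(inStr, nType=int, nRange=None):
    x = nType(inStr)
    if nRange:
        if not nRange[0] <= x <= nRange[1]:
            raise ValueError('{} is out of range'.format(inStr))
    return x

b = functools.partial(parse_number, nType=float, nRange=(3, 6))
b('4')     # returns 4.0
# b('14')  # would raise ValueError

A partial also keeps its configuration inspectable through b.func, b.args and b.keywords, which goes some way towards the "report how they've been configured" requirement.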

Adding function options via kwargs (e.g., verbose)

I'm trying to efficiently add optional features to some class functions. Pretty new to python, so I'm trying to learn good habits. I'm implementing this through **kwargs (for better or worse).
One such example is adding a 'verbose' option which adds several conditional print statements throughout the functions, and will suffice as a unit test here. There's a lot of SO Q+As (and other tutorials) about **kwargs usage for variables but less about usage as execution flags.
My frame of reference here is thinking in terms of overloads and switch statements so I was on the fence with using a try statement if various **kwargs existed.
I can't copy and paste, so here's a basic example:
class Fruit:
    def __init__(self, name):
        self.name = name

class Bowl:
    def __init__(self):
        self.contents = []

    def fill_bowl(self, *fruit, **options):
        self.fruit_list = []
        for x in options:
            if options.get(x) == 'verbose':
                verbose = 1
            else:
                verbose = 0
From here, I'd add several if verbose == 1: print... various attributes of the Fruit I'm adding to the bowl, number of contents, etc., etc. to help with sanity checks without going through the debugger in-depth.
This is functional, barring some transcription error in typing this. Am I on the right track, or is there a more intuitive way to accomplish this?
You shouldn't be using kwargs here. They are for when you need to accept unknown options. Here you know what you want to accept: a verbose parameter. To make it optional you can give it a default value.
def fill_bowl(self, *fruit, verbose=0):
Although you should probably use True and False rather than 1 and 0.
(And note that even if you did want to use kwargs, there's no reason to iterate through the dict like that; you would just do if kwargs.get("verbose").)
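Putting that together, a sketch of the method with an explicit flag (the body is illustrative only, reusing the Fruit class from the question):

class Bowl:
    def __init__(self):
        self.contents = []

    def fill_bowl(self, *fruit, verbose=False):
        for f in fruit:
            self.contents.append(f)
            if verbose:
                print('added', f.name, '- bowl now holds', len(self.contents), 'item(s)')

bowl = Bowl()
bowl.fill_bowl(Fruit('apple'), Fruit('pear'), verbose=True)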

Why would Python's variable length arguments be used over passing in a list/dictionary?

Python allows you to declare a function like
def print_all(*arguments):
    for a in arguments:
        print(a)

print_all(1, 2, 3)
Which allows one to pass in a variable amount of data. This seems much less readable to me than building a list or a dictionary, and passing those in as arguments like so.
def print_all2(things_to_print):
    for thing in things_to_print:
        print(thing)

things_to_print = [1, 2, 3]
print_all2(things_to_print)
The second option allows you to give the argument a proper name. When would it be preferable to use the *arguments technique? Is there a time when using *arguments is more Pythonic?
Is there a time when using *arguments is more Pythonic?
Not only "more pythonic", but it's often necessary.
You need to use *args whenever you don't know how many arguments a function will receive.
Think, for example, about decorators:
def deco(fun):
    def wrapper(*args, **kwargs):
        do_stuff()
        return fun(*args, **kwargs)
    return wrapper
Very opinion based, but sometimes you want to use a function providing the arguments in-line. It just looks a bit clearer:
function("please", 0, "work this time", 2.3)
than:
function(["please", 0, "work this time", 2.3])
In fact, there is a good example, which you even mention in your question: print! Imagine you'd have to create a list each time you wanted to print something:
print(["please print my variable", x, " and another:", y])
print([x])
Tedious.

Is there a reason not to send super().__init__() a dictionary instead of **kwds?

I just started building a text based game yesterday as an exercise in learning Python (I'm using 3.3). I say "text based game," but I mean more of a MUD than a choose-your-own adventure. Anyway, I was really excited when I figured out how to handle inheritance and multiple inheritance using super() yesterday, but I found that the argument-passing really cluttered up the code, and required juggling lots of little loose variables. Also, creating save files seemed pretty nightmarish.
So, I thought, "What if certain class hierarchies just took one argument, a dictionary, and just passed the dictionary back?" To give you an example, here are two classes trimmed down to their init methods:
class Actor:
    def __init__(self, in_dict, **kwds):
        super().__init__(**kwds)
        self._everything = in_dict
        self._name = in_dict["name"]
        self._size = in_dict["size"]
        self._location = in_dict["location"]
        self._triggers = in_dict["triggers"]
        self._effects = in_dict["effects"]
        self._goals = in_dict["goals"]
        self._action_list = in_dict["action list"]
        self._last_action = ''
        self._current_action = ''  # both ._last_action and ._current_action get updated by .update_action()

class Item(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._can_contain = in_dict["can contain"]  # boolean entry
        self._inventory = in_dict["inventory"]      # either a list or dict entry

class Player(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._inventory = in_dict["inventory"]  # entry should be a Container object
        self._stats = in_dict["stats"]
Example dict that would be passed:
playerdict = {'name': '', 'size': '0', 'location': '', 'triggers': None, 'effects': None, 'goals': None, 'action list': None, 'inventory': Container(), 'stats': None}
(The None's get replaced by {} once the dictionary has been passed.)
So, in_dict gets passed to the previous class instead of a huge payload of **kwds.
I like this because:
It makes my code a lot neater and more manageable.
As long as the dicts have at least some entry for the key called, it doesn't break the code. Also, it doesn't matter if a given argument never gets used.
It seems like file IO just got a lot easier (dictionaries of player data stored as dicts, dictionaries of item data stored as dicts, etc.)
I get the point of **kwds (EDIT: apparently I didn't), and it hasn't seemed cumbersome when passing fewer arguments. This just appears to be a comfortable way of dealing with a need for a large number of attributes at the creation of each instance.
That said, I'm still a major python noob. So, my question is this: Is there an underlying reason why passing the same dict repeatedly through super() to the base class would be a worse idea than just toughing it out with nasty (big and cluttered) **kwds passes? (e.g. issues with the interpreter that someone at my level would be ignorant of.)
EDIT:
Previously, creating a new Player might have looked like this, with an argument passed for each attribute.
bob = Player('bob', Location = 'here', ... etc.)
The number of arguments needed blew up, and I only included the attributes that really needed to be present to not break method calls from the Engine object.
This is the impression I'm getting from the answers and comments thus far:
There's nothing "wrong" with sending the same dictionary along, as long as nothing has the opportunity to modify its contents (Kirk Strauser) and the dictionary always has what it's supposed to have (goncalopp). The real answer is that the question was amiss, and using in_dict instead of **kwds is redundant.
Would this be correct? (Also, thanks for the great and varied feedback!)
I'm not sure I understand your question exactly, because I don't see how the code looked before you made the change to use in_dict. It sounds like you have been listing out dozens of keywords in the call to super (which is understandably not what you want), but this is not necessary. If your child class has a dict with all of this information, it can be turned into kwargs when you make the call with **in_dict. So:
class Actor:
    def __init__(self, **kwds):
        ...

class Item(Actor):
    def __init__(self, **kwds):
        self._everything = kwds
        super().__init__(**kwds)
I don't see a reason to add another dict for this, since you can just manipulate and pass the dict created for kwds anyway
Edit:
As for the question of the efficiency of using the ** expansion of the dict versus listing the arguments explicitly, I did a very unscientific timing test with this code:
import time

def some_func(**kwargs):
    for k, v in kwargs.items():
        pass

def main():
    name = 'felix'
    location = 'here'
    user_type = 'player'
    kwds = {'name': name,
            'location': location,
            'user_type': user_type}

    start = time.time()
    for i in range(10000000):
        some_func(**kwds)
    end = time.time()
    print 'Time using expansion:\t{0}s'.format(end - start)

    start = time.time()
    for i in range(10000000):
        some_func(name=name, location=location, user_type=user_type)
    end = time.time()
    print 'Time without expansion:\t{0}s'.format(end - start)

if __name__ == '__main__':
    main()
Running this 10,000,000 times gives a slight (and probably statistically meaningless) advantage passing around a dict and using **.
Time using expansion: 7.9877269268s
Time without expansion: 8.06108212471s
If we print the IDs of the dict objects (kwds outside and kwargs inside the function), you will see that python builds a fresh kwargs dict for the call in either case; when the ids look identical across calls it is only because the previous dict has already been freed and its memory reused, not because the function keeps a single dict around. (See also this enlightening SO question about how mutable default parameters are handled in python, which is somewhat related.)
So from a performance perspective, you can pick whichever makes sense to you. It should not meaningfully impact how python operates behind the scenes.
I've done that myself where in_dict was a dict with lots of keys, or a settings object, or some other "blob" of something with lots of interesting attributes. That's perfectly OK if it makes your code cleaner, particularly if you name it clearly like settings_object or config_dict or similar.
That shouldn't be the usual case, though. Normally it's better to explicitly pass a small set of individual variables. It makes the code much cleaner and easier to reason about. It's possible that a client could pass in_dict = None by accident and you wouldn't know until some method tried to access it. Suppose Actor.__init__ didn't peel apart in_dict but just stored it like self.settings = in_dict. Sometime later, Actor.method comes along and tries to access it, then boom! Dead process. If you're calling Actor.__init__(var1, var2, ...), then the caller will raise an exception much earlier and provide you with more context about what actually went wrong.
So yes, by all means: feel free to do that when it's appropriate. Just be aware that it's not appropriate very often, and the desire to do it might be a smell telling you to restructure your code.
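To make the fail-early point concrete, a toy sketch (the names here are made up, not the original code):

class LooseActor:
    def __init__(self, in_dict=None):
        self.settings = in_dict              # nothing is checked here

    def shout(self):
        return self.settings['name']         # a missing dict only blows up here

class StrictActor:
    def __init__(self, name, size):
        self.name = name
        self.size = size

actor = LooseActor()       # accepted without complaint
# actor.shout()            # TypeError much later, far from the bad call
# StrictActor('bob')       # TypeError immediately: missing required argument 'size'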
This is not python specific, but the greatest problem I can see with passing arguments like this is that it breaks encapsulation. Any class may modify the arguments, and it's much more difficult to tell which arguments are expected in each class - making your code difficult to understand, and harder to debug.
Consider explicitly consuming the arguments in each class, and calling the super's __init__ on the remaining. You don't need to make them explicit:
class ClassA(object):
    def __init__(self, arg1, arg2=""):
        pass

class ClassB(ClassA):
    def __init__(self, arg3, arg4="", *args, **kwargs):
        ClassA.__init__(self, *args, **kwargs)

ClassB(3, 4, 1, 2)
You can also leave the variables uninitialized and use methods to set them. You can then use different methods in the different classes, and all subclasses will have access to the superclass methods.
