Best practice: how to pass many arguments to a function? - python

I am running some numerical simulations, in which my main function must receive lots and lots of arguments - I'm talking 10 to 30 arguments depending on the simulation to run.
What are some best practices to handle cases like this? Dividing the code into, say, 10 functions with 3 arguments each doesn't sound very feasible in my case.
What I do is create an instance of a class (with no methods), store the inputs as attributes of that instance, then pass the instance - so the function receives only one input.
I like this because the code looks clean, easy to read, and because I find it easy to define and run alternative scenarios.
I dislike it because accessing class attributes within a function is slower than accessing a local variable (see: How / why to optimise code by copying class attributes to local variables?) and because it is not an efficient use of memory - too much data stored multiple times unnecessarily.
Any thoughts or recommendations?
myinput = MyInput()
myinput.input_sql_table = that_sql_table
myinput.input_file = that_input_file
myinput.param1 = param1
myinput.param2 = param2
myoutput = calc(myinput)
Alternative scenarios:
inputs = collections.OrderedDict()
scenarios = collections.OrderedDict()
inputs['base scenario'] = copy.deepcopy(myinput)
inputs['param2 = 100'] = copy.deepcopy(myinput)
inputs['param2 = 100'].param2 = 100
# loop through all the inputs and store the outputs in the ordered dictionary scenarios
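For what it's worth, here is a minimal runnable sketch of that pattern; MyInput is just an empty container and calc is a stand-in for the actual simulation:
import collections
import copy

class MyInput(object):
    """Plain container for simulation inputs; attributes are set by the caller."""
    pass

def calc(myinput):
    # stand-in for the real simulation
    return myinput.param1 + myinput.param2

myinput = MyInput()
myinput.param1 = 1
myinput.param2 = 2

inputs = collections.OrderedDict()
scenarios = collections.OrderedDict()
inputs['base scenario'] = copy.deepcopy(myinput)
inputs['param2 = 100'] = copy.deepcopy(myinput)
inputs['param2 = 100'].param2 = 100

# loop through all the inputs and store each output in scenarios
for name, inp in inputs.items():
    scenarios[name] = calc(inp)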

I don't think this is really a StackOverflow question; it's more of a Software Engineering question. For example, check out this question.
As far as whether or not this is a good design pattern, this is an excellent way to handle a large number of arguments. You mentioned that this isn't very efficient in terms of memory or speed, but I think you're making an improper micro-optimization.
As far as memory is concerned, the overhead of running the Python interpreter is going to dwarf the couple of extra bytes used by instantiating your class.
Unless you have run a profiler and determined that accessing members of that options class is slowing you down, I wouldn't worry about it. This is especially the case because you're using Python. If speed is a real concern, you should be using something else.
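If you do want to check, a quick profiling pass settles the question. A minimal sketch, assuming the calc and myinput names from the question:
import cProfile

# Profile the hypothetical calc(myinput) call and sort by cumulative time;
# the statement string is executed in the __main__ namespace, so calc and
# myinput must be defined there. Attribute access will only show up here
# if it actually matters.
cProfile.run('calc(myinput)', sort='cumulative')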
You may not be aware of this, but most of the large scale number crunching libraries for Python aren't actually written in Python, they're just wrappers around C/C++ libraries that are much faster.
I recommend reading this article; it is well established that "Premature optimization is the root of all evil".

You could pass in a dictionary like so:
def some_func_or_class(kwarg1: int = -1, kwarg2: int = 0, kwargN: str = ''):
    print(kwarg1, kwarg2, kwargN)

all_the_kwargs = {'kwarg1': 0, 'kwarg2': 1, 'kwargN': 'xyz'}
some_func_or_class(**all_the_kwargs)
Or you could use several named tuples, as referenced here: Type hints in namedtuple.
Also note that depending on which version of Python you are using, there may be a limit to the number of arguments you can pass into a function call.
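A minimal sketch of the namedtuple suggestion above (SimInput and its fields are made-up names borrowed from the question, not a prescribed API):
from collections import namedtuple

SimInput = namedtuple('SimInput', ['input_sql_table', 'input_file', 'param1', 'param2'])

def calc(inputs):
    # fields are read-only attributes, so scenarios can't be mutated by accident
    return inputs.param1 + inputs.param2

base = SimInput(input_sql_table='my_table', input_file='data.csv', param1=1, param2=2)
alt = base._replace(param2=100)   # cheap way to define an alternative scenario
myoutput = calc(base)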
Or you could use just a dictionary:
def some_func(a_dictionary):
    value = a_dictionary.get('argXYZ', None)  # defaults to None if 'argXYZ' doesn't exist


Python: Can you dynamically get the amount of variables a function is going to return into?

Part of a utility system in my AcecoolLib package, which I'm writing by porting all or most of my logic to Python and various other languages, contains a simple but greatly useful helper... a function named ENUM.
It has many useful features, such as automatically creating maps of the enums, extended or reverse maps if you have the map assigned to more than just values, and a lot more.
It can create maps for generating function names dynamically, it can create simple maps between enumeration and text or string identifiers for language, and much more.
The function declaration is simple, too:
def ENUM( _count = None, *_maps ):
It has an extra helper... Here: https://www.dropbox.com/s/6gzi44i7dh58v61/dynamic_properties_accessorfuncs_and_more.py?dl=0
Of the extra helpers, ENUM_MAP is used; the other one isn't.
Anyway, before I start going into etc.. etc.. the question is:
How can I count the return variables outside of the function... ie:
ENUM_EXAMPLE_A, ENUM_EXAMPLE_B, ENUM_EXAMPLE_C, ENUM_LIST_EXAMPLE, MAP_ENUM_EXAMPLE = ENUM( None, [ '#example_a', '#example_b', '#example_c' ] )
Here ENUM_LIST_EXAMPLE is a simple list like [0, 1, 2] or something; then the map links values to identifiers, i.e. { 0: '#example_a', 1: '#example_b', ... }, and { '#example_a': 0, ... } for the reverse... or something along those lines.
There are other advanced use cases, not sure if I have those features in the file above, but regardless... I'm trying to simply count the return vars... and get the names.
I know it is likely possible to read the line from which the call is executed - read the file, get the line, break it apart and do all of that - but I'm hoping something exists to do that without having to code it from scratch using only the default Python system...
In short: I'd like to get rid of the first argument of ENUM( _count, *_maps ) so that only the optional *_maps is used. So if I call ENUM_A, ENUM_B, ENUM_C, LIST_ENUMS = ENUM( ), it'll detect 4 output returns and get their names, so I can check whether the last one follows a different naming style from the first - i.e. whether they want the list, etc. If they add a map, then an optional list, etc., I can just count back n _maps to find the list arg, or not...
I know it probably isn't necessary, but I want it to be easy and dynamic so if I add a new enum to a giant list, I don't have to add the number ( although for those I use the maps which means I have to add an entry anyway )...
Either way - I know in Lua, this is stupid easy to do with built-in functions.. I'm hoping Python has built in functions to easily grab the data too.
Thanks!
Here is the one proposed answer, similar to what I could do in my Lua framework... The difference, though, is my framework has to load all of the files into memory ( for dynamic reloading, and dynamic changes, going to the appropriate location - and to network the data by combining everything so the file i/o cost is 'averted' - and Lua handles tables incredibly well ).
The simple answer is that it is possible. I'm not sure about doing it in default Python without file i/o, however this method would easily work. This answer is given in pseudo-code terms - but the functionality does exist.
Logic:
1) Using a stack trace, you can determine which file / path and which line called the ENUM function.
2) Read the calling file as text - if you can seek directly to that line without having to process the entire file, that would be quicker, and there may be libraries out there that do this. In default Python I haven't done a huge amount of file i/o beyond the basics, so I'm not up to speed on the most useful helpers, as I typically use SQL for storage purposes, etc...
3) With the line in question, split the line text on '=', i.e. take the part before the function call that holds the return names... call it _result.
4a) If there are no results, then someone called the function without assigning the return values - odd..
4) Split _result[0] on ',' to get each individual name, and trim whitespace left / right.
5) Combine the cleaned names into a list.
6) Process the names - i.e. determine the style the developer uses to name their enum values, and see whether that style changes for the last argument (if there is no map). If there is a map, then go back n or n*2 elements for the list, then onward from there for the map vars. With maps, the map returns are given - the only thing I need to determine dynamically is the count and whether the user asked for a list arg or not.
Note: Python has very useful and simple mechanisms to do a lot of these steps in a single line of code; a sketch of the whole approach follows below.
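A hedged sketch of those steps using the standard inspect module rather than hand-rolled file reading (it still reads the source file under the hood via linecache, and it assumes the whole call fits on one source line and that the last name on the left-hand side is the list):
import inspect

def ENUM(*_maps):
    # Sketch only: inspect the caller's source line and count the names on
    # the left-hand side of the assignment. code_context can be None when
    # the source is not available (e.g. in the interactive interpreter).
    caller = inspect.stack()[1]
    context = caller[4]                  # list holding the source line of the call
    if not context:
        raise RuntimeError('caller source not available')
    lhs = context[0].split('=')[0]
    names = [name.strip() for name in lhs.split(',')]
    count = len(names) - 1               # last name is assumed to be the list
    values = list(range(count))          # 0, 1, 2, ...
    return tuple(values) + (values,)     # each enum value, then the list of them

ENUM_A, ENUM_B, ENUM_C, LIST_ENUMS = ENUM()   # 0, 1, 2, [0, 1, 2]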
All of this is possible, and easy to create in Python. The thing I dislike about this solution is the fact that it requires file i/o -- If your program is executed from another program, and doesn't remain in memory, this means these tasks are always repeated making it less friendly, and more costly...
If the program opens, and remains open, then the cost is more up-front instead of on-going making it not as bad.
Because I use ENUMs in everything, including quick executable scripts which run then close - I don't want to use file i/o..
But, a solution does exist. I'm looking for an alternate.
Simple answer is you can't.
In Python when you do (a, b, c) = func() it's called tuple unpacking. Essentially it's expecting func() to return a tuple of exactly 3 elements (in this example). However, you can also do a = func() and then a will contain a 3-element tuple or whatever func decided to return. Regardless of how func is called, there's nothing within the method that knows how the return value is going to be processed after it's returned.
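A short illustration of that point:
def func():
    return 1, 2, 3

a, b, c = func()   # tuple unpacking: the caller expects exactly three values
t = func()         # t is simply the whole tuple (1, 2, 3)
# Nothing inside func can tell which of the two forms the caller used.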
I wanted to provide a more pythonic way of doing what you're intending, but I'm not really sure I understand the purpose of ENUM(). It seems like you're trying to create constants, but Python doesn't really have true constants.
EDIT:
Methods are only aware of what's passed in as arguments. If you want some sort of ENUM to value mapping then the best equivalent is a dict. You could then have a method that took ENUM('A', 'B', 'C') and returned {'A':0, 'B':1, 'C':2} and then you'd use dict look-ups to get the values.
enum = ENUM('A', 'B', 'C')
print(enum['A']) # prints 0
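A minimal sketch of that dict-returning ENUM (the implementation is an assumption; only the call and the look-up come from the description above):
def ENUM(*names):
    # ENUM('A', 'B', 'C') -> {'A': 0, 'B': 1, 'C': 2}
    return {name: index for index, name in enumerate(names)}

enum = ENUM('A', 'B', 'C')
print(enum['A'])  # prints 0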

Function Encapsulation Efficiency in Python

I have a large set of objects, and I need to be able to do some complex stuff with each object, so I have some fairly long functions.
In general, is it better to put the long functions in the class that they'll actually be used in (GreatObject, below) for proper encapsulation, or is it better for efficiency to put one function in the collection class (GreatSet, which will only ever have one instance)?
class GreatSet(object):
    def __init__(self):
        self.great_set = []  # Will contain a lot of GreatObjects.

    def long_method(self, great_object):  # Is this function better here?
        [Many lines of code]

class GreatObject(object):
    def __init__(self, params):
        self.params = params

    def long_method(self):  # Or here?
        [Many lines of code]
I'm using Python 2.7.
In both cases long_method will belong to its class (there will be a single long_method function per class, shared by all instances), and in both cases looking up obj.long_method will create a new method object for each lookup, so with respect to "efficiency" (whatever that's supposed to mean) it won't make any difference. Also, unless you need maximum time and space performance - in which case a lower-level language might be a better choice - you should really be more concerned with proper design and maintainability than with maximum raw performance.
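A small illustration of the bound-method point:
class GreatObject(object):
    def long_method(self):
        pass

obj = GreatObject()
# Every attribute lookup wraps the one shared function in a fresh bound-method
# object, so the lookup cost is the same wherever the method is defined.
print(obj.long_method is obj.long_method)    # False
print(GreatObject.__dict__['long_method'])   # the single shared function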
So, if long_method is supposed to work on GreatObject it might belong to GreatObject, but it depends on the respective responsibilities of those classes, what long_method really does, and which application layers long_method, GreatObject and GreatSet belong to. If, for example, GreatObject and GreatSet both belong to the domain model and long_method does presentation-related work, then obviously long_method belongs in neither GreatObject nor GreatSet.
Finally, as PartialOrder mentions in his comment, "long" functions are most often a code / design smell. Sometimes a function has to be "long enough" because it does something complex - and even then you can usually refactor it into smaller functions (possibly into methods of a distinct class if those functions need to share state) - but quite often a long function means it's just doing too many things.

memory usage @on_trait_change vs _foo_changed()

I built an application with Enthought Traits which is using too much memory. I think the problem is caused by trait notifications:
There seems to be a fundamental difference in memory usage between events caught by @on_trait_change and those caught via the special naming convention (e.g. _foo_changed()). I made a little example with two classes, Foo and FooDecorator, which I assumed would show exactly the same behaviour. But they don't!
from traits.api import *

class Foo(HasTraits):
    a = List(Int)

    def _a_changed(self):
        pass

    def _a_items_changed(self):
        pass

class FooDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a[]')
    def bar(self):
        pass

if __name__ == '__main__':
    n = 100000
    c = FooDecorator
    a = [c() for i in range(n)]
When running this script with c = Foo, Windows task manager shows a memory usage for the whole python process of 70MB, which stays constant for increasing n. For c = FooDecorator, the python process is using 450MB, increasing for higher n.
Can you please explain this behaviour to me?
EDIT: Maybe I should rephrase: Why would anyone choose FooDecorator over Foo?
EDIT 2: I just uninstalled python(x,y) 2.7.9 and installed the newest version of canopy with traits 4.5.0. Now the 450MB became 750MB.
EDIT 3: Compiled traits-4.6.0.dev0-py2.7-win-amd64 myself. The outcome is the same as in EDIT 2. So despite all plausibility https://github.com/enthought/traits/pull/248/files does not seem to be the cause.
I believe you are seeing the effect of a memory leak that has been fixed recently:
https://github.com/enthought/traits/pull/248/files
As for why one would use the decorator, in this particular instance the two versions are practically equivalent.
In general, the decorator is more flexible: you can give a list of traits to listen to, and you can use the extended name notation, as described here:
http://docs.enthought.com/traits/traits_user_manual/notification.html#semantics
For example, in this case:
class Bar(HasTraits):
    b = Str

class FooDecorator(HasTraits):
    a = List(Bar)

    @on_trait_change('a.b')
    def bar(self):
        print 'change'
the bar notifier is going to be called for changes to the trait a, its items, and for the change of the trait b in each of the Bar items. Extended names can be quite powerful.
What's going on here is that Traits has two distinct ways of handling notifications: static notifiers and dynamic notifiers.
Static notifiers (such as those created by the specially-named _*_changed() methods) are fairly light-weight: each trait on an instance has a list of notifiers on it, which are basically the functions or methods with a lightweight wrapper.
Dynamic notifiers (such as those created with on_trait_change() and the extended trait name conventions like a[]) are significantly more powerful and flexible, but as a result they are much more heavy-weight. In particular, in addition to the wrapper object they create, they also create a parsed representation of the extended trait name and a handler object, some of which are in turn HasTraits subclass instances.
As a result, even for a simple expression like a[] there will be a fair number of new Python objects created, and these objects have to be created for every on_trait_change listener on every instance separately to properly handle corner-cases like instance traits. The relevant code is here: https://github.com/enthought/traits/blob/master/traits/has_traits.py#L2330
Based on the reported numbers, the majority of the difference in memory usage that you are seeing is in the creation of this dynamic listener infrastructure for each instance and each on_trait_change decorator.
It's worth noting that there is a short-circuit for on_trait_change in the case where you are using a simple trait name, in which case it generates a static trait notifier instead of a dynamic notifier. So if you were to instead write something like:
class FooSimpleDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a')
    def a_updated(self):
        pass

    @on_trait_change('a_items')
    def a_items_updated(self):
        pass
you should see similar memory performance to the specially-named methods.
To answer the rephrased question about "why use on_trait_change", in FooDecorator you can write one method instead of two if your response to a change of either the list or any items in the list is the same. This makes code significantly easier to debug and maintain, and if you aren't creating thousands of these objects then the extra memory usage is negligible.
This becomes even more of a factor when you consider more sophisticated extended trait name patterns, where the dynamic listeners automatically handle changes which would otherwise require significant manual (and error-prone) code for hooking up and removing listeners from intermediate objects and traits. The power and simplicity of this approach usually outweighs the concerns about memory usage.

Which is a better __repr__ for a custom Python class?

It seems there are different forms the return value of __repr__ can take.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in Python and they could just dive in and set it anyway, but it seems that defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily take all the stuff I'd set 'private' in C++ as arguments to the constructor, and make teammates using the class set it up using the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function that's slightly harder to notice, for the purposes of __repr__
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.
The __repr__ method is designed to produce the most useful output for the developer, not the end user, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose - you don't know how your module is imported anyway. Others may prefer option C.
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
InfoObj(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
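For instance, a minimal sketch of a custom pickler via __getstate__/__setstate__ (the _cache attribute here is a made-up stand-in for state you would not want stored):
import pickle

class InfoObj(object):
    # __getstate__/__setstate__ let you control exactly what gets serialized,
    # e.g. to drop caches or other unpicklable members.
    def __init__(self, Name):
        self.Name = Name
        self.Valid = False
        self._cache = {}                  # hypothetical transient state

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop('_cache', None)         # don't serialize the transient part
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._cache = {}                  # rebuild it on load

bob = InfoObj(Name="Bob")
restored = pickle.loads(pickle.dumps(bob))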

Parameter names in Python functions that take single object or iterable

I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can be a single object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
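For reference, that is exactly how isinstance behaves:
# classinfo may be a single class or a tuple of classes
isinstance(3, int)             # True
isinstance(3, (int, float))    # True
isinstance('3', (int, float))  # False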
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm surprised that nobody suggested a decorator.
def withmany(f):
    def many(many_foos):
        for foo in many_foos:
            yield f(foo)
    f.many = many
    return f

@withmany
def process_foo(foo):
    return foo + 1

processed_foo = process_foo(foo)

for processed_foo in process_foo.many(foos):
    print processed_foo
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
    # Infer if we have a singleton instance and make it a
    # length 1 list for consistency
    if isinstance(widget_thing, WidgetType):
        widget_thing = [widget_thing]
    for widget in widget_thing:
        # ...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
    # ...

def ProcessManyWidgets(widgets):
    for widget in widgets:
        ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
    print args

# many ways to call it
foo(1)
foo(1, 2, 3)

args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))

foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? People who read the code are more interested in knowing what the parameter represents ("clients") than what its type is ("list_of_tuples"); the type can be described in the function's documentation string, which is a good thing since it might change in the future (the type is sometimes an implementation detail).
I would do one thing:
def myFunc(manyFoos):
    if not type(manyFoos) in (list, tuple):
        manyFoos = [manyFoos]
    # do stuff here
so then you don't need to worry anymore about its name.
In a function you should try to have one action, accept the same parameter type and return the same type.
Instead of filling the functions with ifs you could have 2 functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
    try:
        iter(foos)
    except TypeError:
        foos = [foos]
    for foo in foos:
        pass  # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
    if isinstance(iterable, basestring):
        iterable = [iterable]
    try:
        iter(iterable)
    except TypeError:
        iterable = [iterable]
    return iterable

def doIt(foos):
    for foo in iterfy(foos):
        pass  # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say instead of naming it what it is, you should name it what it's used for. Also, just be careful that you can't use 'in' on something that is not iterable.
