Handle many function parameters through many layers of functions? - python

I have a library that has functions like the following:
class ResultObject:
    def __init__(self):
        self.object_string = []
        ...

    def append(self, other_result_object):
        self.object_string.append(other_result_object)
        ...

def bottom_func_A(option_1, option_2, param_0=def_arg_0, param_1=def_arg_1, ..., param_14=def_arg_14):
    """
    This is a process whose output genuinely depends on 15 input parameters enumerated param_n.
    """
    ...
    return result_object_A

def bottom_func_B(option_1, option_2, param_0=def_arg_0, param_1=def_arg_1, ..., param_14=def_arg_14):
    """
    This is a different process whose output genuinely depends on 15 input parameters
    enumerated param_n. Parameter values and defaults are not shared with bottom_func_A.
    """
    ...
    return result_object_B

def wrapper_A(wrapper_option):
    result_1 = bottom_func_A(True, True)
    if wrapper_option:
        result_2 = bottom_func_A(True, False)
    # Do something with result_1 and result_2 if present...
    ...
    return wrapper_object_A

def wrapper_B(wrapper_option):
    result_1 = bottom_func_B(True, True)
    if wrapper_option:
        result_2 = bottom_func_B(True, False)
    # Do something with result_1 and result_2 if present...
    ...
    return wrapper_object_B

def do_both_A_B(wrapper_option):
    result_A = wrapper_A(wrapper_option)
    result_B = wrapper_B(wrapper_option)
    # Do something with result_A and result_B
    ...
    return wrapper_A_B_object
In the library the bottom_func_X functions do not always have similar signatures; sometimes they have more or fewer "option"-like params and more or fewer "numerical parameter" parameters like param_N. Their bodies are also generic. Similarly, in this case I've shown wrapper_A and wrapper_B with similar structures, but the "wrapper" functions need not have similar structures to each other. Also, there may be a wrapper (like do_both_A_B) that involves bottom functions A, R, G, etc. That is, wrappers aren't 1:1 with bottom_funcs. The only structural consistency is that ALL of these functions (bottom funcs and wrappers at all layers) return instances of the ResultObject class. These result objects get "strung together" into long strings where each node in the string was probably generated by some bottom_func with a set of parameters. Note also that there are many dozens of bottom_func type objects and wrappers and super-wrappers that string together many combinations of bottom_funcs. That is, these many-parameter functions are not one-off things; there are many of them in the library.
This is fine and can be made to work. The problem is this: it is very often the case that I want parameters from the bottom functions to be exposed in the signatures of the wrappers at various levels. I see 5 ways to do this.
(1) Just include additional parameters for inner functions at each layer.
(1a) Only do this on an as-needed basis. If I or someone has a use case for exposing param_4 from bottom_func_A in wrapper_A, then add it. The downside of this is that you end up with a hodge-podge of exposed and unexposed parameters, so you may find yourself often needing to modify the library rather than just call things from it.
(1b) Do this carte blanche. Expose all parameters at all layers. Obviously this is a mess because do_both_A_B would need to expose 30 parameters, 15 for each of its bottom-layer functions. Most of the code in the library would rapidly become function signatures.
Option (1a) is what I'd say has been the de facto solution in the library so far.
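To make (1a) concrete, here is a toy sketch; the names, defaults, and bodies are hypothetical stand-ins, not the library's:
def bottom_func_A(option_1, option_2, param_4=0.5):
    return {"opts": (option_1, option_2), "param_4": param_4}

def wrapper_A(wrapper_option, param_4=0.5):
    # The one exposed parameter is threaded straight through. Note the
    # default has to be duplicated here, which is part of the maintenance
    # cost of option (1a).
    results = [bottom_func_A(True, True, param_4=param_4)]
    if wrapper_option:
        results.append(bottom_func_A(True, False, param_4=param_4))
    return results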
(2) Don't call bottom_funcs within wrappers; rather, accept results from bottom_funcs in the wrappers. That is, bottom_func_A produces a result_object_A and this is all wrapper_A requires. So wrapper_A could have a parameter called result_object_A or something. Maybe it could default to None, in which case wrapper_A will indeed call bottom_func_A using the default arguments of bottom_func_A. But if for some reason a caller doesn't want the default result_object_A, they can call bottom_func_A themselves with their desired custom parameters.
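A minimal sketch of option (2), again with made-up names and defaults: the wrapper accepts a precomputed result object and only calls the bottom function itself when none is supplied.
def bottom_func_A(option_1, option_2, scale=1.0, offset=0.0):
    # Toy stand-in for a bottom-level function.
    return {"opts": (option_1, option_2), "value": scale + offset}

def wrapper_A(wrapper_option, result_object_A=None):
    # Fall back to the bottom function's defaults only when the caller
    # didn't supply a result.
    if result_object_A is None:
        result_object_A = bottom_func_A(True, True)
    results = [result_object_A]
    if wrapper_option:
        results.append(bottom_func_A(True, False))
    return results

# Default path:
default_result = wrapper_A(False)
# Custom path: the caller runs bottom_func_A with its own parameters first.
custom_result = wrapper_A(False, result_object_A=bottom_func_A(True, True, scale=2.5))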
(3) Bundle the param_N parameters into some sort of dataclass bundle that can be passed more fluidly. This param bundle would then be exposed at all layers and a caller could modify it from the default. The downside is that a class or instance of this data bundle would need to be defined for each bottom_func. Each of the bottom_funcs would also now have the job of unbundling the data bundle within the bottom_func body. The data bundle and bottom_func would need to cooperate, so if a bottom_func is modified to include some new feature (that requires additional params), the dev would need to make sure to modify the corresponding data bundle. I think in practice this would have a similar effect on the codebase as option (2). That is, the default data bundle would be used if callers of the wrappers don't choose to modify the bundle, but if they like, they can do the work of generating a custom bundle and passing that through. In effect this isn't much different from calling bottom_func_A with the caller's desired parameters and passing the result through.
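A minimal sketch of option (3) using a frozen dataclass as the bundle; the class name, fields, and defaults are invented for illustration.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BottomFuncAParams:
    scale: float = 1.0
    offset: float = 0.0

def bottom_func_A(option_1, option_2, params=BottomFuncAParams()):
    # The bottom function "unbundles" its parameter object itself.
    # (The default instance is safe to share because the dataclass is frozen.)
    return {"opts": (option_1, option_2), "value": params.scale + params.offset}

def wrapper_A(wrapper_option, bottom_A_params=BottomFuncAParams()):
    # The bundle is simply threaded through; the wrapper never names the
    # individual parameters.
    return bottom_func_A(True, wrapper_option, params=bottom_A_params)

# Callers override only what they care about:
custom = wrapper_A(True, bottom_A_params=replace(BottomFuncAParams(), scale=2.5))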
(4) Similar to option 3, but the "bundle" could just be kwargs dictionaries that get passed, with appropriate names, through the different layers. That is, do_both_A_B could have something like bottom_func_A_kwargs as an input parameter that exposes the ability to modify one or both calls to bottom_func_A inside it.
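A minimal sketch of option (4), with made-up names, showing a kwargs dict threaded from a super-wrapper down to the bottom function (the B side is omitted for brevity):
def bottom_func_A(option_1, option_2, scale=1.0, offset=0.0):
    return {"opts": (option_1, option_2), "value": scale + offset}

def wrapper_A(wrapper_option, bottom_func_A_kwargs=None):
    kwargs = bottom_func_A_kwargs or {}
    results = [bottom_func_A(True, True, **kwargs)]
    if wrapper_option:
        results.append(bottom_func_A(True, False, **kwargs))
    return results

def do_both_A_B(wrapper_option, bottom_func_A_kwargs=None):
    # The dict is simply threaded down; only callers who care about the
    # bottom-level parameters ever spell them out.
    return wrapper_A(wrapper_option, bottom_func_A_kwargs=bottom_func_A_kwargs)

custom = do_both_A_B(True, bottom_func_A_kwargs={"scale": 2.5})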
(5) The nested parameters could be turned into some sort of global variables whose values can be modified in wrappers or by callers. These could be Python global variables, or entries in some sort of settings module/file, or an actual database. This is a pretty nasty idea for a few reasons, but in practice this sort of solution has arisen in this code base as well. The major downside that comes to mind first is a caller modifying a param value and forgetting to set it back to its default, but that's not to say there aren't other big ones.
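To make that failure mode concrete, a hypothetical sketch of the module-level-settings flavour of option (5), with made-up names:
# A module-level settings mapping shared by every caller.
BOTTOM_FUNC_A_SETTINGS = {"scale": 1.0, "offset": 0.0}

def bottom_func_A(option_1, option_2):
    s = BOTTOM_FUNC_A_SETTINGS
    return {"opts": (option_1, option_2), "value": s["scale"] + s["offset"]}

# Caller overrides a value...
BOTTOM_FUNC_A_SETTINGS["scale"] = 2.5
custom = bottom_func_A(True, False)
# ...and has to remember to put it back, or every later call sees 2.5.
BOTTOM_FUNC_A_SETTINGS["scale"] = 1.0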
Which of these patterns (or other patterns I haven't listed) most aptly solves the parameter explosion problem that arises from wanting to expose the ability to modify the lowest-level parameters at higher layers? Perhaps to some degree the issue is that I want a very high degree of flexibility in what this code can do, so I just have to pay the price of having tons of code dedicated to handling parameters?

Related

Argument convention in PyTorch

I am new to PyTorch and while going through the examples, I noticed that sometimes functions have a different convention when accepting arguments. For example transforms.Compose receives a list as its argument:
transform = transforms.Compose([  # Here we pass a list of elements
    transforms.ToTensor(),
    transforms.Normalize(
        (0.4915, 0.4823, 0.4468),
        (0.2470, 0.2435, 0.2616)
    )
])
At the same time, other functions receive the arguments individually (i.e. not in a list). For example torch.nn.Sequential:
torch.nn.Sequential(  # Here we pass individual elements
    torch.nn.Linear(1, 4),
    torch.nn.Tanh(),
    torch.nn.Linear(4, 1)
)
This has been a common typing mistake for me while learning.
I wonder if we are implying something when:
the arguments are passed as a list
the arguments are passed as individual items
Or is it simply the preference of the contributing author and should be memorized as is?
Update 1: Note that I do not claim that either format is better. I am merely complaining about lack of consistency. Of course (as Ivan stated in his answer) it makes perfect sense to follow one format if there is a good reason for it (e.g. transforms.Normalize). But if there is not, then I would vote for consistency.
This is not a convention, it is a design decision.
Yes, torch.nn.Sequential (source) receives individual items, whereas torchvision.transforms.Compose (source) receives a single list of items. Those are arbitrary design choices. I believe PyTorch and Torchvision are maintained by different groups of people, which might explain the difference. One could argue it is more coherent to have the inputs passed as a list since it has a varied length; this is the approach used in more conventional programming languages such as C++ and Java. On the other hand, you could argue it is more readable to pass them as a sequence of separate arguments instead, which is what languages such as Python allow with variadic arguments.
In this particular case we would have
>>> fn1([element_a, element_b, element_c]) # single list
vs
>>> fn2(element_a, element_b, element_c) # separate args
Which would have an implementation that resembles:
def fn1(elements):
    pass
vs using the star argument:
def fn2(*elements):
    pass
However, it is not always just a design decision; sometimes one implementation is the clear choice. For instance, the list approach is much preferred when the function has other arguments (whether they are positional or keyword arguments). In that case it makes more sense to implement it as fn1 instead of fn2. Here I'm giving a second example with keyword arguments. Look at the difference in interface for the first set of arguments in both scenarios:
>>> fn1([element_a, element_b], option_1=True, option_2=True) # list
vs
>>> fn2(element_a, element_b, option_1=True, option_2=True) # separate
Which would have a function header that looks something like:
def fn1(elements, option_1=False, option_2=False):
    pass
While the other would be using a star argument under the hood:
def fn2(*elements, option_1=False, option_2=False):
    pass
If an argument is positioned after the star argument it essentially forces the user to use it as a keyword argument...
With that in mind, you can check out the source code for both Compose and Sequential, and you will notice that both only expect a list of elements and no additional arguments afterwards. So in this scenario, it might have been preferred to go with Sequential's approach using the star argument... but this is just personal preference!
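As a side note on the keyword-only point above, here is a tiny standalone example (the function name is made up):
def fn2(*elements, option_1=False, option_2=False):
    return len(elements), option_1, option_2

print(fn2(1, 2, 3, option_1=True))  # (3, True, False)
# fn2(1, 2, 3, True) would not set option_1; True would simply become a
# fourth element, so option_1 can only be supplied by keyword.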

**kwargs.pop(x) versus defining x as a function parameter

Let's say you're writing a child class that has a constructor that passes its unused kwargs up to the parent constructor, but your class has the argument x that it needs to store and that shouldn't be passed to the parent.
I have seen two different approaches to this:
def __init__(self, **kwargs):
    self.x = kwargs.pop('x', 'default')
    super().__init__(**kwargs)
and
def __init__(self, x='default', **kwargs):
    self.x = x
    super().__init__(**kwargs)
Is there ever any functional difference between these two constructors? Is there any reason to use one over the other?
The only difference I can see is that the second form, which defines x in the signature, allows the user to better see it as a possible argument, or an IDE to offer it as an autocomplete option. Or in Python 3.5+, you could add a type annotation to x. Does that make the first form objectively worse?
As already mentioned by Giacomo Alzetta in a comment, the second version allows passing x as a positional argument, while the first only allows it as a named argument. In other words, with the second form you can use both Child(x=2) AND Child(2), while the first only supports Child(x=2).
Also, when inspecting the method's signature, the second form will clearly mention the existence of the x param, while the first won't.
And finally, the second version will yield a slightly clearer exception if x is not passed.
And that's for the functional differences.
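A small self-contained demonstration of those differences (Parent here is a made-up stand-in):
import inspect

class Parent:
    def __init__(self, y=0):
        self.y = y

class ChildPop(Parent):
    def __init__(self, **kwargs):
        self.x = kwargs.pop('x', 'default')
        super().__init__(**kwargs)

class ChildExplicit(Parent):
    def __init__(self, x='default', **kwargs):
        self.x = x
        super().__init__(**kwargs)

print(inspect.signature(ChildPop.__init__))       # (self, **kwargs)
print(inspect.signature(ChildExplicit.__init__))  # (self, x='default', **kwargs)

ChildExplicit(2)   # positional x works
ChildPop(x=2)      # only the keyword form works for the pop version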
Is there any reason to use one over the other?
Well... As a general rule, it's cleaner (best practice) to use explicit arguments whenever possible, even if only for readability, and from experience it does usually make maintenance easier indeed. So from this point of view, the second form can be seen as "objectively better" than the first.
This being said, when the parent method has dozens of mostly optional and rarely used arguments (django.forms.Form, I'm looking at you) AND you also want to preserve positional argument order, it can be convenient to just use the generic *args, **kwargs signature for the child and force the additional param(s) to be passed as kwargs. Assuming you clearly document this in the docstring, it's still explicit enough (as far as I'm concerned, YMMV), and it also avoids a lot of clutter (you can have a look at django.forms.Form for a concrete example of what I mean here).
So as always with "best practices" and other golden rules, you have to understand and weight the pros and cons wrt/ the concrete case at hand.
PS: just to make things clear, django's Form class signature makes perfect sense so I'm not ranting here - it's just one of those cases where there's no "beautiful" solution to the problem, period.
Aside from obvious differences in code clarity, there might be a small difference in the speed of calling the function, in this case the __init__() method.
If you can, define all necessary arguments explicitly (with default values where you have them) in both methods, pass them individually, and leave out the ones you do not wish to pass on.
This way the code stays clear and the call speed stays the same.
If you need some micro-optimization, then use timeit to check what works faster.
I expect that the version with "x" added as an explicit argument will be slightly faster, because reading its value directly from the local variables is faster and the kwargs dict is smaller.
When you use "normal" arguments, they are automatically inserted into the function's local variables.
When you use *args and/or **kwargs, an additional tuple and/or dict is created as a new local variable, built from the arguments you passed into the function call.
When you pass them on to the next function, they are unpacked again to match that function's signature. In both operations you lose a tiny bit of speed.
If you add removing something from the kwargs dictionary (x = kwargs.pop("x")), you lose a bit more.
Looking at both snippets, their call speeds seem like they would be roughly equal, but you should check. If you do not need to shave an extra microsecond off initializing your instances, then both options are fine; just choose the one you like most.
But again, if you are free to do it, and if it will not greatly impair the code's maintenance, define all arguments and their default values and pass them on one-by-one.
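In case you do want to measure it, a minimal timeit harness along those lines might look like this (the class names are made up; the actual numbers will depend on your interpreter):
import timeit

setup = """
class Parent:
    def __init__(self, **kwargs):
        pass

class ChildPop(Parent):
    def __init__(self, **kwargs):
        self.x = kwargs.pop('x', 'default')
        super().__init__(**kwargs)

class ChildExplicit(Parent):
    def __init__(self, x='default', **kwargs):
        self.x = x
        super().__init__(**kwargs)
"""

# Compare construction time for both styles.
print(timeit.timeit("ChildPop(x=1)", setup=setup, number=1_000_000))
print(timeit.timeit("ChildExplicit(x=1)", setup=setup, number=1_000_000))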

explicitly passing functions in python

Out of curiosity, is it more desirable to explicitly pass functions to other functions, or to let a function call other functions from within? Is this a case of "explicit is better than implicit"?
For example (the following is only to illustrate what I mean):
import functools
import operator

def foo(x, y):
    return 1 if x > y else 0

partialfun = functools.partial(foo, 1)

def bar(xs, ys):
    return partialfun(sum(map(operator.mul, xs, ys)))
>>> bar([1,2,3], [4,5,6])
--or--
def foo(x, y):
    return 1 if x > y else 0

partialfun = functools.partial(foo, 1)

def bar(fn, xs, ys):
    return fn(sum(map(operator.mul, xs, ys)))
>>> bar(partialfun, [1,2,3], [4,5,6])
There's not really any difference between functions and anything else in this situation. You pass something as an argument if it's a parameter that might vary over different invocations of the function. If the function you are calling (bar in your example) is always calling the same other function, there's no reason to pass that as an argument. If you need to parameterize it so that you can use many different functions (i.e., bar might need to call many functions besides partialfun, and needs to know which one to call), then you need to pass it as an argument.
Generally, yes, but as always, it depends. What you are illustrating here is known as dependency injection. Generally, it is a good idea, as it allows separation of variability from the logic of a given function. This means, for example, that it will be extremely easy for you to test such code.
# To test the process performed in bar(), we can "inject" a function
# which simply returns its argument
def dummy(x):
    return x

def bar(fn, xs, ys):
    return fn(sum(map(operator.mul, xs, ys)))
>>> assert bar(dummy, [1,2,3], [4,5,6]) == 32
It depends very much on the context.
Basically, if the function is an argument to bar, then it's the responsibility of the caller to know how to implement that function. bar doesn't have to care. But consequently, bar's documentation has to describe what kind of function it needs.
Often this is very appropriate. The obvious example is the map builtin function. map implements the logic of applying a function to each item in a list, and giving back a list of results. map itself neither knows nor cares about what the items are, or what the function is doing to them. map's documentation has to describe that it needs a function of one argument, and each caller of map has to know how to implement or find a suitable function. But this arrangement is great; it allows you to pass a list of your custom objects, and a function which operates specifically on those objects, and map can go away and do its generic thing.
But often this arrangement is inappropriate. A function gives a name to a high level operation and hides the internal implementation details, so you can think of the operation as a unit. Allowing part of its operation to be passed in from outside as a function parameter exposes that it works in a way that uses that function's interface.
A more concrete (though somewhat contrived) example may help. Let's say I've implemented data types representing Person and Job, and I'm writing a function name_and_title for formatting someone's full name and job title into a string, for client code to insert into email signatures or on letterhead or whatever. It's obviously going to take a Person and Job. It could potentially take a function parameter to let the caller decide how to format the person's name: something like lambda firstname, lastname: lastname + ', ' + firstname. But to do this is to expose that I'm representing people's names with a separate first name and last name. If I want to change to supporting a middle name, then either name_and_title won't be able to include the middle name, or I have to change the type of the function it accepts. When I realise that some people have 4 or more names and decide to change to storing a list of names, then I definitely have to change the type of function name_and_title accepts.
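A hypothetical sketch of that name_and_title situation (Person and Job are stood in for with plain dicts):
def name_and_title(person, job,
                   format_name=lambda first, last: last + ', ' + first):
    # The default formatter exposes that a person is stored as exactly a
    # first and a last name; changing that representation now changes this
    # function's public contract.
    return format_name(person['first'], person['last']) + ' - ' + job['title']

print(name_and_title({'first': 'Ada', 'last': 'Lovelace'}, {'title': 'Analyst'}))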
So for your bar example, we can't say which is better, because it's an abstract example with no meaning. It depends on whether the call to partialfun is an implementation detail of whatever bar is supposed to be doing, or whether the call to partialfun is something that the caller knows about (and might want to do something else). If it's "part of" bar, then it shouldn't be a parameter. If it's "part of" the caller, then it should be a parameter.
It's worth noting that bar could have a huge number of function parameters. You call sum, map, and operator.mul, which could all be parameterised to make bar more flexible:
def bar(fn, xs, ys, g, h, i):
    return fn(g(h(i, xs, ys)))
And the way in which g is called on the output of h could be abstracted too:
def bar(fn, xs, ys, g, h, i, j):
    return fn(j(g, h(i, xs, ys)))
And we can keep going on and on, until bar doesn't do anything at all, and everything is controlled by the functions passed in, and the caller might as well have just directly done what they want done rather than writing 100 functions to do it and passing those to bar to execute the functions.
So there really isn't a definite answer one way or the other that applies all the time. It depends on the particular code you're writing.

Python: Using a dummy class to pass variable names?

This is a followup to "function that returns a dict whose keys are the names of the input arguments", from which I learned many things (paraphrased):
Python objects, on the whole, don't know their names.
No, this is not possible in general with *args. You'll have to use keyword arguments
When the number of arguments is fixed, you can do this with locals
Using globals(). This will only work if the values are unique in the module scope, so it's fragile
You're probably better off not doing this anyway and rethinking the problem.
The first point highlights my fundamental misunderstanding of Python variables. The responses were very pedagogic and nearly instantaneous; clearly this is a well-understood yet easily confused topic.
Since I'd like to learn how to do things properly, is it considered bad practice to create a dummy class simply to hold the variables with names attached to them?
class system: pass
S = system()
S.T = 1.0
S.N = 20
S.L = 10
print(vars(S))
This accomplishes my original intent, but I'm left wondering if there is something I'm not considering that can bite me later.
I do it as an homage to JavaScript, where you don't have any distinction between dictionaries and instance variables. I think it's not necessarily an antipattern, also because, unlike dictionaries, a missing value raises AttributeError instead of KeyError, and it is easier to spot typos in the name. As I said, not an antipattern, provided that:
the scope of the class is restricted to a very specific usage
the routine or method you are calling (e.g. vars in your example) is private in nature. I would not want a public interface with that calling semantics, nor do I want it as a returned entity
the name of the "dummy" class is extremely clear in its intent and the kind of aggregate it represents.
the lifetime of that object is short and uneventful. It is just a temporary bag of data.
If these constraints are not respected, go for a fully recognized class with properties.
You can do that, but why not use a dictionary?
But if you do that, you're better off passing keyword args to the class's constructor and letting the constructor copy them to the instance's attributes. Something like:
class Foo(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
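For instance, with the values from the question, a hypothetical use would look like:
f = Foo(T=1.0, N=20, L=10)
print(vars(f))  # {'T': 1.0, 'N': 20, 'L': 10}
print(f.N)      # attribute access; typos raise AttributeError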

Which is a better __repr__ for a custom Python class?

It seems there are different formats the __repr__ function can return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in python and they could just dive in and set it anyway, but seems defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily take all the stuff I'd set 'private' in C++ as arguments to the constructor, and make teammates using the class set it up using the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function that's slightly harder to notice, for the purposes of __repr__
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.
The __repr__ method is designed to produce the most useful output for the developer, not the enduser, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
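For reference, a stripped-down, hypothetical sketch of an option-B style __repr__ (only Name is included so the snippet stays self-contained):
class InfoObj:
    def __init__(self, Name):
        self.Name = Name

    def __repr__(self):
        # Mirror the constructor call so the repr reads back unambiguously.
        return 'InfoObj(Name={!r})'.format(self.Name)

print(repr(InfoObj(Name="Bob")))  # InfoObj(Name='Bob')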
However, if you want to store Python objects in a database, use pickle.
import pickle

bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
Bob(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
