Parameter names in Python functions that take single object or iterable - python

I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can a sinlge object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.

I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)

I guess I'm a little late to the party, but I'm suprised that nobody suggested a decorator.
def withmany(f):
def many(many_foos):
for foo in many_foos:
yield f(foo)
f.many = many
return f
#withmany
def process_foo(foo):
return foo + 1
processed_foo = process_foo(foo)
for processed_foo in process_foo.many(foos):
print processed_foo
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.

It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
# Infer if we have a singleton instance and make it a
# length 1 list for consistency
if isinstance(widget_thing, WidgetType):
widget_thing = [widget_thing]
for widget in widget_thing:
#...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
#...
def ProcessManyWidgets(widgets):
for widget in widgets:
ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.

You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
print args
# many ways to call it
foo(1)
foo(1, 2, 3)
args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))
foo(*args1)
foo(*args2)
foo(*args3)

Can you name your parameter in a very high-level way? people who read the code are more interested in knowing what the parameter represents ("clients") than what their type is ("list_of_tuples"); the type can be defined in the function documentation string, which is a good thing since it might change, in the future (the type is sometimes an implementation detail).

I would do 1 thing,
def myFunc(manyFoos):
if not type(manyFoos) in (list,tuple):
manyFoos = [manyFoos]
#do stuff here
so then you don't need to worry anymore about its name.
in a function you should try to achieve to have 1 action, accept the same parameter type and return the same type.
Instead of filling the functions with ifs you could have 2 functions.

Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
try:
iter(foos)
except TypeError:
foos = [foos]
for foo in foos:
pass # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
if isinstance(iterable, basestring):
iterable = [iterable]
try:
iter(iterable)
except TypeError:
iterable = [iterable]
return iterable
def doIt(foos):
for foo in iterfy(foos):
pass # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?

I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.

I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say instead of naming it what it is, you should name it what it's used for. Also, just be careful that you can't call use in on a not iterable.

Related

Argument convention in PyTorch

I am new to PyTorch and while going through the examples, I noticed that sometimes functions have a different convention when accepting arguments. For example transforms.Compose receives a list as its argument:
transform=transforms.Compose([ # Here we pass a list of elements
transforms.ToTensor(),
transforms.Normalize(
(0.4915, 0.4823, 0.4468),
(0.2470, 0.2435, 0.2616)
)
]))
At the same time, other functions receive the arguments individually (i.e. not in a list). For example torch.nn.Sequential:
torch.nn.Sequential( # Here we pass individual elements
torch.nn.Linear(1, 4),
torch.nn.Tanh(),
torch.nn.Linear(4, 1)
)
This has been a common typing mistake for me while learning.
I wonder if we are implying something when:
the arguments are passed as a list
the arguments are passed as individual items
Or is it simply the preference of the contributing author and should be memorized as is?
Update 1: Note that I do not claim that either format is better. I am merely complaining about lack of consistency. Of course (as Ivan stated in his answer) it makes perfect sense to follow one format if there is a good reason for it (e.g. transforms.Normalize). But if there is not, then I would vote for consistency.
This is not a convention, it is a design decision.
Yes, torch.nn.Sequential (source) receives individual items, whereas torchvision.transforms.Compose (source) receives a single list of items. Those are arbitrary design choices. I believe PyTorch and Torchvision are maintained by different groups of people, which might explain the difference. One could argue it is more coherent to have the inputs passed as a list since it is as a varied length, this is the approach used in more conventional programming languages such as C++ and Java. On the other hand you could argue it is more readable to pass them as a sequence of separate arguments instead, which what languages such as Python.
In this particular case we would have
>>> fn1([element_a, element_b, element_c]) # single list
vs
>>> fn2(element_a, element_b, element_c) # separate args
Which would have an implementation that resembles:
def fn1(elements):
pass
vs using the star argument:
def fn2(*elements):
pass
However it is not always up to design decision, sometimes the implementation is clear to take. For instance, it would be much preferred to go the list approach when the function has other arguments (whether they are positional or keyword arguments). In this case it makes more sense to implement it as fn1 instead of fn2. Here I'm giving second example with keyword arguments. Look a the difference in interface for the first set of arguments in both scenarios:
>>> fn1([elemen_a, element_b], option_1=True, option_2=True) # list
vs
>>> fn2(element_a, element_b, option_1=True, option_2=True) # separate
Which would have a function header which looks something like:
def fn1(elements, option_1=False, option_2=False)
pass
While the other would be using a star argument under the hood:
def fn2(*elements, option_1=False, option_2=False)
pass
If an argument is positioned after the star argument it essentially forces the user to use it as a keyword argument...
Mentioning this you can check out the source code for both Compose and Sequential and you will notice how both only expect a list of elements and no additional arguments afterwards. So in this scenario, it might have been preferred to go with Sequential's approach using the star argument... but this is just personal preference!

**kwargs.pop(x) versus defining x as a function parameter

Lets say you're writing a child class that has a constructor that passes its unused kwargs up to the parent constructor, but your class has the argument x that it needs to store that shouldn't be passed to the parent.
I have seen two different approaches to this:
def __init__(self, **kwargs):
self.x = kwargs.pop('x', 'default')
super().__init__(**kwargs)
and
def __init__(self, x='default', **kwargs):
self.x = x
super().__init__(**kwargs)
Is there every any functional difference between these two constructors? Is there any reason to use one over the other?
The only difference I can see is that the second form, which defines x in the signature, allows the user to better see it as a possible argument, or an IDE to offer it as an autocomplete option. Or in Python 3.5+, you could add a type annotation to x. Does that make the first form objectively worse?
As already mentionned by Giacomo Alzetta in a comment, the second version allow to pass x as a positional argument when the first only allow named arguments, IOW with the second form you can use both Child(x=2) AND Child(2), while the first only supports Child(x=2).
Also, when using inspection to check the method's signature, the second form will clearly mention the existance of the x param, while the first won't.
And finally, the second version will yield a slightly clearer exception if x is not passed.
And that's for the functional differences.
Is there any reason to use one over the other?
Well... As a general rule, it's cleaner (best practice) to use explicit arguments whenever possible, even if only for readability, and from experience it does usually make maintenance easier indeed. So from this point of view, the second form can be seen as "objectively better" than the first.
This being said, when the parent method has dozens of mostly optional and rarely used arguments (django.forms.Form, I'm lookig at you) AND you also want to preserve positional arguments order, it can be convenient to just use the generic *args, **kwargs signature for the child and force the additional param(s) to be passed as kwargs. Assuming you clearly document this in the docstring, it's still explicit enough (as far as I'm concerned, YMMV), and also avoids a lot of clutter (you can have a look at django.forms.Form for a concrete example of what I mean here).
So as always with "best practices" and other golden rules, you have to understand and weight the pros and cons wrt/ the concrete case at hand.
PS: just to make things clear, django's Form class signature makes perfect sense so I'm not ranting here - it's just one of those cases where there's no "beautiful" solution to the problem, period.
Aside obvious differences in code clarity, there might be a little difference in speed of calling the function, in this case method init().
If you can, define all necessary arguments with default values if you have some, in both methods, and pass them classically, and exclude ones you do not wish.
In this way you make the code clear and speed of calls stays the same.
If you need some micro-optimization, then use timeit to check what works faster.
I expect that one with the "x" added as an argument will perhaps be a winner.
Because getting to its value directly from local variables will be faster and kwargs dict() is smaller.
When you use "normal" arguments, they are automatically inserted into the functions local variables dictionary.
When you use *args and/or **kwargs they are additional tuple() and/or dict() added as new local variables. They are first created from the arguments you passed into the function call.
When you are passing them to a next function, they are extracted
to match that function's call signature. In both operations you lose a tiny bit of speed.
If you add removing something from the kwargs dictionary, ( x = kwargs.pop("x") ), you also lose some speed.
By observing both codes, it seems that their call speed would be equal. But you should check. If you do not need an extra 0.000001 seconds when initializing your instances, then both options are fine and just choose what you like most.
But again, if you are free to do it, and if it will not greatly impair the code's maintenance, define all arguments and their default values and pass them on one-by-one.

When to type-check a function's arguments?

I'm asking about situations where if a wrong type of argument is passed to the function, it could:
Blow up the whole thing.
Return unexpected results
Return nothing
For instance, the function below expects the argument name to be a string. It would throw an exception for all other types that doesn't have a startswith method.
def fruits(name):
if name.startswith('O'):
print('Is it Orange?')
There are other cases where a function could halt or cause damage to the system if execution proceeds without type-checking. Whenever there are a lot of functions or functions with a lot of arguments, type checking is tedious and makes the code unreadable. So, is there a standard for doing this? As to 'how to type check' - there are plenty of examples here on stackexchange, but I couldn't find any about where it would be appropriate to do so.
Another example would be:
def fruits(names):
with open('important_file.txt', 'r+') as fil:
for name in names:
if name in fil:
# Edit the file
Here if the name is a string each character in it will influence the editing of the file. If it is any other iterable, each element provided by it would influence the editing. Both of these could produce different results.
So, when should we type-check an argument and should we not?
The answer off the top of my head would be: it depends where the input comes from.
If the functions are class methods that get invokes internally or things like that, you can assume the inputs are valid, because you wrote it!
For example
def add(x,y):
return x + y
def multiply(a,b):
product = 0
for i in range(a):
product = add(product, b)
return product
In my add function, I could check that there is a + operator for the parameters x and y. But since I wrote the multiply function, and that is the only function that uses add, it is safe to assume the inputs will be int because that's how I wrote it. Now that argument stands on shaky ground for large code bases where you (hopefully) have shared code, so you can't be sure people don't misuse your functions. But that's why you comment them well to describe the correct use of said function.
If it has to read from a file, get user input, etc, then you may want to do some validation first.
I almost never do type checking in Python. In accordance with Pythonic philosophy I assume that me and other programmers are adult people capable of reading the code (or at least the documentation) and using it properly. I assume that we test our code before we let it destroy something important. After all in most cases if you do something wrong, you'll just see an error and Python's error messages are quite informative most of the time.
The only occasion when I sometimes check types is when I want my function to behave differently depending on the argument's type. But although I sometimes feel compelled to do this, I don't consider it a good practice.
Most often it happens when my function iterates over a list of strings and I fear (or want) I could get a single string passed into it by accident - this won't throw an error at once because unfortunately string is an iterable too.

Correct Style for Python functions that mutate the argument

I would like to write a Python function that mutates one of the arguments (which is a list, ie, mutable). Something like this:
def change(array):
array.append(4)
change(array)
I'm more familiar with passing by value than Python's setup (whatever you decide to call it). So I would usually write such a function like this:
def change(array):
array.append(4)
return array
array = change(array)
Here's my confusion. Since I can just mutate the argument, the second method would seem redundant. But the first one feels wrong. Also, my particular function will have several parameters, only one of which will change. The second method makes it clear what argument is changing (because it is assigned to the variable). The first method gives no indication. Is there a convention? Which is 'better'? Thank you.
The first way:
def change(array):
array.append(4)
change(array)
is the most idiomatic way to do it. Generally, in python, we expect a function to either mutate the arguments, or return something1. The reason for this is because if a function doesn't return anything, then it makes it abundantly clear that the function must have had some side-effect in order to justify it's existence (e.g. mutating the inputs).
On the flip side, if you do things the second way:
def change(array):
array.append(4)
return array
array = change(array)
you're vulnerable to have hard to track down bugs where a mutable object changes all of a sudden when you didn't expect it to -- "But I thought change made a copy"...
1Technically every function returns something, that _something_ just happens to be None ...
The convention in Python is that functions either mutate something, or return something, not both.
If both are useful, you conventionally write two separate functions, with the mutator named for an active verb like change, and the non-mutator named for a participle like changed.
Almost everything in builtins and the stdlib follows this pattern. The list.append method you're calling returns nothing. Same with list.sort—but sorted leaves its argument alone and instead returns a new sorted copy.
There are a handful of exceptions for some of the special methods (e.g., __iadd__ is supposed to mutate and then return self), and a few cases where there clearly has to be one thing getting mutating and a different thing getting returned (like list.pop), and for libraries that are attempting to use Python as a sort of domain-specific language where being consistent with the target domain's idioms is more important than being consistent with Python's idioms (e.g., some SQL query expression libraries). Like all conventions, this one is followed unless there's a good reason not to.
So, why was Python designed this way?
Well, for one thing, it makes certain errors obvious. If you expected a function to be non-mutating and return a value, it'll be pretty obvious that you were wrong, because you'll get an error like AttributeError: 'NoneType' object has no attribute 'foo'.
It also makes conceptual sense: a function that returns nothing must have side-effects, or why would anyone have written it?
But there's also the fact that each statement in Python mutates exactly one thing—almost always the leftmost object in the statement. In other languages, assignment is an expression, mutating functions return self, and you can chain up a whole bunch of mutations into a single line of code, and that makes it harder to see the state changes at a glance, reason about them in detail, or step through them in a debugger.
Of course all of this is a tradeoff—it makes some code more verbose in Python than it would be in, say, JavaScript—but it's a tradeoff that's deeply embedded in Python's design.
It hardly ever makes sense to both mutate an argument and return it. Not only might it cause confusion for whoever's reading the code, but it leaves you susceptible to the mutable default argument problem. If the only way to get the result of the function is through the mutated argument, it won't make sense to give the argument a default.
There is a third option that you did not show in your question. Rather than mutating the object passed as the argument, make a copy of that argument and return it instead. This makes it a pure function with no side effects.
def change(array):
array_copy = array[:]
array_copy.append(4)
return array_copy
array = change(array)
From the Python documentation:
Some operations (for example y.append(10) and y.sort()) mutate the
object, whereas superficially similar operations (for example y = y +
[10] and sorted(y)) create a new object. In general in Python (and in
all cases in the standard library) a method that mutates an object
will return None to help avoid getting the two types of operations
confused. So if you mistakenly write y.sort() thinking it will give
you a sorted copy of y, you’ll instead end up with None, which will
likely cause your program to generate an easily diagnosed error.
However, there is one class of operations where the same operation
sometimes has different behaviors with different types: the augmented
assignment operators. For example, += mutates lists but not tuples or
ints (a_list += [1, 2, 3] is equivalent to a_list.extend([1, 2, 3])
and mutates a_list, whereas some_tuple += (1, 2, 3) and some_int += 1
create new objects).
Basically, by convention, a function or method that mutates an object does not return the object itself.

Flexible Arguments in Python

This has probably been asked before, but I don't know how to look up the answer, because I'm not sure what's a good way to phrase the question.
The topic is function arguments that can semantically be expressed in many different ways. For example, to give a file to a function, you could either give the file directly, or you could give a string, which is the path to the file. To specify a number, you might allow an integer as an argument, or maybe you might allow a string (the numeral), or you might even allow a string such as "one". Another example might be a function that takes a list (of numbers, say), but as a convenience, it will convert a number into a list containing one element: that number.
Is there a more or less standard approach in Python to allowing this sort of flexibility? It certainly complicates the code for a program if you're not certain what types the arguments are, so my guess would be to try factor out the convenience functions into just one place, instead of scattered everywhere, but I don't really know how best to do that kind of factoring.
No, there is not more or less a "standard" approach to this.
Don't do that! ;)
But if you still want to, I'd suggest that you have an intermediate class or function handling this for you:
Pseudocode:
def printTheNumber(num):
print num
def intermediatePrintTheNumber(input):
num_int_dict = {'one':1, "two":2 ....
if input.isstring():
printTheNumber(num_int_dict[input])
elif input.isint():
printTheNumber(input)
else:
print "Sorry Dave, I don't understand you"
If this is pythonic I don't know, but that's how I'd solve it if I had to, of course with some more checking of the input to deem it valid.
When it comes to your comment you mention semantic similarity i.e "one" and 1 might mean the same thing.
Where should this kind of conversion be made you ask.
Well that depends on the design of your system, but I can tell you that it should not be done in the same function that I call printTheNumber for one very simple reason, and that is that that would give that function way to much responsibility.
Depending on the complexity of the input it could be the integer 1 or the string "1" or, in the worse case, "one" or maybe even worse "uno"|"one"|"yxi"|"ett" .. and so on. This should be handled by a function that has only that responsibility maybe with a database handling the mapping.
I would split it up so that I have one function handling the the strings "one", "two" ... etc, and one handling integers and have a third function that checks the input to see if it can be converted to an integer or not.
As I see it, there is a warning for a fundamental flaw in the design if you have to take measures for this kind of complexity, but you seem to be aware of that so I won't go on about it.
A good way to factor out code common to several functions is through decorators. For example,
from functools import wraps
def takes_list(func):
#wraps(func)
def wrapper(arg):
if not isinstance(arg, list):
arg = [arg]
return func(arg)
return wrapper
#takes_list
def my_func(x):
"Does something with list x."
I should note that for cases such as files, you don't want to get in the way of Python's duck typing: doing a check isinstance(arg, file) has the problem that it won't allow file-like things such as io.StringIO. Instead, check against str (or basestring) or even let open do the checking for you, with try-except.
However, it's generally better practice just to let the caller pass what they like into the function, and fail if it isn't valid.
Since you can not add methods at runtime to builtin classes such as int or str you would make a switch case statement like structure as mentioned by Daniel Figueroa.
Another way would be to just convert:
def func(i):
if not isinstance(i, int):
i = int(i) # objects can overwrite __int__ if needed.
If you have own classes that you can add methods to, you may use double dispatch to do the same thing for you. Smalltalk uses this for the Integer-Float-... conversion.
Another way would be to use subject oriented programming for which I have not found an implementation yet but I tried: https://gist.github.com/niccokunzmann/4971938

Categories

Resources