In my work, my Python scripts get a lot of input from non-Python-professional users.
So, for example, if my function needs to handle both single values and a series of values (Polymorphism/Duck Typing, Yo!) I would like to do something like this pseudo-code:
def duck_typed(args):
    """
    args - Give me a single integer or a list of integers to work on.
    """
    for li in args:
        <do_something>
If the user passes me a list:
[1,2]
a tuple:
(1,2)
or even a singleton tuple (terminology help here, please)
(1,)
everything works as expected. But as soon as the user passes in a single integer everything goes to $%^&:
1
TypeError: 'int' object is not iterable
The Difference between Smart and Clever
Now, I immediately think, "No problem, I just need to get clever!" and I do something like this:
def duck_typed(args):
    """
    args - Give me a single integer or a list of integers to work on.
    """
    args = (args,)  # <= Ha ha! I'm so clever!
    for li in args:
        <do_something>
Well, now it works for the single integer case:
args = (1,)
(1,)
but when the user passes in an iterable, the $%^&*s come out again. My trick gives me a nested iterable:
args = ((1,2),)
((1,2),)
ARGH!
The Usual Suspects
There are of course the usual workarounds. Try/except clauses:
try:
    args = tuple(args)
except TypeError:
    args = tuple((args,))
These work, but I run into this issue A LOT. This is really a 1-line problem and try/except is a 4-line solution. I would really love it if I could just call:
tuple(1)
have it return (1,) and call it a day.
Other People Use this Language Too, You Know
Now, I'm aware that my needs in my little corner of the Python programming universe don't apply to the rest of the Python world. Being dynamically typed makes Python such a wonderful language to work in -- especially after years of work in neurotic languages such as C. (Sorry, sorry. I'm not bashing C. It's quite good at what it's good at, but you know: xkcd.)
I'm quite sure the creators of the Python language have a very good reason to not allow tuple(1).
Question 1
Will someone please explain why the creators chose not to allow tuple(1) and/or list(1) to work? I'm sure it's a completely sane reason and bloody obvious to many. I may have just missed it during my tenure at the School of Hard Knocks. (It's on the other side of the hill from Hogwarts.)
Question 2
Is there a more practical -- hopefully 1-line -- way to make the following conversion?
X(1) -> (1,)
X((1,2)) -> (1,2)
If not, I guess I could just break down and roll my own.
Duck typing explains and validates why list(1) fails.
The method was expecting a Duck but was given a Hamster, and Hamsters can't swim.[1]
In duck typing, a programmer is only concerned with ensuring that objects behave as demanded of them in a given context, rather than ensuring that they are of a specific type.
But not all objects/types behave the same, or "as demanded". In this case, an integer does not behave like an iterable and causes an exception. However, list("quack") works precisely because a string does act like an iterable and goes Quack - ['q','u','a','c','k']! To make list take a non-iterable would actually mean special casing, not duck typing.
Expecting an integer to "be iterable" sounds like a design issue because this requires an implicit change in multiplicity. That is, the concepts of a value and a sequence of [zero or more] values should be kept separate. Polymorphism doesn't apply in this case as polymorphism (of any type) only works over unification - and there is no unification to "iterate a non-iterable".
Furthermore, having list(itr) only accept an iterable fits in the strongly-typed Python model and avoids edge-cases. Consider that if list(x) was written in such a way that it allowed a non-iterable as well, one could not determine if the result would be [x] or [x0..xn] without knowing the value supplied. Python simply forbids this operation and puts the burden of changing multiplicity - or, passing a Duck - on the calling code.
See the question "In Python, how do I determine if an object is iterable?", which presents several solutions to wrap (or otherwise deal with) a non-iterable value.
While I do not recommend this approach, as it changes the multiplicity, I would likely write such a coercion function as follows. Unlike isinstance checks, it will handle all non-iterable values. However, you will have to work out the rules for what should happen on ensure_iterable(None).
def ensure_iterable(x):
    try:
        return iter(x)
    except TypeError:
        return (x,)
And then "one line" usage:
for li in ensure_iterable(args):
    pass
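To see what this does in practice, here is the same ensure_iterable, repeated so the snippet runs on its own, applied to a few inputs. It is a quick sketch rather than a recommendation; note in particular that a string is iterable, so it gets split into characters rather than wrapped, and that None simply gets wrapped as (None,).

```python
def ensure_iterable(x):
    """Return an iterator over x, or a 1-tuple if x is not iterable."""
    try:
        return iter(x)
    except TypeError:
        # iter() raises TypeError for non-iterables such as int or None
        return (x,)


print(list(ensure_iterable(1)))       # [1]
print(list(ensure_iterable((1, 2))))  # [1, 2]
print(list(ensure_iterable(None)))    # [None]
print(list(ensure_iterable("ab")))    # ['a', 'b']  (a str is iterable!)
```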
[1] Hamsters can swim, or at least stay afloat for a little bit. However, I find the analogy apt (and more memorable) precisely because a wet/drowning hamster is a sad thought. Keep those little critters safe and dry!
Try this:
if isinstance(variable, int):
    variable = (variable,)
Related
I'm asking about situations where if a wrong type of argument is passed to the function, it could:
Blow up the whole thing.
Return unexpected results
Return nothing
For instance, the function below expects the argument name to be a string. It would throw an exception for all other types that don't have a startswith method.
def fruits(name):
    if name.startswith('O'):
        print('Is it Orange?')
There are other cases where a function could halt or cause damage to the system if execution proceeds without type-checking. Whenever there are a lot of functions, or functions with a lot of arguments, type checking is tedious and makes the code unreadable. So, is there a standard for doing this? As for 'how to type check', there are plenty of examples here on Stack Exchange, but I couldn't find any about where it would be appropriate to do so.
Another example would be:
def fruits(names):
    with open('important_file.txt', 'r+') as fil:
        for name in names:
            if name in fil:
                ...  # Edit the file
Here, if names is a string, each character in it will influence the editing of the file. If it is any other iterable, each element it yields will influence the editing. These two cases could produce different results.
So, when should we type-check an argument, and when should we not?
The answer off the top of my head would be: it depends where the input comes from.
If the functions are class methods that get invoked internally, or things like that, you can assume the inputs are valid, because you wrote it!
For example
def add(x, y):
    return x + y
def multiply(a, b):
    product = 0
    for i in range(a):
        product = add(product, b)
    return product
In my add function, I could check that there is a + operator for the parameters x and y. But since I wrote the multiply function, and that is the only function that uses add, it is safe to assume the inputs will be int because that's how I wrote it. Now that argument stands on shaky ground for large code bases where you (hopefully) have shared code, so you can't be sure people don't misuse your functions. But that's why you comment them well to describe the correct use of said function.
If it has to read from a file, get user input, etc, then you may want to do some validation first.
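A rough sketch of that idea: untrusted text is checked once, where it enters the program, and the internal helpers stay check-free. add and multiply are repeated from above; parse_count is a hypothetical helper name, and its rules (whole numbers, non-negative) are assumptions for illustration.

```python
def add(x, y):
    return x + y


def multiply(a, b):
    # Internal helper: trusts its callers to pass a non-negative int for `a`
    product = 0
    for _ in range(a):
        product = add(product, b)
    return product


def parse_count(text):
    """Validate untrusted input once, at the boundary (hypothetical helper)."""
    try:
        value = int(text)
    except ValueError:
        raise ValueError(f"expected a whole number, got {text!r}") from None
    if value < 0:
        raise ValueError(f"expected a non-negative number, got {value}")
    return value


# Strings from input(), a file, etc. are checked here and nowhere else.
print(multiply(parse_count("3"), 4))  # 12
```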
I almost never do type checking in Python. In accordance with the Pythonic philosophy, I assume that other programmers and I are adults capable of reading the code (or at least the documentation) and using it properly. I assume that we test our code before we let it destroy something important. After all, in most cases if you do something wrong, you'll just see an error, and Python's error messages are quite informative most of the time.
The only occasion when I sometimes check types is when I want my function to behave differently depending on the argument's type. But although I sometimes feel compelled to do this, I don't consider it a good practice.
Most often it happens when my function iterates over a list of strings and I fear (or want) that a single string could be passed into it by accident. This won't throw an error right away because, unfortunately, a string is an iterable too.
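One way to deal with that pitfall, sketched under the assumption that a bare string is always a mistake (greet_all is a hypothetical example function): check for str explicitly before iterating. If you would rather treat a single string as a one-element list, replace the raise with names = [names].

```python
def greet_all(names):
    """Greet each name; `names` should be an iterable of strings."""
    # A bare str is iterable too, so without this check it would be
    # processed one character at a time. Reject it explicitly instead.
    if isinstance(names, str):
        raise TypeError("expected an iterable of strings, not a single str")
    return [f"Hello, {name}!" for name in names]


print(greet_all(["Ann", "Bob"]))  # ['Hello, Ann!', 'Hello, Bob!']
```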
This has probably been asked before, but I don't know how to look up the answer, because I'm not sure what's a good way to phrase the question.
The topic is function arguments that can semantically be expressed in many different ways. For example, to give a file to a function, you could either give the file directly, or you could give a string, which is the path to the file. To specify a number, you might allow an integer as an argument, or maybe you might allow a string (the numeral), or you might even allow a string such as "one". Another example might be a function that takes a list (of numbers, say), but as a convenience, it will convert a number into a list containing one element: that number.
Is there a more or less standard approach in Python to allowing this sort of flexibility? It certainly complicates the code for a program if you're not certain what types the arguments are, so my guess would be to try to factor out the convenience functions into just one place, instead of scattering them everywhere, but I don't really know how best to do that kind of factoring.
No, there is not more or less a "standard" approach to this.
Don't do that! ;)
But if you still want to, I'd suggest that you have an intermediate class or function handling this for you:
Pseudocode:
def printTheNumber(num):
    print(num)
def intermediatePrintTheNumber(value):
    num_int_dict = {'one': 1, 'two': 2}  # ...and so on for the other words
    if isinstance(value, str):
        printTheNumber(num_int_dict[value])
    elif isinstance(value, int):
        printTheNumber(value)
    else:
        print("Sorry Dave, I don't understand you")
Whether this is Pythonic I don't know, but that's how I'd solve it if I had to, of course with some more checking of the input to make sure it's valid.
Regarding your comment: you mention semantic similarity, i.e. that "one" and 1 might mean the same thing.
Where should this kind of conversion be made, you ask?
Well, that depends on the design of your system, but I can tell you that it should not be done in the function I call printTheNumber, for one very simple reason: that would give that function way too much responsibility.
Depending on the complexity of the input, it could be the integer 1 or the string "1" or, in the worst case, "one", or maybe even worse "uno"|"one"|"yxi"|"ett" and so on. This should be handled by a function that has only that responsibility, maybe with a database handling the mapping.
I would split it up so that I have one function handling the strings "one", "two", etc., one handling integers, and a third function that checks the input to see whether it can be converted to an integer or not.
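A minimal sketch of that split. The names WORD_TO_INT, int_from_word, int_from_int and to_int are hypothetical, and the small dictionary stands in for whatever mapping (or database) a real system would use:

```python
# Hypothetical mapping; a real system might load this from a database.
WORD_TO_INT = {"one": 1, "two": 2, "three": 3}


def int_from_word(word):
    """Responsibility 1: map a numeral word such as 'two' to an int."""
    return WORD_TO_INT[word.strip().lower()]


def int_from_int(value):
    """Responsibility 2: accept something that is already an int."""
    return value


def to_int(value):
    """Responsibility 3: decide which conversion applies."""
    if isinstance(value, int):
        return int_from_int(value)
    if isinstance(value, str):
        try:
            return int(value)  # numeral strings such as "1" or " 42 "
        except ValueError:
            return int_from_word(value)  # words such as "one"
    raise TypeError(f"cannot interpret {value!r} as a number")


def printTheNumber(num):
    # Only ever receives an int, so it stays trivial
    print(num)


printTheNumber(to_int("two"))  # 2
```

printTheNumber never has to know which form the caller used; to_int is the only place that does.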
As I see it, having to take measures for this kind of complexity is a warning sign of a fundamental flaw in the design, but you seem to be aware of that, so I won't go on about it.
A good way to factor out code common to several functions is through decorators. For example,
from functools import wraps
def takes_list(func):
    @wraps(func)
    def wrapper(arg):
        if not isinstance(arg, list):
            arg = [arg]
        return func(arg)
    return wrapper
@takes_list
def my_func(x):
    "Does something with list x."
I should note that for cases such as files, you don't want to get in the way of Python's duck typing: a check like isinstance(arg, file) (file being Python 2's built-in file type) has the problem that it won't allow file-like things such as io.StringIO. Instead, check against str (or basestring in Python 2), or even let open do the checking for you, with try-except.
However, it's generally better practice just to let the caller pass what they like into the function, and fail if it isn't valid.
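For the file case, a sketch of the "check for a path, otherwise assume file-like" approach might look like this. read_text is a hypothetical helper; it assumes Python 3, where os.PathLike also covers pathlib.Path objects, and it treats anything that is not a path as something with a .read() method:

```python
import io
import os


def read_text(source):
    """Return the text of `source`, a path or a file-like object.

    Paths (str or os.PathLike) are opened here; anything else is assumed
    to be file-like and only needs a .read() method (duck typing).
    """
    if isinstance(source, (str, os.PathLike)):
        with open(source, encoding="utf-8") as f:
            return f.read()
    return source.read()


print(read_text(io.StringIO("hello")))  # hello
```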
Since you cannot add methods at runtime to built-in classes such as int or str, you would write a structure like a switch-case statement, as mentioned by Daniel Figueroa.
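In Python 3.4 and later, the standard library's functools.singledispatch can take the place of a hand-written switch on the type of the first argument, and it lets you register handlers for built-in types such as int and str without modifying them. A minimal sketch (to_number and its conversion rules are hypothetical; registering via type annotations needs Python 3.7+):

```python
from functools import singledispatch


@singledispatch
def to_number(value):
    # Fallback used when no handler is registered for type(value)
    raise TypeError(f"cannot convert {type(value).__name__} to a number")


@to_number.register
def _(value: int):
    return value


@to_number.register
def _(value: str):
    # Assumed rule: a string must be a numeral such as "42"
    return int(value)


print(to_number(7))     # 7
print(to_number("42"))  # 42
```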
Another way would be to just convert:
def func(i):
    if not isinstance(i, int):
        i = int(i)  # objects can override __int__ if needed.
If you have your own classes that you can add methods to, you may use double dispatch to do the same thing for you. Smalltalk uses this for the Integer-Float-... conversion.
Another way would be to use subject-oriented programming, for which I have not found an implementation yet, but I tried: https://gist.github.com/niccokunzmann/4971938
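A minimal double-dispatch sketch, assuming two hypothetical classes Integer and Float (loosely in the spirit of the Smalltalk coercion mentioned above, not a copy of it). The first dispatch is the ordinary method call on the receiver; the receiver then calls back a type-specific method on the argument, so the right combination is selected without any isinstance checks:

```python
class Integer:
    def __init__(self, value):
        self.value = value

    def add(self, other):
        # Dispatch 1 chose Integer.add; dispatch 2 asks `other`
        # how to add an Integer to itself.
        return other.add_integer(self)

    def add_integer(self, other):
        return Integer(other.value + self.value)

    def add_float(self, other):
        return Float(other.value + float(self.value))


class Float:
    def __init__(self, value):
        self.value = value

    def add(self, other):
        return other.add_float(self)

    def add_integer(self, other):
        return Float(float(other.value) + self.value)

    def add_float(self, other):
        return Float(other.value + self.value)


result = Integer(1).add(Float(2.5))
print(type(result).__name__, result.value)  # Float 3.5
```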
Lacking experience with maintaining dynamically typed code, I'm looking for the best way to handle this kind of situation:
(Example in Python, but it could apply to any dynamically typed language)
def some_function(object_that_could_be_a_list):
    if isinstance(object_that_could_be_a_list, list):
        for element in object_that_could_be_a_list:
            some_function(element)
    else:
        # Do stuff that expects the object to have certain properties
        # a list would not have
        ...
I'm quite uneasy with this, since I think a method should do only one thing, and I don't think it is as readable as it should be. So I'd be tempted to make three functions: one that takes any object and dispatches to the other two, one for lists, and another for "simple" objects. Then again, that would add some complexity.
What is the most "sustainable" solution here, the one that guarantees ease of maintenance? Is there an idiom in Python for these situations that I'm unaware of? Thanks in advance.
Don't type check - do what you want to do, and if it won't work, it'll throw an exception which you can catch and manage.
The Python mantra is 'ask for forgiveness, not permission'. Type checking takes extra time, when most of the time it'll be pointless. It also doesn't make much sense in a duck-typed environment: if it works, who cares what type it is? Why limit yourself to lists when other iterables will work too?
E.g.:
def some_function(object_that_could_be_a_list):
    try:
        for element in object_that_could_be_a_list:
            some_function(element)
    except TypeError:
        ...
This is more readable, will work in more cases (if I pass in any other iterable which isn't a list, there are a lot) and will often be faster.
Note you are getting terminology mixed up. Python is dynamically typed, but not weakly typed. Weak typing means objects change type as needed. For example, if you add a string and an int, it will convert the string to an int to do the addition. Python does not do this. Dynamic typing means you don't declare a type for a variable, and it may contain a string at some point, then an int later.
Duck typing is a term used to describe the use of an object without caring about its type. If it walks like a duck and quacks like a duck, it's probably a duck.
Now, this is a general thing, and if you think your code will get the 'wrong' type of object more often than the 'right', then you might want to type check for speed. Note that this is rare, and it's always best to avoid premature optimisation. Do it by catching exceptions, and then test - if you find it's a bottleneck, then optimise.
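One caveat with the try/except version above: because the try wraps the whole loop, a TypeError raised inside the recursive call (or by the real work done on an element) is also caught and mistaken for "not iterable". A sketch that guards only the iter() call; process and the processed list are hypothetical stand-ins for the real work. It still assumes no strings are passed in, since a one-character string is itself iterable and would recurse without end:

```python
processed = []


def process(item):
    # Hypothetical stand-in for the real per-object work
    processed.append(item)


def some_function(obj):
    try:
        iterator = iter(obj)  # only this call is guarded
    except TypeError:
        process(obj)          # not iterable: treat it as a single object
        return
    for element in iterator:  # a TypeError raised in here propagates
        some_function(element)


some_function([1, [2, 3], (4,)])
print(processed)  # [1, 2, 3, 4]
```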
A common practice is to implement the multiple interface by way of using different parameters for different kinds of input.
def foo(thing=None, thing_seq=None):
    if thing_seq is not None:
        for _thing in thing_seq:
            foo(thing=_thing)
    if thing is not None:
        print("did foo with", thing)
Rather than doing it recursively, I tend to do it this way:
def foo(x):
    if not isinstance(x, list):
        x = [x]
    for y in x:
        do_something(y)
You can use decorators in this case to make it more maintainable:
from mm import multimethod
@multimethod(int, int)
def foo(a, b):
    ...  # code for two ints
@multimethod(float, float)
def foo(a, b):
    ...  # code for two floats
@multimethod(str, str)
def foo(a, b):
    ...  # code for two strings
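The mm module is not part of the standard library; the decorator style appears to follow Guido van Rossum's "Five-minute Multimethods in Python" essay. In case it helps to see how such a decorator could work, here is a minimal sketch that dispatches on the exact types of all positional arguments (no subclass matching). MultiMethod, _registry and the function bodies are illustrative assumptions:

```python
_registry = {}


class MultiMethod:
    """Dispatch table for one function name, keyed by argument types."""

    def __init__(self, name):
        self.name = name
        self.typemap = {}

    def __call__(self, *args):
        types = tuple(type(arg) for arg in args)
        function = self.typemap.get(types)
        if function is None:
            raise TypeError(f"no match for {self.name}{types}")
        return function(*args)

    def register(self, types, function):
        if types in self.typemap:
            raise TypeError(f"duplicate registration for {types}")
        self.typemap[types] = function


def multimethod(*types):
    """Decorator: register the function under its name for these types."""
    def register(function):
        name = function.__name__
        mm = _registry.get(name)
        if mm is None:
            mm = _registry[name] = MultiMethod(name)
        mm.register(types, function)
        return mm
    return register


@multimethod(int, int)
def foo(a, b):
    return a + b  # code for two ints


@multimethod(str, str)
def foo(a, b):
    return a + " " + b  # code for two strings


print(foo(1, 2))      # 3
print(foo("x", "y"))  # x y
```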
I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can be a single object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but doesn't explain anything about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but doesn't explain anything about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
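For reference, this is how the classinfo convention behaves: a single class, or a tuple meaning "any of these".

```python
print(isinstance(3, int))             # True: a single class
print(isinstance(3.0, (int, float)))  # True: a tuple means "any of these"
print(isinstance("3", (int, float)))  # False
```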
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo", but you make it accept an iterable of foos without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter", but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm surprised that nobody suggested a decorator.
def withmany(f):
    def many(many_foos):
        for foo in many_foos:
            yield f(foo)
    f.many = many
    return f
@withmany
def process_foo(foo):
    return foo + 1
processed_foo = process_foo(foo)
for processed_foo in process_foo.many(foos):
    print(processed_foo)
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
    # Infer if we have a singleton instance and make it a
    # length 1 list for consistency
    if isinstance(widget_thing, WidgetType):
        widget_thing = [widget_thing]
    for widget in widget_thing:
        ...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
    ...
def ProcessManyWidgets(widgets):
    for widget in widgets:
        ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args, like func(arg1, arg2, ...), and pass iterable arguments with an asterisk before them, like func(*args).
Example:
# magic *args function
def foo(*args):
    print(args)
# many ways to call it
foo(1)
foo(1, 2, 3)
args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))
foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? People who read the code are more interested in knowing what the parameter represents ("clients") than what its type is ("list_of_tuples"); the type can be defined in the function's docstring, which is a good thing since it might change in the future (the type is sometimes an implementation detail).
I would do one thing:
def myFunc(manyFoos):
    if not isinstance(manyFoos, (list, tuple)):
        manyFoos = [manyFoos]
    # do stuff here
Then you don't need to worry about its name anymore.
In a function, you should try to have one action, accept one parameter type, and return one type.
Instead of filling the functions with ifs you could have 2 functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
    try:
        iter(foos)
    except TypeError:
        foos = [foos]
    for foo in foos:
        pass  # do something here
The only problem with this approach is if foos is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in the string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
    # Python 2 would check basestring here; in Python 3, str covers text
    if isinstance(iterable, str):
        iterable = [iterable]
    try:
        iter(iterable)
    except TypeError:
        iterable = [iterable]
    return iterable
def doIt(foos):
    for foo in iterfy(foos):
        pass  # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say that instead of naming it for what it is, you should name it for what it's used for. Also, just be careful: you can't use the in operator on something that isn't iterable.
I find myself writing the same argument checking code all the time for number-crunching:
def myfun(a, b):
    if a < 0:
        raise ValueError('a cannot be < 0 (was a=%s)' % a)
    # more if.. raise exception stuff here ...
    return a + b
Is there a better way? I was told not to use 'assert' for these things (though I don't see the problem, apart from not knowing the value of the variable that caused the error).
Edit: To clarify, the arguments are usually just numbers, and the error-checking conditions can be complex and non-trivial, and will not necessarily lead to an exception later but simply to a wrong result (unstable algorithms, meaningless solutions, etc.).
assert gets optimized away if you run with python -O (modest optimizations, but sometimes nice to have). One preferable alternative if you have patterns that often repeat may be to use decorators -- great way to factor out repetition. E.g., say you have a zillion functions that must be called with arguments by-position (not by-keyword) and must have their first arguments positive; then...:
def firstargpos(f):
    def wrapper(first, *args):
        if first < 0:
            raise ValueError('first argument cannot be < 0 (was %r)' % (first,))
        return f(first, *args)
    return wrapper
then you say something like:
@firstargpos
def myfun(a, b):
    ...
and the checks are performed in the decorators (or rather the wrapper closure it returns) once and for all. So, the only tricky part is figuring out exactly what checks your functions need and how best to call the decorator(s) to express those (hard to say, without seeing the set of functions you're defining and the set of checks each needs!-). Remember, DRY ("Don't Repeat Yourself") is close to the top spot among guiding principles in software development, and Python has reasonable support to allow you to implement DRY and avoid boilerplatey, repetitious code!-)
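The firstargpos wrapper above only sees arguments passed by position. If the checks need to work no matter how arguments are passed, one possible sketch binds them to the function's signature with inspect.signature(...).bind(); check_args is a hypothetical decorator name, and its (predicate, message) convention is an assumption. As a side effect, the error message includes the offending value:

```python
import functools
import inspect


def check_args(**predicates):
    """Hypothetical decorator: validate named arguments with predicates.

    Each keyword maps a parameter name to a (predicate, message) pair.
    """
    def decorator(f):
        signature = inspect.signature(f)

        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, (predicate, message) in predicates.items():
                value = bound.arguments[name]
                if not predicate(value):
                    raise ValueError(f"{message} (was {name}={value!r})")
            return f(*args, **kwargs)
        return wrapper
    return decorator


@check_args(a=(lambda v: v >= 0, "a cannot be < 0"))
def myfun(a, b):
    return a + b


print(myfun(1, 2))  # 3
```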
You don't want to use assert because your code can be run (and is by default on some systems) in such a way that assert lines are not checked and do not raise errors (-O command line flag).
If you're using a lot of variables that are all supposed to have those same properties, why not subclass whatever type you're using and add that check to the class itself? Then when you use your new class, you know you never have an invalid value, and don't have to go checking for it all over the place.
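A minimal sketch of that subclassing idea, using a hypothetical NonNegative subclass of float. Because float is immutable, the check goes in __new__. Keep in mind that arithmetic on it returns plain floats, so NonNegative(1) - 5 is an ordinary -4.0 that is no longer checked:

```python
class NonNegative(float):
    """Hypothetical float subclass that refuses negative values."""

    def __new__(cls, value):
        # float is immutable, so validation belongs in __new__, not __init__
        instance = super().__new__(cls, value)
        if instance < 0:
            raise ValueError(f"NonNegative cannot be < 0 (was {value!r})")
        return instance


def myfun(a, b):
    # `a` is expected to be a NonNegative, validated when it was created
    return a + b


print(myfun(NonNegative(1.5), 2))  # 3.5
```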
I'm not sure if this will answer your question, but it strikes me that checking a lot of arguments at the start of a function isn't very pythonic.
What I mean by this is that it is the assumption of most pythonistas that we are all consenting adults, and we trust each other not to do something stupid. Here's how I'd write your example:
def myfun(a, b):
    '''a cannot be < 0'''
    return a + b
This has three distinct advantages. First, it's concise: there's no extra code doing anything unrelated to what you're actually trying to get done. Second, it puts the information exactly where it belongs, in help(myfun), where Pythonistas are expected to look for usage notes. Finally, is a negative value for a really an error? Although you might think so, unless something will definitely break if a is negative (here it probably won't), letting it slip through and cause an error further along may be wiser. After all, if a + b is in error, it raises an exception, which gets passed up the call stack, and the behavior is still pretty much the same.