I am new to Python and higher level languages in general, so I was wondering if it is looked down upon if I have a function that takes in a lot of arguments, and how to better architect my code to prevent this.
For example what this function is essentially doing is printing each location of a string in a file.
def scan(fpin,base,string,pNo,print_to_file,dumpfile_path,quiet):
This function is being called from the main function, which is basically parsing the command line arguments and passing the data to the scan function. I have thought of creating a class containing all of these arguments and passing it to scan,but there will only be one instance of this data, so wouldn't that be pointless?
Named arguments are your friends. For things that act like semi-optional configuration options with reasonable defaults, give the parameters the defaults, and only pass them (as named arguments) for non-default situations. If there are a lot of parameters without reasonable defaults, then you may want to name all of them when you call the function.
Consider the built-in function sorted. It takes up to four arguments. Is the reverse parameter before or after cmp? What should I pass in as key if I want the default behavor? Answer: Hell if I can remember. I call sorted(A, reverse=True) and it does what I'd expect.
Incidentally, if I had a ton of "config"-style arguments that I was passing into every call to scan, and only changing (say, fpin and string) each time, I might be inclined to put all the other argumentsinto a dictionary, and then pass it to the function with **kwargs syntax. That's a little more advanced. See the manual for details. (Note that this is NOT the same as declaring the function as taking **kwargs. The function definition is the same, the only difference is what calls to it look like.)
No, there's really nothing wrong with it. If you have N different arguments (things that control the execution of your function), you have to pass them somehow - how you actually do that is just user preference if you ask me.
However... if you find yourself doing something like this, though:
func('somestring', a=A, b=B, c=C)
func('something else', a=A, b=B)
func('something third', a=A, c=C, d=D)
etc. where A,B,C are really configurations for lots of different things, then you should start looking into a class. A class does many things, but it does also create context. Instead, then you can do something like:
cf = myclass(a=A, b=B, c=C, d=D)
cf.func('somestring')
cf.func('something else')
cf.func('something third')
etc.
Related
Lets say you're writing a child class that has a constructor that passes its unused kwargs up to the parent constructor, but your class has the argument x that it needs to store that shouldn't be passed to the parent.
I have seen two different approaches to this:
def __init__(self, **kwargs):
self.x = kwargs.pop('x', 'default')
super().__init__(**kwargs)
and
def __init__(self, x='default', **kwargs):
self.x = x
super().__init__(**kwargs)
Is there every any functional difference between these two constructors? Is there any reason to use one over the other?
The only difference I can see is that the second form, which defines x in the signature, allows the user to better see it as a possible argument, or an IDE to offer it as an autocomplete option. Or in Python 3.5+, you could add a type annotation to x. Does that make the first form objectively worse?
As already mentionned by Giacomo Alzetta in a comment, the second version allow to pass x as a positional argument when the first only allow named arguments, IOW with the second form you can use both Child(x=2) AND Child(2), while the first only supports Child(x=2).
Also, when using inspection to check the method's signature, the second form will clearly mention the existance of the x param, while the first won't.
And finally, the second version will yield a slightly clearer exception if x is not passed.
And that's for the functional differences.
Is there any reason to use one over the other?
Well... As a general rule, it's cleaner (best practice) to use explicit arguments whenever possible, even if only for readability, and from experience it does usually make maintenance easier indeed. So from this point of view, the second form can be seen as "objectively better" than the first.
This being said, when the parent method has dozens of mostly optional and rarely used arguments (django.forms.Form, I'm lookig at you) AND you also want to preserve positional arguments order, it can be convenient to just use the generic *args, **kwargs signature for the child and force the additional param(s) to be passed as kwargs. Assuming you clearly document this in the docstring, it's still explicit enough (as far as I'm concerned, YMMV), and also avoids a lot of clutter (you can have a look at django.forms.Form for a concrete example of what I mean here).
So as always with "best practices" and other golden rules, you have to understand and weight the pros and cons wrt/ the concrete case at hand.
PS: just to make things clear, django's Form class signature makes perfect sense so I'm not ranting here - it's just one of those cases where there's no "beautiful" solution to the problem, period.
Aside obvious differences in code clarity, there might be a little difference in speed of calling the function, in this case method init().
If you can, define all necessary arguments with default values if you have some, in both methods, and pass them classically, and exclude ones you do not wish.
In this way you make the code clear and speed of calls stays the same.
If you need some micro-optimization, then use timeit to check what works faster.
I expect that one with the "x" added as an argument will perhaps be a winner.
Because getting to its value directly from local variables will be faster and kwargs dict() is smaller.
When you use "normal" arguments, they are automatically inserted into the functions local variables dictionary.
When you use *args and/or **kwargs they are additional tuple() and/or dict() added as new local variables. They are first created from the arguments you passed into the function call.
When you are passing them to a next function, they are extracted
to match that function's call signature. In both operations you lose a tiny bit of speed.
If you add removing something from the kwargs dictionary, ( x = kwargs.pop("x") ), you also lose some speed.
By observing both codes, it seems that their call speed would be equal. But you should check. If you do not need an extra 0.000001 seconds when initializing your instances, then both options are fine and just choose what you like most.
But again, if you are free to do it, and if it will not greatly impair the code's maintenance, define all arguments and their default values and pass them on one-by-one.
Implementing some Neural Network with tensorflow, I've faced a method which parameters have took my attention. I'm talking about tf.nn.sigmoid_cross_entropy_with_logits (Documentation here).
The first parameter it receives as first parameter _sentinel=None which, according to the documentation:
_sentinel: Used to prevent positional parameters. Internal, do not use.
I understand that by having this parameter, next ones have to be named instead of positional is this one don't have to be used, but my question is. In which cases does prevent positional parameters have some benefit? What is their main goal to use this? Because I could also run
tf.nn.sigmoid_cross_entropy_with_logits(None, my_labels, my_logits)
being all arguments positional. Anyway, I want to clarify that my question is not focused in TensorFlow, it's just the example that I have found.
Positional parameters couple the caller and receiver on the order of the parameters. It makes refactoring the order of the reciver's parameters more difficult.
For example, if I have
def foo(a, b, c):
do_stuff(a,b,c)
and I decide, for reasons, perhaps I want to make a partial function or whatever, that it would be better to have
def foo(b, a, c):
do_stuff(a,b,c)
But now I have callers in the wild and it would be very rude to change my contract, so I'm stuck.
Sandi Metz in Practical Object-Oriented Design in Ruby also addresses this. (I know this is python, but oop is oop)
When the code [is changed to use keyword arguments], it lost its dependency
on argument order but it gained a dependency on the names of the keys
in the [keyword arguments]. This change is healthy. The new dependency is
more stable than the old, and thus this code faces less risk of being
forced to change. Additionally, and perhaps unexpectedly, the [keywords]
provides one new, secondary benefit: The key names in the hash furnish
explicit documentation about the arguments. This is a byproduct of
using a hash but the fact that it is unintentional makes it no less
useful. Future maintainers of this code will be grateful for the
information.
Keyword arguments are also nice if you have a lot of parameters. Order is easy to get wrong. It may also make a nicer API in the opinion of the authors.
PEP-3102 also addresses this, but I find the rationale unsatisfying from the perspective of "why would I choose to design something like this"
The current Python function-calling paradigm allows arguments to be
specified either by position or by keyword. An argument can be filled
in either explicitly by name, or implicitly by position.
There are often cases where it is desirable for a function to take a
variable number of arguments. The Python language supports this using
the 'varargs' syntax (*name), which specifies that any 'left over'
arguments be passed into the varargs parameter as a tuple.
One limitation on this is that currently, all of the regular argument
slots must be filled before the vararg slot can be.
This is not always desirable. One can easily envision a function which
takes a variable number of arguments, but also takes one or more
'options' in the form of keyword arguments. Currently, the only way to
do this is to define both a varargs argument, and a 'keywords'
argument (**kwargs), and then manually extract the desired keywords
from the dictionary.
What is the use for keyword only parameters:
For some function, it is impossible to do otherwise (ex: print(a, b, end=''))
It prevents you from making silly mistakes, consider the following example:
# if it wasn't made with kw-only parameters, this would return 3
>>> sorted(3, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sorted expected 1 arguments, got 2
>>> sorted((1,2), reverse=True)
[2, 1]
It allows you to change things later:
# if
def sorted(iterable, reverse=False)
# becomes
def sorted(iterable, key=None, reverse=False)
# you can guarantee backwards compatibility
First, a caveat that I can't know the intention of the person who wrote that. However, I can offer reason why “prevent positional parameters” might be desirable.
It's often important that a parameter be keyword-only, that is, it must be used only by name. The parameter is not conceptually an input to the function's purpose; it's more a modifier (change the behaviour in this way), or an external resource (here is the log file to emit your messages to), etc.
For that reason, Python 3 now allows you to define, in the signature of the function, specific parameters as keyword-only parameters. The change is documented in PEP 3102 Keyword-only arguments along with rationale.
I have a function, and inside this function I should execute another function of 3 posibilities.
The simple way should be passing a parameter, doing a if-elif to exec function acording to that parameter, but I'd like to know it it's posible to pass the name of a funtion by params and use that, something like:
FunctionA(Param1, Param2, FunctionNameToUse)
....Code....
FunctionNameToUse(Value1, Value2)
....Code....
endFunctionA
Depending on how functionA is called, it will use one Function or another, I dont know if I explained well...
Thanks mates!
Functions themselves are first class objects in Python. So there's no need to pass the name: pass the actual function itself, then you can call it directly.
def func1(a, b):
pass
def func2(a, b):
pass
def functionA(val1, val2, func):
func(val1, val2)
# now call functionA passing in a function
functionA("foo", "bar", func1)
In python, functions are first-class values, which means they can be passed around as arguments, returned from functions, and so on.
So the approach you want to take should work (However you're not passing the name, you're passing the actual function).
Take a look at map and filter in the standard library, they do exactly this.
Yes, you can. As other have mentioned, functions are first-class objects, i.e. you can do everything you can think of to it (see What are "first class" objects? for more info).
So then, how does this really work when you pass in a function as parameter? If you have taken any ComSci class, you'd know that each process has a special area in memory allocated for its code; all of your functions are defined here. Whenever you call a function, the CPU would move the Program Counter (a special pointer that points to whatever code it is executing at the moment) to the beginning of that function's code, and go from there. Thus, after the function is loaded in memory, all that's needed to call it is its address. When you pass a function A's "name" as an argument to another function B in Python, what you are doing is really passing a reference (in C term, a pointer) that contains the address to the code that represents A in memory.
I have started to learn python, and I would like to ask you about something which I considered a little magic in this language.
I would like to note that before learning python I worked with PHP and there I haven't noticed that.
What's going on - I have noticed that some call constructors or methods in Python are in this form.
object.call(variable1 = value1, variable2 = value2)
For example, in FLask:
app.run(debug=True, threaded=True)
Is any reason for this convention? Or is there some semantical reason outgoing from the language fundamentals? I haven't seen something like that in PHP as often as in Python and because I'm really surprised. I'm really curious if there is some magic or it's only convention to read code easier.
These are called keyword arguments, and they're usually used to make the call more readable.
They can also be used to pass the arguments in a different order from the declared parameters, or to skip over some default parameters but pass arguments to others, or because the function requires keyword arguments… but readability is the core reason for their existence.
Consider this:
app.run(True, False)
Do you have any idea what those two arguments mean? Even if you can guess that the only two reasonable arguments are threading and debugging flags, how can you guess which one comes first? The only way you can do it is to figure out what type app is, and check the app.run method's docstring or definition.
But here:
app.run(debug=True, threaded=False)
It's obvious what it means.
It's worth reading the FAQ What is the difference between arguments and parameters?, and the other tutorial sections near the one linked above. Then you can read the reference on Function definitions for full details on parameters and Calls for full details on arguments, and finally the inspect module documentation on kinds of parameters.
This blog post attempts to summarize everything in those references so you don't have to read your way through the whole mess. The examples at the end should also serve to show why mixing up arguments and parameters in general, keyword arguments and default parameters, argument unpacking and variable parameters, etc. will lead you astray.
Specifying arguments by keyword often creates less risk of error than specifying arguments solely by position. Consider this function to compute loan payments:
def pmt(principal, interest, term):
return **something**;
When one tries to compute the amortization of their house purchase, it might be invoked thus:
payment = pmt(100000, 4.2, 360)
But it is difficult to see which of those values should be associated with which parameter. Without checking the documentation, we might think it should have been:
payment = pmt(360, 4.2, 100000)
Using keyword parameters, the call becomes self-documenting:
payment = pmt(principal=100000, interest=4.2, term=360)
Additionally, keyword parameters allow you to change the order of the parameters at the call site, and everything still works correctly:
# Equivalent to previous example
payment = pmt(term=360, interest=4.2, principal=100000)
See http://docs.python.org/2/tutorial/controlflow.html#keyword-arguments for more information.
They are arguments passed by keywords. There is no semantical difference between keyword arguments and positional arguments.
They are often used like "options", and provide a much more readable syntax for this circumstance. Think of this:
>>> sorted([2,-1,3], key=lambda x: x**2, reverse=True)
[3, 2, -1]
Versus(python2):
>>> sorted([2,-1,3], None, lambda x: x**2, True)
[3, 2, -1]
In this second example can you tell what's the meaning of None or True?
Note that in keyword only arguments, i.e. arguments that you can only specify using this syntax, were introduced in python3. In python2 any argument can be specified by position(except when using **kwargs but that's another issue).
There is no "magic".
A function can take:
Positional arguments (args)
Keyworded arguments (kwargs)
Always is this order.
Try this:
def foo(*args, **kwargs):
print args
print kwargs
foo(1,2,3,4,a=8,b=12)
Output:
(1, 2, 3, 4)
{'a': 8, 'b': 12}
Python stores the positional arguments in a tuple, which has to be immutable, and the keyworded ones in a dictionary.
The main utility of the convention is that it allows for setting certain inputs when there may be some defaults in between. It's particularly useful when a function has many parameters, most of which work fine with their defaults, but a few need to be set to other values for the function to work as desired.
example:
def foo(i1, i2=1, i3=3, i4=5):
# does something
foo(1,2,3,4)
foo(1,2,i4=3)
foo(1,i2=3)
foo(0,i3=1,i2=3,i4=5)
Out of curiosity is more desirable to explicitly pass functions to other functions, or let the function call functions from within. is this a case of Explicit is better than implicit?
for example (the following is only to illustrate what i mean)
def foo(x,y):
return 1 if x > y else 0
partialfun = functools.partial(foo, 1)
def bar(xs,ys):
return partialfun(sum(map(operator.mul,xs,ys)))
>>> bar([1,2,3], [4,5,6])
--or--
def foo(x,y):
return 1 if x > y else 0
partialfun = functools.partial(foo, 1)
def bar(fn,xs,ys):
return fn(sum(map(operator.mul,xs,ys)))
>>> bar(partialfun, [1,2,3], [4,5,6])
There's not really any difference between functions and anything else in this situation. You pass something as an argument if it's a parameter that might vary over different invocations of the function. If the function you are calling (bar in your example) is always calling the same other function, there's no reason to pass that as an argument. If you need to parameterize it so that you can use many different functions (i.e., bar might need to call many functions besides partialfun, and needs to know which one to call), then you need to pass it as an argument.
Generally, yes, but as always, it depends. What you are illustrating here is known as dependency injection. Generally, it is a good idea, as it allows separation of variability from the logic of a given function. This means, for example, that it will be extremely easy for you to test such code.
# To test the process performed in bar(), we can "inject" a function
# which simply returns its argument
def dummy(x):
return x
def bar(fn,xs,ys):
return fn(sum(map(operator.mul,xs,ys)))
>>> assert bar(dummy, [1,2,3], [4,5,6]) == 32
It depends very much on the context.
Basically, if the function is an argument to bar, then it's the responsibility of the caller to know how to implement that function. bar doesn't have to care. But consequently, bar's documentation has to describe what kind of function it needs.
Often this is very appropriate. The obvious example is the map builtin function. map implements the logic of applying a function to each item in a list, and giving back a list of results. map itself neither knows nor cares about what the items are, or what the function is doing to them. map's documentation has to describe that it needs a function of one argument, and each caller of map has to know how to implement or find a suitable function. But this arrangement is great; it allows you to pass a list of your custom objects, and a function which operates specifically on those objects, and map can go away and do its generic thing.
But often this arrangement is inappropriate. A function gives a name to a high level operation and hides the internal implementation details, so you can think of the operation as a unit. Allowing part of its operation to be passed in from outside as a function parameter exposes that it works in a way that uses that function's interface.
A more concrete (though somewhat contrived) example may help. Lets say I've implemented data types representing Person and Job, and I'm writing a function name_and_title for formatting someone's full name and job title into a string, for client code to insert into email signatures or on letterhead or whatever. It's obviously going to take a Person and Job. It could potentially take a function parameter to let the caller decide how to format the person's name: something like lambda firstname, lastname: lastname + ', ' + firstname. But to do this is to expose that I'm representing people's names with a separate first name and last name. If I want to change to supporting a middle name, then either name_and_title won't be able to include the middle name, or I have to change the type of the function it accepts. When I realise that some people have 4 or more names and decide to change to storing a list of names, then I definitely have to change the type of function name_and_title accepts.
So for your bar example, we can't say which is better, because it's an abstract example with no meaning. It depends on whether the call to partialfun is an implementation detail of whatever bar is supposed to be doing, or whether the call to partialfun is something that the caller knows about (and might want to do something else). If it's "part of" bar, then it shouldn't be a parameter. If it's "part of" the caller, then it should be a parameter.
It's worth noting that bar could have a huge number of function parameters. You call sum, map, and operator.mul, which could all be parameterised to make bar more flexible:
def bar(fn, xs,ys, g, h, i):
return fn(g(h(i,xs,ys))
And the way in which g is called on the output of h could be abstracted too:
def bar(fn, xs, ys, g, h, i, j):
return fn(j(g, h(i, xs, ys)))
And we can keep going on and on, until bar doesn't do anything at all, and everything is controlled by the functions passed in, and the caller might as well have just directly done what they want done rather than writing 100 functions to do it and passing those to bar to execute the functions.
So there really isn't a definite answer one way or the other that applies all the time. It depends on the particular code you're writing.