I have a long Python function with this structure:
def the_function(lots, of, arguments):
    return_value = None
    if some_important_condition:
        # a lot of stuff here
        return_value = "some value"
    else:
        # even more stuff here
        return_value = "some other value"
    return return_value
One problem is that both the if and the else block contain more than one screenful of code. It is easy to lose track of the indentation, or to have to scroll up to see which branch you are currently in.
One idea to improve this would be to split it up into several functions:
def case_true(lots, of, arguments):
    # a lot of stuff here
    return "some value"

def case_false(lots, of, arguments):
    # even more stuff here
    return "some other value"

def the_function(lots, of, arguments):
    return_value = None
    if some_important_condition:
        return_value = case_true(lots, of, arguments)
    else:
        return_value = case_false(lots, of, arguments)
    return return_value
but I am not sure whether this cleans things up, considering the argument juggling.
Another idea would be to use multiple exit points:
def the_function(lots, of, arguments):
    if some_important_condition:
        # a lot of stuff here
        return "some value"
    # even more stuff here
    return "some other value"
but several coding styles advise against multiple exit points, especially when they are screens apart.
The question is: what would be a preferred, pythonic way to make the original construct more read- and maintainable?
It's perfectly fine to have several exit points in a function. The single-exit-point requirement is an old convention, dating back to the days when programming languages didn't have exception handling and it made sense to centralize error handling at a single exit point. The existence of exceptions makes that old convention obsolete.
There are situations where having multiple exit points is the way to go, even if you otherwise enforce a single-exit-point policy. For example, guard clauses at the top of a function return quickly "if parameters are bad, or the bulk of the function is obviously inappropriate"; in that case it makes a lot of sense to "bail out at the top, before any meaningful work has been done. Otherwise, you'll need huge if statements that cover the bulk of the function, giving you yet another level of indentation".
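As a minimal, invented illustration of such a guard clause (the function and its behaviour are made up for the example):

def average(values):
    # Guard clause: bail out at the top if the input is obviously
    # inappropriate, before any meaningful work has been done.
    if not values:
        return None

    # The bulk of the function stays at the top indentation level.
    return sum(values) / len(values)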
For completeness' sake, here's an explanation expanding on my point.
The golden rule: a function can have several return points, as long as that enhances readability. And if your code is as massive as you describe, I'm afraid there will be little practical difference between returning directly and copying the result into a variable that is returned at the end.
I think your problem is more about the design, level of abstraction and semantics of your routine.
These questions might help you:
Does the routine have functional cohesion? That is, does it do one and only one thing - not something like calculate revenues, print them, send them to the server, and go for a walk with the dog.
Does the function have more than 7 arguments? If so, most probably the level of abstraction of your routine is not appropriate.
It would help if you posted a little more information on the details of your routine (what it does, what it returns, what arguments it takes). It might be that you are better off using two classes for that...
But, as a general answer, I would say that you are better off analyzing the individual actions, factoring them out into small functions with good cohesion, and turning your function into a sequential caller of those small functions rather than having it do all the work itself (see the sketch below). The approach of having just the two functions case_true and case_false is probably wrong, since you may well have similar actions in both branches and would end up coding them twice.
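A rough sketch of that idea (all names here are invented; the helpers stand in for the screenfuls of work in the real function):

# Hypothetical helpers: each one holds a single, cohesive chunk of work.
def load_data(source):
    return list(source)

def transform(data, invert):
    # Work that the old if and else branches shared lives here once,
    # instead of being coded twice in case_true and case_false.
    return [-x for x in data] if invert else list(data)

def the_function(source, invert):
    data = load_data(source)
    return transform(data, invert)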
[Edit] changed return 0 to return. Side effects of being a Python n00b. :)
I'm defining a function in which I do some 20 lines of processing. Before processing, I need to check whether a certain condition is met; if so, I should bypass all processing. I have defined the function this way:
def test_function(self, inputs):
    if inputs == 0:
        <Display Message box>
        return
    <20 line logic here>
Note that the 20-line logic does not return any value, and I'm not using the 0 returned in the first 'if'.
I want to know if this is better than using the type of code below (in terms of performance, readability, or anything else), because the above method looks good to me as it has one less level of indentation:
def test_function(self, inputs):
    if inputs == 0:
        <Display Message box>
    else:
        <20 line logic here>
In general, it improves code readability to handle failure conditions as early as possible. Then the meat of your code doesn't have to worry about these, and the reader of your code doesn't have to consider them any more. In many cases you'd be raising exceptions, but if you really want to do nothing, I don't see that as a major problem even if you generally hew to the "single exit point" style.
But why return 0 instead of just return, since you're not using the value?
First, you can use return without anything after it; you don't have to force a return 0.
As for performance, this question seems to show that you won't notice any difference (unless you're really unlucky ;) ).
In this context, I think it's important to know why inputs can't be zero. Typically, I think the way most programs will handle this is to raise an exception if a bad value is passed. Then the exception can be handled (or not) in the calling routine.
You'll often see it written as "Better to ask forgiveness" as opposed to "Look before you leap". Of course, if you're often passing 0 into the function, then the try/except clause could get expensive (try is cheap, except is not).
If you're set on "looking before you leap", I would probably use the first form to keep indentation down.
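To make the two idioms concrete, here is a small invented sketch of each (division stands in for the real processing):

# "Look before you leap": check the bad case up front and bail out.
def lbyl(inputs):
    if inputs == 0:
        return None
    return 100 / inputs

# "Better to ask forgiveness": just try, and handle the failure;
# try is cheap, so this wins when the bad case is rare.
def eafp(inputs):
    try:
        return 100 / inputs
    except ZeroDivisionError:
        return None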
I doubt the performance is going to be significantly different in either case. Like you I would tend to lean more toward the first method for readability.
In addition to the smaller indentation (which doesn't really matter much IMO), it removes the need to read any further once inputs == 0.
In the second method one might assume that there is additional processing after the if/else statement, whereas the first one makes it obvious that the method is complete upon that condition.
It really just comes down to personal preference though, you will see both methods used in practice.
Your second example will return after it displays the message box in this case.
I prefer to "return early" as in my opinion it leads to better readability. But then again, most of my returns that happen prior to the actual end of the function tend to be more around short circuiting application logic if certain conditions are not met.
Sorry about the confusing title.
I recently started doing this in my project, and am wondering whether or not it's more efficient, and whether it's a terrible style to practice.
Here's an example from a Database interface:
def register(self, user, pw):
    """Register user/pw into the database"""
    if self.isStarted():
        raise Exceptions.Started
    hashed = hashlib.sha512(pw).hexdigest()
    self._db_cur.execute('''INSERT INTO PLAYERS (name, password)
                            values (?, ?)''', [user, hashed])
    self._db.commit()
I do it here with raising an exception, but I've done it in other places with a return.
I feel this lets the false cases exit the function at the top, instead of continuing down the function to see whether there is any more code for them to run.
I rarely see this in code I look at: is this a bad practice, or does it not yield the performance benefit I imagine it does?
To help clarify, what I'm used to is:
if (somethingTrue):
    runThis()
    thisToo()
    x = andThis()
    return x
return None
and what I've started to do, and am iffy on:
if (not somethingTrue):
    return None
runThis()
thisToo()
x = andThis()
return x
The latter seems to give the impression (especially in functions longer than 4 lines) that the code isn't part of a conditional, even though it effectively is. On the other hand, it looks nicer while still adhering to PEP-8, so I'm really in a toss-up about it.
I have a feeling this breaks something horribly sacred. Is this alright, or sacrilegious?
One part of the question is about style and best practice - therefore, IMHO, there is no 'correct' way.
In my opinion the nested version (no early return) comes from 'historic' programming languages like C, where all cleanup is done once, in the right place. An artificial example, just to show the point:
int f() {
    int result = 1;
    char * buffer = (char *)malloc(77);
    if(buffer!=NULL) {
        int const fd = open("/tmp/data.log", O_RDONLY);
        if(fd!=-1) {
            ssize_t const read_cnt = read(fd, buffer, 77);
            if(read_cnt!=77) {
                /* Do something: was not possible to read 77 bytes. */
                result = 0;
            }
            close(fd);
        }
        free(buffer);
    }
    return result;
}
Here it would not be correct to return anywhere other than the last line, because that could leak a resource.
When using only objects that are cleaned up completely in their destructor - or when there is no need to clean up resources because none were allocated - I prefer the 'short path' return. It makes things clearer: if the preconditions for the function are not met, there is no way to 'really' execute the function body. Also, you don't need as much indentation and it is easier to read.
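In Python the cleanup worry from the C example largely disappears once resources are managed with with blocks, so the short-path return stays safe. A minimal sketch (the function, file name, and sizes are invented to mirror the C example):

def read_header(path):
    # Guard clause: precondition not met, take the short path out.
    if not path:
        return None

    # The with statement closes the file on every exit path, so an
    # early return cannot leak the file handle.
    with open(path, 'rb') as f:
        data = f.read(77)
        if len(data) != 77:
            return None
        return data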
Performance: I made some tests; I was not able to measure a difference between the two ways. IMHO if you need to tune performance at this level, you might want to think about choosing another programming language. ;-)
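If you want to reproduce such a measurement, something along these lines with timeit will do (the two toy functions are invented stand-ins for the patterns discussed above):

import timeit

def early_return(x):
    if x == 0:
        return None
    return x * 2

def if_else(x):
    if x == 0:
        return None
    else:
        return x * 2

# Both calls are dominated by function-call overhead; the structural
# difference between them is lost in the noise.
print(timeit.timeit('early_return(5)', globals=globals(), number=1000000))
print(timeit.timeit('if_else(5)', globals=globals(), number=1000000))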
If you must do the check anyway (for correctness), then I think it's a fine idea to do it at the top.
There are cases where you can choose (1) to do a check to avoid unnecessary work, or (2) just skip the check and always do the work. In such cases, you might consider skipping the check if you think the not-doing-the-work case will be rare enough that the extra checks will cost more than doing unnecessary work from time to time (profile if you really want to be sure). Then again, it might not be clear which is the case; then I'd say a simple if statement is pretty cheap (assuming the check itself isn't expensive), so just do the check and don't worry too much unless you see performance problems. You can always do profiling later.
Edit: Based on your further example, it sounds like you had a different issue in mind. On that subject, I'd say you should generally put the case with the shortest code first, so you don't have something like
if positive_case:
    lots of stuff
    lots more stuff
    ...
else:
    whatever this corresponds to is now off the screen
Your technique is also a common way to avoid too much nesting. By exiting from the if, and leaving out the else, you can flatten
if error:
    raise exception
else:
    do more stuff
    if error:
        ...
to
if error:
    raise exception
do more stuff
if error:
    ...
I think, as you alluded to, PEP-8 actually mentions and recommends this technique. I'm not sure why you feel this might be sacrilegious. Many things are just personal preference, with pros and cons either way, and you're entitled to your own opinion on the tradeoffs.
I'm adding some (epydoc) documentation to a package I've written, and I'm coming across a lot of instances where I'm repeating myself a multitude of times.
def script_running(self, script):
    """Return if script is running

    @param script: Script to check whether running
    @return: B{True} if script is running, B{False} otherwise
    @rtype: C{bool}
    """
PEP257 says that:
One-liners are for really obvious cases.
and also
The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable).
Is there a general guideline or standard practice for when to draw the line between a one-liner (description) and full param/return fields?
Or when generating documentation should I include every applicable field for each function, regardless of how repetitive it seems?
Bonus question: Syntactically, what's the best way to describe the script param?
The general guideline you are looking for is right in PEP257 in what you quoted, maybe you just need to see it in action.
Your function is a good candidate for a one-line docstring ("really obvious cases"):
def script_running(self, script):
    """Check if the script is running."""
Usually if you say that a function is checking something it means that it's going to return True or False, but if you like you could be more specific:
def script_running(self, script):
    """Return True if the script is running, False otherwise."""
Once again all in one line.
I would probably also change the name of your function, because there's no need for the name to emphasize what the function works on (a script). A function name should be short, sweet, and meaningful about what the function does. I'd probably go with:
def check_running(self, script):
    """Return True if the script is running, False otherwise."""
Sometimes your imagination for function names is worn out by all the coding, but you should try to do your best anyway.
For a multiline example, let me borrow a docstring from the google guidelines:
def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
    """Fetches rows from a Bigtable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by big_table. Silly things may happen if
    other_silly_variable is not None.

    Args:
        big_table: An open Bigtable Table instance.
        keys: A sequence of strings representing the key of each table row
            to fetch.
        other_silly_variable: Another optional variable, that has a much
            longer name than the other args, and which does nothing.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {'Serak': ('Rigel VII', 'Preparer'),
         'Zim': ('Irk', 'Invader'),
         'Lrrr': ('Omicron Persei 8', 'Emperor')}

        If a key from the keys argument is missing from the dictionary,
        then that row was not found in the table.

    Raises:
        IOError: An error occurred accessing the bigtable.Table object.
    """
This could be one way to "summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable)".
You might also be interested in looking at this example of a PyPI project that is meant to be documented with Sphinx.
My 2 cents: guidelines are meant to give you an idea about what you should and shouldn't do, but they are not strict rules that you have to follow blindly. So, in the end, choose what you feel to be better.
I would like to clarify something that has been said in another answer about hitting the maximum line length with a docstring.
PEP 8 tells you to "Limit all lines to a maximum of 79 characters", even though in the end everyone uses 80.
These are 80 characters:
--------------------------------------------------------------------------------
And this may be an edge case where a single, slightly long sentence is all you really need:
def my_long_doc_function(arg1, arg2):
    """This docstring is long, it's a little looonger than the 80 characters
    limit.
    """
It is like a one-line docstring, meaning it is for really obvious cases, but in your editor (with the 80-character limit) it spans multiple lines.
I think there is likely always some degree of repetition involved when adding extended syntax for docstrings, i.e. epydoc/sphinx markup.
I would also say this matter is subjective rather than objective. Being explicit is better than being implicit, and would seem to follow the Zen of Python more closely.
I find myself writing the same argument checking code all the time for number-crunching:
def myfun(a, b):
    if a < 0:
        raise ValueError('a cannot be < 0 (was a=%s)' % a)
    # more if.. raise exception stuff here ...
    return a + b
Is there a better way? I was told not to use 'assert' for these things (though I don't see the problem, apart from not knowing the value of the variable that caused the error).
edit: To clarify, the arguments are usually just numbers and the error checking conditions can be complex, non-trivial and will not necessarily lead to an exception later, but simply to a wrong result. (unstable algorithms, meaningless solutions etc)
assert gets optimized away if you run with python -O (modest optimizations, but sometimes nice to have). One preferable alternative, if you have patterns that repeat often, may be to use decorators -- a great way to factor out repetition. E.g., say you have a zillion functions that must be called with arguments by position (not by keyword) and must have a non-negative first argument; then...:
def firstargpos(f):
    def wrapper(first, *args):
        if first < 0:
            raise ValueError(whateveryouwish)
        return f(first, *args)
    return wrapper
then you say something like:
@firstargpos
def myfun(a, b):
    ...
and the checks are performed in the decorator (or rather in the wrapper closure it returns), once and for all. So, the only tricky part is figuring out exactly what checks your functions need and how best to call the decorator(s) to express those (hard to say without seeing the set of functions you're defining and the set of checks each needs!-). Remember, DRY ("Don't Repeat Yourself") is close to the top spot among guiding principles in software development, and Python has reasonable support to allow you to implement DRY and avoid boilerplatey, repetitious code!-)
You don't want to use assert because your code can be run (and is by default on some systems) in such a way that assert lines are not checked and do not raise errors (-O command line flag).
If you're using a lot of variables that are all supposed to have those same properties, why not subclass whatever type you're using and add that check to the class itself? Then when you use your new class, you know you never have an invalid value, and don't have to go checking for it all over the place.
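A rough sketch of that suggestion (the class name and check are invented): the validation moves into the type's constructor, so callers can never hold an invalid value in the first place.

class NonNegative(float):
    """A float subclass that rejects negative values when constructed."""
    def __new__(cls, value):
        if value < 0:
            raise ValueError('value cannot be < 0 (was %s)' % value)
        return super(NonNegative, cls).__new__(cls, value)

def myfun(a, b):
    a = NonNegative(a)   # the check happens once, here
    return a + b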
I'm not sure if this will answer your question, but it strikes me that checking a lot of arguments at the start of a function isn't very pythonic.
What I mean by this is that it is the assumption of most pythonistas that we are all consenting adults, and we trust each other not to do something stupid. Here's how I'd write your example:
def myfun(a, b):
    '''a cannot be < 0'''
    return a + b
This has three distinct advantages. First off, it's concise; there's really no extra code doing anything unrelated to what you're actually trying to get done. Second, it puts the information exactly where it belongs, in help(myfun), where pythonistas are expected to look for usage notes. Finally, is a negative value for a really an error? Although you might think so, unless something will definitely break if a is negative (here it probably won't), maybe letting it slip through and cause an error further up the call stack is wiser. After all, if a + b is in error, it raises an exception that gets passed up the call stack, and the behavior is still pretty much the same.
For debugging, it is often useful to tell if a particular function is higher up on the call stack. For example, we often only want to run debugging code when a certain function called us.
One solution is to examine all of the stack entries higher up, but if this is in a function deep in the stack that is called repeatedly, this leads to excessive overhead. The question is to find a method that lets us determine whether a particular function is higher up on the call stack in a way that is reasonably efficient.
Similar
Obtaining references to function objects on the execution stack from the frame object? - This question focuses on obtaining the function objects, rather than determining if we are in a particular function. Although the same techniques could be applied, they may end up being extremely inefficient.
Unless the function you're aiming for does something very special to mark "one instance of me is active on the stack" (IOW: if the function is pristine and untouchable and can't possibly be made aware of this peculiar need of yours), there is no conceivable alternative to walking frame by frame up the stack until you hit either the top (and the function is not there) or a stack frame for your function of interest. As several comments to the question indicate, it's extremely doubtful whether it's worth striving to optimize this. But, assuming for the sake of argument that it was worthwhile...:
Edit: the original answer (by the OP) had many defects, but some have since been fixed, so I'm editing to reflect the current situation and why certain aspects are important.
First of all, it's crucial to use try/except, or with, in the decorator, so that ANY exit from a function being monitored is properly accounted for, not just normal ones (the original version of the OP's own answer accounted only for normal returns).
Second, every decorator should ensure it keeps the decorated function's __name__ and __doc__ intact -- that's what functools.wraps is for (there are other ways, but wraps makes it simplest).
Third, just as crucial as the first point, a set, which was the data structure originally chosen by the OP, is the wrong choice: a function can be on the stack several times (direct or indirect recursion). We clearly need a "multi-set" (also known as "bag"), a set-like structure which keeps track of "how many times" each item is present. In Python, the natural implementation of a multiset is as a dict mapping keys to counts, which in turn is most handily implemented as a collections.defaultdict(int).
Fourth, a general approach should be threadsafe (when that can be accomplished easily, at least;-). Fortunately, threading.local makes it trivial, when applicable -- and here, it should surely be (each stack having its own separate thread of calls).
Fifth, an interesting issue was broached in some comments (noting how badly the decorators offered in some answers play with other decorators): the monitoring decorator appears to have to be the LAST (outermost) one, otherwise the checking breaks. This comes from the natural but unfortunate choice of using the function object itself as the key into the monitoring dict.
I propose to solve this by a different choice of key: make the decorator take a (string, say) identifier argument that must be unique (in each given thread) and use the identifier as the key into the monitoring dict. The code checking the stack must of course be aware of the identifier and use it as well.
At decorating time, the decorator can check the uniqueness property (by using a separate set). The identifier may be left to default to the function name (so it's only explicitly required to keep the flexibility of monitoring homonymous functions in the same namespace); the uniqueness property may be explicitly renounced when several monitored functions are to be considered "the same" for monitoring purposes (this may be the case if a given def statement is meant to be executed multiple times in slightly different contexts to make several function objects that the programmer wants to consider "the same function" for monitoring purposes). Finally, it should be possible to optionally revert to the "function object as identifier" for those rare cases in which further decoration is KNOWN to be impossible (since in those cases it may be the handiest way to guarantee uniqueness).
So, putting these many considerations together, we could have (including a threadlocal_var utility function that will probably already be in a toolbox module of course;-) something like the following...:
import collections
import functools
import threading

threadlocal = threading.local()

def threadlocal_var(varname, factory, *a, **k):
    v = getattr(threadlocal, varname, None)
    if v is None:
        v = factory(*a, **k)
        setattr(threadlocal, varname, v)
    return v

def monitoring(identifier=None, unique=True, use_function=False):
    def inner(f):
        assert (not use_function) or (identifier is None)
        # Bind the key to a local name so we don't rebind the enclosing
        # `identifier` argument from inside this nested scope.
        if identifier is None:
            ident = f if use_function else f.__name__
        else:
            ident = identifier
        if unique:
            monitored = threadlocal_var('uniques', set)
            if ident in monitored:
                raise ValueError('Duplicate monitoring identifier %r' % ident)
            monitored.add(ident)
        @functools.wraps(f)
        def wrapper(*a, **k):
            # Look up this thread's counts dict at call time, so each
            # thread tracks its own stack of calls.
            counts = threadlocal_var('counts', collections.defaultdict, int)
            counts[ident] += 1
            try:
                return f(*a, **k)
            finally:
                counts[ident] -= 1
        return wrapper
    return inner
I have not tested this code, so it might contain some typo or the like, but I'm offering it because I hope it does cover all the important technical points I explained above.
Is it all worth it? Probably not, as previously explained. However, I think along the lines of "if it's worth doing at all, then it's worth doing right";-).
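For completeness, here is an (equally untested) sketch of how the above might be used; the is_active helper and the example functions are my own additions, and simply read the same per-thread counts dict:

def is_active(identifier):
    # "On the stack" means the wrapper has been entered more times than
    # it has exited in the current thread.
    counts = threadlocal_var('counts', collections.defaultdict, int)
    return counts.get(identifier, 0) > 0

@monitoring()
def outer():
    helper()

def helper():
    if is_active('outer'):
        print('outer is somewhere up the call stack')

outer()    # prints the message
helper()   # prints nothing: outer is not active here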
I don't really like this approach, but here's a fixed-up version of what you were doing:
from collections import defaultdict
import threading

functions_on_stack = threading.local()

def record_function_on_stack(f):
    def wrapped(*args, **kwargs):
        if not getattr(functions_on_stack, "stacks", None):
            functions_on_stack.stacks = defaultdict(int)
        functions_on_stack.stacks[wrapped] += 1

        try:
            result = f(*args, **kwargs)
        finally:
            functions_on_stack.stacks[wrapped] -= 1
            if functions_on_stack.stacks[wrapped] == 0:
                del functions_on_stack.stacks[wrapped]

        return result

    wrapped.orig_func = f
    return wrapped

def function_is_on_stack(f):
    return f in functions_on_stack.stacks

def nested():
    if function_is_on_stack(test):
        print("nested")

@record_function_on_stack
def test():
    nested()

test()
This handles recursion, threading and exceptions.
I don't like this approach for two reasons:
It doesn't work if the function is decorated further: this must be the final decorator.
If you're using this for debugging, it means you have to edit code in two places to use it; one to add the decorator, and one to use it. It's much more convenient to just examine the stack, so you only have to edit code in the code you're debugging.
A better approach would be to examine the stack directly (possibly as a native extension for speed), and if possible, find a way to cache the results for the lifetime of the stack frame. (I'm not sure if that's possible without modifying the Python core, though.)
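For comparison, a rough sketch of examining the stack directly with sys._getframe (function names are invented); it needs no edits to the function being looked for, though note that if that function is itself decorated, the frames on the stack belong to the wrapper rather than to its original code object:

import sys

def is_on_stack(func):
    """Return True if func's code object appears in any frame above the caller."""
    code = func.__code__
    frame = sys._getframe(1)   # start from the caller's frame
    while frame is not None:
        if frame.f_code is code:
            return True
        frame = frame.f_back
    return False

def helper():
    return is_on_stack(target)

def target():
    return helper()

print(target())   # True: target is on the stack when helper checks
print(helper())   # False: called directly, target is not on the stack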