I'm adding some (epydoc) documentation to a package I've written, and I'm coming across a lot of instances where I'm repeating myself a multitude of times.
def script_running(self, script):
    """Return if script is running

    @param script: Script to check whether running
    @return: B{True} if script is running, B{False} otherwise
    @rtype: C{bool}
    """
PEP257 says that:
One-liners are for really obvious cases.
and also
The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable).
Is there a general guideline or standard practice for when to draw the line between a one-liner (description) and full param/return fields?
Or when generating documentation should I include every applicable field for each function, regardless of how repetitive it seems?
Bonus question: Syntactically, what's the best way to describe the script param?
The general guideline you are looking for is right there in PEP257, in what you quoted; maybe you just need to see it in action.
Your function is a good candidate for a one-line docstring ("really obvious cases"):
def script_running(self, script):
    """Check if the script is running."""
Usually if you say that a function is checking something it means that it's going to return True or False, but if you like you could be more specific:
def script_running(self, script):
    """Return True if the script is running, False otherwise."""
Once again all in one line.
I would probably also change the name of your function, because there's no need to emphasize in the name what the function works on (a script). A function name should be short, sweet, and meaningful about what the function does. Probably I'd go with:
def check_running(self, script):
    """Return True if the script is running, False otherwise."""
Sometimes the function-naming imagination is worn out by all the coding, but you should still try to do your best.
For a multiline example, let me borrow a docstring from the Google guidelines:
def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
    """Fetches rows from a Bigtable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by big_table. Silly things may happen if
    other_silly_variable is not None.

    Args:
        big_table: An open Bigtable Table instance.
        keys: A sequence of strings representing the key of each table row
            to fetch.
        other_silly_variable: Another optional variable, that has a much
            longer name than the other args, and which does nothing.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {'Serak': ('Rigel VII', 'Preparer'),
         'Zim': ('Irk', 'Invader'),
         'Lrrr': ('Omicron Persei 8', 'Emperor')}

        If a key from the keys argument is missing from the dictionary,
        then that row was not found in the table.

    Raises:
        IOError: An error occurred accessing the bigtable.Table object.
    """
This could be one way to "summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable)".
You might also want to look at this example of a PyPI project that is meant to be documented with Sphinx.
My 2 cents: guidelines are meant to give you an idea about what you should and shouldn't do, but they are not strict rules that you have to follow blindly. So in the end, choose what you feel to be better.
I would like to clear up something that has been said in another answer about hitting the maximum line length with a docstring.
PEP8 tells you to "Limit all lines to a maximum of 79 characters", even though in practice everyone uses 80.
These are 80 characters:
--------------------------------------------------------------------------------
And this may be an edge case where a slightly long single sentence is all you really need:
def my_long_doc_function(arg1, arg2):
    """This docstring is long, it's a little looonger than the 80 characters
    limit.
    """
It's like a one-line docstring, meaning that it's for really obvious cases, but in your editor (with the 80-character limit) it spans multiple lines.
I think there is likely always some degree of repetition involved when adding extended markup to docstrings, e.g. epydoc/Sphinx fields.
I would also say this matter is subjective rather than objective. Explicit is better than implicit, and spelling things out would seem to follow the Zen of Python more.
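For comparison, here is a fully fielded Sphinx-style version of the original function (a sketch; the Script type is assumed for illustration). It is repetitive, but explicit:
def script_running(self, script):
    """Check if the script is running.

    :param script: the script to check
    :type script: Script
    :returns: True if the script is running, False otherwise
    :rtype: bool
    """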
Related
Background: I need to read the same key/value from a dictionary (exactly) twice.
Question: There are two ways, as shown below,
Method 1. Read it with the same key twice, e.g.,
sample_map = {'A': 1}
...
if sample_map.get('A', None) is not None:
    print("A's value in map is {}".format(sample_map.get('A')))
Method 2. Read it once and store it in a local variable, e.g.,
sample_map = {'A': 1}
...
ret_val = sample_map.get('A', None)
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
Which way is better? What are their Pros and Cons?
Note that I am aware that print() can naturally handle ret_val of None. This is a hypothetical example and I just use it for illustration purposes.
Under these conditions, I wouldn't use either. What you're really interested in is whether A is a valid key, and the KeyError (or lack thereof) raised by __getitem__ will tell you if it is or not.
try:
    print("A's value in map is {}".format(sample_map['A']))
except KeyError:
    pass
Of course, some would say there is too much code in the try block, in which case method 2 would be preferable.
try:
    ret_val = sample_map['A']
except KeyError:
    pass
else:
    print("A's value in map is {}".format(ret_val))
or the code you already have:
ret_val = sample_map.get('A')  # None is the default value for the second argument
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
There isn't any effective difference between the options you posted.
(See also: Python: List vs Dict for look up table.)
Lookups in a dict are about O(1). The same goes for a variable you have stored.
Efficiency is about the same. In this case, I would skip defining the extra variable, since not much else is going on.
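If you want numbers, a quick timeit sketch (results will vary by machine, but the two should be in the same ballpark):
import timeit

setup = "sample_map = {'A': 1}"

# Two lookups with the same key
print(timeit.timeit("sample_map.get('A'); sample_map.get('A')", setup=setup))

# One lookup stored in a local variable
print(timeit.timeit("v = sample_map.get('A'); v", setup=setup))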
But in a case like the one below, where there are a lot of dict lookups going on, I plan to refactor the code to make things more intelligible, since all of the lookups clutter and obfuscate the logic:
# At this point, assuming that these are floats is OK, since no thresholds had text values
if vname in paramRanges:
    # Making sure the variable is one that we have a threshold for;
    # we might care about it.
    # Don't check the equal case, because that won't matter.
    if float(tblChanges[vname][0]) < float(tblChanges[vname][1]):
        # Check lower tolerance
        # Distinction is important because tolerances are not always the same +/-
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(paramRanges[vname][2]):
            # Difference from default is greater than tolerance
            # vname: current value, default value, negative tolerance, tolerance units, change date
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][2]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
    else:
        # Check upper tolerance
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(paramRanges[vname][1]):
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][1]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
In most cases—if you can't just rewrite your code to use EAFP as chepner suggests, which you probably can for this example—you want to avoid repeated method calls.
The only real benefit of repeating the get is saving an assignment statement.
If your code isn't crammed in the middle of a complex expression, that just means saving one line of vertical space—which isn't nothing, but isn't a huge deal.
If your code is crammed in the middle of a complex expression, pulling the get out may force you to rewrite things a bit. You may have to, e.g., turn a lambda into a def, or turn a while loop with a simple condition into a while True: with an if …: break. Usually that's a sign that you, e.g., really wanted a def in the first place, but "usually" isn't "always". So, this is where you might want to violate the rule of thumb—but see the section at the bottom first.
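For example, the while-loop rewrite mentioned above might look like this (a sketch; process_a is a hypothetical function that may remove the key):
# Before: the repeated lookup lives in the loop condition
while sample_map.get('A') is not None:
    process_a(sample_map)  # hypothetical; may delete the 'A' key

# After: pull the lookup into a temporary with while True / if ... break
while True:
    value = sample_map.get('A')
    if value is None:
        break
    process_a(sample_map)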
On the other side…
For dict.get, the performance cost of repeating the method is pretty tiny, and unlikely to impact your code. But what if you change the code to take an arbitrary mapping object from the caller, and someone passes you, say, a proxy that does a get by making a database query or an RPC to a remote server?
For single-threaded code, calling dict.get with the same arguments twice in a row without doing anything in between is correct. But what if you're taking a dict passed by the caller, and the caller has a background thread also modifying the same dict? Then your code is only correct if you put a Lock or other synchronization around the two accesses.
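A minimal sketch of the synchronization that case would need (the lock is hypothetical, and the writer thread must hold the same one):
from threading import Lock

sample_map = {'A': 1}      # shared with a writer thread in this scenario
sample_map_lock = Lock()   # hypothetical lock the writer must also hold

with sample_map_lock:
    # Both reads now happen atomically with respect to the writer
    if sample_map.get('A') is not None:
        print("A's value in map is {}".format(sample_map.get('A')))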
Or, what if your expression was something that might mutate some state, or do something dangerous?
Even if nothing like this is ever going to be an issue in your code, unless that fact is blindingly obvious to anyone reading your code, they're still going to have to think about the possibility of performance costs and ToCToU races and so on.
And, of course, it makes at least two of your lines longer. Assuming you're trying to write readable code that sticks to 72 or 79 or 99 columns, horizontal space is a scarce resource, while vertical space is much less of a big deal. I think your second version is easier to scan than your first, even without all of these other considerations, but imagine making the expression, say, 20 characters longer.
In the rare cases where pulling the repeated value out of an expression would be a problem, you still often want to assign it to a temporary.
Unfortunately, up to Python 3.7, you usually can't. It's either clumsy (e.g., requiring an extra nested comprehension or lambda just to give you an opportunity to bind a variable) or impossible.
But in Python 3.8, PEP 572 assignment expressions handle this case.
if (sample := sample_map.get('A', None)) is not None:
    print("A's value in map is {}".format(sample))
I don't think this is a great use of an assignment expression (see the PEP for some better examples), especially since I'd probably write this the way chepner suggested… but it does show how to get the best of both worlds (assigning a temporary, and being embeddable in an expression) when you really need to.
I'm asking about situations where if a wrong type of argument is passed to the function, it could:
Blow up the whole thing.
Return unexpected results
Return nothing
For instance, the function below expects the argument name to be a string. It would throw an exception for any other type that doesn't have a startswith method.
def fruits(name):
    if name.startswith('O'):
        print('Is it Orange?')
There are other cases where a function could halt or cause damage to the system if execution proceeds without type-checking. Whenever there are a lot of functions or functions with a lot of arguments, type checking is tedious and makes the code unreadable. So, is there a standard for doing this? As to 'how to type check' - there are plenty of examples here on stackexchange, but I couldn't find any about where it would be appropriate to do so.
Another example would be:
def fruits(names):
    with open('important_file.txt', 'r+') as fil:
        for name in names:
            if name in fil:
                ...  # Edit the file
Here, if names is a string, each character in it will influence the editing of the file. If it is any other iterable, each element it yields will influence the editing. The two could produce different results.
So, when should we type-check an argument and should we not?
The answer off the top of my head would be: it depends where the input comes from.
If the functions are class methods that get invoked internally, or things like that, you can assume the inputs are valid, because you wrote them!
For example
def add(x, y):
    return x + y

def multiply(a, b):
    product = 0
    for i in range(a):
        product = add(product, b)
    return product
In my add function, I could check that there is a + operator for the parameters x and y. But since I wrote the multiply function, and that is the only function that uses add, it is safe to assume the inputs will be int because that's how I wrote it. Now that argument stands on shaky ground for large code bases where you (hopefully) have shared code, so you can't be sure people don't misuse your functions. But that's why you comment them well to describe the correct use of said function.
If it has to read from a file, get user input, etc, then you may want to do some validation first.
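In other words, validate at the boundary where untrusted data enters, then trust it internally. A minimal sketch (the function names are made up for illustration):
def read_user_age():
    # Boundary: user input is untrusted, so validate here
    raw = input("Age: ")
    try:
        age = int(raw)
    except ValueError:
        raise SystemExit("Age must be a number")
    if age < 0:
        raise SystemExit("Age must be non-negative")
    return age

def years_until_retirement(age):
    # Internal: age has already been validated, so no checks needed
    return max(0, 65 - age)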
I almost never do type checking in Python. In accordance with the Pythonic philosophy, I assume that I and other programmers are adults capable of reading the code (or at least the documentation) and using it properly. I assume that we test our code before we let it destroy something important. After all, in most cases if you do something wrong, you'll just see an error, and Python's error messages are quite informative most of the time.
The only occasion when I sometimes check types is when I want my function to behave differently depending on the argument's type. But although I sometimes feel compelled to do this, I don't consider it a good practice.
Most often it happens when my function iterates over a list of strings and I fear (or want) that a single string could be passed into it by accident; this won't throw an error right away because, unfortunately, a string is iterable too.
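A minimal sketch of the kind of check meant here, guarding against a lone string where an iterable of strings is expected:
def process_names(names):
    # A single string is iterable too, so reject it explicitly
    if isinstance(names, str):
        raise TypeError("expected an iterable of strings, got a single string")
    for name in names:
        print(name)
With this guard, process_names(['Alice', 'Bob']) works, while process_names('Alice') fails fast instead of silently iterating over characters.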
I have a long Python function of that structure:
def the_function(lots, of, arguments):
    return_value = None
    if some_important_condition:
        # a lot of stuff here
        return_value = "some value"
    else:
        # even more stuff here
        return_value = "some other value"
    return return_value
One problem is that both the if and the else block contain more than one screenful of code. It is easy to lose track of the indentation, or to have to scroll up to see which branch we are currently in.
One idea to improve this would be to split it up in several functions:
def case_true(lots, of, arguments):
    # a lot of stuff here
    return "some value"

def case_false(lots, of, arguments):
    # even more stuff here
    return "some other value"

def the_function(lots, of, arguments):
    return_value = None
    if some_important_condition:
        return_value = case_true(lots, of, arguments)
    else:
        return_value = case_false(lots, of, arguments)
    return return_value
but I am not sure whether this cleans things up, considering the argument juggling.
Another idea would be to use multiple exit points:
def the_function(lots, of, arguments):
    if some_important_condition:
        # a lot of stuff here
        return "some value"
    # even more stuff here
    return "some other value"
but several coding styles advise against multiple exit points, especially when they are screens apart.
The question is: what would be a preferred, pythonic way to make the original construct more read- and maintainable?
It's perfectly fine to have several exit points in a function. The requirement of a single exit point is an old convention, dating back to the days when programming languages didn't have exception handling and it made sense to have one exit point to centralize error handling. The existence of exceptions makes that old convention obsolete.
There are situations where multiple exit points are the way to go even under a single-exit-point policy. For example, guard clauses at the top of a function allow a quick return "if parameters are bad, or the bulk of the function is obviously inappropriate"; in such cases it makes a lot of sense to bail out at the top, before any meaningful work has been done. "Otherwise, you'll need huge if statements that cover the bulk of the function, giving you yet another level of indentation." (See the sketch below.)
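A sketch of the guard-clause pattern, with placeholder conditions:
def the_function(lots, of, arguments):
    # Guard clauses: bail out at the top before doing any real work
    if lots is None:
        raise ValueError("lots is required")
    if not arguments:
        return "nothing to do"
    # The bulk of the function follows at a single indentation level
    ...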
For completeness' sake, here's an explanation expanding on my point.
The golden rule is: a function can have several return points, as long as that enhances readability. And if your code is that massive, I'm afraid there will not be any difference between returning directly and copying into a variable that is returned at the end.
I think your problem is more about the design, level of abstraction and semantics of your routine.
These questions might help you:
Does the routine have functional cohesion? I.e., it does one and only one thing; not something like calculate revenues, print them, send them to the server, and go for a walk with the dog.
Does the function have more than 7 arguments? If so, most probably the level of abstraction of your routine is not appropriate.
It would help if you posted a little bit more information on the details of your routine (what it does, what it returns, what arguments it takes). It might be that you are better off using two classes for that...
But, as a general answer, I would say that you had better analyze the individual actions, factor them out into small functions with good cohesion, and turn your function into a sequential caller of these small functions rather than having it do all the work, as sketched below. And the approach of having only the two functions case_true and case_false is probably wrong, since you may very well have similar actions in both branches and end up coding them twice.
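A sketch of that "sequential caller" shape; all the helper names here are hypothetical, and only the truly divergent step remains behind the condition:
def the_function(lots, of, arguments):
    data = load_data(lots)                 # hypothetical helper
    data = apply_common_rules(data, of)    # shared work, written once
    if some_important_condition:
        data = handle_special_case(data)   # only the divergent part
    return render_result(data, arguments)  # hypothetical helper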
Specifically the ":int" part...
I assumed it somehow checked the type of the parameter at the time the function is called, and perhaps raised an exception in the case of a violation. But the following runs without problems:
def some_method(param: str):
    print("blah")

some_method(1)

def some_method(param: int):
    print("blah")

some_method("asdfaslkj")
In both cases "blah" is printed - no exception raised.
I'm not sure what the name of the feature is so I wasn't sure what to google.
EDIT: OK, so it's http://www.python.org/dev/peps/pep-3107/. I can see how it'd be useful in frameworks that utilize metadata. It's not what I assumed it was. Thanks for the responses!
FOLLOW-UP QUESTION - Any thoughts on whether it's a good idea or bad idea to define my functions as def some_method(param:int) if I really only can handle int inputs - even if, as pep 3107 explains, it's just metadata - no enforcement as I originally assumed? At least the consumers of the methods will see clearly what I intended. It's an alternative to documentation. Think this is good/bad/waste of time? Granted, good parameter naming (unlike my contrived example) usually makes it clear what types are meant to be passed in.
It's not used for anything much; it's just there for experimentation (you can read the annotations from within Python if you want, for example). They are called "function annotations" and are described in PEP 3107.
I wrote a library that builds on them to do things like type checking (and more; for example, you can map more easily from JSON to Python objects) called pytyp (more info), but it's not very popular... (I should also add that the type-checking part of pytyp is not at all efficient; it can be useful for tracking down a bug, but you wouldn't want to use it across an entire program.)
[Update: I would not recommend using function annotations in general (i.e., with no particular use in mind, just as docs) because (1) they might eventually get used in a way that you didn't expect and (2) the exact type of things is often not that important in Python. More exactly, it's not always clear how best to specify the type of something in a useful way; objects can be quite complex, and often only "parts" are used by any one function, with multiple classes implementing those parts in different ways... This is a consequence of duck typing; see the "more info" link for related discussion on how Python's abstract base classes could be used to tackle this.]
Function annotations are what you make of them.
They can be used for documentation:
def kinetic_energy(mass: 'in kilograms', velocity: 'in meters per second'):
    ...
They can be used for pre-condition checking:
def validate(func, locals):
    for var, test in func.__annotations__.items():
        value = locals[var]
        msg = 'Var: {0}\tValue: {1}\tTest: {2.__name__}'.format(var, value, test)
        assert test(value), msg

def is_int(x):
    return isinstance(x, int)

def between(lo, hi):
    def _between(x):
        return lo <= x <= hi
    return _between

def f(x: between(3, 10), y: is_int):
    validate(f, locals())
    print(x, y)
>>> f(0, 31.1)
Traceback (most recent call last):
...
AssertionError: Var: y Value: 31.1 Test: is_int
Also see http://www.python.org/dev/peps/pep-0362/ for a way to implement type checking.
I'm not experienced in Python, but I assume the point is to annotate/declare the parameter type that the method expects. Whether or not the expected type is rigidly enforced at runtime is beside the point.
For instance, consider:
intToHexString(param:int)
Although the language may technically allow you to call intToHexString("Hello"), it's not semantically meaningful to do so. Having the :int as part of the method declaration helps to reinforce that.
It's basically just used for documentation. When someone examines the method signature, they'll see that param is labelled as an int, which will tell them that the author of the method expected them to pass an int.
Because Python programmers use duck typing, this doesn't mean you have to pass an int, but it tells you the code is expecting something "int-like". So you'll probably have to pass something basically "numeric" in nature, that supports arithmetic operations. Depending on the method it may have to be usable as an index, or it may not.
However, because it's syntax and not just a comment, the annotation is visible to any code that wants to introspect it. This opens up the possibility of writing a typecheck decorator that can enforce strict type checking on arbitrary functions; this allows you to put the type checking logic in one place, and have each method declare which parameters it wants strictly type checked (by attaching a type annotation) with a minimum on syntax, in a way that is visible to client programmers who are browsing method definitions to find out the interface.
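A minimal sketch of such a decorator, assuming annotations are plain types (this is not a standard-library feature):
import functools
import inspect

def typecheck(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Only enforce annotations that are actual types
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    "{} must be {}, got {!r}".format(name, expected.__name__, value)
                )
        return func(*args, **kwargs)

    return wrapper

@typecheck
def greet(name: str, times: int):
    print(name * times)

greet("hi", 3)      # prints hihihi
# greet("hi", "3")  # would raise TypeError: times must be int, got '3'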
Or you could do other things with those annotations. No standardized meaning has yet been developed. Maybe if someone comes up with a killer feature that uses them and has huge adoption, then it'll one day become part of the Python language, but I suspect the flexibility of using them however you want will be too useful to ever do that.
You might also use the "-> returnType" notation to indicate what type the function returns.
def mul(a: int, b: int) -> None:
    print(a*b)
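Continuing the mul example above, a quick illustration of the duck-typing point: the annotations are not enforced, so anything that supports * works:
from fractions import Fraction

mul(2, 3)               # 6 -- the annotated case
mul(2.5, 4)             # 10.0 -- floats work despite the int annotation
mul(Fraction(1, 3), 3)  # 1 -- any "numeric" type with * works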
I prefer to document each parameter (as needed) on the same line where I declare the parameter in order to apply D.R.Y.
If I have code like this:
def foo(
    flab_nickers,          # a series of under garments to process
    has_polka_dots=False,
    needs_pressing=False   # Whether the list of garments should all be pressed
):
    ...
How can I avoid repeating the parameters in the doc string and retain the parameter explanations?
I want to avoid:
def foo(
    flab_nickers,          # a series of under garments to process
    has_polka_dots=False,
    needs_pressing=False   # Whether the list of garments should all be pressed
):
    '''Foo does whatever.

    * flab_nickers - a series of under garments to process
    * needs_pressing - Whether the list of garments should all be pressed.
      [Default False.]
    '''
Is this possible in Python 2.6 or Python 3 with some sort of decorator manipulation? Is there some other way?
I would do this.
Starting with this code.
def foo(
    flab_nickers,          # a series of under garments to process
    has_polka_dots=False,
    needs_pressing=False   # Whether the list of garments should all be pressed
):
    ...
I would write a parser that grabs the function parameter definitions and builds the following:
def foo(
    flab_nickers,
    has_polka_dots=False,
    needs_pressing=False,
):
    """foo

    :param flab_nickers: a series of under garments to process
    :type flab_nickers: list or tuple
    :param has_polka_dots: default False
    :type has_polka_dots: bool
    :param needs_pressing: default False, whether the list of garments should all be pressed
    :type needs_pressing: bool
    """
    ...
That's some pretty straightforward regex processing of the various argument-string patterns to fill in the documentation template.
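As a sketch, the comment-bearing parameter lines could be matched with something like this (the pattern is illustrative, not exhaustive):
import re

# name, optional "=default", optional trailing "# comment"
PARAM_LINE = re.compile(r'^\s*(\w+)\s*(?:=\s*([^,#]+?))?\s*,?\s*(?:#\s*(.*))?$')

line = "needs_pressing=False  # Whether the list of garments should all be pressed"
print(PARAM_LINE.match(line).groups())
# ('needs_pressing', 'False', 'Whether the list of garments should all be pressed')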
A lot of good Python IDEs (for example PyCharm) understand the default Sphinx param notation and will even flag variables and methods in scope that the IDE thinks do not conform to the declared type.
Note the extra comma in the code; that's just to make things consistent. It does no harm, and it might simplify things in the future.
You can also try to use the Python compiler to get a parse tree, revise it, and emit the updated code. I've done this for other languages (not Python), so I know a little bit about it, but I don't know how well supported it is in Python.
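In current Python, the standard ast module can do the parsing half of that, though note it throws comments away, so the comment text itself would still have to come from tokenize or a regex. A sketch:
import ast
import inspect

source = inspect.getsource(foo)  # assumes foo is defined in an importable module
tree = ast.parse(source)
func_def = tree.body[0]
print([a.arg for a in func_def.args.args])
# ['flab_nickers', 'has_polka_dots', 'needs_pressing']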
Also, this is a one-time transformation.
The original in-line comments in the function definition don't really follow DRY, because they're comments, in an informal language, and unusable by all but the most sophisticated tools.
The Sphinx comments are closer to DRY because they're in the RST markup language, making them much easier to process using ordinary text-parsing tools in docutils.
It's only DRY if tools can make use of it.
Useful links:
https://pythonhosted.org/an_example_pypi_project/sphinx.html#function-definitions
http://sphinx-doc.org/domains.html#id1
Annotations are meant to partly address this problem in Python 3:
http://www.python.org/dev/peps/pep-3107/
I'm not sure if there has been any work in applying these to Sphinx yet.
You can't do that without a preprocessor, as comments don't exist for Python once the source has been compiled. To avoid repeating yourself, remove the comments and document the parameters only in the docstring; this is the standard way to document your arguments.