Making API from Python docstrings written in PEP8 - python

I've written a code in Python. I tried to be follow common guidelines about writing helpful comments at the beginnings of functions. My style is PEP8, e.g.
def __init__(self, f_name=None, list_=None, cut_list=None, n_events=None, description=None):
"""
Parse an LHCO or ROOT file into a list of Event objects.
It is possible to initialize an Events class without a LHCO file,
and later append events to the list.
Arguments:
f_name -- Name of an LHCO or ROOT file, including path
list_ -- A list for initalizing Events
cut_list -- Cuts applied to events and their acceptance
n_events -- Number of events to read from LHCO file
description -- Information about events
"""
I want to automatically generate a helpful API from my code. I've found a few options and was looking at Sphinx in particular. It seemed to do what I wanted (though I struggled to make it generate an API, rather than a manual for my program). The drawback, however, was that it has it's own expected style for docstrings:
"""
:param x: My parameter
:type x: Its type
"""
Is it really best for me to rewrite all my docstrings with this syntax? They produce a nice API, but I don't like them in the code, and it'll be time consuming if it turns out to be a bad idea. What is standard, best practice? Should I convert? If so, can something do it automatically for me?

The Sphinx default format for docstrings is really quite powerful and is definitely worth the time if you want to generate clean API documentation and if you need to review your own code in months, years. So yes, it is a good idea.
If you don't like the default Sphinx-ReST syntax, you could try writing your docstrings the way Numpy do, e.g.:
def func(arg1, arg2):
"""Summary line.
Extended description of function.
Parameters
----------
arg1 : int
Description of arg1
arg2 : str
Description of arg2
Returns
-------
bool
Description of return value
"""
return True
There's a Sphinx extension (Napoleon) which allows Sphinx to parse this style (or the Google style, which is even simpler).

I think that the Sphinx syntax is pretty lightweight (be glad it's not Javadoc) so in terms of pretty raw code it's not a serious disadvantage.
My IDE, PyCharm, automatically creates skeletons in the Sphinx style when I add a docstring to a function. So there's some developers who know a thing or two about Python (and who also like to push for PEP8 style in other areas a lot) and recommend Sphinx. PyCharm even has a type hinting system used for inference and type checking, which starts by checking the declarations in the docstring.
Here's a regex you can use to make the conversion automatically. Replace
^(\s+)(\w+) -- (.+)$
with
$1:param $2: $3\n$1:type $2:
where $n represents the nth group. Of course you will need to fill out the type yourself.

Related

Python: parse commandline

I'm writing a CLI application in python with is used by means of a rather elaborate commandline language. The idea is very similar to find(1) which arguably has the same property.
Currently, the parser is completely handwritten using a handmade EBNF description language. The problem is that this language is very awkward to use because I have to write everything as python structures. I also feel that my program is still way too bloated because of the parsing.
Is there any lib that features ease of use, and a true description language (input as string/document) for commandline parsing? From the syntax tree, I would like to directly map each item to a class instance. Naturally, I don't want a tokenizer, or at least the tokenizer must map straight from commandline arguments to tokens.
Thanks for all suggestions!
UPDATE: The whole point of my program is to generate objects and pass them through any number of filters (possibly unpure/effectful actions) that might or might not output the objects again, or might even output objects of another type. The general idea is obviously gleaned from find(1). An example commandline would be:
~/picdb.py -sqlselect 'select * from pics where dirname like "testdir%"' -tagged JoSo -updateFromFile [ -resx +300 -or -resX +200 -resY +500 ] -printfXml '<jpegfile><src>%fp</src><DateTimeOriginal>%ed</DateTimeOriginal><Manufacturer>%eM</Manufacturer><Model>%em</Model></jpegfile>%NL'
This is a very tricky problem...You can "bind" actions to commandline arguments using argparse quite easily (e.g. create a class, operate on a previously created class ...). Here's a silly example of that...(argument --foo creates an object, argument --bar modifies the object created by --foo).
from argparse import ArgumentParser,Action
class Foo(object):
def __init__(self,*args):
self.args=args
def __str__(self):
return str(self.args)
class FooAction(Action):
def __call__(self,parser,namespace,values,option_string=None):
setattr(namespace,self.dest,Foo(*values)) #Add Foo to the options...
class BarAction(Action):
def __call__(self,parser,namespace,values,option_string=None):
FooObj=getattr(namespace,'foo') #raises an error if foo isn't in namespace...
#In this way, BarAction is like a filter on the
#object created by foo.
FooObj.args=tuple(list(FooObj.args)+list(values)) #append to the list of args.
parser=ArgumentParser()
parser.add_argument('--foo',nargs='*',action=FooAction,help="Foo!")
parser.add_argument('--bar',nargs='*',action=BarAction,help="Bar! : Must be used after --foo")
namespace=parser.parse_args("--foo Hello World --bar Nice Day".split())
print (namespace)
print (namespace.foo)
However, this is a little different from yours in that -argument is not really possible with argparse, only -a or --argument. That may already be a deal breaker for you, I'm not sure...
The next difficulty is dealing with the brackets... [ and ]. If you can treat those as arguments to a different commandline option, you might be OK...You might be able to set up a second parser to parse out the inside portions -- but I've never tried anything like that before... (If anyone else has any ideas about how to deal with the brackets, I'd be very interested to hear them).
As far as optparse and getopt are concerned, I'm pretty sure that anything you can do with them, you can do with argparse, which is why I've left them out of the discussion.
There are al least three modules you could try; argparse, optparse (deprecated in 2.7) and getopt. See chapter 15 of the Python standard library manual.

How to document a method with parameter(s)?

How to document methods with parameters using Python's documentation strings?
EDIT:
PEP 257 gives this example:
def complex(real=0.0, imag=0.0):
"""Form a complex number.
Keyword arguments:
real -- the real part (default 0.0)
imag -- the imaginary part (default 0.0)
"""
if imag == 0.0 and real == 0.0: return complex_zero
...
Is this the convention used by most Python developers ?
Keyword arguments:
<parameter name> -- Definition (default value if any)
I was expecting something a little bit more formal such as
def complex(real=0.0, imag=0.0):
"""Form a complex number.
#param: real The real part (default 0.0)
#param: imag The imaginary part (default 0.0)
"""
if imag == 0.0 and real == 0.0: return complex_zero
...
Environment: Python 2.7.1
Since docstrings are free-form, it really depends on what you use to parse code to generate API documentation.
I would recommend getting familiar with the Sphinx markup, since it is widely used and is becoming the de-facto standard for documenting Python projects, in part because of the excellent readthedocs.org service. To paraphrase an example from the Sphinx documentation as a Python snippet:
def send_message(sender, recipient, message_body, priority=1) -> int:
"""
Send a message to a recipient.
:param str sender: The person sending the message
:param str recipient: The recipient of the message
:param str message_body: The body of the message
:param priority: The priority of the message, can be a number 1-5
:type priority: integer or None
:return: the message id
:rtype: int
:raises ValueError: if the message_body exceeds 160 characters
:raises TypeError: if the message_body is not a basestring
"""
This markup supports cross-referencing between documents and more. Note that the Sphinx documentation uses (e.g.) :py:attr: whereas you can just use :attr: when documenting from the source code.
Naturally, there are other tools to document APIs. There's the more classic Doxygen which uses \param commands but those are not specifically designed to document Python code like Sphinx is.
Note that there is a similar question with a similar answer in here...
Based on my experience, the numpy docstring conventions (PEP257 superset) are the most widely-spread followed conventions that are also supported by tools, such as Sphinx.
One example:
Parameters
----------
x : type
Description of parameter `x`.
Conventions:
PEP 257 Docstring Conventions
PEP 287 reStructuredText Docstring Format
Tools:
Epydoc: Automatic API Documentation Generation for Python
sphinx.ext.autodoc – Include documentation from docstrings
PyCharm has some nice support for docstrings
Update: Since Python 3.5 you can use type hints which is a compact, machine-readable syntax:
from typing import Dict, Union
def foo(i: int, d: Dict[str, Union[str, int]]) -> int:
"""
Explanation: this function takes two arguments: `i` and `d`.
`i` is annotated simply as `int`. `d` is a dictionary with `str` keys
and values that can be either `str` or `int`.
The return type is `int`.
"""
The main advantage of this syntax is that it is defined by the language and that it's unambiguous, so tools like PyCharm can easily take advantage from it.
python doc strings are free-form, you can document it in any way you like.
Examples:
def mymethod(self, foo, bars):
"""
Does neat stuff!
Parameters:
foo - a foo of type FooType to bar with.
bars - The list of bars
"""
Now, there are some conventions, but python doesn't enforce any of them. Some projects have their own conventions. Some tools to work with docstrings also follow specific conventions.
If you plan to use Sphinx to document your code, it is capable of producing nicely formatted HTML docs for your parameters with their 'signatures' feature. http://sphinx-doc.org/domains.html#signatures
The mainstream is, as other answers here already pointed out, probably going with the Sphinx way so that you can use Sphinx to generate those fancy documents later.
That being said, I personally go with inline comment style occasionally.
def complex( # Form a complex number
real=0.0, # the real part (default 0.0)
imag=0.0 # the imaginary part (default 0.0)
): # Returns a complex number.
"""Form a complex number.
I may still use the mainstream docstring notation,
if I foresee a need to use some other tools
to generate an HTML online doc later
"""
if imag == 0.0 and real == 0.0:
return complex_zero
other_code()
One more example here, with some tiny details documented inline:
def foo( # Note that how I use the parenthesis rather than backslash "\"
# to natually break the function definition into multiple lines.
a_very_long_parameter_name,
# The "inline" text does not really have to be at same line,
# when your parameter name is very long.
# Besides, you can use this way to have multiple lines doc too.
# The one extra level indentation here natually matches the
# original Python indentation style.
#
# This parameter represents blah blah
# blah blah
# blah blah
param_b, # Some description about parameter B.
# Some more description about parameter B.
# As you probably noticed, the vertical alignment of pound sign
# is less a concern IMHO, as long as your docs are intuitively
# readable.
last_param, # As a side note, you can use an optional comma for
# your last parameter, as you can do in multi-line list
# or dict declaration.
): # So this ending parenthesis occupying its own line provides a
# perfect chance to use inline doc to document the return value,
# despite of its unhappy face appearance. :)
pass
The benefits (as #mark-horvath already pointed out in another comment) are:
Most importantly, parameters and their doc always stay together, which brings the following benefits:
Less typing (no need to repeat variable name)
Easier maintenance upon changing/removing variable. There will never be some orphan parameter doc paragraph after you rename some parameter.
and easier to find missing comment.
Now, some may think this style looks "ugly". But I would say "ugly" is a subjective word. A more neutual way is to say, this style is not mainstream so it may look less familiar to you, thus less comfortable. Again, "comfortable" is also a subjective word. But the point is, all the benefits described above are objective. You can not achieve them if you follow the standard way.
Hopefully some day in the future, there will be a doc generator tool which can also consume such inline style. That will drive the adoption.
PS: This answer is derived from my own preference of using inline comments whenever I see fit. I use the same inline style to document a dictionary too.
Building upon the type-hints answer (https://stackoverflow.com/a/9195565/2418922), which provides a better structured way to document types of parameters, there exist also a structured manner to document both type and descriptions of parameters:
def copy_net(
infile: (str, 'The name of the file to send'),
host: (str, 'The host to send the file to'),
port: (int, 'The port to connect to')):
pass
example adopted from: https://pypi.org/project/autocommand/
Docstrings are only useful within interactive environments, e.g. the Python shell. When documenting objects that are not going to be used interactively (e.g. internal objects, framework callbacks), you might as well use regular comments. Here’s a style I use for hanging indented comments off items, each on their own line, so you know that the comment is applying to:
def Recomputate \
(
TheRotaryGyrator,
# the rotary gyrator to operate on
Computrons,
# the computrons to perform the recomputation with
Forthwith,
# whether to recomputate forthwith or at one's leisure
) :
# recomputates the specified rotary gyrator with
# the desired computrons.
...
#end Recomputate
You can’t do this sort of thing with docstrings.

How to know function return type and argument types?

While I am aware of the duck-typing concept of Python, I sometimes struggle with the type of arguments of functions, or the type of the return value of the function.
Now, if I wrote the function myself, I DO know the types. But what if somebody wants to use and call my functions, how is he/she expected to know the types?
I usually put type information in the function's docstring (like: "...the id argument should be an integer..." and "... the function will return a (string, [integer]) tuple.")
But is looking up the information in the docstring (and putting it there, as a coder) really the way it is supposed to be done?
Edit: While the majority of answers seem to direct towards "yes, document!" I feel this is not always very easy for 'complex' types. For example: how to describe concisely in a docstring that a function returns a list of tuples, with each tuple of the form (node_id, node_name, uptime_minutes) and that the elements are respectively a string, string and integer?
The docstring PEP documentation doesn't give any guidelines on that.
I guess the counterargument will be that in that case classes should be used, but I find python very flexible because it allows passing around these things using lists and tuples, i.e. without classes.
Well things have changed a little bit since 2011! Now there's type hints in Python 3.5 which you can use to annotate arguments and return the type of your function. For example this:
def greeting(name):
return 'Hello, {}'.format(name)
can now be written as this:
def greeting(name: str) -> str:
return 'Hello, {}'.format(name)
As you can now see types, there's some sort of optional static type checking which will help you and your type checker to investigate your code.
for more explanation I suggest to take a look at the blog post on type hints in PyCharm blog.
This is how dynamic languages work. It is not always a good thing though, especially if the documentation is poor - anyone tried to use a poorly documented python framework? Sometimes you have to revert to reading the source.
Here are some strategies to avoid problems with duck typing:
create a language for your problem domain
this will help you to name stuff properly
use types to represent concepts in your domain language
name function parameters using the domain language vocabulary
Also, one of the most important points:
keep data as local as possible!
There should only be a few well-defined and documented types being passed around. Anything else should be obvious by looking at the code: Don't have weird parameter types coming from far away that you can't figure out by looking in the vicinity of the code...
Related, (and also related to docstrings), there is a technique in python called doctests. Use that to document how your methods are expected to be used - and have nice unit test coverage at the same time!
Actually there is no need as python is a dynamic language, BUT if you want to specify a return value then do this
def foo(a) -> int: #after arrow type the return type
return 1 + a
But it won't really help you much. It doesn't raise exceptions in the same way like in staticly-typed language like java, c, c++. Even if you returned a string, it won't raise any exceptions.
and then for argument type do this
def foo(a: int) -> int:
return a+ 1
after the colon (:)you can specify the argument type.
This won't help either, to prove this here is an example:
def printer(a: int) -> int: print(a)
printer("hello")
The function above actually just returns None, because we didn't return anything, but we did tell it we would return int, but as I said it doesn't help. Maybe it could help in IDEs (Not all but few like pycharm or something, but not on vscode)
I attended a coursera course, there was lesson in which, we were taught about design recipe.
Below docstring format I found preety useful.
def area(base, height):
'''(number, number ) -> number #**TypeContract**
Return the area of a tring with dimensions base #**Description**
and height
>>>area(10,5) #**Example **
25.0
>>area(2.5,3)
3.75
'''
return (base * height) /2
I think if docstrings are written in this way, it might help a lot to developers.
Link to video [Do watch the video] : https://www.youtube.com/watch?v=QAPg6Vb_LgI
Yes, you should use docstrings to make your classes and functions more friendly to other programmers:
More: http://www.python.org/dev/peps/pep-0257/#what-is-a-docstring
Some editors allow you to see docstrings while typing, so it really makes work easier.
Yes it is.
In Python a function doesn't always have to return a variable of the same type (although your code will be more readable if your functions do always return the same type). That means that you can't specify a single return type for the function.
In the same way, the parameters don't always have to be the same type too.
For example: how to describe concisely in a docstring that a function returns a list of tuples, with each tuple of the form (node_id, node_name, uptime_minutes) and that the elements are respectively a string, string and integer?
Um... There is no "concise" description of this. It's complex. You've designed it to be complex. And it requires complex documentation in the docstring.
Sorry, but complexity is -- well -- complex.
Answering my own question >10 years later, there are now 2 things I use to manage this:
type hints (as already mentioned in other answers)
dataclasses, when parameter or return type hints become unwieldy/hard to read
As an example of the latter, say I have a function
def do_something(param:int) -> list[tuple[list, int|None]]:
...
return result
I would now rewrite using a dataclass, e.g. along the lines of:
from dataclasses import dataclass
#dataclass
class Stat:
entries: list
value: int | None = None
def do_something(param:int) -> list[Stat]:
...
return result
Yes, since it's a dynamically type language ;)
Read this for reference: PEP 257
Docstrings (and documentation in general). Python 3 introduces (optional) function annotations, as described in PEP 3107 (but don't leave out docstrings)

In Sphinx, is there a way to document parameters along with declaring them?

I prefer to document each parameter (as needed) on the same line where I declare the parameter in order to apply D.R.Y.
If I have code like this:
def foo(
flab_nickers, # a series of under garments to process
has_polka_dots=False,
needs_pressing=False # Whether the list of garments should all be pressed
):
...
How can I avoid repeating the parameters in the doc string and retain the parameter explanations?
I want to avoid:
def foo(
flab_nickers, # a series of under garments to process
has_polka_dots=False,
needs_pressing=False # Whether the list of garments should all be pressed
):
'''Foo does whatever.
* flab_nickers - a series of under garments to process
* needs_pressing - Whether the list of garments should all be pressed.
[Default False.]
Is this possible in python 2.6 or python 3 with some sort of decorator manipulation? Is there some other way?
I would do this.
Starting with this code.
def foo(
flab_nickers, # a series of under garments to process
has_polka_dots=False,
needs_pressing=False # Whether the list of garments should all be pressed
):
...
I would write a parser that grabs the function parameter definitions and builds the following:
def foo(
flab_nickers,
has_polka_dots=False,
needs_pressing=False,
):
"""foo
:param flab_nickers: a series of under garments to process
:type flab_nickers: list or tuple
:param has_polka_dots: default False
:type has_polka_dots: bool
:param needs_pressing: default False, Whether the list of garments should all be pressed
:type needs_pressing: bool
"""
...
That's some pretty straight-forward regex processing of the various arguments string patterns to fill in the documentation template.
A lot of good Python IDEs (for example PyCharm) understand the default Sphinx param notation and even flag vars/methods in the scope that IDE thinks does not conform to the declared type.
Note the extra comma in the code; that's just to make things consistent. It does no harm, and it might simplify things in the future.
You can also try and use the Python compiler to get a parse tree, revise it and emit the update code. I've done this for other languages (not Python), so I know a little bit about it, but don't know how well supported it is in Python.
Also, this is a one-time transformation.
The original in-line comments in the function definition don't really follow DRY because it's a comment, in an informal language, and unusable by any but the most sophisticated tools.
The Sphinx comments are closer to DRY because they're in the RST markup language, making them much easier to process using ordinary text-parsing tools in docutils.
It's only DRY if tools can make use of it.
Useful links:
https://pythonhosted.org/an_example_pypi_project/sphinx.html#function-definitions
http://sphinx-doc.org/domains.html#id1
Annotations are meant to partly address this problem in Python 3:
http://www.python.org/dev/peps/pep-3107/
I'm not sure if there has been any work in applying these to Sphinx yet.
You can't do that without a preprocessor, as comments don't exist for Python once the source has been compiled. To avoid repeating yourself, remove the comments and document the parameters only in the docstring, this is the standard way to document your arguments.

What is the proper way to comment functions in Python?

Is there a generally accepted way to comment functions in Python? Is the following acceptable?
#########################################################
# Create a new user
#########################################################
def add(self):
The correct way to do it is to provide a docstring. That way, help(add) will also spit out your comment.
def add(self):
"""Create a new user.
Line 2 of comment...
And so on...
"""
That's three double quotes to open the comment and another three double quotes to end it. You can also use any valid Python string. It doesn't need to be multiline and double quotes can be replaced by single quotes.
See: PEP 257
Use docstrings.
This is the built-in suggested convention in PyCharm for describing function using docstring comments:
def test_function(p1, p2, p3):
"""
test_function does blah blah blah.
:param p1: describe about parameter p1
:param p2: describe about parameter p2
:param p3: describe about parameter p3
:return: describe what it returns
"""
pass
Use a docstring, as others have already written.
You can even go one step further and add a doctest to your docstring, making automated testing of your functions a snap.
Use a docstring:
A string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.
All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings. Public methods (including the __init__ constructor) should also have docstrings. A package may be documented in the module docstring of the __init__.py file in the package directory.
String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to __doc__ ), but two types of extra docstrings may be extracted by software tools:
String literals occurring immediately after a simple assignment at the top level of a module, class, or __init__ method are called "attribute docstrings".
String literals occurring immediately after another docstring are called "additional docstrings".
Please see PEP 258 , "Docutils Design Specification" [2] , for a detailed description of attribute and additional docstrings...
The principles of good commenting are fairly subjective, but here are some guidelines:
Function comments should describe the intent of a function, not the implementation
Outline any assumptions that your function makes with regards to system state. If it uses any global variables (tsk, tsk), list those.
Watch out for excessive ASCII art. Having long strings of hashes may seem to make the comments easier to read, but they can be annoying to deal with when comments change
Take advantage of language features that provide 'auto documentation', i.e., docstrings in Python, POD in Perl, and Javadoc in Java
I would go for a documentation practice that integrates with a documentation tool such as Sphinx.
The first step is to use a docstring:
def add(self):
""" Method which adds stuff
"""
Read about using docstrings in your Python code.
As per the Python docstring conventions:
The docstring for a function or method should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable). Optional arguments should be indicated. It should be documented whether keyword arguments are part of the interface.
There will be no golden rule, but rather provide comments that mean something to the other developers on your team (if you have one) or even to yourself when you come back to it six months down the road.
I would go a step further than just saying "use a docstring". Pick a documentation generation tool, such as pydoc or epydoc (I use epydoc in pyparsing), and use the markup syntax recognized by that tool. Run that tool often while you are doing your development, to identify holes in your documentation. In fact, you might even benefit from writing the docstrings for the members of a class before implementing the class.
While I agree that this should not be a comment, but a docstring as most (all?) answers suggest, I want to add numpydoc (a docstring style guide).
If you do it like this, you can (1) automatically generate documentation and (2) people recognize this and have an easier time to read your code.
You can use three quotes to do it.
You can use single quotes:
def myfunction(para1,para2):
'''
The stuff inside the function
'''
Or double quotes:
def myfunction(para1,para2):
"""
The stuff inside the function
"""
The correct way is as follows:
def search_phone_state(phone_number_start,state,dataframe_path,separator):
"""
returns records whose phone numbers begin with a phone_number_start and are from state
"""
dataframe = pd.read_csv(filepath_or_buffer=dataframe_path, sep=separator, header=0)
return dataframe[(pd.Series(dataframe["Phone"].values.tolist()).str.startswith(phone_number_start, na="False"))& (dataframe["State"]==state)]
If you do:
help(search_phone_state)
It will print:
Help on function search_phone_state in module __main__:
search_phone_state(phone_number_start, state, dataframe_path, separator)
returns records whose phone numbers begin with a phone_number_start and are from state

Categories

Resources