How to find undocumented methods in my code? - python

I am writing documentation for a project and I would like to make sure I did not miss any method. The code is written in Python and I am using PyCharm as an IDE.
Basically, I would need a REGEX to match something like:
def method_name(with, parameters):
    someVar = something()
    ...
but it should NOT match:
def method_name(with, parameters):
    """ The doc string """
    ...
I tried using PyCharm's regex search feature with the pattern ):\s*[^"'], so it would match any line after the : that doesn't start with " or ' after whitespace, but it doesn't work. Any idea why?

You mentioned you were using PyCharm: there is an inspection, "Missing, empty, or incorrect docstring", that you can enable, and it will do that for you.
Note that you can then change the severity for it to show up more or less prominently.

There is a tool called pydocstyle which checks if all classes, functions, etc. have properly formatted docstrings.
Example from the README:
$ pydocstyle test.py
test.py:18 in private nested class `meta`:
        D101: Docstring missing
test.py:27 in public function `get_user`:
        D300: Use """triple double quotes""" (found '''-quotes)
test.py:75 in public function `init_database`:
        D201: No blank lines allowed before function docstring (found 1)
I don't know about PyCharm, but pydocstyle can, for example, be integrated in Vim using the Syntastic plugin.
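To get a feel for what it flags, a toy file like the one below (my own example, not from the README) should be reported for the second function's missing docstring:
"""A small module used only to try out pydocstyle."""

def documented():
    """Return None, and say so."""
    return None

def undocumented():  # pydocstyle should complain about the missing docstring here
    return None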

I don't know python, but I do know my regex.
And your regex has issues. First of all, as comments have mentioned, you may have to escape the closing parenthesis. Secondly, you don't match the new line following the function declaration. Finally, you look for single or double quotations at the START of a line, yet the start of a line contains whitespace.
I was able to match your sample file with \):\s*\n\s*["']. This is a multiline regex. Not all programs are able to match multiline regex. With grep, for example, you'd have to use this method.
A quick explanation of what this regex matches: it looks for a closing parenthesis followed by a colon. Any amount of optional whitespace may follow that. Then there should be a newline followed by any amount of whitespace (indentation, in this case). Finally, there must be a single or double quote. Note that this matches functions that do have docstrings; you'd want to invert it to find those without, as in the sketch below.
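If you'd rather run the check outside PyCharm, here is a minimal Python sketch of that inverted match (the file name and the exact pattern are my own, not from the answer above):
import re

# A def line whose following line does not start with a quote character,
# i.e. a function that appears to lack a docstring.
pattern = re.compile(r'^[ \t]*def\s+\w+\s*\([^)]*\)\s*:[ \t]*\n(?![ \t]*["\'])',
                     re.MULTILINE)

with open("example.py") as f:  # hypothetical file to check
    source = f.read()

for match in pattern.finditer(source):
    line_no = source.count("\n", 0, match.start()) + 1
    print("possible missing docstring at line", line_no)
It will not catch every case (nested parentheses in default arguments, comments before the docstring), which is why the inspection-based tools above are the safer route.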

In case PyCharm is not available, there is a little tool called ckdoc written in Python 3.5.
Given one or more files, it finds modules, classes and functions without a docstring. It doesn't search in imported built-in or external libraries – it only considers objects defined in files residing in the same folder as the given file, or subfolders of that folder.
Example usage (after removing some docstrings)
> ckdoc/ckdoc.py "ckdoc/ckdoc.py"
ckdoc/ckdoc.py
  module
    ckdoc
  function
    Check.documentable
    anykey_defaultdict.__getitem__
    group_by
    namegetter
  type
    Check
There are cases when it doesn't work. One such case is when using Anaconda with modules. A possible workaround in that case is to use ckdoc from the Python shell: import the necessary modules and then call the check function.
> import ckdoc, main
> ckdoc.check(main)
/tmp/main.py
  module
    main
  function
    main
/tmp/custom_exception.py
  type
    CustomException
  function
    CustomException.__str__
False
The check function returns True if there are no missing docstrings.
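If neither PyCharm nor ckdoc is available, a rough do-it-yourself sketch with the standard inspect module covers the common cases (it does not recurse into methods, and the module name main is just a placeholder):
import inspect

def missing_docstrings(module):
    """Return names of functions and classes defined in module that lack a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module):
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # only consider objects actually defined in this module
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if not inspect.getdoc(obj):
            missing.append(name)
    return missing

import main  # placeholder: the module you want to check
print(missing_docstrings(main))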

Related

Importing custom packages in python [duplicate]

Basically when I have a python file like:
python-code.py
and use:
import (python-code)
the interpreter gives me syntax error.
Any ideas on how to fix it? Are dashes illegal in python file names?
You should check out PEP 8, the Style Guide for Python Code:
Package and Module Names: Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Since module names are mapped to file names, and some file systems are case insensitive and truncate long names, it is important that module names be chosen to be fairly short -- this won't be a problem on Unix, but it may be a problem when the code is transported to older Mac or Windows versions, or DOS.
In other words: rename your file :)
One other thing to note in your code is that import is not a function, so import(python-code) should be import python-code, which, as some have already mentioned, is interpreted as "import python minus code", not what you intended. If you really need to import a file with a dash in its name, you can do the following:
python_code = __import__('python-code')
But, as also mentioned above, this is not really recommended. You should change the filename if it's something you control.
TLDR
Dashes are not illegal but you should not use them for 3 reasons:
You need special syntax to import files with dashes
Nobody expects a module name with a dash
It's against the recommendations of the Python Style Guide
If you definitely need to import a file name with a dash the special syntax is this:
module_name = __import__('module-name')
Curious about why we need special syntax?
The reason for the special syntax is that when you write import somename you're creating a module object with identifier somename (so you can later use it with e.g. somename.funcname). Of course module-name is not a valid identifier and hence the special syntax that gives a valid one.
You don't get why module-name is not valid identifier?
Don't worry -- I didn't either. Here's a tip to help you: Look at this python line: x=var1-var2. Do you see a subtraction on the right side of the assignment or a variable name with a dash?
PS
Nothing original in my answer except including what I considered to be the most relevant bits of information from all other answers in one place
The problem is that python-code is not an identifier. The parser sees this as python minus code. Of course this won't do what you're asking. You will need to use a filename that is also a valid python identifier. Try replacing the - with an underscore.
On Python 3 use import_module:
from importlib import import_module
python_code = import_module('python-code')
More generally,
import_module('package.subpackage.module')
You could probably import it through some __import__ hack, but if you don't already know how, you shouldn't. Python module names should be valid variable names ("identifiers") -- that means if you have a module foo_bar, you can use it from within Python (print foo_bar). You wouldn't be able to do so with a weird name (print foo-bar -> syntax error).
Although proper file naming is the best course, if python-code is not under our control, a hack using __import__ is better than copying, renaming, or otherwise messing around with other authors' code. However, I tried and it didn't work unless I renamed the file adding the .py extension. After looking at the doc to derive how to get a description for .py, I ended up with this:
import imp

python_code_file = open("python-code")
try:
    # the description tuple tells imp to treat the file as '.py' source
    python_code = imp.load_module('python_code', python_code_file, './python-code', ('.py', 'U', 1))
finally:
    python_code_file.close()
It created a new file python-codec on the first run.
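On current Python 3, where the imp module is deprecated, roughly the same trick can be done with importlib.util; this is only a sketch, and it assumes the file has been given a .py extension as described above:
import importlib.util

# Load ./python-code.py under the valid identifier "python_code".
spec = importlib.util.spec_from_file_location("python_code", "./python-code.py")
python_code = importlib.util.module_from_spec(spec)
spec.loader.exec_module(python_code)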

Vim: change a function's call-signature througout the code-base

I'm slowly turning Vim into my IDE-of-choice, with ctags and static-analysis/autocompletion/etc plugins (e.g. vim-jedi, youcompleteme, etc). But I haven't found anything that can do one specific task:
Say I have a function (I'll use Python here):
def my_function(outFile, foo, bar):
    outFile.write(foo[bar])
Later I change its signature so the outFile positional-argument is a named one:
def my_function(foo, bar, outFile=None):
    if outFile is None:
        outFile = getDefaultOutfile()
    outFile.write(foo[bar])
Now I want to change all of the old calls, throughout the entire codebase:
my_function(oF, f, b)
to
my_function(f, b, outFile=oF)
Is there an easy way to do this in Vim (or other Linux utils e.g. sed)? I know PyCharm etc can do this, but I'm not intending to jump ship just yet.
You can do this with the following regex substitution:
:%s/\vmy_function\((\w*), (\w*), (\w*)\)/my_function(\2, \3, outFile=\1)/g
Read up on vimregex, specifically capturing groups. Basically, whatever (\w*) matches will be saved and named \1 (then \2, \3, etc.) for use in the replacement later.
It's worth noting that this works for your example, but will not work if there are extra or missing spaces. To make it more robust, you could change it to:
:%s/\vmy_function\((\w*),\s*(\w*),\s*(\w*)\)/my_function(\2, \3, outFile=\1)/g
As an alternative to find (described in an earlier comment), one can also use grep to open the files containing the desired function:
vim `grep -l 'my_function' *py`
The files will be loaded in different buffers. Then use a general buffer replacement:
bufdo %s/\(my_function\)(.*)/\1(f, b, outFile=oF)/gc
The c flag here is optional, but I would recommend it for this particular replacement.
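If you would rather do the rewrite outside Vim altogether, here is a rough Python sketch using re.sub over the tree; it assumes simple single-line calls with three bare positional arguments (no keywords, no nested parentheses), so review the resulting diff:
import pathlib
import re

# Old-style calls: exactly three positional arguments with no '=' in them,
# so the def line and already-converted calls are left untouched.
call = re.compile(r'my_function\(\s*([^,()=]+?)\s*,\s*([^,()=]+?)\s*,\s*([^,()=]+?)\s*\)')

for path in pathlib.Path(".").rglob("*.py"):
    text = path.read_text()
    new_text = call.sub(r'my_function(\2, \3, outFile=\1)', text)
    if new_text != text:
        path.write_text(new_text)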

Difference between comments in Python, # and """

Starting to program in Python, I see some scripts with comments using # and """ comments """.
What is the difference between these two ways to comment?
The best thing would be to read PEP 8 -- Style Guide for Python Code, but since it is longish, here is a three-liner:
Comments start with # and are not part of the code.
A string (delimited by """ """) is actually called a docstring and is used in special places for defined purposes (briefly: as the first thing in a module or function, describing that module or function); it is actually accessible in the code, so it is part of the program, not a comment.
Triple quotes are a way to create a multi-line string and/or comment:
"""
Descriptive text here
"""
Without assigning it to a variable, it is a no-op that some versions of Python will completely ignore.
PEP 8 suggests when to use block comment/strings, and I personally follow a format like this:
Example Google Style Python Docstrings
The string at the start of a module, class or function is a docstring (see PEP 257 -- Docstring Conventions) that can be accessed with some_obj.__doc__ and is used in help(...). Whether you use "Returns 42" or """Returns 42""" is a matter of style; using the latter is more common, even for single-line documentation.
A # comment is just that, a comment. It cannot be accessed at runtime.
The # means the rest of the line is treated as a comment, while whatever is in between the two """ quotes is treated as one block, so you can write comments spanning multiple lines.
As the user in a previous answer stated, the triple quotes are used to comment multiple lines of code while the # only comments one line.
Look out though, because you can use the triple quotes for docstrings and such.
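A short illustration of the difference (the function here is a made-up example):
def answer():
    """Return the number 42."""  # docstring: kept by the interpreter
    # a plain comment: discarded by the parser, invisible at runtime
    return 42

print(answer.__doc__)  # -> Return the number 42.
help(answer)           # shows the same docstring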

Automated way to switch from epydoc's docstring formatting to sphinx docstring formatting?

I've got a project which I documented using epydoc. Now I'm trying to switch to Sphinx. I formatted all my docstrings for epydoc, using B{}, L{} etc. for bolding, linking and the like, and using @param, @return, @raise etc. to explain input, output, exceptions and the like.
So now that I'm switching to Sphinx, it loses all these features. Is there an automated way to convert docstrings formatted for epydoc to docstrings formatted for Sphinx?
To expand on Kevin Horn's answer, docstrings can be translated on the fly in an event handler triggered by the autodoc-process-docstring event.
Below is a small demonstration (try it by adding the code to conf.py). It replaces the @ character in some common Epytext fields with :, which is used in the corresponding Sphinx fields.
import re

re_field = re.compile(r'@(param|type|rtype|return)')

def fix_docstring(app, what, name, obj, options, lines):
    # rewrite e.g. "@param x:" to ":param x:" in place
    for i in range(len(lines)):
        lines[i] = re_field.sub(r':\1', lines[i])

def setup(app):
    app.connect('autodoc-process-docstring', fix_docstring)
Pyment is a tool that can convert Python docstrings and create skeletons for missing ones. It can manage Google, Epydoc (javadoc style), Numpydoc, and reStructuredText (reST, the Sphinx default) docstring formats.
It accepts a single file or a folder (also exploring sub-folders). For each file, it will recognize each docstring format and convert it to the desired one. At the end, a patch will be generated to apply to the file.
To convert your project:
install Pyment
Type the following (you can use a virtualenv):
$ git clone https://github.com/dadadel/pyment.git
$ cd pyment
$ python setup.py install
convert from Epydoc to Sphinx
You can convert your project to Sphinx format (reST), which is the default output format, by doing:
$ pyment /my/folder/project
In theory you could write a Sphinx extension which would catch whatever event gets fired when a docstring gets read (source_read, maybe?) and translate the docstrings on the fly.
I say in theory because:
I've been meaning to write such a thing for a very long time, but haven't managed to get around to it yet.
Translating stuff like this is always harder than it seems.
You could also probably try just replacing all the docstrings in your code with a similar translator outside of Sphinx, perhaps using the ast module or something similar.
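For that "outside of Sphinx" route, here is a very rough sketch of mine (a plain-text rewrite rather than a proper ast-based translation); it touches every @field occurrence in the files, not just docstrings, so review the diff before committing:
import pathlib
import re

# Rewrite the common Epytext fields to their reST/Sphinx equivalents.
field = re.compile(r'@(param|type|return|rtype|raise)\b')

for path in pathlib.Path(".").rglob("*.py"):
    source = path.read_text()
    converted = field.sub(r':\1', source)
    if converted != source:
        path.write_text(converted)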
As one of the comments suggested, sphinx-epytext does provide the relevant support. How it worked for me:
Installing it is very easy:
pip install -U sphinx-epytext
It contains one file, process_docstring.py, that converts the Epytext markup to reStructuredText markup by replacing @ with a colon (:).
Some of the fields I found missing in there were: ivar, var, cvar, vartype.
Simply extend the existing FIELDS list in there:
FIELDS.extend(['ivar', 'var', 'cvar', 'vartype'])
Epytext understands @type for variables, but Sphinx understands :vartype.
To fix that, replace the former with the latter inside the process_docstring method.
Most of the syntax or docstring parts that Sphinx can't comprehend are reported as warnings. You can log these warnings by running sphinx-build with -w <WarningLogFile>. In my experience, Sphinx is very sensitive about how a field should start or end, missing formatting syntax, etc.

How to make Python use a path that contains colons in it?

I have a program that includes an embedded Python 2.6 interpreter. When I invoke the interpreter, I call PySys_SetPath() to set the interpreter's import-path to the subdirectories installed next to my executable that contain my Python script files... like this:
PySys_SetPath("/path/to/my/program/scripts/type1:/path/to/my/program/scripts/type2");
(except that the path strings are dynamically generated based on the current location of my program's executable, not hard-coded as in the example above)
This works fine... except when the clever user decides to install my program underneath a folder that has a colon in its name. In that case, my PySys_SetPath() command ends up looking like this (note the presence of a folder named "path:to"):
PySys_SetPath("/path:to/my/program/scripts/type1:/path:to/my/program/scripts/type2");
... and this breaks all my Python scripts, because now Python looks for script files in "/path" and "to/my/program/scripts/type1" instead of in "/path:to/my/program/scripts/type1", and so none of the import statements work.
My question is, is there any fix for this issue, other than telling the user to avoid colons in his folder names?
I looked at the makepathobject() function in Python/sysmodule.c, and it doesn't appear to support any kind of quoting or escaping to handle literal colons.... but maybe I am missing some nuance.
The problem you're running into is that the PySys_SetPath function parses the string you pass using a colon as the delimiter. That parser sees each : character as delimiting a path, and there isn't a way around this (it can't be escaped).
However, you can bypass this by creating a list of the individual paths (each of which may contain colons) and using PySys_SetObject to set sys.path:
PyListObject *path;
path = (PyListObject *)PyList_New(0);                               /* a fresh, empty list */
PyList_Append((PyObject *)path, PyString_FromString("foo:bar"));    /* one entry, colon and all */
PySys_SetObject("path", (PyObject *)path);                          /* installs it as sys.path */
Now the interpreter will see "foo:bar" as a distinct component of the sys.path.
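The same idea seen from the Python side: sys.path is just a list of strings, so a colon inside a single entry is harmless once you stop going through the colon-delimited string that PySys_SetPath parses (the paths below are the question's examples):
import sys

# Each directory is a separate list item, so the embedded colon is never
# interpreted as a separator.
sys.path.append("/path:to/my/program/scripts/type1")
sys.path.append("/path:to/my/program/scripts/type2")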
Supporting colons in a file path opens up a huge can of worms on multiple operating systems; it is not a valid path character on Windows or Mac OS X, for example, and it doesn't seem like a particularly reasonable thing to support in the context of a scripting environment either for exactly this reason. I'm actually a bit surprised that Linux allows colon filenames too, especially since : is a very common path separator character.
You might try escaping the colon, i.e. converting /path:to/ to /path\:to/, and see if that works. Other than that, just tell the user to avoid using colons in their file names. They will run into all sorts of problems in quite a few different environments, and it's just a plain bad idea.
