I'm learning Python because I think it is an awesome and powerful language like C++, Perl or C#, but really easy at the same time. I'm using JetBrains' PyCharm, and when I define a function it asks me to add a "Documentation String Stub". When I click yes it adds something like this:
"""
"""
so the full code of the function is something like this:
def otherFunction(h, w):
    """
    """
    hello = h
    world = w
    full_word = h + ' ' + w
    return full_word
I would like to know what these (""" """) symbols mean. Thanks.
Triple quotes (""" ... """) delimit string literals that can span several lines in Python. They are a string delimiter, not an escape sequence.
When put right after a function or class declaration, they provide the documentation for said function/class (they're called docstrings).
Triple quotes indicate a multi-line string. You can put any text in there to describe the function. It can even be accessed from the program itself:
def thirdFunction():
    """
    All it does is printing its own docstring.
    Really.
    """
    print(thirdFunction.__doc__)
These are called 'docstrings' and provide inline documentation for Python. The PEP describes them generally, and the wikipedia article provides some examples.
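As a small illustration of the point above, here is a sketch that reads a docstring back from a function. The function greet is made up for this example; __doc__ holds the raw text, and inspect.getdoc() returns the same text with the indentation cleaned up.

```python
import inspect


def greet(name):
    """Return a greeting for ``name``.

    The text between the triple quotes is stored on the function object.
    """
    return "Hello, " + name


# The raw docstring, including the indentation of the continuation lines.
raw_doc = greet.__doc__

# inspect.getdoc() returns the same text with the indentation cleaned up.
clean_doc = inspect.getdoc(greet)
print(clean_doc)
```

The docstring has no effect on what the function does; it is only metadata attached to the function object.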
You can also assign these to a variable! Line breaks included:
>>> multi_line_str = """First line.
... Second line.
... Third line."""
>>> print(multi_line_str)
First line.
Second line.
Third line.
Theoretically a simple string would also work as a docstring, even a multi-line one if you add \n for the line breaks yourself:
>>> def somefunc():
...     'Single quote docstring line one.\nAnd line two!'
...     pass
...
>>> help(somefunc)
Help on function somefunc in module __main__:

somefunc()
    Single quote docstring line one.
    And line two!
But triple quotes, or more precisely triple double quotes, are the standard convention! See PEP 257 on this, and also PEP 8.
Just for completeness. :)
How would I write this into a function that gives the same output?
from nltk.book import text2
sorted([word.lower() for word in text2 if len(word)>4 and len(word)<12])
Functions are defined using the keyword def followed by a function name and its parameters in parentheses. The body of the function must be indented. Output is generally passed back using the return keyword. For this particular line of code, you can wrap it like this:
from nltk.book import text2

def funcName():
    return sorted([word.lower() for word in text2 if len(word)>4 and len(word)<12])
Where funcName can be replaced with any other name, preferably one that describes more precisely what the function does.
To use the function, add a line funcName(). The function is then executed; afterwards, the program returns to the line where funcName was called, and the call evaluates to the function's return value.
You can find more information about functions in the documentation.
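To show what parameters add, here is a sketch that takes the word list and the length bounds as arguments instead of hard-coding text2. The name mid_length_words, its parameters and the sample list are made up for this example; only the filtering logic comes from the question.

```python
def mid_length_words(words, min_len=5, max_len=11):
    """Return the lowercased words with min_len <= length <= max_len, sorted."""
    # len(word) > 4 and len(word) < 12 is the same as 5 <= len(word) <= 11.
    return sorted(word.lower() for word in words if min_len <= len(word) <= max_len)


# A tiny stand-in for nltk's text2, so the example runs without NLTK.
sample = ["Sense", "and", "Sensibility", "by", "Jane", "Austen"]
result = mid_length_words(sample)
print(result)  # ['austen', 'sense', 'sensibility']
```

With NLTK installed, the same function could be called as mid_length_words(text2).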
I am not sure I understand you correctly.
from nltk.book import text2

def my_func():
    return sorted([word.lower() for word in text2 if len(word)>4 and len(word)<12])

my_func()
Welcome to StackOverflow! Unfortunately, it is not our job to write code FOR you, but rather to help you understand where you are running into errors.
What you want to do is learn how to lowercase strings, write conditionals (like length > 4 and length < 12), and sort lists.
Those are fairly basic, easy-to-learn features of Python, and looking them up in the docs can get you your answer. Once you are writing your own Python code, we can better help you reach a solution and point out any flaws.
My programming is almost all self-taught, so I apologise in advance if some of my terminology is off in this question. Also, I am going to use a simple example to help illustrate my question, but please note that the example itself is not important; it's just a way to hopefully make my question clearer.
Imagine that I have some poorly formatted text with a lot of extra white space that I want to clean up. So I create a function that replaces any group of white space characters containing a new line character with a single new line character, and any other group of white space characters with a single space. The function might look like this:
import re

def white_space_cleaner(text):
    new_line_finder = re.compile(r"\s*\n\s*")
    white_space_finder = re.compile(r"\s\s+")
    text = new_line_finder.sub("\n", text)
    text = white_space_finder.sub(" ", text)
    return text
That works just fine, the problem is that now every time I call the function it has to compile the regular expressions. To make it run faster I can rewrite it like this
new_line_finder = re.compile(r"\s*\n\s*")
white_space_finder = re.compile(r"\s\s+")

def white_space_cleaner(text):
    text = new_line_finder.sub("\n", text)
    text = white_space_finder.sub(" ", text)
    return text
Now the regular expressions are only compiled once and the function runs faster. Using timeit on both functions I find that the first function takes 27.3 µs per loop and the second takes 25.5 µs per loop. A small speed-up, but one that could be significant if the function is called millions of times or has hundreds of patterns instead of 2. Of course, the downside of the second function is that it pollutes the global namespace and makes the code less readable. Is there some "Pythonic" way to include an object, like a compiled regular expression, in a function without having it be recompiled every time the function is called?
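For anyone who wants to repeat the comparison, here is a rough sketch of how the two variants could be timed with timeit. The function names, the sample text and the call count are made up for this example, and the timings will differ from machine to machine, so none are shown.

```python
import re
import timeit


def cleaner_compile_each_call(text):
    # Variant 1: the patterns are passed to re.compile() on every call.
    new_line_finder = re.compile(r"\s*\n\s*")
    white_space_finder = re.compile(r"\s\s+")
    text = new_line_finder.sub("\n", text)
    return white_space_finder.sub(" ", text)


# Variant 2: the patterns are compiled once, at import time.
_NEW_LINE_FINDER = re.compile(r"\s*\n\s*")
_WHITE_SPACE_FINDER = re.compile(r"\s\s+")


def cleaner_precompiled(text):
    text = _NEW_LINE_FINDER.sub("\n", text)
    return _WHITE_SPACE_FINDER.sub(" ", text)


sample_text = "some   poorly \n\n  formatted    text"

for func in (cleaner_compile_each_call, cleaner_precompiled):
    # timeit.timeit() returns the total time in seconds for `number` calls.
    seconds = timeit.timeit(lambda: func(sample_text), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```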
Keep a list of tuples (regular expressions and the replacement text) to apply; there doesn't seem to be a compelling need to name each one individually.
finders = [
    (re.compile(r"\s*\n\s*"), "\n"),
    (re.compile(r"\s\s+"), " ")
]

def white_space_cleaner(text):
    for finder, repl in finders:
        text = finder.sub(repl, text)
    return text
You might also incorporate functools.partial:
import re
from functools import partial

replacers = {
    r"\s*\n\s*": "\n",
    r"\s\s+": " "
}
# Ugly boiler-plate, but the only thing you might need to modify
# is the dict above as your needs change.
# (items() replaces Python 2's iteritems(); dicts keep insertion order in Python 3.7+.)
replacers = [partial(re.compile(regex).sub, repl) for regex, repl in replacers.items()]

def white_space_cleaner(text):
    for replacer in replacers:
        text = replacer(text)
    return text
Another way to do it is to group the common functionality in a class:
class ReUtils(object):
    new_line_finder = re.compile(r"\s*\n\s*")
    white_space_finder = re.compile(r"\s\s+")

    @classmethod
    def white_space_cleaner(cls, text):
        text = cls.new_line_finder.sub("\n", text)
        text = cls.white_space_finder.sub(" ", text)
        return text

if __name__ == '__main__':
    print(ReUtils.white_space_cleaner("the text"))
It's already grouped in a module, but depending on the rest of the code a class can also be suitable.
You could put the regular expression compilation into the function parameters, like this:
def white_space_cleaner(text, new_line_finder=re.compile(r"\s*\n\s*"),
                        white_space_finder=re.compile(r"\s\s+")):
    text = new_line_finder.sub("\n", text)
    text = white_space_finder.sub(" ", text)
    return text
Since default function arguments are evaluated once, when the def statement is executed, the patterns are compiled only once and they won't be in the module namespace. They also give you the flexibility to replace them from calling code if you really need to. The downside is that some people might consider it to be polluting the function signature.
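The claim that a default value is computed only once can be checked with a small sketch. The names make_default, uses_default and evaluations are made up for this example; make_default stands in for an expensive call such as re.compile().

```python
evaluations = []


def make_default():
    # Stand-in for an expensive call such as re.compile().
    evaluations.append("evaluated")
    return object()


# make_default() runs here, once, while the def statement is executed.
def uses_default(value, helper=make_default()):
    return helper


first = uses_default(1)
second = uses_default(2)

print(len(evaluations))  # 1
print(first is second)   # True
```

Both calls receive the very same default object, which is exactly why this trick avoids recompiling, and also why mutable defaults can surprise people.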
I wanted to try timing this but I couldn't figure out how to use timeit properly. You should see similar results to the global version.
Markus's comment on your post is correct, though; sometimes it's fine to put variables at module level. If you don't want them to be easily visible to other modules, consider prefixing the names with an underscore; this marks them as module-private, and from module import * won't import names starting with an underscore (you can still get them if you ask for them by name).
Always remember: the end-all to "what's the best way to do this in Python" is almost always "what makes the code most readable?" Python was created, first and foremost, to be easy to read, so do what you think is the most readable thing.
In this particular case I think it doesn't matter. Check:
Is it worth using Python's re.compile?
As you can see in the answer, and in the source code:
https://github.com/python/cpython/blob/master/Lib/re.py#L281
The implementation of the re module keeps a cache of compiled regular expressions. So the small speed-up you see is probably because you avoid the cache lookup.
That said, the technique in the question is sometimes very relevant, for example when building an internal cache that stays namespaced to the function:
def heavy_processing(arg):
    return arg + 2

def myfunc(arg1):
    # Assign attribute to function if first call
    if not hasattr(myfunc, 'cache'):
        myfunc.cache = {}
    # Perform lookup in internal cache
    if arg1 in myfunc.cache:
        return myfunc.cache[arg1]
    # Very heavy and expensive processing with arg1
    result = heavy_processing(arg1)
    myfunc.cache[arg1] = result
    return result
And this is executed like this:
>>> myfunc.cache
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute 'cache'
>>> myfunc(10)
12
>>> myfunc.cache
{10: 12}
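For comparison, the standard library's functools.lru_cache gives a similar per-function cache without the hand-written bookkeeping. This is only a sketch of that alternative, not what the answer above uses; call_log is a made-up helper that records how often the expensive function really runs.

```python
from functools import lru_cache

call_log = []


def heavy_processing(arg):
    # Record each real computation so the caching effect is visible.
    call_log.append(arg)
    return arg + 2


@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def myfunc(arg1):
    return heavy_processing(arg1)


print(myfunc(10))  # 12
print(myfunc(10))  # 12, served from the cache
print(call_log)    # [10]
```

Note that lru_cache requires the arguments to be hashable, just as the dict-based version does.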
You can use a static function attribute to hold the compiled re. This example does something similar, keeping a translation table in one function attribute.
def static_var(varname, value):
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate

@static_var("complements", str.maketrans('acgtACGT', 'tgcaTGCA'))
def rc(seq):
    return seq.translate(rc.complements)[::-1]
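Applied to the white space example from the question, the same decorator might be used as follows. This is a sketch: the attribute names new_line_finder and white_space_finder are simply reused from the question.

```python
import re


def static_var(varname, value):
    """Decorator factory that stores `value` as attribute `varname` on the function."""
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate


# Each pattern is compiled once, when the decorators are applied.
@static_var("new_line_finder", re.compile(r"\s*\n\s*"))
@static_var("white_space_finder", re.compile(r"\s\s+"))
def white_space_cleaner(text):
    text = white_space_cleaner.new_line_finder.sub("\n", text)
    return white_space_cleaner.white_space_finder.sub(" ", text)


cleaned = white_space_cleaner("a  b \n c")
print(repr(cleaned))  # 'a b\nc'
```

The compiled patterns live on the function object, so nothing extra is added to the module namespace.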
I know that the parameters can be any object, but for the documentation it is quite important to specify what you expect.
First, how do I specify parameter types like the ones below?
str (or use String or string?)
int
list
dict
function()
tuple
object instance of class MyClass
Second, how do I specify params that can be of multiple types, like a function that can handle a single parameter that can be int or str?
Please use the below example to demonstrate the syntax needed for documenting this with your proposed solution. Mind that it is desired to be able to hyperlink reference to the "Image" class from inside the documentation.
def myMethod(self, name, image):
    """
    Does something ...

    name String: name of the image
    image Image: instance of Image Class or a string indicating the filename.

    Return True if operation succeeded or False.
    """
    return True
Note, you are welcome to suggest the usage of any documentation tool (sphinx, doxygen, ...) as long as it is able to deal with the requirements.
Update:
It seems that there is some kind of support for documenting parameter types in doxygen in general. The code below works but adds an annoying $ to the param name (because it was initially made for PHP).
@param str $arg description
@param str|int $arg description
There is a better way. We use
def my_method(x, y):
    """
    my_method description

    @type x: int
    @param x: An integer

    @type y: int|string
    @param y: An integer or string

    @rtype: string
    @return: Returns a sentence with your variables in it
    """
    return "Hello World! %s, %s" % (x,y)
That's it. In the PyCharm IDE this helps a lot. It works like a charm ;-)
You need to add an exclamation mark at the start of the Python docstring for Doxygen to parse it correctly.
def myMethod(self, name, image):
    """!
    Does something ...

    @param name String: name of the image
    @param image Image: instance of Image Class or a string indicating the filename.

    @return Return True if operation succeeded or False.
    """
    return True
If using Python 3, you can use the function annotations described in PEP 3107.
def compile(
        source: "something compilable",
        filename: "where the compilable thing comes from",
        mode: "is this a single statement or a suite?"):
    ...
See also function definitions.
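For the "int or str" part of the question, annotations can also carry real types. Here is a sketch using typing.Union; the Image class is a hypothetical stand-in for the one in the question, and my_method is written as a plain function (no self) to keep the example short.

```python
from typing import Union, get_type_hints


class Image:
    """Hypothetical stand-in for the Image class from the question."""

    def __init__(self, filename: str) -> None:
        self.filename = filename


def my_method(name: str, image: Union[Image, str]) -> bool:
    """Do something with an image given either as an Image or as a filename."""
    if isinstance(image, str):
        image = Image(image)  # accept a filename as well
    return bool(name) and bool(image.filename)


# The annotations can be read back at runtime.
hints = get_type_hints(my_method)
print(hints["image"])
```

Documentation tools and IDEs that understand annotations can then show, and link to, the Image type without it being repeated in the docstring.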
Figured I'd post this little tidbit here since IDEA showed me this was possible, and I was never told nor read about this.
>>> def test( arg: bool = False ) -> None: print( arg )
>>> test(10)
10
When you type test(, IDLE's doc-tip appears with (arg: bool=False) -> None, which was something I thought only Visual Studio did.
It's not exactly doxygen material, but it's good for documenting parameter-types for those using your code.
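As the test(10) call above suggests, annotations are not enforced at runtime; they are simply stored on the function. A short sketch of where they end up:

```python
def test(arg: bool = False) -> None:
    print(arg)


# Annotations are kept in a plain dict on the function object.
annotations = test.__annotations__
print(annotations)  # {'arg': <class 'bool'>, 'return': None}

# Passing an int is accepted: Python itself does not check the annotation.
test(10)  # prints 10
```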
Yup, @docu is right - this is the (IMHO best) way to combine both documentation schemes more or less seamlessly. If, on the other hand, you also want to do something like putting text on the doxygen-generated index page, you would add
##
# @mainpage (Sub)Heading for the doxygen-generated index page
# Text that goes right onto the doxygen-generated index page
somewhere at the beginning of your Python code.
In other words, where doxygen does not expect Python comments, use ## to alert it that there are tags for it. Where it expects Python comments (e.g. at the beginning of functions or classes), use """!.
Doxygen is great for C++, but if you are working with mostly Python code you should give Sphinx a try. If you choose Sphinx then all you need to do is follow PEP 8.
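For reference, here is roughly what the example from the question could look like with Sphinx-style field lists in the docstring. This is a sketch: Image is a hypothetical stand-in class, and my_method is a plain function rather than a method.

```python
class Image:
    """Hypothetical stand-in for the Image class from the question."""


def my_method(name, image):
    """Do something with an image.

    :param str name: name of the image
    :param image: the image itself, or the name of a file containing it
    :type image: Image or str
    :returns: True if the operation succeeded, False otherwise
    :rtype: bool
    """
    return True


print(my_method.__doc__)
```

When Sphinx builds the documentation, type names it knows about, such as a documented Image class, are turned into links.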
In RST, we put some whitespace in front of a block to mark it as a code block. Because Python also uses whitespace to indent code, I would like my RST code block to preserve that whitespace when I am writing Python code. How can I do that?
Let's say we have a class:
class Test(object):
And we want to write a method called __init__ that is a member of this class. This method belongs to another code block but we want to have some visual clue so that readers know that this second block is a continuation of the previous one. At the moment, I use # to mark the vertical guide line of a code block like this:
    def __init__(self):
        pass
#
Without the #, def __init__(self) would be printed at the same indentation level as class Test(object). There's gotta be a more elegant way.
You need to define your own directive (it's true that the standard .. code:: directive gobbles spaces but you can make your own directive that doesn't):
import re
from docutils.parsers.rst import directives

INDENTATION_RE = re.compile("^ *")

def measure_indentation(line):
    return INDENTATION_RE.match(line).end()

class MyCodeBlock(directives.body.CodeBlock):
    EXPECTED_INDENTATION = 3

    def run(self):
        block_lines = self.block_text.splitlines()
        block_header_len = self.content_offset - self.lineno + 1
        block_indentation = measure_indentation(self.block_text)
        code_indentation = block_indentation + MyCodeBlock.EXPECTED_INDENTATION
        self.content = [ln[code_indentation:] for ln in block_lines[block_header_len:]]
        return super(MyCodeBlock, self).run()

directives.register_directive("my-code", MyCodeBlock)
You could of course overwrite the standard .. code:: directive with this, too.
Ah... I've run into this before ;). The # trick is usually what I use, alas. If you read the spec it sounds like it will always take away the leading indent. [1]
You could also use an alternate syntax:
::

> def foo(x):
>     pass
The leading ">" preserves the leading space.
[1] : http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#indented-literal-blocks
EDIT
Just dug through the docutils code (this has been bugging me a lot too) and can confirm that it will always strip out the common indent, no questions asked. It would be easy to modify to change this behavior but that would make the resulting restructured text non-standard.
You can also try Line Blocks which look like this:
| def foo(x):
|     pass
though they aren't specific to code examples.
Is there an option to just print the output of help('myfun')? The behaviour I'm seeing is that the output is printed to stdout and the script waits for user input (i.e. type 'q' to continue).
There must be a setting to set this to just dump docstrings.
Alternatively, if I could just dump the docstring PLUS the "def f(args):" line that would be fine too.
Searching for "python help function" is comical. :) Maybe I'm missing some nice pydoc page somewhere out there that explains it all?
To get exactly the help that's printed by help(str) into the variable strhelp:
import pydoc
strhelp = pydoc.render_doc(str, "Help on %s")
Of course you can then easily print it without paging, etc.
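A sketch of the full round trip, from function to printable help text, might look like this. The function describe_me is made up for this example; pydoc.plain() is used here to strip the overstrike characters that pydoc adds for bold text on terminals.

```python
import pydoc


def describe_me(a, b, c):
    """DOES NOTHING!!!!"""


# render_doc() builds the same text that help() would page through;
# plain() removes the backspace-based bold formatting from it.
help_text = pydoc.plain(pydoc.render_doc(describe_me, "Help on %s"))
print(help_text)
```

Because help_text is an ordinary string, it can be printed, logged or written to a file without any paging.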
You've already seen reference to the docstring, the magic __doc__ variable which holds the body of the help:
def foo(a,b,c):
    ''' DOES NOTHING!!!! '''
    pass

print(foo.__doc__) # DOES NOTHING!!!!
To get the name of a function, you just use __name__:
def foo(a,b,c): pass
print(foo.__name__) # foo
To get the parameter names of a function that is not built in, use its __code__ attribute (func_code in Python 2) and read co_varnames from it:
def foo(a,b,c): pass
print(foo.__code__.co_varnames) # ('a', 'b', 'c')
I've not found out how to do the same for built-in functions.
If you want to access the raw docstring from code:
myvar = obj.__doc__
print(obj.__doc__)
The help function does some additional processing, the accepted answer shows how to replicate this with pydoc.render_doc().
>>> x = 2
>>> x.__doc__
'int(x[, base]) -> integer\n\nConvert a string or number to an integer, if possible. A floating point\nargument will be truncated towards zero (this does not include a string\nrepresentation of a floating point number!) When converting a string, use\nthe optional base. It is an error to supply a base when converting a\nnon-string. If the argument is outside the integer range a long object\nwill be returned instead.'
Is that what you needed?
edit - you can print(x.__doc__) and concerning the function signature, you can build it using the inspect module.
>>> inspect.formatargspec(inspect.getargspec(os.path.join))
'((a,), p, None, None)'
>>> help(os.path.join)
Help on function join in module ntpath:
join(a, *p)
    Join two or more pathname components, inserting "\" as needed
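In Python 3, inspect.signature() is the usual way to get at the signature; getargspec and formatargspec come from older versions and are no longer available in recent releases. Here is a sketch of dumping the "def f(args):" line plus the docstring, using a made-up function foo:

```python
import inspect


def foo(a, b, c=1):
    """Example function used only for this sketch."""


# str() of a Signature object gives the parenthesised parameter list.
signature_text = "def " + foo.__name__ + str(inspect.signature(foo)) + ":"
print(signature_text)       # def foo(a, b, c=1):
print(inspect.getdoc(foo))  # Example function used only for this sketch.
```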