I am using python to do some processing on .py files. These .py files may be from unknown sources so I do not wish to directly run their code (for security), and may not have their dependencies installed anyway. I am analysing these files using python's tokenize module and then using the tokens to look at what the types of any NAME tokens are. For a function or class declared in a file you can just do:
import tokenize
# tokenize the source file ...
all_functions = []
for index, token in enumerate(tokens):
    # check the token type
    if token[0] == tokenize.NAME:
        # check the token's string
        if token[1] == "def":
            # the next token is always the name of the function
            all_functions.append(tokens[index + 1][1])
        elif token[1] == "class":
            # as above but for classes ...
The problem is that for an imported module I don't know how to tell the difference between a class and a function without seeing its declaration.
Take the following code snippet:
import pathlib
foo = pathlib.Path("some/path")
bar = pathlib.urlquote_from_bytes(b"some bytes")
Because this is well written code (PEP8 compliant), I can assume that pathlib.Path will be a class because the first character is uppercase and I can assume that pathlib.urlquote_from_bytes will be a function because it uses lower case words with underscores, however I cannot know for sure without having the module's source code (which may not be the case). Not all of the .py files I receive will necessarily be well written (PEP8 compliant) so I cannot rely on this.
Is there any other way of finding out whether a Python module's attributes are of a given type? One thought I had was to run python3 -m py_compile <file> and then analyse the resulting .pyc file, but I have never looked into CPython so I don't know if this would actually be helpful or not. Any suggestions would be welcome.
For this specific use case, it turned out I did not need to separate out classes and functions and could wrap them all up as callables. A callable is essentially any instance that has a __call__() method, e.g. some method x(arg1, arg2, ...) is shorthand for x.__call__(arg1, arg2, ...).
For anyone using older versions of Python, it is worth noting that the callable() built-in is not available in every release; as its documentation puts it, "this function was first removed in Python 3.0 and then brought back in Python 3.2."
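To make the "everything is a callable" point concrete, here is a minimal sketch. The names Greeter and shout are made up for illustration; only the callable() built-in and the __call__ protocol come from Python itself.

```python
class Greeter:
    """Instances are callable because the class defines __call__."""

    def __call__(self, name):
        return "hello " + name


def shout(text):
    return text.upper()


greet = Greeter()

# Classes, functions, and instances with __call__ all pass the same check.
checks = {
    "class": callable(Greeter),
    "function": callable(shout),
    "instance": callable(greet),
    "string": callable("not callable"),
}

# Calling the instance is shorthand for calling its __call__ method.
same_result = greet("world") == greet.__call__("world")
```

Note that callable() needs a live object, so it only helps once something has actually been imported or constructed; it does not replace static analysis of untrusted files.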
Further reading:
Blog post: Is it a class or a function? It's a callable!
Python3 docs: Call function
StackOverflow: What is a callable?
I have an application that embeds python and exposes its internal object model as python objects/classes.
For autocompletion/scripting purposes I'd like to extract a mock of the internal object model, containing the doc tags, structure, functions, etc., so I can use it as library source for the IDE autocompletion.
Does someone know of a library, or has some code snippet that could be used to dump those classes to source?
Use the dir() or globals() function to get the list of what has been defined so far. Then, to filter and browse your classes, use the inspect module.
Example toto.py:
class Example(object):
    """class docstring"""
    def hello(self):
        """hello doctring"""
        pass
Example browse.py:
import inspect
import toto
for name, value in inspect.getmembers(toto):
    # First ignore python defined variables
    if name.startswith('__'):
        continue
    # Now only browse classes
    if not inspect.isclass(value):
        continue
    print("Found class %s with doctring \"%s\"" % (name, inspect.getdoc(value)))
    # Only browse functions in the current class
    # (unbound methods on Python 2, plain functions on Python 3)
    for sub_name, sub_value in inspect.getmembers(value):
        if not (inspect.ismethod(sub_value) or inspect.isfunction(sub_value)):
            continue
        print(" Found method %s with docstring \"%s\""
              % (sub_name, inspect.getdoc(sub_value)))
python browse.py:
Found class Example with doctring "class docstring"
Found method hello with docstring "hello doctring"
Also, that doesn't really answer your question, but if you're writing a sort of IDE, you can also use the ast module to parse python source files and get information about them
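As a rough sketch of the ast route: the snippet below parses source text and lists classes and functions with their docstrings, without importing or executing anything. The outline helper and the SOURCE string are made up for illustration; ast.parse, ast.walk and ast.get_docstring are the standard library calls.

```python
import ast

SOURCE = '''
class Example(object):
    """class docstring"""
    def hello(self):
        """hello docstring"""
        pass

def helper():
    pass
'''


def outline(source):
    """Return (kind, name, docstring) tuples for every class and function."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found.append(("class", node.name, ast.get_docstring(node)))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("function", node.name, ast.get_docstring(node)))
    return found


entries = outline(SOURCE)
```

This only sees what is written in the file itself. Names that come from imports, or that are created dynamically at runtime, will not show up.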
Python data structures are mutable (see What is a monkey patch?), so extracting a mock would not be enough. You could instead ask the interpreter for possible autocompletion strings dynamically using the dir() built-in function.
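A minimal sketch of that dynamic approach, assuming the object to complete on is already available in the running interpreter. The complete helper is a made-up name, and pathlib is only used as a convenient example object.

```python
import pathlib


def complete(obj, prefix):
    """Return attribute names of a live object that start with prefix."""
    return sorted(name for name in dir(obj) if name.startswith(prefix))


candidates = complete(pathlib, "Pa")
```

Because dir() is evaluated at call time, attributes added by monkey patching show up as well, which a static dump would miss.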
I have written a program in Python 3 and am using Sphinx to document it. Sphinx's autodoc is great, however it only works with Python 2. Some modules work fine in autodoc, however others don't. Some examples: Python 2 complains about Python 3 style metaclasses, and about some modules which don't exist in Python 2, such as configparser. This is annoying as it cannot import the docstrings from those files.
I don't want to rewrite the whole program in Python 2, however I want to use autodoc.
One idea I had was a small program that read each Python file and removed all functionality but just left the basic function and classes with their docstrings (because autodoc imports each module and reads the docstring of a specific function or class).
import configparser
import os
class TestClass:
    """
    I am a class docstring.
    """
    def method(self, argument):
        """
        I am a method docstring.
        """
        #Some code here
        print(os.getcwd())
def TestFunction():
    """
    I am a function docstring.
    """
    #Some more useless code here
    return os.path.join("foo", "bar")
into...
class TestClass:
    """
    I am a class docstring.
    """
    def method(self, argument):
        """
        I am a method docstring.
        """
        pass
def TestFunction():
    """
    I am a function docstring.
    """
    pass
In this way the processed code can be read by autodoc, but still has the docstrings, which is what I really need. Is this the best way of going about this, and does anyone have any suggestions as to how to write the little program which converts the code?
I can remove the metaclass problem very easily with some regular expressions, but I am struggling with the rest.
import re
m = re.search(r"\(metaclass=.*\)", file_content)
if m:
    # str.join takes a single iterable, so concatenate the two slices instead
    file_content = file_content[:m.start()] + file_content[m.end():]
Would the ast module be useful?
Thanks.
You can just install the development version of sphinx, which supports python 3.
pip-3.2 install hg+https://bitbucket.org/birkenfeld/sphinx
I tested the autodoc feature on your class and it worked.
What tends to be the solution is sprinkling try/except clauses in your code.
Python 2.6 has configparser, but there it is called ConfigParser (Python 3 renamed the CamelCase module names to all lower case)
so something like:
try:
    import configparser
except ImportError:
    #we are in 2.x
    import ConfigParser as configparser
You might want to do a few things like this wherever the code breaks between the two versions. Metaclasses, though, I'm not sure how to handle across the two.
There's a 3to2 library that can convert Python 3 code to python 2. You could try this in conjunction with Sphinx.
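On the question of whether the ast module would be useful for the "keep only the docstrings" idea: a rough sketch is below. It assumes Python 3.9 or later (for ast.unparse), and the names BodyStripper, strip_source and SAMPLE are made up for illustration. It drops top-level imports and metaclass keywords and replaces each function body with its docstring plus pass. Decorators and default arguments are left alone, so code that references imported names there would still need manual attention.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Replace every function body with its docstring (if any) plus pass."""

    def _strip(self, node):
        new_body = []
        if ast.get_docstring(node) is not None:
            new_body.append(node.body[0])  # keep the docstring expression
        new_body.append(ast.Pass())
        node.body = new_body
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip

    def visit_ClassDef(self, node):
        # Drop "metaclass=..." so the class statement also parses on Python 2.
        node.keywords = [kw for kw in node.keywords if kw.arg != "metaclass"]
        self.generic_visit(node)
        return node


def strip_source(source):
    tree = ast.parse(source)
    # Remove top-level imports; the stripped bodies no longer need them.
    tree.body = [node for node in tree.body
                 if not isinstance(node, (ast.Import, ast.ImportFrom))]
    tree = BodyStripper().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


SAMPLE = '''
import os

class TestClass(metaclass=type):
    """I am a class docstring."""

    def method(self, argument):
        """I am a method docstring."""
        print(os.getcwd())

def TestFunction():
    """I am a function docstring."""
    return os.path.join("foo", "bar")
'''

stripped = strip_source(SAMPLE)
```

Compared with regular expressions, this works on the parsed structure, so it is not confused by a "metaclass=" that happens to appear inside a string or comment.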
I am doing some heavy commandline stuff (not really web based) and am new to Python, so I was wondering how to set up my files/folders/etc. Are there "header" files where I can keep all the DB connection stuff?
How/where do I define classes and objects?
Just to give you an example of a typical Python module's source, here's something with some explanation. This is a file named "Dims.py". This is not the whole file, just some parts to give an idea what's going on.
#!/usr/bin/env python
This is the standard first line telling the shell how to execute this file. Saying /usr/bin/env python instead of /usr/bin/python tells the shell to find Python via the user's PATH; the desired Python may well be in ~/bin or /usr/local/bin.
"""Library for dealing with lengths and locations."""
If the first thing in the file is a string, it is the docstring for the module. A docstring is a string that appears immediately after the start of an item, which can be accessed via its __doc__ property. In this case, since it is the module's docstring, if a user imports this file with import Dims, then Dims.__doc__ will return this string.
# Units
MM_BASIC = 1500000
MILS_BASIC = 38100
IN_BASIC = MILS_BASIC * 1000
There are a lot of good guidelines for formatting and naming conventions in a document known as PEP (Python Enhancement Proposal) 8. These are module-level variables (constants, really) so they are written in all caps with underscores. No, I don't follow all the rules; old habits die hard. Since you're starting fresh, follow PEP 8 unless you can't.
_SCALING = 1
_SCALES = {
    mm_basic: MM_BASIC,
    "mm": MM_BASIC,
    mils_basic: MILS_BASIC,
    "mil": MILS_BASIC,
    "mils": MILS_BASIC,
    "basic": 1,
    1: 1
}
These module-level variables have leading underscores in their names. This gives them a limited amount of "privacy", in that from Dims import * will not import _SCALING. It is a convention rather than enforcement: after import Dims you can still reach Dims._SCALING, and if you need to mess with it, you can explicitly say something like from Dims import _SCALING as scaling.
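A small sketch of how the leading underscore behaves in practice. To keep it self-contained it builds a throwaway module in memory; the module name dims_demo is made up and stands in for a real file on disk.

```python
import sys
import types

# Build a throwaway module in memory; "dims_demo" is a made-up name.
demo = types.ModuleType("dims_demo")
exec("MM_BASIC = 1500000\n_SCALING = 1\n", demo.__dict__)
sys.modules["dims_demo"] = demo

# A star import skips names that start with an underscore.
namespace = {}
exec("from dims_demo import *", namespace)
star_names = sorted(n for n in namespace if not n.startswith("_"))

# Attribute access and explicit imports still reach the "private" name.
direct = demo._SCALING
from dims_demo import _SCALING as scaling
```

So the underscore is a signal to other programmers and to star imports, not an access control mechanism.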
def UnitsToScale(units=None):
    """Scales the given units to the current scaling."""
    if units is None:
        return _SCALING
    elif units not in _SCALES:
        raise ValueError("unrecognized units: '%s'." % units)
    return _SCALES[units]
UnitsToScale is a module-level function. Note the docstring and the use of default values and exceptions. No spaces around the = in default value declarations.
class Length(object):
    """A length. Makes unit conversions easier.

    The basic, mm, and mils properties can be used to get or set the length
    in the desired units.

    >>> x = Length(mils=1000)
    >>> x.mils
    1000.0
    >>> x.mm
    25.399999999999999
    >>> x.basic
    38100000L
    >>> x.mils = 100
    >>> x.mm
    2.54
    """
The class declaration. Note the docstring has things in it that look like Python command line commands. These are called doctests, in that they are test code in the docstring. More on this later.
    def __init__(self, unscaled=0, basic=None, mm=None, mils=None, units=None):
        """Constructs a Length.

        Default constructor creates a length of 0.

        >>> Length()
        Length(basic=0)

        Length(<float>) or Length(<string>) creates a length with the given
        value at the current scale factor.

        >>> Length(1500)
        Length(basic=1500)
        >>> Length("1500")
        Length(basic=1500)
        """
        # Straight copy
        if isinstance(unscaled, Length):
            self._x = unscaled._x
            return
        # rest omitted
This is the initializer. Unlike C++, you only get one, but you can use default arguments to make it look like several different constructors are available.
    def _GetBasic(self): return self._x
    def _SetBasic(self, x): self._x = x
    basic = property(_GetBasic, _SetBasic, doc="""
        This returns the length in basic units.""")
This is a property. It allows you to have getter/setter functions while using the same syntax as you would for accessing any other data member, in this case, myLength.basic = 10 does the same thing as myLength._SetBasic(10). Because you can do this, you should not write getter/setter functions for your data members by default. Just operate directly on the data members. If you need to have getter/setter functions later, you can convert the data member to a property and your module's users won't need to change their code. Note that the docstring is on the property, not the getter/setter functions.
If you have a property that is read-only, you can use property as a decorator to declare it. For example, if the above property was to be read-only, I would write:
    @property
    def basic(self):
        """This returns the length in basic units."""
        return self._x
Note that the name of the property is the name of the getter method. You can also use decorators to declare setter methods in Python 2.6 or later.
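For completeness, a minimal sketch of the decorator form with a setter. This Length is a cut-down stand-in for the class discussed here, not the original code; the @property and @basic.setter syntax is standard Python (2.6 and later).

```python
class Length(object):
    """Simplified stand-in for the Length class discussed above."""

    def __init__(self, basic=0):
        self._x = basic

    @property
    def basic(self):
        """This returns the length in basic units."""
        return self._x

    @basic.setter
    def basic(self, value):
        # The setter must reuse the property's name.
        self._x = value


length = Length(10)
length.basic = 25
```

The docstring still lives on the getter, which is where help() and documentation tools look for it.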
    def __mul__(self, other):
        """Multiplies a Length by a scalar.

        >>> Length(10)*10
        Length(basic=100)
        >>> 10*Length(10)
        Length(basic=100)
        """
        if type(other) not in _NumericTypes:
            return NotImplemented
        return Length(basic=self._x * other)
This overrides the * operator. Note that you can return the special value NotImplemented to tell Python that this operation isn't implemented (in this case, if you try to multiply by a non-numeric type like a string).
    __rmul__ = __mul__
Since code is just a value like anything else, you can assign the code of one method to another. This line tells Python that the something * Length operation uses the same code as Length * something. Don't Repeat Yourself.
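Here is a self-contained sketch of the two operator tricks together. The original checks against a _NumericTypes collection that is not shown, so this version assumes numbers.Real as the test for "numeric"; the rest is a cut-down stand-in for the real class.

```python
import numbers


class Length(object):
    """Cut-down stand-in, just enough to show * in both directions."""

    def __init__(self, basic=0):
        self._x = basic

    def __mul__(self, other):
        # Assumption: any real number counts as numeric here.
        if not isinstance(other, numbers.Real):
            return NotImplemented
        return Length(basic=self._x * other)

    # something * Length reuses the same code as Length * something.
    __rmul__ = __mul__

    def __repr__(self):
        return "Length(basic=%r)" % self._x


left = repr(Length(10) * 10)
right = repr(10 * Length(10))

# Returning NotImplemented lets Python raise TypeError for unsupported types.
try:
    Length(10) * "abc"
except TypeError:
    rejected = True
else:
    rejected = False
```

Returning NotImplemented, rather than raising directly, gives the other operand a chance to handle the operation before Python gives up.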
Now that the class is declared, I can get back to module code. In this case, I have some code that I want to run only if this file is executed by itself, not if it's imported as a module. So I use the following test:
if __name__ == "__main__":
Then the code in the if is executed only if this is being run directly. In this file, I have the code:
    import doctest
    doctest.testmod()
This goes through all the docstrings in the module and looks for lines that look like Python prompts with commands after them. The lines following are assumed to be the output of the command. If the commands output something else, the test is considered to have failed and the actual output is printed. Read the doctest module documentation for all the details.
One final note about doctests: They're useful, but they're not the most versatile or thorough tests available. For those, you'll want to read up on unittests (the unittest module).
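To see the doctest mechanics in isolation, here is a small sketch that runs the examples from one docstring in-process and collects the result. The scale function is made up for illustration; DocTestParser and DocTestRunner are part of the standard doctest module.

```python
import doctest


def scale(value, factor=2):
    """Multiply value by factor.

    >>> scale(3)
    6
    >>> scale(3, factor=10)
    30
    """
    return value * factor


# Parse the docstring into a runnable test and execute it in-process.
parser = doctest.DocTestParser()
test = parser.get_doctest(scale.__doc__, {"scale": scale}, "scale", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

In a real module you would normally just call doctest.testmod() under the __main__ guard; the explicit parser and runner are used here only so the outcome is available as a value.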
Each python source file is a module. There are no "header" files. The basic idea is that when you import "foo" it'll load the code from "foo.py" (or a previously compiled version of it). You can then access the stuff from the foo module by saying foo.whatever.
There seem to be two ways for arranging things in Python code. Some projects use a flat layout, where all of the modules are at the top-level. Others use a hierarchy. You can import foo/bar/baz.py by importing "foo.bar.baz". The big gotcha with hierarchical layout is to have __init__.py in the appropriate directories (it can even be empty, but it should exist).
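A hierarchical layout can be tried out quickly with a throwaway directory. The sketch below builds one in a temporary folder and imports from it; the names demo_pkg, sub and leaf are made up, and the temporary directory is left for the operating system to clean up.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg", "sub")
os.makedirs(pkg_dir)

# Empty __init__.py files mark each directory as a package.
for directory in (os.path.join(root, "demo_pkg"), pkg_dir):
    with open(os.path.join(directory, "__init__.py"), "w"):
        pass

with open(os.path.join(pkg_dir, "leaf.py"), "w") as handle:
    handle.write("ANSWER = 42\n")

# Make the new tree importable, then import demo_pkg/sub/leaf.py.
sys.path.insert(0, root)
importlib.invalidate_caches()
leaf = importlib.import_module("demo_pkg.sub.leaf")
```

The dotted module name mirrors the directory path, which is the whole convention.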
Classes are defined like this:
class MyClass(object):
    def __init__(self, x):
        self.x = x
    def printX(self):
        print(self.x)
To create an instance:
z = MyClass(5)
You can organize it in whatever way makes the most sense for your application. I don't exactly know what you're doing so I can't be certain what the best organization would be for you, but you can pretty much split it up as you see fit and just import what you need.
You can define classes in any file, and you can define as many classes as you would like in a script (unlike Java). There are no official header files (not like C or C++), but you can use config files to store info about connecting to a DB, whatever, and use configparser (a standard library module) to read them.
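As a sketch of the config-file idea: the section and key names below (database, host, port, name) are invented for illustration, and the settings are read from a string so the example is self-contained. With a real file you would call parser.read() with its path instead.

```python
import configparser

# Hypothetical settings; in practice this would live in a file such as an .ini.
SETTINGS = """
[database]
host = localhost
port = 5432
name = example_db
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)

db_host = parser["database"]["host"]
db_port = parser.getint("database", "port")  # converted to int
db_name = parser.get("database", "name")
```

Any module that needs the connection details can then import one small settings module that does this parsing once.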
It makes sense to keep like things in the same file, so if you have a GUI, you might have one file for the interface, and if you have a CLI, you might keep that in a file by itself. It's less important how your files are organized and more important how the source is organized into classes and functions.
This would be the place to look for that: http://docs.python.org/reference/.
First of all, compile and install pip: http://pypi.python.org/pypi/pip. It is like Ubuntu's apt-get. You run it via a Terminal by typing in pip install package-name. It has a database of packages, so you can install/uninstall stuff quite easily with it.
As for importing and "header" files, from what I can tell, if you run import foo, Python looks for foo.py (or a package named foo) in each directory listed in sys.path, starting with the folder of the script being run. If it's not there, it goes on to the installed-package locations, which is where eggs and other installed packages live.
As for defining classes and objects, here's a basic example:
class foo(foobar2): # I am extending a class, in this case 'foobar2'. I take no arguments.
    def __init__(self, the, list, of, args = True): # Instead, the arguments get passed to me. This still lets you define a 'foo()' object with three arguments, only you let '__init__' take them.
        self.var = 'foo'
    def bar(self, args):
        self.var = 'bar'
    def foobar(self): # Even if you don't need arguments, never leave out the self argument. It's required for classes.
        print(self.var)
foobar = foo('the', 'class', 'args') # This is how you initialize me!
Read more on this in the Python Reference, but my only tip is to never forget the self argument in class functions. It will save you a lot of debugging headaches...
Good luck!
There's no fixed structure for Python programs, but you can take a Django project as an example. A Django project consists of one settings.py module, where global settings (like your example with DB connection properties) are stored, and pluggable applications. Each application has its own models.py module, which stores database models and, possibly, other domain-specific objects. All the rest is up to you.
Note that this advice is not specific to Python. In C/C++ you probably used a similar structure and kept settings in XML. Just forget about headers and put settings in a plain .py file, that's all.
That's it. If you want to document a function or a class, you put a string just after the definition. For instance:
def foo():
    """This function does nothing."""
    pass
But what about a module? How can I document what a file.py does?
Add your docstring as the first statement in the module.
"""
Your module's verbose yet thorough docstring.
"""
import foo
# ...
For packages, you can add your docstring to __init__.py.
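The "first statement" rule can be checked without importing anything. In this sketch the two source strings and the module_docstring helper are made up for illustration; ast.get_docstring is the standard library call.

```python
import ast

WITH_DOCSTRING = '"""Utilities for lengths."""\nimport os\n'
STRING_AFTER_IMPORT = 'import os\n"""Not a docstring: not the first statement."""\n'


def module_docstring(source):
    """Return the module docstring of the given source text, or None."""
    return ast.get_docstring(ast.parse(source))


first = module_docstring(WITH_DOCSTRING)
second = module_docstring(STRING_AFTER_IMPORT)
```

The second string is just an expression statement that gets evaluated and thrown away, so help() and __doc__ would not show it.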
For the packages, you can document it in __init__.py.
For the modules, you can add a docstring simply in the module file.
All the information is here: http://www.python.org/dev/peps/pep-0257/
Here is an example of Google-style Python docstrings showing how a module can be documented. Basically, it contains information about the module, how to execute it, its module-level variables, and a list of to-do items.
"""Example Google style docstrings.

This module demonstrates documentation as specified by the `Google
Python Style Guide`_. Docstrings may extend over multiple lines.
Sections are created with a section header and a colon followed by a
block of indented text.

Example:
    Examples can be given using either the ``Example`` or ``Examples``
    sections. Sections support any reStructuredText formatting, including
    literal blocks::

        $ python example_google.py

Section breaks are created by resuming unindented text. Section breaks
are also implicitly created anytime a new section starts.

Attributes:
    module_level_variable1 (int): Module level variables may be documented in
        either the ``Attributes`` section of the module docstring, or in an
        inline docstring immediately following the variable.

        Either form is acceptable, but the two should not be mixed. Choose
        one convention to document module level variables and be consistent
        with it.

Todo:
    * For module TODOs
    * You have to also use ``sphinx.ext.todo`` extension

.. _Google Python Style Guide:
   http://google.github.io/styleguide/pyguide.html

"""
module_level_variable1 = 12345
def my_function():
    pass
...
...
You do it the exact same way. Put a string in as the first statement in the module.
It's easy, you just add a docstring at the top of the module.
For PyPI Packages:
If you add doc strings like this in your __init__.py file as seen below
"""
Please refer to the documentation provided in the README.md,
which can be found at gorpyter's PyPI URL: https://pypi.org/project/gorpyter/
"""
# <IMPORT_DEPENDENCIES>
def setup():
    """Verify your Python and R dependencies."""
Then you will receive this in everyday usage of the help function.
help(<YOUR_PACKAGE>)
DESCRIPTION
    Please refer to the documentation provided in the README.md,
    which can be found at gorpyter's PyPI URL: https://pypi.org/project/gorpyter/

FUNCTIONS
    setup()
        Verify your Python and R dependencies.
Note that the DESCRIPTION section of the help output comes from that first docstring at the very top of the file.