Obtain Python docstring of arbitrary script file

I am writing a module that requires the docstring of the calling script. Thus far I have managed to obtain the filename of the calling script using
import inspect
filename = inspect.stack()[1].filename
The calling script can read its own docstring through __doc__, but getting at it from the called module does not seem trivial. Of course I could write my own parsing function, but that is bound to miss some uncommon cases. Is there a way to actually parse the calling script to find its docstring (without executing its code)?

Based on chaos's suggestion to use ast, I wrote the following, which seems to work nicely.
import ast
# fname is the path of the calling script (filename in the question)
with open(fname, 'r') as f:
    tree = ast.parse(f.read())
docstring = ast.get_docstring(tree)
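The snippet above can be wrapped in a small helper. This is only a minimal sketch: the function name `read_module_docstring` and the throwaway demo file are hypothetical, and the file is assumed to be UTF-8 encoded. It relies solely on `ast.parse` and `ast.get_docstring`, so the target script is parsed but never executed.

```python
import ast
import os
import tempfile


def read_module_docstring(path):
    """Return the module-level docstring of the file at `path`, or None."""
    # Assumption: the file is UTF-8 encoded Python source.
    with open(path, "r", encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)
    # get_docstring() returns None when the module has no docstring.
    return ast.get_docstring(tree)


# Demo with a throwaway script; the raise shows the code is never executed.
demo_source = '"""Demo script docstring."""\nraise SystemExit("never runs")\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".py", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(demo_source)
demo_docstring = read_module_docstring(tmp.name)
os.remove(tmp.name)
print(demo_docstring)
```

In the question's setting, `path` would be the `filename` obtained from `inspect.stack()[1].filename`.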

Related

Importing only a specific class from a Python module with `importlib`

How would one go about importing only a specific class from a Python module using its path?
I need to import a specific class from a Python file using the file path. I have no control over the file and it's completely outside of my package.
file.py:
class Wanted(metaclass=MyMeta):
    ...
class Unwanted(metaclass=MyMeta):
    ...
The metaclass implementation is not relevant here. However, I will point out that it's part of my package and I have full control over it.
Import example:
import importlib.util
spec = importlib.util.spec_from_file_location(name='Wanted', location="path_to_module/mudule.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
This works, and Wanted is imported. The problem is that Unwanted is also imported. In fact, as long as there is any string value given for name (including an empty string), both Wanted and Unwanted are imported from the module.
This has the same effect as in the example before, where both Wanted and Unwanted are imported:
importlib.util.spec_from_file_location(name='random string', location="path_to_module/mudule.py")
I'm not looking for a specific solution using importlib; any reasonable way will do. I will point out that I don't have a need of using the class when it's imported, I only need the import to happen and my metaclass will take care of the rest.

If I am not mistaken, the name parameter is just used to name the module you are importing. But, more importantly, when you are importing any module, you are executing the whole file, which means that in your case both of these classes will be created. It would not matter whether you wrote from file import Wanted, import file or used any other form of the import statement.
A Python program is constructed from code blocks. A block is a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition.
Source: https://docs.python.org/3/reference/executionmodel.html#structure-of-a-program
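If executing the whole file is the problem, one possible workaround (not taken from the answers here) is to parse the file with ast, keep only the wanted class statement plus any top-level imports, and execute just that. The sketch below is hedged accordingly: `exec_only_class`, the `created` list and the stand-in `MyMeta` are hypothetical names, and it assumes the wanted class does not depend on other top-level definitions in the file.

```python
import ast


def exec_only_class(source, class_name, namespace=None):
    """Execute only the named top-level class (and top-level imports)."""
    tree = ast.parse(source)
    kept = [
        node
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
        or (isinstance(node, ast.ClassDef) and node.name == class_name)
    ]
    # Build a reduced module that contains only the statements we kept.
    reduced = ast.Module(body=kept, type_ignores=[])
    namespace = {} if namespace is None else namespace
    exec(compile(reduced, "<filtered>", "exec"), namespace)
    return namespace[class_name]


# Hypothetical stand-in metaclass that records every class it creates.
created = []


class MyMeta(type):
    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        created.append(name)
        return cls


source = '''
class Wanted(metaclass=MyMeta):
    pass

class Unwanted(metaclass=MyMeta):
    pass
'''

Wanted = exec_only_class(source, "Wanted", {"MyMeta": MyMeta})
print(created)
```

Only the `Wanted` class statement runs here, so the metaclass never sees `Unwanted`. For a real file, `source` would be read from the path instead of being an inline string.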

Since you named your file "file.py", you can write:
from file import Wanted
If your file is in a folder, you can use:
from folder.file import Wanted

Get a module's attribute's type without importing the module or attribute

I am using Python to do some processing on .py files. These files may come from unknown sources, so I do not wish to run their code directly (for security), and I may not have their dependencies installed anyway. I am analysing these files using Python's tokenize module and then using the tokens to work out the types of any NAME tokens. For a function or class declared in a file you can just do:
import tokenize
# tokenize the source file ...
all_functions = []
for index, token in enumerate(tokens):
    # check the token type
    if token[0] == tokenize.NAME:
        # check the token's string
        if token[1] == "def":
            # the next token is always the name of the function
            all_functions.append(tokens[index + 1][1])
        elif token[1] == "class":
            # as above but for classes ...
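For comparison, the same collection of locally declared names can be sketched with the ast module instead of tokenize. This is an alternative technique, not the code used above; `collect_definitions` and the sample source are hypothetical. Like tokenize, it only parses the text and never runs it.

```python
import ast


def collect_definitions(source):
    """Return (function_names, class_names) declared anywhere in `source`."""
    functions, classes = [], []
    # ast.walk() visits every node, including nested definitions.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
        elif isinstance(node, ast.ClassDef):
            classes.append(node.name)
    return sorted(functions), sorted(classes)


sample = '''
def top():
    pass

class Box:
    def method(self):
        pass
'''

functions, classes = collect_definitions(sample)
print(functions, classes)
```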
The problem is that for an imported module I don't know how to tell the difference between a class and a function without seeing its declaration.
Take the following code snippet:
import pathlib
foo = pathlib.Path("some/path")
bar = pathlib.urlquote_from_bytes(b"some bytes")
Because this is well-written (PEP 8 compliant) code, I can assume that pathlib.Path is a class because its first character is uppercase, and that pathlib.urlquote_from_bytes is a function because it uses lowercase words with underscores. However, I cannot know for sure without the module's source code, which I may not have. Not all of the .py files I receive will necessarily be well written (PEP 8 compliant), so I cannot rely on this.
Is there any other way of finding out whether a Python module's attributes are of a given type? One thought I had was to run python3 -m py_compile <file> and then analyse the resulting .pyc file, but I have never looked into CPython, so I don't know whether this would actually be helpful. Any suggestions would be welcome.

For this specific use case, it turned out I did not need to separate classes from functions and could treat them all as callables. A callable is essentially any object that has a __call__() method, e.g. a call x(arg1, arg2, ...) is shorthand for x.__call__(arg1, arg2, ...).
The built-in callable() function checks for this. For anyone using older versions of Python it is worth noting that "this function was first removed in Python 3.0 and then brought back in Python 3.2."
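A small illustration of the callable idea, using the built-in `callable()`. The `Greeter` class and `shout` function are made up for this sketch. Note that, unlike the static analysis in the question, this inspects live objects, so it only applies to code you are willing to import.

```python
class Greeter:
    """Hypothetical class whose instances are callable via __call__."""

    def __call__(self, name):
        return f"Hello, {name}"


def shout(text):
    return text.upper()


greet = Greeter()

# Classes, functions and instances defining __call__ all count as callables.
results = {
    "class": callable(Greeter),
    "function": callable(shout),
    "instance with __call__": callable(greet),
    "plain string": callable("text"),
}
print(results)

# Calling the instance is the same as invoking __call__ through its class.
same = greet("Ann") == Greeter.__call__(greet, "Ann")
```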
Further reading:
Blog post: Is it a class or a function? It's a callable!
Python3 docs: Call function
StackOverflow: What is a callable?

How to retrieve the fully qualified name from certain functions while parsing the python file

I've noticed that functions have a __qualname__ attribute that gives their fully qualified name. However, what I'm trying to achieve is to extract a function call sequence from a given Python file, which means getting the fully qualified names while reading the file, without executing it. Here's an example of the problem.
Given code:
import libA
class A():
    bar = libA.Bar()
    bar.functionA()
I'm expecting the output of: libA.Bar.functionA.
Is there a way to use the inspect or ast tool to solve this issue?
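One way this might be approached with ast is sketched below. It is a rough heuristic, not a complete resolver: `dotted_name` and `CallCollector` are hypothetical names, it assumes that a variable assigned from a call is an instance of the thing that was called, and it ignores import aliases, scoping and reassignment through anything other than simple names.

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a chain of Name/Attribute nodes, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class CallCollector(ast.NodeVisitor):
    def __init__(self):
        self.bindings = {}  # variable name -> dotted name it was created from
        self.calls = []     # resolved call names, in source order

    def resolve(self, name):
        # Replace a known variable at the head of the name with its origin.
        head, _, rest = name.partition(".")
        if head in self.bindings:
            return self.bindings[head] + ("." + rest if rest else "")
        return name

    def visit_Assign(self, node):
        self.generic_visit(node)  # record the call on the right-hand side
        if isinstance(node.value, ast.Call):
            callee = dotted_name(node.value.func)
            if callee is not None:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        self.bindings[target.id] = self.resolve(callee)

    def visit_Call(self, node):
        callee = dotted_name(node.func)
        if callee is not None:
            self.calls.append(self.resolve(callee))
        self.generic_visit(node)


source = '''
import libA
class A():
    bar = libA.Bar()
    bar.functionA()
'''

collector = CallCollector()
collector.visit(ast.parse(source))
print(collector.calls)
```

For the example in the question this records `libA.Bar` followed by `libA.Bar.functionA`. The file is only parsed, so `libA` does not need to be installed.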

How can you get the source code of dynamically defined functions in Python?

When defining code dynamically in Python (e.g. through exec, or by loading it from some medium other than import), I am unable to get at the source of the defined function.
inspect.getsource seems to look for the file of the module the function was loaded from.
import inspect
code = """
def my_function():
    print("Hello dears")
"""
exec(code)
my_function()  # Works, as expected
print(inspect.getsource(my_function))  # Fails with OSError('could not get source code')
Is there any other way to get at the source of a dynamically interpreted function (or other object, for that matter)?

Is there any other way to get at the source of a dynamically interpreted function (or other object, for that matter)?
One option would be to dump the source to a file and exec from there, though that litters your filesystem with garbage you need to clean up.
A somewhat less reliable but less garbagey alternative would be to rebuild the source (-ish) from the bytecode, using astor.to_source() for instance. It will give you a "corresponding" source but may alter formatting or lose metadata compared to the original.
The simplest option is to attach your original source to the created function object:
code = """
def my_function():
    print("Hello dears")
"""
exec(code)
my_function.__source__ = code  # has nothing to do with getsource
One more alternative (though probably not useful here, as I assume you want the body to be created dynamically, from a template for instance) would be to swap the code object for one you've updated with the correct / relevant co_firstlineno (and optionally filename, though you can set that as part of the compile() call). That's only useful if for some weird reason you have your Python code literally embedded in another file but can't or don't want to extract it to its own module for normal evaluation.

You can do it roughly as shown below:
import inspect
source = """
def foo():
    print("Hello World")
"""
file_name = '/tmp/foo.py'  # any unique name works, e.g. one derived from a hash of the source
with open(file_name, 'w') as f:
    f.write(source)
code = compile(source, file_name, 'exec')
exec(code)
foo()  # Works, as expected
print(inspect.getsource(foo))
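A variation that avoids writing a file is to register the source text in linecache under the same made-up filename that is passed to compile(). This technique is not from the answers above. It relies on the linecache.cache dictionary, which is not documented in the linecache module documentation, so treat it as a sketch that depends on CPython behaviour. The helper name `exec_with_source` and the filename `<dynamic-my_function>` are hypothetical.

```python
import inspect
import linecache


def exec_with_source(source, filename, namespace):
    """Execute `source` so that inspect.getsource() can find it later."""
    # Cache entries are (size, mtime, lines, fullname). An mtime of None keeps
    # linecache.checkcache() from dropping the entry although no file exists.
    linecache.cache[filename] = (
        len(source),
        None,
        source.splitlines(True),
        filename,
    )
    # The filename given to compile() must match the linecache key.
    exec(compile(source, filename, "exec"), namespace)


source = 'def my_function():\n    return "Hello dears"\n'
namespace = {}
exec_with_source(source, "<dynamic-my_function>", namespace)

recovered = inspect.getsource(namespace["my_function"])
print(recovered)
```

Each dynamically created function needs its own unique filename, otherwise later registrations overwrite earlier ones.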

Python if_py and VIM: exec('import re') does not work

I have an auto-completion script that I would like to modify
to complete class names, attributes, methods, etc. In Python,
when I type re.co<TAB> it should give me a list of matching
methods. The problem is, I don't want to parse the re.py file.
I'd prefer to:
import re
and then do dir(re) to get the list of methods. But how?
I tried:
imp_obj = exec('import re')
and it refused to work in if_py! 2 + 2 works, though.

The best way to do this is to use the builtin __import__ function to import the module like so:
imp_obj = __import__('re')
Your code probably does not work because import does not return a value that would in turn be returned by exec('import re').
In general, it is a bad idea to use exec on text input by the user because it has a higher probability of executing arbitrary code you don't want to execute.
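As a rough sketch of the completion step itself, the module can be imported by name and its attributes filtered by prefix. The function name `complete` is hypothetical, and `importlib.import_module` is used here as a swapped-in alternative to the `__import__` call shown above. Importing still executes the module, so this is only suitable for modules you trust.

```python
import importlib


def complete(module_name, prefix):
    """Return the sorted attribute names of a module that start with `prefix`."""
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if name.startswith(prefix))


# What re.co<TAB> might offer.
candidates = complete("re", "co")
print(candidates)
```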
