I am using Sphinx to generate the documentation for a project of mine.
In this project, I describe a list of available commands in a YAML file which, once loaded, results in a dictionary in the form {command-name : command-description}, for example:
commands = {"copy" : "Copy the highlighted text in the clipboard",
"paste" : "Paste the clipboard text to cursor location",
...}
What I would like to know is whether there is a method in Sphinx to load the YAML file during the make html cycle, translate the Python dictionary into some reStructuredText format (e.g. a definition list) and include it in my HTML output.
I would expect my .rst file to look like:
Available commands
==================
The commands available in bla-bla-bla...
.. magic-directive-that-execute-python-code::
   :maybe python code or name of python file here:
and to be converted internally to:
Available commands
==================
The commands available in bla-bla-bla...
copy
    Copy the highlighted text in the clipboard
paste
    Paste the clipboard text to cursor location
before being translated to HTML.
In the end I found a way to achieve what I wanted. Here's the how-to:
Create a Python script (let's call it generate-includes.py) that will generate the reStructuredText and save it in the myrst.inc file. (In my example, this would be the script loading and parsing the YAML, but this is irrelevant.) Make sure this file is executable!
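For reference, a minimal sketch of what such a script could look like (the file name commands.yaml and the dictionary layout are assumptions based on the question):

#!/usr/bin/env python
# generate-includes.py -- minimal sketch; "commands.yaml" is a hypothetical file name.
import yaml

with open("commands.yaml") as f:
    commands = yaml.safe_load(f)  # e.g. {"copy": "Copy the highlighted text ...", ...}

# Emit a reStructuredText definition list into myrst.inc.
with open("myrst.inc", "w") as out:
    for name, description in sorted(commands.items()):
        out.write("%s\n    %s\n\n" % (name, description))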
Use the include directive in the main .rst document of your documentation, at the point where you want your dynamically-generated content to be inserted:
.. include:: myrst.inc
Modify the sphinx Makefile in order to generate the required .inc files at build time:
myrst.inc:
        ./generate-includes.py

html: myrst.inc
        ...(other stuff here)
Build your documentation normally with make html.
An improvement based on Michael's code and the built-in include directive:
import sys
from os.path import basename

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

from docutils.parsers.rst import Directive
from docutils import nodes, statemachine

class ExecDirective(Directive):
    """Execute the specified python code and insert the output into the document"""
    has_content = True

    def run(self):
        oldStdout, sys.stdout = sys.stdout, StringIO()

        tab_width = self.options.get('tab-width', self.state.document.settings.tab_width)
        source = self.state_machine.input_lines.source(self.lineno - self.state_machine.input_offset - 1)

        try:
            exec('\n'.join(self.content))
            text = sys.stdout.getvalue()
            lines = statemachine.string2lines(text, tab_width, convert_whitespace=True)
            self.state_machine.insert_input(lines, source)
            return []
        except Exception:
            return [nodes.error(None,
                                nodes.paragraph(text="Unable to execute python code at %s:%d:" % (basename(source), self.lineno)),
                                nodes.paragraph(text=str(sys.exc_info()[1])))]
        finally:
            sys.stdout = oldStdout

def setup(app):
    app.add_directive('exec', ExecDirective)
This one imports the output earlier so that it goes straight through the parser. It also works in Python 3.
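Applied to the original question, the exec block can simply print the definition list, which is then re-parsed as reStructuredText; a sketch (the file name commands.yaml is an assumption):

.. exec::
   import yaml
   with open("commands.yaml") as f:  # hypothetical file name
       commands = yaml.safe_load(f)
   for name, description in sorted(commands.items()):
       print("%s\n    %s\n" % (name, description))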
I needed the same thing, so I threw together a new directive that seems to work (I know nothing about custom Sphinx directives, but it's worked so far):
import sys
from os.path import basename
from StringIO import StringIO

from sphinx.util.compat import Directive
from docutils import nodes

class ExecDirective(Directive):
    """Execute the specified python code and insert the output into the document"""
    has_content = True

    def run(self):
        oldStdout, sys.stdout = sys.stdout, StringIO()
        try:
            exec '\n'.join(self.content)
            return [nodes.paragraph(text=sys.stdout.getvalue())]
        except Exception, e:
            return [nodes.error(None,
                                nodes.paragraph(text="Unable to execute python code at %s:%d:" % (basename(self.src), self.srcline)),
                                nodes.paragraph(text=str(e)))]
        finally:
            sys.stdout = oldStdout

def setup(app):
    app.add_directive('exec', ExecDirective)
It's used as follows:
.. exec::
   print "Python code!"
   print "This text will show up in the document"
Sphinx doesn't have anything built-in to do what you like. You can either create a custom directive to process your files or generate the reStructuredText in a separate step and include the resulting reStructuredText file using the include directive.
I know this question is old, but maybe someone else will find it useful as well.
It sounds like you don't actually need to execute any python code, but you just need to reformat the contents of your file. In that case you might want to look at sphinx-jinja (https://pypi.python.org/pypi/sphinx-jinja).
You can load your YAML file in the conf.py:
jinja_contexts = yaml.load(yourFileHere)
Then you can use jinja templating to write out the contents and have them treated as reST input.
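For example, a sketch of that setup (the context name commands_ctx and the file name commands.yaml are assumptions; check the sphinx-jinja documentation for the exact jinja_contexts layout):

# conf.py -- expose the YAML data to sphinx-jinja under a named context
import yaml

with open("commands.yaml") as f:
    jinja_contexts = {"commands_ctx": {"commands": yaml.safe_load(f)}}

Then, in the .rst file, render the dictionary as a definition list:

.. jinja:: commands_ctx

   {% for name, description in commands.items() %}
   {{ name }}
       {{ description }}
   {% endfor %}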
Sphinx does support custom extensions, which would probably be the best way to do this: http://sphinx.pocoo.org/ext/tutorial.html
Not quite the answer you're after, but perhaps a close approximation: yaml2rst. It's a converter from YAML to RST. It doesn't do anything explicitly fancy with the YAML itself, but it looks for comment lines (starting with #) and pulls them out into RST chunks, with the YAML going into code-blocks. This allows for a sort-of literate YAML.
Also, the syntax-highlighted YAML is quite readable (heck, it's YAML, not JSON!).
Related
This seems like it should be simple enough, but I haven't been able to find a working example of how to approach it. Simply put, I am generating a JSON file based on a list that a script generates. What I would like to do is use some variables in the dump() call so that the JSON file is produced in a specific folder. By default it is of course dumped into the same place the .py file is located, and I can't seem to find a way to run the .py file separately and then produce the JSON file in a new folder of my choice:
import json

name = 'Best'
season = '2019-2020'
blah = ['steve','martin']

with open(season + '.json', 'w') as json_file:
    json.dump(blah, json_file)
Take for example the above. What I'd want to do is the following:
Take the variable 'name' and use it to generate a folder of the same name inside the folder where the .py file itself is located. The JSON file would then be placed in that folder, where I can manipulate it.
Right now my issue is that I can't find a way to produce the file in a specific folder. Any suggestions? This does seem simple enough, but nothing I've found had a method to do it. Thanks!
Python's pathlib is quite convenient to use for this task:
import json
from pathlib import Path
data = ['steve','martin']
season = '2019-2020'
Paths of the new directory and json file:
base = Path('Best')
jsonpath = base / (season + ".json")
Create the directory if it does not exist and write the json file:
base.mkdir(exist_ok=True)
jsonpath.write_text(json.dumps(data))
This will create the directory relative to the directory you started the script in. If you want an absolute path, you could use Path('/somewhere/Best').
If you want to start the script while being in some other directory and still create the new directory next to the script, use: Path(__file__).resolve().parent / 'Best'.
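Putting the pieces together, a complete sketch that anchors the output directory to the script's own location:

import json
from pathlib import Path

data = ['steve', 'martin']
season = '2019-2020'

# Create <script dir>/Best (if missing) and write the JSON file into it.
base = Path(__file__).resolve().parent / 'Best'
base.mkdir(exist_ok=True)
(base / (season + '.json')).write_text(json.dumps(data))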
First of all, instead of doing everything in the same place, have a separate function that creates the folder (if not already present) and dumps the JSON data, as below:
import json
import os

def write_json(target_path, target_file, data):
    # Create the target directory first if it does not exist yet.
    if not os.path.exists(target_path):
        try:
            os.makedirs(target_path)
        except Exception as e:
            print(e)
            raise
    with open(os.path.join(target_path, target_file), 'w') as f:
        json.dump(data, f)
Then call your function like:
write_json('/usr/home/target', 'my_json.json', my_json_data)
Use string formatting:
import json
import os

name = 'Best'
season = '2019-2020'
blah = ['steve','martin']

try:
    os.mkdir(name)
except OSError as error:
    print(error)

with open("{}/{}.json".format(name, season), 'w') as json_file:
    json.dump(blah, json_file)
Use os.path.join():
with open(os.path.join(name, season + '.json'), 'w') as json_file:
The advantage over writing a literal slash is that os.path.join automatically picks the right separator for the operating system you are on (a slash on Linux, a backslash on Windows).
I am new to Python. I have installed the signxml package and I am working through the XML signing process.
Link to the Python package: https://pypi.org/project/signxml/
My XML file is getting generated. However, the generated signature element is a little different from what I need. I was able to match most of it, but I have no idea how to match the following part.
Can anyone please help me with that?
The part that differs is the following tag:
<Signature>
The above part should instead look like this:
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
When I searched the signxml core file, I found the following note:
To specify the location of an enveloped signature within **data**, insert a
``<ds:Signature Id="placeholder"></ds:Signature>`` element in **data** (where
"ds" is the "http://www.w3.org/2000/09/xmldsig#" namespace). This element will
be replaced by the generated signature, and excised when generating the digest.
How do I make this change?
Here is my Python code:
from lxml import etree
import xml.etree.ElementTree as ET
from signxml import XMLSigner, XMLVerifier
import signxml
el = ET.parse('example.xml')
root = el.getroot()
#cert = open("key/public.pem").read()
key = open("key/private.pem").read()
signed_root = XMLSigner(method=signxml.methods.enveloped,signature_algorithm='rsa-sha512',digest_algorithm="sha512").sign(root, key=key)
tree = ET.ElementTree(signed_root)
#dv = tree.findall(".//DigestValue");
#print(dv);
tree.write("new_signed_file.xml")
What the above code does: it takes an XML file, performs the digital signature process, and generates a new signed file.
Can anyone please guide me on where and what I should change for this requirement?
I am assuming you are using Python signxml.
Go to your Python installation and open the file Python\Lib\site-packages\signxml\__init__.py.
In __init__.py, make the following changes.
Find the following code:
def _unpack(self, data, reference_uris):
    sig_root = Element(ds_tag("Signature"), nsmap=self.namespaces)
Change it to the following code:
def _unpack(self, data, reference_uris):
    #sig_root = Element(ds_tag("Signature"), nsmap=self.namespaces)
    sig_root = Element(ds_tag("Signature"), xmlns="http://www.w3.org/2000/09/xmldsig#")
After making this change, re-compile your Python signxml package.
Then re-generate the signed XML file.
Probably a simple query, but basically: I have data in the file /foo/bar/foobar.txt, and I am working in the script /some/path/read_foobar.py. Now I want to read the file foobar.txt, but rather than giving the full path, I thought of adding /foo/bar to the path.
So I added the following at the start of read_foobar.py:
import sys
sys.path.append("/foo/bar")
But when I try open("foobar.txt", "r"), it is not able to find the file. How do I do this?
Thanks
You can do it like this:
import os
os.chdir('/foo/bar')
f = open('foobar.txt', 'r')
sys.path is used to set the path used to look for Python modules. Short of writing a helper function that keeps a list of directories to search when opening a file (a sketch of one follows), I don't believe there is a standard module that provides this functionality.
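For illustration, a minimal sketch of such a helper (the directory list and the function name are hypothetical):

import os

SEARCH_DIRS = ["/foo/bar", "."]  # hypothetical list of places to look

def open_from_search_dirs(filename, mode="r"):
    # Try each directory in turn and open the first existing match.
    for directory in SEARCH_DIRS:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return open(candidate, mode)
    raise IOError("%s not found in %s" % (filename, SEARCH_DIRS))

f = open_from_search_dirs("foobar.txt")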
From what I gathered from here and some quick tests, appending a path to sys.path will make Python search that path when you import a file/module, but not when open-ing it. Let's say we have a file called foo.py in /foo/bar/:
import sys
sys.path.append("/foo/bar/")

try:
    f = open('foo.py', 'r')
except IOError:
    print('this did not work')  # this will print

import foo  # no problems here
Unfortunately you can't. The PATH environment variable is only used by the operating system to search for executable files, and Python searches for modules to import using sys.path (initialized in part from the PYTHONPATH environment variable); neither is consulted by open().
You may want to consider setting a symbolic link to that file from your current working directory:
ln -s /foo/bar/foobar.txt /some/path/foobar.txt
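If you would rather create the link from Python than from the shell, os.symlink does the same thing:

import os

# Same effect as the ln -s command above (paths taken from the question).
os.symlink("/foo/bar/foobar.txt", "/some/path/foobar.txt")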
I have a folder full of files and they don't have an extension. How can I check file types? I want to check the file type and change the filename accordingly. Let's assume a function filetype(x) returns a file type like png. I want to do this:
files = os.listdir(".")
for f in files:
    os.rename(f, f + filetype(f))
How do I do this?
There are Python libraries that can recognize files based on their content (usually a header / magic number) and that don't rely on the file name or extension.
If you're addressing many different file types, you can use python-magic. That's just a Python binding for the well-established magic library. This has a good reputation and (small endorsement) in the limited use I've made of it, it has been solid.
There are also libraries for more specialized file types. For example, the Python standard library has the imghdr module that does the same thing just for image file types.
If you need dependency-free (pure Python) file type checking, see filetype.
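For example, a quick sketch of the filetype API (the file name is hypothetical; guess() returns None for unrecognized content):

import filetype

kind = filetype.guess('mystery_file')  # hypothetical file name
if kind is not None:
    print(kind.extension)  # e.g. 'png'
    print(kind.mime)       # e.g. 'image/png'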
The Python Magic library provides the functionality you need.
You can install the library with pip install python-magic and use it as follows:
>>> import magic
>>> magic.from_file('iceland.jpg')
'JPEG image data, JFIF standard 1.01'
>>> magic.from_file('iceland.jpg', mime=True)
'image/jpeg'
>>> magic.from_file('greenland.png')
'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'
>>> magic.from_file('greenland.png', mime=True)
'image/png'
The Python code in this case is calling to libmagic beneath the hood, which is the same library used by the *NIX file command. Thus, this does the same thing as the subprocess/shell-based answers, but without that overhead.
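To tie this back to the question's rename loop, one possible sketch maps the detected MIME type to an extension with the standard library's mimetypes module (guess_extension() can return None for exotic types):

import mimetypes
import os

import magic

for f in os.listdir("."):
    if os.path.isfile(f):
        mime = magic.from_file(f, mime=True)   # e.g. 'image/png'
        ext = mimetypes.guess_extension(mime)  # e.g. '.png', or None
        if ext:
            os.rename(f, f + ext)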
On unix and linux there is the file command to guess file types. There's even a windows port.
From the man page:
    File tests each argument in an attempt to classify it. There are three
    sets of tests, performed in this order: filesystem tests, magic number
    tests, and language tests. The first test that succeeds causes the
    file type to be printed.
You would need to run the file command with the subprocess module and then parse the results to figure out an extension.
edit: Ignore my answer. Use Chris Johnson's answer instead.
In the case of images, you can use the imghdr module.
>>> import imghdr
>>> imghdr.what('8e5d7e9d873e2a9db0e31f9dfc11cf47') # You can pass a file name or a file object as first param. See doc for optional 2nd param.
'png'
Python 2 imghdr doc
Python 3 imghdr doc
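A sketch of the question's rename loop using imghdr (images only; what() returns None for files it does not recognize):

import imghdr
import os

for f in os.listdir("."):
    kind = imghdr.what(f)  # e.g. 'png', or None
    if kind:
        os.rename(f, f + "." + kind)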
import subprocess as sub

# The command must be a list (or use shell=True) for Popen to find the executable.
p = sub.Popen(['file', 'yourfile.txt'], stdout=sub.PIPE, stderr=sub.PIPE)
output, errors = p.communicate()
print(output)
As Steven pointed out, subprocess is the way to go. You can capture the command output in the way shown above, as this post explains.
You can also install the official file binding for Python, a library called file-magic (it does not use ctypes, like python-magic).
It's available on PyPI as file-magic and on Debian as python-magic. For me this library is the best to use since it's available on PyPI and on Debian (and probably other distributions), making the process of deploying your software easier.
I've blogged about how to use it, also.
With the newer subprocess library, you can use the following code (*nix-only solution):

import subprocess
import shlex

filename = 'your_file'
cmd = shlex.split('file --mime-type {0}'.format(filename))
result = subprocess.check_output(cmd)
mime_type = result.split()[-1]
print(mime_type)
You can also use this code (pure Python, matching on the first 3 header bytes of the file). The bare return statements imply a function, so it is wrapped in one here with a hypothetical name:

import os

def guess_image_mime(pathfile):
    # guess_image_mime is a hypothetical wrapper name; MEDIA_ROOT comes
    # from the surrounding (Django-style) settings.
    full_path = os.path.join(MEDIA_ROOT, pathfile)
    try:
        image_data = open(full_path, "rb").read()
    except IOError:
        return "Incorrect Request :( !!!"
    # Compare the first three bytes against well-known magic numbers.
    header_byte = image_data[0:3].hex().lower()  # bytes.hex() (Python 3); Python 2 used .encode("hex")
    if header_byte == '474946':
        return "image/gif"
    elif header_byte == '89504e':
        return "image/png"
    elif header_byte == 'ffd8ff':
        return "image/jpeg"
    else:
        return "binary file"

No package installation required (updated version).
This only works on Linux, but using the sh Python module you can simply call any shell command:
https://pypi.org/project/sh/
pip install sh
import sh
sh.file("/root/file")
Output:
/root/file: ASCII text
This code recursively lists all files of a given type in a given folder, matching on the MIME type detected by libmagic:
import magic
import glob
from os.path import isfile

ROOT_DIR = 'backup'
WANTED_EXTENSION = 'sqlite'

for filename in glob.iglob(ROOT_DIR + '/**', recursive=True):
    if isfile(filename):
        extension = magic.from_file(filename, mime=True)
        if WANTED_EXTENSION in extension:
            print(filename)
https://gist.github.com/izmcm/6a5d6fa8d4ec65fd9851a1c06c8946ac
I have a Python module that has a variety of data files (a set of CSV files representing curves) that need to be loaded at runtime. The csv module works very well:
# curvefile = "ntc.10k.csv"
raw = csv.reader(open(curvefile, 'rb'), delimiter=',')
But if I import this module into another script, I need to find the full path to the data file:
/project
    /shared
        curve.py
        ntc.10k.csv
        ntc.2k5.csv
    /apps
        script.py
I want the script.py to just refer to the curves by basic filename, not with full paths. In the module code, I can use:
pkgutil.get_data("curve", "ntc.10k.csv")
which works very well at finding the file, but it returns the csv file already read in, whereas csv.reader requires the file handle itself. Is there any way to make these two modules play well together? They're both standard library modules, so I wasn't really expecting problems. I know I can start splitting the pkgutil binary file data, but then I might as well not be using the csv library.
I know I can just use this in the module code and forget about pkgutil, but it seems like pkgutil is exactly what this is for:
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, curvefile)
raw = csv.reader(open(DATA_PATH, "rb"))
I opened up the source code to get_data, and it is trivial to have it return the path to the file instead of the loaded file. The module below should do the trick. Use the keyword as_string=True to return the file read into memory, or as_string=False to return the path.
import os, sys
from pkgutil import get_loader

def get_data_smart(package, resource, as_string=True):
    """Rewrite of pkgutil.get_data() that actually lets the user determine if data should
    be returned read into memory (aka as_string=True) or just return the file path.
    """
    loader = get_loader(package)
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    mod = sys.modules.get(package) or loader.load_module(package)
    if mod is None or not hasattr(mod, '__file__'):
        return None

    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    if as_string:
        return loader.get_data(resource_name)
    else:
        return resource_name
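Hypothetical usage with the question's layout:

import csv

# as_string=False returns the path, which can then be opened normally.
path = get_data_smart("curve", "ntc.10k.csv", as_string=False)
raw = csv.reader(open(path, "rb"))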
It's not ideal, especially for very large files, but you can use StringIO to turn a string into something with a read() method, which csv.reader should be able to handle.
import csv
import pkgutil
from StringIO import StringIO  # Python 2; on Python 3 use io.StringIO and decode the bytes first

csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
csvio = StringIO(csvdata)
raw = csv.reader(csvio)
Over 10 years after the question was asked, but I came here using Google and went down the rabbit hole posted in other answers. Nowadays this seems to be more straightforward. Below is my implementation using the stdlib's importlib.resources, which returns the filesystem path to the package's resource as a string. It should work with 3.7+ (importlib.resources was added in Python 3.7).
import importlib.resources
import os

def get_data_file_path(package: str, resource: str) -> str:
    """
    Returns the filesystem path of a resource marked as package
    data of a Python package installed.

    :param package: string of the Python package the resource is
        located in, e.g. "mypackage.module"
    :param resource: string of the filename of the resource (do not
        include directory names), e.g. "myfile.png"
    :return: string of the full (absolute) filesystem path to the
        resource if it exists.
    :raises ModuleNotFoundError: In case the package `package` is not found.
    :raises FileNotFoundError: In case the file in `resource` is not
        found in the package.
    """
    # Guard against non-existing files, or else importlib.resources.path
    # may raise a confusing TypeError.
    if not importlib.resources.is_resource(package, resource):
        raise FileNotFoundError(f"Python package '{package}' resource '{resource}' not found.")
    with importlib.resources.path(package, resource) as resource_path:
        return os.fspath(resource_path)
Another way is to use json.loads() along with file.decode(). As get_data() retrieves data as bytes, you need to convert it to a string in order to process it as JSON:
import json
import pkgutil
data_file = pkgutil.get_data('test.testmodel', 'data/test_data.json')
length_data_file = len(json.loads(data_file.decode()))