I have a Snakemake command line with configuration options like this:
snakemake --config \
f1=$PWD/file1.txt \
f2=$PWD/file2.txt \
f3=/path/to/file3.txt \
...more key-value pairs \
--directory /path/to/output/dir
file1.txt and file2.txt are expected to be in the same directory as the Snakefile; file3.txt is somewhere else. I need the file paths to be absolute, hence the $PWD variable, so that Snakemake can still find the files after moving to /path/to/output/dir.
Because I am starting to accumulate several configuration options, I would like to move all the --config items to a separate YAML configuration file. The problem is: how do I transfer the variable $PWD to a configuration file?
I could put a dummy string in the YAML file indicating that it should be replaced by the directory where the Snakefile is (e.g. f1: <replace me>/file1.txt), but that feels awkward. Any better ideas? It may be that I should rethink how the fileX.txt files are passed to Snakemake...
You can access the directory the Snakefile lives in with workflow.basedir. You might be able to get away with specifying the relative paths in the config file and then building the absolute paths in your Snakefile, e.g. as:
import pathlib

file1 = pathlib.Path(workflow.basedir) / config["f1"]
file2 = pathlib.Path(workflow.basedir) / config["f2"]
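For completeness, a minimal sketch of how this could fit together; the file name config.yaml is an assumption, as is keeping the entries relative to the Snakefile:
# Snakefile -- sketch, assuming a config.yaml next to it containing
# relative entries such as:
#   f1: file1.txt
#   f2: file2.txt
import pathlib

configfile: "config.yaml"  # hypothetical file name

# resolve the relative config entries against the Snakefile's directory,
# so they remain valid after Snakemake moves into --directory
file1 = pathlib.Path(workflow.basedir) / config["f1"]
file2 = pathlib.Path(workflow.basedir) / config["f2"]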
One option is to use an external module, intake, to handle the environment variable integration. There is a similar answer elsewhere, but a more specific example for this question is as follows.
Intake expects a YAML file with a field called sources, containing nested entries that specify at the very least a (possibly local) url at which each file can be accessed:
# config.yml
sources:
  file1:
    args:
      url: "{{env(PWD)}}/file1.txt"
  file2:
    args:
      url: "{{env(PWD)}}/file2.txt"
Inside the Snakefile, the relevant code would be:
import intake
cat = intake.open_catalog('config.yml')
f1 = cat['file1'].urlpath
f2 = cat['file2'].urlpath
Note that for less verbose YAML files, intake provides syntax for parameterization; see the docs or this example.
Related
How do I safely store passwords and API keys within a .env file and properly parse them using Python?
I want to store passwords that I do not wish to push into public repos.
To read the key-value pairs of a .env file, you can use os.getenv(key), where key is replaced by the name of the value you want to access (this assumes the variables have been loaded into the environment; see python-dotenv below if they haven't).
Suppose the contents of the .env file are:
A=B
FOO=BAR
SECRET=VERYMUCH
You can then access the values like this:
import os
print(os.getenv("A"))
print(os.getenv("FOO"))
print(os.getenv("SECRET"))
# And so on...
which prints the following to stdout:
B
BAR
VERYMUCH
Now, if you are using Git version control and never want to push an environment file accidentally, add this to your .gitignore file and it will be conveniently ignored:
.env
(Although you can make your life a bit easier by using an existing .gitignore template; these ignore .env files by default.)
Lastly, you can load .env just like a text file:
with open(".env") as env:
    ...

and then proceed to parse it manually using a regex, as sketched below.
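For example, a minimal sketch of such manual parsing (the regex and the decision to export into os.environ are illustrative, not part of any library):
import os
import re

# naive .env parser: match KEY=VALUE lines, skipping blanks and comments
line_re = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$')

with open(".env") as env:
    for line in env:
        if line.lstrip().startswith("#"):
            continue
        match = line_re.match(line)
        if match:
            key, value = match.groups()
            os.environ[key] = value  # export so os.getenv() sees it

print(os.getenv("FOO"))  # prints BAR with the example file above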
If for some reason the .env file isn't picked up by your Python script, use the python-dotenv module. You can use it like this:
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
# Code of your application, which uses environment variables (e.g. from `os.environ` or
# `os.getenv`) as if they came from the actual environment.
I am writing a Snakemake file that has input files in a specific location, with specific folder names (for this example, barcode9[456]). I need to change the naming conventions in these directories, so I now want to add a first rule to my Snakemake workflow that would link the folders in the original location (FASTQ_PATH) to an output folder in the Snakemake working directory. The names of the linked folders in this directory come from a Python dictionary d, defined in the Snakefile. I would then use these directories as input for the downstream rules.
So the first rule of my workflow actually runs a Python script (scripts/ln.py) that maps the naming convention in the original directory to the desired one and links the FASTQs. The Snakefile looks like this:
FASTQ_PATH = '/path/to/original_location'

# dictionary that maps the subdirectories in FASTQ_PATH (keys) to the
# directory names that I want in the snakemake working directory (values)
d = {'barcode94': 'IBC_UZL-CV5-04',
     'barcode95': 'IBC_UZL-CV5-42',
     'barcode96': 'IBC_UZL-CV5-100'}

rule all:
    input:
        # for each element in list(d.values()) I want to create a subdirectory
        # pointing to the corresponding path in the original location (FASTQ_PATH)
        expand('symLinkFq/{barcode}', barcode = list(d.values()))

rule symlink:
    input:
        FASTQ_PATH,
        d
    output:
        'symLinkFq/{barcode}'
    script:
        "scripts/ln.py"
The Python script that I call to make the links is shown below:
import pandas as pd
import subprocess as sp
import os

# parsing variables from Snakefile
d_sampleNames = snakemake.input[1]
fastq_path = snakemake.input[0]

os.makedirs('symLinkFq')

for barcode in list(d_sampleNames.keys()):
    idx = list(d_sampleNames.keys()).index(barcode)
    sampleName = list(d_sampleNames.values())[idx]
    # the relative path with respect to the working directory should
    # suffice for the DEST in the ln -s command
    sp.run(f"ln -s {fastq_path}/{barcode} symLinkFq/{sampleName}", shell=True)
But when I call snakemake -np -s Snakefile I get
Building DAG of jobs...
MissingInputException in line 15 of /nexusb/SC2/ONT/scripts/SnakeMake/minimalExample/renameFq/Snakefile:
Missing input files for rule symlink:
barcode95
barcode94
barcode96
The error kind of makes sense to me: the only 'inputs' I have are Python variables rather than files that actually exist on my system.
I guess my issue comes down to the fact that the wildcards I want to use across rules are not present in any existing file that could be used as input, so the best I can think of is the dictionary with the correspondence, though it does not work as I tried it.
Does anyone know how to get around this? Any different approach is welcome.
If I understand correctly, I think it could be easier...
I would reverse the key/value mapping (here with dict(zip(...))), then use a lambda input function to get the source directory for each output key:
import os

FASTQ_PATH = '/path/to/files'

d = {'barcode94': 'IBC_UZL-CV5-04',
     'barcode95': 'IBC_UZL-CV5-42',
     'barcode96': 'IBC_UZL-CV5-100'}

d = dict(zip(d.values(), d.keys()))  # Values become keys and vice versa

rule all:
    input:
        expand('symLinkFq/{barcode}', barcode = d.keys())

rule symlink:
    input:
        indir = lambda wc: os.path.join(FASTQ_PATH, d[wc.barcode]),
    output:
        outdir = directory('symLinkFq/{barcode}'),
    shell:
        r"""
        ln -s {input.indir} {output.outdir}
        """
As an aside, in a Python script I would use os.symlink() instead of spawning a subprocess to call ln -s; I think it is easier to debug if something goes wrong.
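For illustration, the loop from the question's scripts/ln.py could be rewritten with os.symlink() along these lines (a sketch, assuming the same fastq_path and a plain dict d of barcode-to-sample-name mappings):
import os

os.makedirs('symLinkFq', exist_ok=True)

for barcode, sampleName in d.items():
    # os.symlink(src, dst) raises a descriptive OSError on failure,
    # unlike a spawned `ln -s`, whose errors are easy to miss
    os.symlink(os.path.join(fastq_path, barcode),
               os.path.join('symLinkFq', sampleName))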
When I run sphinx-apidoc and then make html, it produces doc pages that have "Subpackages" and "Submodules" sections, with "module" and "package" appended to each module/package name in the table of contents (TOC). How might I prevent these extra titles from being written, without editing the Sphinx source?
Here's an example of the doc pages I would like to produce (notice the TOC):
http://selenium.googlecode.com/svn/trunk/docs/api/py/index.html#documentation
I understand it is due to the apidoc.py file in the sphinx source (line 88):
https://bitbucket.org/birkenfeld/sphinx/src/ef3092d458cc00c4b74dd342ea05ba1059a5da70/sphinx/apidoc.py?at=default
I could manually edit each individual .rst file to delete these titles, or just remove those lines of code from the script, but then I'd have to compile the Sphinx source code. Is there an automatic way of doing this without manually editing the Sphinx source?
I was struggling with this myself when I found this question... The answers given didn't quite do what I wanted so I vowed to come back when I figured it out. :)
In order to remove 'package' and 'module' from the auto-generated headings and have docs that are truly automatic, you need to make changes in several places, so bear with me.
First, you need to handle your sphinx-apidoc options. What I use is:
sphinx-apidoc -fMeET ../yourpackage -o api
Assuming you are running this from inside the docs directory, this will source yourpackage for documentation and put the resulting files at docs/api. The options used here will overwrite existing files (-f), put module docs before submodule docs (-M), put documentation for each module on its own page (-e), abstain from creating module/package headings if your docstrings already have them (-E), and skip creating a table of contents file (-T).
That's a lot of options to remember, so I just add this to the end of my Makefile:
buildapi:
	sphinx-apidoc -fMeET ../yourpackage -o api
	@echo "Auto-generation of API documentation finished. " \
	      "The generated files are in 'api/'"
With this in place, you can just run make buildapi to build your docs.
Next, create an api.rst file at the root of your docs with the following contents:
API Documentation
=================

Information on specific functions, classes, and methods.

.. toctree::
   :glob:

   api/*
This will create a table of contents with everything in the api folder.
Unfortunately, sphinx-apidoc will still generate a yourpackage.rst file with an ugly 'yourpackage package' heading, so we need one final piece of configuration. In your conf.py file, find the exclude_patterns option and add this file to the list. It should look something like this:
exclude_patterns = ['_build', 'api/yourpackage.rst']
Now your documentation should look exactly like you designed it in the module docstrings, and you never have to worry about your Sphinx docs and your in-code documentation being out of sync!
It's probably late, but the toctree options maxdepth or titlesonly should do the trick. More details:
http://sphinx-doc.org/latest/markup/toctree.html
The answer by Jen Garcia helped a lot, but it requires repeating package names in docstrings. I used a Perl one-liner to remove the "module" or "package" suffix in my Makefile:
docs:
	rm -rf docs/api docs/_build
	sphinx-apidoc -MeT -o docs/api wdmapper
	for f in docs/api/*.rst; do\
	  perl -pi -e 's/(module|package)$$// if $$. == 1' $$f ;\
	done
	$(MAKE) -C docs html
I didn't want to use titles within my docstrings, as I was following the numpy style guidelines. So I first generate the .rst files and then run the following Python script as a post-processing step:
from pathlib import Path

src_dir = Path("source/api")

for file in src_dir.iterdir():
    print("Processed RST file:", file)
    with open(file, "r") as f:
        lines = f.read()

    junk_strs = ["Submodules\n----------", "Subpackages\n-----------"]
    for junk in junk_strs:
        lines = lines.replace(junk, "")

    lines = lines.replace(" module\n=", "\n")

    with open(file, "w") as f:
        f.write(lines)
This script is kept in the same directory as the Makefile. I also add the following lines to the Makefile:
html:
	# rm -r "$(BUILDDIR)"
	python rst_process.py
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Now running make html builds the documentation in the way I want.
I'm not sure I'm 100% answering your question, but I had a similar experience and I realized I was running sphinx-apidoc with the -f flag each time, which created the .rst files fresh each time.
Now I'm allowing sphinx-apidoc to generate the .rst files once, but not overwriting them, so I can modify them to change titles/etc. and then run make html to propagate the changes. If I want to freshly generate .rst files I can just remove the files I want to regenerate or pass the -f flag in.
So you do have to change the .rst files, but only once.
In newer versions of Apidoc, you can use a custom Jinja template to control the generated output.
The default templates are here: https://github.com/sphinx-doc/sphinx/tree/5.x/sphinx/templates/apidoc
You can make a local copy of each template using the same names (e.g. source/_templates/toc.rst_t) and invoke sphinx-apidoc with the --templatedir option (e.g. sphinx-apidoc --templatedir source/_templates).
Once you are using your own template file, you can customize it however you want. For example, you can remove the ugly "package" and "module" suffix, which is added at this stage.
So, I want to create a simple script to create directories based upon the file names contained within a certain folder.
My method looks like this:
import subprocess

def make_new_folders(filenames, destination):
    """
    Take a list of presets and create new directories using mkdir.
    """
    for filename in filenames:
        path = '"%s/%s/"' % (destination, filename)
        subprocess.call(["mkdir", path])
For some reason I can't get the command to work.
If I pass in a file named "Test Folder", I get an error such as:
mkdir: "/Users/soundteam/Desktop/PlayGround/Test Folder: No such file or directory
Printing the 'path' variable results in:
"/Users/soundteam/Desktop/PlayGround/Test Folder/"
Can anyone point me in the right direction?
First of all, you should use os.path.join() to glue your path parts together because it works cross-platform.
Furthermore, there are built-in functions like os.mkdir and os.makedirs (which is really cool because it's recursive) to create folders. Creating a subprocess is expensive and, in this case, not a good idea.
In your example you're passing literal double quotes ("destination/filename") to subprocess, which you don't have to do. Shells need double quotes if you use whitespace in file or folder names; subprocess takes care of that for you.
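Putting that together, a sketch of how the function from the question could look (os.makedirs with exist_ok=True is one reasonable choice here):
import os

def make_new_folders(filenames, destination):
    """Create one directory per filename under destination."""
    for filename in filenames:
        path = os.path.join(destination, filename)  # no manual quoting needed
        os.makedirs(path, exist_ok=True)            # recursive, no subprocess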
You don't need the double quotes. subprocess passes the parameters directly to the process, so you don't need to prepare them for parsing by a shell. You also don't need the trailing slash, and should use os.path.join to combine path components:
path = os.path.join(destination, filename)
EDIT: You should accept @Fabian's answer, which explains that you don't need subprocess at all (I knew that).
I want to make sure that I only delete the intended files.
I have code something like this:
import os

dir = "/some/path/"
file = "somefile.txt"
cmd_rm = "rm -rf " + dir + file
os.system(cmd_rm)
The dir and file values are fetched from a database. How can I make sure I never end up running rm -rf /?
What things should I check before doing rm -rf?
Don't use the -r switch if you just want to remove a single file. Also, there could be spaces in the file name.
Better use the functions in Python's os module instead:
import os

dirname = "/some/path/"
filename = "somefile.txt"
pathname = os.path.abspath(os.path.join(dirname, filename))
if pathname.startswith(dirname):
    os.remove(pathname)
Normalizing the path with abspath and comparing it against the target directory avoids file names like "../../../etc/passwd" or similar.
You might consider using os.remove() instead since it's a great deal less dangerous than what you're attempting.
First, I suggest you use the os.remove() and os.rmdir() functions for things like that. You will end up with more portable code and less headache from checking command return values.
To check what you are actually attempting to remove (you may not want to just check for "/"), you can use regular expressions on the generated path, or just prepend a base path to all paths returned from your database (depending on what you are doing...), as sketched below.
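As a sketch of the base-path idea (BASE_DIR and the helper name are assumptions for illustration):
import os

BASE_DIR = "/some/path"  # hypothetical: the only tree deletions are allowed in

def safe_remove(dirname, filename):
    # realpath resolves ".." components and symlinks before the containment check
    pathname = os.path.realpath(os.path.join(BASE_DIR, dirname, filename))
    if os.path.commonpath([pathname, BASE_DIR]) != BASE_DIR:
        raise ValueError("refusing to delete outside %s: %s" % (BASE_DIR, pathname))
    os.remove(pathname)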
Use shutil.rmtree as Dave Kirby says. If you want to delete just the file, use:
import os

dirname = "/some/path/"
filename = "somefile.txt"
path = os.path.join(dirname, filename)
os.remove(path)  # shutil.rmtree() expects a directory, so use os.remove() for a single file
If you want to delete the directory, use:
import shutil

dirname = "/some/path/"
shutil.rmtree(dirname)
If the files are write-protected, make sure you have write permissions before you run this.
There is a module called shutil that provides shell-like file manipulation. If you want to delete a directory and all files and directories in it then use shutil.rmtree.
However, it is implemented in Python, so if you are deleting a huge number of files, spawning rm may be faster, though it will fail if an unquoted path has a space in it.
Assuming that your mention of rm -rf is not random but exactly the command you need, why not just call it? There is a library called sh that allows tighter integration with the shell.
import os
from sh import rm

path_to_delete = '/some/path'
if os.path.exists(path_to_delete):
    rm('-rf', path_to_delete)
P.S. Make sure you are not root, and/or ask for user input, to be extra cautious.
And, yes, read the man page to avoid recursively deleting what should be a single file ;)