How to load files in a notebook when using Snakemake?

In a data processing project with several steps, using Snakemake, there is a Python Jupyter Notebook in a subdirectory that processes some data:
Notebook processing_step_1/process.ipynb contains:
with open('input.csv') as infile:
    for line in infile:
        print(line)
Data file processing_step_1/input.csv contains:
one,two,three
1,2,3
And this is the Snakefile using the notebook:
rule process_data:
    input:
        "processing_step_1/input.csv",
    notebook:
        "processing_step_1/process.ipynb"
If I run the notebook interactively, or from the command line like this
jupyter nbconvert --execute --to notebook processing_step_1/process.ipynb
it works. The working directory is set to the directory of the notebook and the input file can be found with a relative path.
When running from Snakemake, though, using
snakemake -c1
I get an error message
FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'
and the reason for that is that the notebook is copied and executed in a different directory, as can be seen from the Snakemake error message:
Command 'set -euo pipefail; jupyter-nbconvert --log-level ERROR --execute --to notebook --ExecutePreprocessor.timeout=-1 /path/to/project/.snakemake/scripts/tmp9mmr8k20.process.ipynb' returned non-zero exit status 1.
What is the canonical way of loading data files from the same directory as the notebook when using Snakemake?
I would like to still be able to use the same notebook standalone without Snakemake. So preferably I wouldn’t like to add Snakemake-specific code to it.
It seems to be impossible to find the directory containing the notebook from within the notebook. See e.g. https://stackoverflow.com/a/52119628/381281. Also I couldn’t find a way to set a working directory per rule in Snakemake.

The solution by @hfs (OP) is one way to resolve this, but another way is to avoid hardcoding the file paths within the notebook:
# with open('input.csv') as infile:  <- this is hard-coded
with open(snakemake.input[0]) as infile:  # this is flexible
    ...
Note that for this solution to work, the notebook directive should be used instead of the shell-nbconvert combination.
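If the notebook must also keep working standalone, one possible compromise is a small fallback: use the path Snakemake injects when it is present, and the plain relative path otherwise. This is only a sketch. The helper name resolve_input and its default argument are made up for illustration, and it does add a few Snakemake-aware lines to the notebook, which the question wanted to avoid.

```python
def resolve_input(namespace, default="input.csv", index=0):
    """Return Snakemake's input path if a `snakemake` object exists, else `default`.

    `namespace` is the notebook's global namespace, i.e. globals().
    resolve_input is a hypothetical helper, not part of Snakemake.
    """
    smk = namespace.get("snakemake")
    if smk is None:
        # Standalone run: rely on the working directory being the notebook's folder.
        return default
    # Under Snakemake's notebook directive, a `snakemake` object is injected.
    return smk.input[index]


# In the notebook:
# with open(resolve_input(globals())) as infile:
#     for line in infile:
#         print(line)
```

Passing globals() explicitly keeps the helper easy to test with a fake object and avoids a bare NameError when the notebook is run outside Snakemake.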

Using shell, one can cd to the desired working directory:
rule process_data:
    input:
        "processing_step_1/input.csv",
    shell:
        """
        cd processing_step_1
        jupyter nbconvert --execute --to notebook --inplace process.ipynb
        """

Related

Issue running terminal commands through jupyter notebook

I am currently running a jupyter notebook from a github repo. One chunk goes like this:
for file in os.listdir('data/'):
    path = 'data/' + file
    os.system(f"python OpticalFlowGen.py --type {target} --file {path}")
The variables target and path were defined and no errors were raised.
When the OpticalFlowGen.py file is run in the terminal with python OpticalFlowGen.py --type 'Train' --file 'data/video.mp4', a popup appears and closes after the video file is processed by OpenCV, and .jpg files are saved to disk. However, when this command is run in the Jupyter notebook, nothing pops up and no files are saved. You can access this .py file from the same repository here.
Currently I have to run the script manually in the terminal, file by file, to save all the image output before I can run the notebook without error. This will become an issue when I have many video files, since doing it without the for loop is too cumbersome. Any idea on how to solve this issue?

In a Jupyter Notebook you don't have to use os.system, instead try to use !:
for file in os.listdir('data/'):
    path = 'data/' + file
    # Use
    # ! - for terminal commands
    # {} - for a variable in the terminal command
    !python OpticalFlowGen.py --type {target} --file {path}
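Since os.system only returns an exit status, a failing script is easy to miss. A possible alternative, sketched here with the standard library, is subprocess.run, which lets the notebook show the script's error output. The function names build_commands and run_commands are illustrative, and the script name and flags are simply the ones from the question.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(data_dir, target, script="OpticalFlowGen.py"):
    """Build one argument list per file in data_dir (sorted for reproducibility)."""
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    return [
        # sys.executable is the interpreter running the notebook kernel.
        [sys.executable, script, "--type", target, "--file", str(path)]
        for path in files
    ]


def run_commands(commands):
    """Run each command and print stderr for any that exit with a non-zero status."""
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}): {' '.join(command)}")
            print(result.stderr)


# In the notebook:
# run_commands(build_commands("data", target))
```

Passing the arguments as a list rather than one shell string also avoids quoting problems with file names that contain spaces.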

Automatically convert jupyter notebook to .py

I know there have been a few questions about this but I have not found anything robust enough.
Currently I am using, from terminal, a command that creates .py, then moves them to another folder:
jupyter nbconvert --to script '/folder/notebooks/notebook.ipynb' && \
mv ./folder/notebooks/*.py ./folder/python_scripts
The workflow then is to code in a notebook, check with git status what changed since last commit, create a potentially huge number of nbconvert commands, then move them all.
I would like to use something like !jupyter nbconvert --to script, found in this answer, but without the cell that creates the Python file appearing in the .py itself.
Because if that line appears, my code won't ever work right.
So, is there a proper way of dealing with this problem? One that can be automated, and not manually copying files names, creating the command, executing and then starting again.

You can add the following code in the last cell in your notebook file.
!jupyter nbconvert --to script mycode.ipynb
with open('mycode.py', 'r') as f:
    lines = f.readlines()
with open('mycode.py', 'w') as f:
    for line in lines:
        if 'nbconvert --to script' in line:
            break
        else:
            f.write(line)
It will generate the .py file and then remove this very code from it. You will end up with a clean script that will not call !jupyter nbconvert anymore.
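To avoid having a conversion cell in the notebooks at all, the conversion could instead be driven from a separate script. The following is a minimal sketch, not a tested workflow: the function names are made up, the folder names are the ones from the question, and it assumes the jupyter command is on the PATH. It relies on nbconvert's --output-dir option to write the scripts straight into the target folder, so no mv step is needed.

```python
import subprocess
from pathlib import Path


def conversion_commands(notebook_dir, script_dir):
    """Build one `jupyter nbconvert` command per notebook found in notebook_dir."""
    notebooks = sorted(Path(notebook_dir).glob("*.ipynb"))
    return [
        ["jupyter", "nbconvert", "--to", "script",
         f"--output-dir={script_dir}", str(notebook)]
        for notebook in notebooks
    ]


def convert_all(notebook_dir, script_dir):
    """Convert every notebook; check=True raises if nbconvert reports an error."""
    for command in conversion_commands(notebook_dir, script_dir):
        subprocess.run(command, check=True)


# From a terminal script or a git hook, outside any notebook:
# convert_all("folder/notebooks", "folder/python_scripts")
```

Because the commands are built separately from running them, they can be printed and inspected before anything is converted.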

Another way would be to use Jupytext as an extension for your Jupyter installation (it can easily be installed with pip).
Jupytext description (see the GitHub page):
Have you always wished Jupyter notebooks were plain text documents?
Wished you could edit them in your favorite IDE? And get clear and
meaningful diffs when doing version control? Then... Jupytext may well
be the tool you're looking for!
It will keep paired notebooks in sync with .py files. You then just need to move your .py files or gitignore the notebooks for example as possible workflows.

Go to File > Save and Export Notebook as... > Executable Scripts

This is the closest I have found to what I had in mind, but I have yet to try and implement it:
# A post-save hook to make a script equivalent whenever the notebook is saved (replacing the --script option in older versions of the notebook):
import io
import os
from notebook.utils import to_api_path
_script_exporter = None
def script_post_save(model, os_path, contents_manager, **kwargs):
    """convert notebooks to Python script after save with nbconvert
    replaces `jupyter notebook --script`
    """
    from nbconvert.exporters.script import ScriptExporter
    if model['type'] != 'notebook':
        return
    global _script_exporter
    if _script_exporter is None:
        _script_exporter = ScriptExporter(parent=contents_manager)
    log = contents_manager.log
    base, ext = os.path.splitext(os_path)
    script, resources = _script_exporter.from_filename(os_path)
    script_fname = base + resources.get('output_extension', '.txt')
    log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))
    with io.open(script_fname, 'w', encoding='utf-8') as f:
        f.write(script)
c.FileContentsManager.post_save_hook = script_post_save
Additionally, this looks like it has worked for some users on GitHub, so I put it here for reference:
import os
from subprocess import check_call
def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return  # only do this for notebooks
    d, fname = os.path.split(os_path)
    # `ipython nbconvert` is the old spelling; current installs use `jupyter nbconvert`
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)

ModuleNotFoundError: No module named... Jupyter Notebooks

dev is my root directory. I have the below files.
dev\sitehealthcheck\test.py
dev\sitehealthcheck\draw.py
In test.py, the first line reads:
from sitehealthcheck.draw import *
If I run test.py in the VSCode terminal, everything works as expected. However, when I try to execute test.py in the interactive window, Jupyter returns:
ModuleNotFoundError: No module named 'sitehealthcheck'
What can I do so VSCode automatically searches for modules in the same directory as the file I'm executing?
I would prefer to type just the line below and have the VSCode editor/IntelliSense and Jupyter automatically search for modules in the same directory as the file I'm executing.
from draw import *

In support of the existing answers, you need to add the jupyter.notebookFileRoot line to your settings.json file in your workspace when using a notebook in a subfolder and modules in a sibling folder:
.vscode/settings.json:
{
    "python.pythonPath": "env/bin/python",
    "jupyter.notebookFileRoot": "${workspaceFolder}"
}

To solve this problem, you can search for a setting in the Python extension which is called "Python › Data Science: Notebook File Root". The line below the title of this setting says: "Set the root directory for loading files for the Python Interactive window.". Then you change ${workspaceFolder} to ${fileDirname}, close the interactive Python terminal and restart it, and it should work.

For the interactive window, the current working directory is not set to the path of the file being executed; instead it's set to the path of the workspace folder that you have open in VSCode. Suppose I have the following:
WorkspaceFolder:
    SubFolder:
        MyScript.py
        ImportMe.py
If you run MyScript.py in the terminal, you would want this, since the file's location is added to the path:
from ImportMe import *
But if you are using the interactive window, you can either add the folder to your PYTHONPATH as suggested above, or use this, since the CWD is set to the workspace root for the Interactive Window session:
from SubFolder.ImportMe import *
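If changing settings or PYTHONPATH is not an option, a third possibility is to extend sys.path at the top of the file being run. This is a rough sketch rather than a recommended pattern; add_to_sys_path is a made-up helper name, and the folder and module names in the usage comment are the ones from the tree above.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path (only once) and return its absolute path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# In the Interactive Window, with the workspace root as the CWD:
# add_to_sys_path("SubFolder")
# from ImportMe import *
```

The relative path is resolved against the current working directory at call time, so the call has to match whatever root the interactive session was started with.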

An easy fix is to select the proper kernel and virtual environment (venv) for the notebook.

If the error occurs in a Jupyter notebook, open the Anaconda Prompt and run:
pip install import-ipynb
Then open the main file and add:
import import_ipynb
import first

If you add .../dev/sitehealthcheck to your PYTHONPATH, then you can import draw.

How to convert multiple Python files into one ipynb file?

I'm trying to convert four related Python files (they belong to the same project) into a single Jupyter notebook (ipynb) file. Is there a specific way to do that?
This is my project folder tree:
C:/
build_dataset.py
train_model.py
folder1
---cancernet.py
---config.py
dataset_folder

You can use py2nb tool for it:
https://github.com/williamjameshandley/py2nb
Just call it from the shell:
py2nb waka.py
and you will get the .ipynb file.
PS: There are several similar tools. p2j can also help; its usage is the same as py2nb's. Or you can use the powerful jupytext with its command line conversions between formats:
jupytext --to notebook notebook.py # overwrite notebook.ipynb (remove outputs)
jupytext --to notebook --update notebook.py # update notebook.ipynb (preserve outputs)
jupytext --to ipynb notebook1.md notebook2.py # overwrite notebook1.ipynb and notebook2.ipynb
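The tools above convert one script into one notebook. To put several scripts into a single notebook, one option is to write the notebook JSON directly, since an .ipynb file is a JSON document. The sketch below uses only the standard library and assumes nbformat version 4; scripts_to_notebook is a made-up name, and each script simply becomes a heading cell followed by one code cell.

```python
import json
from pathlib import Path


def scripts_to_notebook(script_paths, notebook_path):
    """Write one notebook with a heading cell and a code cell per script."""
    cells = []
    for script in map(Path, script_paths):
        # Markdown heading so each script stays recognizable in the notebook.
        cells.append({
            "cell_type": "markdown",
            "metadata": {},
            "source": f"## {script.name}",
        })
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": script.read_text(encoding="utf-8"),
        })
    notebook = {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    Path(notebook_path).write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return notebook


# Order matters: list modules before the scripts that use them.
# scripts_to_notebook(
#     ["folder1/config.py", "folder1/cancernet.py",
#      "build_dataset.py", "train_model.py"],
#     "project.ipynb",
# )
```

Imports between the original files are copied verbatim, so they either still need the .py files on disk or have to be removed by hand once everything lives in one notebook.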

How to run a `nix-shell` with a default.nix file?

I'm trying to understand how nix works. For that purpose I tried to create a simple environment to run jupyter notebooks.
When I run the command:
nix-shell -p "\
    with import <nixpkgs> {};\
    python35.withPackages (ps: [\
        ps.numpy\
        ps.toolz\
        ps.jupyter\
    ])\
"
I get what I expected -- a shell in an environment with python and all the listed packages installed, and all the expected commands accessible in the path:
[nix-shell:~/dev/hurricanes]$ which python
/nix/store/5scsbf8z3jnz8ardch86mhr8xcyc8jr2-python3-3.5.3-env/bin/python
[nix-shell:~/dev/hurricanes]$ which jupyter
/nix/store/5scsbf8z3jnz8ardch86mhr8xcyc8jr2-python3-3.5.3-env/bin/jupyter
[nix-shell:~/dev/hurricanes]$ jupyter notebook
[I 22:12:26.191 NotebookApp] Serving notebooks from local directory: /home/calsaverini/dev/hurricanes
[I 22:12:26.191 NotebookApp] 0 active kernels
[I 22:12:26.191 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=7424791f6788af34f4c2616490b84f0d18353a4d4e60b2b5
So, I created a new folder with a single default.nix file with the following contents:
with import <nixpkgs> {};
python35.withPackages (ps: [
    ps.numpy
    ps.toolz
    ps.jupyter
])
When I run nix-shell in this folder though, it seems like everything is installed but the PATHs are not set:
[nix-shell:~/dev/hurricanes]$ which python
/usr/bin/python
[nix-shell:~/dev/hurricanes]$ which jupyter
[nix-shell:~/dev/hurricanes]$ jupyter
The program 'jupyter' is currently not installed. You can install it by typing:
sudo apt install jupyter-core
By what I read here I was expecting the two situations to be equivalent. What did I do wrong?

Your default.nix file is supposed to hold the information to build a derivation when calling it with nix-build. When calling it with nix-shell, it just sets up the shell in a way that the derivation is buildable. In particular, it sets the PATH variable to contain everything that is listed in the buildInputs attribute:
with import <nixpkgs> {};
stdenv.mkDerivation {
    name = "my-env";
    # src = ./.;
    # buildInputs takes a list, so the Python environment is wrapped in one
    buildInputs = [
        (python35.withPackages (ps: [
            ps.numpy
            ps.toolz
            ps.jupyter
        ]))
    ];
}
Here, I've commented out the src attribute, which is required if you want to run nix-build but isn't necessary when you are just running nix-shell.
In your last sentence, I suppose you are referring more precisely to this:
https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/python.section.md#load-environment-from-nix-expression
I don't understand this advice: to me it just looks plain false.
