Issue running terminal commands through jupyter notebook

I am currently running a jupyter notebook from a github repo. One chunk goes like this:
for file in os.listdir('data/'):
    path = 'data/' + file
    os.system(f"python OpticalFlowGen.py --type {target} --file {path}")
The variables target and path were defined and no errors were raised.
When OpticalFlowGen.py is run in the terminal with python OpticalFlowGen.py --type 'Train' --file 'data/video.mp4', a popup appears and closes after OpenCV has processed the video file, and .jpg files are saved to disk. However, when the same command is run from the Jupyter notebook, nothing pops up and no files are saved. You can access this .py file from the same repository here.
Currently I have to run the script manually in the terminal, file by file, to save all the image output before I can run the notebook without errors. This will become a problem once I have many video files, since doing it without the for loop is too cumbersome. Any idea how to solve this?

In a Jupyter Notebook you don't have to use os.system; try ! instead:
for file in os.listdir('data/'):
    path = 'data/' + file
    # Use
    # ! - for terminal commands
    # {} - for a variable in the terminal command
    !python OpticalFlowGen.py --type {target} --file {path}
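One thing worth knowing when debugging this: os.system only returns the command's exit status and never raises, so "no errors were raised" does not show that the script succeeded. Below is a minimal sketch, using only the standard library, that runs a child Python process with subprocess.run and returns its output so a failure becomes visible. run_python is a hypothetical helper name, not something from the repository.

```python
import subprocess
import sys


def run_python(args):
    """Run the current interpreter with `args`; return (exit code, stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, *args],  # same interpreter as the notebook kernel
        capture_output=True,      # collect stdout/stderr instead of losing them
        text=True,                # decode bytes to str
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in command; in the notebook the args would be something like
    # ["OpticalFlowGen.py", "--type", target, "--file", path].
    code, out, err = run_python(["-c", "print('processed')"])
    print(code, out.strip(), err.strip())
```

A non-zero exit code or a non-empty stderr from the real script would point at why no .jpg files are written.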

Related

How to load files in a notebook when using Snakemake?

In a data processing project with several steps, using Snakemake, there is a Python Jupyter Notebook in a subdirectory that processes some data:
Notebook processing_step_1/process.ipynb contains:
with open('input.csv') as infile:
    for line in infile:
        print(line)
Data file processing_step_1/input.csv contains:
one,two,three
1,2,3
And this is the Snakefile using the notebook:
rule process_data:
    input:
        "processing_step_1/input.csv",
    notebook:
        "processing_step_1/process.ipynb"
If I run the notebook interactively, or from the command line like this
jupyter nbconvert --execute --to notebook processing_step_1/process.ipynb
it works. The working directory is set to the directory of the notebook and the input file can be found with a relative path.
When running from Snakemake, though, using
snakemake -c1
I get an error message
FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'
and the reason for that is that the notebook is copied and executed in a different directory, as can be seen from the Snakemake error message:
Command 'set -euo pipefail; jupyter-nbconvert --log-level ERROR --execute --to notebook --ExecutePreprocessor.timeout=-1 /path/to/project/.snakemake/scripts/tmp9mmr8k20.process.ipynb' returned non-zero exit status 1.
What is the canonical way of loading data files from the same directory as the notebook when using Snakemake?
I would like to still be able to use the same notebook standalone without Snakemake. So preferably I wouldn’t like to add Snakemake-specific code to it.
It seems to be impossible to find the directory containing the notebook from within the notebook. See e.g. https://stackoverflow.com/a/52119628/381281. Also I couldn’t find a way to set a working directory per rule in Snakemake.

The solution by @hfs (OP) is one way to resolve this, but another way is to avoid hardcoding the file paths within the notebook:
# with open('input.csv') as infile: <- this is hard-coded
with open(snakemake.input[0]) as infile: # this is flexible
    ...
Note that for this solution to work, the notebook directive should be used instead of the shell-nbconvert combination.
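Since the question asks for the notebook to keep working standalone, one possible variation is to fall back to the hard-coded path when no snakemake object is present. This is a sketch under that assumption; resolve_input is a hypothetical helper, and the SimpleNamespace below only imitates the object Snakemake injects into the notebook.

```python
from types import SimpleNamespace


def resolve_input(default_path, namespace):
    """Return snakemake.input[0] if `namespace` holds a `snakemake` object, else `default_path`."""
    smk = namespace.get("snakemake")
    if smk is None:
        return default_path
    return smk.input[0]


# Standalone run: no `snakemake` object, so the default path is used.
print(resolve_input("input.csv", {}))

# Simulated Snakemake run; SimpleNamespace stands in for the injected object.
fake = SimpleNamespace(input=["processing_step_1/input.csv"])
print(resolve_input("input.csv", {"snakemake": fake}))
```

Inside the notebook the call would be resolve_input('input.csv', globals()), so the same cell works with and without Snakemake.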

Using shell, one can cd to the desired working directory:
rule process_data:
    input:
        "processing_step_1/input.csv",
    shell:
        """
        cd processing_step_1
        jupyter nbconvert --execute --to notebook --inplace process.ipynb
        """

Run Python file with relative path outside current folder in Jupyter Notebook

Let's say I have the python file ~/file_to_run.py that I want to run from jupyter notebook (~/notebooks/my_notebook.ipynb) using the magic command %run. The problem is that file_to_run.py uses a relative path for example:
open('data/file.csv') # full path ~/data/file.csv
When I run ~/file_to_run.py from ~/notebooks/my_notebook.ipynb with:
%run ../file_to_run.py
I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'data/file.csv'
Is there any fix that does not require modifying the Python file? Thank you!

Changing the working directory could be a solution:
import os
os.chdir('../') # Change the working directory
# Call the script from the new working directory
%run file_to_run.py
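Note that os.chdir changes the working directory for the rest of the notebook session, so later cells that rely on relative paths are affected too. A minimal sketch of a context manager that restores the previous directory afterwards; working_directory is a hypothetical name introduced here for illustration.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the script raises.
        os.chdir(previous)
```

In a notebook cell this could wrap the call, for example with working_directory('..'): followed by an indented %run file_to_run.py, or runpy.run_path('file_to_run.py', run_name='__main__') as the plain-Python equivalent.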

Error env: 'python3\r': No such file or directory while running a PySpark project using Docker on Windows 10

I get the following error while running a PySpark project using Docker on Windows 10:
env: 'python3\r': No such file or directory
Python file:
#!/usr/bin/env python3
Tried the following ways:
unix2dos - Converted the file to DOS format on a Linux box, then downloaded it from the Linux box to my local Windows machine.
Sublime Text 3 - Saved the file as a Python file and then tried running it, but got the same error.
Reference:
env: python\r: No such file or directory
https://askubuntu.com/questions/896860/usr-bin-env-python3-r-no-such-file-or-directory

What I did was create a new file. I copied the contents of the buggy file (the one with the 'python3\r' issue), leaving out the "#!/usr/bin/env python3" line, and pasted them into the new file. Then I saved it with the .py extension and typed "#!/usr/bin/env python3" by hand (without copy-paste).
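The \r in the error message indicates that the shebang line ends with a Windows carriage return (CRLF line ending), so env looks for a program literally named python3\r. Instead of retyping the line by hand, the line endings can be rewritten programmatically. This is a minimal sketch; crlf_to_lf is a hypothetical helper and the file name in the usage note is only an example.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite `path` with LF line endings; return True if the file was changed."""
    file_path = Path(path)
    data = file_path.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")  # CRLF -> LF, other bytes untouched
    if fixed == data:
        return False
    file_path.write_bytes(fixed)
    return True
```

Calling crlf_to_lf('my_script.py') before building the Docker image would strip the carriage returns; the editor or Git settings may still need adjusting so they are not reintroduced.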

Automatically convert jupyter notebook to .py

I know there have been a few questions about this but I have not found anything robust enough.
Currently I am using, from terminal, a command that creates .py, then moves them to another folder:
jupyter nbconvert --to script '/folder/notebooks/notebook.ipynb' && \
mv ./folder/notebooks/*.py ./folder/python_scripts
The workflow then is to code in a notebook, check with git status what changed since last commit, create a potentially huge number of nbconvert commands, then move them all.
I would like to use something like !jupyter nbconvert --to script, found in this answer, but without the cell that creates the Python file appearing in the .py itself.
Because if that line appears, my code won't ever work right.
So, is there a proper way of dealing with this problem? One that can be automated, and not manually copying files names, creating the command, executing and then starting again.

You can add the following code in the last cell in your notebook file.
!jupyter nbconvert --to script mycode.ipynb
with open('mycode.py', 'r') as f:
    lines = f.readlines()
with open('mycode.py', 'w') as f:
    for line in lines:
        if 'nbconvert --to script' in line:
            break
        else:
            f.write(line)
It will generate the .py file and then remove this very code from it. You will end up with a clean script that will not call !jupyter nbconvert anymore.
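To avoid typing one command per notebook, the conversion can also be driven from a small script run outside the notebooks, which sidesteps the problem of the conversion cell ending up in the .py file. The sketch below assumes the jupyter command is on PATH and uses nbconvert's --output-dir option; build_commands and convert_all are hypothetical helper names, and the folder names in the usage note mirror the question.

```python
import subprocess
from pathlib import Path


def build_commands(notebook_dir, script_dir):
    """Return one `jupyter nbconvert` argument list per notebook in `notebook_dir`."""
    return [
        ["jupyter", "nbconvert", "--to", "script",
         "--output-dir", str(script_dir), str(nb)]
        for nb in sorted(Path(notebook_dir).glob("*.ipynb"))
    ]


def convert_all(notebook_dir, script_dir):
    """Run every conversion; raises CalledProcessError if one fails."""
    for cmd in build_commands(notebook_dir, script_dir):
        subprocess.run(cmd, check=True)
```

For the layout in the question, the call would be convert_all('folder/notebooks', 'folder/python_scripts').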

Another way would be to use Jupytext as an extension for your Jupyter installation (it can easily be installed with pip).
Jupytext Description (see github page)
Have you always wished Jupyter notebooks were plain text documents?
Wished you could edit them in your favorite IDE? And get clear and
meaningful diffs when doing version control? Then... Jupytext may well
be the tool you're looking for!
It will keep paired notebooks in sync with .py files. You then just need to move your .py files or gitignore the notebooks for example as possible workflows.

Go to File > Save and Export Notebook as... > Executable Scripts

This is the closest I have found to what I had in mind, but I have yet to try and implement it:
# A post-save hook to make a script equivalent whenever the notebook is saved (replacing the --script option in older versions of the notebook):
import io
import os
from notebook.utils import to_api_path
_script_exporter = None
def script_post_save(model, os_path, contents_manager, **kwargs):
    """convert notebooks to Python script after save with nbconvert
    replaces `jupyter notebook --script`
    """
    from nbconvert.exporters.script import ScriptExporter
    if model['type'] != 'notebook':
        return
    global _script_exporter
    if _script_exporter is None:
        _script_exporter = ScriptExporter(parent=contents_manager)
    log = contents_manager.log
    base, ext = os.path.splitext(os_path)
    script, resources = _script_exporter.from_filename(os_path)
    script_fname = base + resources.get('output_extension', '.txt')
    log.info("Saving script /%s", to_api_path(script_fname, contents_manager.root_dir))
    with io.open(script_fname, 'w', encoding='utf-8') as f:
        f.write(script)
c.FileContentsManager.post_save_hook = script_post_save
Additionally, this looks like it has worked for a user on GitHub, so I put it here for reference:
import os
from subprocess import check_call
def post_save(model, os_path, contents_manager):
    """post-save hook for converting notebooks to .py scripts"""
    if model['type'] != 'notebook':
        return # only do this for notebooks
    d, fname = os.path.split(os_path)
    # current installs spell the command `jupyter nbconvert` (formerly `ipython nbconvert`)
    check_call(['jupyter', 'nbconvert', '--to', 'script', fname], cwd=d)

ModuleNotFoundError: No module named... Jupyter Notebooks

dev is my root directory. I have the below files.
dev\sitehealthcheck\test.py
dev\sitehealthcheck\draw.py
In test.py, the first line reads:
from sitehealthcheck.draw import *
If I run test.py in the VSCode terminal, everything works as expected. However, when I try to execute test.py in the interactive window, Jupyter returns:
ModuleNotFoundError: No module named 'sitehealthcheck'
What can I do so VSCode automatically searches for modules in the same directory as the file I'm executing?
I would prefer to just type the line below and have the VSCode editor/IntelliSense and Jupyter automatically search for modules in the same directory as the file I'm executing:
from draw import *

In support of the existing answers, you need to add the jupyter.notebookFileRoot line to your settings.json file in your workspace when using a notebook in a subfolder and modules in a sibling folder:
.vscode/settings.json:
{
    "python.pythonPath": "env/bin/python",
    "jupyter.notebookFileRoot": "${workspaceFolder}"
}

To solve this problem, you can search for a setting in the Python extension which is called "Python › Data Science: Notebook File Root". The line below the title of this setting says: "Set the root directory for loading files for the Python Interactive window.". Then you change ${workspaceFolder} to ${fileDirname}, close the interactive Python terminal and restart it, and it should work.

For the interactive window, the current working directory is not set to the path of the file being executed. Instead, it is set to the path of the workspace folder that you have open in VSCode. Say I have the following:
WorkspaceFolder:
    SubFolder:
        MyScript.py
        ImportMe.py
If you run MyScript.py in the terminal, you would want this, since the file's location is added to the path:
from importme import *
But if you are using the interactive window, you can either add the folder to your PYTHONPATH as suggested above, or use this, since the CWD is set to the workspace root for the interactive window session:
from SubFolder.importme import *

An easy fix is to set the proper kernel and venv. Try selecting the venv that has your packages installed as the notebook kernel.

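If you get this error in a Jupyter notebook, open the Anaconda Prompt and run:
pip install import-ipynb
Then open the main file and import the package, followed by the notebook you need (here one named first):
import import_ipynb
import first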

If you add .../dev/sitehealthcheck to your PYTHONPATH, then you can import draw.
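PYTHONPATH is read when the interpreter starts; the runtime equivalent is to extend sys.path before the import. A minimal sketch of that idea follows. add_to_sys_path is a hypothetical helper, and the directory in the usage note is a placeholder for wherever draw.py lives.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path (once) and return its absolute path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)  # searched before the other entries
    return resolved
```

After add_to_sys_path('sitehealthcheck') in the first cell, assuming the session starts in dev, from draw import * should resolve in the interactive window as well.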
