Desired behaviour
We have an existing workflow in vanilla Jupyter Notebook/Lab where we use relative paths to store outputs of some notebooks. Example:
/home/user/notebooks/notebook1.ipynb
/home/user/notebooks/notebook1_output.log
/home/user/notebooks/project1/project.ipynb
/home/user/notebooks/project1/project_output.log
In both notebooks, we produce the output by simply writing to ./output.log or so.
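For concreteness, a minimal sketch of that pattern (the call and filename are illustrative): the relative path resolves against the kernel's current working directory, which is the crux of the problem below.

import os

# In vanilla Jupyter this prints the notebook's own folder,
# e.g. /home/user/notebooks/project1; on Dataproc it prints /
print(os.getcwd())

# './output.log' therefore resolves against that working directory
with open('./output.log', 'a') as f:
    f.write('run finished\n')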
Problem
However, we are now trying Google Dataproc with the Jupyter optional component, and the current directory is always / regardless of which notebook the code is run from. This applies to both the Notebook and Lab interfaces.
What I've tried
Disabling c.FileContentsManager.root_dir='/' in /etc/jupyter/jupyter_notebook_config.py causes the current directory to be set to wherever I started jupyter notebook from, but it is always that initial starting folder rather than following the location of the .ipynb file being run.
Any idea on how to restore the "dynamic" current directory behaviour?
Even if it's not possible, I'd like to understand how Dataproc even makes Jupyter behave differently.
Details
Dataproc Image 2.0-debian10
Notebook Server 6.2.0
Jupyterlab 3.0.18
No, it is not possible to always have the current directory follow the location of your .ipynb file. Jupyter runs on the local filesystem of your cluster's master node, and the kernel always starts from the server's default working directory.
Outside of Dataproc it is also not possible to consistently get the path of a Jupyter notebook from within the kernel. You can check out this thread on the topic.
You have to specify the directory path explicitly if you want your log file saved in a particular location.
Note that the GCS folder in your Lab interface refers to the Google Cloud Storage bucket of your cluster. You can create an .ipynb file in GCS, but when you execute it, it runs on the local filesystem, so you will not be able to save log files to GCS directly.
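For example, a minimal sketch of spelling out the target directory instead of relying on the current directory (the path below is a placeholder; pick any directory on the master node's local filesystem that suits your layout):

import os

# Placeholder directory on the master node's local filesystem
log_dir = '/home/user/notebooks/project1'
os.makedirs(log_dir, exist_ok=True)

# Write the log to an explicit path rather than './output.log'
with open(os.path.join(log_dir, 'project_output.log'), 'a') as f:
    f.write('run finished\n')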
EDIT:
It's not only Dataproc that makes Jupyter behave this way. If you use Google Colab notebooks, you will see the same behaviour.
The reason is that you are always executing code in the kernel, no matter where the notebook file is, and in theory multiple notebooks could connect to that same kernel. Thus you can't have multiple working directories for the same kernel.
As I mentioned earlier, by default when you start a notebook the current working directory is set to the path of the notebook.
Link to the main thread -> https://github.com/ipython/ipython/issues/10123
A general solution for most use cases seems to be what is described in this comment on the GitHub issue: https://github.com/ipython/ipython/issues/10123#issuecomment-354889020
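For reference, the idea discussed in that issue is to ask the running notebook server which notebook is attached to the current kernel via the sessions API. The sketch below is only an illustration of that idea, assuming Notebook 6.x; imports and response fields differ across notebook/jupyter-server versions, and on Dataproc the returned path will still be relative to the server's root directory:

import os

import requests
from ipykernel import get_connection_file
# Notebook 6.x; newer installations expose a similar helper via jupyter_server.serverapp
from notebook import notebookapp


def current_notebook_path():
    """Best-effort lookup of the notebook file backing this kernel."""
    # Kernel connection files are named like kernel-<kernel_id>.json
    kernel_id = os.path.basename(get_connection_file()).split('-', 1)[1].rsplit('.', 1)[0]
    for server in notebookapp.list_running_servers():
        sessions = requests.get(
            server['url'] + 'api/sessions',
            params={'token': server.get('token', '')},
        ).json()
        for session in sessions:
            if session['kernel']['id'] == kernel_id:
                return os.path.join(server['notebook_dir'], session['notebook']['path'])
    return None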
Related
I am playing with Azure Databricks. I uploaded a Python file and added it to the spark context with
spark.sparkContext.addPyFile('/location/to/the/file/model.py')
Everything works fine when I run my Python code in the notebook. But when I make a change to the model.py file and upload it again with --overwrite, the code in the notebook does not pick up the new version of the file; it keeps using the old one until I restart the cluster.
Is there a way to avoid cluster restart whenever I overwrite a file?
Unfortunately it doesn't work this way: when a file is added, it's distributed to all workers and used there.
But you can do it differently: use the Repos feature called arbitrary files (don't forget to restart the cluster after enabling it). When you clone your repository into Repos, you can use Python files (not notebooks) from that repository as Python packages. Then, in the notebook, you can use the following directives to force changes to be reloaded into the notebook environment:
%load_ext autoreload
%autoreload 2
You can see an example of such usage in this demo, where Python code from the my_package directory is used in the notebook.
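As an illustration (my_package/model.py here is a hypothetical layout inside the Repos checkout), the notebook side could look like this; after editing model.py in the repo, the next cell execution picks up the change without a cluster restart:

%load_ext autoreload
%autoreload 2

# Hypothetical layout: my_package/model.py inside the Repos checkout
from my_package import model

# Edits to my_package/model.py are picked up on the next cell execution,
# with no addPyFile re-upload and no cluster restart needed
print(model.__file__)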
I have Anaconda Navigator on my work computer and I've changed the default working directory for Jupyter notebooks to be a certain location on the firm server, using the steps given here
I have also created a second environment in Anaconda, for which I would like to use a different Jupyter Notebook working directory than that of the base (root). To do this, I believe I would need to:
Create a second Jupyter Notebook config file
Get the second environment to refer to the new config file, while ensuring that the base environment still refers to the original config file.
How would I go about this? Alternate approaches to creating multiple working directories also welcome.
I was looking to achieve the same result: passing different configurations, including settings such as working and workspace directories, when launching Jupyter Notebook/Lab in different conda or other virtual environments.
I noticed the following:
jupyter lab --help
--config=<Unicode>
Full path of a config file.
Default: ''
Equivalent to: [--JupyterApp.config_file]
Thus, to achieve the desired result, it is possible to pass the path to the relevant config file upon launching Jupyter Lab/Notebook as such:
jupyter lab --config=~/.jupyter/path_to_my_custom_jupyter_config_for_env_1.py
# or
jupyter notebook --config=~/.jupyter/path_to_my_custom_jupyter_config_for_env_1.py
You can copy your current configuration file, make the relevant adjustments and save it as a different file. You would then pass the path to that new config file when launching Jupyter, as above. To simplify, you can create shortcuts that do this (instead of typing or copying the specific parameters on each launch).
Otherwise (with no specific argument passed), Jupyter Lab/Notebook will launch using the default configuration file if it exists, which by default lives in the home directory of the Unix user launching Jupyter (~/.jupyter/jupyter_notebook_config.py).
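As a hedged sketch, such a copied-and-adjusted config file might contain something like the following (paths are placeholders; the traitlet name depends on your version: NotebookApp for the classic notebook server, ServerApp for JupyterLab 3+ running on jupyter-server):

# ~/.jupyter/path_to_my_custom_jupyter_config_for_env_1.py
c = get_config()

# Working directory for this environment (classic notebook server)
c.NotebookApp.notebook_dir = '/path/to/env1/workspace'   # placeholder path

# Equivalent setting for JupyterLab 3+ / jupyter-server:
# c.ServerApp.root_dir = '/path/to/env1/workspace'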
When finding an interesting Python Jupyter notebook online, such as 02.00-Introduction-to-NumPy.ipynb, I usually have to:
download it locally
open a shell in the same folder (tip: use Shift + right-click + "Open command window here" to save 30 seconds of browsing through folders) and run jupyter notebook
select the right .ipynb file, and finally run the code
Isn't there an easier way to do this?
What is the natural way to open a .ipynb notebook which is online, and run the code, without having to manually download the .ipynb?
Note: the notebook is visible here: https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb but we can't run the code
jakevdp builds in a nice way to do that; see here. In short, on each page he has an Open in Google Colab button:
#GoogleColab can open any #ProjectJupyter notebook directly from #github!
To run the notebook, just replace "http://github.com" with "http://colab.research.google.com/github/" in the notebook URL, and it will be loaded into Colab.
Example: 02.00-Introduction-to-NumPy.ipynb becomes: https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb
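If you do this often, the substitution is easy to script; a tiny illustrative helper (the function name is mine):

def to_colab(github_notebook_url: str) -> str:
    """Rewrite a GitHub notebook URL into Colab's open-from-GitHub form."""
    return github_notebook_url.replace(
        'https://github.com/', 'https://colab.research.google.com/github/'
    )

print(to_colab(
    'https://github.com/jakevdp/PythonDataScienceHandbook'
    '/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb'
))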
By default, the code will run on Colab's remote server, but it's also possible to run it locally by clicking Connect to local runtime... at the top right:
I personally prefer the MyBinder project as a route. It will open temporary, active sessions with the contents of any GitHub repo, GitHub Gist, GitLab repo, Zenodo archive, Dataverse repo, Datashare archive, Figshare archive, and others. Many repositories already include the necessary configuration files and even put a launch binder button on them. For those that don't, you can go to the form at the MyBinder project and generate a session; that form will also generate a URL you can use later to tell the public MyBinder system to open a session. For example, this person posted the link to open a session for all of Jake's notebooks: you just go to the URL https://mybinder.org/v2/gh/jakevdp/PythonDataScienceHandbook/master?filepath=notebooks%2FIndex.ipynb to tell MyBinder to start a session. From the index page that comes up, you can click on the link you listed above and run it. Jake included configuration files that MyBinder recognizes.
Note that for some repositories or archives you point MyBinder at, the necessary configuration files won't be there; in that case you can run %pip install <package_name_here> or %conda install <package_name_here> in the current session and continue running code. Limitations include that you shouldn't share anything you wouldn't want to be public, resources are limited, and FTP is not allowed, to avoid abuse.
Some others to get you started:
A Gallery of Popular Binders (You'll note the one you referenced is listed in the number one position under Featured Projects there.)
Analyze CMS Open Data in Jupyter Notebooks using Binder
Tidal constituent database mapped with Datashader
Sample Binder Repositories (for example, the first one listed there includes the seaborn library installed in the environment it launches and uses it to plot a figure)
I have a custom Jupyter kernel which runs IPython using a custom IPython profile which uses a matplotlib stylesheet.
I know to run this successfully normally I would put:
The matplotlib stylesheet in ~/.config/matplotlib/stylelib/
The IPython profile in ~/.ipython/
The kernel json in ~/.jupyter/kernels/my_kernel/
But I am doing this as part of a larger program which runs in a virtualenv, and if I put the files in the locations above then any notebook server running on the computer will be able to see the custom kernels, even if it is running outside the venv. I don't want this because I don't want my program to interfere with other notebooks on the computer.
I think what I need to do is put these files somewhere equivalent inside the venv, but I can't figure out where they should go. Does anyone know where they would go? Or is this just something IPython/Jupyter can't or won't do?
It's probably worth mentioning that in the case of the stylesheet for example I don't want to just put it in the working directory of my program (which is one option matplotlib offers).
You can put kernelspecs in VIRTUAL_ENV/share/jupyter/kernels/ and they will be made available if the notebook server is running in that env. In general, <sys.prefix>/share/jupyter/kernels is included in the path to look for kernelspecs.
To see the various locations where Jupyter will look, check the output of jupyter --paths:
$ jupyter --paths
config:
/Users/you/.jupyter
/Users/you/env/etc/jupyter
/usr/local/etc/jupyter
/etc/jupyter
data:
/Users/you/Library/Jupyter
/Users/you/env/share/jupyter
/usr/local/share/jupyter
/usr/share/jupyter
runtime:
/Users/you/Library/Jupyter/runtime
Kernelspecs are considered data files, and will be found in any of those directories listed under data:, in a kernels subdirectory, e.g. /usr/local/share/jupyter/kernels.
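To confirm which kernelspecs a given environment actually sees (and from which of those data directories they come), you can run jupyter kernelspec list in that environment, or query it from Python with jupyter_client; a small sketch:

# List the kernelspecs visible from the current environment and where they live
from jupyter_client.kernelspec import KernelSpecManager

ksm = KernelSpecManager()
for name, info in ksm.get_all_specs().items():
    # e.g. my_kernel -> /Users/you/env/share/jupyter/kernels/my_kernel
    print(name, '->', info['resource_dir'])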
I am trying to use the guidance at http://www.slideshare.net/fullscreen/randyzwitch/ipython-ec2/12 to set up public IPython notebooks on an AWS instance. One problem I encounter is that when I try to create a profile, I do not observe the creation of an ipython_notebook_config.py file (as explained in the tutorial, as per the screenshot); I only get an ipython_kernel_config.py file, which has very different contents and cannot be edited in the way the tutorial explains. Can someone help me understand why this happens, and what I should do subsequently? Many thanks.
The notebook server is no longer part of IPython; it's a separate Jupyter project, which has its own config directory ~/.jupyter. The config file for the notebook server is ~/.jupyter/jupyter_notebook_config.py. You can use this to configure the notebook server. If you want to keep multiple configurations of the notebook server, you can use the environment variable JUPYTER_CONFIG_DIR to specify that a different directory should be used:
JUPYTER_CONFIG_DIR=~/jupyter_nbserver jupyter notebook
Note: IPython has not lost its profiles or config files, so ~/.ipython/profile_default/ipython_config.py and startup files, etc. continue to work as before for configuring the IPython kernel, just not the notebook server itself.
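For the public-server part of that tutorial, those settings now go in ~/.jupyter/jupyter_notebook_config.py instead. A hedged sketch of the kind of options involved (values are placeholders; see the "Running a public notebook server" reference below for the current, complete procedure):

# ~/.jupyter/jupyter_notebook_config.py -- illustrative values only
c = get_config()

c.NotebookApp.ip = '0.0.0.0'         # listen on all interfaces of the instance
c.NotebookApp.port = 8888            # match the port opened in your AWS security group
c.NotebookApp.open_browser = False   # headless server, no browser to launch

# Strongly recommended for a public server:
# c.NotebookApp.certfile = '/path/to/mycert.pem'   # serve over HTTPS
# c.NotebookApp.password = 'sha1:...'              # hashed password from notebook.auth.passwd()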
References:
Running a public notebook server
Jupyter configuration
Migration from IPython to Jupyter