When I find an interesting Python Jupyter notebook online, such as 02.00-Introduction-to-NumPy.ipynb, I usually have to:
download it locally
open a shell in the same folder (tip: use SHIFT + right-click > Open command window here to save 30 seconds of browsing through folders) and run jupyter notebook
select the right .ipynb file, and finally run the code
Isn't there an easier way to do this?
What is the natural way to open a .ipynb notebook which is online, and run the code, without having to manually download the .ipynb?
Note: the notebook is visible here: https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb but we can't run the code
jakevdp has built in a nice way to do that; see here. In short, on each page he has an Open in Google Colab button:
Google Colab can open any Jupyter notebook directly from GitHub!
To run the notebook, just replace "https://github.com/" with "https://colab.research.google.com/github/" in the notebook URL, and it will be loaded into Colab.
Example: 02.00-Introduction-to-NumPy.ipynb becomes: https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb
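If you do this often, the rewrite is just a prefix swap on the URL; here is a small helper that does it (nothing Colab-specific is assumed beyond the URL pattern shown above):

# Rewrite a GitHub notebook URL into the corresponding Colab URL by swapping the prefix.
def github_to_colab(url):
    return url.replace("https://github.com/",
                       "https://colab.research.google.com/github/", 1)

print(github_to_colab(
    "https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb"))
# -> https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb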
By default, the code runs on Colab's remote servers, but it's also possible to run it locally by clicking Connect to local runtime... at the top right.
I personally prefer the MyBinder project as a route. It will open temporary, active sessions with the contents of any GitHub repo, GitHub Gist, GitLab repo, Zenodo archive, Dataverse repo, Datashare archive, Figshare archive, and others. Many repositories already include the necessary configuration files and even put a launch binder button on them. Some don't, but you can go to the form at the MyBinder project and generate a session; that form will also generate a URL you can use later to point the public MyBinder system at the repository and open a session.
For example, someone posted the link to open a session for all of Jake's notebooks: you just go to the URL https://mybinder.org/v2/gh/jakevdp/PythonDataScienceHandbook/master?filepath=notebooks%2FIndex.ipynb to tell MyBinder to start a session. Then, from the index page that comes up, you can click on the notebook you listed above and run it. Jake included configuration files that MyBinder also recognizes.
Note that some repositories or archives you point MyBinder at won't have the necessary configuration files; in that case you can run %pip install <package_name_here> or %conda install <package_name_here> in the current session and continue running code. Limitations include that you shouldn't share anything you wouldn't want to be public, the resources are limited, and FTP is not allowed, to avoid abuse.
Some others to get you started:
A Gallery of Popular Binders (You'll note the one you referenced is listed in the number one position under Featured Projects there.)
Analyze CMS Open Data in Jupyter Notebooks using Binder
Tidal constituent database mapped with Datashader
Sample Binder Repositories (for example, the first one listed there includes the seaborn library in the environment it launches, and uses it to plot a figure).
Desired behaviour
We have an existing workflow in vanilla Jupyter Notebook/Lab where we use relative paths to store outputs of some notebooks. Example:
/home/user/notebooks/notebook1.ipynb
/home/user/notebooks/notebook1_output.log
/home/user/notebooks/project1/project.ipynb
/home/user/notebooks/project1/project_output.log
In both notebooks, we produce the output simply by writing to a relative path such as ./output.log.
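A minimal sketch of that pattern (the filename is illustrative): in vanilla Jupyter the kernel's working directory is the notebook's own folder, so the relative path lands next to the .ipynb file.

# Inside notebook1.ipynb: write output next to the notebook via a relative path
with open("./notebook1_output.log", "a") as f:
    f.write("processing finished\n")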
Problem
However, we are now trying Google Dataproc with the Jupyter optional component, and the current directory is always / regardless of which notebook the code is run from. This applies to both the Notebook and Lab interfaces.
What I've tried
Disabling c.FileContentsManager.root_dir='/' in /etc/jupyter/jupyter_notebook_config.py causes the current directory to be set to wherever I started jupyter notebook from, but it is always that initial starting folder instead of following the .ipynb notebook files.
Any idea on how to restore the "dynamic" current directory behaviour?
Even if it's not possible, I'd like to understand how Dataproc even makes Jupyter behave differently.
Details
Dataproc Image 2.0-debian10
Notebook Server 6.2.0
Jupyterlab 3.0.18
No, it is not possible to always have the current directory be the one where your .ipynb file lives. Jupyter runs on the local filesystem of the master node of your cluster, and its kernel always starts in the default system path.
Even in other cases (besides Dataproc), it is not possible to consistently get the path of a Jupyter notebook. You can check out this thread on the topic.
You have to specify an explicit directory path for your log file so that it is saved in the desired location.
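For example, a minimal sketch that writes the log under an explicit base directory instead of relying on the kernel's working directory (the base path below is a placeholder for wherever you want the output on the master node or a mounted filesystem):

import os

BASE_DIR = "/home/user/notebooks/project1"  # placeholder output location
log_path = os.path.join(BASE_DIR, "project_output.log")

with open(log_path, "a") as f:
    f.write("run finished\n")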
Note that the GCS folder in your Lab interface refers to the Google Cloud Storage bucket of your cluster. You can create an .ipynb in GCS, but when you execute the file it runs on the local system, so you will not be able to save log files to GCS directly.
EDIT:
It's not only Dataproc that makes Jupyter behave differently. If you use Google Colab notebooks, you will see the same behaviour there.
The reason is that you are always executing code in the kernel, no matter where the notebook file is. In theory, multiple notebooks could connect to the same kernel, so you can't have multiple working directories for one kernel.
As mentioned earlier, by default the current working directory is set to the path of the notebook when it is started.
Link to the main thread -> https://github.com/ipython/ipython/issues/10123
A general solution for most use cases seems to be what is described in this comment on the GitHub issue: https://github.com/ipython/ipython/issues/10123#issuecomment-354889020
I am playing with Azure Databricks. I uploaded a Python file and added it to the spark context with
spark.sparkContext.addPyFile('/location/to/the/file/model.py')
Everything works fine when I run my Python code in the notebook. But when I make a change to the model.py file and upload it with --overwrite, the code in the notebook does not pick up the new version of the file; it keeps using the old one until I restart the cluster.
Is there a way to avoid cluster restart whenever I overwrite a file?
Unfortunately it doesn't work this way - when a file is added, it's distributed to all workers and used there.
But you may do it differently - you can use the Repos feature called arbitrary files (don't forget to restart the cluster after enabling it). When you clone your repository into Repos, you can use Python files (not notebooks) from that repository as Python packages. Then, in the notebook, you can use the following directives to force changes to be reloaded into the notebook environment:
%load_ext autoreload
%autoreload 2
You can see an example of such usage in this demo - Python code from the my_package directory is used in the notebook.
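Putting it together, a minimal sketch of what the top of such a notebook could look like (my_package is a placeholder for whatever package lives in the repository you cloned into Repos):

%load_ext autoreload
%autoreload 2

# my_package comes from the repository cloned into Repos; thanks to autoreload,
# edits to its .py files are picked up on the next cell execution without a cluster restart.
from my_package import model
print(model.__file__)  # shows which copy of the module is actually being used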
What are the free solutions for sharing an interactive Python Jupyter notebook with a user-defined module and dependent input files?
I have a Python Jupyter notebook that serves as a code interface for non-technical users. The code itself is in another file, code.py, which contains many functions that are called from the notebook as needed. Running these functions requires about ten input files totalling 100 MB. I want anyone on the web to be able to open this notebook in an executable environment so that they can run the code with different user choices.
One approach I am considering is to use Google Colab, Google Drive, GitHub, and the Python Package Index (PyPI) as follows:
Package code.py as a PyPI module
Upload the dependent input files to Google Drive and get their shared link IDs
Host the Colab notebook on GitHub
Once the user runs the Colab notebook, it pip installs the package, imports the functions from code.py, and downloads the dependent input files from Google Drive (a sketch of such a first cell is below)
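A minimal sketch of what that first Colab cell could look like; the package name mycode, the function name, and the Drive file ID are placeholders, and gdown is just one convenient way to fetch publicly shared Drive files:

# Install the (hypothetical) package built from code.py, plus a Drive download helper
%pip install mycode gdown

import gdown
import mycode  # placeholder name for the module packaged from code.py

# Download one of the shared input files by its Drive file ID (placeholder)
gdown.download(id="DRIVE_FILE_ID", output="input_file_1.dat", quiet=False)

result = mycode.some_function("input_file_1.dat")  # hypothetical function call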
How to improve or simplify this approach?
What would be a better Colab-based approach to do this job?
Are there any other environments (e.g., Binder) that are more suitable than Colab for this job?
You can use MyBinder.org and use curl or wget in a postBuild or start configuration file to get your input files if they are elsewhere.
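If you prefer to stay in Python rather than shell out to curl or wget, the same fetch can be a tiny script that a postBuild hook runs (or that you run in the notebook's first cell); the URLs and filenames below are placeholders:

from pathlib import Path
import urllib.request

# Placeholder list of (URL, local filename) pairs for the input files
INPUT_FILES = [
    ("https://example.org/data/input_file_1.dat", "data/input_file_1.dat"),
]

Path("data").mkdir(exist_ok=True)
for url, dest in INPUT_FILES:
    urllib.request.urlretrieve(url, dest)  # fetch each file into the session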
For non-technical users, you may want to combine Voila with MyBinder.org. See here about Voila; there's a launch binder badge you can use to try a demo. There are a bunch of other examples that run on MyBinder in the Voila Gallery, too.
In our institute, we use OneDrive / SharePoint from Microsoft to share files with each other. I am also sharing my Jupyter notebooks - with the extension *.ipynb - there. Unfortunately, when one clicks on such a file, the message "Hmm... looks like this file doesn't have a preview we can show you" appears. In my case, I developed my notebooks with Python. It's not necessary that my colleagues can run the notebooks, but it would be much easier to be able to install some kind of previewing plugin on the SharePoint server. Is there any product that works like this? Otherwise I'll have to keep exporting the files to HTML or similar :-(
Unfortunately, there isn't any good news so far: such a plugin seems to be unavailable as yet. In the meantime, I have automated exporting the notebooks to HTML on my Windows 10 machine as follows:
For /R .\ %G IN (*.ipynb) do jupyter nbconvert --to html "%G"
For use in a *.bat file, you'll have to double the %-signs.
I am trying to use the guidance on http://www.slideshare.net/fullscreen/randyzwitch/ipython-ec2/12 to set up a public IPython notebook server on an AWS instance. One problem I encounter is that when I try to create a profile, I do not observe the creation of an ipython_notebook_config.py file (as explained in the tutorial, as per the screenshot), but only get an ipython_kernel_config.py file, which has very different contents and cannot be edited in the way the tutorial explains. Can someone help me understand why this happens, and what I should do subsequently? Many thanks.
The notebook server is no longer part of IPython; it's a separate Jupyter project, which has its own config directory ~/.jupyter. The config file for the notebook server is ~/.jupyter/jupyter_notebook_config.py. You can use this to configure the notebook server. If you want to keep multiple configurations of the notebook server, you can use the environment variable JUPYTER_CONFIG_DIR to specify that a different directory should be used:
JUPYTER_CONFIG_DIR=~/jupyter_nbserver jupyter notebook
Note: IPython has not lost its profiles or config files, so ~/.ipython/profile_default/ipython_config.py and startup files, etc. continue to work as before for configuring the IPython kernel, just not the notebook server itself.
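For reference, a public-facing classic notebook server is usually configured with a handful of settings in that file; the values below are illustrative placeholders (generate your own hashed password, and see the references below for the full procedure):

# ~/.jupyter/jupyter_notebook_config.py -- illustrative values only
c = get_config()

c.NotebookApp.ip = '0.0.0.0'        # listen on all interfaces
c.NotebookApp.port = 8888           # must match the port opened in your AWS security group
c.NotebookApp.open_browser = False  # headless server, don't try to open a browser
# Hash generated e.g. with: from notebook.auth import passwd; passwd()
c.NotebookApp.password = 'sha1:<your-hashed-password>'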
References:
Running a public notebook server
Jupyter configuration
Migration from IPython to Jupyter