Multiple users using the same Jupyter kernel - Python

Question: Is it possible to have multiple users connect to the same Jupyter kernel?
Context: I am trying to provide Jupyter notebook access to a large number of users. All users are using Python.
Right now, every notebook spawns a new kernel pod in the Kubernetes cluster, which is inefficient. I am looking for a way to connect a few users to a single kernel pod in Kubernetes so that we consume relatively less compute.
I am new to Jupyter notebooks, so my terminology might have errors. Also, I came across KernelProvisioner and was wondering if that's of any help?
I am looking to understand:
Whether this is even possible in Jupyter.
Which new Kubernetes objects to add to achieve this, for example custom controllers, services, deployments, etc.
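If I understand correctly, the kernel wire protocol itself doesn't forbid multiple clients: a kernel listens on a set of ZeroMQ sockets described by its connection file, and any client that can load that file can attach. A rough sketch with jupyter_client, assuming the kernel pod's connection file can be shared with the clients (the path below is a placeholder):

    # Sketch only: attach an extra client to an already-running kernel via its
    # connection file. In Kubernetes, the file would have to be shared with the
    # client, e.g. through a mounted volume or a copy from the kernel pod.
    from jupyter_client import BlockingKernelClient

    kc = BlockingKernelClient()
    kc.load_connection_file("/shared/kernel-1234.json")  # hypothetical shared path
    kc.start_channels()

    kc.execute("x = 1 + 1")               # runs in the shared kernel's namespace
    reply = kc.get_shell_msg(timeout=10)  # wait for the execute_reply
    print(reply["content"]["status"])     # 'ok' if the code ran

    kc.stop_channels()

Note that all attached users would then share one Python process and one global namespace, which is the trade-off behind the resource savings.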
Any inputs will be appreciated.
Thank you!

Running Jupyter Notebook in GCP on a Schedule

What is the best way to migrate a Jupyter notebook into Google Cloud Platform?
Requirements
I don't want to make a lot of changes to the notebook to get it to run
I want it to be schedulable, preferably through the UI
I want it to be able to run an ipynb file, not a py file
In AWS, SageMaker seems like the no-brainer solution for this. I want the GCP tool that gets as close as possible to this specific task without a lot of extras.
I've tried the following:
Cloud Functions: seems best suited to running Python scripts, not a notebook; it requires you to provide a main.py file by default
Dataproc: seems like you can add a notebook to a running instance, but it cannot be scheduled
Dataflow: seemed like overkill, as if it wasn't the right tool and was better suited to Apache-based pipelines
I feel like this question should be easier. I found this article on the subject:
How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform
He doesn't actually do what the title says: he moves a lot of GCP code into a main.py that creates an instance, and has the instance execute the notebook.
Feel free to correct my perspective on any of this
I use Vertex AI Workbench to run notebooks on GCP. It provides two variants:
Managed Notebooks
User-managed Notebooks
User-managed notebooks create compute instances in the background; they come with pre-built packages such as JupyterLab, Python, etc. and allow customisation. I mainly use them for developing Dataflow pipelines.
As for the other requirement, scheduling: Managed Notebooks support this feature; refer to this documentation (I have yet to try Managed Notebooks):
Use the executor to run a notebook file as a one-time execution or on a schedule. Choose the specific environment and hardware that you want your execution to run on. Your notebook's code will run on Vertex AI custom training, which can make it easier to do distributed training, optimize hyperparameters, or schedule continuous training jobs. See Run notebook files with the executor.
You can use parameters in your execution to make specific changes to each run. For example, you might specify a different dataset to use, change the learning rate on your model, or change the version of the model.
You can also set a notebook to run on a recurring schedule. Even while your instance is shut down, Vertex AI Workbench will run your notebook file and save the results for you to look at and share with others.
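The parameterized runs described above work much like parameterizing a notebook with papermill (papermill is only my illustration here, not necessarily what the Vertex AI executor uses internally); a rough local sketch, with placeholder notebook names and parameter values:

    # Sketch: execute a notebook with injected parameters, papermill-style.
    import papermill as pm

    pm.execute_notebook(
        "train.ipynb",             # input notebook with a cell tagged "parameters"
        "train-run-output.ipynb",  # executed copy, with outputs, saved per run
        parameters={
            "dataset": "gs://my-bucket/data.csv",  # hypothetical dataset path
            "learning_rate": 0.01,
        },
    )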

How to share a jupyter notebook with colleagues (virtual private cloud required)?

I need to share Jupyter notebooks with my colleagues. My company is pretty strict about data sharing, etc., so it needs to be on our private cloud. We use AWS. SageMaker is OK, but we also want to share the same environment I set up for my notebook. We don't have huge budgets for Domino Data Labs. Any methods or tools, even if low cost, that you recommend? Really appreciate the help.
Background on what I tried:
I don't even know where to start...
Our DevOps guys don't have the bandwidth to do this for a while, but it's crushing us because we have a deliverable due soon.

multiple simultaneous connections on same jupyter notebook at the same time

I created a Jupyter notebook for running a survey on a fairly large group of people; it consists of one script that each person has to run and fill in. To make it convenient for them, I hosted a public Jupyter notebook server and mailed every person the link to participate.
The problem is that when one person is running the script, all other people have to wait until that person closes the notebook before they can run it. I want a system that generates one separate kernel for every incoming connection so multiple people can take the survey at the same time.
Does anyone have any ideas?
Jupyter Notebook wasn't made for simultaneous collaboration on the same file. One solution I've seen that addresses this problem is Google Colab, a fork of Jupyter built on Google's collaborative Docs platform, which allows exactly what you're talking about.
For JupyterLab, they're hoping to integrate simultaneous editing as a core feature (they were originally going for a Google Drive backend, but Google seems to have pulled support, and now they're considering more P2P solutions like IPFS); however, that work has hit a few road bumps and won't ship with version 1.0.
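If the goal is one independent kernel per participant rather than real-time collaboration on a single file, the jupyter_client API can start a fresh kernel per connection; a minimal sketch, with the survey code and the surrounding web plumbing assumed:

    # Sketch only: spin up a dedicated kernel, run the survey script in it,
    # then shut it down. How connections are received is left out.
    from jupyter_client import KernelManager

    def run_survey(code: str) -> str:
        km = KernelManager()
        km.start_kernel()                    # one kernel per participant
        kc = km.client()
        kc.start_channels()
        try:
            kc.execute(code)
            reply = kc.get_shell_msg(timeout=60)  # wait for the execute_reply
            return reply["content"]["status"]     # 'ok' or 'error'
        finally:
            kc.stop_channels()
            km.shutdown_kernel()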

Interactive IPython Notebooks on Heroku

I am currently trying to make Python tutorials and host them as IPython notebooks on a Heroku site. The problem is that IPython notebooks are static when uploaded. I want users to be able to use the notebook interactively (for example, seeing print output). I also don't want the output from their notebooks to be saved permanently on the Heroku site.
From what I understand, you have two issues to deal with:
interactive notebooks
"read only" notebooks (do not save the modifications)
For issue 1, you need to run a Jupyter (the new name for IPython notebooks) server. Only showing the notebook is not enough, because you need a server to "understand" and execute the modifications. See: http://jupyter-notebook.readthedocs.io/en/latest/public_server.html
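As a rough idea of what that involves, a public notebook server is configured through jupyter_notebook_config.py; a minimal sketch (the port and password hash are placeholders):

    # jupyter_notebook_config.py -- minimal public-server settings (placeholders)
    c = get_config()  # provided by Jupyter when it loads this file

    c.NotebookApp.ip = "0.0.0.0"         # listen on all interfaces
    c.NotebookApp.port = 8888
    c.NotebookApp.open_browser = False   # headless server, no local browser
    # Hash generated with: from notebook.auth import passwd; passwd()
    c.NotebookApp.password = "sha1:<salt>:<hash>"  # replace with your own hash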
I am not familiar with Heroku, but after a quick search I found this: https://github.com/pl31/heroku-jupyter which was able to deploy a working Jupyter server on a demo Heroku machine.
In my opinion, issue 2 is more difficult to solve.
When the "learners" change the notebook, the modifications will be applied to the notebook file (.ipynb), so they will be persistent... This is not what you want.
You could try some tricks with file permissions to prevent the kernel from saving the file, but I think that would only crash the kernel...
Moreover, it raises several user-interaction problems: for instance, what if I lose my internet connection? Will I lose my work? Why? Is this what I really want as a learner?
For this, the best solution is to give each user access to the notebook / a workspace where they can save their progress, but that is more work than just deploying a Jupyter server. As an example, see databricks.com (the first (and only) one that comes to mind, not necessarily the best).
(As a remark, it seems that multi-user mode is already implemented: https://jupyterhub.readthedocs.io/en/latest/)
I would like to add a last remark about server security. Letting strangers access a server with an embedded shell sounds like a bad idea if you are not prepared for the consequences. I would suggest looking at how you can put each user's Jupyter session in a "jail" / container, or anything along those lines that works on Heroku.

How to build a web service with one sandboxed Python (VM) per request

As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.
For safety, the Python instances launched should be sandboxed and resource controlled, so I was thinking of using LXC containers.
Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?
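For concreteness, the service shape would be roughly the following, with the sandboxed execution left as a stub (Flask and the endpoint name are only for illustration; what should back run_sandboxed is exactly the open question):

    # Sketch of the request/response shape only; run_sandboxed is a stub that a
    # container- or LXC-based executor would implement.
    import base64
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_sandboxed(code: str) -> bytes:
        """Execute `code` in an isolated environment and return PNG bytes."""
        raise NotImplementedError

    @app.route("/run", methods=["POST"])
    def run():
        code = request.get_json(force=True)["code"]
        image_png = run_sandboxed(code)
        return jsonify({"image": base64.b64encode(image_png).decode("ascii")})

    if __name__ == "__main__":
        app.run(port=5000)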
Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:
http://docs.docker.io/en/latest/examples/python_web_app/
You could try running the application on Digital Ocean, like so:
https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started
[disclaimer: I'm an engineer at Continuum working on Wakari]
Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this, and we're hoping to back-port the functionality into Wakari Cloud (http://wakari.io) so that "published" IPython Notebooks can have some knobs on them for variable input control; they can then be "invoked" in a sandboxed state and the output given back to the user.
However for things that exist now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you were asking for: execute small code snippets. I haven't followed it since then, but it seems it is still alive and well from a quick search:
http://sagecell.sagemath.org/?q=ejwwif
http://sagecell.sagemath.org
http://www.sagemath.org/eval.html
For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".
I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spawn hundreds of instances at a time (spin-up time is a fraction of a second, versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment. Then you don't have to worry about sandboxing Python itself :-D
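As a concrete illustration of the Docker route, here is a minimal sketch using the Docker SDK for Python that runs a snippet in a throwaway, resource-limited container (the image name and limits are assumptions, not a vetted security configuration):

    # Sketch only: run an untrusted snippet in a one-shot, resource-limited
    # container. A starting point, not a hardened sandbox.
    import docker  # Docker SDK for Python

    snippet = "print(2 + 2)"

    client = docker.from_env()
    output = client.containers.run(
        "python:3-slim",               # placeholder image
        command=["python", "-c", snippet],
        remove=True,                   # delete the container afterwards
        network_disabled=True,         # no network access for user code
        mem_limit="256m",              # cap memory
        nano_cpus=250_000_000,         # roughly a quarter of one CPU
    )
    print(output.decode())             # -> 4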
I'm not sure if you really have to go as far as setting up LXC containers:
There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.
Another option would be to use PyPy, which has explicit support for sandboxing out of the box.
In any case, do not use pysandbox; it is broken by design and has severe security risks.
