Python Script Sandboxing using Docker

If I build a container from a base image like python:3-alpine and follow the hardening guidance in the Docker documentation, is it safe to inject and execute a Python script inside it?
I mean, if a user writes something dangerous (like invoking sudo rm -R through a Python function), only the container will be affected, right?
Is this good practice? I need to execute some small code snippets with limited access to the system, modules, etc.

I would not treat Docker as a security “silver bullet” here; you want to have at least some notion that the code you’re running is “trustworthy” before unleashing it on your system, even under Docker.
Remember that you need root privileges to run docker at all, and anyone who can run it can trivially gain root (docker run -v /:/host -u root ... will let you freely edit the host filesystem). If your application really is dealing in untrusted code, consider whether you want a root-equivalent process to handle it.
Beyond that, Docker containers share the host's kernel and physical resources. If there is a kernel privilege-escalation bug, something running in a container could exploit it. If your untrusted code makes outbound TCP calls to shuffle data around that you wouldn't want leaving your network, nothing limits that by default. And if it's "merely" using your CPU cycles to mine Bitcoin, nothing stops that either unless you set explicit resource limits.
If all of this sounds like an acceptable level of risk to you, then running somewhat-trusted code under Docker is certainly better than not: you do get some protection against changing files on the host and host-level settings like network configuration, especially if you believe the code you’re running isn’t actively malicious.
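If you do accept that risk, it is worth passing every isolation flag Docker offers rather than using a bare docker run. Below is a minimal sketch of such a wrapper; the image name, limits, and user are illustrative assumptions, not a vetted configuration:

    import subprocess

    UNTRUSTED = "print(sum(range(10)))"  # code received from the user

    # Each flag removes one thing the untrusted code could otherwise do;
    # none of them protect against kernel bugs. Assumes python:3-alpine
    # has already been pulled, since --network none blocks pulling too.
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",            # no outbound connections
            "--memory", "128m",             # cap memory
            "--cpus", "0.5",                # cap CPU
            "--pids-limit", "64",           # no fork bombs
            "--read-only",                  # immutable container filesystem
            "--cap-drop", "ALL",            # drop all Linux capabilities
            "--security-opt", "no-new-privileges",
            "--user", "nobody",             # never run as container root
            "python:3-alpine",
            "python", "-c", UNTRUSTED,
        ],
        capture_output=True, text=True,
        timeout=10,  # note: this kills the docker client process; the
    )                # container itself may still need a separate docker stop
    print(result.stdout)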

Related

Python web based interpreter security issues

I am making a web-based Python interpreter which will take code, execute it with a Linux Python 3 interpreter, and show the output on the same web page. But this has some serious loopholes: for example, someone can execute a bash script using Python's os module, read the web application's source code from its directory, and a lot more.
Can anyone suggest how I can prevent this kind of mishap in my application?
Regards
Short answer: there is no easy "python-only" solution for this.
Some details:
the user can always try to call os, sys, with open(SENSITIVE_PATH, 'r+') as f: ..., etc., and it's hard to detect all of those cases simply by analyzing the code
If you allow ANY third-party packages, things become even more complicated: for example, some third-party package may locally create an alias to os.execv (os_ex = os.execv), and after this it is possible to write a script like from thirdparty.some_internals import os_ex; os_ex(...). A short demonstration of why source scanning fails follows below.
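To make that concrete, here is a small illustrative snippet: it never mentions os, open, or system literally, yet still reaches os.system, so any blacklist-style source scanner will miss it:

    # The module and attribute names are assembled at runtime, so a
    # source-text scanner has nothing literal to match against.
    mod_name = "".join(["o", "s"])
    dangerous = getattr(__import__(mod_name), "sys" + "tem")
    dangerous("echo this ran outside any Python-level sandbox")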
The more or less reliable solution is to use "external sandboxing" solutions:
Run the interpreter in an unprivileged Docker container. For example (a sketch follows after this list):
write the untrusted script to a file that is exposed to the container through a volume
execute that script in docker:
a. subprocess.call(['docker', 'exec', 'CONTAINER_ID', '/usr/bin/python', 'PATH_TO_SCRIPT'])
b. subprocess.call(['docker', 'exec', 'CONTAINER_ID', '/usr/bin/python', '-c', UNTRUSTED_SCRIPT_TEXT])
Use PyPy's sandbox (note that it was always experimental and is no longer maintained).
Search for a "secure" IPython kernel for the Jupyter notebook server, or write your own. Note: existing kernels are not guaranteed to be secure and may still allow calls to subprocess.check_output, os.remove, and others. So even with the default kernel it's better to run the Jupyter server in an isolated environment.
Run the interpreter in a chroot as an unprivileged user. Different implementations offer different levels of "safety".
Use Jython with finely tuned permissions.
Some exotic solutions like "client-side JS python implementation": brython, pyjs
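A hedged sketch of the Docker option above, assuming a container named sandbox was started beforehand (e.g. docker run -d --name sandbox --network none -v /srv/jobs:/jobs python:3-alpine sleep infinity); the paths and names are illustrative:

    import pathlib
    import subprocess

    JOBS_DIR = pathlib.Path("/srv/jobs")  # host side of the shared volume

    def run_untrusted(script_text: str, job_id: str) -> str:
        # 1. write the untrusted script into the shared volume
        script_path = JOBS_DIR / f"{job_id}.py"
        script_path.write_text(script_text)

        # 2. execute it inside the already-running container
        result = subprocess.run(
            ["docker", "exec", "sandbox",
             "/usr/local/bin/python", f"/jobs/{job_id}.py"],
            capture_output=True, text=True, timeout=10,
        )
        return result.stdout

    print(run_untrusted("print(2 + 2)", "job-0001"))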
In any case, even if you manage to implement or reuse an existing "sandbox", you will still face many potential problems:
If multiprocessing or multithreading is allowed, you might want to monitor how CPU resources are used, because some scripts will try to use EVERYTHING. Even with the GIL, multithreaded code can saturate all cores: all the user has to do is call functions backed by C libraries (which release the GIL) from several threads.
You might want to monitor memory usage, because some scripts leak or simply allocate a lot of memory
Other candidates for monitoring: disk I/O, network usage, open file descriptors, execution time, etc. (a setrlimit sketch for several of these follows below)
Also, you should always watch for security updates to your "sandboxing solution", because even Docker is sometimes vulnerable and container escapes to the host machine have happened
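For the CPU, memory, and file-descriptor points above, the standard-library resource module gives a first line of defence on POSIX systems even before containers enter the picture. A minimal sketch; the limits and the script name untrusted_script.py are illustrative:

    import resource
    import subprocess
    import sys

    def set_limits():
        # runs in the child process between fork() and exec()
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))             # 5 s of CPU time
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2)  # 256 MiB address space
        resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))        # few open files

    result = subprocess.run(
        [sys.executable, "untrusted_script.py"],
        preexec_fn=set_limits,  # POSIX only; unsafe in threaded parents
        capture_output=True, text=True, timeout=30,
    )
    print(result.stdout)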
Recommended read: https://softwareengineering.stackexchange.com/questions/191623/best-practices-for-execution-of-untrusted-code

securely executing untrusted python code on server

I have a server and a front end. I would like to take Python code from the user in the front end and execute it securely on the backend.
I read this article explaining the problematic aspects of each approach, e.g. the PyPy sandbox, Docker, etc.
I can define the inputs the code needs and the outputs it produces, and can put all of them in a directory. What's important to me is that this code must not be able to hurt my filesystem or exhaust disk, memory, or CPU (I need to be able to set timeouts, and to prevent it from accessing files that are not devoted to it).
What is the best practice here? Docker? Kubernetes? Is there a python module for that?
Thanks.
You can have a Python Docker container with a volume mount. The volume mount is a directory on the local system holding the code, which is then also available inside your Docker container. By doing this you have isolated the user-supplied code to the container while it runs (a sketch follows below).
Scan your Python container against the CIS Docker Benchmark (for example with Docker Bench for Security) for better security.
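A hedged sketch of that idea: stage each job in its own temporary directory, mount the code read-only with a separate writable output subdirectory, and enforce a wall-clock timeout. The paths, image, and limits are illustrative assumptions:

    import subprocess
    import tempfile
    from pathlib import Path

    def run_job(code: str) -> str:
        with tempfile.TemporaryDirectory() as job_dir:
            job = Path(job_dir)
            (job / "main.py").write_text(code)
            (job / "out").mkdir()  # the only writable location

            # the deeper mount overrides the outer read-only one,
            # so /job/out stays writable while the code does not
            result = subprocess.run(
                ["docker", "run", "--rm",
                 "--network", "none",
                 "--memory", "256m", "--cpus", "1",
                 "-v", f"{job}:/job:ro",
                 "-v", f"{job / 'out'}:/job/out:rw",
                 "-w", "/job",
                 "python:3-alpine", "python", "main.py"],
                capture_output=True, text=True, timeout=30,
            )
            return result.stdout

    print(run_job("print('hello from the sandbox')"))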

Rich editors in a Docker development environment

So my team and I have bought into Docker - it is fantastic for deployment and testing. My real question is how to set up a great developer experience, specifically around writing Python apps, but this question could be generalized to nodejs, Java, etc.
The problem: When writing a Python app, I really like having decent linting/autocomplete functionality, there are some really good editors out there (Atom, VSCode, PyCharm) that provide these, but most really want a Python install on the local disk. The real advantage of Docker is that all of the core language and any project libraries can all be in the container, so reproducing all of that on the host machine just for developing is a pain.
I know that PyCharm pro does support Docker and docker-compose, but I found it quite sluggish and a lot of the test running capabilities were busted. On top of that, I really would like something that I can commit to version control so that the team can share dev setup and people don't have to repeat all of the steps for their own system.
A few ideas that I had:
Install an editor (like Atom) in a sidecar Docker container and use X11 forwarding
Use a browser based editor such as https://c9.io/ in a container - this seems most promising
Install some agent in a dev container that could handle autocomplete/linting, etc. and connect to it from a locally running editor - I think this would be the best solution, but I also think that right now it actually doesn't exist.
Has anyone had luck setting up a more productive development environment besides just mounting volumes and editing text?
You should use an 'advanced' IDE like IntelliJ/PyCharm and configure a remote Python SDK using SSH access to your app's Docker container (using a shared SSH key to authenticate against the container, which has a preinstalled OpenSSH server and a preconfigured authorized_keys file).
You can share this SDK information in your project file with all devs, so they will have this setup out of the box.
1) This ensures your IDE knows about all the Python libs/symbols available in your Docker container at runtime. It also enables you to debug remotely at the same time.
2) This ensures you have a full IDE at hand, including a lot of important extra features like the inspector, 3-way diff, search in path... hardly any of the browser-based IDEs catch up with PyCharm on this point, IMHO.
Of course, as already mentioned in the comments, you need to share (i.e. mount) your code into the container. On Linux, you can plainly use host volume mounts from your local src folder into the container.
On OSX you will run into performance issues when using host mounts. You might use something like http://docker-sync.io (I am biased; there are also a lot of other similar tools).
I know this is an old question, but as I stumbled across it while trying to see what other editors might offer in this space, I would like to point out Visual Studio Code's notion of a Dev Container, which seems to provide the best level of integration I've seen for this so far. I'm hoping to see this turn into an industry trend myself.
You could use x11docker:
x11docker allows you to run graphical desktop applications (and entire desktops) in Docker Linux containers.
Docker allows you to run applications in an isolated container environment. Containers need far fewer resources than virtual machines for similar tasks.
Docker does not provide a display server that would allow applications to run with a graphical user interface.
x11docker fills the gap: it runs an X display server on the host system and provides it to Docker containers.
Additionally, x11docker does some security setup to enhance container isolation and to avoid X security leaks. This allows a sandbox environment that fairly well protects the host system from possibly malicious or buggy software.
https://github.com/mviereck/x11docker
https://github.com/mviereck/x11docker/wiki (extensive knowledge base)
https://dev.to/brickpop/my-dream-come-true-launching-gui-docker-sessions-with-dx11-in-seconds-1a53

Forbid Python from writing anything to disk

Are there any command-line options or configurations that forbid Python from writing to disk?
I know I can monkey-patch open, but that doesn't sound very safe.
I've hosted some Python tutorials I wrote myself on my website for friends who want to learn Python, and I want them to have access to a Python console so they can experiment as they learn. This is done by creating a Python subprocess from the HTTP server.
However, I do not want them to accidentally or intentionally damage my server, so I need to forbid the Python process from writing anything to disk.
I'm running the server on Ubuntu Linux, so solutions at either the Python level or the system level are welcome.
I doubt there's a way to do this in the interpreter itself: there are way too many things to patch (open, subprocess, os.system, file, and probably others), as the snippet below demonstrates. I'd suggest looking into containerizing the Python runtime via something like Docker. The containerization gives some guarantees restricting access, though not as many as virtualization. See here for more discussion about the security implications.
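To show why interpreter-level patching is a losing game, here is a small illustrative sketch: even after replacing builtins.open, the low-level os.open interface (one of many alternative routes) still writes to disk:

    import builtins
    import os

    def guarded_open(*args, **kwargs):
        raise PermissionError("file access is disabled")

    builtins.open = guarded_open  # the naive "sandbox"

    # open("x", "w") now fails, but the patch is trivially bypassed:
    fd = os.open("escaped.txt", os.O_CREAT | os.O_WRONLY)
    os.write(fd, b"wrote to disk anyway\n")
    os.close(fd)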
Running a Jupyter/IPython notebook in the Docker container is probably the easiest way to expose a web frontend. Jupyter provides a collection of Docker images for this purpose: see https://github.com/jupyter/tmpnb and https://github.com/jupyter/docker-stacks

faking a filesystem / virtual filesystem

I have a web service to which users upload Python scripts that are run on a server. Those scripts process files that are on the server, and I want them to be able to see only a certain part of the server's filesystem (ideally a temporary folder into which I copy the files I want processed along with the scripts).
The server will ultimately be Linux-based, but if a solution is also possible on Windows it would be nice to know.
What I thought of is creating a user with restricted access to folders of the filesystem (ultimately only the folder containing the scripts and files) and launching the Python interpreter as this user.
Can someone give me a better alternative? Relying only on this makes me feel insecure; I would like a real sandboxing or virtual-filesystem feature where I could safely run untrusted code.
Either a chroot jail or a higher-level security mechanism such as SELinux can be used to restrict access to specific resources (a chroot sketch follows below).
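A hedged sketch of the chroot-plus-unprivileged-user idea using only the standard library. It must be started as root, the jail must already contain a Python installation and the script, and the paths and IDs are illustrative; note that a chroot alone is not a complete sandbox:

    import os
    import subprocess

    def jail():
        # runs in the child before exec: confine first, then drop privileges
        os.chroot("/srv/jail")  # everything below here is all the child sees
        os.chdir("/")
        os.setgid(65534)        # nogroup
        os.setuid(65534)        # nobody; after this there is no way back to root

    subprocess.run(
        ["/usr/bin/python3", "/scripts/user_script.py"],  # paths inside the jail
        preexec_fn=jail,
        timeout=30,
    )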
You are probably best off using a virtual machine like VirtualBox or VMware (perhaps even creating one per user/session).
That will allow you some control over other resources such as memory and network as well as disk.
The only Python I know of that has such features built in is the one on Google App Engine. That may be a workable alternative for you too.
This is inherently risky software: by letting users upload scripts you are introducing a remote-code-execution vulnerability by design. You have more to worry about than just modified files; what is stopping the Python script from accessing the network or other resources?
To solve this problem you need to use a sandbox. To harden the system further you can take a layered security approach.
The first, and most important, layer is a Python sandbox in which the user-supplied scripts are executed; this gives you the fine-grained limitations you need. Then the entire Python app should run within its own dedicated chroot. I highly recommend the grsecurity kernel patches, which strengthen any chroot: a grsecurity chroot cannot be broken out of unless the attacker can rip a hole into kernel land, which is very difficult to do these days. Make sure your kernel is up to date.
The end result is that you are limiting the resources an attacker's script can reach. Layers are a proven approach to security, as long as the layers are different enough that the same attack won't break through both of them. You want to isolate the script from the rest of the system as much as possible; any shared resource is also a path for an attacker.
