Python and os.chroot

I'm writing a web-server in Python as a hobby project. The code is targeted at *NIX machines. I'm new to developing on Linux and even newer to Python itself.
I am worried about people breaking out of the folder that I'm using to serve up the web site. The most obvious attack is a request for a document like /../../etc/passwd, which I can filter. However, I'm worried that there might be clever ways to go up the directory tree that I'm not aware of and that my filter consequently won't catch.
I'm considering using os.chroot so that the root directory is the web site itself. Is this a safe way of protecting against these jailbreaking attacks? Are there any potential pitfalls to doing this that will hurt me down the road?
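One common in-process defence, independent of chroot, is to canonicalize every requested path and check that it still lies under the document root. A minimal sketch, assuming a hypothetical DOCROOT:

    import os.path

    DOCROOT = "/srv/www"  # hypothetical document root

    def resolve_request_path(url_path):
        # Join, then resolve symlinks and any ".." components.
        candidate = os.path.realpath(os.path.join(DOCROOT, url_path.lstrip("/")))
        # Reject anything that escaped the document root.
        if candidate != DOCROOT and not candidate.startswith(DOCROOT + os.sep):
            raise PermissionError("path escapes the document root")
        return candidate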

Yes, there are pitfalls. Security-wise:
If you run as root, there are always ways to break out. So first chroot(), then PERMANENTLY drop privileges to another user.
Put nothing that isn't absolutely required into the chroot tree. In particular, no suid/sgid files, named pipes, Unix domain sockets, or device nodes.
Python-wise, your whole module loading gets screwed up. Python is simply not made for such scenarios. If your application is moderately complex, you will run into module loading issues.
I think much more important than chrooting is running as an unprivileged user and simply using the filesystem permissions to keep that user from reading anything of importance.
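For illustration, a minimal sketch of the chroot-then-drop order described above, assuming a hypothetical site root and service user; it must be started as root for chroot() and setuid() to work:

    import os
    import pwd

    def jail_and_drop(root="/srv/www", user="www-data"):
        # Hypothetical path and user name; adjust for your setup.
        pw = pwd.getpwnam(user)
        os.chroot(root)       # confine the filesystem view to the site directory
        os.chdir("/")         # don't keep a working directory outside the jail
        os.setgroups([])      # shed supplementary group memberships
        os.setgid(pw.pw_gid)  # drop the group first, while still root
        os.setuid(pw.pw_uid)  # irreversible: the process can never regain root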

Check out Twisted. twistd supports privilege shedding and chroot operation out of the box. Additionally, it offers a whole framework for writing network services, daemons, and pretty much everything else.

Related

Externalized configurations in python microservices

The Current State:
I have a non-negligible number of microservices written in Python.
Each such microservice has its own YAML configuration file that is located in the git repo. We use dynaconf to read the configuration.
The Problem:
At first it was fine: the configurations were relatively small and easy to maintain. Time went by, and the configurations grew larger. It became annoying to change them, and it is bad that they are scattered between different git repos, i.e. not centralized.
I want to use "Externalized Configurations" in order to maintain all the configurations in a single repo, so that each microservice reads its portion on startup. I have heard about Spring Boot, but it seems to be way too much, and apart from that, the pip libraries for it seem to be in beta: new and unreliable...
Is there another recommendation for this particular use case? Or should I proceed with Spring Boot?
You can use Microconfig.IO to manage your configuration with powerful templating and distribution. It's language-agnostic as long as your configs are in YAML or properties format.
I think you could create a default config file for each service and use the external storage idea of dynaconf.
One possible solution would be to create a simple system to manage these variables with Redis.
And the Dynaconf CLI allows you to make changes on the fly.
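A minimal sketch of that layered setup with dynaconf; the file name, env-var prefix, and key below are assumptions, and the Redis wiring is left to dynaconf's external loaders:

    from dynaconf import Dynaconf

    # Each service keeps a small settings.yaml with its own defaults; values
    # from the environment (or an external store such as Redis) override them.
    settings = Dynaconf(
        settings_files=["settings.yaml"],  # per-service defaults in the repo
        envvar_prefix="MYAPP",             # hypothetical prefix: MYAPP_DB_URL=...
    )

    # Keys are case-insensitive; this falls back to the YAML default
    # unless a central override is present.
    db_url = settings.db_url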

Python web based interpreter security issues

I am making a web-based Python interpreter which takes code, executes it on a Linux-based Python 3 interpreter, and shows the output on the same web page. But this has some serious loopholes: someone can execute a bash script using Python's os module, inspect directories for the web application's source code, and a lot more.
Can anyone suggest how to prevent this kind of mishap in my application?
Regards
Short answer: there is no easy "python-only" solution for this.
Some details:
The user can always try to call os, sys, with open(SENSITIVE_PATH, 'r+') as f: ..., etc., and it's hard to detect all those cases simply by analyzing the code.
If you allow ANY third-party packages, then things become even more complicated. For example, some third-party package may locally create an alias to os.execv (os_ex = os.execv), and after this it will be possible to write a script like from thirdparty.some_internals import os_ex; os_ex(...).
The more or less reliable solution is to use "external sandboxing" solutions:
Run the interpreter in an unprivileged Docker container (a runnable sketch follows this list). For example:
write the untrusted script to a file that is exposed through a volume in the Docker container
execute that script in docker:
a. subprocess.call(['docker', 'exec', 'CONTAINER_ID', '/usr/bin/python', 'PATH_TO_SCRIPT'])
b. subprocess.call(['docker', 'exec', 'CONTAINER_ID', '/usr/bin/python', '-c', UNTRUSTED_SCRIPT_TEXT])
Use PyPy's sandbox.
Search for a "secure" IPython kernel for the Jupyter notebook server, or write your own. Note: existing kernels are not guaranteed to be secure and may allow calls to subprocess.check_output, os.remove, and others. So for the default kernel it's still better to run the Jupyter server in an isolated environment.
Run the interpreter in a chroot as an unprivileged user. Different implementations offer different levels of "safety".
Use Jython with finely tuned permissions.
Some exotic solutions like client-side JavaScript implementations of Python: Brython, pyjs.
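For the Docker route above, a hedged sketch of the host-side glue; the container name "sandbox" and the shared directory /srv/sandbox are assumptions:

    import subprocess
    import tempfile

    UNTRUSTED_SCRIPT = "print('hello from the sandbox')"

    # Assumes a long-running unprivileged container named "sandbox" that
    # mounts the host directory /srv/sandbox as /sandbox.
    with tempfile.NamedTemporaryFile(
            "w", suffix=".py", dir="/srv/sandbox", delete=False) as f:
        f.write(UNTRUSTED_SCRIPT)
        name = f.name.rsplit("/", 1)[-1]

    result = subprocess.run(
        ["docker", "exec", "sandbox", "/usr/bin/python3", "/sandbox/" + name],
        capture_output=True, text=True,
        timeout=10,  # kill runaway scripts from the host side
    )
    print(result.stdout or result.stderr)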
In any case, even if you manage to implement or reuse an existing "sandbox", you will still face many potential problems:
If multiprocessing or multithreading is allowed, then you might want to monitor how CPU resources are utilized, because some scripts might want to use EVERYTHING. Even with the GIL it's possible for multithreading to utilize all cores (all the user has to do is call functions that use C libraries in the threads).
You might want to monitor memory usage, because some scripts might leak or simply use a lot of memory
Other candidates for monitoring: Disk IO usage, Network usage, open file descriptors usage, execution time, etc...
Also, you should always watch for security updates to your "sandboxing solution", because even Docker is sometimes vulnerable in ways that make it possible to execute code on the host machine.
Recommended read: https://softwareengineering.stackexchange.com/questions/191623/best-practices-for-execution-of-untrusted-code

Forbid Python from writing anything to disk

Are there any command-line options or configurations that forbids Python from writing to disk?
I know I can hack open but it doesn't sound very safe.
I've hosted some Python tutorials I wrote myself on my website for friends who want to learn Python, and I want them to have access to a Python console so they can try as they learn. This is done by creating a Python subprocess from the http server.
However, I do not want them to accidentally or intentionally damage my server, so I need to forbid the Python process from writing anything to disk.
Also I'm running the server on Ubuntu Linux so doing it Python-wise or system-wise are both OK.
I doubt there's a way to do this in the interpreter itself: there are way too many things to patch (open, subprocess, os.system, file, and probably others). I'd suggest looking into a way of containerizing the python runtime via something like Docker. The containerization gives some guarantees restricting access, though not as much as virtualization. See here for more discussion about the security implications.
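Building on that, a hedged sketch of launching the console subprocess inside a throwaway container with a read-only root filesystem; the image name and limits are placeholders:

    import subprocess

    # Each console session gets a fresh container that cannot write to its
    # filesystem, has no network, and is destroyed on exit.
    proc = subprocess.Popen(
        ["docker", "run", "--rm", "-i",
         "--read-only",        # forbid writes to the container filesystem
         "--network", "none",  # forbid network access
         "--memory", "256m",   # cap memory use
         "python:3", "python3", "-i", "-q"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )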
Running a Jupyter/IPython notebook in the Docker container would probably be the easiest way to expose a web frontend. Jupyter provides a collection of Docker containers for this purpose: see https://github.com/jupyter/tmpnb and https://github.com/jupyter/docker-stacks

Loading Python libraries via http

I have several small Python libraries that I wrote with stuff that I find myself wanting over and over again. I think most programmers have something similar. I want to use these libraries from a variety of different machines so I've started keeping this stuff in my DropBox. However, I'd like to be able to use my code on machines on which I can't install DropBox or other cloud storage applications, even in portable form. I can just download the files every time one of them changes (DropBox can provide me a URL for each file in my Public folder), which is only a moderate nuisance. But--and I admit this is a longshot--is there a solution out there that will let me tell Python to load a library from my DropBox via http?
BTW, I'd like to add the whole remote folder to my sys.path, but getting a URL for a folder is complicated, so I'm going to try to walk before I run by starting with individual files.
Yes, it's possible. I think you want the combination of two previous questions:
How to download a file in python over HTTP
How to dynamically load a library in python
So your task basically breaks down into writing a little bit of glue code: download the URL via the first bullet, write it to a local file, and then import that file using the second bullet.
So that's how you'd do that.
BUT - please keep in mind that dynamically downloading and executing code has many potential security pitfalls. Will you be doing this over a secure connection? Who else has the ability to manipulate that URL? There are a bunch of security issues inherent in downloading and executing code on the fly. I would ask you to consider going about your solution in a different way, but I'm giving you the answer you're asking for.
As a simple security check, you can establish a known-good hash for your file, and then refuse to import any file other than one that's on the list of known-good hashes. This makes it a pain to update your modules, but gives you a little bit of extra safety.
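A minimal sketch of that glue code plus the hash check; the URL handling, file name, and digest are placeholders:

    import hashlib
    import importlib.util
    import urllib.request

    # Placeholder digest: compute it once locally with hashlib.sha256.
    KNOWN_GOOD = {"mylib.py": "0123abcd..."}

    def import_from_url(url, filename):
        data = urllib.request.urlopen(url).read()
        if hashlib.sha256(data).hexdigest() != KNOWN_GOOD.get(filename):
            raise RuntimeError("hash mismatch; refusing to import " + filename)
        with open(filename, "wb") as f:   # write it to a local file...
            f.write(data)
        spec = importlib.util.spec_from_file_location(filename[:-3], filename)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)   # ...and import it dynamically
        return module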
Don't use DropBox as revision control
Pick a real solution like Git
Setup access to the Git repository on one of your servers
Clone the repository to your worker machines and checkout master
Create a develop branch where you put every change you make
Test the changes and when you consider any of them stable, merge it to master
On your worker machines set up a cron job which periodically pulls from master branch of repository (and possibly restarts some Python processes as importing the same module again won't make Python interpreter aware of changes since imported modules are cached)
Enjoy your automatically updated workers :)
Don't feel shame - it happens that even experienced software developers come up with an XY problem

faking a filesystem / virtual filesystem

I have a web service to which users upload Python scripts that are run on a server. Those scripts process files that are on the server, and I want them to be able to see only a certain hierarchy of the server's filesystem (ideally a temporary folder into which I copy the files I want processed and the scripts).
The server will ultimately be a linux based one but if a solution is also possible on Windows it would be nice to know how.
What I thought of is creating a user with restricted access to folders of the FS - ultimately only the folder containing the scripts and files - and launching the Python interpreter as this user.
Can someone give me a better alternative? Relying only on this makes me feel insecure; I would like a real sandboxing or virtual-FS feature where I could safely run untrusted code.
Either a chroot jail or a higher-order security mechanism such as SELinux can be used to restrict access to specific resources.
You are probably best off using a virtual machine like VirtualBox or VMware (perhaps even creating one per user/session). That will allow you some control over other resources such as memory and network as well as disk.
The only python that I know of that has such features built in is the one on Google App Engine. That may be a workable alternative for you too.
This is inherently insecure software. By letting users upload scripts you are introducing a remote code execution vulnerability. You have more to worry about than just modifying files; what's stopping the Python script from accessing the network or other resources?
To solve this problem you need to use a sandbox. To better harden the system you can use a layered security approach.
The first layer, and the most important layer, is a Python sandbox. User-supplied scripts will be executed within the Python sandbox. This will give you the fine-grained limitations that you need. Then, the entire Python app should run within its own dedicated chroot. I highly recommend using the grsecurity kernel patches, which improve the strength of any chroot. For instance, a grsecurity chroot cannot be broken out of unless the attacker can rip a hole into kernel land, which is very difficult to do these days. Make sure your kernel is up to date.
The end result is that you are trying to limit the resources that an attacker's script has. Layers are a proven approach to security, as long as the layers are different enough that the same attack won't break both of them. You want to isolate the script from the rest of the system as much as possible. Any resources that are shared are also paths for an attacker.
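To illustrate the restricted-user idea from the question combined with resource limits, a hedged sketch (POSIX only; the account name and paths are hypothetical):

    import os
    import pwd
    import resource
    import subprocess

    def make_preexec(username="scriptrunner"):  # hypothetical unprivileged account
        def preexec():
            # Runs in the child just before exec: drop privileges, cap resources.
            pw = pwd.getpwnam(username)
            os.setgroups([])
            os.setgid(pw.pw_gid)
            os.setuid(pw.pw_uid)
            resource.setrlimit(resource.RLIMIT_CPU, (5, 5))             # 5 s of CPU
            resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20,) * 2)  # 256 MB
        return preexec

    result = subprocess.run(
        ["/usr/bin/python3", "/srv/jail/user_script.py"],  # placeholder paths
        preexec_fn=make_preexec(), cwd="/srv/jail",
        capture_output=True, text=True, timeout=30,
    )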
