Forbid Python from writing anything to disk

Forbid Python from writing anything to disk - python

Are there any command-line options or configurations that forbids Python from writing to disk?
I know I can hack open but it doesn't sound very safe.
I've hosted some Python tutorials I wrote myself on my website for friends who want to learn Python, and I want them to have access to a Python console so they can try as they learn. This is done by creating a Python subprocess from the http server.
However, I do not want them to accidentally or intentionally damage my server, so I need to forbid the Python process from writing anything to disk.
Also I'm running the server on Ubuntu Linux so doing it Python-wise or system-wise are both OK.

I doubt there's a way to do this in the interpreter itself: there are way too many things to patch (open, subprocess, os.system, file, and probably others). I'd suggest looking into a way of containerizing the python runtime via something like Docker. The containerization gives some guarantees restricting access, though not as much as virtualization. See here for more discussion about the security implications.
Running a jupyter/ipython notebook in the docker container would probably be the easiest way to expose a web-frontend. jupyter provides a collection of docker containers for this purpose: see https://github.com/jupyter/tmpnb and https://github.com/jupyter/docker-stacks

Related

How to limit python script so that it can't access local resources?

I am working on a project that allows users to upload a python script to an API and run it on a schedule. Currently, I'm trying to figure out a way to limit the functionality of the script so that it cannot access local files, mess with the flask server running the API, etc. Do you have any ideas on how I can achieve this? Is there anyway to make it so only specific libraries are available for importing?

Running other scripts on your server is serious security issue. If you are trying to deploy Python interpreter on your web application, you can try with something like judge0 - GitHub. It is free if you deploy it yourself and it will run scripts safely inside containers.

The simplest way is to ensure the user running the script is not root, but a user specifically designed for this task (e.g. part of a group that can only read and not write or execute). This means at minimum you should ensure all files have the appropriate mode. Then you can just use a pipe or something to run the script.
Alternatively, you could use a runtime that’s not “local”, like a VM or compute service (AWS lambda, etc). The latter would be simplest, and there’s lots of vendors who offer compute service with programmatic api.

Python web based interpreter security issues

I am making a web based python interpreter which will take code executes it on Linux based python3 interpreter and give output on the same web page. But this has some serious loop holes like someone can execute bash script using python's os module, can check directory for source code of the web application and a lot more.
Can anyone suggest me how to prevent this kind of mishaps in my application
Regards

Short answer: there is no easy "python-only" solution for this.
Some details:
user can always try to call os, sys, with open(SENSITIVE_PATH, 'rw') as f: ..., etc, and it's hard to detect all those cases simply by analyzing the code
If you allow ANY third-party, then things become even more complicated, for example some third-party package may locally create an alias to os.execv (os_ex = os.execv), and after this it will be possible to write a script like from thirdparty.some_internals import os_ex; os_ex(...).
The more or less reliable solution is to use "external sandboxing" solutions:
Run interpreter in the unprivileged docker container. For example:
write untrusted script to some file that will be exposed through volume in the docker container
execute that script in docker:
a. subprocess.call(['docker', 'exec', 'CONTAINER_ID', '/usr/bin/python', 'PATH_TO_SCRIPT'])
b. subprocess.call(['docker', 'exec', 'CONTAINER_ID', '/usr/bin/python', '-c', UNTRUSTED_SCRIPT_TEXT])
Use PyPy-s sandbox.
Search for some "secure" IPython kernel for Jupyter notebook server. Or write your own. Note: existing kernels are not guaranteed to be secure and may allow to call subprocess.check_output, os.rm and others. So for "default kernel" it's still better to run Jupyter server in the isolated environment.
Run interpreter in chroot using unprivileged user. Different implementations have different level of "safety".
Use Jython with finely tuned permissions.
Some exotic solutions like "client-side JS python implementation": brython, pyjs
In any case, even if you manage to implement or reuse existing "sandbox" you still will get many potential problems:
If multiprocessing or multithreading is allowed then you might want to monitor how CPU resources are utilized, because
some scripts might want to use EVERYTHING. Even with GIL it's possible for multi-threading to utilize all kernels (all the user has to do is to call functions that use c-libraries in the threads)
You might want to monitor memory usage, because some scripts might leak or simply use a lot of memory
Other candidates for monitoring: Disk IO usage, Network usage, open file descriptors usage, execution time, etc...
Also you should always check for security updates of your "sandboxing solution", because even docker sometimes is vulnerable and makes it possible to execute code on host machine
Recommended read: https://softwareengineering.stackexchange.com/questions/191623/best-practices-for-execution-of-untrusted-code

Python Script Sandboxing using Docker

If I build a container using a base image like Python 3 Alpine, and I'll follow the Hardening indicated into the docker documentation, is it secure to inject and execute a Python script?
I mean, if a user will write something dangerous (like sudo rm -R using a Python function), only the container will be affected of those problems, right?
Is this a good practice? I need to execute some small code snippets with limited access to the system, modules, etc...

I would not treat Docker as a security “silver bullet” here; you want to have at least some notion that the code you’re running is “trustworthy” before unleashing it on your system, even under Docker.
Remember that you need to have root privileges to run docker anything at all, or else you can trivially gain them (docker run -v /:/host -u root ... will let you freely edit the host filesystem). If your application really is dealing in untrusted code, consider whether you want a privileged process to be able to deal with it.
Beyond that, Docker containers share the host’s kernel and various physical resources. If there’s a kernel privilege escalation bug, something running in a container could exploit it. If your untrusted code makes outbound TCP calls to shuffle data around that you wouldn’t want on your network, that’s not limited by default. If it’s “merely” using your CPU cycles to mine Bitcoin, you can’t control that.
If all of this sounds like an acceptable level of risk to you, then running somewhat-trusted code under Docker is certainly better than not: you do get some protection against changing files on the host and host-level settings like network configuration, especially if you believe the code you’re running isn’t actively malicious.

Deploying a Python Script to Windows and Linux

I have a python server that I need to run in both a Linux and Windows environment, and my question is about deployment. What is the best method for deploying the solution instead of just double clicking on the file and running it?
Since I use the server_forever() on the server, I can just run the script from command line, but this keeps the python window open. If I log off the machine, naturally the process will stop. So what is the best method for deploying a python script that needs to keep running if the user is logged in or off a machine.
Since I am going to be using multiple environment, Linux and Windows, can you please be specific in what OS you are talking about?
For windows, I was thinking of running the script 'At Startup' using the Windows scheduler. But I wanted to see if anyone had a better option. For linux, I really don't know what to create. I am assuming a CRON job?
Deployment does refer to coding, so using serve_forever() on a multiprocessing job manager keeps the python window open upon execution. Is there a way to hide this window through code? Would you recommend using a conversion tool like py2exe instead?

This is the subject matter of a whole library of books, so I will just give an introduction here :-)
You can basically start scripts directly and then have multiple options to do this in a way that they keep running in the background.
If you have certain functionality that needs to run on regular moments, you would do this by scheduling it:
Windows: Windows Scheduler or specific scheduling tools
Linux: Cron
If your problem is that you want to start a script without it closing on you while SSH'ing into Linux, you want to look into the "screen" or "tmux" tools.
If you want to have it started automatically this could be done by using the "At Startup" as you point out and Linux has similar functionalities, but the preferred and more robust way would be to set up a service that is better integrated with the OS.
Windows: Windows Service
Linux: Daemon
Even more capabilities can be yielded by using an application server such a Django
Tomcat (see comment) is an option, but definitely not the standard one; you'll have a hard time finding support both from Tomcat people running Python or Python people running their stuff on Tomcat. That being said, I imagine you could enable CGI and have it run a Python command with your script.
Yet, instead of just starting a Python script I would strongly encourage you to have a look at different Python options that are probably available for your specific use case. From lightweight web solutions like Flask over a versatile networking engine like Twisted to a full blown web framework like Django.
They all have rather well-thought-out deployment solutions available. Look up WSGI for more background.

Python web hosting: Why are server restarts necessary?

We currently run a small shared hosting service for a couple of hundred small PHP sites on our servers. We'd like to offer Python support too, but from our initial research at least, a server restart seems to be required after each source code change.
Is this really the case? If so, we're just not going to be able to offer Python hosting support. Giving our clients the ability to upload files is easy, but we can't have them restart the (shared) server process!
PHP is easy -- you upload a new version of a file, the new version is run.
I've a lot of respect for the Python language and community, so find it hard to believe that it really requires such a crazy process to update a site's code. Please tell me I'm wrong! :-)

Python is a compiled language; the compiled byte code is cached by the Python process for later use, to improve performance. PHP, by default, is interpreted. It's a tradeoff between usability and speed.
If you're using a standard WSGI module, such as Apache's mod_wsgi, then you don't have to restart the server -- just touch the .wsgi file and the code will be reloaded. If you're using some weird server which doesn't support WSGI, you're sort of on your own usability-wise.

Depends on how you deploy the Python application. If it is as a pure Python CGI script, no restarts are necessary (not advised at all though, because it will be super slow). If you are using modwsgi in Apache, there are valid ways of reloading the source. modpython apparently has some support and accompanying issues for module reloading.
There are ways other than Apache to host Python application, including the CherryPy server, Paste Server, Zope, Twisted, and Tornado.
However, unless you have a specific reason not to use it (an since you are coming from presumably an Apache/PHP shop), I would highly recommed mod_wsgi on Apache. I know that Django recommends modwsgi on Apache and most of the other major Python frameworks will work on modwsgi.

Is this really the case?
It Depends. Code reloading is highly specific to the hosting solution. Most servers provide some way to automatically reload the WSGI script itself, but there's no standardisation; indeed, the question of how a WSGI Application object is connected to a web server at all differs widely across varying hosting environments. (You can just about make a single script file that works as deployment glue for CGI, mod_wsgi, passenger and ISAPI_WSGI, but it's not wholly trivial.)
What Python really struggles with, though, is module reloading. Which is problematic for WSGI applications because any non-trivial webapp will be encapsulating its functionality into modules and packages rather than simple standalone scripts. It turns out reloading modules is quite tricky, because if you reload() them one by one they can easily end up with bad references to old versions. Ideally the way forward would be to reload the whole Python interpreter when any file is updated, but in practice it seems some C extensions seem not to like this so it isn't generally done.
There are workarounds to reload a group of modules at once which can reliably update an application when one of its modules is touched. I use a deployment module that does this (which I haven't got around to publishing, but can chuck you a copy if you're interested) and it works great for my own webapps. But you do need a little discipline to make sure you don't accidentally start leaving references to your old modules' objects in other modules you aren't reloading; if you're talking loads of sites written by third parties whose code may be leaky, this might not be ideal.
In that case you might want to look at something like running mod_wsgi in daemon mode with an application group for each party and process-level reloading, and touch the WSGI script file when you've updated any of the modules.
You're right to complain; this (and many other WSGI deployment issues) could do with some standardisation help.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.