Is it possible to create an environment to safely run arbitrary Python scripts under Linux? Those scripts are supposed to be received from untrusted people and may be too large to check them manually.
A very brute-force solution is to create a virtual machine and restore its initial state after every launch of an untrusted script. (Too expensive.)
I wonder if it's possible to restrict Python from accessing the file system and interacting with other programs and so on.
Consider using a chroot jail. Not only is this very secure, well-supported and tested but it also applies to external applications you run from python.
There are 4 things you may try:
As you already mentioned, using a virtual machine or some other form of virtualisation (perhaps solaris zones are lightweight enough?). If the script breaks the OS there then you don't care.
Using chroot, which puts a shell session into a virtual root directory, separate from the main OS root directory.
Using systrace. Think of this as a firewall for system calls.
Using a "jail", which builds upon systrace, giving each jail it's own process table etc.
Systrace has been compromised recently, so be aware of that.
You could run jython and use the sandboxing mechanism from the JVM. The sandboxing in the JVM is very strong very well understood and more or less well documented. It will take some time to define exactly what you want to allow and what you dnt want to allow, but you should be able to get a very strong security from that ...
On the other side, jython is not 100% compatible with cPython ...
Try searching for "sandboxing python", e.g.:
http://wiki.python.org/moin/SandboxedPython
http://wiki.python.org/moin/How%20can%20I%20run%20an%20untrusted%20Python%20script%20safely%20(i.e.%20Sandbox)
could you not just run as a user which has no access to anything but the scripts in that directory?
Related
I need to integrate a body of Python code into an existing OSGi (Apache Felix) deployment.
I assume, or at least hope, that packages exist to help with this effort.
If it helps, the Python code is still relatively new and small, so can probably re re-architected to meet whatever constraints are needed. However, it must remain in Python, because of dependencies on third-party libraries.
What are suggested best practices?
The trick is to make this an extender, see 1 and 2. You want your Python code to be separate from the code that handles the interaction with the interpreter. So what you do is wrap the Python code and any native libraries in a bundle. This is trivial since it is just a zip file.
You then develop a bundle that listens to starting bundle (see the BundleTracker) that have python code. A manifest is often used but you can also look in a directory in the JAR. If you detect this code, you extract any native libraries and run the code in the interpeter of your choice.
If can use JYthon then that would be highly recommended. You can then carry the interpreter as an OSGi bundle that runs on the VM. If you need to use a native compiler your life is less rosy. You can rely on the environment to provide you with an interpreter but then why use OSGi in the first place. You basically lose the write once run anywhere advantage. You could go the full monty by creating bundles that contain Python installers for all platforms you support. Can be done, not even that hard, but a maintenance nightmare. Believe me, native code suck, it only does it a bit faster than Java.
For to offer interactive examples about data analysis, I'd like to embed an interactive python shell. It does not necessarily have to be a real Python shell. Users shall be given tasks that they can execute in the shell. This is similar to existing tutorials, as seen on, e.g., http://www.codecademy.org, but I'd like to work with libraries that those solutions do not offer, as far as I understood.
In order to get a real shell on the website, I think of two approaches:
I found projects like http://www.repl.it, but it seems rather difficult to include the necessary libraries like SciPy, NumPy, and Pandas. In addition, user input has to be validated and I'm not sure whether that works with those shells I found.
I could pipe the commands through a web applications to a Python installation on my server, but I'm scared of using eval() on foreign, arbitrary code. Is there a safe mode for Python? I found http://www.pypy.org. Although they offer a Python sandbox, unfortunately, they do not support the libraries I need.
Alternatively, I thought of just embedding a "fake shell", which I build to copy the behaviour of the functions that I want to explain. Of course, this would result in more work, as I would have to write a fake interface, but for now it seems to be the only possibility.
I hope that this question is not too generic; I'm looking for either a good HTML/JS library that helps me put a fake shell on my website or a library/service/software that can embed a real Python shell with the required modules installed.
There is no way to run untrusted Python safely; Python's dynamic nature allows for too many ways to break through any protective layers you could care to think of.
Instead, run each session on a new virtual machine, properly locked down (firewalled, unprivileged user), which you shut down after a hard time limit. New sessions get a new, clean virtual machine.
This isolates you from any malicious code that might run and try to break out of a sandbox; a good virtual machine is hardware-isolated by the processor from the host OS, something a Python-only layer could never achieve.
This process is sometimes called sandboxing.
You can find some good information on the python wiki
There are basically three options available:
machine-level mechanisms (such as a VM, as Martijn Pieters suggested)
OS-level mechanisms (such as a chroot or SELinux)
custom interpreters, such as pypy (which has sandboxing capabilities, as you mentioned), or Jython, where you may be able to use the Java security manager or applet mechanisms.
You may also want to check Restricted Python, which is especially useful for very restricted environments, but security will depend on its configuration.
Ultimately, your choice of solution will depend on what you want to restrict:
Filesystem access? Block everything, or allow certain directories?
Network access, such as sockets?
Arbitrary system calls?
I am working on Windows (sadface) with Python and virtualenv.
I would like to have setup and teardown scripts that go along with the virtualenv activation/deactivation. But I am not sure if these hooks have already been designated, and if so, where?
I guess I could hack the activate.bat, but then what if I use activate.py instead (does activate.py call activate.bat, or must I hack both files)? I can almost get away with environment variable PYTHONSTARTUP, but this needs to be redefined in each virtualenv. So unless virtualenv allows arbitrary assignment of env-vars, I am back to an activation/deactivation hook to set PYTHONSTARTUP (which really defeats the purpose, but now you see my catch-22).
EDIT: I plan to use my virtualenv to host interactive development sessions. I will be calling 'venv/bin/activate.bat' manually from the terminal. I do not want loose Batch/Powershell scripts laying around that I have to remember to call once when I activate, and once again when I deactive. I want to hook the execution in such a way, so that after I add my custom scripting hooks, 6 months later I don't have to remember how it works. I just execute activate.bat, and I am off to the races.
Many problems lessened or solved with virtualenvwrapper-win. Well written framework, with simple entry points.I spend a lot of time fighting with windows, trying to get a functional python work environment. This is one of the those programs I really wish I knew about a long time ago.
Does not handle multiple python installations extraordinarily (or switching between them), but the project owner also developed another supporting product, pywin, meant to augment that particular shortcoming.
The whole point is, that it makes Windows command-line development quite a bit smoother, even if its not all the automation I dream about.
I need to send code to remote clients to be executed in them but security is a concern for me right now. I don't want unsafe code to be executed there so I would like to control what a program is doing. I mean for example, know if is making connections, where is connecting to, if is reading local files, etc. Is this possible with Python?
EDIT: I'm thinking in something similar to Android permission system. I want to know what a code will do and if it does something different, stop it.
You could use a different Python runtime:
if you run your script using Jython; you can exploit Java's permission system
with Pypy's sandboxed version you can choose what is allowed to run in your controller script
There used to be a module in Python called bastian, but that was deprecated as it wasn't that secure. There's also I believe something called RPython, but I don't know too much about that.
I would in this case use Pyro and write the code on the target server. That way you know clients can only execute written and tested code.
edit - it's probably worth noting that Pyro also supports http://en.wikipedia.org/wiki/Privilege_separation - although I've not had to use it for that.
I think you are looking for a sandboxed python. There used to be an effort to implement this, but it has been abolished a couple of years ago.
Sandboxed python in the python wiki offers a nice overview of possible options for your usecase.
The most rigourous (but probably the slowest) way is to run Python on a bare OS in an emulator.
Depending on the OS you use, there are several ways of running programs with restrictions, but without the overhead of an emulator:
FreeBSD has a nice integrated solution in the form of jails.
These grew out of the chroot system call.
Linux-VServer aims to do more or less the same on Linux.
I want to make some python scripts to create an "Appliance" with VirtualBox. However, I can't find any documentation anywhere on making calls to VBoxService.exe. Well, I've found stuff that works from OUTSIDE the Machine, but nothing from working from inside the machine.
Does anyone know anything about this? If there's a library for another language like C I'd be okay with it, though Python would be heavily preferred.
Consider using libvirt. The VirtualBox support is bleeding-edge (not in any release, may not even be in source control yet, but is available as a set of patches on the mailing list) -- but this single API, available for C, Python and several other languages, lets you control virtual machines and images running in Qemu/KVM, Xen, LXC (Linux Containers), UML (User-Mode Linux), OpenVZ and others.
I build and administer virtual appliances (in an automated QA context) using libvirt with the qemu/KVM backend, and it meets my needs very well.
libvirt can be configured to allow remote access (such as controlling or querying VBoxService or libvirtd from within one of the VMs, which you appear to want to do -- though I question the wisdom and utility), with numerous authentication and transport options available.
[Caveat: libvirt principally targets Unixlike operating systems; it can be built for win32, but YMMV]