Python: Securing untrusted scripts/subprocess with chroot and chjail?

Python: Securing untrusted scripts/subprocess with chroot and chjail? - python

I'm writing a web server based on Python which should be able to execute "plugins" so that functionality can be easily extended.
For this I considered the approach to have a number of folders (one for each plugin) and a number of shell/python scripts in there named after predefined names for different events that can occur.
One example is to have an on_pdf_uploaded.py file which is executed when a PDF is uploaded to the server. To do this I would use Python's subprocess tools.
For convenience and security, this would allow me to use Unix environment variables to provide further information and set the working directory (cwd) of the process so that it can access the right files without having to find their location.
Since the plugin code is coming from an untrusted source, I want to make it as secure as possible. My idea was to execute the code in a subprocess, but put it into a chroot jail with a different user, so that it can't access any other resources on the server.
Unfortunately I couldn't find anything about this, and I wouldn't want to rely on the untrusted script to put itself into a jail.
Furthermore, I can't put the main/calling process into a chroot jail either, since plugin code might be executed in multiple processes at the same time while the server is answering other requests.
So here's the question: How can I execute subprocesses/scripts in a chroot jail with minimum privileges to protect the rest of the server from being damaged by faulty, untrusted code?
Thank you!

Perhaps something like this?
# main.py
subprocess.call(["python", "pluginhandler.py", "plugin", env])
Then,
# pluginhandler.py
os.chroot(chrootpath)
os.setgid(gid) # Important! Set GID first! See comments for details.
os.setuid(uid)
os.execle(programpath, arg1, arg2, ..., env)
# or another subprocess call
subprocess.call["python", "plugin", env])
EDIT: Wanted to use fork() but I didn't really understand what it did. Looked it up. New
code!
# main.py
import os,sys
somevar = someimportantdata
pid = os.fork()
if pid:
# this is the parent process... do whatever needs to be done as the parent
else:
# we are the child process... lets do that plugin thing!
os.setgid(gid) # Important! Set GID first! See comments for details.
os.setuid(uid)
os.chroot(chrootpath)
import untrustworthyplugin
untrustworthyplugin.run(somevar)
sys.exit(0)
This was useful and I pretty much just stole that code, so kudos to that guy for a decent example.

After creating your jail you would call os.chroot from your Python source to go into it. But even then, any shared libraries or module files already opened by the interpreter would still be open, and I have no idea what the consequences of closing those files via os.close would be; I've never tried it.
Even if this works, setting up chroot is a big deal so be sure the benefit is worth the price. In the worst case you would have to ensure that the entire Python runtime with all modules you intend to use, as well as all dependent programs and shared libraries and other files from /bin, /lib etc. are available within each jailed filesystem. And of course, doing this won't protect other types of resources, i.e. network destinations, database.
An alternative could be to read in the untrusted code as a string and then exec code in mynamespace where mynamespace is a dictionary defining only the symbols you want to expose to the untrusted code. This would be sort of a "jail" within the Python VM. You might have to parse the source first looking for things like import statements, unless replacing the built-in __import__ function would intercept that (I'm unsure).

Related

Prevent any file system usage in Python's pytest

I have a program that, for data security reasons, should never persist anything to local storage if deployed in the cloud. Instead, any input / output needs to be written to the connected (encrypted) storage instead.
To allow deployment locally as well as to multiple clouds, I am using the very useful fsspec. However, other developers are working on the project as well, and I need a way to make sure that they aren't accidentally using local File I/O methods - which may pass unit tests, but fail when deployed to the cloud.
For this, my idea is to basically mock/replace any I/O methods in pytest with ones that don't work and make the test fail. However, this is probably not straightforward to implement. I am wondering whether anyone else has had this problem as well, and maybe best practices / a library exists for this already?
During my research, I found pyfakefs, which looks like it is very close what I am trying to do - except I don't want to simulate another file system, I want there to be no local file system at all.
Any input appreciated.

You can not use any pytest addons to make it secure. There will always be ways to overcome it. Even if you patch everything in the standard python library, the code always can use third-party C libraries which can't be patched from the Python side.
Even if you, by some way, restrict every way the python process can write the file, it will still be able to call the OS or other process to write something.
The only ways are to run only the trusted code or to use some sandbox to run the process.
In Unix-like operating systems, the workable solution may be to create a chroot and run the program inside it.
If you're ok with just preventing opening files using open function, you can patch this function in builtins module.
_original_open = builtins.open
class FileSystemUsageError(Exception):
pass
def patched_open(*args, **kwargs):
raise FileSystemUsageError()
#pytest.fixture
def disable_fs():
builtins.open = patched_open
yield
builtins.open = _original_open
I've done this example of code on the basis of the pytest plugin which is written by the company in which I work now to prevent using network in pytests. You can see a full example here: https://github.com/best-doctor/pytest_network/blob/4e98d816fb93bcbdac4593710ff9b2d38d16134d/pytest_network.py

Safely import module using sys.addaudithook

I recently found out about the new sys.addaudithook in python. I run a service that involves running some untrusted scripts and I want to use the audit hooks to further secure the system. Scripts are running in a isolated container anyway but extra protection doesn't hurt.
Right now my code is this:
import sys
def audithook(event, args):
if event.startswith('os.') or event.startswith('sys.') or event.startswith('socket.') or event.startswith('winreg.') or event.startswith('webbrowser.') or event.startswith('shutil.') or event == "import":
raise RuntimeError("Attempt to break out of sandbox");
# Some code here that explicitly requires filesystem access
sys.addaudithook(audithook)
import untrusted_module
# More code that uses other
# Results need to eventually be saved to a file, that seems to work fine if I open the file before creating the hook then just write to it later. This is why I can't protect the entire program.
However, importing the file triggers quite a few hooks in the process of compiling the other file to bytecode and running it. I want to only prevent file system access inside the script itself and not the process of importing it.
I could try to set it to ignore some number N events before blocking new ones but the number of audit events seems to vary significantly depending on if the script changed or not, and how recent the stored bytecode is.
The entire audithook feature seems to be poorly documented and I can find very little about it online. Does anyone know how to use it to secure 1 module?

How to check if a folder is open in Windows/Macintosh and possibly close it?

I am working in python and sometimes with os.rename() I run into the problem that if the file or folder to rename is open in the windows system I get a PermissionError: [WinError 5] error.
So if I close the folder and rerun the script, everything works.
In any case I am working on Windows, but I think it is good practice that I take into account that it could also be run on Machintosh.
I don't know what the best practice is for this, but please have a little patience, I'm still learning python and really don't know how to ask the question any better than that.

In general, no. This is a violation of system security: another entity (user, session, process, etc.) is using the resource (file or folder), in a way that requires exclusive rights (typically update / modify rights). If you steal the resource, how is that other entity supposed to react or adapt to the change?
This is why an OS has these privileges and locks: to manage system resources. Since you already have user control over the resource, you are supposed to use that authority to release the file lock -- not cracking the security from outside.
However, as the controlling user, you do likely have rights to view your own sessions and jobs, to inquire which one owns the resource, and then terminate the job or otherwise force it to release the resource.
In that fashion, it is possible to steal the resource from the other process. If you want to do that, you need to educate yourself on your OS's capabilities and rights. The most useful ones will be available through Python's os package. Enjoy the learning.

To echo Prune's answer, figuring out where it's in use and closing it sounds difficult, and probably not a great idea anyway. Imagine if it's not available because some other program is currently saving to one of the files — you could end up with corrupted data.
That said, you could make your Python script smart enough to notice there's a permission error, then pause and let you know so that you could try and close things on your own before telling it to try again and continue.
import os
def try_to_rename(src, dst):
while True:
try:
os.rename(src, dst)
break
except PermissionError:
input(f"Unable to rename {src} to {dst}. If one or both "
"files/folders are open, please close them. Press Enter to "
"continue.")
P.S. It makes little difference for this simple example, but for working extensively with paths and files, I'd recommend pathlib over os. It can really make things a lot more convenient and readable.

when using Watchman's watch-make I want to access the name of the changed files

I am writing a watchman command with watchman-make and I'm at a loss when trying to access exactly what was changed in the directory. I want to run my upload.py script and inside the script I would like to access filenames of newly created files in /var/spool/cups-pdf/ANONYMOUS .
so far I have
$ watchman-make -p '/var/spool/cups-pdf/ANONYMOUS' -—run 'python /home/pi/upload.py'
I'd like to add another argument to python upload.py so I can have an exact filepath to the newly created file so that I can send the new file over to my database in upload.py,
I've been looking at the docs of watchman and the closest thing I can think to use is a trigger object. Please help!

Solution with watchman-wait:
Assuming project layout like this:
/posts/_SUBDIR_WITH_POST_NAME_/index.md
/Scripts/convert.sh
And the shell script like this:
#!/bin/bash
# File: convert.sh
SrcDirPath=$(cd "$(dirname "$0")/../"; pwd)
cd "$SrcDirPath"
echo "Converting: $SrcDirPath/$1"
Then we can launch watchman-wait like this:
watchman-wait . --max-events 0 -p 'posts/**/*.md' | while read line; do ./Scripts/convert.sh $line; done
When we changing file /posts/_SUBDIR_WITH_POST_NAME_/index.md the output will be like this:
...
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
...

watchman-make is intended to be used together with tools that will perform a follow-up query of their own to discover what they want to do as a next step. For example, running the make tool will cause make to stat the various deps to bring things up to date.
That means that your upload.py script needs to know how to do this for itself if you want to use it with watchman.
You have a couple of options, depending on how sophisticated you want things to be:
Use pywatchman to issue an ad-hoc query
If you want to be able to run upload.py whenever you want and have it figure out the right thing (just like make would do) then you can have it ask watchman directly. You can have upload.py use pywatchman (the python watchman client) to do this. pywatchman will get installed if the the watchman configure script thinks you have a working python installation. You can also pip install pywatchman. Once you have it available and in your PYTHONPATH:
import pywatchman
client = pywatchman.client()
client.query('watch-project', os.getcwd())
result = client.query('query', os.getcwd(), {
"since": "n:pi_upload",
"fields": ["name"]})
print(result["files"])
This snippet uses the since generator with a named cursor to discover the list of files that changed since the last query was issued using that same named cursor. Watchman will remember the associated clock value for you, so you don't need to complicate your script with state tracking. We're using the name pi_upload for the cursor; the name needs to be unique among the watchman clients that might use named cursors, so naming it after your tool is a good idea to avoid potential conflict.
This is probably the most direct way to extract the information you need without requiring that you make more invasive changes to your upload script.
Use pywatchman to initiate a long running subscription
This approach will transform your upload.py script so that it knows how to directly subscribe to watchman, so instead of using watchman-make you'd just directly run upload.py and it would keep running and performing the uploads. This is a bit more invasive and is a bit too much code to try and paste in here. If you're interested in this approach then I'd suggest that you take the code behind watchman-wait as a starting point. You can find it here:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait
The key piece of this that you might want to modify is this line:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait#L169
which is where it receives the list of files.
Why not triggers?
You could use triggers for this, but we're steering folks away from triggers because they are hard to manage. A trigger will run in the background and have its output go to the watchman log file. It can be difficult to tell if it is running, or to stop it running.
The interface is closer to the unix model and allows you to feed a list of files on stdin.
Speaking of unix, what about watchman-wait?
We also have a command that emits the list of changed files as they change. You could potentially stream the output from watchman-wait in your upload.py. This would make it have some similarities with the subscription approach but do so without directly using the pywatchman client.

How to customize Flyway so that it can handle CSV files as input as well?

Has someone implemented the CSV-handling for Flyway? It was requested some time ago (Flyway specific migration with csv files). Flyway comments it now as a possibility for the MigrationResolver and MigrationExecutor, but it does not seem to be implemented.
I've tried to do it myself with Flyway 4.2, but I'm not very good with java. I got as far as creating my own jar using the sample and make it accessible to flyway. But how does flyway distinguish when to use the SqlMigrator and when to use my CsvMigrator? I thought I have to register my own prefix/suffix (as the question above writes), but FlywayConfiguration seems to be read-only, at least I did not see any API calls for doing this :(.
How to connect the different Resolvers to the different migration file types? (.sql to the migration using Sql and .csv/.py to the loading of Csv and executing python scripts)

After some shed of tears and blood, it looks like came up with something on this. I can't make the whole code available because it is using proprietary file format, but here's the main ideas:
implement the ConfigurationAware as well, and use the setFlywayConfiguration implementation to catalog the extra files you want to handle (i.e. .csv). This is executed only once during the run.
during this cataloging I could not use the scanner or LoadableResources, there's some Java magic I do not understand. All the classes and methods seem to be available and accessible, even when using .getMethods() runtime... but when trying to actually call them during a run it throws java.lang.NoSuchMethodError and java.lang.NoClassDefFoundError. I've wasted a whole day on this - don't do that, just copy-paste the code from org.flywaydb.core.internal.util.scanner.filesystem.FileSystemScanner.
use Set< String > instead of LoadableResources[], way easier to work with, especially since there's no access to LoadableResources anyway and working with [] was a nightmare.
the python/shell call will go to the execute(). Some tips:
any exception or fawlty exitcode needs to be translated to an SQLException.
the build is enforcing Java 1.6, so new ProcessBuilder(cmd).inheritIO() cannot be used. Look at these solutions: ProcessBuilder: Forwarding stdout and stderr of started processes without blocking the main thread if you want to print the STDOUT/STDERR.
to compile flyway including your custom module, clone the whole flyway repo from git, edit the main pom.xml to include your module as well and use this command to compile: "mvn install -P-CommercialDBTest -P-CommandlinePlatformAssemblies -DskipTests=true" (I found this in another stackoverflow question.)
what I haven't done yet is the checksum part, I don't know yet what that wants.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.