Safely import module using sys.addaudithook

Safely import module using sys.addaudithook - python

I recently found out about the new sys.addaudithook in python. I run a service that involves running some untrusted scripts and I want to use the audit hooks to further secure the system. Scripts are running in a isolated container anyway but extra protection doesn't hurt.
Right now my code is this:
import sys
def audithook(event, args):
if event.startswith('os.') or event.startswith('sys.') or event.startswith('socket.') or event.startswith('winreg.') or event.startswith('webbrowser.') or event.startswith('shutil.') or event == "import":
raise RuntimeError("Attempt to break out of sandbox");
# Some code here that explicitly requires filesystem access
sys.addaudithook(audithook)
import untrusted_module
# More code that uses other
# Results need to eventually be saved to a file, that seems to work fine if I open the file before creating the hook then just write to it later. This is why I can't protect the entire program.
However, importing the file triggers quite a few hooks in the process of compiling the other file to bytecode and running it. I want to only prevent file system access inside the script itself and not the process of importing it.
I could try to set it to ignore some number N events before blocking new ones but the number of audit events seems to vary significantly depending on if the script changed or not, and how recent the stored bytecode is.
The entire audithook feature seems to be poorly documented and I can find very little about it online. Does anyone know how to use it to secure 1 module?

Related

Use static object in LibreOffce python script?

I've got a LibreOffice python script that uses serial IO. On my systems, opening a serial port is a very slow process (around 1 second), so I'd like to keep the serial port open, and just send stuff as required.
But LibreOffice python apparently reloads the python framework every time a call is made. Unlike most python implementations, where the process is persistent, and un-enclosed code in a module is run once, when the module is imported.
Is there a way in LibreOffice python to persist objects between calls?
SerialObject=None
def return_global():
return str(SerialObject) #always returns "None"
def init_serial_object():
SerialObject=True

It looks like there is a simple bug. Add global and then it works.
However, it may be that the real problem is your setup. Here is a working example. Put the code in a file called inctest.py under $LIBREOFFICE_USER_DIR/Scripts/python/pythonpath/incmod/.
def init_serial_object():
global SerialObject
SerialObject=True
def moduleVersion():
return "2.0" #change to verify that this is the most recently updated code
The code to call it should be located in the user profile directory (that is, location=user).
from incmod import inctest
inctest.init_serial_object()
msgbox("{}: {}".format(
inctest.moduleVersion(), inctest.return_global()))
Run the script by going to Tools > Macros > Run Macro and find it under My Macros.
Be aware that inctest.py will not always get reloaded. There are two ways to reload it: restart LibreOffice, or force python to reload it by doing del sys.modules[mod].
The moduleVersion() function is not necessary but helps you see when the module is getting reloaded — make changes to that line and then see whether the output changes.

Prevent any file system usage in Python's pytest

I have a program that, for data security reasons, should never persist anything to local storage if deployed in the cloud. Instead, any input / output needs to be written to the connected (encrypted) storage instead.
To allow deployment locally as well as to multiple clouds, I am using the very useful fsspec. However, other developers are working on the project as well, and I need a way to make sure that they aren't accidentally using local File I/O methods - which may pass unit tests, but fail when deployed to the cloud.
For this, my idea is to basically mock/replace any I/O methods in pytest with ones that don't work and make the test fail. However, this is probably not straightforward to implement. I am wondering whether anyone else has had this problem as well, and maybe best practices / a library exists for this already?
During my research, I found pyfakefs, which looks like it is very close what I am trying to do - except I don't want to simulate another file system, I want there to be no local file system at all.
Any input appreciated.

You can not use any pytest addons to make it secure. There will always be ways to overcome it. Even if you patch everything in the standard python library, the code always can use third-party C libraries which can't be patched from the Python side.
Even if you, by some way, restrict every way the python process can write the file, it will still be able to call the OS or other process to write something.
The only ways are to run only the trusted code or to use some sandbox to run the process.
In Unix-like operating systems, the workable solution may be to create a chroot and run the program inside it.
If you're ok with just preventing opening files using open function, you can patch this function in builtins module.
_original_open = builtins.open
class FileSystemUsageError(Exception):
pass
def patched_open(*args, **kwargs):
raise FileSystemUsageError()
#pytest.fixture
def disable_fs():
builtins.open = patched_open
yield
builtins.open = _original_open
I've done this example of code on the basis of the pytest plugin which is written by the company in which I work now to prevent using network in pytests. You can see a full example here: https://github.com/best-doctor/pytest_network/blob/4e98d816fb93bcbdac4593710ff9b2d38d16134d/pytest_network.py

How do I make Python code to execute only once?

I was thinking of how you execute code only once in Python. What I mean is setup code like when you set-up software; it only happens once and remembers you have already set up the software when you start the program again.
So in a sense I only want Python to execute a function once and not execute the function again even if the program is restarted.

you could create a file once set up is complete for example an empty .txt file and then check if it exists when program runs and if not runs setup
to check weather a file exists you can use os.pathlike so
import os.path
if not os.path.exists(file_path):
#run start up script
file = open (same_name_as_file_path, "w") #creates our file to say startup is complete you could write to this if you wanted as well
file.close

In addition to the method already proposed you may use pickle to save boolean variables representing whether some functions were executed (useful if you have multiple checks to carry out)
import pickle
f1_executed=True
f2_executed=False
pickle.dump([f1_executed,f2_executed],open("executed.pkl",mode='wb'))
##### Program Restarted #####
pickle.load(open("executed.pkl",mode='rb'))

If you need a kind of Setup-Script to install a program or to setup your operating system's environment, then I would go even further. Imagine that your setup became inconsistent in the mean-time and the program does not work properly anymore. Then it would be good to provide the user a script to repair that.
If you execute the script the second time, then you can:
either check, if the setup was correct and print an error message to the user, if the setup became inconsistent in the mean-time
or check the setup and repair it automatically, if inconsistent
Just reading a text file or something similar (f.e. storing a key in the registry of windows) may bring you into the situation that the setup became inconsistent, but your setup-script will say that everything is fine, because the text file (or the registry key) has been found.
Furthermore, if doing so, this facilitates also to "uninstall" your program. Since you know exactly what has been changed for installation, you can revert it by an uninstall script.

Python: Securing untrusted scripts/subprocess with chroot and chjail?

I'm writing a web server based on Python which should be able to execute "plugins" so that functionality can be easily extended.
For this I considered the approach to have a number of folders (one for each plugin) and a number of shell/python scripts in there named after predefined names for different events that can occur.
One example is to have an on_pdf_uploaded.py file which is executed when a PDF is uploaded to the server. To do this I would use Python's subprocess tools.
For convenience and security, this would allow me to use Unix environment variables to provide further information and set the working directory (cwd) of the process so that it can access the right files without having to find their location.
Since the plugin code is coming from an untrusted source, I want to make it as secure as possible. My idea was to execute the code in a subprocess, but put it into a chroot jail with a different user, so that it can't access any other resources on the server.
Unfortunately I couldn't find anything about this, and I wouldn't want to rely on the untrusted script to put itself into a jail.
Furthermore, I can't put the main/calling process into a chroot jail either, since plugin code might be executed in multiple processes at the same time while the server is answering other requests.
So here's the question: How can I execute subprocesses/scripts in a chroot jail with minimum privileges to protect the rest of the server from being damaged by faulty, untrusted code?
Thank you!

Perhaps something like this?
# main.py
subprocess.call(["python", "pluginhandler.py", "plugin", env])
Then,
# pluginhandler.py
os.chroot(chrootpath)
os.setgid(gid) # Important! Set GID first! See comments for details.
os.setuid(uid)
os.execle(programpath, arg1, arg2, ..., env)
# or another subprocess call
subprocess.call["python", "plugin", env])
EDIT: Wanted to use fork() but I didn't really understand what it did. Looked it up. New
code!
# main.py
import os,sys
somevar = someimportantdata
pid = os.fork()
if pid:
# this is the parent process... do whatever needs to be done as the parent
else:
# we are the child process... lets do that plugin thing!
os.setgid(gid) # Important! Set GID first! See comments for details.
os.setuid(uid)
os.chroot(chrootpath)
import untrustworthyplugin
untrustworthyplugin.run(somevar)
sys.exit(0)
This was useful and I pretty much just stole that code, so kudos to that guy for a decent example.

After creating your jail you would call os.chroot from your Python source to go into it. But even then, any shared libraries or module files already opened by the interpreter would still be open, and I have no idea what the consequences of closing those files via os.close would be; I've never tried it.
Even if this works, setting up chroot is a big deal so be sure the benefit is worth the price. In the worst case you would have to ensure that the entire Python runtime with all modules you intend to use, as well as all dependent programs and shared libraries and other files from /bin, /lib etc. are available within each jailed filesystem. And of course, doing this won't protect other types of resources, i.e. network destinations, database.
An alternative could be to read in the untrusted code as a string and then exec code in mynamespace where mynamespace is a dictionary defining only the symbols you want to expose to the untrusted code. This would be sort of a "jail" within the Python VM. You might have to parse the source first looking for things like import statements, unless replacing the built-in __import__ function would intercept that (I'm unsure).

Odd python search-path behavior, what's going wrong here?

We have an application based on Excel 2003 and Python 2.4 on Windows XP 32bit. The application consists of a large collection of Python functions which can be called from a number of excel worksheets.
We've notcied an anomolous behavior which is that sometimes in the middle of one of these calls the python interpreter will start hunting around for modules which almost certainly are already loaded and in memory.
We know this because we were able to hook-up Sysinternal's Process Monitor to the process and observe that from time to time the process (when called) starts hunting around a bunch of directories and eggs for certain .py files.
The obvious thing to try is to see if the python search-path had become modified, however we found this not to be the case. It's exactly what we'd expect. The odd thing is that:
The occasions on which this searching behavior was triggered appears to be random, i.e. it did not happen every time or with any noticable pattern.
The behavior did not affect the result of the function. It returned the same value irrespective of whether this file searching behavior was triggered.
The folders that were being scanned were non-existant (e.g. J:/python-eggs ) on a machine where J-drive contained no-such folder. Naturally procmon reports that this generated a file-not found error.
It's all very mysterious so I dont expect anybody to be able to provide a definitive answer as to what might be going wrong. I would appreciate any suggestions about how this problem might be debugged.
Thanks!
Answers to comments
All the things that are being searched for are actual, known python files which exist in the main project .egg file. The odd thing is that at the time they are being searched-for those particuar modules have already been imported. They must be in memory in order for the process to work.
Yes, this affects performance because sometimes this searching behavior tries to hit network drives. Also by searching eggs which couldnt possibly contain certain modules it the process gets interrupted by the corporate mandated virus-scanner. That slows down what would normally be a harmless and instant interruption.
This is stock python 2.4.4. No modifications.

Python programs can import modules at any time, not just during program load. Try searching the modules you are using for import.
If this doesn't work, you can write an import hook to catch and report all attempted imports before they occur. For example, if you run this before everything else, you will get a dump of every attempted import and its source:
import sys, traceback
class ImportDebugger:
def find_module(self, fullname, path=None):
print "Attempting to import %s:" % fullname
traceback.print_stack()
sys.meta_path.insert(0, ImportDebugger())

"Python functions which can be called from a number of excel worksheets"
And you're not blaming Excel for randomly running Python modules? Why not? How have you proven that Excel is behaving properly?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.