I want to disallow access to the file system from client code, so I thought I could override the open function:
from StringIO import StringIO

env = {
    'open': lambda *a: StringIO("you can't use open")
}
exec(open('user_code.py').read(), env)
but I got this error:
unqualified exec is not allowed in function 'my function' it contains a
nested function with free variables
I also tried:
def open_exception(*a):
    raise Exception("you can't use open")

env = {
    'open': open_exception
}
but I got the same exception as before (not "you can't use open").
I want to prevent executing this:
"""def foo():
return open('some_file').read()
print foo()"""
and evaluating this:
"open('some_file').write('some text')"
I also use a session to store code that was evaluated previously, so I need to prevent executing this:
"""def foo(s):
return open(s)"""
and then evaluating this:
"foo('some').write('some text')"
I can't use a regex, because someone could hide the call with eval inside a string:
"eval(\"opxx('some file').write('some text')\".replace('xx', 'en'))"
Is there any way to prevent access to file system inside exec/eval? (I need both)
There's no way to prevent access to the file system inside exec/eval. Here's example code that demonstrates a way for user code to call otherwise restricted classes, and it always works:
import subprocess

code = """[x for x in ().__class__.__bases__[0].__subclasses__()
           if x.__name__ == 'Popen'][0](['ls', '-la']).wait()"""
# Executing the `code` will always run `ls`...
exec code in dict(__builtins__=None)
And don't think about filtering the input, especially with regex.
You might consider a few alternatives:
ast.literal_eval, if you can limit yourself to only simple expressions (a sketch follows this list)
Using another language for user code. You might look at Lua or JavaScript - both are sometimes used to run unsafe code inside sandboxes.
There's the pysandbox project, though I can't guarantee that the sandboxed code is really safe. Python wasn't designed to be sandboxed, and in particular the CPython implementation wasn't written with sandboxing in mind. Even the author seems to doubt the possibility of implementing such a sandbox safely.
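A minimal sketch of the ast.literal_eval route mentioned above (the user_input string is just an illustration):

import ast

user_input = "{'numbers': [1, 2, 3], 'flag': True}"
try:
    value = ast.literal_eval(user_input)  # only Python literals are accepted
except (ValueError, SyntaxError):
    value = None                          # any code, including open(), is rejected
print value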
You can't turn exec() and eval() into a safe sandbox. You can always get access to the builtin module, as long as the sys module is available:
sys.modules[().__class__.__bases__[0].__module__].open
And even if sys is unavailable, you can still get access to any new-style class defined in any imported module in basically the same way. This includes all the IO classes in io.
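A short demonstration of that escape (a sketch; it assumes the sandboxed code can reach sys, and uses the Python 2 name of the builtin module):

import sys

# object's __module__ names the builtin module ('__builtin__' on Python 2)
builtin_name = ().__class__.__bases__[0].__module__
real_open = sys.modules[builtin_name].open
print real_open('some_file').read()   # the "hidden" open is back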
This actually can be done.
That is, practically just what you describe can be accomplished on Linux, contrary to other answers here: you can achieve a setup with an exec-like call which runs untrusted code under security that is reasonably difficult to penetrate, and which allows output of the result. Untrusted code is not allowed to access the filesystem at all, except for reading specifically allowed parts of the Python VM and standard library.
If that's close enough to what you wanted, read on.
I'm envisioning a system where your exec-like function spawns a subprocess under a very strict AppArmor profile, such as the one used by Straitjacket (see here and here). This will limit all filesystem access at the kernel level, other than files specifically allowed to be read. This will also limit the process's stack size, max data segment size, max resident set size, CPU time, the number of signals that can be queued, and the address space size. The process will have locked memory, cores, flock/fcntl locks, POSIX message queues, etc., wholly disallowed.

If you want to allow using size-limited temporary files in a scratch area, you can mkstemp it and make it available to the subprocess, and allow writes there under certain conditions (make sure that hard links are absolutely disallowed). You'd want to make sure to clear out anything interesting from the subprocess environment, put it in a new session and process group, and close all FDs in the subprocess except for stdin/stdout/stderr, if you want to allow communication with those.
If you want to be able to get a Python object back out from the untrusted code, you could wrap it in something which prints the result's repr to stdout, and after you check its size, you evaluate it with ast.literal_eval(). That pretty severely limits the possible types of object that can be returned, but really, anything more complicated than those basic types probably carries the possibility of sekrit maliciousness intended to be triggered within your process. Under no circumstances should you use pickle for the communication protocol between the processes.
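A rough sketch of that result channel (sandbox_runner.py is a hypothetical confined entry point that prints the repr of the untrusted code's result to stdout):

import ast
import subprocess

proc = subprocess.Popen(['python', 'sandbox_runner.py'],
                        stdout=subprocess.PIPE)
out, _ = proc.communicate()
if len(out) > 64 * 1024:                 # check the size before parsing
    raise ValueError("result too large")
result = ast.literal_eval(out.strip())   # only basic literal types get through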
As @Brian suggested, overriding open doesn't work:
def raise_exception(*a):
    raise Exception("you can't use open")

open = raise_exception
print eval("open('test.py').read()", {})
This displays the content of the file, but this (merging @Brian's and @lunaryorn's answers):
import sys

def raise_exception(*a):
    raise Exception("you can't use open")

__open = sys.modules['__builtin__'].open
sys.modules['__builtin__'].open = raise_exception
print eval("open('test.py').read()", {})
will throw this:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
if not enabled():
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 23, in enabled
conf = open(CONFIG).read()
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Original exception was:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
and you can still access open outside the user code via __open.
"Nested function" refers to the fact that it's declared inside another function, not that it's a lambda. Declare your open override at the top level of your module and it should work the way you want.
Also, I don't think this is totally safe. Preventing open is just one of the things you need to worry about if you want to sandbox Python.
Related
I have a large system written in Python. When I run it, it reads all sorts of data from many different files on my filesystem. There are thousands of lines of code and hundreds of files; most of them are not actually being used. I want to see which files are actually being accessed by the system (Ubuntu), and hopefully where in the code they are being opened. Filenames are decided dynamically using variables etc., so the actual filenames cannot be determined just by looking at the code.
I have access to the code, of course, and can change it.
I'm trying to figure out how to do this efficiently, with minimal changes in the code:
1. Is there a Linux way to determine which files are accessed, and at what times? This might be useful, although it won't tell me where in the code this happens.
2. Is there a simple way to make an "open file" command also log the file name, time, etc. of the opened file? Hopefully without having to go into the code and change every open command; there are many of them, and some are not being used at runtime.
Thanks
You can trace file accesses without modifying your code, using strace.
Either you start your program with strace, like this:
strace -f -e trace=file python your_program.py
or you attach strace to a running program, like this:
strace -f -e trace=file -p <PID>
For 1 - you can use
ls -la /proc/<PID>/fd
replacing <PID> with your process id.
Note that it will give you all the open file descriptors. Some of them are stdin, stdout and stderr, and often other things, such as open websockets (which also use a file descriptor), but filtering it for files should be easy.
For 2 - see the great solution proposed here:
Override python open function when used with the 'as' keyword to print anything
e.g. overriding the open function with your own, which could include the additional logging.
One possible method is to "overload" the open function. This can have many effects depending on the code, so I would do it very carefully if needed, but basically here's an example:
>>> _open = open
>>> def open(filename):
... print(filename)
... return _open(filename)
...
>>> open('somefile.txt')
somefile.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in open
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
As you can see, my new open function returns the original open (renamed _open) but first prints out the argument (the filename). This can be done with more sophistication to log the filename if needed, but the most important thing is that this needs to run before any use of open in your code.
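A slightly more complete sketch along those lines (Python 3; patching builtins.open makes the logging apply inside imported modules too; the names here are illustrative):

import builtins
import datetime

_real_open = builtins.open

def logging_open(*args, **kwargs):
    # log the filename and a timestamp before delegating to the real open
    name = args[0] if args else kwargs.get('file')
    print(datetime.datetime.now(), "open:", name)
    return _real_open(*args, **kwargs)

builtins.open = logging_open   # must run before any open() call you care about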
I've developed a template engine (https://github.com/hl037/tempiny) that translates a file to Python code. Then I call exec().
However, when there are errors in the template, the traceback either prints no information (if I don't give a filename to compile) or points to the wrong line.
Thus, I would like to change the way the file is retrieved, so that my in-RAM version is used (by the way, I can also change the file name).
I've checked this question : File "<string>" traceback with line preview
...But the answer does not work with the built-in traceback (and it means you have to globally hack sys.excepthook, which could cause conflicts with other modules).
I could also use a temporary file, but it would result in some overhead just to get information only when something goes wrong...
Do you have a solution?
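One way to make an in-RAM source visible to the built-in traceback machinery is to register it with linecache, which the traceback module consults before hitting the filesystem. A sketch (the helper name and fake filename are illustrative):

import linecache

def exec_template(source, filename='<tempiny template>'):
    # register the generated source so tracebacks can quote its lines
    linecache.cache[filename] = (
        len(source),              # size
        None,                     # mtime; None keeps the entry cached
        source.splitlines(True),  # the lines, with line endings kept
        filename,
    )
    code = compile(source, filename, 'exec')
    exec(code, {})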
I was trying to debug an issue with abc.ABCMeta, in particular a subclass check that didn't work as expected, and I wanted to start by simply adding a print to the __subclasscheck__ method (I know there are better ways to debug code, but pretend for the sake of this question that there's no alternative). However, when starting Python afterwards, it crashes (like a segmentation fault) and I get this exception:
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
File "C:\...\lib\io.py", line 84, in <module>
File "C:\...\lib\abc.py", line 158, in register
File "C:\...\lib\abc.py", line 196, in __subclasscheck__
RuntimeError: lost sys.stdout
So it probably wasn't a good idea to put the print in there. But where exactly does the exception come from? I only changed Python code; that shouldn't crash, right?
Does someone know where this exception is coming from and if/how I can avoid it but still put a print in the abc.ABCMeta.__subclasscheck__ method?
I'm using Windows 10, Python-3.5 (just in case it might be important).
This exception stems from the fact that CPython imports io, and, indirectly, abc.py during the initialization of the standard streams:
if (!(iomod = PyImport_ImportModule("io"))) {
goto error;
}
io imports the abc module and registers FileIO as a virtual subclass of RawIOBase, a couple of other classes for BufferedIOBase and others for TextIOBase. ABCMeta.register invokes __subclasscheck__ in the process.
As you understand, using print in __subclasscheck__ when sys.stdout isn't set up yet is a big no-no; the initialization fails and you get back your error:
if (initstdio() < 0)
Py_FatalError(
"Py_NewInterpreter: can't initialize sys standard streams");
You can get around it by guarding the print with hasattr(sys, 'stdout'): sys has been initialized by this point, while stdout hasn't (and, as such, won't exist in sys during the early initialization phase):
if hasattr(sys, 'stdout'):
print("Debug")
You should get a good amount of output when firing up Python now.
We have a vendor-supplied Python tool (it's byte-compiled; we don't have the source). Because of this, we're also locked into using the vendor-supplied Python 2.4. The way to run the util is:
source login.sh
oupload [options]
The login.sh just sets a few env variables and then defines two shell functions:
odownload () {
    ${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_download_command.pyc "$@"
}
oupload () {
    ${PYTHON_CMD} ${OCLIPATH}/ocli/commands/word_upload_command.pyc "$@"
}
Now, when I run it their way, it works fine. It will prompt for a username and password, then do its thing.
I'm trying to create a wrapper around the tool to do some extra steps after it's run and provide some sane defaults for the utility. The problem I'm running into is that I cannot, for the life of me, figure out how to use subprocess to successfully do this. It seems to realize that the original command isn't running directly from the terminal and bails.
I created a '/usr/local/bin/oupload' and copied it from the original login.sh. The only difference is that instead of defining the alias at the end, I actually run the command.
Then, in my python script, I try to run my new shell script:
import os
import string
from subprocess import Popen, PIPE

if os.path.exists(options.zipfile):
    try:
        cmd = string.join(cmdargs, ' ')
        p1 = Popen(cmd, shell=True, stdin=PIPE)
But I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 112, in getAuth
Empty Username not legal
Unknown Error Encountered
SUMMARY:
Name: Empty Username not legal
Description: None
So it seemed like an extra carriage return was getting sent (I tried rstripping all the options; it didn't help).
If I don't set stdin=PIPE, I get:
Enter Opsware Username: Traceback (most recent call last):
File "./command.py", line 31, in main
File "./controller.py", line 51, in handle
File "./controllers/word_upload_controller.py", line 81, in _handle
File "./controller.py", line 66, in _determineNew
File "./lib/util.py", line 83, in determineNew
File "./lib/util.py", line 109, in getAuth
IOError: [Errno 5] Input/output error
Unknown Error Encountered
I've tried other variations of using p1.communicate() and p1.stdin.write() along with shell=False and shell=True, but I've had no luck figuring out how to properly send along the username and password. As a last resort, I tried looking at the byte code for the utility they provided; it didn't help. Once I called the util's main routine with the proper arguments, it ended up core dumping with thread errors.
Final thoughts: the utility doesn't seem to want to 'wait' for any input. When run from the shell, it pauses at the 'Username' prompt. When run through Python's Popen, it just blazes through and ends, assuming no password was given. I tried to look up ways of maybe preloading the stdin buffer, thinking maybe the process would read from that if it was available, but couldn't figure out if that was possible.
I'm trying to stay away from using pexpect, mainly because we have to use the vendor's provided Python 2.4 because of the precompiled libraries they supply, and I'm trying to keep the distribution of the script to as minimal a footprint as possible. If I have to, I have to, but I'd rather not use it (and I honestly have no idea if it works in this situation either).
Any thoughts on what else I could try would be most appreciated.
UPDATE
So I solved this by diving further into the bytecode and figuring out what I was missing from the compiled command.
However, this presented two problems:
1. The vendor code, when called, was doing an exit when it completed.
2. The vendor code was writing to stdout, which I needed to store and operate on (it contains the ID of the uploaded pkg). I couldn't just redirect stdout, because the vendor code was still asking for the username/password.
Problem 1 was solved easily enough by wrapping their code in a try/except clause.
Problem 2 was solved by doing something similar to: https://stackoverflow.com/a/616672/677373
Instead of a log file, I used cStringIO. I also had to implement a fake 'flush' method, since it seems the vendor code was calling it and complaining that the new object I had provided for stdout didn't supply it. The code ends up looking like:
import os
import sys
from cStringIO import StringIO

class Logger(object):
    def __init__(self):
        self.terminal = sys.stdout
        self.log = StringIO()

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)

    def flush(self):
        self.terminal.flush()
        self.log.flush()

if os.path.exists(options.zipfile):
    try:
        os.environ['OCLI_CODESET'] = 'ISO-8859-1'
        backup = sys.stdout
        sys.stdout = output = Logger()
        # UploadCommand was the command found in the bytecode
        upload = UploadCommand()
        try:
            upload.main(cmdargs)
        except Exception, rc:
            pass
        sys.stdout = backup
        # now do some fancy stuff with output from output.log
I should note that the only reason I simply do a 'pass' in the except: clause is that the except clause is always called. The 'rc' is actually the return code from the command, so I will probably add handling for non-zero cases.
I tried to look up ways of maybe preloading the stdin buffer
Do you perhaps want to create a named FIFO, fill it with username/password info, then reopen it in read mode and pass it to Popen (as in Popen(..., stdin=myfilledbuffer))?
You could also just create an ordinary temporary file, write the data to it, and reopen it in read mode, again, passing the reopened handle as stdin. (This is something I'd personally avoid doing, since writing username/passwords to temporary files is often of the bad. OTOH it's easier to test than FIFOs)
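A rough sketch of the FIFO idea (POSIX-only; username, password, and cmdargs stand in for the question's values):

import os
import tempfile
from subprocess import call

fifo_path = os.path.join(tempfile.mkdtemp(), 'auth_fifo')
os.mkfifo(fifo_path)

if os.fork() == 0:
    # child: feed the credentials into the FIFO, then exit
    f = open(fifo_path, 'w')
    f.write(username + '\n' + password + '\n')
    f.close()
    os._exit(0)

# parent: hand the read end of the FIFO to the tool as its stdin
stdin_file = open(fifo_path, 'r')
status = call(['oupload'] + cmdargs, stdin=stdin_file)
stdin_file.close()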
As for the underlying cause: I suspect that the offending software is reading from stdin via a non-blocking method. Not sure why that works when connected to a terminal.
AAAANYWAY: no need to use pipes directly via Popen at all, right? I kinda laugh at the hackishness of this, but I'll bet it'll work for you:
# you don't actually seem to need popen here IMO -- call() does better for this application.
statuscode = call('echo "%s\n%s\n" | oupload %s' % (username, password, options), shell=True)
tested with status = call('echo "foo\nbar\nbar\nbaz" | wc -l', shell=True) (the output is '4', naturally).
The original question was solved by just avoiding the issue: not using the terminal, and instead importing the Python code that was being called by the shell script and using that directly.
I believe J.F. Sebastian's answer would probably work better for what was originally asked, however, so I'd suggest people looking for an answer to a similar question look down the path of using the pty module.
My program appears to be leaking file handles. How can I find out where?
My program uses file handles in a few different places: output from child processes, ctypes API calls (ImageMagick) that open files, and files that are copied.
It crashes in shutil.copyfile, but I'm pretty sure this is not the place it is leaking.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 874, in main
magpy.run_all()
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 656, in run_all
[operation.operate() for operation in operations]
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 417, in operate
output_file = self.place_image(output_file)
File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 336, in place_image
shutil.copyfile(str(input_file), str(self.full_filename))
File "C:\Python25\Lib\shutil.py", line 47, in copyfile
fdst = open(dst, 'wb')
IOError: [Errno 24] Too many open files: 'C:\\Documents and Settings\\stuart.axon\\Desktop\\calzone\\output\\wwtbam4\\Nokia_NCD\\nl\\icon_42x42_V000.png'
Press any key to continue . . .
I had similar problems, running out of file descriptors during subprocess.Popen() calls. I used the following script to debug what was happening:
import os
import stat

_fd_types = (
    ('REG', stat.S_ISREG),
    ('FIFO', stat.S_ISFIFO),
    ('DIR', stat.S_ISDIR),
    ('CHR', stat.S_ISCHR),
    ('BLK', stat.S_ISBLK),
    ('LNK', stat.S_ISLNK),
    ('SOCK', stat.S_ISSOCK)
)

def fd_table_status():
    result = []
    for fd in range(100):
        try:
            s = os.fstat(fd)
        except OSError:
            continue
        for fd_type, func in _fd_types:
            if func(s.st_mode):
                break
        else:
            fd_type = str(s.st_mode)
        result.append((fd, fd_type))
    return result

def fd_table_status_logify(fd_table_result):
    return ('Open file handles: ' +
            ', '.join(['{0}: {1}'.format(*i) for i in fd_table_result]))

def fd_table_status_str():
    return fd_table_status_logify(fd_table_status())

if __name__ == '__main__':
    print fd_table_status_str()
You can import this module and call fd_table_status_str() to log the file descriptor table status at different points in your code.
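For example (assuming the script above is saved as fd_debug.py; the module name is just an assumption):

from subprocess import Popen, PIPE
from fd_debug import fd_table_status_str   # the module above; name assumed

print fd_table_status_str()
p = Popen(['ls'], stdout=PIPE)
print fd_table_status_str()   # the pipe's file descriptor shows up here
p.communicate()
print fd_table_status_str()   # and it is gone again after communicate()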
Also, make sure that subprocess.Popen instances are destroyed. Keeping references to Popen instances on Windows prevents the GC from running, and if the instances are kept, the associated pipes are not closed. More info here.
Use Process Explorer, select your process, View->Lower Pane View->Handles - then look for what seems out of place - usually lots of the same or similar files open points to the problem.
lsof -p <process_id> works well on several UNIX-like systems including FreeBSD.
Look at the output from ls -l /proc/$pid/fd/ (substituting the PID of your process, of course) to see which files are open [or, on win32, use Process Explorer to list open files]; then figure out where in your code you're opening them, and make sure that close() is being called. (Yes, the garbage collector will eventually close things, but it's not always fast enough to avoid running out of fds).
Checking for any circular references which might be preventing garbage collection is also a good practice. (The cycle collector will eventually dispose of these -- but it may not run frequently enough to avoid file descriptor exhaustion; I've been bitten by this personally).
While the OP has a Windows system, I'm sure plenty of people here (such as myself) are looking for others too (it's not even tagged Windows).
There's the psutil package (hosted on Google Code at the time of writing) with a get_open_files() method. It looks like an excellent interface, but it seems it hasn't been maintained in a couple of years. I actually wrote an implementation for my own Python 2 project on Linux. I'm using it with unittest to make sure my functions clean up their resources.
import os

# calling this **synchronously** will accurately relay open files on Linux
def get_open_files(pid):
    # directory spawned by the Python process, containing its file descriptors
    path = "/proc/%d/fd" % pid
    # list the abspaths belonging to that directory
    links = ["%s/%s" % (path, f) for f in os.listdir(path)]
    # filter out the bad ones returned by os.listdir()
    valid_links = filter(lambda f: os.path.exists(f), links)
    # these links are fd integers, so map them to their actual file devices
    devices = map(lambda f: os.readlink(f), valid_links)
    # remove any ones that are stdin, stdout, stderr, etc.
    return filter(lambda f: "/dev/pts" not in f, devices)
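A sketch of the unittest usage mentioned above (do_some_work is a hypothetical function under test):

import os
import unittest

class CleanupTest(unittest.TestCase):
    def test_no_leaked_files(self):
        before = set(get_open_files(os.getpid()))
        do_some_work()   # hypothetical function under test
        after = set(get_open_files(os.getpid()))
        self.assertEqual(before, after)

if __name__ == '__main__':
    unittest.main()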
Python's own test suite has a refleak module that utilizes fd_count. It works across operating systems and is available in full installs:
>>> from test.support.os_helper import fd_count
>>> fd_count()
27
On Python 3.9 and earlier, os_helper doesn't exist, so use from test.support import fd_count instead.