How to override python file lookup for traceback printing?

How to override python file lookup for traceback printing? - python

I've developed a template engine (https://github.com/hl037/tempiny) that translate a file to python code. Then I call exec().
However, when there are errors in the template, the traceback either print no information (if I don't give a filename to compile) or prints the wrong line.
Thus, I would like to change the way the file is retrieved, to use my in-ram version. (by the way, I also can change the file name).
I've checked this question : File "<string>" traceback with line preview
...But the answer does not work with the built-in traceback. (and that means that you have to hack globally sys.excepthook, which could cause conflicts with other modules)
I could also use a temporary file, but it would results in some overhead just for getting info only when something wrong happens...
Do you have a solution?

Related

Finding which files are being read from during a session (python code)

I have a large system written in python. when I run it, it reads all sorts of data from many different files on my filesystem. There are thousands lines of code, and hundreds of files, most of them are not actually being used. I want to see which files are actually being accessed by the system (ubuntu), and hopefully, where in the code they are being opened. Filenames are decided dynamically using variables etc., so the actual filenames cannot be determined just by looking at the code.
I have access to the code, of course, and can change it.
I try to figure how to do this efficiently, with minimal changes in the code:
is there a Linux way to determine which files are accessed, and at what times? this might be useful, although it won't tell me where in the code this happens
is there a simple way to make an "open file" command also log the file name, time, etc... of the open file? hopefully without having to go into the code and change every open command, there are many of them, and some are not being used at runtime.
Thanks

You can trace file accesses without modifying your code, using strace.
Either you start your program with strace, like this
strace -f -e trace=file your_program.py
Otherwise you attach strace to a running program like this
strace -f -e trace=file -p <PID>

For 1 - You can use
ls -la /proc/<PID>/fd`
Replacing <PID> with your process id.
Note that it will give you all the open file descriptors, some of them are stdin stdout stderr, and often other things, such as open websockets (which use a file descriptor), however filtering it for files should be easy.
For 2- See the great solution proposed here -
Override python open function when used with the 'as' keyword to print anything
e.g. overriding the open function with your own, which could include the additional logging.

One possible method is to "overload" the open function. This will have many effects that depend on the code, so I would do that very carefully if needed, but basically here's an example:
>>> _open = open
>>> def open(filename):
... print(filename)
... return _open(filename)
...
>>> open('somefile.txt')
somefile.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in open
FileNotFoundError: [Errno 2] No such file or directory: 'somefile.txt'
As you can see my new open function will return the original open (renamed as _open) but will first print out the argument (the filename). This can be done with more sophistication to log the filename if needed, but the most important thing is that this needs to run before any use of open in your code

Exception while using Sphinx to document script with command line arguments

Is there any way I can use Sphinx to document the command line inputs of a python script? I am able to document inputs of functions or methods but I don't know how to document inputs of scripts. I have been trying to follow the same syntax I use for functions by adding in the source file the line .. automodule:: scriptLDOnServer where scriptLDOnServer is my python script (which corresponds to my main).
The problem is that I get an error like this:
__import__(self.modname)
File "/home/ubuntu/SVNBioinfo/trunk/Code/LD/scriptLDOnServer.py", line 10, in <module>
genotype_filename=sys.argv[7];
IndexError: list index out of range
It seems that Sphinx is trying to get the command line inputs but in my source file there are no inputs so the import fails. Is there a way to solve this? Should I use another command in the source for a script instead of a module?
Sorry for not being very clear but it is difficult to explain the problem.

It is failing while you are importing the file because you are trying to access sys.argv[7] at import time. This is not something you should be doing. Such code which can possibly fail should be either in an if __name__ == '__main__': block or in a function, so that it won't be executed when the code is imported.
This holds generally. You should never* write code which will fail to import or which has any side-effects of being imported.
* Conditions apply. But you'd better have pretty good justification before breaking this rule.

Python: NameError - what should I do with this?

I am working on a large-scale software system that is written in Python right now.
The thing is, I am not sure how to make sure if each individual .py file in the system is correct. The only way for me to run the software is to run the main.py file, which uses all the other .py files.
So either everything works, or one thing doesn't (causing everything to not work).
I keep getting a NameError even when importing the correct file. I think this may have to do with the fact that the class associated with that name in the NameError may have errors in it. Any suggestions? NameError is giving me this:
File "<string>", line 1, in <module>
NameError: name 'RGBox' is not defined
It's not a very helpful error message, and I'm not sure why it's giving "string" and 'module' instead of actual values.....
[EDIT]- I am working through ssh into a remote unix machine

This is a straight-forward error message which indicates that the execution flow has not yet encountered class/module/variable RGBox prior to it being called.
RGBox is either being called out of sequence or has been mispelt.
Perform a commandline search through the app files for the name 'RGBox' or its regex equivalents. for example with grep you can do a case-insensitive search:
$ grep -lsri 'rgbox' ./my_project_folder
which will output any file which contains the patterns 'RGBox', 'rgBox', etc.
If you are unfamiliar with the code and its structure, then you may as well insert strategic logging (or print) statements at significant locations in the code to understand its flow and execution logic.

disallow access to filesystem inside exec and eval in Python

I want to disallow access to file system from clients code, so I think I could overwrite open function
env = {
'open': lambda *a: StringIO("you can't use open")
}
exec(open('user_code.py'), env)
but I got this
unqualified exec is not allowed in function 'my function' it contains a
nested function with free variables
I also try
def open_exception(*a):
raise Exception("you can't use open")
env = {
'open': open_exception
}
but got the same Exception (not "you can't use open")
I want to prevent of:
executing this:
"""def foo():
return open('some_file').read()
print foo()"""
and evaluate this
"open('some_file').write('some text')"
I also use session to store code that was evaluated previously so I need to prevent of executing this:
"""def foo(s):
return open(s)"""
and then evaluating this
"foo('some').write('some text')"
I can't use regex because someone could use (eval inside string)
"eval(\"opxx('some file').write('some text')\".replace('xx', 'en')"
Is there any way to prevent access to file system inside exec/eval? (I need both)

There's no way to prevent access to the file system inside exec/eval. Here's an example code that demonstrates a way for the user code to call otherwise restricted classes that always works:
import subprocess
code = """[x for x in ().__class__.__bases__[0].__subclasses__()
if x.__name__ == 'Popen'][0](['ls', '-la']).wait()"""
# Executing the `code` will always run `ls`...
exec code in dict(__builtins__=None)
And don't think about filtering the input, especially with regex.
You might consider a few alternatives:
ast.literal_eval if you could limit yourself only to simple expressions
Using another language for user code. You might look at Lua or JavaScript - both are sometimes used to run unsafe code inside sandboxes.
There's the pysandbox project, though I can't guarantee you that the sandboxed code is really safe. Python wasn't designed to be sandboxed, and in particular the CPython implementation wasn't written with sandboxing in mind. Even the author seems to doubt the possibility to implement such sandbox safely.

You can't turn exec() and eval() into a safe sandbox. You can always get access to the builtin module, as long as the sys module is available::
sys.modules[().__class__.__bases__[0].__module__].open
And even if sys is unavailable, you can still get access to any new-style class defined in any imported module by basically the same way. This includes all the IO classes in io.

This actually can be done.
That is, practically just what you describe can be accomplished on Linux, contrary to other answers here. That is, you can achieve a setup where you can have an exec-like call which runs untrusted code under security which is reasonably difficult to penetrate, and which allows output of the result. Untrusted code is not allowed to access the filesystem at all except for reading specifically allowed parts of the Python vm and standard library.
If that's close enough to what you wanted, read on.
I'm envisioning a system where your exec-like function spawns a subprocess under a very strict AppArmor profile, such as the one used by Straitjacket (see here and here). This will limit all filesystem access at the kernel level, other than files specifically allowed to be read. This will also limit the process's stack size, max data segment size, max resident set size, CPU time, the number of signals that can be queued, and the address space size. The process will have locked memory, cores, flock/fcntl locks, POSIX message queues, etc, wholly disallowed. If you want to allow using size-limited temporary files in a scratch area, you can mkstemp it and make it available to the subprocess, and allow writes there under certain conditions (make sure that hard links are absolutely disallowed). You'd want to make sure to clear out anything interesting from the subprocess environment and put it in a new session and process group, and close all FDs in the subprocess except for the stdin/stdout/stderr, if you want to allow communication with those.
If you want to be able to get a Python object back out from the untrusted code, you could wrap it in something which prints the result's repr to stdout, and after you check its size, you evaluate it with ast.literal_eval(). That pretty severely limits the possible types of object that can be returned, but really, anything more complicated than those basic types probably carries the possibility of sekrit maliciousness intended to be triggered within your process. Under no circumstances should you use pickle for the communication protocol between the processes.

As #Brian suggest overriding open doesn't work:
def raise_exception(*a):
raise Exception("you can't use open")
open = raise_exception
print eval("open('test.py').read()", {})
this display the content of the file but this (merging #Brian and #lunaryorn answers)
import sys
def raise_exception(*a):
raise Exception("you can't use open")
__open = sys.modules['__builtin__'].open
sys.modules['__builtin__'].open = raise_exception
print eval("open('test.py').read()", {})
will throw this:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
if not enabled():
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 23, in enabled
conf = open(CONFIG).read()
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
Original exception was:
Traceback (most recent call last):
File "./test.py", line 11, in <module>
print eval("open('test.py').read()", {})
File "<string>", line 1, in <module>
File "./test.py", line 5, in raise_exception
raise Exception("you can't use open")
Exception: you can't use open
and you can access to open outside user code via __open

"Nested function" refers to the fact that it's declared inside another function, not that it's a lambda. Declare your open override at the top level of your module and it should work the way you want.
Also, I don't think this is totally safe. Preventing open is just one of the things you need to worry about if you want to sandbox Python.

Google Appengine and Python exceptions

In my Google Appengine application I have defined a custom exception InvalidUrlException(Exception) in the module 'gvu'. Somewhere in my code I do:
try:
results = gvu.article_parser.parse(source_url)
except gvu.InvalidUrlException as e:
self.redirect('/home?message='+str(e))
...
which works fine in the local GAE development server, but raises
<type 'exceptions.SyntaxError'>: invalid syntax (translator.py, line 18)
when I upload it. (line 18 is the line starting with 'except')
The problem seems to come from the 'as e' part: if I remove it I don't get this exception anymore. However I would like to be able to access the raised exception. Have you ever encountered this issue? Is there an alternative syntax?

You probably have an older Python version on your server. except ExceptionType as varname: is a newer syntax. Previously you needed to simply use a comma: except ExceptionType, varname:.

I was getting the same error because I was using the pydoc command instead of the pydoc3 command on a python3 file that was using python3 print statements (print statements with parenthesis).

Just FYI, another possible cause for this error - especially if the line referenced is early in the script (like line 2) is line ending differences between Unix and Windows.
I was running Python on Windows from a Cygwin shell and got this error, and was really puzzled. I had created the file with "touch" before editing it.
I renamed the file to a temp file name, and copied another file (which I had downloaded from a Unix server) onto the original file name, then restored the contents via the temp file, and problem solved. Same exact file contents (on the screen anyway), only difference was where the file had originally been created.
Just wanted to post this in case anyone else ran across this error and was similarly puzzled.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.