Security of Python's eval() on untrusted strings?

If I am evaluating a Python string using eval(), and have a class like:
class Foo(object):
    a = 3
    def bar(self, x): return x + self.a
What are the security risks if I do not trust the string? In particular:
Is eval(string, {"f": Foo()}, {}) unsafe? That is, can you reach os or sys or something unsafe from a Foo instance?
Is eval(string, {}, {}) unsafe? That is, can I reach os or sys entirely from builtins like len and list?
Is there a way to make builtins not present at all in the eval context?
There are some unsafe strings like "[0] * 100000000" I don't care about, because at worst they slow/stop the program. I am primarily concerned about protecting user data external to the program.
Obviously, eval(string) without custom dictionaries is unsafe in most cases.

eval() will allow malicious data to compromise your entire system, kill your cat, eat your dog and make love to your wife.
There was recently a thread about how to do this kind of thing safely on the python-dev list, and the conclusions were:
It's really hard to do this properly.
It requires patches to the python interpreter to block many classes of attacks.
Don't do it unless you really want to.
Start here to read about the challenge: http://tav.espians.com/a-challenge-to-break-python-security.html
What situation do you want to use eval() in? Are you wanting a user to be able to execute arbitrary expressions? Or are you wanting to transfer data in some way? Perhaps it's possible to lock down the input in some way.

You cannot secure eval with a blacklist approach like this. See "Eval really is dangerous" for examples of input that will segfault the CPython interpreter, give access to any class you like, and so on.

You can get to os using builtin functions: __import__('os').
For Python 2.6+, the ast module may help, in particular ast.literal_eval, although it depends on exactly what you want to eval.
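As a rough illustration of both points (a minimal sketch, assuming Python 3; the sample strings are made up for the example): eval() with empty dictionaries can still resolve __import__, because Python adds "__builtins__" to the globals dictionary when it is missing, whereas ast.literal_eval only accepts literals and raises ValueError on a call.

```python
import ast

# eval() inserts "__builtins__" into an empty globals dict,
# so __import__ is still reachable from the untrusted string.
module = eval("__import__('os')", {}, {})
print(module.__name__)

# ast.literal_eval only builds literals: numbers, strings, tuples, lists, dicts, ...
data = ast.literal_eval("[1, 'two', (3.0, None)]")
print(data)

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Whether literal_eval is enough depends on the input: it covers data transfer, not arbitrary expressions.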

Note that even if you pass empty dictionaries to eval(), it's still possible to segfault (C)Python with some syntax tricks. For example, try this on your interpreter: eval("()"*8**5)

You are probably better off turning the question around:
What sort of expressions are you wanting to eval?
Can you ensure that only strings matching some narrowly defined syntax are eval()d?
Then consider if that is safe.
For example, if you are wanting to let the user enter an algebraic expression for evaluation, consider limiting them to one letter variable names, numbers, and a specific set of operators and functions. Don't eval() strings containing anything else.
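One way to sketch that kind of restriction (an illustration only, assuming Python 3.8+; is_simple_algebra and ALLOWED_NODES are hypothetical names, and the operator set is just an example) is to parse the string with the ast module and reject it unless every node is on a small whitelist:

```python
import ast

# Node types allowed in a simple algebraic expression (an assumed, minimal set).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd,
)

def is_simple_algebra(text):
    """Return True only for numbers, one-letter names and + - * / operators."""
    try:
        tree = ast.parse(text, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # calls, attribute access, subscripts, ... are rejected
        if isinstance(node, ast.Name) and not (len(node.id) == 1 and node.id.isalpha()):
            return False  # only one-letter variable names
        if isinstance(node, ast.Constant) and type(node.value) not in (int, float):
            return False  # no strings, booleans, bytes, complex numbers
    return True

text = "2*x + y/3"
if is_simple_algebra(text):
    # Only validated strings ever reach eval(), and builtins are withheld.
    value = eval(text, {"__builtins__": {}}, {"x": 2.0, "y": 9.0})
    print(value)
```

Exponentiation is left out on purpose here, since it makes very expensive computations easy to request.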

There is a very good article on the un-safety of eval() in Mark Pilgrim's Dive into Python tutorial.
Quoted from this article:
"In the end, it is possible to safely evaluate untrusted Python expressions, for some definition of “safe” that turns out not to be terribly useful in real life. It’s fine if you’re just playing around, and it’s fine if you only ever pass it trusted input. But anything else is just asking for trouble."

Related

Can the usage of `setattr` (and `getattr`) be considered as bad practice?

setattr and getattr have kind of crept into my style of programming (mainly scientific stuff; my knowledge of Python is self-taught).
Considering that exec and eval carry an inherent potential danger, since in some cases they might lead to security issues, I was wondering whether the same argument is considered valid for setattr. (About getattr I found this question which contains some info, although the argumentation is not very convincing.)
From what I know, setattr can be used without worrying too much, but to be honest I don't trust my Python knowledge enough to be sure, and if I'm wrong I'd like to try and get rid of the habit of using setattr.
Any input is very much appreciated!

First, it could definitely make it easier to exploit an existing security hole.
For example, let's say you have code that does exec, eval, SQL queries or URLs built via string formatting, etc. And let's say you're passing, say, locals() or a filtered __dict__ to the formatting command or as the eval context or whatever. Using setattr clearly widens the security hole, making it much easier for me to find ways to attack your code, because you can no longer be sure what you're going to be passing to those functions.
But what if you don't do anything else unsafe? Is setattr safe then?
Not as bad, but it's still not safe. If I can influence the names of the attributes you're setting, I can, e.g., replace any method I want on your objects.
You can try to protect against this by, e.g., first checking that the old value was not callable, or not a method-type descriptor, or whatever. In the same way you can try to protect against people calling functions in eval or putting quotes and semicolons in SQL parameters and so on. This is effectively the same as any of those other cases. And it's a lot harder to try to close all illegitimate paths to an open door, than to just not open the door in the first place.
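To make the attribute-name problem concrete, here is a small sketch (the Profile class, the apply_settings helper and the sample keys are all hypothetical, invented for the illustration): once the attribute names come from user input, a caller can overwrite a flag or clobber a method.

```python
class Profile:
    is_admin = False

    def greeting(self):
        return "Hello, " + self.nickname

def apply_settings(obj, settings):
    # Hypothetical helper: copies every user-supplied key onto the object.
    for name, value in settings.items():
        setattr(obj, name, value)

profile = Profile()
# The form was only meant to carry "nickname", but the attacker adds more keys.
apply_settings(profile, {"nickname": "mallory", "is_admin": True, "greeting": "oops"})

print(profile.is_admin)            # the class-level default is now shadowed
print(callable(profile.greeting))  # the method has been replaced by a string
```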
What if the name never comes from anything that can be influenced by the user?
Well, in that case, why are you using setattr? There is no good reason to call setattr with a literal.
Anyway, when Lattyware said that there are often better ways to solve the problem at hand, he was almost certainly talking about readability, maintainability, and idiomaticness. But the side effect of using those better ways is that you also often avoid any security implications.
90% of the time, the solution is to use a dict instead of an object. Unlike Javascript, they're not the same thing in Python, and they're not meant to be used the same way. A dict doesn't have user-settable methods, or inheritance, or built-in special names that a key can collide with, so you don't have to worry about any of that. It also has a more convenient syntax, where you can say d['foo'] = value instead of setattr(o, 'foo', value). And it's probably more efficient. And so on. But ultimately, the reason to use a dict is the conceptual reason: a dict is a named collection of values; a class instance is a representation of a model-space object, and those are not the same thing.
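A minimal sketch of the dict approach (the Settings class and the sample keys are hypothetical): user-chosen names become dictionary keys, so they cannot collide with methods or special attributes of the object that holds them.

```python
class Settings:
    def __init__(self):
        self.values = {}

    def update(self, user_supplied):
        # Keys are just data here, never attribute names.
        self.values.update(user_supplied)

settings = Settings()
settings.update({"nickname": "mallory", "update": "oops", "__class__": "oops"})

print(settings.values["update"])   # stored as plain data
print(callable(settings.update))   # the method itself is untouched
```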
So, why does setattr even exist?
It's there for the same basic reasons as other low-level features, like being able to access im_func or func_closure, or having modules like traceback and imp, or treating special methods just like any other methods, or for that matter exec and eval.
First, you can build higher-level things out of these low-level tools. For example, to build collections.namedtuple, you'd need either exec or setattr.
Second, you occasionally need to monkey-patch code at runtime because you can't modify it (or maybe even see it) at compile time, and tools like setattr can be essential to doing that.
The setattr feature—much like eval—is often misused by people coming from Javascript, Tcl, or a few other languages. But as long as it can be used for good, you don't want to take it out of the language. (TOOWTDI shouldn't be taken so literally that only one program can ever be written.)
But that doesn't mean you should go around using this stuff whenever possible. You wouldn't write mylist.__getitem__(slice(1, 10, 2)) instead of mylist[1:10:2]. Sometimes, being able to call __getitem__ directly or build slice objects explicitly is a foundation for something that lets the rest of your code be more pythonic, or a way to localize a workaround to avoid infecting the rest of your code. Otherwise, there are clearer and simpler ways to do it.

Parsing, securing python expression before passing it to eval()

I want to take input from the user, something like foo() > 90 and boo() == 9 or do() > 100, and use eval on the server side to evaluate this expression.
For security I want to restrict the user to a limited set of functions and operators by checking the input (against some data structure) before I pass it to the eval function.
PS: Input comes from a web page
Thanks

Basically the only way to do this is to parse it yourself. You navigate the parse tree to guarantee that each part is in a whitelist of perfectly benign and safe operations, making the entire expression safe by construction. Ned Batchelder's answer is actually a (simple) form of this. You could pass it to eval() after that, although, what would be the point? You could compute the value of each subexpression as part of verification (this is especially a good idea because it makes your parser resistant to changes in Python syntax and so on). This whitelist must be extremely tiny, and there are a lot of things that you might think are okay, but aren't (e.g. general call operator; getattr function). You have to be very careful.
A blacklist is absolutely out of the question (such as the suggestion to "reject suspicious entries"). Reject anything that is not obviously good. If you don't, it will be trivial to work around your filter and give an expression that does something bad, barring the unlikely possibility that your code is better than any other blacklisting filter for Python ever created.
There have been attempts at restricting Python execution, one is the infamous and now-disabled (because it didn't work) rexec module (and company), and another is PyPy's sandbox. This second option doesn't do exactly what you asked for, but it's certainly worth looking into. It's probably what I would use-- it just means that it won't be as easy as eval(safematize(user_input)).
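The "parse it yourself and compute each subexpression" idea above can be sketched with the standard ast module (a rough illustration, assuming Python 3.8+; the FUNCTIONS table uses made-up stand-ins for the question's foo, boo and do, and evaluate/_eval are hypothetical names). Anything not explicitly handled raises ValueError, so the whitelist is the code itself and eval() is never called:

```python
import ast
import operator

# Hypothetical stand-ins for the server-side functions named in the question.
FUNCTIONS = {"foo": lambda: 95, "boo": lambda: 9, "do": lambda: 50}

COMPARISONS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
    ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne,
}

def evaluate(text):
    """Evaluate a tiny expression language; ast.parse raises SyntaxError on bad syntax."""
    return _eval(ast.parse(text, mode="eval").body)

def _eval(node):
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Call):
        # Only bare, argument-free calls to whitelisted names are allowed.
        if (isinstance(node.func, ast.Name) and node.func.id in FUNCTIONS
                and not node.args and not node.keywords):
            return FUNCTIONS[node.func.id]()
        raise ValueError("call not allowed")
    if isinstance(node, ast.Compare):
        left = _eval(node.left)
        for op, comparator in zip(node.ops, node.comparators):
            compare = COMPARISONS.get(type(op))
            if compare is None:
                raise ValueError("comparison not allowed")
            right = _eval(comparator)
            if not compare(left, right):
                return False
            left = right
        return True
    if isinstance(node, ast.BoolOp):
        # BoolOp is either "and" or "or"; the result is reduced to a plain bool.
        if isinstance(node.op, ast.And):
            return all(_eval(value) for value in node.values)
        return any(_eval(value) for value in node.values)
    raise ValueError("unsupported expression: " + type(node).__name__)

print(evaluate("foo() > 90 and boo() == 9 or do() > 100"))
```

Attribute access, general calls, names and everything else fall through to the final ValueError, which is the "reject anything that is not obviously good" rule in practice.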

The more secure way is to do everything at the back end. Users just key in the necessary parameters. For example, you can prompt them to key in numeric values for foo(), boo() and do(). Then at the back end, pass these values to the appropriate functions to do the calculations.

Perhaps the simplest check would be to look at all the words in the expression, and check them against a whitelist. Reject the expression if any of the words isn't on the whitelist.
import re
expr = "foo() > 90 and boo() == 9 or do() > 100"
whitelist = "and or foo boo do".split()
# \w* (not \w+) so that one-letter identifiers are checked as well
for word in re.findall(r"[a-zA-Z_]\w*", expr):
    if word not in whitelist:
        raise Exception("Warning! Warning!")
This works because you have a limited domain that you need users to be able to express themselves in, and also because I don't think there's a way to cause damage with eval without using identifiers.
You'll have to be careful that your whitelist doesn't inadvertently include possibly malicious Python identifiers, though.

You need to lock down the input format, or it will be a gaping security hole. Either implement a full blown parser, as lpthnc suggests, with a reasonable set of operations (but no more), or at least use a regular expression (or several regex patterns in a matching hierarchy and/or loop) to strip out recognized patterns, and reject suspicious entries as "not allowed".

Safety of Python 'eval' For List Deserialization

Are there any security exploits that could occur in this scenario:
eval(repr(unsanitized_user_input), {"__builtins__": None}, {"True":True, "False":False})
where unsanitized_user_input is a str object. The string is user-generated and could be nasty. Assuming our web framework hasn't failed us, it's a real honest-to-god str instance from the Python builtins.
If this is dangerous, can we do anything to the input to make it safe?
We definitely don't want to execute anything contained in the string.
See also:
Funny blog post about eval safety
Previous Question
Blog: Fast deserialization in Python
The larger context which is (I believe) not essential to the question is that we have thousands of these:
repr([unsanitized_user_input_1,
unsanitized_user_input_2,
unsanitized_user_input_3,
unsanitized_user_input_4,
...])
in some cases nested:
repr([[unsanitized_user_input_1,
unsanitized_user_input_2],
[unsanitized_user_input_3,
unsanitized_user_input_4],
...])
which are themselves converted to strings with repr(), put in persistent storage, and eventually read back into memory with eval.
Eval deserialized the strings from persistent storage much faster than pickle and simplejson. The interpreter is Python 2.5 so json and ast aren't available. No C modules are allowed and cPickle is not allowed.

It is indeed dangerous and the safest alternative is ast.literal_eval (see the ast module in the standard library). You can of course build and alter an ast to provide e.g. evaluation of variables and the like before you eval the resulting AST (when it's down to literals).
The possible exploit of eval starts with any object it can get its hands on (say True here) and goes via .__class__ to its type object, etc. up to object, then gets its subclasses... basically it can get to ANY object type and wreak havoc. I can be more specific but I'd rather not do it in a public forum (the exploit is well known, but considering how many people still ignore it, revealing it to wannabe script kiddies could make things worse... just avoid eval on unsanitized user input and live happily ever after!-).
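The first, harmless step of that walk can be shown without any exploit (a sketch, assuming Python 3, where True is a keyword; it uses an empty "__builtins__" dict rather than the None from the question). Even with no builtins available, attribute access alone reaches object and the list of every class currently loaded:

```python
# No names are looked up here, only attributes of the constant True.
expression = "True.__class__.__mro__[-1].__subclasses__()"
classes = eval(expression, {"__builtins__": {}}, {})

print(len(classes) > 0)
print(all(isinstance(cls, type) for cls in classes))
```

This only lists classes; the point is that restricting the dictionaries does not restrict what is reachable through attributes.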

If you can prove beyond doubt that unsanitized_user_input is a str instance from the Python built-ins with nothing tampered, then this is always safe. In fact, it'll be safe even without all those extra arguments since eval(repr(astr)) == astr for all such string objects. You put in a string, you get back out a string. All you did was escape and unescape it.
This all leads me to think that eval(repr(x)) isn't what you want--no code will ever be executed unless someone gives you an unsanitized_user_input object that looks like a string but isn't, but that's a different question--unless you're trying to copy a string instance in the slowest way possible :D.
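A quick way to see the round trip (a sketch; the sample string is invented and deliberately looks like code): repr() turns the whole input into a single string literal, so eval() only rebuilds the string and never runs what is inside it.

```python
# Looks like code, but it is only ever treated as text.
nasty = "__import__('os').getcwd() # \" ' \\ \n"

stored = repr(nasty)                                # a quoted, escaped string literal
restored = eval(stored, {"__builtins__": {}}, {})   # evaluates that literal only

print(restored == nasty)
print(type(restored).__name__)
```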

With everything as you describe, it is technically safe to eval repred strings, however, I'd avoid doing it anyway as it's asking for trouble:
There could be some weird corner case where your assumption that only repred strings are stored turns out to be false (e.g. a bug, or a different pathway into the storage that doesn't repr). That instantly becomes a code injection exploit where it might otherwise be unexploitable.
Even if everything is OK now, assumptions might change at some point, and unsanitised data may get stored in that field by someone unaware of the eval code.
Your code may get reused (or worse, copy+pasted) into a situation you didn't consider.
As Alex Martelli pointed out, in Python 2.6 and higher, there is ast.literal_eval which will safely handle both strings and other simple datatypes like tuples. This is probably the safest and most complete solution.
Another possibility however is to use the string-escape codec. This is much faster than eval (about 10 times according to timeit), available in earlier versions than literal_eval, and should do what you want:
>>> s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
>>> repr(s)[1:-1].decode('string-escape') == s
True
(The [1:-1] is to strip the outer quotes repr adds.)

Generally, you should never allow anyone to post code.
So called "paid professional programmers" have a hard-enough time writing code that actually works.
Accepting code from the anonymous public -- without benefit of formal QA -- is the worst of all possible scenarios.
Professional programmers -- without good, solid formal QA -- will make a hash of almost any web site. Indeed, I'm reverse engineering some unbelievably bad code from paid professionals.
The idea of allowing a non-professional -- unencumbered by QA -- to post code is truly terrifying.

repr([unsanitized_user_input_1,
unsanitized_user_input_2,
...
... unsanitized_user_input is a str object
You shouldn't have to serialise strings to store them in a database.
If these are all strings, as you mentioned, why can't you just store the strings in a db.StringListProperty?
The nested entries might be a bit more complicated, but why is this the case? When you have to resort to eval to get data from the database, you're probably doing something wrong.
Couldn't you store each unsanitized_user_input_x as its own db.StringProperty row, and group them by a reference field?
Either of those may not be applicable, since I've no idea what you're trying to achieve, but my point is: can you not structure the data in a way where you don't have to rely on eval (and also rely on it not being a security issue)?

Is there a way to make a user-defined Python function act like a built-in statement?

Is it possible to make a user-defined Python function act like a statement? In other words, I'd like to be able to say:
myfunc
rather than:
myfunc()
and have it get called anyway -- the way that, say, print would.
I can already hear you all composing responses about how this is a horrible thing to do and I'm stupid for asking it and why do I want to do this and I really should do something else instead, but please take my word that it's something I need to do to debug a problem I'm having and it's not going to be checked in or used for, like, an air traffic control system.

No, it is not possible.
As you can see from the Language Reference, there is no room left for extensions of the list of simple statements in the specification.
Moreover, print as a statement no longer exists in Python 3.0 and is replaced by the print() builtin function.

If what you're looking for is to add a new statement (like print) to Python's language, then this would not be easy. You'd probably have to modify lexer, parser and then recompile Python's C sources. A lot of work to do for a questionable convenience.

I would not implement this, but if I were implementing this, I would give code with myfunc a special extension, write an import hook to parse the file, add the parentheses to make it valid Python, and feed that into the interpreter.

Not if you want to pass in arguments. You could do something like build an object that ABUSES the __str__ method, but it is highly not recommended. You can also overload another operator, such as <<, the way cout does in C++.

In Python 2.x, print is not a function; it is a statement, just as if, while and def are statements.

Not possible in a planned way, or without a lot of work.
If you are bold and adventurous, read this Wikipedia article about meta-circular evaluation. Python has pretty good inspection and reflection on its own compiler/evaluator objects, so you may be able to cobble something together along these lines.
"""Meta-circular implementations are suited to extending the language
they are written in. They are also useful for writing tools that are
tightly integrated with the programming language, such as
sophisticated debuggers. A language designed with a meta-circular
implementation in mind is often more suited for building languages in
general, even ones completely different from the host language."""
http://en.wikipedia.org/wiki/Meta-circular_evaluator
I believe PyPy is doing something similar; you might want to look into it.
http://pypy.org

This probably isn't going to cover your problem, but I'll mention it anyway. If myfunc is part of a module, and you are using it like this:
from mymodule import myfunc
myfunc # I want this to turn into a function call
Then you could instead do this:
import mymodule
mymodule.myfunc # I want this to turn into a function call
You could then remove myfunc from mymodule and overload the module so it calls a particular function each time the myfunc member is requested.
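One way to sketch that module trick (an illustration, assuming Python 3; CallingModule and the in-memory mymodule object are hypothetical, and a real setup would also have to install the object in sys.modules) is a module subclass whose myfunc attribute is a property, so that merely reading the attribute calls the function:

```python
import types

calls = []

def myfunc():
    # The function we want to run without writing parentheses.
    calls.append("called")

class CallingModule(types.ModuleType):
    @property
    def myfunc(self):
        # Reading mymodule.myfunc triggers the call.
        return myfunc()

mymodule = CallingModule("mymodule")

mymodule.myfunc  # no parentheses, but myfunc() runs
mymodule.myfunc

print(len(calls))
```

This only works for attribute access on the module object, not for a bare myfunc name, which matches the limitation described above.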

Is "safe_eval" really safe?

I'm looking for a "safe" eval function, to implement spreadsheet-like calculations (using numpy/scipy).
The functionality to do this (the rexec module) has been disabled in Python since 2.3 due to apparently unfixable security problems. There are several third-party hacks out there that purport to do this; the most thought-out solution that I have found is this Python Cookbook recipe, "safe_eval".
Am I reasonably safe if I use this (or something similar), to protect from malicious code, or am I stuck with writing my own parser? Does anyone know of any better alternatives?
EDIT: I just discovered RestrictedPython, which is part of Zope. Any opinions on this are welcome.

Depends on your definition of safe I suppose. A lot of the security depends on what you pass in and what you are allowed to pass in the context. For instance, if a file is passed in, I can open arbitrary files:
>>> names['f'] = open('foo', 'w+')
>>> safe_eval.safe_eval("baz = type(f)('baz', 'w+')", names)
>>> names['baz']
<open file 'baz', mode 'w+' at 0x413da0>
Furthermore, the environment is very restricted (you cannot pass in modules), thus, you can't simply pass in a module of utility functions like re or random.
On the other hand, you don't need to write your own parser, you could just write your own evaluator for the python ast:
>>> import compiler
>>> ast = compiler.parse("print 'Hello world!'")
That way, hopefully, you could implement safe imports. The other idea is to use Jython or IronPython and take advantage of Java/.Net sandboxing capabilities.

Writing your own parser could be fun! It might be a better option because people are expecting to use the familiar spreadsheet syntax (Excel, etc) and not Python when they're entering formulas. I'm not familiar with safe_eval but I would imagine that anything like this certainly has the potential for exploitation.

If you simply need to write down and read some data structure in Python, and don't need the actual capacity of executing custom code, this one is a better fit:
http://code.activestate.com/recipes/364469-safe-eval/
It guarantees that no code is executed; only static data structures are evaluated: strings, lists, tuples, dictionaries.

Although that code looks quite secure, I've always held the opinion that any sufficiently motivated person could break it given adequate time. I do think it will take quite a bit of determination to get through that, but I'm relatively sure it could be done.

Daniel,
Jinja implements a sandboxed environment that may or may not be useful to you. From what I remember, it doesn't yet "comprehend" list comprehensions.
Sandbox info

The functionality you want is in the compiler language services, see
http://docs.python.org/library/language.html
If you define your app to accept only expressions, you can compile the input as an expression and get an exception if it is not, e.g. if there are semicolons or statement forms.
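For instance, a minimal sketch using the built-in compile() in "eval" mode (is_expression is a hypothetical helper name): statements and semicolon-separated input raise SyntaxError, while a single expression compiles.

```python
def is_expression(text):
    """Return True if text parses as a single Python expression."""
    try:
        compile(text, "<user input>", "eval")
    except SyntaxError:
        return False
    return True

print(is_expression("1 + 2"))
print(is_expression("x = 1"))
print(is_expression("1; 2"))
print(is_expression("__import__('os')"))
```

Note that the last input is a perfectly valid expression, so this check only constrains the form of the input; it says nothing about whether evaluating it is safe.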
