I'm looking for a "safe" eval function, to implement spreadsheet-like calculations (using numpy/scipy).
The functionality to do this (the rexec module) has been removed from Python since 2.3 due to apparently unfixable security problems. There are several third-party hacks out there that purport to do this - the most thought-out solution that I have found is
this Python Cookbok recipe, "safe_eval".
Am I reasonably safe if I use this (or something similar), to protect from malicious code, or am I stuck with writing my own parser? Does anyone know of any better alternatives?
EDIT: I just discovered RestrictedPython, which is part of Zope. Any opinions on this are welcome.
Depends on your definition of safe I suppose. A lot of the security depends on what you pass in and what you are allowed to pass in the context. For instance, if a file is passed in, I can open arbitrary files:
>>> names['f'] = open('foo', 'w+')
>>> safe_eval.safe_eval("baz = type(f)('baz', 'w+')", names)
>>> names['baz']
<open file 'baz', mode 'w+' at 0x413da0>
Furthermore, the environment is very restricted (you cannot pass in modules), thus, you can't simply pass in a module of utility functions like re or random.
On the other hand, you don't need to write your own parser, you could just write your own evaluator for the python ast:
>>> import compiler
>>> ast = compiler.parse("print 'Hello world!'")
That way, hopefully, you could implement safe imports. The other idea is to use Jython or IronPython and take advantage of Java/.Net sandboxing capabilities.
Writing your own parser could be fun! It might be a better option because people are expecting to use the familiar spreadsheet syntax (Excel, etc) and not Python when they're entering formulas. I'm not familiar with safe_eval but I would imagine that anything like this certainly has the potential for exploitation.
If you simply need to write down and read some data structure in Python, and don't need the actual capacity of executing custom code, this one is a better fit:
http://code.activestate.com/recipes/364469-safe-eval/
It garantees that no code is executed, only static data structures are evaluated: strings, lists, tuples, dictionnaries.
Although that code looks quite secure, I've always held the opinion that any sufficiently motivated person could break it given adequate time. I do think it will take quite a bit of determination to get through that, but I'm relatively sure it could be done.
Daniel,
Jinja implements a sandboxe environment that may or may not be useful to you. From what I remember, it doesn't yet "comprehend" list comprehensions.
Sanbox info
The functionality you want is in the compiler language services, see
http://docs.python.org/library/language.html
If you define your app to accept only expressions, you can compile the input as an expression and get an exception if it is not, e.g. if there are semicolons or statement forms.
Related
When I try to view the built-in function all() in PyCharm, I could just see "pass" in the function body. How to view the actual implementation so that I could know what exactly the built-in function is doing?
def all(*args, **kwargs): # real signature unknown
"""
Return True if bool(x) is True for all values x in the iterable.
If the iterable is empty, return True.
"""
pass
Assuming you’re using the usual CPython interpreter, all is a builtin function object, which just has a pointer to a compiled function statically linked into the interpreter (or libpython). Showing you the x86_64 machine code at that address probably wouldn’t be very useful to the vast majority of people.
Try running your code in PyPy instead of CPython. Many things that are builtins in CPython are plain old Python code in PyPy.1 Of course that isn’t always an option (e.g., PyPy doesn’t support 3.7 features yet, there are a few third-party extension modules that are still way too slow to use, it’s harder to build yourself if you’re on some uncommon platform…), so let’s go back to CPython.
The actual C source for that function isn’t too hard to find online. It’s in bltinmodule.c. But, unlike the source code to the Python modules in your standard library, you probably don’t have these files around. Even if you do have them, the only way to connect the binary to the source is through debugging output emitted when you compiled CPython from that source, which you probably didn’t do. But if you’re thinking that sounds like a great idea—it is. Build CPython yourself (you may want a Py_DEBUG build), and then you can just run it in your C source debugger/IDE and it can handle all the clumsy bits.
But if that sounds more scary than helpful, even though you can read basic C code and would like to find it…
How did I know where to find that code on GitHub? Well, I know where the repo is; I know the basic organization of the source into Python, Objects, Modules, etc.; I know how module names usually map to C source file names; I know that builtins is special in a few ways…
That’s all pretty simple stuff. Couldn’t you just program all that knowledge into a script, which you could then build a PyCharm plugin out of?
You can do the first 50% or so in a quick evening hack, and such things litter the shores of GitHub. But actually doing it right requires handling a ton of special cases, parsing some ugly C code, etc. And for anyone capable of writing such a thing, it’s easier to just lldb Python than to write it.
1. Also, even the things that are builtins are written in a mix of Python and a Python subset called RPython, which you might find easier to understand than C—then again, it’s often even harder to find that source, and the multiple levels that all look like Python can be hard to keep straight.
What i mean is, how is the syntax defined, i.e. how can i make my own constructs like these?
I realise in a lot of languages, things like this will be built into the compiler / spec, and so it's dealt with by the compiler (at least that how i understand it to work).
But with python, everything i've come across so far has been accessible to the programmer, and so you more or less have the freedom to do whatever you want.
How would i go about writing my own version of for or while? Is it even possible?
I don't have any actual application for this, so the answer to any WHY?! questions is just "because why not?" or "curiosity".
No, you can't, not from within Python. You can't add new syntax to the language. (You'd have to modify the source code of Python itself to make your own custom version of Python.)
Note that the iterator protocol allows you to define objects that can be used with for in a custom way, which covers a lot of the possible use cases of writing your own iteration syntax.
Well, you have a couple of options for creating your own syntax:
Write a higher-order function, like map or reduce.
Modify python at the C level. This is, as you might expect, relatively easy as compared with fiddling with many other languages. See this article for an example: http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/
Fake it using the debug facilities, or the encodings facility. See this code: http://entrian.com/goto/download.html and http://timhatch.com/projects/pybraces/
Use a preprocessor. Here's one project that tries to make this easy: http://www.fiber-space.de/langscape/doc/index.html
Use of the python facilities built in to achieve a similar effect (decorators, metaclasses, and the like).
Obviously, none of this is quite what you're looking for, but python, unlike smalltalk or lisp, isn't (necessarily) programmed in itself and guarantees to expose its own underlying execution and parsing mechanisms at runtime.
You can't make equivalent constructs. for, while, if etc. are statements, and they are built into the language with their own specific syntax. There are languages that do allow this sort of thing though (to some degree), such as Scala.
while, print, for etc. are keywords. That means they are parsed by the python parser whilst reading the code, stripped any redundant characters and result in tokens. Afterwards a lexer takes those tokens as input and builds a program tree which is then excuted by the interpreter. Said so, those constructs are used only as syntactic sugar for underlying lexical machinery and as such are not visible from inside the code.
i am creating ( researching possibility of ) a highly customizable python client and would like to allow users to actually edit the code in another language to customize the running of program. ( analogous to browser which itself coded in c/c++ and run another language html/js ). so my question is , is there any programming language implemented in pure python which i can see as a reference ( or use directly ? ) -- i need simple language ( simple statements and ifs can do )
edit: sorry if i did not make myself clear but what i want is "a language to customize the running of program" , even though pypi seems a great option, what i am looking for is more simple which i can study and extend myself if need arise. my google searches pointing towards xml based langagues. ( BMEL , XForms etc ).
The question isn't completely clear on scope, but I have a hunch that PyPy, embedding other full languages, and similar solutions might be overkill. It sounds like iamgopal may really be interested in something more like Interpreter Pattern or Little Language.
If the language you want to support is really small (see the Interpreter Pattern link), then hand-coding this yourself in Python won't be too hard. You can write a simple parser (Google around; here's one example), then walk the AST and evaluate user expressions.
However, if you expect this to be used for a long time or by many people, it may be worth throwing a real language at the problem. (I'd recommend Python itself if your users are already familiar with basic Python syntax).
Ren'Py is a modification to Python syntax built on top of Python itself, using the language tools in the stdlib.
For your user's sake, don't use an XML based language - XML is an awful basis for a programming language and your users will hate you for it.
Here is a suggestion. Use a strict subset of Python for your language. Use the compiler module to convert their code into an abstract syntax tree and walk the tree to to validate that the code conforms to your subset before converting the AST into python bytecode.
N.B. I just checked the docs and see that the compiler package is deprecated in 2.6 and removed in Python 3.x. Does anyone know why that is?
Numerous template languages such as Cheetah, Django templates, Genshi, Mako, Mighty might serve as an example.
Why not Python itself? With some care you can use eval to run user code.
One of the good thing about interpreted scripting languages is that you don't need another extra scripting language!
PLY (Python Lex-Yacc)
is something of your interest.
Possibly Common Lisp (or any other Lisp) will be the best choice for that task. Because Lisp make it possible to easily extend host language with powerful macroses and construct DSL (domain specific language).
If all you need is simple if statements and expressions, I'm sure it wouldn't be an awful task to parse each line. Something like
if some flag
activate some feature
deactivate some feature
elif some other flag
activate some feature
activate some feature
else
logout
Just write a class which, while parsing takes the first word, checks if it's "if, elif, else," etc, and if so, check a flag and set a flag saying you either are or are not executing until the next conditional. If it's not a conditional, call a function based on the first keyword that would modify the program state in some way.
The class could store some local execution state (are we in an if statement? If so are we executing this branch?) and have another class containing some global application state (flags that are checkable by if statements, etc).
This is probably the wrong thing to do in your situation (it's very prone to bugs, it's dangerous if you don't treat the data in the scripts correctly), but it's at least a start if you do decide to interpret your own mini-language.
Seriously though, if you try this, be very, very, srs careful. Don't give the scripts any functionality that they don't definitely need, because you are almost certainly opening security holes by doing something like this.
Don't say I didn't warn you.
If I am evaluating a Python string using eval(), and have a class like:
class Foo(object):
a = 3
def bar(self, x): return x + a
What are the security risks if I do not trust the string? In particular:
Is eval(string, {"f": Foo()}, {}) unsafe? That is, can you reach os or sys or something unsafe from a Foo instance?
Is eval(string, {}, {}) unsafe? That is, can I reach os or sys entirely from builtins like len and list?
Is there a way to make builtins not present at all in the eval context?
There are some unsafe strings like "[0] * 100000000" I don't care about, because at worst they slow/stop the program. I am primarily concerned about protecting user data external to the program.
Obviously, eval(string) without custom dictionaries is unsafe in most cases.
eval() will allow malicious data to compromise your entire system, kill your cat, eat your dog and make love to your wife.
There was recently a thread about how to do this kind of thing safely on the python-dev list, and the conclusions were:
It's really hard to do this properly.
It requires patches to the python interpreter to block many classes of attacks.
Don't do it unless you really want to.
Start here to read about the challenge: http://tav.espians.com/a-challenge-to-break-python-security.html
What situation do you want to use eval() in? Are you wanting a user to be able to execute arbitrary expressions? Or are you wanting to transfer data in some way? Perhaps it's possible to lock down the input in some way.
You cannot secure eval with a blacklist approach like this. See Eval really is dangerous for examples of input that will segfault the CPython interpreter, give access to any class you like, and so on.
You can get to os using builtin functions: __import__('os').
For python 2.6+, the ast module may help; in particular ast.literal_eval, although it depends on exactly what you want to eval.
Note that even if you pass empty dictionaries to eval(), it's still possible to segfault (C)Python with some syntax tricks. For example, try this on your interpreter: eval("()"*8**5)
You are probably better off turning the question around:
What sort of expressions are you wanting to eval?
Can you insure that only strings matching some narrowly defined syntax are eval()d?
Then consider if that is safe.
For example, if you are wanting to let the user enter an algebraic expression for evaluation, consider limiting them to one letter variable names, numbers, and a specific set of operators and functions. Don't eval() strings containing anything else.
There is a very good article on the un-safety of eval() in Mark Pilgrim's Dive into Python tutorial.
Quoted from this article:
In the end, it is possible to safely
evaluate untrusted Python expressions,
for some definition of “safe” that
turns out not to be terribly useful in
real life. It’s fine if you’re just
playing around, and it’s fine if you
only ever pass it trusted input. But
anything else is just asking for
trouble.
Is it possible to make a user-defined Python function act like a statement? In other words, I'd like to be able to say:
myfunc
rather than:
myfunc()
and have it get called anyway -- the way that, say, print would.
I can already hear you all composing responses about how this is a horrible thing to do and I'm stupid for asking it and why do I want to do this and I really should do something else instead, but please take my word that it's something I need to do to debug a problem I'm having and it's not going to be checked in or used for, like, an air traffic control system.
No, it is not possible.
As you can see from the Language Reference, there is no room left for extensions of the list of simple statements in the specification.
Moreover, print as a statement no longer exists in Python 3.0 and is replaced by the print() builtin function.
If what you're looking for is to add a new statement (like print) to Python's language, then this would not be easy. You'd probably have to modify lexer, parser and then recompile Python's C sources. A lot of work to do for a questionable convenience.
I would not implement this, but if I was implementing this, I would give code with myfunc a special extension, write an import hook to parse the file, add the parenthesis to make it valid Python, and feed that into the interpreter.
Not if you want to pass in arguments. You could do something build an object that ABUSES the __str__ method, but it is highly not recommended. You can also use other operators like overload the << operator like cout does in C++.
In Python 2.x print is not a function it is a statement just as if, while and def are statements.
Not possible in a planned way, or without a lot of work.
If you are bold and adventurous, read this wikipedia article about meta circular evaluation. Python has pretty good inspection and reflection on its own compiler/evaluater objects, you may be able to cobble something together along these lines.
"""Meta-circular implementations are suited to extending the language
they are written in. They are also useful for writing tools that are
tightly integrated with the programming language, such as
sophisticated debuggers. A language designed with a meta-circular
implementation in mind is often more suited for building languages in
general, even ones completely different from the host language."""
http://en.wikipedia.org/wiki/Meta-circular_evaluator
I believe pypy is doing something similarily, you might want to look into it.
http://pypy.org
This probably isn't going to cover your problem, but I'll mention it anyway. If myfunc is part of a module, and you are using it like this:
from mymodule import myfunc
myfunc # I want this to turn into a function call
Then you could instead do this:
import mymodule
mymodule.myfunc # I want this to turn into a function call
You could then remove myfunc from mymodule and overload the module so it calls a particular function each time the myfunc member is requested.