Evaluate a string arithmetic expression using Python

How can I evaluate the following?

a = b = c = d = e = 0  # initially

If the user enters "a=b=4" as a string, it should modify the existing values, so the result would be a=4, b=4.
If the user enters "a=(c=4)*2", it should evaluate the expression and update the values, so the result would be a=8, c=4.
The brackets can be nested further.
Any help would be really appreciated. I am using Python.

If this is a trivial console script where security is absolutely no concern, you can use exec() to execute mathematical statements. However, Python does not support = as an expression, so code such as a=(c=4)*2 is a syntax error and won't run natively.
That said, exec() is a gaping security hole if this is running on, say, a web server. If untrusted and potentially malicious users are expected to submit commands, you should look into either sanitizing and parsing the input yourself, or implementing sandboxing.
TL;DR: Since you need custom syntax that Python doesn't support, you should write your own parser to handle and execute these statements, which also frees you from worrying about executing untrusted code.
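For instance, here is a minimal sketch of such a parser (all names are illustrative): a recursive-descent grammar in which assignment is itself an expression, so chained and parenthesized assignments like a=b=4 and a=(c=4)*2 both work against a dictionary of variables.

import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[=+\-*/()])")

def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError("bad character at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    # Grammar: assignment := NAME '=' assignment | sum
    #          sum        := term (('+' | '-') term)*
    #          term       := factor (('*' | '/') factor)*
    #          factor     := NUMBER | NAME | '(' assignment ')'
    def __init__(self, tokens, env):
        self.tokens, self.pos, self.env = tokens, 0, env

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def assignment(self):
        nxt = self.tokens[self.pos + 1] if self.pos + 1 < len(self.tokens) else None
        if self.peek() is not None and self.peek().isidentifier() and nxt == "=":
            name = self.advance()
            self.advance()                # consume '='
            value = self.assignment()     # right-associative, so a=b=4 works
            self.env[name] = value        # assignment is an expression here
            return value
        return self.sum()

    def sum(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.advance(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.advance(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.advance()
        if tok == "(":
            value = self.assignment()     # assignments allowed inside parens
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok.isidentifier():
            return self.env.get(tok, 0)   # variables start out as 0
        raise SyntaxError("unexpected token %r" % tok)

env = {}
for line in ("a=b=4", "a=(c=4)*2"):
    Parser(tokenize(line), env).assignment()
print(env)   # {'b': 4, 'a': 8, 'c': 4}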

Probably the best way is to write a parser for this particular grammar (the worst is anything involving eval/exec; submitting user-provided content to those functions is a security can of worms).
Take a look at this example:
http://pyparsing.wikispaces.com/file/detail/SimpleCalc.py

Related

How to store a python function within an XML file

I'm developing a GUI application in Python that stores its documents in an XML-based format. The application is a mathematical model with several pre-defined components that can be drag-and-dropped. I'd also like the user to be able to create custom components by writing a Python function inside an editor provided within the application. My issue is with storing these functions in the XML.
A function might look something like this:
def func(node, timestamp):
    return node.weight * timestamp.day + 4
These functions are wrapped in an object which provides a standard way of calling them (compared to the pre-defined components). If I were to create one from Python directly, it would look like this:
parameter = ParameterFunction(func)
The function is then called by the model like this:
parameter.value(node=node, timestamp=timestamp)
The ParameterFunction object has a to_xml and from_xml functions which need to serialise/deserialise the object to/from an XML representation.
My question is: how do I store the Python functions in an XML document?
One solution I have thought of so far is to store the function definition as a string, eval() or exec() it for use but keep the string, and then store the string in a CDATA block in the XML. Are there any issues with this that I'm not seeing?
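Roughly what I have in mind (a sketch assuming lxml, whose etree supports writing CDATA sections):

from lxml import etree

source = "def func(node, timestamp):\n    return node.weight * timestamp.day + 4\n"

# Serialize: keep the raw source string inside a CDATA section.
element = etree.Element("parameter")
element.text = etree.CDATA(source)
xml_bytes = etree.tostring(element)

# Deserialize: exec the stored string and pull the function back out.
namespace = {}
exec(etree.fromstring(xml_bytes).text, namespace)
func = namespace["func"]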
An alternative would be to store all of the Python code in a separate file and have the XML reference just the function names. This could be nice, as the code could be edited easily in an external editor. In that case, what is the best way to import the code? I am envisaging fighting with the Python import path...
I'm aware there are will be security concerns with running untrusted code, but I'm willing to make this tradeoff for the freedom it gives users.
The specific application I'm referring to is on github. I'm happy to provide more information if it's needed, but I've tried to keep it fairly generic here. https://github.com/snorfalorpagus/pywr/blob/120928eaacb9206701ceb9bc91a5d73740db1953/pywr/core.py#L396-L402
Nope, you have the easiest and best solution that I can think of. Just keep them as strings, as long as you're not worried about running the untrusted code.
The way I'd deal with external Python scripts containing tiny snippets like yours is to treat them as plain text files and read them in as strings. This avoids all the problems with importing them. Just read them in and call exec() on them, and the functions will exist in scope (see the sketch below).
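A minimal sketch of that approach (the file name is illustrative, and I'm assuming the snippet defines a function named func as in your example):

# Read a user-written snippet as plain text and exec it into a private
# namespace; no import machinery involved.
namespace = {}
with open("custom_component.py") as f:    # hypothetical file name
    exec(f.read(), namespace)

parameter = ParameterFunction(namespace["func"])   # wrap it as before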
EDIT: I was going to add something on sandboxing Python code, but after a bit of research it seems this will not be an easy task; it would be easier to sandbox the entire program. Another longer and harder way to restrict the untrusted code would be to create your own tiny interpreter that only performs safe operations (i.e. mathematical operations, calling existing functions, etc.).

How do I sanitize a list comprehension given by a user?

I am working on an interface for a simulator that is meant to be friendly to people who prefer the command line to a GUI. To give the simulator the levels, the user types the information into a file, which is then parsed and generates design points, which are then sent to a main server.
I would like to implement some sort of "range" feature so that the user will not need to type out all of the individual levels; more power is needed than a simple additive sequence. Since the parser and related code are already in Python, this seems like a perfect use case for list comprehensions. However, the list comprehension is user input and not guaranteed to be valid. Using eval seems too dangerous, and ast.literal_eval does not support list comprehensions.
My current goal is for something like this to be valid and safe:
{"Factor 1": [1,2,3,7,8],
"Factor 2": "[2**x for x in range(5,20) if (x % 3) == 0]"}
The base format for the files the user types is JSON. I am looking to extend the language with additional features (like ranges) to fill various user needs. "Factor 1" can be parsed by the existing system. The list comprehension will be evaluated on the user's machine, so simple attacks like 'x'*9**999999**99999 are self-destructive.
It seems relatively easy to sanitize the range function using a regex, but I'm not sure how to make sure that the other parts are safe. Are regexes sufficient for this task, or is there another approach I should be following?
Further analysis suggests that eval is less dangerous than usual here. The parsing is all done client-side, and the code of the project is open source. Therefore, anything malicious the user could do by exploiting eval is either self-destructive or possible through less convoluted means. So I can just use eval to generate the list of levels.
Of course, I will have to heavily document this decision due to the "eval is evil; kill it with fire" reaction that most people (including myself) have towards its use.
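That said, if sanitizing ever becomes necessary, one hedged sketch (the whitelist and the safe_levels name are illustrative) is to parse the string with the ast module and reject any node type that a range-style comprehension doesn't need:

import ast

# Node types a "range comprehension" plausibly needs; everything else is rejected.
ALLOWED = (
    ast.Expression, ast.ListComp, ast.comprehension,
    ast.Name, ast.Load, ast.Store, ast.Call, ast.Constant,
    ast.BinOp, ast.Compare, ast.BoolOp,
    ast.Add, ast.Sub, ast.Mult, ast.Pow, ast.Mod,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.And, ast.Or,
)

def safe_levels(expr):
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("disallowed syntax: " + type(node).__name__)
        # Only the range() builtin may be called.
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) != "range":
            raise ValueError("only range() calls are allowed")
    return eval(compile(tree, "<levels>", "eval"), {"__builtins__": {}, "range": range})

print(safe_levels("[2**x for x in range(5,20) if (x % 3) == 0]"))
# [64, 512, 4096, 32768, 262144]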

Programming language implemented in pure Python

I am creating (researching the possibility of) a highly customizable Python client and would like to allow users to edit code in another language to customize the running of the program (analogous to a browser, which is itself coded in C/C++ but runs another language, HTML/JS). So my question is: is there any programming language implemented in pure Python which I can use as a reference (or use directly)? I need a simple language; simple statements and ifs would do.
Edit: Sorry if I did not make myself clear, but what I want is "a language to customize the running of the program". Even though PyPy seems a great option, what I am looking for is something simpler which I can study and extend myself if the need arises. My Google searches point towards XML-based languages (BMEL, XForms, etc.).
The question isn't completely clear on scope, but I have a hunch that PyPy, embedding other full languages, and similar solutions might be overkill. It sounds like iamgopal may really be interested in something more like the Interpreter Pattern or a Little Language.
If the language you want to support is really small (see the Interpreter Pattern link), then hand-coding it yourself in Python won't be too hard. You can write a simple parser (Google around; here's one example), then walk the AST and evaluate user expressions.
However, if you expect this to be used for a long time or by many people, it may be worth throwing a real language at the problem. (I'd recommend Python itself if your users are already familiar with basic Python syntax).
Ren'Py is a modification to Python syntax built on top of Python itself, using the language tools in the stdlib.
For your user's sake, don't use an XML based language - XML is an awful basis for a programming language and your users will hate you for it.
Here is a suggestion: use a strict subset of Python for your language. Use the compiler module to convert the code into an abstract syntax tree, and walk the tree to validate that the code conforms to your subset before converting the AST into Python bytecode.
N.B. I just checked the docs and see that the compiler package is deprecated in 2.6 and removed in Python 3.x (where the ast module and the built-in compile() cover the same ground). Does anyone know why that is?
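A hedged sketch of the same idea using the modern ast module (the whitelist here is illustrative and deliberately tiny):

import ast

# Accept only assignments, if statements, comparisons and basic arithmetic.
ALLOWED = (ast.Module, ast.Expr, ast.Assign, ast.If, ast.Compare, ast.Name,
           ast.Load, ast.Store, ast.Constant, ast.BinOp, ast.Add, ast.Sub,
           ast.Mult, ast.Div, ast.Eq, ast.NotEq, ast.Lt, ast.Gt)

def compile_subset(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise SyntaxError(type(node).__name__ + " is not in the allowed subset")
    return compile(tree, "<user code>", "exec")

env = {}
exec(compile_subset("x = 1\nif x > 0:\n    y = x + 2"), env)
print(env["y"])   # 3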
Numerous template languages such as Cheetah, Django templates, Genshi, Mako, and Myghty might serve as examples.
Why not Python itself? With some care you can use eval to run user code.
One of the good things about interpreted scripting languages is that you don't need yet another scripting language!
PLY (Python Lex-Yacc) may be of interest to you.
Possibly Common Lisp (or any other Lisp) would be the best choice for that task, because Lisp makes it easy to extend the host language with powerful macros and to construct DSLs (domain-specific languages).
If all you need is simple if statements and expressions, I'm sure it wouldn't be an awful task to parse each line. Something like:

if some flag
    activate some feature
    deactivate some feature
elif some other flag
    activate some feature
    activate some feature
else
    logout
Just write a class which, while parsing, takes the first word of each line and checks whether it's "if", "elif", "else", etc. If so, it checks a flag and sets another flag saying whether or not you are executing until the next conditional. If it's not a conditional, it calls a function, chosen by the first keyword, that modifies the program state in some way.
The class could store some local execution state (are we in an if statement? If so, are we executing this branch?) and have another class containing some global application state (flags that are checkable by if statements, etc.). A sketch follows.
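A hedged sketch of that interpreter (class, flag and command names are all illustrative; no nesting is supported):

class Interpreter:
    def __init__(self, flags, commands):
        self.flags = flags           # global state checkable by conditionals
        self.commands = commands     # first keyword -> state-modifying function
        self.executing = True        # are we inside a taken branch?
        self.taken = False           # has any branch of the current chain run?

    def run(self, script):
        for line in script.splitlines():
            word, _, rest = line.strip().partition(" ")
            if not word:
                continue
            if word == "if":
                self.taken = self.executing = self.flags.get(rest, False)
            elif word == "elif":
                self.executing = not self.taken and self.flags.get(rest, False)
                self.taken = self.taken or self.executing
            elif word == "else":
                self.executing = not self.taken
            elif self.executing:
                self.commands[word](rest)

flags = {"sound": True}
commands = {"activate": lambda arg: print("activating", arg),
            "logout": lambda arg: print("logging out")}
Interpreter(flags, commands).run("""
if sound
activate speakers
else
logout
""")                                 # prints: activating speakers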
This is probably the wrong thing to do in your situation (it's very prone to bugs, and it's dangerous if you don't treat the data in the scripts correctly), but it's at least a start if you do decide to interpret your own mini-language.
Seriously though, if you try this, be very, very careful. Don't give the scripts any functionality that they don't definitely need, because you are almost certainly opening security holes by doing something like this.
Don't say I didn't warn you.

Security of Python's eval() on untrusted strings?

If I am evaluating a Python string using eval() and have a class like:

class Foo(object):
    a = 3
    def bar(self, x):
        return x + self.a
What are the security risks if I do not trust the string? In particular:
Is eval(string, {"f": Foo()}, {}) unsafe? That is, can you reach os or sys or something unsafe from a Foo instance?
Is eval(string, {}, {}) unsafe? That is, can you reach os or sys entirely from builtins like len and list?
Is there a way to make builtins not present at all in the eval context?
There are some unsafe strings like "[0] * 100000000" I don't care about, because at worst they slow/stop the program. I am primarily concerned about protecting user data external to the program.
Obviously, eval(string) without custom dictionaries is unsafe in most cases.
eval() will allow malicious data to compromise your entire system, kill your cat, eat your dog and make love to your wife.
There was recently a thread about how to do this kind of thing safely on the python-dev list, and the conclusions were:
It's really hard to do this properly.
It requires patches to the python interpreter to block many classes of attacks.
Don't do it unless you really want to.
Start here to read about the challenge: http://tav.espians.com/a-challenge-to-break-python-security.html
What situation do you want to use eval() in? Are you wanting a user to be able to execute arbitrary expressions? Or are you wanting to transfer data in some way? Perhaps it's possible to lock down the input in some way.
You cannot secure eval with a blacklist approach like this. See Eval really is dangerous for examples of input that will segfault the CPython interpreter, give access to any class you like, and so on.
You can get to os using builtin functions: __import__('os').
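And even with __builtins__ emptied, an expression can often crawl back to dangerous classes through the object hierarchy. A well-known sketch of the escape (it only finds Popen because subprocess happens to be loaded here):

import subprocess   # imported only so the demo below has something to find

expr = "[c for c in ().__class__.__base__.__subclasses__() if c.__name__ == 'Popen']"
print(eval(expr, {"__builtins__": {}}))   # [<class 'subprocess.Popen'>]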
For Python 2.6+, the ast module may help; in particular ast.literal_eval, although it depends on exactly what you want to eval.
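For example:

import ast

print(ast.literal_eval("{'a': [1, 2, 3]}"))   # literals are fine: {'a': [1, 2, 3]}
ast.literal_eval("__import__('os')")          # raises ValueError: calls aren't literals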
Note that even if you pass empty dictionaries to eval(), it's still possible to segfault (C)Python with some syntax tricks. For example, try this on your interpreter: eval("()"*8**5)
You are probably better off turning the question around:
What sort of expressions do you want to eval?
Can you ensure that only strings matching some narrowly defined syntax ever reach eval()?
Then consider whether that is safe.
For example, if you want to let the user enter an algebraic expression for evaluation, consider limiting them to one-letter variable names, numbers, and a specific set of operators and functions. Don't eval() strings containing anything else.
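A minimal sketch of that whitelist (the SAFE pattern and the safe_eval name are illustrative; note that resource-exhaustion inputs like 9**9**9 still get through):

import math
import re

# Only lowercase names, digits, arithmetic operators, parentheses and spaces.
# No dots, quotes, brackets or underscores, so no attribute access or strings.
SAFE = re.compile(r"[a-z0-9+\-*/() ]+")

def safe_eval(expr, variables):
    if not SAFE.fullmatch(expr):
        raise ValueError("expression contains disallowed characters")
    env = {"__builtins__": {}, "sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}
    return eval(expr, env, dict(variables))

print(safe_eval("sqrt(x*x + y*y)", {"x": 3, "y": 4}))   # 5.0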
There is a very good article on the unsafety of eval() in Mark Pilgrim's Dive into Python tutorial.
Quoted from this article:
In the end, it is possible to safely evaluate untrusted Python expressions, for some definition of “safe” that turns out not to be terribly useful in real life. It’s fine if you’re just playing around, and it’s fine if you only ever pass it trusted input. But anything else is just asking for trouble.

Partial evaluation for parsing

I'm working on a macro system for Python (as discussed here), and one of the things I've been considering is units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying with the idea of allowing syntax to be extended dynamically at runtime.
To do this, I'm considering using a sort of partial evaluation on the code at compile-time. If parsing fails for a given expression, due to a macro for its syntax not being available, the compiler halts evaluation of the function/block and generates the code it already has with a stub where the unknown expression is. When this stub is hit at runtime, the function is recompiled against the current macro set. If this compilation fails, a parse error would be thrown because execution can't continue. If the compilation succeeds, the new function replaces the old one and execution continues.
The biggest issue I see is that you can't find parse errors until the affected code is run. However, this wouldn't affect many cases, e.g. group operators like [], {}, (), and `` still need to be paired (requirement of my tokenizer/list parser), and top-level syntax like classes and functions wouldn't be affected since their "runtime" is really load time, where the syntax is evaluated and their objects are generated.
Aside from the implementation difficulty and the problem I described above, what problems are there with this idea?
Here are a few possible problems:
You may find it difficult to provide the user with helpful error messages in case of a problem. This seems likely, as any compilation-time syntax error could be just a syntax extension.
Performance hit.
I was trying to find some discussion of the pluses, minuses, and/or implementation of dynamic parsing in Perl 6, but I couldn't find anything appropriate. However, you may find this quote from Niklaus Wirth (designer of Pascal and other languages) interesting:
The phantasies of computer scientists in the 1960s knew no bounds. Spurred by the success of automatic syntax analysis and parser generation, some proposed the idea of the flexible, or at least extensible, language. The notion was that a program would be preceded by syntactic rules which would then guide the general parser while parsing the subsequent program. A step further: the syntax rules would not only precede the program, but they could be interspersed anywhere throughout the text. For example, if someone wished to use a particularly fancy private form of for statement, he could do so elegantly, even specifying different variants for the same concept in different sections of the same program. The concept that languages serve to communicate between humans had been completely blended out, as apparently everyone could now define his own language on the fly. The high hopes, however, were soon damped by the difficulties encountered when trying to specify what these private constructions should mean. As a consequence, the intriguing idea of extensible languages faded away rather quickly.
Edit: Here's Perl 6's Synopsis 6: Subroutines, unfortunately in markup form because I couldn't find an updated, formatted version; search within for "macro". Unfortunately, it's not too interesting, but you may find some things relevant, like Perl 6's one-pass parsing rule, or its syntax for abstract syntax trees. The approach Perl 6 takes is that a macro is a function that executes immediately after its arguments are parsed and returns either an AST or a string; Perl 6 continues parsing as if the source actually contained the return value. There is mention of generation of error messages, but they make it seem like if macros return ASTs, you can do alright.
Pushing this one step further, you could do "lazy" parsing and only ever parse enough to evaluate the next statement, like some kind of just-in-time parser. Then syntax errors would become normal runtime errors that just raise an ordinary Exception which could be handled by surrounding code:
def fun():
    not implemented yet

try:
    fun()
except:
    pass
That would be an interesting effect, but whether it's useful or desirable is a different question. Generally it's good to know about errors even if you don't call the affected code at the moment.
Macros would not be evaluated until control reaches them, and naturally the parser would already know all previous definitions. The macro definition could perhaps even use variables and data that the program has calculated so far (like adding some syntax for all elements of a previously calculated list). But it is probably a bad idea to start writing self-modifying programs for things that could usually be done just as well directly in the language. It could get confusing...
In any case you should make sure to parse code only once, and if it is executed a second time, use the already-parsed expression so that this doesn't lead to performance problems.
Here are some ideas from my master's thesis, which may or may not be helpful.
The thesis was about robust parsing of natural language.
The main idea: given a context-free grammar for a language, try to parse a given text (or, in your case, a Python program). If parsing failed, you will have a partially generated parse tree. Use the tree structure to suggest new grammar rules that will better cover the parsed text.
I could send you my thesis, but unless you read Hebrew this will probably not be useful.
In a nutshell:
I used a bottom-up chart parser. This type of parser generates edges for productions from the grammar. Each edge is marked with the part of the tree that was consumed, and each edge gets a score according to how close it came to full coverage. For example, the edge

S -> NP . VP

has a score of one half (we succeeded in covering the NP but not the VP). The highest-scored edges suggest a new rule (such as X -> NP).
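In Python terms, a toy sketch of such a scored edge (all names illustrative):

class Edge:
    # A dotted edge: `dot` symbols of the production's right-hand side are covered.
    def __init__(self, lhs, rhs, dot):
        self.lhs, self.rhs, self.dot = lhs, rhs, dot

    @property
    def score(self):
        return self.dot / len(self.rhs)   # fraction of the rule already covered

    def __repr__(self):
        marked = self.rhs[:self.dot] + ["."] + self.rhs[self.dot:]
        return "%s -> %s (score %.2f)" % (self.lhs, " ".join(marked), self.score)

print(Edge("S", ["NP", "VP"], 1))   # S -> NP . VP (score 0.50)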
In general, a chart parser is less efficient than a common LALR or LL parser (the types usually used for programming languages): O(n^3) instead of O(n) complexity. But then again, you are trying something more complicated than just parsing an existing language.
If you can do something with the idea, I can send you further details.
I believe looking at natural language parsers may give you some other ideas.
Another thing I've considered is making this the default behavior across the board, but allowing a language (meaning a set of macros for parsing a given language) to throw a parse error at compile time. Python 2.5 in my system, for example, would do this.
Instead of the stub idea, simply recompile functions that couldn't be handled completely at compile-time when they're executed. This will also make self-modifying code easier, as you can modify the code and recompile it at runtime.
You'll probably need to delimit the bits of input text with unknown syntax, so that the rest of the syntax tree can be resolved, apart from some character-sequence nodes which will be expanded later. Depending on your top-level syntax, that may be fine.
You may find that the parsing algorithm and the lexer and the interface between them all need updating, which might rule out most compiler creation tools.
(The more usual approach is to use string constants for this purpose, which can be passed to a little interpreter at run time.)
I don't think your approach would work very well. Let's take a simple example written in pseudo-code:

define some syntax M1 with definition D1
if whatever:
    define M1 to do D2
else:
    define M1 to do D3
code that uses M1
So there is one example where, if you allow syntax redefinition at runtime, you have a problem (since by your approach the code that uses M1 would be compiled using definition D1). Note that verifying whether syntax redefinition occurs is undecidable. An over-approximation could be computed by some kind of type system or some other kind of static analysis, but Python is not well known for this :D.
Another thing that bothers me is that your solution does not 'feel' right. I find it evil to store source code you can't parse just because you may be able to parse it at runtime.
Another example that jumps to mind is this:
...function definition fun1 that calls fun2...
define M1 (at runtime)
use M1
...function definition for fun2
Technically, when you hit the use of M1, you cannot parse it, so you need to keep the rest of the program (including the function definition of fun2) as source code. And when you run the entire program, you'll encounter a call to fun2 that you cannot resolve, even though fun2 is defined.
