python string to a function call with arguments, without using eval

python string to a function call with arguments, without using eval - python

I have a string stored in a database stands for a class instance creation for example module1.CustomHandler(filename="abc.csv", mode="rb"), where CustomHandler is a class defined in module1.
I would like to evaluate this string to create a class instance for a one time use. Right now I am using something like this
statement = r'module1.CustomHandler(filename="abc.csv", mode="rb")' # actually read from db
exec(f'from parent.module import {statement.split(".")[0]}')
func_or_instance = eval(statement) # this is what I need
Only knowledgable developers can insert such records into database so I am not worried about eval some unwanted codes. But I've read several posts saying eval is unsafe and there is always a better way. Is there a way I can achieve this without using eval?

You might want to take a look at the ast Python module, which stands for abstract syntax trees. It's mainly used when you need to process the grammar of the programming language, work with code in string format, and so much more functions available in the official documentation.
In this case eval() function looks like the best solution, clear and readable, but safe only under certain conditions. For example, if you try to evaluate a code that contains a class not implemented in the code, it will throw an exception. That's the main reason why eval is sometimes unsafe.

Related

Using exec() to get class instances

I have a class and I need to use string input to create it and I've heard about exec(), so I tried using that and I put the string in properly yet it gives me errors, this is the exec line:
exec(" ".join(args[2:])).toString()
The first 2 parts of the list are not relevant. I debugged this string just to see it is correct and even tried to hardcode it and it worked, but it didn't when I used exec.
What is wrong with this and how can I make this right?
Appreciating all the comment :)
Edit:
The error I get is AttributeError saying it is a NoneType, although if I just hardcode it it works perfectly fine.

Solution 1: instead of passing a Python expression, either use argparse's subcommands or define your own easily-parsable mini-language. You can store relevant classes and functions in a dict to get them from strings (better than relying on globals as it lets you whitelist only what you want to expose)
Solution 2: use the full power of the ast module to parse the Python expression into an ast then write your own visitors to safely evaluate the ast.
Solution 3: keep on using exec or eval and wait until some script kiddie erases your full hard-drive or worse.

Which is a better repr for a custom Python class?

It seems there are different ways the __repr__ function can return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in python and they could just dive in and set it anyway, but seems defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __ repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily take all the stuff I'd set 'private' in C++ as arguments to the constructor, and make teammates using the class set it up using the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function that's slightly harder to notice, for the purposes of __repr__
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.

The __repr__ method is designed to produce the most useful output for the developer, not the enduser, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
However, if you want to store Python objects is a database, use pickle.
import pickle
bob = InfoObj(Name="Bob")
> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
> pickle.loads(pickle.dumps(bob))
Bob(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.

Use of exec and eval in Python

So I have understood what exec and eval and also compile do. But why would I need to use them? I am being unclear on the usage scenario.
Can anyone give me some examples so that I can better appreciate the concept. Cause right I know it is all theory.

I'll give an example in which I have used eval and where I think it was the best choice.
I was writing a simple software testing utility ... something to test whether student exercises were conforming to the assignment requirements. The goal was to provide a way for a simple configuration file to serve as a test specification (to get around a "chicken-and-egg" issue of using a programming language to describe/document/implement the test cases for elementary programming assignments).
I based my harness on the ConfigParser in the standard libraries. However, I did want the ability to represent arbitrary Python strings (including interpolations of \n, \t, and especially any interpolated hex encoded ASCII characters in the values read therefrom.
My solution was a try around an parsed_string=eval('''%s''' % cfg_read_item) followed by a try of the triple double-quoted version ("""%s""") of the same.
This is a case where the alternative would have been to write (or find a pre-written) Python language parser and figure out how to include and adapt it to my program. The risks are minimal (I'm not worried that student submitted code is going to trick my parser, break out if its jail, delete all my files, send my credit card numbers to Romania and so on)*
*(In part because I was testing them under Linux from an untrusted headless user account).
As here others have said, there are other use cases where you're building code from a template based on input data and need to execute that code (meta programming). You should always be able to accomplish those tasks in another way. However, whenever that alternative entails coding effort that approaches writing a general programming language parser/compiler/interpreter .... then eval may be the better approach.

The standard library has an instructive example of how to use exec. collections.namedtuple uses it to build a class dynamically.
template = '''class %(typename)s(tuple):
'%(typename)s(%(argtxt)s)' \n
__slots__ = () \n
_fields = %(field_names)r \n
def __new__(_cls, %(argtxt)s):
'Create new instance of %(typename)s(%(argtxt)s)'
return _tuple.__new__(_cls, (%(argtxt)s)) \n
...'''
namespace = dict(_itemgetter=_itemgetter, __name__='namedtuple_%s' % typename,
OrderedDict=OrderedDict, _property=property, _tuple=tuple)
try:
exec template in namespace
except SyntaxError, e:
raise SyntaxError(e.message + ':\n' + template)

ast uses compile to generate abstract syntax trees from Python source code. These are used by modules such as pyflakes to parse and validate Python.
def parse(expr, filename='<unknown>', mode='exec'):
"""
Parse an expression into an AST node.
Equivalent to compile(expr, filename, mode, PyCF_ONLY_AST).
"""
return compile(expr, filename, mode, PyCF_ONLY_AST)

You don't need to use them, and in my opinion you should avoid them.
They are only useful in cases where you are generating the code itself, which in the end is going to most likely be considered bad practice.
If you are considering using eval() for things like mathematical expressions, you would be better sanitizing the input before evaluating it. You never know what kind of 'text' the user sends in that might screw up the application itself.

I think I have a valid use. I am using Python 3.2.1 inside Blender 2.6.4 to modify a set of points with x,y coordinates (in the z-plane).
The goal is to add concentric rings of new points around each existing point, with the rings behaving as ripples, (like when you drop a stone in a pond). The catch is that I want the ripples to constructively/destructively interfere with one another, so first I'm going through and constructing a 'ripple equation' centered at each point, and summing all the ripple equations together into one gigantic mathematical equation, which I will then feed the original points into in order to generate the correct z-value to assign each one to.
My plan is to append each additional term in the equation to the previous as a string, and then use eval() to calculate the z-value for the new set of points.

Are you just asking for an example? You could write a simple app that reads from standard in and allows the user to input various expressions, like (4*2)/8 - 1. In other languages (Java, C++, etc.) this would be close to impossible to evaluate, but in python it is simple, just:
eval((4*2)/8 - 1)
That being said, unless you are careful, using these things can be very dangerous as they (essentially)allow the user a huge amount of access.

This is used in meta-programming (when the program writes itself). For example, you have animals of different species, which are described with different classes: Lion, Tiger, Horse, Donkey. And you want to simulate crossbreeding between them, for exampe, between Lion and Tiger. When you write the program, you can not determine how the user will cross the animals, but you can create new classes of animals on the fly:
new_class_name = boy.class.to_str() + girl.class.to_str()
eval("class " + new_class_name + " extends " + boy.class.to_str() + ", " + girl.class.to_str())
P. S. Sorry, I forgot Python some. So there is a bunch of pseudo-code.

Here's a valid use case. In the python paste middleware (for web programming) when an exception is raised it creates a command line within the browser. This works be using methods like these. Also, in Blender there is an option to animate values using python expressions and this works using eval.

Hot swapping python code (duck type functions?)

I've been thinking about this far too long and haven't gotten any idea, maybe some of you can help.
I have a folder of python scripts, all of which have the same surrounding body (literally, I generated it from a shell script), but have one chunk that's different than all of them. In other words:
Top piece of code (always the same)
Middle piece of code (changes from file to file)
Bottom piece of code (always the same)
And I realized today that this is a bad idea, for example, if I want to change something from the top or bottom sections, I need to write a shell script to do it. (Not that that's hard, it just seems like it's very bad code wise).
So what I want to do, is have one outer python script that is like this:
Top piece of code
Dynamic function that calls the middle piece of code (based on a parameter)
Bottom piece of code
And then every other python file in the folder can simply be the middle piece of code. However, normal module wouldn't work here (unless I'm mistaken), because I would get the code I need to execute from the arguement, which would be a string, and thus I wouldn't know which function to run until runtime.
So I thought up two more solutions:
I could write up a bunch of if statements, one to run each script based on a certain parameter. I rejected this, as it's even worse than the previous design.
I could use:
os.command(sys.argv[0] scriptName.py)
which would run the script, but calling python to call python doesn't seem very elegant to me.
So does anyone have any other ideas? Thank you.

If you know the name of the function as a string and the name of module as a string, then you can do
mod = __import__(module_name)
fn = getattr(mod, fn_name)
fn()

Another possible solution is to have each of your repetitive files import the functionality from the main file
from topAndBottom import top, bottom
top()
# do middle stuff
bottom()

In addition to the several answers already posted, consider the Template Method design pattern: make an abstract class such as
class Base(object):
def top(self): ...
def bottom(self): ...
def middle(self): raise NotImplementedError
def doit(self):
self.top()
self.middle()
self.bottom()
Every pluggable module then makes a class which inherits from this Base and must override middle with the relevant code.
Perhaps not warranted for this simple case (you do still have to import the right module in order to instantiate its class and call doit on it), but still worth keeping in mind (together with its many Pythonic variations, which I have amply explained in many tech talks now available on youtube) for cases where the number or complexity of "pluggable pieces" keeps growing -- Template Method (despite its horrid name;-) is a solid, well-proven and highly scalable pattern [[sometimes a tad too rigid, but that's exactly what I address in those many tech talks -- and that problem doesn't apply to this specific use case]].

However, normal module wouldn't work here (unless I'm mistaken), because I would get the code I need to execute from the arguement, which would be a string, and thus I wouldn't know which function to run until runtime.
It will work just fine - use __import__ builtin or, if you have very complex layout, imp module to import your script. And then you can get the function by module.__dict__[funcname] for example.

Importing a module (as explained in other answers) is definitely the cleaner way to do this, but if for some reason that doesn't work, as long as you're not doing anything too weird you can use exec. It basically runs the content of another file as if it were included in the current file at the point where exec is called. It's the closest thing Python has to a source statement of the kind included in many shells. As a bare minimum, something like this should work:
exec(open(filename).read(None))

How about this?
function do_thing_one():
pass
function do_thing_two():
pass
dispatch = { "one" : do_thing_one,
"two" : do_thing_two,
}
# do something to get your string from the command line (optparse, argv, whatever)
# and put it in variable "mystring"
# do top thing
f = dispatch[mystring]
f()
# do bottom thing

Python setattr and getattr for global scope?

Suppose I need to create my own small DSL that would use Python to describe a certain data structure. E.g. I'd like to be able to write something like
f(x) = some_stuff(a,b,c)
and have Python, instead of complaining about undeclared identifiers or attempting to invoke the function some_stuff, convert it to a literal expression for my further convenience.
It is possible to get a reasonable approximation to this by creating a class with properly redefined __getattr__ and __setattr__ methods and use it as follows:
e = Expression()
e.f[e.x] = e.some_stuff(e.a, e.b, e.c)
It would be cool though, if it were possible to get rid of the annoying "e." prefixes and maybe even avoid the use of []. So I was wondering, is it possible to somehow temporarily "redefine" global name lookups and assignments? On a related note, maybe there are good packages for easily achieving such "quoting" functionality for Python expressions?

I'm not sure it's a good idea, but I thought I'd give it a try. To summarize:
class PermissiveDict(dict):
default = None
def __getitem__(self, item):
try:
return dict.__getitem__(self, item)
except KeyError:
return self.default
def exec_with_default(code, default=None):
ns = PermissiveDict()
ns.default = default
exec code in ns
return ns

You might want to take a look at the ast or parser modules included with Python to parse, access and transform the abstract syntax tree (or parse tree, respectively) of the input code. As far as I know, the Sage mathematical system, written in Python, has a similar sort of precompiler.

In response to Wai's comment, here's one fun solution that I've found. First of all, to explain once more what it does, suppose that you have the following code:
definitions = Structure()
definitions.add_definition('f[x]', 'x*2')
definitions.add_definition('f[z]', 'some_function(z)')
definitions.add_definition('g.i', 'some_object[i].method(param=value)')
where adding definitions implies parsing the left hand sides and the right hand sides and doing other ugly stuff. Now one (not necessarily good, but certainly fun) approach here would allow to write the above code as follows:
#my_dsl
def definitions():
f[x] = x*2
f[z] = some_function(z)
g.i = some_object[i].method(param=value)
and have Python do most of the parsing under the hood.
The idea is based on the simple exec <code> in <environment> statement, mentioned by Ian, with one hackish addition. Namely, the bytecode of the function must be slightly tweaked and all local variable access operations (LOAD_FAST) switched to variable access from the environment (LOAD_NAME).
It is easier shown than explained: http://fouryears.eu/wp-content/uploads/pydsl/
There are various tricks you may want to do to make it practical. For example, in the code presented at the link above you can't use builtin functions and language constructions like for loops and if statements within a #my_dsl function. You can make those work, however, by adding more behaviour to the Env class.
Update. Here is a slightly more verbose explanation of the same thing.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.