I'm trying to understand how eval and exec treat the environment (globals and locals) that they are given, so I made a class "logdict" which behaves like a dict but logs most methods (__new__ is excluded):
from functools import wraps

class LogDict(dict):
    logs = {}

    def _make_wrapper(name):
        #wraps(getattr(dict, name))
        def wrapper(self, *args, **kwargs):
            LogDict.logs.setdefault(id(self), []).append({
                'name': name,
                'args': tuple(map(repr, args)),
                'kwargs': dict((key, repr(kwargs[key])) for key in kwargs)
            })
            return getattr(super(), name)(*args, **kwargs)
        return wrapper

    for attr in dir(dict):
        if callable(getattr(dict, attr)) and attr not in {'__new__'}:
            locals()[attr] = _make_wrapper(attr)

    def logrepr(self):
        return ''.join(
            "{fun}({rargs}{optsep}{rkwargs})\n".format(
                fun = logitem['name'],
                rargs = ', '.join(logitem['args']),
                optsep = ', ' if len(logitem['kwargs']) > 0 else '',
                rkwargs = ', '.join('{} = {}'.format(key, logitem['kwargs'][key])
                                    for key in logitem['kwargs'])
            )
            for logitem in LogDict.logs[id(self)])
as an example, this code:
d = LogDict()
d['1'] = 3
d['1'] += .5
print('1' in d)
print('log:')
print(d.logrepr())
produces this output:
True
log:
__init__()
__setitem__('1', 3)
__getitem__('1')
__setitem__('1', 3.5)
__contains__('1')
__getattribute__('logrepr')
I tried feeding this to exec in order to understand how it was being used, but I can't see it accessing the dictionary beyond what makes sense:
print('\tTesting exec(smth, logdict):')
d = LogDict()
exec('print("this line is inside the exec statement")', d)
print('the log is:')
print(d.logrepr(), end='')
print('the env now contains:')
print(d)
Testing exec(smth, logdict):
this line is inside the exec statement
the log is:
__init__()
__getitem__('print')
__getattribute__('logrepr')
the env now contains:
[a dictionary containing __builtins__]
so the exec function didn't call any of the methods I'm logging except __getitem__ to see if 'print' was in it (__getattribute__ is called later when I print the log); how did it set the key '__builtins__' (or check that it wasn't already defined)? Am I just missing the method it's using, or is it doing something more low-level?
The exec function uses low-level dictionary functions in the Python C API to insert the __builtins__ module into the global namespace dictionary. You can see the call in the CPython source code.
Because the call is to low level dict API, it doesn't look in your class to find your overridden __setitem__ method, it just directly writes into the underlying dictionary storage. The exec function requires that the global namespace passed in to it is a dict (or a dict subclass, but not some other mapping type), so this is always safe, at least in terms of not crashing the interpreter. But it does bypass your logging.
Unfortunately, I don't see any way to get logging added so that you can see __builtins__ get added to the global namespace. That probably means your attempt to directly observe exec's behavior is doomed. But perhaps reading the C source code is a suitable alternative, if you're just trying to understand what it does. One of the perks of using an open source programming language is that you can just go look up how the interpreter is programmed when you have questions like this. It does require reading C, rather than just Python, but the builtin_exec_impl function is straightforward enough (the actual code execution happens elsewhere and is surely much more complicated).
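The bypass is easy to reproduce without the full LogDict class. Here is a minimal sketch (standard CPython behavior, no external code) showing that exec inserts __builtins__ without ever going through a subclass's __setitem__:

```python
logged = []

class LoggingDict(dict):
    def __setitem__(self, key, value):
        # record every write that goes through the normal protocol
        logged.append(key)
        super().__setitem__(key, value)

d = LoggingDict()
exec("pass", d)

# exec injected __builtins__ via the C-level dict API,
# so the override never saw that write
print('__builtins__' in d)        # True
print('__builtins__' in logged)   # False
```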
Related
I would like to pimp format_html() of Django.
It already works quite nicely, but my IDE (PyCharm) thinks the variables are not used and paints them in light-gray color:
AFAIK f-strings use some magic rewriting.
Is there a way to implement this, so that the IDE knows that the variables get used?
Related: Implement f-string like syntax, with Django SafeString support
Here is my current implementation:
def h(html):
"""
Django's format_html() on steroids
"""
def replacer(match):
call_frame = sys._getframe(3)
return conditional_escape(
eval(match.group(1), call_frame.f_globals, call_frame.f_locals))
return mark_safe(re.sub(r'{(.*?)}', replacer, html))
Somebody raised security concerns: I don't plan to create a CMS where users can edit these templates. These template h-strings are only for developers, as a convenient way to create HTML.
Before writing an answer, be sure you know the magic of conditional_escape()
Since you don’t seem above using dirty hacks, here’s a hack even dirtier than the one in the question:
class _escaper(dict):
    def __init__(self, other):
        super().__init__(other)

    def __getitem__(self, name):
        return conditional_escape(super().__getitem__(name))

_C = lambda value: (lambda: value).__closure__[0]
_F = type(_C)
try:
    type(_C(None))(None)
except:
    pass
else:
    _C = type(_C(None))

def h(f):
    if not isinstance(f, _F):
        raise TypeError(f"expected a closure, a {type(f).__name__} was given")
    closure = None
    if f.__closure__:
        closure = tuple(
            _C(conditional_escape(cell.cell_contents))
            for cell in f.__closure__
        )
    fp = _F(
        f.__code__, _escaper(f.__globals__),
        f.__name__, f.__defaults__, closure
    )
    return mark_safe(fp())
The h function takes a closure, and for each variable closed over, it creates another, escaped copy of the variable, and modifies the closure to capture that copy instead. The globals dict is also wrapped to ensure references to globals are likewise escaped. The modified closure is then immediately executed, its return value is marked safe and returned. So you must pass h an ordinary function (no bound methods, for example) which accepts no arguments, preferably a lambda returning an f-string.
Your example then becomes:
foo = '&'
bar = h(lambda: f'<span>{foo}</span>')
assert h(lambda: f'<div>{bar}</div>') == '<div><span>&amp;</span></div>'
There’s one thing, though. Due to how this has been implemented, you can only ever refer directly to variables inside interpolation points; no attribute accesses, no item lookups, nothing else. (You shouldn’t put string literals at interpolation points either.) Otherwise though, it works in CPython 3.9.2 and even in PyPy 7.3.3. I make no claims about it ever working in any other environment, or being in any way future-proof.
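Stripped of the Django-specific escaping, the cell-rewriting trick at the heart of this answer can be demonstrated on its own. This is a sketch, with plain str.upper standing in for conditional_escape:

```python
import types

def make_cell(value):
    # a throwaway lambda is an easy, portable way to obtain a cell object
    return (lambda: value).__closure__[0]

def rewrite_closure(f, transform):
    """Rebuild f so that each closed-over value is replaced by transform(value)."""
    cells = None
    if f.__closure__:
        cells = tuple(make_cell(transform(c.cell_contents))
                      for c in f.__closure__)
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, cells)

def outer():
    word = "hello"
    return lambda: f"<{word}>"

f = rewrite_closure(outer(), str.upper)
print(f())  # <HELLO>
```

The rebuilt function runs the same bytecode but sees transformed values for all its free variables, which is exactly what h does with conditional_escape.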
I am trying to apply a decorator that changes function code, and then execute this function with changed code.
Below is the temp module with example function. I simply want the function to return [*args, *kwargs.items(), 123] instead of [*args, *kwargs.items()] if some_decorator is applied to this function.
Edit: please note that this is only a toy example, I don't intend to append new values to a list but rather rewrite a big chunk of a function.
from inspect import getsource

def some_decorator(method):
    def wrapper(*args, **kwargs):
        source_code = getsource(method)
        code_starts_at = source_code.find('):') + 2
        head = source_code[:code_starts_at]
        body = source_code[code_starts_at:]
        lines = body.split('\n')
        return_line = [i for i in lines if 'return' in i][0]
        old_expr = return_line.replace(' return ', '')
        new_expr = old_expr.replace(']', ', 123]')
        new_expr = head + '\n' + ' return ' + new_expr
        return eval(new_expr)
    return wrapper

@some_decorator
def example_func(*args, **kwargs):
    return [*args, *kwargs.items()]
A bit more of explanation: I am rewriting the original function
def example_func(*args, **kwargs):
return [*args, *kwargs.items()]
to
def example_func(*args, **kwargs):
return [*args, *kwargs.items(), 123]
I hope that eval is able to compile and run this modified function.
When I try to run it, it raises a SyntaxError.
from temp import example_func
example_func(5)
I am aware that eval is able to cope with this:
[*args, *kwargs.items(), 123]
but only if args and kwargs are already declared. I want them to be read from example_func(args, kwargs) when I am executing example_func.
I suppose that simply writing the modified function code to a file
def example_func(*args, **kwargs):
return [*args, *kwargs.items(), 123]
and making some_decorator to execute the function with modified code instead of original one, would work just fine. However, ideally I would skip creating any intermediary files.
Is it possible to achieve that?
While you technically can do just about anything with functions and decorators in Python, you shouldn't.
In this specific case, adding an extra value to a function that returns a list is as simple as:
def some_decorator(method):
    def wrapper(*args, **kwargs):
        result = method(*args, **kwargs)
        return result + [123]
    return wrapper
This doesn't require any function code rewriting. If all you are doing is alter the inputs or the output of a function, just alter the inputs or output, and leave the function itself be.
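For completeness, here is that wrapper applied to the toy function from the question:

```python
def some_decorator(method):
    def wrapper(*args, **kwargs):
        result = method(*args, **kwargs)
        return result + [123]   # append to the output instead of rewriting code
    return wrapper

@some_decorator
def example_func(*args, **kwargs):
    return [*args, *kwargs.items()]

print(example_func(5, name="x"))  # [5, ('name', 'x'), 123]
```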
Decorators are primarily just syntactic sugar here, a way to change
def function_name(*args, **kwargs):
    # ...
function_name = create_a_wrapper_for(function_name)
into
@create_a_wrapper_for
def function_name(*args, **kwargs):
    # ...
Also note that the eval() function can't alter your function, because eval() is strictly limited to expressions. The def syntax to create a function is a statement. Fundamentally, statements can contain expressions and other statements (e.g. if <test_expression>: <body of statements>) but expressions can't contain statements. This is why you are getting a SyntaxError exception; while [*args, *kwargs.items()] is a valid expression, return [*args, *kwargs.items()] is a statement (containing an expression):
>>> args, kwargs = (), {}
>>> eval("[*args, *kwargs.items()]")
[]
>>> eval("return [*args, *kwargs.items()]")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
return [*args, *kwargs.items()]
^
SyntaxError: invalid syntax
To execute text as arbitrary Python code, you would have to use the exec() function instead, and take care to use the same namespace as the original function so any globals used in the original function still can be accessed.
For example, if the function calls another function to obtain an extra value:
def example(*args, **kwargs):
    return [extra_value(), *args, *kwargs.items()]

def extra_value():
    return 42
then you can’t execute the example() function in isolation; it is part of the module global namespace and looks up extra_value in that namespace when you call the function. Functions have a reference to the global namespace of the module they are created in, accessible via the function.__globals__ attribute. When you use exec() to execute a def statement creating a function, then the new function object is connected to the global namespace you passed in. Note that def creates a function object and assigns it to the function name, so you’ll have to retrieve that object from the same namespace again:
>>> namespace = {}
>>> exec("def foo(): return 42", namespace)
>>> namespace["foo"]
<function foo at 0x7f8194fb1598>
>>> namespace["foo"]()
42
>>> namespace["foo"].__globals__ is namespace
True
Next, text manipulation to rebuild Python code is very inefficient and error prone. For example, your str.replace() code would fail if the function used this instead:
def example(*args, **kwargs):
    if args or kwargs:
        return [
            "[called with arguments:]",
            *args,
            *kwargs.items()
        ]
because now return is indented further, there are [..] brackets in a string value in the list, and the closing ] bracket of the list is on a separate line altogether.
You'd be far better off having Python compile source code into an Abstract Syntax Tree (via the ast module), then work on that tree. A directed graph of well-defined objects is much easier to manipulate than text (which is much more flexible in how much whitespace is used, etc.). Both the above code and your example would result in a tree with a Return() node that contains an expression whose top level would be a List() node. You can traverse that tree and find all Return() nodes and alter their List() nodes, adding an extra node to the end of the list contents.
A Python AST can be compiled to a code object (with compile()) then run through exec() (which accepts not only text, but code objects as well).
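In its smallest form, that round trip looks like this (a sketch with a trivial assignment statement):

```python
import ast

# parse source into a tree, compile the tree to a code object,
# then execute the code object in a fresh namespace
tree = ast.parse("x = 1 + 2")
code = compile(tree, "<ast-demo>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 3
```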
For a real-world example of a project that rewrites Python code, look at how pytest rewrites the assert statement to add extra context. They use a module import hook to do this, but as long as the source code is available for a function, you can do so with a decorator too.
Here is an example of using the ast module to alter the list in a return statement, adding in an arbitrary constant:
import ast, inspect, functools

class ReturnListInsertion(ast.NodeTransformer):
    def __init__(self, value_to_insert):
        self.value = value_to_insert

    def visit_FunctionDef(self, node):
        # remove the `some_decorator` decorator from the AST;
        # we don't need to keep applying it.
        if node.decorator_list:
            node.decorator_list = [
                n for n in node.decorator_list
                if not (isinstance(n, ast.Name) and n.id == 'some_decorator')
            ]
        return self.generic_visit(node)

    def visit_Return(self, node):
        if isinstance(node.value, ast.List):
            # Python 3.8 and up have ast.Constant instead, which is more
            # flexible.
            node.value.elts.append(ast.Num(self.value))
        return self.generic_visit(node)

def some_decorator(method):
    source_code = inspect.getsource(method)
    tree = ast.parse(source_code)
    updated = ReturnListInsertion(123).visit(tree)
    # fix all line number references, make it match the original
    updated = ast.increment_lineno(
        ast.fix_missing_locations(updated),
        method.__code__.co_firstlineno
    )
    ast.copy_location(updated.body[0], tree)
    # compile again, as a module, then execute the compiled bytecode and
    # extract the new function object. Use the original namespace
    # so that any global references in the function still work.
    code = compile(tree, inspect.getfile(method), 'exec')
    namespace = method.__globals__
    exec(code, namespace)
    new_function = namespace[method.__name__]
    # update the new function with the old function's name, module,
    # documentation and attributes.
    return functools.update_wrapper(new_function, method)
Note that this doesn't need a wrapper function. You don't need to re-work a function each time you try to call it; the decorator can do it just once and return the resulting function object directly.
Here is a demo module to try it out with:
@some_decorator
def example(*args, **kwargs):
    return [extra_value(), *args, *kwargs.items()]

def extra_value():
    return 42

if __name__ == '__main__':
    print(example("Monty", "Python's", name="Flying circus!"))
The above outputs [42, 'Monty', "Python's", ('name', 'Flying circus!'), 123] when run.
However, it is much easier to just use the first method.
If you do want to pursue using exec() and AST manipulation, I can recommend you read up on how to do that in Green Tree Snakes.
I have a class that contains ~20 methods, and in def __init__(self, ...): I have to call many of these methods (~9) but I didn't want to have to call each individual method one by one.
So I took the easy way out and created two list comprehensions that use exec to call each method:
[exec("self.create%s()" % x) for x in "ArticleObjects SeriesObjects ArticleList SearchList".split(" ")]
[exec("self.compile%sPage(self)" % x) for x in "About Screenshots Search Contact Articles".split(" ")]
When I ran this code using python3 filename.py I got an error, that read:
NameError: name 'self' is not defined
Through trial and error I found that, in order to get this code to work, I had to create a copy of self called instance, make the new instance variable a global variable, and then call the method using ClassName.methodName(instance) instead of self.methodName():
With the working code being:
global instance; instance = self
[exec("ClassName.create%s(instance)" % x) for x in "ArticleObjects SeriesObjects ArticleList SearchList".split(" ")]
[exec("ClassName.compile%sPage(instance)" % x) for x in "About Screenshots Search Contact Articles".split(" ")]
Why is this? Why is the self variable undefined in exec despite it being available to the scope that exec is being called in?
Update: I'm using Python 3.6.7
There's lots of good suggestions here for how to avoid the exec statement (which is generally bad), but to answer your question about why this happens, it's got more to do with the list comprehension. List comprehensions create a new scope, and when you call exec without a globals or locals argument, it uses the locals() function:
Note: The default locals act as described for function locals() below
Source
Here you can see what the results of the locals() function look like from within a list comprehension:
class Sample:
    def __init__(self):
        k = 4
        print(locals())
        exec("print(locals())")
        [print(locals()) for x in range(1)]
        [exec("print(locals())") for x in range(1)]

Sample()
output:
{'k': 4, 'self': <__main__.Sample object at 0x00000000030295C0>}
{'k': 4, 'self': <__main__.Sample object at 0x00000000030295C0>}
{'x': 0, '.0': <range_iterator object at 0x00000000030019F0>}
{'x': 0, '.0': <range_iterator object at 0x00000000030019F0>}
So, locals() is the same inside or outside the exec. It's the list comprehension that changes it. Only, when you're outside an exec statement, the interpreter can fall past the locals of the list comprehension and find self in the outer scope. No such luck once you call exec.
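If you do insist on exec, one way to sidestep the scope problem is to pass an explicit namespace, so that exec no longer depends on the comprehension's locals. A sketch:

```python
class Sample:
    def __init__(self):
        self.called = []
        # hand exec an explicit globals dict containing self,
        # instead of relying on the comprehension's own locals()
        [exec("self.mark(i)", {"self": self, "i": i}) for i in range(3)]

    def mark(self, i):
        self.called.append(i)

print(Sample().called)  # [0, 1, 2]
```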
Using getattr is simpler (and usually safer) than exec. Try something along these lines:
def __init__(self):
    suffixes = ["ArticleObjects", "SeriesObjects", ...]
    for suffix in suffixes:
        method = getattr(self, "create" + suffix)
        method()
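Dropped into a small runnable class (the method names here are made up for illustration), it looks like this:

```python
class Site:
    def __init__(self):
        for suffix in ("ArticleObjects", "SeriesObjects"):
            # look the method up by name, then call it normally
            getattr(self, "create" + suffix)()

    def createArticleObjects(self):
        self.articles = []

    def createSeriesObjects(self):
        self.series = []

site = Site()
print(hasattr(site, "articles"), hasattr(site, "series"))  # True True
```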
I wouldn't use exec for this. While it may be the shortest version, it might also confuse both collaborators and code analysis tools. I'd use something like this instead:
class Test:
    def __init__(self):
        for f in (self.createA, self.createB, self.createC):
            f()

    def createA(self):
        print("A")

    def createB(self):
        print("B")

    def createC(self):
        print("C")
Reading the documentation of Python's logging library (for version 2.7) I came across the following:
Logger.debug(msg, *args, **kwargs)
[...] The second keyword argument is extra which can be used to pass a dictionary which is used to populate the __dict__ of the LogRecord created for the logging event with user-defined attributes. These custom attributes can then be used as you like. For example, they could be incorporated into logged messages. [...] The keys in the dictionary passed in extra should not clash with the keys used by the logging system. [emph. mine]
So why does this constraint exist? In my opinion this removes flexibility from the library for no good reason (it is up to the developer to check which keys are built-in and which are not).
Imagine you want to write a decorator which logs function entry and exit:
def log_entry_exit(func):
    def wrapper(*args, **kwargs):
        logger.debug('Entry')
        result = func(*args, **kwargs)
        logger.debug('Exit')
        return result
    return wrapper

@log_entry_exit
def foo():
    pass
Suppose you also want to log the name of the enclosing function:
format_string = '%(funcName)s: %(message)s'
Oops! This doesn't work. The output is:
>>> foo()
wrapper: Entry
wrapper: Exit
Of course the function name evaluates to wrapper because that is the enclosing function. However this is not what I want. I want the function name of the decorated function to be printed. Therefore it would be very convenient to just modify my logging calls to:
logger.debug('<msg>', extra={'funcName': func.__name__})
However (as the documentation already points out) this doesn't work:
KeyError: "Attempt to overwrite 'funcName' in LogRecord"
Nevertheless this would be a very straightforward and light solution to the given problem.
So again, why is logging preventing me from setting custom values for built-in attributes?
Not being the author, I can't be sure, but I have a hunch.
Looking at
https://hg.python.org/cpython/file/3.5/Lib/logging/__init__.py, this seems to be the code that threw the error you quoted:
rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func, sinfo)
if extra is not None:
    for key in extra:
        if (key in ["message", "asctime"]) or (key in rv.__dict__):
            raise KeyError("Attempt to overwrite %r in LogRecord" % key)
        rv.__dict__[key] = extra[key]
Looking at the __init__() method in that file, we can see that it sets a long list of attributes, at least some of which are used to keep track of object state (to borrow terminology from elsewhere, these serve the purpose of private member variables):
self.args = args
self.levelname = getLevelName(level)
self.levelno = level
self.pathname = pathname
try:
    self.filename = os.path.basename(pathname)
    self.module = os.path.splitext(self.filename)[0]
except (TypeError, ValueError, AttributeError):
    self.filename = pathname
    self.module = "Unknown module"
self.exc_info = exc_info
self.exc_text = None  # used to cache the traceback text
self.stack_info = sinfo
self.lineno = lineno
self.funcName = func
[...]
The code makes assumptions in various places that these attributes contain what they were initialized to contain; rather than defensively checking whether the value is still sensible every time that it's used, it blocks attempts to update any of them, as we've seen above. And, instead of trying to distinguish between "safe-to-overwrite" and "unsafe-to-overwrite" attributes, it simply blocks any overwriting.
In the particular case of funcName, I suspect you won't suffer any ill effects (other than having a different funcName displayed) by overwriting it.
Possible ways forward:
live with the limitation
override Logger.makeRecord() to permit an update of funcName
override Logger to add a setFuncName() method
Of course, whatever you do, test your modification carefully to avoid surprises.
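Here is a sketch of the makeRecord() override mentioned above, assuming Python 3's Logger.makeRecord signature. The trick is to pop the custom key out of extra and route it through the func parameter before the base class runs its clash check:

```python
import logging

class FuncNameLogger(logging.Logger):
    def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
                   func=None, extra=None, sinfo=None):
        if extra and "funcName" in extra:
            # copy so we don't mutate the caller's dict, then move the
            # value into func so the clash check never sees the key
            extra = dict(extra)
            func = extra.pop("funcName")
        return super().makeRecord(name, level, fn, lno, msg, args,
                                  exc_info, func, extra, sinfo)

logging.setLoggerClass(FuncNameLogger)
logger = logging.getLogger("entry-exit-demo")

records = []
handler = logging.Handler()
handler.emit = records.append   # capture records instead of formatting them
logger.addHandler(handler)

logger.warning("Entry", extra={"funcName": "foo"})
print(records[0].funcName)  # foo
```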
I know this is a few years old, but there is no chosen answer. If anyone else comes across it I have a workaround that should continue to work while the logging module undergoes changes.
Unfortunately, the author doesn't expose the keys that would conflict in a way that makes them easy to check for. However, he/she does hint at a way to do so in the docs. This line: https://hg.python.org/cpython/file/3.5/Lib/logging/__init__.py#l368 returns a shell of a LogRecord object:
rv = _logRecordFactory(None, None, "", 0, "", (), None, None)
...and in this object you can see all the properties and you can make a Set that holds the "conflicting keys".
I created a logging helper module:
import logging

clashing_keywords = {key for key in dir(logging.LogRecord(None, None, "", 0, "", (), None, None)) if "__" not in key}
additional_clashing_keywords = {
    "message",
    "asctime"
}
clashing_keywords = clashing_keywords.union(additional_clashing_keywords)

def make_safe_kwargs(kwargs):
    '''
    Makes sure you don't have kwargs that might conflict with
    the logging module
    '''
    assert isinstance(kwargs, dict)
    # iterate over a copy of the keys, since we mutate the dict
    for k in list(kwargs):
        if k in clashing_keywords:
            kwargs['_' + k] = kwargs.pop(k)
    return kwargs
...which just prepends conflicting keys with a _. It can be used like so:
from mymodule.logging_helpers import make_safe_kwargs
logger.info("my message", extra=make_safe_kwargs(kwargs))
It's been working well for me. Hope this helps!
The short answer for me was to identify the name clash, and rename the kwarg:
#broken
log.info('some message', name=name)
# working
log.info('some message', special_name=name)
I want to have a function in a different module that, when called, has access to all the variables its caller has access to, and that functions just as if its body had been pasted into the caller rather than having its own context, basically like a C macro instead of a normal function. I know I can pass locals() into the function, which can then access the local variables as a dict, but I want to access them normally (e.g. x.y, not x["y"]), and I want all the names the caller has access to, not just the locals, including things that were imported into the caller's file but not into the module that contains the function.
Is this possible to pull off?
Edit 2: Here's the simplest possible example I can come up with of what I'm really trying to do:
def getObj(expression):
    ofs = expression.rfind(".")
    obj = eval(expression[:ofs])
    print "The part of the expression Left of the period is of type ", type(obj),
Problem is that 'expression' requires the imports and local variables of the caller in order to eval without error. In reality there's a lot more than just an eval, so I'm trying to avoid the solution of just passing locals() in and through to the eval(), since that won't fix my general-case problem.
And another, even uglier way to do it -- please don't do this, even if it's possible --
import sys

def insp():
    l = sys._getframe(1).f_locals
    expression = l["expression"]
    ofs = expression.rfind(".")
    expofs = expression[:ofs]
    obj = eval(expofs, globals(), l)
    print "The part of the expression %r Left of the period (%r) is of type %r" % (expression, expofs, type(obj)),

def foo():
    derp = 5
    expression = "derp.durr"
    insp()

foo()
outputs
The part of the expression 'derp.durr' Left of the period ('derp') is of type <type 'int'>
I don't presume this is the answer that you wanted to hear, but trying to access local variables from a caller module's scope is not a good idea. If you normally program in PHP or C, you might be used to this sort of thing?
If you still want to do this, you might consider creating a class and passing an instance of that class in place of locals():
# other_module.py
def some_func(lcls):
    print(lcls.x)
Then,
>>> import other_module
>>>
>>>
>>> x = 'Hello World'
>>>
>>> class MyLocals(object):
... def __init__(self, lcls):
... self.lcls = lcls
... def __getattr__(self, name):
... return self.lcls[name]
...
>>> # Call your function with an instance of this instead.
>>> other_module.some_func(MyLocals(locals()))
Hello World
Give it a whirl.
Is this possible to pull off?
Yes (sort of, in a very roundabout way), though I would strongly advise against it in general (more on that later).
Consider:
myfile.py
def func_in_caller():
    print "in caller"

import otherfile
globals()["imported_func"] = otherfile.remote_func
imported_func(123, globals())
otherfile.py
def remote_func(x1, extra):
    for k, v in extra.iteritems():
        globals()[k] = v
    print x1
    func_in_caller()
This yields (as expected):
123
in caller
What we're doing here is trickery: we just copy every item into another namespace in order to make this work. This can (and will) break very easily and/or lead to hard to find bugs.
There's almost certainly a better way of solving your problem / structuring your code (we need more information in general on what you're trying to achieve).
From The Zen of Python:
2) Explicit is better than implicit.
In other words, pass in the parameter and don't try to get really fancy just because you think it would be easier for you. Writing code is not just about you.
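Applied to the getObj example from the question, the explicit version simply takes the caller's namespaces as parameters (Python 3 here for brevity; the names come from the question):

```python
def get_obj(expression, globals_, locals_):
    # the caller hands over its namespaces explicitly,
    # instead of the callee digging them out of the call stack
    ofs = expression.rfind(".")
    return eval(expression[:ofs], globals_, locals_)

def foo():
    derp = 5
    return get_obj("derp.durr", globals(), locals())

print(foo())  # 5
```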