How to call a Python function from C++ with pybind11?

Please consider the following C++ pybind11 program:
#include <pybind11/embed.h>
namespace py = pybind11;
int main() {
    py::scoped_interpreter guard{};
    py::dict locals;
    py::exec(R"(
import sys
def f():
    print(sys.version)
)", py::globals(), locals);
    locals["f"](); // <-- ERROR
}
The py::exec call and the enclosed import sys call both succeed, but the call locals["f"]() throws an exception:
NameError: name 'sys' is not defined
on the first line of function f.
The expected behaviour is that the program prints the Python version.
Any ideas?
Update:
I modified the program as suggested by @DavidW:
#include <pybind11/embed.h>
namespace py = pybind11;
int main() {
    py::scoped_interpreter guard{};
    py::dict globals = py::globals();
    py::exec(R"(
import sys
def f():
    print(sys.version)
)", globals, globals);
    globals["f"](); // <-- WORKS NOW
}
and it now works.
I'm not 100% sure I understand what is going on, so I would appreciate an explanation.
(In particular, does modification of the shared globals/locals dictionary affect any other scripts? Is there some global dictionary, part of the Python interpreter, that the exec'd script is modifying? Or does py::globals() take a copy of that state, so that the exec'd script is isolated from other scripts?)
Update 2:
So it looks like having globals and locals be the same dictionary is the default state:
$ python
>>> globals() == locals()
True
>>> from __main__ import __dict__ as x
>>> x == globals()
True
>>> x == locals()
True
...and that the default value for both is __main__.__dict__, whatever that is (__main__.__dict__ is the dictionary returned by py::globals()).
I'm still not clear what exactly __main__.__dict__ is.

So the initial problem (solved in the comments) was that having different globals and locals causes the code to be evaluated as if it were embedded in a class (see the Python documentation for exec; the pybind11 function behaves essentially the same):
Remember that at the module level, globals and locals are the same dictionary. If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.
A function scope doesn't look up variables defined in its enclosing class, so this wouldn't work:
class C:
    import sys
    def f():
        print(sys.version)
# but C.sys.version would work
and thus your code doesn't work.
pybind11::globals returns a dictionary that's shared in a number of places:
Return a dictionary representing the global variables in the current execution frame, or __main__.__dict__ if there is no frame (usually when the interpreter is embedded).
and thus any modifications to this dictionary will persist (which probably isn't what you want!). In your case it's probably __main__.__dict__, but in general "the current execution frame" might change from call to call, depending on how much you're crossing the C++/Python boundary. For example, if a Python function calls a C++ function that modifies globals(), then exactly what you modify depends on the caller.
My advice would be to create a new, empty dict instead and pass that to exec. This ensures that you run in a fresh, non-shared namespace.
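For illustration, here is a minimal sketch of that advice at the Python level (passing a fresh py::dict as both namespaces to py::exec should behave the same way, since it forwards to the same CPython machinery):
src = """
import sys
def f():
    print(sys.version)
"""
scope = {}            # fresh dict, not shared with __main__.__dict__
exec(src, scope)      # module-level semantics: globals and locals are both `scope`
scope["f"]()          # works: f's global namespace is `scope`, where sys is bound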
__main__ is just a special module that represents the "top level code environment". Like any module, it has a __dict__. When running in the REPL, it's the global scope there. From the pybind11 point of view it's just a module with a dict, and you probably shouldn't be writing into it casually (unless you've really decided that you want to put something there deliberately to share it globally).
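You can check that identity at the REPL:
$ python
>>> import __main__
>>> __main__.__dict__ is globals()
True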
Regarding the __builtins__: the documentation for the Python exec function says
If the globals dictionary does not contain a value for the key __builtins__, a reference to the dictionary of the built-in module builtins is inserted under that key. That way you can control what builtins are available to the executed code by inserting your own __builtins__ dictionary into globals before passing it to exec().
and looking at the code for PyRun_String, which pybind11's exec calls, the same applies there.
This dictionary seems to be sufficient for the builtin functions to be looked up correctly. (If that isn't the case, you can always do pybind11::dict(pybind11::module::import("builtins").attr("__dict__")) to make a copy of the builtins dict and use that instead. However, I don't believe it's necessary.)
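A quick check of that quoted behaviour from plain Python:
ns = {}
exec("x = len('abc')", ns)   # len is found via the auto-inserted __builtins__
print('__builtins__' in ns)  # True
print(ns['x'])               # 3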

Related

Can we use a fully qualified identifier in a module, without importing the module?

From Dynamic linking in C/C++ (dll) vs JAVA (JAR)
When I want to use this JAR file in another project, we use the "package" or "import" keyword.
You don't have to. This is just shorthand. You can use the full package.ClassName and there is no need for an import. Note: this doesn't import any code or data, it just allows you to use a shorter name for the class.
e.g. there is no difference between
java.util.Date date = new java.util.Date();
and
import java.util.Date;
Date date = new Date(); // don't need to specify the full package name.
Is it the same case for import in Python3?
Can we use an identifier defined in a module, without importing its module? Did I miss something in the following example that would make that happen?
What are the differences between Java's and Python's import?
>>> random.randint(1,25)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'random' is not defined
>>> import random
>>> random.randint(1,25)
18
Python is not Java. In Python you can only access names that are either builtins or defined in the current scope or its parent scopes, with the "top-level" scope being the module namespace (AKA the "global" namespace).
The import statement (which is an executable statement, FWIW) does two things: first it loads the module (this actually happens only once per process; after that the module is fetched from the sys.modules cache), then it binds the imported name(s) in the current scope. IOW, this:
import foo
is syntactic sugar for
foo = __import__("foo")
and
from foo import bar
is syntactic sugar for
foo = __import__("foo")
bar = getattr(foo, "bar")
del foo
Also you have to understand what "loading a module" really means: executing all the code at the module's top-level.
As I mentioned, import is an executable statement, but so are class and def. The def statement creates a code object from the function's body and signature, then creates a function object wrapping that code object, and finally binds the function object to the function's name in the current scope. The class statement does the same thing for a class: it executes all the code at the class statement's top level in a temporary namespace, uses that namespace to create the class object, and then binds the class object to its name.
IOW, everything happens at runtime, everything is an object (including functions, classes and modules), and everything you do with an import, class or def statement can be done "manually" too (more or less easily, though; manually creating a function is quite an involved process).
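For example, here is a sketch of doing the work of def and class by hand (the names double and C are illustrative):
import types

# the manual equivalent of: def double(x): return x * 2
module_code = compile("def double(x):\n    return x * 2\n", "<manual>", "exec")
func_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))  # extract the function's code object
double = types.FunctionType(func_code, globals(), "double")
print(double(21))  # 42

# the manual equivalent of: class C: answer = 42
C = type("C", (object,), {"answer": 42})
print(C().answer)  # 42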
So as you can see, this really has nothing to do with how either Java or C++ work.
Short answer: No, you can't implicitly import a module in Python by using a fully qualified name.
Slightly longer answer:
In Python, importing a module can have side effects: a module can have module-level code, not all of its code has to be wrapped in functions or classes. Therefore, importing modules at arbitrary and unexpected locations could be confusing, since it would trigger those side effects when you don't expect them.
The recommended style (see https://www.python.org/dev/peps/pep-0008/ for details) is to put all your imports at the top of your module, and not to hide imports at unexpected places.

Execution of execfile() in a method with Python

I have a config.py script that holds PREAMBLE information. I can use the execfile() function to read the contents of the configuration file:
execfile("config.py")
print PREAMBLE
>>> "ABC"
However, when execfile() is called in a method, I have an error.
def a():
    execfile("config.py")
    print PREAMBLE
a()
>>> NameError: "global name 'PREAMBLE' is not defined"
What's wrong and how to solve this issue?
You need to pass the global dictionary to execfile to achieve the same result:
def a():
    execfile("config.py", globals())
    print PREAMBLE
a()
>>> "some string"
If you don't want to pollute your global namespace, you can pass a local dictionary and use that:
def a():
    config = dict()
    execfile('/tmp/file', config)
    print config['PREAMBLE']
a()
>>> "some string"
For reference, in both cases above, /tmp/file contained PREAMBLE = "some string".
The issue here is the namespaces in which the code runs when execfile() is called. If you study the documentation you will see that execfile() can take two namespace arguments to represent the global and the local namespaces for execution of the code.
At module level (i.e. when the execfile() call does not come from inside a function) then the module global namespace is used for both those namespaces. From inside a function the module globals are used as the global namespace and the local namespace is the function call namespace (which typically disappears when the call returns).
Consequently, since assignments preferentially bind names in the local namespace, executing the file sets up PREAMBLE inside the function call namespace, and so it can't be found at module level.
If your config files really are Python, wouldn't it be possible just to import them, or is there some organizational reason why that won't work? That way you could just import config and then refer to config.PREAMBLE in your code.
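That approach is as simple as it sounds; a sketch, assuming config.py sits on the import path and defines PREAMBLE:
import config
print(config.PREAMBLE)   # "some string"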

execfile() cannot be used reliably to modify a function’s locals

The python documentation states "execfile() cannot be used reliably to modify a function’s locals." on the page http://docs.python.org/2/library/functions.html#execfile
Can anyone provide any further details on this statement? The documentation is fairly minimal. The statement seems to contradict "If both dictionaries are omitted, the expression is executed in the environment where execfile() is called.", which is also in the documentation. Is there a special case when execfile is used within a function, such that execfile acts like a function call and creates a new scoping level?
If I use execfile in a function such as
def testfun():
    execfile('thefile.py', globals())
    def testfun2():
        print a
and there are objects created by the commands in 'thefile.py' (such as an object a), how do I know whether they are going to be local objects of testfun or global objects? So, in the function testfun2, will a appear to be a global? If I omit globals() from the execfile statement, can anyone give a more detailed explanation of why objects created by commands in 'thefile.py' are not available to testfun?
In Python, the way names are looked up is highly optimized inside functions. One of the side effects is that the mapping returned by locals() gives you a copy of the local names inside a function, and altering that mapping does not actually influence the function:
def foo():
    a = 'spam'
    locals()['a'] = 'ham'
    print(a)  # prints 'spam'
Internally, Python uses the LOAD_FAST opcode to look up the name a in the current frame by index, instead of the slower LOAD_NAME, which would look for a local name by name, then fall back to the globals() mapping if it is not found among the locals.
The Python compiler can only emit LOAD_FAST opcodes for local names that are known at compile time; but if you allow locals() to directly influence a function's locals then you cannot know all the local names ahead of time. Nested functions using scoped names (free variables) complicate matters some more.
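You can see the fast path with the dis module; a quick sketch:
import dis

def fast():
    a = 'spam'
    return a     # compiled to LOAD_FAST: indexed access into the frame

dis.dis(fast)    # output shows STORE_FAST / LOAD_FAST for `a`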
In Python 2, you can force the compiler to switch off the optimizations and use LOAD_NAME always by using an exec statement in the function:
def foo():
    a = 'spam'
    exec 'a == a'  # a no-op, but just the presence of `exec` is important
    locals()['a'] = 'ham'
    print(a)  # prints 'ham'
In Python 3, exec has been replaced by exec() and the work-around is gone. In Python 3 all functions are optimized.
And if you didn't follow all this, that's fine too; that is why the documentation glosses over it a little. It is all due to an implementation detail of the CPython compiler and interpreter that most Python users do not need to understand; all you need to know is that using locals() to change local names in a function does not work, usually.
Locals are kind of weird in Python. Regular locals are generally accessed by index, not by name, in the bytecode (as this is faster), but this means that Python has to know all the local variables at compile time. And that means you can't add new ones at runtime.
Now, if you use exec in a function, in Python 2.x, Python knows not to do this and falls back to the slower method of accessing local variables by name, and you can make new ones programmatically. (This trick was removed in Python 3.) You'd think Python would also do this for execfile(), but it doesn't, because exec is a statement and execfile() is a function call, and the name execfile might not refer to the built-in function at runtime (it can be reassigned, after all).
What will happen in your example function? Well, try it and find out! As the documentation for execfile states, if you don't pass in a locals dict, the dict you pass in as globals will be used. You pass in globals() (your module's real global variables) so if it assigns to a, then a becomes a global.
Now you might try something like this:
def testfun():
    execfile('thefile.py')
    def testfun2():
        print a
    return testfun2
    exec ""
The exec statement at the end forces testfun() to use the old-style name-based local variables. It doesn't even have to be executed, as it is not here; it just has to be in the function somewhere.
But this doesn't work either, because the name-based locals don't support nested functions with free variables (a in this case). That functionality also requires that Python know all the local variables at function definition time. You can't even define the above function; Python won't let you.
In short, trying to deal with local variables programmatically is a pain and the documentation is correct: execfile() cannot reliably be used to modify a function's locals.
A better solution, probably, is to just import the file as a module. You can do this within the function, then access values in the module the usual way.
def testfun():
    import thefile
    print thefile.a
If you won't know the name of the file to be imported until runtime, you can use __import__ instead. Also, you may need to modify sys.path to make sure the directory you want to import from is first in the path (and put it back afterward, probably).
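A sketch of that dynamic variant (load_config is a hypothetical helper name):
import sys

def load_config(directory, module_name):
    # hypothetical helper: import a module whose name is only known at runtime,
    # making `directory` the first entry on the search path
    sys.path.insert(0, directory)
    try:
        return __import__(module_name)
    finally:
        sys.path.remove(directory)  # put sys.path back afterwards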
You can also just pass in your own dictionary to execfile and afterward, access the variables from the executed file using myVarsDict['a'] and so on.
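In Python 3, where execfile() no longer exists, that same dictionary-passing pattern looks like this (assuming thefile.py assigns a):
my_vars_dict = {}
with open('thefile.py') as f:
    exec(f.read(), my_vars_dict)   # executed file populates my_vars_dict
print(my_vars_dict['a'])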

C Python: Running Python code within a context

The Python C API function PyEval_EvalCode lets you execute compiled Python code. I want to execute a block of Python code as if it were executing within the scope of a function, so that it has its own dictionary of local variables which don't affect the global state.
This seems easy enough to do, since PyEval_EvalCode lets you provide a Global and Local dictionary:
PyObject* PyEval_EvalCode(PyCodeObject *co, PyObject *globals, PyObject *locals)
The problem I run into has to do with how Python looks up variable names. Consider the following code, that I execute with PyEval_EvalCode:
myvar = 300
def func():
    return myvar
func()
This simple code actually raises an error, because Python is unable to find the variable myvar from within func. Even though myvar is in the local dictionary in the outer scope, Python doesn't copy it into the local dictionary in the inner scope. The reason for this is as follows:
Whenever Python looks up a variable name, first it checks locals, then it checks globals, and finally it checks builtins. At module scope, locals and globals are the SAME dictionary object. So the statement x = 5 at module scope will place x in the locals dictionary, which is also the globals dictionary. Now, a function defined at module scope which needs to look up x won't find x within the function-scope locals, because Python doesn't copy module-scope locals into function-scope locals. But this normally isn't a problem, because it can find x in globals.
x = 5
def foo():
    print(x)  # This works because 'x' in globals() == True
It's only with nested functions that Python seems to copy outer-scope locals into inner-scope locals. (It also seems to do so lazily, only if they are needed within the inner scope.)
def foo():
    x = 5
    def bar():
        print(x)  # Now 'x' in locals() == True
    bar()
So the result of all this is that, when executing code at module scope, you HAVE to make sure that your global dictionary and local dictionary are the SAME object, otherwise module-scope functions won't be able to access module-scope variables.
But in my case, I don't WANT the global dictionary and local dictionary to be the same. So I need some way to tell the Python interpreter that I am executing code at function scope. Is there some way to do this? I looked at the PyCompileFlags as well as the additional arguments to PyEval_EvalCodeEx and can't find any way to do this.
Python doesn't actually copy outer-scope locals into inner-scope locals; the documentation for locals states:
Free variables are returned by locals() when it is called in function blocks, but not in class blocks.
Here "free" variables refers to variables closed over by a nested function. It's an important distinction.
The simplest fix for your situation is just to pass the same dict object as globals and locals:
code = """
myvar = 300
def func():
return myvar
func()
"""
d = {}
eval(compile(code, "<str>", "exec"), d, d)
Otherwise, you can wrap your code in a function and extract it from the compiled object:
s = 'def outer():\n ' + '\n '.join(code.strip().split('\n'))
exec(compile(s, '<str>', 'exec').co_consts[0], {}, {})

What are `globals` and `locals` parameters in Python __import__ function for?

There's a part of the __import__ documentation which I don't understand:
__import__(name[, globals[, locals[, fromlist[, level]]]])
The function imports the module name, potentially using the given globals and locals to determine how to interpret the name in a package context. The standard implementation does not use its locals argument at all, and uses its globals only to determine the package context of the import statement.
What is there to "interpret" about the module name? What is package context?
An example call using those parameters looks like this:
spam = __import__('spam', globals(), locals(), [], -1)
Why does the example provide globals() and locals() to the function? What happens when I only provide globals()? Or neither?
I am probably missing some part of the namespace logic with relation to importing modules. Could you point me to an article that explains this/has examples with __import__ function?
The standard implementation does not use its locals argument at all, and uses its globals only to determine the package context of the import statement.
(from docs.python.org)
I still have no idea how globals is used; what global variable could ever affect the way the import statement works?
EDIT: After looking at import.c in the Python 2.5 source, I found that __import__ expects to find either __name__ or __path__ in globals in order to resolve the import search path relative to the path(s) found in one of these variables, in that order.
globals is used to determine the context in which the import is being called. For example:
"""
/myproject/a/b.py
/myproject/a/foo.py
/myproject/c/d.py
/myproject/c/foo.py
"""
# Which foo gets imported?
import foo #1
foo = __import__('foo') #2
They are not the same, since there is no (easy) way in #2 to know from which module the import is being called. The __import__ function needs to know which module is the current one in order to actually import the correct foo.
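In modern code you can make the package context explicit with importlib.import_module instead of having it inferred from globals; a sketch assuming a package a containing foo.py:
import importlib

# imports a.foo regardless of where the call happens
foo = importlib.import_module('.foo', package='a')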
Internally in __import__(), globals is used to get a reference to the current module invoking the import. From the __import__ source code:
Return the package that an import is being performed in. If globals comes
from the module foo.bar.bat (not itself a package), this returns the
sys.modules entry for foo.bar. If globals is from a package's __init__.py,
the package's entry in sys.modules is returned, as a borrowed reference.
What is there to "interpret" about the module name? What is package context?
When you enter
>>> a
Python must "interpret" that name. Is it a global? Is it a local?
>>> def f(x):
...     return x * a
Now, x is clearly local. a has to be "interpreted". Global? Local?
Why does the example provide globals() and locals() to the function? What happens when I only provide globals()? Or neither?
Try it and see. Seriously. It's easier to play with it than it is to ask.
What's important is that things you do at the >>> prompt are global.
You'll need to define functions that will create a local context so you can see the differences between global and local.
