I have a C extension module, to which I would like to add some Python utility functions. Is there a recommended way of doing this?
For example:
import my_module
my_module.super_fast_written_in_C()
my_module.written_in_Python__easy_to_maintain()
I'm primarily interested in Python 2.x.
The usual way of doing this: mymod.py contains the utility functions written in Python and imports the goodies from the _mymod module, which is written in C and loaded from _mymod.so or _mymod.pyd. For example, look at .../Lib/csv.py in your Python distribution.
Prefix your native extension with an underscore.
Then, in Python, create a wrapper module that imports that native extension and adds some other non-native routines on top of that.
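A minimal sketch of that layout, using the function names from the question (_mymod stands for the compiled extension):
# mymod.py -- thin pure-Python layer over the C extension _mymod
from _mymod import *                        # re-export everything the C module provides
from _mymod import super_fast_written_in_C  # explicit import for the helper below

def written_in_Python__easy_to_maintain():
    """Pure-Python convenience wrapper; easy to change without recompiling."""
    return super_fast_written_in_C()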
The existing answers describe the method most often used: it has the potential advantage of allowing pure-Python (or other-language) implementations on platforms in which the compiled C extension is not available (including Jython and IronPython).
In a few cases, however, it may not be worth splitting the module into a C layer and a Python layer just to provide a few extras that are more sensibly written in Python than in C. For example, gmpy (lines 7113 ff at this time), in order to enable pickling of instances of gmpy's type, uses:
copy_reg_module = PyImport_ImportModule("copy_reg");
if (copy_reg_module) {
    char* enable_pickle =
        "def mpz_reducer(an_mpz): return (gmpy.mpz, (an_mpz.binary(), 256))\n"
        "def mpq_reducer(an_mpq): return (gmpy.mpq, (an_mpq.binary(), 256))\n"
        "def mpf_reducer(an_mpf): return (gmpy.mpf, (an_mpf.binary(), 0, 256))\n"
        "copy_reg.pickle(type(gmpy.mpz(0)), mpz_reducer)\n"
        "copy_reg.pickle(type(gmpy.mpq(0)), mpq_reducer)\n"
        "copy_reg.pickle(type(gmpy.mpf(0)), mpf_reducer)\n"
        ;
    PyObject* namespace = PyDict_New();
    PyObject* result = NULL;
    if (options.debug)
        fprintf(stderr, "gmpy_module imported copy_reg OK\n");
    PyDict_SetItemString(namespace, "copy_reg", copy_reg_module);
    PyDict_SetItemString(namespace, "gmpy", gmpy_module);
    PyDict_SetItemString(namespace, "type", (PyObject*)&PyType_Type);
    result = PyRun_String(enable_pickle, Py_file_input,
                          namespace, namespace);
If you want those few extra functions to "stick around" in your module (not necessary in this example case), you would of course use your module object as built by Py_InitModule3 (or whatever other method) and its PyModule_GetDict rather than a transient dictionary as the namespace in which to PyRun_String. And of course there are more sophisticated approaches than to PyRun_String the def and class statements you need, but, for simple enough cases, this simple approach may in fact be sufficient.
Related
I have an inner loop C function which (in my case) constructs a Python datetime object:
PyObject* my_inner_loop_fn(void* some_data) {
    PyObject* datetime = PyImport_ImportModule("datetime");
    if (datetime == NULL) return NULL;
    PyObject* datetime_date = PyObject_GetAttrString(datetime, "date");
    PyObject* result = NULL;
    if (datetime_date != NULL) {
        /* long long my_year, my_month, my_day = ... ; */
        PyObject* args = Py_BuildValue("(LLL)", my_year, my_month, my_day);
        if (args != NULL)
            result = PyObject_Call(datetime_date, args, NULL);
        Py_XDECREF(args);
    }
    Py_XDECREF(datetime_date);
    Py_DECREF(datetime);
    return result;
}
My questions:
Does the PyImport_ImportModule("datetime") reload the entire module from scratch every time, or is it cached?
If it is not cached:
What is the preferred method of caching it?
When is the earliest time it is safe to try importing the other module? Can I assign it to a global variable, for example?
I want to avoid paying a heavy cost for the import, since the function runs frequently. Is the above expected to be performant code?
Does the PyImport_ImportModule("datetime") reload the entire module from scratch every time, or is it cached?
The standard behaviour is to first check sys.modules to see if the module has already been imported and, if so, return the cached module. It is only loaded from scratch if it isn't already in sys.modules (for example because it has never been imported, or a previous import failed).
You can obviously test that yourself by putting some code with a visible side-effect in a module and importing that multiple times (e.g. a print statement).
The module import system is customizable, however, so it's possible for another module to modify that behaviour (the pyximport module, for example, has an option to always reload). Therefore, it's not 100% guaranteed.
It may still be worth caching because there's some cost in doing the look-up - it's a balance between the convenience of not having to cache it yourself and speed.
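You can observe the sys.modules caching from pure Python; the second import simply returns the already-loaded module object:
>>> import sys, importlib
>>> m1 = importlib.import_module('datetime')
>>> 'datetime' in sys.modules
True
>>> m2 = importlib.import_module('datetime')
>>> m2 is m1        # same cached module object, not a fresh load
True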
When is the earliest time it is safe to try importing the other module?
It's safe after the Python interpreter has been initialized. If you're embedding Python in a C/C++ program this is something you need to think about. If you're writing a Python extension module then you can be confident that the interpreter is initialized for your module to be imported.
Can I assign it to a global variable, for example?
Yes. However, global variables make it a little difficult for your module to support being unloaded and reloaded cleanly. Lots of C extensions choose not to worry about this. However, the PyModule_GetState mechanism is designed to support this use-case so you might choose to put your cache in the extension module state.
I am using the first step example in pybind11's documentation
#include <pybind11/pybind11.h>
int add(int i, int j)
{
    return i + j;
}

PYBIND11_MODULE(example, m)
{
    m.doc() = "pybind11 example plugin"; // optional module docstring
    m.def("add", &add, "A function which adds two numbers");
}
everything works fine, I can use it in the Python shell:
import example
example.add(2, 3) #returns 5
Now I made a simple change to use float instead of int for the inputs of add(); everything compiles, and I want to reload the module example so I can test the new float-based add(). However, I cannot figure out a way to reload the example module: importlib.reload does not work, and %autoreload 2 in IPython does not work either.
Both approaches are known to work with pure-Python modules, but not with this C++/pybind11-based module.
Did I miss anything here, or is it supposed to be like this?
UPDATE: it seems this is a known issue; see How to Reload a Python3 C extension module?
Python's import mechanism will never dlclose() a shared library. Once
loaded, the library will stay until the process terminates.
pybind11 modules and ctypes-loaded libraries seem to share the same trait here regarding how the module is loaded/imported.
Also, quoting from https://github.com/pybind/pybind11/issues/2511:
The way C extensions are loaded by Python does not allow them to be
reloaded (in contrast to Python modules, where the Python code can
just be reloaded and doesn't refer to a dynamically loaded library)
I now just wonder whether there is a way to wrap this up more conveniently for reloading the module, e.g. spawning a subprocess with a new Python shell that copies over the C-extension-related variables/modules and substitutes for the original one.
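A minimal sketch of that idea, assuming the rebuilt extension is importable as example: run each test round in a fresh interpreter via subprocess, so nothing ever needs to be reloaded in the current one.
import subprocess, sys

def run_in_fresh_interpreter(snippet):
    """Run `snippet` in a brand-new Python process and return its stdout."""
    completed = subprocess.run([sys.executable, "-c", snippet],
                               capture_output=True, text=True, check=True)
    return completed.stdout

# After recompiling the extension, test it without touching the current process.
print(run_in_fresh_interpreter("import example; print(example.add(2.0, 3.0))"))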
It seems there is no straightforward way to do this. Since it is possible to manage a standard shared library manually with dlopen() and dlclose(), you could change your PYBIND11_MODULE into a pre-defined function like
void __bind_module(void *bind_) {
    typedef void (*binder_t)(const char *, py::cpp_function);
    auto bind = (binder_t) bind_;
    bind("add", add);
}
and then write a manager module to attach and detach those libraries: essentially a small importlib of your own.
What happens behind the scenes (in CPython 3.6.0) when code uses import posix? This module doesn't have a __file__ attribute. When starting the interpreter in verbose mode, I see this line:
import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
It's already present in sys.modules in a newly opened interpreter, and importing it just binds a name to the existing module.
I'm trying to look at the implementation details of os.lstat on my platform to determine if and when it uses os.stat.
Here, have more detail than you're likely to need.
posix is a built-in module. When you hear "built-in module", you might think of ordinary standard library modules, or you might think of modules written in C, but posix is more built-in than most.
The posix module is written in C, in Modules/posixmodule.c. However, while most C modules, even standard library C modules, are compiled to .so or .pyd files and placed on the import path like regular Python modules, posix actually gets compiled right into the Python executable itself.
One of the internal details of CPython's import system is the PyImport_Inittab array:
extern struct _inittab _PyImport_Inittab[];
struct _inittab *PyImport_Inittab = _PyImport_Inittab;
This is an array of struct _inittab entries, each consisting of a name and the C initialization function for the module of that name. Modules listed here are built-in.
This array is initially set to _PyImport_Inittab, which comes from Modules/config.c (or PC/config.c depending on your OS, but that's not the case here). Unfortunately, Modules/config.c is generated from Modules/config.c.in during the Python build process, so I can't show you a source code link, but here's part of what it looks like when I generate the file:
struct _inittab _PyImport_Inittab[] = {
    {"_thread", PyInit__thread},
    {"posix", PyInit_posix},
    // ...
As you can see, there's an entry for the posix module, along with the module initialization function, PyInit_posix.
As part of the import system, when trying to load a module, Python goes through sys.meta_path, a list of module finders. One of these finders is responsible for performing the sys.path search you're likely more familiar with, but one of the others is _frozen_importlib.BuiltinImporter, responsible for finding built-in modules like posix. When Python tries that finder, it runs the finder's find_spec method:
@classmethod
def find_spec(cls, fullname, path=None, target=None):
    if path is not None:
        return None
    if _imp.is_builtin(fullname):
        return spec_from_loader(fullname, cls, origin='built-in')
    else:
        return None
which uses _imp.is_builtin to search PyImport_Inittab for the "posix" name. The search finds the name, so find_spec returns a module spec representing the fact that the loader for built-in modules should handle creating this module. (The loader is the second argument to spec_from_loader. It's cls here, because BuiltinImporter is both the finder and loader.)
Python then runs the loader's create_module method to generate the module object:
@classmethod
def create_module(self, spec):
    """Create a built-in module"""
    if spec.name not in sys.builtin_module_names:
        raise ImportError('{!r} is not a built-in module'.format(spec.name),
                          name=spec.name)
    return _call_with_frames_removed(_imp.create_builtin, spec)
which delegates to _imp.create_builtin, which searches PyImport_Inittab for the module name and runs the corresponding initialization function.
(_call_with_frames_removed(x, y) just calls x(y), but part of the import system treats it as a magic indicator to strip importlib frames from stack traces, which is why you never see those frames in the stack trace when your imports go wrong.)
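You can verify most of this from the interactive prompt on a POSIX build of CPython (the exact ModuleSpec repr may differ slightly between versions):
>>> import sys
>>> 'posix' in sys.builtin_module_names
True
>>> 'posix' in sys.modules          # already imported when the interpreter starts
True
>>> import importlib.machinery
>>> importlib.machinery.BuiltinImporter.find_spec('posix')
ModuleSpec(name='posix', loader=<class '_frozen_importlib.BuiltinImporter'>, origin='built-in')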
If you want to see more of the code path involved, you can look through Lib/importlib/_bootstrap.py, where most of the import implementation lives, Python/import.c, where most of the C part of the implementation lives, and Python/ceval.c, which is where the bytecode interpreter loop lives, and thus is where execution of an import statement starts, before it reaches the more core parts of the import machinery.
Relevant documentation includes the section of the language reference on the import system, as well as PEPs 451 and 302. There isn't much documentation on built-in modules, although I did find a bit of documentation targeted toward people embedding Python in other programs, since they might want to modify PyImport_Inittab, and there is the sys.builtin_module_names list.
I'm currently working on a module that wraps a C-library (let's call it foo)
The C-library has its functions prefixed by foo_ to avoid nameclashes with other libraries:
int foo_dothis(int x, int y);
void foo_dothat(struct foo_struct_*s);
In Python, the foo_ prefix makes little sense, as we have namespaces for that kind of thing.
import foo
foo.dothis(42)
The C-library also has functions for initializing/deinitializing the entire library:
int foo_init(void);
void foo_exit(void);
Now I'm wondering whether I should strip the foo_ prefix for those as well, given the potential for confusion with the built-in exit():
from foo import *
exit()
I guess it is OK, as
being consistent is important
exit() is easier to remember than foo_exit()
foo.exit() is prettier than foo.foo_exit()
people are generally discouraged from using exit() in production code (it should only be used in the interactive interpreter)
importing all symbols from a module asks for trouble anyhow
So what is the common approach to this (best practice, ...)?
Since the role of foo_exit() is to uninitialise the library, and this is kind of the inverse of foo_init(), you could simply use name foo.uninit() for the Python function. This will avoid name clashes and confusion with the builtin exit(), and its purpose should be obvious to users of the module.
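A sketch of what the resulting wrapper module could look like, assuming the C extension itself is built as _foo and keeps the prefixed names (both of those choices are just placeholders):
# foo.py -- public Python face of the C library
import _foo

def init():
    """Initialise the underlying C library (wraps foo_init)."""
    return _foo.foo_init()

def uninit():
    """Shut the library down again; the inverse of init() (wraps foo_exit)."""
    _foo.foo_exit()

def dothis(x, y):
    return _foo.foo_dothis(x, y)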
I'm just getting started with ctypes and would like to use a C++ class, which I have exported in a DLL file, from within Python using ctypes.
So let's say my C++ code looks something like this:
class MyClass {
public:
int test();
...
I would now create a .dll file that contains this class and then load the .dll file in Python using ctypes.
Now, how would I create an object of type MyClass and call its test function? Is that even possible with ctypes? Alternatively, I would consider using SWIG or Boost.Python, but ctypes seems like the easiest option for small projects.
Besides Boost.Python (which is probably the friendlier solution for larger projects that require one-to-one mapping of C++ classes to Python classes), you could provide a C interface on the C++ side. It's one solution of many, so it has its own trade-offs, but I will present it for the benefit of those who aren't familiar with the technique. For full disclosure, with this approach one wouldn't be interfacing C++ to Python, but C++ to C to Python. Below I've included an example that meets your requirements, to show the general idea of the extern "C" facility of C++ compilers.
//YourFile.cpp (compiled into a .dll or .so file)
#include <new>  //For std::nothrow
//Either include a header defining your class, or define it here.

extern "C"  //Tells the compiler to use C linkage for the next scope.
{
    //Note: the interface in this linkage region must use C types only.
    void * CreateInstanceOfClass( void )
    {
        // Note: Inside the function body, I can use C++.
        return new(std::nothrow) MyClass;
    }

    //Thanks Chris.
    void DeleteInstanceOfClass (void *ptr)
    {
        // Cast back to the real type before deleting; deleting a void* is undefined.
        delete reinterpret_cast<MyClass *>(ptr);
    }

    int CallMemberTest(void *ptr)
    {
        // Note: A downside here is the lack of type safety.
        // You could always internally (in the C++ library) save a reference to all
        // pointers created of type MyClass and verify it is an element in that
        // structure.
        //
        // Per comments with Andre, we should avoid throwing exceptions.
        try
        {
            MyClass * ref = reinterpret_cast<MyClass *>(ptr);
            return ref->test();
        }
        catch(...)
        {
            return -1; //assuming -1 is an error condition.
        }
    }
} //End C linkage scope.
You can compile this code with
gcc -fPIC -shared -o test.so test.cpp
# creates test.so in your current working directory.
In your Python code you could do something like this (interactive prompt from 2.7 shown):
>>> from ctypes import cdll
>>> stdc=cdll.LoadLibrary("libc.so.6") # or similar to load c library
>>> stdcpp=cdll.LoadLibrary("libstdc++.so.6") # or similar to load c++ library
>>> myLib=cdll.LoadLibrary("/path/to/test.so")
>>> spam = myLib.CreateInstanceOfClass()
>>> spam
[outputs the pointer address of the element]
>>> value = myLib.CallMemberTest(spam)
[does whatever Test does to the spam reference of the object]
I'm sure Boost.Python does something similar under the hood, but perhaps understanding the lower-level concepts is helpful. I would be more excited about this method if you were attempting to access the functionality of a C++ library and a one-to-one mapping were not required.
For more information on C/C++ interaction check out this page from Sun: http://dsc.sun.com/solaris/articles/mixing.html#cpp_from_c
The short story is that there is no standard binary interface for C++ in the way that there is for C. Different compilers output different binaries for the same C++ dynamic libraries, due to name mangling and different ways to handle the stack between library function calls.
So, unfortunately, there really isn't a portable way to access C++ libraries in general. But, for one compiler at a time, it's no problem.
This blog post also has a short overview of why this currently won't work. Maybe after C++0x comes out, we'll have a standard ABI for C++? Until then, you're probably not going to have any way to access C++ classes through Python's ctypes.
The answer by AudaAero is very good but not complete (at least for me).
On my system (Debian Stretch x64 with GCC and G++ 6.3.0, Python 3.5.3) I get segfaults as soon as I call a member function that accesses a member value of the class.
By printing pointer values to stdout, I diagnosed that the void* pointer, which is 64 bits wide in the wrappers, was being represented on only 32 bits in Python. Thus big problems occur when it is passed back to a member function wrapper.
The solution I found is to change:
spam = myLib.CreateInstanceOfClass()
Into
Class_ctor_wrapper = myLib.CreateInstanceOfClass
Class_ctor_wrapper.restype = c_void_p
spam = c_void_p(Class_ctor_wrapper())
So two things were missing: setting the return type to c_void_p (the default is int) and then creating a c_void_p object (not just an integer).
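The same truncation can bite in the other direction, when the pointer is handed back to the library, so it is also worth declaring the argument type on the member-call wrapper (same function names as above):
Call_member_wrapper = myLib.CallMemberTest
Call_member_wrapper.argtypes = [c_void_p]   # prevents the pointer being narrowed to a C int
value = Call_member_wrapper(spam)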
I wish I could have written a comment but I still lack 27 rep points.
Extending AudaAero's and Gabriel Devillers' answer, I would complete the class object instance creation with:
spam = c_void_p(myLib.CreateInstanceOfClass())
Using the ctypes c_void_p data type ensures the proper representation of the class object pointer within Python.
Also make sure that the DLL's memory management is handled by the DLL itself (memory allocated in the DLL should also be deallocated in the DLL, not in Python)!
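One way to honour that rule from Python is to wrap the two C entry points in a small owner class, so every object obtained from CreateInstanceOfClass is eventually released through DeleteInstanceOfClass inside the library. A sketch, assuming the test.so interface shown earlier (the class name PyMyClass is only illustrative):
from ctypes import cdll, c_void_p, c_int

_lib = cdll.LoadLibrary("/path/to/test.so")
_lib.CreateInstanceOfClass.restype = c_void_p
_lib.CallMemberTest.argtypes = [c_void_p]
_lib.CallMemberTest.restype = c_int
_lib.DeleteInstanceOfClass.argtypes = [c_void_p]

class PyMyClass:
    """Owns a MyClass* created inside the shared library."""
    def __init__(self):
        self._ptr = c_void_p(_lib.CreateInstanceOfClass())

    def test(self):
        return _lib.CallMemberTest(self._ptr)

    def close(self):
        # Deallocate where the memory was allocated: inside the DLL/.so.
        if self._ptr:
            _lib.DeleteInstanceOfClass(self._ptr)
            self._ptr = None

    def __del__(self):
        self.close()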
I ran into the same problem. From trial and error and some internet research (not necessarily from knowing the g++ compiler or C++ very well), I came across this particular solution that seems to be working quite well for me.
//model.hpp
#include <cstdint>  // for uint32_t

class Model{
public:
    static Model* CreateModel(char* model_name) asm("CreateModel"); // static method, creates an instance of the class
    double GetValue(uint32_t index) asm("GetValue"); // object method
};
#model.py
from ctypes import ...

if __name__ == '__main__':
    # load dll as model_dll
    # Static Method Signature
    fCreateModel = getattr(model_dll, 'CreateModel')  # or model_dll.CreateModel
    fCreateModel.argtypes = [c_char_p]
    fCreateModel.restype = c_void_p
    # Object Method Signature
    fGetValue = getattr(model_dll, 'GetValue')  # or model_dll.GetValue
    fGetValue.argtypes = [c_void_p, c_uint32]  # Notice the two params
    fGetValue.restype = c_double
    # Calling the Methods
    obj_ptr = fCreateModel(c_char_p(b"new_model"))
    val = fGetValue(obj_ptr, c_uint32(0))  # pass obj_ptr as the first param of the object method
>>> nm -Dg libmodel.so
U cbrt#GLIBC_2.2.5
U close#GLIBC_2.2.5
00000000000033a0 T CreateModel # <----- Static Method
U __cxa_atexit#GLIBC_2.2.5
w __cxa_finalize#GLIBC_2.2.5
U fprintf#GLIBC_2.2.5
0000000000002b40 T GetValue # <----- Object Method
w __gmon_start__
...
...
... # Mangled Symbol Names Below
0000000000002430 T _ZN12SHMEMWrapper4HashEPKc
0000000000006120 B _ZN12SHMEMWrapper8info_mapE
00000000000033f0 T _ZN5Model12DestroyModelEPKc
0000000000002b20 T _ZN5Model14GetLinearIndexElll
First, I was able to avoid the extern "C" directive completely by instead using the asm keyword, which, to my knowledge, asks the compiler to use a given name instead of the generated one when exporting the function to the shared object library's symbol table. This allowed me to avoid the weird symbol names that the C++ compiler generates automatically; they look something like the _ZN1... pattern you see above. Then, in a program using Python ctypes, I was able to access the class functions directly using the custom names I gave them. The program looks like fhandle = mydll.myfunc or fhandle = getattr(mydll, 'myfunc') instead of fhandle = getattr(mydll, '_ZN12...myfunc...'). Of course, you could just use the long name; it would make no difference, but I figure the shorter name is a little cleaner and doesn't require using nm to read the symbol table and extract the names in the first place.
Second, in the spirit of Python's style of object-oriented programming, I decided to try passing my class' object pointer in as the first argument of the class object method, just like we pass self in as the first parameter in Python object methods. To my surprise, it worked! See the Python section above. Apparently, if you set the first entry of fhandle.argtypes to c_void_p and pass in the pointer you get from your class' static factory method, the program should execute cleanly. Class static methods seem to work as one would expect in Python; just use the original function signature.
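Wrapped up, those calls can be hidden behind a small Python class so the explicit pointer passing stays out of the caller's way (a sketch reusing the fCreateModel and fGetValue handles configured in the snippet above; error handling omitted):
class Model:
    """Thin Python proxy around the C++ Model exported through the asm() names."""
    def __init__(self, name):
        # The factory returns an opaque pointer to the C++ object.
        self._ptr = fCreateModel(c_char_p(name.encode()))

    def get_value(self, index):
        # The object pointer goes first, playing the role of `self`.
        return fGetValue(self._ptr, c_uint32(index))

model = Model("new_model")
print(model.get_value(0))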
I'm using g++ 12.1.1, python 3.10.5 on Arch Linux. I hope this helps someone.