Retrieving structs from .so files in Python

I am attempting to write a .so library wrapper for an existing C source code project, and then call the functions in the .so library from Python. I have been able to call functions with primitive arguments and return types with no problem, so I am now working on interfacing with more complex functions that have arguments that are pointers to structures.
My problem is in creating the structures on the Python side so that I can call the C-library functions. Some of the structs in the .so library have hundreds of fields, so I was hoping there was an easier alternative to spelling out all the fields and types in a Python ctypes Structure object.
I would like to be able to write something like this in Python:
from ctypes import *
lib = cdll.LoadLibrary("./libexample.so")
class Input(Structure):
    _fields_ = lib.example_struct._fields  ## where `example_struct` is defined in the .so library
    ## I have no idea if you can actually get the fields of the struct!!
my_input = Input(a, b, c, ...)  ## pseudo-code
my_ptr = pointer(my_input)  ## wrap the input with a pointer
result = lib.my_lib_func(my_ptr)  ## call .so function with struct
This would allow me to easily replicate at least the structure definitions of the large C structs without having to create and maintain lengthy Python versions of the struct definitions. Is this possible? Or is there another way to achieve the same effect?
EDIT: The C source code is third party, so for now, I am looking for an approach where I don't have to modify the C source.

The Cython approach is to read and interpret the .h header file.
I am not saying it would be easy, though.
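A compiled .so does not carry struct layouts, so any automatic approach has to start from the header text. The following is a minimal sketch of that idea in plain Python and ctypes, not a general C parser. The helper name struct_from_header, the type table and the sample header are all assumptions made for illustration. It only handles flat structs with a few primitive types, fixed-size arrays and char pointers, and ignores comments, nested structs and preprocessor conditionals.

```python
import ctypes
import re

# Deliberately small mapping from C type spellings to ctypes types (assumption:
# the header only uses these; extend it for the types your library needs).
C_TO_CTYPES = {
    "int": ctypes.c_int,
    "unsigned int": ctypes.c_uint,
    "long": ctypes.c_long,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "char": ctypes.c_char,
    "char *": ctypes.c_char_p,
}

# Matches declarations such as "int a;", "char *name;" or "double values[4];".
FIELD_RE = re.compile(
    r"^\s*([A-Za-z_][\w\s]*?)\s*(\*?)\s*(\w+)\s*(?:\[(\d+)\])?\s*;", re.M
)


def struct_from_header(text, struct_name):
    """Build a ctypes.Structure subclass from a simple C struct definition."""
    match = re.search(r"struct\s+%s\s*\{(.*?)\}" % re.escape(struct_name), text, re.S)
    if match is None:
        raise ValueError("struct %s not found" % struct_name)
    fields = []
    for c_type, star, name, count in FIELD_RE.findall(match.group(1)):
        key = c_type.strip() + (" *" if star else "")
        py_type = C_TO_CTYPES[key]  # KeyError means the table needs extending
        if count:
            py_type = py_type * int(count)  # fixed-size array member
        fields.append((name, py_type))
    return type(struct_name, (ctypes.Structure,), {"_fields_": fields})


# Hypothetical header text standing in for the third-party .h file.
HEADER = """
struct example_struct {
    int a;
    unsigned int b;
    double values[4];
    char *name;
};
"""

Input = struct_from_header(HEADER, "example_struct")
my_input = Input(a=1, b=2, name=b"demo")
field_names = [name for name, _ in Input._fields_]
```

For real headers with hundreds of fields, a proper C parser such as pycparser is a safer basis than a regular expression, but the final step of building the Structure subclass dynamically with type() stays the same.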

Related

Call Windows Function in Python

https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntqueryinformationfile?redirectedfrom=MSDN
How can I call the above kernel method in python? I found an example on another stackoverflow post: Winapi: Get the process which has specific handle of a file
The answer on this other post is essentially what I want to do, but in python. The goal is to be able to get a list of processes which currently are accessing/locking a file. This NtQueryInformationFile method seems to be exactly what I want. I know this can be done with ctypes, but I am not familiar or comfortable enough with ctypes to do this myself. How can I do this?
If there's no available wrapper for the function, you'll need to call the function yourself using ctypes.
The DLLs that Windows uses are exposed through ctypes.windll, with ctypes.windll.ntdll being the one that exports the function you need.
To help Python convert arguments, it's usually a good idea to specify the function's argument and return types, which can be done through the argtypes and restype attributes on the function object, like so:
import ctypes
import ctypes.wintypes
function = ctypes.windll.ntdll.NtQueryInformationFile
function.argtypes = [ctypes.wintypes.HANDLE, ...]
function.restype = ctypes.c_long
ctypes exposes the common Windows types in the ctypes.wintypes module, though for most structures, like the PIO_STATUS_BLOCK in your function, you'll need to define the struct yourself and add it to the argument list to use it properly. If an argument is optional, declaring it as a void pointer (ctypes.c_void_p) and passing None will suffice.
Also, keep in mind that Windows handles are not the file descriptors that Python exposes; to convert between the two you can use the ..._osfhandle functions (get_osfhandle and open_osfhandle) from the msvcrt module.
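As a sketch of what "define the struct yourself" could look like for this function, the block below declares IO_STATUS_BLOCK by hand and fills in the full argtypes list. The layout follows the documented C declaration (a union of an NTSTATUS and a pointer, followed by a pointer-sized Information field). The helper name bind_nt_query_information_file and the use of c_size_t for ULONG_PTR and c_int for the FILE_INFORMATION_CLASS enum are choices made here for illustration. The binding is guarded so the module still imports on non-Windows systems, where ctypes.windll does not exist.

```python
import ctypes
import sys


class _StatusOrPointer(ctypes.Union):
    # NTSTATUS is a 32-bit signed LONG; the union overlays it with a PVOID.
    _fields_ = [("Status", ctypes.c_long), ("Pointer", ctypes.c_void_p)]


class IO_STATUS_BLOCK(ctypes.Structure):
    _anonymous_ = ("u",)  # lets callers write block.Status directly
    _fields_ = [
        ("u", _StatusOrPointer),
        ("Information", ctypes.c_size_t),  # stands in for ULONG_PTR
    ]


def bind_nt_query_information_file():
    """Return the configured ntdll function, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    func = ctypes.windll.ntdll.NtQueryInformationFile
    func.argtypes = [
        wintypes.HANDLE,                   # FileHandle
        ctypes.POINTER(IO_STATUS_BLOCK),   # IoStatusBlock
        ctypes.c_void_p,                   # FileInformation (caller's buffer)
        wintypes.ULONG,                    # Length of that buffer in bytes
        ctypes.c_int,                      # FileInformationClass (enum value)
    ]
    func.restype = ctypes.c_long           # NTSTATUS
    return func


nt_query_information_file = bind_nt_query_information_file()
status_block = IO_STATUS_BLOCK()
```

On Windows, the call would then take the handle obtained from msvcrt.get_osfhandle, ctypes.byref(status_block), an output buffer, its size and the information class value from the Microsoft documentation.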

How to marshall data structures from C/C++ into python and back

I have a program that is written in C++ and can be extended with extensions that are written in python using python's C api, basically there is python interpreter in the program which can load python modules that contains some hooks to various events.
I can pass simple data types from C to python, such as string or integer. I can do the same the other way, I can call internal C++ api's I created from the python modules. What I can't do, however is to marshall more complex data types, such as classes. For example I have this thing:
class X
{
    int a;
    int b;
};
I could call some python hook that would look like this:
def some_hook(a, b):
    print(str(a + b))
but I would rather do
def some_hook(x):
    print(str(x.a + x.b))
I know how to call a python function from C using PyObject_CallObject and how to give it some simple python vars as parameters (like integer or string). But I have no idea how to construct python classes from within C/C++, fill them up with data, and then pass them as parameters to PyObject_CallObject.
My idea was to create some proxy classes, as SWIG (swig.org) does, that I would fill with data from the C++ class and read back afterwards. I created a .py file that contains the declarations of these classes, but I still have no idea how to instantiate them from within C/C++ and fill them in, nor how to go the other way and turn a PyObject into a C/C++ class.
In a nutshell: I want to be able to turn C++ class into a python class, call a python function with that class as a parameter, and then call some other function from python with this class that would be just a C++ API that would turn the python class into C++.
Some background reading on how to call python from C: http://docs.python.org/3/extending/embedding.html
Source code of my app C++ that loads and uses python extensions: https://github.com/huggle/huggle3-qt-lx/blob/master/huggle/pythonengine.cpp

Passing structure with pointers to other structures in ctypes

I am trying to make a python wrapper for a C library using ctypes. The library has functions which require a pointer to a structure to be passed, which acts as a handle for future calls.
This structure has pointers to another internal structure that further has pointers to other structures.
typedef struct varnam {
    char *scheme_file;
    char *suggestions_file;
    struct varnam_internal *internal;
} varnam;
The varnam_internal structure has pointers to an sqlite database and so forth
struct varnam_internal
{
    sqlite3 *db;
    sqlite3 *known_words;
    struct varray_t *r;
    struct token *v;
    ...
}
I tried ignoring the varnam_internal structure according to this SO answer. Something like
class Varnam(Structure):
    _fields_ = [("scheme_file", c_char_p),
                ("suggestions_file", c_char_p),
                ("internal", c_void_p)]
But this does not seem to work because I think the library needs to allocate varnam_internal for functioning properly.
Should I implement all the dependent structures in python? Is ctypes suitable for wrapping libraries like this? I have read about alternatives like Cython but I have no experience with Cython so is this doable in it?
There's no reason to define the varnam_internal structure in ctypes because you should have no need to access it. The library you're calling will allocate it regardless of whether you define the structure or not. Whatever problem you're encountering, it's not because you didn't define the structure in ctypes.
Make sure you're calling varnam_init correctly. It uses pointers to pointers as arguments, which means you can't just use your Varnam class directly. You'll want to do something like this:
from ctypes import *
class Varnam(Structure):
    # ctypes looks for _fields_ (single underscores), not __fields__
    _fields_ = [("scheme_file", c_char_p),
                ("suggestions_file", c_char_p),
                ("internal", c_void_p)]
varnam_ptr = POINTER(Varnam)
libvarnam = cdll.LoadLibrary("libvarnam.so")  # on Linux
# libvarnam = cdll.libvarnam  # on Windows
varnam_init = libvarnam.varnam_init
varnam_init.argtypes = [c_char_p, POINTER(varnam_ptr), POINTER(c_char_p)]
def my_varnam_init(scheme_file):
    # scheme_file must be a bytes object on Python 3
    handle = varnam_ptr()  # NULL pointer that varnam_init fills in
    msg = c_char_p()
    # byref() is a function in ctypes, not a method on the object
    r = varnam_init(scheme_file, byref(handle), byref(msg))
    if r != 0:
        raise Exception(msg.value)
    return handle
The above code is completely untested, but shows you how you should be calling varnam_init.
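Since that code needs libvarnam to run, here is a self-contained sketch of the same pointer-to-pointer idiom that runs without any shared library. A ctypes callback, here called fake_init, plays the role of the C function that allocates the handle; it is a stand-in written for this example, not part of libvarnam, and the Handle struct and file name are likewise hypothetical.

```python
from ctypes import (CFUNCTYPE, POINTER, Structure, byref, c_char_p, c_int,
                    c_void_p, pointer)


class Handle(Structure):
    _fields_ = [("scheme_file", c_char_p), ("internal", c_void_p)]


HandlePtr = POINTER(Handle)

# Same shape as an init function taking (const char *, handle **).
INIT_PROTO = CFUNCTYPE(c_int, c_char_p, POINTER(HandlePtr))

_keepalive = []  # keeps the "library-owned" structs from being garbage collected


def _fake_init(scheme_file, out_handle):
    # The "library" allocates the struct and stores its address in *out_handle.
    h = Handle(scheme_file, None)
    _keepalive.append(h)
    out_handle[0] = pointer(h)
    return 0  # 0 means success in this sketch


fake_init = INIT_PROTO(_fake_init)

handle = HandlePtr()                 # starts out as a NULL pointer
was_null = not handle
rc = fake_init(b"example.vst", byref(handle))
scheme = handle.contents.scheme_file
```

The caller only ever creates an empty pointer and passes its address with byref(); the struct behind it, including anything hanging off the internal member, belongs to the callee.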

Is there a way to load the constant values stored in a header file via ctypes?

I'm doing a bunch of ctypes calls to the underlying OS libraries. My progress slows to a crawl any time the docs reference a constant value stored in a .h file somewhere, as I have to go track it down and figure out what the actual value is so that I can pass it into the function.
Is there any way to load a .h file with ctypes and get access to all of the constants?
No.
Early versions of ctypes came with a module called codegenerator, which would parse header files, both to get the constant values and to convert the prototypes into restype/argtypes declarations. However, as far as I know, this was never finished, and it was dropped from the package before inclusion in the stdlib.
You could dig into the source and pull out the constants stuff while skipping the much more complicated prototypes stuff.
However, the way I've usually done it is to write my own generator.
For example, run this script as part of your setup process:
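import ast
import re
constants = {}
with open('foo.h') as infile:
    # [ \t]+ keeps a valueless "#define FOO" from swallowing the next line
    for name, value in re.findall(r'#define[ \t]+(\w+)[ \t]+(.*)', infile.read()):
        try:
            constants[name] = ast.literal_eval(value.strip())
        except Exception as e:
            pass # maybe log something
with open('_foo_h.py', 'w') as outfile:
    # write one "NAME = value" line per constant so that import * works
    for name, value in constants.items():
        outfile.write('%s = %r\n' % (name, value))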
Then foo.py can just from _foo_h import *.
Writing a perfect regexp for this is very hard, maybe impossible; writing one that works for the headers you actually care about in a given project is very easy. In fact, often, either the one above, or one that skips over comments, is all you need.
But sometimes this won't work. For example, a header file might #define FOO_SIZE 8 for 64-bit builds, and #define FOO_SIZE 4 for 32-bit builds. How do you handle that?
For that, you ask the compiler to do it for you. Most compilers have a way to preprocess a file just far enough to get all of the active definitions. Some compilers can even just dump the macro defines, in a nice format, skipping everything else. With gcc and flag-compatible compilers like clang, -E preprocesses, and -dM dumps macros. So:
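import ast
import subprocess
# text=True (Python 3.7+) returns str instead of bytes
macros = subprocess.check_output(['gcc', '-dM', '-E', 'foo.h'], text=True)
for line in macros.splitlines():
    try:
        # each line looks like "#define NAME value"
        _, name, value = line.split(None, 2)
        constants[name] = ast.literal_eval(value)
    except Exception as e:
        pass # again, do something nicer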
You may want to pass in some extra compiler flags to control what gets defined appropriately, like the output of pkg-config --cflags foo.
This will also give you the macros defined in anything that foo.h (recursively) includes, and gcc's builtin macros. You may or may not want each of those. Somewhere among the 69105 gcc flags, I believe there are ways to control that, but I don't remember them.
Note that neither of these will get you constant variables or enums, like:
static const int SPAM_SPAM_SPAM = 73;
enum {
    kSPAM = 1,
    kEGGS
};
Parsing that gets more difficult; you'd want to use a real C99 parser like pycparser—or, alternatively, you'd want to parse the output of something like gccxml instead of gcc -E. But even that isn't going to tell you that kEGGS is 2 without you writing a bit of logic.
And if you want to deal with C++, it's even worse, what with constexpr and static class members and user-defined literals…
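For plain C enums like the one above, the "bit of logic" is mostly the rule that an enumerator without an explicit value is one more than the previous one, starting from zero. The sketch below applies that rule to simple enum bodies found with a regular expression. The function names are made up for this example, and it assumes every explicit value is a plain literal such as 1 or 0x10, not an expression or a reference to another enumerator.

```python
import ast
import re


def parse_enum_body(body):
    """Map enumerator names to values for the text between { and }."""
    values = {}
    next_value = 0  # C starts counting at 0 when no value is given
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, _, expr = item.partition("=")
        if expr.strip():
            # Explicit value: only literals are supported in this sketch.
            next_value = ast.literal_eval(expr.strip())
        values[name.strip()] = next_value
        next_value += 1
    return values


def enums_from_source(text):
    """Collect the enumerators of every simple enum in a piece of C source."""
    result = {}
    for body in re.findall(r"enum\s*\w*\s*\{(.*?)\}", text, re.S):
        result.update(parse_enum_body(body))
    return result


SOURCE = """
enum {
    kSPAM = 1,
    kEGGS
};
"""

enum_constants = enums_from_source(SOURCE)
```

Comments inside the enum body and values computed from other macros would need a real parser or the preprocessor output.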
Alternatively… do you have to use ctypes?
CFFI provides a different way to call C code from Python—and it makes this a lot easier.
Cython lets you write almost-Python code that gets compiled to C that gets compiled to a Python extension module, and it can include header files directly.
There are also a variety of binding-generators (e.g., SWIG) or binding-writing libraries (e.g., boost::python) that can make it easier to export values to Python through an extension module.

referencing opaque types with python ctypes

I'm using a 3rd party C library that defines an opaque type:
foo_t
And uses pointers to this type in its functions:
void foo_init(foo_t *foo);
Typical usage would be allocating a foo_t on the stack and passing a reference:
{
    foo_t foo;
    foo_init(&foo);
    ...
}
How do I call foo_init() with ctypes without knowing what constitutes a foo_t?
I think if I knew sizeof(foo_t) I could create a buffer of that size and cast, but is it possible to get the size with ctypes?
I could write a one-liner C program:
printf("sizeof(foo_t) = %zu\n", sizeof(foo_t));
and hard-code that value into my python, but that would get ugly in a hurry: I'd have to touch my python source with every upgrade to the library.
A slightly cleaner way would be to write a python c-ext to export the size value, but that too would require a recompile with every library upgrade.
Does anyone have a recipe for using ctypes with such opaque types?
I think this is the simplest solution...
Create a C file, say, foosizes.c:
size_t SIZEOF_FOO = sizeof(foo_t);
And compile it into a shared object, foosizes.so. Then in a python script:
from ctypes import *
foosizeslib = CDLL('./foosizes.so')
# SIZEOF_FOO is a size_t, so read it as c_size_t rather than c_ulong
sizeof_foo = c_size_t.in_dll(foosizeslib, 'SIZEOF_FOO').value
I can then create a buffer of the appropriate size and pass it to functions, by reference, as a pointer to the opaque type. So far, so good.
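The buffer step can be sketched as follows. Instead of a raw string buffer, this version wraps the bytes in a dynamically created Structure so that POINTER(foo_t) can be used in argtypes. The helper name opaque_struct is an assumption for this example, and the size 64 is only a placeholder for the value read from SIZEOF_FOO; the call into the library is shown as a comment because it needs the real shared object.

```python
import ctypes


def opaque_struct(name, size):
    """Create a Structure subclass that is just `size` bytes of storage."""
    return type(
        name,
        (ctypes.Structure,),
        {"_fields_": [("_opaque", ctypes.c_ubyte * size)]},
    )


# 64 is a placeholder; in practice pass the size exported by foosizes.so.
foo_t = opaque_struct("foo_t", 64)
foo = foo_t()  # zero-initialised storage owned by Python

# With the real library loaded as `lib`, the call would look like this:
# lib.foo_init.argtypes = [ctypes.POINTER(foo_t)]
# lib.foo_init.restype = None
# lib.foo_init(ctypes.byref(foo))
```

The object must stay alive for as long as the library uses it. Note also that a byte array only guarantees the alignment the allocator happens to provide, which may matter if the real foo_t has stricter alignment requirements.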
It is not possible to get the size with ctypes. C does not support runtime reflection: unlike Java or C#/.NET, the compiled binary stores no metadata about types.
As you said, one way to get the size is to create a simple C program that includes the header defining the type and uses the sizeof operator to print the size. Taking that a step further, you could use a C compiler written in Python to compile and execute your C code, so that the size is obtained when your Python code runs. You might even be able to get it without executing the result, by walking the data structures the compiler provides.
That said, are you certain you need to create the memory yourself? Frequently C libraries provide a method to create an opaque type that their other functions operate on. Update: from the comments it is certain that the memory must be allocated by the caller.
