I am trying to find (line and column position) all the references of a specific function declaration when parsing a C++ source file via libclang in Python.
For example:
#include <iostream>
using namespace std;
int addition (int a, int b)
{
int r;
r=a+b;
return r;
}
int main ()
{
int z, q;
z = addition (5,3);
q = addition (5,5);
cout << "The first result is " << z;
cout << "The second result is " << q;
}
So, for the source file above, given the function declaration for addition at line 5, I would like find_all_function_decl_references (see below) to return the references to addition at lines 15 and 16.
I have tried this (adapted from here)
import clang.cindex
import ccsyspath
index = clang.cindex.Index.create()
translation_unit = index.parse(filename, args=args)
for node in translation_unit.cursor.walk_preorder():
    node_definition = node.get_definition()
    if node.location.file is None:
        continue
    if node.location.file.name != sourcefile:
        continue
    if node_definition is None:
        pass
    if node.kind.name == 'FUNCTION_DECL':
        if node.kind.is_reference():
            find_all_function_decl_references(node_definition.displayname) # TODO
Another approach could be to store all the function declarations found in a list and run the find_all_function_decl_references method on each.
Does anyone have any idea of how to approach this? What would this find_all_function_decl_references method look like? (I am very new to libclang and Python.)
I have seen this where the def find_typerefs is finding all references to some type but I am not sure how to implement it for my needs.
Ideally, I would like to be able to fetch all references for any declaration; not only functions but variable declarations, parameter declarations (e.g. the a and b in the example above in line 7), class declarations etc.
EDIT
Following Andrew's comment, here are some details regarding my setup specifications:
LLVM 3.8.0-win64
libclang-py3 3.8.1
Python3.5.1 (in Windows, I assume CPython)
For the args, I tried both the ones suggested in the answer here and the ones from another answer.
*Please note: given my limited programming experience, I would appreciate an answer with a brief explanation of how it works.
The thing that really makes this problem challenging is the complexity of C++.
Consider what is callable in C++: functions, lambdas, the function call operator, member functions, template functions and member template functions. So in the case of just matching call expressions, you'd need to be able to disambiguate these cases.
Furthermore, libclang doesn't offer a perfect view of the clang AST (some nodes don't get exposed completely, particularly some nodes related to templates). Consequently, it's possible (even likely) that an arbitrary code fragment would contain some construct where libclang's view of the AST is insufficient to associate the call expression with a declaration.
However, if you're prepared to restrict yourself to a subset of the language it may be possible to make some headway - for example, the following sample tries to associate call sites with function declarations. It does this by doing a single pass over all the nodes in the AST matching function declarations with call expressions.
from clang.cindex import *
def is_function_call(funcdecl, c):
    """ Determine whether a call-expression cursor refers to a particular function declaration
    """
    defn = c.get_definition()
    return (defn is not None) and (defn == funcdecl)
def fully_qualified(c):
    """ Retrieve a fully qualified function name (with namespaces)
    """
    res = c.spelling
    c = c.semantic_parent
    while c.kind != CursorKind.TRANSLATION_UNIT:
        res = c.spelling + '::' + res
        c = c.semantic_parent
    return res
def find_funcs_and_calls(tu):
    """ Retrieve lists of function declarations and call expressions in a translation unit
    """
    filename = tu.cursor.spelling
    calls = []
    funcs = []
    for c in tu.cursor.walk_preorder():
        if c.location.file is None:
            pass
        elif c.location.file.name != filename:
            pass
        elif c.kind == CursorKind.CALL_EXPR:
            calls.append(c)
        elif c.kind == CursorKind.FUNCTION_DECL:
            funcs.append(c)
    return funcs, calls
idx = Index.create()
args = '-x c++ --std=c++11'.split()
tu = idx.parse('tmp.cpp', args=args)
funcs, calls = find_funcs_and_calls(tu)
for f in funcs:
    print(fully_qualified(f), f.location)
    for c in calls:
        if is_function_call(f, c):
            # print the call site's location, matching the output shown below
            print('-', c.location)
    print()
To show how well this works, you need a slightly more challenging example to parse:
// tmp.cpp
#include <iostream>
using namespace std;
namespace impl {
int addition(int x, int y) {
return x + y;
}
void f() {
addition(2, 3);
}
}
int addition (int a, int b) {
int r;
r=a+b;
return r;
}
int main () {
int z, q;
z = addition (5,3);
q = addition (5,5);
cout << "The first result is " << z;
cout << "The second result is " << q;
}
And I get the output:
impl::addition
- <SourceLocation file 'tmp.cpp', line 10, column 9>
impl::f
addition
- <SourceLocation file 'tmp.cpp', line 22, column 7>
- <SourceLocation file 'tmp.cpp', line 23, column 7>
main
Scaling this up to consider more types of declarations would (IMO) be non-trivial and an interesting project in its own right.
Addressing comments
Given that there are some questions about whether the code in this answer produces the results I've provided, I've added a gist of the code (that reproduces the content of this question) and a very minimal vagrant machine image that you can use to experiment with. Once the machine is booted you can clone the gist, and reproduce the answer with the commands:
git clone https://gist.github.com/AndrewWalker/daa2af23f34fe9a6acc2de579ec45535 find-func-decl-refs
cd find-func-decl-refs
export LD_LIBRARY_PATH=/usr/lib/llvm-3.8/lib/ && python3 main.py
I am new to both C and ctypes, but I cannot seem to find an answer on how to do this, particularly with a numpy array.
C Code
// Import/Export Macros
#define DllImport __declspec( dllimport )
#define DllExport __declspec( dllexport )
// Test function for receiving and transmitting arrays
extern "C"
DllExport void c_fun(char **string_array)
{
string_array[0] = "foo";
string_array[1] = "bar";
string_array[2] = "baz";
}
Python Code
import numpy as np
import ctypes
# Load the DLL library...
# Define function argtypes
lib.c_fun.argtypes = [np.ctypeslib.ndpointer(ctypes.c_char, ndim = 2, flags="C_CONTIGUOUS")]
# Initialize, call, and print
string_array = np.empty((3,10),dtype=ctypes.c_char)
lib.c_fun(string_array)
print(string_array)
I am sure there is some encoding/decoding that needs to happen as well, but I am not sure how/which. Thanks!
Addressing the C code part of the question only...
As noted in comments, C does not allow assignments in this way if the three variables shown are defined as char arrays:
string_array[0] = "foo";
string_array[1] = "bar";
string_array[2] = "baz";
Use the following:
strcpy(string_array[0], "foo");
strcpy(string_array[1], "bar");
strcpy(string_array[2], "baz");
And as long as the caller to this function is pre-allocating and freeing memory for the buffers, this part of the solution is now at least syntactically correct.
But if the strings do indeed need to be immutable to be compatible with Python, then in the caller function allocate memory to create char **string_array such that you can pass an array of 3 pointers as the argument. For example:
char **string_array = malloc(3*sizeof(*string_array));//creates array of 3 pointers.
Then call it as:
c_fun(string_array);
This allows use of the DLL call just as shown in your original post:
DllExport void c_fun(char **string_array)
{
//array of pointers being assigned to addresses of 3 string literals
string_array[0] = "foo";//these will now be immutable strings
string_array[1] = "bar";
string_array[2] = "baz";
}
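From the Python side, an array of three char * slots like the malloc'd one above can be built with ctypes alone. The sketch below is only a minimal illustration, not the asker's DLL: fake_c_fun is a hypothetical pure-Python stand-in for c_fun, and the argtypes line in the comment is an assumption about how the real export would be declared.

```python
import ctypes

# Three char* slots, all NULL to start with; this plays the role of
# malloc(3 * sizeof(*string_array)) in the C caller.
StringArray3 = ctypes.c_char_p * 3
string_array = StringArray3()

def fake_c_fun(arr):
    # Hypothetical stand-in for the DLL's c_fun: store a pointer to
    # a string in each slot, as the C version does with literals.
    arr[0] = b"foo"
    arr[1] = b"bar"
    arr[2] = b"baz"

# With the real library one would (assumption) declare
#   lib.c_fun.argtypes = [ctypes.POINTER(ctypes.c_char_p)]
# and call lib.c_fun(string_array) instead.
fake_c_fun(string_array)

# Each element reads back as bytes; decode to get Python str.
decoded = [s.decode("ascii") for s in string_array]
print(decoded)
```

Because the slots hold pointers rather than fixed-width character rows, no (3, 10) numpy buffer is needed for this variant.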
Here is my C code:
//int MyFunc(char* res); -> This is the definition of C function
char data[4096];
MyFunc(data);
printf("Data is : %s\n", data);
The data variable is updated by the C function. I used a bytearray in Python to pass the variable as an argument, but the updated contents are not reflected. A working code sample would be much appreciated.
EDIT: I am using Python 3.7.
My Python code:
data = bytearray(b'1234567890')
str_buffer = create_string_buffer(bytes(data), len(data))
print(MyFunc(str_buffer))
print(str_buffer.value) #Output: b''
str_buffer does not contain the values updated by MyFunc().
Calling MyFunc() from C# using the below signature works for me. I am looking for a Python 3.7 equivalent of it.
[DllImport("mydll.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
public static extern int MyFunc(StringBuilder data);
A bytearray isn't the right way to pass a char * to a C function. Use create_string_buffer instead. Also, len(data) is an off-by-one error that results in a null terminator not being present, so either stick a + 1 on that or remove it, as the default length is right. Here's a minimal working example. First, a C function that turns every letter uppercase, and returns the number of letters that were already uppercase:
#include <ctype.h>
int MyFunc(char* res) {
    int i = 0;
    while(*res) {
        if(isupper(*res)) {
            ++i;
        } else {
            *res = toupper(*res);
        }
        ++res;
    }
    return i;
}
I compiled it with gcc -fPIC -shared upstring.c -o upstring.so. Since you're on Windows, you'll have to adapt this.
Now, some Python that calls it:
from ctypes import *
upstring = CDLL("./upstring.so") # Since you're on Windows, you'll have to adapt this too.
data = bytearray(b'abc12DEFGHI')
str_buffer = create_string_buffer(bytes(data)) # Note: using len(data) would be an off-by-one error that would lose the null terminator, so either omit it or use len(data)+1
print(upstring.MyFunc(str_buffer)) # prints 6
print(str_buffer.value) # prints b'ABC12DEFGHI'
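The off-by-one point can be checked with ctypes alone, without loading any C library. This is a small sketch using the same test bytes as above; the variable names tight and default are only illustrative.

```python
from ctypes import create_string_buffer, sizeof

data = b"abc12DEFGHI"  # 11 bytes

# Size given explicitly as len(data): exactly 11 bytes, so there is
# no room left for the terminating NUL.
tight = create_string_buffer(data, len(data))

# Size omitted: ctypes allocates len(data) + 1 bytes and appends NUL.
default = create_string_buffer(data)

print(sizeof(tight), tight.raw)
print(sizeof(default), default.raw)
```

A C function that scans for the terminator, as MyFunc does, would run past the end of the first buffer, which is why the default size (or len(data) + 1) is the safer choice.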
An apparent calling convention mismatch exists where the position and contents of arguments are incorrect when loading a small function using Python's Ctypes module.
In the example I built up while trying to get something working, one positional argument gets another's value while the other gets garbage.
The Ctypes docs state that cdll.LoadLibrary expects the cdecl convention. Resulting standard boilerplate:
# Tell Rustc to output a dynamically linked library
crate-type = ["cdylib"]
// Specify clean symbol and cdecl calling convention
#[no_mangle]
pub extern "cdecl" fn boring_function(
n: *mut size_t,
in_data: *mut [c_ulong],
out_data: *mut [c_double],
garbage: *mut [c_double],
) -> c_int {
//...
Loading our library after build...
lib = ctypes.CDLL("nothing/lib/playtoys.so")
lib.boring_function.restype = ctypes.c_int
Load the result into Python and call it with some initialized data
data_len = 8
in_array_t = ctypes.c_ulong * data_len
out_array_t = ctypes.c_double * data_len
in_array = in_array_t(7, 7, 7, 7, 7, 8, 7, 7)
out_array = out_array_t(10000.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9)
val = ctypes.c_size_t(data_len)
in_array_p = ctypes.byref(in_array)
out_array_p = ctypes.byref(out_array)
n_p = ctypes.byref(val)
garbage = n_p
res = lib.boring_function(n_p,
                          in_array_p,
                          # garbage cannot be observed in any callee arg
                          ctypes.cast(garbage, ctypes.POINTER(out_array_t)),
                          out_array_p)
Notice the garbage parameter. It is so-named because it winds up containing a garbage address. Note that its position is swapped with out_array_p in the Python call and the Rust declaration.
[src/hello.rs:29] n = 0x00007f56dbce5bc0
[src/hello.rs:30] in_data = 0x00007f56f81e3270
[src/hello.rs:31] out_data = 0x00007f56f81e3230
[src/hello.rs:32] garbage = 0x000000000000000a
in_data, out_data, and n print the correct values in this configuration. The positional swap between garbage and out_data makes this possible.
Other examples using more or fewer arguments reveal similar patterns of intermediate ordered variables containing odd values that resemble addresses earlier in the program or unrelated garbage.
Either I'm missing something in how I set up the calling convention or some special magic in argtypes must be missing. So far I had no luck with changing the declared calling conventions or explicit argtypes. Are there any other knobs I should try turning?
in_data: *mut [c_ulong],
A slice is not an FFI-safe data type. Namely, Rust's slices use fat pointers, which take up two pointer-sized values.
You need to pass the data pointer and length as two separate arguments.
See also:
Why can comparing two seemingly equal pointers with == return false?
Rust functions with slice arguments in The Rust FFI Omnibus
The complete example from the Omnibus:
extern crate libc;
use libc::{uint32_t, size_t};
use std::slice;
#[no_mangle]
pub extern fn sum_of_even(n: *const uint32_t, len: size_t) -> uint32_t {
    let numbers = unsafe {
        assert!(!n.is_null());
        slice::from_raw_parts(n, len as usize)
    };
    let sum =
        numbers.iter()
        .filter(|&v| v % 2 == 0)
        .fold(0, |acc, v| acc + v);
    sum as uint32_t
}
#!/usr/bin/env python3
import sys, ctypes
from ctypes import POINTER, c_uint32, c_size_t
prefix = {'win32': ''}.get(sys.platform, 'lib')
extension = {'darwin': '.dylib', 'win32': '.dll'}.get(sys.platform, '.so')
lib = ctypes.cdll.LoadLibrary(prefix + "slice_arguments" + extension)
lib.sum_of_even.argtypes = (POINTER(c_uint32), c_size_t)
lib.sum_of_even.restype = ctypes.c_uint32
def sum_of_even(numbers):
    buf_type = c_uint32 * len(numbers)
    buf = buf_type(*numbers)
    return lib.sum_of_even(buf, len(numbers))
print(sum_of_even([1,2,3,4,5,6]))
Disclaimer: I am the primary author of the Omnibus
Asked because of this: Default argument in c++
Say I have a function such as this: void f(int p1=1, int p2=2, int p3=3, int p4=4);
And I want to call it using only some of the arguments - the rest will be the defaults.
Something like this would work:
template<bool P1=true, bool P2=true, bool P3=true, bool P4=true>
void f(int p1=1, int p2=2, int p3=3, int p4=4);
// specialize:
template<>
void f<false, true, false, false>(int p1) {
f(1, p1);
}
template<>
void f<false, true, true, false>(int p1, int p2) {
f(1, p1, p2);
}
// ... and so on.
// Would need a specialization for each combination of arguments
// which is very tedious and error-prone
// Use:
f<false, true, false, false>(5); // passes 5 as p2 argument
But it requires too much code to be practical.
Is there a better way to do this?
Use the Named Parameters Idiom (→ FAQ link).
The Boost.Parameter library (→ link) can also solve this task, but at the cost of code verbosity and greatly reduced clarity. It's also deficient in handling constructors. And it requires having the Boost library installed, of course.
Have a look at the Boost.Parameter library.
It implements named parameters in C++. Example:
#include <boost/parameter/name.hpp>
#include <boost/parameter/preprocessor.hpp>
#include <iostream>
//Define
BOOST_PARAMETER_NAME(p1)
BOOST_PARAMETER_NAME(p2)
BOOST_PARAMETER_NAME(p3)
BOOST_PARAMETER_NAME(p4)
BOOST_PARAMETER_FUNCTION(
(void),
f,
tag,
(optional
(p1, *, 1)
(p2, *, 2)
(p3, *, 3)
(p4, *, 4)))
{
std::cout << "p1: " << p1
<< ", p2: " << p2
<< ", p3: " << p3
<< ", p4: " << p4 << "\n";
}
//Use
int main()
{
//Prints "p1: 1, p2: 5, p3: 3, p4: 4"
f(_p2=5);
}
Although Boost.Parameter is amusing, it suffers (unfortunately) from a number of issues, among them placeholder collisions (and having to debug quirky preprocessor/template errors):
BOOST_PARAMETER_NAME(p1)
Will create the _p1 placeholder that you then use later on. If you have two different headers declaring the same placeholder, you get a conflict. Not fun.
There is a much simpler (both conceptually and practically) answer, loosely based on the Builder Pattern: the Named Parameters Idiom.
Instead of specifying such a function:
void f(int a, int b, int c = 10, int d = 20);
You specify a structure on which you define operator():
the constructor is used to ask for mandatory arguments (not strictly in the Named Parameters Idiom, but nobody said you had to follow it blindly), and default values are set for the optional ones
each optional parameter is given a setter
Generally, it is combined with Chaining, which consists of making the setters return a reference to the current object so that the calls can be chained on a single line.
class f {
public:
    // Take mandatory arguments, set default values
    f(int a, int b): _a(a), _b(b), _c(10), _d(20) {}
    // Define setters for optional arguments
    // Remember the Chaining idiom
    f& c(int v) { _c = v; return *this; }
    f& d(int v) { _d = v; return *this; }
    // Finally define the invocation function
    void operator()() const;
private:
    int _a;
    int _b;
    int _c;
    int _d;
}; // class f
The invocation is:
f(/*a=*/1, /*b=*/2).c(3)(); // the last () being to actually invoke the function
I've seen a variant that puts the mandatory arguments as parameters to operator(); this avoids keeping the arguments as attributes, but the syntax is a bit weirder:
f().c(3)(/*a=*/1, /*b=*/2);
Once the compiler has inlined all the constructor and setter calls (which is why they are defined here, while operator() is not), it should result in similarly efficient code compared to the "regular" function invocation.
This isn't really an answer, but...
In C++ Template Metaprogramming by David Abrahams and Aleksey Gurtovoy (published in 2004!) the authors talk about this:
While writing this book, we reconsidered the interface used for named
function parameter support. With a little experimentation we
discovered that it’s possible to provide the ideal syntax by using
keyword objects with overloaded assignment operators:
f(slew = .799, name = "z");
They go on to say:
We’re not going to get into the implementation details of this named
parameter library here; it’s straightforward enough that we suggest
you try implementing it yourself as an exercise.
This was in the context of template metaprogramming and Boost::MPL. I'm not too sure how their "straightforward" implementation would jibe with default parameters, but I assume it would be transparent.
I have a self-made C library that I want to access using python. The problem is that the code consists essentially of two parts, an initialization to read in data from a number of files and a few calculations that need to be done only once. The other part is called in a loop and uses the data generated before repeatedly. To this function I want to pass parameters from python.
My idea was to write two C wrapper functions, "init" and "loop" - "init" reads the data and returns a void pointer to a structure that "loop" can use together with additional parameters that I can pass on from python. Something like
void *init() {
    /* mystruct is assumed to be a typedef'd struct type */
    mystruct *ret = (mystruct *)malloc(sizeof(mystruct));
    /* Fill ret with data */
    return ret;
}
float loop(void *data, float par1, float par2) {
    /* do stuff with data, par1, par2, return result */
}
I tried calling "init" from python as a c_void_p, but since "loop" changes some of the contents of "data" and ctypes' void pointers are immutable, this did not work.
Other solutions to similar problems I saw seem to require knowledge of how much memory "init" would use, and I do not know that.
Is there a way to pass data from one C function to another through python without telling python exactly what or how much it is? Or is there another way to solve my problem?
I tried (and failed) to write a minimum crashing example, and after some debugging it turned out there was a bug in my C code. Thanks to everyone who replied!
Hoping that this might help other people, here is a sort-of-minimal working version (still without separate 'free' - sorry):
pybug.c:
#include <stdio.h>
#include <stdlib.h>
typedef struct inner_struct_s {
    int length;
    float *array;
} inner_struct_t;
typedef struct mystruct_S {
    int id;
    float start;
    float end;
    inner_struct_t *inner;
} mystruct_t;
void init(void **data) {
    int i;
    mystruct_t *mystruct = (mystruct_t *)malloc(sizeof(mystruct_t));
    inner_struct_t *inner = (inner_struct_t *)malloc(sizeof(inner_struct_t));
    inner->length = 10;
    inner->array = calloc(inner->length, sizeof(float));
    for (i=0; i<inner->length; i++)
        inner->array[i] = 2*i;
    mystruct->id = 0;
    mystruct->start = 0;
    mystruct->end = inner->length;
    mystruct->inner = inner;
    *data = mystruct;
}
float loop(void *data, float par1, float par2, int newsize) {
    mystruct_t *str = data;
    inner_struct_t *inner = str->inner;
    int i;
    inner->length = newsize;
    inner->array = realloc(inner->array, newsize * sizeof(float));
    for (i=0; i<inner->length; i++)
        inner->array[i] = par1 + i * par2;
    return inner->array[inner->length-1];
}
compile as
cc -c -fPIC pybug.c
cc -shared -o libbug.so pybug.o
Run in python:
from ctypes import *
sl = CDLL('libbug.so')
# What arguments do functions take / return?
# (the attribute is argtypes, and it takes a sequence; init receives a void **)
sl.init.argtypes = [POINTER(c_void_p)]
sl.init.restype = None
sl.loop.restype = c_float
sl.loop.argtypes = [c_void_p, c_float, c_float, c_int]
# Init takes a pointer to a pointer
px = c_void_p()
sl.init(byref(px))
# Call the loop a couple of times
for i in range(10):
    print(sl.loop(px, i, 5, 10*i+5))
You should have a corresponding function to free the data buffer when the caller is done. Otherwise I don't see the issue. Just pass the pointer to loop that you get from init.
init.restype = c_void_p
loop.argtypes = [c_void_p, c_float, c_float]
loop.restype = c_float
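The init.restype line matters because ctypes assumes a C int return value unless told otherwise, and an int is usually narrower than a pointer on 64-bit systems. The sketch below only imitates that narrowing with a made-up address, assuming a 64-bit platform with a 32-bit int; it does not call init.

```python
import ctypes

# Made-up 64-bit address, standing in for what init() might return.
addr = 0x00007F0012345678

# Storing it in a c_int keeps only the low 32 bits (ctypes does no
# overflow check), which imitates reading the result as a C int.
as_int = ctypes.c_int(addr).value

# c_void_p is pointer-sized, so the full address survives.
as_ptr = ctypes.c_void_p(addr).value

print(hex(addr), hex(as_int), hex(as_ptr))
```

A truncated handle passed back to loop would point at the wrong memory, which is one plausible way such code crashes even when the C side is correct.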
I'm not sure what you mean by "ctypes' void pointers are immutable", unless you're talking about c_char_p and c_wchar_p. The issue there is if you pass a Python string as an argument it uses Python's private pointer to the string buffer. If a function can change the string, you should first copy it to a c_char or c_wchar array.
Here's a simple example showing the problem of passing a Python string (2.x byte string) as an argument to a function that modifies it. In this case it changes index 0 to '\x00':
>>> import os
>>> from ctypes import *
>>> open('tmp.c', 'w').write("void f(char *s) {s[0] = 0;}")
>>> os.system('gcc -shared -fPIC -o tmp.so tmp.c')
0
>>> tmp = CDLL('./tmp.so')
>>> tmp.f.argtypes = [c_void_p]
>>> tmp.f.restype = None
>>> tmp.f('a')
>>> 'a'
'\x00'
>>> s = 'abc'
>>> tmp.f(s)
>>> s
'\x00bc'
This is specific to passing Python strings as arguments. It isn't a problem to pass pointers to data structures that are intended to be mutable, either ctypes data objects such as a Structure, or pointers returned by libraries.
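That last point can be checked without compiling anything. In the sketch below, MyStruct and backing are hypothetical stand-ins for memory that a C init() would own, and loop is a pure-Python imitation of the C function; the c_void_p handle itself never changes, while the data behind it does.

```python
import ctypes

class MyStruct(ctypes.Structure):
    # Hypothetical layout; Python would normally not need to know it.
    _fields_ = [("id", ctypes.c_int), ("start", ctypes.c_float)]

# Stand-in for the block a C init() would allocate. The object must
# stay alive for as long as the handle is in use.
backing = MyStruct(0, 1.5)
handle = ctypes.c_void_p(ctypes.addressof(backing))

def loop(data, par1):
    # Imitates the C side: cast the opaque pointer back and mutate it.
    s = ctypes.cast(data, ctypes.POINTER(MyStruct)).contents
    s.id += 1
    s.start = par1
    return s.start

address_before = handle.value
result = loop(handle, 2.5)
print(handle.value == address_before, backing.id, backing.start, result)
```

With a real library, Python would simply keep the c_void_p returned by init and pass it to every loop call, without ever describing the struct.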
Is your C code in a DLL? If so, you might consider creating a global pointer in there. init() will do any initialization required and set the pointer to newly allocated memory, and loop() will operate on that memory. Also, don't forget to free it with a close() function.