How do I do this?
I saw your post saying that you can just pass Java objects to Python methods, but this is not working for numpy arrays and TensorFlow tensors. The following, and various variations of it, is what I tried, to no avail.
double[][] anchors = new double[][]{{0.57273, 0.677385}, {1.87446, 2.06253}, {3.33843, 5.47434}, {7.88282, 3.52778}, {9.77052, 9.16828}};
PyObject anchors_ = numpy.callAttr("array", anchors);
I also tried to use concatenate to build the array, but that does not work either: concatenate (and stack, etc.) require a sequence of arrays to be passed as a single argument, and there does not seem to be a way to express that with Chaquopy in Java.
Any advice?
I assume the error you received was "ValueError: only 2 non-keyword arguments accepted".
You probably also got a warning from Android Studio in the call to numpy.array, saying "Confusing argument 'anchors', unclear if a varargs or non-varargs call is desired". This is the source of the problem. You intended to pass one double[][] argument, but unfortunately Java has interpreted it as five double[] arguments.
Android Studio should offer you an automatic fix of casting the parameter to Object, i.e.:
numpy.callAttr("array", (Object)anchors);
This tells the Java compiler that you intend to pass only one argument, and numpy.array will then work correctly.
I have managed to find two ways that actually work for converting this toy array into a proper numpy array.
In Java:
import com.chaquo.python.*;

Python py = Python.getInstance();
PyObject np = py.getModule("numpy");

PyObject anchors_final = np.callAttr("array", anchors[0]);
anchors_final = np.callAttr("expand_dims", anchors_final, 0);
for (int i = 1; i < anchors.length; i++) {
    PyObject temp_arr = np.callAttr("expand_dims", anchors[i], 0);
    anchors_final = np.callAttr("append", anchors_final, temp_arr, 0);
}
// Then you can pass it to your Python file to do whatever
In Python (the simpler way):
After passing the array to your Python function, using for example:
import com.chaquo.python.*;
Python py = Python.getInstance();
PyObject pp = py.getModule("file_name");
PyObject output = pp.callAttr("fnc_head", anchors);
In your Python file, you can simply do:
def fnc_head(anchors):
    anchors = [list(x) for x in anchors]
    ...
    return result
These were tested with 2-d arrays. Other array types would likely require modifications.
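For completeness, here is a minimal sketch of the Python side of the simpler approach (the function name fnc_head is from the snippets above; the numpy conversion and placeholder result are assumptions):

import numpy as np

def fnc_head(anchors):
    # anchors arrives as a Java array proxy; converting each row to a list
    # lets numpy build a regular 2-d array from it
    anchors = np.array([list(x) for x in anchors])
    result = anchors.shape  # placeholder for the real computation
    return result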
2017/06/13 EDIT:
I tried using boost as was suggested, but after spending more than 3 days trying to get it to compile and link, and failing, I decided that the stupid painful way was probably the fastest and least painful... so now my code just saves a mess of gigantic text files (splitting the arrays and the real/imaginary parts of the numbers across files) that C++ then reads. Elegant... no... effective... yes.
I have some scientific code, currently written in Python, that is being slowed down by a numerical 3D integration step within a loop. To overcome this I am re-writing this particular step in C++. (Cython etc. is not an option.)
Long story short: I want to transfer several very large arrays of complex numbers from the python code to the C++ integrator as conveniently and painlessly as possible. I could do this manually and painfully using text or binary files - but before I embark on this, I was wondering if I have any better options?
I'm using Visual Studio for C++ and Anaconda for Python (not my choice!)
Is there any file format or method that would make it quick and convenient to save an array of complex numbers from python and then recreate it in C++?
Many thanks,
Ben
An easy solution that I have used many times is to build your "C++ side" as a DLL (= shared object on Linux/OS X), provide a simple, C-like entry point (plain integers, pointers & co., no STL stuff) and pass the data through ctypes.
This avoids boost/SIP/SWIG/... build nightmares, can be kept zero-copy (with ctypes you can pass a plain pointer to your numpy data) and allows you to do whatever you want on the C++ side (especially on the build side: no friggin' distutils, no boost, no nothing - build it with whatever can produce a C-like DLL). It also has the nice side effect of making your C++ algorithm callable from other languages (virtually any language has some way to interface with C libraries).
Here's a quick artificial example. The C++ side is just:
extern "C" {
double sum_it(double *array, int size) {
double ret = 0.;
for(int i=0; i<size; ++i) {
ret += array[i];
}
return ret;
}
}
This has to be compiled to a DLL (on Windows) or a .so (on Linux), making sure to export the sum_it function (automatic with gcc, requires a .def file with VC++).
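For instance (toolchain details are an assumption, not part of the original answer), g++ -shared -fPIC summer.cpp -o summer.so would do on Linux, and cl /LD summer.cpp (together with the .def file mentioned above) produces summer.dll with VC++.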
On the Python side, we can have a wrapper like
import ctypes
import os
import sys

import numpy as np

path = os.path.dirname(__file__)
cdll = ctypes.CDLL(os.path.join(path, "summer.dll" if sys.platform.startswith("win") else "summer.so"))

_sum_it = cdll.sum_it
_sum_it.restype = ctypes.c_double

def sum_it(l):
    if isinstance(l, np.ndarray) and l.dtype == np.float64 and len(l.shape) == 1:
        # it's already a numpy array with the right features - go zero-copy;
        # data_as keeps the full 64-bit pointer (a plain .data integer would be
        # truncated to a C int when no argtypes are declared)
        a = l.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    else:
        # it's a list or something else - build a ctypes array copy
        arr_t = ctypes.c_double * len(l)
        a = arr_t(*l)
    return _sum_it(a, len(l))
which makes sure that the data is marshaled correctly; then invoking the function is as trivial as
import summer
import numpy as np
# from a list (with copy)
print(summer.sum_it([1, 2, 3, 4.5]))
# from a numpy array of the right type - zero-copy
print(summer.sum_it(np.array([3., 4., 5.])))
See the ctypes documentation for more information on how to use it. See also the relevant documentation in numpy.
For complex numbers, the situation is slightly more complicated, as there's no built-in for them in ctypes; if we want to use std::complex<double> on the C++ side (which is pretty much guaranteed to work fine with the numpy complex layout, namely a sequence of two doubles), we can write the C++ side as:
extern "C" {
std::complex<double> sum_it_cplx(std::complex<double> *array, int size) {
std::complex<double> ret(0., 0.);
for(int i=0; i<size; ++i) {
ret += array[i];
}
return ret;
}
}
Then, on the Python side, we have to replicate that layout with a ctypes structure to retrieve the return value (and to be able to build complex arrays without numpy):
class c_complex(ctypes.Structure):
    # Complex number, compatible with std::complex layout
    _fields_ = [("real", ctypes.c_double), ("imag", ctypes.c_double)]

    def __init__(self, pycomplex):
        # Init from Python complex
        self.real = pycomplex.real
        self.imag = pycomplex.imag

    def to_complex(self):
        # Convert to Python complex
        return self.real + (1.j) * self.imag
Inheriting from ctypes.Structure enables the ctypes marshalling magic, which is performed according to the _fields_ member; the constructor and extra methods are just for ease of use on the Python side.
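A quick sanity check of the layout (a hypothetical snippet, not from the original answer):

z = c_complex(3 + 4j)
print(z.real, z.imag)    # 3.0 4.0
print(z.to_complex())    # (3+4j)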
Then, we have to tell ctypes the return type
_sum_it_cplx = cdll.sum_it_cplx
_sum_it_cplx.restype = c_complex
and finally write our wrapper, in a similar fashion to the previous one:
def sum_it_cplx(l):
    if isinstance(l, np.ndarray) and l.dtype == np.complex128 and len(l.shape) == 1:
        # the numpy array layout for complexes (a sequence of two doubles) is already
        # compatible with std::complex (see https://stackoverflow.com/a/5020268/214671);
        # again, data_as passes the full pointer
        a = l.ctypes.data_as(ctypes.POINTER(c_complex))
    else:
        # otherwise, try to build our c_complex
        arr_t = c_complex * len(l)
        a = arr_t(*(c_complex(r) for r in l))
    ret = _sum_it_cplx(a, len(l))
    return ret.to_complex()
Testing it as above
# from a complex list (with copy)
print(summer.sum_it_cplx([1. + 0.j, 0 + 1.j, 2 + 2.j]))
# from a numpy array of the right type - zero-copy
print(summer.sum_it_cplx(np.array([1. + 0.j, 0 + 1.j, 2 + 2.j])))
yields the expected results:
(3+3j)
(3+3j)
I see the OP is over a year old now, but I recently addressed a similar problem using the native Python C API and its numpy extension, and since I personally don't enjoy using ctypes for various reasons (e.g., the complex-number workarounds, messy code), nor Boost, I wanted to post my answer for future searchers.
The documentation for the Python C API and the numpy C API is quite extensive (albeit a little overwhelming at first), but after one or two successes, writing native C/C++ extensions becomes very easy.
Here is an example C++ function that can be called from Python. It integrates a 3D numpy array of either real or complex (numpy.double or numpy.cdouble) type. It will be compiled into an extension module, cintegrate.so, and imported from Python.
#include "Python.h"
#include "numpy/arrayobject.h"
#include <math.h>
static PyObject * integrate3(PyObject * module, PyObject * args)
{
PyObject * argy=NULL; // Regular Python/C API
PyArrayObject * yarr=NULL; // Extended Numpy/C API
double dx,dy,dz;
// "O" format -> read argument as a PyObject type into argy (Python/C API)
if (!PyArg_ParseTuple(args, "Oddd", &argy,&dx,&dy,&dz)
{
PyErr_SetString(PyExc_ValueError, "Error parsing arguments.");
return NULL;
}
// Determine if it's a complex number array (Numpy/C API)
int DTYPE = PyArray_ObjectType(argy, NPY_FLOAT);
int iscomplex = PyTypeNum_ISCOMPLEX(DTYPE);
// parse python object into numpy array (Numpy/C API)
yarr = (PyArrayObject *)PyArray_FROM_OTF(argy, DTYPE, NPY_ARRAY_IN_ARRAY);
if (yarr==NULL) {
Py_INCREF(Py_None);
return Py_None;
}
//just assume this for 3 dimensional array...you can generalize to N dims
if (PyArray_NDIM(yarr) != 3) {
Py_CLEAR(yarr);
PyErr_SetString(PyExc_ValueError, "Expected 3 dimensional integrand");
return NULL;
}
npy_intp * dims = PyArray_DIMS(yarr);
npy_intp i,j,k,m;
double * p;
//initialize variable to hold result
Py_complex result = {.real = 0, .imag = 0};
if (iscomplex) {
for (i=0;i<dims[0];i++)
for (j=0;j<dims[1];j++)
for (k=0;k<dims[1];k++) {
p = (double*)PyArray_GETPTR3(yarr, i,j,k);
result.real += *p;
result.imag += *(p+1);
}
} else {
for (i=0;i<dims[0];i++)
for (j=0;j<dims[1];j++)
for (k=0;k<dims[1];k++) {
p = (double*)PyArray_GETPTR3(yarr, i,j,k);
result.real += *p;
}
}
//multiply by step size
result.real *= (dx*dy*dz);
result.imag *= (dx*dy*dz);
Py_CLEAR(yarr);
//copy result into returnable type with new reference
if (iscomplex) {
return Py_BuildValue("D", &result);
} else {
return Py_BuildValue("d", result.real);
}
};
Simply put that into a source file (we'll call it cintegrate.cxx) along with the module definition stuff, inserted at the bottom:
static PyMethodDef cintegrate_Methods[] = {
    {"integrate3", integrate3, METH_VARARGS,
     "Pass 3D numpy array (double or complex) and dx,dy,dz step sizes. Returns Riemann integral."},
    {NULL, NULL, 0, NULL}   /* Sentinel */
};

static struct PyModuleDef module = {
    PyModuleDef_HEAD_INIT,
    "cintegrate",   /* name of module */
    NULL,           /* module documentation, may be NULL */
    -1,             /* size of per-interpreter state of the module,
                       or -1 if the module keeps state in global variables. */
    cintegrate_Methods
};

PyMODINIT_FUNC PyInit_cintegrate(void)
{
    import_array();   // required to initialize the numpy C API
    return PyModule_Create(&module);
}
Then compile that via setup.py, much like Walter's boost example, with just a couple of obvious changes: replacing file.cc there with our file cintegrate.cxx, removing the boost dependencies, and making sure the path to "numpy/arrayobject.h" is included.
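For reference, a minimal sketch of what such a setup.py could look like (a sketch, not the author's exact script; it assumes numpy is importable at build time):

# setup.py (sketch; file and module names taken from the example above)
from distutils.core import setup
from distutils.extension import Extension

import numpy as np

setup(
    name='cintegrate',
    version='0.1',
    ext_modules=[Extension(
        'cintegrate',
        sources=['cintegrate.cxx'],
        include_dirs=[np.get_include()])])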
In Python you can then use it like:
import cintegrate
import numpy as np
arr = np.random.randn(4,8,16) + 1j*np.random.randn(4,8,16)
# arbitrary step sizes dx = 1., dy = 0.5, dz = 0.25
ans = cintegrate.integrate3(arr, 1.0, 0.5, .25)
This specific code hasn't been tested but is mostly copied from working code.
Note added in edit.
As mentioned in the comments, Python itself, being an interpreted language, has little potential for computational efficiency. So in order to make Python scripts efficient, one must use modules which aren't all interpreted, but under the hood call compiled (and optimized) code written in, say, C/C++. This is exactly what numpy does for you, in particular for operations on whole arrays.
Therefore, the first step towards efficient Python scripts is to use numpy. Only the second step is to try to use your own compiled (and optimized) code. Accordingly, I have assumed in my example below that you are using numpy to store the array of complex numbers. Anything else would be ill-advised.
There are various ways to access Python's original data from within a C/C++ program. I personally have done this with boost.Python, but must warn you that the documentation and support are lousy at best: you're pretty much on your own (and Stack Overflow, of course).
For example, your C++ file may look like this:
// file.cc
#include <boost/python.hpp>
#include <boost/python/numpy.hpp>

namespace p = boost::python;
namespace n = p::numpy;

n::ndarray func(const n::ndarray &input, double control_variable)
{
    /*
      your code here, see documentation for boost python
      you can pass almost any python variable, doesn't have to be numpy stuff
    */
}

BOOST_PYTHON_MODULE(module_name)
{
    Py_Initialize();
    n::initialize();   // only needed if you use numpy in the interface
    p::def("function", func, "doc-string");
}
To compile this, you may use a Python script such as:
# setup.py
from distutils.core import setup
from distutils.extension import Extension

module_name = Extension(
    'module_name',
    extra_compile_args=['-std=c++11', '-stdlib=libc++', '-I/some/path/', '-march=native'],
    extra_link_args=['-stdlib=libc++'],
    sources=['file.cc'],
    libraries=['boost_python', 'boost_numpy'])

setup(
    name='module_name',
    version='0.1',
    ext_modules=[module_name])
and run it as python setup.py build, which will create an appropriate .so file in a sub-directory of build, which you can then import from Python.
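Once built, a hypothetical session could look like this (module_name and function are the names registered in the example above; the argument values are made up):

import numpy as np
import module_name

# func takes an ndarray and a double, per the signature above
out = module_name.function(np.zeros(8, dtype=complex), 1.0)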
I have a variable number of numpy arrays which I'd like to pass to a C function. I managed to pass each individual array (using <ndarray>.ctypes.data_as(c_void_p)), but the number of arrays may vary a lot.
I thought I could pass all of these "pointers" in a list and use the PyList_GetItem() function in the C code. It works like a charm, except that the values of all elements are not the pointers I usually get when they are passed as function arguments.
For example, if I have:
from numpy import array
from ctypes import py_object, c_long
a1 = array([1., 2., 3.8])
a2 = array([222.3, 33.5])
values = [a1, a2]
my_cfunc(py_object(values), c_long(len(values)))
And my C code looks like:
void my_cfunc(PyObject *values)
{
    int i, n;
    n = PyObject_Length(values);
    for(i = 0; i < n; i++)
    {
        unsigned long long *pointer;
        pointer = (unsigned long long *)PyList_GetItem(values, i);
        printf("value 0 : %f\n", *pointer);
    }
}
The printed values are all 0.0000.
I have tried a lot of different solutions, using ctypes.byref(), ctypes.pointer(), etc., but I can't seem to retrieve the real pointer values. I even have the impression the values converted by c_void_p() are truncated to 32 bits...
While there is plenty of documentation about passing numpy pointers to C, I haven't seen anything about ctypes within a Python list (I admit this may seem strange...).
Any clue?
After a few hours spent reading many pages of documentation and digging in numpy include files, I've finally managed to understand exactly how it works. Since I've spent a great amount of time searching for these exact explanations, I'm providing the following text to save anyone else from wasting their time.
I repeat the question:
How to transfer a list of numpy arrays from Python to C
(I also assume you know how to compile, link and import your C module in Python)
Passing a numpy array from Python to C is rather simple, as long as it's passed as an argument to a C function. You just need to do something like this in Python:
from numpy import array
from ctypes import c_long, c_void_p

values = array([1.0, 2.2, 3.3, 4.4, 5.5])
my_c_func(values.ctypes.data_as(c_void_p), c_long(values.size))
And the C code could look like:
void my_c_func(double *values, long size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d : %.10f\n", i, values[i]);
}
That's simple... but what if I have a variable number of arrays? Of course, I could use the techniques which parse the function's argument list (there are many examples on Stack Overflow), but I'd like to do something different.
I'd like to store all my arrays in a list and pass this list to the C function, and let the C code handle all the arrays.
In fact, it's extremely simple, easy and coherent... once you understand how it's done! There is simply one very simple fact to remember:
Any member of a list/tuple/dictionary is a Python object... on the C side of the code !
You can't expect to directly pass a pointer, as I initially, and wrongly, thought. Once that is understood, it sounds very simple :-) So, let's write some Python code:
from numpy import array
from ctypes import py_object

my_list = [array([1.0, 2.2, 3.3, 4.4, 5.5]),
           array([2.9, 3.8, 4.7, 5.6])]

my_c_func(py_object(my_list))
Well, you don't need to change anything in the list itself, but you do need to specify that you are passing it as a py_object argument.
And here is how all this is accessed in C:
void my_c_func(PyObject *list)
{
    int i, n_arrays;

    // Get the number of elements in the list
    n_arrays = PyObject_Length(list);

    for (i = 0; i < n_arrays; i++)
    {
        PyArrayObject *elem;
        double *pd;

        elem = (PyArrayObject *)PyList_GetItem(list, i);
        pd = (double *)PyArray_DATA(elem);
        printf("Value 0 : %.10f\n", *pd);
    }
}
Explanation:
The list is received as a pointer to a PyObject.
We get the number of arrays in the list by using the PyObject_Length() function.
PyList_GetItem() always returns a PyObject * (in fact a void *).
We retrieve the pointer to the array's data by using the PyArray_DATA() macro.
Normally, PyList_GetItem() returns a PyObject *, but, if you look in Python.h and ndarraytypes.h, you'll find that PyObject and PyArrayObject are both defined as (I've expanded the macros!):
typedef struct _object {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
And the PyArrayObject... is exactly the same, so the two are perfectly interchangeable at this level. The content of ob_type is accessible for both objects and contains everything needed to manipulate any generic Python object. I admit that I used one of its members during my investigations: the struct member tp_name is the string containing the name of the object, in clear text; and believe me, it helped! This is how I discovered what each list element was containing.
If these structures don't contain anything else, how is it that we can access the data pointer of the ndarray object? Simply by using object macros, which use an extended structure, allowing the compiler to know how to access the additional object elements behind the ob_type pointer. The PyArray_DATA() macro is defined as:
#define PyArray_DATA(obj) ((void *)((PyArrayObject_fields *)(obj))->data)
There, it casts the PyArrayObject * to a PyArrayObject_fields *, and this latter structure is simply (simplified, with macros expanded!):
typedef struct tagPyArrayObject_fields {
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
} PyArrayObject_fields;
As you can see, the first two elements of the structure are the same as in a PyObject, but the additional elements can be addressed using this definition. Directly accessing these elements is tempting, but it's a very bad and dangerous practice which is strongly discouraged: rather use the macros and don't bother with the details and elements of all these structures. I just thought you might be interested in some of the internals.
Note that all PyArrayObject macros are documented at http://docs.scipy.org/doc/numpy/reference/c-api.array.html
For instance, the size of a PyArrayObject can be obtained with the macro PyArray_SIZE(PyArrayObject *).
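For example, npy_intp n = PyArray_SIZE(elem); in the loop above would give the number of elements of each array.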
Finally, it's very simple and logical, once you know it :-)
I'm using SWIG with numpy.i to expose a C library to python. The function I'm trying to wrap takes a series of double arrays as arguments:
int wcsp2s(struct wcsprm *wcs, int ncoord, int nelem, const double pixcrd[], double imgcrd[], double phi[], double theta[], double world[], int stat[]);
where some of the arrays are actually two-dimensional, with the extent given by the ncoord and nelem arguments. It's these two-dimensional arrays I'm having trouble with, since numpy.i only seems to support things of the form (int n1, int n2, double *arr) or various permutations (and my C function does not want those extra integers), or double arr[ANY][ANY]. The latter looked promising, as a multidimensional C array is just a contiguous block of memory, and so should be compatible with what the function expects. But when I try
%apply (double INPLACE_ARRAY2[ANY][ANY]) {(double imgcrd[]),(double world[])};
SWIG (or rather gcc running on SWIG's output) complains:
wcs_wrap.c:3770:7: error: expected expression before ‘,’ token
Here SWIG has generated invalid C code for those arguments.
Is what I'm trying to do here possible? I guess I could use %inline and %rename to create a wrapper function that does take in the (unnecessary) dimensions of the arrays, and then calls the real function. Even better than the approach above with in-place arrays would be if I could return these arrays as output arguments (their dimensions are easy to calculate based on ncoord and nelem).
Or perhaps a fast (i.e. not the one in astLib) python interface to libwcs already exists, so I don't have to do this?
Edit: I just discovered pywcs (which has such an obvious name that I should have found it during my initial search), which solves my underlying problem.
Edit 2: I guess a wrapper that takes in a 2D numpy array and passes on a flattened view of it would get around the problem, since 1D arrays seem to work. Still, that ends up requiring a large number of files for a simple wrapper (.i, _wrap.c and .py from SWIG, and an additional .py to further wrap the SWIG functions to fix the dimensionality problem).
I am also missing a good cookbook for using numpy.i. As far as I understand it, you can either:
pass arrays of dynamic size, where you pass the dimensions as function parameters as well (e.g., IN_ARRAY2 or INPLACE_ARRAY2); if your function's signature is different, write a wrapper.
pass arrays of fixed size (e.g., IN_ARRAY2[ANY][ANY] or INPLACE_ARRAY2[ANY][ANY]).
When returning arrays (e.g., ARGOUT_ARRAY1), you have to pass the size when calling from Python. In the example below, you would write oo = func3(20). The reason seems to be that since Python needs to allocate the memory, it needs to know the size.
For example, your .i file could look like this:
...
%include "numpy.i"

%init %{
import_array();
%}

// Pass an array of dynamic size:
%apply (double* INPLACE_ARRAY2, int DIM1, int DIM2) {(double *xx, int xx_n, int xx_m)};
void func1(double *xx, int xx_n, int xx_m);

// Pass an array of fixed size:
%apply (double INPLACE_ARRAY2[ANY][ANY]) {(double yy[4][4])};
void func2(double yy[4][4]);

// Return a dynamic 1D array:
%apply (double* ARGOUT_ARRAY1, int DIM1) {(double* out, int out_n)};
void func3(double* out, int out_n);
Of course you can combine these - check the docs for more information.
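For illustration, a hypothetical Python session using the wrappers above might look like this (mymodule is an assumed name for the SWIG-generated module):

import numpy as np
import mymodule  # assumed name of the SWIG-generated module

xx = np.zeros((3, 4))    # dynamic size, modified in place by func1
mymodule.func1(xx)

yy = np.eye(4)           # fixed-size 4x4 array for func2
mymodule.func2(yy)

oo = mymodule.func3(20)  # ARGOUT: pass the length, get a new array back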
I am trying to write a piece of code in C++ which can produce an array and return it as a Python list. I understand that I could return the list as a NumPy array using the typemaps in numpy.i instead, but NumPy is not installed on some of the clusters I am using. The main difference between this question and some similar ones I've seen on here is that I want to return a C array of variable length as a Python list.
The stripped down C++ routine is below:
/* File get_rand_list.cpp */
#include <cstdlib>

#include "get_rand_list.h"

/* Define function implementation */
double* get_rand_list(int length) {
    double* output_list = new double[length];

    /* Populate the output array with random numbers */
    for (int i = 0; i < length; i++)
        output_list[i] = ((double) rand()) / RAND_MAX;

    return output_list;
}
I would like to be able to use this in Python like so:
from get_rand_list import *
list = get_rand_list(10)
This would return a Python list of 10 random floats. How can I use SWIG typemaps to wrap the C++ routine to do this? If there is a simpler way which requires some small modifications to the C++ routine, that is most likely fine.
Since you are using C++, it is easiest to use the STL: SWIG knows how to convert std::vector to a Python sequence, so use
std::vector<double> get_rand_list(int length) { ... }
Don't forget to %include "std_vector.i" and instantiate the template (e.g., %template(DoubleVector) std::vector<double>;). This trick is very useful in general; see the Library section of the SWIG docs for other pre-written typemaps. Always favor pre-written typemaps over making your own.
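The Python side might then look like this (a sketch; it assumes the SWIG module is named get_rand_list and that std::vector<double> was instantiated as described):

from get_rand_list import get_rand_list

# The wrapped std::vector<double> behaves like a Python sequence
values = list(get_rand_list(10))
print(values)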
This question has been answered several times already:
return double * from swig as python list
SWIG C-to-Python Int Array