I've got a library that takes in a very simple C image structure:
// Represents a one-channel 8-bit image
typedef struct simple_image_t {
    uint32 rows;
    uint32 cols;
    uint8 *imgdata;
} simple_image;
I didn't create this library, nor this structure, so I can't change it. I'm responsible for wrapping this library for python using SWIG. The Python wrapper needs to be able to take in a PIL Image and convert it into this structure. Here's how I'm doing it right now (using a SWIG %inline%):
// Allows python to easily create and initialize this structure
simple_image* py_make_simple_image(uint32 width, uint32 height)
{
    simple_image* img = new simple_image();
    img->rows = height;
    img->cols = width;
    img->imgdata = new uint8[height * width];
    return img;
}

// Allows python to set a particular pixel value
void py_set_simple_image(simple_image* img, uint32 pos, uint8 val)
{
    img->imgdata[pos] = val;
}
And then on the python wrapper side here's how things look right now:
# Make sure it's an 8-bit image
if pil_image.mode != "L":
    pil_image = pil_image.convert("L")
# Create the simple image structure
(width, height) = pil_image.size
img = swig_wrapper.py_make_simple_image(width, height)
try:
    # Copy the image data into the simple image structure
    pos = 0
    for pixel in pil_image.getdata():
        swig_wrapper.py_set_simple_image(img, pos, pixel)
        pos += 1
    # Call some library method that accepts a simple_image*
    return swig_wrapper.some_image_method(img)
finally:
    # Clean up the simple image structure
    swig_wrapper.py_destroy_simple_image(img)
Amazingly, this works; however, as you may have guessed, it's incredibly slow when working with even moderately large images. I know the proper way to do things with SWIG is to use a typemap, but that would mean digging into the C API of PIL, and I just don't have time to do that at the moment.
What are my options in terms of speed? Are there quicker ways of marshaling the pixel data from a PIL image into this simple image structure? Has someone already done this and my Google skills are just that bad? Am I just boned and will soon need to learn the internals of PIL?
Thanks.
PIL's Image.tostring() returns a string of the exact data you need for imgdata. The typemap I used is fairly simple, but not perfect, which I'll note below. Here is the sample code I created on Windows that worked for me:
sample.h
typedef unsigned int uint32;
typedef unsigned char uint8;
typedef struct simple_image_t {
    uint32 rows;
    uint32 cols;
    uint8 *imgdata;
} simple_image;
#ifdef SAMPLE_EXPORT
# define SAMPLE_API __declspec(dllexport)
#else
# define SAMPLE_API __declspec(dllimport)
#endif
SAMPLE_API void some_func(const simple_image* si);
sample.c
#include <stdio.h>
#define SAMPLE_EXPORT
#include "sample.h"
void some_func(const simple_image* si)
{
    uint32 i, j;
    printf(
        "rows = %d\n"
        "cols = %d\n",
        si->rows, si->cols);
    /* Dump a simple map of the image data */
    for(i = 0; i < si->rows; i++)
    {
        for(j = 0; j < si->cols; j++)
        {
            if(si->imgdata[i * si->cols + j] < 0x80)
                printf(" ");
            else
                printf("*");
        }
        printf("\n");
    }
}
sample.i
%module sample
%begin %{
#pragma warning(disable:4100 4127 4706)
%}
%{
#include "sample.h"
%}
%include <windows.i>
%typemap(in) uint8* (char* buffer, Py_ssize_t length) {
    PyString_AsStringAndSize($input, &buffer, &length);
    $1 = (uint8*)buffer;
}
%include "sample.h"
makefile
all: _sample.pyd

sample.dll: sample.c sample.h
	cl /nologo /W4 /LD /MD sample.c

sample_wrap.c: sample.i
#echo sample.i
	swig -python sample.i

_sample.pyd: sample_wrap.c sample.dll
	cl /nologo /W4 /LD /MD /Fe_sample.pyd sample_wrap.c /Ic:\Python27\include -link /LIBPATH:c:\Python27\libs python27.lib sample.lib
example.py
from PIL import Image
import sample
im = Image.open('sample.gif')
im = im.convert('L')
si = sample.simple_image()
si.cols, si.rows = im.size  # im.size is (width, height), so cols gets width and rows gets height
s = im.tostring() # Must keep a reference
si.imgdata = s
sample.some_func(si)
With this quick example I haven't determined how the typemap should correctly increment the reference count of the string object. Note that the above code could crash if the following code were used:
si.imgdata = im.tostring()
The current typemap's PyString_AsStringAndSize returns a direct pointer to the PyString object's buffer, but doesn't increment the reference count for the object. It can be garbage collected before some_func executes (and was for me, crashing Python). Assigning to s keeps a reference to the string and prevents problems. The typemap should copy the buffer, but you were looking for speed so this hack may be what you want.
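For completeness, a copying variant of the typemap might look roughly like this (an untested sketch; it duplicates the bytes with malloc, so whatever ends up owning the simple_image would then be responsible for freeing imgdata):
%typemap(in) uint8* {
    char* buffer;
    Py_ssize_t length;
    if (PyString_AsStringAndSize($input, &buffer, &length) == -1)
        SWIG_fail;
    $1 = (uint8*)malloc(length);    /* copy, so the PyString can be collected safely */
    memcpy($1, buffer, length);
}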
Maybe you could convert the image to a char array using the array module and then, from SWIG, memcpy the data into your C array.
import array
imagar = array.array('B', pil_image.getdata())
(mem, length) = imagar.buffer_info()
swig_wrapper.py_copy(img, mem, length)
where py_copy is something like:
void py_copy(simple_image* img, size_t mem, uint32 length) {
    /* mem holds the buffer address from array.buffer_info(); it must be a
       pointer-sized type, so don't use uint32 here on 64-bit builds */
    memcpy((void*)img->imgdata, (const void*)mem, length);
}
How about using ctypes? It gives you direct access to C structures, so there's no need to create a Python equivalent of the struct, and you should also be able to do a memcpy (which would be faster than copying pixel by pixel).
Related
cuda1.cu
#include <iostream>
using namespace std ;
# define DELLEXPORT extern "C" __declspec(dllexport)
__global__ void kernel(long* answer = 0){
    *answer = threadIdx.x + (blockIdx.x * blockDim.x);
}

DELLEXPORT void resoult(long* h_answer){
    long* d_answer = 0;
    cudaMalloc(&d_answer, sizeof(long));
    kernel<<<10,1000>>>(d_answer);
    cudaMemcpy(&h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
    cudaFree(d_answer);
}
main.py
import ctypes
import numpy as np
add_lib = ctypes.CDLL(".\\a.dll")
resoult= add_lib.resoult
resoult.argtypes = [ctypes.POINTER(ctypes.c_long)]
x = ctypes.c_long()
print("R:",resoult(x))
print("RV: ",x.value)
print("RB: ",resoult(ctypes.byref(x)))
Output in Python: 0
Output in CUDA: 2096
I implemented this based on plain C without any problems, but in CUDA mode I have a problem: how can I get the correct output value?
Thanks
cudaMemcpy is expecting pointers for dst and src.
In your function resoult, h_answer is a pointer to a long allocated by the caller.
Since it's already the pointer where the data should be copied to, you should use it as is and not take its address by using &h_answer.
Therefore you need to change your cudaMemcpy from:
cudaMemcpy(&h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
To:
cudaMemcpy(h_answer, d_answer, sizeof(long), cudaMemcpyDeviceToHost);
I have a C function that takes as arguments a void* pointer and an integer length for the size of the buffer pointed to.
e.g.
char* myfunc(void *mybuffer, int buflen)
On the python side I have a bytes object of binary data read from a file.
What I am trying to figure out is the right conversions to be able to call the c function from python, and am struggling a bit.
I understand the conversions for dealing with simple string data (e.g. encoding to utf-8 and using a char_p type) but dealing with a bytes object has been a bit of a struggle....
Thanks in advance!
Given your commented description, you can just use the obvious types if you don't need to free the returned char* memory. You can pass a bytes object to a void*. Here's a quick demo:
test.c
#include <stdio.h>
#include <stdint.h>
#ifdef _WIN32
# define API __declspec(dllexport)
#else
# define API
#endif
API char* myfunc(void *mybuffer, int buflen) {
    const uint8_t* tmp = (const uint8_t*)mybuffer;
    for(int i = 0; i < buflen; ++i) // show the passed bytes
        printf("%02X\n", tmp[i]);
    return "output"; // static string, no deallocation required
}
test.py
import ctypes as ct
import os
dll = ct.CDLL('./test')
dll.myfunc.argtypes = ct.c_void_p, ct.c_int
dll.myfunc.restype = ct.c_char_p
buf = bytes([1,2,0,0xaa,0x55]) # including embedded null
ret = dll.myfunc(buf, len(buf))
print(ret)
Output:
01
02
00
AA
55
b'output'
I'm trying to figure out how I could achieve the following:
I have a Python script which, in the end, produces a NumPy array (an array of arrays of floats, to be more specific). I have everything set up properly: I can pass parameters from C to Python, launch Python functions from C, and process returned scalar values in C.
What I'm currently not able to do is return such a NumPy array as the result of a Python function to C.
Could somebody give me a pointer on how to achieve this?
TIA
What you need to look at is inter-process communication (IPC). There are several ways to do it.
You can use one of:
Files (Easy to use)
Shared memory (really fast)
Named pipes
Sockets (slow)
See Wikipedia's IPC page and find the best approach for your needs.
Here's a small working demo example (1D, not 2D array! it's not perfect, adjust to your needs).
# file ./pyscript.py
import numpy as np
# print inline
np.set_printoptions(linewidth=np.inf)
x = np.random.random(10)
print(x)
# [0.52523722 0.29053534 0.95293405 0.7966214 0.77120688 0.22154705 0.29398872 0.47186567 0.3364234 0.38107864]
demo.c
// file ./demo.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main()
{
    int fd[2];
    pipe(fd); // create the pipe
    char buf[4096];
    pid_t pid = fork();
    if (pid == 0) { // child process
        dup2(fd[1], 1);
        close(fd[0]);
        close(fd[1]);
        char *pyscript = "./pyscript.py";
        char *args[] = {"python3", pyscript, (char*)NULL};
        execv("/usr/bin/python3", args);
    }
    else {
        int status;
        close(fd[1]);
        int bytes = read(fd[0], buf, sizeof(buf) - 1);
        if (bytes < 0) bytes = 0;
        buf[bytes] = '\0'; // read() does not null-terminate
        printf("Python script output: %.*s\n", bytes, buf);
        char* values[10];
        int count = 0;
        values[count++] = &buf[1]; // ignore the '[' coming from numpy array output
        char* p = buf;
        while (*p) {
            if (*p == ' ') {
                *p = 0;
                values[count++] = p + 1;
            }
            p++;
        }
        float a[10];
        float f;
        for (int i = 0; i < 10; i++) {
            printf("%f\n", f = atof(values[i])); // float values
            a[i] = f;
        }
        waitpid(pid, &status, 0);
    }
    return 0;
}
Sample output
# cc demo.c
# ./a.out
Python script output: [0.23286839 0.54437959 0.37798547 0.17190732 0.49473837 0.48112695 0.93113395 0.20877592 0.96032973 0.30025713]
0.232868
0.544380
0.377985
0.171907
0.494738
0.481127
0.931134
0.208776
0.960330
0.300257
a will hold your desired result, an array of floats.
One has to use the PyList API for decoding list objects from Python to C
https://docs.python.org/3.3/c-api/list.html?highlight=m
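For example, a minimal sketch of walking such a result in C might look like this (it assumes the NumPy array is converted to a nested list of floats on the Python side, e.g. with tolist(), before being returned, and it omits error checking):
#include <Python.h>
#include <stdio.h>

/* result: a PyObject* holding a list of lists of floats,
   e.g. the return value of PyObject_CallObject(). */
static void print_rows(PyObject *result)
{
    Py_ssize_t nrows = PyList_Size(result);
    for (Py_ssize_t i = 0; i < nrows; i++) {
        PyObject *row = PyList_GetItem(result, i);   /* borrowed reference */
        Py_ssize_t ncols = PyList_Size(row);
        for (Py_ssize_t j = 0; j < ncols; j++) {
            double v = PyFloat_AsDouble(PyList_GetItem(row, j));
            printf("%f ", v);
        }
        printf("\n");
    }
}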
Solved.
I have a C++ function computing a large tensor which I would like to return to Python as a NumPy array via pybind11.
From the documentation of pybind11, it seems like using STL unique_ptr is desirable.
In the following example, the commented-out version works, whereas the given one compiles but fails at runtime ("Unable to convert function return value to a Python type!").
Why is the smart-pointer version failing? What is the canonical way to create and return a NumPy array?
PS: Due to program structure and size of the array, it is desirable to not copy memory but create the array from a given pointer. Memory ownership should be taken by Python.
typedef typename py::array_t<double, py::array::c_style | py::array::forcecast> py_cdarray_t;

// py_cdarray_t _test()
std::unique_ptr<py_cdarray_t> _test()
{
    double* memory = new double[3];
    memory[0] = 11; memory[1] = 12; memory[2] = 13;

    py::buffer_info bufinfo(
        memory,                                   // pointer to memory buffer
        sizeof(double),                           // size of underlying scalar type
        py::format_descriptor<double>::format(),  // python struct-style format descriptor
        1,                                        // number of dimensions
        { 3 },                                    // buffer dimensions
        { sizeof(double) }                        // strides (in bytes) for each index
    );

    // return py_cdarray_t(bufinfo);
    return std::unique_ptr<py_cdarray_t>(new py_cdarray_t(bufinfo));
}
A few comments (then a working implementation).
pybind11's C++ object wrappers around Python types (like pybind11::object, pybind11::list, and, in this case, pybind11::array_t<T>) are really just wrappers around an underlying Python object pointer. In this respect they are already taking on the role of a shared-pointer wrapper, so there's no point in wrapping that in a unique_ptr: returning the py::array_t<T> object directly is already essentially just returning a glorified pointer.
pybind11::array_t can be constructed directly from a data pointer, so you can skip the py::buffer_info intermediate step and just give the shape and strides directly to the pybind11::array_t constructor. A numpy array constructed this way won't own its own data, it'll just reference it (that is, the numpy owndata flag will be set to false).
Memory ownership can be tied to the life of a Python object, but you're still on the hook for doing the deallocation properly. Pybind11 provides a py::capsule class to help you do exactly this. What you want to do is make the numpy array depend on this capsule as its parent class by specifying it as the base argument to array_t. That will make the numpy array reference it, keeping it alive as long as the array itself is alive, and invoke the cleanup function when it is no longer referenced.
The c_style flag in the older (pre-2.2) releases only had an effect on new arrays, i.e. when not passing a value pointer. That was fixed in the 2.2 release to also affect the automatic strides if you specify only shapes but not strides. It has no effect at all if you specify the strides directly yourself (as I do in the example below).
So, putting the pieces together, this code is a complete pybind11 module that demonstrates how you can accomplish what you're looking for (and includes some C++ output to demonstrate that it is indeed working correctly):
#include <iostream>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
namespace py = pybind11;
PYBIND11_PLUGIN(numpywrap) {
    py::module m("numpywrap");
    m.def("f", []() {
        // Allocate and initialize some data; make this big so
        // we can see the impact on the process memory use:
        constexpr size_t size = 100*1000*1000;
        double *foo = new double[size];
        for (size_t i = 0; i < size; i++) {
            foo[i] = (double) i;
        }

        // Create a Python object that will free the allocated
        // memory when destroyed:
        py::capsule free_when_done(foo, [](void *f) {
            double *foo = reinterpret_cast<double *>(f);
            std::cerr << "Element [0] = " << foo[0] << "\n";
            std::cerr << "freeing memory # " << f << "\n";
            delete[] foo;
        });

        return py::array_t<double>(
            {100, 1000, 1000},        // shape
            {1000*1000*8, 1000*8, 8}, // C-style contiguous strides for double
            foo,                      // the data pointer
            free_when_done);          // numpy array references this parent
    });
    return m.ptr();
}
Compiling that and invoking it from Python shows it working:
>>> import numpywrap
>>> z = numpywrap.f()
>>> # the python process is now taking up a bit more than 800MB memory
>>> z[1,1,1]
1001001.0
>>> z[0,0,100]
100.0
>>> z[99,999,999]
99999999.0
>>> z[0,0,0] = 3.141592
>>> del z
Element [0] = 3.14159
freeing memory # 0x7fd769f12010
>>> # python process memory size has dropped back down
I recommend using ndarray. A foundational principle is that the underlying data is never copied unless explicitly requested (or you quickly end up with huge inefficiencies). Below is an example of it in use, but there are other features I haven't shown, including conversion to Eigen arrays (ndarray::asEigen(array)), which makes it pretty powerful.
Header:
#ifndef MYTENSORCODE_H
#define MYTENSORCODE_H
#include "ndarray_fwd.h"
namespace myTensorNamespace {
ndarray::Array<double, 2, 1> myTensorFunction(int param1, double param2);
} // namespace myTensorNamespace
#endif // include guard
Lib:
#include "ndarray.h"
#include "myTensorCode.h"
namespace myTensorNamespace {
ndarray::Array<double, 2, 1> myTensorFunction(int param1, double param2) {
    std::size_t const size = calculateSize();
    ndarray::Array<double, 2, 1> array = ndarray::allocate(size, size);
    array.deep() = 0;  // initialise
    for (std::size_t ii = 0; ii < size; ++ii) {
        array[ii][ndarray::view(ii, ii + 1)] = 1.0;
    }
    return array;
}
} // namespace myTensorNamespace
Wrapper:
#include "pybind11/pybind11.h"
#include "ndarray.h"
#include "ndarray/pybind11.h"
#include "myTensorCode.h"
namespace py = pybind11;
using namespace pybind11::literals;
namespace myTensorNamespace {
namespace {
PYBIND11_MODULE(myTensorModule, mod) {
mod.def("myTensorFunction", &myTensorFunction, "param1"_a, "param2"_a);
}
} // anonymous namespace
} // namespace myTensorNamespace
I have the following python code:
r = range(1,10)
r_squared = []
for item in r:
    print item
    r_squared.append(item*item)
How would I convert this code to C? Is there something like a mutable array in C, or how would I do the equivalent of the Python append?
A simple array in C. Arrays in C are homogeneous.
int arr[10];
int i = 0;
for(i = 0; i < sizeof(arr) / sizeof(arr[0]); i++)
{
    arr[i] = i; // Initializing each element separately
}
Try using vectors in C; go through this link:
// vector-usage.c
#include <stdio.h>
#include "vector.h"

int main() {
    // declare and initialize a new vector
    Vector vector;
    vector_init(&vector);

    // fill it up with 250 arbitrary values
    // this should force the capacity to expand
    int i;
    for (i = 200; i > -50; i--) {
        vector_append(&vector, i);
    }

    // set a value at an arbitrary index
    // this will expand and zero-fill the vector to fit
    vector_set(&vector, 4452, 21312984);

    // print out an arbitrary value in the vector
    printf("Here's the value at 27: %d\n", vector_get(&vector, 27));

    // we're all done playing with our vector,
    // so free its underlying data array
    vector_free(&vector);
}
Arrays in C are mutable by default, in that you can write a[i] = 3, just like Python lists.
However, they're fixed-length, unlike Python lists.
For your problem, that should actually be fine. You know the final size you want; just create an array of that size, and assign to the members.
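For instance, a rough fixed-size translation of your snippet might look like this (just a sketch; the bounds mirror range(1, 10)):
#include <stdio.h>

int main(void) {
    int r_squared[9];              /* range(1, 10) produces 9 values */
    for (int i = 1; i < 10; i++) {
        printf("%d\n", i);         /* the `print item` equivalent */
        r_squared[i - 1] = i * i;
    }
    return 0;
}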
But of course there are problems for which you do need append.
Writing a simple library for appendable arrays (just like Python lists) is a pretty good learning project for C. You can also find plenty of ready-made implementations if that's what you want, but not in the standard library.
The key is to not use a stack array, but rather memory allocated on the heap with malloc. Keep track of the pointer to that memory, the capacity, and the used size. When the used size reaches the capacity, multiply it by some number (play with different numbers to get an idea of how they affect performance), then realloc. That's just about all there is to it. (And if you look at the CPython source for the list type, that's basically the same thing it's doing.)
Here's an example. You'll want to add some error handling (malloc and realloc can return NULL) and of course the rest of the API beyond append (especially a delete function, which will call free on the allocated memory), but this should be enough to show you the idea:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    int *i;
    size_t len;
    size_t capacity;
} IntArray;

IntArray int_array_make() {
    IntArray a = {
        .i = malloc(10 * sizeof(int)),
        .len = 0,
        .capacity = 10
    };
    return a;
}

void int_array_append(IntArray *a, int value) {
    if (a->len+1 == a->capacity) {
        size_t new_capacity = (int)(a->capacity * 1.6);
        a->i = realloc(a->i, new_capacity * sizeof(int));
        a->capacity = new_capacity;
    }
    a->i[a->len++] = value;
}

int main(int argc, char *argv[]) {
    IntArray a = int_array_make();

    for (int i = 0; i != 50; i++)
        int_array_append(&a, i);

    for (int i = 0; i != a.len; ++i)
        printf("%d ", a.i[i]);
    printf("\n");
}
C doesn't have any way of dynamically increasing the size of an array like Python does; arrays here are of fixed length.
If you know the size of the array that you will be using, you can use this kind of declaration:
int arr[10];
Or, if you want to add memory on the fly (at runtime), use malloc along with a structure such as a linked list.
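If you go the linked-list route, a minimal sketch of an append might look like this (the names are just for illustration and error handling is omitted):
#include <stdlib.h>

/* A node holding one int; memory for each element is allocated at runtime. */
struct node {
    int value;
    struct node *next;
};

/* Append a value to the list and return the (possibly new) head. */
struct node *append(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next = NULL;
    if (head == NULL)
        return n;               /* first element becomes the head */
    struct node *p = head;
    while (p->next != NULL)     /* walk to the tail */
        p = p->next;
    p->next = n;
    return head;
}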