I've written an extension in C++ for Python, and I'm currently debugging it.
The extension takes 3 numpy matrices and produces 2 as a result. To the inner C++ function that does the actual calculation I pass 3 float C arrays (just flattened and converted from the input numpy arrays), and it returns a C float array of arrays. Everything works as intended, but ONLY if I print this output array of arrays before returning it.
What the hell is going on here?
float** gradient(float* inputs, float* kernels, float* grads, npy_intp* input_dims, npy_intp* kernels_dims, npy_intp* output_dims){
    float* g_inputs = new float[batch*h*w*ch_in];
    for (int i = 0; i < batch*h*w*ch_in; i++) g_inputs[i] = 0;
    float* g_kernels = new float[size*ch_out];
    for (int i = 0; i < size*ch_out; i++) g_kernels[i] = 0;
    float* ret[2] = {{g_inputs}, {g_kernels}};
    std::cout<<ret<<std::endl; //<---without this it doesn't work
    return ret;
}
I've omitted irrelevant code for clarity.
You are returning a pointer to an object with automatic lifetime. In other words, your function returns a dangling pointer, which is Undefined Behaviour.
Although aerostatic lizards are an uncommon result of UB, anything can happen and the symptom you observe, unlike the lizards, is common.
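One possible fix, sketched against your snippet (the omitted size variables are assumed to be in scope as before): allocate the outer array of pointers on the heap too, so it outlives the function. The caller then has to delete[] ret[0], ret[1] and ret itself.
float** gradient(float* inputs, float* kernels, float* grads,
                 npy_intp* input_dims, npy_intp* kernels_dims, npy_intp* output_dims){
    // value-initialization with () zeroes the buffers, replacing the loops
    float* g_inputs  = new float[batch*h*w*ch_in]();
    float* g_kernels = new float[size*ch_out]();

    // the array of pointers itself must also live on the heap so it survives the return
    float** ret = new float*[2];
    ret[0] = g_inputs;
    ret[1] = g_kernels;
    return ret;   // no dangling pointer, no print needed
}
Returning a small struct or a std::vector by value would avoid the manual delete[] bookkeeping, but the sketch above stays closest to the original signature.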
I'm trying to figure out how I could achieve this:
I have a Python script which in the end produces a NumPy array, an array of arrays of floats, to be more specific. I have everything set up properly: I can pass parameters from C to Python, launch Python functions from C, and process returned scalar values in C.
What I'm currently not able to do is return such a NumPy array as the result of a Python function to C.
Could somebody give me a pointer on how to achieve this?
TIA
What you need to look at is Inter-Process Communication (IPC). There are several ways to perform it.
You can use one of:
Files (Easy to use)
Shared memory (really fast)
Named pipes
Sockets (slow)
See Wikipedia's IPC page and find the best approach for your needs.
Here's a small working demo (1D, not a 2D array; it's not perfect, so adjust it to your needs).
# file ./pyscript.py
import numpy as np
# print inline
np.set_printoptions(linewidth=np.inf)
x = np.random.random(10)
print(x)
# [0.52523722 0.29053534 0.95293405 0.7966214 0.77120688 0.22154705 0.29398872 0.47186567 0.3364234 0.38107864]
// file ./demo.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main()
{
    int fd[2];
    pipe(fd); // create the pipe
    char buf[4096];
    pid_t pid = fork();
    if (pid == 0) { // child process: run the Python script with stdout redirected into the pipe
        dup2(fd[1], 1);
        close(fd[0]);
        close(fd[1]);
        char *pyscript = "./pyscript.py";
        char *args[] = {"python3", pyscript, (char*)NULL};
        execv("/usr/bin/python3", args);
    }
    else { // parent process: read the child's output and parse it
        int status;
        close(fd[1]);
        int bytes = read(fd[0], buf, sizeof(buf) - 1);
        buf[bytes] = '\0'; // make sure the buffer is a proper C string before scanning it
        printf("Python script output: %.*s\n", bytes, buf);
        char* values[10];
        int count = 0;
        values[count++] = &buf[1]; // ignore the '[' coming from numpy array output
        char* p = buf;
        while (*p) {
            if (*p == ' ' && count < 10) { // split on spaces, at most 10 values
                *p = 0;
                values[count++] = p + 1;
            }
            p++;
        }
        float a[10];
        for (int i = 0; i < 10; i++) {
            a[i] = atof(values[i]);
            printf("%f\n", a[i]); // the parsed float values
        }
        waitpid(pid, &status, 0);
    }
    return 0;
}
Sample output
# cc demo.c
# ./a.out
Python script output: [0.23286839 0.54437959 0.37798547 0.17190732 0.49473837 0.48112695 0.93113395 0.20877592 0.96032973 0.30025713]
0.232868
0.544380
0.377985
0.171907
0.494738
0.481127
0.931134
0.208776
0.960330
0.300257
a will hold your desired result, an array of floats.
One has to use the PyList API to decode list objects returned from Python to C:
https://docs.python.org/3.3/c-api/list.html?highlight=m
Solved.
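For reference, a minimal sketch of that approach, assuming result is the PyObject* returned by the call (e.g. from PyObject_CallObject) and that the Python side returns a flat list of floats (e.g. arr.tolist()):
/* decode a Python list of floats into a C array */
if (result != NULL && PyList_Check(result)) {
    Py_ssize_t n = PyList_Size(result);
    double *values = (double *)malloc(n * sizeof(double));
    for (Py_ssize_t i = 0; i < n; i++) {
        PyObject *item = PyList_GetItem(result, i);  /* borrowed reference */
        values[i] = PyFloat_AsDouble(item);
    }
    /* ... use values ..., then free(values) and Py_DECREF(result) */
}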
I am using ctypes to try and speed up my code.
My problem is similar to the one in this tutorial : https://cvstuff.wordpress.com/2014/11/27/wraping-c-code-with-python-ctypes-memory-and-pointers/
As pointed out in the tutorial, I should free the memory after using the C function. Here is my C code:
//C functions
double* getStuff(double *R_list, int items){
    double results[items];
    double* results_p;
    for(int i = 0; i < items; i++){
        res = calculation; // do some calculation
        results[i] = res;
    }
    results_p = results;
    printf("C allocated address %p \n", results_p);
    return results_p;
}

void free_mem(double *a){
    printf("freeing address: %p\n", a);
    free(a);
}
Which I compile with gcc -shared -Wl,-lgsl,-soname, simps -o libsimps.so -fPIC simps.c
And python:
//Python
from ctypes import *
import numpy as np
mydll = CDLL("libsimps.so")
mydll.getStuff.restype = POINTER(c_double)
mydll.getStuff.argtypes = [POINTER(c_double),c_int]
mydll.free_mem.restype = None
mydll.free_mem.argtypes = [POINTER(c_double)]
R = np.logspace(np.log10(0.011),1, 100, dtype = float) #input
tracers = c_int(len(R))
R_c = R.ctypes.data_as(POINTER(c_double))
for_list = mydll.getStuff(R_c,tracers)
print 'Python allocated', hex(for_list)
for_list_py = np.array(np.fromiter(for_list, dtype=np.float64, count=len(R)))
mydll.free_mem(for_list)
Up to the last line the code does what I want it to, and the for_list_py values are correct. However, when I try to free the memory, I get a Segmentation fault, and on closer inspection the address associated with for_list --> hex(for_list) is different from the one allocated to results_p within the C part of the code.
As pointed out in this question, Python ctypes: how to free memory? Getting invalid pointer error, for_list will return the same address if mydll.getStuff.restype is set to c_void_p. But then I struggle to put the actual values I want into for_list_py. This is what I've tried:
cast(for_list, POINTER(c_double) )
for_list_py = np.array(np.fromiter(for_list, dtype=np.float64, count=len(R)))
mydll.free_mem(for_list)
where the cast operation seems to change for_list into an integer. I'm fairly new to C and very confused. Do I need to free that chunk of memory? If so, how do I do that whilst also keeping the output in a numpy array? Thanks!
Edit: It appears that the address allocated in C and the one I'm trying to free are the same, though I still receive a Segmentation fault.
C allocated address 0x7ffe559a3960
freeing address: 0x7ffe559a3960
Segmentation fault
If I do print for_list I get <__main__.LP_c_double object at 0x7fe2fc93ab00>
Conclusion
Just to let everyone know, I struggled with ctypes for a bit.
I ended up opting for SWIG instead of ctypes. I've found that the code runs faster on the whole (compared to the version presented here). I found this documentation on dealing with memory deallocation in SWIG very useful: https://scipy-cookbook.readthedocs.io/items/SWIG_Memory_Deallocation.html. SWIG also gives you a very easy way of dealing with numpy n-dimensional arrays.
After the getStuff function exits, the memory backing the local results array is gone (it lives on the stack and was never obtained from malloc), so passing that pointer to free crashes the program.
Try this instead:
double* getStuff(double *R_list, int items)
{
    double* results_p = malloc(sizeof(*results_p) * (items + 1));
    if (results_p == NULL)
    {
        // handle error
    }
    for(int i = 0; i < items; i++)
    {
        res = calculation; // do some calculation
        results_p[i] = res;
    }
    printf("C allocated address %p \n", results_p);
    return results_p;
}
MNIST is the hello world of machine learning and I've practiced it with TensorFlow and with pure python and numpy.
For more practice I am trying to write it in C on my own with only the standard library because I am relatively new to C and it's a great way to learn.
It's taken three weeks and a lot of SEGFAULTs, but I get 81% accuracy. Not very good, but it's for learning.
The most troubling stuff was, of course, malloc/free for the data in the matrix struct below:
typedef struct matrix{
    int rows, cols;
    float *data;
} matrix;
The forward and backward passes have things like:
1) matrix dot product
2) matrix add
3) matrix subtract
4) activation function (sigmoid in this case)
To avoid memory leaks I pass in three structs like so:
void matrix_add(matrix *a, matrix *b, matrix *res);
If res requires a dimensions change from a previous layer, then I free it and do a new malloc like so:
void zero_out_data(matrix *res, int rows, int cols)
{
    if (res->rows != rows || res->cols != cols)
    {
        if ((res->rows*res->cols) != (rows*cols))
        {
            free(res->data);
            res->data = NULL;
            free(res);
            res = NULL;
            res = malloc(sizeof(matrix));
            // make_matrix will calloc the data based on rows*cols
            // any other init stuff that could be needed
            make_matrix(res, rows, cols);
        }
        res->rows = rows;
        res->cols = cols;
    }
    else {
        res->rows = rows;
        res->cols = cols;
        for (int i = 0; i < (rows*cols); i++)
        {
            res->data[i] = 0.0;
        }
    }
}
Then I can use that like so:
void sigmoid(matrix *z, matrix *res)
{
    zero_out_data(res, z->rows, z->cols);
    for (int i = 0; i < (z->rows*z->cols); i++)
    {
        res->data[i] = 1.0/(1.0+exp(-z->data[i]));
    }
}
This gets very messy because a single forward pass has the following:
/* forward pass */
for (int k = 0; k < (network->num_layers-1); k++)
{
    matrix_dot(network->weights[k], activation, dot);
    matrix_add(dot, network->biases[k], zs[k]);
    sigmoid(zs[k], activation);
    sigmoid(zs[k], activations[k+1]);
}
/* end forward pass */
As you can imagine, the backprop gets a lot messier. I have to pre-create 8 different matrices, plus many more of those pointers to pointers of matrices, like the activations and zs above, for the gradient descent.
What I would like to be able to do is return a matrix from a function like matrix_dot so that I can do:
sigmoid(matrix_add(matrix_dot(network->weights[k], activation), network->biases[k]));
That's kind of in the style of python/numpy.
Of course I can't return a local variable from a function because it's taken off the stack once the function returns.
If I return a pointer, then the above style will cause severe memory leaks.
Please note: I am not trying to write my own library/framework. I am simply trying to learn neural networks and coding in C. I have been a python developer for 7 years or so, and my C skills need improvement.
Memory leak in void zero_out_data(matrix *res, int rows, int cols)
matrix *res is malloc'd outside the function and passed to zero_out_data. Inside zero_out_data, res is freed and malloc'd again, but the caller's pointer is never updated, so it is left dangling. If you want to change the value of the pointer res itself, you need a parameter like matrix **res (see the sketch below).
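A minimal sketch of that double-pointer variant, just to illustrate the signature (make_matrix is assumed to allocate the data for rows*cols, as in your code):
void zero_out_data(matrix **res, int rows, int cols)
{
    if (*res == NULL || (*res)->rows * (*res)->cols != rows * cols) {
        if (*res != NULL) {
            free((*res)->data);
            free(*res);
        }
        *res = (matrix *)malloc(sizeof(matrix));
        make_matrix(*res, rows, cols);      /* callocs data for rows*cols */
    }
    (*res)->rows = rows;
    (*res)->cols = cols;
    for (int i = 0; i < rows * cols; i++)
        (*res)->data[i] = 0.0f;
}
The caller then passes the address of its pointer, e.g. zero_out_data(&res, rows, cols), so a reallocation is visible outside the function.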
If you just want to zero out the data, there is no need to malloc a whole new matrix; only the data part needs to be (re)allocated. I think your make_matrix function can malloc the memory for data.
void zero_out_data(matrix *res, int rows, int cols) {
    if (res->data == NULL) {
        make_matrix(res, rows, cols);
    } else if (res->rows != rows || res->cols != cols) {
        if ((res->rows*res->cols) != (rows*cols))
        {
            free(res->data);
            res->data = NULL;
            make_matrix(res, rows, cols);
        }
    }
    res->rows = rows;
    res->cols = cols;
    for (int i = 0; i < (rows*cols); i++)
    {
        res->data[i] = 0.0;
    }
}
How to implement sigmoid(matrix_add(matrix_dot(network->weights[k], activation), network->biases[k]));?
You can use static or global variables to implement what you want. This will not be thread-safe or reentrant. Examples below:
matrix *matrix_dot(matrix *in_a, matrix *in_b)
{
    static matrix res = {0, 0, NULL}; // static variable
    // calculate the result's rows and cols numbers
    zero_out_data(&res, res_rows, res_cols); // (re)allocates the data when needed
    // do some math.
    return &res;
}
// matrix_add will be just like matrix_dot.
// I was wrong about sigmoid: it does not need a new matrix either; it can be done just like matrix_dot.
You can use a global variable instead of the static variable.
If you want thread safety or reentrancy, then just use local variables, like this:
matrix *matrix_dot(matrix *in_a, matrix *in_b, matrix *res)
{
    zero_out_data(res, xxx, xxx);
    // do some math
    return res;
}
// matrix_add will be the same.

// define local variables.
matrix add_res, dot_res, sig_res;
add_res.data = NULL;
dot_res.data = NULL;
sig_res.data = NULL;

sigmoid(matrix_add(matrix_dot(network->weights[k], activation, &dot_res), network->biases[k], &add_res), &sig_res);

// Now remember to free the data in each matrix:
free(add_res.data);
free(dot_res.data);
free(sig_res.data);
I have the following python code:
r = range(1,10)
r_squared = []
for item in r:
    print item
    r_squared.append(item*item)
How would I convert this code to C? Is there something like a mutable array in C or how would I do the equivalent of the python append?
A simple array in C. Arrays in C are homogeneous.
int arr[10];
int i = 0;
for(i = 0; i < sizeof(arr)/sizeof(arr[0]); i++)
{
    arr[i] = i; // Initializing each element separately
}
Try using vectors in C; go through this link.
// vector-usage.c
#include <stdio.h>
#include "vector.h"

int main() {
    // declare and initialize a new vector
    Vector vector;
    vector_init(&vector);

    // fill it up with 150 arbitrary values
    // this should expand capacity up to 200
    int i;
    for (i = 200; i > -50; i--) {
        vector_append(&vector, i);
    }

    // set a value at an arbitrary index
    // this will expand and zero-fill the vector to fit
    vector_set(&vector, 4452, 21312984);

    // print out an arbitrary value in the vector
    printf("Heres the value at 27: %d\n", vector_get(&vector, 27));

    // we're all done playing with our vector,
    // so free its underlying data array
    vector_free(&vector);
}
Arrays in C are mutable by default, in that you can write a[i] = 3, just like Python lists.
However, they're fixed-length, unlike Python lists.
For your problem, that should actually be fine. You know the final size you want; just create an array of that size, and assign to the members.
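For the snippet in the question, that might look something like this (a sketch, with the size of range(1, 10) hard-coded):
#include <stdio.h>

int main(void) {
    int r_squared[9];                /* range(1, 10) has 9 items */
    for (int i = 1; i < 10; i++) {
        printf("%d\n", i);           /* like `print item` */
        r_squared[i - 1] = i * i;    /* like r_squared.append(item*item) */
    }
    return 0;
}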
But of course there are problems for which you do need append.
Writing a simple library for appendable arrays (just like Python lists) is a pretty good learning project for C. You can also find plenty of ready-made implementations if that's what you want, but not in the standard library.
The key is to not use a stack array, but rather memory allocated on the heap with malloc. Keep track of the pointer to that memory, the capacity, and the used size. When the used size reaches the capacity, multiply it by some number (play with different numbers to get an idea of how they affect performance), then realloc. That's just about all there is to it. (And if you look at the CPython source for the list type, that's basically the same thing it's doing.)
Here's an example. You'll want to add some error handling (malloc and realloc can return NULL) and of course the rest of the API beyond append (especially a delete function, which will call free on the allocated memory), but this should be enough to show you the idea:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *i;
    size_t len;
    size_t capacity;
} IntArray;

IntArray int_array_make() {
    IntArray a = {
        .i = malloc(10 * sizeof(int)),
        .len = 0,
        .capacity = 10
    };
    return a;
}

void int_array_append(IntArray *a, int value) {
    if (a->len+1 == a->capacity) {
        size_t new_capacity = (int)(a->capacity * 1.6);
        a->i = realloc(a->i, new_capacity * sizeof(int));
        a->capacity = new_capacity;
    }
    a->i[a->len++] = value;
}

int main(int argc, char *argv[]) {
    IntArray a = int_array_make();
    for (int i = 0; i != 50; i++)
        int_array_append(&a, i);
    for (int i = 0; i != a.len; ++i)
        printf("%d ", a.i[i]);
    printf("\n");
}
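The delete function mentioned above can be as small as this (a sketch; int_array_free is a name made up here to match int_array_make):
void int_array_free(IntArray *a) {
    free(a->i);                  /* release the heap buffer */
    a->i = NULL;
    a->len = a->capacity = 0;
}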
C doesn't have any way of dynamically increasing the size of an array like Python does; arrays here are of fixed length.
If you know the size of the array that you will be using, you can use this kind of declaration:
int arr[10];
Or, if you want to add memory on the fly (at runtime), use a malloc call along with a structure (a linked list), as sketched below.
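A minimal sketch of that linked-list idea (the node struct and append helper are made up here, mirroring the Python loop from the question):
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* append a value by allocating a new node at the end of the list */
node *append(node *head, int value) {
    node *n = (node *)malloc(sizeof(node));
    n->value = value;
    n->next = NULL;
    if (head == NULL)
        return n;
    node *p = head;
    while (p->next != NULL)
        p = p->next;
    p->next = n;
    return head;
}

int main(void) {
    node *r_squared = NULL;
    for (int item = 1; item < 10; item++)
        r_squared = append(r_squared, item * item);   /* r_squared.append(item*item) */
    for (node *p = r_squared; p != NULL; p = p->next)
        printf("%d\n", p->value);
    /* freeing the nodes is omitted for brevity */
    return 0;
}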
I have a C++ function returning a std::vector and I want to use it in python, so I'm using the C numpy api:
static PyObject *
py_integrate(PyObject *self, PyObject *args){
    ...
    std::vector<double> integral;
    cpp_function(integral); // This changes integral
    npy_intp size = {integral.size()};
    PyObject *out = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, &(integral[0]));
    return out;
}
Here's how I call it from python:
import matplotlib.pyplot as plt
a = py_integrate(parameters)
print a
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(a)
print a
What happens is: the first print is OK, the values are correct. But when I plot a they are not; in the second print I see very strange values like 1E-308 1E-308 ... or 0 0 0 ..., as if it were uninitialized memory. I don't understand why the first print is OK.
Partial solution (not working):
static void DeleteVector(void *ptr)
{
    std::cout << "Delete" << std::endl;
    std::vector<double> *v = static_cast<std::vector<double> *>(ptr);
    delete v;
    return;
}

static PyObject *
cppfunction(PyObject *self, PyObject *args)
{
    std::vector<double> *vector = new std::vector<double>();
    vector->push_back(1.);
    PyObject *py_integral = PyCObject_FromVoidPtr(vector, DeleteVector);
    npy_intp size = {vector->size()};
    PyArrayObject *out;
    ((PyArrayObject*) out)->base = py_integral;
    return (PyObject*)(out);
}
Your std::vector object appears to be local to that function. PyArray_SimpleNewFromData does not make a copy of the data you pass it. It just keeps a pointer. So once your py_integrate function returns, the vector is deallocated. The print works the first time because nothing has written over the freed memory yet, but by the time you get to the next print, something else has used that memory, causing the values to be different.
You need to make a NumPy array that owns its own storage space and then copy the data into it.
Alternatively, allocate your vector on the heap. Then store a pointer to it in a CObject. Provide a destructor that deletes the vector. Then, take a look at the C-level PyArrayObject type. It has a PyObject * member called base. Store your CObject there. Then when the NumPy array is garbage collected, the reference count on this base object will be decremented, and assuming you haven't taken a copy of it elsewhere, your vector will be deleted thanks to the destructor you provided.
Fixer-upper
You forgot to actually create the PyArray. Try this:
(You didn't post DeleteVector, so I can only hope that it's right)
std::vector<double> *vector = new std::vector<double>();
vector->push_back(1.);
PyObject *py_integral = PyCObject_FromVoidPtr(vector, DeleteVector);
npy_intp size = {vector->size()};
PyObject *out = PyArray_SimpleNewFromData(1, &size, NPY_DOUBLE, &((*vector)[0]));
((PyArrayObject*) out)->base = py_integral;
return out;
Note: I'm not a C++ programmer, so I can only assume that &((*vector)[0]) works as intended with a pointer to a vector. I do know that the vector reallocate its storage area if you grow it, so don't increase its size after getting that pointer or it won't be valid anymore.
You will need to make a copy of the vector, since the vector will go out of scope and the memory will no longer be usable by the time you need it in Python (as stated by kwatford).
One way to make the Numpy array you need (by copying the data) is:
PyObject *out = nullptr;
std::vector<double> *vector = new std::vector<double>();
vector->push_back(1.);
npy_intp size = {vector->size()};
out = PyArray_SimpleNew(1, &size, NPY_DOUBLE);
memcpy(PyArray_DATA((PyArrayObject *) out), vector->data(), vector->size() * sizeof(double));
delete vector; // the data has been copied into the NumPy array, so the vector can be released