I'm currently embedding Python in my C++ program using Boost.Python in order to use matplotlib. Now I'm stuck at a point where I have to construct a large data structure, say a dense 10000x10000 matrix of doubles. I want to plot columns of that matrix, and I figure I have multiple options to do so:
Iterating and copying every value into a numpy array --> I don't want to do that, for the obvious reason of doubled memory consumption
Iterating and exporting every value into a file, then importing it in Python --> I could do that completely without Boost.Python, but I don't think it is a nice way
Allocating and storing the matrix in Python and just updating the values from C++ --> But as stated here, it's not a good idea to switch back and forth between the Python interpreter and my C++ program
Somehow exposing the matrix to Python without having to copy it --> All I can find on that matter is about extending Python with C++ classes, not about embedding
Which of these is the best option concerning performance and, of course, memory consumption? Or is there an even better way of handling this kind of task?
To prevent copying in Boost.Python, one can do one of the following:
Use policies to return internal references
Allocate on the free store and use policies to have Python manage the object
Allocate the Python object then extract a reference to the array within C++
Use a smart pointer to share ownership between C++ and Python
If the matrix has a C-style contiguous memory layout, then consider using the NumPy C-API. The PyArray_SimpleNewFromData() function can be used to create an ndarray object that wraps memory allocated elsewhere. This allows one to expose the data to Python without copying or transferring each element between the languages. The how-to-extend documentation is a great resource for dealing with the NumPy C-API:
Sometimes, you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. [...] A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed.
[...]
If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.
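The same zero-copy idea can also be seen from the Python side. Here is a minimal sketch using numpy.ctypeslib.as_array, with a ctypes buffer standing in for memory allocated elsewhere (e.g., by the C++ side); the resulting ndarray wraps the existing storage without owning or copying it:

import ctypes
import numpy as np

# A ctypes buffer stands in for memory allocated elsewhere (e.g., in C++).
rows, cols = 4, 5
buf = (ctypes.c_double * (rows * cols))()

# Wrap the existing storage as an ndarray; no element is copied.
arr = np.ctypeslib.as_array(buf).reshape(rows, cols)
arr[2, 3] = 42.0                      # writes through to the buffer

print(arr.flags['OWNDATA'])           # False: the ndarray does not own the data
print(buf[2 * cols + 3])              # 42.0: same underlying storage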
Also, while the plotting function may create copies of the array, it can do so within the C-API, allowing it to take advantage of the memory layout.
If performance is a concern, it may be worth considering the plotting itself (see the sketch after this list):
taking a sample of the data and plotting it may be sufficient, depending on the data distribution
using a raster-based backend, such as Agg, will often outperform vector-based backends on large datasets
benchmarking other tools that are designed for large data, such as Vispy
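For instance, combining sampling with the Agg backend might look like the following sketch (the data and the sampling stride are arbitrary choices for illustration):

import matplotlib
matplotlib.use('Agg')        # select the raster backend before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

column = np.random.randn(10_000_000)   # stand-in for one matrix column
sample = column[::1000]                # plot every 1000th point, not all of them

fig, ax = plt.subplots()
ax.plot(sample)
fig.savefig('column.png')              # Agg renders to raster formats like PNG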
Although Tanner's answer brought me a big step forward, I ended up using Boost.NumPy, an unofficial extension to Boost.Python that can easily be added. It wraps the NumPy C-API and makes it safer and easier to use.
Related
I'm new to C extensions for NumPy and I'm wondering if the following workflow is possible.
Pre-allocate an array in NumPy
Pass this array to a C extension
Modify array data in-place in C
Use the updated array in Python with standard NumPy functions
In particular, I'd like to do this while ensuring I'm making zero new copies of the data at any step.
I'm familiar with boilerplate on the C side, such as PyModuleDef, PyMethodDef, and the PyObject* arguments, but a lot of examples I've seen involve coercion to C arrays, which to my understanding involves copying and/or casting. I'm also aware of Cython, though I don't know whether it does similar coercions or copies under the hood. I'm specifically interested in simple indexed get and set operations on ndarrays with numeric (e.g. int32) values.
Could someone provide a minimal working example of creating a NumPy array, modifying it in-place in a C extension, and using the results in Python subsequently?
Cython doesn't create new copies of numpy arrays unless you specifically request it to do so using numpy functions, so it is as efficient as it can be when dealing with numpy arrays; see Working with NumPy.
Choosing between writing a raw C module and using Cython depends on the purpose of the module.
If you are writing a module that will only be used by Python to do a very small, specific task with numpy arrays as fast as possible, then by all means use Cython: it will automate registering the module correctly, handle the memory and prevent common mistakes people make when writing C code (such as memory management problems), automate the compiler includes, and allow overall easier access to complicated functionality (like using numpy iterators).
However, if your module is going to be used in other languages, has to run independently from Python, has to be usable from Python without any overhead, implements some complex C data structures, and requires a lot of C functionality, then by all means create your own C extension (or even a DLL). You can pass pointers to numpy arrays from Python (using numpy.ctypeslib.as_ctypes_type), pass the Python object itself and return it (but you must build a .pyd/.so instead of a DLL), or even create the numpy array on the C side and have it managed by Python (but you will have to understand the NumPy C-API).
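As a minimal sketch of that pointer-passing route, here libc's memset stands in for a custom C function (so this runs as-is only on POSIX systems): the array is allocated once in NumPy and its buffer is modified in place, with no copy at any step.

import ctypes
import numpy as np

arr = np.zeros(8, dtype=np.int32)   # pre-allocated in NumPy

# Load the already-linked C runtime and declare memset's signature.
libc = ctypes.CDLL(None)
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

# Pass a pointer to the array's buffer; the data is modified in place.
libc.memset(arr.ctypes.data_as(ctypes.c_void_p), 0xFF, arr.nbytes)

print(arr)   # every element is now -1 (all bytes set to 0xFF)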
I want to create a Cython function that will compile a rather large image pyramid, then pass that image pyramid to another Cython function for additional processing. At issue is the fact that the 2D arrays in the image pyramid do not have the same shape. The number of pyramid layers and the dimensions of each layer are known ahead of time (i.e., I don't necessarily need a container that can be dynamically allocated and would prefer to avoid such a container for speed reasons). In pure Python, I would probably load the various pyramid layers into a pre-initialized list and pass the list as an argument to the function that needs the pyramid. Since I'm now working in Cython where the goal is improved processing speed, I'm trying to avoid any container that might ultimately slow down processing. Memory views seem to be a good option for speed; however, it seems inefficient in terms of memory to force an image pyramid with a large base layer size into a memory view. I've read the answers to the following stack overflow question:
Python list equivalent in C++?
As a guess, at least one of the suggestions at the cited link might address my question -- the problem is which one? I'm relatively new to Cython and have an even more limited knowledge of C and C++, so I'm hoping someone can point me in the right direction and help limit the number of possibilities I need to research.
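One common pattern, sketched here in plain Python with made-up layer shapes, is to back all layers with a single contiguous buffer and keep a list of zero-copy views into it; in Cython, each view could then be typed as a 2D memoryview:

import numpy as np

# Layer shapes are known ahead of time (made-up values for illustration).
shapes = [(512, 512), (256, 256), (128, 128)]

# One contiguous buffer backs the whole pyramid; each layer is a view, not a copy.
sizes = [r * c for r, c in shapes]
offsets = np.concatenate(([0], np.cumsum(sizes)))
buf = np.empty(offsets[-1], dtype=np.float64)
pyramid = [buf[offsets[i]:offsets[i + 1]].reshape(shape)
           for i, shape in enumerate(shapes)]

pyramid[1][0, 0] = 3.0               # writes into the shared buffer
print(buf[offsets[1]] == 3.0)        # True: same storage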
When using large arrays, does Python allocate memory by default, unlike C for example?
More specifically, when writing array = [1, 2, 3], should I worry about freeing this and every other array I create?
Looking for answers on the web just confused me more.
array = [1, 2, 3] creates a list, not an array. It is dynamically allocated (it resizes automatically), and you do not have to free its memory.
The same applies to arrays from the array module in the standard library, and arrays from the numpy library.
As a rule, Python handles memory allocation and freeing for all its objects, with the possible exception of some objects created using Cython or by directly calling C modules.
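A minimal illustration of this reference-counted behaviour:

import sys

a = [1, 2, 3]                  # storage is allocated automatically
b = a                          # a second reference to the same list
print(sys.getrefcount(a))      # count includes the temporary call argument
del a                          # drops one reference; nothing is freed yet
print(b)                       # the list is still alive through b
# when the last reference disappears, Python frees the memory itself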
I'm working on a project which requires handling matrices of custom C structures, with some C functions implementing operations over these structures.
So far, we're proceeding as follows:
Build python wrapper classes around the C structures using ctypes
Override __and__ and __xor__ for that object calling the appropriate underlying C functions
Build numpy arrays containing instances of these classes
Now we are facing some performance issues, and I feel this is not the proper way to handle things: we have a C library that natively implements expensive operations on the data type, and numpy natively implements expensive operations on matrices, but (I guess) in this configuration every single operation will be proxied by the Python wrapper class.
Is there a way to implement this with numpy that makes the operations fully native? I've read there are utilities for wrapping ctypes types into numpy arrays (see here), but what about the operator overloading?
We're not bound to using ctypes, but we would love to still be able to use Python (which I believe has big advantages over C in terms of code maintainability...)
Does anyone know if this is possible, and how? Would you suggest other different solutions?
Thanks a lot!
To elaborate somewhat on the above comment (arrays of structs versus struct of arrays):
import numpy as np

# create an array of structs
array_of_structs = np.zeros((3, 3), dtype=[('a', np.float64), ('b', np.int32)])

# we may as well think of it as a struct of arrays
struct_of_arrays = array_of_structs['a'], array_of_structs['b']

# the 'b' field starts one float64 (8 bytes) after the 'a' field, so this prints 8
print(array_of_structs['b'].ctypes.data - array_of_structs['a'].ctypes.data)
Note that working with a 'struct of arrays' approach requires a rather different coding style; that is, we largely have to let go of the OOP way of organizing our code. Naturally, you may still choose to place all operations relating to a particular object in a single namespace, but such a data layout encourages a more vectorized, or 'numpythonic', coding style. Rather than passing single objects around between functions, we take the whole collection or matrix of objects to be the smallest atomic construct that we manipulate.
Of course, the same data may also be accessed in an OOP manner where that is desirable. But that will not do you much good as far as numpy is concerned. Note that for performance reasons it is often also preferable to maintain each attribute of your object as a separate contiguous array, since placing all attributes contiguously is cache-inefficient for operations that act on only a subset of the object's attributes.
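As an illustration of that vectorized style, a struct-of-arrays layout with whole-collection operations might look like this sketch (particles with positions and velocities are made-up attributes):

import numpy as np

# Struct-of-arrays: one contiguous array per attribute, for many objects at once.
pos = np.random.rand(1000, 3)          # positions of 1000 particles
vel = np.random.rand(1000, 3)          # velocities

# Operate on the whole collection rather than calling methods per object.
pos += vel * 0.01                      # one vectorized update for every particle
speeds = np.linalg.norm(vel, axis=1)   # a derived attribute, also vectorized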
I have a large set of large arrays that need to be Fourier transformed one after another, repeatedly, and they do not all fit in memory at the same time. A typical array size is (350, 250000), but it is quite variable. The general procedure is
while True:
    for data in data_set:
        array = generate_array(data)
        fft(array, farray)
        do_something_with_farray()
        ifft(farray, array)
        do_something_with_array()
This needs to be fast, so ideally I would make plans for all the arrays beforehand and reuse them in the loop. This is especially important because even constructing a plan with FFTW_ESTIMATE is too slow for me to do inside the loop (10x or more slower than just executing the plan, when constructing it as pyfftw.FFTW(array, farray, flags=['FFTW_ESTIMATE', 'FFTW_DESTROY_INPUT'], threads=nthread, axes=[-1])). However, each plan contains a reference to the arrays that were used when constructing it, which means that keeping all the plans in memory results in me keeping all the arrays in memory too, which I can't afford.
Is it possible to make pyfftw release the references it holds to the arrays? After all, I am planning to repoint them to fully compatible new arrays inside the loop anyway. If not, is there some other way of getting around this problem? I guess I could make plans for single rows, or for chunks of rows, but that could easily lead to slowdowns.
PS. I use FFTW_ESTIMATE rather than FFTW_MEASURE, despite planning to reuse the plan many times, because FFTW_MEASURE takes forever for these array sizes, and when I specify a time limit, performance is no better than with FFTW_ESTIMATE.
Edit: Actually, the slowness of constructing the plan only happens the first time I construct a plan of that shape (due to wisdom, I guess), so the approach of not storing the plans works after all. Still, if it is possible to store plans without the array references, that would be nice to know about.
FFTW plans are by their nature tied to a piece of memory. However, there is nothing to stop you from using the same piece of memory for all your plans. So you could create a single array that is big enough for all your possible arrays and then create your FFTW objects on views into that array.
You can then execute the FFT using the FFTW.__call__() interface that allows the arrays to be updated prior to execution (with little overhead when they agree with the original array in strides and alignment).
Now, the FFTW object will have the new arrays as its internal arrays. If you want to revert back to the other memory, you can use FFTW.update_arrays().
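A minimal sketch of that approach, assuming pyfftw's empty_aligned helper and the FFTW.__call__/update_arrays interfaces; the shape is deliberately smaller than the question's so the sketch runs quickly:

import pyfftw

# One aligned buffer pair, big enough for the largest array in the set;
# the plan is built once on these buffers.
shape = (350, 2500)
buf = pyfftw.empty_aligned(shape, dtype='complex128')
fbuf = pyfftw.empty_aligned(shape, dtype='complex128')

plan = pyfftw.FFTW(buf, fbuf, axes=[-1],
                   flags=['FFTW_ESTIMATE', 'FFTW_DESTROY_INPUT'])

# Execute on compatible new arrays via __call__; when they match the
# originals in strides and alignment, updating the arrays is nearly free.
new_in = pyfftw.empty_aligned(shape, dtype='complex128')
new_out = pyfftw.empty_aligned(shape, dtype='complex128')
plan(new_in, new_out)          # the plan now points at the new arrays

# To point the plan back at the original buffers explicitly:
plan.update_arrays(buf, fbuf)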