I'm working on a project which requires handling matrices of custom C structures, with some C functions implementing operations over these structures.
So far, we're proceeding as follows:
Build Python wrapper classes around the C structures using ctypes
Override __and__ and __xor__ for those objects, calling the appropriate underlying C functions
Build numpy arrays containing instances of these classes (roughly as in the sketch below)
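A stripped-down sketch of that arrangement (the library name libmystruct.so and the function my_and are made up for illustration, not our actual code):

import ctypes
import numpy as np

# hypothetical stand-ins for the real C library and its functions
lib = ctypes.CDLL("./libmystruct.so")

class MyStruct(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

lib.my_and.argtypes = [MyStruct, MyStruct]
lib.my_and.restype = MyStruct

class Wrapped:
    def __init__(self, s):
        self.s = s

    def __and__(self, other):
        # every element-wise '&' round-trips through this Python method
        return Wrapped(lib.my_and(self.s, other.s))

matrix = np.empty((10, 10), dtype=object)  # numpy array of Python wrappers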
Now we are facing some performance issues, and I feel this is not a proper way to handle things: we have a C library natively implementing expensive operations on the data type, and numpy natively implementing expensive operations on matrices, but (I guess) in this configuration every single operation is proxied by the Python wrapper class.
Is there a way to implement this with numpy that makes the operations fully native? I've read there are utilities for wrapping ctypes types into numpy arrays (see here), but what about the operator overloading?
We're not bound to using ctypes, but we would love to still be able to use Python (which I believe has big advantages over C in terms of code maintainability...).
Does anyone know if this is possible, and how? Would you suggest other different solutions?
Thanks a lot!
To elaborate somewhat on the above comment (arrays of structs versus struct of arrays):
import numpy as np

# create an array of structs
array_of_structs = np.zeros((3, 3), dtype=[('a', np.float64), ('b', np.int32)])

# we may as well think of it as a struct of arrays
struct_of_arrays = array_of_structs['a'], array_of_structs['b']

# this should print 8: both field views share the same interleaved buffer,
# and 'b' starts 8 bytes (one float64) past 'a' within each record
print(array_of_structs['b'].ctypes.data - array_of_structs['a'].ctypes.data)
Note that working with a 'struct of arrays' approach requires a rather different coding style; we largely have to let go of the OOP way of organizing our code. Naturally, you may still choose to place all operations relating to a particular object in a single namespace, but such a data layout encourages a more vectorized, or numpythonic, coding style. Rather than passing single objects around between functions, we instead take the whole collection or matrix of objects to be the smallest atomic construct that we manipulate.
Of course, the same data may also be accessed in an OOP manner, where it is desirable. But that will not do you much good as far as numpy is concerned. Note that for performance reasons, it is often also preferable to maintain each attribute of your object as a separate contiguous array, since placing all attributes contiguously is cache-inefficient for operations which act on only a subset of attributes of the object.
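As a minimal sketch of that style, with each attribute kept as its own contiguous array:

import numpy as np

# one contiguous array per attribute, rather than one object per element
a = np.zeros((3, 3), dtype=np.float64)
b = np.zeros((3, 3), dtype=np.int32)

def scale_a(a, factor):
    # updates the 'a' attribute of every element in a single compiled pass,
    # without ever pulling the 'b' values through the cache
    a *= factor

scale_a(a, 2.0)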
I'm new to C extensions for NumPy and I'm wondering if the following workflow is possible.
Pre-allocate an array in NumPy
Pass this array to a C extension
Modify array data in-place in C
Use the updated array in Python with standard NumPy functions
In particular, I'd like to do this while ensuring I'm making zero new copies of the data at any step.
I'm familiar with boilerplate on the C side such as PyModuleDef, PyMethodDef, and the PyObject* arguments, but a lot of examples I've seen involve coercion to C arrays, which to my understanding involves copying and/or casting. I'm also aware of Cython, though I don't know if it does similar coercions or copies under the hood. I'm specifically interested in simple indexed get and set operations on an ndarray with numeric (e.g. int32) values.
Could someone provide a minimal working example of creating a NumPy array, modifying it in-place in a C extension, and using the results in Python subsequently?
Cython doesn't create new copies of numpy arrays unless you specifically request it to do so using numpy functions, so it is as efficient as it can be when dealing with numpy arrays; see Working with NumPy.
Choosing between writing a raw C module and using Cython depends on the purpose of the module being written.
If you are writing a module that will only be used by Python to do a very small, specific task with numpy arrays as fast as possible, then by all means use Cython: it will automate registering the module correctly, handle the memory and prevent the common mistakes people make when writing C code (like memory-management problems), automate the compiler includes, and give you overall easier access to complicated functionality (like using numpy iterators).
However, if your module is going to be used in other languages, has to run independently of Python, has to be used with Python without any overhead, or implements complex C data structures and requires a lot of C functionality, then by all means create your own C extension (or even a DLL). You can pass pointers to numpy arrays from Python (using numpy.ctypeslib.as_ctypes_type), pass the Python object itself and return it (but you must build a .pyd/.so instead of a plain DLL), or even create the numpy array on the C side and have it managed by Python (but you will have to understand the numpy C API).
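As a hedged sketch of the ctypes route (the library name libinplace.so and the function scale_int32 are placeholders for your own C code, assumed to be compiled with a signature like void scale_int32(int32_t *data, size_t n, int32_t factor)):

import ctypes
import numpy as np

# hypothetical shared library built from your own C code
lib = ctypes.CDLL("./libinplace.so")
lib.scale_int32.restype = None
lib.scale_int32.argtypes = [
    np.ctypeslib.ndpointer(dtype=np.int32, ndim=1, flags="C_CONTIGUOUS"),
    ctypes.c_size_t,
    ctypes.c_int32,
]

arr = np.arange(10, dtype=np.int32)  # 1. pre-allocate in NumPy
lib.scale_int32(arr, arr.size, 3)    # 2./3. C writes through the same buffer
print(arr.sum())                     # 4. use the updated data with numpy

No copy is made at any step: ndpointer just validates the dtype and layout and hands the C function the address of the existing buffer.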
I am just now making some experiments with the Python C API. I have seen that numpy provides some C-related numerical fixed-size data types, e.g. np.int8 or np.float64, which should be the equivalents of C's int8_t and double. I was wondering if there is a way to obtain a similar result (without using numpy), i.e. to turn a C fixed-size datatype into something usable by Python. Since in my project I am already using Cython, I thought about using C extension types, but I have not found any implementation example of this technique applied to already-existing C data types.
I have seen that numpy provides some C-related numerical fixed-size data types, e.g. np.int8 or np.float64
Perhaps you misunderstand what you see. numpy.int8 etc. are Python extension types wrapping instances of native data types. Some of their characteristics are reflective of that, but instances are still ordinary Python objects.
I was wondering if there is a way to obtain a similar result (without using numpy), to turn a C fixed-size datatype into something usable by Python.
Python code cannot work directly with native data types. Neither Numpy nor Cython enables such a thing, though they may give you that impression. You can create Python wrappers for native data types, as Numpy does, but that's more involved than you may suppose, and why reinvent the wheel? You can also implement Python methods in C via the C API or via Cython, but using them involves converting between Python objects and native values.
Since in my project I am already using Cython, I thought about using C extension types, but I have not found any implementation example of this technique applied to already-existing C data types.
That would be a viable way to create wrappers such as I mentioned above. I don't see what you think is special about Cython extension types as applied to existing C data types, however. Presumably, the extension type would have a single member with the wanted native type:
cdef class CDouble:
    cdef double value

    def __init__(self, d):
        self.value = d

    # other methods ...
Of course, it's all the "other methods" that will be the main trick. There is no particular magic here; you'll need to write implementations for all the methods you need, including the methods for supporting Python arithmetic operators such as + (__add__() in that case). Before you start down this road, you might want to run this to get an idea of what Numpy's wrapper types provide:
import numpy
print(dir(numpy.int8(0)))
I get 136 methods, and that's pretty much all per-wrapper. You might not need as much for your purposes, but you really ought to think carefully about how much work you're considering taking on, and whether the benefit is worth the effort.
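To make the scale of that work concrete, here is a pure-Python sketch of just the __add__ piece (in a real Cython version, value would be the cdef double member shown above):

import ctypes

class CDouble:
    __slots__ = ('value',)

    def __init__(self, d):
        # coerce through the C type so the stored value
        # really has C double semantics
        self.value = ctypes.c_double(d).value

    def __add__(self, other):
        if isinstance(other, CDouble):
            return CDouble(self.value + other.value)
        return NotImplemented

    def __repr__(self):
        return f'CDouble({self.value})'

print(CDouble(1.5) + CDouble(2.25))  # CDouble(3.75)

Now multiply that by every other operator and conversion you need, and the effort involved becomes clear.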
Just a short question that I can't find the answer to before I head off for the day.
When I do something like this:
v1 = float_list_python = ... # <some list of floats>
v2 = float_array_NumPy = ... # <some numpy.ndarray of floats>
# I guess they don't have to be floats -
# but some object that also has a native
# object in C, so that numpy can just use
# that
If I want to multiply these vectors by a scalar, my understanding has always been that the Python list is a list of object references, and so looping through the list to do the multiplication must fetch the locations of all the floats, and then fetch the floats themselves in order to do it, which is one of the reasons it's slow.
If I do the same thing in NumPy then, well, I'm not sure what happens. There are a number of things I imagine could happen:
It splits the multiplication up across the cores.
It vectorises the multiplications (as well?)
The documentation I've found suggests that many of the primitives in numpy take advantage of the first option there whenever they can (I don't have a computer on hand at the moment to test it on). And my intuition tells me that the second should happen whenever it's possible.
So my question is: if I create a NumPy array of Python objects, will it still at least perform operations on the list in parallel? I know that if you create an array of objects that have native C types, then it will actually create a contiguous array in memory of the actual objects, and that if you create a numpy array of Python objects it will create an array of references, but I don't see why this would rule out parallel operations on said list, and I cannot find anywhere that explicitly states that.
EDIT: I feel there's a bit of confusion over what I'm asking. I understand what vectorisation is; I understand that it is a compiler optimisation and not something you necessarily program in (though aligning the data so that it's contiguous in memory is important). On the subject of vectorisation, all I wanted to know was whether or not numpy uses it. If I do something like np_array1 * np_array2, does the underlying library call use vectorisation (presuming that dtype is a compatible type)?
For the splitting up over the cores, all I mean there is: if I again do something like np_array1 * np_array2, but this time with dtype=object, would it divide that work up among the cores?
numpy is fast because it performs numeric operations like this in fast compiled C code. In contrast, the list operation operates at the interpreted Python level (streamlined as much as possible with Python bytecodes, etc.).
A numpy array of numeric type stores those numbers in a data buffer. At least in the simple cases this is just a block of bytes that C code can step through efficiently. The array also has shape and strides information that allows multidimensional access.
When you multiply the array by a scalar, it, in effect, calls a C function titled something like 'multiply_array_by_scalar', which does the multiplication in fast compiled code. So this kind of numpy operation is fast (compared to Python list code) regardless of the number of cores or other multi-processing/threading enhancements.
Arrays of objects do not have any special speed advantage (compared to lists), at least not at this time.
Look at my answer to a question about creating an array of arrays, https://stackoverflow.com/a/28284526/901925
I had to use iteration to initialize the values.
Have you done any timing experiments? For example, construct an array, say (1000,2). Use tolist() to create an equivalent list of lists. And make a similar array of objects, with each object being a (2,) array or list (how much work did that take?). Now do something simple like len(x) for each of those sublists.
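A minimal sketch of that experiment (the exact numbers will vary by machine) might be:

import numpy as np
from timeit import timeit

arr = np.random.rand(1000, 2)       # contiguous float64 buffer
lst = arr.tolist()                  # list of lists of Python floats
obj = np.empty(1000, dtype=object)  # array of object references
for i in range(1000):               # object arrays need explicit filling
    obj[i] = arr[i]

print(timeit(lambda: arr * 2, number=1000))  # one compiled loop
print(timeit(lambda: [[v * 2 for v in row] for row in lst], number=1000))
print(timeit(lambda: obj * 2, number=1000))  # a Python-level call per element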
@hpaulj provided a good answer to your question. In general, from reading your question it occurred to me that you do not actually understand what "vectorization" does under the hood. This writeup is a pretty decent explanation of vectorization and how it enables faster computations: http://quantess.net/2013/09/30/vectorization-magic-for-your-computations/
With regards to point 1 (distributing computations across multiple cores), this is not always the case with Numpy. However, there are libraries like numexpr that enable multithreaded, highly efficient Numpy array computations with support for several basic logical and arithmetic operators. Numexpr can be used to turbo-charge critical computations when used in conjunction with Numpy, as it avoids replicating large arrays in memory for vectorization routines (as is the case for Numpy) and can use all the cores on your system to perform computations.
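Typical numexpr usage looks like this (a small sketch; the speedup depends on array size and core count):

import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# one multithreaded pass over the data, with no large temporaries
# materialized for the intermediate results 2*a and 3*b
c = ne.evaluate("2*a + 3*b")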
I always considered Python as a highly object-oriented programming language. Recently, I've been using numpy a lot and I'm beginning to wonder why a number of things are implemented there as functions only, and not as methods of the numpy.array (or ndarray) object.
If we have a given array a for example, you can do
>>> a = np.array([1, 2, 3])
>>> np.sum(a)
6
>>> a.sum()
6
which seems just fine, but there are a lot of calls that do not work in the same way, as in:
>>> np.amax(a)
3
>>> a.amax()
AttributeError: 'numpy.ndarray' object has no attribute 'amax'
I find this confusing and unintuitive, and I do not see any reason behind it. There might be a good one, though; maybe someone can just enlighten me.
When numpy was introduced as a successor to Numeric, a lot of things that were just functions and not methods were added as methods to the ndarray type. At this particular time, having the ability to subclass the array type was a new feature. It was thought that making some of these common functions methods would allow subclasses to do the right thing more easily. For example, having .sum() as a method is useful to allow the masked array type to ignore the masked values; you can write generic code that will work for both plain ndarrays and masked arrays without any branching.
Of course, you can't get rid of the functions. A nice feature of the functions is that they will accept any object that can be coerced into an ndarray, like a list of numbers, which would not have all of the ndarray methods. And the specific list of functions that were added as methods can't be all-inclusive; that's not good OO design, either. There's no need for all of the trig functions to be added as methods, for example.
The current list of methods were mostly chosen early in numpy's development for those that we thought were going to be useful as methods in terms of subclassing or notationally convenient. With more experience under our belt, we have mostly come to the opinion that we added too many methods (seriously, there's no need for .ptp() to be a method) and that subclassing ndarray is typically a bad idea for reasons that I won't go into here.
So take the list of methods as mostly a subset of the list of functions that are available, with np.amin() and np.amax() as slight renames of the .min() and .max() methods to avoid aliasing the builtins.
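A quick illustration of both points (coercion of non-array inputs, and the rename):

import numpy as np

a = np.array([1, 2, 3])

print(np.sum([1, 2, 3]))  # the function also accepts a plain list
print(np.amax(a))         # function spelling, renamed so it doesn't shadow max()
print(a.max())            # the corresponding method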
All the books I've read on data structures so far seem to use C/C++, and make heavy use of the "manual" pointer control that they offer. Since Python hides that sort of memory management and garbage collection from the user, is it even possible to implement efficient data structures in this language, and is there any reason to do so instead of using the built-ins?
Python gives you some powerful, highly optimized data structures, both as built-ins and as part of a few modules in the standard library (lists and dicts, of course, but also tuples, sets, arrays in module array, and some other containers in module collections).
Combinations of these data structures (and maybe some of the functions from helper modules such as heapq and bisect) are generally sufficient to implement most richer structures that may be needed in real-life programming; however, that's not invariably the case.
When you need something more than the rich library provides, consider the fact that an object's attributes (and items in collections) are essentially "pointers" to other objects (without pointer arithmetic), i.e., "reseatable references", in Python just like in Java. In Python, you normally use a None value in an attribute or item to represent what NULL would mean in C++ or null would mean in Java.
So, for example, you could implement binary trees via, e.g.:
class Node(object):
    __slots__ = 'payload', 'left', 'right'

    def __init__(self, payload=None, left=None, right=None):
        self.payload = payload
        self.left = left
        self.right = right
plus methods or functions for traversal and similar operations (the __slots__ class attribute is optional -- mostly a memory optimization, to avoid each Node instance carrying its own __dict__, which would be substantially larger than the three needed attributes/references).
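For instance, a minimal in-order traversal over such nodes might look like:

def inorder(node):
    # yield payloads left-to-right; None plays the role of NULL
    if node is not None:
        yield from inorder(node.left)
        yield node.payload
        yield from inorder(node.right)

root = Node(2, Node(1), Node(3))
print(list(inorder(root)))  # [1, 2, 3]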
Other examples of data structures that may best be represented by dedicated Python classes, rather than by direct composition of other existing Python structures, include tries (see e.g. here) and graphs (see e.g. here).
For some simple data structures (e.g. a stack), you can just use the builtin list to get your job done. With more complex structures (e.g. a Bloom filter), you'll have to implement them yourself using the primitives the language supports.
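A stack, for example, is just a list used from one end:

stack = []
stack.append('a')  # push
stack.append('b')
top = stack.pop()  # pop -> 'b'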
You should use the builtins if they serve your purpose, since they've been debugged and optimised by a horde of people over a long time. Doing it from scratch by yourself will probably produce an inferior data structure.
If however, you need something that's not available as a primitive or if the primitive doesn't perform well enough, you'll have to implement your own type.
The details like pointer management etc. are just implementation talk and don't really limit the capabilities of the language itself.
C/C++ data structure books are only attempting to teach you the underlying principles behind the various structures - they are generally not advising you to actually go out and re-invent the wheel by building your own library of stacks and lists.
Whether you're using Python, C++, C#, Java, whatever, you should always look to the built in data structures first. They will generally be implemented using the same system primitives you would have to use doing it yourself, but with the advantage of having been tried and tested.
Only when the provided data structures do not allow you to accomplish what you need, and there isn't an alternative and reliable library available to you, should you be looking at building something from scratch (or extending what's provided).
How Python handles objects at a low level isn't too strange anyway. This document should disambiguate it a tad; it's basically all the pointer logic you're already familiar with.
With Python you have access to a vast assortment of library modules written and debugged by other people. Odds are very good that somewhere out there, there is a module that does at least part of what you want, and odds are even good that it might be implemented in C for performance.
For example, if you need to do matrix math, you can use NumPy, which was written in C and Fortran.
Python is slow enough that you won't be happy if you try to write some sort of really compute-intensive code (for example, a Fast Fourier Transform) in native Python. On the other hand, you can get a C-coded Fourier Transform as part of SciPy, and just use it.
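For instance (a small sketch using the scipy.fft module):

import numpy as np
from scipy.fft import fft  # compiled FFT, no native-Python loop needed

signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 64))
spectrum = fft(signal)     # returns the complex frequency components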
I have never had a situation where I wanted to solve a problem in Python and said "darn, I just can't express the data structure I need."
If you are a pioneer, and you are doing something in Python for which there just isn't any library module out there, then you can try writing it in pure Python. If it is fast enough, you are done. If it is too slow, you can profile it, figure out where the slow parts are, and rewrite them in C using the Python C API. I have never needed to do this yet.
It's not possible to implement something like a C++ vector in Python, since you don't have array primitives the way C/C++ do. However, anything more complicated can be implemented (efficiently) on top of it, including, but not limited to: linked lists, hash tables, multisets, bloom filters, etc.