I am reading (read-only) from a 70GB memmap array, but only using ~300MB of it. Learning from this answer that memmap doesn't actually use physical memory, I figured I should copy the required part of the array into physical memory for better performance.
However, when I np.copy() a memmap and run np.info() on the copied array, the class is still memmap. That said, I do see more memory usage and better performance when using the copied array.
Does a copied memmap use physical memory? Or is something else going on behind the scenes? Is it that it only looks like I'm using physical memory for the copied array, and my computer is deceiving me like always?
numpy.memmap is a subclass of numpy.ndarray. memmap does not override the ndarray.copy() method, so the semantics of ndarray.copy() are not touched. A copy into newly-allocated memory is indeed made. For a number of reasons, ndarray.copy() tries to keep the type of the returned object the same when a subclass is used. It makes less sense for numpy.memmap but much more sense for other subclasses like numpy.matrix.
In the case of numpy.memmap, the mmap-specific attributes in the copy are set to None, so the copied array will behave just like a numpy.ndarray except that its type will still be numpy.memmap. Check the ._mmap attribute in both the source and the copy to verify.
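A quick way to see this behaviour (a minimal sketch; the temporary file and shape are made up for illustration):

```python
import numpy as np
import os, tempfile

# Illustrative only: create a small memmap backed by a temporary file.
path = os.path.join(tempfile.mkdtemp(), "data.dat")
m = np.memmap(path, dtype=np.float64, mode="w+", shape=(100,))
m[:] = np.arange(100)

c = m.copy()                 # allocates fresh in-memory storage
print(type(c))               # <class 'numpy.memmap'> -- subclass preserved
print(m._mmap is None)       # False: still backed by the file mapping
print(c._mmap is None)       # True: the copy is ordinary heap memory
```

Note that `._mmap` is a private attribute, so it is fine for a sanity check like this but not something to rely on in production code.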
Related
I pass large scipy.sparse arrays to parallel processes in shared memory on a single computing node. In each round of parallel jobs, the passed array is not modified. I want to pass the array with zero copy.
While this is possible with multiprocessing.RawArray() and numpy.sharedmem (see here), I am wondering how ray's put() works.
As far as I understood (see memory management, [1], [2]), ray's put() copies the object once and for all (serialize, then de-serialize) to the object store that is available for all processes.
Question:
I am not sure I understood it correctly: is it a deep copy of the entire array into the object store, or just a reference to it? Is there a way to not copy the object at all, and instead just pass the address/reference of the existing scipy array? Basically, a true shallow copy without the overhead of copying the entire array.
Ubuntu 16.04, Python 3.7.6, Ray 0.8.5.
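For comparison, a true zero-copy hand-off between processes on one node can be sketched with the standard library alone (this is an alternative to ray, not a description of how ray's put() works internally; note it requires Python 3.8+, newer than the 3.7.6 above, and for a scipy.sparse matrix you would share its underlying data, indices, and indptr arrays this way):

```python
import numpy as np
from multiprocessing import shared_memory   # stdlib, Python 3.8+

# Put the dense buffer in a named shared-memory block; every process
# on the node maps the same physical pages.
a = np.arange(1_000_000, dtype=np.float64)
shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
shared = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
shared[:] = a                       # the one and only copy

# A worker would attach by name -- no further copying happens:
attached = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(a.shape, dtype=a.dtype, buffer=attached.buf)
print(view[123])                    # 123.0, read straight from shared pages

# Release the numpy views before closing, then unlink to free the block.
del view, shared
attached.close()
shm.close()
shm.unlink()
```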
I would like to know how Python actually manages memory allocation for ndarrays.
I loaded a file containing 32K floating-point values using numpy's loadtxt, so the ndarray data should be 256KB.
Indeed, ndarray.nbytes reports the right size.
However, the memory occupation after loading the data increased by 2MB: I don't understand the reason for this difference.
I'm not sure exactly how you measure memory occupation, but when looking at the memory footprint of your entire app there's a lot more going on that can cause these kinds of increases.
In this case, I suspect that the loadtxt function uses some buffering or otherwise copies the data, and that those copies haven't yet been reclaimed by the garbage collector.
But other things could be happening as well. Maybe the numpy back-end loads some extra stuff the first time it initialises an ndarray. Either way, you can only truly figure this out by reading the numpy source code, which is freely available on GitHub. The implementation of loadtxt can be found here: https://github.com/numpy/numpy/blob/5b22ee427e17706e3b765cf6c65e924d89f3bfce/numpy/lib/npyio.py#L797
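A small sketch of the gap between the array's own buffer and the allocations made around it (using np.random.rand as a stand-in for the loadtxt call; recent NumPy versions report their data allocations to tracemalloc, while an OS-level process monitor would show even more, e.g. parser buffers and allocator overhead):

```python
import numpy as np
import tracemalloc

# Trace Python-level allocations around loading 32K doubles.
tracemalloc.start()
arr = np.random.rand(32 * 1024)     # stand-in for np.loadtxt(...)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(arr.nbytes)                   # 262144 bytes = 256KB of array data
print(peak >= arr.nbytes)           # the buffer itself plus bookkeeping
```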
When using large arrays, does Python allocate and manage memory by default, unlike C for example?
More specifically, when using the statement array=[1,2,3], should I worry about freeing this and every other array I create?
Looking for answers on the web just confused me more.
array=[1,2,3] is a list, not an array. It is dynamically allocated (resizes automatically), and you do not have to free up memory.
The same applies to arrays from the array module in the standard library, and arrays from the numpy library.
As a rule, Python handles memory allocation and memory freeing for all of its objects, with the possible exception of some objects created using Cython or by directly calling C modules.
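The mechanism behind this in CPython is reference counting, which can be observed directly:

```python
import sys

a = [1, 2, 3]              # Python allocates the list for you
b = a                      # a second reference to the same object
print(sys.getrefcount(a))  # 3 in CPython: a, b, and the call's own argument

del b                      # you only ever drop references;
del a                      # once none remain, Python frees the memory itself
```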
I'm currently embedding Python in my C++ program using Boost.Python in order to use matplotlib. Now I'm stuck at a point where I have to construct a large data structure, say a dense 10000x10000 matrix of doubles. I want to plot columns of that matrix, and I figured that I have multiple options to do so:
Iterating and copying every value into a numpy array --> I don't want to do that, for the obvious reason of doubled memory consumption
Iterating and exporting every value into a file, then importing it in Python --> I could do that entirely without Boost.Python, but I don't think this is a nice way
Allocating and storing the matrix in Python and just updating the values from C++ --> But as stated here, it's not a good idea to switch back and forth between the Python interpreter and my C++ program
Somehow exposing the matrix to Python without having to copy it --> All I can find on that matter is about extending Python with C++ classes, not embedding
Which of these is the best option concerning performance and, of course, memory consumption, or is there an even better way of doing this kind of task?
To prevent copying in Boost.Python, one can either:
Use policies to return internal references
Allocate on the free store and use policies to have Python manage the object
Allocate the Python object then extract a reference to the array within C++
Use a smart pointer to share ownership between C++ and Python
If the matrix has a C-style contiguous memory layout, then consider using the Numpy C-API. The PyArray_SimpleNewFromData() function can be used to create an ndarray object that wraps memory allocated elsewhere. This would allow one to expose the data to Python without copying or transferring each element between the languages. The how-to-extend documentation is a great resource for dealing with the Numpy C-API:
Sometimes, you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. [...] A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed.
[...]
If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.
Also, while the plotting function may create copies of the array, it can do so within the C-API, allowing it to take advantage of the memory layout.
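The same no-copy idea can be sketched from the Python side, with a ctypes buffer standing in for the C++-owned matrix (illustrative only; in the embedded program the matrix's data pointer would go to PyArray_SimpleNewFromData instead):

```python
import ctypes
import numpy as np

# Illustrative stand-in: a ctypes buffer plays the role of the
# C++-allocated matrix (much smaller than 10000x10000 here).
rows, cols = 4, 3
buf = (ctypes.c_double * (rows * cols))()

# Wrap the existing memory in an ndarray without copying -- the
# Python-level analogue of PyArray_SimpleNewFromData().
mat = np.ctypeslib.as_array(buf).reshape(rows, cols)

mat[0, 0] = 42.0
print(buf[0])                     # 42.0 -- same memory, no copy made
print(mat.flags["OWNDATA"])       # False: the ndarray does not own the data
```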
If performance is a concern, it may be worth considering the plotting itself:
taking a sample of the data and plotting it may be sufficient depending on the data distribution
using a raster-based backend, such as Agg, will often outperform vector-based backends on large datasets
benchmarking other tools that are designed for large data, such as Vispy
Although Tanner's answer brought me a big step forward, I ended up using Boost.NumPy, an unofficial extension to Boost.Python that can easily be added. It wraps the NumPy C API and makes it safer and easier to use.
I've got a large chunk of generated data (A[i,j,k]) on the device, but I only need one 'slice' of A[i,:,:], and in regular CUDA this could be easily accomplished with some pointer arithmetic.
Can the same thing be done within PyCUDA? i.e.:
cuda.memcpy_dtoh(h_iA,d_A+(i*stride))
Obviously this is completely wrong, since there's no size information (unless it's inferred from the destination shape), but hopefully you get the idea.
The PyCUDA GPUArray class supports slicing of 1D arrays, but not higher dimensions that require a stride (although support is coming). You can, however, get access to the underlying pointer of a multidimensional GPUArray from its gpudata member, which is a pycuda.driver.DeviceAllocation, and the element size from gpuArray.dtype.itemsize. You can then do the same sort of pointer arithmetic you had in mind to get something the driver memcpy functions will accept.
It isn't very Pythonic, but it does work (or at least it did when I was doing a lot of PyCUDA + MPI hacking last year).
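The offset arithmetic itself can be sanity-checked on the host with plain numpy (shapes made up for illustration):

```python
import numpy as np

# For a C-contiguous (I, J, K) array, the slice A[i, :, :] starts
# i * J * K * itemsize bytes past the base pointer and occupies
# J * K contiguous elements -- exactly what a single memcpy needs.
I, J, K = 4, 5, 6
A = np.arange(I * J * K, dtype=np.float32).reshape(I, J, K)

i = 2
offset_bytes = i * A.strides[0]          # byte offset of the slice
slice_bytes = J * K * A.itemsize         # number of bytes to copy

# Verify against numpy's own view of the same memory:
start = offset_bytes // A.itemsize
flat = A.reshape(-1)
print(np.array_equal(flat[start:start + J * K].reshape(J, K), A[i]))  # True
```

On the device side, the same offset would be added to int(d_A.gpudata) before the memcpy.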
It is unlikely that this is implemented in PyCUDA.
I can think of the following solutions:
Copy the entire array A to host memory and make a numpy array from the slice of interest.
Write a kernel that reads the matrix and produces the desired slice.
Rearrange the produced data so that you can read one slice at a time using pointer arithmetic.