Share scipy.sparse arrays with zero-copy in Python's Ray

I pass large scipy.sparse arrays to parallel processes via the shared memory of a single compute node. In each round of parallel jobs, the passed array is not modified. I want to pass the array with zero copy.
While this is possible with multiprocessing.RawArray() and numpy.sharedmem (see here), I am wondering how Ray's put() works.
As far as I understand (see memory management, [1], [2]), Ray's put() copies the object once and for all (serializes, then deserializes it) into the object store, which is available to all processes.
Question:
I am not sure I understood it correctly: is it a deep copy of the entire array in the object store, or just a reference to it? Is there a way to not copy the object at all and instead just pass the address/reference of the existing scipy array? Basically, a true shallow copy without the overhead of copying the entire array.
Ubuntu 16.04, Python 3.7.6, Ray 0.8.5.
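For concreteness, here is a minimal sketch of what I mean (the matrix and the task are made up for illustration, and it assumes the matrix fits in the object store): put() the sparse matrix once and have several tasks read it. As I understand it, Ray's zero-copy reads apply to dense numpy buffers; a scipy.sparse matrix is serialized once by put() and reconstructed in each worker from the stored buffers.
import ray
import scipy.sparse as sp

ray.init()

X = sp.random(10000, 10000, density=0.001, format="csr")
X_ref = ray.put(X)      # serialize once into the shared object store

@ray.remote
def column_sum(mat, col):
    # When the object ref is passed as a task argument, Ray resolves it to
    # the stored object inside the worker before the function body runs.
    return mat[:, col].sum()

results = ray.get([column_sum.remote(X_ref, c) for c in range(4)])
print(results)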

Related

What is the difference between detach, clone and deepcopy in Pytorch tensors in detail?

I've been struggling to understand the differences between .clone(), .detach() and copy.deepcopy when using Pytorch, in particular with Pytorch tensors.
I tried writing down all my questions about their differences and use cases, quickly became overwhelmed, and realized that perhaps laying out the 4 main properties of Pytorch tensors would clarify which one to use much better than going through every small question. The 4 main properties I realized one needs to keep track of are:
whether one has a new pointer/reference to a tensor
whether one has a new tensor object instance (and thus this new instance most likely has its own metadata like requires_grad, shape, is_leaf, etc.)
whether new memory has been allocated for the tensor data (i.e. whether this new tensor is a view of a different tensor or has its own storage)
whether it is tracking the history of operations or not (or even whether it is tracking a completely new history of operations, or the same old one in the case of deep copy)
Based on what I mined from the Pytorch forums and the documentation, these are my current distinctions for each when used on tensors:
Clone
For clone:
x_cloned = x.clone()
I believe this is how it behaves according to the main 4 properties:
the cloned x_cloned has its own Python reference/pointer to the new object
it has created its own new tensor object instance (with its separate meta-data)
it has allocated new memory for x_cloned with the same data as x
it keeps track of the original history of operations and, in addition, includes this clone operation as .grad_fn=<CloneBackward>
It seems that the main use of this, as I understand it, is to create copies of things so that in-place operations are safe. In addition, coupled with .detach as .detach().clone() (the "better" order to do it, by the way), it creates a completely new tensor that has been detached from the old history and thus stops gradient flow through that path.
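A small check of these four properties (a sketch; the exact grad_fn name varies across PyTorch versions):
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x_cloned = x.clone()

print(x_cloned is x)                        # False: a new Python object/reference
print(x_cloned.data_ptr() == x.data_ptr())  # False: new memory was allocated
print(x_cloned.grad_fn)                     # <CloneBackward...>: still part of x's graph

x_cloned.sum().backward()
print(x.grad)                               # tensor([1., 1., 1.]): gradients flow back to x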
Detach
x_detached = x.detach()
it creates a new Python reference (the only thing that does not is doing x_new = x, of course). One can use id to check this, I believe
it has created its own new tensor object instance (with its separate meta-data)
it has NOT allocated new memory for x_detached; it shares the same data as x
it cuts the history of the gradients and does not allow gradients to flow through it. I think it's right to think of it as having no history, like a brand new tensor.
I believe the only sensible use I know of is creating new copies with their own memory when coupled with .clone() as .detach().clone(). Otherwise, I am not sure what the use is. Since it points to the original data, doing in-place ops might be potentially dangerous (since they change the old data, but that change is NOT known by autograd in the earlier computation graph).
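A small check of these properties (again a sketch; data_ptr() compares the underlying storage):
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x_detached = x.detach()

print(x_detached is x)                          # False: a new tensor object
print(x_detached.data_ptr() == x.data_ptr())    # True: the very same storage as x
print(x_detached.requires_grad)                 # False: no gradient history
print(x.detach().clone().data_ptr() == x.data_ptr())  # False: clone() then gives fresh memory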
copy.deepcopy
x_deepcopy = copy.deepcopy(x)
it creates a new pointer/reference to a new tensor
it creates a new tensor instance with its own meta-data (all of the meta-data should point to deep copies, so new objects, if it is implemented as one would expect, I hope)
it has its own memory allocated for the tensor data
If it truly is a deep copy, I would expect a deep copy of the history, so it should do a deep replication of the history. That seems really expensive, but at least it is semantically consistent with what a deep copy should be.
I don't really see a use case for this. I assume anyone trying to use this really meant 1) .detach().clone() or just 2) .clone() by itself, depending on whether one wants to stop gradient flow to the earlier graph with 1), or just wants to replicate the data with new memory with 2).
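A small check for deepcopy on a leaf tensor (a sketch; note that PyTorch currently only supports deepcopy for graph leaves, so non-leaf tensors with autograd history raise an error):
import copy
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x_deepcopy = copy.deepcopy(x)

print(x_deepcopy is x)                          # False: a new object
print(x_deepcopy.data_ptr() == x.data_ptr())    # False: its own memory for the data
print(x_deepcopy.requires_grad)                 # True: requires_grad is carried over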
So this is the best way I have to understand the differences as of now rather than ask all the different scenarios that one might use them.
So is this right? Does anyone see any major flaw that needs to be corrected?
My own worry is about the semantics I gave to deep copy, and I wonder if it is correct with respect to deep copying the history.
I think a list of common use cases for each would be wonderful.
Resources
these are all the resources I've read and participated in to arrive at the conclusions in this question:
Migration guide to 0.4.0 https://pytorch.org/blog/pytorch-0_4_0-migration-guide/
Confusion about using clone: https://discuss.pytorch.org/t/confusion-about-using-clone/39673/3
Clone and detach in v0.4.0: https://discuss.pytorch.org/t/clone-and-detach-in-v0-4-0/16861/2
Docs for clone:
https://pytorch.org/docs/stable/tensors.html#torch.Tensor.clone
Docs for detach (search for the word detach in your browser there is no direct link):
https://pytorch.org/docs/stable/tensors.html#torch.Tensor
Difference between detach().clone() and clone().detach(): https://discuss.pytorch.org/t/difference-between-detach-clone-and-clone-detach/34173
Why am I able to change the value of a tensor without the computation graph knowing about it in Pytorch with detach?
What is the difference between detach, clone and deepcopy in Pytorch tensors in detail?
Copy.deepcopy() vs clone() https://discuss.pytorch.org/t/copy-deepcopy-vs-clone/55022/10
Note: Since this question was posted, the behaviour and doc pages for these functions have been updated.
torch.clone()
Copies the tensor while maintaining a link in the autograd graph. To be used if you want to e.g. duplicate a tensor as an operation in a neural network (for example, passing a mid-level representation to two different heads for calculating different losses):
Returns a copy of input.
NOTE: This function is differentiable, so gradients will flow back from the result of this operation to input. To create a tensor without an autograd relationship to input see detach().
torch.Tensor.detach()
Returns a view of the original tensor without the autograd history. To be used if you want to manipulate the values of a tensor (not in place) without affecting the computational graph (e.g. reporting values midway through the forward pass).
Returns a new Tensor, detached from the current graph.
The result will never require gradient.
This method also affects forward mode AD gradients and the result will never have forward mode AD gradients.
NOTE: Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks.
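As a small sketch of that correctness check: exp() saves its output for the backward pass, so an in-place change made through the detached view (which shares storage) is caught when backward() runs:
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.exp()           # exp() saves its output for use during backward()
y.detach().zero_()    # in-place change through the detached view
y.sum().backward()    # RuntimeError: a variable needed for gradient computation
                      # has been modified by an inplace operation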
copy.deepcopy
deepcopy is a generic Python function from the copy module which makes a copy of an existing object (recursively, if the object itself contains objects).
It is used (as opposed to the more usual assignment) when the underlying object you wish to copy is mutable (or contains mutables) and would otherwise mirror changes made to the original:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
In a PyTorch setting, as you say, if you want a fresh copy of a tensor object to use in a completely different setting with no relationship or effect on its parent, you should use .detach().clone().
IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.

Python: Does array creation automatically allocate memory?

When using large arrays, does Python allocate memory automatically by default, unlike C for example?
More specifically, when using the command array=[1,2,3], should I worry about freeing this and every other array I create?
Looking for answers on the web just confused me more.
array=[1,2,3] is a list, not an array. It is dynamically allocated (resizes automatically), and you do not have to free up memory.
The same applies to arrays from the array module in the standard library, and arrays from the numpy library.
As a rule, Python handles memory allocation and memory freeing for all its objects, with perhaps the exception of some objects created using Cython, or by directly calling C modules.
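A tiny illustration (relying on CPython's reference counting): once the last reference to an object is gone, Python reclaims its memory without any explicit free:
data = [1, 2, 3]   # memory is allocated automatically
data = None        # or `del data`: no references remain, so CPython frees the list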

What happens when a numpy memmap array is copied?

I am reading (read-only) from a 70 GB memmap array, but only using ~300 MB of it. Learning from this answer, memmap doesn't actually use physical memory, so I figured I should copy the required part into physical memory for better performance.
However, when I np.copy() a memmap and call np.info() on the copied array, the class is still memmap. Regardless of this, I see more memory usage and improved performance when using the copied array.
Does a copied memmap use physical memory? Or is something else going on behind the scene? Is it that it just looks like I'm using physical memory for the copied array, and my computer is deceiving me like always?
numpy.memmap is a subclass of numpy.ndarray. memmap does not override the ndarray.copy() method, so the semantics of ndarray.copy() are not touched. A copy into newly-allocated memory is indeed made. For a number of reasons, ndarray.copy() tries to keep the type of the returned object the same when a subclass is used. It makes less sense for numpy.memmap but much more sense for other subclasses like numpy.matrix.
In the case of numpy.memmap, the mmap-specific attributes in the copy are set to None, so the copied array will behave just like a numpy.ndarray except that its type will still be numpy.memmap. Check the ._mmap attribute in both the source and the copy to verify.
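A quick way to see this (a sketch using a small temporary file):
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(1000, dtype=np.float64).tofile(path)

mm = np.memmap(path, dtype=np.float64, mode="r")
copied = mm.copy()        # same as np.copy(mm)

print(type(copied))       # <class 'numpy.memmap'>: the subclass type is kept
print(copied._mmap)       # None: the copy is ordinary in-memory data
print(mm._mmap)           # <mmap.mmap ...>: the original stays file-backed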

Memory copy when casting a list of 2D numpy arrays into a 3D array

I'm trying to cast a list of 2D numpy arrays into a 3D array, and I would like to know whether or not the data is copied.
For example, with this program:
images = []
for i in range(10):
    images.append(numpy.random.rand(100, 100))
volume = numpy.array(images)
Is there a way to check if volume[n] is referring to the same memory block as images[n]?
I need my data to be a 3D array, and I'm trying to assess whether I should accept lists of images as an input, or if this will cause data to be copied. Data copy is not acceptable, as I'm working with very large datasets.
I could just reference a question about the storage differences between lists and arrays, but to tailor it to your case:
Your list has a data buffer with pointers to the array objects stored elsewhere in memory. images.append just updates that pointer list. A list copy will just copy the pointers.
An array stores all of its data in a contiguous memory buffer. So to create volume np.array() has to copy values from each of the component arrays to its own buffer. The same applies if some version of np.concatenate is used to compile the 3d array.
numpy functions often have a x=np.asarray(x) statement at the start. In effect it is saying, "I'm working with an array, but I'll let you give me a list".
You could skip that and accept only a 3d array. But how was that 3d array constructed? For a random 3d array you can get by with one statement:
arr = numpy.random.rand(10, 100, 100)
but if the images are loaded individually from files, something or someone will have to perform one or more copies to create that 3d array of images. Will it be you, or your users?
My general advice is: don't be too paranoid about making copies - until your code is running and you know, from profiling, that the copies are expensive, or you start hitting MemoryError problems.
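If you control how the images are produced, one way to keep only a single copy alive (a sketch; np.random.rand stands in for however each 2D image is actually read or computed) is to preallocate the 3D array and write each image directly into a slice, instead of building an intermediate list:
import numpy as np

n_images, height, width = 10, 100, 100
volume = np.empty((n_images, height, width))

for i in range(n_images):
    volume[i] = np.random.rand(height, width)   # each image written straight into the volume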
After importing numpy, Windows Task Manager said my Python process used 14 MB. After building images (though with range(5000)), it was at 396 MB (382 MB more). After building volume, it was at 778 MB (another 382 MB more). So looks like it copies. Using NumPy 1.11.1 in Python 3.5.2 on Windows 10.
numpy allows you to test whether two arrays share memory (which is a decent proxy for weren't copied in this case)
for i, img in enumerate(images):
    print(i, numpy.may_share_memory(img, volume))
# all False
Looks like they were copied this time.
From numpy.array:
copy : bool, optional
If true (default), then the object is copied. Otherwise, a copy will only be made if __array__ returns a copy, if obj is a nested sequence, or if a copy is needed to satisfy any of the other requirements (dtype, order, etc.).
I'm pretty sure after testing with copy=False that you have a nested sequence. Unfortunately determining if __array__ returns a copy for a list (or any other iterator) is beyond my google-fu, but it seems likely, as you can't natively iterate a numpy array.
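To illustrate the copy/no-copy split described above (a sketch): asarray() is a no-op for an array that already has the right shape and dtype, while a list of 2D arrays (a nested sequence) always forces a copy:
import numpy as np

volume = np.random.rand(10, 100, 100)
print(np.asarray(volume) is volume)              # True: no copy at all

images = [np.random.rand(100, 100) for _ in range(10)]
stacked = np.asarray(images)
print(np.may_share_memory(stacked, images[0]))   # False: the nested sequence was copied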

Boost python: passing large data structure to python

I'm currently embedding Python in my C++ program using boost/python in order to use matplotlib. Now I'm stuck at a point where I have to construct a large data structure, let's say a dense 10000x10000 matrix of doubles. I want to plot columns of that matrix, and I figured that I have multiple options to do so:
Iterating and copying every value into a numpy array --> I don't want to do that for an obvious reason which is doubled memory consumption
Iterating and exporting every value into a file, then importing it in Python --> I could do that completely without boost/python, and I don't think this is a nice way
Allocate and store the matrix in Python and just update the values from C++ --> But as stated here it's not a good idea to switch back and forth between the Python interpreter and my C++ program
Somehow expose the matrix to python without having to copy it --> All I can find on that matter is about extending Python with C++ classes and not embedding
Which of these is the best option concerning performance and, of course, memory consumption, or is there an even better way of doing that kind of task?
To prevent copying in Boost.Python, one can either:
Use policies to return internal references
Allocate on the free store and use policies to have Python manage the object
Allocate the Python object then extract a reference to the array within C++
Use a smart pointer to share ownership between C++ and Python
If the matrix has a C-style contiguous memory layout, then consider using the Numpy C-API. The PyArray_SimpleNewFromData() function can be used to create an ndarray object that wraps memory that has been allocated elsewhere. This would allow one to expose the data to Python without requiring copying or transferring each element between the languages. The how to extend documentation is a great resource for dealing with the Numpy C-API:
Sometimes, you want to wrap memory allocated elsewhere into an ndarray object for downstream use. This routine makes it straightforward to do that. [...] A new reference to an ndarray is returned, but the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed.
[...]
If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.
Also, while the plotting function may create copies of the array, it can do so within the C-API, allowing it to take advantage of the memory layout.
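Not the C-API itself, but the same zero-copy idea can be seen from the Python side (a sketch, with a bytearray standing in for memory owned by the C++ program): np.frombuffer() wraps an existing buffer in an ndarray without copying it, which is the role PyArray_SimpleNewFromData() plays for memory allocated in C++.
import numpy as np

raw = bytearray(8 * 10000 * 3)                    # stand-in for externally allocated memory
view = np.frombuffer(raw, dtype=np.float64).reshape(10000, 3)

view[0, 0] = 1.5                                  # writes go straight into raw
print(np.frombuffer(raw, dtype=np.float64)[0])    # 1.5: no copy was made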
If performance is a concern, it may be worth considering the plotting itself:
taking a sample of the data and plotting it may be sufficient depending on the data distribution
using a raster based backend, such as Agg, will often outperform vector based backends on large datasets
benchmarking other tools that are designed for large data, such as Vispy
Although Tanner's answer brought me a big step forward, I ended up using Boost.NumPy, an unofficial extension to Boost.Python that can easily be added. It wraps around the NumPy C-API and makes it safer and easier to use.
