Creating a cupy array from a Python list is slow

I am using cupy to do the following operations and this is pretty fast:
import cupy as cp
shape = (256, 170, 256)
deformation = cp.meshgrid(cp.arange(shape[0]),
cp.arange(shape[1]),
cp.arange(shape[2]),
indexing='ij')
However, if I convert it to an array as:
deformation = cp.array(cp.meshgrid(cp.arange(shape[0]),
cp.arange(shape[1]),
cp.arange(shape[2]),
indexing='ij'))
This seems to be very slow or just hangs (I gave up after 5 minutes). I am not sure what I am doing wrong here.
I also tried passing copy=False to the cp.array call but this did not change anything.

I don't believe this conversion of a list of cupy arrays to a cupy array is supported. If I make your shape much smaller, e.g. (8,8,8), I get a Python error.
If we study the documentation for cupy.meshgrid, we see that it returns:
Returns: list of cupy.ndarray
The cupy documentation specifically says:
Currently, cupy.array() or cupy.asarray() cannot create an array from Python object containing CuPy array (e.g., a list of CuPy arrays). Use cupy.stack() instead.
Using the suggestion there, this seems to work relatively quickly for me:
$ cat t6.py
import cupy as cp
shape = (256, 170, 256)
deformation = cp.stack(cp.meshgrid(cp.arange(shape[0]),
cp.arange(shape[1]),
cp.arange(shape[2]),
indexing='ij'))
$ time python t6.py
real 0m1.281s
user 0m0.608s
sys 0m0.492s
$
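For a quick look at what the stacked result contains, here is a minimal sketch that uses NumPy as a CPU stand-in on a much smaller shape. It assumes cupy.stack follows the same shape semantics as numpy.stack; with a GPU available, swapping the import for cupy should behave the same way, though that is not verified here.

```python
import numpy as np  # CPU stand-in; assumption: cupy behaves the same for these calls

shape = (4, 3, 5)  # small illustrative shape, not the one from the question

# meshgrid returns a sequence of three arrays, each of shape (4, 3, 5)
grids = np.meshgrid(np.arange(shape[0]),
                    np.arange(shape[1]),
                    np.arange(shape[2]),
                    indexing='ij')

# stack joins them along a new leading axis
deformation = np.stack(grids)
print(deformation.shape)
```

With indexing='ij', deformation[0] holds the index along the first axis, deformation[1] the index along the second, and deformation[2] the index along the third.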

Related

ddeint - problem with reproducing examples provided in pypi

I wanted to use the ddeint in my project. I copied the two examples provided on
https://pypi.org/project/ddeint/
and only the second one works. When I'm running the first one:
from pylab import cos, linspace, subplots
from ddeint import ddeint
def model(Y, t):
return -Y(t - 3 * cos(Y(t)) ** 2)
def values_before_zero(t):
return 1
tt = linspace(0, 30, 2000)
yy = ddeint(model, values_before_zero, tt)
fig, ax = subplots(1, figsize=(4, 4))
ax.plot(tt, yy)
ax.figure.savefig("variable_delay.jpeg")
The following error occurs:
Traceback (most recent call last):
File "C:\Users\piobo\PycharmProjects\pythonProject\main.py", line 14, in <module>
yy = ddeint(model, values_before_zero, tt)
File "C:\Users\piobo\PycharmProjects\pythonProject\venv\lib\site-packages\ddeint\ddeint.py", line 145, in ddeint
return np.array([g(tt[0])] + results)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2000,) + inhomogeneous part.
I'm using Python 3.9. Could anyone advise me on what I'm doing wrong? I didn't modify the code in any way.

Reproduction
Could not reproduce - the code runs when using the following versions:
Python 3.6.9 (python3 --version)
ddeint 0.2 (pip3 show ddeint)
Numpy 1.18.3 (pip3 show numpy)
Upgrading numpy to 1.19
Then I got the following warning:
/.local/lib/python3.6/site-packages/ddeint/ddeint.py:145: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return np.array([g(tt[0])] + results)
But the output JPEG was created successfully.
Using Python 3.8 with latest numpy
Using Python 3.8 with a fresh install of ddeint using numpy 1.24.0:
Python 3.8
ddeint 0.2
Numpy 1.24.0
Now I could reproduce the error.
Hypotheses
Since this example does not run successfully out-of-the-box in the question's environment, I assume it is an issue with numpy versions.
Issue with versions
See Numpy 1.19 deprecation warning · Issue #9 · Zulko/ddeint · GitHub which seems related to this code line that we see in the error stacktrace:
return np.array([g(tt[0])] + results)
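To illustrate what "ragged" means for that line, here is a small sketch with hypothetical stand-in values: a scalar initial value followed by length-1 arrays. Whether ddeint's intermediate results really have this shape is an assumption, not something confirmed above.

```python
import numpy as np

# Hypothetical stand-ins: a scalar start value and length-1 result arrays
initial = 1
results = [np.array([0.9]), np.array([0.8])]
ragged = [initial] + results

# The elements do not share one shape, which is what newer numpy rejects
# when such a list is passed to np.array without dtype=object
shapes = [np.shape(item) for item in ragged]
print(shapes)

# Giving every element the same shape makes the conversion well defined
uniform = np.array([np.atleast_1d(item) for item in ragged], dtype=float)
print(uniform.shape)
```

Under that assumption, returning a length-1 array from the function that supplies the initial value would be one way to keep the shapes consistent.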
Directly using numpy
I suppose the tt value is the issue here. It is returned by pylab's linspace() function call (below written with module prefix):
tt = pylab.linspace(0, 30, 2000)
In Matplotlib's pylab docs there is a warning:
Since heavily importing into the global namespace may result in unexpected behavior, the use of pylab is strongly discouraged. Use matplotlib.pyplot instead.
Furthermore, the pylab module is described as a mixed bag:
pylab is a module that includes matplotlib.pyplot, numpy, numpy.fft, numpy.linalg, numpy.random, and some additional functions, all within a single namespace. Its original purpose was to mimic a MATLAB-like way of working by importing all functions into the global namespace. This is considered bad style nowadays.
Maybe you can use numpy.linspace() function directly.
Attention: There was a change for the dtype default:
The type of the output array. If dtype is not given, the data type is inferred from start and stop. The inferred dtype will never be an integer; float is chosen even if the arguments would produce an array of integers.
Here the arguments start and stop are given as the integers 0 and 30, but according to the quoted documentation the inferred dtype is still float; an integer time grid has to be requested explicitly.
Note the breaking change:
Since 1.20.0
Values are rounded towards -inf instead of 0 when an integer dtype is specified. The old behavior can still be obtained with np.linspace(start, stop, num).astype(int)
So, we could replace the tt-assignment line with:
import numpy as np
tt = np.linspace(0, 30, 2000).astype(int)
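A quick way to check the dtype behaviour described above is sketched below; the small num value is chosen only to keep the output readable.

```python
import numpy as np

# dtype is inferred as float even though both endpoints are integers
tt_float = np.linspace(0, 30, 5)

# explicit conversion to integers truncates the fractional part
tt_int = np.linspace(0, 30, 5).astype(int)

print(tt_float.dtype, tt_float)
print(tt_int.dtype, tt_int)
```

Note that after the integer conversion, neighbouring time points can collapse onto the same value when the grid is finer than one unit, as it is with 2000 points between 0 and 30.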

Create mxnet.ndarray.NDArray from pycuda.driver.DeviceAllocation

I am trying to pass output of some pycuda operation to the input of mxnet computational graph.
I am able to achieve this via numpy conversion with the following code
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import mxnet as mx
batch_shape = (1, 1, 10, 10)
h_input = np.zeros(shape=batch_shape, dtype=np.float32)
# init output with ones to see if contents really changed
h_output = np.ones(shape=batch_shape, dtype=np.float32)
# allocate device memory matching the host input buffer
d_input = cuda.mem_alloc(h_input.nbytes)
stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, h_input, stream)
# here some actions with d_input may be performed, e.g. kernel calls
# but for the sake of simplicity we'll just transfer it back to host
# (destination comes first: host buffer, then device allocation)
cuda.memcpy_dtoh_async(h_output, d_input, stream)
stream.synchronize()
mx_input = mx.nd.array(h_output, ctx=mx.gpu(0))
print('output after pycuda calls: ', h_output)
print('mx_input: ', mx_input)
However, I would like to avoid the overhead of device-to-host and host-to-device memory copying.
I couldn't find a way to construct mxnet.ndarray.NDArray directly from h_output.
The closest thing that I was able to find is construction of an ndarray from dlpack.
But it is not clear how to work with a dlpack object from Python.
Is there a way to achieve NDArray <-> pycuda interoperability without copying memory via the host?

Unfortunately, it is not possible at the moment.

numpy 'module' object has no attribute 'stack'

I am trying to run some code (which is not mine) that uses 'stack' from the numpy library.
Looking into documentation, stack really exists in numpy:
https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.stack.html
but when I run the code, I got:
AttributeError: 'module' object has no attribute 'stack'
Any idea how to fix this?
code extract:
s_t = np.stack((x_t, x_t, x_t, x_t), axis = 2)
Do I need some old libraries?
Thanks.
EDIT:
For some reason, Python uses an older version of the numpy library. pip2 freeze prints "numpy==1.10.4". I've also reinstalled numpy and got "Successfully installed numpy-1.10.4", but printing np.version.version in code gives me 1.8.2.

The function numpy.stack is new; it appeared in numpy == 1.10.0. If you can't get that version running on your system, the code can be found at (near the end)
https://github.com/numpy/numpy/blob/f4cc58c80df5202a743bddd514a3485d5e4ec5a4/numpy/core/shape_base.py
I need to examine it a bit more, but the working part of the function is:
sl = (slice(None),) * axis + (_nx.newaxis,)
expanded_arrays = [arr[sl] for arr in arrays]
return _nx.concatenate(expanded_arrays, axis=axis)
So it adds a np.newaxis to each array, and then concatenates on that axis. So, like vstack, hstack and dstack, it adjusts the dimensions of the inputs and then uses np.concatenate. Nothing particularly new or magical.
So if x is (2,3) shape, x[:,np.newaxis] is (2,1,3), x[:,:,np.newaxis] is (2,3,1) etc.
If x_t is 2d, then
np.stack((x_t, x_t, x_t, x_t), axis = 2)
is probably the equivalent of
np.dstack((x_t, x_t, x_t, x_t))
creating a new array that has size 4 on axis 2.
Or:
tmp = x_t[:,:,None]
np.concatenate((tmp,tmp,tmp,tmp), axis=2)
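The equivalence of the three variants can be checked with a short sketch; the 2-D x_t below is a hypothetical stand-in for the array in the question.

```python
import numpy as np

x_t = np.arange(6).reshape(2, 3)  # hypothetical 2-D input

# np.stack needs numpy >= 1.10; the other two variants work on older versions
stacked = np.stack((x_t, x_t, x_t, x_t), axis=2)
dstacked = np.dstack((x_t, x_t, x_t, x_t))

tmp = x_t[:, :, None]  # add a trailing axis of length 1
concatenated = np.concatenate((tmp, tmp, tmp, tmp), axis=2)

print(stacked.shape)
print(np.array_equal(stacked, dstacked), np.array_equal(stacked, concatenated))
```

On an installation that lacks np.stack, only the first variant would fail; the dstack and concatenate lines are drop-in replacements for this 2-D case.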

You likely have two numpy libraries, one in your system libraries and the other in your Python's site-packages, which is maintained by pip. You have a few options to fix this.
You should reorder the libraries in sys.path so that your pip-installed numpy library comes before the native numpy library. Check this out to fix your path permanently.
Also look into virtualenv or Anaconda, which will allow you to work with specific versions of a package when you have multiple versions on your system.
Here's another suggestion about how to ensure pip installs the library on your user path (System Library).
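To see which copy of numpy the interpreter actually picks up, a small diagnostic along these lines may help. It is only a sketch and uses Python 3 print syntax.

```python
import sys
import numpy as np

# Version and on-disk location of the numpy that was actually imported
print(np.__version__)
print(np.__file__)

# Earlier entries in sys.path take precedence when a package is installed twice
for entry in sys.path:
    print(entry)
```

If the printed location is a system directory rather than the pip-managed site-packages, that would be consistent with the two-installation explanation above.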

recv_into a numpy array

I am transmitting images by sockets from a camera that runs wince :(
The images in the camera are just float arrays created using realloc for the given x * y size
On the other end, I am receiving these images in python.
I have the following code working:
img_dtype = np.float32
img_rcv = np.empty((img_y, img_x),
dtype = img_dtype)
p = sck.recv_into(img_rcv,
int(size_bytes),
socket.MSG_WAITALL)
if size_bytes != p:
print "Mismatch between expected and received data amount"
return img_rcv
I am a little bit confused about the way numpy creates its arrays and I am wondering if this img_rcv will be compatible with the way recv_into works.
My questions are:
How safe is this?
Will the memory allocated for the numpy array be known to recv_into?
Are the numpy arrays creation routines equivalent to a malloc?
Is it just working because I am lucky?

The answers are:
safe
yes, via the buffer interface
yes, in the sense that you get a block of memory you can work with
no
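The buffer-interface point can be tried locally with a connected socket pair standing in for the camera link. This is a minimal Python 3 sketch with made-up image dimensions; it assumes both ends use the same byte order and float size.

```python
import socket
import numpy as np

img_y, img_x = 4, 6  # hypothetical image size
sent = np.arange(img_y * img_x, dtype=np.float32).reshape(img_y, img_x)

# A local socket pair stands in for the camera connection
sender, receiver = socket.socketpair()
sender.sendall(sent.tobytes())

# np.empty allocates one contiguous, writable block that recv_into can fill
received = np.empty((img_y, img_x), dtype=np.float32)
n = receiver.recv_into(received, received.nbytes, socket.MSG_WAITALL)

sender.close()
receiver.close()

if n != received.nbytes:
    print("Mismatch between expected and received data amount")
print(np.array_equal(sent, received))
```

The array has to be C-contiguous and writable for this to work, which is the case for a freshly created np.empty array.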

Pyopencl: difference between to_device and Buffer

Let
import pyopencl as cl
import pyopencl.array as cl_array
import numpy
a = numpy.random.rand(50000).astype(numpy.float32)
mf = cl.mem_flags
What is the difference between
a_gpu = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
and
a_gpu = cl_array.to_device(self.ctx, self.queue, a)
?
And what is the difference between
result = numpy.empty_like(a)
cl.enqueue_copy(self.queue, result, result_gpu)
and
result = result_gpu.get()
?

Buffers are CL's version of malloc, while pyopencl.array.Array is a workalike of numpy arrays on the compute device.
So for the second version of the first part of your question, you may write a_gpu + 2 to get a new array that has 2 added to each number in your array, whereas in the case of the Buffer, PyOpenCL only sees a bag of bytes and cannot perform any such operation.
The second part of your question is the same in reverse: If you've got a PyOpenCL array, .get() copies the data back and converts it into a (host-based) numpy array. Since numpy arrays are one of the more convenient ways to get contiguous memory in Python, the second variant with enqueue_copy also ends up in a numpy array--but note that you could've copied this data into an array of any size (as long as it's big enough) and any type--the copy is performed as a bag of bytes, whereas .get() makes sure you get the same size and type on the host.
Bonus fact: There is of course a Buffer underlying each PyOpenCL array. You can get it from the .data attribute.

To answer the first question, Buffer(hostbuf=...) can be called with anything that implements the buffer interface (reference). pyopencl.array.to_device(...) must be called with an ndarray (reference). ndarray implements the buffer interface and works in either place. However, only hostbuf=... would be expected to work with for example a bytearray (which also implements the buffer interface). I have not confirmed this, but it appears to be what the docs suggest.
On the second question, I am not sure what type result_gpu is supposed to be when you call get() on it (did you mean Buffer.get_host_array()?). In any case, enqueue_copy() works between any combination of Buffer, Image and host, can have offsets and regions, and can be asynchronous (with is_blocking=False), and I think these capabilities are only available that way (whereas get() would be blocking and return the whole buffer). (reference)
