Can I pass bytes between python processes with multiprocessing shared_memory


I am trying to pass sound data to a subprocess in Python through shared_memory. Currently, in one program I convert the sound byte data to a NumPy array of int16. I can access the shared_memory of the NumPy array from both Python processes, but converting the NumPy array back to bytes takes too long for what I am trying to do. Is there a way to just pass the byte array to a Python subprocess (through shared_memory or something else)?
The Python example I based my code on is:
>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8]) # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:] # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name # We did not specify a name so one was chosen for us
'psm_21467_46075'
>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='psm_21467_46075')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([1, 1, 2, 3, 5, 8])
>>> c[-1] = 888
>>> c
array([ 1, 1, 2, 3, 5, 888])
>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([ 1, 1, 2, 3, 5, 888])
>>> # Clean up from within the second Python shell
>>> del c # Unnecessary; merely emphasizing the array is no longer used
>>> existing_shm.close()
>>> # Clean up from within the first Python shell
>>> del b # Unnecessary; merely emphasizing the array is no longer used
>>> shm.close()
>>> shm.unlink() # Free and release the shared memory block at the very end
The data is saved in shared_memory as an int16 NumPy array (c).
To feed the sound data to pyaudio's stream.write, I have to do the following conversion:
>>> c = np.array([ 1, 1, 2, 3, 5, 888])
>>> c
array([ 1, 1, 2, 3, 5, 888])
>>> bytedata = b''.join(c)
>>> bytedata
b'\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x05\x00\x00\x00x\x03\x00\x00'
>>>
Is it possible to have this bytes format stored in a shared_memory location?
Ideally a working version of:
store_byte_array = np.bytearray(c, dtype=np.int16,buffer=shared_memory.buf)
Thanks in advance!
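A minimal sketch of one way to skip the per-element conversion, assuming the consumer learns the block's name and the payload length some other way (for example through a multiprocessing.Queue or a command-line argument): ndarray.tobytes() should produce the same raw bytes as the b''.join(c) loop in a single call, and those bytes can be written straight into shm.buf and copied back out with bytes(). Both sides run in one process here only to keep the example short. Because SharedMemory may allocate a block larger than requested on some platforms, the reader slices to the known payload length rather than using the whole buffer.

```python
import numpy as np
from multiprocessing import shared_memory

# Producer: int16 audio samples (illustrative values, not real audio)
samples = np.array([1, 1, 2, 3, 5, 888], dtype=np.int16)
payload = samples.tobytes()  # raw bytes in one call, no per-element join

shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload  # store the bytes directly, no ndarray needed

# Consumer (normally a different process): attach by name.
# The block may be larger than requested, so slice to the known payload length.
existing = shared_memory.SharedMemory(name=shm.name)
received = bytes(existing.buf[:len(payload)])  # a plain bytes object

# Clean up: every attachment closes, the creator unlinks once at the end
existing.close()
shm.close()
shm.unlink()
```

If the consumer only needs a bytes-like object rather than a bytes copy, the slice existing.buf[:n] (a memoryview) avoids the extra copy, but it must be released or deleted before existing.close(), otherwise close() raises BufferError.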

Related

(Python multiprocessing) How can I access an array shared with multiprocessing.shared_memory.SharedMemory?

I am trying to understand how multiprocessing.shared_memory.SharedMemory works. I tried to run the second example from https://docs.python.org/3/library/multiprocessing.shared_memory.html, but it does not seem to work as advertised:
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8]) # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:] # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name
'wnsm_e3abbd9a'
So far, so good. However, the problem arises when I try to access this shared array, either in the same or a new Python shell on the same machine:
>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='wnsm_e3abbd9a')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([ 4294967297, 12884901890, 34359738373, 0, 0,
0], dtype=int64)
This clearly is not the array that was originally shared. Note that I just copy-pasted the example straight from the documentation, only changing the name of the shared memory block. Interestingly, the same thing happens even if I don't create the array "b" or copy "a" into it before switching to the second Python shell.
Finally, changing the last element of the array in the second shell works as normal:
>>> c[-1] = 888
>>> c
array([ 4294967297, 12884901890, 34359738373, 0, 0,
888], dtype=int64)
But it does not affect the original array in the first shell:
>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([1, 1, 2, 3, 5, 8])
Does anyone know why this is happening, or what I (along with the official documentation) am doing wrong?
Thanks!

Found the answer here: https://bugs.python.org/issue39655
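On Windows, NumPy's default integer dtype (before NumPy 2.0) is int32, not int64, so np.array([1, 1, 2, 3, 5, 8]) is an int32 array there. The example works after changing np.int64 to np.int32 in the second shell (or creating a with an explicit dtype).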
(Nice of the devs not to mention this in the documentation, despite this issue being submitted as a bug and closed.)
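The numbers in the question are exactly what this mix-up produces: read as int64, each pair of int32 values becomes low + high * 2**32, so [1, 1] turns into 4294967297, [2, 3] into 12884901890 and [5, 8] into 34359738373. The three trailing zeros are presumably unused bytes of the block, which may be rounded up to the page size. A small sketch of the reinterpretation, using explicit little-endian dtypes so it behaves the same on any platform:

```python
import numpy as np

# Stand-in for the Windows case: the block was filled with 32-bit integers
a = np.array([1, 1, 2, 3, 5, 8], dtype="<i4")
raw = a.tobytes()  # 24 bytes, the same bytes the shared block starts with

# Reading them as 64-bit integers fuses each pair: low + high * 2**32
wrong = np.frombuffer(raw, dtype="<i8")

# Reading them with the dtype that was actually written recovers the values
right = np.frombuffer(raw, dtype=a.dtype)

print(wrong.tolist())  # [4294967297, 12884901890, 34359738373]
print(right.tolist())  # [1, 1, 2, 3, 5, 8]
```

Passing a.dtype (and a.shape) to the second process along with the block name avoids relying on platform defaults.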

How to Convert a Python NdArray into String and back to NdArray ? ( In THIS scenario ) [duplicate]

I'm having a little trouble here,
I'm trying to convert a numpy.ndarray to string, I've already done that like this:
randomArray.tostring()
It works, but I'm wondering if I can transform it back to a numpy.ndarray.
What's the best way to do this?
I'm using numpy 1.8.1
Context:
The objective is to send the numpy.ndarray as a message in rabbitmq (pika library)

You can use the fromstring() method for this:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
ts = arr.tostring()
print(np.fromstring(ts, dtype=int))
# [1 2 3 4 5 6]
Sorry for the short answer, not enough points for commenting. Remember to state the data types or you'll end up in a world of pain.
Note on fromstring from numpy 1.14 onwards:
sep : str, optional
The string separating numbers in the data; extra whitespace between elements is also ignored.
Deprecated since version 1.14: Passing sep='', the default, is deprecated since it will trigger the deprecated binary mode of this function. This mode interprets string as binary bytes, rather than ASCII text with decimal numbers, an operation which is better spelt frombuffer(string, dtype, count). If string contains unicode text, the binary mode of fromstring will first encode it into bytes using either utf-8 (python 3) or the default encoding (python 2), neither of which produce sane results.

If you use tostring you lose information on both shape and data type:
>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> s = a.tostring()
>>> aa = np.fromstring(s)
>>> aa
array([ 0.00000000e+000, 4.94065646e-324, 9.88131292e-324,
1.48219694e-323, 1.97626258e-323, 2.47032823e-323,
2.96439388e-323, 3.45845952e-323, 3.95252517e-323,
4.44659081e-323, 4.94065646e-323, 5.43472210e-323])
>>> aa = np.fromstring(s, dtype=int)
>>> aa
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
>>> aa = np.fromstring(s, dtype=int).reshape(3, 4)
>>> aa
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
This means you have to send the metadata along with the data to the recipient. To exchange self-contained objects, try pickle (cPickle in Python 2):
>>> import pickle
>>> s = pickle.dumps(a)
>>> pickle.loads(s)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
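A minimal script version of this round trip for Python 3, where the standard pickle module replaces cPickle, checking that shape and dtype come back without any separately sent metadata. As with any pickle, only load payloads from a sender you trust.

```python
import pickle

import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# pickle.dumps returns bytes that carry the data together with dtype and shape
payload = pickle.dumps(a)

# The receiver gets back an equivalent array with no extra metadata needed
b = pickle.loads(payload)
print(b.shape, b.dtype)  # (3, 4) int64
```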

Imagine you have a numpy array of integers (it works with other types but you need some slight modification). You can do this:
a = np.array([0, 3, 5])
a_str = ','.join(str(x) for x in a) # '0,3,5'
a2 = np.array([int(x) for x in a_str.split(',')]) # np.array([0, 3, 5])
If you have an array of float, be sure to replace int by float in the last line.
You can also use the __repr__() method, which has the advantage of working for multi-dimensional arrays:
import sys
import numpy
from numpy import array
numpy.set_printoptions(threshold=sys.maxsize)  # print every element instead of summarizing with '...'
a = array([[0,3,5],[2,3,4]])
a_str = a.__repr__() # 'array([[0, 3, 5],\n [2, 3, 4]])'
a2 = eval(a_str) # only eval strings you trust; gives array([[0, 3, 5],
# [2, 3, 4]])
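The deprecation note quoted in the first answer only concerns the binary mode of np.fromstring (sep=''). With a non-empty sep, np.fromstring parses decimal text, so the comma-joined string can be turned back into an array without the Python-level int() loop. A small sketch, assuming a 1-D integer array:

```python
import numpy as np

a = np.array([0, 3, 5])
a_str = ','.join(str(x) for x in a)  # '0,3,5'

# Text mode: a non-empty sep makes fromstring parse decimal numbers
a2 = np.fromstring(a_str, dtype=a.dtype, sep=',')

# For floats, pass dtype=float instead; the shape is still not preserved
print(a2)  # [0 3 5]
```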

I know I am late, but here is a correct way of doing it, using base64. This technique converts the array to a string.
import binascii
import numpy as np
random_array = np.random.randn(32,32)
string_repr = binascii.b2a_base64(random_array).decode("ascii")
array = np.frombuffer(binascii.a2b_base64(string_repr.encode("ascii")), dtype=np.float64)
array = array.reshape(32,32)
For array to string:
Convert the binary data to a line of base64-encoded ASCII characters, then decode it as ASCII to get a str.
For string to array:
First encode the string as ASCII, then convert the block of base64 data back to binary and wrap it with np.frombuffer (float64 here, matching randn).
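The same round trip can also be written with the higher-level base64.b64encode and base64.b64decode functions, which, unlike b2a_base64, do not append a trailing newline. This is a swapped-in variant, not the answer's code; the receiver still has to know the dtype (float64) and shape ((32, 32)) separately, and the string is roughly 4/3 the size of the raw bytes.

```python
import base64

import numpy as np

random_array = np.random.randn(32, 32)  # float64 values

# Array -> ASCII string: base64-encode the raw buffer, then decode bytes to str
string_repr = base64.b64encode(random_array.tobytes()).decode("ascii")

# String -> array: reverse both steps; dtype and shape must be known by the receiver
raw = base64.b64decode(string_repr.encode("ascii"))
restored = np.frombuffer(raw, dtype=np.float64).reshape(32, 32)
```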

This is a fast way to encode the array together with its shape and dtype:
import numpy as np

def numpy_to_bytes(arr: np.ndarray) -> bytearray:
    # Layout: b'<dtype>|<comma-separated shape>|' followed by the raw data.
    # Assumes str(arr.dtype) contains no '|' (true for numeric dtypes such as 'int64').
    arr_dtype = bytearray(str(arr.dtype), 'utf-8')
    arr_shape = bytearray(','.join([str(a) for a in arr.shape]), 'utf-8')
    sep = bytearray('|', 'utf-8')
    arr_bytes = arr.ravel().tobytes()
    to_return = arr_dtype + sep + arr_shape + sep + arr_bytes
    return to_return

def bytes_to_numpy(serialized_arr: bytearray) -> np.ndarray:
    sep = '|'.encode('utf-8')
    # Only the first two separators are searched, so '|' bytes inside the data are harmless
    i_0 = serialized_arr.find(sep)
    i_1 = serialized_arr.find(sep, i_0 + 1)
    arr_dtype = serialized_arr[:i_0].decode('utf-8')
    arr_shape = tuple([int(a) for a in serialized_arr[i_0 + 1:i_1].decode('utf-8').split(',')])
    arr_str = serialized_arr[i_1 + 1:]
    arr = np.frombuffer(arr_str, dtype = arr_dtype).reshape(arr_shape)
    return arr
To use the functions:
a = np.ones((23, 23), dtype = 'int')
a_b = numpy_to_bytes(a)
a1 = bytes_to_numpy(a_b)
np.array_equal(a, a1) and a.shape == a1.shape and a.dtype == a1.dtype  # True

This is a slight improvement on ajsp's answer using XML-RPC.
On the server side, convert the NumPy data to a bytes string with the '.tostring()' method (spelled '.tobytes()' in current NumPy). On the client side, decode the received data with '.fromstring()' ('np.frombuffer()' in current NumPy). I wrote two simple functions for this; hope this is helpful.
ndarray2str -- converts a NumPy ndarray to a bytes string.
str2ndarray -- converts the binary string back to a NumPy ndarray.
import numpy as np

def ndarray2str(a):
    # Convert the numpy array to a bytes string
    return a.tobytes()
On the receiver side, the data arrives as an 'xmlrpc.client.Binary' object, so access the bytes through '.data'.
def str2ndarray(a, new_shape):
    # Specify your data type; mine is float64, so I use np.float64
    a = np.frombuffer(a.data, dtype=np.float64)
    # new_shape must be known to the receiver (e.g. sent as a separate parameter)
    a = np.reshape(a, new_shape)
    return a
Note: the only problem with this approach is that XML-RPC is very slow when sending large NumPy arrays. It took me around 4 seconds to send and receive an array of shape (10, 500, 500, 3).
I am using Python 3.7.4.
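To see what the receiver actually gets without running a server, xmlrpc.client.dumps and xmlrpc.client.loads can marshal a call locally. A minimal sketch, assuming the shape travels as a second parameter next to the data; send_array is a hypothetical method name, not part of the answer.

```python
import xmlrpc.client

import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(2, 3)

# Sender side: wrap the raw bytes in Binary so they travel as <base64>;
# "send_array" is a hypothetical method name and the shape is sent alongside
request = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(arr.tobytes()), list(arr.shape)),
    methodname="send_array",
)

# Receiver side: loads() returns (params, method_name); the bytes sit in .data
params, method_name = xmlrpc.client.loads(request)
binary, shape = params
restored = np.frombuffer(binary.data, dtype=np.float64).reshape(shape)
```

The base64 step inside XML-RPC inflates the payload by about a third, which is part of why large arrays are slow over this transport.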

The right answer for NumPy 1.9 or later:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
ts = arr.tobytes()
# Reverse to array
arr = np.frombuffer(ts, dtype=arr.dtype)
print(arr)
tostring() is deprecated.
You don't need any external library (except NumPy), and there is no faster way to retrieve the value!
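One practical detail: np.frombuffer does not copy, so an array built over an immutable bytes object is read-only. A short sketch showing this; the copy() call is an addition for the case where the values need to be modified, not part of the answer above.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
ts = arr.tobytes()

view = np.frombuffer(ts, dtype=arr.dtype)  # wraps the bytes object, no copy
print(view.flags.writeable)  # False, since bytes objects are immutable

editable = view.copy()  # take a copy if you need to modify the values
editable[0] = 100
```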

Imagine you have a NumPy array of text, like the messages in a messenger:
>>> stex[40]
array(['Know the famous thing ...
and you want to get statistics from the corpus (text in column 11). You first need to get the values from the DataFrame (df5) and then join all records into one single corpus:
>>> import nltk
>>> stex = (df5.iloc[0:, [11]]).values
>>> a_str = ','.join(str(x) for x in stex)
>>> a_str = a_str.split()
>>> fd2 = nltk.FreqDist(a_str)
>>> fd2.most_common(50)

How to convert a ctypes array of c_uint to a numpy array

I have the following ctypes array:
data = (ctypes.c_uint * 100)()
And I want to create a numpy array np_data containing the integer values from ctypes array data (the ctypes array is obviously populated later with values)
I have seen that there is a ctypes interface in numpy (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html) but as far as I understood this is only to get ctypes from a numpy array and not the opposite.
I can obviously traverse data and populate the np_data array items one by one, but I am wondering if there is a more efficient or straightforward way to achieve this.

You could use [NumPy]: numpy.ctypeslib.as_array(obj, shape=None).
>>> import ctypes
>>> import numpy as np
>>>
>>>
>>> CUIntArr10 = ctypes.c_uint * 10
>>>
>>> ui10 = CUIntArr10(*range(10, 0, -1))
>>>
>>> [e for e in ui10] # The ctypes array
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
>>>
>>> np_arr = np.ctypeslib.as_array(ui10)
>>> np_arr # And the np one
array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], dtype=uint32)
Didn't get to the specific line of code (nor did I test my assumption), but I have a feeling that the contents copying is done by a single memcpy call, which would make it much faster than doing things "manually" from Python.

Probably the fastest option is np.frombuffer. It can be used with any object that implements the buffer protocol, in particular with ctypes arrays.
The main advantage of np.frombuffer is that the memory of the ctypes array isn't copied at all, but shared:
import ctypes
import numpy as np
data = (ctypes.c_uint * 100)()
arr = np.frombuffer(data, dtype=np.uint32)
arr.flags
# ...
# OWNDATA : False
# ...
By setting
arr.flags.writeable = False
one can ensure that the data will not be changed via the NumPy array arr (it can still be changed through the ctypes array).
If copying the data is really necessary, the usual NumPy functionality (e.g. arr.copy()) can be used.
The np.ctypeslib.as_array proposed in @CristiFati's answer seems to be a better way to create a NumPy array:
- the memory is shared as well; there is no copying involved.
- the right dtype is used automatically, which is a great thing: it eliminates errors like the one in my original post, where I used np.uint (a 64-bit unsigned integer on my machine) instead of np.uint32 (which also might not be right on some architectures).
Experimental proof of the above:
arr = np.ctypeslib.as_array(data)
arr.flags
# ...
# OWNDATA : False
# ...
arr.dtype
# dtype('<u4')
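Both answers rely on the NumPy array sharing the ctypes memory. A quick sketch that checks this, and that a read-only NumPy view still lets the ctypes side write; note that the NumPy flag is spelled writeable.

```python
import ctypes

import numpy as np

data = (ctypes.c_uint * 5)()  # zero-initialised ctypes array

arr = np.ctypeslib.as_array(data)  # dtype taken from c_uint, memory shared

# A write through the ctypes object shows up in the NumPy view (no copy was made)
data[0] = 42
print(arr[0])  # 42

# Freezing the NumPy view blocks writes through NumPy only
arr.flags.writeable = False
try:
    arr[1] = 7
except ValueError as exc:
    print("read-only:", exc)
data[1] = 7  # the ctypes side can still change the shared memory
```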

Can I avoid using `asmatrix`?

Is there any way for me to create matrices directly and not have to use asmatrix? From what I can see, all of the typical matrix functions (ones, rand, etc.) in NumPy return arrays, not matrices, which means (according to the documentation) that asmatrix will copy the data. Is there any way to avoid this?

According to the documentation:
Unlike matrix, asmatrix does not make a copy if the input is already a
matrix or an ndarray. Equivalent to matrix(data, copy=False).
So, asmatrix does not copy the data if it doesn't need to:
>>> import numpy as np
>>> a = np.arange(9).reshape((3,3))
>>> b = np.asmatrix(a)
>>> b.base is a
True
>>> a[0] = 3
>>> b
matrix([[3, 3, 3],
[3, 4, 5],
[6, 7, 8]])
