How to `np.loads()` an `np.save()`d array? - python

To wit:
>>> foo = np.array([1, 2, 3])
>>> np.save('zomg.npy', foo)
>>> np.load('zomg.npy')
array([1, 2, 3])
All good. What about loads?
>>> np.loads(open('zomg.npy', 'rb').read())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.UnpicklingError: STACK_GLOBAL requires str
Nope. Shouldn't this work? np.load() succeeds, so I know the data is not corrupted.

I'd suggest sticking with np.save and np.load unless there is some extra functionality of pickle that you need. In that case it might be less confusing to use pickle directly rather than via one of the np synonyms.
============
There is an undocumented np.loads; it is just another name for pickle.loads.
In [573]: np.loads
Out[573]: <function _pickle.loads>
In [574]: np.loads??
Signature: np.loads(data, *, fix_imports=True, encoding='ASCII', errors='strict')
np.ma.loads has more docs, but is just:
def loads(strg):
    ...
    return pickle.loads(strg)
np.load will use pickle for things that aren't regular arrays, but performs its own load from the np.save format. See what its docs say about pickled objects. And to add to the confusion, pickle.dump of an array uses np.save; that is, the pickle format for an ndarray is the save format.
So there is a relationship between np.load and np.loads, but it isn't quite the same as that between pickle.load and pickle.loads.
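To make the distinction concrete (a small illustration of my own, not from the answer): pickle.loads, and hence np.loads, understands pickle bytes, while only np.load understands the .npy format, which is exactly why reading zomg.npy with loads raised UnpicklingError:
import pickle
import numpy as np

foo = np.array([1, 2, 3])

pickled = pickle.dumps(foo)   # pickle bytes: readable by pickle.loads (and np.loads)
np.save('zomg.npy', foo)      # .npy bytes: readable only by np.load

pickle.loads(pickled)         # array([1, 2, 3])
np.load('zomg.npy')           # array([1, 2, 3])
# pickle.loads(open('zomg.npy', 'rb').read())  # raises UnpicklingError, as in the question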
================
There isn't an np.dumps, but there is an np.ma.dumps:
In [584]: d=np.ma.dumps(foo)
In [585]: d
Out[585]: b'\x80\x03cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02C\x01bq\x03\x87q\x04Rq\x05(K\x01K\x03\x85q\x06cnumpy\ndtype\nq\x07X\x02\x00\x00\x00i4q\x08K\x00K\x01\x87q\tRq\n(K\x03X\x01\x00\x00\x00<q\x0bNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x0cb\x89C\x0c\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00q\rtq\x0eb.'
In [586]: np.loads(d)
Out[586]: array([1, 2, 3])
In [587]: np.ma.loads(d)
Out[587]: array([1, 2, 3])
In [588]: import pickle
In [589]: pickle.loads(d)
Out[589]: array([1, 2, 3])
Using the pickle interface to save and load an array:
In [594]: np.ma.dump(foo,open('test.pkl','wb'))
In [595]: np.load('test.pkl')
Out[595]: array([1, 2, 3])
In [600]: pickle.load(open('test.pkl','rb'))
Out[600]: array([1, 2, 3])

This works as a workaround for now:
>>> import io
>>> np.load(io.BytesIO(open('zomg.npy', 'rb').read()))
array([1, 2, 3])
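As a further illustration (my own sketch, not from the answers above): np.save and np.load both accept file-like objects, so the whole .npy round trip can be done against an in-memory buffer, with no pickle involved:
>>> import io
>>> buf = io.BytesIO()
>>> np.save(buf, foo)            # write the .npy format into the buffer
>>> npy_bytes = buf.getvalue()   # the same bytes zomg.npy contains
>>> np.load(io.BytesIO(npy_bytes))
array([1, 2, 3])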

Related

(Python multiprocessing) How can I access an array shared with multiprocessing.shared_memory.SharedMemory?

I am trying to understand how multiprocessing.shared_memory.SharedMemory works. I tried to run the second example from https://docs.python.org/3/library/multiprocessing.shared_memory.html - but it does not seem to work as advertised:
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8]) # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:] # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name
'wnsm_e3abbd9a'
So far, so good. However, the problem arises when I try to access this shared array, either in the same or a new Python shell on the same machine:
>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='wnsm_e3abbd9a')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([ 4294967297, 12884901890, 34359738373, 0, 0, 0], dtype=int64)
This clearly is not the array that was originally shared. Note that I just copy-pasted the example straight from the documentation, only changing the name of the shared memory block. Interestingly, the same thing happens even if I don't create the array "b" or copy "a" into it before switching to the second Python shell.
Finally, changing the last element of the array in the second shell works as normal:
>>> c[-1] = 888
>>> c
array([ 4294967297, 12884901890, 34359738373, 0, 0, 888], dtype=int64)
But it does not affect the original array in the first shell:
>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([1, 1, 2, 3, 5, 8])
Does anyone know why this is happening, or what I (along with the official documentation) am doing wrong?
Thanks!
Found the answer here: https://bugs.python.org/issue39655
On Windows, the default integer dtype for a NumPy array is int32, not int64, so the two shells interpret the same buffer with different dtypes. The example works after changing the dtypes so both sides match.
(Nice of the devs not to mention this in the documentation, despite this issue being submitted as a bug and closed.)
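A minimal sketch of that fix (my own illustration of the answer, shown in a single process for brevity): pin the dtype explicitly on both sides so the producer and the consumer interpret the shared buffer the same way.
import numpy as np
from multiprocessing import shared_memory

# "First shell": force int64 so the layout is the same on every platform.
a = np.array([1, 1, 2, 3, 5, 8], dtype=np.int64)
shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
b[:] = a[:]

# "Second shell": attach with exactly the same shape and dtype.
existing_shm = shared_memory.SharedMemory(name=shm.name)
c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
print(c)   # [1 1 2 3 5 8]

existing_shm.close()
shm.close()
shm.unlink()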

Can I pass bytes between python processes with multiprocessing shared_memory

I am trying to pass sound data to a subprocess in python through shared_memory. Currently, in one program I am converting the sound byte data to a numpy array of int16. I can access the shared_memory of the numpy array from both python processes but the conversion of the numpy array back to a bytearray takes too long for what I am trying to do. Is there a way to just pass the byte array to a python subprocess (through shared_memory or something else)?
The Python example I based my code on is:
>>> # In the first Python interactive shell
>>> import numpy as np
>>> a = np.array([1, 1, 2, 3, 5, 8]) # Start with an existing NumPy array
>>> from multiprocessing import shared_memory
>>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
>>> # Now create a NumPy array backed by shared memory
>>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
>>> b[:] = a[:] # Copy the original data into shared memory
>>> b
array([1, 1, 2, 3, 5, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> type(a)
<class 'numpy.ndarray'>
>>> shm.name # We did not specify a name so one was chosen for us
'psm_21467_46075'
>>> # In either the same shell or a new Python shell on the same machine
>>> import numpy as np
>>> from multiprocessing import shared_memory
>>> # Attach to the existing shared memory block
>>> existing_shm = shared_memory.SharedMemory(name='psm_21467_46075')
>>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example
>>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf)
>>> c
array([1, 1, 2, 3, 5, 8])
>>> c[-1] = 888
>>> c
array([ 1, 1, 2, 3, 5, 888])
>>> # Back in the first Python interactive shell, b reflects this change
>>> b
array([ 1, 1, 2, 3, 5, 888])
>>> # Clean up from within the second Python shell
>>> del c # Unnecessary; merely emphasizing the array is no longer used
>>> existing_shm.close()
>>> # Clean up from within the first Python shell
>>> del b # Unnecessary; merely emphasizing the array is no longer used
>>> shm.close()
>>> shm.unlink() # Free and release the shared memory block at the very end
The data is saved in shared_memory as an int16 NumPy array (c).
To input the sound data into pyaudio.stream.write, I have to do the following conversion:
>>> c = np.array([ 1, 1, 2, 3, 5, 888])
>>> c
array([ 1, 1, 2, 3, 5, 888])
>>> bytedata = b''.join(c)
>>> bytedata
b'\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x05\x00\x00\x00x\x03\x00\x00'
>>>
Is it possible to have this bytes format stored in a shared_memory location?
Ideally a working version of:
store_byte_array = np.bytearray(c, dtype=np.int16,buffer=shared_memory.buf)
Thanks in advance!
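(A side note of my own, not an answer from the thread: SharedMemory.buf is a plain memoryview, so raw bytes can be copied into it directly, and an array's bytes can be obtained with ndarray.tobytes() rather than b''.join(...). A minimal sketch, with both sides shown in one process:)
import numpy as np
from multiprocessing import shared_memory

# Producer: copy raw bytes straight into the shared block.
sound_bytes = np.array([1, 1, 2, 3, 5, 888], dtype=np.int16).tobytes()
shm = shared_memory.SharedMemory(create=True, size=len(sound_bytes))
shm.buf[:len(sound_bytes)] = sound_bytes       # memoryview slice assignment

# Consumer: attach by name and read the bytes back without building an array.
existing = shared_memory.SharedMemory(name=shm.name)
raw = bytes(existing.buf[:len(sound_bytes)])   # bytes suitable for stream.write(raw)

existing.close()
shm.close()
shm.unlink()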

Creating a numpy array from a set

I noticed the following behaviour exhibited by numpy arrays:
>>> import numpy as np
>>> s = {1,2,3}
>>> l = [1,2,3]
>>> np.array(l)
array([1, 2, 3])
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(l, dtype='int')
array([1, 2, 3])
>>> np.array(l, dtype='int').dtype
dtype('int64')
>>> np.array(s, dtype='int')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
There are 2 things to notice:
1. Creating an array from a set results in the array dtype being object.
2. Trying to specify dtype results in an error which suggests that the set is being treated as a single element rather than an iterable.
What am I missing? I don't fully understand which bit of Python I'm overlooking. A set is a mutable object, much like a list.
EDIT: tuples work fine:
>>> t = (1,2,3)
>>> np.array(t)
array([1, 2, 3])
>>> np.array(t).dtype
dtype('int64')
The array factory works best with sequence objects, which a set is not. If you do not care about the order of elements and know they are all ints or convertible to int, then you can use np.fromiter:
np.fromiter({1,2,3},int,3)
# array([1, 2, 3])
The second (dtype) argument is mandatory; the last (count) argument is optional, and providing it can improve performance.
As you can see from the curly-bracket syntax, a set is more closely related to a dict than to a list. You can solve this very simply by turning the set into a list or tuple before converting it to an array:
>>> import numpy as np
>>> s = {1,2,3}
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(list(s))
array([1, 2, 3])
>>> np.array(tuple(s))
array([1, 2, 3])
However, this might be too inefficient for large sets, because the list or tuple functions have to run through the whole set before even starting the creation of the array. A better method is to use the set as an iterator:
>>> np.fromiter(s, int)
array([1, 2, 3])
The np.array documentation says that the object argument must be "an array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence" (emphasis added).
A set is not a sequence. Specifically, sets are unordered and do not support the __getitem__ method. Hence you cannot create an array from a set the way you are trying to with the list.
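For instance (my own illustration, not part of the original answer), plain indexing on a set already fails, and that is the sequence behaviour np.array relies on:
>>> s = {1, 2, 3}
>>> s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable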
NumPy expects the argument to be a list-like sequence; it doesn't understand the set type, so it creates an object array (the same would happen if you passed any other non-sequence object). You can create a NumPy array from a set by first converting the set to a list: numpy.array(list(my_set)). Hope this helps.

simple but weird vstack/concatenate problems (python)

I've been reading over the documentation on numpy arrays and some of it is not making sense.
For instance, the answer given here suggests using np.vstack or np.concatenate to combine arrays, as do many other places on the internet.
However, when I try to do this with lists converted to np.arrays, it doesn't work:
>>> some_list = [1,2,3,4,5]
>>> np.array(some_list)
array([1, 2, 3, 4, 5])
>>> some_Y_list = [2,1,5,6,3]
>>> np.array(some_Y_list)
array([2, 1, 5, 6, 3])
>>> dydx = np.diff(some_Y_list)/np.diff(some_list)
>>> np.vstack([dydx, dydx[-1]])
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
np.vstack([dydx, dydx[-1]])
File "C:\Python27\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
return _nx.concatenate(map(atleast_2d,tup),0)
ValueError: array dimensions must agree except for d_0
Any way that I can do this?
All I need this for, in this instance, is to make the derivatives of any order the same shape as the X array given by the user so I can do further processing.
Thanks for any help.
The following won't work except in some very limited circumstances:
np.vstack([dydx, dydx[-1]])
Here, dydx is an array and dydx[-1] is a scalar.
It's unclear what you're trying to achieve, but did you perhaps mean to stack them horizontally:
np.hstack([dydx, dydx[-1]])
?
In [38]: np.hstack([dydx, dydx[-1]])
Out[38]: array([-1, 4, 1, -3, -3])
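For what it's worth (a small addition of my own), wrapping the scalar in a sequence makes the concatenate-style calls work too, and np.append does the same job:
In [39]: np.concatenate([dydx, [dydx[-1]]])
Out[39]: array([-1,  4,  1, -3, -3])
In [40]: np.append(dydx, dydx[-1])
Out[40]: array([-1,  4,  1, -3, -3])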

Python function to accept numpy ndarray or sequence as arguments

I have seen some Python functions that generally work by receiving an (n, 2) shaped NumPy ndarray as an argument, but that can also "automagically" receive (2, n) arrays or even length-2 sequences (tuple or list).
How is this achieved pythonically? Is there a unified good practice to check and handle these cases (for example, in the numpy and scipy modules), or does each developer implement whatever he thinks best?
I'd just like to avoid (possibly nested) chains of ifs/elifs, in case there is a well-known better way.
Thanks for any help.
You can use the numpy.asarray function to convert any sequence-like input to an array:
>>> import numpy
>>> numpy.asarray([1,2,3])
array([1, 2, 3])
>>> numpy.asarray(numpy.array([2,3]))
array([2, 3])
>>> numpy.asarray(1)
array(1)
>>> numpy.asarray((2,3))
array([2, 3])
>>> numpy.asarray({1:3,2:4})
array({1: 3, 2: 4}, dtype=object)
It's important to note that, as the documentation says, "No copy is performed if the input is already an ndarray." This is really nice since you can pass an existing array in and it just returns the same array.
Once you convert it to a numpy array, just check the length if that's a requirement. Something like:
>>> def f(x):
...     x = numpy.asarray(x)
...     if len(x) != 2:
...         raise Exception("invalid argument")
...
>>> f([1,2])
>>> f([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in f
Exception: invalid argument
Update:
Since you asked, here's a "magic" function that will accept *args as an array also:
>>> def f(*args):
...     args = numpy.asarray(args[0]) if len(args) == 1 else numpy.asarray(args)
...     return args
...
>>> f(7,3,5)
array([7, 3, 5])
>>> f([1,2,3])
array([1, 2, 3])
>>> f((2,3,4))
array([2, 3, 4])
>>> f(numpy.array([1,2,3]))
array([1, 2, 3])
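To address the original (n, 2) vs (2, n) question more directly, here is one possible normalization helper. This is entirely my own sketch; the name and the convention that an ambiguous (2, 2) input is taken as already (n, 2) are assumptions, not established numpy/scipy practice:
import numpy as np

def as_points(data):
    """Coerce data to an (n, 2) float array of points (hypothetical helper)."""
    a = np.asarray(data, dtype=float)
    if a.ndim == 1 and a.size == 2:       # a single (x, y) pair
        return a.reshape(1, 2)
    if a.ndim == 2 and a.shape[1] == 2:   # already (n, 2), including the 2x2 case
        return a
    if a.ndim == 2 and a.shape[0] == 2:   # (2, n): transpose
        return a.T
    raise ValueError("cannot interpret shape %s as (n, 2) points" % (a.shape,))

# as_points([(1, 2), (3, 4)]), as_points(np.zeros((2, 5))) and as_points((1, 2))
# all come back with shape (n, 2).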
