I have a list of tuples; within each tuple, one element is an object and the other is a generic value. When I run np.asarray(list_in) the result is a 2D array, with each tuple converted into a row. However, I would like to obtain a 1D array made of tuples.
I could pass a dtype to force it, and that works well in this minimal example:
a = [(1,2),(3,4)]
b = np.asarray(a,dtype=('float,float'))
print(b)
[( 1., 2.) ( 3., 4.)]
But how do I take the first element of the list and construct a proper dtype out of it? type(list_in[0]) returns tuple, and passing that to asarray does not work.
With this list of tuples you can make 3 kinds of arrays:
In [420]: a = [(1,2),(3,4)]
2d array, with dtype inferred from the inputs (but it could also be specified as something like float). Inputs match in size.
In [421]: np.array(a)
Out[421]:
array([[1, 2],
[3, 4]])
Structured array. 1d with 2 fields. Field indexing by name. Input must be a list of tuples (not list of lists):
In [422]: np.array(a, dtype='i,i')
Out[422]:
array([(1, 2), (3, 4)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
In [423]: _['f0']
Out[423]: array([1, 3], dtype=int32)
In the structured array, input and display uses tuples, but the data is not actually stored as tuples. The values are packed as bytes - in this case 8 bytes representing 2 integers.
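The packed layout is easy to verify; a small sketch checking the record size and total buffer size (the variable names are my own):

```python
import numpy as np

# Each record of an 'i4,i4' structured array occupies 8 contiguous bytes
# (two 4-byte integers); the array's buffer is those records back to back.
rec = np.array([(1, 2), (3, 4)], dtype='i4,i4')
itemsize = rec.dtype.itemsize   # bytes per record
total = rec.nbytes              # bytes for the whole buffer
```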
Object array. This is 1d with tuple contents, though the contents could be anything else. It is essentially an enhanced (or debased) list.
In [424]: A = np.empty((2,), dtype=object)
In [425]: A[:] = a
In [426]: A
Out[426]: array([(1, 2), (3, 4)], dtype=object)
In [427]: A.shape
Out[427]: (2,)
In [428]: A[1]
Out[428]: (3, 4)
Out[428] is an actual tuple; trying to modify it with A[1][0]=30 raises an error.
In this last case A = np.empty(2, dtype=tuple) does the same thing. Anything other than integer, float, string, etc. is 'converted' to object.
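Both claims are easy to check; a quick sketch (names are mine) using the usual fill-by-assignment pattern:

```python
import numpy as np

# dtype=tuple is not a real scalar type, so NumPy falls back to object,
# and the stored tuples keep their normal Python immutability.
A = np.empty(2, dtype=tuple)
A[:] = [(1, 2), (3, 4)]
is_object = A.dtype == object
try:
    A[1][0] = 30          # tuples reject item assignment
    mutated = True
except TypeError:
    mutated = False
```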
Simply specifying object dtype doesn't help. The result is 2d with numeric elements (but stored as object pointers).
In [429]: np.array(a, dtype=object)
Out[429]:
array([[1, 2],
[3, 4]], dtype=object)
In [430]: _.shape
Out[430]: (2, 2)
More on making an object dtype array: numpy ravel on inconsistent dimensional object
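Coming back to the original question of building a dtype from the first tuple, here is one possible sketch. The helper name and default field names are my own, and I'm assuming each element maps cleanly through np.asarray (arbitrary objects come out as dtype('O')):

```python
import numpy as np

def dtype_from_sample(sample):
    # Map each element of the sample tuple to a NumPy dtype via asarray;
    # field names 'f0', 'f1', ... mirror NumPy's default naming.
    return np.dtype([('f%d' % i, np.asarray(v).dtype)
                     for i, v in enumerate(sample)])

list_in = [(1.0, 2), (3.0, 4)]
dt = dtype_from_sample(list_in[0])
arr = np.asarray(list_in, dtype=dt)
```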
When I create a numpy array from a list of equal-length sublists, it implicitly converts it to a (len(list), len(sub_list)) 2d array:
>>> np.array([[1,2], [1,2]],dtype=object).shape
(2, 2)
But when I pass variable length sublists it creates a vector of length len(list):
>>> np.array([[1,2], [1,2,3]],dtype=object).shape
(2,)
How can I get a vector output when the sublists are the same length (i.e. make the first case behave like the second)?
Here you go... create with dtype=np.ndarray instead of dtype=object. (np.ndarray is not a real element type, so the resulting array still reports dtype=object.)
Simple example below (with 5 elements):
In [1]: arr = np.empty((5,), dtype=np.ndarray)
In [2]: arr.shape
Out[2]: (5,)
In [3]: arr[0]=np.array([1,2])
In [4]: arr[1]=np.array([2,3])
In [5]: arr[2]=np.array([1,2,3,4])
In [6]: arr
Out[6]:
array([array([1, 2]), array([2, 3]), array([1, 2, 3, 4]), None, None],
dtype=object)
You can create an array of objects of the desired size, and then set the elements like so:
elements = [np.array([1,2]), np.array([1,2])]
arr = np.empty(len(elements), dtype='object')
arr[:] = elements
But if you try to cast a list of same-length arrays/lists to an array directly, numpy will implicitly convert it into a multidimensional array:
np.array([[1,2], [1,2]],dtype=object)[0].shape  # (2,), a row rather than a list
I'm trying to see if there is a prettier way to create (i.e. force the creation of) a 1d numpy array from another list/array of objects. These objects may have entries that are themselves iterable (lists, tuples, etc.), but they can also be more arbitrary objects.
So to make things really simple, let me consider the following scenario:
a=[(1,2), (3,4), (3,5)]
b=np.array(a, dtype=object)
b.shape # gives (3,2), but I would like to have (3,1) or (3,)
I was wondering if there is a nice pythonic/numpy'ish way to force b to have a shape (3,), and the iterable structure of the elements of a to be neglected in b. Right now I do this:
a=[(1,2), (3,4), (3,5)]
b=np.empty(len(a), dtype=object)
for i,x in enumerate(a):
    b[i] = x
b.shape # gives (3,) this is what i want.
which works, but it's a bit ugly. I could not find a nicer, more built-in way to do this in numpy. Any ideas?
(more context: what I really need to do is reshuffle the dimensions of b in various ways, hence I don't want b to know anything about the dimensions of its elements if they are iterable).
Thanks!
In [60]: b = np.empty(3, object)
You don't need to iterate when assigning from a list:
In [61]: b[:] = [(1,2),(3,4),(3,5)]
In [62]: b
Out[62]: array([(1, 2), (3, 4), (3, 5)], dtype=object)
In [63]: b.shape
Out[63]: (3,)
For an array it doesn't work:
In [64]: b[:] = np.array([(1,2),(3,4),(3,5)])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-64-3042dce1f885> in <module>
----> 1 b[:] = np.array([(1,2),(3,4),(3,5)])
ValueError: could not broadcast input array from shape (3,2) into shape (3)
You may have to use iteration in the array case:
In [66]: for i,n in enumerate(np.array([(1,2),(3,4),(3,5)])):
...: b[i] = n
...:
In [67]: b
Out[67]: array([array([1, 2]), array([3, 4]), array([3, 5])], dtype=object)
Keep in mind that object dtype arrays are a bit of a fallback option. np.array(...) tries to create a multidimensional array (with a numeric dtype) if possible; it makes an object dtype array only when that isn't possible. And for some combinations of shapes, it throws up its hands and raises an error.
Turning that array into a list of arrays with list() also works (same speed):
In [92]: b[:] = list(np.array([(1,2),(3,4),(3,5)]))
In [93]: b
Out[93]: array([array([1, 2]), array([3, 4]), array([3, 5])], dtype=object)
NumPy is really helpful when creating arrays. If the first argument to numpy.array has __getitem__ and __len__ methods, these are used on the basis that it might be a valid sequence.
Unfortunately I want to create an array containing dtype=object without NumPy being "helpful".
Broken down to a minimal example, the class would look like this:
import numpy as np
class Test(object):
    def __init__(self, iterable):
        self.data = iterable

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.data)
and if the "iterables" have different lengths, everything is fine and I get exactly the result I want:
>>> np.array([Test([1,2,3]), Test([3,2])], dtype=object)
array([Test([1, 2, 3]), Test([3, 2])], dtype=object)
but NumPy creates a multidimensional array if these happen to have the same length:
>>> np.array([Test([1,2,3]), Test([3,2,1])], dtype=object)
array([[1, 2, 3],
[3, 2, 1]], dtype=object)
Unfortunately there is only an ndmin argument, so I was wondering if there is a way to enforce an ndmax, or somehow prevent NumPy from interpreting the custom classes as another dimension (without deleting __len__ or __getitem__)?
This behavior has been discussed a number of times before (e.g. Override a dict with numpy support). np.array tries to make as high a dimensional array as it can. The model case is nested lists. If it can iterate and the sublists are equal in length it will 'drill' on down.
Here it went down 2 levels before encountering lists of different length:
In [250]: np.array([[[1,2],[3]],[1,2]],dtype=object)
Out[250]:
array([[[1, 2], [3]],
[1, 2]], dtype=object)
In [251]: _.shape
Out[251]: (2, 2)
Without a shape or ndmax parameter it has no way of knowing whether I want it to be (2,) or (2,2). Both of those would work with the dtype.
It's compiled code, so it isn't easy to see exactly what tests it uses. It tries to iterate on lists and tuples, but not on sets or dictionaries.
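That difference is easy to demonstrate; a quick sketch:

```python
import numpy as np

# Lists and tuples are treated as sequences to iterate over, but sets
# and dicts become 0-d object "scalars" rather than a new dimension.
from_list = np.array([1, 2], dtype=object)
from_set = np.array({1, 2}, dtype=object)
from_dict = np.array({'a': 1}, dtype=object)
```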
The surest way to make an object array with a given dimension is to start with an empty one and fill it:
In [266]: A=np.empty((2,3),object)
In [267]: A.fill([[1,'one']])
In [276]: A[:]={1,2}
In [277]: A[:]=[1,2] # broadcast error
Another way is to start with at least one different element (e.g. a None), and then replace that.
There is a more primitive creator, ndarray that takes shape:
In [280]: np.ndarray((2,3),dtype=object)
Out[280]:
array([[None, None, None],
[None, None, None]], dtype=object)
But that's basically the same as np.empty (unless I give it a buffer).
These are fudges, but they aren't expensive (time wise).
================ (edit)
https://github.com/numpy/numpy/issues/5933 (Enh: Object array creation function) is an enhancement request. Also relevant is https://github.com/numpy/numpy/issues/5303: the error message for accidentally irregular arrays is confusing.
The developer sentiment seems to favor a separate function to create dtype=object arrays, one with more control over the initial dimensions and depth of iteration. They might even strengthen the error checking to keep np.array from creating 'irregular' arrays.
Such a function could detect the shape of a regular nested iterable down to a specified depth, and build an object type array to be filled.
def objarray(alist, depth=1):
    shape = []
    l = alist
    for _ in range(depth):
        shape.append(len(l))
        l = l[0]
    arr = np.empty(shape, dtype=object)
    arr[:] = alist
    return arr
With various depths:
In [528]: alist=[[Test([1,2,3])], [Test([3,2,1])]]
In [529]: objarray(alist,1)
Out[529]: array([[Test([1, 2, 3])], [Test([3, 2, 1])]], dtype=object)
In [530]: objarray(alist,2)
Out[530]:
array([[Test([1, 2, 3])],
[Test([3, 2, 1])]], dtype=object)
In [531]: objarray(alist,3)
Out[531]:
array([[[1, 2, 3]],
[[3, 2, 1]]], dtype=object)
In [532]: objarray(alist,4)
...
TypeError: object of type 'int' has no len()
A workaround is of course to create an array of the desired shape and then copy the data:
In [19]: lst = [Test([1, 2, 3]), Test([3, 2, 1])]
In [20]: arr = np.empty(len(lst), dtype=object)
In [21]: arr[:] = lst[:]
In [22]: arr
Out[22]: array([Test([1, 2, 3]), Test([3, 2, 1])], dtype=object)
Note that, in any case, I would not be surprised if numpy's behavior w.r.t. interpreting iterable objects (which is what you want to use, right?) is version dependent, and possibly buggy. Or maybe some of these bugs are actually features. Either way, I'd be wary of breakage when the numpy version changes.
By contrast, copying into a pre-created array should be far more robust.
This workaround may not be the most efficient, but I like it for its clarity:
test_list = [Test([1,2,3]), Test([3,2,1])]
test_list.append(None)
test_array = np.array(test_list, dtype=object)[:-1]
Summary: take your list, append None, then convert it to a numpy array; the None prevents numpy from collapsing it into a multidimensional array. Finally, remove the last entry to get the structure you want.
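For reuse, the trick can be wrapped in a tiny helper (the function name is my own):

```python
import numpy as np

def to_object_vector(items):
    # Appending None makes the list ragged, so np.array cannot collapse
    # equal-length items into extra dimensions; slice the None back off.
    padded = list(items) + [None]
    return np.array(padded, dtype=object)[:-1]

vec = to_object_vector([[1, 2], [1, 2]])
```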
Workaround using pandas
This might not be what the OP is looking for, but in case anyone needs a way to prevent numpy from constructing multidimensional arrays, this might be useful.
Pass your list to pd.Series and then get the elements as a numpy array using .values.
import pandas as pd
pd.Series([Test([1,2,3]), Test([3,2,1])]).values
# array([Test([1, 2, 3]), Test([3, 2, 1])], dtype=object)
Or, if dealing with numpy arrays:
np.array([np.random.randn(2,2), np.random.randn(2,2)]).shape
(2, 2, 2)
Using pd.Series:
pd.Series([np.random.randn(2,2), np.random.randn(2,2)]).values.shape
#(2,)
I've read through the documentation but cannot work out how to create a structured array of strings and integers with numpy. A shortened version of my problem is below:
foo = [['asd', 1, 2],['bgf',2,3]]
bar = np.array(foo, dtype=['S10', 'i4','i4'])
I would then like to have bar[:,0] as an array of strings and bar[:,1] and bar[:,2] as arrays of integers.
Unfortunately this gives a TypeError: data type not understood. I've tried many other ways to get it to work but cannot find anything intuitive.
Currently I am just doing bar = np.array(foo) and then casting to integer whenever I call a value from the 2nd or 3rd column, which is far from ideal.
How can I create the structure array bar that I would like from the list of lists foo?
Here's one way to create the structured array (note that the rows must be tuples, not lists):
>>> foo = [('asd', 1, 2),('bgf',2,3)]
>>> bar = np.array(foo, dtype='S10,i4,i4')
>>> bar
array([('asd', 1, 2), ('bgf', 2, 3)],
dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])
>>> bar['f0']
array(['asd', 'bgf'],
dtype='|S10')
>>> bar['f1']
array([1, 2], dtype=int32)
>>> bar['f2']
array([2, 3], dtype=int32)
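If the default f0/f1/f2 names are unappealing, the dtype can also be spelled with explicit field names; a variant sketch (the names 'label', 'x', 'y' are my own):

```python
import numpy as np

# Same structured array, but with descriptive field names.
foo = [('asd', 1, 2), ('bgf', 2, 3)]
bar = np.array(foo, dtype=[('label', 'S10'), ('x', 'i4'), ('y', 'i4')])
labels = bar['label']   # bytes under Python 3 because of 'S10'
xs = bar['x']
```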
If you want a normal array, with elements rather than fields, then use dtype=object. That is also the choice when an array holds more than one data type.
>>> bar = np.array(foo, dtype=object)
>>> bar[:,0]
array(['asd', 'bgf'], dtype=object)
>>> bar[:,1]
array([1, 2], dtype=object)
>>> bar[:,2]
array([2, 3], dtype=object)