List of lists into numpy array - python

How do I convert a simple list of lists into a numpy array? The rows are individual sublists and each row contains the elements in the sublist.

If your list of lists contains lists with varying number of elements then the answer of Ignacio Vazquez-Abrams will not work. Instead there are at least 3 options:
1) Make an array of arrays:
x = [[1, 2], [1, 2, 3], [1]]
y = numpy.array([numpy.array(xi) for xi in x], dtype=object)  # dtype=object is needed for ragged input on NumPy 1.24+
>>> type(y)
<class 'numpy.ndarray'>
>>> type(y[0])
<class 'numpy.ndarray'>
2) Make an array of lists:
x = [[1, 2], [1, 2, 3], [1]]
y = numpy.array(x, dtype=object)  # without dtype=object, NumPy 1.24+ raises ValueError for ragged input
>>> type(y)
<class 'numpy.ndarray'>
>>> type(y[0])
<class 'list'>
3) First make the lists equal in length:
x = [[1, 2], [1, 2, 3], [1]]
length = max(map(len, x))
y = numpy.array([xi + [None] * (length - len(xi)) for xi in x])
>>> y
array([[1, 2, None],
       [1, 2, 3],
       [1, None, None]], dtype=object)
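Padding with None leaves you with dtype=object, which gives up most of NumPy's speed. If the data is numeric, a rough sketch of the same padding idea with NaN as the filler (so the result stays a float array) could look like this. The helper name pad_ragged is made up for illustration; it is not a NumPy function.

```python
import numpy as np

def pad_ragged(rows, fill_value=np.nan):
    """Hypothetical helper: pad ragged rows on the right to equal length."""
    width = max(len(r) for r in rows)
    # Start from an array full of the filler, then copy each row in.
    out = np.full((len(rows), width), fill_value, dtype=float)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out

x = [[1, 2], [1, 2, 3], [1]]
y = pad_ragged(x)
```

Functions such as np.nansum or np.nanmean can then skip the padded cells.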

>>> numpy.array([[1, 2], [3, 4]])
array([[1, 2], [3, 4]])

As this is the top search on Google for converting a list of lists into a Numpy array, I'll offer the following despite the question being 4 years old:
>>> x = [[1, 2], [1, 2, 3], [1]]
>>> y = numpy.hstack(x)
>>> print(y)
[1 2 1 2 3 1]
When I first thought of doing it this way, I was quite pleased with myself because it's soooo simple. However, after timing it with a larger list of lists, it is actually faster to do this:
>>> y = numpy.concatenate([numpy.array(i) for i in x])
>>> print(y)
[1 2 1 2 3 1]
Note that @Bastiaan's answer #1 doesn't make a single continuous list, hence I added the concatenate.
Anyway... I prefer the hstack approach for its elegant use of Numpy.
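For anyone who wants to check that the flattening approaches agree before timing them, here is a small sketch. The third variant, using itertools.chain with np.fromiter, is my own addition and assumes all elements are integers.

```python
import numpy as np
from itertools import chain

x = [[1, 2], [1, 2, 3], [1]]

# The two approaches from the answer above.
flat_hstack = np.hstack(x)
flat_concat = np.concatenate([np.asarray(row) for row in x])

# Assumed alternative: stream the elements without building per-row arrays.
flat_chain = np.fromiter(chain.from_iterable(x), dtype=int)
```

All three give a 1-D array; which one is fastest probably depends on the number and length of the sublists, so it is worth timing on your own data.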

It's as simple as:
>>> lists = [[1, 2], [3, 4]]
>>> np.array(lists)
array([[1, 2],
[3, 4]])

Again, after searching for the problem of converting nested lists with N levels into an N-dimensional array I found nothing, so here's my way around it:
import numpy as np
new_array = np.array([[[coord for coord in xk] for xk in xj] for xj in xi], ndmin=3)  # xi is the 3-level nested list; this is the case N=3
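As far as I can tell, when every level of nesting is rectangular, np.array already infers the number of dimensions by itself, so the nested comprehension may not be needed. A minimal sketch with made-up data:

```python
import numpy as np

# Made-up 3-level nested list with a regular 2 x 2 x 2 layout.
nested = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]

arr = np.array(nested)   # dimensions are inferred from the nesting depth
n_dims = arr.ndim
shape = arr.shape
```

This only holds if all sublists at the same depth have equal length; ragged input is a different problem.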

The OP specified that "the rows are individual sublists and each row contains the elements in the sublist".
Assuming that the use of numpy is not prohibited (given that the numpy tag has been added in the OP), use vstack:
import numpy as np
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
array = np.vstack(list_of_lists)
# array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]])
or simpler (as mentioned in another answer),
array = np.array(list_of_lists)

As mentioned in the other answers, np.vstack() will stack your list of lists (nested list) row by row into a 2-dimensional array. If you are looking to convert the list of lists into a 2-dimensional numpy.ndarray, you can also use the numpy.asarray() function.
For example, if you have a list of lists named y_true that looks like:
[[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
<class 'list'>
This line y_true = np.asarray(y_true) will convert the list of lists into a 2-dimensional numpy ndarray that looks like:
[[0 1 0]
[1 0 0]
[0 0 1]
[1 0 0]
[0 1 0]
[0 0 1]
[1 0 0]]
<class 'numpy.ndarray'>
Additionally, you can also specify the dtype parameter like np.asarray(y_true, dtype = float) to have your array values in your desired data type.
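One difference worth knowing between np.asarray and np.array: when the input is already an ndarray of the requested dtype, np.asarray hands back the same object, while np.array makes a copy by default. A small sketch using the y_true example from above:

```python
import numpy as np

y_true = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0],
          [0, 1, 0], [0, 0, 1], [1, 0, 0]]

a = np.asarray(y_true, dtype=float)   # list of lists -> new 2-D float array

same_object = np.asarray(a) is a      # no copy for an existing ndarray
is_copy = np.array(a) is not a        # np.array copies by default
```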

I had a list of lists of equal length. Even then Ignacio Vazquez-Abrams's answer didn't work out for me. I got a 1-D numpy array whose elements are lists. If you faced the same problem, you can use the below method
Use numpy.vstack:
import numpy as np
np_array = np.empty((0, 4), dtype='float')
for i in range(10):
    row_data = ...  # get row_data as a list of 4 values
    np_array = np.vstack((np_array, np.array(row_data)))
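A note on the loop: every np.vstack call copies the whole array built so far, so for many rows it may be cheaper to collect the rows in a plain list and convert once at the end. A sketch of that variant, where make_row is a hypothetical stand-in for however you actually obtain each row:

```python
import numpy as np

def make_row(i):
    # Hypothetical placeholder for the real source of row data.
    return [i, i + 1, i + 2, i + 3]

rows = []
for i in range(10):
    rows.append(make_row(i))           # cheap list append, no array copy

np_array = np.array(rows, dtype=float)  # single conversion at the end
```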

Just use pandas:
import pandas as pd
list(pd.DataFrame(listofstuff).melt().values)
This only works for a list of lists. If you have a list of lists of lists, you might want to try something along the lines of:
list(pd.DataFrame(listofstuff).melt().apply(pd.Series).melt().values)

Related

using integer as index for multidimensional numpy array

I have a boolean array of shape (n_samples, n_items) which represents a set: my_set[i, j] tells whether sample i contains item j.
To populate it, the array is initialized to zeros and then receives another array of integers, with shape (n_samples, 3), giving for each sample three elements that belong to it, for instance:
my_set = np.zeros((2, 5), dtype=bool)
init_values = np.array([[1,3,4], [0,1,2]], dtype=np.int64)
So, I need to fill my_set with ones in row 0, columns 1, 3, 4, and in row 1, columns 0, 1, 2.
init_values contains valid values in the appropriate range (that is, in [0, n_items)), and each column doesn't contain duplicated items.
Some failed approaches:
I know that a list of integers (or array) can be used as an index, so I tried to use init_values directly as the index, but it failed:
my_set[init_values] = 1
File "<ipython-input-9-9b2c4d19f4f6>", line 1, in <cell line: 1>
my_set[init_values] = 1
IndexError: index 3 is out of bounds for axis 0 with size 2
I don't know why the 3 is indexing over the first axis, so I tried a second approach: "pick up all rows and index only desired columns", using a mix of slicing and integer indexing. It didn't throw an error, but it didn't work as expected: check out the shape, I expected it to be (2, 3), however...
my_set[:, init_values].shape
Out[11]: (2, 2, 3)
Not sure why it didn't work, but at least the first axis looks correct, so I tried to pick up only the first column, which is a list of integers, and therefore it is "more natural"... once again, it didn't work:
my_set[:, init_values[:,0]].shape
Out[12]: (2, 2)
I expected this shape to be (2, 1) since I wanted all rows with a single column on each, corresponding to the indexes given in init_values.
I decided to go back to integer index approach for the first axis.... and it worked:
my_set[np.arange(len(my_set)), init_values[:,0]].shape
Out[13]: (2,)
However, it only works for one column, so I need to iterate over columns to make it really work, but it looks like a good initial workaround.
Current solution
So, to solve my original problem, I wrote this:
for c in range(init_values.shape[1]):
    my_set[np.arange(len(my_set)), init_values[:,c]] = 1
# now let's check my_set is properly filled
print(my_set)
Out[14]: [[False True False True True]
[ True True True False False]]
which is exactly what I need.
Question(s):
That said, here goes my main question:
Is there a more efficient way to do this? It seems quite inefficient as the number of elements grows (for this example I used 3 but I actually need larger values).
In addition to this, I'd like to understand why using np.arange on the first index behaves differently from slicing it with ":"; I didn't expect this behavior.
Any other comments that help explain why the previous approaches failed are also welcome.
You only have column indices, so you also need to create their corresponding row indices:
>>> my_set[np.arange(len(my_set))[:, None], init_values] = 1
>>> my_set
array([[False, True, False, True, True],
[ True, True, True, False, False]])
[:, None] converts the 1-D array of row indices into a column vector, so that the row and column indices have compatible shapes for broadcasting:
>>> np.arange(len(my_set))[:, None]
array([[0],
[1]])
>>> np.broadcast_arrays(np.arange(len(my_set))[:, None], init_values)
[array([[0, 0, 0],
[1, 1, 1]]),
array([[1, 3, 4],
[0, 1, 2]], dtype=int64)]
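If you would rather not build the row indices by hand, np.put_along_axis looks like it does the same job for this case: it writes a value at the given positions along one axis, separately for each row. A sketch using the arrays from the question:

```python
import numpy as np

my_set = np.zeros((2, 5), dtype=bool)
init_values = np.array([[1, 3, 4], [0, 1, 2]], dtype=np.int64)

# For each row i, set my_set[i, init_values[i, j]] = True for every j.
np.put_along_axis(my_set, init_values, True, axis=1)
```

I have not timed it against the explicit np.arange(...)[:, None] version, so treat it as an alternative spelling rather than a guaranteed speed-up.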
The essence of slicing is that the indices given for the other dimensions are applied to every position in the slice's range along this dimension. Here is a simple test. The matrix to be indexed is as follows:
>>> ar = np.arange(4).reshape(2, 2)
>>> ar
array([[0, 1],
[2, 3]])
If you want to get the elements with indices 0 and 1 in row 0, and the elements with indices 1 and 0 in row 1, but you combine the column indices [[0, 1], [1, 0]] with a slice, you will get:
>>> ar[:, [[0, 1], [1, 0]]]
array([[[0, 1],
[1, 0]],
[[2, 3],
[3, 2]]])
This is equivalent to combining the row index from 0 to 1 with the column indices respectively:
>>> ar[0, [[0, 1], [1, 0]]]
array([[0, 1],
[1, 0]])
>>> ar[1, [[0, 1], [1, 0]]]
array([[2, 3],
[3, 2]])
In fact, broadcasting is applied implicitly here. The actual indices are:
>>> np.broadcast_arrays(0, [[0, 1], [1, 0]])
[array([[0, 0],
[0, 0]]),
array([[0, 1],
[1, 0]])]
>>> np.broadcast_arrays(1, [[0, 1], [1, 0]])
[array([[1, 1],
[1, 1]]),
array([[0, 1],
[1, 0]])]
This is not the same as the indices you actually need. Therefore, you need to manually generate the correct row indices for broadcasting:
>>> ar[[[0], [1]], [[0, 1], [1, 0]]]
array([[0, 1],
[3, 2]])
>>> np.broadcast_arrays([[0], [1]], [[0, 1], [1, 0]])
[array([[0, 0],
[1, 1]]),
array([[0, 1],
[1, 0]])]
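To tie this back to the shapes in the question, here is a short sketch that reproduces each attempt on the original my_set and init_values and records what comes out. Nothing here is new API, it only restates the cases discussed above.

```python
import numpy as np

my_set = np.zeros((2, 5), dtype=bool)
init_values = np.array([[1, 3, 4], [0, 1, 2]])

# A single index array is applied to axis 0 only, hence the IndexError for 3.
try:
    my_set[init_values]
    raised = False
except IndexError:
    raised = True

# Slice + index array: every row is combined with the whole index array.
shape_slice = my_set[:, init_values].shape

# Two index arrays are broadcast against each other and paired elementwise.
rows = np.arange(len(my_set))[:, None]
shape_paired = my_set[rows, init_values].shape
```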

Slicing Numpy array using List

Consider 2D Numpy array A and in-place function x like
A = np.arange(9).reshape(3,3)
def x(M):
    M[:,2] = 0
Now, I have a list (or 1D numpy array) L pointing to the rows I want to select, and I want to apply the function x to them like
L = [0, 1]
x(A[L, :])
where the output will be written to A. Since I used index access to A, the matrix A is not affected at all:
A = array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
What I actually need is to slice the matrix such as
x(A[:2, :])
giving me the desired output
A = array([[0, 1, 0],
[3, 4, 0],
[6, 7, 8]])
The question is now how to slice a Numpy array by the list L (either an automatic conversion of the list to a slice, or a built-in function for that), because I am not able to easily convert the list L to a slice like :2 in this case.
Note that both the matrix A and the list L are large in my problem, which is why I need in-place operations to control the available memory.
Can you modify the function so that you can pass the index list L into it?
def func(M, L):
    M[L, 2] = 0
func(A,L)
print(A)
Out:
array([[0, 1, 0],
[3, 4, 0],
[6, 7, 8]])
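The reason the original attempt left A untouched is that indexing with a list returns a copy, while a slice returns a view. A small sketch that checks this with np.shares_memory, plus a fallback for when the function itself cannot be changed (the array B and the name tmp are just for illustration):

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
L = [0, 1]

copy_shares = np.shares_memory(A, A[L, :])   # list index -> independent copy
view_shares = np.shares_memory(A, A[:2, :])  # slice -> view onto A

# Assigning through the index expression does write into A itself.
A[L, 2] = 0

# Fallback if x() must stay as it is: run it on the copy, then write back.
def x(M):
    M[:, 2] = 0

B = np.arange(9).reshape(3, 3)
tmp = B[L, :]      # temporary copy of the selected rows only
x(tmp)
B[L, :] = tmp      # copy the modified rows back into B
```

The fallback still allocates a temporary the size of the selected rows, so passing L into the function is the lighter option when memory matters.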

Add one point to each point in an numpy array

Say I have the following numpy array:
arr = numpy.array([[0,0], [1, 0], [2, 0], [3, 0]])
How do I add a single sub-array to each of the four sub-arrays? (Say I want to add [2,1] to each of them; then the output should be [[2,1], [3, 1], [4, 1], [5, 1]])
I know if it's a 1D array you can just write something like arr + 1 and it will add 1 to each element in arr, but what about in this case? I have yet to find relevant information in the documentation.
arr = arr + np.array([2, 1])
This should give you the result: broadcasting adds [2, 1] to every row of arr.

Numpy stack with unequal shapes

I've noticed that the solution to combining 2D arrays to 3D arrays through np.stack, np.dstack, or simply passing a list of arrays only works when the arrays have same .shape[0].
For instance, say I have:
print(arr)
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
It is easy to get to:
print(np.array([arr[2:4], arr[3:5]])) # same shape
[[[4 5]
[6 7]]
[[6 7]
[8 9]]]
However, if I pass a list of arrays of unequal length, I get:
print(np.array([arr[:2], arr[:3]]))
[array([[0, 1],
[2, 3]])
array([[0, 1],
[2, 3],
[4, 5]])]
How can I get to simply:
[[[0, 1]
[2, 3]]
[[0, 1]
[2, 3]
[4, 5]]]
What I've tried: a number of other Array manipulation routines.
Note: ultimately want to do this for more than 2 arrays, so np.append is probably not ideal.
Numpy arrays have to be rectangular, so what you are trying to get is not possible with a numpy array.
You need a different data structure. Which one is suitable depends on what you want to do with that data.
I've made a function that works for this problem, assuming that you are willing to pad to make the shape rectangular, and you have arbitrarily higher multidimensional arrays. It could probably be optimised further, but it's not too bad.
import numpy as np
def stack_uneven(arrays, fill_value=0.):
    '''
    Fits arrays into a single numpy array, even if they are
    different sizes. `fill_value` is the default value.
    Args:
        arrays: list of np arrays of various sizes
            (must be same rank, but not necessarily same size)
        fill_value (float, optional):
    Returns:
        np.ndarray
    '''
    sizes = [a.shape for a in arrays]
    max_sizes = np.max(list(zip(*sizes)), -1)
    # The resultant array has stacked on the first dimension
    result = np.full((len(arrays),) + tuple(max_sizes), fill_value)
    for i, a in enumerate(arrays):
        # The shape of this array `a`, turned into slices
        slices = tuple(slice(0,s) for s in sizes[i])
        # Overwrite a block slice of `result` with this array `a`
        result[i][slices] = a
    return result
The only caveat to using this is that the input must be able to be treated as a sequence of numpy arrays. So for your example of
arr = np.array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
stack_uneven([arr[:2], arr[:3]], 0)
This would give you
array([[[0, 1],
[2, 3],
[0, 0]],
[[0, 1],
[2, 3],
[4, 5]]])
But this works equally for higher dimensional things, like:
arr = [np.ones([3, 2, 2]), np.ones([2, 3, 2]), np.ones([2, 2, 3])]
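For comparison, roughly the same pad-then-stack idea can be written with np.pad and np.stack. This is only a sketch under the same assumption (all inputs have the same number of dimensions), and the name pad_and_stack is made up here.

```python
import numpy as np

def pad_and_stack(arrays, fill_value=0):
    """Hypothetical helper: pad each array at the end of every axis, then stack."""
    # Largest extent along each axis across all inputs.
    target = np.max([a.shape for a in arrays], axis=0)
    padded = [
        np.pad(a,
               [(0, int(t - s)) for s, t in zip(a.shape, target)],
               mode="constant", constant_values=fill_value)
        for a in arrays
    ]
    return np.stack(padded)   # all shapes are equal now, so stack is safe

arrs = [np.ones([3, 2, 2]), np.ones([2, 3, 2]), np.ones([2, 2, 3])]
out = pad_and_stack(arrs)
```

With the three arrays above, every input is padded to 3 x 3 x 3 before stacking along a new first axis.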
The function np.stack joins multiple arrays along a new axis, not an existing one. See:
>>> import numpy as np
>>> arr = np.array(range(10)).reshape((5,2))
>>> print(arr)
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
>>> t1 = np.array([arr[2:4], arr[3:5]])
>>> print(t1.shape)
(2, 2, 2)
It's not creating a new array of shape (4,2) which I think you're intending. Look at np.concatenate for that.
Note if you really want to use stack, the docs require all input arrays be the same shape:
Parameters: arrays : sequence of array_like Each array must have the
same shape.
So passing arrays of different shapes to np.stack will fail with a ValueError rather than produce a stacked array.
EDIT: I read too quickly. You are trying to add an axis. Still, you can't pass uneven shapes to stack. You would have to pad them all to the same shape. Example:
arr = np.array(range(10)).reshape((5,2))
print(arr)
arr_p1 = np.zeros(arr[0:3].shape)
arr_p1_src = arr[0:2]
arr_p1[:arr_p1_src.shape[0],:arr_p1_src.shape[1]] = arr_p1_src
t2 = np.array([arr_p1, arr[0:3]])
print(t2)
Output:
[[[ 0. 1.]
[ 2. 3.]
[ 0. 0.]]
[[ 0. 1.]
[ 2. 3.]
[ 4. 5.]]]
Finally, np.vstack or np.hstack can be useful if a vertical or horizontal stack is enough for you and the arrays share at least one equal dimension.

Why can't I create a numpy array like this: array([1, 2], 3)

from numpy import array
test_list = [[1,2],3]
x = array(test_list) #ValueError: setting an array element with a sequence.
Basically, I have a point with 2 coordinates and a scale and I was trying to put several on a ndarray but I can't do it right now. I could just use [1,2,3] but I'm curious about why I can't store that type of list in an array.
It's failing because the array is non-rectangular. If we change the 3 to [3, 4] then it works.
>>> array([[1, 2], [3, 4]])
array([[1, 2],
[3, 4]])
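Since the goal was to store points with 2 coordinates plus a scale, a structured dtype might be a better fit than a nested list. This is only a sketch; the field names "xy" and "scale" are my own choice and not something from the question.

```python
import numpy as np

# Each record holds a 2-element coordinate field and a scalar scale field.
point_dtype = np.dtype([("xy", float, (2,)), ("scale", float)])

pts = np.zeros(2, dtype=point_dtype)
pts["xy"] = [[1, 2], [4, 5]]   # coordinates of both points
pts["scale"] = [3, 6]          # one scale per point

coords = pts["xy"]             # regular (2, 2) float array
scales = pts["scale"]          # regular (2,) float array
```

Each field then behaves like an ordinary rectangular array, so the usual vectorized operations work per field.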
You can do
x = array([[1,2],3], dtype=object)
which gives you an array with a generic "object" dtype. But that doesn't get you very far. Even if you did
x = array([array([1,2]),3], dtype=object)
you would have x[0] be an array, but you still couldn't index x[0,0]; you'd have to do x[0][0].
It's a shame; there are places where it would be useful to do things like this and then do x.sum(1) and get [3, 3] as the result. (You can always do map(sum, x).)
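To make the limitation concrete, here is a sketch of an object array holding one array and one scalar, with the per-element sums computed in a plain Python loop since x.sum(1) is not available. Filling an empty object array slot by slot is just one way to build it.

```python
import numpy as np

# 1-D object array: each slot holds an arbitrary Python object.
x = np.empty(2, dtype=object)
x[0] = np.array([1, 2])
x[1] = 3

first = x[0][0]   # chained indexing works; x[0, 0] does not on a 1-D array

# np.sum accepts both arrays and scalars, so this covers both kinds of slot.
totals = [int(np.sum(item)) for item in x]
```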
If you really need non-rectangular arrays, you can give awkward a try:
In [1]: import awkward
In [2]: test_list = [[1,2],3]
In [3]: x = awkward.fromiter(test_list)
In [4]: x
Out[4]: <UnionArray [[1 2] 3] at 0x7f69c087c390>
In [5]: x + 1
Out[5]: <UnionArray [[2 3] 4] at 0x7f69c08075c0>
In [6]: x[0]
Out[6]: array([1, 2])
In [7]: x[0, 1]
Out[7]: 2
It behaves in many ways like a numpy array.
