Numpy stack with unequal shapes

I've noticed that combining 2D arrays into a 3D array through np.stack, np.dstack, or simply passing a list of arrays to np.array only works when the arrays have the same .shape[0].
For instance, say I have:
print(arr)
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
it is easy to get to:
print(np.array([arr[2:4], arr[3:5]])) # same shape
[[[4 5]
[6 7]]
[[6 7]
[8 9]]]
However, if I pass a list of arrays of unequal length, I get:
print(np.array([arr[:2], arr[:3]]))
[array([[0, 1],
[2, 3]])
array([[0, 1],
[2, 3],
[4, 5]])]
How can I get to simply:
[[[0, 1]
[2, 3]]
[[0, 1]
[2, 3]
[4, 5]]]
What I've tried: a number of other Array manipulation routines.
Note: ultimately want to do this for more than 2 arrays, so np.append is probably not ideal.

Numpy arrays have to be rectangular, so what you are trying to get is not possible with a numpy array.
You need a different data structure. Which one is suitable depends on what you want to do with that data.
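To make "a different data structure" concrete, here is a minimal sketch of two common choices: a plain Python list of arrays, and a 1-D object array whose elements are the sub-arrays. The variable names (pieces, ragged) are only illustrative, and which option fits depends on what you do with the data afterwards.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)
pieces = [arr[:2], arr[:3]]          # a plain list keeps each array's own shape

# A 1-D object array holding the two sub-arrays (this is not a true 3-D array).
ragged = np.empty(len(pieces), dtype=object)
for i, p in enumerate(pieces):
    ragged[i] = p

print(ragged.shape)                  # (2,)
print([p.shape for p in ragged])     # [(2, 2), (3, 2)]
```

Neither structure supports vectorised operations across the ragged axis; you would loop over the elements instead.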

I've made a function that works for this problem, assuming that you are willing to pad to make the shape rectangular, and you have arbitrarily higher multidimensional arrays. It could probably be optimised further, but it's not too bad.
import numpy as np
def stack_uneven(arrays, fill_value=0.):
    '''
    Fits arrays into a single numpy array, even if they are
    different sizes. `fill_value` is the default value.
    Args:
        arrays: list of np arrays of various sizes
            (must be same rank, but not necessarily same size)
        fill_value (float, optional):
    Returns:
        np.ndarray
    '''
    sizes = [a.shape for a in arrays]
    max_sizes = np.max(list(zip(*sizes)), -1)
    # The resultant array has stacked on the first dimension
    result = np.full((len(arrays),) + tuple(max_sizes), fill_value)
    for i, a in enumerate(arrays):
        # The shape of this array `a`, turned into slices
        slices = tuple(slice(0,s) for s in sizes[i])
        # Overwrite a block slice of `result` with this array `a`
        result[i][slices] = a
    return result
The only caveat to using this is that the input must be able to be treated as a sequence of numpy arrays. So for your example of
arr = np.array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
stack_uneven([arr[:2], arr[:3]], 0)
This would give you
array([[[0, 1],
[2, 3],
[0, 0]],
[[0, 1],
[2, 3],
[4, 5]]])
But this works equally for higher dimensional things, like:
arr = [np.ones([3, 2, 2]), np.ones([2, 3, 2]), np.ones([2, 2, 3])]

The function np.stack joins multiple arrays along a new axis, not an existing one. See:
>>> import numpy as np
>>> arr = np.array(range(10)).reshape((5,2))
>>> print(arr)
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
>>> t1 = np.array([arr[2:4], arr[3:5]])
>>> print(t1.shape)
(2, 2, 2)
It's not creating a new array of shape (4,2) which I think you're intending. Look at np.concatenate for that.
Note if you really want to use stack, the docs require all input arrays be the same shape:
Parameters: arrays : sequence of array_like Each array must have the
same shape.
So np.stack will not accept your unequal arrays: it raises a ValueError when the input shapes differ.
EDIT: I read too quickly. You are trying to add an axis. Still, you can't pass uneven shapes to stack. You would have to pad them all to the same shape. Example:
arr = np.array(range(10)).reshape((5,2))
print(arr)
arr_p1 = np.zeros(arr[0:3].shape)
arr_p1_src = arr[0:2]
arr_p1[:arr_p1_src.shape[0],:arr_p1_src.shape[1]] = arr_p1_src
t2 = np.array([arr_p1, arr[0:3]])
print(t2)
Output:
[[[ 0. 1.]
[ 2. 3.]
[ 0. 0.]]
[[ 0. 1.]
[ 2. 3.]
[ 4. 5.]]]
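The same padding idea can be sketched with np.pad, which avoids allocating and filling the zero array by hand and keeps the integer dtype. The helper name pad_and_stack is hypothetical, and the sketch assumes 2-D inputs that already share the same number of columns.

```python
import numpy as np

def pad_and_stack(arrays, fill_value=0):
    """Pad 2-D arrays with rows of fill_value to a common row count, then stack."""
    max_rows = max(a.shape[0] for a in arrays)
    padded = [
        # Pad only at the bottom of axis 0; leave the columns untouched.
        np.pad(a, ((0, max_rows - a.shape[0]), (0, 0)),
               mode="constant", constant_values=fill_value)
        for a in arrays
    ]
    return np.stack(padded)

arr = np.arange(10).reshape(5, 2)
t2 = pad_and_stack([arr[:2], arr[:3]])
print(t2.shape)  # (2, 3, 2)
```

Because the padding value is indistinguishable from real zeros in the data, you may want to keep the original row counts around if that matters downstream.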

Alternatively, np.vstack or np.hstack can be useful if a vertical or horizontal stack is enough for you and the arrays have at least one equal dimension.
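As a rough illustration of that suggestion, the two slices from the question have the same number of columns, so they can be joined along the existing first axis. The offsets bookkeeping is my own addition, in case the pieces need to be recovered later.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)
a, b = arr[:2], arr[:3]              # shapes (2, 2) and (3, 2): columns match

stacked = np.vstack([a, b])          # joins along the existing first axis
print(stacked.shape)                 # (5, 2)

# Row offsets let you split the stacked array back into the original pieces.
offsets = np.cumsum([len(a), len(b)])[:-1]
first, second = np.split(stacked, offsets)
```

Note that this produces a 2-D array, not the 3-D result the question asks for.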

Related

Appending contents of 1D numpy array to another 2D numpy array

I have three numpy arrays. The shape of the first is (413, 2), the shape of the second is (176, 2), and the shape of the third is (589,). If you'll notice, 413 + 176 = 589. What I want to accomplish is to use the 589 values of the third np array and make the first two arrays of shapes (413, 3) and (176, 3) respectively.
So, what I want is to take the values in the third np array and append them to the columns of the first and second np arrays. I can do the logic for applying to the first and then using the offset of the length of the first to continue appending to the second with the correct values. I suppose I could also combine np arrays 1 and 2, they are separated for a reason though because of my data preprocessing.
To put it visually if that helps, what I have is like this:
Array 1:
[[1 2]
[3 4]
[4 5]]
Array 2:
[[6 7]
[8 9]
[10 11]]
Array 3:
[1 2 3 4 5 6]
And what I want to have is:
Array 1:
[[1 2 1]
[3 4 2]
[4 5 3]]
Array 2:
[[6 7 4]
[8 9 5]
[10 11 6]]
I've tried using np.append, np.concatenate, and np.vstack but have not been able to achieve what I am looking for. I am relatively new to using numpy, and Python in general, so I imagine I am just using these tools incorrectly.
Many thanks for any help that can be offered! This is my first time asking a question here so if I did anything wrong or left anything out please let me know.

Split the third array using the length of array1, then horizontally stack them. You need to use either np.newaxis or array.reshape to change the dimensionality of the slice of array3.
import numpy as np
array1 = np.array(
[[1, 2],
[3, 4],
[4, 5]]
)
array2 = np.array(
[[6, 7],
[8, 9],
[10, 11]]
)
array3 = np.array([1, 2, 3, 4, 5, 6])
array13 = np.hstack([array1, array3[:len(array1), np.newaxis]])
array23 = np.hstack([array2, array3[len(array1):, np.newaxis]])
Outputs:
array13
array([[1, 2, 1],
[3, 4, 2],
[4, 5, 3]])
array23
array([[ 6, 7, 4],
[ 8, 9, 5],
[10, 11, 6]])
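A possible variation on the same idea uses np.split to cut the third array and np.column_stack to attach the pieces. This is only a sketch; part1 and part2 are illustrative names.

```python
import numpy as np

array1 = np.array([[1, 2], [3, 4], [4, 5]])
array2 = np.array([[6, 7], [8, 9], [10, 11]])
array3 = np.array([1, 2, 3, 4, 5, 6])

# Split array3 at the boundary between the two blocks of rows.
part1, part2 = np.split(array3, [len(array1)])

# column_stack treats 1-D inputs as columns, so no reshape is needed.
array13 = np.column_stack([array1, part1])
array23 = np.column_stack([array2, part2])
```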

Numpy array indexing with a List: difference between arr[:][li] and arr[:,li]

What is the explanation of the following behavior:
import numpy as np
arr = np.zeros((3, 3))
li = [1,2]
print('output1:', arr[:, li].shape)
print('output2:', arr[:][li].shape)
>>output1: (3, 2)
>>output2: (2, 3)
I would expect output2 to be equal to output1.

Let's use a different array where it's easier to see the difference:
>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
The first case arr[:, li] will select all elements from the first dimension (in this case all the rows), then index the array with [1, 2], which means just leaving out the first column:
array([[1, 2],
[4, 5],
[7, 8]])
Hence, the shape of this is (3, 2).
In the other case, arr[:] selects the whole array (it returns a view, not a copy) and doesn't change the shape, so arr[:][li] is equivalent to arr[li], hence the output shape is (2, 3). In general you should avoid double indexing an array, because it performs two separate indexing operations with an intermediate result, which is inefficient.
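The equivalence can be checked directly. This small sketch compares arr[:][li] with arr[li] and also reports whether arr[:] shares memory with arr; the variable names are mine.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
li = [1, 2]

# Indexing the full slice with li selects rows, exactly like arr[li].
same_as_rows = np.array_equal(arr[:][li], arr[li])

# np.shares_memory tells us whether the full slice reuses arr's buffer.
shares = np.shares_memory(arr, arr[:])

print(same_as_rows, shares)
print(arr[:, li].shape, arr[:][li].shape)   # (3, 2) (2, 3)
```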

You are getting the correct output.
In first line
print('output1:', arr[:, li].shape)
You are printing 2nd and 3rd element of each subarray within arr, thus getting 3 elements each containing 2 values.
In second line
print('output2:', arr[:][li].shape)
You are selecting first the whole array, then from the whole array you select 2nd and 3rd element (each containing 3 elements themselves), thus getting 2 elements each containing 3 values.

The difference can be seen if you examine this code -
import numpy as np
arr = np.arange(9).reshape(3, 3)
li = [1,2]
print('output1:', arr[:, li])
print('output2:', arr[:][li])
This gives -
[[1 2]
[4 5]
[7 8]]
and
[[3 4 5]
[6 7 8]]
When you do arr[:, [1, 2]], what you are saying is that you want to take all the rows of the array (: specifies this) and, from those, take columns 1 and 2.
On the other hand, when you do arr[:], you are referring to the full array first. From that, you then take rows 1 and 2 (the second and third rows).
Essentially, in the second case, [1, 2] refers to the row axis of the original array, while in the first case it refers to the column axis.

Numpy 3d array indexing

I have a 3d numpy array (n_samples x num_components x 2) in the example below n_samples = 5 and num_components = 7.
I have another array (indices) which is the selected component for each sample which is of shape (n_samples,).
I want to select from the data array given the indices so that the resulting array is n_samples x 2.
The code is below:
import numpy as np
np.random.seed(77)
data=np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])
#how can I select indices from the data array?
For example for data 0, the selected component should be the 0th and for data 1 the selected component should be 1.
Note that I can't use any for loops because I'm using it in Theano and the solution should be solely based on numpy.

Is this what you are looking for?
In [36]: data[np.arange(data.shape[0]),indices,:]
Out[36]:
array([[7, 4],
[7, 3],
[4, 5],
[8, 2],
[5, 8]])

To get component #0, use
data[:, 0]
i.e. we get every entry on axis 0 (samples), and only entry #0 on axis 1 (components), and implicitly everything on the remaining axes.
This can be easily generalized to
data[:, indices]
to select all relevant components.
But what OP really wants is just the diagonal of this array, i.e. (data[0, indices[0]], data[1, indices[1]], ...). The diagonal of a high-dimensional array can be extracted using the diagonal function:
>>> np.diagonal(data[:, indices])
array([[7, 7, 4, 8, 5],
[4, 3, 5, 2, 8]])
(You may need to transpose the result.)

You have a variety of ways to do so, but this is my loop recommendation:
selection = np.array([ datum[indices[k]] for k,datum in enumerate(data)])
The resulting array, selection, has the desired shape.
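For comparison, the three approaches above (integer-array indexing, the transposed diagonal, and the list comprehension) can be checked against each other. This sketch uses its own random data with an arbitrary seed, so the values differ from the question's.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, only for the demo
data = rng.integers(0, 10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Pair each sample index with its chosen component.
by_indexing = data[np.arange(data.shape[0]), indices]

# data[:, indices] has shape (5, 5, 2); its diagonal comes out as (2, 5).
by_diagonal = np.diagonal(data[:, indices]).T

# Explicit Python loop over samples.
by_loop = np.array([datum[indices[k]] for k, datum in enumerate(data)])

print(by_indexing.shape)             # (5, 2)
```

The indexing version avoids building the intermediate (5, 5, 2) array that the diagonal approach needs.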

understanding numpy's dstack function

I have some trouble understanding what numpy's dstack function is actually doing. The documentation is rather sparse and just says:
Stack arrays in sequence depth wise (along third axis).
Takes a sequence of arrays and stack them along the third axis
to make a single array. Rebuilds arrays divided by dsplit.
This is a simple way to stack 2D arrays (images) into a single
3D array for processing.
So either I am really stupid and the meaning of this is obvious or I seem to have some misconception about the terms 'stacking', 'in sequence', 'depth wise' or 'along an axis'. However, I was of the impression that I understood these terms in the context of vstack and hstack just fine.
Let's take this example:
In [193]: a
Out[193]:
array([[0, 3],
[1, 4],
[2, 5]])
In [194]: b
Out[194]:
array([[ 6, 9],
[ 7, 10],
[ 8, 11]])
In [195]: dstack([a,b])
Out[195]:
array([[[ 0, 6],
[ 3, 9]],
[[ 1, 7],
[ 4, 10]],
[[ 2, 8],
[ 5, 11]]])
First of all, a and b don't have a third axis so how would I stack them along 'the third axis' to begin with? Second of all, assuming a and b are representations of 2D-images, why do I end up with three 2D arrays in the result as opposed to two 2D-arrays 'in sequence'?

It's easier to understand what np.vstack, np.hstack and np.dstack* do by looking at the .shape attribute of the output array.
Using your two example arrays:
print(a.shape, b.shape)
# (3, 2) (3, 2)
np.vstack concatenates along the first dimension...
print(np.vstack((a, b)).shape)
# (6, 2)
np.hstack concatenates along the second dimension...
print(np.hstack((a, b)).shape)
# (3, 4)
and np.dstack concatenates along the third dimension.
print(np.dstack((a, b)).shape)
# (3, 2, 2)
Since a and b are both two dimensional, np.dstack expands them by inserting a third dimension of size 1. This is equivalent to indexing them in the third dimension with np.newaxis (or alternatively, None) like this:
print(a[:, :, np.newaxis].shape)
# (3, 2, 1)
If c = np.dstack((a, b)), then c[:, :, 0] == a and c[:, :, 1] == b.
You could do the same operation more explicitly using np.concatenate like this:
print(np.concatenate((a[..., None], b[..., None]), axis=2).shape)
# (3, 2, 2)
* Importing the entire contents of a module into your global namespace using import * is considered bad practice for several reasons. The idiomatic way is to import numpy as np.

Let x == dstack([a, b]). Then x[:, :, 0] is identical to a, and x[:, :, 1] is identical to b. In general, when dstacking 2D arrays, dstack produces an output such that output[:, :, n] is identical to the nth input array.
If we stack 3D arrays rather than 2D:
x = numpy.zeros([2, 2, 3])
y = numpy.ones([2, 2, 4])
z = numpy.dstack([x, y])
then z[:, :, :3] would be identical to x, and z[:, :, 3:7] would be identical to y.
As you can see, we have to take slices along the third axis to recover the inputs to dstack. That's why dstack behaves the way it does.
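The slicing relationship described above can be verified on the arrays from the question. The comparison with np.stack(..., axis=2) is my own addition to show that, for 2-D inputs, dstack amounts to stacking along a new last axis.

```python
import numpy as np

a = np.array([[0, 3], [1, 4], [2, 5]])
b = np.array([[6, 9], [7, 10], [8, 11]])

c = np.dstack((a, b))
print(c.shape)                                        # (3, 2, 2)

# Slicing along the third axis recovers the inputs.
print(np.array_equal(c[:, :, 0], a))                  # True
print(np.array_equal(c[:, :, 1], b))                  # True

# For 2-D inputs, dstack matches stacking along a new axis 2.
print(np.array_equal(c, np.stack((a, b), axis=2)))    # True
```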

I'd like to take a stab at visually explaining this (even though the accepted answer makes enough sense, it took me a few seconds to rationalise this to my mind).
If we imagine the 2d-arrays as a list of lists, where the 1st axis gives one of the inner lists and the 2nd axis gives the value in that list, then the visual representation of the OP's arrays will be this:
a = [
[0, 3],
[1, 4],
[2, 5]
]
b = [
[6, 9],
[7, 10],
[8, 11]
]
# Shape of each array is [3,2]
Now, according to the current documentation, the dstack function adds a 3rd axis, which means each of the arrays end up looking like this:
a = [
[[0], [3]],
[[1], [4]],
[[2], [5]]
]
b = [
[[6], [9]],
[[7], [10]],
[[8], [11]]
]
# Shape of each array is [3,2,1]
Now, stacking both these arrays in the 3rd dimension simply means that the result should look, as expected, like this:
dstack([a,b]) = [
[[0, 6], [3, 9]],
[[1, 7], [4, 10]],
[[2, 8], [5, 11]]
]
# Shape of the combined array is [3,2,2]
Hope this helps.

Because you mention "images", I think this example would be useful. If you're using Keras to train a 2D convolution network with the input X, then it is best to keep X with the dimension (#images, dim1ofImage, dim2ofImage).
image1 = np.array([[4,2],[5,5]])
image2 = np.array([[3,1],[6,7]])
image1 = image1.reshape(1,2,2)
image2 = image2.reshape(1,2,2)
X = np.stack((image1,image2),axis=1)
X
array([[[[4, 2],
[5, 5]],
[[3, 1],
[6, 7]]]])
np.shape(X)
X = X.reshape((2,2,2))
X
array([[[4, 2],
[5, 5]],
[[3, 1],
[6, 7]]])
X[0] # image 1
array([[4, 2],
[5, 5]])
X[1] # image 2
array([[3, 1],
[6, 7]])
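If the goal is simply the (#images, dim1ofImage, dim2ofImage) layout, a shorter route is to stack the 2-D images along a new leading axis, which skips both reshape calls. This is a sketch of that alternative, not the original poster's code.

```python
import numpy as np

image1 = np.array([[4, 2], [5, 5]])
image2 = np.array([[3, 1], [6, 7]])

# axis=0 inserts a new leading axis that indexes the images.
X = np.stack((image1, image2), axis=0)
print(X.shape)                            # (2, 2, 2)
```

Here X[0] is image 1 and X[1] is image 2, as in the example above.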

List of lists into numpy array

How do I convert a simple list of lists into a numpy array? The rows are individual sublists and each row contains the elements in the sublist.

If your list of lists contains lists with varying number of elements then the answer of Ignacio Vazquez-Abrams will not work. Instead there are at least 3 options:
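1) Make an array of arrays (recent NumPy versions need dtype=object for ragged input and otherwise raise a ValueError):
x=[[1,2],[1,2,3],[1]]
y=numpy.array([numpy.array(xi) for xi in x], dtype=object)
type(y)
>>><class 'numpy.ndarray'>
type(y[0])
>>><class 'numpy.ndarray'>
2) Make an array of lists (again with dtype=object):
x=[[1,2],[1,2,3],[1]]
y=numpy.array(x, dtype=object)
type(y)
>>><class 'numpy.ndarray'>
type(y[0])
>>><class 'list'>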
3) First make the lists equal in length:
x=[[1,2],[1,2,3],[1]]
length = max(map(len, x))
y=numpy.array([xi+[None]*(length-len(xi)) for xi in x])
y
>>>array([[1, 2, None],
>>> [1, 2, 3],
>>> [1, None, None]], dtype=object)
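A variation on option 3, sketched here under the assumption that the data is numeric, pads with NaN instead of None so the result is a regular float array rather than an object array. It uses itertools.zip_longest from the standard library.

```python
import numpy as np
from itertools import zip_longest

x = [[1, 2], [1, 2, 3], [1]]

# zip_longest pads the shorter lists column-wise; transpose to get rows back.
y = np.array(list(zip_longest(*x, fillvalue=np.nan))).T
print(y.shape)   # (3, 3)
print(y.dtype)   # float64
```

With a float array you can then use NaN-aware functions such as np.nanmean on the padded rows.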

>>> numpy.array([[1, 2], [3, 4]])
array([[1, 2], [3, 4]])

As this is the top search on Google for converting a list of lists into a Numpy array, I'll offer the following despite the question being 4 years old:
>>> x = [[1, 2], [1, 2, 3], [1]]
>>> y = numpy.hstack(x)
>>> print(y)
[1 2 1 2 3 1]
When I first thought of doing it this way, I was quite pleased with myself because it's soooo simple. However, after timing it with a larger list of lists, it is actually faster to do this:
>>> y = numpy.concatenate([numpy.array(i) for i in x])
>>> print(y)
[1 2 1 2 3 1]
Note that @Bastiaan's answer #1 doesn't make a single continuous list, hence I added the concatenate.
Anyway... I prefer the hstack approach for its elegant use of Numpy.

It's as simple as:
>>> lists = [[1, 2], [3, 4]]
>>> np.array(lists)
array([[1, 2],
[3, 4]])

Again, after searching for the problem of converting nested lists with N levels into an N-dimensional array I found nothing, so here's my way around it:
import numpy as np
new_array=np.array([[[coord for coord in xk] for xk in xj] for xj in xi], ndmin=3) #this case for N=3

The OP specified that "the rows are individual sublists and each row contains the elements in the sublist".
Assuming that the use of numpy is not prohibited (given that the flair numpy has been added in the OP), use vstack:
import numpy as np
list_of_lists= [[1, 2, 3], [4, 5, 6], [7 ,8, 9]]
array = np.vstack(list_of_lists)
# array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]])
or simpler (as mentioned in another answer),
array = np.array(list_of_lists)

As mentioned in the other answers, np.vstack() will stack the sublists of your list of lists (nested list) as the rows of an array. If you are looking to convert the list of lists into a 2-dimensional numpy.ndarray, you can also use the numpy.asarray() function.
For example, if you have a list of lists named y_true that looks like:
[[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
<class 'list'>
This line y_true = np.asarray(y_true) will convert the list of lists into a 2-dimensional numpy ndarray that looks like:
[[0 1 0]
[1 0 0]
[0 0 1]
[1 0 0]
[0 1 0]
[0 0 1]
[1 0 0]]
<class 'numpy.ndarray'>
Additionally, you can also specify the dtype parameter like np.asarray(y_true, dtype = float) to have your array values in your desired data type.

I had a list of lists of equal length. Even then, Ignacio Vazquez-Abrams's answer didn't work for me: I got a 1-D numpy array whose elements are lists. If you face the same problem, you can use the method below.
Use numpy.vstack
import numpy as np
np_array = np.empty((0,4), dtype='float')
for i in range(10):
    row_data = ... # get row_data as list
    np_array = np.vstack((np_array, np.array(row_data)))
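Since np.vstack copies the whole array on every call, a common alternative is to collect the rows in a Python list and convert once at the end. In this sketch, get_row is a hypothetical stand-in for however a row is actually produced.

```python
import numpy as np

def get_row(i):
    # Hypothetical placeholder: returns one row of four values as a list.
    return [i, i + 1, i + 2, i + 3]

rows = [get_row(i) for i in range(10)]   # cheap list appends
np_array = np.array(rows, dtype=float)   # one conversion at the end
print(np_array.shape)                     # (10, 4)
```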

Just use pandas
list(pd.DataFrame(listofstuff).melt().values)
this only works for a list of lists
if you have a list of list of lists you might want to try something along the lines of
list(pd.DataFrame(listofstuff).melt().apply(pd.Series).melt().values)
