Collapse nested array of arrays - python

I want to take an array with shape (N,) and dtype=object, whose elements are arrays that all share the same shape (call it shape), and create an array with shape == (N,) + shape. Does anyone know the best way to do this? Here's an example.
import numpy as np
array = np.empty(4, dtype=object)
for i in range(4):
    array[i] = np.ones([3, 2])
array = np.array(array.tolist())
print(array.dtype)
# float64
print(array.shape)
# (4, 3, 2)

If you already know the shape of your inner arrays (here, (3,2)), you could simplify the whole process as
subshape = (3,2)
a = np.empty((N,) + subshape, dtype=object)
a[:] = np.ones(subshape)
That will let you avoid unnecessary conversions to/from lists.
Now, assuming you have a (N,) object array a where each element is a subshape float array, you could do:
a = np.vstack(a)
a.shape = [N,] + list(subshape)
or more simply:
a = np.array(a.tolist(), dtype=float)
The .tolist() conversion might not be very efficient, though.
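As a possible alternative to the list round-trip, np.stack can join the inner arrays along a new leading axis. This is only a minimal sketch, assuming every element of the object array has the same shape; N = 4 and subshape = (3, 2) are illustrative values, not taken from a real dataset.

```python
import numpy as np

N, subshape = 4, (3, 2)  # illustrative sizes (assumption)

# Build an (N,) object array whose elements are separate float arrays.
a = np.empty(N, dtype=object)
for i in range(N):
    a[i] = np.ones(subshape)

# Stack the inner arrays along a new first axis.
stacked = np.stack(list(a))
print(stacked.shape, stacked.dtype)  # (4, 3, 2) float64
```

Unlike np.vstack, this needs no reshape afterwards, because the new axis is added in front rather than merged into the first existing axis.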

Indexing array with array on numpy

It is similar to some questions around SO, but I don't quite understand the trick to get what I want.
I have two arrays,
arr of shape (x, y, z)
indexes of shape (x, y) which hold indexes of interest for z.
For each value of indexes I want to get the actual value in arr where:
arr.x == indexes.x
arr.y == indexes.y
arr.z == indexes[x,y]
This would give an array of shape(x,y) similar to indexes' shape.
For example:
arr = np.arange(99)
arr = arr.reshape(3,3,11)
indexes = np.asarray([
[0,2,2],
[1,2,3],
[3,2,10]])
# indexes.shape == (3,3)
# Example for the first element to be computed
first_element = arr[0,0,indexes[0,0]]
With the above indexes, the expected arrays would look like:
expected_result = np.asarray([
[0,13,24],
[34,46,58],
[69,79,98]])
I tried elements = np.take(arr, indexes, axis=z)
but it gives an array of shape (x, y, x, y)
I also tried things like elements = arr[indexes, indexes,:] but I don't get what I wish.
I saw a few answers involving transposing indexes and transforming it into tuples but I don't understand how it would help.
Note: I'm a bit new to numpy so I don't fully understand indexing yet.
How would you solve this numpy style ?
This can be done using np.take_along_axis:
import numpy as np
#sample data
np.random.seed(0)
arr = np.arange(3*4*2).reshape(3, 4, 2) # 3d array
idx = np.random.randint(0, 2, (3, 4)) # array of indices
out = np.squeeze(np.take_along_axis(arr, idx[..., np.newaxis], axis=-1))
In this code, the array of indices is given one more axis so that it can be broadcast against the array arr from which we are making the selection. Then, since the return value of np.take_along_axis has the same shape as the array of indices, we need to remove this extra dimension using np.squeeze.
Another option is to use np.choose, but in this case the axis along which you are making selections must be moved to be the first axis of the array:
out = np.choose(idx, np.moveaxis(arr, -1, 0))
The solution here should work for you: Indexing 3d numpy array with 2d array
Adapted to your code:
ax_0 = np.arange(arr.shape[0])[:,None]
ax_1 = np.arange(arr.shape[1])[None,:]
new_array = arr[ax_0, ax_1, indexes]
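To check this against the data from the question, here is a small sketch that runs the broadcast-index approach and, for comparison, np.take_along_axis on the same input. The names rows, cols, by_indexing and by_take are illustrative only.

```python
import numpy as np

arr = np.arange(99).reshape(3, 3, 11)
indexes = np.array([[0, 2, 2],
                    [1, 2, 3],
                    [3, 2, 10]])

# Index grids for the first two axes; they broadcast to shape (3, 3).
rows = np.arange(arr.shape[0])[:, None]
cols = np.arange(arr.shape[1])[None, :]
by_indexing = arr[rows, cols, indexes]

# Same selection with take_along_axis; [..., 0] drops the helper axis.
by_take = np.take_along_axis(arr, indexes[..., None], axis=-1)[..., 0]

expected = np.array([[0, 13, 24],
                     [34, 46, 58],
                     [69, 79, 98]])
print(np.array_equal(by_indexing, expected))  # True
print(np.array_equal(by_take, expected))      # True
```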
You can perform such an operation with np.take_along_axis. It selects along a single axis, so one way to apply it here is to reshape your input and indices.
The operation you are looking to perform is:
out[i, j] = arr[i, j, indices[i, j]]
However, we are forced to reshape both arr and indices, i.e. map (i, j) to k, such that we can apply np.take_along_axis. The following operation will take place:
out[k] = arr[k, indices[k]] # indexing along axis=1
The actual usage here comes down to:
>>> put = np.take_along_axis(arr.reshape(9, 11), indices.reshape(9, 1), axis=1)
>>> put
array([[ 0],
       [13],
       [24],
       [34],
       [46],
       [58],
       [69],
       [79],
       [98]])
Then reshape back to the shape of indices:
>>> put.reshape(indices.shape)
array([[ 0, 13, 24],
       [34, 46, 58],
       [69, 79, 98]])

2D numpy array showing as 1D

I have a numpy ndarray train_data of length 200, where every row is another ndarray of length 10304.
However when I print np.shape(train_data), I get (200, 1), and when I print np.shape(train_data[0]) I get (1, ), and when I print np.shape(train_data[0][0]) I get (10304, ).
I am quite confused with this behavior as I supposed the first np.shape(train_data) should return (200, 10304).
Can someone explain to me why this is happening, and how I could get the array into the shape (200, 10304)?
This is because the arrays are constructed to be arrays of objects. Basically each element in the array is pointing to another array of size (1, ) which points to another array of size (10304, ). This is not equivalent to a normal ndarray in numpy so the shape is not recognized correctly. You can check this by looking at the dtypes.
To replicate what you see:
import numpy as np
arr = np.empty(200, dtype='object')
for i in range(200):
    temp_arr = np.empty(1, dtype='object')
    temp_arr[0] = np.zeros(10304)
    arr[i] = temp_arr
print(arr.shape)
print(arr[0].shape)
print(arr[0][0].shape)
(200,)
(1,)
(10304,)
To get the (200, 10304) array back you need to "unpack" them:
new_arr = np.array([x[0] for x in arr])
#(200, 10304)
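The dtype check mentioned above can be sketched as follows. It rebuilds the same nested structure (an assumption about how train_data was produced), prints the dtype at each level, and then unpacks with np.stack, which is one possible way to join the inner arrays.

```python
import numpy as np

# Recreate the nested layout: object array -> object array -> float array.
arr = np.empty(200, dtype=object)
for i in range(200):
    inner = np.empty(1, dtype=object)
    inner[0] = np.zeros(10304)
    arr[i] = inner

# Only the innermost level holds numeric data.
print(arr.dtype, arr[0].dtype, arr[0][0].dtype)  # object object float64

# Pull out the single element of each inner wrapper and stack the rows.
new_arr = np.stack([x[0] for x in arr])
print(new_arr.shape, new_arr.dtype)  # (200, 10304) float64
```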
I'm not sure why that's happening; try reshaping the array:
B = np.reshape(A, (-1, 2))

About NumPy: a = np.array([1,2,3,4]); print(a.shape[0]). Why does it output 4?

import numpy as np
a = np.array([1,2,3,4])
print(a.shape[0])
Why does it output 4?
The array [1,2,3,4] should have 1 row, I think, so can anyone explain the reason to me?
Because
print(a.shape) # -> (4,)
What you think (or want?) to have is
a = np.array([[1],[2],[3],[4]])
print(a.shape) # -> (4, 1)
or rather (?)
a = np.array([[1, 2 , 3 , 4]])
print(a.shape) # -> (1, 4)
If you print a.ndim you'll get 1. That means that a is a one-dimensional array (has rank 1 in numpy terminology), with axis length = 4. It's different from a 2D matrix with a single row or column (rank 2).
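The difference can be seen side by side in a short sketch; the names row and col are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
row = a.reshape(1, 4)  # 2D, a single row
col = a.reshape(4, 1)  # 2D, a single column

for name, x in [("a", a), ("row", row), ("col", col)]:
    print(name, x.ndim, x.shape)
# a 1 (4,)
# row 2 (1, 4)
# col 2 (4, 1)
```

For the 1D array, shape[0] is the only axis length there is, which is why a.shape[0] is 4 and a.shape[1] would raise an IndexError.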
More on ranks
Related questions:
numpy: 1D array with various shape
Python: Differentiating between row and column vectors
The shape attribute for numpy arrays returns the dimensions of the array. If a has n rows and m columns, then a.shape is (n,m). So a.shape[0] is n and a.shape[1] is m.
The shape attribute of a numpy array returns the size of the array along each dimension. So, when you create an array using,
a = np.array([1,2,3,4])
you get a one-dimensional array with 4 elements. You can check it by printing the shape,
print(a.shape) #(4,)
So, what you get is NOT a 1x4 matrix. If you want that, do,
a = np.array([1,2,3,4]).reshape((1,4))
print(a.shape)
Or even better,
a = np.array([[1,2,3,4]])
a = np.array([1, 2, 3, 4])
By doing this, you get a as an ndarray, and it is a one-dimensional array. Here, the shape (4,) means the array is indexed by a single index which runs from 0 to 3. It is different from multi-dimensional arrays.
You can refer to more help from this link Difference between numpy.array shape (R, 1) and (R,).

Delete 2d subarray from 3d array in numpy

In numpy I have a 3d array and I would like to remove some of the 2d subarrays. Think about it like this:
import numpy as np
r = range(27)
arr = np.reshape(r, (3,3,3))
rm_idx = [[0,1,2],[0,0,2]]  # "del" is a reserved word in Python, so it cannot be a variable name
flatSeam = np.ravel_multi_index(rm_idx, arr.shape)
arr = np.delete(arr, flatSeam)
So at the end I would like to have an array of the shape (3,2,3) without the elements 00, 10, 22 from the original array. My problem is that I cannot use ravel_multi_index for this, because my indices are 2d and the array shape is 3d, so the wrong indices are calculated (the code above also does not execute because the indices array and the shape have to be the same size).
Do you have any ideas how I can achieve this?
Here's an approach using advanced-indexing -
# arr: Input array, rm_idx : 2-row list/array of indices to be removed
m,n,p = arr.shape
mask = np.asarray(rm_idx[1])[:,None] != np.arange(n)
out = arr[np.arange(m)[:,None],np.where(mask)[1].reshape(m,-1)]
Alternatively, with boolean-indexing -
out = arr.reshape(-1,p)[mask.ravel()].reshape(m,-1,p)
A bit less memory-intensive approach as we try to avoid creating 2D mask -
vmask = ~np.in1d(np.arange(m*n),rm_idx[1] + n*np.arange(m))
out = arr.reshape(-1,p)[vmask].reshape(m,-1,p)
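For reference, here is a sketch of the flat-mask idea run on the data from the question. It assumes exactly one (i, j) pair is removed for each i, so that every block keeps n - 1 rows. It uses np.ravel_multi_index on the first two axes only and np.isin, which newer NumPy releases recommend over np.in1d; the names flat and keep are illustrative.

```python
import numpy as np

arr = np.arange(27).reshape(3, 3, 3)
rm_idx = [[0, 1, 2], [0, 0, 2]]  # remove arr[0, 0], arr[1, 0] and arr[2, 2]

m, n, p = arr.shape
# Flat positions of the rows to drop in the (m*n, p) view of arr.
flat = np.ravel_multi_index(tuple(rm_idx), (m, n))
keep = ~np.isin(np.arange(m * n), flat)

# Drop those rows, then restore the 3D layout with one fewer row per block.
out = arr.reshape(-1, p)[keep].reshape(m, n - 1, p)
print(out.shape)  # (3, 2, 3)
```

Passing only the first two dimensions of the shape to ravel_multi_index avoids the size mismatch described in the question.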

Append numpy array into an element

I have a Numpy array of shape (5,5,3,2). I want to take the element (1,4) of that matrix, which is also a matrix of shape (3,2), and add an element to it -so it becomes a (4,2) array.
The code I'm using is the following:
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype = object) #So I can have different size sub-matrices
a[2][3] = np.append(a[2][3],[[1.0,1.0]],axis=0) #a[2][3] shape = (3,2)
I'm always obtaining the error:
ValueError: could not broadcast input array from shape (4,2) into shape (3,2)
I understand that the shape returned by the np.append function is not the same as the a[2][3] sub-array, but I thought that the dtype=object would solve my problem. However, I need to do this. Is there any way to go around this limitation?
I also tried to use the insert function but I don't know how could I add the element in the place I want.
Make sure you understand what you have produced. That requires checking the shape and dtype, and possibly looking at the values
In [29]: a = np.random.rand(5,5,3,2)
In [30]: b=np.array(a, dtype=object)
In [31]: a.shape
Out[31]: (5, 5, 3, 2) # a is a 4d array
In [32]: a.dtype
Out[32]: dtype('float64')
In [33]: b.shape
Out[33]: (5, 5, 3, 2) # so is b
In [34]: b.dtype
Out[34]: dtype('O')
In [35]: b[2,3].shape
Out[35]: (3, 2)
In [36]: c=np.append(b[2,3],[[1,1]],axis=0)
In [37]: c.shape
Out[37]: (4, 2)
In [38]: c.dtype
Out[38]: dtype('O')
b[2][3] is also an array. b[2,3] is the proper numpy way of indexing 2 dimensions.
I suspect you wanted b to be a (5,5) array containing arrays (as objects), and you think that you can simply replace one of those with a (4,2) array. But the b constructor simply changes the floats of a to objects, without changing the shape (or 4d nature) of b.
I could construct a (5,5) object array, and fill it with values from a. And then replace one of those values with a (4,2) array:
In [39]: B=np.empty((5,5),dtype=object)
In [40]: for i in range(5):
    ...:     for j in range(5):
    ...:         B[i,j]=a[i,j,:,:]
    ...:
In [41]: B.shape
Out[41]: (5, 5)
In [42]: B.dtype
Out[42]: dtype('O')
In [43]: B[2,3]
Out[43]:
array([[ 0.03827568, 0.63411023],
[ 0.28938383, 0.7951006 ],
[ 0.12217603, 0.304537 ]])
In [44]: B[2,3]=c
In [46]: B[2,3].shape
Out[46]: (4, 2)
This constructor for B is a bit crude. I've answered other questions about creating/filling object arrays, but I'm not going to take the time here to streamline this case. It's for illustration purposes only.
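One possible way to write the same construction more compactly is with np.ndindex. This is a sketch rather than the streamlined version alluded to above; the .copy() call is an added assumption so that each cell owns its data instead of being a view into a.

```python
import numpy as np

a = np.random.rand(5, 5, 3, 2)

# (5, 5) object array: each cell holds its own (3, 2) float array.
B = np.empty(a.shape[:2], dtype=object)
for idx in np.ndindex(*B.shape):
    B[idx] = a[idx].copy()

# Assigning to a single cell rebinds the stored object, so its shape may differ.
B[2, 3] = np.append(B[2, 3], [[1.0, 1.0]], axis=0)
print(B.shape, B[2, 3].shape, B[0, 0].shape)  # (5, 5) (4, 2) (3, 2)
```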
In an array of object, any element can be indeed an array (or any kind of object).
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object)
# Assign an 1D array to the array element ``a[2][3][0][0]``:
a[2][3][0][0] = np.arange(10)
a[2][3][0][0][9] # 9
However a[2][3] is not an array element, it is a whole array.
a[2][3].ndim # 2
Therefore when you do a[2][3] = (something) you are using broadcasting instead of assigning an element: numpy tries to replace the content of the subarray a[2][3] and fails because of shape mismatch. The memory layout of numpy arrays does not allow to change the shape of subarrays.
Edit: Instead of using numpy arrays you could use nested lists. These nested lists can have arbitrary sizes. Note that both memory usage and access time are higher than with a numpy array.
import numpy as np
a = np.random.rand(5,5,3,2)
a = np.array(a, dtype=object)
b = np.append(a[2][3], [[1.0,1.0]],axis=0)
a_list = a.tolist()
a_list[2][3] = b.tolist()
The problem here is that you are trying to assign to a[2][3].
Make a new array instead:
new_array = np.append(a[2][3],np.array([[1.0,1.0]]),axis=0)
