Numpy array of numpy arrays has 1D shape - python

I have two numpy arrays of arrays (A and B). They look something like this when printed:
A:
[array([0, 0, 0]) array([0, 0, 0]) array([1, 0, 0]) array([0, 0, 0])
array([0, 0, 0]) array([0, 0, 0]) array([0, 0, 0]) array([0, 0, 0])
array([0, 0, 0]) array([0, 0, 0]) array([0, 0, 1]) array([0, 0, 0])
array([1, 0, 0]) array([0, 0, 1]) array([0, 0, 0]) array([0, 0, 0])
array([0, 0, 0]) array([1, 0, 0]) array([0, 0, 1]) array([0, 0, 0])]
B:
[[ 4.302135e-01 4.320091e-01 4.302135e-01 4.302135e-01
1.172584e+08]
[ 4.097128e-01 4.097128e-01 4.077675e-01 4.077675e-01
4.397120e+07]
[ 3.796353e-01 3.796353e-01 3.778396e-01 3.778396e-01
2.643200e+07]
[ 3.871173e-01 3.890626e-01 3.871173e-01 3.871173e-01
2.161040e+07]
[ 3.984899e-01 4.002856e-01 3.984899e-01 3.984899e-01
1.836240e+07]
[ 4.227315e-01 4.246768e-01 4.227315e-01 4.227315e-01
1.215760e+07]
[ 4.433817e-01 4.451774e-01 4.433817e-01 4.433817e-01
9.340800e+06]
[ 4.620867e-01 4.638823e-01 4.620867e-01 4.620867e-01
1.173760e+07]]
type(A), type(A[0]), type(B), type(B[0]) are all <class 'numpy.ndarray'>.
However, A.shape is (20,), while B.shape is (8, 5).
Question 1: Why is A.shape one-dimensional, and how do I make it two-dimensional like B.shape? They're both arrays of arrays, right?
Question 2, possibly related to Q1: Why does printing A show the calls of array(), while printing B doesn't, and why do the elements of the subarrays of B not have commas in-between them?
Thanks in advance.

A.dtype is O (object); B.dtype is float.
A is a 1d array that contains objects, which happen to be arrays. They could just as well be lists or None.
B is a 2d array of floats. Indexing one row of B gives a 1d array.
So A[0] and B[0] can appear to produce the same thing, but the selection process is different.
Try np.concatenate(A), or np.vstack(A). Both of these then treat A as a list of arrays, and join them either in 1 or 2d.
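To make the difference concrete, here is a minimal sketch. The arrays are small hypothetical stand-ins for the question's A and B (the real ones are not available), so only the dtypes and shapes matter:

```python
import numpy as np

# Hypothetical stand-ins for the question's A and B.
A = np.empty(4, dtype=object)        # 1d container of Python objects
for i in range(4):
    A[i] = np.zeros(3, dtype=int)    # each element happens to be an array
B = np.zeros((4, 3))                 # genuine 2d numeric array

print(A.dtype, A.shape)              # object (4,)
print(B.dtype, B.shape)              # float64 (4, 3)

flat = np.concatenate(A)             # subarrays joined end to end
rows = np.vstack(A)                  # one row per subarray
print(flat.shape, rows.shape)        # (12,) (4, 3)
```

Both results are ordinary integer arrays, no longer object dtype.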
Converting object arrays to regular arrays comes up quite often.
"Converting a 3D List to a 3D NumPy array" is a little more general than what you need, but gives a lot of useful information.
See also "Convert a numpy array of lists to a numpy array".
==================
In [28]: A=np.empty((5,),object)
In [31]: A
Out[31]: array([None, None, None, None, None], dtype=object)
In [32]: for i in range(5):A[i]=np.zeros((3,),int)
In [33]: A
Out[33]:
array([array([0, 0, 0]), array([0, 0, 0]), array([0, 0, 0]),
array([0, 0, 0]), array([0, 0, 0])], dtype=object)
In [34]: print(A)
[array([0, 0, 0]) array([0, 0, 0]) array([0, 0, 0]) array([0, 0, 0])
array([0, 0, 0])]
In [35]: np.vstack(A)
Out[35]:
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
Edit: np.stack(A) can also join the arrays, on a new leading axis.
If the subarrays differ in shape, these 'stack' functions will raise an error. It's up to you to find the problem array(s).
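One way to track down the offending subarray(s) is to compare each shape against the most common one. This is only a sketch: the array A below is a made-up example with one deliberately short element, and "most common shape" is an assumption about which shape is the intended one.

```python
import numpy as np
from collections import Counter

# Hypothetical object array in which one subarray has the wrong length.
A = np.empty(4, dtype=object)
A[0] = np.zeros(3, dtype=int)
A[1] = np.zeros(3, dtype=int)
A[2] = np.zeros(2, dtype=int)    # the odd one out
A[3] = np.zeros(3, dtype=int)

try:
    np.stack(A)
except ValueError as err:
    print("stack failed:", err)

# Assume the most frequent shape is the intended one and flag the rest.
shapes = [np.shape(a) for a in A]
common_shape, _ = Counter(shapes).most_common(1)[0]
bad = [i for i, s in enumerate(shapes) if s != common_shape]
print(common_shape, bad)         # (3,) [2]
```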

Related

How can I add a 1D array to a segment of a 2D array?

I have a 2D NumPy array filled with zeroes (placeholder values). I would like to add a 1D array filled with ones and zeroes to a part of it. eg.
2D array:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
1D array:
array([1, 0, 1])
Desired end product: I want the 1D array to start at position [1, 2] (row 1, column 2):
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 1],
[0, 0, 0, 0, 0]])
Or an insertion in any other position it could reasonably fit in. I have tried to do it with boolean masks but have not had any luck creating one in the correct shape. I have also tried flattening the 2D array, but couldn't figure out how to replace the values in the correct space.
You can indeed flatten the array and create a sequence of positions where you will insert your 1D array segment:
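>>> import numpy as np
>>> x = np.zeros((3, 5), dtype=int)   # the 2D array from the question
>>> segment = np.array([1, 0, 1])     # the 1D array to insert
>>> pos = [1, 2]                      # (row, column) where the segment starts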
>>> start = x.shape[1]*pos[0] + pos[1]
>>> seq = start + np.arange(len(segment))
>>> seq
array([7, 8, 9])
Then, you can either index the flattened array:
>>> x_f = x.flatten()
>>> x_f[seq] = segment
>>> x_f.reshape(x.shape)
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 1],
[0, 0, 0, 0, 0]])
Alternatively, you can use np.ravel_multi_index to get seq and then apply np.unravel_index to it:
>>> seq = np.arange(len(segment)) + np.ravel_multi_index(pos, x.shape)
>>> seq
array([7, 8, 9])
>>> indices = np.unravel_index(seq, x.shape)
>>> indices
(array([1, 1, 1]), array([2, 3, 4]))
>>> x[indices] = segment
>>> x
array([[0, 0, 0, 0, 0],
[0, 0, 1, 0, 1],
[0, 0, 0, 0, 0]])
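If the segment always fits inside a single row, plain slice assignment may be simpler than going through flat indices. This is a different technique from the one above, sketched here with the question's arrays; the names row and col are just illustrative.

```python
import numpy as np

x = np.zeros((3, 5), dtype=int)
segment = np.array([1, 0, 1])
row, col = 1, 2                  # same position as pos = [1, 2]

# Assign into a slice of one row; modifies x in place.
x[row, col:col + len(segment)] = segment
print(x)
```

Unlike the flat-index version, this raises a ValueError if the segment would run past the end of the row instead of wrapping onto the next one.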

Create a new array out of two numpy arrays

I'm trying to create a new array out of two other arrays. I already tried multiple np.append() statements along multiple axes. Here is some code:
import numpy as np
arr1 = np.zeros((2, 3))  # the shape must be passed as a tuple
arr2 = np.zeros((2, 2))
new_arr = np.append(arr1, arr2)
print(new_arr)
Desired output:
[
[[0, 0, 0], [0, 0, 0]],
[[0, 0], [0, 0]]
]
Actual output:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Try this:
import numpy as np
arr1 = np.array([0, 0, 0])
arr2 = np.array([0, 0, 0])
final_arr = np.concatenate((arr1, arr2))
print(final_arr)
See https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
You can do it this way:
np.asarray([list(arr1),list(arr2)], dtype = 'O')
The dtype = 'O' means Object type.
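Since the two arrays have different shapes, they cannot share one regular numeric array. Another way to hold them together, sketched below, is to preallocate a 1d object array and assign each array to a slot. This is an alternative to the asarray call, not a claim about how it behaves.

```python
import numpy as np

arr1 = np.zeros((2, 3))
arr2 = np.zeros((2, 2))

# A 1d object array with two slots, each holding one of the 2d arrays.
new_arr = np.empty(2, dtype=object)
new_arr[0] = arr1
new_arr[1] = arr2

print(new_arr.shape, new_arr[0].shape, new_arr[1].shape)  # (2,) (2, 3) (2, 2)
```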

Trying to modify np array diagonal

I am trying to modify the diagonal values of a 6 x 5 2D numpy array (It's an exercise in this scipy tutorial: http://scipy-lectures.org/intro/numpy/array_object.html#basic-visualization). I'm supposed to change the values of a diagonal from zeroes to 2,3,4,5,6. Since it's a 6 x 5 matrix, there's not really a "main" diagonal, and so I need to change the diagonal starting from the second row ([1][0]) to [5][4]. They suggest reading the docstring for diag. I did, and I still can't figure out how to do this. Any suggestions?
You can just slice an array, and fill_diagonal of that:
In [13]: import numpy as np
In [14]: a = np.zeros((6,5), int)
In [15]: np.fill_diagonal(a[1:], [2,3,4,5,6])
In [16]: a
Out[16]:
array([[0, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[0, 3, 0, 0, 0],
[0, 0, 4, 0, 0],
[0, 0, 0, 5, 0],
[0, 0, 0, 0, 6]])
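For comparison, here are two other ways that should give the same result: indexing with explicit row and column arrays, and building the matrix with np.diag (the function the exercise points to) and trimming the extra column. Both are sketches; the variable names are illustrative.

```python
import numpy as np

# Option 1: index arrays pick out (1,0), (2,1), ..., (5,4).
a = np.zeros((6, 5), dtype=int)
rows = np.arange(1, 6)
cols = np.arange(5)
a[rows, cols] = [2, 3, 4, 5, 6]

# Option 2: np.diag with k=-1 builds a 6x6 matrix with the values on the
# first sub-diagonal; dropping the last column leaves a 6x5 array.
b = np.diag([2, 3, 4, 5, 6], k=-1)[:, :5]

print(np.array_equal(a, b))  # True
```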

Convert numpy array of arrays to 2d array

I have a pandas series features that has the following values (features.values)
array([array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]),
array([0, 0, 0, ..., 0, 0, 0]), ...,
array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]),
array([0, 0, 0, ..., 0, 0, 0])], dtype=object)
Now I really want this to be recognized as a matrix, but if I check the shape I get
>>> features.values.shape
(10000,)
rather than (10000, 3000) which is what I would expect.
How can I get this to be recognized as a 2d array rather than a 1d array with arrays as values? Also, why is it not automatically detected as a 2d array?
In response to your comment question, let's compare two ways of creating an array.
First, make an array from a list of arrays (all the same length):
In [302]: arr = np.array([np.arange(3), np.arange(1,4), np.arange(10,13)])
In [303]: arr
Out[303]:
array([[ 0, 1, 2],
[ 1, 2, 3],
[10, 11, 12]])
The result is a 2d array of numbers.
If instead we make an object dtype array, and fill it with arrays:
In [304]: arr = np.empty(3,object)
In [305]: arr[:] = [np.arange(3), np.arange(1,4), np.arange(10,13)]
In [306]: arr
Out[306]:
array([array([0, 1, 2]), array([1, 2, 3]), array([10, 11, 12])],
dtype=object)
Notice that this display is like yours. This is, by design, a 1d array. Like a list, it contains pointers to arrays elsewhere in memory. Notice that it requires an extra construction step. The default behavior of np.array is to create a multidimensional array where it can.
It takes extra effort to get around that. Likewise, it takes some extra effort to undo that and create the 2d numeric array.
Simply calling np.array on it does not change the structure.
In [307]: np.array(arr)
Out[307]:
array([array([0, 1, 2]), array([1, 2, 3]), array([10, 11, 12])],
dtype=object)
stack does change it to 2d. stack treats it as a list of arrays, which it joins on a new axis.
In [308]: np.stack(arr)
Out[308]:
array([[ 0, 1, 2],
[ 1, 2, 3],
[10, 11, 12]])
Shortening @hpauli's answer:
your_2d_arry = np.stack(arr_of_arr_object)

Numpy efficient indexing with varied size arrays

Take a look at this piece of code:
import numpy as np
a = np.random.random(10)
indicies = [
np.array([1, 4, 3]),
np.array([2, 5, 8, 7, 3]),
np.array([1, 2]),
np.array([3, 2, 1])
]
result = np.zeros(2)
result[0] = a[indicies[0]].sum()
result[1] = a[indicies[2]].sum()
Is there any way to get result more efficiently? In my case a is a very large array.
In other words I want to select elements from a with several varying size index arrays and then sum over them in one operation, resulting in a single array.
With your a and indicies list:
In [280]: [a[i].sum() for i in indicies]
Out[280]:
[1.3986792680307709,
2.6354365193743732,
0.83324677494990895,
1.8195179021311731]
Which of course could be wrapped in np.array().
For a subset of the indicies items use:
In [281]: [a[indicies[i]].sum() for i in [0,2]]
Out[281]: [1.3986792680307709, 0.83324677494990895]
A comment suggests indicies comes from an Adjacency matrix, possibly sparse.
I could recreate such an array with:
In [289]: A=np.zeros((4,10),int)
In [290]: for i in range(4): A[i,indicies[i]]=1
In [291]: A
Out[291]:
array([[0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 0, 0, 0, 0, 0, 0]])
and use a matrix product (np.dot) to do the selection and sum:
In [292]: A.dot(a)
Out[292]: array([ 1.39867927, 2.63543652, 0.83324677, 1.8195179 ])
A[[0,2],:].dot(a) would use a subset of rows.
A sparse matrix version has that list of row indices:
In [293]: from scipy import sparse
In [294]: Al=sparse.lil_matrix(A)
In [295]: Al.rows
Out[295]: array([[1, 3, 4], [2, 3, 5, 7, 8], [1, 2], [1, 2, 3]], dtype=object)
And a matrix product with that gives the same numbers:
In [296]: Al*a
Out[296]: array([ 1.39867927, 2.63543652, 0.83324677, 1.8195179 ])
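If building the dense adjacency matrix is too costly, another option is to concatenate all the index arrays and let np.bincount do the grouped sum. This is not part of the approach above, just a sketch of an alternative; the seeded random a and the names flat, lengths and group are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10)
indicies = [
    np.array([1, 4, 3]),
    np.array([2, 5, 8, 7, 3]),
    np.array([1, 2]),
    np.array([3, 2, 1]),
]

flat = np.concatenate(indicies)               # every index, back to back
lengths = [len(i) for i in indicies]          # size of each group
group = np.repeat(np.arange(len(indicies)), lengths)  # group id per index

# Sum a[flat] within each group in a single vectorized call.
result = np.bincount(group, weights=a[flat], minlength=len(indicies))

expected = np.array([a[i].sum() for i in indicies])
print(np.allclose(result, expected))  # True
```

The Python-level work is limited to collecting the lengths; the selection and summation happen inside numpy.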
If your array a is very large, you might run into memory issues when looping if your list of indices contains many large index arrays.
To avoid this issue, use an iterator instead of a list:
indices = iter(indicies)
and then loop through your iterator.
