I currently have an ndarray of shape (27,) where each array entry is an array of shape (121,61). I'd like to reshape the ndarray to a new size of (3267, 61), which is just expanding/flattening the nested arrays into one.
I've tried using .resize(3267, 61) and .reshape(3267, 61), but when I do, the following errors appear:
ValueError: cannot reshape array of size 27 into shape (3267, 61)
ValueError: cannot resize this array: it does not own its data
You can use np.stack() to turn a sequence of arrays into a single ndarray, which can then be reshaped as you need:
>>> import numpy as np
>>> a = np.zeros((27,), dtype=object)
>>> for i in range(a.shape[0]):
...     a[i] = np.zeros((121, 61))
...
>>> b = np.stack(a).reshape((27*121, 61))
>>> b.shape
(3267, 61)
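The reshape in the question fails because an object array only stores references to its sub-arrays, so its size is 27 rather than 27*121*61. A minimal sketch, reusing the same setup, that makes this visible before and after np.stack:

```python
import numpy as np

# Object array holding 27 separate (121, 61) arrays, as in the question
a = np.empty((27,), dtype=object)
for i in range(a.shape[0]):
    a[i] = np.zeros((121, 61))

# The outer array has only 27 elements (one reference per sub-array),
# which is why a.reshape(3267, 61) raises a ValueError
print(a.size)  # 27

# np.stack copies the sub-arrays into one contiguous (27, 121, 61) array
stacked = np.stack(a)
print(stacked.shape)  # (27, 121, 61)

# Now the total size matches 3267 * 61, so the reshape works
b = stacked.reshape(-1, stacked.shape[-1])
print(b.shape)  # (3267, 61)
```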
If the array elements are indeed other ndarrays (i.e. a is a 1D object array, as @CamiloMartínez showed can happen), then use:
b = np.concatenate(a)
If instead the array is just a 3D numeric array (e.g. obtained by passing a list of equally shaped arrays to np.array, which combines them into a single array), then use:
b = a.reshape(-1, a.shape[-1])
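General case: if a is an array of arrays (dtype=object), the following works whatever its outer shape, including when a is a 2D (or higher-dimensional) array containing arrays (as @drod31 was asking in the comments). It is meant for object arrays only: on a plain numeric 3D array, a.ravel() flattens every number, so the result would be a single (1, N) row; use the reshape above in that case: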
b = np.stack(a.ravel())
b = b.reshape(-1, b.shape[-1])
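Because a.ravel() on a plain numeric array yields individual numbers rather than sub-arrays, a helper that branches on the dtype can cover both the object-array case and the numeric case. A minimal sketch; flatten_to_rows is a hypothetical name, and the code assumes all sub-arrays share the same shape:

```python
import numpy as np

def flatten_to_rows(a):
    """Hypothetical helper: turn an object array of equally shaped (r, c)
    arrays, or a plain numeric array, into one (N, c) array."""
    if a.dtype == object:
        # Elements are arrays: ravel the outer shape (any ndim), then stack
        b = np.stack(a.ravel())
    else:
        # Already a regular numeric array: nothing to stack
        b = a
    # Collapse every leading dimension into rows, keeping the last one
    return b.reshape(-1, b.shape[-1])

# Case 1: 1D object array of 27 arrays
obj = np.empty((27,), dtype=object)
for i in range(obj.shape[0]):
    obj[i] = np.zeros((121, 61))

# Case 2: plain 3D float array
num = np.zeros((27, 121, 61))

# Case 3: 2D (15 x 27) object array of arrays
obj2 = np.empty((15, 27), dtype=object)
for idx in np.ndindex(obj2.shape):
    obj2[idx] = np.zeros((121, 61))

print(flatten_to_rows(obj).shape)   # (3267, 61)
print(flatten_to_rows(num).shape)   # (3267, 61)
print(flatten_to_rows(obj2).shape)  # (49005, 61)
```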
Here is a minimal example:
Case 1 (thanks to @CamiloMartínez for the setup):
a = np.empty((27,), dtype=object)
for i in range(a.shape[0]):
    a[i] = np.zeros((121, 61))
b = np.concatenate(a)
>>> b.shape
(3267, 61)
Case 2 (my initial setup, which missed the actual array-of-arrays condition):
a = np.array([np.zeros((121, 61)) for _ in range(27)])
b = a.reshape(-1, a.shape[-1])
>>> b.shape
(3267, 61)
In any case, you usually want to express the transformation without hard-coded dimensions, so that it works more generally.
Corner-case example (as per @drod31's question):
a = np.empty((15,27), dtype=object)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        a[i,j] = np.zeros((121, 61))
>>> a.shape
(15, 27)
>>> a[0,0].shape
(121, 61)
b = np.stack(a.ravel())
b = b.reshape(-1, b.shape[-1])
>>> b.shape
(49005, 61)
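The stacked rows follow the row-major (C) order of a.ravel(): the first block of rows comes from a[0, 0], the next from a[0, 1], and so on. A small sketch that checks this, using small shapes and a distinct fill value per block chosen only for illustration:

```python
import numpy as np

# 2 x 3 object array of (3, 2) blocks; block (i, j) is filled with 10*i + j
a = np.empty((2, 3), dtype=object)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        a[i, j] = np.full((3, 2), 10 * i + j)

b = np.stack(a.ravel())
b = b.reshape(-1, b.shape[-1])
print(b.shape)  # (18, 2)

# First row of each 3-row block, in row-major order of (i, j)
print(b[::3, 0].tolist())  # [0, 1, 2, 10, 11, 12]
```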
One can use numpy.where for selecting values from two arrays depending on a condition:
import numpy
a = numpy.random.rand(5)
b = numpy.random.rand(5)
c = numpy.where(a > 0.5, a, b) # okay
If the array has more dimensions, however, this does not work anymore:
import numpy
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where(a[:, 0] > 0.5, a, b) # !
Traceback (most recent call last):
File "p.py", line 10, in <module>
c = numpy.where(a[:, 0] > 0.5, a, b) # okay
File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (5,) (5,2) (5,2)
I would have expected a numpy array of shape (5,2).
What's the issue here? How to work around it?
Remember that broadcasting in NumPy aligns shapes from the right, so while (5,)-shaped arrays can broadcast with (2,5)-shaped arrays, they can't broadcast with (5,2)-shaped arrays. To broadcast with a (5,2)-shaped array, you need to keep the second dimension so that the shape is (5,1) (a dimension of length 1 can broadcast with any length).
Thus, you need to keep the second dimension when indexing (indexing with a single integer removes that dimension). You can do this by putting the index in a one-element list:
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where(a[:, [0]] > 0.5, a, b) # works
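The shape rule can be checked on shapes alone with np.broadcast_shapes (available in NumPy 1.20 and later; this sketch assumes such a version). It checks the three shape pairs mentioned above and then the list-index fix:

```python
import numpy as np

# Shapes are aligned from the rightmost dimension
print(np.broadcast_shapes((5,), (2, 5)))    # (2, 5): trailing 5 matches 5
print(np.broadcast_shapes((5, 1), (5, 2)))  # (5, 2): the 1 stretches to 2

try:
    np.broadcast_shapes((5,), (5, 2))  # trailing 5 vs 2: incompatible
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)  # True

# Indexing with a one-element list keeps the column axis: (5, 1) mask
a = np.random.rand(5, 2)
b = np.random.rand(5, 2)
mask = a[:, [0]] > 0.5
c = np.where(mask, a, b)
print(mask.shape, c.shape)  # (5, 1) (5, 2)
```

Each row of c is then taken whole from a or from b, depending on the first column of a.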
You can use c = numpy.where(a > 0.5, a, b).
However, if you want to use only the first column of a as the condition, you need to consider the shape of the mask.
Let's first see what the shape of this operation is:
(a[:, 0] > 0.5).shape # outputs (5,)
It is one-dimensional, while a and b have shape (5, 2), i.e. they are two-dimensional. Broadcasting aligns shapes from the right, so the 5 of the mask is matched against the trailing 2 of a and b, and the shapes cannot be broadcast together.
The solution is to reshape the mask to shape (5, 1).
Your code should look like this:
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where((a[:, 0] > 0.5).reshape(-1, 1), a, b) # works
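Besides reshape(-1, 1), inserting a new axis with None or indexing with a one-element list also produces a (5, 1) mask. A small sketch (random data from np.random.default_rng, seeded only for reproducibility) checking that the three masks select the same values:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 2))
b = rng.random((5, 2))

# Three ways to turn the (5,) column test into a (5, 1) mask
m1 = (a[:, 0] > 0.5).reshape(-1, 1)  # explicit reshape
m2 = (a[:, 0] > 0.5)[:, None]        # insert a new axis with None
m3 = a[:, [0]] > 0.5                 # list index keeps the axis

c1 = np.where(m1, a, b)
c2 = np.where(m2, a, b)
c3 = np.where(m3, a, b)
print(np.array_equal(c1, c2) and np.array_equal(c1, c3))  # True
```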
You can try:
import numpy
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where(a > 0.5, a, b)
instead of: c = np.where(a>0.5,a,b)
you can use: c = np.choose((a > 0.5).astype(int), [b, a])
which works for multidimensional arrays if a and b have the same shape.
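Selecting by an index into a stack of the candidate arrays can be checked against np.where. A sketch, assuming np.choose as the selector (index 0 picks from b, index 1 from a) and showing np.take_along_axis as an equivalent that literally stacks the arrays first:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((5, 2))
b = rng.random((5, 2))

# Reference: take a where the condition holds, b elsewhere
c_where = np.where(a > 0.5, a, b)

# Index 1 (True) picks from a, index 0 (False) picks from b
c_choose = np.choose((a > 0.5).astype(int), [b, a])

# Same selection on an explicitly stacked array
stacked = np.array([b, a])            # shape (2, 5, 2)
idx = (a > 0.5).astype(int)[None]     # shape (1, 5, 2)
c_stack = np.take_along_axis(stacked, idx, axis=0)[0]

print(np.array_equal(c_where, c_choose), np.array_equal(c_where, c_stack))  # True True
```

Plain boolean indexing such as np.array([a, b])[a > 0.5] does not do this: a boolean index returns a flat selection of elements and must match the leading dimensions of the indexed array.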
Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should look like:
A[1] should look like:
And similarly for the other 'sub-arrays'.
(Notice b[0]=2 and b[1]=4)
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
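The same column can be built without the transpose by giving b its two extra axes before tiling. A sketch under the question's setup; the only change from the answer above is tiling b[:, None, None] (shape (5, 1, 1)) instead of b:

```python
import numpy as np

np.random.seed(777)
A = np.random.rand(5, 10, 3)
b = np.array([2, 4, 6, 8, 10])

# Repeat each b[i] 10 times along axis 1: one (10, 1) column per sub-array
B = np.tile(b[:, None, None], (1, A.shape[1], 1))  # shape: (5, 10, 1)
C = np.concatenate([A, B], axis=-1)                 # shape: (5, 10, 4)

print(C.shape)                        # (5, 10, 4)
print(np.all(C[1, :, -1] == b[1]))    # True
```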
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])
I have a three-dimensional array A with shape (5774,15,100) and a 1D array B with shape (5774,). I want to append B to A as an extra entry along the last axis to get another array C with shape (5774,15,101).
I am using hstack as
C = hstack((A ,np.array(B)[:,None]))
I am getting the error below. Any suggestions?
ValueError: could not broadcast input array from shape (5774,15,100) into shape (5774)
You'd need to use np.concatenate, which lets you join along any axis (here the last one, axis=-1), whereas np.hstack always joins along axis 1 for arrays with two or more dimensions. Then, you need to use np.broadcast_to to expand that (5774,)-shaped array to (5774, 15, 1), because concatenate still needs all the arrays to have the same number of dimensions and matching sizes on every axis except the one being joined.
C = np.concatenate((A,
np.broadcast_to(np.array(B)[:, None, None], A.shape[:-1] + (1,))),
axis = -1)
Checking:
A = np.random.rand(5774, 15, 100)
B = np.random.rand(5774)
C = np.concatenate((A,
np.broadcast_to(np.array(B)[:, None, None], A.shape[:-1] + (1,))),
axis = -1)
C.shape
Out: (5774, 15, 101)
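The same idea can be written without hard-coding the number of dimensions, by adding as many trailing singleton axes to B as A has extra dimensions. A minimal sketch; append_last_axis is a hypothetical name, and the small shapes are only for illustration:

```python
import numpy as np

def append_last_axis(A, B):
    """Hypothetical helper: append B (shape (A.shape[0],)) as one extra
    slice along the last axis of A, repeating B[i] across A[i]."""
    B = np.asarray(B)
    # (n,) -> (n, 1, ..., 1) with A.ndim axes in total
    B_nd = B.reshape(B.shape + (1,) * (A.ndim - 1))
    # Repeat B[i] over the middle axes; the last axis has length 1
    B_full = np.broadcast_to(B_nd, A.shape[:-1] + (1,))
    return np.concatenate((A, B_full), axis=-1)

A = np.random.rand(6, 4, 3)
B = np.arange(6.0)
C = append_last_axis(A, B)
print(C.shape)  # (6, 4, 4)
```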
Given two ndarrays, A of shape (n, *m) and B of shape (n,), is there a way to sort A in-place using the order that would sort B?
Sorting A with B is easy using np.argsort, but this is not done in-place:
A = A[np.argsort(B)]
Comments:
A and B have different dtypes, and A can have more than two dimensions. Hence they can’t be stacked to use ndarray.sort().
A takes up a lot of space, which is why it needs to be sorted in-place. Any solution requiring twice the space occupied by A would therefore defeat this purpose.
The title of this question “Re-arranging numpy array in place” may sound related, but the question itself is not very clear, and the answers do not match my question.
Here is a solution that works by following cycles in the index array. It can optionally be compiled using Pythran, giving a significant speedup if rows are small (80x for 10 elements) and a small speedup if rows are large (30% for 1000 elements).
To keep it pythran compatible I had to simplify it a bit, so it only accepts 2D arrays and it only sorts along axis 0.
Code:
import numpy as np
#pythran export take_inplace(float[:, :] or int[:, :], int[:])
def take_inplace(a, idx):
    n, m = a.shape
    been_there = np.zeros(n, bool)
    keep = np.empty(m, a.dtype)
    for i in range(n):
        if been_there[i]:
            continue
        keep[:] = a[i]
        been_there[i] = True
        j = i
        k = idx[i]
        while not been_there[k]:
            a[j] = a[k]
            been_there[k] = True
            j = k
            k = idx[k]
        a[j] = keep
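To sort A by B with this function, pass np.argsort(B) as idx. A pure-Python sketch (no Pythran; the array sizes are arbitrary) that checks the in-place result against the out-of-place A[np.argsort(B)]:

```python
import numpy as np

def take_inplace(a, idx):
    # Same cycle-following algorithm as above: afterwards a[j] holds the
    # old a[idx[j]]; extra storage is one row plus one flag per row
    n, m = a.shape
    been_there = np.zeros(n, bool)
    keep = np.empty(m, a.dtype)
    for i in range(n):
        if been_there[i]:
            continue
        keep[:] = a[i]
        been_there[i] = True
        j = i
        k = idx[i]
        while not been_there[k]:
            a[j] = a[k]
            been_there[k] = True
            j = k
            k = idx[k]
        a[j] = keep

rng = np.random.default_rng(42)
A = rng.random((8, 3))
B = rng.integers(0, 100, size=8)

expected = A[np.argsort(B)]      # out-of-place copy, only for checking
take_inplace(A, np.argsort(B))   # reorders the rows of A in place
print(np.array_equal(A, expected))  # True
```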
Sample run using the compiled version. As indicated above, compilation is only needed for small rows; for larger rows, pure Python should be fast enough.
>>> from timeit import timeit
>>> import numpy as np
>>> import take_inplace
>>>
>>> a = np.random.random((1000, 10))
>>> idx = a[:, 4].argsort()
>>>
>>> take_inplace.take_inplace(a, idx)
>>>
# correct
>>> np.all(np.arange(1000) == a[:, 4].argsort())
True
>>>
# speed
>>> timeit(lambda: take_inplace.take_inplace(a, idx), number=1000)
0.011950935004279017
>>>
# for comparison
>>> timeit(lambda: a[idx], number=1000)
0.02985276997787878
If you can set A beforehand as a structured array whose datatype is composed of a subarray of shape (m, ) and a scalar of the same type (e.g., np.int32), then you can sort it in-place with respect to B. For example:
import numpy as np
B = np.array([3, 1, 2])
A = np.array([[10, 11], [20, 21], [30, 31]])
(n, m) = A.shape
dt = np.dtype([('a', np.int32, (m, )), ('b', int)])
A2 = np.array([(a, b) for a, b in zip(A, B)], dtype=dt)
A2.sort(order='b')
print(A2)
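The rows of A can be read back from the sorted structured array through its 'a' field. A sketch with the same data; filling the fields directly instead of building a list of tuples is a choice made here, not part of the answer above:

```python
import numpy as np

B = np.array([3, 1, 2])
A = np.array([[10, 11], [20, 21], [30, 31]])
n, m = A.shape

# Each record holds one row of A (field 'a') and its sort key (field 'b')
dt = np.dtype([('a', np.int32, (m,)), ('b', int)])
A2 = np.empty(n, dtype=dt)
A2['a'] = A
A2['b'] = B

A2.sort(order='b')        # in-place sort of the records by their key
print(A2['a'].tolist())   # [[20, 21], [30, 31], [10, 11]]
print(A2['b'].tolist())   # [1, 2, 3]
```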
I have two 1-dimensional arrays, a such that np.shape(a) == (n,) and b such that np.shape(b) == (m,).
I want to make a (3rd-order) tensor c such that np.shape(c) == (n,n,m) by doing c = np.outer(np.outer(a,a),b).
But when I do this, I get:
>>> np.shape(c)
(n*n,m)
which is just a rectangular matrix. How can I make a 3D tensor like I want?
You could perhaps use np.multiply.outer instead of np.outer to get the required outer product:
>>> a = np.arange(4)
>>> b = np.ones(5)
>>> mo = np.multiply.outer
Then we have:
>>> mo(mo(a, a), b).shape
(4, 4, 5)
A better way could be to use np.einsum (this avoids creating intermediate arrays):
>>> c = np.einsum('i,j,k->ijk', a, a, b)
>>> c.shape
(4, 4, 5)
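np.outer flattens both of its inputs, which is why the nested call in the question returns an (n*n, m) matrix. A sketch comparing the two answers above with plain broadcasting, which builds the same (n, n, m) tensor by giving each factor its own axis:

```python
import numpy as np

a = np.arange(4)
b = np.ones(5)

c_einsum = np.einsum('i,j,k->ijk', a, a, b)
c_outer = np.multiply.outer(np.multiply.outer(a, a), b)

# a[:, None, None] is (4, 1, 1), a[None, :, None] is (1, 4, 1), b is (5,)
c_bcast = a[:, None, None] * a[None, :, None] * b

print(c_einsum.shape)  # (4, 4, 5)
print(np.allclose(c_einsum, c_outer) and np.allclose(c_einsum, c_bcast))  # True
```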