I have the following NumPy array:
>>> a = np.arange(21).reshape(3, 7)
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20]])
I want to create a new array by replacing some columns with their sums. The columns that I need to sum over will always be grouped together.
For example, keep the first 2 columns as is. Replace the column with index 2 by the sum of columns with indices 2, 3, and 4. Replace the column with index 3 by the sum of columns with indices 5 and 6.
Required output:
array([[ 0, 1, 9, 11],
[ 7, 8, 30, 25],
[14, 15, 51, 39]])
What I tried:
b = np.concatenate([
    a[:, :2],
    a[:, 2:5].sum(axis=1, keepdims=True),
    a[:, 5:].sum(axis=1, keepdims=True)
], axis=1)
This gives the required output, but I was wondering if there is a better or more concise way to do this.
Define the start index of each interval to be summed, then use np.add.reduceat along axis=1 (that is, within each row):
In [37]: a
Out[37]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20]])
In [38]: bins = [0,1,2,5]
In [39]: np.add.reduceat(a,bins,axis=1)
Out[39]:
array([[ 0, 1, 9, 11],
[ 7, 8, 30, 25],
[14, 15, 51, 39]])
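If it is more natural to describe the groups by their widths than by their start indices, the bins can be derived with a cumulative sum. The following is a minimal sketch; the helper name sum_column_groups and its group_sizes argument are made up for illustration, and it assumes every group has at least one column and that the sizes add up to the number of columns.

```python
import numpy as np

def sum_column_groups(a, group_sizes):
    """Sum consecutive groups of columns (hypothetical helper).

    Assumes every size is >= 1 and that the sizes add up to a.shape[1].
    """
    group_sizes = np.asarray(group_sizes)
    # Start index of each group: 0, then the running totals of the preceding sizes.
    starts = np.concatenate(([0], np.cumsum(group_sizes)[:-1]))
    return np.add.reduceat(a, starts, axis=1)

a = np.arange(21).reshape(3, 7)
# Keep columns 0 and 1, sum columns 2-4, sum columns 5-6.
result = sum_column_groups(a, [1, 1, 3, 2])
print(result)
```

With sizes [1, 1, 3, 2] the computed start indices are [0, 1, 2, 5], the same bins as above.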
I am trying to extract a subset of a NumPy array of shape (3, 32) and perform math on it. The subset has shape (3, 9), and each row's range starts at an index stored in another array of shape (3,). As an example, I have values from three channels in the time domain, and I extract the index of the maximum value of each channel into an array:
a = np.random.randint(20,size = (3,32))
a
array([[18, 3, 10, 6, 12, 1, 10, 8, 4, 11, 13, 14, 9, 9, 10, 2,
9, 0, 0, 16, 14, 19, 1, 19, 14, 19, 19, 2, 14, 0, 4, 18],
[ 9, 19, 2, 12, 0, 14, 18, 7, 3, 0, 7, 3, 12, 19, 4, 2,
5, 9, 2, 11, 15, 19, 16, 17, 3, 4, 17, 5, 6, 1, 2, 17],
[ 0, 11, 18, 8, 9, 2, 9, 15, 9, 6, 0, 8, 9, 16, 9, 6,
1, 19, 1, 9, 12, 8, 0, 0, 7, 15, 3, 14, 15, 8, 10, 19]])
b = np.argmax(a,1)
b
array([21, 1, 17], dtype=int64)
My goal at this point is to build a new array from three consecutive values of each row, starting at the indices specified. For instance, I would like to extract the values at indices:
[21,22,23] from a[0]
[1,2,3] from a[1]
[17,18,19] from a[2]
all into a new array of shape (3, 3).
I have already been able to achieve this using loops, but I suspect there is a more efficient way to produce this result without loops (speed is a bit of an issue in this application). I was able to get a similar result by manually populating a smaller index matrix:
c = np.asarray([[1,2,3],[2,3,4],[3,4,5]])
a[np.arange(3)[:,None],c]
array([[ 3, 10, 6],
[ 2, 12, 0],
[ 8, 9, 2]])
However, given the dynamic nature of this application, I would like to write this so that it scales dynamically (for example, a range of indices extending 9 values beyond the starting index). I just don't know whether there is a way to do this. I have tried syntax similar to the following to slice the array:
a[np.arange(3)[:,None],b[:]:(b[:] + 2)]
which results in error messages such as:
builtins.TypeError: only integer scalar arrays can be converted to a scalar index
Since you say the windows cannot overflow, this gets much less tricky. Since you have your starting indices, you can use basic broadcasting to create an array of indices with shape (n, 3), and use take_along_axis to pull those elements from the original array.
np.take_along_axis(a, b[:, None] + np.arange(3), axis=1)
array([[19, 1, 19],
[19, 2, 12],
[19, 1, 9]])
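To make the window width adjustable (3 values, 9 values, and so on), the same broadcasting idea can be wrapped in a small function. This is only a sketch: the name extract_windows and the explicit bounds check are additions for illustration, and the input is a stand-in array with predictable values rather than the random data from the question.

```python
import numpy as np

def extract_windows(a, starts, width):
    """Return a[i, starts[i]:starts[i] + width] for every row i (hypothetical helper)."""
    starts = np.asarray(starts)
    # Assumption from the answer: windows never run past the edge; fail loudly if they do.
    if np.any(starts < 0) or np.any(starts + width > a.shape[1]):
        raise ValueError("a window runs past the edge of the array")
    # Broadcasting (n, 1) + (width,) gives an (n, width) array of column indices.
    cols = starts[:, None] + np.arange(width)
    return np.take_along_axis(a, cols, axis=1)

a = np.arange(96).reshape(3, 32)   # stand-in data: a[i, j] == 32 * i + j
b = np.array([21, 1, 17])          # the start indices from the question

windows3 = extract_windows(a, b, 3)
windows9 = extract_windows(a, b, 9)
print(windows3)
print(windows9.shape)
```

Plain fancy indexing, a[np.arange(3)[:, None], cols], selects the same elements; take_along_axis just spares you from building the row index.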
Given this 2D numpy array:
a=numpy.array([[31,22,43],[44,55,6],[17,68,19],[12,11,18],...,[99,98,97]])
given the need to flatten it using numpy.ravel:
b=numpy.ravel(a)
and given the need to later dump it into a pandas DataFrame, how can I make sure the sequential order of the values in a is preserved when applying numpy.ravel? In other words, how can I check that numpy.ravel does not disturb the original sequential order?
Of course the intended result should be that the numbers coming before and after 17 in b, for instance, are the same as in a.
First of all, you need to define what "sequential" order means for you, because numpy.ravel() does preserve order. A tip for working out what you need: try it on the simplest possible toy example:
import numpy as np
X = np.arange(20).reshape(-1,4)
X
#array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15],
# [16, 17, 18, 19]])
X.ravel()
# array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
# 13, 14, 15, 16, 17, 18, 19])
Does it meet your expectation? Or do you want to see this order:
Z = X.T
Z
# array([[ 0, 4, 8, 12, 16],
# [ 1, 5, 9, 13, 17],
# [ 2, 6, 10, 14, 18],
# [ 3, 7, 11, 15, 19]])
Z.ravel()
# array([ 0, 4, 8, 12, 16, 1, 5, 9, 13, 17, 2, 6, 10,
# 14, 18, 3, 7, 11, 15, 19])
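One way to check the order programmatically, rather than by eye, is to compare positions directly. The sketch below reuses the toy array X from above; the variable names row_major and col_major are just illustrative.

```python
import numpy as np

X = np.arange(20).reshape(-1, 4)

row_major = X.ravel()             # default order='C': one row after another
col_major = X.ravel(order='F')    # column after column

# Element (i, j) of X lands at position i * ncols + j of the row-major result.
i, j = 3, 2
ncols = X.shape[1]
assert row_major[i * ncols + j] == X[i, j]

# Reshaping back with the same order recovers the original array exactly.
assert np.array_equal(row_major.reshape(X.shape), X)

# Column-major flattening gives the same sequence as flattening the transpose.
assert np.array_equal(col_major, X.T.ravel())
```

If the first two assertions hold for your array, each value keeps the same neighbours within its row after flattening.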
This is a question about reshape and how it uses each axis of a multidimensional array.
Suppose I have the following array that contains matrices indexed by the first index. What I want to achieve is to instead index the columns of each matrix with the first index. In order to illustrate this problem, consider the following example where the given numpy array that indexes matrices with its first index is z.
x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T
Where z looks like:
array([[[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]],
[[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]]])
And its shape is (2, 3, 3). Here, the first index selects one of the two images, and each image is a 3 x 3 matrix.
The question more specifically phrased then, is how to use reshape to obtain the following desired output:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
Its shape is (6, 3). In this layout, the first index runs over the columns of the matrices in z, which are the rows of x and y as presented above. My natural inclination was to use reshape directly on z in the following way:
out = z.reshape(2 * 3, 3)
But its output is the following which indexes the rows of the matrices and not the columns:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Could reshape be used to obtain the desired output above? Or, more generally, can you control how each axis is used when you call the reshape function?
Two things:
I know how to solve the problem. I can go through each element of the big matrix (z), transpose it, and then apply reshape in the way above. This increases computation time a little, which is not really problematic. But it does not generalize and it does not feel Pythonic. So I was wondering if there is a standard, more enlightened way of doing this.
I was not sure how to phrase this question. If anyone has a suggestion on how to phrase this problem better, I am all ears.
Every array has a natural (1D flattened) order to its elements. When you reshape an array, it is as though it were flattened first (thus obtaining the natural order), and then reshaped:
In [54]: z.ravel()
Out[54]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [55]: z.ravel().reshape(2*3, 3)
Out[55]:
array([[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8],
[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]])
Notice that in the "natural order", 0 and 1 are far apart. However you reshape it, 0 and 1 will not be next to each other along the last axis, which is what you want in the desired array:
desired = np.array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
This requires some reordering, which in this case can be done by swapaxes:
In [53]: z.swapaxes(1,2).reshape(2*3, 3)
Out[53]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
because swapaxes(1,2) places the values in the desired order
In [56]: z.swapaxes(1,2).ravel()
Out[56]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
In [57]: desired.ravel()
Out[57]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17])
Note that the reshape method also has an order parameter which can be used to control the (C- or F-) order with which the elements are read from the array and placed in the reshaped array. However, I don't think this helps in your case.
Another way to think about the limits of reshape is to say that all reshapes followed by ravel are the same:
In [71]: z.reshape(3,3,2).ravel()
Out[71]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [72]: z.reshape(3,2,3).ravel()
Out[72]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [73]: z.reshape(3*2,3).ravel()
Out[73]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
In [74]: z.reshape(3*3,2).ravel()
Out[74]:
array([ 0, 3, 6, 1, 4, 7, 2, 5, 8, 9, 12, 15, 10, 13, 16, 11, 14,
17])
So if the ravel of the desired array is different, there is no way to obtain it only by reshaping.
The same goes for reshaping with order='F', provided you also ravel with order='F':
In [109]: z.reshape(2,3,3, order='F').ravel(order='F')
Out[109]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [110]: z.reshape(2*3*3, order='F').ravel(order='F')
Out[110]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
In [111]: z.reshape(2*3,3, order='F').ravel(order='F')
Out[111]:
array([ 0, 9, 1, 10, 2, 11, 3, 12, 4, 13, 5, 14, 6, 15, 7, 16, 8,
17])
It is possible to obtain the desired array using two reshapes:
In [83]: z.reshape(2, 3*3, order='F').reshape(2*3, 3)
Out[83]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
but I stumbled upon this serendipitously.
If I've totally misunderstood your question and x and y are the givens (not z), then you could obtain the desired array using row_stack (an alias of np.vstack, which newer NumPy versions prefer) instead of dstack:
In [88]: z = np.row_stack([x, y])
In [89]: z
Out[89]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]])
If you look at the dstack code, you'll discover that
np.dstack((x, y)).T
is effectively:
np.concatenate([i[:,:,None] for i in (x,y)],axis=2).transpose([2,1,0])
It adds a new trailing axis to each component array and then joins them along this new axis. Finally, it transposes the axes.
Your target is the same as a row stack:
np.concatenate((x,y),axis=0)
So with a bit of reverse engineering we can create it from z with
np.concatenate([i[...,0] for i in np.split(z.T,2,axis=2)],axis=0)
np.concatenate([i.T[:,:,0] for i in np.split(z,2,axis=0)],axis=0)
or
np.concatenate(np.split(z.T,2,axis=2),axis=0)[...,0]
or with a partial transpose we can keep the split-and-rejoin axis first, and just use concatenate:
np.concatenate(z.transpose(0,2,1),axis=0)
or its reshape equivalent
(z.transpose(0,2,1).reshape(-1,3))
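As a quick sanity check, the variants discussed in this thread can be compared against the row stack of x and y in one go. This is a small sketch; the candidates dictionary and its labels are only there for illustration.

```python
import numpy as np

x = np.arange(9).reshape((3, 3))
y = np.arange(9, 18).reshape((3, 3))
z = np.dstack((x, y)).T             # shape (2, 3, 3), as in the question

desired = np.concatenate((x, y), axis=0)

# Each entry is one of the approaches from the answers above.
candidates = {
    "swapaxes + reshape": z.swapaxes(1, 2).reshape(-1, 3),
    "transpose + reshape": z.transpose(0, 2, 1).reshape(-1, 3),
    "transpose + concatenate": np.concatenate(z.transpose(0, 2, 1), axis=0),
    "two reshapes": z.reshape(2, 3 * 3, order='F').reshape(2 * 3, 3),
}
for name, out in candidates.items():
    print(name, np.array_equal(out, desired))

# A plain reshape keeps the flattened order of z, so it does not match.
print("plain reshape", np.array_equal(z.reshape(2 * 3, 3), desired))
```

The swapaxes and transpose variants state the axis reordering explicitly, which is probably easier to generalize to other shapes than the two-reshape trick.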
Is there a way to do the following without an if clause?
I'm reading a set of netcdf files with pupynere and want to build an array with numpy append. Sometimes the input data is multi-dimensional (see variable "a" below), sometimes one dimensional ("b"), but the number of elements in the first dimension is always the same ("9" in the example below).
> import numpy as np
> a = np.arange(27).reshape(3,9)
> b = np.arange(9)
> a.shape
(3, 9)
> b.shape
(9,)
this works as expected:
> np.append(a,a, axis=0)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]])
but, appending b does not work so elegantly:
> np.append(a,b, axis=0)
ValueError: arrays must have same number of dimensions
The problem with append is (from the numpy manual)
"When axis is specified, values must have the correct shape."
I'd have to reshape b first in order to get the right result.
> np.append(a,b.reshape(1,9), axis=0)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8]])
So, in my file reading loop, I'm currently using an if clause like this:
for i in [a, b]:
    if np.size(i.shape) == 2:
        result = np.append(result, i, axis=0)
    else:
        result = np.append(result, i.reshape(1,9), axis=0)
Is there a way to append "a" and "b" without the if statement?
EDIT: While @Sven answered the original question perfectly (using np.atleast_2d()), he (and others) pointed out that the code is inefficient. In an answer below, I combined their suggestions and replaced my original code. It should be much more efficient now. Thanks.
You can use numpy.atleast_2d():
result = np.append(result, np.atleast_2d(i), axis=0)
That said, note that the repeated use of numpy.append() is a very inefficient way to build a NumPy array -- it has to be reallocated in every step. If at all possible, preallocate the array with the desired final size and populate it afterwards using slicing.
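For reference, here is a minimal sketch of how np.atleast_2d removes the need for the if clause. Starting result as an empty array with zero rows is an assumption made for this example; the question does not show how result is initialised.

```python
import numpy as np

a = np.arange(27).reshape(3, 9)
b = np.arange(9)

# atleast_2d leaves a 2D array unchanged and turns a (9,) array into (1, 9).
print(np.atleast_2d(a).shape)   # (3, 9)
print(np.atleast_2d(b).shape)   # (1, 9)

# Assumed starting point: an empty array with zero rows and nine columns.
result = np.empty((0, 9), dtype=a.dtype)
for i in [a, b]:
    result = np.append(result, np.atleast_2d(i), axis=0)

print(result.shape)             # (4, 9)
```

This still reallocates on every append, so it answers the if-clause question but not the efficiency concern.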
You can just add all of the arrays to a list, then use np.vstack() to concatenate them all together at the end. This avoids constantly reallocating the growing array with every append.
|1> a = np.arange(27).reshape(3,9)
|2> b = np.arange(9)
|3> np.vstack([a,b])
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8]])
I'm going to improve my code with the help of @Sven, @Henry and @Robert. @Sven answered the question, so he earns the reputation for this question, but, as highlighted by him and others, there is a more efficient way of doing what I want.
This involves using a Python list, where each append costs amortized O(1), whereas each numpy.append() call copies the whole array, so building the result that way costs O(N**2) overall. Afterwards, the list is converted to a NumPy array:
Suppose i is either of type a or b:
> a = np.arange(27).reshape(3,9)
> b = np.arange(9)
> a.shape
(3, 9)
> b.shape
(9,)
Initialise list and append all read data, e.g. if data appear in order 'aaba'.
> mList = []
> for i in [a,a,b,a]:
      mList.append(i)
Your mList will look like this:
> mList
[array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]]),
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]]),
array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
[ 9, 10, 11, 12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23, 24, 25, 26]])]
Finally, vstack the list to form a NumPy array:
> result = np.vstack(mList[:])
> result.shape
(10, 9)
Thanks again for valuable help.
As pointed out, np.append has to reallocate the array on every call. An alternative solution that allocates once would be something like this:
import numpy as np

# first pass: count the total number of elements
total_size = 0
for i in [a, b]:
    total_size += i.size
result = np.empty(total_size, dtype=a.dtype)
offset = 0
for i in [a, b]:
    # copy the flattened array into its slot
    result[offset:offset+i.size] = i.ravel()
    offset += i.size
# if you know the size is always divisible by 9:
result = result.reshape(result.size//9, 9)
If you can't precompute the array size, then perhaps you can put an upper bound on the size and then just preallocate a block that will always be big enough. Then you can just make the result a view into that block:
result = result[0:known_final_size]
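A rough sketch of the upper-bound variant might look like the following. The bound max_rows, the chunks list and the buffer name are placeholders chosen for this example, and np.atleast_2d is used so that one-dimensional chunks count as a single row.

```python
import numpy as np

a = np.arange(27).reshape(3, 9)
b = np.arange(9)
chunks = [a, a, b, a]               # stand-in for the data read from files

max_rows = 100                      # assumed upper bound on the total row count
buffer = np.empty((max_rows, 9), dtype=a.dtype)

n_rows = 0
for chunk in chunks:
    rows = np.atleast_2d(chunk)     # treat a (9,) chunk as one row
    buffer[n_rows:n_rows + rows.shape[0]] = rows
    n_rows += rows.shape[0]

result = buffer[:n_rows]            # a view of the filled part, no copy
print(result.shape)                 # (10, 9)
```

Because result is a view, the whole buffer stays in memory for as long as result is alive; call result.copy() if the unused tail should be released.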