I have a 3D array a of data and a 2D array b of indices. I need to take a sub-array of a along the 3rd axis, using the indices from b. I can do it with take like this:
a = np.arange(24).reshape((2,3,4))
b = np.array([0,2,1,3]).reshape((2,2))
np.array([np.take(a_,b_,axis=1) for (a_,b_) in zip(a,b)])
Can I do it without list comprehension, using some fancy indexing? I am worried about efficiency, so if fancy indexing is not more efficient in this case, I would like to know it.
EDIT: The first thing I tried was a[[0,1],:,b], but it doesn't give the sub-array I need.
In [317]: a
Out[317]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [318]: a = np.arange(24).reshape((2,3,4))
...: b = np.array([0,2,1,3]).reshape((2,2))
...: np.array([np.take(a_,b_,axis=1) for (a_,b_) in zip(a,b)])
...:
Out[318]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
So you want the 0 & 2 columns from the 1st block, and 1 & 3 from the second.
Make a c that matches b in shape, and embodies this observation
In [319]: c=np.array([[0,0],[1,1]])
In [320]: c
Out[320]:
array([[0, 0],
[1, 1]])
In [321]: b
Out[321]:
array([[0, 2],
[1, 3]])
In [322]: a[c,:,b]
Out[322]:
array([[[ 0, 4, 8],
[ 2, 6, 10]],
[[13, 17, 21],
[15, 19, 23]]])
That's the right numbers, but not the right shape.
A column vector can be used instead of c.
In [323]: a[np.arange(2)[:,None],:,b] # or a[[[0],[1]],:,b]
Out[323]:
array([[[ 0, 4, 8],
[ 2, 6, 10]],
[[13, 17, 21],
[15, 19, 23]]])
As for the shape, we can transpose the last two axes
In [324]: a[np.arange(2)[:,None],:,b].transpose(0,2,1)
Out[324]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
This transpose is required because we have a slice between two index arrays, which is a mix of basic and advanced indexing. The behavior is documented, but it is nevertheless often puzzling: NumPy puts the slice dimension (of size 3) last, so we have to transpose it back.
Nice little indexing puzzle!
The latest question and explanation of this advanced/basic transpose:
Indexing numpy multidimensional arrays depends on a slicing method
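For readers who would rather avoid the transpose, two further options may be worth trying. This is only a sketch and is not part of the answer above: it assumes NumPy 1.15 or later for np.take_along_axis, and the names expected, via_take and via_index are purely illustrative.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))
b = np.array([0, 2, 1, 3]).reshape((2, 2))

# Reference result from the list comprehension in the question.
expected = np.array([np.take(a_, b_, axis=1) for a_, b_ in zip(a, b)])

# Option 1: give b a length-1 middle axis so it broadcasts over the 3 rows,
# then pick along the last axis. The result has shape (2, 3, 2).
via_take = np.take_along_axis(a, b[:, None, :], axis=2)

# Option 2: index every axis with an array, so no slice is mixed in and
# the result simply takes the broadcast shape (2, 3, 2), with no transpose.
i = np.arange(a.shape[0])[:, None, None]   # block index, shape (2, 1, 1)
j = np.arange(a.shape[1])[None, :, None]   # row index, shape (1, 3, 1)
via_index = a[i, j, b[:, None, :]]         # column index, shape (2, 1, 2)

print(np.array_equal(via_take, expected))   # True
print(np.array_equal(via_index, expected))  # True
```

Whether either variant beats the list comprehension probably depends on the array sizes, so it is worth timing them on realistic data.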
This is my first try. I will see if I can do better.
# index each block separately, stack the results with np.r_, then reshape.
np.r_[a[0][:,b[0]],a[1][:,b[1]]].reshape(2,3,2)
Out[300]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
Second try:
#convert both a and b to a 2d array and then slice all rows and only columns determined by b.
a.reshape(6,4)[np.arange(6)[:,None],b.repeat(3,0)].reshape(2,3,2)
Out[429]:
array([[[ 0, 2],
[ 4, 6],
[ 8, 10]],
[[13, 15],
[17, 19],
[21, 23]]])
Say I have a 2x3 ndarray:
[[0,1,1],
[1,1,1]]
I want to replace {any row that has 0 in the first index} with [0,0,0]:
[[0,0,0],
[1,1,1]]
Is it possible to do this with np.where?
Here's my attempt:
import numpy as np
arr = np.array([[0,1,1],[1,1,1]])
replacement = np.full(arr.shape,[0,0,0])
new = np.where(arr[:,0]==0,replacement,arr)
I'm met with the following error at the last line:
ValueError: operands could not be broadcast together with shapes (2,) (2,3) (2,3)
The error makes sense, but I don't know how to fix the code to accomplish my goal. Any advice would be greatly appreciated!
Edit:
I was trying to simplify a higher-dimensional case, but turns out it might not generalize.
If I have this ndarray:
[[[0,1,1],[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1],[1,1,1]],
[[1,1,1],[1,1,1],[1,1,1]]]
how can I replace the first triplet with [0,0,0]?
Simple indexing/broadcasting will do:
a[a[:,0]==0] = [0,0,0]
output:
array([[0, 0, 0],
[1, 1, 1]])
explanation:
# get first column
a[:,0]
# array([0, 1])
# compare to 0 creating a boolean array
a[:,0]==0
# array([ True, False])
# select rows where the boolean is True
a[a[:,0]==0]
# array([[0, 1, 1]])
# replace those rows with new array
a[a[:,0]==0] = [0,0,0]
using np.where
this is less elegant in my opinion:
a[np.where(a[:,0]==0)[0]] = [0,0,0]
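Since the question asks specifically about the three-argument form of np.where, here is a minimal sketch of how the original attempt could be made to broadcast. The variable name row_starts_with_zero is just illustrative. Unlike the in-place assignment above, this returns a new array and leaves arr untouched.

```python
import numpy as np

arr = np.array([[0, 1, 1], [1, 1, 1]])

# The condition arr[:, 0] == 0 has shape (2,). Adding a trailing axis makes
# it (2, 1), which broadcasts across the 3 columns of arr.
row_starts_with_zero = (arr[:, 0] == 0)[:, None]

# The scalar 0 broadcasts too, so no full-size replacement array is needed.
new = np.where(row_starts_with_zero, 0, arr)
print(new)
# [[0 0 0]
#  [1 1 1]]
```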
Edit: generalization
input:
a = np.arange(3**3).reshape((3,3,3))
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
transformation:
a[a[...,0]==0] = [0,0,0]
array([[[ 0, 0, 0],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
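To see why the same line works in 3D, it may help to look at the mask on its own. The sketch below uses the array from the question's edit (all ones except a single 0); the name mask is illustrative.

```python
import numpy as np

a = np.ones((3, 3, 3), dtype=int)
a[0, 0, 0] = 0             # the array from the question's edit

mask = a[..., 0] == 0      # shape (3, 3): one boolean per triplet
print(mask.sum())          # 1 -> only the first triplet is selected

a[mask] = [0, 0, 0]        # a[mask] has shape (n_selected, 3)
print(a[0])
# [[0 0 0]
#  [1 1 1]
#  [1 1 1]]
```

Note that this overwrites every triplet whose first element is 0, not only the first one in the array.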
I am generating multidimensional arrays of different sizes, though they'll all have an even number of columns.
>> import numpy as np
>> x = np.arange(24).reshape((3,8))
Which results in:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23]])
I am able to slice with numpy and get the first two columns as an array:
>> newarr = x[0:,0:2]
array([[ 0, 1],
[ 8, 9],
[16, 17]])
However, I want to have one array that is just a list of the columns where column 1 and 2 are together, 3 and 4 are together, and so on.. For example:
array([[[ 0, 1],
[ 8, 9],
[16, 17]],
[[ 2, 3],
[10, 11],
[18, 19]],
etc....]
)
This code below works but it's clunky and my arrays are not all the same. Some arrays have 16 columns, some have 34, some have 50, etc.
>> newarr = [x[0:,0:2]]+[x[0:,2:4]]+[x[0:,4:6]]
[array([[ 0, 1],
[ 8, 9],
[16, 17]]), array([[ 2, 3],
[10, 11],
[18, 19]]), array([[ 4, 5],
[12, 13],
[20, 21]])]
There's got to be a better way to do this than
newarr = [x[0:,0:2]]+[x[0:,2:4]]+[x[0:,4:6]]+...+[x[0:,n:n+2]]
Help!
My idea is to use a list comprehension:
slice_len = 2
x_list = [x[0:, slice_len*i:slice_len*(i+1)] for i in range(x.shape[1] // slice_len)]
Output:
[array([[ 0, 1],
[ 8, 9],
[16, 17]]), array([[ 2, 3],
[10, 11],
[18, 19]]), array([[ 4, 5],
[12, 13],
[20, 21]]), array([[ 6, 7],
[14, 15],
[22, 23]])]
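If a single 3D array is preferred over a list, a reshape followed by a transpose may do the same job without a Python-level loop. This is a sketch that assumes the number of columns is an exact multiple of slice_len; the name pairs is illustrative.

```python
import numpy as np

x = np.arange(24).reshape((3, 8))
slice_len = 2

# Split the 8 columns into 4 groups of 2, giving shape (3, 4, 2).
# Moving the group axis to the front then gives shape (4, 3, 2).
pairs = x.reshape(x.shape[0], -1, slice_len).transpose(1, 0, 2)

print(pairs.shape)   # (4, 3, 2)
print(pairs[1])
# [[ 2  3]
#  [10 11]
#  [18 19]]
```

Here pairs[k] holds the same values as x[:, slice_len*k:slice_len*(k+1)]. If a list of arrays is what is needed, np.hsplit(x, x.shape[1] // slice_len) is another option.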
Short version
I want to manipulate a numpy array (test, see first code snippet), so that it becomes rearranged (evenodd_single_column, see second code snippet). I wrote a for loop, but since I'm working with semi-big data I would be glad if there is a better way to achieve this.
Long version
I am writing a script where at one point I should be doing the following manipulation on a numpy array:
test = np.arange(24).reshape(8,3)
test
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]])
This array needs to be converted to one that takes the first x rows (timepoints, here 2) of every column (experiments, here 3) and stores each column's values as one row. It then moves on to the next x rows of every column and appends them, until all y iterations (here 4) are done. Rows from the even-numbered iterations come first, followed by those from the odd-numbered iterations. In the end it should look like this:
>>> evenodd_single_column
array([[ 0, 3],
[ 1, 4],
[ 2, 5],
[12, 15],
[13, 16],
[14, 17],
[ 6, 9],
[ 7, 10],
[ 8, 11],
[18, 21],
[19, 22],
[20, 23]])
To achieve this, I had to write a for loop:
import math
import numpy as np

all_odd = []
all_even = []
x = 2  # timepoints per window
y = 4  # number of windows (iterations)
test = np.arange(24).reshape(8,3)
counter = 0
for i in range(1, int(test.shape[0]/2)+1):
    time_window = i * x
    # even-numbered windows go to all_even, odd-numbered ones to all_odd
    if math.modf(counter / 2)[0] == 0:
        for j in range(0, test.shape[1]):
            all_even.extend(test[time_window - x:time_window, j])
    else:
        for j in range(0,test.shape[1]):
            all_odd.extend(test[time_window - x:time_window, j])
    counter = counter + 1
even_single_column_test = np.asarray(all_even).reshape((int(y / 2 * test.shape[1]), x))
odd_single_column_test = np.asarray(all_odd).reshape((int(y / 2 * test.shape[1]), x))
evenodd_single_column = even_single_column_test
evenodd_single_column = np.append(evenodd_single_column, odd_single_column_test).reshape(int(odd_single_column_test.shape[0]*2), x)
My question: Can this be done with one of the elegant (and more importantly - faster) numpy matrix manipulations? I don't want to go around loops, making lists to then transform them to numpy arrays again.
I am not a programmer by training, I apologize in advance if the solution is an obvious one!
Thanks!
You can use a combination of np.reshape and np.transpose -
test.reshape(2,2,2,3).transpose(1,0,3,2).reshape(-1,2)
Sample run -
In [42]: test
Out[42]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]])
In [43]: test.reshape(2,2,2,3).transpose(1,0,3,2).reshape(-1,2)
Out[43]:
array([[ 0, 3],
[ 1, 4],
[ 2, 5],
[12, 15],
[13, 16],
[14, 17],
[ 6, 9],
[ 7, 10],
[ 8, 11],
[18, 21],
[19, 22],
[20, 23]])
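The hard-coded shape (2,2,2,3) in the one-liner can be read as (window pair, even/odd, timepoint, experiment). A possible generalisation is sketched below; interleave_windows and its window parameter are hypothetical names introduced only for illustration, and the sketch assumes the row count is a multiple of 2 * window.

```python
import numpy as np

def interleave_windows(arr, window):
    """Hypothetical helper generalising the reshape/transpose one-liner."""
    n_rows, n_cols = arr.shape
    if n_rows % (2 * window):
        raise ValueError("row count must be a multiple of 2 * window")
    # Axes after this reshape: (window pair, even/odd, timepoint, experiment)
    blocks = arr.reshape(-1, 2, window, n_cols)
    # Reorder to (even/odd, window pair, experiment, timepoint) ...
    blocks = blocks.transpose(1, 0, 3, 2)
    # ... and flatten everything except the timepoint axis.
    return blocks.reshape(-1, window)

test = np.arange(24).reshape(8, 3)
result = interleave_windows(test, window=2)
one_liner = test.reshape(2, 2, 2, 3).transpose(1, 0, 3, 2).reshape(-1, 2)
print(np.array_equal(result, one_liner))   # True
```

Putting the even/odd axis first is what places all even-numbered windows before the odd-numbered ones in the output.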
I have a 2D array a with shape (k,n) and I want to 'multiply' it with a 1D array b of shape (m,):
a = np.array([[2, 8],
[4, 7],
[1, 2],
[5, 2],
[7, 4]])
b = np.array([3, 5, 5])
As a result of the 'multiplication' I'm looking for:
array([[[2*3,2*5,2*5],[8*3,8*5,8*5]],
[[4*3,4*5,4*5],[7*3,7*5,7*5]],
[[1*3,1*5,1*5], ..... ]],
................. ]]])
= array([[[ 6, 10, 10],
[24, 40, 40]],
[[12, 20, 20],
[21, 35, 35]],
[[ 3, 5, 5],
[ ........ ]],
....... ]]])
I could solve it with a loop of course, but I'm looking for a fast vectorized way of doing it.
Extend a to 3D by adding a new axis at the end with np.newaxis/None, then multiply elementwise with b. Broadcasting makes this a vectorized solution, like so -
b*a[...,None]
Sample run -
In [19]: a
Out[19]:
array([[2, 8],
[4, 7],
[1, 2],
[5, 2],
[7, 4]])
In [20]: b
Out[20]: array([3, 5, 5])
In [21]: b*a[...,None]
Out[21]:
array([[[ 6, 10, 10],
[24, 40, 40]],
[[12, 20, 20],
[21, 35, 35]],
[[ 3, 5, 5],
[ 6, 10, 10]],
[[15, 25, 25],
[ 6, 10, 10]],
[[21, 35, 35],
[12, 20, 20]]])
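The broadcasting step can be checked by looking at the shapes: a[...,None] is (5, 2, 1) and b is (3,), so the product is (5, 2, 3). The sketch below also compares it with two other spellings of the same outer product, which some readers may find more explicit; the variable names are illustrative.

```python
import numpy as np

a = np.array([[2, 8], [4, 7], [1, 2], [5, 2], [7, 4]])
b = np.array([3, 5, 5])

via_broadcast = a[..., None] * b            # (5, 2, 1) * (3,) -> (5, 2, 3)
via_outer = np.multiply.outer(a, b)         # outer product over all axes of a
via_einsum = np.einsum('ij,k->ijk', a, b)   # explicit index notation

print(via_broadcast.shape)                        # (5, 2, 3)
print(np.array_equal(via_broadcast, via_outer))   # True
print(np.array_equal(via_broadcast, via_einsum))  # True
```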
I have a 2D numpy array (i.e. a matrix) A which contains useful data interspersed with garbage in the form of column vectors, as well as a 'selection' array B which contains 1 for the columns that are important and 0 for those that are not. Is there a way to select only those columns of A that correspond to ones in B? That is, I have a matrix
A = array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
and a vector
B = array([ 0, 1, 0, 1, 0])
and I want
array([[1, 3],
[6, 8],
[11, 13],
[16, 18],
[21, 23]])
Is there an elegant way to do so? Right now I just have a for loop that iterates through B.
NOTE: the matrices that I'm dealing with are large, so I don't want to use numpy masked arrays, as I simply don't want the masked data.
>>> import numpy as NP
>>> A
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> B = NP.array([ 0, 1, 0, 1, 0])
>>> # convert the indexing array to a boolean array
>>> B = NP.array(B, dtype=bool)
>>> # index A against B--indexing array is placed after the ',' because
>>> # you are selecting columns
>>> res = A[:,B]
>>> res
array([[ 1, 3],
[ 6, 8],
[11, 13],
[16, 18],
[21, 23]])
The syntax for index-based slicing in NumPy is elegant and simple. A couple of rules cover a majority of use cases:
- the form is [rows, columns]
- specify all rows or all columns using a colon ":", e.g., [:, 4] (extracts the entire 5th column)
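The conversion to a boolean dtype matters here, which the following sketch tries to illustrate: an integer array of 0s and 1s is treated as a list of column positions, whereas a boolean array is treated as a mask. The variable names are illustrative.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = np.array([0, 1, 0, 1, 0])

as_positions = A[:, B]                 # integers are positions: columns 0,1,0,1,0
as_mask = A[:, B.astype(bool)]         # booleans are a mask: columns 1 and 3
as_indices = A[:, np.flatnonzero(B)]   # same columns via explicit indices

print(as_positions.shape)   # (5, 5)
print(as_mask.shape)        # (5, 2)
print(np.array_equal(as_mask, as_indices))   # True
```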
Not sure if it's the most efficient way (because of the transposition), but it should be better than a for loop:
A.T[B == 1].T
I wanted to do the same, but slicing both rows and columns with the boolean values of vector B. The solution was simple (B must be a boolean array):
res = A[:,B][B,:]
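A single-step alternative to the chained indexing is np.ix_, which builds an open mesh from the two masks. This is a sketch with illustrative variable names, again assuming B is boolean.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
B = np.array([0, 1, 0, 1, 0], dtype=bool)

chained = A[:, B][B, :]       # two indexing steps with an intermediate array
open_mesh = A[np.ix_(B, B)]   # one step; np.ix_ accepts boolean sequences
paired = A[B, B]              # pitfall: pairs the indices up element by element

print(open_mesh)
# [[ 6  8]
#  [16 18]]
print(paired)   # [ 6 18]
```

The last line shows why A[B, B] is not a substitute: it selects only the elements at (1, 1) and (3, 3) rather than the full 2x2 block.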