This question already has answers here:
Indexing one array by another in numpy
(4 answers)
Closed 4 years ago.
For each row of y, I would like to get the elements whose column indices are given in the corresponding row of m.
>>> y = np.arange(15).reshape(3,5)
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
>>> m = np.array([[0, 1], [1, 2], [2, 3]])
Expected output:
[[0, 1]
[6, 7]
[12, 13]]
Solution with a for loop:
>>> np.stack([y[i, cols] for i, cols in enumerate(m)])
Is there a way to do this without a for loop?
Using the values of one array as indices into another is called 'fancy indexing'; however, that indexing operation is repeated for every row:
y = numpy.arange(15).reshape(3,5)
y[:, [0, 2, 3]]
# array([[ 0, 2, 3],
# [ 5, 7, 8],
# [10, 12, 13]])
If you want to individually "use one index value per row", you need to give that row-to-index relation as another index:
y[[0, 1, 2], [0, 2, 3]]
# array([ 0, 7, 13])
Since your index array m is 2D, you need to tell NumPy which of the two dimensions of m corresponds to your row index. You do this by adding an empty axis to the ascending row index (keyword: broadcasting), and you get
y = numpy.arange(15).reshape(3,5)
m = numpy.array([[0, 1], [1, 2], [2, 3]])
y[numpy.arange(len(m))[:, None], m]
# array([[ 0, 1],
# [ 6, 7],
# [12, 13]])
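To see why the extra axis matters, here is a small sketch that makes the broadcast explicit with np.broadcast_arrays. The names rows, rows_b and m_b are only for illustration and do not appear in the answer above.

```python
import numpy as np

y = np.arange(15).reshape(3, 5)
m = np.array([[0, 1], [1, 2], [2, 3]])

# Column vector of row numbers, shape (3, 1)
rows = np.arange(len(m))[:, None]

# Broadcasting stretches rows to the shape of m, so every column index
# in m is paired with the number of the row it sits in.
rows_b, m_b = np.broadcast_arrays(rows, m)
print(rows_b)
# [[0 0]
#  [1 1]
#  [2 2]]

# Indexing with the (broadcast) pair picks y[rows_b[i, j], m[i, j]]
result = y[rows, m]
print(result)
# [[ 0  1]
#  [ 6  7]
#  [12 13]]
```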
One line, although not much nicer than your own proposal using the for loop:
y[..., m][np.identity(3, dtype=bool)]
It does give some insight into NumPy indexing, though.
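A step-by-step sketch of what that one-liner does, assuming the y and m from the question. The intermediate names all_pairs and diag are made up for this illustration.

```python
import numpy as np

y = np.arange(15).reshape(3, 5)
m = np.array([[0, 1], [1, 2], [2, 3]])

# y[..., m] applies every row of m to every row of y.
# The result has shape (3, 3, 2) with all_pairs[i, j] == y[i, m[j]].
all_pairs = y[..., m]
print(all_pairs.shape)  # (3, 3, 2)

# A boolean identity matrix keeps only the entries with i == j,
# i.e. row i of y indexed by row i of m.
diag = all_pairs[np.identity(3, dtype=bool)]
print(diag)
# [[ 0  1]
#  [ 6  7]
#  [12 13]]
```

Note that the intermediate array holds one block per (row of y, row of m) pair, so its size grows with the square of the number of rows.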
I have a list of lists of indices for a NumPy array, but I do not quite get the wanted result when using them.
n = 3
a = np.array([[8, 1, 6],
[3, 5, 7],
[4, 9, 2]])
np.random.seed(7)
idx = np.random.choice(np.arange(n), size=(n, n-1))
# array([[0, 1],
# [2, 0],
# [1, 2]])
In this case I want:
element 0 and 1 of row 0
element 2 and 0 of row 1
element 1 and 2 of row 2
My list has n sublists, and all of them have the same length.
I want each sublist to be used only once, for its own row, and not for every row.
# Wanted result
# b = array([[8, 1],
# [7, 3],
# [9, 2]])
I can achieve this but it seems rather cumbersome with a lot of repeating and reshaping.
# Possibility 1
b = a[:, idx]
# array([[[8, 1], | [[3, 5], | [[4, 9],
# [6, 8], | [7, 3], | [2, 4],
# [1, 6]], | [5, 7]], | [9, 2]])
b = b[np.arange(n), np.arange(n), :]
# Possibility 2
b = a[np.repeat(range(n), n-1), idx.ravel()]
# array([8, 1, 7, 3, 9, 2])
b = b.reshape(n, n-1)
Are there easier ways?
You can use np.take_along_axis here:
np.take_along_axis(a, idx, 1)
array([[8, 1],
[7, 3],
[9, 2]])
Or using broadcasting:
a[np.arange(a.shape[0])[:,None], idx]
array([[8, 1],
[7, 3],
[9, 2]])
Note that you're using integer array indexing here, so you need to specify along which axis, and for which rows, idx should be applied.
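As a quick check, the sketch below compares both approaches with a plain loop. It hard-codes the idx shown in the question instead of drawing it at random, and the variable names are illustrative only.

```python
import numpy as np

a = np.array([[8, 1, 6],
              [3, 5, 7],
              [4, 9, 2]])
# Fixed copy of the idx printed in the question
idx = np.array([[0, 1],
                [2, 0],
                [1, 2]])

# idx[i, j] is a column number to take from row i
via_take = np.take_along_axis(a, idx, axis=1)

# Same selection with an explicit, broadcast row index
via_broadcast = a[np.arange(a.shape[0])[:, None], idx]

# Reference result from a Python loop
via_loop = np.stack([a[i, cols] for i, cols in enumerate(idx)])

print(via_take)
# [[8 1]
#  [7 3]
#  [9 2]]
```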
Is it possible to extract the upper values from the whole 3D array?
A simple example of a 3D array is below:
import numpy as np
a = np.array([[[7, 4, 2], [5, 0, 4], [0, 0, 5]],
[[7, 6, 1], [3, 9, 5], [0, 8, 7]],
[[8, 10, 3], [1, 2, 15], [9, 0, 1]]])
You can use NumPy's matrix-building functions such as numpy.triu (triangle-upper) or numpy.tril (triangle-lower) to return a copy of a matrix with the elements below or above the k-th diagonal zeroed, respectively.
If, on the other hand, you are only interested in the values above or below the diagonal (without having a copy of the matrix), you can simply use numpy.triu_indices and numpy.tril_indices, as follows:
upper_index = np.triu_indices(n=3, k=1)
Here n is the size of the arrays for which the returned indices will be valid, and k is the diagonal offset. The call returns the indices of the triangle as a tuple of two arrays, each holding the indices along one dimension of the array:
(array([0, 0, 1], dtype=int64), array([1, 2, 2], dtype=int64))
Now you can use these indices to index your array:
a[upper_index]
which gives:
array([[5, 0, 4],
[0, 0, 5],
[0, 8, 7]])
Similarly you can find the part under the diagonal using numpy.tril_indices.
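If "upper values" is meant per 2D slice, that is, over the last two axes of the 3D array (an assumption about the question's intent), the same index arrays can be applied behind a leading slice. A minimal sketch, with rows, cols and upper_per_slice as illustrative names:

```python
import numpy as np

a = np.array([[[7, 4, 2], [5, 0, 4], [0, 0, 5]],
              [[7, 6, 1], [3, 9, 5], [0, 8, 7]],
              [[8, 10, 3], [1, 2, 15], [9, 0, 1]]])

rows, cols = np.triu_indices(n=3, k=1)

# Keep axis 0 whole and index the last two axes:
# one row of results for each 2D slice a[i]
upper_per_slice = a[:, rows, cols]
print(upper_per_slice)
# [[ 4  2  4]
#  [ 6  1  5]
#  [10  3 15]]
```

By contrast, a[upper_index] indexes the first two axes, which is why it returns whole length-3 rows.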
IIUC, you could use triu_indices:
result = a[np.triu_indices(3)]
print(result)
Output
[[7 4 2]
[5 0 4]
[0 0 5]
[3 9 5]
[0 8 7]
[9 0 1]]
If you want those strictly above the diagonal, you can pass an offset value:
result = a[np.triu_indices(3, 1)]
print(result)
Output
[[5 0 4]
[0 0 5]
[0 8 7]]
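To make explicit which entries are selected, the following sketch lists the (i, j) pairs produced by triu_indices and rebuilds the result by hand. The names pairs, picked and manual are only for illustration.

```python
import numpy as np

a = np.array([[[7, 4, 2], [5, 0, 4], [0, 0, 5]],
              [[7, 6, 1], [3, 9, 5], [0, 8, 7]],
              [[8, 10, 3], [1, 2, 15], [9, 0, 1]]])

i, j = np.triu_indices(3, 1)
pairs = list(zip(i.tolist(), j.tolist()))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# The two index arrays address the first two axes of a,
# so each pair (r, c) selects the length-3 vector a[r, c].
picked = a[i, j]
manual = np.stack([a[r, c] for r, c in pairs])
print(np.array_equal(picked, manual))  # True
```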
I am learning NumPy and want to understand the following data-shuffling code:
# x is an m*n np.array
# return a shuffled-rows array
def shuffle_col_vals(x):
    rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
    grid = np.indices(x.shape)
    rand_y = grid[1]
    return x[(rand_x, rand_y)]
So I pass in an np.array object as follows:
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
And I get an output from shuffle_col_vals(x1) like the following:
array([[ 1, 5, 11, 15],
[ 3, 8, 9, 14],
[ 4, 6, 12, 16],
[ 2, 7, 10, 13]], dtype=int64)
I am confused by the way rand_x is initialized; I have not seen this construction with numpy.array before.
I have been thinking about it for a long time, but I still don't understand why return x[(rand_x, rand_y)] yields a shuffled-rows array.
If you don't mind, could anyone explain the code to me?
Thanks in advance.
In indexing Numpy arrays, you can take single elements. Let's use a 3x4 array to be able to differentiate between the axes:
In [1]: x1 = np.array([[1, 2, 3, 4],
...: [5, 6, 7, 8],
...: [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
If you review Numpy Advanced indexing, you will find that you can do more in indexing, by providing lists for each dimension. Consider indexing with x1[rows..., cols...], let's take two elements.
Pick from the first and second row, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
You can even index with arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
[5, 6]])
np.indices creates a row and col array, that if used for indexing, give back the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.all(x1[grid[0], grid[1]] == x1)
Out[6]: True
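For a concrete picture of what np.indices returns, here is a small sketch on a 2x3 shape. The array x and the names row_idx and col_idx are invented for this example.

```python
import numpy as np

row_idx, col_idx = np.indices((2, 3))

print(row_idx)   # each entry holds its own row number
# [[0 0 0]
#  [1 1 1]]
print(col_idx)   # each entry holds its own column number
# [[0 1 2]
#  [0 1 2]]

x = np.array([[10, 20, 30],
              [40, 50, 60]])

# Position (i, j) of the result is x[row_idx[i, j], col_idx[i, j]] == x[i, j]
same = x[row_idx, col_idx]
print(np.array_equal(same, x))  # True
```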
Now if you shuffle the values of grid[0] column-wise, but keep grid[1] as-is, and then use these for indexing, you get an array with the values of each column shuffled.
Each column of grid[0] is the row-index vector [0, 1, 2]. The code shuffles this vector for each column individually and stacks the results into rand_x, which has the same shape as x1.
Create a single shuffled column index vector:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
The stacking works by (pseudo-code) stacking with [random-index-col-vec for cols in range(x1.shape[1])] and then transposing (.T).
To make it a little clearer we can rewrite i as col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
    ...:             for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
[1, 0, 2, 0],
[0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10, 3, 12],
[ 5, 2, 11, 4],
[ 1, 6, 7, 8]])
Details to note:
The example output you give is different from what the function you provide does. It seems to be transposed.
The use of rand_x and rand_y in the sample code can be confusing if you are used to the convention of x = column index, y = row index.
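The same idea, an independent permutation of the row numbers for every column, can also be sketched without a Python loop. This is an alternative formulation, not the code from the question: it uses np.argsort on random numbers to build the permutations, and the seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

x1 = np.arange(1, 17).reshape(4, 4)

# Sorting random numbers along axis 0 yields, in every column,
# an independent permutation of the row numbers 0..3.
rand_rows = np.argsort(rng.random(x1.shape), axis=0)

# shuffled[i, j] == x1[rand_rows[i, j], j]
shuffled = np.take_along_axis(x1, rand_rows, axis=0)

# Equivalent fancy-indexing form: a 1D column index broadcasts
# against rand_rows, so np.indices is not needed.
same = x1[rand_rows, np.arange(x1.shape[1])]

print(np.array_equal(shuffled, same))  # True
```

Each column of shuffled contains the same values as the matching column of x1, only in a different order.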
See output:
import numpy as np
def shuffle_col_val(x):
    print("----------------------------\n A rand_x\n")
    f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
    print(f, "\nNow I transpose an array.")
    rand_x = np.array([f]).T
    print(rand_x)
    print("----------------------------\n B rand_y\n")
    print("Grid gives you two possibilities\n you choose second:")
    grid = np.indices(x.shape)
    print(format(grid))
    rand_y = grid[1]
    print("\n----------------------------\n C Our rand_x, rand_y:")
    print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
    print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
    return x[(rand_x, rand_y)]
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
A rand_x
[2 3 0 1]
Now I transpose an array.
[[2]
[3]
[0]
[1]]
----------------------------
B rand_y
Grid gives you two possibilities, you choose second:
[[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]]
----------------------------
C Our rand_x, rand_y:
The order of values in the column CHANGE: has random order
[[2]
[3]
[0]
[1]]
The order of values in the row NO CHANGE: has normal order 0, 1, 2, 3
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
----------------------------
D Our shuffled-rows:
[[ 9 10 11 12]
[13 14 15 16]
[ 1 2 3 4]
[ 5 6 7 8]]
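In this variant a single permutation is broadcast across all columns, so whole rows move together. A short sketch, reusing the permutation [2, 3, 0, 1] from the output above, suggests the result is the same as indexing the rows directly; the names perm, via_grid and via_rows are illustrative.

```python
import numpy as np

x1 = np.arange(1, 17).reshape(4, 4)
perm = np.array([2, 3, 0, 1])  # the permutation printed in section A

cols = np.indices(x1.shape)[1]

# perm[:, None] has shape (4, 1) and broadcasts against cols (4, 4):
# every column receives the same row order.
via_grid = x1[perm[:, None], cols]

# Plain row selection gives the same array
via_rows = x1[perm]

print(np.array_equal(via_grid, via_rows))  # True
```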
I have a 2D NumPy array that contains duplicate values.
I am searching the array like this:
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
...: [1, 2, 3],
...: [2, 5, 6],
...: [3, 8, 9],
...: [4, 8, 9],
...: [4, 2, 3],
...: [5, 2, 3]])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list :
...: print(a[np.where(a[:,0] == i)])
...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is a list of numbers to match against the values in column 0.
The end result I want is the matching rows in any format (array, list or tuple), for example:
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine but doesn't seem Pythonic. Is there a better strategy for searching with multiple values,
like a[np.where(a[:,0] == l)], where only a single lookup is done to get all the values?
My real array is large.
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
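The NumPy documentation recommends np.isin over np.in1d for new code. A minimal sketch of the same lookup with np.isin, using the array from the question:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [2, 5, 6],
              [3, 8, 9],
              [4, 8, 9],
              [4, 2, 3],
              [5, 2, 3]])
num_list = [1, 4, 5]

# Boolean mask: True where the first column is one of the wanted values
mask = np.isin(a[:, 0], num_list)
result = a[mask]
print(result)
# [[1 2 3]
#  [1 2 3]
#  [4 8 9]
#  [4 2 3]
#  [5 2 3]]
```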
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
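The out-of-bounds step only matters when the first column contains a value larger than every entry of num_list. The sketch below wraps the approach in a hypothetical helper, rows_with_first_col_in, and runs it on a made-up array that triggers this case.

```python
import numpy as np

def rows_with_first_col_in(a, values):
    """Hypothetical helper: rows of a whose first column is in values."""
    vals = np.sort(np.asarray(values))
    # Position at which each first-column value would be inserted
    idx = np.searchsorted(vals, a[:, 0])
    # Values larger than every entry get index len(vals); point them
    # at position 0 so that vals[idx] stays in range (they cannot match).
    idx[idx == len(vals)] = 0
    return a[a[:, 0] == vals[idx]]

a = np.array([[1, 2, 3],
              [3, 8, 9],
              [4, 2, 3],
              [9, 0, 0]])  # 9 is larger than every wanted value

out = rows_with_first_col_in(a, [5, 1, 4])
print(out)
# [[1 2 3]
#  [4 2 3]]
```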
You can do
a[numpy.in1d(a[:, 0], num_list), :]
I have a numpy array, say:
>>> a=np.array([[0,1,2],[4,3,6],[9,5,7],[8,9,8]])
>>> a
array([[0, 1, 2],
[4, 3, 6],
[9, 5, 7],
[8, 9, 8]])
I want to replace the second and third column elements with the minimum of them (row by row), except if one of these 2 elements is < 3.
The resulting array should be:
array([[0, 1, 2],# nothing changes since 1 and 2 are <3
[4, 3, 3], #min(3,6)=3 => 6 changed to 3
[9, 5, 5], #min(5,7)=5 => 7 changed to 5
[8, 8, 8]]) #min(9,8)=8 => 9 changed to 8
I know I can use clip, for instance a[:,1:3].clip(2,6,a[:,1:3]), but
1) clip will be applied to all elements, including those <3.
2) I don't know how to set the min and max values of clip to the minimum values of the 2 related elements of each row.
Use the >= operator to first select the rows you are interested in:
b = a[:, 1:3] # select the columns (a view into a)
matching = numpy.all(b >= 3, axis=1) # find rows with all elements matching
Now you can replace the content with the minimum. Assign through a itself, because b[matching, :] is a copy and writing to it would leave a unchanged:
# find the row minimum and keep it as a column vector
a[matching, 1:3] = b[matching].min(axis=1, keepdims=True)
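If the original array should stay untouched, the same selection can be written with np.where on a copy. This is a sketch of an alternative, with out and row_min as illustrative names.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [4, 3, 6],
              [9, 5, 7],
              [8, 9, 8]])

b = a[:, 1:3]
matching = np.all(b >= 3, axis=1)        # rows where both elements are >= 3
row_min = b.min(axis=1, keepdims=True)   # shape (4, 1)

out = a.copy()
# Where a row matches, take its minimum; otherwise keep the old values
out[:, 1:3] = np.where(matching[:, None], row_min, b)
print(out)
# [[0 1 2]
#  [4 3 3]
#  [9 5 5]
#  [8 8 8]]
```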
We first define a row_mask that is True for rows where neither of the two elements is < 3, and then apply a minimum along an axis to find the minimum (for the rows in row_mask).
The newaxis part is required for the broadcasting of a 1-dim array (of minimums) to the 2-dim target of the assignment.
a=np.array([[0,1,2],[4,3,6],[9,5,7],[8,9,8]])
row_mask = (a[:,1:]>=3).all(axis=1)
a[row_mask, 1:] = a[row_mask, 1:].min(axis=1)[...,np.newaxis]
a
=>
array([[0, 1, 2],
[4, 3, 3],
[9, 5, 5],
[8, 8, 8]])
Here's a one liner:
a[np.where(np.sum(a,axis=1)>3),1:3]=np.min(a[np.where(np.sum(a,axis=1)>3),1:3],axis=2).reshape(1,3,1)
Here's a breakdown:
>>> b = np.where(np.sum(a,axis=1)>3) # finds rows where, in a, row sums are > 3
(array([1, 2, 3]),)
>>> c = a[b,1:3] # the part of a that needs to change
array([[[3, 6],
[5, 7],
[9, 8]]])
>>> d = np.min(c,axis=2) # the minimum values in each row (cols 1 and 2)
array([[3, 5, 8]])
>>> e = d.reshape(1,3,1) # adjust shape for broadcast to a
array([[[3],
[5],
[8]]])
>>> a[np.where(np.sum(a,axis=1)>3),1:3] = e # set the values in a
>>> a
array([[0, 1, 2],
[4, 3, 3],
[9, 5, 5],
[8, 8, 8]])
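The row-sum test selects the right rows for this particular a, but it is not the same condition as "both elements are at least 3". A small sketch with a made-up first row shows where the two tests differ; by_sum and by_elements are illustrative names.

```python
import numpy as np

a = np.array([[0, 1, 5],   # made-up row: the sum is 6, but 1 < 3
              [4, 3, 6]])

# Test used in the one-liner: row sum greater than 3
by_sum = np.sum(a, axis=1) > 3

# Test on the two elements themselves
by_elements = np.all(a[:, 1:3] >= 3, axis=1)

print(by_sum)       # [ True  True]
print(by_elements)  # [False  True]
```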