import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
print(a[[0, 1, 2], [0, 1, 0]]) # Prints "[1 4 5]"
print(a[[0, 0], [1, 1]]) # Prints "[2 2]"
I don't understand why these return [1 4 5] and [2 2].
Because you're indexing the array with integer arrays (advanced indexing), not slicing it:
a[[0, 1, 2], [0, 1, 0]] is equivalent to
a[0, 0] # 1
a[1, 1] # 4
a[2, 0] # 5
whereas a[[0, 0], [1, 1]] is equivalent to a[0, 1] taken twice, i.e. [a[0, 1], a[0, 1]] # [2 2]
More about Numpy indexing here
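A quick way to check this pairing yourself (a small sketch that uses only the array from the question): the list of row indexes and the list of column indexes are zipped into (row, col) pairs, and each pair selects one element.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

rows, cols = [0, 1, 2], [0, 1, 0]
# Integer-array indexing pairs rows[k] with cols[k] for each position k
fancy = a[rows, cols]
manual = np.array([a[r, c] for r, c in zip(rows, cols)])
print(fancy, manual)  # [1 4 5] [1 4 5]

# The second expression pairs (0, 1) with (0, 1), i.e. a[0, 1] twice
print(a[[0, 0], [1, 1]])  # [2 2]
```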
Think of it as 2D array access. When you initialize a, you get a 2D array of the form:
[ 1 2 ]
[ 3 4 ]
[ 5 6 ]
When you index a 2D array with two lists, NumPy works as follows: you pass a list of row indexes, then a list of column indexes, and the two lists are paired up element by element. Semantically, your first statement says "from row 0 retrieve column 0, from row 1 retrieve column 1, and from row 2 retrieve column 0", which corresponds to [1 4 5]. You can then work out why you get [2 2] for the second statement.
You can read more about this advanced indexing here: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing
I'm trying to replace values in specific columns with zero with python, and the column numbers are specified in another array.
Given the following 2 numpy arrays
a = np.array([[ 1, 2, 3, 4],
[ 1, 2, 1, 2],
[ 0, 3, 2, 2]])
and
b = np.array([1,3])
b indicates column numbers in array "a" where values need to be replaced with zero.
So the expected output is
([[ 1, 0, 3, 0],
[ 1, 0, 1, 0],
[ 0, 0, 2, 0]])
Any ideas on how I can accomplish this? Thanks.
Your question is:
I'm trying to replace values in specific columns with zero with python, and the column numbers are specified in another array.
This can be done like this:
a[:,b] = 0
Output:
[[1 0 3 0]
[1 0 1 0]
[0 0 2 0]]
The Integer array indexing section of Indexing on ndarrays in the numpy docs has some similar examples.
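A self-contained version of that one-liner (a sketch using the arrays from the question). The assignment modifies a in place; if you need to keep the original, assign into a copy instead, as done here with the illustrative name zeroed:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 2, 1, 2],
              [0, 3, 2, 2]])
b = np.array([1, 3])

zeroed = a.copy()   # keep the original a untouched
# ":" selects every row, b selects columns 1 and 3
zeroed[:, b] = 0
print(zeroed)
```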
A simple for loop will accomplish this.
for column in b:
    for row in range(len(a)):
        a[row, column] = 0  # a[row][column] also works, but indexes in two steps
print(a)
[[1 0 3 0]
[1 0 1 0]
[0 0 2 0]]
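The inner loop over rows is not strictly needed: a[:, column] addresses a whole column at once, so a single loop over b is enough (a sketch that produces the same result as the double loop):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 2, 1, 2],
              [0, 3, 2, 2]])
b = np.array([1, 3])

for column in b:
    # Zero every row of this column in a single assignment
    a[:, column] = 0
print(a)
```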
I'm trying to get values by indices from an np.array or pd.DataFrame. Suppose the raw array has shape [x, y] and my indices array has shape [x, z]. I want to take values from each column using the indices, so that each column turns into z columns. I tried using take directly, but it doesn't give me what I want, so I have to apply it in a loop over the columns. My code is as follows:
import numpy as np
arr = np.asarray([[0, 1, 2, 4], [1, 2, 3, 4], [2, 3, 4, 5]])
print(f"input array:\n{arr}")
indices = np.asarray([[-1, 0, 1], [-1, -1, 0]]).T
print(f"indices:\n{indices}")
res_0 = arr.take(indices)
print(f"take directly:\n{res_0}")
result_list = []
for i in range(arr.shape[1]):
    result_list.append(arr[:, i].take(indices))
res_1 = np.concatenate(result_list, axis=-1)
print(f"expected result:\n{res_1}")
The output of the code is as follows:
input array:
[[0 1 2 4]
[1 2 3 4]
[2 3 4 5]]
indices:
[[-1 -1]
[ 0 -1]
[ 1 0]]
take directly:
[[5 5]
[0 5]
[1 0]]
expected result:
[[2 2 3 3 4 4 5 5]
[0 2 1 3 2 4 4 5]
[1 0 2 1 3 2 4 4]]
For each column of arr, using the indices to select will generate two new columns (each column in indices will generate a new column). Thus, we finally get a new array with shape [3, 4*2].
Using take directly doesn't achieve what I want, and the loop isn't very neat.
Is there any more efficient way to implement this?
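One loop-free possibility (a sketch, not taken from this thread): the direct call flattens arr because take without an axis argument works on the flattened array. Passing axis=0 instead selects whole rows of arr, giving a (3, 2, 4) result where picked[i, j, c] == arr[indices[i, j], c]. Swapping the last two axes and reshaping then puts the z new columns for each original column next to each other. The names picked and res are illustrative.

```python
import numpy as np

arr = np.asarray([[0, 1, 2, 4], [1, 2, 3, 4], [2, 3, 4, 5]])
indices = np.asarray([[-1, 0, 1], [-1, -1, 0]]).T  # shape (3, 2)

# Select rows of arr; negative indices wrap just as in the per-column loop.
# arr[indices] would give the same (3, 2, 4) array.
picked = arr.take(indices, axis=0)

# (row, index column, arr column) -> (row, arr column, index column),
# then merge the last two axes into 4 * 2 columns
res = picked.transpose(0, 2, 1).reshape(arr.shape[0], -1)
print(res)
```

The result matches the "expected result" printed by the loop version above.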
I'd like to generate a NumPy array for the shape of another NumPy array. The generated array should contain, for each cell of the other array, that cell's indices.
Example 1
Let's say we have a = np.ones((3,)) which has a shape of (3,). I'd expect
[[0]
[1]
[2]]
since a contains a[0], a[1] and a[2], which can be accessed by the indices 0, 1 and 2.
Example 2
For a shape of (3, 2), as in b = np.ones((3, 2)), the expected output is already much longer. I'd expect
[[[0 0]
[0 1]]
[[1 0]
[1 1]]
[[2 0]
[2 1]]]
since there are 6 cells in b which can be accessed by the corresponding indices b[0][0], b[0][1] for the first row, b[1][0], b[1][1] for the second row and b[2][0], b[2][1] for the third row. Therefore we get [0 0], [0 1], [1 0], [1 1], [2 0] and [2 1] at the matching positions in the generated array.
Thank you very much for taking the time. Let me know if I can clarify the question in any way.
One way to do it with np.indices and np.stack:
np.stack(np.indices((3,)), -1)
#array([[0],
# [1],
# [2]])
np.stack(np.indices((3,2)), -1)
#array([[[0, 0],
# [0, 1]],
# [[1, 0],
# [1, 1]],
# [[2, 0],
# [2, 1]]])
np.indices returns an index grid with one subarray per axis; each subarray holds that axis's index at every position:
np.indices((3, 2))
#array([[[0, 0],
# [1, 1],
# [2, 2]],
# [[0, 1],
# [0, 1],
# [0, 1]]])
Then np.stack(..., -1) stacks those per-axis subarrays along a new last axis, so the indices of each element from the different axes end up side by side (effectively a transpose):
np.stack(np.indices((3,2)), -1)
#array([[[0, 0],
# [0, 1]],
# [[1, 0],
# [1, 1]],
# [[2, 0],
# [2, 1]]])
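The same rearrangement works for any shape. Here is a small sketch wrapped in a hypothetical helper, index_grid, that uses np.moveaxis instead of np.stack (both move the per-axis dimension of np.indices to the end), plus a check that every cell really holds its own index:

```python
import numpy as np

def index_grid(shape):
    """Hypothetical helper: array of shape (*shape, len(shape)) whose
    entry at position idx is idx itself."""
    # np.indices gives shape (len(shape), *shape); moving that first axis
    # to the end is the same rearrangement as np.stack(..., -1)
    return np.moveaxis(np.indices(shape), 0, -1)

b = np.ones((3, 2))
grid = index_grid(b.shape)
print(grid.shape)  # (3, 2, 2)

# Every cell of the grid holds its own index tuple
for idx in np.ndindex(b.shape):
    assert tuple(grid[idx].tolist()) == idx
```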
I wrote the following:
arr3=np.array([[[1,2,3],[1,2,3],[1,2,3],[1,2,3]],[[2,2,3],[4,2,3],[4,2,2],[2,2,2]],[[1,1,1],[1,1,1],[1,1,1],[1,1,1]]])
I expected
arr3[0:3,1] to return the same result as
arr3[0:3][1], which is array([[2, 2, 3],[4, 2, 3],[4, 2, 2],[2, 2, 2]]).
But it returns array([[1, 2, 3],[4, 2, 3],[1, 1, 1]]).
BTW, I am using python3 in Jupyter notebook
When doing arr3[0:3,1], you are taking elements 0:3 along the first axis and then, for each of those, taking the element at index 1 (the second element) along the second axis.
This gives a different result from taking 0:3 along the first axis with arr3[0:3] and then taking the array at index 1 (the second array) along that same first axis.
In this case, the 0:3 part does nothing in either expression: the array's shape is (3, 4, 3), so taking the first 3 entries of the first axis just gives you back the same array. In the second expression it has no effect at all; in the first, it essentially serves as a placeholder that lets you reach the second axis, and for that you should simply use a colon: [:, some_index].
See how it's the same array?
>>> arr3[0:3]
array([[[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
[1, 2, 3]],
[[2, 2, 3],
[4, 2, 3],
[4, 2, 2],
[2, 2, 2]],
[[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]]])
But then when you do arr3[:, 1] you are taking the second element from the second axis of the array so that will give you:
array([[1, 2, 3],
[4, 2, 3],
[1, 1, 1]])
whereas in the other case, you are taking the second element along the first axis of the array, so:
array([[2, 2, 3],
[4, 2, 3],
[4, 2, 2],
[2, 2, 2]])
To read further about numpy indexing, take a look at this page on scipy.
Take note of this specific description which applies directly to your problem:
When there is at least one slice (:), ellipsis (...) or np.newaxis in the index (or the array has more dimensions than there are advanced indexes), then the behaviour can be more complicated. It is like concatenating the indexing result for each advanced index element
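A short check of the points above (a sketch using the array from the question): 0:3 spans the whole first axis, so it changes nothing, and the two expressions index different axes, which is why even their shapes differ.

```python
import numpy as np

arr3 = np.array([[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],
                 [[2, 2, 3], [4, 2, 3], [4, 2, 2], [2, 2, 2]],
                 [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]])
print(arr3.shape)  # (3, 4, 3)

# 0:3 covers the whole first axis, so the slice is the full array
print(np.array_equal(arr3[0:3], arr3))            # True
# ...which is why arr3[0:3, 1] and arr3[:, 1] pick the same rows
print(np.array_equal(arr3[0:3, 1], arr3[:, 1]))   # True
# index 1 on the second axis vs. index 1 on the first axis
print(arr3[:, 1].shape, arr3[1].shape)            # (3, 3) (4, 3)
```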
Let's look at our multidimensional numpy array:
import numpy as np
arr3=np.array([
[
[1,2,3],[1,2,3],[1,2,3],[1,2,3]
],[
[2,2,3],[4,2,3],[4,2,2],[2,2,2]
],[
[1,1,1],[1,1,1],[1,1,1],[1,1,1]
]
])
print(arr3[0:3,1])
That returns:
[[1 2 3]
[4 2 3]
[1 1 1]]
This makes sense because we fetch entries 0 through 2 along the first axis (0:3 stops before 3) and, from each of them, grab only the row at index 1 along the second axis.
However, arr3[0:3][1] first returns entries 0 through 2 along the first axis and then selects the second one (index 1) of that result.
Observe:
print(arr3[0:3])
Returns:
[[[1 2 3]
[1 2 3]
[1 2 3]
[1 2 3]]
[[2 2 3]
[4 2 3]
[4 2 2]
[2 2 2]]
[[1 1 1]
[1 1 1]
[1 1 1]
[1 1 1]]]
It returns an array that happens to be identical to our current array, because 0:3 asks for all three entries along the first axis. Then we ask for the second entry (index 1):
print(arr3[0:3][1])
Returns:
[[2 2 3]
[4 2 3]
[4 2 2]
[2 2 2]]
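Both readings can be spelled out step by step (a sketch; one_step, by_hand, first and chained are illustrative names). arr3[0:3, 1] is the same as taking row 1 of each of arr3[0], arr3[1] and arr3[2], while arr3[0:3][1] first produces a slice (a view that shares memory with arr3) and then indexes its first axis:

```python
import numpy as np

arr3 = np.array([[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],
                 [[2, 2, 3], [4, 2, 3], [4, 2, 2], [2, 2, 2]],
                 [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]])

# arr3[0:3, 1] in one step: row 1 (second axis) of each of arr3[0], arr3[1], arr3[2]
one_step = arr3[0:3, 1]
by_hand = np.stack([arr3[i, 1] for i in range(0, 3)])

# arr3[0:3][1] in two steps: slice first, then index the slice's first axis
first = arr3[0:3]          # basic slicing returns a view of arr3
chained = first[1]

print(np.array_equal(one_step, by_hand))   # True
print(np.array_equal(chained, arr3[1]))    # True
print(np.shares_memory(first, arr3))       # True
```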
From "Getting indices of both zero and nonzero elements in array", I can get the indices of non-zero elements in a 1D array in NumPy like this:
indices_nonzero = numpy.arange(len(array))[~bindices_zero]
Is there a way to extend it to a 2D array?
You can use numpy.nonzero
The following code is self-explanatory
import numpy as np
A = np.array([[1, 0, 1],
[0, 5, 1],
[3, 0, 0]])
nonzero = np.nonzero(A)
# Returns a tuple of (nonzero_row_index, nonzero_col_index)
# That is (array([0, 0, 1, 1, 2]), array([0, 2, 1, 2, 0]))
nonzero_row = nonzero[0]
nonzero_col = nonzero[1]
for row, col in zip(nonzero_row, nonzero_col):
    print("A[{}, {}] = {}".format(row, col, A[row, col]))
"""
A[0, 0] = 1
A[0, 2] = 1
A[1, 1] = 5
A[1, 2] = 1
A[2, 0] = 3
"""
You can even do this
A[nonzero] = -100
print(A)
"""
[[-100 0 -100]
[ 0 -100 -100]
[-100 0 0]]
"""
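Since the linked question also asks about zero elements, the same call covers them when given a boolean mask (a sketch; np.nonzero(A == 0) is just np.nonzero applied to the mask, and the variable names are illustrative). Each call returns one index array per dimension, so this works for any number of dimensions:

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 5, 1],
              [3, 0, 0]])

nonzero_rows, nonzero_cols = np.nonzero(A)
zero_rows, zero_cols = np.nonzero(A == 0)   # indices where the mask is True

print(list(zip(zero_rows.tolist(), zero_cols.tolist())))
# [(0, 1), (1, 0), (2, 1), (2, 2)]
```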
Other variations
np.where(array)
With only the condition argument, it is equivalent to np.nonzero(array).
But np.nonzero is preferred because its name is clearer.
np.argwhere(array)
It's equivalent to np.transpose(np.nonzero(array))
print(np.argwhere(A))
"""
[[0 0]
[0 2]
[1 1]
[1 2]
[2 0]]
"""
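Both equivalences can be checked directly (a sketch on the original A, before the -100 assignment; the result would be the same after it, since -100 is still non-zero):

```python
import numpy as np

A = np.array([[1, 0, 1], [0, 5, 1], [3, 0, 0]])

# np.where with only a condition gives the same per-dimension index arrays
same_where = all(np.array_equal(w, n) for w, n in zip(np.where(A), np.nonzero(A)))
# np.argwhere gives one (row, col) pair per output row
same_argwhere = np.array_equal(np.argwhere(A), np.transpose(np.nonzero(A)))
print(same_where, same_argwhere)  # True True
```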
A = np.array([[1, 0, 1],
[0, 5, 1],
[3, 0, 0]])
np.stack(np.nonzero(A), axis=-1)
array([[0, 0],
[0, 2],
[1, 1],
[1, 2],
[2, 0]])
np.nonzero returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html
np.stack joins this tuple of arrays along a new axis; in our case, the innermost axis, also known as the last axis (denoted by -1).
The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.
New in version 1.10.0.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.stack.html
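To see what the axis parameter changes here (a sketch using the same A): stacking the two index arrays along axis=0 keeps one row per dimension, while axis=-1 gives one (row, col) pair per row, which matches np.argwhere.

```python
import numpy as np

A = np.array([[1, 0, 1], [0, 5, 1], [3, 0, 0]])
rows, cols = np.nonzero(A)                   # two arrays of length 5

by_axis0 = np.stack((rows, cols), axis=0)    # shape (2, 5): one row per dimension
by_last = np.stack((rows, cols), axis=-1)    # shape (5, 2): one (row, col) pair per row
print(by_axis0.shape, by_last.shape)         # (2, 5) (5, 2)
```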