Suppose I have a Numpy array:
[
[0, 1, 0],
[0, 1, 4],
[2, 0, 0],
]
How can I turn this into a "hot encoded" 3D array? Something like this:
[
# Group of 0's
[[1, 0, 1],
[1, 0, 0],
[0, 1, 1]],
# Group of 1's
[[0, 1, 0],
[0, 1, 0],
[0, 0, 0]],
# Group of 2's
[[0, 0, 0],
[0, 0, 0],
[1, 0, 0]],
# Group of 3's
# the group is still here, even though there are no threes
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
# Group of 4's
[[0, 0, 0],
[0, 0, 1],
[0, 0, 0]]
]
That is, how can I take each occurrence of a number in the array and "group" them into their own plane in a 3D matrix? As shown in the example, even the "gap" in numbers (i.e. the 3) should still appear. In my case, I know the range of the data beforehand (range (0, 6]), so that should make it easier.
BTW, I need this because I have a chessboard represented by numbers, but need it in this form to pass into a 2d convolutional neural network (different "channels" for different pieces).
I've seen Convert a 2d matrix to a 3d one hot matrix numpy, but that has a one-hot encoding for every value, which isn't what I'm looking for.
Create an array of the values to match (np.arange(arr.max()+1) here), then reshape it so that it broadcasts against the original array in the comparison:
Setup:
import numpy as np

arr = np.array([
[0, 1, 0],
[0, 1, 4],
[2, 0, 0],
])
u = np.arange(arr.max()+1)
(u[:,np.newaxis,np.newaxis]==arr).astype(int)
array([[[1, 0, 1],
[1, 0, 0],
[0, 1, 1]],
[[0, 1, 0],
[0, 1, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[1, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 0, 1],
[0, 0, 0]]])
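Since the question says the range of values is known beforehand, the number of planes can be fixed instead of derived from arr.max(). Below is a minimal sketch of that idea, not a definitive implementation. The helper name to_planes and the choice n_classes=7 (values 0 through 6) are assumptions made for illustration.

```python
import numpy as np

def to_planes(board, n_classes):
    """Return an (n_classes, H, W) array with one 0/1 plane per value."""
    values = np.arange(n_classes)
    # Shape (n_classes, 1, 1) broadcasts against the (H, W) board.
    return (values[:, np.newaxis, np.newaxis] == board).astype(int)

arr = np.array([
    [0, 1, 0],
    [0, 1, 4],
    [2, 0, 0],
])

# n_classes=7 is an assumed value covering 0..6.
planes = to_planes(arr, n_classes=7)
print(planes.shape)  # (7, 3, 3)
```

With a fixed n_classes, every board yields the same number of channels, even when the largest piece value is absent from a given position.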
I have some data that I want to "one-hot encode" and it is represented as a 1-dimensional vector of positions.
Is there any function in NumPy that can expand my x into my x_ohe?
I'm trying to avoid using for-loops in Python at all costs for operations like this after watching Jake Vanderplas's talk. This is what I have now:
x = np.asarray([0,0,1,0,2])
x_ohe = np.zeros((len(x), 3), dtype=int)
for i, pos in enumerate(x):
x_ohe[i,pos] = 1
x_ohe
# array([[1, 0, 0],
# [1, 0, 0],
# [0, 1, 0],
# [1, 0, 0],
# [0, 0, 1]])
If x contains only non-negative integers, you can compare x with a sequence using NumPy broadcasting and convert the result to ints:
(x[:,None] == np.arange(x.max()+1)).astype(int)
#array([[1, 0, 0],
# [1, 0, 0],
# [0, 1, 0],
# [1, 0, 0],
# [0, 0, 1]])
Or initialize the array first, then assign the ones using advanced indexing:
x_ohe = np.zeros((len(x), 3), dtype=int)
x_ohe[np.arange(len(x)), x] = 1
x_ohe
#array([[1, 0, 0],
# [1, 0, 0],
# [0, 1, 0],
# [1, 0, 0],
# [0, 0, 1]])
A one-liner:
np.equal.outer(x,range(3)).astype(int)
array([[1, 0, 0],
[1, 0, 0],
[0, 1, 0],
[1, 0, 0],
[0, 0, 1]])
np.equal.outer(x,np.unique(x)).astype(int) also works here, provided every class appears in x at least once.
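Another common variant, not mentioned in the answers above, is to index the rows of an identity matrix. This is only a sketch, and it assumes the number of classes (n_classes, a name introduced here) is known in advance and that x holds valid indices in that range.

```python
import numpy as np

x = np.asarray([0, 0, 1, 0, 2])
n_classes = 3  # assumed known in advance

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with x picks one row per element of x.
x_ohe = np.eye(n_classes, dtype=int)[x]
print(x_ohe)
```

Unlike the x.max()+1 version, this keeps the column count fixed even if the highest class does not occur in x.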
I have several 3D arrays that I need to add together. Each array contains only 0s and 1s, and all arrays have the same shape. When I add them, some of the 1s overlap, which gives values of 2 or 3. I only need the structure of the combined array, so I don't need those larger values: wherever two or three arrays overlap, the result should just be 1, and wherever every array is zero, the result should stay zero.
So basically what I have is:
array1 =
[[[1, 0, 0], [0, 0, 0], [0, 0, 0]],
[[0, 1, 0], [0, 0, 0], [0, 0, 0]],
[[0, 0, 1], [1, 1, 1], [0, 0, 0]]]
array2 =
[[[1, 0, 0], [0, 1, 0], [0, 0, 0]],
[[0, 0, 0], [1, 1, 0], [0, 0, 0]],
[[0, 0, 1], [0, 1, 0], [0, 0, 0]]]
So when adding them together I get:
array_total = array1 + array2 =
[[[2, 0, 0], [0, 1, 0], [0, 0, 0]],
[[0, 1, 0], [1, 1, 0], [0, 0, 0]],
[[0, 0, 2], [1, 2, 1], [0, 0, 0]]]
Where I actually want it to give me:
array_total = array1 + array2 =
[[[1, 0, 0], [0, 1, 0], [0, 0, 0]],
[[0, 1, 0], [1, 1, 0], [0, 0, 0]],
[[0, 0, 1], [1, 1, 1], [0, 0, 0]]]
Can anyone give me a hint on how this is done?
(Assuming those are numpy arrays, or array1 + array2 would behave differently).
If you want to "change all positive values to 1", you can do this:
array_total[array_total > 0] = 1
But what you actually want is an array that has a 1 where array1 or array2 has a 1, so just write it directly like that:
array_total = array1 | array2
Example:
>>> array1 = np.array([[[1, 0, 0], [0, 0, 0], [0, 0, 0]],
... [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
... [[0, 0, 1], [1, 1, 1], [0, 0, 0]]])
>>> array2 = np.array([[[1, 0, 0], [0, 1, 0], [0, 0, 0]],
... [[0, 0, 0], [1, 1, 0], [0, 0, 0]],
... [[0, 0, 1], [0, 1, 0], [0, 0, 0]]])
>>> array1 | array2
array([[[1, 0, 0], [0, 1, 0], [0, 0, 0]],
[[0, 1, 0], [1, 1, 0], [0, 0, 0]],
[[0, 0, 1], [1, 1, 1], [0, 0, 0]]])
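The question mentions several arrays, not just two. One possible way to extend the idea is to reduce a list of arrays with np.logical_or. The sketch below assumes all arrays have the same shape and contain only 0s and 1s. The helper name combine_masks and the small 2D inputs are made up for illustration, and the same call works for 3D arrays.

```python
import numpy as np

def combine_masks(arrays):
    """Element-wise OR of equally shaped 0/1 arrays, returned as ints."""
    # The list is stacked along a new first axis and OR-reduced over it.
    return np.logical_or.reduce(arrays).astype(int)

# Small hypothetical inputs.
a = np.array([[1, 0], [0, 0]])
b = np.array([[1, 1], [0, 0]])
c = np.array([[0, 1], [0, 1]])

total = combine_masks([a, b, c])
print(total)
```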
I have an ndarray, and I want to set all the non-maximum elements along the last dimension to zero.
a = np.array([[[1,8,3,4],[6,7,10,6],[11,12,15,4]],
[[4,2,3,4],[4,7,9,8],[41,14,15,3]],
[[4,22,3,4],[16,7,9,8],[41,12,15,43]]
])
print(a.shape)
(3, 3, 4)
I can get the indexes of maximum elements by np.argmax():
b = np.argmax(a, axis=2)
b
array([[1, 2, 2],
[0, 2, 0],
[1, 0, 3]])
Obviously, b has one dimension fewer than a. Now, I want to get a new 3-d array that has all zeros except for where the maximum values are.
I want to get this array:
np.array([[[0,1,0,0],[0,0,1,0],[0,0,1,0]],
[[1,0,0,1],[0,0,1,0],[1,0,0,0]],
[[0,1,0,0],[1,0,0,0],[0,0,0,1]]
])
One way to achieve this that I tried is to create these temporary arrays:
b = np.repeat(b[:,:,np.newaxis], 4, axis=2)
t = np.repeat(np.arange(4).reshape(4,1), 9, axis=1).T.reshape(b.shape)
z = np.zeros(shape=a.shape, dtype=int)
z[t == b] = 1
z
array([[[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 1, 0]],
[[1, 0, 0, 0],
[0, 0, 1, 0],
[1, 0, 0, 0]],
[[0, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]]])
Any idea how to get this in a more efficient way?
Here's one way that uses broadcasting:
In [108]: (a == a.max(axis=2, keepdims=True)).astype(int)
Out[108]:
array([[[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 1, 0]],
[[1, 0, 0, 1],
[0, 0, 1, 0],
[1, 0, 0, 0]],
[[0, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]]])
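Note that comparing against the maximum marks every tied position, such as the two 4s in [4, 2, 3, 4], whereas np.argmax returns only the first one. If exactly one 1 per row is needed, a possible sketch is to broadcast the argmax indices against np.arange. The variable names first_only and all_ties are introduced here for illustration.

```python
import numpy as np

a = np.array([[[1, 8, 3, 4], [6, 7, 10, 6], [11, 12, 15, 4]],
              [[4, 2, 3, 4], [4, 7, 9, 8], [41, 14, 15, 3]],
              [[4, 22, 3, 4], [16, 7, 9, 8], [41, 12, 15, 43]]])

# Index of the first maximum along the last axis, shape (3, 3).
idx = a.argmax(axis=-1)

# Compare positions 0..3 with each index: exactly one 1 per row.
first_only = (np.arange(a.shape[-1]) == idx[..., np.newaxis]).astype(int)

# Compare values with the row maximum: one 1 per tied maximum.
all_ties = (a == a.max(axis=-1, keepdims=True)).astype(int)

print(first_only[1, 0])  # [1 0 0 0]
print(all_ties[1, 0])    # [1 0 0 1]
```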
I have a 3d numpy array. I'd like to find the largest x, y and z coordinates of the non-zero elements along each of the three axes of the array. How can I do that?
So for the example below x=1, y=2, z=1
array([[[1, 1, 0],
[1, 1, 0],
[0, 0, 0]],
[[0, 0, 0],
[1, 0, 0],
[1, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]])
Get the indices of the non-zero elements with np.nonzero, stack them in columns with np.column_stack, and then take the max along the columns with .max(0). The implementation would look something like this:
np.column_stack((np.nonzero(A))).max(0)
There is also a built-in function, np.argwhere, that returns the indices of all non-zero elements stacked in a 2D array. Thus, you can simply do:
np.argwhere(A).max(0)
Sample run -
In [50]: A
Out[50]:
array([[[1, 1, 0],
[1, 1, 0],
[0, 0, 0]],
[[0, 0, 0],
[1, 0, 0],
[1, 0, 0]],
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]])
In [51]: np.column_stack((np.nonzero(A))).max(0)
Out[51]: array([1, 2, 1])
In [52]: np.argwhere(A).max(0)
Out[52]: array([1, 2, 1])
This can also be done using numpy.nonzero directly:
>>> tuple(coords.max() for coords in numpy.nonzero(A))
(1, 2, 1)
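One caveat with the approaches above: if the array has no non-zero elements, taking the max of the empty index array raises a ValueError. A minimal sketch of a guarded version follows. The function name max_nonzero_coords and the choice to return None for an all-zero array are assumptions, not part of the original answers.

```python
import numpy as np

def max_nonzero_coords(A):
    """Largest index of a non-zero element along each axis, or None if A is all zeros."""
    coords = np.argwhere(A)  # shape (n_nonzero, A.ndim)
    if coords.shape[0] == 0:
        return None  # assumed convention for the empty case
    return tuple(int(c) for c in coords.max(axis=0))

A = np.array([[[1, 1, 0],
               [1, 1, 0],
               [0, 0, 0]],
              [[0, 0, 0],
               [1, 0, 0],
               [1, 0, 0]],
              [[0, 0, 0],
               [0, 0, 0],
               [0, 0, 0]]])

print(max_nonzero_coords(A))                              # (1, 2, 1)
print(max_nonzero_coords(np.zeros((2, 2, 2), dtype=int)))  # None
```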