PyTorch value thresholding and zeroing all other values - python

I have a 2D tensor in PyTorch representing model confidences. I want:
if the 2nd value in a row is greater than or equal to a threshold, all other values in that row should be changed to 0
else the values should not change
The simple approach would be:
iterate through rows
check 2nd value
if value is greater or equal, create row of zeroes, change 2nd value to the 2nd value from row and replace row
else don't do anything
It is inefficient, however. Is there a vectorized / tensorized way to do this?

I would do this by first constructing a new zero matrix and then copying items from your matrix into it as needed. For every row whose second element is below the threshold, you copy the whole row. For all other rows, you copy only the second element.
import torch
threshold = .2
X = torch.rand((100, 10))
new = torch.zeros_like(X)
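# Rows whose second element (index 1) is below the threshold are kept whole.
mask = X[:, 1] < threshold
new[mask] = X[mask]
# In the remaining rows only the second element survives.
new[~mask, 1] = X[~mask, 1]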

Try this
import numpy as np
x[(x[:,1] >= 0.5).nonzero(), np.r_[0, 2:x.shape[1]]] = 0.0
First, get the row indices using (x[:,1] >= 0.5).nonzero(), where 0.5 is the threshold,
then take every column index except the second one using np.r_[0, 2:x.shape[1]].

Related

Checking whether there is only one occurrence of maximum in a numpy array

I have a 4 x 4 numpy array. I would like to determine whether the maximum of each row is unique, i.e. there is only one occurrence of the maximum value. I'm new to Python and numpy and wondered if there is a pythonic way (method) of doing this rather than running a for loop.
You could, for example, try this:
import numpy as np
x = np.random.randint(0, 10, (4, 4))
res = np.sum(x == x.max(axis=1, keepdims=True), axis=1) > 1
This gives you a boolean array. Its nth entry is True if the maximum of the nth row of the input array occurs more than once in that row.
x.max(axis=1, keepdims=True) computes the maximum of each row and ensures that the result has the same number of dimensions as the input. The comparison x == ... then marks every occurrence of the maximum in the corresponding row. The result is a boolean array of the same shape as the input array. In Python, booleans are effectively integer values, so you can sum them up. If the sum is greater than 1, the maximum is not unique.
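To make the counting step concrete, here is a small sketch on a fixed array instead of random data. The names max_counts, has_repeated_max and has_unique_max are illustrative and not part of the answer above; has_unique_max is simply the negation the question asks for.

```python
import numpy as np

# Small fixed array so the result can be checked by eye.
x = np.array([
    [1, 5, 5, 2],   # maximum 5 appears twice
    [3, 0, 9, 1],   # maximum 9 appears once
    [7, 7, 7, 7],   # maximum 7 appears four times
    [4, 8, 2, 6],   # maximum 8 appears once
])

# Count how many entries in each row equal that row's maximum.
max_counts = np.sum(x == x.max(axis=1, keepdims=True), axis=1)

has_repeated_max = max_counts > 1   # same meaning as `res` above
has_unique_max = max_counts == 1    # what the question asks for

print(max_counts)       # [2 1 4 1]
print(has_unique_max)   # [False  True False  True]
```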

Selecting numpy columns based on values in a row

Suppose I have a numpy array with 2 rows and 10 columns. I want to select columns with even values in the first row. The outcome I want can be obtained as follows:
import numpy as np
a = list(range(10))
b = list(reversed(range(10)))
c = np.concatenate([a, b]).reshape(2, 10).T
c[c[:, 0] % 2 == 0].T
However, this method transposes twice and I don't suppose it's very pythonic. Is there a way to do the same job cleaner?
Numpy allows you to select along each dimension separately. You pass in a tuple of indices whose length is the number of dimensions.
Say your array is
a = np.random.randint(10, size=(2, 10))
The even elements in the first row are given by the mask
m = (a[0, :] % 2 == 0)
You can use a[0] to get the first row instead of a[0, :] because missing indices are synonymous with the slice : (take everything).
Now you can apply the mask to just the second dimension:
result = a[:, m]
You can also convert the mask to integer indices first. There are subtle differences between the two approaches, which you won't see in this simple case. The biggest difference is usually that integer indices are a little faster, especially if applied more than once:
i = np.flatnonzero(m)
result = a[:, i]
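As a quick check that both forms select the same columns, here is a sketch on a small fixed array. The array values and the names by_mask and by_index are made up for illustration.

```python
import numpy as np

a = np.array([
    [4, 7, 2, 9, 6],
    [0, 1, 2, 3, 4],
])

m = a[0] % 2 == 0        # boolean mask over the columns
i = np.flatnonzero(m)    # the same selection as integer positions

by_mask = a[:, m]        # select columns with the boolean mask
by_index = a[:, i]       # select columns with the integer indices

print(i)                 # [0 2 4]
print(by_mask)
```

Both results contain columns 0, 2 and 4 of a, with no transpose involved.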

How to find index of minimum non zero element with numpy?

I have a 4x1 array that I want to search for the minimum non zero value and find its index. For example:
theta = np.array([0,1,2,3]).reshape(4,1)
It was suggested in a similar thread to use nonzero() or where(), but when I tried to use that in the way that was suggested, it creates a new array that doesn't have the same indices as the original:
np.argmin(theta[np.nonzero(theta)])
gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first. I am only interested in the first minimum value if there are duplicates.
np.nonzero(theta) returns the indices of the non-zero values as a tuple with one index array per dimension. In your case, it returns
(array([1, 2, 3]), array([0, 0, 0]))
Then, theta[np.nonzero(theta)] returns the values
[1,2,3]
When you do np.argmin(theta[np.nonzero(theta)]) on the previous output, it returns 0, which is the position of the value 1 within that filtered array, not within theta.
Hence, the correct approach would be:
i,j = np.where( theta==np.min(theta[np.nonzero(theta)]))
where i,j are arrays holding the row and column indices of the minimum non-zero element in the original numpy array.
theta[i,j] or theta[i] gives the respective value at that index.
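A short sketch of this approach on an array with a duplicated minimum, since the question only wants the first occurrence. The array values and the names smallest, rows, cols and first are chosen here for illustration. np.where lists matches in row-major order, so the first entry is the first occurrence.

```python
import numpy as np

theta = np.array([0, 3, 1, 1]).reshape(4, 1)

# Smallest value among the non-zero entries.
smallest = theta[np.nonzero(theta)].min()

# Every position in the original array that holds that value.
rows, cols = np.where(theta == smallest)

# Keep only the first occurrence.
first = (int(rows[0]), int(cols[0]))

print(rows, cols)   # [2 3] [0 0]
print(first)        # (2, 0)
```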
#!/usr/bin/env python
# Solution utilizing numpy masking of zero value in array
import numpy as np
import numpy.ma as ma
a = [0,1,2,3]
a = np.array(a)
print("your array: ", a)
# the non-zero minimum value
minval = np.min(ma.masked_where(a==0, a))
print("non-zero minimum: ", minval)
# the position/index of non-zero minimum value in the array
minvalpos = np.argmin(ma.masked_where(a==0, a))
print("index of non-zero minimum: ", minvalpos)
I think you, @Emily, were very close to the correct answer. You said:
np.argmin(theta[np.nonzero(theta)]) gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first.
The last sentence is correct. The first one is not: argmin is expected to return the index in the new (filtered) array, so a result of zero is right for that array.
Let's now extract the correct index in the old (original) array:
nztheta_ind = np.nonzero(theta)
k = np.argmin(theta[nztheta_ind])
i = nztheta_ind[0][k]
j = nztheta_ind[1][k]
or:
[i[k] for i in nztheta_ind]
for arbitrary dimensionality of original array.
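The mapping from the filtered position back to the original array can be sketched on a small 2D example. The array values and the name index are illustrative only.

```python
import numpy as np

theta = np.array([
    [0, 5, 0],
    [4, 0, 2],
    [0, 2, 9],
])

nz = np.nonzero(theta)      # tuple of index arrays, one per dimension
k = np.argmin(theta[nz])    # position within the filtered (non-zero) values

# Look up position k in each index array to recover the original index.
index = tuple(int(axis_idx[k]) for axis_idx in nz)

print(theta[nz])   # [5 4 2 2 9]
print(index)       # (1, 2)
```

The value 2 occurs twice; argmin returns the first occurrence, so the result is (1, 2) rather than (2, 1).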
ndim Solution
i = np.unravel_index(np.where(theta!=0, theta, theta.max()+1).argmin(), theta.shape)
Explanation
1. Masking the zeros out creates t0. There are other ways, see the perfplot.
2. Finding the minimum location returns the flattened (1D) index.
3. unravel_index fixes this problem, and hasn't been suggested yet.
theta = np.triu(np.random.rand(4,4), 1) # example array
t0 = np.where(theta!=0, theta, np.nan) # 1
i0 = np.nanargmin(t0) # 2
i = np.unravel_index(i0, theta.shape) # 3
print(theta, i, theta[i])
mask: i = np.unravel_index(np.ma.masked_where(a==0, a).argmin(), a.shape)
nan: i = np.unravel_index(np.nanargmin(np.where(a!=0, a, np.nan)), a.shape)
max: i = np.unravel_index(np.where(a!=0, a, a.max()+1).argmin(), a.shape)
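A minimal sketch checking that the three variants agree on a small fixed array. The array values and the names via_mask, via_nan and via_max are made up for illustration. The array is assumed to contain at least one non-zero entry; np.nanargmin, for instance, raises ValueError when every entry is NaN.

```python
import numpy as np

a = np.array([
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
])

# mask: hide the zeros behind a masked array
via_mask = np.unravel_index(np.ma.masked_where(a == 0, a).argmin(), a.shape)
# nan: replace the zeros with NaN and use the NaN-aware argmin
via_nan = np.unravel_index(np.nanargmin(np.where(a != 0, a, np.nan)), a.shape)
# max: replace the zeros with a value larger than any entry
via_max = np.unravel_index(np.where(a != 0, a, a.max() + 1).argmin(), a.shape)

print(via_mask, via_nan, via_max)
```

All three should point at the entry 0.3 in row 0, column 2.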

Return elements in a location corresponding to the minimum values of another array

I have two arrays with the same shape in the first two dimensions and I'm looking to record the minimum value in each row of the first array. However I would also like to record the elements in the corresponding position in the third dimension of the second array. I can do it like this:
import numpy as np
A = np.random.random((5000, 100))
B = np.random.random((5000, 100, 3))
A_mins = np.ndarray((5000, 4))
for i, row in enumerate(A):
    current_min = min(row)
    A_mins[i, 0] = current_min
    A_mins[i, 1:] = B[i, row == current_min]
I'm new to programming (so correct me if I'm wrong) but I understand that with Numpy doing calculations on whole arrays is faster than iterating over them. With this in mind is there a faster way of doing this? I can't see a way to get rid of the row == current_min bit even though the location of the minimum point must have been 'known' to the computer when it was calculating the min().
Any tips/suggestions appreciated! Thanks.
Something along the lines of what @lib talked about:
index = np.argmin(A, axis=1)
A_mins[:,0] = A[np.arange(len(A)), index]
A_mins[:,1:] = B[np.arange(len(A)), index]
It is much faster than using a for loop.
To get the index of the minimum value, use argmin instead of min plus a comparison; amin returns the minimum value itself.
The amin function (and many other functions in numpy) also takes the argument axis, that you can use to get the minimum of each row or each column.
See http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html
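The following sketch compares the argmin-based indexing with a row-by-row loop on small arrays. The shapes, the seed and the names rows and expected are chosen here only to keep the check quick; they are not from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))
B = rng.random((6, 4, 3))

index = np.argmin(A, axis=1)     # column of the minimum in each row
rows = np.arange(len(A))

A_mins = np.empty((len(A), 4))
A_mins[:, 0] = A[rows, index]    # the minimum values themselves
A_mins[:, 1:] = B[rows, index]   # matching length-3 vectors from B

# Reference result computed one row at a time.
expected = np.empty_like(A_mins)
for i, row in enumerate(A):
    expected[i, 0] = row.min()
    expected[i, 1:] = B[i, np.argmin(row)]

print(np.array_equal(A_mins, expected))   # True
```

Pairing np.arange(len(A)) with index picks one element per row, which is what replaces the row == current_min comparison in the loop.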

Delete a column in a multi-dimensional array if all elements in that column satisfy a condition

I have a multi-dimensional array such as:
a = [[1,1,5,12,0,4,0],
[0,1,2,11,0,4,2],
[0,4,3,17,0,4,9],
[1,3,5,74,0,8,16]]
How can I delete a column if all entries within that column are equal to zero? In the array a that would mean deleting the column at index 4 (the fifth column), resulting in:
a = [[1,1,5,12,4,0],
[0,1,2,11,4,2],
[0,4,3,17,4,9],
[1,3,5,74,8,16]]
N.b I've written a as a nested list but only to make it clear. I also don't know a priori where the zero column will be in the array.
My attempt so far only finds the index of the column in which all elements are equal to zero:
a = np.array([[1,1,5,12,0,4,0],[0,1,2,11,0,4,2],[0,4,3,17,0,4,9],[1,3,5,74,0,8,16]])
b = np.vstack(a)
ind = []
for n,m in zip(b.T,range(len(b.T))):
    if sum(n) == 0:
        ind.append(m)
Is there any way to achieve this?
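With the code you already have, and with a still a nested list rather than a numpy array, you can just do:
for place in reversed(ind):  # delete from the right so earlier indices stay valid
    for sublist in a:
        del sublist[place]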
Which gets the job done but is not very satisfactory...
Edit: numpy is strong
import numpy as np
a = np.array(a)
a = a[:, np.sum(a, axis=0)!=0]
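The sum test works when all entries are non-negative, as in the question. If negative values can occur, a column such as [-1, 1, 0] also sums to zero without being all zeros. A sketch of a stricter variant, which checks for any non-zero entry per column, might look like this; the array values and the names keep, trimmed and by_sum are illustrative.

```python
import numpy as np

a = np.array([
    [1, -1, 5, 0],
    [0,  1, 2, 0],
    [0,  0, 3, 0],
])

# True for every column that has at least one non-zero entry.
keep = (a != 0).any(axis=0)
trimmed = a[:, keep]

# The sum-based test also drops column 1, which is not all zeros.
by_sum = a[:, np.sum(a, axis=0) != 0]

print(keep)           # [ True  True  True False]
print(trimmed.shape)  # (3, 3)
print(by_sum.shape)   # (3, 2)
```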
