Aggregating 2 NumPy arrays by confidence - python

I have 2 np arrays containing values in the interval [0,1].
I want to create a third array containing the most "confident" values: elementwise, take the number from whichever array is closer to 0 or to 1. Consider the following example:
[0.7,0.12,1,0.5]
[0.1,0.99,0.001,0.49]
so my constructed array would be:
[0.1,0.99,1,0.49]

import numpy as np
A = np.array([0.7,0.12,1,0.5])
B = np.array([0.1,0.99,0.001,0.49])
maxi = np.maximum(A,B)
mini = np.minimum(A,B)
# Find where the maximum is closer to 1 than the minimum is to 0
idx = 1-maxi < mini
maxi*idx + mini*~idx
returns
array([ 0.1 , 0.99, 1. , 0.49])
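For comparison, the same elementwise selection can be written as a single np.where call, keeping whichever value lies farther from 0.5. This is an equivalent sketch added for illustration, not part of the answer above:
import numpy as np
A = np.array([0.7, 0.12, 1, 0.5])
B = np.array([0.1, 0.99, 0.001, 0.49])
# pick, elementwise, the value farther from 0.5 (i.e. closer to 0 or 1)
C = np.where(np.abs(A - 0.5) >= np.abs(B - 0.5), A, B)
print(C)  # [0.1  0.99 1.   0.49]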

You can try this (with a and b being the two input arrays):
c = np.array([a[i] if min(1 - a[i], a[i]) < min(1 - b[i], b[i]) else b[i] for i in range(len(a))])
The result is:
array([ 0.1 , 0.99, 1. , 0.49])

Another way of stating your "confidence" measure is to ask which of the two numbers is farthest from 0.5, that is, which of the two numbers x yields the largest abs(0.5 - x). The following solution constructs a 2D array c with the original arrays as columns, then builds and applies a boolean mask based on abs(0.5 - c):
import numpy as np
a = np.array([0.7,0.12,1,0.5])
b = np.array([0.1,0.99,0.001,0.49])
# Combine
c = np.concatenate((a, b)).reshape((2, len(a))).T
# Create mask
b_or_a = np.asarray(np.argmax(np.abs((0.5 - c)), axis=1), dtype=bool)
mask = np.zeros(c.shape, dtype=bool)
mask[:, 0] = ~b_or_a
mask[:, 1] = b_or_a
# Apply mask
d = c[mask]
print(d) # [ 0.1 0.99 1. 0.49]

How to avoid division by zero in a 2D numpy array when taking the average?

Let's say I have three arrays
A = np.array([[2,2,2],[1,0,0],[1,2,1]])
B = np.array([[2,0,2],[0,1,0],[1,2,1]])
C = np.array([[2,0,1],[0,1,0],[1,1,2]])
A, B, C
(array([[2, 2, 2],
        [1, 0, 0],
        [1, 2, 1]]),
 array([[2, 0, 2],
        [0, 1, 0],
        [1, 2, 1]]),
 array([[2, 0, 1],
        [0, 1, 0],
        [1, 1, 2]]))
When I take the average of C/(A+B), I get nan/inf values with a RuntimeWarning.
The resulting array looks like the following:
np.average(C/(A+B), axis = 1)
array([0.25 , nan, 0.58333333])
I would like to change any inf/nan value to 0.
What I tried so far:
# doesn't work (maybe I'm doing this wrong...)
mask = A+B > 0
np.average(C[mask]/(A[mask]+B[mask]), axis = 1)
# does not work, and not an ideal solution
avg = np.average(C/(A+B), axis = 1)
avg[avg == np.nan] = 0
Any help would be appreciated!
Both approaches you tried are valid ways of dealing with it, but each needs a slight change.
Avoiding the division upfront, by only computing the result where it is valid (i.e. where the divisor is non-zero):
Indexing with the boolean mask you defined makes the resulting arrays 1D, so you have to allocate the result array upfront and assign into it using that same mask.
mask = A+B > 0
result = np.zeros_like(A, dtype=np.float32)
result[mask] = C[mask]/(A[mask]+B[mask])
It does require the averaging over the second dimension to be done separately, and the rows where the division could not be done (because of the zeros) still have to be masked to zero.
result = result.mean(axis=1)
result[(~mask).any(axis=1)] = 0
To me the main benefit is avoiding the warning from NumPy, and with a large number of zeros in A+B you might gain a little performance by skipping those divisions altogether. But overall it seems like a lot of effort.
Masking invalid values afterwards:
The main takeaway here is that you should never compare against np.nan directly, since that comparison is always False. You can check this yourself by looking at the result of np.nan == np.nan. The way to handle this is to use the dedicated np.isnan function, or alternatively to negate np.isfinite if you also want to catch +/- np.inf values at the same time.
avg = np.average(C/(A+B), axis = 1)
avg[np.isnan(avg)] = 0
# or to include inf
avg[~np.isfinite(avg)] = 0
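As a quick sanity check of the claim about nan comparisons:
import numpy as np
print(np.nan == np.nan)  # False: a direct comparison with nan never matches
print(np.isnan(np.nan))  # True: use the dedicated test instead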
import numpy as np
a = np.array([1, np.nan])
print(a) # [1, nan]
a = np.nan_to_num(a)
print(a) # [1, 0]
https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html
For inf and -inf:
from numpy import inf
avg[avg == inf] = 0
avg[avg == -inf] = 0
Simply use this if the positions that would produce inf should stay at zero:
np.divide(a, b, where=b.astype(bool))
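Applied to the arrays from the question, that one-liner could look like the sketch below. The out= argument is added here as an assumption about the intent, so the positions skipped by where stay at zero instead of holding uninitialized memory; AB and ratio are illustrative names:
import numpy as np
A = np.array([[2, 2, 2], [1, 0, 0], [1, 2, 1]])
B = np.array([[2, 0, 2], [0, 1, 0], [1, 2, 1]])
C = np.array([[2, 0, 1], [0, 1, 0], [1, 1, 2]])
AB = A + B
# divide only where the divisor is non-zero; skipped positions keep the 0 from `out`
ratio = np.divide(C, AB, where=AB.astype(bool), out=np.zeros(C.shape))
print(np.average(ratio, axis=1))  # [0.25       0.33333333 0.58333333]
Note that the second row averages the skipped element as 0 rather than dropping the whole row, so its result differs from the row-masking approach shown further below.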
This is tougher than I thought as np.mean's where argument doesn't work if it results in empty arrays and np.average's weights have to be 1-D.
# these don't work
# >>> np.mean(div, axis=1, where=mask.all(1, keepdims=True))
# RuntimeWarning: Mean of empty slice.
# RuntimeWarning: invalid value encountered in true_divide
# >>> np.average(div, axis=1, weights=mask.all(1, keepdims=True))
# TypeError: 1D weights expected when shapes of a and weights differ.
import numpy as np
A = np.array([[2,2,2],[1,0,0],[1,2,1]])
B = np.array([[2,0,2],[0,1,0],[1,2,1]])
C = np.array([[2,0,1],[0,1,0],[1,1,2]])
div = np.zeros(C.shape)
AB = A+B # avoid repeated summing
mask = AB > 0 # AB != 0 to include all valid divisors
np.divide(C, AB, where=mask, out=div) # out=None won't initialize unused elements
np.mean(div * mask.all(1, keepdims=True), axis = 1)
Output
array([0.25 , 0. , 0.58333333])

Apply logical and/or operations along an axis in numpy python [duplicate]

For machine learning, I'm applying the Parzen window algorithm.
I have an array (m,n). I would like to check on each row if any of the values is > 0.5 and if each of them is, then I would return 0, otherwise 1.
I would like to know if there is a way to do this without a loop thanks to numpy.
You can use np.all with axis=1 on a boolean array.
import numpy as np
arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])
print(np.all(arr>0.5, axis=1))
>> [True False False]
import numpy as np
# Value Initialization
a = np.array([0.75, 0.25, 0.50])
y_predict = np.zeros((1, a.shape[0]))
#If the value is greater than 0.5, the value is 1; otherwise 0
y_predict = (a > 0.5).astype(float)
I have an array (m,n). I would like to check on each row if any of the values is > 0.5
That will be stored in b:
import numpy as np
a = # some np.array of shape (m,n)
b = np.any(a > 0.5, axis=1)
and if each of them is, then I would return 0, otherwise 1.
I'm assuming you mean 'and if this is the case for all rows'. In this case:
c = 1 - 1 * np.all(b)
c contains your return value, either 0 or 1.
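The answer above reduces everything to a single 0/1 value. If instead you want one 0/1 per row (0 where that row contains any value > 0.5, otherwise 1), a per-row sketch under that reading of the question would be:
import numpy as np
arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])
# 0 for rows containing any value > 0.5, 1 otherwise
per_row = 1 - np.any(arr > 0.5, axis=1).astype(int)
print(per_row)  # [0 0 1]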

How to mask an array where the index is less than a certain value

I have a 3D array where the first index refers to the height. I have a 2D array where each element is a minimum height.
import numpy as np
a = np.ones((3,3,3)) # 3D array
b = np.array([[1.2, 1.0, 2.0],
              [1.5, 1.2, 1.3],
              [1.0, 2.0, 0.5]])  # 2D array of minimum heights
I want to mask a where the first index/dimension of a is less than the value given by b.
For example:
a[0,1,1] = 0 and a[1,1,1] = 0 since b[1,1] = 1.2, but a[2,1,1] = 1
My solution is to use for loops, but I would like to create a boolean mask instead (e.g. via np.ma).
My solution:
nLat = a.shape[1]
nLon = a.shape[2]
for i in np.arange(0, nLat, 1):
    for j in np.arange(0, nLon, 1):
        minHeight = b[i, j]
        for hgt, value in enumerate(a):
            if hgt < minHeight:
                a[hgt, i, j] = 0
This modifies the original array. While this works, I'd rather create a boolean array (preferably with fewer loops), and then multiply the boolean by the original to create a final output that is unchanged except where the indices are too small.
We can get the required mask with a ranged comparison with b -
mask = np.arange(a.shape[0])[:,None,None]<b
a[mask] = 0
We can also use the builtin outer method of the comparison ufunc to get the mask:
mask = np.less.outer(np.arange(a.shape[0]),b)
If you are only interested in the mask equivalent of a, use -
L=3 # output length
a_mask = (np.arange(L)[:,None,None]>=b)
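As a quick check against the example values from the question, multiplying the boolean by the original array as described (a small verification sketch):
import numpy as np
a = np.ones((3, 3, 3))
b = np.array([[1.2, 1.0, 2.0],
              [1.5, 1.2, 1.3],
              [1.0, 2.0, 0.5]])
mask = np.arange(a.shape[0])[:, None, None] < b  # True where the height index is below the minimum
a_zeroed = a * ~mask                             # keep a, zero out the masked entries
print(a_zeroed[0, 1, 1], a_zeroed[1, 1, 1], a_zeroed[2, 1, 1])  # 0.0 0.0 1.0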

Is there a way to get the top k values per row of a numpy array (Python)?

Given a numpy array of the form below:
x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
is there a way to retain the top-3 values in each row and set the others to zero in python (without an explicit loop)? The result in the case of the example above would be
x = [[4.,3.,0.,0.,8.],[0.,3.1,0.,9.2,5.5],[0.0,7.0,4.4,0.0,1.3]]
Code for one example
import numpy as np
arr = np.array([1.2, 3.1, 0., 9.2, 5.5, 3.2])
indexes = arr.argsort()[-3:][::-1]
a = list(range(6))
A = set(indexes); B = set(a)
zero_ind = B.difference(A)
arr[list(zero_ind)] = 0
The output:
array([0. , 0. , 0. , 9.2, 5.5, 3.2])
Above is my sample code (with many lines) for a 1-D numpy array. Looping through each row of a numpy array and performing this same computation repeatedly would be quite expensive. Is there a simpler way?
Here is a fully vectorized solution that uses nothing outside NumPy. It uses numpy's argpartition to efficiently find the k-th largest values. See for instance this answer for other use cases.
import numpy

def truncate_top_k(x, k, inplace=False):
    m, n = x.shape
    # get (unsorted) indices of top-k values
    topk_indices = numpy.argpartition(x, -k, axis=1)[:, -k:]
    # get k-th value
    rows, _ = numpy.indices((m, k))
    kth_vals = x[rows, topk_indices].min(axis=1)
    # get boolean mask of values smaller than k-th
    is_smaller_than_kth = x < kth_vals[:, None]
    # replace masked values by 0
    if not inplace:
        return numpy.where(is_smaller_than_kth, 0, x)
    x[is_smaller_than_kth] = 0
    return x
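A usage sketch on the example array from the question, added here for illustration:
import numpy
x = numpy.array([[4., 3., 2., 1., 8.],
                 [1.2, 3.1, 0., 9.2, 5.5],
                 [0.2, 7.0, 4.4, 0.2, 1.3]])
print(truncate_top_k(x, 3))
# [[4.  3.  0.  0.  8. ]
#  [0.  3.1 0.  9.2 5.5]
#  [0.  7.  4.4 0.  1.3]]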
Use np.apply_along_axis to apply a function to 1-D slices along a given axis
import numpy as np

def top_k_values(array):
    indexes = array.argsort()[-3:][::-1]
    A = set(indexes)
    B = set(list(range(array.shape[0])))
    array[list(B.difference(A))] = 0
    return array
arr = np.array([[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]])
result = np.apply_along_axis(top_k_values, 1, arr)
print(result)
Output
[[4.  3.  0.  0.  8. ]
 [0.  3.1 0.  9.2 5.5]
 [0.  7.  4.4 0.  1.3]]
def top_k(arr, k, axis = 0):
    # broadcastable index array selecting the last k positions along `axis`
    idx_shape = [1] * arr.ndim
    idx_shape[axis] = k
    last_k = np.arange(-k, 0).reshape(idx_shape)
    top_k_idx = np.take_along_axis(np.argpartition(arr, -k, axis = axis),
                                   last_k,
                                   axis = axis)  # indices of the top k values along axis
    out = np.zeros_like(arr)  # create zero array
    np.put_along_axis(out, top_k_idx,  # put the top-k values of arr into out
                      np.take_along_axis(arr, top_k_idx, axis = axis),
                      axis = axis)
    return out
This should work for arbitrary axis and k, but does not work in-place. If you want in-place it's a bit simpler:
def top_k(arr, k, axis = 0):
    # broadcastable index array selecting the first (n - k) positions along `axis`
    n_remove = arr.shape[axis] - k
    idx_shape = [1] * arr.ndim
    idx_shape[axis] = n_remove
    remove_idx = np.take_along_axis(np.argpartition(arr, -k, axis = axis),
                                    np.arange(n_remove).reshape(idx_shape),
                                    axis = axis)  # indices of the values to remove
    np.put_along_axis(arr, remove_idx, 0, axis = axis)  # zero them out in place
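A quick usage sketch of the in-place variant above on the example from the question (the input array is modified directly):
import numpy as np
x = np.array([[4., 3., 2., 1., 8.],
              [1.2, 3.1, 0., 9.2, 5.5],
              [0.2, 7.0, 4.4, 0.2, 1.3]])
top_k(x, 3, axis = 1)  # zeroes everything outside the top 3 of each row
print(x)
# [[4.  3.  0.  0.  8. ]
#  [0.  3.1 0.  9.2 5.5]
#  [0.  7.  4.4 0.  1.3]]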
Here is an alternative that uses a list comprehension to loop through your array, applying the keep_top_3 function:
import numpy as np
import heapq
def keep_top_3(arr):
    smallest = heapq.nlargest(3, arr)[-1]  # find the top 3 and use the smallest as cut-off
    arr[arr < smallest] = 0                # replace anything lower than the cut-off with 0
    return arr
x = [[4.,3.,2.,1.,8.],[1.2,3.1,0.,9.2,5.5],[0.2,7.0,4.4,0.2,1.3]]
result = [keep_top_3(np.array(arr)) for arr in x]
I hope this helps :)

How to remove negative outputs from a function?

I have used a function to calculate the difference between 2 values. From printing the output of the function below, the answers range from -5 to 4. However, I only want the function to display the positive answers (i.e. 1 to 4).
Is it possible to disregard the negative values without changing the boundaries of x and without changing the value of a?
import numpy as np
L = 10
a = 5
def position(x,a):
    return x-a
x = np.arange(0.0, L, 1)
print (position(x,a))
[-5. -4. -3. -2. -1. 0. 1. 2. 3. 4.]
import numpy as np
L = 10
a = 5
def position(x,a):
    return x-a
x = np.arange(0.0, L, 1)
tmp = position(x,a)
print (tmp[tmp>=0])
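# prints: [0. 1. 2. 3. 4.]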
This may help; here is an example of filtering a numpy array:
import numpy
arr = numpy.array([-1.1, 0.0, 1.1])
print(arr)
bools = arr >= 0.0 # define selection
print(bools)
# filter by "bools"
print(arr[bools])
Based on what I can take from your question, this should work:
result = list(filter(lambda x: x >= 0, position(x, a)))
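Since filter yields plain scalars, one might convert the result back to a NumPy array afterwards. A small self-contained sketch reusing the question's L, a, and position:
import numpy as np
L, a = 10, 5
def position(x, a):
    return x - a
x = np.arange(0.0, L, 1)
result = np.array(list(filter(lambda v: v >= 0, position(x, a))))
print(result)  # [0. 1. 2. 3. 4.]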
