comparing numpy arrays with tolerance


I'm trying to compare floating-point numbers that are stored in numpy arrays.
I would like them to be compared with a tolerance, and every number of one array should be compared with every number of the other array.
My attempt is shown below. I used two simple arrays as examples, but it has the problem that it only compares numbers with the same indices.
import numpy as np
b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2
# zip pairs elements by position only: (1.000, 0.7310), (2.1300, 2.2300), ...
for (fragment, mass) in zip(b_y_ion_mass, observed_mass_array):
    if (fragment + abs_tol) > mass and (fragment - abs_tol) < mass:
        print(mass)
It would be great if anyone could help me.
Thank you.

Use np.isclose with atol = abs_tol.
import numpy as np
b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2
# observed_mass_array[:, None] has shape (5, 1), so broadcasting it against
# b_y_ion_mass (shape (4,)) gives a (5, 4) result:
# columns are b_y_ion_mass, rows are observed_mass_array
np.isclose(b_y_ion_mass, observed_mass_array[:, None], atol=abs_tol)
# array([[False, False, False, False],
#        [False,  True, False, False],
#        [False, False, False,  True],
#        [False, False, False, False],
#        [False, False, False, False]])
# Compares
#           1.000   2.1300   3.4320   6.0000
# 0.7310
# 2.2300            True
# 5.999                               True
# 8.000
# 9.000
To get the observed masses:
np.isclose(b_y_ion_mass, observed_mass_array[:, None],
           atol=abs_tol) * observed_mass_array[:, None]
Result
array([[0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.23 , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 5.999],
       [0.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   ]])
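If you only want the list of matching observed masses (what the loop in the question prints), a minimal sketch is to collapse the boolean matrix with any(axis=1) and use it as an index. It also passes rtol=0, on the assumption that you want a purely absolute tolerance: np.isclose(a, b) tests abs(a - b) <= atol + rtol * abs(b), and rtol defaults to 1e-05.

```python
import numpy as np

b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2

# rtol=0 leaves only the absolute tolerance (np.isclose defaults to rtol=1e-05)
close = np.isclose(b_y_ion_mass, observed_mass_array[:, None], rtol=0, atol=abs_tol)

# Keep each observed mass that is within abs_tol of at least one b/y ion mass
matched_masses = observed_mass_array[close.any(axis=1)]
print(matched_masses)
```

Here matched_masses holds 2.23 and 5.999; close.any(axis=0) would instead tell you which b/y ion masses were observed.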

You can do:
diff_matrix = b_y_ion_mass - observed_mass_array[:, np.newaxis]
to subtract each element of observed_mass_array from each element of b_y_ion_mass (one row per observed mass):
array([[ 2.690e-01, 1.399e+00, 2.701e+00, 5.269e+00],
[-1.230e+00, -1.000e-01, 1.202e+00, 3.770e+00],
[-4.999e+00, -3.869e+00, -2.567e+00, 1.000e-03],
[-7.000e+00, -5.870e+00, -4.568e+00, -2.000e+00],
[-8.000e+00, -6.870e+00, -5.568e+00, -3.000e+00]])
then take the absolute value and compare it to your tolerance:
valid = abs(diff_matrix) < abs_tol
output:
array([[False, False, False, False],
[False, True, False, False],
[False, False, False, True],
[False, False, False, False],
[False, False, False, False]])
So you can see here that the absolute difference between the second item of the first array (2.13) and the second item of the second array (2.23) is less than your tolerance. So is the difference between the last item of the first array (6.0) and the third item of the second array (5.999).
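To turn valid back into concrete matches, one option is np.argwhere, which returns the (row, column) index of every True entry. A small sketch follows; note that it keeps the strict < from this answer, whereas np.isclose uses <=, so a difference exactly equal to abs_tol is treated differently by the two approaches.

```python
import numpy as np

b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2

diff_matrix = b_y_ion_mass - observed_mass_array[:, np.newaxis]
valid = np.abs(diff_matrix) < abs_tol

# Each row of pairs is (index into observed_mass_array, index into b_y_ion_mass)
pairs = np.argwhere(valid)
for obs_i, ion_i in pairs:
    print(f"observed {observed_mass_array[obs_i]} matches ion {b_y_ion_mass[ion_i]}")
```

For these arrays pairs is [[1, 1], [2, 3]], i.e. 2.23 matches 2.13 and 5.999 matches 6.0.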

Related

get texts from array at indexes where 2D tensor's values are above or equal to thresholds in tensorflow2

I have a TensorFlow tensor with probabilities like this:
>>> valid_4_preds
array([[0.9817431 , 0.01259811, 0.50729334, 0.00053732, 0.6966804 ,
0.00488825],
[0.9851129 , 0.01246135, 0.38177294, 0.00378728, 0.8398497 ,
0.68413687],
[0.00061161, 0.00005008, 0.00017785, 0.0000152 , 0.00017121,
0.00002404],
[0.9991425 , 0.23962161, 0.98579687, 0.01727398, 0.9354003 ,
0.3325037 ]], dtype=float32)
I now need to map the above probabilities, using a different threshold for each class, to classes (a tensor of texts) and get them.
>>> # printing classes
>>> classes
<tf.Tensor: shape=(6,), dtype=string, numpy=
array([b'class_1', b'class_2', b'class_3', b'class_4', b'class_5',
b'class_6'], dtype=object)>
>>> # converting to bools
>>> true_falses = tf.math.greater_equal(valid_4_preds, tf.constant([0.5, 0.40, 0.20, 0.80, 0.5, 0.4]))
>>> true_falses
<tf.Tensor: shape=(4, 6), dtype=bool, numpy=
array([[ True, False, True, False, True, False],
[ True, False, True, False, True, True],
[False, False, False, False, False, False],
[ True, False, True, False, True, False]])>
Now I am trying to get the texts at the indices where true_falses is True (this is my expected output), like this:
>>> <some-tensorflow-operations>
<tf.Tensor: shape=(4, 6), dtype=bool, numpy=
array([['class_1', 'class_3', 'class_5'],
['class_1', 'class_3', 'class_5', 'class_6'],
[],
['class_1', 'class_3', 'class_5']])>
Here's what I have tried:
tf.boolean_mask seems to serve the purpose, but the mask it takes strictly has to be a 1D array.
tf.where can be used to get the indexes; its output, after reshaping to a single dimension, can be passed to tf.gather to get the respective classes, like this:
>>> tf.gather(classes, tf.reshape(tf.where(true_falses[0] == True), shape=(-1,)))
<tf.Tensor: shape=(3,), dtype=string, numpy=array([b'class_1', b'class_3', b'class_5'], dtype=object)>
But I haven't been able to figure out how to do this on 2D arrays.
This logic will go into a signature for serving via TensorFlow Serving, so only TensorFlow operations can be used. How do I do this on 2D tensors or arrays? More efficient and quicker operations would be appreciated.

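Try tf.ragged.boolean_mask. It applies the mask without flattening the mask dimensions and returns a RaggedTensor, so each row can keep a different number of classes. Tile classes to one row per prediction so that data and mask have the same shape: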
import tensorflow as tf
classes = tf.constant([b'class_1', b'class_2', b'class_3', b'class_4', b'class_5', b'class_6'])
true_falses = tf.constant([
[ True, False, True, False, True, False],
[ True, False, True, False, True, True],
[False, False, False, False, False, False],
[ True, False, True, False, True, False]]
)
tf.ragged.boolean_mask(
data=tf.tile(tf.expand_dims(classes, 0), [tf.shape(true_falses)[0], 1]),
mask=true_falses
)
# <tf.RaggedTensor [[b'class_1', b'class_3', b'class_5'], [b'class_1', b'class_3', b'class_5', b'class_6'], [], [b'class_1', b'class_3', b'class_5']]>
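An alternative sketch that avoids tiling classes into a (rows, classes) string tensor: tf.where on the 2D mask gives [row, column] index pairs in row-major order, tf.gather picks the class names by column, and tf.RaggedTensor.from_value_rowids regroups them by row. nrows is passed explicitly so that rows with no True values (row 2 here, or a trailing empty row) still appear as empty lists. It assumes TF2 eager execution for .to_list(), and I have not benchmarked it against the tiling version.

```python
import tensorflow as tf

valid_4_preds = tf.constant([
    [0.9817431, 0.01259811, 0.50729334, 0.00053732, 0.6966804, 0.00488825],
    [0.9851129, 0.01246135, 0.38177294, 0.00378728, 0.8398497, 0.68413687],
    [0.00061161, 0.00005008, 0.00017785, 0.0000152, 0.00017121, 0.00002404],
    [0.9991425, 0.23962161, 0.98579687, 0.01727398, 0.9354003, 0.3325037],
], dtype=tf.float32)
thresholds = tf.constant([0.5, 0.40, 0.20, 0.80, 0.5, 0.4])
classes = tf.constant([b'class_1', b'class_2', b'class_3', b'class_4', b'class_5', b'class_6'])

true_falses = tf.math.greater_equal(valid_4_preds, thresholds)

# (num_true, 2) int64 tensor of [row, column] positions, in row-major order
idx = tf.where(true_falses)
# Class name for each True position, taken from its column
values = tf.gather(classes, idx[:, 1])
# Regroup the flat values by their row ids; nrows keeps empty rows
result = tf.RaggedTensor.from_value_rowids(
    values=values,
    value_rowids=idx[:, 0],
    nrows=tf.shape(true_falses, out_type=tf.int64)[0],
)
print(result.to_list())
```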

Python numpy boolean array not whole columns and rows

I want to apply the NOT operation on whole columns/rows of a boolean Numpy array. Is this possible with Numpy?
matrix = np.array([[False for i in range(3)] for j in range(2)])
# Initial
# [False, False, False]
# [False, False, False]
matrix[:,1].not() # Something like this
# After not operation on column 1
# [False, True, False]
# [False, True, False]

This should do the trick, using np.logical_not:
matrix[:, 1] = np.logical_not(matrix[:, 1])
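A few variations on the same idea, as a sketch: on a boolean array the ~ operator is also an element-wise logical NOT, np.logical_not can write its result straight back through out=, and a list of column indices flips several columns at once. Each variation works on its own copy so the results are easy to compare.

```python
import numpy as np

matrix = np.zeros((2, 3), dtype=bool)  # all False, same as the question's matrix

# Column 1 via the ~ operator
cols = matrix.copy()
cols[:, 1] = ~cols[:, 1]

# Row 0 in place: out= writes the result into the row view of `row`
row = matrix.copy()
np.logical_not(row[0], out=row[0])

# Columns 0 and 2 together, using a list of column indices
multi = matrix.copy()
multi[:, [0, 2]] = ~multi[:, [0, 2]]
```

Afterwards cols is [[False, True, False], [False, True, False]], row is [[True, True, True], [False, False, False]] and multi is [[True, False, True], [True, False, True]].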

Find indices of element in 2D array

I have a piece of code below that calculates the maximum value of an array. It then calculates a value for 90% of the maximum, finds the closest value to this in the array as well as its corresponding index.
I need to ensure that I am finding the closest value to 90% that occurs only before the maximum. Can anyone help with this please? I was thinking about maybe compressing the array after the maximum has occurred but then each array I use will be a different size and that will be difficult later on.
import numpy as np
#make amplitude arrays
amplitude=[0,1,2,3, 5.5, 6,5,2,2, 4, 2,3,1,6.5,5,7,1,2,2,3,8,4,9,2,3,4,8,4,9,3]
#split arrays up into a line for each sample
traceno=5 #number of traces in file
samplesno=6 #number of samples in each trace. This won't change.
# builtin int: np.int was deprecated in NumPy 1.20 and removed in 1.24
amplitude_split=np.array(amplitude, dtype=int).reshape((traceno,samplesno))
#find max value of trace
max_amp=np.amax(amplitude_split,1)
#find index of max value
ind_max_amp=np.argmax(amplitude_split, axis=1, out=None)
#find 90% of max value of trace
amp_90=np.amax(amplitude_split,1)*0.9
# find the indices of the min absolute difference
indices_90 = np.argmin(np.abs(amplitude_split - amp_90[:, None]), axis=1)
print("indices for 90 percent are", indices_90)

Use a mask to set the differences at and after the maximum (including the maximum itself) to a known 'too high' value. Then argmin will return the index of the minimum difference in the 'valid' area of each row, i.e. before the maximum.
# Create a mask for amplitude equal to the maximum of its row.
# [:, None] adds a dimension to max_amp so it broadcasts across each row.
mask = np.equal(amplitude_split, max_amp[:, None])
# Cumsum the mask to set all elements in a row after the first True to True
mask[:] = mask.cumsum(axis = 1)
mask
# array([[False, False, False, False, False,  True],
#        [ True,  True,  True,  True,  True,  True],
#        [False, False, False,  True,  True,  True],
#        [False, False, False, False,  True,  True],
#        [False, False, False, False,  True,  True]])
# Set inter to the absolute difference from each row's 90% value.
inter = np.abs(amplitude_split - amp_90[:, None])
# Set the max and after to a high value (max_amp.max(), i.e. 9, here).
inter[mask] = max_amp.max() # Any suitably high value, e.g. np.inf
inter # Where the mask is True inter == 9.
# array([[5.4, 4.4, 3.4, 2.4, 0.4, 9. ],
#        [9. , 9. , 9. , 9. , 9. , 9. ],
#        [5.3, 0.3, 1.3, 9. , 9. , 9. ],
#        [6.1, 5.1, 0.1, 4.1, 9. , 9. ],
#        [5.1, 4.1, 0.1, 4.1, 9. , 9. ]])
# Find the indices of the minimum in each row
np.argmin(inter, axis = 1)
# array([4, 0, 1, 2, 2])
# Row 1 has its maximum at index 0, so nothing precedes it and argmin falls back to 0.
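A variant of the same masking idea, as a sketch: build the "before the maximum" mask from column positions and np.argmax (which returns the first occurrence of the maximum, matching the cumsum trick), fill the excluded cells with np.inf instead of a finite "high" value, and flag rows where nothing precedes the maximum with -1. The helper name closest_before_max and the -1 convention are hypothetical choices, not part of the answer above. It keeps the data as floats, so 5.5 and 6.5 are not truncated as they are with dtype=int; for this data the indices come out the same.

```python
import numpy as np


def closest_before_max(a, frac=0.9):
    """Hypothetical helper: for each row of a 2D array, return the index of the
    value closest to frac * row maximum, searching only positions before the
    row's first maximum. Rows whose maximum is at index 0 get -1."""
    a = np.asarray(a, dtype=float)
    max_idx = np.argmax(a, axis=1)                       # first occurrence of each row max
    target = a.max(axis=1) * frac                        # e.g. 90% of each row max
    before = np.arange(a.shape[1]) < max_idx[:, None]    # True strictly before the max
    diff = np.where(before, np.abs(a - target[:, None]), np.inf)
    idx = np.argmin(diff, axis=1)
    idx[~before.any(axis=1)] = -1                        # nothing precedes the maximum
    return idx


amplitude = [0, 1, 2, 3, 5.5, 6, 5, 2, 2, 4, 2, 3, 1, 6.5, 5, 7, 1, 2,
             2, 3, 8, 4, 9, 2, 3, 4, 8, 4, 9, 3]
amplitude_split = np.array(amplitude).reshape((5, 6))
indices_90 = closest_before_max(amplitude_split)
print(indices_90)
```

For the question's data this gives [4, -1, 1, 2, 2]; every array keeps its full size, which avoids the varying-size problem of compressing each row.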

Get path of boundaries of contiguous regions in 2D array

Say I have an array like this:
import numpy as np
arr = np.array([
[1, 1, 3, 3, 1],
[1, 3, 3, 1, 1],
[4, 4, 3, 1, 1],
[4, 4, 1, 1, 1]
])
There are 4 distinct regions: The top left 1s, 3s, 4s and right 1s.
How would I get the paths for the bounds of each region? The coordinates of the vertices of the region, in order.
For example, for the top left 1s, it is (0, 0), (0, 2), (1, 2), (1, 1), (2, 1), (2, 0)
(I ultimately want to end up with something like start at 0, 0. Right 2. Down 1. Right -1. Down 1. Right -1. Down -2., but it's easy to convert, as it's just the difference between adjacent vertices)
I can split it up into regions with scipy.ndimage.label:
from scipy.ndimage import label
regions = {}
# region_value is the number in the region
for region_value in np.unique(arr):
labeled, n_regions = label(arr == region_value)
regions[region_value] = [labeled == i for i in range(1, n_regions + 1)]
Which gives something like this:
{1: [
array([
[ True, True, False, False, False],
[ True, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False]
], dtype=bool), # Top left 1s region
array([
[False, False, False, False, True],
[False, False, False, True, True],
[False, False, False, True, True],
[False, False, True, True, True]
], dtype=bool) # Right 1s region
],
3: [
array([
[False, False, True, True, False],
[False, True, True, False, False],
[False, False, True, False, False],
[False, False, False, False, False]
], dtype=bool) # 3s region
],
4: [
array([
[False, False, False, False, False],
[False, False, False, False, False],
[ True, True, False, False, False],
[ True, True, False, False, False]
], dtype=bool) # 4s region
]
}
So how would I convert that into a path?

A pseudocode idea would be to do the following:
scan the array row by row (horizontally, then vertically) until you find a True value (for the second array it is (0,4))
output that as the start coord
since you have been scanning as described above, your first move will be to go right.
repeat until you come back:
move one block in the direction you are facing.
you are now at coord x,y
check values of ul=(x-1, y-1), ur=(x-1, y), ll=(x, y-1), lr=(x,y)
# if any of above is out of bounds, set it as False
if ul is the only True:
if previous move right:
next move is up
else:
next move is left
output previous move
move by one
..similarly for other single True cells..
elif ul and ur only True or ul and ll only True or ll and lr only True or ur and lr only True:
repeat previous move
elif ul and lr only True:
if previous move left:
next move down
elif previous move right:
next move up
elif previous move down:
next move left
else:
next move right
output previous move
move one
elif ul, ur, ll only Trues:
if previous move left:
next move down
else:
next move right
output previous move, move by one
...similarly for other 3 True combos...
for the second array it will do the following:
finds True val at 0,4
start at 0,4
only lower-right cell is True, so moves right to 0,5 (previous move is None, so no output)
now only lower-left cell is True, so moves down to 1,5 (previous move right 1 is output)
now both left cells are True, so repeat move (moves down to 2,5)
..repeat until hit 4,5..
only upper-left cell is True, so move left (output down 4)
both upper cells are true, repeat move (move left to 4,3)
both upper cells are true, repeat move (move left to 4,2)
upper right cell only true, so move up (output right -3)
..keep going until back at 0,4..
Try visualising all the possible coord neighbouring cell combos and that will give you a visual idea of the possible flows.
Also note that with this method it should be impossible to be traversing a coord which has all 4 neighbours as False.
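Below is a runnable sketch of this corner walk. Instead of the full case table, it encodes the same "keep the region on your right-hand side" behaviour with a turn priority: at each corner it tries a right turn, then straight on, then a left turn, accepting the first edge that has the region on its right and background on its left. Preferring the right turn keeps diagonally touching cells apart, which matches scipy.ndimage.label's default 4-connectivity. The function names are hypothetical, and it only traces the outer boundary (holes inside a region are not followed).

```python
import numpy as np

# Clockwise order: right, down, left, up, as (row step, column step)
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def cell(mask, r, c):
    """Value of cell (r, c); anything outside the array counts as False."""
    h, w = mask.shape
    return 0 <= r < h and 0 <= c < w and bool(mask[r, c])


def edge_ok(mask, r, c, d):
    """True if the edge leaving corner (r, c) in direction d has the region
    on its right-hand side and background on its left-hand side."""
    if d == 0:    # moving right: region must be below the edge
        right, left = (r, c), (r - 1, c)
    elif d == 1:  # moving down: region must be to the west
        right, left = (r, c - 1), (r, c)
    elif d == 2:  # moving left: region must be above
        right, left = (r - 1, c - 1), (r, c - 1)
    else:         # moving up: region must be to the east
        right, left = (r - 1, c), (r - 1, c - 1)
    return cell(mask, *right) and not cell(mask, *left)


def boundary_path(mask):
    """Corner coordinates (row, col) of the region's outer boundary, clockwise."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)          # row-major order
    start = (int(rows[0]), int(cols[0]))   # top-left corner of the first True cell
    r, c = start
    d = 0                                  # first move is to the right
    vertices = [start]
    while True:
        r, c = r + DIRS[d][0], c + DIRS[d][1]
        if (r, c) == start:
            return vertices
        # Right turn first, then straight on, then left turn
        for nd in ((d + 1) % 4, d, (d + 3) % 4):
            if edge_ok(mask, r, c, nd):
                break
        else:
            raise RuntimeError(f"lost the boundary at corner {(r, c)}")
        if nd != d:                        # direction changes only at vertices
            vertices.append((r, c))
            d = nd


arr = np.array([
    [1, 1, 3, 3, 1],
    [1, 3, 3, 1, 1],
    [4, 4, 3, 1, 1],
    [4, 4, 1, 1, 1],
])
top_left = arr == 1
top_left[:, 2:] = False                    # keep only the top-left 1s
right_ones = (arr == 1) & ~top_left

path = boundary_path(top_left)
# (down, right) steps between consecutive vertices, closing the loop
moves = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:] + path[:1])]
```

For the top-left 1s, path is (0, 0), (0, 2), (1, 2), (1, 1), (2, 1), (2, 0), the vertices from the question, and moves corresponds to "Right 2. Down 1. Right -1. Down 1. Right -1. Down -2.". For the right 1s it starts (0, 4), (0, 5), (4, 5), (4, 2), matching the walk-through above. The masks from the label loop in the question can be passed in directly.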

How do I create a numpy array of all True or all False?

In Python, how do I create a numpy array of arbitrary shape filled with all True or all False?

The answer:
numpy.full((2, 2), True)
Explanation:
numpy creates arrays of all ones or all zeros very easily:
e.g. numpy.ones((2, 2)) or numpy.zeros((2, 2))
Since True and False are represented in Python as 1 and 0, respectively, we only have to specify that this array should be boolean using the optional dtype parameter, and we are done:
numpy.ones((2, 2), dtype=bool)
returns:
array([[ True, True],
[ True, True]], dtype=bool)
UPDATE: 30 October 2013
Since numpy version 1.8, we can use full to achieve the same result with syntax that more clearly shows our intent (as fmonegaglia points out):
numpy.full((2, 2), True, dtype=bool)
UPDATE: 16 January 2017
Since at least numpy version 1.12, full infers the dtype from the second parameter (the fill value), so we can just write:
numpy.full((2, 2), True)
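A quick check of that dtype inference, as a sketch: without a dtype argument, np.full takes the dtype from the fill value, so True and False give boolean arrays while an integer fill value gives an integer array. The result matches the ones/zeros versions with dtype=bool.

```python
import numpy as np

all_true = np.full((2, 2), True)
all_false = np.full((2, 2), False)

# No dtype argument: the dtype comes from the fill value
print(all_true.dtype, all_false.dtype)    # bool bool
# An integer fill value gives an integer array instead
print(np.full((2, 2), 1).dtype.kind)      # i (signed integer)

# Same contents as the ones/zeros versions with dtype=bool
print(np.array_equal(all_true, np.ones((2, 2), dtype=bool)))
print(np.array_equal(all_false, np.zeros((2, 2), dtype=bool)))
```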

numpy.full((2,2), True, dtype=bool)

ones and zeros, which create arrays full of ones and zeros respectively, take an optional dtype parameter:
>>> numpy.ones((2, 2), dtype=bool)
array([[ True, True],
[ True, True]], dtype=bool)
>>> numpy.zeros((2, 2), dtype=bool)
array([[False, False],
[False, False]], dtype=bool)

If it doesn't have to be writeable you can create such an array with np.broadcast_to:
>>> import numpy as np
>>> np.broadcast_to(True, (2, 5))
array([[ True, True, True, True, True],
[ True, True, True, True, True]], dtype=bool)
If you need it writable you can also create an empty array and fill it yourself:
>>> arr = np.empty((2, 5), dtype=bool)
>>> arr.fill(1)
>>> arr
array([[ True, True, True, True, True],
[ True, True, True, True, True]], dtype=bool)
These approaches are only alternative suggestions. In general you should stick with np.full, np.zeros or np.ones like the other answers suggest.
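To see what "doesn't have to be writeable" means for np.broadcast_to, here is a small sketch: the result is a read-only view whose strides are zero, so every element refers to the single stored True, and assigning to it raises ValueError. Taking a copy gives an ordinary writable array, at the cost of allocating the full shape.

```python
import numpy as np

view = np.broadcast_to(True, (2, 5))
print(view.flags.writeable)   # False: broadcast_to returns a read-only view
print(view.strides)           # (0, 0): every element refers to the same value

try:
    view[0, 0] = False
except ValueError as err:     # NumPy refuses to write to a read-only array
    print("cannot write:", err)

# If you later need to modify it, take a copy (this allocates the full array)
writable = view.copy()
writable[0, 0] = False
```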

benchmark for Michael Currie's answer
import numpy as np
import perfplot
bench_x = perfplot.bench(
    n_range=range(1, 200),
    setup=lambda n: (n, n),
    kernels=[
        lambda shape: np.ones(shape, dtype=bool),
        lambda shape: np.full(shape, True)
    ],
    labels=['ones', 'full']
)
bench_x.show()

I quickly ran timeit to see if there is any difference between the np.full and np.ones versions.
Answer: No
import timeit
n_array, n_test = 1000, 10000
setup = f"import numpy as np; n = {n_array};"
print(f"np.ones: {timeit.timeit('np.ones((n, n), dtype=bool)', number=n_test, setup=setup)}s")
print(f"np.full: {timeit.timeit('np.full((n, n), True)', number=n_test, setup=setup)}s")
Result:
np.ones: 0.38416870904620737s
np.full: 0.38430388597771525s
IMPORTANT
Regarding the post about np.empty (I cannot comment, as my reputation is too low):
DON'T DO THAT. DON'T rely on np.empty alone to initialize an all-True array.
np.empty does not write to the allocated memory, so there is no guarantee what the values will be unless you fill the array afterwards, e.g.
>>> print(np.empty((4,4), dtype=bool))
[[ True True True True]
[ True True True True]
[ True True True True]
[ True True False False]]

>>> a = numpy.full((2,4), True, dtype=bool)
>>> a[1][3]
True
>>> a
array([[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
numpy.full(shape, fill_value, dtype). There are other arguments that can be passed as well; for documentation on them, check https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html
