I have a piece of code below that calculates the maximum value of an array. It then calculates a value for 90% of the maximum, finds the closest value to this in the array as well as its corresponding index.
I need to ensure that I am finding the closest value to 90% that occurs only before the maximum. Can anyone help with this please? I was thinking about maybe compressing the array after the maximum has occurred but then each array I use will be a different size and that will be difficult later on.
import numpy as np
#make amplitude arrays
amplitude=[0,1,2,3, 5.5, 6,5,2,2, 4, 2,3,1,6.5,5,7,1,2,2,3,8,4,9,2,3,4,8,4,9,3]
#split arrays up into a line for each sample
traceno=5 #number of traces in file
samplesno=6 #number of samples in each trace. This won't change.
# np.int was removed in NumPy 1.24; the builtin int behaves the same (and truncates 5.5 and 6.5)
amplitude_split=np.array(amplitude, dtype=int).reshape((traceno,samplesno))
#find max value of trace
max_amp=np.amax(amplitude_split,1)
#find index of max value
ind_max_amp=np.argmax(amplitude_split, axis=1, out=None)
#find 90% of max value of trace
amp_90=np.amax(amplitude_split,1)*0.9
# find the indices of the min absolute difference
indices_90 = np.argmin(np.abs(amplitude_split - amp_90[:, None]), axis=1)
print("indices for 90 percent are", indices_90)
Use a mask to set the values from the maximum onwards (the maximum itself included) to a known 'too high' value. argmin will then return the index of the minimum difference in the 'valid' area of each row, that is, before the maximum.
# Create a mask for amplitude equal to the maximum
# add a dimension to max_amp so each row is compared with its own maximum.
mask = np.equal(amplitude_split, max_amp[:, None])
# Cumsum the mask to set all elements in a row after the first True to True
mask[:] = mask.cumsum(axis = 1)
mask
# array([[False, False, False, False, False, True],
# [ True, True, True, True, True, True],
# [False, False, False, True, True, True],
# [False, False, False, False, True, True],
# [False, False, False, False, True, True]])
# Set inter to the absolute difference from each row's 90% value.
inter = np.abs(amplitude_split - amp_90[:, None])
# Set the max and after to a high value (9., the overall maximum, here).
inter[mask] = max_amp.max() # Any suitably high value
inter # Where the mask is True inter == 9.
# array([[5.4, 4.4, 3.4, 2.4, 0.4, 9. ],
# [9. , 9. , 9. , 9. , 9. , 9. ],
# [5.3, 0.3, 1.3, 9. , 9. , 9. ],
# [6.1, 5.1, 0.1, 4.1, 9. , 9. ],
# [5.1, 4.1, 0.1, 4.1, 9. , 9. ]])
# Find the indices of the minimum in each row
np.argmin(inter, axis = 1)
# array([4, 0, 1, 2, 2])
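A variation on the same idea, as a minimal sketch rather than a definitive implementation: instead of a cumsum it compares column numbers against np.argmax to build the mask, uses np.inf as the 'too high' value, and marks rows that have nothing before the maximum with -1. The function name closest_before_max, the fraction parameter and the -1 sentinel are assumptions made for this illustration. The traces are kept as floats here, so 5.5 and 6.5 are not truncated.

```python
import numpy as np

def closest_before_max(traces, fraction=0.9):
    """Index of the value closest to fraction*max, searching only before the max.

    Hypothetical helper for illustration; returns -1 for rows whose
    maximum is the very first sample.
    """
    traces = np.asarray(traces, dtype=float)
    ind_max = traces.argmax(axis=1)              # first occurrence of each row maximum
    target = traces.max(axis=1) * fraction       # e.g. 90% of each row maximum
    diff = np.abs(traces - target[:, None])
    columns = np.arange(traces.shape[1])
    # Exclude the maximum and everything after it from the search.
    diff[columns >= ind_max[:, None]] = np.inf
    result = diff.argmin(axis=1)
    result[ind_max == 0] = -1                    # nothing before the maximum
    return result

amplitude = [0, 1, 2, 3, 5.5, 6, 5, 2, 2, 4, 2, 3, 1, 6.5, 5, 7, 1, 2,
             2, 3, 8, 4, 9, 2, 3, 4, 8, 4, 9, 3]
traces = np.array(amplitude, dtype=float).reshape(5, 6)
print(closest_before_max(traces))  # [ 4 -1  1  2  2]
```

Every row keeps its original length, so the arrays stay the same size for later steps.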
I'm trying to compare floating numbers that are stored in numpy arrays.
I would like them to be compared with a tolerance and every number of the array should be compared with every number of the other array.
My attempt is shown underneath, I used two simple arrays as examples but it has the problem that it only compares numbers with the same indices.
b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2
for (fragment, mass) in zip(b_y_ion_mass, observed_mass_array):
    if (fragment+abs_tol)> mass and (fragment-abs_tol)< mass:
        print(mass)
It would be great if anyone could help me.
Thank you.
Use np.isclose with atol = abs_tol.
import numpy as np
b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2
np.isclose( b_y_ion_mass, observed_mass_array[ :, None ] , atol = abs_tol )
# b_y_ion_mass varies along the columns, observed_mass_array[ :, None ] along the rows
# array([[False, False, False, False],
# [False, True, False, False],
# [False, False, False, True],
# [False, False, False, False],
# [False, False, False, False]])
# Compares
#          [1.000, 2.1300, 3.4320, 6.0000]
# [0.7310,
#  2.2300,         True
#  5.999,                          True
#  8.000,
#  9.000]
To get the observed masses:
np.isclose( b_y_ion_mass, observed_mass_array[ :, None ],
atol = abs_tol ) * observed_mass_array[ :, None ]
Result
array([[0. , 0. , 0. , 0. ],
[0. , 2.23 , 0. , 0. ],
[0. , 0. , 0. , 5.999],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]])
You can do:
diff_matrix = b_y_ion_mass - observed_mass_array[:, np.newaxis]
to subtract each item of observed_mass_array from each item of b_y_ion_mass:
array([[ 2.690e-01, 1.399e+00, 2.701e+00, 5.269e+00],
[-1.230e+00, -1.000e-01, 1.202e+00, 3.770e+00],
[-4.999e+00, -3.869e+00, -2.567e+00, 1.000e-03],
[-7.000e+00, -5.870e+00, -4.568e+00, -2.000e+00],
[-8.000e+00, -6.870e+00, -5.568e+00, -3.000e+00]])
then take the absolute value and compare to your tolerance:
valid = abs(diff_matrix) < abs_tol
output:
array([[False, False, False, False],
[False, True, False, False],
[False, False, False, True],
[False, False, False, False],
[False, False, False, False]])
So you can see that the second item of b_y_ion_mass (2.13) and the second item of observed_mass_array (2.23) differ by less than your tolerance, as do the last item of b_y_ion_mass (6.0) and the third item of observed_mass_array (5.999).
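Either boolean matrix can be turned into a list of matching pairs with np.nonzero. The following is a small sketch of that step, not part of the answers above. The names close, observed_idx, fragment_idx and matches are made up for the example, and rtol=0.0 is my choice so that only the absolute tolerance applies (np.isclose otherwise adds a small relative term, rtol=1e-05 by default).

```python
import numpy as np

b_y_ion_mass = np.array([1.000, 2.1300, 3.4320, 6.0000])
observed_mass_array = np.array([0.7310, 2.2300, 5.999, 8.000, 9.000])
abs_tol = 0.2

# Rows follow observed_mass_array, columns follow b_y_ion_mass.
close = np.isclose(observed_mass_array[:, None], b_y_ion_mass,
                   rtol=0.0, atol=abs_tol)

# Row/column positions of every True entry.
observed_idx, fragment_idx = np.nonzero(close)

# (observed mass, fragment mass) for every pair within the tolerance.
matches = list(zip(observed_mass_array[observed_idx].tolist(),
                   b_y_ion_mass[fragment_idx].tolist()))
print(matches)  # [(2.23, 2.13), (5.999, 6.0)]
```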
Say I have an array like this:
import numpy as np
arr = np.array([
[1, 1, 3, 3, 1],
[1, 3, 3, 1, 1],
[4, 4, 3, 1, 1],
[4, 4, 1, 1, 1]
])
There are 4 distinct regions: The top left 1s, 3s, 4s and right 1s.
How would I get the paths for the bounds of each region? The coordinates of the vertices of the region, in order.
For example, for the top left 1s, it is (0, 0), (0, 2), (1, 2), (1, 1), (2, 1), (2, 0)
(I ultimately want to end up with something like start at 0, 0. Right 2. Down 1. Right -1. Down 1. Right -1. Down -2., but it's easy to convert, as it's just the difference between adjacent vertices)
I can split it up into regions with scipy.ndimage.label:
from scipy.ndimage import label
regions = {}
# region_value is the number in the region
for region_value in np.unique(arr):
    labeled, n_regions = label(arr == region_value)
    regions[region_value] = [labeled == i for i in range(1, n_regions + 1)]
Which looks more like this:
{1: [
array([
[ True, True, False, False, False],
[ True, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False]
], dtype=bool), # Top left 1s region
array([
[False, False, False, False, True],
[False, False, False, True, True],
[False, False, False, True, True],
[False, False, True, True, True]
], dtype=bool) # Right 1s region
],
3: [
array([
[False, False, True, True, False],
[False, True, True, False, False],
[False, False, True, False, False],
[False, False, False, False, False]
], dtype=bool) # 3s region
],
4: [
array([
[False, False, False, False, False],
[False, False, False, False, False],
[ True, True, False, False, False],
[ True, True, False, False, False]
], dtype=bool) # 4s region
]
}
So how would I convert that into a path?
A pseudocode idea would be to do the following:
scan multi-dim array horizontally and then vertically until you find True value (for second array it is (0,4))
output that as a start coord
since you have been scanning as determined above your first move will be to go right.
repeat until you come back:
    move one block in the direction you are facing.
    you are now at coord x,y
    check values of ul=(x-1, y-1), ur=(x-1, y), ll=(x, y-1), lr=(x,y)
    # if any of above is out of bounds, set it as False
    if ul is the only True:
        if previous move right:
            next move is up
        else:
            next move is left
        output previous move
        move by one
    ..similarly for other single True cells..
    elif ul and ur only True or ul and ll only True or ll and lr only True or ur and lr only True:
        repeat previous move
    elif ul and lr only True:
        if previous move left:
            next move down
        elif previous move right:
            next move up
        elif previous move down:
            next move left
        else:
            next move right
        output previous move
        move one
    elif ul, ur, ll only Trues:
        if previous move left:
            next move down
        else:
            next move right
        output previous move, move by one
    ...similarly for other 3 True combos...
for the second array it will do the following:
finds True val at 0,4
start at 0,4
only lower-right cell is True, so moves right to 0,5 (previous move is None, so no output)
now only lower-left cell is True, so moves down to 1,5 (previous move right 1 is output)
now both left cells are True, so repeat move (moves down to 2,5)
..repeat until hit 4,5..
only upper-left cell is True, so move left (output down 4)
both upper cells are true, repeat move (move left to 3,4)
both upper cells are true, repeat move (move left to 2,4)
upper right cell only true, so move up (output right -3)
..keep going until back at 0,4..
Try visualising all the possible coord neighbouring cell combos and that will give you a visual idea of the possible flows.
Also note that with this method it should be impossible to be traversing a coord which has all 4 neighbours as False.
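For something runnable, here is a sketch of a different technique from the corner-following walk above: collect the clockwise unit edges that separate a True cell from a False (or out-of-bounds) neighbour, chain them into closed loops, then drop the vertices that lie on a straight line. The function names boundary_loops and drop_collinear are made up for this example. The sketch assumes the region never touches itself only diagonally at a corner and raises a ValueError if it does.

```python
import numpy as np

def drop_collinear(loop):
    """Keep only the vertices of a closed loop where the direction changes."""
    kept = []
    n = len(loop)
    for i in range(n):
        prev, cur, nxt = loop[i - 1], loop[i], loop[(i + 1) % n]
        same_row = prev[0] == cur[0] == nxt[0]
        same_col = prev[1] == cur[1] == nxt[1]
        if not (same_row or same_col):
            kept.append(cur)
    return kept

def boundary_loops(region):
    """Boundary of a boolean region as lists of (row, col) corner vertices."""
    region = np.asarray(region, dtype=bool)
    padded = np.pad(region, 1, mode="constant", constant_values=False)
    edges = {}  # start vertex -> end vertex, clockwise around the region
    for r, c in zip(*np.nonzero(region)):
        r, c = int(r), int(c)
        pr, pc = r + 1, c + 1  # same cell in the padded array
        candidates = [
            (padded[pr - 1, pc], (r, c), (r, c + 1)),          # top edge
            (padded[pr, pc + 1], (r, c + 1), (r + 1, c + 1)),  # right edge
            (padded[pr + 1, pc], (r + 1, c + 1), (r + 1, c)),  # bottom edge
            (padded[pr, pc - 1], (r + 1, c), (r, c)),          # left edge
        ]
        for neighbour_inside, start, end in candidates:
            if neighbour_inside:
                continue  # shared with another cell of the region: not a boundary
            if start in edges:
                raise ValueError(f"region touches itself diagonally at {start}")
            edges[start] = end
    loops = []
    while edges:
        start = min(edges)  # top-most, then left-most remaining vertex
        loop = [start]
        nxt = edges.pop(start)
        while nxt != start:
            loop.append(nxt)
            nxt = edges.pop(nxt)
        loops.append(drop_collinear(loop))
    return loops

top_left_ones = np.array([
    [True, True, False, False, False],
    [True, False, False, False, False],
    [False, False, False, False, False],
    [False, False, False, False, False],
])
print(boundary_loops(top_left_ones))
# [[(0, 0), (0, 2), (1, 2), (1, 1), (2, 1), (2, 0)]]
```

A region with a hole yields one extra loop per hole. The moves asked for in the question are then the differences between consecutive vertices of a loop.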
I have a volume represented by a 3D ndarray, X, with values between, say, 0 and 255, and I have another 3D ndarray, Y, that is an arbitrary mask of the first array, with values of either 0 or 1.
I want to find the indices of a random sample of 50 voxels that are both greater than zero in X, the 'image', and equal to 1 in Y, the 'mask'.
My experience is with R, where the following would work:
idx <- sample(which(X>0 & Y==1), 50)
Maybe the advantage in R is that I can index 3D arrays linearly, because just using a single index in numpy gives me a 2D matrix, for example.
I guess it probably involves numpy.random.choice, but it doesn't seem like I can use that conditionally, let alone conditioned on two different arrays. Is there another approach I should be using instead?
Here's one way -
N = 50 # number of samples needed (50 for your actual case)
# Get mask based on conditionals
mask = (X>0) & (Y==1)
# Get corresponding linear indices (easier to random sample in next step)
idx = np.flatnonzero(mask)
# Get random sample
rand_idx = np.random.choice(idx, N)
# Format into three columnar output (each col for each dim/axis)
out = np.c_[np.unravel_index(rand_idx, X.shape)]
np.random.choice() samples with replacement by default. If you need a random sample without replacement, as R's sample() gives by default, pass the optional arg replace=False.
Sample run -
In [34]: np.random.seed(0)
...: X = np.random.randint(0,4,(2,3,4))
...: Y = np.random.randint(0,2,(2,3,4))
In [35]: N = 5 # number of samples needed (50 for your actual case)
...: mask = (X>0) & (Y==1)
...: idx = np.flatnonzero(mask)
...: rand_idx = np.random.choice(idx, N)
...: out = np.c_[np.unravel_index(rand_idx, X.shape)]
In [37]: mask
Out[37]:
array([[[False, True, True, False],
[ True, False, True, False],
[ True, False, True, True]],
[[False, True, True, False],
[False, False, False, True],
[ True, True, True, True]]], dtype=bool)
In [38]: out
Out[38]:
array([[1, 0, 1],
[0, 0, 1],
[0, 0, 2],
[1, 1, 3],
[1, 1, 3]])
Correlate the output out against the places of True values in mask for a quick verification.
If you would rather skip the flattening step and get the indices per dim/axis directly, we can do it like so -
i0,i1,i2 = np.where(mask)
rand_idx = np.random.choice(len(i0), N)
out = np.c_[i0,i1,i2][rand_idx]
For performance, index first and then concatenate with np.c_ at the last step -
out = np.c_[i0[rand_idx], i1[rand_idx], i2[rand_idx]]
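The same sampling can also be sketched with NumPy's newer Generator API, drawing whole coordinate rows without replacement. This is an illustration under assumptions rather than part of the answer above: the small deterministic X and Y arrays, the seed and the names coords and n_samples are made up for the example.

```python
import numpy as np

# Small deterministic stand-ins for the image X and the mask Y (assumed data).
flat = np.arange(24).reshape(2, 3, 4)
X = flat % 4             # "image" values 0..3
Y = (flat // 2) % 2      # "mask" values 0 or 1

mask = (X > 0) & (Y == 1)

# One (i, j, k) row per voxel that satisfies both conditions.
coords = np.argwhere(mask)

rng = np.random.default_rng(0)
n_samples = 5            # 50 for the actual case
# axis=0 picks whole rows; replace=False forbids drawing a voxel twice.
sample = rng.choice(coords, size=n_samples, replace=False, axis=0)
print(sample.shape)      # (5, 3)
```

With replace=False the requested size must not exceed len(coords), otherwise choice raises a ValueError.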
I want to invert the true/false values in my numpy masked array.
So in the example below I don't want to mask out the second value in the data array, I want to mask out the first and third values.
Below is just an example. My masked array is created by a longer process that runs before. So I cannot change the mask array itself. Is there another way to invert the values?
import numpy
data = numpy.array([[ 1, 2, 5 ]])
mask = numpy.array([[0,1,0]])
numpy.ma.masked_array(data, mask)
import numpy
data = numpy.array([[ 1, 2, 5 ]])
mask = numpy.array([[0,1,0]])
numpy.ma.masked_array(data, ~mask) #note: only correct for a boolean (T/F) mask; for the integer mask above ~mask is [[-1, -2, -1]], which masks every value
#or
numpy.ma.masked_array(data, numpy.logical_not(mask))
for example
>>> a = numpy.array([False,True,False])
>>> ~a
array([ True, False, True], dtype=bool)
>>> numpy.logical_not(a)
array([ True, False, True], dtype=bool)
>>> a = numpy.array([0,1,0])
>>> ~a
array([-1, -2, -1])
>>> numpy.logical_not(a)
array([ True, False, True], dtype=bool)
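For a boolean array, NumPy's '~' operator gives the same result as logical_not, so it can also be used directly for boolean indexing. For example
import numpy
data = numpy.array([[ 1, 2, 5 ]])
mask = numpy.array([[False,True,False]])
result = data[~mask] # array([1, 5]): a plain array of the selected values, not a masked array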
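If only the finished masked array is available, its mask can be read back as a boolean array and inverted. The following is a minimal sketch of that, assuming the masked array is called masked (a name chosen for the example); numpy.ma.getmaskarray always returns a full boolean array, so '~' is safe on it.

```python
import numpy

data = numpy.array([[1, 2, 5]])
mask = numpy.array([[0, 1, 0]])
masked = numpy.ma.masked_array(data, mask)  # stands in for the array built elsewhere

# Read the mask back as booleans, then invert it.
inverted_mask = ~numpy.ma.getmaskarray(masked)

# Build a new masked array on the same data with the inverted mask.
inverted = numpy.ma.masked_array(masked.data, mask=inverted_mask)
print(inverted.mask.tolist())  # [[True, False, True]]
```

This leaves the original masked array and its mask untouched.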
I've been trying to write some code which will add the numbers which fall into a certain range and add a corresponding number to a list. I also need to pull the range from a cumsum range.
numbers = []
i=0
z = np.random.rand(1000)
arraypmf = np.array(pmf)
summation = np.cumsum(z)
while i < 6:
    index = i-1
    a = np.extract[condition, z] # I can't figure out how to write the condition.
    length = len(a)
    length * numbers.append(i)
I'm not entirely sure what you're trying to do, but the easiest way to do conditions in numpy is to just apply them to the whole array to get a mask:
mask = (z >= 0.3) & (z < 0.6)
Then you can use, e.g., extract or ma if necessary—but in this case, I think you can just rely on the fact that True==1 and False==0 and do this:
zm = z * mask
After all, if all you're doing is summing things up, 0 is the same as not there, and you can just replace len with count_nonzero.
For example:
In [588]: z=np.random.rand(10)
In [589]: z
Out[589]:
array([ 0.33335522, 0.66155206, 0.60602815, 0.05755882, 0.03596728,
0.85610536, 0.06657973, 0.43287193, 0.22596789, 0.62220608])
In [590]: mask = (z >= 0.3) & (z < 0.6)
In [591]: mask
Out[591]: array([ True, False, False, False, False, False, False, True, False, False], dtype=bool)
In [592]: z * mask
Out[592]:
array([ 0.33335522, 0. , 0. , 0. , 0. ,
0. , 0. , 0.43287193, 0. , 0. ])
In [593]: np.count_nonzero(z * mask)
Out[593]: 2
In [594]: np.extract(mask, z)
Out[594]: array([ 0.33335522, 0.43287193])
In [595]: len(np.extract(mask, z))
Out[595]: 2
Here is another approach to do (what I think) you're trying to do:
import numpy as np
z = np.random.rand(1000)
bins = np.asarray([0, .1, .15, 1.])
# This will give the number of values in each range
counts, _ = np.histogram(z, bins)
# This will give the sum of all values in each range
sums, _ = np.histogram(z, bins, weights=z)
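As a quick cross-check, the two approaches can be run side by side on a small array. This is only a sketch: the five values in z and the names mask_counts and mask_sums are made up for the example. Note that np.histogram includes the right edge in its last bin, whereas the mask below uses a strict less-than, so the two would differ only for a value exactly equal to the last edge.

```python
import numpy as np

z = np.array([0.05, 0.12, 0.2, 0.9, 0.1])
bins = np.asarray([0, .1, .15, 1.])

# Histogram approach: counts and sums per range.
counts, _ = np.histogram(z, bins)
sums, _ = np.histogram(z, bins, weights=z)

# Mask approach: one boolean mask per range [low, high).
mask_counts = []
mask_sums = []
for low, high in zip(bins[:-1], bins[1:]):
    mask = (z >= low) & (z < high)
    mask_counts.append(int(np.count_nonzero(mask)))
    mask_sums.append(float(z[mask].sum()))

print(counts.tolist(), mask_counts)  # [1, 2, 2] [1, 2, 2]
```

Counting with np.count_nonzero(mask) rather than np.count_nonzero(z * mask) also stays correct if z contains exact zeros.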