Get the relative extrema from 1D numpy array - python

I'm writing code that needs to find the local maxima/minima in an array, but I haven't been able to find the right function.
At first, I used argrelextrema in scipy.signal.
b = [6, 1, 3, 5, 5, 3, 1, 2, 2, 3, 2, 1, 1, 9, 10, 10, 9, 8, 7, 7, 13, 10]
scipy.signal.argrelextrema(np.array(b), np.greater)
scipy.signal.argrelextrema(np.array(b), np.greater_equal)
scipy.signal.argrelextrema(np.array(b), np.greater_equal, order=2)
The result is
(array([ 9, 20], dtype=int64),)
(array([ 0, 3, 4, 7, 9, 14, 15, 20], dtype=int64),)
(array([ 0, 3, 4, 9, 14, 15, 20], dtype=int64),)
The first call doesn't catch b[3] (or b[4]), so I switched to the second one, using np.greater_equal. However, now the first value b[0] is also treated as a local maximum, and the value 2 at b[7] is included. With the third call, order=2, I could throw away b[7], but order=2 still fails on data like [1, 3, 1, 4, 1] (it can't catch the 3).
My expected result is
[3(or 4), 9, 14(or 15), 20]
I want to catch only one of b[3] and b[4] (they hold the same value), and I want the argrelextrema problems described above to go away. The code below succeeds:
scipy.signal.find_peaks(b)
The result is [3, 9, 14, 20].
My code treats local maxima and local minima as pairs, so I want to find the local minima in the same way. Is there a function like scipy.signal.find_peaks for finding local minima?

You could simply apply find_peaks to the negative version of your array:
from scipy.signal import find_peaks
min_idx, _ = find_peaks([-x for x in b])  # find_peaks returns (indices, properties)
Even more convenient when using numpy arrays:
import numpy as np
b = np.array(b)
min_idx, _ = find_peaks(-b)
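Putting both together, a minimal runnable sketch using the b from the question (negating the array turns valleys into peaks):
import numpy as np
from scipy.signal import find_peaks

b = np.array([6, 1, 3, 5, 5, 3, 1, 2, 2, 3, 2, 1, 1, 9, 10, 10, 9, 8, 7, 7, 13, 10])

max_idx, _ = find_peaks(b)    # local maxima  -> [ 3  9 14 20]
min_idx, _ = find_peaks(-b)   # local minima  -> [ 1  6 11 18]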

Find min value and index of the value in a matrix column after column

I have a problem which seems easy but is causing me a lot of headache.
I'm programming in Python (I'm relatively new to it) and I'm looking for an equivalent of MATLAB's max (min) function for a matrix, using numpy.
What I want to do is to get the minimum value and its index in a matrix
To keep it as simple as possible, let's say this is the matrix:
arr2D = np.array([[11, 12, 13, 34],
[14, 15, 16, 3],
[17, 15, 11, 1],
[7, 5, 11, 4],
[1, 12, 4, 4],
[12, 14, 15, -3]])
in matlab I would do:
[local_min, index] = min(arr2D)
and I would get the min value and its index for every column in the matrix.
Trying to repeat the same in python (after looking here and here) with the following code:
print(np.where(arr2D == np.amin(arr2D, axis = 0))) # axis 0 is for columns
I get the following output:
(array([3, 4, 4, 5]), array([1, 0, 2, 3]))
which is not really what I want to get!
The expected output should be:
[1, 4] # Meaning the minimum value is 1 and it is in row 4 for the first column
[5, 3] # Meaning the minimum value is 5 and it is in row 3 for the second column
[4, 4] # Meaning the minimum value is 4 and it is in row 4 for the third column
[-3, 5] # Meaning the minimum value is -3 and it is in row 5 for the last column
I can't use the output I get from:
print(np.where(arr2D == np.amin(arr2D, axis = 0)))
Either I don't understand the output, or this isn't the right way to get the equivalent of MATLAB's max (min).
Could you help me?
UPDATE:
I forgot to say that the matrix holds floats, not integers; I used integers just for the example.
np.amin (an alias of np.min) returns the minimum values along an axis:
np.amin(arr2D, axis=0)
Out:
array([ 1, 5, 4, -3])
np.argmin returns the indices
np.argmin(arr2D, axis=0)
Out:
array([4, 3, 4, 5])
To get the desired output, you can use np.vstack and transpose the result:
np.vstack([np.amin(arr2D, axis=0), np.argmin(arr2D, axis=0)]).T
Out:
array([[ 1, 4],
[ 5, 3],
[ 4, 4],
[-3, 5]])
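One caveat worth noting, since the update says the real matrix is float: np.vstack will upcast the integer row indices to float in the combined array. If that matters, keep the two arrays separate, for example:
vals = np.amin(arr2D, axis=0)
idxs = np.argmin(arr2D, axis=0)
list(zip(vals, idxs))  # pairs of (value, row): (1, 4), (5, 3), (4, 4), (-3, 5)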
Use this code (you can simply make a function out of it); note that it locates the global minimum rather than the per-column minima:
import numpy as np
arr2D = np.array([[11, 12, 13, 34],
                  [14, 15, 16, 3],
                  [17, 15, 11, 1],
                  [7, 5, 11, 4],
                  [1, 12, 4, 4],
                  [12, 14, 15, -3]])
flat = arr2D.flatten()
minIndex = flat.tolist().index(min(flat))  # position of the minimum in the flattened array
# results: convert the flat index back to (row, column)
rowIndex = minIndex // arr2D.shape[1]      # integer-divide by the number of columns
columnIndex = minIndex % arr2D.shape[1]
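For what it's worth (my addition, not part of the answer above), numpy has a helper for exactly this flat-index-to-coordinates conversion:
import numpy as np

row, col = np.unravel_index(np.argmin(arr2D), arr2D.shape)
# arr2D[row, col] is the global minimum: -3, at row 5, column 3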

Change dataset entries based on a boolean mask

As part of a wider workflow, I need to perform the following operation: given 3 datasets with the same shape, one of which contains only boolean values and will be referred to as the "mask".
Essentially I need a function that changes each entry of the first dataset, using values from the second one, if the corresponding entry in the mask equals 1.
The following function does the job
def swap(a, b, c):
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            if c.iloc[i, j] == 1:
                a.iloc[i, j] = b.iloc[i, j]
    return a
but I doubt very much this is efficient, to say the least.
For starters, it would certainly be best not to iterate over all entries, but only over the indices corresponding to 1s in the mask.
Still, in general, are there any pandas/numpy functions or implementations I should be considering? I could not find much at all. Thanks.
You can use np.copyto:
a, b, c = np.random.randint([0, 10, 0], [10, 20, 2], [10, 3]).T  # three length-10 arrays: a in [0, 10), b in [10, 20), mask c in {0, 1}
a
# array([4, 5, 2, 6, 3, 6, 3, 1, 0, 7])
b
# array([19, 10, 17, 17, 18, 13, 15, 17, 14, 16])
c
# array([0, 1, 1, 1, 0, 1, 0, 1, 1, 1])
np.copyto(a, b, where=c.astype(bool))
a
# array([ 4, 10, 17, 17, 3, 13, 3, 17, 14, 16])
Using NumPy arithmetic directly will be faster:
import numpy as np
a = b.values * c.values + a.values * np.logical_not(c.values)
You can use boolean array indexing in numpy. Here is a simple example:
A = np.random.randn(5, 5)
B = np.ones((5, 5))
C = np.random.randint(2, size=(5, 5))  # 0/1 mask (randint(1, ...) would be all zeros)
A[C == 1] = B[C == 1]
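Since the question's swap uses .iloc, the datasets are presumably pandas DataFrames; if you want to stay in pandas, a sketch (my suggestion, not from the answers above) using DataFrame.mask does the same thing:
import numpy as np
import pandas as pd

a = pd.DataFrame(np.zeros((3, 3)))
b = pd.DataFrame(np.ones((3, 3)))
c = pd.DataFrame(np.random.randint(2, size=(3, 3)))  # 0/1 mask

# Wherever the condition is True, take the value from b instead of a.
a = a.mask(c == 1, b)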

How do I quickly decimate a numpy array?

I need a function that decimates a numpy array, removing m in every n elements: for example, remove 1 in 2, or 2 in 3. So an array which is:
[7, 4, 3, 5, 9, 2, 4, 1, 6, 8]
decimated by 1:2 would become:
[7, 3, 9, 4, 6]
I wonder if it is possible to reshape the 1D array of length N into a 2D array of shape (N/2, 2) and then drop the extra dimension?
Ideally, rather than just dropping the decimated samples, I would like to take the maximum value across each set (in this example, each pair) of values. For example:
[7, 5, 9, 4, 8]
Is there a way to find the maximum value across each set rather than just dropping values?
The added challenge is that the point here is to plot the values. Plotting every value takes too long, so I have to reduce the array's size before plotting, and I need to do that quickly; for or while loops would be too slow.
A quick and dirty way is
k,N = 3,18
a = np.random.randint(0,10,N) #[9, 6, 6, 6, 8, 4, 1, 4, 8, 1, 2, 6, 1, 8, 9, 8, 2, 8]
a = a[:-k:k] #[9, 6, 1, 1, 1]
This runs whether or not k divides N, although it always drops the final group of samples; plain a[::k] keeps it, as shown below.
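On the question's own example, plain strided slicing gives exactly the expected output:
import numpy as np

a = np.array([7, 4, 3, 5, 9, 2, 4, 1, 6, 8])
a[::2]  # keep 1 in 2 -> array([7, 3, 9, 4, 6])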
Be wary of simply throwing readings away: significant readings can be lost.
For the task you describe, it is worth using proper decimation. Unfortunately it is not in numpy, but it is in scipy.
The code below gives an example where discarding samples leads to an error.
As you can see, the original data (blue) has a peak, and manual thinning can simply skip it (green).
If you apply decimation from the library, the peak is preserved in the result (orange).
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np
downsampling_factor = 2
t = np.linspace(0, 1, 50)
y = list(np.random.randint(0,10,int(len(t)/2))) + [50] + list(np.random.randint(0,10,int(len(t)/2-1)))
ydem = signal.decimate(y, downsampling_factor)
t_new = np.linspace(0, 1, len(ydem))
manual_decimation = y[:-downsampling_factor:downsampling_factor]
t_manual_decimation = np.linspace(0, 1, len(manual_decimation))
plt.plot(t, y, '.-', t_new, ydem, 'o-', t_manual_decimation, manual_decimation, 'x-')
plt.legend(['data', 'scipy decimate', 'manual decimate'], loc='best')
plt.show()
In general, this is not such a trivial task, so please be careful.
UPD: note that the length of the vector must be greater than 27 (a requirement of decimate's default filter padding).
To find the maximum:
1) k divides N:
k,N = 3,18
a = np.random.randint(0,10,N)
a
# array([0, 6, 6, 3, 7, 0, 9, 2, 3, 2, 5, 4, 2, 6, 9, 6, 3, 2])
a.reshape(-1,k).max(1)
# array([6, 7, 9, 5, 9, 6])
2) k does not divide N:
k,N = 4,21
a = np.random.randint(0,10,N)
a
# array([4, 4, 6, 0, 0, 1, 7, 8, 2, 3, 0, 5, 7, 1, 1, 5, 7, 8, 3, 1, 7])
np.maximum.reduceat(a, np.arange(0,N,k))
# array([6, 8, 5, 7, 8, 7])
Approach 2) always works, but I suspect 1) is faster where applicable.
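A sketch of a third variant (my addition, not from the answer above): pad with -inf so the fast reshape path of 1) also handles lengths that k does not divide:
import numpy as np

def block_max(a, k):
    """Max of each block of k samples; tolerates a trailing partial block."""
    pad = (-len(a)) % k                                   # elements needed to reach a multiple of k
    padded = np.concatenate([a.astype(float), np.full(pad, -np.inf)])
    return padded.reshape(-1, k).max(axis=1)              # -inf padding never wins the max

block_max(np.array([7, 4, 3, 5, 9, 2, 4, 1, 6, 8]), 2)
# -> array([7., 5., 9., 4., 8.])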

`np.average()` format option

I'm trying to understand some Python code, and one specific line has me a bit puzzled:
mean = np.average(data[:,index])
I understand that this is an average calculation of the data declared earlier above, but what does [:, index] indicate?
I apologise if this question is a duplicate, but please link a solution before you vote it down. This is my first day with Python, so please excuse my ignorance. I appreciate any kind advice!
Below is part of the original code:
data = np.genfromtxt(args.inputfile)
def doBlocking(data, index):
    ndata = data.shape[0]
    ncols = data.shape[1] - 1
    # things unimportant
    mean = np.average(data[:, index])
    # more unimportance
This is so-called slicing. In your case, the average of a specific column (the one whose index is stored in the variable named index) of a 2-dimensional array is calculated.
Here data is a two-dimensional numpy array. NumPy supports slicing similar to MATLAB's:
In [1]: import numpy as np
In [2]: data = np.arange(15)
In [3]: data
Out[3]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
In [4]: data = data.reshape([5,3])
In [5]: data
Out[5]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [6]: data[:, 1]
Out[6]: array([ 1, 4, 7, 10, 13])
As you can see, it selects the second column.
Your code above computes the mean of column index. It basically says "take every row of data, restricted to column index, and compute the mean".
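As a small extension (my addition), np.average also takes an axis argument, so you can get every column mean in one call:
import numpy as np

data = np.arange(15).reshape(5, 3)
np.average(data[:, 1])    # mean of the second column -> 7.0
np.average(data, axis=0)  # all column means at once  -> array([6., 7., 8.])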

Set numpy array elements to zero if they are above a specific threshold

Say I have a numpy array consisting of 10 elements, for example:
a = np.array([2, 23, 15, 7, 9, 11, 17, 19, 5, 3])
Now I want to efficiently set all a values higher than 10 to 0, so I'll get:
[2, 0, 0, 7, 9, 0, 0, 0, 5, 3]
Currently I use a for loop, which is very slow:
# Zero values above the threshold value.
def flat_values(sig, tv):
    """
    :param sig: signal.
    :param tv: threshold value.
    :return: signal with values above tv set to zero.
    """
    for i in np.arange(np.size(sig)):
        if sig[i] > tv:
            sig[i] = 0
    return sig
How can I achieve that in the most efficient way, having in mind big arrays of, say, 10^6 elements?
In [7]: a = np.array([2, 23, 15, 7, 9, 11, 17, 19, 5, 3])
In [8]: a[a > 10] = 0
In [9]: a
Out[9]: array([2, 0, 0, 7, 9, 0, 0, 0, 5, 3])
Generally, list comprehensions are faster than for loops in Python (because Python knows it doesn't need to watch for many of the things that can happen in a regular for loop):
a = [0 if a_ > thresh else a_ for a_ in a]
but, as @unutbu correctly pointed out, numpy allows boolean indexing: an element-wise comparison gives you a boolean index array, so:
super_threshold_indices = a > thresh
a[super_threshold_indices] = 0
would be even faster.
Generally, when applying methods on vectors of data, have a look at numpy.ufuncs, which often perform much better than python functions that you map using any native mechanism.
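As one small illustration of that vectorized style (my sketch, not necessarily the fastest option): multiplying by the boolean mask zeroes everything above the threshold in a single expression:
import numpy as np

a = np.array([2, 23, 15, 7, 9, 11, 17, 19, 5, 3])
a * (a <= 10)  # -> array([2, 0, 0, 7, 9, 0, 0, 0, 5, 3])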
If you don't want to change your original array:
In [2]: a = np.array([2, 23, 15, 7, 9, 11, 17, 19, 5, 3])
In [3]: b = np.where(a > 10, 0, a)
In [4]: b
Out[4]: array([2, 0, 0, 7, 9, 0, 0, 0, 5, 3])
In [5]: a
Out[5]: array([ 2, 23, 15, 7, 9, 11, 17, 19, 5, 3])
From the Neural Networks from Scratch series by sentdex on YouTube, np.maximum(0, [your array]) is used to turn all values less than 0 into 0.
For your question I tried np.minimum(10, [your array]) and it seemed to work incredibly fast. I even ran it on an array of 10,000,000 elements (uniform values generated with 50 * np.random.rand(10000000)), and it finished in 0.039571 seconds. I hope this is fast enough.
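One caveat on this approach: np.minimum clips values above the threshold down to the threshold rather than setting them to 0, so it only matches the question if clipping is acceptable. A quick comparison:
import numpy as np

a = np.array([2, 23, 15, 7, 9, 11, 17, 19, 5, 3])
np.minimum(10, a)       # clips:  array([ 2, 10, 10,  7,  9, 10, 10, 10,  5,  3])
np.where(a > 10, 0, a)  # zeroes: array([2, 0, 0, 7, 9, 0, 0, 0, 5, 3])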
