Finding a 'spike' or drop in a dataset programmatically


I have a dataset that looks like this:
[0.523,0.445,0.558,0.492,0.440,0.502,0.742,0.802,0.821,0.811,0.804,0.860]
As you can see, there is a 'spike' in the values after 0.502. Is there a way to find this programmatically in Python?
I'm already using Numpy and Scipy; I'm sure those libraries contain something like this. I just don't know what this procedure is called.
An added bonus would be the ability to adjust the 'sensitivity' of detecting a spike or drop, since the dataset can be quite noisy. A spike would mean a sustained increase in the moving average of the values, and a drop would mean a sustained decrease.
The range of each value is [-1,1]. The number of values in the array would be 50-100.

I would recommend using the diff function of numpy:
import numpy
a = [0.523,0.445,0.558,0.492,0.440,0.502,0.742,0.802,0.821,0.811,0.804,0.860]
numpy.diff(a)
This would give you:
array([-0.078, 0.113, -0.066, -0.052, 0.062, 0.24 , 0.06 , 0.019,
-0.01 , -0.007, 0.056])
A positive number is a jump up; a negative number is a jump down.
If you just want to find where there are spikes, up or down, try this:
abs(numpy.diff(a)) > 0.2
Adjusting the 0.2 up or down would make it less or more sensitive, respectively. This would give:
array([False, False, False, False, False, True, False, False, False,
False, False], dtype=bool)
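To turn the boolean mask into positions, and to cope with the noise mentioned in the question, one option is to smooth the data with a moving average before differencing. The following is a minimal sketch rather than part of the answer above: the function name find_jumps and its window parameter are hypothetical, and a plain unweighted moving average is assumed.

```python
import numpy as np

def find_jumps(data, threshold=0.2, window=1):
    """Return indices where the (optionally smoothed) series steps by more than threshold."""
    values = np.asarray(data, dtype=float)
    if window > 1:
        # Unweighted moving average (an assumption); mode="valid" keeps only
        # full windows, so the smoothed series is shorter by window - 1 samples.
        kernel = np.ones(window) / window
        values = np.convolve(values, kernel, mode="valid")
    steps = np.diff(values)
    # +1 so that each index points at the value right after the jump.
    return np.flatnonzero(np.abs(steps) > threshold) + 1

a = [0.523, 0.445, 0.558, 0.492, 0.440, 0.502,
     0.742, 0.802, 0.821, 0.811, 0.804, 0.860]
print(find_jumps(a, threshold=0.2))  # position of the value 0.742
```

With window greater than 1 the returned indices refer to the smoothed series, and a sustained rise is spread over several smaller steps, so the threshold usually needs to be lowered accordingly.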

It's fairly simple to find where the difference between two adjacent values in a sequence reaches a threshold:
def findSpikes(data, threshold=0.2):
    prev = None
    for i, v in enumerate(data):
        if prev is None:
            # First value: nothing to compare against yet.
            prev = v
            continue
        delta = abs(v - prev)
        if delta >= threshold:
            print("Found spike at index %d (value %f)" % (i, v))
        prev = v
For your sample data, it will print:
Found spike at index 6 (value 0.742000)
It's easy to convert the function to a generator; change the print line to yield i, v or something similar.
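The generator variant described above could look like the following sketch. The name iter_spikes is hypothetical; the logic is the same adjacent-difference check as in findSpikes.

```python
def iter_spikes(data, threshold=0.2):
    """Yield (index, value) pairs where a value differs from its predecessor by >= threshold."""
    prev = None
    for i, v in enumerate(data):
        if prev is not None and abs(v - prev) >= threshold:
            yield i, v
        prev = v

a = [0.523, 0.445, 0.558, 0.492, 0.440, 0.502,
     0.742, 0.802, 0.821, 0.811, 0.804, 0.860]
spikes = list(iter_spikes(a))
print(spikes)
```

Because the caller receives the pairs instead of printed text, the results can be filtered, counted, or plotted without changing the function.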

Related

Huge Numpy list - How to get n digit?

Currently I have a huge database of randomly generated numbers in NumPy.
array([62051180209, 87882444506, 49821030805, ..., 54840854303,
21222836608, 24070750502])
Now I want to check how many numbers have, for example, the digits 05 or 15 at positions 3 and 4 (e.g. 62-05-1180209, like the first number in my list).
I would also like to check how many numbers have other digits at other positions, such as positions 5 and 6. The first number in my list has 11 there, for example.

Operations on strings take a lot more CPU and RAM than operations on integers. It is a lot faster to use integer math instead:
import numpy as np
def get_matches(array, start, end, value):
    return np.remainder(array // 10**start, 10**(end-start)) == value
Explained:
array // 10**start drops the last start digits, using integer division.
np.remainder drops everything except the end-start trailing digits.
== value checks whether the result matches. Note that to check if two digits are 05, value should be just 5.
Note also that start and end count digit positions from the right-hand end of the number.
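As a worked example of the integer approach, the sketch below applies it to the six numbers visible in the question. It assumes every number has exactly 11 digits, so the 3rd and 4th digits from the left sit at offsets 8 and 7 from the right; the variable names and the use of np.isin for several candidate values are illustrative choices, not part of the answer.

```python
import numpy as np

def get_matches(array, start, end, value):
    # Integer-math helper from the answer; start/end count digits from the right.
    return np.remainder(array // 10**start, 10**(end - start)) == value

numbers = np.array([62051180209, 87882444506, 49821030805,
                    54840854303, 21222836608, 24070750502], dtype=np.int64)

# Assumption: all numbers have 11 digits, so the 3rd and 4th digits
# from the left are selected with start=7, end=9.
mask = get_matches(numbers, start=7, end=9, value=5)
count = int(np.count_nonzero(mask))

# Checking several candidate values at once (05 or 15):
digits = np.remainder(numbers // 10**7, 10**2)
count_any = int(np.count_nonzero(np.isin(digits, [5, 15])))
print(count, count_any)
```

If the numbers can have different lengths, counting from the right no longer lines up with positions counted from the left, and the offsets would have to be computed per number.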

As Random Davis already suggested, this might work:
import numpy as np
mylist = np.array([62011180209, 87882444506, 49821030805, 54840854303,21222836608, 24070750502])
def get_matches(mylist, start, end, value):
    value = str(value)
    return [str(i)[start:end+1]==value for i in mylist]
get_matches(mylist, start=3, end=4, value=11)
For that list this delivers the following result:
[True, False, False, False, False, False]
If several candidate values should be considered, a naive approach is to rewrite the function above as follows:
def get_matches_multichoice(mylist, start, end, valuelist):
    valuelist = [str(value) for value in valuelist]
    return [str(i)[start:end+1] in valuelist for i in mylist]
Calling it on the example data above:
print(get_matches_multichoice(mylist, start=3, end=5, valuelist=np.array([111, 824, 408])))
then returns:
[True, True, False, True, False, False]

Python - Detect inflection point in list and replace

The title is actually misleading, but I didn't know how to describe my problem in a short sentence. I don't care about inflection point, but I care about the point where the values switch from x > 1 to x < 1.
Consider the following array:
a = np.array([0.683, 0.819, 0.678, 1.189, 1.465, 0.93 , 0.903, 1.321, 1.321, 0.785, 0.875])
# do something... and here's what I want:
np.array([True, False, False, False, False, True, True, False, False, True, True])
Here are the rules:
First point in array is the starting point, and is always marked True
For a value to be marked True, it must be smaller than 1 (x < 1).
However, even if a value is smaller than 1, if it's between the first value smaller than 1 and the first value greater than 1, mark it as False.
In case my explanation doesn't make sense, here's a picture of what I want to do: [image not preserved]
The decimal values in the array a are just ratios: current point / previous point. How can I do this in Python?

The code below does what you asked. Unfortunately, it doesn't use a list comprehension.
The first thing I did was to write a function that finds the indexes of the first value below 1 and the first value above 1.
import numpy as np
a = np.array([0.683, 0.819, 0.678, 1.189, 1.465, 0.93 , 0.903, 1.321, 1.321, 0.785, 0.875])
### if a number is below 1 but lies between the first value below 1 and the first value above 1,
### then it's False
## find the indexes of the first value below 1 and the first value above 1
def find_indx(a):
    first_min = 0
    for i in range(len(a)):
        if a[i] < 1:
            first_min = i
            break
    first_max = 0
    for i in range(len(a)):
        if a[i] > 1:
            first_max = i
            break
    return [first_min, first_max]
Using this function, you can set to False the values that are below 1 but lie in the interval between the first value below 1 and the first value above 1.
The two indexes are stored in "false_range".
Once you have that, it's quite easy. The first point is always True.
Values below 1 whose indexes fall inside "false_range" become False.
For points outside "false_range", the result depends only on whether the value is above or below 1.
false_range = find_indx(a)
truth_list = []
for i in range(len(a)):
    ## the first value is always True
    if i == 0:
        truth_list.append(True)
    else:
        ## if the index lies inside false_range and
        ## the value is below 1, assign False
        if i > false_range[0] and i < false_range[1] and a[i] < 1:
            truth_list.append(False)
        ## in all other cases it depends only on whether the value is below or above 1
        elif a[i] > 1:
            truth_list.append(False)
        elif a[i] < 1:
            truth_list.append(True)
print(truth_list)
[True, False, False, False, False, True, True, False, False, True, True]
The printed list corresponds to the one you gave, but please test this solution before using it.
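The same rules can also be expressed without explicit loops. The following is a sketch under stated assumptions rather than the answer's own method: the function name mark_below_one is hypothetical, the input is assumed to be non-empty, and values exactly equal to 1 (which the loop above does not handle) are marked False.

```python
import numpy as np

def mark_below_one(a):
    """Vectorized version of the rules: True for values < 1, except inside the initial 'false range'."""
    a = np.asarray(a, dtype=float)
    below = a < 1
    above = a > 1
    result = below.copy()  # values equal to 1 end up False (assumption)
    if below.any() and above.any():
        first_min = int(np.argmax(below))  # index of the first value below 1
        first_max = int(np.argmax(above))  # index of the first value above 1
        # Values strictly between those two indexes are forced to False.
        result[first_min + 1:first_max] = False
    result[0] = True  # the starting point is always True
    return result

a = np.array([0.683, 0.819, 0.678, 1.189, 1.465, 0.93,
              0.903, 1.321, 1.321, 0.785, 0.875])
print(mark_below_one(a))
```

np.argmax on a boolean array returns the position of the first True, which is why the any() checks are needed: without them an all-False mask would also report index 0.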

Find values in list which lies within certain range with tolerance in python?

I am trying to find the values that lie within a certain range. I know that np.isclose will do that with a certain tolerance. However, it returns the result as booleans. I would like to do the same but print only the matching values instead of True or False.
mylist = (1.5,3.1251,5.8741,9.213,7.858,2.1242,8.18956,2.5452,4.745,2.1254)
threshold = 3.5
result = np.isclose(mylist, threshold, rtol = 1e-05)
print(result)
Instead of the print below:
result = array([False, True, False, False, False, True, False, True, False, True])
I would like it to print the following:
result = array([3.1251, 2.1242, 2.5452, 2.1254])
P.S The result is just an example not the real result.
Edit 1
I managed to change the code to be the following:
def check_x_axis_list(comp_list, target_value, tolerance):
    x_axis_result = []
    for x in range(0, len(comp_list)):
        curent_value = comp_list[x]
        if curent_value >= target_value - tolerance and curent_value <= target_value + tolerance:
            x_axis_result.append(comp_list[x])
    return x_axis_result
and for the tolerance I tried to do the following:
tol_max = max(mylist)
tol_min = min(mylist)
tol = (tol_max - tol_min) / 2
However, I keep on having this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Edit 2
I changed the code to the following, but I am still getting the same error:
result = [val for val in comp_list if abs(val - target_value) < tolerance]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Edit 3
[Images not preserved: equation; PSD plot with peaks; equation plot parameters]
I am trying to obtain the values w1 and w2 from the PSD plot. The code above is a way to eliminate the other values: the target value is the x-axis position of the peak marked with an asterisk, and the code searches for the values that are close to that x-axis value within a certain tolerance. Instead of boolean results, I would like the code to give me the values that are close to the peak. The intersection values are listed below with the peak location.
Amplitude values: [0.0004046159973339667, 0.0003064079718686719]
Current Amplitude value: 0.0004046159973339667
Current half power amplitude value: 0.00028610671549140587
Current Amplitude value: 0.0003064079718686719
Current half power amplitude value: 0.00021666315471795475
Intersection Values: [array([11.6705359 , 13.66919925, 21.84434139, 22.53181091, 27.88789357,28.17911233]), array([11.43294083, 14.12791966, 21.28003529, 23.43686901, 27.50441635,28.79179351])]
Edit 4
The following code worked:
def check_x_axis_list(comp_list, target_value, tolerance):
    x_axis_result = []
    for x in range(0, len(comp_list)):
        if comp_list[x] >= (target_value - tolerance) and comp_list[x] <= (target_value + tolerance):
            x_axis_result.append(comp_list[x])
    return x_axis_result
and
def check_x_axis_list(comp_list, target_value, tolerance):
    x_axis_result = [val for val in comp_list if abs(val - target_value) < tolerance]
    return x_axis_result
However, my only struggle is how to choose the tolerance value so that I get only the values that are close to the x-axis peak value, because for the peak at 28 it prints the intersections of both the 2nd and 3rd peaks instead of only the 3rd peak's intersections. See the issue below:
Intersection: [11.6705359 , 13.66919925, 21.84434139, 22.53181091, 27.88789357, 28.17911233]
Current x-axis Peak Amplitude value: 13.0
28.17911232801107
11.670535903774892
8.254288212118087
[11.670535903774892, 13.66919924780022]
Intersection: [11.43294083, 14.12791966, 21.28003529, 23.43686901, 27.50441635, 28.79179351]
Current x-axis Peak Amplitude value: 28.0
28.791793514060206
11.432940831732218
8.679426341163994
[21.280035294406446, 23.436869009131495, 27.504416349364988, 28.791793514060206]

You can try this:
print(np.array(mylist)[result])
Output:
array([3.1251, 2.1242, 2.5452, 2.1254])
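The boolean mask can also be built with an explicit absolute tolerance, which matches the "target plus or minus tolerance" check from the question. This is a minimal sketch: the tolerance of 1.5 and the names target, tolerance and close_values are chosen here purely for illustration.

```python
import numpy as np

mylist = (1.5, 3.1251, 5.8741, 9.213, 7.858, 2.1242, 8.18956, 2.5452, 4.745, 2.1254)
target = 3.5
tolerance = 1.5  # hypothetical absolute tolerance

values = np.asarray(mylist)
# rtol=0 switches off the relative part, so only |value - target| <= tolerance counts.
mask = np.isclose(values, target, rtol=0, atol=tolerance)
close_values = values[mask]
print(close_values)
```

Indexing with the mask only works on a NumPy array, which is why the tuple is converted with np.asarray first.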

representing recurrence by chaining iterables in Python

I'm solving a problem where I have levels in a binary tree. I'm given a level, then a position.
The second level is [True, False].
The third level is [True, False, False, True].
The fourth [True, False, False, True, False, True, True, False], and so on.
To solve the problem, I may need to calculate this sequence out many times to get the element at a given position at that level.
For the initial array pattern = [True, False]
I want to do something like:
for _ in range(level):
    pattern = pattern + [not elem for elem in pattern]
Obviously, for large limits this is not working well for me. My attempts at a solution using the chain method from itertools have so far been fruitless. What is a memory-efficient way to express this in Python?
EDIT
This did what I was looking for, but it still did not meet the runtime requirements.
from itertools import chain, tee
for _ in range(level):
    lsb, msb = tee(pattern)
    pattern = chain(lsb, map(lambda x: not x, msb))
Ultimately, the solution involved finding the global index of the target element and determining how many 'right' paths were taken from the root (base case = 1) to reach it, observing that the state does not change from parent to child if a left path is taken, but flips if a right path is taken. It appears that most of the clever solutions are some spin on this fact.

What is a memory efficient way to express this in Python?
Since the approach you're using doubles the memory needed on each iteration, it won't scale easily. It may be better to find an analytic approach.
The generator below takes O(1) time to generate each element. And, crucially, calculating the next value depends only on the index and the previous value.
def gen():
    yield True
    n, prev = 1, 1
    while True:
        x = n ^ (n - 1)
        y = x ^ (x >> 1)
        if y.bit_length() % 2:
            z = 1 - prev
        else:
            z = prev
        yield bool(z)
        prev = z
        n += 1
A recurrence relation like this allows elements to be computed in constant memory. Implementing the idea with Cython or PyPy should improve performance significantly.

Trying to generate elements one by one is a bad idea, and saving them all is even worse. You only need one element's value, and you can compute it directly.
Suppose the element you want is at index 2**i + k, where k < 2**i. Then this element is the negation of the element at index k, and the element at index k can be computed the same way. You end up negating element 0 once for each set bit in your desired index's binary representation. If there are an even number of set bits, the value is True. Otherwise, the value is False.
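The set-bit argument above translates into a one-line check. The sketch below assumes a zero-based index within the level; the function name element_at is hypothetical.

```python
def element_at(index):
    """Return the pattern value at a zero-based index: True when the index has an even number of set bits."""
    return bin(index).count("1") % 2 == 0

# First eight elements, matching the fourth level given in the question.
first_eight = [element_at(i) for i in range(8)]
print(first_eight)
```

This needs no stored sequence at all, so the cost depends only on the number of bits in the index, not on the size of the level.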

How to apply function to only certain array elements?

I have an array x and I want to apply a function f to every item in the matrix that meets some condition. Does Numpy offer a mechanism to make this easy?
Here's an example. My matrix x is supposed to contain only elements in the exclusive range (0, 1). However, due to rounding errors, some elements can be equal to 0 or 1. For every element in x that is exactly 0 I want to add epsilon and for every element that is exactly 1 I want to subtract epsilon.
Edit: (This edit was made after I had accepted askewchan's answer.) Another way to do this is to use numpy.clip.

You can do this:
import numpy as np
a = np.array([0, .1, .5, 1])
epsilon = 1e-5
a[a == 0] += epsilon
a[a == 1] -= epsilon
The reason this works is that a==0 returns a boolean array, just like what Валера Горбунов referred to in their answer:
In : a==0
Out: array([True, False, False, False], dtype=bool)
You then use that boolean array as an index into a, which selects only the elements where the mask is True. There's a lot you can do with this; see http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
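If the original array should stay untouched, the same adjustment can be written so that it returns a new array. The sketch below shows two options, np.where and the np.clip approach mentioned in the question's edit; the variable names are illustrative, and np.clip gives the same result only under the question's assumption that all values already lie in [0, 1].

```python
import numpy as np

x = np.array([0.0, 0.1, 0.5, 1.0])
epsilon = 1e-5

# Option 1: nested np.where, touching only exact 0s and exact 1s.
adjusted = np.where(x == 0, x + epsilon, np.where(x == 1, x - epsilon, x))

# Option 2: np.clip, which also pulls in anything closer than epsilon to the
# bounds (a slightly different rule, equivalent here for this input).
clipped = np.clip(x, epsilon, 1 - epsilon)

print(adjusted)
print(clipped)
```

Both calls leave x unchanged, unlike the in-place boolean-index assignment.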

Sorry this isn't more concrete, but you could create a Boolean array that has a TRUE value for every position that meets your condition and FALSE for those that don't.
For something like [0, 1, 0, 0] when testing for 1 you will get an array [FALSE, TRUE, FALSE, FALSE]. In which case you can do [0, 1, 0, 0] - (epsilon)[FALSE, TRUE, FALSE, FALSE] and leave the 0 values unaffected.
Boolean Array Example
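The mask arithmetic described in this answer can be sketched in NumPy as follows. It relies on booleans behaving as 0 and 1 in arithmetic; the variable names are illustrative, and the added term for exact zeros goes slightly beyond the answer's example in order to cover the question's other case.

```python
import numpy as np

x = np.array([0.0, 1.0, 0.0, 0.0])
epsilon = 1e-5

is_one = (x == 1)   # array([False,  True, False, False])
is_zero = (x == 0)

# True counts as 1 and False as 0, so epsilon is applied only where the mask is True.
shifted = x - epsilon * is_one + epsilon * is_zero
print(shifted)
```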

You can use map() as documented at http://docs.python.org/2/tutorial/datastructures.html#functional-programming-tools:
def applyEpsilon(value):
    myEpsilon = 0.001
    if value == 0:
        return myEpsilon
    elif value == 1:
        return 1 - myEpsilon
    return value
inputList = [0, 0.25, 0.5, 0.75, 0.99, 1]
# In Python 3, map() returns an iterator, so wrap it in list() to print the values.
print(list(map(applyEpsilon, inputList)))
Yields:
[0.001, 0.25, 0.5, 0.75, 0.99, 0.999]
