Getting the time frequency of numbers in an array in Python?

Let's say I have a time series represented in a numpy array, where every 3 seconds, I get a data point. It looks something like this (but with many more data points):
z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
I want to find a threshold where, on average, every y seconds a data point will surpass that threshold (x).
Maybe my question would be easier to understand in this sense: let's say I've gathered some data on how many ants are leaving their mound every 3 seconds. Using this data, I want to create a threshold (x) so that in the future if the number of ants leaving at one time exceeds x, my beeper will go off. Now this is the key part - I want my beeper to go off roughly every 4 seconds. I'd like to use Python to figure out what x should be given some y amount of time based on an array of data I've already collected.
Is there a way to do this in Python?

I think it is easiest to first think about this in terms of statistics. What you are really asking for is the 100*(1 - m/n)th percentile, that is, the value that the data falls below (1 - m/n) of the time, where m is your sampling period and n is your desired trigger interval. In your example it would be the 100*(1 - 3/4)th percentile, i.e. the 25th percentile: the value that is exceeded 75% of the time.
So to calculate that on your data you can use scipy.stats.scoreatpercentile:
>>> import numpy as np
>>> import scipy.stats
>>> z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
>>> m = 3.
>>> n = 4.
>>> x = scipy.stats.scoreatpercentile(z, 100*(1 - m/n))
>>> print(x)
1.05
>>> print((z > x).sum()/len(z))  # test, should be about 0.75
0.7142857142857143
Of course this estimate will get better the more values you have.
Edit: Originally I had the percentile backwards. It should be 1-m/n, but I originally had just m/n.
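If you would rather avoid the SciPy dependency, plain NumPy's np.percentile computes the same quantity (with the default linear interpolation it matches scoreatpercentile); a minimal sketch:

```python
import numpy as np

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
m, n = 3.0, 4.0  # sampling period and desired trigger interval, in seconds

# the value exceeded roughly m/n (here 75%) of the time is the
# 100*(1 - m/n)th percentile, i.e. the 25th percentile
x = np.percentile(z, 100 * (1 - m / n))
print(x)  # -> 1.05
```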

Assuming that one second resolution for the trigger is ok...
import numpy as np
z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3
Divide each sample by the period (in seconds) and repeat it to create an array of one-second data; this assumes the count is spread evenly across each sample's 3 seconds.
y = np.array([[n]*period for n in z / period])
y = y.flatten()
Reshape the data into non-overlapping four-second periods (lossy: any leftover seconds at the end are dropped)
h = len(y) % 4
x = y[:len(y) - h]  # note y[:-h] would empty the array when h == 0
w = x.reshape((len(x) // 4, 4))
Find the sum of each four-second period and take the minimum of these sums
v = w.sum(axis=-1)
# use the min value of these sums
threshold = v.min()  # ~1.533
This gives a coarse threshold from non-overlapping four-second chunks; however, it ignores four-second windows that straddle chunk boundaries, even though z represents 42 seconds of data.
Use overlapping, rolling windows to find the minimum value of the sums of each four-second window (lossless)
def rolling(a, window, step=1):
    """
    Examples
    --------
    >>> a = np.arange(10)
    >>> print(rolling(a, 3))
    [[0 1 2]
     [1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]
     [5 6 7]
     [6 7 8]
     [7 8 9]]
    >>> print(rolling(a, 4))
    [[0 1 2 3]
     [1 2 3 4]
     [2 3 4 5]
     [3 4 5 6]
     [4 5 6 7]
     [5 6 7 8]
     [6 7 8 9]]
    >>> print(rolling(a, 4, 2))
    [[0 1 2 3]
     [2 3 4 5]
     [4 5 6 7]
     [6 7 8 9]]

    from http://stackoverflow.com/a/12498122/2823755
    """
    shape = ((a.size - window) // step + 1, window)
    strides = (a.itemsize * step, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
t = rolling(y, 4)
s = t.sum(axis = -1)
threshold = s.min() # 1.3999999
This will produce 8 triggers for z.
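On NumPy 1.20 or later, the same rolling sum can be written without the as_strided trick by using np.lib.stride_tricks.sliding_window_view; a sketch of the same computation (not the original answer's code):

```python
import numpy as np

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3

# spread each 3-second sample evenly over one-second bins
y = np.repeat(z / period, period)

# sum every overlapping 4-second window, then take the minimum
windows = np.lib.stride_tricks.sliding_window_view(y, 4)
threshold = windows.sum(axis=-1).min()
print(threshold)  # ~1.4
```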

Related

Index array based on value limits of another

Let's say I have an array (or even a list) that looks like:
tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
And then I have another array of distance values:
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
Now, say I want to set a distance threshold and perform an operation on the corresponding elements of tmp_data. For this example, let's take the max value and set the threshold distance to 100. For every group of elements spanning 100 distance units, I want to replace all elements in that group with the group's maximum. For example, I would want the final output to be
max_tmp_data_100 = [2,2,2,5,5,5,8,8,8,9]
This is because the first 3 elements in dist_data are below 100, so we take the first three elements of tmp_data (0, 1, 2), find their maximum, and replace all of them with that value, 2.
Then, the next set of data that would be below the next 100 value would be
tmp_dist_array_100 = [109.375 140.625 171.875]
tmp_data_100 = [3,4,5]
max_tmp_data_100 = [5,5,5]
(append to [2,2,2])
I have come up with the following:
# Initialize
final_res = 100  # the distance threshold
final_array = []
d_array = []
idx = 1
for i in range(0, 10):
    if dist_data[i] < idx * final_res:
        d_array.append(tmp_data[i])
    elif dist_data[i] > idx * final_res:
        # Now get the values
        max_val = np.amax(d_array)
        new_array = np.ones(len(d_array)) * max_val
        final_array.extend(new_array)
        idx = idx + 1
But the outcome is
[2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0]
When it should be [2,2,2,5,5,5,8,8,8,9]
With numpy:
import numpy as np
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
cut = 100
a = np.array(dist_data)
vals = np.searchsorted(a, np.r_[cut:a.max() + cut:cut]) - 1
print(vals[(a/cut).astype(int)])
It gives:
[2 2 2 5 5 5 9 9 9 9]
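If dist_data is sorted (as it is here), another pure-NumPy option is np.maximum.reduceat, which takes the per-bucket maximum of tmp_data directly; a sketch giving the same bucketed result as above:

```python
import numpy as np

dist_data = np.array([15.625, 46.875, 78.125, 109.375, 140.625,
                      171.875, 203.125, 234.375, 265.625, 296.875])
tmp_data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
cut = 100

bucket = (dist_data // cut).astype(int)             # 100-unit bucket of each element
uniq = np.unique(bucket)
starts = np.searchsorted(bucket, uniq)              # first index of each bucket
bucket_max = np.maximum.reduceat(tmp_data, starts)  # max of tmp_data per bucket
result = bucket_max[np.searchsorted(uniq, bucket)]  # broadcast back per element
print(result)  # [2 2 2 5 5 5 9 9 9 9]
```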
You can do it with groupby:
from itertools import groupby
dist_data = [ 15.625, 46.875 ,78.125 ,109.375 ,140.625 ,171.875 ,203.125 ,234.375, 265.625 ,296.875]
tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
result = []
index_list = [[dist_data.index(i) for i in l]
              for k, l in groupby(dist_data, key=lambda x: x//100)]
for i in tmp_data:
    for lst in index_list:
        if i in lst:
            result.append(max(lst))
print(result)
# [2, 2, 2, 5, 5, 5, 9, 9, 9, 9]
Note that with this grouping the last four elements all fall under the same 100-unit threshold, so the maximum over them is 9.

Fastest way to fill numpy array with new arrays from function

I have a function f(a) that takes one entry from a testarray and returns an array with 5 values:
f(testarray[0])
#Output: array([[0, 1, 5, 3, 2]])
Since f(testarray[0]) is the result of an experiment, I want to run this function f for each entry of the testarray and store each result in a new NumPy array. I always thought this would be quite simple by just taking an empty NumPy array with the length of the testarray and save the results the following way:
N = 1000 #Number of entries of the testarray
test_result = np.zeros([N, 5], dtype=int)
for i in testarray:
    test_result[i] = f(i)
When I run this, I don't receive any error message but nonsense results (half of the test_result is empty while the rest is filled with implausible values). Since f() works perfectly for a single entry of the testarray I suppose that something of the way of how I save the results in the test_result is wrong. What am I missing here?
(I know that I could save the results as list and then append an empty list, but this method is too slow for the large number of times I want to run the function).
Since you don't seem to understand indexing, stick with this approach
alist = [f(i) for i in testarray]
arr = np.array(alist)
I could show how to use row indices and testarray values together, but that requires more explanation.
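For completeness, one way to use the row index and the testarray value together is enumerate, which keeps the position (where to store) separate from the value (what to pass to f). A sketch, where f is a stand-in assumed to return 5 values per input:

```python
import numpy as np

def f(x):
    # stand-in for the experiment: any function returning 5 values
    return np.array([x * i for i in np.arange(1, 6)])

testarray = np.array([5, 6, 7, 3, 1])
test_result = np.zeros((len(testarray), 5), dtype=int)

# row: position in testarray; val: the value passed to f
for row, val in enumerate(testarray):
    test_result[row] = f(val)

print(test_result[0])  # [ 5 10 15 20 25]
```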
Your problem can be reproduced by the following small example:
testarray = np.array([5, 6, 7, 3, 1])

def f(x):
    return np.array([x * i for i in np.arange(1, 6)])

f(testarray[0])
# [ 5 10 15 20 25]

test_result = np.zeros([len(testarray), 5], dtype=int)  # len(testarray) or testarray.shape[0]
So, as hpaulj mentioned in the comments, you must be careful how to use indexing:
for i in range(len(testarray)):
    test_result[i] = f(testarray[i])
# [[ 5 10 15 20 25]
# [ 6 12 18 24 30]
# [ 7 14 21 28 35]
# [ 3 6 9 12 15]
# [ 1 2 3 4 5]]
There is another situation, where testarray is an index array containing shuffled integers from 0 to N-1 that is used to fill the zero array test_result. For that condition we can create a reproducible example as:
testarray = np.array([4, 3, 0, 1, 2])

def f(x):
    return np.array([x * i for i in np.arange(1, 6)])

f(testarray[0])
# [ 4  8 12 16 20]

test_result = np.zeros([len(testarray), 5], dtype=int)
So, using your loop will get the following result:
for i in testarray:
    test_result[i] = f(i)
# [[ 0 0 0 0 0]
# [ 1 2 3 4 5]
# [ 2 4 6 8 10]
# [ 3 6 9 12 15]
# [ 4 8 12 16 20]]
As can be seen from this loop, if the index array does not contain every integer from 0 to N-1, some rows of the zero array are left zero (unchanged):
testarray = np.array([4, 2, 4, 1, 2])
for i in testarray:
    test_result[i] = f(i)
# [[ 0 0 0 0 0] # <--
# [ 1 2 3 4 5]
# [ 2 4 6 8 10]
# [ 0 0 0 0 0] # <--
# [ 4 8 12 16 20]]

Take input in a function, divide the array in equal parts and print child array with highest value Python 3.6

I am trying to write a function which will take array as an input.
The array will always be divisible by 4. The array needs to be split into 4 parts equally.
Let's call these 4 arrays parts "One" "Two" "Three" "Four".
The output should be the name of the part that has the maximum difference between any two of its elements:
e.g
ONE [2 ,-3 , 3]
TWO [1 ,10, 8]
THREE [2 ,5, 13]
FOUR [-5, 3 ,-18]
The output should be "Four" because 3 - (-18) = 21, which is the maximum difference.
import numpy as np

def solution(T):
    arr = T
    newarr = np.array_split(arr, 4)
    print('[%s]' % ', '.join(map(str, newarr)))
The code below prints [ -5   3 -18], the sub-array with the largest difference between its largest and smallest numbers.
import numpy as np

def solution(T):
    arr = T
    newarr = np.array_split(arr, 4)
    arrayIdx = np.argmax(np.amax(newarr, axis=1) - np.amin(newarr, axis=1))
    return newarr[arrayIdx]

array = [2, -3, 3, 1, 10, 8, 2, 5, 13, -5, 3, -18]
print(solution(array))
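Since the question actually asks for the part's name ("One" ... "Four") rather than the sub-array itself, a small variant returns the name; np.ptp gives each part's max minus min. The names list is my own addition, assuming that labeling scheme:

```python
import numpy as np

def solution_name(T):
    names = ["One", "Two", "Three", "Four"]
    parts = np.array_split(np.asarray(T), 4)
    # np.ptp(p) == p.max() - p.min(): the largest difference within the part
    spreads = [np.ptp(p) for p in parts]
    return names[int(np.argmax(spreads))]

array = [2, -3, 3, 1, 10, 8, 2, 5, 13, -5, 3, -18]
print(solution_name(array))  # Four
```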

Converting an array of numbers from an old range to a new range, where the lowest valued number is a 100 and the highest valued number is a 0?

Say we have an array of values of [2, 5, 7, 9, 3] I would want the 2 be a 100 since it's the lowest value and the 9 to be a 0 since it's the highest, and everything in between is interpolated, how would I go about converting this array? When I say interpolated I want the numbers to be the same distance apart in the new scale, so the 3 wouldn't quite be 100, but close, maybe around 95 or so.
Just scale the array into the [0, 100] range, then subtract each value from 100. So the solution is:
import numpy as np
arr = [2, 5, 7, 9, 3]
min_val = np.min(arr)
max_val = np.max(arr)
total_range = max_val - min_val
new_arr = [(100 - int(((i - min_val)/total_range) * 100.0)) for i in arr]
Notice that if you want all values uniformly distributed over the range from minimum to maximum, your example of 95 for the value 3 cannot happen. In this solution, 3 maps to about 86 (not 95).
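The same mapping can also be written as one vectorized NumPy expression, without the int() truncation; a sketch:

```python
import numpy as np

arr = np.array([2, 5, 7, 9, 3])

# linear map: the minimum goes to 100, the maximum goes to 0
scaled = 100 * (arr.max() - arr) / (arr.max() - arr.min())
print(scaled)  # approximately [100.  57.14  28.57  0.  85.71]
```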
I hope I'm understanding the question correctly. If so, my solution is to sort the list and then scale it in proportion to the difference between the highest and lowest values divided by 100. Note that because the list is sorted first, the results come out in sorted order rather than the original order. Here is a quick code example:
a = [2, 5, 7, 9, 3]
a.sort()
b = []
for element in a:
    b.append(int(100 - (element - a[0]) * (100 / (a[-1] - a[0]))))
print(a)
print(b)
Along the same lines, but broken into smaller steps (plus some practice in naming variables):
a = [2,5,7,9,3]
a_min = min(a)
a_max = max(a)
a_diff = a_max - a_min
b=[]
for x in a:
    b += [x - a_min]
b_max = max(b)
c = []
for x in b:
    c += [(1 - (x / b_max)) * 100]
print('c: ', c)
# c:  [100.0, 57.14285714285714, 28.57142857142857, 0.0, 85.71428571428572]

How to efficiently compute logsumexp of upper triangle in a nested loop?

I have a nested for loop that iterates over rows of the weight matrix and applies logsumexp to the upper triangular portion of the outer addition matrix from these weights rows. It is very slow so I'm trying to figure out how to speed this up by either vectorizing or taking out the loops in lieu of matrix operations.
'''
Wm: weights matrix, n x k
W: updated weights matrix, n x n
triu_inds: upper triangular indices of the k x k outer matrix Wxy
'''
for x in range(n - 1):
    wx = Wm[x, :]
    for y in range(x + 1, n):
        wy = Wm[y, :]
        Wxy = np.add.outer(wx, wy)
        Wxy = Wxy[triu_inds]
        W[x, y] = logsumexp(Wxy)
logsumexp computes the log of the sum of exponentials of an input array:
a = [1, 2, 3]
logsumexp(a) = log(exp(1) + exp(2) + exp(3))
The input data Wm is a weights matrix with n x k dimensions: k is the number of a patient's sensor locations and n is the number of all possible sensor locations. The values in Wm are basically how close a patient's sensor is to a known sensor.
example:
Wm = [[ 1  2  3]
      [ 4  5  6]
      [ 7  8  9]
      [10 11 12]]
wx = [1 2 3]
wy = [4 5 6]
Wxy = [[5 6 7]
       [6 7 8]
       [7 8 9]]
triu_inds = ([0, 0, 1], [1, 2, 2])
Wxy[triu_inds] = [6, 7, 8]
logsumexp(Wxy[triu_inds]) = log(exp(6) + exp(7) + exp(8))
You can perform the outer product on the full matrix Wm and then swap the axes corresponding to columns in operand 1 and rows in operand 2 in order to apply the triangle indices to the columns. The resulting matrix is filled for all combinations of rows, so you need to select the upper triangle part.
W = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),) * 2 + triu_inds],
    axis=-1,  # perform the summation over the last axis
)
W = np.triu(W, k=1)
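A quick way to convince yourself the vectorized expression is correct is to compare it against the original nested loop on small random data. In this sketch the shapes are made up for the test, and logsumexp is a minimal NumPy stand-in for scipy.special.logsumexp:

```python
import numpy as np

def logsumexp(a, axis=None):
    # minimal, numerically stable stand-in for scipy.special.logsumexp
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out.squeeze(axis=axis) if axis is not None else out.item()

rng = np.random.default_rng(0)
n, k = 5, 3                         # assumed sizes, for the test only
Wm = rng.standard_normal((n, k))
triu_inds = np.triu_indices(k, k=1)

# original nested loop
W_loop = np.zeros((n, n))
for x in range(n - 1):
    for y in range(x + 1, n):
        W_loop[x, y] = logsumexp(np.add.outer(Wm[x], Wm[y])[triu_inds])

# vectorized version from the answer
W_vec = logsumexp(
    np.add.outer(Wm, Wm).swapaxes(1, 2)[(slice(None),) * 2 + triu_inds],
    axis=-1,
)
W_vec = np.triu(W_vec, k=1)
print(np.allclose(W_loop, W_vec))  # True
```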
