I have two lists, one for time and the other for amplitude.
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
I want to know when there's a change from '0' to '1' or vice-versa, but I only care if the event happens after verify_time = X. So, if verify_time = 12.5 it would return time[8] = 13 and time[10] = 16.
What I have so far is:
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
verify_time = 12.5
start_end = []
for i, (t, a) in enumerate(zip(time, ampli)):
    if t >= verify_time: # should check the values from here
        if ampli[i-1] and (a != ampli[i-1]): # there's a change from 0 to 1 or vice-versa
            start_end.append(i)
print(f"Start: {time[start_end[0]]}s")
print(f"End: {time[start_end[1]]}s")
This will print:
Start: 13s
End: 17s
Question 1) Shouldn't it print End: 16s? I'm kind of lost with this logic because the number of '1's is three (3).
Question 2) Is there another way to get the same results without this for-if-if construction? I find it awkward; in Matlab I would use the diff() function.
If you don't mind using numpy, it is easiest (and faster on larger lists) to find edges by calculating differences, unless your waves take up gigabytes and run out of memory.
import numpy as np
verify_time = 12.5
time = np.array([0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20])
ampli = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
ind = time > verify_time
time = time[ind]
ampli = ampli[ind]
d_ampli = np.diff(ampli)
starts = np.where(d_ampli > 0)[0]
ends = np.where(d_ampli < 0)[0] - 1
UPDATE
I forgot to change the diff properly, it should be d_ampli = np.diff(ampli, prepend=ampli[0])
UPDATE
As you noted, the original answer returns an empty starts. The reason is that after filtering, ampli starts with [1, 1, ...], so there is no edge. A philosophical question arises here: does the edge really start before 12.5 or after it? We don't know, and I'm kinda sure you won't care. What you want here is a backward differencing scheme that numpy does not provide, so we just trick it by shifting everything forward one index:
import numpy as np
verify_time = 12.5
time = np.array([0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20])
ampli = np.array([0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
d_ampli = np.r_[[0], np.diff(ampli)]
starts = np.where(d_ampli > 0)[0]
ends = np.where(d_ampli < 0)[0] - 1
starts = starts[time[starts] > verify_time]
ends = ends[time[ends] > verify_time]
starts, ends
(array([8], dtype=int64), array([10], dtype=int64))
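To map those index arrays back to actual times (a small usage note):
print(time[starts], time[ends])  # [13] [16]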
It prints 17s because you take note of the first value after the change, which is 17 for the first 0 after the end of the square wave.
I've simplified the logic into a list comprehension, so it should make more sense:
assert len(time) == len(ampli)
start_end = [i for i in range(1, len(time)) if time[i] >= verify_time and ampli[i] != ampli[i-1]]
print(f"Start: {time[start_end[0]]}s")
print(f"End: {time[start_end[1]]}s")
Also, you had an issue where the check "if ampli[i-1]" was falsy when the previous amplitude was 0, so those transitions were skipped; the comprehension starts at index 1 and compares the values directly instead. It would be most accurate if you took the average of time[start_end[0]] and time[start_end[0]-1], since all you know, given your resolution, is that the transition occurred somewhere between those two samples.
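For instance, a minimal sketch of that midpoint estimate, using the start_end list from above:
# The samples only tell us the change happened somewhere between two points,
# so estimate each transition as the midpoint of the pair that brackets it.
start_est = (time[start_end[0]] + time[start_end[0] - 1]) / 2  # (13 + 11) / 2 = 12.0
end_est = (time[start_end[1]] + time[start_end[1] - 1]) / 2    # (17 + 16) / 2 = 16.5
print(f"Start: ~{start_est}s, End: ~{end_est}s")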
I've made the below solution to have a straightforward algorithm. In summary, it goes as follows:
Convert lists to NumPy arrays
Find closest value in time array to verify_time, cut off all indexes that occur beforehand.
NumPy's "diff" method is great for finding rising and falling edges. Once those edges are found, we can use NumPy's "where" method to look up the indexes and then return the time found at the same indexes in the time array.
Coding Environment
Python 3.6 (Minimum Requirement for the print statements)
NumPy 1.15.2 (Older versions are probably fine)
import numpy as np
# inputs
time = [0, 1, 2, 3, 6, 7, 10, 11, 13, 15, 16, 17, 18, 20] # (seconds for example) the step isn't fixed
ampli = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0] # ugly spacing to facilitate the reading
verify_time = 12.5
# ------------------------------------------
# Solution
# Step 1) Convert lists to Numpy Arrays
npTime = np.array(time)
npAmplitude = np.array(ampli) # Amplitude
# Step 2) Find closest Value in time array to 'verify_time'.
# Strategy:
# i) Subtract 'verify_time' from each value in array. (Produces an array of Diffs)
# ii) The Diff that is nearest to zero, or better yet is zero, is the best match for 'verify_time'
# iii) Get the array index of the Diff selected in step ii
# Step i
npDiffs = np.abs(npTime - float(verify_time))
# Step ii
smallest_value = np.amin(npDiffs)
# Step iii (Use numpy.where to lookup array index)
first_index_we_care_about = (np.where(npDiffs == smallest_value)[0])[0]
first_index_we_care_about = first_index_we_care_about - 1 # Below edge detection requires previous index
# Remove the beginning parts of the arrays that the question doesn't care about
npTime = npTime[first_index_we_care_about:len(npTime)]
npAmplitude = npAmplitude[first_index_we_care_about:len(npAmplitude)]
# Step 3) Edge Detection: Find the rising and falling edges
# Generates a 1 when rising edge is found, -1 for falling edges, 0s for no change
npEdges = np.diff(npAmplitude)
# For Reference
# Here you can see that numpy diff placed a 1 before all rising edges, and a -1 before falling
# ampli [ 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
# npEdges [ 0, 1, 0, -1, 0, 0, 0, 1, 0, 0, -1, 0, 0]
# Get array indexes where a 1 is found (I.e. A Rising Edge)
npRising_edge_indexes = np.where(npEdges == 1)[0]
# Get array indexes where a -1 is found (I.e. A Falling Edge)
npFalling_edge_indexes = np.where(npEdges == -1)[0]
# Print times that edges are found after 'verify_time'
# Note: Adjust edge detection index by '+1' to answer question correctly (yes this is consistent)
print(f'Start: {npTime[npRising_edge_indexes[0]+1]}s')
print(f'End: {npTime[npFalling_edge_indexes[0]+1]}s')
Output
Start: 13s
End: 17s
I have a 2d numpy array called arm_resets that has positive integers. The first column has all positive integers < 360. For all columns other than the first, I need to replace all values over 360 with the value that is in the same row in the 1st column. I thought this would be a relatively easy thing to do, here's what I have:
i = 300
over_360 = arm_resets[:, [i]] >= 360
print(arm_resets[:, [i]][over_360])
print(arm_resets[:, [0]][over_360])
arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
print(arm_resets[:, [i]][over_360])
And here's what prints:
[3600 3609 3608 ... 3600 3611 3605]
[ 0 9 8 ... 0 11 5]
[3600 3609 3608 ... 3600 3611 3605]
Since all the numbers shown in the first print (first 3 and last 3) are above 360, they should have been replaced by the values shown in the second print, and the third print should reflect that. Why is this not working?
edit: reproducible example:
import numpy as np
import pandas as pd

df = pd.DataFrame({"start": [1, 2, 5, 6], "freq": [1, 5, 6, 9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets, ((0, 0), (0, periods - 1)))
for i in range(1, periods):
    arm_resets[:, [i]] = arm_resets[:, [i-1]] + freq
    #over_360 = arm_resets[:,[i]] >= periods
    #arm_resets[:,[i]][over_360] = arm_resets[:,[0]][over_360]
arm_resets
Given commented out code here's what prints:
array([[ 1, 2, 3, 4, 5, 6],
[ 2, 7, 12, 17, 22, 27],
[ 3, 9, 15, 21, 27, 33],
[ 4, 13, 22, 31, 40, 49]])
What I would expect:
array([[ 1, 2, 3, 4, 5, 1],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3],
[ 4, 4, 4, 4, 4, 4]])
Now if it helps, the final 2d array I'm actually trying to create is a 1/0 array that indicates which are filled in, so in this example I'd want this:
array([[ 0, 1, 1, 1, 1, 1],
[ 0, 0, 1, 0, 0, 0],
[ 0, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 1, 0]])
The code I use to achieve this from the above arm_resets is this:
fin = np.zeros((len(arm_resets), periods), dtype=int)
for i in range(len(arm_resets)):
    fin[i, a[i]] = 1
The slice arm_resets[:, [i]] is a fancy index, and therefore makes a copy of the ith column of the data. arm_resets[:, [i]][over_360] = ... therefore calls __setitem__ on a temporary array that is discarded as soon as the statement executes. If you want to assign to the mask, call __setitem__ on the sliced object directly:
arm_resets[over_360, [i]] = ...
You also don't need to make the index into a list. It's generally better to use simple indices, especially when doing assignments, since they create views rather than copies:
arm_resets[over_360, i] = ...
With slicing, even the following should work, since it calls __setitem__ on a view:
arm_resets[:, i][over_360] = ...
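If it helps, here is a quick stand-alone demonstration of the copy-vs-view distinction (illustrative array, not your data):
import numpy as np

a = np.arange(6).reshape(2, 3)
col_fancy = a[:, [1]]  # fancy index -> copy
col_fancy[:] = -1      # writes only to the copy; a is unchanged
col_view = a[:, 1]     # basic index -> view
col_view[:] = 99       # writes through to a
print(a)
# [[ 0 99  2]
#  [ 3 99  5]]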
This index does not help you process each row of the data, since i is a column. In fact, you can process the entire matrix in one step, without looping, if you use indices rather than a boolean mask. The reason that indices are useful is that you can match the item from the correct row in the first column:
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
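A tiny worked example of that one-step version (hypothetical data, not your arm_resets):
import numpy as np

arm_resets = np.array([[10, 400,  20],
                       [50,  60, 500]])
# rows/cols index into the slice starting at column 1, hence the +1 on assignment
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
print(arm_resets)
# [[10 10 20]
#  [50 60 50]]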
You can use np.where()
first_col = arm_resets[:,0] # first col
first_col = first_col.reshape(first_col.size, 1) # Transform into a 2d array
arm_resets = np.where(arm_resets >= 360,first_col,arm_resets)
You can see in detail how np.where works here, but basically it evaluates arm_resets >= 360; where it's true it puts the first_col value in place (there is another detail here with broadcasting), and where it's false it keeps the arm_resets value.
Edit: As suggested by Mad Physicist, you can use arm_resets[:,0,None] directly instead of creating the first_col variable.
arm_resets = np.where(arm_resets >= 360,arm_resets[:,0,None],arm_resets)
I would like to loop over the following check_matrix in such a way that the code recognizes whether the first and second elements are 1 and 1, or 1 and 2, etc. Then, for each separate class of pair, i.e. 1,1 or 1,2 or 2,2, the code should store in new matrices the sum of the last element (which in this case has index 8) times exp(-i*q.(check_matrix[k][2:5]-check_matrix[k][5:8])), where i is iota (the imaginary unit), k is the running index on check_matrix, and q is a vector defined below. So there are 20 q vectors.
import numpy as np
q = []
for i in np.linspace(0, 10, 20):
    q.append(np.array((0, 0, i)))
q = np.array(q)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
This means in principle I will have 20 matrices of shape 2x2, one corresponding to each q vector.
For the moment my code gives only one matrix, which appears to be the last one, even though I am appending to Matrices. My code looks like this:
for i in range(2):
    i = i + 1
    for j in range(2):
        j = j + 1
        j_list = []
        Matrices = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                j_list.append(check_matrix[k][8]*np.exp(-1J*np.dot(q, (np.subtract(check_matrix[k][2:5], check_matrix[k][5:8])))))
        j_11 = np.sum(j_list)
        I_matrix[i-1][j-1] = j_11
        Matrices.append(I_matrix)
I_matrix is defined as below:
I_matrix= np.zeros((2,2),dtype=np.complex_)
At the moment I get following output.
Matrices = [array([[-0.66071446-0.77603624j, -0.29038112+2.34855023j], [-0.31387562-0.08116629j, 4.2788 +0.j ]])]
But I desire to get a matrix corresponding to each q value, meaning that in total there should be 20 matrices in this case, where each 2x2 matrix element contains the sums for the 1,1 / 1,2 / 2,1 / 2,2 pairs in the following manner
array([[11., 12.],
[21., 22.]])
I shall highly appreciate your suggestion to correct it. Thanks in advance!
I am pretty sure you can solve this problem in an easier way, and I am not 100% sure that I understood you correctly, but here is some code that does what I think you want. If you have a possibility to check whether the results are valid, I would suggest you do so.
import numpy as np
n = 20
q = np.zeros((20, 3))
q[:, -1] = np.linspace(0, 10, n)
check_matrix = np.array([[1, 1, 0, 0, 0, 0, 0, -0.7977, -0.243293],
[1, 1, 0, 0, 0, 0, 0, 1.5954, 0.004567],
[1, 2, 0, 0, 0, -1, 0, 0, 1.126557],
[2, 1, 0, 0, 0, 0.5, 0.86603, 1.5954, 0.038934],
[2, 1, 0, 0, 0, 2, 0, -0.7977, -0.015192],
[2, 2, 0, 0, 0, -0.5, 0.86603, 1.5954, 0.21394]])
check_matrix[:, :2] -= 1 # python indexing is zero based
matrices = np.zeros((n, 2, 2), dtype=np.complex_)
for i in range(2):
    for j in range(2):
        k_list = []
        for k in range(len(check_matrix)):
            if check_matrix[k][0] == i and check_matrix[k][1] == j:
                k_list.append(check_matrix[k][8] *
                              np.exp(-1J * np.dot(q, check_matrix[k][2:5]
                                                  - check_matrix[k][5:8])))
        matrices[:, i, j] = np.sum(k_list, axis=0)
NOTE: I changed your indices to have consistent zero-based indexing.
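As a quick sanity check (q[0] is the zero vector, so the exponential is 1 and matrices[0] reduces to the plain group sums of column 8):
print(matrices.shape)  # (20, 2, 2): one 2x2 matrix per q vector
print(matrices[0])
# [[-0.238726+0.j  1.126557+0.j]
#  [ 0.023742+0.j  0.21394 +0.j]]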
Here is another approach where I replaced the k-loop with a vectored version:
for i in range(2):
    for j in range(2):
        k = np.logical_and(check_matrix[:, 0] == i, check_matrix[:, 1] == j)
        temp = np.dot(check_matrix[k, 2:5] - check_matrix[k, 5:8], q[:, :, np.newaxis])[..., 0]
        temp = check_matrix[k, 8:] * np.exp(-1J * temp)
        matrices[:, i, j] = np.sum(temp, axis=0)
3 line solution
You asked for an efficient solution in your original title, so how about this solution that avoids nested loops and if statements in a 3-liner, which is thus hopefully faster?
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
grp=np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[np.sum(x) for x in grp]
output:
[-0.23872600000000002, 1.126557, 0.023742000000000003, 0.21394]
How does it work?
I combine the first two columns into a single index, treating each as "bits" (i.e. base 2)
fac=2*(check_matrix[:,0]-1)+(check_matrix[:,1]-1)
(If you have indexes that exceed 2, you can still use this technique, but you will need a different base to combine the columns; i.e. if your indices go from 1 to 18, you would need to multiply column 0 by a number equal to or larger than 18 instead of 2.)
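For example, a sketch of that generalization (B is a hypothetical upper bound on the index values):
B = 18  # assumed largest index value in the first two columns
fac = B*(check_matrix[:, 0] - 1) + (check_matrix[:, 1] - 1)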
So the result of the first line is
array([0., 0., 1., 2., 2., 3.])
Note as well that it assumes the data is ordered, with one column changing fastest; if this is not the case you will need an extra step to sort the index and the original check_matrix. In your example the data is ordered.
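A sketch of that extra sorting step, in case your data were not ordered (the variable names follow the lines above):
order = np.argsort(fac, kind='stable')  # stable sort keeps ties in their original order
fac = fac[order]
check_matrix = check_matrix[order]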
The next step groups the data according to the index, and uses the solution posted here.
np.split(check_matrix[:,8], np.cumsum(np.unique(fac,return_counts=True)[1])[:-1])
[array([-0.243293, 0.004567]), array([1.126557]), array([ 0.038934, -0.015192]), array([0.21394])]
i.e. it outputs the 8th column of check_matrix according to the grouping of fac
Then the last line simply sums those. Knowing how the first two columns were combined to give the single index allows you to map the result back; or you could simply add it to check_matrix as a 9th column if you wanted.
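For instance, a minimal sketch of mapping the sums back (the base-2 packing lays the four groups out in row-major order, so a reshape recovers the 2x2 layout):
result = np.array([np.sum(x) for x in grp]).reshape(2, 2)
# array([[-0.238726,  1.126557],
#        [ 0.023742,  0.21394 ]])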
I am using the function get_tuples(length, total) from here
to generate an array of all tuples of a given length and sum; an example and the function are shown below. After I have created the array, I need a way to return the indices of a given number of elements in the array. I was able to do that using .index() by changing the array to a list, as shown below. However, this solution, or another solution that is also based on searching (for example using np.where), takes a lot of time to find the indices. Since all elements in the array (array s in the example) are different, I was wondering if we can construct a one-to-one mapping, i.e. a function that, given an element of the array, returns its index by doing some addition and multiplication on the element's values. Any ideas if that is possible? Thanks!
import numpy as np
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# array s
In [1]: s
Out[1]:
array([[ 0, 0, 0, 20],
[ 0, 0, 1, 19],
[ 0, 0, 2, 18],
...,
[19, 0, 1, 0],
[19, 1, 0, 0],
[20, 0, 0, 0]])
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
#change array to list
s_list = s.tolist()
#find the indices
indx=[s_list.index(i) for i in elements_to_find.tolist()]
#output
In [2]: indx
Out[2]: [0, 7, 100, 5, 45]
Here is a formula that calculates the index based on the tuple alone, i.e. it needn't see the full array. To compute the index of an N-tuple it needs to evaluate N-1 binomial coefficients. The following implementation is (part-)vectorized; it accepts ND-arrays, but the tuples must be in the last dimension.
import numpy as np
from scipy.special import comb
# unfortunately, comb with option exact=True is not vectorized
def bc(N, k):
    return np.round(comb(N, k)).astype(int)

def get_idx(s):
    N = s.shape[-1] - 1
    R = np.arange(1, N)
    ps = s[..., ::-1].cumsum(-1)
    B = bc(ps[..., 1:-1] + R, 1 + R)
    return bc(ps[..., -1] + N, N) - ps[..., 0] - 1 - B.sum(-1)
# OP's generator
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# compute each index
r = get_idx(s)
# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
print(get_idx(elements_to_find))
Sample run:
all ok
[ 0 7 100 5 45]
How to derive formula:
Use stars and bars to express the full partition count #part(N,k) (N is total, k is length) as a single binomial coefficient (N + k - 1) choose (k - 1).
Count back-to-front: It is not hard to verify that after the i-th full iteration of the outer loop of OP's generator exactly #part(N-i,k) partitions have not yet been enumerated. Indeed, what's left are all partitions p1+p2+... = N with p1>=i; we can write p1=q1+i such that q1+p2+... = N-i, and this latter partition is constraint-free, so we can use 1. to count.
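As a quick numeric check of step 1 (a sketch reusing the OP's generator and scipy's comb):
from scipy.special import comb

# Stars and bars: the number of length-k tuples of nonnegative integers
# summing to N is C(N + k - 1, k - 1).
N, k = 20, 4
assert len(list(get_tuples(k, N))) == comb(N + k - 1, k - 1, exact=True)  # 1771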
You can use binary search to make the search a lot faster.
Binary search makes the search O(log(n)) rather than O(n) (using Index)
We do not need to sort the tuples since they are already sorted by the generator
import bisect
def get_tuples(length, total):
    " Generates tuples "
    if length == 1:
        yield (total,)
        return
    yield from ((i,) + t for i in range(total + 1) for t in get_tuples(length - 1, total - i))

def find_indexes(x, indexes):
    if len(indexes) > 100:
        # Faster to generate all indexes when we have a large
        # number to check
        d = dict(zip(x, range(len(x))))
        return [d[tuple(i)] for i in indexes]
    else:
        return [bisect.bisect_left(x, tuple(i)) for i in indexes]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
# Tuples are generated in sorted order [(0,0,0,20), ...(20,0,0,0)]
# which allows binary search to be used
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
print('Found indexes:', *y)
print('Indexes & Tuples:')
for i in y:
    print(i, x[i])
Output
Found indexes: 0 7 100 5 45
Indexes & Tuples:
0 (0, 0, 0, 20)
7 (0, 0, 7, 13)
100 (0, 5, 5, 10)
5 (0, 0, 5, 15)
45 (0, 2, 4, 14)
Performance
Scenario 1--Tuples already computed and we just want to find the index of certain tuples
For instance, x = list(get_tuples(4, 20)) has already been performed.
Search for
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
Binary Search
%timeit find_indexes(x, indexes)
100000 loops, best of 3: 11.2 µs per loop
Calculate the index based on the tuple alone (courtesy of #PaulPanzer's approach)
%timeit get_idx(indexes)
10000 loops, best of 3: 92.7 µs per loop
In this scenario, binary search is ~8x faster when tuples have already been pre-computed.
Scenario 2--the tuples have not been pre-computed.
%%timeit
import bisect
def find_indexes(x, t):
    " finds the index of each tuple in list t (assumes x is sorted) "
    return [bisect.bisect_left(x, tuple(i)) for i in t]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
100 loops, best of 3: 2.69 ms per loop
#PaulPanzer's approach has the same timing in this scenario (92.7 µs)
=> #PaulPanzer's approach is ~29 times faster when the tuples don't have to be computed
Scenario 3--Large number of indexes (#PJORR)
A large number of random indexes is generated
x = list(get_tuples(4, 20))
xnp = np.array(x)
indices = xnp[np.random.randint(0,len(xnp), 2000)]
indexes = indices.tolist()
%timeit find_indexes(x, indexes)
#Result: 1000 loops, best of 3: 1.1 ms per loop
%timeit get_idx(indices)
#Result: 1000 loops, best of 3: 716 µs per loop
In this case, #PaulPanzer's approach is 53% faster
I have a problem that has to be solved as efficiently as possible. My current approach kind of works, but is extremely slow.
I have a dataframe with multiple columns, in this case I only care for one of them. It contains positive continuous numbers and some zeros.
My goal is to find the row after which nearly no zeros appear in the following rows.
To make clear what I mean I wrote this example to replicate my problem:
import pandas as pd

df = pd.DataFrame([0,0,0,0,1,0,1,0,0,2,0,0,0,1,1,0,1,2,3,4,0,4,0,5,1,0,1,2,3,4,
                   0,0,1,2,1,1,1,1,2,2,1,3,6,1,1,5,1,2,3,4,4,4,3,5,1,2,1,2,3,4],
                  index=pd.date_range('2018-01-01', periods=60, freq='15T'))
There are some zeros at the beginning, but they get less after some time.
Here comes my unoptimized code to visualize the number of zeros:
zerosum = 0 # counter for all zeros that have appeared so far
for i in range(len(df)):
    if df[0][i] == 0.0:
        df.loc[df.index[i], 'zerosum'] = zerosum
        zerosum += 1
    else:
        df.loc[df.index[i], 'zerosum'] = zerosum
df['zerosum'].plot()
With that unoptimized code I can see the distribution of zeros over time.
My expected output in this example would be the date 01-Jan-2018 08:00, because no zeros appear after that date.
The problem I have when dealing with my real data is some single zeros can appear later. Therefore I can't just pick the last row that contains a zero. I have to somehow inspect the distribution of zeros and ignore later outliers.
Note: The visualization is not necessary to solve my problem, I just included it to explain my problem as well as possible. Thanks
OK, second go:
import pandas as pd
import numpy as np
import math
df = pd.DataFrame([0,0,0,0,1,0,1,0,0,2,0,0,0,1,1,0,1,2,3,4,0,4,0,5,1,0,1,2,3,4,
0,0,1,2,1,1,1,1,2,2,1,3,6,1,1,5,1,2,3,4,4,4,3,5,1,2,1,2,3,4],
index=pd.date_range('2018-01-01', periods=60, freq='15T'),
columns=['values'])
We create a column that contains the rank of each zero, and zero if there is a non-zero value
df['zero_idx'] = np.where(df['values']==0,np.cumsum(np.where(df['values']==0,1,0)), 0)
We can use this column to get the location of a zero of any rank. I don't know what your criterion is for calling a zero an outlier, but let's say we want to make sure we are past at least 90% of all zeros...
# Total number of zeros
n_zeros = max(df['zero_idx'])
# Get past at least this percentage
tolerance = 0.9
# The rank of the abovementioned zero
rank_tolerance = math.ceil(tolerance * n_zeros)
df[df['zero_idx']==rank_tolerance].index
Out[44]: DatetimeIndex(['2018-01-01 07:30:00'], dtype='datetime64[ns]', freq='15T')
Okay, if you need to get the index after the last zero occurred, you can try this:
last = 0
for i in range(len(df)):
    if df[0][i] == 0:
        last = i
print(df.iloc[last+1])
or by Filtering:
new = df.loc[df[0]==0]
last = df.index.get_loc(new.index[-1])
print(df.iloc[last+1])
Here is my solution using a filter and cumsum:
df = pd.DataFrame([0, 0, 0, 0, 1, 0, 1, 0, 0, 2, 0, 0, 0, 1, 1, 0, 1, 2, 3, 4, 0, 4, 0, 5, 1, 0, 1, 2, 3, 4,
0, 0, 1, 2, 1, 1, 1, 1, 2, 2, 1, 3, 6, 1, 1, 5, 1, 2, 3, 4, 4, 4, 3, 5, 1, 2, 1, 2, 3, 4],
index=pd.date_range('2018-01-01', periods=60, freq='15T'))
a = df[0] == 0
df['zerosum'] = a.cumsum()
maxval = max(df['zerosum'])
firstdate = df[df['zerosum'] == maxval].index[1]
print(firstdate)
output:
2018-01-01 08:00:00
I am trying to construct a numpy array (a 2-dimensional numpy array - i.e. a matrix) from a paper that uses a non-standard indexing to construct the matrix, i.e. the top-left element is q_{1,2} instead of q_{0,0}.
Define the n x (n-2) matrix Q by its elements q_{i,j} for i = 1,...,n and j = 2,...,n-1, given by
q_{j-1,j} = 1/h_{j-1},   q_{j,j} = -1/h_{j-1} - 1/h_j,   q_{j+1,j} = 1/h_j.
(I have posted this in LaTeX form here: http://www.texpaste.com/n/8vwds4fx)
I have tried to implement in python like this:
# n = u_s.size
# n = 299 for this example
n = 299
Q = np.zeros((n, n-2))
for i in range(0, n+1):
    for j in range(2, n):
        Q[j-1, j] = 1.0/h[j-1]
        Q[j, j] = -1.0/h[j-1] - 1.0/h[j]
        Q[j+1, j] = 1.0/h[j]
But I always get the error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-54-c07a3b1c81bb> in <module>()
1 for i in range(1,n+1):
2 for j in range(2,n-1):
----> 3 Q[j-1,j] = 1.0/h[j-1]
4 Q[j,j] = -1.0/h[j-1] - 1.0/h[j]
5 Q[j+1,j] = 1.0/h[j]
IndexError: index 297 is out of bounds for axis 1 with size 297
I initially thought I could decrement both i and j in my for loop to keep edge cases safe, as a quick way to move to zero-indexed notation, but this hasn't worked. I also tried incrementing and modifying the range().
Is there a way to convert this definition to one that python can handle? Is this a common issue?
Simplifying the problem to make the assignment pattern obvious:
In [228]: h=np.arange(10,15)
In [229]: Q=np.zeros((5,5),int)
In [230]: for j in range(1,5):
...: Q[j-1:j+2,j] = h[j-1:j+2]
In [231]: Q
Out[231]:
array([[ 0, 10, 0, 0, 0],
[ 0, 11, 11, 0, 0],
[ 0, 12, 12, 12, 0],
[ 0, 0, 13, 13, 13],
[ 0, 0, 0, 14, 14]])
Assignment to the partial first and last columns may need tweaking. Here's the equivalent built from diagonals:
In [232]: np.diag(h,0)+np.diag(h[:-1],1)+np.diag(h[1:],-1)
Out[232]:
array([[10, 10, 0, 0, 0],
[11, 11, 11, 0, 0],
[ 0, 12, 12, 12, 0],
[ 0, 0, 13, 13, 13],
[ 0, 0, 0, 14, 14]])
With the h[j-1], h[j] indexing this diagonal assignment probably needs tweaking, but it should be a useful starting point.
Selecting h values more like what you use (skipping the 1/h for now):
In [238]: Q=np.zeros((5,5),int)
In [239]: for j in range(1,4):
...: Q[j-1:j+2,j] =[h[j-1],h[j-1]+h[j], h[j]]
...:
In [240]: Q
Out[240]:
array([[ 0, 10, 0, 0, 0],
[ 0, 21, 11, 0, 0],
[ 0, 11, 23, 12, 0],
[ 0, 0, 12, 25, 0],
[ 0, 0, 0, 13, 0]])
I'm skipping the two partial end columns for now. The first slicing approach allowed me to be a bit sloppy, since it's ok to slice 'off the end'. The end columns, if set, will require their own expressions.
In [241]: j=0; Q[j:j+2,j] =[h[j], h[j]]
In [242]: j=4; Q[j-1:j+1,j] =[h[j-1],h[j-1]+h[j]]
In [243]: Q
Out[243]:
array([[10, 10, 0, 0, 0],
[10, 21, 11, 0, 0],
[ 0, 11, 23, 12, 0],
[ 0, 0, 12, 25, 13],
[ 0, 0, 0, 13, 27]])
The relevant diagonal pieces are still evident:
In [244]: h[1:]+h[:-1]
Out[244]: array([21, 23, 25, 27])
The equation doesn't contain any value for i; it refers only to j. Q should be a matrix of dimension (n+2) x (n+2). For j = 1, it refers to Q[0,1], Q[1,1] and Q[2,1]. For j = n, it refers to Q[n-1,n], Q[n,n] and Q[n+1,n]. So Q should have indices from 0 to n+1, which is n+2 values.
I don't think you require the i loop. You can achieve your results with only the j loop from 1 to n, but Q should be indexed from 0 to n+1.
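For illustration, a minimal sketch of that j-only loop with Q indexed from 0 to n+1 (assuming h has n+1 entries so that h[j-1] and h[j] exist for every j = 1..n; the h values here are placeholders):
import numpy as np

n = 5
h = np.arange(1.0, n + 2)    # placeholder h values, length n + 1
Q = np.zeros((n + 2, n + 2))
for j in range(1, n + 1):    # only the j loop is needed
    Q[j-1, j] = 1.0/h[j-1]
    Q[j, j] = -1.0/h[j-1] - 1.0/h[j]
    Q[j+1, j] = 1.0/h[j]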