Pandas count values greater than current row in the last n rows - python

How do I get the count of values greater than the current row within the last n rows?
Imagine we have a dataframe like the following:
col_a
0 8.4
1 11.3
2 7.2
3 6.5
4 4.5
5 8.9
I am trying to get a table like the following, where n=3.
col_a col_b
0 8.4 0
1 11.3 0
2 7.2 2
3 6.5 3
4 4.5 3
5 8.9 0
Thanks in advance.

In pandas it is best not to loop, because loops are slow; it is better to use rolling with a custom function:
n = 3
# raw=True passes each window to the lambda as a NumPy array,
# which the positional indexing x[-1] / x[:-1] requires on modern pandas
df['new'] = (df['col_a'].rolling(n+1, min_periods=1)
                        .apply(lambda x: (x[-1] < x[:-1]).sum(), raw=True)
                        .astype(int))
print (df)
col_a new
0 8.4 0
1 11.3 0
2 7.2 2
3 6.5 3
4 4.5 3
5 8.9 0
If performance is important, use strides:
n = 3
x = np.concatenate([[np.nan] * n, df['col_a'].values])

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
arr = rolling_window(x, n + 1)
df['new'] = (arr[:, :-1] > arr[:, [-1]]).sum(axis=1)
print (df)
col_a new
0 8.4 0
1 11.3 0
2 7.2 2
3 6.5 3
4 4.5 3
5 8.9 0
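Note that as_strided creates overlapping views into the same buffer without copying, so the window matrix should be treated as read-only. On NumPy >= 1.20 the same windows can be built with the safer sliding_window_view (a sketch equivalent to the code above):
from numpy.lib.stride_tricks import sliding_window_view

x = np.concatenate([[np.nan] * n, df['col_a'].to_numpy()])
arr = sliding_window_view(x, n + 1)  # shape (len(df), n + 1), no manual stride arithmetic
df['new'] = (arr[:, :-1] > arr[:, [-1]]).sum(axis=1)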
Performance: here perfplot is used with a small window, n = 3:
import numpy as np
import pandas as pd
import perfplot

np.random.seed(1256)
n = 3

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def roll(df):
    df['new'] = (df['col_a'].rolling(n+1, min_periods=1).apply(lambda x: (x[-1] < x[:-1]).sum(), raw=True).astype(int))
    return df

def list_comp(df):
    df['count'] = [(j < df['col_a'].iloc[max(0, i-n):i]).sum() for i, j in df['col_a'].items()]
    return df

def strides(df):
    x = np.concatenate([[np.nan] * n, df['col_a'].values])
    arr = rolling_window(x, n + 1)
    df['new1'] = (arr[:, :-1] > arr[:, [-1]]).sum(axis=1)
    return df

def make_df(n):
    df = pd.DataFrame(np.random.randint(20, size=n), columns=['col_a'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[list_comp, roll, strides],
    n_range=[2**k for k in range(2, 15)],
    logx=True,
    logy=True,
    xlabel='len(df)')
Also I was curious about performance with a large window, n = 100; the perfplot graphs themselves are not reproduced here.
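The setup is identical to the benchmark above; only the window size changes (a sketch reusing the kernels already defined):
n = 100
perfplot.show(
    setup=make_df,
    kernels=[list_comp, roll, strides],
    n_range=[2**k for k in range(2, 15)],
    logx=True,
    logy=True,
    xlabel='len(df)')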

n = 3
df['col_b'] = df.apply(lambda row: sum(row.col_a <= df.col_a.loc[row.name - n: row.name-1]), axis=1)
Out[]:
col_a col_b
0 8.4 0
1 11.3 0
2 7.2 2
3 6.5 3
4 4.5 3
5 8.9 0

Using a list comprehension with pd.Series.items:
n = 3
df['count'] = [(j < df['col_a'].iloc[max(0, i-n):i]).sum()
               for i, j in df['col_a'].items()]
Equivalently, using enumerate:
n = 3
df['count'] = [(j < df['col_a'].iloc[max(0, i-n):i]).sum()
               for i, j in enumerate(df['col_a'])]
Result:
print(df)
col_a count
0 8.4 0
1 11.3 0
2 7.2 2
3 6.5 3
4 4.5 3
5 8.9 0

Related

How to insert n random NaNs, consecutive and non-consecutive, by month?

I have this df:
CODE DATE PP
0 000130 1991-01-01 0.0
1 000130 1991-01-02 1.0
2 000130 1991-01-03 2.0
3 000130 1991-01-04 2.0
4 000130 1991-01-05 1.1
... ... ...
10861 000142 2020-12-27 2.1
10862 000142 2020-12-28 2.2
10863 000142 2020-12-29 2.1
10864 000142 2020-12-30 0.4
10865 000142 2020-12-31 1.1
I want to have at least 3 consecutive NaNs and 5 non-consecutive NaNs in df['PP'] for each df['CODE'], with their corresponding df['DATE'].dt.year and df['DATE'].dt.month,
so I must convert random values of df['PP'] to NaN to reach those 3 consecutive and 5 non-consecutive NaNs. Expected result:
CODE DATE PP
0 000130 1991-01-01 0.0
1 000130 1991-01-02 NaN
2 000130 1991-01-03 NaN
3 000130 1991-01-04 NaN
4 000130 1991-01-05 1.1
5 000130 1991-01-06 2.1
6 000130 1991-01-07 NaN
7 000130 1991-01-08 2.1
8 000130 1991-01-09 0.4
9 000130 1991-01-10 NaN
... ... ... ...
Important: consecutive NaNs + alternate NaNs = 5. So I can have 3 consecutive NaNs per month inside the 5 NaNs. And if I already have n NaNs in a month, I should only add the difference needed to reach 5 NaNs. For example, if I already have 2 NaNs in a month, I should only add 3 consecutive NaNs. If I already have 5 NaNs in the month, the code should do nothing with that month.
I tried this:
df['PPNEW']=df['PP'].groupby([df['CODE'],df['DATE'].dt.month]).sample(frac=0.984)
But I can't get the exact quantity of NaNs (only a percentage, and months sometimes have 30 or 31 days), and I can't get consecutive NaNs.
Would you mind helping me?
Thanks in advance.
This is not exactly beautiful code, but it does the job, assuming that there are no existing NaNs in your data:
import numpy as np
import pandas as pd

def add_nans(df, n_consecutive=3, n_alternate=5):
    seq = list(df["PP"].values)
    indexes = list(range(len(seq)))
    # place the consecutive block at a random position
    idx = np.random.randint(0, len(seq) - n_consecutive + 1)
    seq[idx : idx + n_consecutive] = ["nan"] * n_consecutive
    # remove the block (plus a one-element buffer) from the candidate indexes
    if 0 < idx < len(seq) - n_consecutive:
        indexes = indexes[:idx - 1] + indexes[idx + n_consecutive + 1:]
    elif idx == 0:
        indexes = indexes[n_consecutive + 1:]
    elif idx == len(seq) - n_consecutive:
        indexes = indexes[:idx - 1]
    # place the alternate NaNs, keeping them non-adjacent
    for i in range(n_alternate):
        choice = np.random.randint(0, len(indexes))
        idx = indexes.pop(choice)
        try:
            indexes.pop(choice)
        except IndexError:
            pass
        try:
            indexes.pop(choice - 1)
        except IndexError:
            pass
        seq[idx] = "nan"
    df["PP"] = seq
    return df
Here is the dataframe I tested this function on:
>>> df
CODE DATE PP
0 130 1991-01-01 0.0
1 130 1991-01-02 1.0
2 130 1991-01-03 2.0
3 130 1991-01-04 2.0
4 130 1991-01-05 1.1
5 142 2020-12-27 2.1
6 142 2020-12-28 2.2
7 142 2020-12-29 2.1
8 142 2020-12-30 0.4
9 142 2020-12-31 1.1
Here is the final result once you apply the function to each group:
>>> (df
...  .groupby(["CODE", df["DATE"].dt.month])
...  .apply(add_nans, n_consecutive=2, n_alternate=1))
CODE DATE PP
0 130 1991-01-01 nan
1 130 1991-01-02 nan
2 130 1991-01-03 2
3 130 1991-01-04 2
4 130 1991-01-05 nan
5 142 2020-12-27 nan
6 142 2020-12-28 2.2
7 142 2020-12-29 2.1
8 142 2020-12-30 nan
9 142 2020-12-31 nan
In your case n_consecutive = 3 and n_alternate = 2.
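So for the question as stated, the call would look like this (a sketch; pd.to_datetime is applied first in case DATE is still a string column):
df["DATE"] = pd.to_datetime(df["DATE"])
out = (df
       .groupby(["CODE", df["DATE"].dt.month])
       .apply(add_nans, n_consecutive=3, n_alternate=2))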
from random import randint

import numpy as np
import pandas as pd

def main():
    # df = generate_df()  # alternative test-data helper (not shown)
    df = pd.read_csv('row_data.csv')
    df["DATE"] = pd.to_datetime(df['DATE'])
    df["Year"] = df['DATE'].dt.year
    df['Month'] = df["DATE"].dt.month
    final_df = pd.DataFrame(columns=['CODE', 'DATE', 'PP'])
    for year in df["Year"].unique().tolist():
        year_df = df[df["Year"] == year]
        for code in year_df["CODE"].unique().tolist():
            specific_code_df = year_df[year_df['CODE'] == code]
            for month in specific_code_df['Month'].unique().tolist():
                specific_by_month_df = specific_code_df[specific_code_df['Month'] == month]
                full_months = len(specific_by_month_df)
                if full_months < 28:  # skip partial months
                    final_df = pd.concat([final_df, specific_by_month_df])
                    continue
                null_counts = specific_by_month_df['PP'].isna().sum()
                if null_counts == 0:
                    pp_column = list(specific_by_month_df['PP'])
                    # 3 consecutive NaNs at positions 1-3
                    for i in range(1, 4):
                        pp_column[i] = np.nan
                    # 2 more NaNs, one in each half of the month
                    rand_index_1 = randint(5, (len(pp_column) // 2) - 1)
                    rand_index_2 = randint(len(pp_column) // 2, len(pp_column) - 1)
                    pp_column[rand_index_1] = np.nan
                    pp_column[rand_index_2] = np.nan
                    specific_by_month_df['PP'] = pp_column
                elif null_counts == 1:
                    pass
                elif null_counts == 2:
                    pass
                elif null_counts == 3:
                    pass
                elif null_counts == 4:
                    pass
                else:
                    pass
                final_df = pd.concat([final_df, specific_by_month_df])
    final_df.to_csv("final_df.csv", index=False)
This is not complete, but I just wanted to try it out.
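The remaining elif branches would all apply the same rule from the question ("only add the difference to reach 5 NaNs"), so they could collapse into one helper along these lines (a sketch; top_up_nans is a hypothetical name, and it ignores the consecutive-block requirement for brevity):
def top_up_nans(pp_column, total=5):
    # fill random non-NaN positions until the month holds `total` NaNs (hypothetical helper)
    current = sum(pd.isna(v) for v in pp_column)
    candidates = [i for i, v in enumerate(pp_column) if not pd.isna(v)]
    for i in np.random.choice(candidates, size=max(0, total - current), replace=False):
        pp_column[i] = np.nan
    return pp_column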
Try the following; it will work as long as the main conditions are not broken, that is, there are no more than 3 consecutive NaNs and no more than 5 in total. If you want to handle those cases too, add a bunch of ifs and it will do the job, but I think this is enough. If you want to group only by 'CODE', just change df.groupby([df['CODE'], df['date'].dt.month]) to df.groupby(['CODE']). Let me know if it works ;)
from random import randrange
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from pandas import Series

def rand(start, stop, *exclude):
    # draw a random index, retrying while it falls in the excluded set
    start_date = datetime.now()
    x = randrange(start, stop)
    exc = [0 if i < 0 else i for i in exclude]
    while x in exc:
        x = rand(start, stop)
        assert datetime.now() - start_date < timedelta(seconds=5), 'Took TOO long!!!'
    return x

def getPositions(L: list) -> list:
    # flatten a nested list of positions
    tot = []
    for x in L:
        if not isinstance(x, list):
            tot.append(x)
        else:
            tot += getPositions(x)
    return tot

def addNan(values: list, cons_nan: list = [], non_cons_nan: list = []) -> list:
    temp_non_cons = non_cons_nan[:]
    temp_cons = cons_nan[:]
    while len(temp_non_cons) + len(temp_cons) < 5:
        # place the non-consecutive NaNs, avoiding the consecutive block
        while len(temp_non_cons) < 2:
            exc = temp_non_cons[:]
            for i in temp_cons:
                exc += [i-1, i, i+1]
            temp_non_cons.append(rand(0, len(values), *exc))
        # grow the consecutive block up to 3, adjacent to what is already placed
        while len(temp_cons) < 3:
            exc = []
            for i in temp_non_cons:
                exc += [i-1, i, i+1]
            exc += temp_cons
            non_exc = []
            for i in temp_cons:
                non_exc += [i-1, i, i+1]
            if non_exc:
                for i in range(0, len(values)):
                    if i not in non_exc:
                        exc.append(i)
            try:
                temp_cons.append(rand(0, len(values), *exc))
            except AssertionError:
                break
        else:
            break
        # the consecutive block could not be completed: start over
        temp_non_cons, temp_cons = non_cons_nan[:], cons_nan[:]
    for i in temp_non_cons + temp_cons:
        values[i] = np.nan
    return values

def countNan(values: Series, x: int = 0) -> list:
    # positions of the leading run of NaNs
    if values.empty or not np.isnan(values.iloc[0]):
        return []
    else:
        temp = x + 1
        return [x] + countNan(values[1:], temp)

def runColumn(values: Series, count: int = 0) -> list:
    # collect the positions of every NaN run in the column
    if values.empty:
        return []
    else:
        x = countNan(values)
        if not x:
            i = 0
            temp = count + 1
            return runColumn(values[i + 1:], temp)
        else:
            i = x[-1]
            temp = count + i + 1
            return [[i + count for i in x]] + runColumn(values[i + 1:], temp)

if __name__ == '__main__':
    # Create the DF to analyze:
    # ...
    df['PP'] = df['PP'].apply(lambda x: np.nan if x == 0 else x)
    groups = df.groupby([df['CODE'], df['date'].dt.month])
    list2concat = []
    for i, group in groups:
        nans_positions = runColumn(group['PP'])
        # 'nans_positions' will be a list like [[x1, x1+1], [x2], [x3, x3+1, x3+2]]
        # where each 'x' is a position that already holds a NaN
        nans_positions.sort(reverse=True)
        if len(nans_positions) > 2:
            nans_positions = (nans_positions[0], getPositions(nans_positions[1:]))
        group['PP'] = addNan(list(group['PP']), *nans_positions)
        list2concat.append(group)
    nan_df = pd.concat(list2concat, ignore_index=True)
    print(nan_df)

Get n rows before specific value in pandas

Say I have the following dataframe:
import pandas as pd
data = {'val': [3.2, 2.4, -2.3, -4.9, 3.2, 2.4, -2.3, -4.9, 2.4, -2.3, -4.9],
        'label': [0, 2, 1, -1, 1, 2, -1, -1, 1, 1, -1]}
df = pd.DataFrame(data)
df
val label
0 3.2 0
1 2.4 2
2 -2.3 1
3 -4.9 -1
4 3.2 1
5 2.4 2
6 -2.3 -1
7 -4.9 -1
8 2.4 1
9 -2.3 1
10 -4.9 -1
I want to take each n (for example 2) rows before a -1 value in the label column. In the given df, the first -1 appears at index 3; we take the 2 rows before it and drop index 3 itself. The next -1 appears at index 6; we again keep the 2 rows before it, and so on. The desired output is the following:
val label
1 2.4 2
2 -2.3 1
4 3.2 1
5 2.4 2
6 -2.3 -1
8 2.4 1
9 -2.3 1
Thanks for any ideas!
You can get the index values and then get the previous two row index values:
idx = df[df.label == -1].index
filtered_idx = (idx - 1).union(idx - 2)
filtered_idx = filtered_idx[filtered_idx >= 0]   # drop positions that fall before the start of the frame
df_new = df.iloc[filtered_idx]
output:
val label
1 2.4 2
2 -2.3 1
4 3.2 1
5 2.4 2
6 -2.3 -1
8 2.4 1
9 -2.3 1
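For a general window size n, the same idea extends by unioning idx - 1 through idx - n (a sketch; it assumes the default RangeIndex, so index labels and positions coincide):
from functools import reduce

n = 2
idx = df[df.label == -1].index
keep = reduce(lambda acc, k: acc.union(idx - k), range(1, n + 1), pd.Index([], dtype='int64'))
keep = keep[keep >= 0]          # drop positions falling before the start of the frame
df_new = df.iloc[keep]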
Speed comparison with a for-loop solution:
# create large df:
import numpy as np
df = pd.DataFrame(np.random.random((20000000, 2)), columns=["val", "label"])
df.loc[df.sample(frac=0.01).index, "label"] = -1

def vectorized_filter(df):
    idx = df[df.label == -1].index
    filtered_idx = (idx - 1).union(idx - 2)
    df_new = df.iloc[filtered_idx]
    return df_new

def loop_filter(df):
    filter = df.loc[df['label'] == -1].index
    req_idx = []
    for idx in filter:
        if idx == 0:
            continue
        elif idx == 1:
            req_idx.append(idx-1)
        else:
            req_idx.append(idx-2)
            req_idx.append(idx-1)
    req_idx = list(set(req_idx))
    df2 = df.loc[df.index.isin(req_idx)]
    return df2
%timeit vectorized_filter(df)
%timeit loop_filter(df)
vectorized runs ~20x faster on my machine
Here's a solution (DataFrame.append was removed in pandas 2.0, so pd.concat is used to collect the slices):
new_df = pd.DataFrame()
markers = df[df.label.eq(-1)].index
for marker in markers:
    new_df = pd.concat([new_df, df[marker-2:marker]])
new_df.reset_index().drop_duplicates().set_index("index")
Result:
val label
index
1 2.4 2
2 -2.3 1
4 3.2 1
5 2.4 2
6 -2.3 -1
8 2.4 1
9 -2.3 1
filter = df.loc[df['label'] == -1].index
req_idx = []
for idx in filter:
    if idx == 0:
        continue
    elif idx == 1:
        req_idx.append(idx-1)
    else:
        req_idx.append(idx-2)
        req_idx.append(idx-1)
req_idx = list(set(req_idx))
df2 = df.loc[df.index.isin(req_idx)]
print(df2)
Output:
val label
1 2.4 2
2 -2.3 1
4 3.2 1
5 2.4 2
6 -2.3 -1
8 2.4 1
9 -2.3 1
This should also work if you have the label as -1 in the first two rows

Pandas new column based on multiple criteria and columns

I want to create a new column for a big table using several criteria and columns, and I'm not sure of the best way to approach it.
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'D'],
                   'b': ['y', 'n', 'y', 'n', np.nan],
                   'c': [10, 20, 10, 40, 30],
                   'd': [.3, .1, .4, .2, .1]})
df.head()
def fun(df=df):
    df = df.copy()
    if df.a=='A' & df.b=='n':
        df['new_Col'] = df.c + df.d
    if df.a=='A' & df.b=='y':
        df['new_Col'] = df.d * 2
    else:
        df['new_Col'] = 0
    return df
fun()
OR
def fun(df=df):
    df = df.copy()
    if df.a=='A' & df.b=='n':
        return df.c + df.d
    if df.a=='A' & df.b=='y':
        return df.d * 2
    else:
        return 0
df['new_Col'] = df.apply(fun)
OR using np.where:
df['new_Col'] = np.where(df.a=='A' & df.b =='n', df.c+df.d,0 )
df['new_Col'] = np.where(df.a=='A' & df.b =='y', df.d *2,0 )
Looks like you need np.select
a, n, y = df.a.eq('A'), df.b.eq('n'), df.b.eq('y')
df['result'] = np.select([a & n, a & y], [df.c + df.d, df.d*2], default=0)
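For the sample frame above this gives (a quick sanity check; only row 0 matches a condition):
print(df)
   a    b   c    d  result
0  A    y  10  0.3     0.6
1  B    n  20  0.1     0.0
2  B    y  10  0.4     0.0
3  C    n  40  0.2     0.0
4  D  NaN  30  0.1     0.0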
This is an arithmetic way (I added one more row to your sample for the case a = 'A' and b = 'n'):
sample
Out[1369]:
a b c d
0 A y 10 0.3
1 B n 20 0.1
2 B y 10 0.4
3 C n 40 0.2
4 D NaN 30 0.1
5 A n 50 0.9
nc = df.a.eq('A') & df.b.eq('y')   # mask for the d * 2 case
mc = df.a.eq('A') & df.b.eq('n')   # mask for the c + d case
nr = df.d * 2
mr = df.c + df.d
df['new_col'] = nc*nr + mc*mr      # booleans act as 0/1 weights, so unmatched rows get 0
Out[1371]:
a b c d new_col
0 A y 10 0.3 0.6
1 B n 20 0.1 0.0
2 B y 10 0.4 0.0
3 C n 40 0.2 0.0
4 D NaN 30 0.1 0.0
5 A n 50 0.9 50.9

How do I calculate a moving average with customized weights in pandas?

I have a dataframe that contains two columns, a: [1,2,3,4,5]; b: [1,0.4,0.3,0.5,0.2]. How can I make a column c such that:
c[0] = 1
c[i] = c[i-1]*b[i]+a[i]*(1-b[i])
so that c = [1, 1.6, 2.58, 3.29, 4.658].
Calculation:
1 = 1
1*0.4 + 2*0.6 = 1.6
1.6*0.3 + 3*0.7 = 2.58
2.58*0.5 + 4*0.5 = 3.29
3.29*0.2 + 5*0.8 = 4.658
I can't see a way to vectorise your recursive algorithm. However, you can use numba to optimize your current logic. This should be preferable to a regular loop.
import numpy as np
import pandas as pd
from numba import jit

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [1, 0.4, 0.3, 0.5, 0.2]})

@jit(nopython=True)
def foo(a, b):
    c = np.zeros(a.shape)
    c[0] = 1
    for i in range(1, c.shape[0]):
        c[i] = c[i-1] * b[i] + a[i] * (1 - b[i])
    return c

df['c'] = foo(df['a'].values, df['b'].values)
print(df)
a b c
0 1 1.0 1.000
1 2 0.4 1.600
2 3 0.3 2.580
3 4 0.5 3.290
4 5 0.2 4.658
There could be a smarter way, but here's my attempt:
import pandas as pd
a = [1,2,3,4,5]
b = [1,0.4,0.3,0.5,0.2]
df = pd.DataFrame({'a':a , 'b': b})
for i in range(len(df)):
    if i == 0:
        df.loc[i, 'c'] = 1
    else:
        df.loc[i, 'c'] = df.loc[i-1, 'c'] * df.loc[i, 'b'] + df.loc[i, 'a'] * (1 - df.loc[i, 'b'])
Output:
a b c
0 1 1.0 1.000
1 2 0.4 1.600
2 3 0.3 2.580
3 4 0.5 3.290
4 5 0.2 4.658

How to count the number of time intervals that meet a boolean condition within a pandas dataframe?

I have a pandas df with a time series in column1, and a boolean condition in column2. This describes continuous time intervals that meet a specific condition. Note that the time intervals are of unequal length.
Timestamp Boolean_condition
1 1
2 1
3 0
4 1
5 1
6 1
7 0
8 0
9 1
10 0
How to count the total number of time intervals within the whole series that meet this condition?
The desired output should look like this:
Timestamp Boolean_condition Event_number
1 1 1
2 1 1
3 0 NaN
4 1 2
5 1 2
6 1 2
7 0 NaN
8 0 NaN
9 1 3
10 0 NaN
You can create a Series from the cumulative sum of two masks and then set NaN with Series.mask:
mask0 = df.Boolean_condition.eq(0)
mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
print ((mask2 & mask0).cumsum().add(1))
0 1
1 1
2 2
3 2
4 2
5 2
6 3
7 3
8 3
9 4
Name: Boolean_condition, dtype: int32
df['Event_number'] = (mask2 & mask0).cumsum().add(1).mask(mask0)
print (df)
Timestamp Boolean_condition Event_number
0 1 1 1.0
1 2 1 1.0
2 3 0 NaN
3 4 1 2.0
4 5 1 2.0
5 6 1 2.0
6 7 0 NaN
7 8 0 NaN
8 9 1 3.0
9 10 0 NaN
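If you prefer integer event numbers alongside the missing values, the float column can be cast to pandas' nullable integer dtype (an optional step; Int64 requires pandas >= 0.24):
df['Event_number'] = df['Event_number'].astype('Int64')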
Timings:
# [100000 rows x 2 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()

def nick(df):
    isone = df.Boolean_condition[df.Boolean_condition.eq(1)]
    idx = isone.index
    grp = (isone != idx.to_series().diff().eq(1)).cumsum()
    df.loc[idx, 'Event_number'] = pd.Categorical(grp).codes + 1
    return df

def jez(df):
    mask0 = df.Boolean_condition.eq(0)
    mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
    df['Event_number'] = (mask2 & mask0).cumsum().add(1).mask(mask0)
    return df

def jez1(df):
    mask0 = ~df.Boolean_condition.astype(bool)   # cast to bool so ~ is a logical NOT, not bitwise on ints
    mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
    df['Event_number'] = (mask2 & mask0).cumsum().add(1).mask(mask0)
    return df
In [68]: %timeit (jez1(df))
100 loops, best of 3: 6.45 ms per loop
In [69]: %timeit (nick(df1))
100 loops, best of 3: 12 ms per loop
In [70]: %timeit (jez(df2))
100 loops, best of 3: 5.34 ms per loop
You could try the following:
1) Get all values of the True instance (here, 1), which comprises isone.
2) Take its corresponding set of indices and convert them to a series representation, so that the new series has both its index and values equal to the earlier computed indices. Take the difference between successive rows and check whether it equals 1. This becomes our boolean mask.
3) Compare isone with the obtained boolean mask; whenever they are not equal, take their cumulative sum (an adjacency check between elements). This helps with grouping.
4) Using loc with the indices of isone, assign the codes computed after converting the grp array to Categorical format to the newly created column, Event_number.
isone = df.Boolean_condition[df.Boolean_condition.eq(1)]
idx = isone.index
grp = (isone != idx.to_series().diff().eq(1)).cumsum()
df.loc[idx, 'Event_number'] = pd.Categorical(grp).codes + 1
Faster approach:
Using only numpy:
1) Get the column's array representation.
2) Compute the indices of the non-zero (here, 1) elements.
3) Insert NaN at the beginning of this array to act as a starting point for taking differences between successive rows.
4) Initialize a new array filled with NaNs, with the same shape as the original array.
5) Whenever the difference between successive rows is not equal to 1, take the cumulative sum; otherwise the elements fall in the same group. These values get imputed at the indices that held 1's before.
6) Assign these back to the new column.
def nick(df):
    b = df.Boolean_condition.values
    slc = np.flatnonzero(b)
    slc_pl_1 = np.append(np.nan, slc)
    nan_arr = np.full(b.size, fill_value=np.nan)
    nan_arr[slc] = np.cumsum(slc_pl_1[1:] - slc_pl_1[:-1] != 1)
    df['Event_number'] = nan_arr
    return df
Timings:
For a DF of 10,000 rows:
np.random.seed(42)
df1 = pd.DataFrame(dict(
    Timestamp=np.arange(10000),
    Boolean_condition=np.random.choice(np.array([0, 1]), 10000, p=[0.4, 0.6]))
)
df1.shape
# (10000, 2)
def jez(df):
    mask0 = df.Boolean_condition.eq(0)
    mask2 = df.Boolean_condition.ne(df.Boolean_condition.shift(1))
    df['Event_number'] = (mask2 & mask0).cumsum().mask(mask0)
    return df
nick(df1).equals(jez(df1))
# True
%%timeit
nick(df1)
1000 loops, best of 3: 362 µs per loop
%%timeit
jez(df1)
100 loops, best of 3: 1.56 ms per loop
For a DF containing 1 million rows:
np.random.seed(42)
df1 = pd.DataFrame(dict(
    Timestamp=np.arange(1000000),
    Boolean_condition=np.random.choice(np.array([0, 1]), 1000000, p=[0.4, 0.6]))
)
df1.shape
# (1000000, 2)
nick(df1).equals(jez(df1))
# True
%%timeit
nick(df1)
10 loops, best of 3: 34.9 ms per loop
%%timeit
jez(df1)
10 loops, best of 3: 50.1 ms per loop
This should work but might be a bit slow for a very long df.
df = pd.concat([df, pd.Series([0]*len(df), name='2')], axis=1)
if df.iloc[0, 1] == 1:
    counter = 1
    df.iloc[0, 2] = counter
else:
    counter = 0
    df.iloc[0, 2] = 0
previous = df.iloc[0, 1]
for y, x in df.iloc[1:, ].iterrows():
    if x[1] == 1 and previous == 1:
        previous = x[1]
        df.iloc[y, 2] = counter
    if x[1] == 0:
        previous = x[1]
        df.iloc[y, 2] = 0
    if x[1] == 1 and previous == 0:
        counter += 1
        previous = x[1]
        df.iloc[y, 2] = counter
A custom function does the trick. Here is a solution in MATLAB code:
Boolean_condition = [1 1 0 1 1 1 0 0 1 0];
Event_number = NaN(1, 10);
loop_event_number = 1;
last_event_number = 0;  % initialized so the loop is safe if the series starts with 0
for timestamp = 1:10
    if Boolean_condition(timestamp) == 1
        Event_number(timestamp) = loop_event_number;
        last_event_number = loop_event_number;
    else
        loop_event_number = last_event_number + 1;
    end
end
% Event_number = 1 1 NaN 2 2 2 NaN NaN 3 NaN
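For completeness, a direct pandas port of the same idea might look like this (a sketch; it assumes the question's df with a Boolean_condition column):
import numpy as np
import pandas as pd

def label_events(df):
    # straight port of the MATLAB loop above
    event = np.full(len(df), np.nan)
    loop_event_number = 1
    last_event_number = 0
    for t, cond in enumerate(df['Boolean_condition']):
        if cond == 1:
            event[t] = loop_event_number
            last_event_number = loop_event_number
        else:
            loop_event_number = last_event_number + 1
    df['Event_number'] = event
    return df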
