Streaking cumulative count with cancellation - python

I am trying to create a count that accumulates streaks but can be cancelled by a different column. There are three outcomes in this count:
The streak accumulates when flag == true.
The streak resets when cancel == true.
Otherwise the streak does nothing and repeats the current value.
I have tried several different approaches to combine the flag and cancel, using np.where, masking groupby with where, multiple cumsums, fills, and ngroup, but cannot get the result I want.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "cond1": [True, False, True, False, True, False, True],
        "cond2": [False, False, False, True, False, False, False],
    }
)
df['flag'] = np.where(df['cond1'], 1, 0)
df['cancel'] = np.where(df['cond2'], 1, 0)
# Combined
df['combined'] = df['flag'] - df['cancel']
# Cumsum only
df['cumsum'] = df['combined'].cumsum()
# Cumcount masked by where
df['cumsum_cumcount'] = df.where(df['cond1']).groupby((df['cond2']).cumsum()).cumcount()
# Cumcount then cumsum
df['cumsum_cumcount_cumsum'] = df.where(df['cancel'] == False).groupby(df['flag'].cumsum()).cumcount().cumsum()
cond1 cond2 flag cancel c2 c3 c1
0 True False 1 0 0 0 1
1 False False 0 0 1 1 1
2 True False 1 0 2 1 2
3 False True 0 1 0 2 1
4 True False 1 0 1 2 2
5 False False 0 0 2 3 2
6 True False 1 0 3 3 3
Desired output:
cond1 cond2 streak
0 True False 1
1 False False 1
2 True False 2
3 False True 0
4 True False 1
5 False False 1
6 True False 2
7 True False 3
8 False False 3
9 True False 4
10 False True 0
11 False False 0
12 True False 1
The current streak repeats, accumulates when cond1 is true, and resets when cond2 is true. Big bonus points if this could also accumulate in the opposite direction without too much hassle, with cancels counted as negatives and flags as positives.

It seems you need a cumsum on cond2 to create the group key, then a cumsum on cond1 within each group:
df.groupby(df.cond2.cumsum()).cond1.cumsum()
Out[155]:
0 1.0
1 1.0
2 2.0
3 0.0
4 1.0
5 1.0
6 2.0
7 3.0
8 3.0
9 4.0
10 0.0
11 0.0
12 1.0
Name: cond1, dtype: float64
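As a self-contained sketch of the same idea, the snippet below rebuilds the 13-row frame from the desired-output table in the question and applies the grouped cumsum. The names reset_group and streak are illustrative, and the cast to int is an assumption made only to keep the result an integer column.

```python
import pandas as pd

# Sample data copied from the desired-output table in the question.
df = pd.DataFrame({
    "cond1": [True, False, True, False, True, False, True,
              True, False, True, False, False, True],
    "cond2": [False, False, False, True, False, False, False,
              False, False, False, True, False, False],
})

# Each True in cond2 starts a new group, so the running total restarts there.
reset_group = df["cond2"].cumsum()

# Casting to int is an assumption for tidier output; the answer above sums
# the booleans directly.
df["streak"] = df["cond1"].astype(int).groupby(reset_group).cumsum()

print(df)
```

Rows where neither condition is true simply carry the previous total forward, because adding 0 to a running sum leaves it unchanged.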

Related

Pandas expanding count when column value changes

I have a dataframe
df = pd.DataFrame({'A': [True, True, False, False, False, False, False, True, True, True, False]})
A
0 True
1 True
2 False
3 False
4 False
5 False
6 False
7 True
8 True
9 True
10 False
I want to apply a count which expands using two criteria: each time column A changes value, or when succeeding rows are False. If succeeding rows are True the count should hold static. The desired output would be:
A B
0 True 1
1 True 1
2 False 2
3 False 3
4 False 4
5 False 5
6 False 6
7 True 7
8 True 7
9 True 7
10 False 8
I've faffed with a whole range of pandas functions and can't seem to figure it out.
Try:
1st condition: each time column A changes value: df.ne(df.shift())
2nd condition: when succeeding rows are False: df.eq(False)
and do a cumsum over the boolean mask:
>>> (df.ne(df.shift()) | df.eq(False)).cumsum()
A
0 1
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 7
9 7
10 8
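For reference, here is a minimal runnable sketch of the two masks applied to the single column from the question. The intermediate names changed and is_false are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, True, False, False, False, False,
                         False, True, True, True, False]})
a = df["A"]

# True wherever the value differs from the previous row. The first row is
# compared against a missing value, so it counts as a change.
changed = a.ne(a.shift())

# True on every False row, so runs of False keep incrementing the count.
is_false = a.eq(False)

# Each True in the combined mask bumps the running count by one.
df["B"] = (changed | is_false).cumsum()

print(df)
```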

Group boolean values in Pandas Dataframe

I have a DataFrame with a random series of True/False values in a column:
import pandas as pd
df = pd.DataFrame(data={'A':[True, False, True, True, False, False, True, False, True, False, False]})
df
A
0 True
1 False
2 True
3 True
4 False
5 False
6 True
7 False
8 True
9 False
10 False
and I want this (I don't know how to explain it in simple words):
A B
0 True 1
1 False 2
2 True 2
3 True 2
4 False 3
5 False 3
6 True 3
7 False 4
8 True 4
9 False 5
10 False 5
I've tried something with the following commands, but without success:
df['A'].shift()
df['A'].diff()
df['A'].eq()
Many thanks for your help.
Matthias
IIUC, you can try:
df['B'] = (df.A.shift() & ~df.A).cumsum() + 1
# OR df['B'] = (df.A.shift() & ~df.A).cumsum().add(1)
OUTPUT:
A B
0 True 1
1 False 2
2 True 2
3 True 2
4 False 3
5 False 3
6 True 3
7 False 4
8 True 4
9 False 5
10 False 5
A little bit of logic with diff:
(~df.A.astype(int).diff().ne(-1)).cumsum()+1
Out[234]:
0 1
1 2
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 5
10 5
Name: A, dtype: int32
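Both answers increment the counter on every True-to-False transition. The sketch below runs the two variants side by side on the question's data. It uses shift(fill_value=False) so the shifted column stays boolean, which is a small deviation from the first answer, and the names prev, via_shift and via_diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False, True, True, False, False,
                         True, False, True, False, False]})
a = df["A"]

# Previous row's value; the first row has no predecessor, so treat it as False.
prev = a.shift(fill_value=False)

# Variant 1: previous row True and current row False marks a new group.
via_shift = (prev & ~a).cumsum() + 1

# Variant 2: a difference of -1 on the integer view is the same transition.
via_diff = a.astype(int).diff().eq(-1).cumsum() + 1

df["B"] = via_shift
print(df)
```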

How to create a Boolean Series based on the logic: (0 followed by a 1 is True. A 1 preceded by a 0 is True. All others are False)

I have the following DF. I'm Trying to make a Boolean Series where the logic is:
(0 followed by a 1 is True. A 1 preceded by a 0 is True. All others are False)
Here is the DataFrame
df = pd.DataFrame({'A': {0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1, 8: 1, 9: 1, 10: 0, 11: 0, 12: 1}})
A
0 1
1 0
2 1
3 1
4 1
5 0
6 1
7 1
8 1
9 1
10 0
11 0
12 1
Expected output (0 followed by a 1 is True. A 1 preceded by a 0 is True. All others are False):
A Truth
0 1 False
1 0 True
2 1 True
3 1 False
4 1 False
5 0 True
6 1 True
7 1 False
8 1 False
9 1 False
10 0 False
11 0 True
12 1 True
My output using df['Truth'] = df['A'] == 0 | ( (df['A'].shift() == 0) & (df['A'] == 1) ):
A Truth
0 1 False
1 0 True
2 1 True
3 1 False
4 1 False
5 0 True
6 1 True
7 1 False
8 1 False
9 1 False
10 0 True
11 0 True
12 1 True
I'm getting True on a zero, but a zero should only be True if followed by a one, and not another zero. Any help would be appreciated. Thanks.
Try:
cond1 = df['A'].diff().shift(-1).eq(1).where(df['A']==0)
df['Truth'] = df['A'].diff().eq(1).where(df['A'] == 1).fillna(cond1).astype('bool')
print(df)
Output:
A Truth
0 1 False
1 0 True
2 1 True
3 1 False
4 1 False
5 0 True
6 1 True
7 1 False
8 1 False
9 1 False
10 0 False
11 0 True
12 1 True
Check condition 1 and only set it where A == 0, then check condition 2 and only set it where A == 1. Use fillna to combine the two conditions.
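The two conditions can also be written directly as a pair of shifted comparisons. This is an alternative sketch rather than the answer's method, and the names zero_then_one and one_after_zero are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]})
a = df["A"]

# Condition 1: a 0 whose next row is 1. The last row has no successor,
# so the comparison against the missing value is False.
zero_then_one = a.eq(0) & a.shift(-1).eq(1)

# Condition 2: a 1 whose previous row is 0. The first row has no
# predecessor, so it is False as well.
one_after_zero = a.eq(1) & a.shift().eq(0)

df["Truth"] = zero_then_one | one_after_zero

print(df)
```

Note the parentheses-free form works here because eq is a method call. With the == operator, each comparison has to be wrapped in parentheses, since | and & bind more tightly than ==.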
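In your case, the rolling sum over two rows should equal 1. Note that in the output below this flags row 10 rather than row 11, so it differs from the expected output where zeros are consecutive: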
df.A.rolling(2).sum()==1
0 False
1 True
2 True
3 False
4 False
5 True
6 True
7 False
8 False
9 False
10 True
11 False
12 True
You can use your logic:
df['A'] != df['A'].shift(fill_value=df['A'].iloc[0])
Output:
0 False
1 True
2 True
3 False
4 False
5 True
6 True
7 False
8 False
9 False
10 True
11 False
12 True
Name: A, dtype: bool

Problems while creating a two column based index in a new pandas column?

Given the following dataframe:
col_1 col_2
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 1
False 2
True 2
False 2
False 2
True 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
False 2
How can I create a new index that helps to identify when a True value is present in col_1? That is, when a True value appears in the first column, I would like to fill the new column backward with a number starting from one. For example, this is the expected output for the above dataframe:
col_1 col_2 new_id
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 1 1
False 2 1
True 2 1 --------- ^ (fill with 1 and increase the counter)
False 2 2
False 2 2
True 2 2 --------- ^ (fill with 2 and increase the counter)
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
False 2 3
True 2 4 --------- ^ (fill with 3 and increase the counter)
The problem is that I do not know how to create the id, although I know that pandas provides a bfill method that may help to achieve this. So far I have tried to iterate with a simple for loop:
count = 0
for index, row in df.iterrows():
    if row['col_1'] == False:
        print(count+1)
    else:
        print(row['col_2'] + 1)
However, I do not know how to increase the counter to the next number. Also I tried to create a function and then apply it to the dataframe:
def create_id(col_1, col_2):
    counter = 0
    if col_1 == True and col_2.bool() == True:
        return counter + 1
    else:
        pass
Nevertheless, I lose control of filling the column backward.
Just use cumsum:
df['new_id']=(df.col_1.cumsum().shift().fillna(0)+1).astype(int)
df
Out[210]:
col_1 col_2 new_id
0 False 1 1
1 False 1 1
2 False 1 1
3 False 1 1
4 False 1 1
5 False 1 1
6 False 1 1
7 False 1 1
8 False 1 1
9 False 1 1
10 False 1 1
11 False 1 1
12 False 1 1
13 False 1 1
14 False 2 1
15 True 2 1
16 False 2 2
17 False 2 2
18 True 2 2
19 False 2 3
20 False 2 3
21 False 2 3
22 False 2 3
23 False 2 3
24 False 2 3
25 False 2 3
26 False 2 3
27 False 2 3
28 False 2 3
29 False 2 3
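A self-contained version of this approach is sketched below, rebuilding the 30-row frame from the question. It uses shift(fill_value=0) in place of shift().fillna(0) so the column stays an integer throughout. That substitution is an assumption for brevity and gives the same result.

```python
import pandas as pd

# col_1 is True only at positions 15 and 18, as in the question.
col_1 = [False] * 30
col_1[15] = True
col_1[18] = True
df = pd.DataFrame({"col_1": col_1, "col_2": [1] * 14 + [2] * 16})

# Number of True values seen so far, including the current row.
seen = df["col_1"].cumsum()

# Shifting by one row makes the True row itself keep the old id, so the
# counter only increases on the row after each True.
df["new_id"] = seen.shift(fill_value=0) + 1

print(df)
```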
If you aim to append the new_id column to your dataframe:
new_id = []
counter = 1
for index, row in df.iterrows():
    new_id += [counter]
    if row['col_1'] == True:
        counter += 1
df['new_id'] = new_id

deleting rows based on number of true values per group - Python

I am trying to delete rows based on groupby and number of True values.
Per group, if they have only one true value (sum() = 1), I would like that single row deleted.
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3], 'value': [True, True, False, True, False, False, False, False, True]})
print (df)
id value
0 1 True
1 1 True
2 1 False
3 2 True
4 2 False
5 2 False
6 3 False
7 3 False
8 3 True
df.groupby('id')['value'].sum()
Out[571]:
id
1 2.0
2 1.0
3 1.0
ids 2 and 3 match the criteria, but how do I delete those single True rows so that the dataframe becomes:
print (df)
id value
0 1 True
1 1 True
2 1 False
3 2 False
4 2 False
5 3 False
6 3 False
You can use a Boolean mask:
m1 = df.groupby('id')['value'].transform('sum') == 1
m2 = df['value']
df = df[~(m1 & m2)].reset_index(drop=True)
print(df)
id value
0 1 True
1 1 True
2 1 False
3 2 False
4 2 False
5 3 False
6 3 False
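The mask approach can be run end to end as follows. The names true_count, only_one_true and result are illustrative, and the cast to int before summing is an assumption made so the per-group total is an integer regardless of pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "value": [True, True, False, True, False, False, False, False, True],
})

# Per-row count of True values within the row's own id group.
true_count = df["value"].astype(int).groupby(df["id"]).transform("sum")

# Rows that belong to a group with exactly one True.
only_one_true = true_count == 1

# Drop only the True row of such groups and keep everything else.
result = df[~(only_one_true & df["value"])].reset_index(drop=True)

print(result)
```

Because transform returns a value for every row rather than one per group, the mask lines up with the original index and can be combined with the value column directly.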
