Pandas expanding count when column value changes - python

I have a dataframe
df = pd.DataFrame({'A': [True, True, False, False, False, False, False, True, True, True, False]})
A
0 True
1 True
2 False
3 False
4 False
5 False
6 False
7 True
8 True
9 True
10 False
I want to apply a count which expands using two criteria: each time column A changes value, or when succeeding rows are False. If succeeding rows are True the count should hold static. The desired output would be:
A B
0 True 1
1 True 1
2 False 2
3 False 3
4 False 4
5 False 5
6 False 6
7 True 7
8 True 7
9 True 7
10 False 8
I've faffed with a whole range of pandas functions and can't seem to figure it out.

Try:
1st condition: each time column A changes value: df.ne(df.shift())
2nd condition: when succeeding rows are False: df.eq(False)
and do a cumsum over the boolean mask:
>>> (df.ne(df.shift()) | df.eq(False)).cumsum()
A
0 1
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 7
9 7
10 8
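The two conditions can also be written as named masks on the column itself, which makes it easier to assign the result as column B. This is a minimal sketch of the same idea; the names `changed` and `is_false` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, False, False, False, False, False,
                         True, True, True, False]})

# True wherever A differs from the previous row (the first row compares
# against NaN, so it always counts as a change).
changed = df['A'].ne(df['A'].shift())

# True on every False row, so consecutive False rows keep incrementing.
is_false = ~df['A']

# Each True in the combined mask bumps the running count by one.
df['B'] = (changed | is_false).cumsum()
print(df)
```

Runs of True only set `changed` on their first row, so the count holds steady for the rest of the run.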

Related

Pandas, create column using previous new column value

I am using Python and have the following Pandas Dataframe:
idx result grouping
1 False
2 True
3 True
4 False
5 True
6 True
7 True
8 False
9 True
10 True
11 True
12 True
What I would like is to do the following logic...
if the result is False then I want grouping to be the idx value.
if the result is True then I want the grouping to be the previous grouping value
So the end result will be:
idx result grouping
1 False 1
2 True 1
3 True 1
4 False 4
5 True 4
6 True 4
7 True 4
8 False 8
9 True 8
10 True 8
11 True 8
12 True 8
I have tried all sorts to get this working from using the Pandas shift() command to using lambda, but I am just not getting it.
I know I could iterate through the dataframe and perform the calculation but there has to be a better method.
examples of what I have tried and failed with are:
df['grouping'] = df['idx'] if not df['result'] else df['grouping'].shift(1)
df['grouping'] = df.apply(lambda x: x['idx'] if not x['result'] else x['grouping'].shift(1), axis=1)
Many Thanks for any assistance you can provide.
Mask the True values in idx (they become NaN), then forward fill from the last False row:
df['grouping'] = df['idx'].mask(df['result']).ffill(downcast='infer')
idx result grouping
0 1 False 1
1 2 True 1
2 3 True 1
3 4 False 4
4 5 True 4
5 6 True 4
6 7 True 4
7 8 False 8
8 9 True 8
9 10 True 8
10 11 True 8
11 12 True 8
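For reference, here is a self-contained sketch of the same mask-then-ffill idea that avoids the `downcast` argument, which recent pandas releases (2.2 onward, as far as I can tell) warn about. The cast to the nullable `Int64` dtype is my own choice, not part of the original answer; it keeps whole numbers and tolerates a leading True row, which would otherwise have nothing to fill from.

```python
import pandas as pd

df = pd.DataFrame({
    'idx': range(1, 13),
    'result': [False, True, True, False, True, True,
               True, False, True, True, True, True],
})

# mask() blanks out idx wherever result is True, leaving NaN there.
masked = df['idx'].mask(df['result'])

# ffill() carries the last surviving idx (a False row) forward.
# Masking turned the column into floats, so cast back to integers.
df['grouping'] = masked.ffill().astype('Int64')
print(df)
```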

Fill grouping variable pandas dataframe

I have a pandas dataframe with an id column called doc_ID and a boolean column that reports if a certain value is below a threshold, like so:
df = pd.DataFrame({'doc_ID': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'below_threshold': [False, False, False, False, True, False, False, True, False, False,
False, False, False, False, False, False, False, True, False, False]})
I'm trying to create a new grouping id within each doc_ID that would extend from the first False value in order until and including the first True value. Something like this
doc_ID below_threshold new_group
0 1 False 1
1 1 False 1
2 1 False 1
3 1 False 1
4 1 True 1
5 1 False 2
6 2 False 3
7 2 True 3
8 2 False 4
9 2 False 4
10 2 False 4
11 2 False 4
12 3 False 5
13 3 False 5
14 3 False 5
15 3 False 5
16 3 False 5
17 3 True 5
18 3 False 6
19 3 False 6
IIUC, use:
m1 = ~df['below_threshold']
m2 = df.groupby('doc_ID')['below_threshold'].shift(fill_value=True)
df['new_group'] = (m1&m2).cumsum()
Output:
doc_ID below_threshold new_group
0 1 False 1
1 1 False 1
2 1 False 1
3 1 False 1
4 1 True 1
5 1 False 2
6 2 False 3
7 2 True 3
8 2 False 4
9 2 False 4
10 2 False 4
11 2 False 4
12 3 False 5
13 3 False 5
14 3 False 5
15 3 False 5
16 3 False 5
17 3 True 5
18 3 False 6
19 3 False 6
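Why this works: a new group starts on a False row whose previous row, within the same doc_ID, was True. `fill_value=True` makes the first row of each doc_ID behave as if it followed a True. The sketch below restates the answer with descriptive names (`is_false` and `prev_was_true` are illustrative only).

```python
import pandas as pd

df = pd.DataFrame({
    'doc_ID': [1] * 6 + [2] * 6 + [3] * 8,
    'below_threshold': [False, False, False, False, True, False,
                        False, True, False, False, False, False,
                        False, False, False, False, False, True, False, False],
})

# Candidate group starts: rows that are themselves False.
is_false = ~df['below_threshold']

# Previous row's flag within each doc_ID; the first row of every
# doc_ID is treated as if it followed a True, so it opens a group.
prev_was_true = df.groupby('doc_ID')['below_threshold'].shift(fill_value=True)

# Count the group starts cumulatively to get a running group id.
df['new_group'] = (is_false & prev_was_true).cumsum()
print(df)
```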

Group boolean values in Pandas Dataframe

I have a DataFrame with a random series of True/False values in a column:
import pandas as pd
df = pd.DataFrame(data={'A':[True, False, True, True, False, False, True, False, True, False, False]})
df
A
0 True
1 False
2 True
3 True
4 False
5 False
6 True
7 False
8 True
9 False
10 False
and I want this (I don't know how to explain it in simple words):
A B
0 True 1
1 False 2
2 True 2
3 True 2
4 False 3
5 False 3
6 True 3
7 False 4
8 True 4
9 False 5
10 False 5
I've tried something with the following commands, but without success:
df['A'].shift()
df['A'].diff()
df['A'].eq()
Many thanks for your help.
Matthias
IIUC, you can try:
df['B'] = (df.A.shift() & ~df.A).cumsum() + 1
# OR df['B'] = (df.A.shift() & ~df.A).cumsum().add(1)
OUTPUT:
A B
0 True 1
1 False 2
2 True 2
3 True 2
4 False 3
5 False 3
6 True 3
7 False 4
8 True 4
9 False 5
10 False 5
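Put differently, B increases on every True-to-False transition. The sketch below is a variant of the line above, not the answerer's exact code: it passes `fill_value=False` to `shift()` (my assumption about a reasonable default for the first row) so the shifted column stays boolean instead of picking up a NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [True, False, True, True, False, False,
                         True, False, True, False, False]})

# Previous row's value; the first row has no predecessor, so treat it as False.
prev = df['A'].shift(fill_value=False)

# A "falling edge": the previous row was True and the current row is False.
falling_edge = prev & ~df['A']

# Start counting at 1 and bump the counter on every falling edge.
df['B'] = falling_edge.cumsum() + 1
print(df)
```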
A little bit of logic with diff:
(~df.A.astype(int).diff().ne(-1)).cumsum()+1
Out[234]:
0 1
1 2
2 2
3 2
4 3
5 3
6 3
7 4
8 4
9 5
10 5
Name: A, dtype: int32

Pandas: Check if column value is smaller than any previous column value

I want to check, for each value in column 'c', whether it is smaller than any previous value in that column.
In my current approach I am using pandas diff(), but it only lets me compare to the immediately preceding value.
import pandas as pd
df = pd.DataFrame({'c': [1, 4, 9, 7, 8, 36]})
df['diff'] = df['c'].diff() < 0
print(df)
Current result:
c diff
0 1 False
1 4 False
2 9 False
3 7 True
4 8 False
5 36 False
Wanted result:
c diff
0 1 False
1 4 False
2 9 False
3 7 True
4 8 True
5 36 False
So row 4 should also result in a True, as 8 is smaller than 9.
Thanks
This should work:
df['diff'] = df['c'] < df['c'].cummax()
Output is just as you mentioned:
c diff
0 1 False
1 4 False
2 9 False
3 7 True
4 8 True
5 36 False
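The reason this works: `cummax()` includes the current row, so a value is strictly below the running maximum exactly when some earlier value was larger. A short sketch, with an explicit previous-rows-only maximum (`prev_max`, an illustrative name) for comparison:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 4, 9, 7, 8, 36]})

# Running maximum up to and including the current row.
running_max = df['c'].cummax()
df['diff'] = df['c'] < running_max

# Same check against the maximum of strictly earlier rows only;
# the first row compares against NaN, which evaluates to False.
prev_max = df['c'].cummax().shift()
alt = df['c'] < prev_max

print(df)
print(alt.equals(df['diff']))
```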

deleting rows based on number of true values per group - Python

I am trying to delete rows based on groupby and number of True values.
Per group, if they have only one true value (sum() = 1), I would like that single row deleted.
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3], 'value': [True, True, False, True, False, False, False, False, True]})
print (df)
id value
0 1 True
1 1 True
2 1 False
3 2 True
4 2 False
5 2 False
6 3 False
7 3 False
8 3 True
df.groupby('id')['value'].sum()
Out[571]:
id
1 2.0
2 1.0
3 1.0
ids 2 & 3 match the criteria, but how do I delete those single True rows so that the dataframe becomes:
print (df)
id value
0 1 True
1 1 True
2 1 False
3 2 False
4 2 False
5 3 False
6 3 False
You can use a Boolean mask:
m1 = df.groupby('id')['value'].transform('sum') == 1
m2 = df['value']
df = df[~(m1 & m2)].reset_index(drop=True)
print(df)
id value
0 1 True
1 1 True
2 1 False
3 2 False
4 2 False
5 3 False
6 3 False
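Here `transform('sum')` broadcasts each group's count of True values back onto every row of that group, so it can be combined row by row with the value column. A self-contained sketch with more descriptive names (`true_count` and `lone_true` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'value': [True, True, False, True, False,
                             False, False, False, True]})

# Number of True values in each id, repeated on every row of that id.
true_count = df.groupby('id')['value'].transform('sum')

# Rows that are True and are the only True in their group.
lone_true = df['value'] & true_count.eq(1)

# Keep everything else and renumber the index.
result = df[~lone_true].reset_index(drop=True)
print(result)
```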
