Get groups where values go from 1 to 0 - python

I want to get all the users from a dataframe where a specific column goes from 1 to 0.
For example, with the following dataframe I want to keep only users 1 and 2, as their values go from 1 to 0.
Relevant rows
Row 6 to 7 for user 1
Row 9 to 10 for user 2
user value
0 0 0
1 0 0
2 0 1
3 0 1
4 1 0
5 1 1
6 1 1
7 1 0
8 2 1
9 2 1
10 2 0
11 2 0
Desired Result
user value
4 1 0
5 1 1
6 1 1
7 1 0
8 2 1
9 2 1
10 2 0
11 2 0
I have tried window functions and conditions but for some reason I cannot get the desired result.

Let us try cummax
df.loc[df.user.isin(df.loc[df.value != df.groupby('user')['value'].cummax(),'user'])]
Out[769]:
user value
4 1 0
5 1 1
6 1 1
7 1 0
8 2 1
9 2 1
10 2 0
11 2 0
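To unpack that one-liner, here is a minimal reproduction (column names as in the question) with the intermediate steps spelled out:

```python
import pandas as pd

df = pd.DataFrame({
    "user":  [0] * 4 + [1] * 4 + [2] * 4,
    "value": [0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0],
})

# Per-user running maximum: once a user has seen a 1, cummax stays 1.
cm = df.groupby("user")["value"].cummax()

# A mismatch (value == 0 while cummax == 1) marks a 1 -> 0 drop.
dropped = df.loc[df["value"] != cm, "user"]

# Keep every row of the users that had such a drop.
result = df[df["user"].isin(dropped)]
print(result)
```

User 0 only ever goes 0 -> 1, so its rows never disagree with the running maximum and it is excluded.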

You can use shift to check if the previous value is 1 (df.value.shift(1).eq(1)) and combine that with a mask checking if the current value is 0 (df.value.eq(0)); together they mark a 1 -> 0 drop. Then group by 'user' and transform('any') to broadcast that mask to every row of a matching user:
filtered = df[(df.value.eq(0) & df.value.shift(1).eq(1)).groupby(df.user).transform('any')]
Output:
>>> filtered
user value
4 1 0
5 1 1
6 1 1
7 1 0
8 2 1
9 2 1
10 2 0
11 2 0

You can use GroupBy.filter: if any diff (difference of successive values) equals -1 (i.e. 0 - 1), keep the group.
df.groupby('user').filter(lambda g: g['value'].diff().eq(-1).any())
NB: this assumes the column only contains 0s and 1s; if other numbers can occur, you need two conditions instead: (g['value'].eq(1)&g['value'].shift(-1).eq(0)).any()
output:
user value
4 1 0
5 1 1
6 1 1
7 1 0
8 2 1
9 2 1
10 2 0
11 2 0
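GroupBy.filter calls the Python lambda once per group, which can be slow when there are many groups; the same condition can be vectorized with transform. A sketch under the same 0/1 assumption (note the extra guard against diffs that cross a user boundary, which the per-group filter version gets for free):

```python
import pandas as pd

df = pd.DataFrame({
    "user":  [0] * 4 + [1] * 4 + [2] * 4,
    "value": [0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0],
})

# Mark rows where the value drops from 1 to 0 within the same user.
drop = df["value"].diff().eq(-1) & df["user"].eq(df["user"].shift())

# Broadcast "does this user's group contain a drop?" back to every row.
result = df[drop.groupby(df["user"]).transform("any")]
print(result)
```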

Related

Increment the value in a new column based on a condition using an existing column

I have a pandas dataframe with two columns:
temp_1 flag
1 0
1 0
1 0
2 0
3 0
4 0
4 1
4 0
5 0
6 0
6 1
6 0
and I wanted to create a new column named "final" based on this rule:
if "flag" has a value of 1, then "temp_1" is incremented by 1 for that row and for all following rows. If we find a 1 in the flag column again, the value in "final" gets incremented by 1 once more; please refer to the expected output.
I have tried using .cumsum() with filters but am not getting the desired result.
Expected output
temp_1 flag final
1 0 1
1 0 1
1 0 1
2 0 2
3 0 3
4 0 4
4 1 5
4 0 5
5 0 6
6 0 7
6 1 8
6 0 8
Just do cumsum for flag:
>>> df['final'] = df['temp_1'] + df['flag'].cumsum()
>>> df
temp_1 flag final
0 1 0 1
1 1 0 1
2 1 0 1
3 2 0 2
4 3 0 3
5 4 0 4
6 4 1 5
7 4 0 5
8 5 0 6
9 6 0 7
10 6 1 8
11 6 0 8
>>>
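To see why this works: the running sum of flag counts how many bumps have occurred so far, which is exactly the offset to add to temp_1. A minimal reproduction of the sample:

```python
import pandas as pd

df = pd.DataFrame({
    "temp_1": [1, 1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 6],
    "flag":   [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
})

# Each 1 in flag permanently raises the offset by one for all later rows.
offset = df["flag"].cumsum()        # 0,0,0,0,0,0,1,1,1,1,2,2
df["final"] = df["temp_1"] + offset
```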

How to count consecutive same values in a pythonic way that looks iterative

So I am trying to count the number of consecutive same values in a dataframe and put that information into a new column in the dataframe, but I want the count to look iterative.
Here is what I have so far:
df = pd.DataFrame(np.random.randint(0,3, size=(15,4)), columns=list('ABCD'))
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is that in the numConsec column I don't want the full count on every row; I want it to reflect the running count as you look at the dataframe row by row. My dataframe is too large to loop through and build the counts iteratively, as that would be too slow. I need to do it in a pythonic way and make it look like this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?
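One vectorized way to get the running count (a sketch, not from the original thread) is GroupBy.cumcount, which numbers the rows within each subgroup starting at 0, so no merge is needed:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 1, 0, 1, 0, 0, 1]})

# Start a new subgroup every time A changes from the previous row.
df["subgroupA"] = (df["A"] != df["A"].shift(1)).cumsum()

# Running position within each subgroup, starting at 1.
df["numConsec"] = df.groupby("subgroupA").cumcount() + 1
```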

How do I create a Sequence in Pyspark that resets when rows change from 1 to 0 and increments when all are 1's

I have a pyspark dataframe like this and need the SEQ output as shown:
R_ID ORDER SC_ITEM seq
A 1 0
A 3 1 1
A 4 1 2
A 5 1 3
A 6 1 4
A 7 1 5
A 8 1 6
A 9 1 7
A 10 0 0
A 11 1 1
A 12 0 0
A 13 1
A 14 0
A 15 1 1
A 16 1 2
A 17 1 3
A 18 1 4
A 19 1 5
A 20 1 6
A 21 0 0
A 22 0 0
B 1 0 0
B 2 1 1
C 1 1 1
C 2 1 2
Not sure if the data is showing properly; a picture of the table was attached in the original post.
I did something like this :
RN = Window().orderBy(lit('A'))
.when(((F.col("R_ID")==(lag(F.col("R_ID"),1).over(RN))) & (F.col("SC_ITEM")== 1)), (F.col("SC_ITEM") + (lag(F.col("SEQ"),1).over(RN))))\
Not sure if I can use lead or lag over SEQ. Please help with how to do this.
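One common pattern for this (not from the original thread): start a new group at every 0 (a running count of zeros per R_ID), then take a running sum of SC_ITEM inside each group, so the 0 row itself gets 0 and each following 1 increments the sequence. Sketched in pandas for compactness; in PySpark the same two steps become running sums with F.sum(...).over(Window.partitionBy("R_ID").orderBy("ORDER")) and a second window that also partitions by the group column:

```python
import pandas as pd

df = pd.DataFrame({
    "R_ID":    ["A"] * 4 + ["B"] * 2 + ["C"] * 2,
    "ORDER":   [1, 2, 3, 4, 1, 2, 1, 2],
    "SC_ITEM": [0, 1, 1, 0, 0, 1, 1, 1],
})

# New group at every 0 (running count of zeros within each R_ID).
grp = df.groupby("R_ID")["SC_ITEM"].transform(lambda s: s.eq(0).cumsum())

# Running sum of SC_ITEM inside each (R_ID, group): a 0 row resets to 0,
# consecutive 1s count up 1, 2, 3, ...
df["seq"] = df.groupby([df["R_ID"], grp])["SC_ITEM"].cumsum()
```

This matches the sample: each 0 yields seq 0, runs of 1s count up, and an R_ID that starts with 1 (like C) starts its sequence at 1.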

Change dataframe values above and below a cell with a certain value

I have a data frame with a column called "flag" with values 1 and 0: 1 means the data is alright, and 0 means there was something weird with that data value. I want to create another column called "safe" that copies the flag values but also sets to 0 the N cells above and below every 0 in "flag". For example, with N=2 I want to get this output:
flag safe
1 1 1
2 1 0
3 1 0
4 0 0
5 1 0
6 1 0
7 1 1
8 1 0
9 1 0
10 0 0
11 0 0
12 1 0
13 1 0
14 1 1
15 1 1
I want to be able to change N=3,4,5,6 manually so I can see how big the impact is. How could I do this?
IIUC, Series.where + Series.bfill and Series.ffill
N = 2
df['safe'] = (df['flag'].where(lambda x: x.eq(0))
                        .bfill(limit=N)
                        .ffill(limit=N)
                        .fillna(df['flag'], downcast='int'))
print(df)
print(df)
flag safe
1 1 1
2 1 0
3 1 0
4 0 0
5 1 0
6 1 0
7 1 1
8 1 0
9 1 0
10 0 0
11 0 0
12 1 0
13 1 0
14 1 1
15 1 1
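An alternative sketch: since flag only holds 0 and 1, zeroing out N rows on each side of every 0 is equivalent to a centered rolling minimum with window 2*N + 1 (column name assumed as in the question):

```python
import pandas as pd

df = pd.DataFrame({"flag": [1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]})

N = 2  # number of rows to blank above and below each 0
df["safe"] = (
    df["flag"]
    .rolling(2 * N + 1, center=True, min_periods=1)
    .min()
    .astype(int)
)
```

Changing N only changes the window width, so trying N=3,4,5,6 is a one-character edit.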

create a 'group number' column for a pandas data frame column of '0' and '1' s

How to get the data frame below
dd = pd.DataFrame({'val':[0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,1,0,0],
'groups':[1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,'ignore','ignore']})
val groups
0 0 1
1 0 1
2 1 1
3 1 1
4 1 1
5 0 2
6 0 2
7 0 2
8 0 2
9 1 2
10 1 2
11 0 3
12 1 3
13 1 3
14 1 3
15 1 3
16 0 ignore
17 0 ignore
I have a series df.val which has the values [0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,1,0,0].
How do I create df.groups from df.val?
The first 0,0,1,1,1 will form group 1 (i.e. from the beginning up to the next occurrence of 0 after the 1's),
0,0,0,0,1,1 will form group 2 (incremental group number, starting where the previous group ended until the next occurrence of 0 after the 1's), ...etc.
Can anyone please help.
First test where a 0 immediately follows a 1 and create groups by cumulative sums with Series.cumsum:
s = (dd['val'].eq(0) & dd['val'].shift().eq(1)).cumsum().add(1)
Then convert the last group to ignore if the last value of the data is 0, with numpy.where:
mask = s.eq(s.max()) & (dd['val'].iat[-1] == 0)
dd['new'] = np.where(mask, 'ignore', s)
print (dd)
val groups new
0 0 1 1
1 0 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 0 2 2
6 0 2 2
7 0 2 2
8 0 2 2
9 1 2 2
10 1 2 2
11 0 3 3
12 1 3 3
13 1 3 3
14 1 3 3
15 1 3 3
16 0 ignore ignore
17 0 ignore ignore
IIUC, first we do diff and cumsum to build the groups, then we use np.where to label as ignore any group that never reaches 1:
s=df.val.diff().eq(-1).cumsum()+1
df['New']=np.where(df['val'].eq(1).groupby(s).transform('any'),s,'ignore')
df
val groups New
0 0 1 1
1 0 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 0 2 2
6 0 2 2
7 0 2 2
8 0 2 2
9 1 2 2
10 1 2 2
11 0 3 3
12 1 3 3
13 1 3 3
14 1 3 3
15 1 3 3
16 0 ignore ignore
17 0 ignore ignore
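A compact reproduction of the diff-based variant (data as in the question), in case you want to verify both steps end to end:

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({"val": [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0]})

# A 1 -> 0 drop (diff == -1) starts a new group.
s = dd["val"].diff().eq(-1).cumsum() + 1

# A group that never reaches 1 is labelled 'ignore'.
dd["groups"] = np.where(dd["val"].eq(1).groupby(s).transform("any"), s, "ignore")
```

Note that np.where promotes the whole column to strings, so the group labels come out as '1', '2', ... alongside 'ignore'.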
