I have a pandas dataframe with two columns:
temp_1 flag
1 0
1 0
1 0
2 0
3 0
4 0
4 1
4 0
5 0
6 0
6 1
6 0
and I want to create a new column named "final" based on this rule:
whenever "flag" has the value 1, "temp_1" and all following values are incremented by 1. If another 1 appears in the "flag" column, the running values in "final" are incremented by 1 again; please refer to the expected output.
I have tried using .cumsum() with filters but am not getting the desired result.
Expected output
temp_1 flag final
1 0 1
1 0 1
1 0 1
2 0 2
3 0 3
4 0 4
4 1 5
4 0 5
5 0 6
6 0 7
6 1 8
6 0 8
Just do cumsum for flag:
>>> df['final'] = df['temp_1'] + df['flag'].cumsum()
>>> df
temp_1 flag final
0 1 0 1
1 1 0 1
2 1 0 1
3 2 0 2
4 3 0 3
5 4 0 4
6 4 1 5
7 4 0 5
8 5 0 6
9 6 0 7
10 6 1 8
11 6 0 8
>>>
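Why this works: each 1 in flag marks a point from which every row onward should be one higher, and cumsum() turns those marks into a running offset. A minimal reproducible version:

```python
import pandas as pd

df = pd.DataFrame({
    'temp_1': [1, 1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 6],
    'flag':   [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
})

# cumsum() of flag counts the 1s seen so far, i.e. the total
# increment that applies to the current row and every row after it.
df['final'] = df['temp_1'] + df['flag'].cumsum()
```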
So I am trying to count the number of consecutive identical values in a dataframe and put that count into a new column, but I want the count to build up iteratively.
Here is what I have so far:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 3, size=(15, 4)), columns=list('ABCD'))
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is that the numConsec column shows the full run length on every row. Instead, I want it to reflect the count as you iterate through the dataframe row by row. My dataframe is too large to loop through and count iteratively, as that would be too slow. I need to do it in a vectorized, pythonic way, so it looks like this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?
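The thread does not include an answer, but the usual vectorized trick (a sketch, not necessarily what the original poster settled on) is groupby(...).cumcount(): once subgroupA labels each run, cumcount() + 1 numbers the rows inside each run, with no merge step needed. Using a small fixed A column for clarity (the question's np.random.randint data is not reproducible):

```python
import pandas as pd

# Fixed sample data standing in for the random frame in the question.
df = pd.DataFrame({'A': [2, 1, 1, 0, 1, 0, 0]})

# Label each run of consecutive equal values in A.
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()

# Number the rows within each run, starting at 1.
df['numConsec'] = df.groupby('subgroupA').cumcount() + 1
```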
I have a pyspark dataframe like this and need the SEQ output as shown:
R_ID ORDER SC_ITEM seq
A 1 0
A 3 1 1
A 4 1 2
A 5 1 3
A 6 1 4
A 7 1 5
A 8 1 6
A 9 1 7
A 10 0 0
A 11 1 1
A 12 0 0
A 13 1
A 14 0
A 15 1 1
A 16 1 2
A 17 1 3
A 18 1 4
A 19 1 5
A 20 1 6
A 21 0 0
A 22 0 0
B 1 0 0
B 2 1 1
C 1 1 1
C 2 1 2
I did something like this:
RN = Window().orderBy(lit('A'))
.when(((F.col("R_ID")==(lag(F.col("R_ID"),1).over(RN))) & (F.col("SC_ITEM")== 1)), (F.col("SC_ITEM") + (lag(F.col("SEQ"),1).over(RN))))\
I'm not sure whether I can use lead or lag over the SEQ column while it is still being computed. Please help me figure out how to do this.
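Lag over the column currently being defined is indeed not allowed in a Spark window, so the usual workaround is to build a run id first and then take a cumulative sum within it. As a sketch of that logic in pandas (column names follow the question; the sample rows are abbreviated): a new run starts at every SC_ITEM == 0 row and at every change of R_ID, and seq is the cumulative sum of SC_ITEM within the run. In PySpark, the same run id could be built with a windowed cumulative sum, then seq as F.sum("SC_ITEM") over a window partitioned by that run id.

```python
import pandas as pd

df = pd.DataFrame({
    'R_ID':    ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'ORDER':   [1, 3, 4, 10, 1, 2, 1, 2],
    'SC_ITEM': [0, 1, 1, 0, 0, 1, 1, 1],
})

# A new run starts at every SC_ITEM == 0 row and at every change of R_ID.
new_run = df['SC_ITEM'].eq(0) | df['R_ID'].ne(df['R_ID'].shift())
run_id = new_run.cumsum()

# Within each run, seq is the running count of SC_ITEM == 1 rows
# (the leading 0 row of a run contributes nothing, so it stays 0).
df['seq'] = df.groupby(run_id)['SC_ITEM'].cumsum()
```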
I have a data frame with a column called "flag" with values 1 and 0. 1 means the data is alright and 0 means there was something weird with that data value. I want to create another column called "safe" that copies the "flag" values and changes to 0 the N cells above and below each 0 in "flag". For example, with N=2 I want to get this output:
flag safe
1 1 1
2 1 0
3 1 0
4 0 0
5 1 0
6 1 0
7 1 1
8 1 0
9 1 0
10 0 0
11 0 0
12 1 0
13 1 0
14 1 1
15 1 1
I want to be able to change N to 3, 4, 5, or 6 manually so I can see how big the impact is. How could I do this?
IIUC, Series.where + Series.bfill and Series.ffill
N = 2
df['safe'] = (df['flag'].where(lambda x: x.eq(0))
                        .bfill(limit=N)
                        .ffill(limit=N)
                        .fillna(df['flag'], downcast='int'))
print(df)
flag safe
1 1 1
2 1 0
3 1 0
4 0 0
5 1 0
6 1 0
7 1 1
8 1 0
9 1 0
10 0 0
11 0 0
12 1 0
13 1 0
14 1 1
15 1 1
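A self-contained version of the same approach (note: fillna's downcast argument is deprecated in recent pandas, so this sketch casts explicitly instead):

```python
import pandas as pd

df = pd.DataFrame({'flag': [1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]})
N = 2  # how many rows above and below each 0 to blank out

# Keep only the 0s (everything else becomes NaN), spread them up to
# N rows in both directions, then restore the original flag wherever
# nothing was spread.
df['safe'] = (df['flag'].where(df['flag'].eq(0))
                        .bfill(limit=N)
                        .ffill(limit=N)
                        .fillna(df['flag'])
                        .astype(int))
```

Changing N to 3, 4, 5, or 6 only widens the bfill/ffill limits, so no other code needs to change.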
How can I get the data frame below?
dd = pd.DataFrame({'val':[0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,1,0,0],
'groups':[1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,'ignore','ignore']})
val groups
0 0 1
1 0 1
2 1 1
3 1 1
4 1 1
5 0 2
6 0 2
7 0 2
8 0 2
9 1 2
10 1 2
11 0 3
12 1 3
13 1 3
14 1 3
15 1 3
16 0 ignore
17 0 ignore
I have a series df.val which has the values [0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,1,0,0].
How can I create df.groups from df.val?
The first 0,0,1,1,1 will form group 1 (i.e. from the beginning up to the next occurrence of 0 after the 1s).
Then 0,0,0,0,1,1 will form group 2 (an incremental group number, starting where the previous group ended, up to the next occurrence of 0 after the 1s), etc.
Can anyone please help.
First, test whether a value is 0 and the previous value is 1, and create groups by cumulative sums with Series.cumsum:
s = (dd['val'].eq(0) & dd['val'].shift().eq(1)).cumsum().add(1)
Then convert the last group to ignore if the data ends in 0, with numpy.where:
mask = s.eq(s.max()) & (dd['val'].iat[-1] == 0)
dd['new'] = np.where(mask, 'ignore', s)
print (dd)
val groups new
0 0 1 1
1 0 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 0 2 2
6 0 2 2
7 0 2 2
8 0 2 2
9 1 2 2
10 1 2 2
11 0 3 3
12 1 3 3
13 1 3 3
14 1 3 3
15 1 3 3
16 0 ignore ignore
17 0 ignore ignore
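A self-contained version of the steps above, runnable against the frame from the question (np.where promotes the result to strings because of the 'ignore' label, so the group numbers come out as '1', '2', ...):

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({'val': [0, 0, 1, 1, 1, 0, 0, 0, 0,
                           1, 1, 0, 1, 1, 1, 1, 0, 0]})

# A new group starts wherever a 0 immediately follows a 1.
s = (dd['val'].eq(0) & dd['val'].shift().eq(1)).cumsum().add(1)

# The trailing group becomes 'ignore' when the data ends in 0s.
mask = s.eq(s.max()) & (dd['val'].iat[-1] == 0)
dd['new'] = np.where(mask, 'ignore', s)
```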
IIUC, first we do diff and cumsum to form the groups, then we use np.where to mark as ignore any group that contains no 1s:
s = dd.val.diff().eq(-1).cumsum() + 1
dd['New'] = np.where(dd['val'].eq(1).groupby(s).transform('any'), s, 'ignore')
dd
val groups New
0 0 1 1
1 0 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 0 2 2
6 0 2 2
7 0 2 2
8 0 2 2
9 1 2 2
10 1 2 2
11 0 3 3
12 1 3 3
13 1 3 3
14 1 3 3
15 1 3 3
16 0 ignore ignore
17 0 ignore ignore