df:
id cond1 a b c d
0 Q b 1 1 nan 1
1 R b 8 3 nan 3
2 Q a 12 4 8 nan
3 Q b 8 3 nan 1
4 R b 1 2 nan 3
5 Q a 7 9 8 nan
6 Q b 4 4 nan 1
7 R b 9 8 nan 3
8 Q a 0 10 8 nan
Group by id and cond1 and do a rolling(2).sum():
df.groupby(['id','cond1']).apply(lambda x: x[x.name[1]].rolling(2).sum())
Output:
id cond1
Q a 2 nan
5 19.00000
8 7.00000
b 0 nan
3 4.00000
6 7.00000
R b 1 nan
4 5.00000
7 10.00000
dtype: float64
Why is the output in a table form? Can it be in a series form and its index reset?
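The result is a Series with a three-level MultiIndex (id, cond1 and the original row label), which is why it prints like a table. Call reset_index() on it to turn those index levels into ordinary columns and get a flat DataFrame back.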
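As a minimal sketch of the reset_index() suggestion, the snippet below rebuilds the sample frame from the question and flattens the grouped result. The names rolled, flat and rolling_sum are illustrative choices, not part of the original post, and recent pandas versions may emit a deprecation warning for apply on grouping columns.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "id":    list("QRQQRQQRQ"),
    "cond1": list("bbabbabba"),
    "a": [1, 8, 12, 8, 1, 7, 4, 9, 0],
    "b": [1, 3, 4, 3, 2, 9, 4, 8, 10],
    "c": [np.nan, np.nan, 8, np.nan, np.nan, 8, np.nan, np.nan, 8],
    "d": [1, 3, np.nan, 1, 3, np.nan, 1, 3, np.nan],
})

# Same expression as in the question: inside each group, pick the column
# named by cond1 (x.name is the (id, cond1) tuple) and take a rolling sum.
rolled = df.groupby(["id", "cond1"]).apply(
    lambda x: x[x.name[1]].rolling(2).sum()
)

# Give the values a name (hypothetical: "rolling_sum"), then move the
# index levels into regular columns.
flat = rolled.rename("rolling_sum").reset_index()
print(flat)
```

If only the values are needed in their original row order, rolled.droplevel(["id", "cond1"]).sort_index() is another option.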
If I have a pandas dataframe like this:
0 1 2 3 4 5
A 5 5 10 9 4 5
B 10 10 10 8 1 1
C 8 8 0 9 6 3
D 10 10 11 4 2 9
E 0 9 1 5 8 3
If I set a threshold of 7, how do I loop through each row and set the values after the threshold is no longer met equal to np.nan such that I get a data frame like this:
0 1 2 3 4 5
A 5 5 10 9 NaN NaN
B 10 10 10 8 NaN NaN
C 8 8 0 9 NaN NaN
D 10 10 11 4 2 9
E 0 9 1 5 8 NaN
Where everything after the last number greater than 7 is set equal to np.nan.
Let's try this:
df.where(df.where(df > 7).bfill(axis=1).notna())
Output:
0 1 2 3 4 5
A 5 5 10 9 NaN NaN
B 10 10 10 8 NaN NaN
C 8 8 0 9 NaN NaN
D 10 10 11 4 2.0 9.0
E 0 9 1 5 8.0 NaN
Create a mask m by applying df.where to df.gt(7), back-filling along each row, and calling notna. Finally, index df with m:
m = df.where(df.gt(7)).bfill(axis=1).notna()
df[m]
Out[24]:
0 1 2 3 4 5
A 5 5 10 9 NaN NaN
B 10 10 10 8 NaN NaN
C 8 8 0 9 NaN NaN
D 10 10 11 4 2.0 9.0
E 0 9 1 5 8.0 NaN
A very nice question. Reverse the column order, take a cumulative sum of the "greater than 7" flags along each row, and set every position where that sum is still 0 to NaN:
df.where(df.iloc[:,::-1].gt(7).cumsum(axis=1).ne(0))
0 1 2 3 4 5
A 5 5 10 9 NaN NaN
B 10 10 10 8 NaN NaN
C 8 8 0 9 NaN NaN
D 10 10 11 4 2.0 9.0
E 0 9 1 5 8.0 NaN
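The answers above all build the same boolean mask by different routes. The following self-contained sketch, using the frame from the question, spells out the reversed cumulative-sum version step by step and checks it against the bfill version. The variable names above, keep, result and alt are illustrative only.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame(
    [[5, 5, 10, 9, 4, 5],
     [10, 10, 10, 8, 1, 1],
     [8, 8, 0, 9, 6, 3],
     [10, 10, 11, 4, 2, 9],
     [0, 9, 1, 5, 8, 3]],
    index=list("ABCDE"),
)

# True where a value exceeds the threshold.
above = df.gt(7)

# Walk each row from right to left and count the hits seen so far.
# A count of 0 means no value > 7 occurs at or after that position.
keep = above.iloc[:, ::-1].cumsum(axis=1).gt(0)

# where() aligns the mask on column labels, so the reversed order is harmless.
result = df.where(keep)

# Equivalent mask via back-filling the values that pass the threshold.
alt = df.where(df.where(above).bfill(axis=1).notna())

print(result)
```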
I have a dataframe which I want to cut at a specific row and then I want to add this cut to right of the data frame.
I hope my example clarifies what I mean.
Appreciate your help.
Example:
Column_name1 Column_name2 column_name3 Column_name4
0
1
2
3
4
5------------------------------------------------------< cut here
6
7
8
9
10
Column_name1 Column_name2 column_name3 column_name4 column_name5
0 5
1 6
2 7 add cut here
3 8
4 9
Use:
import pandas as pd
df = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
n = 3
# Keep the original df untouched so the second example starts from the same data.
out = pd.concat([df.iloc[:n].reset_index(drop=True),
                 df.iloc[n:].add_prefix('cutted_').reset_index(drop=True)], axis=1)
print(out)
A B C D E F cutted_A cutted_B cutted_C cutted_D cutted_E cutted_F
0 a 4 7 1 5 a d 5 4 7 9 b
1 b 5 8 3 3 a e 5 2 1 2 b
2 c 4 9 5 6 a f 4 3 0 4 b
If the cut does not split the frame evenly, the shorter part is padded with NaN:
n = 5
out = pd.concat([df.iloc[:n].reset_index(drop=True),
                 df.iloc[n:].add_prefix('cutted_').reset_index(drop=True)], axis=1)
print(out)
A B C D E F cutted_A cutted_B cutted_C cutted_D cutted_E cutted_F
0 a 4 7 1 5 a f 4.0 3.0 0.0 4.0 b
1 b 5 8 3 3 a NaN NaN NaN NaN NaN NaN
2 c 4 9 5 6 a NaN NaN NaN NaN NaN NaN
3 d 5 4 7 9 b NaN NaN NaN NaN NaN NaN
4 e 5 2 1 2 b NaN NaN NaN NaN NaN NaN
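The same pattern can be wrapped in a small helper. This is only a sketch: the function name cut_and_append and its prefix parameter are made up for illustration, and the frame is a shortened version of the one above.

```python
import pandas as pd

def cut_and_append(frame, n, prefix="cutted_"):
    """Split frame after the first n rows and place the rest to the right."""
    top = frame.iloc[:n].reset_index(drop=True)
    # Prefix the column names of the lower part so they do not clash.
    bottom = frame.iloc[n:].add_prefix(prefix).reset_index(drop=True)
    return pd.concat([top, bottom], axis=1)

df = pd.DataFrame({"A": list("abcdef"), "B": [4, 5, 4, 5, 5, 4]})

even = cut_and_append(df, 3)     # both halves have 3 rows
uneven = cut_and_append(df, 5)   # lower part has 1 row, the rest is NaN
print(even)
print(uneven)
```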
pd.DataFrame({'A':[None,2,None,None,3,4],'B':[1,2,3,4,5,6]})
A B
0 NaN 1
1 2 2
2 NaN 3
3 NaN 4
4 3 5
5 4 6
How do I add a column C that takes the value from column A if it is not NaN, and column B's value otherwise?
A B C
0 NaN 1 1
1 2 2 2
2 NaN 3 3
3 NaN 4 4
4 3 5 3
5 4 6 4
Try combine_first():
In [184]: a['C'] = a['A'].combine_first(a['B']).astype(int)
In [185]: a
Out[185]:
A B C
0 NaN 1 1
1 2.0 2 2
2 NaN 3 3
3 NaN 4 4
4 3.0 5 3
5 4.0 6 4
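You could also try fillna(). Because column A contains NaN it is stored as float, so C comes out as float too unless you cast it:
In [26]: a['C'] = a['A'].fillna(a['B'])
In [27]: a
Out[27]:
A B C
0 NaN 1 1.0
1 2.0 2 2.0
2 NaN 3 3.0
3 NaN 4 4.0
4 3.0 5 3.0
5 4.0 6 4.0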
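For a quick check that the two suggestions agree, here is a small sketch using the frame from the question. The names via_combine and via_fillna are illustrative only. Both calls return a float Series, and the cast to int is safe here only because no NaN remains after filling.

```python
import pandas as pd

a = pd.DataFrame({"A": [None, 2, None, None, 3, 4],
                  "B": [1, 2, 3, 4, 5, 6]})

# Take A where it is present, otherwise fall back to B.
via_combine = a["A"].combine_first(a["B"])
via_fillna = a["A"].fillna(a["B"])

# Both results are float64 because A holds NaN; cast once no NaN is left.
a["C"] = via_fillna.astype(int)
print(a)
```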
Consider the dataframes
A:
g N a
1 3 5
2 4 6
and B:
g N a e
3 3 4 7
4 9 1 8
Is there some way to merge these such that the resultant dataframe is:
g N a e
1 3 5 NaN
2 4 6 NaN
3 3 4 7
4 9 1 8
In other words, is there some way to preserve the column order rather than re-sort lexicographically?
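Use reindex with the columns of B (older answers used reindex_axis, which was deprecated and has since been removed from pandas):
pd.concat([A,B]).reindex(columns=B.columns)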
Output:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
When merging, specify sort=False.
In [1251]: A.merge(B, how='outer', sort=False)
Out[1251]:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0
The following should do the trick: pd.concat([a, b])[b.columns]
Full test code:
import pandas as pd
from io import StringIO
a = pd.read_csv(StringIO("""
g N a
1 3 5
2 4 6
"""), sep=r"\s+")
b = pd.read_csv(StringIO("""
g N a e
3 3 4 7
4 9 1 8
"""), sep=r"\s+")
pd.concat([a, b])[b.columns]
This produces:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
0 3 3 4 7.0
1 4 9 1 8.0
You might also want to reset the index:
pd.concat([a, b])[b.columns].reset_index(drop=True)
... in order to remove index duplicates. This gives:
g N a e
0 1 3 5 NaN
1 2 4 6 NaN
2 3 3 4 7.0
3 4 9 1 8.0
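Putting the pieces together, a self-contained sketch that builds the two frames directly (rather than parsing text) might look like this. Selecting with B.columns assumes, as in the question, that B holds every column of interest in the desired order.

```python
import pandas as pd

A = pd.DataFrame({"g": [1, 2], "N": [3, 4], "a": [5, 6]})
B = pd.DataFrame({"g": [3, 4], "N": [3, 9], "a": [4, 1], "e": [7, 8]})

# Stack the rows, renumber the index, then force B's column order.
out = pd.concat([A, B], ignore_index=True).reindex(columns=B.columns)
print(out)
```

Here ignore_index=True does the same job as the reset_index(drop=True) call above.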
I have a data frame like this:
A B C D
0 1 0 nan nan
1 8 0 nan nan
2 8 1 nan nan
3 2 1 nan nan
4 0 0 nan nan
5 1 1 nan nan
and I have a dictionary like this:
dc = {'C': 5, 'D' : 10}
I want to fill the NaN values in the data frame from the dictionary, but only in the rows where column B is 0. I want to obtain this:
A B C D
0 1 0 5 10
1 8 0 5 10
2 8 1 nan nan
3 2 1 nan nan
4 0 0 5 10
5 1 1 nan nan
I know how to subset the dataframe, but I can't find a way to fill the values with the dictionary. Any ideas?
You could use fillna with loc and pass your dict to it:
In [13]: df.loc[df.B==0,:].fillna(dc)
Out[13]:
A B C D
0 1 0 5 10
1 8 0 5 10
4 0 0 5 10
To apply this to your dataframe, slice with the same mask and assign the result above back to it:
df.loc[df.B==0, :] = df.loc[df.B==0,:].fillna(dc)
In [15]: df
Out[15]:
A B C D
0 1 0 5 10
1 8 0 5 10
2 8 1 NaN NaN
3 2 1 NaN NaN
4 0 0 5 10
5 1 1 NaN NaN
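A self-contained sketch of the masked fillna follows. It differs slightly from the answer above in that it restricts the assignment to the columns named in the dictionary. That restriction, and the name mask, are illustrative choices rather than part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 8, 8, 2, 0, 1],
    "B": [0, 0, 1, 1, 0, 1],
    "C": np.nan,
    "D": np.nan,
})
dc = {"C": 5, "D": 10}

# Rows to fill: only those where B equals 0.
mask = df["B"].eq(0)
cols = list(dc)

# Fill NaN per column from the dict on the selected rows, then write back.
df.loc[mask, cols] = df.loc[mask, cols].fillna(dc)
print(df)
```

Limiting the write to cols leaves columns A and B untouched.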