I want to add column values vertically, from top to bottom.
def add(x, y):
    return x, y
df = pd.DataFrame({'A':[1,2,3,4,5]})
df['add'] = df.apply(lambda row: add(row['A']), axis=1)
I tried using apply, but it's not working.
The desired output adds each value in column A to the one above it (1+2, 2+3, and so on):
A add
0 1 1
1 2 3
2 3 5
3 4 7
4 5 9
You can apply rolling.sum on a moving window of size 2:
df.A.rolling(2, min_periods=1).sum()
0 1.0
1 3.0
2 5.0
3 7.0
4 9.0
Name: A, dtype: float64
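As a self-contained sketch of the rolling approach above, the snippet below assigns the result back to the frame. The cast to int is an addition here, and it assumes column A contains no missing values.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# Sum each value with the one above it; min_periods=1 lets the first
# row (which has no predecessor) keep its own value instead of NaN.
rolled = df['A'].rolling(2, min_periods=1).sum()

# rolling() returns floats; cast back to int (assumes no NaN in A).
df['add'] = rolled.astype(int)
print(df)
```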
Try this instead:
>>> df['add'] = (df + df.shift()).fillna(df)['A']
>>> df
A add
0 1 1.0
1 2 3.0
2 3 5.0
3 4 7.0
4 5 9.0
>>>
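A possible variation on the shift idea, sketched under the assumption that pandas 0.24 or later is available: passing fill_value=0 to shift avoids the NaN in the first row, so no fillna step is needed and the column should stay integer-typed.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# shift() moves values down one row; fill_value=0 fills the gap at the
# top with 0 instead of NaN, so the first row is just its own value.
df['add'] = df['A'] + df['A'].shift(fill_value=0)
print(df)
```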
I have this dataframe:
a b c d
4 7 5 12
3 8 2 8
1 9 3 5
9 2 6 4
I want column 'd' to become the difference between the n-th value of column 'a' and the (n+1)-th value of column 'a'.
I tried this, but it doesn't run:
for i in data.index-1:
    data.iloc[i]['d']=data.iloc[i]['a']-data.iloc[i+1]['a']
Can anyone help me?
Basically what you want is diff.
df = pd.DataFrame.from_dict({"a":[4,3,1,9]})
df["d"] = df["a"].diff(periods=-1)
print(df)
Output
a d
0 4 1.0
1 3 2.0
2 1 -8.0
3 9 NaN
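To make clear what diff(periods=-1) computes, here is a small sketch comparing it with an explicit subtraction of the shifted column. The names via_diff and via_shift are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [4, 3, 1, 9]})

# diff(periods=-1) subtracts the NEXT row from the current one ...
via_diff = df["a"].diff(periods=-1)

# ... which matches subtracting the column shifted up by one row.
via_shift = df["a"] - df["a"].shift(-1)

# The last row has no next value, so both give NaN there.
print(via_diff.equals(via_shift))
```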
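Let's try a simple way with an explicit loop (note that this computes the next value minus the current one, i.e. the opposite sign of diff(periods=-1)):
import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})
diff = []
for i in range(len(df) - 1):
    # next value minus current value
    diff.append(df['a'][i+1] - df['a'][i])
diff.append(np.nan)  # the last row has no next value
df['d'] = diff
print(df)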
a d
0 2 2.0
1 4 4.0
2 8 7.0
3 15 NaN
I have this dataframe.
from pandas import DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['A','D','M','T','B','C','D','E','A','L'],
'id': [1,1,1,2,2,3,3,3,3,5],
'rate': [3.5,4.5,2.0,5.0,4.0,1.5,2.0,2.0,1.0,5.0]})
>> df
name id rate
0 A 1 3.5
1 D 1 4.5
2 M 1 2.0
3 T 2 5.0
4 B 2 4.0
5 C 3 1.5
6 D 3 2.0
7 E 3 2.0
8 A 3 1.0
9 L 5 5.0
df = df.groupby('id')['rate'].mean()
What I want is this:
1) Find the mean of every 'id'.
2) Give the number of ids (length) which have mean >= 3.
3) Give back all rows of the dataframe where the mean of the id is >= 3.
Expected output:
Number of ids (length) where mean >= 3: 3
>> dataframe where (mean(id) >=3)
>>df
name id rate
0 A 1 3.5
1 D 1 4.5
2 M 1 2.0
3 T 2 5.0
4 B 2 4.0
5 L 5 5.0
Use GroupBy.transform to get the group means as a Series the same size as the original DataFrame, so you can filter by boolean indexing:
df = df[df.groupby('id')['rate'].transform('mean') >=3]
print (df)
name id rate
0 A 1 3.5
1 D 1 4.5
2 M 1 2.0
3 T 2 5.0
4 B 2 4.0
9 L 5 5.0
Detail:
print (df.groupby('id')['rate'].transform('mean'))
0 3.333333
1 3.333333
2 3.333333
3 4.500000
4 4.500000
5 1.625000
6 1.625000
7 1.625000
8 1.625000
9 5.000000
Name: rate, dtype: float64
Alternative solution with DataFrameGroupBy.filter:
df = df.groupby('id').filter(lambda x: x['rate'].mean() >=3)
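The answers above cover the row filtering. As a rough sketch of all three requested steps together, including the count of qualifying ids, one option is to compute the per-id means once and reuse them. The names means, n_ids, good_ids and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A','D','M','T','B','C','D','E','A','L'],
                   'id': [1,1,1,2,2,3,3,3,3,5],
                   'rate': [3.5,4.5,2.0,5.0,4.0,1.5,2.0,2.0,1.0,5.0]})

# 1) mean rate per id
means = df.groupby('id')['rate'].mean()

# 2) number of ids whose mean is >= 3
n_ids = int((means >= 3).sum())
print("Number of ids where mean >= 3:", n_ids)

# 3) rows belonging to those ids (original row labels are kept)
good_ids = means[means >= 3].index
result = df[df['id'].isin(good_ids)]
print(result)
```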
Good Morning,
I have the following dataframe:
a = [1,2,3,4,5,6]
b = pd.DataFrame({'a': a})
I would like to create a column that sums the next "n" rows of column "a", including the present value of a; I tried:
n = 2
b["r"] = pd.rolling_sum(b.a, n) + a
print(b)
a r
0 1 NaN
1 2 5.0
2 3 8.0
3 4 11.0
4 5 14.0
5 6 17.0
It would be delightful to have:
a r
0 1 1 + 2 + 3 = 6
1 2 2 + 3 + 4 = 9
2 3 3 + 4 + 5 = 12
3 4 4 + 5 + 6 = 15
4 5 5 + 6 + 0 = 11
5 6 6 + 0 + 0 = 6
pandas >= 1.1
pandas now supports "forward-looking window operations".
From 1.1, you can use FixedForwardWindowIndexer
idx = pd.api.indexers.FixedForwardWindowIndexer
b['a'].rolling(window=idx(window_size=3), min_periods=1).sum()
0 6.0
1 9.0
2 12.0
3 15.0
4 11.0
5 6.0
Name: a, dtype: float64
Note that this is still (at the time of writing) very buggy for datetime rolling operations - use with caution.
pandas <= 1.0.X
Without built-in support, you can get your output by first reversing your data, applying rolling(...).sum() with min_periods=1, and then reversing again.
b.a[::-1].rolling(3, min_periods=1).sum()[::-1]
0 6.0
1 9.0
2 12.0
3 15.0
4 11.0
5 6.0
Name: a, dtype: float64
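As a sanity check on the reversal trick, the sketch below compares it with a more explicit formulation: adding the column to upward-shifted copies of itself, padded with 0 past the end. The shift-based version is an alternative introduced here for illustration, not part of the original answer, and it assumes pandas 0.24+ for fill_value.

```python
import pandas as pd

b = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6]})
n = 2  # number of following rows to include

# Reverse, take a backward-looking window of n + 1 rows, reverse again.
reversed_roll = b['a'][::-1].rolling(n + 1, min_periods=1).sum()[::-1]

# Cross-check: add the column to n upward-shifted copies of itself,
# padding past the end of the column with 0.
shifted_sum = sum(b['a'].shift(-i, fill_value=0) for i in range(n + 1))

b['r'] = shifted_sum
print(b)
```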
After merging two data frames:
output = pd.merge(df1, df2, on='ID', how='outer')
I have a data frame like this:
index x y z
0 2 NaN 3
0 NaN 3 3
1 2 NaN 4
1 NaN 3 4
...
How can I merge rows with the same index?
Expected output:
index x y z
0 2 3 3
1 2 3 4
Perhaps you could take the mean of each group.
In [418]: output.groupby('index', as_index=False).mean()
Out[418]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
We can group the DataFrame by the 'index' and then... we can just get the first values with .first() or minimum with .min() etc. depending on the case of course. What do you want to get if the values in z differ?
In [28]: gr = df.groupby('index', as_index=False)
In [29]: gr.first()
Out[29]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
In [30]: gr.max()
Out[30]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
In [31]: gr.min()
Out[31]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
In [32]: gr.mean()
Out[32]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
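For a runnable illustration, here is a sketch that rebuilds the frame from the question and collapses it with first(). The frame is a hypothetical reconstruction from the printed output, and it assumes 'index' is an ordinary column rather than the DataFrame's index.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the merged frame from the question.
output = pd.DataFrame({'index': [0, 0, 1, 1],
                       'x': [2, np.nan, 2, np.nan],
                       'y': [np.nan, 3, np.nan, 3],
                       'z': [3, 3, 4, 4]})

# first() returns the first non-null value per column within each group,
# so the two half-filled rows collapse into one.
merged = output.groupby('index', as_index=False).first()
print(merged)
```

If the values in a column can differ within a group, first() silently keeps only one of them, so an aggregation such as mean() or max() may be the safer choice there.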
Hi, I have a pandas DataFrame like:
1. 1
2. 2
3. 3
4. 4
And the output is something like
1. 1
2. 3
3. 6
4. 10
where each value is the current value plus the running total of the previous ones (3 = 1 + 2, 6 = 3 + 3, 10 = 6 + 4, etc.).
Can I do this without a for loop?
You need Series.cumsum:
print (df)
col
1.0 1
2.0 2
3.0 3
4.0 4
df['col1'] = df.col.cumsum()
print (df)
col col1
1.0 1 1
2.0 2 3
3.0 3 6
4.0 4 10
If you need to overwrite column col:
df.col = df.col.cumsum()
print (df)
col
1.0 1
2.0 3
3.0 6
4.0 10
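A self-contained version of the above, for anyone who wants to run it directly. The float index is an assumption made only to mirror the printed output.

```python
import pandas as pd

# Float index assumed here to match the output shown above.
df = pd.DataFrame({'col': [1, 2, 3, 4]}, index=[1.0, 2.0, 3.0, 4.0])

# cumsum() keeps a running total down the column; no explicit loop needed.
df['col1'] = df['col'].cumsum()

# Each running total equals the previous total plus the current value.
assert (df['col1'].diff().dropna() == df['col'].iloc[1:]).all()
print(df)
```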