How to apply function vertically in df


I want to add the values of a column vertically, from top to bottom, so that each row holds its own value plus the value from the row above.
def add(x, y):
    return x + y
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df['add'] = df.apply(lambda row: add(row['A']), axis=1)
I tried using apply, but it is not working.
The desired output adds consecutive values of column A (1+2, 2+3, and so on):
   A  add
0  1    1
1  2    3
2  3    5
3  4    7
4  5    9

You can apply rolling.sum on a moving window of size 2:
df.A.rolling(2, min_periods=1).sum()
0 1.0
1 3.0
2 5.0
3 7.0
4 9.0
Name: A, dtype: float64
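As a minimal sketch of the rolling approach above, assuming the same single-column frame as in the question: rolling sums come back as float64, so the cast to int at the end is an optional extra step and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# Window of 2 rows; min_periods=1 lets the first row (which has no
# predecessor) produce a value instead of NaN.
rolled = df['A'].rolling(2, min_periods=1).sum()

# Optional: cast back to integers, since rolling sums are returned as floats.
df['add'] = rolled.astype(int)
print(df)
```

The cast is only safe here because min_periods=1 guarantees there are no NaN values left in the result.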

Try this instead:
>>> df['add'] = (df + df.shift()).fillna(df)['A']
>>> df
A add
0 1 1.0
1 2 3.0
2 3 5.0
3 4 7.0
4 5 9.0
>>>
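A variant of the shift-based idea, sketched under the assumption that pandas 0.24 or later is available: Series.shift accepts a fill_value, so the missing predecessor of the first row can be treated as 0 and no fillna step is needed.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# shift() moves every value down one row; fill_value=0 replaces the
# hole at the top, so the first row is simply 1 + 0.
previous = df['A'].shift(fill_value=0)

df['add'] = df['A'] + previous
print(df)
```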

Related

Compute difference between values in dataframe column

I have this dataframe:
a b c d
4 7 5 12
3 8 2 8
1 9 3 5
9 2 6 4
I want column 'd' to become the difference between the n-th value of column 'a' and the (n+1)-th value of column 'a'.
I tried this, but it doesn't run:
for i in data.index-1:
    data.iloc[i]['d']=data.iloc[i]['a']-data.iloc[i+1]['a']
Can anyone help me?

What you want is Series.diff with periods=-1, which subtracts the next row's value from the current row's value.
df = pd.DataFrame.from_dict({"a":[4,3,1,9]})
df["d"] = df["a"].diff(periods=-1)
print(df)
Output
a d
0 4 1.0
1 3 2.0
2 1 -8.0
3 9 NaN
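To make the direction of the subtraction explicit, here is a small sketch (the column name manual is only illustrative) comparing diff(periods=-1) with the equivalent expression written with shift(-1).

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 3, 1, 9]})

# diff(periods=-1) computes a[n] - a[n+1]; the last row has no successor.
df['d'] = df['a'].diff(periods=-1)

# The same calculation spelled out: shift(-1) pulls the next row's
# value up by one position before subtracting.
df['manual'] = df['a'] - df['a'].shift(-1)

print(df)
```

Both columns should hold the same values, with NaN in the last row.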

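A simple loop-based alternative (note that it computes the next value minus the current one, the opposite sign to diff(periods=-1)):
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'a': [2, 4, 8, 15]})
diff = []
for i in range(len(df) - 1):
    diff.append(df['a'][i + 1] - df['a'][i])
diff.append(np.nan)  # the last row has no successor
df['d'] = diff
print(df)
    a    d
0   2  2.0
1   4  4.0
2   8  7.0
3  15  NaN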

Finding mean of specific column and keep all rows that have specific mean values

I have this dataframe.
from pandas import DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['A','D','M','T','B','C','D','E','A','L'],
'id': [1,1,1,2,2,3,3,3,3,5],
'rate': [3.5,4.5,2.0,5.0,4.0,1.5,2.0,2.0,1.0,5.0]})
>> df
name id rate
0 A 1 3.5
1 D 1 4.5
2 M 1 2.0
3 T 2 5.0
4 B 2 4.0
5 C 3 1.5
6 D 3 2.0
7 E 3 2.0
8 A 3 1.0
9 L 5 5.0
df = df.groupby('id')['rate'].mean()
What I want is this:
1) Find the mean rate for every 'id'.
2) Give the number of ids whose mean is >= 3.
3) Return all rows of the dataframe whose id has a mean >= 3.
Expected output:
Number of ids (length) where mean >= 3: 3
>> dataframe where (mean(id) >=3)
>>df
name id rate
0 A 1 3.5
1 D 1 4.5
2 M 1 2.0
3 T 2 5.0
4 B 2 4.0
5 L 5 5.0

Use GroupBy.transform to compute each group's mean as a Series with the same length as the original DataFrame, which makes it possible to filter with boolean indexing:
df = df[df.groupby('id')['rate'].transform('mean') >=3]
print (df)
name id rate
0 A 1 3.5
1 D 1 4.5
2 M 1 2.0
3 T 2 5.0
4 B 2 4.0
9 L 5 5.0
Detail (the transform evaluated on the original, unfiltered df):
print (df.groupby('id')['rate'].transform('mean'))
0 3.333333
1 3.333333
2 3.333333
3 4.500000
4 4.500000
5 1.625000
6 1.625000
7 1.625000
8 1.625000
9 5.000000
Name: rate, dtype: float64
Alternative solution with DataFrameGroupBy.filter:
df = df.groupby('id').filter(lambda x: x['rate'].mean() >=3)
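The question also asks for the number of ids whose mean is at least 3. A minimal sketch of one way to get both the count and the filtered rows from the per-id means, using the question's data (the names means, n_ids, keep_ids and filtered are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'D', 'M', 'T', 'B', 'C', 'D', 'E', 'A', 'L'],
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 5],
    'rate': [3.5, 4.5, 2.0, 5.0, 4.0, 1.5, 2.0, 2.0, 1.0, 5.0],
})

# One mean per id (a Series indexed by id).
means = df.groupby('id')['rate'].mean()

# Count the ids whose mean passes the threshold.
n_ids = int((means >= 3).sum())

# Keep only the rows whose id is among the passing ids.
keep_ids = means.index[means >= 3]
filtered = df[df['id'].isin(keep_ids)]

print('Number of ids where mean >= 3:', n_ids)
print(filtered)
```

Computing the means once and reusing them avoids running the groupby twice when both the count and the rows are needed.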

Rolling sum of the next N elements, including the current element

Good Morning,
I have the following dataframe:
a = [1,2,3,4,5,6]
b = pd.DataFrame({'a': a})
I would like to create a column that sums the next "n" rows of column "a", including the present value of a; I tried:
n = 2
b["r"] = pd.rolling_sum(b.a, n) + a
print(b)
a r
0 1 NaN
1 2 5.0
2 3 8.0
3 4 11.0
4 5 14.0
5 6 17.0
It would be delightful to have:
a r
0 1 1 + 2 + 3 = 6
1 2 2 + 3 + 4 = 9
2 3 3 + 4 + 5 = 12
3 4 4 + 5 + 6 = 15
4 5 5 + 6 + 0 = 11
5 6 6 + 0 + 0 = 6

pandas >= 1.1
Pandas now supports forward-looking window operations.
From 1.1, you can use FixedForwardWindowIndexer:
idx = pd.api.indexers.FixedForwardWindowIndexer
b['a'].rolling(window=idx(window_size=3), min_periods=1).sum()
0 6.0
1 9.0
2 12.0
3 15.0
4 11.0
5 6.0
Name: a, dtype: float64
Note that this is still (at the time of writing) very buggy for datetime rolling operations - use with caution.
pandas <= 1.0.X
Without built-in support, you can get your output by first reversing your data, taking a rolling sum with min_periods=1, and then reversing the result again.
b.a[::-1].rolling(3, min_periods=1).sum()[::-1]
0 6.0
1 9.0
2 12.0
3 15.0
4 11.0
5 6.0
Name: a, dtype: float64
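As a quick sanity check, the two approaches can be run side by side. This sketch assumes pandas 1.1 or later so that FixedForwardWindowIndexer is available; the variable names are only illustrative.

```python
import pandas as pd

b = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6]})
n = 3  # current row plus the next two

# Forward-looking window: each window starts at the current row.
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=n)
forward = b['a'].rolling(window=indexer, min_periods=1).sum()

# Same result on older versions: reverse, roll backwards, reverse again.
reversed_sum = b['a'][::-1].rolling(n, min_periods=1).sum()[::-1]

print(forward.tolist())
print(reversed_sum.tolist())
```

With min_periods=1 the windows at the end of the column shrink, so the last rows sum over fewer than n values, matching the "+ 0" rows in the desired output.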

Python / Pandas: How to merge rows in dataframe

After merging of two data frames:
output = pd.merge(df1, df2, on='ID', how='outer')
I have data frame like this:
index x y z
0 2 NaN 3
0 NaN 3 3
1 2 NaN 4
1 NaN 3 4
...
How to merge rows with the same index?
Expected output:
index x y z
0 2 3 3
1 2 3 4

Perhaps you could take the mean of each group:
In [418]: output.groupby('index', as_index=False).mean()
Out[418]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4

We can group the DataFrame by 'index' and then take the first non-null value with .first(), the minimum with .min(), and so on, depending on the case. What do you want to get if the values in z differ?
In [28]: gr = df.groupby('index', as_index=False)
In [29]: gr.first()
Out[29]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
In [30]: gr.max()
Out[30]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
In [31]: gr.min()
Out[31]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
In [32]: gr.mean()
Out[32]:
index x y z
0 0 2.0 3.0 3
1 1 2.0 3.0 4
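A self-contained sketch of the groupby approach, with the frame rebuilt by hand from the question's printed output (so the exact dtypes are an assumption). GroupBy.first returns the first non-null value per column in each group, which is what collapses the complementary NaN rows.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the merged frame shown in the question.
output = pd.DataFrame({
    'index': [0, 0, 1, 1],
    'x': [2, np.nan, 2, np.nan],
    'y': [np.nan, 3, np.nan, 3],
    'z': [3, 3, 4, 4],
})

# first() skips NaN, so each group keeps its one known x and y.
merged = output.groupby('index', as_index=False).first()
print(merged)
```

If a column can hold two different non-null values within one group, first() silently keeps the earlier one, so the choice of aggregation matters in that case.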

Pandas accumulation without for loop

I have a pandas DataFrame like:
1. 1
2. 2
3. 3
4. 4
And the output is something like
1. 1
2. 3
3. 6
4. 10
where each output value is the current input value plus the previous output value (3 = 1 + 2, 6 = 3 + 3, 10 = 6 + 4, etc.).
Can I do this without a for loop?

You need Series.cumsum:
print (df)
col
1.0 1
2.0 2
3.0 3
4.0 4
df['col1'] = df.col.cumsum()
print (df)
col col1
1.0 1 1
2.0 2 3
3.0 3 6
4.0 4 10
If you need to overwrite column col:
df.col = df.col.cumsum()
print (df)
col
1.0 1
2.0 3
3.0 6
4.0 10
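For comparison, a short sketch that sets cumsum next to the explicit loop it replaces. The frame is rebuilt from the printed output above, and the helper list running is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 3, 4]}, index=[1.0, 2.0, 3.0, 4.0])

# Vectorised running total.
df['col1'] = df['col'].cumsum()

# The equivalent explicit loop, shown only to illustrate what cumsum does.
running = []
total = 0
for value in df['col']:
    total += value
    running.append(total)

print(df)
print(running)
```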
