Sum values from DataFrame into Parent Index - Python/Pandas

I'm working with Mint transaction data and trying to sum the values from each category into its parent category.
I have a dataframe mint_data that is created from all my Mint transactions:
mint_data = tranactions_data.pivot(index='Category', columns='Date', values='Amount')
I also have a dict of Category:Parent pairs (this uses xlwings to pull them from an Excel sheet):
cat_parent = cats_sheet.range('A1').expand().options(dict).value
I'm not sure how to go about looping through the mint_data df and summing amounts into the parent category. I would like to keep the data frame format exactly the same, just replacing the parent values.
Here is an example df:
A B C D E
par_a 0 0 5 0 0
cat1a 5 2 3 2 1
cat2a 0 1 2 1 0
par_b 1 0 1 1 2
cat1b 0 1 2 1 0
cat2b 1 1 1 1 1
cat3b 0 1 2 1 0
I also have a dict with the category-to-parent mapping:
{'par_a': 'par_a',
'cat1a': 'par_a',
'cat2a': 'par_a',
'par_b': 'par_b',
'cat1b': 'par_b',
'cat2b': 'par_b',
'cat3b': 'par_b'}
I am trying to get the dataframe to end up with
A B C D E
par_a 5 3 10 3 1
cat1a 5 2 3 2 1
cat2a 0 1 2 1 0
par_b 2 3 6 4 3
cat1b 0 1 2 1 0
cat2b 1 1 1 1 1
cat3b 0 1 2 1 0

Let's call your dictionary "dct" and then make a new column that maps to the parent:
>>> df['parent'] = df.reset_index()['index'].map(dct).values
A B C D E parent
par_a 0 0 5 0 0 par_a
cat1a 5 2 3 2 1 par_a
cat2a 0 1 2 1 0 par_a
par_b 1 0 1 1 2 par_b
cat1b 0 1 2 1 0 par_b
cat2b 1 1 1 1 1 par_b
cat3b 0 1 2 1 0 par_b
Then sum by parent:
>>> df_sum = df.groupby('parent').sum()
A B C D E
parent
par_a 5 3 10 3 1
par_b 2 3 6 4 3
In many cases you would stop there, but since you want to combine the parent/child data, you need some sort of merge. combine_first will work well here since it will selectively update in the direction you want:
>>> df_new = df_sum.combine_first(df)
A B C D E parent
cat1a 5.0 2.0 3.0 2.0 1.0 par_a
cat1b 0.0 1.0 2.0 1.0 0.0 par_b
cat2a 0.0 1.0 2.0 1.0 0.0 par_a
cat2b 1.0 1.0 1.0 1.0 1.0 par_b
cat3b 0.0 1.0 2.0 1.0 0.0 par_b
par_a 5.0 3.0 10.0 3.0 1.0 par_a
par_b 2.0 3.0 6.0 4.0 3.0 par_b
You mentioned a multi-index in a comment, so you may prefer to organize it more like this:
>>> df_new.reset_index().set_index(['parent','index']).sort_index()
A B C D E
parent index
par_a cat1a 5.0 2.0 3.0 2.0 1.0
cat2a 0.0 1.0 2.0 1.0 0.0
par_a 5.0 3.0 10.0 3.0 1.0
par_b cat1b 0.0 1.0 2.0 1.0 0.0
cat2b 1.0 1.0 1.0 1.0 1.0
cat3b 0.0 1.0 2.0 1.0 0.0
par_b 2.0 3.0 6.0 4.0 3.0
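As the output above shows, combine_first returns the rows sorted by label and turns the numbers into floats. If the frame should keep its original row order and integer values, as the question asks, one possible variation is to write the parent sums back in place. This is only a sketch, assuming every parent label is itself a row of the frame; the names parent_totals and result are made up for illustration.

```python
import pandas as pd

# Example frame and mapping from the question.
df = pd.DataFrame(
    [[0, 0, 5, 0, 0],
     [5, 2, 3, 2, 1],
     [0, 1, 2, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 2, 1, 0],
     [1, 1, 1, 1, 1],
     [0, 1, 2, 1, 0]],
    index=["par_a", "cat1a", "cat2a", "par_b", "cat1b", "cat2b", "cat3b"],
    columns=list("ABCDE"),
)
dct = {
    "par_a": "par_a", "cat1a": "par_a", "cat2a": "par_a",
    "par_b": "par_b", "cat1b": "par_b", "cat2b": "par_b", "cat3b": "par_b",
}

# Passing the dict to groupby groups each row by the parent its label maps to.
parent_totals = df.groupby(dct).sum()

# Overwrite only the parent rows; child rows and the row order stay untouched.
# Assumption: every parent label exists as a row in df.
result = df.copy()
result.loc[parent_totals.index] = parent_totals.to_numpy()

print(result)
```

The parent rows now hold the sums (including the parent's own amounts), while the index order matches the input frame.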

Related

Calculating differences between rows within groups using pandas

I want to group by the id column in this dataframe:
id a b c
0 1 1 6 2
1 1 2 5 2
2 2 3 4 2
3 2 4 3 2
4 3 5 2 2
5 3 6 1 2
and add the differences between rows for the same column and group as additional columns to end up with this dataframe:
id a b c a_diff b_diff c_diff
0 1 1 6 2 -1.0 1.0 0.0
1 1 2 5 2 1.0 -1.0 0.0
2 2 3 4 2 -1.0 1.0 0.0
3 2 4 3 2 1.0 -1.0 0.0
4 3 5 2 2 -1.0 1.0 0.0
5 3 6 1 2 1.0 -1.0 0.0
data here
df = pd.DataFrame({'id': [1,1,2,2,3,3], 'a': [1,2,3,4,5,6],'b': [6,5,4,3,2,1], 'c': [2,2,2,2,2,2]})
Your desired output doesn't make much sense, but I can force it there with:
df[['a_diff', 'b_diff', 'c_diff']] = df.groupby('id').transform(lambda x: x.diff(1).fillna(x.diff(-1)))
Output:
id a b c a_diff b_diff c_diff
0 1 1 6 2 -1.0 1.0 0.0
1 1 2 5 2 1.0 -1.0 0.0
2 2 3 4 2 -1.0 1.0 0.0
3 2 4 3 2 1.0 -1.0 0.0
4 3 5 2 2 -1.0 1.0 0.0
5 3 6 1 2 1.0 -1.0 0.0
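To make the two steps inside the lambda easier to see, the same result can probably be built from the group-wise diff method directly. A minimal sketch, assuming the frame from the question; the names value_cols, backward, forward and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "a": [1, 2, 3, 4, 5, 6],
    "b": [6, 5, 4, 3, 2, 1],
    "c": [2, 2, 2, 2, 2, 2],
})

value_cols = ["a", "b", "c"]
grouped = df.groupby("id")[value_cols]

# Row minus previous row within each group; NaN on each group's first row.
backward = grouped.diff(1)
# Row minus next row within each group; NaN on each group's last row.
forward = grouped.diff(-1)

# Use the backward difference where it exists, otherwise the forward one.
diffs = backward.fillna(forward).add_suffix("_diff")
out = df.join(diffs)

print(out)
```

With groups longer than two rows, only the first row of each group would fall back to the forward difference; every other row keeps the usual difference from its predecessor.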

How to assign a value from the last row of a preceding group to the next group?

The goal is to put the digit from the last row of the previous letter group into a new column, "last_digit_prev_group". I entered the expected, correct values manually in the column "col_ok". I tried shift(), but the result was far from what I expected. Maybe there is some other way?
Please forgive any inconsistency in my post; I'm not an IT specialist and I don't know English well. Thanks in advance for your support.
df = pd.read_csv('C:/Users/.../a.csv',names=['group_letter', 'digit', 'col_ok'] ,
index_col=0,)
df['last_digit_prev_group'] = df.groupby('group_letter')['digit'].shift(1)
print(df)
group_letter digit col_ok last_digit_prev_group
A 1 n NaN
A 3 n 1.0
A 2 n 3.0
A 5 n 2.0
A 1 n 5.0
B 1 1 NaN
B 2 1 1.0
B 1 1 2.0
B 1 1 1.0
B 3 1 1.0
C 5 3 NaN
C 6 3 5.0
C 1 3 6.0
C 2 3 1.0
C 3 3 2.0
D 4 3 NaN
D 3 3 4.0
D 2 3 3.0
D 5 3 2.0
D 7 3 5.0
Use Series.mask with DataFrame.duplicated to keep only the last value of digit in each group, then Series.shift, and finally ffill:
df['last_digit_prev_group'] = (df['digit'].mask(df.duplicated('group_letter', keep='last'))
.shift()
.ffill())
print (df)
group_letter digit col_ok last_digit_prev_group
0 A 1 n NaN
1 A 3 n NaN
2 A 2 n NaN
3 A 5 n NaN
4 A 1 n NaN
5 B 1 1 1.0
6 B 2 1 1.0
7 B 1 1 1.0
8 B 1 1 1.0
9 B 3 1 1.0
10 C 5 3 3.0
11 C 6 3 3.0
12 C 1 3 3.0
13 C 2 3 3.0
14 C 3 3 3.0
15 D 4 3 3.0
16 D 3 3 3.0
17 D 2 3 3.0
18 D 5 3 3.0
19 D 7 3 3.0
If the last value of a group may be NaN, forward fill within each group instead:
df['last_digit_prev_group'] = (df['digit'].mask(df.duplicated('group_letter', keep='last'))
.shift()
.groupby(df['group_letter']).ffill())
print (df)
group_letter digit col_ok last_digit_prev_group
0 A 1.0 n NaN
1 A 3.0 n NaN
2 A 2.0 n NaN
3 A 5.0 n NaN
4 A 1.0 n NaN
5 B 1.0 1 1.0
6 B 2.0 1 1.0
7 B 1.0 1 1.0
8 B 1.0 1 1.0
9 B 3.0 1 1.0
10 C 5.0 3 3.0
11 C 6.0 3 3.0
12 C 1.0 3 3.0
13 C 2.0 3 3.0
14 C NaN 3 3.0
15 D 4.0 3 NaN
16 D 3.0 3 NaN
17 D 2.0 3 NaN
18 D 5.0 3 NaN
19 D 7.0 3 NaN
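The chain is easier to follow when each step is stored separately. Below is a small sketch on made-up data (the seven-row frame and the names is_not_last and last_only are assumptions for illustration, not the asker's file):

```python
import pandas as pd

# Hypothetical sample data: three groups stored in contiguous blocks.
df = pd.DataFrame({
    "group_letter": list("AAABBCC"),
    "digit": [1, 3, 2, 5, 4, 6, 7],
})

# True for every row except the last row of its group.
is_not_last = df.duplicated("group_letter", keep="last")

# Keep the digit only on the last row of each group, NaN elsewhere.
last_only = df["digit"].mask(is_not_last)

# Shifting by one row moves each group's last digit onto the first row
# of the next group; ffill then spreads it over the rest of that group.
df["last_digit_prev_group"] = last_only.shift().ffill()

print(df)
```

This relies on the rows of each group being adjacent, as they are in the question's data.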

How to insert list of values into null values of a column in python?

I am new to pandas and am facing an issue with null values. I have a list of 3 values that has to be inserted into the missing values of a column. How do I do that?
In [57]: df
Out[57]:
a b c d
0 0 1 2 3
1 0 NaN 0 1
2 0 NaN 3 4
3 0 1 2 5
4 0 NaN 2 6
In [58]: list = [11,22,44]
The output I want
Out[57]:
a b c d
0 0 1 2 3
1 0 11 0 1
2 0 22 3 4
3 0 1 2 5
4 0 44 2 6
If your list has the same length as the number of NaN values:
l=[11,22,44]
df.loc[df['b'].isna(),'b'] = l
print(df)
a b c d
0 0 1.0 2 3
1 0 11.0 0 1
2 0 22.0 3 4
3 0 1.0 2 5
4 0 44.0 2 6
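For reference, a self-contained version of this approach might look like the following. It is a sketch that rebuilds the frame from the question; the length check and the names fill_values and missing are additions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 0, 0],
    "b": [1, np.nan, np.nan, 1, np.nan],
    "c": [2, 0, 3, 2, 2],
    "d": [3, 1, 4, 5, 6],
})
fill_values = [11, 22, 44]

# Boolean mask marking the rows where column b is missing.
missing = df["b"].isna()

# The assignment is positional, so the list must match the number of gaps.
if missing.sum() != len(fill_values):
    raise ValueError("need exactly one fill value per missing entry")

# Values are written into the missing rows in top-to-bottom order.
df.loc[missing, "b"] = fill_values

print(df)
```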
Try stack, assign the values, then unstack back:
s = df.stack(dropna=False)
# The list is named l here, because naming it list would shadow the Python built-in.
s.loc[s.isna()] = l
df = s.unstack()
df
Out[178]:
a b c d
0 0.0 1.0 2.0 3.0
1 0.0 11.0 0.0 1.0
2 0.0 22.0 3.0 4.0
3 0.0 1.0 2.0 5.0
4 0.0 44.0 2.0 6.0

Panda, summing multiple DataFrames with different columns

I have the following DataFrames:
A =
0 1 2
0 1 1 1
1 1 1 1
2 1 1 1
B =
0 5
0 1 1
5 1 1
I want to 'join' these two frames such that:
A + B =
0 1 2 5
0 2 1 1 1
1 1 1 1 0
2 1 1 1 0
5 1 0 0 1
where A+B is a new dataframe
Using add (df1 and df2 here are A and B from the question):
df1.add(df2,fill_value=0).fillna(0)
Out[217]:
0 1 2 5
0 2.0 1.0 1.0 1.0
1 1.0 1.0 1.0 0.0
2 1.0 1.0 1.0 0.0
5 1.0 0.0 0.0 1.0
If you need integers:
df1.add(df2,fill_value=0).fillna(0).astype(int)
Out[242]:
0 1 2 5
0 2 1 1 1
1 1 1 1 0
2 1 1 1 0
5 1 0 0 1
import numpy as np
import pandas as pd
A = pd.DataFrame(np.ones(9).reshape(3, 3))
B = pd.DataFrame(np.ones(4).reshape(2, 2), columns=[0, 5], index=[0, 5])
A.add(B, fill_value=0).fillna(0)
[Out]
0 1 2 5
0 2.0 1.0 1.0 1.0
1 1.0 1.0 1.0 0.0
2 1.0 1.0 1.0 0.0
5 1.0 0.0 0.0 1.0
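It may help to see why both fill_value=0 and fillna(0) appear in the call. The sketch below splits the steps on the same A and B; the names plain, partial and total are illustrative only.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.ones((3, 3)))
B = pd.DataFrame(np.ones((2, 2)), columns=[0, 5], index=[0, 5])

# Plain add: any cell missing from either frame becomes NaN.
plain = A.add(B)

# fill_value=0 treats a cell missing from ONE frame as 0 before adding,
# but a cell missing from BOTH frames is still NaN.
partial = A.add(B, fill_value=0)

# fillna(0) handles the cells that neither frame contains.
total = partial.fillna(0).astype(int)

print(plain.isna().sum().sum())    # cells left as NaN by a plain add
print(partial.isna().sum().sum())  # cells missing from both frames
print(total)
```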

Pandas: How can I fill in the n/a with the mean of previous non-empty value and next non-empty value

I have some N/A values in my dataframe:
df = pd.DataFrame({'A':[1,1,1,3],
'B':[1,1,1,3],
'C':[1,np.nan,3,5],
'D':[2,np.nan, np.nan, 6]})
print(df)
A B C D
0 1 1 1.0 2.0
1 1 1 NaN NaN
2 1 1 3.0 NaN
3 3 3 5.0 6.0
How can I fill in the n/a value with the mean of its previous non-empty value and next non-empty value in its column?
For example, the second value in column C should be filled in with (1+3)/2= 2
Desired Output:
A B C D
0 1 1 1.0 2.0
1 1 1 2.0 4.0
2 1 1 3.0 4.0
3 3 3 5.0 6.0
Thanks!
Use ffill and bfill to replace the NaNs by forward and backward filling, then concat both results and group by the index, aggregating with mean:
df1 = pd.concat([df.ffill(), df.bfill()]).groupby(level=0).mean()
print (df1)
A B C D
0 1.0 1.0 1.0 2.0
1 1.0 1.0 2.0 4.0
2 1.0 1.0 3.0 4.0
3 3.0 3.0 5.0 6.0
Detail:
print (df.ffill())
A B C D
0 1 1 1.0 2.0
1 1 1 1.0 2.0
2 1 1 3.0 2.0
3 3 3 5.0 6.0
print (df.bfill())
A B C D
0 1 1 1.0 2.0
1 1 1 3.0 6.0
2 1 1 3.0 6.0
3 3 3 5.0 6.0
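Put together as a runnable snippet, the approach could look like this. It is a sketch using the frame from the question; the intermediate names forward, backward and df1 are only for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, np.nan, 3, 5],
    "D": [2, np.nan, np.nan, 6],
})

forward = df.ffill()   # each gap takes the previous non-empty value
backward = df.bfill()  # each gap takes the next non-empty value

# Stacking both frames gives two rows per original index label;
# grouping by that label and averaging yields the mean of both neighbours.
df1 = pd.concat([forward, backward]).groupby(level=0).mean()

print(df1)
```

Because mean skips NaN by default, a gap at the very start or end of a column, which has only one non-empty neighbour, would simply take that neighbour's value.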
