gp1.groupby(by=['ID', 'CD'])['BALANCE_AM'].sum()
ID      CD
4332    5            0.0
        58           0.0
        123      22656.0
756423  47      645087.0
        123     227655.0
I want to create a column for each type of CD containing the sum of BALANCE_AM.
Desired Output
ID        5   58       123        47
4332      0    0   22656.0       NaN
756423  NaN  NaN  227655.0  645087.0
Add Series.unstack, then DataFrame.reset_index if you need ID as a regular column:
df = gp1.groupby(by=['ID', 'CD'])['BALANCE_AM'].sum().unstack().reset_index()
Another way is to use pivot_table instead of groupby:
gp1.pivot_table(values='BALANCE_AM', index='ID', columns='CD', aggfunc='sum')
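For reference, a minimal reproducible sketch of both approaches (the sample frame below is reconstructed from the output shown above):
import pandas as pd

gp1 = pd.DataFrame({'ID': [4332, 4332, 4332, 756423, 756423],
                    'CD': [5, 58, 123, 47, 123],
                    'BALANCE_AM': [0.0, 0.0, 22656.0, 645087.0, 227655.0]})

# groupby + unstack: CD values become columns, one row per ID
out1 = gp1.groupby(['ID', 'CD'])['BALANCE_AM'].sum().unstack().reset_index()

# pivot_table produces the same table in one step
out2 = gp1.pivot_table(values='BALANCE_AM', index='ID', columns='CD', aggfunc='sum')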
I've got the following dataframe
import pandas as pd

lst = [['01012021','',100], ['01012021','','50'], ['01022021',140,5],
       ['01022021',160,12], ['01032021','',20], ['01032021',200,25]]
df1 = pd.DataFrame(lst, columns=['Date','AuM','NNA'])
I am looking for code that sums the columns AuM and NNA only for rows where column AuM contains a value. The result is shown below:
lst = [['01012021','',100,''], ['01012021','','50',''], ['01022021',140,5,145],
       ['01022021',160,12,172], ['01032021','',20,'']]
df2 = pd.DataFrame(lst, columns=['Date','AuM','NNA','Sum'])
It is not a good practice to use '' in place of NaN when you have numeric data.
That said, a generic solution to your issue would be to use sum with the skipna=False option:
df1['Sum'] = (df1[['AuM', 'NNA']]                    # you can use as many columns as you want
              .apply(pd.to_numeric, errors='coerce') # convert to numeric
              .sum(axis=1, skipna=False)             # row sum; NaN if any value is NaN
              .fillna('')                            # fill NaN with empty string (bad practice)
             )
output:
       Date  AuM NNA    Sum
0  01012021      100
1  01012021       50
2  01022021  140    5  145.0
3  01022021  160   12  172.0
4  01032021       20
5  01032021  200   25  225.0
I assume you mean to include the last row too:
df2 = (df1.assign(Sum=df1.loc[df1.AuM.ne(""), ["AuM", "NNA"]].sum(axis=1))
.fillna(""))
print(df2)
Result:
       Date  AuM NNA    Sum
0  01012021      100
1  01012021       50
2  01022021  140    5  145.0
3  01022021  160   12  172.0
4  01032021       20
5  01032021  200   25  225.0
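As noted above, the cleaner fix is to store missing values as NaN instead of '' in the first place. A minimal sketch of that cleanup (assuming the df1 from the question):
import numpy as np

# Turn the placeholder empty strings into NaN and make both columns numeric
df1[['AuM', 'NNA']] = df1[['AuM', 'NNA']].replace('', np.nan).apply(pd.to_numeric)

# A plain row-wise sum with skipna=False then yields NaN wherever AuM or NNA is missing
df1['Sum'] = df1[['AuM', 'NNA']].sum(axis=1, skipna=False)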
I have this pandas dataframe:
 timestamp  EG2021  EGH2021
2021-01-04      33      NaN
2021-02-04      45       65
And I am trying to replace the column names with new names as mapped in an Excel file like this:
OldId      NewId
EG2021     LER_EG2021
EGH2021    LER_EGH2021
I tried the code below but it's not working; I get this error:
KeyError: "None of [Index(['LER_EG2021', 'LER_EGH2021'], dtype='object', length=186)] are in the [columns]"
Code:
df = pd.ExcelFile('ids.xlsx').parse('Sheet1')
x=[]
x.append(df['external_ids'].to_list())
dtest_df = (my pandas dataframe as mentioned above)
mapper = df.set_index(df['oldId'])[df['NewId']]
dtest_df.columns = dtest_df.columns.Series.replace(mapper)
Any idea what I am doing wrong?
You need:
mapper = df.set_index('oldId')['NewId']
dtest_df.columns = dtest_df.columns.map(mapper.to_dict())
Or:
dtest_df = dtest_df.rename(columns=df.set_index('oldId')['NewId'].to_dict())
dtest_df output:
    timestamp  LER_EG2021  LER_EGH2021
0  2021-01-04          33          NaN
1  2021-02-04          45           65
Another way: build a dict from zipping the OldId and NewId columns, then rename:
dtest_df.rename(columns=dict(zip(df['OldId'], df['NewId'])), inplace=True)
    timestamp  LER_EG2021  LER_EGH2021
0  2021-01-04          33          NaN
1  2021-02-04          45           65
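For completeness, an end-to-end sketch (assuming, as in the question, that the mapping lives in ids.xlsx on Sheet1 with columns OldId and NewId):
import pandas as pd

# Load the OldId -> NewId mapping from the Excel sheet
ids = pd.read_excel('ids.xlsx', sheet_name='Sheet1')

# Rename only the columns that appear in the mapping; all other columns are left untouched
dtest_df = dtest_df.rename(columns=dict(zip(ids['OldId'], ids['NewId'])))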
I have the following dataframe in pandas:
code tank noz_sale_cumsum noz_1_sub noz_2_sub noz_1_avg noz_2_avg noz_1_flag noz_2_flag
123 1 1234 12 23 23.23 32.45 short ok
123 2 1200 13 53 33.13 22.45 excess ok
Columns such as noz_1_sub, noz_2_sub, noz_1_avg, noz_2_avg, noz_1_flag and noz_2_flag are generated dynamically.
My desired dataframe would be following.
code tank noz_no noz_sale_cumsum noz_sub noz_avg noz_flag
123 1 1 1234 12 23.23 short
123 1 2 1234 23 32.45 ok
123 2 1 1200 13 33.13 excess
123 2 2 1200 53 22.45 ok
I am doing the following in pandas. First I collect all the dynamic columns into separate lists:
import re

cols_sub = [col for col in df.columns if re.search(r'noz_\d+_sub', col)]
cols_avg = [col for col in df.columns if re.search(r'noz_\d+_avg', col)]
cols_flag = [col for col in df.columns if re.search(r'noz_\d+_flag', col)]
final_df = df.pivot_table(index=['code', 'tank', 'noz_sale_cumsum'], columns=[cols_sub, cols_avg, cols_flag], values=[]).reset_index()
I am not sure what to pass for values, and how do I extract the number from the noz_* column names and put it under the noz_no column? Any help is appreciated.
You can use melt to convert everything to rows, then use pivot_table to convert some of the rows back to columns.
a = df.melt(id_vars=['code', 'tank', 'noz_sale_cumsum'])
a['noz_no'] = a.variable.map(lambda x: x.split('_')[1])
a['kpi'] = a.variable.map(lambda x: 'noz_' + x.split('_')[2])
a.pivot_table(
    values='value',
    index=['code', 'tank', 'noz_sale_cumsum', 'noz_no'],
    columns=['kpi'], aggfunc='first'
).reset_index()
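Putting it together on the sample data, a minimal runnable sketch (the vectorized str.split below replaces the lambdas but does the same thing):
import pandas as pd

df = pd.DataFrame({'code': [123, 123], 'tank': [1, 2], 'noz_sale_cumsum': [1234, 1200],
                   'noz_1_sub': [12, 13], 'noz_2_sub': [23, 53],
                   'noz_1_avg': [23.23, 33.13], 'noz_2_avg': [32.45, 22.45],
                   'noz_1_flag': ['short', 'excess'], 'noz_2_flag': ['ok', 'ok']})

a = df.melt(id_vars=['code', 'tank', 'noz_sale_cumsum'])
parts = a['variable'].str.split('_')   # e.g. 'noz_1_sub' -> ['noz', '1', 'sub']
a['noz_no'] = parts.str[1]
a['kpi'] = 'noz_' + parts.str[2]

# aggfunc='first' keeps string values such as 'short'/'ok' intact
out = a.pivot_table(values='value',
                    index=['code', 'tank', 'noz_sale_cumsum', 'noz_no'],
                    columns='kpi', aggfunc='first').reset_index()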
My table looks like this:
In [82]:df.head()
Out[82]:
MatDoc MatYr MvT Material Plnt SLoc Batch Customer AmountLC Amount ... PO MatYr.1 MatDoc.1 Order ProfitCtr SLED/BBD PstngDate EntryDate Time Username
0 4912693062 2015 551 100062 HDC2 0001 5G30MC1A11 NaN 9.03 9.06 ... NaN NaN NaN NaN IN1165B085 26.01.2016 01.08.2015 01.08.2015 01:13:16 O33462
1 4912693063 2015 501 166 HDC2 0004 NaN NaN 0.00 0.00 ... NaN NaN NaN NaN IN1165B085 NaN 01.08.2015 01.08.2015 01:13:17 O33462
2 4912693320 2015 551 101343 HDC2 0001 5G28MC1A11 NaN 53.73 53.72 ... NaN NaN NaN NaN IN1165B085 25.01.2016 01.08.2015 01.08.2015 01:16:30 O33462
Here, I need to group the data by the Order column and sum only the AmountLC column. Then I need to check that an Order value is present in both MvT101group and MvT102group; if an Order appears in both sets of data, I need to subtract the MvT102group sum from the MvT101group sum and display:
Order | Plnt | Material | Batch | Sum101 = sum of MvT101 AmountLC | Sum102 = sum of MvT102 AmountLC | (Sum101 - Sum102) / 100
What I have done: first I made new DataFrames containing only the 101 and 102 movements, MvT101 and MvT102:
MvT101 = df.loc[df['MvT'] == 101]
MvT102 = df.loc[df['MvT'] == 102]
Then I grouped by Order and took the sum of the AmountLC column:
MvT101group = MvT101.groupby('Order', sort=True)
In [76]:
MvT101group[['AmountLC']].sum()
Out[76]:
Order       AmountLC
1127828  16348566.88
1127829  22237710.38
1127830  29803745.65
1127831  30621381.06
1127832  33926352.51
MvT102group = MvT102.groupby('Order', sort=True)
In [77]:
MvT102group[['AmountLC']].sum()
Out[77]:
Order      AmountLC
1127830    53221.70
1127831   651475.13
1127834    67442.16
1127835  2477494.17
1128622   218743.14
After this I am not able to understand how I should write my query.
Please ask if you need any further details. Here is the CSV file I am working from: Link
Hope I understood the question correctly. After grouping both groups as you did:
MvT101group = MvT101.groupby('Order', sort=True).sum(numeric_only=True)  # numeric_only avoids summing string columns
MvT102group = MvT102.groupby('Order', sort=True).sum(numeric_only=True)
You can update the columns' names for both groups:
MvT101group.columns = MvT101group.columns.map(lambda x: str(x) + '_101')
MvT102group.columns = MvT102group.columns.map(lambda x: str(x) + '_102')
Then merge both groups back into the main table so that the summed columns are available there:
df = df.merge(MvT101group, left_on=['Order'], right_index=True, how='left')
df = df.merge(MvT102group, left_on=['Order'], right_index=True, how='left')
And then you can add the calculated column:
df['calc'] = (df['AmountLC_101'] - df['AmountLC_102']) / 100
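If you only want Orders that appear in both groups, as the question asks, here is a minimal sketch that builds the report straight from the two grouped sums (column names follow the suffixing step above; the inner join drops Orders present in only one group):
# Inner join on the Order index keeps only Orders present in both groups
sums = MvT101group[['AmountLC_101']].join(MvT102group[['AmountLC_102']], how='inner')
sums['calc'] = (sums['AmountLC_101'] - sums['AmountLC_102']) / 100

# Attach the descriptive columns (one row per Order) to match the requested display
info = df.drop_duplicates('Order')[['Order', 'Plnt', 'Material', 'Batch']]
report = info.merge(sums, left_on='Order', right_index=True, how='inner')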
I tried several examples on this topic but with no results. I'm reading a DataFrame like:
Code,Counts
10006,5
10011,2
10012,26
10013,20
10014,17
10015,2
10018,2
10019,3
How can I get another DataFrame like:
Code,Counts
10006,5
10007,NaN
10008,NaN
...
10011,2
10012,26
10013,20
10014,17
10015,2
10016,NaN
10017,NaN
10018,2
10019,3
Basically, how can I fill in the missing values of the 'Code' column? I tried the df.reindex() method but I can't figure out how it works. Thanks a lot.
I'd set the index to your 'Code' column, then reindex by passing in a new array based on your current index (np.arange accepts start and stop params; you need to add 1 to the stop), and then reset_index. This assumes that your 'Code' values are already sorted:
In [21]:
df.set_index('Code', inplace=True)
df = df.reindex(index = np.arange(df.index[0], df.index[-1] + 1)).reset_index()
df
Out[21]:
     Code  Counts
0   10006       5
1   10007     NaN
2   10008     NaN
3   10009     NaN
4   10010     NaN
5   10011       2
6   10012      26
7   10013      20
8   10014      17
9   10015       2
10  10016     NaN
11  10017     NaN
12  10018       2
13  10019       3
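For a self-contained version, here is a minimal sketch with the imports spelled out; using min/max instead of the first/last index values also removes the assumption that 'Code' is pre-sorted:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Code': [10006, 10011, 10012, 10013, 10014, 10015, 10018, 10019],
                   'Counts': [5, 2, 26, 20, 17, 2, 2, 3]})

# Reindex against the complete range of codes; missing codes get NaN counts
full_range = np.arange(df['Code'].min(), df['Code'].max() + 1)
df = df.set_index('Code').reindex(full_range).reset_index()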