Need help in Python Pivot table group by - python

I have a dataframe with something like the structure below:
I need to make it look like this:
Can anyone help, please?

You can use groupby() with a list of columns and apply summarising functions with agg().
import pandas as pd

df = pd.DataFrame({'customer': [1, 2, 1, 3, 1, 2, 3],
                   'group_code': ['111', '111', '222', '111', '111', '111', '333'],
                   'ind_code': ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
                   'amount': [100, 200, 140, 400, 225, 125, 600],
                   'card': ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})

# aggregate the numeric 'amount' column; applying 'mean' to the string columns would fail in recent pandas
df_groupby = df.groupby(['customer', 'group_code', 'ind_code'])['amount'].agg(['count', 'mean'])
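Since the title mentions pivot tables, the same summary can also be written with pivot_table. This is just a sketch assuming the count and mean of amount per customer/group_code/ind_code is the layout you're after:
# equivalent pivot-table form of the groupby above (assumed target layout)
df_pivot = df.pivot_table(index=['customer', 'group_code', 'ind_code'],
                          values='amount',
                          aggfunc=['count', 'mean'])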

Related

creating a list of dictionaries from pandas dataframe

This is my df:
df = pd.DataFrame({'sym': ['a', 'b', 'c', 'x', 'y', 'z', 'q', 'w', 'e'],
                   'sym_t': ['tsla', 'msft', 'f', 'aapl', 'aa', 'gg', 'amd', 'ba', 'c']})
I want to separate this df into groups of three and create a list of dictionaries:
options = [{'value':'a b c', 'label':'tsla msft f'}, {'value':'x y z', 'label':'aapl aa gg'}, {'value':'q w e', 'label':'amd ba c'}]
How can I create that list? My original df has over 1000 rows.
Try groupby to concatenate the rows, then to_dict:
import numpy as np

# label every three consecutive rows with the same key: 0, 0, 0, 1, 1, 1, ...
tmp = df.groupby(np.arange(len(df)) // 3).agg(' '.join)
tmp.columns = ['value', 'label']
tmp.to_dict(orient='records')
Output:
[{'value': 'a b c', 'label': 'tsla msft f'},
{'value': 'x y z', 'label': 'aapl aa gg'},
{'value': 'q w e', 'label': 'amd ba c'}]
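If the dataframe has the default RangeIndex, you can get the same grouping without the numpy import by grouping on the index directly:
# same idea, using the default 0..n-1 index as the grouping key
tmp = df.groupby(df.index // 3).agg(' '.join)
tmp.columns = ['value', 'label']
options = tmp.to_dict(orient='records')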

pandas way to turn DataFrame of sets into DataFrame of dictionaries with value in corresponding cell in other DataFrame

It's hard to explain what I'm trying to do so I'll give an example. In the example below, I am trying to get df3. I have done it with the code below but it is very "anti-pandas" and I am looking for a better (faster, cleaner, more pandas-esque) way to do it:
import pandas as pd

df1 = pd.DataFrame({"begin": [{"a", "b"}, {"b"}, {"c"}], "end": [{"x"}, {"z", "y"}, {"z"}]})
df2 = pd.DataFrame(
    {"a": [10, 10, 15], "b": [15, 20, 30], "c": [8, 12, 10], "x": [1, 2, 3], "y": [1, 3, 4], "z": [1, 3, 1]}
)
df3 = df1.copy()
for i in range(len(df1)):
    for j in range(len(df1.loc[i])):
        df3.at[i, df1.columns[j]] = []
        for v in df1.loc[i][j]:
            df3.at[i, df1.columns[j]].append({"letter": v, "value": df2.loc[i][v]})
print(df3)
Here's my goal (which this code does, just probably not in the best way):
   begin                                                          end
0  [{'letter': 'b', 'value': 15}, {'letter': 'a', 'value': 10}]   [{'letter': 'x', 'value': 1}]
1  [{'letter': 'b', 'value': 20}]                                 [{'letter': 'y', 'value': 3}, {'letter': 'z', 'value': 3}]
2  [{'letter': 'c', 'value': 10}]                                 [{'letter': 'z', 'value': 1}]
Here is one way to approach the problem using pandas:
# Reshape and explode the dataframe
s = df1.stack().explode().reset_index(name='letter')
# Map the values corresponding to the letters
s['value'] = s.set_index(['level_0', 'letter']).index.map(df2.stack())
# Assign list of records
s['records'] = s[['letter', 'value']].to_dict('records')
# Pivot with aggfunc as list
s = s.pivot_table('records', 'level_0', 'level_1', aggfunc=list)
print(s)
level_1 begin end
level_0
0 [{'letter': 'a', 'value': 10}, {'letter': 'b', 'value': 15}] [{'letter': 'x', 'value': 1}]
1 [{'letter': 'b', 'value': 20}] [{'letter': 'z', 'value': 3}, {'letter': 'y', 'value': 3}]
2 [{'letter': 'c', 'value': 10}] [{'letter': 'z', 'value': 1}]
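If you want this to print exactly like df3, with no level_0/level_1 axis labels left over from the pivot, you can clear the axis names afterwards:
# remove the index/columns names that pivot_table leaves behind
s.index.name = None
s.columns.name = None
Note that the order of the dicts inside each list can differ from df3, since the sets in df1 are unordered.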

Python pandas dataframe to list by column instead of row

I would like to know if there is an easy way to convert a pandas dataframe to a list by column instead of by row. For the example below, can we get [['Apple','Orange','Kiwi','Mango'],[220,200,1000,800],['a','o','k','m']]?
I'd appreciate any advice on this. Thanks.
import pandas as pd

data = {'Brand': ['Apple', 'Orange', 'Kiwi', 'Mango'],
        'Price': [220, 200, 1000, 800],
        'Type': ['a', 'o', 'k', 'm']
        }
df = pd.DataFrame(data, columns=['Brand', 'Price', 'Type'])
df.head()

df.values.tolist()
# [['Apple', 220, 'a'], ['Orange', 200, 'o'], ['Kiwi', 1000, 'k'], ['Mango', 800, 'm']]

# Is there any way to get this instead?
# [['Apple','Orange','Kiwi','Mango'],[220,200,1000,800],['a','o','k','m']]
Just use the transpose (T) attribute:
lst = df.T.values.tolist()
or use the transpose() method:
lst = df.transpose().values.tolist()
If you print lst you will get:
[['Apple', 'Orange', 'Kiwi', 'Mango'], [220, 200, 1000, 800], ['a', 'o', 'k', 'm']]
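A small alternative, in case you prefer to build the lists column by column rather than going through one transposed object-dtype array: a per-column list comprehension gives the same result.
# one list per column, in column order
lst = [df[col].tolist() for col in df.columns]
# [['Apple', 'Orange', 'Kiwi', 'Mango'], [220, 200, 1000, 800], ['a', 'o', 'k', 'm']]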

Reshaping dataframe with multiple columns to row groups

Input dataframe:
df = pd.DataFrame({'Loc': ['Hyd', 'Hyd', 'Bang', 'Bang'],
                   'Item': ['A', 'B', 'A', 'B'],
                   'Month': ['May', 'May', 'June', 'June'],
                   'Sales': [100, 100, 200, 200],
                   'Values': [1000, 1000, 2000, 2000]
                   })
My expected output:
df = pd.DataFrame({'Loc': ['Hyd', 'Hyd', 'Hyd', 'Hyd', 'Bang', 'Bang', 'Bang', 'Bang'],
                   'Item': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
                   'VAR': ['Sales', 'Values', 'Sales', 'Values', 'Sales', 'Values', 'Sales', 'Values'],
                   'May': [100, 1000, 100, 1000, 100, 1000, 100, 1000],
                   'June': [200, 2000, 200, 2000, 200, 2000, 200, 2000]
                   })
I have tried multiple solutions using melt and pivot but nothing seems to work. Not sure what I am missing.
Here's my code:
dem.melt(['Part','IBU','Date1']).pivot_table(index=['Part','IBU','variable'],columns=['Date1'])
Any help would be much appreciated.
You can use the melt and pivot_table functions in pandas:
df_melted = pd.melt(df, id_vars=["Loc", "Item", "Month"], value_vars=["Sales", "Values"])
This will result in the melted (long-format) frame:
    Loc Item Month variable  value
0   Hyd    A   May    Sales    100
1   Hyd    B   May    Sales    100
2  Bang    A  June    Sales    200
3  Bang    B  June    Sales    200
4   Hyd    A   May   Values   1000
5   Hyd    B   May   Values   1000
6  Bang    A  June   Values   2000
7  Bang    B  June   Values   2000
And then:
df_pivot = df_melted.pivot_table(index=["Loc", "Item", "variable"], columns="Month")
So, the final output will be:
                    value
Month                June     May
Loc  Item variable
Bang A    Sales     200.0     NaN
          Values   2000.0     NaN
     B    Sales     200.0     NaN
          Values   2000.0     NaN
Hyd  A    Sales       NaN   100.0
          Values      NaN  1000.0
     B    Sales       NaN   100.0
          Values      NaN  1000.0
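To get the flat layout shown in the expected output (an Item/VAR row per metric with the months as columns), you can drop the extra column level that pivot_table adds and reset the index; the column names here match the sample df from the question:
# flatten the ('value', Month) column MultiIndex and turn the row labels back into columns
df_pivot.columns = df_pivot.columns.droplevel(0)
result = df_pivot.reset_index().rename(columns={'variable': 'VAR'})
print(result)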

Add a Total and Count Row to a Dataframe

I have a dataframe as follows:
dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})
I need to add a row showing both the sum and the count, but only for some categories, as follows:
pd.DataFrame({
    'id': [3],
    'category': ['a&b'],
    'price': [295],
    'description': ['']
})
An option using .agg:
dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})
a_b = dashboard[dashboard['category'].isin(['a', 'b'])].agg({'id': 'count', 'price': 'sum'})
df = pd.DataFrame({'a&b': a_b})
yields
a&b
id 3
price 295
which you could then .transpose() and merge into your existing dataframe if desired, or compile a separate dataframe of summary results, etc.
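For example, one rough sketch of appending it to the original dashboard (leaving the summary row's description empty):
# turn the one-column summary frame into a row and align it with dashboard's columns
summary = df.T.reset_index().rename(columns={'index': 'category'})
dashboard_with_totals = pd.concat([dashboard, summary], ignore_index=True).fillna({'description': ''})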
Pre-calculate the sums for each category, then for each pair add the two sums, build the combined category name, and append the new row.
Try this:
import pandas as pd

dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})

pairs = [('a', 'b')]
groups = dashboard.groupby("category")['price'].sum()
for c1, c2 in pairs:
    # number of rows belonging to either category of the pair
    new_id = sum((dashboard['category'] == c1) | (dashboard['category'] == c2))
    name = '{}&{}'.format(c1, c2)
    price_sum = groups[c1] + groups[c2]
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    new_row = pd.DataFrame({'id': [new_id], 'category': [name], 'price': [price_sum], 'description': ['']})
    dashboard = pd.concat([dashboard, new_row], ignore_index=True)
print(dashboard)
Try this:
Code
import numpy as np
import pandas as pd

dashboard = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'category': ['a', 'b', 'a', 'c'],
    'price': [123, 151, 21, 24],
    'description': ['IT related', 'IT related', 'Marketing', '']
})

selection = ['a', 'b']
selection_row = '&'.join(selection)

# count of ids and sum of prices for the selected categories, one aggregate per column
df2 = dashboard[dashboard['category'].isin(selection)].agg({'id': ['count'], 'price': ['sum']}).fillna(0).T
# combine the two aggregates into a single 'summary' column
df2['summary'] = df2['count'].add(df2['sum'])
df2.loc['description'] = np.nan
df2.loc['category'] = selection_row
final_df = df2['summary']
final_df
id 3
price 295
description NaN
category a&b
Name: summary, dtype: object
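If you then want that summary attached to the original dashboard as an extra row, one option is to reorder it to the dashboard's columns and concat:
# reorder the summary Series to dashboard's columns and append it as one row
summary_row = final_df.reindex(dashboard.columns).fillna('').to_frame().T
dashboard = pd.concat([dashboard, summary_row], ignore_index=True)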
