I have a dataframe structured something like the one below:
I need to make it look like this:
Can anyone help, please?
You can pass a list of columns to groupby() and then apply summarising functions with agg():
import pandas as pd
df = pd.DataFrame({'customer': [1,2,1,3,1,2,3],
"group_code": ['111', '111', '222', '111', '111', '111', '333'],
"ind_code": ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
"amount": [100, 200, 140, 400, 225, 125, 600],
"card": ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})
# Select the numeric column first: 'mean' cannot be computed on the string column "card"
df_groupby = df.groupby(['customer', 'group_code', 'ind_code'])['amount'].agg(['count', 'mean'])
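The desired layout is not shown in the question, so the following is only a sketch of one common variant: named aggregation, which gives flat, readable column names instead of a two-level header. The output names n_rows and mean_amount and the choice of grouping columns are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 1, 3, 1, 2, 3],
    "group_code": ["111", "111", "222", "111", "111", "111", "333"],
    "amount": [100, 200, 140, 400, 225, 125, 600],
})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
summary = (
    df.groupby(["customer", "group_code"])
      .agg(n_rows=("amount", "count"), mean_amount=("amount", "mean"))
      .reset_index()  # turn the group keys back into ordinary columns
)
print(summary)
```

Calling reset_index() is optional; without it the group keys stay in a MultiIndex.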
This is my df:
df = pd.DataFrame({'sym': ['a', 'b', 'c', 'x', 'y', 'z', 'q', 'w', 'e'],
'sym_t': ['tsla', 'msft', 'f', 'aapl', 'aa', 'gg', 'amd', 'ba', 'c']})
I want to separate this df into groups of three and create a list of dictionaries:
options = [{'value':'a b c', 'label':'tsla msft f'}, {'value':'x y z', 'label':'aapl aa gg'}, {'value':'q w e', 'label':'amd ba c'}]
How can I create that list? My original df has over 1000 rows.
Try groupby to concatenate the rows, then to_dict:
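import numpy as np
# Integer division by 3 labels rows 0-2 as group 0, rows 3-5 as group 1, and so on
tmp = df.groupby(np.arange(len(df))//3).agg(' '.join)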
tmp.columns = ['value', 'label']
tmp.to_dict(orient='records')
Output:
[{'value': 'a b c', 'label': 'tsla msft f'},
{'value': 'x y z', 'label': 'aapl aa gg'},
{'value': 'q w e', 'label': 'amd ba c'}]
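To make the grouping key in the answer above more concrete, here is a small sketch that prints the key and shows what happens when the row count is not a multiple of the chunk size. The eight-row frame and the name chunk are assumptions made for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Eight rows on purpose, so the last group is incomplete
df = pd.DataFrame({"sym": list("abcxyzqw"),
                   "sym_t": ["tsla", "msft", "f", "aapl", "aa", "gg", "amd", "ba"]})

chunk = 3
key = np.arange(len(df)) // chunk  # one group label per row
print(key.tolist())                # [0, 0, 0, 1, 1, 1, 2, 2]

options = (
    df.groupby(key)
      .agg(" ".join)               # join the strings of each column within a group
      .rename(columns={"sym": "value", "sym_t": "label"})
      .to_dict(orient="records")
)
print(options)
```

The last dictionary simply holds the leftover rows, so no padding is needed for frames whose length is not divisible by three.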
It's hard to explain what I'm trying to do so I'll give an example. In the example below, I am trying to get df3. I have done it with the code below but it is very "anti-pandas" and I am looking for a better (faster, cleaner, more pandas-esque) way to do it:
import pandas as pd
df1 = pd.DataFrame({"begin": [{"a", "b"}, {"b"}, {"c"}], "end": [{"x"}, {"z", "y"}, {"z"}]})
df2 = pd.DataFrame(
{"a": [10, 10, 15], "b": [15, 20, 30], "c": [8, 12, 10], "x": [1, 2, 3], "y": [1, 3, 4], "z": [1, 3, 1]}
)
df3 = df1.copy()
for i in range(len(df1)):
    for j in range(len(df1.loc[i])):
        df3.at[i, df1.columns[j]] = []
        for v in df1.loc[i][j]:
            df3.at[i, df1.columns[j]].append({"letter": v, "value": df2.loc[i][v]})
print(df3)
Here's my goal (which this code does, just probably not in the best way):
begin end
0 [{'letter': 'b', 'value': 15}, {'letter': 'a', 'value': 10}] [{'letter': 'x', 'value': 1}]
1 [{'letter': 'b', 'value': 20}] [{'letter': 'y', 'value': 3}, {'letter': 'z', 'value': 3}]
2 [{'letter': 'c', 'value': 10}] [{'letter': 'z', 'value': 1}]
Here is one way to approach the problem using pandas:
# Reshape and explode the dataframe
s = df1.stack().explode().reset_index(name='letter')
# Map the values corresponding to the letters
s['value'] = s.set_index(['level_0', 'letter']).index.map(df2.stack())
# Assign list of records
s['records'] = s[['letter', 'value']].to_dict('records')
# Pivot with aggfunc as list
s = s.pivot_table('records', 'level_0', 'level_1', aggfunc=list)
print(s)
level_1 begin end
level_0
0 [{'letter': 'a', 'value': 10}, {'letter': 'b', 'value': 15}] [{'letter': 'x', 'value': 1}]
1 [{'letter': 'b', 'value': 20}] [{'letter': 'z', 'value': 3}, {'letter': 'y', 'value': 3}]
2 [{'letter': 'c', 'value': 10}] [{'letter': 'z', 'value': 1}]
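For comparison, the same result can be built with a plain comprehension. This is a different technique from the stack/explode/pivot answer above, offered only as a sketch: the helper name to_records is hypothetical, and sorting the letters is an added assumption so that the order inside each list is deterministic (sets have no inherent order).

```python
import pandas as pd

df1 = pd.DataFrame({"begin": [{"a", "b"}, {"b"}, {"c"}],
                    "end": [{"x"}, {"z", "y"}, {"z"}]})
df2 = pd.DataFrame({"a": [10, 10, 15], "b": [15, 20, 30], "c": [8, 12, 10],
                    "x": [1, 2, 3], "y": [1, 3, 4], "z": [1, 3, 1]})

def to_records(letters, row_label):
    # Look up each letter's value in the matching row of df2
    return [{"letter": v, "value": int(df2.at[row_label, v])}
            for v in sorted(letters)]

# Build each output column cell by cell, keeping df1's index and column names
df3 = pd.DataFrame(
    {col: [to_records(letters, i) for i, letters in df1[col].items()]
     for col in df1.columns},
    index=df1.index,
)
print(df3)
```

Because every cell holds Python objects either way, this may be no slower than the reshaping approach on small frames; timing both on real data is the only reliable guide.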
Is there an easy way to convert a pandas dataframe to a list of lists by column instead of by row? For the example below, can we get [['Apple','Orange','Kiwi','Mango'],[220,200,1000,800],['a','o','k','m']]?
I would appreciate any advice on this. Thanks.
import pandas as pd
data = {'Brand': ['Apple','Orange','Kiwi','Mango'],
'Price': [220,200,1000,800],
'Type' : ['a','o','k','m']
}
df = pd.DataFrame(data, columns = ['Brand', 'Price', 'Type'])
df.head()
df.values.tolist()
#[['Apple', 220, 'a'], ['Orange', 200, 'o'], ['Kiwi', 1000, 'k'], ['Mango', 800, 'm']]
#Is there any way to get this instead?
#[['Apple','Orange','Kiwi','Mango'],[220,200,1000,800],['a','o','k','m']]
Just use the transpose attribute, T:
lst=df.T.values.tolist()
Or use the equivalent transpose() method:
lst=df.transpose().values.tolist()
If you print lst, you will get:
[['Apple', 'Orange', 'Kiwi', 'Mango'], [220, 200, 1000, 800], ['a', 'o', 'k', 'm']]
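As a sketch of an alternative that avoids transposing, the lists can also be built one column at a time. This is not the answer's method; the names by_column and as_dict are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Brand": ["Apple", "Orange", "Kiwi", "Mango"],
                   "Price": [220, 200, 1000, 800],
                   "Type": ["a", "o", "k", "m"]})

# One list per column, in column order
by_column = [df[col].tolist() for col in df.columns]
print(by_column)

# The same data keyed by column name
as_dict = df.to_dict(orient="list")
print(as_dict["Price"])
```

Working column by column means each column is converted from its own dtype, rather than going through the single object-dtype array that a transpose of mixed columns produces.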
Input DataFrame:
df = pd.DataFrame({'Loc': ['Hyd', 'Hyd','Bang','Bang'],
'Item': ['A', 'B', 'A', 'B'],
'Month' : ['May','May','June','June'],
'Sales': [100, 100, 200, 200],
'Values': [1000, 1000, 2000, 2000]
})
My expected output
df = pd.DataFrame({'Loc': ['Hyd', 'Hyd','Hyd','Hyd','Bang','Bang','Bang','Bang'],
'Item': ['A', 'A', 'B', 'B','A', 'A', 'B', 'B'],
'VAR' : ['Sales','Values','Sales','Values','Sales','Values','Sales','Values'],
'May': [100, 1000, 100, 1000, 100, 1000, 100, 1000],
'June': [200, 2000, 200, 2000, 200, 2000, 200, 2000]
})
I have tried multiple solutions using melt and pivot, but nothing seems to work, and I am not sure what I am missing.
Here's my code:
dem.melt(['Part','IBU','Date1']).pivot_table(index=['Part','IBU','variable'],columns=['Date1'])
Any help would be much appreciated.
You can use the melt and pivot_table functions in pandas:
df_melted = pd.melt(df, id_vars=["Loc", "Item", "Month"], value_vars=["Sales", "Values"])
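This produces a long-format frame in which the Sales and Values columns are stacked into a "variable" column (the original column name) and a "value" column (the number).
Then pivot on Month: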
df_pivot = df_melted.pivot_table(index=["Loc", "Item", "variable"], columns="Month")
The final output has one row per (Loc, Item, variable) combination and one column per month.
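Put together, the two steps might look like the sketch below. Passing values="value" and flattening the index afterwards are additions for illustration, not part of the answer; they keep the month names as plain column labels. With the sample input, each Loc appears in only one month, so the other month comes out as NaN.

```python
import pandas as pd

df = pd.DataFrame({"Loc": ["Hyd", "Hyd", "Bang", "Bang"],
                   "Item": ["A", "B", "A", "B"],
                   "Month": ["May", "May", "June", "June"],
                   "Sales": [100, 100, 200, 200],
                   "Values": [1000, 1000, 2000, 2000]})

# Step 1: stack Sales and Values into a single pair of columns (VAR, value)
long_df = df.melt(id_vars=["Loc", "Item", "Month"],
                  value_vars=["Sales", "Values"], var_name="VAR")

# Step 2: spread the months out into columns
wide = (
    long_df.pivot_table(index=["Loc", "Item", "VAR"], columns="Month", values="value")
           .reset_index()
           .rename_axis(columns=None)  # drop the leftover "Month" axis label
)
print(wide)
```

pivot_table sorts the new columns alphabetically, so June comes before May; reindex the columns afterwards if calendar order matters.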
I have a dataframe as follows:
dashboard = pd.DataFrame({
'id':[1,2,3,4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing','']
})
I need to add a row that shows the count (in id) and the sum (in price) for selected categories only, as follows:
pd.DataFrame({
'id': [3],
'category': ['a&b'],
'price': [295],
'description': ['']
})
An option using .agg:
dashboard = pd.DataFrame({
'id': [1, 2, 3, 4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing', '']
})
a_b = dashboard[dashboard['category'].isin(['a','b'])].agg({'id':'count', 'price':'sum'})
df = pd.DataFrame({'a&b':a_b})
yields
a&b
id 3
price 295
which you could then .transpose() and merge into your existing dataframe if desired, or compile a separate dataframe of summary results, etc.
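A minimal sketch of that last step, assuming the summary row should be appended to the bottom of the original frame. The names selection, summary_row and result are hypothetical, and pd.concat is used here as one way to do the "merge".

```python
import pandas as pd

dashboard = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "category": ["a", "b", "a", "c"],
    "price": [123, 151, 21, 24],
    "description": ["IT related", "IT related", "Marketing", ""],
})

selection = ["a", "b"]
a_b = dashboard[dashboard["category"].isin(selection)].agg({"id": "count", "price": "sum"})

# Transpose the two-entry summary into a one-row frame and fill in the other columns
summary_row = a_b.to_frame().T.assign(category="&".join(selection), description="")

# Columns are aligned by name, so their order in summary_row does not matter
result = pd.concat([dashboard, summary_row], ignore_index=True)
print(result)
```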
Try this. I precompute the price sum for each category; then, for each pair, I add the two sums, join the category names, and append the new row:
import pandas as pd
dashboard = pd.DataFrame({
'id': [1, 2, 3, 4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing', '']
})
pairs = [('a', 'b')]
groups = dashboard.groupby("category")['price'].sum()
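for c1, c2 in pairs:
    # Count the rows that belong to either category
    new_id = ((dashboard['category'] == c1) | (dashboard['category'] == c2)).sum()
    name = '{}&{}'.format(c1, c2)
    price_sum = groups[c1] + groups[c2]
    new_row = pd.DataFrame({'id': [new_id], 'category': [name], 'price': [price_sum], 'description': ['']})
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    dashboard = pd.concat([dashboard, new_row], ignore_index=True)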
print(dashboard)
Try this:
Code (assumes pandas is imported as pd and numpy as np):
dashboard = pd.DataFrame({
'id':[1,2,3,4],
'category': ['a', 'b', 'a', 'c'],
'price': [123, 151, 21, 24],
'description': ['IT related', 'IT related', 'Marketing','']
})
selection =['a','b']
selection_row = '&'.join(selection)
df2 = dashboard[dashboard['category'].isin(selection)].agg({'id' : ['count'], 'price' : ['sum']}).fillna(0).T
df2['summary'] = df2['count'].add(df2['sum'])
df2.loc['description'] = np.nan
df2.loc['category'] = selection_row
final_df = df2['summary']
final_df
id 3
price 295
description NaN
category a&b
Name: summary, dtype: object