In the following dataset:
import pandas as pd
df = pd.DataFrame({'globalid': {0: '4388064', 1: '4388200', 2: '4399344', 3: '4400638', 4: '4401765', 5: '4401831', 6: '4402098', 7: '4406997', 8: '4407331', 9: '4417043', 10: '4437380', 11: '4442467', 12: '4401955', 13: '4425140', 14: '4426164', 15: '4405473', 16: '4411249', 17: '4388584', 18: '4400483', 19: '4433927', 20: '4413441', 21: '4436355', 22: '4443361', 23: '4443375', 24: '4388176'}, 'postcode': {0: '1774PG', 1: '7481LK', 2: '1068MS', 3: '5628EN', 4: '7731TV', 5: '5971CR', 6: '9571BM', 7: '1031KA', 8: '9076BK', 9: '4465AL', 10: '1096AC', 11: '3601', 12: '2563PT', 13: '2341HN', 14: '2553DM', 15: '2403EM', 16: '1051AN', 17: '4525AB', 18: '4542BA', 19: '1096AC', 20: '5508AE', 21: '1096AC', 22: '3543GC', 23: '4105TA', 24: '7742EH'}, 'koopprijs': {0: '139000', 1: '209000', 2: '267500', 3: '349000', 4: '495000', 5: '162500', 6: '217500', 7: '655000', 8: '180000', 9: '495000', 10: '2395000', 11: '355000', 12: '150000', 13: '167500', 14: '710000', 15: '275000', 16: '498000', 17: '324500', 18: '174500', 19: '610000', 20: '300000', 21: '2230000', 22: '749000', 23: '504475', 24: '239000'}, 'place_name': {0: 'Slootdorp', 1: 'Haaksbergen', 2: 'Amsterdam', 3: 'Eindhoven', 4: 'Ommen', 5: 'Grubbenvorst', 6: '2e Exloërmond', 7: 'Amsterdam', 8: 'St.-Annaparochie', 9: 'Goes', 10: 'Amsterdam', 11: 'Maarssen', 12: "'s-Gravenhage", 13: 'Oegstgeest', 14: "'s-Gravenhage", 15: 'Alphen aan den Rijn', 16: 'Amsterdam', 17: 'Retranchement', 18: 'Hoek', 19: 'Amsterdam', 20: 'Veldhoven', 21: 'Amsterdam', 22: 'Utrecht', 23: 'Culemborg', 24: 'Coevorden'}})
print(df)
I would like to compute the average asking price, indicated by 'koopprijs', per place_name. Since there are multiple 'koopprijs' values per place_name (Amsterdam, for example), I am looking to compute the average price per place name. Can someone please provide the code, or explain how this can be computed?
You can try the following:
df['koopprijs'] = df['koopprijs'].astype(int)  # the values are strings, so convert to int first
df2 = df.groupby('place_name')['koopprijs'].mean()
print(df2)
You will get the output as:
place_name
's-Gravenhage 430000
2e Exloërmond 217500
Alphen aan den Rijn 275000
Amsterdam 1109250
Coevorden 239000
Culemborg 504475
Eindhoven 349000
Goes 495000
Grubbenvorst 162500
Haaksbergen 209000
Hoek 174500
Maarssen 355000
Oegstgeest 167500
Ommen 495000
Retranchement 324500
Slootdorp 139000
St.-Annaparochie 180000
Utrecht 749000
Veldhoven 300000
Name: koopprijs, dtype: int32
First change the data type of koopprijs and then use groupby with agg:
df['koopprijs'] = df['koopprijs'].astype('int')
df = df.groupby(['place_name'])['koopprijs'].agg('mean')
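If you also want to see how many listings each average is based on, agg accepts a list of functions. A small sketch (using a subset of the question's data, not part of the original answers):

```python
import pandas as pd

df = pd.DataFrame({'place_name': ['Amsterdam', 'Amsterdam', 'Utrecht'],
                   'koopprijs': ['267500', '655000', '749000']})

df['koopprijs'] = df['koopprijs'].astype(int)
# one row per place, with both the mean price and the number of listings
summary = df.groupby('place_name')['koopprijs'].agg(['mean', 'count'])
```

This makes it easy to spot places like Amsterdam whose average is built from several listings.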
I am trying to solve a business problem in Python but am having difficulty coming up with a script. I have tried looping through the dataframe with df.iterrows(), but I am stuck and don't know how to proceed.
We process volumes in production orders of one type of resource, which we need to process FIFO (first in, first out). Each lot has a certain volume and price; after using up a lot we start on the next lot (FIFO).
Question: how can I automate the calculation of the Revenu column? Can you come up with some Python code to automate this process? Would you use a while or a for loop, and would you iterate through the dataframe?
Below is a screenshot of the solution: on the left the production orders, and on the right the volume and price per lot.
Below the image I have posted two dictionaries containing the data from the screenshot.
I would really appreciate your help...
{'Productionorder': {0: 'Productionorder 1',
1: 'Productionorder 2',
2: 'Productionorder 3',
3: 'Productionorder 4',
4: 'Productionorder 5',
5: 'Productionorder 6',
6: 'Productionorder 7',
7: 'Productionorder 8',
8: 'Productionorder 9',
9: 'Productionorder 10',
10: 'Productionorder 11',
11: 'Productionorder 12',
12: 'Productionorder 13',
13: 'Productionorder 14',
14: 'Productionorder 15',
15: 'Productionorder 16',
16: 'Productionorder 17',
17: 'Productionorder 18',
18: 'Productionorder 19',
19: 'Productionorder 20',
20: 'Productionorder 21',
21: 'Productionorder 22'},
'Processed volume': {0: 810,
1: 3240,
2: 3177,
3: 1620,
4: 6480,
5: 5120,
6: 10880,
7: 13770,
8: 21060,
9: 4860,
10: 810,
11: 1620,
12: 15390,
13: 15390,
14: 6800,
15: 4480,
16: 10200,
17: 16650,
18: 2550,
19: 9050,
20: 9900,
21: 3200},
'Lotno.': {0: 1,
1: 1,
2: 1,
3: 1,
4: 2,
5: 2,
6: 2,
7: 2,
8: 2,
9: 2,
10: 2,
11: 2,
12: 2,
13: 3,
14: 3,
15: 3,
16: 3,
17: 3,
18: 3,
19: 3,
20: 4,
21: 4},
'Left of Lotno.': {0: 8490,
1: 5250,
2: 2073,
3: 453,
4: 75973,
5: 70853,
6: 59973,
7: 46203,
8: 25143,
9: 20283,
10: 19473,
11: 17853,
12: 2463,
13: 52073,
14: 45273,
15: 40793,
16: 30593,
17: 13943,
18: 11393,
19: 2343,
20: 38443,
21: 35243},
'Revenu': {0: 1741.5,
1: 6966.0,
2: 6830.549999999999,
3: 3483.0,
4: 10315.800000000001,
5: 7936.0,
6: 16864.0,
7: 21343.5,
8: 32643.0,
9: 7533.0,
10: 1255.5,
11: 2511.0,
12: 23854.5,
13: 20622.750000000004,
14: 8840.0,
15: 5824.0,
16: 13260.0,
17: 21645.0,
18: 3315.0,
19: 11765.0,
20: 12492.15,
21: 4000.0}}
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
1: Timestamp('2021-01-02 00:00:00'),
2: Timestamp('2021-01-03 00:00:00'),
3: Timestamp('2021-01-04 00:00:00')},
'Lotno.': {0: 1, 1: 2, 2: 3, 3: 4},
'Volume': {0: 9300, 1: 82000, 2: 65000, 3: 46000},
'Price': {0: 2.15, 1: 1.55, 2: 1.3, 3: 1.25}}
Assuming you have two dataframes:
One for the Production Orders
And another for the Lot Details
The following function should allow you to calculate the revenues (along with the 'Lotno.' and 'Left of Lotno.' intermediary columns).
Requirements for each dataframe:
The Production Orders DataFrame must:
contain a column titled 'Processed volume'
have an index of consecutive integers starting at 0.
The Lot Details DataFrame must:
contain the columns ['Lotno.', 'Volume', 'Price']
have at least one row
have its rows ordered in the order of expected depletion.
If the quantity available in the lots is depleted, no additional revenue is generated.
def fill_revenue(df1_orig, df2):
    """
    df1_orig is the Production Orders DataFrame
    df2 is the Lot Details DataFrame
    The returned DataFrame is based on a copy of df1_orig
    """
    df1 = df1_orig.copy()
    # Create empty columns for the calculated fields
    df1['Lotno.'] = None
    df1['Left of Lotno.'] = None
    df1['Revenu'] = None

    def recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict=None):
        """A function used to update the new values of a row"""
        if return_dict is None:
            return_dict = {'Revenu': 0}
        return_dict.update({'Lotno.': current_lot, 'Left of Lotno.': current_lot_quantity})
        lot_info = df2.loc[df2['Lotno.'] == current_lot].iloc[0]
        # start calculation
        if current_lot_quantity > order_volume:
            return_dict['Revenu'] += order_volume * lot_info['Price']
            current_lot_quantity -= order_volume
            order_volume = 0
            return_dict['Left of Lotno.'] = current_lot_quantity
        else:
            return_dict['Revenu'] += current_lot_quantity * lot_info['Price']
            order_volume -= current_lot_quantity
            try:
                lot_info = df2.iloc[df2.index.get_loc(lot_info.name) + 1]
            except IndexError:
                return_dict['Left of Lotno.'] = 0
                return return_dict
            current_lot = lot_info['Lotno.']
            current_lot_quantity = lot_info['Volume']
            recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict)
        return return_dict

    # updating each row of the Production Orders DataFrame
    for idx, row in df1.iterrows():
        order_volume = row['Processed volume']
        current_lot = df2.iloc[0]['Lotno.'] if idx == 0 else df1.iloc[idx - 1]['Lotno.']
        current_lot_quantity = df2.iloc[0]['Volume'] if idx == 0 else df1.iloc[idx - 1]['Left of Lotno.']
        update_dict = recursive_revenu_calc(order_volume, current_lot, current_lot_quantity)
        for key, value in update_dict.items():
            df1.loc[idx, key] = value
    return df1
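As a sanity check, the same FIFO logic can be sketched with a plain loop over (volume, price) pairs instead of recursion; the numbers below come from the two dictionaries posted in the question:

```python
def fifo_revenue(volumes, lots):
    # lots: list of (volume, price) pairs in depletion order (FIFO)
    lots = [list(lot) for lot in lots]  # make lot volumes mutable
    i = 0
    revenues = []
    for vol in volumes:
        rev = 0.0
        while vol > 0 and i < len(lots):
            take = min(vol, lots[i][0])  # consume from the current lot
            rev += take * lots[i][1]
            vol -= take
            lots[i][0] -= take
            if lots[i][0] == 0:          # lot depleted: move on to the next
                i += 1
        revenues.append(rev)             # lots exhausted: no further revenue
    return revenues

# the first five production orders and the four lots from the question
revenues = fifo_revenue([810, 3240, 3177, 1620, 6480],
                        [(9300, 2.15), (82000, 1.55), (65000, 1.3), (46000, 1.25)])
```

The results match the posted Revenu column (e.g. 1741.5 for the first order, and 10315.8 for the fifth, which straddles lots 1 and 2).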
I'm trying to use Plotly to overlay a marker/line chart on top of my OHLC candle chart.
Code
import plotly.graph_objects as go
import pandas as pd
import numpy as np  # needed for the np.nan values in the data below
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = "lines+markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This is the current image
This is the desired output/image
I want a black line between the markers (pivots). Ideally I would also like a value next to each line showing the distance between each pivot, but I'm not sure how to do this.
For example, the distance between the first two pivots, round(abs(1.293494 - 1.279329), 3), returns 0.014, so I would like this next to the line.
The second is round(abs(1.279329 - 1.329610), 3), so the value would be 0.05. I have hand-edited the image and added the lines for the first two values to give a visual representation of what I'm trying to achieve.
The problem seems to be the missing values, so use pandas.Series.interpolate in combination with fig.add_annotation.
I've included annotations for the differences as well. There are surely more elegant ways to do it than with for loops, but it does the job. Let me know if anything is unclear!
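To illustrate the interpolate step in isolation (a minimal standalone sketch, separate from the full answer code): linear interpolation fills the NaN gaps, which is what lets Plotly draw one continuous line through the pivots.

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, np.nan, 4.0])
filled = s.interpolate()  # linear by default: the two gaps become 2.0 and 3.0
```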
import pandas as pd
import numpy as np
import plotly.graph_objects as go
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
# df = pd.read_csv("for_so.csv")
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])

# some calculations
df_diff = df['Pivot Price'].dropna().diff().copy()
df2 = df[df.index.isin(df_diff.index)].copy()
df2['Price Diff'] = df['Pivot Price'].dropna().values

fig.add_trace(
    go.Scatter(mode="lines+markers",
               x=df['Date'],
               y=df["Pivot Price"]
               ))
fig.update_layout(
    autosize=False,
    width=1000,
    height=800,)

# black connecting line through the pivots
fig.add_trace(go.Scatter(x=df['Date'], y=df['Pivot Price'].interpolate(),
                         mode='lines',
                         line=dict(color='black')))

def annot(value):
    # the first diff is NaN, so show nothing for the first pivot
    return '' if np.isnan(value) else value

# annotate each pivot with the difference to the previous pivot
j = 0
for i, p in enumerate(df['Pivot Price']):
    if not np.isnan(p):
        fig.add_annotation(dict(font=dict(color='rgba(0,0,200,0.8)', size=12),
                                x=df['Date'].iloc[i],
                                y=p,
                                showarrow=False,
                                text=annot(round(abs(df_diff.iloc[j]), 3)),
                                textangle=0,
                                xanchor='right',
                                xref="x",
                                yref="y"))
        j = j + 1

fig.update_xaxes(type='category')
fig.show()
The problem seems to be the missing values, which Plotly has difficulty with. With this trick you can draw the line through only the points that have a value:
import plotly.graph_objects as go
import pandas as pd

# df is the DataFrame from the question
# df = pd.read_csv("notebooks/for_so.csv")
has_value = ~df["Pivot Price"].isna()
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])
fig.add_trace(
    go.Scatter(mode='lines',
               x=df[has_value]['Date'],
               y=df[has_value]["Pivot Price"],
               line={'color': 'black', 'width': 1}
               ))
fig.add_trace(
    go.Scatter(mode="markers",
               x=df['Date'],
               y=df["Pivot Price"]
               ))
fig.update_layout(
    autosize=False,
    width=1000,
    height=800,)
fig.show()
This did it for me.
This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 2 years ago.
I have a table that looks like this
I want to keep the ids of the brands with the highest freq. For example, in the case of audi both ids have the same frequency, so keep only one. In the case of mercedes-benz, keep the latter one since it has frequency 7.
This is my dataframe:
{'Brand':
{0: 'audi',
1: 'audi',
2: 'bmw',
3: 'dacia',
4: 'fiat',
5: 'ford',
6: 'ford',
7: 'honda',
8: 'honda',
9: 'hyundai',
10: 'kia',
11: 'mercedes-benz',
12: 'mercedes-benz',
13: 'nissan',
14: 'nissan',
15: 'opel',
16: 'renault',
17: 'renault',
18: 'renault',
19: 'renault',
20: 'toyota',
21: 'toyota',
22: 'volvo',
23: 'vw',
24: 'vw',
25: 'vw',
26: 'vw'},
'id':
{0: 'audi_a4_dynamic_2016_otomatik',
1: 'audi_a6_standart_2015_otomatik',
2: 'bmw_5 series_executive_2016_otomatik',
3: 'dacia_duster_laureate_2017_manuel',
4: 'fiat_egea_easy_2017_manuel',
5: 'ford_focus_trend x_2015_manuel',
6: 'ford_focus_trend x_2015_otomatik',
7: 'honda_civic_eco elegance_2017_otomatik',
8: 'honda_cr-v_executive_2018_otomatik',
9: 'hyundai_tucson_elite plus_2017_otomatik',
10: 'kia_sportage_concept plus_2015_otomatik',
11: 'mercedes-benz_c-class_amg_2016_otomatik',
12: 'mercedes-benz_e-class_edition e_2015_otomatik',
13: 'nissan_qashqai_black edition_2014_manuel',
14: 'nissan_qashqai_sky pack_2015_otomatik',
15: 'opel_astra_edition_2016_manuel',
16: 'renault_clio_joy_2016_manuel',
17: 'renault_kadjar_icon_2015_otomatik',
18: 'renault_kadjar_icon_2016_otomatik',
19: 'renault_mégane_touch_2017_otomatik',
20: 'toyota_corolla_touch_2015_otomatik',
21: 'toyota_corolla_touch_2016_otomatik',
22: 'volvo_s60_advance_2018_otomatik',
23: 'vw_jetta_comfortline_2013_otomatik',
24: 'vw_passat_highline_2017_otomatik',
25: 'vw_tiguan_sport&style_2012_manuel',
26: 'vw_tiguan_sport&style_2013_manuel'},
'freq': {0: 4,
1: 4,
2: 7,
3: 4,
4: 4,
5: 4,
6: 4,
7: 4,
8: 4,
9: 4,
10: 4,
11: 4,
12: 7,
13: 4,
14: 4,
15: 4,
16: 4,
17: 4,
18: 4,
19: 4,
20: 4,
21: 4,
22: 4,
23: 4,
24: 7,
25: 4,
26: 4}}
Edit: tried one of the answers and got an extra level of header
You need to group by Brand with pandas.groupby and then aggregate with the maximum frequency.
Something like this should work:
df.groupby('Brand')[['id', 'freq']].agg({'freq': 'max'})
To get your result, run:
result = df.groupby('Brand', as_index=False).apply(
lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])
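An equivalent approach, sketched here as an alternative (not from the original answers): sort by freq descending and drop duplicate brands, which keeps the highest-frequency row per brand without a lambda.

```python
import pandas as pd

# a small subset of the question's data, for illustration
df = pd.DataFrame({'Brand': ['audi', 'audi', 'mercedes-benz', 'mercedes-benz'],
                   'id': ['audi_a4_dynamic_2016_otomatik',
                          'audi_a6_standart_2015_otomatik',
                          'mercedes-benz_c-class_amg_2016_otomatik',
                          'mercedes-benz_e-class_edition e_2015_otomatik'],
                   'freq': [4, 4, 4, 7]})

# highest freq first, then keep the first row seen for each brand
result = (df.sort_values('freq', ascending=False)
            .drop_duplicates('Brand')
            .sort_values('Brand')
            .reset_index(drop=True))
```

For ties (like audi) this keeps an arbitrary one of the tied rows, which matches the "keep only one" requirement.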
I have the following dictionary, where each key is associated with a DataFrame.
import pandas as pd
data = {}
data['total_brands'] = pd.DataFrame({'total_brands': {0: 164}})
data['new_portfolios_added'] = pd.DataFrame({'new_portfolios_added': {0: 3}})
data['total_updated_portfolios'] = pd.DataFrame({'total_updated_portfolios': {0: 1}})
data['family_per_brand'] = pd.DataFrame({'brand_name': {0: 'Morningstar',
1: 'Vanguard',
2: 'WisdomTree',
3: 'State Street',
4: 'First Trust',
5: 'Franklin Templeton Investments',
6: 'Logicly',
7: 'Nuveen',
8: 'Scott Burns',
9: 'Paul Merriman',
10: 'Fidelity',
11: 'FlexShares',
12: 'Alpha Architect',
13: 'Rick Ferri',
14: 'Craig Israelsen',
15: 'Rajan Subramanian',
16: 'Goldman Sachs',
17: 'JPMorgan',
18: 'Xtrackers',
19: 'PIMCO',
20: 'John Hancock',
21: 'Hartford',
22: 'Invesco',
23: 'Schwab'},
'family_per_brand': {0: 7,
1: 6,
2: 5,
3: 5,
4: 4,
5: 4,
6: 3,
7: 3,
8: 2,
9: 2,
10: 2,
11: 1,
12: 1,
13: 1,
14: 1,
15: 1,
16: 0,
17: 0,
18: 0,
19: 0,
20: 0,
21: 0,
22: 0,
23: 0}})
Now, I want to send all of my data by email, in the body of the message, with the data frames looking presentable. I searched around Stack Overflow and found these functions to help with my case:
import re
import numpy as np

blanks = r'^ *([a-zA-Z_0-9-]*) .*$'
blanks_comp = re.compile(blanks)

def find_index_in_line(line):
    # index of the first non-space character after the leading run of spaces
    index = 0
    spaces = False
    for ch in line:
        if ch == ' ':
            spaces = True
        elif spaces:
            break
        index += 1
    return index

def pretty_to_string(df):
    lines = df.to_string().split('\n')
    header = lines[0]
    m = blanks_comp.match(header)
    indices = []
    if m:
        st_index = m.start(1)
        indices.append(st_index)
    non_header_lines = lines[1:len(lines)]
    for line in non_header_lines:
        index = find_index_in_line(line)
        indices.append(index)
    mn = np.min(indices)
    newlines = []
    for l in lines:
        newlines.append(l[mn:len(l)])
    return '\n'.join(newlines) if df.shape[0] > 1 else ':'.join(newlines)
Then I tried:
final = "\n".join(pretty_to_string(data[key]) for key in data.keys())
print(final)
This gives me the following output, which is visually not appealing, as you can see from the attachment.
Ideally I would want 164 under total_brands, 3 under new_portfolios_added and 1 under total_updated_portfolios, all aligned to the right.
I would also want the dataframe with the column "brand_name" aligned below the "total_updated_portfolios" tab.
Perhaps saving to a CSV, then opening it in Excel and copying the table into the email would be fastest/easiest. That method often preserves the formatting you select, depending on your email client. Since data is a dictionary of DataFrames (not a single DataFrame), save each one separately:
for key, frame in data.items():
    frame.to_csv(key + '.csv')
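Alternatively, if your email client supports HTML bodies, rendering each frame with DataFrame.to_html keeps the columns aligned without any manual padding. A sketch using two of the frames from the question:

```python
import pandas as pd

# two of the frames from the question, rebuilt here so the sketch is standalone
data = {'total_brands': pd.DataFrame({'total_brands': {0: 164}}),
        'new_portfolios_added': pd.DataFrame({'new_portfolios_added': {0: 3}})}

# render each frame as an HTML table; HTML-capable mail clients keep the alignment
html_body = '<br>'.join(frame.to_html(index=False) for frame in data.values())
```

The resulting html_body string can be passed as the HTML part of a MIME message.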