This question already has an answer here:
How can I use cumsum within a group in Pandas?
(1 answer)
Closed 6 months ago.
I have a periodically updated table with premiums in different categories over a year for different companies. The dataframe looks like the one below:
   Company  Type             Month  Year  Ferdi Grup  Premium
1  Allianz  Birikimli Hayat  1      2022  Ferdi       325
2  Allianz  Birikimli Hayat  2      2022  Ferdi       476
3  Axa      Birikimli Hayat  3      2022  Ferdi       687
I want to get a table where the premium is accumulated over 'Company' and 'Year': for each month, the premium summed from the beginning of the year.
This is the regular sum operation, which works well in this case:
data.pivot_table(
    columns='Company',
    index='Month',
    values='Premium',
    aggfunc=np.sum
)
However, when I change it to np.cumsum, the result is a Series. I want a cumulative pivot table for each year, adding each month's value to the previous ones. How can I do that?
Expected output:
   Company  Month  Year  Premium
1  Allianz  1      2022  325
2  Allianz  2      2022  801
3  Axa      3      2022  687
So, this is the original data I am working with:
{'Company': {0: 'AgeSA',
1: 'Türkiye',
2: 'Türkiye',
3: 'AgeSA',
4: 'AgeSA',
5: 'Türkiye',
6: 'AgeSA',
7: 'Türkiye',
8: 'Türkiye',
9: 'AgeSA',
10: 'Türkiye',
11: 'Türkiye',
12: 'AgeSA',
13: 'Türkiye',
14: 'Türkiye',
15: 'AgeSA',
16: 'AgeSA',
17: 'Türkiye',
18: 'AgeSA',
19: 'Türkiye',
20: 'Türkiye',
21: 'AgeSA',
22: 'Türkiye',
23: 'Türkiye'},
'Type': {0: 'Birikimli Hayat',
1: 'Birikimli Hayat',
2: 'Sadece Yaşam Teminatlı',
3: 'Karma Sigorta',
4: 'Yıllık Vefat',
5: 'Yıllık Vefat',
6: 'Uzun Süreli Vefat',
7: 'Uzun Süreli Vefat',
8: 'Birikimli Hayat',
9: 'Yıllık Vefat',
10: 'Yıllık Vefat',
11: 'Uzun Süreli Vefat',
12: 'Birikimli Hayat',
13: 'Birikimli Hayat',
14: 'Sadece Yaşam Teminatlı',
15: 'Karma Sigorta',
16: 'Yıllık Vefat',
17: 'Yıllık Vefat',
18: 'Uzun Süreli Vefat',
19: 'Uzun Süreli Vefat',
20: 'Birikimli Hayat',
21: 'Yıllık Vefat',
22: 'Yıllık Vefat',
23: 'Uzun Süreli Vefat'},
'Month': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'Year': {0: 2022,
1: 2022,
2: 2022,
3: 2022,
4: 2022,
5: 2022,
6: 2022,
7: 2022,
8: 2022,
9: 2022,
10: 2022,
11: 2022,
12: 2022,
13: 2022,
14: 2022,
15: 2022,
16: 2022,
17: 2022,
18: 2022,
19: 2022,
20: 2022,
21: 2022,
22: 2022,
23: 2022},
'Ferdi Grup': {0: 'Ferdi',
1: 'Ferdi',
2: 'Ferdi',
3: 'Ferdi',
4: 'Ferdi',
5: 'Ferdi',
6: 'Ferdi',
7: 'Ferdi',
8: 'Grup',
9: 'Grup',
10: 'Grup',
11: 'Grup',
12: 'Ferdi',
13: 'Ferdi',
14: 'Ferdi',
15: 'Ferdi',
16: 'Ferdi',
17: 'Ferdi',
18: 'Ferdi',
19: 'Ferdi',
20: 'Grup',
21: 'Grup',
22: 'Grup',
23: 'Grup'},
'Premium': {0: 936622.43,
1: 14655.67,
2: 8496.0,
3: 124768619.29,
4: 6651019.24,
5: 11055383.530005993,
6: 54273212.457471885,
7: 22163192.66,
8: 81000.95,
9: 9338009.52,
10: 251790130.54997802,
11: 140949274.79999998,
12: 910808.77,
13: 8754.71,
14: 7128.0,
15: 129753498.31,
16: 8015974.454128993,
17: 16776490.000003006,
18: 67607915.34000003,
19: 24683694.700000003,
20: 60887.56,
21: 1497105.2458709963,
22: 195019190.297756,
23: 167424048.43},
'cumsum': {0: 936622.43,
1: 14655.67,
2: 23151.67,
3: 125705241.72000001,
4: 132356260.96000001,
5: 11078535.200005993,
6: 186629473.4174719,
7: 33241727.860005993,
8: 33322728.810005993,
9: 195967482.9374719,
10: 285112859.35998404,
11: 426062134.159984,
12: 196878291.7074719,
13: 426070888.869984,
14: 426078016.869984,
15: 326631790.0174719,
16: 334647764.4716009,
17: 442854506.869987,
18: 402255679.8116009,
19: 467538201.569987,
20: 467599089.129987,
21: 403752785.05747193,
22: 662618279.427743,
23: 830042327.857743}}
This is the result of a regular sum pivot:
Month  AgeSA               Türkiye
1      195967482.9374719   426062134.159984
2      207785302.12000003  403980193.69775903
When I use the suggested code, as below:
df_2 = data.copy()
df_2['cumsum'] = df_2.groupby(['Company', 'Year'])['Premium'].cumsum()
df_2 = df_2.sort_values(['Company', 'Year', 'cumsum']).reset_index(drop=True)
It seems each line gets a cumsum value accumulated from the lines above it.
To get the table I need, I have to take the max of each group again in a pivot_table:
df_2.pivot_table(
    index=['Year', 'Month'],
    values=['Premium', 'cumsum'],
    columns='Company',
    aggfunc={'Premium': 'sum', 'cumsum': 'max'}
)
which finally gets me to this result:
Is it really that difficult to get a cumsum table in pandas, or am I just doing it the hard way?
Your dataframe is already in the right format, so why do you want to pivot it again?
I think what you are looking for is pandas.groupby:
df['cumsum_by_group'] = df.groupby(['Company', 'Year'])['Premium'].cumsum()
Output:
Company Type Month Year Ferdi Grup Premium cumsum_by_group
1 Allianz Birikimli Hayat 1 2022 Ferdi 325 325
2 Allianz Birikimli Hayat 2 2022 Ferdi 476 801
3 Axa Birikimli Hayat 3 2022 Ferdi 687 687
To calculate the cumulative sum over multiple columns of a dataframe, you can use pandas.DataFrame.groupby and pandas.DataFrame.cumsum combined.
Assuming that data is the dataframe that holds the original dataset, use the code below:
data['Premium'] = data.groupby(['Company', 'Year'])['Premium'].cumsum()
out = data[['Company', 'Month', 'Year', 'Premium']]  # select the specific columns
print(out)
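Alternatively, if the goal is the wide, per-company cumulative table directly, a sketch (not part of the answer above; the small inline dataframe is a hypothetical stand-in for the question's data) is to pivot first and then take the cumulative sum down the rows within each year:

```python
import pandas as pd

# hypothetical minimal data in the question's shape
data = pd.DataFrame({
    'Company': ['Allianz', 'Allianz', 'Axa'],
    'Month': [1, 2, 3],
    'Year': [2022, 2022, 2022],
    'Premium': [325, 476, 687],
})

# sum per month first, then accumulate down the rows within each year
wide = (data.pivot_table(index=['Year', 'Month'], columns='Company',
                         values='Premium', aggfunc='sum', fill_value=0)
            .groupby(level='Year').cumsum())
print(wide)
```

This gives one row per (Year, Month) and one column per company, with each cell holding the year-to-date premium.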
I am looking to build a dictionary of lists that meet the following criteria:
Item0 in list1 == Item0 in list2, Item1 in list1 == Item1 in list2, and Date2 in list1 < Date2 in list2.
Running the code as-is gives me one list in the dict, and it is the same list even if I change the if statement to > instead of <.
Everything prior to this (see below) looks correct:
for li in Liststr2:
    for lr in Liststr1A:
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
Also, lr[2] and li[2] are dtype <M8[ns] (datetime64), if that makes a difference.
df = {'Position': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 2, 8: 1, 9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1, 15: 2, 16: 1, 17: 2, 18: 1, 19: 2, 20: 1}, 'Location': {0: 'AB1', 1: 'AB2', 2: 'AB3', 3: 'AB4', 4: 'AB4', 5: 'AB4', 6: 'AB4', 7: 'AB4', 8: 'AB4', 9: 'AB2', 10: 'AB5', 11: 'AB4', 12: 'AB4', 13: 'AB6', 14: 'AB6', 15: 'AB6', 16: 'AB6', 17: 'AB6', 18: 'AB6', 19: 'AB1', 20: 'AB1'}, 'DATE': {0: Timestamp('2021-05-22 18:00:00'), 1: Timestamp('2021-05-21 13:00:00'), 2: Timestamp('2021-05-24 12:23:00'), 3: Timestamp('2021-05-23 12:25:00'), 4: Timestamp('2021-05-23 12:25:00'), 5: Timestamp('2021-05-23 12:25:00'), 6: Timestamp('2021-05-23 12:25:00'), 7: Timestamp('2021-05-23 12:25:00'), 8: Timestamp('2021-05-23 12:25:00'), 9: Timestamp('2021-05-21 18:00:00'), 10: Timestamp('2021-05-21 18:00:00'), 11: Timestamp('2021-05-24 14:08:00'), 12: Timestamp('2021-05-24 14:08:00'), 13: Timestamp('2021-05-24 16:35:00'), 14: Timestamp('2021-05-24 16:35:00'), 15: Timestamp('2021-05-24 16:35:00'), 16: Timestamp('2021-05-24 16:35:00'), 17: Timestamp('2021-05-24 19:48:00'), 18: Timestamp('2021-05-24 19:48:00'), 19: Timestamp('2021-05-25 23:45:00'), 20: Timestamp('2021-05-25 23:45:00')}, 'Item Numbers': {0: '788-33', 1: '07-1', 2: '5214-3', 3: '003', 4: '003', 5: '009J', 6: '009J', 7: '009J', 8: '009J', 9: '07-1', 10: '68-302', 11: '6-5213', 12: '6-5214', 13: '1-801', 14: '1-801', 15: '1-801', 16: '1-801', 17: '4-008', 18: '4-008', 19: 'A-001', 20: 'A-001'}}
finaltemp = []
Finallist = {}
str1Temp = []
str2Temp = []
NaValues = []
Liststr2 = []
Liststr1 = []
Listna = []
n = 0
for col, row in df.iterrows():
    col1Temp = row['col1']
    col2Temp = row['col2']
    col3temp = row['col3']
    col4Temp = row['col4']
    if col4Temp == None:
        NaValues = [col1Temp, col3temp, col2Temp]
        Listna.append(NaValues)
    if col4Temp == 'str1':
        str1Temp = [col1Temp, col3temp, col2Temp]
        Liststr1.append(str1Temp)
    if col4Temp == 'str2':
        str2Temp = [col1Temp, col3temp, col2Temp]
        Liststr2.append(str2Temp)

for li in Liststr2:
    for lr in Liststr1:
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
            finaltemp = [lr[0], lr[1], lr[2]]
            n = +1
            key = 'Bad' + str(n)
            def t(): return {key: finaltemp}
            Finallist.update(t())
print(Finallist)
This simplifies your final loop, which (as I said) should be at the left margin, not indented one step:
Liststr1A = Liststr1[:10]
for li in Liststr2:
    for lr in Liststr1A:
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
            Finallist['Bad' + str(len(Finallist) + 1)] = lr[:]
print(Finallist)
It's not clear to me why you want Finallist to be a dictionary, since you want incrementing keys. Why not just make it a list and use Finallist.append?
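A minimal, self-contained sketch of that list-based variant (the sample rows below are hypothetical stand-ins in the asker's [position, item, date] layout):

```python
from datetime import datetime

# hypothetical sample rows: [position, item, date]
Liststr1 = [[1, 'A-001', datetime(2021, 5, 25)],
            [2, '07-1', datetime(2021, 5, 21)]]
Liststr2 = [[1, 'A-001', datetime(2021, 5, 22)],
            [2, '07-1', datetime(2021, 5, 23)]]

matches = []  # a plain list instead of a dict with synthetic 'Bad1', 'Bad2', ... keys
for li in Liststr2:
    for lr in Liststr1:
        # same position, same item, and the Liststr1 date is later
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
            matches.append(lr[:])
print(matches)
```

Here only the first pair qualifies, so matches holds one row; appending avoids the key-overwrite problem that collapses the dict to a single entry.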
I am trying to solve a business problem using Python but am having difficulty coming up with a script. I have tried looping through the dataframe with df.iterrows(), but I am totally stuck because I don't know how to proceed.
We process volumes in production orders of one type of resource that we need to process FIFO (first in, first out). Each lot has a certain volume and price; after using up a lot we start on the next lot (FIFO).
Question: How can I automate the calculation of the column Revenu? Can you come up with some Python code to automate this process? Would you use a while or for loop, and would you iterate through the dataframe?
Below I posted a screenshot of the solution: on the left the production orders, and on the right the volume and price per lot.
Below the image I posted two dictionaries containing the data from the screenshot.
Would really appreciate your help.
{'Productionorder': {0: 'Productionorder 1',
1: 'Productionorder 2',
2: 'Productionorder 3',
3: 'Productionorder 4',
4: 'Productionorder 5',
5: 'Productionorder 6',
6: 'Productionorder 7',
7: 'Productionorder 8',
8: 'Productionorder 9',
9: 'Productionorder 10',
10: 'Productionorder 11',
11: 'Productionorder 12',
12: 'Productionorder 13',
13: 'Productionorder 14',
14: 'Productionorder 15',
15: 'Productionorder 16',
16: 'Productionorder 17',
17: 'Productionorder 18',
18: 'Productionorder 19',
19: 'Productionorder 20',
20: 'Productionorder 21',
21: 'Productionorder 22'},
'Processed volume': {0: 810,
1: 3240,
2: 3177,
3: 1620,
4: 6480,
5: 5120,
6: 10880,
7: 13770,
8: 21060,
9: 4860,
10: 810,
11: 1620,
12: 15390,
13: 15390,
14: 6800,
15: 4480,
16: 10200,
17: 16650,
18: 2550,
19: 9050,
20: 9900,
21: 3200},
'Lotno.': {0: 1,
1: 1,
2: 1,
3: 1,
4: 2,
5: 2,
6: 2,
7: 2,
8: 2,
9: 2,
10: 2,
11: 2,
12: 2,
13: 3,
14: 3,
15: 3,
16: 3,
17: 3,
18: 3,
19: 3,
20: 4,
21: 4},
'Left of Lotno.': {0: 8490,
1: 5250,
2: 2073,
3: 453,
4: 75973,
5: 70853,
6: 59973,
7: 46203,
8: 25143,
9: 20283,
10: 19473,
11: 17853,
12: 2463,
13: 52073,
14: 45273,
15: 40793,
16: 30593,
17: 13943,
18: 11393,
19: 2343,
20: 38443,
21: 35243},
'Revenu': {0: 1741.5,
1: 6966.0,
2: 6830.549999999999,
3: 3483.0,
4: 10315.800000000001,
5: 7936.0,
6: 16864.0,
7: 21343.5,
8: 32643.0,
9: 7533.0,
10: 1255.5,
11: 2511.0,
12: 23854.5,
13: 20622.750000000004,
14: 8840.0,
15: 5824.0,
16: 13260.0,
17: 21645.0,
18: 3315.0,
19: 11765.0,
20: 12492.15,
21: 4000.0}}
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
1: Timestamp('2021-01-02 00:00:00'),
2: Timestamp('2021-01-03 00:00:00'),
3: Timestamp('2021-01-04 00:00:00')},
'Lotno.': {0: 1, 1: 2, 2: 3, 3: 4},
'Volume': {0: 9300, 1: 82000, 2: 65000, 3: 46000},
'Price': {0: 2.15, 1: 1.55, 2: 1.3, 3: 1.25}}
Assuming you have two dataframes, one for the Production Orders and another for the Lot Details, the following function should allow you to calculate the revenues (along with the 'Lotno.' and 'Left of Lotno.' intermediate columns).
Requirements for each dataframe:
The Production Orders DataFrame must:
- contain a column titled 'Processed volume'
- have an index of consecutive integers starting at 0
The Lot Details DataFrame must:
- contain the columns ['Lotno.', 'Volume', 'Price']
- have at least one row
- have its rows ordered in the expected order of depletion
In the event that the quantity available in the lots is depleted, no additional revenue is generated.
def fill_revenue(df1_orig, df2):
    """
    df1_orig is the Production Orders DataFrame
    df2 is the Lot Details DataFrame
    The returned DataFrame is based on a copy of df1_orig
    """
    df1 = df1_orig.copy()
    # create empty columns for the calculated fields
    df1['Lotno.'] = None
    df1['Left of Lotno.'] = None
    df1['Revenu'] = None

    def recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict=None):
        """A function used to update the new values of a row"""
        if return_dict is None:
            return_dict = {'Revenu': 0}
        return_dict.update({'Lotno.': current_lot, 'Left of Lotno.': current_lot_quantity})
        lot_info = df2.loc[df2['Lotno.'] == current_lot].iloc[0]
        # start calculation
        if current_lot_quantity > order_volume:
            return_dict['Revenu'] += order_volume * lot_info['Price']
            current_lot_quantity -= order_volume
            order_volume = 0
            return_dict['Left of Lotno.'] = current_lot_quantity
        else:
            return_dict['Revenu'] += current_lot_quantity * lot_info['Price']
            order_volume -= current_lot_quantity
            try:
                lot_info = df2.iloc[df2.index.get_loc(lot_info.name) + 1]
            except IndexError:
                return_dict['Left of Lotno.'] = 0
                return return_dict
            current_lot = lot_info['Lotno.']
            current_lot_quantity = lot_info['Volume']
            recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict)
        return return_dict

    # update each row of the Production Orders DataFrame
    for idx, row in df1.iterrows():
        order_volume = row['Processed volume']
        current_lot = df2.iloc[0]['Lotno.'] if idx == 0 else df1.iloc[idx - 1]['Lotno.']
        current_lot_quantity = df2.iloc[0]['Volume'] if idx == 0 else df1.iloc[idx - 1]['Left of Lotno.']
        update_dict = recursive_revenu_calc(order_volume, current_lot, current_lot_quantity)
        for key, value in update_dict.items():
            df1.loc[idx, key] = value
    return df1
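For comparison, the same FIFO allocation can also be written iteratively. Below is a minimal sketch (the fifo_revenue name and the reduced column set are assumptions for illustration), using hypothetical stand-ins for the first three orders and first two lots of the question's data:

```python
import pandas as pd

# hypothetical stand-ins for the question's data (first three orders, first two lots)
orders = pd.DataFrame({'Processed volume': [810, 3240, 6000]})
lots = pd.DataFrame({'Lotno.': [1, 2], 'Volume': [9300, 82000], 'Price': [2.15, 1.55]})

def fifo_revenue(orders, lots):
    """Allocate each order's volume against the lots in FIFO order."""
    revenues = []
    lot_rows = list(lots.itertuples(index=False))
    pos, remaining = 0, lot_rows[0].Volume
    for vol in orders['Processed volume']:
        revenue = 0.0
        while vol > 0 and pos < len(lot_rows):
            take = min(vol, remaining)          # draw as much as the current lot allows
            revenue += take * lot_rows[pos].Price
            vol -= take
            remaining -= take
            if remaining == 0:                  # lot depleted: move on to the next one
                pos += 1
                if pos < len(lot_rows):
                    remaining = lot_rows[pos].Volume
        revenues.append(revenue)
    return revenues

revenues = fifo_revenue(orders, lots)
print(revenues)
```

The first two values match the question's Revenu column (1741.5 and 6966.0); the third order spans the boundary between lot 1 and lot 2, which is where the while loop earns its keep.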
I'm trying to use Plotly to overlay a marker/line chart on top of my OHLC candle chart.
Code
import plotly.graph_objects as go
import pandas as pd
import numpy as np  # needed for the np.nan values in the data below
from datetime import datetime
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])

fig.add_trace(
    go.Scatter(mode="lines+markers",
               x=df['Date'],
               y=df["Pivot Price"]
    ))

fig.update_layout(
    autosize=False,
    width=1000,
    height=800,
)
fig.show()
This is the current image.
This is the desired output/image.
I want a black line between the markers (pivots). I would also ideally like a value next to each line showing the distance between each pivot, but I'm not sure how to do this.
For example, the distance between the first two pivots, round(abs(1.293494 - 1.279329), 3), returns 0.014, so I would ideally like this next to the line.
The second is round(abs(1.279329 - 1.329610), 3), so the value would be 0.05. I have hand-edited the image and added the lines for the first two values to give a visual representation of what I'm trying to achieve.
The problem seems to be the missing values, so just use pandas.Series.interpolate in combination with fig.add_annotation.
I've included annotations for the differences as well. There are surely more elegant ways to do it than with for loops, but it does the job. Let me know if anything is unclear!
import pandas as pd
import numpy as np
import plotly.graph_objects as go
df = pd.DataFrame({...})  # the same dict of index/Date/Open/High/Low/Close/'Pivot Price' as in the question above
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])

# some calculations
df_diff = df['Pivot Price'].dropna().diff().copy()
df2 = df[df.index.isin(df_diff.index)].copy()
df2['Price Diff'] = df['Pivot Price'].dropna().values

fig.add_trace(
    go.Scatter(mode="lines+markers",
               x=df['Date'],
               y=df["Pivot Price"]
    ))

fig.update_layout(
    autosize=False,
    width=1000,
    height=800,
)

# black connecting line through the interpolated pivot values
fig.add_trace(go.Scatter(x=df['Date'], y=df['Pivot Price'].interpolate(),
                         mode='lines',
                         line=dict(color='black')))
def annot(value):
    # the first diff is NaN, so return an empty label for it
    if np.isnan(value):
        return ''
    else:
        return value

j = 0
for i, p in enumerate(df['Pivot Price']):
    if not np.isnan(p):
        fig.add_annotation(dict(font=dict(color='rgba(0,0,200,0.8)', size=12),
                                x=df['Date'].iloc[i],
                                y=p,
                                showarrow=False,
                                text=annot(round(abs(df_diff.iloc[j]), 3)),
                                textangle=0,
                                xanchor='right',
                                xref="x",
                                yref="y"))
        j = j + 1
fig.update_xaxes(type='category')
fig.show()
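As a side note on why the interpolated trace produces a continuous line: pandas.Series.interpolate fills the NaN gaps linearly between the known values. A quick illustration:

```python
import pandas as pd

# two NaN gaps between known endpoints get filled linearly
s = pd.Series([1.0, None, None, 4.0])
print(s.interpolate().tolist())  # -> [1.0, 2.0, 3.0, 4.0]
```

Once the gaps are filled, Plotly has a y-value for every x and can draw an unbroken line through the pivots.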
The problem seems to be the missing values, which Plotly has difficulty with. With this trick you plot the line only through the points that have a value:
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime

df = pd.read_csv("notebooks/for_so.csv")
has_value = ~df["Pivot Price"].isna()

fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])

fig.add_trace(
    go.Scatter(mode='lines',
               x=df[has_value]['Date'],
               y=df[has_value]["Pivot Price"],
               line={'color': 'black', 'width': 1}
    ))

fig.add_trace(
    go.Scatter(mode="markers",
               x=df['Date'],
               y=df["Pivot Price"]
    ))

fig.update_layout(
    autosize=False,
    width=1000,
    height=800,
)
fig.show()
This did it for me.
This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 2 years ago.
I have a table that looks like this.
I want to keep the ids of brands that have the highest freq. For example, in the case of audi both ids have the same frequency, so keep only one. In the case of mercedes-benz, keep the latter one since it has frequency 7.
This is my dataframe:
{'Brand':
{0: 'audi',
1: 'audi',
2: 'bmw',
3: 'dacia',
4: 'fiat',
5: 'ford',
6: 'ford',
7: 'honda',
8: 'honda',
9: 'hyundai',
10: 'kia',
11: 'mercedes-benz',
12: 'mercedes-benz',
13: 'nissan',
14: 'nissan',
15: 'opel',
16: 'renault',
17: 'renault',
18: 'renault',
19: 'renault',
20: 'toyota',
21: 'toyota',
22: 'volvo',
23: 'vw',
24: 'vw',
25: 'vw',
26: 'vw'},
'id':
{0: 'audi_a4_dynamic_2016_otomatik',
1: 'audi_a6_standart_2015_otomatik',
2: 'bmw_5 series_executive_2016_otomatik',
3: 'dacia_duster_laureate_2017_manuel',
4: 'fiat_egea_easy_2017_manuel',
5: 'ford_focus_trend x_2015_manuel',
6: 'ford_focus_trend x_2015_otomatik',
7: 'honda_civic_eco elegance_2017_otomatik',
8: 'honda_cr-v_executive_2018_otomatik',
9: 'hyundai_tucson_elite plus_2017_otomatik',
10: 'kia_sportage_concept plus_2015_otomatik',
11: 'mercedes-benz_c-class_amg_2016_otomatik',
12: 'mercedes-benz_e-class_edition e_2015_otomatik',
13: 'nissan_qashqai_black edition_2014_manuel',
14: 'nissan_qashqai_sky pack_2015_otomatik',
15: 'opel_astra_edition_2016_manuel',
16: 'renault_clio_joy_2016_manuel',
17: 'renault_kadjar_icon_2015_otomatik',
18: 'renault_kadjar_icon_2016_otomatik',
19: 'renault_mégane_touch_2017_otomatik',
20: 'toyota_corolla_touch_2015_otomatik',
21: 'toyota_corolla_touch_2016_otomatik',
22: 'volvo_s60_advance_2018_otomatik',
23: 'vw_jetta_comfortline_2013_otomatik',
24: 'vw_passat_highline_2017_otomatik',
25: 'vw_tiguan_sport&style_2012_manuel',
26: 'vw_tiguan_sport&style_2013_manuel'},
'freq': {0: 4,
1: 4,
2: 7,
3: 4,
4: 4,
5: 4,
6: 4,
7: 4,
8: 4,
9: 4,
10: 4,
11: 4,
12: 7,
13: 4,
14: 4,
15: 4,
16: 4,
17: 4,
18: 4,
19: 4,
20: 4,
21: 4,
22: 4,
23: 4,
24: 7,
25: 4,
26: 4}}
Edit: I tried one of the answers and got an extra level in the header.
You need to pandas.groupby on Brand and then aggregate with respect to the maximal frequency.
Something like this should work:
df.groupby('Brand')[['id', 'freq']].agg({'freq': 'max'})
To get your result, run:
result = df.groupby('Brand', as_index=False).apply(
lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])
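An alternative sketch (not from the answers above) that avoids apply: sort by freq descending and keep the first row per brand with drop_duplicates. The small dataframe below is a hypothetical subset of the question's data:

```python
import pandas as pd

# hypothetical subset of the question's data
df = pd.DataFrame({
    'Brand': ['audi', 'audi', 'mercedes-benz', 'mercedes-benz'],
    'id': ['audi_a4', 'audi_a6', 'mb_c-class', 'mb_e-class'],
    'freq': [4, 4, 4, 7],
})

# sort so the highest freq comes first, then keep one row per brand;
# kind='stable' makes ties keep the row that appears first in the data
result = (df.sort_values('freq', ascending=False, kind='stable')
            .drop_duplicates('Brand')
            .sort_values('Brand')
            .reset_index(drop=True))
print(result)
```

This keeps exactly one row per brand: the tied audi rows resolve to the first one, and mercedes-benz resolves to the freq-7 row.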