Animate a Plotly map with a sliding date bar

I'm struggling to turn this piece of code I wrote, which constructs a static heatmap, into an animated version with a date slider.
import pandas as pd
import plotly.graph_objects as go
...
fig = go.Figure(go.Densitymapbox(lat=df_heat['lat'], lon=df_heat['lon'], z=df_heat['count'],
radius=10,))
fig.update_layout(mapbox_style="carto-positron", mapbox_zoom=10, mapbox_center = {"lat": 40.7831, "lon": -73.9712},)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
The above code successfully converts a Pandas DataFrame df_heat, which looks like the following, into a Plotly heatmap.
lat lon count
0 -62.884215 39.440236 1
1 -62.834226 39.408072 1
2 -62.811707 39.380462 1
3 -62.744564 39.489112 1
...
Static heatmap output:
df_heat is itself just an aggregated view of the following DataFrame, which also includes a date.
date lat lon count
0 2018-07-29 40.691828 -73.944609 1
1 2018-07-29 40.693601 -73.945092 1
2 2018-07-29 40.696132 -73.945178 1
3 2018-07-29 40.692726 -73.945532 1
My question is: how can I convert this DataFrame with dates into an animated Plotly map, such as the ones here, here, and here, which feature a date slider as a filter?
Dummy data for testing:
df = pd.DataFrame({'datetime': {0: '2018-09-29 00:00:00', 1: '2018-07-28 00:00:00', 2: '2018-07-29 00:00:00', 3: '2018-07-29 00:00:00', 4: '2018-08-01 00:00:00', 5: '2018-08-01 00:00:00', 6: '2018-08-01 00:00:00', 7: '2018-08-05 00:00:00', 8: '2018-09-06 00:00:00', 9: '2018-09-07 00:00:00', 10: '2018-09-07 00:00:00', 11: '2018-09-08 00:00:00', 12: '2018-09-08 00:00:00', 13: '2018-09-08 00:00:00', 14: '2018-10-08 00:00:00', 15: '2018-10-10 00:00:00', 16: '2018-10-10 00:00:00', 17: '2018-10-11 00:00:00', 18: '2018-10-11 00:00:00', 19: '2018-10-11 00:00:00'},
'lat': {0: 40.6908284, 1: 40.693601, 2: 40.6951317, 3: 40.6967261, 4: 40.697593, 5: 40.6987141, 6: 40.7186497, 7: 40.7187772, 8: 40.7196151, 9: 40.7196865, 10: 40.7187408, 11: 40.7189716, 12: 40.7214273, 13: 40.7226571, 14: 40.7236955, 15: 40.7247207, 16: 40.7221074, 17: 40.7445859, 18: 40.7476252, 19: 40.7476451},
'lon': {0: -73.9336094, 1: -73.9350917, 2: -73.9351778, 3: -73.9355315, 4: -73.9366737, 5: -73.9393797, 6: -74.0011939, 7: -74.0010918, 8: -73.9887851, 9: -74.0035125, 10: -74.0250842, 11: -74.0299202, 12: -74.029886, 13: -74.027542, 14: -74.0290157, 15: -74.0291541, 16: -74.0220728, 17: -73.9442636, 18: -73.9641326, 19: -73.9533039},
'count': {0: 1, 1: 2, 2: 5, 3: 1, 4: 6, 5: 1, 6: 3, 7: 2, 8: 1, 9: 7, 10: 3, 11: 3, 12: 1, 13: 2, 14: 1, 15: 1, 16: 2, 17: 1, 18: 1, 19: 1}})
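Since df_heat is an aggregation of this dated DataFrame, the same groupby can keep the date as an extra key, giving one row per date and location. That long format is what plotly.express expects when you pass a column to animation_frame. A minimal sketch, assuming only the first four dummy rows; the derived column date and the names df_heat and df_frames are illustrative, not part of the original code.

```python
import pandas as pd

# First four rows of the dummy data above
df = pd.DataFrame({
    'datetime': ['2018-09-29 00:00:00', '2018-07-28 00:00:00',
                 '2018-07-29 00:00:00', '2018-07-29 00:00:00'],
    'lat': [40.6908284, 40.693601, 40.6951317, 40.6967261],
    'lon': [-73.9336094, -73.9350917, -73.9351778, -73.9355315],
    'count': [1, 2, 5, 1],
})

# Static view: one row per location, counts summed over all dates
df_heat = df.groupby(['lat', 'lon'], as_index=False)['count'].sum()

# Animated view: one row per (date, location). The date is stored as a
# 'YYYY-MM-DD' string, which gives readable slider labels, and groupby sorts
# by its keys, so the rows come out in chronological order.
df['date'] = pd.to_datetime(df['datetime']).dt.strftime('%Y-%m-%d')
df_frames = df.groupby(['date', 'lat', 'lon'], as_index=False)['count'].sum()
```

df_frames can then be passed to px.density_mapbox with animation_frame='date', and each slider step shows that day's points.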

I made a few changes and added the timeline animation to your code.
Like Teoretic's solution, I also used plotly.express, which makes things shorter.
Live example
http://www.erangrinberg.de/plotly/map-animation.html
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Date': {0: '2018-09-29 00:00:00', 1: '2018-07-28 00:00:00', 2: '2018-07-29 00:00:00', 3: '2018-07-29 00:00:00', 4: '2018-08-01 00:00:00', 5: '2018-08-01 00:00:00', 6: '2018-08-01 00:00:00', 7: '2018-08-05 00:00:00', 8: '2018-09-06 00:00:00', 9: '2018-09-07 00:00:00', 10: '2018-09-07 00:00:00', 11: '2018-09-08 00:00:00', 12: '2018-09-08 00:00:00', 13: '2018-09-08 00:00:00', 14: '2018-10-08 00:00:00', 15: '2018-10-10 00:00:00', 16: '2018-10-10 00:00:00', 17: '2018-10-11 00:00:00', 18: '2018-10-11 00:00:00', 19: '2018-10-11 00:00:00'},
'lat': {0: 40.6908284, 1: 40.693601, 2: 40.6951317, 3: 40.6967261, 4: 40.697593, 5: 40.6987141, 6: 40.7186497, 7: 40.7187772, 8: 40.7196151, 9: 40.7196865, 10: 40.7187408, 11: 40.7189716, 12: 40.7214273, 13: 40.7226571, 14: 40.7236955, 15: 40.7247207, 16: 40.7221074, 17: 40.7445859, 18: 40.7476252, 19: 40.7476451},
'lon': {0: -73.9336094, 1: -73.9350917, 2: -73.9351778, 3: -73.9355315, 4: -73.9366737, 5: -73.9393797, 6: -74.0011939, 7: -74.0010918, 8: -73.9887851, 9: -74.0035125, 10: -74.0250842, 11: -74.0299202, 12: -74.029886, 13: -74.027542, 14: -74.0290157, 15: -74.0291541, 16: -74.0220728, 17: -73.9442636, 18: -73.9641326, 19: -73.9533039},
'count': {0: 1, 1: 2, 2: 5, 3: 1, 4: 6, 5: 1, 6: 3, 7: 2, 8: 1, 9: 7, 10: 3, 11: 3, 12: 1, 13: 2, 14: 1, 15: 1, 16: 2, 17: 1, 18: 1, 19: 1}})
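# Frames follow the order in which dates first appear, so sort them chronologically
df = df.sort_values('Date')
fig = px.density_mapbox(df, lat='lat',
                        lon='lon',
                        z='count',
                        radius=10,
                        animation_frame="Date"
                        )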
fig.update_layout(mapbox_style="carto-positron", mapbox_zoom=10, mapbox_center = {"lat": 40.7831, "lon": -73.9712},)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 600
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 600
fig.layout.coloraxis.showscale = True
fig.layout.sliders[0].pad.t = 10
fig.layout.updatemenus[0].pad.t= 10
fig.show()
It would be nice to see the final result,
or perhaps you could share the source of the dataset.

You can experiment with the scatter_geo plot from plotly.express to get an interactive graph.
It doesn't produce a heatmap, but it can draw dots like the ones on your graph.
Sample code with your dummy data:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'datetime': {0: '2018-09-29 00:00:00', 1: '2018-07-28 00:00:00', 2: '2018-07-29 00:00:00', 3: '2018-07-29 00:00:00', 4: '2018-08-01 00:00:00', 5: '2018-08-01 00:00:00', 6: '2018-08-01 00:00:00', 7: '2018-08-05 00:00:00', 8: '2018-09-06 00:00:00', 9: '2018-09-07 00:00:00', 10: '2018-09-07 00:00:00', 11: '2018-09-08 00:00:00', 12: '2018-09-08 00:00:00', 13: '2018-09-08 00:00:00', 14: '2018-10-08 00:00:00', 15: '2018-10-10 00:00:00', 16: '2018-10-10 00:00:00', 17: '2018-10-11 00:00:00', 18: '2018-10-11 00:00:00', 19: '2018-10-11 00:00:00'},
'lat': {0: 40.6908284, 1: 40.693601, 2: 40.6951317, 3: 40.6967261, 4: 40.697593, 5: 40.6987141, 6: 40.7186497, 7: 40.7187772, 8: 40.7196151, 9: 40.7196865, 10: 40.7187408, 11: 40.7189716, 12: 40.7214273, 13: 40.7226571, 14: 40.7236955, 15: 40.7247207, 16: 40.7221074, 17: 40.7445859, 18: 40.7476252, 19: 40.7476451},
'lon': {0: -73.9336094, 1: -73.9350917, 2: -73.9351778, 3: -73.9355315, 4: -73.9366737, 5: -73.9393797, 6: -74.0011939, 7: -74.0010918, 8: -73.9887851, 9: -74.0035125, 10: -74.0250842, 11: -74.0299202, 12: -74.029886, 13: -74.027542, 14: -74.0290157, 15: -74.0291541, 16: -74.0220728, 17: -73.9442636, 18: -73.9641326, 19: -73.9533039},
'count': {0: 1, 1: 2, 2: 5, 3: 1, 4: 6, 5: 1, 6: 3, 7: 2, 8: 1, 9: 7, 10: 3, 11: 3, 12: 1, 13: 2, 14: 1, 15: 1, 16: 2, 17: 1, 18: 1, 19: 1}})
# Sort so the slider steps through the dates chronologically
df = df.sort_values('datetime')
fig = px.scatter_geo(df,
lat='lat',
lon='lon',
scope='usa',
color="count",
size='count',
projection="albers usa",
animation_frame="datetime",
title='Your title')
fig.update(layout_coloraxis_showscale=False)
fig.show()
You can also check this Kaggle notebook for more examples of how to use this chart.

Related

Pandas merge dataframes with multiple columns

I am trying to merge two DataFrames and am having trouble figuring out how, as it is not straightforward.
One data frame has match results for over 25000 games and looks like this.
The second one has team performance metrics but only for around 1500 games.
As I am not allowed to post pictures yet, here are the column names of interest:
df_match['date', 'home_team_api_id', 'away_team_api_id']
df_team_attributes['date', 'team_api_id']
Both data frames have additional columns with results or performance metrics.
To merge correctly, I need to match on the date and on whether 'team_api_id' matches either 'home...' or 'away_team_api_id'.
This is what I have tried until now:
df_team_performance = pd.merge(df_team_attributes, df_match,
how = 'left',
left_on = ['date', 'team_api_id', 'team_api_id'],
right_on = ['date', 'home_team_api_id', 'home_team_api_id'])
I have also tried with only two columns, but without success.
What I would like to get is a new data frame with only the rows of the df_team_attributes and columns from both data frames.
Thank you in advance!
Added at the request of Correlien:
output of print(df_match[['date', 'home_team_api_id', 'away_team_api_id', 'win_home', 'win_away', 'draw', 'win']].head(10).to_dict())
{'date': {0: '2008-08-17 00:00:00', 1: '2008-08-16 00:00:00', 2: '2008-08-16 00:00:00', 3: '2008-08-17 00:00:00', 4: '2008-08-16 00:00:00', 5: '2008-09-24 00:00:00', 6: '2008-08-16 00:00:00', 7: '2008-08-16 00:00:00', 8: '2008-08-16 00:00:00', 9: '2008-11-01 00:00:00'}, 'home_team_api_id': {0: 9987, 1: 10000, 2: 9984, 3: 9991, 4: 7947, 5: 8203, 6: 9999, 7: 4049, 8: 10001, 9: 8342}, 'away_team_api_id': {0: 9993, 1: 9994, 2: 8635, 3: 9998, 4: 9985, 5: 8342, 6: 8571, 7: 9996, 8: 9986, 9: 8571}, 'win_home': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1, 9: 1}, 'win_away': {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1, 8: 0, 9: 0}, 'draw': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0}, 'win': {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1}}
output of print(df_team_attributes[['date', 'team_api_id', 'buildUpPlaySpeed', 'buildUpPlaySpeedClass']].head(10).to_dict())
{'date': {0: '2010-02-22 00:00:00', 1: '2014-09-19 00:00:00', 2: '2015-09-10 00:00:00', 3: '2010-02-22 00:00:00', 4: '2011-02-22 00:00:00', 5: '2012-02-22 00:00:00', 6: '2013-09-20 00:00:00', 7: '2014-09-19 00:00:00', 8: '2015-09-10 00:00:00', 9: '2010-02-22 00:00:00'}, 'team_api_id': {0: 9930, 1: 9930, 2: 9930, 3: 8485, 4: 8485, 5: 8485, 6: 8485, 7: 8485, 8: 8485, 9: 8576}, 'buildUpPlaySpeed': {0: 60, 1: 52, 2: 47, 3: 70, 4: 47, 5: 58, 6: 62, 7: 58, 8: 59, 9: 60}, 'buildUpPlaySpeedClass': {0: 'Balanced', 1: 'Balanced', 2: 'Balanced', 3: 'Fast', 4: 'Balanced', 5: 'Balanced', 6: 'Balanced', 7: 'Balanced', 8: 'Balanced', 9: 'Balanced'}}

Have you tried casting your date columns to datetime and then attempting the merge? The following worked for me with the example you provided:
# Casting to date
df_match["date"] = pd.to_datetime(df_match["date"])
df_team_attributes["date"] = pd.to_datetime(df_team_attributes["date"])
# Merging on the date field alone
df_team_performance = pd.merge(df_team_attributes, df_match,
how = 'left',
on = 'date')
# Filtering out the required rows
result = df_team_performance.query("(team_api_id == home_team_api_id) | (team_api_id == away_team_api_id)")
Please let me know if my understanding of your question is correct.
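Another way to express "team_api_id matches either the home or the away id" is to merge twice, once per id column, and stack the two results. This avoids the intermediate merge on date alone, which pairs every attribute row with every match played that day. A sketch under assumptions: the tiny frames below are hypothetical stand-ins that only reuse the question's column names, and inner merges are used, so attribute rows with no match on that date are dropped (the query approach above drops them too).

```python
import pandas as pd

# Hypothetical miniature versions of the two DataFrames
df_match = pd.DataFrame({
    'date': ['2010-02-22', '2011-02-22'],
    'home_team_api_id': [9930, 8576],
    'away_team_api_id': [8485, 8485],
    'win_home': [1, 0],
})
df_team_attributes = pd.DataFrame({
    'date': ['2010-02-22', '2010-02-22', '2011-02-22', '2014-09-19'],
    'team_api_id': [9930, 8485, 8485, 9930],
    'buildUpPlaySpeed': [60, 70, 47, 52],
})

# Use real datetimes on both sides so the date keys compare equal
df_match['date'] = pd.to_datetime(df_match['date'])
df_team_attributes['date'] = pd.to_datetime(df_team_attributes['date'])

# Rows where the team played at home on that date
home = df_team_attributes.merge(
    df_match, how='inner',
    left_on=['date', 'team_api_id'], right_on=['date', 'home_team_api_id'])

# Rows where the team played away on that date
away = df_team_attributes.merge(
    df_match, how='inner',
    left_on=['date', 'team_api_id'], right_on=['date', 'away_team_api_id'])

# Each row is one (team attributes, match) pair
result = pd.concat([home, away], ignore_index=True)
```

If you need to keep the unmatched attribute rows as well, a left merge per side plus some deduplication would be required; the inner version keeps the sketch simple.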

Computing the average

In the following dataset:
import pandas as pd
df = pd.DataFrame({'globalid': {0: '4388064', 1: '4388200', 2: '4399344', 3: '4400638', 4: '4401765', 5: '4401831', 6: '4402098', 7: '4406997', 8: '4407331', 9: '4417043', 10: '4437380', 11: '4442467', 12: '4401955', 13: '4425140', 14: '4426164', 15: '4405473', 16: '4411249', 17: '4388584', 18: '4400483', 19: '4433927', 20: '4413441', 21: '4436355', 22: '4443361', 23: '4443375', 24: '4388176'}, 'postcode': {0: '1774PG', 1: '7481LK', 2: '1068MS', 3: '5628EN', 4: '7731TV', 5: '5971CR', 6: '9571BM', 7: '1031KA', 8: '9076BK', 9: '4465AL', 10: '1096AC', 11: '3601', 12: '2563PT', 13: '2341HN', 14: '2553DM', 15: '2403EM', 16: '1051AN', 17: '4525AB', 18: '4542BA', 19: '1096AC', 20: '5508AE', 21: '1096AC', 22: '3543GC', 23: '4105TA', 24: '7742EH'}, 'koopprijs': {0: '139000', 1: '209000', 2: '267500', 3: '349000', 4: '495000', 5: '162500', 6: '217500', 7: '655000', 8: '180000', 9: '495000', 10: '2395000', 11: '355000', 12: '150000', 13: '167500', 14: '710000', 15: '275000', 16: '498000', 17: '324500', 18: '174500', 19: '610000', 20: '300000', 21: '2230000', 22: '749000', 23: '504475', 24: '239000'}, 'place_name': {0: 'Slootdorp', 1: 'Haaksbergen', 2: 'Amsterdam', 3: 'Eindhoven', 4: 'Ommen', 5: 'Grubbenvorst', 6: '2e Exloërmond', 7: 'Amsterdam', 8: 'St.-Annaparochie', 9: 'Goes', 10: 'Amsterdam', 11: 'Maarssen', 12: "'s-Gravenhage", 13: 'Oegstgeest', 14: "'s-Gravenhage", 15: 'Alphen aan den Rijn', 16: 'Amsterdam', 17: 'Retranchement', 18: 'Hoek', 19: 'Amsterdam', 20: 'Veldhoven', 21: 'Amsterdam', 22: 'Utrecht', 23: 'Culemborg', 24: 'Coevorden'}})
print(df)
I would like to compute the average asking price ('koopprijs') per place_name. Since some places, such as Amsterdam, have multiple 'koopprijs' values, I am looking for the average price per place_name. Can someone please provide the code, or explain how this can be computed?

You can try the following:
df['koopprijs'] = df['koopprijs'].astype(int)  # the prices are stored as strings, so convert them to int first
df2 = df.groupby('place_name')['koopprijs'].mean()
print(df2)
You will get this output (the mean of integer values is returned as float64):
place_name
's-Gravenhage 430000.0
2e Exloërmond 217500.0
Alphen aan den Rijn 275000.0
Amsterdam 1109250.0
Coevorden 239000.0
Culemborg 504475.0
Eindhoven 349000.0
Goes 495000.0
Grubbenvorst 162500.0
Haaksbergen 209000.0
Hoek 174500.0
Maarssen 355000.0
Oegstgeest 167500.0
Ommen 495000.0
Retranchement 324500.0
Slootdorp 139000.0
St.-Annaparochie 180000.0
Utrecht 749000.0
Veldhoven 300000.0
Name: koopprijs, dtype: float64

First change the data type of koopprijs, then use groupby with agg:
df['koopprijs'] = df['koopprijs'].astype('int')
df = df.groupby(['place_name'])['koopprijs'].agg('mean')
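When some places have many listings and others only one, it can help to see how many prices each average is based on. A minimal sketch using pandas named aggregation on a few rows of the question's data; the output column names avg_price and n_listings are illustrative choices, and pd.to_numeric is used here as an alternative to astype(int).

```python
import pandas as pd

# A few rows from the question's data ('koopprijs' is stored as strings)
df = pd.DataFrame({
    'place_name': ['Amsterdam', 'Amsterdam', "'s-Gravenhage", 'Goes', "'s-Gravenhage"],
    'koopprijs': ['267500', '655000', '150000', '495000', '710000'],
})

# Convert the price strings to numbers
df['koopprijs'] = pd.to_numeric(df['koopprijs'])

# Named aggregation: the mean price and the number of non-missing prices per place
summary = df.groupby('place_name').agg(
    avg_price=('koopprijs', 'mean'),
    n_listings=('koopprijs', 'count'),
)
print(summary)
```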

Python: looping through 2 dataframes having thresholds and calculating revenue, stuck

I am trying to solve a business problem using Python, but I'm having difficulty writing a script for it. I have tried to loop through the DataFrame using df.iterrows(), but I am stuck because I don't know how to proceed.
We process volumes of one type of resource in production orders, and the resource has to be consumed FIFO (first in, first out). Each lot has a certain volume and price; after a lot is used up, we continue with the next lot.
Question: How can I automate the calculation of the Revenu column? Can you suggest some Python code to automate this process? Would you use a while or a for loop, and would you iterate through the DataFrame?
Below I posted a screenshot of the solution: on the left the production orders, and on the right the volume and price per lot.
Below the image I posted two dictionaries containing the data from the screenshot.
I would really appreciate your help.
{'Productionorder': {0: 'Productionorder 1',
1: 'Productionorder 2',
2: 'Productionorder 3',
3: 'Productionorder 4',
4: 'Productionorder 5',
5: 'Productionorder 6',
6: 'Productionorder 7',
7: 'Productionorder 8',
8: 'Productionorder 9',
9: 'Productionorder 10',
10: 'Productionorder 11',
11: 'Productionorder 12',
12: 'Productionorder 13',
13: 'Productionorder 14',
14: 'Productionorder 15',
15: 'Productionorder 16',
16: 'Productionorder 17',
17: 'Productionorder 18',
18: 'Productionorder 19',
19: 'Productionorder 20',
20: 'Productionorder 21',
21: 'Productionorder 22'},
'Processed volume': {0: 810,
1: 3240,
2: 3177,
3: 1620,
4: 6480,
5: 5120,
6: 10880,
7: 13770,
8: 21060,
9: 4860,
10: 810,
11: 1620,
12: 15390,
13: 15390,
14: 6800,
15: 4480,
16: 10200,
17: 16650,
18: 2550,
19: 9050,
20: 9900,
21: 3200},
'Lotno.': {0: 1,
1: 1,
2: 1,
3: 1,
4: 2,
5: 2,
6: 2,
7: 2,
8: 2,
9: 2,
10: 2,
11: 2,
12: 2,
13: 3,
14: 3,
15: 3,
16: 3,
17: 3,
18: 3,
19: 3,
20: 4,
21: 4},
'Left of Lotno.': {0: 8490,
1: 5250,
2: 2073,
3: 453,
4: 75973,
5: 70853,
6: 59973,
7: 46203,
8: 25143,
9: 20283,
10: 19473,
11: 17853,
12: 2463,
13: 52073,
14: 45273,
15: 40793,
16: 30593,
17: 13943,
18: 11393,
19: 2343,
20: 38443,
21: 35243},
'Revenu': {0: 1741.5,
1: 6966.0,
2: 6830.549999999999,
3: 3483.0,
4: 10315.800000000001,
5: 7936.0,
6: 16864.0,
7: 21343.5,
8: 32643.0,
9: 7533.0,
10: 1255.5,
11: 2511.0,
12: 23854.5,
13: 20622.750000000004,
14: 8840.0,
15: 5824.0,
16: 13260.0,
17: 21645.0,
18: 3315.0,
19: 11765.0,
20: 12492.15,
21: 4000.0}}
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
1: Timestamp('2021-01-02 00:00:00'),
2: Timestamp('2021-01-03 00:00:00'),
3: Timestamp('2021-01-04 00:00:00')},
'Lotno.': {0: 1, 1: 2, 2: 3, 3: 4},
'Volume': {0: 9300, 1: 82000, 2: 65000, 3: 46000},
'Price': {0: 2.15, 1: 1.55, 2: 1.3, 3: 1.25}}

Assuming you have two DataFrames, one for the Production Orders and another for the Lot Details, the following function should calculate the revenues (along with the intermediate 'Lotno.' and 'Left of Lotno.' columns).
Requirements for each dataframe:
The Production Orders DataFrame must:
contain a column with the title 'Processed volume'
the index should be of consecutive integers starting at 0.
The Lot Details must:
contain the Columns ['Lotno.', 'Volume', 'Price']
have at least one row
rows should be ordered in the order of expected depletion.
Once the quantity in the last lot is depleted, no additional revenue is generated.
def fill_revenue(df1_orig, df2):
    """
    df1_orig is the Production Orders DataFrame
    df2 is the Lot Details DataFrame
    The returned DataFrame is based on a copy of the df1_orig
    """
    df1 = df1_orig.copy()
    # Create Empty Columns for calculated fields
    df1['Lotno.'] = None
    df1['Left of Lotno.'] = None
    df1['Revenu'] = None

    def recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict=None):
        """A function used to update the new values of a row"""
        if return_dict is None:
            return_dict = {'Revenu': 0}
        return_dict.update({'Lotno.': current_lot, 'Left of Lotno.': current_lot_quantity})
        lot_info = df2.loc[df2['Lotno.'] == current_lot].iloc[0]
        # start calculation
        if current_lot_quantity > order_volume:
            # the current lot covers the whole order
            return_dict['Revenu'] += order_volume * lot_info['Price']
            current_lot_quantity -= order_volume
            order_volume = 0
            return_dict['Left of Lotno.'] = current_lot_quantity
        else:
            # use up the current lot, then continue with the next one
            return_dict['Revenu'] += current_lot_quantity * lot_info['Price']
            order_volume -= current_lot_quantity
            try:
                lot_info = df2.iloc[df2.index.get_loc(lot_info.name) + 1]
            except IndexError:
                # no lots left: the rest of the order earns nothing
                return_dict['Left of Lotno.'] = 0
                return return_dict
            current_lot = lot_info['Lotno.']
            current_lot_quantity = lot_info['Volume']
            recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict)
        return return_dict

    # updating each row of the Production Orders DataFrame
    for idx, row in df1.iterrows():
        order_volume = row['Processed volume']
        current_lot = df2.iloc[0]['Lotno.'] if idx == 0 else df1.iloc[idx - 1]['Lotno.']
        current_lot_quantity = df2.iloc[0]['Volume'] if idx == 0 else df1.iloc[idx - 1]['Left of Lotno.']
        update_dict = recursive_revenu_calc(order_volume, current_lot, current_lot_quantity)
        for key, value in update_dict.items():
            df1.loc[idx, key] = value
    return df1
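The same FIFO rule can also be written as a plain while loop inside a for loop, without recursion or DataFrame lookups, which may be easier to follow. This is a minimal sketch, not the function above: the name fifo_revenue and its list-based inputs are assumptions, and lots are numbered by position starting at 1 (which matches 'Lotno.' in the example data).

```python
def fifo_revenue(order_volumes, lot_volumes, lot_prices):
    """Return a (lot_no, left_in_lot, revenue) tuple per order, consuming lots FIFO.

    Once every lot is empty, any remaining volume earns no revenue.
    """
    results = []
    lot = 0                  # position of the lot currently being used
    left = lot_volumes[0]    # volume remaining in that lot
    for volume in order_volumes:
        revenue = 0.0
        while volume > 0:
            if left == 0:
                if lot + 1 >= len(lot_volumes):
                    break    # all lots used up: the rest of this order earns nothing
                lot += 1     # open the next lot
                left = lot_volumes[lot]
            used = min(volume, left)
            revenue += used * lot_prices[lot]
            volume -= used
            left -= used
        results.append((lot + 1, left, revenue))
    return results


# Usage with the first five production orders and the four lots from the question
rows = fifo_revenue([810, 3240, 3177, 1620, 6480],
                    [9300, 82000, 65000, 46000],
                    [2.15, 1.55, 1.3, 1.25])
for lot_no, left, revenue in rows:
    print(lot_no, left, round(revenue, 2))
```

To fill the DataFrame, you could pass df1['Processed volume'].tolist(), df2['Volume'].tolist() and df2['Price'].tolist(), then assign the three tuple fields to 'Lotno.', 'Left of Lotno.' and 'Revenu'. One difference from the recursive version: when an order uses up a lot exactly, this sketch reports that lot with 0 left, while the function above already reports the next lot with its full volume.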

How to add lines with annotations to candlestick charts when some values are missing?

I'm trying to use Plotly to overlay a marker/line chart on top of my OHLC candle chart.
Code
import plotly.graph_objects as go
import pandas as pd
import numpy as np  # needed for np.nan in the data below
from datetime import datetime
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = "lines+markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This is the current image
This is the desired output/image
I want a black line connecting the markers (pivots). Ideally, I would also like a value next to each line showing the distance between consecutive pivots, but I'm not sure how to do this.
For example, the distance between the first two pivots, round(abs(1.293494 - 1.279329),3), is 0.014, so I would like this value next to that line.
The second is round(abs(1.279329 - 1.329610),3), so the value would be 0.05. I have hand-edited the image and added the lines for the first two values to give a visual representation of what I'm trying to achieve.
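The distances described above can be computed with pandas before any plotting: drop the days without a pivot, then pair each pivot with the next one. A minimal sketch, assuming only the four pivot prices from the question plus a few of the NaN days; the segments table and its column names (x0, y0, x1, y1, distance) are hypothetical.

```python
import numpy as np
import pandas as pd

# Pivot prices from the question, indexed by date (NaN = no pivot that day)
pivots = pd.Series(
    [1.2934937477111816, np.nan, 1.2793285846710205, np.nan,
     1.3296104669570925, np.nan, 1.2924572229385376, np.nan],
    index=['2018-09-03', '2018-09-04', '2018-09-05', '2018-09-06',
           '2018-09-20', '2018-09-21', '2018-10-04', '2018-10-05'],
)

# Keep only the days that actually have a pivot
p = pivots.dropna()

# One row per line segment: from each pivot to the next one
segments = pd.DataFrame({
    'x0': list(p.index[:-1]), 'y0': p.to_numpy()[:-1],
    'x1': list(p.index[1:]), 'y1': p.to_numpy()[1:],
})
# Absolute distance between the two pivots, rounded as in the question
segments['distance'] = (segments['y1'] - segments['y0']).abs().round(3)
print(segments)
```

Each row can then drive one label, for example with fig.add_annotation(x=row.x1, y=row.y1, text=str(row.distance), showarrow=False) inside a loop over segments.itertuples().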

The problem seems to be the missing values. You can use pandas.Series.interpolate in combination with fig.add_annotation to get:
I've included annotations for the differences as well. There are surely more elegant ways to do this than with for loops, but it does the job. Let me know if anything is unclear!
import pandas as pd
import numpy as np
import plotly.graph_objects as go
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
# Candlestick chart of the OHLC data
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])
# Difference between each pivot and the previous one (NaN for the first pivot)
df_diff = df['Pivot Price'].dropna().diff()
# Pivot markers (rows with NaN are not drawn)
fig.add_trace(
    go.Scatter(mode="lines+markers",
               x=df['Date'],
               y=df["Pivot Price"]))
fig.update_layout(
    autosize=False,
    width=1000,
    height=800)
# interpolate() fills the NaN gaps, so this black line connects the pivots
fig.add_trace(go.Scatter(x=df['Date'], y=df['Pivot Price'].interpolate(),
                         mode='lines',
                         line=dict(color='black')))

def annot(value):
    # The first pivot has no predecessor (NaN diff), so it gets no label
    if np.isnan(value):
        return ''
    else:
        return value

j = 0  # position in df_diff, advanced once per pivot
for i, p in enumerate(df['Pivot Price']):
    if not np.isnan(p):
        # Label each pivot with its absolute distance from the previous pivot
        fig.add_annotation(dict(font=dict(color='rgba(0,0,200,0.8)', size=12),
                                x=df['Date'].iloc[i],
                                y=p,
                                showarrow=False,
                                text=annot(round(abs(df_diff.iloc[j]), 3)),
                                textangle=0,
                                xanchor='right',
                                xref="x",
                                yref="y"))
        j = j + 1
# A category axis removes the gaps left by non-trading days
fig.update_xaxes(type='category')
fig.show()

The problem seems to be the missing values, which Plotly has difficulty with. With this trick, you plot only the points that have a value:
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
df=pd.read_csv("notebooks/for_so.csv")
# Boolean mask of the rows that actually have a pivot price (df must exist first)
has_value = ~df["Pivot Price"].isna()
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = 'lines',
x=df[has_value]['Date'],
y=df[has_value]["Pivot Price"], line={'color':'black', 'width':1}
))
fig.add_trace(
go.Scatter(mode = "markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This did it for me.

How to remove duplicates based on lower frequency [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 2 years ago.
I have a table that looks like this:
I want to keep, for each brand, only the id with the highest freq. For example, both audi ids have the same frequency, so keep only one of them. For mercedes-benz, keep the second one, since it has frequency 7.
This is my dataframe:
{'Brand':
{0: 'audi',
1: 'audi',
2: 'bmw',
3: 'dacia',
4: 'fiat',
5: 'ford',
6: 'ford',
7: 'honda',
8: 'honda',
9: 'hyundai',
10: 'kia',
11: 'mercedes-benz',
12: 'mercedes-benz',
13: 'nissan',
14: 'nissan',
15: 'opel',
16: 'renault',
17: 'renault',
18: 'renault',
19: 'renault',
20: 'toyota',
21: 'toyota',
22: 'volvo',
23: 'vw',
24: 'vw',
25: 'vw',
26: 'vw'},
'id':
{0: 'audi_a4_dynamic_2016_otomatik',
1: 'audi_a6_standart_2015_otomatik',
2: 'bmw_5 series_executive_2016_otomatik',
3: 'dacia_duster_laureate_2017_manuel',
4: 'fiat_egea_easy_2017_manuel',
5: 'ford_focus_trend x_2015_manuel',
6: 'ford_focus_trend x_2015_otomatik',
7: 'honda_civic_eco elegance_2017_otomatik',
8: 'honda_cr-v_executive_2018_otomatik',
9: 'hyundai_tucson_elite plus_2017_otomatik',
10: 'kia_sportage_concept plus_2015_otomatik',
11: 'mercedes-benz_c-class_amg_2016_otomatik',
12: 'mercedes-benz_e-class_edition e_2015_otomatik',
13: 'nissan_qashqai_black edition_2014_manuel',
14: 'nissan_qashqai_sky pack_2015_otomatik',
15: 'opel_astra_edition_2016_manuel',
16: 'renault_clio_joy_2016_manuel',
17: 'renault_kadjar_icon_2015_otomatik',
18: 'renault_kadjar_icon_2016_otomatik',
19: 'renault_mégane_touch_2017_otomatik',
20: 'toyota_corolla_touch_2015_otomatik',
21: 'toyota_corolla_touch_2016_otomatik',
22: 'volvo_s60_advance_2018_otomatik',
23: 'vw_jetta_comfortline_2013_otomatik',
24: 'vw_passat_highline_2017_otomatik',
25: 'vw_tiguan_sport&style_2012_manuel',
26: 'vw_tiguan_sport&style_2013_manuel'},
'freq': {0: 4,
1: 4,
2: 7,
3: 4,
4: 4,
5: 4,
6: 4,
7: 4,
8: 4,
9: 4,
10: 4,
11: 4,
12: 7,
13: 4,
14: 4,
15: 4,
16: 4,
17: 4,
18: 4,
19: 4,
20: 4,
21: 4,
22: 4,
23: 4,
24: 7,
25: 4,
26: 4}}
Edit: I tried one of the answers and got an extra header level.

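You need to group by Brand and then keep the row with the maximal frequency in each group.
Aggregating freq with 'max' alone drops the id column, so select the rows by their index with idxmax instead (on a tie it returns the first row):
df.loc[df.groupby('Brand')['freq'].idxmax()]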

To get your result, run:
result = df.groupby('Brand', as_index=False).apply(
lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])
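An alternative that avoids groupby entirely is to sort by frequency and drop duplicate brands. A sketch on a subset of the question's data; the stable sort (kind='stable') is what makes ties, like the two audi ids, resolve to the first-listed row.

```python
import pandas as pd

# A subset of the question's data
df = pd.DataFrame({
    'Brand': ['audi', 'audi', 'mercedes-benz', 'mercedes-benz', 'vw', 'vw'],
    'id': ['audi_a4_dynamic_2016_otomatik', 'audi_a6_standart_2015_otomatik',
           'mercedes-benz_c-class_amg_2016_otomatik',
           'mercedes-benz_e-class_edition e_2015_otomatik',
           'vw_jetta_comfortline_2013_otomatik', 'vw_passat_highline_2017_otomatik'],
    'freq': [4, 4, 4, 7, 4, 7],
})

# Highest freq first; a stable sort keeps the original order among ties,
# so drop_duplicates keeps the first-listed id of each brand with a tie.
result = (df.sort_values('freq', ascending=False, kind='stable')
            .drop_duplicates('Brand')
            .sort_index())
print(result)
```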
