Bar chart with 2 data series with Pandas Dataframe and Plotly

I have a dataframe with the following data (example):
year  month  count
2020  11     100
2020  12     50
2021  01     80
2021  02     765
2021  03     100
2021  04     265
2021  05     500
I would like to plot this with Plotly as a bar chart with two vertical bars for each month, one for 2020 and another for 2021. I would like the axes to be defined automatically from the values in the dataset, which could change: today it only contains the years 2020 and 2021, but that could be different.
I have searched for information, but the examples always hardcode the series names and data, and I don't understand how to pass these to Plotly dynamically.
I was expecting something like this, but it is not working:
import plotly.express as px
...
px.bar(df, x=['year','month'], y='count')
fig.show()
Thank you,

To get two vertical bars for each month, I'm guessing the traces should represent the individual years. In that case you can use the following (note that the count column is named value in the complete code):
for y in df.year.unique():
    dfy = df[df.year == y]
    fig.add_bar(x = dfy.month, y = dfy.value, name = str(y))
Plot 1
That's the result for your limited dataset, though. If you expand the dataset a bit you'll get a better impression of how it will look:
Plot 2
Complete code:
import plotly.graph_objects as go
import pandas as pd
# limited dataset from the question
df = pd.DataFrame({'year': {0: 2020, 1: 2020, 2: 2021, 3: 2021, 4: 2021, 5: 2021, 6: 2021},
                   'month': {0: 11, 1: 12, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5},
                   'value': {0: 100, 1: 50, 2: 80, 3: 765, 4: 100, 5: 265, 6: 500}})
# expanded dataset for a better impression (overwrites the df above)
df = pd.DataFrame({'year': {0: 2020,
1: 2020,
2: 2020,
3: 2020,
4: 2020,
5: 2020,
6: 2020,
7: 2020,
8: 2020,
9: 2020,
10: 2020,
11: 2020,
12: 2021,
13: 2021,
14: 2021,
15: 2021,
16: 2021,
17: 2021,
18: 2021,
19: 2021,
20: 2021,
21: 2021,
22: 2021,
23: 2021},
'month': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 9,
9: 10,
10: 11,
11: 12,
12: 1,
13: 2,
14: 3,
15: 4,
16: 5,
17: 6,
18: 7,
19: 8,
20: 9,
21: 10,
22: 11,
23: 12},
'value': {0: 100,
1: 50,
2: 265,
3: 500,
4: 80,
5: 765,
6: 100,
7: 265,
8: 500,
9: 80,
10: 765,
11: 100,
12: 80,
13: 765,
14: 100,
15: 265,
16: 500,
17: 80,
18: 765,
19: 100,
20: 265,
21: 500,
22: 80,
23: 765}})
fig = go.Figure()
for y in df.year.unique():
    dfy = df[df.year == y]
    fig.add_bar(x = dfy.month, y = dfy.value, name = str(y))
fig.show()

I have modified your data to demonstrate.
This is an example of multicategorical axes (https://plotly.com/python/categorical-axes/#multicategorical-axes), hence the need to use plotly.graph_objects (go).
import pandas as pd
import io
import plotly.express as px
import plotly.graph_objects as go
df = pd.read_csv(
io.StringIO(
"""year,month,count
2020,1,50
2020,2,50
2020,3,50
2020,4,50
2020,11,100
2020,12,50
2021,1,80
2021,2,765
2021,3,100
2021,4,265
2021,5,500"""
)
)
go.Figure(go.Bar(x=[df["month"].tolist(), df["year"].tolist()], y=df["count"]))

Using Plotly Express and updated with multi-categorical x-axis:
import pandas as pd
import io
import plotly.express as px
df = pd.read_csv(
io.StringIO(
"""year,month,count
2020,1,50
2020,2,50
2020,3,50
2020,4,50
2020,11,100
2020,12,50
2021,1,80
2021,2,765
2021,3,100
2021,4,265
2021,5,500"""
)
)
# convert year to string so you get a categorical scale
df['year'] = df['year'].astype(str)
channel_top_Level = "year"
channel_2nd_Level = "month"
fig = px.bar(df, x = channel_2nd_Level, y = 'count', color = channel_top_Level)
# replace each trace's x with a [month, year] pair to get a multi-categorical axis
for num, channel_top_Level_val in enumerate(df[channel_top_Level].unique()):
    temp_df = df.query(f"{channel_top_Level} == {channel_top_Level_val!r}")
    fig.data[num].x = [
        temp_df[channel_2nd_Level].tolist(),
        temp_df[channel_top_Level].tolist()
    ]
fig.layout.xaxis.title.text = f"{channel_top_Level} / {channel_2nd_Level}"
fig

Related

How to get cumulative sum over index and columns in pandas? [duplicate]

I have a periodic table that includes premium in different categories over a year for different companies. The dataframe looks like the below:
   Company  Type             Month  Year  Ferdi Grup  Premium
1  Allianz  Birikimli Hayat  1      2022  Ferdi       325
2  Allianz  Birikimli Hayat  2      2022  Ferdi       476
3  Axa      Birikimli Hayat  3      2022  Ferdi       687
I want to get a table where I can see the premium cumulated over 'Company' and 'Year'. For each month I want to see premium cumulated from the beginning of the year.
This is the regular sum operation which works well in this case.
data.pivot_table(
    columns = 'Company',
    index = 'Month',
    values = 'Premium',
    aggfunc= np.sum
)
However when I change to np.cumsum the result is a series. I want a cumulated pivot table for each year, adding each month's value to the previous ones. How can I do that?
Expected output:
   Company  Month  Year  Premium
1  Allianz  1      2022  325
2  Allianz  2      2022  801
3  Axa      3      2022  687
So, this is the original data I am working with:
{'Company': {0: 'AgeSA',
1: 'Türkiye',
2: 'Türkiye',
3: 'AgeSA',
4: 'AgeSA',
5: 'Türkiye',
6: 'AgeSA',
7: 'Türkiye',
8: 'Türkiye',
9: 'AgeSA',
10: 'Türkiye',
11: 'Türkiye',
12: 'AgeSA',
13: 'Türkiye',
14: 'Türkiye',
15: 'AgeSA',
16: 'AgeSA',
17: 'Türkiye',
18: 'AgeSA',
19: 'Türkiye',
20: 'Türkiye',
21: 'AgeSA',
22: 'Türkiye',
23: 'Türkiye'},
'Type': {0: 'Birikimli Hayat',
1: 'Birikimli Hayat',
2: 'Sadece Yaşam Teminatlı',
3: 'Karma Sigorta',
4: 'Yıllık Vefat',
5: 'Yıllık Vefat',
6: 'Uzun Süreli Vefat',
7: 'Uzun Süreli Vefat',
8: 'Birikimli Hayat',
9: 'Yıllık Vefat',
10: 'Yıllık Vefat',
11: 'Uzun Süreli Vefat',
12: 'Birikimli Hayat',
13: 'Birikimli Hayat',
14: 'Sadece Yaşam Teminatlı',
15: 'Karma Sigorta',
16: 'Yıllık Vefat',
17: 'Yıllık Vefat',
18: 'Uzun Süreli Vefat',
19: 'Uzun Süreli Vefat',
20: 'Birikimli Hayat',
21: 'Yıllık Vefat',
22: 'Yıllık Vefat',
23: 'Uzun Süreli Vefat'},
'Month': {0: 1,
1: 1,
2: 1,
3: 1,
4: 1,
5: 1,
6: 1,
7: 1,
8: 1,
9: 1,
10: 1,
11: 1,
12: 2,
13: 2,
14: 2,
15: 2,
16: 2,
17: 2,
18: 2,
19: 2,
20: 2,
21: 2,
22: 2,
23: 2},
'Year': {0: 2022,
1: 2022,
2: 2022,
3: 2022,
4: 2022,
5: 2022,
6: 2022,
7: 2022,
8: 2022,
9: 2022,
10: 2022,
11: 2022,
12: 2022,
13: 2022,
14: 2022,
15: 2022,
16: 2022,
17: 2022,
18: 2022,
19: 2022,
20: 2022,
21: 2022,
22: 2022,
23: 2022},
'Ferdi Grup': {0: 'Ferdi',
1: 'Ferdi',
2: 'Ferdi',
3: 'Ferdi',
4: 'Ferdi',
5: 'Ferdi',
6: 'Ferdi',
7: 'Ferdi',
8: 'Grup',
9: 'Grup',
10: 'Grup',
11: 'Grup',
12: 'Ferdi',
13: 'Ferdi',
14: 'Ferdi',
15: 'Ferdi',
16: 'Ferdi',
17: 'Ferdi',
18: 'Ferdi',
19: 'Ferdi',
20: 'Grup',
21: 'Grup',
22: 'Grup',
23: 'Grup'},
'Premium': {0: 936622.43,
1: 14655.67,
2: 8496.0,
3: 124768619.29,
4: 6651019.24,
5: 11055383.530005993,
6: 54273212.457471885,
7: 22163192.66,
8: 81000.95,
9: 9338009.52,
10: 251790130.54997802,
11: 140949274.79999998,
12: 910808.77,
13: 8754.71,
14: 7128.0,
15: 129753498.31,
16: 8015974.454128993,
17: 16776490.000003006,
18: 67607915.34000003,
19: 24683694.700000003,
20: 60887.56,
21: 1497105.2458709963,
22: 195019190.297756,
23: 167424048.43},
'cumsum': {0: 936622.43,
1: 14655.67,
2: 23151.67,
3: 125705241.72000001,
4: 132356260.96000001,
5: 11078535.200005993,
6: 186629473.4174719,
7: 33241727.860005993,
8: 33322728.810005993,
9: 195967482.9374719,
10: 285112859.35998404,
11: 426062134.159984,
12: 196878291.7074719,
13: 426070888.869984,
14: 426078016.869984,
15: 326631790.0174719,
16: 334647764.4716009,
17: 442854506.869987,
18: 402255679.8116009,
19: 467538201.569987,
20: 467599089.129987,
21: 403752785.05747193,
22: 662618279.427743,
23: 830042327.857743}}
This is the result of a regular sum pivot:
Month  AgeSA               Türkiye
1      195967482.9374719   426062134.159984
2      207785302.12000003  403980193.69775903
When I use the suggested code as below:
df_2 = data.copy()
df_2['cumsum'] = df_2.groupby(['Company', 'Year'])[['Premium']].cumsum()
df_2.sort_values(['Company', 'Year', 'cumsum']).reset_index(drop = True)
It seems each row gets the cumulative sum of the rows above it within its group:
To get the table I need, I then have to take the max of each group again in a pivot_table:
df_2.pivot_table(
    index = ['Year', 'Month'],
    values = ['Premium', 'cumsum'],
    columns = 'Company',
    aggfunc = {'Premium': 'sum', 'cumsum': 'max'}
)
which finally gets me to this result:
Is it that difficult to get the cumsum table in pandas or am I just doing it the hard way?

Your dataframe is already in the right format, so why do you want to pivot it again?
I think what you are looking for is pandas.DataFrame.groupby.
df['cumsum_by_group'] = df.groupby(['Company', 'Year'])['Premium'].cumsum()
Output:
   Company  Type             Month  Year  Ferdi Grup  Premium  cumsum_by_group
1  Allianz  Birikimli Hayat  1      2022  Ferdi       325      325
2  Allianz  Birikimli Hayat  2      2022  Ferdi       476      801
3  Axa      Birikimli Hayat  3      2022  Ferdi       687      687

To calculate the cumulative sum within groups defined by multiple columns of a dataframe, you can combine pandas.DataFrame.groupby with cumsum.
Assuming that data is the dataframe that holds the original dataset, use the code below:
data['Premium'] = data.groupby(['Company', 'Year'])['Premium'].cumsum()  # overwrites Premium with the running total
out = data[['Company', 'Month', 'Year', 'Premium']]  # select the specific columns
>>> print(out)
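
If the goal is the cumulative table in pivot form, one possible shortcut (a sketch, not necessarily what either answer intended) is to build the regular sum pivot first and then take a running total per year over its rows. The frame below is a small illustrative sample with the question's column names; the 2023 row is made up to show that the total restarts each year, and fill_value=0 is an assumption that a missing month means no premium.

```python
import pandas as pd

# Small illustrative frame; the 2023 row is invented to show the yearly reset.
data = pd.DataFrame({
    "Company": ["Allianz", "Allianz", "Axa", "Allianz"],
    "Month":   [1, 2, 3, 1],
    "Year":    [2022, 2022, 2022, 2023],
    "Premium": [325, 476, 687, 100],
})

# Monthly totals per company, one row per (Year, Month).
# fill_value=0 assumes a missing month means no premium was written.
monthly = data.pivot_table(
    index=["Year", "Month"],
    columns="Company",
    values="Premium",
    aggfunc="sum",
    fill_value=0,
)

# Running total down the months, restarted for every year.
cumulative = monthly.groupby(level="Year").cumsum()
print(cumulative)
```

This avoids the second pivot with aggfunc max, because the cumulative sum is applied after the monthly aggregation rather than row by row.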

Plotly: How to make a jagged line plot look better?

I made a chart showing the number of items purchased over a period of time. The graph seems unreadable to me, and it is hard to get the right perspective. My code is below:
import numpy as np
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
trace1 = go.Scatter(x=df_temp['Date'],
y=df_temp['Quantity'],
line = dict(color = 'blue'),
opacity = 0.3)
layout = dict(title='Purchases of NC coin',)
fig = dict(data=[trace1], layout=layout)
iplot(fig)
And some of my data:
Id Date Quantity
8 2022-01-16 19:14:56 50814.040553
15 2022-01-12 09:18:01 2563.443420
17 2022-01-11 13:52:38 33055.752836
18 2022-01-11 11:49:54 6483.182959
19 2022-01-11 11:07:48 13005.174783
21 2022-01-11 10:50:20 19605.381370
23 2022-01-11 10:15:30 6561.223602
24 2022-01-11 10:14:44 19762.821100
28 2022-01-07 15:56:50 3307.607665
29 2022-01-07 15:54:30 66868.030051
30 2022-01-07 12:27:07 42683.069577
31 2022-01-07 12:20:51 3423.618394
34 2022-01-05 12:11:57 69607.963793
35 2022-01-05 10:41:48 20370.090947
37 2022-01-05 10:21:22 72415.914082
38 2022-01-05 10:05:04 20687.003754
39 2022-01-05 09:36:53 37410.532342
40 2022-01-05 08:35:06 43815.009603
41 2022-01-04 19:27:27 30581.795021
44 2022-01-03 16:34:41 14290.644375
My plot looks like this now:
Do you have any ideas?

In my opinion, you've got three options:
1. If no aggregation is desired, use a barplot with px.bar
2. Aggregate by day and use a line plot
3. Aggregate by day and use a bar plot
Since you're specifically asking for aesthetics, and not Plotly code, I'm going to use Plotly Express instead of iplot. You should too! If for some reason you can't, just let me know.
Complete code:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
df_temp = pd.DataFrame({'Id': {0: 8,
1: 15,
2: 17,
3: 18,
4: 19,
5: 21,
6: 23,
7: 24,
8: 28,
9: 29,
10: 30,
11: 31,
12: 34,
13: 35,
14: 37,
15: 38,
16: 39,
17: 40,
18: 41,
19: 44},
'Date': {0: '2022-01-16',
1: '2022-01-12',
2: '2022-01-11',
3: '2022-01-11',
4: '2022-01-11',
5: '2022-01-11',
6: '2022-01-11',
7: '2022-01-11',
8: '2022-01-07',
9: '2022-01-07',
10: '2022-01-07',
11: '2022-01-07',
12: '2022-01-05',
13: '2022-01-05',
14: '2022-01-05',
15: '2022-01-05',
16: '2022-01-05',
17: '2022-01-05',
18: '2022-01-04',
19: '2022-01-03'},
'Time': {0: '19:14:56',
1: '09:18:01',
2: '13:52:38',
3: '11:49:54',
4: '11:07:48',
5: '10:50:20',
6: '10:15:30',
7: '10:14:44',
8: '15:56:50',
9: '15:54:30',
10: '12:27:07',
11: '12:20:51',
12: '12:11:57',
13: '10:41:48',
14: '10:21:22',
15: '10:05:04',
16: '09:36:53',
17: '08:35:06',
18: '19:27:27',
19: '16:34:41'},
'Quantity': {0: 50814.040553,
1: 2563.44342,
2: 33055.752836,
3: 6483.182959,
4: 13005.174783,
5: 19605.38137,
6: 6561.223602,
7: 19762.8211,
8: 3307.607665,
9: 66868.030051,
10: 42683.069577,
11: 3423.618394,
12: 69607.963793,
13: 20370.090947,
14: 72415.914082,
15: 20687.003754,
16: 37410.532342,
17: 43815.009603,
18: 30581.795021,
19: 14290.644375}})
trace1 = go.Scatter(x=df_temp['Date'],
y=df_temp['Quantity'],
line = dict(color = 'blue'),
opacity = 0.3)
layout = dict(title='Purchases of NC coin',)
# build pandas datetime series
df_temp['DateTime'] = pd.to_datetime(df_temp.Date+' '+df_temp.Time)
# # unaggregated barplot
# fig = px.bar(df_temp, x = 'DateTime', y = 'Quantity')
# fig.update_traces(marker_line_color = 'blue')
# fig.update_layout(title='Purchases of NC coin')
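# aggregate by day (numeric_only=True skips the string Date and Time columns,
# which newer pandas versions no longer drop silently)
df_temp = df_temp.groupby(by=[df_temp.DateTime.dt.date]).mean(numeric_only=True).reset_index()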
# # aggregated lineplot
# fig = px.line(df_temp, x = 'DateTime', y = 'Quantity')
# fig.update_traces(marker_line_color = 'blue')
# fig.update_layout(title='Purchases of NC coin')
# aggregated barplot
fig = px.bar(df_temp, x = 'DateTime', y = 'Quantity')
fig.update_traces(marker_line_color = 'blue')
fig.update_layout(title='Purchases of NC coin')
fig.show()
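
The daily aggregation step can also be written with resample, which is a different technique from the groupby used above and is shown here only as a sketch. The timestamps and quantities are made-up round numbers, and the column names DateTime and Quantity follow the answer's code. Note that resample keeps days without purchases in the result.

```python
import pandas as pd

# Made-up purchases; column names follow the answer's DateTime / Quantity.
purchases = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-01-11 10:00:00",
        "2022-01-11 13:00:00",
        "2022-01-12 09:00:00",
        "2022-01-14 19:00:00",
    ]),
    "Quantity": [10.0, 30.0, 5.0, 7.0],
})

# One row per calendar day with total, average and number of purchases.
# Days without purchases are kept: sum and count are 0, mean is NaN.
daily = (
    purchases.set_index("DateTime")["Quantity"]
    .resample("D")
    .agg(["sum", "mean", "count"])
)
print(daily)
```

Whether the sum or the mean is the right daily figure depends on what the chart is meant to show; the answer above uses the mean.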

Creating a for loop that creates a histogram for each column and groupsby index value

I have a dataframe that looks like this:
time_spent_inx time_spent_iny
name bin
team_a (0, 200] 10 0
(200, 400] 0 0
(400, 600] 20 0
(600, 800] 0 20
(800, 1000] 20 20
(1000, 1200] 10 0
team_b (0, 200] 5 35
(200, 400] 0 0
(400, 600] 40 0
(600, 800] 10 20
(800, 1000] 20 0
(1000, 1200] 10 70
team_c (0, 200] 5 30
(200, 400] 25 0
(400, 600] 0 0
(600, 800] 10 5
(800, 1000] 5 0
(1000, 1200] 10 25
I want to create a histogram for each column, for each different team. The bins are already pre-defined.
This is what I've tried so far:
df = df.reset_index()
a= df["bin"].astype(str)
b= df["time_spent_inx"]
fig, ax = plt.subplots(figsize=(100,100) )
ax.hist(a,6,weights=b, by = df["name"])
ax.set_xticklabels(a,rotation = 90)
ax.xaxis.set_tick_params(which='major', labelsize=60, width=2.5, length=10)
ax.yaxis.set_tick_params(which='major', labelsize=60, width=2.5, length=10)
plt.show()
I've used the weights argument since the data is already binned. The code above is just what I've tried in order to plot the histogram for one column ("time_spent_inx").
This is one error I get: AttributeError: 'Rectangle' object has no property 'by'

It is not really clear what you are trying to achieve. As far as I can see you already calculated the bins. If you are happy using plotly you could follow these steps
Data
import pandas as pd
data = {'name': {0: 'team_a',
1: 'team_a',
2: 'team_a',
3: 'team_a',
4: 'team_a',
5: 'team_a',
6: 'team_b',
7: 'team_b',
8: 'team_b',
9: 'team_b',
10: 'team_b',
11: 'team_b',
12: 'team_c',
13: 'team_c',
14: 'team_c',
15: 'team_c',
16: 'team_c',
17: 'team_c'},
'bin': {0: '(0, 200]',
1: '(200, 400]',
2: '(400, 600]',
3: '(600, 800]',
4: '(800, 1000]',
5: '(1000, 1200]',
6: '(0, 200]',
7: '(200, 400]',
8: '(400, 600]',
9: '(600, 800]',
10: '(800, 1000]',
11: '(1000, 1200]',
12: '(0, 200]',
13: '(200, 400]',
14: '(400, 600]',
15: '(600, 800]',
16: '(800, 1000]',
17: '(1000, 1200]'},
'time_spent_inx': {0: 10,
1: 0,
2: 20,
3: 0,
4: 20,
5: 10,
6: 5,
7: 0,
8: 40,
9: 10,
10: 20,
11: 10,
12: 5,
13: 25,
14: 0,
15: 10,
16: 5,
17: 10},
'time_spent_iny': {0: 0,
1: 0,
2: 0,
3: 20,
4: 20,
5: 0,
6: 35,
7: 0,
8: 0,
9: 20,
10: 0,
11: 70,
12: 30,
13: 0,
14: 0,
15: 5,
16: 0,
17: 25}}
df = pd.DataFrame(data)
Transform wide df to long one
df_long = pd.melt(df,
id_vars=['name', 'bin'],
value_vars=["time_spent_inx", "time_spent_iny"],
var_name="time_spent")
Use plotly express
import plotly.express as px
px.bar(df_long,
x="bin",
y="value",
color="name",
barmode="group",
facet_col="time_spent")
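
To make the wide-to-long step more concrete, here is a reduced sketch of what pd.melt produces. It uses only two teams and two bins taken from the data above, and the variable names wide and long_df are assumptions for illustration.

```python
import pandas as pd

# Reduced version of the data above: two teams, two bins.
wide = pd.DataFrame({
    "name": ["team_a", "team_a", "team_b", "team_b"],
    "bin": ["(0, 200]", "(200, 400]", "(0, 200]", "(200, 400]"],
    "time_spent_inx": [10, 0, 5, 0],
    "time_spent_iny": [0, 0, 35, 0],
})

# Each (name, bin) row is repeated once per measured column; the former
# column name goes into "time_spent" and its number into "value".
long_df = pd.melt(
    wide,
    id_vars=["name", "bin"],
    value_vars=["time_spent_inx", "time_spent_iny"],
    var_name="time_spent",
)
print(long_df)
```

The long frame has one row per team, bin and measured column, which is the shape that the color and facet_col arguments of px.bar rely on.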

How to plot the daily maximum values

I'm trying to plot the maximum value per day of a dataframe column (ext_temp):
import pandas as pd
data = {'vin': {0: 'VF1AG0000KF908155', 1: 'VF1AG0000KF908155', 2: 'VF1AG0000KF908155', 3: 'VF1AG0000KF908155', 4: 'VF1AG0000KF908155', 5: 'VF1AG0000KF908155', 6: 'VF1AG0000KF908155', 7: 'VF1AG0000KF908155', 8: 'VF1AG0000KF908155', 9: 'VF1AG0000KF908155'}, 'date': {0: pd.Timestamp('2019-09-27 07:07:02'), 1: pd.Timestamp('2019-09-27 09:23:08'), 2: pd.Timestamp('2019-09-27 09:39:08'), 3: pd.Timestamp('2020-07-15 11:46:41'), 4: pd.Timestamp('2020-07-16 07:17:52'), 5: pd.Timestamp('2020-07-16 09:23:47'), 6: pd.Timestamp('2020-09-11 07:43:05'), 7: pd.Timestamp('2020-09-17 15:00:33'), 8: pd.Timestamp('2020-10-21 06:49:58'), 9: pd.Timestamp('2020-10-21 14:47:33')}, 'sohe': {0: 101, 1: 101, 2: 101, 3: 96, 4: 96, 5: 96, 6: 96, 7: 96, 8: 96, 9: 96}, 'soc': {0: 60, 1: 63, 2: 99, 3: 66, 4: 68, 5: 69, 6: 86, 7: 58, 8: 9, 9: 9}, 'ext_temp': {0: 27, 1: 30, 2: 31, 3: 30, 4: 26, 5: 29, 6: 26, 7: 29, 8: 28, 9: 27}, 'battery_temp': {0: 27, 1: 33, 2: 32, 3: 26, 4: 26, 5: 26, 6: 26, 7: 30, 8: 27, 9: 29}}
df = pd.DataFrame(data)
Unfortunately, when trying to use
nd = "VF1AG0000KF908155"
df = charge[charge.vin==gop]
df = df.groupby(pd.Grouper(key = 'date', freq = 'D'))
fig,ax = plt.subplots()
ax.plot(df.date, df['ext_temp'].max())
I get the following error message :
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

Using pd.Grouper will fill missing days with NaN.
If you don't want missing days filled in, group by the date component of 'date' using the .dt accessor.
Use pandas.DataFrame.plot for plotting the result.
kind='bar' is used here, since there's not much data. For a line plot, use kind='line'.
pd.Grouper
Note the need to use .dropna(), at least for the bar plot.
dfg = df.groupby(pd.Grouper(key='date', freq='D'))['ext_temp'].max().dropna()
ax = dfg.plot(kind='bar')
dfg = df.groupby(pd.Grouper(key='date', freq='D'))['ext_temp'].max().dropna()
ax = dfg.plot(kind='line')
.dt.date
Groupby only the date component of the 'date' column
dfg = df.groupby(df.date.dt.date)['ext_temp'].max()
ax = dfg.plot(kind='bar')
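
The difference between the two groupings can be checked without plotting. The sketch below uses three readings from the question plus one made-up reading on 2020-07-19 so that there is a gap of two days; the variable names are assumptions.

```python
import pandas as pd

# Three readings from the question plus one invented reading (2020-07-19)
# so that two calendar days in between have no data.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-07-15 11:46:41",
        "2020-07-16 07:17:52",
        "2020-07-16 09:23:47",
        "2020-07-19 10:00:00",
    ]),
    "ext_temp": [30, 26, 29, 28],
})

# pd.Grouper builds a continuous daily range, so empty days show up as NaN.
with_gaps = df.groupby(pd.Grouper(key="date", freq="D"))["ext_temp"].max()

# Grouping by the date component only yields days that actually occur.
by_day = df.groupby(df["date"].dt.date)["ext_temp"].max()

print(with_gaps)
print(by_day)
```

Here with_gaps has five rows, two of them NaN, while by_day has three rows.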

How to add lines with annotations to candlestick charts when some values are missing?

I'm trying to use Plotly to overlay a marker/line chart on top of my OHLC candle chart.
Code
import numpy as np  # needed for the np.nan values below
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = "lines+markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This is the current image
This is the desired output/image
I want a black line between the markers (pivots). Ideally, I would also like a value next to each line showing the distance between the two pivots, but I'm not sure how to do this.
For example, the distance between the first two pivots, round(abs(1.293494 - 1.279329),3), returns 0.014, so I would ideally like this next to the line.
The second is round(abs(1.279329 - 1.329610),3), so the value would be 0.05. I have hand-edited the image and added the lines for the first two values to give a visual representation of what I'm trying to achieve.

The problem seems to be the missing values. So just use pandas.Series.interpolate in combination with fig.add_annotation to get:
I've included annotations for differences as well. There are surely more elegant ways to do it than with for loops, but it does the job. Let me know if anything is unclear!
import pandas as pd
import numpy as np
import plotly.graph_objects as go
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
# df=pd.read_csv("for_so.csv")
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
# fig = go.Figure(data=[go.Candlestick(x=df.index,
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
# some calculations
df_diff = df['Pivot Price'].dropna().diff().copy()
df2 = df[df.index.isin(df_diff.index)].copy()
df2['Price Diff'] = df['Pivot Price'].dropna().values
fig.add_trace(
go.Scatter(mode = "lines+markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.add_trace(go.Scatter(x=df['Date'], y=df['Pivot Price'].interpolate(),
# fig.add_trace(go.Scatter(x=df.index, y=df['Pivot Price'].interpolate(),
mode = 'lines',
line = dict(color='black')))
def annot(value):
    # print(type(value))
    if np.isnan(value):
        return ''
    else:
        return value
j = 0
for i, p in enumerate(df['Pivot Price']):
    # print(p)
    # if not np.isnan(p) and not np.isnan(df_diff.iloc[j]):
    if not np.isnan(p):
        # print(not np.isnan(df_diff.iloc[j]))
        fig.add_annotation(dict(font=dict(color='rgba(0,0,200,0.8)',size=12),
                                x=df['Date'].iloc[i],
                                # x=df.index[i],
                                # x = xStart
                                y=p,
                                showarrow=False,
                                text=annot(round(abs(df_diff.iloc[j]),3)),
                                textangle=0,
                                xanchor='right',
                                xref="x",
                                yref="y"))
        j = j + 1
fig.update_xaxes(type='category')
fig.show()
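
The distances used for the annotations can also be computed without a loop. This is a minimal sketch that uses the four pivot prices from the question in a shortened series (the positions of the NaN gaps are condensed for illustration); the names pivot_price, pivots and distances are assumptions.

```python
import numpy as np
import pandas as pd

# The four pivot prices from the question, with NaN gaps in between
# (gap positions are shortened for illustration).
pivot_price = pd.Series(
    [1.2934937477111816, np.nan, 1.2793285846710205, np.nan,
     1.3296104669570925, np.nan, 1.2924572229385376],
    name="Pivot Price",
)

# Keep only the pivots, then take the absolute change between neighbours.
pivots = pivot_price.dropna()
distances = pivots.diff().abs().round(3).dropna()

print(distances.tolist())
```

The first two values come out as 0.014 and 0.05, matching the numbers given in the question. Because distances keeps the original index labels of the later pivot in each pair, it can be used to look up the matching dates for the annotation positions.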

The problem seems to be the missing values, which Plotly has difficulty with. With this trick you can draw the line through only the points that have a value:
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
df=pd.read_csv("notebooks/for_so.csv")
has_value = ~df["Pivot Price"].isna()  # must be computed after df is loaded
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = 'lines',
x=df[has_value]['Date'],
y=df[has_value]["Pivot Price"], line={'color':'black', 'width':1}
))
fig.add_trace(
go.Scatter(mode = "markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This did it for me.
