hvplot.errorbars - linking the error bars to line/scatterplots - python

I have the following dataframe, containing the means and standard deviations of the data, as well as other descriptors.
{'Person': {0: 'Mark',
1: 'Mark',
2: 'Mark',
3: 'Mark',
4: 'Mark',
5: 'Mark',
6: 'Mark',
7: 'Mark',
8: 'Mark',
9: 'Mark',
10: 'Mark',
11: 'Mark',
12: 'Mark',
13: 'Mark',
14: 'Mark',
15: 'Mark',
16: 'Mark',
17: 'Mark',
18: 'Mark',
19: 'Mark',
20: 'Mark',
21: 'Mark',
22: 'John',
23: 'John',
24: 'John',
25: 'John',
26: 'John',
27: 'John',
28: 'John',
29: 'John',
30: 'John',
31: 'John',
32: 'John',
33: 'John',
34: 'John',
35: 'John',
36: 'John',
37: 'John',
38: 'John',
39: 'John',
40: 'John',
41: 'John',
42: 'John',
43: 'John'},
'Alcohol': {0: 'No',
1: 'No',
2: 'No',
3: 'No',
4: 'No',
5: 'No',
6: 'No',
7: 'No',
8: 'No',
9: 'No',
10: 'No',
11: 'Yes',
12: 'Yes',
13: 'Yes',
14: 'Yes',
15: 'Yes',
16: 'Yes',
17: 'Yes',
18: 'Yes',
19: 'Yes',
20: 'Yes',
21: 'Yes',
22: 'No',
23: 'No',
24: 'No',
25: 'No',
26: 'No',
27: 'No',
28: 'No',
29: 'No',
30: 'No',
31: 'No',
32: 'No',
33: 'Yes',
34: 'Yes',
35: 'Yes',
36: 'Yes',
37: 'Yes',
38: 'Yes',
39: 'Yes',
40: 'Yes',
41: 'Yes',
42: 'Yes',
43: 'Yes'},
'Product': {0: 'Orange',
1: 'Orange',
2: 'Orange',
3: 'Orange',
4: 'Orange',
5: 'Apple',
6: 'Apple',
7: 'Apple',
8: 'Apple',
9: 'Apple',
10: 'Apple',
11: 'Orange',
12: 'Orange',
13: 'Orange',
14: 'Orange',
15: 'Orange',
16: 'Apple',
17: 'Apple',
18: 'Apple',
19: 'Apple',
20: 'Apple',
21: 'Apple',
22: 'Orange',
23: 'Orange',
24: 'Orange',
25: 'Orange',
26: 'Orange',
27: 'Apple',
28: 'Apple',
29: 'Apple',
30: 'Apple',
31: 'Apple',
32: 'Apple',
33: 'Orange',
34: 'Orange',
35: 'Orange',
36: 'Orange',
37: 'Orange',
38: 'Apple',
39: 'Apple',
40: 'Apple',
41: 'Apple',
42: 'Apple',
43: 'Apple'},
'Concentration': {0: 0,
1: 10,
2: 20,
3: 30,
4: 40,
5: 0,
6: 10,
7: 20,
8: 30,
9: 40,
10: 50,
11: 0,
12: 10,
13: 20,
14: 30,
15: 40,
16: 0,
17: 10,
18: 20,
19: 30,
20: 40,
21: 50,
22: 0,
23: 10,
24: 20,
25: 30,
26: 40,
27: 0,
28: 10,
29: 20,
30: 30,
31: 40,
32: 50,
33: 0,
34: 10,
35: 20,
36: 30,
37: 40,
38: 0,
39: 10,
40: 20,
41: 30,
42: 40,
43: 50},
'Response': {0: 4,
1: 10,
2: 25,
3: 31,
4: 48,
5: 10,
6: 22,
7: 35,
8: 46,
9: 56,
10: 61,
11: 24,
12: 30,
13: 45,
14: 51,
15: 68,
16: 30,
17: 42,
18: 55,
19: 66,
20: 76,
21: 81,
22: 17,
23: 23,
24: 38,
25: 44,
26: 61,
27: 23,
28: 35,
29: 48,
30: 59,
31: 69,
32: 74,
33: 37,
34: 43,
35: 58,
36: 64,
37: 81,
38: 43,
39: 55,
40: 68,
41: 79,
42: 89,
43: 94},
'Response mean': {0: 4.333333333,
1: 15.0,
2: 24.33333333,
3: 35.33333333,
4: 45.33333333,
5: 12.33333333,
6: 24.66666667,
7: 34.33333333,
8: 45.0,
9: 57.66666667,
10: 55.66666667,
11: 24.33333333,
12: 35.0,
13: 44.33333333,
14: 55.33333333,
15: 65.33333333,
16: 32.33333333,
17: 44.66666667,
18: 54.33333333,
19: 65.0,
20: 77.66666667,
21: 75.66666667,
22: 17.33333333,
23: 28.0,
24: 37.33333333,
25: 48.33333333,
26: 58.33333333,
27: 25.33333333,
28: 37.66666667,
29: 47.33333333,
30: 58.0,
31: 70.66666667,
32: 68.66666667,
33: 37.33333333,
34: 48.0,
35: 57.33333333,
36: 68.33333333,
37: 78.33333333,
38: 45.33333333,
39: 57.66666667,
40: 67.33333333,
41: 78.0,
42: 90.66666667,
43: 88.66666667},
'Response SD': {0: 1.527525232,
1: 4.582575695,
2: 2.081665999,
3: 4.041451884,
4: 2.516611478,
5: 2.516611478,
6: 3.055050463,
7: 2.081665999,
8: 1.0,
9: 1.527525232,
10: 14.74222959,
11: 1.527525232,
12: 4.582575695,
13: 2.081665999,
14: 4.041451884,
15: 2.516611478,
16: 2.516611478,
17: 3.055050463,
18: 2.081665999,
19: 1.0,
20: 1.527525232,
21: 14.74222959,
22: 1.527525232,
23: 4.582575695,
24: 2.081665999,
25: 4.041451884,
26: 2.516611478,
27: 2.516611478,
28: 3.055050463,
29: 2.081665999,
30: 1.0,
31: 1.527525232,
32: 14.74222959,
33: 1.527525232,
34: 4.582575695,
35: 2.081665999,
36: 4.041451884,
37: 2.516611478,
38: 2.516611478,
39: 3.055050463,
40: 2.081665999,
41: 1.0,
42: 1.527525232,
43: 14.74222959}}
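For reference, the dictionary above can be loaded into the df2 DataFrame used in the code below; a minimal sketch, assuming the dictionary is stored in a variable named data (a name used here only for illustration):
import pandas as pd
# 'data' is assumed to hold the dictionary shown above
df2 = pd.DataFrame(data)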
I've started using hvplot because of the interactivity it offers for exploring data, and from what I can tell it's based on Bokeh. In the example code below, I plot the mean values (using scatter to give me the glyphs) and the line through the glyphs (using line), and I overlay them with the scatter * line code. However, as I also need to plot the standard deviations, I'm using the errorbars plot too, which works great (note that I use scatter * line * error). The problem is that when I click on the legend to remove certain data, the scatter and line are removed (note that I have muted_alpha=0 for the line and scatter, but there is no such option for the error bars), yet the error bars stay on the plot. It's as though the scatter and line are 'linked' together, but the error bars aren't.
import hvplot.pandas  # noqa - registers the .hvplot accessor on DataFrames
line = df2.hvplot.line(x='Concentration', y='Response mean', by='Product', groupby=['Person', 'Alcohol'], muted_alpha=0)
scatter = df2.hvplot.scatter(x='Concentration', y='Response mean', by='Product', groupby=['Person', 'Alcohol'], marker='o', size=40, muted_alpha=0)
error = df2.hvplot.errorbars(x='Concentration', y='Response mean', yerr1='Response SD', by='Product', groupby=['Person', 'Alcohol'])
all_plots = scatter * line * error
all_plots
Can anyone help me link the error bars to their corresponding data, so that when I click on the legend, the error bars, scatter, and line are all removed from the plot?
Thank you in advance!

Related

graphs overlapping and redundant code to clear it out

I've been using RMarkdown to create graphs. Then I take the graphs and copy and paste them into PowerPoint presentations. That's been my workflow.
Here is the dataframe that I am using.
{'Unnamed: 0': {0: 'Mazda RX4', 1: 'Mazda RX4 Wag', 2: 'Datsun 710', 3: 'Hornet 4 Drive', 4: 'Hornet Sportabout', 5: 'Valiant', 6: 'Duster 360', 7: 'Merc 240D', 8: 'Merc 230', 9: 'Merc 280', 10: 'Merc 280C', 11: 'Merc 450SE', 12: 'Merc 450SL', 13: 'Merc 450SLC', 14: 'Cadillac Fleetwood', 15: 'Lincoln Continental', 16: 'Chrysler Imperial', 17: 'Fiat 128', 18: 'Honda Civic', 19: 'Toyota Corolla', 20: 'Toyota Corona', 21: 'Dodge Challenger', 22: 'AMC Javelin', 23: 'Camaro Z28', 24: 'Pontiac Firebird', 25: 'Fiat X1-9', 26: 'Porsche 914-2', 27: 'Lotus Europa', 28: 'Ford Pantera L', 29: 'Ferrari Dino', 30: 'Maserati Bora', 31: 'Volvo 142E'}, 'mpg': {0: 21.0, 1: 21.0, 2: 22.8, 3: 21.4, 4: 18.7, 5: 18.1, 6: 14.3, 7: 24.4, 8: 22.8, 9: 19.2, 10: 17.8, 11: 16.4, 12: 17.3, 13: 15.2, 14: 10.4, 15: 10.4, 16: 14.7, 17: 32.4, 18: 30.4, 19: 33.9, 20: 21.5, 21: 15.5, 22: 15.2, 23: 13.3, 24: 19.2, 25: 27.3, 26: 26.0, 27: 30.4, 28: 15.8, 29: 19.7, 30: 15.0, 31: 21.4}, 'cyl': {0: 6, 1: 6, 2: 4, 3: 6, 4: 8, 5: 6, 6: 8, 7: 4, 8: 4, 9: 6, 10: 6, 11: 8, 12: 8, 13: 8, 14: 8, 15: 8, 16: 8, 17: 4, 18: 4, 19: 4, 20: 4, 21: 8, 22: 8, 23: 8, 24: 8, 25: 4, 26: 4, 27: 4, 28: 8, 29: 6, 30: 8, 31: 4}, 'disp': {0: 160.0, 1: 160.0, 2: 108.0, 3: 258.0, 4: 360.0, 5: 225.0, 6: 360.0, 7: 146.7, 8: 140.8, 9: 167.6, 10: 167.6, 11: 275.8, 12: 275.8, 13: 275.8, 14: 472.0, 15: 460.0, 16: 440.0, 17: 78.7, 18: 75.7, 19: 71.1, 20: 120.1, 21: 318.0, 22: 304.0, 23: 350.0, 24: 400.0, 25: 79.0, 26: 120.3, 27: 95.1, 28: 351.0, 29: 145.0, 30: 301.0, 31: 121.0}, 'hp': {0: 110, 1: 110, 2: 93, 3: 110, 4: 175, 5: 105, 6: 245, 7: 62, 8: 95, 9: 123, 10: 123, 11: 180, 12: 180, 13: 180, 14: 205, 15: 215, 16: 230, 17: 66, 18: 52, 19: 65, 20: 97, 21: 150, 22: 150, 23: 245, 24: 175, 25: 66, 26: 91, 27: 113, 28: 264, 29: 175, 30: 335, 31: 109}, 'drat': {0: 3.9, 1: 3.9, 2: 3.85, 3: 3.08, 4: 3.15, 5: 2.76, 6: 3.21, 7: 3.69, 8: 3.92, 9: 3.92, 10: 3.92, 11: 3.07, 12: 3.07, 13: 3.07, 14: 2.93, 15: 3.0, 16: 3.23, 17: 4.08, 18: 4.93, 19: 4.22, 20: 3.7, 21: 2.76, 22: 3.15, 23: 3.73, 24: 3.08, 25: 4.08, 26: 4.43, 27: 3.77, 28: 4.22, 29: 3.62, 30: 3.54, 31: 4.11}, 'wt': {0: 2.62, 1: 2.875, 2: 2.32, 3: 3.215, 4: 3.44, 5: 3.46, 6: 3.57, 7: 3.19, 8: 3.15, 9: 3.44, 10: 3.44, 11: 4.07, 12: 3.73, 13: 3.78, 14: 5.25, 15: 5.424, 16: 5.345, 17: 2.2, 18: 1.615, 19: 1.835, 20: 2.465, 21: 3.52, 22: 3.435, 23: 3.84, 24: 3.845, 25: 1.935, 26: 2.14, 27: 1.513, 28: 3.17, 29: 2.77, 30: 3.57, 31: 2.78}, 'qsec': {0: 16.46, 1: 17.02, 2: 18.61, 3: 19.44, 4: 17.02, 5: 20.22, 6: 15.84, 7: 20.0, 8: 22.9, 9: 18.3, 10: 18.9, 11: 17.4, 12: 17.6, 13: 18.0, 14: 17.98, 15: 17.82, 16: 17.42, 17: 19.47, 18: 18.52, 19: 19.9, 20: 20.01, 21: 16.87, 22: 17.3, 23: 15.41, 24: 17.05, 25: 18.9, 26: 16.7, 27: 16.9, 28: 14.5, 29: 15.5, 30: 14.6, 31: 18.6}, 'vs': {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1, 6: 0, 7: 1, 8: 1, 9: 1, 10: 1, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 1, 18: 1, 19: 1, 20: 1, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 0, 27: 1, 28: 0, 29: 0, 30: 0, 31: 1}, 'am': {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 1, 18: 1, 19: 1, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1, 30: 1, 31: 1}, 'gear': {0: 4, 1: 4, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3, 7: 4, 8: 4, 9: 4, 10: 4, 11: 3, 12: 3, 13: 3, 14: 3, 15: 3, 16: 3, 17: 4, 18: 4, 19: 4, 20: 3, 21: 3, 22: 3, 23: 3, 24: 3, 25: 4, 26: 5, 27: 5, 28: 5, 29: 5, 30: 5, 31: 4}, 'carb': {0: 4, 1: 4, 2: 1, 3: 1, 4: 2, 5: 1, 6: 4, 7: 2, 8: 2, 9: 4, 10: 4, 11: 3, 12: 3, 13: 3, 
14: 4, 15: 4, 16: 4, 17: 1, 18: 2, 19: 1, 20: 1, 21: 2, 22: 2, 23: 4, 24: 2, 25: 1, 26: 2, 27: 2, 28: 4, 29: 6, 30: 8, 31: 2}}
The code looks like this.
```{r, warning = FALSE, message = FALSE}
ggplot2::ggplot(data = mtcars, aes(x = wt, y = after_stat(count))) +
geom_histogram(bins = 32, color = 'black', fill = '#ffe6b7') +
labs(title = "Mtcars", subtitle = "Histogram") +
theme(plot.title = element_text(face = "bold"))
ggplot2::ggplot(data = mtcars, aes(x = mpg, y = after_stat(count))) +
geom_histogram(bins = 32, color = 'black', fill = '#ffe6b7') +
labs(title = "Mtcars", subtitle = "Histogram") +
theme(plot.title = element_text(face = "bold"))
ggplot2::ggplot(data = mtcars, aes(x = disp, y = after_stat(count))) +
geom_histogram(bins = 32, color = 'black', fill = '#ffe6b7') +
labs(title = "Mtcars", subtitle = "Histogram") +
theme(plot.title = element_text(face = "bold"))
```
And here is a screenshot of the output.
Now I'm trying to do the same thing with Python graphs. I'm finding that I can't replicate it exactly, because the graphs start overlapping.
```{python}
import seaborn
import matplotlib.pyplot as plt

seaborn.histplot(data=mtcars, x="wt", bins = 30)
plt.title("wt histogram", loc = 'left')
plt.show()
seaborn.histplot(data=mtcars, x="mpg", bins = 30)
plt.title("mpg histogram", loc = 'left')
plt.show()
seaborn.histplot(data=mtcars, x="disp", bins = 30)
plt.title("disp histogram", loc = 'left')
plt.show()
```
So now what I'm doing is clearing out the figure state after I create every single graph. The output now looks fine - I get a distinct histogram for each variable I'm calling.
```{python}
plt.figure().clear()
plt.close()
plt.cla()
plt.clf()
seaborn.histplot(data=mtcars, x="wt", bins = 30)
plt.title("wt histogram", loc = 'left')
plt.show()
plt.figure().clear()
plt.close()
plt.cla()
plt.clf()
seaborn.histplot(data=mtcars, x="mpg", bins = 30)
plt.title("mpg histogram", loc = 'left')
plt.show()
plt.figure().clear()
plt.close()
plt.cla()
plt.clf()
seaborn.histplot(data=mtcars, x="disp", bins = 30)
plt.title("disp histogram", loc = 'left')
plt.show()
```
The output is definitely better.
But isn't this method really redundant? What do people who use Python more regularly do to manage their figures? Do you all clear out the figure state like this every time?
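For comparison, a minimal sketch of the pattern of creating a fresh figure per plot instead of clearing global state; it assumes mtcars has been built from the dictionary above (the mtcars_dict name and the loop are illustrative, not from the original post):

```{python}
import pandas as pd
import seaborn
import matplotlib.pyplot as plt

# mtcars_dict is assumed to hold the dictionary posted above (illustrative name)
mtcars = pd.DataFrame(mtcars_dict)

for col in ["wt", "mpg", "disp"]:
    fig, ax = plt.subplots()                      # fresh figure/axes per plot, no manual clearing
    seaborn.histplot(data=mtcars, x=col, bins=30, ax=ax)
    ax.set_title(f"{col} histogram", loc='left')
    plt.show()
```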

Stacked bar chart X axis gives wrong order python plotly

Hi, I created a stacked bar chart using Python Plotly, but it gives the wrong X-axis order.
DF :
Day-Shift State seconds
Day 01-05 A 7439
Day 01-05 STOPPED 0
Day 01-05 B 10
Day 01-05 C 35751
Night 01-05 C 43200
Day 01-06 STOPPED 7198
Day 01-06 F 18
Day 01-06 A 14
Day 01-06 A 29301
Day 01-06 STOPPED 6
Day 01-06 A 6663
Night 01-06 A 43200
In the df, Day-Shift represents the shift and the date; it goes Day 01-05, Night 01-05, Day 01-06, Night 01-06, and so on.
But the graph gives the wrong order on the X-axis. For example, after Day 01-05 the graph shows Night 01-08 instead of Night 01-05.
The sample df and my code are attached below:
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
Df as Dict:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Day-Shift': {0: 'Day 01-05',
1: 'Day 01-05',
2: 'Day 01-05',
3: 'Day 01-05',
4: 'Night 01-05',
5: 'Day 01-06',
6: 'Day 01-06',
7: 'Day 01-06',
8: 'Day 01-06',
9: 'Day 01-06',
10: 'Day 01-06',
11: 'Night 01-06',
12: 'Day 01-07',
13: 'Night 01-07',
14: 'Night 01-07',
15: 'Night 01-07',
16: 'Night 01-07',
17: 'Night 01-07',
18: 'Night 01-08',
19: 'Night 01-08',
20: 'Night 01-08',
21: 'Night 01-08',
22: 'Day 01-08',
23: 'Day 01-08',
24: 'Day 01-08',
25: 'Night 01-09',
26: 'Night 01-09',
27: 'Night 01-09',
28: 'Day 01-09',
29: 'Day 01-09',
30: 'Day 01-09',
31: 'Day 01-09',
32: 'Day 01-10',
33: 'Night 01-10',
34: 'Day 01-11',
35: 'Day 01-11',
36: 'Day 01-11',
37: 'Day 01-11',
38: 'Day 01-11',
39: 'Night 01-11',
40: 'Day 01-12',
41: 'Night 01-12',
42: 'Day 01-13',
43: 'Day 01-13',
44: 'Day 01-13',
45: 'Day 01-13',
46: 'Day 01-13',
47: 'Day 01-13',
48: 'Day 01-13',
49: 'Night 01-13',
50: 'Day 01-14',
51: 'Day 01-14',
52: 'Day 01-14',
53: 'Day 01-14',
54: 'Day 01-14',
55: 'Day 01-14',
56: 'Day 01-14',
57: 'Day 01-14',
58: 'Day 01-14',
59: 'Night 01-14'},
'State': {0: 'D',
1: 'STOPPED',
2: 'B',
3: 'A',
4: 'A',
5: 'A',
6: 'A1',
7: 'A2',
8: 'A3',
9: 'A4',
10: 'B1',
11: 'B1',
12: 'B1',
13: 'B1',
14: 'B2',
15: 'STOPPED',
16: 'RUNNING',
17: 'B',
18: 'STOPPED',
19: 'B',
20: 'RUNNING',
21: 'D',
22: 'STOPPED',
23: 'B',
24: 'RUNNING',
25: 'STOPPED',
26: 'RUNNING',
27: 'B',
28: 'RUNNING',
29: 'STOPPED',
30: 'B',
31: 'D',
32: 'B',
33: 'B',
34: 'B',
35: 'RUNNING',
36: 'STOPPED',
37: 'D',
38: 'A',
39: 'A',
40: 'A',
41: 'A',
42: 'A',
43: 'A1',
44: 'A2',
45: 'A3',
46: 'A4',
47: 'B1',
48: 'B2',
49: 'B2',
50: 'B2',
51: 'B',
52: 'STOPPED',
53: 'A',
54: 'A1',
55: 'A2',
56: 'A3',
57: 'A4',
58: 'B1',
59: 'B1'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200,
12: 43200,
13: 5339,
14: 8217,
15: 0,
16: 4147,
17: 1040,
18: 24787,
19: 1500,
20: 14966,
21: 1410,
22: 2499,
23: 1310,
24: 39391,
25: 3570,
26: 17234,
27: 47390,
28: 36068,
29: 270,
30: 6842,
31: 20,
32: 43200,
33: 43200,
34: 2486,
35: 8420,
36: 870,
37: 30,
38: 31394,
39: 43200,
40: 43200,
41: 43200,
42: 36733,
43: 23,
44: 6,
45: 4,
46: 4,
47: 3,
48: 6427,
49: 43200,
50: 620,
51: 0,
52: 4,
53: 41336,
54: 4,
55: 4,
56: 4,
57: 23,
58: 1205,
59: 43200}})
Really appreciate your support!!!
You can use category_orders to set the order of values:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Day-Shift': {0: 'Day 01-05', 1: 'Day 01-05', 2: 'Day 01-05', 3: 'Day 01-05', 4: 'Night 01-05', 5: 'Day 01-06', 6: 'Day 01-06', 7: 'Day 01-06', 8: 'Day 01-06', 9: 'Day 01-06', 10: 'Day 01-06', 11: 'Night 01-06', 12: 'Day 01-07', 13: 'Night 01-07', 14: 'Night 01-07', 15: 'Night 01-07', 16: 'Night 01-07', 17: 'Night 01-07', 18: 'Night 01-08', 19: 'Night 01-08', 20: 'Night 01-08', 21: 'Night 01-08', 22: 'Day 01-08', 23: 'Day 01-08', 24: 'Day 01-08', 25: 'Night 01-09', 26: 'Night 01-09', 27: 'Night 01-09', 28: 'Day 01-09', 29: 'Day 01-09', 30: 'Day 01-09', 31: 'Day 01-09', 32: 'Day 01-10', 33: 'Night 01-10', 34: 'Day 01-11', 35: 'Day 01-11', 36: 'Day 01-11', 37: 'Day 01-11', 38: 'Day 01-11', 39: 'Night 01-11', 40: 'Day 01-12', 41: 'Night 01-12', 42: 'Day 01-13', 43: 'Day 01-13', 44: 'Day 01-13', 45: 'Day 01-13', 46: 'Day 01-13', 47: 'Day 01-13', 48: 'Day 01-13', 49: 'Night 01-13', 50: 'Day 01-14', 51: 'Day 01-14', 52: 'Day 01-14', 53: 'Day 01-14', 54: 'Day 01-14', 55: 'Day 01-14', 56: 'Day 01-14', 57: 'Day 01-14', 58: 'Day 01-14', 59: 'Night 01-14'}, 'State': {0: 'D', 1: 'STOPPED', 2: 'B', 3: 'A', 4: 'A', 5: 'A', 6: 'A1', 7: 'A2', 8: 'A3', 9: 'A4', 10: 'B1', 11: 'B1', 12: 'B1', 13: 'B1', 14: 'B2', 15: 'STOPPED', 16: 'RUNNING', 17: 'B', 18: 'STOPPED', 19: 'B', 20: 'RUNNING', 21: 'D', 22: 'STOPPED', 23: 'B', 24: 'RUNNING', 25: 'STOPPED', 26: 'RUNNING', 27: 'B', 28: 'RUNNING', 29: 'STOPPED', 30: 'B', 31: 'D', 32: 'B', 33: 'B', 34: 'B', 35: 'RUNNING', 36: 'STOPPED', 37: 'D', 38: 'A', 39: 'A', 40: 'A', 41: 'A', 42: 'A', 43: 'A1', 44: 'A2', 45: 'A3', 46: 'A4', 47: 'B1', 48: 'B2', 49: 'B2', 50: 'B2', 51: 'B', 52: 'STOPPED', 53: 'A', 54: 'A1', 55: 'A2', 56: 'A3', 57: 'A4', 58: 'B1', 59: 'B1'}, 'seconds': {0: 7439, 1: 0, 2: 10, 3: 35751, 4: 43200, 5: 7198, 6: 18, 7: 14, 8: 29301, 9: 6, 10: 6663, 11: 43200, 12: 43200, 13: 5339, 14: 8217, 15: 0, 16: 4147, 17: 1040, 18: 24787, 19: 1500, 20: 14966, 21: 1410, 22: 2499, 23: 1310, 24: 39391, 25: 3570, 26: 17234, 27: 47390, 28: 36068, 29: 270, 30: 6842, 31: 20, 32: 43200, 33: 43200, 34: 2486, 35: 8420, 36: 870, 37: 30, 38: 31394, 39: 43200, 40: 43200, 41: 43200, 42: 36733, 43: 23, 44: 6, 45: 4, 46: 4, 47: 3, 48: 6427, 49: 43200, 50: 620, 51: 0, 52: 4, 53: 41336, 54: 4, 55: 4, 56: 4, 57: 23, 58: 1205, 59: 43200}})
fig = px.bar(df, x="Day-Shift", y="seconds", color="State", category_orders={'Day-Shift': df['Day-Shift'].to_list()})
fig.show()
Output:
Setting category_orders = {"Day-Shift": df['Day-Shift'].unique()} will work, but only reliably if your dataset has the correct order to begin with. Another condition is that you only have data for one unique year. In order to guarantee the correct order regardless of the original order, and to make it possible to combine data for December 2020 with January 2021, I would suggest you:
split "Day-Shift" into two separate columns: time of day (tod) and day of month (date),
append a year to your dates, like dfs['date2'] = dfs['date'] + '-2021',
turn 'date2' into datetime using dfs['date2'] = pd.to_datetime(dfs['date2']),
sort your values chronologically, and
retrieve "Day-Shift" in the now correct order with new_order = list(df['Day-Shift'].unique()), and then
apply the chronologically correct order through category_orders = {'Day-Shift': new_order}
Plot
Complete code:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Day-Shift': {0: 'Day 01-05',
1: 'Day 01-05',
2: 'Day 01-05',
3: 'Day 01-05',
4: 'Night 01-05',
5: 'Day 01-06',
6: 'Day 01-06',
7: 'Day 01-06',
8: 'Day 01-06',
9: 'Day 01-06',
10: 'Day 01-06',
11: 'Night 01-06',
12: 'Day 01-07',
13: 'Night 01-07',
14: 'Night 01-07',
15: 'Night 01-07',
16: 'Night 01-07',
17: 'Night 01-07',
18: 'Night 01-08',
19: 'Night 01-08',
20: 'Night 01-08',
21: 'Night 01-08',
22: 'Day 01-08',
23: 'Day 01-08',
24: 'Day 01-08',
25: 'Night 01-09',
26: 'Night 01-09',
27: 'Night 01-09',
28: 'Day 01-09',
29: 'Day 01-09',
30: 'Day 01-09',
31: 'Day 01-09',
32: 'Day 01-10',
33: 'Night 01-10',
34: 'Day 01-11',
35: 'Day 01-11',
36: 'Day 01-11',
37: 'Day 01-11',
38: 'Day 01-11',
39: 'Night 01-11',
40: 'Day 01-12',
41: 'Night 01-12',
42: 'Day 01-13',
43: 'Day 01-13',
44: 'Day 01-13',
45: 'Day 01-13',
46: 'Day 01-13',
47: 'Day 01-13',
48: 'Day 01-13',
49: 'Night 01-13',
50: 'Day 01-14',
51: 'Day 01-14',
52: 'Day 01-14',
53: 'Day 01-14',
54: 'Day 01-14',
55: 'Day 01-14',
56: 'Day 01-14',
57: 'Day 01-14',
58: 'Day 01-14',
59: 'Night 01-14'},
'State': {0: 'D',
1: 'STOPPED',
2: 'B',
3: 'A',
4: 'A',
5: 'A',
6: 'A1',
7: 'A2',
8: 'A3',
9: 'A4',
10: 'B1',
11: 'B1',
12: 'B1',
13: 'B1',
14: 'B2',
15: 'STOPPED',
16: 'RUNNING',
17: 'B',
18: 'STOPPED',
19: 'B',
20: 'RUNNING',
21: 'D',
22: 'STOPPED',
23: 'B',
24: 'RUNNING',
25: 'STOPPED',
26: 'RUNNING',
27: 'B',
28: 'RUNNING',
29: 'STOPPED',
30: 'B',
31: 'D',
32: 'B',
33: 'B',
34: 'B',
35: 'RUNNING',
36: 'STOPPED',
37: 'D',
38: 'A',
39: 'A',
40: 'A',
41: 'A',
42: 'A',
43: 'A1',
44: 'A2',
45: 'A3',
46: 'A4',
47: 'B1',
48: 'B2',
49: 'B2',
50: 'B2',
51: 'B',
52: 'STOPPED',
53: 'A',
54: 'A1',
55: 'A2',
56: 'A3',
57: 'A4',
58: 'B1',
59: 'B1'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200,
12: 43200,
13: 5339,
14: 8217,
15: 0,
16: 4147,
17: 1040,
18: 24787,
19: 1500,
20: 14966,
21: 1410,
22: 2499,
23: 1310,
24: 39391,
25: 3570,
26: 17234,
27: 47390,
28: 36068,
29: 270,
30: 6842,
31: 20,
32: 43200,
33: 43200,
34: 2486,
35: 8420,
36: 870,
37: 30,
38: 31394,
39: 43200,
40: 43200,
41: 43200,
42: 36733,
43: 23,
44: 6,
45: 4,
46: 4,
47: 3,
48: 6427,
49: 43200,
50: 620,
51: 0,
52: 4,
53: 41336,
54: 4,
55: 4,
56: 4,
57: 23,
58: 1205,
59: 43200}})
dfs = df['Day-Shift'].str.extract('([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
dfs.columns = ['tod', 'date']
dfs['date2'] = dfs['date'] + '-2021'
dfs['date2'] = pd.to_datetime(dfs['date2'])
df = pd.concat([df, dfs], axis = 1)
df = df.sort_values(['date2', 'tod'], ascending = [True, True])
new_order = list(df['Day-Shift'].unique())
# df['Day-Shift'] = pd.Categorical(df['Day-Shift'], categories=new_order, ordered=True)
fig = px.bar(df, x="Day-Shift", y="seconds", color="State",
category_orders = {'Day-Shift': new_order})
fig.update_xaxes(type='category')
fig.show()

Stacked bar chart returns unexpected output (Python, plotly)

I have a df with three columns (Day-Shift, State, seconds).
Day-Shift State seconds
Day 01-05 A 7439
Day 01-05 STOPPED 0
Day 01-05 B 10
Day 01-05 C 35751
Night 01-05 C 43200
Day 01-06 STOPPED 7198
Day 01-06 F 18
Day 01-06 A 14
Day 01-06 A 29301
Day 01-06 STOPPED 6
Day 01-06 A 6663
Night 01-06 A 43200
My code to build a stacked bar chart is:
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
But it returns this stacked bar chart.
The fault here is that the Day-Shift order is changed and the corresponding seconds are not in this chart. I cannot identify the error. Really appreciate your support!
DF:
{'Day-Shift': {0: 'Day 01-05',
1: 'Day 01-05',
2: 'Day 01-05',
3: 'Day 01-05',
4: 'Night 01-05',
5: 'Day 01-06',
6: 'Day 01-06',
7: 'Day 01-06',
8: 'Day 01-06',
9: 'Day 01-06',
10: 'Day 01-06',
11: 'Night 01-06',
12: 'Day 01-07',
13: 'Night 01-07',
14: 'Night 01-07',
15: 'Night 01-07',
16: 'Night 01-07',
17: 'Night 01-07',
18: 'Night 01-08',
19: 'Night 01-08',
20: 'Night 01-08',
21: 'Night 01-08',
22: 'Day 01-08',
23: 'Day 01-08',
24: 'Day 01-08',
25: 'Night 01-09',
26: 'Night 01-09',
27: 'Night 01-09',
28: 'Day 01-09',
29: 'Day 01-09',
30: 'Day 01-09',
31: 'Day 01-09',
32: 'Day 01-10',
33: 'Night 01-10',
34: 'Day 01-11',
35: 'Day 01-11',
36: 'Day 01-11',
37: 'Day 01-11',
38: 'Day 01-11',
39: 'Night 01-11',
40: 'Day 01-12',
41: 'Night 01-12',
42: 'Day 01-13',
43: 'Day 01-13',
44: 'Day 01-13',
45: 'Day 01-13',
46: 'Day 01-13',
47: 'Day 01-13',
48: 'Day 01-13',
49: 'Night 01-13',
50: 'Day 01-14',
51: 'Day 01-14',
52: 'Day 01-14',
53: 'Day 01-14',
54: 'Day 01-14',
55: 'Day 01-14',
56: 'Day 01-14',
57: 'Day 01-14',
58: 'Day 01-14',
59: 'Night 01-14'},
'State': {0: 'D',
1: 'STOPPED',
2: 'B',
3: 'A',
4: 'A',
5: 'A',
6: 'A1',
7: 'A2',
8: 'A3',
9: 'A4',
10: 'B1',
11: 'B1',
12: 'B1',
13: 'B1',
14: 'B2',
15: 'STOPPED',
16: 'RUNNING',
17: 'B',
18: 'STOPPED',
19: 'B',
20: 'RUNNING',
21: 'D',
22: 'STOPPED',
23: 'B',
24: 'RUNNING',
25: 'STOPPED',
26: 'RUNNING',
27: 'B',
28: 'RUNNING',
29: 'STOPPED',
30: 'B',
31: 'D',
32: 'B',
33: 'B',
34: 'B',
35: 'RUNNING',
36: 'STOPPED',
37: 'D',
38: 'A',
39: 'A',
40: 'A',
41: 'A',
42: 'A',
43: 'A1',
44: 'A2',
45: 'A3',
46: 'A4',
47: 'B1',
48: 'B2',
49: 'B2',
50: 'B2',
51: 'B',
52: 'STOPPED',
53: 'A',
54: 'A1',
55: 'A2',
56: 'A3',
57: 'A4',
58: 'B1',
59: 'B1'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200,
12: 43200,
13: 5339,
14: 8217,
15: 0,
16: 4147,
17: 1040,
18: 24787,
19: 1500,
20: 14966,
21: 1410,
22: 2499,
23: 1310,
24: 39391,
25: 3570,
26: 17234,
27: 47390,
28: 36068,
29: 270,
30: 6842,
31: 20,
32: 43200,
33: 43200,
34: 2486,
35: 8420,
36: 870,
37: 30,
38: 31394,
39: 43200,
40: 43200,
41: 43200,
42: 36733,
43: 23,
44: 6,
45: 4,
46: 4,
47: 3,
48: 6427,
49: 43200,
50: 620,
51: 0,
52: 4,
53: 41336,
54: 4,
55: 4,
56: 4,
57: 23,
58: 1205,
59: 43200}}
Your snippet seems to be running fine on my end:
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
And produces this plot:
So then it's either an issue with your version, or, more likely, your data. The first thing you should do is make sure that none of your data has been turned into an index. You can easily reset your index using df = df.reset_index(). In the snippet below you'll see that I've used your identical dataset as a dict with no index.
Edit: xaxis formatting
In the figure above, Plotly interprets your x-axis as time values. If you'd like to prevent this, just include fig.update_xaxes(type='category') to get this:
Complete code:
import pandas as pd
import plotly.express as px
# df = pd.read_clipboard(sep='\\s+').reset_index()
# df.to_dict()
df = pd.DataFrame({'index': {0: 'Day',
1: 'Day',
2: 'Day',
3: 'Day',
4: 'Night',
5: 'Day',
6: 'Day',
7: 'Day',
8: 'Day',
9: 'Day',
10: 'Day',
11: 'Night'},
'Day-Shift': {0: '01-05',
1: '01-05',
2: '01-05',
3: '01-05',
4: '01-05',
5: '01-06',
6: '01-06',
7: '01-06',
8: '01-06',
9: '01-06',
10: '01-06',
11: '01-06'},
'State': {0: 'A',
1: 'STOPPED',
2: 'B',
3: 'C',
4: 'C',
5: 'STOPPED',
6: 'F',
7: 'A',
8: 'A',
9: 'STOPPED',
10: 'A',
11: 'A'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200}})
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
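As noted in the edit above, to reproduce the version with a categorical x-axis rather than time values, add the update_xaxes call before showing the figure:
fig.update_xaxes(type='category')
fig.show()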

How to add lines with annotations to candlestick charts when some values are missing?

I'm trying to use Plotly to overlay a marker/line chart on top of my OHLC candle chart.
Code
import plotly.graph_objects as go
import pandas as pd
import numpy as np
from datetime import datetime
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = "lines+markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This is the current image
This is the desired output/image
I want a black line between the markers (pivots). I would also ideally like a value next to each line showing the distance between each pivot, but I'm not sure how to do this.
For example, the distance between the first two pivots, round(abs(1.293494 - 1.279329), 3), returns 0.014, so I would ideally like this next to the line.
The second is round(abs(1.279329 - 1.329610), 3), so the value would be 0.05. I have hand-edited the image and added the lines for the first two values to give a visual representation of what I'm trying to achieve.
The problem seems to be the missing values. So just use pandas.Series.interpolate in combination with fig.add_annotation to get:
I've included annotations for differences as well. There are surely more elegant ways to do it than with for loops, but it does the job. Let me know if anything is unclear!
import pandas as pd
import numpy as np
import plotly.graph_objects as go
df = pd.DataFrame(
{'index': {0: 0,
1: 1,
2: 2,
3: 3,
4: 4,
5: 5,
6: 6,
7: 7,
8: 8,
9: 9,
10: 10,
11: 11,
12: 12,
13: 13,
14: 14,
15: 15,
16: 16,
17: 17,
18: 18,
19: 19,
20: 20,
21: 21,
22: 22,
23: 23,
24: 24},
'Date': {0: '2018-09-03',
1: '2018-09-04',
2: '2018-09-05',
3: '2018-09-06',
4: '2018-09-07',
5: '2018-09-10',
6: '2018-09-11',
7: '2018-09-12',
8: '2018-09-13',
9: '2018-09-14',
10: '2018-09-17',
11: '2018-09-18',
12: '2018-09-19',
13: '2018-09-20',
14: '2018-09-21',
15: '2018-09-24',
16: '2018-09-25',
17: '2018-09-26',
18: '2018-09-27',
19: '2018-09-28',
20: '2018-10-01',
21: '2018-10-02',
22: '2018-10-03',
23: '2018-10-04',
24: '2018-10-05'},
'Open': {0: 1.2922067642211914,
1: 1.2867859601974487,
2: 1.2859420776367188,
3: 1.2914056777954102,
4: 1.2928247451782229,
5: 1.292808175086975,
6: 1.3027958869934082,
7: 1.3017443418502808,
8: 1.30451238155365,
9: 1.3110626935958862,
10: 1.3071041107177734,
11: 1.3146650791168213,
12: 1.3166556358337402,
13: 1.3140604496002195,
14: 1.3271400928497314,
15: 1.3080958127975464,
16: 1.3117163181304932,
17: 1.3180439472198486,
18: 1.3169677257537842,
19: 1.3077707290649414,
20: 1.3039510250091553,
21: 1.3043931722640991,
22: 1.2979763746261597,
23: 1.2941633462905884,
24: 1.3022021055221558},
'High': {0: 1.2934937477111816,
1: 1.2870012521743774,
2: 1.2979259490966797,
3: 1.2959914207458496,
4: 1.3024225234985352,
5: 1.3052103519439695,
6: 1.30804443359375,
7: 1.3044441938400269,
8: 1.3120088577270508,
9: 1.3143367767333984,
10: 1.3156682252883911,
11: 1.3171066045761108,
12: 1.3211784362792969,
13: 1.3296104669570925,
14: 1.3278449773788452,
15: 1.3166556358337402,
16: 1.3175750970840454,
17: 1.3196094036102295,
18: 1.3180439472198486,
19: 1.3090718984603882,
20: 1.3097577095031738,
21: 1.3049719333648682,
22: 1.3020155429840088,
23: 1.3036959171295166,
24: 1.310753345489502},
'Low': {0: 1.2856279611587524,
1: 1.2813942432403564,
2: 1.2793285846710205,
3: 1.289723515510559,
4: 1.2918561697006226,
5: 1.289823293685913,
6: 1.2976733446121216,
7: 1.298414707183838,
8: 1.3027619123458862,
9: 1.3073604106903076,
10: 1.3070186376571655,
11: 1.3120776414871216,
12: 1.3120431900024414,
13: 1.3140085935592651,
14: 1.305841088294983,
15: 1.3064552545547483,
16: 1.3097233772277832,
17: 1.3141123056411743,
18: 1.309706211090088,
19: 1.3002548217773438,
20: 1.3014055490493774,
21: 1.2944146394729614,
22: 1.2964619398117063,
23: 1.2924572229385376,
24: 1.3005592823028564},
'Close': {0: 1.292306900024414,
1: 1.2869019508361816,
2: 1.2858428955078125,
3: 1.2914891242980957,
4: 1.2925406694412231,
5: 1.2930254936218262,
6: 1.302643060684204,
7: 1.3015578985214231,
8: 1.304546356201172,
9: 1.311131477355957,
10: 1.307326316833496,
11: 1.3146305084228516,
12: 1.3168463706970217,
13: 1.3141123056411743,
14: 1.327087163925171,
15: 1.30804443359375,
16: 1.3117333650588991,
17: 1.3179919719696045,
18: 1.3172800540924072,
19: 1.3078734874725342,
20: 1.3039000034332275,
21: 1.3043591976165771,
22: 1.2981956005096436,
23: 1.294062852859497,
24: 1.3024225234985352},
'Pivot Price': {0: 1.2934937477111816,
1: np.nan,
2: 1.2793285846710205,
3: np.nan,
4: np.nan,
5: np.nan,
6: np.nan,
7: np.nan,
8: np.nan,
9: np.nan,
10: np.nan,
11: np.nan,
12: np.nan,
13: 1.3296104669570925,
14: np.nan,
15: np.nan,
16: np.nan,
17: np.nan,
18: np.nan,
19: np.nan,
20: np.nan,
21: np.nan,
22: np.nan,
23: 1.2924572229385376,
24: np.nan}})
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
# df=pd.read_csv("for_so.csv")
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
# fig = go.Figure(data=[go.Candlestick(x=df.index,
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
# some calculations
df_diff = df['Pivot Price'].dropna().diff().copy()
df2 = df[df.index.isin(df_diff.index)].copy()
df2['Price Diff'] = df['Pivot Price'].dropna().values
fig.add_trace(
go.Scatter(mode = "lines+markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.add_trace(go.Scatter(x=df['Date'], y=df['Pivot Price'].interpolate(),
# fig.add_trace(go.Scatter(x=df.index, y=df['Pivot Price'].interpolate(),
mode = 'lines',
line = dict(color='black')))
def annot(value):
# print(type(value))
if np.isnan(value):
return ''
else:
return value
j = 0
for i, p in enumerate(df['Pivot Price']):
# print(p)
# if not np.isnan(p) and not np.isnan(df_diff.iloc[j]):
if not np.isnan(p):
# print(not np.isnan(df_diff.iloc[j]))
fig.add_annotation(dict(font=dict(color='rgba(0,0,200,0.8)',size=12),
x=df['Date'].iloc[i],
# x=df.index[i],
# x = xStart
y=p,
showarrow=False,
text=annot(round(abs(df_diff.iloc[j]),3)),
textangle=0,
xanchor='right',
xref="x",
yref="y"))
j = j + 1
fig.update_xaxes(type='category')
fig.show()
The problem seems to be the missing values, which Plotly has difficulty with. With this trick you only draw the line through the points that actually have a value:
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
df = pd.read_csv("notebooks/for_so.csv")
has_value = ~df["Pivot Price"].isna()
fig = go.Figure(data=[go.Candlestick(x=df['Date'],
open=df['Open'],
high=df['High'],
low=df['Low'],
close=df['Close'])])
fig.add_trace(
go.Scatter(mode = 'lines',
x=df[has_value]['Date'],
y=df[has_value]["Pivot Price"], line={'color':'black', 'width':1}
))
fig.add_trace(
go.Scatter(mode = "markers",
x=df['Date'],
y=df["Pivot Price"]
))
fig.update_layout(
autosize=False,
width=1000,
height=800,)
fig.show()
This did it for me.

Nested for loops to create multiple pivot tables based on 2 level multiindex in pandas

Started getting confused with this one. I have a large Fact Invoice Header table. I took the original dataframe and used a groupby to split the df up based upon one column. The output was a list of dataframes:
list_of_dfs = []
for _, g in df.groupby(df['Project State Name']):
    list_of_dfs.append(g)
list_of_dfs
Then I used another for loop to loop through the list of dataframes and perform one pivot table aggregation.
for each_state_df in list_of_dfs:
    columns_to_index_by = ['Project Issue', 'Project Secondary Issue', 'Project Client Name']
    # Aggregating to the Project Level
    table_for_pivots = pd.pivot_table(df, index=['FY Year', 'Project Issue'], values=["Project Key", 'Total Net Amount', "Project Total Resolution Amount", 'Project Budgeted Amount'],
                                      aggfunc={"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum, "Project Total Resolution Amount": np.mean,
                                               'Project Budgeted Amount': np.mean},
                                      fill_value=np.mean)
    print(table_for_pivots)
My question is, how can I use another for loop to replace the second element in the pivot table index with each value in the variable columns_to_index_by? The output would be 3 pivot tables, with index=['FY Year', 'Project Issue'], index=['FY Year', 'Project Secondary Issue'], and index=['FY Year', 'Project Client Name']. Thanks all!
Link to download a sample df data is here:
https://ufile.io/iufv9nma
Use a list comprehension and iterate through a zip of the indices you want to set for each group:
import pandas as pd
import numpy as np
from pandas import Timestamp
from numpy import nan
d = {'Total Net Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Project Total Resolution Amount': {2: 35000, 41: 27000, 17: 40000, 32: 27000, 26: 27000, 11: 40000, 23: 27000, 35: 27000, 29: 27000, 59: 27000, 38: 27000, 0: 30000}, 'Invoice Header Key': {2: 1229422, 41: 984803, 17: 1270731, 32: 938069, 26: 911535, 11: 1247443, 23: 902150, 35: 943737, 29: 918888, 59: 1071541, 38: 965091, 0: 1279581}, 'Project Key': {2: 259661, 41: 194517, 17: 259188, 32: 194517, 26: 194517, 11: 259188, 23: 194517, 35: 194517, 29: 194517, 59: 194517, 38: 194517, 0: 263736}, 'Project Secondary Issue': {2: 2, 41: 4, 17: 0, 32: 3, 26: 3, 11: 0, 23: 4, 35: 4, 29: 4, 59: 4, 38: 3, 0: 4}, 'Organization Key': {2: 16029, 41: 22638, 17: 24230, 32: 22638, 26: 22638, 11: 24230, 23: 22638, 35: 22638, 29: 22638, 59: 22638, 38: 22638, 0: 4532}, 'Project Budgeted Amount': {2: 42735.0, 41: 32500.0, 17: 26000.0, 32: 32500.0, 26: 32500.0, 11: 26000.0, 23: 32500.0, 35: 32500.0, 29: 32500.0, 59: 32500.0, 38: 32500.0, 0: nan}, 'Project State Name': {2: 0, 41: 1, 17: 2, 32: 1, 26: 1, 11: 2, 23: 1, 35: 1, 29: 1, 59: 1, 38: 1, 0: 1}, 'Project Issue': {2: 0, 41: 2, 17: 1, 32: 2, 26: 2, 11: 1, 23: 2, 35: 2, 29: 2, 59: 2, 38: 2, 0: 1}, 'Project Number': {2: 2, 41: 0, 17: 1, 32: 0, 26: 0, 11: 1, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 3}, 'Project Client Name': {2: 1, 41: 0, 17: 0, 32: 0, 26: 0, 11: 0, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 1}, 'Paid Date Year Month': {2: 13, 41: 7, 17: 15, 32: 4, 26: 2, 11: 14, 23: 1, 35: 5, 29: 3, 59: 12, 38: 6, 0: 16}, 'FY Year': {2: 2, 41: 0, 17: 2, 32: 0, 26: 0, 11: 2, 23: 0, 35: 0, 29: 0, 59: 1, 38: 0, 0: 2}, 'Invoice Paid Date': {2: Timestamp('2019-09-10 00:00:00'), 41: Timestamp('2017-12-20 00:00:00'), 17: Timestamp('2019-11-25 00:00:00'), 32: Timestamp('2017-08-31 00:00:00'), 26: Timestamp('2017-06-14 00:00:00'), 11: Timestamp('2019-10-08 00:00:00'), 23: Timestamp('2017-05-30 00:00:00'), 35: Timestamp('2017-09-07 00:00:00'), 29: Timestamp('2017-07-10 00:00:00'), 59: Timestamp('2018-10-03 00:00:00'), 38: Timestamp('2017-11-03 00:00:00'), 0: Timestamp('2019-12-12 00:00:00')}, 'Invoice Paid Date Key': {2: 20190910, 41: 20171220, 17: 20191125, 32: 20170831, 26: 20170614, 11: 20191008, 23: 20170530, 35: 20170907, 29: 20170710, 59: 20181003, 38: 20171103, 0: 20191212}, 'Count Project Secondary Issue': {2: 3, 41: 3, 17: 3, 32: 3, 26: 3, 11: 3, 23: 3, 35: 3, 29: 3, 59: 3, 38: 3, 0: 2}, 'Total Net Amount By Count Project Secondary Issue': {2: 224.0, 41: 425.9666666666667, 17: 90.0, 32: 281.7666666666667, 26: 276.2066666666666, 11: 244.5, 23: 580.6, 35: 84.71551666666666, 29: 81.66666666666667, 59: 71.66666666666667, 38: 205.8, 0: 530.75}, 'Total Net Invoice Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Total Project Invoice Amount': {2: 7176.52, 41: 10110.98655, 17: 1678.5, 32: 10110.98655, 26: 10110.98655, 11: 1678.5, 23: 10110.98655, 35: 10110.98655, 29: 10110.98655, 59: 10110.98655, 38: 10110.98655, 0: 1061.5}, 'Invoice Dollar Percent of Project': {2: 0.09363869953682286, 41: 0.1263872712796755, 17: 0.160857908847185, 32: 0.08360212881501655, 26: 0.08195243816242638, 11: 0.4369973190348526, 23: 0.1722680562758735, 35: 0.02513568272919916, 29: 0.02423106773888449, 59: 0.02126399821983741, 38: 0.06106229070198891, 0: 1.0}}
df = pd.DataFrame(d)
# list comprehension with groupby
group = [g for _, g in df.groupby('Project State Name')]
#create a list of indices you want to use in pivot
idx = [['FY Year', 'Project Issue'],
['FY Year', 'Project Secondary Issue'],
['FY Year', 'Project Client Name']]
# create a list of columns to add to the value param in pivot
values = ["Project Key", 'Total Net Amount',
"Project Total Resolution Amount", 'Project Budgeted Amount']
# use your current pivot and iterate through zip(idx, group)
dfs = [pd.pivot_table(df, index=i, values=values,
aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
"Project Total Resolution Amount": np.mean,
'Project Budgeted Amount': np.mean},
fill_value=np.mean) for i,df in zip(idx, group)]
dict comprehension
I did not know what you wanted the key to be, so I just selected the second value from idx. You can retrieve each dataframe from the dict with dfs['Project Issue'].
dfs = {i[1]: pd.pivot_table(df, index=i, values=values,
aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
"Project Total Resolution Amount": np.mean,
'Project Budgeted Amount': np.mean},
fill_value=np.mean) for i,df in zip(idx, group)}
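For example, once the dict comprehension above has run, each pivot table can be pulled out by its key (illustrative usage):
dfs['Project Issue']
dfs['Project Secondary Issue']
dfs['Project Client Name']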
