Hi, I created a stacked bar chart using Python Plotly, but it gives the wrong X-axis order.
DF:
Day-Shift State seconds
Day 01-05 A 7439
Day 01-05 STOPPED 0
Day 01-05 B 10
Day 01-05 C 35751
Night 01-05 C 43200
Day 01-06 STOPPED 7198
Day 01-06 F 18
Day 01-06 A 14
Day 01-06 A 29301
Day 01-06 STOPPED 6
Day 01-06 A 6663
Night 01-06 A 43200
In the df, Day-Shift represents the shift and the date; it should go Day 01-05, Night 01-05, Day 01-06, Night 01-06, and so on.
But the graph gives the wrong order on the X-axis. For example, after Day 01-05 the graph shows Night 01-08 instead of Night 01-05.
My code and a sample df are attached below:
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
DF as dict:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Day-Shift': {0: 'Day 01-05',
1: 'Day 01-05',
2: 'Day 01-05',
3: 'Day 01-05',
4: 'Night 01-05',
5: 'Day 01-06',
6: 'Day 01-06',
7: 'Day 01-06',
8: 'Day 01-06',
9: 'Day 01-06',
10: 'Day 01-06',
11: 'Night 01-06',
12: 'Day 01-07',
13: 'Night 01-07',
14: 'Night 01-07',
15: 'Night 01-07',
16: 'Night 01-07',
17: 'Night 01-07',
18: 'Night 01-08',
19: 'Night 01-08',
20: 'Night 01-08',
21: 'Night 01-08',
22: 'Day 01-08',
23: 'Day 01-08',
24: 'Day 01-08',
25: 'Night 01-09',
26: 'Night 01-09',
27: 'Night 01-09',
28: 'Day 01-09',
29: 'Day 01-09',
30: 'Day 01-09',
31: 'Day 01-09',
32: 'Day 01-10',
33: 'Night 01-10',
34: 'Day 01-11',
35: 'Day 01-11',
36: 'Day 01-11',
37: 'Day 01-11',
38: 'Day 01-11',
39: 'Night 01-11',
40: 'Day 01-12',
41: 'Night 01-12',
42: 'Day 01-13',
43: 'Day 01-13',
44: 'Day 01-13',
45: 'Day 01-13',
46: 'Day 01-13',
47: 'Day 01-13',
48: 'Day 01-13',
49: 'Night 01-13',
50: 'Day 01-14',
51: 'Day 01-14',
52: 'Day 01-14',
53: 'Day 01-14',
54: 'Day 01-14',
55: 'Day 01-14',
56: 'Day 01-14',
57: 'Day 01-14',
58: 'Day 01-14',
59: 'Night 01-14'},
'State': {0: 'D',
1: 'STOPPED',
2: 'B',
3: 'A',
4: 'A',
5: 'A',
6: 'A1',
7: 'A2',
8: 'A3',
9: 'A4',
10: 'B1',
11: 'B1',
12: 'B1',
13: 'B1',
14: 'B2',
15: 'STOPPED',
16: 'RUNNING',
17: 'B',
18: 'STOPPED',
19: 'B',
20: 'RUNNING',
21: 'D',
22: 'STOPPED',
23: 'B',
24: 'RUNNING',
25: 'STOPPED',
26: 'RUNNING',
27: 'B',
28: 'RUNNING',
29: 'STOPPED',
30: 'B',
31: 'D',
32: 'B',
33: 'B',
34: 'B',
35: 'RUNNING',
36: 'STOPPED',
37: 'D',
38: 'A',
39: 'A',
40: 'A',
41: 'A',
42: 'A',
43: 'A1',
44: 'A2',
45: 'A3',
46: 'A4',
47: 'B1',
48: 'B2',
49: 'B2',
50: 'B2',
51: 'B',
52: 'STOPPED',
53: 'A',
54: 'A1',
55: 'A2',
56: 'A3',
57: 'A4',
58: 'B1',
59: 'B1'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200,
12: 43200,
13: 5339,
14: 8217,
15: 0,
16: 4147,
17: 1040,
18: 24787,
19: 1500,
20: 14966,
21: 1410,
22: 2499,
23: 1310,
24: 39391,
25: 3570,
26: 17234,
27: 47390,
28: 36068,
29: 270,
30: 6842,
31: 20,
32: 43200,
33: 43200,
34: 2486,
35: 8420,
36: 870,
37: 30,
38: 31394,
39: 43200,
40: 43200,
41: 43200,
42: 36733,
43: 23,
44: 6,
45: 4,
46: 4,
47: 3,
48: 6427,
49: 43200,
50: 620,
51: 0,
52: 4,
53: 41336,
54: 4,
55: 4,
56: 4,
57: 23,
58: 1205,
59: 43200}})
I really appreciate your support!
You can use category_orders to set the order of values:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Day-Shift': {0: 'Day 01-05', 1: 'Day 01-05', 2: 'Day 01-05', 3: 'Day 01-05', 4: 'Night 01-05', 5: 'Day 01-06', 6: 'Day 01-06', 7: 'Day 01-06', 8: 'Day 01-06', 9: 'Day 01-06', 10: 'Day 01-06', 11: 'Night 01-06', 12: 'Day 01-07', 13: 'Night 01-07', 14: 'Night 01-07', 15: 'Night 01-07', 16: 'Night 01-07', 17: 'Night 01-07', 18: 'Night 01-08', 19: 'Night 01-08', 20: 'Night 01-08', 21: 'Night 01-08', 22: 'Day 01-08', 23: 'Day 01-08', 24: 'Day 01-08', 25: 'Night 01-09', 26: 'Night 01-09', 27: 'Night 01-09', 28: 'Day 01-09', 29: 'Day 01-09', 30: 'Day 01-09', 31: 'Day 01-09', 32: 'Day 01-10', 33: 'Night 01-10', 34: 'Day 01-11', 35: 'Day 01-11', 36: 'Day 01-11', 37: 'Day 01-11', 38: 'Day 01-11', 39: 'Night 01-11', 40: 'Day 01-12', 41: 'Night 01-12', 42: 'Day 01-13', 43: 'Day 01-13', 44: 'Day 01-13', 45: 'Day 01-13', 46: 'Day 01-13', 47: 'Day 01-13', 48: 'Day 01-13', 49: 'Night 01-13', 50: 'Day 01-14', 51: 'Day 01-14', 52: 'Day 01-14', 53: 'Day 01-14', 54: 'Day 01-14', 55: 'Day 01-14', 56: 'Day 01-14', 57: 'Day 01-14', 58: 'Day 01-14', 59: 'Night 01-14'}, 'State': {0: 'D', 1: 'STOPPED', 2: 'B', 3: 'A', 4: 'A', 5: 'A', 6: 'A1', 7: 'A2', 8: 'A3', 9: 'A4', 10: 'B1', 11: 'B1', 12: 'B1', 13: 'B1', 14: 'B2', 15: 'STOPPED', 16: 'RUNNING', 17: 'B', 18: 'STOPPED', 19: 'B', 20: 'RUNNING', 21: 'D', 22: 'STOPPED', 23: 'B', 24: 'RUNNING', 25: 'STOPPED', 26: 'RUNNING', 27: 'B', 28: 'RUNNING', 29: 'STOPPED', 30: 'B', 31: 'D', 32: 'B', 33: 'B', 34: 'B', 35: 'RUNNING', 36: 'STOPPED', 37: 'D', 38: 'A', 39: 'A', 40: 'A', 41: 'A', 42: 'A', 43: 'A1', 44: 'A2', 45: 'A3', 46: 'A4', 47: 'B1', 48: 'B2', 49: 'B2', 50: 'B2', 51: 'B', 52: 'STOPPED', 53: 'A', 54: 'A1', 55: 'A2', 56: 'A3', 57: 'A4', 58: 'B1', 59: 'B1'}, 'seconds': {0: 7439, 1: 0, 2: 10, 3: 35751, 4: 43200, 5: 7198, 6: 18, 7: 14, 8: 29301, 9: 6, 10: 6663, 11: 43200, 12: 43200, 13: 5339, 14: 8217, 15: 0, 16: 4147, 17: 1040, 18: 24787, 19: 1500, 20: 14966, 21: 1410, 22: 2499, 23: 1310, 24: 39391, 25: 3570, 26: 17234, 27: 47390, 28: 36068, 29: 270, 30: 6842, 31: 20, 32: 43200, 33: 43200, 34: 2486, 35: 8420, 36: 870, 37: 30, 38: 31394, 39: 43200, 40: 43200, 41: 43200, 42: 36733, 43: 23, 44: 6, 45: 4, 46: 4, 47: 3, 48: 6427, 49: 43200, 50: 620, 51: 0, 52: 4, 53: 41336, 54: 4, 55: 4, 56: 4, 57: 23, 58: 1205, 59: 43200}})
fig = px.bar(df, x="Day-Shift", y="seconds", color="State",
             category_orders={'Day-Shift': df['Day-Shift'].to_list()})
fig.show()
Output:
Setting category_orders = {"Day-Shift": df['Day-Shift'].unique()} will work, but only reliably if your dataset is in the correct order to begin with. Another condition is that you only have data for one unique year. To guarantee the correct order regardless of the original order, and to make it possible to combine data for December 2020 with January 2021, I would suggest that you:
split "Day-Shift" into two separate columns: time of day (tod) and day of month (date),
append the year to your dates, like dfs['date2'] = dfs['date'] + '-2021',
turn 'date2' into datetime using dfs['date2'] = pd.to_datetime(dfs['date2']),
sort your values chronologically, and
retrieve "Day-Shift" in the now-correct order with new_order = list(df['Day-Shift'].unique()), and then
apply the chronologically correct order through category_orders = {'Day-Shift': new_order} (a condensed sketch of these steps follows right below).
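A condensed sketch of just the ordering steps, assuming df already holds the Day-Shift / State / seconds columns (the complete runnable version with the data follows below):
dfs = df['Day-Shift'].str.extract('([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
dfs.columns = ['tod', 'date']
dfs['date2'] = pd.to_datetime(dfs['date'] + '-2021')  # assumes all dates fall in 2021
df = pd.concat([df, dfs], axis=1).sort_values(['date2', 'tod'])
new_order = list(df['Day-Shift'].unique())
fig = px.bar(df, x="Day-Shift", y="seconds", color="State",
             category_orders={'Day-Shift': new_order})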
Plot
Complete code:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({'Day-Shift': {0: 'Day 01-05',
1: 'Day 01-05',
2: 'Day 01-05',
3: 'Day 01-05',
4: 'Night 01-05',
5: 'Day 01-06',
6: 'Day 01-06',
7: 'Day 01-06',
8: 'Day 01-06',
9: 'Day 01-06',
10: 'Day 01-06',
11: 'Night 01-06',
12: 'Day 01-07',
13: 'Night 01-07',
14: 'Night 01-07',
15: 'Night 01-07',
16: 'Night 01-07',
17: 'Night 01-07',
18: 'Night 01-08',
19: 'Night 01-08',
20: 'Night 01-08',
21: 'Night 01-08',
22: 'Day 01-08',
23: 'Day 01-08',
24: 'Day 01-08',
25: 'Night 01-09',
26: 'Night 01-09',
27: 'Night 01-09',
28: 'Day 01-09',
29: 'Day 01-09',
30: 'Day 01-09',
31: 'Day 01-09',
32: 'Day 01-10',
33: 'Night 01-10',
34: 'Day 01-11',
35: 'Day 01-11',
36: 'Day 01-11',
37: 'Day 01-11',
38: 'Day 01-11',
39: 'Night 01-11',
40: 'Day 01-12',
41: 'Night 01-12',
42: 'Day 01-13',
43: 'Day 01-13',
44: 'Day 01-13',
45: 'Day 01-13',
46: 'Day 01-13',
47: 'Day 01-13',
48: 'Day 01-13',
49: 'Night 01-13',
50: 'Day 01-14',
51: 'Day 01-14',
52: 'Day 01-14',
53: 'Day 01-14',
54: 'Day 01-14',
55: 'Day 01-14',
56: 'Day 01-14',
57: 'Day 01-14',
58: 'Day 01-14',
59: 'Night 01-14'},
'State': {0: 'D',
1: 'STOPPED',
2: 'B',
3: 'A',
4: 'A',
5: 'A',
6: 'A1',
7: 'A2',
8: 'A3',
9: 'A4',
10: 'B1',
11: 'B1',
12: 'B1',
13: 'B1',
14: 'B2',
15: 'STOPPED',
16: 'RUNNING',
17: 'B',
18: 'STOPPED',
19: 'B',
20: 'RUNNING',
21: 'D',
22: 'STOPPED',
23: 'B',
24: 'RUNNING',
25: 'STOPPED',
26: 'RUNNING',
27: 'B',
28: 'RUNNING',
29: 'STOPPED',
30: 'B',
31: 'D',
32: 'B',
33: 'B',
34: 'B',
35: 'RUNNING',
36: 'STOPPED',
37: 'D',
38: 'A',
39: 'A',
40: 'A',
41: 'A',
42: 'A',
43: 'A1',
44: 'A2',
45: 'A3',
46: 'A4',
47: 'B1',
48: 'B2',
49: 'B2',
50: 'B2',
51: 'B',
52: 'STOPPED',
53: 'A',
54: 'A1',
55: 'A2',
56: 'A3',
57: 'A4',
58: 'B1',
59: 'B1'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200,
12: 43200,
13: 5339,
14: 8217,
15: 0,
16: 4147,
17: 1040,
18: 24787,
19: 1500,
20: 14966,
21: 1410,
22: 2499,
23: 1310,
24: 39391,
25: 3570,
26: 17234,
27: 47390,
28: 36068,
29: 270,
30: 6842,
31: 20,
32: 43200,
33: 43200,
34: 2486,
35: 8420,
36: 870,
37: 30,
38: 31394,
39: 43200,
40: 43200,
41: 43200,
42: 36733,
43: 23,
44: 6,
45: 4,
46: 4,
47: 3,
48: 6427,
49: 43200,
50: 620,
51: 0,
52: 4,
53: 41336,
54: 4,
55: 4,
56: 4,
57: 23,
58: 1205,
59: 43200}})
dfs = df['Day-Shift'].str.extract('([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
dfs.columns = ['tod', 'date']
dfs['date2'] = dfs['date'] + '-2021'
dfs['date2'] = pd.to_datetime(dfs['date2'])
df = pd.concat([df, dfs], axis = 1)
df = df.sort_values(['date2', 'tod'], ascending = [True, True])
new_order = list(df['Day-Shift'].unique())
# df['Day-Shift'] = pd.Categorical(df['Day-Shift'], categories=new_order, ordered=True)
fig = px.bar(df, x="Day-Shift", y="seconds", color="State",
category_orders = {'Day-Shift': new_order})
fig.update_xaxes(type='category')
fig.show()
I am trying to create an animated 3D scatterplot to represent fish swimming in 3D space. I have 8 fish, and for each fish I have 4 points. I am able to make the graph and animate it; however, the size of the graph changes randomly between time points. I have set the axis mins and maxes, but the distance between them seems to change. What aspect of the plot do I need to alter in order to keep it stable?
This is the plotly express command that I am using:
fig = px.scatter_3d(df,x="x", y="y", z="z",
color="Fish", animation_frame="Frame", hover_data = ["BodyPart"],
range_x=[-0.25,0.25], range_y=[-0.15,0.15], range_z=[-0.15,0.15],
color_continuous_scale = "rainbow")
These two images show the graph one frame apart from one another. The green square shows stats on one point to show that it is not changing drastically:
I am also including this video for a clearer example.
Edited:
Minimum graphing code:
import pandas as pd
import plotly.express as px
data_dict = {'Fish': {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2, 11: 2, 12: 3, 13: 3, 14: 3, 15: 3, 16: 4, 17: 4, 18: 4, 19: 4, 20: 5, 21: 5, 22: 5, 23: 5, 24: 6, 25: 6, 26: 6, 27: 6, 28: 7, 29: 7, 30: 7, 31: 7, 32: 0, 33: 0, 34: 0, 35: 0, 36: 1, 37: 1, 38: 1, 39: 1, 40: 2, 41: 2, 42: 2, 43: 2, 44: 3, 45: 3, 46: 3, 47: 3, 48: 4, 49: 4, 50: 4, 51: 4, 52: 5, 53: 5, 54: 5, 55: 5, 56: 6, 57: 6, 58: 6, 59: 6, 60: 7, 61: 7, 62: 7, 63: 7}, 'BodyPart': {0: 'head', 1: 'midline2', 2: 'tailbase', 3: 'tailtip', 4: 'head', 5: 'midline2', 6: 'tailbase', 7: 'tailtip', 8: 'head', 9: 'midline2', 10: 'tailbase', 11: 'tailtip', 12: 'head', 13: 'midline2', 14: 'tailbase', 15: 'tailtip', 16: 'head', 17: 'midline2', 18: 'tailbase', 19: 'tailtip', 20: 'head', 21: 'midline2', 22: 'tailbase', 23: 'tailtip', 24: 'head', 25: 'midline2', 26: 'tailbase', 27: 'tailtip', 28: 'head', 29: 'midline2', 30: 'tailbase', 31: 'tailtip', 32: 'head', 33: 'midline2', 34: 'tailbase', 35: 'tailtip', 36: 'head', 37: 'midline2', 38: 'tailbase', 39: 'tailtip', 40: 'head', 41: 'midline2', 42: 'tailbase', 43: 'tailtip', 44: 'head', 45: 'midline2', 46: 'tailbase', 47: 'tailtip', 48: 'head', 49: 'midline2', 50: 'tailbase', 51: 'tailtip', 52: 'head', 53: 'midline2', 54: 'tailbase', 55: 'tailtip', 56: 'head', 57: 'midline2', 58: 'tailbase', 59: 'tailtip', 60: 'head', 61: 'midline2', 62: 'tailbase', 63: 'tailtip'}, 'x': {0: 0.121283071, 1: 0.074230535, 2: 0.096664814, 3: 0.063435668, 4: -0.11843468, 5: -0.133776416, 6: -0.12698166, 7: -0.133996648, 8: 0.154499401, 9: 0.099541555, 10: 0.126525899, 11: 0.086448979, 12: -0.001723707, 13: -0.064203743, 14: -0.033163578, 15: -0.077987938, 16: 0.160456072, 17: 0.175340028, 18: 0.178537856, 19: 0.16438273, 20: -0.151890354, 21: -0.099510254, 22: -0.123827166, 23: -0.08765671, 24: 0.052741099, 25: -0.003778201, 26: 0.022010701, 27: -0.014747641, 28: -0.137528989, 29: -0.078632593, 30: -0.106688178, 31: -0.065274018, 32: 0.12128202, 33: 0.074230379, 34: 0.096662597, 35: 0.063435699, 36: -0.118412987, 37: -0.133729238, 38: -0.12729935, 39: -0.134238167, 40: 0.154498856, 41: 0.099541572, 42: 0.126525899, 43: 0.086450612, 44: -0.001719156, 45: -0.064209291, 46: -0.033163578, 47: -0.07796947, 48: 0.157094899, 49: 0.175288008, 50: 0.178383788, 51: 0.1643551, 52: -0.153086656, 53: -0.100645272, 54: -0.125700666, 55: -0.089248865, 56: 0.052731775, 57: -0.003778201, 58: 0.022011924, 59: -0.014749184, 60: -0.138954183, 61: -0.079588201, 62: -0.107413558, 63: -0.06588028}, 'y': {0: -0.018777537, 1: -0.017936625, 2: -0.019031854, 3: -0.018688299, 4: 0.031655295, 5: 0.089278103, 6: 0.060434868, 7: 0.102354879, 8: 0.012448659, 9: 0.005374916, 10: 0.008431857, 11: 0.010384436, 12: 0.007394437, 13: 0.002657548, 14: 0.0047918, 15: 0.004216939, 16: -0.061691249, 17: -0.022574622, 18: -0.044862196, 19: -0.015288812, 20: 0.126254494, 21: 0.125420316, 22: 0.127216595, 23: 0.122366769, 24: -0.018798237, 25: -0.026209512, 26: -0.020654802, 27: -0.030922742, 28: 0.100460973, 29: 0.091726762, 30: 0.095608508, 31: 0.089022071, 32: -0.018930378, 33: -0.018313362, 34: -0.019121954, 35: -0.018839649, 36: 0.030465513, 37: 0.087966041, 38: 0.058855924, 39: 0.100617287, 40: 0.012372615, 41: 0.00530059, 42: 0.008431857, 43: 0.009864426, 44: 0.007169236, 45: 0.002524294, 46: 0.0047918, 47: 0.002813216, 48: -0.061409007, 49: -0.024774863, 50: -0.045825365, 51: -0.017002469, 52: 0.125813664, 53: 0.125533354, 54: 0.126988948, 55: 0.121414741, 56: -0.019165739, 57: -0.026209512, 58: -0.020802186, 59: -0.031842627, 
60: 0.100213119, 61: 0.091677506, 62: 0.095490242, 63: 0.08724155}, 'z': {0: -0.011584533, 1: -0.005671144, 2: -0.004720913, 3: -0.007099159, 4: 0.048633092, 5: 0.044680886, 6: 0.047755313, 7: 0.047602698, 8: 0.005219131, 9: 0.020195691, 10: 0.013766486, 11: 0.019271016, 12: -0.009086866, 13: 0.005213358, 14: -0.003552202, 15: 0.001820855, 16: -0.039992723, 17: 0.041166976, 18: -0.013040119, 19: 0.048827692, 20: 0.044577227, 21: 0.043492943, 22: 0.045104437, 23: 0.0399218, 24: 0.007934858, 25: 0.007980119, 26: 0.010593472, 27: 0.006390279, 28: 0.070277892, 29: 0.066889416, 30: 0.070485941, 31: 0.054907996, 32: -0.011559485, 33: -0.005583401, 34: -0.004725084, 35: -0.007089815, 36: 0.048823811, 37: 0.04574317, 38: 0.047201689, 39: 0.043995531, 40: 0.005234299, 41: 0.020211407, 42: 0.013766486, 43: 0.019405438, 44: -0.009034049, 45: 0.005200504, 46: -0.003552202, 47: 0.002061042, 48: -0.035258171, 49: 0.041424053, 50: -0.013317812, 51: 0.048629332, 52: 0.043972705, 53: 0.042581942, 54: 0.046299595, 55: 0.040028712, 56: 0.007931264, 57: 0.007980119, 58: 0.010624531, 59: 0.006616644, 60: 0.068992196, 61: 0.064455916, 62: 0.07226277, 63: 0.056393304}, 'Frame': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0, 25: 0, 26: 0, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 1, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1, 43: 1, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1, 50: 1, 51: 1, 52: 1, 53: 1, 54: 1, 55: 1, 56: 1, 57: 1, 58: 1, 59: 1, 60: 1, 61: 1, 62: 1, 63: 1}}
df = pd.DataFrame(data_dict)
fig = px.scatter_3d(df,x="x", y="y", z="z", color="Fish", animation_frame="Frame", hover_data = ["BodyPart"],
range_x=[-0.25,0.25], range_y=[-0.15,0.15], range_z=[-0.15,0.15], color_continuous_scale = "rainbow")
fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))
fig.show()
This seems related to the aspectratio in fig.layout.scene:
layout.Scene({
'aspectmode': 'auto',
'aspectratio': {'x': 1.7359689116422856, 'y': 0.9924641251101735, 'z':0.5804211635071164},
If you manually set x, y and z in the dict above to something specific, the flinching of the figure between animation frames seems to disappear.
I've tried:
fig.layout.scene.aspectratio = {'x':1, 'y':1, 'z':1}
fig.show()
And the results are promising. Give it a go on your end and let me know how it works out for you.
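If you'd rather not hard-code the ratios, another option that I'd expect to behave similarly (I haven't tested it against your exact data) is to force a cubic scene via aspectmode:
# keep the 3D scene a fixed cube regardless of the data extents in each frame
fig.update_layout(scene_aspectmode='cube')
fig.show()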
It also seems, as you've already discovered, to work best in tandem with setting defined ranges via range_x, range_y, range_z. Since your data sample is a bit limited, I've been messing around with px.data.gapminder().
Plot
Complete code
import plotly.express as px
df = px.data.gapminder()
# df
fig = px.scatter_3d(df, x = 'pop', y='lifeExp', z = 'gdpPercap', animation_frame='year',
range_x=[int(df['pop'].min()*0.5),int(df['pop'].max()*1.5)],
range_y=[int(df.lifeExp.min()*0.5),int(df.lifeExp.max()*1.5)],
range_z=[int(df['gdpPercap'].min()*0.5),int(df['gdpPercap'].max()*1.5)]
)
fig.layout.scene.aspectratio = {'x':1, 'y':1, 'z':1}
fig.show()
I know the question is worded horribly but I can't think of how to word it any better.
I have two dataframes, one containing the original data:
{2016: {1: 88698.0,
2: 86725.0,
3: 80426.0,
4: 74888.0,
5: 71659.0,
6: 67431.0,
7: 63613.0,
8: 60174.0,
9: 59495.0,
10: 59487.0,
11: 59118.0,
12: 59542.0,
13: 61170.0,
14: 63785.0,
15: 65038.0,
16: 67441.0,
17: 68188.0,
18: 69303.0,
19: 70224.0,
20: 70163.0,
21: 71522.0,
22: 73707.0,
23: 75002.0,
24: 76487.0,
25: 78806.0,
26: 81444.0,
27: 84114.0,
28: 84274.0,
29: 86701.0,
30: 87051.0,
31: 89298.0,
32: 91461.0,
33: 93937.0,
34: 96308.0,
35: 96803.0,
36: 98718.0,
37: 99343.0,
38: 100494.0,
39: 101260.0,
40: 101153.0,
41: 99668.0,
42: 97139.0,
43: 97203.0,
44: 95940.0,
45: 96969.0,
46: 98608.0,
47: 96332.0,
48: 94736.0,
49: 90970.0,
50: 87938.0,
51: 82082.0,
52: 79481.0,
53: nan},
2017: {1: 75212.0,
2: 68024.0,
3: 64087.0,
4: 58824.0,
5: 52226.0,
6: 50006.0,
7: 46975.0,
8: 46794.0,
9: 42855.0,
10: 42021.0,
11: 41884.0,
12: 40281.0,
13: 39117.0,
14: 37985.0,
15: 37120.0,
16: 36968.0,
17: 36702.0,
18: 38486.0,
19: 39051.0,
20: 40589.0,
21: 44099.0,
22: 47631.0,
23: 49984.0,
24: 51726.0,
25: 55653.0,
26: 57663.0,
27: 59409.0,
28: 62820.0,
29: 63324.0,
30: 64788.0,
31: 64693.0,
32: 66452.0,
33: 69349.0,
34: 70697.0,
35: 76470.0,
36: 78805.0,
37: 77624.0,
38: 75268.0,
39: 74695.0,
40: 75892.0,
41: 75930.0,
42: 74942.0,
43: 75824.0,
44: 74628.0,
45: 72058.0,
46: 71113.0,
47: 70602.0,
48: 71898.0,
49: 72186.0,
50: 68760.0,
51: 65931.0,
52: 65441.0,
53: nan},
2018: {1: 59224.0,
2: 55546.0,
3: 51355.0,
4: 50126.0,
5: 45962.0,
6: 42438.0,
7: 39840.0,
8: 39370.0,
9: 37844.0,
10: 35470.0,
11: 33731.0,
12: 32671.0,
13: 33416.0,
14: 33039.0,
15: 33260.0,
16: 32937.0,
17: 33599.0,
18: 35737.0,
19: 37453.0,
20: 38314.0,
21: 40159.0,
22: 44152.0,
23: 47971.0,
24: 51381.0,
25: 55825.0,
26: 58905.0,
27: 61242.0,
28: 62724.0,
29: 61766.0,
30: 63514.0,
31: 63533.0,
32: 66825.0,
33: 65732.0,
34: 68240.0,
35: 70572.0,
36: 71835.0,
37: 72966.0,
38: 74556.0,
39: 76592.0,
40: 78223.0,
41: 79895.0,
42: 79209.0,
43: 79793.0,
44: 80800.0,
45: 79795.0,
46: 78203.0,
47: 77027.0,
48: 75356.0,
49: 72124.0,
50: 68584.0,
51: 67402.0,
52: 65576.0,
53: nan},
2019: {1: 63624.0,
2: 62046.0,
3: 58091.0,
4: 54316.0,
5: 51765.0,
6: 52033.0,
7: 48140.0,
8: 46787.0,
9: 44772.0,
10: 43806.0,
11: 44905.0,
12: 45564.0,
13: 46906.0,
14: 48134.0,
15: 50554.0,
16: 51797.0,
17: 53271.0,
18: 54197.0,
19: 57114.0,
20: 60312.0,
21: 60509.0,
22: 63388.0,
23: 66265.0,
24: 69530.0,
25: 70905.0,
26: 72313.0,
27: 72288.0,
28: 73153.0,
29: 74967.0,
30: 76430.0,
31: 79261.0,
32: 82623.0,
33: 86492.0,
34: 90041.0,
35: 92856.0,
36: 93701.0,
37: 96520.0,
38: 95368.0,
39: 96264.0,
40: 96355.0,
41: 95794.0,
42: 95282.0,
43: 94817.0,
44: 95536.0,
45: 92914.0,
46: 89160.0,
47: 88321.0,
48: 86443.0,
49: 88099.0,
50: 85469.0,
51: 82634.0,
52: 82188.0,
53: nan},
2020: {1: 82784.0,
2: 81804.0,
3: 80581.0,
4: 77236.0,
5: 77976.0,
6: 71822.0,
7: 68726.0,
8: 68132.0,
9: 64557.0,
10: 61529.0,
11: 61379.0,
12: 59424.0,
13: 59134.0,
14: 59027.0,
15: 56780.0,
16: 57442.0,
17: 56835.0,
18: 59376.0,
19: 61625.0,
20: 62697.0,
21: 64240.0,
22: 67329.0,
23: 66282.0,
24: 68967.0,
25: 71331.0,
26: 74599.0,
27: 76823.0,
28: 80348.0,
29: 82388.0,
30: 84404.0,
31: 86713.0,
32: 89336.0,
33: 89295.0,
34: 90833.0,
35: 95222.0,
36: 97380.0,
37: 96141.0,
38: 97890.0,
39: 101959.0,
40: 101842.0,
41: 99897.0,
42: 98325.0,
43: 98391.0,
44: 95828.0,
45: 94889.0,
46: 92887.0,
47: 92562.0,
48: 91718.0,
49: 87637.0,
50: 83927.0,
51: 81596.0,
52: 75146.0,
53: 72777.0},
2021: {1: 66048.0,
2: 59818.0,
3: 57610.0,
4: 56053.0,
5: 51545.0,
6: 48649.0,
7: 43491.0,
8: 41246.0,
9: 41199.0,
10: 41029.0,
11: 41269.0,
12: nan,
13: nan,
14: nan,
15: nan,
16: nan,
17: nan,
18: nan,
19: nan,
20: nan,
21: nan,
22: nan,
23: nan,
24: nan,
25: nan,
26: nan,
27: nan,
28: nan,
29: nan,
30: nan,
31: nan,
32: nan,
33: nan,
34: nan,
35: nan,
36: nan,
37: nan,
38: nan,
39: nan,
40: nan,
41: nan,
42: nan,
43: nan,
44: nan,
45: nan,
46: nan,
47: nan,
48: nan,
49: nan,
50: nan,
51: nan,
52: nan,
53: nan}}
and then one which is just the first dataframe.diff():
{2016: {1: nan,
2: -1973.0,
3: -6299.0,
4: -5538.0,
5: -3229.0,
6: -4228.0,
7: -3818.0,
8: -3439.0,
9: -679.0,
10: -8.0,
11: -369.0,
12: 424.0,
13: 1628.0,
14: 2615.0,
15: 1253.0,
16: 2403.0,
17: 747.0,
18: 1115.0,
19: 921.0,
20: -61.0,
21: 1359.0,
22: 2185.0,
23: 1295.0,
24: 1485.0,
25: 2319.0,
26: 2638.0,
27: 2670.0,
28: 160.0,
29: 2427.0,
30: 350.0,
31: 2247.0,
32: 2163.0,
33: 2476.0,
34: 2371.0,
35: 495.0,
36: 1915.0,
37: 625.0,
38: 1151.0,
39: 766.0,
40: -107.0,
41: -1485.0,
42: -2529.0,
43: 64.0,
44: -1263.0,
45: 1029.0,
46: 1639.0,
47: -2276.0,
48: -1596.0,
49: -3766.0,
50: -3032.0,
51: -5856.0,
52: -2601.0,
53: nan},
2017: {1: nan,
2: -7188.0,
3: -3937.0,
4: -5263.0,
5: -6598.0,
6: -2220.0,
7: -3031.0,
8: -181.0,
9: -3939.0,
10: -834.0,
11: -137.0,
12: -1603.0,
13: -1164.0,
14: -1132.0,
15: -865.0,
16: -152.0,
17: -266.0,
18: 1784.0,
19: 565.0,
20: 1538.0,
21: 3510.0,
22: 3532.0,
23: 2353.0,
24: 1742.0,
25: 3927.0,
26: 2010.0,
27: 1746.0,
28: 3411.0,
29: 504.0,
30: 1464.0,
31: -95.0,
32: 1759.0,
33: 2897.0,
34: 1348.0,
35: 5773.0,
36: 2335.0,
37: -1181.0,
38: -2356.0,
39: -573.0,
40: 1197.0,
41: 38.0,
42: -988.0,
43: 882.0,
44: -1196.0,
45: -2570.0,
46: -945.0,
47: -511.0,
48: 1296.0,
49: 288.0,
50: -3426.0,
51: -2829.0,
52: -490.0,
53: nan},
2018: {1: nan,
2: -3678.0,
3: -4191.0,
4: -1229.0,
5: -4164.0,
6: -3524.0,
7: -2598.0,
8: -470.0,
9: -1526.0,
10: -2374.0,
11: -1739.0,
12: -1060.0,
13: 745.0,
14: -377.0,
15: 221.0,
16: -323.0,
17: 662.0,
18: 2138.0,
19: 1716.0,
20: 861.0,
21: 1845.0,
22: 3993.0,
23: 3819.0,
24: 3410.0,
25: 4444.0,
26: 3080.0,
27: 2337.0,
28: 1482.0,
29: -958.0,
30: 1748.0,
31: 19.0,
32: 3292.0,
33: -1093.0,
34: 2508.0,
35: 2332.0,
36: 1263.0,
37: 1131.0,
38: 1590.0,
39: 2036.0,
40: 1631.0,
41: 1672.0,
42: -686.0,
43: 584.0,
44: 1007.0,
45: -1005.0,
46: -1592.0,
47: -1176.0,
48: -1671.0,
49: -3232.0,
50: -3540.0,
51: -1182.0,
52: -1826.0,
53: nan},
2019: {1: nan,
2: -1578.0,
3: -3955.0,
4: -3775.0,
5: -2551.0,
6: 268.0,
7: -3893.0,
8: -1353.0,
9: -2015.0,
10: -966.0,
11: 1099.0,
12: 659.0,
13: 1342.0,
14: 1228.0,
15: 2420.0,
16: 1243.0,
17: 1474.0,
18: 926.0,
19: 2917.0,
20: 3198.0,
21: 197.0,
22: 2879.0,
23: 2877.0,
24: 3265.0,
25: 1375.0,
26: 1408.0,
27: -25.0,
28: 865.0,
29: 1814.0,
30: 1463.0,
31: 2831.0,
32: 3362.0,
33: 3869.0,
34: 3549.0,
35: 2815.0,
36: 845.0,
37: 2819.0,
38: -1152.0,
39: 896.0,
40: 91.0,
41: -561.0,
42: -512.0,
43: -465.0,
44: 719.0,
45: -2622.0,
46: -3754.0,
47: -839.0,
48: -1878.0,
49: 1656.0,
50: -2630.0,
51: -2835.0,
52: -446.0,
53: nan},
2020: {1: nan,
2: -980.0,
3: -1223.0,
4: -3345.0,
5: 740.0,
6: -6154.0,
7: -3096.0,
8: -594.0,
9: -3575.0,
10: -3028.0,
11: -150.0,
12: -1955.0,
13: -290.0,
14: -107.0,
15: -2247.0,
16: 662.0,
17: -607.0,
18: 2541.0,
19: 2249.0,
20: 1072.0,
21: 1543.0,
22: 3089.0,
23: -1047.0,
24: 2685.0,
25: 2364.0,
26: 3268.0,
27: 2224.0,
28: 3525.0,
29: 2040.0,
30: 2016.0,
31: 2309.0,
32: 2623.0,
33: -41.0,
34: 1538.0,
35: 4389.0,
36: 2158.0,
37: -1239.0,
38: 1749.0,
39: 4069.0,
40: -117.0,
41: -1945.0,
42: -1572.0,
43: 66.0,
44: -2563.0,
45: -939.0,
46: -2002.0,
47: -325.0,
48: -844.0,
49: -4081.0,
50: -3710.0,
51: -2331.0,
52: -6450.0,
53: -2369.0}}
What I am trying to do is calculate, for all columns, in any row where 2021 is NaN, the next row's value by taking the value from the normal dataframe and adding the next value down from the .diff() dataframe. So, for example, 2020 for week 12 would be 61379 (row 11 in the normal df) + (-1955.0) (row 12 from the .diff() df) = 59424.
TIA
Same logic as before:
out = df1.mask(df1[2021].notna(), df1 + df2.shift(-1), axis=0).fillna(df1[[2021]])
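A tiny toy example (made-up numbers, only to illustrate the mask/shift mechanics, not your actual frames):
import pandas as pd
import numpy as np

df1 = pd.DataFrame({2020: [10.0, 12.0, 15.0],
                    2021: [5.0, np.nan, np.nan]})
df2 = df1.diff()  # row-wise differences

# rows where 2021 is still known get replaced by "current value + next diff";
# remaining NaNs in the 2021 column then fall back to the original 2021 values
out = df1.mask(df1[2021].notna(), df1 + df2.shift(-1), axis=0).fillna(df1[[2021]])
print(out)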
I have a df with three columns (Day-Shift, State, seconds).
Day-Shift State seconds
Day 01-05 A 7439
Day 01-05 STOPPED 0
Day 01-05 B 10
Day 01-05 C 35751
Night 01-05 C 43200
Day 01-06 STOPPED 7198
Day 01-06 F 18
Day 01-06 A 14
Day 01-06 A 29301
Day 01-06 STOPPED 6
Day 01-06 A 6663
Night 01-06 A 43200
My code to build a stacked bar chart is:
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
But it returns this stacked bar chart.
The issue here is that the Day-Shift order is changed and the corresponding seconds do not appear in this chart. I cannot identify the error. I really appreciate your support!
DF:
{'Day-Shift': {0: 'Day 01-05',
1: 'Day 01-05',
2: 'Day 01-05',
3: 'Day 01-05',
4: 'Night 01-05',
5: 'Day 01-06',
6: 'Day 01-06',
7: 'Day 01-06',
8: 'Day 01-06',
9: 'Day 01-06',
10: 'Day 01-06',
11: 'Night 01-06',
12: 'Day 01-07',
13: 'Night 01-07',
14: 'Night 01-07',
15: 'Night 01-07',
16: 'Night 01-07',
17: 'Night 01-07',
18: 'Night 01-08',
19: 'Night 01-08',
20: 'Night 01-08',
21: 'Night 01-08',
22: 'Day 01-08',
23: 'Day 01-08',
24: 'Day 01-08',
25: 'Night 01-09',
26: 'Night 01-09',
27: 'Night 01-09',
28: 'Day 01-09',
29: 'Day 01-09',
30: 'Day 01-09',
31: 'Day 01-09',
32: 'Day 01-10',
33: 'Night 01-10',
34: 'Day 01-11',
35: 'Day 01-11',
36: 'Day 01-11',
37: 'Day 01-11',
38: 'Day 01-11',
39: 'Night 01-11',
40: 'Day 01-12',
41: 'Night 01-12',
42: 'Day 01-13',
43: 'Day 01-13',
44: 'Day 01-13',
45: 'Day 01-13',
46: 'Day 01-13',
47: 'Day 01-13',
48: 'Day 01-13',
49: 'Night 01-13',
50: 'Day 01-14',
51: 'Day 01-14',
52: 'Day 01-14',
53: 'Day 01-14',
54: 'Day 01-14',
55: 'Day 01-14',
56: 'Day 01-14',
57: 'Day 01-14',
58: 'Day 01-14',
59: 'Night 01-14'},
'State': {0: 'D',
1: 'STOPPED',
2: 'B',
3: 'A',
4: 'A',
5: 'A',
6: 'A1',
7: 'A2',
8: 'A3',
9: 'A4',
10: 'B1',
11: 'B1',
12: 'B1',
13: 'B1',
14: 'B2',
15: 'STOPPED',
16: 'RUNNING',
17: 'B',
18: 'STOPPED',
19: 'B',
20: 'RUNNING',
21: 'D',
22: 'STOPPED',
23: 'B',
24: 'RUNNING',
25: 'STOPPED',
26: 'RUNNING',
27: 'B',
28: 'RUNNING',
29: 'STOPPED',
30: 'B',
31: 'D',
32: 'B',
33: 'B',
34: 'B',
35: 'RUNNING',
36: 'STOPPED',
37: 'D',
38: 'A',
39: 'A',
40: 'A',
41: 'A',
42: 'A',
43: 'A1',
44: 'A2',
45: 'A3',
46: 'A4',
47: 'B1',
48: 'B2',
49: 'B2',
50: 'B2',
51: 'B',
52: 'STOPPED',
53: 'A',
54: 'A1',
55: 'A2',
56: 'A3',
57: 'A4',
58: 'B1',
59: 'B1'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200,
12: 43200,
13: 5339,
14: 8217,
15: 0,
16: 4147,
17: 1040,
18: 24787,
19: 1500,
20: 14966,
21: 1410,
22: 2499,
23: 1310,
24: 39391,
25: 3570,
26: 17234,
27: 47390,
28: 36068,
29: 270,
30: 6842,
31: 20,
32: 43200,
33: 43200,
34: 2486,
35: 8420,
36: 870,
37: 30,
38: 31394,
39: 43200,
40: 43200,
41: 43200,
42: 36733,
43: 23,
44: 6,
45: 4,
46: 4,
47: 3,
48: 6427,
49: 43200,
50: 620,
51: 0,
52: 4,
53: 41336,
54: 4,
55: 4,
56: 4,
57: 23,
58: 1205,
59: 43200}}
Your snippet seems to be running fine on my end:
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
And produces this plot:
So then it's either an issue with your version, or, more likely, your data. The first thing you should do is make sure that none of your data has been turned into an index. You can easily reset your index using df = df.reset_index(). In the snippet below you'll see that I've used your identical dataset as a dict with no index.
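For example, something like this (only needed if 'Day-Shift' really did end up in the index, e.g. after a groupby):
# bring any index level back as a regular column before plotting
df = df.reset_index()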
Edit: xaxis formatting
In the figure above, plotly interprets your xaxis as time values. If you'd like to prevent this, just include fig.update_xaxes(type='category') to get this:
Complete code:
import pandas as pd
import plotly.express as px
# df = pd.read_clipboard(sep='\\s+').reset_index()
# df.to_dict()
df = pd.DataFrame({'index': {0: 'Day',
1: 'Day',
2: 'Day',
3: 'Day',
4: 'Night',
5: 'Day',
6: 'Day',
7: 'Day',
8: 'Day',
9: 'Day',
10: 'Day',
11: 'Night'},
'Day-Shift': {0: '01-05',
1: '01-05',
2: '01-05',
3: '01-05',
4: '01-05',
5: '01-06',
6: '01-06',
7: '01-06',
8: '01-06',
9: '01-06',
10: '01-06',
11: '01-06'},
'State': {0: 'A',
1: 'STOPPED',
2: 'B',
3: 'C',
4: 'C',
5: 'STOPPED',
6: 'F',
7: 'A',
8: 'A',
9: 'STOPPED',
10: 'A',
11: 'A'},
'seconds': {0: 7439,
1: 0,
2: 10,
3: 35751,
4: 43200,
5: 7198,
6: 18,
7: 14,
8: 29301,
9: 6,
10: 6663,
11: 43200}})
import plotly.express as px
fig = px.bar(df, x="Day-Shift", y="seconds", color="State")
fig.show()
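And if you want the categorical x-axis from the edit above, just add the axis update before showing the figure:
# treat the 'Day-Shift' labels as plain categories instead of dates
fig.update_xaxes(type='category')
fig.show()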
Started getting confused with this one. I have a large Fact Invoice Header table. I took the original dataframe and used a groupby to split it up based upon one column. The output was a list of dataframes:
list_of_dfs = []
for _, g in df.groupby(df['Project State Name']):
    list_of_dfs.append(g)
list_of_dfs
Then I used another for loop to loop through the list of dataframes and perform one pivot table aggregation.
for each_state_df in list_of_dfs:
    columns_to_index_by = ['Project Issue', 'Project Secondary Issue', 'Project Client Name']
    # Aggregating to the Project Level
    table_for_pivots = pd.pivot_table(each_state_df, index=['FY Year', 'Project Issue'],
                                      values=['Project Key', 'Total Net Amount',
                                              'Project Total Resolution Amount', 'Project Budgeted Amount'],
                                      aggfunc={'Project Key': lambda x: len(x.unique()),
                                               'Total Net Amount': np.sum,
                                               'Project Total Resolution Amount': np.mean,
                                               'Project Budgeted Amount': np.mean},
                                      fill_value=np.mean)
    print(table_for_pivots)
My question is: how can I use another for loop to replace the second element in the pivot table index with each value in the variable columns_to_index_by? The output would be 3 pivot tables with index=['FY Year', 'Project Issue'], index=['FY Year', 'Project Secondary Issue'], and index=['FY Year', 'Project Client Name']. Thanks all!
Link to download a sample df data is here:
https://ufile.io/iufv9nma
Use list comprehension and iterate through a zip of the index you want to set for each group:
import pandas as pd
import numpy as np
from pandas import Timestamp
from numpy import nan
d = {'Total Net Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Project Total Resolution Amount': {2: 35000, 41: 27000, 17: 40000, 32: 27000, 26: 27000, 11: 40000, 23: 27000, 35: 27000, 29: 27000, 59: 27000, 38: 27000, 0: 30000}, 'Invoice Header Key': {2: 1229422, 41: 984803, 17: 1270731, 32: 938069, 26: 911535, 11: 1247443, 23: 902150, 35: 943737, 29: 918888, 59: 1071541, 38: 965091, 0: 1279581}, 'Project Key': {2: 259661, 41: 194517, 17: 259188, 32: 194517, 26: 194517, 11: 259188, 23: 194517, 35: 194517, 29: 194517, 59: 194517, 38: 194517, 0: 263736}, 'Project Secondary Issue': {2: 2, 41: 4, 17: 0, 32: 3, 26: 3, 11: 0, 23: 4, 35: 4, 29: 4, 59: 4, 38: 3, 0: 4}, 'Organization Key': {2: 16029, 41: 22638, 17: 24230, 32: 22638, 26: 22638, 11: 24230, 23: 22638, 35: 22638, 29: 22638, 59: 22638, 38: 22638, 0: 4532}, 'Project Budgeted Amount': {2: 42735.0, 41: 32500.0, 17: 26000.0, 32: 32500.0, 26: 32500.0, 11: 26000.0, 23: 32500.0, 35: 32500.0, 29: 32500.0, 59: 32500.0, 38: 32500.0, 0: nan}, 'Project State Name': {2: 0, 41: 1, 17: 2, 32: 1, 26: 1, 11: 2, 23: 1, 35: 1, 29: 1, 59: 1, 38: 1, 0: 1}, 'Project Issue': {2: 0, 41: 2, 17: 1, 32: 2, 26: 2, 11: 1, 23: 2, 35: 2, 29: 2, 59: 2, 38: 2, 0: 1}, 'Project Number': {2: 2, 41: 0, 17: 1, 32: 0, 26: 0, 11: 1, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 3}, 'Project Client Name': {2: 1, 41: 0, 17: 0, 32: 0, 26: 0, 11: 0, 23: 0, 35: 0, 29: 0, 59: 0, 38: 0, 0: 1}, 'Paid Date Year Month': {2: 13, 41: 7, 17: 15, 32: 4, 26: 2, 11: 14, 23: 1, 35: 5, 29: 3, 59: 12, 38: 6, 0: 16}, 'FY Year': {2: 2, 41: 0, 17: 2, 32: 0, 26: 0, 11: 2, 23: 0, 35: 0, 29: 0, 59: 1, 38: 0, 0: 2}, 'Invoice Paid Date': {2: Timestamp('2019-09-10 00:00:00'), 41: Timestamp('2017-12-20 00:00:00'), 17: Timestamp('2019-11-25 00:00:00'), 32: Timestamp('2017-08-31 00:00:00'), 26: Timestamp('2017-06-14 00:00:00'), 11: Timestamp('2019-10-08 00:00:00'), 23: Timestamp('2017-05-30 00:00:00'), 35: Timestamp('2017-09-07 00:00:00'), 29: Timestamp('2017-07-10 00:00:00'), 59: Timestamp('2018-10-03 00:00:00'), 38: Timestamp('2017-11-03 00:00:00'), 0: Timestamp('2019-12-12 00:00:00')}, 'Invoice Paid Date Key': {2: 20190910, 41: 20171220, 17: 20191125, 32: 20170831, 26: 20170614, 11: 20191008, 23: 20170530, 35: 20170907, 29: 20170710, 59: 20181003, 38: 20171103, 0: 20191212}, 'Count Project Secondary Issue': {2: 3, 41: 3, 17: 3, 32: 3, 26: 3, 11: 3, 23: 3, 35: 3, 29: 3, 59: 3, 38: 3, 0: 2}, 'Total Net Amount By Count Project Secondary Issue': {2: 224.0, 41: 425.9666666666667, 17: 90.0, 32: 281.7666666666667, 26: 276.2066666666666, 11: 244.5, 23: 580.6, 35: 84.71551666666666, 29: 81.66666666666667, 59: 71.66666666666667, 38: 205.8, 0: 530.75}, 'Total Net Invoice Amount': {2: 672.0, 41: 1277.9, 17: 270.0, 32: 845.3, 26: 828.62, 11: 733.5, 23: 1741.8, 35: 254.14655, 29: 245.0, 59: 215.0, 38: 617.4, 0: 1061.5}, 'Total Project Invoice Amount': {2: 7176.52, 41: 10110.98655, 17: 1678.5, 32: 10110.98655, 26: 10110.98655, 11: 1678.5, 23: 10110.98655, 35: 10110.98655, 29: 10110.98655, 59: 10110.98655, 38: 10110.98655, 0: 1061.5}, 'Invoice Dollar Percent of Project': {2: 0.09363869953682286, 41: 0.1263872712796755, 17: 0.160857908847185, 32: 0.08360212881501655, 26: 0.08195243816242638, 11: 0.4369973190348526, 23: 0.1722680562758735, 35: 0.02513568272919916, 29: 0.02423106773888449, 59: 0.02126399821983741, 38: 0.06106229070198891, 0: 1.0}}
df = pd.DataFrame(d)
# list comprehension with groupby
group = [g for _, g in df.groupby('Project State Name')]
# create a list of the indices you want to use in the pivot
idx = [['FY Year', 'Project Issue'],
['FY Year', 'Project Secondary Issue'],
['FY Year', 'Project Client Name']]
# create a list of columns to add to the value param in pivot
values = ["Project Key", 'Total Net Amount',
"Project Total Resolution Amount", 'Project Budgeted Amount']
# use your current pivot and iterate through zip(idx, group)
dfs = [pd.pivot_table(df, index=i, values=values,
aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
"Project Total Resolution Amount": np.mean,
'Project Budgeted Amount': np.mean},
fill_value=np.mean) for i,df in zip(idx, group)]
dict comprehension
I did not know what you wanted the key to be, so I just selected the second value from idx. You can retrieve each dataframe from the dict with dfs['Project Issue'].
dfs = {i[1]: pd.pivot_table(df, index=i, values=values,
aggfunc= {"Project Key": lambda x: len(x.unique()), 'Total Net Amount': np.sum,
"Project Total Resolution Amount": np.mean,
'Project Budgeted Amount': np.mean},
fill_value=np.mean) for i,df in zip(idx, group)}
With the following data
ex = {'id': {0: 12,
1: 7745,
2: 14190,
3: 12,
4: 7745,
5: 14190,
6: 12,
7: 7745,
8: 14190,
9: 12,
10: 7745,
11: 14190,
12: 12,
13: 7745,
14: 14190,
15: 12,
16: 7745,
17: 14190,
18: 12,
19: 7745,
20: 14190,
21: 12,
22: 7745,
23: 14190,
24: 12,
25: 7745,
26: 14190,
27: 12,
28: 7745,
29: 14190,
30: 12,
31: 7745,
32: 14190,
33: 12,
34: 7745,
35: 14190,
36: 12,
37: 7745,
38: 14190,
39: 12,
40: 7745,
41: 14190,
42: 12,
43: 7745,
44: 14190,
45: 12,
46: 7745,
47: 14190,
48: 12,
49: 7745,
50: 14190,
51: 12,
52: 7745,
53: 14190,
54: 12,
55: 7745,
56: 14190,
57: 12,
58: 7745,
59: 14190},
'id2': {0: 0,
1: 0,
2: 0,
3: 1,
4: 1,
5: 1,
6: 2,
7: 2,
8: 2,
9: 3,
10: 3,
11: 3,
12: 4,
13: 4,
14: 4,
15: 5,
16: 5,
17: 5,
18: 6,
19: 6,
20: 6,
21: 7,
22: 7,
23: 7,
24: 8,
25: 8,
26: 8,
27: 9,
28: 9,
29: 9,
30: 10,
31: 10,
32: 10,
33: 11,
34: 11,
35: 11,
36: 12,
37: 12,
38: 12,
39: 13,
40: 13,
41: 13,
42: 14,
43: 14,
44: 14,
45: 15,
46: 15,
47: 15,
48: 16,
49: 16,
50: 16,
51: 17,
52: 17,
53: 17,
54: 18,
55: 18,
56: 18,
57: 19,
58: 19,
59: 19},
'var1': {0: 60.57423361566744,
1: 58.044840216178606,
2: 51.29251700680272,
3: 60.674455993946225,
4: 58.21241610641044,
5: 51.31371599732972,
6: 60.77849708396439,
7: 58.369465051911966,
8: 51.33611104900928,
9: 60.88625886689413,
10: 58.516561288952005,
11: 51.35969457224551,
12: 60.99764332390786,
13: 58.65427905379941,
14: 51.38445897744256,
15: 61.112552436177864,
16: 58.78319258272294,
17: 51.4103966750045,
18: 61.230888184876434,
19: 58.90387611199144,
20: 51.43750007533549,
21: 61.35255255117588,
22: 59.01690387787371,
23: 51.465761588839634,
24: 61.4774475162485,
25: 59.122850116638496,
26: 51.49517362592107,
27: 61.60547506126665,
28: 59.222289064554694,
29: 51.52572859698392,
30: 61.736537167402595,
31: 59.31579495789107,
32: 51.55741891243228,
33: 61.870535815828646,
34: 59.40394203291643,
35: 51.5902369826703,
36: 62.00737298771711,
37: 59.48730452589962,
38: 51.624175218102074,
39: 62.14695066424032,
40: 59.56645667310938,
41: 51.659226029131744,
42: 62.289170826570604,
43: 59.64197271081458,
44: 51.69538182616348,
45: 62.43393545588018,
46: 59.714426875284005,
47: 51.732635019601275,
48: 62.58114653334144,
49: 59.784393402786435,
50: 51.770978019849345,
51: 62.73070604012664,
52: 59.85244652959075,
53: 51.81040323731179,
54: 62.88251595740815,
55: 59.919160491965705,
56: 51.85090308239276,
57: 63.03647826635822,
58: 59.98510952618012,
59: 51.892469965496346},
'var2': {0: 26.46961208868258,
1: 25.02784060286349,
2: 67.01680672268907,
3: 26.362852053047188,
4: 25.16250452630659,
5: 67.20428262498875,
6: 26.257170717779545,
7: 25.25801378937902,
8: 67.37902432665504,
9: 26.15255739707393,
10: 25.315898046471766,
11: 67.5412758313266,
12: 26.04900140512476,
13: 25.33768695197584,
14: 67.69128114264197,
15: 25.946492056126274,
16: 25.32491016028206,
17: 67.82928426423972,
18: 25.84501866427287,
19: 25.27909732578149,
20: 67.95552919975847,
21: 25.74457054375889,
22: 25.201778102865052,
23: 68.07025995283685,
24: 25.64513700877862,
25: 25.094482145923664,
26: 68.17372052711335,
27: 25.546707373526395,
28: 24.958739109348315,
29: 68.26615492622662,
30: 25.449270952196603,
31: 24.796078647529914,
32: 68.34780715381525,
33: 25.35281705898356,
34: 24.608030414859442,
35: 68.41892121351782,
36: 25.257335008081554,
37: 24.396124065727854,
38: 68.47974110897286,
39: 25.162814113684988,
40: 24.16188925452609,
41: 68.53051084381906,
42: 25.069243689988213,
43: 23.906855635645105,
44: 68.57147442169496,
45: 24.976613051185442,
46: 23.63255286347585,
47: 68.60287584623913,
48: 24.88491151147112,
49: 23.340510592409263,
50: 68.62495912109016,
51: 24.79412838503955,
52: 23.03225847683625,
53: 68.63796824988664,
54: 24.704252986085066,
55: 22.70932617114788,
56: 68.64214723626722,
57: 24.615274628802,
58: 22.373243329735022,
59: 68.6377400838704}}
import pandas as pd

ex = pd.DataFrame(ex).set_index(['id', 'id2'])
I'd like to calculate, for each value in id, the average of the next n values of var1, where "next" is defined by id2. I know that pd.Series.expanding exists and I could do something like df.groupby('id')['var1'].transform(lambda x: x.expanding().mean()), but this would involve all 20 elements of each id, when I want to limit the average to the next n elements (let's say n = 5). How can it be done?
This should do the trick:
out = (ex.sort_index(ascending=False)
         .groupby("id")["var1"]
         .rolling(6, min_periods=1)
         .mean()
         .reset_index(0, drop=True))
print(out)
Output:
id id2
12 19 63.036478
18 62.959497
17 62.883233
16 62.807712
15 62.732956
14 62.658992
13 62.510738
12 62.364880
11 62.221519
10 62.080750
9 61.942674
8 61.807387
7 61.674987
6 61.545573
5 61.419242
4 61.296093
3 61.176224
2 61.059732
1 60.946716
0 60.837274
7745 19 59.985110
18 59.952135
17 59.918906
16 59.885277
15 59.851107
14 59.816252
13 59.746476
12 59.674500
11 59.599749
10 59.521650
9 59.439627
8 59.353106
7 59.261514
6 59.164276
5 59.060818
4 58.950565
3 58.832944
2 58.707380
1 58.573298
0 58.430126
14190 19 51.892470
18 51.871687
17 51.851259
16 51.831189
15 51.811478
14 51.792129
13 51.753255
12 51.715467
11 51.678772
10 51.643179
9 51.608695
8 51.575327
7 51.543082
6 51.511970
5 51.481997
4 51.453170
3 51.425498
2 51.398987
1 51.373646
0 51.349482
Name: var1, dtype: float64