How to groupby aggregate min / max and plot grouped bars

I am trying to plot the results of min() and max() on the same graph. I already managed it for max(), but how can I combine the two on one graph so that it displays correctly?
Example of my code and my output:
df.groupby('fecha_inicio')['capacidad_base_firme'].max().plot(kind='bar', legend = 'Reverse')
plt.xlabel('Tarifa de Base firme por Zona')
And the output of my dataframe:
zona capacidad_base_firme ... fecha_inicio fecha_fin
0 Sur 1.52306 ... 2016-01-01 2016-03-31
1 Centro 2.84902 ... 2016-01-01 2016-03-31
2 Occidente 1.57302 ... 2016-01-01 2016-03-31
3 Golfo 3.06847 ... 2016-01-01 2016-03-31
4 Norte 4.34706 ... 2016-01-01 2016-03-31
.. ... ... ... ... ...
67 Golfo 5.22776 ... 2017-10-01 2017-12-31
68 Norte 6.99284 ... 2017-10-01 2017-12-31
69 Istmo 7.25957 ... 2017-10-01 2017-12-31
70 Nacional 0.21971 ... 2017-10-01 2017-12-31
71 Nacional con AB -0.72323 ... 2017-10-01 2017-12-31
[72 rows x 10 columns]

The correct way is to aggregate multiple metrics at the same time with .agg, and then plot directly with pandas.DataFrame.plot
There is no need to call .groupby for each metric. For very large datasets, this can be resource intensive.
There is also no need to create a figure and axes with a separate call to matplotlib, as this is taken care of by pandas.DataFrame.plot, which uses matplotlib as the default backend.
Tested in python 3.9.7, pandas 1.3.4, matplotlib 3.5.0
import seaborn as sns # for data
import pandas as pd
import matplotlib.pyplot as plt
# load the test data
df = sns.load_dataset('penguins')
# display(df.head(3))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female
# aggregate metrics on a column
dfg = df.groupby('species').bill_length_mm.agg(['min', 'max'])
# display(dfg)
min max
species
Adelie 32.1 46.0
Chinstrap 40.9 58.0
Gentoo 40.9 59.6
# plot the grouped bar
ax = dfg.plot(kind='bar', figsize=(8, 6), title='Bill Length (mm)', xlabel='Species', ylabel='Length (mm)', rot=0)
plt.show()
Use stacked=True for stacked bars
ax = dfg.plot(kind='bar', figsize=(8, 6), title='Bill Length (mm)', xlabel='Species', ylabel='Length (mm)', rot=0, stacked=True)
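Applied to the columns from the question, the same pattern might look like the sketch below. The small DataFrame is made-up sample data standing in for the asker's 72-row frame (only the column names `fecha_inicio` and `capacidad_base_firme` come from the question), and the `Agg` backend line is only an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# made-up sample rows; only the column names come from the question
df = pd.DataFrame({
    "fecha_inicio": ["2016-01-01", "2016-01-01", "2016-01-01",
                     "2017-10-01", "2017-10-01", "2017-10-01"],
    "capacidad_base_firme": [1.52, 2.85, 4.35, 5.23, 6.99, 0.22],
})

# one groupby, two aggregations: columns 'min' and 'max', one row per date
dfg = df.groupby("fecha_inicio")["capacidad_base_firme"].agg(["min", "max"])

# grouped bars: one min bar and one max bar per fecha_inicio
ax = dfg.plot(kind="bar", rot=0, figsize=(8, 6))
ax.set_xlabel("fecha_inicio")
ax.set_ylabel("capacidad_base_firme")
```

Because both metrics live in one DataFrame, pandas draws them side by side and builds the legend from the column names.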

Step 1
Create a subplot to plot the data to
fig, ax = plt.subplots()
Step 2
Plot your DataFrame maximum and minimum to the specific axis
df.groupby('fecha_inicio')['capacidad_base_firme'].max().plot(ax = ax, kind='bar', legend = 'Reverse', label='Maximum')
df.groupby('fecha_inicio')['capacidad_base_firme'].min().plot(ax = ax, kind='bar', legend = 'Reverse', label='Minimum')
You may need to adjust the zorder to get the effect of a stacked bar plot.
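A minimal sketch of this two-call approach with made-up data. Drawing the taller max bars first and the min bars second on the same Axes leaves the min bars visible in front, which gives the overlaid look without touching zorder. This only holds while all values are non-negative, which is an assumption of the sample data and not true of every row in the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# made-up sample rows; only the column names come from the question
df = pd.DataFrame({
    "fecha_inicio": ["2016-01-01", "2016-01-01", "2016-01-01",
                     "2017-10-01", "2017-10-01", "2017-10-01"],
    "capacidad_base_firme": [1.52, 2.85, 4.35, 5.23, 6.99, 0.22],
})

fig, ax = plt.subplots()
grouped = df.groupby("fecha_inicio")["capacidad_base_firme"]

# max first (behind), then min (in front) on the same Axes
grouped.max().plot(ax=ax, kind="bar", color="tab:blue", label="Maximum")
grouped.min().plot(ax=ax, kind="bar", color="tab:orange", label="Minimum")
ax.legend()
```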

Related

python: complex plot with sns.FacetGrid() and secondary Y-axes

I have the following dataset:
trust_id cohort cnt
index_event_datetime
2017-06-01 chel sepsis 216.0
2017-07-01 chel sepsis 191.0
2017-08-01 chel sepsis 184.0
2017-09-01 chel sepsis 186.0
2017-10-01 chel sepsis 173.0
... ... ... ...
2022-02-01 ouh_ sepsis_thrombocytopenia 5.0
2022-03-01 ouh_ sepsis_thrombocytopenia NaN
2022-04-01 ouh_ sepsis_thrombocytopenia NaN
2022-05-01 ouh_ sepsis_thrombocytopenia NaN
2022-06-01 ouh_ sepsis_thrombocytopenia NaN
and I want to produce 4 plots for each trust with the count among three diseases:
grid = sns.FacetGrid(
incident_cnt_long.reset_index(),
col="trust_id",
hue="cohort",
col_wrap=2,
legend_out=True,
palette=["#FF4613", "#00FFAA", "#131E29"]
)
grid.map(sns.lineplot, "index_event_datetime", "cnt")
for ax in grid.axes:
# ax.xaxis.set_major_locator(mdates.MonthLocator((1, 7)))
# ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%Y"))
ax.xaxis.set_tick_params(rotation=90)
ax.set_xlabel(None)
ax.set_ylabel(None)
# grid.add_legend()
grid.fig.legend(["SEP", "STH", "SDI"])
grid.fig.supylabel("Patients (no)")
grid.fig.supxlabel("Date of index event", va="top")
grid.fig.set_size_inches(plt.rcParams["figure.figsize"])
grid.tight_layout()
plt.show()
fig = grid.fig
I get the following figure:
I want to introduce those with SDI in the second Y-axes per plot. So far I have tried to define a twin_lineplot as the following:
def twin_lineplot(x,y,color,**kwargs):
ax = plt.twinx()
sns.lineplot(x=x,y=y,color=color,**kwargs, ax=ax)
grid = sns.FacetGrid(
incident_cnt_long.reset_index(),
col="trust_id",
hue="cohort",
col_wrap=2,
legend_out=True,
palette=["#FF4613", "#00FFAA", "#131E29"]
)
grid.map(sns.lineplot, "index_event_datetime", "cnt")
grid.map(twin_lineplot, "index_event_datetime", "cnt")
for ax in grid.axes:
# ax.xaxis.set_major_locator(mdates.MonthLocator((1, 7)))
# ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%Y"))
ax.xaxis.set_tick_params(rotation=90)
ax.set_xlabel(None)
ax.set_ylabel(None)
# grid.add_legend()
grid.fig.legend(["SEP", "STH", "SDI"])
grid.fig.supylabel("Patients (no)")
grid.fig.supxlabel("Date of index event", va="top")
grid.fig.set_size_inches(plt.rcParams["figure.figsize"])
grid.tight_layout()
plt.show()
fig = grid.fig
but I am not getting the desired output:

Cannot Plot Time alone as x-axis >> TypeError: float() argument must be a string or a number, not 'datetime.time'

I am trying to plot time (6:00 am to 6:00 pm) against temperature and other parameters, but I have been struggling all week.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib.dates as mdates
import datetime
import random
df = pd.read_excel('g.xlsx')
TIME TEMP RH WS NOISE
0 06:00:00 26.3 78.4 0.1 69.2
1 06:10:00 26.8 77.4 0.0 82.0
2 06:20:00 27.1 76.8 0.2 81.0
3 06:30:00 27.1 76.4 0.3 74.0
4 06:40:00 27.4 75.4 0.4 74.0
... ... ... ... ... ...
68 17:20:00 32.5 57.7 0.5 76.1
69 17:30:00 31.8 60.6 2.2 73.4
70 17:40:00 31.4 60.8 0.4 71.8
71 17:50:00 31.2 61.3 0.2 77.3
72 18:00:00 30.9 62.3 2.2 78.1
Even when I convert the column to datetime
df['TIME'] = pd.to_datetime(df['TIME'],format= '%H:%M:%S' ).dt.time
and try plotting
plt.plot(df.TIME, df.TEMP)
I get this error message: TypeError: float() argument must be a string or a number, not 'datetime.time'
Please assist me.
df.plot works where plt.plot does not, but the downside is that I cannot get hold of the figure as fig and manipulate the graph:
df.plot(x="TIME", y=["TEMP"])
df.plot.line(x="TIME", y=["TEMP"])
The x-axis should start at 6:00 am and end at 6:00 pm, but I cannot adjust it, and creating a figure beforehand has no effect:
fig = plt.figure(1, figsize=(5, 5))
Thanks and waiting for your fast response

You can pass an axes to df.plot:
f, ax = plt.subplots(figsize=(5, 5))
df.plot(x='TIME', y='TEMP', ax=ax)
ax.set_xlim(6*60*60, 18*60*60) # time in seconds
output:
It looks like scatter plots do not work well with datetime.time values. As a workaround, you can draw the line plot with a point marker style:
f, ax = plt.subplots(figsize=(5, 5))
df.plot(x='TIME', y='TEMP', ax=ax, style='.')
ax.set_xlim(6*60*60, 18*60*60)

I had a similar problem in which the same error message arose, but not using Pandas. My code went something like this:
from datetime import datetime
import matplotlib.pyplot as plt
x = [datetime(2022,1,1, 6).time(),
datetime(2022,1,1, 9).time(),
datetime(2022,1,1, 12).time(),
datetime(2022,1,1, 15).time(),
datetime(2022,1,1, 18).time()]
y = [1,5,7,5,1] #(shape of solar intensity)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x,y) # this call raised the TypeError for me
The problem was that matplotlib could not plot datetime.time objects. I got around the problem by instead plotting y against x1=[1,2,3,4,5] and then setting the x-ticks:
ax.set_xticks(x1, ["6am","9am","12pm","3pm","6pm"])
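Put together, the workaround could look like the sketch below. It derives the tick labels from the `datetime.time` values instead of typing them by hand, and it calls `set_xticks` and `set_xticklabels` separately, which is an assumption made for compatibility with older matplotlib releases. Note that evenly spaced positions only represent time faithfully when the samples themselves are evenly spaced, as they are here.

```python
from datetime import time

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

x = [time(6), time(9), time(12), time(15), time(18)]
y = [1, 5, 7, 5, 1]  # shape of solar intensity

# plot against integer positions instead of datetime.time objects
positions = list(range(len(x)))
labels = [t.strftime("%H:%M") for t in x]

fig, ax = plt.subplots()
ax.plot(positions, y)

# label each position with the time it stands for
ax.set_xticks(positions)
ax.set_xticklabels(labels)
```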

Grid line with date and time data in x axis in matplotlib

I created a line graph of the location of a tropical cyclone over time at 6-hourly intervals. I plotted the graph with all the labels I need, but I cannot get the major and minor gridlines right. The major gridlines appear at 12-hour intervals instead of 6 hours. My goal is to start the major gridlines at 0 on the x-axis rather than a few mm east of it. I also cannot place a minor gridline midway between two major gridlines to represent the 6-hourly data, or alternatively create major gridlines at 6-hour intervals.
The image below shows the result of my code.
And this is my code.
import matplotlib.pyplot as plt
from matplotlib.ticker import (AutoMinorLocator, MultipleLocator)
from matplotlib.dates import HourLocator, MonthLocator, YearLocator
fig, ax = plt.subplots()
ax.plot(df.time,df.Distance, color='r',marker = 'o', linestyle ='--')
ax.set_xlabel('Date and Time')
ax.set_ylabel('Distance (km)')
ax.set_title('The expected distance of Tropical cyclone')
plt.grid(True)
ax.minorticks_on()
plt.grid(which='major',axis ='y', linewidth='1', color='black')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
ax.tick_params(which='both', # Options for both major and minor ticks
top='off', # turn off top ticks
left='off', # turn off left ticks
right='off', # turn off right ticks
bottom='off') # turn off bottom ticks
hloc = HourLocator(1)
ax.xaxis.set_minor_locator(hloc)
ax.yaxis.set_minor_locator(MultipleLocator(50))
m = np.arange(0,round(max(df.Distance+200),100),100)
ax.set_yticks(m)
plt.xticks(rotation=45)
plt.ylim(0,1500)
plt.show()
and my data for the x-axis is this-
0 2019-09-24 04:00:00
1 2019-09-24 10:00:00
2 2019-09-24 16:00:00
3 2019-09-24 22:00:00
4 2019-09-25 04:00:00
5 2019-09-25 10:00:00
6 2019-09-25 16:00:00
7 2019-09-25 22:00:00
8 2019-09-26 04:00:00
9 2019-09-26 10:00:00
10 2019-09-26 16:00:00
11 2019-09-26 22:00:00
12 2019-09-27 04:00:00
13 2019-09-27 10:00:00
14 2019-09-27 16:00:00
15 2019-09-27 22:00:00
16 2019-09-28 04:00:00
and the y-axis
0 1385
1 1315
2 1245
3 1175
4 1105
5 1050
6 995
7 935
8 880
9 835
10 790
11 745
12 485
13 435
14 390
15 350
16 315

Revised to 6 hour intervals. I wasn't sure of the intent of the grid, so I posted the details and no grid.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import (AutoMinorLocator, MultipleLocator)
from matplotlib.dates import HourLocator, MonthLocator, YearLocator
import matplotlib.dates as mdates
# parse the time column as datetimes before plotting
df['time'] = pd.to_datetime(df['time'])
fig, ax = plt.subplots(figsize=(20,12))
ax.plot(df.time,df.Distance, color='r',marker = 'o', linestyle ='--')
ax.set_xlabel('Date and Time')
ax.set_ylabel('Distance (km)')
ax.set_title('The expected distance of Tropical cyclone')
plt.grid(True)
ax.minorticks_on()
plt.grid(which='major',axis ='y', linewidth='1', color='black')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
ax.tick_params(which='both', # Options for both major and minor ticks
top='off', # turn off top ticks
left='off', # turn off left ticks
right='off', # turn off right ticks
bottom='off') # turn off bottom ticks
# hloc = HourLocator(1)
# ax.xaxis.set_minor_locator(hloc)
# ax.yaxis.set_minor_locator(MultipleLocator(50))
ax.xaxis.set_minor_locator(HourLocator(byhour=None, interval=3, tz=None))
ax.xaxis.set_major_locator(HourLocator(byhour=None, interval=6, tz=None))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H"))
m = np.arange(0,round(max(df.Distance+200),100),100)
ax.set_yticks(m)
plt.xticks(rotation=45)
plt.ylim(0,1500)
plt.xlim(df['time'].min(), df['time'].max())
plt.show()
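`HourLocator(interval=6)` places ticks at 00, 06, 12 and 18 hours, while the data in the question falls on 04, 10, 16 and 22. A possible variation (a sketch, not what the answer above does) is to pass explicit hours through `byhour` so that the major gridlines sit on the data points and the minor ones fall midway between them. The values are copied from the question; the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# 17 six-hourly timestamps starting 2019-09-24 04:00, as in the question
times = pd.date_range("2019-09-24 04:00", periods=17,
                      freq=pd.Timedelta(hours=6))
distance = [1385, 1315, 1245, 1175, 1105, 1050, 995, 935, 880,
            835, 790, 745, 485, 435, 390, 350, 315]

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(times, distance, color="r", marker="o", linestyle="--")

# major ticks on the observation hours, minor ticks halfway between them
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[4, 10, 16, 22]))
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[1, 7, 13, 19]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H:%M"))

# start the axis exactly at the first observation
ax.set_xlim(times[0], times[-1])

ax.grid(which="major", color="black", linewidth=1)
ax.grid(which="minor", color="black", linewidth=0.5, linestyle=":")
fig.autofmt_xdate(rotation=45)
```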

To view the minor gridlines, you should run
plt.minorticks_on()
To limit the x-axis of the chart, do:
plt.xlim(df.time.min(), df.time.max())
The result is below. As you can see, there's a major x-gridline every 6 hours and a minor one every hour.

Plot bar and line using both right and left axis in Matplotlib

Given a dataframe as follows:
date gdp tertiary_industry gdp_growth tertiary_industry_growth
0 2015/3/31 3768 2508 10.3 11.3
1 2015/6/30 8285 5483 10.9 12.0
2 2015/9/30 12983 8586 11.5 12.7
3 2015/12/31 18100 12086 10.5 13.2
4 2016/3/31 4118 2813 13.5 14.6
5 2016/6/30 8844 6020 13.3 14.3
6 2016/9/30 14038 9513 14.4 13.9
7 2016/12/31 19547 13557 16.3 13.3
8 2017/3/31 4692 3285 13.3 12.4
9 2017/6/30 9891 6881 12.9 12.5
10 2017/9/30 15509 10689 12.7 12.3
11 2017/12/31 21503 15254 14.8 12.7
12 2018/3/31 4954 3499 12.4 11.3
13 2018/6/30 10653 7520 12.9 12.4
14 2018/9/30 16708 11697 13.5 13.0
15 2018/12/31 22859 16402 14.0 13.2
16 2019/3/31 5508 3983 13.5 13.9
17 2019/6/30 11756 8556 10.2 13.4
18 2019/9/30 17869 12765 10.2 14.8
19 2019/12/31 23629 16923 11.6 15.2
20 2020/3/31 5229 3968 11.9 14.9
I have applied the following code to draw a bar plot for gdp and tertiary_industry.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.style as style
style.available
style.use('fivethirtyeight')
from pylab import rcParams
plt.rcParams["figure.figsize"] = (20, 10)
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
import matplotlib
matplotlib.matplotlib_fname()
plt.rcParams.update({'font.size': 25})
colors = ['#c23531','#2f4554', '#61a0a8', '#d48265', '#91c7ae','#749f83', '#ca8622', '#bda29a', '#6e7074', '#546570', '#c4ccd3']
df = df.sort_values(by = 'date')
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
df = df.set_index('date')
df.columns
cols = ['gdp', 'tertiary_industry']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
color_dict = dict(zip(cols, colors))
plt.figure(figsize=(20, 10))
df[cols].plot(color=[color_dict.get(x, '#333333') for x in df.columns], kind='bar', width=0.8)
plt.xticks(rotation=45)
plt.xlabel("")
plt.ylabel("million dollar")
fig = plt.gcf()
plt.show()
plt.draw()
fig.savefig("./gdp.png", dpi=100, bbox_inches = 'tight')
plt.clf()
The output from the code above:
Now I want to use line type and right axis to draw gdp_growth and tertiary_industry_growth, which are percentage values, on the same plot.
Please note I want to use colors from customized color list in the code instead of default ones.
How could I do that based on code above? Thanks a lot for your kind help.

This is what I would do:
# convert to monthly periods
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
cols = ['gdp', 'tertiary_industry']
colors = ['#c23531','#2f4554', '#61a0a8', '#d48265', '#91c7ae','#749f83', '#ca8622', '#bda29a', '#6e7074', '#546570', '#c4ccd3']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# modify color_dict here:
color_dict = dict(zip(cols, colors))
# initialize an axis instance
fig, ax = plt.subplots(figsize=(10,6))
# plot on new instance
df.plot.bar(y=cols,ax=ax,
color=[color_dict.get(x, '#333333') for x in cols])
# create a twinx axis
ax1 = ax.twinx()
# plot the other two columns on this axis
line_cols = ['gdp_growth', 'tertiary_industry_growth']
df.plot.line(y=line_cols, ax=ax1,
color=[color_dict.get(x, '#333333') for x in line_cols])
ax.set_xticklabels(df['date'])
# set y-axes labels:
ax.set_ylabel('Million Dollar')
ax1.set_ylabel('%')
# set x-axis label
ax.set_xlabel('Quarter')
plt.show()
Output:
If you replace both colors=[...] in the above codes with your original color=[color_dict.get(x, '#333333') for x in df.columns] you would get
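A self-contained variant of the twin-axis approach in this answer, as a sketch. It uses only the first four rows of the table in the question and maps all four plotted columns to the custom color list, so the lines also pick up custom colors instead of the `'#333333'` fallback. The names `bar_cols`, `line_cols` and `ax_right`, and the explicit `set_xlim` call, are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# first four rows of the table in the question
df = pd.DataFrame({
    "date": ["2015/3/31", "2015/6/30", "2015/9/30", "2015/12/31"],
    "gdp": [3768, 8285, 12983, 18100],
    "tertiary_industry": [2508, 5483, 8586, 12086],
    "gdp_growth": [10.3, 10.9, 11.5, 10.5],
    "tertiary_industry_growth": [11.3, 12.0, 12.7, 13.2],
})
df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d").dt.to_period("M")

bar_cols = ["gdp", "tertiary_industry"]
line_cols = ["gdp_growth", "tertiary_industry_growth"]
colors = ["#c23531", "#2f4554", "#61a0a8", "#d48265"]
color_dict = dict(zip(bar_cols + line_cols, colors))

# bars on the left axis, positioned at 0..n-1 by the default RangeIndex
fig, ax = plt.subplots(figsize=(10, 6))
df.plot.bar(y=bar_cols, ax=ax, color=[color_dict[c] for c in bar_cols])

# lines on a twin axis that shares x with the bars
ax_right = ax.twinx()
df.plot.line(y=line_cols, ax=ax_right, marker="o",
             color=[color_dict[c] for c in line_cols])

# keep the outer bars fully visible and label positions with the periods
ax.set_xlim(-0.5, len(df) - 0.5)
ax.set_xticks(list(range(len(df))))
ax.set_xticklabels(df["date"].astype(str), rotation=45)
ax.set_ylabel("Million Dollar")
ax_right.set_ylabel("%")
```

Leaving `date` as a regular column keeps the bar positions and the line x-values on the same 0..n-1 scale, which is what lets the two plots line up.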

Pandas TypeError when adding bar chart to the plot

I am working on creating a plot featuring two line plots - planned and actual production, and a bar chart showing the difference between those.
I've created line plots:
ax.plot_date(df['Date'], df['Planned_x'], 'b-', c='red')
ax.plot_date(df['Date'], df['Actuals'], 'b-', c='blue')
Later I saw in an old question on Stack Overflow that adding a bar chart is easier if I switch from plot_date to the normal plot and call ax.xaxis_date() separately, since that is all plot_date does, so I changed the code accordingly.
It all works fine as long as I don't add the bar chart, but as soon as I do, like so:
ax.plot(df['Date'], df['Planned_x'], 'b-', c='red')
ax.plot(df['Date'], df['Actuals'], 'b-', c='blue')
ax.bar(df['Date'], df['Delta'], c='black', width=1)
ax.xaxis_date()
...I start getting a TypeError: the dtypes of parameters x (datetime64[ns]) and width (int32) are incompatible
I looked around, but all I found were bug reports on the matplotlib and pandas GitHub pages, with no solutions that helped me.
EDIT:
Here's the example data from the Dataframe:
Date Planned_x Actuals ... C2P (%) Planned_y Delta
766 2019-09-19 284.000000 439.0 ... NaN NaN -155.000000
767 2019-09-20 284.000000 469.0 ... NaN NaN -185.000000
768 2019-09-21 260.000000 240.0 ... NaN NaN 20.000000
769 2019-09-22 305.000000 229.0 ... NaN NaN 76.000000
770 2019-09-23 351.000000 225.0 ... 0.533391 NaN 126.000000
771 2019-09-24 387.353430 1.0 ... NaN NaN 386.353430
772 2019-09-25 444.317519 152.0 ... NaN NaN 292.317519
773 2019-09-26 475.557830 300.0 ... NaN NaN 175.557830
774 2019-09-27 404.524517 150.0 ... NaN NaN 254.524517
775 2019-09-28 355.303705 550.0 ... NaN NaN -194.696295

I used your data and set the Date column as the index by appending .set_index('Date'):
df = pd.DataFrame(data,columns=['Date','Planned_x','Actuals','C2P','Planned_y','Delta']).set_index('Date')
I assume you already have some code that creates the axes, like:
ax = plt.subplot(111)
Then you can work around the error by plotting the bars against the index:
plt.bar(df.index, df.Delta)
Remember that the index is now your dataframe's Date column.
The only problem I see is that the date labels get crowded; you may need to show fewer dates.
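Another way to sidestep the dtype mismatch (a sketch, not the method used in this answer) is to convert the dates to matplotlib date numbers explicitly with `matplotlib.dates.date2num`. The x-values are then plain floats measured in days, so a numeric `width` is unambiguous. The four rows below are taken from the question's sample data; the legend labels and the `Agg` backend line are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# first four rows of the sample data in the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-09-19", "2019-09-20",
                            "2019-09-21", "2019-09-22"]),
    "Planned_x": [284.0, 284.0, 260.0, 305.0],
    "Actuals": [439.0, 469.0, 240.0, 229.0],
})
df["Delta"] = df["Planned_x"] - df["Actuals"]

# floats in days, so width=0.8 means 80% of one day
x = mdates.date2num(df["Date"].to_numpy())

fig, ax = plt.subplots()
ax.plot(x, df["Planned_x"], "-", color="red", label="Planned")
ax.plot(x, df["Actuals"], "-", color="blue", label="Actuals")
ax.bar(x, df["Delta"], width=0.8, color="black", label="Delta")

# tell the axis that the numbers are dates and format the labels
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
ax.legend()
```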
