Plot bar and line using both right and left axis in Matplotlib - python

Given a dataframe as follows:
date gdp tertiary_industry gdp_growth tertiary_industry_growth
0 2015/3/31 3768 2508 10.3 11.3
1 2015/6/30 8285 5483 10.9 12.0
2 2015/9/30 12983 8586 11.5 12.7
3 2015/12/31 18100 12086 10.5 13.2
4 2016/3/31 4118 2813 13.5 14.6
5 2016/6/30 8844 6020 13.3 14.3
6 2016/9/30 14038 9513 14.4 13.9
7 2016/12/31 19547 13557 16.3 13.3
8 2017/3/31 4692 3285 13.3 12.4
9 2017/6/30 9891 6881 12.9 12.5
10 2017/9/30 15509 10689 12.7 12.3
11 2017/12/31 21503 15254 14.8 12.7
12 2018/3/31 4954 3499 12.4 11.3
13 2018/6/30 10653 7520 12.9 12.4
14 2018/9/30 16708 11697 13.5 13.0
15 2018/12/31 22859 16402 14.0 13.2
16 2019/3/31 5508 3983 13.5 13.9
17 2019/6/30 11756 8556 10.2 13.4
18 2019/9/30 17869 12765 10.2 14.8
19 2019/12/31 23629 16923 11.6 15.2
20 2020/3/31 5229 3968 11.9 14.9
I used the following code to draw a bar plot for gdp and tertiary_industry.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.style as style
style.available
style.use('fivethirtyeight')
from pylab import rcParams
plt.rcParams["figure.figsize"] = (20, 10)
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
import matplotlib
matplotlib.matplotlib_fname()
plt.rcParams.update({'font.size': 25})
colors = ['#c23531','#2f4554', '#61a0a8', '#d48265', '#91c7ae','#749f83', '#ca8622', '#bda29a', '#6e7074', '#546570', '#c4ccd3']
df = df.sort_values(by = 'date')
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
df = df.set_index('date')
df.columns
cols = ['gdp', 'tertiary_industry']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
color_dict = dict(zip(cols, colors))
plt.figure(figsize=(20, 10))
df[cols].plot(color=[color_dict.get(x, '#333333') for x in df.columns], kind='bar', width=0.8)
plt.xticks(rotation=45)
plt.xlabel("")
plt.ylabel("million dollar")
fig = plt.gcf()
plt.show()
plt.draw()
fig.savefig("./gdp.png", dpi=100, bbox_inches = 'tight')
plt.clf()
The output from the code above:
Now I want to draw gdp_growth and tertiary_industry_growth, which are percentage values, as lines against a right-hand axis on the same plot.
Please note that I want to use colors from the customized color list in the code instead of the default ones.
How could I do that based on code above? Thanks a lot for your kind help.

This is what I would do:
# convert to datetime, then to monthly periods
# (df still has 'date' as a regular column here, with the default integer index)
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
cols = ['gdp', 'tertiary_industry']
line_cols = ['gdp_growth', 'tertiary_industry_growth']
colors = ['#c23531','#2f4554', '#61a0a8', '#d48265', '#91c7ae','#749f83', '#ca8622', '#bda29a', '#6e7074', '#546570', '#c4ccd3']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# map both the bar columns and the line columns to the custom colors
color_dict = dict(zip(cols + line_cols, colors))
# initialize an axis instance
fig, ax = plt.subplots(figsize=(10,6))
# plot the bars on the new instance (left axis)
df.plot.bar(y=cols, ax=ax,
            color=[color_dict.get(x, '#333333') for x in cols])
# create a twinx axis (right axis)
ax1 = ax.twinx()
# plot the other two columns on this axis
df.plot.line(y=line_cols, ax=ax1,
             color=[color_dict.get(x, '#333333') for x in line_cols])
ax.set_xticklabels(df['date'])
# set y-axes labels:
ax.set_ylabel('Million Dollar')
ax1.set_ylabel('%')
# set x-axis label
ax.set_xlabel('Quarter')
plt.show()
Output:
If you replace both color=[...] arguments in the code above with your original color=[color_dict.get(x, '#333333') for x in df.columns], you would get:
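As a separate, self-contained variant of the twin-axis approach above, here is a minimal sketch that uses matplotlib directly instead of the pandas plotting wrappers. It takes only the first four rows of the question's data; the bar width, the marker, and the merged legend are illustrative choices of mine, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# first four rows of the question's dataframe
df = pd.DataFrame({
    "date": ["2015/3/31", "2015/6/30", "2015/9/30", "2015/12/31"],
    "gdp": [3768, 8285, 12983, 18100],
    "tertiary_industry": [2508, 5483, 8586, 12086],
    "gdp_growth": [10.3, 10.9, 11.5, 10.5],
    "tertiary_industry_growth": [11.3, 12.0, 12.7, 13.2],
})

colors = ["#c23531", "#2f4554", "#61a0a8", "#d48265"]
bar_cols = ["gdp", "tertiary_industry"]
line_cols = ["gdp_growth", "tertiary_industry_growth"]
color_dict = dict(zip(bar_cols + line_cols, colors))

x = np.arange(len(df))  # one integer position per quarter
width = 0.4             # illustrative bar width

fig, ax = plt.subplots(figsize=(10, 6))
for i, col in enumerate(bar_cols):
    # shift each series half a bar left or right of the tick position
    ax.bar(x + (i - 0.5) * width, df[col], width=width,
           color=color_dict[col], label=col)

ax2 = ax.twinx()  # right-hand axis sharing the same x positions
for col in line_cols:
    ax2.plot(x, df[col], color=color_dict[col], marker="o", label=col)

ax.set_xticks(x)
ax.set_xticklabels(pd.to_datetime(df["date"]).dt.to_period("M").astype(str),
                   rotation=45)
ax.set_ylabel("million dollar")
ax2.set_ylabel("%")

# merge the legends of both axes into a single box
h1, l1 = ax.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax.legend(h1 + h2, l1 + l2, loc="upper left")

fig.savefig("gdp_twin_axis.png", dpi=100, bbox_inches="tight")
```

Because the bars and the lines are drawn at the same integer positions, they stay aligned regardless of how the dates are formatted on the tick labels.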

Related

How to groupby aggregate min / max and plot grouped bars

I am trying to plot the results of min() and max() in the same graph. I already managed it for max(), but how can I combine the two in one graph so that it is displayed correctly?
Example of my code and my output:
df.groupby('fecha_inicio')['capacidad_base_firme'].max().plot(kind='bar', legend = 'Reverse')
plt.xlabel('Tarifa de Base firme por Zona')
And this is my dataframe:
zona capacidad_base_firme ... fecha_inicio fecha_fin
0 Sur 1.52306 ... 2016-01-01 2016-03-31
1 Centro 2.84902 ... 2016-01-01 2016-03-31
2 Occidente 1.57302 ... 2016-01-01 2016-03-31
3 Golfo 3.06847 ... 2016-01-01 2016-03-31
4 Norte 4.34706 ... 2016-01-01 2016-03-31
.. ... ... ... ... ...
67 Golfo 5.22776 ... 2017-10-01 2017-12-31
68 Norte 6.99284 ... 2017-10-01 2017-12-31
69 Istmo 7.25957 ... 2017-10-01 2017-12-31
70 Nacional 0.21971 ... 2017-10-01 2017-12-31
71 Nacional con AB -0.72323 ... 2017-10-01 2017-12-31
[72 rows x 10 columns]
The correct way is to aggregate multiple metrics at the same time with .agg, and then plot directly with pandas.DataFrame.plot
There is no need to call .groupby for each metric. For very large datasets, this can be resource intensive.
There is also no need to create a figure and axes with a separate call to matplotlib, as this is taken care of by pandas.DataFrame.plot, which uses matplotlib as the default backend.
Tested in python 3.9.7, pandas 1.3.4, matplotlib 3.5.0
import seaborn as sns # for data
import pandas as pd
import matplotlib.pyplot as plt
# load the test data
df = sns.load_dataset('penguins')
# display(df.head(3))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female
# aggregate metrics on a column
dfg = df.groupby('species').bill_length_mm.agg(['min', 'max'])
# display(dfg)
min max
species
Adelie 32.1 46.0
Chinstrap 40.9 58.0
Gentoo 40.9 59.6
# plot the grouped bar
ax = dfg.plot(kind='bar', figsize=(8, 6), title='Bill Length (mm)', xlabel='Species', ylabel='Length (mm)', rot=0)
plt.show()
Use stacked=True for stacked bars
ax = dfg.plot(kind='bar', figsize=(8, 6), title='Bill Length (mm)', xlabel='Species', ylabel='Length (mm)', rot=0, stacked=True)
Step 1
Create a subplot to plot the data to
fig, ax = plt.subplots()
Step 2
Plot your DataFrame maximum and minimum to the specific axis
df.groupby('fecha_inicio')['capacidad_base_firme'].max().plot(ax = ax, kind='bar', legend = 'Reverse', label='Maximum')
df.groupby('fecha_inicio')['capacidad_base_firme'].min().plot(ax = ax, kind='bar', legend = 'Reverse', label='Minimum')
You may need to adjust the zorder to get the effect of a stacked bar plot.
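To connect the .agg approach back to the question's own column names, here is a minimal sketch. The small dataframe is assembled from a few of the rows shown in the question (the elided columns are left out), so the result is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import pandas as pd
import matplotlib.pyplot as plt

# a few rows copied from the question's dataframe
df = pd.DataFrame({
    "zona": ["Sur", "Centro", "Occidente", "Golfo", "Norte",
             "Golfo", "Norte", "Istmo"],
    "capacidad_base_firme": [1.52306, 2.84902, 1.57302, 3.06847, 4.34706,
                             5.22776, 6.99284, 7.25957],
    "fecha_inicio": ["2016-01-01"] * 5 + ["2017-10-01"] * 3,
})

# one groupby call, two metrics: the result has a 'min' and a 'max' column
summary = df.groupby("fecha_inicio")["capacidad_base_firme"].agg(["min", "max"])

# pandas draws one bar per column for each group, side by side
ax = summary.plot(kind="bar", rot=0,
                  xlabel="fecha_inicio", ylabel="capacidad_base_firme")
plt.savefig("min_max_bars.png")
```

Grouped bars avoid the overlap that occurs when max() and min() are plotted onto the same axis in two separate calls.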

Cannot Plot Time alone as x-axis >> TypeError: float() argument must be a string or a number, not 'datetime.time'

I am trying to plot Time (6:00 am to 6:00 pm) against temperature and other parameters, but I have been struggling all week.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import matplotlib.dates as mdates
import datetime
import random
df = pd.read_excel('g.xlsx')
TIME TEMP RH WS NOISE
0 06:00:00 26.3 78.4 0.1 69.2
1 06:10:00 26.8 77.4 0.0 82.0
2 06:20:00 27.1 76.8 0.2 81.0
3 06:30:00 27.1 76.4 0.3 74.0
4 06:40:00 27.4 75.4 0.4 74.0
... ... ... ... ... ...
68 17:20:00 32.5 57.7 0.5 76.1
69 17:30:00 31.8 60.6 2.2 73.4
70 17:40:00 31.4 60.8 0.4 71.8
71 17:50:00 31.2 61.3 0.2 77.3
72 18:00:00 30.9 62.3 2.2 78.1
Even when I convert the column to datetime
df['TIME'] = pd.to_datetime(df['TIME'],format= '%H:%M:%S' ).dt.time
and then try plotting
plt.plot(df.TIME, df.TEMP)
I get this error message: TypeError: float() argument must be a string or a number, not 'datetime.time'
Please assist me.
df.plot works where plt.plot does not,
but the downside is that I cannot get hold of the figure as fig and manipulate the graph:
df.plot(x="TIME", y=["TEMP"])
df.plot.line(x="TIME", y=["TEMP"])
The downside is that the time axis should start at 6:00 am and end at 6:00 pm, but I cannot adjust it; creating a figure first doesn't work:
fig = plt.figure(1, figsize=(5, 5))
Thanks and waiting for your fast response
You can pass an axes to df.plot:
f, ax = plt.subplots(figsize=(5, 5))
df.plot(x='TIME', y='TEMP', ax=ax)
ax.set_xlim(6*60*60, 18*60*60) # time in seconds
output:
It looks like a scatter plot does not work well with datetime values. You can use this workaround:
f, ax = plt.subplots(figsize=(5, 5))
df.plot(x='TIME', y='TEMP', ax=ax, style='.')
ax.set_xlim(6*60*60, 18*60*60)
I had a similar problem in which the same error message arose, but not using Pandas. My code went something like this:
from datetime import datetime
import matplotlib.pyplot as plt
x = [datetime(2022,1,1, 6).time(),
     datetime(2022,1,1, 9).time(),
     datetime(2022,1,1, 12).time(),
     datetime(2022,1,1, 15).time(),
     datetime(2022,1,1, 18).time()]
y = [1,5,7,5,1] # (shape of solar intensity)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x,y) # this call raises the TypeError
The problem was that matplotlib could not plot datetime.time objects. I got around the problem by instead plotting y against x1=[1,2,3,4,5] and then setting the x-ticks:
ax.set_xticks(x1, ["6am","9am","12pm","3pm","6pm"])
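Another option, sketched here under my own assumptions rather than taken from the answers above, is to attach the time values to an arbitrary calendar date so that matplotlib receives full datetime objects, and then format the ticks to show only hours and minutes. The anchor date is a placeholder and never appears on the axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
from datetime import date, datetime, time

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

times = [time(6), time(9), time(12), time(15), time(18)]
y = [1, 5, 7, 5, 1]

# matplotlib cannot plot datetime.time, but it can plot datetime.datetime,
# so combine each time with an arbitrary (hypothetical) anchor date
anchor = date(2022, 1, 1)
x = [datetime.combine(anchor, t) for t in times]

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(x, y)

# show only the time of day on the tick labels
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# limits can now be given as datetimes: 6:00 am to 6:00 pm
ax.set_xlim(x[0], x[-1])
fig.savefig("time_axis.png")
```

This keeps a real time axis, so unevenly spaced samples are drawn at their true positions, which the fixed x1=[1,2,3,4,5] workaround does not guarantee.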

I'm getting float axis even with the command MaxNLocator(integer=True)

I have this df called normales:
CODIGO MES TMAX TMIN PP
0 000130 Enero 31.3 23.5 51.1
1 000130 Febrero 31.7 23.8 136.7
2 000130 Marzo 31.8 23.9 119.5
3 000130 Abril 31.5 23.7 55.6
4 000130 Mayo 30.6 23.1 15.6
... ... ... ... ...
4447 158328 Agosto 11.9 -10.6 2.2
4448 158328 Septiembre 13.2 -9.1 1.2
4449 158328 Octubre 14.6 -8.2 4.9
4450 158328 Noviembre 15.4 -7.2 11.1
4451 158328 Diciembre 14.7 -5.3 35.9
With this code I'm plotting time series and bars:
from matplotlib.ticker import MaxNLocator
from matplotlib.font_manager import FontProperties
for code, data in normales.groupby('CODIGO'):
    fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, sharex=False, sharey=False,figsize=(20, 15))
    data.plot('MES',["TMAX"], alpha=0.5, color='red', marker='P', fontsize = 15.0,ax=ax1)
    data.plot('MES',["TMIN"], alpha=0.5,color='blue',marker='D', fontsize = 15.0,ax=ax2)
    data.plot('MES',["PP"],kind='bar',color='green', fontsize = 15.0,ax=ax3)
    tabla=ax4.table(cellText=data[['TMAX','TMIN','PP']].T.values,
                    colLabels=["Enero","Febrero","Marzo","Abril","Mayo","Junio","Julio","Agosto",
                               "Septiembre","Octubre","Noviembre","Diciembre"],
                    rowLabels=data[['TMAX','TMIN','PP']].columns,rowColours =["red","blue","green"],
                    colColours =["black"] * 12,loc="center",bbox = [0.0, -0.5, 1, 1])
    tabla.auto_set_font_size(False)
    tabla.set_fontsize(15)
    tabla.scale(1,2)
    ax4.axis('off')
    ax1.set_ylabel("Temperatura\nMáxima °C/mes", fontsize = 15.0)
    ax1.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax2.set_ylabel("Temperatura\nMínima °C/mes", fontsize = 15.0)
    ax2.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax3.set_ylabel("Precipitación mm/mes", fontsize = 15.0)
    ax3.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax1.set_xlabel("")
    ax2.set_xlabel("")
    ax3.set_xlabel("")
    ax4.set_xlabel("")
As you can see, I'm using ax.yaxis.set_major_locator(MaxNLocator(integer=True)) on every axis to force integer tick values. Even so, I'm getting plots with non-integer (float) values on the y-axis. Do you know why this is happening?
Thanks in advance.
From the MaxNLocator docs:
integer bool, default: False
If True, ticks will take only integer values, provided at least min_n_ticks integers are found within the view limits.
....
min_n_ticks int, default: 2
You need to change min_n_ticks to 1 since ax2 only has one integer within the view limits, namely 12.
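The effect of min_n_ticks can be checked with a minimal sketch. The y-limits of 11.6 to 12.4 are made-up values chosen so that 12 is the only integer in view, mirroring the situation described for ax2.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [11.7, 12.0, 12.3])
ax.set_ylim(11.6, 12.4)  # only one integer (12) lies inside these limits

# with the default min_n_ticks=2 the locator cannot find two integers
# in view, so it falls back to non-integer steps
default_ticks = MaxNLocator(integer=True).tick_values(11.6, 12.4)

# allowing a single visible tick lets the integer constraint take effect
ax.yaxis.set_major_locator(MaxNLocator(integer=True, min_n_ticks=1))
integer_ticks = ax.get_yticks()

print("default:      ", default_ticks)
print("min_n_ticks=1:", integer_ticks)
```

In the question's loop, the same change would go into each set_major_locator call, for example ax2.yaxis.set_major_locator(MaxNLocator(integer=True, min_n_ticks=1)).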

Plot histogram on binary based column data versus continuous column of data

Here is the data, showing the two columns for which I need to plot a histogram:
Cont Bin_Data
21 1
21 1
22.8 1
21.4 0
18.7 0
18.1 0
14.3 0
24.4 0
22.8 1
19.2 1
17.8 0
16.4 1
17.3 0
15.2 1
I need to plot a histogram of the Cont column grouped by the Bin_Data column, so that the two groups can be compared. I have tried three approaches and none of them gives a satisfactory plot.
Approach #1
plt.hist('mpg', bins=5, data=am)
Approach #2
plt.hist(mpg, bins=np.arange(mpg.min(), mpg.max()+1))
Approach #3
am = data['am']
legend = ['am', 'mpg']
mpg = data['mpg']
plt.hist([mpg, am], color=['orange', 'green'])
plt.xlabel("am")
plt.ylabel("mpg")
plt.legend(legend)
#plt.xticks(range(0, 7))
#plt.yticks(range(1, 20))
plt.title('Analysis of "am" upon "mpg"')
plt.show()
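One possible reading of the goal is to compare the distribution of Cont between the rows where Bin_Data is 0 and the rows where it is 1. Under that assumption, a minimal sketch is to split the continuous column by the binary column and pass both groups to a single hist call with shared bins. The data is the sample from the question; the number of bins and the colors are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# sample data copied from the question
data = pd.DataFrame({
    "Cont": [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
             24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2],
    "Bin_Data": [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
})

# one array of Cont values per Bin_Data category
groups = [data.loc[data["Bin_Data"] == flag, "Cont"].to_numpy()
          for flag in (0, 1)]

# shared bin edges so both groups are counted over the same intervals
bins = np.linspace(data["Cont"].min(), data["Cont"].max(), 6)  # 5 bins

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(groups, bins=bins, color=["orange", "green"],
                           label=["Bin_Data = 0", "Bin_Data = 1"])
ax.set_xlabel("Cont")
ax.set_ylabel("count")
ax.legend()
fig.savefig("hist_by_group.png")
```

Passing a list of arrays makes matplotlib draw the groups as side-by-side bars within each bin, which is easier to compare than plotting the 0/1 column itself as a second histogram.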

How to make a 4d plot using Python with matplotlib

I am looking for a way to create four-dimensional plots (surface plus a color scale) using Python and matplotlib. I am able to generate the surface using the first three variables, but I am not having success adding the color scale for the fourth variable. Here is a small subset of my data below. Any help would be greatly appreciated. Thanks
Data Subset
var1 var2 var3 var4
10.39 73.32 2.02 28.26
11.13 68.71 1.86 27.83
12.71 74.27 1.89 28.26
11.46 91.06 1.63 28.26
11.72 85.38 1.51 28.26
13.39 78.68 1.89 28.26
13.02 68.02 2.01 28.26
12.08 64.37 2.18 28.26
11.58 60.71 2.28 28.26
8.94 65.67 1.92 27.04
11.61 59.57 2.32 27.52
19.06 74.49 1.69 63.35
17.52 73.62 1.73 63.51
19.52 71.52 1.79 63.51
18.76 67.55 1.86 63.51
19.84 53.34 2.3 63.51
20.19 59.82 1.97 63.51
17.43 57.89 2.05 63.38
17.9 59.95 1.89 63.51
18.97 57.84 2 63.51
19.22 57.74 2.05 63.51
17.55 55.66 1.99 63.51
19.22 101.31 6.76 94.29
19.41 99.47 6.07 94.15
18.99 94.01 7.32 94.08
19.88 103.57 6.98 94.58
19.08 95.38 5.66 94.14
20.36 100.43 6.13 94.47
20.13 98.78 7.37 94.47
20.36 89.36 8.79 94.71
20.96 84.48 8.33 94.01
21.02 83.97 6.78 94.72
19.6 95.64 6.56 94.57
To create the plot you want, we need to use matplotlib's plot_surface to plot Z vs (X,Y) surface, and then use the keyword argument facecolors to pass in a new color for each patch.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
# create some fake data
x = y = np.arange(-4.0, 4.0, 0.02)
# here are the x,y and respective z values
X, Y = np.meshgrid(x, y)
Z = np.sinc(np.sqrt(X*X+Y*Y))
# this is the value to use for the color
V = np.sin(Y)
# create the figure, add a 3d axis, set the viewing angle
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(45,60)
# here we create the surface plot, but pass V through a colormap
# to create a different color for each patch
ax.plot_surface(X, Y, Z, facecolors=cm.Oranges(V))
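A colormap maps floats in the range 0 to 1, and values outside that range are clipped to the end colors, so it can help to normalize the fourth variable before passing it to the colormap. The sketch below is a variation on the answer's code under that assumption; the Normalize step, the colorbar, and the coarser grid (step 0.2, to keep rendering fast) are my additions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line for interactive use
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize

# same fake data as above, on a coarser grid
x = y = np.arange(-4.0, 4.0, 0.2)
X, Y = np.meshgrid(x, y)
Z = np.sinc(np.sqrt(X * X + Y * Y))
V = np.sin(Y)  # fourth variable, ranges over roughly [-1, 1]

# rescale V to [0, 1] so the full colormap is used
norm = Normalize(vmin=V.min(), vmax=V.max())
cmap = plt.get_cmap("Oranges")
face_colors = cmap(norm(V))  # RGBA array, one color per grid point

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.view_init(45, 60)
ax.plot_surface(X, Y, Z, facecolors=face_colors, shade=False)

# the surface carries explicit colors, so build a mappable for the colorbar
mappable = cm.ScalarMappable(norm=norm, cmap=cmap)
mappable.set_array(V)
fig.colorbar(mappable, ax=ax, shrink=0.6, label="V")

fig.savefig("surface_4d.png")
```

For measured data such as the var1 to var4 columns in the question, the points would first have to be arranged or interpolated onto a regular grid, since plot_surface expects 2D arrays.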
