Subplot secondary axis - Python, matplotlib - python

I have a dataframe called conversionRate like this:
            State  Apps  Loans  conversionratio
2013-01-01  IL     1165    152        13.047210
2013-01-01  NJ     2210    756        34.208145
2013-01-01  TX     1454     73         5.020633
2013-02-01  CA     2265    400        17.660044
2013-02-01  IL     1073    168        15.657036
2013-02-01  NJ     2036    739        36.296660
2013-02-01  TX     1370     63         4.598540
2013-03-01  CA     2545    548        21.532417
2013-03-01  IL     1108    172        15.523466
I intend to plot the number of apps and number of loans in the primary Y axis and the Conversion Ratio in the secondary axis for each state.
I tried the below code:
import math
rows = int(math.ceil(len(pd.Series.unique(conversionRate['State'])) / 2))
fig, axes = plt.subplots(nrows=rows, ncols=2, figsize=(10, 10), sharex=True, sharey=False)
columnCounter = itertools.cycle([0, 1])
rowCounter1 = 0
for element in pd.Series.unique(conversionRate['State']):
    rowCounter = rowCounter1 // 2
    rowCounter1 = rowCounter1 + 1
    subSample = conversionRate[conversionRate['State'] == element]
    axis = axes[rowCounter, next(columnCounter)]
    #ax2 = axis.twinx()
    subSample.plot(y=['Loans', 'Apps'], secondary_y=['conversionratio'],
                   ax=axis)
I end up with a figure like the below:
The question is how do I get the secondary axis line to show? If I try the below (per the manual setting secondary_y in plot() should selectively plot those columns in the secondary axis), I see only the line I plot on the secondary axis. There must be something simple and obvious I am missing. I can't figure out what it is! Can any guru please help?
subSample.plot(secondary_y=['conversionratio'],ax=axis)

You need to include conversionratio in y=['Loans', 'Apps', 'conversionratio'] as well as in secondary_y. Better yet, leave the y parameter out, since you're plotting all the numeric columns.
import itertools
import math
import matplotlib.pyplot as plt
import pandas as pd
rows = int(math.ceil(len(pd.Series.unique(conversionRate['State'])) / 2))
fig, axes = plt.subplots(nrows=rows, ncols=2, figsize=(10, 10), sharex=True, sharey=False)
columnCounter = itertools.cycle([0, 1])
rowCounter1 = 0
for element in pd.Series.unique(conversionRate['State']):
    rowCounter = rowCounter1 // 2
    rowCounter1 = rowCounter1 + 1
    subSample = conversionRate[conversionRate['State'] == element]
    axis = axes[rowCounter, next(columnCounter)]
    #ax2 = axis.twinx()
    subSample.plot(secondary_y=['conversionratio'], ax=axis)
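For readers who want something they can run directly, here is a minimal sketch of the same idea. It assumes a small frame built from a few rows of the question's data, with conversionratio recomputed from Loans and Apps; the names conversion_rate and sub_sample, and the use of squeeze=False and zip over axes.flat in place of the manual counters, are illustrative choices and not part of the original answer.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question (IL, NJ, TX for two months)
conversion_rate = pd.DataFrame(
    {
        "State": ["IL", "NJ", "TX", "IL", "NJ", "TX"],
        "Apps": [1165, 2210, 1454, 1073, 2036, 1370],
        "Loans": [152, 756, 73, 168, 739, 63],
    },
    index=pd.to_datetime(["2013-01-01"] * 3 + ["2013-02-01"] * 3),
)
conversion_rate["conversionratio"] = (
    100 * conversion_rate["Loans"] / conversion_rate["Apps"]
)

states = conversion_rate["State"].unique()
rows = math.ceil(len(states) / 2)
# squeeze=False keeps `axes` two-dimensional even when there is one row
fig, axes = plt.subplots(nrows=rows, ncols=2, figsize=(10, 10),
                         sharex=True, squeeze=False)

for ax, state in zip(axes.flat, states):
    sub_sample = conversion_rate[conversion_rate["State"] == state]
    # The secondary column must appear in `y` as well as in `secondary_y`
    sub_sample.plot(
        y=["Apps", "Loans", "conversionratio"],
        secondary_y=["conversionratio"],
        ax=ax,
        title=state,
    )

# Hide the unused panel when the number of states is odd
for ax in axes.flat[len(states):]:
    ax.set_visible(False)
```

pandas attaches the twin axis to the primary one as ax.right_ax, which is handy if the secondary axis needs its own label or limits.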

Related

How to groupby aggregate min / max and plot grouped bars

I am trying to plot the results of min() and max() in the same graph. I already managed to plot max(), but how can I combine the two in one graph so that it displays correctly?
Example of my code and my output:
df.groupby('fecha_inicio')['capacidad_base_firme'].max().plot(kind='bar', legend = 'Reverse')
plt.xlabel('Tarifa de Base firme por Zona')
And this is my dataframe:
zona capacidad_base_firme ... fecha_inicio fecha_fin
0 Sur 1.52306 ... 2016-01-01 2016-03-31
1 Centro 2.84902 ... 2016-01-01 2016-03-31
2 Occidente 1.57302 ... 2016-01-01 2016-03-31
3 Golfo 3.06847 ... 2016-01-01 2016-03-31
4 Norte 4.34706 ... 2016-01-01 2016-03-31
.. ... ... ... ... ...
67 Golfo 5.22776 ... 2017-10-01 2017-12-31
68 Norte 6.99284 ... 2017-10-01 2017-12-31
69 Istmo 7.25957 ... 2017-10-01 2017-12-31
70 Nacional 0.21971 ... 2017-10-01 2017-12-31
71 Nacional con AB -0.72323 ... 2017-10-01 2017-12-31
[72 rows x 10 columns]
The correct way is to aggregate multiple metrics at the same time with .agg, and then plot directly with pandas.DataFrame.plot
There is no need to call .groupby for each metric. For very large datasets, this can be resource intensive.
There is also no need to create a figure and axes with a separate call to matplotlib, as this is taken care of by pandas.DataFrame.plot, which uses matplotlib as the default backend.
Tested in python 3.9.7, pandas 1.3.4, matplotlib 3.5.0
import seaborn as sns # for data
import pandas as pd
import matplotlib.pyplot as plt
# load the test data
df = sns.load_dataset('penguins')
# display(df.head(3))
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female
# aggregate metrics on a column
dfg = df.groupby('species').bill_length_mm.agg(['min', 'max'])
# display(dfg)
min max
species
Adelie 32.1 46.0
Chinstrap 40.9 58.0
Gentoo 40.9 59.6
# plot the grouped bar
ax = dfg.plot(kind='bar', figsize=(8, 6), title='Bill Length (mm)', xlabel='Species', ylabel='Length (mm)', rot=0)
plt.show()
Use stacked=True for stacked bars
ax = dfg.plot(kind='bar', figsize=(8, 6), title='Bill Length (mm)', xlabel='Species', ylabel='Length (mm)', rot=0, stacked=True)
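Applied to the column names from the question, the same .agg approach might look like the sketch below. It assumes a reduced frame built only from the ten rows visible in the question's printout (the remaining rows and columns are elided there), so the resulting bars are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Only the rows visible in the question; the full frame has 72 rows
df = pd.DataFrame({
    "zona": ["Sur", "Centro", "Occidente", "Golfo", "Norte",
             "Golfo", "Norte", "Istmo", "Nacional", "Nacional con AB"],
    "capacidad_base_firme": [1.52306, 2.84902, 1.57302, 3.06847, 4.34706,
                             5.22776, 6.99284, 7.25957, 0.21971, -0.72323],
    "fecha_inicio": ["2016-01-01"] * 5 + ["2017-10-01"] * 5,
})

# One groupby call computes both metrics at once
dfg = df.groupby("fecha_inicio")["capacidad_base_firme"].agg(["min", "max"])

# Each row of dfg becomes a group with one bar per column
ax = dfg.plot(kind="bar", rot=0, figsize=(8, 6))
ax.set_xlabel("fecha_inicio")
ax.set_ylabel("capacidad_base_firme")
```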
Step 1
Create a subplot to plot the data to
fig, ax = plt.subplots()
Step 2
Plot your DataFrame maximum and minimum to the specific axis
df.groupby('fecha_inicio')['capacidad_base_firme'].max().plot(ax = ax, kind='bar', legend = 'Reverse', label='Maximum')
df.groupby('fecha_inicio')['capacidad_base_firme'].min().plot(ax = ax, kind='bar', legend = 'Reverse', label='Minimum')
You may need to adjust the zorder to get the effect of a stacked bar plot.

Background with range on seaborn based on two columns

I am trying to add a background to my line plots that shows a range from value x (column "Min") to value y (column "Max") for each year. My dataset looks like this:
Country Model Year Costs Min Max
494 FR 1 1990 300 250 350
495 FR 1 1995 250 300 400
496 FR 1 2000 220 330 640
497 FR 1 2005 210 289 570
498 FR 2 1990 400 250 350
555 JPN 8 1990 280 250 350
556 JPN 8 1995 240 300 400
557 JPN 8 2000 200 330 640
558 JPN 8 2005 200 289 570
I used the following code:
example_1 = sns.relplot(data=example, x = "Year", y = "Costs", hue = "Model", style = "Model", col = "Country", kind="line", col_wrap=4,height = 4, dashes = True, markers = True, palette = palette, style_order = style_order)
I would like something like this with the range being my "Min" and "Max" by year.
Is it possible to do it?
Thank you very much !
Usually, grid.map is the tool for this, as shown in many examples in the multi-plot grids tutorial. But you are using relplot to combine lineplot with a FacetGrid, as suggested in the docs (last example), which lets you use some extra styling parameters.
Because relplot processes the data a bit differently than if you would first initiate a FacetGrid and then map a lineplot (you can check this with grid.data), using grid.map(plt.bar, ...) to plot the ranges is quite cumbersome as it requires editing the grid.data dataframe as well as the x- and y-axis labels.
The simplest way to plot the ranges is to loop through the grid.axes. This can be done with grid.axes_dict.items() which provides the column names (i.e. countries) that you can use to select the appropriate data for the bars (useful if the ranges were to differ, contrary to this example).
The default figure legend does not include the key for the ranges, but the legend of the first ax object does, so that one is displayed instead of the default legend in the following example. Note that I have edited the data you shared so that the min/max ranges make more sense:
import io
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import seaborn as sns # v 0.11.0
data ='''
Country Model Year Costs Min Max
494 FR 1 1990 300 250 350
495 FR 1 1995 250 200 300
496 FR 1 2000 220 150 240
497 FR 1 2005 210 189 270
555 JPN 8 1990 280 250 350
556 JPN 8 1995 240 200 300
557 JPN 8 2000 200 150 240
558 JPN 8 2005 200 189 270
'''
df = pd.read_csv(io.StringIO(data), delim_whitespace=True)
# Create seaborn FacetGrid with line plots
grid = sns.relplot(data=df, x='Year', y='Costs', hue='Model', style='Model',height=3.9,
col='Country', kind='line', markers=True, palette='tab10')
# Loop through axes of the FacetGrid to plot bars for ranges and edit x ticks
for country, ax in grid.axes_dict.items():
    df_country = df[df['Country'] == country]
    cost_range = df_country['Max']-df_country['Min']
    ax.bar(x=df_country['Year'], height=cost_range, bottom=df_country['Min'],
           color='black', alpha=0.1, label='Min/max\nrange')
    ax.set_xticks(df_country['Year'])
# Remove default seaborn figure legend and show instead full legend stored in first ax
grid._legend.remove()
grid.axes.flat[0].legend(bbox_to_anchor=(2.1, 0.5), loc='center left',
                         frameon=False, title=grid.legend.get_title().get_text());

Grid line with date and time data in x axis in matplotlib

I created a line graph of the location of a tropical cyclone over time, on a 6-hourly basis. I successfully plotted the graph with all the needed labels, except for the major and minor gridlines. The major gridlines appear at 12-hour intervals instead of 6-hour intervals. My goal is to have the first major gridline start at 0 rather than a few mm east of 0 on the x-axis. In addition, I cannot place a minor gridline at the center between two major gridlines to represent the 6-hourly data, nor can I create major gridlines at 6-hour intervals.
The image below shows the result of my code.
And this is my code.
import matplotlib.pyplot as plt
from matplotlib.ticker import (AutoMinorLocator, MultipleLocator)
from matplotlib.dates import HourLocator, MonthLocator, YearLocator
fig, ax = plt.subplots()
ax.plot(df.time,df.Distance, color='r',marker = 'o', linestyle ='--')
ax.set_xlabel('Date and Time')
ax.set_ylabel('Distance (km)')
ax.set_title('The expected distance of Tropical cyclone')
plt.grid(True)
ax.minorticks_on()
plt.grid(which='major',axis ='y', linewidth='1', color='black')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
ax.tick_params(which='both', # Options for both major and minor ticks
top='off', # turn off top ticks
left='off', # turn off left ticks
right='off', # turn off right ticks
bottom='off') # turn off bottom ticks
hloc = HourLocator(1)
ax.xaxis.set_minor_locator(hloc)
ax.yaxis.set_minor_locator(MultipleLocator(50))
m = np.arange(0,round(max(df.Distance+200),100),100)
ax.set_yticks(m)
plt.xticks(rotation=45)
plt.ylim(0,1500)
plt.show()
and my data for the x-axis is this-
0 2019-09-24 04:00:00
1 2019-09-24 10:00:00
2 2019-09-24 16:00:00
3 2019-09-24 22:00:00
4 2019-09-25 04:00:00
5 2019-09-25 10:00:00
6 2019-09-25 16:00:00
7 2019-09-25 22:00:00
8 2019-09-26 04:00:00
9 2019-09-26 10:00:00
10 2019-09-26 16:00:00
11 2019-09-26 22:00:00
12 2019-09-27 04:00:00
13 2019-09-27 10:00:00
14 2019-09-27 16:00:00
15 2019-09-27 22:00:00
16 2019-09-28 04:00:00
and the y-axis
0 1385
1 1315
2 1245
3 1175
4 1105
5 1050
6 995
7 935
8 880
9 835
10 790
11 745
12 485
13 435
14 390
15 350
16 315
Revised to 6 hour intervals. I wasn't sure of the intent of the grid, so I posted the details and no grid.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import (AutoMinorLocator, MultipleLocator)
from matplotlib.dates import HourLocator, MonthLocator, YearLocator
import matplotlib.dates as mdates
df['time'] = pd.to_datetime(df['time'])
fig, ax = plt.subplots(figsize=(20,12))
ax.plot(df.time,df.Distance, color='r',marker = 'o', linestyle ='--')
ax.set_xlabel('Date and Time')
ax.set_ylabel('Distance (km)')
ax.set_title('The expected distance of Tropical cyclone')
plt.grid(True)
ax.minorticks_on()
plt.grid(which='major',axis ='y', linewidth='1', color='black')
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='black')
ax.tick_params(which='both', # Options for both major and minor ticks
               top=False, # turn off top ticks
               left=False, # turn off left ticks
               right=False, # turn off right ticks
               bottom=False) # turn off bottom ticks
# hloc = HourLocator(1)
# ax.xaxis.set_minor_locator(hloc)
# ax.yaxis.set_minor_locator(MultipleLocator(50))
ax.xaxis.set_minor_locator(HourLocator(byhour=None, interval=3, tz=None))
ax.xaxis.set_major_locator(HourLocator(byhour=None, interval=6, tz=None))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H"))
m = np.arange(0,round(max(df.Distance+200),100),100)
ax.set_yticks(m)
plt.xticks(rotation=45)
plt.ylim(0,1500)
plt.xlim(df['time'].min(), df['time'].max())
plt.show()
To view the minor gridlines, you should run
plt.minorticks_on()
To limit the x-axis of the chart, do:
plt.xlim(df.time.min(), df.time.max())
The result is below. As you can see, there's a major x-gridline every 6 hours and a minor one every hour.
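Since the observations in the question fall at 04:00, 10:00, 16:00 and 22:00, another option is to pin the ticks to those hours with the byhour argument of HourLocator, so that each major gridline passes through a data point and each minor gridline sits halfway between two of them. The sketch below assumes the times and distances listed in the question; the choice of byhour values is my own reading of the goal, not something either answer states.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# 6-hourly timestamps and distances as listed in the question
start = datetime(2019, 9, 24, 4)
times = [start + timedelta(hours=6 * i) for i in range(17)]
distance = [1385, 1315, 1245, 1175, 1105, 1050, 995, 935, 880,
            835, 790, 745, 485, 435, 390, 350, 315]

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(times, distance, color="r", marker="o", linestyle="--")
ax.set_xlabel("Date and Time")
ax.set_ylabel("Distance (km)")

# Major ticks on the observation hours, minor ticks halfway between them
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(4, 24, 6)))
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(1, 24, 6)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H:%M"))

# Start the axis exactly at the first observation
ax.set_xlim(times[0], times[-1])
ax.set_ylim(0, 1500)

ax.grid(which="major", linewidth=1, color="black")
ax.grid(which="minor", linestyle=":", linewidth=0.5, color="black")
fig.autofmt_xdate(rotation=45)
```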

How to create grouped bars charts with matplotlib with data in DataFrame

This is my current output:
Now I want the next bars placed next to the already plotted bars.
My DataFrame has 3 columns: 'Block', 'Cluster', and 'District'.
'Block' and 'Cluster' contain the numbers for plotting, and the grouping is based on the strings in 'District'.
How can I plot the other bars next to the existing bars?
df=pd.read_csv("main_ds.csv")
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(111)
plt.xticks(rotation=90)
bwidth=0.30
indic1=ax.bar(df["District"],df["Block"], width=bwidth, color='r')
indic2=ax.bar(df["District"],df["Cluster"], width=bwidth, color='b')
ax.autoscale(tight=False)
def autolabel(rects):
    for rect in rects:
        h = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2., 1.05*h, '%d'%int(h),
                ha='center', va='top')
autolabel(indic1)
autolabel(indic2)
plt.show()
Data:
District Block Cluster Villages Schools Decadal_Growth_Rate Literacy_Rate Male_Literacy Female_Literacy Primary ... Govt_School Pvt_School Govt_Sch_Rural Pvt_School_Rural Govt_Sch_Enroll Pvt_Sch_Enroll Govt_Sch_Enroll_Rural Pvt_Sch_Enroll_Rural Govt_Sch_Teacher Pvt_Sch_Teacher
0 Dimapur 5 30 278 494 23.2 85.4 88.1 82.5 147 ... 298 196 242 90 33478 57176 21444 18239 3701 3571
1 Kiphire 3 3 94 142 -58.4 73.1 76.5 70.4 71 ... 118 24 118 24 5947 7123 5947 7123 853 261
2 Kohima 5 5 121 290 22.7 85.6 89.3 81.6 128 ... 189 101 157 49 10116 26464 5976 8450 2068 2193
3 Longleng 2 2 37 113 -30.5 71.1 75.6 65.4 60 ... 90 23 90 23 3483 4005 3483 4005 830 293
4 Mon 5 5 139 309 -3.8 56.6 60.4 52.4 165 ... 231 78 219 58 18588 16578 17108 8665 1667 903
5 rows × 26 columns
Try using pandas.DataFrame.plot
import pandas as pd
import numpy as np
from io import StringIO
from datetime import date
import matplotlib.pyplot as plt
def add_value_labels(ax, spacing=5):
    for rect in ax.patches:
        y_value = rect.get_height()
        x_value = rect.get_x() + rect.get_width() / 2
        space = spacing
        # Vertical alignment for positive values
        va = 'bottom'
        # If value of bar is negative: Place label below bar
        if y_value < 0:
            # Invert space to place label below
            space *= -1
            # Vertically align label at top
            va = 'top'
        # Use Y value as label and format number with one decimal place
        label = "{:.1f}".format(y_value)
        # Create annotation
        ax.annotate(
            label, # Use `label` as label
            (x_value, y_value), # Place label at end of the bar
            xytext=(0, space), # Vertically shift label by `space`
            textcoords="offset points", # Interpret `xytext` as offset in points
            ha='center', # Horizontally center label
            va=va) # Vertically align label differently for
                   # positive and negative values.
first3columns = StringIO("""District Block Cluster
Dimapur 5 30
Kiphire 3 3
Kohima 5 5
Longleng 2 2
Mon 5 5
""")
df_plot = pd.read_csv(first3columns, delim_whitespace=True)
fig, ax = plt.subplots()
#df_plot.set_index(['District'], inplace=True)
df_plot[['Block', 'Cluster']].plot.bar(ax=ax, color=['r', 'b'])
ax.set_xticklabels(df_plot['District'])
add_value_labels(ax)
plt.show()
Try changing
indic1=ax.bar(df["District"],df["Block"], width=bwidth, color='r')
indic2=ax.bar(df["District"],df["Cluster"], width=bwidth, color='b')
to
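x = np.arange(len(df))  # numeric bar positions (needs: import numpy as np); the District strings cannot be offset directly
indic1=ax.bar(x-bwidth/2,df["Block"], width=bwidth, color='r')
indic2=ax.bar(x+bwidth/2,df["Cluster"], width=bwidth, color='b')
plt.xticks(x, df["District"], rotation=90)  # label each group with its district name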
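A runnable sketch of this offset approach is shown below. It assumes only the three relevant columns for the five districts shown in the question, and it uses numeric positions from np.arange for the bars, with the district names applied afterwards as tick labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The three relevant columns for the five districts shown in the question
df = pd.DataFrame({
    "District": ["Dimapur", "Kiphire", "Kohima", "Longleng", "Mon"],
    "Block": [5, 3, 5, 2, 5],
    "Cluster": [30, 3, 5, 2, 5],
})

x = np.arange(len(df))  # one numeric position per district
bwidth = 0.30

fig, ax = plt.subplots(figsize=(10, 5))
# Shift each series by half a bar width so the pair sits side by side
indic1 = ax.bar(x - bwidth / 2, df["Block"], width=bwidth, color="r", label="Block")
indic2 = ax.bar(x + bwidth / 2, df["Cluster"], width=bwidth, color="b", label="Cluster")

# Put the district names back on the x-axis
ax.set_xticks(x)
ax.set_xticklabels(df["District"], rotation=90)
ax.legend()
```

With this layout the right edge of each red bar meets the left edge of the matching blue bar at the integer position of the group.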

Add horizontal lines to plot based on sort_values criteria

Question:
How do I add horizontal lines to a plot based on the sort_values criteria specified below and captured in the top_5 variable?
Data:
Here is a slice of the data in a CSV:
This is the current plot.
axnum = today_numBars_slice[['High','Low']].plot()
axnum.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
This is the data I want to add to this plot (the High and Low values from each row):
top_5 = today_numBars_slice[['High','Low','# of Trades']].sort_values(by='# of Trades',ascending=False).head()
top_5
High Low # of Trades
Timestamp
2017-01-02 12:55:09.100 164.88 164.84 470
2017-01-02 12:10:12.000 164.90 164.86 465
2017-01-02 12:38:59.000 164.90 164.86 431
2017-01-02 11:54:49.100 164.87 164.83 427
2017-01-02 10:52:26.000 164.60 164.56 332
Desired output:
This is an example of the desired output showing two of the lines from top_5:
You can use faster DataFrame.nlargest for top 5 rows and then iterrows with axhline:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
df = pd.read_csv('for_stack_nums')
#print (df.head())
top_5 = df[['High','Low','# of Trades']].nlargest(5, '# of Trades')
print (top_5)
High Low # of Trades
94 164.88 164.84 470
90 164.90 164.86 465
93 164.90 164.86 431
89 164.87 164.83 427
65 164.60 164.56 332
axnum = df[['High','Low']].plot()
axnum.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.2f'))
for idx, l in top_5.iterrows():
    plt.axhline(y=l['High'], color='r')
    plt.axhline(y=l['Low'], color='b')
plt.show()
Also, selecting a subset of columns first is not necessary:
df = pd.read_csv('for_stack_nums.csv')
#print (df.head())
axnum = df[['High','Low']].plot()
axnum.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.2f'))
for idx, l in df.nlargest(5, '# of Trades').iterrows():
    plt.axhline(y=l['High'], color='r')
    plt.axhline(y=l['Low'], color='b')
plt.show()
Would pyplot.axhline be what you're looking for?
axnum = today_numBars_slice[['High','Low']].plot()
axnum.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
top_5 = today_numBars_slice[['High','Low','# of Trades']].sort_values(by='# of Trades',ascending=False).head()
for _, l in top_5.iterrows():  # iterrows yields (index, row) pairs
    plt.axhline(l['High'], color='r')  # column names are case-sensitive
    plt.axhline(l['Low'], color='b')
plt.show()
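The CSV from the question is not available here, so the sketch below rebuilds a small frame from the five rows printed in the question and draws lines for the two busiest rows only, as in the desired output. The variable names and the dashed line style are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd

# The five rows printed in the question, indexed by timestamp
df = pd.DataFrame(
    {
        "High": [164.88, 164.90, 164.90, 164.87, 164.60],
        "Low": [164.84, 164.86, 164.86, 164.83, 164.56],
        "# of Trades": [470, 465, 431, 427, 332],
    },
    index=pd.to_datetime([
        "2017-01-02 12:55:09.100", "2017-01-02 12:10:12.000",
        "2017-01-02 12:38:59.000", "2017-01-02 11:54:49.100",
        "2017-01-02 10:52:26.000",
    ]),
).sort_index()

ax = df[["High", "Low"]].plot()
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter("%.2f"))

# Rows with the most trades; each contributes one High and one Low line
top = df.nlargest(2, "# of Trades")
for _, row in top.iterrows():
    ax.axhline(y=row["High"], color="r", linestyle="--")
    ax.axhline(y=row["Low"], color="b", linestyle="--")
```

Calling ax.axhline on the axes returned by DataFrame.plot, instead of plt.axhline, keeps the lines on the intended plot even when several figures are open.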
