How to create grouped bar charts with matplotlib from data in a DataFrame - python

This is my current output:
Now I want the second set of bars to sit next to the bars that are already plotted.
My DataFrame has 3 columns: 'Block', 'Cluster', and 'District'.
'Block' and 'Cluster' contain the numbers for plotting and the grouping is based
on the strings in 'District'.
How can I plot the other bars next to the existing bars?
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("main_ds.csv")
fig = plt.figure(figsize=(20,8))
ax = fig.add_subplot(111)
plt.xticks(rotation=90)
bwidth=0.30
indic1=ax.bar(df["District"],df["Block"], width=bwidth, color='r')
indic2=ax.bar(df["District"],df["Cluster"], width=bwidth, color='b')
ax.autoscale(tight=False)
def autolabel(rects):
    # write the height of each bar as text near the top of that bar
    for rect in rects:
        h = rect.get_height()
        ax.text(rect.get_x()+rect.get_width()/2., 1.05*h, '%d'%int(h),
                ha='center', va='top')
autolabel(indic1)
autolabel(indic2)
plt.show()
Data:
District Block Cluster Villages Schools Decadal_Growth_Rate Literacy_Rate Male_Literacy Female_Literacy Primary ... Govt_School Pvt_School Govt_Sch_Rural Pvt_School_Rural Govt_Sch_Enroll Pvt_Sch_Enroll Govt_Sch_Enroll_Rural Pvt_Sch_Enroll_Rural Govt_Sch_Teacher Pvt_Sch_Teacher
0 Dimapur 5 30 278 494 23.2 85.4 88.1 82.5 147 ... 298 196 242 90 33478 57176 21444 18239 3701 3571
1 Kiphire 3 3 94 142 -58.4 73.1 76.5 70.4 71 ... 118 24 118 24 5947 7123 5947 7123 853 261
2 Kohima 5 5 121 290 22.7 85.6 89.3 81.6 128 ... 189 101 157 49 10116 26464 5976 8450 2068 2193
3 Longleng 2 2 37 113 -30.5 71.1 75.6 65.4 60 ... 90 23 90 23 3483 4005 3483 4005 830 293
4 Mon 5 5 139 309 -3.8 56.6 60.4 52.4 165 ... 231 78 219 58 18588 16578 17108 8665 1667 903
5 rows × 26 columns

Try using pandas.DataFrame.plot
import pandas as pd
from io import StringIO
import matplotlib.pyplot as plt
def add_value_labels(ax, spacing=5):
    # annotate every bar in ax with its height
    for rect in ax.patches:
        y_value = rect.get_height()
        x_value = rect.get_x() + rect.get_width() / 2
        space = spacing
        # Vertical alignment for positive values
        va = 'bottom'
        # If value of bar is negative: Place label below bar
        if y_value < 0:
            # Invert space to place label below
            space *= -1
            # Vertically align label at top
            va = 'top'
        # Use Y value as label and format number with one decimal place
        label = "{:.1f}".format(y_value)
        # Create annotation
        ax.annotate(
            label,                       # Use `label` as label
            (x_value, y_value),          # Place label at end of the bar
            xytext=(0, space),           # Vertically shift label by `space`
            textcoords="offset points",  # Interpret `xytext` as offset in points
            ha='center',                 # Horizontally center label
            va=va)                       # Vertically align label differently for
                                         # positive and negative values.
first3columns = StringIO("""District Block Cluster
Dimapur 5 30
Kiphire 3 3
Kohima 5 5
Longleng 2 2
Mon 5 5
""")
# split on any run of whitespace (delim_whitespace=True is deprecated in newer pandas)
df_plot = pd.read_csv(first3columns, sep=r'\s+')
fig, ax = plt.subplots()
#df_plot.set_index(['District'], inplace=True)
df_plot[['Block', 'Cluster']].plot.bar(ax=ax, color=['r', 'b'])
ax.set_xticklabels(df_plot['District'])
add_value_labels(ax)
plt.show()
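A possible shortening of the answer above, assuming matplotlib 3.4 or newer (where Axes.bar_label exists): making 'District' the index lets pandas write the tick labels itself, and each plotted column ends up as one BarContainer in ax.containers, so no hand-written labelling helper is needed. The data is the question's sample, hard-coded; the Agg backend is only there so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# First three columns of the sample data shown in the question
df_plot = pd.DataFrame({
    "District": ["Dimapur", "Kiphire", "Kohima", "Longleng", "Mon"],
    "Block": [5, 3, 5, 2, 5],
    "Cluster": [30, 3, 5, 2, 5],
})

fig, ax = plt.subplots()
# With District as the index, pandas uses it for the x tick labels and
# draws the Block and Cluster bars side by side for each district
df_plot.set_index("District")[["Block", "Cluster"]].plot.bar(ax=ax, color=["r", "b"])

# pandas adds one BarContainer per column; label every bar with its height
for container in ax.containers:
    ax.bar_label(container, fmt="%d", padding=3)

# plt.show()  # display when using an interactive backend
```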

Try changing
indic1=ax.bar(df["District"],df["Block"], width=bwidth, color='r')
indic2=ax.bar(df["District"],df["Cluster"], width=bwidth, color='b')
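to the following. Because 'District' holds strings, the half-width offsets have to be applied to numeric bar positions, and the district names are then set as tick labels (this also needs import numpy as np):
x = np.arange(len(df))
indic1=ax.bar(x-bwidth/2,df["Block"], width=bwidth, color='r')
indic2=ax.bar(x+bwidth/2,df["Cluster"], width=bwidth, color='b')
ax.set_xticks(x)
ax.set_xticklabels(df["District"], rotation=90)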
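The offset approach can be sketched end to end as follows, assuming the first five rows of the question's data (hard-coded) and matplotlib 3.4 or newer for Axes.bar_label, which here stands in for the question's autolabel helper. Each district gets one numeric slot, the two series are shifted half a bar width to either side of it, and the names go on the slot centres.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First three columns of the sample data from the question
df = pd.DataFrame({
    "District": ["Dimapur", "Kiphire", "Kohima", "Longleng", "Mon"],
    "Block": [5, 3, 5, 2, 5],
    "Cluster": [30, 3, 5, 2, 5],
})

bwidth = 0.30
x = np.arange(len(df))  # numeric slot for each district

fig, ax = plt.subplots(figsize=(8, 4))
# Shift each series half a bar width left/right of the slot centre
indic1 = ax.bar(x - bwidth / 2, df["Block"], width=bwidth, color="r", label="Block")
indic2 = ax.bar(x + bwidth / 2, df["Cluster"], width=bwidth, color="b", label="Cluster")

# Put the district names under the slot centres
ax.set_xticks(x)
ax.set_xticklabels(df["District"], rotation=90)

# Label each bar with its height
ax.bar_label(indic1, fmt="%d", padding=2)
ax.bar_label(indic2, fmt="%d", padding=2)
ax.legend()
fig.tight_layout()
```

Keeping the numeric positions in x, rather than letting matplotlib treat the strings as categories, is what makes the side-by-side shift possible.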

Related

How to annotate grouped bars with group count instead of bar height

To draw the plot I am using seaborn; below is my code:
import seaborn as sns
sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
tips=tips.head()
ax = sns.barplot(x="day", y="total_bill",hue="sex", data=tips, palette="tab20_r")
I want to display the frequency of each bar, i.e. the number of times each combination occurs. Below is the expected image.
To add labels to the bars, I have used the code below:
import matplotlib.pyplot as plt
for rect in ax.patches:
    y_value = rect.get_height()
    x_value = rect.get_x() + rect.get_width() / 2
    space = 1
    label = "{:.0f}".format(y_value)
    ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')
plt.show()
With the code above I am able to display the bar heights, but I don't want the heights. I want the frequency/count of rows in each group. In the example above (tips.head()), there are 3 male and 2 female customers who gave a tip on Sunday, so the bars should display 3 and 2, not the tip amounts.
Below is the full code:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")
df = sns.load_dataset("tips")
ax = sns.barplot(x='day', y='tip',hue="sex", data=df, palette="tab20_r")
for rect in ax.patches:
    y_value = rect.get_height()
    x_value = rect.get_x() + rect.get_width() / 2
    space = 1
    label = "{:.0f}".format(y_value)
    ax.annotate(label, (x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va='bottom')
plt.show()
How to display custom values on a bar plot does not clearly show how to annotate grouped bars, nor does it show how to determine the frequency of each hue category for each day.
How to plot and annotate grouped bars in seaborn / matplotlib shows how to annotate grouped bars, but not with custom labels.
for rect in ax.patches is an obsolete way to annotate bars. Use matplotlib.pyplot.bar_label, as fully described in How to add value labels on a bar chart.
Use pandas.crosstab or pandas.DataFrame.groupby to calculate the count of each category by the hue group.
As tips.info() shows, several columns have a category Dtype, which ensures the plotting order and is why the order of tp.index and tp.columns matches the x-axis and hue order of ax. Use pandas.Categorical to set a column to a category Dtype.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
import pandas as pd
import seaborn as sns
# load the data
tips = sns.load_dataset('tips')
# determine the number of each gender for each day
tp = pd.crosstab(tips.day, tips.sex)
# or use groupby
# tp = tips.groupby(['day', 'sex']).sex.count().unstack('sex')
# plot the data
ax = sns.barplot(x='day', y='total_bill', hue='sex', data=tips)
# move the legend if needed
sns.move_legend(ax, bbox_to_anchor=(1, 1.02), loc='upper left', frameon=False)
# iterate through each group of bars, zipped to the corresponding column name
for c, col in zip(ax.containers, tp):
# add bar labels with custom annotation values
ax.bar_label(c, labels=tp[col], padding=3, label_type='center')
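The same idea (counts from pandas.crosstab passed to bar_label as custom labels) does not depend on seaborn. A minimal pure-matplotlib sketch, assuming the five tips.head() rows shown under "DataFrame Views" as hard-coded data so no dataset download is needed: the bar heights are the mean total_bill per day and sex (mirroring seaborn's default mean estimator via pivot_table), while each label is the number of rows behind that bar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five rows of tips.head() shown under "DataFrame Views"
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "day": ["Sun"] * 5,
})

# Bar heights: mean total_bill per day (rows) and sex (columns)
means = tips.pivot_table(index="day", columns="sex", values="total_bill", aggfunc="mean")
# Custom labels: number of rows per day and sex
counts = pd.crosstab(tips["day"], tips["sex"])

x = np.arange(len(means.index))
width = 0.4
fig, ax = plt.subplots()
for i, sex in enumerate(means.columns):
    bars = ax.bar(x + (i - 0.5) * width, means[sex], width=width, label=sex)
    # Look the counts up by day and sex so labels always match their bars
    labels = [str(n) for n in counts.loc[means.index, sex]]
    ax.bar_label(bars, labels=labels, label_type="center")

ax.set_xticks(x)
ax.set_xticklabels(means.index)
ax.legend(title="sex")
```

Selecting the counts by name with counts.loc[...] instead of relying on column position is a deliberate choice: it keeps labels correct even if the two tables happen to order their columns differently.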
DataFrame Views
tips
tips.head()
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
tips.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 total_bill 244 non-null float64
1 tip 244 non-null float64
2 sex 244 non-null category
3 smoker 244 non-null category
4 day 244 non-null category
5 time 244 non-null category
6 size 244 non-null int64
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB
tp
sex Male Female
day
Thur 30 32
Fri 10 9
Sat 59 28
Sun 58 18

Python Seaborn: how to plot all columns and use index as hue?

I have a dataframe that looks like this:
index      0      1      2
    9  32941  24352   2137
    1   3545   2738   1271
    8   2829   2666   2401
    3   2423   2432    540
    7   1945   1388   3906
    6   1834   7937   1446
    2   1213    682   3432
    5   1205   3539  24855
    0   1096   2705   1885
    4    969   1561   8127
I want to use barplot to plot these values, and use the index as hue. How can I do that? It can be matplotlib or seaborn or any tool, but I prefer the first two.
Use:
df = df.melt(id_vars='index')
sns.barplot(x = 'variable', y = 'value' , data = df, hue = 'index' )
NOTE: If you want to add the values on top of each bar, use:
plt.figure(figsize = (20,8))
ax = sns.barplot(x = 'variable', y = 'value' , data = df, hue = 'index' )
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2., height + 300, int(height) , ha="center", fontsize= 'small')
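What df.melt(id_vars='index') does can be checked on a small piece of the data: it produces one row per (index, column) pair, the long format that seaborn's hue expects. If seaborn is not wanted, transposing gives the same grouping with pandas alone; after .T the original columns become the x positions and each original index value becomes one bar series. This sketch assumes the first three rows of the question's data and string column names "0", "1", "2" (the real frame may use integers).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of the question's data; 'index' is a regular column
df = pd.DataFrame({
    "index": [9, 1, 8],
    "0": [32941, 3545, 2829],
    "1": [24352, 2738, 2666],
    "2": [2137, 1271, 2401],
})

# Long format, as used with sns.barplot(x='variable', y='value', hue='index')
long_df = df.melt(id_vars="index")
print(long_df.head(4))

# Same grouping with pandas only: columns on the x-axis, one series per index value
ax = df.set_index("index").T.plot.bar(rot=0)
ax.legend(title="index")
# Value labels on top of each bar (Axes.bar_label, matplotlib >= 3.4)
for container in ax.containers:
    ax.bar_label(container, fmt="%d", fontsize="small")
```

bar_label places the text a fixed distance above each bar, so unlike the height + 300 offset above it does not depend on the scale of the data.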

Background with range on seaborn based on two columns

I am trying to add to my several line plots a background that shows a range from value x (column "Min") to value y (column "Max") for each year. My dataset looks like that:
Country Model Year Costs Min Max
494 FR 1 1990 300 250 350
495 FR 1 1995 250 300 400
496 FR 1 2000 220 330 640
497 FR 1 2005 210 289 570
498 FR 2 1990 400 250 350
555 JPN 8 1990 280 250 350
556 JPN 8 1995 240 300 400
557 JPN 8 2000 200 330 640
558 JPN 8 2005 200 289 570
I used the following code:
example_1 = sns.relplot(data=example, x = "Year", y = "Costs", hue = "Model", style = "Model", col = "Country", kind="line", col_wrap=4,height = 4, dashes = True, markers = True, palette = palette, style_order = style_order)
I would like something like this with the range being my "Min" and "Max" by year.
Is it possible to do it?
Thank you very much!
Usually, grid.map is the tool for this, as shown in many examples in the multi-plot grids tutorial. But you are using relplot to combine lineplot with a FacetGrid, as suggested in the docs (last example), which lets you use some extra styling parameters.
Because relplot processes the data a bit differently than if you would first initiate a FacetGrid and then map a lineplot (you can check this with grid.data), using grid.map(plt.bar, ...) to plot the ranges is quite cumbersome as it requires editing the grid.data dataframe as well as the x- and y-axis labels.
The simplest way to plot the ranges is to loop through the grid.axes. This can be done with grid.axes_dict.items() which provides the column names (i.e. countries) that you can use to select the appropriate data for the bars (useful if the ranges were to differ, contrary to this example).
The default figure legend does not contain the complete legend including the key for the ranges, but the legend of the first ax object does, so that one is displayed instead of the default legend in the following example. Note that I have edited the data you shared so that the min/max ranges make more sense:
import io
import pandas as pd               # v 1.1.3
import matplotlib.pyplot as plt   # v 3.3.2
import seaborn as sns             # v 0.11.0
data ='''
 Country Model Year Costs Min Max
494 FR 1 1990 300 250 350
495 FR 1 1995 250 200 300
496 FR 1 2000 220 150 240
497 FR 1 2005 210 189 270
555 JPN 8 1990 280 250 350
556 JPN 8 1995 240 200 300
557 JPN 8 2000 200 150 240
558 JPN 8 2005 200 189 270
'''
df = pd.read_csv(io.StringIO(data), delim_whitespace=True)
# Create seaborn FacetGrid with line plots
grid = sns.relplot(data=df, x='Year', y='Costs', hue='Model', style='Model',height=3.9,
                   col='Country', kind='line', markers=True, palette='tab10')
# Loop through axes of the FacetGrid to plot bars for ranges and edit x ticks
for country, ax in grid.axes_dict.items():
    df_country = df[df['Country'] == country]
    cost_range = df_country['Max']-df_country['Min']
    ax.bar(x=df_country['Year'], height=cost_range, bottom=df_country['Min'],
           color='black', alpha=0.1, label='Min/max\nrange')
    ax.set_xticks(df_country['Year'])
# Remove default seaborn figure legend and show instead full legend stored in first ax
grid._legend.remove()
grid.axes.flat[0].legend(bbox_to_anchor=(2.1, 0.5), loc='center left',
                         frameon=False, title=grid.legend.get_title().get_text());
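The key step in the loop is that each range is a bar whose bottom is Min and whose height is Max - Min, so the shaded block spans exactly Min to Max. A single-axes sketch of just that step, assuming the edited FR rows from the answer and a hypothetical bar width of 4 years (the answer keeps matplotlib's default width of 0.8, which is much thinner on a year axis):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# FR rows of the edited sample data from the answer
df_country = pd.DataFrame({
    "Year": [1990, 1995, 2000, 2005],
    "Costs": [300, 250, 220, 210],
    "Min": [250, 200, 150, 189],
    "Max": [350, 300, 240, 270],
})

fig, ax = plt.subplots()
# Each bar starts at Min (bottom) and is Max - Min tall, so it spans Min..Max;
# width=4 is an illustrative choice, not taken from the answer
ranges = ax.bar(df_country["Year"], df_country["Max"] - df_country["Min"],
                bottom=df_country["Min"], width=4, color="black", alpha=0.1,
                label="Min/max range")
# The cost line drawn on top of the shaded ranges
ax.plot(df_country["Year"], df_country["Costs"], marker="o", label="Costs")
ax.set_xticks(df_country["Year"])
ax.legend()
```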

How to draw cumulative density plot from pandas?

I have a dataframe:
count_single count_multi column_names
0 11345 7209 e
1 11125 6607 w
2 10421 5105 j
3 9840 4478 r
4 9561 5492 f
5 8317 3937 i
6 7808 3795 l
7 7240 4219 u
8 6915 3854 s
9 6639 2750 n
10 6340 2465 b
11 5627 2834 y
12 4783 2384 c
13 4401 1698 p
14 3305 1753 g
15 3283 1300 o
16 2767 1697 t
17 2453 1276 h
18 2125 1140 a
19 2090 929 q
20 1330 518 d
I want to visualize count_single and count_multi, with column_names as the common axis for both. I am looking for something like this:
What I've tried:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('paper')
f, ax = plt.subplots(figsize = (6,15))
sns.set_color_codes('pastel')
sns.barplot(x = 'count_single', y = 'column_names', data = df,
label = 'Type_1', color = 'orange', edgecolor = 'w')
sns.set_color_codes('muted')
sns.barplot(x = 'count_multi', y = 'column_names', data = df,
label = 'Type_2', color = 'green', edgecolor = 'w')
ax.legend(ncol = 2, loc = 'lower right')
sns.despine(left = True, bottom = True)
plt.show()
It gives me a plot like this:
How can I visualize these two columns like the expected image?
I really appreciate any help you can provide.
# instantiate figure with two rows and one column
fig, axes = plt.subplots(nrows=2, figsize=(10,5))
# plot barplot in the first row
df.set_index('column_names').plot.bar(ax=axes[0], color=['rosybrown', 'tomato'])
# first scale each column by dividing by its sum, then use the cumulative sum to get the cumulative distribution; plot it on the second ax
df.set_index('column_names').apply(lambda x: x/x.sum()).cumsum().plot(ax=axes[1], color=['rosybrown', 'tomato'])
# change ticks in first plot:
axes[0].set_yticks(np.linspace(0, 12000, 7)) # this means: make 7 ticks between 0 and 12000
# adjust the axislabels for the second plot
axes[1].set_xticks(range(len(df)))
axes[1].set_xticklabels(df['column_names'], rotation=90)
plt.tight_layout()
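To see what the second panel plots, the scaling can be checked on its own. A sketch using the first five rows of the question's data: dividing each column by its own total gives each letter's share, and the running sum of those shares is the cumulative distribution, which reaches 1.0 at the last row. The plain division counts / counts.sum() is used here as an equivalent of the answer's apply(lambda x: x/x.sum()).

```python
import pandas as pd

# First five rows of the question's data
df = pd.DataFrame({
    "count_single": [11345, 11125, 10421, 9840, 9561],
    "count_multi": [7209, 6607, 5105, 4478, 5492],
    "column_names": ["e", "w", "j", "r", "f"],
})

counts = df.set_index("column_names")
# Share of each letter within its column (each column now sums to 1)
shares = counts / counts.sum()
# Running total of the shares: the empirical cumulative distribution
cdf = shares.cumsum()
print(cdf.round(3))
```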

Subplot secondary axis - Python, matplotlib

I have a dataframe called conversionRate like this:
            State  Apps  Loans  conversionratio
2013-01-01  IL     1165    152        13.047210
2013-01-01  NJ     2210    756        34.208145
2013-01-01  TX     1454     73         5.020633
2013-02-01  CA     2265    400        17.660044
2013-02-01  IL     1073    168        15.657036
2013-02-01  NJ     2036    739        36.296660
2013-02-01  TX     1370     63         4.598540
2013-03-01  CA     2545    548        21.532417
2013-03-01  IL     1108    172        15.523466
I intend to plot the number of apps and number of loans in the primary Y axis and the Conversion Ratio in the secondary axis for each state.
I tried the code below:
import math
import itertools
import pandas as pd
import matplotlib.pyplot as plt
rows =int(math.ceil(len(pd.Series.unique(conversionRate['State']))/2))
fig, axes = plt.subplots(nrows=rows, ncols=2, figsize=(10, 10),sharex=True, sharey=False)
columnCounter = itertools.cycle([0,1])
rowCounter1 = 0
for element in pd.Series.unique(conversionRate['State']):
    rowCounter = (rowCounter1)//2
    rowCounter1 = (rowCounter1+1)
    subSample = conversionRate[conversionRate['State']==element]
    axis=axes[rowCounter,next(columnCounter)]
    #ax2 = axis.twinx()
    subSample.plot(y=['Loans', 'Apps'],secondary_y=['conversionratio'],\
                   ax=axis)
I end up with a figure like the one below:
The question is: how do I get the secondary-axis line to show? If I try the code below (per the manual, setting secondary_y in plot() should selectively plot those columns on the secondary axis), I see only the line plotted on the secondary axis. There must be something simple and obvious I am missing, but I can't figure out what it is! Can any guru please help?
subSample.plot(secondary_y=['conversionratio'],ax=axis)
You need to include conversionratio in y=['Loans', 'Apps','conversionratio'] as well as in secondary_y, or better yet leave the y parameter out, since you're plotting all the columns.
rows =int(math.ceil(len(pd.Series.unique(conversionRate['State']))/2))
fig, axes = plt.subplots(nrows=rows, ncols=2, figsize=(10, 10),sharex=True, sharey=False)
columnCounter = itertools.cycle([0,1])
rowCounter1 = 0
for element in pd.Series.unique(conversionRate['State']):
    rowCounter = (rowCounter1)//2
    rowCounter1 = (rowCounter1+1)
    subSample = conversionRate[conversionRate['State']==element]
    axis=axes[rowCounter,next(columnCounter)]
    #ax2 = axis.twinx()
    subSample.plot(secondary_y=['conversionratio'], ax=axis)
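A minimal self-contained version for a single state, assuming the three IL rows of the sample data (with the State column already dropped): leaving y out plots every numeric column, and secondary_y moves only conversionratio to a right-hand axis, which pandas attaches to the primary axes as ax.right_ax.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# IL rows from the question's sample data, indexed by month
sub = pd.DataFrame(
    {"Apps": [1165, 1073, 1108],
     "Loans": [152, 168, 172],
     "conversionratio": [13.047210, 15.657036, 15.523466]},
    index=pd.to_datetime(["2013-01-01", "2013-02-01", "2013-03-01"]),
)

fig, ax = plt.subplots()
# Plot all three columns; only conversionratio goes to the right-hand axis
ax = sub.plot(secondary_y=["conversionratio"], ax=ax)
ax.set_ylabel("Apps / Loans")
# The ratio values are Loans / Apps * 100, i.e. percentages
ax.right_ax.set_ylabel("conversion ratio (%)")
```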
