I am using Python 3.5. Also, I am a beginner (3 weeks experience) Python attempter and somehow I haven't given up in trying to analyze my data.
Data Description: My data is in a csv file (fev.csv). I've included it here if you want to see the full data set. It has 5 columns:
age (years)
fev (liters)
ht (inches)
sex (female=0, male=1)
smoke (non-smoker=0, smoker=1)
Task: I am trying to write a program to generate a bar graph of average FEVs with error bars indicating standard deviation. I'm trying to get 2 side by side bars (smokers/non-smokers) at 4 different age categories (11-12, 13-14, 15-16, 17 or older).
Code so far (please excuse all my #notes, it helps me know what I'm trying to do):
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('fev.csv')
nonsmokers = data[data.smoke==0]
smokers = data[data.smoke==1]
nonsmokers1 = nonsmokers[(nonsmokers.age==11) | (nonsmokers.age==12)]
nonsmokers2 = nonsmokers[(nonsmokers.age==13) | (nonsmokers.age==14)]
nonsmokers3 = nonsmokers[(nonsmokers.age==15) | (nonsmokers.age==16)]
nonsmokers4 = nonsmokers[(nonsmokers.age>=17)]
smokers1 = smokers[(smokers.age==11) | (smokers.age==12)]
smokers2 = smokers[(smokers.age==13) | (smokers.age==14)]
smokers3 = smokers[(smokers.age==15) | (smokers.age==16)]
smokers4 = smokers[(smokers.age>=17)]
nonsmMean = [nonsmokers1.fev.mean(), nonsmokers2.fev.mean(), nonsmokers3.fev.mean(), nonsmokers4.fev.mean()]
nonsmSd = [nonsmokers1.fev.std(), nonsmokers2.fev.std(), nonsmokers3.fev.std(), nonsmokers4.fev.std()]
smMean = [smokers1.fev.mean(), smokers2.fev.mean(), smokers3.fev.mean(), smokers4.fev.mean()]
smSd = [smokers1.fev.std(), smokers2.fev.std(), smokers3.fev.std(), smokers4.fev.std()]
# data to be plotted
nonsmoker = np.array(nonsmMean)
sdNonsmoker = np.array(nonsmSd)
smoker = np.array(smMean)
sdSmoker = np.array(smSd)
# parameters
bar_width = 0.35
x = np.arange(len(nonsmoker))
# plotting bars
plt.bar(x, nonsmoker, bar_width, yerr=sdNonsmoker, ecolor='k', color='b', label='Nonsmokers')
plt.bar(x+bar_width, smoker, bar_width, yerr=sdSmoker, ecolor='k', color='m', label='Smokers')
# formatting and labeling the axes and title
plt.xlabel('Age')
plt.ylabel('FEV')
plt.title('Mean FEV by Age and Smoking Status')
plt.xticks(x+0.35, ['11 to 12', '13 to 14', '15 to 16', '17+'])
# adding the legend
plt.legend()
plt.axis([-0.5,4.2,0,7])
plt.savefig('FEVgraph.png', dpi=300)
# and we are done!
plt.show()
Is there a more efficient way of doing this?
Thanks!
A possible solution is the following:
# pip install pandas
# pip install matplotlib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# read csv file and create pandas dataframe
df = pd.read_csv('https://raw.githubusercontent.com/benkeser/halplus/master/inst/extdata/fev.csv')
# assign age bins to data
bins = [df['age'].min()-1, 10, 12, 14, 16, df['age'].max()]
bins_labels = ['<11', '11 to 12', '13 to 14', '15 to 16', '17+']
df['age_bins'] = pd.cut(df['age'], bins, labels = bins_labels)
# aggregate data
result = df.groupby(['smoke', 'age_bins'], as_index=False).agg({'fev':['mean','std']})
result.columns = ['_'.join(col).strip('_') for col in result.columns.values]
result = result.round(1)
# prepare data for plot
nonsmokers = result[result['smoke'] == 0]
smokers = result[result['smoke'] == 1]
x = np.arange(len(bins_labels))
width = 0.35
# set plot figure size
plt.rcParams["figure.figsize"] = [8,6]
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, nonsmokers['fev_mean'], width, yerr=nonsmokers['fev_std'], color='b', label='Nonsmokers')
rects2 = ax.bar(x + width/2, smokers['fev_mean'], width, yerr=smokers['fev_std'], color='m', label='Smokers')
ax.set_xlabel('Age')
ax.set_ylabel('FEV')
ax.set_title('Mean FEV by Age and Smoking Status')
ax.set_xticks(x, bins_labels)
ax.legend(loc=2)
fig.tight_layout()
plt.savefig('FEVgraph.png', dpi=300)
plt.show()
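The binning and aggregation steps of the answer above can be tried in isolation. The following is a minimal sketch on made-up numbers (the `toy` frame and its values are assumptions, not rows from fev.csv). It uses pandas named aggregation, which gives flat column names directly and avoids the column-joining step:

```python
import pandas as pd

# Hypothetical toy data standing in for fev.csv (values are made up)
toy = pd.DataFrame({
    "age":   [11, 12, 11, 12, 13, 14, 13, 14],
    "smoke": [0, 0, 1, 1, 0, 0, 1, 1],
    "fev":   [2.0, 2.4, 2.2, 2.6, 3.0, 3.4, 3.1, 3.5],
})

# pd.cut bins are right-inclusive by default: (10, 12] and (12, 14]
bins = [10, 12, 14]
labels = ["11 to 12", "13 to 14"]
toy["age_bins"] = pd.cut(toy["age"], bins, labels=labels)

# One row per (smoke, age_bins) pair with the mean and standard deviation of fev
summary = (
    toy.groupby(["smoke", "age_bins"], observed=False)
       .agg(fev_mean=("fev", "mean"), fev_std=("fev", "std"))
       .reset_index()
)
print(summary)
```

The resulting `fev_mean` and `fev_std` columns can be passed to `ax.bar` as the heights and `yerr` in the same way as in the answer.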
I have been trying to understand the answer of this post in order to populate two different legends.
I create a clustered stacked bar plot with different hatches for each bar and my code below is a bit different from the answer of the aforementioned post.
But I have not been able to figure out how to get one legend with the colors and one legend with the hatches.
The color legend should correspond to A, B, C, D, E and the hatch legend should indicate "with" if bar is hatched and "without" if non-hatched.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap as coloring
# copy the dfs below and use pd.read_clipboard() to reproduce
df_1
A B C D E
Mg 10 15 23 25 27
Ca 30 33 0 20 17
df_2
A B C D E
Mg 20 12 8 40 10
Ca 7 26 12 22 16
hatches=(' ', '//')
colors_ABCDE=['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred']
dfs=[df_1,df_2]
for each_df, df in enumerate(dfs):
    df.plot(ax=plt.subplot(111), kind="barh", \
            stacked=True, hatch=hatches[each_df], \
            colormap=coloring.from_list("my_colormap", colors_ABCDE), \
            figsize=(7,2.5), position=len(dfs)-each_df-1, \
            align='center', width=0.2, edgecolor="darkgrey")
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5), fontsize=12)
The plot I manage to get is:
Any ideas how to create two legends and place them one next to the other or one below the other? Thanks in advance ^_^
Since adding legends in matplotlib is a complex, extensive step, consider using the function from the very link you cite (the solution by @jrjc). However, you will need to adjust the function for a horizontal bar graph. Specifically:
Add an argument for the color map and pass it in the DataFrame.plot call
Adjust bar plot from kind='bar' to kind='barh' for horizontal version
Swap x for y in line: rect.set_y(rect.get_y() + 1 / float(n_df + 1) * i / float(n_col))
Swap width for height in line: rect.set_height(1 / float(n_df + 1))
Adjust axe.set_xticks and axe.set_xticklabels for np.arange(0, 125, 20) values
Function
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap as coloring
def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", H="//",
                           colors_ABCDE=['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred'], **kwargs):
    """
    CREDIT: @jrjc (https://stackoverflow.com/a/22845857/1422451)
    Given a list of dataframes, with identical columns and index, create a clustered stacked bar plot.
    labels is a list of the names of the dataframe, used for the legend
    title is a string for the title of the plot
    H is the hatch used for identification of the different dataframe
    """
    n_df = len(dfall)
    n_col = len(dfall[0].columns)
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)
    for df in dfall : # for each data frame
        axe = df.plot(kind="barh",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      colormap=coloring.from_list("my_colormap", colors_ABCDE),
                      edgecolor="darkgrey",
                      **kwargs) # make bar plots
    h,l = axe.get_legend_handles_labels() # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col): # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches: # for each index
                rect.set_y(rect.get_y() + 1 / float(n_df + 2) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col)) #edited part
                rect.set_height(1 / float(n_df + 2))
    axe.set_xticks(np.arange(0, 125, 20))
    axe.set_xticklabels(np.arange(0, 125, 20).tolist(), rotation = 0)
    axe.margins(x=0, tight=None)
    axe.set_title(title)
    # Add invisible data to add another legend
    n=[]
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i, edgecolor="darkgrey"))
    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1])
    axe.add_artist(l1)
    return axe
Call
plt.figure(figsize=(10, 4))
plot_clustered_stacked([df_1, df_2],["df_1", "df_2"])
plt.show()
plt.clf()
plt.close()
Output
I found the function solution by @jrjc rather hard to follow, so I preferred to adapt my own code instead.
It took me some time to understand that when a second legend is created on the same axes, matplotlib replaces the first one, which is why add_artist() must be used.
The other prerequisite for adding the second legend is to keep a reference to the axes returned by the plot call (named "figure" below) and apply the .add_artist() method to it, so that the first legend is attached to that specific plot again.
In short, this is how I managed to create the plot I had in mind; I hope the comments make it clearer and useful for anyone.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap as coloring
import matplotlib.patches as mpatches
# copy the dfs below and use pd.read_clipboard() to reproduce
df_1
A B C D E
Mg 10 15 23 25 27
Ca 30 33 0 20 17
df_2
A B C D E
Mg 20 12 8 40 10
Ca 7 26 12 22 16
hatches=(' ', '//')
colors_ABCDE=['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred']
dfs=[df_1,df_2]
for each_df, df in enumerate(dfs):
    #I name the plot as "figure"
    figure=df.plot(ax=plt.subplot(111), kind="barh", \
                   stacked=True, hatch=hatches[each_df], \
                   colormap=coloring.from_list("my_colormap", colors_ABCDE), \
                   figsize=(7,2.5), position=len(dfs)-each_df-1, \
                   align='center', width=0.2, edgecolor="darkgrey", \
                   legend=False) #I had to False the legend too
legend_1=plt.legend(df_1.columns, loc='center left', bbox_to_anchor=(1.0, 0.5), fontsize=12)
patch_hatched = mpatches.Patch(facecolor='beige', hatch='///', edgecolor="darkgrey", label='hatched')
patch_unhatched = mpatches.Patch(facecolor='beige', hatch=' ', edgecolor="darkgrey", label='non-hatched')
legend_2=plt.legend(handles=[patch_hatched, patch_unhatched], loc='center left', bbox_to_anchor=(1.15, 0.5), fontsize=12)
# as soon as a second legend is made, the first disappears and needs to be added back again
figure.add_artist(legend_1) #python now knows that "figure" must take the "legend_1" along with "legend_2"
I am pretty sure that it can be even more elegant and automated.
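The two-legend mechanism described above can be reduced to a few lines. This is only a sketch with proxy patches on an otherwise empty axes; the labels "A", "B", "with" and "without" are placeholders and no bars are drawn:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, ax = plt.subplots(figsize=(7, 2.5))

# Proxy artists: one set for the colors, one set for the hatches
color_handles = [
    mpatches.Patch(facecolor=c, edgecolor="darkgrey", label=name)
    for c, name in zip(["tomato", "gold"], ["A", "B"])
]
hatch_handles = [
    mpatches.Patch(facecolor="beige", edgecolor="darkgrey", hatch="//", label="with"),
    mpatches.Patch(facecolor="beige", edgecolor="darkgrey", label="without"),
]

# The first legend would be replaced by the next ax.legend() call,
# so it is re-attached to the axes as an ordinary artist.
first = ax.legend(handles=color_handles, loc="upper left", bbox_to_anchor=(1.0, 1.0))
ax.add_artist(first)

# The second legend becomes the axes' "current" legend
second = ax.legend(handles=hatch_handles, loc="lower left", bbox_to_anchor=(1.0, 0.0))
```

Placing the two legends at different `bbox_to_anchor` positions puts one above the other; changing the x values instead would place them side by side.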
I have some longitudinal test data and I wanted to examine the overall trend for this data. My data is set up like this:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
test_data = pd.read_csv('./Files Used to Generate Graphs/test_data.csv',header = 0)
test_data
The way I want to plot this data is to have each donor have his/her own longitudinal data line, but color each line based on the gender of the donor like this:
test_plt = sns.lineplot(x = 'Timepoint',y = 'Prevalence',
hue = 'Gender',
data = test_data,
style = 'Donor',
palette = dict(Male = 'red',
Female = 'blue'))
for line in test_plt.lines:
    line.set_linestyle("-")
ax = plt.gca()
legend = ax.legend()
legend.set_visible(False)
plt.figure()
However, it seems that seaborn's lineplot's style argument is capped at 6 types. If I try to add another donor to my data and plot it, I get this:
append_df = pd.DataFrame(index = [12,13],
columns = ['Donor','Timepoint','Gender','Prevalence'])
append_df['Donor'] = 7
append_df['Gender'] = 'Female'
append_df.loc[12,'Timepoint'] = 1945
append_df.loc[13,'Timepoint'] = 1948
append_df.loc[12,'Prevalence'] = 18
append_df.loc[13,'Prevalence'] = 36
test_data = test_data.append(append_df)
test_data
test_plt = sns.lineplot(x = 'Timepoint',y = 'Prevalence',
hue = 'Gender',
data = test_data,
style = 'Donor',
palette = dict(Male = 'red',
Female = 'blue'))
for line in test_plt.lines:
    line.set_linestyle("-")
ax = plt.gca()
legend = ax.legend()
legend.set_visible(False)
plt.figure()
So is there a way to bypass this limit on lineplot, or do I have to go through Matplotlib for this? If the latter, what would the Matplotlib code look like?
On a separate note, is there a way to generate the legend on the seaborn plot that shows the gender of each donor and not each specific donor?
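One possible way to go through Matplotlib, sketched under assumptions: the `toy` frame below is made-up data with the same four columns as test_data, and the loop draws one solid line per donor, colored by that donor's gender. The legend is built from proxy lines so that it lists genders rather than donors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see a window
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

# Made-up data with the same columns as test_data (values are not real)
toy = pd.DataFrame({
    "Donor":      [1, 1, 2, 2, 3, 3],
    "Timepoint":  [1945, 1948, 1945, 1948, 1945, 1948],
    "Gender":     ["Male", "Male", "Female", "Female", "Female", "Female"],
    "Prevalence": [10, 20, 15, 30, 18, 36],
})
palette = {"Male": "red", "Female": "blue"}

fig, ax = plt.subplots()

# One solid line per donor; the number of donors is not limited by line styles
for donor, grp in toy.groupby("Donor"):
    grp = grp.sort_values("Timepoint")
    ax.plot(grp["Timepoint"], grp["Prevalence"],
            linestyle="-", color=palette[grp["Gender"].iloc[0]])

# Proxy lines give one legend entry per gender instead of one per donor
handles = [Line2D([0], [0], color=color, label=gender)
           for gender, color in palette.items()]
ax.legend(handles=handles, title="Gender")
ax.set_xlabel("Timepoint")
ax.set_ylabel("Prevalence")
```

Within seaborn itself, `lineplot` also accepts a `units` argument together with `estimator=None`, which may be worth trying in place of `style='Donor'`.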
How can I display values for my stacked barh chart that come from a dataframe? How can I place the labels above their respective sections on each bar and modify the font so that it shows up well as a gray scale graphic?
It is related to this question, but that one has a list of values rather than two lists pulled from a pandas dataframe. If it were a single list, I think I could pull values from a single record in the dataframe, but with two lists I'm not sure how to apply that to each bar in the bar graph.
My dataframe:
Delin. Group1 Group2 Group3 Group4 Group5
Census 0.2829 0.3387 0.2636 0.0795 0.0353
USPS 0.2538 0.3143 0.2901 0.1052 0.0366
My code:
import os
import pandas as pd
import time
#
start_time = time.time()
#
output_dir = r"C:\Some\Directory\For\Ouputs"
#
output_fig = "race_barh2.png"
#
fig_path = os.path.join(output_dir, output_fig)
#
os.chdir(output_dir)
#
input_csv = r"C:\Some\Directory\To\My.csv"
#
df = pd.read_csv(input_csv, delimiter = ",")
#
ax = df.plot.barh( stacked = True, color = ("#252525", "#636363", "#969696", "#cccccc", "#f7f7f7"), edgecolor = "black", linewidth = 1)
#
ax.set_xlabel("Percentage of Total", fontsize = 18)
#
ax.set_ylabel("Boundary Delineation", fontsize = 18)
#
ax.set_yticklabels(["Census", "USPS"])
#
ax.set_xticklabels(["0%", "20%", "40%", "60%", "80%", "100%"])
#
horiz_offset = 1.03
#
vert_offset = 1
#
ax.legend(bbox_to_anchor=(horiz_offset, vert_offset))
#
fig = ax.get_figure()
#
fig.savefig(fig_path, bbox_inches = "tight", dpi = 600)
#
#
#
end_time = round( time.time() - start_time, 5 )
#
print("Seconds elapsed: {0}".format(end_time))
You can do this similarly as in the referenced question, by annotating the bars. For a stacked bar chart you'll have to tweak the position of the labels a little to get them where you want. You can play around with the horizontalalignment, verticalalignment and adding a bit of a margin as I did (+.5).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cycler import cycler
#used gray colormap, you can use your own colors by replacing colormap='gray' with color=colors
colors = ["#252525", "#636363", "#969696", "#cccccc", "#f7f7f7"]
plt.rcParams['axes.prop_cycle'] = cycler(color=colors)
#dummy data
df = pd.DataFrame(np.random.randint(5, 8, (10, 3)), columns=['Group1', 'Group2', 'Group3'])
for col in df.columns.tolist():
    df[col] = df[col].apply(lambda x:x*100 / df[col].sum())
ax = df.T.plot.barh(stacked=True, colormap='gray', edgecolor='black', linewidth=1)
for lbl in ax.patches:
    # horizontalalignment only accepts 'left', 'center' or 'right'
    ax.annotate("{:.0f}%".format(int(lbl.get_width())), (lbl.get_x(), lbl.get_y()+.5), verticalalignment='bottom', horizontalalignment='left', fontsize=8, color='black')
ax.legend(loc='center left', bbox_to_anchor=(1.0, .5))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.show()
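The same annotation idea can be applied directly to the two-row dataframe from the question. This is a sketch rather than a finished figure: it centers each label horizontally over its segment and puts it just above the bar, in black on the white background so it stays legible in gray scale. The index is set by hand instead of reading the CSV.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the question's dataframe
df = pd.DataFrame(
    {"Group1": [0.2829, 0.2538], "Group2": [0.3387, 0.3143],
     "Group3": [0.2636, 0.2901], "Group4": [0.0795, 0.1052],
     "Group5": [0.0353, 0.0366]},
    index=["Census", "USPS"],
)
grays = ["#252525", "#636363", "#969696", "#cccccc", "#f7f7f7"]
ax = df.plot.barh(stacked=True, color=grays, edgecolor="black", linewidth=1)

# Each rectangle in ax.patches is one segment of one bar
for seg in ax.patches:
    x_mid = seg.get_x() + seg.get_width() / 2   # horizontal center of the segment
    y_top = seg.get_y() + seg.get_height()      # top edge of the bar
    ax.annotate("{:.1%}".format(seg.get_width()), (x_mid, y_top),
                ha="center", va="bottom", fontsize=8, color="black")
```

Labels over the narrowest segments (Group5) may crowd their neighbors; a smaller font or a slight vertical stagger are possible tweaks.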
I'm trying to write a python program that displays an animation of a map of the world where countries change color based on how much renewable energy use they have. I'm trying to have it display the colors for all countries in year 1960, then the colors for all countries in the year 1961, then 1962...
I'm using cartopy to add countries to the figure and basing their color off of values that I pull into a pandas dataframe from a SQL database. I was able to get the map to show what I want for one year like this:
However, I can't figure out how to animate it. I tried using FuncAnimation, but I'm really struggling to understand how it works. All the examples seem to have functions that return lines, but I'm not graphing lines or contours. Here is what I tried:
import sqlite3
import pandas as pd
import os
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.animation as animation
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader
from math import log
from math import exp
from matplotlib import colors
path = 'H:/USER/DVanLunen/indicator_data/world-development-indicators/'
os.chdir(path)
con = sqlite3.connect('database.sqlite')
# Grab :
# % of electricity from renewable sources EG.ELC.RNWX.ZS
# 1960 - 2013
Indicator_df = pd.read_sql('SELECT * '
'FROM Indicators '
'WHERE IndicatorCode in('
'"EG.ELC.RNWX.ZS"'
')'
, con)
# setup colorbar stuff and shape files
norm = mpl.colors.Normalize(vmin=0, vmax=30)
colors_in_map = []
for i in range(30):
    val = log(i + 1, logbase) / log(31, logbase)
    colors_in_map.append((1 - val, val, 0))
cmap = colors.ListedColormap(colors_in_map)
shpfilename = shpreader.natural_earth(resolution='110m',
category='cultural',
name='admin_0_countries')
reader = shpreader.Reader(shpfilename)
countries_map = reader.records()
logbase = exp(1)
fig, ax = plt.subplots(figsize=(12, 6),
subplot_kw={'projection': ccrs.PlateCarree()})
def run(data):
    """Update the Dist"""
    year = 1960 + data % 54
    logbase = exp(1)
    for n, country in enumerate(countries_map):
        facecolor = 'gray'
        edgecolor = 'black'
        indval = Indicator_df.loc[(Indicator_df['CountryName'] ==
                                   country.attributes['name_long']) &
                                  (Indicator_df['Year'] == year), 'Value']
        if indval.any():
            greenamount = (log(float(indval) + 1, logbase) /
                           log(31, logbase))
            facecolor = 1 - greenamount, greenamount, 0
        ax.add_geometries(country.geometry, ccrs.PlateCarree(),
                          facecolor=facecolor, edgecolor=edgecolor)
        ax.set_title('Percent of Electricity from Renewable Sources ' +
                     str(year))
    ax.figure.canvas.draw()
cax = fig.add_axes([0.92, 0.2, 0.02, 0.6])
cb = mpl.colorbar.ColorbarBase(cax, cmap=cmap, norm=norm,
spacing='proportional')
cb.set_label('%')
ani = animation.FuncAnimation(fig, run, interval=200, blit=False)
plt.show()
Any help would be greatly appreciated. Thanks!
Some example data for Indicator_df (not real):
CountryName Year Value
United States 1960 5
United States 1961 10
United States 1962 20
United States 1963 30
There are actually several problems with how you've set up your run(), but the major problem appears to be the enumerate(countries_map). The records() function returns a generator, which cannot be iterated a second time once it has been run through; I tried it separately from the animation to make sure.
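The generator behavior can be reproduced without cartopy. In this sketch, `records()` is a hypothetical stand-in for `reader.records()` that simply yields two country names:

```python
def records():
    """Hypothetical stand-in for reader.records(): yields each item once."""
    for name in ["Argentina", "United States"]:
        yield name

countries = records()
first_pass = [c for c in countries]    # consumes the generator
second_pass = [c for c in countries]   # nothing left to iterate over

# Materializing the records once in a list allows repeated iteration
countries_list = list(records())
reuse_1 = [c for c in countries_list]
reuse_2 = [c for c in countries_list]

print(first_pass, second_pass, reuse_2)
```

Wrapping the call as `list(reader.records())` would be one way to keep the original loop structure, although it would still redraw every country on every frame.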
That said, the problem can be avoided entirely by moving a lot of code out of the run(). Currently, even if it worked you're re-drawing every single country every frame, not just the ones with colors. It's both intensive and unnecessary - you don't need to draw any gray ones more than once.
I've restructured your code a bit and with the fake data I put in for the US and Argentina it works fine for me.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.animation as animation
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader
from math import log
from math import exp
from matplotlib import colors
from shapely.geometry.multipolygon import MultiPolygon
# Grab :
# % of electricity from renewable sources EG.ELC.RNWX.ZS
# 1960 - 2013
# Make fake data
Indicator_df = pd.DataFrame({
'CountryName': ['United States'] * 4 + ['Argentina'] * 4,
'Year': [1960, 1961, 1962, 1963] * 2,
'Value': [5, 10, 20, 30] * 2
})
# setup colorbar stuff and shape files
norm = mpl.colors.Normalize(vmin=0, vmax=30)
colors_in_map = []
logbase = exp(1)
for i in range(30):
    val = log(i + 1, logbase) / log(31, logbase)
    colors_in_map.append((1 - val, val, 0))
cmap = colors.ListedColormap(colors_in_map)
shpfilename = shpreader.natural_earth(resolution='110m',
category='cultural',
name='admin_0_countries')
reader = shpreader.Reader(shpfilename)
countries_map = reader.records()
# These don't need to constantly be redefined, especially edgecolor
facecolor = 'gray'
edgecolor = 'black'
fig, ax = plt.subplots(figsize=(12, 6),
subplot_kw={'projection': ccrs.PlateCarree()})
# Draw all the gray countries just once in an init function
# I also make a dictionary for easy lookup of the geometries by country name later
geom_dict = {}
def init_run():
    for n, country in enumerate(countries_map):
        if country.geometry.type == "Polygon":
            geom = MultiPolygon([country.geometry])
        else:
            geom = country.geometry
        ax.add_geometries(geom,
                          ccrs.PlateCarree(),
                          facecolor=facecolor,
                          edgecolor=edgecolor)
        geom_dict[country.attributes['NAME_LONG']] = geom
def run(data):
    """Update the Dist"""
    # "data" in this setup is a frame number starting from 0, so it corresponds nicely
    # with your years
    # data = 0
    year = 1960 + data
    # get a subset of the df for the current year
    year_df = Indicator_df[Indicator_df['Year'] == year]
    for i, row in year_df.iterrows():
        # This loops over countries, gets the value and geometry and adds
        # the new-colored shape
        geom = geom_dict[row['CountryName']]
        value = row['Value']
        greenamount = (log(float(value) + 1, logbase) / log(31, logbase))
        facecolor = 1 - greenamount, greenamount, 0
        ax.add_geometries(geom,
                          ccrs.PlateCarree(),
                          facecolor=facecolor,
                          edgecolor=edgecolor)
    # I decreased the indent of this, you only need to do it once per call to run()
    ax.set_title('Percent of Electricity from Renewable Sources ' + str(year))
cax = fig.add_axes([0.92, 0.2, 0.02, 0.6])
cb = mpl.colorbar.ColorbarBase(cax,
cmap=cmap,
norm=norm,
spacing='proportional')
cb.set_label('%')
ani = animation.FuncAnimation(fig,
run,
init_func=init_run,
frames=4,
interval=500,
blit=False)
ani.save(filename="test.gif")
The primary difference is that I'm not accessing the shpreader at all inside the run function. When making an animation, the only thing that should be in the run function are things that change, you don't need to re-draw everything every frame.
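The "only update what changes" pattern can be seen in a stripped-down form without cartopy. The sketch below is an illustration under assumptions: a single Matplotlib rectangle stands in for a country, the values are made up, and a simple linear color scale replaces the logarithmic one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()

# Static part: drawn once, and a reference to the artist is kept
patch = Rectangle((0.25, 0.25), 0.5, 0.5, facecolor="gray", edgecolor="black")
ax.add_patch(patch)

values = {1960: 5, 1961: 10, 1962: 20, 1963: 30}  # made-up yearly values

def update(frame):
    """Change only what differs between frames: the color and the title."""
    year = 1960 + frame
    green = values[year] / 30.0
    patch.set_facecolor((1 - green, green, 0))
    ax.set_title("Year " + str(year))
    return (patch,)

ani = animation.FuncAnimation(fig, update, frames=len(values),
                              interval=500, blit=False)
```

Calling `ani.save(...)` or `plt.show()` (with an interactive backend) would then render the frames.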
That said, this could be even better if you just keep the artist from the very first draw and just change the color of it in the run function, instead of doing a whole new ax.add_geometries. You'll have to look into how to change the color of a cartopy FeatureArtist for that.
Just to address the second point about not having to draw the whole shape again:
Instead of storing the shape information, store the feature artist, i.e.:
feature_artist = ax.add_geometries(country.geometry, ccrs.PlateCarree(),
facecolor=facecolor, edgecolor=edgecolor)
geom_dict[country.attributes['name_long']] = feature_artist
Then, in the updating loop, instead of calling ax.add_geometries again, call the following:
geom._feature._kwargs['facecolor'] = facecolor
This will update the facecolor. (You could also change the edgecolor, but since it stays the same, you can leave it out.)
I'm working on a school project and I'm stuck in making a grouped bar chart. I found this article online with an explanation: https://www.pythoncharts.com/2019/03/26/grouped-bar-charts-matplotlib/
I have a dataset with an Age column and a Sex column. The Age column holds the client's age in years, and the Sex column holds 0 for female and 1 for male. I want to plot the age difference between males and females. I tried the following code, as in the example:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import pylab as pyl
fig, ax = plt.subplots(figsize=(12, 8))
x = np.arange(len(data.Age.unique()))
# Define bar width. We'll use this to offset the second bar.
bar_width = 0.4
# Note we add the `width` parameter now which sets the width of each bar.
b1 = ax.bar(x, data.loc[data['Sex'] == '0', 'count'], width=bar_width)
# Same thing, but offset the x by the width of the bar.
b2 = ax.bar(x + bar_width, data.loc[data['Sex'] == '1', 'count'], width=bar_width)
This raised the following error: KeyError: 'count'
Then I tried to change the code a bit and got another error:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import pylab as pyl
fig, ax = plt.subplots(figsize=(12, 8))
x = np.arange(len(data.Age.unique()))
# Define bar width. We'll use this to offset the second bar.
bar_width = 0.4
# Note we add the `width` parameter now which sets the width of each bar.
b1 = ax.bar(x, (data.loc[data['Sex'] == '0'].count()), width=bar_width)
# Same thing, but offset the x by the width of the bar.
b2 = ax.bar(x + bar_width, (data.loc[data['Sex'] == '1'].count()), width=bar_width)
This raised the error: ValueError: shape mismatch: objects cannot be broadcast to a single shape
How do I count the results so that I can make this grouped bar chart?
It seems like the article goes through too much trouble just to plot a grouped bar chart:
np.random.seed(1)
data = pd.DataFrame({'Sex':np.random.randint(0,2,1000),
'Age':np.random.randint(20,50,1000)})
(data.groupby('Age')['Sex'].value_counts() # count the Sex values for each Age
.unstack('Sex') # turn Sex into columns
.plot.bar(figsize=(12,6)) # plot grouped bar
)
Or even simpler with seaborn:
import seaborn as sns
fig, ax = plt.subplots(figsize=(12,6))
sns.countplot(data=data, x='Age', hue='Sex', ax=ax)
Output:
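To see what the groupby chain produces before it is plotted, here is a small sketch on five made-up rows (the values are assumptions chosen so the counts are easy to check by hand). It also suggests why `data['count']` raised a KeyError: the raw data has no count column until the rows are aggregated.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up raw data: one row per client
data = pd.DataFrame({"Sex": [0, 1, 1, 0, 1],
                     "Age": [20, 20, 21, 21, 21]})

# Rows are ages, columns are sexes, cells are the number of clients
counts = (data.groupby("Age")["Sex"]
              .value_counts()
              .unstack("Sex", fill_value=0))
print(counts)

# One group of bars per age, one bar per sex
ax = counts.plot.bar(figsize=(6, 3))
```

Passing `fill_value=0` keeps ages where one sex is missing from turning into NaN in the table.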