How to Create Double or Stacked Bar Graph Using Matplotlib - python

How do you create a grouped or stacked bar graph from three sets of value_counts() data taken from a CSV file in Python? I can successfully graph each set individually using this code:
dfwidget1.country.value_counts().plot(kind='bar', color='blue')
I don't know, however, how to get them all to plot onto the same graph. Obviously this, which I've tried, doesn't work:
dfwidget1.country.value_counts().plot(kind='bar', color='blue')
dfwidget2.country.value_counts().plot(kind='bar', color='red')
dfwidget3.country.value_counts().plot(kind='bar', color='yellow')
After researching, I've also tried using groupby(), but without success. If that's the way to go, I'd appreciate affirmation along those lines, and I'll go study up. If there is a simpler way to do it, I'm all ears.
Here's the toy df (saved as widget-by-country.csv):
Here's the code I've tried:
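import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# IPython/Jupyter magic: render plots inside the notebook
%matplotlib inline
sns.set()
df = pd.read_csv('widget-by-country.csv')
# keep only the rows whose category mentions each widget
dfwidget1 = df[df['category'].str.contains('widget1', na=False)]
dfwidget2 = df[df['category'].str.contains('widget2', na=False)]
dfwidget3 = df[df['category'].str.contains('widget3', na=False)]
# all three calls draw bars at the same x positions on the same axes,
# so they overlap and the tick labels come from the last call
dfwidget1.country.value_counts().plot(kind='bar', color='blue')
dfwidget2.country.value_counts().plot(kind='bar', color='red')
dfwidget3.country.value_counts().plot(kind='bar', color='yellow')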
This is what I get, which does not show me the full distribution of countries where each widget is made:
I'd really like to see a grouped bar graph showing this data that looks something like this:
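One way to get grouped bars (a sketch, not taken from an answer in the original thread) is to align the three value_counts() results as columns of a single DataFrame and let pandas draw them together. The groupby route the question asks about produces the same country-by-category table. Because the CSV is not reproduced above, the small df below is hypothetical and uses only the category and country columns from the question's code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data: the original widget-by-country.csv is not shown.
df = pd.DataFrame({
    "category": ["widget1", "widget1", "widget2", "widget2", "widget3", "widget1"],
    "country": ["US", "China", "US", "Mexico", "China", "US"],
})
widgets = ["widget1", "widget2", "widget3"]

# Option 1: put each widget's value_counts() side by side, one column each.
# concat aligns the country indexes; countries a widget lacks become NaN -> 0.
counts = pd.concat(
    {w: df.loc[df["category"].str.contains(w, na=False), "country"].value_counts()
     for w in widgets},
    axis=1,
).fillna(0)

# Option 2: the groupby route builds the same country x category table directly.
counts_gb = df.groupby(["country", "category"]).size().unstack(fill_value=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
counts.plot.bar(ax=ax1, color=["blue", "red", "yellow"])                # grouped
counts.plot.bar(ax=ax2, color=["blue", "red", "yellow"], stacked=True)  # stacked
plt.show()
```

When the category values are exactly the widget names, pd.crosstab(df["country"], df["category"]) is another way to build the same count table.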

Related

Stacked barplot with some customizations using matplotlib

I'm trying to produce a stacked barplot showing several categories. However, the current DataFrame makes it difficult to stack the categories together. Also, some years have no count, and these should be removed. Any ideas how to produce something like the picture below? I highly appreciate your time.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/test_barplot.csv")
fig,ax=plt.subplots(figsize=(15,8))
ax.bar(df.Year,df.Number)
plt.xticks(np.arange(2000,2022),np.arange(2000,2022))
plt.xlabel("Year", fontsize=15)
plt.ylabel("Number", fontsize=15)
plt.show()
You can reshape the data so that the stacked categories become columns, then use pandas' plot.bar with stacked=True. reindex adds the missing years as empty (NaN) rows.
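fig, ax = plt.subplots(figsize=(15, 8))
# one row per year, one column per category; duplicate Year/Category pairs are summed
df_stack = df.pivot_table(index="Year",
                          columns="Category",
                          values="Number",
                          aggfunc="sum")
# add every year from 2000 to 2021; years without data become NaN rows
df_stack = df_stack.reindex(np.arange(2000, 2022))
df_stack.plot.bar(stacked=True, ax=ax)
plt.xlabel("Year", fontsize=15)
plt.ylabel("Number", fontsize=15)
plt.show()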
The duplicated Agriculture category comes from two spellings in the data: one with a trailing space and one without.
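A minimal sketch of the whole fix, assuming the CSV has the Year, Category and Number columns used above. The small DataFrame is a hypothetical stand-in for the remote file, with one deliberately padded "Agriculture " label; stripping whitespace with str.strip() before pivoting merges the two spellings into a single column.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample mimicking the CSV's Year/Category/Number columns;
# note that "Agriculture " carries a trailing space.
df = pd.DataFrame({
    "Year": [2001, 2001, 2003, 2003, 2005],
    "Category": ["Agriculture", "Forest", "Agriculture ", "Forest", "Urban"],
    "Number": [3, 1, 2, 4, 5],
})

# Normalise the labels so both spellings land in one column.
df["Category"] = df["Category"].str.strip()

df_stack = (
    df.pivot_table(index="Year", columns="Category", values="Number", aggfunc="sum")
    .reindex(np.arange(2000, 2022))  # one row per year 2000..2021; absent years are NaN
)

fig, ax = plt.subplots(figsize=(15, 8))
df_stack.plot.bar(stacked=True, ax=ax)
ax.set_xlabel("Year", fontsize=15)
ax.set_ylabel("Number", fontsize=15)
plt.show()
```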

Plot separately different parts of a dataframe

I would like to plot from the seaborn dataset 'tips'.
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
x1 = tips.loc[(df['time']=='lunch'), 'tip']
x2 = tips.loc[(df['time']=='dinner'),'tip']
x1.plot.kde(color='orange')
x2.plot.kde(color='blue')
plt.show()
I don't know exactly where it's wrong...
Thanks for the help.
Seaborn's sns.kdeplot() supports the hue argument to split the plot between different categories:
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
sns.kdeplot(data=tips, x='tip', hue='time')
Of course your approach could work too, but there are several problems with your code:
What is df? Shouldn't that be tips?
The category names Lunch and Dinner must be capitalized, as in the data.
Your .loc[mask, 'tip'] indexing is valid once df is replaced by tips; a shorter equivalent is e.g. x1 = tips.tip[tips['time'] == 'Lunch'].
If you want to plot two KDEs in the same diagram, they should be scaled according to sample size. With my approach above, seaborn does that automatically (common_norm=True is the default).
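For comparison, here is a sketch of the question's own pandas approach with those fixes applied. To keep it runnable offline, a tiny hypothetical DataFrame with tips-style time and tip columns stands in for sns.load_dataset("tips"); Series.plot.kde also needs SciPy installed.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for sns.load_dataset("tips"): only the 'time' and 'tip'
# columns are mimicked, with the dataset's capitalised labels.
tips = pd.DataFrame({
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Dinner"],
    "tip": [1.5, 2.0, 3.1, 2.5, 3.0, 4.2, 5.0],
})

# Index tips itself (not an undefined df) and match the exact category spelling.
x1 = tips.loc[tips["time"] == "Lunch", "tip"]
x2 = tips.loc[tips["time"] == "Dinner", "tip"]

fig, ax = plt.subplots()
x1.plot.kde(ax=ax, color="orange", label="Lunch")  # Series.plot.kde requires SciPy
x2.plot.kde(ax=ax, color="blue", label="Dinner")
ax.legend()
plt.show()
```

Unlike the hue version, each curve here integrates to 1 on its own, so the two densities are not scaled by group size.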
As you are loading data from seaborn's built-in datasets, check the exact, case-sensitive spelling of column names and category values, and replace them with the correct ones.
You can also plot the cumulative distribution of total_bill for each time of day as follows:
sns.kdeplot(
data=tips, x="total_bill", hue="time",
cumulative=True, common_norm=False, common_grid=True,
)

Python subplots with seaborn or pyplot

I'm an R programmer learning Python, and I find plotting in Python much more difficult than in R.
I'm trying to write the following function but haven't been successful. Could anyone help?
import pandas as pd
#example data
df1 = pd.DataFrame({
'PC1':[-2.2,-2.0,2.04,0.97],
'PC2':[0.5,-0.6,0.9,-0.5],
'PC3':[-0.1,-0.2,0.2,0.8],
'f1':['a','a','b','b'],
'f2':['x','y','x','y'],
'f3':['k','g','g','k']
})
def drawPCA(df, **kwargs):
    """Produce a 1x3 grid of scatterplot subplots; each subplot shows two PCs
    without a legend, e.g. subplot 1 is PC1 vs PC2. The legend sits at the
    upper middle of the figure.
    Parameters
    ----------
    df: Pandas DataFrame
        The first 3 columns are the PCs, followed by sample characteristics.
    kwargs
        To specify hue, style, size, etc. if the plotting uses seaborn.scatterplot;
        or c, s, etc. if using pyplot scatter
    Example
    ----------
    drawPCA(df1, hue="f1")
    drawPCA(df1, c="f1", s="f2")  # if plotting uses plt.scatter
    drawPCA(df1, hue="f1", size="f2", style="f3")
    or more variables passable to the actual plotting function
    """
This is what I came up with. I have just two questions:
Is there a parameter to make the legend horizontal, instead of using ncol?
How can I prevent the figure from being displayed when calling the function like this?
fig,ax=drawPCA(df1,hue="f1",style="f2",size="f3")
# may make further changes to the figure
Here is the function:
def drawPCA2(df, **kwargs):
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.figure import figaspect
    nUniVals = sum([df[i].unique().size for i in kwargs.values()])
    nKeys = len(kwargs.keys())
    w, h = figaspect(1/3)
    fig1, axs = plt.subplots(ncols=3, figsize=(w, h))
    fig1.suptitle("All the PCs")
    sns.scatterplot(x="PC1", y="PC2", data=df, legend=False, ax=axs[0], **kwargs)
    sns.scatterplot(x="PC1", y="PC3", data=df, legend=False, ax=axs[1], **kwargs)
    sns.scatterplot(x="PC2", y="PC3", data=df, ax=axs[2], label="", **kwargs)
    handles, labels = axs[2].get_legend_handles_labels()
    fig1.legend(handles, labels, loc='lower center', bbox_to_anchor=(0.5, 0.85), ncol=nUniVals+nKeys)
    axs[2].get_legend().remove()
    fig1.tight_layout(rect=[0, 0.03, 1, 0.9])
    return fig1, axs
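Neither question is answered in this excerpt. As far as I know, Matplotlib's legend has no orientation switch: ncol (also accepted as ncols since Matplotlib 3.6) sets the number of columns, so ncol=len(labels) gives a single horizontal row. To keep a notebook from displaying a figure that a function creates, a common approach is to call plt.close(fig) before returning; the returned Figure and Axes stay usable, e.g. for fig.savefig(). The sketch below uses a hypothetical draw_pca_sketch helper built on plain plt.scatter instead of seaborn, and applies both ideas to df1 from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Example data from the question.
df1 = pd.DataFrame({
    "PC1": [-2.2, -2.0, 2.04, 0.97],
    "PC2": [0.5, -0.6, 0.9, -0.5],
    "PC3": [-0.1, -0.2, 0.2, 0.8],
    "f1": ["a", "a", "b", "b"],
})

def draw_pca_sketch(df, hue):
    """Hypothetical helper: 1x3 PC scatterplots, one-row figure legend, not auto-shown."""
    pairs = [("PC1", "PC2"), ("PC1", "PC3"), ("PC2", "PC3")]
    fig, axs = plt.subplots(ncols=3, figsize=(12, 4))
    for ax, (x, y) in zip(axs, pairs):
        # one scatter call per hue level so each level gets its own legend entry
        for level, group in df.groupby(hue):
            ax.scatter(group[x], group[y], label=str(level))
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    handles, labels = axs[0].get_legend_handles_labels()
    # one column per entry lays the legend out as a single horizontal row
    fig.legend(handles, labels, loc="upper center", ncol=len(labels))
    fig.tight_layout(rect=[0, 0, 1, 0.9])
    # detach from pyplot so inline backends do not display it at the end of the cell
    plt.close(fig)
    return fig, axs

fig, axs = draw_pca_sketch(df1, hue="f1")
```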

Adding Caption to Graph in matplotlib

I have a conceptual problem with the basic structure of matplotlib.
I want to add a caption to a graph, and I do understand the advice given in "Is there a way of drawing a caption box in matplotlib".
However, I do not know how to combine this with the pandas DataFrame I have.
Without the structure from the link above, my code looks like this (projects1 being my pandas DataFrame):
ax2=projects1.T.plot.bar(stacked=True)
ax2.set_xlabel('Year',size=20)
and it returns a barplot.
But when I try to apply the structure from above, I get stuck. I tried:
fig = plt.figure()
ax2 = fig.add_axes((.1,.4,.8,.5))
ax2.plot.bar(projects1.T,stacked=True)
And it results in various errors.
So the question is: how do I apply the structure from the link above to a pandas DataFrame, and to more complex graphs than a simple line? Thanks.
The pandas plot function has an optional argument ax, which can be used to pass an externally created matplotlib Axes instance to the pandas plot.
import matplotlib.pyplot as plt
import pandas as pd
projects1 = ...  # your existing DataFrame
fig = plt.figure()
ax2 = fig.add_axes((.1,.4,.8,.5))
projects1.T.plot.bar(stacked=True, ax = ax2)
ax2.set_xlabel('Year',size=20)
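The caption technique from the linked question then amounts to leaving space below the axes and writing into it with fig.text. A sketch with a hypothetical projects1 (categories as rows, years as columns, so that .T puts the years on the x axis) and hypothetical caption text:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for projects1: rows are projects, columns are years.
projects1 = pd.DataFrame(
    {"2019": [3, 5], "2020": [4, 2], "2021": [6, 1]},
    index=["Project A", "Project B"],
)

fig = plt.figure()
# Axes occupy x 0.1-0.9 and y 0.4-0.9 of the figure, leaving room below for a caption.
ax2 = fig.add_axes((0.1, 0.4, 0.8, 0.5))
projects1.T.plot.bar(stacked=True, ax=ax2)  # pandas draws onto the axes we created
ax2.set_xlabel("Year", size=20)

caption = "Figure 1: Projects per year, stacked by project."  # hypothetical text
fig.text(0.1, 0.1, caption, fontsize=10)  # figure coordinates, below the axes
plt.show()
```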

secondary_y=True changes x axis in pandas

I'm trying to plot two series together in Pandas, from different dataframes.
Both are indexed by datetime objects, so they can be plotted together:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").mean().plot()
plt.show()
Yields:
All fine, but I need the green graph to have its own scale. So I use the secondary_y argument:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").mean().plot(secondary_y=True)
plt.show()
This secondary_y creates a problem: instead of the desired graph, I get the following:
Any help with this is hugely appreciated.
(Less relevant note: I'm (evidently) using pandas and Matplotlib, all in an IPython notebook.)
EDIT:
I've since noticed that removing the resample("W") solves the issue. It is still a problem, however, as the non-resampled data is too noisy to be readable. Being able to plot resampled data on a secondary axis would be hugely helpful.
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
df.a = df.a*100
fig, ax1 = plt.subplots(1,1)
df.a.plot(ax=ax1, color='blue', label='a')
ax2 = ax1.twinx()
df.b.plot(ax=ax2, color='green', label='b')
ax1.set_ylabel('a')
ax2.set_ylabel('b')
ax1.legend(loc=3)
ax2.legend(loc=0)
plt.show()
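The twinx approach also covers the resampled case from the question. Below is a minimal sketch in which hypothetical daily data stands in for amazon_prices.Close and BULL_MINUS_BEAR. Current pandas requires an aggregation such as .mean() after resample("W"). Plotting both series with plain Matplotlib calls keeps them on one date-based x axis and sidesteps pandas' own time-series axis handling. That handling is, I assume (this is not verified here), what shifts the x axis when the two frequencies differ.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a daily price series and a noisy daily sentiment series.
days = pd.date_range("2020-01-01", periods=120, freq="D")
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.standard_normal(120).cumsum(), index=days)
sentiment = pd.Series(rng.standard_normal(120), index=days)

# Weekly mean smooths the noise; resample() needs an explicit aggregation.
weekly = sentiment.resample("W").mean()

fig, ax1 = plt.subplots()
# Plain Matplotlib plot calls put both series on the same date-based x axis.
ax1.plot(close.index, close.values, color="blue", label="Close")
ax1.set_ylabel("Close")

ax2 = ax1.twinx()  # shares the x axis, independent y scale
ax2.plot(weekly.index, weekly.values, color="green", label="Weekly mean")
ax2.set_ylabel("BULL_MINUS_BEAR (weekly mean)")

fig.autofmt_xdate()
plt.show()
```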
I had the same issue, always getting a strange plot when I wanted a secondary_y.
I don't know why no one mentioned this method in this post, but here's how I got it to work, using the same example as cphlewis:
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
ax = df.plot(secondary_y=['b'])
plt.show()
Here's what it'll look like
