Stacked barplot with some customizations using matplotlib - python

I am trying to produce a stacked bar plot showing several categories. However, the dataframe in its current form makes it difficult to stack the categories. Also, some years have no counts, and these should be removed. Any ideas on how to produce something like the picture below? I highly appreciate your time.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/test_barplot.csv")
fig,ax=plt.subplots(figsize=(15,8))
ax.bar(df.Year,df.Number)
plt.xticks(np.arange(2000,2022),np.arange(2000,2022))
plt.xlabel("Year", fontsize=15)
plt.ylabel("Number", fontsize=15)
plt.xticks()
plt.show()

You can reshape the data such that the stacked categories are columns. Then you can use pandas plot.bar with stacked=True. reindex adds the missing years.
fig, ax=plt.subplots(figsize=(15,8))
df_stack = df.pivot_table(index="Year",
                          columns="Category",
                          values="Number",
                          aggfunc="sum")
df_stack = df_stack.reindex(np.arange(2000, 2022))
df_stack.plot.bar(stacked=True, ax=ax)
plt.xlabel("Year", fontsize=15)
plt.ylabel("Number", fontsize=15)
Agriculture appears twice in the legend because the data contains the label once with and once without a trailing space.
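A minimal sketch of one way to handle both points: strip the labels before pivoting so the two spellings of Agriculture land in one column, and keep only years whose total is above zero instead of reindexing. The small frame is a hypothetical stand-in for the CSV (its values are made up), and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the CSV; note "Agriculture " with a trailing space.
df = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2002, 2003, 2003],
    "Category": ["Agriculture", "Forest", "Agriculture ", "Forest", "Forest", "Agriculture"],
    "Number": [3, 2, 4, 0, 1, 5],
})

# Remove stray whitespace so both spellings share one column after the pivot.
df["Category"] = df["Category"].str.strip()

df_stack = df.pivot_table(index="Year", columns="Category",
                          values="Number", aggfunc="sum").fillna(0)

# Keep only the years that have at least one count.
df_stack = df_stack[df_stack.sum(axis=1) > 0]

fig, ax = plt.subplots(figsize=(8, 4))
df_stack.plot.bar(stacked=True, ax=ax)
ax.set_xlabel("Year", fontsize=15)
ax.set_ylabel("Number", fontsize=15)
```

Because the bars are drawn at categorical positions, dropping a year simply removes its slot from the axis rather than leaving a gap.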

Related

Plot separately different parts of a dataframe

I would like to plot from the seaborn dataset 'tips'.
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
x1 = tips.loc[(df['time']=='lunch'), 'tip']
x2 = tips.loc[(df['time']=='dinner'),'tip']
x1.plot.kde(color='orange')
x2.plot.kde(color='blue')
plt.show()
I don't know exactly where it's wrong...
Thanks for the help.
Seaborn's sns.kdeplot() supports the hue argument to split the plot between different categories:
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
sns.kdeplot(data=tips, x='tip', hue='time')
Of course your approach could work too, but there are several problems with your code:
What is df? Shouldn't that be tips?
The category names Lunch and Dinner must be capitalized, as in the data.
The .loc[mask, 'tip'] form works once the mask is built from tips itself; a shorter equivalent is x1 = tips.tip[tips['time'] == 'Lunch'].
If you want to plot two KDEs in the same diagram, they should be scaled according to sample size. With the approach above, seaborn does that automatically (common_norm=True is the default).
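The selection fixes from the list above can be sketched as follows. The frame is a hypothetical stand-in for sns.load_dataset("tips") so the example runs offline; only the selection step is shown, and the two .plot.kde() calls from the question can then be applied to x1 and x2 unchanged.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset (made-up values, same column names).
tips = pd.DataFrame({
    "tip": [1.0, 3.5, 2.0, 4.0, 1.5],
    "time": pd.Categorical(["Lunch", "Dinner", "Lunch", "Dinner", "Dinner"]),
})

# Build the mask from the same frame that is indexed, using the exact label spelling.
x1 = tips.tip[tips["time"] == "Lunch"]
x2 = tips.tip[tips["time"] == "Dinner"]

# A lowercase label matches no rows, so this selection is empty.
no_match = tips.tip[tips["time"] == "lunch"]

print(len(x1), len(x2), len(no_match))
```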
Because you are loading one of seaborn's built-in datasets, check the spelling of the names you use: column names and category values are case-sensitive, so replace them with the exact names from the data.
You can also plot the cumulative distribution of total_bill for each time category as follows:
sns.kdeplot(
    data=tips, x="total_bill", hue="time",
    cumulative=True, common_norm=False, common_grid=True,
)

Vertically align time series (plot and barplot) sharing same x-axis in matplotlib

Is there an easy way to align two subplots of a time series of different kinds (plot and barplot) in matplotlib? I use the pandas wrapper since I am dealing with pd.Series objects:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1)
series.head(3).plot(marker='o', ax=axes[0])
series.head(3).plot.bar(ax=axes[1])
plt.tight_layout()
The result does not look great. I would like to keep the code simple and:
Vertically align the data points in the top plot with the bars in the bottom plot
Share the x-axis of the bar plot with the top plot and hide the x-axis labels of the top plot altogether (but keep the grid lines if present)
Based on the ideas thrown in the comments, I think that this is the simplest solution (giving up the pandas API), which is exactly what I needed:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.head(3), marker='o')
axes[1].bar(series.head(3).index, series.head(3))
plt.tight_layout()
If some dates are missing and the ticks no longer fall on every data point, set them explicitly (e.g. plt.xticks(series.head(3).index)).
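A small sketch of that tick fix, assuming a series with a gap in its dates. The series is built by hand here (hypothetical values) rather than with pd._testing.makeTimeSeries, which is a private helper and may not exist in every pandas version. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical series with a gap: nothing recorded on 2000-01-05 and 2000-01-06.
dates = pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-07"])
series = pd.Series([0.5, -1.2, 0.8], index=dates)

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.index, series.values, marker="o")
axes[1].bar(series.index, series.values)

# One tick per observation, set on the bottom subplot.
axes[1].set_xticks(series.index)

# Rotate the date labels on the bottom plot and hide them on the top one.
fig.autofmt_xdate()
plt.tight_layout()
```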
Thanks for the help!

Display all x values of a graph

I know this has been asked before, but I could not solve my problem.
I have three pandas columns: one with dates and two with values.
I can plot the two curves against the date.
However, I cannot display all the dates on the x axis. Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
# mau_file is the pandas dataframe with three columns.
plt.figure()
mau_file.plot(x='month_date', y=['mau', 'nb_migs'], figsize=(10,5), grid=True)
plt.set_xticklabels(mau_file['month_date'])
plt.legend(loc='best')
plt.show()
Usually, plt.xticks() is used to set which x-axis values are displayed.
I am not sure it is 100% compatible with a pandas structure, so you may need to store your data in a plain list or a numpy array first.
See the documentation of plt.xticks().
EDIT: It is also possible to choose the orientation of the labels.
For example, plt.xticks(x, labels, rotation='vertical') will give you vertical labels.
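A minimal sketch of this suggestion, assuming the dates are real datetimes: the columns are converted to numpy arrays and plotted with matplotlib directly, then plt.xticks places one vertical label per row. The frame below is a hypothetical stand-in for mau_file with made-up values, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mau_file (three columns, monthly dates).
mau_file = pd.DataFrame({
    "month_date": pd.date_range("2020-01-01", periods=6, freq="MS"),
    "mau": [120, 135, 150, 160, 158, 170],
    "nb_migs": [10, 12, 9, 14, 13, 15],
})

x = mau_file["month_date"].to_numpy()                       # datetime64 array
labels = mau_file["month_date"].dt.strftime("%Y-%m").tolist()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, mau_file["mau"].to_numpy(), label="mau")
ax.plot(x, mau_file["nb_migs"].to_numpy(), label="nb_migs")
ax.grid(True)

# One tick per row of the frame, with vertical labels.
plt.xticks(x, labels, rotation="vertical")
ax.legend(loc="best")
```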

secondary_y=True changes x axis in pandas

I'm trying to plot two series together in pandas, from different dataframes.
Both are indexed by datetime objects, so they can be plotted together:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot()
plt.plot()
Yields:
All fine, but I need the green graph to have its own scale. So I use the secondary_y option:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot(secondary_y=True)
plt.plot()
This secondary_y creates a problem, as instead of having the desired graph, I have the following:
Any help with this is hugely appreciated.
(Less relevant notes: I'm (evidently) using pandas and Matplotlib, and all this is in an IPython notebook.)
EDIT:
I've since noticed that removing the resample("W") solves the issue. It is still a problem however as the non-resampled data is too noisy to be visible. Being able to plot sampled data with a secondary axis would be hugely helpful.
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
df.a = df.a*100
fig, ax1 = plt.subplots(1,1)
df.a.plot(ax=ax1, color='blue', label='a')
ax2 = ax1.twinx()
df.b.plot(ax=ax2, color='green', label='b')
ax1.set_ylabel('a')
ax2.set_ylabel('b')
ax1.legend(loc=3)
ax2.legend(loc=0)
plt.show()
I had the same issue, always getting a strange plot when I wanted a secondary_y.
I don't know why no-one mentioned this method in this post, but here's how I got it to work, using the same example as cphlewis:
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
ax = df.plot(secondary_y=['b'])
plt.show()
Here's what it'll look like
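Neither answer above covers the resampled case from the question's EDIT. The following is only a sketch of one way it might be handled, not a confirmed fix for the original plot: aggregate the weekly resample explicitly with .mean() (current pandas versions require an aggregation after resample), then draw both series with matplotlib on twin axes. The two series are hypothetical random data standing in for amazon_prices.Close and BULL_MINUS_BEAR.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2015-01-01", periods=70, freq="D")

# Hypothetical stand-ins for the two series in the question.
close = pd.Series(300 + rng.normal(0, 5, len(days)).cumsum(), index=days)
sentiment = pd.Series(rng.normal(0, 1, len(days)), index=days)

# Weekly mean smooths the noisy daily signal.
weekly = sentiment.resample("W").mean()

fig, ax1 = plt.subplots()
ax1.plot(close.index, close.values, color="blue", label="Close")
ax1.set_ylabel("Close")

# Second y-axis that shares the same x-axis.
ax2 = ax1.twinx()
ax2.plot(weekly.index, weekly.values, color="green", label="weekly mean")
ax2.set_ylabel("BULL_MINUS_BEAR (weekly mean)")
```

Plotting through matplotlib rather than Series.plot keeps both lines on the same datetime scale even though their sampling frequencies differ.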

Plotting Pandas DataFrames in to Pie Charts using matplotlib

Is it possible to plot a DataFrame as a pie chart using matplotlib? The pandas documentation on chart visualization has instructions for plotting a lot of chart types, including bar, histogram, and scatter plots, but the pie chart seems to be missing.
To plot a pie chart from a dataframe df you can use pandas' plot.pie:
df.plot.pie(y='column_name')
Example:
import pandas as pd
df = pd.DataFrame({'activity': ['Work', 'Sleep', 'Play'],
'hours': [8, 10, 6]})
df.set_index('activity', inplace=True)
print(df)
# hours
# activity
# Work 8
# Sleep 10
# Play 6
plot = df.plot.pie(y='hours', figsize=(7, 7))
Note that the labels of the pie chart are the index entries, this is the reason for using set_index to set the index to activity.
To style the plot, you can use any of the arguments accepted by DataFrame.plot(); extra keywords such as autopct are forwarded to matplotlib's pie. Here is an example showing percentages:
plot = df.plot.pie(y='hours', title="Title", legend=False,
                   autopct='%1.1f%%', explode=(0, 0, 0.1),
                   shadow=True, startangle=0)
Pandas has this built into pd.DataFrame.plot(). All you have to do is pass kind='pie' and tell it which column you want (or use subplots=True to get all columns). This adds the labels for you automatically, and percentage labels can be added with autopct.
import matplotlib.pyplot as plt
df.Data.plot(kind='pie')
To customize it a little more, you can do this:
fig = plt.figure(figsize=(6,6), dpi=200)
ax = plt.subplot(111)
df.Data.plot(kind='pie', ax=ax, autopct='%1.1f%%', startangle=270, fontsize=17)
Here ax=ax tells pandas to draw on the axes you created. You can also use all the usual matplotlib plt.pie() keyword arguments, as shown above.
import matplotlib.pyplot as plt
from pandas import DataFrame
plt.pie(DataFrame([1,2,3])[0])
works as expected when you pass a single column. plt.pie expects one-dimensional data, so a DataFrame with more than one column will raise.
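For the plain matplotlib route, a minimal sketch that passes one column as the values and another as the labels. It reuses the activity/hours example from the first answer; the title text and figure size are arbitrary choices, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"activity": ["Work", "Sleep", "Play"],
                   "hours": [8, 10, 6]})

fig, ax = plt.subplots(figsize=(6, 6))

# Values come from one column, labels from another; autopct adds the percentages.
wedges, texts, autotexts = ax.pie(
    df["hours"],
    labels=df["activity"].tolist(),
    autopct="%1.1f%%",
    startangle=90,
)
ax.set_title("Hours per activity")
```

Unlike df.plot.pie, this does not need the labels to be in the index, so the set_index step can be skipped.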
