Plotting Pandas DataFrames into Pie Charts using matplotlib - python

Is it possible to plot a DataFrame as a pie chart using matplotlib? The Pandas documentation on chart visualization has instructions for plotting a lot of chart types, including bar, histogram, scatter plot, etc., but the pie chart seems to be missing.

To plot a pie chart from a DataFrame df you can use pandas' plot.pie:
df.plot.pie(y='column_name')
Example:
import pandas as pd
df = pd.DataFrame({'activity': ['Work', 'Sleep', 'Play'],
'hours': [8, 10, 6]})
df.set_index('activity', inplace=True)
print(df)
# hours
# activity
# Work 8
# Sleep 10
# Play 6
plot = df.plot.pie(y='hours', figsize=(7, 7))
Note that the labels of the pie chart are the index entries; this is the reason for using set_index to set the index to activity.
To style the plot, you can use all the arguments that can be passed to DataFrame.plot(). Here is an example showing percentages:
plot = df.plot.pie(y='hours', title="Title", legend=False,
                   autopct='%1.1f%%', explode=(0, 0, 0.1),
                   shadow=True, startangle=0)
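As a quick sanity check on what autopct='%1.1f%%' will print on each slice, the shares can be computed by hand. This is only a sketch reusing the activity/hours frame from the example above; the names share and labels are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'activity': ['Work', 'Sleep', 'Play'],
                   'hours': [8, 10, 6]}).set_index('activity')

# Share of each slice in percent, i.e. the numbers that autopct formats
share = df['hours'] / df['hours'].sum() * 100

# Apply the same format string that is passed as autopct
labels = ['%1.1f%%' % value for value in share]
print(labels)  # ['33.3%', '41.7%', '25.0%']
```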

Pandas has this built into DataFrame.plot(). All you have to do is pass kind='pie' and tell it which column you want (or use subplots=True to get all columns). This automatically adds the labels for you and, with autopct, the percentage labels as well.
import matplotlib.pyplot as plt
df.Data.plot(kind='pie')
To customize it a little more, you can do this:
fig = plt.figure(figsize=(6,6), dpi=200)
ax = plt.subplot(111)
df.Data.plot(kind='pie', ax=ax, autopct='%1.1f%%', startangle=270, fontsize=17)
Here you pass ax=ax to the plot call so that it draws on your own axes. You can also use all the normal matplotlib plt.pie() keyword arguments, as shown above.
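The subplots=True variant mentioned above could look roughly like the sketch below. The two-column frame (weekday, weekend) is made-up data for illustration, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd

# Hypothetical frame with two numeric columns sharing one index
df = pd.DataFrame({'weekday': [8, 8, 8], 'weekend': [2, 10, 12]},
                  index=['Work', 'Sleep', 'Play'])

# One pie per column; the return value is an array of axes
axes = df.plot(kind='pie', subplots=True, figsize=(10, 5),
               autopct='%1.1f%%', legend=False)
n_pies = len(axes)
```

Each column gets its own axes, so axes[0] and axes[1] can be styled separately afterwards.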

import matplotlib.pyplot as plt
from pandas import DataFrame
plt.pie(DataFrame([1,2,3]))
seems to work as expected. If the DataFrame has more than one column, it will raise an error.
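Since matplotlib's pie() works on 1-D data, a more defensive form of the call above is to select a single column and pass the index as labels. A minimal sketch, assuming a frame with an hours column like the one in the first answer:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'hours': [8, 10, 6]}, index=['Work', 'Sleep', 'Play'])

fig, ax = plt.subplots(figsize=(6, 6))
# Pass one column (1-D) as the values and the index entries as labels
ax.pie(df['hours'], labels=list(df.index), autopct='%1.1f%%')

# Each slice is added to the axes as one wedge patch
n_wedges = len(ax.patches)
```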

Related

Overlaying Pandas plot with Matplotlib is sensitive to the plotting order

I have the following problem: I'm trying to overlay two plots: one pandas plot via plot.area() for a dataframe, and a second plot that is a standard Matplotlib plot. Depending on the order of the code for those two, the Matplotlib plot is displayed only if its code comes before the pandas plot.area() call on the same axes.
Example: I have a pandas dataframe called revenue that has a DatetimeIndex and a single column with "revenue" values (float). Separately, I have a dataset called projection with data along the same index (revenue.index).
If the code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
# First -- Pandas area plot
revenue.plot.area(ax = ax)
# Second -- Matplotlib line plot
ax.plot(revenue.index, projection, color='black', linewidth=3)
plt.tight_layout()
plt.show()
Then the only thing displayed is the pandas plot.area() like this:
1/ Pandas plot.area() and 2/ Matplotlib line plot
However, if the order of the plotting is reversed:
fig, ax = plt.subplots(figsize=(10, 6))
# First -- Matplotlib line plot
ax.plot(revenue.index, projection, color='black', linewidth=3)
# Second -- Pandas area plot
revenue.plot.area(ax = ax)
plt.tight_layout()
plt.show()
Then the plots are overlaid properly, like this:
1/ Matplotlib line plot and 2/ Pandas plot.area()
Can someone please explain what I'm doing wrong, or what I need to do to make the code more robust? Thanks in advance.
The values on the x-axis are different in the two plots. I think DataFrame.plot.area() formats the DatetimeIndex in a pretty way, which is not compatible with pyplot.plot().
If you plot the projection first, plot.area() can still plot the data and does not reformat the x-axis.
Mixing the two seems tricky to me, so I would use either pyplot or DataFrame.plot for both the area and the line:
import pandas as pd
from matplotlib import pyplot as plt
projection = [1000, 2000, 3000, 4000]
datetime_series = pd.to_datetime(["2021-12","2022-01", "2022-02", "2022-03"])
datetime_index = pd.DatetimeIndex(datetime_series.values)
revenue = pd.DataFrame({"value": [1200, 2200, 2800, 4100]})
revenue = revenue.set_index(datetime_index)
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
# Option 1: only pyplot
ax[0].fill_between(revenue.index, revenue.value)
ax[0].plot(revenue.index, projection, color='black', linewidth=3)
ax[0].set_title("Pyplot")
# Option 2: only DataFrame.plot
revenue["projection"] = projection
revenue.plot.area(y='value', ax=ax[1])
revenue.plot.line(y='projection', ax=ax[1], color='black', linewidth=3)
ax[1].set_title("DataFrame.plot")
The results then look like this, where DataFrame.plot gives a much cleaner looking result:
If you do not want the projection in the revenue DataFrame, you can put it in a separate DataFrame and set the index to match revenue:
projection_df = pd.DataFrame({"projection": projection})
projection_df = projection_df.set_index(datetime_index)
projection_df.plot.line(ax=ax[1], color='black', linewidth=3)
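One way to check the diagnosis above is to compare the x-limits that each plotting route ends up with. The sketch below is only a diagnostic under assumptions: it rebuilds a small monthly revenue frame like the one in the answer, and the names ax_pandas and ax_mpl are introduced here. If the two printed ranges differ widely, the two calls are using different x units, which would explain why a line added afterwards falls outside the visible range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Monthly index, similar to the example data in the answer
index = pd.date_range("2021-12-01", periods=4, freq="MS")
revenue = pd.DataFrame({"value": [1200, 2200, 2800, 4100]}, index=index)

fig, (ax_pandas, ax_mpl) = plt.subplots(1, 2, figsize=(10, 4))
revenue.plot.area(ax=ax_pandas)                        # pandas route
ax_mpl.fill_between(revenue.index, revenue["value"])   # matplotlib route

# Compare the numeric x ranges that each axes ended up with
pandas_xlim = ax_pandas.get_xlim()
mpl_xlim = ax_mpl.get_xlim()
print("pandas x-limits:    ", pandas_xlim)
print("matplotlib x-limits:", mpl_xlim)
```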

How to assign a list of colors to histogram index bars in matplotlib?

I have the following dataframe "freqs2" with an index (SD to SD17) and associated values (frequencies):
freqs
SD 101
SD2 128
...
SD17 65
I would like to assign a list of specific colors (in order), one to each index entry. I've tried the following code:
colors=['#e5243b','#DDA63A', '#4C9F38','#C5192D','#FF3A21','#26BDE2','#FCC30B','#A21942','#FD6925','#DD1367','#FD9D24','#BF8B2E','#3F7E44','#0A97D9','#56C02B','#00689D','#19486A']
freqs2.plot.bar(freqs2.index, legend=False,rot=45,width=0.85, figsize=(12, 6),fontsize=(14),color=colors )
plt.ylabel('Frequency',fontsize=(17))
As a result, all the bars in my chart are red (the first color of the list).
Based on similar questions, I've tried passing "freqs2.index" to indicate that the list of colors applies to the index, but the problem stays the same.
It looks like a bug in pandas. Plotting directly in matplotlib or using seaborn (which I recommend) works:
import matplotlib.pyplot as plt
import seaborn as sns
# df is the dataframe from the question (called freqs2 there), with a 'freqs' column
colors=['#e5243b','#dda63a', '#4C9F38','#C5192D','#FF3A21','#26BDE2','#FCC30B','#A21942','#FD6925','#DD1367','#FD9D24','#BF8B2E','#3F7E44','#0A97D9','#56C02B','#00689D','#19486A']
# # plotting directly with matplotlib works too:
# fig = plt.figure()
# ax = fig.add_axes([0,0,1,1])
# ax.bar(x=df.index, height=df['freqs'], color=colors)
ax = sns.barplot(data=df, x=df.index, y='freqs', palette=colors)
ax.tick_params(axis='x', labelrotation=45)
plt.ylabel('Frequency',fontsize=17)
plt.show()
Edit: an issue already exists on GitHub.
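The matplotlib route that is commented out in the answer can be sketched as a self-contained example. The three-row frame below is a stand-in built from the values shown in the question, not the full 17-row data, and the color check at the end is only there to confirm that each bar received its own color.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import pandas as pd

# Stand-in for the question's dataframe (only three of the rows)
df = pd.DataFrame({'freqs': [101, 128, 65]}, index=['SD', 'SD2', 'SD17'])
colors = ['#e5243b', '#dda63a', '#4c9f38']

fig, ax = plt.subplots(figsize=(12, 6))
# ax.bar takes one color per bar when color is a list of the same length
bars = ax.bar(df.index, df['freqs'], color=colors, width=0.85)
ax.tick_params(axis='x', labelrotation=45)
ax.set_ylabel('Frequency', fontsize=17)

# Read the face colors back to confirm the per-bar assignment
bar_colors = [to_hex(bar.get_facecolor()) for bar in bars]
```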

Is it possible to combine 2 different styles in Matplotlib or seaborn in one plot?

I don't know if it's possible with Matplotlib, seaborn or another tool to plot 1 line and 1 bar series (candlestick style), both in one figure, like the image below (made in Excel):
The x-axis and y-axis are the same.
Following the response below, I chose mplfinance.
I have the following dataframe (daily),
and with the following function we can plot it:
import mplfinance as mpf

def ploting_chart(daily):
    # Take marketcolors from 'yahoo'
    mc = mpf.make_marketcolors(base_mpf_style='yahoo',up='#ff3300',down='#009900',inherit=True)
    # Create a style based on `seaborn` using those market colors:
    s = mpf.make_mpf_style(base_mpl_style='seaborn',marketcolors=mc,y_on_right=True,
                           gridstyle = 'solid' , mavcolors = ['#4d79ff','#d24dff']
                           )
    # **kwargs
    kwargs = dict(
        type='candle',mav=(7,15),volume=True, figratio=(11,8),figscale=2,
        title = 'Covid-19 Madagascar en traitement',ylabel = 'Total en traitement',
        update_width_config=dict(candle_linewidth=0.5,candle_width=0.5),
        ylabel_lower = 'Total'
    )
    # Plot my new custom mpf style:
    mpf.plot(daily,**kwargs,style=s,scale_width_adjustment=dict(volume=0.4))
I get the final result
Yes, plt.figure or plt.subplots gives you a figure object (plt.subplots also returns the axes), and you can then draw as many plots as you want on the same axes. For example:
import matplotlib.pyplot as plt
import seaborn as sns
fmri = sns.load_dataset("fmri")
f,ax = plt.subplots(1,1,figsize=(10,7)) # make a subplot of 1 row and 1 column
g1 = sns.lineplot(x="timepoint", y="signal", data=fmri,ax=ax) # passing ax=ax is a must
# The next two lines are placeholders, not real functions:
g2 = sns.some_other_chart(your_data, ax=ax)
g3 = ax.some_matplotlib_chart(your_data) # no need to pass ax=ax
Seaborn does not support candlestick charts, but you can plot one using matplotlib on the same axes.
# matplotlib.finance was removed from matplotlib; candlestick_ohlc is now provided by the mplfinance package
from mplfinance.original_flavor import candlestick_ohlc
candlestick_ohlc(ax, data.values, width=0.6, colorup='g', colordown='r') # just dummy code to explain; note the ax object passed as the first argument
You can even use pandas df.plot(kind='bar',ax=ax,**kwargs) to plot within the same axes object.
Note: Some seaborn functions, such as relplot, do not support plotting on an existing ax because they are figure-level functions that create their own grid.
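To make the shared-axes idea above concrete without the placeholder calls, here is a minimal sketch that draws a bar series and a line on one axes using plain matplotlib. The data frame and its column names (day, total, average) are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: a daily total and a smoothed value for the same days
data = pd.DataFrame({'day': [1, 2, 3, 4, 5],
                     'total': [3, 5, 4, 6, 8],
                     'average': [3.0, 4.0, 4.0, 4.5, 5.2]})

fig, ax = plt.subplots(figsize=(10, 7))
# Both calls draw on the same axes, so they share the x- and y-axis
ax.bar(data['day'], data['total'], color='lightgray', label='total')
ax.plot(data['day'], data['average'], color='black', linewidth=2, label='average')
ax.legend()

n_bars, n_lines = len(ax.patches), len(ax.lines)
```

The same pattern applies when one of the two layers comes from seaborn or pandas, as long as that call accepts ax=ax.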
Yes, mplfinance allows you to plot multiple data sets, on the same plot, or on multiple subplots, where each one can be any of candlestick, ohlc-bars, line, scatter, or bar chart.
For more information, see for example:
Adding Your Own Technical Studies to Plots
Subplots: Multiple Plots on a Single Figure, including:
The Panels Method
External Axes Method
Note, as a general rule, it is recommended to not use the "External Axes Method" if what you are trying to accomplish can be done otherwise with mplfinance in panels mode.

Vertically align time series (plot and barplot) sharing same x-axis in matplotlib

Is there an easy way to align two subplots of a time series of different kinds (plot and barplot) in matplotlib? I use the pandas wrapper since I am dealing with pd.Series objects:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1)
series.head(3).plot(marker='o', ax=axes[0])
series.head(3).plot.bar(ax=axes[1])
plt.tight_layout()
The result is not visually great. It would be great to keep the code simple and:
Vertically align data points in the top plot to the bars on the bottom plot
Share the axis of the bar plot with the first and remove the visibility on x-axis labels of the top plot altogether (but keep grids whenever present)
Based on the ideas thrown in the comments, I think that this is the simplest solution (giving up the pandas API), which is exactly what I needed:
import pandas as pd
import matplotlib.pyplot as plt
series = pd._testing.makeTimeSeries()
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.head(3), marker='o')
axes[1].bar(series.head(3).index, series.head(3))
plt.tight_layout()
Cases with missing values, where the xticks are not plotted daily, may need a fix on the xticks (e.g. plt.xticks(series.head(3).index)).
Thanks for the help!
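For reference, the same approach can be written without the private pd._testing helper. The sketch below uses a small hand-made series with a gap in the dates (an assumption, to show the xticks fix) and pins the ticks to the actual data points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical series with one missing day (2000-01-05)
index = pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-06"])
series = pd.Series([0.5, -0.2, 0.9], index=index)

# sharex=True aligns both subplots and hides the top x tick labels
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(series.index, series.values, marker='o')
axes[0].grid(True)
axes[1].bar(series.index, series.values)

# Put ticks exactly on the data points, even with the gap
axes[1].set_xticks(series.index)
plt.tight_layout()

n_ticks = len(axes[1].get_xticks())
```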

Pandas bar plot with specific colors and legend location?

I have a pandas DataFrame and I want to plot a bar chart that includes a legend.
import pylab as pl
from pandas import *
x = DataFrame({"Alpha": Series({1: 1, 2: 3, 3:2.5}), "Beta": Series({1: 2, 2: 2, 3:3.5})})
If I call plot directly, then it puts the legend above the plot:
x.plot(kind="bar")
If I turn off the legend in the plot and try to add it later, then it doesn't retain the colors associated with the two columns in the DataFrame (see below):
x.plot(kind="bar", legend=False)
l = pl.legend(('Alpha','Beta'), loc='best')
What's the right way to include a legend in a matplotlib plot from a Pandas DataFrame?
The most succinct way to go is:
x.plot(kind="bar").legend(bbox_to_anchor=(1.2, 0.5))
or in general
x.plot(kind="bar").legend(*args, **kwargs)
If you want to add the legend manually, you have to ask the subplot for the elements of the bar plot:
In [17]: ax = x.plot(kind='bar', legend=False)
In [18]: patches, labels = ax.get_legend_handles_labels()
In [19]: ax.legend(patches, labels, loc='best')
Out[19]: <matplotlib.legend.Legend at 0x10b292ad0>
Also, plt.legend(loc='best') or ax.legend(loc='best') should "just work", because there are already "links" to the bar plot patches set up when the plot is made, so you don't have to pass a list of axis labels.
I'm not sure if the version of pandas you're using returns a handle to the subplot (ax = ...) but I'm fairly certain that 0.7.3 does. You can always get a reference to it with plt.gca().
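Putting the pieces of this answer together, a script version (without the IPython prompts) might look like the sketch below. It rebuilds the question's frame x and places the legend to the right of the axes; the loc and bbox_to_anchor values are just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

x = pd.DataFrame({"Alpha": pd.Series({1: 1, 2: 3, 3: 2.5}),
                  "Beta": pd.Series({1: 2, 2: 2, 3: 3.5})})

ax = x.plot(kind="bar", legend=False)

# Ask the axes for the bar handles so the legend keeps the column colors
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc="center left", bbox_to_anchor=(1.0, 0.5))
ax.figure.tight_layout()

print(labels)  # ['Alpha', 'Beta']
```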
