How do I use matplotlib autopct? - python

I'd like to create a matplotlib pie chart which has the value of each wedge written on top of the wedge.
The documentation suggests I should use autopct to do this.
autopct: [ None | format string | format function ]
If not None, is a string or function used to label the wedges with their numeric value. The label will be placed inside the wedge. If it is a format string, the label will be fmt%pct. If it is a function, it will be called.
Unfortunately, I'm unsure what this format string or format function is supposed to be.
Using this basic example below, how can I display each numerical value on top of its wedge?
plt.figure()
values = [3, 12, 5, 8]
labels = ['a', 'b', 'c', 'd']
plt.pie(values, labels=labels) #autopct??
plt.show()

autopct enables you to display the percent value using Python string formatting. For example, if autopct='%.2f', then for each pie wedge, the format string is '%.2f' and the numerical percent value for that wedge is pct, so the wedge label is set to the string '%.2f'%pct.
import matplotlib.pyplot as plt
plt.figure()
values = [3, 12, 5, 8]
labels = ['a', 'b', 'c', 'd']
plt.pie(values, labels=labels, autopct='%.2f')
plt.show()
yields
You can do fancier things by supplying a callable to autopct. To display both the percent value and the original value, you could do this:
import matplotlib.pyplot as plt
# make the pie circular by setting the aspect ratio to 1
plt.figure(figsize=plt.figaspect(1))
values = [3, 12, 5, 8]
labels = ['a', 'b', 'c', 'd']
def make_autopct(values):
    def my_autopct(pct):
        total = sum(values)
        val = int(round(pct*total/100.0))
        return '{p:.2f}% ({v:d})'.format(p=pct,v=val)
    return my_autopct
plt.pie(values, labels=labels, autopct=make_autopct(values))
plt.show()
Again, for each pie wedge, matplotlib supplies the percent value pct as the argument, though this time it is sent as the argument to the function my_autopct. The wedge label is set to my_autopct(pct).
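To preview the labels without drawing anything, the call matplotlib makes for each wedge can be imitated in plain Python. This is only a sketch: it assumes pct is 100 * value / sum(values), and preview_labels is a hypothetical helper, not part of matplotlib.

```python
def make_autopct(values):
    """Return a formatter that shows the percent and the original count."""
    total = sum(values)

    def my_autopct(pct):
        # Recover the original value from the percentage; round, don't truncate.
        val = int(round(pct * total / 100.0))
        return '{p:.2f}% ({v:d})'.format(p=pct, v=val)

    return my_autopct


def preview_labels(values, autopct):
    """Hypothetical helper: call autopct once per wedge with its percentage."""
    total = sum(values)
    return [autopct(100.0 * v / total) for v in values]


values = [3, 12, 5, 8]
labels = preview_labels(values, make_autopct(values))
print(labels)
```

The same make_autopct(values) object is what gets passed as autopct= in the plt.pie call above.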

You can do:
plt.pie(values, labels=labels, autopct=lambda p : '{:.2f}% ({:,.0f})'.format(p,p * sum(values)/100))

Using lambda and format may be better
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path = r"C:\Users\byqpz\Desktop\DATA\raw\tips.csv"
df = pd.read_csv(path, engine='python', encoding='utf_8_sig')
days = df.groupby('day').size()
sns.set()
days.plot(kind='pie', title='Number of parties on different days', figsize=[8,8],
autopct=lambda p: '{:.2f}%({:.0f})'.format(p,(p/100)*days.sum()))
plt.show()

Since autopct can be a function used to label the wedges with their numeric value, you can make it return any label or format you need. For me, the easiest way to show a percentage label is a lambda:
autopct = lambda p:f'{p:.2f}%'
or for some cases you can label data as
autopct = lambda p:'any text you want'
and for your code, to show percentage you can use:
plt.figure()
values = [3, 12, 5, 8]
labels = ['a', 'b', 'c', 'd']
plt.pie(values, labels=labels, autopct=lambda p:f'{p:.2f}%, {p*sum(values)/100 :.0f} items')
plt.show()
and result will be like:

autopct enables you to display the percentage value of each slice using Python string formatting.
For example,
autopct = '%.1f' # display the percentage value to 1 decimal place
autopct = '%.2f' # display the percentage value to 2 decimal places
If you want to show the % symbol on the pie chart, you have to write/add:
autopct = '%.1f%%'
autopct = '%.2f%%'
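These are ordinary Python %-format strings, so you can try them in an interpreter before passing them to autopct. A small sketch, using a made-up percentage value in place of the one matplotlib would supply:

```python
pct = 42.857142  # made-up percentage for one wedge

one_decimal = '%.1f' % pct     # number only, 1 decimal place
two_decimals = '%.2f' % pct    # number only, 2 decimal places
with_sign = '%.1f%%' % pct     # '%%' renders a literal percent sign

print(one_decimal, two_decimals, with_sign)
```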

val=int(pct*total/100.0)
should be
val=int((pct*total/100.0)+0.5)
to prevent rounding errors.
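A quick way to see the difference is to compare truncation with rounding. The percentage below is an assumed example, chosen to be slightly below the exact 100 * 12 / 28 as can happen with floating-point values:

```python
total = 28
# Assumed example: a percentage that arrives slightly below 100 * 12 / 28.
pct = 42.85714

raw = pct * total / 100.0     # just under 12
truncated = int(raw)          # int() drops the fractional part
half_up = int(raw + 0.5)      # the fix suggested above
rounded = int(round(raw))     # gives the same result for this value

print(truncated, half_up, rounded)
```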

With the help of the matplotlib gallery and hints from StackOverflow users, I came up with the following pie chart.
The autopct shows the amounts and kinds of ingredients.
import matplotlib.pyplot as plt
# %matplotlib inline  # uncomment when running in a Jupyter notebook
recipe = ["480g Flour", "50g Eggs", "90g Sugar"]
# split each entry into its amount in grams and its ingredient name
amt = [int(x.split('g ')[0]) for x in recipe]
ing = [x.split()[-1] for x in recipe]
fig, ax = plt.subplots(figsize=(5,5), subplot_kw=dict(aspect='equal'))
wedges, text, autotext = ax.pie(amt, labels=ing, startangle=90,
    autopct=lambda p: "{:.0f}g\n({:.1f}%)".format(p*sum(amt)/100, p),
    textprops=dict(color='k', weight='bold', fontsize=8))
ax.legend(wedges, ing, title='Ingredients', loc='best', bbox_to_anchor=(0.35,0.85,0,0))
plt.show()
Pie chart showing the amount and percentage of each ingredient in a sample recipe
Pie chart showing the salary and percent of programming Language users

Related

How to label these points on the scatter plot

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
y='Pos',
c='DarkBlue', xticks=([15,20,25,30,35,40]))
plt.show()
I got the plot but am not able to label the points.
Provided you'd like to label each point, you can loop over each coordinate plotted, assigning it a label using plt.text() at the plotted point's position, like so:
from matplotlib import pyplot as plt
y_points = [i for i in range(0, 20)]
x_points = [(i*3) for i in y_points]
offset = 5
plt.figure()
plt.grid(True)
plt.scatter(x_points, y_points)
for i in range(0, len(x_points)):
    plt.text(x_points[i] - offset, y_points[i], f'{x_points[i]}')
plt.show()
In the above example it will give the following:
The offset is just to make the labels more readable so that they're not right on top of the scattered points.
Obviously we don't have access to your spreadsheet, but the same basic concept would apply.
EDIT
For non numerical values, you can simply define the string as the coordinate. This can be done like so:
from matplotlib import pyplot as plt
y_strings = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
x_values = [i for i, string in enumerate(y_strings)]
# Plot coordinates:
plt.scatter(x_values, y_strings)
for i, string in enumerate(y_strings):
    plt.text(x_values[i], string, f'{x_values[i]}:{string}')
plt.grid(True)
plt.show()
Which will provide the following output:

Pandas: label with a given set of yaxis values category plot

The csv has the following values
Name Grade
Jack B
Jill C
The labels for the y-axis are B and C from the CSV, but I want the y-axis to contain all the grades: A, B, C, D, F. The following plots only the given values (B, C) on the y-axis:
ax = sns.catplot(x = "Name", y = "Grade")
Is there any way to show all the grades on the y-axis?
When you call sns.catplot() without the kind argument, it invokes the default sns.stripplot, which only works if y is numerical. So if you really want this kind of plot, you should code the grades as numbers. You can still show the grade letters in the plot, by assigning them as labels:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# code grades as numbers (A: 1 etc.)
df = pd.DataFrame({'Name': ['Jack', 'Jill'],
'Grade': [2, 3]})
# catplot (i.e. the default stripplot) works, as y is numerical
sns.catplot(x='Name', y='Grade', data=df)
# provide y tick positions and labels (translate numbers back to grade letters)
plt.yticks(range(1, 7), [chr(ord('A') + i) for i in range(6)])
Edit: If you want to have A on top, just add this line at the end:
plt.gca().invert_yaxis()
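Rather than coding the grades by hand, the numbers and the tick labels could both come from one lookup table. This is a minimal sketch assuming the five-letter scale A, B, C, D, F from the question; grade_order and grade_code are illustrative names.

```python
# Assumed grade scale, taken from the question (no 'E').
grade_order = ['A', 'B', 'C', 'D', 'F']
grade_code = {g: i for i, g in enumerate(grade_order, start=1)}  # A -> 1, B -> 2, ...

# Rows from the question's CSV.
rows = [('Jack', 'B'), ('Jill', 'C')]
coded = [(name, grade_code[grade]) for name, grade in rows]

# Positions and labels for the y ticks, in matching order.
tick_positions = list(grade_code.values())
tick_labels = list(grade_code.keys())

print(coded, tick_positions, tick_labels)
```

The coded values would go into the 'Grade' column, and tick_positions and tick_labels would be the two arguments to plt.yticks.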

Modify plot axis so the order of its tick labels and their respective points change accordingly - without modifying the data itself

I want to reorder x-axis tick labels such that the data also changes appropriately.
Example
y = [5,8,9,10]
x = ['a', 'b', 'c', 'd']
plt.plot(y, x)
What I want the plot to look like by modifying the location of axis ticks.
Please note that I don't want to achieve this by modifying the order of my data
My Try
# attempt 1
fig, ax =plt.subplots()
plt.plot(y,x)
ax.set_xticklabels(['b', 'c', 'a', 'd'])
# this just overwrites the labels, not what we intended
# attempt2
fig, ax =plt.subplots()
plt.plot(y,x)
locs, labels = plt.xticks()
plt.xticks((1,2,0,3)); # This only sets the locations at which ticks
# are displayed, irrespective of the order of the tuple.
Edit:
Based on comments here are some further clarifications.
Let's say the first point (a,5) in fig 1. If I changed my x-axis definition such that a is now defined at the third position, then it gets reflected in the plot as well, which means, 5 on y-axis moves with a as shown in fig-2. One way to achieve this would be to re-order the data. However, I would like to see if it is possible to achieve it somehow by changing axis locations. To summarize, the data should be plotted based on how we define our custom axis without re-ordering the original data.
Edit2:
Based on the discussion in the comments it's not possible to do it by just modifying axis labels. Any approach would involve modifying the data. This was an oversimplification of the original problem I was facing. Finally, using dictionary-based labels in a pandas data frame helped me to sort the axis values in a specific order while also making sure that their respective values change accordingly.
Toggling between two different orders of the x axis categories could look as follows,
import numpy as np
import matplotlib.pyplot as plt
x = ['a', 'b', 'c', 'd']
y = [5,8,9,10]
order1 = ['a', 'b', 'c', 'd']
order2 = ['b', 'c', 'a', 'd']
fig, ax = plt.subplots()
line, = ax.plot(x, y, marker="o")
def toggle(order):
    _, ind1 = np.unique(x, return_index=True)
    _, inv2 = np.unique(order, return_inverse=True)
    y_new = np.array(y)[ind1][inv2]
    line.set_ydata(y_new)
    line.axes.set_xticks(range(len(order)))
    line.axes.set_xticklabels(order)
    fig.canvas.draw_idle()
curr = [0]
orders = [order1, order2]
def onclick(evt):
    curr[0] = (curr[0] + 1) % 2
    toggle(orders[curr[0]])
fig.canvas.mpl_connect("button_press_event", onclick)
plt.show()
Click anywhere on the plot to toggle between order1 and order2.
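The reordering itself does not need numpy. As a sketch of the same idea with a plain dictionary lookup (value_of is an illustrative name), the original x and y lists stay untouched and only a reordered copy is built for plotting:

```python
x = ['a', 'b', 'c', 'd']
y = [5, 8, 9, 10]
order2 = ['b', 'c', 'a', 'd']

# Map each category to its value, then read the values back in the new order.
value_of = dict(zip(x, y))
y_new = [value_of[c] for c in order2]

print(y_new)
```

Plotting order2 against y_new then shows each value above its own label.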

Bar Chart with Line Chart - Using non numeric index

I'd like to show on the same graph a bar chart of a dataframe, and a line chart that represents the sum.
I can do that for a frame for which the index is numeric or text. But it doesn't work for a datetime index.
Here is the code I use:
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1234)
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_nums = range(10)
index_text = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k']
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
a_nums = pd.DataFrame(columns=['a', 'b'], index=index_nums, data=data)
a_text = pd.DataFrame(columns=['a', 'b'], index=index_text, data=data)
a_date = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)
fig, ax = plt.subplots(3, 1)
ax = ax.ravel()
for i, a in enumerate([a_nums, a_text, a_date]):
    a.plot.bar(stacked=True, ax=ax[i])
    (a.sum(axis=1)).plot(c='k', ax=ax[i])
As you can see the last chart comes only as the line with the bar chart legend. And the dates are missing.
Also if I replace the last line with
ax[i].plot(a.sum(axis=1), c='k')
Then:
The chart with index_nums is the same
The chart with index_text raises an error
the chart with index_date shows the bar chart but not the line chart.
I'm using Python 3.6.2, pandas 0.20.3 and matplotlib 2.0.2.
Plotting a bar plot and a line plot to the same axes may often be problematic, because a bar plot puts the bars at integer positions (0,1,2,...N-1) while a line plot uses the numeric data to determine the positions along the x axis.
In the case from the question, using range(10) as index for both bar and line plot works fine, since those are exactly the numbers a bar plot would use anyways. Using text also works fine, since this needs to be replaced by numbers in order to show it and of course the first N integers are used for that.
The bar plot for a datetime index also uses the first N integers, while the line plot will plot on the dates. Hence depending on which one comes first, you only see the line or bar plot (you would actually see the other by changing the xlimits accordingly).
An easy solution is to plot the bar plot first and reset the index to a numeric one on the dataframe for the line plot.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(1234)
import datetime as dt
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
df = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)
fig, ax = plt.subplots(1, 1)
df.plot.bar(stacked=True, ax=ax)
df.sum(axis=1).reset_index().plot(ax=ax)
fig.autofmt_xdate()
plt.show()
Alternatively you can plot the lineplot as usual and use a matplotlib bar plot, which accepts numeric positions. See this answer: Python making combined bar and line plot with secondary y-axis
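The mismatch described in this answer can be seen with the standard library alone. The sketch below compares the integer positions a bar plot would use with the numbers the same dates turn into when counted as days since 1970-01-01. That epoch is an assumption here, since the exact date numbers depend on the matplotlib version, and the start date is made up.

```python
import datetime as dt

# Made-up ten-day date index.
start = dt.date(2017, 9, 1)
dates = [start + dt.timedelta(days=i) for i in range(10)]

# A bar plot places its bars at 0, 1, ..., N-1.
bar_positions = list(range(len(dates)))

# A line plot on dates uses the dates' numeric values (assumed: days since 1970-01-01).
epoch = dt.date(1970, 1, 1)
line_positions = [(d - epoch).days for d in dates]

print(bar_positions[0], bar_positions[-1])
print(line_positions[0], line_positions[-1])
```

The two ranges are far apart on the x axis, so whichever plot sets the axis limits hides the other.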

Add Legend to Seaborn point plot

I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add a legend to the plot?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However, the legend is missing. According to the documentation, pointplot does not accept a label argument.
One workaround that worked was creating a new dataframe and using hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know if there is a way to create a legend for the code that first adds sequentially point plot to the figure and then add a legend.
Sample output :
I would suggest not using seaborn pointplot for plotting. It makes things unnecessarily complicated.
Instead, use matplotlib plot_date. It lets you set labels on the plots and have them automatically put into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
In case one is still interested in obtaining the legend for pointplots, here is one way to do it:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()
Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.
I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.
This goes a bit beyond the original question, but also builds on @PSub's response to something more general. I do know some of this is easier in Matplotlib directly, but many of the default styling options for Seaborn are quite nice, so I wanted to work out how you could have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
data=clusters,
x='x', y='y',
hue='name',
shade=True,
thresh=0.05,
n_levels=2,
alpha=0.2,
ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
    lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = cm.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
data=points,
x="x",
y="y",
hue='name',
style='name',
markers=markers[:len(groups)],
palette=colors[:len(groups)],
legend=False,
s=30,
alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg);
# Done
plt.show();
Here's the output:
