I'm currently trying to make a nested doughnut chart with four layers and I have come across some problems with it.
There is one dependency in my data. I look at the changes made with a specific method and divide them into agronomical and academic traits. I then create a fourth ring which shows the number of academic traits and of each agronomical trait. I don't know how to automatically align the two doughnut rings so that they match.
I looked into the matplotlib documentation, but I don't understand how the colormaps are addressed. I adopted the example code, but in the end it's not really clear how it selects the colors.
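A short sketch of how that colormap indexing works, assuming matplotlib 3.x. `tab20c` is a `ListedColormap` with 20 discrete colors arranged in five hue blocks of four shades each (darkest first). Calling the colormap with integers returns those colors by index, while floats in [0, 1] are treated as positions along the map. So `cmap1(np.arange(12))` just takes the first 12 colors and `cmap1(np.arange(5, 6))` takes color 5 only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only needed when running headless
import matplotlib.pyplot as plt
import numpy as np

cmap = plt.get_cmap("tab20c")
print(type(cmap).__name__, cmap.N)  # ListedColormap 20

# Integer input -> color by index (same values as cmap.colors[i], plus alpha)
first_three = cmap(np.arange(3))
# One color per hue block: indices 0, 4, 8, 12, 16 are the darkest shades
block_starts = cmap(np.arange(5) * 4)
# The three lighter shades that belong to the first (blue) block
blue_shades = cmap([1, 2, 3])

# Float input -> position along the map, scaled to [0, 1]
middle = cmap(0.5)
```

Picking `block * 4 + shade` is therefore a simple way to give a parent group and its children related colors.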
I need to make a legend for the chart. Because some of the subgroups have long names, I cannot show them in the pie chart, but they should appear in the legend. When I draw the legend with ax.legend, it only adds the groups that I passed to ax.pie via labels=; if I use fig.legend instead, the colors don't match at all. I tried the handles= argument, which I came across in some posts here on StackOverflow, but it just gives me an error:
AttributeError: 'tuple' object has no attribute 'legend'
I would also like to add the percentage and the number of occurrences to my legend, but I guess there is no easy way to do that?
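`ax.pie` returns a tuple, `(wedges, texts)` or `(wedges, texts, autotexts)` when `autopct` is set, which is why calling `.legend` on its result raises the AttributeError. A minimal sketch, assuming hypothetical trait names and counts: keep the wedge handles, build the label strings yourself (name, count and percentage) and pass both to `ax.legend`. Leaving `labels=` off the pie keeps the long names out of the chart.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical data standing in for one ring of the chart
names = ["Yield", "Drought tolerance", "Disease resistance", "Gene function"]
counts = [12, 7, 5, 9]

fig, ax = plt.subplots(figsize=(8, 5))
# No labels= on the pie, so the long names only appear in the legend
wedges, _texts = ax.pie(counts, radius=1, startangle=90,
                        wedgeprops=dict(width=0.3, edgecolor="w"))

total = sum(counts)
legend_labels = [f"{n}: {c} ({c / total:.1%})" for n, c in zip(names, counts)]
# Pass the wedge handles explicitly: ax.legend(handles, labels)
leg = ax.legend(wedges, legend_labels, title="Traits",
                loc="center left", bbox_to_anchor=(1.0, 0.5))
```

For several rings, the wedge lists can be concatenated (for example `list(sr[0]) + list(tr[0])`) together with the matching label lists.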
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import openpyxl  # engine pandas uses for .xlsx files
df = pd.read_excel("savedrecs.xlsx", sheet_name="test")
size = 0.3
fig, ax = plt.subplots(figsize=(12,8))
cmap1 = plt.get_cmap("tab20c")
cmap2 = plt.get_cmap("tab10")
outer_colors = cmap1(np.arange(20))
inner_colors = cmap1(np.arange(12))
sr_colors = cmap1(np.arange(5,6))
third_ring = df[df["Group"].str.contains("group")]
fourth_ring = df[df["Group"].str.contains("Target trait")]
second_ring = df[df["Group"].str.contains("Cultivar")]
first_ring = df[df["Group"].str.contains("Mutation")]
def make_autopct(values):
    def my_autopct(pct):
        total = sum(values)
        val = int(round(pct*total/100.0))
        return '{p:.2f}%\n({v:d})'.format(p=pct,v=val)
    return my_autopct
ir = ax.pie(first_ring["Occurence"], radius=1-size, labels=first_ring["Name"], textprops={"fontsize":8},labeldistance=0,
colors=sr_colors, wedgeprops=dict(edgecolor="w"))
sr = ax.pie(second_ring["Occurence"],
radius=1,wedgeprops=dict(width=size, edgecolor="w"),startangle=90,colors=inner_colors)
tr = ax.pie(third_ring["Occurence"],
radius=1+size,wedgeprops=dict(width=size, edgecolor="w"),startangle=90,colors=outer_colors)
fr = ax.pie(fourth_ring["Occurence"],
radius=1+size*2,wedgeprops=dict(width=size, edgecolor="w"),startangle=90,colors=outer_colors)
#---Legend & Title----
ax.legend( bbox_to_anchor=(1.04, 0.5), loc="center left", borderaxespad=10 ,fancybox=True, shadow=False, ncol=1, title="This will be a fancy legend title")
fig.suptitle("This will be a fancy title, which I don't know yet!")
The output of this code is then as follows:
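On the alignment: two rings line up only if they share the same total, the same `startangle` and direction, and the outer slices are ordered so that each parent's children are contiguous. In the code above, the innermost `ax.pie` call (`ir`) has no `startangle=90`, unlike the others. A sketch, assuming a hypothetical long-format table with `category`, `trait` and `count` columns, that derives the inner ring from the outer one with `groupby` so both always sum to the same total; the outer shades reuse the `tab20c` hue block of their parent.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per trait, with its parent category
df = pd.DataFrame({
    "category": ["Agronomical", "Academic", "Agronomical", "Agronomical"],
    "trait": ["Yield", "Gene function", "Drought tolerance", "Disease resistance"],
    "count": [12, 9, 7, 5],
})

# Sort so each category's traits are contiguous (stable keeps their relative order)
df = df.sort_values("category", kind="stable").reset_index(drop=True)
# Inner ring = per-category totals, in the same sorted order as the outer rows
inner = df.groupby("category")["count"].sum()

cmap = plt.get_cmap("tab20c")                      # 5 hue blocks x 4 shades
block = df["category"].map({c: i for i, c in enumerate(inner.index)})
shade = df.groupby("category").cumcount()          # 0, 1, 2 ... within each category
inner_colors = cmap(np.arange(len(inner)) * 4)     # darkest shade of each block
outer_colors = cmap((block * 4 + 1 + shade).to_numpy())  # lighter shades, max 3 per block

size = 0.3
# Identical start angle and direction for every ring keeps the boundaries aligned
common = dict(startangle=90, counterclock=False,
              wedgeprops=dict(width=size, edgecolor="w"))
fig, ax = plt.subplots(figsize=(8, 6))
inner_wedges, _ = ax.pie(inner, radius=1 - size, colors=inner_colors, **common)
outer_wedges, _ = ax.pie(df["count"], radius=1, colors=outer_colors, **common)
```

The same pattern extends to more rings: each ring is a `groupby` sum of the ring outside it, sorted the same way.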
I am visualizing the results of a survey. The answers are long and I would like to fit them entirely into the graph. Therefore, I would be very grateful if you could point me to a way to have multi-line xticklabels, or include the xticklabels in a legend on the side as seen in this example:
Otherwise I would have to make the graph very wide to fit the entire answer. My current code and the resulting plot look as follows:
import seaborn as sns
from textwrap import wrap
catp = (sns.catplot(data=results, x='1', kind='count')
        .set(ylabel='Number of Participants',
             title="\n".join(wrap("Question 1: Out of the three options, please choose the one you would prefer your fully autonomous car to choose, if you sat in it.", 90))))
for p in catp.ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/len(results))  # share of all responses
    x = p.get_x() + p.get_width() / 2 - 0.05
    y = p.get_y() + p.get_height() + 0.3
    catp.ax.annotate(percentage, (x, y), size = 12)
Edit: You can create a sample dataframe with this code:
import pandas as pd
import numpy as np
from itertools import chain
x = (np.repeat('Brake and crash into the bus', 37),
np.repeat('Steer into the passing car on the left', 22),
np.repeat('Steer into the right hand sidewall', 39))
results = pd.DataFrame({'1': list(chain(*x))})
Extract the xticklabels and fix them with wrap, as you did with the title.
Since matplotlib 3.4.0, .bar_label makes it easier to annotate bars.
See this answer for customizing the bar annotation labels.
The height and aspect of the figure will still require some adjusting depending on wrap width.
An alternate solution is to fix the values in the dataframe:
df['1'] = df['1'].apply(lambda row: '\n'.join(wrap(row, 30)))
To wrap all columns: for col in df.columns: df[col] = df[col].apply(lambda row: '\n'.join(wrap(row, 30)))
The list comprehension for labels uses an assignment expression (:=), which requires Python >= 3.8. It can be rewritten as a standard for loop.
labels = [f'{v.get_height()/len(df)*100:0.1f}%' for v in c] works without an assignment expression, but doesn't check if the bar height is 0.
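For Python < 3.8, the assignment-expression comprehension can be written as an ordinary loop that still skips empty bars. A sketch using a hypothetical bar container built with plain matplotlib (counts 37, 22, 39 plus one empty category, so the zero-height check is visible); `total` stands in for `len(df)`:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

counts = [37, 22, 39, 0]      # hypothetical bar heights; the last one is empty
total = sum(counts)           # stands in for len(df) in the answer
fig, ax = plt.subplots()
container = ax.bar(["A", "B", "C", "D"], counts)

labels = []
for v in container:           # each v is a Rectangle patch
    w = v.get_height()
    # percent of total, or an empty string for zero-height bars
    labels.append(f'{w/total*100:0.1f}%' if w > 0 else '')
ax.bar_label(container, labels=labels, label_type='edge')
```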
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2
import seaborn as sns
from textwrap import wrap
from itertools import chain
import pandas as pd
import numpy as np
# sample dataframe
x = (np.repeat('Brake and crash into the bus, which will result in the killing of the children on the bus, but save your life', 37),
np.repeat('Steer into the passing car on the left, pushing it into the wall, saving your life, but killing passengers in the other car', 22),
np.repeat('Steer into the right hand sidewall, killing you but saving the lives of all other passengers', 39))
df = pd.DataFrame({'1': list(chain(*x))})
# plotting
catp = (sns.catplot(data=df, x='1', kind='count')
        .set(ylabel='Number of Participants',
             title="\n".join(wrap("Question 1: Out of the three options, please choose the one you would prefer your fully autonomous car to choose, if you sat in it.", 90))))
for ax in catp.axes.ravel():
    # extract labels
    labels = ax.get_xticklabels()
    # fix the labels
    for v in labels:
        text = v.get_text()
        text = '\n'.join(wrap(text, 30))
        v.set_text(text)
    # set the new labels
    ax.set_xticklabels(labels)
    # annotate the bars
    for c in ax.containers:
        # create a custom annotation: percent of total
        labels = [f'{w/len(df)*100:0.1f}%' if (w := v.get_height()) > 0 else '' for v in c]
        ax.bar_label(c, labels=labels, label_type='edge')
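The question also asked about moving the answers into a legend on the side. A sketch of that alternative with plain matplotlib, assuming hypothetical short codes ('A', 'B', 'C') as tick labels: each bar is drawn separately with the wrapped full answer as its `label`, and `ax.legend` collects them.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
from textwrap import wrap

answers = pd.Series(['Brake and crash into the bus'] * 37
                    + ['Steer into the passing car on the left'] * 22
                    + ['Steer into the right hand sidewall'] * 39)
# Count each answer, keeping the order in which answers first appear
counts = answers.value_counts().reindex(answers.unique())
codes = [chr(ord('A') + i) for i in range(len(counts))]  # hypothetical short tick labels

fig, ax = plt.subplots(figsize=(9, 4))
for code, (answer, n) in zip(codes, counts.items()):
    # one bar per answer; the wrapped full text becomes its legend entry
    ax.bar(code, n, label=f"{code}: " + "\n".join(wrap(answer, 30)))
ax.set_ylabel('Number of Participants')
leg = ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
```

When saving, `fig.savefig(..., bbox_inches="tight")` keeps the outside legend in the image.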
My aim is to show a bar chart of three-dimensional data: x is categorical, and y1 and y2 are continuous series. The bars should take their heights from y1 and their color from y2.
This does not seem to be particularly obscure to me, but I didn't find a simple / built-in way to use a bar chart to visualise three dimensions -- I'm thinking mostly for exploratory purposes, before investigating relationships more formally.
Am I missing a type of plot in the libraries? Is there a good alternative for showing 3-D data?
Anyway here are some things that I've tried that aren't particularly satisfying:
Some data for these attempts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Example data with explicit (-ve) correlation in the two series
n = 10; sd = 2.5
fruits = [ 'Lemon', 'Cantaloupe', 'Redcurrant', 'Raspberry', 'Papaya',
'Apricot', 'Cherry', 'Durian', 'Guava', 'Jujube']
cost = np.random.uniform(3, 15, n)
harvest = 50 - (np.random.randn(n) * sd + cost)
df = pd.DataFrame(data={'fruit':fruits, 'cost':cost, 'harvest':harvest})
df.sort_values(by="cost", inplace=True)  # preferable to sort during plot only
# set up several subplots to show progress.
n_colors = 5; cmap_base = "coolwarm" # a diverging map
fig, axs = plt.subplots(3,2)
ax = axs.flat
Attempt 1 uses hue for the 3rd dim data in barplot. However, this produces a single color for each value in the series, and also seems to do odd things with the bar width & spacing.
import seaborn as sns
sns.barplot(ax=ax[0], x='fruit', y='cost', hue='harvest',
data=df, palette=cmap_base)
# fix the sns barplot label orientation
ax[0].set_xticklabels(ax[0].get_xticklabels(), rotation=90)
Attempt 2 uses pandas' DataFrame.plot.bar with a continuous color range, then adds a colorbar (which needs a ScalarMappable). I borrowed some techniques from a Medium post, among others.
import matplotlib as mpl
norm = mpl.colors.Normalize(vmin=min(df.harvest), vmax=max(df.harvest), clip=True)
mapper1 = mpl.cm.ScalarMappable(norm=norm, cmap=cmap_base)
colors1 = [mapper1.to_rgba(x) for x in df.harvest]
df.plot.bar(ax=ax[1], x='fruit', y='cost', color=colors1, legend=False)
mapper1._A = []
plt.colorbar(mapper1, ax=ax[1], label='harvest')
Attempt 3 builds on this, borrowing from https://gist.github.com/jakevdp/91077b0cae40f8f8244a to facilitate a discrete colormap.
def discrete_cmap(N, base_cmap=None):
    """Create an N-bin discrete colormap from the specified input map"""
    # from https://gist.github.com/jakevdp/91077b0cae40f8f8244a
    base = plt.get_cmap(base_cmap)  # plt.cm.get_cmap was removed in matplotlib 3.9
    color_list = base(np.linspace(0, 1, N))
    cmap_name = base.name + str(N)
    return base.from_list(cmap_name, color_list, N)
cmap_disc = discrete_cmap(n_colors, cmap_base)
mapper2 = mpl.cm.ScalarMappable(norm=norm, cmap=cmap_disc)
colors2 = [mapper2.to_rgba(x) for x in df.harvest]
df.plot.bar(ax=ax[2], x='fruit', y='cost', color=colors2, legend=False)
mapper2._A = []
cb = plt.colorbar(mapper2, ax=ax[2], label='harvest')
cb.set_ticks(np.linspace(*cb.get_clim(), num=n_colors+1)) # indicate color boundaries
cb.set_ticklabels(["{:.0f}".format(t) for t in cb.get_ticks()]) # without too much precision
Finally, attempt 4 gives up on showing all three dimensions in one plot and presents them in two parts.
sns.barplot(ax=ax[4], x='fruit', y='cost', data=df, color='C0')
ax[4].set_xticklabels(ax[4].get_xticklabels(), rotation=90)
sns.regplot(x='harvest', y='cost', data=df, ax=ax[5])
(1) is unusable; I'm clearly not using it as intended. (2) is OK with 10 series, but with more series it is harder to tell whether a given sample is above or below average, for instance. (3) is quite nice and scales to 50 bars OK, but it is far from "out-of-the-box" and too involved for a quick analysis. Moreover, the sm._A = [] line seems like a hack, but the code fails without it. Perhaps the couple of lines in (4) are a better way to go.
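For comparison, a sketch of attempt (3) built only from stock matplotlib pieces, assuming matplotlib >= 3.1 (where `colorbar` accepts a `ScalarMappable` that holds no data, so the `_A = []` line is unnecessary). `plt.get_cmap(name, lut)` resamples the diverging map to `n_colors` entries and `BoundaryNorm` maps each equal-width harvest bin to one of them. The data are regenerated with a seeded generator, so the values differ from the ones above.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
fruits = ['Lemon', 'Cantaloupe', 'Redcurrant', 'Raspberry', 'Papaya',
          'Apricot', 'Cherry', 'Durian', 'Guava', 'Jujube']
cost = rng.uniform(3, 15, len(fruits))
harvest = 50 - (rng.normal(size=len(fruits)) * 2.5 + cost)
df = pd.DataFrame({'fruit': fruits, 'cost': cost, 'harvest': harvest})

n_colors = 5
cmap = plt.get_cmap('coolwarm', n_colors)   # discrete map with n_colors entries
# Equal-width bins across the harvest range; one color per bin
bounds = np.linspace(df.harvest.min(), df.harvest.max(), n_colors + 1)
norm = mpl.colors.BoundaryNorm(bounds, ncolors=cmap.N)

plot_df = df.sort_values('cost')            # sort only for plotting
fig, ax = plt.subplots()
bars = ax.bar(plot_df.fruit, plot_df.cost,
              color=cmap(norm(plot_df.harvest.to_numpy())))
ax.tick_params(axis='x', labelrotation=90)
# A data-less ScalarMappable is enough for the colorbar (no _A hack)
sm = mpl.cm.ScalarMappable(norm=norm, cmap=cmap)
cb = fig.colorbar(sm, ax=ax, label='harvest')
```

Swapping the `np.linspace` bounds for quantiles (`np.quantile`) gives equal-count bins instead, which copes better with outliers.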
To come back to the question: is it possible to easily produce a bar chart that displays 3-D data? I've focused on using a small number of colors for the third dimension to make trends easier to identify, but I'm open to other suggestions.
I've also posted a solution, which uses a lot of custom code to achieve what I can't quite believe isn't built into some Python graphing library.
The following code, using R's ggplot, gives a reasonable approximation to (2) with built-in commands:
ggplot(data = df, aes(x =reorder(fruit, +cost), y = cost, fill=harvest)) +
geom_bar(data=df, aes(fill=harvest), stat='identity') +
The first 2 lines are more or less the minimal code for barplot, and the third changes the color palette.
So if this ease were available in python I'd love to know about it!
I'm posting an answer that meets my aims: it is simple at the point of use, still useful with ~100 bars, and, by leveraging the Fisher-Jenks 1-D classifier from PySAL, it handles outliers quite well (see the post about d3 coloring).
Overall, though, it is quite involved (50+ lines in the BinnedColorScaler class, posted at the bottom).
# set up the color binner
quantizer = BinnedColorScaler(df.harvest, k=5, cmap='coolwarm' )
# and plot dataframe with it.
df.plot.bar(ax=ax, x='fruit', y='cost',
            color=quantizer.map_by_class(df.harvest), legend=False)
quantizer.add_legend(ax, title='harvest')  # show meaning of bins in legend
This uses the following class, which relies on a 1-D classifier from PySAL and borrows ideas from the geoplot/geopandas libraries.
from pysal.esda.mapclassify import Fisher_Jenks  # newer releases: mapclassify.FisherJenks
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


class BinnedColorScaler(object):
    '''
    give this an array-like data set, a bin count, and a colormap name, and it
    - quantizes the data
    - provides a bin lookup and a color mapper that can be used by pandas for selecting artist colors
    - provides a method for a legend to display the colors and bin ranges
    '''
    def __init__(self, values, k=5, cmap='coolwarm'):
        self.base_cmap = plt.get_cmap(cmap)  # can be None, text, or a cmap instance
        self.bin_colors = self.base_cmap(np.linspace(0, 1, k))  # evenly-spaced colors

        # produce bins - see _discrete_colorize in geoplot.geoplot.py:2372
        self.binning = Fisher_Jenks(np.array(values), k)
        self.bin_edges = np.array([self.binning.yb.min()] + self.binning.bins.tolist())
        # some text for the legend (as per geopandas approx)
        self.categories = [
            '{0:.2f} - {1:.2f}'.format(self.bin_edges[i], self.bin_edges[i + 1])
            for i in range(len(self.bin_edges) - 1)]

    def map_by_class(self, val):
        ''' return a color for a given data value '''
        #bin_id = self.binning.find_bin(val)
        bin_id = self.find_bin(val)
        return self.bin_colors[bin_id]

    def find_bin(self, x):
        ''' unfortunately the pysal implementation seems to fail on bin edge
        cases :(. So reimplement with the way we expect here.
        '''
        # wow, subtle. just <= instead of < in the uptos
        x = np.asarray(x).flatten()
        uptos = [np.where(value <= self.binning.bins)[0] for value in x]
        bins = [v.min() if v.size > 0 else len(self.binning.bins) - 1 for v in uptos]  # bail upwards
        bins = np.asarray(bins)
        if len(bins) == 1:
            return bins[0]
        return bins

    def add_legend(self, ax, title=None, **kwargs):
        ''' add legend showing the discrete colors and the corresponding data range '''
        # following the geoplot._paint_hue_legend functionality, approx.
        # generate a patch for each color in the set
        artists, labels = [], []
        for i in range(len(self.bin_colors)):
            artists.append(Line2D(
                (0, 0), (1, 0), mfc='none', marker='None', ls='-', lw=10,
                color=self.bin_colors[i]))
            labels.append(self.categories[i])
        return ax.legend(artists, labels, fancybox=True, title=title, **kwargs)
I have been struggling for a while with defining the colors in a bar plot using pandas and Matplotlib. Let us imagine that we have the following dataframe:
import pandas as pd
pers1 = ["Jesús","lord",2]
pers2 = ["Mateo","apostel",1]
pers3 = ["Lucas","apostel",1]
dfnames = pd.DataFrame(
    [pers1,pers2, pers3],
    columns=["name","type","importance"])
Now, I want to create a bar plot with the importance as the numerical value, the names of the people as ticks, and the type column to assign colors. I have read other questions (for example: Define bar chart colors for Pandas/Matplotlib with defined column), but their solutions don't work for me...
So, first I have to define colors and assign them to different values:
colors = {'apostel':'blue','lord':'green'}
And finally use the .plot() function:
dfnames.plot(x='name', y='importance', kind='bar', color=dfnames['type'].map(colors))
Good. The only problem is that all bars are green:
Why?? I don't know... I am testing it in Spyder and Jupyter... Any help? Thanks!
As per this GH16822, this is a regression bug introduced in version 0.20.3, wherein only the first colour was picked from the list of colours passed. This was not an issue with prior versions.
The reason, according to one of the contributors, was this:
The problem seems to be in _get_colors. I think that BarPlot should
define a _get_colors that does something like
def _get_colors(self, num_colors=None, color_kwds='color'):
    color = self.kwds.get('color')
    if color is None:
        return super()._get_colors(self, num_colors=num_colors, color_kwds=color_kwds)
    num_colors = len(self.data)  # maybe? may not work for some cases
    return _get_standard_colors(color=kwds.get('color'), num_colors=num_colors)
There are a couple of options for you:
The most obvious choice would be to update to the latest version of pandas (currently v0.22)
If you need a workaround, there's one (also mentioned in the issue tracker) whereby you wrap the arguments within an extra tuple -
Though, in the interest of progress, I'd recommend updating your pandas.
I found another solution to your problem, and it works!
I used the matplotlib library directly instead of the plot attribute of the data frame.
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
# for jupyter notebook:
%matplotlib inline
pers1 = ["Jesús","lord",2]
pers2 = ["Mateo","apostel",1]
pers3 = ["Lucas","apostel",1]
dfnames = pd.DataFrame([pers1,pers2, pers3], columns=["name","type","importance"])
fig, ax = plt.subplots()
bars = ax.bar(dfnames.name, dfnames.importance)
colors = {'apostel':'blue','lord':'green'}
for index, bar in enumerate(bars):
    color = colors.get(dfnames.loc[index]['type'],'b')  # get the color key in your df
    bar.set_color(color)
And here is the result:
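Since `ax.bar` accepts one color per bar, the mapped Series can also be passed straight to `color=` without the loop; proxy `Patch` handles then add a legend with one entry per type rather than per bar. A sketch using the same example data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import pandas as pd

dfnames = pd.DataFrame([["Jesús", "lord", 2], ["Mateo", "apostel", 1], ["Lucas", "apostel", 1]],
                       columns=["name", "type", "importance"])
colors = {'apostel': 'blue', 'lord': 'green'}

fig, ax = plt.subplots()
# one color per bar, looked up from the type column
bars = ax.bar(dfnames["name"], dfnames["importance"],
              color=dfnames["type"].map(colors).tolist())
# proxy patches: one legend entry per type
handles = [Patch(color=c, label=t) for t, c in colors.items()]
ax.legend(handles=handles, title="type")
```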
So I am plotting error bars for a pandas DataFrame. The error bars have a weird arrow at the top, but what I want is a horizontal line, as in this figure:
But right now my error bars end with an arrow instead of a horizontal line.
Here is the code I used to generate it:
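plot = meansum.plot(
    kind='bar', yerr=stdsum, colormap='OrRd_r', edgecolor='black', grid=False,
    figsize=(8, 2), ax=ax, position=0.45,
    error_kw=dict(ecolor="black", elinewidth=0.5, lolims=True, marker="o"),
    width=0.8)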
So what should I change to get the error bars I want? Thanks.
Using plt.errorbar from matplotlib makes this easier, as it returns several objects, including the caplines, which contain the marker you want to change (the arrow that is automatically used when lolims is set to True; see the docs).
Using pandas, you just need to dig the correct line out of the children of plot and change its marker:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5, lolims=True),width=0.8)
for ch in plot.get_children():
    if str(ch).startswith('Line2D'):  # this is silly, but it appears that the first Line in the children are the caplines...
        ch.set_marker('_')  # horizontal cap instead of the arrow
        ch.set_markersize(10)  # to change its size
        break
The result looks like:
Just don't set lolims=True and you are good to go. Here is an example with sample data:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5),width=0.8)
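Without `lolims`, the error bars have no caps at all unless a cap size is set (matplotlib's default `errorbar.capsize` is 0). A sketch with plain `ax.bar`, assuming the goal is the upward-only bar that `lolims` produced: an asymmetric `yerr` with zero lower error keeps only the upper part, and `capsize` draws the horizontal caps. pandas should forward `capsize` to `ax.bar` in the same way when it is passed to `.plot(kind='bar', ...)`.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3, 4], "error": [.4, .3, .6, .9]})

fig, ax = plt.subplots(figsize=(8, 2))
# yerr as [lower, upper]: a zero lower error keeps only the upward bar
yerr = [np.zeros(len(df)), df["error"].to_numpy()]
bars = ax.bar(df.index, df["val"], yerr=yerr, capsize=4, edgecolor="black",
              error_kw=dict(ecolor="black", elinewidth=0.5))
```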
I am plotting multiple dataframes as point plots using seaborn, all on the same axes.
How can I add a legend to the plot?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has the same columns:
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code:
f, ax = plt.subplots(1, 1, figsize=figsize)
y_col = 'count'
This plots 3 lines on the same plot. However, the legend is missing, and the pointplot documentation does not list a label argument.
One workaround that worked was creating a new dataframe and using the hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
But I would like to know whether there is a way to create a legend when the point plots are added to the figure one after another.
Sample output:
I would suggest not using seaborn's pointplot here; it makes things unnecessarily complicated.
Instead, use matplotlib's plot_date. It lets you set labels on the plots and have them automatically collected into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
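Recent matplotlib documentation discourages `plot_date` (newer releases deprecate it), and `ax.plot` handles datetime x-values directly, so the same idea works with plain `plot` plus a marker. A sketch with seeded random data and a weekly date range (both assumptions, chosen only to keep the example reproducible):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
date = pd.date_range("2017-03-01", periods=15, freq="7D")
frames = {name: pd.DataFrame({"date": date, "count": rng.random(15) + offset})
          for name, offset in [("A", 0), ("B", 0.7), ("C", 2)]}

fig, ax = plt.subplots()
for name, frame in frames.items():
    # label= is what ax.legend() picks up
    ax.plot(frame["date"], frame["count"], marker="o", linestyle="-", label=name)
ax.legend()
fig.autofmt_xdate()  # tilt the date tick labels so they do not overlap
```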
In case anyone is still interested in obtaining the legend for pointplots, here is a way to go:
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
Old question, but there's an easier way.
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially without having to deal with any matplotlib details beyond defining the legend entries.
I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.
This goes a bit beyond the original question, but it builds on @PSub's response to produce something more general. I know some of this is easier in Matplotlib directly, but many of Seaborn's default styling options are quite nice, so I wanted to work out how to have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
x='x', y='y',
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
for lh in ax.get_legend().get_patches():
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = plt.get_cmap('Dark2').colors  # cm.get_cmap was removed in matplotlib 3.9
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
             loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg)
# Done
Here's the output:
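The key step is that a second `Legend` has to be attached with `ax.add_artist`; otherwise a later `ax.legend()` call replaces the first legend. A matplotlib-only sketch of the same two-legend idea, assuming hypothetical scatter groups in place of the KDE and seaborn calls:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.legend import Legend
from matplotlib.lines import Line2D

rng = np.random.default_rng(42)
fig, ax = plt.subplots(figsize=(6, 6))

# First layer: labelled scatter groups -> legend built the usual way
for i, color in enumerate(["tab:blue", "tab:orange", "tab:green"]):
    xy = rng.normal(i * 3, 1, size=(25, 2))
    ax.scatter(xy[:, 0], xy[:, 1], color=color, alpha=0.3, label=f"Cluster {i + 1}")
first = ax.legend(loc="upper right", frameon=False, title="Clusters")

# Second layer: unlabelled markers, described by proxy Line2D handles
groups, markers = ["Group 1", "Group 2"], ["o", "v"]
for i, marker in enumerate(markers):
    xy = rng.normal(i * 4, 1, size=(8, 2))
    ax.scatter(xy[:, 0], xy[:, 1], color="black", marker=marker)
handles = [Line2D([0], [0], linestyle="", marker=m, color="black") for m in markers]
second = Legend(ax, handles, groups, loc="upper left", frameon=False, title="Groups")
ax.add_artist(second)  # keep both legends on the axes
```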