I have a matplotlib bar plot with very long strings as x-tick labels. Is there a way to automatically split them across multiple lines to make the plot cleaner? I am using the seaborn function barplot to create the graph.
This is the plot:
And the code I use to create it:
plt.figure(figsize=(15,13))
sns.barplot(x="Component",y="TTTR",color = "C0",data=new,dodge=False,edgecolor="black",zorder=3)
plt.xticks(rotation=90)
plt.grid(axis='y',zorder=0)
plt.title("10 most impactful components",size=30,y=1.04,**pfont)
plt.ylabel("Impact (Sum TTR in h)")
plt.xlabel('Component')
plt.tight_layout()
seaborn.barplot returns a matplotlib Axes object, so you can use Axes.set_xticklabels to update the labels afterwards, e.g.
import textwrap
max_width = 20
ax = sns.barplot(...)
ax.set_xticklabels([textwrap.fill(x.get_text(), max_width) for x in ax.get_xticklabels()])
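A minimal sketch of the approach above, assuming plain matplotlib bars stand in for the seaborn plot; the component names and values are made up for illustration. It sets the tick positions explicitly before assigning the wrapped labels, so it does not depend on when matplotlib fills in the label text.

```python
import textwrap

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data standing in for the "Component" / "TTTR" columns.
components = [
    "Main hydraulic pump assembly",
    "Cabin pressure control valve",
    "Fuel quantity indication system",
]
impact = [12.5, 9.0, 7.25]
max_width = 20

fig, ax = plt.subplots(figsize=(8, 5))
positions = list(range(len(components)))
ax.bar(positions, impact, color="C0", edgecolor="black", zorder=3)

# Wrap each label at whitespace so that no line exceeds max_width characters.
wrapped = [textwrap.fill(name, max_width) for name in components]
ax.set_xticks(positions)
ax.set_xticklabels(wrapped)
ax.grid(axis="y", zorder=0)
fig.tight_layout()
```

With wrapped labels the rotation=90 from the question is probably no longer needed.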
Found a workaround: replace the label column in the pandas DataFrame with labels that have a newline inserted after every 20 characters (based on another answer):
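import re
# Pass flags by keyword; positional count/flags arguments to re.sub are deprecated in recent Python versions
new.Component = [re.sub("(.{20})", "\\1\n", label, flags=re.DOTALL) for label in new.Component]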
sns.barplot(x="Component",y="TTTR",color ="C0",data=new,dodge=False,edgecolor="black",zorder=3)
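Note that the regex breaks after every 20 characters even in the middle of a word, while textwrap.fill breaks at whitespace. A small sketch comparing the two on a made-up label (the label text is an assumption, not from the original data):

```python
import re
import textwrap

label = "Cabin pressure control valve"  # hypothetical component name

# Regex approach: insert a newline after every 20 characters, even mid-word.
by_regex = re.sub(r"(.{20})", "\\1\n", label, flags=re.DOTALL)

# textwrap approach: break at whitespace so that no line exceeds 20 characters.
by_textwrap = textwrap.fill(label, 20)

print(repr(by_regex))
print(repr(by_textwrap))
```

Which one reads better depends on the labels: the regex also handles labels without any spaces, while textwrap keeps words intact where it can.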
I want to specify the color of the area surrounding a plot created using df.plot() in pandas.
Using .set_facecolor as in the code below only changes the area inside the axes (see image); I want to change the color outside too.
import pandas as pd
import numpy as np
df = pd.DataFrame(components, columns=['PC1','PC2'])
df.plot('PC1','PC2','scatter').set_facecolor('green')
Replacing the last line with these two lines produces the same graph.
ax = df.plot('PC1','PC2','scatter')
ax.set_facecolor('green')
IIUC, you can use fig.set_facecolor:
fig, ax = plt.subplots()
df.plot('PC1','PC2','scatter', ax=ax).set_facecolor('green')
fig.set_facecolor('green')
plt.show()
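A self-contained sketch of the same idea, assuming a plain matplotlib scatter of made-up points in place of the PC1/PC2 data. It uses two different greens so the axes area and the surrounding figure area can be told apart, and it passes the figure color to savefig explicitly, since older matplotlib releases otherwise save with a white background by default.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))  # made-up stand-in for the PC1/PC2 columns

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1])
ax.set_facecolor("lightgreen")  # area inside the axes
fig.set_facecolor("green")      # area surrounding the axes

# Save into an in-memory buffer, passing the figure color explicitly.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", facecolor=fig.get_facecolor())
```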
Output:
I am using the integrated plot() function in pandas to generate a graph with two y-axes. This works well and the legend even points to the (right) y-axis for the second data set. But imho the legend's position is bad.
However, when I update the legend position I get two legends: the correct one ('A', 'B (right)') at an inconvenient location, and a wrong one ('A' only) at the chosen location.
So now I want to generate a legend on my own and was looking for the second <matplotlib.lines.Line2D>, but it is not contained in the ax environment.
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
len(ax.lines)
>>> 1
My ultimate objective is to be able to move the correct legend around, but I am confident I could manually place a legend, if only I had access to the second line container.
If I had, I was going to suppress the original legend by invoking df.plot(...,legend=None) and do something like plt.legend([ax.lines[0],ax.lines[1]],['A','B (right)'],loc='center left',bbox_to_anchor=(1.2, 0.5)). But ax only stores the first line "A", where is the second?
Also ax.get_legend_handles_labels() only contains ([<matplotlib.lines.Line2D at 0x2630e2193c8>], ['A']).
You create two axes. Each contains a line. So you need to loop over the axes and take the line(s) from each of them.
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
# Flatten the lines of every axes in the figure into one list
lines = [line for axes in ax.figure.axes for line in axes.lines]
print(lines)
To create a single legend, however, you can simply use a figure legend:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'], legend=False)
ax.figure.legend()
plt.show()
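If the legend has to be moved around, the two ideas can be combined: collect the handles and labels from each axes and pass them to a single figure-level legend. This is only a sketch; the anchor coordinates and the right margin are arbitrary choices, and the exact label text of the secondary line may differ between pandas versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1 / 4, 1 / 5, 1 / 6]})
ax = df.plot(secondary_y=["B"], legend=False)

# Gather handles and labels from every axes in the figure (left and right).
handles, labels = [], []
for axes in ax.figure.axes:
    axes_handles, axes_labels = axes.get_legend_handles_labels()
    handles.extend(axes_handles)
    labels.extend(axes_labels)

# Leave room on the right, then place one legend there (figure coordinates).
ax.figure.subplots_adjust(right=0.75)
legend = ax.figure.legend(handles, labels, loc="center left",
                          bbox_to_anchor=(0.82, 0.5))
```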
I am using Seaborn to make a boxplot using data from a pandas dataframe.
colorpalette = sns.hls_palette(8,h=.9)
g = sns.boxplot(x="estimator", y="mean_score", data=dFrame, palette=colorpalette)
g.set(ylabel='Mean Accuracy', xlabel='')
plt.show()
This results in the figure above. As you can see, the tick labels are too long to fit on one line, so I plan to use textwrap on the xticklabels to span them over multiple rows. To get the labels, I tried using
g.xaxis.get_ticklabels()
which returns the following
<a list of 9 Text major ticklabel objects>
If I try it in a loop like this
for item in g.xaxis.get_ticklabels():
    print(item)
I get the following output
Text(0,0,'ExtraTreesClassifier')
Text(1,0,'RandomForestClassifier')
Text(2,0,'GradientBoostingClassifier')
Text(3,0,'LogisticRegression')
Text(4,0,'DecisionTreeClassifier')
Text(5,0,'kNearestNeighbors')
Text(6,0,'LinearSVC')
Text(7,0,'Perceptron')
Is there a more efficient way to do this using built-in functions/methods in seaborn?
Having a matplotlib axes instance ax (as it is e.g. returned by seaborn plots),
ax = sns.boxplot(...)
allows you to obtain the tick labels as
ax.get_xticklabels()
The easiest way to get the texts out of the list would be
texts = [t.get_text() for t in ax.get_xticklabels()]
The text can also be wrapped on the fly (after import textwrap)
texts = [textwrap.fill(t.get_text(), 10) for t in ax.get_xticklabels()]
and even setting the text back as ticklabels can be done in the same line
ax.set_xticklabels([textwrap.fill(t.get_text(), 10) for t in ax.get_xticklabels()])
The accepted answer didn't work for me (I got a message like: 'FacetGrid' object has no attribute 'get_xticklabels'), but this worked:
g.fig.autofmt_xdate()
I am trying to create a single image with heatmaps representing the correlation of features of data points for each label separately. With seaborn I can create a heatmap for a single class like so
grouped = df.groupby('target')
sns.heatmap(grouped.get_group('Class_1').corr())
And I get this, which makes sense:
But then I try to make a list of all the labels like so:
g = sns.FacetGrid(df, col='target')
g.map(lambda grp: sns.heatmap(grp.corr()))
And sadly I get this which makes no sense to me:
Turns out you can do it pretty concisely with just seaborn if you use map_dataframe instead of map:
g = sns.FacetGrid(df, col='target')
g.map_dataframe(lambda data, color: sns.heatmap(data.corr(), linewidths=0))
@mwaskom points out in his comment that it might be a good idea to explicitly set the limits of the colormap so that the different facets can be compared more directly. The documentation describes the relevant heatmap parameters:
vmin, vmax : floats, optional
Values to anchor the colormap, otherwise they are inferred from the data and other keyword arguments.
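As a rough sketch of that suggestion: correlations always lie in [-1, 1], so fixing vmin=-1 and vmax=1 puts every facet on the same color scale. The data, the column names and the helper name corr_heatmap are made up for illustration, and the grouping column is dropped before calling corr so that only numeric features are correlated.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: four numeric features and a two-class "target" column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(60, 4)), columns=list("wxyz"))
df["target"] = np.repeat(["Class_1", "Class_2"], 30)

def corr_heatmap(data, **kwargs):
    # map_dataframe passes the facet's rows as `data` plus extras such as
    # `color`, which are ignored here.
    corr = data.drop(columns="target").corr()
    # Fixed limits so that all facets share one color scale.
    sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", cbar=False)

g = sns.FacetGrid(df, col="target")
g.map_dataframe(corr_heatmap)
```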
Without FacetGrid, but making a corr heatmap for each group in a column:
import pandas as pd
import seaborn as sns
from numpy.random import randint
import matplotlib.pyplot as plt
df = pd.DataFrame(randint(0,10,(200,12)),columns=list('abcdefghijkl'))
grouped = df.groupby('a')
rowlength = grouped.ngroups // 2 # integer division (ncols must be an int); fix up if odd number of groups
fig, axs = plt.subplots(figsize=(9,4), nrows=2, ncols=rowlength)
targets = zip(grouped.groups.keys(), axs.flatten())
for i, (key, ax) in enumerate(targets):
    sns.heatmap(grouped.get_group(key).corr(), ax=ax,
                xticklabels=(i >= rowlength),
                yticklabels=(i%rowlength==0),
                cbar=False) # Use cbar_ax into single side axis
    ax.set_title('a=%d'%key)
plt.show()
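One possible way to handle the odd-number-of-groups case mentioned in the comment is to round the column count up and hide the axes that are left over. A sketch with a hypothetical group count of 5:

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

n_groups = 5  # hypothetical odd number of groups
nrows = 2
ncols = math.ceil(n_groups / nrows)  # round up so every group gets an axes

fig, axs = plt.subplots(nrows=nrows, ncols=ncols, figsize=(9, 4))
flat_axes = axs.flatten()

# Hide the axes left over after the last group.
for unused in flat_axes[n_groups:]:
    unused.set_visible(False)
```

Because zip stops at the shorter input, pairing the group keys with flat_axes still works unchanged.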
Maybe there's a way to set up a lambda to correctly pass the data from the g.facet_data() generator through corr before going to heatmap.
I have a pandas DataFrame and I want to plot a bar chart that includes a legend.
import pylab as pl
from pandas import *
x = DataFrame({"Alpha": Series({1: 1, 2: 3, 3:2.5}), "Beta": Series({1: 2, 2: 2, 3:3.5})})
If I call plot directly, then it puts the legend above the plot:
x.plot(kind="bar")
If I turn off the legend in the plot and try to add it later, then it doesn't retain the colors associated with the two columns in the DataFrame (see below):
x.plot(kind="bar", legend=False)
l = pl.legend(('Alpha','Beta'), loc='best')
What's the right way to include a legend in a matplotlib plot from a Pandas DataFrame?
The most succinct way to go is:
x.plot(kind="bar").legend(bbox_to_anchor=(1.2, 0.5))
or in general
x.plot(kind="bar").legend(*args, **kwargs)
If you want to add the legend manually, you have to ask the subplot for the elements of the bar plot:
In [17]: ax = x.plot(kind='bar', legend=False)
In [18]: patches, labels = ax.get_legend_handles_labels()
In [19]: ax.legend(patches, labels, loc='best')
Out[19]: <matplotlib.legend.Legend at 0x10b292ad0>
Also, plt.legend(loc='best') or ax.legend(loc='best') should "just work", because there are already "links" to the bar plot patches set up when the plot is made, so you don't have to pass a list of axis labels.
I'm not sure if the version of pandas you're using returns a handle to the subplot (ax = ...) but I'm fairly certain that 0.7.3 does. You can always get a reference to it with plt.gca().
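Putting the manual route together with current pandas, here is a sketch that places the legend outside the axes; the anchor point and the right margin are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

x = pd.DataFrame({"Alpha": {1: 1, 2: 3, 3: 2.5}, "Beta": {1: 2, 2: 2, 3: 3.5}})

ax = x.plot(kind="bar", legend=False)
handles, labels = ax.get_legend_handles_labels()

# Anchor the legend's upper-left corner just outside the right edge of the axes.
legend = ax.legend(handles, labels, loc="upper left", bbox_to_anchor=(1.02, 1.0))
ax.figure.subplots_adjust(right=0.8)  # leave room on the right for the legend
```

Since the handles come from the axes itself, the legend entries keep the bar colors.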