I need to plot multiple sets of data on the same plot, and I use matplotlib.
For some of the plots I use plt.plot() and for the others I use plt.errorbar(). But when I make a legend, the ones created with plt.plot() appear first, no matter in which order I put them in the file (and zorder seems to have no effect on the position in the legend).
How can I give the order that I want in the legend, regardless of the way I plot the data?
You can adjust the order manually, by getting the legend handles and labels using ax.get_legend_handles_labels, and then reordering the resulting lists, and feeding them to ax.legend. Like so:
import matplotlib.pyplot as plt
import numpy as np
fig,ax = plt.subplots(1)
ax.plot(np.arange(5),np.arange(5),'bo-',label='plot1')
ax.errorbar(np.arange(5),np.arange(1,6),yerr=1,marker='s',color='g',label='errorbar')
ax.plot(np.arange(5),np.arange(2,7),'ro-',label='plot2')
handles,labels = ax.get_legend_handles_labels()
handles = [handles[0], handles[2], handles[1]]
labels = [labels[0], labels[2], labels[1]]
ax.legend(handles,labels,loc=2)
plt.show()
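Hard-coded indices break as soon as another artist is added, so a variation is to reorder by label instead. The following is only a sketch under that assumption: the helper name legend_in_order is made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def legend_in_order(ax, wanted, **legend_kwargs):
    """Hypothetical helper: build a legend whose entries follow `wanted`."""
    handles, labels = ax.get_legend_handles_labels()
    by_label = dict(zip(labels, handles))          # label -> handle lookup
    ordered = [by_label[name] for name in wanted]  # KeyError if a label is missing
    return ax.legend(ordered, wanted, **legend_kwargs)


fig, ax = plt.subplots()
x = np.arange(5)
ax.plot(x, x, "bo-", label="plot1")
ax.errorbar(x, x + 1, yerr=1, marker="s", color="g", label="errorbar")
ax.plot(x, x + 2, "ro-", label="plot2")

leg = legend_in_order(ax, ["plot1", "errorbar", "plot2"], loc="upper left")
legend_labels = [t.get_text() for t in leg.get_texts()]
print(legend_labels)
```

Because the lookup is keyed on the label text, this assumes every label is unique.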
I am using pandas in conjunction with numpy, matplotlib, seaborn and a lot of data. I am plotting dataframe data with the following lines of code:
####MINIMIZED CODE#####
# imports of numpy etc. omitted
df=pd.read_csv("csvfile")
df["movieTitle"]=pd.Categorical(df["movieTitle"]) #done for "movieTitle"
df["releaseDate"]=pd.to_datetime(df["releaseDate"])
df2=df.sort_values('inflationGross', ascending=True) ###which was disneyData2 in original code
df2=df=df2.head(100)
plot=rando=sns.lmplot(x='releaseDate', y='inflationGross', data=df2, fit_reg=False, hue='movieTitle', legend=True)
rando.set_xticklabels(rotation=90)
########ACTUAL CODE#########
rando=sns.lmplot(x='releaseDate', y='inflationGross', data=disneyData2, fit_reg=False, hue='movieTitle', legend=True)
The plot itself is fine. However, the categorical data, which is a series of film titles, won't fit entirely in the legend. There's a large number of categories, which I understand isn't good practice, but I'd like to find a way to shrink the text size or otherwise format the legend correctly.
Everything has been imported correctly and I have checked there are no bugs in the code.
I would like to keep using only seaborn to analyse this data, NOT matplotlib.
Picture of the problem is included below.
Thank you in advance for any help!
Roughly speaking, you can delete the automatically created legend and reuse the handles and label text of the original legend to create a new one with whatever settings you need (I set the font size to 8). Since no data was presented, I modified the code from the official site for illustrative purposes. If the data is provided, you will likely get better answers from more people.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(color_codes=True)
tips = sns.load_dataset("tips")
g = sns.lmplot(x="total_bill", y="tip", hue='smoker', data=tips, legend=True, legend_out=True)
# A figure can hold multiple legends, so pick the first legend in the list
lg = g.fig.legends[0]
# Delete the figure legend
g.fig.legends[0].remove()
# The list where the handles are kept
# (this attribute is named legend_handles in Matplotlib 3.7 and later)
handles = lg.legendHandles
# Extract only the strings from the list of Text objects holding the labels
labels = [t.get_text() for t in lg.texts]
print(handles)
print(labels)
g.fig.axes[0].legend(handles, labels, loc='center left', bbox_to_anchor=(1.0, 0.5), frameon=False, fontsize=8)
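The last call above is an ordinary Matplotlib legend call, so the same keyword arguments can be tried without seaborn. The following is only a sketch with made-up category names ("Film 00" and so on) rather than the asker's data. It shows a small font combined with several legend columns, which is one way to fit many entries. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
titles = [f"Film {i:02d}" for i in range(30)]  # made-up category names

fig, ax = plt.subplots(figsize=(9, 5))
for title in titles:
    ax.scatter(rng.random(3), rng.random(3), s=15, label=title)

# Small font, two columns, anchored just outside the right edge of the axes
leg = ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5),
                frameon=False, fontsize=6, ncol=2)
fig.tight_layout()  # let tight_layout try to make room for the legend

n_entries = len(leg.get_texts())
print(n_entries)
```

With very long titles, more columns or a wider figure may still be needed. The values 6 and 2 are arbitrary starting points.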
I am using the integrated plot() function in pandas to generate a graph with two y-axes. This works well and the legend even points to the (right) y-axis for the second data set. But imho the legend's position is bad.
However, when I update the legend position I get two legends: the correct one ('A', 'B (right)') at an inconvenient location, and a wrong one ('A' only) at the chosen location.
So now I want to generate a legend on my own and was looking for the second <matplotlib.lines.Line2D>, but it is not contained in the ax environment.
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
len(ax.lines)
>>> 1
My ultimate objective is to be able to move the correct legend around, but I am confident I could manually place a legend, if only I had access to the second line container.
If I had it, I would suppress the original legend by invoking df.plot(...,legend=None) and do something like plt.legend([ax.lines[0],ax.lines[1]],['A','B (right)'],loc='center left',bbox_to_anchor=(1.2, 0.5)). But ax only stores the first line "A"; where is the second?
Also ax.get_legend_handles_labels() only contains ([<matplotlib.lines.Line2D at 0x2630e2193c8>], ['A']).
You create two axes. Each contains a line. So you need to loop over the axes and take the line(s) from each of them.
import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
lines = np.array([axes.lines for axes in ax.figure.axes]).flatten()
print(lines)
To create a single legend, however, you can simply use a figure legend:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'], legend=False)
ax.figure.legend()
plt.show()
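If the legend should be placed by hand, as the question describes, the handles and labels of both axes can be collected and passed to one legend call. The following is a sketch that uses plain Matplotlib with twinx() as a stand-in for the two axes that pandas creates. The data and the placement values are taken from the question, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y-axis sharing the same x-axis
ax.plot([0, 1, 2], [1, 2, 3], color="C0", label="A")
ax2.plot([0, 1, 2], [1 / 4, 1 / 5, 1 / 6], color="C1", label="B (right)")

# Gather the legend entries from every axes in the figure
handles, labels = [], []
for axes in fig.axes:
    h, l = axes.get_legend_handles_labels()
    handles.extend(h)
    labels.extend(l)

leg = ax.legend(handles, labels, loc="center left", bbox_to_anchor=(1.2, 0.5))
print(labels)
```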
I'm doing some EDA using pandas and seaborn. This is the code I use to plot the histograms of a group of features:
skewed_data = pd.DataFrame.skew(data)
skewed_features =skewed_data.index
fig, axs = plt.subplots(ncols=len(skewed_features))
plt.ticklabel_format(style='sci', axis='both', scilimits=(0,0))
for i,skewed_feature in enumerate(skewed_features):
    g = sns.distplot(data[column])
    sns.distplot(data[skewed_feature], ax=axs[i])
This is the result I'm getting:
It is not readable. How can I avoid that issue?
I know you are concerned about the layout of the figures. However, you first need to decide how to represent your data. Here are two choices for your case:
(1) Multiple lines in one figure and
(2) Multiple subplots 2x2, each subplot draws one line.
I am not very familiar with seaborn, but seaborn's plotting is built on matplotlib, so I can give you some basic ideas.
To achieve (1), first create the figure and ax, then add all the lines to this ax. Example code:
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots()
# YOUR LOOP, use the ax parameter (data is your own dataset)
for i in range(3):
    sns.distplot(data[i], ax=ax)
To achieve (2), do the same as above, but with a different number of subplots, and put each line in a different subplot.
# Four subplots, 2x2
fig, axarr = plt.subplots(2,2)
# YOUR LOOP, use different cell
You may check the matplotlib subplots demo. Producing a good visualization is hard work, and there is a lot of documentation to read. Browsing the matplotlib or seaborn gallery is a good, quick way to see how particular kinds of visualization are implemented.
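For option (2), the loop can be sketched as follows. This is only an illustration with randomly generated, made-up features (feat_a to feat_d) in place of the asker's data, and it uses plain ax.hist rather than seaborn. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Made-up skewed features standing in for the columns of `data`
features = {
    "feat_a": rng.exponential(1.0, 500),
    "feat_b": rng.lognormal(0.0, 1.0, 500),
    "feat_c": rng.gamma(2.0, 2.0, 500),
    "feat_d": rng.chisquare(3, 500),
}

# Four subplots, 2x2; axarr.flat walks the grid row by row
fig, axarr = plt.subplots(2, 2, figsize=(8, 6))
for ax, (name, values) in zip(axarr.flat, features.items()):
    ax.hist(values, bins=30)
    ax.set_title(name)
fig.tight_layout()  # keeps titles and tick labels from overlapping
```

A grid keeps each panel wide enough to read, whereas plt.subplots(ncols=len(...)) squeezes every feature into a single row.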
Thanks.
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
In each file, I merge the data into monthly datasets, that I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same to the next file.
This part I have figured out. However, I would like to add a visual cue to the plots indicating which data source they come from. As far as I understand it (and have tried it)
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work. However, this way I would have to add facecolor='Red' by hand to each of the 12 subplots. Looping through the plots won't work in this situation, since I want the ylabel only on the leftmost plots and the xlabels only on the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20 by 15 figure now, which subsequently gets ignored when I save it to a PNG, since it only gets set for screen output.
So is there just a simple setthecolorofallbars='Red' command for plt.hist(… or plt.savefig(… I am missing, or should I just copy n' paste it to all twelve months?
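You can set the default color cycle for all your axes through the rc settings. The original form of this call, mpl.rc("axes", color_cycle="red"), only works in old Matplotlib versions: the axes.color_cycle setting was deprecated in Matplotlib 1.5 and has since been removed in favor of axes.prop_cycle, which the example below uses.
In this little toy example, I use the with mpl.rc_context block to limit the effects of mpl.rc to just the block. This way you don't spoil the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler
np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    # axes.prop_cycle replaces the removed axes.color_cycle setting
    mpl.rc("axes", prop_cycle=cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8,8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)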
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, as it lets you access every axis separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig,ax = plt.subplots(2,2)
# some data
data = numpy.random.randn(4,1000)
# some titles
title = ['Jan','Feb','Mar','April']
xlabel = ['xlabel1','xlabel2']
ylabel = ['ylabel1','ylabel2']
for i in range(ax.size):
    a = ax[i//2,i%2]  # integer division, so the index is valid in Python 3
    a.hist(data[i],facecolor='r',bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left-hand side
for j in range(ax.shape[0]):
    ax[j,0].set_ylabel(ylabel[j])
# write the xlabels on all axes on the bottom
for j in range(ax.shape[1]):
    ax[-1,j].set_xlabel(xlabel[j])
fig.tight_layout()
fig.tight_layout()
All features (like titles) which are not constant can be put into arrays and placed at the appropriate axis.
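Applied to the 12-month layout from the question, the same idea can be sketched in a single loop: the color is set once in a variable, and the labels are added only when the subplot sits in the left column or the bottom row. The data, the source_color variable and the label texts are placeholders chosen for illustration. The Agg backend is selected only so the snippet runs without a display.

```python
import calendar
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
monthly = [rng.normal(size=200) for _ in range(12)]  # placeholder data
source_color = "red"  # hypothetical: one color per data source

nrows, ncols = 4, 3
fig, axes = plt.subplots(nrows, ncols, figsize=(20, 15))
for i, (ax, values) in enumerate(zip(axes.flat, monthly)):
    row, col = divmod(i, ncols)          # position of this subplot in the grid
    ax.hist(values, facecolor=source_color)
    ax.set_title(calendar.month_name[i + 1])
    if col == 0:                         # leftmost column only
        ax.set_ylabel("value count")
    if row == nrows - 1:                 # bottom row only
        ax.set_xlabel("value")
fig.tight_layout()
```

Switching to another data source then means changing source_color in one place.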
In the following code snippet:
import numpy as np
import pandas as pd
import pandas.rpy.common as com
import matplotlib.pyplot as plt
mtcars = com.load_data("mtcars")
df = mtcars.groupby(["cyl"]).apply(lambda x: pd.Series([x["cyl"].count(), np.mean(x["wt"])], index=["n", "wt"])).reset_index()
plt.plot(df["n"], range(len(df["cyl"])), "o")
plt.yticks(range(len(df["cyl"])), df["cyl"])
plt.show()
This code outputs a dot plot, but the result looks quite awful: neither the xticks nor the yticks have enough space, so it is difficult to see that both 4 and 8 of the cyl variable have their values plotted in the graph.
So how can I plot it with enough space in advance, much like you can do without any hassle in R/ggplot2?
For your information, neither this code nor this one works in my case. Does anyone know the reason? Do I have to bother creating such subplots in the first place? Is it impossible to automatically adjust the ticks in response to the input values?
I can't quite tell what you're asking...
Are you asking why the ticks aren't automatically positioned or are you asking how to add "padding" around the inside edges of the plot?
If it's the former, it's because you've manually set the tick locations with yticks. This overrides the automatic tick locator.
If it's the latter, use ax.margins(some_percentage) (where some_percentage is between 0 and 1, e.g. 0.05 is 5%) to add "padding" to the data limits before they're autoscaled.
As an example of the latter, by default, the data limits can be autoscaled such that a point can lie on the boundaries of the plot. E.g.:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
plt.show()
If you want to avoid this, use ax.margins (or equivalently, plt.margins) to specify a percentage of padding to be added to the data limits before autoscaling takes place.
E.g.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(10), 'ro')
ax.margins(0.04) # 4% padding, similar to R.
plt.show()
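To see what ax.margins does to the limits, the resulting axis ranges can be printed. This is only a sketch: the Agg backend is selected so the snippet runs without a display. Note that Matplotlib 2.0 and later already apply a default margin of 0.05 (the axes.xmargin and axes.ymargin settings), so on those versions points no longer sit exactly on the boundary by default, and ax.margins simply changes the amount of padding.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(10), "ro")  # x and y both run from 0 to 9
ax.margins(0.04)          # pad by 4% of the data range: 0.04 * 9 = 0.36

xlim = ax.get_xlim()
ylim = ax.get_ylim()
print(xlim, ylim)
```

ax.margins also accepts separate x and y keyword arguments if the two directions need different amounts of padding.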