How to create legends for plots drawn in a Python for loop?

I have created a figure with four subplots, two of which have a second y-axis. Each subplot contains many different plots. I want to add a legend, but I could not get it to work with my code. The figure is given below:
My code is given below:
fig, axs = plt.subplots(2,2,figsize=(17,10))
fig.legend(loc="center right", fontsize=13,fancybox=True, framealpha=1, shadow=True, borderpad=1)
plt.rc('font',family='Times New Roman')
.
.
for i,j in zip(IV_start_index,IV_start_index[1:]): # This is simple code to access present and next element in a list
    axs[0][0].plot(module_allData_df['Time'].iloc[mpp_index],pmpp_theo,'bs',label="Theoretical")
    axs[0][0].plot(module_allData_df['Time'].iloc[mpp_index],pmpp_act,'rd',label="Actual")
.
.
plt.suptitle('A NIST Module %s power loss analysis'%(module_allData_df['Time'].loc[i].strftime('%Y-%m-%d')),fontsize=18) #
plt.savefig('All_day_power_loss')
The output is:
No handles with labels found to put in legend.
Could you help me to correct my code?
Corrections:
I did change the code as given below.
for i,j in zip(IV_start_index,IV_start_index[1:]): # This is simple code to access present and next element in a list
    axs[0][0].plot(module_allData_df['Time'].iloc[mpp_index],pmpp_theo,'bs',label="Theoretical")
    axs[0][0].plot(module_allData_df['Time'].iloc[mpp_index],pmpp_act,'rd',label="Actual")
axs[0][0].legend()
It has created many legends. The output figure is given below:

If you don't pass the legend entries to legend() directly, the legend() call must come after the plot commands that set the labels. Otherwise legend() cannot know what to list.
If you want to have a legend within each subplot, you should call axs[i][j].legend() within a loop (or call axs[0][0].legend(), axs[0][1].legend(), ... manually for each subplot of course). The point is: fig.legend() is a legend on figure level, i.e. one legend for all subplots together. This should be called once outside of the loop.
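A minimal sketch of both options, using made-up data in place of the measurements from the question. The random values, the loop count, and the deduplication through a dict are assumptions for illustration, not part of the original code. Labels are set in the plot calls, the legends are built only after the loop, and repeated labels are collapsed so that each entry appears once.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axs = plt.subplots(2, 2, figsize=(17, 10))

# Stand-in for the loop in the question: every pass adds two labelled series.
for k in range(5):
    t = np.arange(k * 10, (k + 1) * 10)
    axs[0][0].plot(t, rng.random(10), 'bs', label="Theoretical")
    axs[0][0].plot(t, rng.random(10), 'rd', label="Actual")

# Collect the handles after the loop; the dict keeps one handle per label.
handles, labels = axs[0][0].get_legend_handles_labels()
unique = dict(zip(labels, handles))

# Option A: a legend inside one subplot.
axs[0][0].legend(unique.values(), unique.keys())

# Option B: one figure-level legend, created once after all plotting is done.
fig.legend(unique.values(), unique.keys(), loc="center right", fontsize=13)
```

Without the deduplication step, the ten plot calls would produce ten legend entries, which is probably what the "many legends" in the question refers to.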

After a series of trials, I did the following to my code:
for i,j in zip(IV_start_index,IV_start_index[1:]): # This is simple code to access present and next element in a list
    axs[0][0].plot(module_allData_df['Time'].iloc[mpp_index],pmpp_theo,'bs')
    axs[0][0].plot(module_allData_df['Time'].iloc[mpp_index],pmpp_act,'rd')
axs[0][0].legend(['Theoretical','Actual'])
.
.
My output is:


Matplotlib.pyplot - how to save a histogram in a variable for later access?

Due to data access patterns, I need to save various histograms in a Python list and then access them later to output as part of a multi-page PDF.
If I save the histograms to my PDF as soon as I create them, my code works fine:
def output_histogram_pdf(self, pdf):
    histogram = plt.hist(
        x=[values], bins=50)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    if isinstance(pdf, PdfPages):
        pdf.savefig()
But if I instead save them to a list so I can later manipulate the order, I run into trouble.
histogram_list.append(histogram)
Then later
for histogram in histogram_list:
    plt.figure(histogram)
    pdf.savefig()
This does not work. I'm either saving the wrong thing, or I don't know how to properly open what I've saved.
I've spent quite some time fruitlessly googling for a working solution, but so many of the terms involved are sufficiently vague that I get tons of different types of issues in my search results. Any help would be greatly appreciated, thanks!
Short Answer
You can use plt.gcf()
When creating your graph, after setting xlabel, ylabel, and title, append the figure to histogram list.
histogram_list.append(plt.gcf())
You can then iterate over the list later and call savefig.
Long Answer
plt.hist doesn't return the figure object. However, the figure object can be obtained using gcf (Get Current Figure).
In case you do not want to use the current figure, you could always create the figure yourself, using plt.figure or plt.subplot.
Either way, since you are already plotting the histogram and setting the labels for the figure, you'd want to append the figure to the list.
Option 1: using gcf
histogram = plt.hist(
x=[values], bins=50)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.title(title)
histogram_list.append(plt.gcf())
Option 2: create your own figure
figure = plt.figure(figsize=(a,b,))
# draw histogram on figure
histogram_list.append(figure)
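As a rough end-to-end sketch of Option 2, the following stores one Figure per histogram and writes them to a PDF later. The random data, the titles, and the output filename histograms.pdf are placeholders, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np

rng = np.random.default_rng(0)
histogram_list = []

# Hypothetical datasets standing in for the real values.
for title in ["first", "second", "third"]:
    fig, ax = plt.subplots()           # one Figure per histogram
    ax.hist(rng.normal(size=200), bins=50)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    ax.set_title(title)
    histogram_list.append(fig)         # store the Figure, not the hist() result

# Later: write the figures in any order, here reversed.
with PdfPages("histograms.pdf") as pdf:
    for fig in reversed(histogram_list):
        pdf.savefig(fig)               # pass the Figure explicitly
        plt.close(fig)                 # free memory once the page is written
```

Passing the Figure to pdf.savefig() avoids relying on whichever figure happens to be current at that point.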
plt.hist returns a tuple (n, bins, patches), where n holds the counts for each bin, bins holds the bin edges (one more than n), and patches holds the artists that draw the bars.
Most simply, if you stored those tuples, try to plot each histogram as
for histogram in histogram_list:
    n = histogram[0]
    bins = histogram[1]
    plt.plot(bins[:-1], n, '-', ds='steps-pre')

plotting multivariate grouped bar graph using loop

I am trying to create a grouped bar chart with multiple subplots using matplotlib and pandas. I can create it by manually defining the plots according to the values of the DataFrame, but I want to automate it with loops. I have tried many ways of writing the loop, but I run into one error or another every time. Being a beginner in both programming and Python, I am getting lost. Here's my data: sales3
The code I have written to get the expected output:
sales3 = sales.groupby(["Region","Tier"])[["Sales2015","Sales2016"]].sum().round().astype("int64")
sales3.reset_index(inplace=True)
fig,(ax1,ax2,ax3) = plt.subplots(nrows=1,ncols=3,sharex=True,sharey=True,figsize=(10,6))
sales3[sales3["Region"]=="Central"].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax1)
ax1.set_title("Central")
sales3[sales3["Region"]=="East"].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax2)
ax2.set_title("East")
sales3[sales3["Region"]=="West"].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax3)
ax3.set_title("West")
plt.tight_layout()
Output (this is the expected output):
Please advise how to write this with a loop or in some other automated way. Say another region such as "North" or "South" is added in the future, or a new Tier is introduced: what is the best way to write the program so that it accommodates such additions?
You can iterate through the axes and regions:
sales3 = sales.groupby(["Region","Tier"])[["Sales2015","Sales2016"]].sum().round().astype("int64")
sales3.reset_index(inplace=True)
fig,axes = plt.subplots(nrows=1,ncols=3,sharex=True,sharey=True,figsize=(10,6))
# define regions to plot
regions = ["Central", "East", "West"]
# iterate over regions and axes using zip()
for region, ax in zip(regions,axes):
    sales3[sales3["Region"]==region].plot(kind="bar",x="Tier",y=["Sales2015","Sales2016"],ax=ax)
    ax.set_title(region)
plt.tight_layout()
I think the key is using Python's built-in zip function, which is described in the Python documentation.
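To accommodate regions added later, one possibility is to derive the region list and the number of subplots from the data instead of hard-coding them. The sketch below assumes a small made-up sales3 frame (the numbers and the extra "North" region are illustrative only).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the sales3 frame from the question.
sales3 = pd.DataFrame({
    "Region": ["Central", "Central", "East", "East",
               "West", "West", "North", "North"],
    "Tier": ["High", "Low"] * 4,
    "Sales2015": [10, 20, 30, 40, 50, 60, 70, 80],
    "Sales2016": [15, 25, 35, 45, 55, 65, 75, 85],
})

# Take the regions from the data so new ones are picked up automatically.
regions = sales3["Region"].unique()

# squeeze=False always returns a 2-D array of axes, even for a single region.
fig, axes = plt.subplots(nrows=1, ncols=len(regions), sharex=True,
                         sharey=True, figsize=(10, 6), squeeze=False)

for region, ax in zip(regions, axes.ravel()):
    subset = sales3[sales3["Region"] == region]
    subset.plot(kind="bar", x="Tier", y=["Sales2015", "Sales2016"], ax=ax)
    ax.set_title(region)

fig.tight_layout()
```

A new Tier needs no code change here, because each subplot draws whatever Tier rows its region contains.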

Colorbar for each row in ImageGrid

Disclaimer: I am very inexperienced using matplotlib and python in general.
Here is the figure I'm trying to make:
Using GridSpec works well for laying out the plots, but when I try to include a colorbar on the right of each row, it changes the size of the corresponding subplot. This seems to be a well-known and unavoidable problem with GridSpec. So, on the advice of this question: Matplotlib 2 Subplots, 1 Colorbar
I've decided to remake the whole plot using ImageGrid. Unfortunately, the documentation only lists the options cbar_mode=[None|single|each], whereas I want one colorbar per row. Is there a way to do this inside a single ImageGrid, or will I have to make two grids and deal with the nightmare of alignment?
What about the 5th plot at the bottom? Is there a way to include that in the image grid somehow?
The only way I can see this working is to somehow nest two ImageGrids into a GridSpec in a 1x3 column. This seems overly complicated and difficult, so I don't want to build that script until I know it's the right way to go.
Thanks for any help/advice!
Ok I figured it out. It seems ImageGrid uses subplot somehow inside it. So I was able to generate the following plot using something like
TopGrid = ImageGrid(fig, 311,
                    nrows_ncols=(1,2),
                    axes_pad=0,
                    share_all=True,
                    cbar_location="right",
                    cbar_mode="single",
                    cbar_size="3%",
                    cbar_pad=0.0,
                    cbar_set_cax=True
                    )
<Plotting commands for the top row of plots and colorbar>
BotGrid = ImageGrid(fig, 312,
                    nrows_ncols=(1,2),
                    axes_pad=0,
                    share_all=True,
                    cbar_location="right",
                    cbar_mode="single",
                    cbar_size="3%",
                    cbar_pad=0.0,
                    )
<Plotting commands for bottom row . . .>
StemPlot = plt.subplot(313)
<plotting commands for bottom stem plot>
EDIT: the whitespace in the color plots is intentional, not some artifact from adding the colorbars
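A runnable sketch of the same layout, with random images and a random stem plot filling in for the elided plotting commands. The helper image_row, the data, and the fixed color limits are assumptions for illustration. The colorbar is drawn with fig.colorbar into the grid's first colorbar axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import ImageGrid

rng = np.random.default_rng(0)
fig = plt.figure(figsize=(6, 9))

def image_row(position, scale):
    """Hypothetical helper: a 1x2 ImageGrid at `position` with one colorbar."""
    grid = ImageGrid(fig, position,
                     nrows_ncols=(1, 2),
                     axes_pad=0,
                     share_all=True,
                     cbar_location="right",
                     cbar_mode="single",
                     cbar_size="3%",
                     cbar_pad=0.0)
    for ax in grid.axes_all:
        # Same vmin/vmax across the row so one colorbar is valid for both images.
        im = ax.imshow(scale * rng.random((10, 10)), vmin=0, vmax=scale)
    fig.colorbar(im, cax=grid.cbar_axes[0])
    return grid

top_grid = image_row(311, scale=1.0)   # first row, own colorbar
bot_grid = image_row(312, scale=5.0)   # second row, own colorbar

# Ordinary subplot in the third slot for the stem plot.
stem_ax = fig.add_subplot(313)
stem_ax.stem(np.arange(10), rng.random(10))
```

Because each row is its own ImageGrid, each row gets an independent color scale, which a single grid with cbar_mode="single" would not allow.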

matplotlib x-axis formatting if x-axis is pandas index

I'm using IPython notebook's %matplotlib inline and I'm having trouble formatting my plot.
As you can see, my first and last data point aren't showing up the way the other data points are showing up. I'd like to have the error bars visible and have the graph be "zoomed out" a bit.
df.plot(yerr=df['std dev'],color='b', ecolor='r')
plt.title('SpO2 Mean with Std Dev')
plt.xlabel('Time (s)')
plt.ylabel('SpO2')
I assume I have to use
matplotlib.pyplot.xlim()
but I'm not sure how to use it properly if my x-axis is a DataFrame index composed of strings:
index = ['-3:0','0:3','3:6','6:9','9:12','12:15','15:18','18:21','21:24']
Any ideas? Thanks!
See the matplotlib documentation for the usage of xlim. In this case, if you ran plt.xlim() you would get (0.0, 8.0). Because your index uses text and not numbers, the values for xlim are just the positions of the entries in your index. So you only need to change those values by however many steps left and right you want your graph to extend. For example:
plt.xlim(-1,len(df))
This adds one step of padding to the left and right of the plotted points.
Hope that helps.
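A small self-contained sketch of the same idea. The means and standard deviations below are made up, and the column name 'mean' is an assumption, since the question does not show the DataFrame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

index = ['-3:0', '0:3', '3:6', '6:9', '9:12', '12:15', '15:18', '18:21', '21:24']
# Made-up values, for illustration only.
df = pd.DataFrame({'mean': [97, 96, 95, 96, 97, 98, 97, 96, 95],
                   'std dev': [1.0, 0.8, 1.2, 0.9, 1.1, 0.7, 1.0, 0.9, 1.3]},
                  index=index)

ax = df['mean'].plot(yerr=df['std dev'], color='b', ecolor='r')

# The string labels sit at integer positions 0 .. len(df) - 1,
# so one extra step on each side means -1 and len(df).
ax.set_xlim(-1, len(df))
ax.set_title('SpO2 Mean with Std Dev')
ax.set_xlabel('Time (s)')
ax.set_ylabel('SpO2')
```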

Changing the marker on the same set of data

I have a set of data that comes from two different sources, and I have multiple sets graphed together. So essentially 6 scatterplots with error bars (all different colors), and each scatterplot has two sources.
Basically I want the blue scatterplot to have two different markers, 'o' and 's'. I currently do this by plotting each point individually in a loop and checking whether the source is 1 or 2. If the source is 1 it plots an 's'; if the source is 2 it plots an 'o'.
However, this method does not really allow for a legend (Data1, Data2, ... Data6).
Is there a better way of doing this?
EDIT:
I want a cleaner method for this, something along the lines of
x=[1,2,3]
y=[4,5,6]
m=['o','s','^']
plt.scatter(x,y,marker=m)
But this returns an error: Unrecognized marker style
A more pythonic way (but still a loop) might be something like
x=[1,2,3]
y=[4,5,6]
l=['data1','data2','data3']
m=['ob','sb','^b']
f,a = plt.subplots(1,1)
[a.plot(*data, label=lab) for data,lab in zip(zip(x,y,m),l)]
plt.legend(loc='lower right')
plt.xlim(0,4)
plt.ylim(3,7);
But I guess this is not the most efficient way if you have lots of datapoints.
If you want to use scatter try something like
m=['o','s','^']
f,a = plt.subplots(1,1)
[a.scatter(*data, marker=m1, label=l1) for data,m1,l1 in zip(zip(x,y),m,l)]
I'm pretty sure, there is also a possibility to apply ** and dicts here.
UPDATE:
Instead of looping over the plot command, you can use the ability of matplotlib's plot function to read an arbitrary number of x,y,fmt groups; see the docs.
import numpy as np
x=np.random.random((3,6))
y=np.random.random((3,6))
l=['data1','data2','data3']
m=['ob','sb','^b']
plt.plot(*[i[j] for i in zip(x,y,m) for j in range(3)])
plt.legend(l,loc='lower right')
Calling plot in a loop is fine. You just need to keep the list of lines returned by plot and use fig.legend to create a legend for the whole figure. See http://matplotlib.org/examples/pylab_examples/figlegend_demo.html
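A minimal sketch of that approach, reusing the small x, y, and marker lists from the earlier answer. The variable names lines, labels, and formats are chosen here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]
labels = ['data1', 'data2', 'data3']
formats = ['ob', 'sb', '^b']

fig, ax = plt.subplots()
lines = []
for xi, yi, fmt in zip(x, y, formats):
    # plot() returns a list of Line2D objects; keep the single line it created.
    line, = ax.plot(xi, yi, fmt)
    lines.append(line)

# One figure-level legend built from the stored handles.
fig.legend(lines, labels, loc='lower right')
```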
Seconding @tcaswell's comments: .scatter() returns a collections.PathCollection, which provides a fast way of plotting a large number of identically shaped objects. You can use a loop to plot the data as many scatter plots (and many different datasets), but in my opinion that loses all the speed benefit provided by .scatter().
That said, it is not true that the dots have to be identical in a scatter plot. You can have different linewidths, edgecolors and many other things, but the dots have to be the same shape. See this example, which assigns different colors (and plots only one dataset):
>>> sc=plt.scatter(x, y, label='test')
>>> sc.set_color(['r','g','b'])
>>> plt.legend()
See details in http://matplotlib.org/api/collections_api.html.
These were all fine, but not really what I was looking for. The problem was how I parsed through my data and how I could add a legend in a way that wouldn't mess that up. Since I used a for-loop and plotted each point individually, depending on whether it was measured at observation location 1 or 2, any legend I made had over 50 entries. So I plotted my data as full sets (invisibly and with no change in symbols), then again in color with the varying symbols. This worked better. Thanks though.
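For comparison, here is a different technique from the one described above, sketched under assumed data: split each dataset by source with a boolean mask, draw one errorbar call per source, and label only one of them so the dataset shows up once in the legend. The arrays, the alternating source ids, and the label Data1 are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(20)
y = rng.random(20)
err = np.full(20, 0.05)
source = np.array([1, 2] * 10)      # hypothetical source id per point

fig, ax = plt.subplots()
markers = {1: 's', 2: 'o'}          # marker used for each source
for src, marker in markers.items():
    mask = source == src
    # Label only the first group; labels starting with "_" are left out
    # of the legend, so the dataset appears there exactly once.
    ax.errorbar(x[mask], y[mask], yerr=err[mask], fmt=marker, color='b',
                label='Data1' if src == 1 else '_nolegend_')

ax.legend()
```

This needs two plotting calls per dataset instead of one per point, so six datasets would give six legend entries.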
