I've started working more with figures and axes, and at first blush it seems really nice: an axes object can be independently created and manipulated (either by adding plots to it or by changing its scale, etc.). However, the issue I'm running into is that "Figure" appears to be the only class that can control the layout of axes objects.
I would like to do something like this:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

def plot_side_by_side(lefts, rights, coupled=True, width_ratios=(2, 1)):
    # lefts and rights are lists of functions that
    # take an axes object as the keyword `ax`; the length of
    # these lists is the number of subplot rows we have:
    plots = list(zip(lefts, rights))
    y_size = len(plots)
    # create figure with a number of subplots:
    fig = plt.figure(figsize=(10, y_size * 4))
    gs = gridspec.GridSpec(y_size, 2, width_ratios=width_ratios,
                           height_ratios=[1 for _ in plots])
    # get axes on the left
    cleft_axes = [plt.subplot(gs[0, 0])]
    if y_size > 1:
        cleft_axes += [plt.subplot(gs[i, 0], sharex=cleft_axes[0])
                       for i in range(1, y_size)]
    for ax in cleft_axes[:-1]:
        plt.setp(ax.get_xticklabels(), visible=False)
    # get axes on the right; if coupled we fix the y-axes
    # together, otherwise we don't
    if coupled:
        yaxes = cleft_axes
    else:
        yaxes = [None for _ in cleft_axes]
    cright_axes = [plt.subplot(gs[0, 1], sharey=yaxes[0])]
    if y_size > 1:
        cright_axes += [plt.subplot(gs[i, 1], sharey=yaxes[i], sharex=cright_axes[0])
                        for i in range(1, y_size)]
    for ax in cright_axes[:-1]:
        plt.setp(ax.get_xticklabels(), visible=False)
    # for each pair of plot functions, hand the left function the left
    # axes and the right function the right axes; each then plots on its axes
    for (pl, pr), l, r in zip(plots, cleft_axes, cright_axes):
        pl(ax=l)
        pr(ax=r)
    return fig
And I would like to be able to create a function that takes an axes object as a keyword and puts two plots on it:
def twoplots(ax=None):
    # make a grid of axes, allow them to be plotted to, etc.
    # this is all within the space given me by `ax`.
    ...
Is this possible? How would I go about doing such a thing? I know that I can get the figure from the axes object that is passed in. Is it possible to modify the parent gridspec without messing up every other gridspec?
Hope I'm not committing a foul reviving a thread this old. I wanted to give some extra context on what I think the OP is trying to do. (At least I hope it is what he's trying to do, because I'm trying to do the same thing.)
Suppose I have a statistical model that is composed of K submodels of different types. I want the submodels to plot themselves. Most of the time, in the typical case, each submodel will plot itself on an axes object. Occasionally, a submodel might need multiple axes to plot itself.
For example: suppose a model is a time series model, and the submodels are showing trend, seasonality, regression effects, holiday effects, etc. If the seasonal effect is showing annual seasonality it will plot itself just like the trend model (its effect vs time). But if it is showing day of week seasonality the plot vs time will be ineffective, because the lines will wiggle too fast. It would be more effective to plot the time series of Mondays, then the time series of Tuesdays, etc. To fit in with the larger scheme you want this cluster of 7 plots to be "the seasonality plot."
With K submodels you can often start off with fig, ax = plt.subplots(K) and then pass ax[k] to the submodel as model.submodel[k].plot(ax[k]). The question is what to do when you'd like to plot the day-of-week seasonality effect described above on ax[k].
One answer might be "don't use this mechanism: use GridSpec or something else." But that's what I think the question is asking.
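One way this might be approached, as a minimal sketch rather than a definitive answer: take the slot that ax[k] occupies in its parent grid, remove ax[k], and build a nested grid inside that slot with matplotlib's GridSpecFromSubplotSpec. The helper name split_axes and the toy data are made up for illustration, and the sketch assumes the axes was created through a gridspec (plt.subplots, fig.add_subplot, and so on), so that ax.get_subplotspec() returns something usable.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpecFromSubplotSpec


def split_axes(ax, nrows, ncols, **grid_kwargs):
    """Replace `ax` with an nrows x ncols grid of axes in the same slot.

    Hypothetical helper, not a matplotlib API. Assumes `ax` was created
    through a gridspec (plt.subplots, fig.add_subplot, ...).
    """
    fig = ax.figure
    spec = ax.get_subplotspec()  # the slot `ax` occupies in its parent grid
    ax.remove()                  # free the slot; the other axes are untouched
    inner = GridSpecFromSubplotSpec(nrows, ncols, subplot_spec=spec, **grid_kwargs)
    return [fig.add_subplot(inner[r, c])
            for r in range(nrows) for c in range(ncols)]


K = 3
fig, ax = plt.subplots(K, figsize=(8, 9))
ax[0].plot([0, 1, 2], [0, 1, 0])                 # ordinary submodel: one axes
sub_axes = split_axes(ax[1], 1, 7, wspace=0.1)   # e.g. one small panel per weekday
for day, sub in enumerate(sub_axes):
    sub.plot([0, 1, 2], [day, day + 1, day])
ax[2].plot([0, 1, 2], [2, 1, 2])                 # another ordinary submodel
```

The parent grid itself is never modified, so the other axes keep their positions. The trade-off is that the axes object originally passed in no longer exists afterwards, so the caller must not keep using it.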
Optional context (feel free to skip): I'm currently using cartopy and matplotlib to read in and plot weather model data on a map. I have three different fields I'm plotting: temperature, wind, and surface pressure. I'm using contourf, barbs, and contour respectively to plot each field. I want one image for each field, and then I'd like one image that contains all three fields overlaid on a single map. Currently I'm doing this by plotting each field individually, saving each of the individual images, then replotting all three fields on a single ax and a new fig, and saving that fig. Since the data takes a while to plot, I would like to be able to plot each of the single fields, then combine the axes into one final image.
I'd like to be able to combine multiple matplotlib axes without replotting the data on the axes. I'm not sure if this is possible, but doing so would be a pretty major time and performance saver. An example of what I'm talking about:
from matplotlib import pyplot as plt
import numpy as np
x1 = np.linspace(0, 2*np.pi, 100)
x2 = x1 + 5
y = np.sin(x1)
firstFig = plt.figure()
firstAx = firstFig.gca()
firstAx.scatter(x1, y, 1, "red")
firstAx.set_xlim([0, 12])
secondFig = plt.figure()
secondAx = secondFig.gca()
secondAx.scatter(x2, y, 1, "blue")
secondAx.set_xlim([0, 12])
firstFig.savefig("1.png")
secondFig.savefig("2.png")
This generates two images, 1.png and 2.png.
Is it possible to save a third file, 3.png that would look something like the following, but without calling scatter again, because for my dataset, the actual plotting takes a long time?
If you just want to save images of your plots and you don't intend to further use the Figure objects, you can use the following after saving "2.png".
# get the scatter object from the first figure
scatter = firstAx.get_children()[0]
# remove it from this collection so you can assign it to a new axis
# the axis reassignment will raise an error if it already belongs to another axis
scatter.remove()
scatter.axes = secondAx
# now you can add it to your new axis
secondAx.add_artist(scatter)
secondFig.savefig("3.png")
This modifies both figures, as it removes a scatter from one and adds it to another. If for some reason you want to preserve them, you can copy the contents of secondFig to a new one and then add the scatter to that. However, this will still modify the first plot as you have to remove the scatter from there.
I wanted to use seaborn to visualize my entire Pandas dataframe with violinplots, and I thought I had made the necessary corrections to generate a large graph for the sizable number of 270 variables my dataframe possessed.
However, no matter what I do, the violinplots only display their inner mini-boxplots (as another question here describes) for each variable, and not their kde's:
fig, ax = plt.subplots(figsize=(50,5))
ax.set_ylim(-6, 6)
a = sns.violinplot(x='variable', y='value', data=pd.melt(train_norm), ax=ax)
a.set_xticklabels(a.get_xticklabels(), rotation=90);
plt.savefig('massive_violinplot.png', dpi=220)  # figsize is set on the figure, not in savefig
(apologies for the cropped graph, the whole thing is too big to post)
Whereas the following code, using the same pd.Dataframe, but only showing the first six variables, displays correctly:
fig, ax = plt.subplots(figsize=(10,5))
ax.set_ylim(-6, 6)
a = sns.violinplot(x='variable', y='value', data=pd.melt(train_norm.iloc[:,:6]), ax=ax)
a.set_xticklabels(a.get_xticklabels(), rotation=90);
plt.savefig('massive_violinplot.png', dpi=220)  # figsize is set on the figure, not in savefig
How could I get a graph like the above for all the variables, filled with proper violinplots showing their kde's?
This is not related to the number of variables or the plot size but to the huge differences in the distributions of the variables. I can't access your data right now, so I will illustrate it with a made-up dataset. You can follow along with your dataset by selecting the three variables with the most dispersion and the three with the least. As a dispersion measurement you can use the variance, or even the data range (if you don't have crazy long tails), or something different; I am not sure what would work better.
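import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
rs = np.random.RandomState(42)
data = rs.randn(100, 6)
data[:, :3] *= 20  # give the first three variables much more dispersion
df = pd.DataFrame(data)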
See what happens if we plot the density with common axes so they are directly comparable.
df.plot(kind='kde', subplots=True, layout=(3, 2), sharex=True, sharey=True)
plt.tight_layout()
This is more or less the same thing you can see in the seaborn violin plot, but of course transposed.
sns.violinplot(x='variable', y='value', data=pd.melt(df))
This is usually great for comparing the variables because you can read the differences in width as differences in density. Unfortunately the violins for the variables with more dispersion are so narrow that you can't see the width at all, and you lose any sense of the shape. On the other hand, the variables with less dispersion appear too short (in your dataset some of them are actually just horizontal lines).
For the first problem you can make the violins use all the available horizontal space by using scale='width' (renamed to density_norm='width' in seaborn 0.13 and later), but then you can no longer compare the density across variables. The width is the same at the peaks but the density is not.
sns.violinplot(x='variable', y='value', data=pd.melt(df), scale='width')
By the way, this is what matplotlib's violin plot does by default.
plt.violinplot(df.values)  # a 2D array gives one violin per column
For the second problem I think your only option is to normalize or standardize the variables in some way.
sns.violinplot(x='variable', y='value', data=pd.melt((df - df.mean()) / df.std()))
Now you have a clearer view of each variable separately (how many modes they have, how skewed they are, how long the tails are...) but you can compare neither the scale nor the dispersion across variables.
The moral of the story is that you can't see everything at once, you have to pick and choose depending on what you are looking for in the data.
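The selection step suggested earlier (the three most and the three least dispersed variables) might look like the sketch below. It uses the variance as the dispersion measure and a made-up DataFrame called wide as a stand-in for the real data; the name, the column labels and the scaling are all placeholders.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data: 10 columns with increasing spread.
rs = np.random.RandomState(0)
wide = pd.DataFrame(rs.randn(200, 10) * np.arange(1, 11),
                    columns=[f"v{i}" for i in range(10)])

# Rank the columns by variance, smallest first.
variances = wide.var().sort_values()
least = variances.index[:3].tolist()   # three least dispersed columns
most = variances.index[-3:].tolist()   # three most dispersed columns

# Keep only those six columns for plotting.
subset = wide[least + most]
```

The resulting subset can then be melted and passed to sns.violinplot in the same way as df above.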
Let's look at a swarmplot, made with Python 3.5 and Seaborn on some data (stored in a pandas DataFrame df, with the column labels stored in another class; this does not matter for now, just look at the plot):
ax = sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df)
Now the data is more readable if plotted on a log scale on the y-axis because it spans several decades.
So let's change the scaling to logarithmic:
ax.set_yscale("log")
ax.set_ylim(bottom = 5*10**-10)
Well, I have a problem with the gaps in the swarms. I guess they are there because the dot positions were computed when the plot was created with a linear axis in mind, where the dots should not overlap. But now they look kind of strange, and there is enough space to form 4 equal-looking swarms.
My question is: How can I force seaborn to recalculate the position of the dots to create better looking swarms?
mwaskom hinted to me in the comments how to solve this.
It is even stated in the swarmplot documentation:
Note that arranging the points properly requires an accurate transformation between data and point coordinates. This means that non-default axis limits should be set before drawing the swarm plot.
Set an existing axis to log scale first, then use it for the plot:
fig = plt.figure() # create figure
rect = 0,0,1,1 # create a rectangle for the new axis
log_ax = fig.add_axes(rect) # create a new axis (or use an existing one)
log_ax.set_yscale("log") # log first
sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df, ax = log_ax)
This yields the correct and desired plotting behaviour.
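For anyone who wants to try the ordering without the original data, here is a self-contained sketch. The DataFrame demo, its column names and its values are invented for illustration; only the order of the calls (log scale and limits first, swarmplot second) is the point.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Invented data: currents spread over several decades, all strictly positive.
rs = np.random.RandomState(0)
demo = pd.DataFrame({
    "temperature": np.repeat([250, 300, 350], 40),
    "voltage": np.tile(np.repeat([1, 2], 20), 3),
    "current": 10.0 ** rs.uniform(-9, -5, size=120),
})

fig, log_ax = plt.subplots()
log_ax.set_yscale("log")          # scale first ...
log_ax.set_ylim(5e-10, 1e-4)      # ... and non-default limits as well
# ... and only then draw the swarm on the prepared axes
sns.swarmplot(x="temperature", y="current", hue="voltage", data=demo, ax=log_ax)
```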
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
In each file, I merge the data into monthly datasets, that I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same to the next file.
I have this part figured out; however, I would like to add a visual cue to the plots as to which data source they come from. As far as I understand it (and have tried it),
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work; however, this way I would have to add facecolor='Red' by hand to each of the 12 subplots. Looping through the plots won't work in this situation, since I want the ylabel only on the leftmost plots and the xlabels only on the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20 by 15 figure now, which subsequently gets ignored when I save it to a PNG, since it only gets set for screen output.
So is there just a simple setthecolorofallbars='Red' command for plt.hist(… or plt.savefig(… I am missing, or should I just copy n' paste it to all twelve months?
You can use mpl.rc("axes", prop_cycle=cycler(color=["red"])) to set the default color cycle for all your axes (older matplotlib versions used the axes.color_cycle key, which has since been removed).
In this little toy example, I use the with mpl.rc_context block to limit the effects of mpl.rc to just the block. This way you don't spoil the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    # every axes created inside this block cycles through "red" only
    mpl.rc("axes", prop_cycle=cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8,8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, as you can then access every axis separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig,ax = plt.subplots(2,2)
# some data
data = numpy.random.randn(4,1000)
# some titles
title = ['Jan','Feb','Mar','April']
xlabel = ['xlabel1','xlabel2']
ylabel = ['ylabel1','ylabel2']
for i in range(ax.size):
    a = ax[i//2,i%2]  # integer division: row index, then column index
    a.hist(data[i],facecolor='r',bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left-hand side
for j in range(ax.shape[0]):
    ax[j,0].set_ylabel(ylabel[j])
# write the xlabels on all axes on the bottom
for j in range(ax.shape[1]):
    ax[-1,j].set_xlabel(xlabel[j])
fig.tight_layout()
All features (like titles) which are not constant can be put into arrays and placed at the appropriate axis.
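Putting the two ideas together for the 4x3 monthly layout from the question might look roughly like this. It is only a sketch: the function name plot_monthly, the SOURCE_COLORS mapping, the source names and the random data are all invented, and the color is passed once per figure instead of being typed into each subplot call.

```python
import numpy as np
import matplotlib.pyplot as plt

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical mapping from data source to bar color.
SOURCE_COLORS = {"source_a": "red", "source_b": "blue",
                 "source_c": "green", "source_d": "orange"}


def plot_monthly(monthly_data, source):
    """Histogram 12 monthly datasets in a 4x3 grid, colored by data source."""
    color = SOURCE_COLORS[source]
    fig, axes = plt.subplots(4, 3, figsize=(20, 15))
    for ax, month, values in zip(axes.flat, MONTHS, monthly_data):
        ax.hist(values, facecolor=color)
        ax.set_title(month)
    for ax in axes[:, 0]:        # leftmost column only
        ax.set_ylabel("value count")
    for ax in axes[-1, :]:       # bottom row only
        ax.set_xlabel("value")
    fig.tight_layout()
    return fig, axes


rng = np.random.RandomState(42)
toy_data = [rng.rand(30) for _ in MONTHS]   # stand-in for one CSV file's data
fig, axes = plot_monthly(toy_data, "source_a")
```

Inside the loop over the CSV files, one call per file followed by fig.savefig(...) should then be enough.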
I have multiple lines to be drawn on the same axes, and each of them is dynamically updated (I use set_data). The issue is that I do not know the x and y limits of each of the lines, and axes.autoscale_view(True,True,True) / axes.set_autoscale_on(True) are not doing what I expect them to. How do I autoscale my axes?
import matplotlib.pyplot as plt
fig = plt.figure()
axes = fig.add_subplot(111)
axes.set_autoscale_on(True)
axes.autoscale_view(True,True,True)
l1, = axes.plot([0,0.1,0.2],[1,1.1,1.2])
l2, = axes.plot([0,0.1,0.2],[-0.1,0,0.1])
#plt.show() #shows the auto scaled.
l2.set_data([0,0.1,0.2],[-1,-0.9,-0.8])
#axes.set_ylim([-2,2]) #this works, but i cannot afford to do this.
plt.draw()
plt.show() #does not show auto scaled
I have referred to these already, this , this.
In all the cases I have come across, the x and y limits are known. I have multiple lines on the axes and their ranges change, so keeping track of the ymax for the entire data is not practical.
A little bit of exploring got me to this,
xmin,xmax,ymin,ymax = matplotlib.figure.FigureImage.get_extent(FigureImage)
But here again, i do not know how to access FigureImage from the Figure instance.
Using matplotlib 0.99.3
From the matplotlib docs for autoscale_view:
The data limits are not updated automatically when artist data are changed after the artist has been added to an Axes instance. In that case, use matplotlib.axes.Axes.relim() prior to calling autoscale_view.
So you'll need to add two lines after the set_data call and before your plt.draw() call:
axes.relim()
axes.autoscale_view(True,True,True)
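If several lines get updated repeatedly, the two calls can be wrapped in a small helper. The sketch below reuses the data from the question; update_line is a made-up name, not part of matplotlib.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots()
l1, = axes.plot([0, 0.1, 0.2], [1, 1.1, 1.2])
l2, = axes.plot([0, 0.1, 0.2], [-0.1, 0, 0.1])


def update_line(line, xdata, ydata):
    """Hypothetical helper: swap a line's data and rescale its axes."""
    line.set_data(xdata, ydata)
    ax = line.axes
    ax.relim()            # recompute the data limits from the artists on the axes
    ax.autoscale_view()   # apply the new data limits to the view
    ax.figure.canvas.draw_idle()


update_line(l2, [0, 0.1, 0.2], [-1, -0.9, -0.8])
ymin, ymax = axes.get_ylim()   # now covers both l1 and the updated l2
```

Because relim() looks at all the lines on the axes, there is no need to track the overall minimum and maximum by hand.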