I'm puzzled by the meaning of the 'ax' keyword in the pandas scatter_matrix function:
pd.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds={}, hist_kwds={}, **kwds)
The only clue given in the docstring for the ax keyword is too cryptic for me:
ax : Matplotlib axis object
I had a look in the pandas code for the scatter_matrix function, and the ax variable is incorporated in the following matplotlib subplots call:
fig, axes = plt.subplots(nrows=n, ncols=n, figsize=figsize, ax=ax,
squeeze=False)
But, for the life of me, I can't find any reference to an 'ax' keyword in matplotlib subplots!
Can anyone tell me what this ax keyword is for???
This one is tricky. When looking at the source of pandas scatter_matrix, you will find this line right after the docstring:
fig, axes = _subplots(nrows=n, ncols=n, figsize=figsize, ax=ax, squeeze=False)
Hence, internally, a new figure/axes combination is created using the internal _subplots function. This is closely related to matplotlib's subplots command but slightly different: here, the ax keyword is accepted as well. If you look at the corresponding source (pandas.tools.plotting._subplots) you will find these lines:
if ax is None:
    fig = plt.figure(**fig_kw)
else:
    fig = ax.get_figure()
    fig.clear()
Hence, if you supply an axes object (e.g. one created using matplotlib's subplots command), pandas scatter_matrix grabs the corresponding matplotlib figure object and deletes its content. Afterwards, a new grid of subplots is created in this figure object.
All in all, the ax keyword lets you plot the scatter matrix into a given figure (even though, IMHO, in a slightly strange way).
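The mechanism described above can be reproduced with matplotlib alone. The following is only a minimal sketch, not the pandas implementation: the helper name grid_in_figure_of is made up for illustration, and it assumes a matplotlib version that provides Figure.subplots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def grid_in_figure_of(ax, n):
    """Hypothetical helper mimicking what the answer describes for `ax`."""
    fig = ax.get_figure()   # find the figure that owns the given Axes
    fig.clear()             # wipe the figure, which also removes `ax` itself
    # build a fresh n x n grid inside the same figure
    axes = fig.subplots(nrows=n, ncols=n, squeeze=False)
    return fig, axes


fig, ax = plt.subplots()
same_fig, axes = grid_in_figure_of(ax, 3)
```

After the call, the Axes that was passed in is no longer part of the figure; only its figure is reused.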
In short, it targets a subplot within a grid.
If you have nrows=2 and ncols=2, for example, then ax allows you to plot on a specific axis by passing ax=axes[0,0] (top left) or ax=axes[1,1] (bottom right), etc.
When you create the subplots, you receive an axes variable. You can later plot (or subplot) with an element of that axes variable as above.
Take a look at the "Targeting different subplots" section of this page: http://pandas.pydata.org/pandas-docs/dev/visualization.html#targeting-different-subplots
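As a small illustration of targeting one cell of a grid, here is a sketch that assumes an ordinary Series.plot call (not scatter_matrix) and uses made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([1, 3, 2, 4])  # made-up data for illustration

fig, axes = plt.subplots(nrows=2, ncols=2)  # axes is a 2x2 array of Axes
s.plot(ax=axes[0, 0])            # draw into the top-left subplot
s.cumsum().plot(ax=axes[1, 1])   # draw into the bottom-right subplot
```

The other two subplots stay empty because nothing was directed at them.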
I hope this helps.
I am trying to figure out how to create a scatter plot in matplotlib with two different y-axes.
Right now I have one and need to add a second, with the index column values on y.
points1 = plt.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1) #set style options
plt.rcParams['figure.figsize'] = [20,10]
#plt.colorbar(points)
plt.title("timeUTC vs Load")
#plt.xlim(0, 400)
#plt.ylim(0, 300)
plt.xlabel('timeUTC')
plt.ylabel('Load_MW')
cbar = plt.colorbar(points1)
cbar.set_label('Load')
The result I expect is like this:
So the second scatter series should be TimeUTC vs. index. Colors are not the point ;) Also, in Excel the y-axes are on different sides, but that doesn't matter.
Appreciate your help! Thanks, Paulina
Continuing after the suggestions in the comments.
There are two ways of using matplotlib.
Via the matplotlib.pyplot interface, like you were doing in your original code snippet with plt.
The object-oriented way. This is the suggested way to use matplotlib, especially when you need more customisation, as in your case. In your code, ax1 is an Axes instance.
From an Axes instance, you can plot your data using the Axes.plot and Axes.scatter methods, very much like what you did through the pyplot interface. This means you can write an Axes.scatter call instead of Axes.plot and use the same parameters as in your original code:
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
c=r3_load["r3_load_MW"], s=50, cmap="rainbow", alpha=1)
ax2.plot(r3_dda249["TimeUTC"], r3_dda249.index, c='b', linestyle='-')
ax1.set_xlabel('TimeUTC')
ax1.set_ylabel('r3_load_MW', color='g')
ax2.set_ylabel('index', color='b')
plt.show()
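For readers without the original data, here is a self-contained variant of the same idea. It is a sketch under assumptions: the DataFrame is a made-up stand-in for r3_load, TimeUTC is replaced by plain hour numbers, and the second series is drawn with scatter (as the question asked) rather than plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for r3_load; TimeUTC is just hours 0..23 here
r3_load = pd.DataFrame({
    "TimeUTC": np.arange(24),
    "r3_load_MW": np.linspace(100.0, 300.0, 24),
})

fig, ax1 = plt.subplots(figsize=(20, 10))
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

# left y-axis: load, colored by its own value
points = ax1.scatter(r3_load["TimeUTC"], r3_load["r3_load_MW"],
                     c=r3_load["r3_load_MW"], s=50, cmap="rainbow")
# right y-axis: DataFrame index
ax2.scatter(r3_load["TimeUTC"], r3_load.index, c="b", s=20)

ax1.set_xlabel("TimeUTC")
ax1.set_ylabel("Load_MW")
ax2.set_ylabel("index")

# the colorbar describes the first scatter only
cbar = fig.colorbar(points, ax=ax1)
cbar.set_label("Load")
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to see the result.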
I am trying to use Pandas DataFrame.plot() to plot two variable bar plots side by side with the following code:
fig, (ax1, ax2) = plt.subplots(1,2)
ax1 = train_df['Condition1'].value_counts().plot(kind='bar')
ax2 = train_df['Condition2'].value_counts().plot(kind='bar')
plt.show()
The result is this:
The data is Kaggle's House Prices dataset, however I do not think it matters to answering the question. I have tried this with multiple pairs of variables just to be sure. It only ever shows one plot on the right.
Interestingly enough, the assignment of axes does not matter. If you only assign ax1, it will show in the right hand plot. If you only assign ax2, it will be on the right side.
This occurs no matter what orientation I choose for my subplots (2,) (1,2), (2,1). Always one empty plot.
What's going on here?
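You already created the axes with your first line of code. Your second and third lines only rebind the names ax1 and ax2 to whatever plot() returns; they do not tell pandas where to draw. Without an ax argument, both calls draw on the current axes, which is why everything ends up in one subplot.
You need to pass ax1 and ax2 as arguments to pandas' plot function instead.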
Try this:
fig, (ax1, ax2) = plt.subplots(1,2)
train_df['Condition1'].value_counts().plot(kind='bar', ax=ax1)
train_df['Condition2'].value_counts().plot(kind='bar', ax=ax2)
plt.show()
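Why the right-hand subplot? A short sketch with made-up data suggests the explanation: after plt.subplots, the last Axes created is the "current" one, and a Series plot without ax lands there.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# made-up categorical data standing in for train_df['Condition1']
counts = pd.Series(["a", "b", "a", "c", "a"]).value_counts()

fig, (ax1, ax2) = plt.subplots(1, 2)
current_is_ax2 = plt.gca() is ax2    # check which Axes is "current"

returned = counts.plot(kind="bar")   # no ax given: drawn on the current Axes
drew_on_ax2 = returned is ax2

counts.plot(kind="bar", ax=ax1)      # explicit target: the left subplot
```

With the explicit ax argument, both subplots end up with bars.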
I am trying to plot box plots and violin plots for three variables against a variable in a 3X2 subplot formation. But I am not able to figure out how to include sns lib with subplot function.
#plots=plt.figure()
axis=plt.subplots(nrows=3,ncols=3)
for i,feature in enumerate(list(df.columns.values)[:-1]):
    axis[i].plot(sns.boxplot(data=df,x='survival_status_after_5yrs',y=feature))
    i+=1
    axis[i].plot(sns.violinplot(data=df,x='survival_status_after_5yrs',y=feature))
plt.show()
I am expecting 3X2 subplot, x axis stays same all the time y axis rolls over the three variables I have mentioned.
Thanks for your help.
I think you have two problems.
First, plt.subplots(nrows=3, ncols=2) returns a figure object and an array of axes objects so you should replace this line with:
fig, ax = plt.subplots(nrows=3, ncols=2). The ax object is now a 3x2 numpy array of axes objects.
You could turn this into a 1-d array with ax = ax.flatten() but given what I think you are trying to do I think it is easier to keep as 3x2.
(Btw I assume the ncols=3 is a typo)
Second, as Ewoud's answer mentions, with seaborn you pass the axes to plot on as an argument to the plot call.
I think the following will work for you:
fig, ax = plt.subplots(nrows=3, ncols=2)
for i, feature in enumerate(list(df.columns.values)[:-1]):
    # for each feature create two plots on the same row
    sns.boxplot(data=df, x='survival_status_after_5yrs',y=feature, ax=ax[i, 0])
    sns.violinplot(data=df, x='survival_status_after_5yrs', y=feature, ax=ax[i, 1])
plt.show()
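The difference between the 3x2 array and its flattened form mentioned above can be checked without any data. This is just a sketch of the indexing; the variable names are for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=3, ncols=2)

grid_shape = ax.shape            # 2-D array: index it as ax[row, col]
flat = ax.flatten()              # 1-D copy of the array, in row-major order
same_axes = flat[3] is ax[1, 1]  # element 3 of the flat array is row 1, column 1
```

Keeping the 2-D form lets the loop use ax[i, 0] and ax[i, 1] directly, which matches the "one row per feature" layout.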
Most seaborn plot functions have an ax kwarg, so instead of
axis[i].plot(sns.boxplot(data=df,x='survival_status_after_5yrs',y=feature))
try
sns.boxplot(data=df,x='survival_status_after_5yrs',y=feature,ax=axis[i])
I'm trying to have three pseudocolor subplots side by side, with one colorbar for subplots #1 and #2 and a second colorbar for #3. I'd also like the color limits (clim) to be set so it's the same for the first two (so the first colorbar would reflect the values of both subplots #1 and #2).
Here's what I have so far:
import numpy as np
import matplotlib.pyplot as plt
plt.ion()
import matplotlib as mpl
data1 = np.random.random((10,10))
data2 = 2.*np.random.random((10,10))
data3 = 3.*np.random.random((10,10))
f, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True)
imgplot1 = ax1.pcolormesh(data1, edgecolors='None')
imgplot2 = ax2.pcolormesh(data2, edgecolors='None')
plt.subplots_adjust(hspace=0.1, wspace=0.1)
cax2, kw = mpl.colorbar.make_axes([ax1, ax2])
plt.colorbar(imgplot2, cax=cax2, **kw)
imgplot2.set_clim(0,20)
imgplot3 = ax3.pcolormesh(data3, edgecolors='None')
cax3, kw = mpl.colorbar.make_axes([ax3])
plt.colorbar(imgplot3, cax=cax3, **kw)
imgplot2.set_clim(0,20) sets subplot #2 (though I have seen a backend-dependent issue where it doesn't always update unless you interact with the plot), but is there a way to link the color limits of two subplots so one colorbar can describe both plots?
update: To clarify, I'm looking for the ability to re-adjust clim after the plots have already been created.
Specify vmin and vmax in both your calls to pcolormesh of the axes where the data should be visualized with a common colorbar.
In your case:
opts = {'vmin': 0, 'vmax': 20, 'edgecolors': 'none'}
imgplot1 = ax1.pcolormesh(data1, **opts)
imgplot2 = ax2.pcolormesh(data2, **opts)
And then get rid of the call to imgplot2.set_clim(0,20).
Alternatively, you could call set_clim on imgplot1 too, using the same parameters. It would have the same effect.
Following up on your comment below this answer: you could just as easily replace your single call to imgplot2.set_clim with a call to a custom function that updates the clim of both plots. Note that set_clim is a method of the objects returned by pcolormesh (imgplot1, imgplot2), not of the Axes, so the function takes those objects:
def update_clims(vmin, vmax, mappables):
    for m in mappables:
        m.set_clim(vmin, vmax)
If you really wanted to, you could rebind the set_clim method of those objects to perform the functionality above, provided you keep a list of the plots that share z-data. But that seems like a lot of work for an otherwise easy-to-implement function. And since you are already calling set_clim once on one plot, you could just as easily replace it with that call.
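A self-contained sketch of such a helper follows. It assumes the helper is handed the objects returned by pcolormesh (these carry set_clim), uses made-up random data, and attaches one colorbar to both subplots via fig.colorbar(..., ax=[ax1, ax2]).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data1 = rng.random((10, 10))        # made-up data
data2 = 2.0 * rng.random((10, 10))

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
img1 = ax1.pcolormesh(data1)
img2 = ax2.pcolormesh(data2)
cbar = fig.colorbar(img2, ax=[ax1, ax2])  # one colorbar next to both subplots


def update_clims(vmin, vmax, mappables):
    """Apply the same color limits to every mappable in the list."""
    for m in mappables:
        m.set_clim(vmin, vmax)


# can be called at any time after the plots exist
update_clims(0, 20, [img1, img2])
fig.canvas.draw_idle()  # ask the canvas to redraw with the new limits
```

Because both plots now use identical limits, the single colorbar describes both of them.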
Pretty much what it says in the title.. most pandas examples suggest doing fig = plt.figure() before df.plot(..). But if I do that, two figures pop up after plt.show() - the first completely empty and the second with the actual pandas figure.. Any ideas why?
On a DataFrame, df.plot(..) will create a new figure, unless you provide an Axes object to the ax keyword argument.
So you are correct that the plt.figure() is not needed in this case. The plt.figure() calls in the pandas documentation should be removed, as they indeed are not needed. There is an issue about this: https://github.com/pydata/pandas/issues/8776
What you can do with the ax keyword is, e.g.:
fig, ax = plt.subplots()
df.plot(..., ax=ax)
Note that when plotting a series, this will by default plot on the 'current' axis (plt.gca()) if you don't provide ax.
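The behaviour described above can be checked by counting open figures. This is a minimal sketch with made-up data; the variable names are for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})  # made-up data

plt.close("all")
plt.figure()                      # the extra call from the question
df.plot()                         # no ax given
figs_without_ax = len(plt.get_fignums())

plt.close("all")
fig, ax = plt.subplots()
df.plot(ax=ax)                    # draw into the Axes we created ourselves
figs_with_ax = len(plt.get_fignums())
```

Comparing figs_without_ax and figs_with_ax shows whether the extra empty figure appears.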