I'm doing some EDA using pandas and seaborn. This is the code I have to plot the histograms of a group of features:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
skewed_data = pd.DataFrame.skew(data)
skewed_features = skewed_data.index
fig, axs = plt.subplots(ncols=len(skewed_features))
plt.ticklabel_format(style='sci', axis='both', scilimits=(0,0))
for i, skewed_feature in enumerate(skewed_features):
    sns.distplot(data[skewed_feature], ax=axs[i])
This is the result I'm getting:
It is not readable. How can I avoid that issue?
I know you are concerned about the layout of the figures. However, you first need to decide how to represent your data. Here are two choices for your case:
(1) multiple lines in one figure, or
(2) multiple subplots in a 2x2 grid, each subplot drawing one line.
I am not very familiar with seaborn, but seaborn's plotting is built on matplotlib, so I can give you some basic ideas.
To achieve (1), first create the figure and a single ax, then draw all the lines on that ax. Example code:
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots()
# YOUR LOOP: pass the same ax to every call
for i in range(3):
    sns.distplot(data[i], ax=ax)
To achieve (2), do the same, but create several subplots and draw each line in a different subplot.
# Four subplots, 2x2
fig, axarr = plt.subplots(2, 2)
# YOUR LOOP: draw each series in a different cell
for i, ax in enumerate(axarr.flat):
    sns.distplot(data[i], ax=ax)
You may want to check the matplotlib subplots demo. Good visualization is hard work, and there is a lot of documentation to read. Browsing the matplotlib or seaborn gallery is a good, quick way to see how particular kinds of visualization are implemented.
Thanks.
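Option (2) can be generalized to any number of features by computing the grid size from the column count. Below is a minimal sketch, assuming `data` is a DataFrame of numeric columns; `plot_histograms_grid`, its `ncols=3` default and the demo data are hypothetical. It uses plain matplotlib `ax.hist` to keep dependencies small; seaborn's `sns.histplot(data[column], ax=ax)` (the replacement for the deprecated `distplot`) takes the same `ax` argument and could be swapped in.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_histograms_grid(data, ncols=3, bins=20):
    """Draw one histogram per column of `data` on a grid of subplots (hypothetical helper)."""
    columns = list(data.columns)
    nrows = math.ceil(len(columns) / ncols)
    # squeeze=False keeps `axes` 2-D even when there is only one row
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, column in zip(flat_axes, columns):
        ax.hist(data[column].dropna(), bins=bins)
        ax.set_title(column)
    # Hide leftover cells when the column count does not fill the grid
    for ax in flat_axes[len(columns):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Hypothetical example data: five right-skewed columns
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.exponential(size=(200, 5)), columns=list("abcde"))
fig, axes = plot_histograms_grid(demo, ncols=3)
```

Unused cells are hidden rather than deleted, so the grid keeps a regular shape and each histogram gets enough room to stay readable.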
N.B.: I have edited the question as it was probably unclear: I am looking for the best method to determine the type of plot in a given axis.
QUESTION:
I am trying to make a generic function which can arrange multiple figures as subplots.
As I loop over the subplots through fig.axes to set some properties (e.g. axis range), I need to know the type of each plot to decide which properties to set (e.g. I want to set the x range on images and line plots, but not on the colorbar, otherwise my plot will explode).
My question, then, is how to distinguish between the different types.
I tried using try/except and selecting on the basis of properties specific to each plot type, but the properties seem to be the same for all of them. So far, the best way I have found is to check the content of each axis: in particular, ax.images is a non-empty list if the plot is an image, ax.lines is non-empty if it is a line plot, and a colorbar has both empty.
This works for simple plots, but I wonder whether it is the best approach and whether it still works in more complex cases (e.g. insets, overlapping lines and images, subclasses).
This is just an example to illustrate how the different types of plots can be accessed; the following code creates three axes l, i and cb (line, image and colorbar, respectively):
import numpy as np
import matplotlib.pyplot as plt
# create test figure
plt.figure()
b = np.arange(12).reshape([4,3])
plt.subplot(121)
plt.plot([1,2,3],[4,5,6])
plt.subplot(122)
plt.imshow(b)
plt.colorbar()
# create test objects
ax=plt.gca()
fig=plt.gcf()
l,i,cb = fig.axes
# do a simple test, images are different:
for o in l,i,cb: print(len(o.images))
# this doesn't help either: attributes of the line axes missing from the colorbar axes give an empty list
[a for a in dir(l) if a not in dir(cb)]
I created a test figure in IPython:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
fig, ax = plt.subplots()
ax.imshow(((0,1),(2,3)))
ax.scatter((0,1),(0,1), fc='w', ec='k')
ax.plot((0,1),(0,1))
fig.colorbar(ScalarMappable(), ax=ax)
plt.show()
Then I inspected the figure:
In [48]: fig.axes
Out[48]: [<AxesSubplot:>, <AxesSubplot:label='<colorbar>'>]
I can recognize that one of the two axes is a colorbar, and it's also easy to inspect the contents of the individual axes:
In [49]: fig.axes[0]._children
Out[49]:
[<matplotlib.image.AxesImage at 0x7fad9dda2b30>,
<matplotlib.collections.PathCollection at 0x7fad9dad04f0>,
<matplotlib.lines.Line2D at 0x7fad9dad09d0>]
In [50]: fig.axes[1]._children
Out[50]:
[<matplotlib.patches.Polygon at 0x7fad9db525f0>,
<matplotlib.collections.LineCollection at 0x7fad9db52830>,
<matplotlib.collections.QuadMesh at 0x7fad9dad2320>]
Keep in mind that Matplotlib provides you with many different container objects.
You can record the role of each Axes in a list or a dictionary when you create it; you can even set an attribute yourself, e.g. ax.ax_type = 'lineplot'.
That said, here is one approach:
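from matplotlib.pyplot import subplots, plot
fig, ax = subplots()
plot((1, 2), (2, 1))
...
axes_types = []
for ax_i in fig.axes:
    # getattr raises AttributeError if the Axes has no such method;
    # 'get_clabel' is the marker method picked here, check it exists in your matplotlib version
    try:
        getattr(ax_i, 'get_clabel')
        axes_types.append('colorbar')
    except AttributeError:
        axes_types.append('lineplot')
...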
In other words, choose a method that is unique to each of the different types you're testing and check whether it's available.
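Combining the two suggestions above, a heuristic classifier could first honor an explicit tag you set yourself and only then fall back to inspecting the Axes. A minimal sketch, assuming colorbars are created with `fig.colorbar`/`plt.colorbar`, which (as the IPython output above shows) give the colorbar Axes the label `'<colorbar>'`; `classify_axes` and the `ax_type` attribute are hypothetical names, not matplotlib API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np


def classify_axes(ax):
    """Guess what an Axes contains (heuristic sketch, not an official matplotlib API)."""
    # 1. An explicit tag set by your own code wins (hypothetical attribute name)
    tag = getattr(ax, "ax_type", None)
    if tag is not None:
        return tag
    # 2. Colorbar axes created by fig.colorbar()/plt.colorbar() carry this label
    if ax.get_label() == "<colorbar>":
        return "colorbar"
    # 3. Otherwise fall back to inspecting the artists on the Axes
    if ax.images:
        return "image"
    if ax.lines:
        return "line"
    return "other"


# Rebuild the test figure from the question
fig = plt.figure()
plt.subplot(121)
plt.plot([1, 2, 3], [4, 5, 6])
plt.subplot(122)
plt.imshow(np.arange(12).reshape(4, 3))
plt.colorbar()

kinds = [classify_axes(ax) for ax in fig.axes]
```

Insets and overlapped artists would still need extra rules (an Axes holding both an image and lines is reported as `'image'` here), so tagging each Axes when you create it remains the most robust option.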
How can I use seaborn's lineplot function to create a plot with no lines connecting the points? I know the function is called lineplot, but it has the useful feature of merging all data points with the same x value and plotting a single mean and confidence interval.
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', err_style='bars')
How do I plot without the line? I'm not sure of a better way to phrase my question. How can I plot points only? Lineless lineplot?
I know that seaborn has a pointplot function, but that is for categorical data. In some cases, my x-values are continuous values, so pointplot would not work.
I realize one could get into the matplotlib figure artists and delete the line, but that gets more complicated as the amount of stuff on the plot increases. I was wondering if there are some sort of arguments that can be passed to the lineplot function.
To get error bars without the connecting lines, you can set the linestyle parameter to '':
import seaborn as sns
tips = sns.load_dataset('tips')
sns.lineplot(x='size', y='total_bill', data=tips, marker='o', linestyle='', err_style='bars')
Other types of linestyle could also be interesting, for example "a loosely dotted line": sns.lineplot(..., linestyle=(0, (1, 10)))
I recommend setting join=False.
For me only join = True works.
sns.pointplot(data=df, x="x_attribute", y="y_attribute", ci=95, join=False)
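For continuous x values where `pointplot` doesn't fit, another option is to do the aggregation yourself and draw markers with error bars using matplotlib directly. A minimal sketch on hypothetical data; note it shows the standard error of the mean, which is not the same as seaborn's bootstrapped confidence interval, so swap in whatever spread measure you need.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with repeated (possibly continuous) x values
df = pd.DataFrame({"x": [1.0, 1.0, 2.5, 2.5, 4.0, 4.0],
                   "y": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0]})

# Collapse rows sharing an x value into a mean and a standard error
summary = df.groupby("x")["y"].agg(["mean", "sem"]).reset_index()

fig, ax = plt.subplots()
# fmt="o" draws markers only, so the means are not joined by a line
container = ax.errorbar(summary["x"], summary["mean"], yerr=summary["sem"],
                        fmt="o", capsize=3)
```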
I am trying to find a way to apply the shared axes parameters of subplot() to every other plot in a series of subplots.
I've got the following code, which uses data from RPM4 based on rows in fpD:
fig, ax = plt.subplots(2*(fpD['name'].count()), sharex=True, figsize=(6, fpD['name'].count()*2),
                       gridspec_kw={'height_ratios': [5, 1]*fpD['name'].count()})
for i, r in fpD.iterrows():
    RPM4[RPM4['name'] == RPM3.iloc[i, 0]].plot(x='date', y='RPM', ax=ax[(2*i)], legend=False)
    RPM4[RPM4['name'] == RPM3.iloc[i, 0]].plot(kind='area', color='lightgrey', x='date', y='total', ax=ax[(2*i)+1],
                                              legend=False)
    ax[2*i].set_title('test', fontsize=12)
plt.tight_layout()
This produces output that is very close to what I need. It loops through the 'name' column in a table, produces two plots for each name, and displays them as subplots:
As you can see, the sharex parameter works fine for me here, since I want all the plots to share the same axis.
However, what I'd really like is for all the even-numbered (bigger) plots to share the same y axis, and for the odd-numbered (small grey) plots to all share a different y axis.
Any help on accomplishing this is much appreciated, thanks!
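One possible approach, sketched here rather than tested against the RPM4/fpD data: keep `sharex=True` in `plt.subplots`, then link the y-axes afterwards with `Axes.sharey` (available in matplotlib 3.3 and later). Even-indexed axes are tied to `ax[0]` and odd-indexed ones to `ax[1]`; `n_names` stands in for `fpD['name'].count()` and the dummy data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

n_names = 3  # hypothetical stand-in for fpD['name'].count()

fig, ax = plt.subplots(2 * n_names, sharex=True, figsize=(6, n_names * 2),
                       gridspec_kw={"height_ratios": [5, 1] * n_names})

# Axes.sharey links y-axes after creation (it fails if the y-axis is already shared).
# Even indices (big plots) follow ax[0]; odd indices (small grey plots) follow ax[1].
for k in range(2, 2 * n_names):
    ax[k].sharey(ax[k % 2])

# Dummy data with different ranges in each group
for i in range(n_names):
    ax[2 * i].plot([0, 1], [0, 10 * (i + 1)])
    ax[2 * i + 1].fill_between([0, 1], [0, i + 1])
```

In the original loop, the pandas `.plot(..., ax=ax[2*i])` calls stay unchanged; only the sharing step is added right after `plt.subplots`.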
Based on the second answer to this question, I wrote the following code
import matplotlib.pyplot as plt
for i1 in range(2):
    plt.figure(1)
    f, ax = plt.subplots()
    plt.plot((0,3), (2, 2), 'b')
    for i2 in range(2):
        plt.figure(2)
        f, ax = plt.subplots()
        plt.plot([1,2,3], [1,2,3], 'r')
        plt.savefig('foo_{}_bar_{}.jpg'.format(i2, i1))
        plt.close()
    plt.figure(1)
    plt.plot( [1,2,3],[1,2,3], 'r')
    plt.savefig('bar_{}.jpg'.format(i1))
    plt.close()
to create plots bar_0.jpg and bar_1.jpg, each showing a blue and a red line.
However, the saved figures do not look as intended.
How can I achieve the desired behaviour?
Note that the foo_*.jpg plots have to be saved and closed while the bar plots are being handled.
You're already saving the Axes objects, so instead of calling the PyPlot plot function (which draws on the last created or activated Axes), use the objects' plot function:
ax.plot(...)
If you then give the two Axes different names, say ax1 and ax2, you can draw on either one without interfering with the other. Most plt. plotting commands also exist as Axes methods, though sometimes the name changes (plt.xticks becomes ax.set_xticks, for example). See the documentation of Axes for details.
To save the figures, use the Figure objects in the same way:
f.savefig(...)
This API type is only just coming to Matlab, FYI, and will probably replace the old-fashioned "draw on the last active plot" behaviour in the future. The object-oriented approach here is more flexible with minimal overhead, so I strongly recommend you use it everywhere.
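Applied to the code in the question, that advice might look like the following minimal sketch. Each `foo` figure is created, saved and closed inside the inner loop, and the `bar` figure is reached through its own `fig_bar`/`ax_bar` handles instead of `plt.figure(1)`. `make_plots` and the temporary output directory are hypothetical, and saving `.jpg` relies on Pillow, which recent matplotlib releases install as a dependency.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


def make_plots(outdir):
    """Save foo_* and bar_* plots using explicit Figure/Axes objects (hypothetical helper)."""
    bar_line_counts = {}
    for i1 in range(2):
        fig_bar, ax_bar = plt.subplots()
        ax_bar.plot((0, 3), (2, 2), 'b')
        for i2 in range(2):
            # Each foo plot gets its own figure, saved and closed right away
            fig_foo, ax_foo = plt.subplots()
            ax_foo.plot([1, 2, 3], [1, 2, 3], 'r')
            fig_foo.savefig(os.path.join(outdir, 'foo_{}_bar_{}.jpg'.format(i2, i1)))
            plt.close(fig_foo)
        # Back on the bar figure: no need to re-activate it, just use its Axes
        ax_bar.plot([1, 2, 3], [1, 2, 3], 'r')
        name = 'bar_{}.jpg'.format(i1)
        fig_bar.savefig(os.path.join(outdir, name))
        bar_line_counts[name] = len(ax_bar.lines)
        plt.close(fig_bar)
    return bar_line_counts


outdir = tempfile.mkdtemp()
counts = make_plots(outdir)
```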
If unsure, better to make it explicit:
import matplotlib.pyplot as plt
for i1 in range(2):
    fig1, ax1 = plt.subplots()
    fig2, ax2 = plt.subplots()
    ax1.plot([0,4],[2,2],'b')
    for i2 in range(2):
        ax2.plot([1,2,3],[1,2,3],'r')
        fig2.savefig('abc{}.png'.format(2*i1+i2))
    plt.close(fig2)  # fig2 is finished once the inner loop ends
    ax1.plot([1,2,3],[1,2,3],'r')
    fig1.savefig('cba{}.png'.format(i1))
    plt.close(fig1)
I'm puzzled by the meaning of the 'ax' keyword in the pandas scatter_matrix function:
pd.scatter_matrix(frame, alpha=0.5, figsize=None, ax=None, grid=False, diagonal='hist', marker='.', density_kwds={}, hist_kwds={}, **kwds)
The only clue given in the docstring for the ax keyword is too cryptic for me:
ax : Matplotlib axis object
I had a look in the pandas code for the scatter_matrix function, and the ax variable is incorporated in the following matplotlib subplots call:
fig, axes = plt.subplots(nrows=n, ncols=n, figsize=figsize, ax=ax,
squeeze=False)
But, for the life of me, I can't find any reference to an 'ax' keyword in matplotlib subplots!
Can anyone tell me what this ax keyword is for???
This is tricky here. When looking at the source of pandas scatter_matrix you will find this line right after the docstring:
fig, axes = _subplots(nrows=n, ncols=n, figsize=figsize, ax=ax, squeeze=False)
Hence, internally, a new figure and axes combination is created using the internal _subplots function. This is closely related to matplotlib's subplots function but slightly different: here, the ax keyword is supplied as well. If you look at the corresponding source (pandas.tools.plotting._subplots) you will find these lines:
if ax is None:
    fig = plt.figure(**fig_kw)
else:
    fig = ax.get_figure()
    fig.clear()
Hence, if you supply an axes object (e.g. one created with matplotlib's subplots function), pandas scatter_matrix grabs the corresponding matplotlib figure object and deletes its content. Afterwards, a new subplot grid is created in this figure.
All in all, the ax keyword lets you plot the scatter matrix into a given figure (even though, IMHO, in a slightly strange way).
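To see this behavior in action, here is a minimal sketch using the function's current location, `pd.plotting.scatter_matrix` (the top-level `pd.scatter_matrix` alias from the question no longer exists in recent pandas). The frame is hypothetical; the point is that the returned grid lives in the figure that owned `ax`, while `ax` itself has been cleared away. Recent pandas versions may emit a `UserWarning` about clearing the figure, which is silenced here.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical two-column frame
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.normal(size=(50, 2)), columns=["a", "b"])

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])  # this content is wiped when scatter_matrix clears the figure

with warnings.catch_warnings():
    # pandas may warn that the figure holding `ax` is being cleared
    warnings.simplefilter("ignore", UserWarning)
    axes = pd.plotting.scatter_matrix(frame, ax=ax)
```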
In short, it targets a subplot within a grid.
If you have nrows=2 and ncols=2, for example, then ax allows you to plot on a specific axis by passing ax=axes[0,0] (top left) or ax=axes[1,1] (bottom right), etc.
When you create the subplots, you receive an array of axes. You can then plot on an element of that array, as above.
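For the plotting methods on DataFrame, targeting a cell looks like the following minimal sketch (hypothetical data). Unlike scatter_matrix, which as the other answer explains clears the figure and builds its own grid, `DataFrame.plot(ax=...)` simply draws into the Axes you pass.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"x": [0, 1, 2], "y": [0, 1, 4]})

fig, axes = plt.subplots(nrows=2, ncols=2)
df.plot(x="x", y="y", ax=axes[0, 0])              # top left: line plot
df.plot(x="x", y="y", kind="bar", ax=axes[1, 1])  # bottom right: bar plot
```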
Take a look at the "Targeting different subplots" section of this page: http://pandas.pydata.org/pandas-docs/dev/visualization.html#targeting-different-subplots
I hope this helps.