Determine kind of Matplotlib Axes subplot

Given a matplotlib.axes._subplots.AxesSubplot object, how do I tell what type of plot it contains? Is there a matplotlib feature that will determine this for me? For example, I commonly plot data with pandas:
import pandas as pd
df = pd.DataFrame({'y':range(10)})
line_ax = df.plot()
or
bar_ax = df.plot(kind='bar')
or
barh_ax = df.plot(kind='barh')

The matplotlib axes does not care about which plot it contains and it does not even know about it.
The question would also be how to distinguish "kinds" of plots. What kind of plot is in an axes which contains 2 bars, several markers, 2 lines and 3 arrows?
The kind argument to pandas plot function is simply a flag by which pandas decides which plotting function to call. This is independent of the axes and you may of course also have a plot produced by kind='bar' and kind='scatter' in the same axes.
So the answer is: no, there is no general way to determine the kind of plot in an axes, mainly because there is no such thing as a "kind of plot".
Of course, depending on what you'd need this type of information for, there are probably alternative ways to accomplish what you need.
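One such alternative, as a rough sketch only: an Axes does keep lists of the artists drawn on it, so you can count them and draw your own conclusions. The helper name summarize_artists is hypothetical (not part of matplotlib), and the counts only tell you which artists exist, not which pandas kind produced them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.container import BarContainer


def summarize_artists(ax):
    """Hypothetical helper: count the artists an Axes currently holds."""
    return {
        "lines": len(ax.lines),
        "patches": len(ax.patches),
        "collections": len(ax.collections),
        "bar_containers": sum(isinstance(c, BarContainer) for c in ax.containers),
    }


fig, (line_ax, bar_ax) = plt.subplots(1, 2)
line_ax.plot(range(10))            # adds one Line2D
bar_ax.bar(range(3), [1, 2, 3])    # adds three Rectangle patches in one BarContainer

print(summarize_artists(line_ax))
print(summarize_artists(bar_ax))
```

An axes that mixes bars, lines and markers would simply report non-zero counts in several entries, which is exactly why a single "kind" cannot be recovered.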

Related

How do I use matplotlib to create a bar chart of a very large dataset?

The data I am working with is an array 27,000 elements long which is a histogram of a few million data points but what I have is the histogram and I need to plot it in my program, preferably with vertical bars.
I've tried using the 'bar' function in matplotlib but this takes a minute or two to plot whereas using just regular plot (with just points on the chart) is almost immediate but obviously does not achieve the effect I want (i.e. bars). I'm not sure why the bar function is so much slower so I was wondering if there was a more effective way to plot a histogram with vertical bars using matplotlib?
I've looked at the hist function in matplotlib, but its purpose, to my understanding, is to take data, make a histogram, and then plot it. I already have a histogram, so I don't believe it works for my case. I greatly appreciate any help!
Here's a reference to the hist function documentation, maybe I missed something.
https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.pyplot.hist.html
Thanks in advance! Let me know if you would like an example of the code I am working with but it is just your most generic my_axes.plot(my_data) or my_axes.bar(my_data) so I'm not sure how helpful it would be.
I've taken a look at this as well now: https://gist.github.com/pierdom/d639a1d3b8934ee31db8b2ab9997ae92.
This also works but has the same time issue as using bar so I suppose this is just an issue with rendering a lot of vertical bars? (though I still wonder why rendering 27000 points happens so quickly)

Apparently, this is a known and discussed limitation of the bar graph as it is currently implemented. See this issue and this discussion. Though there are questions about its usefulness, in my particular case I have a toolbar across the top that allows the user to zoom in and move around the data set (which is a very practical method for my use case).
However, a great alternative does exist in the form of stairs. Simply use fill and you have an effective bar graph, that is much more performant.
import matplotlib.pyplot as plt
import random
bins = range(27001) # Note that bins needs to be one element longer than heights
heights = [random.randint(0, i) for i in range(27000)]
ax = plt.gca()
ax.stairs(heights, bins, fill=True)
plt.show()

matplotlib's bar should be pretty fast to execute so I'm guessing you're passing all the data points to it (although you mention you have "histogram data", so if you can provide more details on the format, that'd help).
bar takes the x positions for the bars and the heights, so if you want the bar function to produce a histogram you need to bin and count.
This will produce something similar to matplotlib's hist:
import matplotlib.pyplot as plt
bins = [0, 1, 2, 3]
heights = [1, 2, 3, 4]
ax = plt.gca()
ax.bar(bins, heights, align='center', width=1)
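To make the "bin and count" step concrete, here is a minimal sketch that assumes you start from raw samples (the question's data is already binned, so this only applies if you still have the raw values). The sample data, the bin count of 20 and the variable names are made up for illustration; np.histogram does the counting and bar only draws the result.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up raw samples standing in for the real data.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

# Bin and count: counts has one entry per bin, edges has one more entry than counts.
counts, edges = np.histogram(samples, bins=20)

fig, ax = plt.subplots()
# Place each bar at its left edge and make it as wide as its bin.
bars = ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
```

With only 20 bars this draws quickly; the slowdown described in the question comes from drawing tens of thousands of individual rectangles.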

seaborn change clustermap visualization options without redoing the clustering

Is it possible to run seaborn.clustermap on a previously obtained ClusterGrid object?
For example, I use clustermap to obtain g in the following example:
import seaborn as sns
data = sns.load_dataset("iris")
species = data.pop("species")
g = sns.clustermap(
data,
cmap="mako",
col_cluster=False,
yticklabels=False, figsize=(5, 10),
method='ward',
metric="euclidean"
)
I would like to try different visualization options like different colormaps, figure sizes, how it looks with and without labels etc.
With the iris dataset everything is really fast, but I have a way larger dataset and the clustering part takes a lot of time.
Can I use g to show the heatmap and dendrogram using different options?

The object returned by clustermap is of type ClusterGrid. That object is not really documented in seaborn; however, it is essentially just a container for a few Axes objects. Depending on the kind of manipulations you want to make, you may simply need to access the relevant Axes object or the figure itself:
# change the figure size after the fact
g.fig.set_size_inches((4,4))
# remove the labels of the heatmap
g.ax_heatmap.set_xticklabels([])
The colormap is a little more difficult to access. clustermap uses matplotlib's pcolormesh under the hood. This function returns a collection object (QuadMesh), which is stored in the list of collections of the main axes (g.ax_heatmap.collections). Since, AFAIK, seaborn doesn't plot anything else on that axes, we can get the QuadMesh object by its index [0], and then we can use any function applicable to that object.
# change the colormap used
g.ax_heatmap.collections[0].set_cmap('seismic')
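The same mechanism can be tried out with plain matplotlib, without seaborn or any clustering. This is only a sketch of the idea: the 3x4 array is made-up data, and the plain Axes stands in for g.ax_heatmap.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# pcolormesh returns a QuadMesh and also registers it on the Axes.
mesh = ax.pcolormesh(np.arange(12).reshape(3, 4), cmap="viridis")

# The object found through ax.collections is the very same QuadMesh.
found = ax.collections[0]
print(found is mesh)

# Swap the colormap in place; the underlying data is not recomputed.
found.set_cmap("seismic")
print(mesh.get_cmap().name)
```

Because only the colormap of an existing artist changes, nothing upstream (in the seaborn case, the clustering) has to run again.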

Data visualization in python (matplotlib) [duplicate]

I'm not really new to matplotlib and I'm deeply ashamed to admit I have always used it as a tool for getting a solution as quickly and easily as possible. So I know how to get basic plots, subplots and stuff and have quite a bit of code which gets reused from time to time... but I have no "deep(er) knowledge" of matplotlib.
Recently I thought I should change this and work through some tutorials. However, I am still confused about matplotlib's plt, fig(ure) and ax(arr). What is really the difference?
In most cases, for some "quick'n'dirty' plotting I see people using just pyplot as plt and directly plot with plt.plot. Since I am having multiple stuff to plot quite often, I frequently use f, axarr = plt.subplots()...but most times you see only code putting data into the axarr and ignoring the figure f.
So, my question is: what is a clean way to work with matplotlib? When should I use plt only, and what is or what should a figure be used for? Should subplots just contain data? Or is it valid and good practice to do everything, like styling, clearing a plot, ..., inside of subplots?
I hope this is not too wide-ranging. Basically I am asking for some advice on the true purposes of plt <-> fig <-> ax(arr) (and when/how to use them properly).
Tutorials would also be welcome. The matplotlib documentation is rather confusing to me. When one searches for something really specific, like rescaling a legend, different plot markers and colors and so on, the official documentation is really precise, but the more general information is not that good in my opinion. Too many different examples, no real explanations of the purposes... it looks more or less like a big listing of all possible API methods and arguments.

pyplot is the 'scripting' level API in matplotlib (its highest level API to do a lot with matplotlib). It allows you to use matplotlib through a procedural interface, in a similar way as you can with Matlab. pyplot has a notion of 'current figure' and 'current axes' that all the functions delegate to (as @tacaswell put it). So, when you use the functions available in the pyplot module you are plotting to the 'current figure' and 'current axes'.
If you want fine-grained control of where/what you are plotting then you should use the object-oriented API, using instances of Figure and Axes.
Functions available in pyplot have an equivalent method in the Axes.
From the repo anatomy of matplotlib:
The Figure is the top-level container in this hierarchy. It is the overall window/page that everything is drawn on. You can have multiple independent figures and Figures can contain multiple Axes.
But...
Most plotting occurs on an Axes. The axes is effectively the area that we plot data on and any ticks/labels/etc associated with it. Usually we'll set up an Axes with a call to subplot (which places Axes on a regular grid), so in most cases, Axes and Subplot are synonymous.
Each Axes has an XAxis and a YAxis. These contain the ticks, tick locations, labels, etc.
If you want to know the anatomy of a plot you can visit this link.
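As a small side-by-side sketch of the two styles described above (the data and titles are arbitrary): the first half relies on pyplot's current figure and current axes, the second half holds explicit Figure and Axes objects.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# pyplot style: every call acts on the implicit "current" figure and axes.
plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])
plt.title("pyplot interface")
implicit_ax = plt.gca()  # the Axes that pyplot created behind the scenes

# Object-oriented style: keep the Figure and Axes and call methods on them.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title("object-oriented interface")

# plt.subplots() also makes the new figure and axes the current ones.
print(plt.gcf() is fig, plt.gca() is ax)
```

Both halves produce the same kind of plot; the second just makes explicit which Axes each call targets, which matters once a figure has more than one.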

I think that this tutorial explains well the basic notions of the object hierarchy of matplotlib like Figure and Axes, as well as the notion of current figure and current Axes.
If you want a quick answer: there is the Figure object, which is the container that wraps multiple Axes (which is different from axis). Each Axes in turn contains smaller objects like legends, lines, tick marks ... as shown in this image taken from the matplotlib documentation.
So when we do
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> type(fig)
<class 'matplotlib.figure.Figure'>
>>> type(ax)
<class 'matplotlib.axes._subplots.AxesSubplot'>
We have created a Figure object and an Axes object that is contained in that figure.

pyplot is a MATLAB-like API for those who are familiar with MATLAB and want to make quick and dirty plots.
figure is the object-oriented API for those who don't care about MATLAB-style plotting.
So you can use either one, but perhaps not both together.

matplot and seaborn figure parameters/customizations

I'm so confused between the two. Every time I make a chart with either pyplot or seaborn, I have to guess what syntax to use. For example, seaborn doesn't have a title setter, so I have to remember to use plt.title. Or, for seaborn charts, plt.xlabel doesn't work, so I have to use sns.axlabel(x, y).
And also, randomly I run into the following problem. I'm simply trying to make my seaborn jointplot bigger but I have no success trying both the plt nor the seaborn methods (any tips as to a good documentation showing all the chart parameters??? I find them scattered on the web and it seems like each solution on stack overflow is unique...which adds to the overall confusion).
Here's my code:
a = plt.figure(figsize=(30,30))
a.set_size_inches(30,30)
sns.jointplot(x='COAST',y='NORTH',data = data_df, kind = 'kde')
Notice I used both the plt.figure(figsize=...) method and the set_size_inches method. Both gave me a small chart.
So frustrated with the random overlaps of the two libraries. Any pro tips to lessen the confusion will be greatly appreciated!
edit: This is also true for seaborn's pairplot. I have no success in changing the pairplot's size.

sns.jointplot creates its own figure instance (as @tcaswell suspected). It doesn't appear that you can tell jointplot to use an existing figure. I think you have two options:
You can give sns.jointplot the size option (renamed to height in seaborn 0.9 and later), e.g.:
sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde', size=30)
You can alter the JointGrid figure size after creating it, using:
g=sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde')
g.fig.set_size_inches(30,30)
I presume option 1 is the better option, as it is a built-in seaborn option

Plotting histograms against classes in pandas / matplotlib

Is there an idiomatic way to plot the histogram of a feature for two classes?
In pandas, I basically want
df.feature[df["class"] == 0].hist()
df.feature[df["class"] == 1].hist()
to be in the same plot. I could do
df.feature.hist(by=df["class"])
but that gives me two separate plots.
This seems to be a common task so I would imagine there to be an idiomatic way to do this. Of course I could manipulate the histograms manually to fit next to each other but usually pandas does that quite nicely.
Basically I want this matplotlib example in one line of pandas: http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I thought I was missing something, but maybe it is not possible (yet).

How about df.groupby("class").feature.hist()? To see overlapping distributions you'll probably need to pass alpha=0.4 to hist(). Alternatively, I'd be tempted to use a kernel density estimate instead of a histogram with df.groupby("class").feature.plot(kind='kde').
As an example, I plotted the iris dataset's classes using:
iris.groupby("Name").PetalWidth.plot(kind='kde', ax=axs[1])
iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])
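Since axs is not defined in the snippet above, here is a self-contained sketch of the overlapping-histogram idea. Note that it swaps in an explicit loop over the groups with ax.hist rather than calling hist on the groupby object, so that each class gets a legend label. The two-class data, the column names feature and class, and the bin count are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up data: one numeric feature and a binary class column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)]),
    "class": [0] * 200 + [1] * 200,
})

fig, ax = plt.subplots()
# Draw one semi-transparent histogram per class on the same Axes.
for label, group in df.groupby("class"):
    ax.hist(group["feature"], bins=20, alpha=0.4, label=f"class {label}")
ax.legend()
```

Passing the same explicit bin edges to every ax.hist call (instead of bins=20) would make the bars of the two classes line up exactly.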
