Understanding of fig, ax, and plt when combining Matplotlib and Pandas - python

I'm trying to get a better understanding of how figure, axes, and plt all fit together when combining Matplotlib and Pandas for plotting. The accepted answer here helped me connect Matplotlib and Pandas in an object oriented way I understand through this line:
fig, ax = plt.subplots()
df.plot(ax=ax)
But as I'm diving deeper the answer here threw me off. Specifically, I still have methods I need to call directly off plt, that don't apply to either a figure or an axis. Example:
fig, ax = plt.subplots()
df[['realgdp','trend']]["2000-03-31":].plot(figsize=(8,8), ax=ax)
ax.set_title('Real GDP & Trend')
ax.set_ylabel('Real GDP')
plt.xticks(rotation=45)
If I try to call xticks(rotation=45) off ax or fig I get an error that neither ax nor fig have an xticks method. The solution I have above works, but I don't understand why.
When I type plt.xticks(rotation=45), where does that information get sent? Why does the comment in the answer here that "when you use the functions available on the module pyplot you are plotting to the 'current figure' and 'current axes'" not apply in this case? Why do I need to call off plt directly?

plt.xticks() only works on the "current" ax. You should use ax.set_xticks(), ax.set_xticklabels() and ax.tick_params() instead.
plt.xticks() is a rather old function that is still supported, mimicking similar matlab code, born in a time when people were only plotting onto a single plot. The newer functions are more general with more options.
In short: you don't need to call plt directly, you are invited to use the ax functions instead. When calling plt.xticks(), it gets rerouted to the currently active ax (often the last one created).
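To make the rerouting concrete, here is a minimal sketch of the object-oriented equivalent of plt.xticks(rotation=45). The DataFrame is made-up stand-in data, and the Agg backend line is only an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the DataFrame in the question.
df = pd.DataFrame({"realgdp": [1.0, 2.0, 3.0], "trend": [1.1, 1.9, 3.2]})

fig, ax = plt.subplots()
df.plot(ax=ax)

# pyplot functions act on the "current" axes, which is the one just created.
current_is_ax = plt.gca() is ax

# Object-oriented counterpart of plt.xticks(rotation=45).
ax.tick_params(axis="x", labelrotation=45)
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
```

With several axes in play, the ax.tick_params call keeps targeting the same axes, whereas plt.xticks would follow whichever axes is current at that moment.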

Related

How to apply an option to several figures at the same time

If I do this
import numpy as np
import matplotlib.pyplot as plt
a=[1,2,3]
b=[3,4,5]
plt.figure(1)
plt.xlim(0,3)
plt.plot(b)
plt.figure(2)
plt.plot(a)
plt.show()
the choice of the x axes will be applied only to figure 1. What can I use to discriminate between the options that I want to be valid for only figure 1 or 2 and the ones that I want to be applied to both figures?
Clarification: I know that it is possible to call plt.xlim several times. I was rather looking for some command of a form like
plt.apply_options_to(1,2)
and from that moment on every time I call plt.xlim the option is applied to both figures and not only one of the two.
With pyplot, each command applies to the currently active figure or axes. This means you can easily loop over the figures and apply each command like
for i in (1,2):
    plt.figure(i)
    plt.xlim(0,3)
Now those are three lines of code. If the requirement is to use a single line of code, the following would be a solution
[plt.setp(plt.figure(i).axes[0], xlim=(0,3)) for i in plt.get_fignums() if i in (1,2)]
This is neither elegant nor easy to type, so when using pyplot I would recommend the first solution.
In general however I would recommend using the object oriented approach, where creating two figures would look like:
import matplotlib.pyplot as plt
a=[1,2,3]
b=[3,4,5]
fig, ax = plt.subplots()
ax.plot(b)
fig2, ax2 = plt.subplots()
ax2.plot(a)
plt.show()
Then the single line solution is also a bit more compact
plt.setp([ax,ax2], xlim=(0,3))
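If something closer to the requested plt.apply_options_to is wanted, a small wrapper is one option. The sketch below is only an illustration: apply_to_figures is a hypothetical helper name, not part of Matplotlib, and the Agg backend line is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in an interactive session
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean slate so figure numbers 1 and 2 are new


def apply_to_figures(fignums, **props):
    """Hypothetical helper: set properties on every axes of the listed figures."""
    for num in fignums:
        if plt.fignum_exists(num):  # skip numbers that have no figure yet
            plt.setp(plt.figure(num).axes, **props)


plt.figure(1)
plt.plot([3, 4, 5])
plt.figure(2)
plt.plot([1, 2, 3])

apply_to_figures((1, 2), xlim=(0, 3))
limits = [plt.figure(n).axes[0].get_xlim() for n in (1, 2)]
```

This still applies the option once per call rather than "from that moment on", so it has to be called again after any later change.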

How to suppress seaborn output when recalling figure object with regplot

I am trying to plot data to a figure and respective axis in matplotlib and as new work comes up, recall the figure with the additional plot on the axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
x=np.arange(0,20)
y=2*x
fig,ax=plt.subplots()
ax.scatter(x,x)
ax.scatter(x,y)
fig
This works fine with matplotlib. However, if I use seaborn's regplot:
fig2,ax2=plt.subplots()
sns.regplot(x=x,y=x,ax=ax2,fit_reg=False)
sns.regplot(x=x,y=y,ax=ax2,fit_reg=False)
fig2
fig2 generates the figure that I want but the regplot command generates an empty figure. Is there a way to suppress the regplot's empty output or have it display the updated ax2 without recalling fig2?
It seems you are using the jupyter notebook with the inline backend. In some circumstances regplot triggers the creation of a new figure even if the artists are being added to the previous one and this messes up the output. I don't know why this happens but I found a workaround that might help you, using plt.ioff to temporarily disable automatic display of figures.
plt.ioff()
fig, ax = plt.subplots()
sns.regplot(x=x, y=x, ax=ax)
fig
sns.regplot(x=x, y=2 * x, ax=ax)
fig
You have to call plt.ioff before creating the figure for this to work. After that you have to explicitly display the figure. Then you can call plt.ion to restore the default behaviour.
regplot does not generate an empty figure. According to the documentation:
Understanding the difference between regplot() and lmplot() can be a
bit tricky. In fact, they are closely related, as lmplot() uses
regplot() internally and takes most of its parameters. However,
regplot() is an axes-level function, so it draws directly onto an axes
(either the currently active axes or the one provided by the ax
parameter), while lmplot() is a figure-level function and creates its
own figure, which is managed through a FacetGrid.
When I do the following:
fig2,ax2 = plt.subplots()
same_fig2 = sns.regplot(x=x,y=x,ax=ax2,fit_reg=False)
same_fig2.figure is fig2
>>> True

How modules know each other

I can plot data from a CSV file with the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(x='Column1', y='Column3')
plt.show()
But I don't understand one thing. How does plt.show() know about df? It would make more sense to me to see, somewhere, an expression like:
plt = something(df)
I have to mention I'm just learning Python.
Matplotlib has two "interfaces": a Matlab-style interface and an object-oriented interface.
Plotting with the Matlab-style interface looks like this:
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()
The call to plt.plot implicitly creates a figure and an axes on which to draw.
The call to plt.show displays all figures.
Pandas supports the Matlab-style interface by implicitly creating a figure and axes for you when df.plot(x='Column1', y='Column3') is called.
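A quick way to see this implicit bookkeeping is to ask pyplot which figures it is tracking before and after df.plot. This is a minimal sketch with made-up data in place of test0.csv; the Agg backend line is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

plt.close("all")  # make sure pyplot is not tracking any older figures

# Made-up data standing in for test0.csv.
df = pd.DataFrame({"Column1": [1, 2, 3], "Column3": [2, 4, 8]})

figures_before = plt.get_fignums()      # nothing registered yet
ax = df.plot(x="Column1", y="Column3")  # pandas creates a figure and axes through pyplot
figures_after = plt.get_fignums()       # pyplot now tracks one figure

# The Axes returned by pandas lives on pyplot's current figure.
same_figure = ax.figure is plt.gcf()
```

plt.show() then displays every figure in that registry, which is why it needs no reference to df.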
Pandas can also use the more flexible object-oriented interface, in which case
your code would look like this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
fig, ax = plt.subplots()
df.plot(ax=ax, x='Column1', y='Column3')
plt.show()
Here the axes, ax, is explicitly created and passed to df.plot, which then
calls ax.plot under the hood.
One case where the object-oriented interface is useful is when you wish to use
df.plot more than once while still drawing on the same axes:
fig, ax = plt.subplots()
df.plot(ax=ax, x='Column1', y='Column3')
df2.plot(ax=ax, x='Column2', y='Column4')
plt.show()
From the pandas docs on plotting:
The plot method on Series and DataFrame is just a simple wrapper
around plt.plot()
So, as is, the df.plot method is a high-level call to plt.plot (through a wrapper), and calling plt.show afterwards will simply:
display all figures and block until the figures have been closed
as it would for any figure plotted with plt.plot.
Therefore, you don't see plt = something(df) as you might expect, because matplotlib.pyplot.plot is being called behind the scenes by df.plot.
According to http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show , plt.show() itself doesn't know about the data; the data has to be passed as parameters to the plotting call.
What you are seeing is the plot produced by the pandas library; see its usage at http://pandas.pydata.org/pandas-docs/stable/visualization.html#basic-plotting-plot.
Hope this solves your question.

Turn axes off for all subplots of a figure

I am creating a large array of subplots and I want to turn off axes for all the subplots.
Currently I am achieving this by
fig, ax = plt.subplots(7, len(clusters))
fig.subplots_adjust(wspace=0, top=1.0, bottom=0.5, left=0, right=1.0)
for x in ax.ravel():
    x.axis("off")
but looping over the subplots to turn off the axes individually is ugly.
Is there a way to tell subplots to turn off axes at creation time,
or some setting on Figure or pyplot that turns axes off globally?
pyplot.axis('off') turns off axes just on the last subplot.
I agree with @tcaswell that you should probably just use what you're already using. Another option to use it as a function is to use numpy.vectorize():
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(7, len(clusters))
np.vectorize(lambda ax:ax.axis('off'))(ax)
or, if you need to invoke it multiple times, by assigning the vectorized function to a variable:
axoff_fun = np.vectorize(lambda ax:ax.axis('off'))
# ... stuff here ...
fig, ax = plt.subplots(7, len(clusters))
axoff_fun(ax)
Again, note that this is the same thing that @tcaswell suggested, in a fancier setting (only slower, probably). And it's essentially the same thing you're using now.
However, if you insist on doing it some other way (i.e. you are a special kind of lazy), you can set matplotlib.rcParams once, and then every subsequent axes will automatically be off. There's probably an easier way to emulate axis('off'), but here's how I've succeeded:
import matplotlib as mpl
import matplotlib.pyplot as plt  # importing matplotlib alone does not load pyplot
# before
plt.figure()
plt.plot([1,3,5],[4,6,5])
# kill axis in rcParams
mpl.rc('axes.spines',top=False,bottom=False,left=False,right=False)
mpl.rc('axes',facecolor=(1,1,1,0),edgecolor=(1,1,1,0))
mpl.rc(('xtick','ytick'),color=(1,1,1,0))
# after
plt.figure()
plt.plot([1,3,5],[4,6,5])
Result before/after:
Hopefully there aren't any surprises which I forgot to override, but that would become clear quite quickly in an actual application anyway.
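Because mpl.rc changes the settings for the rest of the session, one possible refinement is to scope the same overrides with matplotlib.rc_context so that only figures created inside the block are affected. This is a sketch under that assumption, not a drop-in replacement for axis('off'); the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in an interactive session
import matplotlib as mpl
import matplotlib.pyplot as plt

transparent = (1, 1, 1, 0)
axis_off_rc = {
    "axes.spines.top": False,
    "axes.spines.bottom": False,
    "axes.spines.left": False,
    "axes.spines.right": False,
    "axes.facecolor": transparent,
    "axes.edgecolor": transparent,
    "xtick.color": transparent,
    "ytick.color": transparent,
}

# The overrides only apply to axes created inside the with-block.
with mpl.rc_context(axis_off_rc):
    fig, axes = plt.subplots(2, 3)
    inside = [a.spines["top"].get_visible() for a in axes.ravel()]

# Axes created afterwards use the previous settings again.
fig2, ax2 = plt.subplots()
outside = ax2.spines["top"].get_visible()
```

The tick labels are still drawn, just in a fully transparent colour, so this hides the axes rather than removing them.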

Why do pyplot methods apply instantly and subplot axes methods do not?

I'm editing my graphs step by step. Doing so, plt functions from matplotlib.pyplot apply instantly to my graphical output of pylab. That's great.
If I address axes of a subplot, it does not happen anymore.
Please find both alternatives in my minimal working example.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
f = plt.figure()
sp1 = f.add_subplot(1,1,1)
f.show()
# This works well
sp1.set_xlim([1,5])
# Now I plot the graph
df = pd.Series([0,5,9,10,15])
df.hist(bins=50, color="red", alpha=0.5, density=True, ax=sp1)
# ... and try to change the ticks of the x-axis
sp1.set_xticks(np.arange(1, 15, 1))
# Unfortunately, it does not result in an instant change
# because my plot has already been drawn.
# If I wanted to use the code above,
# I would have to execute it before drawing the graph.
# Therefore, I have to use this function:
plt.xticks(np.arange(1, 15, 1))
I understand that there is a difference between matplotlib.pyplot and an axis instance. Did I miss anything or does it just work this way?
Most pyplot functions (if not all) call plt.draw_if_interactive() before returning. So if you do
plt.ion()
plt.plot([1,2,3])
plt.xlim([-1,4])
the plot is updated as you go. With interactive mode off, the plot is not created or updated until you call plt.show().
However, pyplot functions are just wrappers around the corresponding (usually Axes) methods.
If you want to use the OO interface, and still draw stuff as you type, you can do something like this
plt.ion() # without this, you probably won't see anything until you call a blocking `plt.show`
fig, ax = plt.subplots() # create an empty plot
ax.plot([1,2,3]) # create the line
plt.draw() # draw it (you can also use `draw_if_interactive`)
ax.set_xlim([-1,4]) # set the limits
plt.draw() # update the plot
You don't have to use pyplot if you don't want to; just remember to draw.
plt.xticks() calls draw_if_interactive(), which comes from pylab_setup() and is what updates the graph. To get the same result with sp1.set_xticks(), just call the figure's show() method afterwards:
sp1.figure.show()
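Another way to request the redraw without going through pyplot is the canvas method draw_idle(). This is a minimal sketch using the question's sample values; the Agg backend line is an assumption so it runs without a display, and in an interactive session the redraw would show up in the open window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.hist([0, 5, 9, 10, 15], bins=50, color="red", alpha=0.5, density=True)

ax.set_xticks(np.arange(1, 15, 1))  # changes the Axes state but does not redraw
fig.canvas.draw_idle()              # ask the canvas to redraw when it next can

ticks = ax.get_xticks()
```

The tick positions are stored on the Axes as soon as set_xticks returns; only the on-screen rendering waits for the draw call.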
