Reproducing default plot behaviour of pandas.DataFrame.plot - python

As a frequent user of pandas, I often want to plot my data.
Using df.plot() is very convenient, and I like the layout it gives me.
I often have the problem that when I show the generated graph to someone else, they like it, but want some tiny changes.
This often digresses into me trying to recreate the exact graph in matplotlib, which turns into a couple of hundred lines of code that still do not behave quite like df.plot().
Is there a way to get the settings for the default plotting behaviour from pandas and just add something to the plot?
Example:
df = pd.DataFrame([1,2,3,6],index=[15,16,17,18], columns=['values'])
df.plot(kind='bar')
This little piece of code makes this pretty graph:
Trying to recreate this with matplotlib turns into a few hours of digging through documentation and still not coming up with quite the right solution.
Not to mention how many lines of configuration code it is.
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
import matplotlib.patches as mpatches
fig, ax1 = plt.subplots()
ax1.bar(df.index, df['values'], 0.4, align='center')
loc = plticker.MultipleLocator(base=1.0)
ax1.xaxis.set_major_locator(loc)
ax1.xaxis.set_ticklabels(["","15", "16", "17", "18"])
plt.show()
TL;DR
How can I easily copy the behaviour of df.plot() and extend it, without having to recreate everything manually?
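
One possible approach, sketched here rather than taken from the thread: df.plot() returns the matplotlib Axes it drew on, so the pandas defaults can be kept and only the requested tweaks applied afterwards. The title, axis label, and reference line below are made-up examples of such tweaks, and the Agg backend is selected only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame([1, 2, 3, 6], index=[15, 16, 17, 18], columns=["values"])

# DataFrame.plot returns the matplotlib Axes it drew on,
# already styled with the pandas defaults (ticks, legend, bar width).
ax = df.plot(kind="bar")

# Hypothetical "tiny changes" layered on top of those defaults
ax.set_title("Values per index")
ax.set_ylabel("value")
ax.axhline(3, color="grey", linestyle="--")  # reference line at y = 3

# The parent Figure is reachable from the Axes if figure-level changes are needed
fig = ax.get_figure()
fig.set_size_inches(6, 4)
```

From here any Axes or Figure method can be used, so the plot does not need to be rebuilt from scratch in matplotlib.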

Related

Incorrect backend configuration with macosx in (old) matplotlib: plt.ion different from savefig, here overwritten alpha keyword

I am using interactive python with plt.ion() for generating figures (v2.7) and have noticed that the figure looks different from the figure exported by savefig (this is not a DPI issue (cf. matplotlib savefig() plots different from show()) - I think it might be a backend issue, but would appreciate help as I don't understand this properly).
Specifically, I wanted to visualise the importance of a series of points by the intensity of their colour, which I thought I could do with the "alpha" keyword in matplotlib.
When I just do this, this works fine,
but when I want to add a line to the figure, the alpha keyword seemed to not work any more, and plt.ion() shows this:
I initially thought that perhaps the following issue on github may be related:
https://github.com/matplotlib/matplotlib/issues/4580
but then I noticed that exporting the figure actually produced the following file (i.e. as desired):
It would be great to understand a bit better what is going on, and how I can avoid such issues in the future. Is plt.ion()/plt.show() not the best way to show figures in interactive python, or is this an issue with the alpha keyword?
The code is here:
import numpy as np
from numpy import random as random
from matplotlib import pyplot as plt
fig2,ax2=plt.subplots(1,1,figsize=(3,3),sharey=True)
for ii in range(1):
    ax2.plot(np.linspace(0,200,200), [0.1]*200, c= 'k')
for i in range(200):
    test2=random.randint(5)
    ydata= random.rand(test2)
    test = random.rand(test2)
    for j in range(test2):
        ax2.plot(i,ydata[j],'o',ms=4, c= 'Darkblue',alpha=test[j],markeredgecolor='None')

Matplotlib doesn't forget previous data when saving figures with savefig

import matplotlib.pyplot as plt
plt.plot([1,2,3],[1,2,3],'ro')
plt.axis([-4,4,-4,4])
plt.savefig('azul.png')
plt.plot([0,1,2],[0,0,0],'ro')
plt.axis([-4,4,-4,4])
plt.savefig('amarillo.png')
Output:
Why does this happen, and how can I solve it?
What you see is completely expected behaviour. You can plot to the same figure as many times as you like, which is often very useful.
If you want to create several figures in the same script using the matplotlib state machine, you need to first close one figure before generating the next.
So in this very simple case, just add plt.close() between figure creation.
import matplotlib.pyplot as plt
plt.plot([1,2,3],[1,2,3],'bo')
plt.axis([-4,4,-4,4])
plt.savefig('azul.png')
plt.close()
plt.plot([0,1,2],[0,0,0],'yo')
plt.axis([-4,4,-4,4])
plt.savefig('amarillo.png')
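
An alternative sketch, not from the answer above: using the object-oriented interface gives each plot its own Figure and Axes, so nothing carries over between files. The file names follow the question; the save_points helper and the temporary output directory are assumptions made only to keep the example tidy.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt

out_dir = tempfile.mkdtemp()  # assumption: write the images to a scratch directory


def save_points(xs, ys, fmt, filename):
    """Draw one set of points on a fresh figure, save it, and close it."""
    fig, ax = plt.subplots()        # new Figure and Axes on every call
    ax.plot(xs, ys, fmt)
    ax.axis([-4, 4, -4, 4])
    path = os.path.join(out_dir, filename)
    fig.savefig(path)
    n_lines = len(ax.lines)         # number of line artists on this Axes
    plt.close(fig)                  # release the figure explicitly
    return path, n_lines


azul_path, azul_lines = save_points([1, 2, 3], [1, 2, 3], "bo", "azul.png")
amarillo_path, amarillo_lines = save_points([0, 1, 2], [0, 0, 0], "yo", "amarillo.png")
```

Each saved image then contains exactly one data set, because every call draws on its own Axes.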

matplot and seaborn figure parameters/customizations

I'm so confused between the two. Every time I make a chart with either pyplot or seaborn, I have to guess what syntax to use. For example, seaborn doesn't have a title setter, so I have to remember to use plt.title. Or, for seaborn charts, plt.xlabel doesn't work, so I have to use sns.axlabel(x,y).
Also, I randomly run into the following problem. I'm simply trying to make my seaborn jointplot bigger, but neither the plt nor the seaborn methods work. (Any tips on good documentation showing all the chart parameters? I find them scattered on the web, and it seems like each solution on Stack Overflow is unique, which adds to the overall confusion.)
Here's my code:
a = plt.figure(figsize=(30,30))
a.set_size_inches(30,30)
sns.jointplot(x='COAST',y='NORTH',data = data_df, kind = 'kde')
Notice I used both the plt.figure(figsize=...) method and the set_size_inches method. Both gave me a small chart.
So frustrated with the random overlaps of the two libraries. Any pro tips to lessen the confusion will be greatly appreciated!
edit: This is also true for seaborn's pairplot. I have no success in changing the pairplot's size.
sns.jointplot creates its own figure instance (as @tcaswell suspected). It doesn't appear that you can tell jointplot to use an existing figure. I think you have two options:
You can give sns.jointplot the size option. e.g.:
sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde', size=30)
You can alter the JointGrid figure size after creating it, using:
g=sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde')
g.fig.set_size_inches(30,30)
I presume option 1 is the better option, as it is a built-in seaborn option.

Matplotlib - Tcl_AsyncDelete: async handler deleted by the wrong thread?

I'm asking this question because I can't solve a problem in Python/Django (in pure Python it's fine) that leads to RuntimeError: tcl_asyncdelete async handler deleted by the wrong thread. This is somehow related to the way I render matplotlib plots in Django. The way I do it is:
...
import matplotlib.pyplot as plt
...
fig = plt.figure()
...
plt.close()
I have minimized my code as far as possible. But the catch is that even if I have just one line of code:
fig = plt.figure()
I still see this RuntimeError. I hope I could solve the problem if I knew the correct way of closing/cleaning/destroying plots in Python/Django.
By default matplotlib uses the Tk GUI toolkit. When you're rendering an image without using the toolkit (i.e. into a file or a string), matplotlib still instantiates a window that doesn't get displayed, which causes all kinds of problems. To avoid that, use the Agg backend. It can be activated like so:
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot
For more information please refer to matplotlib documentation -- http://matplotlib.org/faq/howto_faq.html#matplotlib-in-a-web-application-server
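
As a related sketch, not part of the answer above: in a web application the plot can also be built from matplotlib.figure.Figure directly, without importing pyplot at all, and written to an in-memory buffer. This assumes a reasonably recent matplotlib; render_png and the sample data are hypothetical names used only for illustration.

```python
from io import BytesIO

from matplotlib.figure import Figure  # no pyplot import, so no GUI toolkit is involved


def render_png(xs, ys):
    """Render a simple line plot and return it as PNG bytes."""
    fig = Figure(figsize=(4, 3))
    ax = fig.subplots()
    ax.plot(xs, ys)
    buf = BytesIO()
    fig.savefig(buf, format="png")  # write the image into memory, not to disk
    return buf.getvalue()


png_bytes = render_png([0, 1, 2], [0, 1, 4])
```

In a Django view, these bytes could be returned in an HttpResponse with content_type="image/png". Because the figure is never registered with pyplot, there is nothing to close afterwards.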
The above (accepted) answer is a solution in a terminal environment. If you debug in an IDE, you still might want to use 'TkAgg' for displaying data. In order to prevent this issue, apply these two simple rules:
every time you display your data, initiate a new fig = plt.figure()
don't close old figures manually (e.g. when using a debug mode)
Example code:
import matplotlib
matplotlib.use('TkAgg')
from matplotlib import pyplot as plt
fig = plt.figure()
plt.plot(data[:,:,:3])
plt.show()
This proves to be a good intermediate solution under macOS and the PyCharm IDE.
If you don't need to show plots while debugging, the following works:
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
However, if you would like to plot while debugging, you need to do 3 steps:
1. Keep the backend set to 'TkAgg' as follows:
import matplotlib
matplotlib.use('TKAgg')
from matplotlib import pyplot as plt
or simply
import matplotlib.pyplot as plt
2. As Fábio also mentioned, you need to create a separate figure for each plot with fig<i> = plt.figure(<i>). For example, for plot no. 1, add:
fig1 = plt.figure(1)
plt.plot(yourX,yourY)
plt.show()
3. Add breakpoints. You need at least two: one near the beginning of your code (before the first plot), and another at the point by which you would like all the preceding plots to be drawn. All figures are then plotted, and you don't even need to close any of them manually.
For me, this happened due to parallel access to data by both Matplotlib and by Tensorboard, after Tensorboard's server was running for a week straight.
Restarting TensorBoard (tensorboard --logdir . --samples_per_plugin images=100) solved this for me.
I encountered this problem when plotting graphs live with matplotlib in my tkinter application.
The easiest solution I found, was to always delete subplots. I found you didn't need to instantiate a new figure, you only needed to delete the old subplot (using del subplot), then remake it.
Before plotting a new graph, make sure to delete the old subplot.
Example:
from matplotlib.figure import Figure
f = Figure(figsize=(5,5), dpi=100)
a = f.add_subplot(111)
# inside the for loop that updates the graph every 5 seconds:
del a #delete subplot
a = f.add_subplot(111) #redefine subplot
Finding this simple solution to fix this "async handler bug" was excruciatingly painful, I hope this helps someone else :)

Pyplot/Subplot APIs Matplotlib

I'm making something using Matplotlib where I have multiple subplots on a figure. It seems to me like the subplot API is limited compared to the PyPlot API: for example, I can't seem to make custom axes labels in my subplot although it is possible using PyPlot.
My question is: Is there a richer subplot API besides the tiny one on the PyPlot page (http://matplotlib.org/api/pyplot_api.html), and/or is there a way to get the full functionality of a PyPlot on a subplot?
Basically, what is a subplot? I can't find it in the documentation. Even more generally, when should I use a figure vs an axis vs a subplot? They all seem to do essentially the same thing.
Consider the following code:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(2,1,1)
Then ax is an axis? Can I use the pyplot API to customize ax?
Thanks for your help.
While I suggest using the Axes methods, there is also the plt.sca function (set current axes).
So
plt.sca(ax)
does what you want, i think.
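
Both routes can be sketched side by side. This is a minimal illustration with made-up data and labels: the top subplot is customized through Axes methods, the bottom one through pyplot after plt.sca. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt

fig = plt.figure()
ax_top = fig.add_subplot(2, 1, 1)     # each add_subplot call returns an Axes
ax_bottom = fig.add_subplot(2, 1, 2)

# Route 1: call methods on the Axes object directly
ax_top.plot([0, 1, 2], [0, 1, 4])
ax_top.set_xlabel("time (s)")
ax_top.set_ylabel("distance (m)")

# Route 2: make an Axes the "current" one, then use the pyplot functions
plt.sca(ax_bottom)
plt.plot([0, 1, 2], [4, 1, 0])
plt.xlabel("time (s)")
plt.title("via plt.sca")
```

Most pyplot functions have an Axes counterpart, often with a set_ prefix (plt.xlabel and ax.set_xlabel, plt.title and ax.set_title), which is why the Axes methods cover the same ground.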
