Creating charts with Pandas - python

My code is inside a Jupyter Notebook.
I can create a chart using Method 1 below, and have it look exactly as I'd like it to look.
But when I try with Method 2, which uses subplot, I don't know how to make it look the same (setting the figsize, colors, legend off to the right).
How do I use subplot, and have it look the same as Method 1?
Thank you in advance for your help!
# Using Numpy and Pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style
df = pd.DataFrame(np.random.randint(0,100,size=(4, 4)), columns=list('ABCD'))
style.use('fivethirtyeight')
# Colorblind-friendly colors
colors = [[0,0,0], [230/255,159/255,0], [86/255,180/255,233/255], [0,158/255,115/255]]
# Method 1
chart = df.plot(figsize = (10,5), color = colors)
chart.yaxis.label.set_visible(True)
chart.set_ylabel("Bitcoin Price")
chart.set_xlabel("Time")
chart.legend(bbox_to_anchor=(1.05, 1), loc=2)
plt.show()
# Method 2
fig, ax = plt.subplots()
ax.plot(df)
ax.set_ylabel("Bitcoin Price")
ax.set_xlabel("Time")
plt.show()

You just replace chart with ax, like this:
ax.yaxis.label.set_visible(True)
ax.set_ylabel("Bitcoin Price")
ax.set_xlabel("Time")
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)

Here are two ways to get the result you want. pd.DataFrame.plot returns an Axes object, so every method you called on chart in Method 1 is also available on ax; both examples simply use ax in place of chart.
Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.style as style
df = pd.DataFrame(np.random.randint(0,100,size=(4, 4)), columns=list('ABCD'))
style.use('fivethirtyeight')
# Colorblind-friendly colors
colors = [[0,0,0], [230/255,159/255,0], [86/255,180/255,233/255], [0,158/255,115/255]]
Iterating over df
colors_gen = (x for x in colors) # we will also be iterating over the colors
fig, ax = plt.subplots(figsize = (10,5))
for i in df: # iterate over columns...
    ax.plot(df[i], color=next(colors_gen), label=i) # ...and plot one at a time; the label feeds the legend
ax.set_ylabel("Bitcoin Price")
ax.set_xlabel("Time")
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
ax.yaxis.label.set_visible(True)
plt.show()
Use pd.DataFrame.plot but pass ax as an argument
fig, ax = plt.subplots(figsize = (10,5))
df.plot(color=colors, ax=ax)
ax.set_ylabel("Bitcoin Price")
ax.set_xlabel("Time")
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
ax.yaxis.label.set_visible(True)
plt.show()
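Since the question is about subplots, here is a minimal sketch of the same ax= idea with two panels in one figure. It assumes random stand-in data and an arbitrary split of the columns across the panels; the colors are the same palette written as hex strings. The names ax_left and ax_right are only illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data with the same shape as in the question (assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(4, 4)), columns=list("ABCD"))

# Same colorblind-friendly palette, written as hex strings
colors = ["#000000", "#E69F00", "#56B4E9", "#009E73"]

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(12, 5))

# pandas draws onto whichever Axes is passed via ax=
df[["A", "B"]].plot(ax=ax_left, color=colors[:2], legend=False)
df[["C", "D"]].plot(ax=ax_right, color=colors[2:], legend=False)

handles, labels = [], []
for ax in (ax_left, ax_right):
    ax.set_xlabel("Time")
    ax.set_ylabel("Bitcoin Price")
    # Collect the lines of each panel so one legend can list all of them
    panel_handles, panel_labels = ax.get_legend_handles_labels()
    handles += panel_handles
    labels += panel_labels

# One combined legend, placed to the right of the last panel
ax_right.legend(handles, labels, bbox_to_anchor=(1.05, 1), loc="upper left")
fig.tight_layout()
# In a notebook the figure renders on its own; in a script, call plt.show()
```

Passing legend=False only stops pandas from drawing a legend; the line labels are still set, which is why they can be collected afterwards.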

Related

Matplotlib - Tight layout of multiple subplots with colorbar

I have a series of subplots in a single row, all sharing the same colorbar and I would like to use plt.tight_layout().
However, when used naively, the colorbar messes everything up. Luckily, I found this in the matplotlib documentation, but it works only for one subplot.
Minimal Working Example
I tried to adapt it to multiple subplots, but the subplot to which the colorbar is assigned ends up being smaller.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
plt.close('all')
arr = np.arange(100).reshape((10, 10))
fig, ax = plt.subplots(ncols=2, figsize=(8, 4))
im0 = ax[0].imshow(arr, interpolation="none")
im1 = ax[1].imshow(arr, interpolation='none')
divider = make_axes_locatable(plt.gca())
cax = divider.append_axes("right", "5%", pad="3%")
plt.colorbar(im0, cax=cax)
plt.tight_layout()
This is what the result looks like.
As of matplotlib 3.6, there is a new option layout='compressed' for this situation:
import matplotlib.pyplot as plt
import numpy as np
arr = np.arange(100).reshape((10, 10))
fig, ax = plt.subplots(ncols=2, figsize=(4, 2), layout='compressed')
im0 = ax[0].imshow(arr)
im1 = ax[1].imshow(arr)
plt.colorbar(im0, ax=ax)
plt.show()
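For installations older than 3.6, a possible alternative is constrained layout, which can also reserve room for a colorbar shared by several Axes. This is only a sketch under that assumption: it takes the place of the plt.tight_layout() call rather than being combined with it, and unlike 'compressed' it does not try to remove the extra whitespace between fixed-aspect images.

```python
import numpy as np
import matplotlib.pyplot as plt

arr = np.arange(100).reshape((10, 10))

# constrained_layout=True is used here instead of calling plt.tight_layout()
fig, axes = plt.subplots(ncols=2, figsize=(8, 4), constrained_layout=True)
im0 = axes[0].imshow(arr, interpolation="none")
im1 = axes[1].imshow(arr, interpolation="none")

# Attaching the colorbar to both Axes takes the space from both of them,
# rather than shrinking only the last one
cbar = fig.colorbar(im0, ax=axes.ravel().tolist())
# In a notebook the figure renders on its own; in a script, call plt.show()
```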

Matplotlib subplot not plotting

I have the following code:
import pandas.util.testing as testing
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib as mpl
df = testing.makeTimeDataFrame(freq='MS')
with mpl.rc_context(rc={'font.family': 'serif', 'font.weight': 'bold', 'font.size': 12}):
    fig = plt.figure(figsize= (12, 6))
    fig.add_subplot(2, 2, (1,2))
    ax2 = ax.twinx()
    df['A'].plot(ax=ax, color = 'g')
    df['B'].plot(ax=ax2, color ='g')
    fig.add_subplot(223)
    df['C'].plot(color='r')
    fig.add_subplot(224)
    df['D'].plot()
    fig.tight_layout()
    plt.show()
Which produces the following plot.
I am trying to plot df['A'] and df['B'] on the same top plot. Could you please advise what I have overlooked?
One little detail is missing: before calling twinx, you need to assign the first subplot to ax. Then it will work.
ax = fig.add_subplot(2, 2, (1,2))
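Put together, the corrected script might look like the sketch below. Two things here are assumptions rather than part of the original: the data is a random monthly DataFrame standing in for testing.makeTimeDataFrame (pandas.util.testing is not available in recent pandas versions), and df['B'] is drawn in blue so the two lines on the top plot can be told apart. Each Axes is also passed explicitly through ax=, which avoids relying on the "current Axes".

```python
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

# Stand-in for testing.makeTimeDataFrame(freq='MS'): random monthly data (assumption)
idx = pd.date_range("2000-01-01", periods=30, freq="MS")
values = np.random.default_rng(0).standard_normal((30, 4))
df = pd.DataFrame(values, index=idx, columns=list("ABCD"))

with mpl.rc_context(rc={"font.family": "serif", "font.weight": "bold", "font.size": 12}):
    fig = plt.figure(figsize=(12, 6))

    # Keep the Axes returned by add_subplot so twinx() has something to twin
    ax = fig.add_subplot(2, 2, (1, 2))
    ax2 = ax.twinx()
    df["A"].plot(ax=ax, color="g")
    df["B"].plot(ax=ax2, color="b")  # blue instead of green, to distinguish the lines

    ax3 = fig.add_subplot(223)
    df["C"].plot(ax=ax3, color="r")

    ax4 = fig.add_subplot(224)
    df["D"].plot(ax=ax4)

    fig.tight_layout()
    # In a notebook the figure renders on its own; in a script, call plt.show()
```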

How to delete legend in pandas

I am plotting with both pandas (pd) and matplotlib.pyplot (plt). I don't want pandas to add a legend entry, but I still need the plt legend. Is there a way to remove the legend of the pandas plot? (legend=False doesn't work)
import pandas as pd
import matplotlib.pyplot as plt
xs = [i for i in range(1, 11)]
ys = [i for i in range(1, 11)]
df = pd.DataFrame(list(zip(xs, ys)), columns=['xs', 'ys'])
fig, ax = plt.subplots()
# plot pd data-frame, I don't want this to show legend
df.plot(x='xs', y='ys', ax=ax, kind='line', legend=False)
# these don't work
ax.legend([])
ax.get_legend().remove()
ax.legend().set_visible(False)
# plot by plt, I only want this to show legend
ax.plot(xs, ys, label='I only need this label to be shown')
ax.legend()
plt.show() # still showing both legends
Note: I prefer not to change the order of plotting (plotting with plt first and then pd would show only the plt legend, but the plt line would then be hidden behind the pd line), and I don't want to use plt to plot the dataframe's data.
You can remove the 1st set of lines and labels from the legend:
fig, ax = plt.subplots()
df.plot(x='xs', y='ys', ax=ax, kind='line', label='Something')
ax.plot(xs, ys, label='I only need this label to be shown')
# Legend except 1st lines/labels
lines, labels = ax.get_legend_handles_labels()
ax.legend(lines[1:], labels[1:])
plt.show()
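Slicing with [1:] depends on the order in which the lines were drawn. As a variation, the same handles and labels can be filtered by label text instead. The helper name legend_without below is hypothetical, not a matplotlib function; it is just a small sketch built on get_legend_handles_labels.

```python
import pandas as pd
import matplotlib.pyplot as plt


def legend_without(ax, hidden):
    """Hypothetical helper: rebuild the legend, leaving out the labels in `hidden`."""
    handles, labels = ax.get_legend_handles_labels()
    kept = [(h, l) for h, l in zip(handles, labels) if l not in hidden]
    return ax.legend([h for h, _ in kept], [l for _, l in kept])


xs = list(range(1, 11))
ys = list(range(1, 11))
df = pd.DataFrame({"xs": xs, "ys": ys})

fig, ax = plt.subplots()
df.plot(x="xs", y="ys", ax=ax, kind="line", label="Something")
ax.plot(xs, ys, label="I only need this label to be shown")

# Drop the pandas line from the legend by name, regardless of plotting order
legend = legend_without(ax, hidden={"Something"})
# In a notebook the figure renders on its own; in a script, call plt.show()
```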
You can use matplotlib to plot DataFrame data (and other data from other sources) on the same plot without using df.plot(). Do you need to use df.plot(), or would this be okay?
import pandas as pd
import matplotlib.pyplot as plt
xs = [i for i in range(1, 11)]
ys = [i for i in range(1, 11)]
df = pd.DataFrame(list(zip(xs, ys)), columns=['xs', 'ys'])
fig, ax = plt.subplots()
#just keep using mpl but reference the data in the dataframe, basically what df.plot() does
ax.plot(df['xs'], df['ys'])
ax.plot(xs, ys, label='I only need this label to be shown')
ax.legend()
plt.show()
If you do insist on using df.plot(), you can still take advantage of the underscore trick, as described in the documentation:
Specific lines can be excluded from the automatic legend element selection by defining a label starting with an underscore.
import pandas as pd
import matplotlib.pyplot as plt
xs = [i for i in range(1, 11)]
ys = [i for i in range(1, 11)]
df = pd.DataFrame(list(zip(xs, ys)), columns=['xs', 'ys'])
fig, ax = plt.subplots()
# plot pd data-frame, I don't want this to show legend
df.plot(x='xs', y='ys', ax=ax, kind='line', label='_hidden')
# plot by plt, I only want this to show legend
ax.plot(xs, ys, label='I only need this label to be shown')
ax.legend()
plt.show() # only the matplotlib label is shown
This will yield the same result as above, but I get a warning (UserWarning: The handle <matplotlib.lines.Line2D object at 0x00000283F0FFDB38> has a label of '_hidden' which cannot be automatically added to the legend.). This feels messier and more hacky, so I prefer the first option.
Use label='_nolegend_' as recommended here. This worked for me:
import pandas as pd
import matplotlib.pyplot as plt
xs = [i for i in range(1, 11)]
ys = [i for i in range(1, 11)]
df = pd.DataFrame(list(zip(xs, ys)), columns=['xs', 'ys'])
fig, ax = plt.subplots()
# plot pd data-frame, I don't want this to show legend
df.plot(x='xs', y='ys', ax=ax, kind='line', label='_nolegend_')
# plot by plt, I only want this to show legend
ax.plot(xs, ys, label='I only need this label to be shown')
ax.legend()
plt.show() # now showing one legend

How to use pandas df.plot.scatter to make a figure with subplots

Hello, how can I make a figure with scatter subplots using pandas? It works with plot, but not with scatter.
Here is an example:
import numpy as np
import pandas as pd
matrix = np.random.rand(200,5)
df = pd.DataFrame(matrix,columns=['index','A','B','C','D'])
# single plot, working
df.plot(
    kind='scatter',
    x='index',
    y='A',
    s=0.5
)
# not working
df.plot(
    subplots=True,
    kind='scatter',
    x='index',
    y=['A','B','C'],
    s=0.5
)
Error
raise ValueError(self._kind + " requires an x and y column")
ValueError: scatter requires an x and y column
Edit:
Solution for making a figure with subplots using df.plot
(Thanks to @Fourier)
import numpy as np
import pandas as pd
matrix = np.random.rand(200,5)#random data
df = pd.DataFrame(matrix,columns=['index','A','B','C','D']) #make df
#get a list for subplots
labels = list(df.columns)
labels.remove('index')
df.plot(
    layout=(-1, 5),
    kind="line",
    x='index',
    y=labels,
    subplots=True,
    sharex=True,
    ls="none",
    marker="o")
Would this work for you:
import pandas as pd
import numpy as np
df = pd.DataFrame({"index":np.arange(5),"A":np.random.rand(5),"B":np.random.rand(5),"C":np.random.rand(5)})
df.plot(kind="line", x="index", y=["A","B","C"], subplots=True, sharex=True, ls="none", marker="o")
Output
Note: This uses a line plot with invisible lines. For a true scatter plot, I would loop over the columns instead:
for column in df.columns.drop("index"): # skip the column used for x
    df.plot(kind="scatter", x="index", y=column)
EDIT
In order to add custom ylabels you can do the following:
axes = df.plot(kind='line', x="index", y=["A","B","C"], subplots=True, sharex=True, ls="none", marker="o", legend=False)
ylabels = ["foo","bar","baz"]
for ax, label in zip(axes, ylabels):
    ax.set_ylabel(label)
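The loop over columns suggested above creates a separate figure for each column. To get real scatter plots as subplots of one figure, one option is to create the Axes with plt.subplots first and pass each one to pandas through ax=. A minimal sketch, assuming the random data from the question and an arbitrary figure size:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random data shaped like the question's example
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 5)), columns=["index", "A", "B", "C", "D"])

columns = ["A", "B", "C"]

# One row per column, all sharing the x axis
fig, axes = plt.subplots(nrows=len(columns), sharex=True, figsize=(6, 8))

for ax, column in zip(axes, columns):
    # kind="scatter" takes a single y column, so each one gets its own Axes
    df.plot(kind="scatter", x="index", y=column, s=0.5, ax=ax)

fig.tight_layout()
# In a notebook the figure renders on its own; in a script, call plt.show()
```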

How do I make grid axes invisible for a pandas dataframe hist()?

Here is what I want to do: plot histograms of all columns of a dataframe, but without the grid axes. The code below works, but I'd prefer a more elegant solution (such as passing an argument to hist).
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
X = np.asarray([50]*25+[30]*10)
X2 = np.asarray([90]*10+[20]*25)
X3 = np.asarray([10]*15+[70]*20)
df = pd.DataFrame(np.vstack([X, X2, X3]).T)
def plot_hists(df, nbins=10, figsize=(8, 8), disable_axis_labels = True):
    plt.close('all')
    grid_of_ax_hists = df.hist(bins=nbins, figsize=figsize)
    if disable_axis_labels:
        for row in grid_of_ax_hists:
            for ax in row:
                ax.xaxis.set_visible(False)
                ax.yaxis.set_visible(False)
    plt.show()
df.hist()
plt.subplots()
plot_hists(df, nbins=10, figsize=(8, 8), disable_axis_labels = True)
Even though this thread is rather old, I'd like to add a working solution because I've just had the same issue.
At least in pandas 0.18, df.hist() takes all possible plotting keywords from pandas.DataFrame.plot, so
df.hist(grid=False)
works easily, and there is no need to deal with matplotlib axes.
You can try:
fig = plt.figure(figsize=figsize)
ax1 = fig.add_subplot(111)
df.hist(bins=nbins, ax=ax1)
ax1.grid(False) # positional form; newer matplotlib renamed the b= keyword to visible=
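The two ideas in this thread can be combined: grid=False removes the grid lines, and the array of Axes that df.hist returns can be flattened to hide the tick axes without a nested loop. A short sketch using the data from the question:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

X = np.asarray([50] * 25 + [30] * 10)
X2 = np.asarray([90] * 10 + [20] * 25)
X3 = np.asarray([10] * 15 + [70] * 20)
df = pd.DataFrame(np.vstack([X, X2, X3]).T)

# grid=False turns off the grid lines; hist returns a 2D array of Axes
axes = df.hist(bins=10, figsize=(8, 8), grid=False)

# ravel() flattens the array, so a single loop covers every subplot
for ax in axes.ravel():
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)
# In a notebook the figure renders on its own; in a script, call plt.show()
```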
