Pandas plot: Assign Colors - python

I have many data frames that I am plotting for a presentation. These all have different columns, but all contain the same additional column foobar. At the moment, I am plotting these different data frames using
df.plot(secondary_y='foobar')
Unfortunately, since these data frames all have different additional columns with different ordering, the color of foobar is always different. This makes the presentation slides unnecessary complicated. I would like, throughout the different plots, assign that foobar is plotted bold and black.
Looking at the docs, the only thing coming close appears to be the parameter colormap - I would need to ensure that the xth color in the color map is always black, where x is the order of foobar in the data frame. Seems to be more complicated than it should be, also this wouldn't make it bold.
Is there a (better) approach?

I would suggest using matplotlib directly rather than the dataframe plotting methods. If df.plot returned the artists it added instead of an Axes object it wouldn't be too bad to change the color of the line after it was plotted.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def pandas_plot(ax, df, callout_key):
"""
Parameters
----------
ax : mpl.Axes
The axes to draw to
df : DataFrame
Data to plot
callout_key : str
key to highlight
"""
artists = {}
x = df.index.values
for k, v in df.iteritems():
style_kwargs = {}
if k == callout_key:
style_kwargs['c'] = 'k'
style_kwargs['lw'] = 2
ln, = ax.plot(x, v.values, **style_kwargs)
artists[k] = ln
ax.legend()
ax.set_xlim(np.min(x), np.max(x))
return artists
Usage:
fig, ax = plt.subplots()
ax2 = ax.twinx()
th = np.linspace(0, 2*np.pi, 1024)
df = pd.DataFrame({'cos': np.cos(th), 'sin': np.sin(th),
'foo': np.sin(th + 1), 'bar': np.cos(th +1)}, index=th)
df2 = pd.DataFrame({'cos': -np.cos(th), 'sin': -np.sin(th)}, index=th)
pandas_plot(ax, df, 'sin')
pandas_plot(ax2, df2, 'sin')

Perhaps you could define a function which handles the special column in a separate plot call:
def emphasize_plot(ax, df, col, **emphargs):
columns = [c for c in df.columns if c != col]
df[columns].plot(ax=ax)
df[col].plot(ax=ax, **emphargs)
Using code from tcaswell's example,
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def emphasize_plot(ax, df, col, **emphargs):
columns = [c for c in df.columns if c != col]
df[columns].plot(ax=ax)
df[col].plot(ax=ax, **emphargs)
fig, ax = plt.subplots()
th = np.linspace(0, 2*np.pi, 1024)
df = pd.DataFrame({'cos': np.cos(th), 'foobar': np.sin(th),
'foo': np.sin(th + 1), 'bar': np.cos(th +1)}, index=th)
df2 = pd.DataFrame({'cos': -np.cos(th), 'foobar': -np.sin(th)}, index=th)
emphasize_plot(ax, df, 'foobar', lw=2, c='k')
emphasize_plot(ax, df2, 'foobar', lw=2, c='k')
plt.show()
yields

I used #unutbut's answer and extended it to allow for a secondary y axis and correct legends:
def emphasize_plot(ax, df, col, **emphargs):
columns = [c for c in df.columns if c != col]
ax2 = ax.twinx()
df[columns].plot(ax=ax)
df[col].plot(ax=ax2, **emphargs)
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(lines + lines2, labels + labels2, loc=0)

Related

plotting area plot as a subplot [duplicate]

I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. what I really want is to have them all in the same plot as subplots, but I'm unfortunately failing to come up with a solution to how and would highly appreciate some help.
You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2)
df1.plot(ax=axes[0,0])
df2.plot(ax=axes[0,1])
...
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
You can see e.gs. in the documentation demonstrating joris answer. Also from the documentation, you could also set subplots=True and layout=(,) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
You can plot multiple subplots of multiple pandas data frames using matplotlib with a simple trick of making a list of all data frame. Then using the for loop for plotting subplots.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
#define number of rows and columns for subplots
nrow=3
ncol=2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count=0
for r in range(nrow):
for c in range(ncol):
df_list[count].plot(ax=axes[r,c])
count+=1
Using this code you can plot subplots in any configuration. You need to define the number of rows nrow and the number of columns ncol. Also, you need to make list of data frames df_list which you wanted to plot.
You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
plt.figure(1)
plt.subplot(2,2,1)
df.A.plot() #no need to specify for first axis
plt.subplot(2,2,2)
df.B.plot(ax=plt.gca())
plt.subplot(2,2,3)
df.C.plot(ax=plt.gca())
etc...
You can use this:
fig = plt.figure()
ax = fig.add_subplot(221)
plt.plot(x,y)
ax = fig.add_subplot(222)
plt.plot(x,z)
...
plt.show()
You may not need to use Pandas at all. Here's a matplotlib plot of cat frequencies:
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for c, i in enumerate(axes):
axes[c].plot(x, y)
axes[c].set_title('cats')
plt.tight_layout()
Option 1: Create subplots from a dictionary of dataframes with long (tidy) data
Assumptions:
There is a dictionary of multiple dataframes of tidy data that are either:
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes don't necessarily contain all values of cat
hue='cat'
This example uses a dict of dataframes, but a list of dataframes would be similar.
If the dataframes are wide, use pandas.DataFrame.melt to convert them to long form.
Because dataframes are being iterated through, there's no guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1, seaborn 0.11.2
Imports and Test Data
import pandas as pd
import numpy as np # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend - square patches
from matplotlib.lines import Line2D # for custom legend - round markers
import seaborn as sns
import math import ceil # determine correct number of subplot
# synthetic data
df_dict = dict()
for i in range(1, 7):
np.random.seed(i) # for repeatable sample data
data_length = 100
data = {'cat': np.random.choice(['A', 'B', 'C'], size=data_length),
'x': np.random.rand(data_length), 'y': np.random.rand(data_length)}
df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 B 0.944595 0.606329
1 A 0.586555 0.568851
2 A 0.903402 0.317362
3 B 0.137475 0.988616
4 B 0.139276 0.579745
# display(df_dict[6].tail())
cat x y
95 B 0.881222 0.263168
96 A 0.193668 0.636758
97 A 0.824001 0.638832
98 C 0.323998 0.505060
99 C 0.693124 0.737582
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v.cat.unique()} # get unique cats
colors = sns.color_palette('tab10', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
# create the figue and axes
fig, axes = plt.subplots(row_nums, col_nums, figsize=(9, 6), sharex=True, sharey=True)
# convert to 1D array for easy iteration
axes = axes.flat
# iterate through dictionary and plot
for ax, (k, v) in zip(axes, df_dict.items()):
sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap, ax=ax)
sns.despine(top=True, right=True)
ax.legend_.remove() # remove the individual plot legends
ax.set_title(f'dataset = {k}', fontsize=11)
fig.tight_layout()
# create legend from cmap
# patches = [Patch(color=v, label=k) for k, v in cmap.items()] # square patches
patches = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in cmap.items()] # round markers
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(title='cat', handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0, frameon=False)
plt.show()
Option 2: Create subplots from a single dataframe with multiple separate datasets
The dataframes must be in a long form with the same column names.
This option uses pd.concat to combine multiple dataframes into a single dataframe, and .assign to add a new column.
See Import multiple csv files into pandas and concatenate into one DataFrame for creating a single dataframes from a list of files.
This option is easier because it doesn't require manually mapping colors to 'cat'
Combine DataFrames
# using df_dict, with dataframes as values, from the top
# combine all the dataframes in df_dict to a single dataframe with an identifier column
df = pd.concat((v.assign(dataset=k) for k, v in df_dict.items()), ignore_index=True)
# display(df.head())
cat x y dataset
0 B 0.944595 0.606329 1
1 A 0.586555 0.568851 1
2 A 0.903402 0.317362 1
3 B 0.137475 0.988616 1
4 B 0.139276 0.579745 1
# display(df.tail())
cat x y dataset
595 B 0.881222 0.263168 6
596 A 0.193668 0.636758 6
597 A 0.824001 0.638832 6
598 C 0.323998 0.505060 6
599 C 0.693124 0.737582 6
Plot a FacetGrid with seaborn.relplot
sns.relplot(kind='scatter', data=df, x='x', y='y', hue='cat', col='dataset', col_wrap=3, height=3)
Both options create the same result, however, it's less complicated to combine all the dataframes, and plot a figure-level plot with sns.relplot.
Building on #joris response above, if you have already established a reference to the subplot, you can use the reference as well. For example,
ax1 = plt.subplot2grid((50,100), (0, 0), colspan=20, rowspan=10)
...
df.plot.barh(ax=ax1, stacked=True)
Here is a working pandas subplot example, where modes is the column names of the dataframe.
dpi=200
figure_size=(20, 10)
fig, ax = plt.subplots(len(modes), 1, sharex="all", sharey="all", dpi=dpi)
for i in range(len(modes)):
ax[i] = pivot_df.loc[:, modes[i]].plot.bar(figsize=(figure_size[0], figure_size[1]*len(modes)),
ax=ax[i], title=modes[i], color=my_colors[i])
ax[i].legend()
fig.suptitle(name)
import numpy as np
import pandas as pd
imoprt matplotlib.pyplot as plt
fig, ax = plt.subplots(2,2)
df = pd.DataFrame({'A':np.random.randint(1,100,10),
'B': np.random.randint(100,1000,10),
'C':np.random.randint(100,200,10)})
for ax in ax.flatten():
df.plot(ax =ax)

Creating box plots by looping multiple columns

I am trying to create multiple box plot charts for about 5 columns in my dataframe (df_summ):
columns = ['dimension_a','dimension_b']
for i in columns:
sns.set(style = "ticks", palette = "pastel")
box_plot = sns.boxplot(y="measure", x=i,
palette=["m","g"],
data=df_summ_1500_delta)
sns.despine(offset=10, trim=True)
medians = df_summ_1500_delta.groupby([i])['measure'].median()
vertical_offset=df_summ_1500_delta['measure'].median()*-0.5
for xtick in box_plot.get_xticks():
box_plot.text(xtick,medians[xtick] + vertical_offset,medians[xtick],
horizontalalignment='center',size='small',color='blue',weight='semibold')
My only issue is that they aren't be separated on different facets, but rather on top of each other.
Any help on how I can make both on their own separate chart with the x axis being 'dimension a' and the x axis of the second chart being 'dimension b'.
To draw two boxplots next to each other at each x-position, you can use a hue for dimension_a and dimension_b separately. These two columns need to be transformed (with pd.melt()) to "long form".
Here is a some example code starting from generated test data. Note that the order both for the x-values as for the hue-values needs to be enforced to be sure of their exact position. The individual box plots are distributed over a width of 0.8.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
df = pd.DataFrame({'dimension_a': np.random.choice(['hot', 'cold'], 100),
'dimension_b': np.random.choice(['hot', 'cold'], 100),
'measure': np.random.uniform(100, 500, 100)})
df.loc[df['dimension_a'] == 'hot', 'measure'] += 100
df.loc[df['dimension_a'] == 'cold', 'measure'] -= 100
x_order = ['hot', 'cold']
columns = ['dimension_a', 'dimension_b']
df1 = df.melt(value_vars=columns, var_name='dimension', value_name='value', id_vars='measure')
sns.set(style="ticks", palette="pastel")
ax = sns.boxplot(data=df1, x='value', order=x_order, y='measure',
hue='dimension', hue_order=columns, palette=["m", "g"], dodge=True)
ax.set_xlabel('')
sns.despine(offset=10, trim=True)
for col, dodge_dist in zip(columns, np.linspace(-0.4, 0.4, 2 * len(x_order) + 1)[1::2]):
medians = df.groupby([col])['measure'].median()
vertical_offset = df['measure'].median() * -0.5
for x_ind, xtick in enumerate(x_order):
ax.text(x_ind + dodge_dist, medians[xtick] + vertical_offset, f'{medians[xtick]:.2f}',
horizontalalignment='center', size='small', color='blue', weight='semibold')
plt.show()

Legends disappear when {"hist":False} in seaborn distplot

I have the following function:
Say hue="animals have three categories dog,bird,horse and we have two dataframes df_m and df_f consisting of data of male animals and women animals only, respectively.
The function plots three distplot of y (e.g y="weight") one for each hue={dog,bird,horse}. In each subplot we plot df_m[y] and df_f[y] such that I can compare the weight of male dogs/female dogs, male birds/female birds, male horses/female horses.
If I set distkwargs={"hist":False} when calling the function the legends ["F","M"] disappears, for some reason. Having distkwargs={"hist":True}` shows the legends
def plot_multi_kde_cat(self,dfs,y,hue,subkwargs={},distkwargs={},legends=[]):
"""
Create a subplot multi_kde with categories in the same plot
dfs: List
- DataFrames for each category e.g one for male and one for females
hue: string
- column for which each category is plotted (in each subplot)
"""
hues = dfs[0][hue].cat.categories
if len(hues)==2: #Only two categories
fig,axes = plt.subplots(1,2,**subkwargs) #Get axes and flatten them
axes=axes.flatten()
for ax,hu in zip(axes,hues):
for df in dfs:
sns.distplot(df.loc[df[hue]==hu,y],ax=ax,**distkwargs)
ax.set_title(f"Segment: {hu}")
ax.legend(legends)
else: #More than two categories: create a square grid and remove unsused axes
n_rows = int(np.ceil(np.sqrt(len(hues)))) #number of rows
fig,axes = plt.subplots(n_rows,n_rows,**subkwargs)
axes = axes.flatten()
for ax,hu in zip(axes,hues):
for df in dfs:
sns.distplot(df.loc[df[hue]==hu,y],ax=ax,**distkwargs)
ax.set_title(f"Segment: {hu}")
ax.legend(legends)
n_remove = len(axes)-len(hues) #number of axes to remove
if n_remove>0:
for ax in axes[-n_remove:]:
ax.set_visible(False)
fig.tight_layout()
return fig,axes
You can work around the problem by explicitly providing the label to the distplot. This forces a legend entry for each distplot. ax.legend() then already gets the correct labels.
Here is some minimal sample code to illustrate how everything works together:
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
def plot_multi_kde_cat(dfs, y, hue, subkwargs={}, distkwargs={}, legends=[]):
hues = np.unique(dfs[0][hue])
fig, axes = plt.subplots(1, len(hues), **subkwargs)
axes = axes.flatten()
for ax, hu in zip(axes, hues):
for df, legend_label in zip(dfs, legends):
sns.distplot(df.loc[df[hue] == hu, y], ax=ax, label=legend_label, **distkwargs)
ax.set_title(f"Segment: {hu}")
ax.legend()
N = 20
df_m = pd.DataFrame({'animal': np.random.choice(['tiger', 'horse'], N), 'weight': np.random.uniform(100, 200, N)})
df_f = pd.DataFrame({'animal': np.random.choice(['tiger', 'horse'], N), 'weight': np.random.uniform(80, 160, N)})
plot_multi_kde_cat([df_m, df_f], 'weight', 'animal',
subkwargs={}, distkwargs={'hist': False}, legends=['male', 'female'])
plt.show()

How to do kde plot in pyplot.subplots context? [duplicate]

I have a few Pandas DataFrames sharing the same value scale, but having different columns and indices. When invoking df.plot(), I get separate plot images. what I really want is to have them all in the same plot as subplots, but I'm unfortunately failing to come up with a solution to how and would highly appreciate some help.
You can manually create the subplots with matplotlib, and then plot the dataframes on a specific subplot using the ax keyword. For example for 4 subplots (2x2):
import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=2, ncols=2)
df1.plot(ax=axes[0,0])
df2.plot(ax=axes[0,1])
...
Here axes is an array which holds the different subplot axes, and you can access one just by indexing axes.
If you want a shared x-axis, then you can provide sharex=True to plt.subplots.
You can see e.gs. in the documentation demonstrating joris answer. Also from the documentation, you could also set subplots=True and layout=(,) within the pandas plot function:
df.plot(subplots=True, layout=(1,2))
You could also use fig.add_subplot() which takes subplot grid parameters such as 221, 222, 223, 224, etc. as described in the post here. Nice examples of plot on pandas data frame, including subplots, can be seen in this ipython notebook.
You can plot multiple subplots of multiple pandas data frames using matplotlib with a simple trick of making a list of all data frame. Then using the for loop for plotting subplots.
Working code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# dataframe sample data
df1 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df2 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df3 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df4 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df5 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
df6 = pd.DataFrame(np.random.rand(10,2)*100, columns=['A', 'B'])
#define number of rows and columns for subplots
nrow=3
ncol=2
# make a list of all dataframes
df_list = [df1 ,df2, df3, df4, df5, df6]
fig, axes = plt.subplots(nrow, ncol)
# plot counter
count=0
for r in range(nrow):
for c in range(ncol):
df_list[count].plot(ax=axes[r,c])
count+=1
Using this code you can plot subplots in any configuration. You need to define the number of rows nrow and the number of columns ncol. Also, you need to make list of data frames df_list which you wanted to plot.
You can use the familiar Matplotlib style calling a figure and subplot, but you simply need to specify the current axis using plt.gca(). An example:
plt.figure(1)
plt.subplot(2,2,1)
df.A.plot() #no need to specify for first axis
plt.subplot(2,2,2)
df.B.plot(ax=plt.gca())
plt.subplot(2,2,3)
df.C.plot(ax=plt.gca())
etc...
You can use this:
fig = plt.figure()
ax = fig.add_subplot(221)
plt.plot(x,y)
ax = fig.add_subplot(222)
plt.plot(x,z)
...
plt.show()
You may not need to use Pandas at all. Here's a matplotlib plot of cat frequencies:
x = np.linspace(0, 2*np.pi, 400)
y = np.sin(x**2)
f, axes = plt.subplots(2, 1)
for c, i in enumerate(axes):
axes[c].plot(x, y)
axes[c].set_title('cats')
plt.tight_layout()
Option 1: Create subplots from a dictionary of dataframes with long (tidy) data
Assumptions:
There is a dictionary of multiple dataframes of tidy data that are either:
Created by reading in from files
Created by separating a single dataframe into multiple dataframes
The categories, cat, may be overlapping, but all dataframes don't necessarily contain all values of cat
hue='cat'
This example uses a dict of dataframes, but a list of dataframes would be similar.
If the dataframes are wide, use pandas.DataFrame.melt to convert them to long form.
Because dataframes are being iterated through, there's no guarantee that colors will be mapped the same for each plot
A custom color map needs to be created from the unique 'cat' values for all the dataframes
Since the colors will be the same, place one legend to the side of the plots, instead of a legend in every plot
Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1, seaborn 0.11.2
Imports and Test Data
import pandas as pd
import numpy as np # used for random data
import matplotlib.pyplot as plt
from matplotlib.patches import Patch # for custom legend - square patches
from matplotlib.lines import Line2D # for custom legend - round markers
import seaborn as sns
import math import ceil # determine correct number of subplot
# synthetic data
df_dict = dict()
for i in range(1, 7):
np.random.seed(i) # for repeatable sample data
data_length = 100
data = {'cat': np.random.choice(['A', 'B', 'C'], size=data_length),
'x': np.random.rand(data_length), 'y': np.random.rand(data_length)}
df_dict[i] = pd.DataFrame(data)
# display(df_dict[1].head())
cat x y
0 B 0.944595 0.606329
1 A 0.586555 0.568851
2 A 0.903402 0.317362
3 B 0.137475 0.988616
4 B 0.139276 0.579745
# display(df_dict[6].tail())
cat x y
95 B 0.881222 0.263168
96 A 0.193668 0.636758
97 A 0.824001 0.638832
98 C 0.323998 0.505060
99 C 0.693124 0.737582
Create color mappings and plot
# create color mapping based on all unique values of cat
unique_cat = {cat for v in df_dict.values() for cat in v.cat.unique()} # get unique cats
colors = sns.color_palette('tab10', n_colors=len(unique_cat)) # get a number of colors
cmap = dict(zip(unique_cat, colors)) # zip values to colors
col_nums = 3 # how many plots per row
row_nums = math.ceil(len(df_dict) / col_nums) # how many rows of plots
# create the figue and axes
fig, axes = plt.subplots(row_nums, col_nums, figsize=(9, 6), sharex=True, sharey=True)
# convert to 1D array for easy iteration
axes = axes.flat
# iterate through dictionary and plot
for ax, (k, v) in zip(axes, df_dict.items()):
sns.scatterplot(data=v, x='x', y='y', hue='cat', palette=cmap, ax=ax)
sns.despine(top=True, right=True)
ax.legend_.remove() # remove the individual plot legends
ax.set_title(f'dataset = {k}', fontsize=11)
fig.tight_layout()
# create legend from cmap
# patches = [Patch(color=v, label=k) for k, v in cmap.items()] # square patches
patches = [Line2D([0], [0], marker='o', color='w', markerfacecolor=v, label=k, markersize=8) for k, v in cmap.items()] # round markers
# place legend outside of plot; change the right bbox value to move the legend up or down
plt.legend(title='cat', handles=patches, bbox_to_anchor=(1.06, 1.2), loc='center left', borderaxespad=0, frameon=False)
plt.show()
Option 2: Create subplots from a single dataframe with multiple separate datasets
The dataframes must be in a long form with the same column names.
This option uses pd.concat to combine multiple dataframes into a single dataframe, and .assign to add a new column.
See Import multiple csv files into pandas and concatenate into one DataFrame for creating a single dataframes from a list of files.
This option is easier because it doesn't require manually mapping colors to 'cat'
Combine DataFrames
# using df_dict, with dataframes as values, from the top
# combine all the dataframes in df_dict to a single dataframe with an identifier column
df = pd.concat((v.assign(dataset=k) for k, v in df_dict.items()), ignore_index=True)
# display(df.head())
cat x y dataset
0 B 0.944595 0.606329 1
1 A 0.586555 0.568851 1
2 A 0.903402 0.317362 1
3 B 0.137475 0.988616 1
4 B 0.139276 0.579745 1
# display(df.tail())
cat x y dataset
595 B 0.881222 0.263168 6
596 A 0.193668 0.636758 6
597 A 0.824001 0.638832 6
598 C 0.323998 0.505060 6
599 C 0.693124 0.737582 6
Plot a FacetGrid with seaborn.relplot
sns.relplot(kind='scatter', data=df, x='x', y='y', hue='cat', col='dataset', col_wrap=3, height=3)
Both options create the same result, however, it's less complicated to combine all the dataframes, and plot a figure-level plot with sns.relplot.
Building on #joris response above, if you have already established a reference to the subplot, you can use the reference as well. For example,
ax1 = plt.subplot2grid((50,100), (0, 0), colspan=20, rowspan=10)
...
df.plot.barh(ax=ax1, stacked=True)
Here is a working pandas subplot example, where modes is the column names of the dataframe.
dpi=200
figure_size=(20, 10)
fig, ax = plt.subplots(len(modes), 1, sharex="all", sharey="all", dpi=dpi)
for i in range(len(modes)):
ax[i] = pivot_df.loc[:, modes[i]].plot.bar(figsize=(figure_size[0], figure_size[1]*len(modes)),
ax=ax[i], title=modes[i], color=my_colors[i])
ax[i].legend()
fig.suptitle(name)
import numpy as np
import pandas as pd
imoprt matplotlib.pyplot as plt
fig, ax = plt.subplots(2,2)
df = pd.DataFrame({'A':np.random.randint(1,100,10),
'B': np.random.randint(100,1000,10),
'C':np.random.randint(100,200,10)})
for ax in ax.flatten():
df.plot(ax =ax)

How to annotate end of lines using python and matplotlib?

With a dataframe and basic plot such as this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123456)
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
df.plot()
What is the best way of annotating the last points on the lines so that you get the result below?
In order to annotate a point use ax.annotate(). In this case it makes sense to specify the coordinates to annotate separately. I.e. the y coordinate is the data coordinate of the last point of the line (which you can get from line.get_ydata()[-1]) while the x coordinate is independent of the data and should be the right hand side of the axes (i.e. 1 in axes coordinates). You may then also want to offset the text a bit such that it does not overlap with the axes.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rows = 75
df = pd.DataFrame(np.random.randint(-4,5,size=(rows, 3)), columns=['A', 'B', 'C'])
datelist = pd.date_range(pd.datetime(2017, 1, 1).strftime('%Y-%m-%d'), periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df = df.cumsum()
ax = df.plot()
for line, name in zip(ax.lines, df.columns):
y = line.get_ydata()[-1]
ax.annotate(name, xy=(1,y), xytext=(6,0), color=line.get_color(),
xycoords = ax.get_yaxis_transform(), textcoords="offset points",
size=14, va="center")
plt.show()
Method 1
Here is one way, or at least a method, which you can adapt to aesthetically fit in whatever way you want, using the plt.annotate method:
[EDIT]: If you're going to use a method like this first one, the method outlined in ImportanceOfBeingErnest's answer is better than what I've proposed.
df.plot()
for col in df.columns:
plt.annotate(col,xy=(plt.xticks()[0][-1]+0.7, df[col].iloc[-1]))
plt.show()
For the xy argument, which is the x and y coordinates of the text, I chose the last x coordinate in plt.xticks(), and added 0.7 so that it is outside of your x axis, but you can coose to make it closer or further as you see fit.
METHOD 2:
You could also just use the right y axis, and label it with your 3 lines. For example:
fig, ax = plt.subplots()
df.plot(ax=ax)
ax2 = ax.twinx()
ax2.set_ylim(ax.get_ylim())
ax2.set_yticks([df[col].iloc[-1] for col in df.columns])
ax2.set_yticklabels(df.columns)
plt.show()
This gives you the following plot:
I've got some tips from the other answers and believe this is the easiest solution.
Here is a generic function to improve the labels of a line chart. Its advantages are:
you don't need to mess with the original DataFrame since it works over a line chart,
it will use the already set legend label,
removes the frame,
just copy'n paste it to improve your chart :-)
You can just call it after creating any line char:
def improve_legend(ax=None):
if ax is None:
ax = plt.gca()
for spine in ax.spines:
ax.spines[spine].set_visible(False)
for line in ax.lines:
data_x, data_y = line.get_data()
right_most_x = data_x[-1]
right_most_y = data_y[-1]
ax.annotate(
line.get_label(),
xy=(right_most_x, right_most_y),
xytext=(5, 0),
textcoords="offset points",
va="center",
color=line.get_color(),
)
ax.legend().set_visible(False)
This is the original chart:
Now you just need to call the function to improve your plot:
ax = df.plot()
improve_legend(ax)
The new chart:
Beware, it will probably not work well if a line has null values at the end.

Categories

Resources