Plotting by grouped data using Matplotlib

I'm using Matplotlib and Pandas to plot x by y, grouped by z. So I have the following:
x = df['ColumnA']
y = df['ColumnB']
fig, ax = plt.subplots(figsize=(20, 10))
for key, grp in df.groupby('ColumnC'):
    plt.plot(grp['ColumnA'], grp['ColumnB'].rolling(window=30).mean(), label=key)
I also want to highlight 2 specific values from the total amount of values that will be plotted:
ax.legend(('Value1', 'Value2'))
plt.show()
This works fine. I just have the 2 values in my legend, but all values are actually plotted. What I actually want is to be able to specify the colors for the 2 values above (e.g. red and blue) and have all the other values from ColumnC show on the plot in a single color. The objective is to highlight how Value1 and Value2 are performing compared to everything else.

First, change the colours of the lines of interest.
lines_to_highlight = {
    'Value1': 'red',
    'Value2': 'blue'
}
DEFAULT_COLOR = 'gray'
legend_entries = []  # Lines to show in legend
for line in ax.lines:
    if line.get_label() in lines_to_highlight:
        line.set_color(lines_to_highlight[line.get_label()])
        legend_entries.append(line)
    else:
        line.set_color(DEFAULT_COLOR)
Second, create your legend.
ax.legend(
    legend_entries,
    [entry.get_label() for entry in legend_entries]
)
Notes:
ax.legend(('Value1', 'Value2')) doesn't do what you expect. It simply resets the labels for the first two lines you plotted. It doesn't restrict the legend to lines you created with those labels. (The matplotlib docs themselves say that this mistake is easy to make.)
You must call ax.legend(...) after setting the line colours. Otherwise, the colours in the legend might not match the ones in the plot.
Example
import matplotlib.pyplot as plt
ax = plt.subplot(111)
ax.plot([1, 1, 1], label='one')
ax.plot([2, 2, 2], label='two')
ax.plot([3, 3, 3], label='three')
lines_to_highlight = {
    'one': 'red',
    'three': 'blue'
}
DEFAULT_COLOR = 'gray'
legend_entries = []  # Lines to show in legend
for line in ax.lines:
    if line.get_label() in lines_to_highlight:
        line.set_color(lines_to_highlight[line.get_label()])
        legend_entries.append(line)
    else:
        line.set_color(DEFAULT_COLOR)
ax.legend(
    legend_entries, [entry.get_label() for entry in legend_entries]
)
plt.show()
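As an alternative to recolouring the lines after plotting, the colour can be chosen inside the groupby loop itself. The following is only a sketch under stated assumptions: the DataFrame is made-up data with the question's column names, the window is shortened to 5, and the `highlight` dictionary and the `zorder` values are illustrative choices. It relies on the Matplotlib convention that a label starting with an underscore is left out of the automatically built legend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four groups in ColumnC, 50 points each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ColumnA": np.tile(np.arange(50), 4),
    "ColumnB": rng.normal(size=200),
    "ColumnC": np.repeat(["Value1", "Value2", "Value3", "Value4"], 50),
})

highlight = {"Value1": "red", "Value2": "blue"}  # groups to emphasise

fig, ax = plt.subplots()
for key, grp in df.groupby("ColumnC"):
    smoothed = grp["ColumnB"].rolling(window=5).mean()
    if key in highlight:
        # Highlighted groups get their own colour and are drawn on top
        ax.plot(grp["ColumnA"], smoothed, color=highlight[key], label=key, zorder=3)
    else:
        # A label starting with "_" is skipped by the automatic legend
        ax.plot(grp["ColumnA"], smoothed, color="gray", label="_" + key, zorder=2)

legend = ax.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
```

With this approach `ax.legend()` needs no arguments, because only the highlighted lines carry a legend-visible label. Call `plt.show()` or `fig.savefig(...)` to view the result.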

Related

Scatterplot plot multiple groups of points with different colors

I want to plot each key of the dictionary in a different color. How can I do this?
colors = {"Hardcover": "red", "Kindle Edition":"green", "Paperback":"blue", "ebook":"purple", "Unknown Binding":"black", "Boxed Set - Hardcover":"yellow"}
fig, ax = plt.subplots(figsize=(16, 8))
for key, value in colors.items():
    ax.scatter(data["Type"], data["Price"], c=value, label=key)
ax.legend()
plt.show()

You don't have to iterate over the colours. You can just pass it in as a Series.
data = pd.DataFrame({'Type': ['Hardcover', 'Kindle Edition', 'Paperback'], 'Price': [1,2,3]})
colors = {'Hardcover': 'red', 'Kindle Edition':'green', 'Paperback':'blue'}
fig, ax = plt.subplots(figsize=(16, 8))
scatter_colors = data['Type'].map(colors)
ax.scatter(data['Type'], data['Price'], c=scatter_colors)
print(scatter_colors)
plt.show()
I would argue that a legend is not required in your case. If you really want to have it, I think the only way might be to iterate over the 'Types'
Edited since the OP wants the legend:
Replace the scatter_colors and ax.scatter lines from the code block above with this:
for book_type in data['Type'].unique():
    data_subset = data[data['Type'] == book_type].copy()
    ax.scatter(data_subset['Type'], data_subset['Price'], color=colors[book_type], label=book_type)
ax.legend()
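One thing to watch with the `map` approach: a type that has no entry in the colour dictionary maps to NaN, which `ax.scatter` cannot use as a colour. A minimal sketch of a guard, assuming a made-up extra type ("Audiobook") and an arbitrary fallback colour of gray:

```python
import pandas as pd

# "Audiobook" is a hypothetical type with no entry in the colour dictionary
data = pd.DataFrame({
    "Type": ["Hardcover", "Kindle Edition", "Paperback", "Audiobook"],
    "Price": [1, 2, 3, 4],
})
colors = {"Hardcover": "red", "Kindle Edition": "green", "Paperback": "blue"}

# Types missing from the dictionary come back as NaN
scatter_colors = data["Type"].map(colors)
unmapped = sorted(data.loc[scatter_colors.isna(), "Type"].unique())

# Give them a fallback colour before passing the Series to ax.scatter
scatter_colors = scatter_colors.fillna("gray")
```

Inspecting `unmapped` first makes it obvious which categories still need a colour of their own.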

Line chart with shaded areas

I tried to make a line chart while having shaded areas to indicate anomalies (recessions in this case). The rate is the variable for the line chart. I created a dummy variable, normal, to indicate if it is normal or not. I want the bar chart to be grey every period when normal = 1, similar to this chart.
This is my code so far. It is very different from what I desired. I wonder if someone can help me out.
df = pd.DataFrame({
'rate' : [90,40,30,30,30,25,25,20,15,10],
'group' : [1,2,3,4,5,6,7,8,9,10],
'normal' : [1,0,0,0,0,1,0,1,0,0]})
ax = df[['group','rate']].plot()
df[['group','normal']].plot(kind = 'bar',secondary_y = True, ax = ax)
plt.show()

IIUC, and based on the question you linked, you could just find your group values where normal == 1, and use ax.axvline to draw a thick line at each of those points. For example:
ax = df.set_index('group')['rate'].plot()
x = df.loc[df.normal == 1, 'group']
for i in x:
    ax.axvline(i, color='gray', alpha = 0.5, linewidth=30)
plt.show()
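A thick vertical line has a width in points, so it does not scale with the x-axis. If true shaded bands are wanted, `ax.axvspan` may be a closer fit. The sketch below is one possible variant, not the answer's method: it groups consecutive rows with normal == 1 into runs and shades each run, and the half-unit padding on either side of a run is an assumption made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'rate': [90, 40, 30, 30, 30, 25, 25, 20, 15, 10],
    'group': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'normal': [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]})

ax = df.set_index('group')['rate'].plot()

# Give every run of consecutive equal 'normal' values its own id
run_id = (df['normal'] != df['normal'].shift()).cumsum().rename('run')

spans = []
for _, run in df.groupby(run_id):
    if run['normal'].iloc[0] != 1:
        continue  # only shade the runs flagged with normal == 1
    # Assumed padding: extend the band half a unit on each side
    start = float(run['group'].min()) - 0.5
    end = float(run['group'].max()) + 0.5
    ax.axvspan(start, end, color='gray', alpha=0.5)
    spans.append((start, end))
```

Because the bands are defined in data coordinates, they keep their width relative to the data when the figure is resized or zoomed.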

Matplotlib: Automatic coloured legend for all subplots using subplot line labels

The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots that takes into account their labels, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate sub plot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
    ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurrence of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png',
            bbox_inches='tight')

You could use a dictionary to collect them using the label as a key. For example:
handles = {}
for ii in range(nr_lines):
    l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
    if y[ii] not in handles:
        handles[y[ii]] = l1
plt.legend(handles=handles.values(), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.
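The same dictionary idea can be combined with `ax.get_legend_handles_labels()` and a figure-level legend, which avoids anchoring the legend relative to the last subplot. This is a sketch under assumptions: the category assignment is made deterministic (cycling through the three labels) so the example is reproducible, and `loc="upper right"` is an arbitrary placement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1337)
X = rng.standard_normal((9, 100))
labels = [f"Category {i}" for i in range(3)]
y = [labels[i % 3] for i in range(9)]  # assumed deterministic assignment

# Take the first colours of the default colour cycle, one per category
colour_cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
lab_clr = dict(zip(labels, colour_cycle))

fig, axes = plt.subplots(3, 3)
unique = {}  # label -> first handle seen with that label
for ax, row, lab in zip(axes.flat, X, y):
    ax.plot(row, label=lab, color=lab_clr[lab])
    for handle, label in zip(*ax.get_legend_handles_labels()):
        unique.setdefault(label, handle)

# One legend for the whole figure instead of one per subplot
fig.legend(list(unique.values()), list(unique.keys()), loc="upper right")
```

`setdefault` keeps only the first handle for each label, which is the same effect as the `if y[ii] not in handles` check.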

Add Legend to Seaborn point plot

I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add legend to the plot ?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However, the legend is missing. According to the documentation, pointplot does not accept a label argument.
One workaround that worked was creating a new dataframe and using hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know if there is a way to create a legend for the code that first adds sequentially point plot to the figure and then add a legend.

I would suggest not using seaborn pointplot for plotting. This makes things unnecessarily complicated.
Instead, use matplotlib plot_date. This lets you set labels on the plots and have them automatically put into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
In case one is still interested in obtaining the legend for pointplots, here is one way to do it:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()

Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.

I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.
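Patch handles show up as filled rectangles in the legend. If the legend keys should look more like the point plot itself (a line with a marker), Line2D proxy artists are one option. The sketch below uses plain Matplotlib so that it runs on its own; the `series_colors` mapping and the names A, B, C are hypothetical, and the pointplot calls are only indicated by a comment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()
# ... the sns.pointplot(ax=ax, ..., color=...) calls would go here ...

# Hypothetical mapping from legend entry to the colour used for each pointplot
series_colors = {"A": "blue", "B": "green", "C": "red"}

# Empty Line2D objects act as proxies: they draw nothing on the axes
# but provide a line-with-marker key for the legend
proxies = [
    Line2D([], [], color=color, marker="o", label=name)
    for name, color in series_colors.items()
]
legend = ax.legend(handles=proxies)
legend_labels = [text.get_text() for text in legend.get_texts()]
```

Since the proxies are built by hand, the colours in `series_colors` have to be kept in sync with the colours passed to the plotting calls.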

This goes a bit beyond the original question, but also builds on @PSub's response to something more general. I do know some of this is easier in Matplotlib directly, but many of the default styling options for Seaborn are quite nice, so I wanted to work out how you could have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
    data=clusters,
    x='x', y='y',
    hue='name',
    shade=True,
    thresh=0.05,
    n_levels=2,
    alpha=0.2,
    ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
    lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = cm.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
    data=points,
    x="x",
    y="y",
    hue='name',
    style='name',
    markers=markers[:len(groups)],
    palette=colors[:len(groups)],
    legend=False,
    s=30,
    alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
             loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg);
# Done
plt.show();

Python Matplotlib Boxplot Color

I am trying to make two sets of box plots using Matplotlib. I want each set of box plot filled (and points and whiskers) in a different color. So basically there will be two colors on the plot
My code is below, would be great if you can help make these plots in color. d0 and d1 are each list of lists of data. I want the set of box plots made with data in d0 in one color, and the set of box plots with data in d1 in another color.
plt.boxplot(d0, widths = 0.1)
plt.boxplot(d1, widths = 0.1)

To colorize the boxplot, you first need to pass the patch_artist=True keyword to tell it that the boxes are patches and not just paths. Then you have three main options:
1. Set the color via the ...props keyword arguments, e.g. boxprops=dict(facecolor="red"). For all keyword arguments, refer to the documentation.
2. Use the plt.setp(item, properties) functionality to set the properties of the boxes, whiskers, fliers, medians, and caps.
3. Obtain the individual items of the boxes from the returned dictionary and call item.set_<property>(...) on them individually. This option is detailed in an answer to the question "python matplotlib filled boxplots", where it is used to change the color of the individual boxes separately.
The complete example, showing options 1 and 2:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.normal(0.1, size=(100,6))
data[76:79,:] = np.ones((3,6))+0.2
plt.figure(figsize=(4,3))
# option 1, specify props dictionaries
c = "red"
plt.boxplot(data[:,:3], positions=[1,2,3], notch=True, patch_artist=True,
            boxprops=dict(facecolor=c, color=c),
            capprops=dict(color=c),
            whiskerprops=dict(color=c),
            flierprops=dict(color=c, markeredgecolor=c),
            medianprops=dict(color=c),
            )
# option 2, set all colors individually
c2 = "purple"
box1 = plt.boxplot(data[:,::-2]+1, positions=[1.5,2.5,3.5], notch=True, patch_artist=True)
for item in ['boxes', 'whiskers', 'fliers', 'medians', 'caps']:
    plt.setp(box1[item], color=c2)
plt.setp(box1["boxes"], facecolor=c2)
plt.setp(box1["fliers"], markeredgecolor=c2)
plt.xlim(0.5,4)
plt.xticks([1,2,3], [1,2,3])
plt.show()

You can change the color of a box plot using setp on the returned value from boxplot(). This example defines a box_plot() function that allows the edge and fill colors to be specified:
import matplotlib.pyplot as plt
def box_plot(data, edge_color, fill_color):
    bp = ax.boxplot(data, patch_artist=True)
    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edge_color)
    for patch in bp['boxes']:
        patch.set(facecolor=fill_color)
    return bp
example_data1 = [[1,2,0.8], [0.5,2,2], [3,2,1]]
example_data2 = [[5,3, 4], [6,4,3,8], [6,4,9]]
fig, ax = plt.subplots()
bp1 = box_plot(example_data1, 'red', 'tan')
bp2 = box_plot(example_data2, 'blue', 'cyan')
ax.legend([bp1["boxes"][0], bp2["boxes"][0]], ['Data 1', 'Data 2'])
ax.set_ylim(0, 10)
plt.show()

This question seems to be similar to that one (Face pattern for boxes in boxplots)
I hope this code solves your problem
import matplotlib.pyplot as plt
# fake data
d0 = [[4.5, 5, 6, 4],[4.5, 5, 6, 4]]
d1 = [[1, 2, 3, 3.3],[1, 2, 3, 3.3]]
# basic plot
bp0 = plt.boxplot(d0, patch_artist=True)
bp1 = plt.boxplot(d1, patch_artist=True)
for box in bp0['boxes']:
    # change outline color
    box.set(color='red', linewidth=2)
    # change fill color
    box.set(facecolor = 'green' )
    # change hatch
    box.set(hatch = '/')
for box in bp1['boxes']:
    box.set(color='blue', linewidth=5)
    box.set(facecolor = 'red' )
plt.show()

Change the color of a boxplot
import numpy as np
import matplotlib.pyplot as plt
#generate some random data
data = np.random.randn(200)
d= [data, data]
#plot
box = plt.boxplot(d, showfliers=False)
# change the color of its elements
for _, line_list in box.items():
    for line in line_list:
        line.set_color('r')
plt.show()
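The loop above works because, without patch_artist=True, every element in the dictionary returned by boxplot() is a line. With patch_artist=True the boxes become patches, which have separate edge and face colours. A small sketch to illustrate the difference, assuming made-up random data and arbitrary colours:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when plotting interactively
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D
from matplotlib.patches import PathPatch

rng = np.random.default_rng(0)
d = [rng.standard_normal(200), rng.standard_normal(200)]

fig, (ax_lines, ax_patches) = plt.subplots(1, 2)
plain = ax_lines.boxplot(d, showfliers=False)
filled = ax_patches.boxplot(d, showfliers=False, patch_artist=True)

# The returned dictionary maps element names to lists of artists
element_names = sorted(plain)

# Default: boxes are drawn as lines, so only a line colour can be set
plain_box_is_line = all(isinstance(box, Line2D) for box in plain["boxes"])

# patch_artist=True: boxes are patches, so edge and fill can differ
filled_box_is_patch = all(isinstance(box, PathPatch) for box in filled["boxes"])
for box in filled["boxes"]:
    box.set(edgecolor="red", facecolor="mistyrose")
```

Applying this to two calls, one for d0 and one for d1, gives each set of boxes its own edge and fill colour.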
