highlight a row in seaborn heatmap - python

Today I'm working on a heatmap inside a function. It's nothing too fancy: the heatmap shows a value for every district in my city and inside the function one of the arguments is district_name.
I want my function to print this same heatmap, but with the selected district highlighted (preferably in bold text).
My code is something like this:
def print_heatmap(district_name, df2):
    df2 = df2[df2.t == 7]
    pivot = pd.pivot_table(df2, values='return', index='district', columns='t', aggfunc='mean')
    sns.heatmap(pivot, annot=True, cmap=sns.cm.rocket_r, fmt='.2%', annot_kws={"size": 10})
So I need to access ax's values, so I can bold, say "Macul" if I enter print_heatmap('Macul',df2). Is there any way I can do this?
What I tried was to use mathtext but for some reason I can't use bolding in this case:
pivot.index=pivot.index.str.replace(district_name,r"$\bf{{{}}}$".format(district_name))
But that brings:
ValueError:
f{macul}$
^
Expected end of text (at char 0), (line:1, col:1)
Thanks

I think it is hard to do this directly in seaborn. Instead, you can iterate through the annotation texts in the axes (ax.texts) and the tick labels, and set their properties to "highlight" a row.
Here is an example of this approach.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
fig = plt.figure(figsize=(5, 5))
uniform_data = np.random.rand(10, 1)
cmap = mpl.cm.Blues_r
ax = sns.heatmap(uniform_data, annot=True, cmap=cmap)
# iterate through both the tick labels and the texts in the heatmap (ax.texts);
# the one-to-one pairing works because this heatmap has a single column
for lab, annot in zip(ax.get_yticklabels(), ax.texts):
    text = lab.get_text()
    if text == '2':  # let's highlight row 2
        # set the properties of the ticklabel
        lab.set_weight('bold')
        lab.set_size(20)
        lab.set_color('purple')
        # set the properties of the heatmap annot
        annot.set_weight('bold')
        annot.set_color('purple')
        annot.set_size(20)
plt.show()
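To tie this back to the function in the question, here is a minimal sketch of how print_heatmap could bold the selected district. It assumes the pivot table has no missing values, so that ax.texts holds one annotation per cell in row-major order; the demo DataFrame and its numbers are made up purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def print_heatmap(district_name, df2):
    """Draw the heatmap and bold the row belonging to district_name."""
    df2 = df2[df2.t == 7]
    pivot = pd.pivot_table(df2, values='return', index='district',
                           columns='t', aggfunc='mean')
    fig, ax = plt.subplots()
    sns.heatmap(pivot, annot=True, cmap='rocket_r', fmt='.2%',
                annot_kws={'size': 10}, ax=ax)

    # Assumption: no NaN cells, so annotations are stored row by row.
    n_cols = pivot.shape[1]
    row = pivot.index.get_loc(district_name)
    for annot in ax.texts[row * n_cols:(row + 1) * n_cols]:
        annot.set_weight('bold')

    # Bold the matching tick label as well.
    for lab in ax.get_yticklabels():
        if lab.get_text() == district_name:
            lab.set_weight('bold')
    return ax


# Hypothetical sample data, only here to make the sketch runnable.
demo = pd.DataFrame({
    'district': ['Macul', 'Macul', 'Nunoa', 'Providencia', 'Macul'],
    't': [7, 7, 7, 7, 3],
    'return': [0.10, 0.20, 0.05, 0.12, 0.50],
})
ax = print_heatmap('Macul', demo)
```

Call plt.show() afterwards to display the figure. Slicing ax.texts by row index instead of zipping it with the tick labels keeps the approach usable when the pivot has more than one column.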

Related

How to reduce the blank area in a grouped boxplot with many missing hue categories

I have an issue when plotting a categorical grouped boxplot by seaborn in Python, especially using 'hue'.
My raw data is as shown in the figure below. And I wanted to plot values in column 8 after categorized by column 1 and 4.
I used seaborn and my code is shown below:
ax = sns.boxplot(x=output[:,1], y=output[:,8], hue=output[:,4])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.legend([],[])
However, the generated plot always contains large blank area, as shown in the upper figure below. I tried to add 'dodge=False' in sns.boxplot according to a post here (https://stackoverflow.com/questions/53641287/off-center-x-axis-in-seaborn), but it gives the lower figure below.
Actually, what I want Python to plot is a boxplot like what I generated using JMP below.
It seems that if one of the second-level categories is empty, seaborn still reserves space for it within each first-level category, which causes the observed offset/blank area.
So I wonder if there is any way to solve this issue, for example by using another Python package?
Seaborn reserves a spot for each individual hue value, even when some of these values are missing. When many hue values are missing, this leads to annoying open spots. (When there would be only one box per x-value, dodge=False would solve the problem.)
A workaround is to generate a separate subplot for each individual x-label.
Reproducible example for default boxplot with missing hue values
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], [*'abcdefghijklmnopqrst'])
sns.set_style('white')
plt.figure(figsize=(15, 5))
ax = sns.boxplot(df, x='label', y='value', hue='cat', palette='turbo')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 1), ncol=2)
sns.despine()
plt.tight_layout()
plt.show()
Individual subplots per x value
A FacetGrid is generated with a subplot ("facet") for each x value
The original hue will be used as x-value for each subplot. To avoid empty spots, the hue should be of string type. When the hue would be pd.Categorical, seaborn would still reserve a spot for each of the categories.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value')
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
plt.tight_layout()
plt.show()
Adding consistent coloring
A dictionary palette can color the boxes such that corresponding boxes in different subplots have the same color. hue= with the same column as the x= will do the coloring, and dodge=False will remove the empty spots.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
cats = np.sort(df['cat'].unique())
palette_dict = {cat: color for cat, color in zip(cats, sns.color_palette('turbo', len(cats)))}
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value',
                hue='cat', dodge=False, palette=palette_dict)
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
    # ax.tick_params(axis='x', labelrotation=90)  # optionally rotate the tick labels
plt.tight_layout()
plt.show()
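A FacetGrid gives every facet the same width, even when the labels contain very different numbers of categories. If that matters, the same one-subplot-per-label idea can be sketched in plain matplotlib with subplot widths proportional to the number of categories present. This is only an illustrative variant, not part of the approach above; the layout dictionary and the random values are invented for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: each label only contains a few of the categories.
layout = {'label1': list('abcd'), 'label2': list('ef'), 'label3': list('ghijkl')}
rows = [(label, cat, value)
        for label, cats in layout.items()
        for cat in cats
        for value in rng.normal(size=20)]
df = pd.DataFrame(rows, columns=['label', 'cat', 'value'])

# Number of categories actually present for each label.
counts = df.groupby('label')['cat'].nunique()

fig, axs = plt.subplots(1, len(counts), sharey=True, figsize=(10, 4),
                        gridspec_kw={'width_ratios': counts.to_list()})
for ax, (label, group) in zip(axs, df.groupby('label')):
    cats = sorted(group['cat'].unique())
    # One box per category that occurs for this label, so no empty slots.
    ax.boxplot([group.loc[group['cat'] == c, 'value'].to_numpy() for c in cats])
    ax.set_xticks(range(1, len(cats) + 1))
    ax.set_xticklabels(cats)
    ax.set_xlabel(label)
plt.tight_layout()
```

With width_ratios set this way, every box ends up roughly the same width across subplots.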

How to control the color of a specific column in a bar plot depending on its xtick label?

I have a number of plots that show transcribed text from a speech-to-text (S2T) engine, and I want to highlight the bars where the engine transcribed correctly. I have labeled the subplots according to their expected values and now want to color the bars where the engine transcribed correctly in a different color than the other bars.
That means I need to access the color of the bars depending on their x-tick label. How do I do that?
Basically:
for xlabel in fig.xlabels:
    if(xlabel.text == fig.title):
        position = xlabel.position
        fig.colorbar(position, 'red')
Code that is used to generate the plots:
def count_id(id_val, ax=None):
    title = df.loc[df['ID'] == id_val, 'EXPECTED_TEXT'].iloc[0]
    fig = df[df['ID']==id_val]['TRANSCRIPTION_STRING'].value_counts().plot(kind='bar', ax=ax, figsize=(20,6), title=title)
    fig.set_xticklabels(fig.get_xticklabels(), rotation=40, ha ='right')
    fig.yaxis.set_major_locator(MaxNLocator(integer=True))
fig, axs = plt.subplots(2, 4)
fig.suptitle('Classic subplot')
fig.subplots_adjust(hspace=1.4)
count_id('byte', axs[0,0])
count_id('clefting', axs[0,1])
count_id('left_hander', axs[0,2])
count_id('leftmost', axs[0,3])
count_id('right_hander', axs[1,0])
count_id('rightmost', axs[1,1])
count_id('wright', axs[1,2])
count_id('write', axs[1,3])
If anyone has an idea how to iterate over axs so I don't have to call count_id() 8 times, that'd be super helpful too. And yea I tried:
misses = ['byte', 'clefting', 'left_hander', 'leftmost', 'right_hander', 'rightmost', 'wright', 'write']
for ax, miss in zip(axs.flat, misses):
    count_id(ax, miss) # <- computer says no
You can set the color of each bar according to the label both before and after plotting the bars.
I will use the sample data below for demonstration.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame({'word': list('abcdefg'), 'number': np.arange(1, 8)})
1. Before plotting:
This is the most common way of plotting a colored barplot. You can pass a list of colors to plt.bar().
def plot_color_label_before(data, target):
    colors = ['red' if word == target else 'blue' for word in data.word]
    bars = plt.bar(x=data.word, height=data.number, color=colors, alpha=0.5)
The data passed to the function contains two columns, where the first lists all the words on your xticks, and the second lists the corresponding numbers. The target is your expected word.
The code determines the color for each word according to whether it matches your target. For example,
plot_color_label_before(data, 'c')
2. After plotting:
If you want to change the color after calling plt.bar, use set_color on the specific bar.
def plot_color_label_after(data, target):
    bars = plt.bar(x=data.word, height=data.number, color='blue', alpha=0.5)
    for idx, word in enumerate(data.word):
        if word == target:
            bars[idx].set_color(c='yellow')
plt.bar returns a BarContainer, the i-th element of which is a patch (rectangle). The loop iterates over all the labels and changes the color of the bar whose word matches the target.
For example,
plot_color_label_after(data, 'c')
Finally, as for iterating over axs, flattening the array with ravel solves the problem.
fig, axs = plt.subplots(2, 4)
for ax in axs.ravel():
    ax.plot(...)
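Putting both parts together for the setup in the question, a rough sketch could look like the following. The small DataFrame is a hypothetical stand-in for the asker's data (only the column names are taken from the question), and count_id is rewritten here to draw with ax.bar so that the colors can be chosen per bar. Note that the ID comes first and the axes second when calling it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the DataFrame used in the question.
df = pd.DataFrame({
    'ID': ['byte'] * 4 + ['write'] * 5,
    'EXPECTED_TEXT': ['byte'] * 4 + ['write'] * 5,
    'TRANSCRIPTION_STRING': ['byte', 'bite', 'byte', 'bight',
                             'right', 'write', 'write', 'rite', 'write'],
})


def count_id(id_val, ax):
    """Bar plot of transcription counts; the expected word is drawn in red."""
    subset = df[df['ID'] == id_val]
    expected = subset['EXPECTED_TEXT'].iloc[0]
    counts = subset['TRANSCRIPTION_STRING'].value_counts()
    colors = ['red' if word == expected else 'blue' for word in counts.index]
    bars = ax.bar(counts.index.to_list(), counts.to_numpy(), color=colors)
    ax.set_title(expected)
    ax.tick_params(axis='x', labelrotation=40)
    return bars


ids = ['byte', 'write']
fig, axs = plt.subplots(1, 2, figsize=(8, 3))
# Pair each Axes with one ID instead of calling count_id once per subplot by hand.
results = {id_val: count_id(id_val, ax) for ax, id_val in zip(axs.flat, ids)}
```

With the real data, ids would be the eight IDs from the question and the grid would be plt.subplots(2, 4).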

How to use a colored shape as yticks in matplotlib or seaborn?

I am working on a task called knowledge tracing which estimates the student mastery level over time. I would like to plot a similar figure as below using the Matplotlib or Seaborn.
It uses different colors, instead of text, to represent each knowledge concept. However, I have googled and found no article that explains how to do this.
I tried the following
# simulate a record of student mastery level
student_mastery = np.random.rand(5, 30)
df = pd.DataFrame(student_mastery)
# plot the heatmap using seaborn
marker = matplotlib.markers.MarkerStyle(marker='o', fillstyle='full')
sns_plot = sns.heatmap(df, cmap="RdYlGn", vmin=0.0, vmax=1.0)
y_limit = 5
y_labels = [marker for i in range(y_limit)]
plt.yticks(range(y_limit), y_labels)
Yet it simply returns the __repr__ of the marker, e.g., <matplotlib.markers.MarkerStyle at 0x1c5bb07860> on the yticks.
Thanks in advance!
While "How can I make the xtick labels of a plot be simple drawings using matplotlib?" gives you a general solution for arbitrary shapes, for the shapes shown here it may make sense to use unicode symbols as text and colorize them according to your needs.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
fig, ax = plt.subplots()
ax.imshow(np.random.rand(3,10), cmap="Greys")
symbolsx = ["⚪", "⚪", "⚫", "⚫", "⚪", "⚫","⚪", "⚫", "⚫","⚪"]
colorsx = np.random.choice(["#3ba1ab", "#b43232", "#8ecc3a", "#893bab"], 10)
ax.set_xticks(range(len(symbolsx)))
ax.set_xticklabels(symbolsx, size=40)
for tick, color in zip(ax.get_xticklabels(), colorsx):
    tick.set_color(color)
symbolsy = ["◾", "◾", "◾"]
ax.set_yticks(range(len(symbolsy)))
ax.set_yticklabels(symbolsy, size=40)
for tick, color in zip(ax.get_yticklabels(), ["crimson", "gold", "indigo"]):
    tick.set_color(color)
plt.show()
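Applied to the 5 x 30 mastery matrix from the question, the same trick might look roughly like this. It is only a sketch: the colors per concept are arbitrary choices, the data is random, and imshow is used here in place of sns.heatmap to keep the example dependent on matplotlib alone.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
student_mastery = rng.random((5, 30))  # simulated mastery levels

# One (arbitrarily chosen) color per knowledge concept.
concept_colors = ['crimson', 'gold', 'indigo', 'teal', 'darkorange']

fig, ax = plt.subplots(figsize=(10, 2.5))
im = ax.imshow(student_mastery, cmap='RdYlGn', vmin=0.0, vmax=1.0, aspect='auto')

# Use a filled circle as the tick label and color each one individually.
ax.set_yticks(range(len(concept_colors)))
ax.set_yticklabels(['●'] * len(concept_colors), size=20)
for tick, color in zip(ax.get_yticklabels(), concept_colors):
    tick.set_color(color)

fig.colorbar(im, ax=ax, label='mastery')
```

Whether the symbol renders depends on the font in use having a glyph for it.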

Colormap lines showing up as the same color.

I am extending this question to figure out how to make each of the lines a different shade of red or black. This might require making a custom color map or somehow tweaking the colormap "RdGy".
Using the data and packages from the last question, this is what I have so far:
df0.groupby([df0.ROI.str.split('_').str[0],'Band']).Mean.plot.kde(colormap='RdGy')
plt.legend()
plt.show()
And the figure looks like this:
But I want the 'bcs' lines to be shades of black and the 'red' lines to be shades of red. How is this possible?
It would also be great to customize the names of the lines in the legend, such as "BCS Band 1", etc., but I'm not sure how to do this either.
In principle, @Robbie's answer to the linked question gives you all the tools needed to create lines of any color and label you want.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame()
nam = ["red", "bcs"]
for i in range(8):
    df['{}_{}'.format(nam[i//4],i%4)] = np.random.normal(i, i%4+1, 100)
nam = {"red":plt.cm.Reds, "bcs": plt.cm.gray_r}
fig, ax = plt.subplots()
for i, s in enumerate(df.columns):
    df[s].plot(kind='density', color=nam[s.split("_")[0]]((i%4+1)/4.),
               label=" Band ".join(s.split("_")))
plt.legend()
plt.show()
Of course you can also just use a list of strings as legend entries, either by supplying them to the label argument of the plotting function,
labels = "This is a legend with some custom entries".split()
for i, s in enumerate(df.columns):
    df[s].plot(kind='density', color=nam[s.split("_")[0]]((i%4+1)/4.),
               label=labels[i])
or by using the labels argument of the legend,
labels = "This is a legend with some custom entries".split()
for i, s in enumerate(df.columns):
    df[s].plot(kind='density', color=nam[s.split("_")[0]]((i%4+1)/4.))
plt.legend(labels=labels)
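The key step in the code above is that a matplotlib colormap is callable: passing it a number between 0 and 1 returns an RGBA color, so sampling it at several values yields several shades. A small sketch of that idea, with a hypothetical helper named shades and made-up curves, could be:

```python
import matplotlib.pyplot as plt
import numpy as np


def shades(cmap, n, lo=0.3, hi=1.0):
    """Return n RGBA colors sampled evenly from cmap between lo and hi."""
    # Starting at lo > 0 avoids the near-white end of sequential colormaps.
    return [cmap(float(v)) for v in np.linspace(lo, hi, n)]


reds = shades(plt.cm.Reds, 4)     # four shades of red
greys = shades(plt.cm.gray_r, 4)  # four shades from grey to black

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 200)
for group, colors in (('RED', reds), ('BCS', greys)):
    for band, color in enumerate(colors, start=1):
        # Made-up bell curves, one per band, just to show the colors.
        ax.plot(x, np.exp(-(x - band) ** 2), color=color,
                label=f'{group} Band {band}')
ax.legend()
```

The lo and hi bounds are a matter of taste; narrowing the range gives shades that are closer together.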

automatically position text box in matplotlib

Is there a way of telling pyplot.text() a location like you can with pyplot.legend()?
Something like the legend argument would be excellent:
plt.legend(loc="upper left")
I am trying to label subplots with different axes using letters (e.g. "A","B"). I figure there's got to be a better way than manually estimating the position.
Thanks
Just use annotate and specify axis coordinates. For example, "upper left" would be:
plt.annotate('Something', xy=(0.05, 0.95), xycoords='axes fraction')
You could also get fancier and specify a constant offset in points:
plt.annotate('Something', xy=(0, 1), xytext=(12, -12), va='top',
             xycoords='axes fraction', textcoords='offset points')
For more explanation see the examples here and the more detailed examples here.
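For the use case in the question (labeling subplots "A", "B", ...), the annotate approach can be wrapped in a loop over the axes. This is a minimal sketch; the 2 x 2 grid, the 6-point offset and the bold font are arbitrary choices.

```python
import string

import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)
for letter, ax in zip(string.ascii_uppercase, axs.flat):
    # Anchor at the upper-left corner of each Axes, then shift by a fixed
    # number of points so the label looks the same whatever the data limits are.
    ax.annotate(letter, xy=(0, 1), xycoords='axes fraction',
                xytext=(6, -6), textcoords='offset points',
                ha='left', va='top', fontweight='bold')
```

Because the position is given in axes fraction plus an offset in points, it does not need to be re-estimated for subplots with different axis ranges.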
I'm not sure if this was available when I originally posted the question, but a loc parameter can now be used via AnchoredText. Below is an example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnchoredText
# make some data
x = np.arange(10)
y = x
# set up figure and axes
f, ax = plt.subplots(1,1)
# loc works the same as it does with legends (though 'best' doesn't work)
# pad=5 will increase the size of padding between the border and text
# borderpad=5 will increase the distance between the border and the axes
# frameon=False will remove the box around the text
anchored_text = AnchoredText("Test", loc=2)
ax.plot(x,y)
ax.add_artist(anchored_text)
plt.show()
The question is quite old, but as there is no general solution to the problem so far (2019) according to "Add loc=best kwarg to pyplot.text()", I'm using legend() and the following workaround to obtain auto-placement for simple text boxes:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpl_patches
x = np.linspace(-1,1)
fig, ax = plt.subplots()
ax.plot(x, x*x)
# create a list with two empty handles (or more if needed)
handles = [mpl_patches.Rectangle((0, 0), 1, 1, fc="white", ec="white",
                                 lw=0, alpha=0)] * 2
# create the corresponding number of labels (= the text you want to display)
labels = []
labels.append("pi = {0:.4g}".format(np.pi))
labels.append("root(2) = {0:.4g}".format(np.sqrt(2)))
# create the legend, suppressing the blank space of the empty line symbol and the
# padding between symbol and label by setting handlelength and handletextpad
ax.legend(handles, labels, loc='best', fontsize='small',
          fancybox=True, framealpha=0.7,
          handlelength=0, handletextpad=0)
plt.show()
The general idea is to create a legend with a blank line symbol and to remove the resulting empty space afterwards. "How to adjust the size of matplotlib legend box?" helped me with the legend formatting.
