Seaborn Heatmap: underline text in a cell - python

I am doing some data analysis in Python, and I am using Seaborn for visualization.
Seaborn works very well for creating heatmaps.
I am trying to underline the maximum value in each column of my heatmap.
I was able to highlight the text in the maximum cells by making it italic and bold. Still, I found no way to underline it.
This is an example of my code:
data_matrix = < extract my data and put them into a matrix >
max_in_each_column = np.max(data_matrix, axis=0)
sns.heatmap(data_matrix,
            mask=data_matrix == max_in_each_column,
            linewidth=0.5,
            annot=True,
            xticklabels=my_x_tick_labels,
            yticklabels=my_y_tick_labels,
            cmap="coolwarm_r")
sns.heatmap(data_matrix,
            mask=data_matrix != max_in_each_column,
            annot_kws={"style": "italic", "weight": "bold"},
            linewidth=0.5,
            annot=True,
            xticklabels=my_x_tick_labels,
            yticklabels=my_y_tick_labels,
            cbar=False,
            cmap="coolwarm_r")
This is my current result:
Of course I have tried using the argument annot_kws={"style": "underlined"}, but apparently in Seaborn the "style" key only supports the values "normal", "italic" or "oblique".
Is there a workaround to this?

Yes, you can work around your problem by using TeX commands within your text. The basic idea is that you use the annot key of seaborn.heatmap to assign an array of strings as text labels. These contain your data values plus some TeX prefixes/suffixes that let TeX render them bold, emphasized (italic), underlined, or whatever else you need.
An example (with random numbers):
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# random data
data_matrix = np.round(np.random.rand(10, 10), decimals=2)
max_in_each_column = np.max(data_matrix, axis=0)
# Activate TeX in all labels globally (requires a working LaTeX installation)
plt.rc('text', usetex=True)
# Adjust font specs as desired (here: closest similarity to seaborn standard)
plt.rc('font', **{'size': 14.0})
plt.rc('text.latex', preamble=r'\usepackage{lmodern}')
# remains unchanged
sns.heatmap(data_matrix,
            mask=data_matrix == max_in_each_column,
            linewidth=0.5,
            annot=True,
            cmap="coolwarm_r")
# changes here
sns.heatmap(data_matrix,
            mask=data_matrix != max_in_each_column,
            linewidth=0.5,
            # Use annot key with np.array as value containing strings of data + latex
            # prefixes/suffixes making the bold/italic/underline formatting
            annot=np.array([r'\textbf{\emph{\underline{' + str(data) + '}}}'
                            for data in data_matrix.ravel()]).reshape(
                np.shape(data_matrix)),
            # fmt key must be empty, formatting error otherwise
            fmt='',
            cbar=False,
            cmap="coolwarm_r")
plt.show()
Further explanation of the annotation array:
# For all matrix elements in your 2D data array (being 2D is why the .ravel() and
# .reshape() calls are needed), construct a 2D array consisting of strings of the form
# \textbf{\emph{\underline{<matrix_element>}}}. TeX renders each string as
# a bold, italic and underlined representation of the matrix element.
np.array([r'\textbf{\emph{\underline{' + str(data) + '}}}'
          for data in data_matrix.ravel()]).reshape(np.shape(data_matrix))
The resulting plot is basically what you wanted:
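If you would rather draw the heatmap in a single call instead of two masked ones, the same idea can be applied per cell. This is only a minimal sketch and not part of the answer above: the helper name underline_column_maxima is hypothetical, and it assumes usetex=True is active so that \underline is understood.

```python
import numpy as np

def underline_column_maxima(data):
    """Return a string array in which each column maximum is wrapped in TeX markup."""
    data = np.asarray(data)
    is_max = data == data.max(axis=0)   # per-column maxima broadcast over the rows
    plain = data.astype(str)
    marked = np.char.add(np.char.add(r'\textbf{\underline{', plain), '}}')
    # Keep the plain number everywhere except at the column maxima
    return np.where(is_max, marked, plain)

labels = underline_column_maxima([[0.1, 0.9], [0.7, 0.2]])
print(labels)
```

The resulting array could then be passed as annot=labels together with fmt='' to a single sns.heatmap call.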

Related

How to change axis values to italic format (Python, Matplotlib)

Currently I'm working on a university bioinformatics project. I have a plot with an axis whose values should be in italic format (Proteobacteria, Bacteroidata, etc. should be in italic), but I can't find any way to change the format of ONLY one axis's values. I have found that the plt.rcParams.update() function can help, but it changes the whole plot to italic format, and as I said I need to change only one axis.
My code:
color_map = ['#dfdfdf' for i in range(len(order_count))]
color_map[0] = '#e81518'
fig, ax = plt.subplots(1,1, figsize=(13,7))
ax.barh(phyl_count['Phylum'], phyl_count['Phyl_Perc'], linewidth=0.6, color = color_map)
plt.gca().invert_yaxis()
ax.grid(axis='x', alpha=0.4)
plt.xlabel('Percent')
plt.ylabel('Phylum')
for index,data in enumerate(phyl_count['Phyl_Perc']):
    plt.text(x=data+0.1, y=index+0.20,s=f'{data}%')
plt.show()
And I get this graph. How can I change Proteobacteria, Bacteroidata, etc. to italic?
You can obtain a list of the labels by writing
labels = ax.get_yticklabels()
This gives you a list of the labels, where each label is an instance of Matplotlib's Text class. It has a set_style method, and one of its options is 'italic', so you can do
for lbl in labels:
    lbl.set_style('italic')
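For reference, here is a self-contained sketch of the same idea using plt.setp, which sets a property on every Text object in the list at once. The phylum names and percentages are hypothetical stand-ins for phyl_count['Phylum'] and phyl_count['Phyl_Perc'], and the Agg backend is chosen only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in data for the phyl_count columns used in the question
phyla = ["Proteobacteria", "Bacteroidota", "Firmicutes"]
percents = [48.5, 31.0, 20.5]

fig, ax = plt.subplots(figsize=(13, 7))
ax.barh(phyla, percents, color=["#e81518", "#dfdfdf", "#dfdfdf"])
ax.invert_yaxis()
ax.set_xlabel("Percent")
ax.set_ylabel("Phylum")

# Italicise only the y tick labels; the x axis and the axis titles keep the default style
plt.setp(ax.get_yticklabels(), fontstyle="italic")
```

Because only the y tick labels are touched, nothing global such as rcParams has to change.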

How to plot two case1.hdf5 and case2.hdf5 files in matplotlib. Seeking help to correct the script

I have the script below, which only plots the case1.hdf5 file.
I want to plot another file, case2.hdf5, in the same script so that I
get two overlapping plots.
Additionally, I want to:
Use Times New Roman fonts for labels and titles.
Insert legends for both plots.
Multiply the Y-axis data by some constant number.
This script draws the bottom three lines in the same colour, but I want all
three in different solid colours for case1.hdf5, and in the same
colours but dashed for the other file, case2.hdf5.
My script is here:
import h5py
import matplotlib.pyplot as plt
import warnings
import matplotlib
warnings.filterwarnings("ignore") # Ignore all warnings
ticklabels=[r'$\Gamma$','F','Q','Z',r'$\Gamma$']
params = {
'mathtext.default': 'regular',
'axes.linewidth': 1.2,
'axes.edgecolor': 'Black',
}
plt.rcParams.update(params)
fig, ax = plt.subplots()
f = h5py.File('band.hdf5', 'r')
#print ('datasets are:')
print(list(f.keys()))
dist=f[u'distance']
freq=f[u'frequency']
kpt=f[u'path']
# Iterate over each segment
for i in range(len(dist)):
    # Iteration over each band
    for nbnd in range(len(freq[i][0])):
        x=[]
        y=[]
        for j in range(len(dist[i])):
            x.append(dist[i][j])
            y.append(freq[i][j][nbnd])
        # First 3 bands are red
        if (nbnd<3):
            color='red'
        else:
            color='black'
        ax.plot(x, y, c=color, lw=2.0, alpha=0.8)
# Labels and axis limit and ticks
ax.set_ylabel(r'Frequency (THz)', fontsize=12)
ax.set_xlabel(r'Wave Vector (q)', fontsize=12)
ax.set_xlim([dist[0][0],dist[len(dist)-1][-1]])
xticks=[dist[i][0] for i in range(len(dist))]
xticks.append(dist[len(dist)-1][-1])
ax.set_xticks(xticks)
ax.set_xticklabels(ticklabels)
# Plot grid
ax.grid(which='major', axis='x', c='green', lw=2.5, linestyle='--', alpha=0.8)
# Save to pdf
plt.savefig('plots.pdf', bbox_inches='tight')
You see, there is
# First 3 bands are red
if (nbnd<3):
    color='red'
and instead of red I want each of these three bands in a different solid colour, and for case2.hdf5 dashed lines in the same colours.
1. Colours
It sounds like, in the first instance, you want to map different colours to the first three bands of your data.
One way you might do this is to set up a colourmap and then apply it to those first three bands. Here I have just picked the default matplotlib colormap, but there are loads to choose from, so if the default doesn't work for you I would suggest checking out the post about choosing a colormap. In most use cases you should try to stick to a perceptually uniform map.
2. Legend
This should just be a matter of calling ax.legend(). Be wary, though, of positioning the legend outside the bounds of the plot, as you then need some extra fiddling when saving to PDF, as detailed here.
However, you first need to add some labels to your plot, which in your case you would do inside your ax.plot() calls. I'm not sure what you are plotting, so I can't tell you what labels would be sensible, but you may want something like: ax.plot(... label=f'band {nbnd}' if nbnd < 3 else None).
Notice the inline if. You are likely going to have a whole bunch of black bands that you don't want to label individually, so you probably want to label only the first few and let the rest have label = None, which avoids a bloated legend.
3. Scale Y
If you change the way you iterate through your data, you should be able to treat the h5 dataset as something that behaves much like a numpy array. What I mean by that is you really only need two loops to index the data you want. freq[i, :, nbnd] should be the 1-d array that you want to set y to. You can multiply that 1-d array by some scale value.
4.
import h5py
import matplotlib.pyplot as plt
import warnings
import matplotlib
warnings.filterwarnings("ignore") # Ignore all warnings
# 4-colour 'jet' colormap (not used below; the first three bands take the default cycle colours C0-C2)
cmap = plt.get_cmap('jet', 4)
ticklabels=['A','B','C','D','E']
params = {
'mathtext.default': 'regular',
'axes.linewidth': 1.2,
'axes.edgecolor': 'Black',
'font.family' : 'serif'
}
# Apply a scale to the y axis. I'm just picking an arbitrary number here
scale = 10
offset = 0 # set this to a non-zero value if you want to have your lines offset in a waterfall-style effect
plt.rcParams.update(params)
fig, ax = plt.subplots()
f = h5py.File('band.hdf5', 'r')
#print ('datasets are:')
print(list(f.keys()))
dist=f[u'distance']
freq=f[u'frequency']
kpt=f[u'path']
lbl = {0:'AB', 1:'BC', 2:'CD', 3:'fourth'}
for i, section in enumerate(dist):
    for nbnd, _ in enumerate(freq[i][0]):
        x = section # to_list() you may need to convert sample to list.
        y = (freq[i, :, nbnd] + offset*nbnd) * scale
        if (nbnd<3):
            color=f'C{nbnd}'
        else:
            color='black'
        ax.plot(x, y, c=color, lw=2.0, alpha=0.8, label = lbl[nbnd] if nbnd < 3 and i == 0 else None)
ax.legend()
# Labels and axis limit and ticks
ax.set_ylabel(r'Frequency (THz)', fontsize=12)
ax.set_xlabel(r'Wave Vector (q)', fontsize=12)
ax.set_xlim([dist[0][0],dist[len(dist)-1][-1]])
xticks=[dist[i][0] for i in range(len(dist))]
xticks.append(dist[len(dist)-1][-1])
ax.set_xticks(xticks)
ax.set_xticklabels(ticklabels)
# Plot grid
ax.grid(which='major', axis='x', c='green', lw=2.5, linestyle='--', alpha=0.8)
# Save to pdf
plt.savefig('plots.pdf', bbox_inches='tight')
This script gives me the following image with the data you supplied
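The question also asks for case2.hdf5 to be overlaid with dashed lines in the same colours, which the script above does not cover. One possible way to do that is to move the plotting loop into a function and call it once per file. The sketch below is an assumption-laden illustration rather than a tested solution for your files: plot_bands is a hypothetical helper, the arrays are synthetic stand-ins with the shapes used above, and the font setting falls back to a serif font bundled with Matplotlib if Times New Roman is not installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Prefer Times New Roman, falling back to a serif font that ships with Matplotlib
plt.rcParams.update({"font.family": "serif",
                     "font.serif": ["Times New Roman", "DejaVu Serif"]})

def plot_bands(ax, dist, freq, linestyle, name, scale=1.0):
    """Plot every band of every segment.

    dist has shape (segments, points); freq has shape (segments, points, bands).
    """
    for i in range(dist.shape[0]):
        for nbnd in range(freq.shape[2]):
            color = f"C{nbnd}" if nbnd < 3 else "black"
            # Label each coloured band once (first segment only) to keep the legend short
            label = f"{name} band {nbnd}" if (nbnd < 3 and i == 0) else None
            ax.plot(dist[i], freq[i, :, nbnd] * scale,
                    c=color, ls=linestyle, lw=2.0, alpha=0.8, label=label)

# Synthetic stand-ins; with real files one might instead read, for example,
# h5py.File('case1.hdf5', 'r')['distance'][:] and ['frequency'][:]
rng = np.random.default_rng(0)
dist = np.linspace(0.0, 1.0, 40).reshape(4, 10)
freq_case1 = rng.random((4, 10, 5))
freq_case2 = freq_case1 + 0.1

fig, ax = plt.subplots()
plot_bands(ax, dist, freq_case1, "-", "case1", scale=10)    # solid lines
plot_bands(ax, dist, freq_case2, "--", "case2", scale=10)   # dashed, same colours
ax.legend()
```

Passing the line style and the scale as arguments keeps the two cases consistent, so band n gets the same colour in both files and only the dash pattern differs.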

Plotting a cumulative histogram with exported data in Python

I am trying to plot a cumulative histogram similar to the one shown below. It shows the number of occurrences (y-axis) of the French pronoun “vous” in a text corpus (x-axis) represented from word 0 to 92,633. It was created using a corpus analysis application named TXM. TXM’s plots, however, are not adapted to the specific requirements of my publisher. I would like to produce my own plots by exporting the data to Python. The problem is that the data exported by TXM is a bit puzzling, and I am wondering how it can be used to make plots: it’s a one-column txt file with integers.
Each one of them indicates the position of a “vous” in the text corpus. Word 2620 is one “vous,” word 3376 is another one, etc. One of my attempts with Matplotlib:
from matplotlib import pyplot as plt
pos = [2620,3367,3756,4522,4546,9914,9972,9979,9987,10013,10047,10087,10114,13635,13645,13646,13758,13771,13783,13796,23410,23420,28179,28265,28274,28297,28344,34579,34590,34612,40280,40449,40570,40932,40938,40969,40983,41006,41040,41069,41096,41120,41214,41474,41478,42524,42533,42534,45569,45587,45598,56450,57574,57587]
plt.bar(pos, 1)
plt.show()
But this doesn't come close.
What steps should I follow to complete the plot?
Desired plot:
With matplotlib, you could create the step plot as follows. where='post' means the value changes at every x-position and stays so until the next x-position.
The x-values are the positions in the text, a zero is prepended to let the graph start with zero occurrences. The text-length is appended at the end. The y-values are the numbers 0, 1, 2, ..., where the last value is repeated to draw the last step in full.
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator, StrMethodFormatter
import numpy as np
pos = [2620,3367,3756,4522,4546,9914,9972,9979,9987,10013,10047,10087,10114,13635,13645,13646,13758,13771,13783,13796,23410,23420,28179,28265,28274,28297,28344,34579,34590,34612,40280,40449,40570,40932,40938,40969,40983,41006,41040,41069,41096,41120,41214,41474,41478,42524,42533,42534,45569,45587,45598,56450,57574,57587]
text_len = 92633
cum = np.arange(0, len(pos) + 1)
fig, ax = plt.subplots(figsize=(12, 3))
ax.step([0] + pos + [text_len], np.pad(cum, (0, 1), 'edge'), where='post', label=f'vous {len(pos)}')
ax.xaxis.set_major_locator(MultipleLocator(5000)) # x-ticks every 5000
ax.xaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}')) # use the thousands separator
ax.yaxis.set_major_locator(MultipleLocator(5)) # have a y-tick every 5
ax.grid(True, ls=':') # show a grid with dotted lines
ax.autoscale(enable=True, axis='x', tight=True) # disable padding x-direction
ax.set_xlabel(f'T={text_len:,d}')
ax.set_ylabel('Occurrences')
ax.set_title("Progression of 'vous' in TCN")
plt.legend() # add a legend (uses the label of ax.step)
plt.tight_layout()
plt.show()
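Since the positions come from a one-column text file, the step data can also be built directly from the export. The following is a small sketch of that preparation step; cumulative_steps is a hypothetical helper name, and the in-memory StringIO object merely stands in for the exported file.

```python
import io
import numpy as np

def cumulative_steps(positions, text_len):
    """Return x and y arrays for a where='post' step plot of cumulative occurrences."""
    pos = np.sort(np.asarray(positions))
    # Start at word 0 with zero occurrences and extend the last step to the end of the text
    x = np.concatenate(([0], pos, [text_len]))
    y = np.concatenate((np.arange(len(pos) + 1), [len(pos)]))
    return x, y

# Stand-in for the exported file; with a real export one might call
# np.loadtxt('vous_positions.txt', dtype=int, ndmin=1)  (hypothetical file name)
exported = io.StringIO("2620\n3367\n3756\n4522\n")
pos = np.loadtxt(exported, dtype=int, ndmin=1)

x, y = cumulative_steps(pos, text_len=92633)
print(x)
print(y)
```

The two arrays can be passed straight to ax.step(x, y, where='post').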

highlight a row in seaborn heatmap

Today I'm working on a heatmap inside a function. It's nothing too fancy: the heatmap shows a value for every district in my city and inside the function one of the arguments is district_name.
I want my function to print this same heatmap but highlight the selected district (preferably through bolded text).
My code is something like this:
def print_heatmap(district_name, df2):
    df2=df2[df2.t==7]
    pivot=pd.pivot_table(df2,values='return',index= 'district',columns= 't',aggfunc ='mean')
    sns.heatmap(pivot, annot=True, cmap=sns.cm.rocket_r,fmt='.2%',annot_kws={"size": 10})
So I need to access the ax's values, so I can bold, say, "Macul" if I enter print_heatmap('Macul',df2). Is there any way I can do this?
What I tried was to use mathtext, but for some reason I can't use bolding in this case:
pivot.index=pivot.index.str.replace(district_name,r"$\bf{{{}}}$".format(district_name))
But that brings:
ValueError:
f{macul}$
^
Expected end of text (at char 0), (line:1, col:1)
Thanks
I think it is hard to do this explicitly in seaborn. Instead, you can iterate through the texts in the axes (the annotations) and the tick labels, and set their properties to "highlight" a row.
Here is an example of this approach.
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
fig = plt.figure(figsize = (5,5))
uniform_data = np.random.rand(10, 1)
cmap = mpl.cm.Blues_r
ax = sns.heatmap(uniform_data, annot=True, cmap=cmap)
# iterate through both the labels and the texts in the heatmap (ax.texts)
for lab, annot in zip(ax.get_yticklabels(), ax.texts):
    text = lab.get_text()
    if text == '2': # lets highlight row 2
        # set the properties of the ticklabel
        lab.set_weight('bold')
        lab.set_size(20)
        lab.set_color('purple')
        # set the properties of the heatmap annot
        annot.set_weight('bold')
        annot.set_color('purple')
        annot.set_size(20)
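To select the row by name, as in the question, the loop can be wrapped in a small function. This is a sketch under stated assumptions: highlight_row is a hypothetical helper, it assumes ax.texts holds one annotation per cell in row-major order, and the demo builds an annotated one-column heatmap with plain Matplotlib (with made-up district values) so that it runs without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def highlight_row(ax, row_label, n_cols=1, **text_props):
    """Apply text_props to the y tick label named row_label and to that row's annotations.

    Assumes ax.texts holds one annotation per cell in row-major order.
    """
    tick_labels = ax.get_yticklabels()
    row = [t.get_text() for t in tick_labels].index(row_label)
    tick_labels[row].set(**text_props)
    for annot in list(ax.texts)[row * n_cols:(row + 1) * n_cols]:
        annot.set(**text_props)

# Seaborn-free stand-in for an annotated one-column heatmap (values are made up)
districts = ["Macul", "Providencia", "Santiago"]
values = np.array([[0.12], [0.08], [0.15]])
fig, ax = plt.subplots()
ax.pcolormesh(values, cmap="Blues_r")
ax.set_yticks(np.arange(len(districts)) + 0.5)
ax.set_yticklabels(districts)
ax.invert_yaxis()
for r, v in enumerate(values[:, 0]):
    ax.text(0.5, r + 0.5, f"{v:.2%}", ha="center", va="center")

highlight_row(ax, "Macul", fontweight="bold", color="purple")
```

Inside print_heatmap one could call highlight_row(ax, district_name, fontweight='bold') on the Axes returned by sns.heatmap, provided the row-major assumption holds for your seaborn version.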

Changing color of single characters in matplotlib plot labels

Using matplotlib in python 3.4:
I would like to be able to set the color of single characters in axis labels.
For example, the x-axis labels for a bar plot might be ['100','110','101','111',...], and I would like the first value to be red, and the others black.
Is this possible, is there some way I could format the text strings so that they would be read out in this way? Perhaps there is some handle that can be grabbed at set_xticklabels and modified?
or, is there some library other than matplotlib that could do it?
example code (to give an idea of my idiom):
rlabsC = ['100','110','101','111']
xs = [1,2,3,4]
ys = [0,.5,.25,.25]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.bar(xs,ys)
ax.set_xticks([a+.5 for a in xs])
ax.set_xticklabels(rlabsC, fontsize=16, rotation='vertical')
thanks!
It's going to involve a little work, I think. The problem is that an individual Text object has a single color. A workaround is to split your labels into multiple text objects.
First we write the last two characters of the label. To write the first character we need to know how far below the axis to draw; this is accomplished using transforms and the example found here.
import matplotlib.pyplot as plt
from matplotlib import transforms

rlabsC = ['100','110','101','111']
xs = [1,2,3,4]
ys = [0,.5,.25,.25]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.bar(xs,ys)
ax.set_xticks([a+.5 for a in xs])
ax.tick_params(axis='x', labelbottom=False)  # hide the default tick labels
text_kwargs = dict(rotation='vertical', fontsize=16, va='top', ha='center')
offset = -0.02
for x, label in zip(ax.xaxis.get_ticklocs(), rlabsC):
    first, rest = label[0], label[1:]
    # plot the second and third numbers
    text = ax.text(x, offset, rest, **text_kwargs)
    # determine how far below the axis to place the first number
    text.draw(ax.figure.canvas.get_renderer())
    ex = text.get_window_extent()
    tr = transforms.offset_copy(text.get_transform(), y=-ex.height, units='dots')
    # plot the first number
    ax.text(x, offset, first, transform=tr, color='red', **text_kwargs)
