Highlighting maximum value in a column on a seaborn heatmap - python

I have a seaborn.heatmap plotted from a DataFrame:
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure(facecolor='w', edgecolor='k')
sns.heatmap(collected_data_frame, annot=True, vmax=1.0, cmap='Blues', cbar=False, fmt='.4g')
I would like to create some sort of highlight for a maximum value in each column - it could be a red box around that value, or a red dot plotted next to that value, or the cell could be colored red instead of using Blues. Ideally I'm expecting something like this:
I got the highlight working for DataFrame printing in Jupyter Notebook using tips from this answer:
How can I achieve a similar thing but on a heatmap?

I customized the heatmap example from the official reference, using techniques from other answers on this site. The idea is to add a patch on top of an existing graph. Here I added a frame around the maximum value, but its position is set manually.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import seaborn as sns
sns.set()
# Load the example flights dataset and convert to long-form
flights_long = sns.load_dataset("flights")
# Keyword arguments are required by recent pandas versions
flights = flights_long.pivot(index="month", columns="year", values="passengers")
# Draw a heatmap with the numeric values in each cell
f, ax = plt.subplots(figsize=(9, 6))
ax = sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax)
ax.add_patch(Rectangle((10,6),2,2, fill=False, edgecolor='blue', lw=3))
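Finding the position of the maximum value:
# max() iterates over the column labels, so this is the latest year
ymax = max(flights)  # 1960
# Rectangle needs the column position, not the column label
col_pos = flights.columns.get_loc(ymax)  # 11
xmax = flights[ymax].idxmax()  # 'July'
xpos = flights.index.get_loc(xmax)  # 6
ax.add_patch(Rectangle((col_pos, xpos), 1, 1, fill=False, edgecolor='blue', lw=3))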

Complete solution based on the answer of @r-beginners:
Generate DataFrame:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import seaborn
arr = np.array([[0.9336719 , 0.90119269, 0.90791181, 0.3112451 , 0.56715989,
0.83339874, 0.14571595, 0.36505745, 0.89847367, 0.95317909,
0.16396293, 0.63463356],
[0.93282304, 0.90605976, 0.91276066, 0.30288519, 0.56366228,
0.83032344, 0.14633036, 0.36081791, 0.9041638 , 0.95268572,
0.16803188, 0.63459491],
[0.15215358, 0.4311569 , 0.32324376, 0.51620611, 0.69872915,
0.08811177, 0.80087247, 0.234593 , 0.47973905, 0.21688613,
0.2738223 , 0.38322856],
[0.90406056, 0.89632902, 0.92220635, 0.3022458 , 0.58843012,
0.78159595, 0.17089609, 0.33443782, 0.89997103, 0.93128579,
0.15942313, 0.62644379],
[0.93868063, 0.45617598, 0.17708323, 0.81828266, 0.72986428,
0.82543775, 0.41530088, 0.2604382 , 0.33132295, 0.94686745,
0.05607774, 0.54141198]])
columns_text = [str(num) for num in range(0,12)]
index_text = ['C1', 'C2', 'C3', 'C4', 'C5']
arr_data_frame = pd.DataFrame(arr, columns=columns_text, index=index_text)
Highlighting maximum in a column:
fig,ax = plt.subplots(figsize=(15, 3), facecolor='w', edgecolor='k')
ax = seaborn.heatmap(arr_data_frame, annot=True, vmax=1.0, vmin=0, cmap='Blues', cbar=False, fmt='.4g', ax=ax)
column_max = arr_data_frame.idxmax(axis=0)
for col, variable in enumerate(columns_text):
    # row position of the maximum in this column
    position = arr_data_frame.index.get_loc(column_max[variable])
    ax.add_patch(Rectangle((col, position),1,1, fill=False, edgecolor='red', lw=3))
plt.savefig('max_column_heatmap.png', dpi = 500, bbox_inches='tight')
Highlighting maximum in a row:
fig,ax = plt.subplots(figsize=(15, 3), facecolor='w', edgecolor='k')
ax = seaborn.heatmap(arr_data_frame, annot=True, vmax=1.0, vmin=0, cmap='Blues', cbar=False, fmt='.4g', ax=ax)
row_max = arr_data_frame.idxmax(axis=1)
for row, index in enumerate(index_text):
    # column position of the maximum in this row
    position = arr_data_frame.columns.get_loc(row_max[index])
    ax.add_patch(Rectangle((position, row),1,1, fill=False, edgecolor='red', lw=3))
plt.savefig('max_row_heatmap.png', dpi = 500, bbox_inches='tight')
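The question also mentions coloring the cell itself red instead of drawing a box. One possible sketch, not taken from the answers above, is to draw a second heatmap on the same axes with every non-maximum cell masked out. The random data, the single-color ListedColormap and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap

# Illustrative data (assumption): 5 rows, 6 columns of values in [0, 1)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 6)),
                  index=[f"C{i}" for i in range(1, 6)],
                  columns=[str(i) for i in range(6)])

# Boolean array that is True where a cell holds its column's maximum
values = df.to_numpy()
is_col_max = values == values.max(axis=0)

fig, ax = plt.subplots(figsize=(8, 3))
# Base layer: the regular blue heatmap
sns.heatmap(df, annot=True, fmt=".2g", cmap="Blues", vmin=0, vmax=1,
            cbar=False, ax=ax)
# Overlay: the same data, but only the column maxima are drawn, in red
sns.heatmap(df, mask=~is_col_max, annot=True, fmt=".2g",
            cmap=ListedColormap(["red"]), vmin=0, vmax=1,
            cbar=False, ax=ax)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. For row maxima, the mask would be built with values.max(axis=1, keepdims=True) instead.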

Related

How to customize the location of color bar in Seaborn heatmap?

I have the following code to create a heatmap. However, the color bar overlaps the text on the right axis. The text itself is fine; I want to keep it at that length.
How can I place the colorbar on the right or left side of the heatmap without any overlap?
I tried the "pad" parameter in cbar_kws, but it didn't help.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
PT=pd.DataFrame(np.random.randn(300,3), columns=list('ABC'))
miniPT=PT.iloc[:,:-1]
SMALL_SIZE = 8
MEDIUM_SIZE = 80
BIGGER_SIZE = 120
plt.rc('font', size=MEDIUM_SIZE) # controls default text sizes
plt.rc('axes', titlesize=MEDIUM_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=SMALL_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE) # fontsize of the figure title
plt.figure(figsize=(10, miniPT.shape[0]/5.2))
ax =sns.heatmap(miniPT, annot=False, cmap='RdYlGn')
for _, spine in ax.spines.items():
    spine.set_visible(True)
# second axis
asset_list=np.asarray(PT['C'])
asset_list=asset_list[::-1]
ax3 = ax.twinx()
ax3.set_ylim([0,ax.get_ylim()[1]])
ax3.set_yticks(ax.get_yticks())
ax3.set_yticklabels(asset_list, fontsize=MEDIUM_SIZE*0.6)
# colorbar
cbar = ax.collections[0].colorbar
cbar.ax.tick_params(labelsize=MEDIUM_SIZE)
One way to have matplotlib resolve the overlap automatically is to create the subplots explicitly: one for the heatmap and another for the colorbar. sns.heatmap's cbar_ax= parameter can then point to the colorbar subplot. The gridspec_kw= parameter of plt.subplots is needed to set the relative sizes. At the end, plt.tight_layout() adjusts the padding so that everything fits.
The question's code contains some strange settings (e.g. a fontsize of 80 is immense). Also, 300 rows will inevitably lead to overlapping text (the fontsize needs to be so small that non-overlapping text wouldn't be readable). Here is some more simplified example code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
PT = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'))
fig, (ax, cbar_ax) = plt.subplots(ncols=2, figsize=(10, len(PT) / 5.2), gridspec_kw={'width_ratios': [10, 1]})
sns.heatmap(PT.iloc[:, :-1], annot=False, cmap='RdYlGn', cbar_ax=cbar_ax, ax=ax)
for _, spine in ax.spines.items():
    spine.set_visible(True)
# second axis
asset_list = np.asarray(PT['C'])
ax3 = ax.twinx()
ax3.set_ylim(ax.get_ylim())
ax3.set_yticks(np.arange(len(PT)))
ax3.set_yticklabels(asset_list, fontsize=80)
# colorbar
cbar_ax.tick_params(labelsize=80)
plt.tight_layout()
plt.show()
As the plot is quite large, here only the bottom part is pasted, with a link to the full plot.
This is how it would look with:
- a font size of 80 (font sizes are measured in points, and there are 72 points per inch);
- a figure width of 20 inches (instead of 10);
- 300 rows.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
PT = pd.DataFrame(np.random.randn(300, 3), columns=list('ABC'))
fig, (ax, cbar_ax) = plt.subplots(ncols=2, figsize=(20, len(PT) / 5.2), gridspec_kw={'width_ratios': [15, 1]})
sns.heatmap(PT.iloc[:, :-1], annot=False, cmap='RdYlGn', cbar_ax=cbar_ax, ax=ax)
for _, spine in ax.spines.items():
    spine.set_visible(True)
# second axis
asset_list = np.asarray(PT['C'])
ax3 = ax.twinx()
ax3.set_ylim(ax.get_ylim())
ax3.set_yticks(np.arange(len(PT)))
ax3.set_yticklabels(asset_list, fontsize=80)
# colorbar
cbar_ax.tick_params(labelsize=80)
plt.tight_layout()
plt.show()
In the end, my solution was to move the colorbar to the left side. This is the code and the output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
PT = pd.DataFrame(np.random.randn(300, 3), columns=list('ABC'))
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, len(PT) / 5.2), gridspec_kw={'width_ratios': [15, 15]})
sns.heatmap(PT.iloc[:, :-1], annot=False, cmap='RdYlGn', cbar_ax=ax0, ax=ax1)
for _, spine in ax1.spines.items():
    spine.set_visible(True)
# second axis
asset_list = np.asarray(PT['C'])
ax3 = ax1.twinx()
ax3.set_ylim(ax1.get_ylim())
ax3.set_yticks(np.arange(len(PT)))
ax3.set_yticklabels(asset_list, fontsize=80)
# colorbar
ax0.tick_params(labelsize=80)
plt.tight_layout()
plt.show()
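As a possible alternative to creating a separate subplot for the colorbar, the keywords in cbar_kws are forwarded to matplotlib's Figure.colorbar, which accepts a location argument. The sketch below is not from the answers above; it assumes a reasonably recent matplotlib, and the data size, pad value and font size are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative data (assumption): 30 rows instead of the question's 300
rng = np.random.default_rng(0)
PT = pd.DataFrame(rng.standard_normal((30, 3)), columns=list("ABC"))

fig, ax = plt.subplots(figsize=(6, 8))
# location="left" asks matplotlib to put the colorbar left of the heatmap;
# pad leaves room for the heatmap's own y tick labels
sns.heatmap(PT.iloc[:, :-1], cmap="RdYlGn", ax=ax,
            cbar_kws={"location": "left", "use_gridspec": False, "pad": 0.15})
cbar_ax = ax.collections[0].colorbar.ax

# Second axis carrying the values of column C as labels on the right
ax_right = ax.twinx()
ax_right.set_ylim(ax.get_ylim())
ax_right.set_yticks(np.arange(len(PT)) + 0.5)  # centre each label on its row
ax_right.set_yticklabels([f"{v:.2f}" for v in PT["C"]], fontsize=8)
```

With the colorbar on the left, the long labels on the right axis have nothing to collide with, whatever their length.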

Add different shade colors for trend and forecast , with text on the region

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"y" : np.random.rand(20)})
ax = df.iloc[:15,:].plot(ls="-", color="b")
ax2 = ax.twinx() #Create a twin Axes sharing the xaxis
df.iloc[15:,:].plot(ls="--", color="r", ax=ax)
plt.axhline(y=0.5,linestyle="--",animated=True,label="False Alarm")
plt.show()
So, the first 15 points are the trend and the last 5 are predictions.
I want different background colors for the trend and the predictions.
Also, how can I add the text "Historic" and "Forecast" to the graph?
I believe you're looking for fill_between:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"y" : np.random.rand(20)})
fig, ax = plt.subplots(figsize=(8,6))
df.iloc[:15,:].plot(ls="-", color="b", ax=ax)
plt.fill_between(df.iloc[:15].index.tolist(), df.iloc[:15].y.tolist(), alpha=.25, color='b')
df.iloc[15:,:].plot(ls="--", color="r", ax=ax)
plt.axhline(y=0.5,linestyle="--", animated=True, label="False Alarm")
plt.fill_between(df.iloc[15:].index.tolist(), df.iloc[15:].y.tolist(), alpha=.25, color='r')
plt.legend()
plt.show()
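fill_between shades the area under each curve. If the goal is instead to shade the whole background of each region and to label the regions, a sketch with axvspan and ax.text could look like the following. The split index, the colors and the label positions are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"y": rng.random(20)})
split = 15  # first 15 points are the trend, the rest are predictions

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(df.index[:split], df["y"].iloc[:split], ls="-", color="b", label="trend")
# Start the forecast at the last trend point so the two lines connect
ax.plot(df.index[split - 1:], df["y"].iloc[split - 1:], ls="--", color="r",
        label="forecast")
ax.axhline(y=0.5, linestyle="--", color="gray", label="False Alarm")

# Shade the full height of each region
x_first, x_split, x_last = df.index[0], df.index[split - 1], df.index[-1]
ax.axvspan(x_first, x_split, color="b", alpha=0.1)
ax.axvspan(x_split, x_last, color="r", alpha=0.1)

# x is given in data coordinates, y in axes coordinates (0 = bottom, 1 = top)
trans = ax.get_xaxis_transform()
ax.text((x_first + x_split) / 2, 0.95, "Historic", transform=trans,
        ha="center", va="top")
ax.text((x_split + x_last) / 2, 0.95, "Forecast", transform=trans,
        ha="center", va="top")

ax.set_xlim(x_first, x_last)
ax.legend(loc="lower left")
```

Using get_xaxis_transform keeps the labels near the top of the axes regardless of the y range of the data.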

Plotting multiple seaborn heatmaps with individual color bar

Is it possible to plot multiple seaborn heatmaps into a single figure, with a shared yticklabel, and individual color bars, like the figure below?
What I can do is to plot the heatmaps individually, using the following code:
#Figure 1
plt.figure()
sns.set()
comp = sns.heatmap(df, cmap="coolwarm", linewidths=.5, xticklabels=True, yticklabels=True, cbar_kws={"orientation": "horizontal", "label": "Pathway completeness", "pad": 0.004})
comp.set_xticklabels(comp.get_xticklabels(), rotation=-90)
comp.xaxis.tick_top() # x axis on top
comp.xaxis.set_label_position('top')
cbar = comp.collections[0].colorbar
cbar.set_ticks([0, 50, 100])
cbar.set_ticklabels(['0%', '50%', '100%'])
figure = comp.get_figure()
figure.savefig("hetmap16.png", format='png', bbox_inches='tight')
#Figure 2 (figure 3 is the same, but with a different database)
plt.figure()
sns.set()
df = pd.DataFrame(heatMapFvaMinDictP)
fvaMax = sns.heatmap(df, cmap="rocket_r", linewidths=.5, xticklabels=True, cbar_kws={"orientation": "horizontal", "label": "Minimum average flux", "pad": 0.004})
fvaMax.set_xticklabels(fvaMax.get_xticklabels(), rotation=-90)
fvaMax.xaxis.tick_top() # x axis on top
fvaMax.xaxis.set_label_position('top')
fvaMax.tick_params(axis='y', labelleft=False)
figure = fvaMax.get_figure()
figure.savefig("fva1.png", format='png', bbox_inches='tight')
Seaborn builds upon matplotlib, which can be used for further customizing plots. plt.subplots(ncols=3, sharey=True, ...) creates three subplots with a shared y-axis. Adding ax=ax1 to sns.heatmap(..., ax=...) creates the heatmap on the desired subplot. Note that the return value of sns.heatmap is again that same ax.
The following code shows an example. vmin and vmax are explicitly set for the first heatmap to make sure that both values will appear in the colorbar (the default colorbar runs between the minimum and maximum of the encountered values).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, sharey=True, figsize=(20, 8))
N = 20
labels = [''.join(np.random.choice(list('abcdefghi '), 40)) for _ in range(N)]
df = pd.DataFrame({'column 1': np.random.uniform(0, 100, N), 'column 2': np.random.uniform(0, 100, N)},
index=labels)
sns.heatmap(df, cmap="coolwarm", linewidths=.5, xticklabels=True, yticklabels=True, ax=ax1, vmin=0, vmax=100,
cbar_kws={"orientation": "horizontal", "label": "Pathway completeness", "pad": 0.004})
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=-90)
ax1.xaxis.tick_top() # x axis on top
ax1.xaxis.set_label_position('top')
cbar = ax1.collections[0].colorbar
cbar.set_ticks([0, 50, 100])
cbar.set_ticklabels(['0%', '50%', '100%'])
for ax in (ax2, ax3):
    max_value = 10 if ax == ax2 else 1000
    df = pd.DataFrame({'column 1': np.random.uniform(0, max_value, N), 'column 2': np.random.uniform(0, max_value, N)},
                      index=labels)
    sns.heatmap(df, cmap="rocket_r", linewidths=.5, xticklabels=True, ax=ax,
                cbar_kws={"orientation": "horizontal", "pad": 0.004,
                          "label": ("Minimum" if ax == ax2 else "Minimum") + " average flux"})
    ax.set_xticklabels(ax.get_xticklabels(), rotation=-90)
    ax.xaxis.tick_top() # x axis on top
    ax.xaxis.set_label_position('top')
plt.tight_layout()
fig.savefig("subplots.png", format='png', bbox_inches='tight')
plt.show()
You can concatenate the two dataframes and use FacetGrid with FacetGrid.map_dataframe, although you might need to adjust the aesthetics a bit. I don't have your data, so I tried it with example data:
import pandas as pd
import numpy as np
import seaborn as sns
np.random.seed(111)
df1 = pd.DataFrame({'A':np.random.randn(15),'B':np.random.randn(15)},
index=['row_variable'+str(i+1) for i in range(15)])
df2 = pd.DataFrame({'A':np.random.randn(15),'B':np.random.randn(15)},
index=['row_variable'+str(i+1) for i in range(15)])
We annotate the dataframes with a column indicating the database, like you have, and also set up a dictionary with the color scheme for each dataframe:
df1['database'] = "database1"
df2['database'] = "database2"
dat = pd.concat([df1,df2])
cdict = {'database1':'rocket_r','database2':'coolwarm'}
And define a function to draw the heatmap:
def heat(data, color):
    # pick the colormap from the database label of this facet
    sns.heatmap(data[['A','B']], cmap=cdict[data['database'].iloc[0]],
                cbar_kws={"orientation": "horizontal"})
Then facet:
fg = sns.FacetGrid(data=dat, col='database',aspect=0.7,height=4)
fg.map_dataframe(heat)

Seaborn scatterplot legend showing true values and normalized continuous color

I have a dataframe that I'd like to use to build a scatterplot where different points have different colors:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
dat=pd.DataFrame(np.random.rand(20, 2), columns=['x','y'])
dat['c']=np.random.randint(0,100,20)
dat['c_norm']=(dat['c']-dat['c'].min())/(dat['c'].max()-dat['c'].min())
dat['group']=np.append(np.repeat('high',10), np.repeat('low',10))
As you can see, the column c_norm holds the c column normalized to the range 0 to 1. I would like to show a continuous legend whose color range reflects the normalized values but whose labels use the original c values: say, the minimum (1), the maximum (86), and the median (49). I also want different markers depending on group.
So far I was able to do this:
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
for row in dat.index:
    if(dat.loc[row,'group']=='low'):
        i_marker='.'
    else:
        i_marker='x'
    ax.scatter(
        x=dat.loc[row,'x'],
        y=dat.loc[row,'y'],
        s=50, alpha=0.5,
        marker=i_marker
    )
ax.legend(dat['c_norm'], loc='center right', bbox_to_anchor=(1.5, 0.5), ncol=1)
Questions:
- How to generate a continuous legend based on the values?
- How to adapt its ticks to show the original ticks in c, or at least a min, max, and mean or median?
Thanks in advance
Partial answer. Do you actually need to determine your marker colors based on the normed values? See the output of the snippet below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dat = pd.DataFrame(np.random.rand(20, 2), columns=['x', 'y'])
dat['c'] = np.random.randint(0, 100, 20)
dat['c_norm'] = (dat['c'] - dat['c'].min()) / (dat['c'].max() - dat['c'].min())
dat['group'] = np.append(np.repeat('high', 10), np.repeat('low', 10))
fig, (ax, bx) = plt.subplots(nrows=1, ncols=2, num=0, figsize=(16, 8))
mask = dat['group'] == 'low'
scat = ax.scatter(dat['x'][mask], dat['y'][mask], s=50, c=dat['c'][mask],
marker='s', vmin=np.amin(dat['c']), vmax=np.amax(dat['c']),
cmap='plasma')
ax.scatter(dat['x'][~mask], dat['y'][~mask], s=50, c=dat['c'][~mask],
marker='X', vmin=np.amin(dat['c']), vmax=np.amax(dat['c']),
cmap='plasma')
cbar = fig.colorbar(scat, ax=ax)
scat = bx.scatter(dat['x'][mask], dat['y'][mask], s=50, c=dat['c_norm'][mask],
marker='s', vmin=np.amin(dat['c_norm']),
vmax=np.amax(dat['c_norm']), cmap='plasma')
bx.scatter(dat['x'][~mask], dat['y'][~mask], s=50, c=dat['c_norm'][~mask],
marker='X', vmin=np.amin(dat['c_norm']),
vmax=np.amax(dat['c_norm']), cmap='plasma')
cbar2 = fig.colorbar(scat, ax=bx)
plt.show()
You could definitely modify the second colorbar so that it matches the first one, but is that necessary?
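If the colors do have to come from the normalized column, one way to relabel the colorbar is to place ticks at the normalized positions of the minimum, median and maximum and label them with the original c values. This is a sketch under the question's setup; the seed, the marker choices and the tick selection are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
dat = pd.DataFrame(rng.random((20, 2)), columns=["x", "y"])
dat["c"] = rng.integers(0, 100, 20)
c_min, c_max = dat["c"].min(), dat["c"].max()
dat["c_norm"] = (dat["c"] - c_min) / (c_max - c_min)
dat["group"] = np.repeat(["high", "low"], 10)

fig, ax = plt.subplots(figsize=(8, 8))
# One scatter call per group so that each group gets its own marker;
# both calls share vmin, vmax and cmap, so a single colorbar covers them
for group, marker in {"low": "s", "high": "X"}.items():
    sub = dat[dat["group"] == group]
    scat = ax.scatter(sub["x"], sub["y"], s=50, c=sub["c_norm"], marker=marker,
                      vmin=0, vmax=1, cmap="plasma", label=group)

# Original values to display, and where they fall on the normalized scale
c_ticks = np.array([c_min, dat["c"].median(), c_max], dtype=float)
tick_positions = (c_ticks - c_min) / (c_max - c_min)

cbar = fig.colorbar(scat, ax=ax)
cbar.set_ticks(tick_positions)
cbar.set_ticklabels([f"{v:g}" for v in c_ticks])
cbar.set_label("c (original scale)")
ax.legend(title="group")
```

Because the normalization is linear, the result looks the same as coloring by c directly, which is the point the partial answer raises.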

How do I plot this using seaborn?

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
rankings_by_age = star_wars.groupby("Age").agg(np.mean).iloc[:,8:]
age_first = rankings_by_age.iloc[0, :].values
age_second = rankings_by_age.iloc[1, :].values
age_third = rankings_by_age.iloc[2, :].values
age_fourth = rankings_by_age.iloc[3, :].values
fig, ax = plt.subplots(figsize=(12, 9))
ind = np.arange(6)
width = 0.2
rects_1 = ax.bar(ind, age_first, width, color=(114/255,158/255,206/255),
alpha=.8)
rects_2 = ax.bar(ind+width, age_second, width, color=
(255/255,158/255,74/255), alpha=.8)
rects_3 = ax.bar(ind+2*width, age_third, width, color=
(103/255,191/255,92/255), alpha=.8)
rects_4 = ax.bar(ind+3*width, age_fourth, width, color=
(237/255,102/255,93/255), alpha=.8)
ax.set_title("Star Wars Film Rankings by Age")
ax.set_ylabel("Ranking")
ax.set_xticks(ind)
ax.set_xticklabels(titles, rotation=45)
ax.tick_params(top=False, right=False, left=False, bottom=False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend((rects_1[0], rects_2[0], rects_3[0], rects_4[0]),
          ('18-29', '30-44', '45-60', '> 60'), title="Age")
plt.show()
I want to replicate this plot using seaborn, but I am not sure how to go about plotting multiple bars for each category. I understand how to do it using one age group at a time, but getting more than one bar per age group seems tricky. Any help would be appreciated.
Quoting the seaborn bar plot documentation, you can use the hue argument to determine which column of the dataframe the bars should be grouped by.
import seaborn as sns  # seaborn.apionly was removed in later seaborn releases
import matplotlib.pyplot as plt
df = sns.load_dataset("tips")
ax = sns.barplot(data=df, x="day", y="total_bill", hue="sex")
plt.show()
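To use hue with data shaped like rankings_by_age (one row per age group, one column per film), the table first has to be reshaped to long form. The sketch below uses made-up numbers and film names as a stand-in for the star_wars data, which is not available here; the column names Film and Ranking are also assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for rankings_by_age: 4 age groups x 6 films
rng = np.random.default_rng(0)
ages = ["18-29", "30-44", "45-60", "> 60"]
films = [f"Episode {i}" for i in range(1, 7)]
rankings_by_age = pd.DataFrame(rng.uniform(1, 6, size=(4, 6)),
                               index=pd.Index(ages, name="Age"),
                               columns=films)

# Wide to long: one row per (age group, film) pair
long_form = rankings_by_age.reset_index().melt(
    id_vars="Age", var_name="Film", value_name="Ranking")

fig, ax = plt.subplots(figsize=(12, 9))
# hue="Age" draws one bar per age group within each film
sns.barplot(data=long_form, x="Film", y="Ranking", hue="Age", ax=ax)
ax.set_title("Star Wars Film Rankings by Age")
ax.tick_params(axis="x", rotation=45)
```

Seaborn builds the legend from the hue column, so the manual ax.legend call from the matplotlib version is no longer needed.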
