Heatmap with circles indicating size of population

I would like to produce a heatmap in Python, similar to the one shown, where the size of the circle indicates the size of the sample in that cell. I looked in seaborn's gallery and couldn't find anything, and I don't think I can do this with matplotlib.

It's the inverse. While matplotlib can do pretty much everything, seaborn only provides a small subset of options.
So using matplotlib, you can plot a PatchCollection of circles as shown below.
Note: You could equally use a scatter plot, but since scatter marker sizes are given in absolute units (points squared) rather than data units, it would be rather hard to scale them to the grid.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
N = 10
M = 11
ylabels = ["".join(np.random.choice(list("PQRSTUVXYZ"), size=7)) for _ in range(N)]
xlabels = ["".join(np.random.choice(list("ABCDE"), size=3)) for _ in range(M)]
x, y = np.meshgrid(np.arange(M), np.arange(N))
s = np.random.randint(0, 180, size=(N,M))
c = np.random.rand(N, M)-0.5
fig, ax = plt.subplots()
R = s/s.max()/2
circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]
col = PatchCollection(circles, array=c.flatten(), cmap="RdYlGn")
ax.add_collection(col)
ax.set(xticks=np.arange(M), yticks=np.arange(N),
xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(M+1)-0.5, minor=True)
ax.set_yticks(np.arange(N+1)-0.5, minor=True)
ax.grid(which='minor')
fig.colorbar(col)
plt.show()
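To illustrate why the scatter route is awkward: scatter takes its s argument as an area in points squared, so a size given in data units has to be converted through the current axes transform. A minimal sketch, assuming fixed figure size and axis limits; the helper name data_diameter_to_s is hypothetical, and the result goes stale as soon as the figure is resized or the limits change:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def data_diameter_to_s(ax, diameter):
    """Return a scatter `s` value (points^2) for a circle whose
    diameter spans `diameter` data units along the x axis."""
    fig = ax.figure
    # Pixel positions of two points that are `diameter` apart in data space.
    x0 = ax.transData.transform((0.0, 0.0))[0]
    x1 = ax.transData.transform((diameter, 0.0))[0]
    pixels = abs(x1 - x0)
    # One point is 1/72 inch; fig.dpi is pixels per inch.
    points = pixels * 72.0 / fig.dpi
    return points ** 2


fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlim(-0.5, 10.5)
ax.set_ylim(-0.5, 10.5)
ax.set_aspect("equal")
fig.canvas.draw()  # let matplotlib settle the axes box before measuring

s_full = data_diameter_to_s(ax, 1.0)  # circle roughly filling one grid cell
s_half = data_diameter_to_s(ax, 0.5)  # circle with half that diameter
ax.scatter([0, 1], [0, 1], s=[s_full, s_half])
```

Halving the diameter divides s by four, since s is an area. The PatchCollection approach avoids this bookkeeping because the circle radii are already in data units.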

Here's a possible solution using Bokeh:
import pandas as pd
from bokeh.palettes import RdBu
from bokeh.models import LinearColorMapper, ColumnDataSource, ColorBar
from bokeh.models.ranges import FactorRange
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import numpy as np
output_notebook()
d = dict(x = ['A','A','A', 'B','B','B','C','C','C','D','D','D'],
y = ['B','C','D', 'A','C','D','B','D','A','A','B','C'],
corr = np.random.uniform(low=-1, high=1, size=(12,)).tolist())
df = pd.DataFrame(d)
# Add a column for the marker size, proportional to the absolute correlation
df['size'] = df['corr'].abs() * 50
colors = list(reversed(RdBu[9]))
exp_cmap = LinearColorMapper(palette=colors,
low = -1,
high = 1)
# Note: Bokeh 3.x renamed plot_width/plot_height to width/height
p = figure(x_range = FactorRange(), y_range = FactorRange(), plot_width=700,
           plot_height=450, title="Correlation",
           toolbar_location=None, tools="hover")
p.scatter("x","y",source=df, fill_alpha=1, line_width=0, size="size",
fill_color={"field":"corr", "transform":exp_cmap})
p.x_range.factors = sorted(df['x'].unique().tolist())
p.y_range.factors = sorted(df['y'].unique().tolist(), reverse = True)
p.xaxis.axis_label = 'Values'
p.yaxis.axis_label = 'Values'
bar = ColorBar(color_mapper=exp_cmap, location=(0,0))
p.add_layout(bar, "right")
show(p)

One option is to use matplotlib's scatter plot together with legends and a grid. The s argument sets the size of each circle and the c argument sets its color. To make it look like a heatmap, place the X and Y values on a regular grid so that the circles line up in rows and columns. This is an example from the matplotlib gallery (scatter plots with a legend):
import numpy as np
import matplotlib.pyplot as plt
volume = np.random.rayleigh(27, size=40)
amount = np.random.poisson(10, size=40)
ranking = np.random.normal(size=40)
price = np.random.uniform(1, 10, size=40)
fig, ax = plt.subplots()
# Because the price is much too small when being provided as size for ``s``,
# we normalize it to some useful point sizes, s=0.3*(price*3)**2
scatter = ax.scatter(volume, amount, c=ranking, s=0.3*(price*3)**2,
vmin=-3, vmax=3, cmap="Spectral")
# Produce a legend for the ranking (colors). Even though there are 40 different
# rankings, we only want to show 5 of them in the legend.
legend1 = ax.legend(*scatter.legend_elements(num=5),
loc="upper left", title="Ranking")
ax.add_artist(legend1)
# Produce a legend for the price (sizes). Because we want to show the prices
# in dollars, we use the *func* argument to supply the inverse of the function
# used to calculate the sizes from above. The *fmt* ensures to show the price
# in dollars. Note how we target at 5 elements here, but obtain only 4 in the
# created legend due to the automatic round prices that are chosen for us.
kw = dict(prop="sizes", num=5, color=scatter.cmap(0.7), fmt="$ {x:.2f}",
func=lambda s: np.sqrt(s/.3)/3)
legend2 = ax.legend(*scatter.legend_elements(**kw),
loc="lower right", title="Price")
plt.show()
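To make the circles sit on a grid as suggested above, one way is to take the X and Y positions from np.meshgrid. A minimal sketch with made-up random data; the factor 300 used to scale s is an arbitrary assumption that would need tuning to the figure size:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 8
x, y = np.meshgrid(np.arange(M), np.arange(N))   # integer cell centres
counts = rng.integers(1, 180, size=(N, M))       # made-up sample sizes
values = rng.random((N, M)) - 0.5                # made-up cell values

fig, ax = plt.subplots()
sc = ax.scatter(x.ravel(), y.ravel(),
                s=300 * counts.ravel() / counts.max(),  # area in points^2
                c=values.ravel(), cmap="RdYlGn")
# Major ticks on the cell centres, minor ticks on the cell borders.
ax.set_xticks(np.arange(M))
ax.set_yticks(np.arange(N))
ax.set_xticks(np.arange(M + 1) - 0.5, minor=True)
ax.set_yticks(np.arange(N + 1) - 0.5, minor=True)
ax.grid(which="minor")
fig.colorbar(sc)
```

The circles keep their size in points when the figure is resized, so they can overlap or shrink relative to the cells.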

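I don't have enough reputation to comment on Delenges' excellent answer, so I'll leave my comment as an answer instead:
R.flat only lines up with x.flat and y.flat when R has the same (N, M) shape as the meshgrid arrays. If your size array is stored the other way round, with shape (M, N), the circles assignment should index it explicitly:
circles = [plt.Circle((j,i), radius=R[j][i]) for j, i in zip(x.flat, y.flat)]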
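Whether the explicit indexing is needed depends on the shape of R. A quick check, using a stand-in array in place of the real radii (the names R_t, aligned, explicit and flat_mismatch are only for illustration):

```python
import numpy as np

N, M = 10, 11
x, y = np.meshgrid(np.arange(M), np.arange(N))  # both have shape (N, M)
R = np.arange(N * M).reshape(N, M)              # stand-in for the radii

# Case 1: R has shape (N, M), so the flat iterators are already aligned.
aligned = all(r == R[i, j] for r, j, i in zip(R.flat, x.flat, y.flat))

# Case 2: the same data stored transposed, with shape (M, N).
R_t = R.T
# Explicit indexing R_t[j][i] picks the right value for cell (row i, col j) ...
explicit = all(R_t[j][i] == R[i, j] for j, i in zip(x.flat, y.flat))
# ... whereas iterating R_t.flat pairs values with the wrong cells.
flat_mismatch = any(r != R[i, j] for r, j, i in zip(R_t.flat, x.flat, y.flat))

print(aligned, explicit, flat_mismatch)
```

So with the shapes used in the answer above, (N, M) throughout, zip(R.flat, x.flat, y.flat) is fine, and R[j][i] is the right form only for a transposed array.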

Here is a simple example that plots a circle heatmap using psynlig:
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine as load_data
from psynlig import plot_correlation_heatmap
# On matplotlib >= 3.6 this style is named 'seaborn-v0_8-talk'
plt.style.use('seaborn-talk')
data_set = load_data()
data = pd.DataFrame(data_set['data'], columns=data_set['feature_names'])
#data = df_corr_selected
kwargs = {
'heatmap': {
'vmin': -1,
'vmax': 1,
'cmap': 'viridis',
},
'figure': {
'figsize': (14, 10),
},
}
plot_correlation_heatmap(data, bubble=True, annotate=False, **kwargs)
plt.show()

Related

How to make matplotlib's sequential colormaps "pop" more?

I want to use matplotlib's single-color colormaps (e.g. Blues), but I want the color to "pop" more. I'm not sure what the technical term is for this - higher contrast, increased brightness, something else.
My question: how, in matplotlib, can I make a single-color colormap more vibrant?
There's a toy script and output below. In the output, I want both blue and red to be less dull.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
num_units_per_rf = 1000
# Emulate Gaussian DF
gaussian_place_cell_rf_list = []
gaussian_score_90_by_neuron_list = []
for place_cell_rf in np.arange(start=1, stop=4, step=0.5):
    score_90_by_neuron = np.random.normal(loc=place_cell_rf, size=num_units_per_rf)
    gaussian_score_90_by_neuron_list.append(score_90_by_neuron)
    gaussian_place_cell_rf_list.append(np.full(fill_value=place_cell_rf, shape=num_units_per_rf))
gaussian_df = pd.DataFrame({
'place_cell_rf': np.concatenate(gaussian_place_cell_rf_list),
'score_90_by_neuron': np.concatenate(gaussian_score_90_by_neuron_list),
})
noise_place_cell_rf_list = []
noise_score_90_by_neuron_list = []
for place_cell_rf in np.arange(start=1, stop=4, step=0.5):
    score_90_by_neuron = np.random.normal(loc=-place_cell_rf, size=num_units_per_rf)
    noise_score_90_by_neuron_list.append(score_90_by_neuron)
    noise_place_cell_rf_list.append(np.full(fill_value=place_cell_rf, shape=num_units_per_rf))
noise_df = pd.DataFrame({
'place_cell_rf': np.concatenate(noise_place_cell_rf_list),
'score_90_by_neuron': np.concatenate(noise_score_90_by_neuron_list),
})
fig, ax = plt.subplots(figsize=(12, 8))
# Plot Gaussians and Noise.
g = sns.kdeplot(
data=gaussian_df,
x='score_90_by_neuron',
common_norm=False, # Ensure each sweep is normalized separately.
cumulative=True,
hue='place_cell_rf',
palette='Reds',
ax=ax)
sns.kdeplot(
data=noise_df,
x='score_90_by_neuron',
common_norm=False, # Ensure each sweep is normalized separately.
cumulative=True,
hue='place_cell_rf',
palette='Blues',
ax=g)
plt.show()
I want the blues to "glow" more, like this subset of the hsv colormap:
I want the reds to "glow" more, like this subset of the hsv colormap:

Contour lines on incomplete seaborn heatmap

I have succeeded in drawing contour lines on a seaborn heatmap using the LineCollection answer to the question linked below:
Contour (iso-z) or threshold lines in seaborn heatmap
However, my dataset is not completely filled; the valid data basically has two borders. Because of this, for each iso line I add, I get three lines. One is at the correct location; the other two are at the top and bottom borders of the heatmap.
The problem can be reproduced by adding a column filled with zeros:
import seaborn as sns
import numpy as np
from matplotlib.collections import LineCollection
flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers")
flights["1965"] = 0
ax = sns.heatmap(flights, annot=True, fmt='d')
def add_iso_line(ax, value, color):
    v = flights.gt(value).diff(axis=1).fillna(False).to_numpy()
    h = flights.gt(value).diff(axis=0).fillna(False).to_numpy()
    try:
        l = np.argwhere(v.T)
        vlines = np.array(list(zip(l, np.stack((l[:, 0], l[:, 1] + 1)).T)))
        l = np.argwhere(h.T)
        hlines = np.array(list(zip(l, np.stack((l[:, 0] + 1, l[:, 1])).T)))
        lines = np.vstack((vlines, hlines))
        ax.add_collection(LineCollection(lines, lw=3, colors=color))
    except:
        pass
add_iso_line(ax, 200, 'b')
add_iso_line(ax, 400, 'y')
Is there a way to adjust the code, such that it only plots the correct iso line?

How to add median value labels to a Seaborn boxplot using the hue argument

In addition to the solution posted in this link, I would also like to use the hue parameter and add the median values to each of the boxes.
The current code:
testPlot = sns.boxplot(x='Pclass', y='Age', hue='Sex', data=trainData)
m1 = trainData.groupby(['Pclass', 'Sex'])['Age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
p1 = range(len(m1))
for tick, label in zip(p1, testPlot.get_xticklabels()):
    print(testPlot.text(p1[tick], m1[tick] + 1, mL1[tick]))
This gives an output like:
I'm working on the Titanic dataset, which can be found in this link.
I'm getting the required values, but only when I use a print statement. How do I include them in my plot?

Place the labels manually: loop over all x tick labels and, for each category, offset the text horizontally according to the hue level and the width of the boxes:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
trainData = pd.read_csv('titanic.csv')
testPlot = sns.boxplot(x='pclass', y='age', hue='sex', data=trainData)
m1 = trainData.groupby(['pclass', 'sex'])['age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
ind = 0
for tick in range(len(testPlot.get_xticklabels())):
    testPlot.text(tick-.2, m1[ind+1]+1, mL1[ind+1], horizontalalignment='center', color='w', weight='semibold')
    testPlot.text(tick+.2, m1[ind]+1, mL1[ind], horizontalalignment='center', color='w', weight='semibold')
    ind += 2
plt.show()
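The index arithmetic in the loop (m1[ind+1] for the left box, m1[ind] for the right one) depends on how groupby orders the medians. groupby sorts its keys, so within each class 'female' comes before 'male', while seaborn, as far as I know, draws the hue levels of a plain string column in order of first appearance. A small sketch with made-up data (not the Titanic file) that shows the groupby order:

```python
import pandas as pd

# Made-up data; 'male' appears first, as it does in the Titanic file.
df = pd.DataFrame({
    "pclass": [1, 1, 1, 1, 2, 2, 2, 2],
    "sex": ["male", "female", "male", "female",
            "male", "female", "male", "female"],
    "age": [40, 30, 50, 20, 35, 25, 45, 15],
})

medians = df.groupby(["pclass", "sex"])["age"].median()
print(medians)

# Flat array as used in the answer: female/male pairs, one pair per class.
m1 = medians.values
```

If your hue levels are drawn in a different order, for example because you pass hue_order, the +1 offsets in the loop have to be swapped accordingly.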

This answer is nearly copied and pasted from here, but adapted to your example code. The linked answer is IMHO a bit misplaced there, because that question is just about labeling a boxplot and not about a boxplot using the hue argument.
I couldn't use your train dataset because it is not available as a Python package, so I used seaborn's Titanic dataset instead, which has nearly the same column names.
#!/usr/bin/env python3
import pandas as pd
import matplotlib
import matplotlib.patheffects as path_effects
import seaborn as sns
def add_median_labels(ax, fmt='.1f'):
    """Credits: https://stackoverflow.com/a/63295846/4865723
    """
    lines = ax.get_lines()
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    for median in lines[4:len(lines):lines_per_box]:
        x, y = (data.mean() for data in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:{fmt}}', ha='center', va='center',
                       fontweight='bold', color='white')
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
df = sns.load_dataset('titanic')
plot = sns.boxplot(x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot)
plot.figure.show()
As an alternative, you can create your boxplot with a figure-level function. In that case you need to pass the axes to add_median_labels().
# imports and add_median_labels() unchanged
df = sns.load_dataset('titanic')
plot = sns.catplot(kind='box', x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot.axes[0][0])
plot.figure.show()
This solution also works with more than two categories in the column used for the hue argument.

Python Matplotlib polar Labeling

I would like to label my polar bar chart so that each label is rotated by a different amount and can be read easily, much like the numbers on a clock face. I know plt.xlabel has a rotation argument, but that applies a single angle to everything. I have many values and would like to avoid having the labels cross the graph.
This is roughly what my graph looks like now, with all labels oriented the same way; I would like something akin to this. I really need this using just matplotlib and pandas if possible. Thanks in advance for the help!
Some example names might be farming, generalists, and food and drink. If these are not rotated correctly, they will overlap the graph and be difficult to read.
from pandas import DataFrame,Series
import pandas as pd
import matplotlib.pylab as plt
from pylab import *
import numpy as np
data = pd.read_csv('/.../data.csv')
data=DataFrame(data)
N = len(data)
data1=DataFrame(data,columns=['X'])
data1=data1.get_values()
plt.figure(figsize=(8,8))
ax = plt.subplot(projection='polar')
plt.xlabel("AAs",fontsize=24)
ax.set_theta_zero_location("N")
bars = ax.bar(theta, data1,width=width, bottom=0.0,color=colours)
I would then like to label the bars according to their names, which I can obtain in a list. However, there are many values, and I would like the names to remain readable.

The very meager beginnings of an answer for you (I was doing something similar, so I just threw a quick hack to go in the right direction):
import numpy
import matplotlib.pyplot as plt

# The number of labels you'd like
N = 5
# Where on the circle they will show up
theta = numpy.linspace(0., 2 * numpy.pi, N + 1, endpoint=True)
theta = theta[1:]
# Create the figure
fig = plt.figure(figsize=(6, 6), facecolor='white', edgecolor=None)
# Create the axis, notice polar=True
ax = plt.subplot2grid((1, 1), (0, 0), polar=True)
# Create white bars so you're really just focusing on the labels
ax.bar(theta, numpy.ones_like(theta), align='center',
       color='white', edgecolor='white')
# Create the text you're looking to add, here I just use numbers from counter = 1 to N
# (rotation is in degrees; t * 100 is only a rough per-label angle)
counter = 1
for t, o in zip(theta, numpy.ones_like(theta)):
    ax.text(t, 1 - .1, counter, horizontalalignment='center',
            verticalalignment='center', rotation=t * 100)
    counter += 1
ax.set_yticklabels([])
ax.set_xticklabels([])
ax.grid(False)
plt.show()
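To take this one step further, the rotation can be derived from the angle itself. A minimal sketch, assuming the default polar orientation (zero at the right, angles increasing counterclockwise); with set_theta_zero_location("N") an offset of 90 degrees would have to be added. The helper label_rotation is hypothetical, and the last two names in the list are made up to fill the circle:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def label_rotation(theta):
    """Rotation in degrees for a label at polar angle `theta` (radians),
    flipped on the left half so the text is never upside down."""
    deg = np.degrees(theta) % 360.0
    if 90.0 < deg < 270.0:
        deg = (deg + 180.0) % 360.0
    return deg


# First three names are from the question; the rest are placeholders.
names = ["farming", "generalists", "food and drink", "retail", "transport"]
theta = np.linspace(0.0, 2 * np.pi, len(names), endpoint=False)

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection="polar")
ax.bar(theta, np.ones_like(theta), width=0.5, alpha=0.3)
for t, name in zip(theta, names):
    # Place each name just outside its bar, rotated along the radius.
    ax.text(t, 1.15, name, rotation=label_rotation(t),
            rotation_mode="anchor", ha="center", va="center")
ax.set_xticklabels([])
ax.set_yticklabels([])
```

Each name then reads along its own radius, and the labels on the left half are flipped by 180 degrees so they are not upside down.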

Discrete colorbar in matplotlib [duplicate]

How does one set the color of a line in matplotlib with scalar values provided at run time using a colormap (say jet)? I tried a couple of different approaches here and I think I'm stumped. values[] is a sorted array of scalars, curves is a set of 1-d arrays, and labels is an array of text strings. All of the arrays have the same length.
fig = plt.figure()
ax = fig.add_subplot(111)
jet = colors.Colormap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    retLine, = ax.plot(line, color=colorVal)
    #retLine.set_color()
    lines.append(retLine)
ax.legend(lines, labels, loc='upper right')
ax.grid()
plt.show()

The error you are receiving is due to how you define jet. You are creating the base class Colormap with the name 'jet', but this is very different from getting the default definition of the 'jet' colormap. This base class should never be created directly, and only the subclasses should be instantiated.
What you've found with your example is a buggy behavior in Matplotlib. There should be a clearer error message generated when this code is run.
This is an updated version of your example:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np
# define some random data that emulates your intended code:
NCURVES = 10
np.random.seed(101)
curves = [np.random.random(20) for i in range(NCURVES)]
values = range(NCURVES)
fig = plt.figure()
ax = fig.add_subplot(111)
# replace the next line
#jet = colors.Colormap('jet')
# with
jet = cm = plt.get_cmap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
print(scalarMap.get_clim())
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    colorText = (
        'color: (%4.2f,%4.2f,%4.2f)'%(colorVal[0],colorVal[1],colorVal[2])
    )
    retLine, = ax.plot(line,
                       color=colorVal,
                       label=colorText)
    lines.append(retLine)
#added this to get the legend to work
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right')
ax.grid()
plt.show()
Using a ScalarMappable is an improvement over the approach presented in my related answer:
creating over 20 unique legend colors using matplotlib
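Since the lines are colored through a ScalarMappable, the same object can also drive a colorbar instead of a long legend. A minimal sketch, assuming a reasonably recent matplotlib and made-up random curves; the variable names are only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

rng = np.random.default_rng(101)
values = np.arange(10)                      # one scalar per curve
curves = [rng.random(20) for _ in values]   # made-up data

# Map the scalar range onto the 'jet' colormap.
norm = Normalize(vmin=values.min(), vmax=values.max())
sm = ScalarMappable(norm=norm, cmap=plt.get_cmap("jet"))

fig, ax = plt.subplots()
for value, curve in zip(values, curves):
    ax.plot(curve, color=sm.to_rgba(value))

# The mappable that colored the lines also defines the colorbar.
cbar = fig.colorbar(sm, ax=ax, label="value")
ax.grid()
```

The colorbar shows how scalar values map to line colors, which tends to scale better than a legend once there are many curves.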

I thought it would be beneficial to include what I consider a simpler method, using numpy's linspace coupled with matplotlib's cm-type object. It's possible that the above solution is for an older version. I am using Python 3.4.3, matplotlib 1.4.3, and numpy 1.9.3, and my solution is as follows.
import matplotlib.pyplot as plt
from matplotlib import cm
from numpy import linspace
start = 0.0
stop = 1.0
number_of_lines= 1000
cm_subsection = linspace(start, stop, number_of_lines)
colors = [ cm.jet(x) for x in cm_subsection ]
for i, color in enumerate(colors):
    plt.axhline(i, color=color)
plt.ylabel('Line Number')
plt.show()
This results in 1000 uniquely colored lines that span the entire cm.jet colormap. If you run this script, you'll find that you can zoom in on the individual lines.
Now say I want my 1000 line colors to span just the greenish portion found between lines 400 and 600. I simply change my start and stop values to 0.4 and 0.6, which uses only the 20% of the cm.jet colormap between 0.4 and 0.6.
So in a one line summary you can create a list of rgba colors from a matplotlib.cm colormap accordingly:
colors = [ cm.jet(x) for x in linspace(start, stop, number_of_lines) ]
In this case I use the commonly invoked map named jet but you can find the complete list of colormaps available in your matplotlib version by invoking:
>>> from matplotlib import cm
>>> dir(cm)

A combination of line styles, markers, and qualitative colors from matplotlib:
import itertools
import matplotlib as mpl
import matplotlib.pyplot as plt
N = 8*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors # Qualitative colormap
for i, (marker, linestyle, color) in zip(range(N), itertools.product(m_styles, l_styles, colormap)):
    plt.plot([0, 1, 2], [0, 2*i, 2*i], color=color, linestyle=linestyle, marker=marker, label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., ncol=4)
UPDATE: Supporting not only ListedColormap, but also LinearSegmentedColormap
import itertools
import matplotlib.pyplot as plt
Ncolors = 8
#colormap = plt.cm.Dark2# ListedColormap
colormap = plt.cm.viridis# LinearSegmentedColormap
Ncolors = min(colormap.N,Ncolors)
mapcolors = [colormap(int(x*colormap.N/Ncolors)) for x in range(Ncolors)]
N = Ncolors*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
fig,ax = plt.subplots(gridspec_kw=dict(right=0.6))
for i, (marker, linestyle, color) in zip(range(N), itertools.product(m_styles, l_styles, mapcolors)):
    ax.plot([0, 1, 2], [0, 2*i, 2*i], color=color, linestyle=linestyle, marker=marker, label=i)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=3,prop={'size': 8})

You can do it the way I described in an answer from my now-deleted account (it was banned from posting). It's rather simple and looks nice.
I usually use the third of the three versions below; I have not checked versions 1 and 2.
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
import numpy as np
# n is the number of curves to plot, e.g. n = len(array_of_curves_to_plot)
# (I skipped this earlier, thinking it was obvious from the picture. Sorry, my mistake.)
# x, y and ax1 are assumed to be defined already.
# version 1:
color = cm.rainbow(np.linspace(0, 1, n))
for i, c in zip(range(n), color):
    ax1.plot(x, y, c=c)
# or version 2 (faster and better):
color = iter(cm.rainbow(np.linspace(0, 1, n)))
c = next(color)
plt.plot(x, y, c=c)
# or version 3:
color = iter(cm.rainbow(np.linspace(0, 1, n)))
for i in range(n):
    c = next(color)
    ax1.plot(x, y, c=c)
Example of version 3 (figure): ship RAO of roll vs. Ikeda damping as a function of roll amplitude A44.
