Contour lines on incomplete seaborn heatmap - python

I have succeeded in making contour lines on a seaborn heatmap using the LineCollection answer of the link below:
Contour (iso-z) or threshold lines in seaborn heatmap
However, my dataset is not completely filled: the populated region is bounded by two borders. Because of this, each iso line I add produces three lines. One is at the correct location; the other two are at the top and bottom borders of the heatmap.
The problem can be reproduced by adding a column filled with zeros:
import seaborn as sns
import numpy as np
from matplotlib.collections import LineCollection
flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers")
flights["1965"] = 0
ax = sns.heatmap(flights, annot=True, fmt='d')
def add_iso_line(ax, value, color):
    v = flights.gt(value).diff(axis=1).fillna(False).to_numpy()
    h = flights.gt(value).diff(axis=0).fillna(False).to_numpy()
    try:
        l = np.argwhere(v.T)
        vlines = np.array(list(zip(l, np.stack((l[:, 0], l[:, 1] + 1)).T)))
        l = np.argwhere(h.T)
        hlines = np.array(list(zip(l, np.stack((l[:, 0] + 1, l[:, 1])).T)))
        lines = np.vstack((vlines, hlines))
        ax.add_collection(LineCollection(lines, lw=3, colors=color))
    except:
        pass
add_iso_line(ax, 200, 'b')
add_iso_line(ax, 400, 'y')
Is there a way to adjust the code, such that it only plots the correct iso line?
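One possible direction, offered as a sketch rather than a confirmed fix: the extra lines appear wherever a filled cell sits next to an unfilled one, so the boundary test can be restricted to pairs of cells that both hold real data. The helper below is hypothetical (iso_segments is not part of seaborn or matplotlib) and assumes that unfilled cells are stored as NaN.

```python
import numpy as np


def iso_segments(data, value):
    """Return an (n, 2, 2) array of segments separating cells > value
    from cells <= value, skipping every boundary that touches a NaN cell.

    Coordinates follow the heatmap convention: the cell in row i and
    column j covers x in [j, j + 1] and y in [i, i + 1].
    """
    data = np.asarray(data, dtype=float)
    above = data > value          # NaN compares as False
    valid = ~np.isnan(data)       # True where the cell is actually filled

    # Boundary between horizontally adjacent cells (columns j and j + 1).
    v = (above[:, 1:] != above[:, :-1]) & valid[:, 1:] & valid[:, :-1]
    # Boundary between vertically adjacent cells (rows i and i + 1).
    h = (above[1:, :] != above[:-1, :]) & valid[1:, :] & valid[:-1, :]

    segments = []
    for i, j in np.argwhere(v):
        segments.append([(j + 1, i), (j + 1, i + 1)])   # vertical piece
    for i, j in np.argwhere(h):
        segments.append([(j, i + 1), (j + 1, i + 1)])   # horizontal piece
    return np.array(segments, dtype=float).reshape(-1, 2, 2)


# Tiny illustrative grid: the last column is unfilled.
grid = np.array([[1.0, 5.0, np.nan],
                 [1.0, 1.0, np.nan]])
segs = iso_segments(grid, 3)
```

The returned array can be passed straight to LineCollection, as in the question's add_iso_line. For the reproduction above, where the empty column is filled with zeros, the zeros would first have to be converted to NaN, for instance with flights.mask(flights == 0).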

Related

How to draw histogram + QQ plots together for each column?

I have a dataset with many numerical columns. I want to draw a histogram for each column and add a QQ plot below it, to check more thoroughly whether the data follow a normal distribution. So each column should get a histogram with its QQ plot underneath. Something like this:
I tried the following code, but the two plots overlap each other:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
num_cols = df.select_dtypes(include=np.number)
cols = num_cols.columns.tolist()
df_sample = df.sample(n=5000)
fig, axes = plt.subplots(4, 5, figsize=(15,12), layout = 'constrained')
for col, axs in zip(cols, axes.flat):
    sns.histplot(data = df_sample[col], kde = True, stat = 'density', ax = axs, alpha = .4)
    sm.qqplot(df_sample[col], line='45', ax = axs)
plt.show()
How can I generate the histogram and QQ plot one under the other for each column?
Another issue is that my QQ plots look strange; I'm wondering whether I need to standardize all my columns before making the QQ plots.
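A minimal sketch of one way to lay this out, under stated assumptions: the overlap comes from drawing both plots on the same Axes, so the grid below uses two rows, with the histogram of each column in the top row and its QQ plot directly underneath. The data and column names are made up, and the QQ points are computed by hand with the standard library's NormalDist instead of statsmodels. The sample is standardized before plotting, because a 45-degree reference line compares it against the standard normal; raw-scale data would sit far from that line even when it is normally distributed.

```python
from statistics import NormalDist

import numpy as np
import matplotlib.pyplot as plt


def normal_qq_points(values):
    """Return (theoretical, sample) quantiles for a normal QQ plot.

    The sample is standardized (zero mean, unit standard deviation) so
    that it can be compared with the 45-degree line.
    """
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])
    n = len(x)
    probs = (np.arange(1, n + 1) - 0.5) / n          # plotting positions
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    standardized = (x - x.mean()) / x.std(ddof=1)
    return theoretical, standardized


# Hypothetical stand-in for the numeric columns of df.
rng = np.random.default_rng(0)
columns = {
    "normal_col": rng.normal(50, 10, 500),
    "skewed_col": rng.exponential(2.0, 500),
}

# Row 0 holds the histograms, row 1 the QQ plots of the same columns.
fig, axes = plt.subplots(2, len(columns), figsize=(4 * len(columns), 6),
                         squeeze=False)
for j, (name, values) in enumerate(columns.items()):
    axes[0, j].hist(values, bins=30, density=True, alpha=0.4)
    axes[0, j].set_title(name)
    theoretical, sample = normal_qq_points(values)
    axes[1, j].scatter(theoretical, sample, s=8)
    lims = [theoretical.min(), theoretical.max()]
    axes[1, j].plot(lims, lims, color="r")           # 45-degree reference
    axes[1, j].set(xlabel="Theoretical quantiles", ylabel="Sample quantiles")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. With statsmodels, passing fit=True to sm.qqplot should have a similar standardizing effect, though that is not checked here.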

Line plot with marker at final point

I am looking to produce a graph plotting the points of particles under the action of gravity and am currently producing a plot as below:
However, I would like to produce a clearer plot showing a line for the path of the particles and a marker at the final point indicating their final positions, like in the plot below:
My current line of code plotting each line is:
plt.plot(N_pos[:,0] * AU, N_pos[:,1], 'o')
This just plots the x and y coordinates from an array listing the x, y and z coordinates of each particle.
Is the simplest way to do this to remove the 'o' marker from the code and plot the last position of each particle again, this time using a marker? If so, how do I make the line and the final marker the same colour, instead of like below?
for i in range(len(all_positions[0])):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0] , N_pos[:,1])
    plt.plot(N_pos[:,0][-1] , N_pos[:,1][-1], 'o')
When no explicit color is given, plt.plot() cycles through a list of default colors.
A simple solution would be to extract the color from the lineplot and provide it as the color for the dot:
import numpy as np
import matplotlib.pyplot as plt
a = np.random.randn(200, 10, 1).cumsum(axis=0) * 0.1
all_positions = np.dstack([np.sin(a), np.cos(a)]).cumsum(axis=0)
for i in range(len(all_positions[0])):
    N_pos = all_positions[:, i]
    line, = plt.plot(N_pos[:, 0], N_pos[:, 1])
    plt.plot(N_pos[:, 0][-1], N_pos[:, 1][-1], 'o', color=line.get_color())
plt.show()
Another option would be to create a scatter plot, and set the size of the dots via an array. For example, N-1 times 1 and one time 20:
for i in range(len(all_positions[0])):
    N_pos = all_positions[:, i]
    plt.scatter(N_pos[:, 0], N_pos[:, 1], s=np.append(np.ones(len(N_pos) - 1), 20))
You can define your own color palette and give each trace its unique(ish) color:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
np.random.seed(123)
all_positions = np.random.randn(10, 5, 2).cumsum(axis=0) #shamelessly stolen from JohanC
l = all_positions.shape[1]
my_cmap = cm.plasma
for i in range(l):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0], N_pos[:,1], c= my_cmap(i/l))
    plt.plot(N_pos[:,0][-1], N_pos[:,1][-1], 'o', color=my_cmap(i/l))
plt.show()
You can reset the color cycler and plot the markers in a second round (not recommended, just to illustrate cycler properties):
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
all_positions = np.random.randn(10, 5, 2).cumsum(axis=0)
l = all_positions.shape[1]
for i in range(l):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0], N_pos[:,1])
plt.gca().set_prop_cycle(None)
for i in range(l):
    N_pos = all_positions[:,i]
    plt.plot(N_pos[:,0][-1], N_pos[:,1][-1], 'o')
plt.show()
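A further variant, sketched here on made-up random-walk data: a single plot call with the '-o' format draws the line and its markers as one artist, so they always share a colour, and markevery=[-1] restricts the markers to the last point. The array all_positions below is only a stand-in for the particle positions in the question.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(123)
# Illustrative data with shape (time steps, particles, xy).
all_positions = rng.standard_normal((50, 5, 2)).cumsum(axis=0)

fig, ax = plt.subplots()
lines = []
for i in range(all_positions.shape[1]):
    N_pos = all_positions[:, i]
    # One artist per particle: solid line plus a marker at the final point only.
    line, = ax.plot(N_pos[:, 0], N_pos[:, 1], '-o', markevery=[-1])
    lines.append(line)
```

Because there is only one artist per particle, a legend entry or a later colour change applies to the path and its end marker together. Call plt.show() to display the figure.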

Heatmap with circles indicating size of population

I would like to produce a heatmap in Python, similar to the one shown, where the size of the circle indicates the size of the sample in that cell. I looked in seaborn's gallery and couldn't find anything, and I don't think I can do this with matplotlib.
It's the inverse. While matplotlib can do pretty much everything, seaborn only provides a small subset of options.
So using matplotlib, you can plot a PatchCollection of circles as shown below.
Note: You could equally use a scatter plot, but since scatter dot sizes are in absolute units it would be rather hard to scale them into the grid.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
N = 10
M = 11
ylabels = ["".join(np.random.choice(list("PQRSTUVXYZ"), size=7)) for _ in range(N)]
xlabels = ["".join(np.random.choice(list("ABCDE"), size=3)) for _ in range(M)]
x, y = np.meshgrid(np.arange(M), np.arange(N))
s = np.random.randint(0, 180, size=(N,M))
c = np.random.rand(N, M)-0.5
fig, ax = plt.subplots()
R = s/s.max()/2
circles = [plt.Circle((j,i), radius=r) for r, j, i in zip(R.flat, x.flat, y.flat)]
col = PatchCollection(circles, array=c.flatten(), cmap="RdYlGn")
ax.add_collection(col)
ax.set(xticks=np.arange(M), yticks=np.arange(N),
       xticklabels=xlabels, yticklabels=ylabels)
ax.set_xticks(np.arange(M+1)-0.5, minor=True)
ax.set_yticks(np.arange(N+1)-0.5, minor=True)
ax.grid(which='minor')
fig.colorbar(col)
plt.show()
Here's a possible solution using Bokeh Plots:
import pandas as pd
from bokeh.palettes import RdBu
from bokeh.models import LinearColorMapper, ColumnDataSource, ColorBar
from bokeh.models.ranges import FactorRange
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import numpy as np
output_notebook()
d = dict(x = ['A','A','A', 'B','B','B','C','C','C','D','D','D'],
         y = ['B','C','D', 'A','C','D','B','D','A','A','B','C'],
         corr = np.random.uniform(low=-1, high=1, size=(12,)).tolist())
df = pd.DataFrame(d)
df['size'] = np.where(df['corr']<0, np.abs(df['corr']), df['corr'])*50
#added a new column to make the plot size
colors = list(reversed(RdBu[9]))
exp_cmap = LinearColorMapper(palette=colors,
                             low = -1,
                             high = 1)
# Note: Bokeh 3.0 and later use width/height instead of plot_width/plot_height
p = figure(x_range = FactorRange(), y_range = FactorRange(), plot_width=700,
           plot_height=450, title="Correlation",
           toolbar_location=None, tools="hover")
p.scatter("x","y",source=df, fill_alpha=1, line_width=0, size="size",
          fill_color={"field":"corr", "transform":exp_cmap})
p.x_range.factors = sorted(df['x'].unique().tolist())
p.y_range.factors = sorted(df['y'].unique().tolist(), reverse = True)
p.xaxis.axis_label = 'Values'
p.yaxis.axis_label = 'Values'
bar = ColorBar(color_mapper=exp_cmap, location=(0,0))
p.add_layout(bar, "right")
show(p)
One option is to use matplotlib's scatter plot with legends and a grid. You can set the size of the circles through the s argument and the color of each circle through c. You need to choose the X and Y values so that the circles sit exactly on the grid lines. This is an example I got from here:
import numpy as np
import matplotlib.pyplot as plt
volume = np.random.rayleigh(27, size=40)
amount = np.random.poisson(10, size=40)
ranking = np.random.normal(size=40)
price = np.random.uniform(1, 10, size=40)
fig, ax = plt.subplots()
# Because the price is much too small when being provided as size for ``s``,
# we normalize it to some useful point sizes, s=0.3*(price*3)**2
scatter = ax.scatter(volume, amount, c=ranking, s=0.3*(price*3)**2,
                     vmin=-3, vmax=3, cmap="Spectral")
# Produce a legend for the ranking (colors). Even though there are 40 different
# rankings, we only want to show 5 of them in the legend.
legend1 = ax.legend(*scatter.legend_elements(num=5),
                    loc="upper left", title="Ranking")
ax.add_artist(legend1)
# Produce a legend for the price (sizes). Because we want to show the prices
# in dollars, we use the *func* argument to supply the inverse of the function
# used to calculate the sizes from above. The *fmt* ensures to show the price
# in dollars. Note how we target at 5 elements here, but obtain only 4 in the
# created legend due to the automatic round prices that are chosen for us.
kw = dict(prop="sizes", num=5, color=scatter.cmap(0.7), fmt="$ {x:.2f}",
          func=lambda s: np.sqrt(s/.3)/3)
legend2 = ax.legend(*scatter.legend_elements(**kw),
                    loc="lower right", title="Price")
plt.show()
I don't have enough reputation to comment on Delenges' excellent answer, so I'll leave my comment as an answer instead:
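R.flat didn't order the way I needed it to, so I changed the circles assignment as below. Note that this indexes R as R[column][row], which matches a size array of shape (M, N); for the (N, M) array built in that answer, R.flat already lines up with x.flat and y.flat: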
circles = [plt.Circle((j,i), radius=R[j][i]) for j, i in zip(x.flat, y.flat)]
Here is a simple example that plots a circle heatmap using the psynlig package:
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.datasets import load_wine as load_data
from psynlig import plot_correlation_heatmap
# Note: matplotlib 3.6 and later name this style 'seaborn-v0_8-talk'
plt.style.use('seaborn-talk')
data_set = load_data()
data = pd.DataFrame(data_set['data'], columns=data_set['feature_names'])
#data = df_corr_selected
kwargs = {
    'heatmap': {
        'vmin': -1,
        'vmax': 1,
        'cmap': 'viridis',
    },
    'figure': {
        'figsize': (14, 10),
    },
}
plot_correlation_heatmap(data, bubble=True, annotate=False, **kwargs)
plt.show()

How to add median value labels to a Seaborn boxplot using the hue argument

In addition to the solution posted in this link, I would also like to use the hue parameter and add the median values to each of the boxes.
The current code:
testPlot = sns.boxplot(x='Pclass', y='Age', hue='Sex', data=trainData)
m1 = trainData.groupby(['Pclass', 'Sex'])['Age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
p1 = range(len(m1))
for tick, label in zip(p1, testPlot.get_xticklabels()):
    print(testPlot.text(p1[tick], m1[tick] + 1, mL1[tick]))
This gives output like:
I'm working on the Titanic dataset, which can be found in this link.
I'm getting the required values, but only when I use a print statement. How do I include them in my plot?
Place the labels manually by looping over all xticklabels and offsetting each label according to the hue level and the width of the boxes in each category:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
trainData = pd.read_csv('titanic.csv')
testPlot = sns.boxplot(x='pclass', y='age', hue='sex', data=trainData)
m1 = trainData.groupby(['pclass', 'sex'])['age'].median().values
mL1 = [str(np.round(s, 2)) for s in m1]
ind = 0
for tick in range(len(testPlot.get_xticklabels())):
    testPlot.text(tick-.2, m1[ind+1]+1, mL1[ind+1], horizontalalignment='center', color='w', weight='semibold')
    testPlot.text(tick+.2, m1[ind]+1, mL1[ind], horizontalalignment='center', color='w', weight='semibold')
    ind += 2
plt.show()
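The hard-coded offsets of -.2 and +.2 above fit two hue levels. As a hedged sketch of how they could be generalized: assuming seaborn's default total box width of 0.8 per category, with the hue levels splitting that width evenly, the centre of each box relative to its tick can be computed as below. The helper hue_offsets is hypothetical and not part of seaborn.

```python
def hue_offsets(n_hue, width=0.8):
    """Return the x offset of each hue level's box relative to its tick.

    Assumes the boxes of one category share a total width of `width`
    (0.8 is assumed to be seaborn's default) and are spread evenly
    across it.
    """
    box_width = width / n_hue
    return [-width / 2 + (k + 0.5) * box_width for k in range(n_hue)]


offsets_two = hue_offsets(2)     # offsets for two hue levels
offsets_three = hue_offsets(3)   # offsets for three hue levels
```

A label for hue level k in category tick would then go at x = tick + hue_offsets(n_hue)[k], provided the medians are looked up in the same hue order that the plot uses.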
This answer is nearly copied and pasted from here, but adapted to fit your example code. The linked answer is IMHO a bit misplaced there, because that question is just about labeling a boxplot and not about a boxplot using the hue argument.
I couldn't use your train dataset because it is not available from a Python package, so I used seaborn's Titanic dataset instead, which has nearly the same column names.
#!/usr/bin/env python3
import pandas as pd
import matplotlib
import matplotlib.patheffects as path_effects
import seaborn as sns
def add_median_labels(ax, fmt='.1f'):
    """Credits: https://stackoverflow.com/a/63295846/4865723
    """
    lines = ax.get_lines()
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    for median in lines[4:len(lines):lines_per_box]:
        x, y = (data.mean() for data in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:{fmt}}', ha='center', va='center',
                       fontweight='bold', color='white')
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
df = sns.load_dataset('titanic')
plot = sns.boxplot(x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot)
plot.figure.show()
As an alternative, you can create your boxplot with a figure-level function. In that case you need to pass the axes to add_median_labels().
# imports and add_median_labels() unchanged
df = sns.load_dataset('titanic')
plot = sns.catplot(kind='box', x='pclass', y='age', hue='sex', data=df)
add_median_labels(plot.axes[0][0])
plot.figure.show()
The resulting plot
This solution also works with more than two categories in the column used for the hue argument.

Changing the length of axis lines in matplotlib

I am trying to change the displayed length of the axis lines of a matplotlib plot. This is my current code:
import matplotlib.pyplot as plt
import numpy as np
linewidth = 2
outward = 10
ticklength = 4
tickwidth = 1
fig, ax = plt.subplots()
ax.plot(np.arange(100))
ax.tick_params(right=False, top=False, length = ticklength, width = tickwidth, direction = "out")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
for line in ["left","bottom"]:
    ax.spines[line].set_linewidth(linewidth)
    ax.spines[line].set_position(("outward",outward))
Which generates the following plot:
I would like my plot to look like the following, with the axis lines shortened:
I wasn't able to find this among the ax.spines methods, and I also wasn't able to draw it nicely using ax.axhline.
You could add these lines to the end of your code:
ax.spines['left'].set_bounds(20, 80)
ax.spines['bottom'].set_bounds(20, 80)
for i in [0, -1]:
    ax.get_yticklabels()[i].set_visible(False)
    ax.get_xticklabels()[i].set_visible(False)
for i in [0, -2]:
    ax.get_yticklines()[i].set_visible(False)
    ax.get_xticklines()[i].set_visible(False)
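A possible variation, given as a sketch under assumptions: instead of hiding the first and last tick labels and tick lines by index, the ticks themselves can be limited to the range covered by the shortened spines. The bounds of 20 and 80 and the tick step of 20 are illustrative values, not taken from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.arange(100))

# Hide the top and right axis lines entirely.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

low, high = 20, 80                      # illustrative spine bounds
ticks = np.arange(low, high + 1, 20)    # ticks only inside the bounds

for side in ("left", "bottom"):
    ax.spines[side].set_linewidth(2)
    ax.spines[side].set_position(("outward", 10))
    ax.spines[side].set_bounds(low, high)   # draw the spine only from low to high

ax.set_xticks(ticks)
ax.set_yticks(ticks)
```

Because no ticks exist outside the spine bounds, this does not depend on how many ticks the automatic locator happens to produce. Call plt.show() to display the figure.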
