How to customize histogram using seaborn FacetGrid

I am using seaborn's FacetGrid to draw multiple histograms of the column "slack" from a dataframe (plot_df), one per value of the parameter "xyz". I also want to do the following in each of those plots:
Draw a vertical line at x = 0
Color all the bins that are at or below 0 (on the x-axis) with a different shade
Calculate the percentage of the histogram area that falls in the bins below 0 (on the x-axis)
I can find lots of examples online, but none that use seaborn's FacetGrid.
g = sns.FacetGrid(plot_df, col='xyz', height=5)
g.map(plt.hist, "slack", bins=50)

You could loop through the generated axes (for xyz, ax in g.axes_dict.items(): ....) and call your plotting functions for each of those axes.
Or, you could call g.map_dataframe(...) with a custom function. That function will need to draw onto the "current ax".
The x and y labels need to be changed after the call to g.map_dataframe(), because seaborn erases them at the end of that function.
You can call plt.setp(g.axes, xlabel='data', ylabel='frequency') to set the labels for all the subplots. Or g.set_ylabels('...') to only set the y labels for the "outer" subplots.
Here is some example code to get you started:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
def individual_plot(**kwargs):
    ax = plt.gca()  # get the current ax
    data = kwargs['data']['slack'].values
    xmin, xmax = data.min(), data.max()
    bin_width = xmax / 50
    # histogram part > 0
    ax.hist(data, bins=np.arange(0.000001, xmax + 0.001, bin_width), color='tomato')
    # histogram part < 0
    ax.hist(data, bins=-np.arange(0, abs(xmin) + bin_width + 0.001, bin_width)[::-1], color='lime')
    # line at x=0
    ax.axvline(0, color='navy', ls='--')
    # calculate and show part < 0
    percent_under_zero = sum(data <= 0) / len(data) * 100
    ax.text(0.5, 0.98, f'part < 0: {percent_under_zero:.1f} %',
            color='k', ha='center', va='top', transform=ax.transAxes)
# first generate some test data
plot_df = pd.DataFrame({'xyz': np.repeat([*'xyz'], 1000),
                        'slack': np.random.randn(3000) * 10 + np.random.choice([10, 500], 3000, p=[0.9, 0.1])})
g = sns.FacetGrid(plot_df, col='xyz', height=5)
g.map_dataframe(individual_plot)
plt.setp(g.axes, xlabel='data', ylabel='frequency')
plt.tight_layout()
plt.show()

Related

How to fill intervals under KDE curve with different colors

I am looking for a way to color the intervals below the curve with different colors; on the interval x < 0, I would like to fill the area under the curve with one color and on the interval x >= 0 with another color, like the following image:
This is the code for basic kde plot:
fig, (ax1) = plt.subplots(1, 1, figsize = ((plot_size + 1.5) * 1,(plot_size + 1.5)))
sns.kdeplot(data=pd.DataFrame(w_contrast, columns=['contrast']), x="contrast", ax=ax1);
ax1.set_xlabel(f"Dry Yield Posterior Contrast (kg)");
Is there a way to fill the area under the curve with different colors using seaborn?

seaborn is a high-level API for matplotlib, so the values of the curve have to be calculated separately; the approach is similar to, but simpler than, this answer.
Calculate the values for the kde curve with scipy.stats.gaussian_kde
Use matplotlib.pyplot.fill_between to fill the areas.
Use scipy.integrate.simpson to calculate the area under the curve, which will be passed to matplotlib.pyplot.annotate to annotate.
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import gaussian_kde
from scipy.integrate import simpson
import numpy as np
# load sample data
df = sns.load_dataset('planets')
# create the kde model
kde = gaussian_kde(df.mass.dropna())
# plot
fig, ax = plt.subplots(figsize=(9, 6))
g = sns.kdeplot(data=df.mass, ax=ax, color='k')
# remove margins; optional
g.margins(x=0, y=0)
# get the min and max of the x-axis
xmin, xmax = g.get_xlim()
# create points between the min and max
x = np.linspace(xmin, xmax, 1000)
# calculate the y values from the model
kde_y = kde(x)
# select x values below 0
x0 = x[x < 0]
# get the len, which will be used for slicing the other arrays
x0_len = len(x0)
# slice the arrays
y0 = kde_y[:x0_len]
x1 = x[x0_len:]
y1 = kde_y[x0_len:]
# calculate the area under the curves, as a percentage
area0 = np.round(simpson(y0, x=x0) * 100, 0)
area1 = np.round(simpson(y1, x=x1) * 100, 0)
# fill the areas
g.fill_between(x=x0, y1=y0, color='r', alpha=.5)
g.fill_between(x=x1, y1=y1, color='b', alpha=.5)
# annotate
g.annotate(f'{area0:.0f}%', xy=(-1, 0.075), xytext=(10, 0.150), arrowprops=dict(arrowstyle="->", color='r', alpha=.5))
g.annotate(f'{area1:.0f}%', xy=(1, 0.05), xytext=(10, 0.125), arrowprops=dict(arrowstyle="->", color='b', alpha=.5))
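If only the two percentages are needed, the numerical integration over sampled points can probably be skipped: gaussian_kde has an integrate_box_1d method that integrates the fitted density between two bounds. The following is a minimal sketch on synthetic data (the sample and its parameters are assumptions for illustration, not the planets dataset).

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic sample, chosen only for illustration
rng = np.random.default_rng(1)
sample = rng.normal(loc=1.0, scale=2.0, size=500)

kde = gaussian_kde(sample)

# Integrate the fitted density on each side of zero
area_below = kde.integrate_box_1d(-np.inf, 0)
area_above = kde.integrate_box_1d(0, np.inf)

print(f"below 0: {area_below * 100:.0f}%, above 0: {area_above * 100:.0f}%")
```

Because the integral covers the whole real line, the two parts add up to 1, which is not guaranteed when integrating only over the visible x-axis range.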

How can I make a problem matrix with percentage using matplotlib and seaborn?

I want to make this type of graph you see below.
I understand that I can make a matrix graph with matplotlib like so:
import matplotlib.pyplot as plt
from matplotlib import colors
cmap = colors.ListedColormap(['white','red'])
data = [
    [0,0,0,0,0,1,1,1,1,],
    [0,0,0,0,0,1,0,0,1,],
]
plt.figure(figsize=(9,5))
plt.pcolor(data[::-1],cmap=cmap,edgecolors='k', linewidths=3)
plt.xlabel('Problem')
plt.ylabel('Participant')
plt.show()
But how would I go about adding percentages to be included in this graph?

You can add a secondary x-axis (ax.twiny()), using the top axis for the numbering and the bottom axis to show the percentages.
Calling pcolor with a list of x and y positions that are 0.5 shifted will put the ticks and tick labels at integer positions. clip_on=False makes sure the outer cell borders have the same thickness as the rest. ax.invert_yaxis() lets you invert the y axis (so you can use data instead of data[::-1]).
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
cmap = ListedColormap(['white', 'orangered'])
data = np.random.randint(0, 3, size=(28, 30)) % 2
data[:, 9] = 1 # one full column to simulate 100%
data[:, 11] = 0 # one empty column to simulate 0%
fig, ax = plt.subplots(figsize=(9, 5))
ax.pcolor(np.arange(data.shape[1] + 1) + 0.5, np.arange(data.shape[0] + 1) + 0.5, data,
cmap=cmap, edgecolors='k', linewidths=3, clip_on=False)
ax.set_yticks(range(1, data.shape[0] + 1))
ax.set_xticks(range(1, data.shape[1] + 1))
ax.set_xticklabels([f'{p:.0f}' for p in data.mean(axis=0) * 100])
ax.invert_yaxis()
ax2 = ax.twiny()
ax2.set_xlim(ax.get_xlim())
ax2.set_xticks(range(1, data.shape[1] + 1))
ax2.set_xlabel('Problem')
ax.tick_params(length=0)
ax2.tick_params(length=0)
ax.set_ylabel('Participant')
plt.tight_layout()
plt.show()
Decreasing the fontsize (or increasing the figsize) leaves enough room to also show the percentage sign:
ax.set_xticklabels([f'{p:.0f}%' for p in data.mean(axis=0) * 100], fontsize=8)
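To make the percentage calculation itself concrete, here is a small sketch that applies the same data.mean(axis=0) * 100 step to the two-row matrix from the question. Each column is one problem, each row one participant, so the column mean is the fraction of participants marked 1 for that problem.

```python
import numpy as np

# The 2 x 9 example matrix from the question (rows: participants, columns: problems)
data = np.array([
    [0, 0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 0, 0, 1],
])

# Column-wise mean of 0/1 values, scaled to a percentage
percentages = data.mean(axis=0) * 100
labels = [f'{p:.0f}%' for p in percentages]
print(labels)
# ['0%', '0%', '0%', '0%', '0%', '100%', '50%', '50%', '100%']
```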

Not getting the heatmap in the background using Matplotlib Python

I have tried this and got the result as in the image:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap
cmap = LinearSegmentedColormap.from_list("", ["red","grey","green"])
df = pd.read_csv('t.csv', header=0)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax = ax1.twiny()
# Scatter plot of positive points, coloured green (C2)
ax.scatter(np.argwhere(df['real'] > 0), df.loc[df['real'] > 0, 'real'], color='C2')
# Scatter plot of negative points, coloured red (C3)
ax.scatter(np.argwhere(df['real'] < 0), df.loc[df['real'] < 0, 'real'], color='C3')
# Scatter neutral values in grey (C7)
ax.scatter(np.argwhere(df['real'] == 0), df.loc[df['real'] == 0, 'real'], color='C7')
ax.set_ylim([df['real'].min(), df['real'].max()])
index = len(df.index)
ymin = df['prediction'].min()
ymax= df['prediction'].max()
ax1.imshow([np.arange(index),df['prediction']],cmap=cmap,
extent=(0,index-1,ymin, ymax), alpha=0.8)
plt.show()
Image:
I was expecting one output where the color is placed according to the figure. I am getting green color and no reds or greys.
I want to get the image or contours spread as the values are. How I can do that? See the following image, something similar:
Please let me know how I can achieve this. The data I used is here: t.csv
For a live version, have a look at Tensorflow Playground

There are essentially 2 tasks required in a solution like this:
Plot the heatmap as the background;
Plot the scatter data;
Output:
Source code:
import numpy as np
import matplotlib.pyplot as plt
###
# Plot heatmap in the background
###
# Setting up input values
x = np.arange(-6.0, 6.0, 0.1)
y = np.arange(-6.0, 6.0, 0.1)
X, Y = np.meshgrid(x, y)
# plot heatmap colorspace in the background
fig, ax = plt.subplots(nrows=1)
im = ax.imshow(X, cmap='RdBu', extent=(-6, 6, -6, 6), interpolation='bilinear')
cax = fig.add_axes([0.21, 0.95, 0.6, 0.03]) # [left, bottom, width, height]
fig.colorbar(im, cax=cax, orientation='horizontal') # add colorbar at the top
###
# Plot data as scatter
###
# generate the points
num_samples = 150
theta = np.linspace(0, 2 * np.pi, num_samples)
# generate inner points
circle_r = 2
r = circle_r * np.random.rand(num_samples)
inner_x, inner_y = r * np.cos(theta), r * np.sin(theta)
# generate outer points
circle_r = 4
r = circle_r + np.random.rand(num_samples)
outer_x, outer_y = r * np.cos(theta), r * np.sin(theta)
# plot data
ax.scatter(inner_x, inner_y, s=30, marker='o', color='royalblue', edgecolors='white', linewidths=0.8)
ax.scatter(outer_x, outer_y, s=30, marker='o', color='crimson', edgecolors='white', linewidths=0.8)
ax.set_ylim([-6,6])
ax.set_xlim([-6,6])
plt.show()
To keep things simple, I kept the colorbar range (-6, 6) to match the data range.
I'm sure this code can be changed to suit your specific needs. Good luck!

Here is a possible solution.
A few notes and questions:
What are the 'prediction' values in your data file? They do not seem to correlate with the values in the 'real' column.
Why do you create a second axis? What is represented on the bottom X-axis in your plot? I removed the second axis and labelled the remaining axes (index and real).
When you slice a pandas DataFrame, the index comes with it. You don't need to create a separate index (argwhere and arange(index) in your code). I simplified the first part of the code, where scatterplots are produced.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap
cmap = LinearSegmentedColormap.from_list("", ["red","grey","green"])
df = pd.read_csv('t.csv', header=0)
print(df)
fig = plt.figure()
ax = fig.add_subplot(111)
# Data limits
xmin = 0
xmax = df.shape[0]
ymin = df['real'].min()
ymax = df['real'].max()
# Scatter plots
gt0 = df.loc[df['real'] > 0, 'real']
lt0 = df.loc[df['real'] < 0, 'real']
eq0 = df.loc[df['real'] == 0, 'real']
ax.scatter(gt0.index, gt0.values, edgecolor='white', color='C2')
ax.scatter(lt0.index, lt0.values, edgecolor='white', color='C3')
ax.scatter(eq0.index, eq0.values, edgecolor='white', color='C7')
ax.set_ylim((ymin, ymax))
ax.set_xlabel('index')
ax.set_ylabel('real')
# We want 0 to be in the middle of the colourbar,
# because gray is defined as df['real'] == 0
if abs(ymax) > abs(ymin):
    lim = abs(ymax)
else:
    lim = abs(ymin)
# Create a gradient that runs from -lim to lim in N number of steps,
# where N is the number of colour steps in the cmap.
grad = np.arange(-lim, lim, 2*lim/cmap.N)
# Arrays plotted with imshow must be 2D arrays. In this case it will be
# 1 pixel wide and N pixels tall. Set the aspect ratio to auto so that
# each pixel is stretched out to the full width of the frame.
grad = np.expand_dims(grad, axis=1)
# origin='lower' puts the first row (-lim) at the bottom of the image
im = ax.imshow(grad, cmap=cmap, aspect='auto', alpha=1, origin='lower',
               extent=(xmin, xmax, -lim, lim))
fig.colorbar(im, label='real')
plt.show()
This gives the following result:
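The construction of the symmetric gradient can be checked in isolation. The sketch below uses made-up limits (ymin and ymax are assumptions standing in for the 'real' column) and 256 steps, which is the default number of colours in a LinearSegmentedColormap. It uses np.linspace rather than np.arange so that both -lim and lim are included.

```python
import numpy as np

# Hypothetical data limits, standing in for df['real'].min() and .max()
ymin, ymax = -3.2, 7.5
n_steps = 256  # default number of colour steps in a matplotlib colormap

# Symmetric limit so that 0 maps to the middle colour (grey)
lim = max(abs(ymin), abs(ymax))

# Column vector running from -lim to lim: N rows, 1 column
grad = np.linspace(-lim, lim, n_steps).reshape(-1, 1)

print(grad.shape, grad[0, 0], grad[-1, 0])
```

Passing this array to imshow with aspect='auto' and origin='lower' stretches the single column across the full width, with -lim at the bottom and lim at the top.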

Laying out several plots in matplotlib + numpy

I am pretty new to python and want to plot a dataset using a histogram and a heatmap below. However, I am a bit confused about
How to put a title above both plots
How to insert some text into both plots
How to reference the upper and the lower plot
For my first task I used the title instruction, which inserted a caption between the two plots instead of putting it above both of them.
For my second task I used the figtext instruction. However, I could not see the text anywhere in the plot. I played a bit with the x, y and fontsize parameters without any success.
Here is my code:
def drawHeatmap(xDim, yDim, plot, threshold, verbose):
    global heatmapList
    stableCells = 0
    print("\n[I] - Plotting Heatmaps ...")
    for currentHeatmap in heatmapList:
        if -1 in heatmapList[currentHeatmap]:
            continue
        print("[I] - Plotting heatmap for PUF instance", currentHeatmap,"(",len(heatmapList[currentHeatmap])," values)")
        # Convert data to ndarray
        #floatMap = list(map(float, currentHeatmap[1]))
        myArray = np.array(heatmapList[currentHeatmap]).reshape(xDim,yDim)
        # Setup two plots per page
        fig, ax = plt.subplots(2)
        # Histogram
        weights = np.ones_like(heatmapList[currentHeatmap]) / len(heatmapList[currentHeatmap])
        hist, bins = np.histogram(heatmapList[currentHeatmap], bins=50, weights=weights)
        width = 0.7 * (bins[1] - bins[0])
        center = (bins[:-1] + bins[1:]) / 2
        ax[0].bar(center, hist, align='center', width=width)
        stableCells = calcPercentageStable(threshold, verbose)
        plt.figtext(100,100,"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!", fontsize=40)
        heatmap = ax[1].pcolor(myArray, cmap=plt.cm.Blues, alpha=0.8, vmin=0, vmax=1)
        cbar = fig.colorbar(heatmap, shrink=0.8, aspect=10, fraction=.1,pad=.01)
        #cbar.ax.tick_params(labelsize=40)
        for y in range(myArray.shape[0]):
            for x in range(myArray.shape[1]):
                plt.text(x + 0.5, y + 0.5, '%.2f' % myArray[y, x],
                         horizontalalignment='center',
                         verticalalignment='center',
                         fontsize=(xDim/yDim)*5
                         )
        #fig = plt.figure()
        fig = matplotlib.pyplot.gcf()
        fig.set_size_inches(60.5,55.5)
        plt.savefig(dataDirectory+"/"+currentHeatmap+".pdf", dpi=800, papertype="a3", format="pdf")
        #plt.title("Heatmap for PUF instance "+str(currentHeatmap[0][0])+" ("+str(numberOfMeasurements)+" measurements; "+str(sizeOfMeasurements)+" bytes)")
        if plot:
            plt.show()
    print("\t[I] - Done ...")
And here is my current output:

Perhaps this example will make things easier to understand. Things to note are:
Use fig.suptitle to add a title to the top of a figure.
Use ax[i].text(x, y, str) to add text to an Axes object
Each Axes object, ax[i] in your case, holds all the information about a single plot. Use them instead of calling plt, which only really works well with one subplot per figure or to modify all subplots at once. For example, instead of calling plt.figtext, call ax[0].text to add text to the top plot.
Try following the example code below, or at least read through it to get a better idea how to use your ax list.
import numpy as np
import matplotlib.pyplot as plt
histogram_data = np.random.rand(1000)
heatmap_data = np.random.rand(10, 100)
# Set up figure and axes
fig = plt.figure()
fig.suptitle("These are my two plots")
top_ax = fig.add_subplot(211) #2 rows, 1 col, 1st plot
bot_ax = fig.add_subplot(212) #2 rows, 1 col, 2nd plot
# This is the same as doing 'fig, (top_ax, bot_ax) = plt.subplots(2)'
# Histogram
weights = np.ones_like(histogram_data) / histogram_data.shape[0]
hist, bins = np.histogram(histogram_data, bins=50, weights=weights)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
# Use top_ax to modify anything with the histogram plot
top_ax.bar(center, hist, align='center', width=width)
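# ax.text(x, y, str) uses data coordinates by default, so x, y must lie within the plot bounds.
# Passing transform=top_ax.transAxes switches to axes fractions (0 to 1), which always fit.
top_ax.text(0.5, 0.5, "Here is text on the top plot", color='r', transform=top_ax.transAxes)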
# Heatmap
heatmap_params = {'cmap':plt.cm.Blues, 'alpha':0.8, 'vmin':0, 'vmax':1}
# Use bot_ax to modify anything with the heatmap plot
heatmap = bot_ax.pcolor(heatmap_data, **heatmap_params)
cbar = fig.colorbar(heatmap, shrink=0.8, aspect=10, fraction=.1,pad=.01)
# See how it looks
plt.show()
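A likely reason the figtext call in the question showed nothing is the coordinate system: figure-level text is positioned in figure fractions by default, where (0, 0) is the bottom-left and (1, 1) the top-right corner of the figure, so (100, 100) lies far outside the page. The sketch below, with made-up data, places one note at figure level and one inside the top plot using axes fractions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

fig, (top_ax, bot_ax) = plt.subplots(2)
fig.suptitle("Title above both plots")

top_ax.hist(rng.random(1000), bins=50)
bot_ax.pcolor(rng.random((10, 100)), cmap="Blues", vmin=0, vmax=1)

# Figure-level text: x and y are fractions of the whole figure (0 to 1)
note = fig.text(0.5, 0.02, "note placed in figure coordinates", ha="center")

# Axes-level text: transAxes makes x and y fractions of this subplot only
label = top_ax.text(0.02, 0.9, "note inside the top plot",
                    transform=top_ax.transAxes)
```

Calling plt.show() or fig.savefig(...) afterwards renders both notes regardless of the data ranges of the two subplots.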

Matplotlib - label each bin

I'm currently using Matplotlib to create a histogram:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as pyplot
...
fig = pyplot.figure()
ax = fig.add_subplot(1,1,1,)
n, bins, patches = ax.hist(measurements, bins=50, range=(graph_minimum, graph_maximum), histtype='bar')
#ax.set_xticklabels([n], rotation='vertical')
for patch in patches:
    patch.set_facecolor('r')
pyplot.title('Spam and Ham')
pyplot.xlabel('Time (in seconds)')
pyplot.ylabel('Bits of Ham')
pyplot.savefig(output_filename)
I'd like to make the x-axis labels a bit more meaningful.
Firstly, the x-axis ticks here seem to be limited to five ticks. No matter what I do, I can't seem to change this - even if I add more xticklabels, it only uses the first five. I'm not sure how Matplotlib calculates this, but I assume it's auto-calculated from the range/data?
Is there some way I can increase the resolution of x-tick labels - even to the point of one for each bar/bin?
(Ideally, I'd also like the seconds to be reformatted in micro-seconds/milli-seconds, but that's a question for another day).
Secondly, I'd like each individual bar labeled - with the actual number in that bin, as well as the percentage of the total of all bins.
The final output might look something like this:
Is something like that possible with Matplotlib?
Cheers,
Victor

Sure! To set the ticks, just, well... Set the ticks (see matplotlib.pyplot.xticks or ax.set_xticks). (Also, you don't need to manually set the facecolor of the patches. You can just pass in a keyword argument.)
For the rest, you'll need to do some slightly more fancy things with the labeling, but matplotlib makes it fairly easy.
As an example:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter
data = np.random.randn(82)
fig, ax = plt.subplots()
counts, bins, patches = ax.hist(data, facecolor='yellow', edgecolor='gray')
# Set the ticks to be at the edges of the bins.
ax.set_xticks(bins)
# Set the xaxis's tick labels to be formatted with 1 decimal place...
ax.xaxis.set_major_formatter(FormatStrFormatter('%0.1f'))
# Change the colors of bars at the edges...
twentyfifth, seventyfifth = np.percentile(data, [25, 75])
for patch, rightside, leftside in zip(patches, bins[1:], bins[:-1]):
    if rightside < twentyfifth:
        patch.set_facecolor('green')
    elif leftside > seventyfifth:
        patch.set_facecolor('red')
# Label the raw counts and the percentages below the x-axis...
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
for count, x in zip(counts, bin_centers):
    # Label the raw counts
    ax.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'),
                xytext=(0, -18), textcoords='offset points', va='top', ha='center')
    # Label the percentages
    percent = '%0.0f%%' % (100 * float(count) / counts.sum())
    ax.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'),
                xytext=(0, -32), textcoords='offset points', va='top', ha='center')
# Give ourselves some more room at the bottom of the plot
plt.subplots_adjust(bottom=0.15)
plt.show()
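As a different technique from the annotate loop above, newer Matplotlib versions (3.4 and later, which is an assumption about your installation) provide Axes.bar_label, which attaches one label to each bar of a container. A rough sketch that puts the count and the percentage on top of each bar instead of below the axis:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(82)

fig, ax = plt.subplots()
counts, bins, bars = ax.hist(data, bins=10, facecolor='yellow', edgecolor='gray')
ax.set_xticks(bins)

# One two-line label per bar: raw count, then share of the total
labels = [f'{int(c)}\n{100 * c / counts.sum():.0f}%' for c in counts]
texts = ax.bar_label(bars, labels=labels, fontsize=8)

# Leave some headroom so the labels on the tallest bars are not cut off
ax.margins(y=0.15)
```

This only works with the default histtype='bar', where hist returns a bar container; for step histograms the annotate approach above is still needed.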

One thing I wanted to add to histograms drawn with density=True was the relative frequency of each bin. I searched but couldn't find a function that does this, so I wrote the following function:
import numpy as np

def label_densityHist(ax, n, bins, x=4, y=0.01, r=2, **kwargs):
    """
    Add a label with the relative frequency of each bin to a density histogram.
    :param ax: matplotlib Axes
        The axes to draw on.
    :param n: list, array of int, float
        The values of the histogram bins.
    :param bins: list, array of int, float
        The edges of the bins.
    :param x: int, float
        Controls the x position of the bin labels. The higher, the lower the value on the x-axis.
        Default: 4
    :param y: int, float
        Controls the y position of the bin labels. The higher, the greater the value on the y-axis.
        Default: 0.01
    :param r: int
        Number of decimal places.
        Default: 2
    :param **kwargs: Text properties in matplotlib
    :return: None
    Example
        import matplotlib.pyplot as plt
        import numpy as np
        dados = np.random.randn(100)
        axe = plt.gca()
        n, bins, _ = axe.hist(x=dados, density=True, edgecolor='black')
        label_densityHist(axe, n, bins)
        plt.show()
    Example:
        import matplotlib.pyplot as plt
        import numpy as np
        dados = np.random.randn(100)
        axe = plt.gca()
        n, bins, _ = axe.hist(x=dados, density=True, edgecolor='black')
        label_densityHist(axe, n, bins, x=6, fontsize='large')
        plt.show()
    Reference:
    [1]https://matplotlib.org/3.1.1/api/text_api.html#matplotlib.text.Text
    """
    k = []
    # calculate the relative frequency of each bin (density * bin width)
    for i in range(0, len(n)):
        k.append((bins[i+1]-bins[i])*n[i])
    # rounded
    k = np.around(k, r)
    # plot the label/text to each bin
    for i in range(0, len(n)):
        x_pos = (bins[i + 1] - bins[i]) / x + bins[i]
        y_pos = n[i] + (n[i] * y)
        label = str(k[i])  # relative frequency of each bin
        ax.text(x_pos, y_pos, label, **kwargs)
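The relationship the function relies on, that the relative frequency of a bin equals its density value times its width, can be checked with NumPy alone. This is a small sketch on random data, not part of the function above.

```python
import numpy as np

rng = np.random.default_rng(0)
dados = rng.standard_normal(100)

# density=True normalizes the histogram so that its total area is 1
density, edges = np.histogram(dados, bins=10, density=True)

# Relative frequency of each bin = bar height * bar width
rel_freq = density * np.diff(edges)

# The same numbers, computed directly from the raw counts
counts, _ = np.histogram(dados, bins=edges)
print(np.allclose(rel_freq, counts / counts.sum()), rel_freq.sum())
```

Without density=True the bar heights are raw counts, so multiplying them by the bin width would not give relative frequencies.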

To add SI prefixes to your axis labels you want to use QuantiPhy. In fact, in its documentation it has an example that shows how to do this exact thing: MatPlotLib Example.
I think you would add something like this to your code:
from matplotlib.ticker import FuncFormatter
from quantiphy import Quantity
time_fmtr = FuncFormatter(lambda v, p: Quantity(v, 's').render(prec=2))
ax.xaxis.set_major_formatter(time_fmtr)
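If adding a dependency is not an option, Matplotlib's own matplotlib.ticker.EngFormatter does something similar: it rescales tick values to engineering prefixes (m, µ, k, ...) and appends a unit. This is an alternative to QuantiPhy, not the method described above, and the measurement values in the sketch are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import EngFormatter

# Made-up timings in seconds, all in the millisecond range
rng = np.random.default_rng(0)
measurements = rng.uniform(0.0005, 0.004, size=200)

fig, ax = plt.subplots()
ax.hist(measurements, bins=20)

# Tick values stay in seconds; only their labels are rescaled
time_fmtr = EngFormatter(unit='s')
ax.xaxis.set_major_formatter(time_fmtr)

# The formatter can also be called directly to preview a label
example_label = time_fmtr(0.0025)
print(example_label)
```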
