Artificial tick labels for seaborn heatmaps - python

I have a seaborn heatmap that looks like this:
...generated from a pandas dataframe of randomly generated values a piece of which looks like this:
The values along the y axis are all in the range [0,1], and the ones on the x axis in the range [0,2*pi]. I just want some short floats at regular intervals for my tick labels, but I can only seem to get values that are in my dataframe. When I try specifying the values I want, it doesn't put them in the right place, as seen in the plot above. Here's my code right now. How can I get the axis labels that I tried specifying with xticks and yticks in this code in the correct places (which would be evenly spaced along the axes)?
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# note: matplotlib.mlab.griddata was removed in matplotlib 3.1
from matplotlib.mlab import griddata
sns.set_style("darkgrid")
PHI, COSTH = np.meshgrid(phis, cos_thetas)
THICK = griddata(phis, cos_thetas, thicknesses, PHI, COSTH, interp='linear')
thick_df = pd.DataFrame(THICK, columns=phis, index=cos_thetas)
thick_df = thick_df.sort_index(axis=0, ascending=False)
thick_df = thick_df.sort_index(axis=1)
cmap = sns.cubehelix_palette(start=1.6, light=0.8, as_cmap=True, reverse=True)
yticks = np.array([0,0.2,0.4,0.6,0.8,1.0])
xticks = np.array([0,1,2,3,4,5,6])
g = sns.heatmap(thick_df, linewidth=0, xticklabels=xticks, yticklabels=yticks, square=True, cmap=cmap)
plt.show()

Here's something that should do what you want:
cmap = sns.cubehelix_palette(start=1.6, light=0.8, as_cmap=True, reverse=True)
yticks = np.linspace(0,1,6)
x_end = 6
xticks = np.arange(x_end+1)
ax = sns.heatmap(thick_df, linewidth=0, xticklabels=xticks, yticklabels=yticks[::-1], square=True, cmap=cmap)
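# scale the wanted values to heatmap coordinates (the axes run from 0 to the number of cells)
ax.set_xticks(xticks*max(ax.get_xlim())/(2*np.pi))
# max() is used because recent seaborn versions invert the y axis, so get_ylim()[1] can be 0
ax.set_yticks(yticks*max(ax.get_ylim()))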
plt.show()
You could pass ['{:,.2f}'.format(x) for x in xticks] instead of xticks to get a float with 2 decimals.
Note that I'm reversing the yticklabels because that's what seaborn does: see matrix.py#L138.
Seaborn calculates the tick positions around the same place (e.g.: #L148), for you that amounts to:
# thick_df.T.shape[0] = thick_df.shape[1]
xticks: np.arange(0, thick_df.T.shape[0], 1) + .5
yticks: np.arange(0, thick_df.T.shape[1], 1) + .5
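The scaling used in the answer can be checked without drawing anything. Below is a minimal sketch, assuming the row or column labels are evenly spaced between a known minimum and maximum; the function name and the cell_centered flag are made up for illustration and are not part of seaborn.

```python
import math


def value_to_heatmap_position(value, vmin, vmax, n_cells, cell_centered=False):
    """Map a data value to a position on a heatmap axis that runs from 0 to n_cells.

    Assumes evenly spaced labels between vmin and vmax (an assumption).
    With cell_centered=True, vmin and vmax are taken to sit at the centres of
    the first and last cell (0.5 and n_cells - 0.5) instead of at the axis edges.
    """
    span = vmax - vmin
    if span == 0:
        raise ValueError("vmin and vmax must differ")
    fraction = (value - vmin) / span
    if cell_centered:
        return fraction * (n_cells - 1) + 0.5
    return fraction * n_cells


# Hypothetical example: 100 columns covering phi in [0, 2*pi].
n_columns = 100
wanted = [0, 1, 2, 3, 4, 5, 6]
positions = [value_to_heatmap_position(v, 0.0, 2 * math.pi, n_columns) for v in wanted]
labels = [f"{v:.2f}" for v in wanted]
```

The resulting lists could then be passed to ax.set_xticks(positions) and ax.set_xticklabels(labels). The default mode matches the edge-to-edge scaling from the answer; the cell-centred mode is probably closer if the first and last columns are labelled exactly 0 and 2*pi.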

Related

How to customize histogram using seaborn FacetGrid

I am using seaborn's FacetGrid to draw multiple histograms from a dataframe (plot_df), one per value of the column "xyz". In addition, I want to do the following in those plots:
Create a vertical line at x = 0
Color all the bins that are less than or equal to 0 (on the x-axis) with a different shade
Calculate the percentage area of the histogram for only those bins that are below 0 (on the x-axis)
I can find a lot of examples online, but none that use seaborn's FacetGrid.
g = sns.FacetGrid(plot_df, col='xyz', height=5)
g.map(plt.hist, "slack", bins=50)
You could loop through the generated axes (for xyz, ax in g.axes_dict.items(): ....) and call your plotting functions for each of those axes.
Or, you could call g.map_dataframe(...) with a custom function. That function will need to draw onto the "current ax".
Changing the x and y labels needs to be done after the call to g.map_dataframe(), because seaborn erases the x and y labels at the end of that function.
You can call plt.setp(g.axes, xlabel='data', ylabel='frequency') to set the labels for all the subplots. Or g.set_ylabels('...') to only set the y labels for the "outer" subplots.
Here is some example code to get you started:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
def individual_plot(**kwargs):
    ax = plt.gca()  # get the current ax
    data = kwargs['data']['slack'].values
    xmin, xmax = data.min(), data.max()
    bin_width = xmax / 50
    # histogram part > 0
    ax.hist(data, bins=np.arange(0.000001, xmax + 0.001, bin_width), color='tomato')
    # histogram part < 0
    ax.hist(data, bins=-np.arange(0, abs(xmin) + bin_width + 0.001, bin_width)[::-1], color='lime')
    # line at x=0
    ax.axvline(0, color='navy', ls='--')
    # calculate and show part < 0
    percent_under_zero = sum(data <= 0) / len(data) * 100
    ax.text(0.5, 0.98, f'part < 0: {percent_under_zero:.1f} %',
            color='k', ha='center', va='top', transform=ax.transAxes)
# first generate some test data
plot_df = pd.DataFrame({'xyz': np.repeat([*'xyz'], 1000),
                        'slack': np.random.randn(3000) * 10 + np.random.choice([10, 500], 3000, p=[0.9, 0.1])})
g = sns.FacetGrid(plot_df, col='xyz', height=5)
g.map_dataframe(individual_plot)
plt.setp(g.axes, xlabel='data', ylabel='frequency')
plt.tight_layout()
plt.show()
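As a variation on the two separate hist calls above, a single set of bin edges can be built so that 0 is always an edge, which means no bin straddles zero. This is only a sketch under that assumption; both helper names are hypothetical.

```python
import math


def zero_aligned_edges(xmin, xmax, bin_width):
    """Return bin edges spaced by bin_width that always include 0 as an edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    first = math.floor(xmin / bin_width)  # index of the lowest edge (at or below xmin)
    last = math.ceil(xmax / bin_width)    # index of the highest edge (at or above xmax)
    return [i * bin_width for i in range(first, last + 1)]


def percent_at_or_below_zero(values):
    """Percentage of the values that are less than or equal to 0."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return 100.0 * sum(1 for v in values if v <= 0) / len(values)


edges = zero_aligned_edges(-25.0, 40.0, 10.0)
share = percent_at_or_below_zero([-1, 0, 1, 2])
```

The edges could be passed as bins= to ax.hist, and the returned patches whose right edge is at or below 0 recolored with set_facecolor. Note that a value of exactly 0 lands in the first bin to the right of zero, so the colored area and the "<= 0" percentage can differ slightly.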

Python matplotlib polar coordinate is not plotting as it is supposed to be

I am plotting from a CSV file that contains Cartesian coordinates and I want to change it to Polar coordinates, then plot using the Polar coordinates.
Here is the code
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
df = pd.read_csv('test_for_plotting.csv',index_col = 0)
x_temp = df['x'].values
y_temp = df['y'].values
df['radius'] = np.sqrt( np.power(x_temp,2) + np.power(y_temp,2) )
df['theta'] = np.arctan2(y_temp,x_temp)
df['degrees'] = np.degrees(df['theta'].values)
df['radians'] = np.radians(df['degrees'].values)
ax = plt.axes(polar = True)
ax.set_aspect('equal')
ax.axis("off")
sns.set(rc={'axes.facecolor':'white', 'figure.facecolor':'white','figure.figsize':(10,10)})
# sns.scatterplot(data = df, x = 'x',y = 'y', s= 1,alpha = 0.1, color = 'black',ax = ax)
sns.scatterplot(data = df, x = 'radians',y = 'radius', s= 1,alpha = 0.1, color = 'black',ax = ax)
plt.tight_layout()
plt.show()
Here is the dataset
If you run this code with polar = False and plot with sns.scatterplot(data = df, x = 'x',y = 'y', s= 1,alpha = 0.1, color = 'black',ax = ax), it results in this picture
After setting polar = True and plotting with sns.scatterplot(data = df, x = 'radians',y = 'radius', s= 1,alpha = 0.1, color = 'black',ax = ax), it is supposed to give this
But it does not work: if you run the actual code, the shape in the polar plot is the same as in the Cartesian one, which does not make sense and does not match the picture I showed for polar (in case you are wondering where the second picture came from, I plotted it using R).
I would appreciate your help and insights and thanks in advance!
For a polar plot, the "x-axis" represents the angle in radians. So, you need to switch x and y, and convert the angles to radians (I also added ax=ax, as the axes was created explicitly):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
data = {'radius': [0, 0.5, 1, 1.5, 2, 2.5], 'degrees': [0, 25, 75, 155, 245, 335]}
df_temp = pd.DataFrame(data)
ax = plt.axes(polar=True)
sns.scatterplot(x=np.radians(df_temp['degrees']), y=df_temp['radius'].to_numpy(),
                s=100, alpha=1, color='black', ax=ax)
for deg, y in zip(df_temp['degrees'], df_temp['radius']):
    x = np.radians(deg)
    ax.axvline(x, color='skyblue', ls=':')
    ax.text(x, y, f' {deg}', color='crimson')
ax.set_rlabel_position(-15) # Move radial labels away from plotted dots
plt.tight_layout()
plt.show()
About your new question: if you have an xy plot, and you convert these xy values to polar coordinates, and then plot these on a polar plot, you'll get again the same plot.
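To see why the shape does not change, here is a small round-trip check in plain Python (the helper names are just for illustration): converting (x, y) to (radius, theta) and placing that point on a polar grid puts it back at the original (x, y).

```python
import math


def to_polar(x, y):
    """Convert Cartesian (x, y) to polar (radius, theta in radians)."""
    return math.hypot(x, y), math.atan2(y, x)


def to_cartesian(radius, theta):
    """Position at which a polar plot draws the point (radius, theta)."""
    return radius * math.cos(theta), radius * math.sin(theta)


points = [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0)]
round_trip = [to_cartesian(*to_polar(x, y)) for x, y in points]
```

Every entry of round_trip matches the corresponding entry of points up to floating point error, so a polar axes fed with these converted values reproduces the Cartesian picture.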
After some more testing with the data, I decided to create the plot directly with matplotlib, as seaborn makes some changes that don't have exactly equal effects across seaborn and matplotlib versions.
What seems to be happening in R:
The angles (given by "x") are spread out to fill the range (0, 2 pi). This requires either rescaling x or changing how the x-values are mapped to angles. One way to do this is to subtract the minimum, then divide the result by the new maximum and multiply by 2 pi.
The 0 of the angles is at the top, and the angles go clockwise.
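The rescaling described in the first point can be sketched on its own in plain Python. The function name is hypothetical, and the guard for identical values is an addition that the description above does not mention.

```python
import math


def rescale_to_full_circle(values):
    """Linearly map values so the minimum becomes 0 and the maximum becomes 2*pi."""
    values = list(values)
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot rescale when all values are equal")
    scale = 2 * math.pi / (highest - lowest)
    return [(v - lowest) * scale for v in values]


angles = rescale_to_full_circle([10, 15, 20, 30])
```

With this mapping the smallest and largest x end up at the same direction on the circle (0 and 2 pi), which seems to be what the R plot does as well.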
The following code should create the plot with Python. You might want to experiment with alpha and with s in the scatter plot options. (By default, the scatter dots get an outline, which often isn't desired when working with very small dots; it can be removed with lw=0.)
ax = plt.axes(polar=True)
ax.set_aspect('equal')
ax.axis('off')
x_temp = df['x'].to_numpy()
y_temp = df['y'].to_numpy()
x_temp -= x_temp.min()
x_temp = x_temp / x_temp.max() * 2 * np.pi
ax.scatter(x=x_temp, y=y_temp, s=0.05, alpha=1, color='black', lw=0)
ax.set_rlim(y_temp.min(), y_temp.max())
ax.set_theta_zero_location("N") # set zero at the north (top)
ax.set_theta_direction(-1) # go clockwise
plt.show()
At the left the resulting image, at the right using the y-values for coloring (ax.scatter(..., c=y_temp, s=0.05, alpha=1, cmap='plasma_r', lw=0)):

Rotate matplotlib colourmap

The ProPlot Python package adds additional features to the Matplotlib library, including colourmap manipulations. One feature that is particularly attractive to me is the ability to rotate/shift colourmaps. To give you an example:
import proplot as pplot
import matplotlib.pyplot as plt
import numpy as np
state = np.random.RandomState(51423)
data = state.rand(30, 30).cumsum(axis=1)
fig, axes = plt.subplots(ncols=3, figsize=(9, 4))
fig.patch.set_facecolor("white")
axes[0].pcolormesh(data, cmap="Blues")
axes[0].set_title("Blues")
axes[1].pcolormesh(data, cmap="Blues_r")
axes[1].set_title("Reversed Blues")
axes[2].pcolormesh(data, cmap="Blues_s")
axes[2].set_title("Rotated Blues")
plt.tight_layout()
plt.show()
In the third column, you see the 180° rotated version of Blues. Currently ProPlot suffers from a bug that doesn't allow the user to revert the plotting style to Matplotlib's default style, so I was wondering if there was an easy way to rotate a colourmap in Matplotlib without resorting to ProPlot. I always found cmap manipulations in Matplotlib a bit arcane, so any help would be much appreciated.
If what you are trying to do is shift the colormaps, this can be done (relatively) easily:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
def shift_cmap(cmap, frac):
    """Shifts a colormap by a certain fraction.
    Keyword arguments:
    cmap -- the colormap to be shifted. Can be a colormap name or a Colormap object
    frac -- the fraction of the colorbar by which to shift (must be between 0 and 1)
    """
    N = 256
    if isinstance(cmap, str):
        cmap = plt.get_cmap(cmap)
    n = cmap.name
    # sample positions along the colormap, rolled by the requested fraction
    x = np.linspace(0, 1, N)
    out = np.roll(x, int(N*frac))
    new_cmap = LinearSegmentedColormap.from_list(f'{n}_s', cmap(out))
    return new_cmap
Demonstration:
x = np.linspace(0,1,100)
x = np.vstack([x,x])
cmap1 = plt.get_cmap('Blues')
cmap2 = shift_cmap(cmap1, 0.25)
fig, (ax1, ax2) = plt.subplots(2,1)
ax1.imshow(x, aspect='auto', cmap=cmap1)
ax2.imshow(x, aspect='auto', cmap=cmap2)
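The index arithmetic behind the shift can be tried on a short list, without matplotlib. This is only an illustration of the rolling step; roll_right and shift_for_fraction are made-up names, and roll_right is meant to behave like np.roll on a 1-D array.

```python
def roll_right(items, shift):
    """Rotate a sequence to the right by shift positions, wrapping around."""
    items = list(items)
    if not items:
        return items
    shift %= len(items)
    if shift == 0:
        return items
    return items[-shift:] + items[:-shift]


def shift_for_fraction(n_colors, frac):
    """Number of positions corresponding to a fraction of the colormap length."""
    return int(n_colors * frac)


colors = ["c0", "c1", "c2", "c3"]
rolled = roll_right(colors, shift_for_fraction(len(colors), 0.25))
```

Here rolled starts with the last color of the original list, so a quarter of the colormap has wrapped around from the end to the start. A rotation given in degrees corresponds to frac = degrees / 360.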
To reverse a ListedColormap, there is a built-in reversed() but for the intended rotation, we have to create our own function.
#fake data generation
import numpy as np
np.random.seed(123)
#numpy array containing x, y, and color
arr = np.random.random(30).reshape(3, 10)
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
def rotate_cm(co_map, deg=180):
    #define a function where the colormap is rotated by a certain degree
    #180° shifts by 50%, 360° no change
    n = co_map.N
    #if rotating in the opposite direction feels more intuitive, reverse the sign here
    deg = -deg%360
    if deg < 0:
        deg += 360
    cutpoint = n * deg // 360
    new_col_arr = [co_map(i) for i in range(cutpoint, n)] + [co_map(i) for i in range(cutpoint)]
    return ListedColormap(new_col_arr)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(21,7))
#any listed colormap
my_cm = plt.get_cmap("inferno")
#normal color map
cb1 = ax1.scatter(*arr[:2,:], c=arr[2,:], cmap=my_cm, marker="o")
plt.colorbar(cb1, ax=ax1)
ax1.set_title("regular colormap")
#reversed colormap
cb2 = ax2.scatter(*arr[:2,:], c=arr[2,:], cmap=my_cm.reversed(), marker="o")
plt.colorbar(cb2, ax=ax2)
ax2.set_title("reversed colormap")
#rotated colormap
cb3 = ax3.scatter(*arr[:2,:], c=arr[2,:], cmap=rotate_cm(my_cm, 90), marker="o")
#you can also combine the rotation with reversed()
#cb3 = ax3.scatter(*arr[:2,:], c=arr[2,:], cmap=rotate_cm(my_cm, 90).reversed(), marker="o")
plt.colorbar(cb3, ax=ax3)
ax3.set_title("colormap rotated by 90°")
plt.show()
Sample output:

Python pcolormesh with separate alpha value for each bin

Let's say I have the following dataset:
import numpy as np
import matplotlib.pyplot as plt
x_bins = np.arange(10)
y_bins = np.arange(10)
z = np.random.random((9,9))
I can easily plot this data with
plt.pcolormesh(x_bins, y_bins, z, cmap = 'viridis')
However, let's say I now add some alpha value for each point:
a = np.random.random((9,9))
How can I change the alpha value of each box in the pcolormesh plot to match the corresponding value in array "a"?
The mesh created by pcolormesh can only have one alpha for the complete mesh. To set an individual alpha for each cell, the cells need to be created one by one as rectangles.
The code below shows the pcolormesh without alpha at the left, and the mesh of rectangles with alpha at the right. Note that on the spots where the rectangles touch, the semi-transparency causes some unequal overlap. This can be mitigated by not drawing the cell edge (edgecolor='none'), or by longer black lines to separate the cells.
The code below changes the x dimension to make it easier to verify that x and y aren't mixed up. relim and autoscale are needed because, with matplotlib's default behavior, the x and y limits aren't changed by adding patches.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Patch
x_bins = np.arange(12)
y_bins = np.arange(10)
z = np.random.random((9, 11))
a = np.random.random((9, 11))
cmap = plt.get_cmap('inferno')
norm = plt.Normalize(z.min(), z.max())
fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.pcolormesh(x_bins, y_bins, z, cmap=cmap, norm=norm)
for i in range(len(x_bins) - 1):
    for j in range(len(y_bins) - 1):
        rect = Rectangle((x_bins[i], y_bins[j]), x_bins[i + 1] - x_bins[i], y_bins[j + 1] - y_bins[j],
                         facecolor=cmap(norm(z[j, i])), alpha=a[j, i], edgecolor='none')
        ax2.add_patch(rect)
# ax2.vlines(x_bins, y_bins.min(), y_bins.max(), edgecolor='black')
# ax2.hlines(y_bins, x_bins.min(), x_bins.max(), edgecolor='black')
ax2.relim()
ax2.autoscale(enable=True, tight=True)
plt.show()
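If the bins are regularly spaced, another possible route is to build one RGBA color per cell and draw the whole grid as an image. The sketch below only shows the alpha substitution step on plain nested lists; apply_alpha is a hypothetical helper, not a matplotlib function.

```python
def apply_alpha(rgba_rows, alpha_rows):
    """Return a copy of a 2-D grid of RGBA tuples with the alpha channel replaced."""
    if len(rgba_rows) != len(alpha_rows):
        raise ValueError("colour grid and alpha grid need the same number of rows")
    result = []
    for colour_row, alpha_row in zip(rgba_rows, alpha_rows):
        if len(colour_row) != len(alpha_row):
            raise ValueError("each row needs one alpha value per cell")
        result.append([(r, g, b, float(a))
                       for (r, g, b, _), a in zip(colour_row, alpha_row)])
    return result


# Hypothetical 1 x 2 grid: an opaque red and an opaque green cell.
grid = [[(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)]]
faded = apply_alpha(grid, [[0.5, 0.25]])
```

With matplotlib, the colors could come from cmap(norm(z)), which returns an array of shape (rows, cols, 4), and the grid with replaced alpha can be passed to ax.imshow, which accepts RGBA images (origin='lower' and extent would be needed to match the pcolormesh orientation). Since imshow draws equally sized cells, this does not replace the rectangle approach for irregular bins.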

Matplotlib - label each bin

I'm currently using Matplotlib to create a histogram:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as pyplot
...
fig = pyplot.figure()
ax = fig.add_subplot(1,1,1,)
n, bins, patches = ax.hist(measurements, bins=50, range=(graph_minimum, graph_maximum), histtype='bar')
#ax.set_xticklabels([n], rotation='vertical')
for patch in patches:
    patch.set_facecolor('r')
pyplot.title('Spam and Ham')
pyplot.xlabel('Time (in seconds)')
pyplot.ylabel('Bits of Ham')
pyplot.savefig(output_filename)
I'd like to make the x-axis labels a bit more meaningful.
Firstly, the x-axis ticks here seem to be limited to five ticks. No matter what I do, I can't seem to change this - even if I add more xticklabels, it only uses the first five. I'm not sure how Matplotlib calculates this, but I assume it's auto-calculated from the range/data?
Is there some way I can increase the resolution of x-tick labels - even to the point of one for each bar/bin?
(Ideally, I'd also like the seconds to be reformatted in micro-seconds/milli-seconds, but that's a question for another day).
Secondly, I'd like each individual bar labeled - with the actual number in that bin, as well as the percentage of the total of all bins.
The final output might look something like this:
Is something like that possible with Matplotlib?
Cheers,
Victor
Sure! To set the ticks, just, well... Set the ticks (see matplotlib.pyplot.xticks or ax.set_xticks). (Also, you don't need to manually set the facecolor of the patches. You can just pass in a keyword argument.)
For the rest, you'll need to do some slightly more fancy things with the labeling, but matplotlib makes it fairly easy.
As an example:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter
data = np.random.randn(82)
fig, ax = plt.subplots()
counts, bins, patches = ax.hist(data, facecolor='yellow', edgecolor='gray')
# Set the ticks to be at the edges of the bins.
ax.set_xticks(bins)
# Set the xaxis's tick labels to be formatted with 1 decimal place...
ax.xaxis.set_major_formatter(FormatStrFormatter('%0.1f'))
# Change the colors of bars at the edges...
twentyfifth, seventyfifth = np.percentile(data, [25, 75])
for patch, rightside, leftside in zip(patches, bins[1:], bins[:-1]):
    if rightside < twentyfifth:
        patch.set_facecolor('green')
    elif leftside > seventyfifth:
        patch.set_facecolor('red')
# Label the raw counts and the percentages below the x-axis...
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
for count, x in zip(counts, bin_centers):
    # Label the raw counts
    ax.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'),
                xytext=(0, -18), textcoords='offset points', va='top', ha='center')
    # Label the percentages
    percent = '%0.0f%%' % (100 * float(count) / counts.sum())
    ax.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'),
                xytext=(0, -32), textcoords='offset points', va='top', ha='center')
# Give ourselves some more room at the bottom of the plot
plt.subplots_adjust(bottom=0.15)
plt.show()
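The label text itself can be computed separately from the drawing. A minimal sketch in plain Python, with hypothetical helper names, that mirrors the bin centers and the count/percentage strings used in the example above:

```python
def bin_centers(edges):
    """Midpoint of each bin, given the list of bin edges."""
    return [(left + right) / 2 for left, right in zip(edges[:-1], edges[1:])]


def count_and_percent_labels(counts):
    """Return (count, percentage) label pairs, one per bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("the histogram is empty")
    return [(f"{count:g}", f"{100 * count / total:.0f}%") for count in counts]


centers = bin_centers([0, 1, 2, 4])
labels = count_and_percent_labels([2.0, 3.0, 5.0])
```

Formatting with :g prints the float counts returned by hist as "2" rather than "2.0". Each pair could then be handed to ax.annotate in the same way as in the loop above.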
One thing I wanted to add to histograms plotted with "density = True" was the relative frequency value of each bin. I searched but couldn't find a function that would do that. The solution I made is shown in the following image:
The function:
import numpy as np
def label_densityHist(ax, n, bins, x=4, y=0.01, r=2, **kwargs):
    """
    Add labels, the relative frequency of each bin, to the bins of a density histogram.
    :param ax: matplotlib Axes object
        The axes to draw on.
    :param n: list, array of int, float
        The values of the histogram bins.
    :param bins: list, array of int, float
        The edges of the bins.
    :param x: int, float
        Controls the x position of the bin labels. The higher, the lower the value on the x-axis.
        Default: 4
    :param y: int, float
        Controls the y position of the bin labels. The higher, the greater the value on the y-axis.
        Default: 0.01
    :param r: int
        Number of decimal places.
        Default: 2
    :param **kwargs: Text properties in matplotlib
    :return: None
    Example
        import matplotlib.pyplot as plt
        import numpy as np
        dados = np.random.randn(100)
        axe = plt.gca()
        n, bins, _ = axe.hist(x=dados, density=True, edgecolor='black')
        label_densityHist(axe, n, bins)
        plt.show()
    Example:
        import matplotlib.pyplot as plt
        import numpy as np
        dados = np.random.randn(100)
        axe = plt.gca()
        n, bins, _ = axe.hist(x=dados, density=True, edgecolor='black')
        label_densityHist(axe, n, bins, x=6, fontsize='large')
        plt.show()
    Reference:
    [1]https://matplotlib.org/3.1.1/api/text_api.html#matplotlib.text.Text
    """
    k = []
    # calculate the relative frequency of each bin (density * bin width)
    for i in range(0, len(n)):
        k.append((bins[i + 1] - bins[i]) * n[i])
    # round to r decimal places
    k = np.around(k, r)
    # plot the label/text to each bin
    for i in range(0, len(n)):
        x_pos = (bins[i + 1] - bins[i]) / x + bins[i]
        y_pos = n[i] + (n[i] * y)
        label = str(k[i])  # relative frequency of each bin
        ax.text(x_pos, y_pos, label, **kwargs)
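The core of the function above is that, in a density histogram, each bar height times its bin width gives the share of the data in that bin. A small check in plain Python (the function name is hypothetical):

```python
def relative_frequencies(densities, edges):
    """Relative frequency of each bin: density value times bin width."""
    if len(edges) != len(densities) + 1:
        raise ValueError("need exactly one more edge than density values")
    return [density * (right - left)
            for density, left, right in zip(densities, edges[:-1], edges[1:])]


# Hypothetical density histogram with two bins of unequal width.
freqs = relative_frequencies([0.5, 0.25], [0.0, 1.0, 3.0])
```

For a density histogram covering all the data, these values should add up to 1, which makes for a quick sanity check before labelling the bars.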
To add SI prefixes to your axis labels you want to use QuantiPhy. In fact, in its documentation it has an example that shows how to do this exact thing: MatPlotLib Example.
I think you would add something like this to your code:
from matplotlib.ticker import FuncFormatter
from quantiphy import Quantity
time_fmtr = FuncFormatter(lambda v, p: Quantity(v, 's').render(prec=2))
ax.xaxis.set_major_formatter(time_fmtr)
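If you would rather not add a dependency, a hand-written formatter is also possible. This is a rough sketch, not a replacement for QuantiPhy: format_seconds is a made-up name, it only knows s, ms, us and ns, and it writes "us" instead of the micro sign to stay in plain ASCII.

```python
def format_seconds(value, pos=None):
    """Format a time in seconds using s, ms, us or ns.

    The (value, pos) signature is the one matplotlib's FuncFormatter expects;
    pos is ignored here.
    """
    magnitude = abs(value)
    if magnitude == 0:
        return "0 s"
    for factor, unit in ((1, "s"), (1e-3, "ms"), (1e-6, "us")):
        if magnitude >= factor:
            return f"{value / factor:g} {unit}"
    return f"{value / 1e-9:g} ns"


examples = [format_seconds(v) for v in (1.5, 0.0025, 0.000004, 0)]
```

It could be used in the same place as the QuantiPhy version, for example ax.xaxis.set_major_formatter(FuncFormatter(format_seconds)).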
