Seaborn Heatmap Y-tick Labels Stacked


I'm working with biological data, and am using the heatmap from Seaborn to plot Pearson R values so I can visually compare expression of each of 22 cell types with every other cell type (making a 22x22 heatmap).
I've had two separate problems. Here is what my plots looked like the other day:
You can see that in one plot the y-tick labels are spaced, but the last label is clipped off. Then in the other plot, all of the y-tick labels are present, but are spaced too tightly.
Here's what my plots look like today. Foolishly, I'm not sure what I changed while trying to fix the issue in the first image (aside from changing vmin, vmax, and the color scheme), but now the y-tick labels are all stacked on top of each other.
Here is my code to generate the heatmaps:
import matplotlib.pyplot as plt
import seaborn as sns
def cancerHeatmap(patients, cancer):
    x_labels = ['Naive B', 'Memory B', 'Plasma', 'CD8 T', 'CD4 T Naive', 'CD4 T MR',
                'CD4 T MA', 'FH T', 'Tregs', 'GD T',
                'NK resting', 'NK activated', 'Monos', 'Macros M0', 'Macros M1', 'Macros M2',
                'Dendritic resting', 'Dendritic activated', 'Mast resting', 'Mast activated',
                'Eosinophils', 'Neutrophils']
    pearsonrs = getPearsons(patients, cancer)  # 22x22 array of Pearson R values
    axs = sns.heatmap(pearsonrs, cmap='RdYlBu', vmin=-0.6, vmax=0.6)
    axs.set_yticklabels(x_labels, rotation=0, fontsize=10)
    axs.set_xticklabels(x_labels, rotation=90)
    axs.xaxis.set_ticks_position('top')
    axs.set_xlabel(cancer)
    return axs
sns.set_style("whitegrid")
sns.despine()
plt.figure(figsize=(12, 70))
plt.suptitle('Subtitle')
for i, cancer in enumerate(cancer_types):
    print(cancer)
    # two heatmaps per row; integer division keeps the row count an int
    plt.subplot(len(cancer_types) // 2 + 1, 2, i + 1)
    ax = cancerHeatmap(patients, cancer)
plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.savefig('outfile.pdf', dpi=200)
The important details are:
"getPearsons" is a function that returns the numpy array of all of the values that go directly into the heatmap (stored as "pearsonrs")
"cancer_types" is a list of the different cancer types that I'll be generating heatmaps for
I have no idea why this is happening, especially because I can generate one plot at a time with this code, and the axes are perfect:
x_labels = ['Naive B', 'Memory B', 'Plasma', 'CD8 T', 'CD4 T Naive', 'CD4 T MR',
            'CD4 T MA', 'FH T', 'Tregs', 'GD T',
            'NK resting', 'NK activated', 'Monos', 'Macros M0', 'Macros M1', 'Macros M2',
            'Dendritic resting', 'Dendritic activated', 'Mast resting', 'Mast activated',
            'Eosinophils', 'Neutrophils']
axs = sns.heatmap(np.array(pearsonrs), cmap='rainbow', vmin=-0.6, vmax=0.6)
axs.invert_yaxis()
axs.set_yticklabels(x_labels, rotation=0, fontsize=10)
axs.set_xticklabels(x_labels, rotation=90, fontsize=10)
axs.set_xlabel('Cancer')
axs.xaxis.set_ticks_position('top')
Any help is immensely appreciated.
Edit:
I ran identical scripts on my rMBP and my coworker's rMBP, and mine produced the figures with the stacked axis labels (image 2), and my coworker's computer produced the figures with odd spacing (image 1).
Edit 2:
It turns out that the Conda installation was slightly different on each computer. Updating both computers to the most recent Conda (4.5.9) does not change the heatmaps that either computer produces.
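One way to keep every row label on its own row (a suggestion on my part, not a confirmed explanation of why the two machines differ) is to hand the labels to sns.heatmap through its xticklabels/yticklabels parameters, so the labels are created together with the tick positions, and to draw each panel into an explicit Axes from plt.subplots. The helper names labelled_heatmap and plot_all are hypothetical, and the random correlation matrices and "cancer A/B/C" names are synthetic stand-ins for getPearsons and cancer_types.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

CELL_TYPES = ['Naive B', 'Memory B', 'Plasma', 'CD8 T', 'CD4 T Naive', 'CD4 T MR',
              'CD4 T MA', 'FH T', 'Tregs', 'GD T',
              'NK resting', 'NK activated', 'Monos', 'Macros M0', 'Macros M1', 'Macros M2',
              'Dendritic resting', 'Dendritic activated', 'Mast resting', 'Mast activated',
              'Eosinophils', 'Neutrophils']


def labelled_heatmap(matrix, labels, title, ax):
    """Hypothetical helper: one heatmap with every row and column labelled."""
    sns.heatmap(matrix, cmap='RdYlBu', vmin=-0.6, vmax=0.6,
                xticklabels=labels, yticklabels=labels,  # one label per cell
                square=True, ax=ax)
    ax.tick_params(axis='y', labelrotation=0, labelsize=10)
    ax.tick_params(axis='x', labelrotation=90, labelsize=10)
    ax.xaxis.tick_top()  # move x ticks and their labels above the matrix
    ax.set_xlabel(title)
    return ax


def plot_all(matrices, labels, ncols=2):
    """Hypothetical helper: one panel per entry of `matrices` (title -> 2-D array)."""
    nrows = math.ceil(len(matrices) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(8 * ncols, 8 * nrows), squeeze=False)
    for ax, (title, matrix) in zip(axes.flat, matrices.items()):
        labelled_heatmap(matrix, labels, title, ax)
    for ax in axes.flat[len(matrices):]:
        ax.set_visible(False)  # hide unused panels in the last row
    fig.tight_layout()
    return fig, axes


# Synthetic stand-in for getPearsons(): correlations between 22 random variables.
rng = np.random.default_rng(0)
fake = {name: np.corrcoef(rng.normal(size=(len(CELL_TYPES), 50)))
        for name in ['cancer A', 'cancer B', 'cancer C']}
fig, axes = plot_all(fake, CELL_TYPES)
```

Calling fig.savefig('outfile.pdf') afterwards writes the grid to a single file, as in the original loop.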

Related

Plotting with sns.catplot gives bad graphs

I am trying to plot my data so that my predicted values are superimposed on the actual data values. It does the job, but the bars that represent the y values become ridiculously small and uninterpretable, and the x-axis labels only show at the bottom of the last graph.
A bit of background: each CLASS_ID gets its own subplot (row), with its own actual and predicted values.
import matplotlib.pyplot as plt
import seaborn as sns
g = sns.catplot(data=plt_df,
                y='Outcome',
                x='DT',
                kind='bar',
                ci=None,
                hue='Outcome_Type',
                row='CLASS_ID',
                palette=sns.color_palette(['red', 'blue']),
                height=10,
                aspect=3.5)
g.fig.subplots_adjust(hspace=1)
fig, ax = plt.subplots(figsize=(20, 9))  # note: this opens a second, empty figure
g.fig.suptitle("Distribution Plot Comparing Actual and Predicted Visits given calibrated Betas - " + describe_plot)
g.set_xlabels('Drive Time (Mins)')
g.set_ylabels('Visits Percentage')
plt.xticks(rotation=90)  # acts only on the current Axes, i.e. the empty figure above
plt.show()
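A hedged sketch of one way to get labels on every facet: passing sharex=False to catplot keeps each row's x tick labels, and rotating them with tick_params on every Axes in g.axes avoids plt.xticks, which only touches the current Axes. A smaller height/aspect also keeps the bars readable. Column names come from the question; the plt_df contents below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for plt_df: two classes, actual vs predicted per drive time.
rng = np.random.default_rng(1)
rows = []
for class_id in ['A', 'B']:
    for dt in range(0, 60, 10):
        for kind in ['Actual', 'Predicted']:
            rows.append({'CLASS_ID': class_id, 'DT': dt,
                         'Outcome_Type': kind, 'Outcome': rng.uniform(0, 30)})
plt_df = pd.DataFrame(rows)

g = sns.catplot(data=plt_df, x='DT', y='Outcome', hue='Outcome_Type',
                row='CLASS_ID', kind='bar',
                palette=['red', 'blue'],
                sharex=False,        # every facet keeps its own x tick labels
                height=3, aspect=3)
g.set_axis_labels('Drive Time (Mins)', 'Visits Percentage')
for ax in g.axes.flat:
    ax.tick_params(axis='x', labelrotation=90)  # rotate labels on every facet
g.fig.subplots_adjust(hspace=0.6, top=0.9)
g.fig.suptitle('Actual vs predicted visits')
```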

seaborn distplot different bar width on each figure

Sorry for posting an image, but I think it is the best way to show my problem.
As you can see, the bin widths differ between figures; as I understand it, each bin covers a range of rent_hour values. I'm not sure why different figures have different bin widths even though I didn't set any.
My code is as follows:
import matplotlib.pyplot as plt
import seaborn as sns
figure, axes = plt.subplots(nrows=4, ncols=3)
figure.set_size_inches(18, 14)
plt.subplots_adjust(hspace=0.5)
for ax, age_g in zip(axes.ravel(), age_cat):
    # weekday rows (day_of_week 0-4) for this age group
    group = total_usage_df.loc[(total_usage_df.age_group == age_g) & (total_usage_df.day_of_week <= 4)]
    sns.distplot(group.rent_hour, ax=ax, kde=False)
    ax.set(title=age_g)
    ax.set_xlim([0, 24])
figure.suptitle("Weekday usage pattern", size=25)
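The differing widths are consistent with each subplot picking its own bins from its own data: when bins is not given, distplot chooses a bin count per call (Freedman-Diaconis rule, capped at 50). A minimal sketch of sharing one set of edges across all panels, using plain matplotlib ax.hist instead of distplot (which newer seaborn versions deprecate). The three age groups and their rent hours are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for total_usage_df: rent hours for three age groups.
rng = np.random.default_rng(2)
groups = {name: rng.integers(0, 24, size=n)
          for name, n in [('10s', 50), ('20s', 500), ('30s', 5000)]}

# One set of edges for every subplot: 24 one-hour bins from 0 to 24.
hour_bins = np.arange(0, 25)

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))
for ax, (name, hours) in zip(axes.ravel(), groups.items()):
    ax.hist(hours, bins=hour_bins)  # y-axis shows raw counts per hour
    ax.set(title=name, xlim=(0, 24))
fig.suptitle('Weekday usage pattern')
```

Because every panel uses hour_bins, all bars are one hour wide regardless of how many rows each group has.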
Additional question:
According to "Seaborn: How to get the count in y axis for distplot using PairGrid", kde=False makes the y-axis show counts. However, the example in the documentation (http://seaborn.pydata.org/generated/seaborn.distplot.html) uses kde=False and still seems to show something else. How can I make the y-axis show counts?
I've tried
sns.distplot(group.rent_hour, ax=ax, norm_hist=True) and it still seems to give something other than counts.
sns.distplot(group.rent_hour, ax=ax, kde=False) gives me counts, but I don't understand why.

Answer 1:
From the documentation:
norm_hist : bool, optional
If True, the histogram height shows a density rather than a count.
This is implied if a KDE or fitted density is plotted.
So you need to take into account your bin width as well, i.e. compute the area under the curve and not just the sum of the bin heights.
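A small numpy illustration of that point on synthetic data, using np.histogram's density flag as a stand-in for norm_hist: each density height is the count divided by (total count × bin width), so the heights alone do not sum to 1, but heights × widths do.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=12, scale=4, size=1000)  # synthetic "rent_hour"-like values
edges = np.linspace(0, 24, 9)                  # 8 bins, each 3 hours wide

counts, _ = np.histogram(data, bins=edges)
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

# density = counts / (number of values inside the bins * bin width)
manual = counts / (counts.sum() * widths)
area = np.sum(density * widths)  # area under the histogram
print(np.allclose(density, manual), round(area, 6), round(density.sum(), 6))
# prints: True 1.0 0.333333
```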
Answer 2:
import seaborn as sns
# your_data: the 1-D data you want to plot
# Plotting hist without kde (y-axis shows counts)
ax = sns.distplot(your_data, kde=False)
# Creating another Y axis that shares the same x-axis
second_ax = ax.twinx()
# Plotting kde without hist on the second Y axis
sns.distplot(your_data, ax=second_ax, kde=True, hist=False)
# Removing Y ticks from the second axis
second_ax.set_yticks([])

Excluding a certain range of bins in a matplotlib histogram?

I'm using matplotlib to look at how wins are distributed based on betting odds for the MLB. The issue is that because betting odds are either >= 100 or <= -100, there's a big gap in the middle of my histogram.
Is there any way to exclude certain bins (specifically anything between -100 and 100) so that the bars of the chart flow more smoothly?
Here's the code I have right now:
import matplotlib.pyplot as plt
num_bins = 20
fig, ax = plt.subplots()
n, bins, patches = ax.hist(winner_odds_df['WinnerOdds'], num_bins,
                           range=range_of_winner_odds)
ax.set_xlabel('Betting Odds')
ax.set_ylabel('Win Frequency')
ax.set_title('Histogram of Favorite Win Frequency Based on Betting Odds (2018)')
fig.tight_layout()
plt.show()
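Another option, not from the original thread: American odds never fall strictly between -100 and +100, and both -100 and +100 mean even money, so you could shift each side 100 toward zero before binning and relabel the ticks back in odds. The helpers close_gap and odds_label are hypothetical, and the sample odds are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def close_gap(odds):
    """Map American odds onto a gap-free scale: +100 and -100 both become 0."""
    odds = np.asarray(odds, dtype=float)
    return np.where(odds >= 100, odds - 100, odds + 100)


def odds_label(value, _pos):
    """Inverse of close_gap, used only to label the ticks."""
    if value > 0:
        return f'+{value + 100:.0f}'
    if value < 0:
        return f'{value - 100:.0f}'
    return '±100'


winner_odds = np.array([-250, -180, -150, -120, -110, 105, 115, 130, 160, 210])  # made up
fig, ax = plt.subplots()
ax.hist(close_gap(winner_odds), bins=10)
ax.xaxis.set_major_formatter(FuncFormatter(odds_label))
ax.set_xlabel('Betting Odds')
ax.set_ylabel('Win Frequency')
```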

You could break your chart's x-axis as explained here, by plotting on two different axes that are made to visually look like one plot. The essential part, rewritten to apply to the x-axis instead of the y-axis, is:
import matplotlib.pyplot as plt
f, (axl, axr) = plt.subplots(1, 2, sharey=True)
# plot the same data on both axes
axl.hist(winner_odds_df['WinnerOdds'], num_bins)
axr.hist(winner_odds_df['WinnerOdds'], num_bins)
# zoom-in / limit the view to different portions of the data
axl.set_xlim(-500, -100)  # negative odds (favorites)
axr.set_xlim(100, 500)  # positive odds (underdogs)
# hide the spines between axl and axr
axl.spines['right'].set_visible(False)
axr.spines['left'].set_visible(False)
axr.yaxis.tick_right()
# How much space to leave between plots
plt.subplots_adjust(wspace=0.15)
See the linked document for how to polish this by adding diagonal break lines; the code above produces the basic version without them.
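A sketch of those break marks for a left/right split, modelled on the marker-based approach in matplotlib's broken-axis example (the exact styling here is my assumption). The odds data is synthetic, and each side gets its own bins so that no bin straddles the gap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
# Synthetic American odds: favorites at or below -100, underdogs at or above +100.
odds = np.concatenate([-rng.uniform(100, 500, 300), rng.uniform(100, 500, 200)])

fig, (axl, axr) = plt.subplots(1, 2, sharey=True)
axl.hist(odds, bins=np.linspace(-500, -100, 11))  # only the negative side
axr.hist(odds, bins=np.linspace(100, 500, 11))    # only the positive side
axl.set_xlim(-500, -100)
axr.set_xlim(100, 500)

# Hide the touching spines and move the right panel's y ticks outward.
axl.spines['right'].set_visible(False)
axr.spines['left'].set_visible(False)
axr.yaxis.tick_right()
fig.subplots_adjust(wspace=0.15)

# Slanted markers drawn in axes coordinates at the top and bottom of the break.
d = 0.5  # horizontal extent of each slash relative to its vertical extent
kwargs = dict(marker=[(-d, -1), (d, 1)], markersize=12, linestyle='none',
              color='k', mec='k', mew=1, clip_on=False)
axl.plot([1, 1], [0, 1], transform=axl.transAxes, **kwargs)  # right edge of left panel
axr.plot([0, 0], [0, 1], transform=axr.transAxes, **kwargs)  # left edge of right panel
```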

maintaining consistent matplotlib axes limits

I'm trying to produce a series of figures showing geometric shapes of different sizes (one shape in each figure) but consistent, equal-spacing axes across each figure. I can't seem to get axis('equal') to play nice with set_xlim in matplotlib.
Here's the closest I've come so far:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches
from matplotlib.collections import PatchCollection
pts0 = np.array([[13, 34], [5, 1], [0, 0], [7, 36], [13, 34]], dtype=np.uint8)
pts1 = np.array([[10, 82], [119, 64], [149, 63], [136, 0], [82, 14], [81, 18],
                 [26, 34], [3, 29], [0, 34], [10, 82]], dtype=np.uint8)
shapes = [pts0, pts1]
for i in range(2):
    pts = shapes[i]
    fig = plt.figure()
    ax1 = fig.add_subplot(111)
    plotShape = patches.Polygon(pts, closed=True, fill=True)
    p = PatchCollection([plotShape], cmap='Greens')
    color = [99]
    p.set_clim([0, 100])
    p.set_array(np.array(color))
    ax1.add_collection(p)
    ax1.axis('equal')
    ax1.set_xlim(-5, 200)
    ax1.set_ylim(-5, 200)
    ax1.set_title('pts' + str(i))
plt.show()
On my system, this results in two figures with the same axes, but neither one of them shows y=0 or the lower portion of the shape. If I remove the line ax1.set_ylim(-5,200), then figure "pts1" looks correct, but the limits of figure "pts0" are such that the shape doesn't show up at all.
My ideal situation is to "anchor" the lower-left corner of the figures at (-5,-5), define xlim as 200, and allow the scaling of the x axis and the value of ymax to "float" as the figure windows are resized, but right now I'd be happy just to consistently get the shapes inside the figures.
Any help would be greatly appreciated!

You can define one of your axes independently first, and then pass the sharex or sharey argument when you define the second one:
new_ax = fig.add_axes([left, bottom, width, height], sharex=old_ax)
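A different approach, not the one suggested above: axis('equal') sets the aspect with adjustable='datalim', which allows matplotlib to change the data limits to achieve equal scaling, so it fights with set_xlim/set_ylim. Requesting equal aspect with adjustable='box' instead keeps the limits you set and shrinks the Axes box. A sketch using the question's two shapes:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

pts0 = np.array([[13, 34], [5, 1], [0, 0], [7, 36], [13, 34]])
pts1 = np.array([[10, 82], [119, 64], [149, 63], [136, 0], [82, 14], [81, 18],
                 [26, 34], [3, 29], [0, 34], [10, 82]])

figs = []
for i, pts in enumerate([pts0, pts1]):
    fig, ax = plt.subplots()
    p = PatchCollection([Polygon(pts, closed=True)], cmap='Greens')
    p.set_clim(0, 100)
    p.set_array(np.array([99]))
    ax.add_collection(p)
    ax.set_xlim(-5, 200)
    ax.set_ylim(-5, 200)
    # Equal data units on both axes, achieved by resizing the box, not the limits.
    ax.set_aspect('equal', adjustable='box')
    ax.set_title(f'pts{i}')
    figs.append((fig, ax))
```

Both figures then show exactly the range -5 to 200 on each axis, so the shapes are directly comparable.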

Matplotlib: Adjust legend location/position

I'm creating a figure with multiple subplots. One of these subplots is giving me some trouble, as none of the axes corners or centers are free (or can be freed up) for placing the legend. What I'd like to do is to have the legend placed somewhere in between the 'upper left' and 'center left' locations, while keeping the padding between it and the y-axis equal to the legends in the other subplots (that are placed using one of the predefined legend location keywords).
I know I can specify a custom position by using loc=(x,y), but then I can't figure out how to make the padding between the legend and the y-axis equal to that used by the other legends. Would it be possible to somehow use the borderaxespad property of the first legend? So far I haven't managed to get that to work.
Any suggestions would be most welcome!
Edit: Here is a (very simplified) illustration of the problem:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 2, sharex=False, sharey=False)
ax[0].axhline(y=1, label='one')
ax[0].axhline(y=2, label='two')
ax[0].set_ylim([0.8,3.2])
ax[0].legend(loc=2)
ax[1].axhline(y=1, label='one')
ax[1].axhline(y=2, label='two')
ax[1].axhline(y=3, label='three')
ax[1].set_ylim([0.8,3.2])
ax[1].legend(loc=2)
plt.show()
What I'd like is that the legend in the right plot is moved down somewhat so it no longer overlaps with the line.
As a last resort I could change the axis limits, but I would very much like to avoid that.

I saw the answer you posted and tried it out. The problem, however, is that it also depends on the figure size.
Here's a new try:
import numpy
import matplotlib.pyplot as plt
x = numpy.linspace(0, 10, 10000)
y = numpy.cos(x) + 2.
x_value = .014 #Offset by eye
y_value = .55
fig, ax = plt.subplots(1, 2, sharex = False, sharey = False)
fig.set_size_inches(50,30)
ax[0].plot(x, y, label = "cos")
ax[0].set_ylim([0.8,3.2])
ax[0].legend(loc=2)
line1, = ax[1].plot(x, y)
ax[1].set_ylim([0.8,3.2])
axbox = ax[1].get_position()
fig.legend([line1], ["cos"], loc = (axbox.x0 + x_value, axbox.y0 + y_value))
plt.show()
So what I'm doing now is getting the position of the subplot and then creating the legend in the coordinates of the entire figure, so the figure size no longer affects the legend position.
The values of x_value and y_value position the legend within the subplot. x_value has been eyeballed to match the "normal" legend closely; adjust it as you like. y_value sets the vertical position of the legend.
Good luck!

After spending way too much time on this, I've come up with the following satisfactory solution (the Transformations Tutorial definitely helped):
bapad = plt.rcParams['legend.borderaxespad']
fontsize = plt.rcParams['font.size']
axline = plt.rcParams['axes.linewidth'] #need this, otherwise the result will be off by a few pixels
pad_points = bapad*fontsize + axline #padding is defined relative to the font size
pad_inches = pad_points/72.0 #convert from points to inches
pad_pixels = pad_inches*fig.dpi #convert from inches to pixels using the figure's dpi
Then, I found that both of the following work and give the same value for the padding:
# Define inverse transform, transforms display coordinates (pixels) to axes coordinates
inv = ax[1].transAxes.inverted()
# Inverse transform two points on the display and find the relative distance
pad_axes = inv.transform((pad_pixels, 0)) - inv.transform((0,0))
pad_xaxis = pad_axes[0]
or
# Find how many pixels there are on the x-axis
x_pixels = ax[1].transAxes.transform((1,0)) - ax[1].transAxes.transform((0,0))
# Compute the ratio between the pixel offset and the total amount of pixels
pad_xaxis = pad_pixels/x_pixels[0]
And then set the legend with:
ax[1].legend(loc=(pad_xaxis,0.6))
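A shorter alternative, based on how matplotlib anchors legends rather than taken from either answer: when bbox_to_anchor is a single point in axes coordinates, the loc corner of the legend is placed at that point and borderaxespad is still applied as an inset, so loc='upper left' with an x of 0 reproduces the horizontal padding of loc=2. Only the y value (0.85 here, picked by eye) has to be chosen.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

names = {1: 'one', 2: 'two', 3: 'three'}
fig, ax = plt.subplots(1, 2)
for y in (1, 2):
    ax[0].axhline(y=y, label=names[y])
for y in (1, 2, 3):
    ax[1].axhline(y=y, label=names[y])
for a in ax:
    a.set_ylim(0.8, 3.2)

left = ax[0].legend(loc='upper left')
# Same corner and same borderaxespad inset, but anchored lower down the axes.
right = ax[1].legend(loc='upper left', bbox_to_anchor=(0, 0.85))
```

Since the anchor is expressed in axes coordinates, it does not depend on the figure size.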
