I'm using matplotlib to look at how wins are distributed based on betting odds for the MLB. The issue is that because betting odds are either >= 100 or <= -100, there's a big gap in the middle of my histogram.
Is there any way to exclude certain bins (specifically anything between -100 and 100) so that the bars of the chart flow more smoothly?
Link to current histogram
Here's the code I have right now:
import matplotlib.pyplot as plt
num_bins = 20
fig, ax = plt.subplots()
n, bins, patches = ax.hist(winner_odds_df['WinnerOdds'], num_bins,
                           range=range_of_winner_odds)
ax.set_xlabel('Betting Odds')
ax.set_ylabel('Win Frequency')
ax.set_title('Histogram of Favorite Win Frequency Based on Betting Odds (2018)')
fig.tight_layout()
plt.show()
You could break your chart's x-axis, as explained in Matplotlib's broken-axis example, by plotting on two different axes that are made to look visually like one plot. The essential part, rewritten to apply to the x-axis instead of the y-axis, is:
import matplotlib.pyplot as plt
odds = winner_odds_df['WinnerOdds']
f, (axl, axr) = plt.subplots(1, 2, sharey=True)
# plot each side of the gap on its own axes with its own bins,
# so no bar straddles the empty (-100, 100) range
axl.hist(odds[odds < 0], num_bins)
axr.hist(odds[odds > 0], num_bins)
# zoom-in / limit the view to the two populated portions of the data
axl.set_xlim(-500, -100)  # negative odds (favorites)
axr.set_xlim(100, 500)    # positive odds (underdogs)
# hide the spines between axl and axr
axl.spines['right'].set_visible(False)
axr.spines['left'].set_visible(False)
axr.yaxis.tick_right()
# How much space to leave between plots
plt.subplots_adjust(wspace=0.15)
See the linked document for how to polish this by adding diagonal break lines. The basic version produced by the code above then looks like this:
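For reference, here is a self-contained sketch of the same idea with two additions taken from the Matplotlib broken-axis example: each side gets its own explicit bin edges, so no bar spans the empty (-100, 100) range, and diagonal break marks are drawn at the inner edges. The odds are randomly generated as a hypothetical stand-in for `winner_odds_df['WinnerOdds']`, and the limits (-400 to -100, 100 to 400) are assumptions to adjust to your data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for winner_odds_df['WinnerOdds']:
# American odds, always <= -100 or >= 100.
rng = np.random.default_rng(0)
odds = np.concatenate([rng.uniform(-400, -100, 300), rng.uniform(100, 400, 200)])

# Separate bin edges per side, so no bin straddles the (-100, 100) gap.
neg_bins = np.linspace(-400, -100, 11)
pos_bins = np.linspace(100, 400, 11)

fig, (axl, axr) = plt.subplots(1, 2, sharey=True)
axl.hist(odds[odds < 0], bins=neg_bins)
axr.hist(odds[odds > 0], bins=pos_bins)
axl.set_xlim(-400, -100)
axr.set_xlim(100, 400)

# Hide the inner spines and ticks so both axes read as one broken x-axis.
axl.spines["right"].set_visible(False)
axr.spines["left"].set_visible(False)
axr.tick_params(left=False)
fig.subplots_adjust(wspace=0.15)

# Diagonal break marks, placed in axes coordinates at the inner edges.
d = 0.5  # vertical-to-horizontal extent of each slanted mark
marks = dict(marker=[(-1, -d), (1, d)], markersize=12, linestyle="none",
             color="k", mec="k", mew=1, clip_on=False)
axl.plot([1, 1], [0, 1], transform=axl.transAxes, **marks)
axr.plot([0, 0], [0, 1], transform=axr.transAxes, **marks)

axl.set_ylabel("Win Frequency")
fig.text(0.5, 0.01, "Betting Odds", ha="center")
```

Using explicit edges rather than a bin count also makes the bar widths on the two sides comparable.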
Question
How can I plot the following scenario, as shown in the attached image? This is for visualising frequency allocation in a network.
Scenario
I have a range of frequency values in a list of tuples like this, where the 1st value is the centre frequency, the 2nd is the total width and the 3rd is the guard band:
frequencies = [('195.71250000', '59.00000000', '2.50000000'), ('195.78750000', '59.00000000', '2.50000000'), ('195.86250000', '59.00000000', '2.50000000')]
and the range of these values is:
range = [('191.32500000', '196.12500000')]
Note: these are dummy values; the actual data is much larger but follows the same general structure.
There are several ways to create this plot. One way is to use ax.vlines to plot the dashed lines for the frequencies and to use ax.bar for the rectangles representing the frequency ranges.
Here is an example where the frequencies are placed at regular intervals within the range you have given (boundaries included), with widths of randomly varying size. No guard bands are computed since, as far as I understand, they should be apparent from the positions of the frequencies and the widths.
Also, the widths are much smaller than in the sample data you provided; otherwise the bars would be very wide and would all overlap, which would look very different from the image you shared.
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
frequencies = np.linspace(191.325, 196.125, num=17)  # 0.3 THz steps, both ends included
widths = rng.uniform(0.05, 0.25, size=frequencies.size)
# Create figure with single Axes and loop through frequencies and widths to plot
# vertical dashed lines for the frequencies and bars for the widths
fig, ax = plt.subplots(figsize=(10,3))
for freq, width in zip(frequencies, widths):
    ax.vlines(x=freq, ymin=0, ymax=10, colors='tab:blue', linestyle='--', zorder=1)
    ax.bar(x=freq, height=6, width=width, color='tab:blue', zorder=2)
# Additional formatting
ax.set_xlabel('Frequency (THz)', labelpad=15, size=12)
ax.set_xticks(frequencies[::2])
ax.yaxis.set_visible(False)
for spine in ['top', 'left', 'right']:
    ax.spines[spine].set_visible(False)
plt.show()
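If you want to feed your original tuples in directly, a sketch along the same lines converts the strings to floats and also draws the guard bands as lighter bars. Two assumptions here are not stated in the question: the width and guard band are taken to be in GHz (hence the hypothetical `GHZ_PER_THZ` conversion), and the guard bands are drawn just outside the stated width; if the width already includes them, draw them inside instead. `plot_allocation` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# The question's tuples: (centre in THz, total width, guard band), as strings.
frequencies = [('195.71250000', '59.00000000', '2.50000000'),
               ('195.78750000', '59.00000000', '2.50000000'),
               ('195.86250000', '59.00000000', '2.50000000')]
GHZ_PER_THZ = 1000  # assumption: width and guard band are given in GHz

def plot_allocation(tuples, ax):
    """Draw a dashed centre line, a channel bar and two guard-band bars per tuple."""
    for centre_s, width_s, guard_s in tuples:
        centre = float(centre_s)
        width = float(width_s) / GHZ_PER_THZ
        guard = float(guard_s) / GHZ_PER_THZ
        ax.vlines(centre, 0, 10, colors="tab:blue", linestyles="--", zorder=1)
        ax.bar(centre, 6, width=width, color="tab:blue", zorder=2)
        # Assumption: guard bands sit immediately outside each channel edge.
        offset = width / 2 + guard / 2
        for guard_centre in (centre - offset, centre + offset):
            ax.bar(guard_centre, 6, width=guard, color="tab:blue", alpha=0.3, zorder=2)

fig, ax = plt.subplots(figsize=(10, 3))
plot_allocation(frequencies, ax)
ax.set_xlim(191.325, 196.125)  # the overall range from the question
ax.set_xlabel("Frequency (THz)")
ax.yaxis.set_visible(False)
```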
Sorry for posting an image, but I think it is the best way to show my problem.
As you can see, the bin widths differ between subplots; as I understand it, each bin covers a range of rent_hour values. I am not sure why different subplots have different bin widths even though I didn't set any.
My code is as follows:
import matplotlib.pyplot as plt
import seaborn as sns
figure, axes = plt.subplots(nrows=4, ncols=3)
figure.set_size_inches(18,14)
plt.subplots_adjust(hspace=0.5)
for ax, age_g in zip(axes.ravel(), age_cat):
    group = total_usage_df.loc[(total_usage_df.age_group == age_g) & (total_usage_df.day_of_week <= 4)]
    sns.distplot(group.rent_hour, ax=ax, kde=False)
    ax.set(title=age_g)
    ax.set_xlim([0, 24])
figure.suptitle("Weekday usage pattern", size=25);
Additional question:
The question "Seaborn: How to get the count in y axis for distplot using PairGrid" says that kde=False makes the y-axis show counts. However, the example in the documentation (http://seaborn.pydata.org/generated/seaborn.distplot.html) uses kde=False and still seems to show something other than counts. How can I make the y-axis show counts?
I've tried:
sns.distplot(group.rent_hour, ax=ax, norm_hist=True), and it still seems to give something other than counts.
sns.distplot(group.rent_hour, ax=ax, kde=False) gives me counts, but I don't understand why.
Answer 1:
From the documentation:
norm_hist : bool, optional
If True, the histogram height shows a density rather than a count.
This is implied if a KDE or fitted density is plotted.
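So with a density you also need to take the bin width into account, i.e. compute the area under the curve rather than just the sum of the bar heights: summed over all bins, height × bin width equals 1. With kde=False and norm_hist left at its default of False, nothing is normalised, which is why you see counts in that case.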
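A small NumPy check makes the difference concrete. It uses random integers as a hypothetical stand-in for rent_hour and 2-hour bins; `np.histogram(..., density=True)` applies the same density definition as a density histogram.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 24, size=500)  # hypothetical rent_hour-like values
edges = np.arange(0, 25, 2)           # 12 bins, each 2 hours wide

counts, _ = np.histogram(data, bins=edges)
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

# Density heights alone do not sum to 1 (here they sum to 1 / bin width = 0.5) ...
print(np.isclose(density.sum(), 0.5))                      # True
# ... but the areas (height * bin width) do.
print(np.isclose((density * widths).sum(), 1.0))           # True
# Counts are recovered as density * bin width * number of samples.
print(np.allclose(density * widths * data.size, counts))   # True
```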
Answer 2:
Keep the histogram in counts on the main axis and draw the KDE on a twin y-axis, so each uses its own scale:
import seaborn as sns
# Plotting hist without kde
ax = sns.distplot(your_data, kde=False)
# Creating another Y axis
second_ax = ax.twinx()
# Plotting kde without hist on the second Y axis
sns.distplot(your_data, ax=second_ax, kde=True, hist=False)
# Removing Y ticks from the second axis
second_ax.set_yticks([])
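On the original bin-width question: in the seaborn versions I know, distplot chooses the number of bins from the data it receives (a Freedman-Diaconis estimate, capped at 50) whenever bins is not passed, so each age group's subset ends up with its own bin width. Passing the same explicit edges to every call should fix that; seaborn's histplot (0.11+) accepts them the same way, e.g. `sns.histplot(group.rent_hour, bins=hour_edges, ax=ax)`. The sketch below uses plain `ax.hist` with hypothetical random hours and age groups in place of total_usage_df.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: one rent hour (0-23) and one age group per rental.
rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=600)
ages = rng.choice(["20s", "30s", "40s"], size=600)

# One explicit set of edges: one bin per hour, [0, 1), [1, 2), ..., [23, 24].
hour_edges = np.arange(0, 25)

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, age_g in zip(axes, ["20s", "30s", "40s"]):
    # Same edges in every subplot, so every bar is exactly one hour wide;
    # density is left at False, so the y-axis shows raw counts.
    ax.hist(hours[ages == age_g], bins=hour_edges)
    ax.set(title=age_g, xlim=(0, 24))
```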
I have a DataFrame with ~120 features that I would like to examine by year. I am plotting each feature (x = year, y = feature value) within a loop. Whilst these plot successfully, the charts are illegible because they are totally squashed.
I have tried using plt.tight_layout() and adjusting the figure size with plt.rcParams['figure.figsize'], but sadly to no avail.
for i in range(len(roll_df.columns)):
    plt.subplot(len(roll_df.columns), 1, i+1)
    name = roll_df.columns[i]
    plt.plot(roll_df[name])
    plt.title(name, y=0)
    plt.yticks([])
    plt.xticks([])
plt.tight_layout()
plt.show()
The loop runs but all plots are so squashed on the y-axis as to become illegible:
Matplotlib will not automatically adjust the size of your figure. So if you add more subplots below each other, it will split the available space instead of extending the figure. That's why your y axes are so narrow.
You could try to define the figure size beforehand, or determine the figure size based on how many subplots you have:
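import matplotlib.pyplot as plt
n_plots = roll_df.shape[1]
# squeeze=False keeps axes 2-D even if there is only one column to plot
fig, axes = plt.subplots(n_plots, 1, figsize=(8, 4 * n_plots), tight_layout=True, squeeze=False)
# Then your usual part, but plot on the created axes
# (Axes objects use set_title / set_xticks / set_yticks, not title / xticks / yticks)
for ax, name in zip(axes[:, 0], roll_df.columns):
    ax.plot(roll_df[name])
    ax.set_title(name, y=0)
    ax.set_yticks([])
    ax.set_xticks([])
plt.show()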
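With ~120 features, one column of 4-inch rows gives a very tall figure. A variation, sketched under the assumption that a grid is acceptable for your purposes, spreads the features over several columns and still grows the figure with the number of rows. `plot_feature_grid`, `ncols` and `cell_size` are hypothetical names, and the random DataFrame stands in for roll_df.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_feature_grid(df, ncols=4, cell_size=(3.0, 2.0)):
    """One small subplot per column of df; figure height grows with the row count."""
    nrows = math.ceil(df.shape[1] / ncols)
    fig, axes = plt.subplots(nrows, ncols, sharex=True, squeeze=False,
                             figsize=(cell_size[0] * ncols, cell_size[1] * nrows),
                             constrained_layout=True)
    flat = axes.ravel()
    for ax, name in zip(flat, df.columns):
        ax.plot(df.index, df[name])
        ax.set_title(name, fontsize=8)
    for ax in flat[df.shape[1]:]:
        ax.set_visible(False)  # hide unused cells in the last row
    return fig, axes

# Hypothetical stand-in for roll_df: 120 features over 10 years.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.normal(size=(10, 120)).cumsum(axis=0),
                       index=range(2010, 2020),
                       columns=[f"feature_{i}" for i in range(120)])
fig, axes = plot_feature_grid(demo_df)
```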
I'm working with biological data, and am using the heatmap from Seaborn to plot Pearson R values so I can visually compare expression of each of 22 cell types with every other cell type (making a 22x22 heatmap).
I've had two separate problems. Here is what my plots looked like the other day:
You can see that in one plot the y-tick labels are spaced, but the last label is clipped off. Then in the other plot, all of the y-tick labels are present, but are spaced too tightly.
Here's what my plots look like today. Foolishly, I'm not sure what I changed while trying to fix the issue in the first image (aside from changing vmin, vmax and the color scheme), but now the y-tick labels are all on top of each other.
The y-ticks are all on top of each other.
Here is my code to generate the heatmaps:
def cancerHeatmap(patients, cancer):
    x_labels = ['Naive B','Memory B','Plasma','CD8 T','CD4 T Naive','CD4 T MR',
                'CD4 T MA','FH T','Tregs','GD T',
                'NK resting','NK activated','Monos','Macros M0','Macros M1','Macros M2',
                'Dendritic resting','Dendritic activated','Mast resting','Mast activated','Eosinophils','Neutrophils']
    pearsonrs = getPearsons(patients, cancer)
    axs = sns.heatmap(pearsonrs, cmap = 'RdYlBu', vmin=-0.6, vmax=0.6)
    axs.set_yticklabels(x_labels, rotation = 0, fontsize = 10)
    axs.set_xticklabels(x_labels, rotation = 90)
    axs.xaxis.set_ticks_position('top')
    axs.set_xlabel(cancer)
    return axs
sns.set_style("whitegrid")
sns.despine()
plt.figure(figsize=(12,70))
plt.suptitle('Subtitle')
for i, cancer in enumerate(cancer_types):
    print(cancer)
    # integer division: subplot() needs whole numbers of rows
    plt.subplot(len(cancer_types) // 2 + 1, 2, i + 1)
    ax = cancerHeatmap(patients, cancer)
plt.tight_layout(rect=[0, 0, 1, 0.97])
plt.savefig('outfile.pdf', dpi=200)
The important details of this are:
"getPearsons" is a function that generates the numpy array ("pearsonrs") of all of the values that go directly into the heatmap
"cancer_types" is a list of the different cancer types that I'll be generating heatmaps for
I have no idea why this is happening, especially because I can generate one plot at a time with this code, and the axes are perfect:
x_labels = ['Naive B','Memory B','Plasma','CD8 T','CD4 T Naive','CD4 T MR',
'CD4 T MA','FH T','Tregs','GD T',
'NK resting','NK activated','Monos','Macros M0','Macros M1','Macros M2',
'Dendritic resting','Dendritic activated','Mast resting','Mast activated','Eosinophils','Neutrophils']
axs = sns.heatmap(np.array(pearsonrs), cmap = 'rainbow', vmin=-0.6, vmax=0.6)
axs.invert_yaxis()
axs.set_yticklabels(x_labels, rotation = 0, fontsize = 10)
axs.set_xticklabels(x_labels, rotation = 90, fontsize = 10)
axs.set_xlabel('Cancer')
axs.xaxis.set_ticks_position('top')
Any help is immensely appreciated.
Edit:
I ran identical scripts on my rMBP and my coworker's rMBP, and mine produced the figures with the stacked axis labels (image 2), and my coworker's computer produced the figures with odd spacing (image 1).
Edit 2:
It turns out that the Conda installation was slightly different on each computer. Updating both computers to the most recent Conda (4.5.9) does not change the heatmaps that either computer produces.
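A guess at the cause, not a confirmed diagnosis: `set_yticklabels` only renames the tick positions that already exist; it does not create one tick per label. `sns.heatmap` defaults to `yticklabels="auto"`, which may keep fewer ticks than rows when it judges the labels would overlap, so the 22 labels can land on the wrong rows or pile up, and that behaviour can differ between seaborn/Matplotlib versions (which would fit two machines producing different figures). Passing the labels to the heatmap itself, `sns.heatmap(pearsonrs, xticklabels=x_labels, yticklabels=x_labels, ...)`, lets seaborn place one tick per row. The sketch below shows the same principle in plain Matplotlib: fix the tick positions at the cell centres first, then assign the labels. It uses `pcolormesh` (cell i spans i to i+1, as in seaborn's heatmap) and random values as a hypothetical stand-in for getPearsons.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

labels = ['Naive B', 'Memory B', 'Plasma', 'CD8 T', 'CD4 T Naive', 'CD4 T MR',
          'CD4 T MA', 'FH T', 'Tregs', 'GD T', 'NK resting', 'NK activated',
          'Monos', 'Macros M0', 'Macros M1', 'Macros M2', 'Dendritic resting',
          'Dendritic activated', 'Mast resting', 'Mast activated',
          'Eosinophils', 'Neutrophils']
n = len(labels)
rng = np.random.default_rng(0)
pearsonrs = rng.uniform(-0.6, 0.6, size=(n, n))  # hypothetical stand-in values

fig, ax = plt.subplots(figsize=(8, 8))
mesh = ax.pcolormesh(pearsonrs, cmap="RdYlBu", vmin=-0.6, vmax=0.6)
fig.colorbar(mesh, ax=ax)

# Fix one tick at the centre of every cell *before* assigning labels,
# so each label is guaranteed its own row/column.
centres = np.arange(n) + 0.5
ax.set_xticks(centres)
ax.set_yticks(centres)
ax.set_xticklabels(labels, rotation=90)
ax.set_yticklabels(labels, rotation=0, fontsize=10)
ax.invert_yaxis()                   # first row at the top, as seaborn draws it
ax.xaxis.set_ticks_position("top")
```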
I have some geodetic stations and the values of their displacements over two years.
I'm using ax.quiver to draw the displacement arrows on my figure.
The problem is that the displacements are much smaller than the station coordinates, and when I use scale=0.0001 (or something similar), the whole arrows grow; I need only their length to increase, not their whole body.
My code:
fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(eee_SB_um,nnn_SB_um,edgecolors='none',marker='o', color='r')
plt.axis('equal')
ax.quiver(eee_SB_um, nnn_SB_um, DSB_1_E, DSB_1_N,
angles='xy',scale_units='xy', scale = 0.0001)
My figure:
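Per the `quiver` documentation, arrow length is governed by `scale` and `scale_units`, while every other arrow dimension (shaft width, head size) is measured in the separate `units` parameter, with head sizes given in multiples of the shaft width. Setting `units="width"` with an explicit `width` should therefore keep shafts and heads a fixed fraction of the axes width however small `scale` gets. The station positions and displacements below are hypothetical random values in metres, and the specific width/head numbers are just starting points to tune.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical station coordinates (m) and displacements (m).
rng = np.random.default_rng(0)
east = rng.uniform(0, 50_000, 8)
north = rng.uniform(0, 50_000, 8)
d_east = rng.normal(0, 0.01, 8)
d_north = rng.normal(0, 0.01, 8)

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(east, north, color="r", edgecolors="none")
ax.set_aspect("equal")
q = ax.quiver(
    east, north, d_east, d_north,
    angles="xy", scale_units="xy", scale=1e-6,      # length: 1 cm drawn as 10 km
    units="width", width=0.003,                     # shaft: 0.3 % of the axes width
    headwidth=3, headlength=4, headaxislength=3.5,  # head size in shaft widths
)
# Reference arrow so readers can read the exaggerated lengths.
ax.quiverkey(q, X=0.85, Y=1.05, U=0.01, label="1 cm", labelpos="E")
```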