I have this code:
bins = [0,1,10,20,30,40,50,75,100]
plt.figure(figsize=(15,15))
plt.hist(df.v1, bins = bins)
My problem is that the bin widths as they appear in the figure are proportional to their range in bins. However, I want all bins to come out having the same width.
I'm not sure how you'll make sense of the result, but you can use numpy.histogram to calculate the height of your bars, then plot those directly against an arbitrary x-scale.
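import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(loc=50, scale=200, size=(2000,))
bins = [0,1,10,20,30,40,50,75,100]
fig = plt.figure()
# top: regular histogram, bar widths proportional to the bin ranges
ax = fig.add_subplot(211)
ax.hist(x, bins=bins, edgecolor='k')
# bottom: the same counts drawn as equal-width bars at positions 0..n-1
ax = fig.add_subplot(212)
h,e = np.histogram(x, bins=bins)
ax.bar(range(len(bins)-1),h, width=1, edgecolor='k')
plt.show()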
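Since the x-scale in the second subplot is arbitrary, it may help to label each bar with the bin range it stands for. Here is a minimal sketch of that idea; the helper name equal_width_hist and the label format are my own choices, not anything matplotlib provides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def equal_width_hist(ax, data, bins):
    """Draw equal-width bars and label each one with its real bin range.

    Hypothetical helper: counts come from np.histogram, bars sit at the
    integer positions 0..n-1, and the tick labels carry the bin edges.
    """
    counts, edges = np.histogram(data, bins=bins)
    positions = np.arange(len(counts))
    ax.bar(positions, counts, width=1, edgecolor="k")
    labels = [f"{float(lo):g}-{float(hi):g}"
              for lo, hi in zip(edges[:-1], edges[1:])]
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    return counts, labels


rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=200, size=2000)
bins = [0, 1, 10, 20, 30, 40, 50, 75, 100]

fig, ax = plt.subplots()
counts, labels = equal_width_hist(ax, x, bins)
```

The bars no longer convey how wide each bin is, so the tick labels are the only place a reader can recover that information.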
I am trying to plot a histogram of an exponential distribution ranging from 0 to 20 with mean value 2.2 and bin width 0.05. However, the bars come out white when I plot it. The following is my code:
bins = np.linspace(0, 20, 401)
x = np.random.exponential(2.2, 3000)
counts, _ = np.histogram(x, bins)
df = pd.DataFrame({'bin': bins[:-1], 'count': counts})
p = sns.catplot(data = df, x = 'bin', y = 'count', yerr = [i**(1/2) for i in counts], kind = 'bar', height = 4, aspect = 2, palette = 'Dark2_r')
p.set(xlabel = 'Muon decay times ($\mu s$)', ylabel = 'Count', title = 'Distribution for muon decay times')
for ax in p.axes.flat:
    labels = ax.get_xticklabels()
    for i,l in enumerate(labels):
        if (i%40 != 0):
            labels[i] = ""
    ax.set_xticklabels(labels, rotation=30)
I believe that this is caused by the number of bins. If the first line of the code is changed to bins = np.linspace(0, 20, 11), the plot would be:
But I have no idea how to resolve this.
As @JohanC points out, if you're trying to draw elements that are close to or smaller than the resolution of your raster graphic, you have to expect some artifacts. But it also seems like you'd have an easier time making this plot directly in matplotlib, since catplot is not designed to make histograms:
f, ax = plt.subplots(figsize=(8, 4), dpi=96)
ax.bar(
    bins[:-1], counts,
    yerr=[i**(1/2) for i in counts],
    width=(bins[1] - bins[0]), align="edge",
    linewidth=0, error_kw=dict(linewidth=1),
)
ax.set(
    xmargin=.01,
    xlabel=r'Muon decay times ($\mu s$)',  # raw string avoids the invalid '\m' escape
    ylabel='Count',
    title='Distribution for muon decay times'
)
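If the thin gaps between 400 separate rectangles are the main annoyance, another option (my suggestion, not part of the answer above) is to draw the whole histogram as one filled outline with ax.stairs, which exists in Matplotlib 3.4 and later, and add the error bars separately. A rough sketch, using the same bins and distribution as the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
bins = np.linspace(0, 20, 401)
x = rng.exponential(2.2, 3000)
counts, edges = np.histogram(x, bins)

fig, ax = plt.subplots(figsize=(8, 4), dpi=96)
# One filled step outline for the whole histogram instead of 400 rectangles.
outline = ax.stairs(counts, edges, fill=True)
# sqrt(count) error bars, drawn at the bin centres without markers or lines.
centers = (edges[:-1] + edges[1:]) / 2
ax.errorbar(centers, counts, yerr=np.sqrt(counts),
            fmt="none", ecolor="k", elinewidth=1)
ax.set(
    xlabel=r"Muon decay times ($\mu s$)",
    ylabel="Count",
    title="Distribution for muon decay times",
)
```

Because the outline is a single patch, there are no per-bar edges for the renderer to antialias against each other.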
Matplotlib doesn't have a good way to deal with bars that are thinner than one pixel. If you save to an image file, you can increase the dpi and/or the figsize.
Some white space is due to the bars being 0.8 wide, leaving a gap of 0.2. Seaborn's barplot doesn't let you set the bar widths, but you could iterate through the generated bars and change their width (also updating their x-value to keep them centered around the tick position).
The edges of the bars get a fixed color (default 'none', or fully transparent). While iterating through the generated bars, you could set the edge color equal to the face color.
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator
import seaborn as sns
import pandas as pd
import numpy as np
bins = np.linspace(0, 20, 401)
x = np.random.exponential(2.2, 3000)
counts, _ = np.histogram(x, bins)
df = pd.DataFrame({'bin': bins[:-1], 'count': counts})
g = sns.catplot(data=df, x='bin', y='count', yerr=[i ** (1 / 2) for i in counts], kind='bar',
                height=4, aspect=2, palette='Dark2_r', lw=0.5)
g.set(xlabel='Muon decay times ($\mu s$)', ylabel='Count', title='Distribution for muon decay times')
for ax in g.axes.flat:
    ax.xaxis.set_major_locator(MultipleLocator(40))
    ax.tick_params(axis='x', labelrotation=30)
    for bar in ax.patches:
        bar.set_edgecolor(bar.get_facecolor())
        bar.set_x(bar.get_x() - (1 - bar.get_width()) / 2)
        bar.set_width(1)
plt.tight_layout()
plt.show()
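To get a feel for the "thinner than one pixel" problem, you can estimate how many pixels each bar gets for a given figsize and dpi. This is only a back-of-the-envelope sketch: pixels_per_bar is a made-up helper, and it assumes a single default Axes whose bars span its full width.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def pixels_per_bar(figsize, dpi, n_bars):
    """Approximate drawn width of one bar in pixels (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
    # The Axes bounding box is expressed in display units, i.e. pixels.
    axes_width_px = ax.get_window_extent().width
    plt.close(fig)
    return axes_width_px / n_bars


low = pixels_per_bar((8, 4), 96, 400)
high = pixels_per_bar((8, 4), 300, 400)
print(f"96 dpi: {low:.2f} px per bar, 300 dpi: {high:.2f} px per bar")
```

When saving to a file, passing a higher dpi to fig.savefig has the same effect without changing the figure itself.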
For some reason xticks on my histogram are shifted:
Here is the code:
data = list(df['data'].to_numpy())
bin = 40
plt.style.use('seaborn-colorblind')
plt.grid(axis='y', alpha=0.5, linestyle='--')
plt.hist(data, bins=bin, rwidth=0.7, align='mid')
plt.yticks(np.arange(0, 13000, 1000))
ticks = np.arange(0, 100000, 2500)
plt.xticks(ticks, rotation='-90', ha='center')
plt.show()
I'm wondering why the x ticks are shifted at the very beginning of the x-axis.
When setting bins=40, 40 equally sized bins will be created between the lowest and highest data value. In this case, the highest data value seems to be around 90000, and the lowest about 0. Dividing this into 40 regions will result in boundaries with non-rounded values. Therefore, it seems better to explicitly set the bin boundaries to the values you really want, for example dividing the range 0-100000 into 40 bins (so 41 boundaries).
from matplotlib import pyplot as plt
import numpy as np
plt.style.use('seaborn-colorblind')
data = np.random.lognormal(10, 0.4, 100000)
data[data > 90000] = np.nan
fig, axes = plt.subplots(ncols=2, figsize=(12, 4))
for ax in axes:
    if ax == axes[0]:
        bins = 40
        ax.set_title('bins = 40')
    else:
        bins = np.linspace(0, 100000, 41)
        ax.set_title('bins = np.linspace(0, 100000, 41)')
    ax.grid(axis='y', alpha=0.5, linestyle='--')
    ax.hist(data, bins=bins, rwidth=0.7, align='mid')
    ax.set_yticks(np.arange(0, 13000, 1000))
    xticks = np.arange(0, 100000, 2500)
    ax.set_xticks(xticks)
    ax.tick_params(axis='x', labelrotation=-90)
plt.tight_layout()
plt.show()
The issue is related to the way bins are constructed.
You have two choices:
Set the range for the bins directly:
plt.hist(data, bins=bin, rwidth=0.7, range=(0, 100_000), align='mid')
Set the x-axis ticks according to the binning:
_, bin_edges, _ = plt.hist(data, bins=bin, rwidth=0.7, align='mid')
ticks = bin_edges
I recommend the second option. The ticks then sit exactly on the bin boundaries, which gives the histogram a more natural scale.
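To see where the shift comes from without drawing anything, you can compare the bin edges directly. The sketch below uses np.histogram_bin_edges on made-up lognormal data (an assumption, since the real df['data'] isn't shown); plt.hist hands bins and range to the same NumPy machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(10, 0.4, 100_000)
data = data[data <= 90_000]  # stand-in data, roughly like the question's

# Default: 40 bins between data.min() and data.max(), so the edges are not round.
auto_edges = np.histogram_bin_edges(data, bins=40)
# Explicit range: 40 bins of width 2500, matching the requested tick positions.
fixed_edges = np.histogram_bin_edges(data, bins=40, range=(0, 100_000))
ticks = np.arange(0, 100_000, 2500)

print(auto_edges[:3])
print(fixed_edges[:3])
```

With the explicit range, every tick from np.arange(0, 100000, 2500) coincides with a bin edge, so nothing looks shifted.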
I have created a histogram in a Jupyter notebook to show the distribution of time on page in seconds for 100 web visits.
Code as follows:
ax = df.hist(column='time_on_page', bins=25, grid=False, figsize=(12,8), color='#86bf91', zorder=2, rwidth=0.9)
ax = ax[0]
for x in ax:
    # Despine
    x.spines['right'].set_visible(False)
    x.spines['top'].set_visible(False)
    x.spines['left'].set_visible(False)
    # Switch off ticks
    x.tick_params(axis="both", which="both", bottom="off", top="off", labelbottom="on", left="off", right="off", labelleft="on")
    # Draw horizontal axis lines
    vals = x.get_yticks()
    for tick in vals:
        x.axhline(y=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)
    # Set title
    x.set_title("Time on Page Histogram", fontsize=20, weight='bold', size=12)
    # Set x-axis label
    x.set_xlabel("Time on Page Duration (Seconds)", labelpad=20, weight='bold', size=12)
    # Set y-axis label
    x.set_ylabel("Page Views", labelpad=20, weight='bold', size=12)
    # Format y-axis label
    x.yaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))
This produces the following visualisation:
I'm generally happy with the appearance; however, I'd like the axis to be a little more descriptive, perhaps showing the bin range for each bin and the percentage of the total that each bin constitutes.
I have looked for this in the Matplotlib documentation but cannot seem to find anything that would allow me to achieve my end goal.
Any help greatly appreciated.
When you set bins=25, 25 equally spaced bins are set between the lowest and highest values encountered. If you use these ranges to mark the bins, things can be confusing due to the arbitrary values. It seems more adequate to round these bin boundaries, for example to multiples of 20. Then, these values can be used as tick marks on the x-axis, nicely between the bins.
The percentages can be added by looping through the bars (rectangular patches). Their height indicates the number of rows belonging to the bin, so dividing by the total number of rows and multiplying by 100 gives a percentage. The bar height, x and half width can position the text.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'time_on_page': np.random.lognormal(4, 1.1, 100)})
max_x = df['time_on_page'].max()
bin_width = max(20, np.round(max_x / 25 / 20) * 20) # round to multiple of 20, use max(20, ...) to avoid rounding to zero
bins = np.arange(0, max_x + bin_width, bin_width)
axes = df.hist(column='time_on_page', bins=bins, grid=False, figsize=(12, 8), color='#86bf91', rwidth=0.9)
ax = axes[0, 0]
total = len(df)
ax.set_xticks(bins)
for p in ax.patches:
    h = p.get_height()
    if h > 0:
        ax.text(p.get_x() + p.get_width() / 2, h, f'{h / total * 100.0 :.0f} %\n', ha='center', va='center')
ax.grid(True, axis='y', ls=':', alpha=0.4)
ax.set_axisbelow(True)
for side in ['left', 'right', 'top']:
    ax.spines[side].set_visible(False)
ax.tick_params(axis="y", length=0) # Switch off y ticks
ax.margins(x=0.02) # tighter x margins
plt.show()
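If you would rather have the bin range and the percentage in the tick labels themselves, as the question asks, the labels can be built from the counts. A small sketch with made-up numbers; bin_labels is a hypothetical helper and the label layout is just one possibility.

```python
import numpy as np


def bin_labels(values, edges):
    """Build one 'low-high' plus percentage label per bin (hypothetical helper).

    Percentages are relative to len(values), so values that fall outside
    the edges still count towards the total.
    """
    counts, edges = np.histogram(values, bins=edges)
    percents = counts / len(values) * 100
    labels = [
        f"{float(lo):g}-{float(hi):g}\n{p:.0f} %"
        for lo, hi, p in zip(edges[:-1], edges[1:], percents)
    ]
    return labels, percents


values = [5, 15, 25, 25, 35, 70, 90, 110, 130, 150]
edges = np.arange(0, 161, 40)  # 0, 40, 80, 120, 160
labels, percents = bin_labels(values, edges)
centers = (edges[:-1] + edges[1:]) / 2
print(labels)
```

The labels can then be placed under the bars with ax.set_xticks(centers) followed by ax.set_xticklabels(labels).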
Currently, I have the first y axis (probability) of my subplots aligned. However, I am attempting to get the secondary y axis (sample size) of the subplots aligned. I've tried to simply set the y-axis limit, but this solution isn't very generalizable.
Here is my code:
attacks = 5
crit_rate = .5
idealdata = fullMatrix(attacks, crit_rate)
crit_rate = ("crit_%.0f" % (crit_rate*100))
actualdata = trueDataM(attacks, crit_rate)
fig, axs = plt.subplots(attacks+1, sharex=True, sharey=True)
axs2 = [ax.twinx() for ax in axs]
fig.text(0.5, 0.04, 'State', ha='center')
fig.text(0.04, 0.5, 'Probability', va='center', rotation='vertical')
fig.text(.95, .5, 'Sample Size', va='center', rotation='vertical')
fig.text(.45, .9, 'Ideal vs. Actual Critical Strike Rate', va='center')
cmap = plt.get_cmap('rainbow')
samplesize = datasample(attacks, 'crit_50')
fig.set_size_inches(18.5, 10.5)
for i in range(attacks+1):
    axs[i].plot(idealdata[i], color=cmap(i/attacks), marker='o', lw=3)
    axs[i].plot(actualdata[i], 'gray', marker='o', lw=3, ls='--')
    axs2[i].bar(range(len(samplesize[i])), samplesize[i], width=.1, color=cmap(i/attacks), alpha = .6)
plt.show()
https://i.stack.imgur.com/HKJlE.png
Without data to confirm my assumptions it's hard to tell if this will be correct.
You are not making any attempt to scale the left y-axes, so that data must all have the same range. To ensure the right y-axes all have the same scale/limits, you need to determine the range (max and min) of all the data being plotted on those axes, then apply that to all of them.
It isn't clear whether samplesize is a NumPy ndarray or a list of lists. I'm also assuming that it is a 2-d structure with range(attacks+1) rows. Since you are making bar charts on the second y-axes, you only need to find the largest height in all the data.
# for a list of lists
biggest = max(max(row) for row in samplesize)
# or
biggest = max(map(max,samplesize))
# for an ndarray
biggest = samplesize.max()
Then apply that scale to all the right y-axes before they are shown
for ax in axs2:
    ax.set_ylim(top=biggest)
If you determine biggest prior to the plot loop you can just add a line to that loop:
for i in range(attacks+1):
    ...
    axs2[i].set_ylim(top=biggest)
You'll find plenty of related SO Q&As by searching with the terms: matplotlib subplots same y scale, matplotlib subplots y axis limits or something similar.
Here is a toy example:
from matplotlib import pyplot as plt
import numpy as np
lines = np.random.randint(0,200,(5,10))
# inner randint starts at 1 so the outer call never gets an empty range (0, 0)
bars = [np.random.randint(0,np.random.randint(1,10000),10) for _ in range(5)]
fig, axs = plt.subplots(lines.shape[0], sharex=True, sharey=True)
axs2 = [ax.twinx() for ax in axs]
#xs = np.arange(lines.shape[1])
xs = np.arange(1,11)
biggest = max(map(max,bars))
for ax,ax2,line,row in zip(axs,axs2,lines,bars):
    ax2.bar(xs,row)
    ax.plot(line)
    ax2.set_ylim(top=biggest)
plt.show()
plt.close()
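If you don't want to compute the maximum yourself, another approach (my own suggestion, assuming Matplotlib 3.3 or later for Axes.sharey) is to tie the twin axes together so that autoscaling keeps them in step. A sketch with random stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
lines = rng.integers(0, 200, size=(5, 10))
# Stand-in sample sizes with a very different magnitude in each row.
bar_rows = [rng.integers(0, top, size=10) for top in (50, 500, 2000, 8000, 10000)]

fig, axs = plt.subplots(len(lines), sharex=True, sharey=True)
axs2 = [ax.twinx() for ax in axs]
# Share the y-axis of every right-hand (twin) axis with the first twin.
for ax2 in axs2[1:]:
    ax2.sharey(axs2[0])

xs = np.arange(lines.shape[1])
for ax, ax2, line, row in zip(axs, axs2, lines, bar_rows):
    ax.plot(xs, line)
    ax2.bar(xs, row, alpha=0.6)

# After sharing, autoscaling gives all twin axes the same limits.
limits = [ax2.get_ylim() for ax2 in axs2]
```

The upside is that the limits stay aligned if you add more data later; the downside is that you lose explicit control over the top value unless you still call set_ylim on one of the shared axes.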
I have written code that plots a graph of time vs. amplitude. Now I want to change the index shown on the horizontal axis. I want to know how I can do it for a single plot and also for subplots. I want the range of the horizontal axis to be from 0 to 2*pi.
#the following code was written for plotting
fig, (ax1, ax2 ,ax3) = plt.subplots(3 ,constrained_layout = True)
fig.suptitle('AMPLITUDE MODULATION' ,color = 'Red')
ax1.plot(message_signal)
ax1.set_title('Message Signal' ,color = 'green')
I expect the x-axis to go from 0 to 2*pi only. In short, I want to customize the indexing of the x-axis.
You can use xlim to set the limits of the x-axis for whole plot or specific axes, e.g. plt.xlim(0, 1) or ax1.set_xlim(0, 1).
Here I set the limits for the x-axis to be [0, 3*pi]:
fig, (ax1, ax2 ,ax3) = plt.subplots(3, constrained_layout = True)
fig.suptitle('AMPLITUDE MODULATION', color = 'Red')
x = np.linspace(0, 2*np.pi, 1000)
ax1.plot(x, np.sin(x))
ax1.set_title('Message Signal', color = 'green')
ax1.set_xlim(0, 3*np.pi)
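Note that set_xlim only changes the visible window. If the signal is plotted without x-values, as in ax1.plot(message_signal), the x-axis is the sample index, so you also need to pass x-values that run from 0 to 2*pi. A sketch of that, where message_signal is a stand-in array because the real one isn't shown in the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for message_signal, which the question does not show.
message_signal = np.sin(np.linspace(0, 4 * np.pi, 500))

# Give every sample an x-value so the samples span 0..2*pi instead of 0..N-1.
x = np.linspace(0, 2 * np.pi, len(message_signal))

fig, ax1 = plt.subplots(constrained_layout=True)
ax1.plot(x, message_signal)
ax1.set_xlim(0, 2 * np.pi)
# Optional: put the ticks at multiples of pi/2 and label them accordingly.
ax1.set_xticks(np.arange(0, 5) * np.pi / 2)
ax1.set_xticklabels(["0", "π/2", "π", "3π/2", "2π"])
ax1.set_title("Message Signal", color="green")
```

For several subplots, either repeat the set_xlim call on each Axes or create them with plt.subplots(3, sharex=True) so that one call applies to all of them.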