Matplotlib horizontal histogram: Bins with low values disappear - python

The following happens when I plot a histogram using matplotlib's hist function with horizontal orientation: sometimes the bins with very low values, which are visible with the default orientation, are not shown with the horizontal orientation.
Executing the following code in a notebook cell (sadly I cannot upload images) should generate two histograms, where you can see the difference at the rightmost/leftmost bins of the distribution. With the default orientation there are small bins at ~3 and ~-3, which are not present with the horizontal orientation.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(20)
y = np.random.normal(size=1000000)
plt.hist(y, bins=20)
plt.show()
plt.hist(y, bins=20, orientation='horizontal')
plt.show()
I also linked the plots here: default, horizontal.
Does anybody have an idea what the issue is here?

I had similar issues with matplotlib's hist function. Sadly, I reverted to using barh and numpy.histogram() by hand.
Your code could then look like:
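import numpy as np
import matplotlib.pyplot as plt
y = np.random.normal(size=1000000)
counts, bin_edges = np.histogram(y, bins=20)
# one bar per bin: bottom at the bin's lower edge, bar height equal to the bin width
plt.barh(y=bin_edges[:-1], width=counts, height=np.diff(bin_edges), align='edge')
plt.show()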
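To check whether those outer bins are really empty or only invisible, one can compare the counts from np.histogram with what gets drawn. The sketch below is an illustration, not part of the original answer: it forces the headless Agg backend (drop that line in a notebook), draws the horizontal histogram with barh as above and, as a swapped-in alternative, also with hist(..., log=True), which puts the count axis on a log scale so bins with a few counts stay visible next to the tallest bin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(20)
y = np.random.normal(size=1000000)

# Same binning as the question: 20 equal-width bins spanning y.min()..y.max()
counts, edges = np.histogram(y, bins=20)
print("outermost bin counts:", counts[0], counts[-1])

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# Linear count axis: bins holding only a few samples are a tiny
# fraction of the tallest bin, so their bars are extremely short.
ax_lin.barh(edges[:-1], counts, height=np.diff(edges), align="edge")
ax_lin.set_title("linear counts")

# Log count axis via hist's log=True: the small bins remain visible.
h_counts, h_edges, _ = ax_log.hist(y, bins=20, orientation="horizontal", log=True)
ax_log.set_title("log counts")

fig.tight_layout()
```

With 20 bins over the full range of the sample, the first and last bins always contain at least the sample minimum and maximum, so their counts are never zero; if such a bar is missing from the plot, the data are there and only the drawing differs.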


How to make a square heatmap (overall plot, not the cells)

The square attribute in sns.heatmap works in a weird manner. When I plot a heatmap of random numbers and use the square attribute, it works fine.
When I plot the heatmap with my matrix, it creates the heatmap properly.
However, when I use the square attribute, the plot becomes a tiny square.
I can't figure out what is going wrong over here.
Well, square=True means: "show all cells as squares". The only way to fit 7x560 squares into the plot region is to reduce the height by a factor of about 80 (560 / 7). In other words, it is strongly recommended to use square=False for data with such a large difference between the horizontal and vertical directions. Seaborn isn't doing anything wrong here; it just gives you what you asked for.
If you want the heatmap to be square (instead of the cells), you can use ax = sns.heatmap(data, square=False) and then ax.set_aspect(data.shape[1] / data.shape[0]).
Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
data = np.random.randn(7, 560).cumsum(axis=1).cumsum(axis=0)
data -= data.min(axis=1, keepdims=True)
data /= data.max(axis=1, keepdims=True)
ax = sns.heatmap(data, cmap='turbo', cbar=True, xticklabels=50,
                 yticklabels=['Grumpy', 'Dopey', 'Doc', 'Happy', 'Bashful', 'Sneezy', 'Sleepy'])
ax.set_aspect(data.shape[1] / data.shape[0])
ax.tick_params(labelrotation=0)
plt.tight_layout()
plt.show()
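The arithmetic behind ax.set_aspect(data.shape[1] / data.shape[0]) can be checked without seaborn. The sketch below uses plain matplotlib pcolormesh on hypothetical 7 x 560 random data (standing in for the question's matrix), where every cell is one data unit wide and one data unit tall. With an aspect of 560 / 7 = 80, the box is 7 * 80 = 560 x-units tall and 560 x-units wide, i.e. square.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for a scripted check
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data with the same shape as in the question: 7 rows, 560 columns
rows, cols = 7, 560
data = np.random.default_rng(0).random((rows, cols))

fig, ax = plt.subplots(figsize=(8, 6))
ax.pcolormesh(data, cmap="turbo")  # each cell spans one data unit in x and y
ax.set_xlim(0, cols)
ax.set_ylim(rows, 0)  # heatmap convention: first row at the top

# aspect = on-screen height of one y unit / on-screen width of one x unit.
# Box height = rows * aspect (in x units), box width = cols (in x units),
# so aspect = cols / rows makes the whole axes box square.
ax.set_aspect(cols / rows)

fig.canvas.draw()  # the aspect is applied when the figure is drawn
box = ax.get_window_extent()
print(f"axes box: {box.width:.1f} x {box.height:.1f} px")
```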

Shade the area between two axhline using matplotlib

What I'm trying to achieve: a plot with two axhline horizontal lines, with the area between them shaded.
The best so far:
ax.axhline(y1, color=c)
ax.axhline(y2, color=c)
ax.fill_between(ax.get_xlim(), y1, y2, color=c, alpha=0.5)
The problem is that this leaves a small amount of blank space to the left and right of the shaded area.
I understand that this is likely due to the plot creating a margin around the used/data area of the plot. So, how do I get the fill_between to actually cover the entire plot without matplotlib rescaling the x-axis after drawing? Is there an alternative to get_xlim that would give me appropriate limits of the plot, or an alternative to fill_between?
This is the current result:
Note that this is part of a larger grid layout with several plots, but they all leave a similar margin around these shaded areas.
Not strictly speaking an answer to the question of getting the outer limits, but it does solve the problem. Instead of using fill_between, I should have used:
ax.axhspan(y1, y2, facecolor=c, alpha=0.5)
Result:
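A minimal sketch of the complete band from the question, assuming placeholder values for y1, y2 and c: axhspan takes its x extent in axes coordinates (0 is the left edge, 1 the right edge) rather than data coordinates, so it always reaches both sides of the axes and leaves the x-limits untouched, unlike fill_between with get_xlim().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for a scripted check
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values standing in for y1, y2 and c from the question
y1, y2, c = 0.3, 0.7, "tab:blue"

fig, ax = plt.subplots()
ax.plot(np.random.default_rng(1).random(10))
xlim_before = ax.get_xlim()

# axhline and axhspan are drawn in axes coordinates along x,
# so neither of them takes part in x autoscaling.
ax.axhline(y1, color=c)
ax.axhline(y2, color=c)
ax.axhspan(y1, y2, facecolor=c, alpha=0.5)

xlim_after = ax.get_xlim()
print(xlim_before, xlim_after)
```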
ax.get_xlim() does return the limits of the axis, not those of the data:
Axes.get_xlim()
Returns the current x-axis limits as the tuple (left, right).
But Matplotlib simply rescales the x-axis after drawing the fill_between:
import matplotlib.pylab as pl
import numpy as np
pl.figure()
ax=pl.subplot(111)
pl.plot(np.random.random(10))
print(ax.get_xlim())
pl.fill_between(ax.get_xlim(), 0.5, 1)
print(ax.get_xlim())
This results in:
(-0.45000000000000001, 9.4499999999999993)
(-0.94499999999999995, 9.9449999999999985)
If you don't want to manually set the x-limits, you could use something like:
import matplotlib.pylab as pl
import numpy as np
pl.figure()
ax=pl.subplot(111)
pl.plot(np.random.random(10))
xlim = ax.get_xlim()
pl.fill_between(xlim, 0.5, 1)
ax.set_xlim(xlim)

Python: Align bars between bin edges for a double histogram

I am having trouble using the pyplot.hist function to plot 2 histograms on the same figure. For each binning interval, I want the 2 bars to be centered between the bins (Python 3.6 user). To illustrate, here is an example:
import numpy as np
from matplotlib import pyplot as plt
bin_width=1
A=10*np.random.random(100)
B=10*np.random.random(100)
bins=np.arange(0,np.round(max(A.max(),B.max())/bin_width)*bin_width+2*bin_width,bin_width)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(A,bins,color='Orange',alpha=0.8,rwidth=0.4,align='mid',label='A')
ax.hist(B,bins,color='Orange',alpha=0.8,rwidth=0.4,align='mid',label='B')
ax.legend()
ax.set_ylabel('Count')
I get this:
Histogram_1
The A and B series overlap, which is not good. Knowing there are only 3 options for align ('left': centered on the left bin edge, 'mid': between the two edges, 'right': centered on the right bin edge), I see no option other than modifying the bins, by adding:
bins-=0.25*bin_width
before plotting A, and adding:
bins+=0.5*bin_width
before plotting B. That gives me: Histogram
That's better! However, I had to modify the binning, so it is not the same for A and B.
I searched for a simple way to use the same bins, and then shift the 1st and 2nd plot so they are correctly displayed in the binning intervals, but I didn't find it. Any advice?
I hope I explained my problem clearly.
As mentioned in the comment above, you do not need the hist plotting function. Use NumPy's histogram function and plot its results with matplotlib's bar function.
From the bin width and the number of data series you can calculate the bar width. You can adjust the ticks with the xticks method:
import numpy as np
import matplotlib.pyplot as plt
A=10*np.random.random(100)
B=10*np.random.random(100)
bins=20
# calculate heights and bins for both lists (B reuses A's bin edges)
ahist, abins = np.histogram(A, bins)
bhist, bbins = np.histogram(B, abins)
fig = plt.figure()
ax = fig.add_subplot(111)
# bar width: a third of the bin width, so two bars plus a gap fit in each bin
w = (bbins[1] - bbins[0])/3.
# bin centres, so that each pair of bars sits inside its own bin
centers = (abins[:-1] + abins[1:])/2.
# plot bars (bar() centres each bar on its x value by default)
ax.bar(centers-w/2., ahist, width=w, color='r')
ax.bar(centers+w/2., bhist, width=w, color='orange')
# adjust xticks: label each bin by its index
plt.xticks(centers, np.arange(bins))
plt.show()
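Another option, not from the answer above but part of matplotlib's own hist API: hist accepts a list of datasets and bins all of them with the same edges. With the default histtype='bar', the bars of the datasets are placed next to each other inside each bin, which is the layout asked for in the question without shifting any bins. The data and colours below are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for a scripted check
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
A = 10 * rng.random(100)
B = 10 * rng.random(100)

bin_width = 1
bins = np.arange(0, 10 + bin_width, bin_width)  # one shared set of edges: 0, 1, ..., 10

fig, ax = plt.subplots()
# A list of datasets makes hist count every dataset on the same bins and
# draw their bars side by side within each bin instead of on top of each other.
n, edges, patches = ax.hist([A, B], bins, rwidth=0.8,
                            color=["tab:orange", "tab:blue"], label=["A", "B"])
ax.set_xticks(bins)
ax.legend()
ax.set_ylabel("Count")
```

Here rwidth=0.8 leaves a small gap between neighbouring bins, and each dataset's bar takes 0.4 of the bin width.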

Avoid overlapping ticks in matplotlib

I am generating plots like this one:
When using fewer ticks, the plot fits nicely and the bars are wide enough to see clearly. However, when there are lots of ticks, instead of making the plot larger, matplotlib just compresses the y axis, resulting in thin bars and overlapping tick labels.
This happens both with plt.show() and plt.savefig().
Is there any solution so it plots the figure in a scale which guarantees that bars have the specified width, not more (if too few ticks) and not less (too many, overlapping)?
EDIT:
Yes, I'm using barh, and yes, I'm setting height to a fixed value (8):
height = 8
ax.barh(yvalues-width/2, xvalues, height=height, color='blue', align='center')
ax.barh(yvalues+width/2, xvalues, height=height, color='red', align='center')
I don't quite understand your code: it seems you make two plots with the same (only shifted) yvalues, but the image doesn't look like that. And are you sure you want to shift by width/2 if you use align='center'? Anyway, to change the image size:
No, I am not sure there is no other way, but I don't see anything in the manual at a glance. To set the image size by hand:
fig = plt.figure(figsize=(5, 80))
ax = fig.add_subplot(111)
...your_code
The size is in inches. You can compute it beforehand, for example:
import numpy as np
fig_height = (max(yvalues) - min(yvalues)) / np.min(np.diff(np.sort(yvalues)))
This would (approximately) set the minimum distance between ticks to one inch, which is too much, but try to adjust it.
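A variant of the same idea, assuming you know how many bars you will draw: choose a fixed amount of vertical space per bar and derive the figure height from it. The bar count, inches_per_bar and margin_inches values below are hypothetical and need tuning for your labels and font size.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for a scripted check
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: many categories in a horizontal bar chart
n_bars = 120
yvalues = np.arange(n_bars)
xvalues = np.random.default_rng(2).random(n_bars)

inches_per_bar = 0.2  # assumed vertical space per bar (figsize is in inches)
margin_inches = 1.0   # assumed extra room for the title and x-axis labels

fig_height = n_bars * inches_per_bar + margin_inches
fig, ax = plt.subplots(figsize=(6, fig_height))
ax.barh(yvalues, xvalues, height=0.8, align="center")
ax.set_ylim(-0.5, n_bars - 0.5)  # no empty space above or below the bars
```

Because the height grows linearly with the number of bars, each bar keeps roughly the same thickness whether there are 10 or 1000 of them.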
I can think of two solutions for your case:
If you are trying to plot a histogram, use the hist function [1]. It will automatically bin your data. You can even plot multiple overlapping histograms as long as you set an alpha value lower than 1. See this post
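import matplotlib.pyplot as plt
import numpy as np
mu, sigma = 100, 15  # example values; use your own mean and standard deviation
x = mu + sigma*np.random.randn(10000)
# density=True replaces the deprecated normed=1 argument
plt.hist(x, 50, density=True, facecolor='green',
         alpha=0.75, orientation='horizontal')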
You can also set the interval of your axis ticks. The following places a major tick at every multiple of 10 (in data units), but I doubt this alone will solve your problem.
import matplotlib.ticker as ticker
...
ax.yaxis.set_major_locator(ticker.MultipleLocator(10))

Remove grid lines, but keep frame (ggplot2 style in matplotlib)

Using Matplotlib I'd like to remove the grid lines inside the plot, while keeping the frame (i.e. the axes lines). I've tried the code below and other options as well, but I can't get it to work. How do I simply keep the frame while removing the grid lines?
I'm doing this to reproduce a ggplot2 plot in matplotlib. I've created a MWE below. Be aware that you need a relatively new version of matplotlib to use the ggplot2 style.
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np

if __name__ == '__main__':
    values = np.random.uniform(size=20)
    plt.style.use('ggplot')
    fig, ax1 = plt.subplots()
    weights = np.ones_like(values)/len(values)
    plt.hist(values, bins=20, weights=weights)
    ax1.set_xlabel('Value')
    ax1.set_ylabel('Probability')
    ax1.grid(False)
    #ax1.yaxis.grid(False)
    #ax1.xaxis.grid(False)
    ax1.set_facecolor('white')  # set_axis_bgcolor() in matplotlib < 2.0
    ax1.set_xlim([0,1])
    plt.savefig('hist.pdf', bbox_inches='tight')
OK, I think this is what you are asking (but correct me if I misunderstood):
You need to change the colour of the spines. You need to do this for each spine individually, using the set_color method:
for spine in ['left','right','top','bottom']:
    ax1.spines[spine].set_color('k')
You can see this example and this example for more about using spines.
However, if you have removed the grey background and the grid lines, and added the spines, this is not really in the ggplot style any more; is that really the style you want to use?
EDIT
To make the edge of the histogram bars touch the frame, you need to either:
Change your binning, so the bin edges go to 0 and 1
n,bins,patches = plt.hist(values, bins=np.linspace(0,1,21), weights=weights)
# Check, by printing bins:
print(bins[0], bins[-1])
# 0.0 1.0
If you really want to keep the bins between values.min() and values.max(), you would need to change your plot limits so they are no longer 0 and 1:
n,bins,patches = plt.hist(values, bins=20, weights=weights)
ax1.set_xlim(bins[0],bins[-1])
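Putting the pieces of this answer together, here is a sketch that keeps the ggplot style but removes the grid lines and the grey background, draws a black frame, and fixes the bin edges to 0 and 1. It uses the object-oriented API (ax.hist instead of plt.hist) and random example data; set_facecolor is the current name of the removed set_axis_bgcolor.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as in the question
import matplotlib.pyplot as plt
import numpy as np

values = np.random.default_rng(3).uniform(size=20)  # example data
weights = np.ones_like(values) / len(values)

plt.style.use("ggplot")
fig, ax = plt.subplots()

# Bin edges fixed to 0..1 so the outer bars touch the frame at x = 0 and x = 1
n, bins, patches = ax.hist(values, bins=np.linspace(0, 1, 21), weights=weights)
ax.set_xlim(bins[0], bins[-1])

ax.grid(False)             # remove the ggplot grid lines
ax.set_facecolor("white")  # remove the grey ggplot background
for spine in ax.spines.values():
    spine.set_color("k")   # ggplot's frame colour is light; make it black

ax.set_xlabel("Value")
ax.set_ylabel("Probability")
```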
