Displaying 3 histograms on 1 axis in a legible way - matplotlib - python

I have produced 3 sets of data which are organised in numpy arrays. I'm interested in plotting the probability distribution of these three sets of data as normed histograms. All three distributions should look almost identical so it seems sensible to plot all three on the same axis for ease of comparison.
By default matplotlib histograms are plotted as bars which makes the image I want look very messy. Hence, my question is whether it is possible to force pyplot.hist to only draw a box/circle/triangle where the top of the bar would be in the default form so I can cleanly display all three distributions on the same graph or whether I have to calculate the histogram data and then plot it separately as a scatter graph.
Thanks in advance.

There are two ways to plot three histograms simultaneously, but neither is quite what you've asked for. To do what you ask, you must calculate the histogram yourself, e.g. with numpy.histogram, and then plot the result using the plot method. Use scatter only if you want to associate other information with your points by setting a size for each point.
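The numpy.histogram route mentioned above can be sketched roughly as follows. This is a minimal sketch rather than the original script: the three arrays are regenerated with a seeded generator, and the marker styles, bin edges and variable names are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(101)
datasets = {
    "a": rng.normal(size=1000),
    "b": rng.normal(size=1000),
    "c": rng.normal(size=1000),
}

# Shared bin edges so the three distributions are directly comparable.
bin_edges = np.linspace(-5, 5, 21)
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

fig, ax = plt.subplots()
densities = {}
for (label, values), marker in zip(datasets.items(), ["s", "o", "^"]):
    # density=True normalises so that the histogram integrates to 1.
    density, _ = np.histogram(values, bins=bin_edges, density=True)
    densities[label] = density
    # One marker per bin, drawn where the top of the bar would be.
    ax.plot(centers, density, linestyle="none", marker=marker, label=label)
ax.legend()
```

Calling fig.savefig(...) or plt.show() afterwards renders the figure; plot is enough here because all markers within a data set share one size.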
The first alternative approach to using hist involves passing all three data sets at once to the hist method. The hist method then adjusts the widths and placements of each bar so that all three sets are clearly presented.
The second alternative is to use the histtype='step' option, which makes clear plots for each set.
Here is a script demonstrating this:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(101)
a = np.random.normal(size=1000)
b = np.random.normal(size=1000)
c = np.random.normal(size=1000)
common_params = dict(bins=20,
                     range=(-5, 5),
                     density=True)  # 'density' replaces the removed 'normed' argument
plt.subplots_adjust(hspace=.4)
plt.subplot(311)
plt.title('Default')
plt.hist(a, **common_params)
plt.hist(b, **common_params)
plt.hist(c, **common_params)
plt.subplot(312)
plt.title('Skinny shift - 3 at a time')
plt.hist((a, b, c), **common_params)
plt.subplot(313)
common_params['histtype'] = 'step'
plt.title('With steps')
plt.hist(a, **common_params)
plt.hist(b, **common_params)
plt.hist(c, **common_params)
plt.savefig('3hist.png')
plt.show()
And here is the resulting plot:
Keep in mind you could do all this with the object oriented interface as well, e.g. make individual subplots, etc.
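For reference, the object-oriented variant might look something like the sketch below. It assumes a Matplotlib version that accepts density=True, and the axes names are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(101)
a, b, c = (rng.normal(size=1000) for _ in range(3))
common_params = dict(bins=20, range=(-5, 5), density=True)

# One Axes object per panel instead of the implicit pyplot state.
fig, (ax_default, ax_grouped, ax_step) = plt.subplots(3, 1, figsize=(6, 8))
fig.subplots_adjust(hspace=0.4)

ax_default.set_title("Default")
for values in (a, b, c):
    ax_default.hist(values, **common_params)

ax_grouped.set_title("Skinny shift - 3 at a time")
ax_grouped.hist((a, b, c), **common_params)

ax_step.set_title("With steps")
for values in (a, b, c):
    ax_step.hist(values, histtype="step", **common_params)
```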

Related

Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency in the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot())
I read the documentation and also many similar questions here in Stack.
The common answer is to set norm_hist=False and also to assign the weights in a numpy array as in a standard histogram. However, it keeps showing the density and not the probability/frequency of each bin.
My code is
plt.figure(figsize=(10, 4))
plt.xlim(-0.145,0.145)
plt.axvline(0, color='grey')
data = df['col1']
x = np.random.normal(data.mean(), scale=data.std(), size=(100000))
normal_dist =sns.distplot(x, hist=False,color="red",label="Gaussian")
data_viz = sns.distplot(data,color="blue", bins=31,label="data", norm_hist=False)
# I also tried adding the weights inside the argument
#hist_kws={'weights': np.ones(len(data))/len(data)})
plt.legend(bbox_to_anchor=(1, 1), loc=1)
And I keep receiving this output:
Does anyone have an idea of what could be the problem here?
Thanks!
[EDIT]: The problem is that the y-axis is showing the KDE values and not those from the weighted histogram. If I set kde=False then I can display the frequency in the y-axis. However, I still want to keep the KDE, so I am not considering that option.
Keeping the KDE and the frequency/count on one y-axis in one plot will not work because they have different scales. So it might be better to create a plot with two y-axes, one showing the KDE and the other the histogram.
From the documentation for norm_hist: "If True, the histogram height shows a density rather than a count. **This is implied if a KDE or fitted density is plotted**."
versusnja in https://github.com/mwaskom/seaborn/issues/479 has a workaround:
# Plot hist without kde.
# Create another Y axis.
# Plot kde without hist on the second Y axis.
# Remove Y ticks from the second axis.
first_ax = sns.distplot(data, kde=False)
second_ax = first_ax.twinx()
sns.distplot(data, ax=second_ax, kde=True, hist=False)
second_ax.set_yticks([])
If you need this just for visualization it should be good enough.
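The same two-axis idea can also be sketched without distplot, which newer seaborn releases deprecate. The version below is a stand-in built on assumptions: it uses plain Matplotlib, synthetic data in place of df['col1'], and a hand-rolled Gaussian KDE with a rule-of-thumb bandwidth rather than seaborn's own estimator.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.03, size=500)  # hypothetical stand-in for df['col1']

fig, hist_ax = plt.subplots(figsize=(10, 4))

# Weights of 1/N make the bar heights sum to 1 (relative frequency per bin).
weights = np.ones(len(data)) / len(data)
freq, edges, _ = hist_ax.hist(data, bins=31, weights=weights,
                              color="blue", alpha=0.5, label="data")
hist_ax.set_ylabel("frequency")

# Gaussian KDE evaluated on a grid; bandwidth from the 1.06 * sigma * n^(-1/5) rule.
bandwidth = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
grid = np.linspace(data.min(), data.max(), 200)
z = (grid[:, None] - data[None, :]) / bandwidth
kde = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bandwidth * np.sqrt(2 * np.pi))

# The KDE lives on a second y-axis because its scale differs from the frequencies.
kde_ax = hist_ax.twinx()
kde_ax.plot(grid, kde, color="blue")
kde_ax.set_yticks([])
```

The left axis then reads as frequency per bin, while the KDE curve is shown for shape only.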

Single column heat map in python

My goal is to have a single column heat map, but for some reason the code I normally use for heat maps doesn't work if I'm not using a 2-D array.
vec1 = np.asarray([1,2,3,4,5])
fig, ax = plt.subplots()
plt.imshow(vec1, cmap='jet')
I know it's weird to show a single column vector as a heat map, but it's a nice visual for my purposes. I just want a column of colored squares that I can label along the y-axis to show a ranked list of things to people.
You could use the Seaborn library to do this. Seaborn's heatmap expects 2-D data, so wrap your array in a list to make it a single row. The following should accomplish what you want:
import numpy as np
import matplotlib.pyplot as plt
import seaborn
vec1 = np.asarray([1,2,3,4,5])
fig, ax = plt.subplots()
seaborn.heatmap([vec1], ax=ax)
Then you'll just have to do your formatting on that heatmap as you would in matplotlib.
http://seaborn.pydata.org/generated/seaborn.heatmap.html
Starting from the previous answer, I've come up with an approach which uses both Seaborn and Matplotlib's transforms to do what pavlov requested in their comment (that is, swapping the axes in a heatmap even though Seaborn does not have an orientation parameter).
Let's start from the previous answer:
import seaborn as sns
vec1 = np.asarray([1,2,3,4,5])
sns.heatmap([vec1])
plt.show()
Using heatmap on a single vector yields the following result:
Ok, let's swap the x-axis with the y-axis. To do that, we can use an Affine2D transform, applying a rotation of 90 degrees.
from matplotlib import transforms
tr = transforms.Affine2D().rotate_deg(90)
Let's also reshape the initial array to make it resemble a column vector:
vec2 = vec1.reshape(vec1.shape[0], 1)
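Now we can plot the heatmap. Note that plt.show() does not accept a transform (recent Matplotlib versions raise a TypeError for a positional argument), so it is the reshape to a column vector that actually swaps the axes here:
sns.heatmap(vec2)
plt.show()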
The resulting plot is:
Now, if we want to force each row to be a square, we can simply use the square=True parameter:
sns.heatmap(vec2, square=True)
plt.show()
This is the final result:
Hope it helps!
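A Matplotlib-only variant is also possible, since imshow only needs the vector reshaped to two dimensions. The following is a small sketch along those lines; the rank labels are hypothetical placeholders.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

vec1 = np.asarray([1, 2, 3, 4, 5])
labels = ["first", "second", "third", "fourth", "fifth"]  # hypothetical rank labels

# imshow needs a 2-D array, so turn the vector into a 5x1 column.
column = vec1.reshape(-1, 1)

fig, ax = plt.subplots(figsize=(2, 5))
image = ax.imshow(column, cmap="jet")

# Hide the meaningless x ticks and label each square along the y-axis.
ax.set_xticks([])
ax.set_yticks(np.arange(len(labels)))
ax.set_yticklabels(labels)
fig.colorbar(image, ax=ax)
```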

Adding horizontal and vertical lines and colorbar to seaborn jointplot

I would like to use kernel density estimate of seaborn.
First, I would like to add a colorbar for the main plot.
Second, I would like to add horizontal lines to the joint probability distribution to show the 68% and 98% confidence levels, and another line which shows the true value.
Third, I would also like to remove the legend in the plot. Consider the following example:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_context("paper")
# Generate a random correlated bivariate dataset
rs = np.random.RandomState(5)
mean = [0, 0]
cov = [(1, .5), (.5, 1)]
x1, x2 = rs.multivariate_normal(mean, cov, 500).T
x1 = pd.Series(x1, name="$X_1$")
x2 = pd.Series(x2, name="$X_2$")
# Show the joint distribution using kernel density estimation
# Newer seaborn releases take x/y as keywords and use 'height' instead of 'size'
g = sns.jointplot(x=x1, y=x2, kind="kde", height=7, space=0, color="r")
How should I do it?
First (colorbar): Not easily possible (although the density values are not particularly interpretable anyway).
Second (lines): These are matplotlib objects, so you can add any additional plot elements you want to them.
Third (legend): Pass stat_func=None, as is shown here.
AFAIK you can't do any of those per doc.
"I would like to add a colorbar for the main plot."
That's not an option with jointplot. Colorbars are only available with heatmap, clustermap, and interactplot.
"I would like to add horizontal line to the joint probability distribution to show the 68%, 98% confidence levels and another line which shows the true value."
That's not an option either; the closest you can come to it is to overlay two graphs.
"I also would like to remove the legend in the plot."
I'm assuming you're talking about the pearsonr and p values. Those aren't legends, and there is no documentation showing a way to remove them.

Creating two x-axes for a line-plot in matplotlib with unknown transform function between scales

Using matplotlib, two x-axes for 1 line plot can easily be obtained using twiny().
If the transform between the two x-scales can be described by a function, the corresponding ticks can be set by applying this transform function.
(this is described here: How to add a second x-axis in matplotlib)
How can I achieve this, if the transform function between the scales is unknown?
Edit:
Imagine the following situation:
You have 2 thermometers, both measuring the temperature. Thermometer 1 is measuring in °C and thermometer 2 in an imaginary unit, lets call it °D. Basically, what you know is that with increasing °C °D is increasing as well. Additionally, both thermometers have some degree of inaccuracy.
Both thermometers measure the same physical quantity, hence I should be able to represent them with a single line and two scales. However, in contrast to plotting temperatures in °C vs. K or °F, the transformation between the scales is unknown.
This means for example I have:
import numpy as np
from matplotlib import pyplot as plt
temp1 = np.sort(np.random.uniform(size=21))
temp2 = np.sort(np.random.uniform(low=-20, high=20, size=21))
y = np.linspace(0,1,21, endpoint=True)
A transform function between temp1 and temp2 exists, but is unknown. y, however, is the same.
Additionally, I know that temp1 and y are confined to the range (0,1)
Now we may plot like this:
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.set_aspect('equal')
ax2 = plt.twiny(ax1)
ax1.plot(temp1, y, 'k-')
ax2.plot(temp2, y, 'r:')
ax1.set_xlabel(r'1st x-axis')
ax2.set_xlabel(r'2nd x-axis')
ax1.set_xlim([0,1])
ax1.set_ylim([0,1])
fig.savefig('dual_x_faulty.png', format='png')
This leads to the following plot:
You can see that both curves are not the same, and the plot is not square (as it would be without twinning the y axis).
So, here is what I want (and can't achieve on my own):
Plotting a 3D array (temp1, temp2, y) in a 2D line plot with two x-axes
Matplotlib should 'automagically' set the ticks of temp2 such that the curves (temp1, y) and (temp2, y) are congruent
Is there a workaround?
Thanks for your help!
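One possible workaround, sketched here on the assumption that the paired readings can serve as an empirical, monotonic mapping between the two scales: plot a single curve against temp1, then place the second axis's ticks where interpolation says round temp2 values fall. The tick spacing and variable names are illustrative choices.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
temp1 = np.sort(rng.uniform(size=21))
temp2 = np.sort(rng.uniform(low=-20, high=20, size=21))
y = np.linspace(0, 1, 21, endpoint=True)

fig, ax1 = plt.subplots()
ax1.plot(temp1, y, "k-")
ax1.set_xlim(0, 1)
ax1.set_ylim(0, 1)
ax1.set_xlabel("1st x-axis")

# The second axis shares the data limits of the first; only its ticks differ.
ax2 = ax1.twiny()
ax2.set_xlim(ax1.get_xlim())
ax2.set_xlabel("2nd x-axis")

# Pick round temp2 values inside the measured range ...
tick_values = np.arange(-20, 21, 10)
tick_values = tick_values[(tick_values >= temp2.min()) & (tick_values <= temp2.max())]
# ... and interpolate the paired readings to find where they sit in temp1 units.
tick_positions = np.interp(tick_values, temp2, temp1)

ax2.set_xticks(tick_positions)
ax2.set_xticklabels([f"{value:g}" for value in tick_values])
```

Because only one curve is drawn, the two scales are congruent by construction; the second axis is just a relabelling, and its ticks will be unevenly spaced wherever the mapping is nonlinear.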

Number density contours in Python

I'm trying to reproduce this plot in python with little luck:
It's a simple number density contour currently done in SuperMongo. I'd like to drop it in favor of Python but the closest I can get is:
which is by using hexbin(). How could I go about getting the python plot to resemble the SuperMongo one? I don't have enough rep to post images, sorry for the links. Thanks for your time!
Example simple contour plot from a fellow SuperMongo => python sufferer:
import numpy as np
from matplotlib.colors import LogNorm
from matplotlib import pyplot as plt
plt.interactive(True)
fig=plt.figure(1)
plt.clf()
# generate input data; you already have that
x1 = np.random.normal(0,10,100000)
y1 = np.random.normal(0,7,100000)/10.
x2 = np.random.normal(-15,7,100000)
y2 = np.random.normal(-10,10,100000)/10.
x=np.concatenate([x1,x2])
y=np.concatenate([y1,y2])
# calculate the 2D density of the data given
counts,xbins,ybins=np.histogram2d(x,y,bins=100,density=True)  # 'density' replaces the removed 'normed' flag
# make the contour plot
plt.contour(counts.transpose(),extent=[xbins.min(),xbins.max(),
ybins.min(),ybins.max()],linewidths=3,colors='black',
linestyles='solid')
plt.show()
produces a nice contour plot.
The contour function offers a lot of fancy adjustments, for example let's set the levels by hand:
plt.clf()
mylevels=[1.e-4, 1.e-3, 1.e-2]
plt.contour(counts.transpose(),mylevels,extent=[xbins.min(),xbins.max(),
ybins.min(),ybins.max()],linewidths=3,colors='black',
linestyles='solid')
plt.show()
producing this plot:
And finally, in SM one can do contour plots on linear and log scales, so I spent a little time trying to figure out how to do this in matplotlib. Here is an example when the y points need to be plotted on the log scale and the x points still on the linear scale:
plt.clf()
# this is our new data which ought to be plotted on the log scale
ynew=10**y
# but the binning needs to be done in linear space
counts,xbins,ybins=np.histogram2d(x,y,bins=100,density=True)  # 'density' replaces the removed 'normed' flag
mylevels=[1.e-4,1.e-3,1.e-2]
# and the plotting needs to be done in the data (i.e., exponential) space
plt.contour(xbins[:-1],10**ybins[:-1],counts.transpose(),mylevels,
extent=[xbins.min(),xbins.max(),ybins.min(),ybins.max()],
linewidths=3,colors='black',linestyles='solid')
plt.yscale('log')
plt.show()
This produces a plot which looks very similar to the linear one, but with a nice vertical log axis, which is what was intended:
Have you checked out matplotlib's contour plot?
Unfortunately I couldn't view your images. Do you mean something like this? It was done with MathGL, a GPL plotting library which has a Python interface too. You can use arbitrary data arrays as input (including numpy arrays).
You can use numpy.histogram2d to get a number density distribution of your array.
Try this example:
http://micropore.wordpress.com/2011/10/01/2d-density-plot-or-2d-histogram/
