I'm trying to get a set of histograms plotted, with raw count data (non-normalized to density/pdf) and a fit line. However, I can't seem to figure out how to get a fit line plotted that ISN'T normalized by a pdf function. Is there a way to plot a non-normalized line, or a function to reverse the density calculation? Right now, I've got the below code, which works for the normalized histogram and fit line.
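import matplotlib.pyplot as plt
from scipy.stats import norm
fig, ax = plt.subplots()
x=[13.140,17.520,15.768,10.512,10.512,9.636,10.512, 9.636,11.388,7.884,7.008,7.008,9.636,11.388,7.884,7.88,16.644,42.924,17.520]
# density=False keeps raw counts on the y-axis (it replaces the older normed argument)
n, bins, patches = plt.hist(x, bins=10, density=False, color='cornflowerblue', alpha=0.75)
(mu, sigma) = norm.fit(x)
# norm.pdf stands in for mlab.normpdf, which newer Matplotlib releases no longer provide
y = norm.pdf(bins, mu, sigma)
l = plt.plot(bins, y, '-o', linewidth=2)
ax.set_xlabel('Millirems')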
This is the graph I have so far, with raw count data and a normalized fit line.
I think you could do this by multiplying the pdf by the total area of the histogram:
import numpy as np
l = plt.plot(bins, y * np.sum(np.diff(bins) * n))
You probably want to scale the pdf by the same factor that separates the count histogram from a normalized one. That factor is the area of the histogram, sum(n * np.diff(bins)).
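import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
fig, ax = plt.subplots()
x = [13.140,17.520,15.768,10.512,10.512,9.636,10.512, 9.636,11.388,7.884,7.008,7.008,9.636,11.388,7.884,7.88,16.644,42.924,17.520]
# density=False keeps raw counts on the y-axis (it replaces the older normed argument)
n, bins, patches = plt.hist(x, bins=10, density=False, color='cornflowerblue', alpha=0.75)
(mu, sigma) = norm.fit(x)
# scale the pdf by the histogram area so it lives on the count axis
y = norm.pdf(bins, mu, sigma) * sum(n * np.diff(bins))
plt.plot(bins, y, '-o', linewidth=2)
ax.set_xlabel('Millirems')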
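The scaling factor can be checked numerically. The sketch below uses only NumPy and the sample values from the question. It confirms that a density histogram multiplied by the histogram area gives back the raw counts, which is why the same factor moves a pdf onto the count axis. The variable names (`counts`, `density`, `area`, `recovered`) are illustrative and not part of the original code.

```python
import numpy as np

x = [13.140, 17.520, 15.768, 10.512, 10.512, 9.636, 10.512, 9.636, 11.388,
     7.884, 7.008, 7.008, 9.636, 11.388, 7.884, 7.88, 16.644, 42.924, 17.520]

# Raw counts and the density-normalized histogram on the same bin edges
counts, bins = np.histogram(x, bins=10)
density, _ = np.histogram(x, bins=bins, density=True)

# Area of the count histogram: sum of (bar height * bar width)
area = np.sum(counts * np.diff(bins))

# Multiplying the density by that area recovers the counts
recovered = density * area

# With equal-width bins the area reduces to N * bin_width
bin_width = bins[1] - bins[0]
print(np.allclose(recovered, counts), np.isclose(area, len(x) * bin_width))
```

For equal-width bins, `len(x) * bin_width` is therefore an equivalent and slightly shorter way to write the factor.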
I am trying to plot the probability density P(s) = C/s together with a histogram of samples drawn from it, so that the expected and the sampled distributions can be compared:
import numpy as np
import matplotlib.pyplot as plt
s_min = 1
s_max = 1000
# calculate the normalization constant
C = 1 / (np.log(s_max) - np.log(s_min))
u = np.random.rand(int(1000000))
s = s_min * np.exp(u * (np.log(s_max) - np.log(s_min)))
a = np.log10(min(s))
b = np.log10(max(s))
mybins = np.logspace(a, b, num=17)
plt.hist(s, bins=mybins, density=True, histtype='step', log=True, label='Random Numbers')
x = np.logspace(a, b, num=100)
y = C / x
plt.plot(x, y, 'r', label='Expected Distribution')
plt.xlabel('s')
plt.ylabel('P(s)')
plt.xscale('log')
plt.yscale('log')
plt.legend()
plt.show()
However, the code generates an empty plot that shows only the labels.
Adding %matplotlib inline changed nothing.
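One way to separate a sampling problem from a plotting problem is to check the samples numerically, without drawing anything. The sketch below is a minimal check under stated assumptions, not a diagnosis of the empty plot. It uses a fixed seed, bins that span exactly [s_min, s_max], and compares the density histogram with the exact bin average of C/s. The names `edges`, `expected` and `max_rel_err` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
s_min, s_max = 1.0, 1000.0
C = 1.0 / np.log(s_max / s_min)  # normalization constant of P(s) = C / s

# Inverse-transform sampling: the CDF is F(s) = C * ln(s / s_min),
# so solving F(s) = u gives s = s_min * exp(u / C)
u = rng.random(1_000_000)
s = s_min * np.exp(u / C)

# Log-spaced bins over the full support, then a density-normalized histogram
edges = np.logspace(np.log10(s_min), np.log10(s_max), num=17)
density, _ = np.histogram(s, bins=edges, density=True)

# Exact average of C / s over a bin [a, b]: C * ln(b / a) / (b - a)
expected = C * np.log(edges[1:] / edges[:-1]) / np.diff(edges)

max_rel_err = np.max(np.abs(density / expected - 1.0))
print(f"largest relative deviation: {max_rel_err:.4f}")
```

If the deviation is small, the samples follow C/s and any remaining issue lies in the plotting calls or the environment rather than in the sampling step.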
I am using pyplot.hist2d to plot a 2D histogram (x vs.y) weighted by a third variable, z. Instead of summing z values in a given pixel [x_i,y_i] as done by hist2d, I'd like to obtain the average z of all data points falling in that pixel.
Is there a Python script that does this?
Thanks.
NumPy's histogram2d() can calculate both the counts (a standard histogram) and the sums (via the weights parameter). Dividing the sums by the counts gives the mean value per cell.
The example below shows the 3 histograms together with a colorbar. The number of samples is chosen relatively small to demonstrate what would happen for cells with a count of zero (the division gives NaN, so the cell is left blank).
import numpy as np
import matplotlib.pyplot as plt
N = 1000
x = np.random.uniform(0, 10, N)
y = np.random.uniform(0, 10, N)
z = np.cos(x) * np.sin(y)
counts, xbins, ybins = np.histogram2d(x, y, bins=(30, 20))
sums, _, _ = np.histogram2d(x, y, weights=z, bins=(xbins, ybins))
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 4))
# histogram2d returns arrays indexed [x_bin, y_bin]; pcolormesh expects [row=y, col=x], hence the .T
m1 = ax1.pcolormesh(xbins, ybins, counts.T, cmap='coolwarm')
plt.colorbar(m1, ax=ax1)
ax1.set_title('counts')
m2 = ax2.pcolormesh(xbins, ybins, sums.T, cmap='coolwarm')
plt.colorbar(m2, ax=ax2)
ax2.set_title('sums')
with np.errstate(divide='ignore', invalid='ignore'):  # suppress possible divide-by-zero warnings
    m3 = ax3.pcolormesh(xbins, ybins, (sums / counts).T, cmap='coolwarm')
plt.colorbar(m3, ax=ax3)
ax3.set_title('mean values')
plt.tight_layout()
plt.show()
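The division can be verified on a dataset small enough to check by hand. The sketch below is only an illustration with made-up points. Two points share one cell, one point sits alone in another, and the remaining two cells are empty, so the expected means are 2, 10 and NaN.

```python
import numpy as np

# Three points: two share the lower-left cell, one sits in the upper-right cell
x = np.array([0.5, 0.5, 1.5])
y = np.array([0.5, 0.5, 1.5])
z = np.array([1.0, 3.0, 10.0])
edges = [0.0, 1.0, 2.0]

counts, _, _ = np.histogram2d(x, y, bins=(edges, edges))
sums, _, _ = np.histogram2d(x, y, bins=(edges, edges), weights=z)

# Empty cells produce 0/0; silence the warning and let the result be NaN
with np.errstate(divide='ignore', invalid='ignore'):
    means = sums / counts

print(means)
```

Leaving empty cells as NaN, rather than replacing them with 0, keeps "no data" distinguishable from "mean of zero" when the array is drawn.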
I have some 1-D data drawn from two normal distributions. My goal is to estimate the two Gaussian components.
plt.hist(my_data, bins=100, edgecolor='white', density=False)
I use a GMM (Gaussian Mixture model).
from sklearn import mixture
clf = mixture.GaussianMixture(n_components=2)
clf.fit(my_data)
I retrieve my two Gaussians.
mean_1 = clf.means_[0][0]
mean_2 = clf.means_[1][0]
std_1 = np.sqrt(clf.covariances_[0][0])[0]
std_2 = np.sqrt(clf.covariances_[1][0])[0]
weight_1 = clf.weights_[0]
weight_2 = clf.weights_[1]
Now to the question: I would like to overlay the histogram with the Gaussians defined by the parameters above. I guess I first have to normalize the histogram, but how do I plot the curves so that each Gaussian's area matches its weight and the total area equals 1? And how do I overlay them on the non-normalized histogram?
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 500)
y = norm.pdf(x, mean_1, std_1)
plt.plot(x,y)
y = norm.pdf(x, mean_2, std_2)
plt.plot(x,y)
The code above gives me two normalized Gaussian curves, but both have the same area.
UPDATE:
I solved my issue by scaling each component to its weight, and to overlay it on the non-normed histogram I scaled it with the total area of its bins.
val, bins, _ = plt.hist(my_data, bins=100, edgecolor='white', density=False)
# total area of the count histogram, counted once
area = sum(np.diff(bins) * val)
y = norm.pdf(x, mean_1, std_1)*weight_1*area
plt.plot(x,y)
y = norm.pdf(x, mean_2, std_2)*weight_2*area
plt.plot(x,y)
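The weighting can be checked without fitting anything. In the sketch below, the mixture parameters are hypothetical stand-ins for the values read from the fitted model, the data are synthetic, and `normal_pdf` is a small helper written out so that the example needs only NumPy. It confirms that each scaled component covers its weight times the histogram area, so the two together cover the full area.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a normal distribution, written out to keep the sketch NumPy-only."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Hypothetical mixture parameters standing in for the fitted means, stds and weights
mean_1, std_1, weight_1 = 0.0, 1.0, 0.3
mean_2, std_2, weight_2 = 5.0, 1.0, 0.7

# Synthetic sample drawn from that mixture (3000 + 7000 points matches the weights)
rng = np.random.default_rng(0)
my_data = np.concatenate([
    rng.normal(mean_1, std_1, 3000),
    rng.normal(mean_2, std_2, 7000),
])

val, bins = np.histogram(my_data, bins=100)
area = np.sum(np.diff(bins) * val)  # total area of the count histogram

x = np.linspace(-10.0, 15.0, 5001)
dx = x[1] - x[0]
y_1 = normal_pdf(x, mean_1, std_1) * weight_1 * area
y_2 = normal_pdf(x, mean_2, std_2) * weight_2 * area

# Riemann sums: each component covers weight * area of the histogram
area_1 = np.sum(y_1) * dx
area_2 = np.sum(y_2) * dx
print(area_1 / area, area_2 / area)
```

Plotting `y_1`, `y_2` and, if wanted, `y_1 + y_2` over the count histogram then puts the components and their sum on the same scale as the bars.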
I need to draw the density curve on the Histogram with the actual height of the bars (actual frequency) as the y-axis.
Try1:
I found a related answer here, but it normalizes the histogram to the range of the curve.
Below is my code and the output.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy.stats import norm
data = [125.36, 126.66, 130.28, 133.74, 126.92, 120.85, 119.42, 128.61, 123.53, 130.15, 126.02, 116.65, 125.24, 126.84,
125.95, 114.41, 138.62, 127.4, 127.59, 123.57, 133.76, 124.6, 113.48, 128.6, 121.04, 119.42, 120.83, 136.53, 120.4,
136.58, 121.73, 132.72, 109.25, 125.42, 117.67, 124.01, 118.74, 128.99, 131.11, 112.27, 118.76, 119.15, 122.42,
122.22, 134.71, 126.22, 130.33, 120.52, 126.88, 117.4]
(mu, sigma) = norm.fit(data)
x = np.linspace(min(data), max(data), 100)
plt.hist(data, bins=12, density=True)  # density=True replaces the older normed=True
plt.plot(x, norm.pdf(x, mu, sigma))  # norm.pdf stands in for mlab.normpdf
plt.show()
Try2:
There, @DavidG suggested a user-defined function, but even that does not follow the histogram accurately.
def gauss_function(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
test = gauss_function(x, max(data), mu, sigma)
plt.hist(data, bins=12)
plt.plot(x, test)
plt.show()
The result of this attempt did not match the bars either.
The actual histogram has a y-axis that ranges from 0 to 8,
and I want to draw the density curve exactly on it. Any help in this regard would be greatly appreciated.
Is this what you're looking for? I'm multiplying the pdf by the area of the histogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
data = [125.36, 126.66, 130.28, 133.74, 126.92, 120.85, 119.42, 128.61, 123.53, 130.15, 126.02, 116.65, 125.24, 126.84,
125.95, 114.41, 138.62, 127.4, 127.59, 123.57, 133.76, 124.6, 113.48, 128.6, 121.04, 119.42, 120.83, 136.53, 120.4,
136.58, 121.73, 132.72, 109.25, 125.42, 117.67, 124.01, 118.74, 128.99, 131.11, 112.27, 118.76, 119.15, 122.42,
122.22, 134.71, 126.22, 130.33, 120.52, 126.88, 117.4]
(mu, sigma) = norm.fit(data)
x = np.linspace(min(data), max(data), 100)
values, bins, _ = plt.hist(data, bins=12)
area = sum(np.diff(bins) * values)
plt.plot(x, norm.pdf(x, mu, sigma) * area, 'r')
plt.show()
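Why the area is the right factor: for a sample of size N, the expected count in a bin is N times the probability mass of that bin, and for narrow bins that mass is close to the pdf at the bin center times the bin width. The sketch below compares the two with only NumPy and the standard library. The parameters (`mu`, `sigma`, `n_points`, the bin edges) are hypothetical round numbers, not the values fitted above, and the helper functions are written out for illustration.

```python
import numpy as np
from math import erf, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density evaluated on a NumPy array."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def normal_cdf(x, mu, sigma):
    """Normal cumulative distribution for a scalar x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Hypothetical parameters and equal-width bins
mu, sigma, n_points = 0.0, 1.0, 1000
bins = np.linspace(-3.0, 3.0, 13)
width = bins[1] - bins[0]
centers = 0.5 * (bins[:-1] + bins[1:])

# Exact expected count per bin: N * (CDF(right edge) - CDF(left edge))
cdf_vals = np.array([normal_cdf(b, mu, sigma) for b in bins])
expected_exact = n_points * np.diff(cdf_vals)

# Approximation used in the answer: pdf * area, with area = N * width for equal-width bins
expected_approx = normal_pdf(centers, mu, sigma) * n_points * width

print(np.max(np.abs(expected_approx / expected_exact - 1.0)))
```

The two agree closely near the mean and drift apart somewhat in the tails, where the pdf curves more strongly across a bin. Narrower bins reduce that gap.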
I need to center the bars of a histogram.
x = array
y = [0,1,2,3,4,5,6,7,8,9,10]
num_bins = len(array)
n, bins, patches = plt.hist(x, num_bins, facecolor='green', alpha=0.5)
barWidth=20
x.bar(x, y, width=barWidth, align='center')
plt.show()
What I need is for it to look like the one in this picture.
I have tried almost everything, but still can't get it to work.
Thank you all.
For your task, I think it's better to calculate the histogram with NumPy and plot it with the bar function. Please refer to the following code and see how bin_edges is used.
import matplotlib.pyplot as plt
import numpy as np
num_samples = 100
num_bins = 10
lb, ub = 0, 10 # lower bound, upper bound
# create samples
y = np.random.random(num_samples) * ub
# calculate histogram
hist, bin_edges = np.histogram(y, num_bins, range=(lb, ub))
width = (bin_edges[1] - bin_edges[0])
# plot histogram
plt.bar(bin_edges[:-1], hist, align='center',
width=width, edgecolor='k', facecolor='green', alpha=0.5)
plt.xticks(range(num_bins))
plt.xlim([lb-width/2, ub-width/2])
plt.show()
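A related option, sketched here as an alternative rather than as the approach above, is to choose the bin edges so that each value of interest sits in the middle of its own bin and then place the bars at the bin midpoints. The example assumes integer-valued data from 0 to 10. The sample and the names `values`, `edges` and `centers` are hypothetical.

```python
import numpy as np

# Hypothetical integer-valued sample with values 0..10
rng = np.random.default_rng(0)
values = rng.integers(0, 11, size=200)

# Edges at -0.5, 0.5, ..., 10.5 put every integer in the middle of its own bin
edges = np.arange(-0.5, 11.0, 1.0)
hist, bin_edges = np.histogram(values, bins=edges)

# Midpoint of each bin: the x positions for a bar plot drawn with align='center'
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
print(centers)
```

Passing `centers` and `hist` to `plt.bar` with `align='center'` and a width of 1 then draws each bar centered on its integer tick.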