I'm trying to make a scaled scatter plot from a histogram. The scatter plot itself is fairly straightforward: make the histogram, find the bin centers, and scatter-plot them.
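import numpy as np
import matplotlib.pyplot as plt
nbins = 7
# Some example data
A = np.random.randint(0, 10, 100)
B = np.random.rand(100)
counts, binEdges = np.histogram(A, bins=nbins)
# Bin centers are the midpoints of consecutive edges
bincenters = 0.5 * (binEdges[1:] + binEdges[:-1])
fig = plt.figure(figsize=(7, 5))
ax = fig.add_subplot(111)
ax.scatter(bincenters, counts, c='k', marker='.')
ax_setup(ax, 'X', 'Y')  # ax_setup is a helper defined elsewhere (not shown)
plt.show()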
But I want each element of A to contribute only a scaled value to its bin, and that scaled value is stored in B. (I.e., instead of each bin being the count of elements from A for that bin, I want each bin to be the sum of the corresponding values from B.)
To do this I tried creating a list C (same length as A, and B) that had the bin number allocation for each element of A, then summing all of the values from B that go into the same bin. I thought numpy.searchsorted() is what I needed e.g.,
C = bincenters.searchsorted(A, 'right')
but this doesn't get the allocation right, and doesn't seem to return the correct number of bins.
So, how do I create a list that tells me which histogram bin each element of my data goes into?
You write
but I want each element of A to contribute only a scaled value to its bin, and that scaled value is stored in B. (I.e., instead of each bin being the count of elements from A for that bin, I want each bin to be the sum of the corresponding values from B.)
IIUC, this functionality is already supported in numpy.histogram via the weights parameter:
An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If normed is True, the weights are normalized, so that the integral of the density over the range remains 1.
So, for your case, it would just be
counts, binEdges=np.histogram(A, bins=nbins, weights=B)
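As a minimal sketch of what weights does, here is a check on small hand-picked arrays (the values of A and B below are made up for illustration, not the asker's data). Each weighted bin should equal the sum of the B values whose A falls into that bin.

```python
import numpy as np

# Hypothetical example data: A holds the values to bin, B the per-element weights
A = np.array([0, 1, 1, 4, 9])
B = np.array([0.5, 0.25, 0.25, 2.0, 1.0])

# Three equal-width bins over [0, 9]: [0, 3), [3, 6), [6, 9]
counts, bin_edges = np.histogram(A, bins=3)
weighted, _ = np.histogram(A, bins=3, weights=B)

# Manual check: sum B over the elements of A that fall in the first bin
first_bin_sum = B[(A >= bin_edges[0]) & (A < bin_edges[1])].sum()

print(counts)         # plain counts per bin
print(weighted)       # sums of B per bin
print(first_bin_sum)  # should match weighted[0]
```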
Another point: if your intent is to plot the histogram, note that you can directly use matplotlib.pyplot's utility functions for this (which take weights as well):
from matplotlib import pyplot as plt
plt.hist(A, bins=nbins, weights=B);
Finally, if you're intent on getting the assignments to bins, then that's exactly what numpy.digitize does:
nbins=7
# Some example data
A = np.random.randint(0, 10, 10)
B = np.random.rand(10)
counts, binEdges=np.histogram(A,bins=nbins)
>>> binEdges, np.digitize(A, binEdges)
array([ 0. , 1.28571429, 2.57142857, 3.85714286, 5.14285714,
6.42857143, 7.71428571, 9. ])
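A small sketch of turning the digitize result into per-bin sums, again on made-up data. One detail worth hedging: np.digitize returns 1-based indices and treats the last edge as open, so a value equal to the maximum lands in index nbins + 1, whereas np.histogram counts it in the last bin. The clipping step below is one way to reconcile the two, not the only one.

```python
import numpy as np

# Hypothetical example data (not the asker's arrays)
A = np.array([0, 1, 1, 4, 9])
B = np.array([0.5, 0.25, 0.25, 2.0, 1.0])
nbins = 3

counts, bin_edges = np.histogram(A, bins=nbins)  # edges: 0, 3, 6, 9

# 1-based bin numbers; the value 9 (equal to the last edge) gets nbins + 1
raw = np.digitize(A, bin_edges)

# Convert to 0-based bin numbers and fold the right-most edge into the
# last bin, matching np.histogram's closed final interval
bin_index = np.clip(raw - 1, 0, nbins - 1)

# Sum B per bin from the assignments
sums = np.bincount(bin_index, weights=B, minlength=nbins)

print(raw)
print(bin_index)
print(sums)
```

With these assignments, sums should agree with np.histogram(A, bins=nbins, weights=B)[0].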
Related
I have a range of positive integers ranging from 250-1200, with a normal distribution. I have found the answer to creating bins of equal density (Matplotlib: How to make a histogram with bins of equal area?). What I am actually looking for is to be able to retrieve the upper and lower boundaries of each bin. Is there a library/function that exists for this? or can this information be pulled out from matplotlib?
Let's take a look at the code provided in the question you linked:
def histedges_equalN(x, nbin):
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))
x = np.random.randn(1000)
n, bins, patches = plt.hist(x, histedges_equalN(x, 10))
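bins actually gives you the edges of each bin, as you can read in the docs of the hist function. It is an array of nbin + 1 values, so bins[:-1] holds the lower boundaries and bins[1:] holds the upper boundaries.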
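To make the boundaries explicit, here is a short sketch that reuses histedges_equalN from the linked question and pulls out the lower and upper boundary of every bin. It uses np.histogram instead of plt.hist only to avoid drawing a figure; the seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

def histedges_equalN(x, nbin):
    """Edges chosen so each bin holds roughly the same number of points."""
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1), np.arange(npt), np.sort(x))

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
x = rng.normal(size=1000)

edges = histedges_equalN(x, 10)
counts, bins = np.histogram(x, bins=edges)

lower = bins[:-1]  # lower boundary of each bin
upper = bins[1:]   # upper boundary of each bin

for lo, hi, c in zip(lower, upper, counts):
    print(f"[{lo:.3f}, {hi:.3f}]: {c} points")
```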
I need to use each individual bin of three histograms in an equation (i.e. bin1 from histogram1, histogram2, and histogram3, then bin2 from h1, h2, h3, etc.).
Is there any way to call the individual bins from a histogram to use?
Thanks in advance
Yes, when you call plt.hist, the locations of the bins as well as the number of entries in each bin are returned. Let's say you generated three histograms (I'm going to go for histograms 0, 1 and 2, because Python):
import matplotlib.pyplot as plt
import numpy as np
x0 = np.random.rand(25)
x1 = np.random.rand(25)
x2 = np.random.rand(25)
counts0, bins0, patches0 = plt.hist(x0)
counts1, bins1, patches1 = plt.hist(x1)
counts2, bins2, patches2 = plt.hist(x2)
The locations of the bins for histogram 0 are then stored in bins0.
The number of entries in the bins of histogram 0 is then stored in counts0.
I'd then be tempted to gather these together into 2d arrays:
counts = np.vstack([counts0, counts1, counts2]).T
bins = np.vstack([bins0, bins1, bins2]).T
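Now, bins[i, j] gives the left edge of bin i for histogram j (each bins array has one more entry than counts, the final entry being the right edge of the last bin). Similarly, counts[i, j] contains the number of entries in bin i of histogram j.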
With this set up you can get the counts in bin i for histograms 0, 1 and 2 as counts[i].
Additionally, if you don't actually need the plots, and are only calling plt.hist to get a handle on counts and bins then you can use np.histogram instead. The syntax is similar: counts, bins = np.histogram(x) (np.histogram doesn't return patches).
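A minimal sketch of the np.histogram route follows. It adds one assumption that is not in the answer above: all three histograms are given the same explicit edges, so that bin i covers the same interval in each of them. With the default bins, every data set gets edges based on its own minimum and maximum.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for repeatability
x0, x1, x2 = rng.random(25), rng.random(25), rng.random(25)

# Shared edges (an assumption: 10 equal bins over [0, 1]) so that bin i
# means the same interval in every histogram
edges = np.linspace(0.0, 1.0, 11)

# One column per histogram, one row per bin
counts = np.vstack([np.histogram(x, bins=edges)[0] for x in (x0, x1, x2)]).T

# counts[i] holds the entries of bin i for histograms 0, 1 and 2
bin3 = counts[3]
print(counts.shape)
print(bin3)
```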
I would like to plot a histogram using matplotlib.
I am wondering how I can set up the ranges of the bins (<9.0, 9.0-10.0, 11.0-12.0, 12.0-13.0, ... up to the max element in the array).
<9.0 stands for elements smaller than 9.0
I have used the smallest and biggest value in an array:
plt.hist(results, bins=np.arange(np.amin(results),np.amax(results),0.1))
I'll be grateful for any hints
The list or array supplied to bins contains the edges of the histogram bins. You may therefore create a bin ranging from the minimal value in results to 9.0.
# np.arange accepts float limits; the + 1 keeps the maximum inside the last bin
bins = [np.min(results)] + list(np.arange(9, np.max(results) + 1, 1))
plt.hist(results, bins=bins)
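Here is a small worked sketch of such custom edges, checked with np.histogram. The results array is hypothetical, since the asker's data is not shown, and the sketch assumes the minimum lies below 9.0 so that the edges are increasing.

```python
import numpy as np

# Hypothetical measurements; the asker's results array is not shown
results = np.array([7.2, 8.5, 9.0, 9.4, 10.1, 11.7, 12.0, 12.6])

# First bin: everything from the minimum up to 9.0; then unit-width bins.
# The "+ 1.0" ensures the last edge is at or above the maximum.
edges = np.concatenate(([results.min()],
                        np.arange(9.0, results.max() + 1.0, 1.0)))

counts, _ = np.histogram(results, bins=edges)

print(edges)   # bin edges, starting at the minimum
print(counts)  # entries per bin
```

The same edges array can be passed as bins to plt.hist.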
I have a list.
The index of the list is the degree number.
The value is the probability of that degree number.
For example, x[1] = 0.01 means that the probability of degree 1 is 0.01.
I want to draw a distribution graph of this list, and I tried
hist = plt.figure(1)
plt.hist(PrDeg, bins = 1)
plt.title("Degree Probability Histogram")
plt.xlabel("Degree")
plt.ylabel("Prob.")
hist.savefig("Prob_Hist")
PrDeg is the list I mentioned above.
But the saved figure is not correct.
The X axis shows Prob. and the Y axis shows Degree (the index of the list).
How can I swap the x and y axes using pyplot?
Histograms do not usually show you probabilities; they show the count or frequency of observations within different intervals of values, called bins. pyplot defines the bins by splitting the range between the minimum and maximum values of your array into n equally sized intervals, where n is the number you specified with the argument bins = 1. So in this case your histogram has a single bin, which gives it its odd appearance. By increasing that number you will be able to see better what is actually happening.
The only information that we can get from such a histogram is that the values of your data range from 0.0 to ~0.122 and that len(PrDeg) is close to 1800. If I am right about that much, it means your graph looks like what one would expect from a histogram and it is therefore not incorrect.
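To illustrate the effect of the bins argument without plotting, here is a rough sketch using np.histogram, which computes the same kind of counts. The values array is a made-up stand-in with roughly the size and range guessed above; it is not the asker's PrDeg.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for PrDeg: 1800 values between 0 and about 0.122
values = rng.random(1800) * 0.122

# A single bin spans min..max and simply counts every element
one_bin, edges1 = np.histogram(values, bins=1)

# More bins split the same range into equal-width intervals
ten_bins, edges10 = np.histogram(values, bins=10)

print(one_bin)   # one count covering all elements
print(ten_bins)  # ten counts that add up to the same total
```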
To answer your question about swapping the axes, the argument orientation=u'horizontal' is what you are looking for. I used it in the example below, renaming the axes accordingly:
import numpy as np
import matplotlib.pyplot as plt
PrDeg = np.random.normal(0,1,10000)
print(PrDeg)
hist = plt.figure(1)
plt.hist(PrDeg, bins = 100, orientation=u'horizontal')
plt.title("Degree Probability Histogram")
plt.xlabel("count")
plt.ylabel("Values randomly generated by numpy")
hist.savefig("Prob_Hist")
plt.show()
Is there a way to tell matplotlib to "normalize" a histogram such that its area equals a specified value (other than 1)?
The option "normed = 0" (density=False in current Matplotlib releases, where normed has been removed) in
n, bins, patches = plt.hist(x, 50, density=False, histtype='stepfilled')
just brings it back to a frequency distribution.
Just compute the histogram yourself, scale it to any value you'd like, then use bar to plot it.
On a side note, this will normalize things such that the area of all the bars is normed_value. The raw sum will not be normed_value (though it's easy to have that be the case, if you'd like).
E.g.
import numpy as np
import matplotlib.pyplot as plt
x = np.random.random(100)
normed_value = 2
hist, bins = np.histogram(x, bins=20, density=True)
widths = np.diff(bins)
hist *= normed_value
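# align='edge' puts each bar's left side on its bin edge (the default centers bars on x)
plt.bar(bins[:-1], hist, widths, align='edge')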
plt.show()
So, in this case, if we were to integrate (sum the height multiplied by the width) the bins, we'd get 2.0 instead of 1.0. (i.e. (hist * widths).sum() will yield 2.0)
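The two normalizations mentioned above (total bar area versus raw sum of the bar heights) can be sketched side by side. This is only an illustration starting from raw counts; the seed and the names by_area and by_sum are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for repeatability
x = rng.random(100)
normed_value = 2

counts, bins = np.histogram(x, bins=20)
widths = np.diff(bins)

# Variant 1: the total area of the bars equals normed_value
by_area = counts / (counts.sum() * widths) * normed_value

# Variant 2: the bar heights themselves add up to normed_value
by_sum = counts / counts.sum() * normed_value

print((by_area * widths).sum())  # area under the bars
print(by_sum.sum())              # plain sum of the heights
```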
You can pass a weights argument to hist instead of using normed. For example, if your bins cover the interval [minval, maxval], you have n bins, and you want to normalize the area to A, then I think
# np.full gives float weights even when x holds integers
weights = np.full(x.shape, A * n / (maxval-minval) / x.size)
plt.hist(x, bins=n, range=(minval, maxval), weights=weights)
should do the trick.
EDIT: The weights argument must be the same size as x, and its effect is to make each value in x contribute the corresponding value in weights towards the bin count, instead of 1.
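A quick numerical check of the weights formula, as a sketch under stated assumptions: the target area A = 3.0, the bin count and the range are arbitrary illustrative values, and every sample lies inside the binned range. np.histogram is used here because it takes the same bins, range and weights arguments and needs no figure.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for repeatability
x = rng.random(500)

A = 3.0                    # target area (arbitrary illustrative value)
n = 20                     # number of bins
minval, maxval = 0.0, 1.0  # binned range; all of x lies inside it here

# Same constant weight for every sample
weights = np.full(x.shape, A * n / (maxval - minval) / x.size)
heights, edges = np.histogram(x, bins=n, range=(minval, maxval), weights=weights)

# Area = sum of bar height times bar width
area = (heights * np.diff(edges)).sum()
print(area)
```

If some samples fall outside (minval, maxval), they are dropped from the bins and the resulting area comes out smaller than A.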
I think the hist function could probably do with a greater ability to control normalization, though. For example, I think as it stands, values outside the binned range are ignored when normalizing, which isn't generally what you want.