histogram giving wrong bins for zero array - python

plt.hist(np.zeros((784,1)), bins=2)
This should produce a histogram with all values in the bin at 0, but the output is:
What's wrong?

Not sure what you are expecting, but maybe this helps:
The bins represent intervals. The function computes the occurrences of the input data that fall within each bin (or interval).
Consider this example:
plt.hist(np.zeros((784)), bins=(0,1,2))
There are 2 intervals: the first covers values from 0 to 1, the second covers values from 1 to 2. So you will have 784 counts in the first interval and none in the second. This will produce the following:
Now if you replace bins=(0,1,2) with bins=2, it will use 2 intervals of equal width between the minimum and the maximum input value. Since the input contains only zeros, that range would have zero width, so it is widened to run from -0.5 to +0.5. This results in the histogram you showed above: no counts between -0.5 and 0, and all 784 zeros between 0 and +0.5.
So I guess what you want is a thin bar centered at zero. You can get this by, for example, setting bins to a larger odd number:
plt.hist(np.zeros((784)), bins=7)
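To check the bin edges without looking at a plot, here is a minimal sketch using np.histogram, which applies the same binning that plt.hist relies on. The variable names are only illustrative.

```python
import numpy as np

data = np.zeros(784)

# With an integer bin count, the edges are spread evenly over the data range.
# All values are identical here, so the range is widened to [-0.5, 0.5].
counts, edges = np.histogram(data, bins=2)
print(edges)   # bin edges chosen automatically
print(counts)  # number of values per bin

# An odd bin count places one bin symmetrically around zero.
counts7, edges7 = np.histogram(data, bins=7)
print(counts7)
```

With bins=2 the zero sits exactly on the shared edge and is counted in the right-hand bin; with bins=7 it lands in the middle bin.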

That's how plt.hist works. Suppose you have a list like (3, 5, 1, 7, 4, 3, 9, 0, 2) and pass it to plt.hist with bins=3. hist distributes the numbers into 3 intervals of equal width (here [0, 3), [3, 6) and [6, 9]) and draws 3 bars. The height of each bar is the number of values that fell into the corresponding interval. In this case, the heights will be (3, 4, 2). In your case, bins=2, and the intervals are [-0.5, 0) and [0, 0.5]. All 784 zeros fall into the second bin, and the first bin is empty.
There is another function in matplotlib that works the way you probably expected plt.hist to work: plt.bar. You pass it the bar heights directly, and it draws them as they are, without any binning. You can use it like this:
plt.bar(np.arange(784), np.zeros(784))
and it will give you 784 zero-height bars.
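The nine-number example can be reproduced with np.histogram, which performs the binning step on its own. This is a small sketch for checking the counts, not a replacement for the plot.

```python
import numpy as np

values = np.array([3, 5, 1, 7, 4, 3, 9, 0, 2])

# Three equal-width bins between min(values) = 0 and max(values) = 9.
counts, edges = np.histogram(values, bins=3)
print(edges)   # the four edges delimiting the three bins
print(counts)  # how many values fall into each bin
```

Every bin is half-open on the right except the last one, which also includes its upper edge. That is why the value 9 is counted in the third bin.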

Related

Min numpy - 3D array

I get confused by this example.
A = np.random.random((6, 4, 5))
A
A.min(axis=0)
A.min(axis=1)
A.min(axis=2)
What mins are we really computing here?
I know I can think of this array as a 6x4x5 parallelepiped in 3D space, and I know A.min(axis=0) means we go along the 0-th axis. OK, but as we go along that 0-th axis all we get is 6 "layers", which are basically rectangles of size 4x5 filled with numbers. So what min am I computing when saying A.min(axis=0), for example? I am just trying to visualize it in my head.
From A.min(axis=0) I get back a 4x5 2D matrix. Why? Shouldn't I get just 6 values in a 1D array? I am walking along the 0-th axis, so shouldn't I get 6 values back, one value for each of these 4x5 rectangles?
I always find this notation confusing and just don't get it, sorry.
You compute the min along one particular axis when you want to keep the structure of the remaining axes.
The gif below may help to understand.
In this example, your result will have shape (3, 2).
That's because you are taking the smallest value along axis 0, which collapses that dimension to a single value, so the dimension is no longer needed.
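For the (6, 4, 5) array from the question, the same idea can be checked numerically. The sketch below uses a seeded generator (an assumption, only to make it reproducible) and also shows how to get the six per-layer minima the question expected.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
A = rng.random((6, 4, 5))

# Reducing along axis 0 compares the 6 layers position by position:
# m0[i, j] is the smallest of A[0, i, j], A[1, i, j], ..., A[5, i, j].
m0 = A.min(axis=0)
print(m0.shape)

# Reducing along axes 1 and 2 together gives one minimum per 4x5 layer.
per_layer = A.min(axis=(1, 2))
print(per_layer.shape)
```

So axis=0 names the axis that disappears from the result, not the axis that is kept.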

Histogram function for matplotlib returning weird y-axis values

I am trying to understand the matplotlib.hist function. I have the following data:
cs137_count = np.array([...])  # 750 integers in the range 1820 to 1980
plt.figure()
plt.hist(cs137_count, density=True, bins=50)  # the keyword is bins, not bin
plt.ylabel('Distribution')
plt.xlabel('Counts')
But the plot has strange values on the y-axis, ranging from 0 to 0.016, which make no sense to me, and I am not sure why it returns those values. I have attached an image of the plot below.
That's because you're using density=True. From the docs:
density: bool, optional
If True, the first element of the return tuple
will be the counts normalized to form a probability density, i.e., the
area (or integral) under the histogram will sum to 1. This is achieved
by dividing the count by the number of observations times the bin
width and not dividing by the total number of observations. If stacked
is also True, the sum of the histograms is normalized to 1.
Default is False.
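The normalization described in the docs can be verified with np.histogram, which accepts the same density flag. The data below is a hypothetical stand-in for cs137_count (random integers in the stated range), not the asker's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in: 750 integers between 1820 and 1980 inclusive.
sample = rng.integers(1820, 1981, size=750)

density, edges = np.histogram(sample, bins=50, density=True)
raw, _ = np.histogram(sample, bins=50)
widths = np.diff(edges)

# Each density value is count / (number of observations * bin width) ...
manual = raw / (sample.size * widths)
# ... so the bar areas, not the bar heights, sum to 1.
area = np.sum(density * widths)
print(area)
```

With bins that are a few units wide, heights on the order of 0.01 are what this formula produces, which matches the y-axis range in the question.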

range of the bins matplotlib

I would like to plot a histogram using matplotlib.
I am wondering how I can set up the bin ranges (<9.0, 9.0-10.0, 11.0-12.0, 12.0-13.0, ... up to the max element in the array).
<9.0 stands for elements smaller than 0.9
I have used the smallest and biggest value in an array:
plt.hist(results, bins=np.arange(np.amin(results),np.amax(results),0.1))
I'll be grateful for any hints
The list or array supplied to bins contains the edges of the histogram bins. You may therefore create a bin ranging from the minimal value in results to 9.0.
bins = [np.min(results)] + list(np.arange(9, np.max(results) + 1))
plt.hist(results, bins=bins)
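As a worked check of the edge list, here is a small sketch with made-up values for results (an assumption, since the original data is not shown). It builds one wide first bin from the minimum up to 9.0, followed by unit-width bins.

```python
import numpy as np

# Hypothetical data; the real `results` array is not given in the question.
results = np.array([7.2, 8.5, 9.1, 9.9, 10.4, 11.0, 12.7])

# First edge at the minimum, then 9, 10, 11, ... up to the first
# whole edge at or above the maximum.
edges = np.concatenate(([results.min()], np.arange(9.0, results.max() + 1.0, 1.0)))
counts, _ = np.histogram(results, bins=edges)
print(edges)
print(counts)
```

This assumes the smallest value is below 9.0; otherwise the edges would not be increasing and the binning call would raise an error.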

Python histogram with pre-set size of bins

How would I go about plotting a histogram when I already have the bins and their sizes?
If I use:
plt.hist(x, bins)
it considers x as a list of results and not the already defined value of the corresponding bin.
Thanks
In that case you can simply create a bar chart with plt.bar:
plt.bar(bins[:, 0], x, bins[:, 1] - bins[:, 0], align='edge')
I simply assumed bins is an array of shape (n, 2), where n is the number of bins. The first column is the lower edge of each bin and the second column is the upper edge. align='edge' makes each bar start at its lower edge instead of being centered on it.
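A self-contained sketch of this approach follows. The bins and x arrays are hypothetical examples in the assumed (n, 2) layout, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to show a window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical pre-computed bins: one row per bin as (lower edge, upper edge).
bins = np.array([[0.0, 1.0], [1.0, 3.0], [3.0, 6.0]])
# Hypothetical pre-computed value for each bin.
x = np.array([5, 2, 8])

fig, ax = plt.subplots()
# Each bar starts at its lower edge and spans the bin width.
bars = ax.bar(bins[:, 0], x, width=bins[:, 1] - bins[:, 0], align="edge")
```

In an interactive session, calling plt.show() afterwards displays three bars of unequal width covering 0 to 6.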

How can I normalize a histogram such that the sum of the heights is equal to 1?

I generated the figure below using a call to matplotlib.pyplot.hist in which I passed the kwarg normed=True:
Upon further research, I realized that this kind of normalization works in such a way that the integral of the histogram is equal to 1. How can I plot this same data such that the sum of the heights of the bars equals 1?
In other words, I want each bin to represent the proportion of the whole that its values contain.
I'm not sure if there's a straightforward way, but you can manually divide all bar heights by the length of the input (the following was run in ipython --pylab to skip the imports):
inp = normal(size=1000)
h = hist(inp)
Which gives you
Now, you can do:
bar(h[1][:-1], h[0]/float(len(inp)), diff(h[1]), align='edge')
and get
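Another way to get heights that sum to 1, different from the manual division above, is to give every sample a weight of 1/N. The sketch below shows this with np.histogram; plt.hist accepts the same weights argument. The seeded generator is an assumption for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.normal(size=1000)

# Each observation contributes 1/N, so every bar is the fraction of
# samples in that bin and the heights add up to 1.
weights = np.full(inp.shape, 1.0 / inp.size)
heights, edges = np.histogram(inp, bins=10, weights=weights)
print(heights.sum())
```

The equivalent plotting call would be plt.hist(inp, bins=10, weights=weights).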
