Binning data with overlapping bins

I need to bin wind data.
The idea is to vary the size of the wind bins so that each bin covers a minimum amount of data. At the end I will have 360 overlapping bins.
Therefore I need to define the lower and upper limits of the bins, and afterwards vary the upper limits according to the data that fall into each bin.
np.digitize seems to work only with one-dimensional bins,
so this was my attempt:
import numpy as np
from scipy.stats import binned_statistic
#initially equal size bins[[lower limit], [upper limit]]
bins=np.array([[np.arange(0,360)], [np.arange(1,361)]])
#dfIn is a vector with angles from 0 to 360
index_Rdir= binned_statistic(dfIn, dfIn, bins=bins)
The rest of the algorithm then calculates the frequency of data
in each bin and increases the upper limit
of those bins that do not get a minimum amount of data...
Binning with binned_statistic raises the following error:
ValueError: cannot copy sequence with size 2973 to array axis with dimension 3
If I pass a list instead of an array for the bins, I get a similar error.
I also tried np.histogram, since according to the documentation it allows non-uniform bin sizes:
index_Rdir= np.histogram(dfIn, bins=bins)
which fails with:
all the input arrays must have same number of dimensions
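Both binned_statistic and np.histogram expect the bins as a single one-dimensional, increasing sequence of edges, so a two-row array of lower and upper limits cannot be passed to either. One possible way around this, as a minimal sketch only, is to count the samples of each bin with a boolean mask and widen the bin until it holds enough data. The function name overlapping_bins and the parameters min_count, start_width and step are made up for this illustration, and the sketch assumes the angles are in degrees and that bins may wrap past 360 back to 0.

```python
import numpy as np

def overlapping_bins(angles, min_count, start_width=1.0, step=1.0):
    """For each lower limit 0..359, widen the bin until it holds at least
    min_count samples or spans the full circle (hypothetical helper)."""
    angles = np.asarray(angles, dtype=float) % 360.0
    lowers = np.arange(360, dtype=float)
    widths = np.full(360, float(start_width))
    counts = np.zeros(360, dtype=int)
    for i, lo in enumerate(lowers):
        # Distance from the lower limit measured in increasing-angle
        # direction, so a bin that crosses 360 wraps around to 0.
        offset = (angles - lo) % 360.0
        while True:
            counts[i] = np.count_nonzero(offset < widths[i])
            if counts[i] >= min_count or widths[i] >= 360.0:
                break
            widths[i] += step
    return lowers, widths, counts

# Example with synthetic directions standing in for dfIn.
rng = np.random.default_rng(0)
dfIn = rng.uniform(0.0, 360.0, 2973)
lowers, widths, counts = overlapping_bins(dfIn, min_count=20)
```

Bin i then covers the interval from lowers[i] to lowers[i] + widths[i], taken modulo 360. The loop runs only 360 times, so the cost is dominated by the vectorised mask inside it.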

Related

Matplotlib - Wrong number of Frequency bins with Specgram

From what I understand, in an FFT the number of frequency bins is exactly half the number of samples.
However, matplotlib.specgram gives me one bin too many.
fig, [ax1, ax2, ax3] = plt.subplots(nrows=3, ncols=1)
Pxx, freqs, times, im = ax3.specgram(raw_normalized, NFFT=1024, Fs=sampleRate, noverlap=overlap, window=matplotlib.mlab.window_none, scale='linear')
Pxx contains the array of the spectrogram, which should have 512 bins (because NFFT is set to 1024); however, it has 513.
Is there something off with my understanding of FFTs, or is there something wrong/quirky in the matplotlib library?

Just a wild guess, but if you set NFFT to 1024, the FFT should have 1024 bins. However, if your input is real-valued, the values are symmetric. The 0th value is the DC value, and the 512th value is the one with the highest frequency. The 511th and 513th have identical magnitude, so the spectrogram probably drops the mirrored values, as it knows the input is real-valued. So you get 513 values (the 513th to 1023rd values are hidden, counting from #0, of course).
The reasoning is that the FFT multiplies your data by a 'rotating' value. The rotation starts slowly: #0 is the DC value, followed by #1, which is one rotation over the entire data. #2 is two rotations, and so on.
#512 is 512 rotations over your 1024 data points, meaning one full rotation every 2 samples. This is the Nyquist frequency for your data; everything above it is subject to aliasing. Therefore #513 looks identical to #511, just rotating in reverse. #1023 is identical to #1: a single rotation, but in the opposite direction.
For complex-valued data, a clockwise and a counterclockwise rotation make a difference, but for real-valued data they carry the same information.
Therefore values #513 to #1023 can be discarded, leaving you with 513 meaningful buckets.
Another detail: technically, the output values of the FFT are always complex, even for real-valued inputs, and contain both magnitude and phase information, but your library probably drops the phase and returns just the magnitude, giving real-valued output.
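The bin count can be checked with NumPy alone. The sketch below uses a random real-valued signal as a stand-in for one 1024-sample window (the signal itself is made up) and compares the full FFT with the one-sided np.fft.rfft, which returns NFFT // 2 + 1 values for real input.

```python
import numpy as np

nfft = 1024
rng = np.random.default_rng(0)
x = rng.standard_normal(nfft)   # real-valued test signal, one window long

full = np.fft.fft(x)            # all 1024 complex bins
half = np.fft.rfft(x)           # one-sided spectrum for real input

n_full = full.shape[0]          # 1024
n_half = half.shape[0]          # nfft // 2 + 1 = 513

# Bins above Nyquist are the complex conjugates of the bins below it,
# e.g. X[513] == conj(X[511]) and X[1023] == conj(X[1]).
mirror_ok = np.allclose(full[513:], np.conj(full[1:512][::-1]))
```

Bins 0 (DC) and 512 (Nyquist) have no mirror partner, which is why the one-sided spectrum has 513 entries rather than 512.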

Numpy histogram data: Why is the length of bins vector longer than the histogram values vector?

There are two outputs to numpy.histogram:
hist: values of the histogram
bin_edges: Return the bin edges (length(hist)+1)
Both are vectors, but in the example below the second vector has length 101, one more than the first vector, which has length 100:
import numpy as np
from numpy.random import rand, randn
n = 100 # number of bins
X = randn(n)*.1
a,bins1 = np.histogram(X,bins=n)
The following shape error occurs if I then try plt.plot(bins1,a):
ValueError: x and y must have same first dimension, but have shapes (101,) and (100,)
Why, and how do I fix the unequal shape error so I can plot the histogram?

The unequal shapes occur because bin_edges, as the name implies, specifies the bin edges. Since each bin has a left and a right edge, and neighbouring bins share an edge, bin_edges has length len(hist)+1.
As already noted in the comments, an appropriate way to plot is plt.hist.
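If the counts from np.histogram should still be plotted by hand, one common option is to reduce the 101 edges to 100 x-values first. A minimal sketch, using the midpoint of each bin (the variable name bin_centers is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100) * 0.1
hist, bin_edges = np.histogram(X, bins=100)

# One x-value per bin: the midpoint between neighbouring edges.
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
```

With that, plt.plot(bin_centers, hist) gets two arrays of equal length. Alternatively, plt.bar(bin_edges[:-1], hist, width=np.diff(bin_edges), align="edge") draws the bars at their actual positions.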

I had this question as well because I wanted to transform my data before doing a histogram but display the results un-transformed (e.g. just keep the autogenerated bin edges). The other answers here get you most of the way, but what I found useful was to do something like this:
h, bin_edges = np.histogram(np.log(X), bins=100)
plt.hist(X, bins=np.exp(bin_edges))
Of course, you could do this manually by just choosing your bin edges originally and passing them in to plt.hist without using np.histogram. But this was nice as the automated calculations simplified some things for me.

Python - create custom bins defined with x and y boundaries

I would like to create spatial bins like (for example) the ones in this plot here:
Here there are 412 bins, but this can vary depending on how many I want (https://arxiv.org/pdf/1909.04701.pdf). I have already computed all the boundary lines that define these bins. If I have millions of points defined by their x, y coordinates, how would I efficiently put each of them in one of these 412 bins?
[update]
The latitude bins are always equal in size while the longitude bins are not. I can use np.digitize to find the latitude bin pretty easily, and once that is found I know the longitude bins as well. However, I'm not sure I can vectorize np.digitize so that it uses a different bin array for each point I provide. longitudinal_bins_arr would be an array holding, for every point I'm trying to bin, the longitude bins of that point's latitude band.
lat_bin_for_points = np.digitize( latlon[:,0], latitudinal_bins )
lon_bin_for_points = np.digitize( latlon[:,1], longitudinal_bins_arr[lat_bin_for_points] )
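np.digitize accepts only one 1-D array of bin edges per call, so the second line above cannot work as written. One possible workaround, sketched here under assumptions, is to loop over the latitude bands (there are few of them) and digitize all points of a band in one vectorised call. The function bin_points and the layout of lon_edges_per_band (a list with one edge array per latitude band) are hypothetical, not taken from the question.

```python
import numpy as np

def bin_points(latlon, lat_edges, lon_edges_per_band):
    """Return (latitude band index, longitude bin index) for every point.

    lat_edges: 1-D increasing array of latitude edges.
    lon_edges_per_band: list with one 1-D increasing array of longitude
    edges per latitude band (assumed layout).
    """
    latlon = np.asarray(latlon, dtype=float)
    # digitize returns 1-based indices; shift to 0-based and clip so that
    # points on or beyond the outer edges fall into the first/last bin.
    lat_idx = np.digitize(latlon[:, 0], lat_edges) - 1
    lat_idx = np.clip(lat_idx, 0, len(lat_edges) - 2)
    lon_idx = np.empty(len(latlon), dtype=int)
    for band in np.unique(lat_idx):
        in_band = lat_idx == band
        edges = lon_edges_per_band[band]
        idx = np.digitize(latlon[in_band, 1], edges) - 1
        lon_idx[in_band] = np.clip(idx, 0, len(edges) - 2)
    return lat_idx, lon_idx

# Tiny made-up grid: two latitude bands with 2 and 3 longitude bins.
lat_edges = np.array([-90.0, 0.0, 90.0])
lon_edges_per_band = [np.array([-180.0, 0.0, 180.0]),
                      np.array([-180.0, -60.0, 60.0, 180.0])]
points = np.array([[-45.0, -10.0], [-45.0, 10.0],
                   [45.0, -100.0], [45.0, 0.0], [45.0, 100.0]])
lat_idx, lon_idx = bin_points(points, lat_edges, lon_edges_per_band)
```

The Python loop runs once per latitude band rather than once per point, so millions of points are still handled by a handful of array operations.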

range of the bins matplotlib

I would like to plot a histogram using matplotlib.
I am just wondering how I can set up the ranges of the bins (<9.0, 9.0-10.0, 11.0-12.0, 12.0-13.0, ..., max element in the array).
<9.0 stands for elements smaller than 9.0.
I have used the smallest and biggest value in an array:
plt.hist(results, bins=np.arange(np.amin(results),np.amax(results),0.1))
I'll be grateful for any hints

The list or array supplied to bins contains the edges of the histogram bins. You may therefore create a bin ranging from the minimal value in results to 9.0.
# assumes numpy is imported as np and that np.min(results) is below 9
bins = [np.min(results)] + list(np.arange(9, np.ceil(np.max(results)) + 1))
plt.hist(results, bins=bins)
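The effect of such a list of edges can be checked with np.histogram before plotting. The sample values below are made up for illustration, and the sketch again assumes the smallest value lies below 9.

```python
import numpy as np

results = np.array([7.2, 8.5, 9.1, 9.9, 10.4, 12.3])  # made-up sample data

# First edge at the minimum, then unit-wide edges from 9.0 up past the maximum.
edges = np.concatenate(([results.min()],
                        np.arange(9.0, np.ceil(results.max()) + 1.0)))
counts, _ = np.histogram(results, bins=edges)
```

Here the first bin collects everything below 9.0 and the remaining bins are one unit wide; passing the same edges to plt.hist draws the corresponding bars.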

Using pyplot to draw histogram

I have a list.
The index of the list is the degree number.
The value is the probability of that degree.
For example, x[ 1 ] = 0.01 means that the probability of degree 1 is 0.01.
I want to draw a distribution graph of this list, and I tried:
hist = plt.figure(1)
plt.hist(PrDeg, bins = 1)
plt.title("Degree Probability Histogram")
plt.xlabel("Degree")
plt.ylabel("Prob.")
hist.savefig("Prob_Hist")
PrDeg is the list I mentioned above.
But the saved figure is not correct.
The X axis shows Prob. and the Y axis shows Degree (the index of the list).
How can I swap the x and y axes using pyplot?

Histograms do not usually show you probabilities; they show the count or frequency of observations within different intervals of values, called bins. pyplot defines the bins by splitting the range between the minimum and maximum value of your array into n equally sized bins, where n is the number you specified with the argument bins = 1. So in this case your histogram has a single bin, which gives it its odd appearance. By increasing that number you will be able to see better what actually happens there.
The only information that we can get from such a histogram is that the values of your data range from 0.0 to ~0.122 and that len(PrDeg) is close to 1800. If I am right about that much, it means your graph looks like what one would expect from a histogram and it is therefore not incorrect.
To answer your question about swapping the axes, the argument orientation=u'horizontal' is what you are looking for. I used it in the example below, renaming the axes accordingly:
import numpy as np
import matplotlib.pyplot as plt
PrDeg = np.random.normal(0,1,10000)
print(PrDeg)
hist = plt.figure(1)
plt.hist(PrDeg, bins = 100, orientation=u'horizontal')
plt.title("Degree Probability Histogram")
plt.xlabel("count")
plt.ylabel("Values randomly generated by numpy")
hist.savefig("Prob_Hist")
plt.show()
