Python - create custom bins defined with x and y boundaries

I would like to create spatial bins like (for example) the ones in this plot here:
Here there are 412 bins, but this can vary depending on how many I want (https://arxiv.org/pdf/1909.04701.pdf). I have already computed all the boundary lines that define these bins. If I have millions of points defined by their x, y coordinates, how would I efficiently put each of them in one of these 412 bins?
[update]
The latitudinal bins are always equal in size, while the longitudinal bins are not. I can use np.digitize to find the latitudinal bin pretty easily, and once that is found I know the longitudinal bins for that latitude as well. However, I'm not sure I can vectorize np.digitize so that it uses a different bin array for each point I provide. Below, longitudinal_bins_arr would be an array holding the longitudinal bins for each latitude, indexed so that there is one set of bins for every point I'm trying to bin.
lat_bin_for_points = np.digitize( latlon[:,0], latitudinal_bins )
lon_bin_for_points = np.digitize( latlon[:,1], longitudinal_bins_arr[lat_bin_for_points] )
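np.digitize only accepts a single 1-D array of edges, so the second call above cannot take a different edge array per point. One possible workaround, sketched below, is to loop over the latitude bands (there are few of them) and digitize the longitudes of the points in each band in one vectorized call. The function name assign_bins, the argument names, and the convention of returning -1 for points outside the grid are assumptions made for this illustration, not part of the question.

```python
import numpy as np

def assign_bins(lat, lon, lat_edges, lon_edges_per_band):
    """Return one flat bin id per point, or -1 if the point is outside the grid.

    lat_edges: increasing 1-D array of latitude band edges.
    lon_edges_per_band: one increasing edge array per latitude band
    (hypothetical layout; the bands may have different numbers of bins).
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    n_bands = len(lat_edges) - 1

    # Latitude band of every point (0-based); out-of-range points get
    # -1 or n_bands and are never matched in the loop below.
    band = np.digitize(lat, lat_edges) - 1

    # Number of longitude bins per band and the flat id of each band's first bin.
    counts = np.array([len(e) - 1 for e in lon_edges_per_band])
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))

    bin_id = np.full(lat.shape, -1, dtype=np.int64)
    for b in range(n_bands):  # loop over bands, not over points
        mask = band == b
        if not mask.any():
            continue
        cell = np.digitize(lon[mask], np.asarray(lon_edges_per_band[b])) - 1
        valid = (cell >= 0) & (cell < counts[b])
        bin_id[mask] = np.where(valid, offsets[b] + cell, -1)
    return bin_id

# Tiny made-up grid: 2 latitude bands with 2 and 3 longitude bins.
lat_edges = np.array([-90.0, 0.0, 90.0])
lon_edges_per_band = [np.array([0.0, 180.0, 360.0]),
                      np.array([0.0, 120.0, 240.0, 360.0])]
lat = np.array([-45.0, -45.0, 45.0, 45.0, 95.0])
lon = np.array([90.0, 270.0, 130.0, 350.0, 10.0])
ids = assign_bins(lat, lon, lat_edges, lon_edges_per_band)
```

The Python-level loop runs once per latitude band rather than once per point, so it should stay cheap even for millions of points. Bins are half-open, so a point lying exactly on the last edge is reported as outside.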

Related

How can I find the line of best fit with a 2d-Histogram in Python?

I have a 2D histogram where the value of each bin is the number of points in that bin divided by the total number of points in that bin's row (so an occurrence percentage by row). If I am trying to create a line of best fit that goes through the denser center areas of the histogram, how could I do that?
The data I have is one numpy array stored like this:
percentages = [[0.00209644 0.00069881 0.00279525 0.00069881 0.00139762
0.00209644 0.00349406 0.00419287 0.00628931 0.01607268 0.01467505
0.02166317 0.02445842 0.03214535, i, i, i, and so on]
[0.02581665 0.02212856 0.02107482...]]
That is a 50 x 20 array, so each bin has a value. Using these values, I made the histogram with
plt.pcolormesh(xEdges, yEdges, percentages)
So my question is: how would I create a line of best fit when this is all the information I have?
"the denser center areas of the histogram"
I assume density would be the z-value, i.e. percentages.
One option:
Define an upper and lower bound for the values of your line, maybe .079 < percentages <= .081.
Find all the points within those boundaries.
Add a line using those points.
If the line is too thick or not continuous, adjust the boundaries and repeat.
Alternatively:
Determine the value that delineates inside from outside, maybe .08.
Use numpy's isclose function (np.isclose), with an appropriate tolerance, to find the points close to that value.
Draw a line using those points.
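The bounds-based selection can be sketched as follows. The data here is synthetic (a made-up ridge standing in for the real percentages, xEdges and yEdges), the bounds are arbitrary, and fitting the selected bin centers with np.polyfit is only one assumed way of "drawing a line using those points"; the answer itself does not prescribe a fitting method.

```python
import numpy as np

# Hypothetical stand-in data: a 50 x 20 row-normalised array with a ridge
# along x = 5 + 0.2 * y. The real percentages/xEdges/yEdges would replace these.
x_edges = np.linspace(0.0, 20.0, 21)
y_edges = np.linspace(0.0, 50.0, 51)
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
ridge = 5.0 + 0.2 * y_centers
percentages = np.exp(-0.5 * (x_centers[None, :] - ridge[:, None]) ** 2)
percentages /= percentages.sum(axis=1, keepdims=True)

# Keep the cells whose value lies inside the chosen bounds (arbitrary here).
lower, upper = 0.5 * percentages.max(), percentages.max()
rows, cols = np.nonzero((percentages >= lower) & (percentages <= upper))

# Fit a straight line through the centres of the selected cells.
# x is fitted as a function of y because the values are normalised per row.
slope, intercept = np.polyfit(y_centers[rows], x_centers[cols], deg=1)
```

The fitted line could then be drawn on top of the pcolormesh plot with plt.plot(slope * y_centers + intercept, y_centers). If the selected cells form a band that is too wide, tightening the bounds as the answer suggests should narrow it.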

Build a histogram in python by giving bins parameters

I have the x and y obtained from a histogram, and I want to rebuild that histogram. How can I do that? I tried this:
plt.hist(x,bins=len(x),weights=y)
But it seems that the points are not exactly at the centers of the bins, and they get significantly shifted after a while (the points on the x-axis are not equally spaced).
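A likely cause is that bins=len(x) asks for equally wide bins, which cannot line up with unequally spaced x values. A minimal sketch of one workaround is to pass explicit bin edges instead. Here the edges are assumed to lie halfway between neighbouring x values, which is only an approximation if the original edges are unknown; edges_from_centers is a hypothetical helper and the sample values are made up.

```python
import numpy as np

def edges_from_centers(centers):
    """Approximate bin edges as midpoints between neighbouring centers.

    The two outer edges are mirrored around the first and last center.
    This is an assumption: the true original edges may differ.
    """
    centers = np.asarray(centers, dtype=float)
    mids = 0.5 * (centers[:-1] + centers[1:])
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    return np.concatenate(([first], mids, [last]))

# Made-up, unequally spaced bin positions and their heights.
x = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([3.0, 5.0, 2.0, 1.0])

edges = edges_from_centers(x)
# Each x falls in exactly one bin, so the weights reproduce the heights.
rebuilt, _ = np.histogram(x, bins=edges, weights=y)
```

Passing the same edges to plt.hist(x, bins=edges, weights=y) should draw the corresponding bars.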

How do I estimate the 80% cumulative distribution point in scipy, numpy and/or Python?

Seaborn has a kdeplot function where if you pass in cumulative=True, then a cumulative distribution of the data is drawn. I need to annotate or figure out the value on the x-axis at which the cumulative distribution is 80% and then draw a vertical line from that value.
Is there a method in numpy, scipy or elsewhere in Python that may compute that value?
If you already have the CDF, then you can do the following. I'm not sure how your data is formatted, but assuming you have two arrays, one of x-values and one of y-values, you can search for the index of the last y-value at or below 0.8. The corresponding x-value would be what you're looking for. A quick way to do this, since your y-values should already be sorted, is:
import bisect
index = bisect.bisect_right(y_vals, 0.8) - 1
This is a nearest neighbor approach. If you want a slightly more accurate x-value, you can linearly interpolate between index and index+1.
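Both variants can be sketched as below, using np.interp for the linear interpolation. The x_vals and y_vals arrays are made-up CDF samples; np.interp requires y_vals to be increasing, which holds for a CDF evaluated on sorted x-values.

```python
import bisect
import numpy as np

# Hypothetical CDF samples: y_vals[i] is the cumulative probability at x_vals[i].
x_vals = [0.0, 1.0, 2.0, 3.0, 4.0]
y_vals = [0.0, 0.25, 0.5, 0.75, 1.0]

# Step lookup: last sample whose cumulative probability is at or below 0.8.
index = bisect.bisect_right(y_vals, 0.8) - 1
x_step = x_vals[index]

# Linear interpolation between the samples on either side of 0.8.
x_interp = float(np.interp(0.8, y_vals, x_vals))
```

The vertical line could then be drawn with plt.axvline(x_interp).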

range of the bins matplotlib

I would like to plot a histogram using matplotlib.
I am wondering how I can set up the ranges of the bins (<9.0, 9.0-10.0, 11.0-12.0, 12.0-13.0, ... up to the max element in the array).
<9.0 stands for elements smaller than 9.0.
I have used the smallest and biggest value in an array:
plt.hist(results, bins=np.arange(np.amin(results),np.amax(results),0.1))
I'll be grateful for any hints.
The list or array supplied to bins contains the edges of the histogram bins. You may therefore create a bin ranging from the minimal value in results to 9.0.
import numpy as np
import matplotlib.pyplot as plt
bins = [np.min(results)] + list(np.arange(9, np.ceil(np.max(results)) + 1))  # assumes min(results) < 9
plt.hist(results, bins=bins)

Binning data with overlapping bins

I need to bin wind data.
The idea is to vary the wind bin sizes so that each bin covers a minimum amount of data. At the end I will have 360 overlapping bins.
Therefore I need to define the lower and upper limits of the bins and then vary the upper limits according to the data that fall into the different bins.
np.digitize seems to work only with 1-dimensional bins,
so this was my attempt:
import numpy as np
from scipy.stats import binned_statistic
#initially equal size bins[[lower limit], [upper limit]]
bins=np.array([[np.arange(0,360)], [np.arange(1,361)]])
#dfIn is a vector with angles from 0 to 360
index_Rdir= binned_statistic(dfIn, dfIn, bins=bins)
Then comes the rest of the algorithm, in which the frequency of data in each bin is calculated and the upper limit is increased for those bins that do not get a minimum amount of data.
Binning with binned_statistic raises the following error:
ValueError: cannot copy sequence with size 2973 to array axis with dimension 3
If I use a list instead of an array for the bins, I get a similar error.
I also tried np.histogram, since according to the documentation it allows non-uniform bin sizes:
index_Rdir= np.histogram(dfIn, bins=bins)
all the input arrays must have same number of dimensions
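Both binned_statistic and np.histogram expect one increasing 1-D array of edges, which cannot describe overlapping bins. A possible alternative, sketched below, keeps separate lower and upper arrays and counts the data in each interval with np.searchsorted on the sorted data. The function name, the step and max_width parameters, and the half-open [lower, upper) convention are assumptions for illustration, and wrap-around past 360 degrees is not handled.

```python
import numpy as np

def widen_until_min_count(values, lower, upper, min_count, step=1.0, max_width=360.0):
    """Grow the upper limit of each [lower, upper) bin until it holds
    at least min_count values or reaches max_width.

    Hypothetical helper: bins may overlap, and angles are not wrapped at 360.
    """
    data = np.sort(np.asarray(values, dtype=float))
    lower = np.asarray(lower, dtype=float)
    upper = np.array(upper, dtype=float)  # copy, so the caller's array is untouched
    while True:
        # Count of values in each interval, from positions in the sorted data.
        counts = (np.searchsorted(data, upper, side="left")
                  - np.searchsorted(data, lower, side="left"))
        short = (counts < min_count) & (upper - lower < max_width)
        if not short.any():
            return upper, counts
        upper[short] += step

# Small made-up example with four initial bins of width 1.
values = [0.5, 0.6, 2.5, 3.5]
lower = np.arange(0.0, 4.0)
upper = np.arange(1.0, 5.0)
new_upper, counts = widen_until_min_count(values, lower, upper, min_count=2, max_width=5.0)
```

For the wind data in the question, lower would be np.arange(0, 360) and upper np.arange(1, 361). A bin that can never reach min_count stops growing once its width hits max_width.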
