I have four arrays (A30, A300, B30, B300) and I want to create a cumulative, normalized histogram in Python so that the A300 curve ends at 1. I'm looking to normalize the four histograms so that the histogram of A300 will be visibly taller than the rest. However, I do NOT want the area under all four curves to add up to 1. Is there any simple way of doing this with numpy or calculating it?
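One possible approach, as a minimal sketch: bin all four arrays on shared edges and divide every cumulative count by the total number of points in A300 only. The random arrays below are placeholders for the real data, and the sketch assumes A300 is the largest array (otherwise its curve would not be the tallest).

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data (assumption); replace with the real A30, A300, B30, B300.
data = {
    "A30": rng.normal(size=200),
    "A300": rng.normal(size=1000),
    "B30": rng.normal(size=150),
    "B300": rng.normal(size=600),
}

# Shared bin edges so the four curves are directly comparable.
all_values = np.concatenate(list(data.values()))
edges = np.histogram_bin_edges(all_values, bins=50)

# Normalize every cumulative count by the size of A300 only,
# so A300 ends at 1 and the others end at len(array) / len(A300).
reference_total = data["A300"].size
cumulative = {
    name: np.cumsum(np.histogram(values, bins=edges)[0]) / reference_total
    for name, values in data.items()
}
```

Each curve can then be drawn with something like plt.step(edges[1:], cumulative[name]). The areas under the four curves are not forced to add up to 1 here, since only the A300 count is used as the divisor.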
Related
I want to create a 2d histogram, where altitude is represented on the y-axis and max wind speed on the x-axis and each bin is scaled by the total number of data points in the specific row (altitude level).
The desired output looks similar to the attached figure, except that the color should represent a density scaled separately for each altitude level.
The 2d histogram without scaling looks like this:
The goal I want to achieve with scaling is to get a more accurate Pearson correlation, since we're mainly interested in the altitude effect of wind speed maxima.
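A minimal sketch of the row-wise scaling with np.histogram2d, assuming wind speed on the x-axis and altitude on the y-axis. The random arrays and the bin counts (20 x 50) are placeholders, not the real data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder data (assumption); replace with the real measurements.
wind_speed = rng.gamma(shape=2.0, scale=5.0, size=5000)  # x-axis
altitude = rng.uniform(0.0, 10.0, size=5000)             # y-axis

# np.histogram2d returns counts with shape (n_x_bins, n_y_bins).
counts, x_edges, y_edges = np.histogram2d(wind_speed, altitude, bins=(20, 50))

# Transpose so that each row is one altitude level.
counts_by_altitude = counts.T

# Divide each row by its own total; rows without data stay at zero.
row_totals = counts_by_altitude.sum(axis=1, keepdims=True)
row_scaled = np.divide(
    counts_by_altitude,
    row_totals,
    out=np.zeros_like(counts_by_altitude),
    where=row_totals > 0,
)
```

The result has one row per altitude bin, so it can be passed to plt.pcolormesh(x_edges, y_edges, row_scaled) without further transposing.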
I have a 2d histogram where the value of each bin is the number of points in that bin divided by the total number of points in that bin's row (so an occurrence percentage by row). If I am trying to create a line of best fit that goes through the denser center areas of the histogram, how could I do that?
The data I have is a single numpy array, stored like this:
percentages = [[0.00209644 0.00069881 0.00279525 0.00069881 0.00139762
  0.00209644 0.00349406 0.00419287 0.00628931 0.01607268 0.01467505
  0.02166317 0.02445842 0.03214535 ...]
 [0.02581665 0.02212856 0.02107482 ...]]
It is a 50 x 20 array, so each bin has a value. Using these values, I drew the histogram with
plt.pcolormesh(xEdges, yEdges, percentages)
So my question is, how would I create a line of best fit when this is all the information I have?
"the denser center areas of the histogram"
I assume density would be the z-value, i.e. percentages.
Define an upper and lower bound for the values of your line, maybe .079 < percentages <= .081.
Find all the points within those boundaries.
Add a line using those points.
If the line is too thick or not continuous, adjust the boundaries and repeat.
Or:
Determine the value that delineates inside or outside, maybe .08 percent.
Use numpy's isclose function (np.isclose), with an appropriate tolerance, to find the points close to that value.
Draw a line using those points.
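The np.isclose variant could be sketched roughly as follows. The synthetic percentages array, the edges, the target value and the tolerance are all assumptions made for illustration; only the 50 x 20 shape comes from the question.

```python
import numpy as np

# Placeholder edges and a synthetic 50 x 20 array with a dense ridge (assumption).
x_edges = np.linspace(0.0, 40.0, 21)
y_edges = np.linspace(0.0, 10.0, 51)
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
ridge = 10.0 + 2.0 * y_centers  # x position of the densest bin in each row
percentages = np.exp(-0.5 * ((x_centers[None, :] - ridge[:, None]) / 4.0) ** 2)
percentages /= percentages.sum(axis=1, keepdims=True)  # each row sums to 1

# Hypothetical target value and tolerance; tune both for the real data.
target = 0.08
tolerance = 0.01

# Boolean mask of bins whose value lies within +/- tolerance of the target.
mask = np.isclose(percentages, target, rtol=0.0, atol=tolerance)

# Convert the selected bin indices to coordinates (bin centers).
rows, cols = np.nonzero(mask)
x_points = x_centers[cols]
y_points = y_centers[rows]
```

The selected points can then be overlaid on the pcolormesh plot, for example with plt.plot(x_points, y_points, "."). If too few or too many bins are selected, widening or narrowing the tolerance is the adjustment step described above.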
I have the x and y obtained from a histogram, and I want to rebuild that histogram. How can I do that? I tried this:
plt.hist(x,bins=len(x),weights=y)
But it seems like the points are not exactly on the center of the bin and they get significantly shifted after a while (the points on the x-axis are not equally spaced).
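A possible explanation is that bins=len(x) makes equally spaced bins between min(x) and max(x), which cannot line up with unequally spaced points. A minimal sketch of one workaround is to pass explicit bin edges, estimated here as the midpoints between neighbouring x values. The helper edges_from_centers and the sample x and y are hypothetical, and the midpoint rule is only an approximation when the original bin widths are unknown.

```python
import numpy as np

def edges_from_centers(centers):
    """Estimate bin edges as midpoints between neighbouring centers.

    The two outer edges mirror the nearest inner edge around the
    first and last center (an assumption, not recovered information).
    """
    centers = np.asarray(centers, dtype=float)
    mid = 0.5 * (centers[:-1] + centers[1:])
    first = centers[0] - (mid[0] - centers[0])
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])

# Placeholder data (assumption): unequally spaced x with heights y.
x = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([3.0, 5.0, 2.0, 1.0])

edges = edges_from_centers(x)

# Each x falls into its own bin, so the weights reproduce the heights.
counts, _ = np.histogram(x, bins=edges, weights=y)
```

With these edges, plt.hist(x, bins=edges, weights=y) should draw one bar per point, each containing its x value.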
I followed the code at this link
What do the numbers on the x-axis and y-axis mean in this plot? Why are they discrete numbers?
When I used my own data, it gave me this kind of plot, and I can't understand what the plot is trying to say.
As they are working with more than two dimensions (features), they are using PCA to project the data into two dimensions (which do not need to correspond to any of the dimensions of the original data) so that it can be plotted.
So each data point is projected onto the dimensions PCA1 and PCA2, which are real-valued (not discrete).
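To illustrate the projection, here is a minimal sketch of PCA done with plain numpy (via SVD). This is not the code from the link; the random feature matrix is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
# Placeholder data (assumption): 100 samples with 5 features each.
features = rng.normal(size=(100, 5))

# PCA works on mean-centered data.
centered = features - features.mean(axis=0)

# Rows of vt are the principal directions, ordered by explained variance.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Project every sample onto the first two directions (PCA1, PCA2).
projected = centered @ vt[:2].T
```

The two columns of projected are the real-valued coordinates that end up on the x-axis and y-axis of such a plot; they are combinations of all original features rather than any single one.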
I plot seaborn's kdeplot using subsets of a dataset. The main dataset contains people detections, given as coordinates on a map. There can be many detections in a single frame. The data covers 24 hours of detections.
Data format : [time/frame_number, x_cordinate, y_cordinate]
Problem
When I draw two different kdeplots using two subsets of the data (say 1-2pm and 10-11pm), both plots are drawn fine.
By exploring the data I found out that 1-2pm is rush hour, with many detections, and 10-11pm is closing time, with far fewer detections. But kdeplot represents both subsets on the same scale (red density areas). This behavior is understandable, since kde defines the scale based on local max and min values.
Requirement
I want to plot hour-wise kde plots, but I want the scale to be constant over the whole day. Meaning, if 1-2pm is rush hour, the red density areas are shown there, but at 10-11pm, when there is only mild traffic, the color should not be red but rather one of the lower density colors (green, blue), since there are far fewer detections than during rush hour.
In short: build the scale/levels from the 24-hour min and max values and use them consistently in the hour-wise plots.
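One way this could be approached, sketched outside seaborn: evaluate a kernel density per hour that is not divided by the number of detections, so busy hours produce larger values, and then reuse one set of contour levels for every hour. The function count_scaled_density, the fixed bandwidth, the grid and the random hourly_points are all hypothetical and only stand in for the real [time/frame_number, x, y] data.

```python
import numpy as np

def count_scaled_density(points, grid_x, grid_y, bandwidth):
    """Sum of 2D Gaussian kernels on a grid, NOT divided by len(points).

    Skipping that division keeps the result proportional to the number
    of detections, so different hours stay comparable.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    sq_dist = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    kernel /= 2.0 * np.pi * bandwidth ** 2
    return kernel.sum(axis=1).reshape(gx.shape)

rng = np.random.default_rng(3)
# Placeholder detections (assumption): (x, y) pairs for two hourly subsets.
hourly_points = {
    "13-14": rng.normal(loc=5.0, scale=1.0, size=(400, 2)),  # rush hour
    "22-23": rng.normal(loc=5.0, scale=1.0, size=(40, 2)),   # closing time
}

grid_x = np.linspace(0.0, 10.0, 40)
grid_y = np.linspace(0.0, 10.0, 40)
densities = {
    hour: count_scaled_density(points, grid_x, grid_y, bandwidth=0.5)
    for hour, points in hourly_points.items()
}

# One set of levels for the whole day, built from the global maximum.
global_max = max(density.max() for density in densities.values())
levels = np.linspace(0.0, global_max, 11)
```

Each hour can then be drawn with plt.contourf(grid_x, grid_y, densities[hour], levels=levels), so the same color always means the same density. For the real data the dictionary would hold all 24 hourly subsets, and the bandwidth would need tuning to the map's units.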