I have a map, on top of which I wish to overlay a color weight map. The weight map is a grid of m x n boxes, each 1m x 1m. I have many points, and each point carries a weight. The weight of a box is calculated by summing the weights of all the points that fall into that box.
Each box is then filled with a color determined by its weight.
The desired outcome is similar to the one shown here, but
The map and the weight map have to be overlaid nicely in my case.
Instead of frequency, the weights that have been calculated should be used to assign the color.
How may I do this?
If I understand you correctly: you have a set of points, each of which has an x-coordinate, a y-coordinate and a weight associated with it.
The answers in the question you linked to already describe pretty much exactly what you want to do. The only difference is that you can use the weights= argument to get the weighted count in each bin.
For example, you could use
wcounts, xedges, yedges = np.histogram2d(x, y, weights=w)
to get your weighted histogram, then do
extent = xedges[0], xedges[-1], yedges[0], yedges[-1]
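imshow(wcounts.T, extent=extent, origin='lower', alpha=0.5)
to display it. The transpose and origin='lower' are needed because np.histogram2d() puts x along the first axis of wcounts, whereas imshow() draws the first axis vertically, starting from the top.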
I don't know what you mean by overlaying it on a 'map', but you can use the alpha= argument to imshow() to make the image semi-transparent (or you could just draw your 'map' on top of the image).
Likewise, you could do
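hexbin(x, y, C=w, reduce_C_function=np.sum, alpha=0.5)
to draw a weighted hexagonal binning plot, in this case using the C= argument to specify the weights, and again using alpha= to control the transparency of the plot. Note that hexbin() averages the C values in each hexagon by default, so reduce_C_function=np.sum is needed to get the summed weight.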
Edit
Ok, so you want to compute the histogram over a specified grid of bin locations. Suppose your x-coordinates are positions between 0m and 100m, your y-coordinates are between 0m and 75m, and you want each bin to be 1m by 1m. You can pass a tuple of arrays specifying the bin edges in x and y to np.histogram2d():
# remember that for n bins there are n+1 bin edges
x_edges = np.linspace(0, 100, 101)
y_edges = np.linspace(0, 75, 76)
wcounts = np.histogram2d(x, y, weights=w, bins=(x_edges, y_edges))[0]
Now wcounts is a (100, 75) array, with each element representing the weighted count in a 1m by 1m bin.
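To convince yourself that each element really is the summed weight of the points in that box, here is a minimal sketch using made-up data. The random points, the weights and the box indices i, j are assumptions for illustration only; they are not from the question.

```python
import numpy as np

# Synthetic points (assumed for illustration): positions in metres plus a weight
rng = np.random.default_rng(0)
n_points = 1000
x = rng.uniform(0, 100, n_points)
y = rng.uniform(0, 75, n_points)
w = rng.uniform(0, 5, n_points)

# 1m x 1m boxes: n bins need n+1 edges
x_edges = np.linspace(0, 100, 101)
y_edges = np.linspace(0, 75, 76)
wcounts, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges), weights=w)

# Cross-check one arbitrarily chosen box by summing its weights by hand
i, j = 10, 20
in_box = ((x >= x_edges[i]) & (x < x_edges[i + 1]) &
          (y >= y_edges[j]) & (y < y_edges[j + 1]))
manual = w[in_box].sum()

print(wcounts.shape)                      # one entry per box, x along axis 0
print(np.isclose(wcounts[i, j], manual))  # the box weight matches the manual sum
```

When drawing this with imshow(), pass wcounts.T together with origin='lower' and extent=(0, 100, 0, 75) so that x runs horizontally and y runs upwards.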
Related
I have certain 2D arrays jpdf (representing JPDFs) which go from xmin to xmax and ymin to ymax. I want to draw contour plots of the arrays and compare them. Therefore, I want all plots of the JPDFs to have the same x- and y-limits, which are usually larger than the extent of the arrays.
This leads my plots to have ugly white areas around the shown array (see image on the left).
How can I set the lowest level of the colorbar to 'white' without affecting the rest of the colorbar? Or, as another solution: could I pick the lowest color of the colorbar and set the now-white area to that background color?
And is there a sleek way to achieve this?
Here is what the code looks like so far:
im1 = ax.contourf(jpdf.T, cmap='YlGnBu',
extent=[umin, umax, vmin, vmax], levels = 15)
im2 = ax.contour(jpdf.T, colors='gray',
extent=[umin, umax, vmin, vmax], levels = 15)
ax.set_xlim(-0.5, 0.5)
ax.set_ylim(-0.5, 0.5)
I would like to plot contourf with (lat,depth,temp) and then have similar spacing as in the figure below (the temperature varies more near the surface than at depth, so I want to emphasize this region).
My depth array is not uniform (i.e. depth = [5,15,...,4975,5185,...]). I want to have such non-uniform vertical spacing.
I would like to show yticks = [10,100,500,1000,1500,2000,3000,4000,5000], but the depth array does not contain those exact values.
z = np.arange(0,50) # I want uniform spacing
pos = ([0,2,5,10,15,20,30,40,48]) # I want some yticks (not all of them)
ax=plt.contourf(lat,z,temp) # temp is a variable with dimensions (lat,depth)
plt.colorbar()
plt.gca().yaxis.set_ticks(pos) # Set some yticks, not all of them
plt.yticks(z[pos],depth[pos].astype(int)) # Replace the dummy values of z-array by something meaningful
plt.gca().invert_yaxis()
plt.grid(linestyle=':')
plt.gca().set(ylabel='depth (m)',xlabel='Latitude')
Potential Temperature of the Atlantic Ocean:
Per the matplotlib docs on yticks, you can specify the labels you want to use. In your case, if you want to show the labels [10,100,500,1000,1500,2000,3000,4000,5000] you can simply pass that list as the second argument in plt.yticks(), like so
plt.yticks(z[pos], [10,100,500,1000,1500,2000,3000,4000,5000])
and it will display the yticks accordingly. The issue arises in the specification of the positions: since the depth array does not have points corresponding exactly to the desired ytick values, you will need to interpolate in order to find the exact position at which to place each label. If the approximate positions specified in pos are already sufficient, the above is all you need.
If the depth data are not uniformly spaced then you can use numpy.interp to perform the interpolation, as shown below
import matplotlib.pyplot as plt
import numpy as np
# Create some depth data that is not uniformly spaced over [0, 5500]
depth = [(np.random.random() - 0.5)*25 + ii for ii in np.linspace(0, 5500, 50)]
lat = np.linspace(-75, 75, 50)
z = np.linspace(0,50, 50)
yticks = [10,100,500,1000,1500,2000,3000,4000,5000]
# Interpolate depths to get z-positions
pos = np.interp(yticks, depth, z)
temp = np.outer(lat, z) # Arbitrarily populate temp for demonstration
ax = plt.contourf(lat,z,temp)
plt.colorbar()
plt.gca().yaxis.set_ticks(pos)
plt.yticks(pos,yticks) # Place yticks at interpolated z-positions
plt.gca().invert_yaxis()
plt.grid(linestyle=':')
plt.gca().set(ylabel='Depth (m)',xlabel='Latitude')
plt.show()
This will find the exact positions where the yticks would fall if the depth array had data at those positions and place them accordingly as shown below.
I have some geometrically distributed data. When I want to take a look at it, I use
sns.distplot(data, kde=False, norm_hist=True, bins=100)
which results in a picture:
However, the bin heights don't add up to 1, which means the y axis doesn't show probability; it's something different. If instead we use
weights = np.ones_like(np.array(data))/float(len(np.array(data)))
plt.hist(data, weights=weights, bins = 100)
the y axis shows probability, as the bin heights sum to 1:
It can be seen more clearly here: suppose we have a list
l = [1, 3, 2, 1, 3]
We have two 1s, two 3s and one 2, so their respective probabilities are 2/5, 2/5 and 1/5. When we use seaborn distplot with 3 bins:
sns.distplot(l, kde=False, norm_hist=True, bins=3)
we get:
As you can see, the 1st and the 3rd bin sum up to 0.6+0.6=1.2 which is already greater than 1, so y axis is not a probability. When we use
weights = np.ones_like(np.array(l))/float(len(np.array(l)))
plt.hist(l, weights=weights, bins = 3)
we get:
and the y axis is probability, as 0.4+0.4+0.2=1 as expected.
The number of bins is the same for both methods in each case: 100 bins for the geometrically distributed data, 3 bins for the small array l with 3 possible values. So the number of bins is not the issue.
My question is: in seaborn distplot called with norm_hist=True, what is the meaning of y axis?
From the documentation:
norm_hist : bool, optional
If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.
So you need to take into account your bin width as well, i.e. compute the area under the curve and not just the sum of the bin heights.
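As a rough check of this on the small list l from the question, here is a sketch that uses np.histogram with density=True as a stand-in for what norm_hist=True plots. That equivalence is an assumption on my part, but it reproduces the 0.6 heights reported in the question.

```python
import numpy as np

l = [1, 3, 2, 1, 3]

# density=True normalizes the counts so that the total *area* is 1
heights, edges = np.histogram(l, bins=3, density=True)
widths = np.diff(edges)          # each bin is (3 - 1) / 3 wide

# Probability of each bin = height * width
probabilities = heights * widths

print(heights)              # densities: these can sum to more than 1
print(probabilities)        # probabilities: 2/5, 1/5, 2/5
print(probabilities.sum())  # total area under the histogram
```

The heights are 0.6, 0.3 and 0.6 because each count is divided by len(l) times the bin width of 2/3; multiplying back by the width recovers the probabilities from the plt.hist version.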
The x-axis is the value of the variable just like in a histogram, but what exactly does the y-axis represent?
ANS-> The y-axis in a density plot is the probability density function for the kernel density estimation. However, we need to be careful to specify this is a probability density and not a probability. The difference is the probability density is the probability per unit on the x-axis. To convert to an actual probability, we need to find the area under the curve for a specific interval on the x-axis. Somewhat confusingly, because this is a probability density and not a probability, the y-axis can take values greater than one. The only requirement of the density plot is that the total area under the curve integrates to one. I generally tend to think of the y-axis on a density plot as a value only for relative comparisons between different categories.
from the reference of https://towardsdatascience.com/histograms-and-density-plots-in-python-f6bda88f5ac0
This code will help you make something like this :
sns.set_style("whitegrid")
ax = sns.displot(data=df_p,
x='Volume_Tonnes', kind='kde', fill=True, height=5, aspect=2)
# Here you can define the x limit
ax.set(xlim=(-50,100))
ax.set(xlabel = 'Volume Tonnes', ylabel = 'Probability Density')
ax.fig.suptitle("Volume Tonnes Distribution",
fontsize=20, fontdict={"weight": "bold"})
plt.show()
I'm trying to plot a simple histogram with multiple data in parallel.
My data are a set of 2D ndarrays, all of them with the same dimension (in this example 256 x 256).
I have this method to plot the data set:
def plot_data_histograms(data, bins, color, label, file_path):
    """
    Plot multiple data histograms in parallel
    :param data : a set of data to be plotted
    :param bins : the number of bins to be used
    :param color : the color of each data in the set
    :param label : the label of each color in the set
    :param file_path : the path where the output will be saved
    """
    plt.figure()
    plt.hist(data, bins, normed=1, color=color, label=label, alpha=0.75)
    plt.legend(loc='upper right')
    plt.savefig(file_path + '.png')
    plt.close()
And I'm passing my data as follows:
data = [sobel.flatten(), prewitt.flatten(), roberts.flatten(), scharr.flatten()]
labels = ['Sobel', 'Prewitt', 'Roberts Cross', 'Scharr']
colors = ['green', 'blue', 'yellow', 'red']
plot_data_histograms(data, 5, colors, labels, '../Visualizations/StatisticalMeasures/RMSEHistograms')
And I got this histogram:
I know that this may be stupid, but I didn't get why my yticks vary from 0 to 4.5. I know that it is due to the normed parameter, but even after reading this:
If True, the first element of the return tuple will be the counts
normalized to form a probability density, i.e., n/(len(x)*dbin). In a
probability density, the integral of the histogram should be 1; you
can verify that with a trapezoidal integration of the probability
density function.
I didn't really get how it works.
Also, since I set my bins to five and the histogram has exactly 5 xticks (excluding borders), I didn't understand why I have some bars in the middle of some ticks, like the yellow one over the 0.6 tick. Since my number of bins and of xticks match, I thought that each set of four bars should be concentrated inside each interval, as happens with the first four bars, which are completely concentrated inside the [0.0, 0.2] interval.
Thank you in advance.
The reason this is confusing is because you're squishing four histograms on one plot. In order to do this, matplotlib chooses to narrow the bars and put a gap between them. In a standard histogram, the total area of all bins is either 1 if normed or N. Here's a simple example:
a = np.random.rand(10)
bins = np.array([0, 0.5, 1.0]) # just two bins
plt.hist(a, bins, normed=True)
First note that each bar covers the entire range of its bin: the first bar ranges from 0 to 0.5, and its height is determined by the number of points in that range.
Next, you can see that the total area of the two bars is 1 because normed = True: The width of each bar is 0.5 and the heights are 1.2 and 0.8.
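To make the n/(len(x)*dbin) formula from the docs concrete, here is a small sketch with np.histogram. The array a is hand-picked (an assumption, not the random data above) so that the heights come out as 1.2 and 0.8, and it uses density=True, which newer matplotlib and NumPy releases use in place of normed.

```python
import numpy as np

# Hand-picked sample: 6 values fall in [0, 0.5) and 4 in [0.5, 1.0]
a = np.array([0.05, 0.12, 0.21, 0.33, 0.41, 0.48, 0.55, 0.67, 0.80, 0.93])
bins = np.array([0.0, 0.5, 1.0])  # just two bins

# Raw counts per bin, then the normalization done by hand
counts, edges = np.histogram(a, bins)
widths = np.diff(edges)
heights_manual = counts / (len(a) * widths)

# The same normalization done by NumPy
heights, _ = np.histogram(a, bins, density=True)

# Area of the bars = height * width, summed over the bins
total_area = (heights * widths).sum()
print(heights_manual, heights, total_area)
```

Both height arrays agree, and the area is 0.5 * 1.2 + 0.5 * 0.8 = 1, which is why the y axis can exceed 1 whenever the bins are narrower than 1.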
Let's plot the same thing again with another distribution so you can see the effect:
b = np.random.rand(10)
plt.hist([a, b], bins, normed=True)
Recall that the blue bars represent exactly the same data as in the first plot, but they're less than half the width now because they must make room for the green bars. You can see that now two bars plus some whitespace covers the range of each bin. So we must pretend that the width of each bar is actually the width of all bars plus the width of the whitespace gap when we are calculating the bin range and bar area.
Finally, notice that nowhere do the xticks align with the bin edges. If you wish, you can set this to be the case manually, with:
plt.xticks(bins)
If you hadn't manually created bins first, you can grab it from plt.hist:
counts, bins, bars = plt.hist(...)
plt.xticks(bins)
I'm plotting a 2d histogram as image in pyqtgraph. I would like to set the axes scales correctly (i.e. representing the actual values of the binned data).
I found this article but I'm not quite sure how to translate it to my case.
I do:
h = np.histogram2d(x, y, 30, normed = True)
w = pg.ImageView(view=pg.PlotItem())
w.setImage(h[0])
but the scale of the PlotItem axes run from 0 to 30 (number of bins), which is not what I would like.
You need to set the position and scale of the image. The link you provided has the following code:
view.setImage(img, pos=[x0, y0], scale=[xscale, yscale])
You only need to determine the correct values of [x0, y0] and [xscale, yscale] based on your bin values in h[1].
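A minimal sketch of how those values could be derived, assuming equally spaced bins. The helper image_pos_and_scale and the sample data are hypothetical names introduced here for illustration, and the pyqtgraph call is left as a comment because it needs a running Qt application.

```python
import numpy as np

def image_pos_and_scale(xedges, yedges):
    """Hypothetical helper: image origin and per-pixel size from bin edges.

    Assumes the bins are equally spaced, as they are when an integer
    number of bins is passed to np.histogram2d.
    """
    pos = [xedges[0], yedges[0]]                 # lower-left corner in data units
    scale = [xedges[1] - xedges[0],              # width of one bin in x
             yedges[1] - yedges[0]]              # height of one bin in y
    return pos, scale

# Made-up data: x spans [0, 3], y spans [10, 40]
rng = np.random.default_rng(1)
x = rng.uniform(0, 3, 500)
y = rng.uniform(10, 40, 500)

hist, xedges, yedges = np.histogram2d(x, y, bins=30, range=[[0, 3], [10, 40]])
pos, scale = image_pos_and_scale(xedges, yedges)
print(pos, scale)

# With the ImageView from the question this would then be used as:
# w.setImage(hist, pos=pos, scale=scale)
```

In the question's code, xedges and yedges are h[1] and h[2] respectively.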