Plot average of scattered values in 2D bins as a histogram/hexplot

I have 3 dimensional scattered data x, y, z.
I want to plot the average of z in bins of x and y as a hex plot or 2D histogram plot.
Is there any matplotlib function to do this?
I can only come up with some very cumbersome implementations even though this seems to be a common problem.
E.g. something like this:
Except that the color should depend on the average z values for the (x, y) bin (rather than the number of entries in the (x, y) bin as in the default hexplot/2D histogram functionalities).

If binning is what you are asking for, then scipy.stats.binned_statistic_2d might work for you. Here's an example:
from scipy.stats import binned_statistic_2d
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(0, 10, 1000)
y = np.random.uniform(10, 20, 1000)
z = np.exp(-(x-3)**2/5 - (y-18)**2/5) + np.random.random(1000)
# 10 edges per axis, i.e. 9 bins in x and 9 bins in y
x_bins = np.linspace(0, 10, 10)
y_bins = np.linspace(10, 20, 10)
ret = binned_statistic_2d(x, y, z, statistic=np.mean, bins=[x_bins, y_bins])
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
ax0.scatter(x, y, c=z)
# statistic is indexed [x_bin, y_bin], so transpose it for imshow; the origin keyword must be 'lower'
ax1.imshow(ret.statistic.T, origin='lower', extent=(0, 10, 10, 20))
plt.show()
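Since the question asks specifically for a hex plot, it may be worth noting that Matplotlib's hexbin accepts a C argument together with reduce_C_function, which colors each hexagon by a statistic of the values in it rather than by the count. A minimal sketch, assuming the same kind of synthetic x, y, z as above (the fixed seed, the gridsize value and the Agg backend are illustrative choices, not part of the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.uniform(0, 10, 1000)
y = rng.uniform(10, 20, 1000)
z = np.exp(-(x - 3)**2 / 5 - (y - 18)**2 / 5) + rng.random(1000)

fig, ax = plt.subplots()
# C=z with reduce_C_function=np.mean colors each hexagon by the mean z of its points
hb = ax.hexbin(x, y, C=z, gridsize=15, reduce_C_function=np.mean)
fig.colorbar(hb, ax=ax, label="mean z")

# values behind the colors, one per drawn hexagon
hex_means = np.asarray(hb.get_array(), dtype=float)
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Any other reducer (np.median, np.max, ...) can be passed the same way.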

@Andrea's answer is very clear and helpful, but I wanted to mention a faster alternative that does not use the scipy library.
The idea is to compute a 2D histogram of x and y weighted by z (which gives the sum of z in each bin) and then divide it by the unweighted histogram (which gives the number of counts in each bin). The result is the average of z in each bin.
The code:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(0, 10, 10**7)
y = np.random.uniform(10, 20, 10**7)
z = np.exp(-(x-3)**2/5 - (y-18)**2/5) + np.random.random(10**7)
x_bins = np.linspace(0, 10, 50)
y_bins = np.linspace(10, 20, 50)
H, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins], weights = z)
H_counts, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins])
H = H/H_counts
plt.imshow(H.T, origin='lower', cmap='RdBu',
extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar()
On my computer, this method is approximately 5 times faster than scipy's binned_statistic_2d.
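One caveat with the weighted-histogram approach: a bin that contains no points has a count of zero, so the plain division produces a warning and a non-finite value there. A small sketch of a guarded version, where the helper name binned_mean and the tiny test data are made up for illustration:

```python
import numpy as np

def binned_mean(x, y, z, bins):
    """Mean of z per (x, y) bin; bins without samples are set to NaN."""
    sums, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=z)
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    means = np.full_like(sums, np.nan)
    # divide only where there is at least one sample; other bins stay NaN
    np.divide(sums, counts, out=means, where=counts > 0)
    return means, xedges, yedges

# three points: two fall into bin (0, 0), one into bin (1, 0)
x = np.array([0.5, 0.5, 1.5])
y = np.array([0.5, 0.5, 0.5])
z = np.array([1.0, 3.0, 10.0])
edges = np.array([0.0, 1.0, 2.0])

means, xedges, yedges = binned_mean(x, y, z, bins=[edges, edges])
print(means)
```

imshow leaves NaN cells blank by default, so empty bins show up as gaps instead of distorting the color scale.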

Related

The Matplotlib Result is Different From WolframAlpha

I want to plot an equation in Matplotlib, but the result differs from the one WolframAlpha gives.
This is the equation:
y = 10yt + y^2t + 20
The plot result in WolframAlpha is:
But when I plot it in Matplotlib with this code
# Creating vectors X and Y
x = np.linspace(-2, 2, 100)
# Assuming α is 10
y = ((10*y*x)+((y**2)*x)+20)
# Create the plot
fig = plt.figure(figsize = (10, 5))
plt.plot(x, y)
The result is:
Any suggestions on how to modify the code so that the plot matches WolframAlpha's? Thank you

As @Him has suggested in the comments, y = ((10*y*x)+((y**2)*x)+20) doesn't describe a relationship so much as make an assignment, and the fact that y appears on both sides of the equation makes this difficult.
It's not trivial to express y cleanly in terms of x, but it's relatively easy to express x in terms of y, and then graph that relationship, like so:
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(-40, 40, 2000)
x = (y-20)*(((10*y)+(y**2))**-1)
fig, ax = plt.subplots()
ax.plot(x, y, linestyle = 'None', marker = '.')
ax.set_xlim(left = -4, right = 4)
ax.grid()
ax.set_xlabel('x')
ax.set_ylabel('y')
Which produces the following result:
If you try to plot this with a line instead of points, you'll get a big discontinuity as the asymptotic limbs try to join up.
So you have to evaluate the same function over three separate ranges of y, split where the denominator 10*y + y**2 is zero (y = -10 and y = 0), and plot them all so you don't get any crossovers.
import numpy as np
import matplotlib.pyplot as plt
y1 = np.linspace(-40, -10, 2000)
y2 = np.linspace(-10, 0, 2000)
y3 = np.linspace(0, 40, 2000)
x = lambda y: (y-20)*(((10*y)+(y**2))**-1)
y = np.hstack([y1, y2, y3])
fig, ax = plt.subplots()
ax.plot(x(y), y, linestyle = '-', color = 'b')
ax.set_xlim(left = -4, right = 4)
ax.grid()
ax.set_xlabel('x')
ax.set_ylabel('y')
Which produces this result, that you were after:
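For reference, the rearrangement used above comes from factoring out x: y - 20 = x*(10*y + y**2), so x = (y - 20)/(10*y + y**2), which is undefined at y = -10 and y = 0. The sketch below builds the three branches explicitly and checks numerically that every point satisfies the original equation. The offset eps and the helper name x_of_y are assumptions made for this illustration only:

```python
import numpy as np

def x_of_y(y):
    """x as a function of y, rearranged from y = 10*y*x + y**2*x + 20."""
    return (y - 20) / (10 * y + y**2)

eps = 1e-3  # assumed small offset that keeps each branch away from the poles
branches = [
    np.linspace(-40, -10 - eps, 500),      # below the pole at y = -10
    np.linspace(-10 + eps, 0 - eps, 500),  # between the two poles
    np.linspace(0 + eps, 40, 500),         # above the pole at y = 0
]

max_residual = 0.0
for y in branches:
    x = x_of_y(y)
    # plug (x, y) back into the original implicit equation
    residual = y - (10 * y * x + y**2 * x + 20)
    max_residual = max(max_residual, float(np.max(np.abs(residual))))
    # each branch can be drawn on its own, e.g. ax.plot(x, y, color='b')

print(max_residual)
```

Plotting each branch with its own ax.plot call keeps the limbs from being joined across the asymptotes.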

How can I give specific x values to `scipy.interpolate.splev`?

How can I interpolate a hysteresis loop at specific x points? Multiple related questions/answers are available on Stack Overflow regarding B-spline interpolation using scipy.interpolate.splprep (other questions here or here). However, I have hundreds of hysteresis loops at very similar (but not exactly the same) x positions, and I would like to perform B-spline interpolation on all of them at specific x coordinates.
Taking a previous example:
import numpy as np
from scipy import interpolate
from matplotlib import pyplot as plt
x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
# append the starting x,y coordinates
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
# fit splines to x=f(u) and y=g(u), treating both as periodic. also note that s=0
# is needed in order to force the spline fit to pass through all the input points.
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# plot the result
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(xi, yi, '-b')
plt.show()
Is it possible to provide specific x values to interpolate.splev? I get unexpected results:
x2, y2 = interpolate.splev(np.linspace(start=23, stop=25, num=30), tck)
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(x2, y2, '-b')
plt.show()

The B-spline gives x and y positions for a given u (between 0 and 1), so the values passed to splev are values of u, not x coordinates.
Getting y positions for a given x position involves solving for the inverse, and there can be many y's corresponding to one x (in the given example there are places with 4 y's, for example at x=24).
A simple way to get a list of (x, y)'s for x between two limits is to create a filter:
import numpy as np
from scipy import interpolate
from matplotlib import pyplot as plt
x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
# append the starting x,y coordinates
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# plot the result
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(xi, yi, '-b')
# boolean mask of the evaluated points whose x lies in [24, 25]
mask = (xi >= 24) & (xi <= 25)
x2 = xi[mask]
y2 = yi[mask]
ax.scatter(x2, y2, color='c')
plt.show()
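If y is needed at one exact x rather than inside a range, the inverse can be solved numerically: sample x(u) densely, find where x(u) - x0 changes sign, and refine each crossing with a root finder. This is only a sketch under stated assumptions: the helper name y_at_x is made up, the sampling density n is arbitrary, and an ellipse stands in for a measured hysteresis loop so that the result is easy to check.

```python
import numpy as np
from scipy import interpolate, optimize

# closed test loop (an ellipse), standing in for a measured hysteresis loop
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
xs = 24 + np.cos(t)
ys = 12.5 + 0.5 * np.sin(t)
# append the starting point so the curve is closed, as in the question
x = np.r_[xs, xs[0]]
y = np.r_[ys, ys[0]]

tck, u = interpolate.splprep([x, y], s=0, per=True)

def y_at_x(tck, x0, n=2000):
    """Return every y at which the parametric spline crosses x = x0."""
    u_grid = np.linspace(0, 1, n)
    fx = interpolate.splev(u_grid, tck)[0] - x0
    # indices where x(u) - x0 changes sign between neighbouring samples
    idx = np.nonzero(fx[:-1] * fx[1:] < 0)[0]
    hits = []
    for i in idx:
        # refine the crossing inside the bracketing interval
        u0 = optimize.brentq(
            lambda uu: float(interpolate.splev(uu, tck)[0]) - x0,
            u_grid[i], u_grid[i + 1])
        hits.append(float(interpolate.splev(u0, tck)[1]))
    return np.array(hits)

y_hits = y_at_x(tck, 24.5)
print(y_hits)
```

For a loop, each x inside the loop yields at least two y values (one per branch), so the caller still has to decide which branch each value belongs to, for example by the order in u.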

Changing the linewidth and the color simultaneously in matplotlib

The figure above is a great piece of work showing wind speed, wind direction and temperature simultaneously. In detail:
The X axis represents the date
The Y axis shows the wind direction (southern, western, etc.)
The varying width of the line stands for the wind speed over the time series
The varying color of the line stands for the atmospheric temperature
This simple figure visualizes 3 different attributes without redundancy.
So I would really like to reproduce a similar plot in matplotlib.
My attempt so far:
## Reference 1 http://stackoverflow.com/questions/19390895/matplotlib-plot-with-variable-line-width
## Reference 2 http://stackoverflow.com/questions/17240694/python-how-to-plot-one-line-in-different-colors
def plot_colourline(x,y,c):
    c = plt.cm.jet((c-np.min(c))/(np.max(c)-np.min(c)))
    lwidths=1+x[:-1]
    ax = plt.gca()
    for i in np.arange(len(x)-1):
        ax.plot([x[i],x[i+1]], [y[i],y[i+1]], c=c[i],linewidth = lwidths[i])
    return
x=np.linspace(0,4*math.pi,100)
y=np.cos(x)
lwidths=1+x[:-1]
fig = plt.figure(1, figsize=(5,5))
ax = fig.add_subplot(111)
plot_colourline(x,y,prop)
ax.set_xlim(0,4*math.pi)
ax.set_ylim(-1.1,1.1)
Does someone have a more interesting way to achieve this? Any advice would be appreciated!

Using as inspiration another question.
One option would be to use fill_between. But perhaps not in the way it was intended. Instead of using it to create your line, use it to mask everything that is not the line. Under it you can have a pcolormesh or contourf (for example) to map color any way you want.
Look, for instance, at this example:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
def windline(x,y,deviation,color):
    # lower and upper edge of the band around y
    y1 = y-deviation/2
    y2 = y+deviation/2
    tol = (y2.max()-y1.min())*0.05
    X, Y = np.meshgrid(np.linspace(x.min(), x.max(), 100), np.linspace(y1.min()-tol, y2.max()+tol, 100))
    Z = X.copy()
    # every row gets the color values, so the color varies only along x
    for i in range(Z.shape[0]):
        Z[i,:] = color
    #plt.pcolormesh(X, Y, Z)
    plt.contourf(X, Y, Z, cmap='seismic')
    # mask everything above and below the band in white
    plt.fill_between(x, y2, y2=np.ones(x.shape)*(y2.max()+tol), color='w')
    plt.fill_between(x, np.ones(x.shape) * (y1.min() - tol), y2=y1, color='w')
    plt.xlim(x.min(), x.max())
    plt.ylim(y1.min()-tol, y2.max()+tol)
    plt.show()
x = np.arange(100)
yo = np.random.randint(20, 60, 21)
y = interp1d(np.arange(0, 101, 5), yo, kind='cubic')(x)
dv = np.random.randint(2, 10, 21)
d = interp1d(np.arange(0, 101, 5), dv, kind='cubic')(x)
co = np.random.randint(20, 60, 21)
c = interp1d(np.arange(0, 101, 5), co, kind='cubic')(x)
windline(x, y, d, c)
which results in this:
The function windline accepts as arguments numpy arrays with x, y , a deviation (like a thickness value per x value), and color array for color mapping. I think it can be greatly improved by messing around with other details but the principle, although not perfect, should be solid.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
x = np.linspace(0,4*np.pi,10000) # x data
y = np.cos(x) # y data
r = np.piecewise(x, [x < 2*np.pi, x >= 2*np.pi], [lambda x: 1-x/(2*np.pi), 0]) # red
g = np.piecewise(x, [x < 2*np.pi, x >= 2*np.pi], [lambda x: x/(2*np.pi), lambda x: -x/(2*np.pi)+2]) # green
b = np.piecewise(x, [x < 2*np.pi, x >= 2*np.pi], [0, lambda x: x/(2*np.pi)-1]) # blue
a = np.ones(10000) # alpha
w = x # width
fig, ax = plt.subplots(2)
ax[0].plot(x, r, color='r')
ax[0].plot(x, g, color='g')
ax[0].plot(x, b, color='b')
# stack the (x, y) pairs into an array of shape (N, 1, 2)
points = np.array([x, y]).T.reshape(-1, 1, 2)
# pair each point with its successor: N-1 segments, each of shape (2, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
rgba = list(zip(r,g,b,a))
lc = LineCollection(segments, linewidths=w, colors=rgba)
ax[1].add_collection(lc)
ax[1].set_xlim(0,4*np.pi)
ax[1].set_ylim(-1.1,1.1)
plt.show()
This is what I ended up with after struggling with the same problem.
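The reshape/concatenate step in the code above is easier to follow on a tiny input. A short sketch (five points only, chosen for illustration) showing what ends up in the segments array that LineCollection receives:

```python
import numpy as np

x = np.linspace(0, 4 * np.pi, 5)
y = np.cos(x)

# (N, 2) array of points -> (N, 1, 2) so neighbours can be joined along axis 1
points = np.array([x, y]).T.reshape(-1, 1, 2)
# pair point i with point i+1: the result has shape (N-1, 2, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

print(segments.shape)
print(segments[0])  # start and end point of the first segment
```

Each segment is drawn as its own short line, which is what allows one width and one color per segment; arrays of length N-1 (for example w[:-1]) match the segments one-to-one.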

How to use scipy.interpolate.interp2d for a vector of data?

I have a table of measured values for a quantity that depends on two parameters.
So say I have a function fuelConsumption(speed, temperature), for which data on a mesh are known.
Now I want to interpolate the expected fuelConsumption for a lot of measured data points (speed, temperature) from a pandas.DataFrame (and return a vector with the values for each data point).
I am currently using SciPy's interpolate.interp2d for interpolation, but when passing the parameters as two vectors [s1,s2] and [t1,t2] (only two ordered values for simplicity) it will construct a mesh and return:
[[f(s1,t1), f(s2,t1)], [f(s1,t2), f(s2,t2)]]
The result I am hoping to get is:
[f(s1,t1), f(s2, t2)]
How can I interpolate to get the output I want?

From scipy v0.14 onwards you can use scipy.interpolate.RectBivariateSpline with grid=False:
import numpy as np
from scipy.interpolate import RectBivariateSpline
from matplotlib import pyplot as plt
x, y = np.ogrid[-1:1:10j,-1:1:10j]
z = (x + y)*np.exp(-6.0 * (x * x + y * y))
spl = RectBivariateSpline(x, y, z)
xi = np.linspace(-1, 1, 50)
yi = np.linspace(-1, 1, 50)
zi = spl(xi, yi, grid=False)
fig, ax = plt.subplots(1, 1)
# ax.hold was removed in Matplotlib 3.0; new artists are always drawn on top of existing ones
ax.imshow(z, cmap=plt.cm.coolwarm, origin='lower', extent=(-1, 1, -1, 1))
ax.scatter(xi, yi, s=60, c=zi, cmap=plt.cm.coolwarm)
plt.show()
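To make the difference between the two call modes concrete in the terms of the question, here is a small sketch. The lookup table is made up (the fuel values are not real data), and the arrays s and t stand in for the two DataFrame columns:

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# hypothetical lookup table: fuel consumption on a (speed, temperature) mesh
speed = np.linspace(0, 100, 11)
temperature = np.linspace(-10, 30, 9)
S, T = np.meshgrid(speed, temperature, indexing="ij")
fuel = 5 + 0.001 * (S - 60)**2 + 0.002 * (T - 15)**2  # made-up values

spl = RectBivariateSpline(speed, temperature, fuel)

# measured points, e.g. df["speed"].to_numpy() and df["temperature"].to_numpy()
s = np.array([20.0, 55.0, 90.0])
t = np.array([-5.0, 10.0, 25.0])

pointwise = spl(s, t, grid=False)  # shape (3,): f(s_i, t_i) for each pair
on_mesh = spl(s, t)                # shape (3, 3): f(s_i, t_j) for all combinations

print(pointwise)
print(np.diag(on_mesh))
```

With grid=False the inputs do not have to be sorted, which suits rows taken straight from a DataFrame. Recent SciPy releases have also deprecated interp2d, so moving to RectBivariateSpline is worthwhile for gridded data in any case.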

Contour graph in python

How would I make a contour grid in Python using matplotlib.pyplot, where the grid is one colour where the z variable is below zero and another where z is equal to or larger than zero? I'm not very familiar with matplotlib, so if anyone can give me a simple way of doing this, that would be great.
So far I have:
x= np.arange(0,361)
y= np.arange(0,91)
X,Y = np.meshgrid(x,y)
area = funcarea(L,D,H,W,X,Y) #L,D,H and W are all constants defined elsewhere.
plt.figure()
plt.contourf(X,Y,area)
plt.show()

You can do this using the levels keyword in contourf.
import numpy as np
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1,2)
x = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, x)
Z = np.sin(X)*np.sin(Y)
levels = np.linspace(-1, 1, 40)
zdata = np.sin(8*X)*np.sin(8*Y)
cs = axs[0].contourf(X, Y, zdata, levels=levels)
fig.colorbar(cs, ax=axs[0], format="%.2f")
cs = axs[1].contourf(X, Y, zdata, levels=[-1,0,1])
fig.colorbar(cs, ax=axs[1])
plt.show()
You can change the colors by choosing a different colormap, by using vmin and vmax, etc.
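For exactly two colours, one option is to pass three levels together with an explicit colors list instead of a colormap. A minimal sketch, assuming the data take both signs; the Z array is a stand-in for the question's area array, and the Agg backend and colour names are illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, x)
Z = np.sin(8 * X) * np.sin(8 * Y)  # stand-in for the question's `area` array

fig, ax = plt.subplots()
# two bands, below zero and above zero, each with its own fixed colour
levels = [Z.min(), 0.0, Z.max()]
cs = ax.contourf(X, Y, Z, levels=levels, colors=["tab:blue", "tab:red"])
fig.colorbar(cs, ax=ax)
```

The levels must be strictly increasing, so this only works as written when Z.min() is below zero and Z.max() is above it.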
