python piecewise linear interpolation - python

I'm trying to create a piecewise linear interpolation routine. I'm pretty new to all of this, so I'm very uncertain about what needs to be done.
I've generated a set of data points in 3D that varies in all three directions. I want to interpolate between these data points and plot the result in 3D.
The current data set is much smaller than the final one will be. Linear interpolation is important.
Here's the current code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import scipy.interpolate as interp
x = np.linspace(-1.3,1.3,10)
y1 = np.linspace(.5,0.,5)
y2 = np.linspace(0.,.5,5)
y = np.hstack((y1,y2))
z1 = np.linspace(.1,0.,5)
z2 = np.linspace(0.,.1,5)
z = np.hstack((z1,z2))
data = np.dstack([x,y,z])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
f = interp.interp2d(x, y, z, kind='linear')
xnew = np.linspace(-1.3,1.3,100)
y1new = np.linspace(.5,0.,50)
y2new = np.linspace(0.,.5,50)
ynew = np.hstack((y1new,y2new))
znew = f(xnew,ynew)
ax.plot(x,y,znew, 'b-')
ax.scatter(x,y,z,'ro')
plt.show()
As I said, the dataset is just there to add variation. The real set will be much bigger but have less variation. I don't really understand the interpolation tool, and the SciPy documentation isn't very clear.
I would appreciate suggestions.

2D ok. Please help with 3D
What I'm trying to do is build something that takes data points for the deflections of a beam and interpolates between them. I wanted to do this in 3D and get a 3D plot showing the deflection along the x-axis in both the y and z directions at the same time. As a stopgap measure I've used the code below to show the deflection in the y direction and the z direction individually. Note that the data set is randomly generated for the moment. Some choices might look strange at the moment, but that's to stick roughly to the range the final data set will use. The code below works for a 2D system, so it may be helpful to someone. I'd still really appreciate it if someone could help me do this in 3D.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
u=10
x = np.linspace(-1.3,1.3,u) #regular x-data
y = np.random.random_sample(u)/4 #random y data
z = np.random.random_sample(u)/10 # random zdata
ynone = np.ones(u)*0.1 #no deflection dataset
znone = np.ones(u)*0.05
xspace = np.linspace(-1.3, 1.3, u*100)
ydefl = CubicSpline(x, y) #creating cubic spline function for original data
zdefl = CubicSpline(x, z)
plt.subplot(2, 1, 1)
plt.plot(x, ynone, '-',label='y - no deflection')
plt.plot(x, y, 'go',label='y-deflection data')
plt.plot(xspace, ydefl(xspace), label='spline') #plot xspace vs spline function of xspace
plt.title('X [m]s')
plt.ylabel('Y [m]')
plt.legend(loc='best', ncol=3)
plt.subplot(2, 1, 2)
plt.plot(x, znone, '-',label='z - no deflection')
plt.plot(x, z, 'go',label='z-deflection data')
plt.plot(xspace, zdefl(xspace),label='spline')
plt.xlabel('X [m]')
plt.ylabel('Z [m]')
plt.legend(loc='best', ncol=3)
plt.show()
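A minimal sketch of the 3D case, assuming the beam can be treated as a curve whose y and z deflections are each a function of x. Under that assumption two 1D piecewise linear interpolations are enough, and no 2D surface interpolator is needed. np.interp does the piecewise linear interpolation; the helper names interpolate_beam and plot_beam are made up for illustration.

```python
import numpy as np

def interpolate_beam(x, y, z, n_points=200):
    """Piecewise linear interpolation of y(x) and z(x) on a finer x grid."""
    x_fine = np.linspace(x.min(), x.max(), n_points)
    # np.interp expects increasing x and joins the samples with straight lines
    y_fine = np.interp(x_fine, x, y)
    z_fine = np.interp(x_fine, x, z)
    return x_fine, y_fine, z_fine

def plot_beam(x, y, z, x_fine, y_fine, z_fine):
    """Draw the data points and the interpolated curve in one 3D axes."""
    import matplotlib.pyplot as plt  # imported here so the interpolation runs without it
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot(x_fine, y_fine, z_fine, "b-", label="linear interpolation")
    ax.scatter(x, y, z, color="r", label="data")
    ax.set_xlabel("X [m]")
    ax.set_ylabel("Y [m]")
    ax.set_zlabel("Z [m]")
    ax.legend()
    plt.show()

# Same synthetic data as in the first snippet of the question
x = np.linspace(-1.3, 1.3, 10)
y = np.hstack((np.linspace(0.5, 0.0, 5), np.linspace(0.0, 0.5, 5)))
z = np.hstack((np.linspace(0.1, 0.0, 5), np.linspace(0.0, 0.1, 5)))
x_fine, y_fine, z_fine = interpolate_beam(x, y, z)
```

Calling plot_beam(x, y, z, x_fine, y_fine, z_fine) then shows the deflection in y and z along x in a single 3D plot. If the data were a surface z(x, y) rather than a curve, a 2D interpolator would be needed instead.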

Related

How to plot smooth curve through the true data points in Python 3?

I have tried a bunch of spline examples already posted for plotting smooth curves in Python, but those smoothed curves don't always pass through the true points. So far, I have tried
make_interp_spline, interp1d and also BSpline.
Is there any way (any spline option) I can plot a smooth curve that must include/cross through true data points?
Here is a simple example with interp1d:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
x = np.arange(5)
y = np.random.random(5)
fun = interp1d(x=x, y=y, kind=2)
x2 = np.linspace(start=0, stop=4, num=1000)
y2 = fun(x2)
plt.plot(x2, y2, color='b')
plt.plot(x, y, ls='', marker='o', color='r')
You can easily verify that this interpolation includes the true data points:
assert np.allclose(fun(x), y)
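As a hedged alternative to interp1d, here is a small sketch with two other interpolating classes from scipy.interpolate. Both pass through the samples by construction. The choice between them is an assumption about what "smooth" should mean: CubicSpline gives a smoother curve that may overshoot, while PchipInterpolator stays within the range of neighbouring samples.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

rng = np.random.default_rng(0)
x = np.arange(5)
y = rng.random(5)

smooth = CubicSpline(x, y)          # twice continuously differentiable, may overshoot
monotone = PchipInterpolator(x, y)  # once continuously differentiable, no overshoot

x2 = np.linspace(0, 4, 1000)
y_smooth = smooth(x2)
y_monotone = monotone(x2)

# Both interpolants reproduce the original samples
print(np.allclose(smooth(x), y), np.allclose(monotone(x), y))
```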

3D surface graph with matplotlib using dataframe columns to input the data

I have a spreadsheet file that I would like to input to create a 3D surface graph using Matplotlib in Python.
I used plot_trisurf and it worked, but I need the projections of the contour profiles onto the graph that I can get with the surface function, like this example.
I'm struggling to arrange my Z data in a 2D array that I can use to input in the plot_surface method. I tried a lot of things, but none seems to work.
Here is what I have working, using plot_trisurf:
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import pandas as pd
df=pd.read_excel ("/Users/carolethais/Desktop/Dissertação Carol/Códigos/Resultados/res_02_0.5.xlsx")
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is rejected by recent Matplotlib releases
# I got the graph using trisurf
graf=ax.plot_trisurf(df["Diametro"],df["Comprimento"], df["temp_out"], cmap=matplotlib.cm.coolwarm)
ax.set_xlim(0, 0.5)
ax.set_ylim(0, 100)
ax.set_zlim(25,40)
fig.colorbar(graf, shrink=0.5, aspect=15)
ax.set_xlabel('Diâmetro (m)')
ax.set_ylabel('Comprimento (m)')
ax.set_zlabel('Temperatura de Saída (ºC)')
plt.show()
This is a part of my df, dataframe:
Diametro Comprimento temp_out
0 0.334294 0.787092 34.801994
1 0.334294 8.187065 32.465551
2 0.334294 26.155976 29.206090
3 0.334294 43.648591 27.792126
4 0.334294 60.768219 27.163233
... ... ... ...
59995 0.437266 14.113660 31.947302
59996 0.437266 25.208851 30.317583
59997 0.437266 33.823035 29.405461
59998 0.437266 57.724209 27.891616
59999 0.437266 62.455890 27.709298
I tried this approach to use the imported data with plot_surface. It did produce a graph, but not a correct one; this is how the graph looked with this approach:
Thank you so much
A different approach, based on re-gridding the data, that doesn't require that the original data is specified on a regular grid [deeply inspired by this example;-].
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.tri as tri
from mpl_toolkits.mplot3d import Axes3D
np.random.seed(19880808)
# compute the sombrero over a cloud of random points
npts = 10000
x, y = np.random.uniform(-5, 5, npts), np.random.uniform(-5, 5, npts)
z = np.cos(1.5*np.sqrt(x*x + y*y))/(1+0.33*(x*x+y*y))
# prepare the interpolator
triang = tri.Triangulation(x, y)
interpolator = tri.LinearTriInterpolator(triang, z)
# do the interpolation
xi = yi = np.linspace(-5, 5, 101)
Xi, Yi = np.meshgrid(xi, yi)
Zi = interpolator(Xi, Yi)
# plotting
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is rejected by recent Matplotlib releases
norm = plt.Normalize(-1, 1)
ax.plot_surface(Xi, Yi, Zi,
                cmap='inferno',
                norm=norm)
plt.show()
plot_trisurf expects x, y, z as 1D arrays while plot_surface expects X, Y, Z as 2D arrays or as x, y, Z with x, y being 1D array and Z a 2D array.
Your data consists of 3 1D arrays, so plotting them with plot_trisurf is immediate but you need to use plot_surface to be able to project the isolines on the coordinate planes... You need to reshape your data.
It seems that you have 60000 data points, in the following I assume that you have a regular grid 300 points in the x direction and 200 points in y — but what is important is the idea of regular grid.
The code below shows
1. the use of plot_trisurf (with a coarser mesh), similar to your code;
2. the correct use of reshaping and its application in plot_surface; note that the number of rows in reshaping corresponds to the number of points in y and the number of columns to the number of points in x;
3. and 4. incorrect use of reshaping; the resulting subplots are somewhat similar to the plot you showed, so maybe you just need to fix the number of rows and columns.
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
x, y = np.arange(30)/3.-5, np.arange(20)/2.-5
x, y = (arr.flatten() for arr in np.meshgrid(x, y))
z = np.cos(1.5*np.sqrt(x*x + y*y))/(1+0.1*(x*x+y*y))
fig, axes = plt.subplots(2, 2, subplot_kw={"projection" : "3d"})
axes = iter(axes.flatten())
ax = next(axes)
ax.plot_trisurf(x,y,z, cmap='Reds')
ax.set_title('Trisurf')
X, Y, Z = (arr.reshape(20,30) for arr in (x,y,z))
ax = next(axes)
ax.plot_surface(X,Y,Z, cmap='Reds')
ax.set_title('Surface 20×30')
X, Y, Z = (arr.reshape(30,20) for arr in (x,y,z))
ax = next(axes)
ax.plot_surface(X,Y,Z, cmap='Reds')
ax.set_title('Surface 30×20')
X, Y, Z = (arr.reshape(40,15) for arr in (x,y,z))
ax = next(axes)
ax.plot_surface(X,Y,Z, cmap='Reds')
ax.set_title('Surface 40×15')
plt.tight_layout()
plt.show()
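The excerpt of the dataframe suggests that the Comprimento values are not on a regular grid, so plain reshaping may not apply. Here is a hedged sketch of the re-gridding idea applied to dataframe columns, using scipy.interpolate.griddata. The synthetic df below only borrows the column names from the question; its values, the helper names regrid and plot_surface_with_contours, and the grid sizes are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

# Synthetic stand-in for the spreadsheet (hypothetical values, same column names)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Diametro": rng.uniform(0.05, 0.5, 2000),
    "Comprimento": rng.uniform(0.0, 100.0, 2000),
})
df["temp_out"] = 25 + 10 * np.exp(-df["Comprimento"] / 30) * df["Diametro"] / 0.5

def regrid(df, nx=50, ny=60):
    """Interpolate scattered rows onto a regular (ny, nx) grid."""
    xi = np.linspace(df["Diametro"].min(), df["Diametro"].max(), nx)
    yi = np.linspace(df["Comprimento"].min(), df["Comprimento"].max(), ny)
    Xi, Yi = np.meshgrid(xi, yi)  # both have shape (ny, nx)
    points = df[["Diametro", "Comprimento"]].to_numpy()
    values = df["temp_out"].to_numpy()
    # rescale=True because the two axes have very different ranges
    Zi = griddata(points, values, (Xi, Yi), method="linear", rescale=True)
    # Grid nodes outside the convex hull of the data come back as NaN;
    # fill them with the value of the nearest data point
    holes = np.isnan(Zi)
    if holes.any():
        Zi[holes] = griddata(points, values, (Xi[holes], Yi[holes]),
                             method="nearest", rescale=True)
    return Xi, Yi, Zi

def plot_surface_with_contours(Xi, Yi, Zi):
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(Xi, Yi, Zi, cmap="coolwarm")
    # Project the contour lines onto the floor of the plot
    ax.contour(Xi, Yi, Zi, zdir="z", offset=Zi.min(), cmap="coolwarm")
    plt.show()

Xi, Yi, Zi = regrid(df)
```

Calling plot_surface_with_contours(Xi, Yi, Zi) draws the surface with the contour projection that plot_trisurf alone does not give.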

2D Density Plot with X Y Z data

I am trying to plot a 2D terrain map with x, y and z (elevation). I followed the steps from the following link, but I am getting a very weird plot.
Python : 2d contour plot from 3 lists : x, y and rho?
I spent almost half a day searching but got nowhere.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
# import data:
import xlrd
loc = "~/Desktop/Book4.xlsx"
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sample=500
# Generate array:
x=np.array(sheet.col_values(0))[0:sample]
y=np.array(sheet.col_values(1))[0:sample]
z=np.hamming(sample)[0:sample][:,None]
# Set up a regular grid of interpolation points
xi, yi = np.meshgrid(x, y)
# Interpolate
rbf = scipy.interpolate.Rbf(x, y, z, function='cubic')
zi = rbf(xi, yi)
# Plot
plt.imshow(zi, vmin=z.min(), vmax=z.max(), origin='lower',
extent=[x.min(), x.max(), y.min(), y.max()])
plt.colorbar()
plt.show()
The first of the following fig is what I am getting and the last one is how it should look like.
Any help shall be appreciated
Link to data file
I think the problem is that the data you're giving it is not smooth enough to interpolate with the default parameters. Here's one approach, using mgrid instead of meshgrid:
import numpy as np
import pandas as pd
from scipy.interpolate import Rbf
# fname is your data, but as a CSV file.
data = pd.read_csv(fname).values
x, y = data.T
x_min, x_max = np.amin(x), np.amax(x)
y_min, y_max = np.amin(y), np.amax(y)
# Make a grid with spacing 0.002.
grid_x, grid_y = np.mgrid[x_min:x_max:0.002, y_min:y_max:0.002]
# Make up a Z.
z = np.hamming(x.size)
# Make an n-dimensional interpolator.
rbfi = Rbf(x, y, z, smooth=2)
# Predict on the regular grid.
di = rbfi(grid_x, grid_y)
Then you can look at the result:
import matplotlib.pyplot as plt
plt.imshow(di)
I get:
I wrote a Jupyter Notebook on this topic recently, check it out for a few other interpolation methods, like kriging and spline fitting.
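The SciPy documentation marks Rbf as legacy and points to RBFInterpolator for new code. Here is a hedged sketch of the same idea with that class. The scattered samples and the elevation function are made up, and the smoothing value is an arbitrary assumption that would need tuning on real data.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
# Hypothetical scattered terrain samples: (x, y) positions and a noisy elevation z
xy = rng.uniform(0.0, 1.0, size=(300, 2))
z = np.sin(4 * xy[:, 0]) * np.cos(3 * xy[:, 1]) + 0.05 * rng.normal(size=300)

# smoothing > 0 relaxes exact interpolation, which helps with noisy samples
rbf = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=1e-3)

# Regular grid covering the data extent (100 nodes in x, 80 in y)
gx, gy = np.mgrid[0:1:100j, 0:1:80j]
grid_points = np.column_stack([gx.ravel(), gy.ravel()])
zi = rbf(grid_points).reshape(gx.shape)
```

Because np.mgrid puts x along the first axis, zi.T with origin='lower' is what imshow needs to show x horizontally.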

Contour plotting complex numbers and conjugates

I am trying to make a contour plot in python with complex numbers (i am using matplotlib, pylab).
I am working with sharp bounds on harmonic polynomials, but specifically right now I am trying to plot:
Re(z(bar) - e^(z))= 0
Im(z(bar) - e^z) = 0
and plot them over each other in a contour in order to find their zeros to determine how many solutions there are to the equation z(bar) = e^(z).
Does anyone have experience in contour plotting, specifically with complex numbers?
import numpy as np
from matplotlib import pyplot as plt
x = np.r_[0:10:30j]
y = np.r_[0:10:20j]
X, Y = np.meshgrid(x, y)
Z = X*np.exp(1j*Y) # some arbitrary complex data
def plotit(z, title):
    plt.figure()
    cs = plt.contour(X,Y,z) # contour() accepts complex values
    plt.clabel(cs, inline=1, fontsize=10) # add labels to contours
    plt.title(title)
    plt.savefig(title+'.png')
plotit(Z, 'real')
plotit(Z.real, 'explicit real')
plotit(Z.imag, 'imaginary')
plt.show()
EDIT: Above is my code, and note that for Z, I need to plot both real and imaginary parts of (x- iy) - e^(x+iy)=0. The current Z that is there is simply arbitrary. It is giving me an error for not having a 2D array when I try to plug mine in.
I don't know how you are plotting since you didn't post any code, but in general I advise moving away from using pylab or the pyplot interface to matplotlib, using the direct object methods is much more robust and just as simple. Here is an example of plotting contours of two sets of data on the same plot.
import numpy as np
import matplotlib.pyplot as plt
# making fake data
x = np.linspace(0, 2)
y = np.linspace(0, 2)
c = x[:,np.newaxis] * y
c2 = np.flipud(c)
# plot
fig, ax = plt.subplots(1, 1)
cont1 = ax.contour(x, y, c, colors='b')
cont2 = ax.contour(x, y, c2, colors='r')
cont1.clabel()
cont2.clabel()
plt.show()
For tom10, here is the plot this code produces. Note that setting colors to a single color makes distinguishing the two plots much easier.
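For the specific equation in the question, here is a hedged sketch that evaluates conj(z) - e^z on a 2D complex grid. The "not a 2D array" error is possibly caused by building z from the 1D x and y arrays rather than from the meshgrid, but that is a guess. The grid range and the helper name plot_zero_sets are assumptions.

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 401)
y = np.linspace(-4.0, 4.0, 401)
X, Y = np.meshgrid(x, y)
Z = X + 1j * Y              # complex grid, shape (401, 401)
F = np.conj(Z) - np.exp(Z)  # conj(z) - e^z at every grid point

def plot_zero_sets(X, Y, F):
    """Overlay the curves Re(F) = 0 (blue) and Im(F) = 0 (red)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.contour(X, Y, F.real, levels=[0], colors="b")
    ax.contour(X, Y, F.imag, levels=[0], colors="r")
    ax.set_xlabel("Re z")
    ax.set_ylabel("Im z")
    plt.show()
```

Calling plot_zero_sets(X, Y, F) draws both zero sets; points where a blue and a red curve cross are solutions of conj(z) = e^z inside the plotted window.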

Generate a heatmap using a scatter data set

I have a set of X,Y data points (about 10k) that are easy to plot as a scatter plot but that I would like to represent as a heatmap.
I looked through the examples in Matplotlib and they all seem to already start with heatmap cell values to generate the image.
Is there a method that converts a bunch of x, y, all different, to a heatmap (where zones with higher frequency of x, y would be "warmer")?
If you don't want hexagons, you can use numpy's histogram2d function:
import numpy as np
import numpy.random
import matplotlib.pyplot as plt
# Generate some test data
x = np.random.randn(8873)
y = np.random.randn(8873)
heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.clf()
plt.imshow(heatmap.T, extent=extent, origin='lower')
plt.show()
This makes a 50x50 heatmap. If you want, say, 512x384, you can put bins=(512, 384) in the call to histogram2d.
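A short sketch of the non-square case, showing how the shape of the returned array relates to the bins argument and why the transpose is needed before imshow. The random test data mirrors the snippet above.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=8873)
y = rng.normal(size=8873)

heatmap, xedges, yedges = np.histogram2d(x, y, bins=(512, 384))
# heatmap[i, j] counts the points in x-bin i and y-bin j, so x runs along axis 0
image = heatmap.T  # rows now correspond to y, which is what imshow expects
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
```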
Example:
In Matplotlib lexicon, i think you want a hexbin plot.
If you're not familiar with this type of plot, it's just a bivariate histogram in which the xy-plane is tessellated by a regular grid of hexagons.
So, as with an ordinary histogram, you discretize the plotting region into a set of windows, assign each point to one of these windows, and count the number of points falling in each hexagon; finally, map the windows onto a color array, and you've got a hexbin diagram.
Though less commonly used than, e.g., circles or squares, it is intuitive that hexagons are a better choice for the geometry of the binning container:
hexagons have nearest-neighbor symmetry (square bins don't; e.g., the distance from a point on a square's border to a point inside that square is not everywhere equal), and
the hexagon is the highest n-polygon that gives a regular plane tessellation (i.e., you can safely re-model your kitchen floor with hexagonal-shaped tiles because you won't have any void space between the tiles when you are finished; this is not true for any higher-n polygon, n >= 7).
(Matplotlib uses the term hexbin plot; so do (AFAIK) all of the plotting libraries for R; still, I don't know if this is the generally accepted term for plots of this type, though I suspect it's likely given that hexbin is short for hexagonal binning, which describes the essential step in preparing the data for display.)
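from matplotlib import pyplot as PLT
from matplotlib import cm as CM
import numpy as NP
# matplotlib.mlab.bivariate_normal is not available in recent Matplotlib releases,
# so an equivalent helper is defined here
def bivariate_normal(X, Y, sigmax=1.0, sigmay=1.0, mux=0.0, muy=0.0, sigmaxy=0.0):
    Xmu = X - mux
    Ymu = Y - muy
    rho = sigmaxy / (sigmax * sigmay)
    z = Xmu**2 / sigmax**2 + Ymu**2 / sigmay**2 - 2 * rho * Xmu * Ymu / (sigmax * sigmay)
    denom = 2 * NP.pi * sigmax * sigmay * NP.sqrt(1 - rho**2)
    return NP.exp(-z / (2 * (1 - rho**2))) / denom
x = y = NP.linspace(-5, 5, 100)
X, Y = NP.meshgrid(x, y)
Z1 = bivariate_normal(X, Y, 2, 2, 0, 0)
Z2 = bivariate_normal(X, Y, 4, 1, 1, 1)
ZD = Z2 - Z1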
x = X.ravel()
y = Y.ravel()
z = ZD.ravel()
gridsize=30
PLT.subplot(111)
# if 'bins=None', then color of each hexagon corresponds directly to its count
# 'C' is optional--it maps values to x-y coordinates; if 'C' is None (default) then
# the result is a pure 2D histogram
PLT.hexbin(x, y, C=z, gridsize=gridsize, cmap=CM.jet, bins=None)
PLT.axis([x.min(), x.max(), y.min(), y.max()])
cb = PLT.colorbar()
cb.set_label('mean value')
PLT.show()
Edit: For a better approximation of Alejandro's answer, see below.
I know this is an old question, but I wanted to add something to Alejandro's answer: if you want a nice smoothed image without using py-sphviewer, you can instead use np.histogram2d and apply a Gaussian filter (from scipy.ndimage) to the heatmap:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from scipy.ndimage import gaussian_filter
def myplot(x, y, s, bins=1000):
    heatmap, xedges, yedges = np.histogram2d(x, y, bins=bins)
    heatmap = gaussian_filter(heatmap, sigma=s)
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
    return heatmap.T, extent
fig, axs = plt.subplots(2, 2)
# Generate some test data
x = np.random.randn(1000)
y = np.random.randn(1000)
sigmas = [0, 16, 32, 64]
for ax, s in zip(axs.flatten(), sigmas):
    if s == 0:
        ax.plot(x, y, 'k.', markersize=5)
        ax.set_title("Scatter plot")
    else:
        img, extent = myplot(x, y, s)
        ax.imshow(img, extent=extent, origin='lower', cmap=cm.jet)
        ax.set_title(r"Smoothing with $\sigma$ = %d" % s)
plt.show()
Produces:
The scatter plot and s=16 plotted on top of each other for Agape Gal'lo:
One difference I noticed with my gaussian filter approach and Alejandro's approach was that his method shows local structures much better than mine. Therefore I implemented a simple nearest neighbour method at pixel level. This method calculates for each pixel the inverse sum of the distances of the n closest points in the data. This method is at a high resolution pretty computationally expensive and I think there's a quicker way, so let me know if you have any improvements.
Update: As I suspected, there's a much faster method using SciPy's scipy.spatial.cKDTree. See Gabriel's answer for the implementation.
Anyway, here's my code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
def data_coord2view_coord(p, vlen, pmin, pmax):
    dp = pmax - pmin
    dv = (p - pmin) / dp * vlen
    return dv
def nearest_neighbours(xs, ys, reso, n_neighbours):
    im = np.zeros([reso, reso])
    extent = [np.min(xs), np.max(xs), np.min(ys), np.max(ys)]
    xv = data_coord2view_coord(xs, reso, extent[0], extent[1])
    yv = data_coord2view_coord(ys, reso, extent[2], extent[3])
    for x in range(reso):
        for y in range(reso):
            xp = (xv - x)
            yp = (yv - y)
            d = np.sqrt(xp**2 + yp**2)
            im[y][x] = 1 / np.sum(d[np.argpartition(d.ravel(), n_neighbours)[:n_neighbours]])
    return im, extent
n = 1000
xs = np.random.randn(n)
ys = np.random.randn(n)
resolution = 250
fig, axes = plt.subplots(2, 2)
for ax, neighbours in zip(axes.flatten(), [0, 16, 32, 64]):
    if neighbours == 0:
        ax.plot(xs, ys, 'k.', markersize=2)
        ax.set_aspect('equal')
        ax.set_title("Scatter Plot")
    else:
        im, extent = nearest_neighbours(xs, ys, resolution, neighbours)
        ax.imshow(im, origin='lower', extent=extent, cmap=cm.jet)
        ax.set_title("Smoothing over %d neighbours" % neighbours)
        ax.set_xlim(extent[0], extent[1])
        ax.set_ylim(extent[2], extent[3])
plt.show()
Result:
Instead of using np.histogram2d, which in general produces quite ugly histograms, I would like to recycle py-sphviewer, a Python package for rendering particle simulations using an adaptive smoothing kernel, which can be easily installed from pip (see webpage documentation). Consider the following code, which is based on the example:
import numpy as np
import numpy.random
import matplotlib.pyplot as plt
import sphviewer as sph
def myplot(x, y, nb=32, xsize=500, ysize=500):
    xmin = np.min(x)
    xmax = np.max(x)
    ymin = np.min(y)
    ymax = np.max(y)
    x0 = (xmin+xmax)/2.
    y0 = (ymin+ymax)/2.
    pos = np.zeros([len(x),3])
    pos[:,0] = x
    pos[:,1] = y
    w = np.ones(len(x))
    P = sph.Particles(pos, w, nb=nb)
    S = sph.Scene(P)
    S.update_camera(r='infinity', x=x0, y=y0, z=0,
                    xsize=xsize, ysize=ysize)
    R = sph.Render(S)
    R.set_logscale()
    img = R.get_image()
    extent = R.get_extent()
    # range() and print() replace the Python 2 xrange and print statement
    for i, j in zip(range(4), [x0,x0,y0,y0]):
        extent[i] += j
    print(extent)
    return img, extent
fig = plt.figure(1, figsize=(10,10))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
# Generate some test data
x = np.random.randn(1000)
y = np.random.randn(1000)
#Plotting a regular scatter plot
ax1.plot(x,y,'k.', markersize=5)
ax1.set_xlim(-3,3)
ax1.set_ylim(-3,3)
heatmap_16, extent_16 = myplot(x,y, nb=16)
heatmap_32, extent_32 = myplot(x,y, nb=32)
heatmap_64, extent_64 = myplot(x,y, nb=64)
ax2.imshow(heatmap_16, extent=extent_16, origin='lower', aspect='auto')
ax2.set_title("Smoothing over 16 neighbors")
ax3.imshow(heatmap_32, extent=extent_32, origin='lower', aspect='auto')
ax3.set_title("Smoothing over 32 neighbors")
#Make the heatmap using a smoothing over 64 neighbors
ax4.imshow(heatmap_64, extent=extent_64, origin='lower', aspect='auto')
ax4.set_title("Smoothing over 64 neighbors")
plt.show()
which produces the following image:
As you see, the images look pretty nice, and we are able to identify different substructures on it. These images are constructed spreading a given weight for every point within a certain domain, defined by the smoothing length, which in turns is given by the distance to the closer nb neighbor (I've chosen 16, 32 and 64 for the examples). So, higher density regions typically are spread over smaller regions compared to lower density regions.
The function myplot is just a very simple function that I've written in order to give the x,y data to py-sphviewer to do the magic.
If you are using Matplotlib 1.2.x, you can use hist2d:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(100000)
y = np.random.randn(100000)
plt.hist2d(x,y,bins=100)
plt.show()
Seaborn now has the jointplot function which should work nicely here:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Generate some test data
x = np.random.randn(8873)
y = np.random.randn(8873)
sns.jointplot(x=x, y=y, kind='hex')
plt.show()
Here's Jurgy's great nearest neighbour approach, but implemented using scipy.spatial.cKDTree. In my tests it's about 100x faster.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from scipy.spatial import cKDTree
def data_coord2view_coord(p, resolution, pmin, pmax):
    dp = pmax - pmin
    dv = (p - pmin) / dp * resolution
    return dv
n = 1000
xs = np.random.randn(n)
ys = np.random.randn(n)
resolution = 250
extent = [np.min(xs), np.max(xs), np.min(ys), np.max(ys)]
xv = data_coord2view_coord(xs, resolution, extent[0], extent[1])
yv = data_coord2view_coord(ys, resolution, extent[2], extent[3])
def kNN2DDens(xv, yv, resolution, neighbours, dim=2):
    """Inverse summed distance from each grid pixel to its nearest data points."""
    # Create the tree
    tree = cKDTree(np.array([xv, yv]).T)
    # List the pixel coordinates and find the `neighbours` closest data points of each pixel
    grid = np.mgrid[0:resolution, 0:resolution].T.reshape(resolution**2, dim)
    dists = tree.query(grid, neighbours)
    # Inverse of the sum of distances to each grid point.
    inv_sum_dists = 1. / dists[0].sum(1)
    # Reshape
    im = inv_sum_dists.reshape(resolution, resolution)
    return im
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
for ax, neighbours in zip(axes.flatten(), [0, 16, 32, 63]):
    if neighbours == 0:
        ax.plot(xs, ys, 'k.', markersize=5)
        ax.set_aspect('equal')
        ax.set_title("Scatter Plot")
    else:
        im = kNN2DDens(xv, yv, resolution, neighbours)
        ax.imshow(im, origin='lower', extent=extent, cmap=cm.Blues)
        ax.set_title("Smoothing over %d neighbours" % neighbours)
        ax.set_xlim(extent[0], extent[1])
        ax.set_ylim(extent[2], extent[3])
plt.savefig('new.png', dpi=150, bbox_inches='tight')
and the initial question was... how to convert scatter values to grid values, right?
histogram2d does count the frequency per cell, however, if you have other data per cell than just the frequency, you'd need some additional work to do.
x = data_x # between -10 and 4, log-gamma of an svc
y = data_y # between -4 and 11, log-C of an svc
z = data_z #between 0 and 0.78, f1-values from a difficult dataset
So, I have a dataset with Z-results for X and Y coordinates. However, I was calculating few points outside the area of interest (large gaps), and heaps of points in a small area of interest.
Yes here it becomes more difficult but also more fun. Some libraries (sorry):
from matplotlib import pyplot as plt
from matplotlib import cm
import numpy as np
from scipy.interpolate import griddata
pyplot is my graphics engine today,
cm is a range of color maps with some interesting choices,
numpy is for the calculations,
and griddata is for attaching values to a fixed grid.
The last one is important especially because the frequency of xy points is not equally distributed in my data. First, let's start with some boundaries fitting to my data and an arbitrary grid size. The original data has datapoints also outside those x and y boundaries.
#determine grid boundaries
gridsize = 500
x_min = -8
x_max = 2.5
y_min = -2
y_max = 7
So we have defined a grid with 500 pixels between the min and max values of x and y.
In my data, there are lots more than the 500 values available in the area of high interest; whereas in the low-interest-area, there are not even 200 values in the total grid; between the graphic boundaries of x_min and x_max there are even less.
So for getting a nice picture, the task is to get an average for the high interest values and to fill the gaps elsewhere.
I define my grid now. For each xx-yy pair, i want to have a color.
xx = np.linspace(x_min, x_max, gridsize) # array of x values
yy = np.linspace(y_min, y_max, gridsize) # array of y values
grid = np.array(np.meshgrid(xx, yy.T))
grid = grid.reshape(2, grid.shape[1]*grid.shape[2]).T
Why the strange shape? scipy.interpolate.griddata wants a shape of (n, D).
Griddata calculates one value per point in the grid, by a predefined method.
I choose "nearest" - empty grid points will be filled with values from the nearest neighbor. This looks as if the areas with less information have bigger cells (even if it is not the case). One could choose to interpolate "linear", then areas with less information look less sharp. Matter of taste, really.
points = np.array([x, y]).T # because griddata wants it that way
z_grid2 = griddata(points, z, grid, method='nearest')
# you get a 1D vector as result. Reshape to picture format!
z_grid2 = z_grid2.reshape(xx.shape[0], yy.shape[0])
And hop, we hand over to matplotlib to display the plot
fig = plt.figure(1, figsize=(10, 10))
ax1 = fig.add_subplot(111)
ax1.imshow(z_grid2, extent=[x_min, x_max,y_min, y_max, ],
origin='lower', cmap=cm.magma)
ax1.set_title("SVC: empty spots filled by nearest neighbours")
ax1.set_xlabel('log gamma')
ax1.set_ylabel('log C')
plt.show()
Around the pointy part of the V-Shape, you see I did a lot of calculations during my search for the sweet spot, whereas the less interesting parts almost everywhere else have a lower resolution.
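Since data_x, data_y and data_z are not shown, here is a self-contained sketch of the same griddata workflow on made-up scattered data (a dense cluster plus a few far-away points). It assumes that passing the meshgrid arrays as a tuple is acceptable, which avoids the manual reshape because the result already has the grid's shape.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(4)
# Hypothetical samples: many points in a small region, few points elsewhere
x = np.concatenate([rng.normal(-3.0, 0.5, 400), rng.uniform(-8.0, 2.5, 40)])
y = np.concatenate([rng.normal(2.0, 0.5, 400), rng.uniform(-2.0, 7.0, 40)])
z = np.exp(-((x + 3.0) ** 2 + (y - 2.0) ** 2) / 4.0)

gridsize = 200
xx = np.linspace(-8.0, 2.5, gridsize)
yy = np.linspace(-2.0, 7.0, gridsize)
XX, YY = np.meshgrid(xx, yy)  # shape (len(yy), len(xx))

points = np.column_stack([x, y])
# 'nearest' fills every grid node; 'linear' leaves NaN outside the convex hull of the data
z_nearest = griddata(points, z, (XX, YY), method="nearest")
z_linear = griddata(points, z, (XX, YY), method="linear")
```

Either array can be passed to imshow with origin='lower' and extent=[xx[0], xx[-1], yy[0], yy[-1]].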
Make a 2-dimensional array that corresponds to the cells in your final image, called say heatmap_cells and instantiate it as all zeroes.
Choose two scaling factors that define the difference between each array element in real units, for each dimension, say x_scale and y_scale. Choose these such that all your datapoints will fall within the bounds of the heatmap array.
For each raw datapoint with x_value and y_value:
heatmap_cells[floor(x_value/x_scale),floor(y_value/y_scale)]+=1
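The steps above can be sketched as a small function. It assumes non-negative coordinates with the grid origin at (0, 0); the name bin_points, the sample points and the scales are made up for illustration. np.histogram2d does the same counting in vectorized form.

```python
import math
import numpy as np

def bin_points(xs, ys, x_scale, y_scale, shape):
    """Count how many points fall into each cell of a shape[0] x shape[1] grid."""
    heatmap_cells = np.zeros(shape, dtype=int)
    for x_value, y_value in zip(xs, ys):
        i = math.floor(x_value / x_scale)
        j = math.floor(y_value / y_scale)
        # Skip points outside the grid instead of wrapping around via negative indices
        if 0 <= i < shape[0] and 0 <= j < shape[1]:
            heatmap_cells[i, j] += 1
    return heatmap_cells

xs = [0.1, 0.4, 1.2, 1.9, 1.95]
ys = [0.2, 0.3, 0.7, 1.5, 1.6]
cells = bin_points(xs, ys, x_scale=0.5, y_scale=0.5, shape=(4, 4))
```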
Very similar to @Piti's answer, but using one call instead of two to generate the points:
import numpy as np
import matplotlib.pyplot as plt
pts = 1000000
mean = [0.0, 0.0]
cov = [[1.0,0.0],[0.0,1.0]]
x,y = np.random.multivariate_normal(mean, cov, pts).T
plt.hist2d(x, y, bins=50, cmap=plt.cm.jet)
plt.show()
Output:
Here's one I made on a 1 Million point set with 3 categories (colored Red, Green, and Blue). Here's a link to the repository if you'd like to try the function. Github Repo
histplot(
X,
Y,
labels,
bins=2000,
range=((-3,3),(-3,3)),
normalize_each_label=True,
colors = [
[1,0,0],
[0,1,0],
[0,0,1]],
gain=50)
I'm afraid I'm a little late to the party, but I had a similar question a while ago. The accepted answer (by @ptomato) helped me out, but I'd also like to post this in case it's of use to someone.
''' I wanted to create a heatmap resembling a football pitch which would show the different actions performed '''
import numpy as np
import matplotlib.pyplot as plt
import random
#fixing random state for reproducibility
np.random.seed(1234324)
fig = plt.figure(12)
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
#Ratio of the pitch with respect to UEFA standards
hmap = np.zeros((7, 11), dtype=int)  # 7 x 11 cells so the integer indices 0..6 and 0..10 computed below all fit
#print(hmap)
xlist = np.random.uniform(low=0.0, high=100.0, size=(20))
ylist = np.random.uniform(low=0.0, high =100.0, size =(20))
#UEFA Pitch Standards are 105m x 68m
xlist = (xlist/100)*10.5
ylist = (ylist/100)*6.5
ax1.scatter(xlist,ylist)
#int of the co-ordinates to populate the array
xlist_int = xlist.astype (int)
ylist_int = ylist.astype (int)
#print(xlist_int, ylist_int)
for i, j in zip(xlist_int, ylist_int):
    #this populates the array according to the x,y co-ordinate values it encounters
    hmap[j][i]= hmap[j][i] + 1
#Reversing the rows is necessary
hmap = hmap[::-1]
#print(hmap)
im = ax2.imshow(hmap)
Here's the result
None of these solutions worked for my application, so this is what I came up with. Essentially I am placing a 2D Gaussian at every single point:
import cv2
import numpy as np
import matplotlib.pyplot as plt
def getGaussian2D(ksize, sigma, norm=True):
    oneD = cv2.getGaussianKernel(ksize=ksize, sigma=sigma)
    twoD = np.outer(oneD.T, oneD)
    return twoD / np.sum(twoD) if norm else twoD
def pts2heat(pts, shape, kernel=16, sigma=5):
    heat = np.zeros(shape)
    k = getGaussian2D(kernel, sigma)
    for y,x in pts:
        x, y = int(x), int(y)
        for i in range(-kernel//2, kernel//2):
            for j in range(-kernel//2, kernel//2):
                if 0 <= x+i < shape[0] and 0 <= y+j < shape[1]:
                    heat[x+i, y+j] = heat[x+i, y+j] + k[i+kernel//2, j+kernel//2]
    return heat
# pts (the point coordinates) and img (the associated image) come from your own data
heat = pts2heat(pts, img.shape[:2])
plt.imshow(heat, cmap='hot')  # 'heat' is not a Matplotlib colormap name; 'hot' is
Here are the points overlaid on top of their associated image, along with the resulting heat map:
