In Matplotlib, it's not too tough to make a legend (example_legend(), below), but I think it's better style to put labels right on the curves being plotted (as in example_inline(), below). This can be very fiddly, because I have to specify coordinates by hand, and, if I re-format the plot, I probably have to reposition the labels. Is there a way to automatically generate labels on curves in Matplotlib? Bonus points for being able to orient the text at an angle corresponding to the angle of the curve.
import numpy as np
import matplotlib.pyplot as plt
def example_legend():
    plt.clf()
    x = np.linspace(0, 1, 101)
    y1 = np.sin(x * np.pi / 2)
    y2 = np.cos(x * np.pi / 2)
    plt.plot(x, y1, label='sin')
    plt.plot(x, y2, label='cos')
    plt.legend()

def example_inline():
    plt.clf()
    x = np.linspace(0, 1, 101)
    y1 = np.sin(x * np.pi / 2)
    y2 = np.cos(x * np.pi / 2)
    plt.plot(x, y1, label='sin')
    plt.plot(x, y2, label='cos')
    plt.text(0.08, 0.2, 'sin')
    plt.text(0.9, 0.2, 'cos')
Update: User cphyc has kindly created a GitHub repository for the code in this answer (see here), and bundled the code into a package which may be installed using pip install matplotlib-label-lines.
Pretty Picture:
In matplotlib it's pretty easy to label contour plots (either automatically or by manually placing labels with mouse clicks). There does not (yet) appear to be any equivalent capability to label data series in this fashion! There may be some semantic reason for not including this feature which I am missing.
Regardless, I have written the following module, which allows for semi-automatic plot labelling. It requires only numpy and a couple of functions from the standard math library.
Description
The default behaviour of the labelLines function is to space the labels evenly along the x axis (automatically placing at the correct y-value of course). If you want you can just pass an array of the x co-ordinates of each of the labels. You can even tweak the location of one label (as shown in the bottom right plot) and space the rest evenly if you like.
In addition, the labelLines function ignores lines that were not assigned a label in the plot command (or, more accurately, lines whose label contains '_line').
Keyword arguments passed to labelLines or labelLine are passed on to the text function call (some keyword arguments are set if the calling code chooses not to specify).
Issues
Annotation bounding boxes sometimes interfere undesirably with other curves, as shown by the 1 and 10 annotations in the top left plot. I'm not even sure this can be avoided.
It would be nice to be able to specify a y position instead sometimes.
It's still an iterative process to get annotations in the right location.
It only works when the x-axis values are floats.
Gotchas
By default, the labelLines function assumes that all data series span the range specified by the axis limits. Take a look at the blue curve in the top left plot of the pretty picture. If data were only available for the x range 0.5-1, then we couldn't possibly place a label at the desired location (which is a little less than 0.2). See this question for a particularly nasty example. Right now, the code does not intelligently identify this scenario and re-arrange the labels; however, there is a reasonable workaround. The labelLines function takes the xvals argument: a list of x-values specified by the user instead of the default linear distribution across the width. So the user can decide which x-values to use for the label placement of each data series.
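The xvals workaround can be partly automated. The following is only a sketch, not part of the module: xvals_within_data is a hypothetical helper that picks one x-value per line from inside that line's own data range, so the label can never fall outside the data.

```python
import numpy as np

def xvals_within_data(xdata_per_line, fraction=0.5):
    """Hypothetical helper: one label x-position per line.

    Each position lies `fraction` of the way through that line's own
    x-range, so it is always inside the data.
    """
    xvals = []
    for xdata in xdata_per_line:
        xdata = np.asarray(xdata, dtype=float)
        lo, hi = xdata.min(), xdata.max()
        xvals.append(float(lo + fraction * (hi - lo)))
    return xvals

# One series spans [0, 1], the other only [0.5, 1]
print(xvals_within_data([[0.0, 1.0], [0.5, 1.0]]))
```

The result could then be passed as xvals, for example labelLines(lines, xvals=xvals_within_data([l.get_xdata() for l in lines])), provided every line in lines carries an explicit label so that the two lists stay the same length.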
Also, I believe this is the first answer to complete the bonus objective of aligning the labels with the curve they're on. :)
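Why the angle has to be converted to screen co-ordinates can be seen with a little arithmetic. This is a simplified sketch assuming linear axes; screen_angle, px_per_x and px_per_y are hypothetical names (pixels per data unit along each axis) and are not part of Matplotlib.

```python
from math import atan2, degrees

def screen_angle(dx, dy, px_per_x, px_per_y):
    """Angle in degrees of the data-space direction (dx, dy) as drawn on screen.

    Assumes linear axes, where each data unit maps to a fixed number of pixels.
    """
    return degrees(atan2(dy * px_per_y, dx * px_per_x))

# A slope of 1 is 45 degrees in data space ...
data_angle = degrees(atan2(1.0, 1.0))
# ... but looks much flatter when the y axis is compressed tenfold
drawn_angle = screen_angle(1.0, 1.0, px_per_x=100.0, px_per_y=10.0)
print(data_angle, drawn_angle)
```

Rotating the text by the data-space angle would therefore only look right when both axes happen to have the same scale on screen.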
label_lines.py:
from math import atan2,degrees
import numpy as np

#Label line with line2D label data
def labelLine(line,x,label=None,align=True,**kwargs):

    ax = line.axes
    xdata = line.get_xdata()
    ydata = line.get_ydata()

    if (x < xdata[0]) or (x > xdata[-1]):
        print('x label location is outside data range!')
        return

    #Find corresponding y co-ordinate and angle of the line
    ip = 1
    for i in range(len(xdata)):
        if x < xdata[i]:
            ip = i
            break

    y = ydata[ip-1] + (ydata[ip]-ydata[ip-1])*(x-xdata[ip-1])/(xdata[ip]-xdata[ip-1])

    if not label:
        label = line.get_label()

    if align:
        #Compute the slope
        dx = xdata[ip] - xdata[ip-1]
        dy = ydata[ip] - ydata[ip-1]
        ang = degrees(atan2(dy,dx))

        #Transform to screen co-ordinates
        pt = np.array([x,y]).reshape((1,2))
        trans_angle = ax.transData.transform_angles(np.array((ang,)),pt)[0]

    else:
        trans_angle = 0

    #Set a bunch of keyword arguments
    if 'color' not in kwargs:
        kwargs['color'] = line.get_color()

    if ('horizontalalignment' not in kwargs) and ('ha' not in kwargs):
        kwargs['ha'] = 'center'

    if ('verticalalignment' not in kwargs) and ('va' not in kwargs):
        kwargs['va'] = 'center'

    if 'backgroundcolor' not in kwargs:
        kwargs['backgroundcolor'] = ax.get_facecolor()

    if 'clip_on' not in kwargs:
        kwargs['clip_on'] = True

    if 'zorder' not in kwargs:
        kwargs['zorder'] = 2.5

    ax.text(x,y,label,rotation=trans_angle,**kwargs)

def labelLines(lines,align=True,xvals=None,**kwargs):

    ax = lines[0].axes
    labLines = []
    labels = []

    #Take only the lines which have labels other than the default ones
    #(newer Matplotlib releases appear to use '_child0'-style defaults,
    #so labels starting with an underscore are skipped as well)
    for line in lines:
        label = line.get_label()
        if "_line" not in label and not label.startswith("_"):
            labLines.append(line)
            labels.append(label)

    if xvals is None:
        xmin,xmax = ax.get_xlim()
        xvals = np.linspace(xmin,xmax,len(labLines)+2)[1:-1]

    for line,x,label in zip(labLines,xvals,labels):
        labelLine(line,x,label,align,**kwargs)
Test code to generate the pretty picture above:
from matplotlib import pyplot as plt
from scipy.stats import loglaplace,chi2
import numpy as np

from labellines import *

X = np.linspace(0,1,500)
A = [1,2,5,10,20]
funcs = [np.arctan,np.sin,loglaplace(4).pdf,chi2(5).pdf]

plt.subplot(221)
for a in A:
    plt.plot(X,np.arctan(a*X),label=str(a))

labelLines(plt.gca().get_lines(),zorder=2.5)

plt.subplot(222)
for a in A:
    plt.plot(X,np.sin(a*X),label=str(a))

labelLines(plt.gca().get_lines(),align=False,fontsize=14)

plt.subplot(223)
for a in A:
    plt.plot(X,loglaplace(4).pdf(a*X),label=str(a))

xvals = [0.8,0.55,0.22,0.104,0.045]
labelLines(plt.gca().get_lines(),align=False,xvals=xvals,color='k')

plt.subplot(224)
for a in A:
    plt.plot(X,chi2(5).pdf(a*X),label=str(a))

lines = plt.gca().get_lines()
l1=lines[-1]
labelLine(l1,0.6,label=r'$Re=${}'.format(l1.get_label()),ha='left',va='bottom',align = False)
labelLines(lines[:-1],align=False)

plt.show()
Jan Kuiken's answer is certainly well thought out and thorough, but there are some caveats:
it does not work in all cases
it requires a fair amount of extra code
it may vary considerably from one plot to the next
A much simpler approach is to annotate the last point of each plot. The point can also be circled, for emphasis. This can be accomplished with one extra line:
import matplotlib.pyplot as plt

# samples is assumed to be an iterable of (x, y) array pairs defined elsewhere
for i, (x, y) in enumerate(samples):
    plt.plot(x, y)
    plt.text(x[-1], y[-1], f'sample {i}')
A variant would be to use the method matplotlib.axes.Axes.annotate.
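A minimal sketch of that annotate variant follows. The data in samples is made up for illustration, and the 5-point offset is an arbitrary choice that keeps the text from touching the end of the line.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: three straight lines with different slopes
x = np.linspace(0, 1, 50)
samples = [(x, (i + 1) * x) for i in range(3)]

fig, ax = plt.subplots()
annotations = []
for i, (xs, ys) in enumerate(samples):
    (line,) = ax.plot(xs, ys)
    # Anchor the label at the last data point, shifted 5 points to the right
    ann = ax.annotate(
        f"sample {i}",
        xy=(xs[-1], ys[-1]),
        xytext=(5, 0),
        textcoords="offset points",
        va="center",
        color=line.get_color(),
    )
    annotations.append(ann)
```

Because the offset is given in points rather than data units, the gap between line and label stays the same when the axis limits change. Call plt.show() or fig.savefig(...) afterwards to see the result.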
Nice question. A while ago I experimented a bit with this, but I haven't used it a lot because it's still not bulletproof. I divided the plot area into a 32x32 grid and calculated a 'potential field' for the best position of a label for each line according to the following rules:
white space is a good place for a label
Label should be near corresponding line
Label should be away from the other lines
The code was something like this:
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage


def my_legend(axis = None):

    if axis is None:
        axis = plt.gca()

    N = 32
    Nlines = len(axis.lines)
    print(Nlines)

    xmin, xmax = axis.get_xlim()
    ymin, ymax = axis.get_ylim()

    # the 'point of presence' matrix
    pop = np.zeros((Nlines, N, N), dtype=float)

    for l in range(Nlines):
        # get xy data and scale it to the NxN squares
        xy = axis.lines[l].get_xydata()
        xy = (xy - [xmin,ymin]) / ([xmax-xmin, ymax-ymin]) * N
        xy = xy.astype(np.int32)
        # mask stuff outside plot
        mask = (xy[:,0] >= 0) & (xy[:,0] < N) & (xy[:,1] >= 0) & (xy[:,1] < N)
        xy = xy[mask]
        # add to pop
        for p in xy:
            pop[l][tuple(p)] = 1.0

    # find whitespace, nice place for labels
    ws = 1.0 - (np.sum(pop, axis=0) > 0) * 1.0
    # don't use the borders
    ws[:,0] = 0
    ws[:,N-1] = 0
    ws[0,:] = 0
    ws[N-1,:] = 0

    # blur the pop's
    for l in range(Nlines):
        pop[l] = ndimage.gaussian_filter(pop[l], sigma=N/5)

    for l in range(Nlines):
        # positive weights for current line, negative weight for others....
        w = -0.3 * np.ones(Nlines, dtype=float)
        w[l] = 0.5

        # calculate a field
        p = ws + np.sum(w[:, np.newaxis, np.newaxis] * pop, axis=0)
        plt.figure()
        plt.imshow(p, interpolation='nearest')
        plt.title(axis.lines[l].get_label())

        pos = np.argmax(p)  # note, argmax flattens the array first
        best_x, best_y = (pos // N, pos % N)  # integer division recovers the grid index
        x = xmin + (xmax-xmin) * best_x / N
        y = ymin + (ymax-ymin) * best_y / N

        axis.text(x, y, axis.lines[l].get_label(),
                  horizontalalignment='center',
                  verticalalignment='center')


plt.close('all')

x = np.linspace(0, 1, 101)
y1 = np.sin(x * np.pi / 2)
y2 = np.cos(x * np.pi / 2)
y3 = x * x
plt.plot(x, y1, 'b', label='blue')
plt.plot(x, y2, 'r', label='red')
plt.plot(x, y3, 'g', label='green')
my_legend()
plt.show()
And the resulting plot:
matplotx (which I wrote) has line_labels() which plots the labels to the right of the lines. It's also smart enough to avoid overlaps when too many lines are concentrated in one spot. (See stargraph for examples.) It does that by solving a particular non-negative-least-squares problem on the target positions of the labels. Anyway, in many cases where there's no overlap to begin with, such as the example below, that's not even necessary.
import matplotlib.pyplot as plt
import matplotx
import numpy as np
# create data
rng = np.random.default_rng(0)
offsets = [1.0, 1.50, 1.60]
labels = ["no balancing", "CRV-27", "CRV-27*"]
x0 = np.linspace(0.0, 3.0, 100)
y = [offset * x0 / (x0 + 1) + 0.1 * rng.random(len(x0)) for offset in offsets]
# plot
with plt.style.context(matplotx.styles.dufte):
    for yy, label in zip(y, labels):
        plt.plot(x0, yy, label=label)
    plt.xlabel("distance [m]")
    matplotx.ylabel_top("voltage [V]")  # move ylabel to the top, rotate
    matplotx.line_labels()  # line labels to the right
    plt.show()
    # plt.savefig("out.png", bbox_inches="tight")
A simpler approach, like the one Ioannis Filippidis uses:
import matplotlib.pyplot as plt
import numpy as np
# evenly sampled time at intervals of 0.1
tMin=-1 ;tMax=10
t = np.arange(tMin, tMax, 0.1)
# red dashed line and solid blue line
plt.plot(t, 22*t, 'r--', t, t**2, 'b')
factor=3/4 ;offset=20 # text position in view
textPosition=[(tMax+tMin)*factor,22*(tMax+tMin)*factor]
plt.text(textPosition[0],textPosition[1]+offset,'22 t',color='red',fontsize=20)
textPosition=[(tMax+tMin)*factor,((tMax+tMin)*factor)**2+20]
plt.text(textPosition[0],textPosition[1]+offset, 't^2', bbox=dict(facecolor='blue', alpha=0.5),fontsize=20)
plt.show()
code python 3 on sageCell
I have a dataset describing a curve and I need to find the tangent to the curve, but unfortunately the line I get lies some distance away from the curve. Please help me find a solution to this problem. Thank you!
My code is as follows:
fig, ax1 = plt.subplots()
chData_m = efficient.get('Car.Road.y')
x_fit = chData_m.timestamps
y_fit = chData_m.samples
fittedParameters = np.polyfit(x_fit[:],y_fit[:],1)
f = plt.figure(figsize=(800/100.0, 600/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(x_fit, y_fit, 'D')
# create data for the fitted equation plot
xModel = np.linspace(min(x_fit), max(x_fit))
yModel = np.polyval(fittedParameters, xModel)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('X Data') # X axis data label
axes.set_ylabel('Y Data') # Y axis data label
# polynomial derivative from numpy
deriv = np.polyder(fittedParameters)
# for plotting
minX = min(x_fit)
maxX = max(x_fit)
# value of derivative (slope) at a specific X value, so
# that a straight line tangent can be plotted at the point
# you might place this code in a loop to animate
pointVal = 10.0 # example X value
y_value_at_point = np.polyval(fittedParameters, pointVal)
slope_at_point = np.polyval(deriv, pointVal)
ylow = (minX - pointVal) * slope_at_point + y_value_at_point
yhigh = (maxX - pointVal) * slope_at_point + y_value_at_point
# now the tangent as a line plot
axes.plot([minX, maxX], [ylow, yhigh])
plt.show()
plt.close('all') # clean up after using pyplot
and the output is:
I am not sure how you wanted to use numpy polyfit/polyval to determine the tangent formula. I describe here a different approach. The advantage of this approach is that it does not have any assumptions about the nature of the function. The disadvantage is that it will not work for vertical tangents.
To be on the safe side, I have considered both cases, i.e., that the evaluated x-value is a data point in your series and that it is not. Some problems may arise because I see that you mention timestamps in your question without specifying their nature by providing a toy dataset - so, this version may or may not work with the datetime objects or timestamps of your original data:
import matplotlib.pyplot as plt
import numpy as np
#generate fake data with unique random x-values between 0 and 70
def func(x, a=0, b=100, c=1, n=3.5):
    return a + (b/(1+(c/x)**n))
np.random.seed(123)
x = np.random.choice(range(700000), 100)/10000
x.sort()
y = func(x, 1, 2, 15, 2.4)
#data point to evaluate
xVal = 29
#plot original data
fig, ax = plt.subplots()
ax.plot(x, y, c="blue", label="data")
#calculate gradient
slope = np.gradient(y, x)
#determine slope and intercept at xVal
ind1 = (np.abs(x - xVal)).argmin()
#case 1 the value is a data point
if xVal == x[ind1]:
    yVal, slopeVal = y[ind1], slope[ind1]
#case 2 the value lies between two data points
#in which case we approximate linearly from the two nearest data points
else:
    if xVal < x[ind1]:
        ind1, ind2 = ind1-1, ind1
    else:
        ind1, ind2 = ind1, ind1+1
    yVal = y[ind1] + (y[ind2]-y[ind1]) * (xVal-x[ind1]) / (x[ind2]-x[ind1])
    slopeVal = slope[ind1] + (slope[ind2]-slope[ind1]) * (xVal-x[ind1]) / (x[ind2]-x[ind1])
intercVal = yVal - slopeVal * xVal
ax.plot([x.min(), x.max()], [slopeVal*x.min()+intercVal, slopeVal*x.max()+intercVal], color="green",
        label=f"tangent\nat point [{xVal:.1f}, {yVal:.1f}]\nwith slope {slopeVal:.2f}\nand intercept {intercVal:.2f}" )
ax.set_ylim(0.8 * y.min(), 1.2 * y.max())
ax.legend()
plt.show()
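The two cases above can also be handled in one step with np.interp, which returns the data value itself when xVal coincides with a data point and interpolates linearly otherwise. This is only a compact sketch of the same idea: tangent_at is a hypothetical helper name, and it assumes x is sorted in increasing order with xVal inside the data range.

```python
import numpy as np

def tangent_at(x, y, x_val):
    """Hypothetical helper: value, slope and intercept of the tangent at x_val.

    Assumes x is sorted in increasing order and x_val lies within its range.
    """
    slope = np.gradient(y, x)               # numerical derivative at each data point
    y_val = np.interp(x_val, x, y)          # linear interpolation of the curve
    slope_val = np.interp(x_val, x, slope)  # linear interpolation of the slope
    intercept = y_val - slope_val * x_val
    return y_val, slope_val, intercept

# Check against a curve with a known derivative: y = x**2, so y'(3) = 6
x = np.linspace(0.0, 5.0, 501)
y = x**2
y_val, slope_val, intercept = tangent_at(x, y, 3.0)
print(y_val, slope_val, intercept)
```

The tangent line is then y_val + slope_val * (x - x_val), which can be plotted over any x range.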
For my report, I'm creating a special color plot in jupyter notebook. There are two parameters, x and y.
import numpy as np
x = np.arange(-1,1,0.1)
y = np.arange(1,11,1)
with which I compute a third quantity. Here is an example to demonstrate the concept:
values = []
for i in range(len(y)):
    z = y[i] * x**3
    # in my case the value z represents phases of oscillators
    # so I will transform the computed values to the interval [0,2pi)
    values.append(z)
values = np.array(values) % (2*np.pi)
I'm plotting y vs x. For each y = 1,2,3,4... there will be a horizontal line with total length two. For example: The coordinate (0.5,8) stands for a single point on line 8 at position x = 0.5 and z(0.5,8) is its associated value.
Now I want to represent each point on all ten lines with a unique color that is determined by z(x,y). Since z(x,y) takes only values in [0,2pi) I need a color scheme that starts at zero (for example z=0 corresponds to blue). For increasing z the color continuously changes and in the end at 2pi it takes the same color again (so at z ~ 2pi it becomes blue again).
Does someone know how this can be done in python?
The kind of structure you need for x, y and z is easier to build using a meshgrid. Also, to get many x-values between -1 and 1, np.linspace(-1,1,N) returns N evenly spaced points covering the range, endpoints included.
Using meshgrid, z can be calculated in one line using numpy's vectorization. This runs much faster.
To set a repeating color, a cyclic colormap such as hsv can be used. There the last color is the same as the starting color.
import numpy as np
from matplotlib import pyplot as plt
x, y = np.meshgrid(np.linspace(-1,1,100), np.arange(1,11,1))
z = (y * x**3) % (2*np.pi)
plt.scatter(x, y, c=z, s=6, cmap='hsv')
plt.yticks(range(1,11))
plt.show()
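One detail worth double-checking in expressions like the one for z is operator precedence: in Python, % and * have the same precedence and are evaluated left to right, so the parentheses around 2*np.pi decide whether the values are really reduced modulo 2π. A small check with made-up values:

```python
import numpy as np

z = np.array([1.0, 7.0, 13.0])

as_written = z % 2 * np.pi   # parsed as (z % 2) * np.pi
wrapped = z % (2 * np.pi)    # remainder after division by 2*pi

print(as_written)
print(wrapped)
```

Both results fall inside [0, 2π), which makes the difference easy to miss in a plot, but only the second one is the phase reduced modulo 2π.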
Alternatively, a symmetric colormap could be built by taking the colors from an existing map and combining them with the same colors in reverse order.
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.colors as mcolors
colors_orig = plt.cm.viridis_r(np.linspace(0, 1, 128))
# combine the colors with the reversed array and build a new colormap
colors = np.vstack((colors_orig, colors_orig[::-1]))
symcmap = mcolors.LinearSegmentedColormap.from_list('symcmap', colors)
x, y = np.meshgrid(np.linspace(-1,1,100), np.arange(1,11,1))
z = (y * x**3) % (2*np.pi)
plt.scatter(x, y, c=z, s=6, cmap=symcmap)
plt.yticks(range(1,11))
plt.show()
Multicolored lines are somewhat more complicated than just scatter plots. The docs have an example using LineCollections. Here is the adapted code. Note that the line segments are colored using their start point, so make sure there are enough x values. Also, the x and y limits aren't set automatically any more.
The code also adds a colorbar to illustrate how the colors map to the z values. Some interesting code from Jake VanderPlas shows how to create ticks for multiples of π.
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.collections import LineCollection
# code from Jake VanderPlas
def format_func(value, tick_number):
    # find number of multiples of pi/2
    N = int(np.round(2 * value / np.pi))
    if N == 0:
        return "0"
    elif N == 1:
        return r"$\pi/2$"
    elif N == 2:
        return r"$\pi$"
    elif N % 2 > 0:
        return r"${0}\pi/2$".format(N)
    else:
        return r"${0}\pi$".format(N // 2)
x = np.linspace(-1, 1, 500)
y_max = 10
# Create a continuous norm to map from data points to colors
norm = plt.Normalize(0, 2 * np.pi)
for y in range(1, y_max + 1):
    z = (y * x ** 3) % (2 * np.pi)
    y_array = y * np.ones_like(x)
    points = np.array([x, y_array]).T.reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = LineCollection(segments, cmap='hsv', norm=norm)
    lc.set_array(z)  # Set the values used for colormapping
    lc.set_linewidth(2)
    line = plt.gca().add_collection(lc)
    # plt.scatter(x, y_array, c=z, s=10, norm=norm, cmap='hsv')
cbar = plt.colorbar(line) # , ticks=[k*np.pi for k in np.arange(0, 2.001, 0.25)])
cbar.locator = plt.MultipleLocator(np.pi / 2)
cbar.minor_locator = plt.MultipleLocator(np.pi / 4)
cbar.formatter = plt.FuncFormatter(format_func)
cbar.ax.minorticks_on()
cbar.update_ticks()
plt.yticks(range(1, y_max + 1)) # one tick for every y
plt.xlim(x.min(), x.max()) # the LineCollection doesn't force the limits
plt.ylim(0.5, y_max + 0.5)
plt.show()
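The reshape and concatenate steps that build the segments are easier to follow on a tiny example. This sketch uses only NumPy and five made-up points on the line y = 3:

```python
import numpy as np

x = np.linspace(-1, 1, 5)
y_array = 3 * np.ones_like(x)

# One (x, y) pair per point, with an extra axis so the points can be paired up
points = np.array([x, y_array]).T.reshape(-1, 1, 2)           # shape (5, 1, 2)
# Pair each point with its successor: N points give N - 1 segments
segments = np.concatenate([points[:-1], points[1:]], axis=1)  # shape (4, 2, 2)

print(segments[0])
```

Each entry of segments holds the start and end point of one short piece of the line, which is why every piece can be given its own color.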
I would like to plot 2D data as an image, with profile plots along the x and y axes displayed below and to the side. It's a pretty common way to display data, so there may be an easier way to approach this. I would like to find the simplest and most robust way to do this correctly, without using anything outside of matplotlib (though I would be interested in knowing of other packages that may be particularly relevant). In particular, the method should work without changing anything if the shape (aspect ratio) of the data changes.
My main issue is getting the side plots to scale correctly so their borders match up with main plot.
Example code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# generate grid and test data
x, y = np.linspace(-3,3,300), np.linspace(-1,1,100)
X, Y = np.meshgrid(x,y)
def f(x,y) :
    return np.exp(-(x**2/4+y**2)/.2)*np.cos((x**2+y**2)*10)**2
data = f(X,Y)
# 2d image plot with profiles
h, w = data.shape
gs = gridspec.GridSpec(2, 2,width_ratios=[w,w*.2], height_ratios=[h,h*.2])
ax = [plt.subplot(gs[0]),plt.subplot(gs[1]),plt.subplot(gs[2])]
bounds = [x.min(),x.max(),y.min(),y.max()]
ax[0].imshow(data, cmap='gray', extent = bounds, origin='lower')
# integer division (//) keeps the indices valid in Python 3
ax[1].plot(data[:,w//2],Y[:,w//2],'.',data[:,w//2],Y[:,w//2])
ax[1].axis([data[:,w//2].max(), data[:,w//2].min(), Y.min(), Y.max()])
ax[2].plot(X[h//2,:],data[h//2,:],'.',X[h//2,:],data[h//2,:])
plt.show()
As you can see from the output below, the way things are scaled the image to the right does not properly match the boundaries.
Partial solutions:
1) Manually play with the figure size to find the right aspect ratio so that it appears correctly (could do automatically using the image ratio + padding + the width ratios used?). Seems tacky when there are already so many options for packing that are supposed to take care of these things automatically. EDIT: plt.gcf().set_figheight(plt.gcf().get_figwidth()*h/w) seems to work if padding is not changed.
2) Add ax[0].set_aspect('auto') , which then makes boundaries line up, but the image no longer has the correct aspect ratio.
Output from code sample above:
You can use sharex and sharey to do this. Replace your ax= line with this:
ax = [plt.subplot(gs[0]),]
ax.append(plt.subplot(gs[1], sharey=ax[0]))
ax.append(plt.subplot(gs[2], sharex=ax[0]))
I haven't been able to generate your layout by using subplot and gridspec, while still preserving (1) the ratio of the axes and (2) the limits imposed on the axis. An alternative solution would be to place your axes manually in your figure instead and to control the size of the figure accordingly (as you already mentioned in your OP). Although this requires more work than using subplot and gridspec, this approach remains quite simple and can be very powerful and flexible to produce complex layouts where a fine control over the margins and the placement of the axes is desired.
Below is an example that shows how this can be achieved by setting the size of the figure according to the size given to the axes. Conversely, it would also be possible to fit the axes within a figure of a predefined size. The aspect ratio of the axes would then be kept by using the figure margins as a buffer.
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')
#------------------------------------------------------------ generate data ----
# generate grid and test data
x, y = np.linspace(-3, 3, 300), np.linspace(-1, 1, 100)
X, Y = np.meshgrid(x,y)
def f(x,y) :
    return np.exp(-(x**2/4+y**2)/.2)*np.cos((x**2+y**2)*10)**2
data = f(X,Y)
# 2d image plot with profiles
h, w = data.shape
data_ratio = h / float(w)
#------------------------------------------------------------ create figure ----
#--- define axes length in inches ----
width_ax0 = 8.
width_ax1 = 2.
height_ax2 = 2.
height_ax0 = width_ax0 * data_ratio
#---- define margins size in inches ----
left_margin = 0.65
right_margin = 0.2
bottom_margin = 0.5
top_margin = 0.25
inter_margin = 0.5
#--- calculate total figure size in inches ----
fwidth = left_margin + right_margin + inter_margin + width_ax0 + width_ax1
fheight = bottom_margin + top_margin + inter_margin + height_ax0 + height_ax2
fig = plt.figure(figsize=(fwidth, fheight))
fig.patch.set_facecolor('white')
#-------------------------------------------------------------- create axes ----
ax0 = fig.add_axes([left_margin / fwidth,
(bottom_margin + inter_margin + height_ax2) / fheight,
width_ax0 / fwidth, height_ax0 / fheight])
ax1 = fig.add_axes([(left_margin + width_ax0 + inter_margin) / fwidth,
(bottom_margin + inter_margin + height_ax2) / fheight,
width_ax1 / fwidth, height_ax0 / fheight])
ax2 = fig.add_axes([left_margin / fwidth, bottom_margin / fheight,
width_ax0 / fwidth, height_ax2 / fheight])
#---------------------------------------------------------------- plot data ----
bounds = [x.min(),x.max(),y.min(),y.max()]
ax0.imshow(data, cmap='gray', extent = bounds, origin='lower')
ax1.plot(data[:,w//2],Y[:,w//2],'.',data[:,w//2],Y[:,w//2])
ax1.invert_xaxis()
ax2.plot(X[h//2,:], data[h//2,:], '.', X[h//2,:], data[h//2,:])
plt.show(block=False)
fig.savefig('subplot_layout.png')
Which results in:
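The inverse approach mentioned above, fitting the axes into a figure of predefined size, comes down to a little arithmetic. The following is a sketch under stated assumptions rather than a tested layout routine: fit_axes and its parameter names are hypothetical, all lengths are in inches, and only a single axes is placed.

```python
def fit_axes(fig_w, fig_h, data_ratio, left, right, bottom, top):
    """Hypothetical helper: [x0, y0, width, height] in figure fractions.

    The axes keeps height / width == data_ratio and is centered in the area
    left over by the margins; any unused space acts as extra margin.
    """
    avail_w = fig_w - left - right
    avail_h = fig_h - bottom - top
    # Largest axes with the requested ratio that still fits
    width = min(avail_w, avail_h / data_ratio)
    height = width * data_ratio
    x0 = left + (avail_w - width) / 2
    y0 = bottom + (avail_h - height) / 2
    return [x0 / fig_w, y0 / fig_h, width / fig_w, height / fig_h]

# A 10 x 6 inch figure holding an image with 100 rows and 300 columns
rect = fit_axes(10.0, 6.0, 100 / 300, left=0.65, right=0.2, bottom=0.5, top=0.25)
print(rect)
```

The returned list can be passed directly to fig.add_axes(rect) on a figure created with figsize=(10.0, 6.0).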
Interestingly, the solution with sharex and sharey does not work for me. It aligns the axis ranges but not the axis lengths!
To have them aligned reliably I added:
pos = ax[0].get_position()
pos1 = ax[1].get_position()
pos2 = ax[2].get_position()
ax[1].set_position([pos1.x0,pos.y0,pos1.width,pos.height])
ax[2].set_position([pos.x0,pos2.y0,pos.width,pos2.height])
So in context with the earlier answer from CT Zhu this makes:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# generate grid and test data
x, y = np.linspace(-3,3,300), np.linspace(-1,1,100)
X, Y = np.meshgrid(x,y)
def f(x,y) :
    return np.exp(-(x**2/4+y**2)/.2)*np.cos((x**2+y**2)*10)**2
data = f(X,Y)
# 2d image plot with profiles
h, w = data.shape
gs = gridspec.GridSpec(2, 2,width_ratios=[w,w*.2], height_ratios=[h,h*.2])
ax = [plt.subplot(gs[0]),]
ax.append(plt.subplot(gs[1], sharey=ax[0]))
ax.append(plt.subplot(gs[2], sharex=ax[0]))
bounds = [x.min(),x.max(),y.min(),y.max()]
ax[0].imshow(data, cmap='gray', extent = bounds, origin='lower')
ax[1].plot(data[:,int(w/2)],Y[:,int(w/2)],'.',data[:,int(w/2)],Y[:,int(w/2)])
ax[1].axis([data[:,int(w/2)].max(), data[:,int(w/2)].min(), Y.min(), Y.max()])
ax[2].plot(X[int(h/2),:],data[int(h/2),:],'.',X[int(h/2),:],data[int(h/2),:])
pos = ax[0].get_position()
pos1 = ax[1].get_position()
pos2 = ax[2].get_position()
ax[1].set_position([pos1.x0,pos.y0,pos1.width,pos.height])
ax[2].set_position([pos.x0,pos2.y0,pos.width,pos2.height])
plt.show()
In R, there is a function called abline in which a line can be drawn on a plot based on the specification of the intercept (first argument) and the slope (second argument). For instance,
plot(1:10, 1:10)
abline(0, 1)
where the line with an intercept of 0 and the slope of 1 spans the entire range of the plot. Is there such a function in Matplotlib?
A lot of these solutions are focusing on adding a line to the plot that fits the data. Here's a simple solution for adding an arbitrary line to the plot based on a slope and intercept.
import matplotlib.pyplot as plt
import numpy as np
def abline(slope, intercept):
    """Plot a line from slope and intercept"""
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    y_vals = intercept + slope * x_vals
    plt.plot(x_vals, y_vals, '--')
I know this question is a couple years old, but since there is no accepted answer, I'll add what works for me.
You could just plot the values in your graph, and then generate another set of values for the coordinates of the best fit line and plot that over your original graph. For example, see the following code:
import matplotlib.pyplot as plt
import numpy as np
# Some dummy data
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 3, 2, 5, 7, 9]
# Find the slope and intercept of the best fit line
slope, intercept = np.polyfit(x, y, 1)
# Create a list of values in the best fit line
abline_values = [slope * i + intercept for i in x]
# Plot the best fit line over the actual values
plt.plot(x, y, '--')
plt.plot(x, abline_values, 'b')
plt.title(slope)
plt.show()
As of 2021 (Matplotlib 3.3.4), Axes.axline can draw a line from a point and a slope:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.axline((0, 4), slope=3., color='C0', label='by slope')
ax.set_xlim(0, 1)
ax.set_ylim(3, 5)
ax.legend()
import numpy as np
import matplotlib.pyplot as plt
X = np.array([1, 2, 3, 4, 5, 6, 7])
Y = np.array([1.1,1.9,3.0,4.1,5.2,5.8,7])
plt.scatter(X, Y)
slope, intercept = np.polyfit(X, Y, 1)
plt.plot(X, X*slope + intercept, 'r')
It looks like this feature will be part of version 3.3.0:
matplotlib.axes.Axes.axline
You'll be, for example, able to draw a red line through points (0, 0) and (1, 1) using
axline((0, 0), (1, 1), linewidth=4, color='r')
I couldn't figure a way to do it without resorting to callbacks, but this seems to work fairly well.
import numpy as np
from matplotlib import pyplot as plt
class ABLine2D(plt.Line2D):
    """
    Draw a line based on its slope and y-intercept. Additional arguments are
    passed to the <matplotlib.lines.Line2D> constructor.
    """

    def __init__(self, slope, intercept, *args, **kwargs):
        # get current axes if user has not specified them
        ax = kwargs.pop('axes', None)
        if ax is None:
            ax = plt.gca()

        # if unspecified, get the next line color from the axes
        # (_get_lines is a private Matplotlib attribute and may change between versions)
        if not ('color' in kwargs or 'c' in kwargs):
            kwargs.update({'color': ax._get_lines.get_next_color()})

        # init the line, add it to the axes
        super(ABLine2D, self).__init__([], [], *args, **kwargs)
        self._slope = slope
        self._intercept = intercept
        ax.add_line(self)

        # cache the renderer, draw the line for the first time
        ax.figure.canvas.draw()
        self._update_lim(None)

        # connect to axis callbacks
        self.axes.callbacks.connect('xlim_changed', self._update_lim)
        self.axes.callbacks.connect('ylim_changed', self._update_lim)

    def _update_lim(self, event):
        """ called whenever axis x/y limits change """
        x = np.array(self.axes.get_xbound())
        y = (self._slope * x) + self._intercept
        self.set_data(x, y)
        self.axes.draw_artist(self)
I suppose for the case of (intercept, slope) of (0, 1) the following function could be used and extended to accommodate other slopes and intercepts, but won't readjust if axis limits are changed or autoscale is turned back on.
def abline():
    gca = plt.gca()
    gca.set_autoscale_on(False)
    gca.plot(gca.get_xlim(),gca.get_ylim())
import matplotlib.pyplot as plt
plt.scatter(range(10),range(10))
abline()
plt.draw()
I'd like to expand on the answer from David Marx by making sure that the sloped line does not extend beyond the original plotting area.
Since the x-axis limits are used to calculate the y-data for the sloped line, we need to make sure that the calculated y-data does not exceed the given ymin - ymax range. If it does, we crop the displayed data.
def abline(slope, intercept,**styles):
    """Plot a line from slope and intercept"""
    axes = plt.gca()
    xmin,xmax = np.array(axes.get_xlim())
    ymin,ymax = np.array(axes.get_ylim()) # get also y limits
    x_vals = np.linspace(xmin,xmax,num=1000) #increased sampling (only actually needed for large slopes)
    y_vals = intercept + slope * x_vals
    locpos = np.where(y_vals<ymax)[0] # if data extends above ymax
    locneg = np.where(y_vals>ymin)[0] # if data extends below ymin
    # select most restrictive condition
    if len(locpos) >= len(locneg):
        loc = locneg
    else:
        loc = locpos
    plt.plot(x_vals[loc], y_vals[loc], '--',**styles)
    return y_vals
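As an alternative to sampling 1000 points and masking them, the visible part of the line can be computed directly from the axis limits. This is a sketch in plain Python, not the method used above: clip_line_to_box is a hypothetical helper that returns the two end points of the line inside the box, or None if the line misses it.

```python
def clip_line_to_box(slope, intercept, xmin, xmax, ymin, ymax):
    """Hypothetical helper: end points of y = intercept + slope * x inside the box."""
    if slope == 0:
        # Horizontal line: visible over the full width or not at all
        if ymin <= intercept <= ymax:
            return (xmin, intercept), (xmax, intercept)
        return None
    # x-values where the line crosses the bottom and top edges
    xa = (ymin - intercept) / slope
    xb = (ymax - intercept) / slope
    lo = max(xmin, min(xa, xb))
    hi = min(xmax, max(xa, xb))
    if lo > hi:
        return None  # the line never passes through the box
    return (lo, intercept + slope * lo), (hi, intercept + slope * hi)

# y = 2x leaves a 10 x 10 box through the top edge at x = 5
print(clip_line_to_box(2, 0, 0, 10, 0, 10))
```

With the limits taken from axes.get_xlim() and axes.get_ylim(), the two returned points can be passed to plt.plot as a single segment.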
Here's a possible workaround I came up with: suppose I have my intercept coordinates stored as x_intercept and y_intercept, and the slope (m) saved as my_slope which was found through the renowned equation m = (y2-y1)/(x2-x1), or in whichever way you managed to find it.
Using the other famous general equation for a line, y = mx + q, I define the function find_second_point, which first computes q (since m, x and y are known) and then computes another point that belongs to that line.
Once I have the two points (the initial x_intercept,y_intercept and the newly found new_x,new_y), I simply plot the segment through those two points. Here's the code:
import numpy as np
import matplotlib.pyplot as plt
x_intercept = 3 # invented x coordinate
y_intercept = 2 # invented y coordinate
my_slope = 1 # invented slope value
def find_second_point(slope,x0,y0):
    # this function returns a point which belongs to the line that has the slope
    # inserted by the user and that passes through the point (x0,y0) inserted by the user
    q = y0 - (slope*x0) # calculate q
    new_x = x0 + 10 # pick a second x by adding 10 to the first x coordinate
    new_y = (slope*new_x) + q # calculate new y corresponding to new_x
    return new_x, new_y # return x and y of new point that belongs to the line
# invoke function to calculate the new point
new_x, new_y = find_second_point(my_slope , x_intercept, y_intercept)
plt.figure(1) # create new figure
plt.plot((x_intercept, new_x),(y_intercept, new_y), c='r', label='Segment')
plt.scatter(x_intercept, y_intercept, c='b', linewidths=3, label='Intercept')
plt.scatter(new_x, new_y, c='g', linewidths=3, label='New Point')
plt.legend() # add legend to image
plt.show()
here is the image generated by the code:
Short answer inspired by kite.com:
plt.plot(x, s*x + i)
Reproducible code:
import numpy as np
import matplotlib.pyplot as plt
i=3 # intercept
s=2 # slope
x=np.linspace(1,10,50) # 50 evenly spaced points from 1 to 10
plt.plot(x, s*x + i) # abline
plt.show()
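If Matplotlib 3.3 or later is available, Axes.axline draws an unbounded line from a point and a slope, so no x-range has to be chosen up front. A minimal sketch, reusing the intercept i and slope s from above (the Agg backend and the axis limits are assumptions made so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt

i = 3  # intercept
s = 2  # slope

fig, ax = plt.subplots()
# the line passes through (0, i) with slope s and always spans the visible axes
line = ax.axline((0, i), slope=s)
ax.set_xlim(1, 10)   # arbitrary view limits; the line is redrawn to fit them
ax.set_ylim(0, 25)
```

Call plt.show() or fig.savefig(...) afterwards as usual. Unlike plt.plot, axline does not take part in autoscaling, so the limits have to come from other data or be set by hand.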
One can simply build a list of y-values from the line's equation for a given intercept and slope, then plot it against any set of x-values you like. For example (Lr being a fitted linear regression model):
te = []
for i in range(11):
    te.append(Lr.intercept_ + Lr.coef_*i)
plt.plot(te, '--')
Gets the job done.
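The same values can be computed without a loop by letting numpy broadcast over the x-values. A small sketch, where intercept and slope are hypothetical stand-ins for Lr.intercept_ and a single entry of Lr.coef_:

```python
import numpy as np

intercept = 3.0  # hypothetical stand-in for Lr.intercept_
slope = 2.0      # hypothetical stand-in for one entry of Lr.coef_

xs = np.arange(11)           # same x-values as range(11)
te = intercept + slope * xs  # y-values of the line at each x
```

Passing both arrays, as in plt.plot(xs, te, '--'), keeps the x-axis tied to the actual x-values rather than to list positions.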
You can write a simple function by converting Slope-Intercept form to 2-Point Form.
def mxline(slope, intercept, start, end):
    y1 = slope*start + intercept
    y2 = slope*end + intercept
    plt.plot([start, end], [y1, y2])
Calling the function
mxline(m,c, 0, 20)
OUTPUT
I have a set of X,Y data points (about 10k) that are easy to plot as a scatter plot but that I would like to represent as a heatmap.
I looked through the examples in Matplotlib and they all seem to already start with heatmap cell values to generate the image.
Is there a method that converts a bunch of x, y, all different, to a heatmap (where zones with higher frequency of x, y would be "warmer")?
If you don't want hexagons, you can use numpy's histogram2d function:
import numpy as np
import numpy.random
import matplotlib.pyplot as plt
# Generate some test data
x = np.random.randn(8873)
y = np.random.randn(8873)
heatmap, xedges, yedges = np.histogram2d(x, y, bins=50)
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.clf()
plt.imshow(heatmap.T, extent=extent, origin='lower')
plt.show()
This makes a 50x50 heatmap. If you want, say, 512x384, you can put bins=(512, 384) in the call to histogram2d.
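A quick way to check what a non-square bins argument returns is to look at the shapes. This sketch uses seeded random data (an assumption, so that the run is repeatable) and shows that the first axis of the returned array follows x, which is why the plotting code transposes it before calling imshow:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
x = rng.standard_normal(8873)
y = rng.standard_normal(8873)

heatmap, xedges, yedges = np.histogram2d(x, y, bins=(512, 384))

print(heatmap.shape)             # first axis follows x, second follows y
print(len(xedges), len(yedges))  # one more edge than bins on each axis
print(int(heatmap.sum()))        # every point lands in exactly one bin
```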
Example:
In Matplotlib lexicon, I think you want a hexbin plot.
If you're not familiar with this type of plot, it's just a bivariate histogram in which the xy-plane is tessellated by a regular grid of hexagons.
So, as with a histogram, you discretize the plotting region into a set of windows (here, hexagons), assign each point to one of these windows, and count the number of points falling in each; finally, you map the counts onto a color array, and you've got a hexbin diagram.
Though hexagons are less commonly used than, e.g., circles or squares, it is intuitive that they are a better choice for the geometry of the binning container:
hexagons have nearest-neighbor symmetry (square bins don't; e.g., the distance from a point on a square's border to a point inside that square is not everywhere equal), and
the hexagon is the highest n-polygon that gives a regular plane tessellation (i.e., you can safely re-model your kitchen floor with hexagonal tiles because you won't have any void space between the tiles when you are finished; this is not true for any regular polygon with n >= 7).
(Matplotlib uses the term hexbin plot; so do (AFAIK) all of the plotting libraries for R. I don't know whether this is the generally accepted term for plots of this type, though I suspect it is, given that hexbin is short for hexagonal binning, which describes the essential step in preparing the data for display.)
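from matplotlib import pyplot as PLT
from matplotlib import cm as CM
import numpy as NP
# mlab.bivariate_normal was removed in Matplotlib 3.1, so an equivalent
# uncorrelated bivariate normal density is defined here instead
def bivariate_normal(X, Y, sigmax, sigmay, mux, muy):
    z = ((X - mux) / sigmax)**2 + ((Y - muy) / sigmay)**2
    return NP.exp(-z / 2) / (2 * NP.pi * sigmax * sigmay)
n = 1e5
x = y = NP.linspace(-5, 5, 100)
X, Y = NP.meshgrid(x, y)
Z1 = bivariate_normal(X, Y, 2, 2, 0, 0)
Z2 = bivariate_normal(X, Y, 4, 1, 1, 1)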
ZD = Z2 - Z1
x = X.ravel()
y = Y.ravel()
z = ZD.ravel()
gridsize=30
PLT.subplot(111)
# if 'bins=None', then color of each hexagon corresponds directly to its count
# 'C' is optional--it maps values to x-y coordinates; if 'C' is None (default) then
# the result is a pure 2D histogram
PLT.hexbin(x, y, C=z, gridsize=gridsize, cmap=CM.jet, bins=None)
PLT.axis([x.min(), x.max(), y.min(), y.max()])
cb = PLT.colorbar()
cb.set_label('mean value')
PLT.show()
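The example above colours each hexagon by the mean of C. For the case in the question, where only scattered x, y points are available, C can simply be left out and each hexagon is coloured by its point count. A minimal sketch with seeded random test data (the data, the Agg backend and the colormap are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10000)
y = rng.standard_normal(10000)

fig, ax = plt.subplots()
# without C, hexbin is a pure 2D histogram: colour = number of points per hexagon
hb = ax.hexbin(x, y, gridsize=30, cmap="viridis")
counts = hb.get_array()  # one count per hexagon
cb = fig.colorbar(hb, ax=ax)
cb.set_label("counts")
```

A larger gridsize gives smaller hexagons and a finer but noisier picture.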
Edit: For a better approximation of Alejandro's answer, see below.
I know this is an old question, but I wanted to add something to Alejandro's answer: if you want a nice smoothed image without using py-sphviewer, you can instead use np.histogram2d and apply a Gaussian filter (from scipy.ndimage) to the heatmap:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from scipy.ndimage import gaussian_filter
def myplot(x, y, s, bins=1000):
    heatmap, xedges, yedges = np.histogram2d(x, y, bins=bins)
    heatmap = gaussian_filter(heatmap, sigma=s)
    extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
    return heatmap.T, extent
fig, axs = plt.subplots(2, 2)
# Generate some test data
x = np.random.randn(1000)
y = np.random.randn(1000)
sigmas = [0, 16, 32, 64]
for ax, s in zip(axs.flatten(), sigmas):
    if s == 0:
        ax.plot(x, y, 'k.', markersize=5)
        ax.set_title("Scatter plot")
    else:
        img, extent = myplot(x, y, s)
        ax.imshow(img, extent=extent, origin='lower', cmap=cm.jet)
        ax.set_title(r"Smoothing with $\sigma$ = %d" % s)
plt.show()
Produces:
The scatter plot and s=16 plotted on top of each other for Agape Gal'lo (click for better view):
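One thing worth keeping in mind when choosing s: gaussian_filter measures sigma in array elements (bins), not in data units, so the visual amount of smoothing depends on the bin width. A small helper, hypothetical and not part of the answer's code, converts between the two using the bin edges returned by np.histogram2d:

```python
import numpy as np

def sigma_in_bins(sigma_data, edges):
    """Convert a smoothing width in data units to a width in bins."""
    bin_width = (edges[-1] - edges[0]) / (len(edges) - 1)
    return sigma_data / bin_width

# hypothetical edges: 1000 equal bins spanning -3..3
xedges = np.linspace(-3.0, 3.0, 1001)
s = sigma_in_bins(0.1, xedges)  # bins covered by a width of 0.1 data units
```

If the x and y bins have different widths, gaussian_filter also accepts a sequence with one sigma per axis.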
One difference I noticed between my Gaussian filter approach and Alejandro's approach is that his method shows local structures much better than mine. Therefore I implemented a simple nearest-neighbour method at pixel level. For each pixel, this method calculates the inverse of the sum of the distances to the n closest data points. At high resolution this method is computationally quite expensive, and I think there's a quicker way, so let me know if you have any improvements.
Update: As I suspected, there's a much faster method using SciPy's scipy.spatial.cKDTree. See Gabriel's answer for the implementation.
Anyway, here's my code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
def data_coord2view_coord(p, vlen, pmin, pmax):
    dp = pmax - pmin
    dv = (p - pmin) / dp * vlen
    return dv
def nearest_neighbours(xs, ys, reso, n_neighbours):
    im = np.zeros([reso, reso])
    extent = [np.min(xs), np.max(xs), np.min(ys), np.max(ys)]
    xv = data_coord2view_coord(xs, reso, extent[0], extent[1])
    yv = data_coord2view_coord(ys, reso, extent[2], extent[3])
    for x in range(reso):
        for y in range(reso):
            xp = (xv - x)
            yp = (yv - y)
            d = np.sqrt(xp**2 + yp**2)
            im[y][x] = 1 / np.sum(d[np.argpartition(d.ravel(), n_neighbours)[:n_neighbours]])
    return im, extent
n = 1000
xs = np.random.randn(n)
ys = np.random.randn(n)
resolution = 250
fig, axes = plt.subplots(2, 2)
for ax, neighbours in zip(axes.flatten(), [0, 16, 32, 64]):
    if neighbours == 0:
        ax.plot(xs, ys, 'k.', markersize=2)
        ax.set_aspect('equal')
        ax.set_title("Scatter Plot")
    else:
        im, extent = nearest_neighbours(xs, ys, resolution, neighbours)
        ax.imshow(im, origin='lower', extent=extent, cmap=cm.jet)
        ax.set_title("Smoothing over %d neighbours" % neighbours)
        ax.set_xlim(extent[0], extent[1])
        ax.set_ylim(extent[2], extent[3])
plt.show()
Result:
Instead of using np.histogram2d, which in general produces quite ugly histograms, I would like to recycle py-sphviewer, a Python package for rendering particle simulations using an adaptive smoothing kernel, which can be easily installed from pip (see the webpage documentation). Consider the following code, which is based on the example:
import numpy as np
import numpy.random
import matplotlib.pyplot as plt
import sphviewer as sph
def myplot(x, y, nb=32, xsize=500, ysize=500):
    xmin = np.min(x)
    xmax = np.max(x)
    ymin = np.min(y)
    ymax = np.max(y)
    x0 = (xmin+xmax)/2.
    y0 = (ymin+ymax)/2.
    pos = np.zeros([len(x),3])
    pos[:,0] = x
    pos[:,1] = y
    w = np.ones(len(x))
    P = sph.Particles(pos, w, nb=nb)
    S = sph.Scene(P)
    S.update_camera(r='infinity', x=x0, y=y0, z=0,
                    xsize=xsize, ysize=ysize)
    R = sph.Render(S)
    R.set_logscale()
    img = R.get_image()
    extent = R.get_extent()
    for i, j in zip(range(4), [x0,x0,y0,y0]):
        extent[i] += j
    print(extent)
    return img, extent
fig = plt.figure(1, figsize=(10,10))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
# Generate some test data
x = np.random.randn(1000)
y = np.random.randn(1000)
#Plotting a regular scatter plot
ax1.plot(x,y,'k.', markersize=5)
ax1.set_xlim(-3,3)
ax1.set_ylim(-3,3)
heatmap_16, extent_16 = myplot(x,y, nb=16)
heatmap_32, extent_32 = myplot(x,y, nb=32)
heatmap_64, extent_64 = myplot(x,y, nb=64)
ax2.imshow(heatmap_16, extent=extent_16, origin='lower', aspect='auto')
ax2.set_title("Smoothing over 16 neighbors")
ax3.imshow(heatmap_32, extent=extent_32, origin='lower', aspect='auto')
ax3.set_title("Smoothing over 32 neighbors")
#Make the heatmap using a smoothing over 64 neighbors
ax4.imshow(heatmap_64, extent=extent_64, origin='lower', aspect='auto')
ax4.set_title("Smoothing over 64 neighbors")
plt.show()
which produces the following image:
As you can see, the images look pretty nice, and we are able to identify different substructures in them. These images are constructed by spreading a given weight for every point over a certain domain, defined by the smoothing length, which in turn is given by the distance to the nb-th closest neighbor (I've chosen 16, 32 and 64 for the examples). So, points in higher-density regions are typically spread over smaller regions than points in lower-density regions.
The function myplot is just a very simple function that I've written in order to give the x,y data to py-sphviewer to do the magic.
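To make the idea of a per-point smoothing length concrete, here is a brute-force numpy sketch of the distance from each point to its k-th nearest neighbour. It only illustrates the quantity described above; it is not how py-sphviewer computes it, and the function name is hypothetical. It builds the full pairwise distance matrix, so it is only practical for a few thousand points:

```python
import numpy as np

def kth_neighbour_distance(x, y, k):
    """Distance from every point to its k-th nearest neighbour (brute force)."""
    pts = np.column_stack([x, y])
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # (n, n) pairwise distances
    # position 0 of each sorted row is the point itself (distance 0),
    # so position k holds the k-th nearest neighbour
    return np.partition(d, k, axis=1)[:, k]

# four points on a line, one unit apart
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.zeros(4)
h1 = kth_neighbour_distance(x, y, 1)
h2 = kth_neighbour_distance(x, y, 2)
```

Points in crowded regions get small values and isolated points get large ones, which is what makes the kernel adaptive.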
If you are using Matplotlib 1.2.x or later:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(100000)
y = np.random.randn(100000)
plt.hist2d(x,y,bins=100)
plt.show()
Seaborn now has the jointplot function which should work nicely here:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Generate some test data
x = np.random.randn(8873)
y = np.random.randn(8873)
sns.jointplot(x=x, y=y, kind='hex')
plt.show()
Here's Jurgy's great nearest neighbour approach, but implemented using scipy.spatial.cKDTree. In my tests it's about 100x faster.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from scipy.spatial import cKDTree
def data_coord2view_coord(p, resolution, pmin, pmax):
    dp = pmax - pmin
    dv = (p - pmin) / dp * resolution
    return dv
n = 1000
xs = np.random.randn(n)
ys = np.random.randn(n)
resolution = 250
extent = [np.min(xs), np.max(xs), np.min(ys), np.max(ys)]
xv = data_coord2view_coord(xs, resolution, extent[0], extent[1])
yv = data_coord2view_coord(ys, resolution, extent[2], extent[3])
def kNN2DDens(xv, yv, resolution, neighbours, dim=2):
    """
    Return a (resolution x resolution) image holding, for every pixel, the
    inverse of the summed distances to its nearest data points.
    """
    # Create the tree
    tree = cKDTree(np.array([xv, yv]).T)
    # Build the pixel grid and find the `neighbours` closest data points of each pixel
    grid = np.mgrid[0:resolution, 0:resolution].T.reshape(resolution**2, dim)
    dists = tree.query(grid, neighbours)
    # Inverse of the sum of distances to each grid point.
    inv_sum_dists = 1. / dists[0].sum(1)
    # Reshape
    im = inv_sum_dists.reshape(resolution, resolution)
    return im
fig, axes = plt.subplots(2, 2, figsize=(15, 15))
for ax, neighbours in zip(axes.flatten(), [0, 16, 32, 63]):
    if neighbours == 0:
        ax.plot(xs, ys, 'k.', markersize=5)
        ax.set_aspect('equal')
        ax.set_title("Scatter Plot")
    else:
        im = kNN2DDens(xv, yv, resolution, neighbours)
        ax.imshow(im, origin='lower', extent=extent, cmap=cm.Blues)
        ax.set_title("Smoothing over %d neighbours" % neighbours)
        ax.set_xlim(extent[0], extent[1])
        ax.set_ylim(extent[2], extent[3])
plt.savefig('new.png', dpi=150, bbox_inches='tight')
And the initial question was... how to convert scatter values to grid values, right?
histogram2d does count the frequency per cell; however, if you have data per cell other than just the frequency, you need to do some additional work.
x = data_x # between -10 and 4, log-gamma of an svc
y = data_y # between -4 and 11, log-C of an svc
z = data_z #between 0 and 0.78, f1-values from a difficult dataset
So, I have a dataset with Z-results for X and Y coordinates. However, I was calculating few points outside the area of interest (large gaps), and heaps of points in a small area of interest.
Yes here it becomes more difficult but also more fun. Some libraries (sorry):
from matplotlib import pyplot as plt
from matplotlib import cm
import numpy as np
from scipy.interpolate import griddata
pyplot is my graphic engine today,
cm is a range of color maps with some interesting choices,
numpy for the calculations,
and griddata for attaching values to a fixed grid.
The last one is important especially because the frequency of xy points is not equally distributed in my data. First, let's start with some boundaries fitting to my data and an arbitrary grid size. The original data has datapoints also outside those x and y boundaries.
#determine grid boundaries
gridsize = 500
x_min = -8
x_max = 2.5
y_min = -2
y_max = 7
So we have defined a grid with 500 pixels along each axis, between the min and max values of x and y.
In my data, there are many more than 500 values available in the area of high interest, whereas in the low-interest area there are not even 200 values in the total grid; between the graphic boundaries of x_min and x_max there are even fewer.
So for getting a nice picture, the task is to get an average for the high-interest values and to fill the gaps elsewhere.
I define my grid now. For each xx-yy pair, I want to have a color.
xx = np.linspace(x_min, x_max, gridsize) # array of x values
yy = np.linspace(y_min, y_max, gridsize) # array of y values
grid = np.array(np.meshgrid(xx, yy.T))
grid = grid.reshape(2, grid.shape[1]*grid.shape[2]).T
Why the strange shape? scipy.interpolate.griddata wants the query points in an array of shape (n, D).
Griddata calculates one value per point in the grid, by a predefined method.
I choose "nearest" - empty grid points will be filled with values from the nearest neighbor. This looks as if the areas with less information have bigger cells (even if it is not the case). One could choose to interpolate "linear", then areas with less information look less sharp. Matter of taste, really.
points = np.array([x, y]).T # because griddata wants it that way
z_grid2 = griddata(points, z, grid, method='nearest')
# you get a 1D vector as result. Reshape to picture format!
# meshgrid orders the points row by row, so rows follow yy and columns follow xx
z_grid2 = z_grid2.reshape(yy.shape[0], xx.shape[0])
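The reshaping steps are easier to follow on a tiny, non-square grid. In this sketch the per-point value x + 10*y is a made-up stand-in for what griddata would return, chosen so that each cell's position can be read off its value:

```python
import numpy as np

xx = np.linspace(0.0, 1.0, 3)  # 3 columns
yy = np.linspace(0.0, 1.0, 2)  # 2 rows

grid = np.array(np.meshgrid(xx, yy))                       # shape (2, 2, 3)
points = grid.reshape(2, grid.shape[1] * grid.shape[2]).T  # shape (6, 2): one (x, y) per row

values = points[:, 0] + 10 * points[:, 1]         # hypothetical result, one value per point
image = values.reshape(yy.shape[0], xx.shape[0])  # rows follow y, columns follow x
```

With a square grid the two reshape orders give the same shape, which hides the distinction; on a non-square grid only the (rows = y, columns = x) order puts every value back in its own cell.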
And hop, we hand over to matplotlib to display the plot
fig = plt.figure(1, figsize=(10, 10))
ax1 = fig.add_subplot(111)
ax1.imshow(z_grid2, extent=[x_min, x_max, y_min, y_max],
           origin='lower', cmap=cm.magma)
ax1.set_title("SVC: empty spots filled by nearest neighbours")
ax1.set_xlabel('log gamma')
ax1.set_ylabel('log C')
plt.show()
Around the pointy part of the V-Shape, you see I did a lot of calculations during my search for the sweet spot, whereas the less interesting parts almost everywhere else have a lower resolution.
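The 'nearest' method picks a single value per grid point rather than averaging. If an actual per-cell average of z is wanted, one option is to divide a z-weighted 2D histogram by the plain count histogram. This is a different technique from the griddata approach above, and the function name is hypothetical:

```python
import numpy as np

def binned_mean(x, y, z, bins, value_range=None):
    """Mean of z in each (x, y) cell; empty cells become NaN."""
    sums, xedges, yedges = np.histogram2d(x, y, bins=bins, range=value_range, weights=z)
    counts, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # 0/0 gives NaN where a cell has no points
    return mean, xedges, yedges

# three made-up points: two share a cell, one sits alone
x = np.array([0.1, 0.2, 0.9])
y = np.array([0.1, 0.2, 0.9])
z = np.array([1.0, 3.0, 5.0])
mean, xedges, yedges = binned_mean(x, y, z, bins=2, value_range=[[0, 1], [0, 1]])
```

As with np.histogram2d itself, the first axis of the result follows x, so pass mean.T to imshow. The NaN cells are drawn as gaps and could afterwards be filled with griddata as described above.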
Make a 2-dimensional array that corresponds to the cells in your final image, called say heatmap_cells and instantiate it as all zeroes.
Choose two scaling factors that define the difference between each array element in real units, for each dimension, say x_scale and y_scale. Choose these such that all your datapoints will fall within the bounds of the heatmap array.
For each raw datapoint with x_value and y_value:
heatmap_cells[floor(x_value/x_scale),floor(y_value/y_scale)]+=1
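The steps above can be sketched in numpy as follows. The function name and the sample data are made up for illustration, and the sketch assumes non-negative coordinates that already fall inside the array, as the description requires:

```python
import numpy as np

def bin_points(x, y, x_scale, y_scale, shape):
    """Count how many points fall into each cell of a regular grid."""
    heatmap_cells = np.zeros(shape, dtype=int)
    ix = np.floor(np.asarray(x) / x_scale).astype(int)
    iy = np.floor(np.asarray(y) / y_scale).astype(int)
    # np.add.at accumulates correctly even when the same cell occurs repeatedly
    np.add.at(heatmap_cells, (ix, iy), 1)
    return heatmap_cells

x = [0.1, 0.1, 1.5]
y = [0.2, 0.2, 0.7]
cells = bin_points(x, y, x_scale=1.0, y_scale=0.5, shape=(2, 2))
```

A plain heatmap_cells[ix, iy] += 1 would count each cell at most once per call, which is why np.add.at is used here.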
Very similar to @Piti's answer, but using 1 call instead of 2 to generate the points:
import numpy as np
import matplotlib.pyplot as plt
pts = 1000000
mean = [0.0, 0.0]
cov = [[1.0,0.0],[0.0,1.0]]
x,y = np.random.multivariate_normal(mean, cov, pts).T
plt.hist2d(x, y, bins=50, cmap=plt.cm.jet)
plt.show()
Output:
Here's one I made on a 1-million-point set with 3 categories (colored red, green, and blue). The function is in a GitHub repository if you'd like to try it.
histplot(
    X,
    Y,
    labels,
    bins=2000,
    range=((-3,3),(-3,3)),
    normalize_each_label=True,
    colors=[
        [1,0,0],
        [0,1,0],
        [0,0,1]],
    gain=50)
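The repository's histplot is not reproduced here. As a rough idea of how such a per-category image can be built, the sketch below computes one 2D histogram per label and stacks them as the red, green and blue channels. The function name, its parameters and the label values 0, 1, 2 are all assumptions for illustration:

```python
import numpy as np

def rgb_histogram(x, y, labels, bins=200, value_range=((-3, 3), (-3, 3)), gain=1.0):
    """Stack one normalised 2D histogram per label (0, 1, 2) into an RGB image."""
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    channels = []
    for lab in (0, 1, 2):
        mask = labels == lab
        h, _, _ = np.histogram2d(x[mask], y[mask], bins=bins, range=value_range)
        if h.max() > 0:
            h = h / h.max()   # normalise each label separately
        channels.append(h.T)  # transpose so that rows follow y
    # gain brightens sparse regions; clip keeps the values valid for imshow
    return np.clip(np.dstack(channels) * gain, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(3000)
y = rng.standard_normal(3000)
labels = rng.integers(0, 3, size=3000)
img = rgb_histogram(x, y, labels, bins=50, gain=5.0)
```

The result can be shown with plt.imshow(img, origin='lower', extent=[-3, 3, -3, 3]).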
I'm afraid I'm a little late to the party, but I had a similar question a while ago. The accepted answer (by @ptomato) helped me out, but I also want to post this in case it's of use to someone.
''' I wanted to create a heatmap resembling a football pitch which would show the different actions performed '''
import numpy as np
import matplotlib.pyplot as plt
import random
#fixing random state for reproducibility
np.random.seed(1234324)
fig = plt.figure(12)
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
#Ratio of the pitch with respect to UEFA standards
hmap= np.full((6, 10), 0)
#print(hmap)
xlist = np.random.uniform(low=0.0, high=100.0, size=(20))
ylist = np.random.uniform(low=0.0, high =100.0, size =(20))
#UEFA Pitch Standards are 105m x 68m
xlist = (xlist/100)*10.5
ylist = (ylist/100)*6.5
ax1.scatter(xlist,ylist)
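#int of the co-ordinates to populate the array, clipped to the last row/column
#so that values at the far edges stay inside the 6 x 10 array
xlist_int = np.minimum(xlist.astype(int), hmap.shape[1] - 1)
ylist_int = np.minimum(ylist.astype(int), hmap.shape[0] - 1)
#print(xlist_int, ylist_int)
for i, j in zip(xlist_int, ylist_int):
    #this populates the array according to the x,y co-ordinate values it encounters
    hmap[j][i]= hmap[j][i] + 1
#Reversing the rows is necessary because imshow draws row 0 at the top
hmap = hmap[::-1]
#print(hmap)
im = ax2.imshow(hmap)
plt.show()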
Here's the result
None of these solutions worked for my application, so this is what I came up with. Essentially I am placing a 2D Gaussian at every single point:
import cv2
import numpy as np
import matplotlib.pyplot as plt
def getGaussian2D(ksize, sigma, norm=True):
    oneD = cv2.getGaussianKernel(ksize=ksize, sigma=sigma)
    twoD = np.outer(oneD.T, oneD)
    return twoD / np.sum(twoD) if norm else twoD
def pts2heat(pts, shape, kernel=16, sigma=5):
    heat = np.zeros(shape)
    k = getGaussian2D(kernel, sigma)
    for y,x in pts:
        x, y = int(x), int(y)
        # add the kernel, centred on the point, skipping cells outside the image
        for i in range(-kernel//2, kernel//2):
            for j in range(-kernel//2, kernel//2):
                if 0 <= x+i < shape[0] and 0 <= y+j < shape[1]:
                    heat[x+i, y+j] = heat[x+i, y+j] + k[i+kernel//2, j+kernel//2]
    return heat
# pts (the point coordinates) and img (the associated image) come from your own data
heat = pts2heat(pts, img.shape[:2])
plt.imshow(heat, cmap='hot')
Here are the points overlaid on top of their associated image, along with the resulting heat map: