Creating a Confidence Ellipse in a scatterplot using matplotlib - python

How do I create a confidence ellipse in a scatterplot using matplotlib?
The following code works up to creating the scatter plot. Is anyone familiar with drawing confidence ellipses over a scatter plot?
import numpy as np
import matplotlib.pyplot as plt
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
plt.scatter(x,y)
plt.show()
Here is the SAS reference for confidence ellipses:
http://support.sas.com/documentation/cdl/en/grstatproc/62603/HTML/default/viewer.htm#a003160800.htm
The SAS code looks like this:
proc sgscatter data=sashelp.iris(where=(species="Versicolor"));
title "Versicolor Length and Width";
compare y=(sepalwidth petalwidth)
x=(sepallength petallength)
/ reg ellipse=(type=mean) spacing=4;
run;
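For comparison, SAS's `ellipse=(type=mean)` is a confidence ellipse for the population mean, not a contour of the data spread. A minimal sketch of that idea in Python, assuming roughly bivariate-normal data and the standard Hotelling T² region; the helper name `mean_confidence_ellipse` and the 95% default are illustrative assumptions, not taken from the SAS documentation:

```python
import numpy as np
from scipy import stats
from matplotlib.patches import Ellipse


def mean_confidence_ellipse(x, y, conf=0.95, **kwargs):
    """Hypothetical helper: Ellipse patch for a `conf` confidence region of the mean.

    Assumes approximately bivariate-normal data and uses the Hotelling T^2 bound
    (xbar - mu)' S^-1 (xbar - mu) <= p (n - 1) / (n (n - p)) * F(conf; p, n - p).
    """
    data = np.vstack([x, y]).astype(float)
    p, n = data.shape                       # p = 2 variables, n observations
    cov = np.cov(data)                      # sample covariance S (n - 1 denominator)
    vals, vecs = np.linalg.eigh(cov)        # ascending eigenvalues, orthonormal vectors
    vals, vecs = vals[::-1], vecs[:, ::-1]  # put the major axis first
    # squared radius of the region, measured in Mahalanobis units
    c2 = p * (n - 1) / (n * (n - p)) * stats.f.ppf(conf, p, n - p)
    width, height = 2 * np.sqrt(c2 * vals)  # full axis lengths
    angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))
    return Ellipse(xy=tuple(data.mean(axis=1)), width=width, height=height,
                   angle=angle, **kwargs)
```

After plt.scatter(x, y), the patch can be added with plt.gca().add_patch(mean_confidence_ellipse(x, y, facecolor='none', edgecolor='red')). Because the radius shrinks roughly like 1/sqrt(n), this ellipse gets smaller as data accumulate, unlike the standard-deviation ellipses drawn in the answers.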

The following code draws one-, two-, and three-standard-deviation ellipses:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
cov = np.cov(x, y)
lambda_, v = np.linalg.eig(cov)
lambda_ = np.sqrt(lambda_)
ax = plt.subplot(111, aspect='equal')
for j in range(1, 4):  # xrange only exists in Python 2
    ell = Ellipse(xy=(np.mean(x), np.mean(y)),
                  width=lambda_[0]*j*2, height=lambda_[1]*j*2,
                  angle=np.rad2deg(np.arccos(v[0, 0])))
    ell.set_facecolor('none')
    ax.add_artist(ell)
plt.scatter(x, y)
plt.show()
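The 1, 2 and 3 in the loop above are standard-deviation multiples, not confidence levels. For two-dimensional normal data the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, so the k-standard-deviation ellipse holds about 1 - exp(-k²/2) of the points (roughly 39%, 86% and 99% for k = 1, 2, 3). A minimal sketch for converting in both directions, assuming approximately bivariate-normal data and treating the sample covariance as if it were the true one; both helper names are hypothetical:

```python
import numpy as np


def coverage_for_nstd(k):
    """Fraction of a bivariate normal inside the k-standard-deviation ellipse.

    Uses the chi-square CDF with 2 degrees of freedom: 1 - exp(-d2 / 2), d2 = k**2.
    """
    return 1.0 - np.exp(-np.asarray(k, dtype=float) ** 2 / 2.0)


def nstd_for_coverage(conf):
    """Inverse of coverage_for_nstd: the scale k whose ellipse holds `conf`."""
    return np.sqrt(-2.0 * np.log1p(-np.asarray(conf, dtype=float)))


for j in range(1, 4):
    print(j, round(float(coverage_for_nstd(j)), 3))
print("95% ellipse:", round(float(nstd_for_coverage(0.95)), 4), "standard deviations")
```

Using j = nstd_for_coverage(0.95) in place of the loop counter would draw an ellipse expected to contain about 95% of individual points. That is a prediction-style region, which is a different quantity from SAS's confidence ellipse for the mean.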

After trying the accepted answer, I found that it doesn't choose the quadrant correctly when calculating theta, because it relies on np.arccos.
Looking at the 'possible duplicate' and Joe Kington's solution on GitHub, I simplified his code to this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
def eigsorted(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    return vals[order], vecs[:,order]
x = [5,7,11,15,16,17,18]
y = [25, 18, 17, 9, 8, 5, 8]
nstd = 2
ax = plt.subplot(111)
cov = np.cov(x, y)
vals, vecs = eigsorted(cov)
theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))
w, h = 2 * nstd * np.sqrt(vals)
ell = Ellipse(xy=(np.mean(x), np.mean(y)),
width=w, height=h,
angle=theta, color='black')
ell.set_facecolor('none')
ax.add_artist(ell)
plt.scatter(x, y)
plt.show()
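The same steps can be wrapped as a reusable helper. This is a sketch for a NumPy-only workflow, and the name `cov_ellipse` is hypothetical. Its return values plug directly into Ellipse(xy=center, width=w, height=h, angle=angle). Because arctan2 sees the sign of both eigenvector components, the tilt lands in the correct quadrant. np.arccos returns only 0 to 180 degrees from a single component, so it cannot do that:

```python
import numpy as np


def cov_ellipse(x, y, nstd=2.0):
    """Hypothetical helper: centre, full width/height and angle (degrees) of an nstd ellipse."""
    cov = np.cov(x, y)
    vals, vecs = np.linalg.eigh(cov)        # symmetric matrix, so eigh is appropriate
    order = vals.argsort()[::-1]            # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    vx, vy = vecs[:, 0]                     # direction of the major axis
    angle = np.degrees(np.arctan2(vy, vx))  # arctan2 keeps the quadrant information
    width, height = 2 * nstd * np.sqrt(vals)
    return (np.mean(x), np.mean(y)), width, height, angle


x = [5, 7, 11, 15, 16, 17, 18]
y = [25, 18, 17, 9, 8, 5, 8]                # negatively correlated data
center, w, h, angle = cov_ellipse(x, y)
print(center, w, h, angle % 180)            # angle is only defined modulo 180 degrees
```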

In addition to the accepted answer: I think the correct angle should be:
angle=np.rad2deg(np.arctan2(*v[:,np.argmax(abs(lambda_))][::-1])))
and the corresponding width (larger eigenvalue) and height should be:
width=lambda_[np.argmax(abs(lambda_))]*j*2, height=lambda_[1-np.argmax(abs(lambda_))]*j*2
This is because we need the eigenvector that belongs to the largest eigenvalue. According to the documentation (https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html), "the eigenvalues are not necessarily ordered", and v[:,i] is the eigenvector corresponding to the eigenvalue lambda_[i], so we should select the correct eigenvector column with np.argmax(abs(lambda_)).
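Put together, the argmax fix looks like this. It is a sketch that assumes a 2×2 covariance matrix, which is why the minor-axis index is simply 1 - major. The function name `major_minor_from_eig` is hypothetical:

```python
import numpy as np


def major_minor_from_eig(cov):
    """Hypothetical helper: semi-axes (1 std) and angle from np.linalg.eig output.

    np.linalg.eig does not sort its eigenvalues, so the major axis is located
    explicitly with argmax instead of assuming it is column 0.
    """
    lambda_, v = np.linalg.eig(cov)
    lambda_ = np.sqrt(lambda_)
    major = np.argmax(np.abs(lambda_))                   # index of the largest eigenvalue
    minor = 1 - major                                    # the other index (2x2 case only)
    angle = np.rad2deg(np.arctan2(*v[:, major][::-1]))   # arctan2(vy, vx)
    return lambda_[major], lambda_[minor], angle


cov = np.cov([5, 7, 11, 15, 16, 17, 18], [8, 5, 8, 9, 17, 18, 25])
a, b, angle = major_minor_from_eig(cov)
print(a, b, angle)
```

Multiplying a and b by 2*j gives the width and height used in the Ellipse call for the j-standard-deviation ellipse.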

There is no need to compute angles explicitly once you have the eigendecomposition of your covariance matrix: the rotation part already encodes that information for free:
import numpy as np
import matplotlib.pyplot as plt
cov = np.cov(x, y)
val, rot = np.linalg.eig(cov)
val = np.sqrt(val)
center = np.mean([x, y], axis=1)[:, None]
t = np.linspace(0, 2.0 * np.pi, 1000)
xy = np.stack((np.cos(t), np.sin(t)), axis=-1)
plt.scatter(x, y)
plt.plot(*(rot @ (val * xy).T + center))
You can expand the ellipse by applying a scale before the translation:
plt.plot(*(2 * rot @ (val * xy).T + center))
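A self-contained version of this approach follows. It is a sketch that assumes np.linalg.eigh in place of eig, since eigh guarantees orthonormal eigenvectors for a symmetric matrix. The name `ellipse_points` is hypothetical. The final check confirms that every generated point lies at the requested Mahalanobis distance from the mean, which is what makes the curve a covariance ellipse:

```python
import numpy as np


def ellipse_points(x, y, scale=1.0, n=200):
    """Hypothetical helper: (2, n) array tracing the `scale`-std covariance ellipse."""
    cov = np.cov(x, y)
    val, rot = np.linalg.eigh(cov)                      # orthonormal rotation matrix
    center = np.mean([x, y], axis=1)[:, None]           # column vector, shape (2, 1)
    t = np.linspace(0.0, 2.0 * np.pi, n)
    circle = np.stack((np.cos(t), np.sin(t)), axis=-1)  # unit circle, shape (n, 2)
    # stretch by the std devs, rotate, scale, then translate to the mean
    return scale * rot @ (np.sqrt(val) * circle).T + center


x = [5, 7, 11, 15, 16, 17, 18]
y = [8, 5, 8, 9, 17, 18, 25]
pts = ellipse_points(x, y, scale=2.0)

# squared Mahalanobis distance of each point from the mean; should equal scale**2
d = pts - np.mean([x, y], axis=1)[:, None]
m2 = np.einsum("in,ij,jn->n", d, np.linalg.inv(np.cov(x, y)), d)
print(np.allclose(m2, 4.0))
```

For plotting, plt.plot(*ellipse_points(x, y, 2)) draws the same curve as the last line above.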

Related

2D Elliptical fit to x,y data with tilt [duplicate]

How to generate an array with points of this curve?

I want to write a program that generates an array of coordinates to follow when drawing a shape like the white curve here, given the blue points. Does anyone know how to do something like that, or can at least give me a tip?
You could use, e.g., InterpolatedUnivariateSpline to interpolate the points. As these spline functions are 1D, you can calculate the x and y positions separately, as functions of a new variable t going from 0 to 1.
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
# positions of the given points
px = [1, 4, 3, 2, 5]
py = [1, 3, 4, 3, 1]
# 5 t-values, at t=0 in point 1, at t=1 reaching point 5
pt = np.linspace(0, 1, len(px))
# sx and sy are functions that interpolate the points at the given t-values
sx = interpolate.InterpolatedUnivariateSpline(pt, px)
sy = interpolate.InterpolatedUnivariateSpline(pt, py)
# calculate many intermediate values
t = np.linspace(0, 1, 500)
x = sx(t)
y = sy(t)
# show the original points together with the spline
fig, ax = plt.subplots(facecolor='black')
ax.axis('off')
plt.scatter(px, py, s=80, color='skyblue')
plt.plot(x, y, color='white')
for i, (xi, yi) in enumerate(zip(px, py), start=1):
    ax.text(xi, yi, f'\n {i}', ha='left', va='center', size=30, color='yellow')
plt.show()
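Since the question asks for the coordinates as an array rather than a picture, the same idea can be packaged as a function that returns them. This is a sketch with `spline_path` as a hypothetical name, keeping the evenly spaced t-values from the answer. InterpolatedUnivariateSpline defaults to cubic splines, so it needs at least four points:

```python
import numpy as np
from scipy import interpolate


def spline_path(px, py, n=500):
    """Hypothetical helper: (n, 2) array of (x, y) coordinates along a spline through the points."""
    pt = np.linspace(0, 1, len(px))                       # one t-value per given point
    sx = interpolate.InterpolatedUnivariateSpline(pt, px)
    sy = interpolate.InterpolatedUnivariateSpline(pt, py)
    t = np.linspace(0, 1, n)
    return np.column_stack((sx(t), sy(t)))                # rows are successive (x, y)


path = spline_path([1, 4, 3, 2, 5], [1, 3, 4, 3, 1])
print(path.shape)
print(path[:3])
```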

Plot average of scattered values in 2D bins as a histogram/hexplot

I have three-dimensional scattered data x, y, z.
I want to plot the average of z in bins of x and y as a hex plot or 2D histogram plot.
Is there any matplotlib function to do this?
I can only come up with some very cumbersome implementations even though this seems to be a common problem.
E.g. something like this:
Except that the color should depend on the average z values for the (x, y) bin (rather than the number of entries in the (x, y) bin as in the default hexplot/2D histogram functionalities).
If binning is what you are asking, then binned_statistic_2d might work for you. Here's an example:
from scipy.stats import binned_statistic_2d
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(0, 10, 1000)
y = np.random.uniform(10, 20, 1000)
z = np.exp(-(x-3)**2/5 - (y-18)**2/5) + np.random.random(1000)
x_bins = np.linspace(0, 10, 10)
y_bins = np.linspace(10, 20, 10)
ret = binned_statistic_2d(x, y, z, statistic=np.mean, bins=[x_bins, y_bins])
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
ax0.scatter(x, y, c=z)
# origin must be 'upper' or 'lower'; 'lower' puts the first y bin at the bottom
ax1.imshow(ret.statistic.T, origin='lower', extent=(0, 10, 10, 20))
plt.show()
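For the hex-plot case, matplotlib's own Axes.hexbin can do this directly. When per-point values are passed as C, each hexagon is coloured by reduce_C_function applied to the values that fall in it. The sketch below uses the same kind of synthetic data. The seed from default_rng(0) and gridsize=15 are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)
y = rng.uniform(10, 20, 1000)
z = np.exp(-(x - 3) ** 2 / 5 - (y - 18) ** 2 / 5) + rng.random(1000)

fig, ax = plt.subplots()
# C gives one value per point; reduce_C_function averages the values in each hexagon
hb = ax.hexbin(x, y, C=z, reduce_C_function=np.mean, gridsize=15)
fig.colorbar(hb, ax=ax, label="mean of z")
```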
@Andrea's answer is very clear and helpful, but I wanted to mention a faster alternative that does not use the scipy library.
The idea is to compute a 2D histogram of x and y weighted by z (which holds the sum of z in each bin) and then divide it by the unweighted histogram (which holds the number of points in each bin). The result is the average of z in each bin.
The code:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(0, 10, 10**7)
y = np.random.uniform(10, 20, 10**7)
z = np.exp(-(x-3)**2/5 - (y-18)**2/5) + np.random.random(10**7)
x_bins = np.linspace(0, 10, 50)
y_bins = np.linspace(10, 20, 50)
H, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins], weights = z)
H_counts, xedges, yedges = np.histogram2d(x, y, bins = [x_bins, y_bins])
H = H/H_counts
plt.imshow(H.T, origin='lower', cmap='RdBu',
extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar()
plt.show()
On my computer, this method is approximately five times faster than scipy's binned_statistic_2d.
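With sparse data or fine bins some bins are empty, and H/H_counts then evaluates 0/0, which emits a RuntimeWarning and yields NaN. Here is a sketch that makes the empty-bin handling explicit. The helper name `binned_mean` is hypothetical. It leaves empty bins as NaN, which imshow draws as blank:

```python
import numpy as np


def binned_mean(x, y, z, x_bins, y_bins):
    """Hypothetical helper: mean of z per (x, y) bin via two histogram2d calls."""
    sums, xedges, yedges = np.histogram2d(x, y, bins=[x_bins, y_bins], weights=z)
    counts, _, _ = np.histogram2d(x, y, bins=[x_bins, y_bins])
    means = np.full_like(sums, np.nan)                      # empty bins stay NaN
    np.divide(sums, counts, out=means, where=counts > 0)   # divide only non-empty bins
    return means, xedges, yedges


means, _, _ = binned_mean([0.5, 0.5, 1.5], [0.5, 0.5, 0.5], [1.0, 3.0, 10.0],
                          [0, 1, 2], [0, 1, 2])
print(means)
```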

How can I give specific x values to `scipy.interpolate.splev`?

How can I interpolate a hysteresis loop at specific x points? Multiple related questions and answers on Stack Overflow cover B-spline interpolation using scipy.interpolate.splprep (other questions here or here). However, I have hundreds of hysteresis loops at very similar (but not exactly the same) x positions, and I would like to perform B-spline interpolation on all of them at specific x coordinates.
Taking a previous example:
import numpy as np
from scipy import interpolate
from matplotlib import pyplot as plt
x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
# append the starting x,y coordinates
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
# fit splines to x=f(u) and y=g(u), treating both as periodic. also note that s=0
# is needed in order to force the spline fit to pass through all the input points.
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# plot the result
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(xi, yi, '-b')
plt.show()
Is it possible to provide specific x values to interpolate.splev? I get unexpected results:
x2, y2 = interpolate.splev(np.linspace(start=23, stop=25, num=30), tck)
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(x2, y2, '-b')
plt.show()
The B-spline gives x and y positions for a given u (between 0 and 1).
Getting y positions for a given x position means solving for the inverse, which is not unique: there can be many y values for one x (in the given example there are places with four y values, for example at x=24).
A simple way to get the (x, y) pairs for x between two limits is to create a boolean filter:
import numpy as np
from scipy import interpolate
from matplotlib import pyplot as plt
x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
# append the starting x,y coordinates
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# plot the result
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(xi, yi, '-b')
mask = (xi >= 24) & (xi <= 25)  # named mask to avoid shadowing the built-in filter()
x2 = xi[mask]
y2 = yi[mask]
ax.scatter(x2, y2, color='c')
plt.show()
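To actually solve the inverse for a single x value, one option is to sample the spline densely, as the answer already does, and linearly interpolate y wherever the sampled curve crosses the vertical line x = x0. This is a swapped-in technique, not part of the answer above. It returns every crossing, so several y values come back where the loop folds over. The name `y_at_x` is hypothetical:

```python
import numpy as np


def y_at_x(xs, ys, x0):
    """Hypothetical helper: every y where the sampled curve (xs, ys) crosses x = x0.

    Uses linear interpolation between consecutive samples. A crossing that lands
    exactly on a sample point is missed, because only strict sign changes count.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    d = xs - x0
    i = np.nonzero(d[:-1] * d[1:] < 0)[0]   # segments whose ends lie on opposite sides
    f = d[i] / (d[i] - d[i + 1])            # fraction along each segment where x == x0
    return ys[i] + f * (ys[i + 1] - ys[i])


# Sanity check on a densely sampled unit circle: x = 0.3 meets it twice.
t = np.linspace(0, 2 * np.pi, 1001)
print(np.sort(y_at_x(np.cos(t), np.sin(t), 0.3)))
```

For the loop in the question, y_at_x(xi, yi, 24.5) would return the y values where the dense spline crosses x = 24.5. Calling it for the same target x values on every loop puts all loops on a common set of x positions.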

Facecolor changing edgecolor in matplotlib

I am trying to remove the edge color in the plot of a cylinder where I have set an alpha and facecolors. However, if I also set the facecolors, I can still see the edge colors. If I remove the alpha = 0.5 statement, the problem is resolved, but I need alpha to be < 1. Here is an example:
You can still see the blue edge colors even though I have set the edgecolor to None.
This is the code where I use plot_surface()
ax.plot_surface(X, Y,Z, edgecolor = "None", facecolors = col1, alpha = 0.5)
Yet the edge colors are still there. However, if I remove the facecolors argument from plot_surface(), the edge colors disappear. Here is the complete code:
import numpy as np
from matplotlib import cm
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.linalg import norm
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
origin = np.array([0, 0, 0])
#axis and radius
p0 = np.array([0, 0, 0])
p1 = np.array([8, 8, 8])
R = 4
#vector in direction of axis
v = p1 - p0
#find magnitude of vector
mag = norm(v)
#unit vector in direction of axis
v = v / mag
#make some vector not in the same direction as v
not_v = np.array([1, 0, 0])
if (v == not_v).all():
    not_v = np.array([0, 1, 0])
#make vector perpendicular to v
n1 = np.cross(v, not_v)
#normalize n1
n1 /= norm(n1)
#make unit vector perpendicular to v and n1
n2 = np.cross(v, n1)
#surface ranges over t from 0 to length of axis and 0 to 2*pi
t = np.linspace(0, mag, 200)
theta = np.linspace(0, 2 * np.pi, 100)
#use meshgrid to make 2d arrays
t, theta = np.meshgrid(t, theta)
#generate coordinates for surface
X, Y, Z = [p0[i] + v[i] * t + R * np.sin(theta) * n1[i] + R * np.cos(theta) * n2[i] for i in [0, 1, 2]]
col1 = plt.cm.Blues(np.linspace(0,1,200)) # linear gradient along the t-axis
col1 = np.repeat(col1[np.newaxis,:, :], 100, axis=0) # expand over the theta axis
ax.plot_surface(X, Y,Z, edgecolor = None, facecolors = col1, alpha = 0.5)
#plot axis
ax.plot(*zip(p0, p1), color = 'red')
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_zlim(0, 10)
plt.axis('off')
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
plt.show()
Setting linewidth=0 in plot_surface() solves this problem:
ax.plot_surface(X, Y, Z, edgecolor=None, facecolors=col1, alpha=0.5, linewidth=0)
P.S.: I didn't think this was worth a full answer, but per "Question with no answers, but issue solved in the comments (or extended in chat)", I added it as a quick answer so the question can be marked as solved.
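A minimal, self-contained reproduction of the fix on a smaller cylinder around the z-axis follows. The geometry and the 60×40 grid are arbitrary choices, not taken from the question. antialiased=False is an extra assumption that may further reduce faint seams between facets. linewidth=0 is the part that removes the edges:

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

# unit-radius cylinder of height 5 around the z-axis, shape (40, 60) grids
theta, z = np.meshgrid(np.linspace(0, 2 * np.pi, 60), np.linspace(0, 5, 40))
X, Y = np.cos(theta), np.sin(theta)

# one RGBA colour per row (height), repeated around the circumference: (40, 60, 4)
colors = plt.cm.Blues(np.linspace(0, 1, 40))[:, None, :].repeat(60, axis=1)

# linewidth=0 removes the facet outlines that stay visible when alpha < 1
surf = ax.plot_surface(X, Y, z, facecolors=colors, alpha=0.5,
                       linewidth=0, antialiased=False)
```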
