How do I create a confidence ellipse in a scatterplot using matplotlib?
The following code works up to creating the scatter plot. Is anyone familiar with drawing confidence ellipses over a scatter plot?
import numpy as np
import matplotlib.pyplot as plt
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
plt.scatter(x,y)
plt.show()
The following is the SAS reference for confidence ellipses:
http://support.sas.com/documentation/cdl/en/grstatproc/62603/HTML/default/viewer.htm#a003160800.htm
The SAS code looks like this:
proc sgscatter data=sashelp.iris(where=(species="Versicolor"));
title "Versicolor Length and Width";
compare y=(sepalwidth petalwidth)
x=(sepallength petallength)
/ reg ellipse=(type=mean) spacing=4;
run;
The following code draws one-, two-, and three-standard-deviation ellipses:
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
cov = np.cov(x, y)
lambda_, v = np.linalg.eig(cov)
lambda_ = np.sqrt(lambda_)
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
ax = plt.subplot(111, aspect='equal')
for j in range(1, 4):  # 1, 2 and 3 standard deviations
    ell = Ellipse(xy=(np.mean(x), np.mean(y)),
                  width=lambda_[0]*j*2, height=lambda_[1]*j*2,
                  angle=np.rad2deg(np.arccos(v[0, 0])))
    ell.set_facecolor('none')
    ax.add_artist(ell)
plt.scatter(x, y)
plt.show()
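Note that in two dimensions the 1-, 2- and 3-standard-deviation ellipses do not enclose the familiar 68/95/99.7% of the data. Assuming the data are bivariate normal and treating the sample covariance as exact, the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, so a k-sigma ellipse covers 1 - exp(-k^2/2) of the distribution. A minimal sketch of that conversion (the helper names coverage and scale_for are hypothetical):

```python
import numpy as np

def coverage(k):
    # Probability mass inside a k-sigma ellipse of a 2-D Gaussian
    # (chi-square with 2 degrees of freedom: P = 1 - exp(-k^2 / 2))
    return 1.0 - np.exp(-k**2 / 2.0)

def scale_for(p):
    # Inverse of coverage(): number of "sigmas" needed to enclose probability p
    return np.sqrt(-2.0 * np.log(1.0 - p))

for k in (1, 2, 3):
    print(k, round(coverage(k), 3))
print(round(scale_for(0.95), 3))
```

This gives roughly 0.393, 0.865 and 0.989 for k = 1, 2, 3, and about 2.448 for a 95% region, so you could use that value in place of j (or nstd in the answers below). This describes the spread of the data, not a confidence region for the mean like SAS's ellipse=(type=mean), which is a different, smaller region.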
After trying the accepted answer, I found that it doesn't choose the quadrant correctly when calculating theta, because it relies on np.arccos.
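A quick way to see the quadrant problem: np.arccos only looks at the x component of the eigenvector, so vectors at +30 and -30 degrees give the same angle, while np.arctan2 uses both components. A minimal sketch (the helper names are hypothetical):

```python
import numpy as np

def angle_arccos(vec):
    # Angle recovered from the x component only, as in the accepted answer
    return np.degrees(np.arccos(vec[0]))

def angle_arctan2(vec):
    # Angle recovered from both components, which keeps the quadrant
    return np.degrees(np.arctan2(vec[1], vec[0]))

# Unit vector pointing at -30 degrees (fourth quadrant)
vec = np.array([np.cos(np.radians(-30)), np.sin(np.radians(-30))])
print(angle_arccos(vec))   # about 30: the sign of the y component is lost
print(angle_arctan2(vec))  # about -30
```

An ellipse only repeats itself under a 180-degree rotation, so +30 and -30 degrees are genuinely different orientations, and the arccos version can tilt the ellipse the wrong way.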
Looking at the 'possible duplicate' and Joe Kington's solution on GitHub, I simplified his code to this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
def eigsorted(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    return vals[order], vecs[:,order]
x = [5,7,11,15,16,17,18]
y = [25, 18, 17, 9, 8, 5, 8]
nstd = 2
ax = plt.subplot(111)
cov = np.cov(x, y)
vals, vecs = eigsorted(cov)
theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))
w, h = 2 * nstd * np.sqrt(vals)
ell = Ellipse(xy=(np.mean(x), np.mean(y)),
              width=w, height=h,
              angle=theta, color='black')
ell.set_facecolor('none')
ax.add_artist(ell)
plt.scatter(x, y)
plt.show()
In addition to the accepted answer, I think the correct angle should be:
angle=np.rad2deg(np.arctan2(*v[:,np.argmax(abs(lambda_))][::-1])))
and the corresponding width (from the larger eigenvalue) and height should be:
width=lambda_[np.argmax(abs(lambda_))]*j*2, height=lambda_[1-np.argmax(abs(lambda_))]*j*2
This is because we need the eigenvector that belongs to the largest eigenvalue. According to the NumPy documentation (https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html), "the eigenvalues are not necessarily ordered", and v[:,i] is the eigenvector corresponding to the eigenvalue lambda_[i], so we should pick the correct eigenvector column with np.argmax(abs(lambda_)).
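Putting those two corrections together, a minimal sketch of the parameter computation might look like this; ellipse_params is a hypothetical helper, and its results are meant to be passed to matplotlib.patches.Ellipse as width, height and angle:

```python
import numpy as np

def ellipse_params(x, y, nstd=1.0):
    """Hypothetical helper: (width, height, angle in degrees) of an nstd-sigma ellipse."""
    cov = np.cov(x, y)
    lambda_, v = np.linalg.eig(cov)   # eigenvalues are not necessarily ordered
    lambda_ = np.sqrt(lambda_)        # standard deviations along the principal axes
    i = np.argmax(np.abs(lambda_))    # column of the major-axis eigenvector
    angle = np.rad2deg(np.arctan2(*v[:, i][::-1]))  # arctan2(v_y, v_x) keeps the quadrant
    width = 2 * nstd * lambda_[i]
    height = 2 * nstd * lambda_[1 - i]
    return width, height, angle

x = [5, 7, 11, 15, 16, 17, 18]
y = [8, 5, 8, 9, 17, 18, 25]
w, h, a = ellipse_params(x, y, nstd=2)
print(w, h, a)
```

Usage: Ellipse(xy=(np.mean(x), np.mean(y)), width=w, height=h, angle=a). Alternatively, np.linalg.eigh (as in the previous answer) sidesteps the ordering issue, because it returns the eigenvalues of a symmetric matrix in ascending order.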
There is no need to compute angles explicitly once you have the eigendecomposition of your covariance matrix: the matrix of eigenvectors is itself the rotation, so it already encodes that information:
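cov = np.cov(x, y)
val, rot = np.linalg.eig(cov)
val = np.sqrt(val)  # standard deviations along the principal axes
center = np.mean([x, y], axis=1)[:, None]  # column vector, shape (2, 1)
t = np.linspace(0, 2.0 * np.pi, 1000)
xy = np.stack((np.cos(t), np.sin(t)), axis=-1)  # unit circle, shape (1000, 2)
plt.scatter(x, y)
# scale the circle by the standard deviations, rotate it by the eigenvectors, then translate
plt.plot(*(rot @ (val * xy).T + center))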
You can expand your ellipse by applying a scale before translation:
plt.plot(*(2 * rot @ (val * xy).T + center))
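One way to check that the scale factor really is the "number of standard deviations": every point produced this way has the same Mahalanobis distance from the centre, equal to the scale factor. A minimal check reusing the question's data (the names k, pts, d and m2 are illustrative only):

```python
import numpy as np

x = [5, 7, 11, 15, 16, 17, 18]
y = [8, 5, 8, 9, 17, 18, 25]

cov = np.cov(x, y)
val, rot = np.linalg.eig(cov)
val = np.sqrt(val)
center = np.mean([x, y], axis=1)[:, None]
t = np.linspace(0, 2.0 * np.pi, 1000)
xy = np.stack((np.cos(t), np.sin(t)), axis=-1)

k = 2                                  # scale factor applied before translation
pts = k * rot @ (val * xy).T + center  # ellipse points, shape (2, 1000)

# Squared Mahalanobis distance of each ellipse point from the centre
d = pts - center
m2 = np.einsum('ij,ik,kj->j', d, np.linalg.inv(cov), d)
print(np.allclose(m2, k**2))  # True
```

This works because cov = rot @ diag(val**2) @ rot.T, so the inverse covariance undoes both the rotation and the per-axis scaling, leaving only k**2.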
Related
I want to write a program that generates an array of coordinates tracing a smooth curve (like the white one in my image) through a given set of points (the blue ones). Does anyone know how to do something like that, or can at least give me a tip?
You could use, e.g., InterpolatedUnivariateSpline to interpolate the points. Since these spline functions are 1D, you can compute the x and y positions separately as functions of a new parameter t running from 0 to 1.
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
# positions of the given points
px = [1, 4, 3, 2, 5]
py = [1, 3, 4, 3, 1]
# 5 t-values, at t=0 in point 1, at t=1 reaching point 5
pt = np.linspace(0, 1, len(px))
# sx and sy are functions that interpolate the points at the given t-values
sx = interpolate.InterpolatedUnivariateSpline(pt, px)
sy = interpolate.InterpolatedUnivariateSpline(pt, py)
# calculate many intermediate values
t = np.linspace(0, 1, 500)
x = sx(t)
y = sy(t)
# show the original points together with the spline
fig, ax = plt.subplots(facecolor='black')
ax.axis('off')
plt.scatter(px, py, s=80, color='skyblue')
plt.plot(x, y, color='white')
for i, (xi, yi) in enumerate(zip(px, py), start=1):
    ax.text(xi, yi, f'\n {i}', ha='left', va='center', size=30, color='yellow')
plt.show()
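A common variation on the evenly spaced t-values above is a chord-length parameter, where t advances in proportion to the distance between consecutive points; this often gives a more even curve when the points are unevenly spaced. A minimal sketch, assuming the same px, py and the same InterpolatedUnivariateSpline approach (plotting omitted):

```python
import numpy as np
from scipy import interpolate

px = np.array([1, 4, 3, 2, 5], dtype=float)
py = np.array([1, 3, 4, 3, 1], dtype=float)

# Chord-length parameter: t grows with the straight-line distance between points
seg = np.hypot(np.diff(px), np.diff(py))
pt = np.concatenate(([0.0], np.cumsum(seg)))
pt /= pt[-1]  # normalise to [0, 1]; requires no two consecutive points to coincide

sx = interpolate.InterpolatedUnivariateSpline(pt, px)
sy = interpolate.InterpolatedUnivariateSpline(pt, py)

t = np.linspace(0, 1, 500)
x, y = sx(t), sy(t)
```

The resulting x, y can be drawn with the same plotting code as above.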
I want to scatter many data points around a central point (2.5, 2.5), each at a given distance from the centre.
How do I do that while avoiding duplicates and scattering them evenly around the centre?
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize=(6, 6))
N = 120
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)  # endpoint=False: 0 and 2*pi would give the same point
c_x, c_y = (2.5, 2.5)
x_s, y_s = [], []
distances = list(np.arange(0, 5.5, 0.5))
for distance in distances:
    for angle in angles:
        x_s.append(c_x + distance * np.cos(angle))
        y_s.append(c_y + distance * np.sin(angle))
plt.scatter(x_s, y_s, c="b", s=4)
plt.show()
To clarify, I wanted one point for each distance, with each subsequent point offset by 180 or 90 degrees. I managed to do it based on the code provided by Gustav Rasmussen:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize=(6, 6))
#default
N = 50
angles = np.linspace(0, 2 * np.pi, N)
c_x, c_y = (2.5, 2.5)
x_s, y_s = [], []
distances = list(np.arange(0, 5.5, 0.01))
i = angles.size/4
for distance in distances:
    x_s.append(c_x + distance * np.cos(i))
    y_s.append(c_y + distance * np.sin(i))
    i += i
plt.scatter(x_s, y_s, c="b", s=4)
plt.show()
The result shows 550 distances, each displayed offset from the previous one by approximately 90 degrees.
One last note: when dealing with a dataset with larger deviations, it is better to use i = angles.size/2 to keep the output roughly circular.
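If the goal is a constant 90-degree step between consecutive points, note that i += i doubles the angle (in radians) on every iteration rather than adding a fixed offset. A minimal sketch that adds a constant step instead, assuming the same centre and distances as above (the name step is illustrative):

```python
import numpy as np

c_x, c_y = 2.5, 2.5
distances = np.arange(0, 5.5, 0.01)        # same distances as above
step = np.pi / 2                            # fixed offset between consecutive points (90 degrees)
angles = step * np.arange(distances.size)   # 0, 90, 180, 270, 360, ... degrees
x_s = c_x + distances * np.cos(angles)
y_s = c_y + distances * np.sin(angles)
```

Plot it with plt.scatter(x_s, y_s, c="b", s=4) as before; step = np.pi gives the 180-degree variant.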
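import cmath
import numpy as np
from matplotlib import pyplot as plt
from itertools import starmap
# 120 points: radius v//40+1 gives three rings (1, 2, 3), and the angle v*pi/20
# steps by 9 degrees, so each ring of 40 points covers a full circle
c = np.array(list(starmap(cmath.rect, [(v//40+1, v*np.pi/20) for v in range(120)])))
# shift the complex points so they are centred on (2.5, 2.5)
x = c.real+2.5
y = c.imag+2.5
plt.scatter(x, y)
plt.show()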
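The same rings can also be built without cmath and starmap, since r * np.exp(1j * phi) is the vectorised equivalent of cmath.rect(r, phi). A minimal sketch that includes a comparison with the version above:

```python
import cmath
from itertools import starmap
import numpy as np

v = np.arange(120)
r = v // 40 + 1            # radius 1, 2 or 3 (40 points per ring)
phi = v * np.pi / 20       # 9-degree steps, so 40 points make a full turn
c = r * np.exp(1j * phi)   # complex points, same as cmath.rect(r, phi)
x = c.real + 2.5
y = c.imag + 2.5

# Reference built the cmath/starmap way, for comparison
c_ref = np.array(list(starmap(cmath.rect, [(k // 40 + 1, k * np.pi / 20) for k in range(120)])))
print(np.allclose(c, c_ref))
```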
How can I interpolate a hysteresis loop at specific x points? Several related questions on Stack Overflow cover B-spline interpolation using scipy.interpolate.splprep. However, I have hundreds of hysteresis loops at very similar (but not exactly the same) x positions, and I would like to perform B-spline interpolation on all of them at specific x coordinates.
Taking a previous example:
import numpy as np
from scipy import interpolate
from matplotlib import pyplot as plt
x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
# append the starting x,y coordinates
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
# fit splines to x=f(u) and y=g(u), treating both as periodic. also note that s=0
# is needed in order to force the spline fit to pass through all the input points.
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# plot the result
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(xi, yi, '-b')
plt.show()
Is it possible to provide specific x values to interpolate.splev? I get unexpected results:
x2, y2 = interpolate.splev(np.linspace(start=23, stop=25, num=30), tck)
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(x2, y2, '-b')
plt.show()
The B-spline gives x and y positions for a given parameter value u (between 0 and 1).
Getting y positions for a given x position involves solving for the inverse, and there can be many y values for one x (in the given example, there are places with four y values, for example at x=24).
A simple way to get a list of (x, y) points with x between two limits is to create a boolean filter:
import numpy as np
from scipy import interpolate
from matplotlib import pyplot as plt
x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
# append the starting x,y coordinates
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
# evaluate the spline fits for 1000 evenly spaced distance values
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)
# plot the result
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, 'or')
ax.plot(xi, yi, '-b')
mask = (xi >= 24) & (xi <= 25)  # avoid shadowing the built-in filter()
x2 = xi[mask]
y2 = yi[mask]
ax.scatter(x2, y2, color='c')
plt.show()
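If you need the y values at specific x positions (rather than every sample within a range), one option is to find where the densely sampled curve crosses each target x and interpolate linearly between the two neighbouring samples. A minimal sketch under that assumption; y_at_x is a hypothetical helper, and a sample landing exactly on x0 is not counted:

```python
import numpy as np
from scipy import interpolate

def y_at_x(xi, yi, x0):
    """Hypothetical helper: every y where the sampled curve (xi, yi) crosses x = x0."""
    d = xi - x0
    # indices i where the segment between samples i and i+1 straddles x0
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]
    frac = d[idx] / (d[idx] - d[idx + 1])  # position of the crossing within each segment
    return yi[idx] + frac * (yi[idx + 1] - yi[idx])

x = np.array([23, 24, 24, 25, 25])
y = np.array([13, 12, 13, 12, 13])
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
xi, yi = interpolate.splev(np.linspace(0, 1, 1000), tck)

for x0 in np.linspace(23.5, 24.5, 5):
    print(x0, y_at_x(xi, yi, x0))  # one array of y values per requested x
```

Because each loop is evaluated at the same requested x0 values, the results from different loops can be compared directly, even when their input x positions differ slightly. The accuracy depends on how densely the spline is sampled.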
I am trying to remove the edge color in a plot of a cylinder where I have set both alpha and facecolors. When I set the facecolors, I can still see the edge colors. Removing alpha = 0.5 resolves the problem, but I need alpha to be < 1.
You can still see the blue edge colors in the plot, even though I have set edgecolor to None.
This is the line where I call plot_surface():
ax.plot_surface(X, Y,Z, edgecolor = "None", facecolors = col1, alpha = 0.5)
Yet the edge colors are still there. However, if I remove the facecolors argument from plot_surface(), the edge colors disappear. Here is the complete code:
import numpy as np
from matplotlib import cm
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.linalg import norm
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
import random
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
origin = np.array([0, 0, 0])
#axis and radius
p0 = np.array([0, 0, 0])
p1 = np.array([8, 8, 8])
R = 4
#vector in direction of axis
v = p1 - p0
#find magnitude of vector
mag = norm(v)
#unit vector in direction of axis
v = v / mag
#make some vector not in the same direction as v
not_v = np.array([1, 0, 0])
if (v == not_v).all():
not_v = np.array([0, 1, 0])
#make vector perpendicular to v
n1 = np.cross(v, not_v)
#normalize n1
n1 /= norm(n1)
#make unit vector perpendicular to v and n1
n2 = np.cross(v, n1)
#surface ranges over t from 0 to length of axis and 0 to 2*pi
t = np.linspace(0, mag, 200)
theta = np.linspace(0, 2 * np.pi, 100)
#use meshgrid to make 2d arrays
t, theta = np.meshgrid(t, theta)
#generate coordinates for surface
X, Y, Z = [p0[i] + v[i] * t + R * np.sin(theta) * n1[i] + R * np.cos(theta) * n2[i] for i in [0, 1, 2]]
col1 = plt.cm.Blues(np.linspace(0,1,200)) # linear gradient along the t-axis
col1 = np.repeat(col1[np.newaxis,:, :], 100, axis=0) # expand over the theta- axis
ax.plot_surface(X, Y,Z, edgecolor = None, facecolors = col1, alpha = 0.5)
#plot axis
ax.plot(*zip(p0, p1), color = 'red')
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_zlim(0, 10)
plt.axis('off')
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
plt.show()
Setting linewidth=0 in plot_surface() solves this problem:
ax.plot_surface(X, Y, Z, edgecolor=None, facecolors=col1, alpha=0.5, linewidth=0)
P.S.: I didn't think this was worth a full answer, but per the post "Question with no answers, but issue solved in the comments (or extended in chat)", I added it as a quick answer so the question can be marked as solved.