I want to plot two trendlines for one scatterplot using Matplotlib in Python, but I don't know how. The graph should be similar to this target plot (from here, fig. 2).
I managed to plot one trendline on a scatterplot here, but I can't figure out how to add a second one.
Below is what I have tried so far. It worked for other parameters I plotted, but not in this case, which leads me to believe it isn't quite right:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

X = vO2.reshape(-1, 1)
Y = ve.reshape(-1, 1)
linear_regressor = LinearRegression()
linear_regressor.fit(X, Y)
y_pred = linear_regressor.predict(X)
x_pred = linear_regressor.predict(Y)
plt.scatter(X, Y)
plt.plot(X, y_pred, '-*',label="O2")
plt.plot(x_pred, Y, '-*',label="vent")
plt.xlabel("VO2 (L/min)")
plt.ylabel("VE (L/min)")
plt.show()
and also
z1 = np.polyfit(vO2, ve, 1)
p1 = np.poly1d(z1)
z2 = np.polyfit(ve, vO2, 1)
p2 = np.poly1d(z2)
plt.scatter(vO2, ve, label='original')
plt.plot(vO2, p1(vO2), label='trendline 1')
plt.plot(ve, p2(ve), label='trendline 2')
plt.legend()
plt.show()
which also did not look like the target plot.
I don't know how to continue. Thanks in advance!
example dataset:
vo2 = [1.673925, 1.9015125, 1.981775, 2.112875, 2.1112625, 2.086375, 2.13475,
       2.1777, 2.176975, 2.1857125, 2.258925, 2.2718375, 2.3381, 2.3330875,
       2.353725, 2.4879625, 2.448275, 2.4829875, 2.5084375, 2.511275, 2.5511,
       2.5678375, 2.5844625, 2.6101875, 2.6457375, 2.6602125, 2.6939875, 2.7210625,
       2.720475, 2.767025, 2.751375, 2.7771875, 2.776025, 2.7319875, 2.564,
       2.3977625, 2.4459125, 2.42965, 2.401275, 2.387175, 2.3544375]
ve = [3.93125, 7.1975, 9.04375, 14.06125, 14.11875, 13.24375,
      14.6625, 15.3625, 15.2, 15.035, 17.7625, 17.955,
      19.2675, 19.875, 21.1575, 22.9825, 23.75625, 23.30875,
      25.9925, 25.6775, 27.33875, 27.7775, 27.9625, 29.35,
      31.86125, 32.2425, 33.7575, 34.69125, 36.20125, 38.6325,
      39.4425, 42.085, 45.17, 47.18, 42.295, 37.5125,
      38.84375, 37.4775, 34.20375, 33.18, 32.67708333]
OK, so you need to find the point where the slope of the line changes. I tried the 2nd derivative, but it was noisy and I couldn't find the right spot.
Another way is to try all possible split points, calculate the left and right regression lines, and find the pair with the best fit (product of the r² coefficients). Give this code a try. It is not complete: I do not know how to force the two regression lines to go through the point in the middle (one idea is sketched after the code below). It might also be better to work with interpolated data if there are not enough data points.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
vo2 = [1.673925,1.9015125,1.981775,2.112875,2.1112625,2.086375,2.13475,2.1777,2.176975,2.1857125,2.258925,2.2718375,2.3381,2.3330875,2.353725,2.4879625,2.448275,2.4829875,2.5084375,2.511275,2.5511,2.5678375,2.5844625,2.6101875,2.6457375,2.6602125,2.6939875,2.7210625,2.720475,2.767025,2.751375,2.7771875,2.776025,2.7319875,2.564,2.3977625,2.4459125,2.42965,2.401275,2.387175,2.3544375]
ve = [ 3.93125,7.1975,9.04375,14.06125,14.11875,13.24375,14.6625,15.3625,15.2,15.035,17.7625,17.955,19.2675,19.875,21.1575,22.9825,23.75625,23.30875,25.9925,25.6775,27.33875,27.7775,27.9625,29.35,31.86125,32.2425,33.7575,34.69125,36.20125,38.6325,39.4425,42.085,45.17,47.18,42.295,37.5125,38.84375,37.4775,34.20375,33.18,32.67708333]
x = np.array(vo2)
y = np.array(ve)
sort_idx = x.argsort()
x = x[sort_idx]
y = y[sort_idx]
assert len(x) == len(y)
def fit(x, y):
    p = np.polyfit(x, y, 1)
    f = np.poly1d(p)
    r2 = r2_score(y, f(x))
    return p, f, r2

skip = 5  # minimal length of split data
r2 = [0] * len(x)
funcs = {}
for i in range(len(x)):
    if i < skip or i > len(x) - skip:
        continue
    _, f_left, r2_left = fit(x[:i], y[:i])
    _, f_right, r2_right = fit(x[i:], y[i:])
    r2[i] = r2_left * r2_right
    funcs[i] = (f_left, f_right)
split_ix = np.argmax(r2) # index of split
f_left,f_right = funcs[split_ix]
print(f"split point index: {split_ix}, x: {x[split_ix]}, y: {y[split_ix]}")
xd = np.linspace(min(x), max(x), 100)
plt.plot(x, y, "o")
plt.plot(xd, f_left(xd))
plt.plot(xd, f_right(xd))
plt.plot(x[split_ix], y[split_ix], "x")
plt.show()
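As a follow-up on forcing both lines through a common point: one option (a minimal sketch, not part of the code above; the piecewise_linear parametrization with x0, y0, k1, k2 is my own choice) is to fit a continuous piecewise-linear model with scipy.optimize.curve_fit, reusing the sorted x and y arrays from the script:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def piecewise_linear(x, x0, y0, k1, k2):
    # two lines with slopes k1 and k2 that are forced to meet at (x0, y0)
    return np.piecewise(x, [x < x0],
                        [lambda x: y0 + k1 * (x - x0),
                         lambda x: y0 + k2 * (x - x0)])

# x and y are the sorted arrays from the script above
p0 = [np.median(x), np.median(y), 1.0, 1.0]   # rough initial guess for the breakpoint
params, _ = curve_fit(piecewise_linear, x, y, p0=p0)

xd = np.linspace(x.min(), x.max(), 100)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *params))
plt.show()
Because the breakpoint (x0, y0) is itself a fitted parameter, both segments are guaranteed to meet there.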
I'm learning OpenCV and am looking at convertScaleAbs to transform the original values to the range [0, 255], quite similar to what normalize does in NORM_MINMAX mode.
As far as I understand, values are transformed according to y = a*x + b, then the resulting values are clipped and converted to uint8. If this is correct, then selecting a and b this way:
a = (255.0 - 0) / (x_max - x_min)
b = -x_min * a
should linearly transform the original values to [0, 255], and the final clipping step should not change the values (only the type). However, I cannot obtain this result with the a and b values above. I create random original values, then show the result of normalize (the expected output), then the result of convertScaleAbs (wrong: everything is converted to 255):
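For a quick sanity check of that formula (a minimal sketch with made-up values, not my actual data):
x_min, x_max = 12.0, 87.0
a = 255.0 / (x_max - x_min)
b = -x_min * a
# a*x + b maps x_min -> 0 and x_max -> 255, so clipping should change nothing
print(a * x_min + b, a * x_max + b)   # 0.0 255.0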
Here is my code:
import numpy as np
import random as rnd
import matplotlib.pyplot as plt
import cv2
x_values = range(100)
y_values = [None]*3
# Original values
a, b = rnd.randint(0, 10), rnd.randint(0, 10000)
y_values[0] = np.array([a*i+b for i in x_values])
np.random.shuffle(y_values[0])
# Transformed values, first method
y_values[1] = np.zeros(y_values[0].shape)
y_values[1] = cv2.normalize(
y_values[0], y_values[1],
0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
# Transformed values, alternate method
ymin, ymax = y_values[0].min(), y_values[0].max()
a = (255.0 - 0) / (ymax - ymin)
b = -ymin * a
y_values[2] = cv2.convertScaleAbs(y_values[0], a, b)
# Check visually
titles = ['original', 'normalize', 'convertScaleAbs']  # assumed; not defined in the snippet as posted
fig, ax = plt.subplots(3, 1, figsize=(6, 4), sharex=True)
for i, values in enumerate(y_values):
    ax[i].set_ylim(y_values[i].min(), y_values[i].max())
    ax[i].tick_params(axis='both', which='major', labelsize=8)
    ax[i].set_title(titles[i], fontsize=8)
    ax[i].grid(axis='both', ls=':')
    ax[i].scatter(x_values, y_values[i], marker='.', s=1)
fig.tight_layout()
plt.locator_params(axis='y', nbins=5)
plt.ioff()
plt.show()
According to the documentation of convertScaleAbs, the second positional argument is the destination matrix (y_values[2] here), so in the original call a and b were not being passed as alpha and beta at all.
After changing to
y_values[2] = cv2.convertScaleAbs(y_values[0], y_values[2], a, b)
, it seems fine now:
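A related note (my addition, not part of the original fix): since the full signature is convertScaleAbs(src[, dst[, alpha[, beta]]]), passing the scale and offset as keyword arguments sidesteps the positional mix-up entirely. A minimal sketch:
import numpy as np
import cv2

src = np.random.uniform(0, 1e4, size=100)
a = 255.0 / (src.max() - src.min())
b = -src.min() * a
# alpha/beta as keywords: nothing can be mistaken for the dst argument
dst = cv2.convertScaleAbs(src, alpha=a, beta=b)
print(dst.min(), dst.max())   # expected to be roughly 0 and 255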
I am trying to fit a smoothing B-spline to some data, and I found this very helpful post here. However, I need not only the spline but also its derivatives, so I tried adding the following code to the example:
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
For some reason this does not seem to work due to some data type issues. I get the following traceback:
Traceback (most recent call last):
File "interpolate_point_trace.py", line 31, in spline_example
tck_der = interpolate.splder(tck, n=1)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/fitpack.py", line 657, in splder
return _impl.splder(tck, n)
File "/home/user/anaconda3/lib/python3.7/site-packages/scipy/interpolate/_fitpack_impl.py", line 1206, in splder
sh = (slice(None),) + ((None,)*len(c.shape[1:]))
AttributeError: 'list' object has no attribute 'shape'
The reason seems to be that the second element of the tck tuple contains a list of numpy arrays. I thought turning the input data into a numpy array as well would help, but it does not change the data types inside tck.
Does this behavior reflect an error in scipy, or is the input malformed?
I tried manually turning the list into an array:
tck[1] = np.array(tck[1])
but this (which didn't surprise me) also gave an error:
ValueError: operands could not be broadcast together with shapes (0,8) (7,1)
Any ideas what the problem could be? I have used scipy before, and on 1D splines the splder function works just fine, so I assume it has something to do with the spline being a curve in 3D.
------- edit --------
Here is a minimum working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from mpl_toolkits.mplot3d import Axes3D
total_rad = 10
z_factor = 3
noise = 0.1
num_true_pts = 200
s_true = np.linspace(0, total_rad, num_true_pts)
x_true = np.cos(s_true)
y_true = np.sin(s_true)
z_true = s_true / z_factor
num_sample_pts = 80
s_sample = np.linspace(0, total_rad, num_sample_pts)
x_sample = np.cos(s_sample) + noise * np.random.randn(num_sample_pts)
y_sample = np.sin(s_sample) + noise * np.random.randn(num_sample_pts)
z_sample = s_sample / z_factor + noise * np.random.randn(num_sample_pts)
tck, u = interpolate.splprep([x_sample, y_sample, z_sample], s=2)
x_knots, y_knots, z_knots = interpolate.splev(tck[0], tck)
u_fine = np.linspace(0, 1, num_true_pts)
x_fine, y_fine, z_fine = interpolate.splev(u_fine, tck)
# this is the part of the code I inserted: the line under this causes the crash
tck_der = interpolate.splder(tck, n=1)
x_der, y_der, z_der = interpolate.splev(u_fine, tck_der)
# end of the inserted code
fig2 = plt.figure(2)
ax3d = fig2.add_subplot(111, projection='3d')
ax3d.plot(x_true, y_true, z_true, 'b')
ax3d.plot(x_sample, y_sample, z_sample, 'r*')
ax3d.plot(x_knots, y_knots, z_knots, 'go')
ax3d.plot(x_fine, y_fine, z_fine, 'g')
fig2.show()
plt.show()
Stumbled into the same problem...
I circumvented the error by not using interpolate.splder(tck, n=1) at all and instead calling interpolate.splev(spline_ev, tck, der=1), which returns the derivatives at the points spline_ev (see the SciPy docs).
If you need the derivative itself as a spline, I think you can then use interpolate.splprep() again on those values.
In total something like:
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
spline_ev = np.linspace(0.0, 1.0, 100, endpoint=True)
spline_points = interpolate.splev(spline_ev, tck)
# Calculate derivative
spline_der_points = interpolate.splev(spline_ev, tck, der=1)
spline_der = interpolate.splprep(spline_der_points, s=0, k=3, full_output=True)  # splev returns a list of per-dimension arrays, which is what splprep expects
# Plot the data and derivative
fig = plt.figure()
plt.plot(points[:,0], points[:,1], '.-', label="points")
plt.plot(spline_points[0], spline_points[1], '.-', label="tck")
plt.plot(spline_der_points[0], spline_der_points[1], '.-', label="tck_der")
# Show tangent
plt.arrow(spline_points[0][23]-spline_der_points[0][23], spline_points[1][23]-spline_der_points[1][23], 2.0*spline_der_points[0][23], 2.0*spline_der_points[1][23])
plt.legend()
plt.show()
EDIT:
I also opened an issue on GitHub, and according to ev-br the use of interpolate.splprep is deprecated and one should use make_interp_spline / BSpline instead.
As noted in other answers, splprep output is incompatible with splder, but is compatible with splev. And the latter can evaluate the derivatives.
However, for interpolation, there is an alternative approach, which avoids splprep altogether. I'm basically copying a reply on the SciPy issue tracker (https://github.com/scipy/scipy/issues/10389):
Here's an example of replicating the splprep outputs. First let's make sense out of the splprep output:
# start with the OP example
import numpy as np
from scipy import interpolate
points = np.random.rand(10,2) * 10
(tck, u), fp, ier, msg = interpolate.splprep(points.T, s=0, k=3, full_output=True)
# check the meaning of the `u` array: evaluation of the spline at `u`
# gives back the original points (up to a list/transpose)
xy = interpolate.splev(u, tck)
xy = np.asarray(xy)
np.allclose(xy.T, points)
Next, let's replicate it without splprep. First, build the u array: the curve is represented parametrically, and u is essentially an approximation for the arc length. Other parametrizations are possible, but here let's stick to what splprep does. Translating the pseudocode from the doc page, https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.splprep.html
vv = np.sum((points[1:, :] - points[:-1, :])**2, axis=1)
vv = np.sqrt(vv).cumsum()
vv /= vv[-1]
vv = np.r_[0, vv]
# check:
np.allclose(u, vv)
Now, interpolate along the parametric curve: points vs vv:
spl = interpolate.make_interp_spline(vv, points)
# check spl.t vs knots from splprep
spl.t - tck[0]
The result, spl, is a BSpline object which you can evaluate, differentiate etc in a usual way:
np.allclose(points, spl(vv))
# differentiate: derivative() returns a new BSpline, which can then be evaluated at vv
spl_der = spl.derivative()
spl_der(vv)
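Applied to the 3D curve from the question, the same recipe might look like the sketch below. Note this is plain interpolation (no smoothing, unlike splprep with s=2), and the variable names are my own:
import numpy as np
from scipy import interpolate

# noisy 3D helix samples, mirroring the question's minimal example
s = np.linspace(0, 10, 80)
pts = np.c_[np.cos(s), np.sin(s), s / 3.0] + 0.1 * np.random.randn(80, 3)

# chord-length parametrization, the same idea splprep uses for u
d = np.sqrt(((pts[1:] - pts[:-1]) ** 2).sum(axis=1))
u = np.r_[0.0, np.cumsum(d) / d.sum()]

spl = interpolate.make_interp_spline(u, pts, k=3)   # BSpline through the sample points
der = spl.derivative()                              # BSpline of the first derivative

u_fine = np.linspace(0, 1, 200)
xyz = spl(u_fine)     # curve points, shape (200, 3)
dxyz = der(u_fine)    # tangent vectors, shape (200, 3)
The derivative is again a BSpline, so higher orders are available via spl.derivative(2) and so on.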
I'm new to Python and am trying to plot a Gaussian distribution with the density function defined as
I plotted the normal distribution P(x, y) and it gives the correct output. The code and output are below.
Code :
Output :
Now I need to plot a conditional distribution; the output should look like the attached plot. To do this I need to define a boundary condition for the equation. I tried to define a boundary condition, but it is not working and gives the wrong output.
Please help me plot this.
Thanks,
You applied the boundary condition to the wrong parameter; try applying it after creating the grid points.
R = np.arange(-4, 4, 0.1)
X, Y = np.meshgrid(R, R)
then validate X and Y based on the condition
valid_xy = np.sqrt(X**2+Y**2) >= 1
X = X[valid_xy]
Y = Y[valid_xy]
Then continue with the rest of the code.
Update
If you want just to reset values around the peak to zero, you can use the following code:
import numpy as np
import matplotlib.pyplot as plt
R = np.arange(-4, 4, 0.1)
X, Y = np.meshgrid(R, R)
Z = np.sum(np.exp(-0.5*(X**2+Y**2)))
P = (1/Z)*np.exp(-0.5*(X**2+Y**2))
# reset the peak
invalid_xy = (X**2+Y**2)<1
P[invalid_xy] = 0
# plot the result
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, P, s=0.5, alpha=0.5)
plt.show()
You can't use np.meshgrid anymore because it will output a matrix where the coordinates of X and Y form a grid (hence its name) and not a custom shape (a grid minus a disc like you want):
However you can create your custom grid the following way:
R = np.arange(-4, 4, 0.1)
xy_coord = np.array([(x, y) for x in R for y in R if x*x + y*y > 1])
X,Y = xy_coord.transpose()
X
# array([ 0. , 0. , 0. , ..., 3.9, 3.9, 3.9])
Y
# array([ 1.1, 1.2, 1.3, ..., 3.7, 3.8, 3.9])
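To finish the thought (my continuation, not part of the original answer): with X and Y as flat coordinate arrays you evaluate P pointwise and use a 3D scatter instead of a surface plot, for example:
import numpy as np
import matplotlib.pyplot as plt

R = np.arange(-4, 4, 0.1)
xy_coord = np.array([(x, y) for x in R for y in R if x * x + y * y > 1])
X, Y = xy_coord.T

# unnormalized Gaussian evaluated only on the kept coordinates,
# then normalized over the remaining points, as in the earlier answer
P = np.exp(-0.5 * (X ** 2 + Y ** 2))
P /= P.sum()

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, P, s=0.5, alpha=0.5)
plt.show()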
I know there are several similar questions, such as Use of curve_fit to fit data, but the answer there (a specific float type) does not seem to work for me. I am wondering if anyone can explain why the following code (adapted from an answer -> https://stackoverflow.com/a/11507723/1093485) does not work.
sample code:
#! /usr/bin/env python
import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def gaussFunction(x, A, mu, sigma):
return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))
x_points = [4245.428, 4245.4378, 4245.4477, 4245.4575, 4245.4673, 4245.4772, 4245.487, 4245.4968, 4245.5066, 4245.5165, 4245.5263, 4245.5361, 4245.546, 4245.5558, 4245.5656, 4245.5755, 4245.5853, 4245.5951, 4245.605, 4245.6148, 4245.6246, 4245.6345, 4245.6443, 4245.6541, 4245.6639, 4245.6738, 4245.6836, 4245.6934, 4245.7033, 4245.7131, 4245.7229, 4245.7328, 4245.7426, 4245.7524, 4245.7623, 4245.7721, 4245.7819, 4245.7918, 4245.8016, 4245.8114, 4245.8213, 4245.8311, 4245.8409, 4245.8508, 4245.8606, 4245.8704, 4245.8803, 4245.8901, 4245.8999, 4245.9097, 4245.9196, 4245.9294, 4245.9392, 4245.9491, 4245.9589, 4245.9687, 4245.9786, 4245.9884, 4245.9982, 4246.0081, 4246.0179, 4246.0277, 4246.0376, 4246.0474, 4246.0572, 4246.0671, 4246.0769, 4246.0867, 4246.0966, 4246.1064, 4246.1162, 4246.1261, 4246.1359, 4246.1457, 4246.1556, 4246.1654, 4246.1752, 4246.1851, 4246.1949, 4246.2047, 4246.2146]
y_points = [978845.0, 1165115.0, 1255368.0, 1253901.0, 1199857.0, 1134135.0, 1065403.0, 977347.0, 866444.0, 759457.0, 693284.0, 679772.0, 696896.0, 706494.0, 668272.0, 555221.0, 374547.0, 189968.0, 161754.0, 216483.0, 181937.0, 73967.0, 146627.0, 263495.0, 284992.0, 240291.0, 327541.0, 555690.0, 758847.0, 848035.0, 800159.0, 645769.0, 444412.0, 249627.0, 125078.0, 254856.0, 498501.0, 757049.0, 977861.0, 1125316.0, 1202892.0, 1263220.0, 1366361.0, 1497071.0, 1559804.0, 1464012.0, 1196896.0, 848736.0, 584363.0, 478640.0, 392943.0, 312466.0, 355540.0, 320666.0, 114711.0, 690948.0, 1409389.0, 1825921.0, 1636486.0, 1730980.0, 4081179.0, 7754166.0, 11747734.0, 15158681.0, 17197366.0, 17388832.0, 15710571.0, 12593935.0, 8784968.0, 5115720.0, 2277684.0, 769734.0, 674437.0, 647250.0, 708156.0, 882759.0, 833756.0, 504655.0, 317790.0, 711106.0, 1011705.0]
# Try to fit the gaussian
trialX = numpy.linspace(x_points[0],x_points[-1],1000)
coeff, var_matrix = curve_fit(gaussFunction, x_points, y_points)
yEXP = gaussFunction(trialX, *coeff)
# Plot the data (just for visualization)
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x_points, y_points, 'b*')
plt.plot(trialX,yEXP, '--')
plt.show()
The output that it generates:
Without starting parameters, they all default to 1. Putting in a reasonable p0, it works:
p0 = [numpy.max(y_points), x_points[numpy.argmax(y_points)], 0.1]
coeff, var_matrix = curve_fit(gaussFunction, x_points, y_points, p0)
Leads to:
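If hard-coding sigma = 0.1 feels fragile, a moment-based guess computed from the data itself is another option (my suggestion, reusing the question's imports, arrays and gaussFunction):
# moment-based starting values, reusing x_points, y_points and gaussFunction from the question
x_arr = numpy.array(x_points)
y_arr = numpy.array(y_points)

mu0 = numpy.sum(x_arr * y_arr) / numpy.sum(y_arr)                              # intensity-weighted mean
sigma0 = numpy.sqrt(numpy.sum(y_arr * (x_arr - mu0) ** 2) / numpy.sum(y_arr))  # weighted spread
p0 = [y_arr.max(), mu0, sigma0]
coeff, var_matrix = curve_fit(gaussFunction, x_points, y_points, p0=p0)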