I'm trying to draw a curve for regression fitting. The curve is for a higher-degree polynomial (degree 6 and above).
import numpy as np
from matplotlib.pyplot import figure, show

fig = figure()
ax1 = fig.add_axes((0.1, 0.2, 0.8, 0.7))
ax1.set_title("Training data (blue) and fitted curve (red)")
ax1.set_xlabel('X-axis')
ax1.set_ylabel('Y-axis')
# x_train, y_train hold the training data; best_coef holds the fitted polynomial coefficients
ax1.plot(x_train, y_train, '.', x_train, np.polyval(best_coef, x_train), '-r')
show()
This is the output of the given code
I want it to be a smooth curve.
Something like this, with a continuous red line instead of discrete red points.
I think you just need to sort x_train before plotting the fit results:
ax1.plot(x_train, y_train, '.', np.sort(x_train), np.polyval(best_coef, np.sort(x_train)), '-r')
In the plot you included, the x_train values (and therefore also the fitted values) appear to be in random order; the plot routine does not connect the nearest points, it connects consecutive points in the arrays, so the red line jumps back and forth.
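For illustration, a minimal standalone sketch of the same idea; the data here is synthetic (I don't have your x_train/y_train), the point being that the polynomial is evaluated on a sorted copy of x so the red line is drawn left to right:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 40)                       # deliberately unsorted x values
y_train = np.polyval([2, 0, -1, 0.5], x_train) + rng.normal(0, 0.05, 40)
best_coef = np.polyfit(x_train, y_train, 6)            # degree-6 fit, as in the question

xs = np.sort(x_train)                                  # sort once and reuse for the red line
plt.plot(x_train, y_train, '.', xs, np.polyval(best_coef, xs), '-r')
plt.show()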
I'm trying to create a contour map (plot) based on the probability of a condition being met. So far I have three columns of equal length: soil saturation (X), storm intensity (Y), and the probability of the condition being met (Z, in the range 0-1).
My X and Y data have different numbers of unique values, such that each X value has a number of corresponding Y values from which to generate points in a typical scatterplot. I'd like to have contours of probability at each location (X, Y), where hopefully I can illustrate each contour as a decision boundary: above that probability value (e.g. 0.4, 0.5, 0.6, etc.) the decision for a 1 is made.
I've tried a number of options so far, and most of them start with np.meshgrid. I'll post some sample plots below. Is there any way I can clean this up to just be a single line for each probability boundary?
Here is an example scatterplot I want to make contours from:
import matplotlib.pyplot as plt

plt.figure(figsize=(18, 5))
# color each (saturation, storm depth) point by the predicted probability of class 1
plt.scatter(sat_depth_test.loc[:, 'soil sat'], sat_depth_test.loc[:, 'storm depth'],
            c=probas[:, 1], cmap='YlOrRd')
plt.colorbar()
plt.title('Prediction Boundary for Ksat = 0.00001, all other Base Case Parameterizations')
plt.xlabel('Antecedent Soil Saturation')
plt.ylabel('Storm Depth (mm)')
plt.yscale('log')
plt.ylim([5, 150])
scatterplot
I've also done something similar, but not with single lines showing the decision boundary:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri

# triangulate the scattered (saturation, storm depth) points and interpolate the probability
triang = tri.Triangulation(sat, storm)
interpolator = tri.LinearTriInterpolator(triang, probdf['Prob'])
X, Y = np.meshgrid(sat, storm)
Z = interpolator(X, Y)

plt.figure(figsize=(15, 5))
plt.contour(X, Y, Z)
plt.colorbar()
plt.title('Prediction Boundary for Ksat = 0.00001, all other Base Case Parameterizations')
plt.xlabel('Antecedent Soil Saturation')
plt.ylabel('Storm Depth (mm)')
plt.yscale('log')
plt.ylim([5, 150])
contour plot
I'm just feeling a little stuck here. Ideally, I'd like to have single lines showing contours for each 0.05 or 0.10 probability interval in place of a scatterplot.
It would be similar to this, but still cleaner:
cleaner contour
Ideally, I would like something like this: (borrowed from https://jakevdp.github.io/PythonDataScienceHandbook/04.04-density-and-contour-plots.html)
ideal plot
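To make the target concrete, here is a minimal sketch of roughly what I'm after, using plt.tricontour with explicit levels; the synthetic arrays below only stand in for my actual soil saturation, storm depth, and probability columns:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sat = rng.uniform(0, 1, 500)                             # stand-in for antecedent soil saturation
storm = rng.uniform(5, 150, 500)                         # stand-in for storm depth (mm)
prob = 1 / (1 + np.exp(-(5 * sat + 0.05 * storm - 6)))   # stand-in probability of the condition

plt.figure(figsize=(18, 5))
# one contour line per probability level, drawn directly from the scattered points
plt.tricontour(sat, storm, prob, levels=np.arange(0.1, 1.0, 0.1), cmap='YlOrRd')
plt.colorbar()
plt.yscale('log')
plt.ylim([5, 150])
plt.xlabel('Antecedent Soil Saturation')
plt.ylabel('Storm Depth (mm)')
plt.show()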
I also apologize for not putting the actual plots on here. This is my first question on SO, and I'm having trouble finding out how to include them.
Thank you!
I'm having trouble using the scipy interpolation methods to generate a nice smooth curve from the given data points. I've tried the standard 1D interpolation and Rbf interpolation with all of its options (cubic, gaussian, multiquadric, etc.).
In the image provided, the blue line is the original data. I'm looking to first smooth the sharp edges, and then have dynamically editable points from which to recalculate the curve. Each time a single point is edited, a new spline of some sort should be calculated automatically so that the curve transitions smoothly between each point.
It kind of works when the points are within a particular range of each other as below.
But if the points end up too far apart, or too close together, I end up with issues like the following.
Key points are:
The curve MUST be flat between the first two points
The curve must NOT go below point 1 or 2 (i.e. derivative can't be negative)
~15 points (not shown) between points 2 and 3 are also editable and the line between is not necessarily linear. Full control over each of these points is a must, as is the curve going through each of them.
I'm happy to break it down into smaller curves that I then join/convolve, but I just need to ensure a > 0 gradient.
sample data:
x = [0, 37, 50, 105, 115, 120]
y = [0.00965, 0.00965, 0.047850827205882, 0.35600416666667, 0.38074375, 0.38074375]
As an example, try moving point 2 (x=37) to an extreme value, say 10 (keep y the same). Just ensure that all points from x=0 to x=10 (or any other variation) have identical y values of 0.00965.
Any assistance is greatly appreciated.
UPDATE
Attempted the pchip method suggested in the comments, with the results below:
pchip method, better and worse...
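The attempt was roughly along these lines (a minimal sketch using scipy.interpolate.PchipInterpolator on the sample data above; my actual plotting code differs):

import numpy as np
from scipy.interpolate import PchipInterpolator
import matplotlib.pyplot as plt

x = [0, 37, 50, 105, 115, 120]
y = [0.00965, 0.00965, 0.047850827205882, 0.35600416666667, 0.38074375, 0.38074375]

pchip = PchipInterpolator(x, y)      # monotone cubic interpolant: the derivative never goes negative here
xf = np.linspace(x[0], x[-1], 500)
plt.plot(x, y, 'o', xf, pchip(xf), '-')
plt.show()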
Solved!
While I'm not sure that this is exactly true, it is as if the spline tools for creating Bezier curves treat the control points as points the calculated curve must pass through, which is not true in my case. I couldn't figure out how to turn this behavior off, so I found the cubic formula for a Bezier curve (cubic is what I need) and calculated my own points. I then only had to make a small adjustment so the points fit the required integer x values; in my case, near enough is good enough. Otherwise I would have needed to interpolate linearly between the two points either side of the desired x value to determine the exact value.
For those interested, a cubic Bezier needs 4 points: start, end, and 2 control points. The rule is:
B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
Calculate x and y separately, using a list of values for t. If you need to match gradients, just make sure that the control points P1 and P2 are only moved along the same gradient as the preceding/following sections.
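A minimal sketch of the calculation (the P1 and P2 control points below are illustrative placeholders, not my actual values):

import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=200):
    # evaluate B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t)**3 * p0 + 3 * (1 - t)**2 * t * p1
            + 3 * (1 - t) * t**2 * p2 + t**3 * p3)

p0 = np.array([37.0, 0.00965])              # start: point 2 of the sample data
p3 = np.array([105.0, 0.35600416666667])    # end: point 4 of the sample data
p1 = np.array([60.0, 0.00965])              # illustrative control point
p2 = np.array([90.0, 0.30])                 # illustrative control point
curve = cubic_bezier(p0, p1, p2, p3)        # columns: x(t), y(t)

# snap to integer x values by linear interpolation (x(t) is monotonic for these control points)
xi = np.arange(int(curve[0, 0]), int(curve[-1, 0]) + 1)
yi = np.interp(xi, curve[:, 0], curve[:, 1])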
Perfect result
I am plotting both a distribution of test scores and a fitted curve to these test scores:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

h = sorted(data['Baseline'])                      # sorted test scores
fit = stats.norm.pdf(h, np.mean(h), np.std(h))    # normal pdf evaluated at the observations
plt.plot(h, fit, '-o')
plt.hist(h, normed=True)                          # draw a normalized histogram of the data
plt.show()
The plot of the pdf, however, does not look normal (see kink in curve near x=60). See output:
I'm not sure what is going on here... any help appreciated. Is this because the normal curve is being drawn between the supplied observations? I can provide the actual data if needed; there are only 60 observations.
Yes, you evaluate the normal pdf at the observations themselves, so the curve is only drawn at those x positions. You would instead want to evaluate it on a dense, evenly spaced grid of x values, like
h = np.sort(data['Baseline'])                     # sorted scores as a NumPy array
x = np.linspace(h.min(), h.max(), 151)            # dense, evenly spaced grid over the data range
fit = stats.norm.pdf(x, np.mean(h), np.std(h))    # pdf evaluated on the grid, not the observations
plt.plot(x, fit, '-')
plt.hist(h, density=True)                         # density=True replaces the deprecated normed=True
plt.show()
Note, however, that the data does not look normally distributed at all. So you might rather fit a different distribution, or perhaps perform a kernel density estimate.
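If a kernel density estimate is of interest, a minimal sketch using scipy.stats.gaussian_kde (assuming data['Baseline'] is the same score column as above):

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

h = np.sort(data['Baseline'])              # the observed scores
kde = stats.gaussian_kde(h)                # bandwidth chosen automatically (Scott's rule)
x = np.linspace(h.min(), h.max(), 200)
plt.hist(h, density=True, alpha=0.5)
plt.plot(x, kde(x), '-')
plt.show()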
I have a set of 3d coordinates (x,y,z) to which I would like to fit a space curve. Does anyone know of existing routines for this in Python?
From what I have found (https://docs.scipy.org/doc/scipy/reference/interpolate.html), there are existing modules for fitting a curve to a set of 2d coordinates, and others for fitting a surface to a set of 3d coordinates. I want the middle path - fitting a curve to a set of 3d coordinates.
EDIT --
I found an explicit answer to this in another post here, using interpolate.splprep() and interpolate.splev(). Here are my data points:
import numpy as np
data = np.array([[21.735556483642707, 7.9999120559310359, -0.7043281314370935],
[21.009401429607784, 8.0101161320825103, -0.16388503829177037],
[20.199370045383134, 8.0361339131845497, 0.25664085801558179],
[19.318149385194054, 8.0540100864979447, 0.50434139043379278],
[18.405497793567243, 8.0621753888918484, 0.57169888018720161],
[17.952649703401562, 8.8413995204241491, 0.39316793526155014],
[17.539007529982641, 9.6245700151356104, 0.14326173861202204],
[17.100154581079089, 10.416295524018977, 0.011339000091976647],
[16.645143439968102, 11.208477191735446, 0.070252116425261066],
[16.198247656768263, 11.967005154933993, 0.31087815045809558],
[16.661378578010989, 12.717314230004659, 0.54140549139204996],
[17.126106263351478, 13.503461982612732, 0.57743407626794219],
[17.564249250974573, 14.28890107482801, 0.42307198199366186],
[17.968265052275274, 15.031985807202176, 0.10156997950061938]])
Here is my code:
from scipy import interpolate
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

data = data.transpose()

# now we get all the knots and info about the interpolated spline
tck, u = interpolate.splprep(data, k=5)

# here we generate the new interpolated dataset;
# increase the resolution by increasing the number of samples, 500 in this example
new = interpolate.splev(np.linspace(0, 1, 500), tck, der=0)

# now let's plot it!
fig = plt.figure()
ax = Axes3D(fig)
ax.plot(data[0], data[1], data[2], label='original points', lw=2, c='Dodgerblue')
ax.plot(new[0], new[1], new[2], label='fit', lw=2, c='red')
ax.legend()
plt.savefig('junk.png')
plt.show()
This is the image:
You can see that the fit is not good, even though I am already using the maximum allowed spline degree (k=5). Is this because the curve is not fully convex? Does anyone know how I can improve the fit?
It depends on what the points represent, but if it's just position data, you could use a Kalman filter such as this one written in Python. You could then query the Kalman filter at any time to get the "expected point" at that time, so it would work just like a function of time.
If you do plan to use a Kalman filter, just set the initial estimate to your first coordinate and set the covariance to a diagonal matrix of huge numbers. This indicates that you are very uncertain about the position of your next point, which will quickly lock the filter onto your coordinates.
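A minimal hand-rolled sketch of that setup, just to illustrate the initialization described above (a constant-position model with one update per (x, y, z) observation; this is not the linked implementation):

import numpy as np

def kalman_smooth(points, process_var=1e-4, meas_var=1e-2):
    # points: (N, 3) array of (x, y, z) observations
    x = points[0].astype(float).copy()     # initial estimate = the first coordinate
    P = np.eye(3) * 1e6                    # huge initial covariance: very uncertain to start with
    Q = np.eye(3) * process_var            # process noise
    R = np.eye(3) * meas_var               # measurement noise
    smoothed = []
    for z in points:
        P = P + Q                          # predict step (state assumed roughly constant)
        K = P @ np.linalg.inv(P + R)       # Kalman gain
        x = x + K @ (z - x)                # update with the new observation
        P = (np.eye(3) - K) @ P
        smoothed.append(x.copy())
    return np.array(smoothed)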
You'd want to stay away from spline fitting methods, because splines will always go through your data.
You can fit a curve to data of any dimension. The curve fitting / optimization algorithms (say, in scipy.optimize) all treat the observations you want to model as a plain 1-d array and do not care what the independent variables are. If you flatten your 3d data, each value will correspond to an (x, y, z) tuple. You can pass that information along as "extra" data to your fitting routine to help you calculate the model curve that will be fitted to your data.
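A minimal sketch of that idea (my own illustration, not code from the original post): parameterize the curve by a variable t, fit a low-order polynomial to each coordinate, and flatten the residuals into the 1-d array that scipy.optimize.least_squares expects. Here data is assumed to be the original (N, 3) array from the question, before the transpose:

import numpy as np
from scipy.optimize import least_squares

def residuals(coeffs, t, pts, degree):
    # coeffs packs three sets of polynomial coefficients, one per coordinate;
    # the residual is the flattened difference between the model and the observations
    c = coeffs.reshape(3, degree + 1)
    model = np.stack([np.polyval(c[i], t) for i in range(3)], axis=1)
    return (model - pts).ravel()

degree = 3
t = np.linspace(0.0, 1.0, len(data))            # curve parameter, one value per point
fit = least_squares(residuals, np.zeros(3 * (degree + 1)), args=(t, data, degree))
coeffs = fit.x.reshape(3, degree + 1)

# evaluate the fitted space curve on a fine grid of t
tf = np.linspace(0.0, 1.0, 500)
curve = np.stack([np.polyval(coeffs[i], tf) for i in range(3)], axis=1)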
I have plotted some experimental data in Python and need to find a cubic fit to it. The reason is that the cubic fit will be used to remove the background (in this case, the resistance of a diode), leaving only the evident features. Here is the code I am currently using to make the cubic fit in the first place, where Vnew and yone are arrays of the experimental data.
from numpy import array
from scipy.optimize import curve_fit
from matplotlib.pyplot import plot, legend

answer1 = raw_input('Cubic Plot attempt?\n ')
if answer1 in ['y', 'Y', 'Yes']:
    def cubic(x, A):
        return A * x**3
    cubic_guess = array([40])
    popt, pcov = curve_fit(cubic, Vnew, yone, cubic_guess)
    plot(Vnew, cubic(Vnew, *popt), 'r-', label='Cubic Fit: curve_fit')
    #ylim(-0.05,0.05)
    legend(loc='best')
    print 'Cubic plotted'
else:
    print 'No Cubic Removal done'
I have knowledge of curve smoothing but only in theory. I do not know how to implement it. I would really appreciate any assistance.
Here is the graph generated so far:
To make the fitted curve "wider", you're looking for extrapolation. In this case, though, you could just make Vnew cover a larger interval, in which case you'd put this before your plot command:
Vnew = numpy.linspace(-1, 1, 256)  # min and max are merely an example, based on your graph
plot(Vnew, cubic(Vnew, *popt), 'r-', label='Cubic Fit: curve_fit')
"Blanking out" the feature you see, can be done with numpy's masked arrays but also just by removing those elements you don't want from both your original Vnew (which I'll call xone) and yone:
mask = (xone > 0.1) & (xone < 0.35) # values between these voltages (?) need to be removed
xone = xone[numpy.logical_not(mask)]
yone = yone[numpy.logical_not(mask)]
Then redo the curve fitting:
popt,_ = curve_fit(cubic, xone, yone, cubic_guess)
This will have fitted only to the data that was actually there (which aren't that many points in your dataset, from the looks of it, so beware!).
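Putting these pieces together, a minimal sketch of the full background-removal step (xone/yone as above; the mask range is only an example, and the final subtraction of the fitted cubic is my addition, since that's the stated goal):

import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def cubic(x, A):
    return A * x**3

# drop the feature region so it does not bias the background fit
mask = (xone > 0.1) & (xone < 0.35)
x_bg = xone[numpy.logical_not(mask)]
y_bg = yone[numpy.logical_not(mask)]

popt, _ = curve_fit(cubic, x_bg, y_bg, p0=[40])

# subtract the fitted background from the full dataset to expose the feature
residual = yone - cubic(xone, *popt)
plt.plot(xone, residual, '.')
plt.show()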