Plotting Kernel Ridge Regression results in weird lines - python

I'm trying to run kernel ridge regression on a simple artificial dataset. When I run the code, I get two plots. The first, for the linear regression fit, looks normal; however, the kernel one is very erratic. Is this expected behavior, or am I not calling the functions properly?
(Images from the two plt.show() calls omitted: the first shows a clean linear fit, the second a jumbled tangle of line segments.)
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
w = 5
x = np.random.randn(10, 1)                 # training inputs
x_to_draw_line = np.random.randn(1000, 1)  # unsorted points at which to draw the fitted curve
y = w * x                                  # noiseless linear target
lr = LinearRegression()
lr.fit(x, y)
lr_preds = lr.predict(x_to_draw_line)
plt.figure()
plt.plot(x_to_draw_line, lr_preds, color="C1")
plt.scatter(x, y, color="C0")
plt.show()
krr = KernelRidge(kernel="polynomial")
krr.fit(x, y)
krr_preds = krr.predict(x_to_draw_line)
plt.figure()
plt.plot(x_to_draw_line, krr_preds, color="C1")
plt.scatter(x, y, color="C0")
plt.show()

The line plot appears jumbled because matplotlib draws a connecting segment between each consecutive pair of points, in the order they appear in the input array. The solution is to sort the array of randomly generated x-values before generating and drawing the predictions:
x_to_draw_line = np.sort(np.random.randn(1000, 1), axis=0)
(Note that ndarray.sort() sorts in place and returns None, so np.sort is needed here rather than np.random.randn(1000, 1).sort().)
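For completeness, a minimal sketch of the corrected kernel plot, assuming the krr model from the question has already been fit:
x_line = np.sort(np.random.randn(1000, 1), axis=0)  # sorted query points
plt.figure()
plt.plot(x_line, krr.predict(x_line), color="C1")   # segments now connect left to right, forming one curve
plt.scatter(x, y, color="C0")
plt.show()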

non linear regression scatter plot

My data points are:
x = [5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03]
y = [494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715]
The x axis on my plot must be on a log scale!
I want to make a regression line such as the image added, in an S shape. How do I do this (in matlab or python)?
[image: the desired S-shaped regression curve]
UPDATE: I tried:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
# define xnew as 100 equally spaced values between the min and max of the original x
xnew = np.linspace(x.min(), x.max(), 100)
#define spline
spl = make_interp_spline(x, y, k=2)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(x,y, 'o', xnew, y_smooth)
plt.xscale("log")
plt.show()
My result: [image: the plotted curve, still not smooth enough]
How can I make it even smoother? Varying k doesn't make it better.
Note that the higher the degree you use for the k argument, the more “wiggly” the curve will be; depending on how curved you want the line to be, you can adjust the value of k.
Try this:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define x as 200 equally spaced values between the min and max of original x
xnew = np.linspace(x.min(), x.max(), 200)
#define spline
spl = make_interp_spline(x, y, k=3)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(xnew, y_smooth)
plt.show()
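If the goal is a smooth S-shape on a log x axis, a likely better approach (my sketch, not from the answers above) is to interpolate in log-x space, so the spline knots are evenly spaced along the axis that is actually plotted:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y = np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
logx = np.log10(x)                                  # work in log space
logx_new = np.linspace(logx.min(), logx.max(), 200)
spl = make_interp_spline(logx, y, k=3)              # cubic spline over log10(x)
y_smooth = spl(logx_new)
plt.plot(x, y, 'o', 10**logx_new, y_smooth)         # back-transform x for plotting
plt.xscale("log")
plt.show()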

Polynomial Regression Curves in Python

I'm trying to create a degree-2 regression curve for my data. When I create my graph, I get a funny zigzag thing [image omitted], but I want to model my data as an actual curve, which would look like the connected version of the scatter plot.
Any advice/better ways of doing this?
degree = 2
p = np.poly1d(np.polyfit(data['input'],y, degree))
plt.plot(data['input'], p(data['input']), c='r',linestyle='-')
plt.scatter(data['input'], p(data['input']), c='b')
Here, data['input'] is a column vector with the same dimensions as y.
Edit: I have also tried it like this:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
X, y = np.array(data['input']).reshape(-1,1), np.array(data['output'])
lin_reg=LinearRegression(fit_intercept=False)
lin_reg.fit(X,y)
poly_reg=PolynomialFeatures(degree=2)
X_poly=poly_reg.fit_transform(X)
poly_reg.fit(X_poly,y)
lin_reg2=LinearRegression(fit_intercept=False)
lin_reg2.fit(X_poly,y)
X_grid=np.arange(min(X),max(X),0.1)
X_grid=X_grid.reshape((len(X_grid),1))
plt.scatter(X,y,color='red')
plt.plot(X,lin_reg2.predict(poly_reg.fit_transform(X)),color='blue')
plt.show()
This gives me the graph shown here [image omitted]: the scatter is my data, and the blue zigzag is what is supposed to be a quadratic curve modelling the data. Help?
In your plot you just draw straight lines from point to point (where your y value is the approximated y from your polyfit function).
I would skip the polyfit function (because you already have all the y values you are interested in) and instead interpolate data['input'] and y with the B-spline function make_interp_spline from scipy, then plot the new y values over the x range you are interested in.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp

# plots just from point to point (zigzag)
x = np.array([1, 2, 3, 4])
y = np.array([75, 0, 25, 100])
plt.plot(x, y)

# interpolates the points
x_new = np.linspace(1, 4, 300)
a_BSpline = interp.make_interp_spline(x, y)
y_new = a_BSpline(x_new)
plt.plot(x_new, y_new)
plt.show()
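One caveat (mine, not the answerer's): make_interp_spline requires its x argument to be sorted in increasing order, so unsorted real data needs sorting first, e.g.:
order = np.argsort(np.asarray(data['input']))   # indices that sort the x values
x_sorted = np.asarray(data['input'])[order]
y_sorted = np.asarray(y)[order]
spline = interp.make_interp_spline(x_sorted, y_sorted)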
Try this and then adjust with your data! :)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# X, y: your input column (2D) and target values, as defined in the question
# improve the fit: degree = 3
p_reg = PolynomialFeatures(degree=3)
X_poly = p_reg.fit_transform(X)

# again create a new linear regression object
reg2 = LinearRegression()
reg2.fit(X_poly, y)
plt.scatter(X, y, color = 'b')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title("Truth or Bluff")
# predicted values
plt.plot(X, reg2.predict(X_poly), color='r')
plt.show()
[images: resulting fits with degree 3 and with degree 4]
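A note on the asker's second attempt: the dense, sorted X_grid is built but never used in the plot, which is why the line still zigzags over the unsorted X. Plotting the predictions over X_grid instead (a sketch reusing the question's fitted poly_reg and lin_reg2) should give a smooth quadratic:
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)  # dense, sorted query points
plt.scatter(X, y, color='red')
plt.plot(X_grid, lin_reg2.predict(poly_reg.fit_transform(X_grid)), color='blue')
plt.show()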

Incorrect x axis on Matplotlib when doing polynomial linear regression

The following code results in an x axis that ranges from 8 to 18, but the data for the x axis actually ranges from 1,000 to 50 million. I would expect a log scale showing 10,000, 100,000, 1,000,000, 10,000,000, etc.
How do I fix the x axis?
dataset = pandas.DataFrame(Transactions, Price)
dataset = dataset.drop_duplicates()
import numpy as np
import matplotlib.pyplot as plt
X=dataset[['Transactions']]
y=dataset[['Price']]
log_X =np.log(X)
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(log_X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)
def viz_polymonial():
    plt.scatter(log_X, y, color='red')
    plt.plot(log_X, pol_reg.predict(poly_reg.fit_transform(log_X)), color='blue')
    plt.title('Price Curve')
    plt.xlabel('Transactions')
    plt.ylabel('Price')
    plt.grid(linestyle='dotted')
    plt.show()
    return
viz_polymonial()
Plot: [image: x axis runs from 8 to 18 on a linear scale]
You plot the values of log_X on a log scale. It's double-logged. Plot just X with a log scale, or np.exp(log_X).
No, you are not even using a log scale. Plot X with a log scale (plt.xscale("log")), not log_X on a normal scale.
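A minimal sketch of that second suggestion, assuming X, y, log_X, poly_reg, and pol_reg from the question (the regression still uses log_X; only the plotting changes):
plt.scatter(X, y, color='red')
xs = np.sort(X.values.ravel())   # sorted copy so the curve is drawn left to right
plt.plot(xs, pol_reg.predict(poly_reg.fit_transform(np.log(xs).reshape(-1, 1))), color='blue')
plt.xscale("log")                # log axis: ticks at 10^3, 10^4, 10^5, ...
plt.xlabel('Transactions')
plt.ylabel('Price')
plt.show()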

python piecewise linear interpolation

I'm trying to create a piecewise linear interpolation routine, and I'm pretty new to all of this, so I'm very uncertain of what needs to be done.
I've generated a set of data points in 3D which give variation in all 3 directions. I want to interpolate between these data points and plot the result in 3D.
The current data set is much smaller than the final one will be. Linear interpolation is important.
Here's the current code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import scipy.interpolate as interp
x = np.linspace(-1.3,1.3,10)
y1 = np.linspace(.5,0.,5)
y2 = np.linspace(0.,.5,5)
y = np.hstack((y1,y2))
z1 = np.linspace(.1,0.,5)
z2 = np.linspace(0.,.1,5)
z = np.hstack((z1,z2))
data = np.dstack([x,y,z])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
f = interp.interp2d(x, y, z, kind='linear')
xnew = np.linspace(-1.3,1.3,100)
y1new = np.linspace(.5,0.,50)
y2new = np.linspace(0.,.5,50)
ynew = np.hstack((y1new,y2new))
znew = f(xnew,ynew)
ax.plot(x,y,znew, 'b-')
ax.scatter(x,y,z,'ro')
plt.show()
As I said, the dataset is just to add variation. The real set will be much bigger but have less variation. I don't really understand the interpolation tool, and the scipy documentation isn't very clear to me.
I would appreciate suggestions.
UPDATE: 2D is OK now. Please help with the 3D case.
What I'm trying to do is build something that takes data points for deflections of a beam and interpolates between them. I wanted to do this in 3D and get a 3D plot showing the deflection along the x-axis in both the y and z directions at the same time. As a stopgap measure I've used the code below to show the deflection in the y and z directions individually. Note that the data set is randomly generated for the moment; some choices might look strange, but they roughly match the range the final data set will use. The code below works for a 2D system, so it may be helpful to someone. I'd still really appreciate it if someone could help me do this in 3D.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
u=10
x = np.linspace(-1.3,1.3,u) #regular x-data
y = np.random.random_sample(u)/4 #random y data
z = np.random.random_sample(u)/10 # random zdata
ynone = np.ones(u)*0.1 #no deflection dataset
znone = np.ones(u)*0.05
xspace = np.linspace(-1.3, 1.3, u*100)
ydefl = CubicSpline(x, y) # create cubic spline function for the original data
zdefl = CubicSpline(x, z)
plt.subplot(2, 1, 1)
plt.plot(x, ynone, '-',label='y - no deflection')
plt.plot(x, y, 'go',label='y-deflection data')
plt.plot(xspace, ydefl(xspace), label='spline') #plot xspace vs spline function of xspace
plt.title('X [m]s')
plt.ylabel('Y [m]')
plt.legend(loc='best', ncol=3)
plt.subplot(2, 1, 2)
plt.plot(x, znone, '-',label='z - no deflection')
plt.plot(x, z, 'go',label='z-deflection data')
plt.plot(xspace, zdefl(xspace),label='spline')
plt.xlabel('X [m]')
plt.ylabel('Z [m]')
plt.legend(loc='best', ncol=3)
plt.show()
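For the 3D case, one option (my sketch, not from the thread) is to treat x as the parameter and build piecewise linear interpolants of y(x) and z(x) with scipy's interp1d, then plot a single 3D curve; this keeps the interpolation strictly linear, as required:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

u = 10
x = np.linspace(-1.3, 1.3, u)          # regular x-data
y = np.random.random_sample(u) / 4     # random y deflections
z = np.random.random_sample(u) / 10    # random z deflections

fy = interp1d(x, y, kind='linear')     # piecewise linear y(x)
fz = interp1d(x, z, kind='linear')     # piecewise linear z(x)

xnew = np.linspace(-1.3, 1.3, 200)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot(xnew, fy(xnew), fz(xnew), 'b-')  # interpolated 3D curve
ax.scatter(x, y, z, color='r')           # original data points
ax.set_xlabel('X [m]')
ax.set_ylabel('Y [m]')
ax.set_zlabel('Z [m]')
plt.show()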

Lasso Regression in Sklearn Returning Inaccurate Coefficients

I'm trying to use sklearn and Lasso regression to do some analysis, but I'm getting some strange results. I've tried to narrow down the problem, but it appears that the issue is that I just don't understand what sklearn is trying to do. For example, in the code below I would expect the coefficient for the 5th power of x to be 2, or at least close to it. However, no matter what I do, I keep getting values around 16.
Any ideas about what I'm missing/doing wrong?
import matplotlib
matplotlib.use('Qt4Agg')
from matplotlib import pyplot as plt
import numpy as np
from sklearn.linear_model import Lasso
x_data = np.reshape(np.linspace(-3, 3, 20), (-1, 1))
y_data = 2*x_data**5 # + np.random.normal(0, 2, x_data.shape)
X = np.hstack((np.ones(x_data.shape), x_data, x_data**2, x_data*3, x_data**4, x_data*5))
c5 = list()
for alpha in np.logspace(0, 2, num=100):
    model = Lasso(alpha=alpha, max_iter=15000, fit_intercept=True, warm_start=False, selection='cyclic', tol=1e-5)
    model.fit(X, y_data)
    coefficient = model.coef_[-1]
    c5.append(coefficient)
fig = plt.figure()
plt.plot(np.logspace(0, 2, num=100).tolist(), c5, 'r-')
plt.xlabel('x data')
plt.ylabel('y data')
plt.grid(True)
fig.canvas.manager.window.raise_()
plt.show()
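No answer was recorded for this one, but a likely culprit (my reading, not confirmed in the thread): the design matrix uses x_data*3 and x_data*5 (multiplication) where x_data**3 and x_data**5 (powers) were presumably intended, so the last column is 5*x rather than x**5, and the fitted coefficient on it will not be 2. A sketch of the corrected matrix:
# powers 0 through 5 of x (note ** rather than *)
X = np.hstack((np.ones(x_data.shape), x_data, x_data**2,
               x_data**3, x_data**4, x_data**5))
# with this X and a small alpha, model.coef_[-1] should come out near 2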
