Polynomial Regression Curves in Python

I'm trying to create a regression curve of degree 2 for my data. When I create my graph, I get a funny zigzag:
but I want to model my data as an actual curve, which would look like the connected version of the scatter plot.
Any advice/better ways of doing this?
degree = 2
p = np.poly1d(np.polyfit(data['input'],y, degree))
plt.plot(data['input'], p(data['input']), c='r',linestyle='-')
plt.scatter(data['input'], p(data['input']), c='b')
Here, data['input'] is a column vector with the same dimensions as y.
Edit: I have also tried it like this:
X, y = np.array(data['input']).reshape(-1,1), np.array(data['output'])
lin_reg=LinearRegression(fit_intercept=False)
lin_reg.fit(X,y)
poly_reg=PolynomialFeatures(degree=2)
X_poly=poly_reg.fit_transform(X)
poly_reg.fit(X_poly,y)
lin_reg2=LinearRegression(fit_intercept=False)
lin_reg2.fit(X_poly,y)
X_grid=np.arange(min(X),max(X),0.1)
X_grid=X_grid.reshape((len(X_grid),1))
plt.scatter(X,y,color='red')
plt.plot(X,lin_reg2.predict(poly_reg.fit_transform(X)),color='blue')
plt.show()
Which gives me this graph here.
The scatter is my data and the blue zigzag is what is SUPPOSED to be a quadratic curve modelling the data. Help?

In your plot you are simply drawing straight lines from point to point (where each y value is the approximation from your polyfit call).
I would skip the polyfit call (because you already have all the y values you are interested in) and instead interpolate data['input'] and y with a B-spline, using make_interp_spline from scipy, then plot the new y values over the x range you are interested in.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp
# plots just from point to point (zigzag)
x = np.array([1, 2, 3, 4])
y = np.array([75, 0, 25, 100])
plt.plot(x, y)
# interpolates the points
x_new = np.linspace(1, 4, 300)
a_BSpline = interp.make_interp_spline(x, y)
y_new = a_BSpline(x_new)
plt.plot(x_new, y_new)
plt.show()
Try this and then adjust with your data! :)
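Alternatively, if you want to keep the polyfit approach from the question, a minimal sketch (assuming data['input'] and y as defined there) is to evaluate the fitted polynomial on a sorted, dense grid so the line is drawn left to right:
import numpy as np
import matplotlib.pyplot as plt
# data['input'] and y as in the question
degree = 2
p = np.poly1d(np.polyfit(data['input'], y, degree))
# evaluate on a sorted, dense grid instead of the raw (unsorted) inputs
x_smooth = np.linspace(data['input'].min(), data['input'].max(), 200)
plt.scatter(data['input'], y, c='b')
plt.plot(x_smooth, p(x_smooth), c='r', linestyle='-')
plt.show()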

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# build degree-3 polynomial features (X, y as defined in the question)
p_reg = PolynomialFeatures(degree = 3)
X_poly = p_reg.fit_transform(X)
# fit a new linear regression on the polynomial features
reg2 = LinearRegression()
reg2.fit(X_poly,y)
plt.scatter(X, y, color = 'b')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.title("Truth or Bluff")
# predicted values
plt.plot(X, reg2.predict(X_poly), color='r')
plt.show()
With Degree 3
With Degree 4
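To turn the point-to-point segments into a smooth curve, one option (a sketch reusing X, y, p_reg and reg2 from the block above) is to predict on a dense, sorted grid:
import numpy as np
import matplotlib.pyplot as plt
# dense, sorted grid spanning the range of X (X, y, p_reg, reg2 as above)
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
plt.scatter(X, y, color='b')
plt.plot(X_grid, reg2.predict(p_reg.transform(X_grid)), color='r')
plt.show()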

Related

Plotting Kernel Ridge Regression results in weird lines

I'm trying to run kernel Ridge regression on a simple artificial dataset. When I run the code, I get two plots. The first is for the Linear Regression fit, which looks normal. However, the kernel one is very erratic. Is this expected behavior, or am I not calling the functions properly?
The first plt.show():
The second plt.show():
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
w = 5
x = np.random.randn(10, 1)
x_to_draw_line = np.random.randn(1000, 1)
y = w * x
lr = LinearRegression()
lr.fit(x, y)
lr_preds = lr.predict(x_to_draw_line)
plt.figure()
plt.plot(x_to_draw_line, lr_preds, color="C1")
plt.scatter(x, y, color="C0")
plt.show()
krr = KernelRidge(kernel="polynomial")
krr.fit(x, y)
krr_preds = krr.predict(x_to_draw_line)
plt.figure()
plt.plot(x_to_draw_line, krr_preds, color="C1")
plt.scatter(x, y, color="C0")
plt.show()
The line plot appears jumbled because matplotlib draws a connecting line between each pair of points in the order they appear in the input array.
The solution is to sort the array of randomly generated x-values before predicting, so consecutive points are joined from left to right. Note that ndarray.sort() sorts in place and returns None, so use np.sort instead:
x_to_draw_line = np.sort(np.random.randn(1000, 1), axis=0)
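Putting the whole second plot together (a sketch reusing x, y and krr from the code above):
# sorted x-values, so consecutive points are joined left to right
x_to_draw_line = np.sort(np.random.randn(1000, 1), axis=0)
krr_preds = krr.predict(x_to_draw_line)
plt.figure()
plt.plot(x_to_draw_line, krr_preds, color="C1")
plt.scatter(x, y, color="C0")
plt.show()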

How to plot contour levels base on the standard deviation levels in a multivariate normal distribution?

I am using EllipticEnvelope, which estimates my dataset's mean and covariance matrix. Now I want to plot the multivariate normal distribution with a contour plot, but I want the contour levels parameter to correspond to different numbers of standard deviations, similar to this post,
but I get this plot instead (note that the dataset is different):
I also read this post and this one, but the answers don't work for me and I would like to use the levels parameter of the contour plot.
Here is my code:
import numpy as np
import pandas as pd
import scipy.linalg
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
%matplotlib inline
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
X = iris.data
from sklearn.covariance import EllipticEnvelope
cov = EllipticEnvelope(random_state=42)
cov.fit(X)
i = 0
j = 1
mean = [cov.location_[i], cov.location_[j]]
covariance = [[cov.covariance_[i, i], cov.covariance_[i, j]], [cov.covariance_[j, i], cov.covariance_[j, j]]]
x_list = X[X.columns[i]].values
y_list = X[X.columns[j]].values
x, y = np.mgrid[x_list.min():x_list.max():.01, y_list.min():y_list.max():.01]
pos = np.dstack((x, y))
rv = multivariate_normal(mean, covariance)
z = rv.pdf(pos)
plt.figure()
plt.contour(x, y, z, cmap='RdYlGn')
plt.scatter(x_list, y_list)
plt.xlabel(X.columns[i])
plt.ylabel(X.columns[j])
plt.show()
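One way to get contour lines at 1, 2 and 3 standard deviations (a sketch, reusing mean, rv, x, y, z, x_list and y_list from the code above): for a multivariate normal, the density at Mahalanobis distance k equals the peak density times exp(-k**2/2), so those values can be passed to the levels parameter:
# density values at Mahalanobis distances 3, 2, 1 (levels must be increasing)
peak = rv.pdf(mean)
levels = [peak * np.exp(-k**2 / 2) for k in (3, 2, 1)]
plt.figure()
plt.contour(x, y, z, levels=levels, cmap='RdYlGn')
plt.scatter(x_list, y_list)
plt.xlabel(X.columns[i])
plt.ylabel(X.columns[j])
plt.show()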

non linear regression scatter plot

My data points are:
x = [5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03]
y= [494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715]
The x axis on my plot must be on a log scale!
I want to make a regression line such as the image added, in an S shape. How do I do this (in matlab or python)?
UPDATE: I tried:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define xnew as 100 equally spaced values between the min and max of the original x
xnew = np.linspace(x.min(), x.max(), 100)
#define spline
spl = make_interp_spline(x, y, k=2)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(x,y, 'o', xnew, y_smooth)
plt.xscale("log")
plt.show()
My results are:
How can I make it even smoother? Changing k doesn't make it better.
Note that the higher the degree you pass as the k argument, the more "wiggly" the curve will be. Depending on how curved you want the line to be, you can adjust the value of k.
Try this:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define xnew as 200 equally spaced values between the min and max of the original x
xnew = np.linspace(x.min(), x.max(), 200)
#define spline
spl = make_interp_spline(x, y, k=3)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(xnew, y_smooth)
plt.show()
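If the curve still looks angular on a log-scaled x axis, it may be smoother to interpolate in log10(x) rather than in x (a sketch, reusing x and y from above):
# spline fitted in log-space so the sampling matches the log-scaled axis
log_x = np.log10(x)
log_x_new = np.linspace(log_x.min(), log_x.max(), 200)
spl = make_interp_spline(log_x, y, k=3)
plt.plot(x, y, 'o', 10**log_x_new, spl(log_x_new))
plt.xscale("log")
plt.show()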

Incorrect x axis on Matplotlib when doing polynomial linear regression

The following code results in an x axis that ranges from 8 to 18. The data for the x axis actually ranges from 1,000 to 50 million. I would expect a log scale to show 10,000, 100,000, 1,000,000, 10,000,000, etc.
How do I fix the x axis?
dataset = pandas.DataFrame(Transactions, Price)
dataset = dataset.drop_duplicates()
import numpy as np
import matplotlib.pyplot as plt
X=dataset[['Transactions']]
y=dataset[['Price']]
log_X =np.log(X)
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(log_X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)
def viz_polymonial():
    plt.scatter(log_X, y, color='red')
    plt.plot(log_X, pol_reg.predict(poly_reg.fit_transform(log_X)), color='blue')
    plt.title('Price Curve')
    plt.xlabel('Transactions')
    plt.ylabel('Price')
    plt.grid(linestyle='dotted')
    plt.show()
    return

viz_polymonial()
Plot:
You are plotting log_X on a linear axis, so the tick values you see (8 to 18) are the log values themselves. To get ticks at 10,000, 100,000, and so on, plot X (or np.exp(log_X)) and switch the axis to a log scale with plt.xscale("log"), instead of plotting log_X on a normal scale.
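A minimal sketch of the fix (reusing X, y, log_X, poly_reg and pol_reg from the code above): plot against X and switch the axis to a log scale, keeping the regression on log_X:
plt.scatter(X, y, color='red')
plt.plot(X, pol_reg.predict(poly_reg.fit_transform(log_X)), color='blue')
plt.xscale("log")  # ticks now fall at powers of ten (10^3, 10^4, ...)
plt.xlabel('Transactions')
plt.ylabel('Price')
plt.show()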

2D Density Plot with X Y Z data

I am trying to plot a 2D terrain map with x, y and z (elevation). I followed the steps from the following link, but I am getting a very weird plot.
Python : 2d contour plot from 3 lists : x, y and rho?
I spent almost half a day searching but got nowhere.
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
# import data:
import xlrd
loc = "~/Desktop/Book4.xlsx"
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sample=500
# Generate array:
x=np.array(sheet.col_values(0))[0:sample]
y=np.array(sheet.col_values(1))[0:sample]
z=np.hamming(sample)[0:sample][:,None]
# Set up a regular grid of interpolation points
xi, yi = np.meshgrid(x, y)
# Interpolate
rbf = scipy.interpolate.Rbf(x, y, z, function='cubic')
zi = rbf(xi, yi)
# Plot
plt.imshow(zi, vmin=z.min(), vmax=z.max(), origin='lower',
extent=[x.min(), x.max(), y.min(), y.max()])
plt.colorbar()
plt.show()
The first of the following figures is what I am getting and the last one is how it should look.
Any help would be appreciated.
Link to data file
I think the problem is that the data you're giving it is not smooth enough to interpolate with the default parameters. Here's one approach, using mgrid instead of meshgrid:
import numpy as np
import pandas as pd
from scipy.interpolate import Rbf
# fname is your data, but as a CSV file.
data = pd.read_csv(fname).values
x, y = data.T
x_min, x_max = np.amin(x), np.amax(x)
y_min, y_max = np.amin(y), np.amax(y)
# Make a grid with spacing 0.002.
grid_x, grid_y = np.mgrid[x_min:x_max:0.002, y_min:y_max:0.002]
# Make up a Z.
z = np.hamming(x.size)
# Make an n-dimensional interpolator.
rbfi = Rbf(x, y, z, smooth=2)
# Predict on the regular grid.
di = rbfi(grid_x, grid_y)
Then you can look at the result:
import matplotlib.pyplot as plt
plt.imshow(di)
I get:
I wrote a Jupyter Notebook on this topic recently; check it out for a few other interpolation methods, like kriging and spline fitting.
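One more note on the plotting step: plt.imshow(di) on its own drops the real x/y coordinates. A small sketch (reusing grid_x, grid_y and di from above) that keeps them:
import matplotlib.pyplot as plt
# transpose so rows correspond to y, and pass extent so the axes show data coordinates
plt.imshow(di.T, origin='lower',
           extent=[grid_x.min(), grid_x.max(), grid_y.min(), grid_y.max()])
plt.colorbar()
plt.show()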
