Why is my code not predicting correctly and computing a target value that does not match the input? - python

I have this code that loads the digits dataset with load_digits and uses an SVM model for predicting digits. But after fitting the model, its prediction on a new value is incorrect: the computed target value does not correspond to the given input. Below is the code:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
my_OCR_model = svm.SVC(gamma = 0.001, C = 100)
X, y = digits.data[:-10], digits.target[:-10]
my_OCR_model.fit(X, y)
print(my_OCR_model.predict(X[[-6]]))
print(y[-6])
plt.imshow(digits.images[-6], cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()

Remove the slicing. With the slice, X[[-6]] and y[-6] come from the truncated arrays while digits.images[-6] comes from the full array, so the image you display does not correspond to the sample you predict.
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
digits = datasets.load_digits()
my_OCR_model = svm.SVC(gamma = 0.001, C = 100)
X, y = digits.data, digits.target # remove slicing here
my_OCR_model.fit(X, y)
print(my_OCR_model.predict(X[[-6]]))
print(y[-6])
plt.imshow(digits.images[-6], cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()
Alternatively, if you had a good reason for slicing, keep the same slice for X, y, and the images. Use these as the last two lines:
plt.imshow(digits.images[:-10][-6], cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()
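If the [:-10] slice was meant to hold out the last 10 digits as unseen test data (an assumption on my part), a minimal sketch of that idea looks like this, keeping the image indices aligned with the held-out samples:
# Minimal sketch, assuming the original [:-10] slice was intended as a held-out test set
import matplotlib.pyplot as plt
from sklearn import datasets, svm

digits = datasets.load_digits()
model = svm.SVC(gamma=0.001, C=100)

# Train on everything except the last 10 samples, keep those 10 as unseen test data
X_train, y_train = digits.data[:-10], digits.target[:-10]
X_test, y_test = digits.data[-10:], digits.target[-10:]

model.fit(X_train, y_train)
print(model.predict(X_test))  # predictions for the 10 held-out digits
print(y_test)                 # true labels for the same digits

# Show the image that matches X_test[0] (index -10 in the full image array)
plt.imshow(digits.images[-10], cmap=plt.cm.gray_r, interpolation="nearest")
plt.show()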

Related

Gaussian Confidence Interval: Python

I'm writing a script that uses GPR to analyze and predict burn properties of different fuels. I've got good outputs for my test set, and now want to add a 95% confidence interval. When I try to implement the interval I get terrible results. Please send help.
#Gaussian Predictions for Ignition Delay
#September 14 2021
%matplotlib inline
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_absolute_error as mae
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
#gpr = GaussianProcessRegressor()
kernel = C(1.0, (1e-3, 1e3))*RBF(10, (1e-2, 1e2))
gpr = GaussianProcessRegressor(kernel = kernel, n_restarts_optimizer = 9, alpha = 0.1, normalize_y = True)
gpr.fit(x_train, y_train)
y_prediction, std = gpr.predict(x_test, return_std = True)
confidence = std*1.96/np.sqrt(len(x_test))
confidence = confidence.reshape(-1,1)
# Plot the function, the prediction and the 95% confidence interval based on
# the MSE
plt.figure()
plt.plot(x_train, y_train, "b.", markersize=10, label="Observations")
plt.fill(x_test,
         y_prediction - confidence,
         y_prediction + confidence,
         alpha=0.3,
         fc="b",
         ec="None",
         label="95% confidence interval",
         )  # this plots the confidence interval and fits it to my data
plt.plot(x_test, y_prediction, "r.", markersize=10, label="Prediction")
Resulting plot: https://i.stack.imgur.com/PItpi.png
Looking at this example from the sklearn docs
https://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_noisy_targets.html#sphx-glr-auto-examples-gaussian-process-plot-gpr-noisy-targets-py
it looks like you need to adapt your plot call. For me, the following worked:
plt.fill_between(
    x_test.ravel(),
    y_prediction - 1.96 * std,
    y_prediction + 1.96 * std,
    alpha=0.5,
    label=r"95% confidence interval",
)
Here, I generated data like in the sklearn example:
X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
rng = np.random.RandomState(1)
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
test_indices = [x for x in np.arange(y.size) if x not in training_indices]
x_train, y_train = X[training_indices], y[training_indices]
x_test, y_test = X[test_indices], y[test_indices]
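For completeness, a minimal end-to-end sketch on that generated data (not your fuel dataset); note that the std returned by predict(..., return_std=True) is already the pointwise predictive standard deviation, so there is no need to divide it by sqrt(len(x_test)) as in the original confidence calculation:
# Sketch: fit a GPR on a few points and plot a 95% band from the predictive std
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
rng = np.random.RandomState(1)
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
x_train, y_train = X[training_indices], y[training_indices]

kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9, normalize_y=True)
gpr.fit(x_train, y_train)

# std is the pointwise predictive standard deviation; 1.96*std gives a ~95% band
y_prediction, std = gpr.predict(X, return_std=True)

plt.plot(x_train, y_train, "b.", markersize=10, label="Observations")
plt.plot(X, y_prediction, "r-", label="Prediction")
plt.fill_between(
    X.ravel(),
    y_prediction - 1.96 * std,
    y_prediction + 1.96 * std,
    alpha=0.5,
    label=r"95% confidence interval",
)
plt.legend()
plt.show()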

Incorrect x axis on Matplotlib when doing polynomial linear regression

The following code results in an x axis that ranges from 8 to 18. The data for the x axis actually ranges from 1,000 to 50 million. I would expect a log scale to show (10,000), (100,000), (1,000,000) (10,000,000) etc.
How do I fix the x axis?
import pandas
dataset = pandas.DataFrame(Transactions, Price)
dataset = dataset.drop_duplicates()
import numpy as np
import matplotlib.pyplot as plt
X=dataset[['Transactions']]
y=dataset[['Price']]
log_X =np.log(X)
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(log_X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)
def viz_polymonial():
    plt.scatter(log_X, y, color='red')
    plt.plot(log_X, pol_reg.predict(poly_reg.fit_transform(log_X)), color='blue')
    plt.title('Price Curve')
    plt.xlabel('Transactions')
    plt.ylabel('Price')
    plt.grid(linestyle='dotted')
    plt.show()
    return
viz_polymonial()
Plot:
You are plotting the values of log_X on a linear axis, so the ticks run from roughly 8 to 18 — that is ln(1,000) to ln(50,000,000), not the transaction counts themselves. You are not actually using a log scale at all. Plot X (or equivalently np.exp(log_X)) and put the axis itself on a log scale with plt.xscale("log"), instead of plotting log_X on a normal scale.
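For example, a sketch of the plotting function with that change, reusing the X, y, log_X, poly_reg, and pol_reg objects from your code above (the function name viz_polynomial_logx is just a new name for this variant):
# Sketch: keep the regression on log(X), but plot the original X on a log-scaled axis
def viz_polynomial_logx():
    plt.scatter(X, y, color='red')
    plt.plot(X, pol_reg.predict(poly_reg.fit_transform(log_X)), color='blue')
    plt.xscale("log")  # axis ticks now read 10^3, 10^4, ..., 10^7
    plt.title('Price Curve')
    plt.xlabel('Transactions')
    plt.ylabel('Price')
    plt.grid(linestyle='dotted')
    plt.show()

viz_polynomial_logx()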

Is it possible to set the color for the bottom region with `mlxtend.plotting`?

I am trying to reproduce the example in this post, which produces this figure.
The colored regions above are plotted by mlxtend.plotting (version '0.14.0').
With the default settings on colab, this code
from mlxtend.plotting import plot_decision_regions
plot_decision_regions(X, y, clf=ppn)
produces this figure.
The data points have been plotted while the bottom region has not.
Is it possible to set the color for the bottom region with mlxtend.plotting?
It seems like a bug caused by classifying only two regions; if you separate 3 clusters, as in the following example, it works.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import itertools
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
# Initializing Classifiers
clf1 = LogisticRegression(random_state=0)
clf2 = RandomForestClassifier(random_state=0)
clf3 = SVC(random_state=0, probability=True)
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3],
                              weights=[2, 1, 1], voting='soft')
# Loading some example data
X, y = iris_data()
X = X[:,[0, 2]]
# Plotting Decision Regions
gs = gridspec.GridSpec(2, 2)
fig = plt.figure(figsize=(10, 8))
labels = ['Logistic Regression',
          'Random Forest',
          'RBF kernel SVM',
          'Ensemble']
for clf, lab, grd in zip([clf1, clf2, clf3, eclf],
                         labels,
                         itertools.product([0, 1], repeat=2)):
    clf.fit(X, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(X=X, y=y,
                                clf=clf, legend=2)
    plt.title(lab)
plt.show()
Try asking directly on their GitHub repository: https://github.com/rasbt/mlxtend
I think it's possible. You can use the colors parameter instead, which is much easier. Try this one; is this what you are looking for?
fig = plot_decision_regions(
    X=X,
    y=y.astype(int),
    clf=clf,
    legend=2,
    colors='yellow,red'
)
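For reference, here is a minimal self-contained two-class sketch with the colors parameter set; it uses a Perceptron on two iris features as a stand-in for your ppn, X, and y, since those are not shown:
# Sketch: two-class decision regions with explicit region colors
import matplotlib.pyplot as plt
from sklearn.linear_model import Perceptron
from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions

X, y = iris_data()
X = X[:, [0, 2]]
mask = y < 2                      # keep only two classes to mirror the original problem
X, y = X[mask], y[mask]

ppn = Perceptron(random_state=0)
ppn.fit(X, y)

plot_decision_regions(X=X, y=y.astype(int), clf=ppn, legend=2, colors='yellow,red')
plt.show()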

How to extend the regression line in plot?

I did a cubic regression on the data below. How can I plot the regression line with x value starting from 0 rather than the minimum x?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
df = pd.DataFrame({'x':list(range(3,18)),'y':[-4,-2,0,3,5,8,12,17,21,23,24,25,26,26,24]})
x = df['x'].values.reshape(-1,1)
y = df['y'].values.reshape(-1,1)
cubic = PolynomialFeatures(degree=3)
x_cubic = cubic.fit_transform(x)
cubic.fit(x_cubic, y)
model = LinearRegression()
model.fit(x_cubic, y)
fig, ax = plt.subplots()
ax.scatter(x, y, color = 'blue')
pred = model.predict(cubic.fit_transform(x))
ax.plot(x, pred, color = 'red')
ax.set_xlim(0)
ax.set_ylim(-20)
This is what I have now.
How can I get a plot like this?
Try creating an extended x range like this and predicting with your existing model. Add this to the bottom of your code:
ex_x = np.arange(0,4).reshape(-1,1)
ex_pred = model.predict(cubic.fit_transform(ex_x))
ax.plot(ex_x, ex_pred, color='red', linestyle='--')
Output:
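If you also want the fitted curve drawn as one smooth line over the whole range, rather than only at the integer x values, a variant is to predict on a dense grid; this sketch reuses the x, cubic, model, and ax objects from the code above:
# Variant sketch: evaluate the fitted cubic on a dense grid from 0 to max(x)
x_grid = np.linspace(0, x.max(), 200).reshape(-1, 1)
y_grid = model.predict(cubic.transform(x_grid))  # transform() reuses the already-fitted features
ax.plot(x_grid, y_grid, color='red', linestyle='--')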

How to evaluate output of DecisionTreeRegressor python

I am trying to use DecisionTreeRegressor from sklearn (Python) to find out the dependency between two variables: pressure on the x axis and received optical power on the y axis. I measure both parameters roughly once per minute (*/1min).
When I worked with polyval and polyfit in MATLAB, I was able to extract the actual prediction equation, which more or less described the relation received_optical_power = f(pressure). My question is basically how to evaluate the output of my analysis when using DecisionTreeRegressor: I mean the actual equation, the residuals, and how to calculate the actual error of the extracted curve.
I run the project in a Jupyter notebook (Python) on localhost because my input file final.merged.txt is about 10 MB. Pressure is in column 3.
print(__doc__)
# Import the necessary modules and libraries
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
# Create a random dataset
rng = np.random.RandomState(1)
# X = np.sort(5 * rng.rand(80, 1), axis=0)
# y = np.sin(X).ravel()
# y[::5] += 3 * (0.5 - rng.rand(16))
X = pd.read_csv('final.merged.txt',sep = ";",usecols=[3]) # -- this works!!
#X = pd.read_csv('final.merged.txt',sep = ";",usecols=(3,5,6,7,9,10))
#X = X.loc[:,'avgPressure'].values
y = pd.read_csv('final.merged.txt',sep = ";",usecols=[1])
# y = y.ix[:,0]
y = y.loc[:,'received optical power'].values
# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=10)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)
# Predict
#X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
#X_test =xrange(X.min(),X.max())
X_test = np.arange(X.min(), X.max(), (X.max()-X.min())/len(X))
print "vector y: %s\nvector X: %s\nX_test: %s" % (len(y), len(X),len(X_test))
len(X_test) is len(y)
X_test=pd.DataFrame(X_test,columns = ['X_test'])
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)
# Plot the results
plt.figure()
plt.scatter(X, y, c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.plot(X_test, y_3, color="red", label="max_depth=10", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()
I appreciate any suggestions.
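A decision tree does not give a closed-form equation (its prediction is piecewise constant), but you can still compute residuals and standard error metrics from its predictions. A minimal sketch, assuming X, y, and the fitted regr_1/regr_2/regr_3 from the code above:
# Sketch: residuals and common error metrics for the fitted trees
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

for name, regr in [("max_depth=2", regr_1), ("max_depth=5", regr_2), ("max_depth=10", regr_3)]:
    y_pred = regr.predict(X)      # ideally evaluate on a held-out split, not the training data
    residuals = y - y_pred        # per-sample residuals
    print(name,
          "MAE:", mean_absolute_error(y, y_pred),
          "MSE:", mean_squared_error(y, y_pred),
          "R^2:", r2_score(y, y_pred))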
