Hi everyone, I have a problem printing my results with scikit-learn's MLPClassifier. I want to plot a graph of MSE vs. epoch for both training and testing, and this is my code:
# Splitting data into features and target
x=Dt[['new_tests','people_vaccinated','people_fully_vaccinated','total_boosters','new_vaccinations']]
y=Dt[['new_cases','new_deaths']]
# Import the train_test_split function
from sklearn.model_selection import train_test_split
# Split the dataset into training (70%) and test (30%) sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
# Import MLPClassifier and mean_squared_error
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import mean_squared_error
# Create model object
flc = MLPClassifier(hidden_layer_sizes=(15, 10),
                    random_state=5,
                    verbose=True,
                    max_iter=50,
                    learning_rate_init=0.1)
# Fit data onto the model
flc.fit(x_train, y_train)
# Make predictions on the test dataset
ypred = flc.predict(x_test)
# Calculate the error on the test set (squared=False returns the root mean squared error)
mse = mean_squared_error(y_test, ypred, squared=False)
# Import accuracy score
from sklearn.metrics import accuracy_score
# Calculate accuracy
print(accuracy_score(y_test, ypred))
I was expecting a graph of MSE vs. epoch for training and testing. Is anyone able to run it? Any ideas or suggestions? What could be causing this issue? Thank you.
Mean square error is a regression metric. To visualize the loss per iteration, you can plot the clf.loss_curve_:
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification
X, y = make_classification()
clf = MLPClassifier(max_iter=50).fit(X, y)
plt.plot(clf.loss_curve_)
plt.show()
If you want to plot something more complicated (e.g. error vs. epoch for training and testing), you'll need to write a training loop yourself; see: Training and validation loss history for MLPRegressor.
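As a minimal sketch of such a loop (on synthetic data, so the sizes and hyperparameters are placeholders, not a definitive recipe): partial_fit runs one epoch at a time, which lets you record the MSE on both splits after each pass.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(15, 10), learning_rate_init=0.1, random_state=5)
classes = np.unique(y_train)
train_mse, test_mse = [], []
for epoch in range(50):
    # one pass over the training data
    clf.partial_fit(X_train, y_train, classes=classes)
    train_mse.append(mean_squared_error(y_train, clf.predict(X_train)))
    test_mse.append(mean_squared_error(y_test, clf.predict(X_test)))
plt.plot(train_mse, label='train')
plt.plot(test_mse, label='test')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()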
Related
I am trying to use SKLearn's MLPRegressor model; however, I am unable to replicate the final loss value that the model is providing.
I have turned off regularisation by setting alpha to 0, and turned off early-stopping and validation. But my loss calculation differs from what SKLearn is providing by about 1%.
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
def squared_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean() / 2
X, y = make_regression(n_samples=200, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
regr = MLPRegressor(random_state=1, max_iter=500, alpha=0, validation_fraction=0).fit(X_train, y_train)
# sklearn loss calc
regr.loss_ # 354.05262744103453
# manual loss calc
squared_loss(y_train, regr.predict(X_train)) # 350.25399153165534
Any pointers would be great. I've taken my squared_loss function directly from the SKLearn docs, and going through the code I can't find anything else that could be causing the difference, but I must be missing something. My guess is that there is some sample-size-related adjustment somewhere, because whatever value I change the first random_state to, the difference between the two calculations is always around 1%.
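One hedged thing to check (continuing from the snippet above, not a definitive explanation): regr.loss_ is the value recorded while the final training iteration was running, i.e. the last entry of regr.loss_curve_, so it is not computed from the final weights the way the manual calculation is.
# regr.loss_ should equal the last entry of the loss curve
print(regr.loss_ == regr.loss_curve_[-1])
# loss recorded during the final epoch (before that epoch's last weight updates)
print(regr.loss_curve_[-1])
# loss of the final weights
print(squared_loss(y_train, regr.predict(X_train)))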
I am getting 100% accuracy with multiple linear regression. I am following a tutorial from last year; the author does not get 100% accuracy on the same model, but I do now, which seems weird to me. Here's my code. Am I doing it right, or is there something wrong in my code?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('M_Regression.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, :1].values
from sklearn.model_selection import train_test_split
X_train, x_test, Y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
#regression
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train,Y_train)
#Prediction
y_pred = reg.predict(x_test)
print(str(y_test) + " - " + str(y_pred))
With linear regression on simple numeric data, it is not unusual to get a near-perfect score on a large dataset. Try it with other datasets as well. I tried your code and also got an accuracy of 1.0.
To check the accuracy of your model, you could try printing the r2 score of your test sample, something along the lines of:
from sklearn.metrics import r2_score
print(r2_score(y_test,y_pred))
If you still have issues with the score, you could try removing random_state=0 to check whether you still get 100% accuracy with several different train/test splits.
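A minimal sketch of that check, reusing the file name and column selection from the question (it assumes M_Regression.csv is available and keeps the question's target selection as-is):
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('M_Regression.csv')
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, :1].values
# Evaluate R2 over several different random splits instead of a single fixed one
for seed in [0, 1, 2, 3, 4]:
    X_train, x_test, Y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=seed)
    reg = LinearRegression().fit(X_train, Y_train)
    print(seed, r2_score(y_test, reg.predict(x_test)))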
I am trying to predict Boston Housing prices. When I choose polynomial regression of degree 1 or 2, the R2 score is OK, but degree 3 makes the R2 score much worse.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
from sklearn.datasets import load_boston
boston_dataset = load_boston()
dataset = pd.DataFrame(boston_dataset.data, columns = boston_dataset.feature_names)
dataset['MEDV'] = boston_dataset.target
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values.reshape(-1,1)
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2) # <-- Tuning to 3
X_poly = poly_reg.fit_transform(X_train)
poly_reg.fit(X_poly, y_train)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y_train)
y_pred = lin_reg_2.predict(poly_reg.fit_transform(X_test))
from sklearn.metrics import r2_score
print('Prediction Score is: ', r2_score(y_test, y_pred))
Output (degree=2):
Prediction Score is: 0.6903318065831567
Output (degree=3):
Prediction Score is: -12898.308114085281
This is called overfitting. You are fitting the model almost perfectly to the training set, which leads to high variance: when the hypothesis fits the training set too closely, it then fails on the test set. You can check the R2 score on your training set with r2_score(y_train, lin_reg_2.predict(X_poly)); it will be very high. You need to balance the trade-off between bias and variance.
If you are looking for a higher r2_score, you can try other regression models such as lasso and ridge and tune their alpha values. For a better understanding, it helps to look at a plot of how the hypothesis line changes as the degree of the polynomial increases.
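As a rough sketch of the ridge suggestion (continuing from the question's X_train, X_test, y_train and y_test; alpha=10.0 is an arbitrary starting value, not a tuned one):
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
# Scale the polynomial features so the degree-3 terms do not dominate the penalty
model = make_pipeline(PolynomialFeatures(degree=3), StandardScaler(), Ridge(alpha=10.0))
model.fit(X_train, y_train)
print('Train R2:', r2_score(y_train, model.predict(X_train)))
print('Test R2:', r2_score(y_test, model.predict(X_test)))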
How can I find the overall accuracy of the outputs obtained by running a decision tree algorithm? I am able to get the top five class labels for the active user input, but I am only getting the accuracy for the X_train and y_train data using accuracy_score(). Suppose I get the top five recommendations: I wish to get the accuracy for each class label and, from these, the overall accuracy of the output. Please suggest some ideas.
My Python script is below; here, event is the list of class labels:
from itertools import chain
from sklearn.tree import DecisionTreeClassifier
DTC = DecisionTreeClassifier()
DTC.fit(X_train_one_hot, y_train)
print("output from DTC:")
res = DTC.predict_proba(X_test_one_hot)
new = list(chain.from_iterable(res))
# Here I got the index values of the top five probabilities
index = sorted(range(len(new)), key=lambda i: new[i], reverse=True)[:5]
for i in index:
    print(event[i])
Here is the sample code I tried in order to get the accuracy for the predicted class labels; index holds the indices of the top five class-label probabilities and event is the list of class labels:
for i in index:
    DTC.fit(X_train_one_hot, y_train)
    y_pred = event[i]
    AC = accuracy_score((event, y_pred) * 100)
    print(AC)
Since you have a multi-class classification problem, you can calculate the accuracy of the classifier using the confusion_matrix function.
To get overall accuracy, sum the values in the diagonal and divide the sum by the total number of samples.
Consider the following simple multi-class classification example using the IRIS dataset:
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names
# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01)
y_pred = classifier.fit(X_train, y_train).predict(X_test)
Now to calculate overall accuracy, use confusion matrix:
conf_mat = confusion_matrix(y_test, y_pred)
acc = np.sum(conf_mat.diagonal()) / np.sum(conf_mat)
print('Overall accuracy: {} %'.format(acc*100))
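As a quick cross-check (continuing from the example above), accuracy_score should give the same number as the diagonal calculation:
from sklearn.metrics import accuracy_score
# Should match the diagonal-sum calculation above
print('Overall accuracy: {} %'.format(accuracy_score(y_test, y_pred) * 100))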
I am trying to use the Random Forest classifier from scikit-learn in Python to predict stock movements. My dataset has 8 features and 1201 records. But after fitting the model and using it to predict, I get 100% accuracy and 100% OOB error. I changed n_estimators from 100 down to a small value, but the OOB error only dropped by a few percent. Here is my code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
#File reading
df = pd.read_csv('700.csv')
df.drop(df.columns[0], axis=1, inplace=True)
target = df.iloc[:,8]
print(target)
#train test split
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.3)
#model fit
clf = RandomForestClassifier(n_estimators=100, criterion='gini',oob_score= True)
clf.fit(X_train,y_train)
pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, pred)
print(clf.oob_score_)
print(accuracy)
How can I modify the code in order to make the OOB error drop? Thanks.
If you want to check the error rather than the score, you can compute it from your existing code like this:
oob_error = 1 - clf.oob_score_
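If the goal is to see how the OOB error behaves as n_estimators changes (which the question already experimented with), here is a minimal sketch, continuing from the question's X_train and y_train and using an arbitrary range of forest sizes:
from sklearn.ensemble import RandomForestClassifier
# Fit one forest per size and report its out-of-bag error (1 - oob_score_)
for n in [10, 25, 50, 100, 200]:
    clf = RandomForestClassifier(n_estimators=n, criterion='gini', oob_score=True)
    clf.fit(X_train, y_train)
    print(n, 1 - clf.oob_score_)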