I am new to machine learning. I have been trying to get this code working but the loss is stuck as 1.12 and is neither increasing or decreasing. Any help would be appreciated.
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
dataset = pd.read_csv('Iris.csv')
#for rncoding label
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
dataset["Labels"] = encoder.fit_transform(dataset["Species"])
X = dataset.iloc[:,1:5]
Y = dataset['Labels']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
X_train = np.array(X_train).astype(np.float32)
X_test = np.array(X_test).astype(np.float32)
y_train = np.array(y_train).astype(np.float32)
y_test = np.array(y_test).astype(np.float32)
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(8, input_shape=(4,), activation='relu'),
tf.keras.layers.Dense(3, activation='softmax')
])
opt = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer=opt, loss='mse')
r = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)
This is a classification problem where you have to predict the class of Iris plant (source). You have specified mse loss which stands for 'Mean Squared Error'. It measures the average deviation of predicted values from actual values. The square ensures you penalize a large deviation higher than a small deviation. This loss is used for regression problems when you have to predict a continuous value like price, clicks, sales etc.
A few suggestions that will help are:
Change the loss to a classification loss function. categorical_cross_entropy is a good choice here. Without going into too many details, in classification problems model outputs the score of a particular sample belonging to a class. The softmax function used by you converts these scores to normalized probabilities. The cross-entropy loss ensures that your model is penalized when it gives a high probability to the wrong class
Try standardizing your data with 0 mean and unit variance. This helps the model convergence.
You may refer to this article for building a neural network for Iris dataset.
Related
We have developed an Artificial Neural Network (ANN), where we split our data into training and testing data with train_test_split. As we want a better and more generalized estimate of our performance scores, we would like to split data with k-fold instead.
Now, we split the data into 70% training and 30% testing data with train_test_split
def splitData(dataframe):
X = dataframe[Predictors].values
y = dataframe[TargetVariable].values
### Sandardization of data ###
PredictorScaler = StandardScaler()
TargetVarScaler = StandardScaler()
# Storing the fit object for later reference
PredictorScalerFit = PredictorScaler.fit(X)
TargetVarScalerFit = TargetVarScaler.fit(y)
# Generating the standardized values of X and y
X = PredictorScalerFit.transform(X)
y = TargetVarScalerFit.transform(y)
# Split the data into training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
return X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit
How do we split our data with k-fold instead?
And are we right in our assumptions that the performance scores will result in better and more generalized estimates if using k-fold cross-validation instead of train_test_split?
Our main
if __name__ == '__main__':
print('--------------')
# ...
dataframe = pd.read_csv("/.../file.csv")
# Splitting data into training and tesing data
X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit = splitData(dataframe=dataframe)
# Making the Regression Artificial Neural Network (ANN)
ann = ANN(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, PredictorScalerFit=PredictorScalerFit, TargetVarScalerFit=TargetVarScalerFit)
# Evaluation of the performance of the Aritifical Neural Network (ANN)
performanceEvaluation(y_test_orig=ann['temp'], y_test_pred=ann['Predicted_temp'])
Our ANN Prediction function
def ANN(X_train, y_train, X_test, y_test, TargetVarScalerFit, PredictorScalerFit):
# Making the regression Artificial Neural Network (ANN) Model
model = make_regression_ann()
# Fitting the ANN to the Training set
model.fit(X_train, y_train, batch_size=5, epochs=50, verbose=1)
# Generating Predictions on testing data
Predictions = model.predict(X_test)
# Scaling the predicted temp data back to original price scale
Predictions = TargetVarScalerFit.inverse_transform(Predictions)
# Scaling the y_test temp data back to original temp scale
y_test_orig = TargetVarScalerFit.inverse_transform(y_test)
# Scaling the test data back to original scale
Test_Data = PredictorScalerFit.inverse_transform(X_test)
TestingData = pd.DataFrame(data=Test_Data, columns=Predictors)
TestingData['temp'] = y_test_orig
TestingData['Predicted_temp'] = Predictions
TestingData.head()
TestingData.to_csv("TestingData.csv")
return TestingData
Making the regression ann model
def make_regression_ann(initializer='normal', activation='relu', optimizer='adam', loss='mse'):
# create ANN model
model = Sequential()
# Defining the Input layer and FIRST hidden layer, both are same!
model.add(Dense(units=8, input_dim=7, kernel_initializer=initializer, activation=activation))
# Defining the Second layer of the model
# after the first layer we don't have to specify input_dim as keras configure it automatically
model.add(Dense(units=6, kernel_initializer=initializer, activation=activation))
# The output neuron is a single fully connected node
# Since we will be predicting a single number
model.add(Dense(1, kernel_initializer=initializer))
# Compiling the model
model.compile(loss=loss, optimizer=optimizer)
return model
Our function to generate performance scores
def performanceEvaluation(y_test_orig, y_test_pred):
# Computing the Mean Absolute Percent Error
MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)
# Computing R2 Score
r2 = r2_score(y_test_orig, y_test_pred)
# Computing Mean Square Error (MSE)
MSE = mean_squared_error(y_test_orig, y_test_pred)
# Computing Root Mean Square Error (RMSE)
RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)
# Computing Mean Absolute Error (MAE)
MAE = mean_absolute_error(y_test_orig, y_test_pred)
# Computing Mean Bias Error (MBE)
MBE = np.mean(y_test_pred - y_test_orig) # here we calculate MBE
print('--------------')
print('The Coefficient of Determination (R2) of ANN model is:', r2)
print("The Root Mean Squared Error (RMSE) of ANN model is:", RMSE)
print("The Mean Squared Error (MSE) of ANN model is:", MSE)
print('The Mean Absolute Percent Error (MAPE) of ANN model is:', MAPE)
print("The Mean Absolute Error (MAE) of ANN model is:", MAE)
print("The Mean Bias Error (MBE) of ANN model is:", MBE)
print('--------------')
eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]
columns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)',
'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']
dataframe = pd.DataFrame([eval_list], columns=columns)
return dataframe
Our performance scores
Coefficient of Determination (R2) Root Mean Square Error (RMSE) Mean Squared Error (MSE) Mean Absolute Percent Error (MAPE) Mean Absolute Error (MAE) Mean Bias Error (MBE)
0.982052940563799 0.7293977725380798 0.5320211105835124 0.0894734801108027 0.5711224962560332 0.049541171482965995
What we tried to reach our goal
Splitting whole dataset into X (Predictors) , y (TargetVariable)
def splitData(dataframe):
X = dataframe[Predictors].values
y = dataframe[TargetVariable].values
### Sandardization of data ###
PredictorScaler = StandardScaler()
TargetVarScaler = StandardScaler()
# Storing the fit object for later reference
PredictorScalerFit = PredictorScaler.fit(X)
TargetVarScalerFit = TargetVarScaler.fit(y)
# Generating the standardized values of X and y
X = PredictorScalerFit.transform(X)
y = TargetVarScalerFit.transform(y)
return X, y
Utilize ANN
def ANN_test(X,y):
# Fitting the ANN to the Training set
model = KerasRegressor(build_fn=make_regression_ann(), epochs=50, batch_size=5)
cv = KFold(n_splits=10, random_state=1, shuffle=True)
test = cross_val_score(model, X=X, y=y, cv=cv, scoring="neg_mean_squared_error", n_jobs=1)
print(test)
mean = test.mean()
print(mean)
Error receiving from using this function:
2021-12-24 16:16:47.909705: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/yusuf/PycharmProjects/MasterThesis-ArtificialNeuralNetwork/main.py:168: DeprecationWarning: KerasRegressor is deprecated, use Sci-Keras (https://github.com/adriangb/scikeras) instead.
model = KerasRegressor(build_fn=make_regression_ann(), epochs=50, batch_size=5)
2021-12-24 16:16:48.193312: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
[nan nan nan nan nan nan nan nan nan nan]
nan
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/model_selection/_validation.py:372: FitFailedWarning:
10 fits failed out of a total of 10.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/keras/wrappers/scikit_learn.py", line 152, in fit
self.model = self.build_fn(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/keras/engine/base_layer.py", line 3076, in _split_out_first_arg
raise ValueError(
ValueError: The first argument to `Layer.call` must always be passed.
warnings.warn(some_fits_failed_message, FitFailedWarning)
We are operating on:
Mac Monterey, Version: 12.0.1
Tensorflow Version: 2.7.0
Keras Version: 2.7.0
Libraries
import os
import time
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from pathlib import Path
from sklearn.metrics import accuracy_score, make_scorer, mean_absolute_percentage_error, mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, KFold, cross_val_score
from keras.wrappers.scikit_learn import KerasRegressor
from keras.models import Sequential
from keras.layers import Dense
You need to use KerasRegressor to wrap your keras model as a scikit learn model.
Take a look at example 1 here
from keras.wrappers.scikit_learn import KerasRegressor
model = KerasRegressor(build_fn=make_regression_ann, epochs=512, batch_size=3)
kfold = KFold(n_splits=10, random_state=seed)
scores = cross_val_score(model, x, y, cv=kfold)
I am an electrical engineer and I am looking for a solution to calculate the DC current of a permanent synchronous motor. So I decided to check the ANN solutions with Keras and so on.Long story short, I'll show you a screenshot of some measured signals.
The first 5 signals are the measured signals. The last one is the DC current, which I will estimate. Here the value was recorded with the help of a current clamp. Okay, I started building a model in Python and tried some things that I assume will increase the accuracy of the model. But after all that, I am not getting that good results from the model and my hope is that maybe I am choosing wrong parameters or not an ideal model for this purpose.
Here is my code:
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from matplotlib import pyplot as plt
import seaborn as sns
# Import input (x) and output (y) data, and asign these to df1 and df1
df = pd.read_csv('train_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
plt.figure()
sns.heatmap(df.corr(),annot=True)
plt.show()
# Split the data into input (x) training and testing data, and ouput (y) training and testing data,
# with training data being 80% of the data, and testing data being the remaining 20% of the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)#, shuffle=True)
# Scale both training and testing input data
X_train = preprocessing.maxabs_scale(X_train)
X_test = preprocessing.maxabs_scale(X_test)
model = Sequential()
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(1, input_shape=(4,)))
model.compile(optimizer="adam", loss="msle", metrics=['mean_squared_logarithmic_error','accuracy'])
# Pass several parameters to 'EarlyStopping' function and assign it to 'earlystopper'
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode='auto')
model.summary()
history = model.fit(X_train, y_train, epochs = 2000, validation_split = 0.3, verbose = 2, callbacks = [earlystopper])
# Runs model (the one with the activation function, although this doesn't really matter as they perform the same)
# with its current weights on the training and testing data
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Calculates and prints r2 score of training and testing data
print("The R2 score on the Train set is:\t{:0.3f}".format(r2_score(y_train, y_train_pred)))
print("The R2 score on the Test set is:\t{:0.3f}".format(r2_score(y_test, y_test_pred)))
df = pd.read_csv('test_two_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
X_validate = preprocessing.maxabs_scale(X)
y_pred = model.predict(X_validate)
plt.plot(Y)
plt.plot(y_pred)
plt.show()
(weight_0,bias_0) = model.layers[0].get_weights()
(weight_1,bias_1) = model.layers[1].get_weights()
One limitation is that I can't use LSTM layers or other complex algorithms because I need to implement the trained model in a microcontroller on a motor application later.
I guess you could find some words for me to make my model a little better in accuracy.
At the end here is a figure where I show you the worse prediction performance. Orange is the prediction and blue is the measured current.
The training dataset was this one.
The correlation between the individual values can be found here. Since the values of id and ud have no correlation to idc, I decided to delete them.
The most important thing to keep in mind when trying to improve the accuracy of the model is ALWAYS Normalise the input data which basically means rescaling real-valued numeric attributes into the range 0 and 1. I am not able to understand the way you are providing the training data to the model. Could you please explain that. It would be better in understanding and identifying the scope of higher accuracy.
Now if we talk about parameters, I would suggest you the addition of a Tuning Algorithm for the parameters to get the optimized value of each parameter.
It is always a good parctice to include hidden layers which could provide better feature extract.
I am trying simple multinomial logistic regression using Keras, but the results are quite different compared to standard scikit-learn approach.
For example with iris data:
import numpy as np
import pandas as pd
df = pd.read_csv("./data/iris.data", header=None)
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.3, random_state=52)
X_train = df_train.drop(4, axis=1)
y_train = df_train[4]
X_test = df_test.drop(4, axis=1)
y_test = df_test[4]
Using scikit-learn:
from sklearn.linear_model import LogisticRegression
scikit_model = LogisticRegression(multi_class='multinomial', solver ='saga', max_iter=500)
scikit_model.fit(X_train, y_train)
the average weighted f1-score on test set:
y_test_pred = scikit_model.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, scikit_model.classes_))
is 0.96.
Then with Keras:
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
# first we have to encode class values as integers
encoder = LabelEncoder()
encoder.fit(y_train)
y_train_encoded = encoder.transform(y_train)
Y_train = np_utils.to_categorical(y_train_encoded)
y_test_encoded = encoder.transform(y_test)
Y_test = np_utils.to_categorical(y_test_encoded)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from keras.regularizers import l2
#model construction
input_dim = 4 # 4 variables
output_dim = 3 # 3 possible outputs
def classification_model():
model = Sequential()
model.add(Dense(output_dim, input_dim=input_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
return model
#training
keras_model = classification_model()
keras_model.fit(X_train, Y_train, epochs=500, verbose=0)
the average weighted f1-score on test set:
classes = np.argmax(keras_model.predict(X_test), axis = 1)
y_test_pred = encoder.inverse_transform(classes)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, encoder.classes_))
is 0.89.
Is it possible to perform identical (or at least as much as possible) logistic regression with Keras as with scikit-learn?
I tried to run your examples and noticed a couple of potential sources:
The test set is incredibly small, only 45 instances. This means that to get from accuracy of .89 to .96, the model only needs to predict just three more instances correctly. Due to randomness in training, your Keras results can oscillate quite a bit.
As explained by #meowongac https://stackoverflow.com/a/59643522/1467943, you're using a different optimizer. One point is that scikit's algorithm will automatically set its learning rate. For SGD in Keras, tweaking learning rate and/or number of epochs could lead to improvements.
Scikit learn quietly uses L2 regularization by default.
Using your code, I was able to get accuracy ranging from .89 to .96 by running SGD with learning rate set to .05. When switching to Adam (also with this quite high learning rate), I got more stable results ranging from .92 to .96 (although this is more of an impression as I didn't run too many trials).
One obvious difference is saga (a variant of SAG) is used in LogisticRegression while SGD is used in your NN. As far as I know, LogisticRegression doesn't support SGD. Alternatively you can use SGDRegressor or SGDClassifier instead of LogisticRegression. And here is a blog discussing the differences between them.
I am currently working on a model that reads structured data and determines if someone has a disease. I think the issue is the data is not being split between training and testing data. I am unaware of how I would be able to do that.
I am not sure what to try.
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
heart_data = pd.read_csv('cardio_train.csv')
heart_data.head()
heart_data.shape
heart_data.describe()
heart_data.isnull().sum()
heart_data_columns = heart_data.columns
predictors = heart_data[heart_data_columns[heart_data_columns != 'target']] # all columns except Breast Cancer
target = heart_data['target'] # Breast Cancer column
#This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type
predictors.head()
target.head()
#normalize the data by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()
n_cols = predictors_norm.shape[1] # number of predictors
def regression_model():
# create model
model = Sequential()
#inputs
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
model.add(Dense(50, activation='relu')) # activation function
model.add(Dense(1))
# compile model
model.compile(optimizer='adam', loss='mean_squared_error')
#loss measures the results and figures out how bad it did. Optimizer generates next guess.
return model
# build the model
model = regression_model()
print (model)
# fit the model
history=model.fit(predictors_norm, target, validation_split=0.3, epochs=10, verbose=2)
#Decision Tree
print ("Processing Decision Tree")
dtc = DecisionTreeClassifier()
dtc.fit(predictors_norm,target)
print("Decision Tree Test Accuracy {:.2f}%".format(dtc.score(predictors_norm, target)*100))
#Support Vector Machine
print ("Processing Support Vector Machine")
svm = SVC(random_state = 1)
svm.fit(predictors_norm, target)
print("Test Accuracy of SVM Algorithm: {:.2f}%".format(svm.score(predictors_norm,target)*100))
#Random Forest
print ("Processing Random Forest")
rf = RandomForestClassifier(n_estimators = 1000, random_state = 1)
rf.fit(predictors_norm, target)
print("Random Forest Algorithm Accuracy Score : {:.2f}%".format(rf.score(predictors_norm,target)*100))
The message i am getting is this
Decision Tree Test Accuracy 100.00%
However, support vector machine is getting 73.37%
You are evaluating your model on the same data as when you trained it : you are probably overfitting. To overcome this, you must separate the data into two parts, one for learning, one for testing :
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(predictors, target, test_size=0.2)
Then, learn your model with the train dataset and evaluate it on the test dataset :
dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
accuracy = dtc.score(x_test, y_test) * 100
print(f"Decision Tree test accuracy : {accuracy} %.")
I'm implementing Naive Bayes by sklearn with imbalanced data.
My data has more than 16k records and 6 output categories.
I tried to fit the model with the sample_weight calculated by sklearn.utils.class_weight
The sample_weight received something like:
sample_weight = [11.77540107 1.82284768 0.64688602 2.47138047 0.38577435 1.21389195]
import numpy as np
data_set = np.loadtxt("./data/_vector21.csv", delimiter=",")
inp_vec = data_set[:, 1:22]
out_vec = data_set[:, 22:]
#
# # Split dataset into training set and test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inp_vec, out_vec, test_size=0.2) # 80% training and 20% test
#
# class weight
from keras.utils.np_utils import to_categorical
output_vec_categorical = to_categorical(y_train)
from sklearn.utils import class_weight
y_ints = [y.argmax() for y in output_vec_categorical]
c_w = class_weight.compute_class_weight('balanced', np.unique(y_ints), y_ints)
cw = {}
for i in set(y_ints):
cw[i] = c_w[i]
# Create a Gaussian Classifier
from sklearn.naive_bayes import *
model = GaussianNB()
# Train the model using the training sets
print(c_w)
model.fit(X_train, y_train, c_w)
# Predict the response for test dataset
y_pred = model.predict(X_test)
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy, how often is the classifier correct?
print("\nClassification Report: \n", (metrics.classification_report(y_test, y_pred)))
print("\nAccuracy: %.3f%%" % (metrics.accuracy_score(y_test, y_pred)*100))
I got this message:
ValueError: Found input variables with inconsistent numbers of samples: [13212, 6]
Can anyone tell me what did I do wrong and how can fix it?
Thanks a lot.
The sample_weight and class_weight are two different things.
As their name suggests:
sample_weight is to be applied to individual samples (rows in your data). So the length of sample_weight must match the number of samples in your X.
class_weight is to make the classifier give more importance and attention to the classes. So the length of class_weight must match the number of classes in your targets.
You are calculating class_weight and not sample_weight by using the sklearn.utils.class_weight, but then try to pass it to the sample_weight. Hence the dimension mismatch error.
Please see the following questions for more understanding of how these two weights interact internally:
What is the difference between sample weight and class weight options in scikit learn?
https://stats.stackexchange.com/questions/244630/difference-between-sample-weight-and-class-weight-randomforest-classifier
This way I was able to calculate the weights to deal with class imbalance.
from sklearn.utils import class_weight
sample = class_weight.compute_sample_weight('balanced', y_train)
#Classifier Naive Bayes
naive = naive_bayes.MultinomialNB()
naive.fit(X_train,y_train, sample_weight=sample)
predictions_NB = naive.predict(X_test)