I have 50 target classes of 300 datasets.
This is my sample dataset, with 98 features:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
dataset = pd.read_csv(root_path + 'pima-indians-diabetes.data.csv', header=None)
X= dataset.iloc[:,0:8]
y= dataset.iloc[:,8]
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3)
from keras import Sequential
from keras.layers import Dense
classifier = Sequential()
#First Hidden Layer
classifier.add(Dense(units = 10, activation='relu',kernel_initializer='random_normal', input_dim=8))
#Second Hidden Layer
classifier.add(Dense(units = 10, activation='relu',kernel_initializer='random_normal'))
#Output Layer
classifier.add(Dense(units = 1, activation='sigmoid',kernel_initializer='random_normal'))
#Compiling the neural network
classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics =['accuracy'])
#Fitting the data to the training dataset
classifier.fit(X_train,y_train, batch_size=2, epochs=10)
I get 19% accuracy here, and I don't know how to optimize my prediction result.
I am considering that you have performed the Dimentionality Reduction technique on your original data having 98 features and therefore you are using an 8-dimensional input feature in your model.
I have a few observations on your implementation:
[As a Classification Problem]
As you have mentioned that your samples belong to 50 diffecent classes, the problem is certainly a multiclass classification problem. So, you need to encode your label first like:
from keras.utils import to_categorical
y = to_categorical(y, num_classes=50, dtype='float32')
In this case, you need to change the number of output node (representing class) and activation function in the final layer as follows:
classifier.add(Dense(units = 50, activation='softmax'))
Furthermore, you have to ue categorical_crossentropy as a loss function while compiling your model.
classifier.compile(optimizer ='adam',loss='categorical_crossentropy', metrics =['accuracy'])
[As a Regression Problem]
You can also consider this problem as a multiple regression problem as the output is within the range of 0 to 50 (continuous) and can keep a single output node in the final layer as you did. But in that case, you should use a linear activation function instead of sigmoid.
So, the final layer should be like:
classifier.add(Dense(units = 1)) # default activation is linear
Additionally, In case of regression problem, mean_squared_error is the most relevant cost function to use (assuming not many outliers in your dataset) and accuracy as a performance metric is irrelevant (rather you may use mean_absolute_error which is analogous to loss). Hence, the second modification is:
classifier.compile(optimizer ='adam',loss='mean_squared_error')
Related
I am an electrical engineer and I am looking for a solution to calculate the DC current of a permanent synchronous motor. So I decided to check the ANN solutions with Keras and so on.Long story short, I'll show you a screenshot of some measured signals.
The first 5 signals are the measured signals. The last one is the DC current, which I will estimate. Here the value was recorded with the help of a current clamp. Okay, I started building a model in Python and tried some things that I assume will increase the accuracy of the model. But after all that, I am not getting that good results from the model and my hope is that maybe I am choosing wrong parameters or not an ideal model for this purpose.
Here is my code:
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from matplotlib import pyplot as plt
import seaborn as sns
# Import input (x) and output (y) data, and asign these to df1 and df1
df = pd.read_csv('train_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
plt.figure()
sns.heatmap(df.corr(),annot=True)
plt.show()
# Split the data into input (x) training and testing data, and ouput (y) training and testing data,
# with training data being 80% of the data, and testing data being the remaining 20% of the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)#, shuffle=True)
# Scale both training and testing input data
X_train = preprocessing.maxabs_scale(X_train)
X_test = preprocessing.maxabs_scale(X_test)
model = Sequential()
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(1, input_shape=(4,)))
model.compile(optimizer="adam", loss="msle", metrics=['mean_squared_logarithmic_error','accuracy'])
# Pass several parameters to 'EarlyStopping' function and assign it to 'earlystopper'
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode='auto')
model.summary()
history = model.fit(X_train, y_train, epochs = 2000, validation_split = 0.3, verbose = 2, callbacks = [earlystopper])
# Runs model (the one with the activation function, although this doesn't really matter as they perform the same)
# with its current weights on the training and testing data
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Calculates and prints r2 score of training and testing data
print("The R2 score on the Train set is:\t{:0.3f}".format(r2_score(y_train, y_train_pred)))
print("The R2 score on the Test set is:\t{:0.3f}".format(r2_score(y_test, y_test_pred)))
df = pd.read_csv('test_two_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
X_validate = preprocessing.maxabs_scale(X)
y_pred = model.predict(X_validate)
plt.plot(Y)
plt.plot(y_pred)
plt.show()
(weight_0,bias_0) = model.layers[0].get_weights()
(weight_1,bias_1) = model.layers[1].get_weights()
One limitation is that I can't use LSTM layers or other complex algorithms because I need to implement the trained model in a microcontroller on a motor application later.
I guess you could find some words for me to make my model a little better in accuracy.
At the end here is a figure where I show you the worse prediction performance. Orange is the prediction and blue is the measured current.
The training dataset was this one.
The correlation between the individual values can be found here. Since the values of id and ud have no correlation to idc, I decided to delete them.
The most important thing to keep in mind when trying to improve the accuracy of the model is ALWAYS Normalise the input data which basically means rescaling real-valued numeric attributes into the range 0 and 1. I am not able to understand the way you are providing the training data to the model. Could you please explain that. It would be better in understanding and identifying the scope of higher accuracy.
Now if we talk about parameters, I would suggest you the addition of a Tuning Algorithm for the parameters to get the optimized value of each parameter.
It is always a good parctice to include hidden layers which could provide better feature extract.
I am new to machine learning. I have been trying to get this code working but the loss is stuck as 1.12 and is neither increasing or decreasing. Any help would be appreciated.
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
dataset = pd.read_csv('Iris.csv')
#for rncoding label
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
dataset["Labels"] = encoder.fit_transform(dataset["Species"])
X = dataset.iloc[:,1:5]
Y = dataset['Labels']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
X_train = np.array(X_train).astype(np.float32)
X_test = np.array(X_test).astype(np.float32)
y_train = np.array(y_train).astype(np.float32)
y_test = np.array(y_test).astype(np.float32)
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(8, input_shape=(4,), activation='relu'),
tf.keras.layers.Dense(3, activation='softmax')
])
opt = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer=opt, loss='mse')
r = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)
This is a classification problem where you have to predict the class of Iris plant (source). You have specified mse loss which stands for 'Mean Squared Error'. It measures the average deviation of predicted values from actual values. The square ensures you penalize a large deviation higher than a small deviation. This loss is used for regression problems when you have to predict a continuous value like price, clicks, sales etc.
A few suggestions that will help are:
Change the loss to a classification loss function. categorical_cross_entropy is a good choice here. Without going into too many details, in classification problems model outputs the score of a particular sample belonging to a class. The softmax function used by you converts these scores to normalized probabilities. The cross-entropy loss ensures that your model is penalized when it gives a high probability to the wrong class
Try standardizing your data with 0 mean and unit variance. This helps the model convergence.
You may refer to this article for building a neural network for Iris dataset.
So I build a GRU model and I'm comparing 3 different datasets on the same model. I was just running the first dataset and set the number of epochs to 25, but I have noticed that my validation loss is increasing just after the 6th epoch, doesn't that indicate overfitting, am I doing something wrong?
import pandas as pd
import tensorflow as tf
from keras.layers.core import Dense
from keras.layers.recurrent import GRU
from keras.models import Sequential
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from google.colab import files
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbc=TensorBoardColab() # Tensorboard
df10=pd.read_csv('/content/drive/My Drive/Isolation Forest/IF 10 PERCENT.csv',index_col=None)
df2_10= pd.read_csv('/content/drive/My Drive/2019 Dataframe/2019 10minutes IF 10 PERCENT.csv',index_col=None)
X10_train= df10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_train=X10_train.values
y10_train= df10['Power_kW']
y10_train=y10_train.values
X10_test= df2_10[['WindSpeed_mps','AmbTemp_DegC','RotorSpeed_rpm','RotorSpeedAve','NacelleOrientation_Deg','MeasuredYawError','Pitch_Deg','WindSpeed1','WindSpeed2','WindSpeed3','GeneratorTemperature_DegC','GearBoxTemperature_DegC']]
X10_test=X10_test.values
y10_test= df2_10['Power_kW']
y10_test=y10_test.values
# scaling values for model
x_scale = MinMaxScaler()
y_scale = MinMaxScaler()
X10_train= x_scale.fit_transform(X10_train)
y10_train= y_scale.fit_transform(y10_train.reshape(-1,1))
X10_test= x_scale.fit_transform(X10_test)
y10_test= y_scale.fit_transform(y10_test.reshape(-1,1))
X10_train = X10_train.reshape((-1,1,12))
X10_test = X10_test.reshape((-1,1,12))
# creating model using Keras
model10 = Sequential()
model10.add(GRU(units=512, return_sequences=True, input_shape=(1,12)))
model10.add(GRU(units=256, return_sequences=True))
model10.add(GRU(units=256))
model10.add(Dense(units=1, activation='sigmoid'))
model10.compile(loss=['mse'], optimizer='adam',metrics=['mse'])
model10.summary()
history10=model10.fit(X10_train, y10_train, batch_size=256, epochs=25,validation_split=0.20, verbose=1, callbacks=[TensorBoardColabCallback(tbc)])
score = model10.evaluate(X10_test, y10_test)
print('Score: {}'.format(score))
y10_predicted = model10.predict(X10_test)
y10_predicted = y_scale.inverse_transform(y10_predicted)
y10_test = y_scale.inverse_transform(y10_test)
plt.plot( y10_predicted, label='Predicted')
plt.plot( y10_test, label='Measurements')
plt.legend()
plt.savefig('/content/drive/My Drive/Figures/Power Prediction 10 Percent.png')
plt.show()
LSTMs(and also GRUs in spite of their lighter construction) are notorious for easily overfitting.
Reduce the number of units(the output size) in each of the layers(32(layer1)-64(layer2); you could also eliminate the last layer altogether.
The second of all, you are using the activation 'sigmoid', but your loss function + metric is mse.
Ensure that your problem is either a regression or a classification one. If it is indeed a regression, then the activation function should be 'linear' at the last step. If it is a classification one, you should change your loss_function to binary_crossentropy and your metric to 'accuracy'.
Therefore, the plot displayed is just misleading for the moment. If you modify like I suggested and you still get such a train-val loss plot, then we can state for sure that you have an overfitting case.
I am trying simple multinomial logistic regression using Keras, but the results are quite different compared to standard scikit-learn approach.
For example with iris data:
import numpy as np
import pandas as pd
df = pd.read_csv("./data/iris.data", header=None)
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.3, random_state=52)
X_train = df_train.drop(4, axis=1)
y_train = df_train[4]
X_test = df_test.drop(4, axis=1)
y_test = df_test[4]
Using scikit-learn:
from sklearn.linear_model import LogisticRegression
scikit_model = LogisticRegression(multi_class='multinomial', solver ='saga', max_iter=500)
scikit_model.fit(X_train, y_train)
the average weighted f1-score on test set:
y_test_pred = scikit_model.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, scikit_model.classes_))
is 0.96.
Then with Keras:
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
# first we have to encode class values as integers
encoder = LabelEncoder()
encoder.fit(y_train)
y_train_encoded = encoder.transform(y_train)
Y_train = np_utils.to_categorical(y_train_encoded)
y_test_encoded = encoder.transform(y_test)
Y_test = np_utils.to_categorical(y_test_encoded)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from keras.regularizers import l2
#model construction
input_dim = 4 # 4 variables
output_dim = 3 # 3 possible outputs
def classification_model():
model = Sequential()
model.add(Dense(output_dim, input_dim=input_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
return model
#training
keras_model = classification_model()
keras_model.fit(X_train, Y_train, epochs=500, verbose=0)
the average weighted f1-score on test set:
classes = np.argmax(keras_model.predict(X_test), axis = 1)
y_test_pred = encoder.inverse_transform(classes)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, encoder.classes_))
is 0.89.
Is it possible to perform identical (or at least as much as possible) logistic regression with Keras as with scikit-learn?
I tried to run your examples and noticed a couple of potential sources:
The test set is incredibly small, only 45 instances. This means that to get from accuracy of .89 to .96, the model only needs to predict just three more instances correctly. Due to randomness in training, your Keras results can oscillate quite a bit.
As explained by #meowongac https://stackoverflow.com/a/59643522/1467943, you're using a different optimizer. One point is that scikit's algorithm will automatically set its learning rate. For SGD in Keras, tweaking learning rate and/or number of epochs could lead to improvements.
Scikit learn quietly uses L2 regularization by default.
Using your code, I was able to get accuracy ranging from .89 to .96 by running SGD with learning rate set to .05. When switching to Adam (also with this quite high learning rate), I got more stable results ranging from .92 to .96 (although this is more of an impression as I didn't run too many trials).
One obvious difference is saga (a variant of SAG) is used in LogisticRegression while SGD is used in your NN. As far as I know, LogisticRegression doesn't support SGD. Alternatively you can use SGDRegressor or SGDClassifier instead of LogisticRegression. And here is a blog discussing the differences between them.
I am quite new to Keras so apologies in advance for any stupid mistakes. I am currently attempting to try out some good old cross-domain transfer learning between two datasets. I have a model here that is trained and executed on a voice recognition dataset that I have generated (code is at the bottom of this question because it's quite long)
If I were to train a new model, say model_2 on a different dataset, then I'd get a baseline from the initial random distribution of weights.
I wonder, is it possible to train model_1 and model_2, then, and this is the bit I don't know how to do; can I take the two 256 and 128 dense layers from model_1 (with trained weights) and use them as starting points for a model_3 - which is dataset 2 with the initial weight distribution from model_1?
So, in the end, I have the following:
Model_1 which starts from a random distribution and trains on dataset 1
Model_2 which starts from a random distribution and trains on dataset 2
Model_3 which starts from the distribution trained in Model_1 and trains on dataset 2.
My question is, how would I go about doing step 3 in the above? I don't want to freeze the weights, I just want an initial distribution for training from a past experiment
Any help would be greatly appreciated. Thank you! Apologies if I didn't make it quite clear enough what I'm going for
My code to train Model_1 is as follows:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
from keras.utils import np_utils
from keras.layers.normalization import BatchNormalization
import time
start = time.clock()
# fix random seed for reproducibility
seed = 1
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("voice.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
numVars = len(dataframe.columns) - 1
numClasses = dataframe[numVars].nunique()
X = dataset[:,0:numVars].astype(float)
Y = dataset[:,numVars]
print("THERE ARE " + str(numVars) + " ATTRIBUTES")
print("THERE ARE " + str(numClasses) + " UNIQUE CLASSES")
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
calls = [EarlyStopping(monitor='acc', min_delta=0.0001, patience=100, verbose=2, mode='max', restore_best_weights=True)]
# define baseline model
def baseline_model():
# create model
model = Sequential()
model.add(BatchNormalization())
model.add(Dense(256, input_dim=numVars, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(numClasses, activation='softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimator = KerasClassifier(build_fn=baseline_model, epochs=2000, batch_size=1000, verbose=1)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold, fit_params={'callbacks':calls})
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
#your code here
print (time.clock() - start)
PS: Input attributes and outputs will all be the same between the two datasets, all that will change are attribute values. I am curious, can this be done if the two datasets have different numbers of output classes?
In short, to fine-tune Model_3 from Model_1, just call model.load_weights('/path/to/model_1.h5', by_name=True) after model.compile(...). Of course, you must have saved the trained Model_1 first.
If I understood correct, you have the same number of features and classes among the two datasets, so you do not even need to re-design your model. If you had different set of classes, then you had to give different names to the last layers of Model_1 and Model_3:
model.add(Dense(numClasses, activation='softmax', name='some_unique_name'))