MLP with Keras for prediction - Python

I am trying to create a neural network with Keras (TensorFlow backend).
I have 4 input and 2 output variables (example data not available).
I want to make predictions for a test set (also not available).
This is my code:
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("trainingsdata.csv", delimiter=";")
X = dataset[:,0:4]
Y = dataset[:,4:6]
model = Sequential()
model.add(Dense(4, input_dim=4, kernel_initializer='uniform', activation='sigmoid'))
model.add(Dense(3, kernel_initializer='uniform', activation='sigmoid'))
model.add(Dense(2, kernel_initializer='uniform', activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
# pass the optimizer object itself; the string 'sgd' would use default settings
model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
testset = numpy.loadtxt("testdata.csv", delimiter=";")
Z = testset[:,0:4]
predictions = model.predict(Z)
print(predictions)
When I run the script, the accuracy is 1.000 after every epoch, and I always get the same output for every input pair:
[-5.83297 68.2967]
[-5.83297 68.2967]
[-5.83297 68.2967]
...
Does anybody have an idea what the fault in my code is?

I suggest you normalize/standardize your data before feeding it to your model, and then check whether your model starts to learn.
Have a look at scikit-learn's StandardScaler.
And look into this SO thread to learn how to correctly fit_transform your training data and only transform your test data.
There is also this tutorial that makes use of scikit-learn's data preprocessing pipeline: http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
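As a minimal sketch of how that could look for the code in the question (assuming the same trainingsdata.csv / testdata.csv layout):
from sklearn.preprocessing import StandardScaler
import numpy

dataset = numpy.loadtxt("trainingsdata.csv", delimiter=";")
X, Y = dataset[:, 0:4], dataset[:, 4:6]
testset = numpy.loadtxt("testdata.csv", delimiter=";")
Z = testset[:, 0:4]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # fit the scaler on the training data only
Z_scaled = scaler.transform(Z)      # reuse the training statistics on the test data

# optionally scale the targets as well and invert the scaling after predicting:
y_scaler = StandardScaler()
Y_scaled = y_scaler.fit_transform(Y)
# ... train on X_scaled / Y_scaled, then:
# predictions = y_scaler.inverse_transform(model.predict(Z_scaled))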

Neural networks have a tough time when the scales of the input variables are very different from each other. Having 10, 1000, and 100000 as inputs to the same network causes the gradients to be dominated by whatever the large value is; the other values effectively don't provide any information.
One method is to simply rescale the input variables by a constant, as sketched below. You can simply divide a value like 206000 by 100000. Try getting all of the variables to around the same number of digits; large numbers are a bit harder for networks than small ones.
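A rough sketch of the constant rescaling (the divisors here are made up for illustration; pick constants that match your columns' actual magnitudes):
import numpy

dataset = numpy.loadtxt("trainingsdata.csv", delimiter=";")
X = dataset[:, 0:4]
# one constant per column so every input ends up at a similar magnitude
scale = numpy.array([1.0, 10.0, 1000.0, 100000.0])
X_rescaled = X / scale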

Related

All predicted values of an LSTM model are almost the same

I have trained an LSTM model to predict multiple output values.
The predicted values are almost the same even though the loss is low. Why is that, and how can I improve it?
from keras import backend as K
import math
from sklearn.metrics import mean_squared_error, mean_absolute_error
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation

def create_model():
    model = Sequential()
    model.add(LSTM(50, return_sequences=True, input_shape=(40000, 7)))
    model.add(LSTM(50, return_sequences=True))
    model.add(LSTM(50, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(2, activation='linear'))
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.summary()
    return model
model = create_model()
model.fit(X_train, Y_train, shuffle=False, verbose=1, epochs=10)
prediction = model.predict(X_test, verbose=0)
print(prediction)
prediction =
[[0.26766795 0.00193274]
[0.2676593 0.00192017]
[0.2676627 0.00193239]
[0.2676644 0.00192784]
[0.26766634 0.00193461]
[0.2676624 0.00192487]
[0.26766685 0.00193129]
[0.26766685 0.00193165]
[0.2676621 0.00193216]
[0.26766127 0.00192624]]
Calculating the mean relative error:
import tensorflow as tf
mean_relative_error = tf.reduce_mean(tf.abs((Y_test - prediction) / Y_test))
print(mean_relative_error)
mean_relative_error = 1.9220362
It means the model is just mapping every x to nearly the same y. The relative error tells me that your y values are relatively small, so when you take the mean difference between y_hat and y they look close enough...
To break this symmetry, you should increase the number of LSTM cells and add Dropout; also make sure to put an L1 regularization term on your Dense layers.
Decrease the number of neurons in each Dense layer and increase the network size; also change your loss from "mean_squared_error" to "mean_absolute_error".
One more thing: use Adagrad with a learning rate of 1 instead of the Adam optimizer.
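A sketch of what those suggestions could look like applied to the model above (the layer sizes, dropout rate, and L1 factor are illustrative guesses, not tested values):
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras import optimizers, regularizers

model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape=(40000, 7)))  # more cells
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=False))
model.add(Dropout(0.2))
# fewer neurons per Dense layer, with an L1 penalty on the weights
model.add(Dense(10, kernel_regularizer=regularizers.l1(0.01)))
model.add(Dense(2, activation='linear'))
model.compile(optimizer=optimizers.Adagrad(lr=1.0), loss='mean_absolute_error')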

Why does the accuracy of the neural network stop increasing?

I'm trying to solve the Titanic competition on Kaggle, but the model accuracy isn't going beyond 80%.
I tried changing the number of hidden nodes and the number of epochs, and I also tried applying batch normalization, dropout, and different weight initializations, but I get the same 80%. What am I doing wrong?
This is my code below:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(10, input_shape=(5,), kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(20, kernel_initializer='he_normal', activation='relu'))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(2, kernel_initializer=tf.keras.initializers.GlorotNormal(), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
train_scores = model.fit(train_features, train_labels, epochs=200, batch_size=64, verbose=2)
And here is the accuracy over the last few epochs: [plot: model accuracy]
How can I improve it?
You can try normalizing the data. Generally, when implementing neural networks, we don't need to normalize the data if the network is deep, but since we are only working with 3 layers here, I guess normalizing might help.
I would also suggest splitting your training data again into training and validation sets and using K-fold cross-validation, as sketched below (I am not sure about this one!! I too am new to this field).
But in general I have seen that if the accuracy is constant, the best approach is to alter the training data (I mean normalize it, or try imputing NaN values with the mean rather than setting them to 0).
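A sketch of the normalization plus K-fold idea, assuming train_features / train_labels are the NumPy arrays from the question and build_model() is a hypothetical helper that recreates the Sequential model above:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in kfold.split(train_features):
    scaler = StandardScaler()
    # fit the scaler on this fold's training part only
    X_tr = scaler.fit_transform(train_features[train_idx])
    X_val = scaler.transform(train_features[val_idx])
    model = build_model()  # rebuild the model from scratch for each fold
    model.fit(X_tr, train_labels[train_idx], epochs=200, batch_size=64, verbose=0)
    _, acc = model.evaluate(X_val, train_labels[val_idx], verbose=0)
    scores.append(acc)
print("mean accuracy: %.3f (+/- %.3f)" % (np.mean(scores), np.std(scores)))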

Can a neural network be configured to output a matrix in Keras?

I am working on predicting a time series in 3 dimensions and would like to know if it is possible to configure a model to output a matrix in Keras.
Currently, I have 3 regression models that I train one after the other, one for each output dimension. With a prediction horizon of 10 samples, for example, each model outputs a 10x1 vector. However, it seems this could be done much more efficiently with a single model.
Thank you
I have found a much better way to do this using the Keras core layer Reshape. For an output of size (prediction horizon) x (number of predicted variables), add a Reshape layer with that shape after the final Dense layer:
from keras.models import Sequential
from keras.layers import Dense, Reshape, LSTM
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_steps_out*n_features))
model.add(Reshape((n_steps_out,n_features)))
model.compile(optimizer='adam', loss='mse')
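With this setup, a single predict call yields the full matrix (a sketch; X is assumed to be an input array of shape (batch, n_steps_in, n_features)):
pred = model.predict(X)  # shape: (batch, n_steps_out, n_features)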
I figured out a pretty easy workaround: I just reshape the targets on the way in and reshape the predictions on the way out.
input_data = input_data.reshape((num_simulations,input_samples*3))
target_data = target_data.reshape((num_simulations,horizon*3))
model.fit(input_data, target_data, validation_split=0.2, epochs=epochs,
          batch_size=batch_size, verbose=0, shuffle=True)
prediction = model.predict(input_data, batch_size=batch_size)
prediction = prediction.reshape((num_simulations,horizon,3))

Keras model to predict probability distribution

We are trying to build a Keras model to predict a vector of probability rates from a vector of features. The output vector should contain probabilities that are between 0 and 1 and sum to 1, but somehow the output vector consists mostly of zeros and ones. Moreover, during training the loss and val_loss rates remain the same.
Does anyone know what the problem with our model is?
example of input vector:
(0,4,1444997,0,622,154536,0,2,11,0,5,11,10,32,4.26E-04,0,5,498,11,1,11,0,172,0,4,1,8,150)
example of expected output vector:
(0.25,0,0,0.083333333,0.583333333,0.083333333)
example of real output vector:
(1.000000000000000000e+00,5.556597260531319618e-28,1.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00)
the code:
# Create first network with Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU
from keras import optimizers
import numpy
X = numpy.loadtxt("compiledFeatures.csv", delimiter=",")
Y = numpy.loadtxt("naive_compiledDate.csv", delimiter=",")
# create model
model = Sequential()
model.add(Dense(20, input_dim=28, kernel_initializer='normal', activation='relu'))
model.add(Dense(15, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='sigmoid'))
# Compile model
model.compile(optimizer = "adam", loss = 'mae')
# Fit the model
model.fit(X, Y, epochs=2000, verbose=2, validation_split = 0.15)
# calculate predictions
predictions = model.predict(X)
The last activation function that guarantees the sum is 1 is "softmax".
Now, a frozen loss may be caused by "relu" in this case, where you have so few neurons in each layer (and also by an improper weight initialization).
I suggest that instead of relu you use "softplus", "tanh", or even "sigmoid".
EDIT:
As @nuric suggested, it's really a good idea to use "categorical_crossentropy" as the loss when you're using "softmax".
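Putting both suggestions together, the model could look like this (a sketch that keeps the layer sizes from the question; "tanh" is just one of the suggested replacements for relu):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(20, input_dim=28, kernel_initializer='normal', activation='tanh'))
model.add(Dense(15, kernel_initializer='normal', activation='tanh'))
model.add(Dense(6, kernel_initializer='normal', activation='tanh'))
# softmax guarantees outputs in [0, 1] that sum to 1
model.add(Dense(6, kernel_initializer='normal', activation='softmax'))
# categorical_crossentropy is the natural loss for a softmax output
model.compile(optimizer='adam', loss='categorical_crossentropy')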

Keras accuracy is not increasing over 50%

I am trying to build a binary classification algorithm (the output is 0 or 1) on a dataset that contains normal and malicious network packets.
The dataset columns (after converting IP numbers and hexadecimal values to decimal) are:
IP src, IP dest, ports, TTL, etc.
Note: the final column is the output.
And the Keras model is:
from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
import numpy
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
seed = 4
numpy.random.seed(seed)
dataset = numpy.loadtxt("NetworkPackets.csv", delimiter=",")
X = dataset[:, 0:11].astype(float)
Y = dataset[:, 11]
model = Sequential()
model.add(Dense(12, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dense(12, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit(X, Y, epochs=100, batch_size=5)
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
However, I have tried different optimizers, activation functions, and numbers of layers, but the accuracy reaches 0.5 at most:
[result screenshot]
I even tried grid search for the best parameters, but the maximum is 0.5.
Does anyone know why the output is always like that, and how I can enhance it?
Thanks in advance!
Your model isn't even outperforming a random-chance model (see the baseline check sketched below), so there must be something wrong with the data.
There are two possibilities:
1 - You aren't feeding your model enough training samples for it to identify the significant features that distinguish normal from malicious packets.
2 - The data itself is not informative enough to derive the decision you are looking for.
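A minimal baseline check, using the X/Y arrays from the question (Y holds the 0/1 labels):
import numpy
# accuracy of always predicting the majority class
p = numpy.mean(Y)  # fraction of positive (1) labels
baseline = max(p, 1 - p)
print("majority-class baseline accuracy: %.2f" % baseline)
# if the network can't beat this number, the features likely carry
# little signal, or there are too few samples to learn from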
