I am trying to build a binary classification algorithm (output is 0 or 1) on a dataset that contains normal and malicious network packets.
The dataset columns (after converting IP addresses and hexadecimal values to decimal) are:
IP src, IP dest, ports, TTL, etc.
Note: The final column is the output.
And the Keras model is:
from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
import numpy
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
seed = 4
numpy.random.seed(seed)
dataset = numpy.loadtxt("NetworkPackets.csv", delimiter=",")
X = dataset[:, 0:11].astype(float)
Y = dataset[:, 11]
model = Sequential()
model.add(Dense(12, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dense(12, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit(X, Y, epochs=100, batch_size=5)
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
However, I have tried different optimizers, activation functions, and numbers of layers, but the accuracy never exceeds 0.5.
I even tried a grid search for the best parameters, but the maximum is still 0.5.
Does anyone know why the output is always like that, and how can I improve it?
Thanks in advance!
Your model isn't even outperforming a random chance model, so there must be something wrong in the data.
There are two likely possibilities:
1 - You are not feeding enough training samples for the model to identify features that distinguish normal from malicious packets.
2 - The data itself is not informative enough to derive the decision you are looking for.
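As a quick sanity check (a minimal sketch, assuming the same NetworkPackets.csv layout as in your code, with 11 feature columns and the label in column 11), compare your 0.5 accuracy against the trivial majority-class baseline; if the two match, the network has learned nothing from the features:
import numpy
dataset = numpy.loadtxt("NetworkPackets.csv", delimiter=",")
Y = dataset[:, 11]
# Accuracy of a model that always predicts the majority class.
baseline = max(Y.mean(), 1 - Y.mean())
print("Positive rate: %.3f" % Y.mean())
print("Majority-class baseline accuracy: %.3f" % baseline)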
Related
I have trained an LSTM model to predict multiple output values.
The predicted values are almost the same even though the loss is low. Why is that, and how can I improve it?
from keras import backend as K
import math
from sklearn.metrics import mean_squared_error, mean_absolute_error
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, LSTM
def create_model():
    model = Sequential()
    # Three stacked LSTM layers; input_shape is (timesteps, features).
    model.add(LSTM(50, return_sequences=True, input_shape=(40000, 7)))
    model.add(LSTM(50, return_sequences=True))
    model.add(LSTM(50, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(2, activation='linear'))
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.summary()
    return model
model = create_model()
model.fit(X_train, Y_train, shuffle=False, verbose=1, epochs=10)
prediction = model.predict(X_test, verbose=0)
print(prediction)
prediction =
[[0.26766795 0.00193274]
[0.2676593 0.00192017]
[0.2676627 0.00193239]
[0.2676644 0.00192784]
[0.26766634 0.00193461]
[0.2676624 0.00192487]
[0.26766685 0.00193129]
[0.26766685 0.00193165]
[0.2676621 0.00193216]
[0.26766127 0.00192624]]
Calculating the mean relative error:
import tensorflow as tf
mean_relative_error = tf.reduce_mean(tf.abs((Y_test - prediction) / Y_test))
print(mean_relative_error)
The result: mean_relative_error = 1.9220362
It means the model is just pushing its output as close to the y values as it can, like a direct mapping x -> y. The relative error tells me that your y values are quite small, so when you take the mean difference between y_hat and y they come out close enough...
To break this symmetry you should increase the number of LSTM cells and add dropout, and also make sure to put an L1 regularization term on your Dense layers.
Decrease the number of neurons in each Dense layer while increasing the overall network size, and change your loss from "mean_squared_error" to "mean_absolute_error".
One more thing: use Adagrad with a learning rate of 1 instead of the Adam optimizer, as sketched below.
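A rough sketch of those changes (the exact layer sizes, dropout rate and L1 strength below are illustrative, not tuned):
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras import regularizers, optimizers
def create_model():
    model = Sequential()
    # More LSTM cells plus dropout to break the symmetry (sizes are illustrative).
    model.add(LSTM(100, return_sequences=True, input_shape=(40000, 7)))
    model.add(Dropout(0.2))
    model.add(LSTM(100, return_sequences=False))
    model.add(Dropout(0.2))
    # Fewer neurons in the Dense layers, each with an L1 regularization term.
    model.add(Dense(10, kernel_regularizer=regularizers.l1(0.01)))
    model.add(Dense(2, activation='linear', kernel_regularizer=regularizers.l1(0.01)))
    # Adagrad with a learning rate of 1 and mean absolute error instead of MSE.
    model.compile(optimizer=optimizers.Adagrad(lr=1.0), loss='mean_absolute_error')
    return model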
I want to create a neural network in Python, using Keras, that tells whether a number is even or odd. I know that this can be done in many ways and that using a NN for it is overkill, but I want to do it for educational purposes.
I'm running into an issue: the accuracy of my model is about 50%, which means it's unable to tell whether a number is even or odd.
I'll detail the steps I went through and hopefully we'll find a solution together :)
Step one, creation of the data and labels:
Basically my data are the numbers from 0 to 99 (in binary) and the labels are 0 (odd) and 1 (even).
import numpy as np
data = []
labels = []
for i in range(100):
    string = np.binary_repr(i, 8)
    array = []
    for k in string:
        array.append(int(k))
    array = np.array(array)
    data.append(array)
    labels.append(-1*(i%2 - 1))
Then I'm creating the model, which is made of 3 layers:
- Layer 1 (input): one neuron that takes any numpy array of size 8 (the 8-bit representation of an integer)
- Layer 2 (hidden): two neurons
- Layer 3 (output): one neuron
# creating a model
model = Sequential()
model.add(Dense(1, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
Then I'm compiling the model with binary_crossentropy as the loss function, since I want a binary classification of integers:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Then I'm training the model and evaluating it:
#training
model.fit(data, labels, epochs=10, batch_size=2)
#evaluate the model
scores = model.evaluate(data, labels)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
And that's where I'm lost, because of that 50% accuracy.
I think I misunderstood something about NNs or the Keras implementation, so any help would be appreciated.
Thank you for reading.
Edit: I modified my code according to the comment from Stefan Falk.
The following gives me an accuracy on the test set of 100%:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.python.keras import Sequential
from tensorflow.python.keras.layers import Dense
# Number of samples (digits from 0 to N-1)
N = 10000
# Input size depends on the number of digits
input_size = int(np.log2(N)) + 1
# Generate data
y = list()
X = list()
for i in range(N):
    binary_string = np.binary_repr(i, input_size)
    array = np.zeros(input_size)
    for j, binary in enumerate(binary_string):
        array[j] = int(binary)
    X.append(array)
    y.append(int(i % 2 == 0))
X = np.asarray(X)
y = np.asarray(y)
# Make train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
# Create the model
model = Sequential()
model.add(Dense(2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
# Train
model.fit(X_train, y_train, epochs=3, batch_size=10)
# Evaluate
print("Evaluating model:")
scores = model.evaluate(X_test, y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Why does it work that well?
Your problem is very simple. The network only needs to know whether the least significant bit is set (1) or not (0). For this you actually don't need a hidden layer or any non-linearities. The problem can be solved with simple logistic regression.
This
model = Sequential()
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
will do the job as well. Further, on the topic of feature engineering,
X = [v % 2 for v in range(N)]
is also enough as a feature. You'll see that X in that case carries exactly the same information as y (just with 0 and 1 swapped, given how y was generated above).
Maybe try a non-linear example such as XOR. Note that we do not have a test-set here because there's nothing to generalize or any "unseen" data which may surprise the network.
import numpy as np
from tensorflow.python.keras import Sequential
from tensorflow.python.keras.layers import Dense
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=1, epochs=1000)
print(model.predict_proba(X))
print(model.predict_proba(X) > 0.5)
Look at this link and play around with the example.
We are trying to build a Keras model that predicts a vector of probabilities from a vector of features. The output vector should contain probabilities that are between 0 and 1 and sum to 1, but somehow the output vector consists mostly of zeros and ones; moreover, during training the loss and val_loss remain the same.
Does anyone know what the problem with our model is?
example of input vector:
(0,4,1444997,0,622,154536,0,2,11,0,5,11,10,32,4.26E-04,0,5,498,11,1,11,0,172,0,4,1,8,150)
example of expected output vector:
(0.25,0,0,0.083333333,0.583333333,0.083333333)
example of real output vector:
(1.000000000000000000e+00,5.556597260531319618e-28,1.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00)
the code:
# Create first network with Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU
from keras import optimizers
import numpy
X = numpy.loadtxt("compiledFeatures.csv", delimiter=",")
Y = numpy.loadtxt("naive_compiledDate.csv", delimiter=",")
# create model
model = Sequential()
model.add(Dense(20, input_dim=28, init='normal', activation='relu'))
model.add(Dense(15, init='normal', activation='relu'))
model.add(Dense(6, init='normal', activation='relu'))
model.add(Dense(6, init='normal', activation='sigmoid'))
# Compile model
model.compile(optimizer = "adam", loss = 'mae')
# Fit the model
model.fit(X, Y, epochs=2000, verbose=2, validation_split = 0.15)
# calculate predictions
predictions = model.predict(X)
The only last-layer activation that guarantees the outputs sum to 1 is "softmax".
Now, a frozen loss may be caused by "relu" in this case, where you have so few neurons in each layer (and also by an improper weight initialization).
I suggest that instead of relu you use "softplus", "tanh" or even "sigmoid".
EDIT:
As @nuric suggested, it is really a good idea to use "categorical_crossentropy" as the loss when you're using "softmax".
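A minimal sketch of that change, assuming the same 28 input features and 6 outputs as in your code (the tanh hidden activations and layer sizes are just one reasonable choice):
from keras.models import Sequential
from keras.layers import Dense
# Softmax output so the six values are non-negative and sum to 1,
# paired with categorical_crossentropy as the loss.
model = Sequential()
model.add(Dense(20, input_dim=28, kernel_initializer='normal', activation='tanh'))
model.add(Dense(15, kernel_initializer='normal', activation='tanh'))
model.add(Dense(6, kernel_initializer='normal', activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')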
I am trying to get into machine learning with Keras.
I am not a Mathematician and I have only a basic understanding of how neural net-works (haha get it?), so go easy on me.
This is my current code:
from keras.utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# split into input (X) and output (Y) variables
X = []
Y = []
count = 0
while count < 10000:
    count += 1
    X += [count / 10000]
    numpy.random.seed(count)
    #Y += [numpy.random.randint(1, 101) / 100]
    Y += [(count + 1) / 100]
print(str(X) + ' ' + str(Y))
# create model
model = Sequential()
model.add(Dense(50, input_dim=1, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(50, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
# Compile model
opt = optimizers.SGD(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=100)
# evaluate the model
scores = model.evaluate(X, Y)
predictions = model.predict(X)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
print (str(predictions))
##plot_model(model, to_file='C:/Users/Markus/Desktop/model.png')
The accuracy stays zero and the predictions are an array of 1's. What am I doing wrong?
From what I can see, you are trying to solve a regression problem (floating-point function output) rather than a classification problem (one-hot vector style output, i.e. putting inputs into categories).
Your sigmoid final layer will only give an output between 0 and 1, which clearly limits your NN's ability to predict the desired range of Y values, which go up much higher. Your NN is trying to get as close as it can, but you are limiting it! Sigmoids in the output layer are good for single-class yes/no output, but not for regression.
So, you want your last layer to have a linear activation where the inputs are just summed. Something like this instead of the sigmoid.
model.add(Dense(1, kernel_initializer='lecun_normal', activation='linear'))
Then it will likely work, at least if the learning rate is low enough.
Google "keras regression" for useful links.
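For example, a minimal regression version of your model could look like this (the layer sizes and learning rate are illustrative, and the accuracy metric is dropped because it is not meaningful for regression):
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
model = Sequential()
model.add(Dense(50, input_dim=1, kernel_initializer='lecun_normal', activation='relu'))
model.add(Dense(50, kernel_initializer='lecun_normal', activation='relu'))
# Linear output layer so the network can predict values outside [0, 1].
model.add(Dense(1, kernel_initializer='lecun_normal', activation='linear'))
# Regression loss with a small learning rate.
model.compile(loss='mean_squared_error', optimizer=optimizers.SGD(lr=0.001))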
Looks like you are attempting to do binary classification, with a binary_crossentropy loss function. However, the class labels Y are floats. The labels should be 0 or 1. So the biggest problem lies in the input data you are feeding the model for training.
You can try some data that makes more sense, for example two classes where data are sampled from two different normal distributions, and the labels are either 0 or 1 for each observation:
import numpy as np
X = np.concatenate([np.random.randn(10000)/2, np.random.randn(10000)/2 + 1])
Y = np.concatenate([np.zeros(10000), np.ones(10000)])
The model should be able to go somewhere with this type of data.
I am trying to create a neural network with Keras (TensorFlow backend).
I have 4 input and 2 output variables.
I want to make predictions on a test set.
This is my Code:
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("trainingsdata.csv", delimiter=";")
X = dataset[:,0:4]
Y = dataset[:,4:6]
model = Sequential()
model.add(Dense(4, input_dim=4, init='uniform', activation='sigmoid'))
model.add(Dense(3, init='uniform', activation='sigmoid'))
model.add(Dense(2, init='uniform', activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
testset = numpy.loadtxt("testdata.csv", delimiter=";")
Z = testset[:,0:4]
predictions = model.predict(Z)
print(predictions)
When I run the script, the accuracy is 1.000 after every epoch, and I always get the same output for every input pair:
[-5.83297 68.2967]
[-5.83297 68.2967]
[-5.83297 68.2967]
...
Does anybody have an idea what the fault in my code is?
I suggest you normalize / standardize your data before feeding it to your model and then check if your model starts to learn.
Have a look at scikit-learn's StandardScaler.
And look into this SO thread to learn how to correctly fit_transform your training data and only transform your test data.
There is also this tutorial that makes use of scikit-learn's data preprocessing pipeline: http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
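A small sketch of that, assuming the X, Y, Z and model variables from your code; the scaler is fit on the training features only and then reused on the test set:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the training features only, then apply the same transform to the test set.
X_scaled = scaler.fit_transform(X)
Z_scaled = scaler.transform(Z)
model.fit(X_scaled, Y, epochs=150, batch_size=10, verbose=2)
predictions = model.predict(Z_scaled)
print(predictions)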
Neural networks have a tough time if the scales of the input variables are very different from each other. Feeding values on the order of 10, 1,000 and 100,000 into the same model causes the gradients to collapse towards whatever the large value is; the other values effectively don't provide any information.
One method is to simply rescale the input variables by a constant; you can simply divide the 206000 by 100000. Try to get all of the variables to around the same number of digits. Large numbers are a bit harder for networks than small numbers.
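For example (the column index and the constant are hypothetical; pick them from your own data):
# If the first column holds values around 200000 while the others are single
# digits, divide that column by a constant before training.
X[:, 0] = X[:, 0] / 100000.0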