I am trying to get into machine learning with Keras.
I am not a mathematician and I have only a basic understanding of how neural net-works (haha, get it?), so go easy on me.
This is my current code:
from keras.utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# split into input (X) and output (Y) variables
X = []
Y = []
count = 0
while count < 10000:
    count += 1
    X += [count / 10000]
    numpy.random.seed(count)
    #Y += [numpy.random.randint(1, 101) / 100]
    Y += [(count + 1) / 100]
print(str(X) + ' ' + str(Y))
# create model
model = Sequential()
model.add(Dense(50, input_dim=1, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(50, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
# Compile model
opt = optimizers.SGD(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=100)
# evaluate the model
scores = model.evaluate(X, Y)
predictions = model.predict(X)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
print (str(predictions))
##plot_model(model, to_file='C:/Users/Markus/Desktop/model.png')
The accuracy stays zero and the predictions are an array of 1's. What am I doing wrong?
From what I can see, you are trying to solve a regression problem (floating-point function output) rather than a classification problem (one-hot-style output that puts the input into categories).
Your sigmoid final layer will only give an output between 0 and 1, which clearly limits your NN's ability to predict the desired range of Y values, which go up much higher. Your NN is trying to get as close as it can, but you are limiting it! Sigmoids in the output layer are good for single-class yes/no output, but not for regression.
So you want your last layer to have a linear activation, where the inputs are just summed. Something like this instead of the sigmoid:
model.add(Dense(1, kernel_initializer='lecun_normal', activation='linear'))
Then it will likely work, at least if the learning rate is low enough.
Google "keras regression" for useful links.
Looks like you are attempting to do binary classification, with a binary_crossentropy loss function. However, the class labels Y are floats. The labels should be 0 or 1. So the biggest problem lies in the input data you are feeding the model for training.
You can try some data that makes more sense, for example two classes where data are sampled from two different normal distributions, and the labels are either 0 or 1 for each observation:
X = np.concatenate([np.random.randn(10000)/2, np.random.randn(10000)/2+1])
Y = np.concatenate([np.zeros(10000), np.ones(10000)])
The model should be able to go somewhere with this type of data.
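A sketch of how that data could be fed into the model from the question (the reshape to a column vector is an assumption about the input shape expected by a Dense layer with input_dim=1):
X = X.reshape(-1, 1)  # one feature per sample, shape (20000, 1)
model.fit(X, Y, epochs=150, batch_size=100)
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))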
I am using long short-term memory (LSTM) to generate predictions. I have noticed that each time I run the LSTM model, it generates slightly different predictions with the same data. I was wondering why this happens and whether there is something I am doing wrong.
Thank You
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
def LSTM_Model(Data, N_Steps, Epochs):
    # define input sequence
    raw_seq = Data
    # choose a number of time steps
    n_steps_og = N_Steps
    # split into samples
    X, y = split_sequence(raw_seq, n_steps_og)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_features = 1
    n_seq = 2
    n_steps = 2
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=Epochs, verbose=2)
    # create forecasting data: take the last n_steps_og days of the model data for the forecast
    Forcast_data = Data[len(Data) - n_steps_og:]
    # demonstrate prediction
    x_input = array(Forcast_data)
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = float(model.predict(x_input, verbose=0))
    return yhat
Many methods like this are initialized with random weights for the coefficients, and then search for a good local minimum of some loss function. This means they will (hopefully) find just one of the many nearly optimal solutions, but are unlikely to find the single best solution, or even to find the same solution repeatedly. Because of this, your results are typical, as long as your predictions are only slightly different.
This is more of a general machine learning question, rather than being specific to Python, but I hope this helps.
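If you need run-to-run reproducibility, one common approach is to fix the random seeds before building and training the model. A minimal sketch, assuming a TensorFlow backend (exact determinism can still depend on GPU ops and library versions):
import os
import random
import numpy as np
import tensorflow as tf

# seed every relevant random number generator before creating the model
os.environ['PYTHONHASHSEED'] = '0'
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)  # use tf.set_random_seed(42) on TensorFlow 1.x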
We are trying to build a Keras model to predict a vector of probability rates from a vector of features. The output vector should consist of probability rates that are between 0 and 1 and sum to 1, but somehow the output vector consists mostly of zeros and ones. Moreover, during the time the model should be training and learning, the loss and val_loss rates remain the same.
Does anyone know what the problem with our model is?
example of input vector:
(0,4,1444997,0,622,154536,0,2,11,0,5,11,10,32,4.26E-04,0,5,498,11,1,11,0,172,0,4,1,8,150)
example of expected output vector:
(0.25,0,0,0.083333333,0.583333333,0.083333333)
example of real output vector:
(1.000000000000000000e+00,5.556597260531319618e-28,1.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00)
the code:
# Create first network with Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU
from keras import optimizers
import numpy
X = numpy.loadtxt("compiledFeatures.csv", delimiter=",")
Y = numpy.loadtxt("naive_compiledDate.csv", delimiter=",")
# create model
model = Sequential()
model.add(Dense(20, input_dim=28, init='normal', activation='relu'))
model.add(Dense(15, init='normal', activation='relu'))
model.add(Dense(6, init='normal', activation='relu'))
model.add(Dense(6, init='normal', activation='sigmoid'))
# Compile model
model.compile(optimizer = "adam", loss = 'mae')
# Fit the model
model.fit(X, Y, epochs=2000, verbose=2, validation_split = 0.15)
# calculate predictions
predictions = model.predict(X)
The last activation function that guarantees the sum is 1 is "softmax".
Now, a frozen loss may be caused by "relu" in this case, where you have so few neurons in each layer (and also by an improper weight initialization).
I suggest that instead of relu you use "softplus", "tanh" or even "sigmoid".
EDIT:
As @nuric suggested, it's really a good idea to use "categorical_crossentropy" as the loss when you're using "softmax".
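A minimal sketch of the model with those suggestions applied (softmax output with categorical_crossentropy; the tanh hidden activations are just one of the alternatives suggested above):
model = Sequential()
model.add(Dense(20, input_dim=28, init='normal', activation='tanh'))
model.add(Dense(15, init='normal', activation='tanh'))
model.add(Dense(6, init='normal', activation='softmax'))  # softmax makes the outputs sum to 1
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X, Y, epochs=2000, verbose=2, validation_split=0.15)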
I am currently learning Keras. My goal is to create a simple model that predicts values of a function. First I create two arrays, one for the X-values and one for the corresponding Y-values.
# declare and init arrays for training-data
X = np.arange(0.0, 10.0, 0.05)
Y = np.empty(shape=0, dtype=float)
# Calculate Y-Values
for x in X:
    Y = np.append(Y, float(0.05*(15.72807*x - 7.273893*x**2 + 1.4912*x**3 - 0.1384615*x**4 + 0.00474359*x**5)))
Then I create and train the model
# model architecture
model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.add(Dense(5))
model.add(Dense(1, activation='linear'))
# compile model
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
# train model
model.fit(X, Y, epochs=150, batch_size=10)
and predict the values using the model
# declare and init arrays for prediction
YPredict = np.empty(shape=0, dtype=float)
# Predict Y
YPredict = model.predict(X)
# plot training-data and prediction
plt.plot(X, Y, 'C0')
plt.plot(X, YPredict, 'C1')
# show graph
plt.show()
and I get this output (blue is training-data, orange is prediction):
What did I do wrong? I guess it's a fundamental problem with the network-architecture, right?
The problem is indeed with your network architecture. Specifically, you are using linear activations in all layers: this means that the network can only fit linear functions. You should keep a linear activation in the output layer, but you should use a ReLU activation in the hidden layer:
model.add(Dense(1, input_shape=(1,)))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='linear'))
Then, play with the number/size of the hidden layers; I suggest you use a couple more.
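For example, a slightly deeper variant along those lines (the layer sizes here are only illustrative, something to experiment with):
model = Sequential()
model.add(Dense(32, input_shape=(1,), activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(X, Y, epochs=150, batch_size=10)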
On top of the answer provided by BlackBear:
You should normalize both your inputs X and your outputs Y before feeding them into your neural network:
# Feature Scaling (ignore possible warnings due to conversion of integers to floats)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X.reshape(-1, 1))  # StandardScaler expects 2D input
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y.reshape(-1, 1))
# [...]
model.fit(X_train, Y_train, ...)
See this answer to see what happens if you don't, in a regression setting very similar to yours. Keep in mind that you should similarly scale any test data using sc_X; also, if you later need to scale any predictions produced by the model back to the original scale of your Y, you should use
sc_Y.inverse_transform(predictions)
Accuracy has no meaning in a regression setting like yours; you should remove metrics=['accuracy'] from your model compilation (loss itself is enough as a metric here)
I am trying to build a binary classification algorithm (output is 0 or 1) on a dataset that contains normal and malicious network packets.
The dataset (after converting IP addresses and hex values to decimal) has columns such as:
IP src, IP dest, ports, TTL, etc.
Note: The final column is the output.
And the Keras model is:
from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
import numpy
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
seed = 4
numpy.random.seed(seed)
dataset = numpy.loadtxt("NetworkPackets.csv", delimiter=",")
X = dataset[:, 0:11].astype(float)
Y = dataset[:, 11]
model = Sequential()
model.add(Dense(12, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dense(12, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit(X, Y, nb_epoch=100, batch_size=5)
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
However, I have tried different optimizers, activation functions, and numbers of layers, but the accuracy reaches 0.5 at most:
[Result screenshot]
I even tried grid search for the best parameters, but the maximum is 0.5.
Does anyone know why the output is always like that, and how I can improve it?
Thanks in advance!
Your model isn't even outperforming a random chance model, so there must be something wrong in the data.
There may be two possibilities:
1 - You aren't feeding enough training samples to your model for it to identify features significant enough to distinguish between normal and malicious packets.
2 - The data itself is not informative enough to derive the decision you are looking for.
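As a quick sanity check (a sketch, assuming Y holds the 0/1 labels loaded above), it helps to look at the class balance first, since an accuracy around 0.5 on a balanced dataset means the model is doing no better than guessing:
import numpy

# class balance: how many 0s vs 1s are in the labels
values, counts = numpy.unique(Y, return_counts=True)
print(dict(zip(values, counts)))

# majority-class baseline accuracy, for comparison with the model's ~0.5
print("baseline accuracy: %.2f" % (counts.max() / counts.sum()))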
I want to practice Keras by coding an XOR, but the result is not right. The following is my code; thanks to everybody who helps me.
from keras.models import Sequential
from keras.layers.core import Dense,Activation
from keras.optimizers import SGD
import numpy as np
model = Sequential()# two layers
model.add(Dense(input_dim=2,output_dim=4,init="glorot_uniform"))
model.add(Activation("sigmoid"))
model.add(Dense(input_dim=4,output_dim=1,init="glorot_uniform"))
model.add(Activation("sigmoid"))
sgd = SGD(l2=0.0,lr=0.05, decay=1e-6, momentum=0.11, nesterov=True)
model.compile(loss='mean_absolute_error', optimizer=sgd)
print "begin to train"
list1 = [1,1]
label1 = [0]
list2 = [1,0]
label2 = [1]
list3 = [0,0]
label3 = [0]
list4 = [0,1]
label4 = [1]
train_data = np.array((list1,list2,list3,list4)) #four samples for epoch = 1000
label = np.array((label1,label2,label3,label4))
model.fit(train_data,label,nb_epoch = 1000,batch_size = 4,verbose = 1,shuffle=True,show_accuracy = True)
list_test = [0,1]
test = np.array((list_test,list1))
classes = model.predict(test)
print classes
Output
[[ 0.31851079]
 [ 0.34130159]]
[[ 0.49635666]
 [ 0.51274764]]
If I increase the number of epochs in your code to 50000 it does often converge to the right answer for me, just takes a little while :)
It does often get stuck, though. I get better convergence properties if I change your loss function to 'mean_squared_error', which is a smoother function.
I get still faster convergence if I use the Adam or RMSProp optimizers. My final compile line, which works:
model.compile(loss='mse', optimizer='adam')
...
model.fit(train_data, label, nb_epoch = 10000,batch_size = 4,verbose = 1,shuffle=True,show_accuracy = True)
I used a single hidden layer with 4 hidden nodes, and it almost always converges to the right answer within 500 epochs. I used sigmoid activations.
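A sketch of that setup (single hidden layer with 4 sigmoid nodes, mse loss, Adam), written with the current epochs argument rather than the deprecated nb_epoch:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.array([[1, 1], [1, 0], [0, 0], [0, 1]])
y = np.array([[0], [1], [0], [1]])

model = Sequential()
model.add(Dense(4, input_dim=2, activation='sigmoid'))  # single hidden layer, 4 nodes
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=500, batch_size=4, verbose=0)
print(model.predict(X))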
XOR training with Keras
Below is the minimal neural network architecture required to learn XOR, which should be a (2,2,1) network. While maths shows that a (2,2,1) network can solve the XOR problem, it doesn't show that a (2,2,1) network is easy to train. It can sometimes take a lot of epochs (iterations) or fail to converge to the global minimum. That said, I've easily got good results with (2,3,1) or (2,4,1) network architectures.
The problem seems to be related to the existence of many local minima. Look at this 1998 paper, «Learning XOR: exploring the space of a classic problem» by Richard Bland. Furthermore, initializing the weights with random numbers between 0.5 and 1.0 helps convergence.
It works fine with Keras or TensorFlow using the 'mean_squared_error' loss function, sigmoid activation and the Adam optimizer. Even with pretty good hyperparameters, I observed that the learned XOR model is trapped in a local minimum about 15% of the time.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from tensorflow.keras import initializers
import numpy as np
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
def initialize_weights(shape, dtype=None):
    return np.random.normal(loc=0.75, scale=1e-2, size=shape)
model = Sequential()
model.add(Dense(2,
                activation='sigmoid',
                kernel_initializer=initialize_weights,
                input_dim=2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])
print("*** Training... ***")
model.fit(X, y, batch_size=4, epochs=10000, verbose=0)
print("*** Training done! ***")
print("*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***")
print(model.predict_proba(X))
*** Training... ***
*** Training done! ***
*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***
[[0.08662204]
[0.9235283 ]
[0.92356336]
[0.06672956]]