How to use keras for XOR - python

I want to practice Keras by coding XOR, but the result is not right. My code is below; thanks to everybody for any help.
from keras.models import Sequential
from keras.layers.core import Dense,Activation
from keras.optimizers import SGD
import numpy as np
model = Sequential()# two layers
model.add(Dense(input_dim=2,output_dim=4,init="glorot_uniform"))
model.add(Activation("sigmoid"))
model.add(Dense(input_dim=4,output_dim=1,init="glorot_uniform"))
model.add(Activation("sigmoid"))
sgd = SGD(l2=0.0,lr=0.05, decay=1e-6, momentum=0.11, nesterov=True)
model.compile(loss='mean_absolute_error', optimizer=sgd)
print "begin to train"
list1 = [1,1]
label1 = [0]
list2 = [1,0]
label2 = [1]
list3 = [0,0]
label3 = [0]
list4 = [0,1]
label4 = [1]
train_data = np.array((list1,list2,list3,list4)) #four samples for epoch = 1000
label = np.array((label1,label2,label3,label4))
model.fit(train_data,label,nb_epoch = 1000,batch_size = 4,verbose = 1,shuffle=True,show_accuracy = True)
list_test = [0,1]
test = np.array((list_test,list1))
classes = model.predict(test)
print classes
Output
[[ 0.31851079] [ 0.34130159]] [[ 0.49635666] [0.51274764]]

If I increase the number of epochs in your code to 50000, it often does converge to the right answer for me; it just takes a little while :)
It does often get stuck, though. I get better convergence if I change your loss function to 'mean_squared_error', which is a smoother function.
I get even faster convergence if I use the Adam or RMSProp optimizers. My final compile line, which works:
model.compile(loss='mse', optimizer='adam')
...
model.fit(train_data, label, nb_epoch = 10000,batch_size = 4,verbose = 1,shuffle=True,show_accuracy = True)
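To illustrate why MSE is "smoother" than MAE, here is a small NumPy sketch (not Keras code; the function names are my own) of the per-sample gradient of each loss with respect to the prediction. The MAE gradient depends only on the sign of the error, so it keeps the same magnitude even when the prediction is almost right, while the MSE gradient shrinks in proportion to the error.

```python
import numpy as np

def mae_grad(pred, target):
    # d/dpred |pred - target| = sign(pred - target) (0 exactly at pred == target)
    return np.sign(pred - target)

def mse_grad(pred, target):
    # d/dpred (pred - target)**2 = 2 * (pred - target)
    return 2.0 * (pred - target)

target = np.zeros(3)
pred = np.array([0.5, 0.1, 0.01])  # predictions getting closer to the target

print(mae_grad(pred, target))  # magnitude stays at 1 for every sample
print(mse_grad(pred, target))  # magnitude shrinks along with the error
```

The constant-size MAE step is one plausible reason the original model keeps bouncing around instead of settling.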

I used a single hidden layer with 4 hidden nodes, and it almost always converges to the right answer within 500 epochs. I used sigmoid activations.

XOR training with Keras
Below is the minimal neural network architecture required to learn XOR, which should be a (2,2,1) network. Mathematics shows that a (2,2,1) network can solve the XOR problem, but it doesn't show that such a network is easy to train. Training can sometimes take a lot of epochs (iterations), or fail to converge to the global minimum. That said, I easily got good results with (2,3,1) or (2,4,1) network architectures.
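The claim that a (2,2,1) network can represent XOR is easy to check with hand-picked weights. In this NumPy sketch (the weights are chosen by hand for illustration, not learned by Keras), one hidden sigmoid unit approximates OR, the other approximates NAND, and the output unit approximates AND of the two, which is exactly XOR.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hidden layer (2 units): column 0 ~ OR(x1, x2), column 1 ~ NAND(x1, x2)
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([-10.0, 30.0])

# Output unit ~ AND(OR, NAND), which equals XOR(x1, x2)
W2 = np.array([[20.0], [20.0]])
b2 = np.array([-30.0])

hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)
print(np.round(output).astype(int).ravel())  # [0 1 1 0]
```

Such a solution exists, but gradient descent still has to find it from a random start, which is where the local minima discussed below come in.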
The problem seems to be related to the existence of many local minima. Look at this 1998 paper, «Learning XOR: exploring the space of a classic problem» by Richard Bland. Furthermore, initializing the weights with random numbers between 0.5 and 1.0 helps convergence.
It works fine with Keras or TensorFlow using loss function 'mean_squared_error', sigmoid activation and Adam optimizer. Even with pretty good hyperparameters, I observed that the learned XOR model is trapped in a local minimum about 15% of the time.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
def initialize_weights(shape, dtype=None):
    return np.random.normal(loc = 0.75, scale = 1e-2, size = shape)
model = Sequential()
model.add(Dense(2,
                activation='sigmoid',
                kernel_initializer=initialize_weights,
                input_dim=2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])
print("*** Training... ***")
model.fit(X, y, batch_size=4, epochs=10000, verbose=0)
print("*** Training done! ***")
print("*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***")
# The sigmoid output is already a probability (predict_proba is not available in recent Keras versions)
print(model.predict(X))
*** Training... ***
*** Training done! ***
*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***
[[0.08662204]
[0.9235283 ]
[0.92356336]
[0.06672956]]
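The text above mentions initializing with random numbers between 0.5 and 1.0, while the code draws from a narrow normal distribution around 0.75. If you want the uniform range literally, a drop-in replacement for `initialize_weights` could look like this NumPy sketch, using the same `(shape, dtype=None)` signature as the answer's initializer. The function name and the fixed seed are my own assumptions, and like the original it ignores `dtype`.

```python
import numpy as np

_rng = np.random.default_rng(0)  # hypothetical fixed seed, for reproducibility

def initialize_weights_uniform(shape, dtype=None):
    # Draw weights uniformly from [0.5, 1.0). `dtype` is accepted for Keras
    # compatibility but ignored; the result is cast to float32, Keras' default.
    return _rng.uniform(0.5, 1.0, size=shape).astype("float32")
```

It would be passed as `kernel_initializer=initialize_weights_uniform` in the first `Dense` layer, exactly where the answer passes `initialize_weights`.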

Related

Why does my Keras CNN always guess one number? Why is my Loss so high?

I've been working on this CNN. In the test() function it always predicts the same number (for example, always outputting 8 even when the digit is nothing like an 8). I've tried training the model more, in case it just wasn't good enough yet. Here is my code:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv2D, Dropout, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.utils import to_categorical
from matplotlib import pyplot as plt
(Train_Data, Train_Labels), (Test_Data, Test_Labels) = tf.keras.datasets.mnist.load_data()
Train_Data = Train_Data.reshape(60000,28,28,1)
Test_Data = Test_Data.reshape(10000,28,28,1)
Train_Data = Train_Data / 255 - 0.5
Test_Data = Test_Data / 255 - 0.5
def load(name):
    net = keras.models.load_model(name)
    return net
def save(name):
    model.save(name)
    print("""
###:::SAVING MODEL:::###
""")
def makeCNN():
    model = keras.Sequential()
    model.add(Conv2D(32, kernel_size=3, activation='relu'))
    model.add(MaxPooling2D(pool_size=(3,3)))
    model.add(keras.layers.Flatten())
    model.add(Dense(9, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
    return model
def train(epochs):
    for i in range(epochs):
        print(i+1)
        model.fit(Train_Data, Train_Labels)
    save('CNN.h5')
def test():
    validCorrect = 0
    validTotal = 0
    print(Test_Data.shape)
    for i in range(1000):
        data = Test_Data[i]
        data = data.reshape(1,28,28,1)
        prediction = model.predict(data)
        validTotal +=1
        if np.argmax(prediction) == Test_Labels[i]:
            validCorrect+=1
        print(f"""
TOTAL:{validTotal}
ACCURACY:{(validCorrect/validTotal)*100}
CORRECT:{validCorrect}
""")
        print(f"GUESS:{np.argmax(prediction)}\nREALITY:{Test_Labels[i]}")
model = makeCNN()
train(80)
test()
Any help is appreciated. Thanks! I'm pretty new to machine learning (especially CNNs).
Firstly, you should use categorical_crossentropy as your loss. It's tempting to use MSE (we're dealing with digits, after all), but since this is a classification task, the model doesn't know about the supposed ordinality of the different digits. It just knows them as "ten different classes of image". For example, is a 7 more similar to a 2 or an 8? In terms of ordinality, it's closer to 8. But the digit looks rather more like a 2, doesn't it?
Also, I'm guessing that your model is likely to under-fit quite severely, because it is not deep enough. You can try adding some more convolutional layers to your network. You could draw inspiration from this example in the Keras documentation (also on the MNIST dataset) here https://keras.io/examples/mnist_cnn/ where they achieve >99% on this problem with just a couple of extra convolutional layers, and some techniques to reduce overfitting, such as dropout.
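One detail worth adding to the loss advice: the question's `Train_Labels` are integer class ids (0 to 9), whereas Keras' `'categorical_crossentropy'` expects one-hot targets, for example produced by `to_categorical` (already imported in the question). Keras also provides `'sparse_categorical_crossentropy'`, which takes the integer ids directly. The NumPy sketch below is my own illustration of the formula, not Keras' implementation; it shows that both forms compute the same value.

```python
import numpy as np

def categorical_crossentropy(one_hot, probs):
    # Mean over samples of -sum_k y_k * log(p_k), with one-hot targets y
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(labels, probs):
    # Same loss, but picks log(p) of the true class by integer index
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))

# Two samples, three classes; each row is a softmax output summing to 1
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])
one_hot = np.eye(3)[labels]  # integer ids -> one-hot rows

print(categorical_crossentropy(one_hot, probs))
print(sparse_categorical_crossentropy(labels, probs))
```

In the question's `makeCNN()`, that would mean either `model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])`, or keeping `'categorical_crossentropy'` and passing `to_categorical(Train_Labels)` to `fit`.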

Keras: unsupervised pre-training kills performance

I'm trying to train a deep classifier in Keras both with and without pretraining of the hidden layers via stacked autoencoders. My problem is that the pretraining seems to drastically degrade performance (i.e. if pretrain is set to False in the code below, the training error of the final classification layer converges much faster). This seems completely outrageous to me, given that pretraining should only initialize the weights of the hidden layers, and I don't see how that could completely kill the model's performance even if that initialization does not work very well. I cannot include the specific dataset I used, but the effect should occur for any appropriate dataset (e.g. MNIST). What is going on here, and how can I fix it?
EDIT: The code is now reproducible with the MNIST data. The final line prints the change in the loss function, which is significantly lower with pre-training.
I have also slightly modified the code and added sample learning curves below:
from functools import partial
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.regularizers import l2
from keras.utils import to_categorical
(inputs_train, targets_train), _ = mnist.load_data()
inputs_train = inputs_train[:1000].reshape(1000, 784)
targets_train = to_categorical(targets_train[:1000])
hidden_nodes = [256] * 4
learning_rate = 0.01
regularization = 1e-6
epochs = 30
def train_model(pretrain):
    model = Sequential()
    layer = partial(Dense,
                    activation='sigmoid',
                    kernel_initializer='random_normal',
                    kernel_regularizer=l2(regularization))
    for i, hn in enumerate(hidden_nodes):
        kwargs = dict(units=hn, name='hidden_{}'.format(i + 1))
        if i == 0:
            kwargs['input_dim'] = inputs_train.shape[1]
        model.add(layer(**kwargs))
    if pretrain:
        # train autoencoders
        inputs_train_ = inputs_train.copy()
        for i, hn in enumerate(hidden_nodes):
            autoencoder = Sequential()
            autoencoder.add(layer(units=hn,
                                  input_dim=inputs_train_.shape[1],
                                  name='hidden'))
            autoencoder.add(layer(units=inputs_train_.shape[1],
                                  name='decode'))
            autoencoder.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
                                loss='binary_crossentropy')
            autoencoder.fit(
                inputs_train_,
                inputs_train_,
                batch_size=32,
                epochs=epochs,
                verbose=0)
            autoencoder.pop()
            model.layers[i].set_weights(autoencoder.layers[0].get_weights())
            inputs_train_ = autoencoder.predict(inputs_train_)
    num_classes = targets_train.shape[1]
    model.add(Dense(units=num_classes,
                    activation='softmax',
                    name='classify'))
    model.compile(optimizer=SGD(lr=learning_rate, momentum=0.9),
                  loss='categorical_crossentropy')
    h = model.fit(
        inputs_train,
        targets_train,
        batch_size=32,
        epochs=epochs,
        verbose=0)
    return h.history['loss']
plt.plot(train_model(pretrain=False), label="Without Pre-Training")
plt.plot(train_model(pretrain=True), label="With Pre-Training")
plt.xlabel("Epoch")
plt.ylabel("Cross-Entropy")
plt.legend()
plt.show()

Keras: Accuracy stays zero

I am trying to get into machine learning with Keras.
I am not a Mathematician and I have only a basic understanding of how neural net-works (haha get it?), so go easy on me.
This is my current code:
from keras.utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# split into input (X) and output (Y) variables
X = []
Y = []
count = 0
while count < 10000:
    count += 1
    X += [count / 10000]
    numpy.random.seed(count)
    #Y += [numpy.random.randint(1, 101) / 100]
    Y += [(count + 1) / 100]
print(str(X) + ' ' + str(Y))
# create model
model = Sequential()
model.add(Dense(50, input_dim=1, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(50, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
# Compile model
opt = optimizers.SGD(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=100)
# evaluate the model
scores = model.evaluate(X, Y)
predictions = model.predict(X)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
print (str(predictions))
##plot_model(model, to_file='C:/Users/Markus/Desktop/model.png')
The accuracy stays zero and the predictions are an array of 1's. What am I doing wrong?
From what I can see you are trying to solve a regression problem (floating point function output) rather than a classification problem (one hot vector style output/put input into categories).
Your sigmoid final layer will only give an output between 0 and 1, which clearly limits your NN's ability to predict the desired range of Y values, which go much higher (up to about 100). Your NN is trying to get as close as it can, but you are limiting it! Sigmoids in the output layer are good for single-class yes/no output, but not for regression.
So, you want your last layer to have a linear activation, where the output is just the weighted sum of the inputs. Something like this instead of the sigmoid:
model.add(Dense(1, kernel_initializer='lecun_normal', activation='linear'))
Then it will likely work, at least if the learning rate is low enough.
Google "keras regression" for useful links.
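It is also worth noting what the question's loop actually generates: `X = count / 10000` and `Y = (count + 1) / 100`, so `Y = 100 * X + 0.01` exactly, with values up to 100.01. This NumPy sketch rebuilds the same data with vectorised code (my own rewrite of the loop) and shows that a straight line fits it perfectly, which is precisely what a linear output unit can represent and a sigmoid cannot.

```python
import numpy as np

count = np.arange(1, 10001)   # the loop visits count = 1 .. 10000
X = count / 10000
Y = (count + 1) / 100

slope, intercept = np.polyfit(X, Y, 1)  # least-squares straight line
print(slope, intercept)                 # approximately 100 and 0.01
print(Y.max())                          # far outside a sigmoid's (0, 1) range
```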
Looks like you are attempting to do binary classification, with a binary_crossentropy loss function. However, the class labels Y are floats. The labels should be 0 or 1. So the biggest problem lies in the input data you are feeding the model for training.
You can try some data that makes more sense, for example two classes where data are sampled from two different normal distributions, and the labels are either 0 or 1 for each observation:
X = np.concatenate([np.random.randn(10000)/2, np.random.randn(10000)/2+1])
Y = np.concatenate([np.zeros(10000), np.ones(10000)])
The model should be able to go somewhere with this type of data.
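As a quick sanity check that this synthetic data really is learnable, here is a NumPy sketch that builds the answer's two classes (the seeded generator is my own addition) and classifies with a single threshold at 0.5, halfway between the two means. Because each class has standard deviation 0.5 and the means are 1 apart, the classes overlap; for two equally likely, equal-variance Gaussians the midpoint threshold is the best possible rule, with an expected accuracy of about 84% (the probability that a standard normal value is below 1). That is a reasonable target for the Keras model as well.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
X = np.concatenate([rng.standard_normal(n) / 2, rng.standard_normal(n) / 2 + 1])
Y = np.concatenate([np.zeros(n), np.ones(n)])

# Predict class 1 when the sample is closer to the class-1 mean (1.0) than to 0.0
pred = (X > 0.5).astype(float)
accuracy = np.mean(pred == Y)
print(f"threshold accuracy: {accuracy:.3f}")
```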

Keras accuracy is not increasing over 50%

I am trying to build a binary classification algorithm (output is 0 or 1) on a dataset that contains normal and malicious network packets.
The dataset columns (after converting IP addresses and hexadecimal values to decimal) are:
IP src, IP dest, ports, TTL, etc.
Note: The final column is the output.
And the Keras model is:
from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
import numpy
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
seed = 4
numpy.random.seed(seed)
dataset = numpy.loadtxt("NetworkPackets.csv", delimiter=",")
X = dataset[:, 0:11].astype(float)
Y = dataset[:, 11]
model = Sequential()
model.add(Dense(12, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dense(12, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='relu'))
model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit(X, Y, nb_epoch=100, batch_size=5)
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
I tried different optimizers, activation functions and numbers of layers, but the accuracy reaches 0.5 at most:
Result
I even tried a grid search for the best parameters, but the maximum is still 0.5.
Does anyone know why the output is always like that, and how I can improve it?
Thanks in advance!
Your model isn't even outperforming a random chance model, so there must be something wrong in the data.
There are two likely possibilities:
1 - You don't feed enough training samples to your model for it to identify features that distinguish normal from malicious packets.
2 - The data itself is not informative enough to derive the decision you are looking for.
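To make "random chance" concrete, it helps to compare the model against a baseline that always predicts the most frequent class. The sketch below uses a hypothetical helper and made-up labels; if the packet dataset is roughly balanced, this baseline is about 0.5, which is exactly where the model in the question is stuck.

```python
import numpy as np

def majority_baseline(labels):
    # Accuracy obtained by always predicting the most common label
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()

example_labels = np.array([0, 1, 1, 0, 1, 0, 0, 1])  # made-up, balanced labels
print(majority_baseline(example_labels))  # 0.5
```

With the question's data this would be `majority_baseline(Y)`; a model that cannot beat that number has learned nothing useful from the features.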

XOR not learned using keras v2.0

For some time I have been getting pretty bad results with Keras, and I haven't been that suspicious of the tool. But I am beginning to be a bit concerned now.
I tried to see whether it could handle a simple XOR problem, and after 30000 epochs it still hasn't solved it...
code:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
import numpy as np
np.random.seed(100)
model = Sequential()
model.add(Dense(2, input_dim=2))
model.add(Activation('tanh'))
model.add(Dense(1, input_dim=2))
model.add(Activation('sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, nb_epoch=30000, batch_size=1,verbose=1)
print(model.predict_classes(X))
Here is part of my result:
4/4 [==============================] - 0s - loss: 0.3481
Epoch 29998/30000
4/4 [==============================] - 0s - loss: 0.3481
Epoch 29999/30000
4/4 [==============================] - 0s - loss: 0.3481
Epoch 30000/30000
4/4 [==============================] - 0s - loss: 0.3481
4/4 [==============================] - 0s
[[0]
[1]
[0]
[0]]
Is there something wrong with the tool - or am I doing something wrong??
Version I am using:
MacBook-Pro:~ usr$ python -c "import keras; print keras.__version__"
Using TensorFlow backend.
2.0.3
MacBook-Pro:~ usr$ python -c "import tensorflow as tf; print tf.__version__"
1.0.1
MacBook-Pro:~ usr$ python -c "import numpy as np; print np.__version__"
1.12.0
Updated version:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, SGD
import numpy as np
#np.random.seed(100)
model = Sequential()
model.add(Dense(units = 2, input_dim=2, activation = 'relu'))
model.add(Dense(units = 1, activation = 'sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer='adam')
print model.summary()
model.fit(X, y, nb_epoch=5000, batch_size=4,verbose=1)
print(model.predict_classes(X))
I cannot add a comment to Daniel's response as I don't have enough reputation, but I believe he's on the right track. While I have not personally tried running XOR with Keras, here's an article that might be interesting: it analyzes the various regions of local minima for a 2-2-1 network, showing that higher numerical precision would lead to fewer instances of gradient descent getting stuck.
The Local Minima of the Error Surface of the 2-2-1 XOR Network (Ida G. Sprinkhuizen-Kuyper and Egbert J.W. Boers)
On a side note, I wouldn't consider using a 2-4-1 network to be overfitting the problem. Having 4 linear cuts on the 0-1 plane (cutting it into a 2x2 grid) instead of 2 cuts (cutting the corners off diagonally) just separates the data in a different way, but since we only have 4 data points and no noise in the data, the neural network that uses 4 linear cuts isn't describing "noise" instead of the XOR relationship.
I think it's a "local minimum" in the loss function.
Why?
I have run this same code over and over a few times, and sometimes it gets it right, sometimes it gets stuck on a wrong result. Notice that this code "recreates" the model every time I run it. (If I keep training a model that has found the wrong result, it simply stays there forever.)
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
m = Sequential()
m.add(Dense(2,input_dim=2, activation='tanh'))
#m.add(Activation('tanh'))
m.add(Dense(1,activation='sigmoid'))
#m.add(Activation('sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]],'float32')
Y = np.array([[0],[1],[1],[0]],'float32')
m.compile(optimizer='adam',loss='binary_crossentropy')
m.fit(X,Y,batch_size=1,epochs=20000,verbose=0)
print(m.predict(X))
Running this code, I have found some different outputs:
Wrong: [[ 0.00392423], [ 0.99576807], [ 0.50008368], [ 0.50008368]]
Right: [[ 0.08072935], [ 0.95266515], [ 0.95266813], [ 0.09427474]]
What conclusion can we take from it?
The optimizer is not dealing properly with this local minimum. If it gets lucky (a good weight initialization), it will fall into a good minimum and produce the right results.
If it gets unlucky (a bad weight initialization), it will fall into a local minimum without knowing that there are better places in the loss function, and its learning rate is simply not big enough to escape this minimum. The small gradient keeps circling around the same point.
If you take the time to study which gradients appear in the wrong case, you will probably see that they keep pointing towards that same point, and increasing the learning rate a little may let it escape the hole.
My intuition is that very small models like this one have more prominent local minima.
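To reproduce this seed dependence without Keras, here is a from-scratch NumPy sketch of the same 2-2-1 architecture (tanh hidden layer, sigmoid output, binary cross-entropy), trained with plain full-batch gradient descent from several random initializations. The learning rate, epoch count and initialization scale are my own assumptions, and how many seeds end up solving XOR depends on them, so I don't quote a number; the point is only that identical code can end in different minima depending on the starting weights.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(seed, hidden=2, lr=0.5, epochs=5000):
    """Train a 2-`hidden`-1 tanh/sigmoid net with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, size=(2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)               # hidden activations, shape (4, hidden)
        p = sigmoid(h @ W2 + b2)               # predicted probabilities, shape (4, 1)
        d_logit = (p - y) / len(X)             # gradient of mean BCE w.r.t. the output logit
        d_h = (d_logit @ W2.T) * (1.0 - h**2)  # backpropagate through tanh (before updating W2)
        W2 -= lr * (h.T @ d_logit)
        b2 -= lr * d_logit.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

def solves_xor(probs):
    # True when thresholding at 0.5 reproduces the XOR truth table
    return bool(np.array_equal(probs > 0.5, y.ravel() > 0.5))

results = [solves_xor(train_xor(seed)) for seed in range(10)]
print(results)  # which seeds reached a correct solution
```

Passing `hidden=4` lets you compare against the (2,4,1) networks that other answers found easier to train.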
Instead of just increasing the number of epochs, try using relu for the activation of your hidden layer instead of tanh. Making only that change to the code you provide, I am able to obtain the following result after only 2000 epochs (Theano backend):
import numpy as np
print(np.__version__) #1.11.3
import keras
print(keras.__version__) # 2.0.2
import theano
print(theano.__version__) # 0.9.0
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, SGD
np.random.seed(100)
model = Sequential()
model.add(Dense(units = 2, input_dim=2, activation = 'relu'))
model.add(Dense(units = 1, activation = 'sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=2000, batch_size=1,verbose=0)
print(model.evaluate(X,y))
print(model.predict_classes(X))
4/4 [==============================] - 0s
0.118175707757
4/4 [==============================] - 0s
[[0]
[1]
[1]
[0]]
It would be easy to conclude that this is due to the vanishing gradient problem. However, the simplicity of this network suggests that this isn't the case. Indeed, if I change the optimizer from 'adam' to SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False) (the default values), I see the following result after 5000 epochs with tanh activation in the hidden layer.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, SGD
np.random.seed(100)
model = Sequential()
model.add(Dense(units = 2, input_dim=2, activation = 'tanh'))
model.add(Dense(units = 1, activation = 'sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer=SGD())
model.fit(X, y, epochs=5000, batch_size=1,verbose=0)
print(model.evaluate(X,y))
print(model.predict_classes(X))
4/4 [==============================] - 0s
0.0314897596836
4/4 [==============================] - 0s
[[0]
[1]
[1]
[0]]
Edit: 5/17/17 - Included complete code to enable reproduction
