How to do reverse prediction in python (keras)?

I have a classification neural network which I trained on a very large dataset, with a fair amount of accuracy. Now I want to reverse the prediction (normal prediction: input -> output; reverse prediction: output -> input).
This is an example neural network made for this question. Here I take natural numbers and try to classify them as odd or even using a neural network.
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
#model
model = Sequential()
model.add(Dense(1,input_dim=1))
model.add(Dense(4))
model.add(Dense(4))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])
model.summary()
#creating natural number features and binary odd and even targets
n=500
#natural numbers
x=np.arange(1,n)
x=x.reshape(n-1,1)
#even
y1=np.where(x%2==0,1,0)
#odd
y2=np.where(x%2!=0,1,0)
#target
Y=np.column_stack([y1,y2])
#model training
model.fit(x,Y,validation_split=0.3)
y_dash=np.arange(n,n+10).reshape(-1,1)
#model prediction
model.predict(y_dash)
Its accuracy is 50% (which is more than enough for this sample ANN):
349/349 [==============================] - 0s - loss: 0.7077 - acc: 0.4871 - val_loss: 0.6991 - val_acc: 0.5000
How can I reverse the prediction of this neural network?
For example, I want this neural network to output some odd numbers.

Related

Neural Network with an input array of size 4 and output array of size 8

I am trying to process the data obtained from YOLO v5, which is an array of 4 values (posX, posY, SizX, Sizy) per object detected. I know several detections are related, and I want a neural network to find this relation. For every input array, it should return as output a 2x4 matrix, or, flattened, an array of size 8. I am training on 4017 samples using a Keras Sequential model:
model = Sequential()
model.add(layers.Dense(256, activation="relu", name="layer1"))
model.add(Dropout(0.5))
model.add(BatchNormalization())
model.add(layers.Dense(592, activation="relu", name="layer2"))
model.add(Dense(8))
model.add(Activation('softmax'))
sgd = tf.keras.optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
hist=model.fit(X, y, batch_size=48, epochs=50, validation_split=0.2)
But the results I am getting are not good:
Epoch 20/20
80/80 [==============================] - 1s 10ms/step - loss: 0.5413 - accuracy: 0.9963 - val_loss: 0.5414 - val_accuracy: 0.9937
Where the predictions are:
Input: [0.50070833 0.50070833 0.42683333 0.22983333]
Expected Output: [[0.591 0.50070833 0.04514583 0.25035417]
[0.50070833 0.34475 0.44735417 0.04514583]]
NeuralNetwork Output: [[0.28618604 0.18969838 0.00889739 0.06283922]
[0.18952993 0.09904755 0.15489812 0.00890343]]
Adding/suppressing layers/BatchNormalization/Dropout does not make any difference, and changing the loss function/optimizer only worsens the result. Do you have any advice or solution to this problem?
Softmax activation at the output is used for classification problems; you do not seem to be doing classification, but regression (since the output is continuous, not discrete).
You should then change your activation at the output to something appropriate, like linear (general solution) or sigmoid (if the targets are in the [0, 1] range).
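For instance, the suggested fix could look like the sketch below (a minimal stand-in: the data is random, the layer sizes are illustrative, and sigmoid assumes the targets lie in [0, 1]; otherwise use 'linear'):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy stand-in data with the question's shapes: 4 inputs -> 8 outputs
X = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 8).astype("float32")

model = Sequential([
    Dense(256, activation="relu", input_dim=4),
    # regression head: sigmoid (targets in [0, 1]) instead of softmax
    Dense(8, activation="sigmoid"),
])
# a regression loss instead of binary_crossentropy
model.compile(loss="mse", optimizer="adam")
model.fit(X, y, epochs=2, verbose=0)
print(model.predict(X[:1], verbose=0).shape)  # (1, 8)
```

With softmax the 8 outputs are forced to sum to 1, which is why the predicted rows come out systematically smaller than the targets.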

Big difference between val-acc and prediction accuracy in Keras Neural Network

I have a dataset that I used to build an NN model in Keras. I took 2000 rows from that dataset as validation data; those 2000 rows are what I pass to the .predict function.
I wrote the Keras NN code and for now it works, but I noticed something very strange. It gives me very good accuracy of more than 83% and loss around 0.12, but when I make predictions on unseen data (those 2000 rows), it is only correct about 65% of the time on average.
When I add a Dropout layer, it only decreases accuracy.
Then I added EarlyStopping, which gave me accuracy around 86% and loss around 0.10, but when I predict on unseen data I still get a final prediction accuracy of 67%.
Does this mean the model makes correct predictions in about 87% of cases? My logic is: if I pass 100 samples to my .predict function, the program should predict correctly for about 87/100 samples, or somewhere in that range (say, more than 80). I have tried passing 100, 500, 1000, 1500 and 2000 samples to my .predict function, and it always predicts correctly for 65-68% of the samples.
Why is that, am I doing something wrong?
I have tried to play with number of layers, number of nodes, with different activation functions and with different optimizers but it only changes the results by 1-2%.
My dataset looks like this:
DataFrame shape (59249, 33)
x_train shape (47399, 32)
y_train shape (47399,)
x_test shape (11850, 32)
y_test shape (11850,)
testing_features shape (1000, 32)
This is my NN model:
model = Sequential()
model.add(Dense(64, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dropout(0.2))
model.add(Dense(32, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1
# compile the model, adam gradient descent (optimized)
model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])
# call the function to fit to the data training the network)
es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=1, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=32, validation_data=(x_test, y_test), verbose=2, callbacks=[es])
scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))
These are the results:
Train on 47399 samples, validate on 11850 samples
Epoch 1/15
- 25s - loss: 0.3648 - acc: 0.8451 - val_loss: 0.2825 - val_acc: 0.8756
Epoch 2/15
- 9s - loss: 0.2949 - acc: 0.8689 - val_loss: 0.2566 - val_acc: 0.8797
Epoch 3/15
- 9s - loss: 0.2741 - acc: 0.8773 - val_loss: 0.2468 - val_acc: 0.8849
Epoch 4/15
- 9s - loss: 0.2626 - acc: 0.8816 - val_loss: 0.2416 - val_acc: 0.8845
Epoch 5/15
- 10s - loss: 0.2566 - acc: 0.8827 - val_loss: 0.2401 - val_acc: 0.8867
Epoch 6/15
- 8s - loss: 0.2503 - acc: 0.8858 - val_loss: 0.2364 - val_acc: 0.8893
Epoch 7/15
- 9s - loss: 0.2480 - acc: 0.8873 - val_loss: 0.2321 - val_acc: 0.8895
Epoch 8/15
- 9s - loss: 0.2450 - acc: 0.8886 - val_loss: 0.2357 - val_acc: 0.8888
11850/11850 [==============================] - 2s 173us/step
loss 23.57 acc 88.88
And this is for prediction:
#testing_features are 2000 rows that i extracted from dataset (these samples are not used in training, this is separate dataset thats imported)
prediction = model.predict(testing_features, batch_size=32)
res = []
for p in prediction:
    res.append(p[0].round(0))
# Accuracy with sklearn - also much lower
acc_score = accuracy_score(testing_results, res)
print("Sklearn acc", acc_score)
result_df = pd.DataFrame({"label": testing_results,
                          "prediction": res})
result_df["prediction"] = result_df["prediction"].astype(int)
s = 0
for x, y in zip(result_df["label"], result_df["prediction"]):
    if x == y:
        s += 1
print(s, "/", len(result_df))
acc = s*100/len(result_df)
print('TOTAL ACC:', round(acc, 2))
The problem is... now I get 52% accuracy with sklearn and 52% with my own count.
Why do I get such low accuracy on unseen data, when the reported validation accuracy is so much higher?
The training data you posted gives high validation accuracy, so I'm a bit confused as to where you get that 65% from, but in general when your model performs much better on training data than on unseen data, that means you're overfitting. This is a big and recurring problem in machine learning, and there is no method guaranteed to prevent it, but there are a couple of things you can try:
regularizing the weights of your network, e.g. using l2 regularization
using stochastic regularization techniques such as drop-out during training
early stopping
reducing model complexity (but you say you've already tried this)
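The first three suggestions above could be combined in Keras roughly like this (a sketch on random stand-in data with the question's 32 features; layer sizes and hyperparameters are illustrative, not tuned):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping

# Random stand-in data shaped like the question's features
X = np.random.rand(200, 32).astype("float32")
y = np.random.randint(0, 2, size=(200,)).astype("float32")

model = Sequential([
    # l2 weight regularization on the hidden layer
    Dense(64, activation="relu", input_dim=32, kernel_regularizer=l2(1e-3)),
    Dropout(0.3),  # stochastic regularization during training
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# early stopping keeps the best weights seen on the validation split
es = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
model.fit(X, y, epochs=20, validation_split=0.2, callbacks=[es], verbose=0)
```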
I will list the problems/recommendations that I see in your model.
What are you trying to predict? You are using a sigmoid activation function in the last layer, which suggests binary classification, but in your loss function you used mse, which seems strange. You can try binary_crossentropy instead of the mse loss function for your model.
Your model seems to suffer from overfitting, so you can increase the Dropout probability and also add Dropout between the other hidden layers, or you can remove one of the hidden layers, because your model may be too complex.
You can change the neuron numbers in the layers to something narrower, e.g. 64 -> 32 -> 16 -> 1, or try different NN architectures.
Try the adam optimizer instead of sgd.
If you have 57849 samples, you can use 47000 samples for training+validation and the rest will be your test set.
Don't use the same set for both evaluation and validation. First split your data into train and test sets. Then, when fitting your model, pass a validation_split ratio and Keras will automatically carve a validation set out of your training set.
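That split discipline can be sketched as follows (random stand-in data; names and sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(500, 32).astype("float32")
y = np.random.randint(0, 2, size=(500,)).astype("float32")

# hold out the test set first; it is never seen during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

model = Sequential([Dense(16, activation="relu", input_dim=32),
                    Dense(1, activation="sigmoid")])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# validation_split carves the validation set out of the training data only
model.fit(X_train, y_train, epochs=3, validation_split=0.2, verbose=0)

# the held-out test set is evaluated exactly once, at the end
loss, acc = model.evaluate(X_test, y_test, verbose=0)
```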

Training loss higher than validation loss

I am trying to train a regression model of a dummy function with 3 variables with fully connected neural nets in Keras and I always get a training loss much higher than the validation loss.
I split the data set in 2/3 for training and 1/3 for validation. I have tried lots of different things:
changing the architecture
adding more neurons
using regularization
using different batch sizes
Still the training error is one order of magnitude higher than the validation error:
Epoch 5995/6000
4020/4020 [==============================] - 0s 78us/step - loss: 1.2446e-04 - mean_squared_error: 1.2446e-04 - val_loss: 1.3953e-05 - val_mean_squared_error: 1.3953e-05
Epoch 5996/6000
4020/4020 [==============================] - 0s 98us/step - loss: 1.2549e-04 - mean_squared_error: 1.2549e-04 - val_loss: 1.5730e-05 - val_mean_squared_error: 1.5730e-05
Epoch 5997/6000
4020/4020 [==============================] - 0s 105us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4372e-05 - val_mean_squared_error: 1.4372e-05
Epoch 5998/6000
4020/4020 [==============================] - 0s 96us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4151e-05 - val_mean_squared_error: 1.4151e-05
Epoch 5999/6000
4020/4020 [==============================] - 0s 80us/step - loss: 1.2487e-04 - mean_squared_error: 1.2487e-04 - val_loss: 1.4342e-05 - val_mean_squared_error: 1.4342e-05
Epoch 6000/6000
4020/4020 [==============================] - 0s 79us/step - loss: 1.2494e-04 - mean_squared_error: 1.2494e-04 - val_loss: 1.4769e-05 - val_mean_squared_error: 1.4769e-05
This makes no sense, please help!
Edit: this is the full code
I have 6000 training examples
# -*- coding: utf-8 -*-
"""
Created on Mon Feb 26 13:40:03 2018
#author: Michele
"""
#from keras.datasets import reuters
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras import optimizers
import matplotlib.pyplot as plt
import os
import pylab
from keras.constraints import maxnorm
from sklearn.model_selection import train_test_split
from keras import regularizers
from sklearn.preprocessing import MinMaxScaler
import math
from sklearn.metrics import mean_squared_error
import keras
# fix random seed for reproducibility
seed=7
np.random.seed(seed)
dataset = np.loadtxt("BabbaX.csv", delimiter=",")
#split into input (X) and output (Y) variables
#x = dataset.transpose()[:,10:15] #only use power
x = dataset
del(dataset) # delete container
dataset = np.loadtxt("BabbaY.csv", delimiter=",")
#split into input (X) and output (Y) variables
y = dataset.transpose()
del(dataset) # delete container
#scale labels from 0 to 1
scaler = MinMaxScaler(feature_range=(0, 1))
y = np.reshape(y, (y.shape[0],1))
y = scaler.fit_transform(y)
lenData=x.shape[0]
x=np.transpose(x)
xtrain=x[:,0:round(lenData*0.67)]
ytrain=y[0:round(lenData*0.67),]
xtest=x[:,round(lenData*0.67):round(lenData*1.0)]
ytest=y[round(lenData*0.67):round(lenData*1.0)]
xtrain=np.transpose(xtrain)
xtest=np.transpose(xtest)
l2_lambda = 0.1 #reg factor
#sequential type of model
model = Sequential()
#stacking layers with .add
units=300
#model.add(Dense(units, input_dim=xtest.shape[1], activation='relu', kernel_initializer='normal', kernel_regularizer=regularizers.l2(l2_lambda), kernel_constraint=maxnorm(3)))
model.add(Dense(units, activation='relu', input_dim=xtest.shape[1]))
#model.add(Dropout(0.1))
model.add(Dense(units, activation='relu'))
#model.add(Dropout(0.1))
model.add(Dense(1)) #no activation function should be used for the output layer
#It is recommended to leave the parameters of this optimizer at their default
#values (except the learning rate, which can be freely tuned).
rms = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0)
adam = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=1e-6, amsgrad=False)
#adam=keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
#configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])
# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, epochs=1000, batch_size=round(xtest.shape[0]/100),
                    validation_data=(xtest, ytest), shuffle=True, verbose=2)
#extract weights for each layer
weights = [layer.get_weights() for layer in model.layers]
#evaluate on training data set
valuesTrain=model.predict(xtrain)
#evaluate on test data set
valuesTest=model.predict(xtest)
#invert predictions
valuesTrain = scaler.inverse_transform(valuesTrain)
ytrain = scaler.inverse_transform(ytrain)
valuesTest = scaler.inverse_transform(valuesTest)
ytest = scaler.inverse_transform(ytest)
TL;DR:
When a model is learning well and quickly, the validation loss can be lower than the training loss: the validation loss is computed on the fully updated model at the end of the epoch, while the reported training loss is averaged over batches, each computed before some (with batches) or all (full-batch) of that epoch's weight updates were applied.
Okay I think I found out what's happening here. I used the following code to test this.
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
np.random.seed(7)
N_DATA = 6000
x = np.random.uniform(-10, 10, (3, N_DATA))
y = x[0] + x[1]**2 + x[2]**3
xtrain = x[:, 0:round(N_DATA*0.67)]
ytrain = y[0:round(N_DATA*0.67)]
xtest = x[:, round(N_DATA*0.67):N_DATA]
ytest = y[round(N_DATA*0.67):N_DATA]
xtrain = np.transpose(xtrain)
xtest = np.transpose(xtest)
model = Sequential()
model.add(Dense(10, activation='relu', input_dim=3))
model.add(Dense(5, activation='relu'))
model.add(Dense(1))
adam = keras.optimizers.Adam()
# configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])
# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtest, ytest), shuffle=False, verbose=2)
plt.plot(history.history['mean_squared_error'])
plt.plot(history.history['val_loss'])
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
This is essentially the same as your code and replicates the problem, which is not actually a problem. Simply change
history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtest, ytest), shuffle=False, verbose=2)
to
history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtrain, ytrain), shuffle=False, verbose=2)
So instead of validating with your validation data, you validate using the training data again, which leads to exactly the same behavior. Weird, isn't it? Actually, no. What I think is happening is:
The mean_squared_error that Keras reports for each epoch is the training loss accumulated before and during that epoch's gradient updates, while the validation loss is computed after all of the epoch's updates have been applied, which explains the gap.
With the highly stochastic problems NNs are usually applied to, you do not see this effect: the data varies so much that the updated weights are simply not good enough to describe the validation data, and the slight overfitting on the training data is still strong enough that, even after updating the weights, the validation loss remains higher than the training loss from before. That is only how I think it works, though; I might be completely wrong.
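You can check this hypothesis directly (a sketch, not part of the original code): compare the running training loss that fit reports with the loss re-evaluated on the same training data using the final weights of the epoch:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

np.random.seed(7)
x = np.random.uniform(-10, 10, (3, 1000))
y = x[0] + x[1]**2 + x[2]**3
xtrain = np.transpose(x)

model = Sequential([Dense(10, activation="relu", input_dim=3), Dense(1)])
model.compile(optimizer="adam", loss="mean_squared_error")

history = model.fit(xtrain, y, epochs=5, batch_size=60, verbose=0)
running_loss = history.history["loss"][-1]            # averaged over batches, mid-updates
end_of_epoch = model.evaluate(xtrain, y, verbose=0)   # computed with the final weights

# While the model is still improving within an epoch, end_of_epoch
# tends to be lower than the running average fit reported.
print(running_loss, end_of_epoch)
```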
One thing you can try is increasing the size of the training data and lowering the size of the validation data. Your model will then be trained on more samples, which may include some complex ones as well, and can be validated on the remaining samples. Try something like train 80% / validation 20%, or any split with a little more training data than what you used previously.
If you don't want to change the sizes of the training and validation sets, you can try changing the random seed to some other number, so that you get a training set with different samples, which might help in training the model well.
Check this answer here to get more understanding of the other possible reasons.
Check this link if you want a more detailed explanation with an example. @Michele
If the training loss is a little higher than, or close to, the validation loss, it means the model is not overfitting.
The aim is always to make the best use of the features so as to reduce overfitting and improve validation and test accuracies.
A probable reason that your training loss is always higher is the features and data you are using to train.
Please refer to the following link and observe the training and validation loss in the dropout case:
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/

XOR not learned using keras v2.0

I have for some time gotten pretty bad results using the tool Keras, and haven't been too suspicious of the tool itself... But I am beginning to be a bit concerned now.
I tried to see whether it could handle a simple XOR problem, and after 30000 epochs it still hasn't solved it...
code:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
import numpy as np
np.random.seed(100)
model = Sequential()
model.add(Dense(2, input_dim=2))
model.add(Activation('tanh'))
model.add(Dense(1, input_dim=2))
model.add(Activation('sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, nb_epoch=30000, batch_size=1,verbose=1)
print(model.predict_classes(X))
Here is part of my result:
4/4 [==============================] - 0s - loss: 0.3481
Epoch 29998/30000
4/4 [==============================] - 0s - loss: 0.3481
Epoch 29999/30000
4/4 [==============================] - 0s - loss: 0.3481
Epoch 30000/30000
4/4 [==============================] - 0s - loss: 0.3481
4/4 [==============================] - 0s
[[0]
[1]
[0]
[0]]
Is there something wrong with the tool - or am I doing something wrong??
Version I am using:
MacBook-Pro:~ usr$ python -c "import keras; print keras.__version__"
Using TensorFlow backend.
2.0.3
MacBook-Pro:~ usr$ python -c "import tensorflow as tf; print tf.__version__"
1.0.1
MacBook-Pro:~ usr$ python -c "import numpy as np; print np.__version__"
1.12.0
Updated version:
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, SGD
import numpy as np
#np.random.seed(100)
model = Sequential()
model.add(Dense(units = 2, input_dim=2, activation = 'relu'))
model.add(Dense(units = 1, activation = 'sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer='adam')
print model.summary()
model.fit(X, y, nb_epoch=5000, batch_size=4,verbose=1)
print(model.predict_classes(X))
I cannot add a comment to Daniel's response as I don't have enough reputation, but I believe he's on the right track. While I have not personally tried running the XOR with Keras, here's an article that might be interesting - it analyzes the various regions of local minima for a 2-2-1 network, showing that higher numerical precision would lead to fewer instances of getting stuck on a gradient descent algorithm.
The Local Minima of the Error Surface of the 2-2-1 XOR Network (Ida G. Sprinkhuizen-Kuyper and Egbert J.W. Boers)
On a side note, I wouldn't consider using a 2-4-1 network to be overfitting the problem. Having 4 linear cuts in the 0-1 plane (cutting it into a 2x2 grid) instead of 2 cuts (cutting the corners off diagonally) just separates the data in a different way; since we only have 4 data points and no noise in the data, a neural network that uses 4 linear cuts isn't describing "noise" instead of the XOR relationship.
I think it's a "local minimum" in the loss function.
Why?
I have run this same code over and over a few times, and sometimes it goes right, sometimes it gets stuck in a wrong result. Notice that this code "recreates" the model every time I run it. (If I insisted on training a model that found the wrong results, it would simply stay there forever.)
from keras.models import Sequential
from keras.layers import *
import numpy as np
m = Sequential()
m.add(Dense(2,input_dim=2, activation='tanh'))
#m.add(Activation('tanh'))
m.add(Dense(1,activation='sigmoid'))
#m.add(Activation('sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]],'float32')
Y = np.array([[0],[1],[1],[0]],'float32')
m.compile(optimizer='adam',loss='binary_crossentropy')
m.fit(X,Y,batch_size=1,epochs=20000,verbose=0)
print(m.predict(X))
Running this code, I have found some different outputs:
Wrong: [[ 0.00392423], [ 0.99576807], [ 0.50008368], [ 0.50008368]]
Right: [[ 0.08072935], [ 0.95266515], [ 0.95266813], [ 0.09427474]]
What conclusion can we take from it?
The optimizer is not dealing properly with this local minimum. If it gets lucky (a proper weight initialization), it will fall into a good minimum and bring the right results.
If it gets unlucky (a bad weight initialization), it will fall into a local minimum without really knowing that there are better places in the loss function, and its learning rate is simply not big enough to escape this minimum. The small gradient keeps it turning around the same point.
If you take the time to study the gradients in the wrong case, you will probably see they keep pointing towards that same point, and increasing the learning rate a little may let the optimizer escape the hole.
Intuition makes me think that such very small models have more prominent local minima.
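That restart behaviour is easy to quantify: retrain the same architecture from several fresh initializations and count how often it solves XOR (a sketch; the counts vary from run to run, and the epoch budget is just a guess):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], "float32")
Y = np.array([[0], [1], [1], [0]], "float32")

runs, solved = 3, 0
for _ in range(runs):
    # a fresh model each run means a fresh random weight initialization
    m = Sequential([Dense(2, input_dim=2, activation="tanh"),
                    Dense(1, activation="sigmoid")])
    m.compile(optimizer="adam", loss="binary_crossentropy")
    m.fit(X, Y, batch_size=4, epochs=1000, verbose=0)
    preds = (m.predict(X, verbose=0) > 0.5).astype("float32")
    solved += int((preds == Y).all())
print(f"{solved}/{runs} runs learned XOR")
```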
Instead of just increasing the number of epochs, try using relu for the activation of your hidden layer instead of tanh. Making only that change to the code you provide, I am able to obtain the following result after only 2000 epochs (Theano backend):
import numpy as np
print(np.__version__) # 1.11.3
import theano
print(theano.__version__) # 0.9.0
import keras
print(keras.__version__) # 2.0.2
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, SGD
np.random.seed(100)
model = Sequential()
model.add(Dense(units = 2, input_dim=2, activation = 'relu'))
model.add(Dense(units = 1, activation = 'sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, y, epochs=2000, batch_size=1,verbose=0)
print(model.evaluate(X,y))
print(model.predict_classes(X))
4/4 [==============================] - 0s
0.118175707757
4/4 [==============================] - 0s
[[0]
[1]
[1]
[0]]
It would be easy to conclude that this is due to the vanishing gradient problem. However, the simplicity of this network suggests that this isn't the case. Indeed, if I change the optimizer from 'adam' to SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False) (the default values), I can see the following result after 5000 epochs with tanh activation in the hidden layer.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import Adam, SGD
np.random.seed(100)
model = Sequential()
model.add(Dense(units = 2, input_dim=2, activation = 'tanh'))
model.add(Dense(units = 1, activation = 'sigmoid'))
X = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
y = np.array([[0],[1],[1],[0]], "float32")
model.compile(loss='binary_crossentropy', optimizer=SGD())
model.fit(X, y, epochs=5000, batch_size=1,verbose=0)
print(model.evaluate(X,y))
print(model.predict_classes(X))
4/4 [==============================] - 0s
0.0314897596836
4/4 [==============================] - 0s
[[0]
[1]
[1]
[0]]
Edit: 5/17/17 - Included complete code to enable reproduction
The minimal neural network architecture required to learn XOR is a (2,2,1) network. In fact, maths shows that a (2,2,1) network can solve the XOR problem, but maths doesn't show that a (2,2,1) network is easy to train. It can sometimes take a lot of epochs (iterations) or fail to converge to the global minimum. That said, I easily got good results with (2,3,1) or (2,4,1) network architectures.
The problem seems to be related to the existence of many local minima. Look at this 1998 paper, «Learning XOR: exploring the space of a classic problem» by Richard Bland. Furthermore, initializing the weights with random numbers between 0.5 and 1.0 helps convergence.
It works fine with Keras or TensorFlow using the 'mean_squared_error' loss function, sigmoid activation and the Adam optimizer. Even with pretty good hyperparameters, I observed that the learned XOR model is trapped in a local minimum about 15% of the time.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from tensorflow.keras import initializers
import numpy as np
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
def initialize_weights(shape, dtype=None):
    return np.random.normal(loc=0.75, scale=1e-2, size=shape)
model = Sequential()
model.add(Dense(2,
                activation='sigmoid',
                kernel_initializer=initialize_weights,
                input_dim=2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])
print("*** Training... ***")
model.fit(X, y, batch_size=4, epochs=10000, verbose=0)
print("*** Training done! ***")
print("*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***")
print(model.predict_proba(X))
*** Training... ***
*** Training done! ***
*** Model prediction on [[0,0],[0,1],[1,0],[1,1]] ***
[[0.08662204]
[0.9235283 ]
[0.92356336]
[0.06672956]]

Machine Learning accuracy shows 0

I need help with the accuracy when training a model for machine learning.
My input for training is a set of arrays of 500 integers each, which I saved in an hdf5 file under a dataset named 'the_data'. In this example, I have 100 arrays.
[[1,2,3,...500],
[501,502,...1000],
[1001,... ],
....
...... ]]
The output is a set of random numbers which I generated beforehand and saved as 'output.txt'. It has 100 random numbers.
194521, 307329, 182440, 180444, 275690,...,350879
Below is my modified script based on http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
import h5py
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
seed = 7
np.random.seed(seed)
input_data = h5py.File('test.h5', 'r')
output_data = open("output.txt", "r")
X = input_data['the_data'][:]
Y = output_data.read().split(',')
model = Sequential()
model.add(Dense(500, input_dim=500, init='normal', activation='relu'))
model.add(Dense(100, init='normal', activation='relu'))
model.add(Dense(60, init='normal', activation='relu'))
model.add(Dense(1, init='normal', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
model.fit(X, Y, nb_epoch=500, batch_size=10)
scores = model.evaluate(X, Y)
print("%s: %.2f%% , %s: %.2f%%" % (model.metrics_names[0], scores[0]*100, model.metrics_names[1], scores[1]*100))
What I got as the result is like this
Epoch 500/500
100/100 [==============================] - 0s - loss: -4851446.0896 - acc: 0.0000e+00
100/100 [==============================] - 0s
loss: -485144614.93% , acc: 0.00%
Does anyone have any thought about why does this happened?
Thank you for your help.
Do you know what binary crossentropy is?
It's a loss function made for binary targets (0 or 1). The loss is then a logarithm of the output, or of 1 minus the output, depending on the target value.
So you cannot apply it in your case.
You want to predict numerical values, so you should use something like root mean square error.
Accuracy doesn't make sense either, as you're not trying to predict a class but a real-valued number. It will rarely predict the exact value. Accuracy is used, for example, with binary crossentropy; then we can classify an output of 0.7 as being in class 1, or 0.2 as being in class 0.
One more comment: why would you want to predict random values? It can't work... the network needs to recognize patterns, and there are no patterns in random targets.
I hope this helps you a bit.
I agree with Nassim Ben.
Try to use this
model.compile(loss='mean_squared_error', optimizer='sgd')
Then, to calculate the accuracy you need a different way:
import numpy
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(Y, Y_predicted)
print('MSE : %.3f' % mse)
print("Acc = ", 1 - numpy.sqrt(mse))
This worked for me. But honestly, I feel like Keras does not work well when predicting large numbers (outside the 0-1 range).
I would be happy to be wrong about this.
