Edits below
I am learning about artificial neural networks with the Keras library. To make sure I understand the basics of neural network classification, I have been trying to reproduce a network written in Keras using only TensorFlow. However, I have run into some problems.
training_epochs = 100
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1
classifier = Sequential()
classifier.add(Dense(output_dim=n_hidden_1, init='uniform', activation='relu', input_dim=n_input))
classifier.add(Dense(output_dim=n_hidden_2, init='uniform', activation='relu'))
classifier.add(Dense(output_dim=n_output, init='uniform', activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=10, nb_epoch=training_epochs)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
cm = confusion_matrix(y_test, y_pred)
print(cm)
So essentially I am using a neural network with an input layer of size 11, two hidden layers of size 6, and an output of size 1. The output layer uses the sigmoid function to produce a probability for binary classification. I tried to reproduce this with TensorFlow as follows:
training_epochs = 100
n_input = 11
n_hidden_1 = 6
n_hidden_2 = 6
n_output = 1
def neuralNetwork(x, weights):
    layer_1 = tf.matmul(x, weights['h1'])
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.matmul(layer_1, weights['h2'])
    layer_2 = tf.nn.relu(layer_2)
    output_layer = tf.matmul(layer_2, weights['output'])
    return output_layer
weights = {
    'h1': tf.Variable(tf.random_uniform([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_uniform([n_hidden_1, n_hidden_2])),
    'output': tf.Variable(tf.random_uniform([n_hidden_2, n_output]))
}
x = tf.placeholder('float', [None, n_input]) # [?, 11]
y = tf.placeholder('float', [None, n_output]) # [?, 1]
logits = neuralNetwork(x, weights)
prediction = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=y))
optimizer = tf.train.AdamOptimizer().minimize(cost)
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    for epoch in range(training_epochs):
        loss, accuracy = session.run([optimizer, cost], feed_dict={x:X_train, y:y_train})
        print('Epoch: {} Acc: {}'.format(epoch+1, accuracy))
    print('Model has completed training.')
However, I keep getting the error:
Cannot feed value of shape (8000,) for Tensor 'Placeholder_1:0', which has shape '(?, 1)'
My input data has 8000 rows with 11 columns and my output data has 8000 rows and 1 column. In order to try to reshape my data, I tried feeding it in row by row, but I kept getting more errors. Am I going about this the right way? Any help would be appreciated!
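The error message says the label array is one-dimensional, with shape (8000,), while the placeholder expects a column with shape (?, 1). A minimal NumPy sketch of the reshape follows; the zero-filled array is only a stand-in, since the real `y_train` is not shown in the question, and the name `y_train_2d` is made up for illustration.

```python
import numpy as np

# Stand-in for the real labels: 8000 binary targets in a flat 1-D array.
y_train = np.zeros(8000, dtype=np.float32)
print(y_train.shape)  # (8000,)

# Add a trailing axis so that each row holds exactly one label.
y_train_2d = y_train.reshape(-1, 1)
print(y_train_2d.shape)  # (8000, 1)
```

Feeding the reshaped array for `y` should satisfy the (?, 1) placeholder without going row by row.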
Edit: So I updated my code following the given suggestions. I am now getting output for accuracy, however, it seems to finish at around 4-5%. Furthermore, the accuracy also seems to decrease over time rather than improving. When I increase the number of training epochs to 200, the accuracy dips even lower (to around 2%).
Epoch: 1 Acc: 7.641509056091309
...
...
Epoch: 100 Acc: 4.339457035064697
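Two details in the TensorFlow listing may explain these numbers. `session.run([optimizer, cost], ...)` returns the result of the training op (None) followed by the cost, so the value printed as "Acc" appears to be the loss, and a falling loss is what training should produce. Separately, `tf.nn.softmax` over a single output column always yields 1.0, so a sigmoid is the usual choice for one-unit binary outputs. The sketch below shows both points in plain NumPy; the logits and labels are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax along the last axis, shifted for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for four samples from a single-output network.
logits = np.array([[-2.0], [0.5], [3.0], [-0.1]])
labels = np.array([[0.0], [1.0], [1.0], [1.0]])

# With only one column, softmax normalises each value by itself: always 1.0.
softmax_out = softmax(logits)
print(softmax_out.ravel())

# Sigmoid gives a usable probability per sample; threshold at 0.5 as in the Keras code.
probs = sigmoid(logits)
predicted = (probs > 0.5).astype(np.float64)
accuracy = (predicted == labels).mean()
print(accuracy)  # 0.75
```

An accuracy computed this way stays between 0 and 1, unlike the values in the log above.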
I have the following model
def get_model():
    epochs = 100
    learning_rate = 0.1
    decay_rate = learning_rate / epochs
    inp = keras.Input(shape=(64, 101, 1), name="inputs")
    x = layers.Conv2D(128, kernel_size=(3, 3), strides=(3, 3), padding="same")(inp)
    x = layers.Conv2D(256, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(150)(x)
    x = layers.Dense(150)(x)
    out1 = layers.Dense(40000, name="sf_vec")(x)
    out2 = layers.Dense(128, name="ls_weights")(x)
    model = keras.Model(inp, [out1, out2], name="2_out_model")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=decay_rate),  # put 0.001 back if needed
                  loss="mean_squared_error")
    keras.utils.plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
    model.summary()
    return model
That is, I want to train my neural network on a combination (a "mix") of the loss from the first output and the loss from the second output.
I train my neural network in this way:
model.fit(x_train, [sf_train, ls_filters_train], epochs=10)
During training, output such as the following is shown:
Epoch 10/10 -> loss: 0.0702 - sf_vec_loss: 0.0666 - ls_weights_loss: 0.0035
I'd like to know whether it is a coincidence that "loss" is nearly the sum of sf_vec_loss and ls_weights_loss, or whether Keras actually computes it that way.
Also, is the network being trained on the "loss" only?
Thank you in advance :)
Following the TensorFlow documentation...
from the loss argument:
If the model has multiple outputs, you can use a different loss on
each output by passing a dictionary or a list of losses. The loss
value that will be minimized by the model will then be the sum of all
individual losses
Remember that you can also weight the loss contributions of the different model outputs.
from the loss_weights argument:
The loss value that will be minimized by the model will then be the
weighted sum of all individual losses, weighted by the loss_weights coefficients
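As a quick check of the quoted rule against the log line in the question, the arithmetic can be sketched as follows. The two per-output losses and the reported total are copied from the "Epoch 10/10" line; the `loss_weights` values are hypothetical and only illustrate the weighted case.

```python
# Per-output losses copied from the "Epoch 10/10" log line in the question.
sf_vec_loss = 0.0666
ls_weights_loss = 0.0035
reported_total = 0.0702

# Unweighted sum: the quantity minimised when no loss_weights are given.
total = sf_vec_loss + ls_weights_loss
print(round(total, 4))  # 0.0701

# Weighted sum with illustrative (hypothetical) weights.
loss_weights = [1.0, 2.0]
weighted = loss_weights[0] * sf_vec_loss + loss_weights[1] * ls_weights_loss
print(round(weighted, 4))  # 0.0736
```

The remaining gap of about 0.0001 between the sum and the reported total is consistent with each logged value having been rounded to four decimals.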
I'm new to TensorFlow and I'm trying to rebuild a simple network that I built in Keras (TF backend) with TensorFlow's Python API. It is a simple function approximator (z = sin(x + y)).
I've tried different architectures, optimizers and learning rates, but I'm not getting the new network to train properly. However in my eyes, the networks seem to be identical. Both get the exact same feature vectors and labels:
# making training data
start = 0
end = 2*np.pi
samp = 1000
num_samp = samp**2
step = end / samp
x_train = np.arange(start, end, step)
y_train = np.arange(start, end, step)
data = np.array(np.meshgrid(x_train,y_train)).T.reshape(-1,2)
z_label = np.sin(data[:,0] + data[:,1])
Here is the Keras model:
#start model
model = Sequential()
#stack layers
model.add(Dense(units=128, activation='sigmoid', input_dim=2, name='dense_1'))
model.add(Dense(units=64, activation='sigmoid', input_dim=128, name='dense_2'))
model.add(Dense(units=1, activation='linear', name='output'))
#compile model
model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=['accuracy'])
checkpointer = ModelCheckpoint(filepath='./weights/weights.h5',
verbose=1, save_best_only=True)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
model.fit(data, z_label, epochs=20, batch_size=32,
shuffle='true',validation_data=(data_val, z_label_val),
callbacks=[checkpointer, tensorboard])
Here is the new network, built with TensorFlow's Python API:
# hyperparameter
n_inputs = 2
n_hidden1 = 128
n_hidden2 = 64
n_outputs = 1
learning_rate = 0.01
# construction phase
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='input')
y = tf.placeholder(tf.float32, shape=(None), name="target")
hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1", activation=tf.nn.sigmoid)
hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2", activation=tf.nn.sigmoid)
logits = tf.layers.dense(hidden2, n_outputs, activation='linear', name='output')
loss = tf.reduce_mean(tf.square(logits - y), name='loss')
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_op = optimizer.minimize(loss, name='train')
init = tf.global_variables_initializer()
saver = tf.train.Saver()
# --- execution phase ---
n_epochs = 40
batch_size = 32
n_batches = int(num_samp/batch_size)
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        print("Epoch: ", epoch, " Running...")
        loss_arr = np.array([])
        for iteration in range( n_batches ):
            start = iteration * batch_size
            end = start + batch_size
            sess.run(training_op, feed_dict={X: data[start:end], y: z_label[start:end] })
            loss_arr = np.append(loss_arr, loss.eval(feed_dict={X: data[start:end, :], y: z_label[start:end]}))
        mean_loss = np.mean(loss_arr)
        print("Epoch: ", epoch, " Calculated ==> Loss: ", mean_loss)
While the Keras model trains properly, with a decreasing loss and sensible test results, the new model converges very quickly and then stops learning. Accordingly, the results are completely useless.
Am I building or training the model incorrectly, or is Keras doing something in the background that I'm not aware of?
Solved this issue. The problem was the shape of the label vector: it was a flat, one-dimensional vector with shape (1000000,). Keras apparently copes with output and label vectors of different shapes, but in TensorFlow the placeholder was declared with shape=(None), which accepts any shape. Subtracting labels of shape (batch,) from logits of shape (batch, 1) then broadcasts to (batch, batch), so the loss function
loss = tf.reduce_mean(tf.square(logits - y), name='loss')
no longer measured a per-sample error, and training failed. Adding
z_label = z_label.reshape(-1,1)
reshapes the label vector to (1000000, 1) and solves it. Alternatively, one can specify the shape of the placeholder more precisely:
y = tf.placeholder(tf.float32, shape=(None,1), name="target")
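The shape problem can be reproduced without TensorFlow, because TensorFlow follows the same broadcasting rules as NumPy. The sketch below uses a toy batch of four values (hypothetical data, not the sine dataset) in which predictions and labels are identical, so the correct mean squared error is zero.

```python
import numpy as np

batch = 4
logits = np.arange(batch, dtype=np.float32).reshape(-1, 1)  # shape (4, 1)
y_flat = np.arange(batch, dtype=np.float32)                 # shape (4,)

# (4, 1) - (4,) broadcasts to (4, 4): every prediction is compared with every label.
diff_bad = logits - y_flat
loss_bad = np.mean(np.square(diff_bad))
print(diff_bad.shape, loss_bad)  # (4, 4) 2.5

# With a column-vector label the subtraction is element-wise, as intended.
diff_ok = logits - y_flat.reshape(-1, 1)
loss_ok = np.mean(np.square(diff_ok))
print(diff_ok.shape, loss_ok)  # (4, 1) 0.0
```

Even with perfect predictions the broadcast version reports a non-zero loss, which is why the optimizer had nothing meaningful to minimise.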
I'm trying to implement the same model in Keras and in TensorFlow using Keras layers, on custom data. The two models produce consistently different accuracies across many training runs (Keras ~71%, TensorFlow ~65%). I want the TensorFlow version to do as well as the Keras one so that I can go into the TensorFlow iterations and tweak some lower-level algorithms.
Here's my original Keras code:
from keras.layers import Dense, Dropout, Input
from keras.models import Model, Sequential
from keras import backend as K
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
main_input = Input(shape=(input_size,),name='text_vectors')
x = Dense(units=64, activation='relu', name = 'dense1')(main_input)
drop1 = Dropout(0.2,name='dropout1')(x)
auxiliary_input = Input(shape=(num_aux_inputs,), name='aux_input')
x = keras.layers.concatenate([drop1,auxiliary_input])
x = Dense(units=64, activation='relu',name='dense2')(x)
drop2 = Dropout(0.1,name='dropout2')(x)
x = Dense(units=32, activation='relu',name='dense3')(drop2)
main_output = Dense(units=num_classes,
activation='softmax',name='main_output')(x)
model = Model(inputs=[main_input, auxiliary_input],
outputs=main_output)
model.compile(loss=keras.losses.categorical_crossentropy, metrics= ['accuracy'],optimizer=keras.optimizers.Adadelta())
history = model.fit([train_x,train_x_auxiliary], train_y, batch_size=128, epochs=20, verbose=1, validation_data=([val_x,val_x_auxiliary], val_y))
loss, accuracy = model.evaluate([val_x,val_x_auxiliary], val_y, verbose=0)
Here is how I moved the Keras layers to TensorFlow, following this article:
import tensorflow as tf
from keras import backend as K
import keras
from keras.layers import Dense, Dropout, Input # Dense layers are "fully connected" layers
from keras.metrics import categorical_accuracy as accuracy
from keras.objectives import categorical_crossentropy
tf.reset_default_graph()
sess = tf.Session()
K.set_session(sess)
input_size = 2000
num_classes = 4
num_industries = 22
num_aux_inputs = 3
x = tf.placeholder(tf.float32, shape=[None, input_size], name='X')
x_aux = tf.placeholder(tf.float32, shape=[None, num_aux_inputs], name='X_aux')
y = tf.placeholder(tf.float32, shape=[None, num_classes], name='Y')
# build graph
layer = Dense(units=64, activation='relu', name = 'dense1')(x)
drop1 = Dropout(0.2,name='dropout1')(layer)
layer = keras.layers.concatenate([drop1,x_aux])
layer = Dense(units=64, activation='relu',name='dense2')(layer)
drop2 = Dropout(0.1,name='dropout2')(layer)
layer = Dense(units=32, activation='relu',name='dense3')(drop2)
output_logits = Dense(units=num_classes, activation='softmax',name='main_output')(layer)
loss = tf.reduce_mean(categorical_crossentropy(y, output_logits))
acc_value = tf.reduce_mean(accuracy(y, output_logits))
correct_prediction = tf.equal(tf.argmax(output_logits, 1), tf.argmax(y, 1), name='correct_pred')
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1.0, rho=0.95,epsilon=tf.keras.backend.epsilon()).minimize(loss)
init = tf.global_variables_initializer()
sess.run(init)
epochs = 20 # Total number of training epochs
batch_size = 128 # Training batch size
display_freq = 300 # Frequency of displaying the training results
num_tr_iter = int(len(y_train) / batch_size)
with sess.as_default():
    for epoch in range(epochs):
        print('Training epoch: {}'.format(epoch + 1))
        # Randomly shuffle the training data at the beginning of each epoch
        x_train, x_train_aux, y_train = randomize(x_train, x_train_auxiliary, y_train)
        for iteration in range(num_tr_iter):
            start = iteration * batch_size
            end = (iteration + 1) * batch_size
            x_batch, x_aux_batch, y_batch = get_next_batch(x_train, x_train_aux, y_train, start, end)
            # Run optimization op (backprop)
            feed_dict_batch = {x: x_batch, x_aux:x_aux_batch, y: y_batch,K.learning_phase(): 1}
            optimizer.run(feed_dict=feed_dict_batch)
I also implemented the whole model from scratch in TensorFlow, but it also reaches only ~65% accuracy, so I decided to try this Keras-layers-within-TF setup to identify the problem.
I've looked up posts on similar problems with Keras and TensorFlow and tried the following, none of which helped in my case:
Keras's dropout layer is only active in the training phase, so I did the same in my tf code by setting keras.backend.learning_phase().
Keras and TensorFlow have different variable initializations. I've tried initializing my weights in TensorFlow in the following three ways, which are supposed to match Keras's weight initialization, but they did not affect the accuracy either:
initer = tf.glorot_uniform_initializer()
initer = tf.contrib.layers.xavier_initializer()
initer = tf.random_normal(shape) * (np.sqrt(2.0/(shape[0] + shape[1])))
The optimizers in the two versions are set to be exactly the same! It doesn't look like the accuracy depends on the optimizer, though: I tried different optimizers in both Keras and TF, and each framework converges to its own accuracy regardless.
Help!
It seems to me that this is most probably a weight initialization problem. I would suggest initializing the Keras layers, reading out their weights before training, and initializing the TF layers with those values.
I have run into this kind of problem before and that approach solved it for me, but it was a long time ago and I don't know whether the initializers have since been made the same. At that time, the TF and Keras initializations were clearly not the same.
I matched the initializers, seed, parameters and hyperparameters, but the accuracy is still different.
I checked the Keras code: it randomly shuffles the samples before feeding batches into the network, so the shuffling differs between the two engines. We therefore need a way to feed the same sequence of batches to both networks in order to get the same accuracy.
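One way to feed identical batches to both implementations is to generate the shuffled index order once, from a fixed seed, and use it to slice the data in each framework. The sketch below is a minimal illustration in NumPy; `batch_indices` is a hypothetical helper, not part of Keras or TensorFlow.

```python
import numpy as np

def batch_indices(n_samples, batch_size, n_epochs, seed):
    """Yield one index array per batch, reshuffled each epoch from a fixed seed."""
    rng = np.random.RandomState(seed)
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            yield order[start:start + batch_size]

# Two independent runs with the same seed produce the same batch sequence.
run_a = list(batch_indices(n_samples=10, batch_size=4, n_epochs=2, seed=0))
run_b = list(batch_indices(n_samples=10, batch_size=4, n_epochs=2, seed=0))
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
print(same)        # True
print(len(run_a))  # 6 batches: 3 per epoch over 2 epochs
```

Matching the batch order removes only one source of randomness; dropout masks and weight initialization would still need their own seeds.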
A neural network trained on iris dataset using [4, 4] hidden layers and created separately in tensorflow and keras gives different results.
While the tensorflow model gives 96.6 % accuracy on test, keras model gives only around 50%. The various hyper parameters like learning rate, optimiser, mini batch size, etc were the same in both cases.
Keras model
model = Sequential()
model.add(Dense(units = 4, activation = 'relu', input_dim = 4))
model.add(Dropout(0.25))
model.add(Dense(units = 4, activation = 'relu'))
model.add(Dropout(0.25))
model.add(Dense(units = 3, activation = 'softmax'))
adam = Adam(epsilon = 10**(-6), lr = 0.01)
model.compile(optimizer = 'adagrad', loss = 'categorical_crossentropy', metrics = ['accuracy'])
one_hot_labels = keras.utils.to_categorical(y_train, num_classes = 3)
model.fit(X_train, one_hot_labels, epochs = 50, batch_size = 40)
Tensorflow model
feature_columns = [tf.feature_column.numeric_column(key = name,
shape = (1),
dtype = tf.float32) for name in list(X_train.columns)]
classifier = tf.estimator.DNNClassifier(hidden_units = [4, 4],
feature_columns = feature_columns,
n_classes = 3,
dropout = 0.25,
model_dir = './DNN_model')
train_input_fn = tf.estimator.inputs.pandas_input_fn(x = X_train,
y = y_train,
batch_size = 40,
num_epochs = 50,
shuffle = False)
classifier.train(input_fn = train_input_fn, steps = None)
For the keras model, I did try changing the learning rate, increasing the number of epochs, using different optimisers, etc. As such, the accuracy remained poor. Clearly, both the models are doing different things, but on the surface, they seem identical to me for all the key aspects.
Any help is appreciated.
They have the same architecture, and that's all.
The difference in performance is coming from one or more of these factors:
You have Dropout, so your networks behave differently on every run (check how Dropout works);
Weight initialization: which method are you using in Keras and in TensorFlow?
Check all parameters of the optimizer.
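To illustrate the first point, here is a minimal NumPy sketch of inverted dropout, the variant that zeroes a random subset of activations and rescales the rest during training. It is an illustration under stated assumptions, not the exact Keras or Estimator code; the `dropout` function and the seeds are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random subset and rescale the rest during training."""
    if not training:
        return activations  # inference: the layer is the identity
    keep_prob = 1.0 - rate
    mask = rng.uniform(size=activations.shape) < keep_prob
    return activations * mask / keep_prob

hidden = np.ones((1, 4))  # one sample, four hidden units, as in the [4, 4] network

# Different seeds draw different masks, so two training runs see different networks.
out_1 = dropout(hidden, rate=0.25, training=True, rng=np.random.RandomState(1))
out_2 = dropout(hidden, rate=0.25, training=True, rng=np.random.RandomState(2))
print(out_1, out_2)

# At inference time nothing is dropped.
out_eval = dropout(hidden, rate=0.25, training=False, rng=np.random.RandomState(1))
print(out_eval)
```

With only four units per layer, dropping even one unit removes a quarter of the layer, so run-to-run variation can be large.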
I am trying to train a single-layer perceptron (basing my code on this) on the following data file in TensorFlow:
1,1,0.05,-1.05
1,1,0.1,-1.1
....
where the last column is the label (a function of three parameters) and the first three columns are the function arguments. The code that reads the data and trains the model (simplified for readability):
import tensorflow as tf
... # some basics to read the data
example, label = read_file_format(filename_queue)
... # model construction and parameter setting
n_hidden_1 = 4 # 1st layer number of features
n_input = 3
n_output = 1
...
# calls a function which produces a prediction
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        _, c = sess.run([optimizer, cost], feed_dict={x: example.reshape(1,3), y: label.reshape(-1,1)})
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "Cost:",c)
but when I run it, something seems to be very wrong:
('Epoch:', '0001', 'Cost:', nan)
('Epoch:', '0002', 'Cost:', nan)
....
('Epoch:', '0015', 'Cost:', nan)
This is the complete code for the multilayer_perceptron function, etc.:
# Parameters
learning_rate = 0.001
training_epochs = 15
display_step = 1
# Network Parameters
n_hidden_1 = 4 # 1st layer number of features
n_input = 3
n_output = 1
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_output])
# Create model
def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    return out_layer
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_output]))
}
Is this training on one example at a time? I would use batches and increase the batch size to 128 or similar for as long as you are getting NaNs.
When I get NaNs, it is usually for one of these three reasons:
- batch size too small (in your case, just 1)
- log(0) somewhere
- learning rate too high and uncapped gradients
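The log(0) case in the list above can be reproduced in a few lines of NumPy. This is a sketch of a hand-written binary cross-entropy, not the TensorFlow op used in the question; the function names and the `eps` value are assumptions chosen for illustration.

```python
import numpy as np

def cross_entropy_naive(p, y):
    # Binary cross-entropy with no protection against p == 0 or p == 1.
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def cross_entropy_clipped(p, y, eps=1e-7):
    # Same formula, with probabilities clipped away from 0 and 1.
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

p = np.array([0.0, 1.0])  # saturated predictions
y = np.array([0.0, 1.0])  # matching labels

# 0 * log(0) evaluates to 0 * (-inf), which is NaN, even though the prediction is right.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = cross_entropy_naive(p, y)
print(np.isnan(naive).all())  # True

clipped = cross_entropy_clipped(p, y)
print(np.isfinite(clipped).all())  # True
```

Once a single NaN enters the mean cost, every later gradient update carries it forward, which matches a log that shows nan from the first epoch onward.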