I just finished writing my first ever neural network and it finally runs, but it performs really badly: I get about 0.37 accuracy. Any tips on how to make it more accurate? I have already tried different learning rates and different numbers of hidden-layer units, but I never get above 0.37. I'm trying to classify data into one of three classes (0, 1 or 2), and I use a one-hot matrix as my Y. How could I improve my code?
X = data[1:, 2:]
m, n = X.shape
labels = data[1:, 1]
Y = np.zeros((m,3))
i = 0
for label in labels:
    if label == 0:
        Y[i,0] = 1
    elif label == 1:
        Y[i,1] = 1
    elif label == 2:
        Y[i,2] = 1
    i += 1
slice_size = math.floor(m/5)
X_test = X[-slice_size:, :]
Y_test = Y[-slice_size:]
X_train = X[:slice_size, :]
Y_train = Y[:slice_size]
learning_rate = 0.00001
num_steps = 200
batch_size = 100
display_step = 2
n_nodes_hl1 = 5
n_nodes_hl2 = 5
n_nodes_hl3 = 5
n_classes = 3
n_inputs = 16
training_epochs = 500
x = tf.placeholder('float32', [None,n])
y = tf.placeholder('float32', [None, n_classes])
weights = {
    'h1': tf.Variable(tf.random_normal([n_inputs, n_nodes_hl1])),
    'h2': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
    'h3': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
    'out': tf.Variable(tf.random_normal([n_nodes_hl1, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_nodes_hl1])),
    'b2': tf.Variable(tf.random_normal([n_nodes_hl2])),
    'b3': tf.Variable(tf.random_normal([n_nodes_hl3])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
def neural_network(data):
    layer_1 = tf.add(tf.matmul(data, weights['h1']), biases['b1'])
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    output = tf.matmul(layer_3, weights['out']) + biases['out']
    return output
logits = neural_network(x)
prediction = tf.nn.softmax(logits)
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y_train, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(1, num_steps+1):
        x_step = np.asarray(X_train[step,:])
        y_step = np.asarray(Y_train[step])
        x_step = np.reshape(x_step, (1, n))
        y_step = np.reshape(y_step, (1,n_classes))
        sess.run(train_op, feed_dict={x:x_step , y:y_step})
        if step % display_step == 0 or step == 1:
            #Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={x: x_step,
                                                                 y: y_step})
            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.4f}".format(loss) + ", Training Accuracy= " +
                  "{:.3f}".format(acc))
    x_step_test = np.asarray(X_test)
    y_step_test = np.asarray(Y_test)
    x_step_test = np.reshape(x_step, (1, n))
    y_step_test = np.reshape(y_step, (1,n_classes))
    print("Optimization Finished!")
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: x_step_test,
                                        y: y_step_test}))
1. In the test section you reshape the last training sample instead of the test data:
x_step_test = np.asarray(X_test)
y_step_test = np.asarray(Y_test)
x_step_test = np.reshape(x_step, (1, n))
y_step_test = np.reshape(y_step, (1,n_classes))
Shouldn't this be:
x_step_test = np.asarray(X_test)
y_step_test = np.asarray(Y_test)
x_step_test = np.reshape(x_step_test, (-1, n))
y_step_test = np.reshape(y_step_test, (-1, n_classes))
(Use -1 for the first dimension: the test set has more than one row, so reshaping all of X_test to (1, n) would raise an error.)
2. Also check how you are taking the batches; there might be a problem there.
3. Use train_test_split from sklearn.model_selection; it splits your data into train and test sets after shuffling. Not shuffling can cause problems if your data has some ordering. For example, say you have 99 data points where the first 33 are dogs, the next 33 are cats and the last 33 are mice: if you train on the first 66, your neural net sees only dog and cat images and won't learn to recognise a mouse.
4. Increase the learning rate. AdamOptimizer already adapts the effective step size of each parameter, so 0.00001 is very small; use something like 0.1 or 0.01 (the default for tf.train.AdamOptimizer is 0.001).
I guess the TensorFlow part is correct.
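The shuffling and batching advice above can be sketched in plain NumPy. This is a minimal sketch, not the asker's code: the helper names `shuffled_split` and `iterate_minibatches`, the 80/20 split and the toy data are assumptions made for illustration. `train_test_split` from sklearn.model_selection covers the splitting part if scikit-learn is available.

```python
import numpy as np

def shuffled_split(X, Y, test_fraction=0.2, seed=0):
    """Shuffle the rows of X and Y together, then split into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # one shared permutation keeps X and Y aligned
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], Y[train_idx], X[test_idx], Y[test_idx]

def iterate_minibatches(X, Y, batch_size):
    """Yield consecutive (x_batch, y_batch) pairs; the last batch may be smaller."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], Y[start:start + batch_size]

# Toy data (hypothetical): 99 samples sorted by class, as in the dog/cat/mouse example.
labels = np.repeat([0, 1, 2], 33)
X = np.arange(99 * 16, dtype=np.float32).reshape(99, 16)
Y = np.eye(3)[labels]                        # one-hot encoding without an explicit loop

X_train, Y_train, X_test, Y_test = shuffled_split(X, Y)
batches = list(iterate_minibatches(X_train, Y_train, batch_size=20))
print(X_train.shape, X_test.shape, len(batches))
```

Each batch from `iterate_minibatches` could then be fed to `train_op` in place of the single row used per step in the question.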
I am building several simple networks to predict the bike rentals at 500 stations in the upcoming hour, given rentals at all stations in the previous 24 hours. I am working with two architectures: one with a graph convolution (which amounts to updating each station with a learned linear combination of other stations, at each hour) followed by an FNN layer for prediction, and a second with graph convolution -> LSTM -> FNN for prediction.
Before I describe more: I'm getting poorer performance from the model that includes an LSTM unit, which confuses me.
See these two images for a description of each architecture. For each architecture I also add hourly metadata (weather, time, etc.) as a variation; it is shown in red in the images and is not relevant to my question. Image links are at the bottom of the post.
[Architecture 1: GCNN + FNN][1]
[Architecture 2: GCNN + LSTM + FNN][2]
Confusingly, the test RMSE is 3.46 for the first model and 3.57 for the second. Could someone please explain why the second wouldn't be lower, as it seems to be running the exact same processes, except with an additional LSTM unit?
Here are relevant snippets of my code for the GCNN+FNN model:
def gcnn_ddgf(hidden_layer, node_num, feature_in, horizon, learning_rate, beta, batch_size, early_stop_th, training_epochs, X_training, Y_training, X_val, Y_val, X_test, Y_test, scaler, display_step):
    n_output_vec = node_num * horizon # length of output vector at the final layer
    early_stop_k = 0 # early stop patience
    best_val = 10000
    traing_error = 0
    test_error = 0
    pred_Y = []
    tf.reset_default_graph()
    batch_size = batch_size
    early_stop_th = early_stop_th
    training_epochs = training_epochs
    # tf Graph input and output
    X = tf.placeholder(tf.float32, [None, node_num, feature_in]) # X is the input signal
    Y = tf.placeholder(tf.float32, [None, n_output_vec]) # y is the regression output
    # define dictionaries to store layers weight & bias
    weights_hidden = {}
    weights_A = {}
    biases = {}
    vec_length = feature_in
    weights_hidden['h1'] = tf.Variable(tf.random_normal([vec_length, hidden_layer], stddev=0.5))
    biases['b1'] = tf.Variable(tf.random_normal([1, hidden_layer], stddev=0.5))
    weights_A['A1'] = tf.Variable(tf.random_normal([node_num, node_num], stddev=0.5))
    weights_hidden['out'] = tf.Variable(tf.random_normal([hidden_layer, horizon], stddev=0.5))
    biases['bout'] = tf.Variable(tf.random_normal([1, horizon], stddev=0.5))
    # Construct model
    pred= gcn(X, weights_hidden, weights_A, biases, node_num, horizon) #see below
    pred = scaler.inverse_transform(pred)
    Y_original = scaler.inverse_transform(Y)
    cost = tf.sqrt(tf.reduce_mean(tf.pow(pred - Y_original, 2)))
    #optimizer = tf.train.RMSPropOptimizer(learning_rate, decay).minimize(cost)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta).minimize(cost)
    # Initializing the variables
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(training_epochs):
            avg_cost_sq = 0.
            num_train = X_training.shape[0]
            total_batch = int(num_train/batch_size)
            for i in range(total_batch):
                _, c = sess.run([optimizer, cost], feed_dict={X: X_training[i*batch_size:(i+1)*batch_size,],
                                                              Y: Y_training[i*batch_size:(i+1)*batch_size,]})
                avg_cost_sq += np.square(c) * batch_size #/ total_batch
            # rest part of training dataset
            if total_batch * batch_size != num_train:
                _, c = sess.run([optimizer, cost], feed_dict={X: X_training[total_batch*batch_size:num_train,],
                                                              Y: Y_training[total_batch*batch_size:num_train,]})
                avg_cost_sq += np.square(c) * (num_train - total_batch*batch_size)
            avg_cost = np.sqrt(avg_cost_sq / num_train)
            # validation
            c_val, = sess.run([cost], feed_dict={X: X_val, Y: Y_val})
            if c_val < best_val:
                # testing
                c_tes, preds, Y_true = sess.run([cost, pred, Y_original], feed_dict={X: X_test,Y: Y_test})
                best_val = c_val
                test_error = c_tes
                traing_error = avg_cost
                pred_Y = preds
                early_stop_k = 0 # reset to 0
            # update early stopping patience
            if c_val >= best_val:
                early_stop_k += 1
            # threshold
            if early_stop_k == early_stop_th:
                break
            if epoch % display_step == 0:
                print ("Epoch:", '%04d' % (epoch+1), "Training RMSE: ","{:.9f}".format(avg_cost))
                print("Validation RMSE: ", c_val)
                print("Lowest test RMSE: ", test_error)
        print("epoch is ", epoch)
        print("training RMSE is ", traing_error)
        print("Optimization Finished! the lowest validation RMSE is ", best_val)
        print("The test RMSE is ", test_error)
    return best_val, pred_Y ,Y_true,test_error
# code that creates the model
def gcn(signal_in, weights_hidden, weights_A, biases, node_num, horizon):
    signal_in = tf.transpose(signal_in, [1, 0, 2]) # node_num, batch, feature_in
    feature_len = signal_in.shape[2] # feature vector length at the node of the input graph
    signal_in = tf.reshape(signal_in, [node_num, -1]) # node_num, batch*feature_in
    Adj = 0.5*(weights_A['A1'] + tf.transpose(weights_A['A1']))
    Adj = normalize_adj(Adj)
    Z = tf.matmul(Adj, signal_in) # node_num, batch*feature_in
    Z = tf.reshape(Z, [-1, int(feature_len)]) # node_num * batch, feature_in
    signal_output = tf.add(tf.matmul(Z, weights_hidden['h1']), biases['b1'])
    signal_output = tf.nn.relu(signal_output) # node_num * batch, hidden_vec
    final_output = tf.add(tf.matmul(signal_output, weights_hidden['out']), biases['bout']) # node_num * batch, horizon
    # final_output = tf.nn.relu(final_output)
    final_output = tf.reshape(final_output, [node_num, -1, horizon]) # node_num, batch, horizon
    final_output = tf.transpose(final_output, [1, 0, 2]) # batch, node_num, horizon
    final_output = tf.reshape(final_output, [-1, node_num*horizon]) # batch, node_num*horizon
    return final_output
And the code for the GCNN+LSTM+FNN model:
def gcnn_ddgf_lstm(node_num, feature_in, learning_rate, beta, batch_size, early_stop_th, training_epochs, X_training,
                   Y_training, X_val, Y_val, X_test, Y_test, scaler, lstm_layer):
    n_output_vec = node_num # length of output vector at the final layer
    early_stop_k = 0 # early stop patience
    display_step = 1 # frequency of printing results
    best_val = 10000
    traing_error = 0
    test_error = 0
    predic_res = []
    tf.reset_default_graph()
    batch_size = batch_size
    early_stop_th = early_stop_th
    training_epochs = training_epochs
    # tf Graph input and output
    X = tf.placeholder(tf.float32, [None, node_num, feature_in]) # X is the input signal
    Y = tf.placeholder(tf.float32, [None, n_output_vec]) # y is the regression output
    lstm_cell = tf.nn.rnn_cell.LSTMCell(lstm_layer, state_is_tuple=True)
    # define dictionaries to store layers weight & bias
    weights_hidden = {}
    weights_A = {}
    biases = {}
    weights_A['A1'] = tf.Variable(tf.random_normal([node_num, node_num], stddev=0.5))
    weights_hidden['h1'] = tf.Variable(tf.random_normal([lstm_layer, node_num], stddev=0.5))
    biases['h1'] = tf.Variable(tf.random_normal([1, node_num], stddev=0.5))
    weights_hidden['out'] = tf.Variable(tf.random_normal([node_num, node_num], stddev=0.5))
    biases['bout'] = tf.Variable(tf.random_normal([1, node_num], stddev=0.5))
    # Construct model
    pred= gcn_lstm(X, weights_hidden, weights_A, biases, node_num, lstm_cell)
    # pred = scaler.inverse_transform(pred)
    # Y_original = scaler.inverse_transform(Y)
    cost = tf.sqrt(tf.reduce_mean(tf.pow(pred - Y, 2)))
    #optimizer = tf.train.RMSPropOptimizer(learning_rate, decay).minimize(cost)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, beta1=beta).minimize(cost)
    # Initializing the variables
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(training_epochs):
            avg_cost_sq = 0.
            num_train = X_training.shape[0]
            total_batch = int(num_train/batch_size)
            for i in range(total_batch):
                _, c = sess.run([optimizer, cost], feed_dict={X: X_training[i*batch_size:(i+1)*batch_size,],
                                                              Y: Y_training[i*batch_size:(i+1)*batch_size,]})
                avg_cost_sq += np.square(c) * batch_size #/ total_batch
            # rest part of training dataset
            if total_batch * batch_size != num_train:
                _, c = sess.run([optimizer, cost], feed_dict={X: X_training[total_batch*batch_size:num_train,],
                                                              Y: Y_training[total_batch*batch_size:num_train,]})
                avg_cost_sq += np.square(c) * (num_train - total_batch*batch_size)
            avg_cost = np.sqrt(avg_cost_sq / num_train)
            # validation
            c_val, = sess.run([cost], feed_dict={X: X_val, Y: Y_val})
            if c_val < best_val:
                c_tes, preds = sess.run([cost, pred], feed_dict={X: X_test,Y: Y_test})
                best_val = c_val
                # save model
                #saver.save(sess, './bikesharing_gcnn_ddgf')
                test_error = c_tes
                traing_error = avg_cost
                early_stop_k = 0 # reset to 0
            # update early stopping patience
            if c_val >= best_val:
                early_stop_k += 1
            # threshold
            if early_stop_k == early_stop_th:
                pred_Y = scaler.inverse_transform(preds)
                Y_true = scaler.inverse_transform(Y_test)
                test_err = tf.sqrt(tf.reduce_mean(tf.pow(pred_Y - Y_true, 2)))
                break
            if epoch % display_step == 0:
                print ("Epoch:", '%04d' % (epoch+1), "Training RMSE: ","{:.9f}".format(avg_cost))
                print("Validation RMSE: ", c_val)
                print("Lowest test RMSE: ", test_error)
        print("epoch is ", epoch)
        print("training RMSE is ", traing_error)
        print("Optimization Finished! the lowest validation RMSE is ", best_val)
        print("The scaled test RMSE is ", test_error)
    return pred_Y, Y_true
def gcn_lstm(signal_in, weights_hidden, weights_A, biases, node_num, lstm_cell):
    signal_in = tf.transpose(signal_in, [1, 0, 2]) # node_num, batch, feature_in
    feature_len = signal_in.shape[2] # feature vector length at the node of the input graph
    signal_in = tf.reshape(signal_in, [node_num, -1]) # node_num, batch*feature_in
    Adj = 0.5*(weights_A['A1'] + tf.transpose(weights_A['A1']))
    Adj = normalize_adj(Adj)
    Z = tf.matmul(Adj, signal_in) # node_num, batch*feature_in
    Z = tf.reshape(Z, [node_num, -1, int(feature_len)]) # node_num, batch, feature_in
    Z = tf.transpose(Z,[1,2,0]) # batch, feature_in, node_num
    # init_state = cell.zero_state(batch_size, tf.float32)
    _, Z = tf.nn.dynamic_rnn(lstm_cell, Z, dtype = tf.float32) # init_state?
    dense_output = tf.add(tf.matmul(Z[1], weights_hidden['h1']), biases['h1'])
    dense_output = tf.nn.relu(dense_output)
    final_output = tf.add(tf.matmul(dense_output, weights_hidden['out']), biases['bout']) # batch, node_num*horizon
    return final_output
In particular, should I be wary that _, Z = tf.nn.dynamic_rnn(lstm_cell, Z, dtype = tf.float32) causes my variables defined elsewhere not to train?
Thanks a lot for any help :)
[1]: https://i.stack.imgur.com/MAO2t.png
[2]: https://i.stack.imgur.com/UDjHw.png
I resolved this.
I have three years of bike-use data for the prediction, and I was using roughly the last three months as my validation/test set. Those last few months were winter, with lower bike use. I got the expected results (GCNN+LSTM outperforms GCNN, though not by much) when I shuffled my training data prior to allocating it to the sets (with sequences preserved for the LSTM).
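The fix described above (shuffle the samples before allocating them to sets, while keeping each LSTM input sequence intact) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `make_windows`, the toy array of 200 hours by 5 stations and the 70/15/15 split are illustrative and not taken from the original code.

```python
import numpy as np

def make_windows(series, window=24):
    """Cut a (hours, stations) array into (previous `window` hours -> next hour) samples.

    Returns X of shape (samples, window, stations) and Y of shape (samples, stations).
    """
    n_samples = series.shape[0] - window
    X = np.stack([series[t:t + window] for t in range(n_samples)])
    Y = series[window:]
    return X, Y

rng = np.random.default_rng(0)
rentals = rng.poisson(3.0, size=(200, 5)).astype(np.float32)  # toy data, not real rentals

X, Y = make_windows(rentals, window=24)

# Shuffle whole samples: the 24 hours inside each window stay in order,
# only the order of the windows changes, so every season can land in every set.
order = rng.permutation(len(X))
X, Y = X[order], Y[order]

n_train = int(0.7 * len(X))
n_val = int(0.15 * len(X))
X_train, Y_train = X[:n_train], Y[:n_train]
X_val, Y_val = X[n_train:n_train + n_val], Y[n_train:n_train + n_val]
X_test, Y_test = X[n_train + n_val:], Y[n_train + n_val:]
print(X_train.shape, X_val.shape, X_test.shape)
```

One caveat with this kind of split: neighbouring windows overlap in time, so samples in the validation and test sets share hours with training samples, which can make the reported errors look better than they would on a truly unseen period.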
I want to build a neural network for the Student Admission dataset (admit, gre, gpa, rank).
I made admit and rank one-hot as follows:
one_hot_data = pd.concat([data, pd.get_dummies(data['rank'], prefix='rank')], axis=1)
one_hot_data = pd.concat([one_hot_data, pd.get_dummies(data['admit'], prefix='admit')], axis=1)
# Drop the previous rank column
data = one_hot_data.drop('rank', axis=1)
data = one_hot_data.drop('admit', axis=1)
print(data.shape)
I split the data using train_test_split and scale it using minmax_scale.
The neural network is as follows:
n_features = X_train.shape[1]
n_labels = y_train.shape[1]
features = tf.placeholder(tf.float32, [None, n_features])
labels = tf.placeholder(tf.float32, [None, n_labels])
w = [
    tf.Variable(tf.random_normal((n_features, 16)), name='Weights_layer_0'),
    tf.Variable(tf.random_normal((16, 4)), name='Weights_layer_1'),
    tf.Variable(tf.random_normal((4, n_labels)), name='Weights_layer_2'),
]
n_layers = len(w)
b = [
    tf.Variable(tf.zeros(16), name='Bias_layer_0'),
    tf.Variable(tf.zeros(4), name='Bias_layer_1'),
    tf.Variable(tf.zeros(n_labels), name='Bias_layer_2'),
]
def neural_network(input, weights, biases):
    for i in range(n_layers-1):
        layer = tf.add(tf.matmul(input if i==0 else layer, weights[i]),biases[i])
        layer = tf.nn.relu(layer)
        # layer = tf.nn.dropout(layer, keep_prob=0.6)
    out_layer = tf.add(tf.matmul(layer, weights[-1]),biases[-1])
    return out_layer
loss_ = []
res = []
prediction = neural_network(features, w, b)
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=labels))
optim = tf.train.AdadeltaOptimizer(0.0001).minimize(loss)
correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
with tf.device('/gpu'):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(10):
            for m,n in zip(X_train_batches, y_train_batches):
                _, l = sess.run([optim, loss],feed_dict={features: m, labels: n})
                loss_.append(l)
            acc = sess.run([accuracy], feed_dict={features: X_train, labels: y_train})
            print(i, acc)
        test_accuracy = sess.run(accuracy,feed_dict={features: X_test, labels: y_test})
        print(test_accuracy)
        res = sess.run(neural_network(features,w,b),feed_dict={features: X})
But the accuracy doesn't change:
0 [0.4857143]
1 [0.4857143]
2 [0.4857143]
3 [0.4857143]
4 [0.4857143]
5 [0.4857143]
6 [0.4857143]
7 [0.4857143]
8 [0.4857143]
9 [0.4857143]
10 [0.4857143]
0.5333333
and the loss stays almost the same:
[0.5546836, 0.5546756, 0.5546678, 0.55466014, 0.55465263, 0.5546452, 0.55463773, 0.55463034, 0.5546232, 0.5546159, 0.5546088, 0.5546016, 0.5545944, 0.5545874, 0.5545803, 0.5545734, 0.55456626, 0.5545592, 0.5545522, 0.5545452]
What is missing? Is my neural network correct? Full code
There may be many possible causes here (and we don't have your data), but, according to my experience, a frequent mistake in such cases is initializing the weights with the default argument of stddev=1.0 in tf.random_normal() (see the docs), as you do here.
A stddev=1.0 is a huge value, and it alone can make your NN go astray. Change it to stddev=0.01 for all your initial weights:
w = [
    tf.Variable(tf.random_normal((n_features, 16), stddev=0.01), name='Weights_layer_0'),
    tf.Variable(tf.random_normal((16, 4), stddev=0.01), name='Weights_layer_1'),
    tf.Variable(tf.random_normal((4, n_labels), stddev=0.01), name='Weights_layer_2'),
]
Other than that, as already suggested in the comments, a learning rate of 0.0001 seems way too small here (given how slowly the loss is decreasing); experiment with higher values (0.01 - 0.001).
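To get a feel for why the initial stddev matters, the sketch below pushes random inputs through untrained layers of the same widths as in the question (16 and 4 hidden units) and compares the spread of the resulting logits. It is only an illustration in NumPy: the 7 input features, the 2 labels, the 256 samples and the helper name `initial_logit_std` are assumptions, and the biases (zero at initialization in the question) are left out.

```python
import numpy as np

def initial_logit_std(stddev, n_features=7, n_labels=2, n_samples=256, seed=0):
    """Spread of the output logits of an untrained 16-4-n_labels ReLU network."""
    rng = np.random.default_rng(seed)
    layer = rng.random((n_samples, n_features))      # inputs in [0, 1), like min-max scaled data
    sizes = [n_features, 16, 4, n_labels]
    for i in range(len(sizes) - 1):
        w = rng.normal(0.0, stddev, size=(sizes[i], sizes[i + 1]))
        layer = layer @ w
        if i < len(sizes) - 2:                       # ReLU on the hidden layers only
            layer = np.maximum(layer, 0.0)
    return layer.std()

std_large = initial_logit_std(stddev=1.0)
std_small = initial_logit_std(stddev=0.01)
print("logit std with stddev=1.0 :", std_large)
print("logit std with stddev=0.01:", std_small)
```

With large initial weights the logits can start far from zero, so the sigmoid outputs begin close to 0 or 1 before any training has happened; with small weights they begin near 0, i.e. probabilities near 0.5.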
I am trying to set things up so that dropout is computed only during training, but somehow it seems that the model doesn't see the dropout layer: when I modify the probabilities, nothing happens. I suspect it's a logic issue in my code, but I can't spot where. Also, I'm relatively new to this world, so please bear with my inexperience. Any help will be much appreciated.
Here's the code. I first create a Boolean placeholder
Train = tf.placeholder(tf.bool,shape=())
which is later fed through the feed dictionary as True (training) or False (test). Then I implemented the forward propagation as follows.
def forward_prop_cost(X, parameters,string,drop_probs,Train):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", ...
    string - ReLU or tanh
    drop_probs = drop probabilities for each layer. First and last == 0
    Train = boolean
    Returns:
    ZL -- the output of the last LINEAR unit
    """
    L = len(drop_probs)-1
    activations = []
    activations.append(X)
    if string == 'ReLU':
        for i in range(1,L):
            Zi = tf.matmul(parameters['W'+str(i)],activations[i-1]) + parameters['b'+str(i)]
            if (Train == True and drop_probs[i] != 0):
                Ai = tf.nn.dropout(tf.nn.relu(Zi),drop_probs[i])
            else:
                Ai = tf.nn.relu(Zi)
            activations.append(Ai)
    elif string == 'tanh': #needs update!
        for i in range(1,L):
            Zi = tf.matmul(parameters['W'+str(i)],activations[i-1]) + parameters['b'+str(i)]
            Ai = tf.nn.dropout(tf.nn.tanh(Zi),drop_probs[i])
            activations.append(Ai)
    ZL = tf.matmul(parameters['W'+str(L)],activations[L-1]) + parameters['b'+str(L)]
    logits = tf.transpose(ZL)
    labels = tf.transpose(Y)
    return ZL
Then I call the model function, where just at the end I pass the values of the Train as true or false, depending on the data set I'm using.
def model(X_train, Y_train, X_test, Y_test,hidden = [12288,25,12,6], string = 'ReLU',drop_probs = [0.,0.4,0.2,0.],
          regular_param = 0.0, starter_learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True, learning_decay = False):
    '''
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    '''
    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3
    (n_x, m) = X_train.shape # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0] # n_y : output size
    costs = [] # To keep track of the cost
    graph = tf.Graph()
    X, Y ,Train = create_placeholders(n_x, n_y)
    parameters = initialize_parameters(hidden)
    #print([n.name for n in tf.get_default_graph().as_graph_def().node])
    ZL = forward_prop_cost(X, parameters,'ReLU',drop_probs,Train)
    #cost = forward_prop_cost(X, parameters,'ReLU',drop_probs,regular_param )
    cost = compute_cost(ZL,Y,parameters,regular_param)
    #optimizer = tf.train.AdamOptimizer(learning_rate = starter_learning_rate).minimize(cost)
    if learning_decay == True:
        increasing = tf.Variable(0, trainable=False)
        learning_rate = tf.train.exponential_decay(starter_learning_rate,increasing * minibatch_size,m, 0.95, staircase=True)
        optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost,global_step=increasing)
    else:
        optimizer = tf.train.AdamOptimizer(learning_rate = starter_learning_rate).minimize(cost)
    # Initialize all the variables
    init = tf.global_variables_initializer()
    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:
        # Run the initialization
        sess.run(init, { Train: True } )
        # Do the training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                epoch_cost += minibatch_cost / num_minibatches
            # Print the cost every 100 epoch
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)
        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")
        # Calculate accuracy on the test set
        correct_prediction = tf.equal(tf.argmax(ZL), tf.argmax(Y))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train, Train: True}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test, Train: False}))
        return parameters
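One possible explanation, offered as an assumption rather than a confirmed diagnosis: in TensorFlow 1.x the test `Train == True` is evaluated once by Python while the graph is being built. `Train` is a placeholder tensor, not the Python value `True`, so the comparison is false and only the `else` branch ever enters the graph, whatever is fed later. A graph-level switch such as `tf.cond` is the usual way to make the choice depend on the fed value. Note also that the second argument of `tf.nn.dropout` in TF 1.x is `keep_prob`, the probability of keeping a unit, not of dropping it. The NumPy sketch below only illustrates the intended behaviour (dropout when training, identity otherwise); `dropout_forward` is a hypothetical helper, not part of the original code.

```python
import numpy as np

def dropout_forward(activations, drop_prob, training, rng):
    """Inverted dropout: zero units with probability drop_prob, during training only."""
    if not training or drop_prob == 0.0:
        return activations                      # test time: leave activations untouched
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones((1000, 25))

train_out = dropout_forward(a, drop_prob=0.4, training=True, rng=rng)
test_out = dropout_forward(a, drop_prob=0.4, training=False, rng=rng)

print("fraction zeroed in training:", np.mean(train_out == 0.0))
print("unchanged at test time:", np.array_equal(test_out, a))
```

If changing `drop_prob` changes the training-time output here but not in the TensorFlow model, that points at the branch selection rather than at the dropout op itself.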
I am able to perform classification with this code. It outputs the probability of each output label. But I need to convert it so that it predicts values; that is, I want a regression layer at the end instead of softmax. How can I achieve this? For example, say I trained the model on the labels 1, 2, 3, 4, 5, but I want the model to predict values other than those 5 labels: given the input, the model may predict 1.3 or 2.5, etc. I want a continuous output rather than a discrete one.
Update
I am trying to achieve a suggested solution from this question
Here
Let's say I have training data. I train the model on whole-number temperatures like 1, 2, 3, 4, 5 degrees; basically, those output temperatures are the labels. How can I predict values that lie between two temperatures, like 2.5 degrees? It is not possible to train for every value of temperature. How can I achieve this?
My model gives the probability of each predicted class:
Temp    Probability
1       0.01
2       0.05
3       0.56
4       0.24
5       0.14
I want my model to predict temperature values like 1.2, 2.7, etc., instead of the probability of each class.
input_height = 1 # 1-dimensional convolution
input_width = 90 #window
num_labels = 5 #output labels
num_channels = 8 #input columns
batch_size = 10
kernel_size = 60
depth = 60
num_hidden = 1000
learning_rate = 0.0001
training_epochs = 8
total_batches = train_x.shape[0] # batch_size
X = tf.placeholder(tf.float32, shape=[None,input_height,input_width,num_channels],name="input")
# X = tf.placeholder(tf.float32, shape=[None,input_width * num_channels], name="input")
# X_reshaped = tf.reshape(X,[-1,1,90,3])
Y = tf.placeholder(tf.float32, shape=[None,num_labels])
c = apply_depthwise_conv(X,kernel_size,num_channels,depth)
p = apply_max_pool(c,20,2)
c = apply_depthwise_conv(p,6,depth*num_channels,depth//10)
shape = c.get_shape().as_list()
c_flat = tf.reshape(c, [-1, shape[1] * shape[2] * shape[3]])
f_weights_l1 = weight_variable([shape[1] * shape[2] * depth * num_channels * (depth//10), num_hidden])
f_biases_l1 = bias_variable([num_hidden])
f = tf.nn.tanh(tf.add(tf.matmul(c_flat, f_weights_l1),f_biases_l1))
out_weights = weight_variable([num_hidden, num_labels])
out_biases = bias_variable([num_labels])
y_ = tf.nn.softmax(tf.matmul(f, out_weights) + out_biases,name="y_")
loss = -tf.reduce_sum(Y * tf.log(y_))
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(loss)
correct_prediction = tf.equal(tf.argmax(y_,1), tf.argmax(Y,1)) #difference between correct output and expected output
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
cost_history = np.empty(shape=[1], dtype=float)
with tf.Session() as session:
    tf.global_variables_initializer().run()
    for epoch in range(training_epochs):
        for b in range(total_batches):
            offset = (b * batch_size) % (train_y.shape[0] - batch_size)
            batch_x = train_x[offset:(offset + batch_size), :, :, :]
            batch_y = train_y[offset:(offset + batch_size), :]
            _, c = session.run([optimizer, loss], feed_dict={X: batch_x, Y: batch_y})
            cost_history = np.append(cost_history, c)
        print "Epoch: ", epoch, " Training Loss: ", c, " Training Accuracy: ",session.run(accuracy, feed_dict={X: train_x, Y: train_y})
    print "Testing Accuracy:", session.run(accuracy, feed_dict={X: test_x, Y: test_y})
If you want to predict which class is detected, just take the argmax of the output along the class axis. The class with the highest probability is the detected class.
predict = tf.argmax(y_, 1)
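If a continuous value is wanted from the existing softmax output, one option (an assumption of this note, not something the code above does) is the probability-weighted average of the class temperatures. Using the probabilities from the table in the question:

```python
import numpy as np

temps = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # class values from the question's table
probs = np.array([0.01, 0.05, 0.56, 0.24, 0.14])  # predicted probabilities from the same table

most_likely = float(temps[np.argmax(probs)])  # what argmax gives: the single best class
expected = float(np.dot(temps, probs))        # weighted average, can fall between classes

print(most_likely)          # 3.0
print(round(expected, 2))   # 3.45
```

This value always stays inside the range of the trained labels (1 to 5 here). A true regression model is a different setup: typically a single linear output unit instead of the softmax layer, trained with a squared-error loss on numeric targets rather than one-hot labels.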
I'm experimenting with TensorFlow (which seems amazing so far!) and I'm playing around with a toy example: a classification problem with a single positive class. I'm generating some features, and if the first feature is above a threshold then the example is "positive".
Full code here:
https://gist.github.com/tnbredillet/f136c2bc40815517e0aa1139bd2060ee
The problem is that the model seems unable to capture that simple relationship.
Of course I'm missing a lot of stuff (CV, regularization, batch normalization, hyperparameter tuning, to name a few).
But still, I would expect the model to manage to figure that one out, right?
Maybe there's simply a bug in my code?
Would welcome any insights :-)
EDIT:
Data generating code:
num_examples = 100000
split = 0.2
num_features = 1
def generate_input_data(num_examples, num_features):
    features = []
    labels = []
    for i in xrange(num_examples):
        features.append(np.random.rand(num_features) * np.random.randint(1, 10) + np.random.rand(num_features))
        if np.random.randint(101) > 90:
            features[i-1][np.random.randint(num_features)] = 0
        hard = ceil(np.sum(features[i-1])) % 2
        easy = 0
        if features[i-1][0] > 3:
            easy = 1
        labels.append(easy)
    df = pd.concat(
        [
            pd.DataFrame(features),
            pd.Series(labels).rename('labels')
        ],
        axis=1,
    )
    return df
def one_hot_encoding(train_df):
    #TODO: handle categorical feature one hot encoding.
    return 0, 0
def scale_data(train_df, test_df):
    categorical_columns, encoding = one_hot_encoding(train_df)
    scaler = MinMaxScaler(feature_range=(0,1))
    scaler.fit(train_df.drop(['labels'], axis=1))
    train_df = pd.concat(
        [
            pd.DataFrame(scaler.transform(train_df.drop('labels', axis=1))),
            train_df['labels']
        ],
        axis=1,
    )
    test_df = pd.concat(
        [
            pd.DataFrame(scaler.transform(test_df.drop('labels', axis=1))),
            test_df['labels']
        ],
        axis=1,
    )
    return train_df, test_df
def preprocess_data(train_df, test_df):
    all_dfs = [train_df, test_df]
    features = set()
    for df in all_dfs:
        features |= set(df.columns)
    for df in all_dfs:
        for f in features:
            if f not in df.columns:
                df[f] = 0.0
    for df in all_dfs:
        df.sort_index(axis=1, inplace=True)
    train_df, test_df = scale_data(train_df, test_df)
    train_df = shuffle(train_df).reset_index(drop=True)
    return train_df, test_df
def get_data(num_examples, split):
    train_df = generate_input_data(num_examples, num_features)
    test_df = generate_input_data(int(ceil(num_examples*split)), num_features)
    return preprocess_data(train_df, test_df)
def get_batch(df, batch_size, epoch):
    start = batch_size*epoch-batch_size
    end = batch_size*epoch
    if end > len(df):
        end = len(df)
    size = end - start
    batch_x = df.drop('labels', axis=1)[start:end].as_matrix()
    batch_y = df['labels'][start:end].as_matrix().reshape(size, 1)
    return batch_x, batch_y
And the network definition/training and evaluation:
train_df, test_df = get_data(num_examples, split)
n_hidden_1 = 8
n_hidden_2 = 4
learning_rate = 0.01
batch_size = 500
num_epochs = 200
display_epoch = 50
def neural_net(x):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
weights = {
    'h1': tf.Variable(tf.random_normal([num_features, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, 1]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([1]))
}
X = tf.placeholder(tf.float32, shape=(None, num_features))
Y = tf.placeholder(tf.float32, shape=(None, 1))
logits = neural_net(X)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
predictions = tf.sigmoid(logits)
predicted_class = tf.greater(predictions, 0.5)
correct = tf.equal(predicted_class, tf.equal(Y,1.0))
accuracy = tf.reduce_mean( tf.cast(correct, 'float') )
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    for epoch in range(1, num_epochs + 1):
        batch_x, batch_y = get_batch(train_df, batch_size, epoch)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if epoch % display_epoch == 0 or epoch == 1:
            loss, acc , pred, fff= sess.run([loss_op, accuracy, predictions, logits],
                                            feed_dict={X: batch_x,
                                                       Y: batch_y})
            c = ', '.join('{}={}'.format(*t) for t in zip(pred, batch_y))
            print("[{}] Batch loss={:.4f}, Accuracy={:.5f}, Logits vs labels= {}".format(epoch, loss, acc, c))
    print("Optimization Finished!")
    batch_x, batch_y = get_batch(test_df, batch_size, 1)
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={X: batch_x,
                                        Y: batch_y}))
final output:
[1] Batch loss=3.2160, Accuracy=0.41000
[50] Batch loss=0.6661, Accuracy=0.61800
[100] Batch loss=0.6472, Accuracy=0.65200
[150] Batch loss=0.6538, Accuracy=0.64000
[200] Batch loss=0.6508, Accuracy=0.64400
Optimization Finished!
('Testing Accuracy:', 0.63999999)
In this case it is not a machine learning algorithm problem, but a bug in your data generation which is scrambling the relationship that you intend. In this function:
def generate_input_data(num_examples, num_features):
    features = []
    labels = []
    for i in xrange(num_examples):
        features.append(np.random.rand(num_features) * np.random.randint(1, 10) + np.random.rand(num_features))
        if np.random.randint(101) > 90:
            features[i-1][np.random.randint(num_features)] = 0
        hard = ceil(np.sum(features[i-1])) % 2
        easy = 0
        if features[i-1][0] > 3:
            easy = 1
        labels.append(easy)
    df = pd.concat(
        [
            pd.DataFrame(features),
            pd.Series(labels).rename('labels')
        ],
        axis=1,
    )
    return df
You are indexing features by i-1 to determine the label. However, xrange generates numbers starting from 0, so you don't need to subtract 1. When you do, every label after the first is computed from the previous example's features rather than its own, so the relationship between an example and its label becomes close to random and essentially unpredictable. Even though the rest of your model is OK, it won't be able to score well.
So you need to index by i instead, e.g. if features[i][0] > 3.
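A corrected version of the generator could look like the sketch below. Several things are assumptions of this sketch rather than part of the original code: it is written for Python 3 (`range` instead of `xrange`), it drops the unused `hard` variable, it adds a hypothetical `seed` argument for reproducibility, and it returns NumPy arrays instead of a DataFrame to keep the example dependency-light.

```python
import numpy as np

def generate_input_data(num_examples, num_features, seed=None):
    """Random features plus a label that is 1 when the example's first feature exceeds 3."""
    rng = np.random.RandomState(seed)
    features = np.empty((num_examples, num_features))
    labels = np.empty(num_examples, dtype=int)
    for i in range(num_examples):
        features[i] = rng.rand(num_features) * rng.randint(1, 10) + rng.rand(num_features)
        if rng.randint(101) > 90:
            features[i, rng.randint(num_features)] = 0   # occasionally zero one feature
        # Label example i from its own features (index i, not i-1).
        labels[i] = 1 if features[i, 0] > 3 else 0
    return features, labels

features, labels = generate_input_data(1000, 1, seed=0)
consistent = np.array_equal(labels, (features[:, 0] > 3).astype(int))
print("labels consistent with features:", consistent)
```

The final check is a cheap sanity test worth keeping next to any synthetic-data generator: if the labels cannot be recomputed from the features by the intended rule, no model can be expected to learn that rule.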