I have a training data, with 1000 rows. I am using Tensorflow for training this data. Also trying to divide this into mini-batches of size 32. While Training the data, i am getting the error as mentioned below
InvalidArgumentError: Incompatible shapes: [1000] vs. [32]
[[{{node logistic_loss_1/mul}}]]
On the contrary, if i don't divide my training data into minibatches, or use a single minibatch of size 1000, the code works fine.
I have defined weights as tf.Variables and running the tensorflow session. See the code below
def sigmoid_cost(z,Y):
print("Entered Cost")
z = tf.squeeze(z)
Y = tf.cast(Y_train,tf.float64)
logits = tf.transpose(z)
labels = (Y)
print(logits.shape)
print(labels.shape)
return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,logits=logits))
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
num_epochs = 1500, minibatch_size = 32, print_cost = True):
hidden_layer = 4
m,n = X_train.shape
n_y = Y_train.shape[0]
X = tf.placeholder(tf.float64,shape=(None,n), name="X")
Y = tf.placeholder(tf.float64,shape=(None),name="Y")
parameters = init_params(n)
z4, parameters = fwd_model(X,parameters)
cost = sigmoid_cost(z4,Y)
num_minibatch = m/minibatch_size
print("Getting Minibatches")
num_minibatch = tf.cast(num_minibatch,tf.int32)
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)
print("Gradient Defination Done")
init = tf.global_variables_initializer()
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init)
sess.run(init_op)
for epoch in range(0,num_epochs):
minibatches = []
minibatches = minibatch(X_train,Y_train,minibatch_size)
minibatch_cost = 0
for i in range (0,len(minibatches)):
(X_m,Y_m) = minibatches[i]
Y_m = np.squeeze(Y_m)
print("Minibatch %d X shape Y Shape ",i, X_m.shape,Y_m.shape)
_ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: X_m, Y: Y_m})
print("Mini Batch Cost is ",minibatch_cost)
epoch_cost = minibatch_cost/num_minibatch
if print_cost == True and epoch % 100 == 0:
print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
print(epoch_cost)
For some reason, while running the cost function the size of either X or Y batch is being taken as 32, 100 or vice-versa. Any help would be appreciated.
I think you are getting above error because of Y = tf.cast(Y_train, tf.float64) line inside sigmoid_cost function. Here, Y_train has 1000 rows, but loss function is expecting 32(which is your batch size).
It should be Y = tf.cast(Y, tf.float64). Infact, there is no need to cast data type here as Y is already of type tf.float64. Check below line:
Y = tf.placeholder(tf.float64,shape=(None),name="Y")
That's why, when you were using a single minibatch of size 1000(full Y_train data), your code was working fine.
Related
I am trying to set an instance so that dropout is compute only during the training session, but somehow it seems that the model doesn't see the dropout layer, as when modifying the probabilities nothing happens. I suspect it's a logic issue in my code, but I can't spot where. Also, I'm relatively new to this world, so please cope with my inexperience. Any help will be much appreciated.
Here's the code. I first create a Boolean placeholder
Train = tf.placeholder(tf.bool,shape=())
which will be then passed into a dictionary value as true(training) or False(test). Then I implemented the forward propagation as follows.
def forward_prop_cost(X, parameters,string,drop_probs,Train):
"""
Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX
Arguments:
X -- input dataset placeholder, of shape (input size, number of examples)
parameters -- python dictionary containing your parameters "W1", "b1", ...
string - ReLU or tanh
drop_probs = drop probabilities for each layer. First and last == 0
Train = boolean
Returns:
ZL -- the output of the last LINEAR unit
"""
L = len(drop_probs)-1
activations = []
activations.append(X)
if string == 'ReLU':
for i in range(1,L):
Zi = tf.matmul(parameters['W'+str(i)],activations[i-1]) + parameters['b'+str(i)]
if (Train == True and drop_probs[i] != 0):
Ai = tf.nn.dropout(tf.nn.relu(Zi),drop_probs[i])
else:
Ai = tf.nn.relu(Zi)
activations.append(Ai)
elif string == 'tanh': #needs update!
for i in range(1,L):
Zi = tf.matmul(parameters['W'+str(i)],activations[i-1]) + parameters['b'+str(i)]
Ai = tf.nn.dropout(tf.nn.tanh(Zi),drop_probs[i])
activations.append(Ai)
ZL = tf.matmul(parameters['W'+str(L)],activations[L-1]) + parameters['b'+str(L)]
logits = tf.transpose(ZL)
labels = tf.transpose(Y)
return ZL
Then I call the model function, where just at the end I pass the values of the Train as true or false, depending on the data set I'm using.
def model(X_train, Y_train, X_test, Y_test,hidden = [12288,25,12,6], string = 'ReLU',drop_probs = [0.,0.4,0.2,0.],
regular_param = 0.0, starter_learning_rate = 0.0001,
num_epochs = 1500, minibatch_size = 32, print_cost = True, learning_decay = False):
'''
Returns:
parameters -- parameters learnt by the model. They can then be used to predict.
'''
ops.reset_default_graph()
tf.set_random_seed(1)
seed = 3
(n_x, m) = X_train.shape # (n_x: input size, m : number of examples in the train set)
n_y = Y_train.shape[0] # n_y : output size
costs = [] # To keep track of the cost
graph = tf.Graph()
X, Y ,Train = create_placeholders(n_x, n_y)
parameters = initialize_parameters(hidden)
#print([n.name for n in tf.get_default_graph().as_graph_def().node])
ZL = forward_prop_cost(X, parameters,'ReLU',drop_probs,Train)
#cost = forward_prop_cost(X, parameters,'ReLU',drop_probs,regular_param )
cost = compute_cost(ZL,Y,parameters,regular_param)
#optimizer = tf.train.AdamOptimizer(learning_rate = starter_learning_rate).minimize(cost)
if learning_decay == True:
increasing = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(starter_learning_rate,increasing * minibatch_size,m, 0.95, staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost,global_step=increasing)
else:
optimizer = tf.train.AdamOptimizer(learning_rate = starter_learning_rate).minimize(cost)
# Initialize all the variables
init = tf.global_variables_initializer()
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
# Run the initialization
sess.run(init, { Train: True } )
# Do the training loop
for epoch in range(num_epochs):
epoch_cost = 0.
num_minibatches = int(m / minibatch_size)
seed = seed + 1
minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
for minibatch in minibatches:
(minibatch_X, minibatch_Y) = minibatch
_ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
epoch_cost += minibatch_cost / num_minibatches
# Print the cost every 100 epoch
if print_cost == True and epoch % 100 == 0:
print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
if print_cost == True and epoch % 5 == 0:
costs.append(epoch_cost)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per fives)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
parameters = sess.run(parameters)
print ("Parameters have been trained!")
# Calculate accuracy on the test set
correct_prediction = tf.equal(tf.argmax(ZL), tf.argmax(Y))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train, Train: True}))
print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test, Train: False}))
return parameters
I'm trying to create a RNN to guess what notes are being played on a piano, given a sound file of piano notes (WAV format). I'm currently cutting the WAV clips into ten-second chunks (2D), padding shorter sections to 10 seconds with zeroes so the input is all regular. However, when I pass in the clips to the RNN, it gives an output of one less dimension (1D) (when taking the last state - should I be taking the state series?).
I've created a simpler RNN to analyze single notes files (2D) and produce one output (1D), which has been successful. However, when trying to apply this same technique to full clips with multiple notes and notes starting/stopping it seems to break down, as I can't seem to change the output shape.
def weight_variable(shape):
initer = tf.truncated_normal_initializer(stddev=0.01)
return tf.get_variable('W', dtype=tf.float32, shape=shape, initializer=initer)
def bias_variable(shape):
initial = tf.constant(0., shape=shape, dtype=tf.float32)
return tf.get_variable('b', dtype=tf.float32,initializer=initial)
def RNN(x, weights, biases, timesteps, num_hidden):
x = tf.unstack(x, timesteps, 1)
# Define a rnn cell with tensorflow
lstm_cell = rnn.LSTMCell(num_hidden)
states_series, current_state = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
return tf.matmul(current_state[1], weights) + biases
# return [tf.matmul(temp,weights) + biases for temp in states_series]
# does this even make sense
# x is for data, y is for targets, shapes are [index, time, frequency], [index, time, output note (s)] respectively
x_train, x_valid, y_train, y_valid = load_data() # removed test
print("Size of:")
print("- Training-set:\t\t{}".format(y_train.shape[0]))
print("- Validation-set:\t{}".format(y_valid.shape[0]))
# print("- Test-set\t{}".format(len(y_test)))
learning_rate = 0.001 # The optimization initial learning rate
epochs = 1000 # Total number of training epochs
batch_size = 100 # Training batch size
display_freq = 100 # Frequency of displaying the training results
threshold = 0.7 # Threshold for determining a "note"
num_hidden_units = 15 # Number of hidden units of the RNN
# Placeholders for inputs (x) and outputs(y)
x = tf.placeholder(tf.float32, shape=(None, stepCount, num_input))
y = tf.placeholder(tf.float32, shape=(None, stepCount, n_classes))
# create weight matrix initialized randomly from N~(0, 0.01)
W = weight_variable(shape=[num_hidden_units, n_classes])
# create bias vector initialized as zero
b = bias_variable(shape=[n_classes])
output_logits = RNN(x, W, b, stepCount, num_hidden_units)
y_pred = tf.nn.softmax(output_logits)
# Define the loss function, optimizer, and accuracy, etc.
# (code removed, irrelevant)
# Creating the op for initializing all variables
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)
global_step = 0
# Number of training iterations in each epoch
num_tr_iter = int(y_train.shape[0] / batch_size)
for epoch in range(epochs):
print('Training epoch: {}'.format(epoch + 1))
x_train, y_train = randomize(x_train, y_train)
for iteration in range(num_tr_iter):
global_step += 1
start = iteration * batch_size
end = (iteration + 1) * batch_size
x_batch, y_batch = get_next_batch(x_train, y_train, start, end)
# Run optimization op (backprop)
feed_dict_batch = {x: x_batch, y: y_batch}
sess.run(optimizer, feed_dict=feed_dict_batch)
if iteration % display_freq == 0:
# Calculate and display the batch loss and accuracy
loss_batch, acc_batch = sess.run([loss, accuracy],
feed_dict=feed_dict_batch)
print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".
format(iteration, loss_batch, acc_batch))
testLoss.append(loss_batch)
testAcc.append(acc_batch)
# Run validation after every epoch
feed_dict_valid = {x: x_valid[:1000].reshape((-1, stepCount, num_input)), y: y_valid[:1000]}
loss_valid, acc_valid = sess.run([loss, accuracy], feed_dict=feed_dict_valid)
print('---------------------------------------------------------')
print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".
format(epoch + 1, loss_valid, acc_valid))
print('---------------------------------------------------------')
validLoss.append(loss_valid)
validAcc.append(acc_batch)
Currently, this is outputting a 1D array of predictions, which really does not make sense in my scenario, but I'm not sure how to change it (it should be outputting predictions for each timestep - i.e. predictions of what notes are playing at each moment in time).
I am doing this project in which i applied minibatches in neural network and calculating epoch cost:-
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001, num_epochs = 1500,
minibatch_size = 32, print_cost = True):
ops.reset_default_graph()
tf.set_random_seed(1)
seed = 3
costs = []
(n_x, m) = X_train.shape
n_y = Y_train.shape[0]
#create placeholder
X, Y = create_placeholder(n_x, n_y)
# init parameter
parameters = init_parameter()
# forward prop
Z3 = forward_prop(X, parameters)
# compute cost
cost = compute_cost(Z3, Y)
# optimizer
optimizer = tf.train.AdamOptimizer(learning_rate= learning_rate).minimize(cost)
# Initialize all variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
for epoch in range(num_epochs):
epoch_cost = 0
num_minibatche = int(m/ minibatch_size)
seed = seed + 1
minibatches = random_mini_batches(X_train, Y_train,
minibatch_size, seed)
for minibatch in minibatches:
(minibatch_X, minibatch_Y) = minibatch
_, minibatch_cost = sess.run([optimizer, cost], feed_dict
= {X: minibatch_X, Y: minibatch_Y})
epoch_cost += minibatch_cost / num_minibatche
# Print the cost every epoch
if print_cost == True and epoch % 100 == 0:
print ("Cost after epoch " ,epoch, np.mean(epoch_cost))
if print_cost == True and epoch % 5 == 0:
costs.append(epoch_cost)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('iterations (per tens)')
plt.title("Learning rate =" + str(learning_rate))
plt.show()
# save the parameters
parameters = sess.run(parameters)
print ("Parameters have been trained!")
# Calculate the correct predictions
correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
# Calculate accuracy on the test set
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
return parameters
SO when i run this code i am getting this error on line:-
---->epoch_cost += minibatch_cost / num_minibatche
---->ValueError: operands could not be broadcast together with shapes (32,) (5,) (32,)
I took minibatche_size = 32 and number of training examples = 1381
But i am totally confused why i am getting this error.
The code you've posted is missing a lot of parts, like the whole model() function, so it's difficult to debug. But based on just what we have, some things here that are supposed to be scalars are in fact, arrays.
epoch_cost starts out as a scalar zero with epoch_cost = 0. Then you add some value to it, then try to print np.mean( epoch_cost ). Why do you take the mean of a scalar? Looks like the code was different earlier, and the migration to a scalar epoch_cost was not successful.
It is easy to imagine that minibatch_cost is returned as an array from TensorFlow - one cost value for each member of the batch. In that case you would need to apply np.mean() right there, like
epoch_cost += np.mean( minibatch_cost ) / num_minibatche
Maybe even num_minibatche somehow became a vector. It comes from
num_minibatche = int(m/ minibatch_size)
and minibatch_size is supposedly 32, so that's all right. But m comes from
(n_x, m) = X_train.shape
and we know nothing of X_train. Maybe m somehow became a vector, and in turn num_minibatche too. You will need to print the value for num_minibatche once calculated and make sure it's what it's supposed to be.
Hope this helps. If you post the whole code, I can help you more.
I want to print the value of MSE at each epoch/batch combination. the code below reports the tensor object representing the mse instead of its value at each iteration:
print("Epoch", epoch, "Batch_Index", batch_index, "MSE:", mse)
Example line of output:
Epoch 0 Batch_Index 0 MSE: Tensor("mse_2:0", shape=(), dtype=float32)
I understand it is because MSE is referencing tf.placeholder nodes which by themselves do not have any data. But once I run the below code:
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
the data should be already available thus values for all nodes depending on that data should be accessible as well, I think requesting an evaluation of the MSE in the print statement results in error
print("Epoch", epoch, "Batch_Index", batch_index, "MSE:", mse.eval())
Output2:
InvalidArgumentError: You must feed a value for placeholder tensor 'X_2' with dtype float and shape [?,9]
...
This tells me that mse.eval() does not see the data defined in sess.run()
Why do we experience such behavior?
How should we change the code to make it report MSA at each specified iteration?
import numpy as np
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data] # ADD COLUMN OF 1s for BIAS!
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]
X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()
n_epochs = 100
batch_size = 100
n_batches = int(np.ceil(m / batch_size))
learning_rate = 0.01
def fetch_batch(epoch, batch_index, batch_size):
np.random.seed(epoch * n_batches + batch_index) # not shown in the book
indices = np.random.randint(m, size=batch_size) # not shown
X_batch = scaled_housing_data_plus_bias[indices] # not shown
y_batch = housing.target.reshape(-1, 1)[indices] # not shown
return X_batch, y_batch
with tf.Session() as sess:
sess.run(init)
for epoch in range(n_epochs):
for batch_index in range(n_batches):
X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
if (epoch % 50 == 0 and batch_index % 100 == 0):
print("Epoch", epoch, "Batch_Index", batch_index, "MSE:", mse)
best_theta = theta.eval()
best_theta
First, I think this kind of debugging and printing and stuff is much easier to do with eager execution enabled in tensorflow.
Without eager execution enabled, "print" in tensorflow will never print the dynamic value of a tensor; it'll only print the name of the tensor, which is rarely what you want. Instead, use tf.Print to inspect the runtime value of the tensor (by doing something like tensor = tf.Print(tensor, [tensor]) as tf.Print does not execute unless its output is used somewhere).
i made it work by modifying the print statement to the following:
print("Epoch", epoch, "Batch_Index", batch_index, "MSE:", mse.eval(feed_dict={X: scaled_housing_data_plus_bias, y: housing_target}))
moreover by referencing complete data set (not batches) i was able to test the generalization of the current batch-based model to the whole sample. It should be easy to extend it to test on the test and hold-out samples as training of the model progresses
i am afraid that such on-the-fly evaluation (even on batches) can have impact on performance of the model. I will do further tests of that.
I am new to tensorflow. I am trying to train a rnn(for phoneme tagging). When I run the code, the loss is not lowing down at all.(My batch size is one because each sequence has different length). I am wondering if I am doing something wrong? Can I optimize the model by feeding the data through a for loop like this? I thought each time I use sess.run for optimization, it will update the variable from it's previous value. Am I wrong?
with tf.variable_scope("foo",reuse=True):
x = tf.placeholder(tf.float32, [1, None, 69])
y = tf.placeholder(tf.int32, [None])
cell = tf.nn.rnn_cell.LSTMCell(48)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
loss =tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y, logits = tf.reshape(outputs,[-1, 48])))
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
saver = tf.train.Saver()
with tf.Session() as sess:
init = tf.global_variables_initializer()
sess.run(init)
for epoch in range(1):
for index, ID in enumerate(list(X.keys())):
x_temp = np.array(X[ID])[np.newaxis,:,:]
y_temp = np.array(Y_int[ID])
_ , _loss = sess.run([optimizer,loss], feed_dict = {x: x_temp, y : y_temp } )
if index % 10 == 0:
print("epoch "+str(epoch)+" samples "+str(index)+" loss: "+str(_loss))
prediction = []
save_path = saver.save(sess, "/tmp/model.ckpt")
for index, ID in enumerate(list(X.keys())):
if index % 10 == 0:
print("epoch "+str(epoch)+" samples "+str(index))
x_temp = np.array(X[ID])[np.newaxis,:,:]
y_temp = np.array(Y_int[ID])
out = sess.run(outputs, feed_dict = {x: x_temp, y : y_temp })
prediction.append(out)