I'm trying to create a 2-layer LSTM (including dropout), but I get the error message 'inputs must be a sequence'.
I use embeddings as the input and I'm not sure how to change these to be a sequence. Any explanations are greatly appreciated.
This is my graph definition:
with tf.name_scope('Placeholders'):
    input_x = tf.placeholder(tf.int32, [None, n_steps], name='input_x')
    input_y = tf.placeholder(tf.float32, [None, n_classes], name='input_y')
    dropout_keep_prob = tf.placeholder(tf.float32, name='dropout_keep_prob')

with tf.name_scope('Embedding_layer'):
    embeddings_var = tf.Variable(tf.random_uniform([vocab_size, EMBEDDING_DIM], -1.0, 1.0), trainable=True)
    embedded_chars = tf.nn.embedding_lookup(embeddings_var, input_x)
    print(embedded_chars, 'embed')

def get_a_cell(lstm_size, keep_prob):
    lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
    drop = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=dropout_keep_prob)
    return drop

with tf.name_scope('lstm'):
    cell = tf.nn.rnn_cell.MultiRNNCell(
        [get_a_cell(num_hidden, dropout_keep_prob) for _ in range(num_layers)]
    )
    lstm_outputs, state = tf.nn.static_rnn(cell=cell, inputs=embedded_chars, dtype=tf.float32)

with tf.name_scope('Fully_connected'):
    W = tf.Variable(tf.truncated_normal([num_hidden, n_classes], stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=n_classes))
    output = tf.nn.xw_plus_b(lstm_outputs, W, b)
    predictions = tf.argmax(output, 1, name='predictions')

with tf.name_scope('Loss'):
    # Cross-entropy loss and optimizer initialization
    loss1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=input_y))
    global_step = tf.Variable(0, name="global_step", trainable=False)
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss1, global_step=global_step)

with tf.name_scope('Accuracy'):
    # Accuracy metrics
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(tf.nn.softmax(output)), input_y), tf.float32))

with tf.name_scope('num_correct'):
    correct_predictions = tf.equal(predictions, tf.argmax(input_y, 1))
    num_correct = tf.reduce_sum(tf.cast(correct_predictions, 'float'), name='num_correct')
EDIT:
When changing static_rnn to dynamic_rnn, the error message changes to the following, failing on the bias (b) variable:
TypeError: 'int' object is not iterable
After I changed the bias term to this:
b = tf.Variable(tf.random_normal([n_classes]))
I get a new error message:
ValueError: Shape must be rank 2 but is rank 3 for 'Fully_connected/xw_plus_b/MatMul' (op: 'MatMul') with input shapes: [?,27,128], [128,6].
Let's assume you use tf.nn.dynamic_rnn. (In the case of tf.nn.static_rnn, the first problem arises because you don't give the input in the right format: tf.nn.static_rnn expects a sequence of tensors, i.e. a list of seq_len tensors each of shape [batch_size x dim], and not a single tensor of shape [batch_size x seq_len x dim], whereas tf.nn.dynamic_rnn accepts such a 3-D tensor directly as input.)
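If you do want to keep tf.nn.static_rnn, a minimal sketch of that conversion, reusing the names from the question and assuming embedded_chars has shape [batch_size, n_steps, EMBEDDING_DIM]:

# split the 3-D tensor into a list of n_steps tensors,
# each of shape [batch_size, EMBEDDING_DIM]
inputs_as_sequence = tf.unstack(embedded_chars, num=n_steps, axis=1)
lstm_outputs, state = tf.nn.static_rnn(cell=cell, inputs=inputs_as_sequence,
                                       dtype=tf.float32)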
I invite you to read the documentation of tf.nn.dynamic_rnn. For your classification problem you probably don't want lstm_outputs but state, which basically contains the last output of your RNN: lstm_outputs contains all of the outputs, whereas here you are only interested in the last one (unless you want to do something like attention for classification, in which case you would need all the outputs).
To get the last output, you basically need to do this:
lstm_outputs, state = tf.nn.dynamic_rnn(cell=cell, inputs=embedded_chars, dtype=tf.float32)
last_output = state[-1].h
state[-1] takes the state of the last cell; its h field contains the last output, which you then pass to your feed-forward network.
Full code
(working, but it computes the wrong accuracy; see the comments in the Accuracy scope)
n_classes = 6
n_steps = 27
num_hidden = 128
dropout_keep_prob = 0.5
vocab_size = 10000
EMBEDDING_DIM = 300
num_layers = 2

with tf.name_scope('Placeholders'):
    input_x = tf.placeholder(tf.int32, [None, n_steps], name='input_x')
    input_y = tf.placeholder(tf.float32, [None, n_classes], name='input_y')
    dropout_keep_prob = tf.placeholder(tf.float32, name='dropout_keep_prob')

with tf.name_scope('Embedding_layer'):
    embeddings_var = tf.Variable(tf.random_uniform([vocab_size, EMBEDDING_DIM], -1.0, 1.0), trainable=True)
    embedded_chars = tf.nn.embedding_lookup(embeddings_var, input_x)
    print(embedded_chars, 'embed')

def get_a_cell(lstm_size, keep_prob):
    lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
    # use the keep_prob argument rather than the outer variable
    drop = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return drop

with tf.name_scope('lstm'):
    cell = tf.nn.rnn_cell.MultiRNNCell(
        [get_a_cell(num_hidden, dropout_keep_prob) for _ in range(num_layers)]
    )
    lstm_outputs, state = tf.nn.dynamic_rnn(cell=cell, inputs=embedded_chars, dtype=tf.float32)
    last_output = state[-1].h

with tf.name_scope('Fully_connected'):
    W = tf.Variable(tf.truncated_normal([num_hidden, n_classes], stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[n_classes]))
    output = tf.nn.xw_plus_b(last_output, W, b)
    predictions = tf.argmax(output, 1, name='predictions')

with tf.name_scope('Loss'):
    # Cross-entropy loss and optimizer initialization
    loss1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=output, labels=input_y))
    global_step = tf.Variable(0, name="global_step", trainable=False)
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss1, global_step=global_step)

with tf.name_scope('Accuracy'):
    # Accuracy metrics
    # NOTE: rounding the softmax is not the usual single-label accuracy;
    # compare argmax predictions instead (see num_correct below)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.round(tf.nn.softmax(output)), input_y), tf.float32))

with tf.name_scope('num_correct'):
    correct_predictions = tf.equal(predictions, tf.argmax(input_y, 1))
    num_correct = tf.reduce_sum(tf.cast(correct_predictions, 'float'), name='num_correct')
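As flagged in the comments, the rounded-softmax accuracy above is not the usual single-label metric. A minimal fix, reusing the correct_predictions tensor (this would replace the accuracy tensor above):

accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32), name='accuracy')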
Related
I'm using one convolutional layer and one fully connected layer in a CNN with two output nodes. I use one input channel and 3 filter channels in the convolutional layer (1-D convolution). When I store the final weight matrix of the fully connected layer, it has shape (36, 2), whereas a single input has 12 features. Now I want to plot the filter weights attached to the first, second, and third channels separately. If I plot the first 12 weights, does that mean they correspond to the first class of the first channel?
def weight_variable(shape):
    initial = tf.truncated_normal(shape, mean=0, stddev=0.1)
    return tf.Variable(initial)

def conv1d(input, filter):
    return tf.nn.conv1d(input, filter, stride=1, padding='SAME')

x = tf.placeholder(tf.float32, [None, FLAGS.image_width])
y_ = tf.placeholder(tf.float32, [None, 2])

input = tf.reshape(x, [-1, FLAGS.image_width, FLAGS.input_channel])
filter = weight_variable([FLAGS.filter_width, FLAGS.input_channel,
                          FLAGS.filter_channel])
conv_out = tf.nn.tanh(conv1d(input, filter))

# Fully_Connected_layer
dim = conv_out.get_shape().as_list()
conv_re = tf.reshape(conv_out, (-1, dim[1]*dim[2]))
W_fc = weight_variable([dim[1]*dim[2], 2])
logits = tf.matmul(conv_re, W_fc)
y_prime = tf.nn.softmax(logits)

# Cross_entropy:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(FLAGS.rLearn).minimize(loss)

# Check_predictions:
correct_prediction = tf.equal(tf.argmax(y_prime, axis=1), tf.argmax(y_, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
W = W_fc.eval()  # shape (36, 2)
W1 = W[0:12, 0]
W2 = W[12:24, 0]
W3 = W[24:36, 0]
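One note, based on how tf.reshape works (this reading is mine, not from the original post): the flattening in conv_re is row-major over (position, channel), so feature index = position * 3 + channel. The first 12 rows of W therefore interleave all three channels rather than belonging to channel 1; per-channel weights are obtained by striding:

W = W_fc.eval()     # shape (36, 2); column 0 = first class
W_ch1 = W[0::3, 0]  # the 12 weights connected to channel 1
W_ch2 = W[1::3, 0]  # the 12 weights connected to channel 2
W_ch3 = W[2::3, 0]  # the 12 weights connected to channel 3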
I am developing a neural network model for classifying benign and malware APKs.
I tried using the tf.squeeze() function, but after using it I am unable to use the optimizer.
def neural_network_model(data):
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weight']), hidden_1_layer['bias'])
    l1 = tf.nn.relu(l1)
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weight']), hidden_2_layer['bias'])
    l2 = tf.nn.relu(l2)
    l3 = tf.add(tf.matmul(l2, hidden_3_layer['weight']), hidden_3_layer['bias'])
    l3 = tf.nn.relu(l3)
    output = tf.matmul(l3, output_layer['weight']) + output_layer['bias']
    return output

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
The shapes of pred and y must be the same; however, when I run the code, the shape of pred is (3799, 2) whereas the shape of y is (1, 3799).
My remarks:
If your labels aren't one-hot encoded, you can use tf.nn.sparse_softmax_cross_entropy_with_logits() without converting them to a one-hot representation. Otherwise, tf.nn.softmax_cross_entropy_with_logits() accepts only one-hot encoded labels.
You can't pass numpy values as inputs to the loss function (or as inputs to anything except feed_dict in session.run()) if you're writing code in graph mode. Use placeholders instead.
The following example illustrates how to use placeholders and feed NumPy arrays of data.
import numpy as np
import tensorflow as tf

# Dummy data with 3 classes for illustration
n_classes = 3
x_train = np.random.normal(size=(3799, 2))  # 3799 samples of size (2, ) each
y_train = np.random.randint(low=0, high=n_classes, size=(1, 3799))

# Define placeholders here
x = tf.placeholder(tf.float32, shape=(None, 2))
y = tf.placeholder(tf.int32, shape=(1, None))

# Define your network here
w = tf.Variable(tf.random_normal(shape=[2, n_classes]), dtype=tf.float32)
b = tf.Variable(tf.zeros([n_classes, ]), dtype=tf.float32)
logits = tf.matmul(x, w) + b
labels = tf.squeeze(y)

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                          labels=labels)
cost = tf.reduce_mean(xentropy)
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

# Training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    cost_val = sess.run(cost, feed_dict={x: x_train, y: y_train})
    print(cost_val)  # 1.8630761
    sess.run(train_op, feed_dict={x: x_train, y: y_train})  # optimizer step
    cost_val = sess.run(cost, feed_dict={x: x_train, y: y_train})
    print(cost_val)  # 1.8619089
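If you prefer one-hot targets instead, a minimal variant of the loss, reusing the placeholders above:

labels_one_hot = tf.one_hot(tf.squeeze(y), depth=n_classes)
xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,
                                                      labels=labels_one_hot)
cost = tf.reduce_mean(xentropy)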
There are a few parameters in the config that I don't fully understand, particularly max_len, hidden_size, and embedding_size.
config = {
    "max_len": 64,
    "hidden_size": 64,
    "vocab_size": vocab_size,
    "embedding_size": 128,
    "n_class": 15,
    "learning_rate": 1e-3,
    "batch_size": 32,
    "train_epoch": 20
}
I get an error:
"ValueError: Cannot feed value of shape (32, 32) for Tensor 'Placeholder:0', which has shape '(?, 64)'"
The TensorFlow graph below is what I have trouble understanding. Is there a way to understand how max_len, hidden_size, and embedding_size need to be set, relative to each other and to the data, to avoid the error above?
embeddings_var = tf.Variable(tf.random_uniform([self.vocab_size, self.embedding_size], -1.0, 1.0),
                             trainable=True)
batch_embedded = tf.nn.embedding_lookup(embeddings_var, self.x)

# multi-head attention
ma = multihead_attention(queries=batch_embedded, keys=batch_embedded)
# FFN(x) = LN(x + point-wisely NN(x))
outputs = feedforward(ma, [self.hidden_size, self.embedding_size])
outputs = tf.reshape(outputs, [-1, self.max_len * self.embedding_size])
logits = tf.layers.dense(outputs, units=self.n_class)

self.loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=self.label))
self.prediction = tf.argmax(tf.nn.softmax(logits), 1)

# optimization
loss_to_minimize = self.loss
tvars = tf.trainable_variables()
gradients = tf.gradients(loss_to_minimize, tvars, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
grads, global_norm = tf.clip_by_global_norm(gradients, 1.0)

self.global_step = tf.Variable(0, name="global_step", trainable=False)
self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
self.train_op = self.optimizer.apply_gradients(zip(grads, tvars), global_step=self.global_step,
                                               name='train_step')

print("graph built successfully!")
max_len is the length of the longest sentence/document (in tokens) in your training set. It is the second dimension of your input tensor (the first being the batch dimension).
Each sentence will be padded to this length. Attention models need a predefined longest sentence because each token will have its respective weight. The error you get means you fed a batch whose sequences were padded to length 32 while the placeholder was built with max_len = 64; pad (or truncate) every sequence to exactly max_len.
hidden_size is the size of the hidden RNN cell and can be set to any value; it is the dimensionality of what is output at each time step.
embedding_size defines the dimensionality of the token representation (e.g. 300 is standard for word2vec, 1024 for BERT embeddings, etc.).
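A minimal sketch of that padding step (train_x is a hypothetical list of token-id sequences; pad_sequences ships with tf.keras in TF 1.x):

train_x_padded = tf.keras.preprocessing.sequence.pad_sequences(
    train_x, maxlen=config["max_len"], padding="post", truncating="post")
# train_x_padded now has shape (num_examples, 64), matching the
# (?, 64) placeholder regardless of hidden_size or embedding_size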
I am interested in sequence tagging for NER. I followed the code at "https://github.com/monikkinom/ner-lstm/blob/master/model.py" to make my model, as below:
X = tf.placeholder(tf.float32, shape=[None, timesteps, num_input])
Y = tf.placeholder("float", [None, timesteps, num_classes])
y_true = tf.reshape(tf.stack(Y), [-1, num_classes])
The inputs are:
X: (batch_size, max_sent_length, word_embed_dim)
Y: (batch_size, max_sent_length, number_of_labels)
Then I pass the values to a bidirectional LSTM unit:
def BiRNN(x):
    x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))

    def rnn_cell():
        cell = tf.nn.rnn_cell.LSTMCell(rnn_size, forget_bias=1, state_is_tuple=True)
        return cell

    fw_cell = rnn_cell()
    bw_cell = rnn_cell()
    output, _, _ = tf.nn.static_bidirectional_rnn(fw_cell, bw_cell, x, dtype=tf.float32)
    weight, bias = weight_and_bias(2 * rnn_size, num_classes)
    output = tf.reshape(tf.transpose(tf.stack(output), perm=[1, 0, 2]), [-1, 2 * rnn_size])
    return tf.matmul(output, weight) + bias
where rnn_size = 128.
Then I perform the following calculations:
logits = BiRNN(X)
logits = tf.reshape(tf.stack(logits), [-1, timesteps, num_classes])
prediction = tf.reshape(logits, [-1, num_classes])

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y_true))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(cost)
I used batch_size = 64 and trained for 30 epochs.
But my model detects only one label every time. I am not able to pinpoint the problem in my code. Please help.
Please check the dimensions of the tensors y_true, output (in both places), logits, and prediction, and verify that they match your expectations.
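A quick sketch of how to inspect the static shapes, using the names from the question:

print(y_true.get_shape())      # expected (?, num_classes)
print(logits.get_shape())      # expected (?, timesteps, num_classes)
print(prediction.get_shape())  # expected (?, num_classes)
# inside BiRNN, `output` is first a list of timesteps tensors of shape
# (?, 2*rnn_size), then a single tensor of shape (?, 2*rnn_size) after
# the transpose/reshape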
I am trying to implement a fully convolutional network (5 layers) in TensorFlow.
But after a short time of training, all my logits collapse to 0.
Has anyone had the same problem?
Here is how I implemented my CONV-ReLU-maxPOOL layer:
def conv_relu_layer(in_data, nb_filters, filter_shape):
    nb_in_channels = int(in_data.shape[3])
    conv_shape = [filter_shape[0], filter_shape[1],
                  nb_in_channels, nb_filters]
    weights = tf.Variable(
        tf.truncated_normal(conv_shape, mean=0., stddev=.05))
    bias = tf.Variable(
        tf.truncated_normal([nb_filters], mean=0., stddev=1.))
    output = tf.nn.conv2d(in_data, weights, [1, 1, 1, 1], padding="SAME")
    output += bias
    output = tf.nn.relu(output)
    return output
def conv_relu_pool_layer(in_data, nb_filters, filter_shape, pool_shape,
                         pooling=tf.nn.max_pool):
    conv_out = conv_relu_layer(in_data, nb_filters, filter_shape)
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, pool_shape[0], pool_shape[1], 1]
    return pooling(conv_out, ksize=ksize, strides=strides, padding="SAME")
Here is my network:
def create_network_5C(in_data, name="5C"):
    c1 = conv_relu_pool_layer(in_data, 64, [5, 5], [2, 2])
    c2 = conv_relu_pool_layer(c1, 128, [5, 5], [2, 2])
    c3 = conv_relu_pool_layer(c2, 256, [5, 5], [2, 2])
    c4 = conv_relu_pool_layer(c3, 64, [5, 5], [2, 2])
    return conv_relu_layer(c4, 2, [5, 5])
The loss function:
def loss(logits, labels, num_classes):
    with tf.name_scope('loss'):
        logits = tf.reshape(logits, (-1, num_classes))
        epsilon = tf.constant(value=1e-4)
        labels = tf.to_float(tf.reshape(labels, (-1, num_classes)))
        softmax = tf.nn.softmax(logits) + epsilon
        # NOTE: `head` is not defined in this snippet; presumably a
        # per-class weight vector defined elsewhere
        cross_entropy = -tf.reduce_sum(
            tf.multiply(labels * tf.log(softmax), head),
            reduction_indices=[1])
        cross_entropy_mean = tf.reduce_mean(cross_entropy)
        tf.add_to_collection('losses', cross_entropy_mean)
        loss = tf.add_n(tf.get_collection('losses'))
    return loss
My main routine:
batch_size = 5

# Load data
x = tf.placeholder(tf.float32, [None, 416, 416, 3], name="x")
y = tf.placeholder(tf.float32, [None, 416, 416, 1], name="y")

# Contrast normalization and computation
x_gcn = tf.map_fn(lambda img: tf.image.per_image_standardization(img), x)
logits = create_network_5C(x_gcn)

# Bring the labels to the same dimensions as the output
y_p = tf.nn.avg_pool(tf.sign(y),
                     ksize=[1, 16, 16, 1], strides=[1, 16, 16, 1], padding="SAME")
y_rshp = tf.reshape(y_p, [batch_size, 416 // 16, 416 // 16])
y_bin = tf.cast(y_rshp > .5, tf.int32)
y_1hot = tf.one_hot(y_bin, 2)

# Compute error
error = loss(logits, y_1hot, 2)
optimizer = tf.train.AdamOptimizer(learning_rate=args.eta).minimize(error)

# Run the session
init_op = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init_op)
    err, _ = session.run([error, optimizer],
                         feed_dict={x: image_batch,
                                    y: label_batch})
I note that if I reduce my network to 2 layers only, the logits won't drop to 0, but it won't learn anything either. If I reduce it to 3 layers, they drop to 0, but only after many iterations (whereas with 5 layers they drop to 0 within a few batches).
Can this be linked to what is called vanishing gradients?
If it's relevant, my specs are: Ubuntu 16.04, Python 3.6.4, TensorFlow 1.6.0.
[EDIT] My problem really looks like dying ReLUs, as mentioned here: StackOverflow: FCN training error. But my data is normalized (to roughly between -2 and +2), and I have already tried changing the initial mean and stddev of my weights and biases.
[EDIT 2] I tried replacing the ReLUs with leaky ReLU or softplus; in both cases, the logits get stuck under 0.1 and the loss stays between 0.6 and 0.7.
Using leaky ReLUs was actually enough; I then just needed to let it train for a huge amount of time.
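For reference, a minimal sketch of that swap inside conv_relu_layer (the alpha value is an illustrative choice, not from the original post; tf.nn.leaky_relu is available from TF 1.4 onward):

output = tf.nn.leaky_relu(output, alpha=0.1)  # instead of tf.nn.relu(output)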