So my task is to predict sequence. I have x,y,z values at time t which are float type. I have to predict sequence that has values x,y,z at time (t + 1).
TIME_STEP = 10
N_FEATURES = N_CLASSES = 3
LEARNING_RATE = 0.01
EPOCHS = 50
BATCH_SIZE = 10
x = tf.placeholder(tf.float32, shape = [None, N_FEATURES], name = 'name')
y = tf.placeholder(tf.float32, shape = [N_CLASSES], name = 'labels')
then I have my lstm model, which looks like:
x = tf.transpose(x, [1, 0])
x = tf.reshape(x, [-1, num_features])
hidden = tf.nn.relu(tf.matmul(x, self.h_W) + self.h_biases)
hidden = tf.split(hidden, self.time_step)
lstm_layers = [tf.contrib.rnn.BasicLSTMCell(self.hidden_units, forget_bias=1.0) for _ in range(2)]
lstm_layers = tf.contrib.rnn.MultiRNNCell(lstm_layers)
outputs, _ = tf.contrib.rnn.static_rnn(lstm_layers, hidden, dtype = tf.float32)
lstm_output = outputs[-1]
and finally I define loss function and optimizer
loss = tf.reduce_mean(tf.square(y - y_pred))
opt = tf.train.AdamOptimizer(learning_rate = LEARNING_RATE).minimize(loss)
for now I want to take previous 10 values to predict the 11th one. so I run session like
for time in range(0, len(X)):
sess.run(opt, feed_dict = {x : X[time: time + TIME_STEP ],
y : Y[time + TIME_STEP + 1]})
but when I check loss for this function it has huge value like 99400290.0 and it increases time by time. This is my first experience with predicting sequences so I think I must be missing something huge
Yes, you should normalize your real world input data and it should use the same scaling (same parameters) that you used on your training set.
The reason is, now your model is trained to accept inputs of a certain shape and scale and for it to perform as intended you'll have to scale your test inputs to it.
(sorry for posting this as an answer, not enough rep for commenting)
Related
I created my first transformer model, after having worked so far with LSTMs. I created it for multivariate time series predictions - I have 10 different meteorological features (temperature, humidity, windspeed, pollution concentration a.o.) and with them I am trying to predict time sequences (24 consecutive values/hours) of air pollution. So my input has the shape X.shape = (75575, 168, 10) - 75575 time sequences, each sequence contains 168 hourly entries/vectors and each vector contains 10 meteo features. My output has the shape y.shape = (75575, 24) - 75575 sequences each containing 24 consecutive hourly values of the air pollution concentration.
I took as a model an example from the official keras site. It is created for classification problems, I only took out the softmax activation and in the last dense layer I set the number of neurons to 24 and I hoped it would work. I runs and trains, but it doesn't do a better job than the LSTMs I have used on the same problem and more importantly - it is very slow - 4 min/epoch. Below I attach the model and I would like to know:
I) Have I done something wrong in the model? can the accuracy or speed be improved? Are there maybe some other parts of the code I need to change for it to work on regression, not classification problems?
II) Also, can a transformer at all work on multivariate problems of my kind (10 features input, 1 feature output) or do transformers only work on univariate problems? Tnx
def build_transformer_model(input_shape, head_size, num_heads, ff_dim, num_transformer_blocks, mlp_units, dropout=0, mlp_dropout=0):
inputs = keras.Input(shape=input_shape)
x = inputs
for _ in range(num_transformer_blocks):
# Normalization and Attention
x = layers.LayerNormalization(epsilon=1e-6)(x)
x = layers.MultiHeadAttention(
key_dim=head_size, num_heads=num_heads, dropout=dropout
)(x, x)
x = layers.Dropout(dropout)(x)
res = x + inputs
# Feed Forward Part
x = layers.LayerNormalization(epsilon=1e-6)(res)
x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
x = layers.Dropout(dropout)(x)
x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
x = x + res
x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
for dim in mlp_units:
x = layers.Dense(dim, activation="relu")(x)
x = layers.Dropout(mlp_dropout)(x)
x = layers.Dense(24)(x)
return keras.Model(inputs, x)
model_tr = build_transformer_model(input_shape=(window_size, X_train.shape[2]), head_size=256, num_heads=4, ff_dim=4, num_transformer_blocks=4, mlp_units=[128], mlp_dropout=0.4, dropout=0.25)
model_tr.compile(loss="mse",optimizer='adam')
m_tr_history = model_tr.fit(x=X_train, y=y_train, validation_split=0.25, batch_size=64, epochs=10, callbacks=[modelsave_cb])
I am trying to create a simple 3D U-net for image segmentation, just to learn how to use the layers. Therefore I do a 3D convolution with stride 2 and then a transpose deconvolution to get back the same image size. I am also overfitting to a small set (test set) just to see if my network is learning.
I created the same net in Keras and it works just fine. Now I want to create in tensorflow but I been having trouble with it.
The cost changes slightly but no matter what I do (reduce learning rate, add more epochs, add more layers, change batch size...) the output is always the same. I believe the net is not updating the weights. I am sure I am doing something wrong but I can find what it is. Any help would be greatly appreciate it.
Here is my code:
def forward_propagation(X):
if ( mode == 'train'): print(" --------- Net --------- ")
# Convolutional Layer 1
with tf.variable_scope('CONV1'):
Z1 = tf.layers.conv3d(X, filters = 16, kernel =[3,3,3], strides = [ 2, 2, 2], padding='SAME', name = 'S2/conv3d')
A1 = tf.nn.relu(Z1, name = 'S2/ReLU')
if ( mode == 'train'): print("Convolutional Layer 1 S2 " + str(A1.get_shape()))
# DEConvolutional Layer 1
with tf.variable_scope('DeCONV1'):
output_deconv1 = tf.stack([X.get_shape()[0] , X.get_shape()[1], X.get_shape()[2], X.get_shape()[3], 1])
dZ1 = tf.nn.conv3d_transpose(A1, filters = 1, kernel =[3,3,3], strides = [2, 2, 2], padding='SAME', name = 'S2/conv3d_transpose')
dA1 = tf.nn.relu(dZ1, name = 'S2/ReLU')
if ( mode == 'train'): print("Deconvolutional Layer 1 S1 " + str(dA1.get_shape()))
return dA1
def compute_cost(output, target, method = 'dice_hard_coe'):
with tf.variable_scope('COST'):
if (method == 'sigmoid_cross_entropy') :
# Make them vectors
output = tf.reshape( output, [-1, output.get_shape().as_list()[0]] )
target = tf.reshape( target, [-1, target.get_shape().as_list()[0]] )
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits = output, labels = target)
cost = tf.reduce_mean(loss)
return cost
and the main function for the model:
def model(X_h5, Y_h5, learning_rate = 0.009,
num_epochs = 100, minibatch_size = 64, print_cost = True):
ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables
#tf.set_random_seed(1) # to keep results consistent (tensorflow seed)
#seed = 3 # to keep results consistent (numpy seed)
(m, n_D, n_H, n_W, num_channels) = X_h5["test_data"].shape #TTT
num_labels = Y_h5["test_mask"].shape[4] #TTT
img_size = Y_h5["test_mask"].shape[1] #TTT
costs = [] # To keep track of the cost
accuracies = [] # To keep track of the accuracy
# Create Placeholders of the correct shape
X, Y = create_placeholders(n_H, n_W, n_D, minibatch_size)
# Forward propagation: Build the forward propagation in the tensorflow graph
nn_output = forward_propagation(X)
prediction = tf.nn.sigmoid(nn_output)
# Cost function: Add cost function to tensorflow graph
cost_method = 'sigmoid_cross_entropy'
cost = compute_cost(nn_output, Y, cost_method)
# Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
# Initialize all the variables globally
init = tf.global_variables_initializer()
# Start the session to compute the tensorflow graph
with tf.Session() as sess:
print('------ Training ------')
# Run the initialization
tf.local_variables_initializer().run(session=sess)
sess.run(init)
# Do the training loop
for i in range(num_epochs*m):
# ----- TRAIN -------
current_epoch = i//m
patient_start = i-(current_epoch * m)
patient_end = patient_start + minibatch_size
current_X_train = np.zeros((minibatch_size, n_D, n_H, n_W,num_channels))
current_X_train[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:]) #TTT
current_X_train = np.nan_to_num(current_X_train) # make nan zero
current_Y_train = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
current_Y_train[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:]) #TTT
current_Y_train = np.nan_to_num(current_Y_train) # make nan zero
feed_dict = {X: current_X_train, Y: current_Y_train}
_ , temp_cost = sess.run([optimizer, cost], feed_dict=feed_dict)
# ----- TEST -------
# Print the cost every 1/5 epoch
if ((i % (num_epochs*m/5) )== 0):
# Calculate the predictions
test_predictions = np.zeros(Y_h5["test_mask"].shape)
for j in range(0, X_h5["test_data"].shape[0], minibatch_size):
patient_start = j
patient_end = patient_start + minibatch_size
current_X_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_channels))
current_X_test[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:])
current_X_test = np.nan_to_num(current_X_test) # make nan zero
current_Y_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
current_Y_test[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:])
current_Y_test = np.nan_to_num(current_Y_test) # make nan zero
feed_dict = {X: current_X_test, Y: current_Y_test}
_, current_prediction = sess.run([cost, prediction], feed_dict=feed_dict)
test_predictions[j:j + minibatch_size,:,:,:,:] = current_prediction
costs.append(temp_cost)
print ("[" + str(current_epoch) + "|" + str(num_epochs) + "] " + "Cost : " + str(costs[-1]))
display_progress(X_h5["test_data"], Y_h5["test_mask"], test_predictions, 5, n_H, n_W)
# plot the cost
plt.plot(np.squeeze(costs))
plt.ylabel('cost')
plt.xlabel('epochs')
plt.show()
return
I call the model with:
model(hdf5_data_file, hdf5_mask_file, num_epochs = 500, minibatch_size = 1, learning_rate = 1e-3)
These are the results that I am currently getting:
Edit:
I have tried reducing the learning rate and it doesn't help. I also tried using tensorboard debug and the weights are not being updated:
I am not sure why this is happening.
I Created the same simple model in keras and it works fine. I am not sure what I am doing wrong in tensorflow.
Not sure if you are still looking for help, as I am answering this question half a year later your posted date. :) I've listed my observations and also some suggestions for you to try below. It my primary observation is right... then you probably just need a coffee break / a night of good sleep.
primary observation:
tf.reshape( output, [-1, output.get_shape().as_list()[0]] ) seems wrong. If you prefer to flatten the vector, it should be something like tf.reshape(output,[-1,np.prod(image_shape_list)]).
other observations:
With such a shallow network, I doubt the network have enough spatial resolution to differentiate tumor voxels from non-tumor voxels. Can you show the keras implementation and the performance compared to a pure tf implementation? I would probably go with 2+ layers, let's .
say with 3 layers, with a stride of 2 per layer, and an input image width of 256, you will end with a width of 32 at your deepest encoder layer. (If you have a limited GPU memory, downsample the input image.)
if changing the loss computation does not work, as #bremen_matt mentioned, reduce LR to say maybe 1e-5.
after the basic architecture tweaks and you "feel" that the network is sort of learning and not stuck, try augmenting the training data, add dropout, batch norm during training, and then maybe fancy up your loss by adding a discriminator.
In reviewing the numerous similar questions concerning multidimensional inputs and a stacked LSTM RNN I have not found an example which lays out the dimensionality for the initial_state placeholder and following rnn_tuple_state below. The attempted [lstm_num_layers, 2, None, lstm_num_cells, 2] is an extension of the code from these examples (http://monik.in/a-noobs-guide-to-implementing-rnn-lstm-using-tensorflow/, https://medium.com/#erikhallstrm/using-the-tensorflow-multilayered-lstm-api-f6e7da7bbe40) with an extra dimension of feature_dim added at the end for the multiple values at each time step of the features (this doesn't work but instead produces a ValueError due to mismatched dimensions in the tensorflow.nn.dynamic_rnn call).
time_steps = 10
feature_dim = 2
label_dim = 4
lstm_num_layers = 3
lstm_num_cells = 100
dropout_rate = 0.8
# None is to allow for variable size batches
features = tensorflow.placeholder(tensorflow.float32,
[None, time_steps, feature_dim])
labels = tensorflow.placeholder(tensorflow.float32, [None, label_dim])
cell = tensorflow.contrib.rnn.MultiRNNCell(
[tensorflow.contrib.rnn.LayerNormBasicLSTMCell(
lstm_num_cells,
dropout_keep_prob = dropout_rate)] * lstm_num_layers,
state_is_tuple = True)
# not sure of the dimensionality for the initial state
initial_state = tensorflow.placeholder(
tensorflow.float32,
[lstm_num_layers, 2, None, lstm_num_cells, feature_dim])
# which impacts these two lines as well
state_per_layer_list = tensorflow.unstack(initial_state, axis = 0)
rnn_tuple_state = tuple(
[tensorflow.contrib.rnn.LSTMStateTuple(
state_per_layer_list[i][0],
state_per_layer_list[i][1]) for i in range(lstm_num_layers)])
# also not sure if expanding the feature dimensions is correct here
outputs, state = tensorflow.nn.dynamic_rnn(
cell, tensorflow.expand_dims(features, -1),
initial_state = rnn_tuple_state)
What would be most helpful is an explanation of the generic situation where:
each time step has N values
each time sequence has S steps
each batch has B sequences
each output has R values
there are L hidden LSTM layers in the network
each layer has M number of nodes
so the pseudocode version of this would be:
# B, S, N, and R are undefined values for the purpose of this question
features = tensorflow.placeholder(tensorflow.float32, [B, S, N])
labels = tensorflow.placeholder(tensorflow.float32, [B, R])
...
which if I could finish I wouldn't be asking here in the first place. Thanks in advance. Any comments on relevant best practices welcome.
After much trial and error the following produces a stacked LSTM dynamic_rnn regardless of the dimensionality of the features:
time_steps = 10
feature_dim = 2
label_dim = 4
lstm_num_layers = 3
lstm_num_cells = 100
dropout_rate = 0.8
learning_rate = 0.001
features = tensorflow.placeholder(
tensorflow.float32, [None, time_steps, feature_dim])
labels = tensorflow.placeholder(
tensorflow.float32, [None, label_dim])
cell_list = []
for _ in range(lstm_num_layers):
cell_list.append(
tensorflow.contrib.rnn.LayerNormBasicLSTMCell(lstm_num_cells,
dropout_keep_prob=dropout_rate))
cell = tensorflow.contrib.rnn.MultiRNNCell(cell_list, state_is_tuple=True)
initial_state = tensorflow.placeholder(
tensorflow.float32, [lstm_num_layers, 2, None, lstm_num_cells])
state_per_layer_list = tensorflow.unstack(initial_state, axis=0)
rnn_tuple_state = tuple(
[tensorflow.contrib.rnn.LSTMStateTuple(
state_per_layer_list[i][0],
state_per_layer_list[i][1]) for i in range(lstm_num_layers)])
state_series, last_state = tensorflow.nn.dynamic_rnn(
cell=cell, inputs=features, initial_state=rnn_tuple_state)
hidden_layer_output = tensorflow.transpose(state_series, [1, 0, 2])
last_output = tensorflow.gather(hidden_layer_output, int(
hidden_layer_output.get_shape()[0]) - 1)
weights = tensorflow.Variable(tensorflow.random_normal(
[lstm_num_cells, int(labels.get_shape()[1])]))
biases = tensorflow.Variable(tensorflow.constant(
0.0, shape=[labels.get_shape()[1]]))
predictions = tensorflow.matmul(last_output, weights) + biases
mean_squared_error = tensorflow.reduce_mean(
tensorflow.square(predictions - labels))
minimize_error = tensorflow.train.RMSPropOptimizer(
learning_rate).minimize(mean_squared_error)
Part of what started this journey down one of many proverbial rabbit holes was the previously referenced examples reshaped the output to accommodate a classifier instead of a regressor (which is what I was attempting to build). Since this is independent of the feature dimensionality it serves as a generic template for this use case.
Inspired by this article, I'm trying to build a Conditional GAN which will use LSTM to generate MNIST numbers. I hope I'm using the same architecture as in the image below (except for the bidirectional RNN in discriminator, taken from this paper):
When I run this model, I've got very strange results. This image shows my model generating number 3 after each epoch. It should look more like this. It's really bad.
Loss of my discriminator network decreasing really fast up to close to zero. However, the loss of my generator network oscillates around some fixed point (maybe diverging slowly). I really don't know what's happening. Here is the most important part of my code (full code here):
timesteps = 28
X_dim = 28
Z_dim = 100
y_dim = 10
X = tf.placeholder(tf.float32, [None, timesteps, X_dim]) # reshaped MNIST image to 28x28
y = tf.placeholder(tf.float32, [None, y_dim]) # one-hot label
Z = tf.placeholder(tf.float32, [None, timesteps, Z_dim]) # numpy.random.uniform noise in range [-1; 1]
y_timesteps = tf.tile(tf.expand_dims(y, axis=1), [1, timesteps, 1]) # [None, timesteps, y_dim] - replicate y along axis=1
def discriminator(x, y):
with tf.variable_scope('discriminator', reuse=tf.AUTO_REUSE) as vs:
inputs = tf.concat([x, y], axis=2)
D_cell = tf.contrib.rnn.LSTMCell(64)
output, _ = tf.nn.dynamic_rnn(D_cell, inputs, dtype=tf.float32)
last_output = output[:, -1, :]
logit = tf.contrib.layers.fully_connected(last_output, 1, activation_fn=None)
pred = tf.nn.sigmoid(logit)
variables = [v for v in tf.all_variables() if v.name.startswith(vs.name)]
return variables, pred, logit
def generator(z, y):
with tf.variable_scope('generator', reuse=tf.AUTO_REUSE) as vs:
inputs = tf.concat([z, y], axis=2)
G_cell = tf.contrib.rnn.LSTMCell(64)
output, _ = tf.nn.dynamic_rnn(G_cell, inputs, dtype=tf.float32)
logit = tf.contrib.layers.fully_connected(output, X_dim, activation_fn=None)
pred = tf.nn.sigmoid(logit)
variables = [v for v in tf.all_variables() if v.name.startswith(vs.name)]
return variables, pred
G_vars, G_sample = run_generator(Z, y_timesteps)
D_vars, D_real, D_logit_real = run_discriminator(X, y_timesteps)
_, D_fake, D_logit_fake = run_discriminator(G_sample, y_timesteps)
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))
D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=D_vars)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=G_vars)
There is most likely something wrong with my model. Anyone could help me make the generator network converge?
There are a few things you can do to improve your network architecture and training phase.
Remove the tf.nn.sigmoid(logit) from both the generator and discriminator. Return just the pred.
Use a numerically stable function to calculate your loss functions and fix the loss functions:
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))
should be:
D_loss_real = tf.nn.sigmoid_cross_entropy_with_logits(
logits=D_real,
labels=tf.ones_like(D_real))
D_loss_fake = tf.nn.sigmoid_cross_entropy_with_logits(
logits=D_fake,
labels=tf.zeros_like(D_fake))
D_loss = -tf.reduce_mean(D_loss_real + D_loss_fake)
G_loss = -tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
logits=D_real,
labels=tf.ones_like(D_real)))
Once you fixed the loss and used a numerically stable function, things will go better. Also, as a rule of thumb, if there's too much noise in the loss, reduce the learning rate (the default lr of ADAM is usually too high when training GANs).
Hope it helps
when i use bilstm model to process NLP problem. i got the error while using session.run().I search on Google it seems that bad feed dict make the error. i print the input x shape, it is (100,), but i define it as that:[100, None,256].
how i can solve the error?
it's my evenrimont:
python:3.6
tensorflow:1.0.0
task:everty description has some tags, like stackoverflow, one question has some tags.i need to build a model to predict tags for questions. my training input x is:[batch_size,None,word_embedding_size] a batch of question description, one description has some words, and one word is expressed as vector as length is 256. input y is:[batch_size,n_classes],
this is my model code:
self.X_inputs = tf.placeholder(tf.float32, [self.n_steps,None,self.n_inputs])
self.targets = tf.placeholder(tf.float32, [None,self.n_classes])
#transpose the input x
x = tf.transpose(self.X_inputs, [1, 0, 2])
x = tf.reshape(x, [-1, self.n_inputs])
x = tf.split(x, self.n_steps)
# lstm cell
lstm_cell_fw = tf.contrib.rnn.BasicLSTMCell(self.hidden_dim)
lstm_cell_bw = tf.contrib.rnn.BasicLSTMCell(self.hidden_dim)
# dropout
if is_training:
lstm_cell_fw = tf.contrib.rnn.DropoutWrapper(lstm_cell_fw, output_keep_prob=(1 - self.dropout_rate))
lstm_cell_bw = tf.contrib.rnn.DropoutWrapper(lstm_cell_bw, output_keep_prob=(1 - self.dropout_rate))
lstm_cell_fw = tf.contrib.rnn.MultiRNNCell([lstm_cell_fw] * self.num_layers)
lstm_cell_bw = tf.contrib.rnn.MultiRNNCell([lstm_cell_bw] * self.num_layers)
# forward and backward
self.outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(
lstm_cell_fw,
lstm_cell_bw,
x,
dtype=tf.float32
)
feed_dict like that:
feed_dict={
self.X_inputs: X_train_batch,
self.targets: y_train_batch
}
the X_train_batch is some sentence, they have such shape, [100,None,256], the'None' means, input sentence has not the same length, from 10 to 1500,i just get the real length. maybe it occur the error?
My question is:do you padding the sentence length as the same, or reshape the inputs while doing such nlp work?