I'm trying to build an RNN in Keras. I don't quite understand the required input format. I can build dense networks no problem, but I think that the RNN layers expect input dimension x batch x time step? Can anyone verify this?
Here is the code I would like to update:
Original code:
def get_generative(G_in, dense_dim=200, out_dim=50, lr=1e-3):
    x = Dense(dense_dim)(G_in)
    x = Activation('tanh')(x)
    G_out = Dense(out_dim, activation='tanh')(x)
    G = Model(G_in, G_out)
    opt = SGD(lr=lr)
    G.compile(loss='binary_crossentropy', optimizer=opt)
    return G, G_out

G_in = Input(shape=[10])
G, G_out = get_generative(G_in)
G.summary()
Modified with GRU layers and some slightly different dimensions:
def get_generative(G_in, dense_dim=10, out_dim=37, lr=1e-3):
    clear_session()
    x = GRU(dense_dim, activation='tanh', return_state=True)(G_in)
    G_out = GRU(out_dim, return_state=True)(x)
    G = Model(G_in, G_out)
    opt = SGD(lr=lr)
    G.compile(loss='binary_crossentropy', optimizer=opt)
    return G, G_out

G_in = Input(shape=(None, 3))
G, G_out = get_generative(G_in)
G.summary()
The error that I am seeing with this code is:
ValueError: Tensor("gru_1/strided_slice:0", shape=(3, 10),
dtype=float32) must be from the same graph as
Tensor("strided_slice_1:0", shape=(?, 3), dtype=float32).
If I remove the None above, I get:
ValueError: Input 0 is incompatible with layer gru_1: expected ndim=3,
found ndim=2
Any explanation would be helpful here.
You get an error because you clear the session after creating the input tensor. That is why the input tensor is not coming from the same graph as the rest of your network. To fix this simply leave out the line clear_session().
Another problem with your code: the second GRU layer expects a sequence input, therefore you should use return_sequences=True inside the first GRU layer. You probably want to leave out the argument return_state=True since that makes the layer return a tuple of tensors (output and state) instead of just one output tensor.
To sum up, the following code should do it:
def get_generative(G_in, dense_dim=10, out_dim=37, lr=1e-3):
    x = GRU(dense_dim, activation='tanh', return_sequences=True)(G_in)
    G_out = GRU(out_dim)(x)
    G = Model(G_in, G_out)
    opt = SGD(lr=lr)
    G.compile(loss='binary_crossentropy', optimizer=opt)
    return G, G_out
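The function can then be called just as in the question:

G_in = Input(shape=(None, 3))  # arbitrary number of time steps, 3 features each
G, G_out = get_generative(G_in)
G.summary()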
The problem here is that RNN layers expect a 3D tensor input of the form: [num samples, time steps, features].
So we can modify the code above as:
def get_generative(G_in, dense_dim=10, out_dim=37, lr=1e-3):
    x = GRU(dense_dim, activation='tanh', return_sequences=True)(G_in)
    G_out = GRU(out_dim)(x)
    G = Model(G_in, G_out)
    opt = SGD(lr=lr)
    G.compile(loss='binary_crossentropy', optimizer=opt)
    return G, G_out

G_in = Input(shape=(1, 3))
G, G_out = get_generative(G_in)
G.summary()
So what we are saying is that we expect an input of an arbitrary number of samples, each of 1 time step with 3 features.
Anna is correct that clear_session() should not be inside the generator function.
Lastly, if you actually want to input data into the network, its shape should also match what we just discussed. You can do this by using numpy reshape:
X = np.reshape(X, (X.shape[0], 1, X.shape[1]))
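For example, a minimal sketch (assuming X starts out as a 2-D array with 3 features per sample):

import numpy as np

X = np.random.rand(100, 3)                      # 100 samples, 3 features each
X = np.reshape(X, (X.shape[0], 1, X.shape[1]))  # -> (100, 1, 3): 1 time step per sample
print(X.shape)                                  # (100, 1, 3)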
Related
I have written a generator function with Keras; before returning X, y from __getitem__ I have double-checked the shapes of the X's and y's and they are alright, but the generator is still giving dimension-mismatch errors and warnings.
(Colab Code to reproduce: https://colab.research.google.com/drive/1bSJm44MMDCWDU8IrG2GXKBvXNHCuY70G?usp=sharing)
My training and validation generators are pretty much the same as:
class ValidGenerator(Sequence):
    def __init__(self, df, batch_size=64, num_classes=None, shuffle=True):
        self.batch_size = batch_size
        self.df = df
        self.indices = self.df.index.tolist()
        self.num_classes = num_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        return int(len(self.indices) // self.batch_size)

    def __getitem__(self, index):
        index = self.index[index * self.batch_size:(index + 1) * self.batch_size]
        batch = [self.indices[k] for k in index]
        X, y = self.__get_data(batch)
        return X, y

    def on_epoch_end(self):
        self.index = np.arange(len(self.indices))
        if self.shuffle == True:
            np.random.shuffle(self.index)

    def __get_data(self, batch):
        # some logic is written here
        # that prepares 3 X features and 3 Y outputs
        X = [input_array_1, input_array_2, input_array_3]
        y = [out_1, out_2, out_3]
        # print(len(X))
        return X, y
I am returning a tuple X, y which has 3 input features and 3 output features each, so the shape of X is (3, 32, 10, 1).
I am using the functional API to build the model (I have things like concatenation and multiple inputs/outputs, which aren't possible with the Sequential API), with the following structure.
When I try to fit the model with the generator using the following code:
train_datagen = TrainGenerator(df=train_df, batch_size=32, num_classes=None, shuffle=True)
valid_datagen = ValidGenerator(df=train_df, batch_size=32, num_classes=None, shuffle=True)
model.fit(train_datagen, epochs=2,verbose=1,callbacks=[checkpoint,es])
I get these warnings and errors, which don't go away:
Epoch 1/2
WARNING:tensorflow:Model was constructed with shape (None, 10) for input Tensor("input_1:0", shape=(None, 10), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
WARNING:tensorflow:Model was constructed with shape (None, 10) for input Tensor("input_2:0", shape=(None, 10), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
WARNING:tensorflow:Model was constructed with shape (None, 10) for input Tensor("input_3:0", shape=(None, 10), dtype=float32), but it was called on an input with incompatible shape (None, None, None).
...
...
call
return super(RNN, self).call(inputs, **kwargs)
/home/eduardo/.virtualenvs/kgpu3/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:975
call
input_spec.assert_input_compatibility(self.input_spec, inputs,
/home/eduardo/.virtualenvs/kgpu3/lib/python3.8/site-packages/tensorflow/python/keras/engine/input_spec.py:176
assert_input_compatibility
raise ValueError('Input ' + str(input_index) + ' of layer ' +
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, None, 88]
I have rechecked the whole code and it shouldn't be possible to have an input of (None, None, None) as in the warning or the error; my input dimension is (3, 32, 10, 1).
Update
I have also tried writing a plain Python generator function and got exactly the same error.
My generator function:
def generate_arrays_from_file(batchsize, df):
    # print(bat)
    inputs = []
    targets = []
    batchcount = 0
    while True:
        df3 = df.loc[np.arange(batchcount * batchsize, (batchcount * batchsize) + batchsize)]
        # some pre-processing
        X = [input_array_1, input_array_2, input_array_3]
        y = [out_1, out_2, out_3]
        yield X, y
        batchcount = batchcount + 1
It seems like something is wrong internally with Keras (maybe due to the fact that I am using the functional API).
Update 2
I also tried outputting a tuple:
X = (input1_X,input2_X,input3_X)
y = (output1_y,output2_y,output3_y)
and also named inputs/outputs, but it doesn't work:
X = {"input_1": input1_X, "input_2": input2_X,"input_3": input3_X}
y = {"output_1": output1_y, "output_2": output2_y,"output_3": output3_y}
Note about problem formulation:
Changing the individual X features to shape (32, 10) instead of (32, 10, 1) might get rid of this error, but that is not what I want: it changes my problem (I no longer have 10 time steps with one feature each).
Keras uses None for dynamic dimensions.
As you can see from the model.summary() output, the model expects shape (None, 10) for each of your inputs, which is two-dimensional. With the batch dimension you should feed three-dimensional data to the model.
But you are feeding four-dimensional data.
I would guess that your model does not split your input list into three separate inputs. Try changing your inputs to a tuple:
X = (input_array_1,input_array_2,input_array_3)
In order to resolve this error:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, None, 88]
TrainGenerator should be changed in the following way.
Current code:
input1_X = np.array(df3['input1_X'].to_list()).reshape(dlen,pad_len,1)
input2_X = np.array(df3['input2_X'].to_list()).reshape(dlen,pad_len,1)
input3_X = np.array(df3['input3_X'].to_list()).reshape(dlen,pad_len,1)
Should be changed to:
input1_X = np.array(df3['input1_X'].to_list()).reshape(dlen,pad_len)
input2_X = np.array(df3['input2_X'].to_list()).reshape(dlen,pad_len)
input3_X = np.array(df3['input3_X'].to_list()).reshape(dlen,pad_len)
The reason is that each of the 3 Inputs expects a 2-dimensional array, but the generator provides a 3-dimensional one. The expected shape is (batch_size, 10).
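As a quick sanity check (the zero arrays below are hypothetical stand-ins for the real feature arrays), each element of X should come out two-dimensional:

import numpy as np

batch_size, pad_len = 32, 10
input1_X = np.zeros((batch_size, pad_len))  # stand-in for the real feature array
input2_X = np.zeros((batch_size, pad_len))
input3_X = np.zeros((batch_size, pad_len))

X = [input1_X, input2_X, input3_X]
for i, arr in enumerate(X, start=1):
    print("input_%d:" % i, arr.shape)  # (32, 10) each, matching Input(shape=(10,))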
I had a similar issue with a custom generator that just had to pass a numpy array of size 10 as input and one single output.
To solve this problem I had to transform the shape of the two vectors passed to the neural network like this:
def slides_generator(integer_list):
    # stuff happens
    x = np_ts[np_index:np_index + 10]  # numpy array
    y = np_ts[np_index + 10]           # numpy array
    yield tf.convert_to_tensor(x)[np.newaxis, ...], tf.convert_to_tensor(y)[np.newaxis, ...]

doge_gen = slides_generator(integer_list)  # next(doge_gen)
Basically you need to pass the two arrays with shape (None, size); in my case they were (None, 10) and (None, 1), and to achieve this I just passed two reshaped tensors.
You need the None dimension as the batch size.
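A self-contained sketch of the same idea (the dummy series and the helper name here are illustrative, not from the original code):

import numpy as np
import tensorflow as tf

np_ts = np.arange(50, dtype=np.float32)  # hypothetical 1-D time series

def slides_generator_demo(series, window=10):
    # yields (1, window) inputs and (1,) targets; the leading 1 is the batch dimension
    for np_index in range(len(series) - window):
        x = series[np_index:np_index + window]
        y = series[np_index + window]
        yield tf.convert_to_tensor(x)[np.newaxis, ...], tf.convert_to_tensor(y)[np.newaxis, ...]

x_batch, y_batch = next(slides_generator_demo(np_ts))
print(x_batch.shape, y_batch.shape)  # (1, 10) (1,)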
What I exactly want is this: if we have a matrix W and a vector V such as
V=[1,2,3,4]
W=[[1,1,1,1],[1,1,1,1],[1,1,1,1],[1,1,1,1]]
we should get the result:
result=[[1,1,1,1],[2,2,2,2],[3,3,3,3],[4,4,4,4]]
I found this method on the website:
V = tf.constant([1,2,4], dtype=tf.float32)
W = tf.constant([[1,2,3,4],[1,2,3,4],[1,2,3,4]], dtype=tf.float32)
tf.multiply(tf.expand_dims(V,1),W)
## produce: [[1,2,3,4],[2,4,6,8],[4,8,12,16]]
which is exactly what I want, but when I implement this in my model it also includes the batch size of the vector, which results in an error such as:
with input shapes: [?,1,297], [?,297,300].
which I assume is the same error that this would produce:
V = tf.constant([[1,2,4]], dtype=tf.float32)
W = tf.constant([[[1,2,3,4],[1,2,3,4],[1,2,3,4]]], dtype=tf.float32)
tf.multiply(tf.expand_dims(V,1),W)
I want to know what the standard procedure is to take each element of the softmax output vector and multiply it, as a weight, with the corresponding vector in the feature tensor.
I found that by using
V = tf.constant([[1,2,4]], dtype=tf.float32)
W = tf.constant([[[1,2,3,4],[1,2,3,4],[1,2,3,4]]], dtype=tf.float32)
h2=tf.keras.layers.multiply([W,tf.expand_dims(V,2)])
the Keras layer handles the batch size part for us, but we have to change the axis argument of expand_dims because we still have to account for the batch dimension of V before feeding it to the layer.
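For instance, a rough sketch with batch shapes like the ones in the question (the sizes 297 and 300 come from the error message above; the batch size of 8 is made up):

import tensorflow as tf

batch = 8                                 # hypothetical batch size
V = tf.random.uniform((batch, 297))       # softmax weights, one scalar per feature vector
W = tf.random.uniform((batch, 297, 300))  # feature tensor

h2 = tf.keras.layers.multiply([W, tf.expand_dims(V, 2)])  # (batch, 297, 1) broadcasts over the last axis
print(h2.shape)  # (8, 297, 300)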
I am trying to use the Keras.backend ops to write a function that I will wrap as a Lambda to use in my model.
There are two tensors, X and Y. X is not trainable. Y is trainable.
The python function that is wrapped is:
import keras.backend as K
from keras.activations import softmax

def _attention(inputs):
    X, Y = inputs
    attention_weight = K.dot(X, K.expand_dims(Y))
    attention_weight = K.squeeze(attention_weight, axis=-1)
    attention_weight = softmax(attention_weight, axis=-1)
    return attention_weight
which I wanted to wrap as:
Y = K.random_normal_variable(shape=(200,), mean=0.0, scale=1.0)
attend = Lambda(_attention)
attention = attend((X,Y))
When I call:
model = Model(inputs=[input], outputs=[attention])
I receive the message
ValueError: Output tensors to a Model must be the output of a TensorFlow Layer (thus holding past layer metadata). Found: Tensor("lambda_2/Softmax:0", shape=(?, ?), dtype=float32)
Do I really need to make a custom layer for the expand_dims, dot product, and squeeze method? I know I could always reshape Y from (dim,) -> (dim,1) but I am still stuck with the squeeze.
I am implementing an OCR with Keras, Tensorflow backend.
I want to use keras.backend.ctc_decode implementation.
I have a model class:
import keras

def ctc_lambda_func(args):
    y_pred, y_true, input_x_width, input_y_width = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    # y_pred = y_pred[:, 2:, :]
    return keras.backend.ctc_batch_cost(y_true, y_pred, input_x_width, input_y_width)

class ModelOcropy(keras.Model):
    def __init__(self, alphabet: str):
        self.img_height = 48
        self.lstm_size = 100
        self.alphabet_size = len(alphabet)

        # check backend input shape (channel first/last)
        if keras.backend.image_data_format() == "channels_first":
            input_shape = (1, None, self.img_height)
        else:
            input_shape = (None, self.img_height, 1)

        # data input
        input_x = keras.layers.Input(input_shape, name='x')

        # training inputs
        input_y = keras.layers.Input((None,), name='y')
        input_x_widths = keras.layers.Input([1], name='x_widths')
        input_y_widths = keras.layers.Input([1], name='y_widths')

        # network
        flattened_input_x = keras.layers.Reshape((-1, self.img_height))(input_x)
        bidirectional_lstm = keras.layers.Bidirectional(
            keras.layers.LSTM(self.lstm_size, return_sequences=True, name='lstm'),
            name='bidirectional_lstm'
        )(flattened_input_x)
        dense = keras.layers.Dense(self.alphabet_size, activation='relu')(bidirectional_lstm)
        y_pred = keras.layers.Softmax(name='y_pred')(dense)

        # ctc loss
        ctc = keras.layers.Lambda(ctc_lambda_func, output_shape=[1], name='ctc')(
            [dense, input_y, input_x_widths, input_y_widths]
        )

        # init keras model
        super().__init__(inputs=[input_x, input_x_widths, input_y, input_y_widths], outputs=[y_pred, ctc])

        # ctc decoder
        top_k_decoded, _ = keras.backend.ctc_decode(y_pred, input_x_widths)
        self.decoder = keras.backend.function([input_x, input_x_widths], [top_k_decoded[0]])
        # decoded_sequences = self.decoder([test_input_data, test_input_lengths])
My use of ctc_decode comes from another post: Keras using Lambda layers error with K.ctc_decode
I get an error:
ValueError: Shape must be rank 1 but is rank 2 for 'CTCGreedyDecoder' (op: 'CTCGreedyDecoder') with input shapes: [?,?,7], [?,1].
I guess I have to squeeze my input_x_widths, but Keras does not seem to have such a function (Input always outputs something like (batch_size, 1)).
Indeed, the function is expecting a 1D tensor, and you've got a 2D tensor.
Keras does have the keras.backend.squeeze(x, axis=-1) function.
And you can also use keras.backend.reshape(x, (-1,))
If you need to go back to the old shape after the operation, you can use either:
keras.backend.expand_dims(x)
keras.backend.reshape(x,(-1,1))
Complete fix:
# ctc decoder
flattened_input_x_width = keras.backend.reshape(input_x_widths, (-1,))
top_k_decoded, _ = keras.backend.ctc_decode(y_pred, flattened_input_x_width)
self.decoder = keras.backend.function([input_x, flattened_input_x_width], [top_k_decoded[0]])
# decoded_sequences = self.decoder([input_x, flattened_input_x_width])
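Equivalently (a small variation, not part of the original fix), keras.backend.squeeze can be used in place of the reshape:

# alternative: drop the trailing length-1 axis with squeeze instead of reshape
flattened_input_x_width = keras.backend.squeeze(input_x_widths, axis=-1)
top_k_decoded, _ = keras.backend.ctc_decode(y_pred, flattened_input_x_width)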
I am using a recurrent neural network in tensorflow with BasicLSTMCells. Basically, I have an input sequence of word ids, I convert each id to word embeddings, pass the word embeddings one at a time through the rnn, and then make a prediction for a single word after reading the whole sequence. My embedding matrix is of dimension V x H where V is the size of my vocabulary and H is the number of hidden units in my rnn. In order to make a prediction for the next word, I multiply my hidden vector by a weight matrix of size H x V and then compute a softmax. With the setup I described, everything seems to work as expected. I'm able to train on some examples and make reasonable predictions.
However, I've noticed that if I try to use the transpose of the embedding matrix, which will be a matrix of size H x V, instead of a separate matrix for the softmax layer, tensorflow raises a value error claiming that the dimensions of something it's not specifying don't have the same rank. I've verified that the dimensions of my embedding matrix (well, its transpose) are the same as those of the separate softmax matrix I'm creating. Changing just the one line of code from using my embedding matrix vs a separate softmax weight matrix causes the error.
I created a relatively small program to demonstrate what I'm trying to do and to show what causes the error. I was not able to make the error occur on a smaller network when I tried with just a single hidden layer network.
import sys
import time
import tensorflow as tf
from tensorflow.models.rnn import rnn
from tensorflow.models.rnn import rnn_cell
from tensorflow.models.rnn.rnn_cell import BasicLSTMCell
import numpy as np

INPUT_LENGTH = 17
BATCH_SIZE = 20
VOCAB_SIZE = 11
NUM_EPOCHS = 1000
HIDDEN_UNITS = 100

class Model(object):
    def __init__(self, is_training):
        initializer = tf.random_uniform_initializer(-1.0, 1.0)
        self._target = tf.placeholder(tf.float32, [BATCH_SIZE, VOCAB_SIZE])
        self._input_data = tf.placeholder(tf.int32, [BATCH_SIZE, INPUT_LENGTH])
        self.embedding = tf.get_variable("embedding",
                                         [VOCAB_SIZE, HIDDEN_UNITS],
                                         initializer=initializer)
        self.inputs = tf.split(1, INPUT_LENGTH,
                               tf.nn.embedding_lookup(self.embedding, self._input_data))
        self.inputs2 = [tf.squeeze(input_, [1]) for input_ in self.inputs]

        cell = rnn_cell.BasicLSTMCell(num_units=HIDDEN_UNITS)
        initial_state = cell.zero_state(BATCH_SIZE, tf.float32)
        outputs, states = rnn.rnn(cell, self.inputs2,
                                  initial_state=initial_state)
        self._outputs = outputs[-1]

        self.soft_w = tf.get_variable("softmax_w",
                                      [HIDDEN_UNITS, VOCAB_SIZE],
                                      initializer=initializer)
        prod = tf.matmul(self._outputs, self.soft_w)
        # uncommenting out the following line causes the error
        # prod = tf.matmul(self._outputs, self.embedding, False, True)
        soft_b = tf.get_variable("softmax_b", [VOCAB_SIZE],
                                 initializer=initializer)
        self._logits = tf.nn.bias_add(prod, soft_b)
        self._loss = tf.nn.softmax_cross_entropy_with_logits(self._logits,
                                                             self._target)
        if not is_training:
            return

        learning_rate = .010001
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        self._train_op = optimizer.minimize(self._loss)

    def train(self, sess, inputs, targets):
        t = np.zeros((BATCH_SIZE, VOCAB_SIZE))
        for i, target in enumerate(targets):
            t[i, target] = 1.0
        inputs = np.array(inputs)
        inputs = inputs.reshape(BATCH_SIZE, INPUT_LENGTH)
        fd = {self._target: t,
              self._input_data: inputs}
        o = sess.run([self._train_op, self._loss, self._outputs, self.embedding, self.soft_w], feed_dict=fd)
        print o[2].shape
        print o[3].shape
        print o[4].shape
        sys.exit()
        return np.mean(o[1])

# this just generates dummy data
def read_data_rows(count):
    ret = []
    for i in range(count):
        inputs = [4] * INPUT_LENGTH
        output = 1
        ret.append((inputs, output))
    return ret

def main():
    start = time.time()
    tf.set_random_seed(1)
    print "creating model", time.time() - start
    m = Model(is_training=True)
    with tf.Session() as sess:
        print "initializing variables", time.time() - start
        tf.initialize_all_variables().run()
        for epoch in range(NUM_EPOCHS):
            train_rows = read_data_rows(100)
            for row_num in range(0, len(train_rows), BATCH_SIZE):
                qs = []
                ans = []
                batch = train_rows[row_num:row_num + BATCH_SIZE]
                for b in batch:
                    qs.append(b[0])
                    ans.append(b[1])
                m.train(sess, qs, ans)

if __name__ == "__main__":
    main()
The error I see when uncommenting the line I mentioned above is:
ValueError: Shapes TensorShape([Dimension(100)]) and TensorShape([Dimension(17), Dimension(100)]) must have the same rank
What is the cause of this error? Why is the embedding matrix not treated the same way as the matrix self.soft_w?
The 0.6.0 (and earlier) release of TensorFlow had a bug in the implementation of gradients for tf.nn.embedding_lookup() and tf.gather() when the indices argument (self._input_data in your code) had more than one dimension.
Upgrading to the latest source release should fix this error. Otherwise, this commit has the relevant change (to array_grad.py) that will enable your program to work.
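If upgrading is not an option, one possible workaround (a sketch only, assuming the bug is avoided when the indices passed to tf.nn.embedding_lookup() are one-dimensional) is to flatten the indices before the lookup and restore the shape afterwards:

# workaround sketch: look up embeddings with 1-D indices, then restore the shape
flat_ids = tf.reshape(self._input_data, [-1])                   # (BATCH_SIZE * INPUT_LENGTH,)
flat_embeds = tf.nn.embedding_lookup(self.embedding, flat_ids)  # (BATCH_SIZE * INPUT_LENGTH, HIDDEN_UNITS)
embeds = tf.reshape(flat_embeds, [BATCH_SIZE, INPUT_LENGTH, HIDDEN_UNITS])
self.inputs = tf.split(1, INPUT_LENGTH, embeds)                 # same downstream structure as before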