Giving input of different sizes to LSTM model in keras - python

I have a multi-class (4-class) classification model in Keras.
While training, the model expects the input shape to be (None, None, 300). That is, if there are 'n' different input sequences, the input shape should be (n, None, 300). In my case, the size of each input sequence is different.
Say the input sequences have shapes (1000, 300), (1500, 300), (1200, 300) and (2000, 300). Now I need to put them together into shape (4, None, 300). I tried using a numpy array, but a numpy array won't give a shape of (4, None, 300); instead it will be (4L,).
Now I want to know how to train my model. Is it possible with numpy arrays, or are there different data structures available?

Since your sequences are of different durations, you may consider padding them with zeros (adjusting the loss/labels accordingly) and then packing them into one fixed-shape array:
import numpy as np

max_duration = 2000
in_ = np.zeros((4, max_duration, 300), dtype='f4')
for i in range(4):
    # copy each variable-length sequence into the zero-padded array
    # (seq is the list of the four input sequences from the question)
    in_[i, :len(seq[i]), :] = seq[i]
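Since your model is in Keras, one hedged way to make the LSTM ignore those padded time steps (instead of adjusting the loss/labels by hand) is a Masking layer. A minimal sketch, assuming a 4-class softmax head; the layer sizes are illustrative:

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(None, 300)))  # skip all-zero timesteps
model.add(LSTM(64))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(in_, labels)  # labels: hypothetical one-hot array of shape (4, 4)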

Related

Variable sentence length for LSTM using word2vec as inputs on tensorflow

I am building an LSTM model using word2vec as the input. I am using the tensorflow framework. I have finished the word embedding part, but I am stuck with the LSTM part.
The issue here is that I have different sentence lengths, which means that I have to either do padding or use dynamic_rnn with a specified sequence length. I am struggling with both of them.
Padding.
The confusing part is when and how I should do the padding. My model goes like this:
word_matrix = model.wv.syn0
X = tf.placeholder(tf.int32, shape)
data = tf.placeholder(tf.float32, shape)
data = tf.nn.embedding_lookup(word_matrix, X)
Then, I am feeding sequences of word indices (for word_matrix) into X. I am worried that if I pad zeros onto the sequences fed into X, then I would incorrectly keep feeding unnecessary input (word_matrix[0] in this case).
So, I am wondering what the correct way of zero padding is. It would be great if you could let me know how to implement it with tensorflow.
dynamic_rnn
For this, I have declared a list containing all the lengths of the sentences and feed those along with X and y at the end. In this case, I cannot feed the inputs as a batch, though. Then, I encountered this error (ValueError: as_list() is not defined on an unknown TensorShape.), which suggests to me that the sequence_length argument only accepts a list? (My thoughts might be entirely incorrect, though.)
The following is my code for this.
X = tf.placeholder(tf.int32)
labels = tf.placeholder(tf.int32, [None, numClasses])
length = tf.placeholder(tf.int32)
data = tf.placeholder(tf.float32, [None, None, numDimensions])
data = tf.nn.embedding_lookup(word_matrix, X)
lstmCell = tf.contrib.rnn.BasicLSTMCell(lstmUnits, state_is_tuple=True)
lstmCell = tf.contrib.rnn.DropoutWrapper(cell=lstmCell, output_keep_prob=0.25)
initial_state = lstmCell.zero_state(batchSize, tf.float32)
value, _ = tf.nn.dynamic_rnn(lstmCell, data, sequence_length=length,
                             initial_state=initial_state, dtype=tf.float32)
I am really struggling with this part, so any help would be very much appreciated.
Thank you in advance.
TensorFlow does not support variable-length tensors, so when you declare a tensor, the list/numpy array should have a uniform shape.
From your first part, what I understand is that you are already able to pad zeros into the last time steps of each sequence, which is the ideal situation. Here is how it should look for a batch size of 4, max sequence length 10 and 50 hidden units:
[4,10,50] would be the size of your whole batch, but internally, it may be shaped like this when you try to visualize the paddings:
`[[5+5pad,50],[10,50],[8+2pad,50],[9+1pad,50]]`
Each pad represents one time step of a size-50 hidden-state tensor, filled with nothing but zeros. Look at this question and this one to learn more about how to pad manually.
You will use dynamic RNN for the exact reason that you do not want to compute over the padded time steps. The tf.nn.dynamic_rnn API ensures that by taking the sequence_length argument.
For the above example, that argument will be [5,10,8,9]. You can compute it by summing the non-zero entries for each batch component. A simple way to compute that would be:
# reduce over the feature axis first so that data_len has shape [batch_size]
data_mask = tf.cast(tf.reduce_max(tf.abs(data), axis=2), tf.bool)
data_len = tf.reduce_sum(tf.cast(data_mask, tf.int32), axis=1)
and pass it to the tf.nn.dynamic_rnn API:
tf.nn.dynamic_rnn(lstmCell, data, sequence_length=data_len, initial_state=initial_state)
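For completeness, here is a self-contained sketch (TF 1.x, matching the question's API) tying the pieces together; the lengths [5, 10, 8, 9] are from the example above, everything else is illustrative:

import numpy as np
import tensorflow as tf

max_len, hidden = 10, 50
data = tf.placeholder(tf.float32, [None, max_len, hidden])

# a time step counts as "real" if any of its features is non-zero
mask = tf.reduce_max(tf.abs(data), axis=2) > 0
data_len = tf.reduce_sum(tf.cast(mask, tf.int32), axis=1)

cell = tf.contrib.rnn.BasicLSTMCell(hidden)
value, state = tf.nn.dynamic_rnn(cell, data, sequence_length=data_len,
                                 dtype=tf.float32)

batch = np.zeros((4, max_len, hidden), dtype=np.float32)
for i, n in enumerate([5, 10, 8, 9]):
    batch[i, :n, :] = np.random.rand(n, hidden)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(data_len, feed_dict={data: batch}))  # -> [ 5 10  8  9]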

Format multiple inputs with multiple categories for a functional keras model and feed it to the model

I can't figure out how to correctly feed training data to a functional keras model. I have two input types: Image data and float numbers, each number belonging to one image. This data is classified into 6 classes. How do I need to format my input data and how do I need to define it in my keras network?
The image data is analyzed by a CNN and should then be concatenated with the float numbers. Afterwards, three dense layers are used for classification. There doesn't seem to be an example or tutorial that is similar to my problem.
Two separate inputs:
imageInput = Input(image_shape) #often, image_shape is (pixelsX, pixelsY, channels)
floatInput = Input(float_shape) #if one number per image, shape is: (1,)
The convolutional part:
convOut = SomeConvLayer(...)(imageInput)
convOut = SomeConvLayer(...)(convOut)
#...
convOut = SomeConvLayer(...)(convOut)
If necessary, do something similar with the other input.
Joining the two branches:
#Please make sure you use compatible shapes
#You should probably not have spatial dimensions anymore at this point
#Probably some kind of GlobalPooling:
convOut = GlobalMaxPooling2D()(convOut)
#concatenate the values:
joinedOut = Concatenate()([convOut,floatInput])
#or use some floatOut if there were previous layers on the float side
Do more stuff with your joined output:
joinedOut = SomeStuff(...)(joinedOut)
joinedOut = Dense(6, ...)(joinedOut)
Create the model with two inputs:
model = Model([imageInput,floatInput], joinedOut)
Train with:
model.fit([X_images, X_floats], classes, ...)
Where classes is a "one-hot encoded" tensor containing the correct class(es) for each image.
There isn't "one correct solution", though. You could try a lot of different things, such as "adding the number" somewhere in the middle of the convolutions, or multiplying it, or creating more convolutions after you manage to concatenate the values somehow.... this is art.
The input data
The input and output data should be numpy arrays.
The arrays should be shaped as:
- Image input: `(number_of_images, side1, side2, channels)`
- Floats input: `(number_of_images, number_of_floats_per_image)`
- Outputs: `(number_of_images, number_of_classes)`
Keras will know everything necessary from these shapes: row 0 in all arrays will be image 0, row 1 will be image 1, and so on.
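Putting it all together, here is a minimal end-to-end sketch of the model described above; all sizes (64x64 images, 32/64 filters, 10 samples of dummy data) are illustrative, not prescribed by the question:

import numpy as np
from keras.models import Model
from keras.layers import Input, Conv2D, GlobalMaxPooling2D, Concatenate, Dense

imageInput = Input((64, 64, 3))
floatInput = Input((1,))

convOut = Conv2D(32, 3, activation='relu')(imageInput)
convOut = Conv2D(64, 3, activation='relu')(convOut)
convOut = GlobalMaxPooling2D()(convOut)  # drop the spatial dimensions

joinedOut = Concatenate()([convOut, floatInput])
joinedOut = Dense(32, activation='relu')(joinedOut)
joinedOut = Dense(6, activation='softmax')(joinedOut)

model = Model([imageInput, floatInput], joinedOut)
model.compile(optimizer='adam', loss='categorical_crossentropy')

# dummy data in the shapes listed above
X_images = np.random.rand(10, 64, 64, 3)
X_floats = np.random.rand(10, 1)
classes = np.eye(6)[np.random.randint(0, 6, 10)]  # one-hot labels
model.fit([X_images, X_floats], classes, epochs=1)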

Keras: feed images into CNN and get image output

So far, I've been practicing neural networks on numerical datasets in pandas, but now I need to create a model that will take an image as input and output a binary mask of that image.
I have my training data as numpy arrays of shape (602, 2048, 2048, 1): 602 images of dimensions 2048x2048 with one channel. The array of output masks has the same dimensions.
What I can't figure out is how to define the first layer or how to correctly feed the data into the model. I would greatly appreciate your help on this issue.
Well, this is not a "rule", but probably you will be using mostly 2D conv and related layers.
You feed everything as numpy arrays, as usual, maybe normalizing the values. Common options are:
Between 0 and 1 (just divide by 255.)
Between -1 and 1 (divide by 255., multiply by 2, subtract 1)
Caffe style: subtract from each channel a specific value to "center" the values based on their usual mean without rescaling them.
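In numpy, those three options look roughly like this (the `imgs` array here is a stand-in for your image data):

import numpy as np

imgs = np.random.randint(0, 256, (10, 64, 64, 3)).astype('float32')  # stand-in images
imgs01 = imgs / 255.                              # between 0 and 1
imgs11 = imgs / 255. * 2. - 1.                    # between -1 and 1
imgs_centered = imgs - imgs.mean(axis=(0, 1, 2))  # Caffe style: per-channel centering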
Your model should start with something like:
inputTensor = Input((2048,2048,1))
output = Conv2D(filters, kernel_size, .....)(inputTensor)
Or, in sequential models: model.add(Conv2D(..., input_shape=(2048,2048,1)))
Later, it's up to you to decide which layers to use.
Conv2D
MaxPooling2D
UpSampling2D
Whether you're going to create a linear model or if you're going to divide branches, join branches, etc. is also your call.
Models in a U-Net style should be a good start for you.
What you can't do:
Don't use Flatten layers (actually you can, if you later reshape the output for having image dimensions... but why?)
Don't use Global Pooling layers (you don't want to sacrifice your spatial dimensions)
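A minimal fully convolutional sketch in that direction (not a real U-Net, which would add skip connections; the filter counts are illustrative):

from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D

inputTensor = Input((2048, 2048, 1))
x = Conv2D(16, 3, activation='relu', padding='same')(inputTensor)
x = MaxPooling2D(2)(x)                          # 2048 -> 1024
x = Conv2D(32, 3, activation='relu', padding='same')(x)
x = UpSampling2D(2)(x)                          # 1024 -> 2048
output = Conv2D(1, 1, activation='sigmoid')(x)  # one-channel binary mask

model = Model(inputTensor, output)
model.compile(optimizer='adam', loss='binary_crossentropy')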

How to feed input with changing size in Tensorflow

I want to train a network on planar curves, which I represent as numpy arrays with shape (L,2).
The number 2 stands for x,y coordinates and L is the number of points, which varies across my dataset. I treat x,y as 2 different "channels".
I implemented a function, next_batch(batch_size), that provides the next batch as a 1-D numpy array with shape (batch_size,), containing elements which are 2-D arrays with shape (L,2). These are my curves and, as mentioned before, L differs between elements. (I didn't want to confine myself to a fixed number of points per curve.)
My question:
How can I manipulate the output of next_batch() so that I am able to feed the network the input curves, using a scheme similar to the one in the Tensorflow tutorial: https://www.tensorflow.org/get_started/mnist/pros
i.e., using the feed_dict mechanism.
In the given tutorial the input size was fixed; in the tutorial's code line:
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
batch[0] has a fixed shape: (50,784) (50 = #samples,784 = #pixels)
I cannot transform my input into a numpy array with shape (batch_size,L,2), since the array must have a fixed size in every dimension.
So what can I do?
I already defined a placeholder (that can have unknown size):
# first dimension is the sample dim, second is the curve length, third is the x,y coordinates
x = tf.placeholder(tf.float32, [None, None,2])
but how can I feed it properly?
Short answer that you're probably looking for: you can't, without padding or grouping samples by length.
To elaborate a bit: in tensorflow, dimensions must be fixed throughout a batch, and jagged arrays are not natively supported.
Dimensions may be unknown a priori (in which case you set the placeholders' dimensions to None) but are still inferred at runtime, so
your solution of having a placeholder:
x = tf.placeholder(tf.float32, [None, None, 2])
couldn't work because it's semantically equivalent to saying "I don't know the constant length of the curves in a batch a priori, infer it at runtime from the data".
This is not to say that your model in general can't accept inputs of different dimensions, if you structure it accordingly, but the data that you feed it each time you call sess.run() must have fixed dimensions.
Your options, then, are as follows:
Pad your batches along the second dimension.
Say that you have 2 curves of shape (4, 2) and (5, 2) and you know the maximum curve length in your dataset is 6; you could use np.pad as follows:
In [1]: max_len = 6
   ...: curve1 = np.random.rand(4, 2)
   ...: curve2 = np.random.rand(5, 2)
   ...: batch = [curve1, curve2]

In [2]: for b in batch:
   ...:     dim_difference = max_len - b.shape[0]
   ...:     print np.pad(b, [(0, dim_difference), (0, 0)], 'constant')
   ...:
[[ 0.92870128 0.12910409]
[ 0.41894655 0.59203704]
[ 0.3007023 0.52024492]
[ 0.47086336 0.72839691]
[ 0. 0. ]
[ 0. 0. ]]
[[ 0.71349902 0.0967278 ]
[ 0.5429274 0.19889411]
[ 0.69114597 0.28624011]
[ 0.43886002 0.54228625]
[ 0.46894651 0.92786989]
[ 0. 0. ]]
Have your next_batch() function return batches of curves grouped by length.
These are the standard ways of doing things when dealing with jagged arrays.
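A hedged sketch of the second option, bucketing curves by length so that every batch has a uniform L (the function and variable names here are hypothetical):

from collections import defaultdict
import numpy as np

def batches_by_length(curves, batch_size):
    buckets = defaultdict(list)
    for c in curves:
        buckets[c.shape[0]].append(c)  # group curves by their number of points L
    for same_len in buckets.values():
        for i in range(0, len(same_len), batch_size):
            yield np.stack(same_len[i:i + batch_size])  # shape (b, L, 2)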
Another possibility, if your task allows for it, is to concatenate all your points in a single tensor of shape (None, 2) and change your model to operate on single points as if they were samples in a batch. If you save the original sample lengths in a separate array, you can then restore the model outputs by slicing them correctly. This is highly inefficient and requires all sorts of assumptions on your problem, but it's a possibility.
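A rough sketch of that alternative, assuming a model that operates on single points (the `outputs` variable is a stand-in for wherever the model would actually run):

import numpy as np

curves = [np.random.rand(4, 2), np.random.rand(5, 2)]
lengths = [len(c) for c in curves]
flat = np.concatenate(curves, axis=0)  # shape (9, 2): points become "samples"

outputs = flat  # stand-in for per-point model outputs, shape (total_points, ...)
splits = np.cumsum(lengths)[:-1]
per_curve = np.split(outputs, splits, axis=0)  # restore one array per curve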
Cheers and good luck!
You can use inputs with different sizes in TF. Just feed the data in the same way as in the tutorial you listed, but make sure to define the changing dimensions in the placeholder as None.
Here's a simple example of feeding a placeholder with different shapes:
import tensorflow as tf
import numpy as np
array1 = np.arange(9).reshape((3,3))
array2 = np.arange(16).reshape((4,4))
array3 = np.arange(25).reshape((5,5))
model_input = tf.placeholder(dtype='float32', shape=[None, None])
sqrt_result = tf.sqrt(model_input)
with tf.Session() as sess:
    print(sess.run(sqrt_result, feed_dict={model_input: array1}))
    print(sess.run(sqrt_result, feed_dict={model_input: array2}))
    print(sess.run(sqrt_result, feed_dict={model_input: array3}))
You can create a placeholder with shape [None, ..., None]. Each 'None' means the input can have any length at that dimension. For example, [None, None] stands for a matrix with any number of rows and columns that you can feed. However, you should take care about which kind of NN you use, because when you deal with a CNN, at the convolution and pooling layers you must identify the specific size of the 'tensor'.
Tensorflow Fold might be of interest to you.
From the Tensorflow Fold README:
TensorFlow Fold is a library for creating TensorFlow models that consume structured data, where the structure of the computation graph depends on the structure of the input data. Fold implements dynamic batching. Batches of arbitrarily shaped computation graphs are transformed to produce a static computation graph. This graph has the same structure regardless of what input it receives, and can be executed efficiently by TensorFlow.
The graph structure can be set up so as to accept an arbitrary L value so that any structured input can be read in. This is especially helpful when building architectures such as recursive neural nets. The overall structure is very similar to what you are used to (feed dicts, etc). Since you need a dynamic computational graph for your application, this might be a good move for you in the long run.

TensorFlow Multi-Layer Perceptron

I am learning TensorFlow, and my goal is to implement a multilayer perceptron for my needs. I checked the MNIST tutorial with a multilayer perceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                              y: batch_y})
I guess x is an image itself (28*28 pixels, so the input is 784 neurons) and y is a label, which is a 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into session?
When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, the batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything?
When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples at its rows and features at its columns, or whether it could maybe be a 2-D list.
When I feed my float numpy array X_train to x, which is:
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'
They feed whole batches (which are packs of data points and labels)!
Yes, this is how neural networks are usually trained (due to some nice mathematical properties of having the best of two worlds: a better gradient approximation than single-sample SGD on one hand, and much faster convergence than full-batch GD on the other).
How does tensorflow interpret this "batch" input?
It "interprets" it according to the operations in your graph. You probably have a reduce_mean somewhere in your graph, which calculates the average over your batch, thus causing this to be the "interpretation".
And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
As in the previous answer, there is nothing "magical" about a batch; it is just another dimension, and each internal operation of the neural net is well defined for a batch of data, so there is still a single update in the end. Since you use a reduce_mean operation (or maybe reduce_sum?), you are updating according to the mean of the "small" gradients (or the sum, if reduce_sum is used instead). Again, you could control it (up to the aggregation behaviour: you cannot force it to do per-sample updates unless you introduce a while loop into the graph).
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into the session?
Just set the variables n_input=1 and n_classes=4, and push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1, if by "1x1" you mean "one sample of dimension 1"; your edit suggests that you actually do have a batch, and by 1x1 you meant a 1-d input).
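A small sketch of that setup (TF 1.x; the optimizer/cost part of the graph is omitted):

import numpy as np
import tensorflow as tf

n_input, n_classes = 1, 4
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# one sample: shape (1, 1) in, one-hot shape (1, 4) out
X_train = np.array([[0.5]], dtype=np.float32)
y_train = np.array([[0., 1., 0., 0.]], dtype=np.float32)
# fed exactly as before:
# sess.run([optimizer, cost], feed_dict={x: X_train, y: y_train})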
EDIT: 1. When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, the batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything? 2. When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples at its rows and features at its columns, or whether it could maybe be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, thus in the first dimension, and this is exactly what [batch, n_inputs] says: that you have batch rows, each with n_inputs columns. But again, there is nothing special about it, and you could also create a graph which accepts column-wise batches if you really needed to.
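A tiny numpy illustration of why nothing gets "split": a dense layer is one matrix multiply over the whole [batch, n_inputs] matrix at once.

import numpy as np

batch, n_inputs, n_units = 3, 5, 4
X = np.random.rand(batch, n_inputs)   # samples in rows
W = np.random.rand(n_inputs, n_units)
b = np.random.rand(n_units)
out = X @ W + b                       # shape (batch, n_units): all samples at once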
