I want to train a network with planar curves, which I represent as numpy arrays with shape (L,2).
The number 2 stands for x,y coordinates and L is the number of points which is changing in my dataset. I treat x,y as 2 different "channels".
I implemented a function, next_batch(batch_size), that provides the next batch as a 1D numpy array with shape (batch_size,), whose elements are 2D arrays with shape (L,2). These are my curves, and as mentioned before, L differs between the elements. (I didn't want to restrict the curves to a fixed number of points.)
My question:
How can I manipulate the output from next_batch() so that I will be able to feed the network with the input curves, using a scheme similar to what appears in the Tensorflow tutorial: https://www.tensorflow.org/get_started/mnist/pros
i.e, using the feed_dict mechanism.
In the given tutorial the input size was fixed. In the tutorial's code line:
train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
batch[0] has a fixed shape: (50,784) (50 = #samples,784 = #pixels)
I cannot transform my input into numpy array with shape (batch_size,L,2)
since the array should have fixed size in every dimension.
So what can I do?
I already defined a placeholder (that can have unknown size):
#first dimension is the sample dim, second is curve length, third:x,y coordinates
x = tf.placeholder(tf.float32, [None, None,2])
but how can I feed it properly?
Short answer that you're probably looking for: you can't without padding or grouping samples by length.
To elaborate a bit: in tensorflow, dimensions must be fixed throughout a batch, and jagged arrays are not natively supported.
Dimensions may be unknown a priori (in which case you set the placeholders' dimensions to None) but are still inferred at runtime, so
your solution of having a placeholder:
x = tf.placeholder(tf.float32, [None, None, 2])
couldn't work because it's semantically equivalent to saying "I don't know the constant length of the curves in a batch a priori, infer it at runtime from the data".
This is not to say that your model in general can't accept inputs of different dimensions, if you structure it accordingly, but the data that you feed it each time you call sess.run() must have fixed dimensions.
Your options, then, are as follows:
Pad your batches along the second dimension.
Say that you have 2 curves of shape (4, 2) and (5, 2) and you know the maximum curve length in your dataset is 6; you could use np.pad as follows:
In [1]: import numpy as np
...: max_len = 6
...: curve1 = np.random.rand(4, 2)
...: curve2 = np.random.rand(5, 2)
...: batch = [curve1, curve2]
In [2]: for b in batch:
...:     dim_difference = max_len - b.shape[0]
...:     print(np.pad(b, [(0, dim_difference), (0, 0)], 'constant'))
...:
[[ 0.92870128 0.12910409]
[ 0.41894655 0.59203704]
[ 0.3007023 0.52024492]
[ 0.47086336 0.72839691]
[ 0. 0. ]
[ 0. 0. ]]
[[ 0.71349902 0.0967278 ]
[ 0.5429274 0.19889411]
[ 0.69114597 0.28624011]
[ 0.43886002 0.54228625]
[ 0.46894651 0.92786989]
[ 0. 0. ]]
Have your next_batch() function return batches of curves grouped by length.
These are the standard ways of doing things when dealing with jagged arrays.
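Both options can be sketched in a few lines of numpy. This is only an illustration: pad_batch and group_by_length are hypothetical helper names (not part of any library), and padding to the longest curve in the batch rather than in the whole dataset is an assumption.

```python
import numpy as np

def pad_batch(curves, pad_value=0.0):
    """Pad variable-length (L, 2) curves to the longest curve in the batch."""
    lengths = np.array([c.shape[0] for c in curves])
    max_len = lengths.max()
    batch = np.full((len(curves), max_len, 2), pad_value, dtype=np.float32)
    for i, c in enumerate(curves):
        # copy the real points; the tail of the row keeps pad_value
        batch[i, :c.shape[0], :] = c
    return batch, lengths

def group_by_length(curves):
    """Bucket curves by length so each bucket stacks into a regular array."""
    buckets = {}
    for c in curves:
        buckets.setdefault(c.shape[0], []).append(c)
    return {length: np.stack(group) for length, group in buckets.items()}

curves = [np.random.rand(4, 2), np.random.rand(5, 2), np.random.rand(4, 2)]
padded, lengths = pad_batch(curves)   # regular array plus the true lengths
buckets = group_by_length(curves)     # one regular array per distinct length
```

Either padded or any single value of buckets is a regular (batch_size, L, 2) array, so it can be fed to a [None, None, 2] placeholder. Keeping lengths around lets the model or the loss ignore the padded points later.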
Another possibility, if your task allows for it, is to concatenate all your points in a single tensor of shape (None, 2) and change your model to operate on single points as if they were samples in a batch. If you save the original sample lengths in a separate array, you can then restore the model outputs by slicing them correctly. This is highly inefficient and requires all sorts of assumptions on your problem, but it's a possibility.
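A rough numpy sketch of that bookkeeping, assuming a model that acts on each point independently (the multiplication by 2 is only a stand-in for such a model):

```python
import numpy as np

curves = [np.random.rand(4, 2), np.random.rand(5, 2), np.random.rand(3, 2)]
lengths = [c.shape[0] for c in curves]

# Flatten all curves into one (total_points, 2) array
points = np.concatenate(curves, axis=0)

# Stand-in for a per-point model: any function applied row by row
outputs = points * 2.0

# Cut the outputs back into one array per original curve
split_at = np.cumsum(lengths)[:-1]
per_curve = np.split(outputs, split_at, axis=0)
```

per_curve[i] then holds the outputs that belong to curves[i], in the original point order.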
Cheers and good luck!
You can use input with different sizes in TF. Just feed the data in the same way as in the tutorial you listed, but make sure to define the changing dimensions in the placeholder as None.
Here's a simple example of feeding a placeholder with different shapes:
import tensorflow as tf
import numpy as np
array1 = np.arange(9).reshape((3,3))
array2 = np.arange(16).reshape((4,4))
array3 = np.arange(25).reshape((5,5))
# both dimensions are left open, so each call may feed a different shape
model_input = tf.placeholder(dtype='float32', shape=[None, None])
sqrt_result = tf.sqrt(model_input)
with tf.Session() as sess:
    print(sess.run(sqrt_result, feed_dict={model_input:array1}))
    print(sess.run(sqrt_result, feed_dict={model_input:array2}))
    print(sess.run(sqrt_result, feed_dict={model_input:array3}))
You can declare the placeholder with a shape of [None, ..., None]. Each 'None' means that the size of that dimension is left open and is taken from the data you feed. For example, [None, None] means a matrix with any number of rows and columns. However, you should take care about which kind of NN you use, because when you deal with a CNN, the convolution and pooling layers need the specific size of the 'tensor'.
Tensorflow Fold might be of interest to you.
From the Tensorflow Fold README:
TensorFlow Fold is a library for creating TensorFlow models that consume structured data, where the structure of the computation graph depends on the structure of the input data. Fold implements dynamic batching. Batches of arbitrarily shaped computation graphs are transformed to produce a static computation graph. This graph has the same structure regardless of what input it receives, and can be executed efficiently by TensorFlow.
The graph structure can be set up so as to accept an arbitrary L value so that any structured input can be read in. This is especially helpful when building architectures such as recursive neural nets. The overall structure is very similar to what you are used to (feed dicts, etc). Since you need a dynamic computational graph for your application, this might be a good move for you in the long run.
Related
I'm trying to train an N-tuple network using keras. An N-tuple network is just a sparse array of one-hot activated patterns. Imagine a chess board with 64 squares, each square containing one of N possible types of pieces, so there will always be exactly 64 activated ones among 64*N possible parameters, stored as a 2d array [64][N]. Or take every possible 2x2 square, giving N^4 possible configurations for each such square. Such a network is linear and outputs 1 value. The training is good old SGD and the like.
I successfully trained the network using my code in c++, using lookup tables and summing. But I tried to do it in keras, as keras allows for different optimization algorithms, use of GPUs etc. For starters I changed the 2d array into one big vector, but soon it became impractical. There are thousands of possible parameters, of which only a handful (a fixed number) are ones and the rest are zeros.
I was wondering if in keras (or similar library) it is possible to use training data like this: 13,16,11,11,5,...,3, where those numbers would be indexes, instead of using one big vector of 0,0,0,1,0,0,......,1,0,0,0,....,1,0,0,0,...
You could use tf.sparse.SparseTensor(...) and then set sparse=True for tf.keras.Input(...).
import tensorflow as tf

def sparse_one_hot(y, n_classes):
    # y: int64 tensor of shape (m, 1) holding one class index per row
    m = y.shape[0]
    rows = tf.range(m, dtype='int64')[:, None]
    # SparseTensor indices are (row, column) pairs, so the row index comes first
    indices = tf.concat([rows, y], axis=1)
    ones = tf.ones(shape=(m, ), dtype='float32')
    # n_classes is passed in: counting unique labels undercounts when a class is absent
    sparse_y = tf.sparse.SparseTensor(indices, ones, dense_shape=(m, n_classes))
    return sparse_y

y = tf.random.uniform(shape=(10, 1), minval=0, maxval=4, dtype=tf.int64)
sparse_y = sparse_one_hot(y, n_classes=4) # sparse_y.values, sparse_y.indices
# set sparse=True, for Input
# tf.keras.Input(..., sparse=True, ...)
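For the index-based training data the question asks about, it may help to see that looking up weights by index gives the same output as multiplying by the big one-hot array. A small numpy sketch, where the board size, the number of piece types (13) and the random weights are all made up for illustration:

```python
import numpy as np

n_squares, n_types = 64, 13          # hypothetical board and piece-type count
rng = np.random.default_rng(0)
weights = rng.normal(size=(n_squares, n_types))   # one weight per (square, type)

# One training sample stored as indices: the piece type on each square
pieces = rng.integers(0, n_types, size=n_squares)

# Lookup-and-sum, as in the C++ implementation described in the question
out_lookup = weights[np.arange(n_squares), pieces].sum()

# Equivalent dense one-hot formulation
one_hot = np.zeros((n_squares, n_types))
one_hot[np.arange(n_squares), pieces] = 1.0
out_dense = (one_hot * weights).sum()
```

The two outputs agree, so storing each sample as 64 integers loses nothing compared with the 64*N one-hot array. An embedding-style lookup layer in keras rests on the same gather-and-sum idea, though whether it fits this exact model is an assumption.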
I have a multi-class(4-class) classification model in keras which looks like 1
While training, the model expects the input shape to be (None,None,300). That is, If there are 'n' different input sequences, then the input shape should be (n,None,300). In my case, the size of each input sequence is different.
Say, the input sequences are of shapes (1000,300), (1500,300), (1200,300) and (2000,300). Now I need to put them together to (4,None,300). I tried using numpy array, but numpy array won't give shape of (4,None,300),instead it will be (4L,).
Now I want to know how to train my model. Is it possible to do this with numpy arrays, or is there a different data structure available?
Since your sequences have different durations, you may consider padding them with zeros (adjusting the loss/labels accordingly) and then stacking them into a single array:
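import numpy as np
# seq is assumed to be the list of the 4 input arrays, each of shape (length, 300)
max_duration = 2000
in_ = np.zeros((4, max_duration, 300), dtype='f4')
for i in range(4):
    # copy sequence i into the start of row i; the rest stays zero
    in_[i, :len(seq[i]), :] = seq[i]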
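One way to "adjust the loss" is to build a mask from the true lengths and average only over real time steps. The sketch below assumes the loss is computed per time step; the short sequences and the constant per_step_loss are placeholders, not the model from the question.

```python
import numpy as np

# Two short hypothetical sequences standing in for the real (1000, 300), ... arrays
seqs = [np.random.rand(3, 300), np.random.rand(5, 300)]
lengths = np.array([len(s) for s in seqs])
max_duration = lengths.max()

# Zero-pad into one regular array
in_ = np.zeros((len(seqs), max_duration, 300), dtype='f4')
for i, s in enumerate(seqs):
    in_[i, :len(s), :] = s

# mask[i, t] is 1 for real time steps and 0 for padding
mask = (np.arange(max_duration)[None, :] < lengths[:, None]).astype('f4')

# Placeholder for a per-time-step loss of shape (batch, time)
per_step_loss = np.ones((len(seqs), max_duration), dtype='f4')

# Average over real steps only, so padding does not dilute the loss
masked_loss = (per_step_loss * mask).sum() / mask.sum()
```

If the model produces a single label per sequence instead, the mask is not needed for the loss itself, only for whatever pooling happens over time.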
I am working on a siamese CNN with attention in TensorFlow.
The CNN structure consists of an embedding lookup table shared by two CNNs with shared weights.
The inputs for the network are two matrices, both containing indices for question and answer to be fed into the network (batch_size x sentence_length):
self.input_q = tf.placeholder(tf.int32, [None, sentence_length], name="input_q")
self.input_a = tf.placeholder(tf.int32, [None, sentence_length], name="input_a")
After embedding each sentence (row from the input matrix) I end up with two tensors (questions and answers), each of size batch_size x sentence_length x embedding_size.
Let's forget about the batch dimension for now to make things easier. That is to say, we have two matrices Qemb and Aemb, both sentence_length x embedding_size.
From these two matrices I would like to construct a third one, an attention matrix A that is later used to build a learnable attention feature matrix. Using numpy, it would be defined as follows:
A[i,j] = 1.0 / (1.0 + np.linalg.norm(Qemb[i,:]-Aemb[j,:]))
This matrix is built for each input pair, so it should be a part of the graph, but apparently this cannot be done in TensorFlow as there's no assign operation by index for a Tensor.
Am I right?
I thought I could run the ops for embedding the question and answer, build the A matrix outside the graph from the computed embeddings, and then feed the A matrix back to the graph to continue with the next operations based on it.
self.attention_matrix = \
tf.placeholder(tf.float32,
[None, sentence_length, sentence_length],
name = "Attention_matrix")
Is there any problem with this approach that I might not be aware of?
(Apart from running the embedding ops twice, which doesn't seem optimal, but is not a big deal)
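As a side note, the matrix A can be written without any assignment by index, using broadcasting over the pairwise differences. The numpy sketch below checks this against the element-wise definition; the sizes and random embeddings are made up. The same pattern should carry over to graph ops such as tf.expand_dims and tf.norm, although that translation is an assumption and is not tested here.

```python
import numpy as np

sentence_length, embedding_size = 5, 8     # small hypothetical sizes
rng = np.random.default_rng(0)
Qemb = rng.normal(size=(sentence_length, embedding_size))
Aemb = rng.normal(size=(sentence_length, embedding_size))

# Element-wise definition from the question
A_loop = np.empty((sentence_length, sentence_length))
for i in range(sentence_length):
    for j in range(sentence_length):
        A_loop[i, j] = 1.0 / (1.0 + np.linalg.norm(Qemb[i, :] - Aemb[j, :]))

# Broadcast version: diff[i, j, :] = Qemb[i, :] - Aemb[j, :]
diff = Qemb[:, None, :] - Aemb[None, :, :]
A_vec = 1.0 / (1.0 + np.linalg.norm(diff, axis=-1))
```

Because A_vec is built only from whole-tensor operations, nothing forces the computation to leave the graph, and the embeddings would not have to be run twice.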
I am learning TensorFlow, and my goal is to implement MultiPerceptron for my needs. I checked the MNIST tutorial with MultiPerceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
y: batch_y})
I guess, x is an image itself(28*28 pixels, so the input is 784 neurons) and y is a label which is an 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into session?
When I ask, how does tensorflow interpret it, I want to know how tensorflow splits the batch into single elements. For example, batch is a 2-D array, right? In which direction does it split an array? Or it uses matrix operations and doesn't split anything?
When I ask, how should I feed my data, I want to know, should it be a 2-D array with samples at its rows and features at its columns, or, maybe, could it be a 2-D list.
When I feed my float numpy array X_train to x, which is :
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'
They feed whole batches (which are packs of data points and labels)!
Yes, this is how neural networks are usually trained (due to some nice mathematical properties of having best of two worlds - better gradient approximation than in SGD on one hand and much faster convergence than full GD).
How does tensorflow interpret this "batch" input?
It "interprets" it according to operations in your graph. You probably have reduce mean somewhere in your graph, which calculates average over your batch, thus causing this to be the "interpretation".
And how does it update the weights: 1. simultaneously after each element in a batch? 2. After running through the whole batch?
As in the previous answer, there is nothing "magical" about the batch: it is just another dimension, and each internal operation of the neural net is well defined for a batch of data, so there is still a single update at the end. Since you use a reduce mean operation (or maybe reduce sum?), you are updating according to the mean of the "small" per-sample gradients (or their sum, if there is a reduce sum instead). Again, you can control how they are aggregated, but you cannot force a per-sample update unless you introduce a while loop into the graph.
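To make the "mean of the small gradients" concrete, here is a numpy sketch with a linear model and a mean squared error loss. Both are assumptions chosen for simplicity, not the MNIST network from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))     # batch of 8 samples, 3 features each
y = rng.normal(size=(8, 1))
W = rng.normal(size=(3, 1))

# Gradient of mean((X @ W - y) ** 2) with respect to W, over the whole batch
residual = X @ W - y
batch_grad = 2.0 * X.T @ residual / len(X)

# Gradient of each sample's own squared error, computed one sample at a time
per_sample = [2.0 * X[i:i + 1].T @ (X[i:i + 1] @ W - y[i:i + 1])
              for i in range(len(X))]
mean_of_grads = np.mean(per_sample, axis=0)
```

batch_grad and mean_of_grads are the same array, which is why one update per batch corresponds to averaging the per-sample gradients.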
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into session? THANKS!!
Just set the variables n_input=1 and n_classes=4, and push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1 if by "1x1" you mean "one sample of dimension 1", although your edit starts to suggest that you actually do have a batch, and that by 1x1 you meant a 1-D input).
EDIT: 1. When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, batch is a 2-D array, right? In which direction does it split an array? Or does it use matrix operations and not split anything? 2. When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or, maybe, whether it could be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, thus in first dimension, and this is exactly what [batch, n_inputs] says - that you have batch rows each with n_inputs columns. But again - there is nothing special about it, and you could also create a graph which accepts column-wise batches if you would really need to.
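A numpy sketch of the layout for the one-number-in, four-numbers-out case. It assumes the 18 values in X_train are 18 separate samples, and the weights are arbitrary; an array of shape (18, 1) is what a placeholder declared as [None, n_input] with n_input=1 would accept.

```python
import numpy as np

n_input, n_classes = 1, 4
X_train = np.arange(18, dtype=np.float32)   # 18 scalar samples (hypothetical data)

# Samples go in rows: shape (18, 1) rather than (1, 18)
batch_x = X_train.reshape(-1, n_input)

W = np.ones((n_input, n_classes), dtype=np.float32)
b = np.zeros(n_classes, dtype=np.float32)

# One matrix operation handles the whole batch at once
out = batch_x @ W + b

# The same result computed one row at a time, only for comparison
row_by_row = np.stack([batch_x[i] @ W + b for i in range(len(batch_x))])
```

out has shape (18, 4) and matches row_by_row, so nothing is ever split: the batch dimension simply rides along through the matrix product.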
Has anyone tried using Sparse Tensors for Text Analysis with TensorFlow with success? Everything is ready and I manage to feed feed_dict in tf.Session for a Softmax layer with numpy arrays, but I am unable to feed the dictionary with SparseTensorValues.
I have not found any documentation either about using sparse matrices to train a model (softmax, for example) with TensorFlow, which is strange, as the classes SparseTensor and SparseTensorValues and the TensorFlow.sparse_to_dense method are ready for it, but there is no documentation about how to pass sparse values through feed_dict in the session.run(fetches,feed_dict=None) method.
Thanks a lot,
I have found a way of putting sparse images into tensorflow including batch processing if that is of any help.
I create a 4-d sparse matrix in a dictionary where the dimensions are batchSize, xLen, ylen, zLen (where zLen is 3 for colour for example). The following pseudo code is for a batch of 50 32x96 pixel 3-color images. Values are the intensity of each pixel. In the snippet below I show the first 2 pixels of the first batch being initialised...
shape = [50, 32, 96, 3]
indices = [[0, 20, 31, 0],[0, 22, 33, 1], etc...]
values = [12, 24, etc...]
batch = {"indices": indices, "values": values, "shape": shape}
When setting up the computational graph I create a sparse-placeholder of the correct dimensions
images = tf.sparse_placeholder(tf.float32, shape=[None, 32, 96, 3])
'None' is used so I can vary the batch size.
When I first want to use the images, e.g. to feed into a batch convolution, I convert them back to a dense tensor:
# convert the sparse placeholder (not the batch dictionary) to a dense tensor
dense_images = tf.sparse_tensor_to_dense(images)
Then when I am ready to run a session, e.g. for training, I pass the 3 components of the batch into the dictionary so that they will be picked up by the sparse_placeholder:
train_dict = {images: (batch['indices'], batch['values'], batch['shape']), etc...}
sess.run(train_step, feed_dict=train_dict)
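For reference, the (indices, values, shape) triple describes the dense array in the following way. This numpy sketch uses a tiny made-up batch rather than the 50 x 32 x 96 x 3 one above, and it only mimics what the sparse-to-dense conversion is expected to produce.

```python
import numpy as np

shape = (2, 4, 6, 3)                    # small hypothetical batch of images
indices = np.array([[0, 1, 2, 0],       # one row per non-zero entry:
                    [0, 3, 5, 1],       # (image, x, y, channel)
                    [1, 0, 0, 2]])
values = np.array([12.0, 24.0, 7.0], dtype=np.float32)

# Scatter the values into an all-zero dense array
dense = np.zeros(shape, dtype=np.float32)
dense[tuple(indices.T)] = values

# Going back: the coordinates of the non-zero entries, in row-major order
recovered = np.argwhere(dense)
```

Here recovered equals indices because the rows of indices were already listed in row-major order.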
If you do not need batch processing, just leave off the first dimension and remove 'None' from the placeholder shape.
I couldn't find any way of passing the images across in batch as an array of sparse matrices. It only worked if I created the 4th dimension. I'd be interested to know of alternatives.
Whilst this doesn't give an exact answer to your question I hope it is of use as I have been struggling with similar issues.