For a time series prediction task, I'm testing several models and sizes in order to evaluate which model/configuration is the most accurate for the datasets to be learned.
A generic dataset starts from an n x m matrix of examples and features.
With a common LSTM, Conv1D, GRU, ... I reshape the matrix into a 3D tensor n x q x m of examples, timesteps and features as input for the model. I then split the 3D tensor into 2 sub-tensors of train and validation, with shapes n1 x ... and n2 x ..., where n1 + n2 = n.
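For reference, the 3D layout and split described above might look like this (a minimal NumPy sketch; the names data, q and n1 and all the sizes are placeholders, not from any specific library):

import numpy as np

n, q, m = 1000, 20, 8                    # examples, timesteps, features (arbitrary values)
data = np.random.rand(n, q, m)           # hypothetical 3D tensor fed to an LSTM/Conv1D/GRU
n1 = int(0.8 * n)                        # train size; n1 + n2 = n
train, val = data[:n1], data[n1:]
print(train.shape, val.shape)            # (n1, q, m) and (n2, q, m)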
But for a ConvLSTM2D model, how can I reshape my 2D dataset into a 4D tensor in order to feed this new model? I read several instructions on the web, and in the case of image processing I could use the image shape as the 1st and 2nd elements of the tensor. But in the case of time series, which parameters should I use to obtain a coherent 4D tensor (one suitable to then split into 2 coherent sub 4D tensors)?
Thanks for helping me.
I am a beginner with TensorFlow, and I am a bit confused by the tutorial. The author first gives the formula y = softmax(Wx + b), but uses xW + b in the Python code and explains it as a small trick. I do not understand the trick: why does the author need to flip the formula?
https://www.tensorflow.org/get_started/mnist/beginners
First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs. We then add b, and finally apply tf.nn.softmax.
As you can see from the formula,
y=softmax(Wx + b)
the input x is multiplied by the weight variable W, but in the doc
y = tf.nn.softmax(tf.matmul(x, W) + b)
the order is flipped (x is multiplied by W) for calculation convenience, so we must flip W from 10*784 to 784*10 to keep it consistent with the formula.
In general in machine learning, especially in TensorFlow, you always want your first dimension to represent your batch. The trick is only a way of ensuring that without transposing everything before and after each matrix multiplication.
x is not really a column vector of features, but a 2D matrix of shape (batch_size, n_features).
If you keep Wx, then you'll have to transpose x (to x' of shape (n_features, batch_size)) and use W of shape (n_outputs, n_features); Wx' will then be of shape (n_outputs, batch_size), so you'll have to transpose it back to (batch_size, n_outputs), which is what you want in the end.
If you use tf.matmul(x, W), then W is of shape (n_features, n_outputs), and the result is directly of shape (batch_size, n_outputs).
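As a rough illustration of the shape bookkeeping (a NumPy sketch with arbitrary sizes; x @ W plays the role of tf.matmul(x, W)):

import numpy as np

batch_size, n_features, n_outputs = 32, 784, 10
x = np.random.rand(batch_size, n_features)   # one row per sample
W = np.random.rand(n_features, n_outputs)
b = np.random.rand(n_outputs)

y = x @ W + b                                # same layout as tf.matmul(x, W) + b
print(y.shape)                               # (32, 10) -> (batch_size, n_outputs)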
I agree this is not clear at first.
x being a 2D tensor with multiple inputs
is a very succinct way of telling you that in TensorFlow, data is stored in tensors following conventions that are not those of linear algebra.
In particular, the outermost (first) dimension, i.e. the rows of a matrix, is always the sample dimension: it has the same size as your number of samples.
When you store sample features in a 2D tensor (a matrix), the features are therefore stored in the innermost dimension, i.e. the columns. That is, tensor x is the transpose of the variable $x$ in the equation, and so are W and b. The fact that x.T*W.T = (W.x).T explains the apparent swap in the multiplication order between the linear-algebra equation and the tensor implementation of it.
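A quick NumPy check of that transpose identity (a toy sketch; the shapes are arbitrary):

import numpy as np

W = np.random.rand(10, 784)       # math convention: (n_outputs, n_features)
x = np.random.rand(784, 3)        # 3 samples stored as columns, math convention

print(np.allclose(x.T @ W.T, (W @ x).T))   # True: x'W' = (Wx)'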
I have a multi-class (4-class) classification model in Keras (model diagram omitted).
While training, the model expects the input shape to be (None, None, 300). That is, if there are n different input sequences, the input shape should be (n, None, 300). In my case, the size of each input sequence is different.
Say the input sequences have shapes (1000, 300), (1500, 300), (1200, 300) and (2000, 300). Now I need to put them together into (4, None, 300). I tried using a numpy array, but a numpy array won't give a shape of (4, None, 300); instead it will be (4L,).
Now I want to know how to train my model. Is it possible with numpy arrays, or are there different data structures available?
Since your sequences are of different duration, you may consider padding them with zeros (adjusting the loss/labels accordingly) and then:
import numpy as np

max_duration = 2000
# seq is assumed to hold your four variable-length sequences
in_ = np.zeros((4, max_duration, 300), dtype='f4')
for i in range(4):
    # copy each sequence into the zero-padded tensor
    in_[i, :len(seq[i]), :] = seq[i]
I am working on a siamese CNN with attention in TensorFlow.
The CNN structure consists of an embedding lookup table shared by two CNNs sharing weights.
The inputs to the network are two matrices, both containing indices for the question and the answer to be fed into the network (batch_size x sentence_length):
self.input_q = tf.placeholder(tf.int32, [None, sentence_length], name="input_q")
self.input_a = tf.placeholder(tf.int32, [None, sentence_length], name="input_a")
After embedding each sentence (row of the input matrix) I end up with two tensors (questions and answers), each of size batch_size x sentence_length x embedding_size.
Let's forget about the batch dimension for now to make things easier. That is to say, we have two matrices Qemb and Aemb, both of shape sentence_length x embedding_size.
From these two matrices I would like to construct a third one, an attention matrix A used for a posterior learnable attention feature matrix, which using numpy would be defined as follows:
A[i,j] = 1.0 / (1.0 + np.linalg.norm(Qemb[i,:]-Aemb[j,:]))
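For illustration only, the full matrix can be computed in NumPy with broadcasting instead of an element-wise loop (a sketch; Qemb and Aemb are assumed to be arrays of shape (sentence_length, embedding_size) with made-up sizes):

import numpy as np

sentence_length, embedding_size = 5, 8
Qemb = np.random.rand(sentence_length, embedding_size)
Aemb = np.random.rand(sentence_length, embedding_size)

# diff[i, j, :] = Qemb[i, :] - Aemb[j, :]
diff = Qemb[:, None, :] - Aemb[None, :, :]
A = 1.0 / (1.0 + np.linalg.norm(diff, axis=-1))
print(A.shape)   # (sentence_length, sentence_length)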
This matrix is built for each input pair, so it should be part of the graph, but apparently this cannot be done in TensorFlow, as there is no assign-by-index operation for a Tensor.
Am I right?
I thought I could run the ops for embedding the question and answer, build the A matrix outside the graph from the computed embeddings, and then feed the A matrix back into the graph to continue the next operations based on it.
self.attention_matrix = tf.placeholder(tf.float32,
                                       [None, sentence_length, sentence_length],
                                       name="Attention_matrix")
Is there any problem with this approach that I might not be aware of?
(Apart from running the embedding ops twice, which doesn't seem optimal, but it's not a big deal.)
I am learning TensorFlow, and my goal is to implement a multilayer perceptron for my needs. I checked the MNIST tutorial with a multilayer perceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                              y: batch_y})
I guess x is an image itself (28*28 pixels, so the input is 784 neurons) and y is a label, which is a 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders, and in which form should I feed them into the session?
When I ask how TensorFlow interprets it, I want to know how TensorFlow splits the batch into single elements. For example, the batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything?
When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or whether it could maybe be a 2-D list.
When I feed my float numpy array X_train to x, which is :
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'
They feed whole batches (which are packs of data points and labels)!
Yes, this is how neural networks are usually trained, due to some nice mathematical properties of having the best of both worlds: a better gradient approximation than in per-sample SGD on one hand, and much faster convergence than full-batch GD on the other.
How does tensorflow interpret this "batch" input?
It "interprets" it according to operations in your graph. You probably have reduce mean somewhere in your graph, which calculates average over your batch, thus causing this to be the "interpretation".
And how does it update the weights: 1. simultaneously after each element in a batch? 2. after running through the whole batch?
As in the previous answer, there is nothing "magical" about a batch; it is just another dimension, and each internal operation of the neural net is well defined for a batch of data, so there is still a single update in the end. Since you use a reduce_mean operation (or maybe reduce_sum?), you are updating according to the mean of the "small" gradients (or their sum if there is a reduce_sum instead). Again, you can control this (up to that aggregation behaviour; you cannot force per-sample updates unless you introduce a while loop into the graph).
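A toy NumPy sketch of that point (the numbers are made up; the mean over the batch is the analogue of reduce_mean, and it yields one scalar, hence one update):

import numpy as np

batch_predictions = np.array([0.9, 0.2, 0.7, 0.4])
batch_labels      = np.array([1.0, 0.0, 1.0, 1.0])

per_sample_loss = (batch_predictions - batch_labels) ** 2
batch_loss = per_sample_loss.mean()   # collapses the batch dimension into one scalar
print(batch_loss)                     # a single value drives a single weight update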
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into the session? THANKS!!
Just set the variables n_input=1 and n_classes=4, and push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1, if by "1x1" you mean "one sample of dimension 1"; your edit starts to suggest that you actually do have a batch, and that by 1x1 you meant a 1-dimensional input).
EDIT: 1. When I ask how TensorFlow interprets it, I want to know how TensorFlow splits the batch into single elements. For example, the batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything? 2. When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or whether it could maybe be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, thus in the first dimension, and this is exactly what [batch, n_inputs] says: you have batch rows, each with n_inputs columns. But again, there is nothing special about it, and you could also create a graph that accepts column-wise batches if you really needed to.
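A minimal sketch of that setup, assuming the TensorFlow 1.x placeholder API used in the question (the concrete numbers are made up):

import numpy as np
import tensorflow as tf   # TF 1.x-style API, as in the question

n_input, n_classes = 1, 4
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

batch_x = np.array([[0.5]])                   # shape (1, 1) -> [batch, n_input]
batch_y = np.array([[0.1, 0.2, 0.3, 0.4]])    # shape (1, 4) -> [batch, n_classes]
# sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})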
I am attempting to solve problem 6 in this notebook. The task is to train a simple model on this data using 50, 100, 1000 and 5000 training samples, with the LogisticRegression model from sklearn.linear_model.
lr = LogisticRegression()
lr.fit(train_dataset,train_labels)
This is the code I am trying to run, and it gives me the error:
ValueError: Found array with dim 3. Estimator expected <= 2.
Any idea?
scikit-learn expects 2D numpy arrays for the training dataset passed to fit. The dataset you are passing in is a 3D array; you need to reshape it into a 2D array:
nsamples, nx, ny = train_dataset.shape
d2_train_dataset = train_dataset.reshape((nsamples,nx*ny))
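Put together, a minimal end-to-end sketch might look like this (with a made-up random dataset standing in for the notebook's data):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 3D dataset: 50 samples of 28x28 images, plus integer labels
train_dataset = np.random.rand(50, 28, 28)
train_labels = np.random.randint(0, 10, size=50)

nsamples, nx, ny = train_dataset.shape
d2_train_dataset = train_dataset.reshape((nsamples, nx * ny))

lr = LogisticRegression()
lr.fit(d2_train_dataset, train_labels)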
For LSTM, GRU, and TCN layers, return_sequences in the last recurrent layer before the Dense layer must be set to False.
This is one of the conditions under which you run into this error message.
If anyone is stumbling onto this question from using an LSTM or any other RNN for two or more time series, this might be a solution.
If you want the error between the two predicted quantities separately, for example because you're trying to predict two completely different time series, then you can do the following:
import numpy as np
from sklearn.metrics import mean_squared_error
# Any sklearn metric that takes 2D data only

# 3D data: (samples, timesteps, features)
real = np.array([
    [
        [1, 60],
        [2, 70],
        [3, 80]
    ],
    [
        [2, 70],
        [3, 80],
        [4, 90]
    ]
])
pred = np.array([
    [
        [1.1, 62.1],
        [2.1, 72.1],
        [3.1, 82.1]
    ],
    [
        [2.1, 72.1],
        [3.1, 82.1],
        [4.1, 92.1]
    ]
])
# Error/some metric on feature 1:
print(mean_squared_error(real[:, :, 0], pred[:, :, 0]))  # ~0.01
# Error/some metric on feature 2:
print(mean_squared_error(real[:, :, 1], pred[:, :, 1]))  # ~4.41
Additional info can be found in the numpy indexing documentation.
You probably have the last LSTM layer in your model using return_sequences=True.
Change this to False so that the layer does not return the full sequence to the following layers.
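A minimal Keras sketch of that fix (the layer sizes and input shape are arbitrary; only the return_sequences flags matter):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(64, return_sequences=True, input_shape=(None, 10)),
    keras.layers.LSTM(32, return_sequences=False),   # last recurrent layer before Dense
    keras.layers.Dense(1),
])
model.summary()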
I had a similar error while solving an image classification problem. We have a 3D matrix: the first dimension is the total number of images (it can be replaced by -1), the second dimension is the product of the height and width of the picture, and the third dimension is equal to three, since an RGB image has three channels (red, green, blue). If we don't want to lose the colour information of the image, we use x_train.reshape(-1, nx*ny*3). If the colour can be neglected (after collapsing the image to a single channel), thereby reducing the size of the matrix, we use x_train.reshape(-1, nx*ny*1).
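A small NumPy sketch of the two options (made-up sizes; averaging the channels stands in for "neglecting the colour"):

import numpy as np

nx, ny = 32, 32
x_train = np.random.rand(100, nx * ny, 3)                 # 100 RGB images, flattened per channel

x_color = x_train.reshape(-1, nx * ny * 3)                # keep all three channels
x_gray  = x_train.mean(axis=2).reshape(-1, nx * ny * 1)   # collapse channels first, then flatten
print(x_color.shape, x_gray.shape)                        # (100, 3072) (100, 1024)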