I wonder what the structure of TensorFlow's BasicRNNCell in the recurrent neural network shown below is. It seems to me that it is a neural network with 3 layers and 12 neurons, but I am not sure what the connections look like. Is it a Hopfield net?
cell = tf.contrib.rnn.BasicRNNCell(num_units=12)
states_series, current_state = tf.nn.dynamic_rnn(cell=cell, inputs=batchX_placeholder, dtype=tf.float32)
This is one layer of basic RNN cells, each with 12 hidden units. The number of (unrolled) cells depends on the time dimension of your batchX_placeholder.
Here's an example:
n_steps = 2
n_inputs = 3
n_neurons = 5
X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
print(tf.trainable_variables())
It prints...
[<tf.Variable 'rnn/basic_rnn_cell/kernel:0' shape=(8, 5) dtype=float32_ref>,
<tf.Variable 'rnn/basic_rnn_cell/bias:0' shape=(5,) dtype=float32_ref>]
So it created one shared kernel matrix and one shared bias vector. The number of cells corresponds to outputs.shape (derived from X.shape), which is [?, 2, 5] in this example, so there are 2 cells.
If you wish to create multiple layers, use the tf.nn.rnn_cell.MultiRNNCell function, which accepts a list containing one cell per layer.
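For example, here is a minimal sketch of a two-layer stack (the layer sizes are illustrative):
n_steps = 2
n_inputs = 3
X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
# One BasicRNNCell per layer; create a separate cell object for each layer.
layers = [tf.nn.rnn_cell.BasicRNNCell(num_units=n) for n in (5, 5)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(layers)
# outputs has shape [batch_size, n_steps, 5] (the last layer's units);
# states is a tuple with one state tensor per layer.
outputs, states = tf.nn.dynamic_rnn(multi_cell, X, dtype=tf.float32)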
I am working on a stock market prediction project using sentiment analysis. I am trying to create a CNN model where I pass 4000 days of stock data with a batch size of 100. At the end of the dense layer, I want to add a regression layer to get the price of the stock.
def Model(train_data):
    input_layer = tf.reshape(tf.cast(train_data, tf.float32), [-1, 1, 100, 2])
    conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[1, 5],
                             padding="same", activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[1, 2], strides=[1, 2])
    conv2 = tf.layers.conv2d(inputs=pool1, filters=8, kernel_size=[1, 5],
                             padding="same", activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[1, 5], strides=[1, 5])
    conv3 = tf.layers.conv2d(inputs=pool2, filters=2, kernel_size=[1, 2],
                             padding="same", activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[1, 2], strides=[1, 2])
    pool3_flat = tf.reshape(pool3, [40, 1 * 5 * 2])
    dense = tf.layers.dense(inputs=pool3_flat, units=5, activation=tf.nn.relu)
    dropout = tf.layers.dropout(inputs=dense, rate=0.2,
                                training=mode == tf.estimator.ModeKeys.TRAIN)
    logits = tf.layers.dense(inputs=dropout, units=1)
I am referring to https://www.tensorflow.org/tutorials/estimators/cnn for the model, but they are doing classification. Can anybody suggest an approach for regression? The train_data for the model has a shape of [2, 4000], where one row holds the normalized stock prices and the other the sentiment factor.
The only thing you would have to do is add a fully connected layer at the very end with a linear activation. Intuitively, this takes the outputs of your conv layers and applies y = mx + b to them. Your fully connected output layer would have 40 nodes (one for each output). In fact, you already have one dense layer in that code; if your output is of size 40, just set its units to 40 instead of 5.
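As a minimal sketch of that idea, reusing the dropout tensor from the question (the labels placeholder is only illustrative):
# Final dense layer with no activation, i.e. linear: one unit per target value.
predictions = tf.layers.dense(inputs=dropout, units=1, activation=None)
# For regression, use a mean squared error loss instead of cross-entropy.
labels = tf.placeholder(tf.float32, shape=[None, 1])  # illustrative placeholder
loss = tf.reduce_mean(tf.square(predictions - labels))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)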
Just a side note: traditionally, CNNs are used for image classification, and only recently have they started migrating to other applications (such as spam detection). I would advise trying a simple feed-forward neural network first, and if that does not work, perhaps try an RNN before this.
In order to learn TensorFlow, I executed the official TensorFlow MNIST script (cnn_mnist.py) and displayed the graph with TensorBoard.
The following is part of the code.
This network contains two conv layers and two dense layers.
conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[5, 5],
                         padding="same", activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 5],
                         padding="same", activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
dropout = tf.layers.dropout(inputs=dense, rate=0.4,
                            training=mode == tf.estimator.ModeKeys.TRAIN)
logits = tf.layers.dense(inputs=dropout, units=10)
However, looking at the graph generated by TensorBoard, there are three conv layers and three dense layers.
I did not expect conv2d_1 and dense_1 to be generated.
Why were conv2d_1 and dense_1 generated?
This is a good question, because it sheds some light on the inner structure of the tf.layers wrappers. Let's run two experiments:
Run the model exactly as in the question.
Add explicit names to the layers via name argument and run again.
The graph without layers' names
That's the same graph as yours, but I expanded and zoomed in on the logits dense layer. Note that dense_1 contains the layer's variables (kernel and bias) and dense_2 contains the ops (matrix multiplication and addition).
This means that it is still one layer, but with two naming scopes, dense_1 and dense_2. This happens because it is the second dense layer, and the first one already took the naming scope dense. Variable creation is separated from the actual layer logic (there are a build method and a call method), and each tries to get a unique name for its scope. This leads to dense_1 holding the variables and dense_2 holding the ops.
The graph with names specified
Now let's add name='logits' to the same layer and run again:
logits = tf.layers.dense(inputs=dropout, units=10, name='logits')
You can see there are still 2 variables and 2 ops, but the layer managed to grab one unique name for the scope (logits) and put everything inside it.
Conclusion
This is a good example of why explicit naming in TensorFlow is beneficial, whether it's about tensors directly or higher-level layers. There is much less confusion when the model uses meaningful names instead of automatically generated ones.
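For instance, applying the same idea to the layers from the question (only the name arguments are new):
conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[5, 5],
                         padding="same", activation=tf.nn.relu, name='conv1')
dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu,
                        name='dense')
logits = tf.layers.dense(inputs=dropout, units=10, name='logits')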
They are just the variable creations and other hidden operations that happen in tf.layers.conv2d besides the convolution operation itself (tf.nn.conv2d) and the activation (the same goes for the dense layer). There are only 2 convolutions happening: as you can see, if you follow your data through the graph, it never goes through conv2d_1 or dense_1; it's just that the results of these ops (basically the variables needed for the convolution) are also fed as inputs to the convolution operation itself. I'm actually more surprised not to see the same thing for conv2d, but I really wouldn't worry about that!
I have a question about the detailed indexing of the global variables generated by LSTM cells.
placeholders = {"inputs":tf.placeholder(tf.float32, shape=[None, None, 1000])}
cell = tf.nn.rnn_cell.BasicLSTMCell(80)
outs, states = tf.nn.dynamic_rnn(cell=cell, inputs=placeholders["inputs"], dtype=tf.float32)
Building this graph gives us the following tf.global_variables() list:
[<tf.Variable 'rnn/basic_lstm_cell/kernel:0' shape=(1080, 320) dtype=float32_ref>, <tf.Variable 'rnn/basic_lstm_cell/bias:0' shape=(320,) dtype=float32_ref>]
My question is about the index set of the variable 'rnn/basic_lstm_cell/kernel:0'.
According to the source of BasicLSTMCell ( https://github.com/tensorflow/tensorflow/blob/r1.5/tensorflow/python/ops/rnn_cell_impl.py#L537 ), this variable stacks two kinds of weights: the first block is applied to the input and the second block to the recurrent state.
Is the following partition of the kernel variable's indices correct?
Do the input weights and the recurrent-state weights correspond to tf.slice(var, [0, 0], [1000, 320]) and tf.slice(var, [1000, 0], [80, 320]), respectively, where var is the rnn/basic_lstm_cell/kernel variable?
That is, the first 1000 rows (= the dimensionality of the input sequence) are for the input weights and the next 80 rows (= num_units) are for the recurrent state.
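A sketch of that partition, assuming the [inputs; hidden state] concatenation order used in the linked source (note that tf.slice takes a begin position and a size, not an end position):
kernel = [v for v in tf.global_variables()
          if v.name == "rnn/basic_lstm_cell/kernel:0"][0]
# Rows 0..999: weights applied to the 1000-dimensional input.
input_weights = tf.slice(kernel, begin=[0, 0], size=[1000, 320])
# Rows 1000..1079: weights applied to the 80-dimensional hidden state.
recurrent_weights = tf.slice(kernel, begin=[1000, 0], size=[80, 320])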
I am trying to make a basic nonlinear regression model that will predict the return index of companies in the FTSE350.
I am unsure as to what my bias term should look like in terms of dimensions and whether I am using it properly in the calculations method:
w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[4, 10], dtype=tf.float64))
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype=tf.float64))

def calculations(x, y):
    w1d = tf.matmul(x, w1)
    h1 = tf.nn.sigmoid(tf.add(w1d, b1))
    h1w2 = tf.matmul(h1, w2)
    activation = tf.add(tf.nn.sigmoid(h1w2), b2)
    error = tf.reduce_sum(tf.pow(activation - y, 2)) / len(x)
    return [activation, error]
My initial thought was that it should be the same size as my weights, but I get this error:
ValueError: Dimensions must be equal, but are 251 and 4 for 'Add' (op: 'Add') with input shapes: [251,10], [4,10]
I've played around with different ideas but don't seem to be getting anywhere.
(My input data has 4 features)
The network structure I have attempted is 4 neurons in the input layer, 10 in the hidden layer, and 1 in the output layer, but I feel like I may have mixed up the dimensions in my weight matrices too.
When you are constructing the layers of a feed-forward fully-connected neural network (as in your example), the shape of the bias should equal the number of nodes in the corresponding layer. So in your case, since your weight matrix has a shape of (4, 10), you have 10 nodes in that layer and you should be using:
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype=tf.float64))
The reason for this is when you do w1d = tf.matmul(x, w1), you are actually getting a matrix of shape (batch_size, 10) (if batch_size is the number of rows in your input matrix). This is because you are matrix multiplying a (batch_size, 4) matrix by a (4, 10) weight matrix. Then, you are adding a bias across each column of w1d, which can be represented as a 10-dimensional vector, which you would get if you made the shape of b1 [10].
Without the non-linearity (sigmoid) afterward, this is called an affine transformation, which you can read more about here: https://en.wikipedia.org/wiki/Affine_transformation.
Another fantastic resource is the Stanford Deep Learning Tutorial, which has a good explanation of how these feed-forward models work here:
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/.
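For reference, here is a sketch of your calculations method with only the bias shape corrected (and the batch size taken from the tensor itself, since len() does not work on a placeholder):
w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype=tf.float64))  # one bias per hidden node
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype=tf.float64))   # one bias for the output node

def calculations(x, y):
    h1 = tf.nn.sigmoid(tf.matmul(x, w1) + b1)   # (batch_size, 10); bias broadcasts over rows
    activation = tf.add(tf.nn.sigmoid(tf.matmul(h1, w2)), b2)  # (batch_size, 1)
    batch_size = tf.cast(tf.shape(x)[0], tf.float64)
    error = tf.reduce_sum(tf.pow(activation - y, 2)) / batch_size
    return [activation, error]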
Hope that helped!
I think your b1 should just be of dimension 10, and your code should run.
Since 4 is the number of features and 10 is the number of neurons in your first layer (thinking in terms of the neural net), you must add a bias of dimension 10.
You can also see the biases as adding an extra feature of constant value 1.
See this PDF if you have time; it explains this very well: https://cs.stanford.edu/~quocle/tutorial1.pdf
I am reading an example of using an RNN with TensorFlow here: ptb_word_lm.py
I can't figure out what the embedding and embedding_lookup are doing here. How can it add another dimension to the tensor, going from (20, 25) to (20, 25, 200)? In this case (20, 25) is a batch size of 20 with 25 time steps. I can't understand how or why you can add the hidden_size of the cell as a dimension of the input data. Typically the input data would be a matrix of size [batch_size, num_features], and the model would map num_features to hidden_dims with a matrix of size [num_features, hidden_dims], yielding an output of size [batch_size, hidden_dims]. So how can hidden_dims be a dimension of the input tensor?
input_data, targets = reader.ptb_producer(train_data, 20, 25)
cell = tf.nn.rnn_cell.BasicLSTMCell(200, forget_bias=1.0, state_is_tuple=True)
initial_state = cell.zero_state(20, tf.float32)
embedding = tf.get_variable("embedding", [10000, 200], dtype=tf.float32)
inputs = tf.nn.embedding_lookup(embedding, input_data)
input_data_train # <tf.Tensor 'PTBProducer/Slice:0' shape=(20, 25) dtype=int32>
inputs # <tf.Tensor 'embedding_lookup:0' shape=(20, 25, 200) dtype=float32>
outputs = []
state = initial_state
for time_step in range(25):
    if time_step > 0:
        tf.get_variable_scope().reuse_variables()
    cell_output, state = cell(inputs[:, time_step, :], state)
    outputs.append(cell_output)
output = tf.reshape(tf.concat(1, outputs), [-1, 200])
outputs # list of 25: <tf.Tensor 'BasicLSTMCell/mul_2:0' shape=(20, 200) dtype=float32>
output # <tf.Tensor 'Reshape_2:0' shape=(500, 200) dtype=float32>
softmax_w = tf.get_variable("softmax_w", [config.hidden_size, config.vocab_size], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [config.vocab_size], dtype=tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.nn.seq2seq.sequence_loss_by_example([logits], [tf.reshape(targets, [-1])],[tf.ones([20*25], dtype=tf.float32)])
cost = tf.reduce_sum(loss) / batch_size
OK, I'm not going to try to explain this specific code, but I will try to answer the "what is an embedding?" part of the title.
Basically, it's a mapping of the original input data into a set of real-valued dimensions, where the "position" of the original input data in those dimensions is organized to improve the task.
In TensorFlow, imagine some text input field contains "king", "queen", "girl" and "boy", and you have 2 embedding dimensions. Hopefully backprop will train the embedding to put the concept of royalty on one axis and gender on the other. In this case, what was a 4-value categorical feature gets "boiled down" to a floating-point embedding feature with 2 dimensions.
Embeddings are implemented using a lookup table, either hashed from the original values or from a dictionary ordering. For a fully trained one, you might put in "queen" and get out, say, [1.0, 1.0]; put in "boy" and get out [0.0, 0.0].
TensorFlow backpropagates the error INTO this lookup table, and hopefully what starts off as a randomly initialized dictionary will gradually come to look like the example above.
Hope this helps. If not, look at: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
At its simplest:
input_data: Batch of sequence of word IDs (with shape (20,25))
inputs: Batch of sequence of word embeddings (with shape (20,25,200))
How does input_data become inputs, you might ask? This is what learning word embeddings does. The easiest way to imagine it is:
unwrap input_data into a single batch of shape (20*25,);
assign a vector of size 200 to each element in that unwrapped input_data, which gives you a matrix of shape (20*25, 200);
reshape the matrix to shape (20, 25, 200).
This works because embedding learning is not a time-series process; you learn word embeddings with a feed-forward network. The next important question is: how do you learn the word embeddings?
Initialise a huge TensorFlow variable of size (vocabulary_size, 200) (i.e. embedding in the code).
Optimise the embedding so that a given word can predict any word from its context; e.g. in "dog barked at the mailman", if "at" is the target word, then "dog", "barked", "the" and "mailman" are context words (see the sketch below).
This process gives you a vector (200-dimensional in this example) for each word, such that semantics are preserved (i.e. the vector for "dog" is close to that of "cat", but far away from that of "pen").
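Here is a minimal sketch of both steps, using the shapes from the question and assuming an NCE-style context-prediction objective (tf.nn.nce_loss); the placeholder names are illustrative:
vocab_size, embed_dim, num_sampled = 10000, 200, 64
# Step 1: the embedding matrix, one 200-dimensional vector per word ID.
embedding = tf.get_variable("embedding", [vocab_size, embed_dim], dtype=tf.float32)
# Looking up word IDs of shape (20, 25) yields embeddings of shape (20, 25, 200).
word_ids = tf.placeholder(tf.int32, shape=[20, 25])
inputs = tf.nn.embedding_lookup(embedding, word_ids)
# Step 2: predict a target word from a context word's embedding.
context_ids = tf.placeholder(tf.int32, shape=[None])    # context word IDs
target_ids = tf.placeholder(tf.int32, shape=[None, 1])  # target word IDs
context_vecs = tf.nn.embedding_lookup(embedding, context_ids)
nce_w = tf.get_variable("nce_w", [vocab_size, embed_dim], dtype=tf.float32)
nce_b = tf.get_variable("nce_b", [vocab_size], dtype=tf.float32)
loss = tf.reduce_mean(tf.nn.nce_loss(
    weights=nce_w, biases=nce_b, labels=target_ids, inputs=context_vecs,
    num_sampled=num_sampled, num_classes=vocab_size))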