I am working on a stock market prediction project using sentiment analysis. I am trying to create a CNN model where I am passing 4000 days of stock data with a batch size of 100. After the dense layer, I want to add a regression layer to get the price of the stock.
def Model(train_data):
    input_layer = tf.reshape(tf.cast(train_data, tf.float32), [-1, 1, 100, 2])
    conv1 = tf.layers.conv2d(inputs=input_layer, filters=32, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[1, 2], strides=[1, 2])
    conv2 = tf.layers.conv2d(inputs=pool1, filters=8, kernel_size=[1, 5], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[1, 5], strides=[1, 5])
    conv3 = tf.layers.conv2d(inputs=pool2, filters=2, kernel_size=[1, 2], padding="same",
                             activation=tf.nn.relu, strides=1,
                             kernel_initializer=tf.contrib.layers.variance_scaling_initializer())
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[1, 2], strides=[1, 2])
    pool3_flat = tf.reshape(pool3, [40, 1 * 5 * 2])
    dense = tf.layers.dense(inputs=pool3_flat, units=5, activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, rate=0.2, training=mode == tf.estimator.ModeKeys.TRAIN)
    logits = tf.layers.dense(inputs=dropout, units=1)
I am following https://www.tensorflow.org/tutorials/estimators/cnn for the model, but that tutorial does classification. Can anybody suggest an approach for regression? The train_data for the model has a shape of [2,4000], where one row holds the normalized stock prices and the other holds the sentiment factor.
The only thing you have to do is add a fully connected layer at the very end and give it a linear activation. Intuitively, this takes the outputs of your conv layers and applies y = mx + b to them. Your fully connected output layer would have 40 nodes (one for each output). In fact, you already have a dense layer in that code: if your output is of size 40, just give it 40 units instead of 5.
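To make the "linear activation" idea concrete, here is a minimal NumPy sketch of a linear output layer trained with mean squared error. It is not the asker's TensorFlow model: the random `features` array stands in for the flattened conv output (40 rows of 10 values, matching pool3_flat), the `targets` are made-up values, and treating each of the 40 windows as one sample with a single linear output is an assumption about how the targets are laid out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the flattened conv features: 40 windows x 10 values each.
features = rng.normal(size=(40, 10))
# Hypothetical regression targets (e.g. normalized prices), one per window.
targets = rng.normal(size=(40, 1))

# Linear output layer: no activation, so the prediction is features @ w + b.
w = np.zeros((10, 1))
b = np.zeros((1,))

def predict(x):
    return x @ w + b

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

initial_loss = mse(predict(features), targets)

# A few steps of plain gradient descent on the MSE loss.
lr = 0.05
for _ in range(200):
    err = predict(features) - targets            # shape (40, 1)
    w -= lr * 2 * features.T @ err / len(features)
    b -= lr * 2 * err.mean(axis=0)

final_loss = mse(predict(features), targets)
print(initial_loss, final_loss)
```

The point is only that a regression head is a dense layer without a squashing activation, paired with a regression loss such as MSE instead of cross-entropy.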
Just a side note: traditionally, CNNs have been used for image classification, and only recently did they start migrating to other applications (such as spam detection). I would advise trying a simple feed-forward neural network first, and if that does not work, perhaps trying an RNN before this.
I am currently using a neural network that outputs a one hot encoded output.
Upon evaluating it with a classification report I am receiving this error:
UndefinedMetricWarning: Recall and F-score are ill-defined and being set
to 0.0 in samples with no true labels.
When one-hot encoding my output during the train-test-split phase, I had to drop one of the columns in order to avoid the Dummy Variable Trap. As a result, some of the targets my neural network should predict are [0, 0, 0, 0], signaling that the sample belongs to the fifth category. I believe this to be the cause of the UndefinedMetricWarning.
Is there a solution to this? Or should I avoid classification reports in the first place and is there a better way to evaluate these sorts of neural networks? I'm fairly new to machine learning and neural networks, please forgive my ignorance. Thank you for all the help!!
Edit #1:
Here is my network:
from keras.models import Sequential
from keras.layers import Dense
classifier = Sequential()
classifier.add(Dense(units = 10000,
                     input_shape = (30183,),
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 4583,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 1150,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 292,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 77,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 23,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 7,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'relu'
                    )
              )
classifier.add(Dense(units = 4,
                     kernel_initializer = 'glorot_uniform',
                     activation = 'softmax'
                    )
              )
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
The above is my network. After training the network, I predict values and convert them to class labels using:
import numpy as np
from sklearn.preprocessing import LabelBinarizer
labels = np.argmax(predictions, axis = -1)
lb = LabelBinarizer()
labeled_predictions = lb.fit_transform(labels)
Upon calling a classification report comparing y_test and labeled_predictions, I receive the error.
As a side note for anyone curious, I am experimenting with natural language processing and neural networks. The reason the input vector of my network is so large is that it takes in count-vectorized text as part of its inputs.
Edit #2:
I converted the predictions into a dataframe and dropped duplicates for both the test set and predictions getting this result:
y_test.drop_duplicates()
      javascript  python  r  sql
738            0       0  0    0
4678           1       0  0    0
6666           0       0  0    1
5089           0       1  0    0
6472           0       0  1    0
predictions_df.drop_duplicates()
      javascript  python  r  sql
738            1       0  0    0
6666           0       0  0    1
5089           0       1  0    0
3444           0       0  1    0
So essentially, because of the way the softmax output is converted to binary, the predictions will never be [0,0,0,0]. When one-hot encoding y_test, should I just not drop the first column?
Yes, I would say that you should not drop the first column. What you do now is take the softmax output and use the neuron with the highest value as the label (labels = np.argmax(predictions, axis = -1)). With this approach you can never get a [0,0,0,0] result vector. So instead, just create a one-hot vector with positions for all 5 classes. Your problem with sklearn should then disappear, as you will get samples with true labels for your 5th class.
I'm also not sure the dummy variable trap is a problem for neural networks. I have never heard of it in this context, and a short Google Scholar search did not turn up any results. I have also never seen this problem mentioned in any of the neural network resources I've read so far. So I guess (but this is really just a guess) that it isn't really a problem when training neural networks. This conclusion is also supported by the fact that the majority of NNs use a softmax at the end.
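As a small illustration of the point above, here is a NumPy sketch with made-up softmax outputs for a 5-class network. The probabilities are hypothetical and only the class count comes from the question. It shows that argmax always picks exactly one class, so a 5-column one-hot encoding never contains an all-zero row.

```python
import numpy as np

n_classes = 5  # keep one column per class, including the one that was dropped

# Hypothetical softmax outputs for three samples (each row sums to 1).
predictions = np.array([
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.05, 0.05, 0.80],
    [0.70, 0.10, 0.10, 0.05, 0.05],
])

labels = np.argmax(predictions, axis=-1)          # class index per sample
one_hot = np.eye(n_classes, dtype=int)[labels]    # exactly one 1 in every row

print(labels)
print(one_hot)
```

If y_test is one-hot encoded with all 5 columns as well, it may be simpler to pass integer labels to the report on both sides, e.g. np.argmax(y_test, axis=1) and labels, rather than comparing one-hot matrices.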
I wonder what the structure of TensorFlow's BasicRNNCell is in the recurrent neural network shown below. It seems to me that it is a neural network with 3 layers and 12 neurons, but I am not sure what the connections look like. I am also not sure whether it is a Hopfield net.
cell = tf.contrib.rnn.BasicRNNCell(num_units=12)
states_series, current_state = tf.nn.dynamic_rnn(cell=cell,inputs=batchX_placeholder,dtype=tf.float32)
This is one layer of basic RNN cells, each having 12 hidden units. The number of cells (one per time step) depends on the shape of your batchX_placeholder.
Here's an example:
n_steps = 2
n_inputs = 3
n_neurons = 5
X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
print(tf.trainable_variables())
It prints...
[<tf.Variable 'rnn/basic_rnn_cell/kernel:0' shape=(8, 5) dtype=float32_ref>,
<tf.Variable 'rnn/basic_rnn_cell/bias:0' shape=(5,) dtype=float32_ref>]
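So it created one shared kernel matrix and one shared bias vector. The kernel has shape (8, 5) because it multiplies the input concatenated with the previous state: n_inputs + n_neurons = 3 + 5 = 8 rows and n_neurons = 5 columns. The number of cells corresponds to outputs.shape (derived from X.shape), which is [?, 2, 5] in this example, so there are 2 cells, one per time step.
If you wish to create multiple layers, you should use tf.nn.rnn_cell.MultiRNNCell, which accepts a list of cells, one for each layer.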
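To see where the (8, 5) kernel comes from, here is a rough NumPy re-implementation of what a basic RNN cell computes when unrolled over the time steps. It is a sketch rather than TensorFlow's actual code: the random weights and the batch size of 4 are assumptions, and tanh is used because it is the cell's default activation.

```python
import numpy as np

n_steps, n_inputs, n_neurons = 2, 3, 5
batch_size = 4  # arbitrary, for illustration only

rng = np.random.default_rng(0)
X = rng.normal(size=(batch_size, n_steps, n_inputs))

# One kernel and one bias, reused at every time step.
kernel = rng.normal(size=(n_inputs + n_neurons, n_neurons))   # (8, 5)
bias = np.zeros(n_neurons)                                     # (5,)

state = np.zeros((batch_size, n_neurons))
outputs = []
for t in range(n_steps):
    # Concatenate the current input with the previous state, then apply tanh.
    concat = np.concatenate([X[:, t, :], state], axis=1)      # (batch, 8)
    state = np.tanh(concat @ kernel + bias)                    # (batch, 5)
    outputs.append(state)

outputs = np.stack(outputs, axis=1)                            # (batch, 2, 5)
print(kernel.shape, outputs.shape)
```

The loop runs the same weights twice, which is why only one kernel and one bias show up in the trainable variables even though there are "2 cells".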
I use the slim framework for TensorFlow because of its simplicity.
But I want to have a convolutional layer with both biases and batch normalization.
In vanilla tensorflow, I have:
def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, name="conv2d"):
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.contrib.layers.xavier_initializer(uniform=False))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
        biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
        conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", biases)
        return conv

d_bn1 = BatchNorm(name='d_bn1')
h1 = lrelu(d_bn1(conv2d(h0, df_dim + y_dim, name='d_h1_conv')))
and I rewrote it in slim like this:
h1 = slim.conv2d(h0,
                 num_outputs=self.df_dim + self.y_dim,
                 scope='d_h1_conv',
                 kernel_size=[5, 5],
                 stride=[2, 2],
                 activation_fn=lrelu,
                 normalizer_fn=layers.batch_norm,
                 normalizer_params=batch_norm_params,
                 weights_initializer=layers.xavier_initializer(uniform=False),
                 biases_initializer=tf.constant_initializer(0.0)
                 )
But this code does not add bias to conv layer.
That is because of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py#L1025 where there is
layer = layer_class(filters=num_outputs,
                    kernel_size=kernel_size,
                    strides=stride,
                    padding=padding,
                    data_format=df,
                    dilation_rate=rate,
                    activation=None,
                    use_bias=not normalizer_fn and biases_initializer,
                    kernel_initializer=weights_initializer,
                    bias_initializer=biases_initializer,
                    kernel_regularizer=weights_regularizer,
                    bias_regularizer=biases_regularizer,
                    activity_regularizer=None,
                    trainable=trainable,
                    name=sc.name,
                    dtype=inputs.dtype.base_dtype,
                    _scope=sc,
                    _reuse=reuse)
outputs = layer.apply(inputs)
in the construction of the layer, which results in no bias being added when batch normalization is used.
Does that mean that I cannot have both biases and batch normalization using slim and the layers library? Or is there another way to have both bias and batch normalization in a layer when using slim?
Batch normalization already includes the addition of a bias term. Recall that BatchNorm already computes:
gamma * normalized(x) + bias
So there is no need (and it makes no sense) to add another bias term in the convolution layer. Simply speaking, BatchNorm shifts the activations by their mean values, so any constant added beforehand is canceled out.
If you still want to do this, you need to leave out the normalizer_fn argument and add BatchNorm as a separate layer. As I said, this makes no sense.
But the solution would be something like
net = slim.conv2d(net, normalizer_fn=None, activation_fn=None, ...)  # keeps the bias, no activation yet
net = slim.batch_norm(net)  # tf.nn.batch_normalization would also need mean, variance, offset and scale
Note that BatchNorm relies on non-gradient updates. So you either need to use an optimizer that is compatible with the UPDATE_OPS collection, or you need to add tf.control_dependencies manually.
Long story short: even if you implement ConvWithBias+BatchNorm, it will behave like ConvWithoutBias+BatchNorm. In the same way, multiple fully connected layers without an activation function behave like a single one.
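A quick NumPy check of that claim, as a sketch: the `batch_norm` helper below is a simplified training-time batch norm written for this illustration (per-channel statistics over the batch axis, scalar gamma and beta), not slim's implementation, and the random array stands in for real conv outputs.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each channel (last axis) using statistics over the batch axis.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(64, 8))   # stand-in for conv outputs: batch x channels
bias = rng.normal(size=(8,))          # per-channel bias added before batch norm

without_bias = batch_norm(conv_out)
with_bias = batch_norm(conv_out + bias)

# The bias shifts the batch mean by the same amount, so it cancels in (x - mean).
same = np.allclose(without_bias, with_bias)
print(same)
```

The only offset that survives is BatchNorm's own learned shift (beta here), which is why slim drops the conv bias whenever a normalizer_fn is given.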
The reason there is no bias for our convolutional layers is that we have batch normalization applied to their outputs. The goal of batch normalization is to get outputs with:
mean = 0
standard deviation = 1
Since we want the mean to be 0, we do not want to add an offset (bias) that will deviate from 0. We want the outputs of our convolutional layer to rely only on the coefficient weights.
I am new to machine learning and TensorFlow and want to do a simple 2-dimensional classification with data that cannot be linearly separated.
On the left side, you can see the training data for the model.
The right side shows what the trained model predicts.
As of now I am overfitting my model, so every possible input is fed to the model.
My expected result would be a very high accuracy, as the model already 'knows' each answer.
Unfortunately, the deep neural network I am using is only able to separate the data with a linear divider, which doesn't fit my data.
This is how I train my Model:
import tflearn

def testDNN(data):
    """
    * data is a list of tuples (x, y, b),
    * where (x, y) is the input vector and b is the expected output
    """
    # Build neural network
    net = tflearn.input_data(shape=[None, 2])
    net = tflearn.fully_connected(net, 100)
    net = tflearn.fully_connected(net, 100)
    net = tflearn.fully_connected(net, 100)
    net = tflearn.fully_connected(net, 2, activation='softmax')
    net = tflearn.regression(net)
    # Define model
    model = tflearn.DNN(net)
    # check if we already have a trained model
    # Start training (apply gradient descent algorithm)
    model.fit(
        [(x,y) for (x,y,b) in data],
        [([1, 0] if b else [0, 1]) for (x,y,b) in data],
        n_epoch=2, show_metric=True)
    return lambda x,y: model.predict([[x, y]])[0][0]
Most of it is taken from the tflearn examples, so I do not understand exactly what every line does.
You need a non-linear activation function in your network. Activation functions are what let a neural network fit non-linear functions; without them, stacked fully connected layers can only produce a linear decision boundary. Tflearn uses a linear activation by default; you could change this to 'sigmoid' on the hidden layers and see if the results improve.
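Here is a small NumPy sketch of why the linear default matters. The weights are random and the layer sizes are only loosely modeled on the question, so treat it as an illustration rather than a reproduction of the tflearn network: two stacked linear layers are exactly equivalent to one linear layer, and that stops being true once a sigmoid sits in between.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))               # six 2-D input points

W1, b1 = rng.normal(size=(2, 100)), rng.normal(size=100)
W2, b2 = rng.normal(size=(100, 2)), rng.normal(size=2)

# Two stacked layers with a linear (identity) activation...
stacked = (x @ W1 + b1) @ W2 + b2

# ...equal a single linear layer with merged weights and bias.
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b
collapses = np.allclose(stacked, single)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# With a non-linearity in between, the merged layer no longer matches.
nonlinear = sigmoid(x @ W1 + b1) @ W2 + b2
differs = not np.allclose(nonlinear, single)

print(collapses, differs)
```

So three hidden layers of 100 linear units have no more expressive power than a single linear layer followed by the softmax, which matches the straight dividing line in the prediction plot.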
I was trying to solve the Dogs vs. Cats Redux: Kernels Edition problem on Kaggle. It is a simple image classification problem. However, I am doing worse than a random predictor with a score of 17+. Does anyone know why this might be?
Neural Network Model
def convolutional_neural_network():
    weights = {
        # 3x3x3 conv => 1x1x8
        'conv1': tf.Variable(tf.random_normal([3, 3, 3, 8])),
        # 5x5x8 conv => 1x1x16
        'conv2': tf.Variable(tf.random_normal([5, 5, 8, 16])),
        # 3x3x16 conv => 1x1x32
        'conv3': tf.Variable(tf.random_normal([3, 3, 16, 32])),
        # 32 FC => output_features
        'out': tf.Variable(tf.random_normal([(SIZE//16)*(SIZE//16)*32, output_features]))
    }
    biases = {
        'conv1': tf.Variable(tf.random_normal([8])),
        'conv2': tf.Variable(tf.random_normal([16])),
        'conv3': tf.Variable(tf.random_normal([32])),
        'out': tf.Variable(tf.random_normal([output_features]))
    }
    conv1 = tf.add(conv2d(input_placeholder, weights['conv1'], 1), biases['conv1'])
    relu1 = relu(conv1)
    pool1 = maxpool2d(relu1, 4)
    conv2 = tf.add(conv2d(pool1, weights['conv2'], 1), biases['conv2'])
    relu2 = relu(conv2)
    pool2 = maxpool2d(relu2, 2)
    conv3 = tf.add(conv2d(pool2, weights['conv3'], 1), biases['conv3'])
    relu3 = relu(conv3)
    pool3 = maxpool2d(relu3, 2)
    pool3 = tf.reshape(pool3 , shape=[-1, (SIZE//16)*(SIZE//16)*32])
    output = tf.add(tf.matmul(pool3, weights['out']), biases['out'])
    return output
The output has no activation function.
Prediction, Optimizer and Loss Function
output_prediction = convolutional_neural_network()
loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=output_prediction, labels=output_placeholder) )
trainer = tf.train.AdamOptimizer()
optimizer = trainer.minimize(loss)
test_prediction = tf.nn.softmax(output_prediction)
The images are converted into a NumPy array of size 128x128x3 and fed into the neural network with a batch size of 64.
Full Code Here
Edit: Ran the same code for 200 epochs. No improvement. It did slightly worse.
This is more of a comment, but I don't have enough privilege points for that:
Did you normalize your data (i.e. divide the pixel values by 255)? I can't see you doing that in the script.
When you get terrible results like a log loss of 17, it means your model is always predicting one class with 100% confidence. Usually in this case the cause is not the architecture, the learning rate or the number of epochs, but rather some silly mistake like forgetting to normalize or mixing up your labels. For this particular problem and given your architecture, you should see an accuracy of about 80% and a log loss of about 0.4 within 40 epochs. No need for thousands of epochs :)
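To show where a number like 17 comes from, here is a small NumPy sketch of binary log loss. The clipping value eps=1e-15 is an assumption about how the scorer treats probabilities of exactly 0 or 1, and the balanced 100-sample label vector is made up for the illustration.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross-entropy with probabilities clipped away from 0 and 1.
    p = np.clip(p_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([0, 1] * 50)                      # balanced labels (hypothetical)

always_one = np.ones(len(y_true))                   # always class 1 with p = 1.0
coin_flip = np.full(len(y_true), 0.5)               # uninformed 50/50 guess

confident_loss = log_loss(y_true, always_one)
random_loss = log_loss(y_true, coin_flip)
print(confident_loss, random_loss)
```

Under these assumptions the always-confident predictor is wrong on half the samples and pays about -log(1e-15) ≈ 34.5 for each of them, which averages to roughly 17, while a plain 0.5 guess scores about 0.69. That is why a score of 17+ points at saturated outputs rather than at a slightly undertrained model.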
Improving accuracy is more of an art than a one-step fix. You can try some of these methods:
Try different gradient-based optimizers: SGD, momentum, Nesterov, adaptive methods, ...
Try an adaptive learning rate.
Improve regularization: L1, L2, dropout, DropConnect, ...
Augment your training data (have more data).
Change your network hyperparameters.
Finally, if nothing else helps, change the network structure.