I am quite new to machine learning and RNNs.
I have a small problem as follows: there are 4 independent input variables, say (x_1, x_2, x_3, x_4), and a vector output, say y_out with 100x1 elements. I want to build an LSTM model to predict y_out for any new input set (x_1new, x_2new, x_3new, x_4new).
Say I build a sample dataset of 50000 lines, where each line consists of [x_1i, x_2i, x_3i, x_4i, y_outi]. So my dataset has dimension 50000x(4+100).
I don't know how to configure my LSTM model. Any suggestions would be appreciated.
I have already tried a plain feed-forward NN trained with backpropagation, but the prediction results were very poor.
Thank you
New to neural nets so please correct my syntax.
I'm trying to create an LSTM RNN that will predict the Fibonacci sequence. When I run the code below, the loss remains incredibly high (around 35339663592701874176).
Why does the shape of the input have to be (batch_size, timesteps, input_dim)? In my example I have 100 data entries so that'd be my batch_size, and the Fibonacci sequence takes in 2 inputs so that'd be input_dim, but what would timesteps be in this case? 1?
Shouldn't the units of the LSTM be 1? If I'm understanding correctly, the "units" are just the number of hidden state nodes in the LSTM. So in theory, each of the 2 inputs would have a weight of 1 towards that hidden state after training.
Would an RNN be a suitable model for this problem? When I've looked online, most people like to use the Fibonacci sequence as an example to explain how RNNs work.
Thanks for the help!
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Create Training Data
xs = [[[1, 1]]]
ys = []
i = 0
while i < 100:
    ys.append([xs[i][0][0]+xs[i][0][1]])
    xs.append([[xs[i][0][1], ys[len(ys)-1][0]]])
    i = i + 1
del xs[len(xs)-1]
xs = np.array(xs, dtype=float)
ys = np.array(ys, dtype=float)
# Create Model
model = keras.Sequential()
model.add(layers.LSTM(1, input_shape=(1, 2)))
model.add(layers.Dense(1))
model.compile(optimizer="adam", loss="mean_absolute_error", metrics=[ 'accuracy' ])
# Train
model.fit(xs, ys, epochs=100000)
You can't feed a NN data where some of the values are 10^21 times as large as others and expect it to work; it just doesn't happen.
You're not doing anything here that actually calls for an LSTM (or any RNN): you're not using the time dimension, and you're basically just trying to learn addition. Maybe you meant to do something different (like input digits as a sequence, or have the output run for multiple timesteps and give you several values of the sequence), but that's not what you're doing, and it's unclear what you want.
The number of units is your memory/processing capacity. Each unit of an RNN is able to receive values from all of the units in the previous timestep. One unit alone can't do anything interesting, especially with no layer before it to preprocess the data.
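To make the scale problem concrete, here is a minimal sketch that rebuilds the same 100 targets as in the question and measures how far apart they are. The log transform at the end is only one possible rescaling, picked here as an assumption for illustration; it is not something the question or the answer prescribes.

```python
import numpy as np

# Rebuild the targets from the question: each target is the sum of the two previous terms.
a, b = 1, 1
ys = []
for _ in range(100):
    a, b = b, a + b
    ys.append(b)
ys = np.array(ys, dtype=float)

# Ratio between the largest and smallest target the network is asked to fit.
magnitude_ratio = ys[-1] / ys[0]

# One possible rescaling (an assumption, not the only choice): work in log space.
log_ys = np.log(ys)

print(f"largest / smallest target: {magnitude_ratio:.2e}")
print(f"log-scaled range: {log_ys.min():.2f} to {log_ys.max():.2f}")
```

In log space the targets span a few dozen units instead of about twenty orders of magnitude, which is a range a small network has a chance of fitting.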
I'm trying to build an ANN architecture to predict a sickness rate. I'm stuck at 40% accuracy. I'm new to machine learning and have tried several tips, like changing the optimizer, the number of nodes per layer and the dropout value, without any improvement.
So could you guys help me with some advice?
The x array is composed of 10 columns.
The y array is only one column, the sickness rate.
Here is my model:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import NonNeg
def build_dropout_model(rate):
    model = Sequential()
    model.add(Dense(10, input_shape=(10,)))
    model.add(Dropout(rate))
    model.add(Dense(256, kernel_constraint=NonNeg(), activation="relu"))
    model.add(Dense(256, kernel_constraint=NonNeg(), activation="relu"))
    model.add(Dense(128, kernel_constraint=NonNeg(), activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['accuracy'])
    return model
model = build_dropout_model(0.2)
history = model.fit(xtr, ytr, epochs=1000, verbose=2)
loss, acc = model.evaluate(xtst, ytst)
Thank you in advance.
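This is a regression model, not a classification model, so you should be using "linear" in your output layer. For the same reason, accuracy is not a meaningful metric for a continuous target; judge the model by a regression metric such as the mean absolute error instead.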
model.add(Dense(1,activation="linear"))
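As a rough illustration of why an accuracy figure says little about a regression model, here is a small NumPy sketch. The rates and predictions are made-up numbers, and the variable names are hypothetical; it only compares an exact-match score with two common regression metrics.

```python
import numpy as np

# Made-up sickness rates and made-up model outputs, for illustration only.
y_true = np.array([0.12, 0.40, 0.33, 0.75, 0.58])
y_pred = np.array([0.10, 0.45, 0.30, 0.70, 0.60])

# An exact-match score treats every near miss as a failure.
exact_match_accuracy = float(np.mean(y_pred == y_true))

# Regression metrics measure how close the predictions are.
mae = float(np.mean(np.abs(y_pred - y_true)))
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = float(1.0 - ss_res / ss_tot)

print(f"exact-match accuracy: {exact_match_accuracy:.2f}")
print(f"mean absolute error:  {mae:.3f}")
print(f"R^2:                  {r2:.3f}")
```

Here the predictions are all within 0.05 of the true rate, yet the match-based score is zero, so a low "accuracy" on its own does not show that the model is bad.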
For each event/sample, I have a set of values x1,x2,y1,y2,z,k. I have 2 neural networks that I want to undergo two phases of training. The first neural network takes x1,x2 as inputs and outputs z. The second neural network takes y1,y2 as inputs and outputs k.
First phase:
Separately train the first neural network with inputs x1,x2 to output z and the second neural network with inputs y1,y2 to output k.
Second phase:
Here I’m ready to let go of z and k, and I’m looking for a value h that is somewhere between z and k. Therefore, for this phase, I want to train both neural networks to minimize the difference between their outputs. That is, for each epoch, train NN1 on the output of NN2 and train NN2 on the output of NN1, updating the weights and biases of each. Then use the new weights and biases to calculate a new output for each, and go through another epoch in the same way.
What machine learning package allows me to do that? I’m familiar with Keras (with a TensorFlow backend). Is that possible in Keras? If not, is it possible in TensorFlow?
Thanks
Suppose you have both models as model1 and model2.
Let's create a layer that calculates the difference between their outputs:
from keras.models import Model
from keras.layers import Lambda
difference = Lambda(lambda x: x[0] - x[1])([model1.output, model2.output])
Then, let's make a model that outputs the difference.
#if your models have one input each, (if x1 and x2 are elements in the input array)
diffModel = Model([model1.input, model2.input], difference)
#if your models have two inputs (if x1 and x2 are two input tensors)
diffModel = Model(model1.inputs + model2.inputs, difference)
Let's compile this model and choose a loss that compares the difference with 0:
diffModel.compile(optimizer = 'adam', loss='mse')
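And let's train it with zeros as the target output:
import numpy as np
# shape must match the output of diffModel, e.g. (number_of_samples, 1) for a single output unit
#if models with one input:
diffModel.fit([x,y], np.zeros(shape))
#if models with two inputs:
diffModel.fit([x1,x2,y1,y2], np.zeros(shape))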
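To see what training against a zero target does, here is a minimal NumPy sketch of the same idea. It is not the Keras code above: two linear models (hypothetical stand-ins for model1 and model2) are updated by plain gradient descent on the mean squared difference of their outputs, using random data invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))   # stands in for the (x1, x2) inputs
y = rng.normal(size=(200, 2))   # stands in for the (y1, y2) inputs
w1 = rng.normal(size=(2, 1))    # weights of the first (linear) stand-in model
w2 = rng.normal(size=(2, 1))    # weights of the second (linear) stand-in model

def mse_gap(w1, w2):
    """Mean squared difference between the two models' outputs."""
    return float(np.mean((x @ w1 - y @ w2) ** 2))

initial_gap = mse_gap(w1, w2)
lr = 0.05
for _ in range(200):
    diff = x @ w1 - y @ w2               # output of model 1 minus output of model 2
    grad_w1 = 2 * x.T @ diff / len(x)    # gradient of the mean squared difference w.r.t. w1
    grad_w2 = -2 * y.T @ diff / len(y)   # gradient w.r.t. w2 (opposite sign)
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
final_gap = mse_gap(w1, w2)

print(f"gap before: {initial_gap:.4f}, gap after: {final_gap:.2e}")
```

One thing this toy version makes visible: with nothing else in the loss, the gap is closed by shrinking both outputs toward the same constant. If that also happens with your real networks, the number of epochs in the second phase probably needs to stay small, but that is a guess based on this sketch rather than something tested on your data.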
I have recently constructed a CNN in Keras (with TensorFlow as the backend) that takes stellar spectra as input and predicts three stellar parameters as outputs: Temperature, Surface Gravity, and Metallicity. I am now trying to create an RNN that does the same thing in order to compare the two models.
After searching through examples and forums I haven't come across many applications that are similar enough to my project. I have tried implementing a simple RNN to see if I can come up with sensible results, but no luck so far: the networks don't seem to be learning at all.
I could really use some guidance to get me started. Specifically:
Is an RNN an appropriate network for this type of problem?
What is the correct input shape for the model? I know this depends on the architecture of the network, so I guess that my next question would be: what is a simple architecture to start with that is capable of computing regression predictions?
My input data is such that I have m=50,000 spectra each with n=7000 data points, and L=3 output labels that I am trying to learn. I also have test sets and cross-validation sets with the same n & L dimensions.
When structuring my input data as (m,n,1) and my output targets as (m,L) and using the following architecture, the loss doesn't seem to decrease.
from keras.models import Sequential
from keras.layers import InputLayer, LSTM, Dense, Activation
n=7000
L=3
## train_X.shape = (50000, n, 1)
## train_Y.shape = (50000, L)
## cv_X.shape = (10000, n, 1)
## cv_Y.shape = (10000, L)
batch_size=32
lstm_layers = [16, 32]
input_shape = (None, n, 1)
# Note: dropout_W, dropout_U and nb_epoch are Keras 1 names; Keras 2+ calls them dropout, recurrent_dropout and epochs.
model = Sequential([
    InputLayer(batch_input_shape=input_shape),
    LSTM(lstm_layers[0], return_sequences=True, dropout_W=0.2, dropout_U=0.2),
    LSTM(lstm_layers[1], return_sequences=False),
    Dense(L),
    Activation('linear')
])
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(train_X, train_Y, batch_size=batch_size, nb_epoch=20,
          validation_data=(cv_X, cv_Y), verbose=2)
I have also tried changing my input shape to (m, 1, n) and still haven't had any success. I am not looking for an optimal network, just something that trains, and then I can take it from there. My input data isn't a time series, but there are relationships between one section of the spectrum and the previous section, so is there a way I can structure each spectrum into a 2D array that will allow an RNN to learn stellar parameters from the spectra?
First you set
train_X.shape = (50000, n, 1)
Then you write
input_shape = (None, 1, n)
Why don't you try
input_shape = (None, n, 1) ?
It would make more sense for your RNN to receive a sequence of n timesteps and 1 value per time step than the other way around.
Does it help? :)
**EDIT:**
Alright, after re-reading, here are my 2 cents on your questions: the LSTM isn't a good idea.
1) There is no "temporal" information: there isn't a "direction" in the spectrum. An LSTM is good at capturing a changing state of the world, for example. It's not the best at combining information from the beginning of your spectrum with information from the end. It will "read" starting from the beginning, and that information will fade as the state is updated. You could try a bidirectional LSTM to counter the fact that there is "no direction". However, see the second point.
2) 7000 timesteps is far too many for an LSTM to work well. When it trains, at the backpropagation step, the LSTM is unrolled and the gradient has to flow back through 7000 "layers" (not 7000 distinct layers, since they all share the same weights). This is very difficult to train. From my experience, I would limit an LSTM to about 100 steps at most.
Otherwise your input shape is correct :)
Have you tried a deep fully connected network? I believe this would be more efficient.
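If you still want to try a recurrent model, one way to get under the roughly 100-step limit mentioned above is to cut each spectrum into consecutive chunks and feed one chunk per timestep. The sketch below shows only the reshaping, in NumPy; the 70 x 100 split and the small random array are assumptions for illustration, not a tuned choice.

```python
import numpy as np

n = 7000                  # points per spectrum, as in the question
steps, width = 70, 100    # assumed split: 70 timesteps of 100 adjacent points each
assert steps * width == n

# Small random stand-in for train_X with shape (m, n); the real data would go here.
spectra = np.random.default_rng(0).random((8, n))

# Each spectrum becomes a (steps, width) array: row t holds points t*width .. (t+1)*width - 1.
chunked = spectra.reshape(-1, steps, width)

print(chunked.shape)
```

With this layout the recurrent layer would see 70 steps with 100 features each, so the input shape would be (None, 70, 100) instead of (None, 7000, 1).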
I have the following data
        feat_1  feat_2  ...  feat_n  label
gene_1  100.33  10.2    ...  90.23   great
gene_2  13.32   87.9    ...  77.18   soso
....
gene_m  213.32  63.2    ...  12.23   quitegood
The number of rows M is large (~30K), and the number of columns N is much smaller (~10).
My question is: what is the appropriate deep learning structure to learn and test data like the above?
At the end of the day, the user will give a vector of genes with expression values.
gene_1 989.00
gene_2 77.10
...
gene_N 100.10
And the system will assign a label to each gene, e.g. great or soso, etc.
By structure I mean one of these:
Convolutional Neural Network (CNN)
Autoencoder
Deep Belief Network (DBN)
Restricted Boltzmann Machine (RBM)
To expand a little on @sung-kim's comment:
CNNs are used primarily for problems in computer vision, such as classifying images. They are modelled on the animal visual cortex: their connectivity is arranged so that each unit looks at a tile of the input, with some overlap between tiles. Typically they require a lot of data, more than 30k examples.
Autoencoders are used for feature generation and dimensionality reduction. They start with lots of neurons in the first layers, then the number is reduced, and then increased again. Each object is trained to reproduce itself. This results in the middle layers (with a low number of neurons) providing a meaningful projection of the feature space in a low dimension.
While I don't know much about DBNs, they are built by stacking RBMs that are pretrained layer by layer, usually followed by supervised fine-tuning. Lots of parameters to train.
Again, I don't know much about Boltzmann machines, but they aren't widely used for this sort of problem (to my knowledge).
As with all modelling problems though, I would suggest starting from the most basic model to look for signal. Perhaps a good place to start is Logistic Regression before you worry about deep learning.
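As a rough sketch of what such a baseline could look like, here is a multi-class logistic regression in scikit-learn. The data is synthetic (three made-up clusters in 10 features, with label names borrowed from the question), so the score it prints says nothing about the real gene data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the gene table: 3 labels, 10 features, 300 rows per label.
rng = np.random.default_rng(0)
n_per_class, n_features = 300, 10
label_names = np.array(["great", "soso", "quitegood"])
centers = rng.normal(scale=5.0, size=(3, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(label_names, n_per_class)

# Hold out a validation set so the score is not measured on the training rows.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_accuracy = baseline.score(X_val, y_val)
print(f"baseline validation accuracy: {val_accuracy:.3f}")
```

Whatever number this gives on the real data is the one a neural network has to beat to be worth the extra complexity.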
If you have got to the point where you want to try deep learning, for whatever reason, then for this type of data a basic feed-forward network is the best place to start. In terms of deep learning, 30k data points is not a large number, so it is always best to start out with a small network (1-3 hidden layers, 5-10 neurons) and then get bigger. Make sure you have a decent validation set when performing parameter optimisation, though. If you're a fan of the scikit-learn API, I suggest that Keras is a good place to start.
One further comment: you will want to use a OneHotEncoder on your class labels before you do any training.
EDIT
I see from the bounty and the comments that you want to see a bit more about how these networks work. Please see the example below of how to build a feed-forward model and do some simple parameter optimisation:
import numpy as np
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
# Create some random data: 10 samples with 50 features each
np.random.seed(42)
X = np.random.random((10, 50))
# Similar labels, one per sample
labels = ['good', 'bad', 'soso', 'amazeballs', 'good']
labels += labels
labels = np.array(labels)
np.random.shuffle(labels)
# Change the labels to the required one-hot format
numericalLabels = preprocessing.LabelEncoder().fit_transform(labels)
numericalLabels = numericalLabels.reshape(-1, 1)
y = preprocessing.OneHotEncoder().fit_transform(numericalLabels).toarray()
# Simple Keras model builder
def buildModel(nFeatures, nClasses, nLayers=3, nNeurons=10, dropout=0.2):
    model = Sequential()
    model.add(Dense(nNeurons, input_dim=nFeatures))
    model.add(Activation('sigmoid'))
    model.add(Dropout(dropout))
    # Remaining hidden layers
    for _ in range(nLayers - 1):
        model.add(Dense(nNeurons))
        model.add(Activation('sigmoid'))
        model.add(Dropout(dropout))
    # Output layer: one unit per class
    model.add(Dense(nClasses))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    return model
# Do an exhaustive search over a given parameter space
for nLayers in range(2, 4):
    for nNeurons in range(5, 8):
        model = buildModel(X.shape[1], y.shape[1], nLayers, nNeurons)
        modelHist = model.fit(X, y, batch_size=32, epochs=10,
                              validation_split=0.3, shuffle=True, verbose=0)
        minLoss = min(modelHist.history['val_loss'])
        epochNum = modelHist.history['val_loss'].index(minLoss)
        print('{0} layers, {1} neurons best validation at epoch {2} loss = {3:.2f}'.format(
            nLayers, nNeurons, epochNum, minLoss))
Which outputs (the exact numbers will vary from run to run):
2 layers, 5 neurons best validation at epoch 0 loss = 1.18
2 layers, 6 neurons best validation at epoch 0 loss = 1.21
2 layers, 7 neurons best validation at epoch 8 loss = 1.49
3 layers, 5 neurons best validation at epoch 9 loss = 1.83
3 layers, 6 neurons best validation at epoch 9 loss = 1.91
3 layers, 7 neurons best validation at epoch 9 loss = 1.65
A deep learning structure would be recommended if you were dealing with raw data and wanted to automatically find features that work towards your classification goal. But based on the names of your columns and their number (only 10), it seems that you already have your features engineered.
For this reason you could just go with a standard multi-layer neural network and use supervised learning (backpropagation). Such a network would have a number of inputs matching the number of your columns (10), followed by a number of hidden layers, and then an output layer with a number of neurons matching the number of your labels. You could experiment with different numbers of hidden layers and neurons, different neuron types (sigmoid, tanh, rectified linear etc.) and so on.
Alternatively you could use the raw data (if it's available) and then go with DBNs (they're known to be robust and achieve good results across different problems) or auto-encoders.
If you expect the output to be treated as scores for each label (as I understood from your question), try a supervised multi-class logistic regression classifier (the label with the highest score wins).
If you're bound to use deep learning, a simple feed-forward ANN should do, with supervised learning through backpropagation. Use an input layer with N neurons and add one or two hidden layers, not more than that. There is no need to go 'deep' and add more layers for this data: with more layers it is easy to overfit, it becomes tricky to figure out what the problem is, and the test accuracy will be affected greatly.
Simply plotting or visualizing the data, e.g. with t-SNE, can be a good start if you need to figure out which features are important (or any correlation that may exist).
You can then play with higher powers of those feature dimensions, or give them increased weight in the score.
For problems like this, deep learning probably isn't very well suited, but a simpler ANN architecture like this should work well, depending on the data.
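For the t-SNE suggestion above, a minimal sketch with scikit-learn could look like the following. The three synthetic clusters stand in for the real 10-column feature table, and the perplexity value is an arbitrary assumption; only the embedding is computed, so plotting it (for example as a scatter plot coloured by label) is left out.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in: 3 groups of 50 rows with 10 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 10)) for c in (0.0, 4.0, 8.0)])

# Project the 10 features down to 2 dimensions for visual inspection.
embedding = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(X)

print(embedding.shape)
```

If the labels form visibly separate groups in the 2D embedding, that is a hint (not a proof) that a simple classifier has enough signal to work with.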