I am using long short-term memory (LSTM) to generate predictions. I have noticed that each time I run the LSTM model, it generates slightly different predictions with the same data. I was wondering why this happens, and whether there is something I am doing wrong?
Thank You
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import TimeDistributed
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
def LSTM_Model(Data, N_Steps, Epochs):
    # define input sequence
    raw_seq = Data
    # choose a number of time steps
    n_steps_og = N_Steps
    # split into samples
    X, y = split_sequence(raw_seq, n_steps_og)
    # reshape from [samples, timesteps] into [samples, subsequences, timesteps, features]
    n_features = 1
    n_seq = 2
    n_steps = 2
    X = X.reshape((X.shape[0], n_seq, n_steps, n_features))
    # define model
    model = Sequential()
    model.add(TimeDistributed(Conv1D(filters=64, kernel_size=1, activation='relu'), input_shape=(None, n_steps, n_features)))
    model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    # fit model
    model.fit(X, y, epochs=Epochs, verbose=2)
    # Create forecasting data
    # Now take the last n_steps_og days of the model data for the forecast
    Forecast_data = Data[len(Data) - n_steps_og:]
    # demonstrate prediction
    x_input = array(Forecast_data)
    x_input = x_input.reshape((1, n_seq, n_steps, n_features))
    yhat = float(model.predict(x_input, verbose=0))
    return yhat
Many methods like this are initialized with random weights for the coefficients. They then search for a good local minimum of some loss function. This means they will (hopefully) find one of the many nearly optimal solutions, but they are unlikely to find the single very best solution, or even the same solution on every run. Because of this, your results are typical, so long as your predictions differ only slightly.
This is more of a general machine learning question, rather than being specific to Python, but I hope this helps.
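If you want repeatable runs for debugging or comparison, one common approach is to fix the random seeds before building and training the model. Below is a minimal sketch, assuming a TensorFlow backend; note that GPU kernels can still introduce small nondeterministic differences, so this only makes runs reproducible up to that caveat.
import os
import random
import numpy as np
import tensorflow as tf

os.environ['PYTHONHASHSEED'] = '0'
random.seed(42)          # Python's built-in RNG
np.random.seed(42)       # NumPy RNG, used by Keras weight initialisers
tf.random.set_seed(42)   # TensorFlow RNG (on TF 1.x use tf.set_random_seed(42) instead)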
Related
Hi there, recently I've been working on an RNN LSTM project and I have a 2D data set like
x = [[x1,x2,x3...,x18],[x1,x2,x3...,x18],...]
y = [[y1,y2,y3],[y1,y2,y3],...]
X.shape => (295,5,18)
Y.shape => (295,3)
and I convert it to a 3D dataset with the code below
X_train = []
Y_train = []
for i in range(5, 300):
    X_train.append(training_set_scaled[i-5:i, 0:18])
    Y_train.append(training_set_scaled[i, 18:22])
X_train, Y_train = np.array(X_train), np.array(Y_train)
and then use Keras for LSTM
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
regressor = Sequential()
regressor.add(LSTM(units=50, return_sequences=True,input_shape=(X_train.shape[0],X_train.shape[1],X_train.shape[2])))
regressor.add(Dropout(0.1))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.1))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.1))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.1))
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.1))
regressor.add(Dense(units= 1))
regressor.compile(optimizer= 'adam', loss='binary_crossentropy')
regressor.fit(X_train,Y_train, epochs = 100, batch_size = 32)
and when I run this script I get the error below:
ValueError: Input 0 is incompatible with layer lstm_104: expected ndim=3, found ndim=4
I have no idea what the problem is. Can anybody help me?
Change:
input_shape=(X_train.shape[0],X_train.shape[1],X_train.shape[2])
to
input_shape=(X_train.shape[1],X_train.shape[2])
Basically, Keras is designed to take any number of examples in a single batch, so it automatically puts None as the first dimension. So, when you specify the remaining 2 dimensions, it gets a 3-dimensional input in total; but if you also specify the first dimension yourself, the number of dimensions becomes 4, i.e. (None, X_train.shape[0], X_train.shape[1], X_train.shape[2]).
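As a quick illustration (a hypothetical check, assuming your data really is shaped (295, 5, 18)), you can build the first layer with only the last two dimensions and inspect the input shape Keras constructs:
# Keras prepends the flexible batch dimension (None) by itself
from keras.models import Sequential
from keras.layers import LSTM
m = Sequential()
m.add(LSTM(units=50, return_sequences=True, input_shape=(5, 18)))
print(m.layers[0].input_shape)  # -> (None, 5, 18)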
But again, if you really want to hard-code the batch size, you can still do it. For this, use batch_input_shape instead of input_shape, as follows:
regressor.add(LSTM(units=50, return_sequences=True, batch_input_shape=(X_train.shape[0], X_train.shape[1], X_train.shape[2])))
It will give you control over the exact batch size used by the network. (Your program has another flaw in this case: you set the batch size to X_train.shape[0], which is 295, but pass batch_size=32 in fit(); they should be equal. Also, the batch size is generally chosen smaller than the data set size.)
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 3 for 'metrics/sparse_categorical_accuracy/Squeeze' (op: 'Squeeze') with input shapes: [?,3].
The Iris dataset
In this assignment, you will use the Iris dataset. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. For a reference, see the following paper:
R. A. Fisher. "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188, 1936.
Your goal is to construct a neural network that classifies each sample into the correct class, as well as applying validation and regularisation techniques.
Load and preprocess the data
First read in the Iris dataset using datasets.load_iris(), and split the dataset into training and test sets.
You can now construct a model to fit to the data. Using the Sequential API, build your model according to the following specifications:
The model should use the input_shape in the function argument to set the input size in the first layer.
The first layer should be a dense layer with 64 units.
The weights of the first layer should be initialised with the He uniform initializer.
The biases of the first layer should be all initially equal to one.
There should then be a further four dense layers, each with 128 units.
This should be followed with four dense layers, each with 64 units.
All of these Dense layers should use the ReLU activation function.
The output Dense layer should have 3 units and the softmax activation function.
In total, the network should have 10 layers.
from numpy.random import seed
seed(8)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, model_selection
get_ipython().run_line_magic('matplotlib', 'inline')
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Softmax
def read_in_and_split_data(iris_data):
    return model_selection.train_test_split(
        iris_data["data"],
        iris_data["target"],
        test_size=0.1
    )
# In[3]:
# Run your function to generate the test and training data.
iris_data = datasets.load_iris()
(train_data, test_data,
train_targets, test_targets) = read_in_and_split_data(iris_data)
# We will now convert the training and test targets using a one hot encoder.
# In[4]:
# Convert targets to a one-hot encoding
train_targets = tf.keras.utils.to_categorical(np.array(train_targets))
test_targets = tf.keras.utils.to_categorical(np.array(test_targets))
#### GRADED CELL ####
# Complete the following function.
# Make sure to not change the function name or arguments.
def get_model(input_shape):
    """
    This function should build a Sequential model according to
    the above specification. Ensure the weights are initialised
    by providing the input_shape argument in the first layer, given by the
    function argument.
    Your function should return the model.
    """
    model = Sequential([
        Dense(64, activation="relu",
              kernel_initializer='he_uniform',
              bias_initializer='ones',
              input_shape=input_shape),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(128, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(3, activation="softmax"),
    ])
    return model
# In[16]:
# Run your function to get the model
model = get_model(train_data[0].shape)
# #### Compile the model
#
# You should now compile the model using the `compile` method.
# Remember that you need to specify an optimizer, a loss function and
# a metric to judge the performance of your model.
# In[23]:
#### GRADED CELL ####
# Complete the following function.
# Make sure to not change the function name or arguments.
def compile_model(model):
    # model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
    acc = tf.keras.metrics.SparseCategoricalAccuracy()
    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=[acc])
# In[24]:
# Run your function to compile the model
compile_model(model)
#### GRADED CELL ####
# Complete the following function.
# Make sure to not change the function name or arguments.
def train_model(model, train_data, train_targets, epochs):
    """
    This function should train the model for the given number of epochs on the
    train_data and train_targets.
    Your function should return the training history, as returned by model.fit.
    """
    return model.fit(train_data, train_targets, epochs=epochs)
# Run the following cell to run the training for 800 epochs.
# In[26]:
# Run your function to train the model
history = train_model(model, train_data, train_targets, epochs=800)
This is because you have the wrong loss function. Your targets are one-hot encoded, so you should not use 'sparse_categorical_crossentropy'; instead, you should use 'categorical_crossentropy'.
The same applies to acc = tf.keras.metrics.SparseCategoricalAccuracy(). It should be acc = tf.keras.metrics.CategoricalAccuracy().
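A corrected compile_model along those lines (keeping the same optimizer and learning rate as in your code) could look like the sketch below; the fix is exactly the loss/metric swap described above, the rest is unchanged:
def compile_model(model):
    # Targets are one-hot encoded, so use the non-sparse loss and metric.
    opt = tf.keras.optimizers.Adam(learning_rate=0.0001)
    acc = tf.keras.metrics.CategoricalAccuracy()
    model.compile(optimizer=opt,
                  loss='categorical_crossentropy',
                  metrics=[acc])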
I am reading this article (The Unreasonable Effectiveness of Recurrent Neural Networks) and want to understand how to express one-to-one, one-to-many, many-to-one, and many-to-many LSTM neural networks in Keras. I have read a lot about RNNs and understand how LSTM NNs work, in particular vanishing gradients, LSTM cells, their outputs and states, sequence outputs, etc. However, I have trouble expressing all these concepts in Keras.
To start with, I have created the following toy NN using an LSTM layer
from keras.models import Model
from keras.layers import Input, LSTM
import numpy as np
t1 = Input(shape=(2, 3))
t2 = LSTM(1)(t1)
model = Model(inputs=t1, outputs=t2)
inp = np.array([[[1,2,3],[4,5,6]]])
model.predict(inp)
Output:
array([[ 0.0264638]], dtype=float32)
In my example I have the input shape 2 by 3. As far as I understand this means that the input is a sequence of 2 vectors and each vector has 3 features and hence my input must be a 3D tensor of shape (n_examples, 2, 3). In terms of 'sequences', the input is a sequence of length 2, and each element in the sequence is expressed by 3 features (please correct me if I am wrong). When I call predict it returns a 2-dim tensor with a single scalar. So,
Q1: Is it one-to-one or another type of LSTM network?
When we say "one/many input and one/many output"
Q2: what do we mean by "one/many input/output"? A "one/many" scalar(s), vector(s), sequence(s)..., one/many what?
Q3: Can someone give a simple working example in Keras for each type of the networks: 1-1, 1-M, M-1, and M-M?
PS: I ask multiple questions in a single thread since they are very close and related to each other.
The distinction between one-to-one, one-to-many, many-to-one and many-to-many only exists for RNNs / LSTMs, i.e. networks that work on sequential (temporal) data; CNNs work on spatial data, where this distinction does not exist. So "many" always involves multiple timesteps / a sequence.
The different variants describe the shape of the input and the output and the kind of classification. For the input, "one" means a single input quantity is classified as a closed unit, and "many" means a sequence of quantities (e.g. a sequence of images, a sequence of words) is classified as a closed unit. For the output, "one" means the output is a scalar (binary classification, e.g. is a bird or is not a bird), 0 or 1; "many" means the output is a one-hot encoded vector with one dimension for each class (multiclass classification, e.g. is a sparrow, is a robin, ...), e.g. for three classes 001, 010, 100:
In the following examples, images and sequences of images are used as the quantity to be classified; alternatively you could use words (and sequences of words, i.e. sentences), or anything similar:
one-to-one: single images (or words, ...) are classified into a single class (binary classification), e.g. is this a bird or not
one-to-many: single images (or words, ...) are classified into multiple classes
many-to-one: a sequence of images (or words, ...) is classified into a single class (binary classification of a sequence)
many-to-many: a sequence of images (or words, ...) is classified into multiple classes
cf https://www.quora.com/How-can-I-choose-between-one-to-one-one-to-many-many-to-one-many-to-one-and-many-to-many-in-long-short-term-memory-LSTM
one-to-one (activation=sigmoid (default), loss=mean_squared_error)
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)
# define LSTM configuration
n_neurons = length
n_batch = length
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
    print('%.1f' % value)
source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
one-to-many uses RepeatVector() to transform a single quantity into a sequence, which is what is needed for multiclass classification
def test_one_to_many(self):
    params = dict(
        input_dims=[1, 10], activation='tanh',
        return_sequences=False, output_dim=3
    ),
    number_of_times = 4
    model = Sequential()
    model.add(RepeatVector(number_of_times, input_shape=(10,)))
    model.add(LSTM(output_dim=params[0]['output_dim'],
                   activation=params[0]['activation'],
                   inner_activation='sigmoid',
                   return_sequences=True,
                   ))
    relative_error, keras_preds, coreml_preds = simple_model_eval(params, model)
    # print relative_error, '\n', keras_preds, '\n', coreml_preds, '\n'
    for i in range(len(relative_error)):
        self.assertLessEqual(relative_error[i], 0.01)
source: https://www.programcreek.com/python/example/89689/keras.layers.RepeatVector
alternative one-to-many
model.add(RepeatVector(number_of_times, input_shape=input_shape))
model.add(LSTM(output_size, return_sequences=True))
source : Many to one and many to many LSTM examples in Keras
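For completeness, here is a small self-contained sketch of that alternative one-to-many pattern; the concrete dimensions (a 10-dimensional input vector repeated 4 times, 3 output units per step) are made up purely for illustration:
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector

number_of_times = 4   # length of the generated output sequence (illustrative)
output_size = 3       # size of each output step (illustrative)

model = Sequential()
# RepeatVector turns one 10-dim vector into a sequence of 4 identical vectors
model.add(RepeatVector(number_of_times, input_shape=(10,)))
# return_sequences=True gives one output per timestep -> output shape (None, 4, 3)
model.add(LSTM(output_size, return_sequences=True))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()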
many-to-one, binary classification (loss=binary_crossentropy, activation=sigmoid, dimensionality of the fully-connected output layer is 1 (Dense(1)); outputs a scalar, 0 or 1)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
many-to-many, multiclass classification (loss=sparse_categorical_crossentropy, activation=softmax; sparse_categorical_crossentropy takes integer class labels as ground truth, whereas categorical_crossentropy would need one-hot encoded targets; dimensionality of the fully-connected output layer is 7 (Dense(7)), so the model outputs a 7-dimensional vector with one entry per class)
from keras.models import Sequential
from keras.layers import *
model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(7, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
cf Keras LSTM multiclass classification
Alternative many-to-many using the TimeDistributed layer; cf. https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ for a description
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0, :, 0]:
    print('%.1f' % value)
source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
I am trying to build an LSTM autoencoder to predict time-series data. Since I am new to Python, I have mistakes in the decoding part. I tried to build it up like here and Keras, but I could not understand the difference between the given examples at all. The code that I have right now looks like this:
Question 1: How do I choose the batch_size and input_dimension when each sample has 2000 values?
Question 2: How do I get this LSTM autoencoder working (the model and the prediction)? This is just the model, but how do I predict? For example, so that it predicts from, let's say, sample 10 onwards until the end of the data?
My data has 1500 samples in total, I would go with 10 time steps (or more if that is better), and each sample has 2000 values. If you need more information, I will include it later as well.
trainX = np.reshape(data, (1500, 10, 2000))
from keras.layers import *
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector
# parameters
timesteps = 10
input_dim = 2000
units = 100        # unit number chosen randomly
batch_size = 2000
epochs = 20
# model
inpE = Input((timesteps, input_dim))
outE = LSTM(units=units, return_sequences=False)(inpE)
encoder = Model(inpE, outE)
inpD = RepeatVector(timesteps)(outE)
outD = LSTM(input_dim, return_sequences=True)(inpD)
decoder = Model(inpD, outD)
autoencoder = Model(inpE, outD)
autoencoder.compile(loss='mean_squared_error',
                    optimizer='rmsprop',
                    metrics=['accuracy'])
autoencoder.fit(trainX, trainX,
                batch_size=batch_size,
                epochs=epochs)
encoderPredictions = encoder.predict(trainX)
The LSTM model that I use is this one:
def get_model(n_dimensions):
    inputs = Input(shape=(timesteps, input_dim))
    encoded = LSTM(n_dimensions, return_sequences=False, name="encoder")(inputs)
    decoded = RepeatVector(timesteps)(encoded)
    decoded = LSTM(input_dim, return_sequences=True, name='decoder')(decoded)
    autoencoder = Model(inputs, decoded)
    encoder = Model(inputs, encoded)
    return autoencoder, encoder
autoencoder, encoder = get_model(n_dimensions)
autoencoder.compile(optimizer='rmsprop', loss='mse',
metrics=['acc', 'cosine_proximity'])
history = autoencoder.fit(x, x, batch_size=100, epochs=100)
encoded = encoder.predict(x)
It works with the data that I have; x is of size (3000, 180, 40), that is 3000 samples, timesteps=180 and input_dim=40.
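Regarding your Question 2 (prediction): once the pair is trained, the autoencoder itself reconstructs full sequences, while the encoder gives the compressed representation. A rough sketch, reusing the variable names from the code above and assuming x has shape (n_samples, timesteps, input_dim):
import numpy as np
reconstructed = autoencoder.predict(x)                              # same shape as x
per_sample_error = np.mean((x - reconstructed) ** 2, axis=(1, 2))   # one MSE value per sample
encoded = encoder.predict(x)                                        # shape (n_samples, n_dimensions)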
I am trying to get into machine learning with Keras.
I am not a Mathematician and I have only a basic understanding of how neural net-works (haha get it?), so go easy on me.
This is my current code:
from keras.utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# split into input (X) and output (Y) variables
X = []
Y = []
count = 0
while count < 10000:
    count += 1
    X += [count / 10000]
    numpy.random.seed(count)
    # Y += [numpy.random.randint(1, 101) / 100]
    Y += [(count + 1) / 100]
print(str(X) + ' ' + str(Y))
# create model
model = Sequential()
model.add(Dense(50, input_dim=1, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(50, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
# Compile model
opt = optimizers.SGD(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=100)
# evaluate the model
scores = model.evaluate(X, Y)
predictions = model.predict(X)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
print (str(predictions))
##plot_model(model, to_file='C:/Users/Markus/Desktop/model.png')
The accuracy stays zero and the predictions are an array of 1's. What am I doing wrong?
From what I can see, you are trying to solve a regression problem (floating-point function output) rather than a classification problem (one-hot vector style output / putting inputs into categories).
Your sigmoid final layer will only give an output between 0 and 1, which clearly limits your NN's ability to predict the desired range of Y values, which go up much higher. Your NN is trying to get as close as it can, but you are limiting it! Sigmoids in the output layer are good for single-class yes/no output, but not for regression.
So, you want your last layer to have a linear activation, where the inputs are just summed. Something like this instead of the sigmoid:
model.add(Dense(1, kernel_initializer='lecun_normal', activation='linear'))
Then it will likely work, at least if the learning rate is low enough.
Google "keras regression" for useful links.
It looks like you are attempting to do binary classification with a binary_crossentropy loss function. However, the class labels Y are floats; the labels should be 0 or 1. So the biggest problem lies in the input data you are feeding the model for training.
You can try some data that makes more sense, for example two classes where data are sampled from two different normal distributions, and the labels are either 0 or 1 for each observation:
import numpy as np
X = np.concatenate([np.random.randn(10000)/2, np.random.randn(10000)/2 + 1])
Y = np.concatenate([np.zeros(10000), np.ones(10000)])
The model should be able to go somewhere with this type of data.
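Assuming you keep the model architecture from the question (sigmoid output, binary_crossentropy loss), a quick sanity check on this synthetic data might look like the following sketch; the reshape to a column vector matches the Dense(..., input_dim=1) input layer:
X = X.reshape(-1, 1)                        # (20000, 1): one feature per sample
model.fit(X, Y, epochs=10, batch_size=100)  # accuracy should climb well above 0.5
scores = model.evaluate(X, Y)
print(model.metrics_names[1], scores[1])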