Recurrent neural network using different time steps with keras - python

I built an RNN with Keras, but when I change the number of time steps for one sample I get an error that I can't resolve.
Here is my example with dummy data:
from numpy import array
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
from keras import optimizers
X=array(
[
[#first sample
[0,2],[1,2],[2,2] # three time steps and 2 features
]
,
[# sample 2
[0,2],[1,2],[2,2] # three time steps and 2 features
]
,
[# sample 3
[7,2], [9,2], [4,2] # three time steps and 2 features
]
,
[# sample 4
[2,2], [5,2], [4,2],[7,9] # four steps and 2 features
]
]
)
Y=np.array([1,2,3,4])
model = Sequential()
model.add(LSTM(8, input_shape=(None, 2),return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(32,return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(128,return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(58, activation='softmax'))
optimize=optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
model.compile(optimize,loss='sparse_categorical_crossentropy',metrics=['accuracy'])
model.summary()
model.fit(X,Y,batch_size=1,epochs=50,shuffle=True,verbose=2)
As you can see from the code, I have 4 sequences with 2 features per time step.
The last sequence has 4 time steps instead of 3, and that is the problem: if I change it to 3 time steps, the code works correctly.
I want it to work with different numbers of time steps. How can I achieve that without padding or masking?
I have read several topics describing different solutions, but I can't get any of them to work on the example above.
When I run the code above I get this error:
ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (4, 1)

Your X is not a valid array. A NumPy array must be rectangular, not jagged, and Keras only accepts rectangular NumPy arrays as input. You have two choices:
1. Feed the samples into your model one at a time, i.e. use a batch size of 1 with train_on_batch or fit_generator rather than plain fit. Note that this removes all vectorization-related speed optimizations and will slow training to a crawl if you have a lot of data.
2. Pad your training set so that all samples have the same time dimension. Zero-padding shouldn't really affect the performance of your model. This is the recommended method.
See this thread for more details.
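Both options can be sketched with NumPy alone. This is a minimal illustration, not part of the original answer: the helper names pad_sequences_post and bucket_by_length are hypothetical, and the data simply mirrors the dummy samples from the question.

```python
import numpy as np
from collections import defaultdict

# Jagged toy data mirroring the question: each sample is (timesteps, 2 features).
samples = [
    [[0, 2], [1, 2], [2, 2]],
    [[0, 2], [1, 2], [2, 2]],
    [[7, 2], [9, 2], [4, 2]],
    [[2, 2], [5, 2], [4, 2], [7, 9]],
]
labels = [1, 2, 3, 4]


def pad_sequences_post(seqs, n_features, value=0.0):
    """Zero-pad every sequence at the end up to the longest length (hypothetical helper)."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len, n_features), value, dtype="float32")
    for i, s in enumerate(seqs):
        out[i, :len(s), :] = s
    return out


def bucket_by_length(seqs, ys):
    """Group samples by time-step count so that each group is a rectangular array."""
    groups = defaultdict(lambda: ([], []))
    for s, y in zip(seqs, ys):
        groups[len(s)][0].append(s)
        groups[len(s)][1].append(y)
    return {n: (np.array(xs, dtype="float32"), np.array(yy))
            for n, (xs, yy) in groups.items()}


X_padded = pad_sequences_post(samples, n_features=2)   # option 2: one rectangular array
buckets = bucket_by_length(samples, labels)            # option 1: one batch per length
print(X_padded.shape)
print({n: x.shape for n, (x, _) in buckets.items()})
```

With input_shape=(None, 2) as in the question, each bucket could then be passed to model.train_on_batch(x, y) on its own, while X_padded can go straight to model.fit.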

Related

Why do I get "logits and labels must have the same shape" error?

I have the following piece of code, which is a simplification of my actual code:
#!/usr/bin/env python3
import tensorflow as tf
import re
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer #text to vector.
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from keras.models import Sequential
DATASIZE = 1000
model = Sequential()
model.add(Embedding(1000, 120, input_length = DATASIZE))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(176, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2,activation='sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics = ['accuracy'])
print(model.summary())
training = [[0 for x in range(10)] for x in range (DATASIZE)] #random value
label = [[1 for x in range(10)] for x in range (DATASIZE)] #random value
model.fit(training, label, epochs = 5, batch_size=32, verbose = 'auto')
What I need:
My end goal is to check whether a given input vector (a numerical representation of the data) is positive or negative, so this is a binary classification problem. Here all 1000 vectors are 10 digits long, but they could be much longer or much shorter; the length may vary throughout the dataset.
When running this I get the following error:
ValueError: logits and labels must have the same shape ((None, 2) vs (None, 10))
How do I have to structure my vectors in order to not get this error and actually correctly fit the model?
EDIT:
I could change the model.add(Dense(2,activation='sigmoid')) call to model.add(Dense(10,activation='sigmoid')), but I don't think this makes much sense, since the first parameter of Dense is the number of possible outputs. In my case there are only 2 possibilities: positive or negative. Arbitrarily changing this to 10 makes the program run, but it doesn't make sense to me, and I am not even sure it is using my 1000 training vectors.
Your last Dense layer has 2 neurons, while each label you pass to fit has 10 values. The output of the last layer must have the same shape as the labels, so with the data as it stands, replacing 2 with 10 in your last Dense layer makes the shapes agree.
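If each input vector is really meant to carry a single positive/negative label, another way to make the shapes agree is to change the labels rather than the layer. The sketch below is only an illustration under that assumption (it is not what the answer above proposes), and output_matches_labels is a made-up helper that mimics the shape check behind the error message.

```python
import numpy as np

DATASIZE = 1000
SEQ_LEN = 10

# Dummy inputs shaped like the question's data: 1000 vectors of 10 token ids.
training = np.zeros((DATASIZE, SEQ_LEN), dtype="int32")

# Assumption: one positive/negative label per input vector, not one per digit.
label = np.ones((DATASIZE, 1), dtype="float32")


def output_matches_labels(output_units, labels):
    """True when an output of shape (None, output_units) lines up with the labels."""
    return labels.ndim == 2 and labels.shape[1] == output_units


print(output_matches_labels(2, np.ones((DATASIZE, 10))))  # the question's setup
print(output_matches_labels(1, label))                     # one output unit, one label each
```

Under this assumption the last layer would be Dense(1, activation='sigmoid') with binary_crossentropy, and the input arrays would be passed to fit as NumPy arrays rather than nested lists.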

Predicting Fibonacci Using LSTM RNN

New to neural nets so please correct my syntax.
I'm trying to create a LSTM RNN that will predict the Fibonacci sequence. When I ran the code below, the loss remains incredibly high (around 35339663592701874176).
Why does the shape of the input have to be (batch_size, timesteps, input_dim)? In my example I have 100 data entries so that'd be my batch_size, and the Fibonacci sequence takes in 2 inputs so that'd be input_dim but what would timesteps be in this case? 1?
Shouldn't the units of the LSTM be 1? If I'm understanding correctly, the "units" are just the number of hidden state nodes in the LSTM. So in theory, each of the 2 inputs would have a "1" coefficient weight towards that hidden state after training.
Would an RNN be a suitable model for this problem? When I've looked online, most people like to use the Fibonacci sequence as an example to explain how RNN's work.
Thanks for the help!
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Create Training Data
xs = [[[1, 1]]]
ys = []
i = 0
while i < 100:
    ys.append([xs[i][0][0]+xs[i][0][1]])
    xs.append([[xs[i][0][1], ys[len(ys)-1][0]]])
    i = i + 1
del xs[len(xs)-1]
xs = np.array(xs, dtype=float)
ys = np.array(ys, dtype=float)
# Create Model
model = keras.Sequential()
model.add(layers.LSTM(1, input_shape=(1, 2)))
model.add(layers.Dense(1))
model.compile(optimizer="adam", loss="mean_absolute_error", metrics=[ 'accuracy' ])
# Train
model.fit(xs, ys, epochs=100000)
You can't feed a NN data where some of the values are 10^21 times as large as others and expect it to work; it just doesn't happen.
You're not doing anything here that actually calls for LSTM (or any RNN), you're not actually using the time dimension, and you're basically just trying to learn addition. Maybe you meant to do something different (like input digits as a sequence, or have the output run for multiple timesteps and give you several values of the sequence), but that's not what you're doing, and it's unclear what you want.
The number of units is your memory/processing capacity. Each unit of an RNN is able to receive values from all of the units in the previous timestep. One unit alone can't do anything interesting, especially with no layer before it to preprocess the data.
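The scale problem can be seen directly from the data. The following is a minimal sketch that rebuilds the same 100 training pairs as the question with NumPy; the log transform at the end is only one possible way to compress the range (an assumption for illustration, not something the answer prescribes), and the names fib, xs_log and ys_log are illustrative.

```python
import numpy as np

# Rebuild the question's training pairs: inputs (F_n, F_{n+1}), target F_{n+2}.
fib = [1.0, 1.0]
while len(fib) < 102:
    fib.append(fib[-1] + fib[-2])
fib = np.array(fib)

xs = np.stack([fib[:-2], fib[1:-1]], axis=1)   # shape (100, 2)
ys = fib[2:]                                   # shape (100,)

# Ratio between the largest and the smallest target: many orders of magnitude.
print(ys.max() / ys.min())

# One possible remedy (an assumption): work in log space, where the range is small.
xs_log, ys_log = np.log(xs), np.log(ys)
print(ys_log.min(), ys_log.max())
```

In log space the targets grow roughly linearly with the index, which is a far friendlier range for gradient-based training than the raw values.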

Why is this ML model giving me zero accuracy?

I am trying to train a network on the Swiss Roll dataset with three features X = [x1, x2, x3] for the classification task. There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
A row in the X matrix looks like this:
-5.2146470e+00 7.0879738e+00 6.7292474e+00
The shape of X is (100, 3), and the shape of y is (100,).
I want to use Radial Basis Functions to train this model. I have used the custom RBFLayer from this StackOverflow answer (also see this explanation) to build the RBFLayer. I want to use a couple of Keras Dense layers to build the network for classification.
What I have tried so far
I have used a Dense layer for the first layer, followed by the custom RBFLayer, and two other Dense layers. Here's the code:
model = Sequential()
model.add((Dense(100, input_dim=3)))
# number of units = 10, gamma = 0.05
model.add(RBFLayer(10,0.05))
model.add(Dense(15, activation='relu'))
model.add(Dense(1, activation='softmax'))
This model gives me zero accuracy. I think there is something wrong with the model architecture, but I can't figure out what is the issue.
Also, I thought the number of units in the last Dense layer should match the number of classes, which is 4 in this case. But when I set the number of units to 4 in the last layer, I get the following error:
ValueError: Shapes (None, 1) and (None, 4) are incompatible
Can you help me with this model architecture?
I faced the same issue while practicing multi-class classification, where I had 7 features and the model classified into 7 classes. Encoding the labels fixed the issue.
First, import the LabelEncoder class from sklearn and to_categorical from tensorflow:
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
Then create a LabelEncoder object and transform your labels before fitting the model:
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)
y = to_categorical(y)
Note that you have to use np.argmax to get the actual predicted class. In my case, the predictions are stored in a variable called res:
res = np.argmax(res, axis=-1)
After this line, res holds the predicted class index for each sample; encoder.inverse_transform(res) maps the indices back to the original labels. Hope this solves your problem.
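What the encoding and decoding steps do can be reproduced with NumPy alone. This is a rough sketch under assumptions: the labels and the probs array are made-up toy values, and np.unique plus np.eye stand in for LabelEncoder and to_categorical.

```python
import numpy as np

# Toy labels using the question's classes 1..4 (made-up values).
y = np.array([1, 2, 3, 4, 2, 1])

# Map labels to indices 0..3, similar to what LabelEncoder does.
classes, y_idx = np.unique(y, return_inverse=True)

# One-hot encode the indices, similar to what to_categorical does.
y_onehot = np.eye(len(classes), dtype="float32")[y_idx]
print(y_onehot.shape)

# Pretend softmax outputs for two samples (made-up numbers).
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.2, 0.6, 0.1]])

# argmax along the class axis gives one index per sample ...
pred_idx = np.argmax(probs, axis=1)
# ... which maps back to the original label values.
pred_labels = classes[pred_idx]
print(pred_labels)
```

With one-hot labels of shape (n_samples, 4), the last layer would have 4 units, which is the shape the error message in the question asks for.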
There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
The simplest way to check that inputs and outputs match is to print the shapes of the input and output for a single batch and compare them with what the model expects.
The RBF layer should not be the problem, because the output is taken from the last Dense layer rather than from the RBF layer.
For a classification problem, the last layer must have as many nodes as there are classes; for regression, the last layer usually has a single node.
Pseudocode (X and y are the input and label arrays; check the Keras docs for the exact syntax):
print(X.shape)
# compare it with
print(model.input_shape)
# then, for the output
print(y.shape)
# compare it with
print(model.predict(X).shape)

How do I shape multivariate data for input to LSTM

What I am trying to achieve.
I am trying to predict the opening price of Natural Gas ("NG Open") from multiple input parameters, as in the table below. I have followed some tutorials, but they don't explain the reasoning behind a particular format. The code works after a lot of trial and error, but I need to understand how the data is reshaped.
Dataset - Only few lines.
Contract    NGLast  NGOpen  NGHigh  NGLow  NGVolumes  COOpen  COHigh  COLow
2018-12-01  4.487   4.50    4.60    4.03   100,000    56.00   58.00   50.00
2019-01-01  4.450   4.52    4.61    4.11   93000      51.00   53.00   45.00
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Dense
from keras.models import Sequential
from keras.layers import LSTM
import datetime
from keras import metrics
from sklearn.preprocessing import MinMaxScaler
data = pd.read_excel(r"C:\Futures\Futures.xls")  # raw string so backslashes are not treated as escapes
data['Contract'] = pd.to_datetime(data['Contract'],unit='s').dt.date
data['NG Last'] = data['NG Last'].str.rstrip('s')
data['CO Last'] = data['CO Last'].str.rstrip('s')
COHigh = np.array([data.iloc[:,8]])
COLow = np.array([data.iloc[:,9]])
NGLast = np.array([data.iloc[:,1]])
NGOpen = np.array([data.iloc[:,2]])
NGHigh = np.array([data.iloc[:,3]])
X = np.concatenate([COHigh,COLow, NGLast,NGOpen], axis =0)
X = np.transpose(X)
Y = NGHigh
Y = np.transpose(Y)
scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)
scaler.fit(Y)
Y = scaler.transform(Y)
**X = np.reshape(X,(X.shape[0],1,X.shape[1]))**
print(X.shape)
model = Sequential()
**model.add(LSTM(100,activation='tanh',input_shape=(1,4),** recurrent_activation='hard_sigmoid'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics = [metrics.mae])
model.fit(X,Y,epochs = 10,batch_size=1,verbose=2)
Predict = model.predict(X,verbose=1)
Question
What is the reasoning behind the code marked with asterisks above?
1> I have four columns as input so shouldn't it be X = np.reshape(X,(X.shape[0],1,X.shape[1], X.Shape[2],X.shape[3]))? and so on for all the columns considered as inputs?
2> I need explanation of parameters in this line below. model.add(LSTM(100,activation='tanh',input_shape=(1,4), recurrent_activation='hard_sigmoid'))
Your data is currently an array of shape (x,4) where x is the number of rows. So in the toy data you provide, X.shape should return (2,4). If you look at the LSTM line later on, you'll note that it's looking for tensor of shape (1,4) -- that's the input_shape parameter. The np.reshape line is what gets you there. It's converting your 2-d array into a 3-d one. Again, with your example, X.shape will return (2,1,4) after you reshape it. Basically, you now have a list of length 2 of (1,4) arrays, which matches what the LSTM layer wants.
I'd recommend taking a look at the Keras documentation on the LSTM, but here's basically what's happening. 100 is the number of units (or neurons) that this layer will have. Input shape is noted above. The activation is the function that will be used to calculate the output for each neuron. The tanh function is pretty standard in this basic setup, but others are possible. Take a look at the list of activations that Keras provides out of the box.
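The reshape described above can be checked on a tiny array. A minimal sketch, assuming two toy rows with the four input columns; the windowed variant at the end is an assumption added only to show what a timesteps axis larger than 1 would look like.

```python
import numpy as np

# Two toy rows with four input columns (COHigh, COLow, NGLast, NGOpen).
X = np.array([[58.0, 50.0, 4.487, 4.50],
              [53.0, 45.0, 4.450, 4.52]])

# (samples, features) -> (samples, timesteps=1, features)
X_3d = np.reshape(X, (X.shape[0], 1, X.shape[1]))
print(X.shape, X_3d.shape)

# Assumption for illustration: each sample is a window of 2 consecutive rows.
window = 2
X_windowed = np.stack([X[i:i + window] for i in range(len(X) - window + 1)])
print(X_windowed.shape)
```

The four columns stay together on the last axis as features; they do not each get an axis of their own, which is why the reshape needs only three dimensions.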

How to handle variable sized input in CNN with Keras?

I am trying to perform the usual classification on the MNIST database but with randomly cropped digits.
Images are cropped as follows: the first and/or last row and/or column is randomly removed.
I would like to use a convolutional neural network with Keras (TensorFlow backend) to perform convolution and then the usual classification.
The inputs are of variable size and I can't manage to get it to work.
Here is how I cropped the digits:
import numpy as np
from keras.utils import to_categorical
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.images
X = np.expand_dims(X, axis=3)
X_crop = list()
for index in range(len(X)):
    X_crop.append(X[index, np.random.randint(0,2):np.random.randint(7,9), np.random.randint(0,2):np.random.randint(7,9), :])
X_crop = np.array(X_crop)
y = to_categorical(digits.target)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_crop, y, train_size=0.8, test_size=0.2)
And here is the architecture of the model I want to use
from keras.layers import Dense, Dropout
from keras.layers.convolutional import Conv2D
from keras.models import Sequential
model = Sequential()
model.add(Conv2D(filters=10,
kernel_size=(3,3),
input_shape=(None, None, 1),
data_format='channels_last'))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_test, y_test))
Does someone have an idea on how to handle variable sized input in my neural network?
And how to perform classification?
TL/DR - go to point 4
So - before we get to the point - let's fix some problems with your network:
1. Your network will not work because of the activation: with categorical_crossentropy you need a softmax activation on the output layer:
model.add(Dense(10, activation='softmax'))
2. Vectorize spatial tensors: as Daniel mentioned, at some stage you need to switch from spatial tensors (images) to vectors. Currently, applying Dense to the output of a Conv2D is equivalent to a (1, 1) convolution. So the output of your network is spatial, not vectorized, which causes a dimensionality mismatch (you can check that by running your network or looking at model.summary()). To change that, use either GlobalMaxPooling2D or GlobalAveragePooling2D, e.g.:
model.add(Conv2D(filters=10,
                 kernel_size=(3, 3),
                 input_shape=(None, None, 1),
                 padding="same",
                 data_format='channels_last'))
model.add(GlobalMaxPooling2D())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))
3. Concatenated NumPy arrays need to have the same shape: if you check the shape of X_crop you'll see that it is not a spatial matrix, because you concatenated matrices with different shapes. Sadly, there is no way around this, as a numpy.array needs to have a fixed shape.
4. How to make your network train on examples of different shapes: the most important thing here is to understand two things. First, within a single batch every image must have the same size. Second, calling fit multiple times is a bad idea, as you reset inner model states. So here is what needs to be done:
a. Write a function which crops a single batch - e.g. a get_cropped_batches_generator which given a matrix cuts a batch out of it and crops it randomly.
b. Use train_on_batch method. Here is an example code:
import numpy as np
from six import next

# get_cropped_batches_generator, nb_of_epochs and nb_of_batches must be defined by you
batches_generator = get_cropped_batches_generator(X, batch_size=16)
losses = list()
for epoch_nb in range(nb_of_epochs):
    epoch_losses = list()
    for batch_nb in range(nb_of_batches):
        # cropped_x has a different shape for different batches (in general)
        cropped_x, cropped_y = next(batches_generator)
        current_loss = model.train_on_batch(cropped_x, cropped_y)
        # if the model was compiled with metrics, a list comes back; element 0 is the loss
        epoch_losses.append(np.ravel(current_loss)[0])
    losses.append(np.mean(epoch_losses))
final_loss = np.mean(losses)
A few comments on the code above. First, train_on_batch doesn't use the nice Keras progress bar. It returns the loss for a single batch, which is why I added logic to average the losses; you could also use Progbar for that. Second, you need to implement get_cropped_batches_generator yourself; I haven't written that code, to keep my answer a bit clearer. You could ask another question on how to implement it. Last thing: I use six to keep compatibility between Python 2 and Python 3.
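One possible shape for the generator the answer leaves unimplemented is sketched below. Everything here is an assumption: unlike the call in the answer, this version also takes the labels y, and the max_crop and seed parameters are hypothetical. It applies a single random crop to each batch, so every image within a batch keeps the same size.

```python
import numpy as np


def get_cropped_batches_generator(X, y, batch_size=16, max_crop=1, seed=0):
    """Yield (cropped_x, cropped_y) forever, with one random crop per batch.

    X is assumed to be (n_images, height, width, channels), y is (n_images, ...).
    Up to max_crop rows/columns are removed from each side of the images.
    """
    rng = np.random.default_rng(seed)
    n, height, width = X.shape[0], X.shape[1], X.shape[2]
    while True:
        # Pick the images for this batch (batch_size must not exceed n).
        idx = rng.choice(n, size=batch_size, replace=False)
        # Draw the crop once, so the whole batch shares one shape.
        top, left = rng.integers(0, max_crop + 1, size=2)
        bottom = height - rng.integers(0, max_crop + 1)
        right = width - rng.integers(0, max_crop + 1)
        yield X[idx, top:bottom, left:right, :], y[idx]


# Dummy data with the same layout as the 8x8 digits images.
X_demo = np.zeros((40, 8, 8, 1), dtype="float32")
y_demo = np.arange(40)
demo_generator = get_cropped_batches_generator(X_demo, y_demo, batch_size=16)
for _ in range(3):
    batch_x, batch_y = next(demo_generator)
    print(batch_x.shape, batch_y.shape)
```

Drawing the crop once per batch is the design choice that matters: it satisfies the requirement that all images in a batch have the same size, while different batches can still differ in shape.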
Usually, a model containing Dense layers cannot have variable-size inputs unless the outputs are also variable. But see the workaround below and also the other answer using GlobalMaxPooling2D; the workaround is equivalent to GlobalAveragePooling2D. These layers eliminate the variable size before a Dense layer by collapsing the spatial dimensions.
For an image classification case, you may want to resize the images outside the model.
When my images are in numpy format, I resize them like this:
from PIL import Image
im = Image.fromarray(imgNumpy)
im = im.resize(newSize,Image.LANCZOS) #you can use options other than LANCZOS as well
imgNumpy = np.asarray(im)
Why?
A convolutional layer has its weights as filters. There is a static filter size, and the same filter is applied to the image over and over.
But a dense layer has its weights based on the input. If there is 1 input, there is one set of weights. If there are 2 inputs, you've got twice as many weights. But weights must be trained, and changing the number of weights will definitely change the result of the model.
As @Marcin commented, what I've said is true when your input shape for Dense layers has two dimensions: (batchSize,inputFeatures).
But actually keras dense layers can accept inputs with more dimensions. These additional dimensions (which come out of the convolutional layers) can vary in size. But this would make the output of these dense layers also variable in size.
Nonetheless, at the end you will need a fixed size for classification: 10 classes and that's it. For reducing the dimensions, people often use Flatten layers, and the error will appear here.
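To make the weight-count argument concrete, here is a small NumPy-only sketch (no Keras involved; the array shapes and the helper names flatten_size and global_average_pool are made up for illustration). It compares how many features a dense layer would see after flattening with how many it would see after averaging over the spatial axes.

```python
import numpy as np

filters = 10
# Two pretend convolution outputs in channels_last layout, with different image sizes.
small = np.random.rand(2, 6, 7, filters)
large = np.random.rand(2, 8, 8, filters)


def flatten_size(x):
    """Number of features per sample after flattening everything but the batch axis."""
    return int(np.prod(x.shape[1:]))


def global_average_pool(x):
    """Average over the two spatial axes, keeping (batch, filters)."""
    return x.mean(axis=(1, 2))


# Flattening gives a different feature count per image size,
# so one fixed dense weight matrix cannot fit both.
print(flatten_size(small), flatten_size(large))

# Averaging over the spatial axes always leaves `filters` features.
print(global_average_pool(small).shape, global_average_pool(large).shape)
```

Because the pooled output always has `filters` features, a dense layer placed after it needs the same number of weights whatever the image size.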
A possible fishy workaround (not tested):
At the end of the convolutional part of the model, use a lambda layer to condense all the values in a fixed size tensor, probably taking a mean of the side dimensions and keeping the channels (channels are not variable)
Suppose the last convolutional layer is:
model.add(Conv2D(filters,kernel_size,...))
#so its output shape is (None,None,None,filters) = (batchSize,side1,side2,filters)
Let's add a lambda layer to condense the spatial dimensions and keep only the filters dimension:
import keras.backend as K
def collapseSides(x):
    axis = 1     # channels_last format (default): x is (batchSize, side1, side2, filters)
    # axis = -1  # use this instead for channels_first: x is (batchSize, filters, side1, side2)
    # averaging twice over the same axis index removes both spatial dimensions,
    # leaving a tensor of shape (batchSize, filters)
    step1 = K.mean(x, axis=axis)
    return K.mean(step1, axis=axis)
Since the amount of filters is fixed (you have kicked out the None dimensions), the dense layers should probably work:
model.add(Lambda(collapseSides,output_shape=(filters,)))
model.add(Dense.......)
.....
In order for this to possibly work, I suggest that the number of filters in the last convolutional layer be at least 10.
With this, you can make input_shape=(None,None,1)
If you're doing this, remember that you can only pass input data with a fixed size per batch. So you have to separate your entire data in smaller batches, each batch having images all of the same size. See here: Keras misinterprets training data shape
