I have a 5000 by 9 2d numpy array of features trainX which are the features of a time sequence. I also have a 1d numpy array of floating point feature labels trainY. This is exactly the format you would need for scikit-learn for example.
I would like to use these with keras+LSTM. This is my code at present:
NUM_EPOCHS = 20
model = Sequential()
model.add(LSTM(8, input_shape=(1, window_size)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=NUM_EPOCHS, batch_size=1, verbose=2)
However this doesn't work as keras needs trainX in a different format it seems. I have read the manual but I can't understand what this is exactly.
How can I convert my data into a format that keras will accept?
The format is (samples, timeSteps, features)
How many sequences do you have? It sounds like one sequence of 5000 steps, is that right?
Then the format is (1,5000,9).
The labels should also be (1,5000,1), if you have one label per time step. (Then use return_sequences=True). Otherwise labels are (1,1).
Optionally, you may want to split your single sequence in many segments, in a classical sliding window case, for instance, where you'd have many samples with less time steps, such as (4998,3,1), supposing you want a 3-step window. Then the labels should follow: (4998,1).
Related
I'm currently working on a simple neural network using Keras, and I'm running into a problem with my labels. The network is making a binary choice, and as such, my labels are all 1s and 0s. My data is composed of a 3d NumPy array, basically pixel data from a bunch of images. Its shape is (560, 560, 32086). However since the first two dimensions are only pixels, I shouldn't assign a label to each one so I tried to make the label array with the shape (1, 1, 32086) so that each image only has 1 label. However when I try to compile this with the following code:
model = Sequential(
[
Rescaling(1.0 / 255),
Conv1D(32, 3, input_shape=datax.shape, activation="relu"),
Dense(750, activation='relu'),
Dense(2, activation='sigmoid')
]
)
model.compile(optimizer=SGD(learning_rate=0.1), loss="binary_crossentropy", metrics=['accuracy'])
model1 = model.fit(x=datax, y=datay, batch_size=1, epochs=15, shuffle=True, verbose=2)
I get this error "ValueError: Data cardinality is ambiguous:
x sizes: 560
y sizes: 1
Make sure all arrays contain the same number of samples." Which I assume means that the labels have to be the same size as the input data, but that doesn't make sense for each pixel to have an individual label.
The data is collected through a for loop looping through files in a directory and reading their pixel data. I then add this to the NumPy array and add their corresponding label to a label array. Any help in this problem would be greatly appreciated.
The ValueError occurs because the first dimension is supposed to be the number of samples and needs to be the same for x and y. In your example that is not the case. You would need datax to to have shape (32086, 560, 560) and datay should be (32086,).
Have a look at this example and note how the 60000 training images have shape (60000, 28, 28).
Also I suspect a couple more mistakes have sneaked into your code:
Are you sure you want a Conv1D layer and not Conv2D? Maybe this example would be informative.
Since you are using binary crossentropy loss your last layer should only have one output instead of two.
I have got a dataset which consists of 100,000 rows and 12 columns, where each column stands of a certain input to train a sequential GRU model to predict only 1 output. The following is the code for the model:
model = Sequential()
model.add(GRU(units=70, return_sequences=True, input_shape=(1,12),activity_regularizer=regularizers.l2(0.0001)))
model.add(GRU(units=50, return_sequences=True,dropout=0.1))
model.add(GRU(units=30, dropout=0.1))
model.add(Dense(units=5))
model.add(Dense(units=3))
model.add(Dense(units=1, activation='relu'))
model.compile(loss=['mae'], optimizer=Adam(lr=0.0001),metrics=['mse'])
model.summary()
history=model.fit(X_train, y_train, batch_size=1000,epochs=30,validation_split=0.1, verbose=1)
However, before that I had to transform the training dataset from 2D to 3D using x_train=x.reshape(-1,1,12) and the output from 1D to 2D using y_train=y.reshape(-1,1). That is the part I really don't understand, why not just keep them as they are?
You had to describe your data in order to be decisive.
But since each layers output is the input of the next layer, their shape must be equal. In the incomplete example you gave your labels need to be a single value for each sample and I think that's why reshape was used.
I have a medical longitudinal data on which I am doing a research.
To start with I am working with 4000 rows of sample with 3 time-steps(3 columns) of a bone size corresponding to size of bone measured in 3 different months.
I am done with the basic model. Now I want to be sure if my understanding of the model is correct.
model = Sequential()
model.add(layers.SimpleRNN(units=10, input_shape=(3,1),use_bias=True,bias_initializer='zeros',activation="relu",kernel_initializer="random_uniform"))
model.add(layers.Dense(1, activation="sigmoid"))
model.compile(loss='binary_crossentropy', optimizer='sgd')
model.summary()
model.fit(trainX,train_op, epochs=100, batch_size=50, verbose=2)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
Following are my few doubts around this model :
Here return_sequences is False , then shouldn't I get only the last output from RNN layer. Why the output is of shape(None, 10) from RNN layer? I assumed it should be (sample size,1) .
Also my below mentioned logic is flawed but I need to resolve it which is :
Units corresponds to output units. Initially my guess was that since there are 3 time-steps there has to be 3 output units but I was surprised that even if give units= 128 or 10,1 the model worked. How and why it is happening ? This question along with the above one confuses me more.
input_shape corresponds to -[sample size, number of time steps, features]. Here, I am measuring 1 bone size over 3 time periods. Is my understanding correct when I say the input shape is (sample size, 3,1) ? Moreover, I have confusion regarding how numpy represents 3d array. It seems, to get required dimension I need to input as - #features, observations/sample_size, timesteps . Do I have to reshape my inputs according to how numpy represents 3d or should i let it be. ?
Moreover, how can I build a model if i have different set of features measured over different time frame or have various time steps ? How can i incorporate with the above model.
Yes, you get the last output, which is a 10-dimensional vector, not a one-dimensional vector, so getting shape (samples, 10) is correct.
Number of units has nothing to do with timesteps, the number of timesteps is how many times the neurons are applied recurrently, so its orthogonal to the number of features or units.
Yes, shape of your inputs should be (samples, 3, 1) and the input_shape should be (3, 1), all of this is correct in your code. I am not sure what you are talking about on "how numpy represents 3d array", the shape is clear, numpy does not do any modifications to input shapes.
Sources
There are several sources out there explaining stateful / stateless LSTMs and the role of batch_size which I've read already. I'll refer to them later in my post:
[1] https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
[2] https://machinelearningmastery.com/stateful-stateless-lstm-time-series-forecasting-python/
[3] http://philipperemy.github.io/keras-stateful-lstm/
[4] https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
And also other SO threads like Understanding Keras LSTMs and Keras - stateful vs stateless LSTMs which didn't fully explain what I'm looking for however.
My Problem
I am still not sure what is the correct approach for my task regarding statefulness and determining batch_size.
I have about 1000 independent time series (samples) that have a length of about 600 days (timesteps) each (actually variable length, but I thought about trimming the data to a constant timeframe) with 8 features (or input_dim) for each timestep (some of the features are identical to every sample, some individual per sample).
Input shape = (1000, 600, 8)
One of the features is the one I want to predict, while the others are (supposed to be) supportive for the prediction of this one “master feature”. I will do that for each of the 1000 time series. What would be the best strategy to model this problem?
Output shape = (1000, 600, 1)
What is a Batch?
From [4]:
Keras uses fast symbolic mathematical libraries as a backend, such as TensorFlow and Theano.
A downside of using these libraries is that the shape and size of your data must be defined once up front and held constant regardless of whether you are training your network or making predictions.
[…]
This does become a problem when you wish to make fewer predictions than the batch size. For example, you may get the best results with a large batch size, but are required to make predictions for one observation at a time on something like a time series or sequence problem.
This sounds to me like a “batch” would be splitting the data along the timesteps-dimension.
However, [3] states that:
Said differently, whenever you train or test your LSTM, you first have to build your input matrix X of shape nb_samples, timesteps, input_dim where your batch size divides nb_samples. For instance, if nb_samples=1024 and batch_size=64, it means that your model will receive blocks of 64 samples, compute each output (whatever the number of timesteps is for every sample), average the gradients and propagate it to update the parameters vector.
When looking deeper into the examples of [1] and [4], Jason is always splitting his time series to several samples that only contain 1 timestep (the predecessor that in his example fully determines the next element in the sequence). So I think the batches are really split along the samples-axis. (However his approach of time series splitting doesn’t make sense to me for a long-term dependency problem.)
Conclusion
So let’s say I pick batch_size=10, that means during one epoch the weights are updated 1000 / 10 = 100 times with 10 randomly picked, complete time series containing 600 x 8 values, and when I later want to make predictions with the model, I’ll always have to feed it batches of 10 complete time series (or use solution 3 from [4], copying the weights to a new model with different batch_size).
Principles of batch_size understood – however still not knowing what would be a good value for batch_size. and how to determine it
Statefulness
The KERAS documentation tells us
You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch.
If I’m splitting my time series into several samples (like in the examples of [1] and [4]) so that the dependencies I’d like to model span across several batches, or the batch-spanning samples are otherwise correlated with each other, I may need a stateful net, otherwise not. Is that a correct and complete conclusion?
So for my problem I suppose I won’t need a stateful net. I’d build my training data as a 3D array of the shape (samples, timesteps, features) and then call model.fit with a batch_size yet to determine. Sample code could look like:
model = Sequential()
model.add(LSTM(32, input_shape=(600, 8))) # (timesteps, features)
model.add(LSTM(32))
model.add(LSTM(32))
model.add(LSTM(32))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
Let me explain it via an example:
So let's say you have the following series: 1,2,3,4,5,6,...,100. You have to decide how many timesteps your lstm will learn, and reshape your data as so. Like below:
if you decide time_steps = 5, you have to reshape your time series as a matrix of samples in this way:
1,2,3,4,5 -> sample1
2,3,4,5,6 -> sample2
3,4,5,6,7 -> sample3
etc...
By doing so, you will end with a matrix of shape (96 samples x 5 timesteps)
This matrix should be reshape as (96 x 5 x 1) indicating Keras that you have just 1 time series. If you have more time series in parallel (as in your case), you do the same operation on each time series, so you will end with n matrices (one for each time series) each of shape (96 sample x 5 timesteps).
For the sake of argument, let's say you 3 time series. You should concat all of three matrices into one single tensor of shape (96 samples x 5 timeSteps x 3 timeSeries). The first layer of your lstm for this example would be:
model = Sequential()
model.add(LSTM(32, input_shape=(5, 3)))
The 32 as first parameter is totally up to you. It means that at each point in time, your 3 time series will become 32 different variables as output space. It is easier to think each time step as a fully conected layer with 3 inputs and 32 outputs but with a different computation than FC layers.
If you are about stacking multiple lstm layers, use return_sequences=True parameter, so the layer will output the whole predicted sequence rather than just the last value.
your target shoud be the next value in the series you want to predict.
Putting all together, let say you have the following time series:
Time series 1 (master): 1,2,3,4,5,6,..., 100
Time series 2 (support): 2,4,6,8,10,12,..., 200
Time series 3 (support): 3,6,9,12,15,18,..., 300
Create the input and target tensor
x -> y
1,2,3,4,5 -> 6
2,3,4,5,6 -> 7
3,4,5,6,7 -> 8
reformat the rest of time series, but forget about the target since you don't want to predict those series
Create your model
model = Sequential()
model.add(LSTM(32, input_shape=(5, 3), return_sequences=True)) # Input is shape (5 timesteps x 3 timeseries), output is shape (5 timesteps x 32 variables) because return_sequences = True
model.add(LSTM(8)) # output is shape (1 timesteps x 8 variables) because return_sequences = False
model.add(Dense(1, activation='linear')) # output is (1 timestep x 1 output unit on dense layer). It is compare to target variable.
Compile it and train. A good batch size is 32. Batch size is the size your sample matrices are splited for faster computation. Just don't use statefull
A colleague of mine pointed out the very cool option to use sample_weight instead of a masking layer when you need to mask input to a non-RNN in Keras.
In my case, I have 62 columns in the input, with the 63rd being the response. Over 97% of the nonzero entries in the 62 columns are contained in the first 30 columns. I'm trying to just get this working, so I'd like to weight the last 32 columns to be 0 in training, essentially creating a 'poor-man's mask'.
This is an 8-class classification task, using an MLP. The response variable has been transformed using the to_categorical() function in Keras.
Here's the implementation:
model = Sequential()
model.add(Dense(100, input_dim=X.shape[1], init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='sigmoid'))
hist = model.fit(X, y,
validation_data=(X_test, ytest),
nb_epoch=epochs_,
batch_size=batch_size_,
callbacks=callbacks_list,
sample_weight = np.array([X.shape[1]-32, 30]))
I'm getting this error:
in standardize_weights
assert y.shape[:sample_weight.ndim] == sample_weight.shape
How can I fix my sample_weight to 'mask' the first 32 columns of the input?
Sample weight isn't working like that:
sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile(). source
In other words, this setting puts different weights on the samples of the training data, not on the features of each sample. This is used only at training step.
I think you should use masking if you don't want the layer to use those features. Or just remove them from your dataset? Or, if it's not too complicated, let the network learn by itself which the useful features are.
Does this help?