I'm looking at an example from a book. The input has shape (samples=128, timesteps=24, features=13). Two different networks receive this same input, yet their first layers (Flatten and GRU) are given different input_shape values.
model 1:
model = Sequential()
model.add(layers.Flatten(input_shape=(24, 13)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1))
model 2:
model = Sequential()
model.add(layers.GRU(32, input_shape=(None, 13)))
model.add(layers.Dense(1))
I understand that input_shape represents the shape of a single input (not counting the batch size), so in my understanding the input_shape in both cases should be (24, 13).
Why are the input_shapes different between model 1 and model 2?
GRU is a recurrent unit (RNN), which takes a sequence of data as input. The expected input shape for GRU is (batch size, sequence length, feature size). In your case the sequence length is 24 and feature size is 13.
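As usual, you don't need to specify the batch size in the input_shape argument. Additionally, for recurrent layers like GRU or LSTM you can use None in place of the sequence length, so that the layer accepts sequences of any length; this works because the same weights are reused at every time step. This is why input_shape=(None, 13) is allowed here. Flatten followed by Dense, on the other hand, needs a fixed length: the Dense weights are sized for 24 * 13 = 312 inputs, so model 1 has to spell out (24, 13).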
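To see why the time axis can be left as None, it may help to look at a stripped-down recurrent cell. The sketch below is a plain tanh RNN in pure Python, not Keras's GRU (which adds gates), and the names simple_rnn_last_state, w_in, w_rec and bias are made up for illustration. What it tries to show is that the weights depend only on the feature size (13) and the number of units (32), so the very same weights can process 24 steps or 7.

```python
import math
import random


def simple_rnn_last_state(sequence, w_in, w_rec, bias):
    """Run a plain tanh recurrent cell over a sequence and return the final state.

    w_in has shape (features, units), w_rec has shape (units, units).
    Neither shape mentions the number of time steps.
    """
    units = len(bias)
    state = [0.0] * units
    for x_t in sequence:  # one iteration per time step, however many there are
        state = [
            math.tanh(
                sum(x_t[i] * w_in[i][j] for i in range(len(x_t)))
                + sum(state[k] * w_rec[k][j] for k in range(units))
                + bias[j]
            )
            for j in range(units)
        ]
    return state


rng = random.Random(0)
features, units = 13, 32
w_in = [[rng.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(features)]
w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(units)]
bias = [0.0] * units


def random_sequence(timesteps):
    """Build a (timesteps, features) sequence of random values."""
    return [[rng.random() for _ in range(features)] for _ in range(timesteps)]


state_24 = simple_rnn_last_state(random_sequence(24), w_in, w_rec, bias)
state_7 = simple_rnn_last_state(random_sequence(7), w_in, w_rec, bias)
print(len(state_24), len(state_7))  # 32 32

# Flatten + Dense, by contrast, ties the weight matrix to the sequence length.
flatten_dense_kernel_rows = 24 * 13
print(flatten_dense_kernel_rows)  # 312
```

The recurrent weights never change shape between the two calls, whereas the Dense kernel after Flatten would need a different number of rows for every sequence length.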
Hello, I am trying to build a seq2seq model to generate some music.
I really don't know much about it though.
On the internet I have found this model:
def createSeq2Seq():
    # seq2seq model
    # encoder
    model = Sequential()
    model.add(LSTM(input_shape = (None, input_dim), units = num_units, activation= 'tanh', return_sequences = True ))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(LSTM(num_units, activation= 'tanh'))
    # decoder
    model.add(RepeatVector(y_seq_length))
    num_layers= 2
    for _ in range(num_layers):
        model.add(LSTM(num_units, activation= 'tanh', return_sequences = True))
        model.add(BatchNormalization())
        model.add(Dropout(0.3))
    model.add(TimeDistributed(Dense(output_dim, activation= 'softmax')))
    return model
My data is a list of pianorolls. A piano roll is a matrix with the columns representing a one-hot encoding of the different possible pitches (49 in my case), with each column representing a time step (0.02 s in my case). The pianoroll matrix therefore contains only ones and zeros.
I have prepared my training data by reshaping my pianoroll songs (putting them all one after the other) into shape = (something, batchsize, 49). So my input data are all the songs one after the other, separated into blocks whose size is the batch size. My training data is then the same input but delayed by one batch.
The x_seq_length and y_seq_length are equal to the batch_size. Input_dim = 49
My input and output sequences have the same dimension.
Have I made any mistake in my reasoning? Is the seq2seq model I've found correct? What does RepeatVector do?
This is not a seq2seq model. RepeatVector takes the last state of the last encoder LSTM and makes one copy per output token. Then you feed these copies into a "decoder" LSTM, which thus has the same input in every time step.
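As a rough illustration of what RepeatVector does to shapes, here is a pure-Python sketch. The helper repeat_vector is hypothetical and only mimics the shape effect (batch, features) -> (batch, n, features); it is not the Keras implementation.

```python
def repeat_vector(batch, n):
    """Copy each vector n times along a new time axis.

    Input shape: (batch, features). Output shape: (batch, n, features).
    """
    return [[list(vec) for _ in range(n)] for vec in batch]


# Pretend this is the final state of the last encoder LSTM for a batch of 2.
encoder_state = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
repeated = repeat_vector(encoder_state, 4)

print(len(repeated), len(repeated[0]), len(repeated[0][0]))  # 2 4 3
print(repeated[0][0] == repeated[0][3])  # True: every time step sees the same vector
```

Every decoder time step receives an identical input, which is why the layers after RepeatVector cannot condition on what was generated at earlier steps other than through their own recurrent state.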
A proper autoregressive decoder takes its previous outputs as input, i.e., at training time, the input of the decoder is the same as its output, but shifted by one position. This also means that your model misses the embedding layer for the decoder inputs.
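The "shifted by one position" idea can be sketched in a few lines. The START token id and the helper shift_right are assumptions made for this example; a real pipeline would use whatever start symbol its vocabulary defines.

```python
START = 0  # hypothetical start-of-sequence token id


def shift_right(target, start_token=START):
    """Build the decoder input for training: the target delayed by one step."""
    return [start_token] + list(target[:-1])


target = [5, 9, 2, 7]                 # what the decoder should output
decoder_input = shift_right(target)   # what the decoder is fed at training time

print(decoder_input)  # [0, 5, 9, 2]
```

At step t the decoder sees the true token from step t - 1 and is trained to predict the token at step t.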
Let's say I want to code this basic neural network structure in Keras, which has 10 units in the input layer and 3 units in the output layer.
Now, if I am using Keras and give an input_shape larger than 10, how will the network adjust to it?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(10, activation = 'relu', input_shape = (64,)))
model.add(Dense(3, activation = 'sigmoid'))
model.summary()
You see, here input_shape is of size 64, but how will that fit a model whose first layer has 10 units? From what I have learned, the size of the input shape/vector should be equal to the number of units in the input layer.
Or am I not implementing this neural network right?
That would not be a problem. A weight matrix connecting the 64 inputs to the 10 units is used in the first Dense layer (Keras stores it as a kernel of shape (64, 10)). Your input has shape 64 and the first hidden layer has 10 units, feeding an output layer of 3 units. Seems fine to me.
But your input layer itself is 64. So what you are getting is a 3-layer network with a hidden layer of 10 units.
If the shape of your input vector is 64, then you really need to have an input layer with size 64. The input layer of a neural network doesn't perform any computations. It just passes the inputs forward to the first hidden layer. This one, on the other hand, performs the computations for all neurons contained in it (linear combination of input vector and weights, later served as an input to the activation function, which is the ReLU in your case).
In your code, you are building a neural net with 64 input neurons (which again don't perform any computations), 10 neurons in the first (and only) hidden layer and 3 neurons in the output layer.
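A small pure-Python sketch of the forward pass may make the shapes concrete. The helper dense and the random weights are illustrative assumptions; only the shapes (64 inputs, 10 hidden units, 3 outputs) and the activations come from the code in the question.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, kernel, bias, activation):
    """Compute activation(x @ kernel + bias) for a single input vector.

    kernel has shape (n_in, n_out), matching how Keras stores Dense kernels.
    """
    n_in, n_out = len(kernel), len(bias)
    assert len(x) == n_in, "input size must match the kernel's first axis"
    return [
        activation(sum(x[i] * kernel[i][j] for i in range(n_in)) + bias[j])
        for j in range(n_out)
    ]


rng = random.Random(0)


def random_kernel(n_in, n_out):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]


x = [rng.random() for _ in range(64)]                            # the "input layer": no computation
hidden = dense(x, random_kernel(64, 10), [0.0] * 10, relu)       # Dense(10, activation='relu')
output = dense(hidden, random_kernel(10, 3), [0.0] * 3, sigmoid)  # Dense(3, activation='sigmoid')

print(len(x), len(hidden), len(output))  # 64 10 3
```

The 64 values are only passed along; the first actual computation happens in the 10-unit layer.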
I'm trying to build a model that predicts the price of a certain commodity based on current market conditions. My data are shaped similarly to
num_samples = 100
sample_dimension = 10
XXX = np.random.random((num_samples,sample_dimension)).reshape(-1,1,sample_dimension)
YYY = np.random.random(num_samples).reshape(-1,1)
so I've got 100 ordered samples of X data, each consisting of 10 variables. My model looks like the following
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
                                 kernel_size = (2),
                                 activation='sigmoid',
                                 input_shape=(None, sample_dimension),
                                 batch_input_shape = [1,1,sample_dimension]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
model.add(tf.keras.layers.Reshape((1, sample_dimension)))
model.add(tf.keras.layers.LSTM(100,
                               stateful = True,
                               return_sequences=False,
                               activation='sigmoid'))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])
so it's a 1D convolution, a pooling, a reshape (so it plays nice with the lstm) and then casting down to a prediction
but when I try to run it, I get the following error
Negative dimension size caused by subtracting 2 from 1 for 'conv1d/conv1d' (op: 'Conv2D') with input shapes: [1,1,1,10], [1,2,10,4].
I've tried a few different values for the kernel size, pool size, and batch_input_shape (have to batch my inputs because my actual data are spread across several large files, so I want to read one at a time and kick it into training the model), but nothing seems to work.
What am I doing wrong? How can I track/predict the shape of my data as it goes through this model? What are the data/variables supposed to look like?
I ended up looking through tutorials for conv2D, and then converting stuff to conv1D (please edit as you feel appropriate)
conv2D solution
model = keras.Sequential()
model.add(tf.keras.layers.Conv2D(4,
                                 kernel_size = (1,2),                               # extra leading 1
                                 activation = 'sigmoid',
                                 input_shape = (1,sample_dimension,1),              # extra leading 1
                                 batch_input_shape = [None,1,sample_dimension,1]))  # extra 1 after the batch axis
model.add(tf.keras.layers.AveragePooling2D(pool_size=(1,2)))
#model.add(tf.keras.layers.Reshape((1,sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
Then I converted it to conv1D by taking out a dimension from each of the necessary arguments (the 1s marked "extra" in the comments above)
model = keras.Sequential()
model.add(tf.keras.layers.Conv1D(4,
                                 kernel_size = 2,
                                 activation = 'sigmoid',
                                 input_shape = (sample_dimension,1),
                                 batch_input_shape = [None,sample_dimension,1]))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
#model.add(tf.keras.layers.Reshape((1,sample_dimension)))
model.add(tf.keras.layers.Flatten())
model.add(keras.layers.Dense(1))
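I guess the key takeaway is that Conv1D expects each sample to have shape (steps, channels): the convolution slides along the steps axis, and the last axis holds the channels. Here there are sample_dimension steps with 1 channel each (just a number). My original input had only 1 step with 10 channels, which is shorter than the kernel size of 2, hence the negative dimension error.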
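One way to track the shapes by hand is to apply the usual length rule for unpadded ("valid") sliding windows layer by layer. The helper valid_output_length below is a hypothetical name for that rule, and the sketch assumes the Keras defaults (no padding, convolution stride 1, pooling stride equal to pool_size).

```python
def valid_output_length(length, window, stride=1):
    """Length after sliding an unpadded window of the given size over `length` steps."""
    if length < window:
        raise ValueError(
            f"window of size {window} does not fit into {length} step(s)"
        )
    return (length - window) // stride + 1


sample_dimension = 10

# Working Conv1D model: each sample has shape (steps=10, channels=1).
conv_steps = valid_output_length(sample_dimension, window=2)   # Conv1D(4, kernel_size=2)
conv_shape = (conv_steps, 4)
pool_steps = valid_output_length(conv_steps, window=2, stride=2)  # AveragePooling1D(pool_size=2)
pool_shape = (pool_steps, 4)
flat_size = pool_shape[0] * pool_shape[1]                      # Flatten
print(conv_shape, pool_shape, flat_size)  # (9, 4) (4, 4) 16

# Original model: each sample had shape (steps=1, channels=10).
try:
    valid_output_length(1, window=2)
    original_error = None
except ValueError as err:
    original_error = str(err)
print(original_error)  # window of size 2 does not fit into 1 step(s)
```

The last call corresponds to the "subtracting 2 from 1" in the TensorFlow error: a kernel of size 2 cannot slide over a single step.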
I am trying to use a Conv1D and Bidirectional LSTM in keras (much like in this question) for signal processing, but doing a multiclass classification of each time step.
The problem is that even though the shapes used by Conv1D and LSTM are somewhat equivalent:
Conv1D: (batch, length, channels)
LSTM: (batch, timeSteps, features)
The output length of the Conv1D is (length - kernel_size)/strides + 1 (rounded down), and therefore doesn't match the LSTM shape anymore, even without using MaxPooling1D and Dropout.
To be more specific, my training set X has n samples with 1000 time steps and one channel (n_samples, 1000, 1), and I used LabelEncoder and OneHotEncoder so y has n samples, 1000 time steps and 5 one hot encoded classes (n_samples, 1000, 5).
Since one class is much more prevalent than the others (it is actually the absence of signal), I am using loss='sparse_categorical_crossentropy', sample_weight_mode="temporal" and sample_weight to give a higher weight to time steps containing meaningful classes.
model = Sequential()
model.add(Conv1D(128, 3, strides=1, input_shape = (1000, 1), activation = 'relu'))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(TimeDistributed(Dense(5, activation='softmax')))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'], sample_weight_mode="temporal")
print(model.summary())
When I try to fit the model I get this error message:
Error when checking target: expected time_distributed_1 to have shape
(None, 998, 1) but got array with shape (100, 1000, 5).
Is there a way to make such a neural network configuration work?
Your convolution is trimming the ends of the sequence: with a kernel size of 3 and the default padding='valid', 1000 steps become 998. Use padding='same' in the convolutional layers.
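The effect of the two padding modes can be reproduced with a tiny single-channel convolution in pure Python. This is a sketch for stride 1 only, and conv1d here is a hypothetical helper, not the Keras layer; like Keras, it computes a cross-correlation (the kernel is not flipped).

```python
def conv1d(signal, kernel, padding="valid"):
    """Slide `kernel` over `signal` with stride 1.

    padding="valid": no padding, output is shorter by len(kernel) - 1.
    padding="same":  zero-pad both ends so the output keeps the input length.
    """
    k = len(kernel)
    if padding == "same":
        total = k - 1
        left = total // 2
        right = total - left
        signal = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [float(t) for t in range(1000)]   # 1000 time steps, one channel
kernel = [1.0, 0.0, -1.0]                  # kernel_size = 3

valid_out = conv1d(signal, kernel, padding="valid")
same_out = conv1d(signal, kernel, padding="same")
print(len(valid_out), len(same_out))  # 998 1000
```

With "same" padding the time axis stays at 1000, so the per-time-step targets line up with the model output again.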
The message, though, seems not to fit your model at first sight. Your model clearly has 5 output features (because of Dense(5)), but the message says it expects 1. This is probably because of the "sparse" crossentropy, which expects integer class indices rather than one-hot vectors. Given the format of your data, you should probably use "categorical_crossentropy".
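The difference between the two label formats can be shown on a toy target. The helper one_hot_to_class_ids is an assumption for illustration: it converts one-hot targets (the format categorical_crossentropy works with) into integer class ids (the format sparse_categorical_crossentropy works with), in case keeping the sparse loss is preferred over switching losses.

```python
def one_hot_to_class_ids(y_one_hot):
    """Convert (samples, steps, classes) one-hot targets to (samples, steps) class ids."""
    return [[step.index(max(step)) for step in sample] for sample in y_one_hot]


# One sample, 4 time steps, 5 classes, one-hot encoded.
y_one_hot = [[
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]]

y_sparse = one_hot_to_class_ids(y_one_hot)
print(y_sparse)  # [[0, 0, 3, 1]]
```

Either pair should be consistent: one-hot targets with "categorical_crossentropy", or integer targets with "sparse_categorical_crossentropy".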
I'm a bit confused about the number of layers that are used in Keras models. The documentation is rather opaque on the matter.
According to Jason Brownlee the first layer technically consists of two layers, the input layer, specified by input_dim and a hidden layer. See the first questions on his blog.
In all of the Keras documentation the first layer is generally specified as
model.add(Dense(number_of_neurons, input_dim=number_of_cols_in_input, activation=some_activation_function)).
The most basic model we could make would therefore be:
model = Sequential()
model.add(Dense(1, input_dim = 100, activation = None))
Does this model consist of a single layer, where 100 dimensional input is passed through a single input neuron, or does it consist of two layers, first a 100 dimensional input layer and second a 1 dimensional hidden layer?
Further, if I were to specify a model like this, how many layers does it have?
model = Sequential()
model.add(Dense(32, input_dim = 100, activation = 'sigmoid'))
model.add(Dense(1))
Is this a model with 1 input layer, 1 hidden layer, and 1 output layer or is this a model with 1 input layer and 1 output layer?
Your first one consists of a 100-neuron input layer connected to one single output neuron.
Your second one consists of a 100-neuron input layer, one hidden layer of 32 neurons and one output layer of one single neuron.
You have to think of your first layer as your input layer (with the same number of neurons as the dimension, so 100 for you) connected to another layer with as many neurons as you specify (1 in your first case, 32 in the second one).
In Keras what is useful is the command
model.summary()
For your first question, the model is:
1 input layer and 1 output layer.
For the second question:
1 input layer
1 hidden layer
1 activation layer (the sigmoid one)
1 output layer
For the input layer, this is abstracted by Keras with the input_dim or input_shape argument, but you can find this layer in:
from keras.layers import Input
Same for the activation layer.
from keras.layers import Activation
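To connect the layer count with what model.summary() reports, the trainable parameters of each Dense layer can be worked out by hand: inputs * units weights plus units biases. The helper describe_dense_stack is a hypothetical name used only for this sketch; the input layer itself contributes no parameters.

```python
def describe_dense_stack(input_dim, units_per_layer):
    """Return (inputs, units, trainable parameters) for each Dense layer in order."""
    rows = []
    n_in = input_dim
    for units in units_per_layer:
        rows.append((n_in, units, n_in * units + units))
        n_in = units  # the next layer's input size is this layer's output size
    return rows


first_model = describe_dense_stack(100, [1])        # Dense(1, input_dim=100)
second_model = describe_dense_stack(100, [32, 1])   # Dense(32, input_dim=100), Dense(1)

print(first_model)   # [(100, 1, 101)]
print(second_model)  # [(100, 32, 3232), (32, 1, 33)]
```

Only the Dense layers carry weights, which is why the first model has a single row and the second has two, even though both are described as having an input layer.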
# Create a `Sequential` model and add a Dense layer as the first layer.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(16,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
# Now the model will take as input arrays of shape (None, 16)
# and output arrays of shape (None, 32).
# Note that after the first layer, you don't need to specify
# the size of the input anymore:
model.add(tf.keras.layers.Dense(32))
model.output_shape
(None, 32)
model.layers
[<keras.layers.core.dense.Dense at 0x7f494062e950>,
<keras.layers.core.dense.Dense at 0x7f4944048d90>]
model.summary()
It may help you understand more clearly.