I have a dataset of 100,000 rows and 12 columns, where each column stands for a certain input used to train a sequential GRU model that predicts a single output. The following is the code for the model:
model = Sequential()
model.add(GRU(units=70, return_sequences=True, input_shape=(1,12),activity_regularizer=regularizers.l2(0.0001)))
model.add(GRU(units=50, return_sequences=True,dropout=0.1))
model.add(GRU(units=30, dropout=0.1))
model.add(Dense(units=5))
model.add(Dense(units=3))
model.add(Dense(units=1, activation='relu'))
model.compile(loss=['mae'], optimizer=Adam(learning_rate=0.0001), metrics=['mse'])
model.summary()
history=model.fit(X_train, y_train, batch_size=1000,epochs=30,validation_split=0.1, verbose=1)
However, before that I had to transform the training inputs from 2D to 3D using x_train=x.reshape(-1,1,12) and the outputs from 1D to 2D using y_train=y.reshape(-1,1). That is the part I really don't understand: why not just keep them as they are?
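You would have to describe your data for a definitive answer.
But since each layer's output is the input of the next layer, their shapes must match. GRU layers expect 3D input of shape (samples, timesteps, features), so reshape(-1,1,12) presents each row as a sequence of one timestep with 12 features. In the incomplete example you gave, your labels need to be a single value for each sample, and I think that's why reshape was used for them.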
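A minimal NumPy sketch of the two reshapes, assuming `x` is the (100000, 12) feature matrix and `y` the (100000,) target vector; random values stand in for the real dataset. The reshapes only add axes, they do not change any values.

```python
import numpy as np

# Stand-in data with the shapes described in the question (random values).
rng = np.random.default_rng(0)
x = rng.normal(size=(100_000, 12))  # 100,000 rows, 12 input columns
y = rng.normal(size=(100_000,))     # one target value per row

# Recurrent layers expect (samples, timesteps, features):
# each row becomes a sequence of 1 timestep carrying 12 features.
x_train = x.reshape(-1, 1, 12)

# Dense(units=1) outputs (samples, 1), so the targets get the same layout.
y_train = y.reshape(-1, 1)

print(x_train.shape)  # (100000, 1, 12)
print(y_train.shape)  # (100000, 1)

# Reshaping only adds axes; the values themselves are unchanged.
assert np.array_equal(x_train[:, 0, :], x)
```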
Related
I have a dataset of shape (143312, 30) and I'm using the following code to set up the model:
model = Sequential()
model.add(LSTM(100,activation='sigmoid', input_shape = (30,1 ) ))
model.add(Dense(5, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy',f1_m,precision_m, recall_m])
It is working, but I have no idea why. Is it just about the number of features? When I have 30 features, do I simply set it up like this? What does the 1 mean, and on what basis was the Dense layer set to 5?
About this one:
LSTM(100,activation='sigmoid', input_shape = (30,1))
You have created an RNN that works on sequences of 30 items, where each item has one feature. This matches your dataset with shape (143312, 30): it contains 143312 sequences, each 30 items long, and each item is a single feature.
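If the data is stored as a plain 2D array, one way to give each sample the (30, 1) layout that `input_shape=(30, 1)` describes is to add a trailing feature axis. A NumPy sketch, with random values standing in for the real dataset:

```python
import numpy as np

# Stand-in for the (143312, 30) dataset: 143312 sequences, 30 steps each.
data = np.random.default_rng(0).random((143312, 30))

# LSTM input is (samples, timesteps, features); here features == 1.
x = data.reshape(data.shape[0], data.shape[1], 1)  # same as data[..., np.newaxis]

print(x.shape)  # (143312, 30, 1)
```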
The 100 specifies the number of units (recurrent neurons) in the LSTM. It is a hyperparameter: use a bigger number for a more complex model and a smaller one if your model overfits the data.
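To see how the number of units drives model size, the parameter count of a standard Keras LSTM layer (default `use_bias=True`) can be worked out by hand: each of its four gates has an input kernel, a recurrent kernel and a bias. A small sketch; the helper name is hypothetical. Note that the sequence length (30) does not appear, because the same weights are reused at every timestep.

```python
def lstm_param_count(units: int, input_features: int) -> int:
    """Trainable parameters of a standard LSTM layer (4 gates, with bias)."""
    # Per gate: input kernel (features x units) + recurrent kernel (units x units) + bias (units)
    per_gate = input_features * units + units * units + units
    return 4 * per_gate

print(lstm_param_count(100, 1))  # 40800 for LSTM(100) with input_shape=(30, 1)
print(lstm_param_count(10, 1))   # 480
```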
Regarding this one:
model.add(Dense(5, activation='softmax'))
This is the output layer of your model. Apparently you are using your model for classification (hence the 'softmax' activation function) and your labels have 5 classes, hence 5 neurons in the Dense layer.
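To make the five-class setup concrete: softmax turns the 5 raw scores per sample into probabilities that sum to 1, and `categorical_crossentropy` compares them with one-hot labels of width 5. A NumPy sketch with made-up scores and labels:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Made-up scores for 3 samples over 5 classes (Dense(5) output before softmax).
logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0],
                   [0.0, 0.0, 3.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0, 1.0]])
probs = softmax(logits)

# Integer class labels one-hot encoded to width 5 (like keras.utils.to_categorical).
labels = np.array([0, 2, 4])
one_hot = np.eye(5)[labels]

# Categorical cross-entropy per sample: -sum(y_true * log(y_pred)).
loss = -(one_hot * np.log(probs)).sum(axis=1)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```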
I am currently working on a Supervised Machine Learning Solution to categorize some data into two classes.
So far I have written a Keras/TensorFlow Python script which seems to manage that just fine:
input_dim = len(data.columns) - 1
print(input_dim)
model = Sequential()
model.add(Dense(8, input_dim=input_dim, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y, validation_split=0.33, epochs=1500, batch_size=1000, verbose=1)
The input data I use is a CSV file with 168 input features. When I first ran this script successfully, I was very surprised to see an accuracy of over 99% after only a couple hundred epochs of training. I hadn't even bothered to normalize the input data yet.
What I am trying to find out now is which of my 168 input features are responsible for such a high accuracy and which features don't have much effect during training.
Is there a way to check the weights of each input column to see which of them are used most, i.e. which have the most impact?
Answering your last question:
model.layers[0].get_weights()
However, unless there is an obviously dominating weight, it is unlikely that a single feature is responsible for the good accuracy. For feature selection, try replacing some features of your input with their mean and check how much the predictions fluctuate. Little to no fluctuation means that the feature is not important.
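The mean-replacement check can be sketched as follows, assuming `predict` is any function mapping a (samples, features) array to predictions (for a Keras model that would be `model.predict`). A hypothetical linear scorer stands in for the trained network so the behaviour is easy to verify; the helper name is also hypothetical.

```python
import numpy as np

def mean_replacement_importance(predict, X):
    """For each column, replace it by its mean and measure how much predictions move."""
    baseline = predict(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, j] = X[:, j].mean()  # neutralise feature j
        # Mean absolute change in the predictions; ~0 means feature j barely matters.
        scores[j] = np.abs(predict(X_mod) - baseline).mean()
    return scores

# Hypothetical stand-in model: feature 0 matters a lot, feature 2 not at all.
weights = np.array([5.0, 0.5, 0.0])
predict = lambda X: X @ weights

X = np.random.default_rng(0).normal(size=(1000, 3))
print(mean_replacement_importance(predict, X))
```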
Also, please consider posting ML questions on https://datascience.stackexchange.com/
Each 'column' is connected to every neuron in the first layer. Apart from randomizing or dropping a column's values (equivalent to replacing them with the mean, as suggested in the answer above), there are two ways to estimate the relative importance of the columns from the weights. Keep in mind that these methods only make sense if your input dataset is standardized.
You could use the L1 or L2 norm of each column's weights in the first layer.
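For a Keras Dense layer, `model.layers[0].get_weights()` returns `[kernel, bias]`, where `kernel` has shape (input_dim, units), so row *i* holds every weight leaving input column *i*. The per-column norms can then be computed with NumPy; a random matrix stands in for the trained kernel here (168 inputs and 8 units, matching the question's first layer).

```python
import numpy as np

# Stand-in for: kernel, bias = model.layers[0].get_weights()
kernel = np.random.default_rng(0).normal(size=(168, 8))  # (input_dim, units)

# One score per input column: the norm of its outgoing weights.
l1_importance = np.abs(kernel).sum(axis=1)
l2_importance = np.linalg.norm(kernel, axis=1)

# Column indices sorted from most to least important by L2 norm.
ranking = np.argsort(l2_importance)[::-1]
print(ranking[:10])
```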
Alternatively, say your input has 100 columns. Create a layer that multiplies the input element-wise by a trainable tensor of shape (100,), and feed this layer's output into your sequential model. After training, the (100,) tensor gives the relative importance of your columns.
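For the layer's output to remain a (batch, 100) input for the rest of the model, the trainable (100,) tensor has to be applied element-wise (one weight per column); a true dot product would collapse each row to a single number. The shape difference in NumPy, with a random vector standing in for the trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 100))  # a batch of 32 samples, 100 columns
w = rng.normal(size=(100,))     # stand-in for the trainable (100,) tensor

scaled = x * w      # element-wise: one weight per column
collapsed = x @ w   # true dot product: one number per sample

print(scaled.shape)     # (32, 100) -> can feed the rest of the Sequential model
print(collapsed.shape)  # (32,)
# After training, np.abs(w) would be read as the relative importance of each column.
```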
I am trying to create a human action recognition model, but when I try to add a TimeDistributed layer, I get an input_shape problem. How can I convert my input from 4D to 5D?
I want to train on sequences of 10 images each, so that the model can recognize actions.
Dataset shape: (28000, 90, 90, 1)
#define CNN model
cnn = Sequential()
cnn.add(Conv2D(filters=32,kernel_size=(5,5),padding="Same",activation="relu",input_shape=(90,90,1)))
cnn.add(MaxPooling2D(pool_size=(2,2)))
cnn.add(Dropout(0.25))
cnn.add(Conv2D(filters=16,kernel_size=(5,5),padding="Same",activation="relu"))
cnn.add(MaxPooling2D(pool_size=(2,2)))
cnn.add(Dropout(0.25))
cnn.add(Conv2D(filters=32,kernel_size=(5,5),padding="Same",activation="relu"))
cnn.add(MaxPooling2D(pool_size=(2,2)))
cnn.add(Dropout(0.25))
cnn.add(Flatten())
cnn.add(Dense(4096, activation="relu"))
#define LSTM model
model = Sequential()
model.add(TimeDistributed(cnn,input_shape=(10,90,90,1)))
model.add(LSTM(10))
model.add(Dense(2, activation="softmax"))
verbose, epochs, batch_size = 0, 25, 64
optimizer=Adam(learning_rate=0.001,beta_1=0.9,beta_2=0.999)
model.compile(optimizer=optimizer,loss="binary_crossentropy",metrics=["accuracy"])
model.fit(x_train, y_train,validation_data=(x_val,y_val), epochs=epochs, batch_size=batch_size)
Here the error:
ValueError: Error when checking input: expected time_distributed_8_input to have 5 dimensions, but got array with shape (28000, 90, 90)
I had the same issue. I'm using tf.keras from TensorFlow 2.0 alpha. My input data was structured as follows:
(list, list, numpy.ndarray, numpy.ndarray, numpy.ndarray), corresponding to the number of records in the batch, the number of timesteps, img_width, img_height and channels.
It turns out the TensorFlow input-shape validation code misses the case where a record's input is built from a list containing NumPy arrays: the list dimension gets stripped off. It handles almost every other structure of the data.
I modified the TensorFlow library code locally to fix it and have reported it (https://github.com/tensorflow/tensorflow/issues/28323); I hope to submit the fix to TF this week.
That said, I think if you change your input data into the form (list, numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray), it may solve the issue you're having.
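For the 4D-to-5D conversion asked about in the question, one option is to reshape the frames into clips with NumPy. This sketch assumes the frames are stored in order, that every consecutive block of 10 frames is one clip, and that each clip gets one label; the helper name and the label array are hypothetical. It also restores the channel axis (the error message reports (28000, 90, 90)) and turns a Python list of frames into a single ndarray. With the question's 28000 frames the result would be (2800, 10, 90, 90, 1); a small stand-in is used here.

```python
import numpy as np

def to_clips(frames, seq_len=10):
    """Group consecutive frames into clips: (N, H, W[, C]) -> (N // seq_len, seq_len, H, W, C)."""
    frames = np.asarray(frames)           # also stacks a list of equal-shape arrays
    if frames.ndim == 3:                  # (N, H, W): add the missing channel axis
        frames = frames[..., np.newaxis]
    n_clips = frames.shape[0] // seq_len
    frames = frames[: n_clips * seq_len]  # drop leftover frames that don't fill a clip
    return frames.reshape(n_clips, seq_len, *frames.shape[1:])

# Small stand-in: 200 grayscale 90x90 frames (the question has 28000).
frames = np.random.default_rng(0).random((200, 90, 90)).astype("float32")
x = to_clips(frames)
print(x.shape)  # (20, 10, 90, 90, 1)

# Hypothetical per-frame labels (0/1); keep one per clip and one-hot them for Dense(2).
frame_labels = np.repeat([0, 1], 100)
clip_labels = frame_labels[: x.shape[0] * 10 : 10]
y = np.eye(2)[clip_labels]
```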
I'm trying to get the hang of Keras, and I'm trying to get basic time series prediction working. My input is a list of random ints between 0 and 10, such as [1,3,2,4,7,5,9,0], and my labels are the same as the input but delayed, such as [X,X,1,3,2,4,7,5]. I'm trying to have my model learn this relationship of remembering past data points.
My code is:
labels = keras.utils.to_categorical(output, num_keys)
model = keras.Sequential([
keras.layers.LSTM(10),
keras.layers.Dense(10, activation='relu'),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer=tf.train.AdamOptimizer(),
loss=tf.keras.losses.categorical_crossentropy,
metrics=['accuracy'])
model.fit(input, labels, epochs=30, verbose=2,shuffle=False)
and I get the error: ValueError: Please provide as model inputs either a single array or a list of arrays. You passed: x=[7, 6,...
I've tried reformating my input with:
input=numpy.array([[i,input[i]]for i in range(len(input))])
input=numpy.reshape(input,input.shape+(1,))
and adding input_shape=input.shape[1:] to my LSTM layer; that throws no errors, but the accuracy is no better than blind guessing.
This seems like the kind of thing that could be trivial, but I'm clearly missing something.
With keras.layers.LSTM(10), you need to include the input data shape: keras.layers.LSTM(10, input_shape = (input.shape[1], input.shape[2])).
Keras expects the input data shaped as [instances, time, predictors], and since you don't have any additional predictors, you may need to reshape your input data with input.reshape(input.shape[0], input.shape[1], 1).
Keras will infer the data shapes for the next layers, but the first layer needs the input shape defined.
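One way to frame the delay task as [instances, time, predictors] data is sketched below. It assumes the label at step t is the input two steps earlier (as in the question's example) and that each sample is a sliding window of the last 3 values, so the value to recall is still inside the window. The window length and helper name are assumptions, not part of the answer above.

```python
import numpy as np

def make_windows(series, delay=2):
    """Sliding windows of length delay+1; the target is the window's oldest value."""
    series = np.asarray(series)
    window = delay + 1
    n = len(series) - window + 1
    X = np.stack([series[i : i + window] for i in range(n)])  # (n, window)
    y = X[:, 0]  # the value seen `delay` steps before the window's last step
    return X.reshape(n, window, 1), y  # [instances, time, predictors=1]

series = np.random.default_rng(0).integers(0, 10, size=1000)  # random ints 0..9
X, y = make_windows(series)
y_onehot = np.eye(10)[y]  # like keras.utils.to_categorical(y, 10)
print(X.shape, y_onehot.shape)  # (998, 3, 1) (998, 10)
# The first layer could then be keras.layers.LSTM(10, input_shape=(X.shape[1], X.shape[2])).
```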
A colleague of mine pointed out the very cool option to use sample_weight instead of a masking layer when you need to mask input to a non-RNN in Keras.
In my case, I have 62 columns in the input, with the 63rd being the response. Over 97% of the nonzero entries in the 62 columns are contained in the first 30 columns. I'm trying to just get this working, so I'd like to weight the last 32 columns to be 0 in training, essentially creating a 'poor-man's mask'.
This is an 8-class classification task, using an MLP. The response variable has been transformed using the to_categorical() function in Keras.
Here's the implementation:
model = Sequential()
model.add(Dense(100, input_dim=X.shape[1], init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='sigmoid'))
hist = model.fit(X, y,
validation_data=(X_test, ytest),
nb_epoch=epochs_,
batch_size=batch_size_,
callbacks=callbacks_list,
sample_weight = np.array([X.shape[1]-32, 30]))
I'm getting this error:
in standardize_weights
assert y.shape[:sample_weight.ndim] == sample_weight.shape
How can I fix my sample_weight to 'mask' the last 32 columns of the input?
sample_weight doesn't work like that:
sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile(). (Source: Keras documentation for model.fit.)
In other words, this setting puts different weights on the samples of the training data, not on the features of each sample. It is only used during training.
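To illustrate the quoted documentation: `sample_weight` holds one number per training sample (shape `(n_samples,)`), and each sample's loss is scaled by it, so it cannot zero out input columns. A NumPy sketch with made-up losses; it simplifies the reduction step, since exactly how Keras averages the weighted losses depends on the version.

```python
import numpy as np

n_samples = 5
per_sample_loss = np.array([0.2, 1.5, 0.7, 0.1, 0.9])  # made-up losses, one per sample

# One weight per sample, same length as X and y: 0 ignores a sample, 2 counts it double.
sample_weight = np.array([1.0, 0.0, 1.0, 2.0, 1.0])
assert sample_weight.shape == (n_samples,)

# Each sample's loss is scaled by its weight before averaging over the batch.
weighted_loss = (sample_weight * per_sample_loss).sum() / n_samples
print(weighted_loss)
```

By contrast, the question's `np.array([X.shape[1]-32, 30])` evaluates to `[30, 30]`, a length-2 array rather than one entry per row of X, which is why the shape assertion in `standardize_weights` fails.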
I think you should use masking if you don't want the layer to use those features. Or just remove them from your dataset? Or, if it's not too complicated, let the network learn by itself which features are useful.
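The two simplest options can be sketched in NumPy, assuming X holds the 62 input columns described in the question and the columns to ignore are the last 32. You can slice them off (the first Dense layer then needs `input_dim=30`), or multiply by a fixed 0/1 column mask so the input width stays 62 but those columns are always zero. The mask here is a plain array, not a Keras Masking layer.

```python
import numpy as np

X = np.random.default_rng(0).random((100, 62))  # stand-in for the 62 input columns

# Option 1: drop the last 32 columns entirely (first layer then uses input_dim=30).
X_first30 = X[:, :30]

# Option 2: keep the shape but zero the last 32 columns with a fixed mask.
column_mask = np.zeros(62)
column_mask[:30] = 1.0
X_masked = X * column_mask  # broadcasts the (62,) mask over every row

print(X_first30.shape, X_masked.shape)  # (100, 30) (100, 62)
```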
Does this help?