A colleague of mine pointed out the very cool option to use sample_weight instead of a masking layer when you need to mask input to a non-RNN in Keras.
In my case, I have 62 columns in the input, with the 63rd being the response. Over 97% of the nonzero entries in the 62 columns are contained in the first 30 columns. I'm trying to just get this working, so I'd like to weight the last 32 columns to be 0 in training, essentially creating a 'poor-man's mask'.
This is an 8-class classification task, using an MLP. The response variable has been transformed using the to_categorical() function in Keras.
Here's the implementation:
model = Sequential()
model.add(Dense(100, input_dim=X.shape[1], init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='sigmoid'))
hist = model.fit(X, y,
                 validation_data=(X_test, ytest),
                 nb_epoch=epochs_,
                 batch_size=batch_size_,
                 callbacks=callbacks_list,
                 sample_weight=np.array([X.shape[1]-32, 30]))
I'm getting this error:
in standardize_weights
assert y.shape[:sample_weight.ndim] == sample_weight.shape
How can I fix my sample_weight to 'mask' the last 32 columns of the input?
sample_weight doesn't work like that. The Keras documentation says:
sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
In other words, this setting puts different weights on the samples of the training data, not on the features of each sample. It is used only during training.
I think you should use masking if you don't want the layer to use those features. Or just remove them from your dataset? Or, if it's not too complicated, let the network learn by itself which features are useful.
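To make the distinction concrete, here is a minimal sketch using NumPy only and random stand-in data (the sizes and the names X_first30 and bad_weight are made up for illustration). It shows the shape sample_weight is expected to have, why the array from the question fails the shape check, and the simpler alternative of slicing off the last 32 columns before training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200                                 # made-up size, for illustration only
X = rng.random((n_samples, 62))                 # stand-in for the 62 input columns
y = np.eye(8)[rng.integers(0, 8, n_samples)]    # one-hot labels, like to_categorical()

# sample_weight holds one weight per row (sample), not per column (feature).
sample_weight = np.ones(n_samples)
assert sample_weight.shape == (X.shape[0],)

# The array from the question has length 2, so it cannot line up with the samples.
bad_weight = np.array([X.shape[1] - 32, 30])
print(bad_weight.shape == y.shape[:bad_weight.ndim])  # False: the failing check

# "Poor man's mask" on features: keep only the first 30 columns.
X_first30 = X[:, :30]
print(X_first30.shape)  # (200, 30)
```

X_first30 would then be passed to fit() in place of X, with input_dim=30 in the first Dense layer.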
Does this help?
I have a dataset of shape (143312, 30) and I'm using the following code to set up the model:
model = Sequential()
model.add(LSTM(100,activation='sigmoid', input_shape = (30,1 ) ))
model.add(Dense(5, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy',f1_m,precision_m, recall_m])
It is working, but I have no idea why. Is it just about the number of features? When I have 30 features, do I simply set it like this? What does the 1 mean, and on what basis was Dense set to 5?
About this one:
LSTM(100,activation='sigmoid', input_shape = (30,1))
You have created an RNN that works on sequences of 30 items, where each item has one feature. This matches your dataset with shape (143312, 30): it contains 143312 sequences, each sequence is 30 items long, and each item is a single feature.
The 100 here specifies the number of units (recurrent neurons) used in the LSTM. It is a hyperparameter: use a bigger number for a more complex model and a smaller one if your model overfits the data.
Regarding this one:
model.add(Dense(5, activation='softmax'))
This is the output layer of your model. Apparently you are using your model for classification (the 'softmax' activation function) and your labels have 5 classes, hence 5 neurons in the Dense layer.
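As a rough illustration of those shapes, here is a NumPy-only sketch with a small random stand-in for the dataset (the sizes and variable names are assumptions, not the asker's data). It shows what input_shape=(30, 1) implies for the input array and what 5 output neurons imply for the labels.

```python
import numpy as np

n_samples, n_steps, n_classes = 1000, 30, 5   # small stand-in for (143312, 30)
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_steps))
labels = rng.integers(0, n_classes, n_samples)

# input_shape=(30, 1) means 30 time steps with 1 feature each,
# so the 2D array gets a trailing feature axis of length 1.
X_seq = X.reshape(n_samples, n_steps, 1)
print(X_seq.shape)     # (1000, 30, 1)

# Dense(5, activation='softmax') outputs one probability per class,
# so the targets are one-hot vectors of length 5.
y_onehot = np.eye(n_classes)[labels]
print(y_onehot.shape)  # (1000, 5)
```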
I have a dataset that consists of 100,000 rows and 12 columns, where each column stands for a certain input used to train a sequential GRU model that predicts a single output. The following is the code for the model:
model = Sequential()
model.add(GRU(units=70, return_sequences=True, input_shape=(1,12),activity_regularizer=regularizers.l2(0.0001)))
model.add(GRU(units=50, return_sequences=True,dropout=0.1))
model.add(GRU(units=30, dropout=0.1))
model.add(Dense(units=5))
model.add(Dense(units=3))
model.add(Dense(units=1, activation='relu'))
model.compile(loss=['mae'], optimizer=Adam(lr=0.0001),metrics=['mse'])
model.summary()
history=model.fit(X_train, y_train, batch_size=1000,epochs=30,validation_split=0.1, verbose=1)
However, before that I had to transform the training dataset from 2D to 3D using x_train=x.reshape(-1,1,12) and the output from 1D to 2D using y_train=y.reshape(-1,1). That is the part I really don't understand, why not just keep them as they are?
You would need to describe your data for a definitive answer.
But since each layer's output is the input of the next layer, their shapes must match. In the incomplete example you gave, your labels need to be a single value for each sample, and I think that's why reshape was used.
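For the inputs, recurrent layers such as GRU expect a 3D array of shape (samples, timesteps, features), which is the likely reason for the first reshape. A small NumPy sketch with random stand-in data (the sizes are illustrative, not the asker's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1000, 12))   # stand-in for the 100,000 x 12 table
y = rng.random(1000)         # one target value per row

# (samples, features) -> (samples, timesteps=1, features=12),
# matching input_shape=(1, 12) in the first GRU layer.
x_train = x.reshape(-1, 1, 12)

# (samples,) -> (samples, 1), matching the single unit of the last Dense layer.
y_train = y.reshape(-1, 1)

print(x_train.shape, y_train.shape)  # (1000, 1, 12) (1000, 1)
```

With a single time step per sample, the GRU sees each row as a sequence of length 1, so the reshape changes only the layout and not the values.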
I am currently working on a Supervised Machine Learning Solution to categorize some data into two classes.
So far I have worked on a Keras/TensorFlow Python script which seems to manage that just fine:
input_dim = len(data.columns) - 1
print(input_dim)
model = Sequential()
model.add(Dense(8, input_dim=input_dim, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y, validation_split=0.33, epochs=1500, batch_size=1000, verbose=1)
The input data I use is a CSV file with 168 input features. When I first ran this script successfully, I was very surprised to see that I got an accuracy of over 99% after only a couple hundred epochs of training. I haven't even bothered to normalize the input data yet.
What I am trying to find out now is which of my 168 input features are responsible for such a high accuracy and which features have little effect during training.
Is there a way to check the weights of each input column to see which of them are used most, that is, which have the most impact?
Answering your last question:
model.layers[0].get_weights()
However, unless there is an obviously dominating weight, it is unlikely that a single feature gives you such good accuracy. For feature selection, try replacing some features of your input with their mean and check how much the predictions fluctuate. Little to no fluctuation means that the feature is not important.
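The mean-replacement check could be sketched roughly as follows. This is NumPy only; the helper mean_replacement_importance and the linear predict function are hypothetical stand-ins, and with a trained Keras model you would pass model.predict instead.

```python
import numpy as np

def mean_replacement_importance(predict, X):
    """Mean absolute change in predictions when each column is replaced by its mean."""
    baseline = predict(X)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_mod = X.copy()
        X_mod[:, j] = X[:, j].mean()   # flatten column j to its mean
        scores[j] = np.mean(np.abs(predict(X_mod) - baseline))
    return scores

# Hypothetical stand-in for model.predict: only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
true_w = np.array([3.0, 0.0, 1.0, 0.0])
predict = lambda data: data @ true_w

scores = mean_replacement_importance(predict, X)
print(np.argsort(scores)[::-1])  # columns ordered from most to least important
```

A column whose score stays near zero barely changes the predictions when flattened to its mean, which is the "little to no fluctuation" case described above.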
Also, please consider posting ML questions on https://datascience.stackexchange.com/
Each input column is connected to every neuron in the first layer. Apart from randomizing or dropping a column's values (dropping is equivalent to replacing them with the mean, as suggested in the answer above), there are two ways to estimate the relative importance of columns from the weights. Keep in mind that these methods only make sense if the input dataset is standardized.
1. Use the L1 or L2 norm of each column's weights in the first layer.
2. Say your input has 100 columns. Create a layer that multiplies the input element-wise by a trainable tensor of size (100,), and feed its output to your sequential model. The trained (100,) tensor then gives the relative importance of your columns.
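A minimal sketch of the norm-based approach, using a random matrix as a stand-in for the kernel returned by model.layers[0].get_weights()[0] (the matrix, its size, and the artificially boosted row are assumptions made for the demo):

```python
import numpy as np

# Stand-in for model.layers[0].get_weights()[0]; a Keras Dense kernel
# has shape (input_dim, units), so each row belongs to one input column.
rng = np.random.default_rng(0)
kernel = rng.normal(size=(100, 8))
kernel[7] *= 10.0   # make column 7 artificially dominant for the demo

l1_norm = np.abs(kernel).sum(axis=1)            # L1 norm per input column
l2_norm = np.sqrt((kernel ** 2).sum(axis=1))    # L2 norm per input column

top = np.argsort(l2_norm)[::-1][:5]
print(top)  # indices of the five columns with the largest L2 norm
```

The ranking is only comparable across columns when the inputs share a common scale, which is why standardization matters here.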
I am training a single layer LSTM that is coded as follows:
model = keras.Sequential()
model.add(keras.layers.LSTM(units=64,
                            input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.Adam(lr=0.0001),
    metrics=['acc']
)
The input to my LSTM is an hourly time series. I want to predict values at a daily level based on the hourly series.
Currently what I do is generate hourly predictions and then take the first prediction as the prediction for each day. However I wanted to know if there is a way to generate the same prediction at a daily level.
Thank you!
In my opinion, you have two options.
1. Train the model on a daily training dataset. To pick the most suitable data point for each day, you can use the value with the greatest number of repetitions (the mode) or the mean.
2. Keep the hourly outputs, forecast 24 outputs for the next 24 hours, and take the mean or mode of those 24 as the prediction for the day.
The second option is probably the better one. It should be more accurate.
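The aggregation in the second option could look roughly like this. It is a NumPy-only sketch with random numbers standing in for the model's hourly sigmoid outputs, and it assumes the series covers complete days of 24 hours each.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 3
hourly_prob = rng.random(n_days * 24)       # stand-in for one sigmoid output per hour

per_day = hourly_prob.reshape(n_days, 24)   # one row per day, assumes complete days

daily_mean = per_day.mean(axis=1)           # mean probability per day

hourly_class = (per_day >= 0.5).astype(int)
# Mode of a 0/1 vote: class 1 wins when more than half of the hours predict 1
# (a 12-12 tie falls back to class 0 in this sketch).
daily_mode = (hourly_class.sum(axis=1) > 12).astype(int)

print(daily_mean.shape, daily_mode.shape)   # (3,) (3,)
```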
Another option is to pass batch_input_shape instead of input_shape. The difference is that you now have to specify a fixed batch size, so the input shape looks like (24, X_train.shape[1], X_train.shape[2]).
You can also set the return_sequences argument, which tells the layer whether to return the output at every time step instead of only the final one. With return_sequences set to True, the output becomes a 3D array instead of a 2D array.
model = keras.Sequential()
model.add(keras.layers.LSTM(units=64,
                            batch_input_shape=(24, X_train.shape[1], X_train.shape[2])))
model.add(keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.Adam(lr=0.0001),
    metrics=['acc']
)
I have a 5000 by 9 2d numpy array of features trainX which are the features of a time sequence. I also have a 1d numpy array of floating point feature labels trainY. This is exactly the format you would need for scikit-learn for example.
I would like to use these with keras+LSTM. This is my code at present:
NUM_EPOCHS = 20
model = Sequential()
model.add(LSTM(8, input_shape=(1, window_size)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=NUM_EPOCHS, batch_size=1, verbose=2)
However, this doesn't work, as Keras seems to need trainX in a different format. I have read the manual but I can't understand what this format is exactly.
How can I convert my data into a format that keras will accept?
The format is (samples, timeSteps, features)
How many sequences do you have? It sounds like one sequence of 5000 steps, is that right?
Then the format is (1,5000,9).
The labels should also be (1,5000,1) if you have one label per time step (then use return_sequences=True). Otherwise the labels are (1,1).
Optionally, you may want to split your single sequence into many segments, as in a classical sliding-window setup, where you'd have many samples with fewer time steps, such as (4998,3,9) for a 3-step window over your 9 features. The labels should then follow: (4998,1).
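Both layouts can be built with plain NumPy. The sketch below uses random stand-in arrays of the sizes from the question and keeps all 9 features in each window. Labelling each window with the value at its last step is an assumption for illustration; the right choice depends on what you want to predict.

```python
import numpy as np

rng = np.random.default_rng(0)
trainX = rng.random((5000, 9))   # stand-in for the 5000 x 9 feature array
trainY = rng.random(5000)        # stand-in for the labels

# Option 1: one long sequence of 5000 steps.
X_single = trainX.reshape(1, 5000, 9)

# Option 2: sliding windows of 3 consecutive steps.
window = 3
n_windows = trainX.shape[0] - window + 1    # 5000 - 3 + 1 = 4998
X_windows = np.stack([trainX[i:i + window] for i in range(n_windows)])
# Assumption: each window is labelled with the value at its last step.
y_windows = trainY[window - 1:].reshape(-1, 1)

print(X_single.shape, X_windows.shape, y_windows.shape)
# (1, 5000, 9) (4998, 3, 9) (4998, 1)
```

For the windowed layout, the LSTM in the question would take input_shape=(3, 9).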