Comprehending which inputs have the highest weight in a neural network - Python

I am currently working on a supervised machine learning solution to categorize some data into two classes.
So far I have a Keras/TensorFlow Python script which seems to manage that just fine:
from keras.models import Sequential
from keras.layers import Dense

input_dim = len(data.columns) - 1   # 168 input features, last column is the label
print(input_dim)

model = Sequential()
model.add(Dense(8, input_dim=input_dim, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(train_x, train_y, validation_split=0.33, epochs=1500, batch_size=1000, verbose=1)
The input data I use is a CSV file with 168 input features. When I first ran this script successfully, I was very surprised to see that I got an accuracy of over 99% after only a couple hundred epochs of training. I haven't even normalized the input data yet.
What I am trying to find out now is which of my 168 input features are responsible for such a high accuracy and which features don't have much of an effect during training.
Is there a way to check the weights of each input column to see which of them are used most, i.e. which have the most impact?

Answering your last question:
model.layers[0].get_weights()
However, unless there is an obviously dominating weight, it is unlikely that a single feature by itself accounts for the high accuracy. For feature selection, try replacing some features of your input by their mean and check how the predictions fluctuate. Little to no fluctuation means that the feature is not important.
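A minimal sketch of that mean-replacement check, assuming train_x is a NumPy array shaped (samples, 168) as in the question (variable names are illustrative):
import numpy as np

baseline = model.predict(train_x)
importance = []
for col in range(train_x.shape[1]):
    x_perturbed = train_x.copy()
    # Replace one column at a time by its mean and measure how much the
    # predictions move; a larger shift suggests a more important feature.
    x_perturbed[:, col] = x_perturbed[:, col].mean()
    shifted = model.predict(x_perturbed)
    importance.append(np.abs(baseline - shifted).mean())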
Also, please consider posting ML questions on https://datascience.stackexchange.com/

There is going to be a connection from each 'column' to each neuron in the first layer. Apart from randomizing or dropping a column's values (equivalent to replacing them with the mean, as suggested in the answer above), there are two ways you could estimate the relative importance of columns using the weights. Please keep in mind that these methods only make sense if your input is standardized.
You could use the L1 or L2 norm of each column's weights in the first layer.
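A possible sketch of that first approach, assuming the model from the question (so the first layer's kernel has shape (168, 8), one row per input column):
import numpy as np

weights, biases = model.layers[0].get_weights()
l1_per_column = np.abs(weights).sum(axis=1)             # L1 norm per input column
# l2_per_column = np.sqrt((weights ** 2).sum(axis=1))   # or the L2 norm
ranking = np.argsort(l1_per_column)[::-1]               # columns with the largest norm first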
Say your input has 100 columns. You create a layer that takes the dot product of the input with a trainable tensor of size (100,), and feed the output of this layer into your Sequential model. Your trained (100,) tensor is then the relative importance of your columns.
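One way to read that suggestion is an element-wise product with a trainable vector, for example via a small custom layer; this is only a sketch, the layer and model names are made up, and it reuses the question's 168 columns instead of the 100 in the example:
from keras.layers import Layer, InputLayer, Dense
from keras.models import Sequential

class FeatureScale(Layer):
    # Multiplies every input column by its own trainable scalar.
    def build(self, input_shape):
        self.scale = self.add_weight(name="scale", shape=(input_shape[-1],),
                                     initializer="ones", trainable=True)
        super(FeatureScale, self).build(input_shape)

    def call(self, inputs):
        return inputs * self.scale

scaled_model = Sequential([
    InputLayer(input_shape=(168,)),   # the question's 168 input columns
    FeatureScale(),
    Dense(8, activation='relu'),
    Dense(2, activation='softmax'),
])
# compile and fit as before; afterwards the FeatureScale layer's weights
# hold the learned per-column scales.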

Related

Predicting future values with a multivariate LSTM model

I have a question concerning future predictions with an LSTM model.
Let me explain:
I am using an LSTM model to predict the stock price for the next 36 hours.
I have a dataset with 10 features.
I use these 10 features as inputs in my model with a single output (the expected price).
Here is my overall model:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

model = Sequential()
# input shape == (336, 10): I use 336 hours for my lookback and 10 features
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
I can assess the performance of my model with my test data, but now I would like to use it to predict the next 36 hours; that's the goal, after all, isn't it?
And here I have the impression that there is a big gap on the internet: everyone shows how to build models and evaluate them on test data, but nobody actually uses them for forecasting...
I found two interesting examples which consist of iteratively feeding the prediction back into the last window.
Here are the examples (at the bottom of each article):
https://towardsdatascience.com/time-series-forecasting-with-recurrent-neural-networks-74674e289816
https://towardsdatascience.com/using-lstms-to-forecast-time-series-4ab688386b1f
In itself this works, but only with a single feature as input.
I have 10 features, and my model returns only a single output value, so I cannot feed it back into the last window, which expects 10 features in its shape.
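For reference, a rough sketch of the iterative scheme the linked articles describe, written for this question's shapes; the commented lines mark exactly where the multivariate case breaks down, since only the predicted price is known for the new time step (column index 0 is assumed to be the price):
import numpy as np

window = X_test[-1].copy()                 # last known window, shape (336, 10)
forecast = []
for _ in range(36):                        # predict the next 36 hours one step at a time
    y_hat = model.predict(window[np.newaxis, ...])[0, 0]
    forecast.append(y_hat)
    new_row = window[-1].copy()            # the other 9 features of the new hour are unknown,
    new_row[0] = y_hat                     # so here they are simply carried forward
    window = np.vstack([window[1:], new_row])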
Do you see the problem?
I really hope you can point me in the right direction.
Adrien

CNN on small dataset is overfitting

I want to classify patterns in images. My original images are 200,000 x 200,000 pixels; I resize them to 96 x 96, and the patterns are still recognizable to the human eye. Pixel values are 0 or 1.
I'm using the following neural network:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import class_weight
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

train_X, test_X, train_Y, test_Y = train_test_split(
    cnn_mat, img_bin["Classification"], test_size=0.2, random_state=0)
class_weights = class_weight.compute_class_weight('balanced',
                                                  np.unique(train_Y),
                                                  train_Y)
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)
train_X, valid_X, train_label, valid_label = train_test_split(
    train_X, train_Y_one_hot, test_size=0.2, random_state=13)

model = Sequential()
model.add(Conv2D(24, kernel_size=3, padding='same', activation='relu',
                 input_shape=(96, 96, 1)))
model.add(MaxPool2D())
model.add(Conv2D(48, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPool2D())
model.add(Conv2D(64, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(16, activation='softmax'))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

train = model.fit(train_X, train_label, batch_size=80, epochs=20, verbose=1,
                  validation_data=(valid_X, valid_label), class_weight=class_weights)
I have already run some experiments to find a "good" number of hidden and fully connected layers. It's probably not the most optimal architecture, since my computer is slow; I just ran different models once and selected the best one using the confusion matrix. I didn't use cross-validation, and I didn't try more complex architectures since my amount of data is small. I have read that small architectures are best, but is it worth trying a more complex architecture?
Here are the results with 5 and 12 epochs, batch size 80. This is the confusion matrix for my test set.
As you can see, it looks like I'm overfitting. When I only run 5 epochs, most samples are assigned to class 0; with more epochs, class 0 is less dominant, but classification is still bad.
I added 0.8 dropout after each convolutional layer, e.g.
model.add(Conv2D(48,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Dropout(0.8))
model.add(Conv2D(64,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Dropout(0.8))
With dropout, 95% of my images are classified as class 0.
I tried image augmentation: I rotated all my training images and still used the class weights, but the results didn't improve. Should I try to augment only the classes with a small number of images? Most of what I read says to augment the whole dataset...
To summarize, my questions are:
Should I try a more complex model?
Is it useful to do image augmentation only on underrepresented classes? If so, should I still use class weights (I guess not)?
Can I hope to find a "good" model with a CNN given the size of my dataset?
Given the imbalanced data, I think it is better to create a custom data generator for your model so that each generated batch contains at least one sample from each class; a sketch follows below. It is also better to use a Dropout layer after each dense layer instead of after the conv layers. For data augmentation, it is better to use at least a combination of rotation, horizontal flip, and vertical flip. There are other approaches to data augmentation, such as using a GAN or random pixel replacement.
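A rough sketch of such a generator, reusing the variable names from the question; this is one possible implementation, not a canonical one:
import numpy as np
from keras.utils import Sequence

class BalancedBatchGenerator(Sequence):
    # Each batch draws roughly the same number of samples from every class.
    def __init__(self, x, y_one_hot, batch_size=80):
        self.x, self.y, self.batch_size = x, y_one_hot, batch_size
        labels = np.argmax(y_one_hot, axis=1)
        self.idx_by_class = [np.where(labels == c)[0] for c in np.unique(labels)]

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, _):
        per_class = max(1, self.batch_size // len(self.idx_by_class))
        picks = np.concatenate(
            [np.random.choice(idx, per_class) for idx in self.idx_by_class])
        np.random.shuffle(picks)
        return self.x[picks], self.y[picks]

# train = model.fit_generator(BalancedBatchGenerator(train_X, train_label),
#                             epochs=20, validation_data=(valid_X, valid_label))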
For GANs you can check this SO post.
For using a GAN as a data augmenter you can read this article.
For a combination of pixel-level augmentation and GANs, see pixel-level data augmentation.
What I used, in a different setting, was to upsample my data with ADASYN. This algorithm calculates the amount of new data required to balance your classes and then samples novel examples from the available data.
There is an implementation for Python. Otherwise, note that you also have very little data: SVMs perform well even with little data, so you might want to try them or other image classification algorithms, depending on whether the expected pattern is always at the same position or varies. You could also try the Viola–Jones object detection framework.
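If you want to try ADASYN in Python, the imbalanced-learn package ships an implementation; a sketch for this question's data, with images flattened to vectors first and reshaped back afterwards:
from imblearn.over_sampling import ADASYN

X_flat = train_X.reshape(len(train_X), -1)            # (n_samples, 96*96)
X_res, y_res = ADASYN().fit_resample(X_flat, train_Y)  # older versions use fit_sample
X_res = X_res.reshape(-1, 96, 96, 1)                   # back to image shape for the CNN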

InvalidArgumentError with RNN/LSTM in Keras

I'm throwing myself into machine learning and wish to use Keras for a university project that's time-critical. I realise it would be best to learn the individual concepts and building blocks first, but it's important that this is done soon.
I'm working with someone who has some experience and interest in machine learning, but we cannot seem to get further than this. The code below was adapted from GitHub code mentioned in a guide on Machine Learning Mastery.
For context, I've got data from multiple physical sensors (where each sensor is a column), with each sample from those sensors represented by one row. I wish to use machine learning to determine who the sensors were tracking at any given time. I'm trying to allocate approximately 80% of the rows to training and 20% to testing, and am creating my own "y" set of data (with the first 521,549 rows coming from one participant and the remainder from another). My data (training and test) has a total of 1,019,802 rows and 16 columns (all populated), but the number of columns can be reduced if need be.
I would love to know the following:
What does this error mean in the context of what I'm trying to achieve, and how can I change my code to avoid it?
Is the below code suitable for what I'm trying to achieve?
Does this code represent any specific fundamental flaw in my understanding of what machine learning (generally or specifically) is designed to achieve?
Below is the Python code I'm trying to run to make use of machine learning:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

x_all = pd.read_csv("(redacted)...csv",
                    delim_whitespace=True, header=None, low_memory=False).values
y_all = np.append(np.full((521549, 1), 0), np.full((498253, 1), 1))

limit = 815842
x_train = x_all[:limit]
y_train = y_all[:limit]
x_test = x_all[limit:]
y_test = y_all[limit:]

max_features = 16
maxlen = 80
batch_size = 32

model = Sequential()
model.add(Embedding(500, 32, input_length=max_features))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
This is an excerpt from the CSV referenced in the code:
6698.486328125 4.28260869565217 4.6304347826087 10.6195652173913 2.4392579293836 2.56134051466188 9.05326152004788 0.0 1.0812 924.898261191267 -1.55725190839695 -0.244274809160305 0.320610687022901 -0.122938530734633 0.490254872563718 0.382308845577211
6706.298828125 4.28260869565217 4.58695652173913 10.5978260869565 2.4655894673848 2.50867743865949 9.04368641532017 0.0 1.0812 924.898261191267 -1.64885496183206 -0.366412213740458 0.381679389312977 -0.122938530734633 0.490254872563718 0.382308845577211
6714.111328125 4.26086956521739 4.64130434782609 10.5978260869565 2.45601436265709 2.57809694793537 9.03411131059246 0.0 1.0812 924.898261191267 -0.931297709923664 -0.320610687022901 0.320610687022901 -0.125937031484258 0.493253373313343 0.371814092953523
The following error occurs when running this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 972190 is not in [0, 500)
[[Node: embedding_1/embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:#training/Adam/Assign_2"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, training/Adam/gradients/embedding_1/embedding_lookup_grad/concat/axis)]]
For reference, I'm on a 2017 27-inch iMac Retina 5K with 4.2 GHz i7, 32 GB RAM, with a Radeon Pro 580 8 GB.
There are some more tutorials on Machine Learning Mastery for what you want to accomplish
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
And I'll give my own quick explanation of what you probably want to do.
Right now it looks like you are using the exact same data for the X and y inputs to your model. The y inputs are the labels, which in your case is "who the sensors were tracking". So in the binary case of having 2 possible people, it is set to 0 for the first person and 1 for the second person.
The sigmoid activation on the final layer will output a number between 0 and 1. If the number is below 0.5, the model is predicting that the sensor is tracking person 0, and if it is above 0.5, it is predicting person 1. This will be reflected in the accuracy score.
You will probably not want to use an embedding layer. It's possible that you might, but I would drop it to start with. Do normalize your data before feeding it into the net, though, to improve training. Scikit-learn has good tools for this if you want a quick solution.
http://scikit-learn.org/stable/modules/preprocessing.html
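For example, a quick normalization with scikit-learn's StandardScaler (fit on the training split only, then apply the same transform to the test split):
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)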
When working with time series data, you often want to feed in a window of time points rather than a single point. If you send your time series to Keras' model.fit(), it will use a single point as input.
In order to have a time window as input, you need to reorganize each example in the dataset to be a whole window, or you can use a generator if that would take up too much memory. This is described in the Machine Learning Mastery pages I linked.
Keras has a generator that you can use called TimeseriesGenerator
from keras.preprocessing.sequence import TimeseriesGenerator
timeseries_generator = TimeseriesGenerator(data, targets, length, sampling_rate)
where data is your time series of features and targets is your time series of labels.
If you use the TimeseriesGenerator, then when fitting you will have to use fit_generator:
model.fit_generator(timeseries_generator)
and similarly evaluate with evaluate_generator().
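Putting those pieces together might look roughly like this (the window length of 80 reuses the maxlen value from your script and is only an example):
from keras.preprocessing.sequence import TimeseriesGenerator

window = 80   # reuses the question's maxlen; pick whatever lookback suits the data
train_gen = TimeseriesGenerator(x_train, y_train, length=window, batch_size=32)
test_gen = TimeseriesGenerator(x_test, y_test, length=window, batch_size=32)

# once the model below is built and compiled:
# model.fit_generator(train_gen, epochs=15, validation_data=test_gen)
# score, acc = model.evaluate_generator(test_gen)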
If you have your data set up correctly, then a model like this should work (the first layer needs an input_shape; here it matches the generator's window length and your 16 feature columns):
model = Sequential()
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, input_shape=(window, 16)))
model.add(Dense(1, activation='sigmoid'))
You could also try a simpler dense model (note that Dense takes no dropout argument, so dropout goes in its own layer):
model = Sequential()
model.add(Flatten(input_shape=(window, 16)))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
One more issue I see: it appears you would be splitting off a test set that contains only one type of label, which is not only bad practice but will also weight your training set towards the other label, which might hurt your results.
Hopefully that gets you started. Make sure you get your data set up correctly!

Received a label value of 1 which is outside the valid range of [0, 1) - Python, Keras

I am working on a simple CNN classifier using Keras with a TensorFlow backend.
import numpy
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

def cnnKeras(training_data, training_labels, test_data, test_labels, n_dim):
    print("Initiating CNN")
    seed = 8
    numpy.random.seed(seed)
    model = Sequential()
    model.add(Convolution2D(64, 1, 1, init='glorot_uniform',
                            border_mode='valid', input_shape=(16, 1, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 1)))
    model.add(Convolution2D(32, 1, 1, init='glorot_uniform',
                            activation='relu'))
    model.add(MaxPooling2D(pool_size=(1, 1)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1, activation='softmax'))
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    model.fit(training_data, training_labels, validation_data=(
        test_data, test_labels), nb_epoch=30, batch_size=8, verbose=2)
    scores = model.evaluate(test_data, test_labels, verbose=1)
    print("Baseline Error: %.2f%%" % (100 - scores[1] * 100))
    # model.save('trained_CNN.h5')
    return None
It is a binary classification problem, but I keep getting the message "Received a label value of 1 which is outside the valid range of [0, 1)", which does not make any sense to me. Any suggestions?
Range [0, 1) means every number between 0 and 1, excluding 1. So 1 is not a value in the range [0, 1).
I am not 100% sure, but the issue could be due to your choice of loss function. For binary classification, binary_crossentropy should be a better choice.
In the last Dense layer you used model.add(Dense(1, activation='softmax')). Here, 1 restricts the valid label values to the range [0, 1); change it to the number of output labels. For example, if your labels range over [0, 7), use model.add(Dense(7, activation='softmax')).
Peculiarities of sparse categorical crossentropy
The loss function sparse_categorical_crossentropy interprets the final layer, in the context of classifiers, as a set of probabilities for each possible class, and the label value as the index of the class. (The TensorFlow/Keras documentation goes into a bit more detail.) So x neurons in the output layer are compared against label values in the range 0 to x-1, and having just one neuron in the output layer makes a 'unary' classifier, which doesn't make sense.
If it's a classification task where you want the labels to run from 0 to x-1, you can keep sparse categorical crossentropy, but you need to set the number of neurons in the output layer to the number of classes you have. Alternatively, you can one-hot encode the labels and use the categorical crossentropy loss function instead of sparse categorical crossentropy. (A sketch of both options follows below.)
If it's not a classification task and you want to predict arbitrary real-valued numbers as in a regression, then categorical crossentropy is not a suitable loss function at all.
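A sketch of the two fixes for the code in the question; only the final layer and the compile step change, everything else stays the same:
from keras.layers import Dense
from keras.utils import to_categorical

# Option 1: keep integer labels (0 or 1) and sparse_categorical_crossentropy,
# but use one output neuron per class instead of Dense(1):
model.add(Dense(2, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])

# Option 2: one-hot encode the labels and switch to categorical_crossentropy
# (the output layer is still Dense(2, activation='softmax')):
# training_labels = to_categorical(training_labels, num_classes=2)
# test_labels = to_categorical(test_labels, num_classes=2)
# model.compile(loss='categorical_crossentropy',
#               optimizer='adam', metrics=['accuracy'])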
Cray and Shaili's answers were correct!
I had a range of outcomes from 1 to 6, and the line:
tf.keras.layers.Dense(6, activation='softmax')
produced that error message, saying that values were outside the range [0, 6). I had thought it was a labels problem (were all values present in both the training and validation label sets?) and was chasing that instead.
If the error reports the range [0, 4), you can just add one to the number of classes (labels).
For example, change this:
layers.Dense(4)
to:
layers.Dense(5)
The same applies for [0, 1).
I had this problem when my labels were of type float; casting them to int solved the problem.
Another potential cause concerns the workspace. If it's not a logic/sparseness/entropy error as the other answers suggest, keep reading:
If you created a workspace to hold the data as the model trained, the old workspace data can cause this error when you re-train on new samples, especially with a different number of folders, if you are using the folders as the labels for classification.
Example: I originally trained on a sample set with 4 folders. When I tried to retrain on a new sample set with 3 folders, I received the error:
Received a label value of 3 which is outside the valid range of [0, 3)
This is likely because the old sample set's cached values (4 folders) conflicted with the new sample set's 3 folders. All I know for sure is that once I cleared the old information out of my workspace and ran it again, it ran to completion. This was an isolated change after multiple failures, so I am certain it is what solved the issue.
Disclaimer: I am using C# and ML.NET, but it is still utilizing TensorFlow, which is where both of our errors were produced, so it should absolutely apply to the question.
For me, the issue was that the number of classes passed to the model was less than the actual number of classes in the data. Hence the model predicted -1 for most cases, giving the out-of-range error.

Keras: masking zero-padded input for non-RNN

A colleague of mine pointed out the very cool option to use sample_weight instead of a masking layer when you need to mask input to a non-RNN in Keras.
In my case, I have 62 columns in the input, with the 63rd being the response. Over 97% of the nonzero entries in the 62 columns are contained in the first 30 columns. I'm trying to just get this working, so I'd like to weight the last 32 columns to be 0 in training, essentially creating a 'poor-man's mask'.
This is an 8-class classification task, using an MLP. The response variable has been transformed using the to_categorical() function in Keras.
Here's the implementation:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, input_dim=X.shape[1], init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='sigmoid'))
# model.compile(...) was presumably called here (omitted in the question)

hist = model.fit(X, y,
                 validation_data=(X_test, ytest),
                 nb_epoch=epochs_,
                 batch_size=batch_size_,
                 callbacks=callbacks_list,
                 sample_weight=np.array([X.shape[1] - 32, 30]))
I'm getting this error:
in standardize_weights
assert y.shape[:sample_weight.ndim] == sample_weight.shape
How can I fix my sample_weight to 'mask' the last 32 columns of the input?
sample_weight doesn't work like that:
sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile(). source
In other words, this setting puts different weights on the samples of the training data, not on the features within each sample. It is used only during training.
I think you should use masking if you don't want the layer to use those features, or just remove them from your dataset (a sketch of the latter follows below). Or, if it's not too complicated, let the network learn by itself which features are useful.
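A minimal sketch of the "just remove them" route, following the question's setup where the useful information sits in the first 30 of the 62 feature columns (the column counts are taken from the question):
# keep only the first 30 feature columns; the response column is handled separately
X_small = X[:, :30]
X_test_small = X_test[:, :30]

model = Sequential()
model.add(Dense(100, input_dim=X_small.shape[1], init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='sigmoid'))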
Does this help?
