I have a dataset of N videos, and each video is characterized by some metrics (which will be the inputs of a neural net). My goal is to predict the score that a person will give after watching the video.
The problem is that in my dataset each video was watched by several different subjects, so I had to duplicate the same metrics (inputs) once per viewing in order to keep all of the scores given by the subjects.
I built an MLP model to predict the scores, but when I calculate the RMSE it is always higher than 0.7.
I want to know whether having a dataset like that would affect the performance of my model, and how I can deal with it.
Here is what the dataset looks like:
The first 5 columns are the inputs and the last one is the subject's score. Note that all of them are normalized.
Here is my Model:
import numpy
from keras.models import Sequential
from keras.layers import Dense

def mlp_model():
    # create model: four hidden ReLU layers of 100 units and a single linear output for regression
    model = Sequential()
    model.add(Dense(100, input_dim=5, kernel_initializer='normal', activation='relu'))
    model.add(Dense(100, kernel_initializer='normal', activation='relu'))
    model.add(Dense(100, kernel_initializer='normal', activation='relu'))
    model.add(Dense(100, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # compile model with MSE loss and the Adam optimizer
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
# fix the random seed for reproducibility
seed = 100
numpy.random.seed(seed)

myModel = mlp_model()
myModel.fit(x=x_train, y=y_train, batch_size=10, epochs=45, validation_split=0.3, shuffle=True, callbacks=[plot_losses])

predictions = myModel.predict(x_test)
print(predictions)
Your problem statement reveals an inherent flaw in the design. As you correctly pointed out, you have no way of knowing what the user does, how she has rated other videos, and how she will rate the current video.
It would be helpful to explain what your current input values are, and whether they could differ at all. For example, a metric like "time spent watching the video" might be different for different users.
On a larger scale, try to answer the question of whether you could predict the rating with a completely deterministic judgement, i.e. would it be possible for you to come up with the same answer (given the same input) and consistently get the same result?
Since that is currently not the case, I would say that you should invest more time in finding a suitable approach to your problem, for example recommender systems, but that also requires you to use a lot of different input information.
Alternatively, you could try to find more input data, which specifically identifies the users, and allows you to make more suitable predictions; even then, it will be hard to base a reasonable prediction on such proxy metrics, since you might end up creating an unwanted bias in your preprocessing.
In any case, getting much better results with the current format of the input is very unlikely.
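One way to see this concretely (my own illustration, not part of the answer above): because the same inputs map to several different target scores, even a perfect deterministic model can at best predict something like the per-video mean score, so the within-video spread of the scores is a floor on your RMSE. A minimal sketch, assuming the ratings sit in a pandas DataFrame with hypothetical video_id and score columns:

import pandas as pd
import numpy as np

df = pd.read_csv('ratings.csv')  # hypothetical file: one row per viewing

# best possible deterministic prediction from the video metrics alone: the per-video mean score
per_video_mean = df.groupby('video_id')['score'].transform('mean')

# RMSE of that oracle predictor = irreducible error caused by subject disagreement
irreducible_rmse = np.sqrt(((df['score'] - per_video_mean) ** 2).mean())
print('RMSE floor from subject disagreement:', irreducible_rmse)

If that floor is already close to 0.7, the MLP is near the limit of what these inputs allow.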
I have a 1DCNN model that seems to predict only values close to the mean of the actual values in my test dataset. Is this a poor model, given that the distributions of the actuals vs. the predictions are radically different?
Actual vs Predicted density plot
My questions are:
What could this graph indicate about my predictions? Is it normal for a model to simply predict the mean? Is that what will happen if it can't learn much about the dataset?
Doesn't it seem that MAE is not a good metric? Is it misleading and should I use a different one?
I am trying to improve the model by decreasing MAE, but as MAE decreases, the predictions simply move toward the mean value of the actual data and further from the spread of the real distribution. You can see that the SD of my predictions is around 9, while the SD of the actual data is about 22. The users want the results in the actual units, which is why I am reporting MAE, and I have other baselines to compare against with MAE. Still, I feel it is a very misleading metric.
I have about 30 weather and soil features, all continuous and scaled. 6 years of daily weather data at several thousand weather locations. At each location I have a single target value per year. The 1DCNN architecture is shown below. I split my data with the first 5 years in training and the last year is test. The data spans 3 US states and there are about 9 physical districts per state. I tried building a model per state (just 3 models) but my performance is poor for each. If I build it down to the district level, I can get acceptable results. I don't expect great results, but I'm really just trying to figure out why it's circling the mean.
My model looks like this:
from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Conv1D(filters=13, kernel_size=3, activation='relu', input_shape=input_shape))
model.add(Conv1D(filters=13, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='linear'))

opt = Adam(lr=.0001)
model.compile(loss='mean_squared_error', optimizer=opt)
I'm training the model on different-sized datasets to capture results throughout the year as more weather data is added; however, the predictions are similar for each model.
multiple models showing same situation
In the question you said you are using MAE (mean absolute error), but in the code you used mean_squared_error as the loss. Any reason for that?
Some debugging ideas:
Check the distribution of your dependent variable across the training and validation splits.
Bucket the target values into classes and try to build a classifier, and see if you still face the same issue; this can help you pinpoint which part of the distribution is not being captured properly.
If you are using a structured (tabular) dataset, try tree-based algorithms such as XGBoost or Random Forest, to see whether the algorithm itself is the problem (see the sketch below).
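A minimal sketch of that last idea (my own illustration; the X_train/y_train/X_test/y_test arrays are assumed to be the flattened tabular features and yearly targets, not names from the question):

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# X_*: (n_samples, n_features) flattened weather/soil features, y_*: yearly target values
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

preds = rf.predict(X_test)
print('MAE:', mean_absolute_error(y_test, preds))
print('prediction SD vs. actual SD:', preds.std(), y_test.std())

If this baseline also collapses toward the mean, the problem is more likely in the features or the target distribution than in the 1DCNN architecture.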
I am using keras with a tensorflow (version 2.2.0) backend to train a classifier to distinguish between two datasets, A and B, which I have mixed into a pandas DataFrame object x_train (with two columns), and with labels in a numpy array y_train. I would like to perform sample weighting in order to account for the fact that A has far more samples than B. In addition, A is comprised of two datasets A1 and A2, with A1 much larger than A2; I would like to account for this fact as well using my sample weights. I have the sample weights in a numpy array called w_train. There are ~10 million training samples.
Here is example code:
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, input_dim=x_train.shape[1], activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train.iloc, y_train, sample_weight=w_train)
When I use the sample_weight argument in model.fit(), I find that the model fitting initialization (i.e. whatever happens before keras starts to display the training progress) takes forever, too long to wait for. The problem goes away when I limit the dataset to 1000 samples, but as I increase to 100000 or 1000000 samples I notice that there is a significant difference in initialization and fitting time, so I suspect it has something to do with the way the data is being loaded. Nevertheless, it seems weird that merely adding the sample_weights argument would cause such a large timing difference.
Other information: I am running on CPU using a Jupyter notebook.
What is the problem here? Is there a way for me to modify the training setup or something else in order to speed up the initialization (or training) time?
The issue is caused by how TensorFlow validates certain types of input objects. When the data are definitely correct, such validation is purely wasted time (I hope this will be handled better in the future).
In order to force TensorFlow to skip such validation, you can simply wrap the weights in a pandas Series, as follows:
model.fit(x_train.iloc, y_train, sample_weight=pd.Series(w_train))
Do note that in your code you are using the metrics keyword. If you want the accuracy to actually be weighted by the provided weights, use the weighted_metrics argument instead, for example as sketched below.
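A minimal sketch (everything else unchanged from the code above):

import pandas as pd

model.compile(loss='binary_crossentropy', optimizer='adam', weighted_metrics=['accuracy'])
model.fit(x_train.iloc, y_train, sample_weight=pd.Series(w_train))

The reported accuracy is then computed with the same per-sample weights that are applied to the loss.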
I am trying to use an LSTM for my time-series classification problem as follows. My dataset has about 2000 data points, and each data point is a time series of length 25 with 4 features.
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, input_shape=(25, 4)))
model.add(Dense(50))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
However, the LSTM model works very poorly and gives me very low results. While this is disappointing, I suspect the LSTM performs poorly because it is unable to capture some important characteristics of the time series.
In that case, I am wondering if it is possible to feed some handcrafted features to the model along with the time series. If so, please let me know how to do it.
I am happy to provide more details if needed.
EDIT:
I am wondering if it is possible to use Keras's functional API in this regard, so that I can use my features as a separate input.
An LSTM model takes a 3-dimensional tensor as input, with dimensions (batch-size, time-length, num-features).
To answer your question, you will have to concatenate those hand-crafted features with the four raw features that you have, possibly normalizing them to bring everything to the same scale, and pass a (batch-size, time-length, 4+x) tensor as input to the LSTM model, as sketched below.
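A minimal sketch of that idea (my own illustration; the raw, hand and labels arrays are assumptions, not names from the question). If the handcrafted features are static per sample, they can be tiled along the time axis and concatenated with the raw channels:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# raw: (n_samples, 25, 4) time series, hand: (n_samples, x) static handcrafted features
hand_tiled = np.repeat(hand[:, np.newaxis, :], raw.shape[1], axis=1)   # (n_samples, 25, x)
combined = np.concatenate([raw, hand_tiled], axis=-1)                  # (n_samples, 25, 4 + x)

model = Sequential()
model.add(LSTM(100, input_shape=(combined.shape[1], combined.shape[2])))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(combined, labels, epochs=10, batch_size=32)

Alternatively, the functional API you mention in the edit would let you keep the handcrafted features as a second input and concatenate them with the LSTM output before the dense layers.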
I am trying to train a fully convolutional network on 3-D data (I use Conv3D). First, some context: the input is a 3-D matrix that represents the density map of a protein, and the output is a 3-D matrix where the locations of C-alpha atoms are labeled as 1 and the rest as 0. As expected, this leads to a massive class imbalance, so I implemented a custom cross-entropy cost function that focuses the model on class = 1, as shown below:
custom cost function
These maps tend to be large, so the training time grows rapidly with even a slight increase in map dimension, or if I make my network a little deeper. In addition, a large part of each map is empty space, but I have to keep it to preserve the real distances between different C-alpha locations. To work around this, I split each map into smaller boxes of dimension (5,5,5). The benefit of this approach is that I get to ignore the empty space, which significantly reduces the amount of memory and computation needed for training.
The problem I have now is that I get NaN in the training loss, and training is terminated, as shown below:
Network training behavior
this is the network I am using:
from keras.models import Sequential
from keras.layers import Conv3D

model = Sequential()
model.add(Conv3D(15, kernel_size=(3,3,3), activation='relu', input_shape=(5,5,5,1), padding='same'))
model.add(Conv3D(30, kernel_size=(3,3,3), activation='relu', padding='same'))
model.add(Conv3D(60, kernel_size=(3,3,3), activation='relu', padding='same'))
model.add(Conv3D(30, kernel_size=(3,3,3), activation='relu', padding='same'))
model.add(Conv3D(15, kernel_size=(3,3,3), activation='relu', padding='same'))
model.add(Conv3D(1, kernel_size=(3,3,3), activation='sigmoid', padding='same'))
model.compile(loss=weighted_cross_entropp_loss, optimizer='nadam', metrics=['accuracy'])

############# model training ######################################
model.fit(x_train, y_train, batch_size=32, epochs=epochs, verbose=1, validation_split=0.2, shuffle=True,
          callbacks=[stop_immediately, save_best_model, stop_here_please])
model.save('my_map_model_weighted_custom_box_5.h5')
Can anyone please help me? I have been working on this problem for many weeks.
Regards
So I asked this question some time ago, and since then I have been trying to find a solution to it. I think I managed to find one, so I thought I should share it. But before I do, I want to mention something I read that I think points to the core of my problem. The saying goes roughly like this: "if a hyperparameter works for someone else, it does not necessarily work for you. If you are trying to solve a new problem using a new architecture, then you need new hyperparameters", or something along those lines.
Sadly, in Keras (which is a good tool, by the way, if used within its expected domain) it is stated that the optimizer parameters should be kept at their defaults, which is wrong in this case. Now to the solution.
The short answer:
The learning rate was too high.
The long answer:
Here I detail how to figure that out. If you get a NaN within the first 100 iterations, that is a straight indication that the problem is a too-high learning rate. However, if you get it after the first 100 iterations, then the problem can be one of two things:
If you are using an RCNN, then you have to use gradient clipping.
If you have a custom layer (I had a custom cost function), then it is likely the reason. An important thing to keep in mind is that there is a difference between a correct implementation and a stable implementation. You can implement your custom layer correctly, yet there are some tricks you need to add to make it numerically stable, and that is what was missing from my custom layer. Both points are illustrated in the sketch below.
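As an illustration of those two tricks (a sketch only, not the original custom loss; the pos_weight value is an arbitrary example): clip the predictions away from 0 and 1 before taking logs in a custom weighted cross-entropy, lower the learning rate, and clip gradient norms in the optimizer.

from keras import backend as K
from keras.optimizers import Nadam

def stable_weighted_cross_entropy(y_true, y_pred, pos_weight=50.0):
    # clip predictions so log() never sees exactly 0 or 1 (the usual source of NaN)
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    loss = -(pos_weight * y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(loss)

# lower the learning rate and clip gradient norms
opt = Nadam(lr=1e-4, clipnorm=1.0)
model.compile(loss=stable_weighted_cross_entropy, optimizer=opt, metrics=['accuracy'])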
I am training an image classifier with 2 classes and 53k images, and validating it with 1.3k images, using Keras. Here is the structure of the neural network:
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Training accuracy increases from ~50% to ~85% in the first epoch, with 85% validation accuracy. Subsequent epochs increase the training accuracy consistently, however, validation accuracy stays in the 80-90% region.
I'm curious: is it possible to get high validation and training accuracy in the first epoch? My understanding was that accuracy starts small and increases steadily with each passing epoch.
Thanks
EDIT : The image size is 150x150 after rescaling and the mini-batch size is 16.
Yes, it is entirely possible to get high accuracy on first epoch and then only modest improvements.
If there is enough redundancy in the data and you make enough updates in the first epoch (relative to the complexity of your model, which seems fairly easy to optimize), i.e. you use small minibatches, it is entirely possible that you learn most of the important stuff during the first epoch. When you show the data again, the model will start overfitting to peculiarities introduced by the specific images in your train set (thus you get increasing training accuracy), but since you do not provide any novel samples, it will not learn anything new about the underlying properties of your classes.
You can think of your training data as an infinite stream (which is actually what SGD would like, in order to enjoy all the convergence theorems). Do you think that you need more than 50k samples to learn what is important? You can actually test the data-hunger of your model by providing less data, or by reporting performance after some sub-epoch number of updates, as sketched below.
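A minimal sketch of that data-hunger check (my own illustration; x_train, y_train, x_val, y_val and the build_model() factory are assumptions, not names from the question):

# train on increasing fractions of the data and see where validation accuracy saturates
for fraction in [0.1, 0.25, 0.5, 1.0]:
    n = int(fraction * len(x_train))
    model = build_model()  # hypothetical factory that rebuilds the network above
    model.fit(x_train[:n], y_train[:n], batch_size=16, epochs=1, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    print(f'{n} samples -> validation accuracy {acc:.3f}')

If accuracy already saturates at a small fraction of the data, the dataset is highly redundant and most of the learning indeed happens early.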
You cannot expect to get an accuracy over 90-95% with image classification using feed forward neural networks.
You need to use another architecture, the Convolutional Neural Network (CNN), which is the state of the art in image recognition.
It is also very easy to build one using Keras, although it is computationally more intensive than this model; a sketch follows below.
If you want to stick with feed-forward layers, the best thing you can do is early stopping, but even that wouldn't give you accuracy over 90%.
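A minimal sketch of such a CNN (my own illustration, assuming the raw 150x150 RGB images mentioned in the edit are fed directly to the network):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])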
Yes, epochs are supposed to fit the data to the model.
Try using 2 neurons in the output layer and one-hot encoding your class labels (see the sketch below).
I have seen a case where I got better results doing that instead of a single binary output.
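A minimal sketch of that change (my own illustration; only the label encoding, output layer, and loss differ from the original model, and y_train here stands for the question's label array):

from keras.utils import to_categorical

# one-hot encode the binary labels: 0 -> [1, 0], 1 -> [0, 1]
y_train_onehot = to_categorical(y_train, num_classes=2)

# ...same hidden layers as above, then a 2-unit softmax output instead of Dense(1, activation='sigmoid')
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, y_train_onehot, epochs=10, batch_size=16)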