Keras evaluate Accuracy w.r.t. a feature - python

I have a dataset that consists of different features, like "gender". The task of the model is to determine if the annual income is above or below 50k.
Let's say I have a trained network that does the classification.
Now I want to see how often the classifier makes false positive and false negative predictions, grouped by the gender feature.
The basic idea is a confusion matrix of sorts, but not class vs. class: class vs. feature value.
The image below illustrates the result I would like to have.

The basic idea is as follows:
1) Make a prediction with the network.
2) Set the predicted values as a new column in your dataset; you now have a new dataset data_new.
Your dataset now has two columns, one for the predicted and one for the true values. You can calculate the overall accuracy by boolean comparison: matching values (predicted 1 with true 1, predicted 0 with true 0) are correct predictions, while mismatches are false positives or false negatives respectively.
3) Now you can filter the new data on any column you want, in my case the specific gender.
4) Now you can calculate the accuracy w.r.t. the chosen gender (see the sketch below).
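A minimal sketch of steps 1)-4) in pandas, assuming a DataFrame df with the true labels in an income column (1 = above 50k), a gender column, a feature matrix X, and a trained Keras model model (these names are placeholders, not from the original post):

# 1) Predict with the network (sigmoid output, thresholded at 0.5)
pred = (model.predict(X) > 0.5).astype(int).ravel()

# 2) Add the predictions as a new column next to the true labels
data_new = df.copy()
data_new['predicted'] = pred
data_new['correct'] = data_new['predicted'] == data_new['income']
overall_accuracy = data_new['correct'].mean()

# 3) + 4) Filter by gender and compute accuracy / FP / FN per group
for gender, group in data_new.groupby('gender'):
    fp = ((group['predicted'] == 1) & (group['income'] == 0)).sum()
    fn = ((group['predicted'] == 0) & (group['income'] == 1)).sum()
    print(gender, 'accuracy:', group['correct'].mean(), 'FP:', fp, 'FN:', fn)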

Related

How to predict on test set in matrix factorization collaborative filtering

Say I have a df that looks like this:

   userId  movieId  rating
0       1       31       1
1       1       34       5
2       1      742       2
3       1     1013       4
4       2       31       1
...
I've split the data using stratified sampling to keep the same users in both the train and test sets.
When training on the train set I would usually initialize embedding matrices for users and movies and learn them with SGD.
After the two matrices are learned, say P and Q, I take dot_product(P_i, Q.T_j) to get the prediction for the (i, j)-th position of the rating matrix.
Since P and Q are learned embeddings, it seems correct to use them to predict on the validation set. However, simply computing validation_dataset - dot_product(P, Q) doesn't make sense because the train and validation sets have different shapes.
One way to do this is to take known ratings out of the original dataset and keep them as a validation set. However, I am wondering if there is a way to split the data first and then apply the learned embeddings to predict the test set (this seems more intuitive to me, but I don't know how to do it...).
The most widely accepted way to calculate test-set performance for collaborative-filtering systems is to keep some number of known user-item interactions separate, in the form of a test set, and exclude those interactions from the training set used to train the model.
After training, for each pair of user u and item i in the test set, we compute the model's predicted interaction score for u and i and compare it with the known interaction score, which is either 1 or 0 (0 when negative sampling is used). This is how the test-set performance metrics are computed.
If you use the model's predicted ratings/scores to create new data points for the test set, the result may not reflect the true generalization performance of the model on completely unseen data. Let me know if that answers your question, or if any clarification is needed.
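For example, with learned embedding matrices P (users x factors) and Q (items x factors), you can score exactly the (user, item) pairs that appear in the test split instead of reconstructing the full rating matrix. A sketch, assuming test_df has the userId/movieId/rating columns shown above and that user_idx / item_idx are the mappings from raw IDs to embedding rows built on the train split (both mappings are placeholders):

import numpy as np

# P: (n_users, n_factors), Q: (n_items, n_factors), learned on the train split
u = test_df['userId'].map(user_idx).values
i = test_df['movieId'].map(item_idx).values

preds = np.sum(P[u] * Q[i], axis=1)        # dot product per (user, item) pair
rmse = np.sqrt(np.mean((test_df['rating'].values - preds) ** 2))
print('test RMSE:', rmse)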

Unexpected Number of Weights in tf.keras Sequential Model

I have a question about the predictive power of each feature, so I need a way to evaluate how strong each feature is in the final model. My feature_layer contains two indicator_columns wrapped around categorical_column_with_vocabulary_lists for categorical data, an indicator_column wrapped around a cross between two bucketized numerical columns for latitude/longitude data, and five numeric columns.
I would expect the finished model to have 15 weights: 2 for the latitude and longitude, 5 for the numeric columns, and 5 and 3 for each of the categorical columns using one-hot encoding. However, len(model.get_weights()[0]) returns 513. I suspect the latitude and longitude have many more weights, since a cross between two bucketized columns ends up being a sparse categorical feature with a high enough resolution. However, assuming this is true, I still don't know how to interpret the weights returned by model.get_weights()[0].
I found out that the answer has to do with the hash_bucket_size argument in the crossed_column. Each of those hashes gets a weight of its own in the final model. The 513 weights were a result of the 13 weights from every other feature and the 500 hashes for the crossed latitude/longitude.
In terms of interpreting the weights, I am under the assumption that the weights of the model remain in the order that I added features to the feature_layer.
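A sketch of how that count can arise, assuming hash_bucket_size=500 and vocabularies of size 5 and 3; all column names below are placeholders, and the tf.feature_column / DenseFeatures API is the one the question describes:

import tensorflow as tf

lat = tf.feature_column.numeric_column('latitude')
lon = tf.feature_column.numeric_column('longitude')
lat_b = tf.feature_column.bucketized_column(lat, boundaries=list(range(-90, 91, 10)))
lon_b = tf.feature_column.bucketized_column(lon, boundaries=list(range(-180, 181, 10)))
cross = tf.feature_column.crossed_column([lat_b, lon_b], hash_bucket_size=500)

cat_a = tf.feature_column.categorical_column_with_vocabulary_list('cat_a', ['a', 'b', 'c', 'd', 'e'])
cat_b = tf.feature_column.categorical_column_with_vocabulary_list('cat_b', ['x', 'y', 'z'])

feature_columns = [
    tf.feature_column.indicator_column(cross),   # 500 inputs, one per hash bucket
    tf.feature_column.indicator_column(cat_a),   # 5 inputs (one-hot)
    tf.feature_column.indicator_column(cat_b),   # 3 inputs (one-hot)
] + [tf.feature_column.numeric_column(c) for c in ['n1', 'n2', 'n3', 'n4', 'n5']]  # 5 inputs

model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures(feature_columns),
    tf.keras.layers.Dense(1)
])
# After the model is built/fitted, model.get_weights()[0] is the Dense kernel
# with shape (513, 1): 500 + 5 + 3 + 5 = 513 input weights.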

Using SHAP: Scaling the Shapley values for each model and then averaging across models, or just adding the Shapley values for each model?

I'm running n trials on a Keras model with k features, after which I apply SHAP's DeepExplainer to the model on each trial. All the data is the same, but it is randomly split between the training and testing sets. I'm trying to figure out the best way to combine the model outputs: either directly, by adding the Shapley values for each trial feature by feature and then averaging, or by scaling the Shapley values output by each trial first and then adding and averaging them.
My initial thought was that, as the "baseline is always relative based on the average of all predictions" (from here), the overall average would be skewed and there might be a better way of combining the data. Though I wonder if, despite the different samples in the train/test split and a different relative "baseline" for each model, averaging over many models would give a final averaged model with as much interpretive value as a single model. Should this be the case?
However, would scaling the features per model offer any benefit? Again from here, I can (save for the caveats) scale a feature's Shapley values for a single observation in a model. It seems then that I should be able to scale each feature's Shapley values after summing over all observations, over each bin, such that all Shapley values for each feature sum to 1. If it is the case that I can scale by feature within a model, can I average the models this way? I am thinking a benefit of this is that all models will then have equal weight, since the features are scaled within each. Is this a valid approach, and if so, does it offer any benefit over adding all the Shapley values feature by feature over all models?
To be clear on what I mean concerning the bins: they are the lists returned from the explainer, equal in number to the number of classifications:
explainer = shap.DeepExplainer(model, X_train)
ShapleyBinVals = explainer.shap_values(X_test)
Bin = ShapleyBinVals[n]
where n indexes the output classification (ShapleyBinVals has one entry per class). Here's a bar plot of the scaled output:
Notice that for each feature, e.g. PSWQ_2, the y-value is a percentage and the sum of percentages over all bins is 1.
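For reference, here is a sketch of the two aggregation options being compared, assuming shap_values_per_trial is a list holding the output of explainer.shap_values(X_test) for each trial and n is the bin (class) index; the helper sums absolute Shapley values over observations (these names are placeholders):

import numpy as np

def feature_totals(shap_values):
    # Sum |Shapley value| over observations for one bin -> one value per feature
    return np.abs(shap_values).sum(axis=0)

# Option A: average the raw per-feature totals across trials (for bin n)
raw = np.mean([feature_totals(trial[n]) for trial in shap_values_per_trial], axis=0)

# Option B: scale each trial so its feature totals sum to 1, then average
scaled = np.mean(
    [feature_totals(trial[n]) / feature_totals(trial[n]).sum()
     for trial in shap_values_per_trial],
    axis=0)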

Keras time series, how to predict the next time period

I am using Keras on some data. Here are the details:
8,000 customers, each with a varying number of time steps ranging from 2 to 41, so I am using zero padding to ensure all customers have 41 time steps. All 8,000 customers have 2 features, and the data comes with multiclass labels, 0-4. Each timestep has a label.
After training the model, in the test part of the process I'd like to feed in the features and labels for timesteps 1-40, then have it predict the label at the 41st timestep. Does anyone know if this is possible? I've found Keras to be somewhat of a black box in interpreting what it is actually predicting (e.g. when it gives an accuracy score, what is this the accuracy of? What is it trying to predict: the last timestep's label or all timestep labels?).
Is there a particular sub-type of model that should be used within sequential Keras LSTM models? I've read 'A many-to-one model (f(...)) produces one output (y(t)) value after receiving multiple input values (X(t), X(t+1), ...).' (Brownlee 2017). However, this doesn't seem to accommodate the fact that my input is Xt and Yt for all time steps except the last one, which I want to predict. I'm not sure how to set up my code to instruct the model to predict the last timestep (which I have the data for, so that I can compare the predicted category with the actual category).
To predict the next timestep for each feature you would want your final Dense layer to be the same width as the number of features:
model.add(Dense(n_features))
There's a good example of a similar problem here under Multiple Parallel Series
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/
The accuracy is just a metric for measuring the effectiveness of your model. For accuracy, it's correct_predictions / total_predictions
https://keras.io/metrics/
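For the original question (feed the features for timesteps 1-40 and predict the class label at timestep 41), a many-to-one classification setup is one possibility. This is only a sketch under stated assumptions (40 input timesteps, 2 features, 5 classes, the final step's label held out as the target); the layer size is a placeholder:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking, LSTM, Dense

n_timesteps, n_features, n_classes = 40, 2, 5   # feed steps 1-40, predict step 41

model = Sequential([
    Masking(mask_value=0.0, input_shape=(n_timesteps, n_features)),  # skip zero padding
    LSTM(64),                                # many-to-one: only the last state is returned
    Dense(n_classes, activation='softmax')   # class probabilities for timestep 41
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# X: (n_customers, 40, 2) padded sequences; y: (n_customers,) label at timestep 41
# model.fit(X, y, epochs=10, validation_split=0.2)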

Data missing imputation with autoencoder on small set of data

I have to model an ANN to predict the level of consumer complaints with respect to the in-process parameters of the production chain for my master's thesis. Unfortunately, the firm gave me data collected in an unregulated way, and there is a lot of missing data. It's about a year of data grouped by open day, so I have 17 columns of physical values for 260 days. To infer the missing values, I tried to model a denoising autoencoder, but it doesn't give good results. For training the model, I have only 113 days with complete data. The values are real-valued, with different units and ranges (some are in the range (100, 150) and others in (90.03, 90.35)).
To simulate noise, and since the missing-data mechanism is Not Random At All, I modify a value under this condition (Random.random()
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adam

def DAE(train, l1, l2, num_layer):   # l1, l2 are currently unused
    input_size = train.shape[1]
    theta = 1  # int(input_size / num_layer)
    code_size = input_size - theta * (num_layer + 1)
    lrr = 0.01

    autoencoder = Sequential()
    # Input layer with L2 weight and L1 activity regularization
    autoencoder.add(Dense(input_size, input_shape=(input_size,),
                          kernel_regularizer=regularizers.l2(0.01),
                          activity_regularizer=regularizers.l1(0.01)))
    # Encoder: each layer shrinks by theta units
    for index in range(num_layer):
        layer_size = input_size - (index + 1) * theta
        autoencoder.add(Dense(layer_size, activation='linear'))
        print(layer_size)
    # Bottleneck
    autoencoder.add(Dense(code_size, activation='linear'))
    print(code_size)
    # Decoder: each layer grows back by theta units
    for index in range(num_layer):
        layer_size = input_size - (num_layer - index) * theta
        autoencoder.add(Dense(layer_size, activation='linear'))
        print(layer_size)
    # Output layer reconstructs the input
    autoencoder.add(Dense(input_size, activation='linear'))
    autoencoder.compile(Adam(learning_rate=lrr), loss='mean_squared_error', metrics=['accuracy'])
    return autoencoder

autoencoder = DAE(AE_train, l1, l2, 3)
history = autoencoder.fit(AE_train, AE_target, epochs=1000, validation_split=0.2)
On the train and test loss plots, it converges very fast, but after a certain number of epochs a large peak appears, followed by a roughly logarithmic decay. I don't understand why it rises.
When I try to predict the missing values, I replace the NaNs with the mean of the column. The predictions are always outside the min-max range of the specific physical values.
So here are my questions: how can I deal with missing data in a small set of values? Here I have values with different units; should I normalize them? But if I do that, how do I reconstruct them, since I want to infer real values? Is there a better ML technique than an autoencoder for missing-data imputation?
Thanks for reading my problem, and even more for bringing me an answer.
Loss plot for test and train sets
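On the normalization question: one common approach is to scale each column to a common range before training and invert the scaling on the reconstructed output, so the imputed values come back in physical units. A sketch, assuming scikit-learn's MinMaxScaler, the AE_train and autoencoder objects above, and data_with_nans as a placeholder for the rows to be imputed:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                       # per-column scaling to [0, 1]
AE_train_scaled = scaler.fit_transform(AE_train)

# ... train the autoencoder on the scaled data ...

# To impute: fill NaNs with the (scaled) column means, reconstruct, then invert
X = scaler.transform(data_with_nans)
col_means = np.nanmean(AE_train_scaled, axis=0)
X = np.where(np.isnan(X), col_means, X)
reconstructed = autoencoder.predict(X)
imputed = scaler.inverse_transform(reconstructed)   # back to the original units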
