Adding prior belief into a neural network - Python

I am working on a classification problem with three classes, and one of the classes is never predicted. I would like to know if there is any way to inject a prior belief into my neural network, by design or otherwise.
My football prediction model predicts [Draw, Home Win, Away Win]. My classes are fairly balanced (40%, 30%, 30%). The class [Draw], which accounts for 40% of the data, is the one that my NN never predicts. My dataset contains 1900 samples.
I am using a deep NN with 2 to 4 hidden layers.
The code of my best model (based on training/validation loss) is as follows:
X_all = df.copy()
train_cols = ['a_line0','a_line1','a_line2','a_line3','a_line4','a_line5',
'a_line6','a_line7','a_line8','a_line9','a_line10','h_line0',
'h_line1','h_line2','h_line3','h_line4','h_line5','h_line6',
'h_line7','h_line8','h_line9','h_line10','odds0','odds1','odds2']
x = X_all[train_cols]
x_v = x.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x_v)
x = pd.DataFrame(x_scaled)
y = X_all['result']
ohe = OneHotEncoder(n_values=3,categories='auto')
y = ohe.fit_transform(y.reshape(-1,1))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
for lr, ep in [(0.001, 300)]:
    model = Sequential()
    model.add(Dense(25, input_dim=25, activation='relu'))
    model.add(Dense(36, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(3, activation='sigmoid'))
    adam = kr.optimizers.Adam(lr=lr, decay=1e-6)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=ep, batch_size=10, verbose=0)
    _, accuracy = model.evaluate(X_test, y_test)
    _, accuracy1 = model.evaluate(X_train, y_train)
    print('Testing Accuracy: %.2f' % (accuracy*100), 'Train Accuracy: %.2f' % (accuracy1*100), 'learning rate : ', lr)
I apologise if the code is a bit messy.
My model also overfits by about 16 percentage points (52% vs 68%) with this network configuration.

Since you are in a multi-class single-label setting (i.e. your labels are mutually exclusive), you should not use sigmoid as activation in your final layer; change it to
model.add(Dense(3, activation='softmax'))
Also, dropout should not be used by default; remove it for starters, and only add it if it improves the result.
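To illustrate why the final activation matters here, the following is a minimal sketch in plain Python (no Keras), using made-up logits for the three classes. The values in `logits` are hypothetical and only serve to show that sigmoid scores are independent of each other, while softmax yields one probability distribution over the mutually exclusive classes.

```python
import math

def sigmoid(z):
    """Element-wise logistic function; each output ignores the other classes."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    """Softmax with max-subtraction for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores (logits) for [Draw, Home Win, Away Win].
logits = [0.2, 1.5, 1.1]

sig = sigmoid(logits)
soft = softmax(logits)

# Sigmoid outputs do not form a distribution; softmax outputs sum to 1.
print("sigmoid:", [round(p, 3) for p in sig], "sum =", round(sum(sig), 3))
print("softmax:", [round(p, 3) for p in soft], "sum =", round(sum(soft), 3))
```

Since categorical_crossentropy compares the output against a one-hot target, it expects the outputs to behave like the softmax row, not the sigmoid row.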

Related

Deep learning accuracy changes

Every time I change the dataset, I get a different accuracy: sometimes 97%, sometimes 50%, sometimes 92%. This is a text classification task. Why does this happen? The 95% results come from 2 datasets that are the same size and give almost the same result.
#Split DatA
X_train, X_test, label_train, label_test = train_test_split(X, Y, test_size=0.2,random_state=42)
#Size of train and test data:
print("Training:", len(X_train), len(label_train))
print("Testing: ", len(X_test), len(label_test))
#Function defined to test the models in the test set
def test_model(model, epoch_stop):
    model.fit(X_test,
              Y_test,
              epochs=epoch_stop,
              batch_size=batch_size,
              verbose=0)
    results = model.evaluate(X_test, Y_test)
    return results
#############3
maxlen = 300
#Bidirectional LSTM model
embedding_dim = 100
dropout = 0.5
opt = 'adam'
####################
#embed_dim = 128 #dimension of the word embedding vector for each word in a sequence
lstm_out = 196 #no of lstm layers
lstm_model = Sequential()
#Adding dropout
#lstm_model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))##############################
lstm_model = Sequential()
lstm_model.add(layers.Embedding(input_dim=num_words,
                                output_dim=embedding_dim,
                                input_length=X_train.shape[1]))
#lstm_model.add(Bidirectional(LSTM(lstm_out, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
#lstm_model.add(Bidirectional(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2)))
#lstm_model.add(Bidirectional(LSTM(64, return_sequences=True)))
lstm_model.add(Bidirectional(LSTM(64, return_sequences=True)))
lstm_model.add(layers.GlobalMaxPool1D())
#Adding a regularized dense layer
lstm_model.add(layers.Dense(32,kernel_regularizer=regularizers.l2(0.001),activation='relu'))
lstm_model.add(layers.Dropout(0.25))
lstm_model.add(Dense(3,activation='softmax'))
lstm_model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
print(lstm_model.summary())
#TRAINING
history = lstm_model.fit(X_train, label_train,
                         epochs=4,
                         verbose=True,
                         validation_data=(X_test, label_test),
                         batch_size=64)
loss, accuracy = lstm_model.evaluate(X_train, label_train, verbose=True)
print("Training Accuracy: {:.4f}".format(accuracy))
loss_val, accuracy_val = lstm_model.evaluate(X_test, label_test, verbose=True)
print("Testing Accuracy: {:.4f}".format(accuracy_val))
ML models base their predictions on the data they were trained on, so it is only natural that the outcome differs when the training data changes. A different dataset may also perform better with different hyperparameters.

Why do I get a very low accuracy with LSTM and pretrained word2vec?

I'm working on a review classification model with only two categories, 0 (negative) and 1 (positive). I'm using Google's pre-trained word2vec embeddings with an LSTM. The problem is that I get an accuracy of around 50%, where it should be around 83% according to this paper. I tried many different hyperparameter combinations and still get a horrible accuracy. I also tried changing the data preprocessing techniques and tried stemming, but that hasn't resolved the problem.
Here's my code:
X, y = read_data()
X = np.array(clean_text(X)) #apply data preprocessing
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X)
#converts text to sequence and add padding zeros
sequence = tokenizer.texts_to_sequences(X)
X_data = pad_sequences(sequence, maxlen = length, padding = 'post')
X_train, X_val, y_train, y_val = train_test_split(X_data, y, test_size = 0.2)
#Load the word2vec model
word2vec = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)
word_index = tokenizer.word_index
nb_words = min(MAX_NB_WORDS, len(word_index))+1
embedding_matrix = np.zeros((nb_words, EMBEDDING_DIM))
null_words = []
for word, i in word_index.items():
    if word in word2vec.wv.vocab:
        embedding_matrix[i] = word2vec.word_vec(word)
    else:
        null_words.append(word)
embedding_layer = Embedding(embedding_matrix.shape[0], # or len(word_index) + 1
                            embedding_matrix.shape[1], # or EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=701,
                            trainable=False)
model = Sequential()
model.add(embedding_layer)
model.add(LSTM(100))
model.add(Dropout(0.4))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=2, validation_data=(X_val, y_val), workers = -1, verbose=1)
score, acc = model.evaluate(X_val, y_val, batch_size=64)
I also tried other optimizers like AdaMax and the MSLE loss function, and no matter how much I increase the number of epochs or change the batch size, the accuracy never gets better. I'm confused: if the problem isn't with the model or the preprocessing, where could it be? Thanks
A few things I noted:
Why do you have trainable=False? It restricts your model, so that it cannot fine-tune the embeddings. Learning a problem with a fixed set of embeddings is more difficult than with trainable embeddings. Therefore, try setting trainable=True.
embedding_layer = Embedding(embedding_matrix.shape[0], # or len(word_index) + 1
                            embedding_matrix.shape[1], # or EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=701,
                            trainable=True) # changed from trainable=False
The second problem is that you are using 2 units with sigmoid activation and binary_crossentropy, as in your code below. This combination doesn't work. You have two options.
model = Sequential()
...
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Option 1
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
If you pick this option, note that your labels need to have shape [sample size, 1].
Option 2
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
If you pick this option, your labels need to be one-hot encoded, with shape [sample size, 2].
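As a rough sanity check that the two options describe the same kind of model, here is a small sketch in plain Python. The logits `z0` and `z1` are made-up values, not outputs of the model in the question. It shows that the positive-class probability of a 2-unit softmax equals a sigmoid applied to the difference of the two logits, and how integer labels map to the one-hot form that categorical_crossentropy expects.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    """Softmax over two logits, with max-subtraction for stability."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

# Hypothetical logits for class 0 (negative) and class 1 (positive).
z0, z1 = -0.3, 1.2

p_neg, p_pos = softmax2(z0, z1)   # Option 2 style output
p_pos_sigmoid = sigmoid(z1 - z0)  # Option 1 style output on the logit difference

print(round(p_pos, 6), round(p_pos_sigmoid, 6))

# Label layouts: a column of 0/1 for Option 1, one-hot rows for Option 2.
labels = [0, 1, 1, 0]
column = [[y] for y in labels]          # shape [sample size, 1]
one_hot = [[1 - y, y] for y in labels]  # shape [sample size, 2]
```

So the choice between the two options is mostly about how the labels are laid out, not about model capacity.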

Neural network for beginner

I want to create a neural network in Python, using Keras, that tells whether a number is even or odd. I know this can be done in many ways and that using a NN for it is overkill, but I want to do this for educational purposes.
I'm running into an issue: the accuracy of my model is about 50%, which means it is unable to tell whether a number is even or odd.
I'll detail the steps I went through, and hopefully we'll find a solution together :)
Step one: creation of the data and labels.
Basically, my data are the numbers from 0 to 99 (in binary) and the labels are 0 (odd) and 1 (even).
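data, labels = [], []  # initialization not shown in the original snippet
for i in range(100):
    string = np.binary_repr(i, 8)
    array = []
    for k in string:
        array.append(int(k))
    array = np.array(array)
    data.append(array)  # assumed: collect each 8-bit sample
    labels.append(-1*(i%2 - 1))
data = np.array(data)      # assumed: convert to arrays for Keras
labels = np.array(labels)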
Then I'm creating the model, which is made of 3 layers.
-Layer 1 (input): one neuron that takes any numpy array of size 8 (8-bit representation of integers)
-Layer 2 (hidden): two neurons
-Layer 3 (output): one neuron
# creating a model
model = Sequential()
model.add(Dense(1, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
Then I compile the model using binary_crossentropy as the loss function, since I want a binary classification of integers:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Then I'm training the model and evaluating it:
#training
model.fit(data, labels, epochs=10, batch_size=2)
#evaluate the model
scores = model.evaluate(data, labels)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
And that's where I'm lost because of that 50% accuracy.
I think I misunderstood something about NNs or the Keras implementation, so any help would be appreciated.
Thank you for reading
Edit: I modified my code according to the comment of Stefan Falk
The following gives me an accuracy on the test set of 100%:
import numpy as np
# tensorflow.contrib was removed in TensorFlow 2.x; use scikit-learn's split and the public Keras API
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# Number of samples (digits from 0 to N-1)
N = 10000
# Input size depends on the number of digits
input_size = int(np.log2(N)) + 1
# Generate data
y = list()
X = list()
for i in range(N):
    binary_string = np.binary_repr(i, input_size)
    array = np.zeros(input_size)
    for j, binary in enumerate(binary_string):
        array[j] = int(binary)
    X.append(array)
    y.append(int(i % 2 == 0))
X = np.asarray(X)
y = np.asarray(y)
# Make train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
# Create the model
model = Sequential()
model.add(Dense(2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
# Train
model.fit(X_train, y_train, epochs=3, batch_size=10)
# Evaluate
print("Evaluating model:")
scores = model.evaluate(X_test, y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Why does it work that well?
Your problem is very simple. The network only needs to know whether the least significant bit (the last element of each input array) is set (1) or not (0). For this you actually don't need a hidden layer or any non-linearities. The problem can be solved with a single linear unit followed by a sigmoid, i.e. logistic regression.
This
model = Sequential()
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
will do the job as well. Further, on the topic of feature engineering,
X = [v % 2 for v in range(N)]
is also enough. You'll see that X in that case is simply the complement of y (y = 1 - X), since y marks even numbers with 1.
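To make the point about the single relevant bit concrete, here is a small sketch in plain Python (no Keras). It rebuilds the same kind of data as the code above and checks a single sigmoid unit whose weights are chosen by hand. The values in `weights` and `bias` are hypothetical and untrained, picked only so that the unit looks at the last input element and nothing else.

```python
import math

N = 10000
input_size = int(math.log2(N)) + 1  # same input width as in the code above

def to_bits(i, width):
    """Fixed-width binary digits of i, most significant bit first."""
    return [int(c) for c in format(i, "0{}b".format(width))]

X = [to_bits(i, input_size) for i in range(N)]
y = [int(i % 2 == 0) for i in range(N)]

# The label is fully determined by the last input element.
label_is_last_bit_complement = all(t == 1 - b[-1] for b, t in zip(X, y))

# One sigmoid unit: zero weight everywhere except the last bit.
weights = [0.0] * (input_size - 1) + [-10.0]
bias = 5.0

def predict(bits):
    z = bias + sum(w * b for w, b in zip(weights, bits))
    return 1.0 / (1.0 + math.exp(-z))

accuracy = sum((predict(b) > 0.5) == bool(t) for b, t in zip(X, y)) / N
print("accuracy:", accuracy)
```

A trained single-unit model only has to find weights with this general shape, which is why a few epochs are enough.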
Maybe try a non-linear example such as XOR. Note that we do not have a test-set here because there's nothing to generalize or any "unseen" data which may surprise the network.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# 'epochs' replaces the old 'nb_epoch' argument
model.fit(X, y, batch_size=1, epochs=1000)
# predict() returns the sigmoid probabilities; predict_proba was removed from recent Keras versions
print(model.predict(X))
print(model.predict(X) > 0.5)
Look at this link and play around with the example.
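To see why XOR needs a hidden layer, it can help to look at a network small enough to write by hand. The sketch below is plain Python with weights chosen manually, not learned; the function name `xor_net` and its weights are illustrative assumptions. It uses two hidden ReLU units and a linear read-out. No single linear threshold on the two inputs can produce this output pattern, but the hidden units make it easy.

```python
def relu(v):
    return max(0.0, v)

def xor_net(x1, x2):
    """Two hidden ReLU units and a linear read-out, weights set by hand."""
    h1 = relu(x1 + x2)        # grows with the number of active inputs
    h2 = relu(x1 + x2 - 1.0)  # non-zero only when both inputs are active
    return h1 - 2.0 * h2      # cancels h1 exactly when both inputs are active

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [xor_net(a, b) for a, b in inputs]
print(outputs)
```

The Keras model with 5 hidden units above has more than enough capacity to learn an equivalent solution.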

Why is the accuracy and f1 score of this model the same no matter what?

After doing some data wrangling, I have some variables that are either categorical ones labeled 0 or 1, or numerical variables that have been normalized to have values ranging from 0.0 to 1.0. Then here is where I define and run the model:
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(8, input_dim=6, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# Fit the model
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=10, verbose=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold, scoring="f1_macro")
print("\nF1 score: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
However, no matter which features I put into X, or how many of them, the F1 score is always 49.79%.
EDIT: I've also tested with different optimizers, but the result is still the same.

Can I predict a continuous target value in Keras?

This is an example of the data that I have.
The length of df is 1778360.
The search term is the query that people type into a search engine.
CR (Conversion Rate) is a continuous number. It ranges from 0 upward, with no upper limit.
   Search term                       CR
0  asos french connection lined mac  100
1  hugo boss polo black              50
2  women's pale grey trousers uk     47
3  military jacket                   8
4  girls adidas red tracksuit top    0
What I want is to predict the CR with the text as the input.
texts = df['Search term']
tags = df['CR']
num_max = 1000
# preprocess
le = LabelEncoder()
tags = le.fit_transform(tags)
token = Tokenizer(num_words=num_max)
token.fit_on_texts(texts)
mat_texts = token.texts_to_matrix(texts, mode='freq')
print(tags[:5])
print(mat_texts[:5])
print(tags.shape, mat_texts.shape)
# split data to train and test
X_train, X_test, y_train, y_test = train_test_split(mat_texts, tags, train_size=0.8, random_state=1)
# create model
model = Sequential()
model.add(Dense(512, input_dim=num_max, kernel_initializer='normal', activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(100, kernel_initializer='normal', activation='softmax'))
# compile model
model.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
# fit the model
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0, validation_data=(X_test, y_test))
# evaluate the model
train_scores = model.evaluate(X_train, y_train, verbose=0)
print("Train %s: %.2f%%" % (model.metrics_names[1], train_scores[1]*100))
test_scores = model.evaluate(X_test, y_test, verbose=0)
print("Test %s: %.2f%%" % (model.metrics_names[1], test_scores[1]*100))
I got this as result:
Train acc: 82.53%
Test acc: 82.48%
I'm not sure if the last dense layer and the loss function are correct. This is more like a linear regression, but I couldn't find a suitable Keras model for linear regression.
Can somebody help, please? Thanks.
P.S. I'm very new to deep learning and neural networks.
For a regression problem, the last dense layer should have a single unit with 'linear' activation (or 'sigmoid', but only if the target has been scaled to the range 0 to 1), and the loss should be 'mean_squared_error'.
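The difference between the two activations for an unbounded target such as CR can be sketched in plain Python. The `cr` values are the sample rows from the question; the `raw` values are hypothetical outputs of a single-unit final layer, not results from a trained model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# CR values from the sample rows in the question.
cr = [100.0, 50.0, 47.0, 8.0, 0.0]

# Hypothetical raw outputs of a single-unit final layer.
raw = [90.0, 55.0, 40.0, 10.0, 2.0]

linear_pred = raw                         # 'linear': the raw output is the prediction
sigmoid_pred = [sigmoid(v) for v in raw]  # 'sigmoid': squashed into the range 0 to 1

mse_linear = mean_squared_error(cr, linear_pred)
mse_sigmoid = mean_squared_error(cr, sigmoid_pred)

print("MSE with linear output: ", mse_linear)
print("MSE with sigmoid output:", round(mse_sigmoid, 1))
```

With a sigmoid output the prediction can never exceed 1, so unscaled targets like 100 or 50 stay out of reach no matter how long the model trains.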
