Model not converging towards distribution - python

I'm still a beginner with Keras and am playing around with it.
My current goal is to make a model learn a distribution. For this I have chosen NumPy's beta distribution function.
My aim was to make the model learn the beta distribution and tell whether a value would be inside it or not.
So I made a CSV with 5000 beta/rect values for the model to learn from.
But while the model is training, absolutely nothing changes. It seems I either have the wrong approach to my problem or it can't be solved this way.
I've tried changing the model, but that doesn't seem to help.
data_size = 5000
X = np.zeros((data_size, 2))
Y = np.zeros((data_size, 1))
for i in range(np.size(X, 0)):
    X[i][0] = np.random.beta(2, 2)
    X[i][1] = np.random.random()
    Y = X[i][0]
np.savetxt('\values.csv', X, delimiter=',')
dataset = np.loadtxt('\values.csv', delimiter=',')
X_train = dataset[:, 0:2]
Y_train = dataset[:, 1]
model = Sequential()
model.add(Dense(32, input_dim=2, activation='tanh'))
model.add(Dense(16, activation='tanh'))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=500, batch_size=50, verbose=1, validation_split=0.2)

I've changed to a GAN.
The discriminator takes a distribution as input while the generator learns to reproduce it.
Works like a miracle and needs just a few epochs to converge.
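A side note on why the original model showed no change at all, independent of the GAN approach: a softmax over a single output unit always evaluates to 1.0, whatever the weights are, so the predictions cannot move during training. The NumPy sketch below only illustrates that property; the logit values are made up for the demonstration and are not taken from the model above.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical pre-activation values of a one-unit output layer for four samples.
logits = np.array([[-3.0], [0.0], [2.5], [10.0]])

# Softmax normalizes across the units of one sample; with one unit the result is constant.
single_unit_out = softmax(logits)
print(single_unit_out.ravel())

# A sigmoid on the same values does depend on the input.
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))
print(sigmoid_out.ravel())
```

For a single-output binary classifier, a sigmoid activation is the usual choice with binary_crossentropy.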

Related

Why I get a very low accuracy with LSTM and pretrained word2vec?

I'm working on a review classification model with only two categories, 0 (negative) and 1 (positive). I'm using Google's pre-trained word2vec with an LSTM. The problem is that I get an accuracy of around 50%, where it should be around 83% according to this paper. I tried many different hyperparameter combinations and still get a horrible accuracy. I also tried changing the data preprocessing techniques and tried stemming, but that hasn't resolved the problem.
Here's my code:
X, y = read_data()
X = np.array(clean_text(X)) #apply data preprocessing
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X)
#converts text to sequence and add padding zeros
sequence = tokenizer.texts_to_sequences(X)
X_data = pad_sequences(sequence, maxlen = length, padding = 'post')
X_train, X_val, y_train, y_val = train_test_split(X_data, y, test_size = 0.2)
#Load the word2vec model
word2vec = KeyedVectors.load_word2vec_format(EMBEDDING_FILE, binary=True)
word_index = tokenizer.word_index
nb_words = min(MAX_NB_WORDS, len(word_index))+1
embedding_matrix = np.zeros((nb_words, EMBEDDING_DIM))
null_words = []
for word, i in word_index.items():
    if word in word2vec.wv.vocab:
        embedding_matrix[i] = word2vec.word_vec(word)
    else:
        null_words.append(word)
embedding_layer = Embedding(embedding_matrix.shape[0], # or len(word_index) + 1
                            embedding_matrix.shape[1], # or EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=701,
                            trainable=False)
model = Sequential()
model.add(embedding_layer)
model.add(LSTM(100))
model.add(Dropout(0.4))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=2, validation_data=(X_val, y_val), workers = -1, verbose=1)
score, acc = model.evaluate(X_val, y_val, batch_size=64)
I also tried other optimizers like AdaMax and the MSLE loss function, and no matter how much I increase the number of epochs or change the batch size, the accuracy never gets better. I'm just so confused: if the problem isn't with the model or the preprocessing, where could it be? Thanks
A few things I noted:
Why do you have trainable=False? It restricts your model so that it cannot fine-tune the embeddings. Learning a problem with a fixed set of embeddings is more difficult than with trainable embeddings. Therefore, try setting trainable=True.
embedding_layer = Embedding(embedding_matrix.shape[0], # or len(word_index) + 1
                            embedding_matrix.shape[1], # or EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=701,
                            trainable=False)
The second problem is that you are using 2 units with sigmoid activation and binary_crossentropy. This combination doesn't work. You have two options.
model = Sequential()
...
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Option 1
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
If you pick this option, note that your labels need to have shape [sample size, 1].
Option 2
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
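The two options expect differently shaped labels, which is easy to get wrong. Here is a small NumPy sketch of both layouts; the label array y is a made-up example, not the asker's data.

```python
import numpy as np

# Hypothetical integer labels: 0 (negative) or 1 (positive).
y = np.array([0, 1, 1, 0, 1])

# Option 1: Dense(1, sigmoid) + binary_crossentropy -> one column of 0/1 values.
y_binary = y.reshape(-1, 1).astype("float32")

# Option 2: Dense(2, softmax) + categorical_crossentropy -> one-hot rows.
y_onehot = np.eye(2, dtype="float32")[y]

print(y_binary.shape)  # (5, 1)
print(y_onehot.shape)  # (5, 2)
```

Keras also ships a to_categorical utility that produces the same one-hot layout as the np.eye indexing used here.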

CNN model overfitting on multi-class classification

I am trying to use GloVe embeddings to train a CNN model based on this article (and also an RNN, which has the same issue). The dataset is labeled data: text (tweets) with labels (hate, offensive or neither).
The problem is that the model performs well on the training set but poorly on the validation set.
Here is the model:
kernel_size = 2
filters = 256
pool_size = 2
gru_node = 64
model = Sequential()
model.add(Embedding(len(word_index) + 1,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=True))
model.add(Dropout(0.25))
model.add(Conv1D(filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, kernel_size, activation='softmax'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, recurrent_dropout=0.2))
model.add(Dense(1024,activation='relu'))
model.add(Dense(nclasses))
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
fitting the model:
X = df.tweet
y = df['classifi'] # classes 0,1,2
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, shuffle=False)
X_train_Glove,X_test_Glove, word_index,embeddings_index = loadData_Tokenizer(X_train,X_test)
model_RCNN = Build_Model_RCNN_Text(word_index,embeddings_index, 20)
model_RCNN.fit(X_train_Glove, y_train,validation_data=(X_test_Glove, y_test),
epochs=15,batch_size=128,verbose=2)
predicted = model_RCNN.predict(X_test_Glove)
predicted = np.argmax(predicted, axis=1)
print(metrics.classification_report(y_test, predicted))
This is what the class distribution looks like (0: hate, 1: offensive, 2: neither): [image not shown]
Model summary: [image not shown]
Results (classification report): [image not shown]
Is this the correct approach, or am I missing something here?
Generally speaking, there are two sides from which you can tackle overfitting:
- Improving the data
  - More unique data
  - Oversampling (to balance the data)
- Limiting the network structure
  - Dropout (you've implemented this)
  - Fewer parameters (you might want to benchmark against a much smaller network)
  - Regularization (e.g. L1 and L2)
I'd suggest trying significantly fewer parameters (because this is quick) and oversampling (because your data seems lopsided).
You can also try hyperparameter tuning: build a large number of networks with different parameters, then pick the best one.
Note: if you do hyperparameter tuning, make sure to have an extra validation set, because you can easily overfit your test set this way.
Side note: Sometimes when troubleshooting NN it is helpful to set the optimizer to a basic stochastic gradient descent. It slows the training down a bunch but makes the progression much clearer.
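The oversampling suggestion could be sketched roughly like this. It is plain random oversampling in NumPy, not a library routine; the function name random_oversample, the class counts and the stand-in feature matrix are all assumptions made for the illustration.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Keep every sample and add duplicates (drawn with replacement)
    until each class is as large as the biggest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)
        picked.append(np.concatenate([idx, extra]))
    # Shuffle so that batches do not contain a single class only.
    picked = rng.permutation(np.concatenate(picked))
    return X[picked], y[picked]

# Hypothetical lopsided labels: 0 (hate) is rare, 1 (offensive) dominates.
y_train = np.array([0] * 5 + [1] * 80 + [2] * 15)
X_train = np.arange(len(y_train)).reshape(-1, 1)  # stand-in for padded token ids

X_bal, y_bal = random_oversample(X_train, y_train)
print(np.bincount(y_bal))  # [80 80 80]
```

Oversample the training split only; the validation and test data should keep their original class distribution so that the reported metrics stay honest.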
Good luck!

Adding prior belief into a neural Network

I am busy with a classification problem with three classes. One of the classes is never predicted. I would like to know if there is any way to inject a prior belief into my neural network, by design or not.
My football prediction model predicts [Draw, Home Win, Away Win]. My classes are pretty balanced (40%, 30%, 30%). The class [Draw], which accounts for 40% of the data, is the one that my NN never predicts. My dataset contains 1900 samples.
I am using a deep NN with 2 to 4 hidden layers.
The code of my best model (based on training/validation loss) is as follows:
X_all = df.copy()
train_cols = ['a_line0','a_line1','a_line2','a_line3','a_line4','a_line5',
'a_line6','a_line7','a_line8','a_line9','a_line10','h_line0',
'h_line1','h_line2','h_line3','h_line4','h_line5','h_line6',
'h_line7','h_line8','h_line9','h_line10','odds0','odds1','odds2']
x = X_all[train_cols]
x_v = x.values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x_v)
x = pd.DataFrame(x_scaled)
y = X_all['result']
ohe = OneHotEncoder(n_values=3,categories='auto')
y = ohe.fit_transform(y.reshape(-1,1))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
for lr,ep in [(0.001,300)]:
    model = Sequential()
    model.add(Dense(25, input_dim=25, activation='relu'))
    model.add(Dense(36, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(12, activation='relu'))
    model.add(Dense(3, activation='sigmoid'))
    adam = kr.optimizers.Adam(lr=lr, decay=1e-6)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=ep, batch_size=10,verbose = 0)
    _, accuracy = model.evaluate(X_test, y_test)
    _, accuracy1 = model.evaluate(X_train, y_train)
    print('Testing Accuracy: %.2f' % (accuracy*100),'Train Accuracy: %.2f' % (accuracy1*100), 'learning rate : ', lr)
I apologise if the code is a bit messy.
My model also overfits by roughly 16 percentage points (52% vs. 68%) with this network configuration.
Since you are in a multi-class single-label setting (i.e. your labels are mutually exclusive), you should not use sigmoid as activation in your final layer; change it to
model.add(Dense(3, activation='softmax'))
Also, dropout should not be used by default; remove it for starters, and only add it if it improves the result.
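To see the difference between the two activations on a three-class output, here is a small NumPy comparison. The logits are invented for the example and the class order [draw, home win, away win] is only an assumption to match the question.

```python
import numpy as np

# Hypothetical final-layer logits for two matches, ordered [draw, home win, away win].
logits = np.array([[0.2, 1.5, -0.3],
                   [1.0, 1.0, 1.0]])

# Sigmoid scores each class independently of the others.
sigmoid = 1.0 / (1.0 + np.exp(-logits))

# Softmax normalizes across the classes of each row.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
softmax = exp / exp.sum(axis=1, keepdims=True)

print(sigmoid.sum(axis=1))  # not 1 in general: three independent scores
print(softmax.sum(axis=1))  # 1 for each row: one distribution over the classes
```

categorical_crossentropy assumes the second kind of output, a probability distribution over mutually exclusive classes.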

Neural network for beginner

I want to create a neural network in Python, using Keras, that tells whether a number is even or odd. I know that this can be done in many ways and that using a NN for it is overkill, but I want to do this for educational purposes.
I'm running into an issue: the accuracy of my model is about 50%, which means that it's unable to tell whether a number is even or odd.
I'll detail the steps I went through, and hopefully we'll find a solution together :)
Step one: creation of the data and labels.
Basically, my data are the numbers from 0 to 99 (in binary) and the labels are 0 (odd) and 1 (even).
for i in range(100):
    string = np.binary_repr(i,8)
    array = []
    for k in string:
        array.append(int(k))
    array = np.array(array)
    labels.append(-1*(i%2 - 1))
Then I'm creating the model, which is made of 3 layers:
- Layer 1 (input): one neuron that takes any numpy array of size 8 (8-bit representation of integers)
- Layer 2 (hidden): two neurons
- Layer 3 (output): one neuron
# creating a model
model = Sequential()
model.add(Dense(1, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
Then I'm compiling the model using binary_crossentropy as the loss function, since I want a binary classification of integers:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Then I'm training the model and evaluating it:
#training
model.fit(data, labels, epochs=10, batch_size=2)
#evaluate the model
scores = model.evaluate(data, labels)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
And that's where I'm lost, because of that 50% accuracy.
I think I misunderstood something about NNs or the Keras implementation, so any help would be appreciated.
Thank you for reading.
Edit: I modified my code according to the comment of Stefan Falk.
The following gives me an accuracy on the test set of 100%:
import numpy as np
# tensorflow.contrib was removed in TensorFlow 2.x; use scikit-learn's splitter and the public Keras API
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Number of samples (digits from 0 to N-1)
N = 10000
# Input size depends on the number of digits
input_size = int(np.log2(N)) + 1
# Generate data
y = list()
X = list()
for i in range(N):
    binary_string = np.binary_repr(i, input_size)
    array = np.zeros(input_size)
    for j, binary in enumerate(binary_string):
        array[j] = int(binary)
    X.append(array)
    y.append(int(i % 2 == 0))
X = np.asarray(X)
y = np.asarray(y)
# Make train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
# Create the model
model = Sequential()
model.add(Dense(2, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
# Train
model.fit(X_train, y_train, epochs=3, batch_size=10)
# Evaluate
print("Evaluating model:")
scores = model.evaluate(X_test, y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Why does it work that well?
Your problem is very simple. The network only needs to know whether the least significant bit (the last element of each input array) is set (1) or not (0). For this you don't actually need a hidden layer or any non-linearities. The problem can be solved with a single linear unit followed by a sigmoid, i.e. logistic regression.
This
model = Sequential()
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
will do the job as well. Further, on the topic of feature engineering,
X = [v % 2 for v in range(N)]
is also enough. You'll see that X in that case will have the same content as y.
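A quick NumPy check of that claim, using a smaller N than above so it runs instantly. The bit the label depends on is the least significant one, which np.binary_repr places at the end of the string, so it is the last column here.

```python
import numpy as np

N = 16
input_size = int(np.log2(N)) + 1  # same width rule as in the code above

# Rows are bit strings, most significant bit first, as produced by np.binary_repr.
X = np.array([[int(b) for b in np.binary_repr(i, input_size)] for i in range(N)])
y = np.array([int(i % 2 == 0) for i in range(N)])  # 1 = even, 0 = odd

# The label is a fixed function of the last column alone.
print(np.array_equal(y, 1 - X[:, -1]))  # True
```

All other columns carry no information about parity, which is why a single unit is enough.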
Maybe try a non-linear example such as XOR. Note that we do not have a test-set here because there's nothing to generalize or any "unseen" data which may surprise the network.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# nb_epoch and predict_proba are gone from recent Keras versions; epochs and predict replace them
model.fit(X, y, batch_size=1, epochs=1000)
print(model.predict(X))
print(model.predict(X) > 0.5)
Look at this link and play around with the example.
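To see why XOR needs a hidden layer, here is a NumPy sketch that fits the best purely linear model by least squares, with no Keras involved. The extra product feature in the second half is a hand-picked stand-in for what a hidden layer could learn; it is an assumption for illustration, not what the network above actually computes.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear fit y ~ w1*x1 + w2*x2 + b in the least-squares sense.
A = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(A @ coef)  # approximately 0.5 for all four inputs

# One hand-picked non-linear feature (x1 * x2) makes the problem linear again.
A_hidden = np.hstack([A, X[:, :1] * X[:, 1:]])
coef_hidden, *_ = np.linalg.lstsq(A_hidden, y, rcond=None)
print(np.round(A_hidden @ coef_hidden, 6))  # approximately [0, 1, 1, 0]
```

The linear model cannot do better than predicting 0.5 everywhere, while a single extra non-linear feature is enough for an exact fit.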

How to predict a function/table using Keras?

I am currently learning Keras. My goal is to create a simple model that predicts the values of a function. First I create two arrays, one for the X values and one for the corresponding Y values.
# declare and init arrays for training-data
X = np.arange(0.0, 10.0, 0.05)
Y = np.empty(shape=0, dtype=float)
# Calculate Y-Values
for x in X:
    Y = np.append(Y, float(0.05*(15.72807*x - 7.273893*x**2 + 1.4912*x**3 - 0.1384615*x**4 + 0.00474359*x**5)))
Then I create and train the model
# model architecture
model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.add(Dense(5))
model.add(Dense(1, activation='linear'))
# compile model
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
# train model
model.fit(X, Y, epochs=150, batch_size=10)
and predict the values using the model
# declare and init arrays for prediction
YPredict = np.empty(shape=0, dtype=float)
# Predict Y
YPredict = model.predict(X)
# plot training-data and prediction
plt.plot(X, Y, 'C0')
plt.plot(X, YPredict, 'C1')
# show graph
plt.show()
and I get this output (blue is training-data, orange is prediction):
What did I do wrong? I guess it's a fundamental problem with the network-architecture, right?
The problem is indeed with your network architecture. Specifically, you are using linear activations in all layers: this means that the network can only fit linear functions. You should keep a linear activation in the output layer, but you should use a ReLU activation in the hidden layer:
model.add(Dense(1, input_shape=(1,)))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='linear'))
Then, play with the number/size of the hidden layers; I suggest you use a couple more.
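The claim about linear activations can be checked numerically. The NumPy sketch below stacks three activation-free layers with the same shapes as the question's model, Dense(1) -> Dense(5) -> Dense(1), using random weights (an assumption, since the trained weights are not available), and shows that they collapse to a single slope and intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights for Dense(1) -> Dense(5) -> Dense(1), all without activation.
W1, b1 = rng.normal(size=(1, 1)), rng.normal(size=1)
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=5)
W3, b3 = rng.normal(size=(5, 1)), rng.normal(size=1)

x = np.arange(0.0, 10.0, 0.05).reshape(-1, 1)
stacked = ((x @ W1 + b1) @ W2 + b2) @ W3 + b3

# The same map written as a single slope and intercept.
slope = W1 @ W2 @ W3
intercept = (b1 @ W2 + b2) @ W3 + b3
collapsed = x @ slope + intercept

print(np.allclose(stacked, collapsed))  # True
```

Whatever the weights are, the output is a straight line in x, so such a network cannot follow a fifth-degree polynomial.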
On top of the answer provided by BlackBear:
You should normalize both your inputs X and your outputs Y before feeding them into your neural network:
# Feature Scaling (ignore possible warnings due to conversion of integers to floats)
from sklearn.preprocessing import StandardScaler
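# StandardScaler expects 2D input of shape (n_samples, n_features); X and Y above are 1D
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X.reshape(-1, 1))
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y.reshape(-1, 1))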
# [...]
model.fit(X_train, Y_train, ...)
See this answer to see what happens if you don't, in a regression setting very similar to yours. Keep in mind that you should similarly scale any test data using sc_X; also, if you need later to scale any predictions produced by the model back to the original scale of your Y, you should use
sc_Y.inverse_transform(predictions)
Finally, accuracy has no meaning in a regression setting like yours; you should remove metrics=['accuracy'] from your model compilation (the loss itself is enough as a metric here).
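For reference, this is roughly what the scaling and its inverse do, written in plain NumPy so that it runs without scikit-learn. It assumes the scaler's default behavior (subtract the mean, divide by the population standard deviation); predictions_scaled is a stand-in for the output of model.predict, not a real prediction.

```python
import numpy as np

X = np.arange(0.0, 10.0, 0.05)
Y = 0.05 * (15.72807 * X - 7.273893 * X**2 + 1.4912 * X**3
            - 0.1384615 * X**4 + 0.00474359 * X**5)

# Column vector, since scalers and Dense layers expect shape (n_samples, n_features).
Y_col = Y.reshape(-1, 1)

# Standardize: zero mean, unit variance per column.
y_mean, y_std = Y_col.mean(axis=0), Y_col.std(axis=0)
Y_scaled = (Y_col - y_mean) / y_std

# Stand-in for model.predict(...) output in the scaled space.
predictions_scaled = Y_scaled.copy()

# Undo the scaling to get back to the original units of Y.
predictions = predictions_scaled * y_std + y_mean
print(np.allclose(predictions, Y_col))  # True
```

The mean and standard deviation must come from the training data and be reused unchanged for any test data and for the inverse step.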
