I am currently learning keras. My goal is to create a simple model, that predicts values of a function. At first I create two arrays, one for the X-Values and one for the corresponding Y-Values.
# declare and init arrays for training-data
X = np.arange(0.0, 10.0, 0.05)
Y = np.empty(shape=0, dtype=float)
# Calculate Y-Values
for x in X:
Y = np.append(Y, float(0.05*(15.72807*x - 7.273893*x**2 + 1.4912*x**3 - 0.1384615*x**4 + 0.00474359*x**5)))
Then I create and train the model
# model architecture
model = Sequential()
model.add(Dense(1, input_shape=(1,)))
model.add(Dense(5))
model.add(Dense(1, activation='linear'))
# compile model
model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['accuracy'])
# train model
model.fit(X, Y, epochs=150, batch_size=10)
and predict the values using the model
# declare and init arrays for prediction
YPredict = np.empty(shape=0, dtype=float)
# Predict Y
YPredict = model.predict(X)
# plot training-data and prediction
plt.plot(X, Y, 'C0')
plt.plot(X, YPredict, 'C1')
# show graph
plt.show()
and I get this output (blue is training-data, orange is prediction):
What did I do wrong? I guess it's a fundamental problem with the network-architecture, right?
The problem is indeed with your network architecture. Specifically, you are using linear activations in all layers: this means that the network can only fit linear functions. You should keep a linear activation in the output layer, but you should use a ReLU activation in the hidden layer:
model.add(Dense(1, input_shape=(1,)))
model.add(Dense(5, activation='relu'))
model.add(Dense(1, activation='linear'))
Then, play with the number/size of the hidden layers; I suggest you use a couple more.
On top of the answer provided by BlackBear:
You should normalize both your inputs X and your outputs Y before feeding them into your neural network:
# Feature Scaling (ignore possible warnings due to conversion of integers to floats)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X)
sc_Y = StandardScaler()
Y_train = sc_Y.fit_transform(Y)
# [...]
model.fit(X_train, Y_train, ...)
See this answer to see what happens if you don't, in a regression setting very similar to yours. Keep in mind that you should similarly scale any test data using sc_X; also, if you need later to scale any predictions produced by the model back to the original scale of your Y, you should use
sc_Y.inverse_transform(predictions)
Accuracy has no meaning in a regression setting like yours; you should remove metrics=['accuracy'] from your model compilation (loss itself is enough as a metric here)
Related
I have a table with 1799 users and 31 features which are arranged in rows and columns respectively. The last column is a 2-type condition feature that tells the model which condition the users belong to. I understood that by using LSTM I need to make my input to be 3-d. So, I used reshape(31,1) as I don't have time series data. I also understood that input_shape took in the number of features. My issue is that I want to predict a new set of users who also have the same 30 features and give me a classification result about which user belongs to which condition. It would be better if the result can tell me what is the probability of each of the conditions predicted. So, I tried to use model.predict to do the mentioned tasks. It gave me a result of a numpy array predict_prob with a shape=(200, 31, 1). I am confused at the part that the data structure should be [(31x1)x200] and the output should be the conditions of the users which should be (200,). How come the result is in 3-d and how should I convert it to dataframe format so that I can read it in .csv format? Thank you in advance.
X = raw_data[feature_names]
P = predict_data_raw[feature_names]
P1 = predict_data_raw[feature_names1]
#Training
y = raw_data['Conditions']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=22, test_size=0.1)
X_test = np.expand_dims(X_test, axis=2)
# fit and evaluate a model
model = Sequential()
model.add(Reshape((31,1)))
model.add(Bidirectional(LSTM(10, return_sequences=True),input_shape=(31,)))
model.add(Dropout(0.5))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
LSTM = model.fit(X_train, y_train, epochs=5, batch_size=10)
# evaluate the keras model
_, accuracy = model.evaluate(X_test)
print('Accuracy: %.2f' % (accuracy*100))
predict_prob=model.predict([X_test])
df = pd.DataFrame(predict_prob, columns=["Prediction"])
I am trying to use GloVe embeddings to train a cnn model based on this article (also a rnn, which has this issue). The dataset is a labeled data: text (tweets) with labels (hate, offensive or neither).
The problem is that model performs well on train set but poorly on validation set.
here is the model:
kernel_size = 2
filters = 256
pool_size = 2
gru_node = 64
model = Sequential()
model.add(Embedding(len(word_index) + 1,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=True))
model.add(Dropout(0.25))
model.add(Conv1D(filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(filters, kernel_size, activation='softmax'))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
model.add(LSTM(gru_node, recurrent_dropout=0.2))
model.add(Dense(1024,activation='relu'))
model.add(Dense(nclasses))
model.add(Activation('softmax'))
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
fitting the model:
X = df.tweet
y = df['classifi'] # classes 0,1,2
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, shuffle=False)
X_train_Glove,X_test_Glove, word_index,embeddings_index = loadData_Tokenizer(X_train,X_test)
model_RCNN = Build_Model_RCNN_Text(word_index,embeddings_index, 20)
model_RCNN.fit(X_train_Glove, y_train,validation_data=(X_test_Glove, y_test),
epochs=15,batch_size=128,verbose=2)
predicted = model_RCNN.predict(X_test_Glove)
predicted = np.argmax(predicted, axis=1)
print(metrics.classification_report(y_test, predicted))
this is what the distribution looks like (0:hate, 1:offensive, 2:neither)
model summary
Results:
classification report
is this the correct approach or am I missing something here
Generally speaking there are two sides that you can tackle overfitting:
Improving the data
More unique data
oversampling (to balance data)
Limiting the network structure
Dropout (You've implemented this)
Less parameters (You might want to benchmark against a much smaller network)
regularization (ex. L1 and L2)
I'd suggest trying with significantly fewer parameters (because this is quick) and oversampling (because your data seems lopsided).
Also, You can also try hyperparameter fitting. Making a large number of networks with different parameters than picking the best one.
Note: if you do hyper parameter fitting make sure to have an extra validation set because you can easily overfit your test set this way.
Side note: Sometimes when troubleshooting NN it is helpful to set the optimizer to a basic stochastic gradient descent. It slows the training down a bunch but makes the progression much clearer.
Good luck!
I'm still a beginner to keras and playing around with it.
My current goal is to make a model learn a distribution. For this I have chosen the numpy beta distribution function.
My aim was to make the model learn the beta distribution and tell if a value would be inside it or not.
So I made a csv with 5000 values of beta/rect values, which the model should learn from.
But when the model is learning there is absolutely no change in it. It seems I have a wrong approach to my problem or it can't be solved this way.
I've tried changing the model, but that doesn't seem to work.
data_size = 5000
X = np.zeros((data_size, 2))
Y = np.zeros((data_size, 1))
for i in range(np.size(X, 0)):
X[i][0] = np.random.beta(2, 2)
X[i][1] = np.random.random()
Y = X[i][0]
np.savetxt('\values.csv', X, delimiter=',')
dataset = np.loadtxt('\values.csv', delimiter=',')
X_train = dataset[:, 0:2]
Y_train = dataset[:, 1]
model = Sequential()
model.add(Dense(32, input_dim=2, activation='tanh'))
model.add(Dense(16, activation='tanh'))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=500, batch_size=50, verbose=1, validation_split=0.2)
I've changed to a GAN.
The discriminator takes a distribution as inpu while the generator learns to reproduce it.
Works like a miracle and needs just a few epochs to converge.
I am trying to get into machine learning with Keras.
I am not a Mathematician and I have only a basic understanding of how neural net-works (haha get it?), so go easy on me.
This is my current code:
from keras.utils import plot_model
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# split into input (X) and output (Y) variables
X = []
Y = []
count = 0
while count < 10000:
count += 1
X += [count / 10000]
numpy.random.seed(count)
#Y += [numpy.random.randint(1, 101) / 100]
Y += [(count + 1) / 100]
print(str(X) + ' ' + str(Y))
# create model
model = Sequential()
model.add(Dense(50, input_dim=1, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(50, kernel_initializer = 'uniform', activation='relu'))
model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
# Compile model
opt = optimizers.SGD(lr=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=100)
# evaluate the model
scores = model.evaluate(X, Y)
predictions = model.predict(X)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
print (str(predictions))
##plot_model(model, to_file='C:/Users/Markus/Desktop/model.png')
The accuracy stays zero and the predictions are an array of 1's. What am I doing wrong?
From what I can see you are trying to solve a regression problem (floating point function output) rather than a classification problem (one hot vector style output/put input into categories).
Your sigmoid final layer will only give an output between 0 and 1, which clearly limits your NNs ability to predict the desired range of Y values which go up much higher. Your NN is trying to get as close as it can, but you are limiting it! Sigmoids in the output layer are good for single class yes/no output, but not regression.
So, you want your last layer to have a linear activation where the inputs are just summed. Something like this instead of the sigmoid.
model.add(Dense(1, kernel_initializer='lecun_normal', activation='linear'))
Then it will likely work, at least if the learning rate is low enough.
Google "keras regression" for useful links.
Looks like you are attempting to do binary classification, with a binary_crossentropy loss function. However, the class labels Y are floats. The labels should be 0 or 1. So the biggest problem lies in the input data you are feeding the model for training.
You can try some data that makes more sense, for example two classes where data are sampled from two different normal distributions, and the labels are either 0 or 1 for each observation:
X = np.concatenate([np.random.randn(10000)/2, np.random.randn(10000)/2+1])
Y = np.concatenate([np.zeros(10000), np.ones(10000)])
The model should be able to go somewhere with this type of data.
I try to create a neural network with keras (backened tensorflow).
I have 4 Input and 2 Output variables:
not available
I want to do predictions to a Testset not available.
This is my Code:
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("trainingsdata.csv", delimiter=";")
X = dataset[:,0:4]
Y = dataset[:,4:6]
model = Sequential()
model.add(Dense(4, input_dim=4, init='uniform', activation='sigmoid'))
model.add(Dense(3, init='uniform', activation='sigmoid'))
model.add(Dense(2, init='uniform', activation='linear'))
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
testset = numpy.loadtxt("testdata.csv", delimiter=";")
Z = testset[:,0:4]
predictions = model.predict(Z)
print(predictions)
When I run the script, the accuracy is 1.000 after every epoch and I get as result always the same output for every input pair:
[-5.83297 68.2967]
[-5.83297 68.2967]
[-5.83297 68.2967]
...
Has anybody an idea what the fault in my code is?
I suggest you normalize / standardize your data before feeding it to your model and then check if your model starts to learn.
Have a look at scikit-learn's StandardScaler.
And look into this SO thread to learn how to correctly fit_transform your training data and only transform your test data.
There is also this tutorial that makes use of scikit-learn's data preprocessing pipeline: http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
Neural networks have a tough time if the scale of the input variables is too different from each other. Having 10, 1000, 100000 as the same inputs causes the gradients to collapse towards whatever the large value is. The other values effectively don't provide any information.
One method is to simply rescale the input variables by a constant. You can simply divide the 206000 by 100000. Try getting all of the variables to be at around the same number of digits. Large numbers are a bit harder than small numbers, for networks.