I have started some exploration into tensorflow and it's modelling capabilities.
I have a number of normalised 400x500 images stored as numpy arrays.
These are organised as:
180 for training category A,
20 for testing category A,
50 for training category B, and
11 for testing category.
For the moment I am using the introductory model:
model = keras.Sequential([
keras.layers.Flatten(input_shape=(400, 500)),
keras.layers.Dense(12, activation=tf.nn.relu),
keras.layers.Dense(8, activation=tf.nn.sigmoid)
keras.layers.Dense(2, activation=tf.nn.softmax)])
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size = 5)
train_images contains [0-199] category A and [200-249] category B images.
train_labels contains the respective labels.
During execution, accuracy always sits at around 0.78, irrespective of the number of epochs used. Loss also does not change.
0.78 seems to be close to the ratio of imaged between category A and B.
I would appreciate any assistance to help get going.
Thank you.
I think your model is too simple. You have three dense layers with just a few neurons, that does not seem like something that would be able to recognize some complex features in input of size 200000. You should try first a few convolutional layers.
Related
I'm having some issues with the predict function predicting all 0's or all 1's from my model. Here is my model
model = keras.Sequential(
[
layers.BatchNormalization(),
layers.Dense(200, activation="relu"),
layers.Dense(500, activation="relu"),
layers.Dense(1300, activation="relu"),
layers.Dense(2000, activation="relu"),
layers.Dense(1320, activation="relu"),
layers.Dense(710, activation="relu"),
layers.Dense(150, activation="relu"),
layers.Dense(30, activation="relu"),
layers.BatchNormalization(),
layers.Dense(1, activation="sigmoid"),
]
)
model.compile(loss='binary_crossentropy', optimizer=keras.optimizers.Adam(learning_rate=(0.001)), metrics=[metrics])
history = model.fit(training, target, batch_size=2048, epochs=100, shuffle=True, validation_split=0.2)
I'm very new to deep learning and trying to create models to classify and get predictions. My answer is just based on a 0 or 1 which will say if a customer is going to leave or stay as a customer in the long run. I've tested the data for null and NaN.
I've looked at a lot of posts about what this could be, and for the most part it seems that people were using the wrong activation function for a classification instead of regression problem. And the answer was that if you're using binary crossentropy, you should be using sigmoid (Why does a binary Keras CNN always predict 1?). I thought the output of my network would be correct seeing that I am using ReLu and SigMoid with binary crossentropy but whenever I predict, it's persistent in being all 0's or all 1's. The layers might not make too much sense, I'm still very new at this and playing around to see how layers are affecting the results of when I train and evaluate.
Here is roughly how I am using predict with the data
data = pd.read_csv("judge.csv", skiprows=range(0,0))
samples_to_predict = data.drop(['Surname', 'CreditScore', 'Geography', 'Gender', 'Tenure', 'NumOfProducts', 'HasCrCard', 'EstimatedSalary'], axis=1)
prediction = loaded_model.predict(samples_to_predict.values)
print(prediction)
I've been trying to debug this for a while and any help as to which direction to error could be coming from would be welcomed. I've tried increasing my epoch to 1000, I tried lowering my learning_rate, I believe BatchNormalization might take care of not scaling my data(I might be misunderstanding that), tried lowering my batch_size, I tried simply using 3 Dense layers being two ReLu and one Sigmoid, checked that the data I'm predicting is a numpy array and they've all so far produced the same result of predict outputting all 0's or all 1's.
Turns out I was using predict with categories of the data that I had not trained the model with. For example I have a column titled CustomerID that I dropped to train the model, but when I was predicting, I had forgotten to drop that column which made my model predict all 0's or all 1's. After fixing that issue and making sure that I am using only the categories that I trained it with, to predict, got me predictions that were not all 0's or all 1's.
So, i'm trying to create a CNN which can predict if there is any "support devices" in a x-ray thorax image, but when training my model it seems it's not learning anything.
I'm using a dataset called "CheXpert" which has over 200.000 images to use. After doing some "cleaning", the final dataset ended up with 100.000 images.
As far as the model is concerned, i imported the convolutional base of the vgg16 pretrained model and added by my self 2 fully conected layers. Then, i freezed all the convolutional base and make only trainable the fully conected layers. Here's the code:
from keras.layers import GlobalAveragePooling2D
from keras.models import Model
pretrained_model = VGG16(weights='imagenet', include_top=False)
pretrained_model.summary()
for layer in pretrained_model.layers:
layer.trainable = False
x = pretrained_model.output
x = GlobalAveragePooling2D()(x)
dropout = Dropout(0.25)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
x = dropout(x)
x = Dense(1024, activation = 'relu')(x)
predictions = Dense(1, activation='sigmoid')(x)
final_model = Model(inputs=pretrained_model.input, outputs=predictions)
final_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
As far as i know, the normal behavior should be that the accuracy should start low and then grow up with the epochs. But here it only oscillates through the same values (0.93 and 0.95). I'm sorry i cannot upload images to show you the graphs.
To sum up, i want to know if that little variance in the accuracy means that the model is not learning anything.
I have an hypothesis: from all the 100.000 images of the dataset, 95.000 have the label "1" and only 5.000 have the label "0". I think that if diminish the images with "1" equate them with the images with "0" the results would change.
The lack of images labeled "0" doesn't help the CNN for sure. I also suggest to lower the learning rate and play around with the batch size to see if something changes.
I wish it helps.
Because of imbalance training data, I suggest that you can set "class_weight" during the training step. The more data you have, the lower class_weight you set.
class_weight = {0: 1.5, 1: 0.5}
model.fit(X, Y, class_weight=class_weight)
You can check the augment of class_weight in keras document.
class_weight: Optional dictionary mapping class indices (integers) to
a weight (float) value, used for weighting the loss function (during
training only). This can be useful to tell the model to "pay more
attention" to samples from an under-represented class.
I'm learning the simplest neural networks using Dense layers using Keras. I'm trying to implement face recognition on a relatively small dataset (In total ~250 images with 50 images per class).
I've downloaded the images from google images and resized them to 100 * 100 png files. Then I've read those files into a numpy array and also created a one hot label array for training my model.
Here is my code for processing the training data:
X, Y = [], []
feature_map = {
'Alia Bhatt': 0,
'Dipika Padukon': 1,
'Shahrukh khan': 2,
'amitabh bachchan': 3,
'ayushmann khurrana': 4
}
for each_dir in os.listdir('.'):
if os.path.isdir(each_dir):
for each_file in os.listdir(each_dir):
X.append(cv2.imread(os.path.join(each_dir, each_file), -1).reshape(1, -1))
Y.append(feature_map[os.path.basename(each_file).split('-')[0]])
X = np.squeeze(X)
X = X / 255.0 # normalize the training data
Y = np.array(Y)
Y = np.eye(5)[Y]
print (X.shape)
print (Y.shape)
This is printing (244, 40000) and (244, 5). Here is my model:
model = Sequential()
model.add(Dense(8000, input_dim = 40000, activation = 'relu'))
model.add(Dense(1200, activation = 'relu'))
model.add(Dense(700, activation = 'relu'))
model.add(Dense(100, activation = 'relu'))
model.add(Dense(5, activation = 'softmax'))
# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=25, batch_size=15)
When I train the model, It stuck at the accuracy 0.2172, which is almost the same as random predictions (0.20).
I've also tried to train mode with grayscale images but still not getting expected accuracy. Also tried with different network architectures by changing the number of hidden layers and neurons in hidden layers.
What am I missing here? Is my dataset too small? or am I missing any other technical detail?
For more details of code, here is my notebook: https://colab.research.google.com/drive/1hSVirKYO5NFH3VWtXfr1h6y0sxHjI5Ey
Two suggestions I can make:
Your data set is probably too small. If you are splitting training and validation at 80/20, that means you are only training on 200 images, which is probably too small. Try increasing your data set to see if results improve.
I would recommend adding Dropout to each layer of your network as your training set is so small. Your network is most likely over-fitting your training data set since it is so small, and Dropout is an easy way to help avoid this problem.
Let me know if these suggestions make a difference!
I agree that the dataset is too small, 50 instances of each person is probably not enough. You can use data augmentation with the keras ImageDataGenerator method to increase the number of images, and rewrite your numpy reshaping code as a pre-processing function for the generator. I also noticed that you haven't shuffled the data, so the network is likely predicting the first class for everything (which is maybe why the accuracy is near random chance).
If increasing the dataset size doesn't help, you'll probably have to play around with the learning rate for the Adam optimizer.
I have been working on a side project trying to learn machine learning with Keras myself and I think I am stuck here.
My intention is to predict the bike availability of a public sharing system that has 31 stations. For now I am only training my model to predict the availability of one station only. I'd like to do online predictions with batch training. I'd like to start giving it a number of bikes at, for example, 00:00 with N given time steps plus the day of the year and weekday.
The input data is this:
Day of the year, encoded as ints, 1-JAN is 0, 2-JAN is 1...
Time in 5' intervals encoded as ints the same way as before, 00:00 is 0, 00:05 is 1...
Weekday, again encoded as int
Those 3 columns are then normalized, then i add the columns that refer to the bikes, they are one hot encode, if the station has 20 bikes the encoded array will have length 21. The data is then transformed to a supervised problem more or less following this tutorial.
Now I divide my dataset into training (65%) and test (35%) samples. And then define the neural network as this:
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1000, 5, 24), stateful=False))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
Fit the model
for i in range(epochs):
model.fit(train_x, train_y, epochs=1, batch_size=new_batch_size, verbose=2, shuffle=False)
model.reset_states()
w = model.get_weights()
Accuracy plot looks good but the loss one does weird things.
Once the training finishes I predict values, I change from stateless to stateful and modify the batch size
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1, 5, 24), stateful=True))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
model.set_weights(w)
I now predict using the test values I got from before
for i in range(0, len(test_x)):
auxx = test_x[i].reshape((batch_size, test_x[i].shape[0], test_x[i].shape[1])) # (...,n_in,4)
yhat = model.predict(auxx, batch_size = batch_size)
This is the result, I am zooming it a bit to get a closer look and not a crowded plot. It doesn't look bad at all, it has some errors but overall the predictions looks good enough.
After this I create my set of data to do the online prediction and predict
for i in range(0,290):
# ...
predicted_bikes = model.predict(data_to_feed, batch_size = 1)
# ...
The result is this one, a continuous line.
As I've seen in the previous plot the predicted value is moved like an interval later to the real value which makes me think that the neural network has learnt to repeat the previous values. That's why here I got a straight line.
I am new to neural networks and have two, probably pretty basic, questions. I am setting up a generic LSTM Network to predict the future of sequence, based on multiple Features.
My training data is therefore of the shape (number of training sequences, length of each sequence, amount of features for each timestep).
Or to make it more specific, something like (2000, 10, 3).
I try to predict the value of one feature, not of all three.
Problem:
If I make my Network deeper and/or wider, the only output I get is the constant mean of the values to be predicted. Take this setup for example:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z0)
z = LSTM(32, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(64, return_sequences=True, activation='softsign', recurrent_activation='softsign')(z)
z = LSTM(128, activation='softsign', recurrent_activation='softsign')(z)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history= model.fit(trainX, trainY,validation_split=0.1, epochs=200, batch_size=32,
callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
EarlyStopping(patience=50, verbose=1)])
If I just use one layer, like:
z0 = Input(shape=[None, len(dataset[0])])
z = LSTM(4, activation='soft sign', recurrent_activation='softsign')(z0)
z = Dense(1)(z)
model = Model(inputs=z0, outputs=z)
print(model.summary())
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
history= model.fit(trainX, trainY,validation_split=0.1, epochs=200, batch_size=32,
callbacks=[ReduceLROnPlateau(factor=0.67, patience=3, verbose=1, min_lr=1E-5),
EarlyStopping(patience=200, verbose=1)])
The predictions are somewhat reasonable, at least they are not constant anymore.
Why does that happen? Around 2000 samples not that many, but in the case of overfitting, I would expect the predictions to match perfectly...
EDIT: Solved, as stated in the comments, it's just that Keras always expects Batches: Keras
When I use:
`test=model.predict(trainX[0])`
to get the prediction for the first sequence, I get an dimension error:
"Error when checking : expected input_1 to have 3 dimensions, but got array with shape (3, 3)"
I need to feed in an array of sequences like:
`test=model.predict(trainX[0:1])`
This is a workaround, but I am not really sure, whether this has any deeper meaning, or is just a syntax thing...
This is because you have not normalised input data.
Any neural network model will initially have weights normalised around zero. Since your training dataset has all positive values, the model will try to adjust its weights to predict only positive values. However, the activation function (in your case softsign) will map it to 1. So the model can do nothing except adding the bias. That is why you are getting an almost constant line around the average value of the dataset.
For this, you can use a general tool like sklearn to pre-process your data. If you are using pandas dataframe, something like this will help
data_df = (data_df - data_df.mean()) / data_df.std()
Or to have the parameters in the model, you can consider adding batch normalization layer to your model