Why are Keras Conv1D weights not changed during training?

Why are Keras Conv1D weights not changed during training? - python

I initialise my network with only one Convolutional layer (8 filters with length 10).
# Initialize Convolutional Neural Network
cnn = Sequential()
conv = Conv1D(filters=8, kernel_size=10, strides=1, padding="same", input_shape=(train.values.shape[1]-1, 1))
cnn.add(conv)
cnn.add(Activation("relu"))
cnn.add(MaxPooling1D(pool_size=2, strides=2, padding="same"))
cnn.add(Flatten())
cnn.add(Dense(2, activation='softmax'))
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn.summary()
I take the weights once before and once after the training and plot them with a function I wrote.
w1 = conv.get_weights()[0][:, 0, :]
print(w1[:,0])
plot_weights(w1)
# Fit CNN
y = to_categorical(train.values[:, -1])
X_cnn = np.expand_dims(train.values[:, :-1], axis=2)
start = time.time()
cnn.fit(X_cnn, y, verbose=1, batch_size=20, validation_split=0.2, epochs=20)
end = time.time()
w2 = conv.get_weights()[0][:, 0, :]
print(w2[:,0])
plot_weights(w2)
Function to plot the weights:
def plot_weights(w):
w_min = w.min()
w_max = w.max()
n = w.shape[0]
fig, axes = plt.subplots(nrows=8, ncols=1)
for i, ax in enumerate(axes.flat):
im = ax.imshow(w[:, i].reshape(1, n), vmin=w_min, vmax=w_max, interpolation="nearest",
cmap="gray") # Display weights as image
plt.setp(ax.get_yticklabels(), visible=False) # Hide y ticks
ax.tick_params(axis='y', which='both', length=0) # Set length of y ticks to 0
fig.colorbar(im, ax=axes.ravel().tolist())
plt.show(block=False)
return
The output looks like this:
Before training
After training
When i print the first filter before and after training, you can also see that it is the exact same numbers (not even slightly changed).
>>>[-0.20076838 0.03835052 -0.04454999 -0.20220913 0.24402907 0.03407234
-0.09768075 0.16887552 0.12767741 0.00756356]
>>>[-0.20076838 0.03835052 -0.04454999 -0.20220913 0.24402907 0.03407234
-0.09768075 0.16887552 0.12767741 0.00756356]
What is the reason for this behaviour? Am I doing something wrong? The network is clearly learning something, since i get an accuracy of nearly 100%.
--ga97dil

You may need to access the model itself that is being trained, i.e. cnn rather than the definition you use to initialise the layer i.e. conv.
Try cnn.layers[0].get_weights()[:, 0, :] instead of conv.get_weights()[0][:, 0, :].

Related

Why my predictions of LSTM so low and have shape of the last examples?

I'm writing model LSTM for predicting next stock prices. This model shows good test results, but when I am trying to predict next values after initial dataset it shows extremely low values with shape of the last window values. What can I do to prevent this situation?
My model
def create_model():
model = Sequential()
model.add(LSTM(units=128, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=64))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')
return model
Function to prepare dataset for fit and testing
def create_dataset(df, window):
x = []
y = []
for i in range(window, df.shape[0]):
x.append(df[i-window:i, 0])
y.append(df[i, 0])
x = np.array(x)
y = np.array(y)
return (x, y)
Here I am checking on the whole dataset
dataset_valid = np.array(df)
dataset_valid = scaler.transform(dataset_valid)
dataset_valid = np.reshape(dataset_valid, (dataset_valid.shape[0], dataset_valid.shape[1], 1))
x_dataset_valid, y_dataset_valid = create_dataset(dataset_valid, window)
predict = model.predict(x_dataset_valid)
predict = scaler.inverse_transform(predict)
dataset_valid = scaler.inverse_transform(y_dataset_valid)
plt.figure(figsize=(16, 8))
plt.plot(dataset_valid, color='r', label='Original')
plt.plot(predict, color='b', label='Predicted')
plt.legend()
plt.show()
And here I am trying to predict values after dataset
dataset_valid = np.array(df)
dataset_valid = scaler.transform(dataset_valid)
dataset_valid = create_predict(dataset_valid, window)
predict = model.predict(dataset_valid)
predict = scaler.inverse_transform(predict)
predict = np.append(np.array([0] * window), predict)
dataset_valid_1 = np.array(df[:])
dataset_valid_1 = scaler.transform(dataset_valid_1)
predict_1 = model.predict(dataset_valid_1[-window:])
predict_1 = scaler.inverse_transform(predict_1)
predict = np.append(predict, predict_1)
plt.figure(figsize=(16, 8))
plt.plot(df, color='r', label='Original')
plt.plot(predict, color='b', label='Predicted')
plt.legend()
plt.show()
I really have no idea what I am doing wrong
I've tried create function like dataset[-window:], tried predict only next value and after that append this value and repeat, but nothing works

TF 2.0 MLP accuracy always zero

I've written a minimal example of a simple neural network that fits a given function (a multilayer perceptron for regression).
During the training process the loss decresses as expected and the model works fine. However, the accuracy remains constant and equal to 0.0 at all times, and I don't understand why. What am I missing here?
I guess there is some technical detail that prevents the accuracy from updating?
The training process and the resulting model can be seen in this link
Thank you very much for any help you can provide! ;)
PS- Here is a minimal example to reproduce this result:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Create TRAINING data
noise = 0.1
N=500
Xt = np.random.uniform(-np.pi, np.pi, size=(N,))
Yt = np.sin(Xt) + noise * np.random.uniform(-1,1,size=Xt.shape)
# Create VALIDATION data
Nv = int(0.1*N)
Xv = np.random.uniform(-np.pi, np.pi, size=(Nv,))
Yv = np.sin(Xv) + noise * np.random.uniform(-1,1,size=Xv.shape)
# Create model
model = Sequential()
model.add( Dense(10, activation='tanh',input_shape=(1,)) )
model.add( Dense(5, activation='tanh') )
model.add( Dense(1, activation=None) )
model.compile(optimizer='adam',
loss='mse',
metrics=['accuracy'])
# Fit & evaluate
history = model.fit(Xt, Yt, validation_data=(Xv,Yv),
epochs=100,
verbose=2)
results = model.evaluate(Xv, Yv,verbose=0)
print('\n\nEvaluating model, loss/acc:', results)
## PLOTS
fig = plt.figure()
gs = gridspec.GridSpec(2, 2)
ax1 = plt.subplot(gs[0,0]) # losses
ax2 = plt.subplot(gs[1,0], sharex=ax1) # accuracies
ax3 = plt.subplot(gs[:,1]) # data & model
# Plot learning curve
err = history.history['loss']
val_err = history.history['val_loss']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
ax1.plot(err,label='loss')
ax1.plot(val_err,label='val_loss')
ax2.plot(acc,label='accuracy')
ax2.plot(val_acc,label='val_accuracy')
ax1.set_ylim(bottom=0)
ax2.set_ylim(bottom=-0.01)
ax1.legend()
ax2.legend()
# Plot test
# Generate "continous" data for pretty test
x = np.linspace(np.min(Xt),np.max(Xt),1000)
y = model.predict(x)
ax3.scatter(Xt, Yt, label='Training')
ax3.scatter(Xv, Yv, c='C2', label='Validation')
ax3.plot(x, y, 'C3-', lw=4, label='Model')
ax3.legend()
fig.tight_layout()
plt.show()

As Swier pointed out in the comment accuracy is meant for classification.
Nevertheless I thought that some points should yield the exact target value, that's why I was expecting acc>0.
Anyway I mapped the problem to a integer-only problem and in that scenario the accuracy is different from zero. Obviously not a useful metric, but at least it makes (mathematical) sense.
Thanks!!

Why Bother With Recurrent Neural Networks For Structured Data?

I have been developing feedforward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of the shape [instances, time, features], and the performance of FNNs and RNNs has been the same (except that RNNs require more computation time).
I have also simulated tabular data (code below) where I expected a RNN to outperform a FNN because the next value in the series is dependent on the previous value in the series; however, both architectures predict correctly.
With NLP data, I have seen RNNs outperform FNNs, but not with tabular data. Generally, when would one expect a RNN to outperform a FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating a RNN outperforming a FNN?
Thank you! If my simulation code is not ideal for my question, please adapt it or share a more ideal one!
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
Two features were simulated over 10 time steps, where the value of the second feature is dependent on the value of both features in the prior time step.
## Simulate data.
np.random.seed(20180825)
X = np.random.randint(50, 70, size = (11000, 1)) / 100
X = np.concatenate((X, X), axis = 1)
for i in range(10):
X_next = np.random.randint(50, 70, size = (11000, 1)) / 100
X = np.concatenate((X, X_next, (0.50 * X[:, -1].reshape(len(X), 1))
+ (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)
print(X.shape)
## Training and validation data.
split = 10000
Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
FNN:
## FNN model.
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
validation_data = (X_valid, Y_valid))
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
LSTM:
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
validation_data = (X_lstm_valid, Y_valid))
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

In practice even in NLP you see that RNNs and CNNs are often competitive. Here's a 2017 review paper that shows this in more detail. In theory it might be the case that RNNs can handle the full complexity and sequential nature of language better but in practice the bigger obstacle is usually properly training the network and RNNs are finicky.
Another problem that might have a chance of working would be to look at a problem like the balanced parenthesis problem (either with just parentheses in the strings or parentheses along with other distractor characters). This requires processing the inputs sequentially and tracking some state and might be easier to learn with a LSTM then a FFN.
Update:
Some data that looks sequential might not actually have to be treated sequentially. For example even if you provide a sequence of numbers to add since addition is commutative a FFN will do just as well as a RNN. This could also be true of many health problems where the dominating information is not of a sequential nature. Suppose every year a patient's smoking habits are measured. From a behavioral standpoint the trajectory is important but if you're predicting whether the patient will develop lung cancer the prediction will be dominated by just the number of years the patient smoked (maybe restricted to the last 10 years for the FFN).
So you want to make the toy problem more complex and to require taking into account the ordering of the data. Maybe some kind of simulated time series, where you want to predict whether there was a spike in the data, but you don't care about absolute values just about the relative nature of the spike.
Update2
I modified your code to show a case where RNNs perform better. The trick was to use more complex conditional logic that is more naturally modeled in LSTMs than FFNs. The code is below. For 8 columns we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train but it's final validation loss is 6x lower at 1.06.
As we increase the number of columns the LSTM has a larger and larger advantage, especially if we added more complicated conditions in. For 16 columns the FFNs validation loss is 19 (and you can more clearly see the training curve as the model isn't able to instantly fit the data). In comparison the LSTM takes 11 times longer to train but has a validation loss of 0.31, 30 times smaller than the FFN! You can play around with even larger matrices to see how far this trend will extend.
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import time
matplotlib.use('Agg')
np.random.seed(20180908)
rows = 20500
cols = 10
# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))
larger = np.max(Z[:, :cols/2], axis=1).reshape((rows, 1))
larger2 = np.max(Z[:, cols/2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)
# Z is now the max of the first half of the array.
Z = np.append(Z, larger, axis=1)
# Z is now the min of the max of each half of the array.
# Z = np.append(Z, smaller, axis=1)
# Combine and shuffle.
#Z = np.concatenate((Z_sum, Z_avg), axis = 0)
np.random.shuffle(Z)
## Training and validation data.
split = 10000
X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
print("Now setting up the FNN")
## FNN model.
tick = time.time()
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
validation_data = (X_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("Now evaluating the FNN")
loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])
plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()
print("LSTM")
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)
tick = time.time()
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
validation_data = (X_lstm_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("now eval")
loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])
plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()

CNN, GAN, How can the Generator know, what class it should draw?

I have a GAN network. the generator is drawing mnist digits. It works great. But I cant understand how it knows, which digit it should draw.
Here is the Generator:
def build_generator(latent_size):
# we will map a pair of (z, L), where z is a latent vector and L is a
# label drawn from P_c, to image space (..., 1, 28, 28)
cnn = Sequential()
cnn.add(Dense(1024, input_dim=latent_size, activation='relu'))
cnn.add(Dense(128 * 7 * 7, activation='relu'))
cnn.add(Reshape((128, 7, 7)))
# upsample to (..., 14, 14)
cnn.add(UpSampling2D(size=(2, 2)))
cnn.add(Conv2D(256, 5, padding='same',
activation='relu',
kernel_initializer='glorot_normal'))
# upsample to (..., 28, 28)
cnn.add(UpSampling2D(size=(2, 2)))
cnn.add(Conv2D(128, 5, padding='same',
activation='relu',
kernel_initializer='glorot_normal'))
# take a channel axis reduction
cnn.add(Conv2D(1, 2, padding='same',
activation='tanh',
kernel_initializer='glorot_normal'))
# this is the z space commonly refered to in GAN papers
latent = Input(shape=(latent_size, ))
# this will be our label
image_class = Input(shape=(1,), dtype='int32')
cls = Flatten()(Embedding(num_classes, latent_size,
embeddings_initializer='glorot_normal')(image_class))
# hadamard product between z-space and a class conditional embedding
h = layers.multiply([latent, cls])
fake_image = cnn(h)
return Model([latent, image_class], fake_image)
The Input is a latent-array
noise = np.random.uniform(-1, 1, (batch_size, latent_size))
and the labels are just generated randomly.
So my question is. After the network is embedding the labels. They should look like this
So, now. If I give the network more latent-arrays and labels. He is multiplying the latent-arrays(the noise) with the embedding(of the labels):
So what I expect is:
So the network knows, what new array represents what number.
but the output of np.multiply(noise,embedded_label) is this:
So how can the network know, what digit it should draw?
EDIT:
So here is the whole code. And it works. But why?
The latent_size in the code is 100. The latent_size in my pictures is 2, because I wanted to visualize them. But i think it doesn't change a thing, if I multiply the noise in the 2 dimensional space or the 100 dimensional space. At the end the new points with label "1" are not close to the other points with label "1". Same for the other digits("0","1","2","3",...)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Train an Auxiliary Classifier Generative Adversarial Network (ACGAN) on the
MNIST dataset. See https://arxiv.org/abs/1610.09585 for more details.
You should start to see reasonable images after ~5 epochs, and good images
by ~15 epochs. You should use a GPU, as the convolution-heavy operations are
very slow on the CPU. Prefer the TensorFlow backend if you plan on iterating,
as the compilation time can be a blocker using Theano.
Timings:
Hardware | Backend | Time / Epoch
-------------------------------------------
CPU | TF | 3 hrs
Titan X (maxwell) | TF | 4 min
Titan X (maxwell) | TH | 7 min
Consult https://github.com/lukedeo/keras-acgan for more information and
example output
"""
from __future__ import print_function
from collections import defaultdict
try:
import cPickle as pickle
except ImportError:
import pickle
from PIL import Image
from six.moves import range
import keras.backend as K
from keras.datasets import mnist
from keras import layers
from keras.layers import Input, Dense, Reshape, Flatten, Embedding, Dropout
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils.generic_utils import Progbar
import numpy as np
import time, os
np.random.seed(1337)
K.set_image_data_format('channels_first')
num_classes = 10
def build_generator(latent_size):
# we will map a pair of (z, L), where z is a latent vector and L is a
# label drawn from P_c, to image space (..., 1, 28, 28)
cnn = Sequential()
cnn.add(Dense(1024, input_dim=latent_size, activation='relu'))
cnn.add(Dense(128 * 7 * 7, activation='relu'))
cnn.add(Reshape((128, 7, 7)))
# upsample to (..., 14, 14)
cnn.add(UpSampling2D(size=(2, 2)))
cnn.add(Conv2D(256, 5, padding='same',
activation='relu',
kernel_initializer='glorot_normal'))
# upsample to (..., 28, 28)
cnn.add(UpSampling2D(size=(2, 2)))
cnn.add(Conv2D(128, 5, padding='same',
activation='relu',
kernel_initializer='glorot_normal'))
# take a channel axis reduction
cnn.add(Conv2D(1, 2, padding='same',
activation='tanh',
kernel_initializer='glorot_normal'))
# this is the z space commonly refered to in GAN papers
latent = Input(shape=(latent_size, ))
# this will be our label
image_class = Input(shape=(1,), dtype='int32')
cls = Flatten()(Embedding(num_classes, latent_size,
embeddings_initializer='glorot_normal')(image_class))
# hadamard product between z-space and a class conditional embedding
h = layers.multiply([latent, cls])
fake_image = cnn(h)
return Model([latent, image_class], fake_image)
def build_discriminator():
# build a relatively standard conv net, with LeakyReLUs as suggested in
# the reference paper
cnn = Sequential()
cnn.add(Conv2D(32, 3, padding='same', strides=2,
input_shape=(1, 28, 28)))
cnn.add(LeakyReLU())
cnn.add(Dropout(0.3))
cnn.add(Conv2D(64, 3, padding='same', strides=1))
cnn.add(LeakyReLU())
cnn.add(Dropout(0.3))
cnn.add(Conv2D(128, 3, padding='same', strides=2))
cnn.add(LeakyReLU())
cnn.add(Dropout(0.3))
cnn.add(Conv2D(256, 3, padding='same', strides=1))
cnn.add(LeakyReLU())
cnn.add(Dropout(0.3))
cnn.add(Flatten())
image = Input(shape=(1, 28, 28))
features = cnn(image)
# first output (name=generation) is whether or not the discriminator
# thinks the image that is being shown is fake, and the second output
# (name=auxiliary) is the class that the discriminator thinks the image
# belongs to.
fake = Dense(1, activation='sigmoid', name='generation')(features) # fake oder nicht fake
aux = Dense(num_classes, activation='softmax', name='auxiliary')(features) #welche klasse ist es
return Model(image, [fake, aux])
if __name__ == '__main__':
start_time_string = time.strftime("%Y_%m_%d_%H_%M_%S", time.gmtime())
os.mkdir('history/' + start_time_string)
os.mkdir('images/' + start_time_string)
os.mkdir('acgan/' + start_time_string)
# batch and latent size taken from the paper
epochs = 50
batch_size = 100
latent_size = 100
# Adam parameters suggested in https://arxiv.org/abs/1511.06434
adam_lr = 0.00005
adam_beta_1 = 0.5
# build the discriminator
discriminator = build_discriminator()
discriminator.compile(
optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
)
# build the generator
generator = build_generator(latent_size)
generator.compile(optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
loss='binary_crossentropy')
latent = Input(shape=(latent_size, ))
image_class = Input(shape=(1,), dtype='int32')
# get a fake image
fake = generator([latent, image_class])
# we only want to be able to train generation for the combined model
discriminator.trainable = False
fake, aux = discriminator(fake)
combined = Model([latent, image_class], [fake, aux])
combined.compile(
optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
)
# get our mnist data, and force it to be of shape (..., 1, 28, 28) with
# range [-1, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = (x_train.astype(np.float32) - 127.5) / 127.5
x_train = np.expand_dims(x_train, axis=1)
x_test = (x_test.astype(np.float32) - 127.5) / 127.5
x_test = np.expand_dims(x_test, axis=1)
num_train, num_test = x_train.shape[0], x_test.shape[0]
train_history = defaultdict(list)
test_history = defaultdict(list)
for epoch in range(1, epochs + 1):
print('Epoch {}/{}'.format(epoch, epochs))
num_batches = int(x_train.shape[0] / batch_size)
progress_bar = Progbar(target=num_batches)
epoch_gen_loss = []
epoch_disc_loss = []
for index in range(num_batches):
# generate a new batch of noise
noise = np.random.uniform(-1, 1, (batch_size, latent_size))
# get a batch of real images
image_batch = x_train[index * batch_size:(index + 1) * batch_size]
label_batch = y_train[index * batch_size:(index + 1) * batch_size]
# sample some labels from p_c
sampled_labels = np.random.randint(0, num_classes, batch_size)
# generate a batch of fake images, using the generated labels as a
# conditioner. We reshape the sampled labels to be
# (batch_size, 1) so that we can feed them into the embedding
# layer as a length one sequence
generated_images = generator.predict(
[noise, sampled_labels.reshape((-1, 1))], verbose=0)
x = np.concatenate((image_batch, generated_images))
y = np.array([1] * batch_size + [0] * batch_size)
aux_y = np.concatenate((label_batch, sampled_labels), axis=0)
# see if the discriminator can figure itself out...
epoch_disc_loss.append(discriminator.train_on_batch(x, [y, aux_y]))
# make new noise. we generate 2 * batch size here such that we have
# the generator optimize over an identical number of images as the
# discriminator
noise = np.random.uniform(-1, 1, (2 * batch_size, latent_size))
sampled_labels = np.random.randint(0, num_classes, 2 * batch_size)
# we want to train the generator to trick the discriminator
# For the generator, we want all the {fake, not-fake} labels to say
# not-fake
trick = np.ones(2 * batch_size)
epoch_gen_loss.append(combined.train_on_batch(
[noise, sampled_labels.reshape((-1, 1))],
[trick, sampled_labels]))
progress_bar.update(index + 1)
print('Testing for epoch {}:'.format(epoch))
# evaluate the testing loss here
# generate a new batch of noise
noise = np.random.uniform(-1, 1, (num_test, latent_size))
# sample some labels from p_c and generate images from them
sampled_labels = np.random.randint(0, num_classes, num_test)
generated_images = generator.predict(
[noise, sampled_labels.reshape((-1, 1))], verbose=False)
x = np.concatenate((x_test, generated_images))
y = np.array([1] * num_test + [0] * num_test)
aux_y = np.concatenate((y_test, sampled_labels), axis=0)
# see if the discriminator can figure itself out...
discriminator_test_loss = discriminator.evaluate(
x, [y, aux_y], verbose=False)
discriminator_train_loss = np.mean(np.array(epoch_disc_loss), axis=0)
# make new noise
noise = np.random.uniform(-1, 1, (2 * num_test, latent_size))
sampled_labels = np.random.randint(0, num_classes, 2 * num_test)
trick = np.ones(2 * num_test)
generator_test_loss = combined.evaluate(
[noise, sampled_labels.reshape((-1, 1))],
[trick, sampled_labels], verbose=False)
generator_train_loss = np.mean(np.array(epoch_gen_loss), axis=0)
# generate an epoch report on performance
train_history['generator'].append(generator_train_loss)
train_history['discriminator'].append(discriminator_train_loss)
test_history['generator'].append(generator_test_loss)
test_history['discriminator'].append(discriminator_test_loss)
print('{0:<22s} | {1:4s} | {2:15s} | {3:5s}'.format(
'component', *discriminator.metrics_names))
print('-' * 65)
ROW_FMT = '{0:<22s} | {1:<4.2f} | {2:<15.2f} | {3:<5.2f}'
print(ROW_FMT.format('generator (train)',
*train_history['generator'][-1]))
print(ROW_FMT.format('generator (test)',
*test_history['generator'][-1]))
print(ROW_FMT.format('discriminator (train)',
*train_history['discriminator'][-1]))
print(ROW_FMT.format('discriminator (test)',
*test_history['discriminator'][-1]))
# save weights every epoch
generator.save_weights(
'acgan/'+ start_time_string +'/params_generator_epoch_{0:03d}.hdf5'.format(epoch), True)
discriminator.save_weights(
'acgan/'+ start_time_string +'/params_discriminator_epoch_{0:03d}.hdf5'.format(epoch), True)
# generate some digits to display
noise = np.random.uniform(-1, 1, (100, latent_size))
sampled_labels = np.array([
[i] * num_classes for i in range(num_classes)
]).reshape(-1, 1)
# get a batch to display
generated_images = generator.predict(
[noise, sampled_labels], verbose=0)
# arrange them into a grid
img = (np.concatenate([r.reshape(-1, 28)
for r in np.split(generated_images, num_classes)
], axis=-1) * 127.5 + 127.5).astype(np.uint8)
Image.fromarray(img).save(
'images/'+ start_time_string +'/plot_epoch_{0:03d}_generated.png'.format(epoch))
pickle.dump({'train': train_history, 'test': test_history},
open('history/'+ start_time_string +'/acgan-history.pkl', 'wb'))

Your noise is too big, and has negative values.
You should not multiply the noise, but sum it (and make it a lot smaller).
By multiplying +1 and -1, you can completely change the input. That's the reason for having that completely scattered image in reality.
If even with that weird scattered input the model is still able to recognize the number you meant, then it's probably using certain dimensions of the latent vector more than its actual values.
If you look closely to the scattered graph, it has some interesting patterns such as:
0 - a vertical line. It used only a certain dimension to be zero.
4 - another vertical line.
7 - a horizontal line.
3 - seems to be a diagonal, not sure.
If we can see a pattern (even in a 2D graph hiding actual 100 dimensions), the model can also see a pattern. This pattern might be extremely evident if we could see all the 100 dimensions.
So, your embedding is probably creating a compensation for the wild random factors, maybe by eliminating the random factors with zeros in certain groups of dimensions. That makes the straight lines following certain axes. And certain combinations zero dimensions versus varying dimensions may identify a label.
Example:
For the label 0, your embedding may be creating [0,0,0,0,1,1,1,1,1,1,1,1,...]
For the label 1, it may be creating [1,1,1,1,0,0,0,0,1,1,1,1,1....]
For the label 2, it may be creating [1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1...]
Then the random factor will never change those zeros, and the model can identify a number by checking those groups of four zeros in the examples.
Of course, this is just one supposition... there might be many other possible ways for the model to work around the random factors... but if one exists, it's enough to show that it's ok for the model to find it.

Keras: How to use fit_generator with multiple outputs of different type

In a Keras model with the Functional API I need to call fit_generator to train on augmented images data using an ImageDataGenerator.
The problem is my model has two outputs: the mask I'm trying to predict and a binary value.
I obviously only want to augment the input and the mask output and not the binary value.
How can I achieve this?

The example below might be self-explanatory!
The 'dummy' model takes 1 input (image) and it outputs 2 values. The model computes the MSE for each output.
x = Convolution2D(8, 5, 5, subsample=(1, 1))(image_input)
x = Activation('relu')(x)
x = Flatten()(x)
x = Dense(50, W_regularizer=l2(0.0001))(x)
x = Activation('relu')(x)
output1 = Dense(1, activation='linear', name='output1')(x)
output2 = Dense(1, activation='linear', name='output2')(x)
model = Model(input=image_input, output=[output1, output2])
model.compile(optimizer='adam', loss={'output1': 'mean_squared_error', 'output2': 'mean_squared_error'})
The function below generates batches to feed the model during training. It takes the training data x and the label y where y=[y1, y2]
def batch_generator(x, y, batch_size, is_train):
sample_idx = 0
while True:
X = np.zeros((batch_size, input_height, input_width, n_channels), dtype='float32')
y1 = np.zeros((batch_size, mask_height, mask_width), dtype='float32')
y2 = np.zeros((batch_size, 1), dtype='float32')
# fill up the batch
for row in range(batch_sz):
image = x[sample_idx]
mask = y[0][sample_idx]
binary_value = y[1][sample_idx]
# transform/preprocess image
image = cv2.resize(image, (input_width, input_height))
if is_train:
image, mask = my_data_augmentation_function(image, mask)
X_batch[row, ;, :, :] = image
y1_batch[row, :, :] = mask
y2_batch[row, 0] = binary_value
sample_idx += 1
# Normalize inputs
X_batch = X_batch/255.
yield(X_batch, {'output1': y1_batch, 'output2': y2_batch} ))
Finally, we call the fit_generator()
model.fit_generator(batch_generator(X_train, y_train, batch_size, is_train=1))

If you have separated both mask and binary value you can try something like this:
generator = ImageDataGenerator(rotation_range=5.,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True,
vertical_flip=True)
def generate_data_generator(generator, X, Y1, Y2):
genX = generator.flow(X, seed=7)
genY1 = generator.flow(Y1, seed=7)
while True:
Xi = genX.next()
Yi1 = genY1.next()
Yi2 = function(Y2)
yield Xi, [Yi1, Yi2]
So, you use the same generator for both input and mask with the same seed to define the same operation. You may change the binary value or not depending on your needs (Y2). Then, you call the fit_generator():
model.fit_generator(generate_data_generator(generator, X, Y1, Y2),
epochs=epochs)

The best way to achieve this seems to be to create a new generator class expanding the one provided by Keras that parses the data augmenting only the images and yielding all the outputs.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.