TL;DR:
I have inputs of shape (batch_size, 128, 60, 41, 2) and labels of shape (batch_size, 128), where each label is either 0, 1, or 2. What should the network output look like, and how should I design the output layer?
I am converting audio clips of length 60 seconds each to an array of (128, 60, 41, 2)*. This is my feature data per example.
As for the labels, I have (per example) an array of shape (128,), one label for each of the 128 things I extract.
So one (feature, label) pair has the form (feature shape = (128, 60, 41, 2), label shape = (128,)).
When I batch the data, the features and labels are stacked; e.g. for a batch of size 10 the features have shape (10, 128, 60, 41, 2) and the labels have shape (10, 128).
My clarified question is: How can I design the network to calculate a loss based on these labels?
The longer version:
The last dense layer should have 3 units, one per class. Now, I have a batch with bs items, so I have labels of shape (bs, 128). How can the network be designed to calculate the loss? The first batch item has shape (128, 60, 41, 2), and the labels for this first item have shape (128,), where each label is either 0, 1, or 2. I want to design the network so that its last output has shape (None, 128, 3).
None is the batch size, 128 for all the things I extract, and the 3 because I have three classes
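For illustration, a minimal sketch with dummy tensors (an added example, assuming the final layer outputs logits) of how a loss can be computed for exactly these shapes:
import tensorflow as tf

labels = tf.random.uniform([10, 128], 0, 3, dtype=tf.int32)  # (bs, 128), values in {0, 1, 2}
logits = tf.random.normal([10, 128, 3])                      # desired network output, shape (bs, 128, 3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(labels, logits))  # a single scalar, averaged over the batch and the 128 positions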
Edit: Thanks for the notes, I hopefully clarified the question
*For those further interested:
I use a sliding window over the time axis. For each window, I extract log-scaled Mel spectrograms. Here, 128 is the number of windows, 60 is the number of Mel bands, 41 is the number of frames per window, and 2 is for adding a delta channel.
The code to generate the features:
import librosa
import numpy as np

def sub_method(fn, label, bands, frames, delta):
    def _windows(data, window_size):
        # yield half-overlapping (start, end) index pairs over the signal
        start = 0
        while start < len(data):
            yield int(start), int(start + window_size)
            start += (window_size // 2)

    window_size = 512 * (frames - 1)
    segment_log_specgrams, segment_labels = [], []
    sound_clip, sr = librosa.load(fn)
    for (start, end) in _windows(sound_clip, window_size):
        if len(sound_clip[start:end]) == window_size:
            signal = sound_clip[start:end]
            melspec = librosa.feature.melspectrogram(signal, n_mels=bands)
            logspec = librosa.amplitude_to_db(melspec)
            logspec = logspec.T.flatten()[:, np.newaxis].T
            segment_log_specgrams.append(logspec)
            segment_labels.append(label)

    if delta:
        segment_log_specgrams = np.asarray(segment_log_specgrams)
        segment_log_specgrams = segment_log_specgrams.reshape(
            len(segment_log_specgrams), bands, frames, 1)
        # second channel holds the deltas of the log-Mel spectrogram
        segment_features = np.concatenate(
            (segment_log_specgrams, np.zeros(np.shape(segment_log_specgrams))), axis=3)
        for i in range(len(segment_features)):
            segment_features[i, :, :, 1] = librosa.feature.delta(
                segment_features[i, :, :, 0])
    else:
        segment_features = segment_log_specgrams

    if len(segment_features) > 0:  # check for empty segments
        return 1, segment_features, segment_labels
    else:
        return 0, 0, 0
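For reference, a hedged usage sketch (the filename and label are hypothetical); with the hop of window_size // 2 and librosa's default sample rate, a 60-second clip yields roughly 128 windows:
ok, feats, labs = sub_method('some_clip.wav', label=1, bands=60, frames=41, delta=True)
if ok:
    feats = np.asarray(feats)  # roughly (128, 60, 41, 2)
    labs = np.asarray(labs)    # roughly (128,)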
Try this:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Reshape
from tensorflow.keras import Sequential
model = Sequential([
    Reshape((128, -1), input_shape=(128, 60, 41, 2)),  # flatten each of the 128 windows
    Dense(3, activation='softmax')                     # per-window class probabilities
])
inp = tf.random.uniform([10, 128, 60, 41, 2], dtype=tf.float32)
labels = tf.random.uniform([10, 128], 0, 3, dtype=tf.int32)
pred = model(inp)  # shape (10, 128, 3)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(inp, labels)
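To turn the model output back into per-window labels, a quick follow-up check (not part of the original answer) is to take the argmax over the class axis:
pred_classes = tf.argmax(model(inp), axis=-1)  # shape (10, 128), values in {0, 1, 2}
print(pred_classes.shape)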
Related
I have an input that is a time series of 5 dimensions:
a = [[8, 3], [2], [4, 5], [1], [9, 1], [2], ...]  # 100 timestamps in total. For each element, dims 0 and 1 are numerical data and dim 2 is a numerical encoding of a category. This is per sample; there are 3200 samples.
The category has 3 possible values (0,1,2)
I want to build a NN such that the last dimension (the category) will go through an embedding layer with output size 8, and then will be concatenated back to the first two dims (the numerical data).
So, this will be something like:
input1 = keras.layers.Input(shape=(2,))  # the numerical features
input2 = keras.layers.Input(shape=(1,))  # the encoding of the categories; this part will be embedded to 8 dims
x2 = Embedding(input_dim=1, output_dim=8)(input2)  # apply it to every timestamp, taking only the category dim, so [2], [1], [2]
x = concatenate([input1, x2])  # will get 10 dims at each timepoint, still 100 timepoints
x = LSTM(units=24)(x)  # the input has 10 dims/features at each timepoint, 100 timepoints per sample in total
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[input1, input2], outputs=[x])  # input1 is a 1D vec of width 2, input2 is a 1D vec of width 1 that goes through the embedding
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['acc']
)
How can I do it (preferably in Keras)?
My problem is how to apply the embedding to every time point.
Meaning, if I have 1000 timepoints with 3 dims each, I need to convert them to 1000 timepoints with 8 dims each (the embedding layer should transform input2 from (1000x1) to (1000x8)).
There are a couple of issues here.
First let me give you a working example and explain along the way how to solve your issues.
Imports and Data Generation
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.models import Model
num_timesteps = 100
max_features_values = [100, 100, 3]
num_observations = 2
input_list = [[[np.random.randint(0, v) for _ in range(num_timesteps)]
for v in max_features_values]
for _ in range(num_observations)]
input_arr = np.array(input_list) # shape (2, 3, 100)
In order to use an embedding we need the voc_size (vocabulary size) as its input dimension, as stated in the Embedding documentation.
Embedding and Concatenation
voc_size = len(np.unique(input_arr[:, 2, :])) + 1 # 4
Now we need to create the inputs. The inputs should be of size [None, 2, num_timesteps] and [None, 1, num_timesteps], where the first dimension is flexible and will be filled with the number of observations we are passing in. Let's use the embedding right after that, using the previously calculated voc_size.
inp1 = layers.Input(shape=(2, num_timesteps)) # TensorShape([None, 2, 100])
inp2 = layers.Input(shape=(1, num_timesteps)) # TensorShape([None, 1, 100])
x2 = layers.Embedding(input_dim=voc_size, output_dim=8)(inp2) # TensorShape([None, 1, 100, 8])
x2_reshaped = tf.transpose(tf.squeeze(x2, axis=1), [0, 2, 1]) # TensorShape([None, 8, 100])
The embedding output cannot be concatenated with inp1 as-is, since all dimensions must match except for the one along the concatenation axis, and unfortunately they don't. Therefore we reshape x2, as done above, by squeezing out the singleton dimension and then transposing.
Now we can concatenate without any issue and everything works in a straightforward fashion:
x = layers.concatenate([inp1, x2_reshaped], axis=1)
x = layers.LSTM(32)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inp1, inp2], outputs=[x])
Check on Dummy Example
inp1_np = input_arr[:, :2, :]
inp2_np = input_arr[:, 2:, :]
model.predict([inp1_np, inp2_np])
# Output
# array([[0.544262 ],
# [0.6157502]], dtype=float32)
This outputs values between 0 and 1, just as expected.
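The answer stops at predict; here is a hedged sketch of compiling and fitting this model on dummy binary targets (the targets below are invented just to exercise the graph):
dummy_y = np.random.randint(0, 2, size=(num_observations, 1))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.fit([inp1_np, inp2_np], dummy_y, epochs=2, batch_size=2, verbose=1)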
In case you are not looking for embeddings the way they are usually used in Keras (mapping positive integers to dense vectors), you might be looking for some sort of unprojection or basis expansion, in which 3 dimensions get mapped (embedded) to 8 and the result is concatenated. This can be done using the kernel trick or other methods, but it also happens implicitly in neural networks with non-linear activations.
As such, you can do something like the following, in a similar format to pythonic833's answer because it was good (but with the timestamps in the middle, since the Keras LSTM documentation asks for [batch, timesteps, feature]):
Input generation
import tensorflow as tf
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.models import Model
num_timesteps = 100
num_features = 5
num_observations = 2
input_list = [[[np.random.randint(1, 100) for _ in range(num_features)]
for _ in range(num_timesteps)]
for _ in range(num_observations)]
input_arr = np.array(input_list) # shape (2, 100, 5)
Model construction
Then you can process the inputs:
input1 = layers.Input(shape=(num_timesteps, 2,))
input2 = layers.Input(shape=(num_timesteps, 3))
x2 = layers.Dense(8, activation='relu')(input2)
x = layers.concatenate([input1,x2], axis=2) # This produces tensors of shape (None, 100, 10)
x = layers.LSTM(units=24)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = Model(inputs=[input1, input2] , outputs=[x])
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['acc']
)
Results
inp1_np = input_arr[:, :, :2]
inp2_np = input_arr[:, :, 2:]
model.predict([inp1_np, inp2_np])
which produces
array([[0.44117224],
[0.23611131]], dtype=float32)
Other explanations about basis expansion to check out:
https://stats.stackexchange.com/questions/527258/embedding-data-into-a-larger-dimension-space
https://www.reddit.com/r/MachineLearning/comments/2ffejw/why_dont_researchers_use_the_kernel_method_in/
In the Keras MWE below I'm trying to train a multi-output regression model with 1000 samples having 20 features (X) as input and producing outputs of size 50 (Y). However, I'm missing a step that I can't quite wrap my head around and don't have the right word for. Let me try to describe it anyway, and please forgive the mess:
Here, each one of the 50 outputs is characterised by a set of 10 "feature filters" whose role is to interact (through e.g. a dot product) with the 20 features to produce the numeric output. I am missing a layer that would train a single weight matrix of size (20, 10) whose sum (or average) subsequently produces the numeric output Y. The idea is that each output reacts to the features in ways dictated by those feature filters, and that those interactions are consistent across outputs (e.g. high values in one feature filter might lead to a stronger reaction to one feature and a weaker one to another, and those positive/negative relationships are not output-specific but identified for the whole dataset via the common (20, 10) weight matrix).
How could that side-input matrix (10, 50) of output-specific "feature filters" enter the network? My attempt below consists of (1) a tensordot product of every sample with the side matrix (i.e. a 3D output per sample), which is (2) subsequently flattened to 1D to interact with a small Dense layer. The Dense layer is then (3) tiled/repeated so that it stays small and learns weights that apply to all outputs. The tiled dense output is then (4) dimensionally reduced through averaging to fit the output format of (n, 50).
The problem with this approach is that the Dense layer is fully connected, when all that is needed is a locally connected weight matrix (20 * 10) that is tiled 50 times. That is, one weight/bias per interaction between a feature and a feature filter, applied to every output. With one weight per interaction we can then visualise which interactions are key to matching the output (which is not possible if fully connected).
I suspect I need to replace the Dense layer by some locally connected, convolutional, separable or other sort of layer that I don't really understand. Any ideas?
import numpy as np
import tensorflow as tf
from tensorflow import keras
## create dummy input/output matrices
XData = np.ones((1000, 20)) ## 1000 samples, 20 features
YData = np.ones((1000, 50)) ## 1000 samples, 50 outputs
filterData = np.ones((10, 50)) ## 10 feature filters, 50 outputs
filterData = tf.cast(filterData, tf.float32) ## needed for tf.math.reduce_mean() below
## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
# flatten for dense layers, out size = (n, 10000)
tflat = keras.layers.Flatten()(tdot)
## learning dense layer, out size = (n, 20*10),
tdense = keras.layers.Dense(XData.shape[1] * filterData.shape[0], activation="linear")(tflat)
## tiling layer that repeats the dense layer for every output
ttile = keras.layers.Lambda(lambda x: keras.backend.repeat(x, filterData.shape[1]))(tdense)
## reduce dimensions through averaging to fit YData, out size = (n, 50)
tmean = keras.layers.Lambda(lambda x: tf.math.reduce_mean(x, axis=(2)))(ttile)
## make the model
model = keras.Model(input, tmean)
model.compile(
    optimizer='adam',
    loss='mse'
)
history = model.fit(
    x=XData,
    y=YData,
    epochs=3,
    validation_split=0.3,
    verbose=2,
    batch_size=10
)
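For comparison, here is a minimal sketch (an assumption-laden alternative I am considering, not the code above) of a custom layer holding exactly one trainable weight and bias per (feature, feature-filter) interaction, shared across all 50 outputs; it reuses the tensordot output of shape (n, 20, 10, 50) and averages over the feature and filter axes:
class InteractionLayer(keras.layers.Layer):
    def __init__(self, num_features=20, num_filters=10, **kwargs):
        super().__init__(**kwargs)
        # one weight and one bias per (feature, feature-filter) pair, shared across the 50 outputs
        self.kernel = self.add_weight(name='interaction_kernel',
                                      shape=(1, num_features, num_filters, 1),
                                      initializer='glorot_uniform', trainable=True)
        self.bias = self.add_weight(name='interaction_bias',
                                    shape=(1, num_features, num_filters, 1),
                                    initializer='zeros', trainable=True)

    def call(self, x):
        weighted = x * self.kernel + self.bias        # broadcasts over the 50 outputs
        return tf.reduce_mean(weighted, axis=(1, 2))  # (n, 20, 10, 50) -> (n, 50)

# hypothetical wiring, mirroring the tensordot Lambda above:
# out = InteractionLayer()(tdot)
# model = keras.Model(input, out)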
EDIT
The code below achieves the singular connection, i.e. one weight per feature/feature-filter interaction (shared across outputs), that the Dense layer does not allow. It consists of a collection of 20 * 10 = 200 single-unit Dense layers that are subsequently concatenated, before being tiled 50 times. However, learning is very poor, and maybe putting that concatenated collection inside a TimeDistributed layer, as suggested by @SoheilStar, could help. However, the presence of the loop prevents me from using it in the sequential API code given by @SoheilStar. Any help on this?
## create dummy input/output matrices
XData = np.ones((1000, 20)) ## 1000 samples, 20 features
YData = np.zeros((1000, 50)) ## 1000 samples, 50 outputs
filterData = np.ones((10, 50)) ## 10 feature filters, 50 outputs
filterData = tf.cast(filterData, tf.float32) ## needed for tf.math.reduce_mean() below
## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
# flatten for dense layers, out size = (n, 10000)
tflat = keras.layers.Flatten()(tdot)
## singular connection layer, i.e. a concatenated collection of single unit dense layer, out size = (n, 200)
dense_list = [None] * (filterData.shape[0] * XData.shape[1])
for i in range(filterData.shape[0] * XData.shape[1]):
    dense_list[i] = keras.layers.Dense(1, activation="linear")(tflat[:, i:(i+1)])
tdense = keras.layers.Concatenate()(dense_list)
## tiling layer that repeats the dense layer for every output
ttile = keras.layers.Lambda(lambda x: keras.backend.repeat(x, filterData.shape[1]))(tdense)
## reduce dimensions through averaging to fit YData, out size = (n, 50)
tmean = keras.layers.Lambda(lambda x: tf.math.reduce_mean(x, axis=(2)))(ttile)
## make the model
model = keras.Model(input, tmean)
EDIT 2
To address the previous problem of the for loop being incompatible with the TimeDistributed layer, I defined a custom layer to give to the TimeDistributed layer:
## define a custom layer to be used in a time distributed layer with the sequential api
class customLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.input_dim = filterData.shape[0] * XData.shape[1]
        self.dense_list = [None] * (self.input_dim)
        for i in range(self.input_dim):
            self.dense_list[i] = keras.layers.Dense(1, activation="linear")
        self.concat = keras.layers.Concatenate()
        self.flat = keras.layers.Flatten()

    def call(self, inputs):
        flat_input = self.flat(inputs)
        list = [None] * (self.input_dim)
        for i in range(self.input_dim):
            list[i] = self.dense_list[i](flat_input[:, i:(i+1)])
        return self.concat(list)

    def compute_output_shape(self, input_dim):
        return (None, self.input_dim)
## transpose and time distribute along the first dimension (now the output size)
tdot_ = tf.transpose(tdot, [0, 3, 1, 2])
## call the customLayer inside a time distributed layer
tcustom = tf.keras.layers.TimeDistributed(customLayer())(tdot_)
And this works, technically, but learning is very poor. The proposal from @SoheilStar below works after changing the last line so that we have instead:
## This layer would try to train its parameters according to each output
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(tdot_)
tdense_ = [tf.keras.layers.TimeDistributed(keras.layers.Dense(1, activation="linear"))(tdense_[:, :, i][..., None]) for i in range(XData.shape[1] * filterData.shape[0])]
tdense_ = tf.keras.layers.Concatenate()(tdense_)
Again, learning is poor, but that is probably to be expected with my real data and the small number of weights involved.
Updated
I am not sure if I understood the problem correctly. If you are looking to train the Dense layer simultaneously on the 50 outputs, then you can use a TimeDistributed layer like this:
## create dummy input/output matrices
XData = tf.ones((1000, 20)) ## 1000 samples, 20 features
YData = tf.ones((1000, 50)) ## 1000 samples, 50 outputs
filterData = tf.ones((10, 50))
TrData = tf.ones((10, 50), dtype=tf.float32) ## 10 feature filters, 50 outputs
## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
## My modification
## change the order of dimensions in order to use the TimeDistributed layer
tdot_ = tf.transpose(tdot, [0, 3, 1, 2])
## This layer would try to train its parameters according to each output
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(XData.shape[1] * TrData.shape[0]),
    tf.keras.layers.Dense(1)]))(tdot_)
## I used the Flatten to squeeze the output and the final shape would be (Batch, 50)
final_output = tf.keras.layers.Flatten()(tdense_)
But if it is not, then why not put another Dense layer of size 50 after tdense? Like this:
tdense = keras.layers.Dense(XData.shape[1] * TrData.shape[0], activation="linear")(tflat)
final_output = keras.layers.Dense(50, activation="linear")(tdense)
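For completeness, a hedged end-to-end version of this second alternative (reusing the input and tflat tensors and the XData/YData arrays defined in the question's code):
model_alt = keras.Model(input, final_output)
model_alt.compile(optimizer='adam', loss='mse')
model_alt.fit(XData, YData, epochs=3, batch_size=10, verbose=2)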
Update
To address the issue you mentioned about the For loop, I did some modifications:
import numpy as np
import tensorflow as tf
from tensorflow import keras
## create dummy input/output matrices
XData = tf.ones((1000, 20)) ## 1000 samples, 20 features
YData = tf.ones((1000, 50)) ## 1000 samples, 50 outputs
filterData = tf.ones((10, 50))
TrData = tf.ones((10, 50), dtype=tf.float32) ## 10 feature filters, 50 outputs
## input of size (n, 20)
input = keras.Input(XData.shape[1])
## dot product with filterData, out size = (n, 20, 10, 50)
tdot = keras.layers.Lambda(lambda x: tf.tensordot(x, filterData, axes=0))(input)
## My modification
## change the order of dimension in order to use TimeDistributed
tdot_ = tf.transpose(tdot, [0, 3, 1, 2])
## This layer would try to train its parameters according to each output
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(tdot_)
tdense_ = [tf.keras.layers.TimeDistributed(keras.layers.Dense(1, activation="linear"))(tdense_[:, :, i][..., None]) for i in range(XData.shape[1] * filterData.shape[0])]
tdense_ = tf.keras.layers.TimeDistributed(tf.keras.layers.Concatenate())(tdense_)
## reduce dimensions through averaging to fit YData, out size = (n, 50)
tmean = keras.layers.Lambda(lambda x: tf.math.reduce_mean(x, axis=(2)))(tdense_)
I'm trying to program a simple example to understand how LSTMs work. I want to take a simple integer series 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and predict the next number. I've got some code, but I don't know what the second argument of the fit method needs to be.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
df = pd.DataFrame(columns = ['Serie'])
for i in range(0, 10):
    df.loc[i, 'Serie'] = i
sc = MinMaxScaler(feature_range = (0, 1))
train_set = sc.fit_transform(df.iloc[:, [True]])
xTrain = []
for i in range(0, len(train_set) - 3):
    xTrain.append(train_set[i:i + 3, 0])
xTrain = np.array(xTrain)
xTrain = np.reshape(xTrain, (xTrain.shape[0], xTrain.shape[1], 1))
regresor = Sequential()
regresor.add(LSTM(units = 1, input_shape = (3, 1)))
regresor.compile(optimizer = 'rmsprop', loss = 'mse')
regresor.fit(xTrain, ???, batch_size = 1)
Can someone give me a very simple example of this?
You need to frame the problem as a supervised one. Every sample contains the independent variable x and the dependent variable y. Based on your question, x contains samples of 3 time steps and 1 feature. Start off by doing the necessary imports:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np
import tensorflow as tf
Let's define some constants:
points = 30 # number of data points to generate
timesteps = 3 # number of time steps per sample as LSTM layers need input shape (samples, time steps, features)
features = 1 # number of features per time step as LSTM layers need input shape (samples, time steps, features)
A sequence generation from 0 ... 30:
x = np.arange(points + 1) # array([ 0, 1, ..., 29, 30])
Here is where we start framing the problem as a supervised one, with x as a sequence of numbers and y as the sequence of next numbers:
y = x[1:] # [ 1, 2, ..., 29, 30 ]
x = x[:30] # [ 0, 1, ..., 28, 29 ]
Put both x and y together for scaling:
dataset = np.hstack((x.reshape((points, 1)),y.reshape((points, 1))))
scaler = MinMaxScaler((0, 1))
scaled = scaler.fit_transform(dataset)
Let's define the inputs and outputs of our model:
x_train = scaled[:,0] # first column
x_train = x_train.reshape((points // timesteps, timesteps, features)) # as stated before, LSTM layers need input shape (samples, time steps, features)
y_train = scaled[:,1] # second column
y_train = y_train[2::3] # start at the third element in steps of 3, for a total of 10
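A quick added sanity check (not in the original answer) that the supervised pairs line up:
print(x_train.shape, y_train.shape)  # (10, 3, 1) (10,)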
Model definition and compilation. I decided to make the model architecture a little more robust for "better" performance (see the results below):
regresor = tf.keras.models.Sequential()
regresor.add(tf.keras.layers.LSTM(units = 4, return_sequences = True))
regresor.add(tf.keras.layers.LSTM(units = 2))
regresor.add(tf.keras.layers.Dense(units = 1))
regresor.compile(optimizer = 'rmsprop', loss = 'mse')
Train the model:
regresor.fit(x_train, y_train, batch_size = 2, epochs = 500, verbose = 1)
Some predictions:
y_hats = regresor.predict(x_train)
The results:
real y predicted y
0.068966 0.086510
0.172414 0.162209
0.275862 0.252749
0.379310 0.356117
0.482759 0.467885
0.586207 0.582081
0.689655 0.692756
0.793103 0.795362
0.896552 0.887317
1.000000 0.967796
As you can see, the predictions are close enough to the real values.
A plot of the results:
Note that for simplicity I performed the predictions on the training data set; testing should be done on held-out test data. For that, you will have to generate more points and split them accordingly (70% training, 30% testing). Also, you can obtain the values in the original range by calling the scaler's inverse_transform method.
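For example, a hedged sketch of mapping the predictions back to the original scale (the zero column is only a placeholder for the x column the scaler expects):
padded = np.hstack((np.zeros_like(y_hats), y_hats))       # the scaler was fit on two columns (x, y)
y_hats_original = scaler.inverse_transform(padded)[:, 1]  # predictions back on the original 1 ... 30 scale
print(y_hats_original)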
In the Keras implementation of WaveNet, the input shape is (None, 1). I have a time series (val(t)) in which the target is to predict the next data point given a window of past values (the window size depends on the maximum dilation). The input shape in WaveNet is confusing. I have a few questions about it:
How does Keras figure out the input dimension (None) when a full sequence is given? According to the dilations, we want the input to have a length of 2^8.
If an input series of shape (1M, 1) is given as training X, do we need to generate vectors of 2^8 time steps as input? It seems we can just use the input series as the input of WaveNet (I'm not sure why the raw time series input does not give an error).
In general, how can we debug such Keras networks? I tried to apply the function on numerical data like Conv1D(16, 1, padding='same', activation='relu')(inputs); however, it gives an error.
n_filters = 32
filter_width = 2
dilation_rates = [2**i for i in range(7)] * 2
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Activation, Dropout, Lambda, Multiply, Add, Concatenate
from keras.optimizers import Adam
history_seq = Input(shape=(None, 1))
x = history_seq
skips = []
for dilation_rate in dilation_rates:
    # preprocessing - equivalent to time-distributed dense
    x = Conv1D(16, 1, padding='same', activation='relu')(x)
    # filter
    x_f = Conv1D(filters=n_filters,
                 kernel_size=filter_width,
                 padding='causal',
                 dilation_rate=dilation_rate)(x)
    # gate
    x_g = Conv1D(filters=n_filters,
                 kernel_size=filter_width,
                 padding='causal',
                 dilation_rate=dilation_rate)(x)
    # combine filter and gating branches
    z = Multiply()([Activation('tanh')(x_f),
                    Activation('sigmoid')(x_g)])
    # postprocessing - equivalent to time-distributed dense
    z = Conv1D(16, 1, padding='same', activation='relu')(z)
    # residual connection
    x = Add()([x, z])
    # collect skip connections
    skips.append(z)
# add all skip connection outputs
out = Activation('relu')(Add()(skips))
# final time-distributed dense layers
out = Conv1D(128, 1, padding='same')(out)
out = Activation('relu')(out)
out = Dropout(.2)(out)
out = Conv1D(1, 1, padding='same')(out)
# extract training target at end
def slice(x, seq_length):
    return x[:, -seq_length:, :]
pred_seq_train = Lambda(slice, arguments={'seq_length':1})(out)
model = Model(history_seq, pred_seq_train)
model.compile(Adam(), loss='mean_absolute_error')
You are using extreme values for the dilation rate; they don't make sense. Try reducing them, for example to a sequence like [1, 2, 4, 8, 16, 32]. The dilation rates are not a constraint on the dimension of the input you pass.
Your network works simply by passing it this input:
n_filters = 32
filter_width = 2
dilation_rates = [1, 2, 4, 8, 16, 32]
....
model = Model(history_seq, pred_seq_train)
model.compile(Adam(), loss='mean_absolute_error')
n_sample = 5
time_step = 100
X = np.random.uniform(0,1, (n_sample,time_step,1))
model.predict(X)
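As a rough, hedged back-of-the-envelope check (assuming kernel_size = 2 causal convolutions), the receptive field grows with the sum of the dilation rates, which is why the original list is excessive even though it is not a hard constraint on the input length:
for dilations in ([1, 2, 4, 8, 16, 32], [2**i for i in range(7)] * 2):
    receptive_field = 1 + sum((filter_width - 1) * d for d in dilations)
    print(receptive_field)  # about 64 for the reduced list, about 255 for the original one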
Specifying a None dimension in Keras means leaving the model free to receive inputs of any size along that dimension. This does not mean you can mix samples of various dimensions within one call; they must always have the same format... but you can build/call the model each time with a different dimension size:
for time_step in np.random.randint(100, 200, 4):
    print('temporal dim:', time_step)
    n_sample = 5
    model = Model(history_seq, pred_seq_train)
    model.compile(Adam(), loss='mean_absolute_error')
    X = np.random.uniform(0, 1, (n_sample, time_step, 1))
    print(model.predict(X).shape)
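Regarding the debugging question: in TensorFlow 2.x you can apply a single layer eagerly to a concrete tensor and inspect the result; the error you saw usually comes from feeding a 2D array where Conv1D expects a 3D (batch, time, channels) tensor. A hedged sketch:
import numpy as np
import tensorflow as tf
x_dbg = tf.constant(np.random.uniform(0, 1, (5, 100, 1)), dtype=tf.float32)  # (batch, time, channels)
y_dbg = tf.keras.layers.Conv1D(16, 1, padding='same', activation='relu')(x_dbg)
print(y_dbg.shape)  # (5, 100, 16)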
I also suggest a premade Keras library which provides a WaveNet implementation: https://github.com/philipperemy/keras-tcn. You can use it as a baseline and also investigate its code to create a WaveNet.
I have a GAN network. The generator draws MNIST digits. It works great, but I can't understand how it knows which digit it should draw.
Here is the Generator:
def build_generator(latent_size):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 1, 28, 28)
    cnn = Sequential()
    cnn.add(Dense(1024, input_dim=latent_size, activation='relu'))
    cnn.add(Dense(128 * 7 * 7, activation='relu'))
    cnn.add(Reshape((128, 7, 7)))
    # upsample to (..., 14, 14)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(256, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))
    # upsample to (..., 28, 28)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(128, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))
    # take a channel axis reduction
    cnn.add(Conv2D(1, 2, padding='same',
                   activation='tanh',
                   kernel_initializer='glorot_normal'))
    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))
    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')
    cls = Flatten()(Embedding(num_classes, latent_size,
                              embeddings_initializer='glorot_normal')(image_class))
    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])
    fake_image = cnn(h)
    return Model([latent, image_class], fake_image)
The input is a latent array:
noise = np.random.uniform(-1, 1, (batch_size, latent_size))
and the labels are just generated randomly.
So my question is: after the network embeds the labels, they should look like this:
So, now, if I give the network more latent arrays and labels, it multiplies the latent arrays (the noise) with the embeddings (of the labels):
So what I expect is:
so that the network knows which new array represents which number.
But the output of np.multiply(noise, embedded_label) is this:
So how can the network know which digit it should draw?
EDIT:
So here is the whole code. And it works. But why?
The latent_size in the code is 100. The latent_size in my pictures is 2, because I wanted to visualize them. But I think it doesn't change a thing whether I multiply the noise in the 2-dimensional space or in the 100-dimensional space. In the end, the new points with label "1" are not close to the other points with label "1". The same goes for the other digits ("0", "1", "2", "3", ...).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Train an Auxiliary Classifier Generative Adversarial Network (ACGAN) on the
MNIST dataset. See https://arxiv.org/abs/1610.09585 for more details.
You should start to see reasonable images after ~5 epochs, and good images
by ~15 epochs. You should use a GPU, as the convolution-heavy operations are
very slow on the CPU. Prefer the TensorFlow backend if you plan on iterating,
as the compilation time can be a blocker using Theano.
Timings:
Hardware | Backend | Time / Epoch
-------------------------------------------
CPU | TF | 3 hrs
Titan X (maxwell) | TF | 4 min
Titan X (maxwell) | TH | 7 min
Consult https://github.com/lukedeo/keras-acgan for more information and
example output
"""
from __future__ import print_function
from collections import defaultdict
try:
    import cPickle as pickle
except ImportError:
    import pickle
from PIL import Image
from six.moves import range
import keras.backend as K
from keras.datasets import mnist
from keras import layers
from keras.layers import Input, Dense, Reshape, Flatten, Embedding, Dropout
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.utils.generic_utils import Progbar
import numpy as np
import time, os
np.random.seed(1337)
K.set_image_data_format('channels_first')
num_classes = 10
def build_generator(latent_size):
    # we will map a pair of (z, L), where z is a latent vector and L is a
    # label drawn from P_c, to image space (..., 1, 28, 28)
    cnn = Sequential()
    cnn.add(Dense(1024, input_dim=latent_size, activation='relu'))
    cnn.add(Dense(128 * 7 * 7, activation='relu'))
    cnn.add(Reshape((128, 7, 7)))
    # upsample to (..., 14, 14)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(256, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))
    # upsample to (..., 28, 28)
    cnn.add(UpSampling2D(size=(2, 2)))
    cnn.add(Conv2D(128, 5, padding='same',
                   activation='relu',
                   kernel_initializer='glorot_normal'))
    # take a channel axis reduction
    cnn.add(Conv2D(1, 2, padding='same',
                   activation='tanh',
                   kernel_initializer='glorot_normal'))
    # this is the z space commonly referred to in GAN papers
    latent = Input(shape=(latent_size, ))
    # this will be our label
    image_class = Input(shape=(1,), dtype='int32')
    cls = Flatten()(Embedding(num_classes, latent_size,
                              embeddings_initializer='glorot_normal')(image_class))
    # hadamard product between z-space and a class conditional embedding
    h = layers.multiply([latent, cls])
    fake_image = cnn(h)
    return Model([latent, image_class], fake_image)
def build_discriminator():
    # build a relatively standard conv net, with LeakyReLUs as suggested in
    # the reference paper
    cnn = Sequential()
    cnn.add(Conv2D(32, 3, padding='same', strides=2,
                   input_shape=(1, 28, 28)))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))
    cnn.add(Conv2D(64, 3, padding='same', strides=1))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))
    cnn.add(Conv2D(128, 3, padding='same', strides=2))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))
    cnn.add(Conv2D(256, 3, padding='same', strides=1))
    cnn.add(LeakyReLU())
    cnn.add(Dropout(0.3))
    cnn.add(Flatten())
    image = Input(shape=(1, 28, 28))
    features = cnn(image)
    # first output (name=generation) is whether or not the discriminator
    # thinks the image that is being shown is fake, and the second output
    # (name=auxiliary) is the class that the discriminator thinks the image
    # belongs to.
    fake = Dense(1, activation='sigmoid', name='generation')(features)  # fake or not fake
    aux = Dense(num_classes, activation='softmax', name='auxiliary')(features)  # which class it is
    return Model(image, [fake, aux])
if __name__ == '__main__':
    start_time_string = time.strftime("%Y_%m_%d_%H_%M_%S", time.gmtime())
    os.mkdir('history/' + start_time_string)
    os.mkdir('images/' + start_time_string)
    os.mkdir('acgan/' + start_time_string)
    # batch and latent size taken from the paper
    epochs = 50
    batch_size = 100
    latent_size = 100
    # Adam parameters suggested in https://arxiv.org/abs/1511.06434
    adam_lr = 0.00005
    adam_beta_1 = 0.5
    # build the discriminator
    discriminator = build_discriminator()
    discriminator.compile(
        optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
        loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
    )
    # build the generator
    generator = build_generator(latent_size)
    generator.compile(optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
                      loss='binary_crossentropy')
    latent = Input(shape=(latent_size, ))
    image_class = Input(shape=(1,), dtype='int32')
    # get a fake image
    fake = generator([latent, image_class])
    # we only want to be able to train generation for the combined model
    discriminator.trainable = False
    fake, aux = discriminator(fake)
    combined = Model([latent, image_class], [fake, aux])
    combined.compile(
        optimizer=Adam(lr=adam_lr, beta_1=adam_beta_1),
        loss=['binary_crossentropy', 'sparse_categorical_crossentropy']
    )
    # get our mnist data, and force it to be of shape (..., 1, 28, 28) with
    # range [-1, 1]
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = (x_train.astype(np.float32) - 127.5) / 127.5
    x_train = np.expand_dims(x_train, axis=1)
    x_test = (x_test.astype(np.float32) - 127.5) / 127.5
    x_test = np.expand_dims(x_test, axis=1)
    num_train, num_test = x_train.shape[0], x_test.shape[0]
    train_history = defaultdict(list)
    test_history = defaultdict(list)
    for epoch in range(1, epochs + 1):
        print('Epoch {}/{}'.format(epoch, epochs))
        num_batches = int(x_train.shape[0] / batch_size)
        progress_bar = Progbar(target=num_batches)
        epoch_gen_loss = []
        epoch_disc_loss = []
        for index in range(num_batches):
            # generate a new batch of noise
            noise = np.random.uniform(-1, 1, (batch_size, latent_size))
            # get a batch of real images
            image_batch = x_train[index * batch_size:(index + 1) * batch_size]
            label_batch = y_train[index * batch_size:(index + 1) * batch_size]
            # sample some labels from p_c
            sampled_labels = np.random.randint(0, num_classes, batch_size)
            # generate a batch of fake images, using the generated labels as a
            # conditioner. We reshape the sampled labels to be
            # (batch_size, 1) so that we can feed them into the embedding
            # layer as a length one sequence
            generated_images = generator.predict(
                [noise, sampled_labels.reshape((-1, 1))], verbose=0)
            x = np.concatenate((image_batch, generated_images))
            y = np.array([1] * batch_size + [0] * batch_size)
            aux_y = np.concatenate((label_batch, sampled_labels), axis=0)
            # see if the discriminator can figure itself out...
            epoch_disc_loss.append(discriminator.train_on_batch(x, [y, aux_y]))
            # make new noise. we generate 2 * batch size here such that we have
            # the generator optimize over an identical number of images as the
            # discriminator
            noise = np.random.uniform(-1, 1, (2 * batch_size, latent_size))
            sampled_labels = np.random.randint(0, num_classes, 2 * batch_size)
            # we want to train the generator to trick the discriminator
            # For the generator, we want all the {fake, not-fake} labels to say
            # not-fake
            trick = np.ones(2 * batch_size)
            epoch_gen_loss.append(combined.train_on_batch(
                [noise, sampled_labels.reshape((-1, 1))],
                [trick, sampled_labels]))
            progress_bar.update(index + 1)
        print('Testing for epoch {}:'.format(epoch))
        # evaluate the testing loss here
        # generate a new batch of noise
        noise = np.random.uniform(-1, 1, (num_test, latent_size))
        # sample some labels from p_c and generate images from them
        sampled_labels = np.random.randint(0, num_classes, num_test)
        generated_images = generator.predict(
            [noise, sampled_labels.reshape((-1, 1))], verbose=False)
        x = np.concatenate((x_test, generated_images))
        y = np.array([1] * num_test + [0] * num_test)
        aux_y = np.concatenate((y_test, sampled_labels), axis=0)
        # see if the discriminator can figure itself out...
        discriminator_test_loss = discriminator.evaluate(
            x, [y, aux_y], verbose=False)
        discriminator_train_loss = np.mean(np.array(epoch_disc_loss), axis=0)
        # make new noise
        noise = np.random.uniform(-1, 1, (2 * num_test, latent_size))
        sampled_labels = np.random.randint(0, num_classes, 2 * num_test)
        trick = np.ones(2 * num_test)
        generator_test_loss = combined.evaluate(
            [noise, sampled_labels.reshape((-1, 1))],
            [trick, sampled_labels], verbose=False)
        generator_train_loss = np.mean(np.array(epoch_gen_loss), axis=0)
        # generate an epoch report on performance
        train_history['generator'].append(generator_train_loss)
        train_history['discriminator'].append(discriminator_train_loss)
        test_history['generator'].append(generator_test_loss)
        test_history['discriminator'].append(discriminator_test_loss)
        print('{0:<22s} | {1:4s} | {2:15s} | {3:5s}'.format(
            'component', *discriminator.metrics_names))
        print('-' * 65)
        ROW_FMT = '{0:<22s} | {1:<4.2f} | {2:<15.2f} | {3:<5.2f}'
        print(ROW_FMT.format('generator (train)',
                             *train_history['generator'][-1]))
        print(ROW_FMT.format('generator (test)',
                             *test_history['generator'][-1]))
        print(ROW_FMT.format('discriminator (train)',
                             *train_history['discriminator'][-1]))
        print(ROW_FMT.format('discriminator (test)',
                             *test_history['discriminator'][-1]))
        # save weights every epoch
        generator.save_weights(
            'acgan/' + start_time_string + '/params_generator_epoch_{0:03d}.hdf5'.format(epoch), True)
        discriminator.save_weights(
            'acgan/' + start_time_string + '/params_discriminator_epoch_{0:03d}.hdf5'.format(epoch), True)
        # generate some digits to display
        noise = np.random.uniform(-1, 1, (100, latent_size))
        sampled_labels = np.array([
            [i] * num_classes for i in range(num_classes)
        ]).reshape(-1, 1)
        # get a batch to display
        generated_images = generator.predict(
            [noise, sampled_labels], verbose=0)
        # arrange them into a grid
        img = (np.concatenate([r.reshape(-1, 28)
                               for r in np.split(generated_images, num_classes)
                               ], axis=-1) * 127.5 + 127.5).astype(np.uint8)
        Image.fromarray(img).save(
            'images/' + start_time_string + '/plot_epoch_{0:03d}_generated.png'.format(epoch))
    pickle.dump({'train': train_history, 'test': test_history},
                open('history/' + start_time_string + '/acgan-history.pkl', 'wb'))
Your noise is too big, and has negative values.
You should not multiply the noise, but sum it (and make it a lot smaller).
By multiplying by values between +1 and -1, you can completely change the input. That's the reason for the completely scattered picture you actually get.
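A hedged sketch of that additive alternative inside build_generator (the smaller noise range is an illustrative guess, not a tuned value):
# instead of h = layers.multiply([latent, cls]):
h = layers.add([latent, cls])
fake_image = cnn(h)
# and sample much smaller noise when training/generating, e.g.:
noise = np.random.uniform(-0.1, 0.1, (batch_size, latent_size))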
If even with that weird scattered input the model is still able to recognize the number you meant, then it's probably using certain dimensions of the latent vector more than its actual values.
If you look closely at the scattered graph, it has some interesting patterns, such as:
0 - a vertical line. It used only a certain dimension being zero.
4 - another vertical line.
7 - a horizontal line.
3 - seems to be a diagonal, not sure.
If we can see a pattern (even in a 2D graph hiding the actual 100 dimensions), the model can also see a pattern. This pattern might be extremely evident if we could see all 100 dimensions.
So, your embedding is probably creating a compensation for the wild random factors, maybe by eliminating the random factors with zeros in certain groups of dimensions. That produces the straight lines along certain axes. And certain combinations of zero dimensions versus varying dimensions may identify a label.
Example:
For the label 0, your embedding may be creating [0,0,0,0,1,1,1,1,1,1,1,1,...]
For the label 1, it may be creating [1,1,1,1,0,0,0,0,1,1,1,1,1....]
For the label 2, it may be creating [1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1...]
Then the random factor will never change those zeros, and the model can identify a number by checking those groups of four zeros in the examples.
Of course, this is just one supposition... there might be many other possible ways for the model to work around the random factors... but if one exists, it's enough to show that it's possible for the model to find it.
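To make that supposition concrete, a tiny hypothetical numpy sketch (the zero patterns are invented purely for illustration):
import numpy as np

emb = {0: np.array([0, 0, 0, 0, 1, 1, 1, 1]),   # made-up embedding for label 0
       1: np.array([1, 1, 1, 1, 0, 0, 0, 0])}   # made-up embedding for label 1
noise = np.random.uniform(-1, 1, 8)
h0 = noise * emb[0]  # dimensions 0-3 are forced to zero, whatever the noise was
h1 = noise * emb[1]  # dimensions 4-7 are forced to zero
print(h0, h1)        # the surviving zero pattern still identifies the label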