Keras: Understanding the role of Embedding layer in a Conditional GAN - python

I am working to understand Erik Linder-Norén's implementation of the Conditional GAN model, and am confused by the generator in that model:
def build_generator(self):
    model = Sequential()
    # ...some lines removed...
    model.add(Dense(np.prod(self.img_shape), activation='tanh'))
    model.add(Reshape(self.img_shape))
    model.summary()
    noise = Input(shape=(self.latent_dim,))
    label = Input(shape=(1,), dtype='int32')
    label_embedding = Flatten()(Embedding(self.num_classes, self.latent_dim)(label))
    model_input = multiply([noise, label_embedding])
    img = model(model_input)
    return Model([noise, label], img)
My question is: How does the Embedding() layer work here?
I know that noise is a vector that has length 100, and label is an integer, but I don't understand what the label_embedding object contains or how it functions here.
I tried printing the shape of label_embedding to try and figure out what's going on in that Embedding() line but that returns (?,?).
If anyone could help me understand how the Embedding() lines here work, I'd be very grateful for their assistance!

Keep in mind why an embedding is used here at all: the alternative is to concatenate the noise with the class label, which may cause the generator to ignore the noise values entirely and generate nearly identical data within each class (or even just one sample per class).

From the documentation, https://keras.io/layers/embeddings/#embedding,
Turns positive integers (indexes) into dense vectors of fixed size.
eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
In the GAN model, the input integer (0-9) is converted to a vector of length 100. With this short code snippet, we can feed some test input to check the output shape of the Embedding layer.
from keras.layers import Input, Embedding
from keras.models import Model
import numpy as np
latent_dim = 100
num_classes = 10
label = Input(shape=(1,), dtype='int32')
label_embedding = Embedding(num_classes, latent_dim)(label)
mod = Model(label, label_embedding)
test_input = np.zeros((1, 1))  # one sample holding the label 0, shape (batch, input length)
print(f'output shape is {mod.predict(test_input).shape}')
mod.summary()
output shape is (1, 1, 100)
In the model summary, the output shape of the embedding layer is (None, 1, 100), where None stands for the batch size. This matches the output of predict, which is (1, 1, 100) for a batch of one.
embedding_1 (Embedding) (None, 1, 100) 1000
One additional point: in the output shape (1, 1, 100), the leftmost 1 is the batch size and the middle 1 is the input length. In this case, we provided an input of length 1.

The embedding stores the per label state. If I read the code correctly, each label corresponds to a digit; i.e. there is an embedding that captures how to generate a 0, 1, ... 9.
This code takes some random noise and multiplies it element-wise by this per-label state. The result should be a vector that leads the generator to display the digit corresponding to the label (i.e. 0..9).
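The lookup, flatten and multiply steps described above can be sketched with NumPy alone. This is a minimal sketch, not the Keras implementation: `embedding_table` is a stand-in for the layer's weight matrix, and it is filled with random values here, whereas in the real model the values are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 100   # length of the noise vector, as in the question
num_classes = 10   # digits 0..9

# Stand-in for the Embedding layer's weights: one row of length latent_dim per class.
# Random here (assumption); learned by backprop in the real model.
embedding_table = rng.normal(size=(num_classes, latent_dim))

batch_size = 4
noise = rng.normal(size=(batch_size, latent_dim))
labels = np.array([[3], [7], [3], [0]])  # shape (batch, 1), like Input(shape=(1,))

# Embedding lookup: (batch, 1) -> (batch, 1, latent_dim)
looked_up = embedding_table[labels]

# Flatten: (batch, 1, latent_dim) -> (batch, latent_dim)
label_embedding = looked_up.reshape(batch_size, -1)

# Element-wise product, the counterpart of multiply([noise, label_embedding])
model_input = noise * label_embedding

print(looked_up.shape)        # (4, 1, 100)
print(label_embedding.shape)  # (4, 100)
print(model_input.shape)      # (4, 100)
```

Samples 0 and 2 share the label 3, so they get the same embedding row but different noise, which is how the generator can produce varied images of the same digit.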

Related

Correct keras LSTM input shape after text-embedding

I'm trying to understand the keras LSTM layer a bit better in regards to timesteps, but am still struggling a bit.
I want to create a model that is able to compare 2 inputs (siamese network). So my input is a pair of preprocessed texts. The preprocessing is done as follows:
max_len = 64
data['cleaned_text_1'] = assets.apply(lambda x: clean_string(data[]), axis=1)
data['text_1_seq'] = t.texts_to_sequences(cleaned_text_1.astype(str).values)
data['text_1_seq_pad'] = [list(x) for x in pad_sequences(assets['text_1_seq'], maxlen=max_len, padding='post')]
The same is done for the second text input. t is a keras.preprocessing.text.Tokenizer.
I defined the model with:
common_embed = Embedding(
    name="synopsis_embedd",
    input_dim=len(t.word_index)+1,
    output_dim=300,
    input_length=len(data['text_1_seq_pad'].tolist()[0]),
    trainable=True
)
lstm_layer = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(32, dropout=0.2, recurrent_dropout=0.2)
)
input1 = tf.keras.Input(shape=(len(data['text_1_seq_pad'].tolist()[0]),))
e1 = common_embed(input1)
x1 = lstm_layer(e1)
input2 = tf.keras.Input(shape=(len(data['text_1_seq_pad'].tolist()[0]),))
e2 = common_embed(input2)
x2 = lstm_layer(e2)
merged = tf.keras.layers.Lambda(
    function=l1_distance, output_shape=l1_dist_output_shape, name='L1_distance'
)([x1, x2])
conc = Concatenate(axis=-1)([merged, x1, x2])
x = Dropout(0.01)(conc)
preds = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=[input1, input2], outputs=preds)
That seems to work if I feed the numpy data with the fit method:
model.fit(
    x = [np.array(data['text_1_seq_pad'].tolist()), np.array(data['text_2_seq_pad'].tolist())],
    y = y_train.values.reshape(-1,1),
    epochs=epochs,
    batch_size=batch_size,
    validation_data=([np.array(val['text_1_seq_pad'].tolist()), np.array(val['text_2_seq_pad'].tolist())], y_val.values.reshape(-1,1)),
)
What I'm trying to understand at the moment is what is the shape in my case for the LSTM layer for:
samples
time_steps
features
Is it correct that the input_shape for the LSTM layer would be input_shape=(300,1) because I set the embedding output dim to 300 and I have only 1 input feature per LSTM?
And do I need to reshape the embedding output or can I just set
lstm_layer = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(32, input_shape=(300,1), dropout=0.2, recurrent_dropout=0.2)
)
from the embedding output?
Example notebook can be found in Github or as Colab
In general, an LSTM layer needs 3D inputs shaped as (batch_size, length of the input sequence, number of features). The batch size is not really important here, so you can just consider that one input needs to have the shape (length of sequence, number of features per item).
In your case, the output dim of your embedding layer is 300, so your LSTM has 300 features.
Then, using an LSTM on sentences requires a constant number of tokens. The LSTM works with a constant input dimension: you cannot pass it a text with 12 tokens followed by another one with 68 tokens. You need to fix a limit and pad the sequences if needed.
So, if your sentence is 20 tokens long and your limit is 50, you need to pad the sequence (add at the end) with 30 "neutral" tokens (often zeros).
In the end, your LSTM input dimension must be (number of tokens per text, dimension of your embedding outputs) -> (50, 300) in my example. In your case, with max_len = 64, that would be (64, 300), not (300, 1).
To learn more about it, I suggest you take a look at this (in your case, you can replace time_steps by number_of_tokens):
https://shiva-verma.medium.com/understanding-input-and-output-shape-in-lstm-keras-c501ee95c65e
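The padding and shape reasoning above can be checked without Keras. The sketch below is an illustration under stated assumptions: `pad_post` is a hypothetical helper (not the Keras `pad_sequences` function), `vocab_size` is made up, and the embedding table is random rather than trained.

```python
import numpy as np

max_len = 50         # fixed number of tokens per text (the "limit" in the answer)
embedding_dim = 300  # output_dim of the Embedding layer
vocab_size = 1000    # hypothetical vocabulary size; ID 0 is reserved for padding

def pad_post(seq, max_len, pad_value=0):
    """Cut a list of token IDs to max_len, then right-pad it with pad_value."""
    seq = list(seq)[:max_len]
    return seq + [pad_value] * (max_len - len(seq))

# Two texts with different token counts: 20 tokens and 68 tokens
texts = [list(range(1, 21)), list(range(1, 69))]
padded = np.array([pad_post(t, max_len) for t in texts])

# Stand-in for the Embedding weights (random here, learned in a real model)
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# What the LSTM receives: (samples, time_steps, features)
embedded = embedding_table[padded]

print(padded.shape)    # (2, 50)
print(embedded.shape)  # (2, 50, 300)
```

The last axis (300) is the feature dimension and the middle axis (50) is the number of time steps, so no reshape is needed between the embedding and the LSTM.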

Why is this ML model giving me zero accuracy?

I am trying to train a network on the Swiss Roll dataset with three features X = [x1, x2, x3] for the classification task. There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data.
A row in the X matrix looks like this:
-5.2146470e+00 7.0879738e+00 6.7292474e+00
The shape of X is (100, 3), and the shape of y is (100,).
I want to use Radial Basis Functions to train this model. I have used the custom RBFLayer from this StackOverflow answer (also see this explanation) to build the RBFLayer. I want to use a couple of Keras Dense layers to build the network for classification.
What I have tried so far
I have used a Dense layer for the first layer, followed by the custom RBFLayer, and two other Dense layers. Here's the code:
model = Sequential()
model.add((Dense(100, input_dim=3)))
# number of units = 10, gamma = 0.05
model.add(RBFLayer(10,0.05))
model.add(Dense(15, activation='relu'))
model.add(Dense(1, activation='softmax'))
This model gives me zero accuracy. I think there is something wrong with the model architecture, but I can't figure out what is the issue.
Also, I thought the number of units in the last Dense layer should match the number of classes, which is 4 in this case. But when I set the number of units to 4 in the last layer, I get the following error:
ValueError: Shapes (None, 1) and (None, 4) are incompatible
Can you help me with this model architecture?
I faced the same issue while practicing multi-class classification, where I had 7 features and the model classified into 7 classes. Encoding the labels fixed the issue.
First import LabelEncoder class from sklearn and import to_categorical from tensorflow
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
Then, initialize an object to the LabelEncoder class and transform your labels before fitting and training the model.
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)
y = to_categorical(y)
Note that you have to use np.argmax to get the actual predicted class. In my case, the prediction is stored in a variable called res:
res = np.argmax(res, axis=-1)
This gives the index of the predicted class for each sample; encoder.inverse_transform(res) maps it back to the original label. Hope this solves your problem.
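To see why encoding fixes the shape error, here is a NumPy-only sketch of what LabelEncoder followed by to_categorical produces for labels 1 to 4, and of how argmax recovers a label from a prediction. The `probs` array is made-up data standing in for softmax output; it is not taken from the question.

```python
import numpy as np

y = np.array([1, 2, 3, 4, 2, 1])  # labels 1..4, as in the question

# Like LabelEncoder: map the sorted unique labels to indices 0..3
classes, y_index = np.unique(y, return_inverse=True)

# Like to_categorical: one row per sample, one column per class
y_onehot = np.eye(len(classes))[y_index]
print(y_onehot.shape)  # (6, 4)

# Made-up softmax output for two samples (assumption, for illustration only)
probs = np.array([[0.1, 0.2, 0.6, 0.1],
                  [0.7, 0.1, 0.1, 0.1]])

# Most likely class index per sample, then back to the original labels
pred_index = np.argmax(probs, axis=-1)
pred_label = classes[pred_index]
print(pred_label)  # [3 1]
```

With targets of shape (n, 4), a final layer with 4 units no longer conflicts with the label shape, which is the mismatch reported in the ValueError.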
Quoting the question: "There are four classes with labels 1, 2, 3, 4, and the vector y contains the labels for all the data."
The simplest way to debug an input/output mismatch is to print the shapes of the inputs and targets for a single batch and compare them with what the model expects.
The RBF layer should not be the problem, because the output is taken from the last Dense layer rather than from the RBF layer.
For a classification problem, the number of units in the last layer must equal the number of classes; for regression, the last layer often has a single unit.
Print the input shape:
print(X.shape)
and compare it with
print(model.input_shape)
Then print the target shape:
print(y.shape)
and compare it with
print(model.predict(X).shape)
These snippets are approximate; check the Keras docs for the exact syntax.

Creating a Keras CNN for image alteration

I'm working on a problem that involves computationally evaluating three-dimensional data of the shape (32, 16, 5) and providing a corrected form of this data also in the shape of (32, 16, 5). The problem is relatively specific to my field, but it can be viewed as analogous to processing color images (just with five color channels instead of three). If it helps, this could be thought of as a color correction model.
In my initial efforts, I created a random forest model using XGBoost for each of these output parameters. I had good results, but found that the sheer number of output parameters (32*16*5 = 2560) made the runtime of this approach too long, so I am looking for an alternative.
I'm looking at using Keras to solve this, using a convolutional neural network approach, since the adjacent 'pixels' in my data should have some useful information about their neighbors. Note that 'adjacency' here is both spatial and in the color channels. So far, I am doing alright in creating a simple model that I believe has inputs/outputs of the correct shape, but I am running into an issue when I try to train the model on some dummy images:
#!/usr/bin/env python3
import tensorflow as tf
import pandas as pd
import numpy as np
def create_model(image_shape, batch_size = 10):
    width, height, channels = image_shape
    conv_shape = (batch_size, width, height, channels)
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv3D(filters = channels, kernel_size = 3, input_shape = conv_shape, padding = "same"))
    model.add(tf.keras.layers.Dense(channels, activation = "relu"))
    return model
if __name__ == "__main__":
    image_shape = (32, 16, 5)
    # Create test input/output data sets:
    input_img = np.random.rand(*image_shape) # Create one dummy input image
    output_img = np.random.rand(*image_shape) # Create one dummy output image
    # Create a bogus 'training set' by copying the input/output images into lists many times
    inputs = [input_img]*500
    outputs = [output_img]*500
    # Create the model and fit it to the dummy data
    model = create_model(image_shape)
    model.summary()
    model.compile(loss = "mean_squared_error", optimizer = "adam", metrics = ["accuracy"])
    model.fit(input_img, output_img)
However, when I run this code, I get the following error:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=5, found ndim=3. Full shape received: [32, 16, 5]
I am not really sure what the other two expected dimensions are for the data passed into model.fit(). I suspect this is a problem with the way that I am formatting my input data. Even if I have a list of input/output images, that will only bring the ndim of my data to 4, not 5.
I have been trying to find similar examples in the documentation and around the web to see what I'm doing incorrectly, but 3D convolution on a non-classifier network seems a bit off the beaten path, and I'm not having much luck (or just don't know the name of what I should search for).
I have tried passing the dummy training set to model.fit instead of two individual images. Fitting with model.fit(inputs, outputs) instead, I get:
ValueError: Layer sequential expects 1 inputs, but it received 500 input tensors.
It seems that passing a list of tensors isn't correct here. If I convert the list of input images to numpy arrays with:
inputs = np.array(inputs)
outputs = np.array(outputs)
This does bring up the number of dimensions in my input data to 4, but Keras is still expecting 5. The error I get in this case is very similar to the first:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=5, found ndim=4. Full shape received: [None, 32, 16, 5]
I'm definitely not understanding something here, and any help would be appreciated.
I think you made two mistakes in your code:
Instead of using Conv3D, you need to use Conv2D.
model.fit(input_img, output_img) should be model.fit(inputs, outputs), with inputs and outputs converted to NumPy arrays.
You need Conv2D because the shape of your data is (width, height, channels); it doesn't have the extra dimension that Conv3D expects. Note also that input_shape describes a single sample, so it must not include the batch size.
Try the script below:
#!/usr/bin/env python3
import tensorflow as tf
import pandas as pd
import numpy as np
def create_model(image_shape, batch_size = 10):
    width, height, channels = image_shape
    conv_shape = (width, height, channels)
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(filters = channels, kernel_size = 3, input_shape = conv_shape, padding = "same"))
    model.add(tf.keras.layers.Dense(channels, activation = "relu"))
    return model
if __name__ == "__main__":
    image_shape = (32, 16, 5)
    # Create test input/output data sets:
    input_img = np.random.rand(*image_shape) # Create one dummy input image
    output_img = np.random.rand(*image_shape) # Create one dummy output image
    # Create a bogus 'training set' by copying the input/output images into lists many times
    inputs = np.array([input_img]*500)
    outputs = np.array([output_img]*500)
    # Create the model and fit it to the dummy data
    model = create_model(image_shape)
    model.summary()
    model.compile(loss = "mean_squared_error", optimizer = "adam", metrics = ["accuracy"])
    model.fit(inputs, outputs)
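The shape bookkeeping behind this fix can be checked with NumPy alone, without building a model. This is a small sketch; the names `as_list`, `as_array`, `per_sample_shape` and `single` are introduced here for illustration and do not appear in the original script.

```python
import numpy as np

image_shape = (32, 16, 5)
input_img = np.random.rand(*image_shape)

# A Python list of 500 arrays: this is what Keras reported as 500 input tensors
as_list = [input_img] * 500

# One array with a leading sample axis: (samples, width, height, channels)
as_array = np.array(as_list)
print(as_array.shape)  # (500, 32, 16, 5)

# The input_shape given to the first layer describes one sample, without the sample axis
per_sample_shape = as_array.shape[1:]
print(per_sample_shape)  # (32, 16, 5)

# A single image can still be passed to the model by adding a sample axis of length 1
single = input_img[np.newaxis, ...]
print(single.shape)  # (1, 32, 16, 5)
```

A 4D array of this form is what a 2D convolution over multi-channel images expects; a 3D convolution would need one more axis, which this data does not have.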

Keras and input shape to Conv1D issues

First off, I am very new to Neural Nets and Keras.
I am trying to create a simple Neural Network using Keras where the input is a time series and the output is another time series of same length (1 dimensional vectors).
I made dummy code to create random input and output time series using a Conv1D layer. The Conv1D layer then outputs 6 different time series (because I have 6 filters) and the next layer I define to add all 6 of those outputs into one which is the output to the entire network.
import numpy as np
import tensorflow as tf
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.layers import Conv1D, Input, Lambda
def summation(x):
    y = tf.reduce_sum(x, 0)
    return y
time_len = 100 # total length of time series
num_filters = 6 # number of filters/outputs to Conv1D layer
kernel_len = 10 # length of kernel (memory size of convolution)
# create random input and output time series
X = np.random.randn(time_len)
Y = np.random.randn(time_len)
# Create neural network architecture
input_layer = Input(shape = X.shape)
conv_layer = Conv1D(filters = num_filters, kernel_size = kernel_len, padding = 'same')(input_layer)
summation_layer = Lambda(summation)(conv_layer)
model = Model(inputs = input_layer, outputs = summation_layer)
model.compile(loss = 'mse', optimizer = 'adam', metrics = ['mae'])
model.fit(X,Y,epochs = 1, metrics = ['mae'])
The error I get is:
ValueError: Input 0 of layer conv1d_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 100]
Looking at the Keras documentation for Conv1D, the input shape is supposed to be a 3D tensor of shape (batch, steps, channels) which I don't understand if we are working with 1 dimensional data.
Can you explain the meaning of each of the items: batch, steps, and channels? And how should I shape my 1D vectors to allow my network to run?
What is a (training) sample?
The (training) data may consist of tens, hundreds, or thousands of samples. For example, each image in an image dataset like CIFAR-10 or ImageNet is a sample. As another example, for a time series dataset that consists of weather statistics recorded every day over 10 years, each training sample may be the time series of one day. If we have recorded 100 measurements during each day and each measurement consists of temperature and humidity (i.e. we have two features per measurement), then the shape of our dataset is roughly (10x365, 100, 2).
What is batch size?
The batch size is simply the number of samples that can be processed by the model at a single time. We can set the batch size using the batch_size argument of fit method in Keras. The common values are 16, 32, 64, 128, 256, etc (though you must choose a number such that your machine could have enough RAM to allocate the required resources).
Further, the "steps" (also called "sequence length") and "channels" (also called "feature size") are the number of measurements and the size of each measurement, respectively. For example in our weather example above, we have steps=100 and channels=2.
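The weather example can be written down as array shapes. This is a minimal sketch with a zero-filled placeholder array; the batch size of 32 is an arbitrary choice for illustration.

```python
import math
import numpy as np

n_samples = 10 * 365  # one sample per day over 10 years
steps = 100           # measurements per day
channels = 2          # temperature and humidity

# Placeholder data with the layout (samples, steps, channels)
dataset = np.zeros((n_samples, steps, channels))
print(dataset.shape)  # (3650, 100, 2)

# One batch is a slice along the first (sample) axis
batch_size = 32
first_batch = dataset[:batch_size]
print(first_batch.shape)  # (32, 100, 2)

# Number of batches needed to see every sample once
batches_per_epoch = math.ceil(n_samples / batch_size)
print(batches_per_epoch)  # 115
```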
To resolve the issue with your code you need to define your training data (i.e. X) such that it has a shape of (num_samples, steps or time_len, channels or feat_size):
n_samples = 1000 # we have 1000 samples in our training data
n_channels = 1 # each measurement has one feature
X = np.random.randn(n_samples, time_len, n_channels)
# if you want to predict one value for each measurement
Y = np.random.randn(n_samples, time_len)
# or if you want to predict one value for each sample
Y = np.random.randn(n_samples)
Edit:
One more thing is that you should pass the shape of one sample as the input shape of the model. Therefore, the input shape of Input layer must be passed like shape=X.shape[1:].
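For the single 1D series in the question, the reshaping can be sketched as follows. It treats the whole series as one sample with one channel, which is an assumption about how the data is meant to be used.

```python
import numpy as np

time_len = 100
series = np.random.randn(time_len)  # shape (100,), as in the question

# Treat the series as 1 sample with time_len steps and 1 channel
X = series.reshape(1, time_len, 1)
print(X.shape)  # (1, 100, 1)

# The per-sample shape to pass to Input(shape=...), i.e. X.shape[1:]
input_shape = X.shape[1:]
print(input_shape)  # (100, 1)
```

With only one sample the network has very little to learn from; generating many series, as in the answer's snippet, gives the first axis a meaningful size.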

What does embedding do in tensorflow

I am reading an example of using RNN with tensorflow here: ptb_word_lm.py
I can't figure out what the embedding and embedding_lookup are doing here. How can it add another dimension to the tensor? Going from (20, 25) to (20, 25, 200). In this case (20,25) is a batch-size of 20 with 25 time steps. I can't understand how/why you can add the hidden_size of the cell as a dimension of the input data? Typically the input data would be a matrix of size [batch_size, num_features] and the model would map num_features ---> hidden_dims with a matrix of size [num_features, hidden_dims] yielding an output of size [batch-size, hidden-dims]. So how can hidden_dims be a dimension of the input tensor?
input_data, targets = reader.ptb_producer(train_data, 20, 25)
cell = tf.nn.rnn_cell.BasicLSTMCell(200, forget_bias=1.0, state_is_tuple=True)
initial_state = cell.zero_state(20, tf.float32)
embedding = tf.get_variable("embedding", [10000, 200], dtype=tf.float32)
inputs = tf.nn.embedding_lookup(embedding, input_data)
input_data_train # <tf.Tensor 'PTBProducer/Slice:0' shape=(20, 25) dtype=int32>
inputs # <tf.Tensor 'embedding_lookup:0' shape=(20, 25, 200) dtype=float32>
outputs = []
state = initial_state
for time_step in range(25):
    if time_step > 0:
        tf.get_variable_scope().reuse_variables()
    cell_output, state = cell(inputs[:, time_step, :], state)
    outputs.append(cell_output)
output = tf.reshape(tf.concat(1, outputs), [-1, 200])
outputs # list of 25 (one per time step): <tf.Tensor 'BasicLSTMCell/mul_2:0' shape=(20, 200) dtype=float32>
output # <tf.Tensor 'Reshape_2:0' shape=(500, 200) dtype=float32>
softmax_w = tf.get_variable("softmax_w", [config.hidden_size, config.vocab_size], dtype=tf.float32)
softmax_b = tf.get_variable("softmax_b", [config.vocab_size], dtype=tf.float32)
logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.nn.seq2seq.sequence_loss_by_example([logits], [tf.reshape(targets, [-1])],[tf.ones([20*25], dtype=tf.float32)])
cost = tf.reduce_sum(loss) / batch_size
OK, I'm not going to try to explain this specific code, but I will try to answer the "what is an embedding?" part of the title.
Basically, it's a mapping of the original input data into some set of real-valued dimensions, and the "position" of the original input data in those dimensions is organized to improve the task.
In TensorFlow, imagine some text input field that takes the values "king", "queen", "girl", "boy", and you have 2 embedding dimensions. Hopefully, backprop will train the embedding to put the concept of royalty on one axis and gender on the other. So in this case, what was a categorical feature with 4 values gets "boiled down" to a floating-point embedding feature with 2 dimensions.
Embeddings are implemented using a lookup table, indexed either by a hash of the original value or by a dictionary ordering. For a fully trained one, you might put in "Queen" and get out, say, [1.0, 1.0]; put in "Boy" and you get out [0.0, 0.0].
TensorFlow backpropagates the error INTO this lookup table, and hopefully what starts off as a randomly initialized dictionary will gradually become like what we see above.
Hope this helps. If not, look at: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
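The lookup-table idea can be made concrete with a tiny example. The values below are hypothetical "fully trained" embeddings chosen to match the answer's Queen and Boy outputs; the rows for king and girl are assumptions added for illustration, and real trained embeddings would not be this tidy.

```python
import numpy as np

# Dictionary ordering: each word gets a row index in the table
vocab = {"king": 0, "queen": 1, "girl": 2, "boy": 3}

# Hypothetical trained table: column 0 ~ royalty, column 1 ~ gender
embedding = np.array([
    [1.0, 0.0],  # king  (assumed)
    [1.0, 1.0],  # queen (from the answer)
    [0.0, 1.0],  # girl  (assumed)
    [0.0, 0.0],  # boy   (from the answer)
])

def embed(words):
    """Map a list of words to their embedding rows via the lookup table."""
    ids = [vocab[w] for w in words]
    return embedding[ids]

print(embed(["queen", "boy"]))
```

During training, the gradient of the loss flows into the selected rows of `embedding`, which is how the table moves from random values toward something like the above.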
At simplest,
input_data: Batch of sequence of word IDs (with shape (20,25))
inputs: Batch of sequence of word embeddings (with shape (20,25,200))
How does input_data become inputs, you might ask? This is what learning word embeddings does. The easiest way to imagine it is:
Unwrap the input_data into a single batch of shape (20*25,).
Now assign a vector of size 200 to each element in that unwrapped input_data, which gives you a matrix of shape (20*25, 200).
Now, reshape the matrix to shape (20, 25, 200).
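The three steps above can be reproduced in NumPy, using array indexing in place of tf.nn.embedding_lookup. This is a sketch with random IDs and random weights (both assumptions); it only demonstrates where the extra dimension of size 200 comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_steps = 20, 25
vocab_size, hidden_size = 10000, 200

# Stand-ins for the embedding variable and a batch of word IDs
embedding = rng.normal(size=(vocab_size, hidden_size)).astype(np.float32)
input_data = rng.integers(0, vocab_size, size=(batch_size, num_steps))

# Step 1: unwrap the IDs into a flat vector
flat_ids = input_data.reshape(-1)
print(flat_ids.shape)  # (500,)

# Step 2: pick one row of the table per ID
flat_vectors = embedding[flat_ids]
print(flat_vectors.shape)  # (500, 200)

# Step 3: restore the batch and time axes
inputs = flat_vectors.reshape(batch_size, num_steps, hidden_size)
print(inputs.shape)  # (20, 25, 200)

# Indexing with the 2D ID array directly gives the same result in one step
direct = embedding[input_data]
print(np.array_equal(inputs, direct))  # True
```

Each integer ID is replaced by a row of the table, so a (20, 25) array of IDs becomes a (20, 25, 200) array of floats.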
This works because embedding learning is not a time-series process: you learn word embeddings with a feed-forward network. The next important question is how you learn the word embeddings:
Initialise a huge TensorFlow variable of size (vocabulary_size, 200) (i.e. embedding in the code).
Optimise the embedding so that a given word is able to predict any word from its context (e.g. in "dog barked at the mailman", if "at" is the target word, then "dog", "barked", "the" and "mailman" are context words).
This process gives you a vector (200 long in this example) for each word, such that semantics are preserved (i.e. the vector of "dog" is close to "cat", but far away from "pen").
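The target/context relationship in the "dog barked at the mailman" example can be sketched in plain Python. The function `context_pairs` and the window size of 2 are assumptions made for illustration; the answer does not specify a window, and this only generates training pairs, not the training itself.

```python
def context_pairs(tokens, window):
    """Return (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "dog barked at the mailman".split()
pairs = context_pairs(sentence, window=2)

# Context words for the target "at"
at_context = [context for target, context in pairs if target == "at"]
print(at_context)  # ['dog', 'barked', 'the', 'mailman']
```

Pairs like these are what the embedding is optimised on: the vector of the target word should make its context words easy to predict.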
