How can I improve the speed of my simple neural network? - python

I've just started exploring TensorFlow and I'm facing an issue regarding performance. As a starting example, I tried implementing a model to simulate a logic gate. Let's say there are two inputs A and B and one output Y. Suppose Y depended only on B and not on A. That means that the following are valid examples:
[0, 0] -> 0
[1, 0] -> 0
[0, 1] -> 1
[1, 1] -> 1
I created training sets for this data and created a model that uses a DenseFeatures layer using two features A and B. This layer feeds into a Dense(128, 'relu') layer, which feeds into a Dense(16, 'relu') layer, which finally feeds into a Dense(1, 'sigmoid') layer.
Training this NN works fine and the predictions are perfect. However, I noticed that on my MacBook, each prediction takes about 250ms. This is too much, since my final goal is to use such a NN to test hundreds of predictions each second.
So I stripped the network down to DenseFeatures([A, B]) -> Dense(8, 'relu') -> Dense(1, 'sigmoid'), however predictions for this NN still takes the same about of time. I was expecting that the execution speed depends on the complexity of the model. I can see that this is not the case here? What am I doing wrong?
Also, I had read that TensorFlow uses floating point math for accuracy but this has a penalty hit in terms of performance and if we convert our data to use integer math, it would speed things up. However, I have no idea of how to achieve that.
I would really appreciate if someone can help me understand why predictions for such a simple logic gate and such a simple NN is taking this long. And how can I speed it up.
For reference, here is my code in python:
import random
from typing import Any, List
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow import feature_column
from tensorflow.keras import layers
class Input:
def __init__(self, data: List[int]):
self.data = data
class Output:
def __init__(self, value: float):
self.value = value
class Item:
def __init__(self, input: Input, output: Output):
self.input = input
self.output = output
DATA: List[Item] = []
for i in range(10000):
x = Input([random.randint(0, 1), random.randint(0, 1)])
y = Output(x.data[1])
DATA.append(Item(x, y))
BATCH_SIZE = 5
DATA_TRAIN, DATA_TEST = train_test_split(DATA, shuffle=True, test_size=0.2)
DATA_TRAIN, DATA_VAL = train_test_split(DATA_TRAIN, shuffle=True, test_size=0.2/0.8)
def toDataSet(data: List[Item], shuffle: bool, batch_size: int):
a = {
'a': [x.input.data[0] for x in data],
'b': [x.input.data[1] for x in data],
}
b = [x.output.value for x in data]
return tf.data.Dataset.from_tensor_slices((a, b)).shuffle(buffer_size=len(data)).batch(BATCH_SIZE)
DS_TRAIN = toDataSet(DATA_TRAIN, True, 5)
DS_VAL = toDataSet(DATA_VAL, True, 5)
DS_TEST = toDataSet(DATA_TEST, True, 5)
FEATURES = []
FEATURES.append(a)
FEATURES.append(b)
feature_layer = tf.keras.layers.DenseFeatures(FEATURES)
model = tf.keras.models.load_model('MODEL.H5')
model = tf.keras.Sequential([
feature_layer,
layers.Dense(8, activation='relu'),
layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(DS_TRAIN, validation_data=DS_VAL, epochs=10)
loss, accuracy = model.evaluate(DS_TEST)
for i in range(1000):
val = model.predict([np.array([random.randint(0, 1)]), np.array([random.randint(0, 1)])])

Since you are only using integers, change the input of the model to use 8-bit signed integers. You can do this by changing the datatype in your input layer by adding the dtype parameter. This will vastly improve processing speed since you won't be wasting calculations.

Related

Custom layer with different activation function for each output

I have a simple neural network with two outputs and for each of them I need to use different activation function. I do basically what is written in this article - here, but it looks like my layer with different activation functions is not working:
See my code below:
X = filled_df.loc[:, "SOUTEZ_MEAN_HOME":"TOTAL_POINTS_AWAY"].values
y = filled_df.loc[:, "HOME_YELLOW_CARDS"].values
X= X.astype("float32")
y= y.astype("float32")
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train= scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
def negative_binomial_layer(x):
# Get the number of dimensions of the input
num_dims = len(x.get_shape())
# Separate the parameters
n, p = tf.unstack(x, num=2, axis=-1)
# Add one dimension to make the right shape
n = tf.expand_dims(n, -1)
p = tf.expand_dims(p, -1)
# Apply a softplus to make positive
n = tf.cast(n, tf.float32)
p = tf.cast(p, tf.float32)
n = tf.keras.activations.softplus(n)
# Apply a sigmoid activation to bound between 0 and 1
p = tf.keras.activations.sigmoid(p)
# Join back together again
out_tensor = tf.concat((n, p), axis=num_dims-1)
return out_tensor
input_shape = (212, )
# Define inputs with predefined shape
inputs = Input(shape=input_shape)
# Build network with some predefined architecture
Layer1 = Dense(16)
Layer2 = Dense(8)
output1 = Layer1(inputs)
output2 = Layer2(output1)
# Predict the parameters of a negative binomial distribution
outputs = Dense(2)(output2)
#outputs = tf.cast(outputs, tf.float32)
distribution_outputs = Lambda(negative_binomial_layer)(outputs)
# Construct model
model = Model(inputs=inputs, outputs=outputs)
num_epochs = 10
opt = Adam()
model.compile(loss = negative_binomial_loss, optimizer = opt)
history = model.fit(X_train, y_train, epochs = num_epochs,
validation_data = (X_test, y_test))
These are my predicted values if I print y_pred in custom loss function:
Epoch 1/10
y_pred = [[2.19472528 3.14479065]
[-1.16056371 1.69369149]
[-1.12327099 2.06830978]
...
[-1.23587477 4.82307]
[0.235431105 3.86740351]
[-2.75554061 1.10352468]] [[[2.19472528 3.14479065]
[-1.16056371 1.69369149]
[-1.12327099 2.06830978]
...
[-1.23587477 4.82307]
[0.235431105 3.86740351]
[-2.75554061 1.10352468]]]
Second predicted value p should be between 0 and 1 and since it is out of this range I am getting nan during counting loss.
Any suggestions? Thanks
I can't give an exact programming explanation, but I can give a theoretical answer to this question which you should be able to use to build it.
From what I am assuming, you are asking how to use a different activation function for each output node in the outputs layer. I do not know much about any of the libraries or extensions you are using, but usually these kinds of libraries include some kind of method to create a customised network. From the code you have posted I can see that you are using a pre-defined structure for a network, this means that you may not be able to customise the output layer yourself, and you will have to create a custom network instead. I am assuming you are using Tensorflow due to some of the methods in the code you posted.
There is also something else to consider. Usually you have activated functions on the neurons (hidden layer) too, that is something that you might have to take in to consideration as well.
I am sorry I was not able to give a practical answer, but I hope this helps you see what you can do to get it to work - have a nice day!

LSTM to Predict Pattern 010101... Understanding Hidden State

I did a quick experiment to see if I could understand what the hidden state in an LSTM does...
I tried to make an LSTM predict a sequence of [1,0,1,0,1...] based off an input sequence of X with X[0] = 1 and the remainder as random noise.
X = [1, randFloat, randFloat, randFloat...]
label = [1, 0, 1, 0...]
In my head, the model would understand:
The inputs X mean nothing, or at least very little (as it's noise) - so it'd discard these values for the most part
Solely the hidden state from the previous sequence/timestep n would be used to predict the next timestep n+1... [1, 0, 1, 0...]
I also set X[0] = 1 so the first initial in an attempt to guide the net to predicting 1 on the first item (which it does)
So, this didn't work. In theory, should it not? Can you someone explain?
It essentially never converges, and is on the cusp of guessing between 0 or 1
## Code
import os
import numpy as np
import torch
from torchvision import transforms
from torch import nn
from sklearn import preprocessing
from util import create_sequences
import torch.optim as optim
Create some fake data
sequence_1 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_1[0] = 1
sequence_2 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_2[0] = 1
labels_1 = np.zeros(50)
labels_1[::2] = 1
labels_1 = torch.tensor(labels_1, dtype=torch.long)
labels_2 = labels_1.clone()
training_data = [sequence_1, sequence_2]
label_data = [labels_1, labels_2]
Create simple LSTM Model
class LSTM(nn.Module):
def __init__(self, input_dim, hidden_dim, output_dim):
super(LSTM, self).__init__()
self.lstm = nn.LSTM(input_dim, hidden_dim)
self.fc = nn.Linear(hidden_dim, output_dim)
def forward(self, seq):
lstm_out, _ = self.lstm(seq.view(len(seq), 1, -1))
out = self.fc(lstm_out.view(len(seq), -1))
out = F.log_softmax(out, dim=1)
return out
We try to overfit on the dataset
INPUT_DIM = 1
HIDDEN_DIM = 6
model = LSTM(INPUT_DIM, HIDDEN_DIM, 2)
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for epoch in range(500):
for i, seq in enumerate(training_data):
labels = label_data[i]
model.zero_grad()
scores = model(seq)
loss = loss_function(scores, labels)
loss.backward()
print(loss)
optimizer.step()
with torch.no_grad():
seq_d = training_data[0]
tag_scores = model(seq_d)
for score in tag_scores:
print(np.argmax(score))
I would say it's not meant to work.
The model would always try to make sense and find patterns in the data it's trained on i.e sequence_1 and to "verify" that it has "found" them, it uses labels_1. Since the data is random the model fails to find the pattern.
The pattern the model tries to find is not in the label but in the data, so it doesn't matter how the label is arranged. The label actually never passes through the model, so NO.
If perhaps, you trained it on a single example then definitely. The model will become overfit and give you your ones and zeros and fail miserably on other examples, otherwise it just won't be able to make sense of the random data no matter the size.
Hidden State
Solely the hidden state from the previous sequence/timestep n would be used to predict the next timestep n+1... [1, 0, 1, 0...]
Concerning Hidden state, NOTE that it is not a trainable parameter, it is the result of performing some operations on the data and parameters, meaning that the input data determines the Hidden state.
What the Hidden state does is to hold the information the model has extracted from the previous timesteps and passes it to the next timestep or as output. In the case of LSTM, it does some forgetting and updating before passing it.

LSTM: Input 0 of layer sequential is incompatible with the layer

I know there are several questions about this here, but I haven't found one which fits exactly my problem.
I'm trying to fit an LSTM with data from Pandas DataFrames but getting confused about the format I have to provide them.
I created a small code snipped which shall show you what I try to do:
import pandas as pd, tensorflow as tf, random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
targets = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
targets['A'] = [random.random() for _ in range(len(targets))]
targets['B'] = [random.random() for _ in range(len(targets))]
features = pd.DataFrame(index=targets.index)
for i in range(len(features)) :
features[str(i)] = [random.random() for _ in range(len(features))]
model = Sequential()
model.add(LSTM(units=targets.shape[1], input_shape=features.shape))
model.compile(optimizer='adam', loss='mae')
model.fit(features, targets, batch_size=10, epochs=10)
this results to:
ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [10, 300]
which I expect relates to the dimensions of the features DataFrame provided. I guess that once fixed this the next error would mention the targets DataFrame.
As far as I understand, 'units' parameter of my first layer defines the output dimensionality of this model. The inputs have to have a 3D shape, but I don't know how to create them out of the 2D world of the Data Frames.
I hope you can help me understanding the reshape mechanism in Python and how to use them in combination with Pandas DataFrames. (I'm quite new to Python and came from R)
Thankls in advance
Lets looks at the few popular ways in LSTMs are used.
Many to Many
Example: You have a sentence (composed of words in sequence). Give these sequence of words you would like to predict the Parts of speech (POS) of each word.
So you have n words and you feed each word per timestep to the LSTM. Each LSTM timestep (also called LSTM unwrapping) will produce and output. The word is represented by a a set of features normally word embeddings. So the input to LSTM is of size bath_size X time_steps X features
Keras code:
inputs = keras.Input(shape=(10,3))
lstm = keras.layers.LSTM(8, input_shape = (10, 3), return_sequences = True)(inputs)
outputs = keras.layers.TimeDistributed(keras.layers.Dense(5, activation='softmax'))(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4,10,3)
y = np.random.randint(0,2, size=(4,10,5))
model.fit(X, y, epochs=2)
print (model.predict(X).shape)
Many to One
Example: Again you have a sentence (composed of words in sequence). Give these sequence of words you would like to predict sentiment of the sentence if it is positive or negative.
Keras code
inputs = keras.Input(shape=(10,3))
lstm = keras.layers.LSTM(8, input_shape = (10, 3), return_sequences = False)(inputs)
outputs =keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4,10,3)
y = np.random.randint(0,2, size=(4,5))
model.fit(X, y, epochs=2)
print (model.predict(X).shape)
Many to multi-headed
Example: You have a sentence (composed of words in sequence). Give these sequence of words you would like to predict sentiment of the sentence as well the author of the sentence.
This is multi-headed model where one head will predict the sentiment and another head will predict the author. Both the heads share the same LSTM backbone.
Keras code
inputs = keras.Input(shape=(10,3))
lstm = keras.layers.LSTM(8, input_shape = (10, 3), return_sequences = False)(inputs)
output_A = keras.layers.Dense(5, activation='softmax')(lstm)
output_B = keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=[output_A, output_B])
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4,10,3)
y_A = np.random.randint(0,2, size=(4,5))
y_B = np.random.randint(0,2, size=(4,5))
model.fit(X, [y_A, y_B], epochs=2)
y_hat_A, y_hat_B = model.predict(X)
print (y_hat_A.shape, y_hat_B.shape)
What you are looking for is Many to Multi head model where your predictions for A will be made by one head and another head will make predictions for B
The input data for the LSTM has to be 3D.
If you print the shapes of your DataFrames you get:
targets : (300, 2)
features : (300, 300)
The input data has to be reshaped into (samples, time steps, features). This means that targets and features must have the same shape.
You need to set a number of time steps for your problem, in other words, how many samples will be used to make a prediction.
For example, if you have 300 days and 2 features the time step can be 3. So that three days will be used to make one prediction (you can choose this arbitrarily). Here is the code for reshaping your data (with a few more changes):
import pandas as pd
import numpy as np
import tensorflow as tf
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
data = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
data['A'] = [random.random() for _ in range(len(data))]
data['B'] = [random.random() for _ in range(len(data))]
# Choose the time_step size.
time_steps = 3
# Use numpy for the 3D array as it is easier to handle.
data = np.array(data)
def make_x_y(ts, data):
"""
Parameters
ts : int
data : numpy array
This function creates two arrays, x and y.
x is the input data and y is the target data.
"""
x, y = [], []
offset = 0
for i in data:
if offset < len(data)-ts:
x.append(data[offset:ts+offset])
y.append(data[ts+offset])
offset += 1
return np.array(x), np.array(y)
x, y = make_x_y(time_steps, data)
print(x.shape, y.shape)
nodes = 100 # This is the width of the network.
out_size = 2 # Number of outputs produced by the network. Same size as features.
model = Sequential()
model.add(LSTM(units=nodes, input_shape=(x.shape[1], x.shape[2])))
model.add(Dense(out_size)) # For the output a Dense (fully connected) layer is used.
model.compile(optimizer='adam', loss='mae')
model.fit(x, y, batch_size=10, epochs=10)
Well, just to finalize this issue I would like to provide one solution I have meanwhile worked on. The class TimeseriesGenerator in tf.keras.... enabled me quite easy to provide the data in the right shape to an LSTM model
from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np
window_size = 7
batch_size = 8
sampling_rate = 1
train_gen = TimeseriesGenerator(X_train.values, y_train.values,
length=window_size, sampling_rate=sampling_rate,
batch_size=batch_size)
valid_gen = TimeseriesGenerator(X_valid.values, y_valid.values,
length=window_size, sampling_rate=sampling_rate,
batch_size=batch_size)
test_gen = TimeseriesGenerator(X_test.values, y_test.values,
length=window_size, sampling_rate=sampling_rate,
batch_size=batch_size)
There are many other ways on implementing generators e.g. using the more_itertools which provides the function windowed, or making use of tensorflow.Dataset and its function window.
For me the TimeseriesGenerator was sufficient to feed the tests I did.
In case you would like to see an example modeling the DAX based on some stocks I'm sharing a notebook on Github.

How to shape TFRecordDataset to meet Model API?

I am building a model based on this code for noise suppression. My problem with the vanilla implementation is that it loads all data at once, which is not the best idea when the training data gets really large; my input file, denoted in the linked code as training.h5, is over 30 GB.
I decided to instead go with tf.data interface that should allow me to work with large data sets; my problem here is that I don't know how to properly shape TFRecordDataset so that it meets what's required by the Model API.
If you check model.fit(x_train, [y_train, vad_train], it essentially requires the following:
x_train, shape [nb_sequences, window, 42]
y_train, shape [nb_sequences, window, 22]
vad_train, shape [nb_sequences, window, 1]
window one typically fixes (in the code: 2000), so the only variable nb_sequences that stems from how large is your data set. However, with tf.data, we don't supply x and y, but only x (see Model API docs).
Saving tfrecord to file
In an effort to make the code reproducible, I created the input file with the following code:
writer = tf.io.TFRecordWriter(path='example.tfrecord')
for record in data:
feature = {}
feature['X'] = tf.train.Feature(float_list=tf.train.FloatList(value=record[:42]))
feature['y'] = tf.train.Feature(float_list=tf.train.FloatList(value=record[42:64]))
feature['vad'] = tf.train.Feature(float_list=tf.train.FloatList(value=[record[64]]))
example = tf.train.Example(features=tf.train.Features(feature=feature))
serialized = example.SerializeToString()
writer.write(serialized)
writer.close()
data is our training data with shape [10000, 65]. My example.tfrecord is available here. It's 3 MB, in reality it would be 30 GB+.
You might notice that in the linked code, numpy array has shape [x, 87], while mine is [x, 65]. That's OK - the remainder is not used anywhere.
Loading the dataset with tf.data.TFRecordDataset
I would like to use tf.data to load "on demand" the data with some prefetching, there's no need to keep it all in memory. My attempt:
import datetime
import numpy as np
import h5py
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import GRU
from tensorflow.keras import regularizers
from tensorflow.keras.constraints import Constraint
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras import backend as K
from tensorflow.keras.layers import concatenate
def load_dataset(path):
def _parse_function(example_proto):
keys_to_features = {
'X': tf.io.FixedLenFeature([42], tf.float32),
'y': tf.io.FixedLenFeature([22], tf.float32),
'vad': tf.io.FixedLenFeature([1], tf.float32)
}
features = tf.io.parse_single_example(example_proto, keys_to_features)
return (features['X'], (features['y'], features['vad']))
dataset = tf.data.TFRecordDataset(path).map(_parse_function)
return dataset
def my_crossentropy(y_true, y_pred):
return K.mean(2 * K.abs(y_true - 0.5) * K.binary_crossentropy(y_pred, y_true), axis=-1)
def mymask(y_true):
return K.minimum(y_true + 1., 1.)
def msse(y_true, y_pred):
return K.mean(mymask(y_true) * K.square(K.sqrt(y_pred) - K.sqrt(y_true)), axis=-1)
def mycost(y_true, y_pred):
return K.mean(mymask(y_true) * (10 * K.square(K.square(K.sqrt(y_pred) - K.sqrt(y_true))) + K.square(
K.sqrt(y_pred) - K.sqrt(y_true)) + 0.01 * K.binary_crossentropy(y_pred, y_true)), axis=-1)
def my_accuracy(y_true, y_pred):
return K.mean(2 * K.abs(y_true - 0.5) * K.equal(y_true, K.round(y_pred)), axis=-1)
class WeightClip(Constraint):
'''Clips the weights incident to each hidden unit to be inside a range
'''
def __init__(self, c=2.0):
self.c = c
def __call__(self, p):
return K.clip(p, -self.c, self.c)
def get_config(self):
return {'name': self.__class__.__name__,
'c': self.c}
def build_model():
reg = 0.000001
constraint = WeightClip(0.499)
main_input = Input(shape=(None, 42), name='main_input')
tmp = Dense(24, activation='tanh', name='input_dense', kernel_constraint=constraint, bias_constraint=constraint)(
main_input)
vad_gru = GRU(24, activation='tanh', recurrent_activation='sigmoid', return_sequences=True, name='vad_gru',
kernel_regularizer=regularizers.l2(reg), recurrent_regularizer=regularizers.l2(reg),
kernel_constraint=constraint, recurrent_constraint=constraint, bias_constraint=constraint)(tmp)
vad_output = Dense(1, activation='sigmoid', name='vad_output', kernel_constraint=constraint,
bias_constraint=constraint)(vad_gru)
noise_input = concatenate([tmp, vad_gru, main_input])
noise_gru = GRU(48, activation='relu', recurrent_activation='sigmoid', return_sequences=True, name='noise_gru',
kernel_regularizer=regularizers.l2(reg), recurrent_regularizer=regularizers.l2(reg),
kernel_constraint=constraint, recurrent_constraint=constraint, bias_constraint=constraint)(noise_input)
denoise_input = concatenate([vad_gru, noise_gru, main_input])
denoise_gru = GRU(96, activation='tanh', recurrent_activation='sigmoid', return_sequences=True, name='denoise_gru',
kernel_regularizer=regularizers.l2(reg), recurrent_regularizer=regularizers.l2(reg),
kernel_constraint=constraint, recurrent_constraint=constraint, bias_constraint=constraint)(
denoise_input)
denoise_output = Dense(22, activation='sigmoid', name='denoise_output', kernel_constraint=constraint,
bias_constraint=constraint)(denoise_gru)
model = Model(inputs=main_input, outputs=[denoise_output, vad_output])
model.compile(loss=[mycost, my_crossentropy],
metrics=[msse],
optimizer='adam', loss_weights=[10, 0.5])
return model
model = build_model()
dataset = load_dataset('example.tfrecord')
My dataset has now the following shape:
<MapDataset shapes: ((42,), ((22,), (1,))), types: (tf.float32, (tf.float32, tf.float32))>
which I thought is what Model API expects (spoiler: it doesn't).
model.fit(dataset.batch(10))
gives following error:
ValueError: Error when checking input: expected main_input to have 3 dimensions, but got array with shape (None, 42)
Makes sense, I don't have the window here. At the same time it seems like it's not getting correct shape expected by Model(inputs=main_input, outputs=[denoise_output, vad_output]).
How to modify load_dataset so that it matches what's expected by the Model API for the tf.data?
Given that your model has 1 input and 2 outputs, your tf.data.Dataset should have two entries:
1) Input array of shape (window, 42)
2) Tuple of two arrays each of shape (window, 22) and (window, 1)
EDIT: Updated answer - you already return two element tuple
I just noticed that your dataset has these two entries (similar to those described above) and the only thing that differs is the shape.
The only operations you need to perfom is to batch your data twice:
First - to restore the window parameter.
Second - to pass a batch to a model.
window_size = 1
batch_size = 10
dataset = load_dataset('example.tfrecord')
model.fit(dataset.batch(window_size).batch(batch_size)
And that should work.
Below is an old answer, where I wrongfully assumed your dataset shape:
Old Answer, where I assumed you are returning three element tuple:
Assuming that you are starting from three element tuple of shapes (42,), (22,) and (1,), this can be achieved in the same batching operations, enriched with a custom_reshape function to return two-element tuple:
window_size = 1
batch_size = 10
dataset = load_dataset('example.tfrecord')
dataset = dataset.batch(window_size).batch(batch_size)
# Change output format
def custom_reshape(x, y, vad):
return x, (y, vad)
dataset = dataset.map(custom_reshape)
In short, given this dataset shape, you could just call:
model.fit(dataset.batch(window_size).batch(10).map(custom_reshape)
and it should work too.
Best of luck. And sorry again for the fuss.

tf.keras `predict()` gets different results

I was playing around with tf.keras and ran some predict() method on two Model objects with the same weights initialization.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Masking, Input, Embedding, Dense
from tensorflow.keras.models import Model
tf.enable_eager_execution()
np.random.seed(10)
X = np.asarray([
[0, 1, 2, 3, 3],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
])
y = [
0,
1,
1
]
seq_len = X.shape[1]
inp = Input(shape=[seq_len])
emb = Embedding(4, 10, name='embedding')(inp)
x = emb
x = LSTM(5, return_sequences=False, name='lstm')(x)
out = Dense(1, activation='sigmoid', name='out')(x)
model = Model(inputs=inp, outputs=out)
model.summary()
preds = model.predict(X)
inp = Input(shape=[seq_len])
emb = Embedding(4, 10, name='embedding', weights=model.get_layer('embedding').get_weights()[0])(inp)
x = emb
x = LSTM(5, return_sequences=False, weights=model.get_layer('lstm').get_weights()[0])(x)
out = Dense(1, activation='sigmoid', weights=model.get_layer('out').get_weights()[0])(x)
model_2 = Model(inputs=inp, outputs=out)
model_2.summary()
preds_2 = model_2.predict(X)
print(preds, preds_2)
I am not sure why but the results of the two predictions are different. I got these when I ran the print function. You might get something different.
[[0.5027414 ]
[0.5019673 ]
[0.50134844]] [[0.5007331]
[0.5002397]
[0.4996575]]
I am trying to understand how keras works. Any explanation would be appreciated. Thank you.
NOTE: THERE IS NO LEARNING INVOLVED HERE. I don't get the idea where the randomness comes from.
Try to change the optimizer from adam to SGD or something else. I noticed that with the same model I used to get different results and it fixed the problem. Also, take a look at the here to fix the initial weights. By the way, I don't know why and how the optimizer can affect the results in the test time with the same model.
It is that you are not copying all the weights. I have no idea why your call mechanically works but it is really easy to see you are not by examining the get_weights without the [0] indexing.
e.g.these are not copied:
model.get_layer('lstm').get_weights()[1]
array([[ 0.11243069, -0.1028666 , 0.01080172, -0.07471965, 0.05566487,
-0.12818974, 0.34882438, -0.17163819, -0.21306667, 0.5386005 ,
-0.03643916, 0.03835883, -0.31128728, 0.04882491, -0.05503649,
-0.22660127, -0.4683674 , -0.00415642, -0.29038426, -0.06893865],
[-0.5117522 , 0.01057898, -0.23182054, 0.03220385, 0.21614116,
0.0732751 , -0.30829042, 0.06233712, -0.54017985, -0.1026137 ,
-0.18011908, 0.15880923, -0.21900705, -0.11910527, -0.03808065,
0.07623457, -0.13157862, -0.18740109, 0.06135096, -0.21589288],
[-0.2295578 , -0.12452635, -0.08739456, -0.1880849 , 0.2220488 ,
-0.14575425, 0.32249492, 0.05235165, -0.09479579, 0.2496742 ,
0.10411342, -0.0263749 , 0.33186644, -0.1838699 , 0.28964192,
-0.2414586 , 0.41612682, 0.13791762, 0.13942356, -0.36176005],
[-0.14428475, -0.02090888, 0.27968913, 0.09452424, 0.1291543 ,
-0.43372717, -0.11366601, 0.37842247, 0.3320751 , 0.21959782,
-0.4242381 , 0.02412989, -0.24809352, 0.2508208 , -0.06223384,
0.08648364, 0.17311276, -0.05988384, 0.02276517, -0.1473657 ],
[ 0.28600952, -0.37206012, 0.21376705, -0.16566195, 0.0833357 ,
-0.00887177, 0.01394618, 0.5345957 , -0.25116244, -0.17159337,
0.096329 , -0.32286254, 0.02044407, -0.1393016 , -0.0767666 ,
0.1505355 , -0.28456056, 0.16909163, 0.16806729, -0.14622769]],
dtype=float32)
but also if you name the lstm layer in model 2 you can see there are not equal parts of the weights.
model_2.get_layer("lstm").get_weights()[1] - model.get_layer("lstm").get_weights()[1]
Perhaps, setting numpy seed is not enough to make the operations and weights deterministic. Tensorflow documentation suggests that to have deterministic weights, you should rather run
tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()
https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism#:~:text=Configures%20TensorFlow%20ops%20to%20run%20deterministically.&text=When%20op%20determinism%20is%20enabled,is%20useful%20for%20debugging%20models.
Could you check if it helps? (your code seems to be written in version 1 of TF, so it does not run on my v2 setup without adaptation)
The thing about machine learning is that it doesn't always learn quite the same way. It involves lots of probabilities, so on a larger scale the results will tend to converge towards one value, but individual runs can and will give varying results.
More info here
It is absolutely normal that the many runs with the same input data
give different output. It is mainly due to the internal stochasticity
of such machine learning techniques (example: ANN, Decision Trees
building algorithms, etc.).
- Nabil Belgasmi, Université de la Manouba
There is not a specific method or technique. The results and
evaluation of the performance depends on several factors: the data
type, parameters of induction function, training set (supervised),
etc. What is important is to compare the results of using metric
measurements such as recall, precision, F_measure, ROC curves or other
graphical methods.
- Jésus Antonio Motta Laval University
EDIT
The predict() function takes an array of one or more data instances.
The example below demonstrates how to make regression predictions on multiple data instances with an unknown expected outcome.
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100,1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100,1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instances where we do not know the answer
Xnew, a = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)
Xnew = scalarX.transform(Xnew)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
Running the example makes multiple predictions, then prints the inputs and predictions side by side for review.
X=[0.29466096 0.30317302], Predicted=[0.17097184]
X=[0.39445118 0.79390858], Predicted=[0.7475489]
X=[0.02884127 0.6208843 ], Predicted=[0.43370453]
SOURCE
Disclaimer: The predict() function itself is slightly random (probabilistic)

Categories

Resources