I was playing around with tf.keras and ran predict() on two Model objects that I initialized with the same weights.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Masking, Input, Embedding, Dense
from tensorflow.keras.models import Model
tf.enable_eager_execution()
np.random.seed(10)
X = np.asarray([
[0, 1, 2, 3, 3],
[0, 0, 1, 1, 1],
[0, 0, 0, 1, 1],
])
y = [
0,
1,
1
]
seq_len = X.shape[1]
inp = Input(shape=[seq_len])
emb = Embedding(4, 10, name='embedding')(inp)
x = emb
x = LSTM(5, return_sequences=False, name='lstm')(x)
out = Dense(1, activation='sigmoid', name='out')(x)
model = Model(inputs=inp, outputs=out)
model.summary()
preds = model.predict(X)
inp = Input(shape=[seq_len])
emb = Embedding(4, 10, name='embedding', weights=model.get_layer('embedding').get_weights()[0])(inp)
x = emb
x = LSTM(5, return_sequences=False, weights=model.get_layer('lstm').get_weights()[0])(x)
out = Dense(1, activation='sigmoid', weights=model.get_layer('out').get_weights()[0])(x)
model_2 = Model(inputs=inp, outputs=out)
model_2.summary()
preds_2 = model_2.predict(X)
print(preds, preds_2)
I am not sure why but the results of the two predictions are different. I got these when I ran the print function. You might get something different.
[[0.5027414 ]
[0.5019673 ]
[0.50134844]] [[0.5007331]
[0.5002397]
[0.4996575]]
I am trying to understand how keras works. Any explanation would be appreciated. Thank you.
NOTE: THERE IS NO LEARNING INVOLVED HERE. I don't understand where the randomness comes from.
Try changing the optimizer from adam to SGD or something else. I noticed that I used to get different results with the same model, and this fixed the problem. Also, take a look here to fix the initial weights. By the way, I don't know why or how the optimizer can affect the results at test time with the same model.
The problem is that you are not copying all the weights. I have no idea why your call works mechanically, but it is easy to see that you are not by examining the output of get_weights() without the [0] indexing.
E.g. these are not copied:
model.get_layer('lstm').get_weights()[1]
array([[ 0.11243069, -0.1028666 , 0.01080172, -0.07471965, 0.05566487,
-0.12818974, 0.34882438, -0.17163819, -0.21306667, 0.5386005 ,
-0.03643916, 0.03835883, -0.31128728, 0.04882491, -0.05503649,
-0.22660127, -0.4683674 , -0.00415642, -0.29038426, -0.06893865],
[-0.5117522 , 0.01057898, -0.23182054, 0.03220385, 0.21614116,
0.0732751 , -0.30829042, 0.06233712, -0.54017985, -0.1026137 ,
-0.18011908, 0.15880923, -0.21900705, -0.11910527, -0.03808065,
0.07623457, -0.13157862, -0.18740109, 0.06135096, -0.21589288],
[-0.2295578 , -0.12452635, -0.08739456, -0.1880849 , 0.2220488 ,
-0.14575425, 0.32249492, 0.05235165, -0.09479579, 0.2496742 ,
0.10411342, -0.0263749 , 0.33186644, -0.1838699 , 0.28964192,
-0.2414586 , 0.41612682, 0.13791762, 0.13942356, -0.36176005],
[-0.14428475, -0.02090888, 0.27968913, 0.09452424, 0.1291543 ,
-0.43372717, -0.11366601, 0.37842247, 0.3320751 , 0.21959782,
-0.4242381 , 0.02412989, -0.24809352, 0.2508208 , -0.06223384,
0.08648364, 0.17311276, -0.05988384, 0.02276517, -0.1473657 ],
[ 0.28600952, -0.37206012, 0.21376705, -0.16566195, 0.0833357 ,
-0.00887177, 0.01394618, 0.5345957 , -0.25116244, -0.17159337,
0.096329 , -0.32286254, 0.02044407, -0.1393016 , -0.0767666 ,
0.1505355 , -0.28456056, 0.16909163, 0.16806729, -0.14622769]],
dtype=float32)
Also, if you name the LSTM layer in model 2, you can see that parts of the weights are not equal:
model_2.get_layer("lstm").get_weights()[1] - model.get_layer("lstm").get_weights()[1]
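A minimal sketch of copying every weight array instead of only the first one, assuming TensorFlow 2.x and two models with the same architecture. The helper name build_model is hypothetical; get_weights() and set_weights() are the standard Keras model methods.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Input, Embedding, Dense
from tensorflow.keras.models import Model

X = np.asarray([
    [0, 1, 2, 3, 3],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
])

def build_model(seq_len):
    # Hypothetical helper: same architecture as in the question.
    inp = Input(shape=(seq_len,))
    x = Embedding(4, 10, name='embedding')(inp)
    x = LSTM(5, return_sequences=False, name='lstm')(x)
    out = Dense(1, activation='sigmoid', name='out')(x)
    return Model(inputs=inp, outputs=out)

model = build_model(X.shape[1])
model_2 = build_model(X.shape[1])

# The LSTM layer holds three arrays: kernel, recurrent kernel, and bias.
lstm_shapes = [w.shape for w in model.get_layer('lstm').get_weights()]
print(lstm_shapes)

# Copy all weight arrays of all layers in one call.
model_2.set_weights(model.get_weights())

preds = model.predict(X, verbose=0)
preds_2 = model_2.predict(X, verbose=0)
print(np.allclose(preds, preds_2))
```

With all three LSTM arrays copied, the two models should produce matching predictions; indexing get_weights() with [0] passes along only the first array.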
Perhaps, setting numpy seed is not enough to make the operations and weights deterministic. Tensorflow documentation suggests that to have deterministic weights, you should rather run
tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()
https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism
Could you check if it helps? (your code seems to be written in version 1 of TF, so it does not run on my v2 setup without adaptation)
The thing about machine learning is that it doesn't always learn quite the same way. It involves lots of probabilities, so on a larger scale the results will tend to converge towards one value, but individual runs can and will give varying results.
More info here
It is absolutely normal that the many runs with the same input data
give different output. It is mainly due to the internal stochasticity
of such machine learning techniques (example: ANN, Decision Trees
building algorithms, etc.).
- Nabil Belgasmi, Université de la Manouba
There is not a specific method or technique. The results and
evaluation of the performance depends on several factors: the data
type, parameters of induction function, training set (supervised),
etc. What is important is to compare the results of using metric
measurements such as recall, precision, F_measure, ROC curves or other
graphical methods.
- Jésus Antonio Motta Laval University
EDIT
The predict() function takes an array of one or more data instances.
The example below demonstrates how to make regression predictions on multiple data instances with an unknown expected outcome.
# example of making predictions for a regression problem
from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=1)
scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(100,1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(100,1))
# define and fit the final model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=0)
# new instances where we do not know the answer
Xnew, a = make_regression(n_samples=3, n_features=2, noise=0.1, random_state=1)
Xnew = scalarX.transform(Xnew)
# make a prediction
ynew = model.predict(Xnew)
# show the inputs and predicted outputs
for i in range(len(Xnew)):
    print("X=%s, Predicted=%s" % (Xnew[i], ynew[i]))
Running the example makes multiple predictions, then prints the inputs and predictions side by side for review.
X=[0.29466096 0.30317302], Predicted=[0.17097184]
X=[0.39445118 0.79390858], Predicted=[0.7475489]
X=[0.02884127 0.6208843 ], Predicted=[0.43370453]
SOURCE
Disclaimer: The predict() function itself is slightly random (probabilistic)
I have a simple neural network with two outputs and for each of them I need to use different activation function. I do basically what is written in this article - here, but it looks like my layer with different activation functions is not working:
See my code below:
X = filled_df.loc[:, "SOUTEZ_MEAN_HOME":"TOTAL_POINTS_AWAY"].values
y = filled_df.loc[:, "HOME_YELLOW_CARDS"].values
X= X.astype("float32")
y= y.astype("float32")
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train= scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
def negative_binomial_layer(x):
    # Get the number of dimensions of the input
    num_dims = len(x.get_shape())
    # Separate the parameters
    n, p = tf.unstack(x, num=2, axis=-1)
    # Add one dimension to make the right shape
    n = tf.expand_dims(n, -1)
    p = tf.expand_dims(p, -1)
    # Apply a softplus to make positive
    n = tf.cast(n, tf.float32)
    p = tf.cast(p, tf.float32)
    n = tf.keras.activations.softplus(n)
    # Apply a sigmoid activation to bound between 0 and 1
    p = tf.keras.activations.sigmoid(p)
    # Join back together again
    out_tensor = tf.concat((n, p), axis=num_dims-1)
    return out_tensor
input_shape = (212, )
# Define inputs with predefined shape
inputs = Input(shape=input_shape)
# Build network with some predefined architecture
Layer1 = Dense(16)
Layer2 = Dense(8)
output1 = Layer1(inputs)
output2 = Layer2(output1)
# Predict the parameters of a negative binomial distribution
outputs = Dense(2)(output2)
#outputs = tf.cast(outputs, tf.float32)
distribution_outputs = Lambda(negative_binomial_layer)(outputs)
# Construct model
model = Model(inputs=inputs, outputs=outputs)
num_epochs = 10
opt = Adam()
model.compile(loss = negative_binomial_loss, optimizer = opt)
history = model.fit(X_train, y_train, epochs = num_epochs,
validation_data = (X_test, y_test))
These are my predicted values if I print y_pred in custom loss function:
Epoch 1/10
y_pred = [[2.19472528 3.14479065]
[-1.16056371 1.69369149]
[-1.12327099 2.06830978]
...
[-1.23587477 4.82307]
[0.235431105 3.86740351]
[-2.75554061 1.10352468]] [[[2.19472528 3.14479065]
[-1.16056371 1.69369149]
[-1.12327099 2.06830978]
...
[-1.23587477 4.82307]
[0.235431105 3.86740351]
[-2.75554061 1.10352468]]]
Second predicted value p should be between 0 and 1 and since it is out of this range I am getting nan during counting loss.
Any suggestions? Thanks
I can't give an exact programming explanation, but I can give a theoretical answer to this question which you should be able to use to build it.
From what I am assuming, you are asking how to use a different activation function for each output node in the outputs layer. I do not know much about any of the libraries or extensions you are using, but usually these kinds of libraries include some kind of method to create a customised network. From the code you have posted I can see that you are using a pre-defined structure for a network, this means that you may not be able to customise the output layer yourself, and you will have to create a custom network instead. I am assuming you are using Tensorflow due to some of the methods in the code you posted.
There is also something else to consider. Usually you have activation functions on the neurons of the hidden layers too, which is something you might have to take into consideration as well.
I am sorry I was not able to give a practical answer, but I hope this helps you see what you can do to get it to work - have a nice day!
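One thing visible in the question's code is that the Model is built from outputs (the raw Dense(2) tensor) rather than distribution_outputs, so the Lambda layer never becomes part of the trained model. The sketch below shows one possible way to give each output its own activation, assuming TensorFlow 2.x. It swaps the Lambda layer for two single-unit Dense heads joined by Concatenate; the relu activations on the hidden layers and the random stand-in data are assumptions, not part of the original code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

inputs = Input(shape=(212,))
# relu is an assumption here; the question's hidden layers have no activation.
hidden = Dense(16, activation='relu')(inputs)
hidden = Dense(8, activation='relu')(hidden)

# One unit per distribution parameter, each with its own activation.
n = Dense(1, activation='softplus', name='n')(hidden)  # positive
p = Dense(1, activation='sigmoid', name='p')(hidden)   # between 0 and 1

# Join the two parameters back into a single (batch, 2) output tensor.
distribution_outputs = Concatenate(axis=-1)([n, p])

# The activated tensor, not a raw Dense(2) output, is the model output.
model = Model(inputs=inputs, outputs=distribution_outputs)

X_demo = np.random.randn(4, 212).astype("float32")  # random stand-in data
y_pred = model.predict(X_demo, verbose=0)
print(y_pred.shape)
```

With this wiring, the second column of y_pred stays within the sigmoid's range, which is what the negative binomial loss in the question expects for p.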
I am new to machine learning and trying to apply it to my problem.
I have a training dataset with 44000 rows of features with shape 6, 25. I want to build a sequential model. I was wondering if there is a way to use the features without flattening it. Currently, I flatten the features to 1d array and normalize for training (see the code below). I could not find a way to normalize 2d features.
dataset2d = dataset2d.reshape(dataset2d.shape[0],
dataset2d.shape[1]*dataset2d.shape[2])
normalizer = preprocessing.Normalization()
normalizer.adapt(dataset2d)
print(normalizer.mean.numpy())
x_train, x_test, y_train, y_test = train_test_split(dataset2d, flux_val,
test_size=0.2)
# %% DNN regression multiple parameter
def build_and_compile_model(norm):
    inputs = Input(shape=(x_test.shape[1],))
    x = norm(inputs)
    x = layers.Dense(128, activation="selu")(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dense(1, activation="linear")(x)
    model = Model(inputs, x)
    model.compile(loss='mean_squared_error',
                  optimizer=keras.optimizers.Adam(learning_rate=1e-3))
    return model
dnn_model = build_and_compile_model(normalizer)
dnn_model.summary()
# interrupt training when model is no longer improving
path_checkpoint = "model_checkpoint.h5"
modelckpt_callback = keras.callbacks.ModelCheckpoint(monitor="val_loss",
filepath=path_checkpoint,
verbose=1,
save_weights_only=True,
save_best_only=True)
es_callback = keras.callbacks.EarlyStopping(monitor="val_loss",
min_delta=0, patience=10)
history = dnn_model.fit(x_train, y_train, validation_split=0.2,
epochs=120, callbacks=[es_callback, modelckpt_callback])
I also tried to modify my model input layer to the following, such that I do not need to reshape my input
inputs = Input(shape=(x_test.shape[-1], x_test.shape[-2], ))
and modify the normalization to the following
normalizer = preprocessing.Normalization(axis=1)
normalizer.adapt(dataset2d)
print(normalizer.mean.numpy())
But this does not seem to help. The normalization adapts to a 1d array of length 6, while I want it to adapt to a 2d array of shape 25, 6.
Sorry for the long question. Your help will be much appreciated.
I'm not sure if I understood your issue. The normalizer layer can take N-D tensor and it produces an output with the same shape, for example:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
t = tf.constant(np.arange(2*3*4).reshape(2,3,4) , dtype=tf.float32)
tf.print("\n",t)
normalizer_layer = tf.keras.layers.LayerNormalization(axis=1)
output = normalizer_layer(t)
tf.print("\n",output)
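If the goal is one mean and one standard deviation per position of the 2D feature array, the statistics can be computed over the sample axis only. This is a minimal NumPy sketch of that idea, using random stand-in data of shape (1000, 6, 25) and a small epsilon of 1e-7 as assumptions. The Keras Normalization layer's axis argument is documented to accept a tuple of integers, so axis=(1, 2) may achieve the same inside the model, but that is worth verifying for your version.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real data: 1000 samples, each of shape (6, 25).
dataset2d = rng.normal(loc=5.0, scale=3.0, size=(1000, 6, 25))

# Statistics over the sample axis only: one mean and std per (row, column).
mean = dataset2d.mean(axis=0)  # shape (6, 25)
std = dataset2d.std(axis=0)    # shape (6, 25)

# Broadcasting applies the (6, 25) statistics to every sample.
normalized = (dataset2d - mean) / (std + 1e-7)

print(mean.shape, normalized.shape)
```

No flattening is needed here; the normalized array keeps the (samples, 6, 25) layout and can be fed to a model whose Input has shape (6, 25).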
I know there are several questions about this here, but I haven't found one which fits exactly my problem.
I'm trying to fit an LSTM with data from Pandas DataFrames but getting confused about the format I have to provide them.
I created a small code snippet that shows what I am trying to do:
import pandas as pd, tensorflow as tf, random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
targets = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
targets['A'] = [random.random() for _ in range(len(targets))]
targets['B'] = [random.random() for _ in range(len(targets))]
features = pd.DataFrame(index=targets.index)
for i in range(len(features)):
    features[str(i)] = [random.random() for _ in range(len(features))]
model = Sequential()
model.add(LSTM(units=targets.shape[1], input_shape=features.shape))
model.compile(optimizer='adam', loss='mae')
model.fit(features, targets, batch_size=10, epochs=10)
This results in:
ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [10, 300]
which I expect relates to the dimensions of the features DataFrame provided. I guess that once fixed this the next error would mention the targets DataFrame.
As far as I understand, 'units' parameter of my first layer defines the output dimensionality of this model. The inputs have to have a 3D shape, but I don't know how to create them out of the 2D world of the Data Frames.
I hope you can help me understanding the reshape mechanism in Python and how to use them in combination with Pandas DataFrames. (I'm quite new to Python and came from R)
Thanks in advance
Let's look at a few popular ways in which LSTMs are used.
Many to Many
Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the part of speech (POS) of each word.
So you have n words and you feed one word per timestep to the LSTM. Each LSTM timestep (also called LSTM unrolling) will produce an output. Each word is represented by a set of features, normally word embeddings. So the input to the LSTM has shape batch_size x time_steps x features.
Keras code:
import numpy as np
from tensorflow import keras
inputs = keras.Input(shape=(10,3))
lstm = keras.layers.LSTM(8, input_shape = (10, 3), return_sequences = True)(inputs)
outputs = keras.layers.TimeDistributed(keras.layers.Dense(5, activation='softmax'))(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4,10,3)
y = np.random.randint(0,2, size=(4,10,5))
model.fit(X, y, epochs=2)
print (model.predict(X).shape)
Many to One
Example: Again you have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict whether the sentiment of the sentence is positive or negative.
Keras code
inputs = keras.Input(shape=(10,3))
lstm = keras.layers.LSTM(8, input_shape = (10, 3), return_sequences = False)(inputs)
outputs =keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4,10,3)
y = np.random.randint(0,2, size=(4,5))
model.fit(X, y, epochs=2)
print (model.predict(X).shape)
Many to multi-headed
Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the sentiment of the sentence as well as the author of the sentence.
This is a multi-headed model where one head will predict the sentiment and another head will predict the author. Both heads share the same LSTM backbone.
Keras code
inputs = keras.Input(shape=(10,3))
lstm = keras.layers.LSTM(8, input_shape = (10, 3), return_sequences = False)(inputs)
output_A = keras.layers.Dense(5, activation='softmax')(lstm)
output_B = keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=[output_A, output_B])
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4,10,3)
y_A = np.random.randint(0,2, size=(4,5))
y_B = np.random.randint(0,2, size=(4,5))
model.fit(X, [y_A, y_B], epochs=2)
y_hat_A, y_hat_B = model.predict(X)
print (y_hat_A.shape, y_hat_B.shape)
What you are looking for is a many to multi-headed model, where one head makes the predictions for A and another head makes the predictions for B.
The input data for the LSTM has to be 3D.
If you print the shapes of your DataFrames you get:
targets : (300, 2)
features : (300, 300)
The input data has to be reshaped into (samples, time steps, features). This means that targets and features must have the same number of samples.
You need to set a number of time steps for your problem, in other words, how many samples will be used to make a prediction.
For example, if you have 300 days and 2 features the time step can be 3. So that three days will be used to make one prediction (you can choose this arbitrarily). Here is the code for reshaping your data (with a few more changes):
import pandas as pd
import numpy as np
import tensorflow as tf
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
data = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
data['A'] = [random.random() for _ in range(len(data))]
data['B'] = [random.random() for _ in range(len(data))]
# Choose the time_step size.
time_steps = 3
# Use numpy for the 3D array as it is easier to handle.
data = np.array(data)
def make_x_y(ts, data):
    """
    Parameters
    ts : int
    data : numpy array
    This function creates two arrays, x and y.
    x is the input data and y is the target data.
    """
    x, y = [], []
    offset = 0
    for i in data:
        if offset < len(data)-ts:
            x.append(data[offset:ts+offset])
            y.append(data[ts+offset])
            offset += 1
    return np.array(x), np.array(y)
x, y = make_x_y(time_steps, data)
print(x.shape, y.shape)
nodes = 100 # This is the width of the network.
out_size = 2 # Number of outputs produced by the network. Same size as features.
model = Sequential()
model.add(LSTM(units=nodes, input_shape=(x.shape[1], x.shape[2])))
model.add(Dense(out_size)) # For the output a Dense (fully connected) layer is used.
model.compile(optimizer='adam', loss='mae')
model.fit(x, y, batch_size=10, epochs=10)
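The windowing done by make_x_y can also be sketched without an explicit loop. This version assumes NumPy 1.20 or newer, which provides sliding_window_view, and uses random stand-in data in place of the DataFrame.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

time_steps = 3
rng = np.random.default_rng(0)
data = rng.random((300, 2))  # stand-in for the 300 days x 2 columns

# All windows of length time_steps along the time axis.
# The window dimension is appended last: (windows, features, time steps).
windows = sliding_window_view(data, window_shape=time_steps, axis=0)

# Reorder to (samples, time steps, features) and drop the last window,
# which has no following row to serve as its target.
x = windows[:-1].transpose(0, 2, 1)
y = data[time_steps:]

print(x.shape, y.shape)  # (297, 3, 2) (297, 2)
```

Each x[k] holds rows k to k+2 and y[k] is row k+3, the same pairing that make_x_y produces.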
Well, just to finalize this issue, I would like to share a solution I have worked on in the meantime. The class TimeseriesGenerator in tf.keras.... made it quite easy to provide the data in the right shape to an LSTM model.
from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np
window_size = 7
batch_size = 8
sampling_rate = 1
train_gen = TimeseriesGenerator(X_train.values, y_train.values,
length=window_size, sampling_rate=sampling_rate,
batch_size=batch_size)
valid_gen = TimeseriesGenerator(X_valid.values, y_valid.values,
length=window_size, sampling_rate=sampling_rate,
batch_size=batch_size)
test_gen = TimeseriesGenerator(X_test.values, y_test.values,
length=window_size, sampling_rate=sampling_rate,
batch_size=batch_size)
There are many other ways of implementing generators, e.g. using more_itertools, which provides the function windowed, or making use of tf.data.Dataset and its window method.
For me the TimeseriesGenerator was sufficient to feed the tests I did.
In case you would like to see an example modeling the DAX based on some stocks I'm sharing a notebook on Github.
I've just started exploring TensorFlow and I'm facing an issue regarding performance. As a starting example, I tried implementing a model to simulate a logic gate. Let's say there are two inputs A and B and one output Y. Suppose Y depended only on B and not on A. That means that the following are valid examples:
[0, 0] -> 0
[1, 0] -> 0
[0, 1] -> 1
[1, 1] -> 1
I created training sets for this data and created a model that uses a DenseFeatures layer using two features A and B. This layer feeds into a Dense(128, 'relu') layer, which feeds into a Dense(16, 'relu') layer, which finally feeds into a Dense(1, 'sigmoid') layer.
Training this NN works fine and the predictions are perfect. However, I noticed that on my MacBook, each prediction takes about 250ms. This is too much, since my final goal is to use such a NN to test hundreds of predictions each second.
So I stripped the network down to DenseFeatures([A, B]) -> Dense(8, 'relu') -> Dense(1, 'sigmoid'); however, predictions for this NN still take the same amount of time. I was expecting the execution speed to depend on the complexity of the model, but that does not seem to be the case here. What am I doing wrong?
Also, I had read that TensorFlow uses floating point math for accuracy but this has a penalty hit in terms of performance and if we convert our data to use integer math, it would speed things up. However, I have no idea of how to achieve that.
I would really appreciate if someone can help me understand why predictions for such a simple logic gate and such a simple NN is taking this long. And how can I speed it up.
For reference, here is my code in python:
import random
from typing import Any, List
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow import feature_column
from tensorflow.keras import layers
class Input:
    def __init__(self, data: List[int]):
        self.data = data
class Output:
    def __init__(self, value: float):
        self.value = value
class Item:
    def __init__(self, input: Input, output: Output):
        self.input = input
        self.output = output
DATA: List[Item] = []
for i in range(10000):
    x = Input([random.randint(0, 1), random.randint(0, 1)])
    y = Output(x.data[1])
    DATA.append(Item(x, y))
BATCH_SIZE = 5
DATA_TRAIN, DATA_TEST = train_test_split(DATA, shuffle=True, test_size=0.2)
DATA_TRAIN, DATA_VAL = train_test_split(DATA_TRAIN, shuffle=True, test_size=0.2/0.8)
def toDataSet(data: List[Item], shuffle: bool, batch_size: int):
    a = {
        'a': [x.input.data[0] for x in data],
        'b': [x.input.data[1] for x in data],
    }
    b = [x.output.value for x in data]
    return tf.data.Dataset.from_tensor_slices((a, b)).shuffle(buffer_size=len(data)).batch(BATCH_SIZE)
DS_TRAIN = toDataSet(DATA_TRAIN, True, 5)
DS_VAL = toDataSet(DATA_VAL, True, 5)
DS_TEST = toDataSet(DATA_TEST, True, 5)
FEATURES = []
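# Assumption: the two features are numeric columns named after the dataset keys 'a' and 'b'
FEATURES.append(feature_column.numeric_column('a'))
FEATURES.append(feature_column.numeric_column('b'))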
feature_layer = tf.keras.layers.DenseFeatures(FEATURES)
model = tf.keras.models.load_model('MODEL.H5')
model = tf.keras.Sequential([
feature_layer,
layers.Dense(8, activation='relu'),
layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(DS_TRAIN, validation_data=DS_VAL, epochs=10)
loss, accuracy = model.evaluate(DS_TEST)
for i in range(1000):
    val = model.predict([np.array([random.randint(0, 1)]), np.array([random.randint(0, 1)])])
Since you are only using integers, change the input of the model to use 8-bit signed integers. You can do this by changing the datatype in your input layer by adding the dtype parameter. This will vastly improve processing speed since you won't be wasting calculations.
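Separately from the dtype suggestion, it may be worth checking how predict() is called. The Keras documentation notes that predict() is designed for batch processing of large inputs and recommends calling the model directly for small inputs that fit in one batch. The sketch below assumes TensorFlow 2.x and a small stand-in model in which a plain 2-feature input replaces DenseFeatures; the data is random.

```python
import numpy as np
import tensorflow as tf

# Small stand-in model; a plain 2-feature input replaces DenseFeatures here.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

rng = np.random.default_rng(0)
inputs = rng.integers(0, 2, size=(1000, 2)).astype("float32")

# Option 1: one predict() call for all 1000 inputs instead of 1000 calls.
batched = model.predict(inputs, batch_size=1000, verbose=0)

# Option 2: for a single small input, call the model directly.
single = model(inputs[:1], training=False).numpy()

print(batched.shape, single.shape)
```

Both options avoid paying the fixed per-call overhead of predict() once per sample, which is likely to matter far more for a model this small than the layer sizes do.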
I have built a LSTM model using Keras library to predict duplicate questions on the Quora official dataset. The test labels are 0 or 1. 1 indicates the question pair is duplicate. After building the model using model.fit, I test the model using model.predict on the test data. The output is an array of values(probabilities) like below:
[ 0.00514298]
[ 0.15161049]
[ 0.27588326]
[ 0.00236167]
[ 1.80067325]
[ 0.01048524]
[ 1.43425131]
[ 1.99202418]
[ 0.54853892]
[ 0.02514757]
I am only showing the first 10 values in the array. I don't understand what these values mean or how to compare them against the test labels to calculate the test accuracy. I want the model to output binary predictions (0 or 1) rather than probabilities. Please refer to the last section of my code below:
sequence_1_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_1 = embedding_layer(sequence_1_input)
x1 = lstm_layer(embedded_sequences_1)
sequence_2_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences_2 = embedding_layer(sequence_2_input)
y1 = lstm_layer(embedded_sequences_2)
merged = concatenate([x1, y1])
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
merged = Dense(num_dense, activation=act)(merged)
merged = Dropout(rate_drop_dense)(merged)
merged = BatchNormalization()(merged)
preds = Dense(1, activation='sigmoid')(merged)
########################################
## train the model
########################################
model = Model(inputs=[sequence_1_input, sequence_2_input], \
outputs=preds)
model.compile(loss='binary_crossentropy',
optimizer='nadam',
metrics=['acc'])
hist = model.fit([data_1_train, data_2_train], labels_train, \
validation_data=([data_1_val, data_2_val], labels_val, weight_val), \
epochs=200, batch_size=2048, shuffle=True, \
class_weight=class_weight, callbacks=[early_stopping, model_checkpoint])
preds = model.predict([test_data_1, test_data_2], batch_size=8192,
verbose=1)
preds += model.predict([test_data_2, test_data_1], batch_size=8192,
verbose=1)
preds /= 2
print(type(preds))
print(preds[:20])
print('preds.ravel')
print(preds.ravel())
As you say, your output is a np array with probabilities. You can convert it to binary labels by doing for example (model.predict(X) > 0.5).astype(int)
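A minimal sketch of that conversion and of the accuracy calculation, using made-up probabilities and labels for illustration only (they are not taken from the Quora data):

```python
import numpy as np

# Made-up probabilities and labels for illustration only.
probs = np.array([[0.005], [0.152], [0.276], [0.902], [0.549], [0.025]])
labels = np.array([0, 0, 1, 1, 1, 0])

# Threshold at 0.5 and flatten from shape (n, 1) to (n,).
pred_labels = (probs > 0.5).astype(int).ravel()

# Fraction of predictions that match the true labels.
accuracy = np.mean(pred_labels == labels)
print(pred_labels, accuracy)
```

The ravel() call matters because predict() returns an array of shape (n, 1), while labels are usually stored with shape (n,); comparing the two without flattening would broadcast to an (n, n) array.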
Artificial neural networks are probabilistic classifiers, so your output is absolutely fine. It's just the probability of belonging to your target label.
In addition, one interesting fact is that 0.5 may not be the threshold you want to use. It depends on how important true positives and false positives are in your task. You can take a look at ROC curves to find the optimal threshold.
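As a rough sketch of picking a cut-off from the ROC trade-off, the loop below sweeps candidate cut-offs and keeps the one that maximizes the true positive rate minus the false positive rate (Youden's J). That criterion is an assumption chosen for illustration; a task that weighs false positives differently would call for a different one. The scores and labels are made up. sklearn.metrics.roc_curve computes the same rates if scikit-learn is available.

```python
import numpy as np

# Made-up scores and labels for illustration only.
probs = np.array([0.05, 0.20, 0.35, 0.40, 0.60, 0.65, 0.80, 0.95])
labels = np.array([0, 0, 1, 0, 1, 0, 1, 1])

best_threshold, best_j = None, -1.0
for t in np.unique(probs):
    pred = probs >= t
    tpr = np.sum(pred & (labels == 1)) / np.sum(labels == 1)  # true positive rate
    fpr = np.sum(pred & (labels == 0)) / np.sum(labels == 0)  # false positive rate
    j = tpr - fpr  # Youden's J, one possible criterion (an assumption here)
    if j > best_j:
        best_threshold, best_j = t, j

print(best_threshold, best_j)  # 0.35 0.5
```

On this toy data several cut-offs tie at J = 0.5 and the loop keeps the lowest one, which shows that the choice still depends on which kind of error is more costly.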
You can try changing your activation function to softmax in your last layer or you can make your own softmax function and pass your output to that function. Here's an example for a custom softmax function
import numpy as np
def softmax(x):
    e = np.exp(x - np.max(x, axis=0))  # subtract the max for numerical stability
    return e / np.sum(e, axis=0)