Consider the following example problem:
# dummy data for a SO question
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from keras.models import Model
from keras.layers import Input, Conv1D, Dense
from keras.optimizers import Adam, SGD
time = np.array(range(100))
brk = np.array((time>40) & (time < 60)).reshape(100,1)
B = np.array([5, -5]).reshape(1,2)
np.dot(brk, B)
y = np.c_[np.sin(time), np.sin(time)] + np.random.normal(scale = .2, size=(100,2))+ np.dot(brk, B)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
You've got N time series, and they've got one component that follows a common process, and another component that is idiosyncratic to the series itself. Assume for simplicity that you know a priori that the bump is between 40 and 60, and you want to model it simultaneously with the sinusoidal component.
A TCN does a good job on the common component, but it can't get the series-idiosyncratic component:
# time series model
n_filters = 10
filter_width = 3
dilation_rates = [2**i for i in range(7)]
inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation='relu',
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)
model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.summary()
X_train = np.transpose(np.c_[time, time]).reshape(2,100,1)
y_train = np.transpose(y).reshape(2,100,1)
history = model.fit(X_train, y_train,
                    batch_size=2,
                    epochs=1000,
                    verbose=0)
yhat = model.predict(X_train)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
plt.plot(time, yhat[0,:,:])
plt.plot(time, yhat[1,:,:])
On the other hand, a basic linear regression with N outputs (here implemented in Keras) is perfect for the idiosyncratic component:
inp1 = Input((1,))
x1 = inp1
x1 = Dense(2)(x1)
model1 = Model(inputs = inp1, outputs = x1)
model1.compile(optimizer = Adam(), loss='mean_squared_error')
model1.summary()
brk_train = brk
y_train = y
history = model1.fit(brk_train, y_train,
                     batch_size=100,
                     epochs=6000, verbose=0)
yhat1 = model1.predict(brk_train)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
plt.plot(time, yhat1[:,0])
plt.plot(time, yhat1[:,1])
I want to use keras to jointly estimate the time series component and the idiosyncratic component. The major problem is that feed-forward networks (of which linear regression is a special case) take inputs of shape batch_size x dims, while time series networks take inputs of shape batch_size x time_steps x dims.
Because I want to jointly estimate the idiosyncratic part of the model (the linear regression part) together with the time series part, I'm only ever going to batch-sample whole time series, which is why I specified batch_size = time_steps for model 1.
But in the static model, what I'm really doing is modeling my data as time_steps x dims.
I have tried to re-cast the feed-forward model as a time-series model, without success. Here's the non-working approach:
inp3 = Input(shape = (None, 1))
x3 = inp3
x3 = Dense(2)(x3)
model3 = Model(inputs = inp3, outputs = x3)
model3.compile(optimizer = Adam(), loss='mean_squared_error')
model3.summary()
brk_train = brk.reshape(1, 100, 1)
y_train = np.transpose(y).reshape(2,100,1)
history = model3.fit(brk_train, y_train,
                     batch_size=1,
                     epochs=1000, verbose=1)
ValueError: Error when checking target: expected dense_40 to have shape (None, 2) but got array with shape (100, 1)
I am trying to fit the same model as model1, but with a different shape, so that it is compatible with the TCN model -- and importantly so that it will have the same batching structure.
The output should ultimately have the shape (2, 100, 1) in this example. Basically I want the model to do the following algorithm:
Ingest X of shape (N, time_steps, dims)
Lose the first dimension, because the design matrix is going to be identical for every series, yielding X1 of shape (time_steps, dims)
Forward step: np.dot(X1, W), where W is of dimension (dims, N), yielding X2 of dimension (time_steps, N)
Reshape X2 to (N, time_steps, 1). Then I can add it to the output of the other part of the model.
Backward step: since this is just a linear model, the gradient of the output with respect to W is just X1
How can I implement this? Do I need a custom layer?
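(For concreteness, here is an untested sketch of what such a custom layer might look like -- the class name, the n_series argument, and the trick of keeping only the first batch element are all my own guesses:)
# Untested sketch of the algorithm above as a custom Keras layer.
from keras import backend as K
from keras.layers import Layer

class SharedDesignDense(Layer):
    def __init__(self, n_series, **kwargs):
        self.n_series = n_series
        super(SharedDesignDense, self).__init__(**kwargs)

    def build(self, input_shape):
        # W has shape (dims, N): one coefficient per (regressor, series) pair.
        self.W = self.add_weight(name='W',
                                 shape=(input_shape[-1], self.n_series),
                                 initializer='glorot_uniform',
                                 trainable=True)
        super(SharedDesignDense, self).build(input_shape)

    def call(self, x):
        x1 = x[0]                               # (time_steps, dims): design matrix is identical per series
        x2 = K.dot(x1, self.W)                  # (time_steps, N)
        x3 = K.permute_dimensions(x2, (1, 0))   # (N, time_steps)
        return K.expand_dims(x3, -1)            # (N, time_steps, 1)

    def compute_output_shape(self, input_shape):
        return (self.n_series, input_shape[1], 1)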
I'm building off of ideas in this paper, in case you're curious about the motivation behind all of this.
EDIT: After posting, I noticed that I used only the time variable, rather than the time series itself. A TCN fit with the lagged series fits the idiosyncratic part of the series just fine (in-sample anyway). But my basic question still stands -- I want to merge the two types of networks.
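Concretely, "fit with the lagged series" means feeding each series' own past values instead of time, roughly like this (a sketch; it uses the shift5 helper defined in the solution below, and the zero fill value is an arbitrary choice):
# Sketch: lag each series by one step and use that as the TCN's input.
y_lag = np.stack([shift5(y[:, i], 1, fill_value=0.0) for i in range(2)])
X_train_lagged = y_lag.reshape(2, 100, 1)  # same (N, time_steps, dims) layout as before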
So, I solved my own problem. The answer is to create dummy interactions (and thus a really sparse design matrix) and then reshape the data.
###########################
# interaction model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from keras.models import Model
from keras.layers import Input, Conv1D, Dense
from keras.optimizers import Adam, SGD
from patsy import dmatrix
def shift5(arr, num, fill_value=np.nan):
    """Shift a 1-D array by num positions, padding with fill_value."""
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        result = arr
    return result
time = np.array(range(100))
brk = np.array((time>40) & (time < 60)).reshape(100,1)
B = np.array([5, -5]).reshape(1,2)
np.dot(brk, B)
y = np.c_[np.sin(time), np.sin(time)] + np.random.normal(scale = .2, size=(100,2))+ np.dot(brk, B)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
# define interaction model
inp = Input(shape=(None, 2))
x = inp
x = Dense(1)(x)
model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.summary()
df = pd.DataFrame(data = {"fips": np.concatenate((np.zeros(100), np.ones(100))),
"brk": np.concatenate((brk.reshape(100), brk.squeeze()))})
df.brk = df.brk.astype(int)
tm = np.asarray(dmatrix("brk:C(fips)-1", data = df))
brkint = np.concatenate(( \
tm[:100,:].reshape(1,100,2),
tm[100:200,:].reshape(1,100,2)
), axis = 0)
y_train = np.transpose(y).reshape(2,100,1)
history = model.fit(brkint, y_train,
                    batch_size=2,
                    epochs=1000,
                    verbose=1)
yhat = model.predict(brkint)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
plt.plot(time, yhat[0,:,:])
plt.plot(time, yhat[1,:,:])
The output shape is the same as for the TCN, and can simply be added element-wise.
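For completeness, here is a minimal sketch of the joint model this enables (it reuses n_filters, filter_width, and dilation_rates from the TCN above; feeding the lagged series into inp_ts is assumed):
from keras.layers import Add

inp_ts = Input(shape=(None, 1))   # lagged-series input for the TCN branch
inp_brk = Input(shape=(None, 2))  # interaction design matrix for the linear branch

x_ts = inp_ts
for dilation_rate in dilation_rates:
    x_ts = Conv1D(filters=n_filters, kernel_size=filter_width,
                  padding='causal', activation='relu',
                  dilation_rate=dilation_rate)(x_ts)
x_ts = Dense(1)(x_ts)             # common (time series) component

x_brk = Dense(1)(inp_brk)         # idiosyncratic (linear regression) component

out = Add()([x_ts, x_brk])        # element-wise sum of the two components
joint_model = Model(inputs=[inp_ts, inp_brk], outputs=out)
joint_model.compile(optimizer=Adam(), loss='mean_squared_error')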
I know there are several questions about this here, but I haven't found one that exactly fits my problem.
I'm trying to fit an LSTM with data from pandas DataFrames but am getting confused about the format in which I have to provide the data.
I created a small code snippet that shows what I'm trying to do:
import pandas as pd, tensorflow as tf, random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
targets = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
targets['A'] = [random.random() for _ in range(len(targets))]
targets['B'] = [random.random() for _ in range(len(targets))]
features = pd.DataFrame(index=targets.index)
for i in range(len(features)):
    features[str(i)] = [random.random() for _ in range(len(features))]
model = Sequential()
model.add(LSTM(units=targets.shape[1], input_shape=features.shape))
model.compile(optimizer='adam', loss='mae')
model.fit(features, targets, batch_size=10, epochs=10)
This results in:
ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [10, 300]
which I expect relates to the dimensions of the features DataFrame provided. I guess that once this is fixed, the next error will mention the targets DataFrame.
As far as I understand, the 'units' parameter of my first layer defines the output dimensionality of this model. The inputs have to have a 3D shape, but I don't know how to create them out of the 2D world of the DataFrames.
I hope you can help me understand the reshape mechanism in Python and how to use it in combination with pandas DataFrames. (I'm quite new to Python and came from R.)
Thanks in advance!
Let's look at a few popular ways in which LSTMs are used.
Many to Many
Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the part of speech (POS) of each word.
So you have n words and you feed one word per timestep to the LSTM. Each LSTM timestep (also called LSTM unrolling) will produce an output. Each word is represented by a set of features, normally word embeddings. So the input to the LSTM is of size batch_size x time_steps x features.
Keras code:
import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(10, 3))
lstm = keras.layers.LSTM(8, return_sequences=True)(inputs)
outputs = keras.layers.TimeDistributed(keras.layers.Dense(5, activation='softmax'))(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4, 10, 3)
y = np.random.randint(0, 2, size=(4, 10, 5))
model.fit(X, y, epochs=2)
print(model.predict(X).shape)  # (4, 10, 5): one prediction per timestep
Many to One
Example: Again you have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the sentiment of the sentence: positive or negative.
Keras code:
inputs = keras.Input(shape=(10, 3))
lstm = keras.layers.LSTM(8, return_sequences=False)(inputs)
outputs = keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4, 10, 3)
y = np.random.randint(0, 2, size=(4, 5))
model.fit(X, y, epochs=2)
print(model.predict(X).shape)  # (4, 5): one prediction per sequence
Many to multi-headed
Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the sentiment of the sentence as well as the author of the sentence.
This is a multi-headed model where one head will predict the sentiment and another head will predict the author. Both heads share the same LSTM backbone.
Keras code:
inputs = keras.Input(shape=(10, 3))
lstm = keras.layers.LSTM(8, return_sequences=False)(inputs)
output_A = keras.layers.Dense(5, activation='softmax')(lstm)
output_B = keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=[output_A, output_B])
model.compile(loss='categorical_crossentropy', optimizer='adam')
X = np.random.randn(4, 10, 3)
y_A = np.random.randint(0, 2, size=(4, 5))
y_B = np.random.randint(0, 2, size=(4, 5))
model.fit(X, [y_A, y_B], epochs=2)
y_hat_A, y_hat_B = model.predict(X)
print(y_hat_A.shape, y_hat_B.shape)  # (4, 5) (4, 5)
What you are looking for is the many-to-multi-head model: predictions for A will be made by one head, and another head will make predictions for B.
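A rough sketch for your case (the window length of 3 and the 32 units are arbitrary choices, and the input is assumed to be already windowed into 3-D as (samples, time steps, features)):
from tensorflow import keras

inputs = keras.Input(shape=(3, 300))           # (time_steps, features)
lstm = keras.layers.LSTM(32)(inputs)
out_A = keras.layers.Dense(1, name='A')(lstm)  # regression head for target A
out_B = keras.layers.Dense(1, name='B')(lstm)  # regression head for target B
model = keras.Model(inputs=inputs, outputs=[out_A, out_B])
model.compile(loss='mae', optimizer='adam')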
The input data for the LSTM has to be 3D.
If you print the shapes of your DataFrames you get:
targets : (300, 2)
features : (300, 300)
The input data has to be reshaped into (samples, time steps, features). In the setup below, the targets are simply the next step of the same array, so features and targets have the same width (here 2 columns).
You need to set a number of time steps for your problem, in other words, how many samples will be used to make a prediction.
For example, if you have 300 days and 2 features the time step can be 3. So that three days will be used to make one prediction (you can choose this arbitrarily). Here is the code for reshaping your data (with a few more changes):
import pandas as pd
import numpy as np
import tensorflow as tf
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
data = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
data['A'] = [random.random() for _ in range(len(data))]
data['B'] = [random.random() for _ in range(len(data))]
# Choose the time_step size.
time_steps = 3
# Use numpy for the 3D array as it is easier to handle.
data = np.array(data)
def make_x_y(ts, data):
    """
    Build sliding windows.

    Parameters
    ----------
    ts : int
        Window length (time steps per sample).
    data : numpy array

    Returns x (windows of length ts) and y (the value right after each window).
    """
    x, y = [], []
    for offset in range(len(data) - ts):
        x.append(data[offset:offset + ts])
        y.append(data[offset + ts])
    return np.array(x), np.array(y)
x, y = make_x_y(time_steps, data)
print(x.shape, y.shape)
nodes = 100 # This is the width of the network.
out_size = 2 # Number of outputs produced by the network. Same size as features.
model = Sequential()
model.add(LSTM(units=nodes, input_shape=(x.shape[1], x.shape[2])))
model.add(Dense(out_size)) # For the output a Dense (fully connected) layer is used.
model.compile(optimizer='adam', loss='mae')
model.fit(x, y, batch_size=10, epochs=10)
Well, just to finalize this issue, I would like to provide a solution I worked out in the meantime. The class TimeseriesGenerator in keras.preprocessing.sequence made it quite easy to provide the data in the right shape to an LSTM model:
from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np
window_size = 7
batch_size = 8
sampling_rate = 1
# X_train, y_train, etc. are the DataFrames from my own train/valid/test split.
train_gen = TimeseriesGenerator(X_train.values, y_train.values,
                                length=window_size, sampling_rate=sampling_rate,
                                batch_size=batch_size)
valid_gen = TimeseriesGenerator(X_valid.values, y_valid.values,
                                length=window_size, sampling_rate=sampling_rate,
                                batch_size=batch_size)
test_gen = TimeseriesGenerator(X_test.values, y_test.values,
                               length=window_size, sampling_rate=sampling_rate,
                               batch_size=batch_size)
There are many other ways of implementing generators, e.g. using more_itertools, which provides the function windowed, or using tf.data.Dataset and its window method.
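For example, a sketch of the tf.data route (window_size and batch_size are the values from above; X and y here stand for your feature and target numpy arrays, and the target is aligned to each window's last step, which differs slightly from TimeseriesGenerator's alignment):
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices((X, y))
ds = ds.window(window_size, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda xw, yw: tf.data.Dataset.zip((xw.batch(window_size),
                                                     yw.batch(window_size))))
ds = ds.map(lambda xw, yw: (xw, yw[-1]))  # (window of features, target at window end)
ds = ds.batch(batch_size)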
For me, the TimeseriesGenerator was sufficient for the tests I did.
In case you would like to see an example modeling the DAX based on some stocks I'm sharing a notebook on Github.
I have been developing feedforward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of the shape [instances, time, features], and the performance of the FNNs and RNNs has been the same (except that the RNNs require more computation time).
I have also simulated tabular data (code below) where I expected an RNN to outperform an FNN because the next value in the series is dependent on the previous value in the series; however, both architectures predict it correctly.
With NLP data, I have seen RNNs outperform FNNs, but not with tabular data. Generally, when would one expect an RNN to outperform an FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating an RNN outperforming an FNN?
Thank you! If my simulation code is not ideal for my question, please adapt it or share a more ideal one!
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
Two features were simulated over 10 time steps, where the value of the second feature is dependent on the value of both features in the prior time step.
## Simulate data.
np.random.seed(20180825)
X = np.random.randint(50, 70, size = (11000, 1)) / 100
X = np.concatenate((X, X), axis = 1)
for i in range(10):
    X_next = np.random.randint(50, 70, size = (11000, 1)) / 100
    X = np.concatenate((X, X_next, (0.50 * X[:, -1].reshape(len(X), 1))
                        + (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)
print(X.shape)
## Training and validation data.
split = 10000
Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
FNN:
## FNN model.
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
                              validation_data = (X_valid, Y_valid))
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
LSTM:
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
                                validation_data = (X_lstm_valid, Y_valid))
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
In practice, even in NLP, you see that RNNs and CNNs are often competitive. Here's a 2017 review paper that shows this in more detail. In theory RNNs might handle the full complexity and sequential nature of language better, but in practice the bigger obstacle is usually training the network properly, and RNNs are finicky.
Another problem that might have a chance of working is the balanced-parenthesis problem (either with just parentheses in the strings, or parentheses along with other distractor characters). This requires processing the inputs sequentially and tracking some state, and might be easier to learn with an LSTM than with an FFN.
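For concreteness, a sketch of how such data could be generated (all parameter choices here are arbitrary; note that purely random strings give imbalanced labels, so a real experiment would balance the classes):
import numpy as np

def make_paren_dataset(n_samples=1000, length=20, seed=0):
    """Random '('/')' strings labeled 1 if balanced, 0 otherwise."""
    rng = np.random.RandomState(seed)
    X = rng.randint(0, 2, size=(n_samples, length))  # 0 = '(', 1 = ')'
    y = np.zeros(n_samples, dtype=int)
    for i, seq in enumerate(X):
        depth = 0
        for tok in seq:
            depth += 1 if tok == 0 else -1
            if depth < 0:        # a ')' with no matching '(' -- unbalanced
                break
        y[i] = int(depth == 0)
    return X[..., None].astype('float32'), y  # (n_samples, length, 1) for an LSTM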
Update:
Some data that looks sequential might not actually have to be treated sequentially. For example, even if you provide a sequence of numbers to add, since addition is commutative an FFN will do just as well as an RNN. This could also be true of many health problems where the dominating information is not of a sequential nature. Suppose a patient's smoking habits are measured every year. From a behavioral standpoint the trajectory is important, but if you're predicting whether the patient will develop lung cancer, the prediction will be dominated by just the number of years the patient smoked (maybe restricted to the last 10 years for the FFN).
So you want to make the toy problem more complex and require taking the ordering of the data into account. Maybe some kind of simulated time series where you want to predict whether there was a spike in the data, but you don't care about absolute values, just about the relative nature of the spike; a sketch of such a simulation is below.
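A hedged sketch of such a simulation (the scale range and spike size are arbitrary choices):
import numpy as np

def make_spike_dataset(n=10000, steps=50, seed=0):
    """Series with per-series scale; label 1 if a relative spike was injected."""
    rng = np.random.RandomState(seed)
    scale = rng.uniform(0.1, 10.0, size=(n, 1))      # each series has its own scale
    X = rng.normal(0.0, 1.0, size=(n, steps)) * scale
    y = rng.randint(0, 2, size=n)                    # half the series get a spike
    pos = rng.randint(0, steps, size=n)
    X[np.arange(n), pos] += y * 6.0 * scale[:, 0]    # spike is large *relative* to scale
    return X[..., None].astype('float32'), y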
Update 2:
I modified your code to show a case where RNNs perform better. The trick was to use more complex conditional logic that is more naturally modeled by LSTMs than by FFNs. The code is below. For 8 columns we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train, but its final validation loss is 6x lower, at 1.06.
As we increase the number of columns the LSTM has a larger and larger advantage, especially if we add more complicated conditions. For 16 columns the FFN's validation loss is 19 (and you can more clearly see the training curve, as the model isn't able to instantly fit the data). In comparison, the LSTM takes 11 times longer to train but has a validation loss of 0.31, 30 times smaller than the FFN's! You can play around with even larger matrices to see how far this trend extends.
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import time
np.random.seed(20180908)
rows = 20500
cols = 10
# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))
larger = np.max(Z[:, :cols // 2], axis=1).reshape((rows, 1))   # integer division for indexing
larger2 = np.max(Z[:, cols // 2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)
# Z is now the max of the first half of the array.
Z = np.append(Z, larger, axis=1)
# Z is now the min of the max of each half of the array.
# Z = np.append(Z, smaller, axis=1)
# Combine and shuffle.
#Z = np.concatenate((Z_sum, Z_avg), axis = 0)
np.random.shuffle(Z)
## Training and validation data.
split = 10000
X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
print("Now setting up the FNN")
## FNN model.
tick = time.time()
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
                              validation_data = (X_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("Now evaluating the FNN")
loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])
plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()
print("LSTM")
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)
tick = time.time()
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
                                validation_data = (X_lstm_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("now eval")
loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])
plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()
I am trying to make hourly predictions using a recurrent neural network with TensorFlow and Keras in Python. I have assigned the inputs of my neural network the shape (None, None, 5).
However, I am getting the error:
ValueError: Error when checking input: expected gru_3_input to have shape (None, None, 10) but got array with shape (1, 4, 1)
My MVCE code is:
%matplotlib inline
#!pip uninstall keras
#!pip install keras==2.1.2
import tensorflow as tf
import pandas as pd
from pandas import DataFrame
import math
#####Create the Recurrent Neural Network###
model = Sequential()
model.add(GRU(units=5,
              return_sequences=True,
              input_shape=(None, num_x_signals)))
## This line maps the GRU outputs down to num_y_signals values
model.add(Dense(num_y_signals, activation='sigmoid'))
if False:
    from tensorflow.python.keras.initializers import RandomUniform
    # Maybe use lower init-ranges. ##### I may have to change these during debugging ####
    init = RandomUniform(minval=-0.05, maxval=0.05)
    model.add(Dense(num_y_signals,
                    activation='linear',
                    kernel_initializer=init))
warmup_steps = 5
def loss_mse_warmup(y_true, y_pred):
    # Ignore the "warmup" parts of the sequences
    # by taking slices of the tensors.
    y_true_slice = y_true[:, warmup_steps:, :]
    y_pred_slice = y_pred[:, warmup_steps:, :]
    # These sliced tensors both have this shape:
    # [batch_size, sequence_length - warmup_steps, num_y_signals]
    # Calculate the MSE loss over these tensors.
    loss = tf.losses.mean_squared_error(labels=y_true_slice,
                                        predictions=y_pred_slice)
    loss_mean = tf.reduce_mean(loss)
    return loss_mean
optimizer = RMSprop(lr=1e-3)  ### This is something related to debugging
model.compile(loss=loss_mse_warmup, optimizer=optimizer)  #### I may have to make the output a signal rather than the whole data set
print(model.summary())
model.fit_generator(generator=generator,
                    epochs=20,
                    steps_per_epoch=100,
                    validation_data=validation_data)
I am not sure why this happens, but I believe it could be something to do with reshaping my training and testing data. I have also attached my full error message to my code to make the problem reproducible.
I'm unsure about the correctness but here it is:
%matplotlib inline
#!pip uninstall keras
#!pip install keras==2.1.2
import tensorflow as tf
import pandas as pd
from pandas import DataFrame
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
import datetime
from keras.layers import Input, Dense, GRU, Embedding
from keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard, ReduceLROnPlateau
# Named 'dates' so it doesn't shadow the datetime module.
dates = [datetime.datetime(2012, 1, 1, 1, 0, 0) + datetime.timedelta(hours=i) for i in range(10)]
X = np.array([2.25226244, 1.44078451, 0.99174488, 0.71179491, 0.92824542, 1.67776948, 2.96399534,
              5.06257161, 7.06504245, 7.77817664, 0.92824542, 1.67776948, 2.96399534, 5.06257161,
              7.06504245, 7.77817664])
y = np.array([0.02062136, 0.00186715, 0.01517354, 0.0129046, 0.02231125, 0.01492537, 0.09646542,
              0.28444476, 0.46289928, 0.77817664, 0.02231125, 0.01492537, 0.09646542, 0.28444476,
              0.46289928, 0.77817664])
X = X[1:11]
y= y[1:11]
df = pd.DataFrame({'date': dates, 'y': y, 'X': X})
df['t']= [x for x in range(10)]
df['X-1'] = df['X'].shift(-1)
x_data = df['X-1'].fillna(0).values  # .values so the slices below are numpy arrays with .reshape
y_data = y
num_data = len(x_data)
#### training and testing split####
train_split = 0.6
num_train = int(train_split*num_data)
num_test = num_data-num_train## number of observations in test set
#input train test
x_train = x_data[0:num_train].reshape(-1, 1)
x_test = x_data[num_train:].reshape(-1, 1)
#print (len(x_train) +len( x_test))
#output train test
y_train = y_data[0:num_train].reshape(-1, 1)
y_test = y_data[num_train:].reshape(-1, 1)
#print (len(y_train) + len(y_test))
### number of input signals
num_x_signals = x_data.shape[0]
# print (num_x_signals)
## number of output signals##
num_y_signals = y_data.shape[0]
#print (num_y_signals)
#### data scaling ####
x_scaler = MinMaxScaler(feature_range=(0,1))
x_train_scaled = x_scaler.fit_transform(x_train)
x_test_scaled = MinMaxScaler(feature_range=(0,1)).fit_transform(x_test)
y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(y_train)
y_test_scaled = MinMaxScaler(feature_range=(0,1)).fit_transform(y_test)
def batch_generator(batch_size, sequence_length):
    """
    Generator function for creating random batches of training data.
    """
    # Infinite loop, providing the neural network with random data from the
    # dataset for x and y.
    while True:
        # Allocate a new array for the batch of input signals.
        x_shape = (batch_size, sequence_length, num_x_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        # Allocate a new array for the batch of output signals.
        y_shape = (batch_size, sequence_length, num_y_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        # Fill the batch with random sequences of data.
        for i in range(batch_size):
            # Get a random start index.
            # This points somewhere into the training data.
            idx = np.random.randint(num_train - sequence_length)
            # Copy the sequences of data starting at this index.
            x_batch[i] = x_train_scaled[idx:idx+sequence_length]
            y_batch[i] = y_train_scaled[idx:idx+sequence_length]
        yield (x_batch, y_batch)
batch_size = 20
sequence_length = 2
generator = batch_generator(batch_size=batch_size,
                            sequence_length=sequence_length)
x_batch, y_batch = next(generator)
#########Validation Set Start########
def batch_generator(batch_size, sequence_length):
    """
    Generator function for creating random batches of validation data.
    """
    while True:
        # Allocate a new array for the batch of input signals.
        x_shape = (batch_size, sequence_length, num_x_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)
        # Allocate a new array for the batch of output signals.
        y_shape = (batch_size, sequence_length, num_y_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)
        # Fill the batch with random sequences of data.
        for i in range(batch_size):
            # Get a random start index into the scaled test data.
            idx = np.random.randint(num_train - sequence_length)
            # Copy the sequences of data starting at this index.
            x_batch[i] = x_test_scaled[idx:idx+sequence_length]
            y_batch[i] = y_test_scaled[idx:idx+sequence_length]
        yield (x_batch, y_batch)
validation_data= next(batch_generator(batch_size,sequence_length))
# validation_data = (np.expand_dims(x_test_scaled, axis=0),
# np.expand_dims(y_test_scaled, axis=0))
#Validation set end
#####Create the Recurrent Neural Network###
model = Sequential()
model.add(GRU(units=5,
              return_sequences=True,
              input_shape=(None, num_x_signals)))
## This line maps the GRU outputs down to num_y_signals values
model.add(Dense(num_y_signals, activation='sigmoid'))
if False:
    from tensorflow.python.keras.initializers import RandomUniform
    # Maybe use lower init-ranges. ##### I may have to change these during debugging ####
    init = RandomUniform(minval=-0.05, maxval=0.05)
    model.add(Dense(num_y_signals,
                    activation='linear',
                    kernel_initializer=init))
warmup_steps = 5
def loss_mse_warmup(y_true, y_pred):
    # Ignore the "warmup" parts of the sequences
    # by taking slices of the tensors.
    y_true_slice = y_true[:, warmup_steps:, :]
    y_pred_slice = y_pred[:, warmup_steps:, :]
    # These sliced tensors both have this shape:
    # [batch_size, sequence_length - warmup_steps, num_y_signals]
    # Calculate the MSE loss over these tensors.
    loss = tf.losses.mean_squared_error(labels=y_true_slice,
                                        predictions=y_pred_slice)
    loss_mean = tf.reduce_mean(loss)
    return loss_mean
optimizer = RMSprop(lr=1e-3)  ### This is something related to debugging
model.compile(loss=loss_mse_warmup, optimizer=optimizer)  #### I may have to make the output a signal rather than the whole data set
print(model.summary())
model.fit_generator(generator=generator,
                    epochs=20,
                    steps_per_epoch=100,
                    validation_data=validation_data)
I've only changed the part of the code between the "Validation Set Start" and "Validation set end" markers.
I am trying to write a program that predicts whether a tumor is malignant or benign.
The dataset is from: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Prognostic%29
This is my code, and my accuracy is at about 65%, which is no better than always predicting the majority class. Any help would be appreciated.
import tensorflow as tf
import pandas as pd
import numpy as np
df = pd.read_csv(r'D:\wholedesktop\logisticReal.txt')
df.drop(['id'], axis=1, inplace=True)
x_data = np.array(df.drop(['class'], axis=1))
x_data = x_data.astype(np.float64)
y = df['class']
y.replace(2, 0, inplace=True)
y.replace(4, 1, inplace=True)
y_data = np.array(y)
# y shape = 681,1
# x shape = 681,9
x = tf.placeholder(name='x', dtype=np.float32)
y = tf.placeholder(name='y', dtype=np.float32)
w = tf.Variable(dtype=np.float32, initial_value=np.random.random((9, 1)))
b = tf.Variable(dtype=np.float32, initial_value=np.random.random((1, 1)))
y_ = (tf.add(tf.matmul(x, w), b))
error = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y_, labels=y))
goal = tf.train.GradientDescentOptimizer(0.05).minimize(error)
prediction = tf.round(tf.sigmoid(y_))
correct = tf.cast(tf.equal(prediction, y), dtype=np.float64)
accuracy = tf.reduce_mean(correct)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2000):
        sess.run(goal, feed_dict={x: x_data, y: y_data})
        print(i, sess.run(accuracy, feed_dict={x: x_data, y: y_data}))
    weight = sess.run(w)
    bias = sess.run(b)
    print(weight)
    print(bias)
Your neural network only has a single layer, so the best it can do is fit a straight line to your data that separates the different classes. This is vastly insufficient for a general (high-dimensional) data set. The power of (deep) neural networks lies in the connectivity between many layers of neurons. In your example, you could add more layers manually by passing the output of matmul to a new matmul with different weights and biases, or you could use the contrib.layers collection to make it more concise:
x = tf.placeholder(name='x', dtype=np.float32)
fc1 = tf.contrib.layers.fully_connected(inputs=x, num_outputs=16, activation_fn=tf.nn.relu)
fc2 = tf.contrib.layers.fully_connected(inputs=fc1, num_outputs=32, activation_fn=tf.nn.relu)
fc3 = tf.contrib.layers.fully_connected(inputs=fc2, num_outputs=64, activation_fn=tf.nn.relu)
The trick is to pass the output from one layer as input to the next layer. As you add more and more layers, your training accuracy will go up (possibly just from over-fitting; use dropout to remedy that).
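For completeness, a sketch of how the last hidden layer could be wired back into the original logistic head (this reuses the placeholders and optimizer settings from the question, and fc3 from the snippet above; the layer sizes are arbitrary):
# Sketch: feed the last hidden layer into a single-logit output and
# reuse the question's loss/optimizer unchanged.
logits = tf.contrib.layers.fully_connected(inputs=fc3, num_outputs=1, activation_fn=None)
error = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))
goal = tf.train.GradientDescentOptimizer(0.05).minimize(error)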
I know there were already some questions in this area, but I couldn't find the answer to my problem.
I have an LSTM (with tflearn) for a regression problem.
I get three types of errors, no matter what modifications I make.
import pandas
import tflearn
import tensorflow as tf
from sklearn.cross_validation import train_test_split
csv = pandas.read_csv('something.csv', sep = ',')
X_train, X_test = train_test_split(csv.loc[:, ['x1', 'x2', 'x3', 'x4', 'x5',
                                               'x6', 'x7', 'x8', 'x9',
                                               'x10']].as_matrix())
Y_train, Y_test = train_test_split(csv.loc[:, ['y']].as_matrix())
#create LSTM
g = tflearn.input_data(shape=[None, 1, 10])
g = tflearn.lstm(g, 512, return_seq = True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512)
g = tflearn.dropout(g, 0.5)
g = tflearn.fully_connected(g, 1, activation='softmax')
g = tflearn.regression(g, optimizer='adam', loss='mean_square',
                       learning_rate=0.001)
model = tflearn.DNN(g)
model.fit(X_train, Y_train, validation_set = (Y_train, Y_test))
n_examples = Y_train.size
def mean_squared_error(y, y_):
    return tf.reduce_sum(tf.pow(y_ - y, 2)) / (2 * n_examples)
print()
print("\nTest prediction")
print(model.predict(X_test))
print(Y_test)
Y_pred = model.predict(X_test)
print('MSE Test: %.3f' % ( mean_squared_error(Y_test,Y_pred)) )
On the first run, when starting a new kernel, I get
ValueError: Cannot feed value of shape (100, 10) for Tensor 'InputData/X:0', which has shape '(?, 1, 10)'
Then, the second time:
AssertionError: Input dim should be at least 3.
and it refers to the second LSTM layer. I tried to remove the second LSTM and Dropout layers, but then I get
feed_dict[net_inputs[i]] = x
IndexError: list index out of range
If you read this, have a nice day. If you answer it, thanks a lot!
OK, I solved it. The input layer was declared with shape [None, 1, 10], i.e. (samples, time steps, features), so the 2-D matrices from train_test_split need an explicit time axis. I'm posting it in case it helps somebody:
X_train = X_train.reshape([-1,1,10])
X_test = X_test.reshape([-1,1,10])