Feature extraction for Timeseries LSTM - python

I want to feed a timeseries into an LSTM to perform a forecast.
Let's say I have 10000 samples. In order to feed the timeseries into my LSTM I reshape it to (samples, timesteps, features). In my case I use timesteps=50 to create subsequences and forecast t+1. So I end up with x.shape=(9950,50,1). So far so good.
My Model
model= Sequential()
model.add(LSTM(50,return_sequences=True,input_shape=(50,1)))
model.add(Dense(out_dim, activation = 'sigmoid'))
model.compile(loss='mse', optimizer='adam')
Now I want to create artificial features, e.g. I want to use the FFT of the signal as a feature. How can I feed it into my LSTM? Is it legitimate to just compute the FFT, append it to the DataFrame, and reshape everything together so I end up with (9950,50,2)?
Questions are basically:
How do I input artificially created features into an LSTM?
Is it the same way for rolling statistics or correlation features?
Thanks in advance

Any extra feature you compute from the input data is just another feature so:
You feed it just like any other feature of the series, input_shape=(50, 1+extra_features), and you have to concatenate the features before passing them to the model. So yes, the input shape will now be (9950, 50, 2).
Yes, it is. You can pre-compute the feature, say a moving average, and then concatenate it with the original input (see the sketch below).
You can also write custom layers to compute these features within the model, but then the model would have to compute them every time. If you compute them a priori, the advantage is that you can save / cache them.
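As a minimal sketch (all names are illustrative; I use a moving average as the derived feature, and an FFT magnitude over a rolling window would be handled the same way), the extra feature is stacked with the raw series before building the (samples, 50, 2) windows:
import numpy as np

series = np.random.randn(10000)                                     # your original 1-D series
rolling_mean = np.convolve(series, np.ones(10) / 10, mode='same')   # derived feature, e.g. moving average

data = np.stack([series, rolling_mean], axis=-1)                    # shape (10000, 2)

timesteps = 50
x = np.array([data[i:i + timesteps] for i in range(len(data) - timesteps)])
y = series[timesteps:]                                              # t+1 targets
print(x.shape)                                                      # (9950, 50, 2)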
If you have non-timeseries features, you need to move to the functional API and have multiple inputs: one which is a timeseries and another which is not:
from keras.models import Model
from keras.layers import Input, LSTM, Dense, concatenate

series_in = Input(shape=(50, 2))
other_in = Input(shape=(extra_features,))  # not a timeseries, just a vector

# An example graph
lstm_out = LSTM(128)(series_in)
merged = concatenate([lstm_out, other_in])
out = Dense(out_dim, activation='sigmoid')(merged)
model = Model([series_in, other_in], out)
model.compile(...)
In this case we have 2 inputs to the model and can use the auxiliary features at any point. In the example, I merge before the final Dense layer so that they aid the prediction along with the timeseries features extracted by the LSTM.
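Training then just takes a list of arrays, one per Input. A quick sketch with dummy data (assuming the model above was built and compiled with concrete values for extra_features and out_dim):
import numpy as np

x_series = np.random.randn(9950, 50, 2)          # windowed timeseries input
x_other = np.random.randn(9950, extra_features)  # static, non-timeseries input
y = np.random.randn(9950, out_dim)               # dummy targets, only the shapes matter here

model.fit([x_series, x_other], y, epochs=10, batch_size=32)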


RNN: understanding concatenating layers

I am trying to understand concatenation of layers in TensorFlow Keras.
Below I have drawn what I think is the concatenation of 2 RNN layers and the output (drawing omitted here for clarity).
I am trying to concatenate two RNN layers. One layer has longitudinal data (integer-valued) of patients in some time sequence, and the other layer has details of the same patients in another time sequence, with categorical input.
I don't want these two different time sequences to be mixed up, since it is medical data, so I am trying this. But before that, I want to be sure that what I have drawn is what concatenating two layers means.
Below is my code. It appears to work well, but I want to confirm that what I drew and what is implemented match.
import tensorflow as tf
from tensorflow.keras import layers, initializers
from tensorflow.keras.layers import Input

# create SimpleRNN with one sequence of input
first_input = Input(shape=(4, 7), dtype='float32')
simpleRNN1 = layers.SimpleRNN(units=25, bias_initializer=initializers.RandomNormal(stddev=0.0001),
                              activation="relu", kernel_initializer="random_uniform")(first_input)

# another layer of RNN
second_input = Input(shape=(16, 1), dtype='float32')
simpleRNN2 = layers.SimpleRNN(units=25, bias_initializer=initializers.RandomNormal(stddev=0.0001),
                              activation="relu", kernel_initializer="random_uniform")(second_input)

# concatenate two layers, stack dense layers on top
concat_lay = tf.keras.layers.Concatenate()([simpleRNN1, simpleRNN2])
dens_lay = layers.Dense(64, activation='relu')(concat_lay)
dens_lay = layers.Dense(32, activation='relu')(dens_lay)
dens_lay = layers.Dense(1, activation='sigmoid')(dens_lay)

model = tf.keras.Model(inputs=[first_input, second_input], outputs=[dens_lay])
# note: the learning rate belongs to the optimizer, not to compile()
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"])
model.summary()
Concatenation means 'chaining together' or 'unification' here: making a union of two entities.
I think your problem is addressed in https://datascience.stackexchange.com/questions/29634/how-to-combine-categorical-and-continuous-input-features-for-neural-network-trai (How to combine categorical and continuous input features for neural network training).
If you have biomedical data, i.e. ECG, as the continuous data and diagnoses as the categorical data, I would consider ensemble learning the best ansatz.
What the best solution is here depends on the details of your problem ...
Building an ensemble of two neural nets is described in https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Yes, what you have implemented is correct (it matches the diagram). To be exact, it is doing the following (in the diagram, not shown here, blue nodes denote inputs/outputs and N denotes None, the batch dimension).
But just to add a few notes:
I am assuming that you want to mix the two outputs of the RNNs at the first Dense layer (with 64 units) because after that, there's no telling which input is which.
When you use Concatenate, as a rule of thumb, specify the axis you need to concatenate on (it defaults to -1). Here, since your two inputs are (None, 25) and (None, 25), axis=-1 works out fine. But it is always good to be specific; otherwise you might end up with bizarre results when implementing complex models.
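For example, the concatenation line from your code with the axis stated explicitly (same result as the default here):
# simpleRNN1: (None, 25), simpleRNN2: (None, 25)  ->  concat_lay: (None, 50)
concat_lay = tf.keras.layers.Concatenate(axis=-1)([simpleRNN1, simpleRNN2])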

How to use Multivariate time-series prediction with Keras, when multiple samples are used

As the title states, I am doing multivariate time-series prediction. I have some experience with this situation and was able to successfully set up and train a working model in TF Keras.
However, I did not know the 'proper' way to handle having multiple unrelated time-series samples. I have about 8000 unique sample 'blocks' with anywhere from 800 to 30,000 time steps per sample. Of course I couldn't concatenate them all into one single time series, because the first points of sample 2 are not related in time to the last points of sample 1.
Thus my solution was to fit each sample individually in a loop (at great inefficiency).
My new idea is: can/should I pad the start of each sample with empty time steps equal to the RNN's look-back length and then concatenate the padded samples into one time series? This would mean that the first time step has look-back data of mostly zeros, which sounds like another 'hack' for my problem and not the right way to do it.
The main challenge is the 800 vs. 30,000 timesteps, but it is nothing you can't handle.
Model design: group sequences into chunks - for example, 30 sequences of 800-to-900 timesteps, padded, then 60 sequences of 900-to-1000, etc. - don't have to be contiguous (i.e. next can be 1200-to-1500)
Input shape: (samples, timesteps, channels) - or equivalently, (sequences, timesteps, features)
Layers: Conv1D and/or RNNs - e.g. GRU, LSTM. Each can handle variable timesteps
Concatenation: don't do it. If each of your sequences is independent, then each must be fed along dimension 0 in Keras - the batch or samples dimension. If they are dependent, e.g. multivariate timeseries, like many channels in a signal - then feed them along the channels dimension (dim 2). But never concatenate along the timesteps dimension, as it implies a causal continuity where none exists.
Stateful RNNs: can help in processing long sequences - info on how they work here
RNN capability: is limited w.r.t. long sequences, and 800 is already in danger zone even for LSTMs; I'd suggest dimensionality reduction via either autoencoders or CNNs w/ strides > 1 at input, then feeding their outputs to RNNs.
RNN training: is difficult. Long train times, hyperparameter sensitivity, vanishing gradients - but, with proper regularization, they can be powerful. More info here
Zero-padding: before/after/both is debatable and worth reading about, but probably steer clear of "both", since learning to ignore padding is easier when it sits in one place; I personally use "before".
RNN variant: use CuDNNLSTM or CuDNNGRU whenever possible, as they are 10x faster
Note: "samples" above, and in machine learning, refers to independent examples / observations, rather than measured signal datapoints (which would be referred to as timesteps).
Below is a minimal code for what a timeseries-suited model would look like:
from tensorflow.keras.layers import Input, Conv1D, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import numpy as np
def make_data(batch_shape):  # dummy data
    return (np.random.randn(*batch_shape),
            np.random.randint(0, 2, (batch_shape[0], 1)))

def make_model(batch_shape):  # example model
    ipt = Input(batch_shape=batch_shape)
    x   = Conv1D(filters=16, kernel_size=10, strides=2, padding='valid')(ipt)
    x   = LSTM(units=16)(x)
    out = Dense(1, activation='sigmoid')(x)  # assuming binary classification

    model = Model(ipt, out)
    model.compile(Adam(lr=1e-3), 'binary_crossentropy')
    return model

batch_shape = (32, 100, 16)  # 32 samples, 100 timesteps, 16 channels
x, y = make_data(batch_shape)
model = make_model(batch_shape)
model.train_on_batch(x, y)
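And, to connect this with the chunking / padding advice above, here is one possible way to bucket sequences of similar length and zero-pad them at the front before feeding them along the samples dimension; the variable names and length ranges below are illustrative, not taken from your data:
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# sequences: list of 1-D arrays of varying length (single channel, illustrative)
sequences = [np.random.randn(np.random.randint(800, 1200)) for _ in range(64)]

# take one bucket of similar-length sequences, then zero-pad each at the front ("pre")
bucket = [s for s in sequences if 800 <= len(s) < 1000]
max_len = max(len(s) for s in bucket)
padded = pad_sequences(bucket, maxlen=max_len, dtype='float32', padding='pre')

x = padded[..., None]   # (samples, timesteps, 1) - channels last
print(x.shape)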

How to train ML model on 2 columns to solve for classification?

I have three columns in a dataset on which I'm doing sentiment analysis (classes 0, 1, 2):
text thing sentiment
But the problem is that I can train my data only on either text or thing and get a predicted sentiment. Is there a way to train on both text & thing and then predict the sentiment?
Problem case(say):
   | text  thing   sentiment
 0 | t1    thing1  0
 . |
 . |
54 | t1    thing2  2
This example tells us that the sentiment depends on the thing as well. I could try concatenating the two columns one below the other and training on that, but it would be incorrect because we wouldn't be giving the model any relationship between the two columns.
Also, my test set contains the two columns text and thing, for which I have to predict the sentiment with the model trained on both columns.
Right now I'm using the tokenizer and then the model below:
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Any pointers on how to proceed or which model or coding manipulation to use ?
You may want to shift to the Keras functional API and train a multi-input model.
According to Keras's creator, François Chollet, in his book Deep Learning with Python [Manning, 2017] (chapter 7, section 1):
Some tasks require multimodal inputs: they merge data coming from different input sources, processing each type of data using different kinds of neural layers. Imagine a deep-learning model trying to predict the most likely market price of a second-hand piece of clothing, using the following inputs: user-provided metadata (such as the item’s brand, age, and so on), a user-provided text description, and a picture of the item. If you had only the metadata available, you could one-hot encode it and use a densely connected network to predict the price. If you had only the text description available, you could use an RNN or a 1D convnet. If you had only the picture, you could use a 2D convnet. But how can you use all three at the same time? A naive approach would be to train three separate models and then do a weighted average of their predictions. But this may be suboptimal, because the information extracted by the models may be redundant. A better way is to jointly learn a more accurate model of the data by using a model that can see all available input modalities simultaneously: a model with three input branches.
I think the Concatenate functionality is the way to go in such a case, and the general idea should be as follows. Please tweak it according to your use case.
from keras.models import Model
from keras.layers import Input, Dense, Concatenate

### whatever preprocessing you may want to do
text_input = Input(shape=(1,))
thing_input = Input(shape=(1,))

### now bring them together
merged_inputs = Concatenate(axis=1)([text_input, thing_input])

### sample output layer
output = Dense(3)(merged_inputs)

### pass your inputs and outputs to the model
model = Model(inputs=[text_input, thing_input], outputs=output)
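If both columns are tokenized sequences rather than single numbers, a slightly fuller sketch (all sizes here are illustrative, not taken from your data) could embed each input separately, run each through its own LSTM, and only then merge:
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense, Concatenate

MAX_NB_WORDS, EMBEDDING_DIM = 20000, 100   # illustrative vocabulary / embedding sizes
text_len, thing_len = 100, 5               # illustrative padded sequence lengths

text_input = Input(shape=(text_len,))
thing_input = Input(shape=(thing_len,))

text_branch = LSTM(100)(Embedding(MAX_NB_WORDS, EMBEDDING_DIM)(text_input))
thing_branch = LSTM(32)(Embedding(MAX_NB_WORDS, EMBEDDING_DIM)(thing_input))

merged = Concatenate()([text_branch, thing_branch])
output = Dense(3, activation='softmax')(merged)

model = Model(inputs=[text_input, thing_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])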
You have to take the multiple columns as a list and then merge them for training, after embedding and preprocessing the raw data.
Example:
import pandas as pd

train = pd.read_csv('COVID19 multifeature Emotion - 50 data.csv', nrows=49)
# This dataset has two text columns and different class labels
X_train_doctor_opinion = train["doctor-opinion"].str.lower()
X_train_patient_opinion = train["patient-opinion"].str.lower()
X_train = list(X_train_doctor_opinion) + list(X_train_patient_opinion)
Then preprocess and embed, for example:
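A possible sketch of that step (the tokenizer settings and maxlen are illustrative, and X_train is the combined list built above):
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# fit one tokenizer on both text columns, then turn each column into padded sequences
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(X_train)

doctor_seq = pad_sequences(tokenizer.texts_to_sequences(X_train_doctor_opinion), maxlen=100)
patient_seq = pad_sequences(tokenizer.texts_to_sequences(X_train_patient_opinion), maxlen=100)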

Tensorflow neural network has very high error with simple regression

I'm trying to build a NN to do regression with Keras in Tensorflow.
I'm trying to predict the chart ranking of a song based on a set of features. I've identified a strong correlation between having a low feature 1, a high feature 2, and a high feature 3, and having a high position on the chart (a low output ranking, e.g. position 1).
However after training my model, the MAE is coming out at about 3500 (very very high) on both the training and testing set. Throwing some values in, it seems to give the lowest output rankings for observations with low values in all 3 features.
I think this could be something to do with the way I'm normalising my data. After bringing it into a pandas DataFrame with a column for each feature, I use the following code to normalise:
def normalise_dataset(df):
    return df-(df.mean(axis=0))/df.std()
I'm using a sequential model with one Dense input layer with 64 neurons and one dense output layer with one neuron. Here is the definition code for that:
model = keras.Sequential([
    keras.layers.Dense(64, activation=tf.nn.relu, input_dim=3),
    keras.layers.Dense(1)
])

optimizer = tf.train.RMSPropOptimizer(0.001)
model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
I'm a software engineer, not a data scientist, so I don't know if this model set-up is the correct configuration for my problem; I'm very open to advice on how to make it better fit my use case.
Thanks
EDIT: Here are the first few entries of my training data; there are ~100,000 entries. The final column (finalPos) contains the labels, the field I'm trying to predict.
chartposition,tagcount,artistScore,finalPos
256,191,119179,4625
256,191,5902650,292
256,191,212156,606
205,1480523,5442
256,195,5675757,179
256,195,933171,7745
The first obvious thing is that you are normalizing your data in the wrong way. The correct way is
return (df - df.mean(axis=0))/df.std()
I just changed the brackets, but basically it should be (data - mean) divided by the standard deviation, whereas your version divides the mean by the standard deviation and subtracts that from the data.
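The difference comes from operator precedence: division binds tighter than subtraction, so the original expression subtracts mean/std from the raw data instead of standardising it. A quick check with a made-up DataFrame:
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0]})

wrong = df - df.mean(axis=0) / df.std()      # df - (mean/std): still on the original scale
right = (df - df.mean(axis=0)) / df.std()    # standardised: zero mean, unit std

print(right['a'].mean(), right['a'].std())   # ~0.0 and 1.0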

Using tensorflow or keras to build a NN model by feeding 'pairwise' samples

I'm trying to implement an NN model with pairwise samples. Details are as follows:
Original data:
X_org with shape of (100, 50) for example, namely 100 samples with 50 features.
Y_org with shape of (100, 1).
Processing these original data for real training:
Select 2 samples from X_org randomly (so we have 100*99/2 such combinations) to form a new 'pairwise' sample; the prediction target, namely the new y label, is the subtraction of the two corresponding Y_org labels (Y_org_sample1 - Y_org_sample2). Now we have a new X_train and Y_train.
I need an NN model (DNN, CNN, LSTM, whatever ...) with which I can pass the first sub-sample of one pairwise sample from X_train into the model and get one result, and the same for the second sub-sample. By calculating the subtraction of the two results, I get the prediction for this pairwise sample. This prediction is the one compared with the corresponding Y label from Y_train.
Overall, I need to train a model (update the weights) after feeding it a 'pairwise' sample (two successive sub-samples). The reason why I don't choose a 'two-arm' model (e.g. merging two arms by xxx.sub()) is that I will only feed one sub-sample during the test process. I will just use the model to predict a single sub-sample in the end.
So I will use the data from X_train during the training step, while using X_org-like data during the test step. It looks a bit complex.
It looks like TensorFlow would be more feasible for this task; if Keras also works, please kindly share your idea.
You can first create a model that will take only one X_org-like element:
#create a model the way you like it, it can be Functional API or Sequential, no problem
xOrgModel = createAModelForXOrgData(...)
Now, let's create a second model, this time necessarily with the functional API, that works with both inputs:
from keras.models import Model
from keras.layers import Input, Subtract
input1 = Input(shapeOfInput)
input2 = Input(shapeOfInput)
output1 = xOrgModel(input1)
output2 = xOrgModel(input2)
output = Subtract()([output1,output2])
pairWiseModel = Model([input1,input2],output)
Now you have two models: xOrgModel and pairWiseModel. You can use any of them depending on the task you are doing at the moment.
Both models are sharing their weights. This means that you can train any of them and the other will be updated as well.
Using the pairwise model
First, organize your data into two separate arrays (because our model uses two inputs).
import numpy as np

L = len(X_org)
x1 = []
x2 = []
y = []

for i in range(L):
    for j in range(i + 1, L):
        x1.append(X_org[i])
        x2.append(X_org[j])
        y.append(Y_org[i] - Y_org[j])

x1 = np.array(x1)
x2 = np.array(x2)
y = np.array(y)
Train and predict with a list of inputs:
pairWiseModel.fit([x1,x2],y,...)
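At test time, only the single-input model is needed; a sketch, assuming x_test has the same shape as X_org:
# both models share weights, so xOrgModel already reflects the pairwise training
single_scores = xOrgModel.predict(x_test)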
