I use Keras for time series prediction. My code predicts the next 6 months by predicting one month ahead, feeding that prediction back in as input, and repeating until all 6 months are covered. In other words, it predicts one month 6 times. Can I predict the next 6 months in a single step?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import LSTM
from pandas.tseries.offsets import MonthEnd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
import keras.backend as K
from keras.layers import Bidirectional
from keras.layers import Embedding
from keras.layers import GRU
df = pd.read_csv('D://data.csv',
engine='python')
df['DATE_'] = pd.to_datetime(df['DATE_']) + MonthEnd(1)
df = df.set_index('DATE_')
df.head()
split_date = pd.Timestamp('03-01-2015')
train = df.loc[:split_date, ['data']]
test = df.loc[split_date:, ['data']]
sc = MinMaxScaler()
train_sc = sc.fit_transform(train)
test_sc = sc.transform(test)
X_train = train_sc[:-1]
y_train = train_sc[1:]
X_test = test_sc[:-1]
y_test = test_sc[1:]
K.clear_session()
model = Sequential()
model.add(Dense(12, input_dim=1, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
model.fit(X_train, y_train, epochs=200, batch_size=2)
y_pred = model.predict(X_test)
real_pred = sc.inverse_transform(y_pred)
real_test = sc.inverse_transform(y_test)
print("Predict Value")
print(real_pred)
print("Test Value")
print(real_test)
Yes, by changing your output layer (the last layer) from Dense(1) to Dense(6). You also have to change y_train and y_test so that each target row holds 6 values, i.e. shape (n_samples, 6) instead of (n_samples, 1).
Best of luck.
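As a rough sketch of what the reshaped targets could look like, the snippet below pairs each value with the 6 values that follow it. The helper name `make_multistep_windows` and the toy series are assumptions for illustration, not part of the original code; in the question's setup the input would be the scaled `train_sc` column.

```python
import numpy as np

def make_multistep_windows(series, horizon=6):
    """Pair each value with the next `horizon` values as its target row."""
    series = np.asarray(series, dtype=float).reshape(-1)
    n_samples = len(series) - horizon
    if n_samples <= 0:
        raise ValueError("series is too short for the requested horizon")
    # One input value per sample, as in the question's Dense(input_dim=1) model.
    X = series[:n_samples].reshape(-1, 1)
    # Row i holds the `horizon` values that follow series[i].
    y = np.stack([series[i + 1 : i + 1 + horizon] for i in range(n_samples)])
    return X, y

# Toy stand-in for the scaled training column (hypothetical data).
scaled = np.arange(10, dtype=float)
X_multi, y_multi = make_multistep_windows(scaled, horizon=6)
print(X_multi.shape, y_multi.shape)
```

With targets shaped like this, the model's last layer would be `Dense(6)` and training would use `model.fit(X_multi, y_multi, ...)`, so one call to `predict` returns all 6 months at once.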
I want to predict outputs from two inputs using an LSTM. My code is as follows:
from keras.models import Sequential
from keras.layers import Dense, LSTM
from numpy import array
from numpy.random import uniform
from numpy import hstack
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd
from sklearn.preprocessing import StandardScaler
import seaborn as sns
dataset_train = pd.read_csv('E:/Research/Summer22/Disaggregation/hourly1.csv')
cols = dataset_train[['TotalE','Outdoor_temp_F','electric.appliances..kWhs.','hvac']]
#Normalization
scaler = StandardScaler()
scaler = scaler.fit(cols)
df_for_training_scaled = scaler.transform(cols)
df_for_training_scaled= pd.DataFrame(data=df_for_training_scaled)
x = array(df_for_training_scaled.iloc[:, [0,1]] )
x = x.reshape(x.shape[0],x.shape[1])
x=x.astype(float)
y= array(df_for_training_scaled.iloc[:, [2,3]] )
y=y.astype(float)
print(y.shape[1])
out_dim = y.shape[1]
in_dim = x.shape[0]
plt.plot(y)
plt.show()
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)
model = Sequential()
model.add(LSTM(64, input_shape=in_dim, activation="relu"))
model.add(Dense(out_dim))
model.compile(loss="mse", optimizer="adam")
model.summary()
History=model.fit(xtrain, ytrain, validation_split=0.7, epochs=10, batch_size=16, verbose=10)
When I try to fit the model and store the result in History, I get an error:
Input 0 of layer "sequential_85" is incompatible with the layer: expected shape=(None, 2750, 2), found shape=(None, 2)
I know there is a problem with my input/output shape, but I don't know how to fix it so that I can predict ['electric.appliances..kWhs.','hvac'] from ['TotalE','Outdoor_temp_F'].
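Keras recurrent layers expect 3D input of shape (samples, timesteps, features), and `input_shape` describes one sample, i.e. (timesteps, features). One possible way to get there, sketched below, is to treat every row as a sequence of length one. That choice is an assumption; a longer history per sample would need sliding windows instead. The random array is only a stand-in for the scaled `x`, and 2750 rows is taken from the error message.

```python
import numpy as np

# Stand-in for the scaled two-column input (hypothetical data).
x = np.random.default_rng(0).normal(size=(2750, 2))

# Add a timesteps axis: (samples, features) -> (samples, 1, features).
x_seq = x.reshape(x.shape[0], 1, x.shape[1])

# Shape of a single sample, which is what `input_shape` should describe.
input_shape = x_seq.shape[1:]
print(x_seq.shape, input_shape)
```

Under this assumption the first layer would become `LSTM(64, input_shape=input_shape, activation="relu")`, and `x_seq` would be passed to `train_test_split` in place of `x`.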
I am trying to build an LSTM model for cryptocurrency prediction, just for fun.
I managed to build and compile my LSTM model. However, I have not succeeded in predicting future dates.
I have checked these solutions so far:
How to use the LSTM model for multi-step forecasting?
Forecast future values with LSTM in Python
How to predict actual future values after testing the trained LSTM model?
I couldn't adapt these solutions to my code.
My dataset looks like this (simple bitcoin prices):
open,close,high,low,volume,time,date
4331.6,4354.43,4394.47,4303.29,3841.525758,1543438799,2018-11-28 23:59:59
4356.23,4243.57,4359.13,4218.79,4434.861032,1543442399,2018-11-29 00:59:59
4243.57,4236.09,4266.0,4185.01,4347.171442,1543445999,2018-11-29 01:59:59
4236.4,4264.85,4279.9,4215.8,2999.814805,1543449599,2018-11-29 02:59:59
First, I prepare and scale my data:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.optimizers import adam_v2
from keras.layers import Dense, LSTM, LeakyReLU, Dropout
data = pd.read_csv('bitcoin.csv')
price = data.filter(['close'])
min_max_scaler = MinMaxScaler()
norm_data = min_max_scaler.fit_transform(price.values)
Then I split the original data into train and test sets:
def univariate_data(dataset, start_index, end_index, history_size, target_size):
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i+target_size])
    return np.array(data), np.array(labels)
past_history = 5
future_target = 0
TRAIN_SPLIT = int(len(norm_data) * 0.75)
x_train, y_train = univariate_data(norm_data, 0, TRAIN_SPLIT, past_history, future_target)
x_test, y_test = univariate_data(norm_data, TRAIN_SPLIT, None, past_history, future_target)
Finally, I compile my model and predict:
num_units = 64
learning_rate = 0.0001
activation_function = 'sigmoid'
adam = adam_v2.Adam(learning_rate=learning_rate)
loss_function = 'mse'
batch_size = 5
num_epochs = 64
model = Sequential()
model.add(LSTM(units = num_units, activation=activation_function, input_shape=(None, 1)))
model.add(LeakyReLU(alpha=0.5))
model.add(Dropout(0.1))
model.add(Dense(units = 1))
model.compile(optimizer=adam, loss=loss_function)
history = model.fit(
x_train,
y_train,
validation_split=0.1,
batch_size=batch_size,
epochs=num_epochs,
shuffle=False
)
model.save('bitcoin.h5')
test_predict = model.predict(x_test)
train_predict = model.predict(x_train)
The result is satisfying for me. But instead of predicting on the training data, I want to use this model to predict the future (for example, the next 100 rows).
I am still learning numpy, pandas, and the other libraries used in this example.
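One common approach for a one-step model like this is recursive forecasting: predict one step, append the prediction to the window, drop the oldest value, and repeat. The sketch below shows only that loop. `forecast_recursive` and `mean_predictor` are hypothetical names, and the mean predictor merely stands in for the trained network so the example runs on its own.

```python
import numpy as np

def forecast_recursive(predict_one, last_window, n_steps):
    """Roll a one-step predictor forward, feeding each prediction back in."""
    window = list(last_window)
    forecasts = []
    for _ in range(n_steps):
        # Shape (1, history_size, 1): one sample, as the LSTM input expects.
        batch = np.array(window, dtype=float).reshape(1, -1, 1)
        next_value = float(predict_one(batch))
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts

def mean_predictor(batch):
    """Hypothetical stand-in for the trained model: predicts the window mean."""
    return batch.mean()

future = forecast_recursive(mean_predictor, [1.0, 2.0, 3.0, 4.0, 5.0], n_steps=3)
print(future)
```

With the model from the question, `predict_one` could be something like `lambda w: model.predict(w)[0, 0]` and `last_window` the last `past_history` scaled prices. The forecasts are still in the scaled range, so `min_max_scaler.inverse_transform` would be needed to get prices back. Errors accumulate with every step, so 100 steps ahead is likely to drift.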
Here is my complete code. I'm trying to predict protein classes from protein sequences.
from sklearn.preprocessing import LabelBinarizer
# Transform labels to one-hot
lb = LabelBinarizer()
Y = lb.fit_transform(df.classification)
from keras.preprocessing import text, sequence
from keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
#maximum length of sequence, everything afterwards is discarded!
max_length = 500
#create and fit tokenizer
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(seqs)
X = tokenizer.texts_to_sequences(seqs)
X = sequence.pad_sequences(X, maxlen=max_length)
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional, Conv1D
from keras.layers.convolutional import MaxPooling1D
import tensorflow as tf
from tensorflow.keras import layers
embedding_vecor_length = 128
max_length = 500
model = Sequential()
model.add(Embedding(len(tokenizer.word_index)+1, embedding_vecor_length, input_length=max_length))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Bidirectional(LSTM(64)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=.2)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=512)
This is the accuracy of the model:
train-acc = 0.8485087800799034
test-acc = 0.8203392530062913
and my prediction results are:
[9.65313017e-02 1.33084046e-04 1.73516816e-03 4.62103529e-08
8.45071673e-03 2.42734270e-04 3.54182965e-04 2.88571493e-04
1.99087553e-05 8.92244339e-01]
[8.89207274e-02 1.99566261e-04 1.76228161e-04 2.08527595e-02
1.64435953e-01 2.83987029e-03 1.53038520e-02 7.07270563e-01
5.16798650e-07 2.19354401e-08]
[9.36142087e-01 6.09822795e-02 3.55492946e-09 2.19342492e-05
5.41335670e-04 1.89031591e-04 2.66434945e-04 1.84136129e-03
1.54582867e-05 3.31551647e-10]
Any help would be appreciated. I'm stuck and don't know how to solve it. Also, I'm fairly new to deep learning.
As you can see, your last layer uses the softmax activation function:
model.add(Dense(10, activation='softmax'))
So when you predict, the values pass through the softmax function in the last layer, which gives you those strange-looking float values.
The softmax function normalizes its input values so that each component lies in the range (0, 1) and all components add up to 1. You can read more about the softmax function here: https://en.wikipedia.org/wiki/Softmax_function.
To find the predicted label id, find the index of the maximum value in each array; that index is the label id the prediction points to.
You can use the numpy argmax function to find the index of the maximum value in multidimensional arrays. See: https://numpy.org/doc/stable/reference/generated/numpy.argmax.html
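As a small illustration, here is `np.argmax` applied to the three prediction rows quoted in the question. The variable names `predictions` and `class_ids` are chosen for this example only.

```python
import numpy as np

# The three softmax outputs copied from the question.
predictions = np.array([
    [9.65313017e-02, 1.33084046e-04, 1.73516816e-03, 4.62103529e-08,
     8.45071673e-03, 2.42734270e-04, 3.54182965e-04, 2.88571493e-04,
     1.99087553e-05, 8.92244339e-01],
    [8.89207274e-02, 1.99566261e-04, 1.76228161e-04, 2.08527595e-02,
     1.64435953e-01, 2.83987029e-03, 1.53038520e-02, 7.07270563e-01,
     5.16798650e-07, 2.19354401e-08],
    [9.36142087e-01, 6.09822795e-02, 3.55492946e-09, 2.19342492e-05,
     5.41335670e-04, 1.89031591e-04, 2.66434945e-04, 1.84136129e-03,
     1.54582867e-05, 3.31551647e-10],
])

# axis=1 takes the argmax within each row, i.e. one label id per sample.
class_ids = np.argmax(predictions, axis=1)
print(class_ids)  # [9 7 0]
```

Since the labels in the question were encoded with `lb = LabelBinarizer()`, indexing `lb.classes_[class_ids]` should map these ids back to the original class names.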
After a few days in which my code was reproducible every time, it no longer is! I don't know what happened. I only changed a few lines of code, and I don't know how to fix it.
# Make the code reproducible
import numpy as np
import os
import random as rn
import tensorflow as tf
import keras
from keras import backend as K
#-----------------------------Keras reproducible------------------#
SEED = 1234
tf.set_random_seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED)
np.random.seed(SEED)
rn.seed(SEED)
session_conf = tf.ConfigProto(
intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1
)
sess = tf.Session(
graph=tf.get_default_graph(),
config=session_conf
)
K.set_session(sess)
# Import datasets (training and test)
import pandas as pd
poker_train = pd.read_csv("C:/Users/elihe/Documents/Studium Master/WS 19 und 20/Softwareprojekt/poker-hand-training-true.data")
poker_test = pd.read_csv("C:/Users/elihe/Documents/Studium Master/WS 19 und 20/Softwareprojekt/poker-hand-testing.data")
X_tr = poker_train.iloc[:, 0:10]
y_train = poker_train.iloc[:, 10:11]
X_te = poker_test.iloc[:, 0:10]
y_test = poker_test.iloc[:, 10:11]
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_tr)
X_test = sc.transform(X_te)
# Build the NN with Keras
import keras
from keras.models import Sequential
from keras.layers import Dense
nen = Sequential()
nen.add(Dense(100, input_dim = 10, activation = 'relu'))
nen.add(Dense(50, activation = 'relu'))
nen.add(Dense(10, activation = 'softmax'))
# Compile
from keras import metrics
nen.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
nen_fit = nen.fit(X_train, y_train,epochs=2, batch_size=50, verbose=1, validation_split = 0.2, shuffle = False)
I used just 2 epochs so that I can see immediately whether my output is the same; normally there would be 500 epochs. Until today, the first lines of code made it reproducible, but now it is not! I changed the part with X_te and X_tr, because at first I one-hot encoded the classes y_train and y_test, but now I no longer do. I also changed the activation functions from sigmoid to relu and the optimizer from RMSprop to adam. I don't know what to do :(
I have this CSV file, where I'm trying to predict the Histology based on the data in the other columns.
I have the code shown below to do that. However, all my predictions come out as 1. Why is that? The accuracy I get after training the model is 86.81%.
import numpy as np
import pandas as pd
from keras.layers import Dense, Dropout, BatchNormalization, Activation
import keras.models as md
import keras.layers.core as core
import keras.utils.np_utils as kutils
import keras.layers.convolutional as conv
from keras.layers import MaxPool2D
from subprocess import check_output
dataset = pd.read_csv('mutation-train.csv')
dataset = dataset[['CDS_Mutation',
'Primary_Tissue',
'Genomic',
'Gene_ID',
'Official_Symbol',
'Histology']]
X = dataset.iloc[:,0:5].values
y = dataset.iloc[:,5].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2= LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
labelencoder_X_4= LabelEncoder()
X[:, 4] = labelencoder_X_4.fit_transform(X[:, 4])
X = X.astype(float)
labelencoder_y= LabelEncoder()
y = labelencoder_y.fit_transform(y)
onehotencoder0 = OneHotEncoder(categorical_features = [0])
X = onehotencoder0.fit_transform(X).toarray()
X = X[:,0:]
onehotencoder1 = OneHotEncoder(categorical_features = [1])
X = onehotencoder1.fit_transform(X).toarray()
X = X[:,0:]
onehotencoder2 = OneHotEncoder(categorical_features = [2])
X = onehotencoder2.fit_transform(X).toarray()
X = X[:,0:]
onehotencoder4 = OneHotEncoder(categorical_features = [4])
X = onehotencoder4.fit_transform(X).toarray()
X = X[:,0:]
# Splitting the dataset training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Evaluating the ANN
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
model=Sequential()
model.add(Dense(32, activation = 'relu', input_shape=(X.shape[1],)))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ["accuracy"])
# Compile model
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Fit the model
model.fit(X,y, epochs=3, batch_size=1)
# Evaluate the model
scores = model.evaluate(X,y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# Calculate predictions
predictions = model.predict(X)
prediction = pd.DataFrame(predictions,columns=['predictions']).to_csv('prediction.csv')
Thanks.
Since you get 86.81% accuracy while every prediction is 1, your data is most likely imbalanced: one class dominates the other in your training set.
In that case, predicting 1 for every sample still yields a high accuracy.
This is known as the accuracy paradox.
E.g. if around 85% of the samples in your dataset belong to class 1 and the rest to class 0, always predicting 1 gives about 85% accuracy.
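To make the arithmetic concrete, here is a small sketch with made-up labels in an 85/15 split (an assumption for illustration, not the asker's actual data). A classifier that always outputs 1 scores 85% accuracy while never identifying a single class 0 sample.

```python
# Hypothetical labels: 85 samples of class 1, 15 samples of class 0.
y_true = [1] * 85 + [0] * 15

# A "model" that predicts class 1 for everything.
y_pred = [1] * len(y_true)

# Fraction of samples where prediction and label agree.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for class 0: how many true class 0 samples were found.
found_class0 = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
recall_class0 = found_class0 / y_true.count(0)

print(accuracy, recall_class0)  # 0.85 0.0
```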
How to deal with it
There are plenty of ways to deal with it.
Upsampling: Duplicate samples from class 0 so that class 1 and class 0 appear in the same proportion.
Downsampling: Remove some samples from class 1 to get the same proportion.
Change the performance metric: Rather than accuracy, use the F1 score, precision, or recall.
Class weighting: Assign different penalties to different classes for a mistake. In this case, give a higher weight to the class with fewer samples.
There are more ways to deal with it.
Refer to this link for more details.
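As one possible sketch of the class-weighting idea, the helper below computes weights inversely proportional to class frequency, using the common heuristic n_samples / (n_classes * class_count). The function name `balanced_class_weights` and the 85/15 labels are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to how often it appears."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced labels: 85 of class 1, 15 of class 0.
weights = balanced_class_weights([1] * 85 + [0] * 15)
print(weights)
```

The rare class 0 gets the larger weight. In Keras, a dictionary like this can be passed to training via `model.fit(..., class_weight=weights)`, so that mistakes on the minority class contribute more to the loss.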