PyTorch LSTM predicts too-low values - python

I followed the guide at https://stackabuse.com/time-series-prediction-using-lstm-with-pytorch-in-python/ to set up an LSTM model for predictions using torch.
After importing the needed modules (torch, torch.nn, functional, optim, etc.), I set up my index list with quantities for 52 weeks:
idxlist = []
for x in range(0, len(inputslist)):
    idxlist.append(a - datetime.timedelta(weeks=x))

in_data = pd.DataFrame(columns=['date', 'qty'])
in_data.date = idxlist
in_data.qty = inputslist
all_data = in_data['qty'].values.astype(float)
Then, I defined the test size and set up a scaler to normalize data:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1,1))
train_data_normalized = scaler.fit_transform(train_data.reshape(-1,1))
train_data_normalized = torch.FloatTensor(train_data_normalized).view(-1)
#train window
train_window = 12
Then I created sequences:
def create_inout_sequences(input_data, tw):
    inout_seq = []
    L = len(input_data)      ## 40
    for i in range(L - tw):  ## 32 (if tw == 8)
        train_seq = input_data[i:i+tw]
        train_label = input_data[i+tw:i+tw+1]
        inout_seq.append((train_seq, train_label))
    return inout_seq
train_inout_seq = create_inout_sequences(train_data_normalized, train_window)
print(len(train_inout_seq)) #len(train) - train_w
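For illustration (my addition, not from the original post): each element of train_inout_seq is a (sequence, label) pair. A toy run of the same function:
import torch
toy = torch.arange(6, dtype=torch.float)   # tensor([0., 1., 2., 3., 4., 5.])
pairs = create_inout_sequences(toy, 3)     # L = 6, tw = 3 -> 3 pairs
print(pairs[0])   # (tensor([0., 1., 2.]), tensor([3.]))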
Then I set up the Net class for the LSTM model. After training with model = LSTM() and optim.Adam, and switching to model.eval(), I obtain the inverse transform with:
actual_predictions = scaler.inverse_transform(np.array(test_inputs[train_window:]).reshape(-1, 1))
However, when I plot the predicted values of the last 12 weeks against the actual values, I obtain:
[Graph: predicted vs. actual values for the last 12 weeks]
Why is the prediction so far below the actual values, even though it follows the same periodic pattern?

Related

How to get a TensorFlow RNN to respect time

I am trying to set up an RNN to solve a Rubik's cube from any legitimate scramble. I have generated the data the network is being trained on, so I know that it is correct. I believe that I have set up both the data pipeline and the network correctly, but I have included them below just in case. I am using tensorflow version 2.8.0, but I am willing to change versions.
I set up the data pipeline as such:
def process_path(file_path):
    data_out = tf.py_function(
        func=lambda path: np.load(path.numpy().decode("utf-8")),
        inp=[file_path],
        Tout=tf.float32)
    feature = data_out[0, :, :]
    feature = tf.expand_dims(feature, axis=0)
    feature = tf.expand_dims(feature, axis=-1)
    label = data_out[1:, :, :]
    label = tf.expand_dims(label, axis=-1)
    return feature / 6, label / 6
r = 2
list_ds = tf.data.Dataset.list_files(str('../data/*r_'+str(r)+'_*.npy'),shuffle=False)
labeled_ds = list_ds.map(process_path)
data_dirpathname = '../data/'
data_dir = pathlib.Path(data_dirpathname)
data_count = len(list(data_dir.glob('*r_'+str(r)+'_*.npy')))
print(data_count)
val_size = int(data_count * splt)
train_ds = list_ds.skip(val_size)
val_ds = list_ds.take(val_size)
train_ds = train_ds.map(process_path)
val_ds = val_ds.map(process_path)
batch_size = 300
def configure_for_performance(ds):
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds
train_ds = configure_for_performance(train_ds)
val_ds = configure_for_performance(val_ds)
I then test that the features and labels are built correctly by plotting the cross display of the cube, using a personal function called plotCube that maps each value to the right color.
for feature, label in labeled_ds.take(1):
    plt.figure(figsize=(90, 30))
    print(feature.shape)
    one = feature[0, :, :, 0]
    ax = plt.subplot(131)
    plotCube(one * 6, ax)
    print(label.shape)
    for i in range(label.shape[0]):
        one = label[i, :, :, 0]
        ax = plt.subplot(1, 3, i + 2)
        plotCube(one * 6, ax)
[Plot output: the scramble (feature) and the solve states (labels) rendered as cube crosses]
The output of the plot shows that the feature is the scramble of the cube and the labels are the states taken to solve it, in order. This confirms that the data is set up correctly.
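As an extra sanity check, the shapes of one pair can also be printed directly; a minimal sketch (my addition), assuming the (1, 9, 12, 1) feature and (r, 9, 12, 1) label layout implied by the model input below:
# Hypothetical shape check before batching
for feature, label in labeled_ds.take(1):
    print(feature.shape)  # expected (1, 9, 12, 1): the scramble state
    print(label.shape)    # expected (r, 9, 12, 1): the r solve states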
The network was made using the following code:
inp = layers.Input(shape=(1, 9, 12, 1))
x = layers.Rescaling(1./6)(inp)
x = layers.ConvLSTM2D(filters=64, kernel_size=(5, 5), padding="same",
                      return_sequences=True, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.ConvLSTM2D(filters=64, kernel_size=(3, 3), padding="same",
                      return_sequences=True, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.ConvLSTM2D(filters=64, kernel_size=(1, 1), padding="same",
                      return_sequences=True, activation="relu")(x)
x = layers.Conv3D(filters=1, kernel_size=(3, 3, 3), padding="same",
                  activation="sigmoid")(x)
model = keras.models.Model(inp, x)
model.compile(loss=keras.losses.binary_crossentropy,
              optimizer=keras.optimizers.Adam())
After the model is trained on 7500 data sets, the prediction looks as if time is not being respected and the average of the moves is being returned. When I test the model on a data set it has never seen before, I get the result below. The left column is the data set and the right column is what the model predicts. The model is given the first cube cross from the data set. Each row represents the current state, so the top row is the first state and the bottom row is the last state.
frames_prediction = model.predict(
    np.expand_dims(np.expand_dims(data[0, 0, :, :, :], axis=0), axis=0))
print(frames_prediction.shape)
fig = plt.figure(figsize=(10, data.shape[1] * 10))
for i in range(data.shape[1]):
    ax = plt.subplot(data.shape[1] + 4, 2, i * 2 + 1)
    plotCube(np.round(data[0, i, :, :, :] * 6), ax)
    if i < (data.shape[1] - 1):
        ax = plt.subplot(data.shape[1] + 4, 2, i * 2 + 4)
        plotCube(np.round(frames_prediction[0, i, :, :, 0] * 6), ax)
The output is clearly just the average of the second and third states. I have also trained this exact model architecture on 20 moves, 10 moves, and 5 moves; all of those models give worse results than the 2-move model shown. Is there any way to force the network to respect the order it is trained on, so that the output at each state can be unique?

RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 421 but got size 71 for tensor number 1 in the list

I am trying to follow this tutorial. When I try to predict, I get the runtime error below on this line:
# compute N predictions
pred = model_RNN.predict(n=FC_N, future_covariates=covariates)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 421 but got size 71 for tensor number 1 in the list.
Which list is it referring to? My training data size is 421 and my validation size is 105, so I am not sure what the 71 refers to. (A quick length check is sketched after the code below.)
Below is the full code:
## load data
from darts import TimeSeries

df = df[['close']]
ts = TimeSeries.from_dataframe(df, freq='b')
#ts = pd.Series(df['close'], index=df.index)
series = df
plt.figure(100, figsize=(12, 5))
series.plot()

# analyze its seasonality
is_seasonal, periodicity = check_seasonality(ts, max_lag=240)
dict_seas = {
    "is seasonal?": is_seasonal,
    "periodicity (months)": f'{periodicity:.1f}',
    "periodicity (~years)": f'{periodicity/12:.1f}'}
_ = [print(k, ":", v) for k, v in dict_seas.items()]

# split training vs test dataset
train, val = ts.split_after(0.8)

# normalize the time series
trf = Scaler()
# fit the transformer to the training dataset
train_trf = trf.fit_transform(train)
# apply the transformer to the validation set and the complete series
val_trf = trf.transform(val)
ts_trf = trf.transform(ts)

# create month and year covariate series
year_series = datetime_attribute_timeseries(
    pd.date_range(start=ts.start_time(),
                  freq=ts.freq_str,
                  periods=len(ts)),
    attribute='year',
    one_hot=False)
year_series = Scaler().fit_transform(year_series)
month_series = datetime_attribute_timeseries(
    year_series,
    attribute='month',
    one_hot=True)
covariates = year_series.stack(month_series)
cov_train, cov_val = covariates.split_after(0.8)

# helper function: fit the RNN model
def fit_it(model, train, val, flavor):
    t_start = time.perf_counter()
    print("\nbeginning the training of the {0} RNN:".format(flavor))
    res = model.fit(train,
                    future_covariates=covariates,
                    val_series=val,
                    val_future_covariates=covariates,
                    verbose=True)
    res_time = time.perf_counter() - t_start
    print("training of the {0} RNN has completed:".format(flavor), f'{res_time:.2f} sec')
    return res

# set up, fit, run, plot, and evaluate the RNN model
def run_RNN(flavor, ts, train, val):
    # set the model up
    model_RNN = RNNModel(
        model=flavor,
        model_name=flavor + " RNN",
        input_chunk_length=periodicity,
        training_length=20,
        hidden_dim=20,
        batch_size=16,
        n_epochs=EPOCH,
        dropout=0,
        optimizer_kwargs={'lr': 1e-3},
        log_tensorboard=True,
        random_state=42,
        force_reset=True)
    if flavor == "RNN":
        flavor = "Vanilla"
    # fit the model
    fit_it(model_RNN, train, val, flavor)
    # compute N predictions
    pred = model_RNN.predict(n=FC_N, future_covariates=covariates)
    # plot predictions vs actual
    plot_fitted(pred, ts, flavor)
    # print accuracy metrics
    res_acc = accuracy_metrics(pred, ts)
    print(flavor + " : ")
    _ = [print(k, ":", f'{v:.4f}') for k, v in res_acc.items()]
    return [pred, res_acc]

# run 3 different flavors of RNN on the time series:
flavors = ["LSTM", "GRU", "RNN"]
# call the RNN model setup for each of the 3 RNN flavors
res_flavors = [run_RNN(flv, ts_trf, train_trf, val_trf) for flv in flavors]
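Since the error is raised while concatenating tensors inside darts, a first diagnostic step (my addition; the variable names come from the code above) is to compare the lengths of every series passed to fit() and predict(), to see where the 421 and the 71 could come from:
# Diagnostic sketch: print the length of each series involved
for name, s in [("ts", ts), ("train_trf", train_trf),
                ("val_trf", val_trf), ("covariates", covariates)]:
    print(name, len(s))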

Python LSTM Bitcoin prediction flatlines

I'm currently trying to build a "simple" LSTM model that takes historical Bitcoin data, learns from it, and then tries to predict the future X steps in advance.
I've built it on the idea that A + B + C = D, so B + C + D should be E. (I think that's a very simple idea behind an LSTM model. I might be wrong; however, I'm pretty new to this.)
I managed to build the basics in Python (I'm fairly new to Python), but something seems off with the prediction. For some reason, many of the predictions I test or make end up flatlining. I have a theory on why, but I have no idea if it's correct, and even less idea of how to solve it.
My theory is that, within a sequence, the model learns to put more importance/weight on the last value, because with Bitcoin prices the future price (in 1 minute) is probably pretty close to the price now. That's why the predicted values keep getting closer to the real value, eventually becoming equal and thus flatlining in a graph. (I don't know if that makes sense, but that's what I thought anyway.)
I've also added a screenshot of my graph from a few days ago. Almost all predictions end up similar to this graph; this is just a more extreme example as a demonstration.
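One way to test that theory (my addition, not in the original code) is to score a naive persistence baseline that predicts each value as the previous one, using the sqrt and mean_squared_error imports from the code below; if the LSTM's RMSE is close to this baseline, the model has effectively learned to copy the last input:
def persistence_rmse(actual):
    # Predict y[t] = y[t-1] and score it against the real series,
    # e.g. print(persistence_rmse(testY_inverse)) after show_result.
    return sqrt(mean_squared_error(actual[1:], actual[:-1]))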
Here is my code. Can someone please explain why it flatlines and what I did wrong?
import numpy as np
from matplotlib import pyplot
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import yfinance as yf
from math import sqrt
from sklearn.metrics import mean_squared_error
# Create output sets X + Y from a given input set
# with inputset : a 1-dimensional list of floats
# with n : the number of lookback values to use for X
# with gap : the number of points skipped between X and Y
# Y: is equal to the input (although the first N are missing)
# X: for each y of Y a corresponding set of size N is created,
#    composed of the N values preceding y.
def create_lookback(inputset, n=1, gap=0):
    print("create_lookback with n=%d gap=%d" % (n, gap))
    print(" - length of inputset = %d" % len(inputset))
    dataX, dataY = [], []
    for i in range(len(inputset) - (n + gap)):
        a = inputset[i:(i + n), 0]
        dataX.append(a)
        dataY.append(inputset[i + n + gap, 0])
    print(" - length of dataY = %d" % len(dataY))
    data_x = np.array(dataX)
    # reshape to (samples, timesteps=1, features=n) for the LSTM
    xret = data_x.reshape(data_x.shape[0], 1, data_x.shape[1])
    return xret, np.array(dataY)
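# Toy check (my addition, not in the original post): shapes that
# create_lookback produces for 10 points, lookback n=3, gap=0:
#   demo = np.arange(10, dtype=float).reshape(-1, 1)
#   dx, dy = create_lookback(demo, n=3, gap=0)
#   dx.shape -> (7, 1, 3)   (samples, timesteps=1, features=n)
#   dy.shape -> (7,)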
# Train model based on given training set + test set
def create_model(trainX, trainY, testX, testY):
    model = Sequential()
    model.add(LSTM(units=100, input_shape=(trainX.shape[1], trainX.shape[2])))
    model.add(Dropout(0.2))
    #model.add(LSTM(30, return_sequences=True))
    #model.add(Dropout(0.1))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')
    history = model.fit(trainX, trainY, epochs=100, batch_size=5,
                        validation_data=(testX, testY), verbose=1, shuffle=False)
    return model
# Evaluate a given X / Y set:
# - calculate RMSE
# - generate a visual line plot on screen
def show_result(scaler, yhat, setY, txt):
    print("Show %s result" % txt)
    yhat_inverse = scaler.inverse_transform(yhat.reshape(-1, 1))
    testY_inverse = scaler.inverse_transform(setY.reshape(-1, 1))
    if len(testY_inverse) == len(yhat_inverse):
        rmse = sqrt(mean_squared_error(testY_inverse, yhat_inverse))
        print(' RMSE %s : %.3f' % (txt, rmse))
    pyplot.plot(yhat_inverse, label='predict ' + txt)
    pyplot.plot(testY_inverse, label='actual ' + txt, alpha=0.5)
    pyplot.legend()
    pyplot.show()
# "Extrapoleer" is Dutch for "extrapolate": feed the model its own
# predictions to forecast i steps into the future.
def extrapoleer(i, model, tup, toekomst):
    if i == 0:
        return
    setX = np.array([[tup]])
    y = model.predict(setX)
    y_float = y[0][0]
    tup_new = np.append(tup[1:], y_float)
    toekomst.append(y_float)
    extrapoleer(i - 1, model, tup_new, toekomst)
# --- end of defined functions
# --- start of main flow
data_grid_1 = yf.download('BTC-USD', start="2021-04-14", end="2021-04-15", interval="1m")
data_grid_2 = yf.download('BTC-USD', period="12h", interval="1m")
dataset_1 = data_grid_1.iloc[:, 1:2].values
dataset_2 = data_grid_2.iloc[:, 1:2].values
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(dataset_1)
# 70% of dataset_1 is used to train; 30% to test
train_size = int(len(scaled) * 0.7)
test_size = len(scaled) - train_size
train, test = scaled[0:train_size, :], scaled[train_size:len(scaled), :]
print("train: %d test: %d" % (len(train), len(test)))
scaled_2 = scaler.fit_transform(dataset_2)
look_back_n = 3
look_back_gap = 0
trainX, trainY = create_lookback(train, look_back_n, look_back_gap)
testX, testY = create_lookback(test, look_back_n, look_back_gap)
testX_2, testY_2 = create_lookback(scaled_2, look_back_n, look_back_gap)
model = create_model(trainX, trainY, testX, testY)
yhat_1 = model.predict(testX)
yhat_2 = model.predict(testX_2)
show_result(scaler, yhat_1, testY, "test")
show_result(scaler, yhat_2, testY_2, "test2")
last_n = testY_2[-look_back_n:]
# "toekomst" is Dutch for "future"
toekomst = []
# "aantal" is Dutch for "amount"; this is the number of steps to predict into the future
aantal = 30
extrapoleer(aantal, model, last_n, toekomst)
print("Result of %d predicted points in the future:" % aantal)
print(toekomst)
yhat_2_plus = np.append(yhat_2, toekomst)
show_result(scaler, yhat_2_plus, testY_2, "test2-plus")

Error when checking input: expected input_12 to have 4 dimensions, but got array with shape (None, None, None, None, None)

I am trying to classify 24 RGB images belonging to 2 classes. Each image was originally 400 x 400, but has been resized to 32 x 32 in the code. I am using the metric-learning-for-image-similarity-search algorithm. However, I get the error "Error when checking input..." when I run the line history = model.fit(AnchorPositivePairs(num_batchs=2), epochs=20) at the end of the program. What could be causing this error?
Here is my code!
import random
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from collections import defaultdict
from PIL import Image
from sklearn.metrics import ConfusionMatrixDisplay
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import utils
import glob
import os
import tqdm
IMG_DIR = "C:/Temp2/AGAIN_4/" # load the images from one directory
IM_WIDTH = 32
IM_HEIGHT = 32
#batch_size = 2
num_classes = 2
#epochs = 15
def read_images(directory, resize_to=(32, 32)):  # extract image labels
    """
    Reads images and labels from the given directory
    :param directory: directory from which to read the files
    :param resize_to: a tuple of width, height to resize the images
    :returns: a tuple of list of images and labels
    """
    files = glob.glob(directory + "*.jpg")
    images = []
    labels = []
    for f in tqdm.tqdm_notebook(files):
        im = Image.open(f)
        im = im.resize(resize_to)
        im = np.array(im) / 255.0
        im = im.astype("float32")
        images.append(im)
        label = 1 if 'microwave' in f.lower() else 0
        labels.append(label)
    return np.array(images), np.array(labels)
x, y = read_images(directory=IMG_DIR, resize_to=(IM_WIDTH, IM_HEIGHT))
# make sure we have 25000 images if we are reading the full data set.
# Change the number accordingly if you have created a subset
assert len(x) == len(y) == 24 #25000
from sklearn.model_selection import train_test_split # extract train and test data
x_train, x_test, y_train, y_test =train_test_split(x, y, test_size=0.25)
# remove X and y since we don't need them anymore
# otherwise it will just use the memory
del x
del y
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# Display some of the images
height_width = 32
def show_collage(examples):
    box_size = height_width + 2
    num_rows, num_cols = examples.shape[:2]
    collage = Image.new(
        mode="RGB",
        size=(num_cols * box_size, num_rows * box_size),
        color=(250, 250, 250),
    )
    for row_idx in range(num_rows):
        for col_idx in range(num_cols):
            array = (np.array(examples[row_idx, col_idx]) * 255).astype(np.uint8)
            collage.paste(
                Image.fromarray(array), (col_idx * box_size, row_idx * box_size)
            )
    # Double size for visualisation.
    collage = collage.resize((2 * num_cols * box_size, 2 * num_rows * box_size))
    return collage
# Show a collage of 3x3 random images.
sample_idxs = np.random.randint(0, 15, size=(3, 3))
examples = x_train[sample_idxs]
show_collage(examples) # Displays 9 images
class_idx_to_train_idxs = defaultdict(list)
for y_train_idx, y in enumerate(y_train):
    class_idx_to_train_idxs[y].append(y_train_idx)

class_idx_to_test_idxs = defaultdict(list)
for y_test_idx, y in enumerate(y_test):
    class_idx_to_test_idxs[y].append(y_test_idx)
num_classes = 2
class AnchorPositivePairs(keras.utils.Sequence):
    def __init__(self, num_batchs):
        self.num_batchs = num_batchs

    def __len__(self):
        return self.num_batchs

    def __getitem__(self, _idx):
        x = np.empty((2, num_classes, height_width, height_width, 3), dtype=np.float32)
        for class_idx in range(num_classes):
            examples_for_class = class_idx_to_train_idxs[class_idx]
            anchor_idx = random.choice(examples_for_class)
            positive_idx = random.choice(examples_for_class)
            while positive_idx == anchor_idx:
                positive_idx = random.choice(examples_for_class)
            x[0, class_idx] = x_train[anchor_idx]
            x[1, class_idx] = x_train[positive_idx]
        return x
examples = next(iter(AnchorPositivePairs(num_batchs=1)))
show_collage(examples)
class EmbeddingModel(keras.Model):
    def train_step(self, data):
        # Note: Workaround for open issue, to be removed.
        if isinstance(data, tuple):
            data = data[0]
        anchors, positives = data[0], data[1]
        with tf.GradientTape() as tape:
            # Run both anchors and positives through model.
            anchor_embeddings = self(anchors, training=True)
            positive_embeddings = self(positives, training=True)
            # Calculate cosine similarity between anchors and positives. As they have
            # been normalised this is just the pair wise dot products.
            similarities = tf.einsum(
                "ae,pe->ap", anchor_embeddings, positive_embeddings
            )
            # Since we intend to use these as logits we scale them by a temperature.
            # This value would normally be chosen as a hyper parameter.
            temperature = 0.2
            similarities /= temperature
            # We use these similarities as logits for a softmax. The labels for
            # this call are just the sequence [0, 1, 2, ..., num_classes] since we
            # want the main diagonal values, which correspond to the anchor/positive
            # pairs, to be high. This loss will move embeddings for the
            # anchor/positive pairs together and move all other pairs apart.
            sparse_labels = tf.range(num_classes)
            loss = self.compiled_loss(sparse_labels, similarities)
        # Calculate gradients and apply via optimizer.
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        # Update and return metrics (specifically the one for the loss value).
        self.compiled_metrics.update_state(sparse_labels, similarities)
        return {m.name: m.result() for m in self.metrics}
inputs = layers.Input(shape=(height_width, height_width, 3))
#inputs = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(filters=32, kernel_size=3, strides=2, activation="relu")(inputs)
x = layers.Conv2D(filters=64, kernel_size=3, strides=2, activation="relu")(x)
x = layers.Conv2D(filters=128, kernel_size=3, strides=2, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
embeddings = layers.Dense(units=8, activation=None)(x)
embeddings = tf.nn.l2_normalize(embeddings, axis=-1)
model = EmbeddingModel(inputs, embeddings)
model.summary()
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
history = model.fit(AnchorPositivePairs(num_batchs=2), epochs=20)
plt.plot(history.history["loss"])
plt.show()
I have also used the cifar10 dataset as input instead of my local directory images, as shown in the next code block, but I still get the same error.
import random
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from collections import defaultdict
from PIL import Image
from sklearn.metrics import ConfusionMatrixDisplay
from tensorflow import keras
from tensorflow.keras import layers
"""
## Dataset
For this example we will be using the
[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.
"""
from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
y_train = np.squeeze(y_train)
x_test = x_test.astype("float32") / 255.0
y_test = np.squeeze(y_test)
"""
To get a sense of the dataset we can visualise a grid of 25 random examples.
"""
height_width = 32
def show_collage(examples):
    box_size = height_width + 2
    num_rows, num_cols = examples.shape[:2]
    collage = Image.new(
        mode="RGB",
        size=(num_cols * box_size, num_rows * box_size),
        color=(250, 250, 250),
    )
    for row_idx in range(num_rows):
        for col_idx in range(num_cols):
            array = (np.array(examples[row_idx, col_idx]) * 255).astype(np.uint8)
            collage.paste(
                Image.fromarray(array), (col_idx * box_size, row_idx * box_size)
            )
    # Double size for visualisation.
    collage = collage.resize((2 * num_cols * box_size, 2 * num_rows * box_size))
    return collage
# Show a collage of 5x5 random images.
sample_idxs = np.random.randint(0, 50000, size=(5, 5))
examples = x_train[sample_idxs]
show_collage(examples)
"""
Metric learning provides training data not as explicit `(X, y)` pairs but instead uses
multiple instances that are related in the way we want to express similarity. In our
example we will use instances of the same class to represent similarity; a single
training instance will not be one image, but a pair of images of the same class. When
referring to the images in this pair we'll use the common metric learning names of the
`anchor` (a randomly chosen image) and the `positive` (another randomly chosen image of
the same class).
To facilitate this we need to build a form of lookup that maps from classes to the
instances of that class. When generating data for training we will sample from this
lookup.
"""
class_idx_to_train_idxs = defaultdict(list)
for y_train_idx, y in enumerate(y_train):
    class_idx_to_train_idxs[y].append(y_train_idx)

class_idx_to_test_idxs = defaultdict(list)
for y_test_idx, y in enumerate(y_test):
    class_idx_to_test_idxs[y].append(y_test_idx)
"""
For this example we are using the simplest approach to training; a batch will consist of
`(anchor, positive)` pairs spread across the classes. The goal of learning will be to
move the anchor and positive pairs closer together and further away from other instances
in the batch. In this case the batch size will be dictated by the number of classes; for
CIFAR-10 this is 10.
"""
num_classes = 10
class AnchorPositivePairs(keras.utils.Sequence):
    def __init__(self, num_batchs):
        self.num_batchs = num_batchs

    def __len__(self):
        return self.num_batchs

    def __getitem__(self, _idx):
        x = np.empty((2, num_classes, height_width, height_width, 3), dtype=np.float32)
        for class_idx in range(num_classes):
            examples_for_class = class_idx_to_train_idxs[class_idx]
            anchor_idx = random.choice(examples_for_class)
            positive_idx = random.choice(examples_for_class)
            while positive_idx == anchor_idx:
                positive_idx = random.choice(examples_for_class)
            x[0, class_idx] = x_train[anchor_idx]
            x[1, class_idx] = x_train[positive_idx]
        return x
"""
We can visualise a batch in another collage. The top row shows randomly chosen anchors
from the 10 classes, the bottom row shows the corresponding 10 positives.
"""
examples = next(iter(AnchorPositivePairs(num_batchs=1)))
show_collage(examples)
"""
## Embedding model
We define a custom model with a `train_step` that first embeds both anchors and positives
and then uses their pairwise dot products as logits for a softmax.
"""
class EmbeddingModel(keras.Model):
    def train_step(self, data):
        # Note: Workaround for open issue, to be removed.
        if isinstance(data, tuple):
            data = data[0]
        anchors, positives = data[0], data[1]
        with tf.GradientTape() as tape:
            # Run both anchors and positives through model.
            anchor_embeddings = self(anchors, training=True)
            positive_embeddings = self(positives, training=True)
            # Calculate cosine similarity between anchors and positives. As they have
            # been normalised this is just the pair wise dot products.
            similarities = tf.einsum(
                "ae,pe->ap", anchor_embeddings, positive_embeddings
            )
            # Since we intend to use these as logits we scale them by a temperature.
            # This value would normally be chosen as a hyper parameter.
            temperature = 0.2
            similarities /= temperature
            # We use these similarities as logits for a softmax. The labels for
            # this call are just the sequence [0, 1, 2, ..., num_classes] since we
            # want the main diagonal values, which correspond to the anchor/positive
            # pairs, to be high. This loss will move embeddings for the
            # anchor/positive pairs together and move all other pairs apart.
            sparse_labels = tf.range(num_classes)
            loss = self.compiled_loss(sparse_labels, similarities)
        # Calculate gradients and apply via optimizer.
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        # Update and return metrics (specifically the one for the loss value).
        self.compiled_metrics.update_state(sparse_labels, similarities)
        return {m.name: m.result() for m in self.metrics}
"""
Next we describe the architecture that maps from an image to an embedding. This model
simply consists of a sequence of 2d convolutions followed by global pooling with a final
linear projection to an embedding space. As is common in metric learning we normalise the
embeddings so that we can use simple dot products to measure similarity. For simplicity
this model is intentionally small.
"""
inputs = layers.Input(shape=(height_width, height_width, 3))
x = layers.Conv2D(filters=32, kernel_size=3, strides=2, activation="relu")(inputs)
x = layers.Conv2D(filters=64, kernel_size=3, strides=2, activation="relu")(x)
x = layers.Conv2D(filters=128, kernel_size=3, strides=2, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
embeddings = layers.Dense(units=8, activation=None)(x)
embeddings = tf.nn.l2_normalize(embeddings, axis=-1)
model = EmbeddingModel(inputs, embeddings)
"""
Finally we run the training. On a Google Colab GPU instance this takes about a minute.
"""
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
history = model.fit(AnchorPositivePairs(num_batchs=1000), epochs=20)
plt.plot(history.history["loss"])
plt.show()
"""
## Testing
We can review the quality of this model by applying it to the test set and considering
near neighbours in the embedding space.
First we embed the test set and calculate all near neighbours. Recall that since the
embeddings are unit length we can calculate cosine similarity via dot products.
"""
near_neighbours_per_example = 10
embeddings = model.predict(x_test)
gram_matrix = np.einsum("ae,be->ab", embeddings, embeddings)
near_neighbours = np.argsort(gram_matrix.T)[:, -(near_neighbours_per_example + 1) :]
"""
As a visual check of these embeddings we can build a collage of the near neighbours for 5
random examples. The first column of the image below is a randomly selected image, the
following 10 columns show the nearest neighbours in order of similarity.
"""
num_collage_examples = 5
examples = np.empty(
    (
        num_collage_examples,
        near_neighbours_per_example + 1,
        height_width,
        height_width,
        3,
    ),
    dtype=np.float32,
)
for row_idx in range(num_collage_examples):
    examples[row_idx, 0] = x_test[row_idx]
    anchor_near_neighbours = reversed(near_neighbours[row_idx][:-1])
    for col_idx, nn_idx in enumerate(anchor_near_neighbours):
        examples[row_idx, col_idx + 1] = x_test[nn_idx]
show_collage(examples)
"""
We can also get a quantified view of the performance by considering the correctness of
near neighbours in terms of a confusion matrix.
Let us sample 10 examples from each of the 10 classes and consider their near neighbours
as a form of prediction; that is, does the example and its near neighbours share the same
class?
We observe that each animal class does generally well, and is confused the most with the
other animal classes. The vehicle classes follow the same pattern.
"""
confusion_matrix = np.zeros((num_classes, num_classes))
# For each class.
for class_idx in range(num_classes):
    # Consider 10 examples.
    example_idxs = class_idx_to_test_idxs[class_idx][:10]
    for y_test_idx in example_idxs:
        # And count the classes of its near neighbours.
        for nn_idx in near_neighbours[y_test_idx][:-1]:
            nn_class_idx = y_test[nn_idx]
            confusion_matrix[class_idx, nn_class_idx] += 1
# Display a confusion matrix.
labels = [
    "Airplane",
    "Automobile",
    "Bird",
    "Cat",
    "Deer",
    "Dog",
    "Frog",
    "Horse",
    "Ship",
    "Truck",
]
disp = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix, display_labels=labels)
disp.plot(include_values=True, cmap="viridis", ax=None, xticks_rotation="vertical")
plt.show()
ValueError: Error when checking input: expected input_16 to have 4 dimensions, but got array with shape (None, None, None, None, None)
I'm not sure, but as I understand it, your own data generator causes this error. You should also try returning your data size from the generator's __len__; try this:
def __len__(self):
    if self.batch_size > self.X.shape[0]:
        print("Batch size is greater than data size!!")
        return -1
    return int(np.floor(self.X.shape[0] / self.batch_size))
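Note that this snippet assumes the generator stores its data as self.X and its batch size as self.batch_size; the AnchorPositivePairs class above builds each batch on the fly instead, so those attribute names are illustrative only. A minimal self-contained sketch of a Sequence where they do exist:
import numpy as np
from tensorflow import keras

class ArraySequence(keras.utils.Sequence):
    # Minimal Sequence over an in-memory array, showing where the
    # self.X and self.batch_size used in the __len__ above come from.
    def __init__(self, X, batch_size):
        self.X = X
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch, derived from the data size.
        return int(np.floor(self.X.shape[0] / self.batch_size))

    def __getitem__(self, idx):
        return self.X[idx * self.batch_size:(idx + 1) * self.batch_size]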

TensorFlow practice y = xw: how do you initialize vector x?

I am studying my first machine-learning practice exercise.
It is a prediction system for monthly temperature.
train_t has the temperatures and train_x has the weights for each data point.
However, I have a question about where train_x is initialized.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint
x = tf.placeholder(tf.float32,[None,5])
w = tf.Variable(tf.zeros([5,1]))
y = tf.matmul(x,w)
t = tf.placeholder(tf.float32,[None,1])
loss = tf.reduce_sum(tf.square(y-t))
train_step = tf.train.AdamOptimizer().minimize(loss)
sess = tf.Session()
sess.run(tf.initialize_all_variables())
train_t = np.array([5.2,5.7,8.6,14.9,18.2,20.4,25.5,26.4,22.8,17.5,11.1,6.6])  # monthly temperatures
train_t = train_t.reshape([12,1])
train_x = np.zeros([12,5])
for row, month in enumerate(range(1,13)):
    for col, n in enumerate(range(0,5)):
        train_x[row][col] = month**n  ## why initialize like this??

i = 0
for _ in range(10000):
    i += 1
    sess.run(train_step, feed_dict={x: train_x, t: train_t})
    if i % 1000 == 0:
        loss_val = sess.run(loss, feed_dict={x: train_x, t: train_t})
        print('step: %d, Loss: %f' % (i, loss_val))

w_val = sess.run(w)
pprint(w_val)

def predict(x):
    result = 0.0
    for n in range(0,5):
        result += w_val[n][0] * x**n
    return result
fig = plt.figure()
subplot = fig.add_subplot(1,1,1)
subplot.set_xlim(1,12)
subplot.scatter(range(1,13),train_t)
linex = np.linspace(1,12,100)
liney = predict(linex)
subplot.plot(linex, liney)
However, I don't understand this part:
for row, month in enumerate(range(1,13)):
    for col, n in enumerate(range(0,5)):
        train_x[row][col] = month**n  ## why initialize like this??
What does this mean? There is no comment about it in my book. Why is train_x initialized like this?
In fact, this block of code:
train_t = np.array([5.2,5.7,8.6,14.9,18.2,20.4,25.5,26.4,22.8,17.5,11.1,6.6])  # monthly temperatures
train_t = train_t.reshape([12,1])
train_x = np.zeros([12,5])
for row, month in enumerate(range(1,13)):
    for col, n in enumerate(range(0,5)):
        train_x[row][col] = month**n
is the generation of your data. It initializes train_t and train_x, which are the data that will be injected into the placeholders x and t.
train_t is a tensor of temperatures.
train_x is a tensor of a sort of weight for each temperature.
Together they constitute the dataset.
Both train_x and train_t are arrays with your training data. The array train_t holds the target of your model, while train_x contains the features that are input to your model.
The weights of your model (the ones that are trained) are w (the only tf.Variable in your code), which is initialized to zeros.
The model you are training is a degree-4 polynomial (4 being the max of range(0, 5)) in the linear variable month, which ranges over range(1, 13). That snippet of code generates the features of a degree-4 polynomial from the linear variable month.
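To make that concrete, a small sketch (my addition) of those features: each row of train_x is [1, m, m**2, m**3, m**4] for month m, so the model learns y = w0 + w1*m + w2*m**2 + w3*m**3 + w4*m**4:
import numpy as np
months = np.arange(1, 13)
features = np.stack([months**n for n in range(5)], axis=1)  # same as train_x
print(features[0])   # [ 1  1  1  1  1]  (January: 1**0 .. 1**4)
print(features[2])   # [ 1  3  9 27 81]  (March:   3**0 .. 3**4)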
