I have a simple toy NN in PyTorch. I am setting all the seeds I can find in the docs, as well as the NumPy random seed.
If I run the code below from top to bottom, the results appear to be reproducible.
BUT, if I run block 1 only once and then run block 2 repeatedly, the result changes each time (sometimes dramatically). I am unsure why this happens, since the network is re-initialized and the optimizer is reset each time.
I am using PyTorch version 0.4.0.
BLOCK #1
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.utils.data as utils_data
from torch.autograd import Variable
from torch import optim, nn
from torch.utils.data import Dataset
import torch.nn.functional as F
from torch.nn.init import xavier_uniform_, xavier_normal_, uniform_

torch.manual_seed(123)

import random
random.seed(123)

from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

%matplotlib inline

cuda = True  # set to True to use the GPU
if cuda:
    torch.cuda.manual_seed(123)

# load Boston data from scikit-learn
boston = load_boston()
x = boston.data
y = boston.target
y = y.reshape(y.shape[0], 1)

# train and test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=123, shuffle=False)

# change to tensors
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)

# create dataset and use data loader
training_samples = utils_data.TensorDataset(x_train, y_train)
data_loader_trn = utils_data.DataLoader(training_samples, batch_size=64, drop_last=False)

# change to tensors
x_test = torch.from_numpy(x_test)
y_test = torch.from_numpy(y_test)

# create dataset and use data loader
testing_samples = utils_data.TensorDataset(x_test, y_test)
data_loader_test = utils_data.DataLoader(testing_samples, batch_size=64, drop_last=False)

# simple model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # all the layers
        self.fc1 = nn.Linear(x.shape[1], 20)
        xavier_uniform_(self.fc1.weight.data)  # this is how you can change the weight init
        self.drop = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.drop(x)
        x = self.fc2(x)
        return x
BLOCK #2
net = Net()
if cuda:
    net.cuda()

# create the optimizer (Adam)
optimizer = optim.Adam(net.parameters())

# create a loss function (mse)
loss = nn.MSELoss(size_average=False)

# run the main training loop
epochs = 20
hold_loss = []

for epoch in range(epochs):
    cum_loss = 0.
    cum_records_epoch = 0
    for batch_idx, (data, target) in enumerate(data_loader_trn):
        tr_x, tr_y = data.float(), target.float()
        if cuda:
            tr_x, tr_y = tr_x.cuda(), tr_y.cuda()

        # Reset gradient
        optimizer.zero_grad()

        # Forward pass
        fx = net(tr_x)
        output = loss(fx, tr_y)    # loss for this batch
        cum_loss += output.item()  # accumulate the loss

        # Backward
        output.backward()

        # Update parameters based on backprop
        optimizer.step()

        cum_records_epoch += len(tr_x)
        if batch_idx % 1 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, cum_records_epoch, len(data_loader_trn.dataset),
                100. * (batch_idx + 1) / len(data_loader_trn), output.item()))
    print('Epoch average loss: {:.6f}'.format(cum_loss / cum_records_epoch))
    hold_loss.append(cum_loss / cum_records_epoch)

# training loss
plt.plot(np.array(hold_loss))
plt.show()
Possible Reason
Not knowing what the "sometimes dramatic differences" are, it is hard to answer for sure; but having different results when running [block_1 x1; block_2 x1] xN (read: "run block_1 then block_2 once, and repeat both operations N times") and [block_1 x1; block_2 xN] x1 makes sense, given how pseudo-random number generators (PRNGs) and seeds work.
In the first case, you re-initialize the PRNGs in block_1 before each block_2, so each of the N instances of block_2 consumes the same sequence of pseudo-random numbers, seeded by the block_1 run just before it.
In the second case, the PRNGs are initialized only once, by the single block_1 run, so each instance of block_2 draws different random values.
(For more on PRNGs and seeds, you could check: random.seed(): What does it do?)
Simplified Example
Let's suppose numpy/CUDA/pytorch actually used a really poor PRNG, one which only returns incremented values (i.e. PRNG(x_n) = PRNG(x_(n-1)) + 1, with x_0 = seed). If you seed this generator with 0, it will thus return 1 on the first random() call, 2 on the second call, etc.
Now let's also simplify your blocks for the sake of the example:
def block_1():
    seed = 0
    print("seed: {}".format(seed))
    prng.seed(seed)

def block_2():
    res = "random results:"
    for i in range(4):
        res += " {}".format(prng.random())
    print(res)
Let's compare [block_1 x1; block_2 x1] xN and [block_1 x1; block_2 xN] x1 with N=3:
for i in range(3):
    block_1()
    block_2()

# > seed: 0
# > random results: 1 2 3 4
# > seed: 0
# > random results: 1 2 3 4
# > seed: 0
# > random results: 1 2 3 4
block_1()
for i in range(3):
    block_2()

# > seed: 0
# > random results: 1 2 3 4
# > random results: 5 6 7 8
# > random results: 9 10 11 12
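Applied to your code, the practical fix (sketched below; which generators you re-seed and with which values is up to you) is to do the seeding at the top of block 2, so every run of block 2 starts from the same PRNG state:

# At the top of BLOCK #2, before net = Net(), re-seed every generator you rely on:
import random
import numpy as np
import torch

torch.manual_seed(123)
if cuda:                         # `cuda` comes from BLOCK #1
    torch.cuda.manual_seed(123)
np.random.seed(123)
random.seed(123)

net = Net()  # the xavier_uniform_ initialization now draws the same numbers on every run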
Related
I'm currently trying to build a "simple" LSTM model that takes historical Bitcoin data, learns from it, and then tries to predict X future steps in advance.
I've built it on the idea that A + B + C = D, so B + C + D should give E. (I think that's the very simple idea behind an LSTM model. I might be wrong, however; I'm pretty new to this.)
I managed to build the basics in Python (I'm fairly new to Python), but something seems off with the predictions. For some reason, many of the predictions I test or make end up flatlining. I have a theory on why, but I have no idea if it's correct, and even less idea of how to solve it.
My theory is that within a sequence the model learns to put more importance/weight on the last value in the sequence, because with Bitcoin prices the future price (in 1 minute) is probably pretty close to the current price. That's why the predicted values keep getting closer to the real value, eventually becoming equal and thus flatlining in the graph. (I don't know if that makes sense, but that's what I thought anyway.)
I've also added a screenshot of my graph from a few days ago. Almost all predictions end up similar to this graph; this is just a more extreme example as a demonstration.
Here is my code. Can someone please explain why it flatlines and what I did wrong?
import numpy as np
from matplotlib import pyplot
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import yfinance as yf
from math import sqrt
from sklearn.metrics import mean_squared_error

# Create output sets X + Y from a given input set
# with inputset : a 1-dimensional list of floats
# with n : the number of lookback values to use for X
# with gap : the number of points skipped between X and Y
# Y: is equal to the input (although the first N values are missing)
# X: for each y of Y a corresponding set of size N is created,
#    composed of the N values preceding y.
def create_lookback(inputset, n=1, gap=0):
    print("create_lookback with n=%d gap=%d" % (n, gap))
    print(" - length of inputset = %d" % len(inputset))
    dataX, dataY = [], []
    for i in range(len(inputset) - (n + gap)):
        a = inputset[i:(i + n), 0]
        dataX.append(a)
        dataY.append(inputset[i + n + gap, 0])
    print(" - length of dataY = %d" % len(dataY))
    data_x = np.array(dataX)
    xret = data_x.reshape(data_x.shape[0], 1, data_x.shape[1])
    return xret, np.array(dataY)

# Train model based on given training set + test set
def create_model(trainX, trainY, testX, testY):
    model = Sequential()
    model.add(LSTM(units=100, input_shape=(trainX.shape[1], trainX.shape[2], )))
    model.add(Dropout(0.2))
    #model.add(LSTM(30, return_sequences=True))
    #model.add(Dropout(0.1))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')
    history = model.fit(trainX, trainY, epochs=100, batch_size=5, validation_data=(testX, testY), verbose=1, shuffle=False)
    return model

# Evaluate a given X / Y set.
# - Calculate RMSE
# - Generate a visual line plot on screen
def show_result(scaler, yhat, setY, txt):
    print("Show %s result" % txt)
    yhat_inverse = scaler.inverse_transform(yhat.reshape(-1, 1))
    testY_inverse = scaler.inverse_transform(setY.reshape(-1, 1))
    if len(testY_inverse) == len(yhat_inverse):
        rmse = sqrt(mean_squared_error(testY_inverse, yhat_inverse))
        print(' RMSE %s : %.3f' % (txt, rmse))
    pyplot.plot(yhat_inverse, label='predict ' + txt)
    pyplot.plot(testY_inverse, label='actual ' + txt, alpha=0.5)
    pyplot.legend()
    pyplot.show()

# Extrapoleer is Dutch for extrapolate
def extrapoleer(i, model, tup, toekomst):
    if i == 0:
        return
    setX = np.array([[tup]])
    y = model.predict(setX)
    y_float = y[0][0]
    tup_new = np.append(tup[1:], y_float)
    toekomst.append(y_float)
    extrapoleer(i - 1, model, tup_new, toekomst)

# --- end of defined functions
# --- start of main flow
data_grid_1 = yf.download('BTC-USD', start="2021-04-14", end="2021-04-15", interval="1m")
data_grid_2 = yf.download('BTC-USD', period="12h", interval="1m")

dataset_1 = data_grid_1.iloc[:, 1:2].values
dataset_2 = data_grid_2.iloc[:, 1:2].values

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(dataset_1)

# 70% of dataset_1 is used to train; 30% to test
train_size = int(len(scaled) * 0.7)
test_size = len(scaled) - train_size
train, test = scaled[0:train_size, :], scaled[train_size:len(scaled), :]
print("train: %d test: %d" % (len(train), len(test)))

scaled_2 = scaler.fit_transform(dataset_2)

look_back_n = 3
look_back_gap = 0
trainX, trainY = create_lookback(train, look_back_n, look_back_gap)
testX, testY = create_lookback(test, look_back_n, look_back_gap)
testX_2, testY_2 = create_lookback(scaled_2, look_back_n, look_back_gap)

model = create_model(trainX, trainY, testX, testY)

yhat_1 = model.predict(testX)
yhat_2 = model.predict(testX_2)
show_result(scaler, yhat_1, testY, "test")
show_result(scaler, yhat_2, testY_2, "test2")

last_n = testY_2[-look_back_n:]
# toekomst = future in Dutch
toekomst = []
# aantal = amount in Dutch; this indicates the number of steps you want to predict into the future
aantal = 30
extrapoleer(aantal, model, last_n, toekomst)
print("Result of %d predicted points in the future: " % aantal)
print(toekomst)

yhat_2_plus = np.append(yhat_2, toekomst)
show_result(scaler, yhat_2_plus, testY_2, "test2-plus")
I am using an Extreme Learning Machine classifier for hand gesture recognition, but I am still only getting 20% accuracy. Can anyone help me implement an iterative training loop to improve the accuracy? I am a beginner. Here is the code I am using: I split the dataset that I prepared into train and test parts after normalization, train it using the train function by calculating the Moore-Penrose inverse, and then predict the class of each gesture using the prediction function.
# -*- coding: utf-8 -*-
"""
Created on Sat Jul 4 17:52:25 2020
@author: lenovo
"""
__author__ = 'Sarra'

import numpy as np

class ELM(object):
    def __init__(self, inputSize, outputSize, hiddenSize):
        """
        Initialize weight and bias between input layer and hidden layer

        Parameters:
        inputSize: int
            The number of input layer dimensions or features in the training data
        outputSize: int
            The number of output layer dimensions
        hiddenSize: int
            The number of hidden layer dimensions
        """
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.hiddenSize = hiddenSize

        # Initialize random weight with range [-0.5, 0.5]
        self.weight = np.matrix(np.random.uniform(-0.5, 0.5, (self.hiddenSize, self.inputSize)))
        # Initialize random bias with range [0, 1]
        self.bias = np.matrix(np.random.uniform(0, 1, (1, self.hiddenSize)))

        self.H = 0
        self.beta = 0

    def sigmoid(self, x):
        """
        Sigmoid activation function

        Parameters:
        x: array-like or matrix
            The value that the activation output will look for
        Returns:
            The result of activation using the sigmoid function
        """
        return 1 / (1 + np.exp(-1 * x))

    def predict(self, X):
        """
        Predict the results of the training process using test data

        Parameters:
        X: array-like or matrix
            Test data that will be used to determine output using ELM
        Returns:
            Predicted results or outputs from test data
        """
        X = np.matrix(X)
        y = self.sigmoid((X * self.weight.T) + self.bias) * self.beta
        return y

    def train(self, X, y):
        """
        Extreme Learning Machine training process

        Parameters:
        X: array-like or matrix
            Training data that contains the value of each feature
        y: array-like or matrix
            Training data that contains the value of the target (class)
        Returns:
            The results of the training process
        """
        X = np.matrix(X)
        y = np.matrix(y)

        # Calculate the hidden layer output matrix (Hinit)
        self.H = (X * self.weight.T) + self.bias
        # Sigmoid activation function
        self.H = self.sigmoid(self.H)

        # Calculate the Moore-Penrose pseudoinverse matrix
        H_moore_penrose = np.linalg.pinv(self.H.T * self.H) * self.H.T
        # Calculate the output weight matrix beta
        self.beta = H_moore_penrose * y
        return self.H * self.beta

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import preprocessing
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

# read the dataset
database = pd.read_csv(r"C:\\Users\\lenovo\\tensorflow\\tensorflow1\\Numpy-ELM\\hand_gestures_database.csv")

# separate data from labels
data = database.iloc[:, 1:].values.astype('float64')

# normalize data
#n_data = preprocessing.minmax_scale(data, feature_range=(0, 1), axis=0, copy=True)
scaler = MinMaxScaler()
scaler.fit(data)
n_data = scaler.transform(data)

# identify the labels
label = database.iloc[:, 0]

# encode labels to transform each label to a value between 0 and (number of labels - 1)
def prepare_targets(n):
    le = preprocessing.LabelEncoder()
    le.fit(n)
    label_enc = le.transform(n)
    return label_enc

label_enc = prepare_targets(label)

CLASSES = 10
# transform the value of each label into a binary vector
target = np.zeros([label_enc.shape[0], CLASSES])
for i in range(label_enc.shape[0]):
    target[i][label_enc[i]] = 1
target.view(type=np.matrix)
print("target", target)

# Create an instance of the ELM object with 10 hidden neurons
maxx = 0
for u in range(10):
    elm = ELM(45, 10, 10)
    # Train test split 80:20
    X_train, X_test, y_train, y_test = train_test_split(n_data, target, test_size=0.34, random_state=1)
    elm.train(X_train, y_train)
    y_pred = elm.predict(X_test)

    # Evaluate on the test data
    correct = 0
    total = y_pred.shape[0]
    for i in range(total):
        predicted = np.argmax(y_pred[i])
        test = np.argmax(y_test[i])
        correct = correct + (1 if predicted == test else 0)
    print('Accuracy: {:f}'.format(correct / total))
    if correct / total > maxx:
        maxx = correct / total
print(maxx)

### confusion matrix
import seaborn as sns
y_pred = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)
target_names = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"]
cm = confusion_matrix(y_true, y_pred)
#cmn = cm.astype('float')/cm.sum(axis=1)[:, np.newaxis]*100
fig, ax = plt.subplots(figsize=(15, 8))
sns.heatmap(cm / np.sum(cm), annot=True, fmt='.2f', xticklabels=target_names, yticklabels=target_names, cmap='Blues')
#sns.heatmap(cmn, annot=True, fmt='.2%', xticklabels=target_names, yticklabels=target_names)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.ylim(-0.5, len(target_names) + 0.5)
plt.show(block=False)

def perf_measure(y_actual, y_pred):
    TP = 0
    FP = 0
    TN = 0
    FN = 0
    for i in range(len(y_pred)):
        if y_actual[i] == y_pred[i] == 1:
            TP += 1
        if y_pred[i] == 1 and y_actual[i] != y_pred[i]:
            FP += 1
        if y_actual[i] == y_pred[i] == 0:
            TN += 1
        if y_pred[i] == 0 and y_actual[i] != y_pred[i]:
            FN += 1
    return (TP, FP, TN, FN)

TP, FP, TN, FN = perf_measure(y_true, y_pred)
print("precision", TP / (TP + FP))
print("sensitivity", TP / (TP + FN))
print("specificity", TN / (TN + FP))
print("accuracy", (TP + TN) / (TN + FP + FN + TP))
To your question about whether you can implement an iterative training loop for an ELM:
No, you cannot. An ELM consists of one random layer followed by an output layer. Because the first layer is fixed, this is essentially a linear model, and we can find the optimal output weights using the pseudo-inverse, as you pointed out.
However, since you already find the optimal solution for this model in one step, there is no direct way to iteratively improve that result.
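To make that concrete, here is a minimal NumPy sketch (with made-up shapes, not your data) of why there is nothing left to iterate on: the ELM output weights are just the least-squares solution of a linear system, which the pseudo-inverse gives in a single step, so any gradient-based loop on beta could at best converge back to this same solution.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 45))        # toy inputs: 200 samples, 45 features (shapes assumed)
T = rng.normal(size=(200, 10))        # toy targets: 10 classes, one-hot-like

W = rng.uniform(-0.5, 0.5, (10, 45))  # random, *fixed* hidden weights
b = rng.uniform(0, 1, (1, 10))        # random, fixed hidden bias
H = 1 / (1 + np.exp(-(X @ W.T + b)))  # hidden layer output

# Closed-form least-squares optimum for the output weights:
beta = np.linalg.pinv(H) @ T

# No further "training" of beta can reduce the training error below this:
print(np.linalg.norm(H @ beta - T))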
I would, however, not advise using an extreme learning machine.
Besides the controversy about their origin, they are very limited in the functions they can learn.
There are other well-established approaches for gesture classification that are likely more useful to pursue.
What is the difference between the implementation of the numerically stable sigmoid function and the one implemented in TensorFlow?
I am getting different results when using these two functions, sigmoid() and tf.nn.sigmoid() (or tf.sigmoid()). The first one gives nan and very bad accuracy (around 0.93%), while the second one gives very good accuracy (around 99.99%).
The numerically stable sigmoid function, sigmoid(), is given by:
def sigmoid(z):
    return tf.where(z >= 0, 1 / (1 + tf.exp(-z)), tf.exp(z) / (1 + tf.exp(z)))
I expect to get the same results (accuracy) from both approaches, whether using the function implemented by TensorFlow or the sigmoid() created from scratch.
Note: I tested both tf.sigmoid and sigmoid() with the same model.
I tried reproducing your case with the code below on the simple Iris dataset. The value of l is the cost calculated using tf.sigmoid, and the value of l2 is the cost (cost2) calculated using your custom sigmoid function; the values of l and l2 are almost the same for me.
We can dig deeper into this if you could provide your code and the data (if it can be shared).
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn import model_selection
import sys

iris_data = pd.read_csv('iris_species/Iris.csv', header=0, delimiter=',')
data_set_y = pd.DataFrame(iris_data['Species'])
data_set_X = iris_data.drop(['Species'], axis=1)
num_samples = iris_data.shape[0]
num_features = iris_data.shape[1]
num_labels = 1

X = tf.placeholder('float', [None, 4])
y = tf.placeholder('float', [None, 1])

W = tf.Variable(tf.zeros([4, 2]), dtype=tf.float32)
b = tf.Variable(tf.zeros([1]), dtype=tf.float32)

train_X, test_X, train_y, test_y = model_selection.train_test_split(data_set_X, data_set_y, random_state=0)
train_y = np.reshape(train_y, (-1, 1))

prediction = tf.add(tf.matmul(X, W), b)
cost = tf.sigmoid(prediction)
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(cost)
num_epochs = 1000

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        _, l = sess.run([optimizer, cost], feed_dict={X: train_X, y: train_y})
        if epoch % 50 == 0:
            #print(type(l))
            #print(l.shape)
            print(l)

def sigmoid(z):
    return tf.where(z >= 0, 1 / (1 + tf.exp(-z)), tf.exp(z) / (1 + tf.exp(z)))

prediction = tf.add(tf.matmul(X, W), b)
cost2 = sigmoid(prediction)
optimizer2 = tf.train.GradientDescentOptimizer(0.001).minimize(cost2)
num_epochs = 1000

print('Shape of train_y is: ', train_y.shape)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        _, l2 = sess.run([optimizer2, cost2], feed_dict={X: train_X, y: train_y})
        if epoch % 50 == 0:
            #print(type(l))
            #print(l.shape)
            print(l2)
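As background on why the piecewise form exists at all, here is a small NumPy sketch (independent of the model above, and not the asker's code): the naive expression 1 / (1 + exp(-z)) overflows for very negative z, and under automatic differentiation that inf can turn into nan in the gradient, which would match the nan reported in the question.

import numpy as np

def sigmoid_naive(z):
    # exp(-z) overflows for very negative z (e.g. z = -1000 -> exp(1000) -> inf)
    return 1 / (1 + np.exp(-z))

def sigmoid_stable(z):
    # Only ever exponentiate a non-positive number, so no overflow occurs.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    out[~pos] = np.exp(z[~pos]) / (1 + np.exp(z[~pos]))
    return out

z = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(sigmoid_naive(z))   # triggers an overflow warning for z = -1000
print(sigmoid_stable(z))  # same values, no overflow

A related, well-known pitfall is that tf.where evaluates (and differentiates) both branches, so the exp(-z) branch can still overflow inside the graph even when its value is discarded; that is another possible source of the nan and would be worth checking once the asker's code is available.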
I am trying to solve a very simple non-linear problem: the XOR gate.
From my school knowledge, XOR can be solved with 2 input nodes and 2 hidden-layer nodes, and 1 output. It is a binary classification problem.
I generate 1000 random integers that are 0 or 1 and then do backpropagation. But for some unknown reason my network has not learned anything. The training accuracy is constant at 50%.
# coding: utf-8
import matplotlib
import torch
import torch.nn as nn
from torch.autograd import Variable
matplotlib.use('TkAgg')  # My buggy OSX 10.13.6 requires this
import matplotlib.pyplot as plt
from torch.utils.data import Dataset
from tqdm import tqdm
import random

N = 1000
batch_size = 10
epochs = 40
hidden_size = 2
output_size = 1
lr = 0.1

def return_xor(N):
    tmp_x = []
    tmp_y = []
    for i in range(N):
        a = (random.randint(0, 1) == 1)
        b = (random.randint(0, 1) == 1)
        if (a and not b) or (not a and b):
            q = True
        else:
            q = False
        input_features = (a, b)
        output_class = q
        tmp_x.append(input_features)
        tmp_y.append(output_class)
    return tmp_x, tmp_y

# Training set
x, y = return_xor(N)
x = torch.tensor(x, dtype=torch.float, requires_grad=True)
y = torch.tensor(y, dtype=torch.float, requires_grad=True)

# Test dataset
x_test, y_test = return_xor(100)
x_test = torch.tensor(x_test)
y_test = torch.tensor(y_test)

class MyDataset(Dataset):
    """Define my own `Dataset` in order to use `Variable` with `autograd`"""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

dataset = MyDataset(x, y)
test_dataset = MyDataset(x_test, y_test)

print(dataset.x.shape)
print(dataset.y.shape)

# Make data iterable by loading it into a loader. The shuffle and batch_size kwargs are spelled out as a reminder to myself
train_loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
print(f"There are {len(train_loader)} batches in the dataset")

shown = 0
for (x, y) in train_loader:
    if shown == 1:
        break
    print(f"{x.shape} {x.dtype}")
    print(f"{y.shape} {y.dtype}")
    shown += 1

class MyModel(nn.Module):
    """
    Binary classification
    2 input nodes
    2 hidden nodes
    1 output node
    """
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, output_size)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, out):
        out = self.fc1(out)
        out = self.fc2(out)
        out = self.sigmoid(out)
        return out

# Create my network
net = MyModel(dataset.x.shape[1], hidden_size, output_size)
CUDA = torch.cuda.is_available()
if CUDA:
    net = net.cuda()
criterion = torch.nn.BCELoss(reduction='elementwise_mean')
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

# Train the network
correct_train = 0
total_train = 0
for epoch in range(epochs):
    for i, (batches, labels) in enumerate(train_loader):
        batcesh = Variable(batches.float())
        labels = Variable(labels.float())
        output = net(batches)  # Forward pass
        optimizer.zero_grad()
        loss = criterion(output, labels.view(10, 1))
        loss.backward()
        optimizer.step()
        total_train += labels.size(0)
        correct_train += (predicted == labels.long()).sum()
        if (i + 1) % 10 == 0:
            print(f"""
                Epoch {epoch+1}/{epochs},
                Iteration {i+1}/{len(dataset)//batch_size},
                Training Loss: {loss.item()},
                Training Accuracy: {100*correct_train/total_train}
            """)
Solution:
I initialized the weights and used an adaptive learning rate:
https://github.com/elcolie/nnbootcamp/blob/master/Study-XOR.ipynb
I am not sure what results you are getting, as the code you have posted in the question doesn't work (it gives errors with PyTorch 0.4.1, such as predicted not being defined, etc.). But syntax issues apart, there are other problems.
Your model is not actually two layers, as it does not apply a non-linearity after the first layer's output. Effectively this is a one-layer network; to fix that, you can modify your model's forward as follows:
def forward(self, out):
    out = torch.nn.functional.relu(self.fc1(out))
    out = self.fc2(out)
    out = self.sigmoid(out)
    return out
You can try a sigmoid or tanh non-linearity as well... but the non-linearity is a must. This should fix the problem.
I also see that you are using only 2 hidden units. This might be restrictive; you might want to increase it to something like 5 or 10.
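For reference, here is a minimal self-contained sketch (not the original code; it drops the DataLoader and uses 5 hidden units plus the ReLU suggested above) showing that a two-layer network with a non-linearity does learn XOR:

import torch
import torch.nn as nn

torch.manual_seed(0)

# The four XOR patterns and their labels
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(
    nn.Linear(2, 5),   # a few more hidden units than 2, as suggested above
    nn.ReLU(),         # the non-linearity that was missing
    nn.Linear(5, 1),
    nn.Sigmoid(),
)

criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for step in range(2000):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# With most seeds this prints tensor([0., 1., 1., 0.]) once trained
print(model(x).detach().round().squeeze())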
I have a dataset with 5 columns; I am feeding the first 3 columns as my inputs and the other 2 columns as my outputs.
I have successfully executed the program, but I am not sure how to test the model by giving my own values as input and getting a predicted output from the model.
Can anyone please help me: how can I actually test the model with my own values after training is done?
I am using TensorFlow in Python. I am able to display the accuracy of testing, but how do I actually predict if I pass some arbitrary input (here, I need to pass 3 input values to get 2 output values)?
Here is my code:
# Implementation of a simple MLP network with one hidden layer. Tested on the iris data set.
# Requires: numpy, sklearn>=0.18.1, tensorflow>=1.0
# NOTE: In order to make the code simple, we rewrite x * W_1 + b_1 = x' * W_1'
# where x' = [x | 1] and W_1' is the matrix W_1 appended with a new row with elements b_1's.
# Similarly, for h * W_2 + b_2
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd

RANDOM_SEED = 1000
tf.set_random_seed(RANDOM_SEED)

def init_weights(shape):
    """ Weight initialization """
    weights = tf.random_normal(shape, stddev=0.1)
    return tf.Variable(weights)

def forwardprop(X, w_1, w_2):
    """
    Forward-propagation.
    IMPORTANT: yhat is not softmax since TensorFlow's softmax_cross_entropy_with_logits() does that internally.
    """
    h = tf.nn.sigmoid(tf.matmul(X, w_1))  # The \sigma function
    yhat = tf.matmul(h, w_2)              # The \varphi function
    return yhat

def get_iris_data():
    """ Read the iris data set and split them into training and test sets """
    df = pd.read_csv("H:\MiniThessis\Sample.csv")
    train_X = np.array(df[df.columns[0:3]])
    train_Y = np.array(df[df.columns[3:]])
    print(train_X)

    # Convert into one-hot vectors
    #num_labels = len(np.unique(train_Y))
    #all_Y = np.eye(num_labels)[train_Y]  # One liner trick!
    #print()
    return train_test_split(train_X, train_Y, test_size=0.33, random_state=RANDOM_SEED)

def main():
    train_X, test_X, train_y, test_y = get_iris_data()

    # Layer's sizes
    x_size = train_X.shape[1]  # Number of input nodes: 4 features and 1 bias
    h_size = 256               # Number of hidden nodes
    y_size = train_y.shape[1]  # Number of outcomes (3 iris flowers)

    # Symbols
    X = tf.placeholder("float", shape=[None, x_size])
    y = tf.placeholder("float", shape=[None, y_size])

    # Weight initializations
    w_1 = init_weights((x_size, h_size))
    w_2 = init_weights((h_size, y_size))

    # Forward propagation
    yhat = forwardprop(X, w_1, w_2)
    predict = tf.argmax(yhat, axis=1)

    # Backward propagation
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=yhat))
    updates = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

    # Run SGD
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)

    for epoch in range(3):
        # Train with each example
        for i in range(len(train_X)):
            sess.run(updates, feed_dict={X: train_X[i: i + 1], y: train_y[i: i + 1]})

        train_accuracy = np.mean(np.argmax(train_y, axis=1) == sess.run(predict, feed_dict={X: train_X, y: train_y}))
        test_accuracy = np.mean(np.argmax(test_y, axis=1) == sess.run(predict, feed_dict={X: test_X, y: test_y}))

        print("Epoch = %d, train accuracy = %.2f%%, test accuracy = %.2f%%"
              % (epoch + 1, 100. * train_accuracy, 100. * test_accuracy))

    correct_Prediction = tf.equal(tf.arg_max(predict, 1), tf.arg_max(y, 1))
    best = sess.run([predict], feed_dict={X: np.array([[20.14, 46.93, 1014.66]])})
    #print(correct_Prediction)
    print(best)

    sess.close()

if __name__ == '__main__':
    main()
Note: this answer makes some assumptions about the input, which you did not provide in your original question.
At this point I think your homework will require some more work to be finished. The 4th and 5th columns are probably the petal width (a real value) and the iris type (encoded as a category). This means you will probably need a real-valued output (petal width) and a categorical prediction, so softmax_cross_entropy_with_logits will probably not play nicely with the petal width. You are also applying an argmax as part of the prediction, which returns the index with the highest value (in this case either the petal width or the categorically encoded value); that does not make sense either. So as a debugging aid, why not start with:
print(sess.run([yhat], feed_dict={X: np.array([[20.14, 46.93, 1014.66]])}))
This will print out yhat, based upon the input [20.14, 46.93, 1014.66] (that's a massive flower, completely unlike anything in the dataset), and it will contain 2 outputs.
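As a minimal illustration of that debugging step (to be placed inside main(), after the training loop and before sess.close(); the three input values are just the same placeholder numbers used above, not meaningful data):

# Inside main(), after the training loop and before sess.close():
my_input = np.array([[20.14, 46.93, 1014.66]])          # one sample with the 3 input columns
raw_outputs = sess.run(yhat, feed_dict={X: my_input})   # shape (1, 2): the two output columns
print("raw network outputs:", raw_outputs)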