I am trying to implement a multi-layer perceptron using plain NumPy, but I am running into a roadblock. There's a fair chance I am not implementing it correctly, as in the past I have always used a library for this purpose. I would really appreciate some help in debugging the code. It is less than 100 lines and hopefully should not take up too much time. Thanks!
The details of my perceptron are as follows:
Inputs = 1 (plus a bias term)
Outputs = 1
Hidden layers = 1, with 5 units (plus a bias term)
Loss = squared error
The following is my code:
(I have commented in all the places that seemed necessary)
import numpy as np
import matplotlib.pyplot as plt
#Sampling 100 random values uniformly distributed b/w 0 and 5.
x=np.random.uniform(low=0, high=5, size=(100,))
y=np.multiply(x,x)
#Storing the random values and their squares in x and y
x=np.reshape(x,(-1,1))
y=np.reshape(y,(-1,1))
# plt.plot(x,y, 'ro')
# plt.show()
#Network Initialisation
hSize=5
inputSize=1
outputSize=1
Wxh=np.random.rand(hSize, inputSize+1)
Woh=np.random.rand(outputSize, hSize+1)
#+++++++++++++Back-propagation++++++++++++++
iterations=100
WohGrad=np.zeros(Woh.shape)
WxhGrad=np.zeros(Wxh.shape)
for i in range(0, iterations):
    #+++++++++++++Forward Pass++++++++++++++
    #Input Layer
    z1 = x[i]
    a1 = z1
    h1 = np.append([1], a1)
    #Hidden Layer-1
    z2 = np.dot(Wxh, h1)
    a2 = 1/(1+np.exp(-z2))
    h2 = np.append([1], a2)
    #Output Layer
    z3 = np.dot(Woh, h2)
    a3 = z3
    #+++++++++++++Backward Pass++++++++++++++
    #Squared Error
    pred = a3
    expected = y[i]
    loss = np.square(pred-expected)/2
    #Delta Values
    delta_3 = (pred-expected)
    delta_2 = np.multiply(np.dot(np.transpose(Woh), delta_3)[1:], 1/(1+np.exp(-z2)))
    #Parameter Gradients (accumulated over the pass)
    WohGrad = WohGrad + np.dot(delta_3, (h2.reshape(1, -1)))
    WxhGrad = WxhGrad + np.dot(delta_2.reshape(hSize, -1), (h1.reshape(1, -1)))

#Parameter Update
learningRate = 0.01
L2_regularizer = 0.01
WohGrad = WohGrad/iterations + L2_regularizer*Woh
WxhGrad = WxhGrad/iterations + L2_regularizer*Wxh
Wxh = Wxh - learningRate*WxhGrad
Woh = Woh - learningRate*WohGrad
#++++++++Testing++++++++++
#Forward Pass
#Input Layer
z1=np.array([2.5])
a1=z1
h1=np.append([1], a1)
#Hidden Layer-1
z2=np.dot(Wxh, h1)
a2=1/(1+np.exp(-z2))
h2=np.append([1], a2)
#Output Layer
z3=np.dot(Woh, h2)
a3=z3
print(a3)
I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation I want to apply is a room impulse response (RIR) filter. I am working with Python 3.9.5 and TensorFlow 2.8.
In Python, if the RIR is given as a finite impulse response (FIR) filter of n taps, the standard way to apply it is SciPy's lfilter:
import numpy as np
from scipy import signal
import soundfile as sf
h = np.load("rir.npy")
x, fs = sf.read("audio.wav")
y = signal.lfilter(h, 1, x)
Running this in a loop over all the files may take a long time, so I tried doing it with the TensorFlow map utility on TensorFlow datasets:
# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label
# apply it via TF map on dataset
aug_ds = ds.map(h_filt)
Using tf.numpy_function:
tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])
# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)
I have two questions:
Is this way correct and fast enough (less than a minute for 50,000 files)?
Is there a faster way to do it? E.g., replace the SciPy function with a built-in TensorFlow function. I didn't find an equivalent of lfilter or SciPy's convolve.
Here is one way you could do it.
Notice that the TensorFlow function is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the size of the batch, I the number of input channels, F the filter width, L the input width, and O the number of output channels. Using padding='SAME', it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).
import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

y_lfilt = signal.lfilter(h, 1, x)
# Since the denominator of your filter transfer function is 1,
# the output of lfilter matches the convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1),
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1),  # a 1x1 matrix of filters
    stride=1,
    padding='SAME',
    data_format='NWC')
assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
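Since the demo above covers only the degenerate N = I = O = 1 case, here is a sketch of the same trick applied to a whole batch of mono signals at once; the batch size and signal length below are made up for illustration:

import numpy as np
import tensorflow as tf

h = np.random.randn(11)            # FIR taps (stand-in for the RIR)
batch = np.random.randn(8, 16000)  # hypothetical batch of N=8 mono signals of length L=16000

pad = len(h) // 2
batch_in = np.pad(batch, ((0, 0), (pad, pad)))[..., None]  # shape (N, L + 2*pad, I=1)
filt = h[::-1].reshape(-1, 1, 1)                           # shape (F, I=1, O=1), time-flipped

y = tf.nn.conv1d(batch_in, filt, stride=1, padding='SAME', data_format='NWC')
y = y.numpy()[:, :batch.shape[1], 0]  # trim back to (N, L); each row matches lfilter(h, 1, row)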
I am playing with a mixed multinomial discrete choice model in TensorFlow Probability. The function should take an input of a choice among 3 alternatives. The chosen alternative is specified by CHOICE (a (# observations × 3) tensor). Below is an update to the code to reflect my progress on the problem (but the problem remains).
I currently get the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [6768,3] vs. [1,3,6768] [Op:Mul]
with the traceback suggesting the issue is in the call to log_prob() for the final component of the joint distribution (i.e., the tfd.Independent(tfd.Multinomial(...))).
The main components are (thank you to Padarn Wilson for helping to fix the joint distribution definition):
@tf.function
def affine(x, kernel_diag, bias=tf.zeros([])):
    """`kernel_diag * x + bias` with broadcasting."""
    kernel_diag = tf.ones_like(x) * kernel_diag
    bias = tf.ones_like(x) * bias
    return x * kernel_diag + bias

def mmnl_func():
    adj_AV_train = (tf.ones(num_idx) - AV[:, 0]) * tf.constant(-9999.)
    adj_AV_SM = (tf.ones(num_idx) - AV[:, 1]) * tf.constant(-9999.)
    adj_AV_car = (tf.ones(num_idx) - AV[:, 2]) * tf.constant(-9999.)

    return tfd.JointDistributionSequential([
        tfd.Normal(loc=0., scale=1e5),    # mu_b_time
        tfd.HalfCauchy(loc=0., scale=5),  # sigma_b_time
        lambda sigma_b_time, mu_b_time: tfd.MultivariateNormalDiag(  # b_time
            loc=affine(tf.ones([num_idx]), mu_b_time[..., tf.newaxis]),
            scale_diag=sigma_b_time * tf.ones(num_idx)),
        tfd.Normal(loc=0., scale=1e5),    # a_train
        tfd.Normal(loc=0., scale=1e5),    # a_car
        tfd.Normal(loc=0., scale=1e5),    # b_cost
        lambda b_cost, a_car, a_train, b_time: tfd.Independent(tfd.Multinomial(
            total_count=1,
            logits=tf.stack([
                affine(DATA[:, 0], tf.gather(b_time, IDX[:, 0], axis=-1), (a_train + b_cost * DATA[:, 1] + adj_AV_train)),
                affine(DATA[:, 2], tf.gather(b_time, IDX[:, 0], axis=-1), (b_cost * DATA[:, 3] + adj_AV_SM)),
                affine(DATA[:, 4], tf.gather(b_time, IDX[:, 0], axis=-1), (a_car + b_cost * DATA[:, 5] + adj_AV_car))
            ], axis=1)
        ), reinterpreted_batch_ndims=1)
    ])

@tf.function
def mmnl_log_prob(mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost):
    return mmnl_func().log_prob(
        [mu_b_time, sigma_b_time, b_time, a_train, a_car, b_cost, CHOICE])
# Based on https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args
# change constant values to tf.constant()
nuts_samples = tf.constant(1000)
nuts_burnin = tf.constant(500)
num_chains = tf.constant(1)
## Initial step size
init_step_size= tf.constant(0.3)
# Set the chain's start state.
initial_state = [
    tf.zeros([num_chains], dtype=tf.float32, name="init_mu_b_time"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_sigma_b_time"),
    tf.zeros([num_chains, num_idx], dtype=tf.float32, name="init_b_time"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_a_train"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_a_car"),
    tf.zeros([num_chains], dtype=tf.float32, name="init_b_cost")
]
## NUTS (with dual-averaging step-size adaptation)
@tf.function
def nuts_sampler(init):
    nuts_kernel = tfp.mcmc.NoUTurnSampler(
        target_log_prob_fn=mmnl_log_prob,
        step_size=init_step_size,
    )
    adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
        inner_kernel=nuts_kernel,
        num_adaptation_steps=nuts_burnin,
        step_size_getter_fn=lambda pkr: pkr.step_size,
        log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
        step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
    )
    samples_nuts_, stats_nuts_ = tfp.mcmc.sample_chain(
        num_results=nuts_samples,
        current_state=initial_state,
        kernel=adapt_nuts_kernel,
        num_burnin_steps=tf.constant(100),
        parallel_iterations=tf.constant(5))
    return samples_nuts_, stats_nuts_

samples_nuts, stats_nuts = nuts_sampler(initial_state)
More than likely this is an issue with your initial state and number of chains. You can try to initialize your kernel outside of the sampler call:
nuts_kernel = tfp.mcmc.NoUTurnSampler(
    target_log_prob_fn=mmnl_log_prob,
    step_size=init_step_size,
)
adapt_nuts_kernel = tfp.mcmc.DualAveragingStepSizeAdaptation(
    inner_kernel=nuts_kernel,
    num_adaptation_steps=nuts_burnin,
    step_size_getter_fn=lambda pkr: pkr.step_size,
    log_accept_prob_getter_fn=lambda pkr: pkr.log_accept_ratio,
    step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(step_size=new_step_size)
)
and then do
nuts_kernel.bootstrap_results(initial_state)
and investigate the shapes of the log-likelihood and proposal states that are being returned.
Another thing to do is to feed your initial state into your log-likelihood/posterior and see if the dimensions of the returned log-likelihoods match what you think they should be (if you are running multiple chains, then it should probably return # chains log-likelihoods).
It is my understanding that the batch dimension (# chains) has to be the first one in all your vectorized calculations.
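As a concrete version of that check, here is a minimal sketch reusing the names from the question (the exact result fields can vary between TFP versions):

# evaluate the target log-prob at the initial state; with num_chains chains
# this should come back with shape [num_chains]
lp = mmnl_log_prob(*initial_state)
print(lp.shape)

# bootstrap the inner kernel and inspect the log-prob it stores
pkr = nuts_kernel.bootstrap_results(initial_state)
print(pkr.target_log_prob.shape)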
The very last part of my blog post on tensorflow and custom likelihoods has working code for an example that does this.
I was able to get reasonable results from my model. Thank you to everyone for the help! The following points helped solve the various issues.
Use of JointDistributionSequentialAutoBatched() to produce consistent batch shapes. You need tf-nightly installed for access.
More informative priors for hyperparameters. Because of the exponential transformation inside the Multinomial() distribution, uninformative hyperpriors (i.e., with sigma = 1e5) quickly send large positive numbers into the exp(), leading to infinities.
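A quick numerical illustration of that failure mode (a sketch, not from the original post):

import numpy as np

# a logit drawn under a Normal(0, 1e5) prior can easily be of order 1e4-1e5;
# exponentiating it overflows to infinity, and the softmax normalization
# inside the Multinomial then produces NaNs
logit = np.float32(1e5)
print(np.exp(logit))                          # inf (overflow)
print(np.exp(logit) / (np.exp(logit) + 1.0))  # nan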
Setting the step size, etc. was also important.
I found an answer by Christopher Suter to a recent question on the TensorFlow Probability forum, specifying a model from Stan, useful. In particular, taking a sample from my prior as a starting point for the initial likelihood parameters proved helpful.
Despite JointDistributionSequentialAutoBatched() correcting the batch shapes, I went back and corrected my joint distribution shapes so that printing log_prob_parts() gives consistent shapes (i.e., [10, 1] for 10 chains). I still get a shape error without JointDistributionSequentialAutoBatched(), but the combination seems to work.
I separated my affine() into two functions. They do the same thing but remove retracing warnings. Basically, affine() was able to broadcast the inputs, but their shapes differed, and it was easier to write two functions that set up the inputs with consistent shapes. Differently shaped inputs cause TensorFlow to trace the function multiple times.
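A minimal sketch of that split (the two function names here are mine, not from the model code):

import tensorflow as tf

@tf.function
def affine_scalar_bias(x, kernel_diag, bias):
    # bias is a scalar: broadcast it up to x's shape explicitly
    return x * (tf.ones_like(x) * kernel_diag) + tf.ones_like(x) * bias

@tf.function
def affine_vector_bias(x, kernel_diag, bias):
    # bias already matches x's shape: the traced input signatures stay
    # consistent across calls, so TensorFlow does not retrace
    return x * (tf.ones_like(x) * kernel_diag) + bias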
I am new to Python and trying to learn machine learning. I have tried to write a neural network from scratch, with one hidden layer, on the famous Iris dataset. It is a three-class classifier with one-hot vectors as output. I have also taken help from already-written algorithms; for instance, I used the same training set as my testing set.
It is a lot of code to go through, so I would like you to tell me how we subtract the 'y' output (a one-hot vector) of dimensions (150, 3) from my softmax output, which is of shape (150, 21). This is my biggest problem. I tried to look online; everyone has used this method, but since I am weak in Python I don't understand it. This is the line of code: delta3[range(m1), y] -= 1
If m1 is of size (150), I get:
IndexError: arrays used as indices must be of integer (or boolean) type
and if I give m1 the size (150, 3), then the same line
delta3[range(m1), y] -= 1
raises:
TypeError: range() integer end argument expected, got tuple.
Remember, m1 = 150, my y vector is (150, 3), and my softmax output is (150, 21).
My code is:
#labels or classes
#1=iris-setosa
#2=iris-versicolor
#0=iris-virginica
#features
#sepallength
#sepalwidth
#petallengthcm
#petalwidth
import pandas as pd
import matplotlib.pyplot as plt
import csv
import numpy as np
df=pd.read_csv('Iris.csv')
df.convert_objects(convert_numeric=True)
df.fillna(0,inplace=True)
df.drop(['Id'],1,inplace=True)
#function to convert three labels into values 0,1,2
def handle_non_numericaldata(df):
    columns = df.columns.values
    for column in columns:
        text_digit_vals = {}
        def convert_to_int(val):
            return text_digit_vals[val]
        if df[column].dtype != np.int64 and df[column].dtype != np.float:
            column_contents = df[column].values.tolist()
            unique_elements = set(column_contents)
            x = 0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique] = x
                    x += 1
            df[column] = list(map(convert_to_int, df[column]))
    return df
handle_non_numericaldata(df)
x=np.array(df.drop(['Species'],1).astype(float))
c=np.array(df['Species'])
n_values=(np.max(c)+1)
y=(np.eye(n_values)[c])
m1=np.size(c)
theta=np.ones(shape=(4,1))
theta2=np.ones(shape=(1,21))
#no of examples "m"
#learning rate alpha
alpha=0.01
#regularization parameter
lamda=0.01
for i in range(1, 1000):
    z1 = np.dot(x, theta)
    sigma = 1/(1+np.exp(-z1))
    #activation layer 2
    a2 = sigma
    z2 = np.dot(a2, theta2)
    probs = np.exp(z2)
    softmax = probs/np.sum(probs, axis=1, keepdims=True)
    delta3 = softmax
    delta3[range(m1), y] -= 1
    A2 = np.transpose(a2)
    dw2 = (A2).dot(delta3)
    W2 = np.transpose(theta2)
    delta2 = delta3.dot(W2)*sigma*(1-sigma)
    X2 = np.transpose(x)
    dw1 = np.dot(X2, delta2)
    dw2 = dw2 - lamda*theta2
    dw1 = dw1 - lamda*theta
    theta = theta - alpha*dw1
    theta2 = theta2 - alpha*dw2
    correct_logprobs = 0
    correct_logprobs = correct_logprobs - np.log(probs[range(m1), y])
    data_loss = np.sum(correct_logprobs)
    data_loss += lamda/2*(np.sum(np.square(theta)) + np.square(theta2))
    loss = 1./m1*data_loss
    if 1000 % i == 0:
        print("loss after iteration%i:%f", loss)
final1=x.dot(theta)
sigma=1/(1+np.exp(-final1))
z2=sigma.dot(theta2)
exp_scores=np.exp(z2)
probs=exp_scores/np.sum(exp_scores,axis=1,keepdims=True)
print(np.argmax(probs,axis=1))
In Python, range(x, y) generates a sequence of numbers from x to y-1. If you write something like range(10), it is the same as (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). Lists in Python need an integer index such as list[0] or list[4], not list[0, 4]; however, Python has built-in slicing that gives access from index x to index y-1 of a list, with the syntax list[x:y]. So list[0:4] will return every value from index 0 to 3. For example, if list = [0, 10, 3, 4, 12, 5, 3], then list[0:4] will return [0, 10, 3, 4].
Try taking a look at list data structures in the Python docs, as well as at generators in Python.
I think what you're looking for is something like this: delta3 = [[z-1 for z in delta3[x:y]] for x in range(m1)]. This combines two list comprehensions: [x-1 for x in l], which subtracts one from every element of a list, and [l[x:y] for x in range(m)], which builds a list of slices from index x to y over a range of m. Though I'm not sure I fully understand what your end goal is.
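For reference, the NumPy idiom the line was copied from expects y to be a vector of integer class labels, not one-hot rows; with one-hot labels you can subtract directly. A minimal sketch with illustrative shapes:

import numpy as np

m1 = 150
softmax = np.random.rand(m1, 3)
softmax /= softmax.sum(axis=1, keepdims=True)  # rows sum to 1

# integer labels, shape (150,): fancy indexing picks one entry per row
y_int = np.random.randint(0, 3, size=m1)
delta3 = softmax.copy()
delta3[np.arange(m1), y_int] -= 1              # softmax - one_hot, row by row

# one-hot labels, shape (150, 3): plain elementwise subtraction
y_onehot = np.eye(3)[y_int]
delta3_alt = softmax - y_onehot

assert np.allclose(delta3, delta3_alt)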
What is a Neural Network?
The term ‘neural’ comes from ‘neuron’, the basic functional unit of the human (animal) nervous system: nerve cells present in the brain and other parts of the body. A neural network is a group of algorithms that captures the underlying relationships in a set of data in a way loosely modelled on the human brain. The network adapts to changing input, so that it gives the best result without redesigning the output procedure.
Now a code example:
import numpy as np
#assign input values
input_value=np.array([[0.26,0.77,0.25],[0.42,0.8,0.25],[0.56,0.53,0.25],[0.29,0.79,0.25]])
input_value.shape
#assign output values
output=np.array([0.644045,0.651730,0.707523,0.644395])
output=output.reshape(4,1)
output
#assign weights
weights=np.array([[0.1],[0.1],[0.1]])
weights.shape
weights
#add bias
bias=0.3
#activation function
def sigmoid_func(x):
    return 1/(1+np.exp(-x))

#derivative of sigmoid function
def der(x):
    return sigmoid_func(x)*(1-sigmoid_func(x))

#updating weights
for epochs in range(10000):
    input_arr = input_value
    #print(input_arr)
    weighted_sum = np.dot(input_arr, weights) + bias
    #apply the activation to the weighted sum
    first_output = sigmoid_func(weighted_sum)
    #print(first_output)
    error = first_output - output
    #print(error)
    total_error = np.square(np.subtract(first_output, output)).mean()
    #print total error
    first_der = error
    #derivative of the sigmoid, evaluated at the pre-activation
    second_der = der(weighted_sum)
    derivative = first_der*second_der
    t_input = input_value.T
    final_derivative = np.dot(t_input, derivative)
    #update weights
    weights = weights - 0.05*final_derivative
    #update bias
    for i in derivative:
        bias = bias - 0.05*i

print(weights)
print(bias)
#prediction for 1st item
pred=np.array([0.26,0.77,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
#prediction for 2nd item
pred=np.array([0.42,0.8,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
#prediction for 3rd item
pred=np.array([0.56,0.53,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
#prediction for 4th item
pred=np.array([0.29,0.79,0.25])
result=np.dot(pred,weights)+bias
res=sigmoid_func(result)
print(res)
I am making a neural network from scratch for practice, and I am not a very experienced Python programmer.
I know most of the maths concepts of neural networks, but my model is not behaving well. The derivative of the sigmoid function is h(x)*(1-h(x)), but I am not sure that line of code is correct; I searched on Google and everyone has used tanh activation. I am also not really sure about delta2. I have no idea where my code is going wrong. I have a few doubts about how we subtract the prediction (y) from the label (y). This is a three-class classifier.
delta3[range(m1), y] -= 1
This line of code is also not clear to me; I copied it from online, just putting my m1 (total number of examples) into it. My y matrix (labels in the form 0, 1, 2) is a vector of order (150, 1), and the prediction matrix is (150, 21), so how do we subtract them?
#labels or classes
#1=iris-setosa
#2=iris-versicolor
#0=iris-virginica
#features
#sepallength
#sepalwidth
#petallengthcm
#petalwidth
import pandas as pd
import matplotlib.pyplot as plt
import csv
import numpy as np
df=pd.read_csv('Iris.csv')
df.convert_objects(convert_numeric=True)
df.fillna(0,inplace=True)
df.drop(['Id'],1,inplace=True)
#function to convert three labels into values 0,1,2
def handle_non_numericaldata(df):
    columns = df.columns.values
    for column in columns:
        text_digit_vals = {}
        def convert_to_int(val):
            return text_digit_vals[val]
        if df[column].dtype != np.int64 and df[column].dtype != np.float:
            column_contents = df[column].values.tolist()
            unique_elements = set(column_contents)
            x = 0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique] = x
                    x += 1
            df[column] = list(map(convert_to_int, df[column]))
    return df
handle_non_numericaldata(df)
x=np.array(df.drop(['Species'],1).astype(float))
y=np.array(df['Species'])
m1=np.size(y)
theta=np.ones(shape=(4,1))
theta2=np.ones(shape=(1,21))
#no of examples "m"
#learning rate alpha
alpha=0.01
#regularization parameter
lamda=0.01
for i in range(1, 2):
    z1 = np.dot(x, theta)
    sigma = 1/(1+np.exp(-z1))
    #activation layer 2
    a2 = sigma
    z2 = np.dot(a2, theta2)
    probs = np.exp(z2)
    softmax = probs/np.sum(probs, axis=1, keepdims=True)
    delta3 = softmax
    print(softmax)
    delta3[range(m1), y] -= 1
    A2 = np.transpose(a2)
    dw2 = (A2).dot(delta3)
    W2 = np.transpose(theta2)
    delta2 = delta3.dot(W2)*sigma*(1-sigma)
    X2 = np.transpose(x)
    dw1 = np.dot(X2, delta2)
    dw2 = dw2 - lamda*theta2
    dw1 = dw1 - lamda*theta
    theta = theta - alpha*dw1
    theta2 = theta2 - alpha*dw2
    correct_logprobs = -np.log(probs[range(m1), y])
    data_loss = np.sum(correct_logprobs)
    data_loss += lamda/2*(np.sum(np.square(theta)) + np.square(theta2))
    (1./m1*data_loss)
My output for theta (weights) is:
[[ 1.22833047]
[ 1.22591229]
[ 1.22341726]
[ 1.22162091]]
which obviously cannot be correct.
New to TensorFlow, Python, and NumPy (I guess that's everything in this sample).
In the code below I (almost) understand that the update_weights.run() call in the loop is calculating the loss and developing new weights. What I don't see is how this actually causes the weights to be changed.
The point I'm stuck on is commented # THIS IS WHAT I DONT UNDERSTAND
What is the relationship between update_weights.run() and the new values being placed in weights? Or perhaps: how come when weights.eval() is called after the loop, the values have changed?
Thanks for any help
##test {"output": "ignore"}
# Import tf
import tensorflow as tf
# Numpy is Num-Pie n dimensional arrays
# https://en.wikipedia.org/wiki/NumPy
import numpy as np
# Plotting library
# http://matplotlib.org/users/pyplot_tutorial.html
import matplotlib.pyplot as plt
# %matplotlib magic
# http://ipython.readthedocs.io/en/stable/interactive/tutorial.html#magics-explained
%matplotlib inline
# Set up the data with a noisy linear relationship between X and Y.
# Variable?
num_examples = 5
noise_factor = 1.5
line_x_range = (-10,10)
#Just variables in Python
# np.linspace - Return evenly spaced numbers over a specified interval.
X = np.array([
    np.linspace(line_x_range[0], line_x_range[1], num_examples),
    np.linspace(line_x_range[0], line_x_range[1], num_examples)
])
# Plot out the starting data
# plt.figure(figsize=(4,4))
# plt.scatter(X[0], X[1])
# plt.show()
# np.random.randn - Return a sample (or samples) from the “standard normal” distribution.
# Generate noise for x and y (2)
noise = np.random.randn(2, num_examples) * noise_factor
# plt.figure(figsize=(4,4))
# plt.scatter(noise[0],noise[1])
# plt.show()
# += on an np.array
X += noise
# The 'Answer' polyfit to the noisy data
answer_m, answer_b = np.polyfit(X[0], X[1], 1)
# Destructuring Assignment - http://codeschool.org/python-additional-miscellany/
x, y = X
# plt.figure(figsize=(4,4))
# plt.scatter(x, y)
# plt.show()
# np.array
# for a in x
# [(1., a) for a in [1,2,3]] => [(1.0, 1), (1.0, 2), (1.0, 3)]
# numpy.ndarray.astype - http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html
# Copy of the array, cast to a specified type.
x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)
#Just variables in Python
# The difference between our current outputs and the training outputs over time
# Starts high and decreases
losses = []
history = []
training_steps = 50
learning_rate = 0.002
# Start the session and give it a variable name sess
with tf.Session() as sess:
    # Set up all the tensors, variables, and operations.
    # Creates a constant tensor
    input = tf.constant(x_with_bias)
    # Transpose the ndarray y of random float numbers
    target = tf.constant(np.transpose([y]).astype(np.float32))
    # Start with random weights
    weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))
    # Initialize variables ...?obscure?
    tf.initialize_all_variables().run()
    print('Initialization complete')
    # tf.matmul - Matrix Multiplication
    # What are yhat? Why this name?
    yhat = tf.matmul(input, weights)
    # tf.sub - Matrix Subtraction
    yerror = tf.sub(yhat, target)
    # tf.nn.l2_loss - Computes half the L2 norm of a tensor without the sqrt
    # loss function?
    loss = tf.nn.l2_loss(yerror)
    # tf.train.GradientDescentOptimizer - Not sure how this is updating the weights tensor?
    # What is it operating on?
    update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
    # _ in Python is conventionally used for a throwaway variable
    for step in range(training_steps):
        # Repeatedly run the operations, updating the TensorFlow variable.
        # THIS IS WHAT I DONT UNDERSTAND
        update_weights.run()
        losses.append(loss.eval())
        b, m = weights.eval()
        history.append((b, m, step))
    # Training is done, get the final values for the graphs
    betas = weights.eval()
    yhat = yhat.eval()
    # Show the fit and the loss over time.
    # destructuring assignment
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
    # Adjust whitespace between plots
    plt.subplots_adjust(wspace=.2)
    # Output size of the figure
    fig.set_size_inches(12, 4)
    ax1.set_title("Final Data Fit")
    ax1.axis('equal')
    ax1.axis([-15, 15, -15, 15])
    # Scatter plot data x and y (pairs?) set with 60% opacity
    ax1.scatter(x, y, alpha=.6)
    # Scatter plot x and np.transpose(yhat)[0] (must be same length), in red, 50% transparency
    # these appear to be the x values mapped onto the fitted line
    ax1.scatter(x, np.transpose(yhat)[0], c="r", alpha=.5)
    # Add the line along the slope defined by betas (whatever that is)
    ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], "g", alpha=0.6)
    # The polyfit coefficients are reversed in order vs the betas
    ax1.plot(line_x_range, [answer_m * a + answer_b for a in line_x_range], "r", alpha=0.3)
    ax2.set_title("Loss over Time")
    # Create a range of integers from 0 to training_steps and plot the losses as a curve
    ax2.plot(range(0, training_steps), losses)
    ax2.set_ylabel("Loss")
    ax2.set_xlabel("Training steps")
    ax3.set_title("Slope over Time")
    ax3.axis('equal')
    ax3.axis([-15, 15, -15, 15])
    for b, m, step in history:
        ax3.plot(line_x_range, [b + a * m for a in line_x_range], "g", alpha=0.2)
    # This line seems to be superfluous; removing it doesn't change the behaviour
    plt.show()
OK, so update_weights is calling a minimizer on the loss that you defined as the error between your prediction and the target.
What it does is add some small quantity to the weights (how small is controlled by the learning_rate parameter) to make your loss decrease and hence make your predictions "truer".
This is what happens when you call update_weights(): after the call your weights have changed by a small amount and, if everything went according to plan, your loss value has decreased.
What you want is to follow the evolution of your loss and weights: to see, for example, whether the loss is really decreasing (and your algorithm works), whether the weights are changing a lot, or perhaps to visualize them.
You can gain a lot of insight by visualizing how the loss is changing.
This is why you have to see the full history of the evolution of the parameters and loss; that is why you eval them at each step.
The eval and run operations behave differently on the minimizer and on the parameters: running the minimizer applies one gradient step to the weights, while evaluating the weights simply reads their current values.
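A minimal sketch of that distinction, using the same session-style API as the question:

import tensorflow as tf

with tf.Session() as sess:
    w = tf.Variable(5.0)
    loss = tf.square(w)  # minimized at w == 0
    step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
    tf.initialize_all_variables().run()

    print(w.eval())  # 5.0 -- eval just reads the variable
    step.run()       # run on the minimizer mutates w in place
    print(w.eval())  # 4.0 -- w moved by learning_rate * dloss/dw = 0.1 * 10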
I strongly advise you to read this website, where the author explains far better than I can what is going on, and in more detail.