So, I'm trying to learn TensorFlow and, for that, I'm trying to create a classifier for something that I think is not so hard.
I'd like to predict whether a number is odd or even.
The problem is that TensorFlow always predicts the same output. I have searched for answers over the last few days, but nothing has helped me...
I saw the following answers:
-Tensorflow predicts always the same result
-TensorFlow always converging to same output for all items after training
-TensorFlow always return same result
Here's my code:
in:
df
nb y1
0 1 0
1 2 1
2 3 0
3 4 1
4 5 0
...
19 20 1
inputX = df.loc[:, ['nb']].as_matrix()
inputY = df.loc[:, ['y1']].as_matrix()
print(inputX.shape)
print(inputY.shape)
out:
(20, 1)
(20, 1)
in:
# Parameters
learning_rate = 0.00000001
training_epochs = 2000
display_step = 50
n_samples = inputY.size
x = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
y_values = tf.add(tf.matmul(x, W), b)
y = tf.nn.relu(y_values)
y_ = tf.placeholder(tf.float32, [None,1])
# Cost function: Mean squared error
cost = tf.reduce_sum(tf.pow(y_ - y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
# Initialize variables and tensorflow session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(training_epochs):
    sess.run(optimizer, feed_dict={x: inputX, y_: inputY})  # Take a gradient descent step using our inputs and labels
    # Display logs per epoch step
    if (i) % display_step == 0:
        cc = sess.run(cost, feed_dict={x: inputX, y_: inputY})
        print("Training step:", '%04d' % (i), "cost=", "{:.9f}".format(cc))  #, "W=", sess.run(W), "b=", sess.run(b)
print("Optimization Finished!")
training_cost = sess.run(cost, feed_dict={x: inputX, y_: inputY})
print ("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
out:
Training step: 0000 cost= 0.250000000
Training step: 0050 cost= 0.250000000
Training step: 0100 cost= 0.250000000
...
Training step: 1800 cost= 0.250000000
Training step: 1850 cost= 0.250000000
Training step: 1900 cost= 0.250000000
Training step: 1950 cost= 0.250000000
Optimization Finished!
Training cost= 0.25 W= [[ 0.]] b= [ 0.]
in:
sess.run(y, feed_dict={x: inputX })
out:
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]], dtype=float32)
I tried to play with my hyperparameters, like the learning rate or the number of training epochs.
I changed the activation function from softmax to relu.
I changed my dataframe to have more examples, but nothing happened.
I also tried to use random initial values for my weights, but nothing changed; the cost just started at a higher value.
From a quick look at the code, it looks OK to me (apart maybe from initializing the weights to zero; usually you want a small number different from zero to avoid a trivial solution), but I don't think you can fit the problem of the parity of integers with a linear regression.
The point is that you are trying to fit
x % 2
with predictions of the form
activation(x * w + b)
and there is no way to find good w and b to solve this problem.
Another way to understand this is to plot your data: the scatter plot of the parity of x is two rows of points, and the only way to fit them with a single line is with a flat line (which will have a high cost anyway).
I think it would be better to change the data to start with, but if you want to address this problem, you should be able to obtain some result using a sine or a cosine as the activation function.
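For instance, here is a minimal numpy sketch (my own illustration, not part of the question's code): with w = pi/2 and b = 0, cos(w*x + b)^2 is 0 for odd x and 1 for even x, which matches the y1 column above exactly.
import numpy as np

nb = np.arange(1, 21)              # the 'nb' column: 1..20
y1 = (nb % 2 == 0).astype(float)   # the 'y1' column: 1 for even, 0 for odd

w, b = np.pi / 2, 0.0
pred = np.cos(w * nb + b) ** 2     # cos(pi*x/2)^2 is (numerically) 0/1 for odd/even x

print(np.allclose(pred, y1))       # True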
The main problem that I see is that you initialize the weights in the W matrix with 0s. The operation in the linear layer is basically Wx + b, so the gradient with respect to the input x is W; in a deeper network, starting from W = 0 would block any gradient from flowing back through that layer. Here, with W and b both zero, the pre-activation is 0 for every sample, the relu sits exactly at its kink (where its gradient is 0), so the gradients reaching W and b are 0 as well and you are not able to learn anything. Try to use random initial values as stated on tensorflow.org
# Create two variables.
weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
                      name="weights")
biases = tf.Variable(tf.zeros([200]), name="biases")
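Adapted to the single-weight model in the question, that would look something like this (just a sketch; the stddev value is an arbitrary choice):
W = tf.Variable(tf.random_normal([1, 1], stddev=0.35))
b = tf.Variable(tf.zeros([1]))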
First of all, I have to admit that I have never used TensorFlow. But I think you have a modelling problem here.
You are using the simplest network architecture possible (a one-dimensional perceptron). You have two variables (w and b) which you want to learn, and your decision rule for the output looks something like
predict "even" if w*x + b >= t (for some threshold t), otherwise predict "odd".
If you subtract the b and divide by w you get
x >= (t - b) / w.
So you are basically looking for a single threshold to separate odd and even numbers. No matter how you choose w and b, you will always misclassify about half of the numbers.
Although deciding whether a number is odd or even seems to be a super trivial task for us humans, it is not for a single perceptron.
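A quick numeric sketch of that point (my own illustration): for the numbers 1..20, the rule "predict even iff x is above some threshold" gets about half of them wrong no matter which threshold you pick.
import numpy as np

x = np.arange(1, 21)
y = (x % 2 == 0).astype(int)        # 1 for even numbers
for t in [0, 5, 10, 15, 21]:
    pred = (x >= t).astype(int)     # threshold decision rule
    print(t, (pred == y).mean())    # accuracy stays at 0.5 for every threshold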
Related
I am training a keras model whose last layer is a single sigmoid unit:
output = Dense(units=1, activation='sigmoid')
I am training this model with some training data in which the expected output is always a number between 0.0 and 1.0.
I am compiling the model with mean-squared-error:
model.compile(optimizer='adam', loss='mse')
Since both the expected output and the real output are single floats between 0 and 1, I was expecting a loss between 0 and 1 as well, but when I start the training I get a loss of 3.3932, larger than 1.
Am I missing something?
Edit:
I am adding an example to show the problem:
https://drive.google.com/file/d/1fBBrgW-HlBYhG-BUARjTXn3SpWqrHHPK/view?usp=sharing
(I cannot just paste the code because I need to attach the training data)
After running python stackoverflow.py, the summary of the model will be shown, as well as the training process.
I also print the minimum and maximum values of y_true each step to verify that they are within the [0, 1] range.
There is no need to wait for the training to finish, you will see that the loss during the first few epochs is much larger than 1.
First, we can demystify mse loss - it's a normal callable function in tf.keras:
import tensorflow as tf
import numpy as np
mse = tf.keras.losses.mse
print(mse([1] * 3, [0] * 3)) # tf.Tensor(1, shape=(), dtype=int32)
Next, as the name "mean squared error" implies, it's a mean, meaning the size of the vectors passed to it does not change the value as long as the mean is the same:
print(mse([1] * 10, [0] * 10)) # tf.Tensor(1, shape=(), dtype=int32)
In order for the mse to exceed 1, the average squared error must exceed 1:
print( mse(np.random.random((100,)), np.random.random((100,))) ) # tf.Tensor(0.14863832582680103, shape=(), dtype=float64)
print( mse( 10 * np.random.random((100,)), np.random.random((100,))) ) # tf.Tensor(30.51209646429651, shape=(), dtype=float64)
Lastly, sigmoid indeed guarantees that output is between 0 and 1:
sigmoid = tf.keras.activations.sigmoid
signal = 10 * np.random.random((100,))
output = sigmoid(signal)
print(f"Raw: {np.mean(signal):.2f}; Sigmoid: {np.mean(output):.2f}" ) # Raw: 5.35; Sigmoid: 0.92
What this implies is that in your code, mean of y_true is NOT between 0 and 1.
You can verify this with np.mean(y_true).
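For example, a quick sanity check before fitting could look like this (y_train stands in for whatever array you pass to model.fit; the name is an assumption):
import numpy as np

# if the targets really are in [0, 1], all three values printed here should be too
print(np.min(y_train), np.max(y_train), np.mean(y_train))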
I do not have an answer for the question asked. I am getting nans in my MSE loss, with input in range [0,1] and sigmoid at output. So I thought the question is relevant.
Here are a few observations about sigmoid:
import tensorflow as tf
import numpy as np
x=tf.constant([-20, -1.0, 0.0, 1.0, 20], dtype = tf.float32)
x=tf.keras.activations.sigmoid(x)
x.numpy()
# array([2.0611537e-09, 2.6894143e-01, 5.0000000e-01, 7.3105860e-01,
# 1.0000000e+00], dtype=float32)
x=tf.constant([float('nan')]*5, dtype = tf.float32)
x=tf.keras.activations.sigmoid(x)
x.numpy()
# array([nan, nan, nan, nan, nan], dtype=float32)
x=tf.constant([np.inf]*5, dtype = tf.float32)
x=tf.keras.activations.sigmoid(x)
x.numpy()
# array([1., 1., 1., 1., 1.], dtype=float32)
So, it is possible to get nans out of sigmoid. Just in case someone (me, in the near future) has this doubt (again).
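A practical takeaway from this (my own addition): if the MSE loss turns into nan despite a sigmoid output, it is worth checking whether nan/inf values are already present in the arrays fed to the model, e.g.:
import numpy as np

# x_train / y_train stand in for whatever arrays you feed the model (names assumed)
print(np.isnan(x_train).any(), np.isinf(x_train).any())
print(np.isnan(y_train).any(), np.isinf(y_train).any())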
I'm still working on my understanding of the PyTorch autograd system. One thing I'm struggling at is to understand why .clamp(min=0) and nn.functional.relu() seem to have different backward passes.
It's especially confusing as .clamp is used equivalently to relu in PyTorch tutorials, such as https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-nn.
I found this when analysing the gradients of a simple fully connected net with one hidden layer and a relu activation (linear in the output layer).
To my understanding, the output of the following code should be just zeros. I hope someone can show me what I am missing.
import torch
dtype = torch.float
x = torch.tensor([[3,2,1],
[1,0,2],
[4,1,2],
[0,0,1]], dtype=dtype)
y = torch.ones(4,4)
w1_a = torch.tensor([[1,2],
[0,1],
[4,0]], dtype=dtype, requires_grad=True)
w1_b = w1_a.clone().detach()
w1_b.requires_grad = True
w2_a = torch.tensor([[-1, 1],
[-2, 3]], dtype=dtype, requires_grad=True)
w2_b = w2_a.clone().detach()
w2_b.requires_grad = True
y_hat_a = torch.nn.functional.relu(x.mm(w1_a)).mm(w2_a)
y_a = torch.ones_like(y_hat_a)
y_hat_b = x.mm(w1_b).clamp(min=0).mm(w2_b)
y_b = torch.ones_like(y_hat_b)
loss_a = (y_hat_a - y_a).pow(2).sum()
loss_b = (y_hat_b - y_b).pow(2).sum()
loss_a.backward()
loss_b.backward()
print(w1_a.grad - w1_b.grad)
print(w2_a.grad - w2_b.grad)
# OUT:
# tensor([[ 0., 0.],
# [ 0., 0.],
# [ 0., -38.]])
# tensor([[0., 0.],
# [0., 0.]])
#
The reason is that clamp and relu produce different gradients at 0. Checking with a scalar tensor x = 0, compare the two versions: (x.clamp(min=0) - 1.0).pow(2).backward() versus (relu(x) - 1.0).pow(2).backward(). The resulting x.grad is 0 for the relu version but it is -2 for the clamp version. That means relu chooses x == 0 --> grad = 0, while clamp chooses x == 0 --> grad = 1.
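A minimal sketch of that scalar check, for anyone who wants to reproduce it:
import torch
from torch.nn.functional import relu

# clamp at the boundary x == 0: the backward pass treats the point as "inside" the kept range
x1 = torch.tensor(0.0, requires_grad=True)
(x1.clamp(min=0) - 1.0).pow(2).backward()
print(x1.grad)  # tensor(-2.) -> clamp uses gradient 1 at x == 0

# relu at x == 0: the backward pass uses gradient 0
x2 = torch.tensor(0.0, requires_grad=True)
(relu(x2) - 1.0).pow(2).backward()
print(x2.grad)  # tensor(0.)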
I'm trying to build an LSTM model; the data consists of date_time and some numeric values. While fitting the model, it raises the error
"ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (10, 1)".
Sample data:
"date.csv" looks like:
Date
06/13/2018 07:20:04 PM
06/13/2018 07:20:04 PM
06/13/2018 07:20:04 PM
06/13/2018 07:22:12 PM
06/13/2018 07:22:12 PM
06/13/2018 07:22:12 PM
06/13/2018 07:26:20 PM
06/13/2018 07:26:20 PM
06/13/2018 07:26:20 PM
06/13/2018 07:26:20 PM
"tasks.csv" looks like :
Tasks
2
1
2
1
4
2
3
2
3
4
date = pd.read_csv('date.csv')
task = pd.read_csv('tasks.csv')
model = Sequential()
model.add(LSTM(24,return_sequences=True,input_shape=(date.shape[0],1)))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(date, task, epochs=100, batch_size=1, verbose=1)
How can I forecast the result?
There are some issues with this code sample: there is a lack of preprocessing, label encoding and target encoding, and an incorrect loss function. I briefly describe possible solutions below, but for more information and examples you can read a tutorial about time series and forecasting.
Addressing the specific problem which generates this ValueError: LSTM requires a three-dimensional input, of shape (batch_size, input_length, dimension). So it requires an input of at least shape (batch_size, 1, 1) - but date.shape is (10, 1). If you do
date = date.values.reshape((1, 10, 1))
- it will solve this one problem, but brings an avalanche of other problems:
date = date.values.reshape((1, 10, 1))
model = Sequential()
model.add(LSTM(24, return_sequences=True, input_shape=(date.shape[1], 1)))
print(model.layers[-1].output_shape)
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(date, task, epochs=100, batch_size=1, verbose=1)
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 10 target samples.
Unfortunately, there are no answers to the other questions because of a lack of information, but here are some general-purpose recommendations.
Preprocessing
Unfortunately, you probably can't just reshape, because forecasting is a little more complicated than that. You should choose some period of history from which you will forecast the next task. The good news is that the measurements are periodic, but for each timestamp there are several tasks, which makes the problem harder to solve.
Features
You need features to predict something. It's not clear what the features are in this case, but probably not the date and time. Even the previous tasks could be features, but you can't use the task id directly; it requires some embedding, as it's not a continuous numeric value but a label.
Embedding
There's keras.layers.Embedding for embedding things in Keras.
If the number of tasks is 4 (1, 2, 3, 4) and the desired shape of the output vector is 10, you could use it this way:
model = Sequential()
model.add(Embedding(4 + 1, 10, input_length=10)) # + 1 to deal with non-zero indexing
# ... the rest of the code is omitted
- the first argument is the number of embedded items, the second is the output dimension, and the last is the input length (10 is just an example value).
Label encoding
Probably the task labels are just labels; there's no reasonable distance or metric between them - i.e. you can't say 1 is closer to 2 than to 4, etc. In that case MSE is useless, but fortunately there is a probabilistic loss function named categorical cross-entropy which helps to predict the category of the data.
To use it, you should binarize the labels:
import numpy as np
def binarize(labels):
    label_map = dict(map(reversed, enumerate(np.unique(labels))))
    bin_labels = np.zeros((len(labels), len(label_map)))
    bin_labels[np.arange(len(labels)), [label_map[label] for label in labels]] = 1
    return bin_labels, label_map
binarized_task, label_map = binarize(task)
binarized_task
Out:
array([[0., 1., 0., 0.],
[1., 0., 0., 0.],
[0., 1., 0., 0.],
[1., 0., 0., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
label_map
Out:
{1: 0, 2: 1, 3: 2, 4: 3}
- the binarized labels and a mapping from each task id to its position in the binary labels.
Of course, you should use the cross-entropy loss in the model with binarized labels. Also, the last layer should use the softmax activation function (explained in the tutorial about cross-entropy; in short, you are dealing with a probability distribution over labels, so it should sum up to one, and softmax rescales the previous layer's values according to this requirement):
model.add(Dense(4, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(date, binarized_task, epochs=100, batch_size=1, verbose=1)
"Complete", but, probably, meaningless example
This example uses all the things listed above, but it doesn't pretend to be complete or useful; I hope it is at least explanatory.
import datetime
import numpy as np
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Flatten, Embedding
# Define functions
def binarize(labels):
    """
    Labels of shape (size,) to {0, 1} array of the shape (size, n_labels)
    """
    label_map = dict(map(reversed, enumerate(np.unique(labels))))
    bin_labels = np.zeros((len(labels), len(label_map)))
    bin_labels[np.arange(len(labels)), [label_map[label] for label in labels]] = 1
    return bin_labels, label_map

def group_chunks(df, chunk_size):
    """
    Group task data by periods; train on some columns and use the last one ('Tasks') as the target.
    The function also uses 'Tasks' as features.
    """
    chunks = []
    for i in range(0, len(df) - chunk_size):
        chunks.append(df.iloc[i:i + chunk_size]['Tasks'])  # slice period, append
        chunks[-1].index = list(range(chunk_size))
    df_out = pd.concat(chunks, axis=1).T
    df_out.index = df['Date'].iloc[:(len(df) - chunk_size)]
    df_out.columns = [i for i in df_out.columns[:-1]] + ['Tasks']
    return df_out
# I modified the data for simplicity - now it's a single entry for each datetime
date = pd.DataFrame({
"Date" : [
"06/13/2018 07:20:00 PM",
"06/13/2018 07:20:01 PM",
"06/13/2018 07:20:02 PM",
"06/13/2018 07:20:03 PM",
"06/13/2018 07:20:04 PM",
"06/13/2018 07:20:05 PM",
"06/13/2018 07:20:06 PM",
"06/13/2018 07:20:07 PM",
"06/13/2018 07:20:08 PM",
"06/13/2018 07:20:09 PM"]
})
task = pd.DataFrame({"Tasks": [2, 1, 2, 1, 4, 2, 3, 2, 3, 4]})
date['Tasks'] = task['Tasks']
date['Date'] = date['Date'].map(lambda x: datetime.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p")) # formatting datetime as datetime
chunk_size = 4
df = group_chunks(date, chunk_size)
# print(df)
"""
0 1 2 Tasks
Date
2018-06-13 19:20:00 2 1 2 1
2018-06-13 19:20:01 1 2 1 4
2018-06-13 19:20:02 2 1 4 2
2018-06-13 19:20:03 1 4 2 3
2018-06-13 19:20:04 4 2 3 2
2018-06-13 19:20:05 2 3 2 3
"""
# extract the train data and target
X = df[list(range(chunk_size-1))].values
y, label_map = binarize(df['Tasks'].values)
# Create a model, compile, fit
model = Sequential()
model.add(Embedding(len(np.unique(X))+1, 24, input_length=X.shape[-1]))
model.add(LSTM(24, return_sequences=True, input_shape=(date.shape[1], 1)))
model.add(Flatten())
model.add(Dense(4, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam")
history = model.fit(X, y, epochs=100, batch_size=1, verbose=1)
Out:
Epoch 1/100
6/6 [==============================] - 1s 168ms/step - loss: 1.3885
Epoch 2/100
6/6 [==============================] - 0s 5ms/step - loss: 1.3811
Epoch 3/100
6/6 [==============================] - 0s 5ms/step - loss: 1.3781
...
- etc. It works somehow, but I kindly advise one more time: read the tutorial linked above (or any other forecasting tutorial), because, for example, I haven't covered testing/validation in this example.
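To actually answer "how can I forecast the result": once the model is fitted, something along these lines maps the softmax output back to a task id (a sketch; label_map comes from binarize above):
# predict the next task for the first chunk of history
probs = model.predict(X[:1])                        # shape (1, 4): one probability per task
inv_label_map = {v: k for k, v in label_map.items()}
predicted_task = inv_label_map[int(np.argmax(probs[0]))]
print(predicted_task)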
I wrote a TensorFlow program for linear regression. I am using the gradient descent algorithm for optimizing (minimising) the loss function, but the value of the loss function increases while the program executes. My program and output follow.
import tensorflow as tf
W = tf.Variable([.3],dtype=tf.float32)
b = tf.Variable([-.3],dtype=tf.float32)
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
lm = W*X + b
delta = tf.square(lm-Y)
loss = tf.reduce_sum(delta)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
for i in range(8):
    print(sess.run([W, b]))
    print("loss= %f" % sess.run(loss, {X: [10, 20, 30, 40], Y: [1, 2, 3, 4]}))
    sess.run(train, {X: [10, 20, 30, 40], Y: [1, 2, 3, 4]})
sess.close()
Output for my program is
2017-12-07 14:50:10.517685: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
[array([ 0.30000001], dtype=float32), array([-0.30000001],dtype=float32)]
loss= 108.359993
[array([-11.09999943], dtype=float32), array([-0.676], dtype=float32)]
loss= 377836.000000
[array([ 662.25195312], dtype=float32), array([ 21.77807617], dtype=float32)]
loss= 1318221568.000000
[array([-39110.421875], dtype=float32), array([-1304.26794434], dtype=float32)]
loss= 4599107289088.000000
[array([ 2310129.25], dtype=float32), array([ 77021.109375], dtype=float32)]
loss= 16045701465112576.000000
[array([ -1.36451664e+08], dtype=float32), array([-4549399.], dtype=float32)]
loss= 55981405829796462592.000000
[array([ 8.05974733e+09], dtype=float32), array([ 2.68717856e+08], dtype=float32)]
loss= 195312036582209632600064.000000
Please explain why the value of the loss is increasing instead of decreasing.
Did you try changing the learning rate? Using a lower learning rate (~1e-4) and more iterations should work.
More justification as to why a lower learning rate might be required. Note that your loss function is
L = \sum (W*x + b - Y)^2
so the gradient is
dL/dW = \sum 2*(W*x + b - Y)*x
and the Hessian is
d^2L/dW^2 = \sum 2*x^2
Now, your loss is diverging because the learning rate is larger than the inverse of the Hessian, which here is roughly 1/(2*3000), since the x^2 values sum to 10^2 + 20^2 + 30^2 + 40^2 = 3000. So you should decrease the learning rate.
Note: I wasn't sure how to add math to StackOverflow answer so I had to add it this way.
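For what it's worth, here is a minimal sketch of the same script with a smaller learning rate (my own illustration of the suggestion above, not a tuned solution):
import tensorflow as tf

W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

lm = W * X + b
loss = tf.reduce_sum(tf.square(lm - Y))
train = tf.train.GradientDescentOptimizer(1e-4).minimize(loss)  # was 0.01

sess = tf.Session()
sess.run(tf.global_variables_initializer())
feed = {X: [10, 20, 30, 40], Y: [1, 2, 3, 4]}
for i in range(1000):
    sess.run(train, feed)
    if i % 200 == 0:
        print("loss= %f" % sess.run(loss, feed))
sess.close()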
To do a linear regression, this is the code I've been using with numpy:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import pandas as pd
print(tf.__version__)
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 6)
x = np.arange(start=0.0, stop=5.0, step=0.1)
##You can adjust the slope and intercept to verify the changes in the graph
W=1
b=0
# We define the linear equation
y= W*x + b
# And plot it thanks to matplotlib
plt.plot(x,y)
plt.ylabel('Dependent Variable')
plt.xlabel('Indepdendent Variable')
plt.show()
With TensorFlow you can use something similar to the code below to do a linear regression:
def graph_formula_vs_data(formula, x_vector, y_vector):
    """
    This function graphs a formula in the form of a line, vs. data points
    """
    x = np.array(range(0, int(max(x_vector))))
    y = eval(formula)
    plt.plot(x, y)
    plt.plot(x_vector, y_vector, "ro")
    plt.show()
df=pd.read_csv('./linear_reg_exam_dataset.csv',usecols = [0,1],skiprows = [0],header=None)
d = df.values
data = np.float32(d)
dataset = pd.DataFrame({'x': data[:, 0], 'y': data[:, 1]})
# Number of epochs (times we make the model go through all the data)
n_epochs = 100
# Model parameters
W = tf.Variable([0.], tf.float32)
b = tf.Variable([0.], tf.float32)
y = dataset['y'] # define the target variable (dependent variable) as y
x = dataset['x']
msk = np.random.rand(len(df)) < 0.8
# Model input and output
x_train = x[msk].values.tolist()
y_train = y[msk].values.tolist()
# Validation data (with this we validate that the model has learned to generalize the problem)
x_val = x[~msk].values.tolist()
y_val = y[~msk].values.tolist()
# Model definition
#tf.function
def linear_model(x, W, b):
    return W*x + b
# Cost function
loss = lambda: tf.reduce_sum(tf.math.squared_difference(y_train,linear_model(x_train, W, b)))
# optimizer to do the gradient descent
optimizer = tf.optimizers.SGD(0.0000000000001)
# We perform n_epochs training iterations
for i in range(n_epochs):
    optimizer.minimize(loss, var_list=[W, b])

    # Every 10 epochs we print the data of how W, b evolve and the amount of error there is
    if i % 10 == 0 or i == n_epochs-1:
        print("Epoch {}".format(i))
        print("W: {}".format(W.numpy()))
        print("b: {}".format(b.numpy()))
        print("loss: {}".format(loss()))
        # This formula represents w * x + b in string form to be able to graph it
        stringfied_formula = str(W.numpy()) + "*x +" + str(b.numpy())
        graph_formula_vs_data(formula=stringfied_formula, x_vector=x_train, y_vector=y_train)
        print("\n")
Epoch 99
W: [0.39189553]
b: [0.00059491]
loss: 1458421628928.0
# Evaluation of the model with validation data
stringfied_formula=str(W.numpy()) + "*x +" + str(b.numpy())
graph_formula_vs_data(formula=stringfied_formula, x_vector=x_val, y_vector=y_val)
loss = lambda: tf.reduce_sum(tf.math.squared_difference(y_val,linear_model(x_val, W, b)))
print("\nValidation: ")
print("W: {}".format(W.numpy()))
print("b: {}".format(b.numpy()))
print("loss: {}".format(loss()))
graph_formula_vs_data(formula=stringfied_formula, x_vector=x_val, y_vector=y_val)
Validation:
W: [75.017586]
b: [0.11139687]
loss: 8863.4775390625
I tried to build a simple MLP with an input layer (2 neurons), a hidden layer (5 neurons) and an output layer (1 neuron). I planned to train and feed it with [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] for getting the desired output of [0., 1., 1., 0.] (elementwise).
Unfortunately my code refuses to run. I keep getting dimensionality errors no matter what I try. Quite frustrating :/ I think I'm missing something but I cannot figure out what is wrong.
For better readability I also uploaded the code to a pastebin: code
Any ideas?
import tensorflow as tf
#####################
# preparation stuff #
#####################
# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [0., 1., 1., 0.] # XOR output
# create a placeholder for the input
# None indicates a variable batch size for the input
# one input's dimension is [1, 2]
n_input = tf.placeholder(tf.float32, shape=[None, 2])
# number of neurons in the hidden layer
hidden_nodes = 5
################
# hidden layer #
################
b_hidden = tf.Variable(0.1) # hidden layer's bias neuron
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0)) # hidden layer's weight matrix
# initialized with a uniform distribution
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden) # calc hidden layer's activation
################
# output layer #
################
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0)) # output layer's weight matrix
output = tf.sigmoid(tf.matmul(W_output, hidden)) # calc output layer's activation
############
# learning #
############
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(output, n_input) # calc cross entropy between current
# output and desired output
loss = tf.reduce_mean(cross_entropy) # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.1) # take a gradient descent for optimizing with a "stepsize" of 0.1
train = optimizer.minimize(loss) # let the optimizer train
####################
# initialize graph #
####################
init = tf.initialize_all_variables()
sess = tf.Session() # create the session and therefore the graph
sess.run(init) # initialize all variables
# train the network
for epoch in xrange(0, 201):
    sess.run(train)  # run the training operation
    if epoch % 20 == 0:
        print("step: {:>3} | W: {} | b: {}".format(epoch, sess.run(W_hidden), sess.run(b_hidden)))
EDIT: I am still getting errors :/
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
outputs line 27 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible. Altering the line to:
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)
seems to be working, but then the error appears in:
output = tf.sigmoid(tf.matmul(hidden, W_output))
telling me: line 34 (...) ValueError: Dimensions Dimension(2) and Dimension(5) are not compatible
Turning the statement to:
output = tf.sigmoid(tf.matmul(W_output, hidden))
also throws an exception: line 34 (...) ValueError: Dimensions Dimension(1) and Dimension(5) are not compatible.
EDIT2: I do not really understand this. Shouldn't hidden be W_hidden x n_input.T, since in dimensions this would be (5, 2) x (2, 1)? If I transpose n_input hidden is still working (I even don't get the point why it is working without a transpose at all). However, output keeps throwing errors but this operation in dimensions should be (1, 5) x (5, 1)?!
(0) It's helpful to include the error output - it's also a useful thing to look at, because it identifies exactly where you were having shape problems.
(1) The shape errors arose because you have the arguments to matmul backwards in both of your matmuls, and have the tf.Variable shapes backwards. The general rule is that the weights for a layer with input_size inputs and output_size outputs should have shape [input_size, output_size], and the matmul should be tf.matmul(input_to_layer, weights_for_layer) (and then add the biases, which have shape [output_size]).
So with your code,
W_hidden = tf.Variable(tf.random_uniform([hidden_nodes, 2], -1.0, 1.0))
should be:
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0))
and
hidden = tf.sigmoid(tf.matmul(W_hidden, n_input) + b_hidden)
should be tf.matmul(n_input, W_hidden); and
output = tf.sigmoid(tf.matmul(W_output, hidden))
should be tf.matmul(hidden, W_output)
(2) Once you've fixed those bugs, your run needs to be fed a feed_dict:
sess.run(train)
should be:
sess.run(train, feed_dict={n_input: input_data})
At least, I presume that this is what you're trying to achieve.
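Putting those fixes together, the relevant lines would look roughly like this (a sketch of the corrected shapes, not the whole script):
# hidden layer: (batch, 2) x (2, 5) -> (batch, 5)
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0))
b_hidden = tf.Variable(0.1)
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

# output layer: (batch, 5) x (5, 1) -> (batch, 1)
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0))
output = tf.sigmoid(tf.matmul(hidden, W_output))

# ...

# feed the input data when running the training op
sess.run(train, feed_dict={n_input: input_data})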