Classifying integer data with TensorFlow (Python)

I want to classify integer input data:
if the input is under 200, the output should be (0, 1),
and if the input is over 200, the output should be (1, 0).
The input values are sequential integers and the network has 5 layers.
The hidden layers use sigmoid and the last layer uses the softmax function.
The loss function is a cross-entropy averaged with reduce_mean, and training uses gradient descent.
import numpy as np
import tensorflow as tf
def set_x_data():
    x_data = np.array([[50], [60], [70], [80], [90], [110], [120], [130], [140],
                       [150], [160], [170], [180], [190], [200], [210], [220],
                       [230], [240], [250], [260], [270], [280], [290], [300],
                       [310], [320], [330], [340], [350], [360], [370], [380],
                       [390]])
    return x_data
def set_y_data(x):
    # 16 rows of [0, 1] followed by 18 rows of [1, 0], as in the original listing
    y_data = np.array([[0, 1]] * 16 + [[1, 0]] * 18)
    return y_data
def set_bias(efficiency):
    arr = np.array([efficiency])
    return arr
W1 = tf.Variable(tf.random_normal([1, 5]), name='weight1')
W2 = tf.Variable(tf.random_normal([5, 5]), name='weight2')
W3 = tf.Variable(tf.random_normal([5, 5]), name='weight3')
W4 = tf.Variable(tf.random_normal([5, 5]), name='weight4')
W5 = tf.Variable(tf.random_normal([5, 2]), name='weight5')
def inference(input, b):
    hidden_layer1 = tf.sigmoid(tf.matmul(input, W1) + b)
    hidden_layer2 = tf.sigmoid(tf.matmul(hidden_layer1, W2) + b)
    hidden_layer3 = tf.sigmoid(tf.matmul(hidden_layer2, W3) + b)
    hidden_layer4 = tf.sigmoid(tf.matmul(hidden_layer3, W4) + b)
    out_layer = tf.nn.softmax(tf.matmul(hidden_layer4, W5) + b)
    return out_layer
def loss(hypothesis, y):
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=[1]))
    return cross_entropy
def train(loss):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    train = optimizer.minimize(loss)
    return train
x_data = set_x_data()
y_data = set_y_data(0)
b_data = set_bias(0.8)
x = tf.placeholder(tf.float32, shape=[None, 1])
y = tf.placeholder(tf.float32, shape=[None, 2])
b = tf.placeholder(tf.float32, shape=[None])
hypothesis = inference(x, b)
loss = loss(hypothesis, y)
train = train(loss)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for step in range(2000):
    sess.run(train, feed_dict={x:x_data, y:y_data, b:b_data})
print(sess.run(hypothesis, feed_dict={x:np.array([[1000]]), b:b_data}))
When I print W1 before and after training, its value barely changes, and when I test with input = 1000 the result is not what I expect. I expected a value close to (1, 0), but the result is almost (0.5, 0.5).
I guess the mistake comes from the loss function, because I copied it from here and there, but I can't be sure about it.
The code above is a simplified version of my code, but I think I need to show my real code.
That code is too long, so I created a new post:
classifying data by tensorflow but accuracy value didn't change

There are a few issues in the training of the above network, but with a few changes you can achieve a network that gets this decision function
(The plot in the link shows the score of class 2, i.e. if x > 200)
The list of issues subject to improvement in this network:
The training data is very scarce (only 34 points!). That is typically too few, especially for a 5-layer network as in your case. You typically want many more input samples than parameters in the network. Try adding more input values and reducing the number of layers (as in the code below; I've used floats instead of integers to get more points, but I think it is still compatible).
The inputs typically need scaling (below I've tried a super-simple scaling by dividing by a constant). This is because you typically want to avoid large input ranges (especially if you pass them through many layers with a saturating non-linearity such as the sigmoid, which would destroy the information contained in very high or very low values). In more advanced cases you might want to use min-max scaling or z-scores.
Try more epochs (and try plotting the evolution of the loss function value). With the given number of epochs, the optimization of the loss function had not converged. Below I do 10x more epochs. See how the code below now almost converges in this plot (and see how 2000 epochs were not enough):
Something that helped was shuffling the (x, y) data. Though this is not crucial in this case, it converges faster (see the paper "Efficient BackProp" by LeCun et al.). In more serious examples it is typically needed.
Importantly, I think you want b to be a parameter, not a constant, don't you? The bias of a network is typically optimized together with the multiplicative weights. (Also, it is not common to use a single, shared bias for all the hidden layers.)
Below is the code. Note there might be further improvements but these few tricks end up with the desired decision function.
I've added some inline comments to indicate changes with respect to the original. I hope you find these pieces of advice insightful!
The code:
import numpy as np
import tensorflow as tf
# I've modified the functions set_x_data and set_y_data
# so as to generate a larger set of numbers.
# Generate a range of numbers from 50 to 390
def set_x_data():
    x_data = np.arange(50, 390, 0.1)
    return x_data[:,None]
# Assign labels depending on x_data
def set_y_data(x_data):
    ydata1 = x_data >= 200
    ydata2 = x_data < 200
    return np.hstack((ydata1, ydata2))
def set_bias(efficiency):
    arr = np.array([efficiency])
    return arr
# Let's keep W1 and W5 (one hidden layer only)
# BTW, in this problem you could do with 0 hidden layers. But keeping
# 1 to show it works
W1 = tf.Variable(tf.random_normal([1, 5]), name='weight1')
W5 = tf.Variable(tf.random_normal([5, 2]), name='weight5')
# BTW, b should be a parameter, too.
b = tf.Variable(tf.constant(0.0))
# Just keeping 1 hidden layer
def inference(input):
    hidden_layer1 = tf.sigmoid(tf.matmul(input, W1) + b)
    out_layer = tf.nn.softmax(tf.matmul(hidden_layer1, W5) + b)
    return out_layer
# This is unchanged
def loss(hypothesis, y):
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(hypothesis), reduction_indices=[1]))
    return cross_entropy
# This is unchanged
def train(loss):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    train = optimizer.minimize(loss)
    return train
# Using SCALE to normalize the input variables (range of inputs too big)
# This is a simple normalization in this case. Other examples are
# Min-Max normalization or z-scores.
SCALE = 1000
x_data = set_x_data()
y_data = set_y_data(x_data)
x_data /= SCALE
# Now only placeholders are x and y (b is a parameter)
x= tf.placeholder(tf.float32, shape=[None, 1])
y= tf.placeholder(tf.float32, shape=[None, 2])
hypothesis = inference(x)
loss = loss(hypothesis, y)
train = train(loss)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
# Epochs x 10, it did not converge with fewer epochs
epochs = 20000
losses = np.zeros(epochs)
for step in range(epochs):
    # Shuffle data
    r = np.random.permutation(x_data.shape[0])
    x_data = x_data[r]
    y_data = y_data[r,:]
    # Small modification here to capture the loss.
    _, l = sess.run([train, loss], feed_dict={x:x_data, y:y_data})
    losses[step] = l
The code to display the decision function above:
%matplotlib inline
import matplotlib.pyplot as plt
ystar = np.arange(50, 400, 10)[:,None]
plt.plot(ystar, sess.run(hypothesis, feed_dict={x:ystar/SCALE})[:,0])
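The scaling options mentioned in the list above (dividing by a constant, min-max scaling, z-scores) can be sketched in plain NumPy. This is only an illustration: the helper names below are made up for this example and are not part of the answer's code.

```python
import numpy as np

def divide_by_constant(x, scale=1000.0):
    # The "super-simple" scaling used in the answer (SCALE = 1000)
    return x / scale

def min_max_scale(x):
    # Maps each column to the range [0, 1]
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

def z_score(x):
    # Maps each column to zero mean and unit standard deviation
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Same input range as the answer's set_x_data()
x_raw = np.arange(50, 390, 0.1)[:, None]

x_const = divide_by_constant(x_raw)
x_minmax = min_max_scale(x_raw)
x_z = z_score(x_raw)

print(x_const.min(), x_const.max())
print(x_minmax.min(), x_minmax.max())
print(x_z.mean(), x_z.std())
```

Whichever scaling is used, the same transformation (with the statistics computed on the training data) has to be applied to any value you later feed in for prediction.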


How can I fix weight issue in neural network

I'm trying to train a neural network for a HW (can't use classes). I have pretty much everything figured out, but when it comes to updating the weights, I fixed the shape error I had, and now the resulting weights are [nan nan nan, nan nan...], which shouldn't happen. I'm not asking anyone to do my HW for me, I'm just not sure which part of the code I need to fix.
import numpy as np
# Creating the activation function: Sigmoid function ==> h(x) = 1/(1+e^(-x))
def sig(x):
    return 1/(1+np.exp(-x))
# Derivative of act. function:
def deriv(x):
    return x*(1-x)
# Establishing inputs and outputs
x = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]])
y = np.array([[1], [0], [0], [1], [1]])
alpha = 0.05
# Establishing weights randomly with a mean of 0 for a weight matrix
w = np.random.uniform(low=-0.4, high=0.4, size=(3,1))
print('Random weight:',w)
# Going to iterate 1000 times
for _ in range(1000):
    # Feed Forward:
    # Input
    z = np.dot(x, w)
    h = sig(z)
    # Output
    yhat = sig(z)
    # Backpropagation:
    # Calculating the errors:
    J = ((1/2)*(np.power((yhat - y), 2)))
    deltay = yhat - y
    grad = deriv(z)
    dz = deltay*grad
const =, w.T)
dJdw =, const*deriv(z))
    # Update the weights (shape is messing this and 3 up for some reason....)
    w = w - alpha*dJdw
print('Weights after training: ',w)
print('Outputs after training: ',yhat)
print('Error obtained: ',deltay)
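One thing worth checking in the code above: deriv(x) = x*(1-x) is the sigmoid derivative written in terms of the sigmoid output, so it should be applied to sig(z), not to z itself. Applied to the raw z it can grow without bound, which may be what drives the weights to nan. The following is a minimal sketch of a single-layer version with that change, assuming a squared-error loss; the name deriv_from_output and the fixed random seed are choices made for this example only.

```python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

def deriv_from_output(a):
    # Sigmoid derivative expressed through the activation a = sig(z)
    return a * (1.0 - a)

x = np.array([[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
y = np.array([[1], [0], [0], [1], [1]], dtype=float)
alpha = 0.05

rng = np.random.default_rng(0)  # seed chosen only to make the run repeatable
w = rng.uniform(low=-0.4, high=0.4, size=(3, 1))

losses = []
for _ in range(1000):
    z = np.dot(x, w)                            # shape (5, 1)
    yhat = sig(z)                               # shape (5, 1)
    losses.append(0.5 * np.sum((yhat - y) ** 2))
    dz = (yhat - y) * deriv_from_output(yhat)   # shape (5, 1)
    dJdw = np.dot(x.T, dz)                      # shape (3, 1), same as w
    w = w - alpha * dJdw

print('Weights after training:', w.ravel())
print('Loss before/after:', losses[0], losses[-1])
```

Because dJdw has the same shape as w, the update no longer relies on broadcasting, and the weights stay finite.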

Tensorflow custom layer: Creating a sparse matrix with trainable parameters

A model that I am working on should be predicting quite a lot of variables simultaneously (>1000). Therefore I would like to have a small neural network at the end of the network for each output.
In order to do this compactly, I would like to find a way to create a sparse trainable connection between two layers in the neural network within the Tensorflow framework.
Only a small portion of the connection matrix should be trainable: It is only the parameters that are part of the block-diagonal.
For example:
The connection matrix is the following:
The trainable parameters should be in the place of the 1's.
I have written exactly such a layer:
It takes a sparse matrix as an input and lets you decide how to connect between layers. The layer uses sparse tensors and matrix multiplications.
The comment asked: is this a trainable object, though?
The answer: no. You currently cannot make a sparse matrix trainable. Instead you can use a mask matrix (see the end of this answer).
But if you need to use a sparse matrix, you just have to use tf.sparse.sparse_dense_matmul() or tf.sparse_tensor_to_dense() wherever your sparse matrix interacts with a dense matrix. I have taken a simple XOR example from here and replaced the dense matrix with a sparse one:
#Declaring necessary modules
import tensorflow as tf
import numpy as np
# A simple numpy implementation of a XOR gate to understand the backpropagation
x = tf.placeholder(tf.float32,shape = [4,2],name = "x")
#declaring a place holder for input x
y = tf.placeholder(tf.float32,shape = [4,1],name = "y")
#declaring a place holder for desired output y
m = np.shape(x)[0]#number of training examples
n = np.shape(x)[1]#number of features
hidden_s = 2 #number of nodes in the hidden layer
l_r = 1#learning rate initialization
theta1 = tf.SparseTensor(indices=[[0, 0],[0, 1], [1, 1]], values=[0.1, 0.2, 0.1], dense_shape=[3, 2])
#theta1 = tf.cast(tf.Variable(tf.random_normal([3,hidden_s]),name = "theta1"),tf.float64)
theta2 = tf.cast(tf.Variable(tf.random_normal([hidden_s+1,1]),name = "theta2"),tf.float32)
#conducting forward propagation
a1 = tf.concat([np.c_[np.ones(x.shape[0])],x],1)
#the weights of the first layer are multiplied by the input of the first layer
#z1 = tf.sparse_tensor_dense_matmul(theta1, a1)
z1 = tf.matmul(a1,tf.sparse_tensor_to_dense(theta1))
#the input of the second layer is the output of the first layer, passed through the
a2 = tf.concat([np.c_[np.ones(x.shape[0])],tf.sigmoid(z1)],1)
#the input of the second layer is multiplied by the weights
z3 = tf.matmul(a2,theta2)
#the output is passed through the activation function to obtain the final probability
h3 = tf.sigmoid(z3)
cost_func = -tf.reduce_sum(y*tf.log(h3)+(1-y)*tf.log(1-h3),axis = 1)
#built in tensorflow optimizer that conducts gradient descent using specified
optimiser = tf.train.GradientDescentOptimizer(learning_rate = l_r).minimize(cost_func)
#setting required X and Y values to perform XOR operation
X = [[0,0],[0,1],[1,0],[1,1]]
Y = [[0],[1],[1],[0]]
#initializing all variables, creating a session and running a tensorflow session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
#running gradient descent for each iteration
for i in range(200):
    sess.run(optimiser, feed_dict = {x:X,y:Y})#setting place holder values using feed_dict
    if i%100==0:
        print("Epoch:",i)
        print(sess.run(theta1))
and the output is:
Epoch: 0
SparseTensorValue(indices=array([[0, 0],
[0, 1],
[1, 1]]), values=array([0.1, 0.2, 0.1], dtype=float32), dense_shape=array([3, 2]))
Epoch: 100
SparseTensorValue(indices=array([[0, 0],
[0, 1],
[1, 1]]), values=array([0.1, 0.2, 0.1], dtype=float32), dense_shape=array([3, 2]))
So the only way is to use a mask matrix. You can apply it by element-wise multiplication or with tf.where.
1) Multiplication: create a mask matrix of the desired shape and multiply it element-wise with your weight matrix:
mask = tf.Variable([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]],name ='mask', trainable=False)
weight = tf.cast(tf.Variable(tf.random_normal([3,3])),tf.float32)
desired_tensor = tf.multiply(weight, mask)
2) tf.where: keep the weight where the mask is set and use zero elsewhere:
mask = tf.Variable([[1,0,0],[0,1,0],[0,0,1]],name ='mask', trainable=False)
weight = tf.cast(tf.Variable(tf.random_normal([3,3])),tf.float32)
desired_tensor = tf.where(mask > 0, weight, tf.zeros_like(weight))
Hope it helps
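To illustrate the mask idea for the block-diagonal case from the question, here is a small NumPy-only sketch (deliberately independent of TensorFlow). It builds a block-diagonal mask, applies it to a dense weight matrix, and shows that the gradient with respect to the masked-out entries is exactly zero. The function name and the block sizes are assumptions made for this example.

```python
import numpy as np

def block_diagonal_mask(n_blocks, block_in, block_out):
    # Kronecker product of an identity matrix with an all-ones block:
    # ones on the block diagonal, zeros everywhere else.
    return np.kron(np.eye(n_blocks), np.ones((block_in, block_out)))

rng = np.random.default_rng(0)

# 3 outputs, each connected to its own group of 2 inputs
mask = block_diagonal_mask(n_blocks=3, block_in=2, block_out=1)   # shape (6, 3)
weight = rng.normal(size=mask.shape)

inputs = rng.normal(size=(4, 6))
outputs = inputs @ (weight * mask)          # forward pass with masked weights

# For loss = sum(outputs), the chain rule gives
# d loss / d weight = (inputs.T @ d loss / d outputs) * mask
upstream = np.ones_like(outputs)
grad_weight = (inputs.T @ upstream) * mask

print(mask)
print(grad_weight)
```

Because the mask multiplies the weight inside the forward pass, entries outside the blocks never receive a gradient, so only the block-diagonal parameters are effectively trained.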
You can do that by using sparse tensors like so:
tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
which represents the dense matrix:
[[1, 0, 0, 0]
 [0, 0, 2, 0]
 [0, 0, 0, 0]]
You can read more in the TensorFlow documentation on sparse tensors.
Hope it helps!
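To make the indices / values / dense_shape triple concrete, here is a small NumPy sketch that expands it into the dense matrix shown above. The helper sparse_to_dense is written for this example and is not a TensorFlow function.

```python
import numpy as np

def sparse_to_dense(indices, values, dense_shape):
    # Start from zeros and write each value at its (row, col) position
    dense = np.zeros(dense_shape, dtype=np.asarray(values).dtype)
    for (row, col), value in zip(indices, values):
        dense[row, col] = value
    return dense

dense = sparse_to_dense(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
print(dense)
```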

'None' gradients in pytorch

I am trying to implement a simple MDN that predicts the parameters of a distribution over a target variable instead of a point value, and then assigns probabilities to discrete bins of the point value. Narrowing down the issue, the code from which the 'None' springs is:
import numpy as np
import torch
# params
tte_bins = np.linspace(
).reshape(1, 1, -1)
bins = torch.tensor(tte_bins, dtype=torch.float32)
x_train = np.random.randn(1, 1024, 3)
y_labels = np.random.randint(low=0, high=399, size=(1, 1024))
y_train = np.eye(400)[y_labels]
# data
in_train = torch.tensor(x_train[0:1, :, :], dtype=torch.float)
in_train = (in_train - torch.mean(in_train)) / torch.std(in_train)
out_train = torch.tensor(y_train[0:1, :, :], dtype=torch.float)
# model
linear = torch.nn.Linear(in_features=3, out_features=2)
lin = linear(in_train)
preds = torch.exp(lin)
# intermediate values
alpha = torch.clamp(preds[0:1, :, 0:1], 0, 500)
beta = torch.clamp(preds[0:1, :, 1:2], 0, 100)
# probs
p1 = torch.exp(-torch.pow(bins / alpha, beta))
p2 = torch.exp(-torch.pow((bins + 1.0) / alpha, beta))
probs = p1 - p2
# loss
loss = torch.mean(torch.pow(out_train - probs, 2))
# gradients
for p in linear.parameters():
    print(p.grad, 'gradient')
in_train has shape [1, 1024, 3], out_train has shape [1, 1024, 400], and bins has shape [1, 1, 400]. All the broadcasting etc. appears fine: the resulting matrices (like alpha/beta/loss) are the right shape and have the right values. There are simply no gradients.
edit: added loss.backward() and x_train/y_train, now I have nans
You simply forgot to compute the gradients. You calculate the loss, but you never ask PyTorch to backpropagate from it, so p.grad is never populated.
Simply adding
loss.backward()
to your code (after computing the loss and before reading p.grad) should fix the problem.
Additionally, in your code some intermediate results like alpha are sometimes zero but appear in a denominator when computing the gradient. This leads to the nan results you observed.
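The zero-denominator point can be illustrated with a small NumPy sketch. The bin values, the example alpha values and the lower bound eps are all hypothetical (the question's actual bins are not shown); the point is only that clipping to a small positive minimum instead of 0 keeps the ratio finite.

```python
import numpy as np

# Hypothetical bins with shape [1, 1, 400]; the first bin is 0
bins = np.linspace(0.0, 399.0, 400).reshape(1, 1, -1)

# Hypothetical alpha values with shape [1, 3, 1], including an exact zero
alpha_raw = np.array([0.0, 1e-3, 2.0]).reshape(1, -1, 1)

# Clipping with a lower bound of 0 leaves the zero in place:
# 0/0 gives nan and x/0 gives inf.
with np.errstate(divide="ignore", invalid="ignore"):
    unsafe = bins / np.clip(alpha_raw, 0.0, 500.0)

# A small positive lower bound (eps is an assumed value) avoids both
eps = 1e-6
safe = bins / np.clip(alpha_raw, eps, 500.0)

print(np.isnan(unsafe).any(), np.isinf(unsafe).any())
print(np.isfinite(safe).all())
```

In the PyTorch code the analogous change would be to pass a small positive minimum instead of 0 to torch.clamp for alpha and beta.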

Why does my TensorFlow Neural Network for XOR only have an accuracy of around 0.5?

I wrote a neural network in TensorFlow for the XOR input. I have used 1 hidden layer with 2 units and softmax classification. The input is of the form <1, x_1, x_2, zero, one>, where
1 is the bias
x_1 and x_2 take values near 0 or near 1, covering all the combinations {00, 01, 10, 11}; they are drawn from a normal distribution centred on 0 or 1
zero: is 1 if the output is zero
one: is 1 if the output is one
The accuracy is always around 0.5. What has gone wrong? Is the architecture of the neural network wrong, or is there something with the code?
import tensorflow as tf
import numpy as np
from random import randint
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))
def model(X, weight_hidden, weight_output):
    # [1,3] x [3,n_hiddent_units] = [1,n_hiddent_units]
    hiddern_units_output = tf.nn.sigmoid(tf.matmul(X, weight_hidden))
    # [1,n_hiddent_units] x [n_hiddent_units, 2] = [1,2]
    return hiddern_units_output
    #return tf.matmul(hiddern_units_output, weight_output)
def getHiddenLayerOutput(X, weight_hidden):
    hiddern_units_output = tf.nn.sigmoid(tf.matmul(X, weight_hidden))
    return hiddern_units_output
total_inputs = 100
zeros = tf.zeros([total_inputs,1])
ones = tf.ones([total_inputs,1])
around_zeros = tf.random_normal([total_inputs,1], mean=0, stddev=0.01)
around_ones = tf.random_normal([total_inputs,1], mean=1, stddev=0.01)
batch_size = 10
n_hiddent_units = 2
X = tf.placeholder("float", [None, 3])
Y = tf.placeholder("float", [None, 2])
weight_hidden = init_weights([3, n_hiddent_units])
weight_output = init_weights([n_hiddent_units, 2])
hiddern_units_output = getHiddenLayerOutput(X, weight_hidden)
py_x = model(X, weight_hidden, weight_output)
#cost = tf.square(Y - py_x)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    trX_0_0 = sess.run(tf.concat([ones, around_zeros, around_zeros, ones, zeros], axis=1))
    trX_0_1 = sess.run(tf.concat([ones, around_zeros, around_ones, zeros, ones], axis=1))
    trX_1_0 = sess.run(tf.concat([ones, around_ones, around_zeros, zeros, ones], axis=1))
    trX_1_1 = sess.run(tf.concat([ones, around_ones, around_ones, ones, zeros], axis=1))
    trX = sess.run(tf.concat([trX_0_0, trX_0_1, trX_1_0, trX_1_1], axis=0))
trX =
    for i in range(10):
        for start, end in zip(range(0, len(trX), batch_size), range(batch_size, len(trX) + 1, batch_size)):
            trY = tf.identity(trX[start:end,3:5])
            trY = sess.run(tf.reshape(trY,[batch_size, 2]))
            sess.run(train_op, feed_dict={ X: trX[start:end,0:3], Y: trY })
        start_index = randint(0, (total_inputs*4)-batch_size)
        y_0 = sess.run(py_x, feed_dict={X: trX[start_index:start_index+batch_size,0:3]})
        print("iteration :",i, " accuracy :", np.mean(np.absolute(trX[start_index:start_index+batch_size,3:5]-y_0)),"\n")
Check the comments section for the updated code
The problem was with the randomly assigned weights. Here is the modified version, obtained after a series of trial-and-error.
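One way to see why the initial weights can matter here, as a sketch rather than a confirmed diagnosis: with stddev=0.01 the sigmoid hidden units output almost exactly 0.5 for every XOR input, so the hidden layer initially carries very little information about the input. The larger stddev of 1.0 below is an assumed alternative for comparison, not a value taken from the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The four XOR inputs with a leading bias column of ones, as in the question
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)

rng = np.random.default_rng(0)
base = rng.normal(size=(3, 2))   # one random draw, rescaled below

hidden_small = sigmoid(X @ (0.01 * base))   # stddev = 0.01, as in the question
hidden_large = sigmoid(X @ (1.0 * base))    # stddev = 1.0, assumed alternative

# Range of each hidden unit's output over the four inputs
spread_small = np.ptp(hidden_small, axis=0)
spread_large = np.ptp(hidden_large, axis=0)

print(hidden_small)
print(spread_small, spread_large)
```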

What is going wrong with the training and predictions using TensorFlow?

Please see the code written below.
x = tf.placeholder("float", [None, 80])
W = tf.Variable(tf.zeros([80,2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,2])
So here we see that there are 80 features in the data with only 2 possible outputs. I set the cross_entropy and the train_step like so.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(x, W) + b, y_)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
Initialize all variables.
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
Then I use this code to "train" my Neural Network.
g = 0
for i in range(len(x_train)):
    _, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})
    g += 1
print("...Trained...")
After training the network, it always produces the same accuracy rate regardless of how many times I train it. That accuracy rate is 0.856067 and I get to that accuracy with this code-
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: x_test, y_: y_test}))
So this is where the question comes in. Is it because I have too small of dimensions? Maybe I should break the features into a 10x8 matrix? Maybe a 4x20 matrix? etc.
Then I try to get the probabilities of the actual test data producing a 0 or a 1 like so-
test_data_actual = genfromtxt('clean-test-actual.csv',delimiter=',') # Actual Test data
x_test_actual = []
for i in test_data_actual:
    x_test_actual.append(i)
x_test_actual = np.array(x_test_actual)
ans = sess.run(y, feed_dict={x: x_test_actual})
And print out the probabilities:
print(ans[0:10])
[[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]]
(Note: it does produce [ 0. 1.] sometimes.)
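A quick sanity check that may be useful here, offered as an assumption rather than something the post establishes: if the network predicts the same class for almost every input, its accuracy will simply equal the share of that class in the test labels. The sketch below computes that baseline for one-hot labels; the example labels are made up, and with the real data you would pass y_test instead.

```python
import numpy as np

def majority_class_accuracy(one_hot_labels):
    # Accuracy obtained by always predicting the most frequent class
    class_counts = np.asarray(one_hot_labels).sum(axis=0)
    return class_counts.max() / class_counts.sum()

# Hypothetical labels: 6 of 7 examples belong to class 0
y_example = np.array([[1, 0]] * 6 + [[0, 1]])
baseline = majority_class_accuracy(y_example)
print(baseline)
```

If the value for y_test matches the constant accuracy the network reports, the network is most likely not learning anything beyond the class frequencies.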
I then tried to see if applying the expert methodology would produce better results. Please see the following code.
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 1, 1, 1],
                          strides=[1, 1, 1, 1], padding='SAME')
(Please note how I changed the strides in order to avoid errors).
W_conv1 = weight_variable([1, 80, 1, 1])
b_conv1 = bias_variable([1])
Here is where the question comes in again. I define the Tensor (vector/matrix if you will) as 80x1 (so 1 row with 80 features in it); I continue to do that throughout the rest of the code (please see below).
x_ = tf.reshape(x, [-1,1,80,1])
h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)
Second Convolutional Layer
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([1, 80, 1, 1])
b_conv2 = bias_variable([1])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
Densely Connected Layer
W_fc1 = weight_variable([80, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 80])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = weight_variable([1024, 2])
b_fc2 = bias_variable([2])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
In the above you'll see that I defined the output as 2 possible answers (also to avoid errors).
Then cross_entropy and the train_step.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, y_)
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
Start the session.
sess.run(tf.initialize_all_variables())
"Train" the neural network.
g = 0
for i in range(len(x_train)):
    if i%100 == 0:
        train_accuracy = accuracy.eval(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 1.0})
    sess.run(train_step, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 0.5})
    g += 1
print("test accuracy %g"%accuracy.eval(session=sess, feed_dict={
    x: x_test, y_: y_test, keep_prob: 1.0}))
test accuracy 0.929267
And, once again, it always produces 0.929267 as the output.
The probabilities on the actual data producing a 0 or a 1 are as follows:
[[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.96712834 0.03287172]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]]
As you see, there is some variance in these probabilities, but typically just the same result.
I know that this isn't a Deep Learning problem. This is obviously a training problem. I know that there should always be some variance in the training accuracy every time you reinitialize the variables and retrain the network, but I just don't know why or where it's going wrong.
The answer is twofold.
One problem is with the dimensions/parameters. The other problem is that the features are being placed in the wrong spot.
W_conv1 = weight_variable([1, 2, 1, 80])
b_conv1 = bias_variable([80])
Notice that the first two numbers in the weight_variable are the filter height and width (here a 1x2 window sliding over the 1x80 input). The third number is the number of input channels and the fourth is the number of output features. The bias_variable always takes the final number in the weight_variable.
Second Convolutional Layer
W_conv2 = weight_variable([1, 2, 80, 160])
b_conv2 = bias_variable([160])
Here the first two numbers are again the filter height and width. The third number is the number of input channels, which must match the 80 features produced by the first layer, and the fourth is the number of output features. In this case, we double it: 80x2=160. The bias_variable then takes the final number in the weight_variable. If you were to finish the code at this point, the last number in the weight_variable would be a 1 in order to prevent dimensional errors due to the shape of the input tensor and the output tensor. But, instead, for better predictions, let's add a third convolutional layer.
Third Convolutional Layer
W_conv3 = weight_variable([1, 2, 160, 1])
b_conv3 = bias_variable([1])
Once again, the first two numbers in the weight_variable are the filter height and width. The third number corresponds to the 160 features we established in the Second Convolutional Layer. The last number in the weight_variable now becomes 1 so we don't run into any dimension errors on the output that we are predicting. In this case, the output has the dimensions of 1, 2.
W_fc2 = weight_variable([80, 1024])
b_fc2 = bias_variable([1024])
Here, the number of neurons is 1024 which is completely arbitrary, but the first number in the weight_variable needs to be something that the dimensions of our feature matrix needs to be divisible by. In this case it can be any number (such as 2, 4, 10, 20, 40, 80). Once again, the bias_variable takes the last number in the weight_variable.
At this point, make sure that the last number in h_pool3_flat = tf.reshape(h_pool3, [-1, 80]) corresponds to the first number in the W_fc2 weight_variable.
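The shape bookkeeping above can be checked without TensorFlow. The sketch below traces the output shape through the three convolutions, assuming the tf.nn.conv2d filter layout [filter_height, filter_width, in_channels, out_channels], 'SAME' padding and stride 1 as in conv2d above. The helper name is made up for this example; the pooling layers are left out because with ksize and strides of [1, 1, 1, 1] they do not change the shape.

```python
import math

def conv2d_same_output_shape(input_shape, filter_shape, strides=(1, 1)):
    # input_shape:  (batch, height, width, in_channels)
    # filter_shape: (filter_height, filter_width, in_channels, out_channels)
    batch, height, width, in_channels = input_shape
    _, _, filter_in, filter_out = filter_shape
    if in_channels != filter_in:
        raise ValueError("filter in_channels %d does not match input channels %d"
                         % (filter_in, in_channels))
    # With 'SAME' padding the spatial size is ceil(size / stride)
    return (batch,
            math.ceil(height / strides[0]),
            math.ceil(width / strides[1]),
            filter_out)

shape = (1, 1, 80, 1)   # one example after tf.reshape(x, [-1, 1, 80, 1])
layers = [("conv1", (1, 2, 1, 80)),
          ("conv2", (1, 2, 80, 160)),
          ("conv3", (1, 2, 160, 1))]
for name, filter_shape in layers:
    shape = conv2d_same_output_shape(shape, filter_shape)
    print(name, shape)

flat_size = shape[1] * shape[2] * shape[3]
print("values per example after flattening:", flat_size)
```

The final shape has 80 values per example, which is why the reshape to [-1, 80] lines up with a first dimension of 80 in the fully connected weight_variable.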
Now when you run your training program you will notice that the outcome varies and won't always guess all 1's or all 0's.
When you want to predict the probabilities, you have to feed x to the softmax variable-> y_conv=tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3) like so-
ans = sess.run(y_conv, feed_dict={x: x_test_actual, keep_prob: 1.0})
You can alter the keep_prob variable, but for prediction it should stay at 1.0 so that dropout is disabled; that is what produces the best results. Now, if you print out ans you'll have something that looks like this-
[[ 0.90855026 0.09144982]
[ 0.93020624 0.06979381]
[ 0.98385173 0.0161483 ]
[ 0.93948185 0.06051811]
[ 0.90705943 0.09294061]
[ 0.95702559 0.04297439]
[ 0.95543593 0.04456403]
[ 0.95944828 0.0405517 ]
[ 0.99154049 0.00845954]
[ 0.84375167 0.1562483 ]
[ 0.98449463 0.01550537]
[ 0.97772813 0.02227189]
[ 0.98341942 0.01658053]
[ 0.93026513 0.06973486]
[ 0.93376994 0.06623009]
[ 0.98026556 0.01973441]
[ 0.93210858 0.06789146]
Notice how the probabilities vary. Your training is now working properly.

