I am trying to implement simple autoencoder like below.
The number of input features are 2, and I want to build sparse autoencoder for dimension reduction to feature 1. I selected the number of nodes are 2(input), 8(hidden), 1(reduced feature), 8(hidden), 2(output) to add some more complexity than using only (2, 1, 2) nodes. The number of samples N is around 10000.
'DATA' is a just a 2x10000 matrix containing integer values.
import tensorflow as tf
x = tf.placeholder(shape=[None, 2])
w1 = tf.Variable(tf.random_normal(shape=[2, 8]))
w2 = tf.Variable(tf.random_normal(shape=[8, 1]))
h1 = tf.nn.relu(tf.matmul(x, w1))
encoded = tf.matmul(h1, w2)
h2 = tf.nn.relu(encoded)
h3 = tf.nn.relu(tf.matmul(h2, tf.transpose(w2)))
y = tf.matmul(h3, tf.transpose(w1))
mse = tf.reduce_mean(tf.squared_difference(x, y))
optimizer =
tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse)
sess = tf.Session()
sess.run(init)
fd = {x: DATA}
loss_value, reduced_feature = sess.run([mse, encoded], feed_dict=fd)
I have 2 questions with the implementation, as the result was quite different as I expected.
Is this implementation correct? Will the variable 'reduced_feature' show the reduced feature(1d feature) from 2 feature inputs?
Should I add some sparsity condition if I want to use more hidden nodes than input? If yes, can you show some sample code for this task?
Related
I am trying to write a custom loss function for a model that utilizes Monte Carlo (MC) dropout. I want the model to run through each sample in a batch n times before feeding the predictions to the loss function. A current toy code is shown below. The model has 24 inputs and 10 outputs with 5000 training samples.
import numpy as np
import tensorflow as tf
X = np.random.rand(5000,24)
y = np.random.rand(5000,10)
def MC_Loss(y_true,y_pred):
mu = tf.math.reduce_mean(y_pred,axis=0)
#error = tf.square(y_true-mu)
error = tf.square(y_true-y_pred)
var = tf.math.reduce_variance(y_pred,axis=0)
return tf.math.divide(error,var)/2 + tf.math.log(var)/2 + tf.math.log(2*np.pi)/2
input_layer = tf.keras.layers.Input(shape=(X.shape[1],))
hidden_layer = tf.keras.layers.Dense(units=100,activation='elu')(input_layer)
do_layer = tf.keras.layers.Dropout(rate=0.20)(hidden_layer,training=True)
output_layer = tf.keras.layers.Dense(units=10,activation='sigmoid')(do_layer)
model = tf.keras.models.Model(input_layer,output_layer)
model.compile(loss=MC_Loss,optimizer='Adam')
model.fit(X,y,epochs=100,batch_size=128,shuffle=True)
The current shape of y_true and y_pred are (None,10) with None being the batch_size. I want to be able to have n values for each sample in the batch, so I can get the mean and standard deviation for each sample to use in the loss function. I want these value, because the mean and standard deviation should be unique to each sample, not taken across all samples in a batch. The current shape of mu and sigma are (10,) and I would want them to be (None,10) which would mean y_true and y_pred have the shape (None,n,10).
How can I accomplish this?
I believe I found the solution after some experimentation. The modified code is shown below.
import numpy as np
import tensorflow as tf
n = 100
X = np.random.rand(5000,24)
X1 = np.concatenate(([X.reshape(X.shape[0],1,X.shape[1]) for _ in range(n)]),axis=1)
y = np.random.rand(5000,10)
y1 = np.concatenate(([y.reshape(y.shape[0],1,y.shape[1]) for _ in range(n)]),axis=1)
def MC_Loss(y_true,y_pred):
mu = tf.math.reduce_mean(y_pred,axis=1)
obs = tf.math.reduce_mean(y_true,axis=1)
error = tf.square(obs-mu)
var = tf.math.reduce_variance(y_pred,axis=1)
return tf.math.divide(error,var)/2 + tf.math.log(var)/2 + tf.math.log(2*np.pi)/2
input_layer = tf.keras.layers.Input(shape=(X.shape[1]))
hidden_layer = tf.keras.layers.Dense(units=100,activation='elu')(input_layer)
do_layer = tf.keras.layers.Dropout(rate=0.20)(hidden_layer,training=True)
output_layer = tf.keras.layers.Dense(units=10,activation='sigmoid')(do_layer)
model = tf.keras.models.Model(input_layer,output_layer)
model.compile(loss=MC_Loss,optimizer='Adam')
model.fit(X1,y1,epochs=100,batch_size=128,shuffle=True)
So what I am now doing is stacking the inputs and outputs about an intermediate axis, creating n identical sets of all input and output samples. While tensorflow shows a warning because the model is created without knowledge of this intermediate axis. It still trains with no issues and the shapes are as expected.
Note: since y_true now has the shape (None,n,10), you have to take the mean about the intermediate axis which gives you the true value since all n are identical.
I was wondering if possible to train the inputs of neural network part by part. For example, suppose that I have neural network of inputs 256, and output of 256. what I am asking is about the possibility to take groups where each group contains only 16 out of 265 of the inputs in order to be predicted based on a single model trained independently and then concatenate the whole groups at final outputs.
For example, the below example is provided :
from matplotlib import pyplot as plt
import tensorflow as tf
tf.reset_default_graph()
x_train = [[0.,0.],[1.,1.],[1.,0.],[0.,1.]]
y_train = [[0.],[0.],[1.],[1.]]
x_test = [[0.,0.],[.5,.5],[.5,0.],[0.,.5]]
y_test = [[0.],[0.],[2.],[2.]]
# use placeholder instead so you can have different inputs
x = tf.placeholder('float32', [None, 2])
y = tf.placeholder('float32',)
# Layer 1 = the 2x3 hidden sigmoid
m1 = tf.Variable(tf.random_uniform([2,3], minval=0.1, maxval=0.9, dtype=tf.float32))
b1 = tf.Variable(tf.random_uniform([3], minval=0.1, maxval=0.9, dtype=tf.float32))
h1 = tf.sigmoid(tf.matmul(x, m1) + b1)
# Layer 2 = the 3x1 sigmoid output
m2 = tf.Variable(tf.random_uniform([3,1], minval=0.1, maxval=0.9, dtype=tf.float32))
b2 = tf.Variable(tf.random_uniform([1], minval=0.1, maxval=0.9, dtype=tf.float32))
y_out = tf.sigmoid(tf.matmul(h1, m2) + b2)
### loss
# loss : sum of the squares of y0 - y_out
loss = tf.reduce_sum(tf.square(y - y_out))
# training step : gradient decent (1.0) to minimize loss
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
# the two feed dictionaries
feeddict_train = {x: x_train, y: y_train}
feeddict_test = {x: x_test, y: y_test}
### training
# run 500 times using all the X and Y
# print out the loss and any other interesting info
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
train_loss, test_loss = [], []
for step in range(500):
loss_train, _ = sess.run([loss, train], feed_dict=feeddict_train)
train_loss.append(loss_train)
# under the same tensorflow graph (in the session), use another feed dictionary
loss_test = sess.run(loss, feed_dict=feeddict_test)
test_loss.append(loss_test)
plt.plot(train_loss, 'r', label='train_loss')
plt.plot(test_loss, 'b', label='test_loss')
plt.legend(loc='best')
here in this command loss_test = sess.run(loss, feed_dict=feeddict_test), the whole inputs feeddict_test will be
taken and trained. what's about if I want to take it into two groups each groub contain only 2 items out of the available
4, and then test them indpendentaly and contencate the outputs, is that possible ??
How can I do that? could you please help me in doing that if possible?
thank you in advance.
There are few ways your question can be interpreted due to the inaccuracy of your question.
First interpretation:
If what you're asking is that if your neural network receives an input vector of size 256 and outputs a vector of size 256, then the answer is no, you can't input a part of the vector as input and expect it to work.
Second interpretation:
If what you're asking is that if you have 256 data (each data is an n-sized vector) and you want to train the network by inputting the first 16, then the second 16, and so on until the 16th 16, yes it is very much possible. Based on the example code you've given, all you need to do is make a for loop that loops 2 times (because in your example, there are 4 data and you want to input them in a group of 2) and,
Change these lines of code:
for step in range(500):
loss_train, _ = sess.run([loss, train], feed_dict=feeddict_train)`
to
for step in range(500):
temp_list = [] #an empty list
for i in range(0,4,2):
loss_train, _ = sess.run([loss, train], feed_dict={x:x_train[i:i+2], y:y_train[i:i+2]}
temp_list.append(loss_train) #append the loss of the network for each group of data.
These will allow the network to be trained with two groups of data independently and learn from them. You can simply make an empty list before the new for loop and concatenate the outputs in it.
Hope this helps. Do let me know if I understood your questions wrongly. Cheers.
I am developing a Tensorflow network based on their MNIST for beginners template. Basically, I am trying to implement a simple logistic regression in which 10 continuous variables predict a binary outcome, so my inputs are 10 values between 0 and 1, and my target variable (Y_train and Y_test in the code) is a 1 or 0.
My main problem is that there is no change in accuracy no matter how many training sets I run -- it is 0.276667 whether I run 100 or 31240 steps. Additionally, when I switch from the softmax to simply matmul to generate my Y values, I get 0.0 accuracy, which suggests there may be something wrong with my x*W + b calculation. The inputs read out just fine.
What I'm wondering is a) whether I'm not calculating Y values properly because of an error in my code and b) if that's not the case, is it possible that I need to implement the one_hot vectors -- even though my output already takes the form of 0 or 1. If the latter is the case, where do I include the one_hot=TRUE function in my generation of the target values vector? Thanks!
import numpy as np
import tensorflow as tf
train_data = np.genfromtxt("TRAINDATA2.txt", delimiter=" ")
train_input = train_data[:, :10]
train_input = train_input.reshape(31240, 10)
X_train = tf.placeholder(tf.float32, [31240, 10])
train_target = train_data[:, 10]
train_target = train_target.reshape(31240, 1)
Y_train = tf.placeholder(tf.float32, [31240, 1])
test_data = np.genfromtxt("TESTDATA2.txt", delimiter = " ")
test_input = test_data[:, :10]
test_input = test_input.reshape(7800, 10)
X_test = tf.placeholder(tf.float32, [7800, 10])
test_target = test_data[:, 10]
test_target = test_target.reshape(7800, 1)
Y_test = tf.placeholder(tf.float32, [7800, 1])
W = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([1]))
Y_obt = tf.nn.softmax(tf.matmul(X_train, W) + b)
Y_obt_test = tf.nn.softmax(tf.matmul(X_test, W) + b)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_obt,
labels=Y_train)
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(31240):
sess.run(train_step, feed_dict={X_train: train_input,
Y_train:train_target})
correct_prediction = tf.equal(tf.round(Y_obt_test), Y_test)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={X_test : test_input, Y_test:
test_target}))
Since you map your values to a target with one element, you should not use softmax cross entropy, since the softmax operation transforms the input into a probability distribution, with the sum of all probabilities equal to 1. Since your target has only one element, it will simply output 1 everytime, since this is the only possible way to transform the input into a probability distribution.
You should instead use tf.nn.sigmoid_cross_entropy_with_logits() (which is used for binary classification) and also remove the softmax from Y_obt and convert it into tf.sigmoid() for Y_obt_test.
Another way is to one-hot encode your targets and use a network with a two-element output. In this case, you should use tf.nn.softmax_cross_entropy_with_logits(), but remove the tf.nn.softmax() from Y_obt, since the softmax cross entropy expects unscaled logits (https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits). For the Y_obt_test, you should of course not remove it in this case.
Another thing: It might also help to take the mean of the cross entropies with cross_entropy = tf.reduce_mean(tf.sigmoid_cross_entropy_...).
import tensorflow as tf
x = tf.placeholder(tf.float32, [None,4]) # input vector
w1 = tf.Variable(tf.random_normal([4,2])) # weights between first and second layers
b1 = tf.Variable(tf.zeros([2])) # biases added to hidden layer
w2 = tf.Variable(tf.random_normal([2,1])) # weights between second and third layer
b2 = tf.Variable(tf.zeros([1])) # biases added to third (output) layer
def feedForward(x,w,b): # function for forward propagation
Input = tf.add(tf.matmul(x,w), b)
Output = tf.sigmoid(Input)
return Output
>>> Out1 = feedForward(x,w1,b1) # output of first layer
>>> Out2 = feedForward(Out1,w2,b2) # output of second layer
>>> MHat = 50*Out2 # final prediction is in the range (0,50)
>>> M = tf.placeholder(tf.float32, [None,1]) # placeholder for actual (target value of marks)
>>> J = tf.reduce_mean(tf.square(MHat - M)) # cost function -- mean square errors
>>> train_step = tf.train.GradientDescentOptimizer(0.05).minimize(J) # minimize J using Gradient Descent
>>> sess = tf.InteractiveSession() # create interactive session
>>> tf.global_variables_initializer().run() # initialize all weight and bias variables with specified values
>>> xs = [[1,3,9,7],
[7,9,8,2], # x training data
[2,4,6,5]]
>>> Ms = [[47],
[43], # M training data
[39]]
>>> for _ in range(1000): # performing learning process on training data 1000 times
sess.run(train_step, feed_dict = {x:xs, M:Ms})
>>> print(sess.run(MHat, feed_dict = {x:[[1,3,9,7]]}))
[[ 50.]]
>>> print(sess.run(MHat, feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
>>> print(sess.run(tf.transpose(MHat), feed_dict = {x:[[1,15,9,7]]}))
[[ 50.]]
In this code, I am trying to predict the marks M of a student out of 50 given how many hours he/she slept, studied, used electronics and played. These 4 features come under the input feature vector x.
To solve this regression problem, I am using a deep neural network with
an input layer with 4 perceptrons (the input features) , a hidden layer with two perceptrons and an output layer with one perceptron. I have used sigmoid as activation function. But, I am getting the exact same prediction([[50.0]]) for M for all possible input vectors I feed in. Can someone please tell me
what is wrong with the code below. I HIGHLY APPRECIATE THE HELP! (IN ADVANCE)
You would need to modify your feedforward() function. Here you don't need to apply sigmoid() at last layer (simply return the activation function!) and also no need to multiply output of this function by 50.
def feedForward(X,W1,b1,W2,b2):
Z=tf.sigmoid(tf.matmul(X,W1)+b1)
return tf.matmul(Z,W2)+b2
MHat = feedForward(x,w1,b1,w2,b2)
Hope this helps!
Don't forget to let us know if it solved your problem :)
I'm trying to impelement this article:
http://ronan.collobert.com/pub/matos/2008_deep_icml.pdf
Specfically the equation (3) from section 2.
Shortly I want to do a pairwise distance computation for the features of each mini-batch and insert this loss to the general network loss.
I have only the Tesnor of the batch (16 samples), the labels tensor of the batch and the batch feature Tensor.
After looking for quite a while I still couldn't figure out the following:
1) How do I divide the batch for Positive (i.e. same label) and negative pairs. Since Tensor are not iterateble I can't figure out how to get which sample have which label and then divide my vector, or get which indices of the tensor belong to each class.
2) How can I do pairwise distance calculation for some of the indices in the batch tensor?
3) I also need to define a new distance function for negative examples
Overall, I need to get which indices belong to which class, do a positive pair-wise distace calculation for all positive pairs. And do another calculation for all negative pairs. Then sum it all up and add it to the network loss.
Any help (to one of more of the 3 issues) would be highly appreciated.
1)
You should do the pair sampling before feeding the data into a session. Label every pair a boolean label, say y = 1 for matched-pair, 0 otherwise.
2) 3) Just calculate both pos/neg terms for every pair, and let the 0-1 label y to choose which to add to the loss.
First create placeholders, y_ is for boolean labels.
dim = 64
x1_ = tf.placeholder('float32', shape=(None, dim))
x2_ = tf.placeholder('float32', shape=(None, dim))
y_ = tf.placeholder('uint8', shape=[None]) # uint8 for boolean
Then the loss tensor can be created by the function.
def loss(x1, x2, y):
# Euclidean distance between x1,x2
l2diff = tf.sqrt( tf.reduce_sum(tf.square(tf.sub(x1, x2)),
reduction_indices=1))
# you can try margin parameters
margin = tf.constant(1.)
labels = tf.to_float(y)
match_loss = tf.square(l2diff, 'match_term')
mismatch_loss = tf.maximum(0., tf.sub(margin, tf.square(l2diff)), 'mismatch_term')
# if label is 1, only match_loss will count, otherwise mismatch_loss
loss = tf.add(tf.mul(labels, match_loss), \
tf.mul((1 - labels), mismatch_loss), 'loss_add')
loss_mean = tf.reduce_mean(loss)
return loss_mean
loss_ = loss(x1_, x2_, y_)
Then feed your data (random generated for example):
batchsize = 4
x1 = np.random.rand(batchsize, dim)
x2 = np.random.rand(batchsize, dim)
y = np.array([0,1,1,0])
l = sess.run(loss_, feed_dict={x1_:x1, x2_:x2, y_:y})
Short answer
I think the simplest way to do that is to sample the pairs offline (i.e. outside of the TensorFlow graph).
You create tf.placeholder for a batch of pairs along with their labels (positive or negative, i.e. same class or different class), and then you can compute in TensorFlow the corresponding loss.
With the code
You sample the pairs offline. You sample batch_size pairs of inputs, and output the batch_size left elements of the pairs of shape [batch_size, input_size]. You also output the labels of the pairs (either positive of negative) of shape [batch_size,]
pairs_left = np.zeros((batch_size, input_size))
pairs_right = np.zeros((batch_size, input_size))
labels = np.zeros((batch_size, 1)) # ex: [[0.], [1.], [1.], [0.]] for batch_size=4
Then you create Tensorflow placeholders corresponding to these inputs. In your code, you will feed the previous inputs to these placeholders in the feed_dict argument of sess.run()
pairs_left_node = tf.placeholder(tf.float32, [batch_size, input_size])
pairs_right_node = tf.placeholder(tf.float32, [batch_size, input_size])
labels_node = tf.placeholder(tf.float32, [batch_size, 1])
Now we can perform a feedforward on the inputs (let's say your model is a linear model).
W = ... # shape [input_size, feature_size]
output_left = tf.matmul(pairs_left_node, W) # shape [batch_size, feature_size]
output_right = tf.matmul(pairs_right_node, W) # shape [batch_size, feature_size]
Finally we can compute the pairwise loss.
l2_loss_pairs = tf.reduce_sum(tf.square(output_left - output_right), 1)
positive_loss = l2_loss_pairs
negative_loss = tf.nn.relu(margin - l2_loss_pairs)
final_loss = tf.mul(labels_node, positive_loss) + tf.mul(1. - labels_node, negative_loss)
And that's it ! You can now optimize on this loss, with a good offline sampling.