When I have some trainable parameters, say layer.trainable_weights. I want to sort those weights before feed into other operations, is it possible for me to do that? Can I use something like
import tensorflow as tf
p = layer.trainable_weights
p = tf.sort(p)
or are there any particular ways in Keras?
I'm new to Keras and TensorFlow. Really appreciate if someone can answer my questions, thanks in advance!
EDIT:
For "other operations", I want to feed those sorted trainable weights into another neural network, but that neural network is fixed (not trainable). So what I want to do is something like
import tensorflow as tf
p = model.layer[0].trainable_weights
p = tf.sort(p)
another_model.trainable = False
x = another_model(p)
# x is involved in the loss function of the original model
Hope this is clear, and hope anyone can help me!
(also, can I just use x=another_model.predict(p) instead of x=another_model(p) above?)
Of course, you can do it. Since you have not stated clearly what these downstreaming operations are, it is more difficult to answer your question.
If you only want to do something to monitor the training process, e.g. monitoring a custom metric to measure the cumulative distribution function of the weight matrix of interest, feel free to use tf.sort (you may use K.stop_gradient(W) before you sort it to further ensure no gradient is flowed back from any downstreaming process).
If you want to do things other than monitoring, e.g. computing a custom regularization term to play a part in training optimization, you should rethink how to implement this part both efficiently and effectively, but directly using tf.sort does not help! Why? Because this function is not implemented with back-propagation, (you might want to double check on this to see whether the latest version supports this feature)
Related
I have written a code that computes Choquet pooling in a Custom Layer in Keras. Below the Colab link to the notebook:
https://colab.research.google.com/drive/1lCrUb2Jm680JRnACPxWpxkOSkP_DlHGj
As you can the code crashes in gradient computation, precisely inside the function custom_grad. This is impossible because I'm returning 0 gradients with the same shape as the previous layer.
So I have 2 questions:
Is in Keras (or in Tensorflow) a way to compute gradient between the layer input and its output?
If I have passed a Tensor with the same shape as the previous layer, but filled with 0s, why the code is not working?
Thanks for your attention and I'm waiting for your help.
Thanks in advance
No one is interested in that question.
After several trials, I have found a solution. The problem is that, as posted by Mainak431 in this GitHub repo:
link to diff and non-diff ops in tensorflow
There are differentiable TensorFlow operations and non-differentiable operations. In the Colab notebook, I used, as an example, scatter_nd_update that is non-differentiable.
So I suggest, if you want to create your own Custom Layer in Keras to take a look at the above lists in order to use operations that allow Keras to auto-differentiate for you.
Anyway, I'm working on it to inform as much as possible on that open research topic. I remember that with the neural network the "LEGO-ing" is borderline, and I know for sure that many of you are interested in adding your operations(aggregation or something else) in a deep neural network model.
Special Thanks to Maniak431, I love you <3
I have created a prediction model and used RNN in it offered by the tensorflow library in Python. Here is the complete code I have created and tried:
Jupyter Notbook of the Code
But I have doubts.
1) Whether RNN is correct for what I am trying to predict?
2) Is there a better algorithm I can try?
3) Can anyone suggest me how I can give multiple inputs and get the necessary output using tensorflow model? Can anyone guide me please.
I hope I am clear on my points. Please do tell me if anything else required.
Having doubts is normal, but you should try to measure them before asking for advice. If you don't have a clear thing you want to improve it's unlikely you will get something better.
1) Whether RNN is correct for what I am trying to predict?
Yes. RNN is used appropriately here. If you don't care much about having arbitrary length input sequences, you can also try to force them to a fixed size and then apply convolutions on top (see convolutional NeuralNetworks), or even try with a more simple DNN.
The more important question to ask yourself is if you have the right inputs and if you have sufficient training data to learn what you hope to learn.
2) Is there a better algorithm I can try?
Probably no. As I said RNN seems appropriate for this problem. Do try some hyper parameter tuning to make sure you don't accidentally just pick a sub-optimal configuration.
3) Can anyone suggest me how I can give multiple inputs and get the necessary output using tensorflow model? Can anyone guide me please.
The common way to handle variable length inputs is to set a max length and pad the shorter examples until they reach that length. The max length can be a variable you pick or you can dynamically set it to the largest length in the batch. This is needed only because the internal operations are done in batches. You can pick which results you want. Picking the last one is reasonable (the model will just have to learn to propagate the state for the padding values). Another reasonable thing to do is to pick the first one you get after feeding the last meaningful value into the RNN.
Looking at your code, there's one thing I would improve:
Instead of computing a loss on the last value only, I would compute it over all values in the series. This gives your model more training data with very little performance degradation.
I'm in need of an artificial neural network library (preferably in python) for one (simple) task. I want to train it so that it can tell wether a thing is in an image. I would train it by feeding it lots of pictures and telling it wether it contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle I find it quite hard to find an adequate library for this task, and even research in this field has proven to be kind of a frustration - any advice towards the right direction is greatly appreciated.
There are several good Neural Network approaches in Python, including TensorFlow, Caffe, Lasagne, and sknn (Sci-kit Neural Network). sknn provides an easy, out of the box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will mostly likely have to "unroll" your images into a vector whereas with a CNN, it expects the image to be a cube (if in color, a square otherwise).
Here is a good resource on CNNs in Python.
However, since you aren't really doing a multiclass image classification (for which CNNs are the current gold standard) and doing more of a single object recognition, you may consider a transformed image approach, such as one using the Histogram of Oriented Gradients (HOG).
In any case, the accuracy of a Neural Network approach, especially when using CNNs, is highly dependent on successful hyperparamter tuning. Unfortunately, there isn't yet any kind of general theory on what hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to have a nice Training, Validation, and Test set setup in order to fit a robust model.
I am unaware of any library which can do this for you. I use a lot of Caffe and can give you a solution till you find a single library which can do it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet trained model, change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is in the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label. In line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE according to it.
I understand that you are looking for a specific library, I will look for it, but this is something I would try out in the beginning.
I'm trying to write something similar to google's wide and deep learning after running into difficulties of doing multi-class classification(12 classes) with the sklearn api. I've tried to follow the advice in a couple of posts and used the tf.group(logistic_regression_optimizer, deep_model_optimizer). It seems to work but I was trying to figure out how to get predictions out of this model. I'm hoping that with the tf.group operator the model is learning to weight the logistic and deep models differently but I don't know how to get these weights out so I can get the right combination of the two model's predictions. Thanks in advance for any help.
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Cs0R75AGi8A
How to set layer-wise learning rate in Tensorflow?
tf.group() creates a node that forces a list of other nodes to run using control dependencies. It's really just a handy way to package up logic that says "run this set of nodes, and I don't care about their output". In the discussion you point to, it's just a convenient way to create a single train_op from a pair of training operators.
If you're interested in the value of a Tensor (e.g., weights), you should pass it to session.run() explicitly, either in the same call as the training step, or in a separate session.run() invocation. You can pass a list of values to session.run(), for example, your tf.group() expression, as well as a Tensor whose value you would like to compute.
Hope that helps!
I try to get reliable features for ImageNet to do further classification on them. To achieve that I would like to use tensorflow with Alexnet, for feature extraction. That means I would like to get the values from the last layer in the CNN. Could someone write a piece of Python code that explains how that works?
As jonrsharpe mentioned, that's not really stackoverflow's MO, but in practice, many people do choose to write code to help explain answers (because it's often easier).
So I'm going to assume that this was just miscommunication, and you really intended to ask one of the following two questions:
How does one grab the values of the last layer of Alexnet in TensorFlow?
How does feature extraction from the last layer of a deep convolutional network like alexnet work?
The answer to the first question is actually very easy. I'll use the cifar10 example code in TensorFlow (which is loosely based on AlexNet) as an example. The forward pass of the network is built in the inference function, which returns a variable representing the output of the softmax layer. To actually get predicted image labels, you just argmax the logits, like this: (I've left out some of the setup code, but if you're already running alexnet, you already have that working)
logits = cifar10.inference(images)
predictions = tf.argmax(logits,1)
# Actually run the computation
labels = session.run([predictions])
So grabbing just the last layer features is literally just as easy as asking for them. The only wrinkle is that, in this case, cifar10 doesn't natively expose them, so you need to modify the cifar10.inference function to return both:
# old code in cifar10.inference:
# return softmax_linear
# new code in cifar10.inference:
return softmax_linear, local4
And then modify all the calls to cifar10.inference, like the one we just showed:
logits,local4 = cifar10.inference(images)
predictions = tf.argmax(logits,1)
# Actually run the computation, this time asking for both answers
labels,last_layer = session.run([predictions, local4])
And that's it. last_layer contains the last layer for all of the inputs you gave the model.
As for the second question, that's a much deeper question, but I'm guessing that's why you want to work on it. I'd suggest starting by reading up on some of the papers published in this area. I'm not an expert here, but I do like Bolei Zhou's work. For instance, try looking at Figure 2 in "Learning Deep Features for Discriminative Localization". It's a localization paper, but it's using very similar techniques (and several of Bolei's papers use it).