Is it possible to get intermediate layer output after TensorRT? - python

We have a trained network for a classification task. The top of the network ends with a fully connected layer relu_fc1 followed by a softmax, so relu_fc1 is effectively the extracted features and the softmax produces the class prediction.
Now we want to extract these features directly. In the normal case, we can do it with:
y = sess.graph.get_tensor_by_name('relu_fc1:0')
sess.run(y, ...)
That's great, but we still want to make it faster, so we use TensorRT to convert the saved model. However, after the conversion we can no longer find the relu_fc1 tensor, because TensorRT fuses the operations and replaces them with nodes like TRTEngineOp_1.
I want to know whether there is a way to get an intermediate layer's output after TensorRT. I suspect it might be easier to delete the last layers of the network and then do the conversion, but I can't find practical material on removing layers in TensorFlow.

I want to know whether there is a way to get an intermediate layer's output after TensorRT. I suspect it might be easier to delete the last layers of the network and then do the conversion, but I can't find practical material on removing layers in TensorFlow.
For this question: when you do the TF-to-ONNX conversion, you can specify which layer is the final output of the ONNX model. Then you can do the ONNX-to-TensorRT conversion.
For more details, see tensorflow-onnx. The --outputs parameter is what you want.
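A minimal sketch of the two conversion steps (model paths are placeholders; the tensor name relu_fc1:0 comes from the question above):

# Step 1: TF -> ONNX, cutting the graph at relu_fc1
python -m tf2onnx.convert --saved-model ./saved_model --output model.onnx --outputs relu_fc1:0
# Step 2: ONNX -> TensorRT engine, e.g. with NVIDIA's trtexec tool
trtexec --onnx=model.onnx --saveEngine=model.plan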

Related

Can I use a convolution filter instead of a dense layer for classification?

I was reading a decent paper, S-DCNet, and came across a section (page 3, Table 1, classifier) where a convolution layer is used on the feature map to produce a binary classification output as part of an internal process. Since I am a noob, and when someone talks to me about classification I automatically think of FC layers combined with softmax, I started wondering: is this a possible thing to do? Can a convolutional layer indeed be used to classify a binary outcome? The whole concept triggered my imagination so much that I insist on getting answers...
Honestly, how does this actually work? What is the difference between using a convolution filter instead of a fully connected layer for classification purposes?
Edit (uncertain answer on how it works): I asked a colleague, and he told me that using a filter of the same spatial shape (height and width) as the feature map at the current stage may lead to a learnable binary output (provided you also reduce the number of channels of the feature map to a single channel). But I still don't understand the motivation behind such a technique.
Using convolutions as FCs can be done (for example) with filters of spatial size (1,1) and with depth of the same size as the FC input size.
The resulting feature map would be of the same size as the input feature map, but each pixel would be the output of a "FC" layer whose weights are the weights of the shared 1x1 conv filter.
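As a minimal sketch (shapes and layer sizes are assumptions, using tf.keras for illustration):

import tensorflow as tf

# A 1x1 convolution acting as a per-pixel "FC" layer over a
# feature map with 512 channels; weights are shared across positions.
inputs = tf.keras.Input(shape=(None, None, 512))
logits = tf.keras.layers.Conv2D(filters=1, kernel_size=1)(inputs)  # one binary logit per pixel
model = tf.keras.Model(inputs, logits)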
This kind of thing is used mainly for semantic segmentation, meaning classification per pixel. U-net is a good example if memory serves.
Also see this.
Also note that 1x1 convolutions have other uses as well.
Also see paperswithcode; some of the networks there probably use this trick.

Can CNN autoencoders have different input and output dimensions?

I am working on a problem which requires me to build a deep learning model that, given an input image, outputs another image. It is worth noting that the two images are conceptually related, but they do not have the same dimensions.
At first I thought that a classical CNN with a final dense layer whose size is the product of the height and width of the output image would suit this case, but during training it was giving strange results, such as an accuracy of 0.
While looking for answers on the Internet, I discovered the concept of CNN autoencoders, and I was wondering whether this approach could help me solve my problem. In all the examples I saw, the input and output of the autoencoder had the same size and dimensions.
At this point I wanted to ask whether there is a type of CNN autoencoder that produces an output image with different dimensions than the input image.
An autoencoder (AE) is an architecture that tries to encode your image into a lower-dimensional representation by simultaneously learning to reconstruct the data from that representation. AEs are therefore unsupervised (no labels needed): the same data is used both as the input and as the target (in the loss).
You can try using a U-net based architecture for your use case. A U-net forwards intermediate data representations to later layers of the network, which should assist with faster learning/mapping of the inputs into a new domain.
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which may or may not be enough for your use case.
If you want to dig a little deeper, you can look into Disco-GAN and related methods. They explicitly try to map images into a new domain while maintaining image information.
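As a minimal sketch of the general idea (all shapes and layer sizes are assumptions, using tf.keras for illustration), an encoder-decoder can simply upsample to a spatial size different from the input:

import tensorflow as tf
from tensorflow.keras import layers

# Encoder: 64x64x3 input downsampled twice to 16x16
inputs = tf.keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inputs)  # 32x32
x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)       # 16x16
# Decoder: upsample only once, so the output is 32x32 (different from the input)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)  # 32x32
outputs = layers.Conv2D(3, 3, padding='same', activation='sigmoid')(x)  # 32x32x3
model = tf.keras.Model(inputs, outputs)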

tensorflow save and restore autoencoder

I used tf.layers.dense to build a fully connected autoencoder, and I want to save it and restore only the encoder to get the embedding output.
How do I use tf.train.Saver to restore only the encoder? I ask because I want to use a different batch size in the restored model, so I can feed it a single example.
I have seen many tutorials, but none about this.
Is there any standard solution for this?
Thank you very much
If you don't care about disk space, the easiest way is to save the whole graph (encoder and decoder); when using it for prediction, you can pass the last layer of the encoder as the fetch argument. TensorFlow will only compute up to that point, so there is no computational difference compared to saving only the encoder.
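A minimal sketch of this approach (checkpoint path and tensor names are assumptions; TF1-style API as in the question):

import numpy as np
import tensorflow as tf

single_example = np.random.rand(1, 784).astype(np.float32)  # dummy input

with tf.Session() as sess:
    # Restore the whole autoencoder graph from the checkpoint
    saver = tf.train.import_meta_graph('autoencoder.ckpt.meta')
    saver.restore(sess, 'autoencoder.ckpt')
    inputs = sess.graph.get_tensor_by_name('inputs:0')
    embedding = sess.graph.get_tensor_by_name('encoder_out:0')
    # Fetch only the encoder output; the decoder is never evaluated.
    # A different batch size works if the placeholder's batch dim is None.
    code = sess.run(embedding, feed_dict={inputs: single_example})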
Otherwise, you can create two graphs (one for the encoder, one for the decoder) and train them together, but this is a bit more complex.

Multilabel classification using LSTM on variable length signal using Keras

I have recently started working on classifying ECG signals into various classes. It is basically a multi-label classification task (4 classes in total). I am new to deep learning, LSTMs and Keras, which is why I am confused about a few things.
I am thinking about giving the normalized original signal as input to the network; is this a good approach?
I also need to understand the training input shape for the LSTM, as the ECG signals are of variable length (9000 to 18000 samples) and classifiers usually need fixed-length input. How can I handle this type of input with an LSTM?
Finally, what should the structure of a deep LSTM network be for such long inputs, and how many layers should I use?
Thanks for your time.
Regards
I am thinking about giving the normalized original signal as input to the network; is this a good approach?
Yes, this is a good approach. It is actually quite standard in deep learning to normalize or rescale the inputs.
This usually helps your model converge faster, since you are now within a smaller range (e.g. [-1, 1]) instead of the larger un-normalized range of your original input (say [0, 1000]). It also helps you get better, more precise results, as it mitigates problems like vanishing gradients and works better with modern activation functions and optimizers.
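A minimal sketch of one such rescaling (the exact scheme is a choice; per-signal scaling into [-1, 1] is just one option):

import numpy as np

def normalize(signal):
    # Center the signal, then scale it into [-1, 1]
    s = signal - signal.mean()
    return s / (np.abs(s).max() + 1e-8)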
I also need to understand the training input shape for the LSTM, as the ECG signals are of variable length (9000 to 18000 samples) and classifiers usually need fixed-length input. How can I handle this type of input with an LSTM?
This part is really important. You are correct: an LSTM expects to receive inputs with a fixed shape, one you know beforehand (in fact, any deep learning layer expects fixed-shape inputs). This is also explained in the Keras docs on recurrent layers, where they say:
Input shape
3D tensor with shape (batch_size, timesteps, input_dim).
As we can see, it expects your data to have a number of timesteps, as well as a number of features at each of those timesteps (the batch size is usually 1). For example, suppose your input data consists of elements like [[1,4],[2,3],[3,2],[4,1]]. Then, using a batch_size of 1, the shape of your data would be (1, 4, 2), as you have 4 timesteps, each with 2 features.
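To verify the shape from that example:

import numpy as np

x = np.array([[[1, 4], [2, 3], [3, 2], [4, 1]]], dtype=np.float32)
print(x.shape)  # (1, 4, 2): batch_size=1, timesteps=4, input_dim=2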
So, bottom line: you have to make sure that you pre-process your data so it has a fixed shape you can then pass to your LSTM layers. You will have to work this out yourself, as you know your data and problem better than we do.
Maybe you can fix the number of samples you use from each signal, discarding some and keeping others so every signal is of the same length (since your signals are between 9000 and 18000 samples, truncating everything to 9000 could be the logical choice, discarding the extra samples from the longer ones). You could also apply some other transformation that maps inputs of 9000-18000 samples to a fixed size, as in the sketch below.
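A minimal sketch of this truncation (the signals here are dummies; pad_sequences also zero-pads anything shorter than maxlen):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

signals = [np.random.randn(n) for n in (9000, 12000, 18000)]  # dummy variable-length ECG signals
fixed = pad_sequences(signals, maxlen=9000, dtype='float32',
                      padding='post', truncating='post')
batch = fixed[..., np.newaxis]  # shape (3, 9000, 1): (batch_size, timesteps, input_dim)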
Finally, what should the structure of a deep LSTM network be for such long inputs, and how many layers should I use?
This one is really quite broad and doesn't have a unique answer. It depends on the nature of your problem, and determining those parameters a priori is not straightforward.
What I recommend is to start with a simple model first, and then add layers and blocks (neurons) incrementally until you are satisfied with the results.
Try just one hidden layer first, train and test your model, and check its performance. You can then add more blocks and see if performance improves. You can also add more layers and check again, until you are satisfied.
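As a concrete starting point (layer sizes are assumptions; sigmoid outputs with binary cross-entropy suit the multi-label setting):

from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.LSTM(64, input_shape=(9000, 1)),  # one hidden LSTM layer to start
    layers.Dense(4, activation='sigmoid'),   # 4 independent label probabilities
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])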
This is a good way to create deep learning models, as you will arrive at the results you want while keeping your network as lean as possible, which in turn helps with execution time and complexity. Good luck with your coding; hope you find this useful.

PyBrain Training Multiple Output Modules

I would like to train a network with multiple output layers.
in -> hidden -> out 1
             -> out 2
Is this possible? If so, how do I set up the datasets and trainer to accomplish this training?
As you are looking into splitting your output in order to have several softmax regions, you can use the PartialSoftmaxLayer provided by PyBrain.
Note that it is limited to slices of the same length, but its code can inspire you if you require a custom output layer:
https://github.com/pybrain/pybrain/blob/master/pybrain/structure/modules/softmax.py
No. You can have multiple hidden layers, like this:
in -> hidden 1 -> hidden 2 -> out
Alternatively, you can have multiple output neurons (in a single output layer).
Technically, you can set up any arrangement of neurons and layers, connect them however you like, and call them whatever you want, but the above is the general way of doing it.
It would be more work for you as the programmer, but if you want two different outputs, you can always concatenate them into one vector and use that as the network's output.
in --> hidden --> concatenate([out1, out2])
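A minimal sketch of this in PyBrain (layer sizes are assumptions; the single output layer holds both outputs concatenated, e.g. 3 units for out 1 and 2 units for out 2):

from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection

net = FeedForwardNetwork()
inp = LinearLayer(4)
hidden = SigmoidLayer(8)
out = LinearLayer(5)  # out 1 (3 units) and out 2 (2 units), concatenated
net.addInputModule(inp)
net.addModule(hidden)
net.addOutputModule(out)
net.addConnection(FullConnection(inp, hidden))
net.addConnection(FullConnection(hidden, out))
net.sortModules()  # finalize the network topology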
A possibly significant drawback of this approach is that if the two outputs have different scales, concatenation will distort the error metric you use to train the network.
However, if you were able to use two separate outputs, you would still need to solve this problem, likely by somehow weighting the two error metrics.
Potential solutions to this problem could include defining a custom error metric (e.g., by using a variant of weighted squared error or weighted cross-entropy) and/or standardizing the two output datasets so that they exist in a common scale.
