Is it possible to feed multiple types of data into a single neural network and have it output a particular data type?
For example, can I input an image together with some metadata about that image, and have the network output a number?
I'm thinking along the lines of stitching a CNN and an ANN together.
Thanks in advance!
Yes, that is possible. Actually, that is pretty straightforward.
Commonly, what happens in image classification (for example) is that a so-called feature map (here generated by the last convolutional layer) gets flattened. This flattened tensor is then fed through a feed-forward neural network (NN) to perform the classification task.
The easiest approach would be to add one layer to your network that concatenates two tensors a and b, where a is the flattened output of the last convolutional layer and b is your metadata (ideally also a flattened tensor).
Afterwards, you simply feed the concatenated tensor (containing both the encoding of your image and the metadata) through the feed-forward NN to produce the final classification (or whatever the desired output is).
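A minimal sketch of this in Keras, assuming for illustration a 64x64 RGB image and a 10-dimensional metadata vector (all layer sizes are arbitrary):

    from tensorflow.keras import layers, Model

    # Illustrative inputs: a 64x64 RGB image and a 10-dimensional metadata vector.
    image_in = layers.Input(shape=(64, 64, 3))
    meta_in = layers.Input(shape=(10,))

    # Convolutional branch: encode the image and flatten the feature map.
    x = layers.Conv2D(32, 3, activation='relu')(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation='relu')(x)
    x = layers.Flatten()(x)

    # Concatenate the image encoding with the metadata.
    x = layers.Concatenate()([x, meta_in])

    # Feed-forward head producing a single number (regression).
    x = layers.Dense(64, activation='relu')(x)
    out = layers.Dense(1)(x)

    model = Model(inputs=[image_in, meta_in], outputs=out)
    model.compile(optimizer='adam', loss='mse')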
I was reading a decent paper, S-DCNet, and came across a section (page 3, Table 1, classifier) where a convolutional layer is used on the feature map to produce a binary classification output as part of an internal process. Since I am a noob, whenever someone mentions classification I automatically think of FC layers combined with softmax, so I started wondering: is this actually possible? Can a convolutional layer really be used to classify a binary outcome? The whole concept caught my imagination so much that I insist on getting answers...
Honestly, how does this actually work? What is the difference between using a convolutional filter rather than a fully connected layer for classification purposes?
Edit (uncertain answer on how it works): I asked a colleague and he told me that using a filter with the same height and width as the feature map at that stage may yield a learnable binary output (provided you also reduce the number of channels of the feature map to one). But I still don't understand the motivation behind such a technique.
Using convolutions as FC layers can be done, for example, with filters of spatial size (1,1) whose depth matches the FC input size.
The resulting feature map has the same spatial size as the input feature map, but each pixel is the output of an "FC" layer whose weights are those of the shared 1x1 conv filter.
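A small Keras sketch of both variants (the 7x7x512 feature-map shape is just an illustrative assumption):

    from tensorflow.keras import layers, Model

    # Illustrative: a 7x7 feature map with 512 channels from some CNN backbone.
    feat = layers.Input(shape=(7, 7, 512))

    # A 1x1 convolution acts like a shared FC layer applied at every spatial
    # position: each output pixel is a 2-way score computed from that pixel's
    # 512-dimensional feature vector. Output shape: (None, 7, 7, 2).
    per_pixel_scores = layers.Conv2D(2, kernel_size=1, activation='softmax')(feat)

    # Variant from the question: a filter with the same spatial size as the
    # feature map collapses it to a single learnable logit for binary
    # classification. Output shape: (None, 1, 1, 1).
    single_logit = layers.Conv2D(1, kernel_size=(7, 7), activation='sigmoid')(feat)

    model = Model(feat, [per_pixel_scores, single_logit])
    model.summary()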
This kind of thing is used mainly for semantic segmentation, i.e. classification per pixel. U-Net is a good example, if memory serves.
Also note that 1x1 convolutions have other uses as well.
Browse paperswithcode; some of the networks catalogued there probably use this trick.
We need to implement a new approach that uses a generative deep learning model based on autoencoders to encrypt any type of data. The idea is to use autoencoders to reduce the dimensionality of the data. Is this possible, and how?
Is it possible? Yes! Autoencoders can be a solution for representing one piece of information in another representation.
Autoencoders are an unsupervised learning technique where the goal is to make the output match the input. So what is the use of autoencoders? They have hidden layers, usually with a smaller number of dimensions, so the data is compressed during this phase. You can still reconstruct the original data using the second part of the ANN, the decoder.
The idea is:
original data > input layer > hidden layer (different number of nodes) > output layer > original data
The input and hidden layer: encoder
The hidden layer and output layer: decoder
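A minimal Keras sketch of that layout (the 784/32 layer sizes are just an example, e.g. for flattened 28x28 images):

    from tensorflow.keras import layers, Model

    inp = layers.Input(shape=(784,))                        # original data
    hidden = layers.Dense(32, activation='relu')(inp)       # hidden layer (fewer nodes)
    out = layers.Dense(784, activation='sigmoid')(hidden)   # reconstructed original data

    autoencoder = Model(inp, out)
    encoder = Model(inp, hidden)  # input layer -> hidden layer: the encoder
    autoencoder.compile(optimizer='adam', loss='mse')

    # Unsupervised: the same data serves as both input and target.
    # autoencoder.fit(x_train, x_train, epochs=10)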
Here you can find more information:
https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
One way would be to use a big hidden layer, larger than the data dimension, in order to create an "encrypted" version, but I really don't see the point when you could use classical cryptography instead.
I am working on a problem that requires me to build a deep learning model that, given an input image, outputs another image. It is worth noting that the two images are conceptually related but do not have the same dimensions.
At first I thought that a classical CNN with a final dense layer whose size is the product of the height and width of the output image would suit this case, but during training it produced strange figures, such as an accuracy of 0.
While looking for answers on the Internet, I discovered the concept of CNN autoencoders, and I was wondering whether this approach could help solve my problem. In all the examples I saw, however, the input and output of the autoencoder had the same size and dimensions.
So I wanted to ask: is there a type of CNN autoencoder that produces an output image with dimensions different from those of the input image?
An autoencoder (AE) is an architecture that tries to encode your image into a lower-dimensional representation while simultaneously learning to reconstruct the data from that representation. AEs are therefore unsupervised (no labels needed): the same data is used both as the input and as the target in the loss.
You can try a U-Net-based architecture for your use case. A U-Net forwards intermediate data representations to later layers of the network, which should help with faster learning/mapping of the inputs into a new domain.
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which may or may not be enough for your use case.
If you want to dig a little deeper, look into DiscoGAN and related methods. They explicitly try to map images into a new domain while preserving image information.
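To the concrete question: nothing forces the decoder to mirror the encoder exactly. A hedged Keras sketch, assuming for illustration a 64x64x3 input and a 32x32x1 output (filter counts are arbitrary):

    from tensorflow.keras import layers, Model

    inp = layers.Input(shape=(64, 64, 3))

    # Encoder: downsample 64x64 -> 32x32 -> 16x16.
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)
    x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)

    # Decoder: upsample 16x16 -> 32x32 only, so the output is half the input size.
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    out = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)

    model = Model(inp, out)  # (64, 64, 3) in, (32, 32, 1) out
    model.compile(optimizer='adam', loss='mse')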
I'm working on a recursive autoencoder. The network takes two 2D images, each shaped (28,28,1), and combines them into an input of shape (28,28,2). This is encoded into a (28,28,1) tensor and decoded back into the original (28,28,2) shape, so the encoded data can be fed back into the autoencoder for recursive operation.
We can assume that channel 1 is a new image and channel 2 is previously encoded data. How do I create a loss function that penalises mistakes in reconstructing channel 2 more heavily (as this carries the previously encoded data)?
I am working in Keras, with a Tensorflow back-end.
Alternatively, is there a way to train the network as a complete tree, as opposed to training single two-input/two-output blocks one at a time?
You can split your decoded (28, 28, 2) tensor back into two images as separate outputs and use loss_weights to assign their relative importance. From the documentation:
    model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                  loss_weights=[1., 0.2])
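A hedged sketch of what that split could look like for your shapes (the encoder/decoder conv layers are placeholders for your actual architecture; the Lambda slicing and the two-output compile are the point, and the weights here penalise channel 2 more heavily):

    from tensorflow.keras import layers, Model

    inp = layers.Input(shape=(28, 28, 2))

    # Placeholder encoder/decoder; substitute your actual architecture here.
    encoded = layers.Conv2D(1, 3, padding='same', activation='relu')(inp)         # (28, 28, 1)
    decoded = layers.Conv2D(2, 3, padding='same', activation='sigmoid')(encoded)  # (28, 28, 2)

    # Split the decoded tensor into its two channels as separate outputs.
    img_out = layers.Lambda(lambda t: t[..., 0:1], name='channel_1')(decoded)
    enc_out = layers.Lambda(lambda t: t[..., 1:2], name='channel_2')(decoded)

    model = Model(inp, [img_out, enc_out])
    # Weight channel 2 (previously encoded data) more heavily than channel 1.
    model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                  loss_weights=[1., 2.])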
And yes, all models in Keras behave like layers, so you can chain them together to construct a tree, for example. You can then train the network in one go and decide whether you would like to share weights, etc. However, such a structure can be harder to train. I would recommend using the functional API to create these more complex structures, as it gives you more control.
I am trying to replicate a neural network for depth estimation. The original authors took a pre-trained network and added a 'Superpixel Pooling Layer' between the convolutional layers and the fully connected layer. In this layer, the convolutional feature maps are upsampled and the features are averaged per superpixel.
My problem is that, to achieve this, I need to compute the superpixels for each image. How can I access the data being used by Keras/TensorFlow during batch processing in order to perform SLIC oversegmentation?
I considered splitting the task into pieces, i.e. feeding the images through the convolutional network, processing the outputs separately, and then feeding them into a fully connected layer. However, this makes further end-to-end training of the network impossible.
At this time it seems impossible to actually access the data inside a symbolic tensor. It also seems unlikely that such functionality will be added in the future, since the TensorFlow documentation says:
A Tensor object is a symbolic handle to the result of an operation, but does not actually hold the values of the operation's output.
Keras allows for the creation of custom layers. However, these are limited to the available backend operations. As such, it is simply not possible to access the batch data.