Implementing fast dense feature extraction in PyTorch

I am trying to implement this paper (Fast Dense Feature Extractor) in PyTorch, but I am having trouble converting the Torch implementation example the authors provide into PyTorch.
My attempt so far has one issue: when I add an additional dimension to the feature map, the convolutional weights no longer match the feature shape. How is this handled in Torch? (From their implementation it seems that Torch doesn't care about this, but PyTorch does.) My code: https://gist.github.com/system123/c4b8ef3824f2230f181f8cfba84f0cfd
Any other solutions to this problem would be great too. Basically, I have a feature extractor that converts a 128x128 patch into an embedding, and I'd like to apply it densely across a larger image without a for loop that evaluates the CNN at each location, since that involves a lot of duplicate computation.

It is your lucky day: I recently uploaded PyTorch and TensorFlow implementations of the paper Fast Dense Feature Extraction with CNNs with Pooling Layers.
The paper describes an approach to compute patch-based local feature descriptors efficiently, in the presence of pooling and striding layers, for whole images at once.
See https://github.com/erezposner/Fast_Dense_Feature_Extraction for details.
The repository contains simple instructions that explain how to use the Fast Dense Feature Extraction (FDFE) project.
Good luck!
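To make the "duplicate computation" point concrete, here is a minimal sketch of the simplest case only: a patch network with no pooling and no striding, where a single forward pass over the whole image gives the same embeddings as evaluating every patch separately. It is not the method from the paper or the linked repository, which exist precisely to handle pooling and striding layers. The network, the 8x8 patch size and the 12x12 image are hypothetical and chosen small for speed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

PATCH = 8  # hypothetical patch size (the question uses 128)

# Hypothetical patch network without pooling or striding.
# The final "fully connected" layer is written as a convolution whose
# kernel covers the whole remaining feature map (4x4 for an 8x8 patch).
patch_net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),   # 8x8 -> 6x6
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3),   # 6x6 -> 4x4
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=4),  # 4x4 -> 1x1, a 16-d embedding
).eval()

image = torch.randn(1, 1, 12, 12)

with torch.no_grad():
    # Dense pass: one forward call over the whole image.
    dense = patch_net(image)  # shape (1, 16, 5, 5)

    # Reference: run the network on every 8x8 patch separately.
    patches = image.unfold(2, PATCH, 1).unfold(3, PATCH, 1)  # (1, 1, 5, 5, 8, 8)
    patches = patches.reshape(-1, 1, PATCH, PATCH)           # (25, 1, 8, 8)
    per_patch = patch_net(patches).reshape(5, 5, 16).permute(2, 0, 1)

# Largest difference between the two ways of computing the embeddings.
max_diff = (dense[0] - per_patch).abs().max().item()
```

Once pooling or strided layers are added, the plain dense pass no longer matches the per-patch result at every location, which is the case the paper addresses.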

Related

Can I use a convolution filter instead of a dense layer for classification?

I was reading a decent paper, S-DCNet, and came across a section (page 3, Table 1, classifier) where a convolution layer is applied to the feature map to produce a binary classification output as part of an internal process. Since I am a noob, whenever someone talks to me about classification I automatically think of FC layers combined with softmax, so I started wondering: is this a possible thing to do? Can a convolutional layer really be used to classify a binary outcome? The whole concept triggered my imagination so much that I insist on getting answers...
Honestly, how does this actually work? What is the difference between using a convolution filter and using a fully connected layer for classification purposes?
Edit (uncertain answer on how it works): I asked a colleague, and he told me that using a filter with the same height and width as the feature map at the current stage can produce a learnable binary output (provided you also reduce the number of channels of the feature map to a single channel). But I still don't understand the motivation behind such a technique.

Convolutions can be used as FC layers, for example with filters of spatial size (1,1) and a depth equal to the FC input size.
The resulting feature map has the same spatial size as the input feature map, but each pixel is the output of an "FC" layer whose weights are the weights of the shared 1x1 conv filter.
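The equivalence described above can be checked numerically. The following is a small NumPy sketch with hypothetical sizes (8 input channels, 2 outputs, a 5x5 feature map): the same weight matrix is applied once as a 1x1 convolution and once as a fully connected layer on each pixel's channel vector.

```python
import numpy as np

rng = np.random.default_rng(0)

C_IN, C_OUT, H, W = 8, 2, 5, 5  # hypothetical sizes
feature_map = rng.normal(size=(C_IN, H, W))
weights = rng.normal(size=(C_OUT, C_IN))  # shared by the FC layer and the 1x1 filter
bias = rng.normal(size=C_OUT)

# 1x1 convolution: mix the channels independently at every pixel.
conv_out = np.einsum("oc,chw->ohw", weights, feature_map) + bias[:, None, None]

# The same weights used as a fully connected layer on each pixel's channel vector.
fc_out = np.empty((C_OUT, H, W))
for y in range(H):
    for x in range(W):
        fc_out[:, y, x] = weights @ feature_map[:, y, x] + bias

same = np.allclose(conv_out, fc_out)
```

Both paths produce a (2, 5, 5) output, one small classifier result per pixel.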
This kind of thing is used mainly for semantic segmentation, meaning classification per pixel. U-Net is a good example, if memory serves.
Also note that 1x1 convolutions have other uses as well.
Some of the networks listed on paperswithcode probably use this trick.

How to build a Neural Network with sentence embedding concatenated to pre-trained CNN

I want to build a neural network that takes the feature map from the last layer of a CNN (VGG or ResNet, for example), concatenates an additional vector (for example, a 1x768 BERT vector), and re-trains the last layer on a classification problem.
So the architecture should be like in: [figure omitted]
but I want to concatenate an additional vector to each feature vector (I have a sentence describing each frame).
I have 5 possible labels and 100 frames in the input.
Can someone help me with how to implement this type of network?

I would recommend looking into the Keras functional API.
Unlike a sequential model (which is usually enough for many introductory problems), the functional API allows you to create any acyclic graph you want. This means that you can have two input branches, one for the CNN (image data) and the other for any NLP you need to do (relating to the descriptive sentence that you mentioned). Then, you can feed in the combined outputs of these two branches into the final layers of your network and produce your result.
Even if you've already created your model using models.Sequential(), it shouldn't be too hard to rewrite it to use the functional API.
For more information and implementation details, look at the official documentation here: https://keras.io/guides/functional_api/
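As a rough illustration of the two-branch idea, here is a minimal functional-API sketch. It assumes the CNN features have already been extracted and pooled into one vector per frame, and it treats each frame independently. The 2048-dimensional feature size and the 256-unit hidden layer are assumptions, not something stated in the question. The 768-dimensional sentence vector and the 5 labels come from the question.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5         # from the question
SENTENCE_DIM = 768      # BERT sentence vector size from the question
CNN_FEATURE_DIM = 2048  # assumption: pooled ResNet-style feature vector per frame

# Two input branches: precomputed CNN features and the sentence embedding.
cnn_features = tf.keras.Input(shape=(CNN_FEATURE_DIM,), name="cnn_features")
sentence_vec = tf.keras.Input(shape=(SENTENCE_DIM,), name="sentence_vector")

# Concatenate both vectors and train a small classification head on top.
merged = layers.Concatenate()([cnn_features, sentence_vec])
hidden = layers.Dense(256, activation="relu")(merged)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

model = tf.keras.Model(inputs=[cnn_features, sentence_vec], outputs=outputs)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Training would then pass both arrays together, for example `model.fit([frame_features, sentence_vectors], labels)`, where those three names are hypothetical arrays with one row per frame.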

How can I implement a random shear preprocessing layer in tensorflow 2?

I am looking for a layer that randomly shears a batch of images, such as the preprocessing layers in tf.keras.layers.experimental.preprocessing. However, there doesn't seem to be any such layer.
There are a few similar questions about implementing shear layers in TF, but those use deprecated methods in tf.contrib. Can someone point me in a direction to implement random shears in TensorFlow 2?

Some of the operations from tf.contrib were moved to TensorFlow Addons in TensorFlow 2.x.
The equivalent of tf.contrib.image.transform is tfa.image.transform.
For more details, see the TensorFlow Addons documentation.
Also take a look at tf.keras.preprocessing.image.random_shear, which operates on a single NumPy image.
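tfa.image.transform takes one 8-element projective transform per image, `[a0, a1, a2, b0, b1, b2, c0, c1]`, which maps an output point (x, y) to the input point ((a0 x + a1 y + a2) / k, (b0 x + b1 y + b2) / k) with k = c0 x + c1 y + 1. The NumPy sketch below only builds such rows for a random horizontal shear. The helper names and the shear range are hypothetical, and the call to TensorFlow Addons itself is not executed here.

```python
import numpy as np

def random_shear_transforms(batch_size, max_shear, rng):
    """Return a (batch_size, 8) float32 array of horizontal-shear transforms."""
    shear = rng.uniform(-max_shear, max_shear, size=batch_size)
    transforms = np.zeros((batch_size, 8), dtype=np.float32)
    transforms[:, 0] = 1.0    # a0: x' keeps x
    transforms[:, 1] = shear  # a1: x' gains shear * y
    transforms[:, 4] = 1.0    # b1: y' keeps y
    return transforms

def apply_transform(row, x, y):
    """Map an output point (x, y) to its input point under one transform row."""
    a0, a1, a2, b0, b1, b2, c0, c1 = row
    k = c0 * x + c1 * y + 1.0
    return (a0 * x + a1 * y + a2) / k, (b0 * x + b1 * y + b2) / k

rng = np.random.default_rng(0)
transforms = random_shear_transforms(batch_size=4, max_shear=0.3, rng=rng)
```

With TensorFlow Addons installed, the array could then be passed as `tfa.image.transform(images, transforms)` for a batch of 4 images. Wrapping that call in a custom Keras layer would be one way to get a random shear layer.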

Can CNN autoencoders have different input and output dimensions?

I am working on a problem that requires me to build a deep learning model that takes an input image and outputs another image. It is worth noting that the two images are conceptually related but do not have the same dimensions.
At first I thought that a classical CNN with a final dense layer, whose number of units is the height times the width of the output image, would suit this case, but during training it gave strange figures such as an accuracy of 0.
While looking for answers on the Internet I discovered the concept of CNN autoencoders, and I was wondering whether this approach could help me solve my problem. In all the examples I saw, the input and output of the autoencoder had the same size and dimensions.
So I wanted to ask whether there is a type of CNN autoencoder that produces an output image with different dimensions from the input image.

An auto-encoder (AE) is an architecture that encodes your image into a lower-dimensional representation while simultaneously learning to reconstruct the data from that representation. An AE therefore relies on unlabeled data, which is used both as the input and as the target in the loss.
You can try using a U-Net based architecture for your use case. A U-Net forwards intermediate data representations to later layers of the network, which should help it learn the mapping of the inputs into a new domain faster.
You can also experiment with a simple architecture containing a few ResNet blocks without any downsampling layers, which might or might not be enough for your use case.
If you want to dig a little deeper, you can look into DiscoGAN and related methods. They explicitly try to map an image into a new domain while maintaining the image information.
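On the question of differing sizes: nothing forces the decoder to mirror the encoder, so an encoder-decoder can upsample further than it downsampled. The PyTorch sketch below maps a 1x64x64 input to a 1x128x128 output. The layer sizes and image sizes are hypothetical and it is not a U-Net (there are no skip connections).

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Hypothetical network mapping a 1x64x64 image to a 1x128x128 image."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(8, 8, kernel_size=2, stride=2),   # 32 -> 64
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=2, stride=2),   # 64 -> 128
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = EncoderDecoder()
inputs = torch.randn(2, 1, 64, 64)     # random stand-in for input images
targets = torch.randn(2, 1, 128, 128)  # random stand-in for target images

outputs = model(inputs)
loss = nn.functional.mse_loss(outputs, targets)  # pixel-wise regression loss
```

Accuracy is a classification metric, so for image-to-image problems a pixel-wise loss such as MSE is usually a more informative quantity to monitor.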

Tensorflow Anomaly Detection

I was asked to create a machine learning algorithm using TensorFlow and Python that can detect anomalies by learning a range of 'normal' values. I have two parameters: a large array of floats around 1.5, and timestamps. I have not seen similar threads that use TensorFlow in a basic way, and since I am new to the technology I am looking to build a fairly basic model. However, I would like it to be unsupervised, meaning that I do not specify what an anomaly is; a large amount of past data does. Thank you. I am running Python 3.5 and TensorFlow 1.2.1.

Deep Learning - Anomaly and Fraud Detection
https://exploreai.org/p/deep-learning-anomaly-and-fraud-detection

Simply normalize the values and feed them to a TensorFlow autoencoder model.
Autoencoders are deep neural networks trained to reproduce their input at the output layer, i.e. the number of neurons in the output layer is exactly the same as the number of neurons in the input layer.
The encoder part of the architecture compresses the input data into a smaller representation, aiming to keep the important information while significantly reducing the overall size of the data. This concept is called dimensionality reduction. An autoencoder trained on mostly normal data tends to reconstruct unusual inputs poorly, so a large reconstruction error can be used to flag an anomaly.
Check this repo for code: Autoencoder in tensorflow
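The normalize, encode, reconstruct and threshold pipeline can be sketched without training a network. The example below swaps the TensorFlow autoencoder for a linear autoencoder computed in closed form with an SVD (equivalent to PCA), purely to keep the example short. The data is synthetic, and the window length, code size and 99% threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the question's data: floats around 1.5.
values = 1.5 + 0.05 * rng.normal(size=1000)
values[500] = 3.0  # injected anomaly, for illustration only

# Step 1: normalize the values.
normalized = (values - values.mean()) / values.std()

# Step 2: cut the series into overlapping windows, one row per window.
WINDOW = 10
windows = np.stack(
    [normalized[i:i + WINDOW] for i in range(len(normalized) - WINDOW + 1)]
)
centered = windows - windows.mean(axis=0)

# Step 3: linear "autoencoder" from the top singular vectors (PCA stand-in).
CODE_SIZE = 3
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:CODE_SIZE]         # shape (3, 10)
codes = centered @ components.T     # encoder: 10 values -> 3 values
reconstructed = codes @ components  # decoder: 3 values -> 10 values

# Step 4: windows that reconstruct poorly are anomaly candidates.
errors = ((centered - reconstructed) ** 2).sum(axis=1)
threshold = np.quantile(errors, 0.99)
flagged_starts = np.flatnonzero(errors > threshold)
worst_start = int(np.argmax(errors))
```

Here `worst_start` is the start index of the worst-reconstructed window. A trained nonlinear autoencoder would replace step 3, while the other steps stay the same.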
