SimCLR does not learn representations


I'm trying to train a SimCLR network with a custom lightweight ConvNet backbone (I have already tried a ResNet) on a dataset built from the first 5 letters of the alphabet: for each image, two letters are randomly selected and placed at random positions. I am unsure which augmentations to use in such a scenario, so I only use image translation to provide some degree of difference between the augmented samples.
This sounds like an extremely trivial task, but a multi-label linear classifier trained on top of the frozen pretrained network performs VERY poorly. I'm quite certain this is due to the poor quality of the learnt representations rather than the linear classifier. A fully supervised classifier works well on the same data, as expected.
Variations I've tried till now:
Made the dataset single-letter with a random position (multi-class): it performed very well.
Made the dataset with random letters but the same center position: it performed well. I used the same augmentation mentioned above for these as well.
Sample image from dataset (Label here is [1, 1, 0, 0, 0] for the letters that are present)
Can someone please help me figure out how to make this work?

This is not the first time I have heard of someone trying SimCLR and getting horrible results...
I have some questions:
Have you tried other losses for the contrastive pretraining part? How about triplet loss?
Are the representations normalised?
Are you getting good results with contrastive pretraining in the variations that you mention?
Are you getting good supervised classification results with both models (Resnet and custom convnet)?
Have you tried to visualise the features learned by the model in the conv layers?
You could also try to visualise the feature maps with forward hooks and see what is the network "looking at".
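To make the normalisation question concrete, here is a minimal NumPy sketch of the NT-Xent contrastive loss used by SimCLR, with the embeddings L2-normalised before the similarities are computed. The function names, the temperature value and the random toy batch are illustrative assumptions, not the asker's code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for a batch of N positive pairs (z1[i], z2[i]).

    z1, z2: arrays of shape (N, D) holding the projections of two augmented views.
    The temperature of 0.5 is an assumed default, not a recommendation.
    """
    n = z1.shape[0]
    z = l2_normalize(np.concatenate([z1, z2], axis=0))   # (2N, D), unit-length rows
    sim = z @ z.T / temperature                           # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                        # exclude self-similarity
    # The positive for row i is row i + N, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())

# Toy batch: identical views should give a lower loss than unrelated views.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = rng.normal(size=(8, 16))
loss_aligned = nt_xent_loss(z_a, z_a)
loss_random = nt_xent_loss(z_a, z_b)
```

Because of the normalisation step, rescaling the embeddings leaves the loss unchanged; if a training loop skips it, the similarity scale is no longer controlled by the temperature alone.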

Related

Multi-label text classification with non-uniform distribution of class labels for every train data

I have a multi-label classification problem: I want to classify texts with six labels. Each text can have one to six labels, but the label distribution is not uniform. For example, 10 people annotated sentence1 as below:
These labels are the number of votes for that class. I can normalize them like sad 0.7, anger 0.2, fear 0.1, happy 0.0,...
What is the best classifier for this problem? What is the best format for the labels, i.e. should I normalize them or not?
What keywords should I search for this kind of multi-label classification problem where the probability of labels is not equal?

Well, first, let me check that I understand your problem correctly. You have sentences=[sent1, sent2, ... sentn] and you want to classify them into six labels, labels=[l1,l2,...,l6]. Your data isn't the labels themselves, but the probability of each label applying to the text. You also mentioned that the six labels come from human annotation (I don't know what you mean by 10 people commented; I'll guess it is annotation).
If this is the case, you can approach the problem from a multi-label classification or a multi-target regression perspective. I'll cover what you can do with your data in both cases:
Multilabel Classification: In this case, you need to define the classes for each sentence so that you can train your model. Right now you have only the probabilities. You can do that by choosing a threshold: the labels whose probability is above the threshold are considered the labels for a sentence. You can read more about the evaluation metrics here.
Multi-target Regression: In this case, you don't need to define the classes; you just use the training input to predict the probabilities for each label. I think it is a better and easier problem, given your data collection. If you want to know more about the problem of multi-target regression, you can read more about it here, but the models they used in this tutorial are not the state-of-the-art (be aware of it).
Training Models: You can use both shallow and deep models for this task. You need a model that can receive a sentence as input and predict six labels or six probabilities. I suggest you take a look at this example; it can be a very good starting point for your work. The author provides a tutorial on how to build a multi-label text classifier using deep neural networks. He basically built an LSTM with a feed-forward layer at the end to classify the labels. If you decide to use regression instead of classification, you can just drop the activation at the end.
The best results are likely to be obtained by deep neural networks, so the article I sent you can work very well. I also suggest you take a look at the state-of-the-art methods for text classification, such as BERT or XLNET. I implemented a multi-label classification method using BERT; maybe it can be helpful to you.
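The two target formats discussed above can be sketched as follows. This is a minimal illustration that uses only the four labels named in the question; the helper names and the 0.15 threshold are assumptions chosen for the example.

```python
def votes_to_soft_targets(votes):
    """Normalise raw annotator vote counts into per-label probabilities
    (the targets for the multi-target regression view)."""
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no votes recorded for this sentence")
    return {label: count / total for label, count in votes.items()}

def soft_to_multilabel(soft_targets, threshold=0.15):
    """Turn probabilities into 0/1 labels (the multi-label classification view).
    The threshold is an assumed value and would need tuning on real data."""
    return {label: int(p >= threshold) for label, p in soft_targets.items()}

# Vote counts from 10 annotators for one sentence (labels taken from the question).
votes = {"sad": 7, "anger": 2, "fear": 1, "happy": 0}
soft = votes_to_soft_targets(votes)
hard = soft_to_multilabel(soft, threshold=0.15)
```

With these counts, `soft` holds the normalised values quoted in the question, and `hard` keeps only "sad" and "anger" as positive labels.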

Clustering a set of images

I have a folder with hundreds/thousands of images, some of which look alike. I would like to create clusters separating those images (with those that look alike in the same cluster).
I can't determine the number of clusters that will be needed, it depends on the images.
Does anyone have an idea on how to do this using Python, OpenCV and which algorithm to use?
I've done some research and found that AffinityPropagation or DBSCAN could be useful, but I don't know where to start (how to encode my images, what I should pass to those algorithms, etc.).

Unfortunately it is not that simple with images, since naively clustering would result in clusters of images with the same colors, not the same "content". You can use a neural network as a feature extractor for the images, I see two options:
Use a pre-trained network and get the features from an intermediate layer
Train an autoencoder on your dataset, and use the latent features
Option 1 is cheaper since you can easily find pre-trained models, option 2 is much more computationally expensive but should work better, especially if there is no pre-trained model on your domain.
This tutorial (randomly found on the internet) seems to be a good introduction to method 2.
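Once each image has been turned into a feature vector by either option, the clustering step itself might look like the sketch below. It assumes scikit-learn is available and uses synthetic vectors as a stand-in for real network features; the `eps` and `min_samples` values are illustrative and would need tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_features(features, eps=0.3, min_samples=2):
    """Cluster feature vectors with DBSCAN using cosine distance.

    DBSCAN does not need the number of clusters in advance.
    The label -1 marks points it considers noise.
    """
    return DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(features)

# Synthetic stand-in for extracted features: 3 groups of 5 similar vectors in 64-d.
rng = np.random.default_rng(0)
centers = np.eye(3, 64)  # three orthogonal directions
features = np.vstack([c + 0.01 * rng.normal(size=(5, 64)) for c in centers])
labels = cluster_features(features)
```

Images that share a label in `labels` would then be moved or listed together.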

Unify text and image classification (Python)

I am working on code to classify the texts of scientific articles (using the title and the abstract). For this I'm using an SVM, which delivers good accuracy (83%). At the same time, I used a CNN to classify the images of these articles. My idea is to merge the text classifier with the image classifier to improve the accuracy.
Is this possible? If so, do you have any idea how I could implement it, or some kind of guideline?
Thank you!

You could use the CNN to do both. For this you'd need two (or even three) inputs. One for the text (or two where one is for the abstract and the other for the title) and the second input for the image. Then you'd have some conv-max pooling layers before you merge them at one point. You then plug in some additional CNN or dense layers.
You could also have multiple outputs in this model. E.g a combined one, one for the text and one for the images. If you're using keras you would need the functional API. A picture of an example model can be found here (They're using LSTM in the example, but I guess you should stick to CNN.)

If you get probability from both classifiers you can average them and take the combined result. However taking a weighted average might be a better approach in which case you can use a validation set to find the suitable value for the weight.
prob_svm = probability from SVM text classifier
prob_cnn = probability from CNN image classifier
prob_total = alpha * prob_svm + (1-alpha) * prob_cnn # fine-tune alpha with validation set
If you can get another classifier (maybe a different version of either of these two classifiers), you can also do majority voting, i.e., take the class on which two or all three classifiers agree.
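The weighted average above can be made runnable along these lines. This is a sketch that assumes both classifiers output per-class probability arrays with the same class order; the function names and the toy validation data are made up for illustration.

```python
import numpy as np

def combine(prob_svm, prob_cnn, alpha):
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    return alpha * prob_svm + (1.0 - alpha) * prob_cnn

def tune_alpha(prob_svm, prob_cnn, y_true, grid=None):
    """Pick the alpha with the best accuracy on a validation set (simple grid search)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    best_alpha, best_acc = None, -1.0
    for alpha in grid:
        pred = combine(prob_svm, prob_cnn, alpha).argmax(axis=1)
        acc = float((pred == y_true).mean())
        if acc > best_acc:  # keeps the first alpha that reaches the best accuracy
            best_alpha, best_acc = float(alpha), acc
    return best_alpha, best_acc

# Toy validation set with two classes; each classifier alone makes mistakes.
y_val = np.array([0, 1, 1, 0])
prob_svm = np.array([[0.9, 0.1], [0.4, 0.6], [0.6, 0.4], [0.7, 0.3]])
prob_cnn = np.array([[0.6, 0.4], [0.55, 0.45], [0.1, 0.9], [0.45, 0.55]])
best_alpha, best_acc = tune_alpha(prob_svm, prob_cnn, y_val)
```

The chosen alpha should then be fixed and the combined model evaluated on a separate test set, since the validation accuracy is optimistic.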

What algorithm to choose for binary image classification

Let's say I have two arrays in my dataset:
1) The first is an array of class labels (0 or 1): [0,1,0,1,1,1,0.....]
2) The second array consists of grayscale image vectors with 2500 elements each (numbers from 0 to 300). These numbers are the pixels of 50*50px images: [[13 160 239 192 219 199 4 60..][....][....][....][....]]
The dataset is fairly large (~12000 elements).
I am trying to build a very basic binary classifier that gives reasonable results. Let's say I want to use a supervised method that is not deep learning.
Is that suitable in this case? I've already tried sklearn's SVM with various parameters, but the output is unacceptably inaccurate and consists mainly of 1s: [1,1,1,1,1,0,1,1,1,....]
What is the right approach? Isn't this dataset large enough to get a good result with a supervised algorithm?

You should probably post this on Cross Validated.
But as a direct answer, you should probably look into sequence-to-sequence learners, as it has become clear to you that SVM is not the ideal solution for this.
You should look into Markov models for sequential learning if you don't want to go the deep learning route; however, neural networks have a very good track record with image classification problems.
Ideally, for sequential learning you should look into Long Short-Term Memory recurrent neural networks, and for your current dataset see whether pre-training on an existing data corpus (say, CIFAR-10) helps.
So my recommendation is to give TensorFlow a try with a high-level library such as Keras/SKFlow.
Neural networks are just another tool in your machine learning repertoire, and you might as well give them a real chance.
An Edit to address your comment:
Your issue there is not a lack of data for the SVM. An SVM will work well on a small dataset, as it is easier for it to fit (or overfit) a separating hyperplane on such a dataset.
As you increase your data dimensionality, keep in mind that separating the data with a hyperplane becomes increasingly difficult (look at the curse of dimensionality).
However, if you are set on doing it this way, try some dimensionality reduction such as PCA: projecting your data into a lower dimension may allow the SVM to separate it with greater accuracy. Here too you will run into neural networks, since Kohonen Self-Organizing Maps do this task beautifully.
I still have to stand by saying you may be using the incorrect approach.
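For reference, the PCA-then-SVM suggestion could be sketched with scikit-learn as below. The synthetic data only mimics the shape of the asker's dataset (flattened 50*50 images). The number of components, the linear kernel and `class_weight="balanced"` are assumptions; the last one is added because predictions that are "mainly 1" can be a symptom of imbalanced classes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_pca_svm(n_components=10):
    """Scale pixels, project onto a few principal components, then fit an SVM."""
    return make_pipeline(
        StandardScaler(),                              # pixel values share a common scale
        PCA(n_components=n_components),                # 2500 dims -> n_components dims
        SVC(kernel="linear", class_weight="balanced"), # assumed settings, tune as needed
    )

# Synthetic stand-in: 200 flattened 50*50 images, class 1 is uniformly brighter.
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(200, 2500))
y = rng.integers(0, 2, size=200)
X[y == 1] += 40.0

model = build_pca_svm(n_components=10)
model.fit(X[:150], y[:150])                  # train on the first 150 samples
accuracy = model.score(X[150:], y[150:])     # evaluate on the held-out 50
```

Fitting the scaler and PCA inside the pipeline keeps the held-out images out of the preprocessing statistics.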

Neural network library for true-false based image recognition

I'm in need of an artificial neural network library (preferably in Python) for one (simple) task. I want to train it so that it can tell whether a thing is in an image. I would train it by feeding it lots of pictures and telling it whether each one contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle I find it quite hard to find an adequate library for this task, and even research in this field has proven to be kind of a frustration - any advice towards the right direction is greatly appreciated.

There are several good Neural Network approaches in Python, including TensorFlow, Caffe, Lasagne, and sknn (Sci-kit Neural Network). sknn provides an easy, out of the box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will most likely have to "unroll" your images into a vector, whereas a CNN expects the image as a 3-D array (a cube) if in color, or a 2-D array (a square) otherwise.
Here is a good resource on CNNs in Python.
However, since you aren't really doing a multiclass image classification (for which CNNs are the current gold standard) and doing more of a single object recognition, you may consider a transformed image approach, such as one using the Histogram of Oriented Gradients (HOG).
In any case, the accuracy of a neural network approach, especially when using CNNs, is highly dependent on successful hyperparameter tuning. Unfortunately, there isn't yet any kind of general theory on what hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to have a proper training, validation, and test set setup in order to fit a robust model.

I am unaware of any library which can do this for you. I use a lot of Caffe and can give you a solution till you find a single library which can do it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet trained model, change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is in the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label. In line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE according to it.
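The final check could be as small as the sketch below. It is independent of Caffe: it assumes the top-5 predictions are already available as a list of label strings, where one label may contain comma-separated synonyms. The function name and the example labels are hypothetical.

```python
def contains_object(predicted_labels, target):
    """Return True if `target` exactly matches one of the predicted labels.

    Each label may hold comma-separated synonyms; exact matching on the
    synonyms avoids false hits such as "cat" inside "catamaran".
    """
    target = target.strip().lower()
    for label in predicted_labels:
        synonyms = [s.strip().lower() for s in label.split(",")]
        if target in synonyms:
            return True
    return False

# Hypothetical top-5 output for one image.
top5 = [
    "notebook, notebook computer",
    "laptop, laptop computer",
    "desk",
    "monitor",
    "mouse, computer mouse",
]
found = contains_object(top5, "laptop")
```

Checking against the top-5 list rather than only the top-1 label trades some precision for fewer missed detections.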
I understand that you are looking for a specific library, I will look for it, but this is something I would try out in the beginning.
