Segmentation by map

Good evening,
I have a thermal image of a body and I need to segment it based on a body map.
I am attaching the images.
[body] (https://storage.googleapis.com/kaggle-forum-message-attachments/536018/13283/body.jpeg)
[map] (https://storage.googleapis.com/kaggle-forum-message-attachments/536018/13284/map.png)
Does anyone have any idea how to do this?
I tried to overlay the images, but since they do not fit each other perfectly, it did not work.
I expected to have a series of images, one for each region.

If you want to use a deep learning technique, look at generative adversarial network (GAN) based methods. There is plenty of material online; search keywords: deepfake, GAN.
The traditional technique is a shift-map based method, e.g. object rearrangement using the shift-map technique. OpenCV has a simple implementation for inpainting/retargeting that you can adapt to the deformable model rearrangement case.
The detailed work can be found at:
http://www.vision.huji.ac.il/shiftmap/inpainting/
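Once the map has been aligned with the thermal image (the hard part, which the methods above address), producing one image per region is mostly bookkeeping. The following is only a minimal sketch of that last step. It assumes the map has already been converted to a 2-D array of integer region labels with 0 as background; the helper names `resize_nearest` and `split_by_map` and the toy arrays are made up for illustration.

```python
import numpy as np

def resize_nearest(label_map, height, width):
    """Nearest-neighbour resize of a 2-D label map to (height, width)."""
    rows = np.arange(height) * label_map.shape[0] // height
    cols = np.arange(width) * label_map.shape[1] // width
    return label_map[rows[:, None], cols]

def split_by_map(image, label_map, background=0):
    """Return {label: copy of image with everything outside that region zeroed}."""
    if label_map.shape != image.shape[:2]:
        label_map = resize_nearest(label_map, *image.shape[:2])
    regions = {}
    for label in np.unique(label_map):
        if label == background:
            continue
        mask = label_map == label
        region = np.zeros_like(image)
        region[mask] = image[mask]
        regions[int(label)] = region
    return regions

# Toy data standing in for the thermal image and the body map (assumption).
image = np.arange(16, dtype=np.float32).reshape(4, 4)
label_map = np.array([[1, 2],
                      [0, 3]])   # 2x2 map, upscaled to 4x4 inside split_by_map
regions = split_by_map(image, label_map)
```

Nearest-neighbour resizing is used so that no new label values are invented at region borders. It only rescales the map; it does not correct the misfit between map and body described in the question.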

Related

Python: compare images of a piece of clothing (identification)

As an example I have two pictures with a particular type of clothing of a certain brand.
I can download a lot of different images of this same piece of clothing, in the same color.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, or trousers), but not a specific item and color.
My goal is to have a model that can tell me that the person on my first picture is wearing the item of my second picture.
As mentioned I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried to use https://pillow.readthedocs.io.
This can do something with color recognition but does not solve my initial goal.

I don't think a CNN can help with your problem. Take a look at the SIFT technique (see this for more details). It is used for image matching, and I think it is a better fit for your case. If you don't want to get into too much detail, OpenCV is a Python (and C++, I think) library with image matching functions that are easy to use (more details).

As mentioned by @nadji mansouri, I would use the SIFT technique, as it suits your need. But I want to correct one thing: a CNN is also an option in this case. That said, I wouldn't tackle this as a classification problem, but rather with distance metric learning, i.e. training a model to generate embeddings that are close together in the embedding space when the inputs are similar, and far apart otherwise. To do this, however, you need a large, representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open source implementations on GitHub, playing around with the parameters to see what fits your case best, and only then deciding whether it is really necessary to switch to a neural network. In that case, tackle the problem as a metric learning task, maybe with something like Siamese networks.
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
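To make the matching step concrete, here is a minimal sketch of how descriptor vectors from two images are commonly compared, using a nearest-neighbour ratio test. It assumes the descriptors have already been extracted (for example with OpenCV's SIFT implementation); the function name `ratio_test_matches`, the threshold of 0.75, and the tiny 2-D "descriptors" are illustrative assumptions, since real SIFT descriptors have 128 dimensions.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match rows of desc_a to rows of desc_b with a nearest-neighbour ratio test.

    A match is kept only if the closest descriptor in desc_b is clearly
    closer than the second closest one. desc_b needs at least two rows.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors (assumption): the first query is close to desc_b[2],
# the second is equally far from desc_b[0] and desc_b[1], so it is rejected.
desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_a = np.array([[0.5, 9.5], [5.0, 0.0]])
matches = ratio_test_matches(desc_a, desc_b)
```

The number of surviving matches between the reference photo of the item and the photo of the person can then serve as a rough similarity score. The same distance-based comparison applies to embeddings produced by a metric learning model.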

How can we apply masked language modelling using multimodal transformer models?

It may not be clear from the question, but how can we apply masked language modelling with text and image given using multimodal models such as VisualBERT or CLIP?
For example, if some text is given (it's Masked) and we mask some word in it, how can we apply MLM to predict the masked word as "cat"?
Is it possible to give only the text to the model, without the image?
How can we implement such a thing and get MLM estimates from it using the huggingface library API?
A code snippet explaining this would be great. If anyone can help, it would help to have a better understanding.

How do I train the DeepSORT tracker for a custom class?

I want to detect and count the number of vines in a vineyard using Deep Learning and Computer Vision techniques. I am using the YOLOv4 object detector and training on the darknet framework. I have been able to integrate the SORT tracker into my application and it works well, but I still have the following issues:
The tracker sometimes reassigns a new ID to the object.
The detector sometimes misidentifies the object (which leads to incorrect tracking).
The tracker sometimes does not track a detected object.
You can see an example of the reassignment issue in the following image. As you can see, in frame 40 the ID 9 was a metal post, and from frame 42 onwards it is assigned to a tree.
In searching for the cause of these problems, I have learnt that DeepSORT is an improved version of SORT, which aims to handle this problem by using a neural network for associating tracks to detections.
Problem:
The problem I am facing is with the training of this particular model for DeepSORT. I have seen that the authors used cosine metric learning to train their model, but I have not been able to customize the training for my custom classes. The questions I have are as follows:
I have a dataset of annotated (YOLO TXT format) images which I have used to train the YOLOv4 model. Can I reuse the same dataset for the Deepsort tracker? If so, then how?
If I cannot reuse the dataset, then how do I create my own dataset for training the model?
Thanks in advance for the help!

Yes, you can use the same classes for DeepSORT. SORT works in 2 stages, and DeepSORT adds a 3rd stage. The first stage is detection, which is handled by YOLO (YOLOv4 in your case). Next is track association, which is handled by a Kalman filter and IOU. DeepSORT implements the 3rd stage, a Siamese network that compares the appearance features of the current detections with the features of each track. I've seen implementations use ResNet as the feature embedding network.
Basically, once YOLO detects your class, you pass the cropped detection to your Siamese network, which converts it into a feature embedding and compares it with the past ones using cosine distance.
In conclusion, you can use the same YOLO classes for DeepSORT and SORT since they both need a detection stage, which is handled by YOLO.
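The appearance comparison described above can be sketched in a few lines. This is not the DeepSORT implementation, only an illustration under simplifying assumptions: embeddings are taken as given (in practice they come from the embedding network), each track keeps a small gallery of past embeddings, and detections are assigned greedily. The function names, the 0.2 threshold, and the 2-D embeddings are assumptions made for the example.

```python
import numpy as np

def cosine_distance(a, b):
    """Pairwise cosine distance between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def associate(track_galleries, detections, max_distance=0.2):
    """Greedily assign detection embeddings to tracks by appearance.

    track_galleries: {track_id: array of past embeddings, shape (n, d)}
    detections: embeddings of the current frame, shape (m, d)
    Returns {detection_index: track_id} for pairs closer than max_distance.
    """
    track_ids = list(track_galleries)
    # cost[t, d]: smallest distance between detection d and any
    # embedding stored for track t.
    cost = np.stack([
        cosine_distance(track_galleries[t], detections).min(axis=0)
        for t in track_ids
    ])
    assignments, used_tracks = {}, set()
    # Visit candidate (track, detection) pairs from cheapest to most expensive.
    for flat in np.argsort(cost, axis=None):
        t, d = (int(v) for v in np.unravel_index(flat, cost.shape))
        if cost[t, d] > max_distance:
            break
        if t in used_tracks or d in assignments:
            continue
        assignments[d] = track_ids[t]
        used_tracks.add(t)
    return assignments

# Toy embeddings (assumption): two known tracks and three new detections.
galleries = {
    9: np.array([[1.0, 0.0]]),
    12: np.array([[0.0, 1.0], [0.1, 0.9]]),
}
detections = np.array([[0.05, 1.0], [1.0, 0.1], [-1.0, -1.0]])
assignments = associate(galleries, detections)
```

Here the third detection matches no track and would start a new one. A full tracker also uses the Kalman filter's motion prediction when associating, and typically solves the assignment optimally rather than greedily.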

Neural network library for true-false based image recognition

I'm in need of an artificial neural network library (preferably in Python) for one (simple) task. I want to train it so that it can tell whether a thing is in an image. I would train it by feeding it lots of pictures and telling it whether each one contains the thing I'm looking for or not:
These images contain this thing, return True (or probability of it containing the thing)
These images do not contain this thing, return False (or probability of it containing the thing)
Does such a library already exist? I'm fairly new to ANNs and image recognition; although I understand how they both work in principle I find it quite hard to find an adequate library for this task, and even research in this field has proven to be kind of a frustration - any advice towards the right direction is greatly appreciated.

There are several good neural network libraries for Python, including TensorFlow, Caffe, Lasagne, and sknn (scikit-neuralnetwork). sknn provides an easy, out-of-the-box solution, although in my opinion it is more difficult to customize and can be slow on large datasets.
One thing to consider is whether you want to use a CNN (Convolutional Neural Network) or a standard ANN. With an ANN you will most likely have to "unroll" your images into a vector, whereas a CNN expects each image as a cube (height x width x channels) if it is in color, or a square (height x width) otherwise.
Here is a good resource on CNNs in Python.
However, since you aren't really doing a multiclass image classification (for which CNNs are the current gold standard) and doing more of a single object recognition, you may consider a transformed image approach, such as one using the Histogram of Oriented Gradients (HOG).
In any case, the accuracy of a neural network approach, especially when using CNNs, is highly dependent on successful hyperparameter tuning. Unfortunately, there isn't yet any kind of general theory on what hyperparameter values (number and size of layers, learning rate, update rule, dropout percentage, batch size, etc.) are optimal in a given situation. So be prepared to set up proper training, validation, and test sets in order to fit a robust model.
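As a rough illustration of the two input layouts and of the data split, here is a small sketch. The dataset is random placeholder data, and the image size (32x32 RGB) and the 70/15/15 split are arbitrary assumptions, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dataset (assumption): 100 RGB images of 32x32 pixels,
# each with a True/False label for "contains the thing".
images = rng.random((100, 32, 32, 3))
labels = rng.integers(0, 2, size=100).astype(bool)

# A standard ANN takes one flat vector per image ("unrolling").
flat = images.reshape(len(images), -1)   # shape (100, 3072)
# A CNN keeps the spatial layout: (height, width, channels) per image.
cnn_input = images                       # shape (100, 32, 32, 3)

# Shuffle once, then cut into 70% training, 15% validation, 15% test.
order = rng.permutation(len(images))
n_train = len(order) * 70 // 100
n_val = len(order) * 15 // 100
train_idx = order[:n_train]
val_idx = order[n_train:n_train + n_val]
test_idx = order[n_train + n_val:]

x_train, y_train = flat[train_idx], labels[train_idx]
x_val, y_val = flat[val_idx], labels[val_idx]
x_test, y_test = flat[test_idx], labels[test_idx]
```

Hyperparameters are then chosen by comparing models on the validation set, and the test set is used only once at the end to estimate how well the chosen model generalizes.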

I am unaware of any library which can do this for you. I use a lot of Caffe and can give you a solution till you find a single library which can do it for you.
I hope you know about ImageNet and that Caffe has a trained model based on ImageNet.
Here is the idea:
Define what the object is. Say object = "laptop".
Use Caffe's ImageNet trained model, change the code to display the required output you want (you mentioned TRUE or FALSE) when the object is in the output labels.
Here is a link to the ImageNet tutorial which I wrote.
Here is what you might try:
Take a look here. It is a stripped down version of the ImageNet program which I used in a prediction engine.
In line 80 you'll get the top-1 predicted output label. In line 86 you'll get the top-5 predicted labels. Write a line of code to check whether object is in the output_label and return TRUE or FALSE according to it.
I understand that you are looking for a specific library, I will look for it, but this is something I would try out in the beginning.
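The check itself could look roughly like the sketch below. It assumes the predicted labels arrive as a list of strings in the comma-separated style of ImageNet class names; the function name `contains_object` and the example labels are made up for illustration and are not taken from the linked script.

```python
def contains_object(predicted_labels, target):
    """Return True if target is one of the names in any predicted label."""
    target = target.strip().lower()
    for label in predicted_labels:
        # A label such as "laptop, laptop computer" lists several synonyms.
        names = [name.strip().lower() for name in label.split(",")]
        if target in names:
            return True
    return False

# Hypothetical top-5 output for one image.
top5 = [
    "notebook, notebook computer",
    "laptop, laptop computer",
    "desk",
    "mouse, computer mouse",
    "screen, CRT screen",
]
has_laptop = contains_object(top5, "laptop")
has_banana = contains_object(top5, "banana")
```

Comparing whole names instead of substrings avoids false positives such as "cat" matching "catamaran".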

Can a neural network recognize a screen and replicate a finite set of actions?

I learned that neural networks can replicate any function.
Normally the neural network is fed with a set of descriptors to its input neurons and then gives out a certain score at its output neuron. I want my neural network to recognize certain behaviours from a screen. Objects on the screen are already preprocessed and clearly visible, so recognition should not be a problem.
Is it possible to use the neural network to recognize a pixelated picture of the screen and make decisions on that basis? The amount of training data would be huge, of course. Is there a way to teach the ANN by online supervised learning?
Edit:
Because a commenter said the programming problem would be too general:
I would like to implement this in Python first, to see if it works. If anyone could point me to a resource where I could do this online-learning thing with Python, I would be grateful.

I would suggest
http://www.neuroforge.co.uk/index.php/getting-started-with-python-a-opencv
http://docs.opencv.org/doc/tutorials/ml/table_of_content_ml/table_of_content_ml.html
http://blog.damiles.com/2008/11/the-basic-patter-recognition-and-classification-with-opencv/
https://github.com/bytefish/machinelearning-opencv
OpenCV is basically an image processing library, but it also has some amazing helper classes that you can use for almost any task. Its machine learning module is pretty easy to use, and you can go through the source to see the explanation and background theory for each function.
You could also use a pure python machine learning library like:
http://scikit-learn.org/stable/
But before you feed the data from your screen (I'm assuming that's in pixels?) to your ANN, SVM, or whatever ML algorithm you choose, you need to perform "feature extraction" on your data (that is, on the objects on the screen).
Feature extraction can be thought of as representing the same data on the screen with fewer numbers, so that there are fewer numbers to give to the ANN. You need to experiment with different features before you find a combination that works well for your particular scenario. A sample one could look something like this:
[x1,y1,x2,y2...,col]
This is basically a list of edge points that represent the area your object is in, a sort of ROI (Region of Interest). You can perform edge detection and color detection, and also extract any other relevant characteristics. The important thing is that all your objects, with their shape and color information, are now represented by a number of these lists, one for each object detected.
This is the data that can be provided as input to the neural network. But of course you'll have to define some meaningful output parameters, depending on your specific problem statement, before you can train and test your system.
Hope this helps.
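For illustration, here is a minimal sketch of turning one detected object into such a list. It simplifies the idea to the two corner points of the object's bounding box plus its mean color, and it assumes a boolean mask marking the object's pixels is already available; the function name `object_features` and the toy image are assumptions for the example.

```python
import numpy as np

def object_features(image, mask):
    """Describe one object as [x1, y1, x2, y2, mean_r, mean_g, mean_b].

    image: array of shape (height, width, 3)
    mask:  boolean array of shape (height, width), True on the object
    """
    ys, xs = np.nonzero(mask)
    x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
    mean_color = image[mask].mean(axis=0)
    return np.array([x1, y1, x2, y2, *mean_color], dtype=np.float32)

# Toy screen (assumption): a black image with one orange rectangle.
image = np.zeros((6, 8, 3), dtype=np.float32)
image[2:5, 3:7] = [1.0, 0.5, 0.0]
mask = image.sum(axis=2) > 0
features = object_features(image, mask)
```

The screen of 6 x 8 x 3 = 144 pixel values is reduced to 7 numbers per object, which is the kind of compact input the network can then be trained on.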

This is not entirely correct.
A 3-layer feedforward MLP can theoretically approximate any CONTINUOUS function.
If there are discontinuities, then you need a 4th layer.
Since you are dealing with pixelated screens and such, you would probably need to consider a fourth layer.
Finally, if you are looking at circular shapes, etc., then a radial basis function (RBF) network may be more suitable.
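To show why an RBF network suits circular shapes, here is a small sketch with a single Gaussian hidden unit and a linear output fitted by least squares. The data is synthetic (points labelled by whether they fall inside a circle of radius 0.5), and the center and width are chosen by hand for the example; in practice they would be learned or picked from the data, for instance by clustering.

```python
import numpy as np

def rbf_features(points, centers, width):
    """Gaussian activation of every point with respect to every center."""
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(400, 2))
inside = np.linalg.norm(points, axis=1) < 0.5     # True inside the circle

# Hidden layer: one radial unit centered on the circle (hand-picked).
centers = np.array([[0.0, 0.0]])
hidden = rbf_features(points, centers, width=0.5)

# Linear output layer with a bias term, fitted by least squares.
design = np.hstack([hidden, np.ones((len(points), 1))])
weights, *_ = np.linalg.lstsq(design, inside.astype(float), rcond=None)

predicted = design @ weights > 0.5
accuracy = (predicted == inside).mean()
```

Because the hidden unit responds only to the distance from its center, thresholding its output always gives a circular decision boundary, which a network of sigmoid units would need several units to approximate.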
