I have the following signal images which I want to classify based on their shape. Which algorithm is suited to do this? I have attached two images of each class.
You'll probably want to use sklearn. Assuming you want to classify these patterns based on images rather than the data from which the images were generated, you can use a simple k-nearest-neighbor (KNN) classifier.
KNN classification is a way to classify many different types of data. First, the algorithm is trained on tagged data (images, in your case, that appear to fall into classes of differing frequencies). Then, the algorithm analyzes untagged data, the data you want to classify. The "nearest neighbor" part means that each new piece of data seen by the algorithm is classified based on the k nearest pieces of data that you trained the algorithm with. The idea is that new data of a certain category will be numerically similar to other data of the same category. Here's a high-level workflow of how the algorithm works:
train_set = [(img1, 'low freq'), (img2, 'hi freq'), (img3, 'low freq'), (img4, 'med freq'), (img5, 'med freq')]
img_classifier = algorithm(train_set)
Then, you call your trained algorithm on new data to identify untagged images.
test = [img6, img7]
for i in test:
    img_classifier(i)
You'll want to use a LOT more than five training images, though. The value of k that you choose is important, too. Assuming you train with the same number of images for each class (let's say n), for a total of 3n training images, a good k to use is k=n/2. Too high and you risk misclassification because you take too much of the training data into account; too low and you may take too little into account.
There is an excellent tutorial here that you should definitely check out if you decide to use sklearn.
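If you go the sklearn route, a minimal sketch might look roughly like the following. Flattening each image into a raw pixel vector is the crudest possible feature; the file names, image size, and k below are illustrative only.

import numpy as np
from PIL import Image
from sklearn.neighbors import KNeighborsClassifier

def load_features(path, size=(64, 64)):
    # Crude features: resize to a fixed size and flatten the grayscale pixels.
    return np.asarray(Image.open(path).convert("L").resize(size)).ravel()

train_paths = ["low1.png", "hi1.png", "low2.png", "med1.png", "med2.png"]   # placeholders
train_labels = ["low freq", "hi freq", "low freq", "med freq", "med freq"]

X_train = np.stack([load_features(p) for p in train_paths])
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, train_labels)

X_test = np.stack([load_features(p) for p in ["img6.png", "img7.png"]])
print(clf.predict(X_test))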
Your images appear to fall into very distinct classes. If you don't want to use sklearn, you might be able to classify your images based on the area of the image that your curve covers. If these are the only three classes, you can try some of these to see if they give you a good threshold for classification:
Calculate the area of the blue (light+dark) in the image--different frequencies may be covered by different areas.
Check out the ratio of light blue to dark blue; it may differ between classes.
Calculate the maximum y-displacement of the dark blue from the center of the image (x-axis). This will easily separate the high-frequency from the mid and low frequency images, and then you can use the area calculation in the first method to differentiate the low and mid frequencies as they clearly cover different areas.
If you decide to go with the second method, definitely check out the Python Imaging Library (Pillow) for loading the images and reading pixel values.
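A rough sketch of the area heuristic with Pillow and NumPy, assuming the curves really are drawn in blue on a light background; the colour thresholds are pure guesses that you would tune against your actual images:

import numpy as np
from PIL import Image

def blue_fraction(path):
    rgb = np.asarray(Image.open(path).convert("RGB")).astype(int)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    blue = (b > 100) & (b > r + 30) & (b > g + 30)   # hypothetical "blue" test
    return blue.mean()   # fraction of the image covered by blue pixels

print(blue_fraction("signal.png"))   # compare against empirically chosen thresholds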
As an example, I have two pictures of a particular piece of clothing of a certain brand.
I can download a lot of different images of this same piece of clothing, in the same color.
I want to create a model which can recognize the item based on a picture.
I tried to do it using this example:
https://www.tensorflow.org/tutorials/keras/classification.
This can recognize the type of clothing (e.g. shirt, shoe, trousers, etc.), but not a specific item and color.
My goal is to have a model that can tell me that the person on my first picture is wearing the item of my second picture.
As mentioned I can upload a few variations of this same item to train my model, if that would be the best approach.
I also tried to use https://pillow.readthedocs.io
This can do something with color recognition but does not solve my initial goal.
I don't think that a CNN can help with your problem. Take a look at the SIFT technique (see this for more details); it is used for image matching and I think it's better suited to your case. If you're not looking to go into too much detail, OpenCV is a Python (and C++, I think) library that has image-matching functions that are easy to use (more details).
As mentioned by @nadji mansouri, I would use the SIFT technique as it suits your need. But I just want to correct something: a CNN is also an option in this case. That being said, I wouldn't tackle the problem as a classification problem, but rather using Distance Metric Learning, i.e., training a model to generate embeddings that are close in the embedding space when the inputs are similar, and distant otherwise. But to do this you need a large representative dataset.
In short, I suggest starting with SIFT, using OpenCV or open-source implementations on GitHub, playing around with the parameters to see what fits your case best, and then seeing whether it's really necessary to switch to a neural network; in that case, tackle the problem as a metric learning task, maybe with something like siamese networks.
Some definitions:
Metric learning is an approach based directly on a distance metric that aims to establish similarity or dissimilarity between data (images in your case). Deep Metric Learning on the other hand uses Neural Networks to automatically learn discriminative features from the data and then compute the metric. source.
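As a rough illustration of the embedding idea (not a full metric learning setup, which would also fine-tune the encoder with a contrastive or triplet loss), here is a sketch that reuses a pretrained Keras CNN as the encoder; the model choice and file names are placeholders:

import numpy as np
import tensorflow as tf

encoder = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                             weights="imagenet")
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input

def embed(path):
    img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
    x = preprocess(np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), 0))
    return encoder.predict(x, verbose=0)[0]

a, b = embed("person_wearing_item.jpg"), embed("reference_item.jpg")
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print("cosine similarity:", cosine)   # higher means closer in embedding space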
The Scale-Invariant Feature Transform (SIFT) is a method used in computer vision to detect and describe local features in images. The algorithm is invariant to image scale and rotation, and robust to changes in illumination and affine distortion. SIFT features are represented by local image gradients, which are calculated at various scales and orientations, and are used to identify keypoints in an image. These keypoints and their associated descriptor vectors can then be used for tasks such as image matching, object recognition, and structure from motion. source, with modification.
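A minimal OpenCV sketch of SIFT matching between two images (requires OpenCV >= 4.4, or the contrib build, for cv2.SIFT_create); the file names and the 0.75 ratio are illustrative:

import cv2

img1 = cv2.imread("person_wearing_item.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("reference_item.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Keep only matches that pass Lowe's ratio test.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "good matches")   # more matches -> more likely the same item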
I have a dataset of images and built a strong image recognition model. Now I want to add another label to my model.
I am asking myself whether I have to label every single image in my dataset that has the requested attribute.
Simple example:
Let's say I have 500k images in total and I want to label all images which have a palm in them.
Let's imagine that around 100k images have a palm in them.
Would my model be able to recognize the label "palm" with 80%, 90% or better accuracy if I only label around 20k, 30k or 50k of the images with a palm in them? Or do I have to label all 100k palm images to get acceptable performance?
From my point of view, this could be interpreted in two ways:
A multilabel image classification model ignores all attributes labeled 0, and these won't affect model accuracy -> 20k labeled palm images would be good enough for strong performance, because the model is only interested in the attributes labeled as 1 (even if 100k labeled images would result in better performance).
A multilabel image classification model is affected by attributes labeled 0 as well. If only 20k out of 100k palm images are labeled, the model gets confused, because 80k images have a palm in them but aren't labeled as palm. The result would be weak performance on this label. If that's the case, all 100k images have to be labeled for strong performance.
Am I right with one of these two suggestions, or does multilabel image classification work differently?
I have a very big dataset and I have to label all my images by hand, which takes a lot of time. If my first suggestion works, I could save myself weeks of work.
I would appreciate it a lot if you share your expertise, experience, and reasoning!
The training process uses the negative cases just as much as the positive cases to learn what a palm is. So if some of the supplied negative cases actually contain a palm tree, your model will have a much harder time learning. You could try only labeling the 20k images to start to see if the result is good enough, but for the best result you should label all 100k.
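To make the mechanics concrete, here is a hypothetical sketch of a multi-label head trained with binary cross-entropy; the point is that every 0 in the label vector is used as an explicit "attribute absent" example, so an unlabeled palm image actively teaches the model the wrong thing:

import tensorflow as tf

num_attributes = 10   # illustrative number of labels
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2048,)),                        # stand-in for your backbone features
    tf.keras.layers.Dense(num_attributes, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# y_train has shape (num_images, num_attributes); the loss is computed for every
# attribute of every image, zeros included, so a palm image with its palm bit
# left at 0 is treated as a (wrong) negative example.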
I'm new to the computer vision world. I'm trying to create a script whose objective is to gather data from a dataset of images.
I'm interested in what kinds of objects are in those images, and in getting a summary of them in a JSON file for every image.
I've checked out some YOLO implementations but the ones I've seen are almost always based on COCO and have 80 classes or have a custom dataset.
I've seen that there are algorithms like InceptionV3 etc. which are capable of classifying 1000 classes. But per my understanding object classification is different from object recognition.
Is there a way to use those big dataset classification algos for object detection?
Or any other suggestion?
Unfortunately, I do not know where the breaking point is, and of course, it will depend on acceptable evaluation metrics and training data size.
From a technical point of view, there is no hard limit and if you go to extremes there could be Core ML model size issues and memory issues during inferences. However, that will only happen for an extremely large number of classes.
From a modeling perspective (which is a problem that will happen much earlier than the technical limitation) it is not as clear. As you increase the number of classes, you increase the risk of making classification mistakes. Although, the severity of a lot of the mistakes should simultaneously go down as you will have more and more classes that are naturally similar (breeds of dogs, etc.). The original YOLO9000 paper (https://arxiv.org/pdf/1612.08242.pdf) trained a model using 9000+ classes with reasonable results (lots of mistakes of course, but still impressive). They trained it on a combination of detection and classification data, so if they actually had detection data for all 9000, then results would presumably be even better.
In your experiment, it sounds like 50-60 was OK (thanks for giving us a sample point!). Anything below 100 is definitely tried and true, as long as you have the data. However, will 300 do OK? Will 1000 do OK? Theoretically, I would say yes, if you are able to provide enough training data and you adjust your expectation of what a good evaluation metric is since you know you'll make more mistakes. For instance, for classification with 1000 classes, it is common to report top-5 accuracy (that is, the correct label is in your top-5 classes for a sample).
Here is a useful link - https://github.com/apple/turicreate/issues/968
First, to level set on terminology.
Image Classification based neural networks, such as Inception and Resnet, classify an entire image based upon the classes the network was trained on. So if the image has a dog, then the classifier will most likely return the class dog with a higher confidence score as compared to the other classes the network was trained on. To train a network such as this, it's simple enough to group the same class images (all images with a dog) into folders as inputs. ImageNet and Pascal VOC are examples of public labeled datasets for Image Classification.
Object Detection based neural networks, on the other hand, such as SSD and Yolo, will return a set of coordinates that indicate a bounding box and a confidence score for each class (object) that is detected, based upon what the network was trained with. To train a network such as this, each object in an image must be annotated with a set of coordinates that correspond to the bounding box of the class (object). The COCO dataset, for example, is an annotated dataset of 80 classes (objects) with coordinates corresponding to the bounding box around each object. Another popular dataset is Object365, which contains 365 classes.
Another important type of neural network that the COCO dataset provides annotations for is Instance Segmentation models, such as Mask RCNN. These models provide pixel-level classification and are extremely compute-intensive, but critical for use cases such as self-driving cars. If you search for Detectron2 tutorials, you will find several great learning examples of training a Mask RCNN network on the COCO dataset.
So, to answer your question, Yes, you can use the COCO dataset (amongst many other options available publicly on the web) for object detection, or, you can also create your own dataset with a little effort by annotating your own dataset with bounding boxes around the object classes you want to train. Try Googling - 'using coco to train ssd model' to get some easy-to-follow tutorials. SSD stands for single-shot detector and is an alternative neural network architecture to Yolo.
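For the original goal (a JSON summary per image), one rough sketch is to run an off-the-shelf COCO-trained detector and dump its outputs; here torchvision's Faster R-CNN is used purely as an example, with an illustrative score threshold:

import json
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

def summarize(path, score_threshold=0.5):
    img = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]
    # label indices follow the COCO category ids the model was trained on
    return [{"label": int(l), "score": float(s), "box": [float(x) for x in b]}
            for l, s, b in zip(out["labels"], out["scores"], out["boxes"])
            if s >= score_threshold]

with open("image1.json", "w") as f:   # one JSON file per image
    json.dump(summarize("image1.jpg"), f, indent=2)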
What is the best clustering methodology we can use in the voice domain?
For example, if we have voice utterances from multiple speakers and we need to cluster them into specific baskets where each basket corresponds to one speaker, what is the best clustering algorithm we can use?
I'd suggest an RNN-LSTM. There is a great tutorial explaining music genre classification using this neural network. I've watched it and it's very didactic:
First you have to understand your audio data (take a look here). In this link he explains MFCCs (Mel-Frequency Cepstral Coefficients), which allow you to extract features from your audio data into a spectrogram-like representation. In the image he shows, each amplitude of the MFCC represents a feature of the audio (e.g. features of the speaker's voice).
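A tiny illustration of MFCC extraction with librosa (the file name is just a placeholder):

import librosa

y, sr = librosa.load("utterance.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, num_frames)
# Averaging over time gives one fixed-length feature vector per clip,
# which can feed either a classifier or a clustering algorithm.
features = mfcc.mean(axis=1)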
Then you have to preprocess the data for the classification (practical example here)
And then train your neural network to predict which speaker the audio belongs to. He shows this here, but I'd recommend you watch the entire series. I think it's the best I've seen on this topic, giving all the background, code and dataset necessary to solve such a speaker classification problem.
Hope you enjoy the links; they've really helped me and I'm sure they will answer your question.
There are two approaches here: supervised classification as Eduardo suggests, or unsupervised clustering. Supervised requires training data (audio clips labeled with who is speaking) while unsupervised does not (although you do need some labeled examples to evaluate the method). Here I'll discuss unsupervised clustering.
The biggest difference is that an unsupervised model that works for this task can be applied to audio clips from new speakers, and any number of speakers! Supervised models will only work on the speakers, and number of speakers, on which they were trained. This is a huge limitation.
The most important element will be a way to encode each audio clip into a fixed-length vector such that the encoding somehow contains the needed information, namely who is speaking. If you transcribed the clips into text, this could be TF-IDF or BERT, which would pick out differences in topic, speech style, etc., but this would perform poorly if the clips of different speakers come from the same conversation. There's probably some pretrained encoder for voice clips that would work well here; I'm not as familiar with those.
Clustering method: Simple k-means may work here, where k would be the number of people included in the dataset, if known. If not known, you could use clustering metrics such as inertia and silhouette with the elbow heuristic to pick the optimal k, which may represent the number of speakers if your encoding is really good. Additionally, you could use a hierarchical method like agglomerative clustering if there is some inherent hierarchy in the voice clips, such as half of the people talking only about science while the other half talk only about literature, or separating first by gender or age.
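A minimal sketch of picking k with the silhouette score, assuming X is an array of fixed-length clip encodings (shape: num_clips x embedding_dim) produced as described above:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

best_k, best_score = None, -1.0
for k in range(2, 11):   # candidate numbers of speakers
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)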
Evaluation: Use PCA to project each fixed-length vector encoding onto 2D so you can visualize it and assign each cluster's voice clips a unique color. This will show you which clusters are more similar to each other, and the organization of these clusters will show you what features are being represented by the encodings.
Pros and Cons of Unsupervised:
Pros:
Is flexible with respect to the number of unique speakers and their voices. Meaning if you successfully build a clusterer that groups audio clips by speaker, you can take this model and apply it to a totally different set of clips from different people, even a different number of people, and it will likely work similarly. A classifier would need to be trained on the same unique people, and the same number of people, that it is applied to; otherwise it will not work.
No need for large labeled dataset, only enough examples to verify the program works. You can even do this after the fact by just listening to samples in one cluster and seeing if they sound like one person.
Cons:
It may not work. You have little control over what features are represented in the embedding, and those features determine cluster assignment. The way you control this is by picking an embedding method that captures the information you care about. An embedding method could be as simple as the average volume of the clip, but what would work better is taking the front half of a supervised model that someone else has trained on a voice task, effectively taking a hidden state from that model and using it as your embedding. If that task is similar to your task, such as a classifier trained to identify speakers, it will probably work well.
Hard to objectively compare unless you have a labeled test set
My suggestion: If you have a labeled set of voices, use half of it to train a classifier as Eduardo suggests, use that model's hidden states as your embedding method, then send that to k-means, and use the other half of the labeled examples as a test set.
I have a database of images that contains identity cards, bills and passports.
I want to classify these images into different groups (i.e identity cards, bills and passports).
From what I've read, one of the ways to do this task is clustering (since it is going to be unsupervised).
The idea for me is like this: the clustering will be based on the similarity between images (i.e images that have similar features will be grouped together).
I know also that this process can be done by using k-means.
So the problem for me is about features and using images with K-means.
If anyone has done this before or has a clue about it, could you recommend some links to start with or suggest any features that could be helpful?
The simplest way to get good results is to break the problem down into two parts:
Getting the features from the images: Using the raw pixels as features will give you poor results. Pass the images through a pre-trained CNN (you can get several of those online). Then use the output of the last CNN layer (just before the fully connected layers) as the image features.
Clustering of features: Having obtained rich features for each image, you can cluster them (e.g. with k-means).
I would recommend implementing (or reusing existing implementations of) 1 and 2 in Keras and sklearn respectively, as sketched below.
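A rough sketch of that Keras + sklearn combination; the backbone, image size, file names and k=3 (cards, bills, passports) are illustrative:

import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

backbone = tf.keras.applications.VGG16(include_top=False, pooling="avg",
                                       weights="imagenet")
preprocess = tf.keras.applications.vgg16.preprocess_input

def features(path):
    img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
    x = preprocess(np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), 0))
    return backbone.predict(x, verbose=0)[0]

paths = ["card1.jpg", "bill1.jpg", "passport1.jpg"]   # your image files
X = np.stack([features(p) for p in paths])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(paths, labels)))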
Label a few examples, and use classification.
Clustering is just as likely to give you the clusters "images with a blueish tint", "grayscale scans" and "warm color temperature". That is a quite reasonable way to cluster such images.
Furthermore, k-means is very sensitive to outliers. And you probably have some in there.
Since you want your clusters to correspond to certain human concepts, classification is what you need to use.
I have implemented Unsupervised Clustering based on Image Similarity using Agglomerative Hierarchical Clustering.
My use case had images of People, so I had extracted the Face Embedding (aka Feature) Vector from each image. I have used dlib for face embedding and so each feature vector was 128d.
In general, a feature vector can be extracted from each image. A pre-trained VGG or other CNN, with its final classification layer removed, can be used for feature extraction.
A dictionary with the IMAGE_FILENAME as KEY and the FEATURE_VECTOR as VALUE can be created for all the images in the folder. This makes the mapping between a filename and its feature vector easier.
Then create a single feature matrix, say X, which comprises the individual feature vectors of each image in the folder/group which needs to be clustered.
In my use case, X had the shape (NUMBER OF IMAGES IN THE FOLDER, 128), i.e. (number of images, size of each feature vector). For instance, shape of X: (50, 128).
This feature matrix can then be used to fit an agglomerative hierarchical clustering model. One needs to fine-tune the distance threshold parameter empirically.
Finally, we can write code to identify which IMAGE_FILENAME belongs to which cluster.
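A minimal sklearn sketch of that last step, assuming feature_dict maps IMAGE_FILENAME to FEATURE_VECTOR as described above; the distance_threshold value is illustrative and needs empirical tuning:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

filenames = list(feature_dict.keys())
X = np.stack([feature_dict[f] for f in filenames])   # shape: (num_images, 128)

clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=0.6)
labels = clusterer.fit_predict(X)

for f, c in zip(filenames, labels):
    print(f, "-> CLUSTER", c)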
In my case, there were about 50 images per folder, so this was a manageable solution. This approach was able to group images of a single person into a single cluster. For example, 15 images of PERSON1 belong to CLUSTER 0, 10 images of PERSON2 belong to CLUSTER 2, and so on…