I am using Python and OpenCV to build face recognition with Eigenfaces. I have run into a problem: I don't know how to create the training set.
Do I need multiple pictures of the people I want to recognize (myself, for example), or do I need a lot of different faces to train my model?
First I tried training my model with 10 pictures of my face and 10 pictures of ScarJo's face, but my prediction was not working well.
Now I'm trying to train my model with 20 different faces (mine is one of them).
Am I doing it wrong and if so what am I doing wrong?
You can do both, actually. If you look at the FaceRecognizer train method, it takes two arguments. The first is a list of pictures. The second is a list of labels (integers) that correspond to the pictures. Use the labels to designate which pictures show which face. So if you only had pictures of yourself, the labels would all be the same (0). The labels really start to matter once there are pictures of yourself and of someone else. For example, here's what your labels might look like if you had pictures of both yourself and ScarJo:
faces = [scarjo_1, scarjo_2, me_1, me_2, scarjo_3]
labels = [0, 0, 1, 1, 0]
Notice how the last index in labels has a value of 0...the label which corresponds to ScarJo's face.
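A minimal sketch of how the two parallel lists could be built, assuming the images are grouped per person in a dictionary. The helper name `build_training_set` and the dictionary layout are made up for illustration and are not part of OpenCV; the strings stand in for the image arrays you would actually load.

```python
def build_training_set(images_by_person):
    """Flatten {person_name: [image, ...]} into parallel faces/labels lists.

    Returns (faces, labels, names), where names[label] is the person's name.
    """
    faces, labels, names = [], [], []
    for label, (name, images) in enumerate(sorted(images_by_person.items())):
        names.append(name)
        for image in images:
            faces.append(image)
            labels.append(label)  # same integer for every picture of this person
    return faces, labels, names


# Placeholder strings stand in for the grayscale images you would load.
faces, labels, names = build_training_set({
    "scarjo": ["scarjo_1", "scarjo_2", "scarjo_3"],
    "me": ["me_1", "me_2"],
})
print(labels)  # [0, 0, 1, 1, 1]
print(names)   # ['me', 'scarjo']
```

The order of the pictures does not matter, only that position i in labels describes position i in faces. With real images you would then pass both lists to the recognizer's train method (as far as I know, the Python bindings expect the labels as an integer NumPy array).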
I later found the answer and would like to share it in case someone else faces the same challenge.
You only need pictures of the different people you are trying to recognise. I created my training set with 30 images of every person (6 people) and found that histogram equalisation can play an important role, both when creating the training set and later when recognising faces. With histogram equalisation, model accuracy increased greatly. Another thing to consider is eye axis alignment: all pictures should have their eye axis aligned before they enter face recognition.
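For anyone wondering what histogram equalisation actually does to a picture, here is a rough pure-Python sketch for an 8-bit grayscale image stored as a list of rows. It is only meant to illustrate the step; in practice you would normally call `cv2.equalizeHist` on the grayscale image instead. The function name `equalize_histogram` is my own.

```python
def equalize_histogram(gray, levels=256):
    """Spread the used gray levels over the full range via the cumulative histogram.

    gray is a list of rows, each a list of ints in [0, levels - 1].
    """
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution: cdf[v] = number of pixels with value <= v.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image, nothing to stretch
        return [row[:] for row in gray]

    # Look-up table mapping each old gray level to its equalised value.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


# A low-contrast patch: all values squeezed between 50 and 52.
patch = [[50, 50], [51, 52]]
stretched = equalize_histogram(patch)
print(stretched[0][0], stretched[1][1])  # 0 255
```

The same transform has to be applied to the training pictures and to every face you later try to recognise, otherwise the two sets no longer look alike to the model.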
I have images that are 4928x3280 and I'd like to crop them into tiles of 640x640 with a certain percentage of overlap. The issue is that I have no idea how to deal with the bounding boxes of these files in my dataset. I've found this paper (http://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf), but no code showing how they did this. There are some examples on the internet that do YOLOv5 tiling, but without overlap, like this one (https://github.com/slanj/yolo-tiling).
Does anyone know how I could implement this myself, or does someone have an example of this for me?
If you want a ready-to-go library for tiling and inference with YOLOv5, there is SAHI:
https://github.com/obss/sahi
You can use it to create tiles with related annotations, to make inferences and evaluate model performance.
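If you would rather see the idea than use a library, here is a rough sketch of overlapping tiles plus bounding box remapping. This is not SAHI's API; the function names, the `(class_id, xmin, ymin, xmax, ymax)` pixel box format and the `min_visible` cut-off are all assumptions I made for the example.

```python
def tile_origins(length, tile, overlap):
    """Start offsets along one axis for tiles of size `tile` with fractional `overlap`."""
    stride = max(1, round(tile * (1 - overlap)))
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the image edge
    return starts


def boxes_for_tile(boxes, x0, y0, tile, min_visible=0.3):
    """Clip full-image boxes to the tile at (x0, y0) and shift them to tile coordinates.

    Boxes with less than `min_visible` of their area inside the tile are dropped.
    """
    kept = []
    for cls, xmin, ymin, xmax, ymax in boxes:
        cx0, cy0 = max(xmin, x0), max(ymin, y0)
        cx1, cy1 = min(xmax, x0 + tile), min(ymax, y0 + tile)
        if cx1 <= cx0 or cy1 <= cy0:
            continue  # no intersection with this tile
        visible = (cx1 - cx0) * (cy1 - cy0) / ((xmax - xmin) * (ymax - ymin))
        if visible >= min_visible:
            kept.append((cls, cx0 - x0, cy0 - y0, cx1 - x0, cy1 - y0))
    return kept


xs = tile_origins(4928, 640, 0.2)
ys = tile_origins(3280, 640, 0.2)
print(len(xs) * len(ys))  # 70 tiles per image

box = [(0, 600, 600, 700, 700)]
print(boxes_for_tile(box, 512, 512, 640))  # [(0, 88, 88, 188, 188)]
print(boxes_for_tile(box, 0, 0, 640))      # [] (only 16% of the box is visible)
```

For YOLO-format labels you would still have to convert between normalised centre/width/height and pixel corners on the way in and out. Because of the overlap, the same object can end up in several tiles, which is intended for training but means predictions need to be merged again at inference time.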
I am trying to classify infected red blood cells (RBC) and uninfected ones, and am trying to do some image preprocessing that might help boost accuracy scores. I am using this preprocessing for XGBoost and SVM.
I am asking for help here as my capstone tutor has not been responding for quite some time.
Image segmentation example
I give two examples, the 1st on the left and the 2nd on the right. My goal is to segment the infected places inside the RBC, i.e. the darker spots.
What I have currently done is:
normalize the image
get the histogram of each channel of the normalized color image
if the channel has one peak, pick the last lightest value of the "hill" base. If the channel has two peaks, pick the value in between the "hills".
with the picked value of each channel, segment the image in the range from (1,1,1) to (red value, green value, blue value)
All of the steps above were done manually and they work (shown in the image link I gave).
I want to do this automatically as I have a huge data set.
My Questions:
How do I automatically get the base values where the peak ends?
Also, I am using Python.
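To make the manual steps above concrete, this is roughly how they could be automated for one channel, as a sketch under several assumptions: the histogram is a plain 256-entry list, bin 0 (the black background) has been zeroed beforehand, the "base" of a single hill is taken on its darker side, and the smoothing radius, minimum peak height and `base_fraction` are arbitrary starting values that would need tuning on real cells. All function names are my own.

```python
def smooth(hist, radius=2):
    """Moving average so single-bin noise does not create false peaks."""
    n = len(hist)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def find_peaks(hist, min_height, min_distance):
    """Local maxima at least min_height tall and min_distance bins apart."""
    candidates = [
        i for i in range(1, len(hist) - 1)
        if hist[i] >= min_height and hist[i - 1] < hist[i] >= hist[i + 1]
    ]
    candidates.sort(key=lambda i: hist[i], reverse=True)  # tallest first
    peaks = []
    for i in candidates:
        if all(abs(i - p) >= min_distance for p in peaks):
            peaks.append(i)
    return sorted(peaks)


def channel_threshold(hist, base_fraction=0.02, min_distance=20):
    """Pick a threshold for one colour channel from its 256-bin histogram."""
    h = smooth(hist)
    top = max(h)
    peaks = find_peaks(h, 0.1 * top, min_distance)
    if not peaks:
        return None
    if len(peaks) >= 2:
        # Two hills: take the two tallest and return the middle of the valley floor.
        a, b = sorted(sorted(peaks, key=lambda i: h[i])[-2:])
        lowest = min(h[a:b + 1])
        valley = [i for i in range(a, b + 1) if h[i] == lowest]
        return valley[len(valley) // 2]
    # One hill: walk towards darker values until the hill has flattened out.
    i = peaks[0]
    while i > 0 and h[i] >= base_fraction * top:
        i -= 1
    return i


# Synthetic histograms standing in for one channel of a cell image.
one_hill = [max(0, 100 - 5 * abs(i - 200)) for i in range(256)]
two_hills = [max(0, 50 - 5 * abs(i - 80)) + one_hill[i] for i in range(256)]
t_one = channel_threshold(one_hill)
t_two = channel_threshold(two_hills)
```

Running this once per channel gives the (red value, green value, blue value) upper bound for the range segmentation, with (1,1,1) as the lower bound as before. If the installed SciPy is an option, `scipy.signal.find_peaks` does the peak search more robustly than the hand-rolled version here.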
EDIT:
Sorry, I forgot to add the images I'm working with.
Here is the data set zip folder I am working with: https://data.lhncbc.nlm.nih.gov/public/Malaria/cell_images.zip
Here are separate images just in case:
infected_img
uninfected_img
infected2_img
infected3_img
uninfected2_img
I am new to deep learning, and I am trying to train a ResNet50 model to classify 3 different surgical tools. The problem is that every article I read tells me that I need to use 224 X 224 images to train ResNet, but the images I have are of size 512 X 288.
So my questions are:
Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
For the training and test set images, do I need to draw a rectangle around the object I want to classify?
Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?
Thank you.
Is it possible to use 512 X 288 images to train ResNet without cropping the images? I do not want to crop the image because the tools are positioned rather randomly inside the image, and I think cropping the image will cut off part of the tools as well.
Yes, you can train ResNet without cropping your images. You can resize them or, if that's not possible for some reason, you can alter the network, e.g. add a global pooling layer at the very end to account for the different input sizes (you might need to change kernel sizes or the downsampling rate).
If your biggest issue here is that ResNet requires 224x224 while your images are 512x288, the simplest solution would be to first resize them to 224x224. Only if that's not a possibility for technical reasons should you create a fully convolutional network by adding a global pooling layer at the end. (I guess ResNet does have global pooling at the end; in case it does not, you can add it.)
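Two small illustrations of the options above, as a sketch only. The first computes the geometry for a letterbox resize (scale to fit, then pad), which is an alternative I am adding here and not something the answer itself proposes: a plain resize of 512x288 to 224x224 squashes the tools horizontally, while letterboxing keeps their shape at the cost of padding. The second shows why global average pooling removes the dependence on input size. Both function names are my own.

```python
def letterbox_geometry(src_w, src_h, dst=224):
    """Size after an aspect-preserving resize, plus (left, top, right, bottom) padding."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = dst - new_w, dst - new_h
    padding = (pad_x // 2, pad_y // 2, pad_x - pad_x // 2, pad_y - pad_y // 2)
    return (new_w, new_h), padding


def global_average_pool(feature_map):
    """Collapse an H x W grid of activations into one number, whatever H and W are."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


print(letterbox_geometry(512, 288))  # ((224, 126), (0, 49, 0, 49))

# One channel of the last feature map: 7 x 7 for a 224 x 224 input,
# 9 x 16 for a 288 x 512 input (ResNet50 downsamples by a factor of 32).
small = [[1.0] * 7 for _ in range(7)]
large = [[1.0] * 16 for _ in range(9)]
print(global_average_pool(small), global_average_pool(large))  # 1.0 1.0
```

Either way the classifier layer after the pooling always sees one value per channel, so the number of weights it needs does not change with the image size.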
For the training and test set images, do I need to draw a rectangle around the object I want to classify?
For classification, no, you do not. A bounding box for an object is only needed if you want to do detection (that's when you want your model to also draw a rectangle around the objects of interest).
Is it okay if multiple different objects are in one image? The data set I am using often has multiple tools appearing in one image, and I wonder if I must only use images that only have one tool appearing at a time.
It's OK to have multiple different objects in one image, as long as they do not belong to different classes that you are training against. That is, if you are trying to classify apples vs oranges, an image obviously cannot contain both of them at the same time. But if it contains anything else, for example a screwdriver, key, person or cucumber, that's fine.
If I were to crop the images to fit one tool, will it be okay even if the sizes of the images vary?
It depends on your model. Cropping and image size are two different things. You can crop an image of any size and then resize it to your desired dimensions. You usually want all images to have the same size, as it makes your life easier, but it's not a hard requirement; depending on your needs, you can work with varying image sizes as well.
One year ago I trained a model to detect flowers. One year later I am starting this project up again, but first I decided to make sure I still remembered how by training it to detect red and green crayons.
My process is more or less following this tutorial –
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
I have two labels, green and red. I have 200 training images and 20 test images.
I am using faster_rcnn_inception. I followed the steps and ran my model.
It detects the crayons as well as you could expect with only 200 images; however, it can’t tell the red and green crayons apart at all. I thought maybe I had screwed up the settings, but if I move a blue pen in, the label pops up!
Even if I feed it the training images, it classifies 99% of them as two green pens. Even though each image always has two different pens!!!
Can this model work with colour? Or is it converting the colour somehow and messing it up? Is colour hard to detect, and I just need more training images? Have I likely screwed up a setting, since it can’t even correctly classify the training images?
The config file I am using is here:
https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_inception_v2_pets.config
I've changed line 9, line 130 and line 108 to false.
In general, neural networks can detect colour.
But often they learn not to. Due to differences in colour temperature and perspective, different colours can produce the same or similar pixel-level values. Therefore, when trained on larger datasets, networks tend to become highly colour agnostic. Unfortunately, I can only speak from gut feeling and cannot provide any example or reference, but the picture above should give you a sense of why.
In your case the issue is further complicated by the fact that there is a competing task of detecting the object box. Because of that, the detection net can become insensitive to weak cues like colour during retraining.
To troubleshoot the situation, I would recommend looking closely at your classification accuracy during retraining. As far as I can tell, the tutorial code only reports the loss value. One should expect that during retraining at least the training set is overfit almost perfectly, i.e. green and red crayons must become distinguishable. If not, it might make sense to train for longer or decrease the learning rate.
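As a rough sketch of the check I mean: once you have, for each crayon in the training images, the ground-truth label and the label the detector assigned to the matching box, per-class accuracy is a few lines. How you extract those pairs from the detection API is not shown here, and the function name and the sample labels below are made up.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly labelled objects, computed separately for each true class."""
    totals, correct = Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical result resembling the symptom in the question: everything is "green".
truth = ["red", "green", "red", "green"]
pred = ["green", "green", "green", "green"]
print(per_class_accuracy(truth, pred))  # {'red': 0.0, 'green': 1.0}
```

An overall accuracy of 50% would hide what is going on here; the per-class view shows immediately that one class is never predicted, which points at the labels or label map rather than at a lack of training images.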
Recently I downloaded some flags from the CIA World Factbook. Now I want to "classify" them:
Get the colors
Get some shapes (stars, moons etc.)
While browsing I came across the Python Imaging Library, which allows me to extract the colors (e.g. for Austria):
#!/usr/bin/env python
from PIL import Image
bild = Image.open("au-lgflag.gif").convert("RGB")
bild.getcolors()
[(44748, (255, 255, 255)), (452, (236, 145, 146)), (653, (191, 147, 149)), ...)]
What I found strange here is that the Austrian flag only has two colors in it, but the above output shows more than ten. Do you know why? My idea was to count only the top 5 colors and, as I'm not interested in every color, to "normalize" the numbers to multiples of 64 (so (236, 145, 146) becomes (192, 128, 128)).
However, at the moment I have no idea what the best way is to extract more information (is there a star in the image, and so on). Could you give me some hints on how to do it?
Thanks in advance
The Python Imaging Library (PIL) just does basic image manipulation: opening, some transforms or filters, and saving to other formats.
Pattern recognition is part of an advanced and still evolving field of image processing; it uses algorithms far different from those present in PIL.
There are some libraries and frameworks you can use in Python for pattern recognition (recognising stars, moons and so on). However, I advise you: if you want this just to classify one hundred and a few country flags, you should do it manually rather than dive into pattern recognition.
Your comment on the number of colors tells me that you are not used to working with computer images at all. And pattern recognition is hardcore, even with a Python front-end. (You can't expect any current framework to know beforehand what a "moon" or a "star" is, for example.)
So, for less than 500 images, you can resort to software that allows you to tag images manually and write some code to link the tags to each flag.
As for the colors: rasterized computer images are formed of pixels, which are square. At the boundary between different colors, if a pixel has one color (say white) and its neighbor has a completely different color (like red), this boundary will show up jagged. This is known as "aliasing". To diminish this, computer software mixes colors at hard boundaries, creating intermediate colors. That is why an image with 2 apparent colors can have several colors internally. For .JPG it is even worse, because the rounded decimal numbers for RGB colors we use are not even stored as they are in the image.
Unlike pattern recognition, reducing the number of colours seen is easy: use just the most significant bits of each component. I'd say the two most significant bits would be enough.
The following python function could do that using a color count given by PIL:
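def get_main_colors(col_list):
    # col_list holds (count, (r, g, b)) tuples, as returned by getcolors()
    main_colors = set()
    for _count, color in col_list:
        # keep only the two most significant bits of each component
        main_colors.add(tuple(component >> 6 for component in color))
    # shift back so that each component is a multiple of 64
    return [tuple(component << 6 for component in color) for color in main_colors]
Call it with "get_main_colors(bild.getcolors())", for example.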
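Since the question also asks for the top 5 colors, here is a variant sketch that keeps the pixel counts while quantizing, so the merged colors can be ranked. The function name `top_quantized_colors` and its parameters are my own; the sample list is the first three entries of the output shown in the question.

```python
from collections import Counter


def top_quantized_colors(col_list, n=5, bits=2):
    """Merge (count, (r, g, b)) entries that share their top `bits` bits; return the n largest."""
    shift = 8 - bits
    counts = Counter()
    for count, color in col_list:
        key = tuple((component >> shift) << shift for component in color)
        counts[key] += count  # pixel counts of merged colors add up
    return counts.most_common(n)


sample = [(44748, (255, 255, 255)), (452, (236, 145, 146)), (653, (191, 147, 149))]
print(top_quantized_colors(sample))
# [((192, 192, 192), 44748), ((128, 128, 128), 653), ((192, 128, 128), 452)]
```

Note that shifting maps pure white (255) to 192 rather than back to 255; that is fine as long as every flag goes through the same function before colors are compared.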
Here is another question dealing with the pattern recognition part:
python image recognition
First some quick terminology, just in case:
A classifier learns a map from inputs to outputs. You train a classifier by giving it input/output pairs, for example feature vectors like color information and labels like 'czech flag'. In practice, the labels are represented as scalar numbers. In your example, you have a multi-class problem, which simply means that there are more than two possible labels (obviously, since there are more than two country flags). Training a multi-class classifier can be a little trickier than training the vanilla binary classifier, so you may want to search for terms like "multi-class classifier" or "one-vs-many classifier" to investigate the best approach for you.
On to the problem:
I think your problem might be easily solved using a simple classifier, like k-nearest neighbors, with color histograms as feature vectors. In particular, I would use HSV feature vectors as opposed to RGB feature vectors. Some great results have been reported in the literature using just this kind of simple classifier system, for example: SVMs for Histogram-Based Image Classification. In that paper, the authors use a particular classifier known as a Support Vector Machine (SVM) and HSV feature vectors. Because a normalized color histogram records how much of each color is present rather than where it is, such feature vectors also largely sidestep the issue of image scale and rotation, for example a flag that is 1024x768 vs 640x480, or a flag that is rotated in an image by 45 degrees.
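To give an idea of what such a feature vector could look like, here is a small sketch using only the standard library's colorsys module. The bin counts and the function name `hsv_histogram` are arbitrary choices of mine, and the pixels are assumed to arrive as (r, g, b) tuples in the 0-255 range.

```python
import colorsys


def hsv_histogram(pixels, h_bins=8, s_bins=2, v_bins=2):
    """Normalised HSV histogram of an iterable of (r, g, b) pixels with 0-255 components."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # h, s and v are in [0, 1]; clamp the index so 1.0 lands in the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
        n += 1
    return [count / n for count in hist] if n else hist


# The same red/white mix at two "resolutions" gives the same feature vector.
small_flag = [(255, 0, 0)] * 2 + [(255, 255, 255)] * 2
large_flag = small_flag * 100
print(hsv_histogram(small_flag) == hsv_histogram(large_flag))  # True
```

Dividing by the pixel count is what makes the vector independent of image size; without that step a larger scan of the same flag would look like a different flag to the classifier.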
The pseudocode for training the algorithm would look something like this:
# training simple kNN -- just compute feature vectors, collect labels
X = []  # list of (feature vector, label) tuples
for training_image in data:
    x = get_hsv_vector(training_image)
    y = get_label(training_image)
    X.append((x, y))
# classification -- pick k closest feature vectors
K = 3  # the 'k' in kNN -- how many similar featvecs to use
d = []  # (distance, label) tuples for scoring
x_test = get_hsv_vector(test_image)  # feature vector to be classified
for x_train, y_train in X:
    d.append((distance(x_test, x_train), y_train))
# sort distances, d, by closeness and pick top K labels for scoring
d.sort()
output = get_majority_vote([label for _, label in d[:K]])
The kNN classifier is available in several python packages, with good documentation. It should be pretty easy to convert to HSV colorspace as well. If you don't achieve your desired results, you can try to improve your feature vectors or your classifier.
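For completeness, a runnable toy version of the pseudocode above, using plain Euclidean distance and a majority vote. The two-element feature vectors and the country labels are made-up stand-ins for real histogram vectors, and `knn_classify` is my own name; a packaged implementation such as scikit-learn's `KNeighborsClassifier` would be the usual choice for real work.

```python
import math
from collections import Counter


def knn_classify(train, x_test, k=3):
    """train is a list of (feature_vector, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x_test))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: made-up 2-D feature vectors for two flags.
train = [
    ([1.0, 0.0], "austria"),
    ([0.9, 0.1], "austria"),
    ([0.0, 1.0], "sweden"),
    ([0.1, 0.9], "sweden"),
]
print(knn_classify(train, [0.8, 0.2], k=3))  # austria
```

With only one or two training images per flag, k=1 is probably the more sensible setting, since a vote among 3 neighbours would otherwise always include other countries.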