What are good features for classifying photos of clothing? [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I want to build a clothing classifier that takes a photo of an item of clothing and classifies it as 'jeans', 'dress', 'trainers' etc.
Some examples:
These images are from retailer websites, so are typically taken from the same angle, typically on a white or pale background -- they tend to be very similar.
I have a set of several thousand images whose category I already know, which I can use to train a machine-learning algorithm.
However, I'm struggling for ideas of what features I should use. The features I have so far:
def get_aspect_ratio(pil_image):
_, _, width, height = pil_image.getbbox()
return width / height
def get_greyscale_array(pil_image):
"""Convert the image to a 13x13 square grayscale image, and return a
list of colour values 0-255.
I've chosen 13x13 as it's very small but still allows you to
distinguish the gap between legs on jeans in my testing.
"""
grayscale_image = pil_image.convert('L')
small_image = grayscale_image.resize((13, 13), Image.ANTIALIAS)
pixels = []
for y in range(13):
for x in range(13):
pixels.append(small_image.getpixel((x, y)))
return pixels
def get_image_features(image_path):
image = Image.open(open(image_path, 'rb'))
features = {}
features['aspect_ratio'] = get_aspect_ratio(image)
for index, pixel in enumerate(get_greyscale_array(image)):
features["pixel%s" % index] = pixel
return features
I'm extracting a simple 13x13 grayscale grid as a crude approximation of shape. Howerver, using these features with nltk's NaiveBayesClassifier only gets me 34% accuracy.
What features would work well here?

This is a tough problem and therefore there are many approaches.
On common method (although complicated) is taken an input image, superpixelate the image and compute descriptors (such as SIFT of SURF) of those superpixels building a bag-of-word representation by accumulating histograms per superpixel, this operation extracts the key information from a bunch of pixels reducing dimensionality. Then a Conditional Random Field algorithm searches for relationships between superpixels in the image and classifies the group of pixels inside a known category. For pixelating images scikit-image package implements SLIC algorithm segmentation.slic, and for the CRF you should take a look to PyStruct package. SURF and SIFT can be calculated using OpenCV.
Another simple version would be computing descriptors of a given image (SIFT, SURF, borders, histogram etc) and use them as inputs in a classifier algorithm, you might want start from here, maybe scikit-learn.org is the easiest and most powerful package for doing this.

HOG is commonly used in object detection schemes. OpenCV has a package for HOG descriptor:
http://docs.opencv.org/modules/gpu/doc/object_detection.html
You can also use BoW based features. Here's a post that explains the method:
http://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/

Using all the raw pixel values in an image directly as features aren't great, especially as the number of features increase, due to the very large search space (169 features represents a large search space, which can be difficult for any classification algorithm to solve). This is perhaps why moving to an 20x20 image actually degrades performance compared to 13x13. Reducing your feature set/search space might improve performance since you simplify the classification problem.
A very simple (and generic) approach to accomplish this is to use pixel statistics as features. This is the mean and standard deviation (SD) of the raw pixel values in a given region of the image. This captures the contrast/ brightness of a given region.
You can choose the regions based on trial and error, e.g., these can be:
a series of concentric circular regions, increasing in radius, in the centre of the image. The mean and SD of four circular regions of increasing size gives eight features.
a series of rectangular regions, either increasing in size or fixed size but placed around different regions in the image. The mean and SD of four non-overlapping regions (of size 6x6) in the four corners of the image and one in the centre gives 10 features.
a combination of circular and square regions.

Have you tried SVM? It generally performs better than Naive Bayes.

Related

Apply bundle adjustment to rectify images globally in the context of image stitching in python/openCV

I am trying to perform image registration on potentially hundreds of aerial images taken from a camera mounted on a UAV. I think it is safe to assume that I know the ordering of the images, and hopefully, sequential images will overlap.
I have read some papers that suggest using a CNN to find the homography matrix can vastly outperform the old school feature descriptor matching with RANSAC song and dance. My issue is that I don't quite understand how to stitch more than 2 images together. It seems to me that to register image 100 in the same coordinate frame as image 1 using the cv2.warpPerspective function, I would do I100H1H2*H3...H99. Even if the error in each transform is small after 100 applications it seems like it would be huge. My understanding is that the solution to this problem is bundle adjustment.
I have looked into bundle adjustment a little bit but Im struggling to see how exactly I can use it. I have read the paper that many related stack overflow posts suggest "Automatic Panoramic Image Stitching using Invariant Features". In the section on bundle adjustment IF I understand the authors suggest that after building the initial panorama it is likely that image A will eventually overlap with multiple other images. Using the matched feature points in any images that overlap with A they basically calculate some adjustment...? I think to image A?
My question is using openCV how do I apply this adjustment? Let's say I have 3 images I1, I2, I3 all overlapping for a minimal example.
#assuimg CNN model predicts transform
#I think the first step is find the homography between all images
H12 = cnnMod.predict(I1,I2)
H13 = cnnMod.predict(I1,I3)
H23 = cnnMod.predict(I2,I3)
outI2 = cv2.warpPerspective(I2,H12,(maxWidth, maxHeight),flags=cv2.INTER_LINEAR)
outI3 = cv2.warpPerspective(I2,H23,(maxWidth, maxHeight),flags=cv2.INTER_LINEAR)
#now would I do some bundle voodoo?
#what would it look like?
#which of the bundler classes should I use?
#would it look like this?
#or maybe the input is features?
voodoo = cv2.bundleVoodoo([H12,H13,H23])
golaballyRectifiedI2 = cv2.warpPerspective(outI2,voodoo[2],(maxWidth, maxHeight),flags=cv2.INTER_LINEAR)
The code is my best guess at what a solution might look like but clearly I have no idea what I am doing. I've not been able to find anything that actually shows how the bundle adjustment is done.
The basic idea underlying image alignment through bundle adjustment is that, rather than matching pairs of 2D points (x, x') across pairs of images, you posit the existence of 3d points X that, ideally, project onto matched tuples of 2D points (x, x', x'', ...) matched among corresponding tuples of images. You then solve for the location of the X's and the camera parameters (extrinsics, and intrinsics if the camera is uncalibrated) that minimize the (robustified, usually) RMS reprojection error over all 2d points and images.
Depending on your particular setup and scene, you may make some simplifying assumptions, e.g.:
That the X's all belong to the same plane (which you can arbitrarily choose as the world's Z=0 plane). This is useful, for example, when stitching images of a painting, or aerial images on relatively flat ground with relatively small extent so one can ignore the earth's curvature.
Or that the X's are all on the WGS84 ellipsoid.
Both the above assumptions remove one free coordinate from X, effectively reducing the problem's dimensionality.

How to predict the next pixel given all previous pixels in an image using tensorflow

Just as a project, I wanted to see if it would be possible to predict the next pixel in an image given all previous pixels.
For example: lets say I have an image with x pixels. Given the first y pixels, I want to be able to somewhat accurately predict the y+1th pixel. How should I go about solving this problem.
You are looking for some kind of generative model. RNNs are commonly used, and there's a great blog post here demonstrating character-by-character text generation.
The same principle can be applied to any ordered sequence. You talk about an image as being a sequence of pixels, but images have an intrinsic 2D structure (3 if you include color) that would be lost if you took the exact same approach as text generation. A couple of ideas:
Use tensorflow's GridLSTMCells
Treat a column of pixels as a single element of the sequence and predict column-by-column (or row by row)
Combine idea 2 with some 1D convolutions along the column/row
Use features from a section of an image as the seed to a generative adversarial network. See this repository for basic implementations.

Blobs detection using machine learning?

I have a large stack of images showing a bar with some dark blobs, whose position changes with time (see Figure, b). To detect the blobs I am now using an intensity threshold (c in Figure, where all intensity values below the threshold are set to 1) and then I search for blobs in the binary image using the Matlab code below. As you see the binary image is quite noisy, complicating the blobs detection process. Do you have any suggestion on how to improve the shape detection, maybe including some machine learning algorithm? Thanks!
Code:
se = strel('disk',1);
se_1 = strel('disk',3);
pw2 = imclose(IM,se);
pw3 = imopen(pw2,se_1);
pw4 = imfill(pw3, 'holes');
% Consider only the blobs with more than threshold pixels
[L,num] = bwlabel(pw4);
counts = sum(bsxfun(#eq,L(:),1:num));
number_valid_counts = length(find(counts>threshold));
This might help.
Extract texture features of the boundary of the blobs you want to extract. This can be done using Local binary patterns. There are many other texture features, you can get a detailed survey here.
Then use them to train a binary classifier.
It seems that the data come like pulses in the lower side of the image, I suggest to get some images and to slice vertical lines of the pixels perpendicular to the pulse direction, each time you take a line of values, little bit above and lower the pulse, the strip width is one pixel, and its height is little bit larger than the pulse image to take some of the light values lower and above the pulse, you may start from pixel 420-490, each time you save 70 grey values, those will form the feature vector, take also lines from the non blob areas to save for class 2, do this on several images and lines from each image.
now you get your training data, you may use any machine learning algorithm to train the computer for pulses and non pulses,
in the test step, you scan the image reading each time 70 pixels vertically and test them against the trained model, create a new black image if they belong to class "bolob" draw white vertical line starting from little below the tested pixel, else draw nothing on the output image.
at the end of scanning the image: check if there is an isolated white line you may delete considering it as false accepted . if you find a dark line within a group of white line, then convert it to white, considering false rejection.
you may use my classifier: https://www.researchgate.net/publication/265168466_Solving_the_Problem_of_the_K_Parameter_in_the_KNN_Classifier_Using_an_Ensemble_Learning_Approach
if you decide I will send you coed to do it. the distance metric is a problem, because the values varies between 0 and 255, so the light values will dominate the distance, to solve this problem you may use Hassanat distance metric at : https://www.researchgate.net/publication/264995324_Dimensionality_Invariant_Similarity_Measure
because it is invariant to scale in data, as each feature output a value between 0 and 1 no more, thus the highest values will not dominate the final distance.
Good luck

Can anyone provide me with some clustering examples?

I am having a hard time understanding what scipy.cluster.vq really does!!
On Wikipedia it says Clustering can be used to divide a digital image into distinct regions for border detection or object recognition.
on other sites and books it says we can use clustering methods for clustering images for finding groups of similar images.
AS i am interested in image processing ,I really need to fully understand what clustering is .
So
Can anyone show me simple examples about using scipy.cluster.vq with images??
The kind of clustering performed by scipy.cluster.vq is definitely of the latter (groups of similar images) variety.
The only clustering algorithm implemented in scipy.cluster.vq is the K-Means algorithm, which typically treats input data as points in n-dimensional euclidean space, and attempts to divide that space so that new, incoming data can be summarized by saying "example x is most like centroid y". Centroids can be thought of as prototypical examples of the input data. Vector quantization leads to concise, or compressed representations because, instead of remembering all 100 pixels of each new image we see, we can remember a single integer which points at the prototypical example that the new image is most like.
If you had many small grayscale images:
>>> import numpy as np
>>> images = np.random.random_sample((100,10,10))
So, we've got 100 10x10 pixel images. Let's assume they already all have similar brightness and contrast. The scipy kmeans implementation expects flat vectors:
>>> images = images.reshape((100,100))
>>> images.shape
(100,100)
Now, let's train the K-Means algorithm so that any new incoming image can be assigned to one of 10 clusters:
>>> from scipy.cluster.vq import kmeans, vq
>>> codebook,distortion = kmeans(images,10)
Finally, let's say we have five new images we'd like to assign to one of the ten clusters:
>>> newimages = np.random.random_samples((5,10,10))
>>> clusters = vq(newimages.reshape((5,100)),codebook)
clusters will contain the integer index of the best matching centroid for each of the five examples.
This is kind of a toy example, and won't yield great results unless the objects of interest in the images you're working with are all centered. Since objects of interest might appear anywhere in larger images, it's typical to learn centroids for smaller image "patches", and then convolve them (compare them at many different locations) with larger images to promote translation-invariance.
The second is what clustering is: group objects that are somewhat similar (and that could be images). Clustering is not a pure imaging technique.
When processing a single image, it can for example be applied to colors. This is a quite good approach for reducing the number of colors in an image. If you cluster by colors and pixel coordinates, you can also use it for image segmentation, as it will group pixels that have a similar color and are close to each other. But this is an application domain of clustering, not pure clustering.

Automatically recognize patterns in images

Recently I downloaded some flags from the CIA world factbook. Now I want to "classify them.
Get the colors
Get some shapes (stars, moons etc.)
While browsing I came across the Python Image Library which allows me to extract the colors (i.e. for Austria:
#!/usr/bin/env python
import Image
bild = Image.open("au-lgflag.gif").convert("RGB")
bild.getcolors()
[(44748, (255, 255, 255)), (452, (236, 145, 146)), (653, (191, 147, 149)), ...)]
What I found strange here is that the austrian flag only has two colors in it, but the above output shows more than ten. Do you know why? My idea was to only count the top 5 colors and as I'm not interested in every color I would do some "normalize" the numbers to multiples of 64 (so (236, 145, 146) becomes (192, 128, 128)).
However at the moment I have no idea what is the best way to extract more information (Ist there a star in the image? or else). Could you give me some hints on how to do it?
Thanks in advance
The Python Imaging Library - PIL just does basic image manipulation - opening, some transforms or filters, and saving to other formats.
Pattern recognition, is part of an advanced image processign field and evolving -- it deos use algorithms far different than those present in PIL.
There are some libraries and frameworks you can use in Python for pattern recognition - (recognising stars, and moons, and so) - Although I advance you: if you want this just to classify one0-hundered-and-a-few coutnry flags, you should do it manually, rather than try to dive in pattern recognition.
Your comment on the number of colors tells that you are not used with computer images at all. And pattern recognition is hardcore, even with a python front-end. (You can't expect any current framework to know beforehand what is a "moon" or a "star" for example)
So, for less than 500 images, you can resort to software that allows you to tag images manually and write some code to link the tags to each flag.
As for the colors: Computer rasterized images are formed of pixels. These are Square. At the boundary between different colors, if a pixel is on one color (say white), and its neighbor is a complete different color (like red), this boundary will show up jagged. This is known as "aliasing". To diminish this, computer software mixes colors at hard boundaries, creating intermediate colors - that is why a PNG even with 2 apparent colors can have several colors internally. For .JPG it is even worse, because the rounded decimal numbers for RGB colors we use are not even stored as they are in the image.
Unlike pattern recognizing, you can downsize the number of colours seen by using just the most significant bits of each component. I'd say the two most significant bits would be enough.
The following python function could do that using a color count given by PIL:
def get_main_colors(col_list):
main_colors = set()
for index, color in col_list:
main_colors.add(tuple(component >> 6 for component in color))
return [tuple(component << 6 for component in color) for color in main_colors]
call it with "get_main_colors(bild.get_colors()) " for example.
Here is another question dealing with the pattern recognition part:
python image recognition
First some quick terminology, just in case:
A classifier learns a map of inputs to outputs. You train a classifier by giving it input/output pairs, for example feature vectors like color information and labels like 'czech flag'. In practice, the labels are represented as scalar numbers. In your example, you have a multi-class problem, which simply means that there are more than two possible labels (obviously, since there are more than two country flags). Training a multi-class classifier can a little trickier than the vanilla binary classifier, so you may want to search for terms like "multi-class classifier" or "one-vs-many classifier" to investigate the best approach for you.
On to the problem:
I think your problem might be easily-solved using a simple classifier, like k-nearest neighbors, with color histograms as feature vectors. In particular, I would use HSV feature vectors as opposed to RGB feature vectors. Some great results have been reported in the literature using just this kind of simple classifier system, for example: SVMs for Histogram-Based Image Classification. In that paper, the authors use a particular classifier known as a Support Vector Machine (SVM) and HSV feature vectors. HSV feature vectors also sidestep the issue of image scale and rotation, for example a flag that is 1024x768 vs 640x480, or a flag that is rotated in an image by 45 degrees.
The pseudocode for training the algorithm would look something like this:
# training simple kNN -- just compute feature vectors, collect labels
X = [] # tuple (input example, label)
for training_image in data:
x = get_hsv_vector(training_image)
y = get_label(training_image)
X.append((x,y))
# classification -- pick k closest feature vectors
K = 3 # the 'k' in kNN -- how many similar featvecs to use
d = [] # (distance, label) tuples for scoring
x_test = get_hsv_vector(test_image) # feature vector to be classified
for x_train in X:
d.append((distance(x_test[0], x_train), x_test[1])
# sort distances, d, by closeness and pick top K labels for scoring
d.sort()
output = get_majority_vote([x[1] for x in d[:K]])
The kNN classifier is available in several python packages, with good documentation. It should be pretty easy to convert to HSV colorspace as well. If you don't achieve your desired results, you can try to improve your feature vectors or your classifier.

Categories

Resources