I am working on digit detection software and I need to use the MNIST dataset together with TensorFlow.
To predict the numbers I have to do some pre-processing of the images. There are many scripts on the web that include this preprocessing step, but none of the ones I have tried so far do what the original documentation describes.
More or less, all the tutorials rely on the same assumption: the images have to be 28x28 pixels. Which is ok... maybe.
This is what the documentation says about how the MNIST dataset was made:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. the
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
And at this point, this is what I tried to do with my images, to have a 1:1 correspondence between MNIST and my ROI.
I create a blank canvas of 28x28 pixels. I take my image (which could be, say, 17x34; the point is that they are all rectangular) and resize it to 20x20, without maintaining the ratio between height and width.
I calculate the center of mass of this new (square) ROI and then paste it onto the 28x28 canvas so that its center of mass and the center of the canvas coincide.
Above are the original ROIs (rectangles resulting from the cropping process) and below is the output of the preprocessing. Note: the red square does not exist, it is just there to highlight.
These images, when passed to TensorFlow, are recognized well (~96%). These ones, at least.
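For reference, my preprocessing roughly looks like this (a simplified sketch, not my exact code; it assumes a grayscale ROI with a white digit on a black background, loaded with OpenCV, and the helper name is just illustrative):

import cv2
import numpy as np

def preprocess_roi(roi):
    # Resize the cropped ROI to 20x20 (ignoring the aspect ratio, as described above).
    small = cv2.resize(roi, (20, 20), interpolation=cv2.INTER_AREA)

    # Center of mass of the pixel intensities, via image moments.
    m = cv2.moments(small)
    cx = m['m10'] / m['m00'] if m['m00'] else 10.0
    cy = m['m01'] / m['m00'] if m['m00'] else 10.0

    # Shift so the center of mass lands on the canvas center (14, 14),
    # clamping so the 20x20 patch stays inside the 28x28 canvas.
    x0 = int(np.clip(round(14 - cx), 0, 8))
    y0 = int(np.clip(round(14 - cy), 0, 8))

    canvas = np.zeros((28, 28), dtype=small.dtype)
    canvas[y0:y0 + 20, x0:x0 + 20] = small
    return canvas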
I have other numbers...
The blue ones are recognized with low confidence (ranging from 80% down to 30%) and the red ones are not recognized (or are recognized wrongly with high confidence).
Example: 5,1,4 are recognized as 7,2,6.
My question is this: what am I doing wrong? Is the center-of-mass step working correctly? (I ask because, for some numbers, it translates the 20x20 ROI down the Y axis and I am correcting this manually.)
Any help is really welcome and appreciated.
P.S.: Here you can find some numbers that are NOT pre-processed, so you can test them yourself.
P.P.S.: I am asking because, for the numbers that it does not recognize, if I move/translate them by 1/2/3 pixels (really, it is a matter of a few) up or down, TensorFlow works well. Why?
I am trying to find the roughness of shapes in an image. I have found the contours and used simplification.cutil.simplify_coords_vwp(contour,1000) to calculate a polygon that I want to use as the "smooth" shape (this is similar to the more commonly used Douglas-Peucker algorithm). These simplified shapes have ~13 points each, which leaves some leeway for bends. This is done for all of the shapes present in the full image.
The images below show the full image I want the roughness of, and a zoomed-in image showing what I am trying to calculate. I want to quantify the black inside the lines and the white outside the lines, giving a number for the amount of roughness. I did not post the code because it would be a lot of extra information; I am only looking for help conceptualizing which modules could be useful here.
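One possible way to quantify this (a sketch, not my actual code; it assumes an OpenCV contour and that simplified is a list of (x, y) points from the simplification call above):

import cv2
import numpy as np

def roughness(contour, simplified, image_shape):
    # Rasterize the raw contour and its simplified polygon as filled masks.
    raw_mask = np.zeros(image_shape, dtype=np.uint8)
    smooth_mask = np.zeros(image_shape, dtype=np.uint8)
    cv2.drawContours(raw_mask, [contour], -1, 255, thickness=cv2.FILLED)
    poly = np.asarray(simplified, dtype=np.int32).reshape(-1, 1, 2)
    cv2.drawContours(smooth_mask, [poly], -1, 255, thickness=cv2.FILLED)

    # Pixels where the two masks disagree: black inside the polygon plus
    # white outside it.
    mismatch = cv2.bitwise_xor(raw_mask, smooth_mask)
    return np.count_nonzero(mismatch) / cv2.arcLength(contour, True)

Normalizing by the contour perimeter is just one choice; normalizing by the polygon area would be another.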
I am trying to use OpenCV to measure the size of filament (the plastic material used for 3D printing)
What I am trying to do is measure the filament size. The idea is that I use an LED panel to illuminate the filament, take an image with a camera, preprocess the image, apply edge detection and calculate its size. Most filaments are made of a single colour, which is easy to preprocess and gives fine results.
The problem comes with transparent filament: I am not able to get useful results. I would like to ask for a little help, or for someone to push me in the right direction. I have already tried cropping the image to a height slightly larger than the filament and a width of just a few pixels, and calculating the size from the number of pixels in those crops, but this did not work very well. So now I am here, trying to do it with edge detection.
Works well for filaments of a single colour
Does not work for transparent filament
The code below works just fine for common filaments; the problem is when I try to use it on transparent filament. I have tried adjusting the thresholds of the Canny function and I have tried different colour spaces, but I am not able to get usable results.
Images that may help to understand:
https://imgur.com/gallery/CIv7fxY
import cv2 as cv

image = cv.imread("../images/img_fil_2.PNG")  # load image
gray = cv.cvtColor(image, cv.COLOR_BGR2GRAY)  # convert image to grayscale
edges = cv.Canny(gray, 100, 200)              # detect edges of image
You can use the assumption that the images are taken under the same conditions.
Your main problem is that the reflections in the transparent filament are detected as edges. But, since the image is relatively simple, without any other edges, you can simply take the upper and the lower edge, and measure the distance between them.
A simple way of doing this is to take 2 vertical lines (e.g. image sides), find the edges that intersect the line (basically traverse a column in the image and find edge pixels), and connect the highest and the lowest points to form the edges of the filament. This also removes the curvature in the filament, which I assume is not needed for your application.
You might want to use 3 or 4 vertical lines, for robustness.
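A minimal sketch of that column-scanning idea (assuming edges is the Canny output from the question's snippet, the filament runs roughly horizontally, and the two example columns near the image sides are just placeholders):

import numpy as np

def filament_width_px(edges, columns=(50, -50)):
    # For each chosen column, find the rows containing edge pixels and
    # take the distance between the highest and the lowest of them.
    widths = []
    for x in columns:
        rows = np.flatnonzero(edges[:, x])
        if rows.size >= 2:
            widths.append(rows.max() - rows.min())
    return float(np.mean(widths)) if widths else None

# width = filament_width_px(edges)   # in pixels; convert to mm with a known scale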
I have a use case where I need to classify some images as greyscale or color. My initial approach was based on the observation that, in a greyscale image, the r, g and b values at a pixel should all be the same, since it is effectively a single channel, whereas for color images the r, g and b values at the same pixel may differ.
So I check the differences between (r,g), (b,g) and (r,b): if all three are zero everywhere, the image is greyscale; otherwise it is color.
This approach helped me identify many greyscale images, but there are still some images that do not follow this logic. Can anyone suggest some good features on which we can classify an image as color or greyscale using OpenCV?
Please do not suggest checking the number of channels: it is 3 for both classes, as we are loading the images in .jpg format.
Thanks in advance
I suspect some of these were never truly grey-scale images after digitizing (e.g. a color scan of a gray-scale picture). Due to noise, there are minimal differences between the RGB values. A low threshold greater than a perfect zero should do the trick.
Please note that JPEG does have a gray-scale option. However, you have to request that mode when storing the picture; compressors usually do not pick it up automatically. Also, you explicitly need to set the flag IMREAD_UNCHANGED when reading with OpenCV's imread.
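For example (a small illustration; 'scan.jpg' is just a placeholder filename):

import cv2

# Read the file as stored: a JPEG saved in gray-scale mode stays single-channel.
img = cv2.imread('scan.jpg', cv2.IMREAD_UNCHANGED)
print(img.ndim)   # 2 for a grayscale file, 3 for a colour file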
With the method suggested by @QuangHoang I got an accuracy of 85+%.
Here is the approach, explained in code:
import cv2
import numpy as np

# test image
img = cv2.imread('test.jpg')

# splitting b, g, r and counting the pixels where the channels differ
# (cv2.absdiff avoids uint8 wrap-around on subtraction)
b, g, r = cv2.split(img)
r_g = np.count_nonzero(cv2.absdiff(r, g))
r_b = np.count_nonzero(cv2.absdiff(r, b))
g_b = np.count_nonzero(cv2.absdiff(g, b))
diff_sum = float(r_g + r_b + g_b)

# ratio of differing pixels with respect to the size of the image
ratio = diff_sum / img.size

if ratio > 0.005:
    label = 'color'
else:
    label = 'grey'
Thanks for all the suggestions.
I am trying to determine the orientation of the following image. I am given an image of random size between 140x140 and 150x150 pixels, with no EXIF data. Is there a method to classify each image as 0, 90, 180 or 270 degrees, so that when I get an image in a particular orientation I can match it against my predefined images? I've looked into feature matching with OpenCV using the following tutorial, and it works correctly: it identifies the images as the same no matter their orientation, but I have no clue how to tell the orientations apart.
I've looked into feature matching with opencv using the following tutorial, and it works correctly
So you could establish a valid match between an image of unknown rotation and an image in your database? And the latter one is of a known rotation (i.e. upright)?
In this case you can compute a transformation matrix:
either a homography which defines a full planar transformation (use cv::findHomography)
or an affine transform, which expresses translation, rotation and scaling and thus seems best for your needs (use cv::estimateRigidTransform with fullAffine=true; see the sketch below). You can find more about affine transformations here
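A rough Python sketch of the affine variant (note: newer OpenCV builds replace estimateRigidTransform with estimateAffinePartial2D, which is used here; src_pts and dst_pts are assumed to be Nx2 float arrays of matched keypoint coordinates from the feature-matching step):

import cv2
import numpy as np

def estimate_quadrant_rotation(src_pts, dst_pts):
    # Partial affine = rotation + uniform scale + translation.
    M, inliers = cv2.estimateAffinePartial2D(src_pts, dst_pts, method=cv2.RANSAC)
    if M is None:
        return None
    angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
    # Snap the recovered angle to the nearest multiple of 90 degrees.
    return (int(round(angle / 90.0)) % 4) * 90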
If you don't have any known reference image, then this task seems mathematically unsolvable, but you could use something like an artificial-neural-network-based heuristic, which sounds like a very research-intensive project.
If you have the random image somewhere (say, you're trying to match a certain image against a list of images you have), you could try taking the difference between your random image and each known image four times, rotating the known image by 90 degrees each time. Whichever difference is closest to zero should be the orientation you want.
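Something like this, as a sketch (assuming both images are square, the same size, and already grayscale):

import numpy as np

def best_rotation(unknown, reference):
    # Rotate the known image by 0/90/180/270 degrees and compare each rotation
    # against the unknown image; the smallest total difference wins.
    scores = []
    for k in range(4):
        rotated = np.rot90(reference, k)
        # cast to a signed type to avoid uint8 wrap-around when subtracting
        scores.append(np.abs(unknown.astype(np.int16) - rotated.astype(np.int16)).sum())
    return int(np.argmin(scores)) * 90   # degrees of counter-clockwise rotation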
If your new image and the images in the list are the same size, you might also be able to just compare the keypoint positions (if the image is a match but all the keypoints are rotated a quadrant clockwise from each other, then it's 90 degrees off, etc.).
If you have no idea what that random image is supposed to be, I can't really think of any way to figure that out, unless you know for sure that a blob of light blue is supposed to be the sky. As far as I know, there's got to be something that you know to be up in order to determine what up is.
I have a large stack of images showing a bar with some dark blobs whose positions change with time (see figure, b). To detect the blobs I currently use an intensity threshold (c in the figure, where all intensity values below the threshold are set to 1) and then search for blobs in the binary image using the MATLAB code below. As you can see, the binary image is quite noisy, which complicates the blob detection process. Do you have any suggestions on how to improve the shape detection, perhaps including some machine learning algorithm? Thanks!
Code:
se = strel('disk',1);
se_1 = strel('disk',3);
pw2 = imclose(IM,se);            % close small gaps in the binary image
pw3 = imopen(pw2,se_1);          % remove small noise specks
pw4 = imfill(pw3, 'holes');      % fill holes inside the blobs
% Consider only the blobs with more than threshold pixels
[L,num] = bwlabel(pw4);
counts = sum(bsxfun(@eq,L(:),1:num));   % pixel count per labelled blob
number_valid_counts = length(find(counts>threshold));
This might help.
Extract texture features from the boundary of the blobs you want to detect. This can be done using local binary patterns. There are many other texture features; you can find a detailed survey here.
Then use them to train a binary classifier.
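A rough Python sketch of that idea (the question uses MATLAB; scikit-image and scikit-learn are used here purely for illustration, and the patch lists are assumed to be prepared elsewhere):

import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(patch, p=8, r=1):
    # Uniform LBP codes, summarized as a normalized histogram (the feature vector).
    lbp = local_binary_pattern(patch, p, r, method='uniform')
    hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
    return hist

# blob_patches / background_patches: small grayscale patches cut around blob
# boundaries and from non-blob regions (labelled training data).
# X = np.array([lbp_histogram(p) for p in blob_patches + background_patches])
# y = np.array([1] * len(blob_patches) + [0] * len(background_patches))
# clf = SVC().fit(X, y)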
It seems that the data appear as pulses in the lower part of the image. I suggest taking some images and slicing vertical lines of pixels perpendicular to the pulse direction. Each strip is one pixel wide and slightly taller than the pulse, so that it also captures some of the light values above and below it; you may take rows 420-490, for example, so each strip holds 70 grey values. Those 70 values form the feature vector. Also take strips from the non-blob areas to serve as class 2, and do this on several images, with several lines from each image.
Now you have your training data; you may use any machine learning algorithm to train a model to distinguish pulses from non-pulses.
In the test step, you scan the image, reading 70 pixels vertically at a time and testing them against the trained model. Create a new black output image; if a strip belongs to the "blob" class, draw a white vertical line at that column, otherwise draw nothing.
At the end of scanning the image, check for isolated white lines, which you may delete as false acceptances; if you find a dark line within a group of white lines, convert it to white, treating it as a false rejection.
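A rough Python sketch of this strip-based idea (the row range 420-490 and the 70-value strips come from the description above; scikit-learn's default KNN metric is used here instead of the Hassanat metric mentioned below, purely for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

ROW_LO, ROW_HI = 420, 490   # strip covers the pulse plus some margin (70 rows)

def column_strips(gray):
    # One 70-value feature vector per image column.
    return gray[ROW_LO:ROW_HI, :].T.astype(np.float32)

# Training data: strips cut from labelled columns of several images
# (blob columns -> 1, background columns -> 0), assumed prepared elsewhere.
# knn = KNeighborsClassifier(n_neighbors=3).fit(train_strips, train_labels)

def scan_image(gray, knn):
    strips = column_strips(gray)
    labels = knn.predict(strips)              # one label per column
    out = np.zeros_like(gray)
    out[ROW_LO:ROW_HI, labels == 1] = 255     # white line where a blob column is found
    return out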
You may use my classifier: https://www.researchgate.net/publication/265168466_Solving_the_Problem_of_the_K_Parameter_in_the_KNN_Classifier_Using_an_Ensemble_Learning_Approach
If you decide to, I will send you code to do it. The distance metric is a problem, because the values vary between 0 and 255, so the bright values will dominate the distance. To solve this problem you may use the Hassanat distance metric: https://www.researchgate.net/publication/264995324_Dimensionality_Invariant_Similarity_Measure
It is invariant to scale in the data, as each feature outputs a value between 0 and 1 at most, so the highest values will not dominate the final distance.
Good luck