I am learning OpenCV applications by reading research papers and attempting to duplicate their tests and results. I may have jumped a bit too far off the beaten path and am now curious about the proper way to go about this investigation.
Goal: 1) Register these two images. 2) Stack the exposures (there are actually 20+ in this series). 3) Learn.
Attached below is an example image, shot with a cell phone in low light, in burst mode. If one were to level-stretch it, one would see there are very few hard edges (some sheets), but there are enough details to manually align portions of the images with each other. I ran this through the default OpenCV implementations of ORB and SIFT and, as expected, came back with poor matches.
I have not yet stumbled upon a description of the right technique to improve edge detection. As mentioned, no hard edges are present. However, I thought I'd previously read that one could downsample the image using a max function and get better 'edge' detection. Those edges should then be able to provide a registration homography for the higher-resolution image. But I can find neither the resource describing this nor any descriptions of similar activity. Help here would be appreciated.
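To make the idea concrete, here is a minimal sketch of the max-based downsampling and of carrying a homography estimated at low resolution back to full resolution. The function names are made up for illustration, it assumes a single-channel image, and it ignores the half-pixel offset between the two pixel grids.

```python
import numpy as np

def max_pool_downsample(image, factor):
    """Downsample a 2D grayscale image by taking the max of each factor x factor block."""
    h, w = image.shape
    h_crop, w_crop = h - h % factor, w - w % factor  # drop rows/cols that do not fill a block
    blocks = image[:h_crop, :w_crop].reshape(
        h_crop // factor, factor, w_crop // factor, factor
    )
    return blocks.max(axis=(1, 3))

def scale_homography_to_full_res(h_small, factor):
    """Re-express a 3x3 homography estimated on the downsampled image in full-res pixels."""
    s = np.diag([float(factor), float(factor), 1.0])
    # full -> small coordinates, apply the low-res homography, then small -> full.
    return s @ h_small @ np.linalg.inv(s)
```

The low-resolution result would normally only be a starting point; a refinement pass at full resolution is still likely to be needed.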
In addition if there are any authored papers discussing this technique that I could be pointed to I'd appreciate it. I'm quite familiar with astrophotography and star stacking, and am looking forward to trying drizzle on a different type of image set.
Techniques I've tried on the downsampled image to better indicate edges: Difference of Gaussians, Laplacian, directional edge detection, and a few others.
I appreciate the time you've taken to help me learn how to expand my efforts for this.
Thank you.
Edit: Modifying the image's contrast, or brightness, or tonal response, has no effect on the correlation of the image content. At least in the limited set of tests I've been able to run. It makes them 'prettier' but, honestly, the algorithms don't care if they're in 'human visual space' or in 'linear digital counts'. I can post it as a pretty image but, without those sharp edges, most of the filters fail and matches don't succeed- which is the crux of my issues here.
I am thinking about creating a database system for images where they are stored with compact signatures and then matched against a "query image" that could be a resized, cropped, brightened, rotated or a flipped version of the stored one. Note that I am not talking about image similarity algorithms but rather strictly about duplicate detection. This would make things a lot simpler. The system wouldn't care if two images have an elephant on them, it would only be important to detect if the two images are in fact the same image.
Histogram comparisons simply won't work for cropped query images. The only viable way to go I see is shape/edge detection. Images would first be somehow discretized, every pixel being converted to an 8-level grayscale for example. The discretized image will contain vast regions in the same colour which would help indicate shapes. These shapes then could be described with coefficients and their relative position could be remembered. Compact signatures would be produced out of that. This process will be carried out over each image being stored and over each query image when a comparison has to be performed. Does that sound like an efficient and realisable algorithm? To illustrate this idea:
removed dead ImageShack link
I know this is an immature research area. I have read Wikipedia on the subject, and I would ask you to propose your ideas about such an algorithm.
SURF should do its job.
http://en.wikipedia.org/wiki/SURF
It is fast and robust; it is invariant to rotation and scaling and also, though less strongly, to blur and contrast/lighting changes.
There is an example of automatic panorama stitching.
Check the article on SIFT first:
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
If you want to do a feature detection driven model, you could perhaps take the singular value decomposition of the images (you'd probably have to do a SVD for each color) and use the first few columns of the U and V matrices along with the corresponding singular values to judge how similar the images are.
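A minimal sketch of such a signature, assuming a single grayscale channel. As a simplification it keeps only the leading singular values and ignores the U and V columns mentioned above; the function names and the choice of `k` are illustrative.

```python
import numpy as np

def svd_signature(gray, k=8):
    """Leading k singular values of a 2D image, scaled to unit length."""
    s = np.linalg.svd(gray.astype(np.float64), compute_uv=False)
    top = s[:k]
    norm = np.linalg.norm(top)
    return top / norm if norm > 0 else top

def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length."""
    return float(np.linalg.norm(sig_a - sig_b))
```

Singular values do not change under flips, transposition, or 90-degree rotations, and the normalization removes a global brightness scale. Cropping and resizing do change them, so this alone would not cover every case in the question.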
Very similar to the SVD method is one called principal component analysis, which I think will be easier to use to compare between images. The PCA method is pretty close to just taking the SVD and getting rid of the singular values by factoring them into the U and V matrices. If you follow the PCA path, you might also want to look into correspondence analysis. By the way, the PCA method was a common method used in the Netflix Prize for extracting features.
How about converting this Python code back to C?
Check out tineye.com. They have a good system that's always improving. I'm sure you can find research papers from them on the subject.
The Wikipedia article you might be referring to is the one on feature detection.
If you are running on an Intel/AMD processor, you could use the Intel Integrated Performance Primitives to get access to a library of image processing functions. Beyond that, there is the OpenCV project, another library of image processing functions. The advantage of using a library is that you can try various algorithms, already implemented, to see what will work for your situation.
We are experimenting with applying a convolutional neural network to classify good surfaces and surfaces with defects.
The good and bad images are mostly like the following:
Good ones:
Bad ones:
The image is relatively big (height: 800 pixels, width: 500 pixels)
The defect is very local and small relative to the image
The background is very noisy
The deep learning result (6 x conv+pooling -> flatten -> dense64 -> dense32) is very bad
(perhaps due to limited Bad samples and very small defect pattern)
There are other defect patterns like very subtle scratches, residuals and stains, etc., which is one of the main reasons that we want to use deep learning instead of specific feature engineering.
We can and are willing to accumulate more images of defects.
So the questions are:
Is deep learning even an appropriate tool for defect detection like this in practice?
If yes, how can we adapt or pre-process the images into a format that deep learning models can really work with? (Could we apply some known filters to make the background much less noisy?)
If no, what other practical techniques can be used instead of deep models?
Will things like template matching or anything else actually be a fit for this type of problem?
Update:
Very good idea to come up with an explicit circular stripes checker.
It might be directly used to check where the pattern is disturbed or be used as a pre-processing step for deep learning.
Update:
A more subtle pattern 'scratch'.
There is a scratch starting from the bottom of the fan area going up and a little to the right.
Is deep learning even an appropriate tool for defect detection like
this in practice.
Deep learning certainly is a possibility that promises to be universal. In general, though, it should be the last resort rather than the first approach. Downsides include:
It is difficult to include prior knowledge.
You therefore need an extreme amount of data to train the classifier for the general case.
If you succeed, the model is opaque. It might depend on subtle properties, which cause it to fail if the manufacturing process is changed in the slightest way and there is no easy way to fix it.
If yes, how can we adapt or pre-process the images to the formats that
the deep learning models can really work with. (Could we apply some
known filters to make the background much less noisy?)
Independent of the classifier you eventually decide to use, the preprocessing should be as good as you can make it.
Illumination: The illumination is uneven. I'd suggest defining a region of interest in which the illumination is bright enough to see something. I'd suggest calculating the average intensity over many images and using this to normalize the brightness. The result would be an image cropped to the region of interest, where the illumination is homogeneous.
Circular stripes: In the images you show, the stripes are circular, so their orientation depends on the position in the image. I would suggest using a transformation that maps the region of interest (a fraction of a circle) onto a trapezoid, where each stripe is horizontal and the length of each stripe is retained.
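The illumination step could look roughly like the sketch below. It assumes all images are grayscale, the same size, and taken with the same lighting setup; the function names and the `min_brightness` threshold are illustrative and would have to be tuned on real data.

```python
import numpy as np

def illumination_profile(images):
    """Pixel-wise mean intensity over a stack of same-sized grayscale images."""
    return np.mean(np.stack(images).astype(np.float64), axis=0)

def normalize_illumination(image, profile, min_brightness=10.0):
    """Divide by the average illumination inside the sufficiently bright region.

    Returns the normalized image (zero outside the region) and the boolean
    region-of-interest mask.
    """
    roi = profile >= min_brightness  # too-dark pixels are excluded
    out = np.zeros(image.shape, dtype=np.float64)
    out[roi] = image[roi] / profile[roi]
    return out, roi
```

After this step a value near 1.0 means "as bright as usual at this position", so deviations are easier to compare across the image.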
If no, what are other practical techniques that can be used other than
deep models. Will things like template matching or anything else
actually be a fit for this type of problems?
Rather than identifying defects, you could try identifying the intact structure, which has relatively constant properties. (This would be the circular stripes checker that I suggested in the comment.) Here, one obvious thing to test would be a 2D Fourier transform of a small window around each pixel of an image preprocessed as described above. If the stripes are intact, you should see that the frequency of intensity change is much lower in the horizontal than in the vertical direction. I would just plot these two quantities for many "good" and "bad" pixels and check whether that might already allow some classification.
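One way to get the two quantities for a single window is sketched below. The power-weighted mean frequency is just one possible summary, and the function name is made up; the window size is left to the caller.

```python
import numpy as np

def directional_frequency(patch):
    """Power-weighted mean |frequency| of a 2D patch as (horizontal, vertical), in cycles/pixel."""
    patch = patch.astype(np.float64)
    patch = patch - patch.mean()  # remove the DC component
    power = np.abs(np.fft.fft2(patch)) ** 2
    total = power.sum()
    if total == 0:
        return 0.0, 0.0  # perfectly flat patch
    fy = np.abs(np.fft.fftfreq(patch.shape[0]))[:, None]  # varies along rows
    fx = np.abs(np.fft.fftfreq(patch.shape[1]))[None, :]  # varies along columns
    return float((power * fx).sum() / total), float((power * fy).sum() / total)
```

For intact horizontal stripes the first value should stay close to zero while the second reflects the stripe spacing; a scratch crossing the stripes would be expected to raise the horizontal value.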
If you can preselect possible defects with that method, you could then crop out a small image and subject it to deep learning or whatever other method you want to use.
I'm building a UGV prototype. The goal is to perform the desired actions on targets placed within a maze. From what I find on the Internet, navigating a labyrinth is usually done with just a distance sensor. I'd like to ask for ideas more than for a single answer.
I want to navigate the labyrinth by analyzing the image from a 3D stereo camera. Is there a resource or successful method you can suggest for this? As a secondary problem, the car must start in front of the entrance of the labyrinth, see the entrance and go in, and then leave the labyrinth after it completes its operations inside.
I would be glad if you suggest a source for this problem. :)
The problem description is a bit vague, but I'll try to highlight some general ideas.
A useful assumption is that the labyrinth is a 2D environment which you want to explore. You need to know, at every moment, which part of the map has been explored, which part of the map still needs exploring, and which part of the map is accessible in any way (in other words, where the walls are).
An easy initial data structure to help with this is a simple matrix, where each cell represents a square in the real world. Each cell can then be labelled according to its state, starting in an unexplored state. Then you start moving, and exploring. Based on the distances reported by the camera, you can estimate the state of each cell. The exploration can be guided by something such as A* or Q-learning.
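A minimal sketch of such a grid, using a plain breadth-first search to the nearest unexplored cell as a simpler stand-in for A* or Q-learning. The cell labels and the function name are illustrative.

```python
from collections import deque

UNKNOWN, FREE, WALL = 0, 1, 2  # illustrative cell states

def nearest_unknown_path(grid, start):
    """Shortest 4-connected path from start to the closest UNKNOWN cell.

    The search only travels through FREE cells. Returns a list of (row, col)
    cells including start and the target, or None if nothing is left to explore.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        r, c = cell
        if grid[r][c] == UNKNOWN:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and nxt not in parent and grid[nr][nc] != WALL):
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

The robot would follow the returned path, update the cells it can now see, and repeat until the function returns None.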
Now, a rather subtle issue is that you will have to deal with uncertainty and noise. Sometimes you can ignore it, sometimes you can't. The finer the resolution you need, the bigger the issue becomes. A probabilistic framework is most likely the best solution.
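One common probabilistic treatment of such a grid is a log-odds occupancy update, sketched below. It assumes each measurement can be turned into a probability that the cell is occupied (0.7 in the usage note is a made-up value) and that the prior for every cell is 0.5.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def update_log_odds(cell_log_odds, p_occupied_given_measurement):
    """Fold one measurement into a cell's occupancy log-odds (prior assumed 0.5)."""
    return cell_log_odds + logit(p_occupied_given_measurement)

def probability(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Starting a cell at 0.0 and applying `update_log_odds(cell, 0.7)` after each "looks like a wall" reading makes the cell's probability rise gradually, so a single noisy reading does not flip its state.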
There is an entire field of research on the so-called SLAM algorithms. SLAM stands for simultaneous localization and mapping. They build a map using some sort of input from various types of cameras or sensors. While building the map, they also solve the localization problem within the map. The algorithms are usually designed for 3D environments, and are more demanding than the simpler solution indicated above, but you can find ready-to-use implementations. For exploration, something like Q-learning still has to be used.
I've used Kirsch filter to try and obtain the blood vessels, but the result isn't the best, as shown below:
Although the vessels have been obtained, they aren't bright enough. How do I go about making them 'more visible'?
I worked on retina vessel detection for a bit a few years ago, and there are different ways to do it:
If you don't need a top result but something fast, you can use oriented openings, see here and here.
Then there is another version, using mathematical morphology, here.
For better results, here are some ideas:
Personally, I used a combination of Gabor filters, and the results were pretty good. See the segmentation result here on the first image of DRIVE.
Gabor filters can also be combined with learning for a good result, or see here.
A few years ago, they claimed to have the best algorithm, but I've never had the opportunity to test it. I was sceptical about the performance gap, and the way they thresholded the line detector results was kind of obscure.
I know that nowadays many people try to tackle the problem using CNNs, but I've not heard of significant improvements.
[EDIT] To answer your specific question, you can erase the bright ring, and then apply a histogram stretching. But I think that the methods I introduced before will work better than the filter you are using.
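A rough sketch of that suggestion, assuming you already have a boolean mask that is False on the bright ring (how to obtain it is not covered here). The function name and the percentile defaults are illustrative.

```python
import numpy as np

def stretch_contrast(image, valid_mask, low_pct=1.0, high_pct=99.0):
    """Linearly stretch intensities using percentiles taken inside valid_mask only.

    Pixels outside the mask (e.g. the bright ring) are set to 0 in the output.
    """
    values = image[valid_mask].astype(np.float64)
    lo, hi = np.percentile(values, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros(image.shape, dtype=np.uint8)  # no contrast to stretch
    scaled = np.clip((image.astype(np.float64) - lo) / (hi - lo), 0.0, 1.0)
    scaled[~valid_mask] = 0.0
    return (scaled * 255.0).round().astype(np.uint8)
```

Leaving the ring out of the percentile computation matters: otherwise its high values set the upper limit and the vessels stay compressed into a narrow dark range.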
It looks like the solution to your problem is histogram equalization (we had the same problem for homework):
http://docs.opencv.org/3.1.0/d5/daf/tutorial_py_histogram_equalization.html#gsc.tab=0
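For reference, a minimal NumPy sketch of global histogram equalization for an 8-bit grayscale image. The function name is made up; in OpenCV, `cv2.equalizeHist` does the same job for 8-bit single-channel images.

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization of a 2D uint8 image via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # cumulative count at the darkest level present
    denom = cdf[-1] - cdf_min
    if denom == 0:
        return gray.copy()  # single-valued image, nothing to equalize
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255.0), 0, 255).astype(np.uint8)
    return lut[gray]  # apply the lookup table per pixel
```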
I'm trying to locate a (possibly perspective-deformed) book in an image and extract it so that it is "straight" and "front-on" (i.e. perspective-corrected).
The particular book is unknown -- there is no query or reference image to check for matches against (i.e. by some sort of feature descriptor matching process). In other words, I'm trying to hunt through the image and find a bunch of pixels that look like they belong to the object class "book", not a particular book.
The book may be somewhat rotated or otherwise perspective-deformed. However, it is assumed the amount of deformation is within fairly reasonable bounds: the person taking the photo is working "with" me. This means as well that the book should feature prominently in the image -- perhaps 30-90% of total image area (and not as some random item amidst a bunch of other clutter).
Good resources exist for (superficially) similar problems online. For example, this well-written tutorial covers automatic perspective-correction of playing cards: https://opencv-code.com/tutorials/automatic-perspective-correction-for-quadrilateral-objects/.
Currently, the system follows a process loosely similar to this tutorial's, with some additions. The general technique stack is:
Pre-processing
Find edges with Canny edge detection
Find edges that look like lines with Hough transform
Find intersection points between lines in the hope of finding book corners
Filter out implausible lines and intersection points based on simple geometric properties
Take convex hull of intersection points
Get polygon approximation to the convex hull and use this to get four corners
Apply perspective/homographic transform
The output points (used to calculate the perspective transform) are known because we assume a known aspect ratio (i.e. book dimensions).
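For concreteness, the intersection and transform steps can be sketched as below. This is a simplified NumPy illustration, not the code of the system described: it assumes lines in the (rho, theta) form that `cv2.HoughLines` returns, and all function names are made up.

```python
import numpy as np

def hough_line_to_homogeneous(rho, theta):
    """Line x*cos(theta) + y*sin(theta) = rho as (a, b, c) with a*x + b*y + c = 0."""
    return np.array([np.cos(theta), np.sin(theta), -rho])

def intersect(line_a, line_b):
    """Intersection point of two homogeneous lines, or None if they are (nearly) parallel."""
    x, y, w = np.cross(line_a, line_b)
    if abs(w) < 1e-9:
        return None
    return np.array([x / w, y / w])

def homography_from_points(src, dst):
    """Direct linear transform from 4 or more point pairs, scaled so H[2, 2] == 1.

    Assumes a non-degenerate configuration (no three points collinear).
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=np.float64))
    h = vt[-1].reshape(3, 3)  # null-space vector of the constraint matrix
    return h / h[2, 2]
```

With the four detected corners as `src` and a rectangle of the assumed aspect ratio as `dst`, the result plays the same role as the matrix from `cv2.getPerspectiveTransform`.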
It works for some images where the book is against fairly homogeneous backgrounds (around 1/3 to 1/2 of "nicer" images). After experimenting with the fairly dumb convex hull approach as well as a more involved quadrilateral-enumeration approach, I've concluded that the problem may be impossible using just geometric/spatial information alone -- it would probably need augmenting with colour/texture information (well, this is obvious when you consider the case of 180 degrees rotation/upside-down books).
The obvious challenge is that there is an almost infinite variety of possible book covers, and an almost infinite variety of possible backgrounds. Therefore, solving for the general case would be impossible or at least intractably hard. I knew this when I began the task. But, I hoped it would be the sort of problem that may have a solution enough of the time.
Other approaches I've considered looking at include OCRing the titles/text to work out orientation or possibly general position. The other approach that might conceivably be fruitful is some sort of learning-based classifier.
A related subtask I'm working on is the same goal but in a webcam video stream. This is definitely easier since I can use temporal information (i.e. position across frames). I just started this one yesterday but, after some initial progress, plateaued. A human holding the book generates background movement noise which throws off trivial approaches like frame differencing / background subtraction. Compared with the static image problem, however, I feel this is far more doable.
Sorry if that was a little long-winded. I wanted to make sure I made a sincere effort to articulate the problem(s). What do people think? Anyone have any thoughts as to how these problems might best be tackled?
Does calculating the homography from 4 lines instead of 4 points help? As you probably know, if points are related as p2 = H p1, the corresponding lines (as column vectors) are related as l2 = H^-T l1, i.e. by the inverse transpose of H. The lines on the book border should be quite prominent, especially if the deformation is not large. Is your main problem selecting the right lines (you did not actually say what your problem is)? Maybe some kind of Hough rectangle detection can help to find the lines?
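A small numerical sketch of the point/line relation, with lines written as column vectors (a, b, c) for a*x + b*y + c = 0, in which case the line transform is the inverse transpose of H. The function names are made up for illustration.

```python
import numpy as np

def transform_point(h, p):
    """Map a 2D point through the homography h."""
    q = h @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

def transform_line(h, line):
    """Map a homogeneous line (a, b, c) through h using the inverse transpose."""
    return np.linalg.inv(h).T @ np.asarray(line, dtype=np.float64)

def line_through(p1, p2):
    """Homogeneous line through two 2D points."""
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])
```

If `l = line_through(p1, p2)`, the transformed points `transform_point(h, p1)` and `transform_point(h, p2)` lie on `transform_line(h, l)`, which is easy to check numerically.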
Anyway, selecting lines as the homography input has the additional advantage that a RANSAC homography with a constraint on the aspect ratio is likely to keep the right lines as inliers in the presence of numerous outliers from the background. And if those outliers do sneak in, they probably look like another book.