Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to locate a (possibly perspective-deformed) book in an image and extract it so that it is "straight" and "front-on" (i.e. perspective-corrected).
The particular book is unknown -- there is no query or reference image to check for matches against (i.e. by some sort of feature descriptor matching process). In other words, I'm trying to hunt through the image and find a bunch of pixels that look like they belong to the object class "book", not a particular book.
The book may be somewhat rotated or otherwise perspective-deformed. However, it is assumed the amount of deformation is within fairly reasonable bounds: the person taking the photo is working "with" me. This means as well that the book should feature prominently in the image -- perhaps 30-90% of total image area (and not as some random item amidst a bunch of other clutter).
Good resources exist for (superficially) similar problems online. For example, this well-written tutorial covers automatic perspective-correction of playing cards: https://opencv-code.com/tutorials/automatic-perspective-correction-for-quadrilateral-objects/.
Currently, the system follows a process loosely similar to that tutorial, with some additions. The general technique stack is:
Pre-processing
Find edges with Canny edge detection
Find edges that look like lines with Hough transform
Find intersection points between lines in the hope of finding book corners
Filter out implausible lines and intersection points based on simple geometric properties
Take convex hull of intersection points
Get polygon approximation to the convex hull and use this to get four corners
Apply perspective/homographic transform
The output points (used to calculate the perspective transform) are known because we assume a known aspect ratio (i.e. book dimensions).
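For reference, the "find intersection points" and "filter out implausible lines" steps above can be sketched roughly as follows. This is a minimal sketch, not the actual code of the system described: it assumes the Hough transform returns lines in normal form (rho, theta), and the function names and the 30-degree minimum angle are illustrative assumptions.

```python
import itertools
import math


def intersect_polar(line_a, line_b, eps=1e-9):
    """Intersect two lines given in Hough normal form (rho, theta).

    Each line satisfies x*cos(theta) + y*sin(theta) = rho.
    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    # Determinant of the 2x2 system; zero means parallel lines.
    det = math.cos(theta1) * math.sin(theta2) - math.sin(theta1) * math.cos(theta2)
    if abs(det) < eps:
        return None
    # Cramer's rule.
    x = (rho1 * math.sin(theta2) - rho2 * math.sin(theta1)) / det
    y = (rho2 * math.cos(theta1) - rho1 * math.cos(theta2)) / det
    return x, y


def candidate_corners(lines, width, height, min_angle_deg=30.0):
    """Intersect every pair of lines and keep plausible corner candidates.

    A pair is skipped when the lines meet at too shallow an angle
    (hypothetical threshold), and a point is dropped when it falls
    outside the image.
    """
    min_angle = math.radians(min_angle_deg)
    corners = []
    for a, b in itertools.combinations(lines, 2):
        diff = abs(a[1] - b[1]) % math.pi
        angle = min(diff, math.pi - diff)
        if angle < min_angle:
            continue
        point = intersect_polar(a, b)
        if point is None:
            continue
        x, y = point
        if 0 <= x < width and 0 <= y < height:
            corners.append(point)
    return corners
```

The surviving points would then go to the convex hull and polygon approximation steps.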
It works for some images where the book is against fairly homogeneous backgrounds (around 1/3 to 1/2 of "nicer" images). After experimenting with the fairly dumb convex hull approach as well as a more involved quadrilateral-enumeration approach, I've concluded that the problem may be impossible using just geometric/spatial information alone -- it would probably need augmenting with colour/texture information (well, this is obvious when you consider the case of 180 degrees rotation/upside-down books).
The obvious challenge is that there is an almost infinite variety of possible book covers, and an almost infinite variety of possible backgrounds. Therefore, solving for the general case would be impossible or at least intractably hard. I knew this when I began the task. But, I hoped it would be the sort of problem that may have a solution enough of the time.
Other approaches I've considered looking at include OCRing the titles/text to work out orientation or possibly general position. The other approach that might conceivably be fruitful is some sort of learning-based classifier.
A related subtask I'm working on is the same goal but in a webcam video stream. This is definitely easier since I can use temporal information (i.e. position across frames). I just started this one yesterday but, after some initial progress, plateaued. A human holding the book generates background movement noise which throws off trivial approaches like frame differencing / background subtraction. Compared with the static image problem, however, I feel this is far more doable.
Sorry if that was a little long-winded. I wanted to make sure I made a sincere effort to articulate the problem(s). What do people think? Anyone have any thoughts as to how these problems might best be tackled?
Does calculating the homography from 4 lines instead of 4 points help? As you probably know, if points are related as $p_2 = H p_1$, then lines are related as $l_2 = H^{-T} l_1$. The lines on the book border should be quite prominent, especially if the deformation is not large. Is your main problem selecting the right lines (you did not actually say what your problem was)? Maybe some kind of Hough-rectangle detector can help to find the lines.
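To make the point/line relation concrete, here is a small sketch in plain Python. It uses the standard projective-geometry result that, when points map through H, lines map through the inverse transpose of H. The helper names and the example matrix are made up for illustration.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def transpose(m):
    return [[m[c][r] for c in range(3)] for r in range(3)]


def inverse3(m):
    """Invert a 3x3 matrix via the adjugate (fine for a sketch)."""
    (a, b, c), (d, e, f), (g, h, k) = m
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * k - f * h, c * h - b * k, b * f - c * e],
        [f * g - d * k, a * k - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


def line_through(p, q):
    """Homogeneous line through two homogeneous points (cross product)."""
    return [
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    ]


def map_line(H, line):
    """Map a line through the homography H: l2 = inverse(H)^T * l1."""
    return mat_vec(transpose(inverse3(H)), line)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

A quick sanity check is that a point lying on a line still lies on the mapped line after both are transformed, i.e. `dot(map_line(H, l), mat_vec(H, p))` stays (numerically) zero.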
Anyway, selecting lines as the homography input has an additional advantage: a RANSAC homography with a constraint on aspect ratio is likely to keep the right lines as inliers in the presence of numerous outliers from the background. And if those outliers do sneak in, they probably look like another book.
I am learning OpenCV applications by reading research papers and attempting to duplicate their tests and results. I may have jumped a bit too far off the beaten path and am now curious about the proper way to go about this investigation.
Goal: 1) Register these two images. 2) Stack the exposures (there are actually 20+ in this series). 3) Learn.
Attached below is an example image, shot with a cell phone in low light in burst mode. If you level-stretch it, you will see there are very few hard edges (some sheets), but there are enough details to manually align portions of the images with each other. I ran this through the default OpenCV implementations of ORB and SIFT and, as expected, came back with poor matches.
I have not yet stumbled upon the right technique described to increase edge detection. As mentioned, no hard edges are present. However I thought I'd previously read that one could downsample the image using a max function and get a better 'edge' detection. That edge should be able to provide registration homography to the higher resolution image. But I can neither find the resource to do so nor any descriptions of similar activity. Help here would be appreciated.
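For concreteness, the "downsample using a max function" idea amounts to what is usually called max pooling. Below is a minimal sketch on a plain list-of-rows image. The function name and block size are illustrative, and whether the result gives edges usable for registration is untested here.

```python
def max_pool(image, factor):
    """Downsample a 2D image (list of rows) by keeping the maximum of each
    factor x factor block. Incomplete blocks at the right/bottom edges are
    dropped."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    return [
        [
            max(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

A homography estimated on the pooled image would have to be rescaled by `factor` before being applied to the full-resolution frame.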
In addition if there are any authored papers discussing this technique that I could be pointed to I'd appreciate it. I'm quite familiar with astrophotography and star stacking, and am looking forward to trying drizzle on a different type of image set.
Techniques I've tried, together with downsampling, to bring out edges better: difference of Gaussians, Laplacian, directional edge detection, and a few others.
I appreciate the time you've taken to help me learn how to expand my efforts for this.
Thank you.
Edit: Modifying the image's contrast, or brightness, or tonal response, has no effect on the correlation of the image content. At least in the limited set of tests I've been able to run. It makes them 'prettier' but, honestly, the algorithms don't care if they're in 'human visual space' or in 'linear digital counts'. I can post it as a pretty image but, without those sharp edges, most of the filters fail and matches don't succeed- which is the crux of my issues here.
I'm building a UGV prototype. The goal is to perform specified actions on targets placed within a maze. From what I can find online, navigating a labyrinth is usually done with a distance sensor. I'd like to gather ideas rather than ask one narrow question.
I want to navigate the labyrinth by analyzing the image from the 3d stereo camera. Is there a resource or successful method you can suggest for this? As a secondary problem, the car must start in front of the entrance of the labyrinth, see the entrance and go in, and then leave the labyrinth after it completes operations in the labyrinth.
I would be glad if you suggest a source for this problem. :)
The problem description is a bit vague, but I'll try to highlight some general ideas.
A useful assumption is that the labyrinth is a 2D environment which you want to explore. You need to know, at every moment, which part of the map has been explored, which part of the map still needs exploring, and which part of the map is accessible at all (in other words, where the walls are).
An easy initial data structure to help with this is a simple matrix, where each cell represents a square in the real world. Each cell can be then labelled according to its state, starting in an unexplored state. Then you start moving, and exploring. Based on the distances reported by the camera, you can estimate the state of each cell. The exploration can be guided by something such as A* or Q-learning.
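As a rough illustration of the matrix idea, here is a minimal sketch in plain Python. The three cell states and the function name are assumptions for illustration, and plain breadth-first search stands in for A* or Q-learning as the simplest way to reach the nearest unexplored cell.

```python
from collections import deque

# Hypothetical cell labels for the grid map.
UNEXPLORED, FREE, WALL = 0, 1, 2


def nearest_unexplored(grid, start):
    """Breadth-first search from `start` through non-wall cells.

    Returns the path (list of (row, col)) to the nearest UNEXPLORED cell,
    or None if every reachable cell has already been explored.
    The search never expands past an unexplored cell, because it returns
    as soon as one is dequeued.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        r, c = cell
        if grid[r][c] == UNEXPLORED:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (
                0 <= nxt[0] < rows
                and 0 <= nxt[1] < cols
                and nxt not in parent
                and grid[nxt[0]][nxt[1]] != WALL
            ):
                parent[nxt] = cell
                queue.append(nxt)
    return None
```

Each time the camera reveals new cells, you would relabel them as FREE or WALL and call the function again from the vehicle's current cell.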
Now, a rather subtle issue is that you will have to deal with uncertainty and noise. Sometimes you can ignore it, sometimes you can't. The finer the resolution you need, the bigger the issue. A probabilistic framework is most likely the best solution.
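One common probabilistic treatment is an occupancy grid, where each cell stores the log-odds of being a wall and every noisy measurement nudges that value. A minimal sketch follows. The sensor probabilities 0.7 and 0.4 are made-up assumptions that you would need to tune for your camera.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def update_cell(log_odds, hit, p_hit=0.7, p_miss=0.4):
    """Fold one measurement into a cell's log-odds of being occupied.

    hit=True means the sensor reported an obstacle in this cell.
    p_hit / p_miss are assumed values of P(occupied | measurement).
    """
    return log_odds + logit(p_hit if hit else p_miss)


def probability(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A cell starts at log-odds 0 (probability 0.5, i.e. unknown). Repeated consistent readings push it towards 0 or 1, while a single spurious reading only shifts it a little.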
There is an entire field of research on so-called SLAM algorithms. SLAM stands for simultaneous localization and mapping. These algorithms build a map from the input of various types of cameras or sensors and, while building it, also solve the problem of localizing within it. They are usually designed for 3D environments and are more demanding than the simpler solution indicated above, but you can find ready-to-use implementations. For exploration, something like Q-learning still has to be used.
Closed 5 years ago.
I have a webcam that detects faces and stores them in an image repository. Only face images are stored there, and many of them are duplicates of the same face. Is there any way I can detect the duplicate faces?
I tried computing a hash of each image, but that only detects duplicate images, not duplicate faces. Please suggest the best possible solution.
I also tried https://www.tensorflow.org/api_docs/python/tf/contrib/learn/KMeansClustering but I was unable to feed it images and run it.
Thanks
avinash
Face detection = identifying the fact that a face appears in an image, and locating where it is in the image.
Face recognition = matching the face to a known person's identity, or matching multiple face images to each other based on the fact that they are images of the same person.
You say your webcam does (at least) face detection. Your question indicates that you also want to do recognition.
Both these processes require the extraction of high-level invariants in the image. Computation of these invariant feature representations is pretty much at the cutting edge of modern computer vision. Simply hashing pixel values is light-years away from this: the hash of an image will differ by an arbitrary degree as soon as the intensity of even one pixel changes by even one level in even one channel. And of course, a tiny change like that does not change the identity of the face in the image. Even much, much larger changes at the pixel level will not necessarily change the identity - they might be due to rotation of the head, different lighting conditions, beards/sunglasses/hairstyle changes, etc.
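To see how brittle pixel hashing is, here is a tiny sketch using only the standard library. The "image" is just a made-up byte buffer standing in for pixel data, and SHA-256 is used as an example of an ordinary (non-perceptual) hash.

```python
import hashlib


def digest(pixel_bytes):
    """Hex SHA-256 digest of raw pixel data."""
    return hashlib.sha256(bytes(pixel_bytes)).hexdigest()


def differing_bits(hex_a, hex_b):
    """Number of bits that differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Stand-in "image": 1024 bytes of fake pixel intensities.
original = bytearray(range(256)) * 4

# The same image with a single pixel changed by one intensity level.
changed = bytearray(original)
changed[500] = (changed[500] + 1) % 256

print(digest(original))
print(digest(changed))
print(differing_bits(digest(original), digest(changed)), "of 256 bits differ")
```

The two digests are unrelated even though no human could tell the two images apart, which is why a hash can only ever find byte-identical files.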
If you say that your webcam "detects faces", is that due to face-detection technology supplied by the webcam manufacturer? If so, start with their API documentation. Maybe they have support for face recognition as well? Check that out, and then google "face recognition library" to compare other software approaches to this complex problem.
One option you might decide to explore further is OpenCV, which has Python bindings and which contains tools for both detection (using the CascadeClassifier object) and recognition (using FaceRecognizer). Here is a tutorial: http://docs.opencv.org/2.4/modules/contrib/doc/facerec/tutorial/facerec_video_recognition.html
Most "recognition" approaches are supervised in that they require a "training set" to be specified in advance. Maybe your application allows for this: maybe you know in advance who is going to be in the photos and already have pictures of them, each associated with an identity. (For example, Facebook's face recognition is able to exploit the fact that people have previously tagged faces in photos and thereby provided multiple training points for images that are associated with a particular identity.) If not, then you'll have to come up with some scheme of building a training set on-the-fly, and continuously or periodically updating the training. This particular subtype of face recognition problem can be described as "unsupervised face clustering" - i.e. grouping face images together without knowing a-priori the identity of any of them. Facebook also does this to some extent. This is even closer to the cutting edge of the cutting edge, and you'll probably need to delve into the computer-science literature to figure out how it's done. See here, for example: http://bitsearch.blogspot.com/2013/02/unsupervised-face-clustering-with-opencv.html
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
I have series of line data (2-3 connected points).
What is the best machine learning algorithm for grouping lines by how similar their locations are? (image below)
Preferably python libraries such as SciKit-Learn.
Edit:
I have tried DBSCAN, but the problem I faced was that if two lines intersect each other, DBSCAN sometimes puts them in one group even though they run in completely different directions.
Here is a solution I found so far:
GeoPath Clustering Algorithm
The idea here is to cluster geo paths that travel very similar to each other into groups.
Steps:
1- Cluster lines based on slope
2- Within each cluster from step 1, find the centroid of each line and use the k-means algorithm to cluster them into smaller groups
3- Within each group from step 2, calculate the length of each line and group lines that fall within a defined length threshold
The result will be small groups of lines that have a similar slope, are close to each other, and have a similar travel distance.
Here are screenshots of the visualization:
Yellow lines are all the lines; red lines are clusters of paths that travel together.
I'll throw in an answer since I think the current one is incomplete... and I also think the comment of "simple heuristic" is premature. I think that if you cluster on points, you'll get a different result from what your diagram depicts: the clusters will sit near the end-points and you won't get your nice ellipses.
So, if your data really does behave similarly to how you display it, I would take a stab at turning each set of 2/3 points into a longer list of points that basically trace out the lines. (You will need to experiment with how dense.)
Then run HDBSCAN on the result to get your clusters; see this video ( https://www.youtube.com/watch?v=AgPQ76RIi6A ). I believe "pip install hdbscan" installs it.
Now, when testing a new sample, first decompose it into many(N) points and fit them with your hdbscan model. I reckon that if you take a majority voting approach with your N points, you'll get the best overall cluster to which the "line" belongs.
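Like the rest of this answer, the following is untested against real data, but the "trace out the lines" and "majority vote" parts might look something like this. The function names and the `spacing` parameter are my own, and the clustering call itself is left out. The vote assumes noise points are labelled -1, which is what HDBSCAN does.

```python
import math
from collections import Counter


def densify(points, spacing):
    """Resample a polyline so consecutive points are at most `spacing` apart.

    `points` is a list of (x, y) tuples; original vertices are kept.
    """
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        steps = max(1, math.ceil(seg_len / spacing))
        for k in range(1, steps + 1):
            t = k / steps
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def majority_label(labels, noise_label=-1):
    """Pick the most common cluster label among a line's points.

    Points labelled as noise are ignored; if every point is noise,
    the line itself is reported as noise.
    """
    votes = Counter(label for label in labels if label != noise_label)
    if not votes:
        return noise_label
    return votes.most_common(1)[0][0]
```

You would feed the concatenated `densify` output of all lines to the clusterer, remember which points came from which line, and then call `majority_label` on each line's point labels.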
So, while I sort of agree with the "simple heuristic" comment, it's not so simple if you want the whole thing automated. And once you watch the video you may be convinced that HDBSCAN, because of its density-based algorithm, will suit this problem(if you decide to create many points from each sample).
I'll wrap up by saying that I'm sure there are line-intersection models that have done this before... and that there do exist heuristics and rules that can do the job. Likely, they're computationally more economical too. My answer is just something organic using sklearn as you requested... and I haven't even tested it! It's just how I would proceed if I were in your shoes.
edit
I poked around and there are a couple of line similarity measures you could try: the Frechet and Hausdorff distances.
Frechet: http://arxiv.org/pdf/1307.6628.pdf
Hausdorff: see "distance matrix of curves in python" for a Python example.
If you generate all pair-wise similarities and then group them according to similarity and/or into N bins, you can then call those bins your "clusters" (not kmeans clusters though!). For each new line, generate all similarities and see which bin it belongs to. I revise my original comment of possibly being computationally less intensive...you're lucky your lines only have 2 or 3 points!
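For the pair-wise part, a bare-bones version of the Hausdorff distance in plain Python could look like the sketch below (it needs Python 3.8+ for `math.dist`). Note this is the discrete variant that only looks at the given vertices. With just 2 or 3 points per line you may want to add intermediate points first so it better approximates the distance between the full segments.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric (discrete) Hausdorff distance between two point lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def distance_matrix(lines):
    """All pair-wise Hausdorff distances as a symmetric list-of-lists."""
    n = len(lines)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hausdorff(lines[i], lines[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

The resulting matrix is what you would then threshold or bin into groups.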
The problem you're trying to solve is called clustering. For an overview of clustering algorithms in sklearn, see http://scikit-learn.org/stable/modules/clustering.html#clustering.
Edit 2: KMeans was what sprang to mind when I first saw your post, but based on feedback from the comments it looks like it's not a good fit. You may want to try sklearn's DBSCAN instead.
A potential transformation or extra feature you could add would be to fit a straight line to each set of points, and then use the (slope, intercept) pair. You may also want to use the centroid of each line.
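A quick sketch of that feature extraction, using an ordinary least-squares fit in plain Python. The function name is hypothetical. One caveat that is my own addition: slope and intercept are undefined for vertical lines, so the sketch raises an error there, and an angle-based parameterisation may serve you better if your data contains such lines.

```python
def line_features(points):
    """Return (slope, intercept, centroid_x, centroid_y) for a small polyline.

    The slope and intercept come from a least-squares straight-line fit
    through the points. Vertical lines have no finite slope and raise
    ValueError.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    if sxx < 1e-12:
        raise ValueError("vertical line: slope is undefined")
    slope = sxy / sxx
    intercept = cy - slope * cx
    return slope, intercept, cx, cy
```

Each line then becomes one row of a feature matrix you can hand to a clusterer. Since slope, intercept and centroid live on different scales, you will probably want to standardise the columns first.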
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I made a heightmap generator which uses gradient/value noise to generate terrain. The problem is that the height map is too chaotic to look realistic.
Here's what I am talking about:
Here's the map without the colors:
I used a 257x257 grid of blocks with 17x17 gradients.
As you can see, there are too many islands, and there are some random small beach islands in the middle of the ocean.
Also, there are a lot of sharp edges, especially in the mountain terrain (dark gray).
What I would like is a smoother and less chaotic terrain, such as a large island, etcetera. How do I do that?
In games, the most common noise generator for textures and heightmaps is the Perlin Noise.
I can't tell from your question whether you want to write the noise generator yourself or just use one in your application.
If you are looking to create your own Perlin Noise Generator, this would be a good starting point.
I would however recommend using the noise (https://pypi.python.org/pypi/noise/) library available through pip using:
pip install noise
You can then use the noise.snoise2(x,y,a,b,c) function and fiddle with the different parameters.
I would recommend reading this article: http://simblob.blogspot.ch/2010/01/simple-map-generation.html if you want to learn more about terrain generation.
Look at this article where Amit walks through some map generation techniques. He even has sample code online.
In the article, he takes perlin noise as a randomization parameter to his terrain generator, but doesn't use it as the whole generator. The result looks really good. (I'd post a picture of the result, but I don't know of copyright issues just yet.)
While you're at it, Amit has written and curated material on game programming for years and years. Here and here are a few more articles of his on the subject. I hope this doesn't become a time sink for you; I've certainly spent many hours on his blog. :)
(PS. I prefer simplex noise over perlin noise. Same inventor, simpler implementation, and looks better to me.)
From what I see, your sample may lack octaves and interpolation.
Depending on the implementation you are using, you may play with octave number, frequency, persistence / lacunarity, various interpolation techniques, etc...
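To show what those knobs do, here is a minimal sketch of fractal value noise in plain Python. It is not your generator and not any particular library's API: the integer hash, the function names and the default parameter values are all assumptions chosen for illustration.

```python
import math


def lattice_value(ix, iy, seed=0):
    """Deterministic pseudo-random value in [0, 1) for an integer grid point."""
    n = (ix * 374761393 + iy * 668265263 + seed * 1442695040888963407) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    n ^= n >> 16
    return n / 0x100000000


def smoothstep(t):
    """Ease curve with zero slope at 0 and 1; avoids creases at cell borders."""
    return t * t * (3.0 - 2.0 * t)


def lerp(a, b, t):
    return a + t * (b - a)


def value_noise(x, y, seed=0):
    """Smoothly interpolated value noise at a real-valued position."""
    x0, y0 = math.floor(x), math.floor(y)
    sx, sy = smoothstep(x - x0), smoothstep(y - y0)
    top = lerp(lattice_value(x0, y0, seed), lattice_value(x0 + 1, y0, seed), sx)
    bottom = lerp(lattice_value(x0, y0 + 1, seed), lattice_value(x0 + 1, y0 + 1, seed), sx)
    return lerp(top, bottom, sy)


def fractal_noise(x, y, octaves=4, persistence=0.5, lacunarity=2.0, seed=0):
    """Sum several octaves of value noise, normalised back to [0, 1].

    Each octave multiplies the frequency by `lacunarity` and the amplitude
    by `persistence`, so later octaves add ever finer, ever fainter detail.
    """
    total, amplitude, frequency, norm = 0.0, 1.0, 1.0, 0.0
    for octave in range(octaves):
        total += amplitude * value_noise(x * frequency, y * frequency, seed + octave)
        norm += amplitude
        amplitude *= persistence
        frequency *= lacunarity
    return total / norm
```

For a 257x257 map you could fill each cell with something like `fractal_noise(col / 64.0, row / 64.0)`. A larger divisor (lower base frequency) gives bigger land masses, and a lower `persistence` tones down the small-scale roughness.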
Try playing / mixing with turbulence too (easy way to add fancy features to your height maps).
Many implementations of simplex noise (also by Ken Perlin, but it scales better and runs faster in higher dimensions) expose a fairly complete set of parameters for you to play with when generating your height maps.