My goal is to draw a rectangular border around the face, which requires removing the neck area that is connected to the rest of the face region. All positive values here represent skin-color pixels. So far I have produced the binary image below by filtering with OpenCV and Python. Code so far: skinid.py
Below is the test image.
Noise removal has also been applied to this binary image.
Up to this point, I followed the paper Face segmentation using skin-color map in videophone applications. For most of it I used custom functions rather than built-in OpenCV ones, because I wanted to do it from scratch (although some erosion, opening, and closing operations were used to tune it up).
I want to know a way to split the neck from the face region and remove it, like this, as I am quite new to the whole image-processing area.
Perform a distance transform (built into OpenCV, or you could write one by hand; it's a pretty fun and easy one to write by calling erode iteratively and adding each round's result into an accumulator matrix. Slow, but conceptually easy). On the binary image you presented above (and, to be honest, I think this generalizes across most mug shots), the highest value of the distance transform will be at the center of the face. That pixel is the center of your box, and its value after the transform gives you a pretty solid approximation of the face size, since it is the pixel distance from the center of the face to the horizontal edges of the face. Depending on what you are after, you may just be able to multiply that distance by, say, 1.5 (work out a standard face width-to-height ratio to choose the best multiplier), set that as your circle radius (or half the side length for a box), and call it a day. Comment if you need anything clarified; I am pretty confident in this answer and would be happy to write up some quick code (in C++/OpenCV) if it would help.
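In the meantime, here is a minimal Python/OpenCV sketch of the idea (Python rather than C++, since the question uses Python). The input file name and the 1.5 multiplier are assumptions to adjust:

import cv2

# Hypothetical input: the cleaned-up binary skin mask (uint8, 0 or 255).
binary = cv2.imread("skin_mask.png", cv2.IMREAD_GRAYSCALE)

# Each foreground pixel gets its distance to the nearest background pixel.
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)

# The global maximum sits at the most interior skin pixel, i.e. roughly the
# face center; its value approximates half the face width.
_, max_val, _, max_loc = cv2.minMaxLoc(dist)
cx, cy = max_loc
half = int(max_val * 1.5)  # guessed width-to-height fudge factor; tune it

out = cv2.cvtColor(binary, cv2.COLOR_GRAY2BGR)
cv2.rectangle(out, (cx - half, cy - half), (cx + half, cy + half), (0, 0, 255), 2)
cv2.imwrite("face_box.png", out)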
(Alternative idea.) You could tweak your color filter a bit to reject darker areas. At least in the image presented, this will create a nice separation between the face and neck, thanks to the shadowing under the chin. (You may have to dial back your dilate/closing ops, though.)
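A rough sketch of that tweak, assuming the Cr/Cb skin ranges from the Chai and Ngan paper cited in the question; the luminance floor of 80 is a made-up starting point to tune per image:

import cv2

bgr = cv2.imread("frame.png")  # hypothetical input frame
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
y, cr, cb = cv2.split(ycrcb)

# Skin-color map from the paper: Cr in [133, 173], Cb in [77, 127].
skin = (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)
lit = y >= 80  # assumed floor: drop shadowed (dark) pixels, e.g. under the chin
mask = (skin & lit).astype("uint8") * 255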
I am trying to recover the trajectory of a 2D camera using a sequence of 2D images and OpenCV, but the trajectory I get is not as good as I would like it to be: it goes back and forth instead of just going forward.
I have a sequence of photos taken by a 2D camera while it was moving (the KITTI dataset, outdoors part, namely). For each pair of consecutive frames I compute the rotation matrix (R) and translation vector (t) with E = cv2.findEssentialMat() and cv2.recoverPose(E, ...), and then I estimate the trajectory, assuming that the coordinates of every translation vector are given in a local coordinate system whose orientation is set by the corresponding rotation matrix.
upd: Each recovered position looks like [X,Y,Z], and I scatter (X_i, Y_i) for every i (these points are thought to be 2D positions), so the following graphs are my estimated trajectories.
Here's what I get instead of a straight line (the camera was moving straight forward). Previous results were even worse.
The green point is where it starts and the red point is where it ends. So most of the time it even moves backwards. This, though, is probably because of a mistake at the beginning, which caused everything to turn around (right?).
Here's what I do:
# RANSAC with confidence 0.99999 and a 0.1 px threshold
E, mask = cv2.findEssentialMat(points1, points2, K_00, cv2.RANSAC, 0.99999, 0.1)
# recoverPose picks one of the four (R, t) decompositions of E via the cheirality check
inliers, R, t, mask = cv2.recoverPose(E, points1, points2, K_00, R, t, mask)
It seems to me that recoverPose somehow chooses the wrong sign of R and t on some steps, so the trajectory that was supposed to go forward goes backward, and then forward again.
What I did to improve the situation was:
1) skip the frames with too many outliers (I check this both after using findEssentialMat and after using recoverPose)
2) set the threshold for RANSAC method in findEssentialMat to 0.1
3) increase the number of feature points on each image from 8 to 24.
This didn't really help.
Here I need to note: I know that in practice the 5-point algorithm, which is used for computing the essential matrix, needs a lot more points than 8 or even 24. Maybe this is actually the problem.
So the questions are:
1) Can the number of feature points (approx. 8-24) be the cause of recoverPose mistakes?
2) If checking the number of outliers is the right thing to do, what percentage of outliers should I set as the limit?
3) I estimate positions like this (instead of simple p[i+1] = R*p[i]+t):
C = np.dot(R, C)
p[i+1] = p[i] + np.dot(np.linalg.inv(C), t)
This is because I can't help thinking of t as a vector in local coordinates, so C is the transformation matrix, which is updated on every step to accumulate the rotations. Is that right or not really? (A sketch of the more common accumulation convention follows the questions below.)
4) It's really possible that I am missing something, since my knowledge of the topic seems tiny. Is there anything (anything!) you could recommend?
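For comparison, here is a minimal sketch of the accumulation convention most monocular visual-odometry examples use, assuming R and t from recoverPose map points from the previous frame into the current one. Note that recoverPose returns t only up to scale (unit norm), so without external scale information every step has length about 1 whatever the true motion, which by itself can make a trajectory wander:

import numpy as np

def accumulate(poses):
    """poses: list of (R, t) pairs from cv2.recoverPose, frame i -> frame i+1."""
    R_total, t_total = np.eye(3), np.zeros((3, 1))
    positions = [t_total.copy()]
    for R, t in poses:
        scale = 1.0  # placeholder: monocular VO cannot recover this by itself
        t_total = t_total + scale * R_total.dot(t)
        R_total = R.dot(R_total)
        positions.append(t_total.copy())
    return positions

# For a camera with z pointing forward, a top-down plot is usually the
# (x, z) components of each position rather than (x, y).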
Huge thanks for your time! I would appreciate any advice.
upd: for example, here are the first six rotation matrices, translation vectors, and recovered positions I get. Signs of t seem a bit crazy.
upd: here's my code. (I'm not a really good programmer yet.) The main idea is that my feature points are corners of bounding boxes of static objects, which I detect with Faster R-CNN (I used this implementation). So the first part of the code detects objects, and the second part uses the detected feature points for recovering the trajectory.
Here's the dataset I use (this is part 2011_09_26_drive_0005 from here).
I have a Python program where people can draw simple line drawings using a touch screen. The images are documented in two ways. First, they are saved as actual image files. Second, I record 4 pieces of information at every refresh: the time point, whether contact was being made with the screen at the time (1 or 0), the x coordinate, and the y coordinate.
What I'd like to do is gain some measure of how similar a given drawing is to any other drawing. I've tried a few things, including simple Euclidean distance and per-pixel similarity, and I've looked at Fréchet distance. None of these gives what I'm looking for.
The issues are that each drawing might have a different number of points, one segment does not always immediately connect to the next, and the order of the points is irrelevant. For instance, if you and I both draw something as simple as an ice cream cone, I might draw the ice cream first, and you might draw the cone first. We may get an identical end result, but many of the most intuitive metrics would be totally thrown off.
Any ideas anyone has would be greatly appreciated.
If you care about how similar one drawing is to another, then there's no need to collect data at every refresh; just collect it once the drawer is done drawing.
Then you can use Fourier analysis to break the images down into frequency domains and run cross-correlations on that, or just some kind of 2D cross-correlation on the images themselves, I guess (a rough sketch of the latter follows).
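A minimal sketch of the 2D cross-correlation idea, assuming both drawings are already rendered as same-sized grayscale arrays; the peak of the normalized correlation is a crude similarity score that ignores stroke order and tolerates translation:

import numpy as np
from scipy.signal import fftconvolve

def xcorr_similarity(a, b):
    # Normalize so that identical images score 1.0 at the peak.
    a = (a - a.mean()) / (a.std() * a.size)
    b = (b - b.mean()) / b.std()
    # Flipping b on both axes turns convolution into correlation.
    corr = fftconvolve(a, b[::-1, ::-1], mode="same")
    return corr.max()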
I know that this problem has been solved before, but I've had great difficulty finding any literature describing the algorithms used to process this sort of data. I'm essentially doing edge finding on a set of 2D data. I want to be able to find a couple of points on an eye diagram (generally used to qualify high-speed communications systems), and as I have had no experience with image processing I am struggling to write efficient methods.
As you can probably see, these diagrams are so called because they resemble the human eye. They can vary a great deal in thickness, slope, and noise, depending on the signal and the system under test. The measurements normally taken are jitter (the horizontal thickness of the crossing region) and eye height (measured either at some specified percentage of the width or at the maximum possible point). I know this can best be done with image processing instead of a more linear approach, as my attempts so far take several seconds just to find the left side of the first crossing. Any ideas of how I should go about this in Python? I'm already using NumPy to do some of the processing.
Here's some example data; it is formatted as a 1D array with associated x-axis data. For this particular example, it should be split up every 666 points (2 * int((1.0 / 2.5e9) / 1.2e-12)), since the rate of the signal was 2.5 Gb/s and the time between points was 1.2 ps.
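For reference, a small NumPy sketch of that folding step, plus one crude way to read off an eye height; the mean-split threshold is an assumption:

import numpy as np

def fold_eye(samples, period=666):
    """Reshape the 1D capture into one row per unit interval so columns line up in time."""
    samples = np.asarray(samples)
    n = (len(samples) // period) * period
    return samples[:n].reshape(-1, period)

def eye_height(eye):
    """Gap at the center column between the lowest 'high' sample and the
    highest 'low' sample, splitting high/low at the column mean."""
    col = eye[:, eye.shape[1] // 2]
    hi, lo = col[col > col.mean()], col[col <= col.mean()]
    return hi.min() - lo.max()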
Thanks!
Have you tried OpenCV (Open Computer Vision)? It's widely used and has a Python binding.
Not to be a PITA, but are you sure you wouldn't be better off with a numerical approach? All the tools I've seen for eye-diagram analysis go the numerical route; I haven't seen a single one that analyzes the image itself.
You say your algorithm is painfully slow on that dataset -- my next question would be why. Are you looking at an oversampled dataset? (I'm guessing you are.) And if so, have you tried decimating the signal first? That would at the very least give you fewer samples for your algorithm to wade through.
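If it is oversampled, decimation is one line with SciPy (which applies an anti-aliasing filter first); the factor of 10 below is a guess, and the random array is just a stand-in for the real capture:

import numpy as np
from scipy.signal import decimate

samples = np.random.randn(666 * 100)  # stand-in for the real capture
reduced = decimate(samples, 10)       # keep roughly 1 sample in 10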
Just going down your route for a moment: if you read those images into memory as they are, wouldn't it be pretty easy to do two flood fills (starting at the centre and at the middle of the left edge) that include all the "white" data? If the fill routine recorded the maximum and minimum height at each column, and the maximum horizontal extent, then you would have all you need.
In other words, I think you're over-thinking this. Edge detection is used in complex "natural" scenes where the edges are unclear. Here your edges are so completely obvious that you don't need to enhance them.
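A sketch of the column-scan part, skipping the flood fill and assuming the open region has already been isolated into a boolean image (True where the eye is open); the widest columns sit at the eye center and near-empty columns mark the crossings:

import numpy as np

def column_extents(img):
    """img: 2D boolean array. Returns the vertical extent of the open region per column."""
    heights = np.zeros(img.shape[1])
    for x in range(img.shape[1]):
        ys = np.nonzero(img[:, x])[0]
        if ys.size:
            heights[x] = ys.max() - ys.min()
    return heights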
I have a large collection of rectangles, all of the same size. I am generating random points that should not fall in these rectangles, so what I wish to do is test whether a generated point lies in one of the rectangles and, if it does, generate a new point.
Using R-trees seems to work, but they are really meant for rectangles rather than points. I could use a modified version of an R-tree algorithm that works with points too, but I'd rather not reinvent the wheel if there is already some better solution. I'm not very familiar with data structures, so maybe there already exists some structure that works for my problem?
In summary, what I'm asking is whether anyone knows of a good algorithm, usable from Python, that can check whether a point lies in any rectangle of a given set of rectangles.
edit: This is in 2D and the rectangles are not rotated.
This Reddit thread addresses your problem:
I have a set of rectangles, and need to determine whether a point is contained within any of them. What are some good data structures to do this, with fast lookup being important?
If your universe is integer, or if the level of precision is well known and not too high, you can use abelsson's suggestion from the thread: O(1) lookup using coloring.
As usual you can trade space for time... here is an O(1) lookup with a very low constant. Init: create a bitmap large enough to envelop all rectangles with sufficient precision, and initialize it to black. Color all pixels containing any rectangle white. O(1) lookup: is the point (x, y) white? If so, a rectangle was hit.
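A small sketch of that scheme, assuming integer coordinates inside a known universe (the 1024x1024 size is made up); every rectangle is rasterized once, and each query is a single array read:

import numpy as np

W, H = 1024, 1024                   # assumed universe size
grid = np.zeros((H, W), dtype=bool)

def add_rect(x0, y0, x1, y1):
    """Axis-aligned rectangle with inclusive integer corners."""
    grid[y0:y1 + 1, x0:x1 + 1] = True

def hit(x, y):
    """O(1) lookup: True if (x, y) falls inside any added rectangle."""
    return grid[y, x]

add_rect(10, 10, 50, 40)
print(hit(20, 20), hit(500, 500))   # True False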
I recommend you go to that post and read ModernRonin's answer in full, as it is the accepted one. I've pasted it here:
First, the micro problem. You have an arbitrarily rotated rectangle and a point. Is the point inside the rectangle?
There are many ways to do this. But the best, I think, is using the 2D vector cross product. First, make sure the points of the rectangle are stored in clockwise order. Then do the vector cross product with 1) the vector formed by the two points of the side and 2) a vector from the first point of the side to the test point. Check the sign of the result: positive is inside (to the right of) the side, negative is outside. If it's inside all four sides, it's inside the rectangle; equivalently, if it's outside any of the sides, it's outside the rectangle. More explanation here.
This method will take 3 subtracts per vector, times 2 vectors per side, plus one cross product per side, which is three multiplies and two adds. 11 flops per side, 44 flops per rectangle.
If you don't like the cross product, then you could do something like: figure out the inscribed and circumscribed circles for each rectangle, and check if the point is inside the inscribed one. If so, it's in the rectangle as well. If not, check if it's outside the circumscribed circle. If so, it's outside the rectangle as well. If it falls between the two circles, you're f****d and you have to check it the hard way.
Finding if a point is inside a circle in 2D takes two subtractions and two squarings (= multiplies), and then you compare distance squared to avoid having to do a square root. That's 4 flops, times two circles is 8 flops; but sometimes you still won't know. Also, this assumes that you don't pay any CPU time to compute the circumscribed or inscribed circles, which may or may not be true depending on how much pre-computation you're willing to do on your rectangle set.
In any event, it's probably not a great idea to test the point against every rectangle, especially if you have a hundred million of them.
Which brings us to the macro problem: how to avoid testing the point against every single rectangle in the set? In 2D, this is probably a quad-tree problem. In 3D, it's what generic_handle said: an octree. Off the top of my head, I would probably implement it as a B+ tree. It's tempting to use d = 5, so that each node can have up to 4 children, since that maps so nicely onto the quad-tree abstraction. But if the set of rectangles is too big to fit into main memory (not very likely these days), then having nodes the same size as disk blocks is probably the way to go.
Watch out for annoying degenerate cases, like a data set with ten thousand nearly identical rectangles with centers at the same exact point. :P
Why is this problem important? It's useful in computer graphics, to check if a ray intersects a polygon. I.e., did that sniper rifle shot you just made hit the person you were shooting at? It's also used in real-time map software, like say GPS units: GPS tells you the coordinates you're at, but the map software has to find where that point is in a huge amount of map data, and do it several times per second.
Again, credit to ModernRonin...
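A short Python sketch of the cross-product test from the quoted answer; checking that all four cross products share a sign sidesteps the clockwise/counter-clockwise and y-axis-direction conventions:

def point_in_rect(corners, p):
    """corners: four (x, y) tuples in a consistent (cw or ccw) order."""
    signs = []
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        # 2D cross product of the side vector with the vector to the test point.
        cross = (x1 - x0) * (p[1] - y0) - (y1 - y0) * (p[0] - x0)
        signs.append(cross >= 0)
    return all(signs) or not any(signs)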
For rectangles that are aligned with the axes, you only need two points (four numbers) to identify the rectangle; conventionally, the bottom-left and top-right corners. To establish whether a given point (Xtest, Ytest) overlaps a rectangle (XBL, YBL, XTR, YTR), test both:
XBL <= Xtest <= XTR
YBL <= Ytest <= YTR
Clearly, for a large enough set of points to test, this could be fairly time consuming. The question, then, is how to optimize the testing.
Clearly, one optimization is to establish the minimum and maximum X and Y values for the box surrounding all the rectangles (the bounding box): a swift test on this shows whether there is any need to look further.
Xmin <= Xtest <= Xmax
Ymin <= Ytest <= Ymax
Depending on how much of the total surface area is covered with rectangles, you might be able to find non-overlapping sub-areas that contain rectangles, and you could then avoid searching those sub-areas that cannot contain a rectangle overlapping the point, again saving comparisons during the search at the cost of pre-computing suitable data structures. If the set of rectangles is sparse enough, there may be no overlap, in which case this degenerates into the brute-force search. Equally, if the set of rectangles is so dense that there are no sub-ranges in the bounding box that can be split up without breaking rectangles, the subdivision buys you nothing.
However, you could also arbitrarily break the bounding area up into, say, quarters (halving in each direction). You would then use a list of boxes which would include more boxes than the original set (two or four boxes for each box that overlapped one of the arbitrary boundaries). The advantage is that you could then eliminate three of the four quarters from the search, reducing the total amount of searching at the expense of auxiliary storage.
So, there are space-time trade-offs, as ever. And pre-computation versus search trade-offs. If you are unlucky, the pre-computation achieves nothing (for example, there are two boxes only, and they don't overlap on either axis). On the other hand, it could achieve considerable search-time benefit.
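A sketch of the baseline with the bounding-box pre-check described above; rects is assumed to be a list of (xbl, ybl, xtr, ytr) tuples:

def bounding_box(rects):
    xs = [r[0] for r in rects] + [r[2] for r in rects]
    ys = [r[1] for r in rects] + [r[3] for r in rects]
    return min(xs), min(ys), max(xs), max(ys)

def hit_any(rects, bbox, x, y):
    xmin, ymin, xmax, ymax = bbox
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return False  # outside everything: skip the per-rectangle tests
    return any(xbl <= x <= xtr and ybl <= y <= ytr
               for xbl, ybl, xtr, ytr in rects)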
I suggest you take a look at BSP trees (and possibly quadtrees or octrees; links are available on that page as well). They are used to partition the whole space recursively, which lets you quickly determine, for a given point, which rectangles you need to check at all.
At minimum you just have one huge partition and need to check all rectangles, at maximum your partitions get so small, that they get down to the size of single rectangles. Of course the more fine-grained the partition, the longer you need to walk down the tree in order to find the rectangles you want to check.
However, you can freely decide how many rectangles are suitable to be checked for a point and then create the corresponding structure.
Pay attention to overlapping rectangles, though. As the BSP tree needs to be precomputed anyway, you may as well remove overlaps during that time, so you can get clean partitions. A quadtree sketch in this spirit follows.
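For illustration, a compact quadtree sketch (a quadtree rather than a general BSP tree, for brevity): rectangles are stored in every max-depth leaf they overlap, and a point query walks a single root-to-leaf path and only tests the rectangles stored there. The depth limit of 5 is arbitrary:

class QuadTree:
    def __init__(self, x0, y0, x1, y1, depth=0, max_depth=5):
        self.box = (x0, y0, x1, y1)
        self.rects = []  # only populated at max-depth leaves
        self.kids = []
        self.depth, self.max_depth = depth, max_depth

    def insert(self, rect):  # rect = (x0, y0, x1, y1), axis-aligned
        if self.depth == self.max_depth:
            self.rects.append(rect)
            return
        if not self.kids:
            x0, y0, x1, y1 = self.box
            mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0
            self.kids = [QuadTree(a, b, c, d, self.depth + 1, self.max_depth)
                         for a, b, c, d in ((x0, y0, mx, my), (mx, y0, x1, my),
                                            (x0, my, mx, y1), (mx, my, x1, y1))]
        for kid in self.kids:  # push the rect into every overlapping child
            kx0, ky0, kx1, ky1 = kid.box
            if rect[0] <= kx1 and rect[2] >= kx0 and rect[1] <= ky1 and rect[3] >= ky0:
                kid.insert(rect)

    def query(self, x, y):  # walk one path down, then test a short list
        node = self
        while node.kids:
            for kid in node.kids:
                kx0, ky0, kx1, ky1 = kid.box
                if kx0 <= x <= kx1 and ky0 <= y <= ky1:
                    node = kid
                    break
            else:
                return False  # point lies outside the root box
        return any(r[0] <= x <= r[2] and r[1] <= y <= r[3] for r in node.rects)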
Your R-tree approach is the best one I know of (it's the approach I would choose over quadtrees, B+ trees, or BSP trees, as R-trees seem convenient to build in your case). Caveat: I'm no expert, even though I remember a few things from my senior-year university class on algorithmics!
Why not try this. It seems rather light on both computation and memory.
Consider the projections of all the rectangles onto the base line of your space. Denote that set of line intervals as
{[Rl1, Rr1], [Rl2, Rr2],..., [Rln, Rrn]}, ordered by increasing left coordinates.
Now suppose your point is (x, y). Start a search at the left of this set and continue until you reach an interval that contains the point's x.
If none does, your point (x,y) is outside all rectangles.
If some do, say [Rlk, Rrk], ..., [Rlh, Rrh] (k <= h), then just check whether y is within the vertical extent of any of those rectangles.
Done.
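A rough Python sketch of that scan, assuming rectangles stored as (left, bottom, right, top) tuples pre-sorted by left edge; bisect prunes everything whose interval starts to the right of x:

import bisect

def hit_scan(rects, x, y):
    """rects: list of (left, bottom, right, top), sorted by left coordinate."""
    lefts = [r[0] for r in rects]      # could be cached alongside rects
    k = bisect.bisect_right(lefts, x)  # only rects whose left edge is <= x
    return any(x <= right and bottom <= y <= top
               for left, bottom, right, top in rects[:k])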
Good luck.
John Doner