Strategy for isolating 3d data points

Strategy for isolating 3d data points - python

I have two sets of points, one from an analysis and another that I will use for the results of post-processing on the analysis data.
The analysis data, in black, is scattered.
The points used for results are red.
Here are the two sets on the same plot:
The problem I have is this: I will be interpolating onto the red points, but as you can see there are red points which fall inside areas of the black data set that are in voids. Interpolation causes there to be non-zero values at those points but it is essential that these values be zero in the final data set.
I have been thinking of several strategies for getting those values to zero. Here are several in no particular order:
Find a convex hull whose vertices only contain black data points and which contains only red data points inside the convex set. Also, the area of this hull should be maximized while still meeting the two criteria.
This has proven to be fairly difficult to implement, mostly due to having to select which black data points should be excluded from the iterative search for a convex hull.
Add an extra dimension to the data sets with a single value, like 1 or 0, so both can be part of the same data set yet still distinguishable. Use a kNN (nearest neighbor) algorithm to choose only red points in the voids. The basic idea is that red points in voids will have nearest n(6?) nearest neighbors which are in their own set. Red data points that are separated by a void boundary only will have a different amount, and lastly, the red points at least one step removed from a boundary will have a almost all black data set neighbors. The existing algorithms I have seen for this approach return indices or array masks, both of which will be a good solution. I have not yet tried implementing this yet.
Manually extract boundary points from the SolidWorks model that was used to create the black data set. No on so many levels. This would have to be done manually, z-level by z-level, and the pictures I have shown only represent a small portion of the actual, full set.
Manually create masks by making several refinements to a subset of red data points that I visually confirm to be of interest. Also, no. Not unless I have run out of options.
If this is a problem with a clear solution, then I am not seeing it. I'm hoping that proposed solution 2 will be the one, because that actually looks like it would be the most fun to implement and see in action. Either way, like the title says, I'm still looking for direction on strategies to solve this problem. About the only thing I'm sure of is that Python is the right tool.
EDIT:
The analysis data contains x, y, z, and 3 electric field component values, Ex, Ey, and Ez. The voids in the black data set are inside of metal and hence have no change in electric potential, or put another way, the electric field values are all exactly zero.
This image shows a single z-layer using linear interpolation of the Ex component with scipy's griddata. The black oval is a rough indicator of the void boundary for that central racetrack shaped void. You can see that there is red and blue (for + and - E field in the x direction) inside the oval. It should be zero (lt. green in this plot). The finished data is going to be used to track a beam of charged particles and so if a path of one of the particles actually crossed into the void the software that does the tracking can only tell if the electric potential remains constant, i.e. it knows that the path goes through solid metal and it discards that path.
If electric field exists in the void the particle tracking software doesn't know that some structure is there and bad things happen.

You might be able to solve this with the big-data technique called "Support Vector Machine". Assign the 0 and 1 classifications as you mentioned, and then run this through the libsvm algorithm. You should be able to use this model to classify and identify the points you need to zero out, and do so programmatically.
I realize that there is a learning curve for SVM and the libsvm implementation. If this is outside your effort budget, my apologies.

Related

Algorithm to create polygons(No Thiesen/Voronoi)

I have been trying to create custom regions for states. I want to fill the state map by using area of influence of points.
The below image represents what I have been trying. The left image shows the points and I just want to fill all the areas as in the right image. I have used Voronoi/Thiesen, but it leaves some points outside the area since it just takes the centroid to color the polygon.
Is there any algorithm or process to achieve that?, now I am using in Python.

You've identified your basic problem: you used a cluster-unit Voronoi algorithm, which is too simplistic for your application. You need to apply that same algebra to the points themselves, not to the region as a single-statistic entity.
To this end, I strongly recommend a multi-class SVM (Support Vector Machine) algorithm, which will identify the largest gaps between identified regions (classes) of points. Use a Gaussian kernel modification (of a very low degree) to handle non-linear boundaries. You will almost certainly get simple curves instead of lines.

Creating string art from image

I am relatively new to python. I would like to make some string-art portraits. I was watching this video which really intrigued me:
https://youtu.be/RSRNZaq30W0?t=56
I understand that to achieve this, I would first need to load the image, then do some edge-detection and then use some form of Delaunay triangulation but have no idea where to even start.
I looked up some sample code for OpenCV and figured out how to do basic edge-detection. How do I then convert those to points? And then what sort of algorithm would I need to "fill in" the different gradients?
I don't even know if this is the right approach to achieve this. Could someone please point me in the right direction and perhaps give me some sample code to get started? I would really appreciate it very much.

Edge detection or triangulation is less important in this application. The core part is to understand the pseudo-code at 1:27 of the video. The final product uses a single string at wrap around different nails in particular way, so that: darker areas in original image have less string density, and brighter areas have more strings crossing over.
The initial preparation is to:
generate an edge dection version of the image (A)
generate a blurred version of the image (B)
Then the first step is to create random positions for the nails. Apparently to achieve a good outcome, if a random-generated nail is close enough to the 'edge' of a black-white image, you should 'snap' it to the edge, so that later the strings wrapping around these edge nails will create an accurate boundary just like in the original picture. Here you use the image A) to adjust your nails. For example, just perform some potential minimization:
Add small random position change to the nails. If a nail now gets
close enough to a white point (edge) in image A), directly change to
that position.
Compute the potential. Make sure your potential function
penalizes two points that come too close. Repeat 1) 100 times to
pick one with lowest potential.
Iterate 1) and 2) 20 times
Next you decide how you want the strings to wrap around the nails.
Starting from a point A, look at some neighboring points (within certain radius) B1, B2, B3, etc. Imagine if you attach a string with certain width from A to Bi, it visually changes your string image P in a slight way. Render line segment A-B1 on P to get P1, render A-B2 on P to get P2, etc.
Find the best Bi so that the new image Pi looks closer to the original. You can just do a pixel-wise comparison between the string image and the original picture, and use this measurement to score each Bi. The video author used a blurred image B) to get rid of textures that may randomly impact his scoring algorithm.
Now the optimal Bi becomes the new A. Find its neighbors and loop over. The algorithm may stop if adding any new strings only negatively impacts the score.
There are cases where bright areas in a photo are widely separated, so any white strings crossing the dark gap will only decrease the score. Use your judgement to tweak the algorithm to workaround those non-convex scenarios.

Laplace interpolation between known values in a matrix

I'm working on a heatmap generation program which hopefully will fill in the colors based on value samples provided from a building layout (this is not GPS based).
If I have only a few known data points such as these in a large matrix of unknowns, how do I get the values in between interpolated in Python?:
0,0,0,0,1,0,0,0,0,0,5,0,0,0,0,9
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0,0,0,8,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,8,0,0,0,0,0,0,0,6,0,0,0,0,0,0
0,0,0,0,0,3,0,0,0,0,0,0,0,0,7,0
I understand that bilinear won't do it, and Gaussian will bring all the peaks down to low values due to the sheer number of surrounding zeros. This is obviously a matrix handling proposition, and I don't need it to be Bezier curve smooth, just close enough to be a graphic representation would be fine. My matrix will end up being about 1500×900 cells in size, with approximately 100 known points.
Once the values are interpolated, I have written code to convert it all to colors, no problem. It's just that right now I'm getting single colored pixels sprinkled over a black background.

Proposing a naive solution:
Step 1: interpolate and extrapolate existing data points onto surroundings.
This can be done using "wave propagation" type algorithm.
The known points "spread out" their values onto surroundings until all the grid is "flooded" with some known values. At the end of this stage you have a number of intersected "disks", and no zeroes left.
Step 2: smoothen the result (using bilinear filtering or some other filtering).
If you are able to use ScyPy, then interp2d does exactly what you want. A possible problem with is that it seems to not extrapolate smoothly according to this issue. This means that all values near the walls are going to be the same as closest their neighbour points. This can be solved by putting thermometers in all 4 corners :)

Find the most significant corner of a skeleton and segment the skeleton at that corner

I have images of ore seams which I have first skeletonised (medial axis multiplied by the distance transform), then extracted corners (see the green dots). It looks like this:
The problem is to find a turning point and then segment the seam by separating the seam at the turning point. Not all skeletons have turning points, some are quite linear, and the turning points can be in any orientation. But the above image shows a seam which does have a defined turning point. Other examples of turning points look like (using ASCII): "- /- _". "X" turning points don't really exist.
I've tried a number of methods including downsampling the image, curve fitting, k-means clustering, corner detection at various thresholds and window sizes, and I haven't figured it out yet. (I'm new to to using scikit)
The technique must be able to give me some value which I can use heuristically determine whether there is a turning point or not.
What I'd like to do is to do some sort of 2 line ("piecewise"?) regression and find an intersection or some sort of rotated polynomial regression, then determine if a turning point exists, and if it does exist, the best coordinate that represents the turning point. Here is my work in progress: https://gist.github.com/anonymous/40eda19e50dec671126a
From there, I learned that a watershed segmentation with appropriate label coordinates should be able to segment the skeleton.
I found this resource: Fit a curve for data made up of two distinct regimes
But I wasn't able to figure out to apply it my current situation. More importantly there's no way for me to guess a-priori what the initial coefficients are for the fitting function since the skeletons can be in any orientation.

Test if point is in some rectangle

I have a large collection of rectangles, all of the same size. I am generating random points that should not fall in these rectangles, so what I wish to do is test if the generated point lies in one of the rectangles, and if it does, generate a new point.
Using R-trees seem to work, but they are really meant for rectangles and not points. I could use a modified version of a R-tree algorithm which works with points too, but I'd rather not reinvent the wheel, if there is already some better solution. I'm not very familiar with data-structures, so maybe there already exists some structure that works for my problem?
In summary, basically what I'm asking is if anyone knows of a good algorithm, that works in Python, that can be used to check if a point lies in any rectangle in a given set of rectangles.
edit: This is in 2D and the rectangles are not rotated.

This Reddit thread addresses your problem:
I have a set of rectangles, and need to determine whether a point is contained within any of them. What are some good data structures to do this, with fast lookup being important?
If your universe is integer, or if the level of precision is well known and is not too high, you can use abelsson's suggestion from the thread, using O(1) lookup using coloring:
As usual you can trade space for
time.. here is a O(1) lookup with very
low constant. init: Create a bitmap
large enough to envelop all rectangles
with sufficient precision, initialize
it to black. Color all pixels
containing any rectangle white. O(1)
lookup: is the point (x,y) white? If
so, a rectangle was hit.
I recommend you go to that post and fully read ModernRonin's answer which is the most accepted one. I pasted it here:
First, the micro problem. You have an
arbitrarily rotated rectangle, and a
point. Is the point inside the
rectangle?
There are many ways to do this. But
the best, I think, is using the 2d
vector cross product. First, make sure
the points of the rectangle are stored
in clockwise order. Then do the vector
cross product with 1) the vector
formed by the two points of the side
and 2) a vector from the first point
of the side to the test point. Check
the sign of the result - positive is
inside (to the right of) the side,
negative is outside. If it's inside
all four sides, it's inside the
rectangle. Or equivalently, if it's
outside any of the sides, it's outside
the rectangle. More explanation here.
This method will take 3 subtracts per
vector * times 2 vectors per side,
plus one cross product per side which
is three multiplies and two adds. 11
flops per side, 44 flops per
rectangle.
If you don't like the cross product,
then you could do something like:
figure out the inscribed and
circumscribed circles for each
rectangle, check if the point inside
the inscribed one. If so, it's in the
rectangle as well. If not, check if
it's outside the circumscribed
rectangle. If so, it's outside the
rectangle as well. If it falls between
the two circles, you're f****d and you
have to check it the hard way.
Finding if a point is inside a circle
in 2d takes two subtractions and two
squarings (= multiplies), and then you
compare distance squared to avoid
having to do a square root. That's 4
flops, times two circles is 8 flops -
but sometimes you still won't know.
Also this assumes that you don't pay
any CPU time to compute the
circumscribed or inscribed circles,
which may or may not be true depending
on how much pre-computation you're
willing to do on your rectangle set.
In any event, it's probably not a
great idea to test the point against
every rectangle, especially if you
have a hundred million of them.
Which brings us to the macro problem.
How to avoid testing the point against
every single rectangle in the set? In
2D, this is probably a quad-tree
problem. In 3d, what generic_handle
said - an octree. Off the top of my
head, I would probably implement it as
a B+ tree. It's tempting to use d = 5,
so that each node can have up to 4
children, since that maps so nicely
onto the quad-tree abstraction. But if
the set of rectangles is too big to
fit into main memory (not very likely
these days), then having nodes the
same size as disk blocks is probably
the way to go.
Watch out for annoying degenerate
cases, like some data set that has ten
thousand nearly identical rectangles
with centers at the same exact point.
:P
Why is this problem important? It's
useful in computer graphics, to check
if a ray intersects a polygon. I.e.,
did that sniper rifle shot you just
made hit the person you were shooting
at? It's also used in real-time map
software, like say GPS units. GPS
tells you the coordinates you're at,
but the map software has to find where
that point is in a huge amount of map
data, and do it several times per
second.
Again, credit to ModernRonin...

For rectangles that are aligned with the axes, you only need two points (four numbers) to identify the rectangle - conventionally, bottom-left and top-right corners. To establish whether a given point (Xtest, Ytest) overlaps with a rectangle (XBL, YBL, XTR, YTR) by testing both:
Xtest >= XBL && Xtest <= XTR
Ytest >= YBL && Ytest <= YTR
Clearly, for a large enough set of points to test, this could be fairly time consuming. The question, then, is how to optimize the testing.
Clearly, one optimization is to establish the minimum and maximum X and Y values for the box surrounding all the rectangles (the bounding box): a swift test on this shows whether there is any need to look further.
Xtest >= Xmin && Xtest <= Xmax
Ytest >= Ymin && Ytest <= Ymax
Depending on how much of the total surface area is covered with rectangles, you might be able to find non-overlapping sub-areas that contain rectangles, and you could then avoid searching those sub-areas that cannot contain a rectangle overlapping the point, again saving comparisons during the search at the cost of pre-computation of suitable data structures. If the set of rectangles is sparse enough, there may be no overlapping, in which case this degenerates into the brute-force search. Equally, if the set of rectangles is so dense that there are no sub-ranges in the bounding box that can be split up without breaking rectangles.
However, you could also arbitrarily break up the bounding area into, say, quarters (half in each direction). You would then use a list of boxes which would include more boxes than in the original set (two or four boxes for each box that overlapped one of the arbitrary boundaries). The advantage of this is that you could then eliminate three of the four quarters from the search, reducing the amount of searching to be done in total - at the expense of auxilliary storage.
So, there are space-time trade-offs, as ever. And pre-computation versus search trade-offs. If you are unlucky, the pre-computation achieves nothing (for example, there are two boxes only, and they don't overlap on either axis). On the other hand, it could achieve considerable search-time benefit.

I suggest you take a look at BSP trees (and possible quadtrees or octrees, links available on that page as well). They are used to partition the whole space recursively and allow you to quickly check for a point which rectangles you need to check at all.
At minimum you just have one huge partition and need to check all rectangles, at maximum your partitions get so small, that they get down to the size of single rectangles. Of course the more fine-grained the partition, the longer you need to walk down the tree in order to find the rectangles you want to check.
However, you can freely decide how many rectangles are suitable to be checked for a point and then create the corresponding structure.
Pay attention to overlapping rectangles though. As the BSP tree needs to be precomputed anyways, you may as well remove overlaps during that time, so you can get clear partitions.

Your R-tree approach is the best approach I know of (that's the approach I would choose over quadtrees, B+ trees, or BSP trees, as R-trees seem convenient to build in your case). Caveat: I'm no expert, even though I remember a few things from my senior year university class of algorithmic!

Why not try this. It seems rather light on both computation and memory.
Consider the projections of all the rectangles onto the base line of your space. Denote that set of line intervals as
{[Rl1, Rr1], [Rl2, Rr2],..., [Rln, Rrn]}, ordered by increasing left coordinates.
Now suppose your point is (x, y), start a search at the left of this set until you reach a line interval that contains the point x.
If none does, your point (x,y) is outside all rectangles.
If some do, say [Rlk, Rrk], ..., [Rlh, Rrh], (k <= h) then just check whether y is within the vertical extent of any of these rectangles.
Done.
Good luck.
John Doner

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.