Design an algorithm to find the average distance between two paths


I have a database of time-stamped points which represent a path being drawn by a user in a 2-D plane. I also have a list of points which represent the goal path. These are not timestamped. I want to find how accurate the users' drawn paths are as compared to the goal path. The parameter to define accuracy is not clear and something I'm trying to decide. I don't really care about the temporal aspect of the user drawn path. I only want to compare the two paths.
I'm doing this to do the analysis for an experiment done by a behavioral lab. This is my current algorithm.
Find the total length of the user-drawn path by adding the straight-line distances between consecutive points.
At every 1% of the total length of both the user path and the goal path, find the straight-line distance between the two paths.
Average the 100 distances to get the overall average distance between the two paths.
Increase the sampling frequency if I want a more accurate number.
I'm only looking for algorithmic help, since implementing this would be quite trivial. My issue is that I'm not sure whether I'm missing something here, nor am I sure this algorithm is correct, so I wanted to run it by some experienced programmers.
I'm not a programmer by trade but this data analysis is essential for the paper the lab is working on. I'm not sure if I need to be familiar with some higher level Math which makes this trivial.
I'm completely language-agnostic and would appreciate pointers to any existing algorithms or novel solutions that solve this problem.
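For what it's worth, the steps listed above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: both paths are lists of (x, y) tuples with at least two points each, samples are taken at 1%, 2%, ..., 100% of each path's length, and the function names (cumulative_lengths, resample, mean_path_distance) are made up for illustration.

```python
import math

def cumulative_lengths(path):
    """Running arc length at each vertex of a polyline."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths

def resample(path, n_samples=100):
    """Points at fractions 1/n, 2/n, ..., 1 of the path's total length."""
    lengths = cumulative_lengths(path)
    total = lengths[-1]
    samples = []
    seg = 0  # index of the segment currently containing the target length
    for k in range(1, n_samples + 1):
        target = total * k / n_samples
        while seg < len(path) - 2 and lengths[seg + 1] < target:
            seg += 1
        seg_len = lengths[seg + 1] - lengths[seg]
        # Guard against repeated points (zero-length segments).
        t = 0.0 if seg_len == 0 else (target - lengths[seg]) / seg_len
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples

def mean_path_distance(user_path, goal_path, n_samples=100):
    """Average straight-line distance between matching samples."""
    a = resample(user_path, n_samples)
    b = resample(goal_path, n_samples)
    gaps = [math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)]
    return sum(gaps) / n_samples
```

Raising n_samples corresponds to the higher sampling frequency in the last step. Note that this pairs points by fraction of path length, so it assumes both paths are listed in the same drawing direction.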

Related

Search for similarity of a mesh within another?

First of all, sorry if this is rather basic but it is certainly not my field of expertise.
So, I'm working with protein surface and I have this cavity:
[Image: protein cavity]
It is part of a larger, watertight, triangular mesh (.ply format) that represents a protein surface.
What I want to do, is find whether this particular "sub-mesh" is found in other proteins. However, I'm not looking for a perfect fit, rather similar "sub-meshes" since the only place I will find this exact shape is in the original protein.
I've been reading the docs for the Python modules trimesh and open3d. Trimesh does have a comparison module, but it doesn't seem to have the functionality I'm looking for. Also, open3d has a "compute point cloud distance" function that is recommended for comparing two point clouds or meshes.
However, since what I'm actually trying to find is similarity, I would need a way to fit my cavity's "sub-mesh" onto the surface of the protein I'm analyzing, and then "score" how different or deformed the fitted submesh is. Another way would be to rotate and translate my sub-mesh to match the most vertices and faces on the protein surface and score that I guess.
Just a heads-up, I'm a biotechnologist, self-taught in Python and with extremely limited experience in anything 3D. At this point, anything helps, be it a paper, Python module or whatever knowledge you have that you think might be useful.
Thank you very much for any help you can provide with this!

Smart way to detect too far away point from a row of points?

I'm working on a Python script whose goal is to detect whether a point lies outside a row of points (GPS readings from an agricultural machine).
The input data are shapefiles, and I use the GeoPandas library for all geoprocessing.
My first idea was to build a buffer around the two points neighboring the point under consideration, and then check whether my point falls inside that buffer. But the results aren't good.
So I'm wondering whether there is a smart mathematical method, maybe with a Scikit library... Is somebody able to help me?

Try ArcGIS.
Create two new attributes in ArcGIS holding the X and Y coordinates, then calculate the distance between the points you want.

The question is kind of vague, but my guess would be to fit an approximation/regression curve (I believe numpy.polyfit with degree 2 would do) and flag the points with the largest distance from the fit, probably with a threshold relative to the overall fit loss.
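A rough sketch of that idea, in case it helps. The function name flag_outliers and the cutoff of k times the RMS residual are my own assumptions, not something from the question. It also measures vertical distance to the fitted curve, so it assumes the row is not close to vertical in the chosen coordinates.

```python
import numpy as np

def flag_outliers(x, y, degree=2, k=3.0):
    """Indices of points whose residual exceeds k times the RMS residual."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)            # least-squares polynomial fit
    residuals = np.abs(y - np.polyval(coeffs, x))
    rms = np.sqrt(np.mean(residuals ** 2))       # overall fit loss
    return np.flatnonzero(residuals > k * rms).tolist()
```

Because a strong outlier also pulls the fit toward itself, it may be worth dropping the flagged points and fitting again.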

Calculating a trajectory between two known points and an IMU

Query:
I want to estimate the trajectory of a person wearing an IMU between point a and point b. I know the exact location of point a and point b in an x,y,z space and the time it takes the person to walk between the points.
Is it possible to reconstruct the trajectory of the person moving from point a to point b using the data from an IMU and the time?

This question is too broad for SO. You could write a PhD thesis answering it, and I know people who have.
However, yes, it is theoretically possible.
However, there are a few things you'll have to deal with:
Your system is going to discretize time on some level. The result is that your estimate of position will be non-smooth. Increasing sampling rates is one way to address this, but this frequently increases the noise of the measurement.
Possible paths are non-unique. Knowing the time it takes to travel from a-b constrains slightly the information from the IMUs, but you are still left with an infinite family of possible routes between the two. Since you mention that you're considering a person walking between two points with z-components, perhaps you can constrain the route using knowledge of topography and roads?
IMUs function by integrating accelerations to velocities and velocities to positions. If the accelerations have measurement errors, and they always do, then the error in your estimate of the position will grow over time. The longer you run the system for, the more the results will diverge. However, if you're able to use roads/topography as a constraint, you may be able to restart the integration from known points in space; that is, if you can detect 90 degree turns on a street grid, each turn gives you the opportunity to tie the integrator back to a feasible initial condition.
Given the above, perhaps the most important question you have to ask yourself is how much error you can tolerate in your path reconstruction. Low-error estimates are going to require better (i.e. more expensive) sensors, higher sampling rates, and higher-order integrators.
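To make the drift point concrete, here is a toy sketch, not a real IMU pipeline. It assumes a stationary sensor whose only error is a constant accelerometer bias, and integrates twice with a simple Euler scheme. The function name position_drift and the numbers are illustrative assumptions.

```python
def position_drift(bias, duration, dt=0.01):
    """Position error (m) after integrating a constant bias (m/s^2) twice."""
    velocity = 0.0
    position = 0.0
    steps = int(round(duration / dt))
    for _ in range(steps):
        velocity += bias * dt       # acceleration -> velocity
        position += velocity * dt   # velocity -> position
    return position

# Doubling the duration roughly quadruples the error (about 0.5 * bias * t^2).
drift_10s = position_drift(0.1, 10.0)
drift_20s = position_drift(0.1, 20.0)
```

This quadratic growth is why restarting the integration at known points keeps the error bounded.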

DBSCAN with potentially imprecise lat/long coordinates

I've been running scikit-learn's DBSCAN implementation to cluster a set of geotagged photos by lat/long. For the most part, it works pretty well, but I came across a few instances that were puzzling. For instance, there were two sets of photos for which the user-entered text field specified that the photo was taken at Central Park, but the lat/longs for those photos were not clustered together. The photos themselves confirmed that both sets of observations were from Central Park, but the lat/longs were in fact further apart than epsilon.
After a little investigation, I discovered that the reason was that the lat/long geotags (which were generated from the phone's GPS) are pretty imprecise. When I looked at the location accuracy of each photo, I discovered that it ranged widely (I've seen a margin of error of up to 600 meters) and that when you take the location accuracy into account, these two sets of photos are close to each other in terms of lat/long.
Is there any way to account for margin of error in lat/long when you're doing DBSCAN?
(Note: I'm not sure if this question is as articulate as it should be, so if there's anything I can do to make it more clear, please let me know.)

Note that DBSCAN doesn't actually need the distances.
Look up Generalized DBSCAN: all it really uses is a "is a neighbor of" relationship.
If you really need to incorporate uncertainty, look up the various DBSCAN variations and extensions that handle imprecise data explicitly. However, you may get pretty much the same results just by choosing a somewhat reasonable threshold for epsilon. There is room for choosing a larger epsilon than the one you deem adequate: if you want to use epsilon = 1km, and you assume your data is imprecise on the order of 100m, then use 1100m as epsilon instead.
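As a sketch of the "is a neighbor of" idea with the inflated epsilon, assuming points are (lat, lon) pairs in degrees and a spherical Earth. The names haversine_m and is_neighbor and the default values are hypothetical; they are not part of any DBSCAN library.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def is_neighbor(p, q, eps_m=1000.0, margin_m=100.0):
    """Neighbor predicate: within epsilon plus the assumed GPS error margin."""
    return haversine_m(p[0], p[1], q[0], q[1]) <= eps_m + margin_m
```

Two photos about 1056 m apart would count as neighbors with the 100 m margin, but not with a plain 1 km epsilon.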

Find geometry (shapes) from node cloud

I am working on some code that needs to recognize some fairly basic geometry based on a cloud of nodes. I would be interested in detecting:
plates (simple bounded planes)
cylinders (two node loops)
half cylinders (arc+line+arc+line)
domes (n*loop+top node)
I tried searching for "geometry from node cloud" and "get geometry from nodes", but I can't find a nice reference. There is probably a whole field on this; can someone point me the way? I already started coding something, but I feel like I'm re-inventing the wheel...

A good start is to just get the convex hull (the tightest-fitting convex polygon that can surround your node cloud) of the nodes, using either Graham's scan or QuickHull. Note that QuickHull is easier to code and probably faster, unless you are really unlucky. There is a pure Python implementation of QuickHull here. But I'm sure a quick Google search will show many other results.
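To give a sense of how small a 2D hull routine can be, here is a minimal sketch. Note that it uses Andrew's monotone chain, a sorting-based relative of Graham's scan, rather than QuickHull, and it assumes the nodes are (x, y) tuples. The function names are illustrative.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p do not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```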
Usually the convex hull is the starting point for most other shape recognition algorithms. If your cloud can be described as a sequence of strokes, there are many algorithms and approaches:
Recognizing multistroke geometric shapes: an experimental evaluation
This may be even better, once you have the convex hull, break down the polygon to pairs of vertices and run this algorithm to match based on similarity to training data:
Hierarchical shape recognition using polygon approximation and dynamic alignment
Both of these papers are fairly old, so you can use Google Scholar to see who cites them, and there you have a nice literature trail of attempts to solve this problem.
There are a multitude of different methods and approaches, and this has been well studied in the literature. Which method you take really depends on the level of accuracy you hope to achieve and the number of shapes you want to recognize, as well as your input data set.
Either way, using a convex hull algorithm to produce polygons out of point clouds is the very first step and usually the input to the more sophisticated algorithms.
EDIT:
I did not consider the 3D case. For that, there is a lot of really interesting work in computer graphics that has focused on this, for instance this paper: Efficient RANSAC for Point-Cloud Shape Detection
Selections from the abstract:
We present an automatic algorithm to detect basic shapes in unorganized point clouds. The algorithm decomposes the point cloud into a concise, hybrid structure of inherent shapes and a set of remaining points. Each detected shape serves as a proxy for a set of corresponding points. Our method is based on random sampling and detects planes, spheres, cylinders, cones and tori...We demonstrate that the algorithm is robust even in the presence of many outliers and a high degree of noise...Moreover the algorithm is conceptually simple and easy to implement...
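For intuition only, here is a minimal sketch of plain RANSAC for a single plane. It is not the efficient algorithm from the paper, just the basic random-sampling idea it builds on. It assumes the cloud is a list of (x, y, z) tuples; the function names, tolerance, iteration count and fixed seed are all illustrative assumptions.

```python
import random

def plane_from_points(p, q, r):
    """Return (unit normal, d) with normal . x + d = 0, or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, tolerance=0.05, iterations=200, seed=0):
    """Plane with the most points within `tolerance`, plus those inliers."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample, try again
        n, d = model
        inliers = [pt for pt in points
                   if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = model, inliers
    return best_plane, best_inliers
```

Removing the inliers and running again on the remaining points is one simple way to pull out several shapes in turn.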

To complement Josiah's answer -- since you didn't say whether there is a single such object to be detected in your point cloud -- a good solution can be to use a (generalized) Hough transform.
The idea is that each point will vote for a set of candidates in the parameter space of the shape you are considering. For instance, if you think the current object is a cylinder, you have a 7D parameter space consisting of the cylinder center (3D), direction (2D), height (1D) and radius (1D), and each point in your point cloud will vote for all parameters that agree with the observation of that point. Doing so lets you find the parameters of the actual cylinder by taking the set of parameters that has the highest number of votes.
Doing the same thing for planes, spheres, etc. will give you the best matching shape.
A strength of this method is that it allows for multiple objects in the same point cloud.
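The voting idea is easiest to see in the classic 2D case, so here is a minimal sketch for straight lines rather than cylinders. It assumes points are (x, y) tuples and parameterizes a line as rho = x*cos(theta) + y*sin(theta). The function names and bin sizes are illustrative assumptions.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes over (theta index, rho bin) for every point."""
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = k * math.pi / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(k, round(rho / rho_step))] += 1
    return votes

def best_line(points, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote count) of the most voted line."""
    votes = hough_lines(points, n_theta, rho_step)
    (k, rho_bin), count = votes.most_common(1)[0]
    return k * math.pi / n_theta, rho_bin * rho_step, count
```

Taking several of the top bins from votes.most_common instead of only the first is what lets the method report more than one object, although neighboring bins that describe almost the same line usually need to be merged.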
