What is the best way to check collision of huge number of circles?
It's very easy to detect collision between two circles, but if we check every combination then it is O(n2) which definitely not an optimal solution.
We can assume that circle object has following properties:
Coordinates
Radius
Velocity
Direction
Velocity is constant, but direction can change.
I've come up with two solutions, but maybe there are some better solutions.
Solution 1
Divide whole space into overlapping squares and check for collision only with circles that are in the same square. Squares need to overlap so there won't be a problem when a circle moves from one square to another.
Solution 2
At the beginning distances between every pair of circles need to be calculated.
If the distance is small then these pair is stored in some list, and we need to check for collision in every update.
If the distance is big then we store after which update there can be a collision (it can be calculated because we know the distance and velocitites). It needs to be stored in some kind of priority queue. After previously calculated number of updates distance needs to be checked again and then we do the same procedure - put it on the list or again in the priority queue.
Answers to Mark Byers questions
Is it for a game?
It's for simulation, but it can be treated also as a game
Do you want to recalculate the new position every n milliseconds, and also check for collisions at this time?
Yes, time between update is constant.
Do you want to find the time at which the first/every collision occurs?
No, I want to find every collision and do 'something' when it occures.
How important is accuracy?
It depends on what do you mean by accuracy. I need to detect all collisions.
Is it a big problem if very small fast moving circles can pass through each other occasionally?
It can be assumed that speed is so small that it won't happen.
There are "spatial index" data-structures for storing your circles for quick comparison later; Quadtree, r-tree and kd-tree are examples.
Solution 1 seems to be a spatial index, and solution 2 would benefit from a spatial index every time you recalculate your pairs.
To complicate matters, your objects are moving - they have velocity.
It is normal to use spatial indexes for objects in games and simulations, but mostly for stationary objects, and typically objects that don't react to a collision by moving.
It is normal in games and such that you compute everything at set time intervals (discrete), so it might be that two objects pass through each other but you fail to notice because they moved so fast. Many games actually don't even evaluate collisions in strict chronological order. They have a spatial index for stationary objects e.g. walls, and lists for all the moving objects that they check exhaustively (although with relaxed discrete checks as I outlined).
Accurate continuous collision detection and where the objects react to collisions in simulations is usually much more demanding.
The pairs approach you outlined sounds promising. You might keep the pairs sorted by next collision, and reinsert them when they have collided in the appropriate new positions. You only have to sort the new generated collision list (O(n lg n)) for the two objects and then to merge two lists (the new collisions for each object, and the existing list of collisions; inserting the new collisions, removing those stale collisions that listed the two objects that collided) which is O(n).
Another solution to this is to adapt your spatial index to store the objects not strictly in one sector but in each that it has passed through since the last calculation, and do things discretely. This means storing fast moving objects in your spatial structure, and you'd need to optimise it for this case.
Remember that linked lists or lists of pointers are very bad for caching on modern processors. I'd advocate that you store copies of your circles - their important properties for collision detection at any rate - in an array (sequential memory) in each sector of any spatial index, or in the pairs you outlined above.
As Mark says in the comments, it could be quite simple to parallelise the calculations.
I assume you are doing simple hard-sphere molecular dynamic simulation, right? I came accros the same problem many times in Monte Carlo and molecular dynamic simulations. Both of your solutions are very often mentioned in literature about simulations. Personaly I prefer solution 1, but slightly modified.
Solution 1
Divide your space into rectangular cells that don't overlap. So when you check one circle for collision you look for all circles inside a cell that your first circle is, and look X cells in each direction around. I've tried many values of X and found that X=1 is the fastest solution. So you have to divide space into cells size in each direction equal to:
Divisor = SimulationBoxSize / MaximumCircleDiameter;
CellSize = SimulationBoxSize / Divisor;
Divisor should be bigger than 3, otherwise it will cause errors (if it is too small, you should enlarge your simulation box).
Then your algorithm will look like this:
Put all circles inside the box
Create cell structure and store indexes or pointers to circles inside a cell (on array or on a list)
Make a step in time (move everything) and update circles positions inside on cells
Look around every circle for collision. You should check one cell around in every direction
If there is a collision - do something
Go to 3.
If you will write it correctly then you would have something about O(N) complexity, because maximum number of circles inside 9 cells (in 2D) or 27 cells (in 3D) is constant for any total number of circles.
Solution 2
Ususaly this is done like this:
For each circle create a list of circles that are in distance R < R_max, calculate time after which we should update lists (something about T_update = R_max / V_max; where V_max is maximum current velocity)
Make a step in time
Check distance of each circle with circles on its list
If there is a collision - do something
If current time is bigger then T_update, go to 1.
Else go to 2.
This solution with lists is very often improved by adding another list with R_max_2 > R_max and with its own T_2 expiration time. In this solution this second list is used to update the first list. Of course after T_2 you have to update all lists which is O(N^2). Also be carefull with this T and T_2 times, because if collision can change velocity then those times would change. Also if you introduce some foreces to your system, then it will also cause velocity change.
Solution 1+2
You can use lists for collision detection and cells for updating lists. In one book it was written that this is the best solution, but I think that if you create small cells (like in my example) then solution 1 is better. But it is my opinion.
Other stuff
You can also do other things to improve speed of simulation:
When you calculate distance r = sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + ...) you don't have to do square root operation. You can compare r^2 to some value - it's ok. Also you don't have to do all (x1-x2)*(x1-x2) operations (I mean, for all dimentions), because if x*x is bigger than some r_collision^2 then all other y*y and so on, summed up, would be bigger.
Molecular dynamics method is very easy to parallelise. You can do it with threads or even on GPU. You can calculate each distance in different thread. On GPU you can easly create thousends of threads almost costless.
For hard-spheres there is also effective algorithm that doesn't do steps in time, but instead it looks for nearest collision in time and jumps to this time and updates all positions. It can be good for not dense systems where collisions are not very probable.
one possible technique is to use the Delaunay triangulation on the center of your circles.
consider the center of each circle and apply the delaunay triangulation. this will tesselate your surface into triangles. this allows you to build a graph where each node stores the center of a triangle, and each edge connects to the center of a neighbour circle. the tesselation operated above will limit the number of neighbours to a reasonable value (6 neighbours on average)
now, when a circle moves, you have a limited set of circles to consider for collision. you then have to apply the tesselation again to the set of circles which are impacted by the move, but this operation involves only a very small subset of circles (the neighbours of the moving circle, and some neighbours of the neighbours)
the critical part is the first tesselation, which will take some time to perform, later tesselations are not a problem. and of course you need an efficient implementation of a graph in term of time and space...
Sub-divide your space up into regions and maintain a list of which circles are centred in each region.
Even if you use a very simple scheme, such as placing all the circles in a list, sorted by centre.x, then you can speed things up massively. To test a given circle, you only need to test it against the circles on either side of it in the list, going out until you reach one that has an x coordinate more than radius away.
You could make a 2D version of a "sphere tree" which is a special (and really easy to implement) case of the "spatial index" that Will suggested. The idea is to "combine" circles into a "containing" circle until you've got a single circle that "contains" the "huge number of circles".
Just to indicate the simplicity of computing a "containing circle" (top-of-my-head):
1) Add the center-locations of the two circles (as vectors) and scale by 1/2, thats the center of the containing circle
2) Subtract the center locations of the two circles (as vectors), add the radii and scale by 1/2, thats the radius of the containing circle
What answer is most efficient will depend somewhat on the density of circles. If the density is low, then placing placing a low-resolution grid over the map and marking those grid elements that contain a circle will likely be the most efficient. This will take approximately O(N*m*k) per update, where N is the total number of circles, m is the average number of circles per grid point, and k is the average number of grid points covered by one circle. If one circle moves more than one grid point per turn, then you have to modify m to include the number of grid points swept.
On the other hand, if the density is extremely high, you're best off trying a graph-walking approach. Let each circle contain all neighbors within a distance R (R > r_i for every circle radius r_i). Then, if you move, you query all the circles in the "forward" direction for neighbors they have and grab any that will be within D; then you forget all the ones in the backward direction that are now farther than D. Now a complete update will take O(N*n^2) where n is the average number of circles within a radius R. For something like a closely-spaced hexagonal lattice, this will give you much better results than the grid method above.
A suggestion - I am no game developer
Why not precalculate when the collisions are going to occur
as you specify
We can assume that circle object has following properties:
-Coordinates
-Radius
-Velocity
-Direction
Velocity is constant, but direction can change.
Then as the direction of one object changes, recalculate those pairs that are affected. This method is effective if directions do not change too frequently.
As Will mentioned in his answer, spacial partition trees are the common solution to this problem. Those algorithms sometimes take some tweaking to handle moving objects efficiently though. You'll want to use a loose bucket-fitting rule so that most steps of movement don't require an object to change buckets.
I've seen your "solution 1" used for this problem before and referred to as a "collision hash". It can work well if the space you're dealing with is small enough to be manageable and you expect your objects to be at least vaguely close to uniformly distributed. If your objects may be clustered, then it's obvious how that causes a problem. Using a hybrid approach of some type of a partition tree inside each hash-box can help with this and can convert a pure tree approach into something that's easier to scale concurrently.
Overlapping regions is one way to deal with objects that straddle the boundaries of tree buckets or hash boxes. A more common solution is to test any object that crosses the edge against all objects in the neighboring box, or to insert the object into both boxes (though that requires some extra handling to avoid breaking traversals).
If your code depends on a "tick" (and tests to determine if objects overlap at the tick), then:
when objects are moving "too fast" they skip over each other without colliding
when multiple objects collide in the same tick, the end result (e.g. how they bounce, how much damage they take, ...) depends on the order that you check for collisions and not the order that collisions would/should occur. In rare cases this can cause a game to lock up (e.g. 3 objects collide in the same tick; object1 and object2 are adjusted for their collision, then object2 and object3 are adjusted for their collision causing object2 to be colliding with object1 again, so the collision between object1 and object2 has to be redone but that causes object2 to be colliding with object3 again, so ...).
Note: In theory this second problem can be solved by "recursive tick sub-division" (if more than 2 objects collide, divide the length of the tick in half and retry until only 2 objects are colliding in that "sub-tick"). This can also cause games to lock up and/or crash (when 3 or more objects collide at the exact same instant you end up with a "recurse forever" scenario).
In addition; sometimes when game developers use "ticks" they also say "1 fixed length tick = 1 / variable frame rate", which is absurd because something that is supposed to be a fixed length can't depend on something variable (e.g. when the GPU is failing to achieve 60 frames per second the entire simulation goes in slow motion); and if they don't do this and have "variable length ticks" instead then both of the problems with "ticks" become significantly worse (especially at low frame rates) and the simulation becomes non-deterministic (which can be problematic for multi-player, and can result in different behavior when the player saves, loads or pauses the game).
The only correct way is to add a dimension (time), and give each object a line segment described as "starting coordinates and ending coordinates", plus a "trajectory after ending coordinates". When any object changes its trajectory (either because something unpredicted happened or because it reached its "ending coordinates") you'd find the "soonest" collision by doing a "distance between 2 lines < (object1.radius + object2.radius)" calculation for the object that changed and every other object; then modify the "ending coordinates" and "trajectory after ending coordinates" for both objects.
The outer "game loop" would be something like:
while(running) {
frame_time = estimate_when_frame_will_be_visible(); // Note: Likely to be many milliseconds after you start drawing the frame
while(soonest_object_end_time < frame_time) {
update_path_of_object_with_soonest_end_time();
}
for each object {
calculate_object_position_at_time(frame_time);
}
render();
}
Note that there are multiple ways to optimize this, including:
split the world into "zones" - e.g. so that if you know object1 would be passing through zones 1 and 2 then it can't collide with any other object that doesn't also pass through zone 1 or zone 2
keep objects in "end_time % bucket_size" buckets to minimize time taken to find "next soonest end time"
use multiple threads to do the "calculate_object_position_at_time(frame_time);" for each object in parallel
do all the "advance simulation state up to next frame time" work in parallel with "render()" (especially if most rendering is done by GPU, leaving CPU/s free).
For performance:
When collisions occur infrequently it can be significantly faster than "ticks" (you can do almost no work for relatively long periods of time); and when you have spare time (for whatever reason - e.g. including because the player paused the game) you can opportunistically calculate further into the future (effectively, "smoothing out" the overhead over time to avoid performance spikes).
When collisions occur frequently it will give you the correct results, but can be slower than a broken joke that gives you incorrect results under the same conditions.
It also makes it trivial to have an arbitrary relationship between "simulation time" and "real time" - things like fast forward and slow motion will not cause anything to break (even if the simulation is running as fast as hardware can handle or so slow that its hard to tell if anything is moving at all); and (in the absence of unpredictability) you can calculate ahead to an arbitrary time in the future, and (if you store old "object line segment" information instead of discarding it when it expires) you can skip to an arbitrary time in the past, and (if you only store old information at specific points in time to minimize storage costs) you can skip back to a time described by stored information and then calculate forward to an arbitrary time. These things combined also make it easy to do things like "instant slow motion replay".
Finally; it's also more convenient for multiplayer scenarios, where you don't want to waste a huge amount of bandwidth sending a "new location" for every object to every client at every tick.
Of course the downside is complexity - as soon as you want to deal with things like acceleration/deceleration (gravity, friction, lurching movement), smooth curves (elliptical orbits, splines) or different shaped objects (e.g. arbitrary meshes/polygons and not spheres/circles) the mathematics involved in calculating when the soonest collision will occur becomes significantly harder and more expensive; which is why game developers resort to the inferior "ticks" approach for simulations that are more complex than the case of N spheres or circles with linear motion.
I'm working on a python script whose goal is to detect if a point is out of a row of points (gps statement from an agricultural machine).
Input data are shapefile and I use Geopandas library for all geotreatments.
My first idea was to make a buffer around the 2 points around considered point. After that, I watch if my point is in the buffer. But results aren't good.
So I ask myself if there is a mathematical smart method, maybe with Scikit lib... Somebody is able to help me?
try arcgis.
build two new attributes in arcgis with their X and Y coordinate,then calculate the distance between the points you want
Question is kinda vague, but my guess would be to find approximation/regression line (I believe, numpy.polyfit of 2nd degree) and take points with largest distance from line, probably with threshold relative to overall fit loss
Query:
I want to estimate the trajectory of a person wearing an IMU between point a and point b. I know the exact location of point a and point b in an x,y,z space and the time it takes the person to walk between the points.
Is it possible to reconstruct the trajectory of the person moving from point a to point b using the data from an IMU and the time?
This question is too broad for SO. You could write a PhD thesis answering it, and I know people who have.
However, yes, it is theoretically possible.
However, there are a few things you'll have to deal with:
Your system is going to discretize time on some level. The result is that your estimate of position will be non-smooth. Increasing sampling rates is one way to address this, but this frequently increases the noise of the measurement.
Possible paths are non-unique. Knowing the time it takes to travel from a-b constrains slightly the information from the IMUs, but you are still left with an infinite family of possible routes between the two. Since you mention that you're considering a person walking between two points with z-components, perhaps you can constrain the route using knowledge of topography and roads?
IMUs function by integrating accelerations to velocities and velocities to positions. If the accelerations have measurement errors, and they always do, then the error in your estimate of the position will grow over time. The longer you run the system for, the more the results will diverge. However, if you're able to use roads/topography as a constraint, you may be able to restart the integration from known points in space; that is, if you can detect 90 degree turns on a street grid, each turn gives you the opportunity to tie the integrator back to a feasible initial condition.
Given the above, perhaps the most important question you have to ask yourself is how much error you can tolerate in your path reconstruction. Low-error estimates are going to require better (i.e. more expensive) sensors, higher sampling rates, and higher-order integrators.
I have a very large data set comprised of (x,y) coordinates. I need to know which of these points are in certain regions of the 2D space. These regions are bounded by 4 lines in the 2D domain (some of the sides are slightly curved).
For smaller datasets I have used a cumbersome for loop to test each individual point for membership of each region. This doesn't seem like a good option any more due to the size of data set.
Is there a better way to do this?
For example:
If I have a set of points:
(0,1)
(1,2)
(3,7)
(1,4)
(7,5)
and a region bounded by the lines:
y=2
y=5
y=5*sqrt(x) +1
x=2
I want to find a way to identify the point (or points) in that region.
Thanks.
The exact code is on another computer but from memory it was something like:
point_list = []
for i in range(num_po):
a=5*sqrt(points[i,0]) +1
b=2
c=2
d=5
if (points[i,1]<a) && (points[i,0]<b) && (points[i,1]>c) && (points[i,1]<d):
point_list.append(points[i])
This isn't the exact code but should give an idea of what I've tried.
If you have a single (or small number) of regions, then it is going to be hard to do much better than to check every point. The check per point can be fast, particularly if you choose the fastest or most discriminating check first (eg in your example, perhaps, x > 2).
If you have many regions, then speed can be gained by using a spatial index (perhaps an R-Tree), which rapidly identifies a small set of candidates that are in the right area. Then each candidate is checked one by one, much as you are checking already. You could choose to index either the points or the regions.
I use the python Rtree package for spatial indexing and find it very effective.
This is called the range searching problem and is a much-studied problem in computational geometry. The topic is rather involved (with your square root making things nonlinear hence more difficult). Here is a nice blog post about using SciPy to do computational geometry in Python.
Long comment:
You are not telling us the whole story.
If you have this big set of points (say N of them) and one set of these curvilinear quadrilaterals (say M of them) and you need to solve the problem once, you cannot avoid exhaustively testing all points against the acceptance area.
Anyway, you can probably preprocess the M regions in such a way that testing a point against the acceptance area takes less than M operations (closer to Log(M)). But due to the small value of M, big savings are unlikely.
Now if you don't just have one acceptance area but many of them to be applied in turn on the same point set, then more sophisticated solutions are possible (namely range searching), that can trade N comparisons to about Log(N) of them, a quite significant improvement.
It may also be that the point set is not completely random and there is some property of the point set that can be exploited.
You should tell us more and show a sample case.
I'm working on an adventure game in Python using Pygame. My main problem is how I am going to define the boundaries of the room and make the main character walk aroud without hitting a boundary every time. Sadly, I have never studied algorithms so I have no clue on how to calculate a path. I know this question is quite general and hard to answer but a point in the right direction would be very appreciated. Thanks!
There are two easy ways of defining your boundaries which are appropriate for such a game.
The simpler method is to divide your area into a grid, and use a 2D array to keep track of which squares in the grid are passable. Usually, this array stores your map information too, so in each position, there is a number that indicates whether that square contains grass or wall or road or mountain etc. (and therefore what picture to display). To give you a rough picture:
######
#.# #
# ## #
# #
######
A more complex method which is necessary if you want a sort of "maze" look, with thin walls, is to use a 2D array that indicates whether there is a vertical wall in between grid squares, and also whether there is a horizontal wall between grid squares. A rough picture (it looks a stretched in ASCII but hopefully you'll get the point):
- - - -
| | |
- -
| |
- - - -
The next thing to decide is what directions your character may move in (up/down/left/right is easiest, but diagonals are not too much harder). Then the program basically has to "mentally" explore the area, starting from your current position, hoping to come across the destination.
A simple search that is easy to implement for up/down/left/right and will find you the shortest path, if there is one, is called Breadth-First search. Here is some pseudocode:
queue = new Queue #just a simple first-in-first-out
queue.push(startNode)
while not queue.empty():
exploreNode = queue.pop()
if isWalkable(exploreNode): #this doesn't work if you use
#"thin walls". The check must go
#where the pushes are instead
if isTarget(exploreNode):
#success!!!
else:
#push all neighbours
queue.push( exploreNode.up )
queue.push( exploreNode.down )
queue.push( exploreNode.left )
queue.push( exploreNode.right )
This algorithm is slow for large maps, but it will get you used to some graph-search and pathfinding concepts. Once you've verified that it works properly, you can try replacing it with A* or something similar, which should give the same results in less time!
A* and many other searching algorithms use a priority queue instead of a FIFO queue. This lets them consider "more likely" paths first, but get around to the roundabout paths if it turns out that the more direct paths are blocked.
I recommend you read up on the A* search algorithm as it is commonly used in games for pathing problems.
If this game is two dimensional (or 2.5) I suggest you use a tile system as checking for collisions will be easier. Theres lots of information online that can get you started with these.
Sadly, I have never studied algorithms so I have no clue on how to calculate a path.
Before you start writing games, you should educate yourself on those. This takes a little more effort at the beginning, but will save you much time later.
I am not familiar with pygame, but many applications commonly use bounding volumes to define the edge of some region. The idea is that as your character walks, you will check if the characters volume intersects with the volume of a wall. You can then either adjust the velocity or stop your character from moving. Use differing shapes to get a smooth wall so that your character doesn't get stuck on the pointy edges.
These concepts can be used for any application which requires quick edge and bounds detection.