How's this for collision detection? - python

I have to write a collide method inside a Rectangle class that takes another Rectangle object as a parameter and returns True if it collides with the rectangle performing the method and False if it doesn't. My solution was to use a for loop that iterates through every value of x and y in one rectangle to see if it falls within the other, but I suspect there might be more efficient or elegant ways to do it. This is the method (I think all the names are pretty self explanatory, just ask if anything isn't clear):
def collide(self,target):
    result = False
    for x in range(self.x,self.x+self.width):
        if x in range(target.get_x(),target.get_x()+target.get_width()):
            result = True
    for y in range(self.y,self.y+self.height):
        if y in range(target.get_y(),target.get_y()+target.get_height()):
            result = True
    return result
Thanks in advance!

The problem of collision detection is a well-known one, so I thought rather than speculate I might search for a working algorithm using a well-known search engine. It turns out that good literature on rectangle overlap is less easy to come by than you might think. Before we move on to that, perhaps I can comment on your use of constructs like
if x in range(target.get_x(),target.get_x()+target.get_width()):
It is to Python's credit that such an obvious expression of your idea actually succeeds as intended. What you may not realize is that in Python 2 each use of range() creates a list; in Python 3 it creates a lazy range object, whose membership test is cheap for integers, but constructing it afresh on every iteration of the outer loop is still wasted work. What I suspect you may have meant is
if target.get_x() <= x < target.get_x()+target.get_width():
(I am using half-open interval testing to reflect your use of range().) This has the merit of replacing N equality comparisons with two chained comparisons. By a relatively simple mathematical operation (subtracting target.get_x() from each term in the comparison) we transform this into
if 0 <= x-target.get_x() < target.get_width():
Do not overlook the value of eliminating such redundant method calls, though it's often simpler to save evaluated expressions by assignment for future reference.
Of course, after that scrutiny we have to look with renewed vigor at
for x in range(self.x,self.x+self.width):
This sets a lower and an upper bound on x, and for the rectangles not to collide the inner test has to be false for every one of those values of x. Delving beyond the code into the purpose of the algorithm, however, is worth doing, because any list creation the inner test does is now repeated many times over (as many times as the object is wide, to be precise). I take the liberty of paraphrasing
for x in range(self.x,self.x+self.width):
    if x in range(target.get_x(),target.get_x()+target.get_width()):
        result = True
into pseudocode: "if any x between self.x and self.x+self.width lies between the target's x and the target's x+width, then the objects are colliding". In other words, whether two ranges overlap. But you sure are doing a lot of work to find that out.
Also, just because two objects collide in the x dimension doesn't mean they collide in space. In fact, if they do not also collide in the y dimension then the objects are disjoint, otherwise you would assess these rectangles as colliding:
+----+
|    |
|    |
+----+

+----+
|    |
|    |
+----+
So you want to know if they collide in BOTH dimensions, not just one. Ideally one would define a one-dimensional collision detection (which by now we just about have ...) and then apply in both dimensions. I also hope that those accessor functions can be replaced by simple attribute access, and my code is from now on going to assume that's the case.
Having gone this far, it's probably time to take a quick look at the principles in this YouTube video, which makes the geometry relatively clear but doesn't express the formula at all well. It explains the principles quite well as long as both rectangles use the same coordinate system. Basically two objects A and B overlap horizontally if A's left side lies between B's left and right sides, or if B's left side lies between A's left and right sides. Both conditions might be true, but or short-circuits in Python, so the second test is only evaluated when the first one fails.
So let's define a one-dimensional overlap function:
def oned_ol(aleft, aright, bleft, bright):
    # half-open intervals overlap iff either one's left end lies inside the other
    return (aleft <= bleft < aright) or (bleft <= aleft < bright)
I'm going to cheat and use this for both dimensions, since the inside of my function doesn't know which dimension's data I am calling it with. If I am correct, the following formulation should do:
def rect_overlap(self, target):
    return oned_ol(self.x, self.x+self.width, target.x, target.x+target.width) \
       and oned_ol(self.y, self.y+self.height, target.y, target.y+target.height)
If you insist on using those accessor methods you will have to re-cast the code to include them. I've done sketchy testing on the 1-D overlap function, and none at all on rect_overlap, so please let me know - caveat lector. Two things emerge.
A superficial examination of code can lead to "optimization" of a hopelessly inefficient algorithm, so sometimes it's better to return to first principles and look more carefully at your algorithm.
If you use expressions as arguments to a function they are available by name inside the function body without the need to make an explicit assignment.

def collide(self, target):
    # self left of target?
    if self.x + self.width < target.x:
        return False
    # self right of target?
    if self.x > target.x + target.width:
        return False
    # self above target?
    if self.y + self.height < target.y:
        return False
    # self below target?
    if self.y > target.y + target.height:
        return False
    return True
Something like that (depends on your coord system, i.e. y positive up or down)
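For what it's worth, here is a self-contained sketch combining the ideas above into a minimal Rectangle class. The class layout and test values are mine, not the posters'; this variant uses half-open intervals, so rectangles that merely share an edge are reported as not colliding.

```python
class Rectangle:
    def __init__(self, x, y, width, height):
        self.x, self.y = x, y
        self.width, self.height = width, height

    def collide(self, target):
        # Overlap is required in BOTH dimensions; half-open intervals,
        # so rectangles that only touch along an edge do not collide.
        return (self.x < target.x + target.width and
                target.x < self.x + self.width and
                self.y < target.y + target.height and
                target.y < self.y + self.height)

a = Rectangle(0, 0, 4, 4)
b = Rectangle(2, 2, 4, 4)   # overlaps a in both dimensions
c = Rectangle(0, 4, 4, 4)   # shares a's bottom edge only (the stacked case above)
print(a.collide(b))  # True
print(a.collide(c))  # False
```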

Related

Two instances of class are equal but different hash code

I'm working on a geometry in space project and I have different geometrical entities, among which Point. Sometimes two points are equal but for small numerical errors due to calculation, such as 1 and 1.0000000001, so I implemented the __eq__ method with math.isclose() function to sort this things out.
import math

class Point(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __eq__(self, other):
        if isinstance(other, Point):
            equal_x = math.isclose(other.x, self.x, rel_tol=5e-3, abs_tol=1e-5)
            equal_y = math.isclose(other.y, self.y, rel_tol=5e-3, abs_tol=1e-5)
            equal_z = math.isclose(other.z, self.z, rel_tol=5e-3, abs_tol=1e-5)
            if equal_x and equal_y and equal_z:
                return True
        return False
How to implement the __hash__ method in order for such two objects to be equal when using sets and dictionaries?
The ultimate goal is to use the following function to "uniquify" a list of such object and remove duplicates:
def f12(seq):
    # from Raymond Hettinger
    # https://twitter.com/raymondh/status/944125570534621185
    return list(dict.fromkeys(seq))
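For reference, f12 works because dict keys are unique and, since Python 3.7, keep insertion order, so it drops duplicates while preserving the first occurrence of each element. A quick illustration (my own test values):

```python
def f12(seq):
    # dict keys are unique and insertion-ordered (Python 3.7+),
    # so this removes duplicates, keeping first occurrences
    return list(dict.fromkeys(seq))

print(f12([3, 1, 3, 2, 1]))  # [3, 1, 2]
```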
There is a general problem with your __eq__ method.
Most people will expect transitivity from an equality method: if a point a equals b, and b equals another point c, then a should also equal c. That need not hold in your implementation.
Imagine each point surrounded by a sphere (with half the tolerance as its radius) within which other points are considered equal; two points are equal exactly when their spheres overlap. Then a's sphere can overlap b's, and b's can overlap c's, without a's overlapping c's.
So a and b should have the same hash code, and b and c should have the same hash code, but not a and c? How should this be possible?
I would propose to add an extra method is_close_to and implement the logic there.
Edit:
@JLPeyret points out that you can use a grid and compute the hash value of a point from the grid cell that contains it. In that case it is possible that two nearby points sit on either side of a cell boundary and are therefore assigned different hash values. If such a probabilistic approach works for you, take a look at locality-sensitive hashing.
Instead of giving your points an 'invalid' __eq__ method, either give them an isclose method, using the code you already have, and use that instead of ==, or make them equal in the standard sense by rounding the coordinates.
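A minimal sketch of that rounding approach (the three-decimal tolerance and class layout are illustrative choices of mine, not the asker's 5e-3/1e-5 values; note that two nearby points straddling a rounding boundary can still end up in different buckets):

```python
class Point:
    NDIGITS = 3  # illustrative tolerance: points equal after rounding to 3 d.p.

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def _key(self):
        # One canonical tuple used by BOTH __eq__ and __hash__,
        # which keeps the two methods consistent (and equality transitive).
        return (round(self.x, self.NDIGITS),
                round(self.y, self.NDIGITS),
                round(self.z, self.NDIGITS))

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

pts = [Point(1.0, 2.0, 3.0), Point(1.0000000001, 2.0, 3.0), Point(4.0, 5.0, 6.0)]
print(len(list(dict.fromkeys(pts))))  # 2 -- the two nearly equal points collapse
```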

How to apply recursion function in land subdivision?

I've made a subdivision function that divides a polygon using a bounding-box method. subdivision(coordinates) returns subblockL and subblockR (left and right). If I want to repeat this subdivision until each piece has an area less than 200, I need to use recursion.
ex:
B = subdivision(A)[0], C = subdivision(B)[0], D = subdivision(C)[0]... until it reaches the area close to 200. (in other words,
subdivision(subdivision(subdivision(A)[0])[0])[0]...)
How can I simplify the repetition of subdivision? And how can I apply subdivision to every block instead of a single block?
while area(subdivision(A)[0]) < 200:
    for i in range(A):
        subdivision(i)[0]

def sd_recursion(x):
    if x == subdivision(A):
        return subdivision(A)
    else:
        return
I'm not sure what function to put in
"What function to put in" is the function itself; that's the definition of recursion.
def sd_recursive(coordinates):
    if area(coordinates) < 200:
        return [coordinates]
    else:
        a, b = subdivision(coordinates)
        return sd_recursive(a) + sd_recursive(b)  # list combination, not arithmetic addition
To paraphrase, if the area is less than 200, simply return the polygon itself. Otherwise, divide the polygon into two parts, and return ... the result of applying the same logic to each part in turn.
Recursive functions are challenging because recursive functions are challenging. Until you have wrapped your head around this apparently circular argument, things will be hard to understand. The crucial design point is to have a "base case" which does not recurse, which in other words escapes the otherwise infinite loop of the function calling itself under some well-defined condition. (There's also indirect recursion, where X calls Y which calls X which calls Y ...)
If you are still having trouble, look at one of the many questions about debugging recursive functions. For example, Understanding recursion in Python
I assumed the function should return a list in every case, but there are multiple ways to arrange this, just so long as all parts of the code obey the same convention. Which way to prefer also depends on how the coordinates are represented and what's convenient for your intended caller.
(In Python, ['a'] + ['b'] returns ['a', 'b'] so this is not arithmetic addition of two lists, it's just a convenient way to return a single list from combining two other lists one after the other.)
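To make the recursion concrete, here is a self-contained toy version in which a "polygon" is just an axis-aligned rectangle (x, y, width, height). The area and subdivision functions below are stand-ins invented for illustration, not the poster's real ones:

```python
def area(rect):
    # rect is (x, y, width, height); stand-in for the poster's area()
    _, _, w, h = rect
    return w * h

def subdivision(rect):
    # stand-in for the poster's subdivision(): halve along the longer side
    x, y, w, h = rect
    if w >= h:
        return (x, y, w / 2, h), (x + w / 2, y, w / 2, h)
    return (x, y, w, h / 2), (x, y + h / 2, w, h / 2)

def sd_recursive(rect):
    if area(rect) < 200:
        return [rect]          # base case: small enough, keep as-is
    a, b = subdivision(rect)   # otherwise split and recurse on both halves
    return sd_recursive(a) + sd_recursive(b)

pieces = sd_recursive((0, 0, 40, 40))   # total area 1600
print(len(pieces))                      # 16 pieces of area 100 each
```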
Recursion can always be unrolled; the above can be refactored to
def sd_unrolled(coordinates):
    # coordinates is a work list of polygons still to be processed
    result = []
    while coordinates:
        if area(coordinates[0]) < 200:
            result.append(coordinates[0])
            coordinates = coordinates[1:]
        else:
            a, b = subdivision(coordinates[0])
            coordinates = [a, b] + coordinates[1:]
    return result
This is tricky in its own right (though it could perhaps be simplified by introducing a few temporary variables) and pretty inefficient, or at least inelegant, since we keep copying slices of the coordinates list to maintain the tail while we split its head (the first element) until each piece is small enough.

Under what circumstances is bidirectional bubble sort better than standard bubble sort?

I have implemented the bidirectional bubble sort algorithm. But I can't think of a scenario where bidirectional bubble sort is better than standard bubble sort. Can someone give me a clue?
My implementation in Python:
def bubbleSort_v(my_list):
    s = 0
    e = 0
    right = True
    for index in range(len(my_list)-1, 0, -1):
        if right:
            right = False
            for idx in range(s, index+e, 1):
                if my_list[idx] > my_list[idx+1]:
                    my_list[idx], my_list[idx+1] = my_list[idx+1], my_list[idx]
            s += 1
        else:
            right = True
            for idx in range(index-1+s, e, -1):
                if my_list[idx] < my_list[idx-1]:
                    my_list[idx], my_list[idx-1] = my_list[idx-1], my_list[idx]
            e += 1
    return my_list
Thanks!
Consider an element near the right end of the list (for instance at the last index) that belongs near the left end (for instance at the first index). Single-directional bubble sort takes a long time over this: each pass moves it only one step.
With bi-directional bubble sort, however, the very first right-to-left pass carries that element all the way to its place.
So in general the bidirectional variant is better when one or more elements must travel a large number of places in the direction opposite to the one the single-directional passes sweep in.
For your implementation of bubble sort it will, however, not make much difference: bubble sort usually tests while it sorts, and as soon as it completes a full pass without any swaps it simply stops.
For example a single-directional bubblesort that moves to the right:
def single_bubble(data):
    for i in range(len(data)):
        can_exit = True
        for j in range(len(data)-i-1):
            if data[j] > data[j+1]:
                data[j], data[j+1] = data[j+1], data[j]
                can_exit = False
        if can_exit:
            return
So in case you want to move an element a large number of places to the left, then for each such step, you will have to do a full loop again. We can optimize the above method a bit more, but this behavior cannot be eliminated.
Bi-directional bubblesort can be implemented like:
def bidirectional_bubble(data):
    for i in range(len(data)):
        can_exit = True
        # left-to-right pass: bubbles the largest remaining element rightward
        for j in range(i, len(data)-i-1):
            if data[j] > data[j+1]:
                data[j], data[j+1] = data[j+1], data[j]
                can_exit = False
        if can_exit:
            return
        can_exit = True
        # right-to-left pass: bubbles the smallest remaining element leftward
        for j in range(len(data)-i-1, i, -1):
            if data[j-1] > data[j]:
                data[j-1], data[j] = data[j], data[j-1]
                can_exit = False
        if can_exit:
            return
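One way to see the difference is to count passes on a "turtle" input: a small element stuck at the far right. The pass-counting versions below are my own instrumented sketches, written only to compare the two strategies:

```python
def bubble_passes(data):
    # standard rightward bubble sort; returns (sorted copy, full passes used)
    data = data[:]
    passes = 0
    while True:
        passes += 1
        swapped = False
        for j in range(len(data) - 1):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:
            return data, passes

def cocktail_passes(data):
    # bidirectional bubble sort; counts each one-directional sweep as a pass
    data = data[:]
    passes = 0
    swapped = True
    while swapped:
        passes += 1
        swapped = False
        for j in range(len(data) - 1):          # forward sweep
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:
            break
        passes += 1
        for j in range(len(data) - 1, 0, -1):   # backward sweep
            if data[j - 1] > data[j]:
                data[j - 1], data[j] = data[j], data[j - 1]
                swapped = True
    return data, passes

turtle = [2, 3, 4, 5, 6, 7, 1]   # a "turtle": smallest element at the far right
print(bubble_passes(turtle))     # the 1 creeps left one place per pass
print(cocktail_passes(turtle))   # one backward sweep carries it home
```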
That being said, bubble sort is not a good sorting algorithm in general. There are far better algorithms, like quicksort, mergesort, timsort, and radix sort (for numerical data).
Bubble sort is actually quite bad even among O(n²) algorithms, since it moves an element one place at a time. Insertion sort, by contrast, first determines where an element has to go and then shifts that part of the list in one motion, saving a lot of useless moves. These algorithms mainly serve an educational purpose when learning to design, implement and analyze algorithms, since they perform significantly worse than more advanced ones.
Implementing a general-purpose sorting function yourself is probably not worthwhile: good algorithms are already implemented for all popular programming languages, and those implementations are fast, memory-efficient, and well tested.

Tips on improving this function?

This may be quite a green question, but I hope you understand – just started on python and trying to improve. Anyways, wrote a little function to do the "Shoelace Method" of finding the area of a polygon in a Cartesian plane (see this for a refresher).
I want to know how can I improve my method, so I can try out fancy new ways of doing the same old things.
def shoelace(list):
    r_p = 0  # Positive Values
    r_n = 0  # Negative Values
    x, y = [i[0] for i in list], [i[1] for i in list]
    x.append(x[0]), y.append(y[0])
    print(x, y)
    for i in range(len(x)):
        if (i+1) < len(x):
            r_p += (x[i] * y[i+1])
            r_n += (x[i+1] * y[i])
        else:
            break
    return ((abs(r_p - r_n))/2)
Don't use short variable names that need to be commented; use names that indicate the function.
list is the name of the built-in list type, so while Python will let you replace that name, it's a bad idea stylistically.
, should not be used to separate what are supposed to be statements. You can use ;, but it's generally better to just put things on separate lines. In your case, it happens to work because you are using .append for the side effect, but basically what you are doing is constructing the 2-tuple (None, None) (the return values from .append) and throwing it away.
Use built-in functions where possible for standard list transformations. See the documentation for zip, for example. Except you don't really need to perform this transformation; you want to consider pairs of adjacent points, so do that - and take apart their coordinates inside the loop.
However, you can use zip to transform the list of points into a list of pairs-of-adjacent-points :) which lets you write a much cleaner loop. The idea is simple: first, we make a list of all the "next" points relative to the originals, and then we zip the two point-lists together.
return is not a function, so the thing you're returning does not need surrounding parentheses.
Instead of tallying up separate positive and negative values, perform signed arithmetic on a single value.
def shoelace(points):
    signed_double_area = 0
    next_points = points[1:] + points[:1]
    for begin, end in zip(points, next_points):
        begin_x, begin_y = begin
        end_x, end_y = end
        signed_double_area += begin_x * end_y
        signed_double_area -= end_x * begin_y
    return abs(signed_double_area) / 2
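As a quick sanity check, a compact version of that function gives the expected areas for shapes we can compute by hand (the test polygons are my own values):

```python
def shoelace(points):
    # signed double area accumulated over adjacent point pairs
    signed_double_area = 0
    next_points = points[1:] + points[:1]   # each point's successor, wrapping around
    for (begin_x, begin_y), (end_x, end_y) in zip(points, next_points):
        signed_double_area += begin_x * end_y - end_x * begin_y
    return abs(signed_double_area) / 2

print(shoelace([(0, 0), (4, 0), (0, 3)]))          # 6.0  (3-4-5 right triangle)
print(shoelace([(0, 0), (1, 0), (1, 1), (0, 1)]))  # 1.0  (unit square)
```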
Functionally, your program is quite good. One minor remark: in Python 2, replace range(len(x)) with xrange(len(x)); it makes the program slightly more efficient, because range builds the full list of values while xrange generates them lazily. Use range only when you actually need the list it creates; if all you need is to loop over the values, use xrange. (In Python 3, range is already lazy and xrange is gone.)
Also, you don't need the parentheses in the return statement, nor in the r_p += and r_n += statements.
Regarding style, Python variable assignments should be written with a single space on each side of the = symbol:
r_p = 0
r_n = 0

List of objects or parallel arrays of properties?

The question is, basically: what would be more preferable, both performance-wise and design-wise - to have a list of objects of a Python class or to have several lists of numerical properties?
I am writing some sort of a scientific simulation which involves a rather large system of interacting particles. For simplicity, let's say we have a set of balls bouncing inside a box so each ball has a number of numerical properties, like x-y-z-coordinates, diameter, mass, velocity vector and so on. How to store the system better? Two major options I can think of are:
to make a class "Ball" with those properties and some methods, then store a list of objects of the class, e. g. [b1, b2, b3, ...bn, ...], where for each bn we can access bn.x, bn.y, bn.mass and so on;
to make an array of numbers for each property, then for each i-th "ball" we can access its 'x' coordinate as xs[i], 'y' coordinate as ys[i], 'mass' as masses[i] and so on;
To me it seems that the first option represents a better design. The second option looks somewhat uglier, but might be better in terms of performance, and it could be easier to use it with numpy and scipy, which I try to use as much as I can.
I am still not sure if Python will be fast enough, so it may be necessary to rewrite it in C++ or something, after initial prototyping in Python. Would the choice of data representation be different for C/C++? What about a hybrid approach, e.g. Python with C++ extension?
Update: I never expected any performance gain from parallel arrays per se, but in a mixed environment like Python + Numpy (or whatever SlowScriptingLanguage + FastNativeLibrary) using them may (or may not?) let you move more work out of your slow scripting code and into the fast native library.
Having an object for each ball in this example is certainly better design. Parallel arrays are really a workaround for languages that do not support proper objects. I wouldn't use them in a language with OO capabilities unless it's a tiny case that fits within a function (and maybe not even then) or if I've run out of every other optimization option and the profiler shows that property access is the culprit. This applies twice as much to Python as to C++, as the former places a large emphasis on readability and elegance.
I agree that parallel arrays are almost always a bad idea, but don't forget that you can use views into a numpy array when you're setting things up, though... (Yes, I know this is effectively using parallel arrays, but I think it's the best option in this case...)
This is great if you know the number of "balls" you're going to create beforehand, as you can allocate an array for the coordinates, and store a view into that array for each ball object.
You have to be a bit careful to do operations in-place on the coords array, but it makes updating coordinates for numerous "balls" much, much, much faster.
For example...
import numpy as np

class Ball(object):
    def __init__(self, coords):
        self.coords = coords

    def _set_coord(self, i, value):
        self.coords[i] = value

    x = property(lambda self: self.coords[0],
                 lambda self, value: self._set_coord(0, value))
    y = property(lambda self: self.coords[1],
                 lambda self, value: self._set_coord(1, value))

    def move(self, dx, dy):
        self.x += dx
        self.y += dy

def main():
    n_balls = 10
    n_dims = 2
    coords = np.zeros((n_balls, n_dims))
    balls = [Ball(coords[i, :]) for i in range(n_balls)]

    # Just to illustrate that the coords are updating
    ball = balls[0]

    # Random walk by updating coords array
    print('Moving all the balls randomly by updating coords')
    for step in range(5):
        # Add a random value to all coordinates
        coords += 0.5 - np.random.random((n_balls, n_dims))
        # Display the coords for a particular ball and the
        # corresponding row of the coords array
        print('  Value of ball.x, ball.y:', ball.x, ball.y)
        print('  Value of coords[0,:]:', coords[0, :])

    # Move an individual ball object
    print('Moving a ball individually through Ball.move()')
    ball.move(0.5, 0.5)
    print('  Value of ball.x, ball.y:', ball.x, ball.y)
    print('  Value of coords[0,:]:', coords[0, :])

main()
Just to illustrate, this outputs something like:
Moving all the balls randomly by updating coords
Value of ball.x, ball.y: -0.125713650677 0.301692195466
Value of coords[0,:]: [-0.12571365 0.3016922 ]
Value of ball.x, ball.y: -0.304516863495 -0.0447543559805
Value of coords[0,:]: [-0.30451686 -0.04475436]
Value of ball.x, ball.y: -0.171589457954 0.334844443821
Value of coords[0,:]: [-0.17158946 0.33484444]
Value of ball.x, ball.y: -0.0452864552743 -0.0297552313656
Value of coords[0,:]: [-0.04528646 -0.02975523]
Value of ball.x, ball.y: -0.163829876915 0.0153203173857
Value of coords[0,:]: [-0.16382988 0.01532032]
Moving a ball individually through Ball.move()
Value of ball.x, ball.y: 0.336170123085 0.515320317386
Value of coords[0,:]: [ 0.33617012 0.51532032]
The advantage here is that updating a single numpy array is going to be much, much faster than iterating through all of your ball objects, but you retain a more object-oriented approach.
Just my thoughts on it, anyway..
EDIT: To give some idea of the speed difference, with 1,000,000 balls:
In [104]: %timeit coords[:,0] += 1.0
100 loops, best of 3: 11.8 ms per loop
In [105]: %timeit [item.x + 1.0 for item in balls]
1 loops, best of 3: 1.69 s per loop
So, updating the coordinates directly using numpy is roughly 2 orders of magnitude faster when using a large number of balls. (the difference is smaller when using 10 balls, as per the example, roughly a factor of 2x, rather than 150x)
I think it depends on what you're going to be doing with them, and how often you're going to be working with (all attributes of one particle) vs (one attribute of all particles). The former is better suited to the object approach; the latter is better suited to the array approach.
I was facing a similar problem (although in a different domain) a couple of years ago. The project got deprioritized before I actually implemented this phase, but I was leaning towards a hybrid approach, where in addition to the Ball class I would have an Ensemble class. The Ensemble would not be a list or other simple container of Balls, but would have its own attributes (which would be arrays) and its own methods. Whether the Ensemble is created from the Balls, or the Balls from the Ensemble, depends on how you're going to construct them.
One of my coworkers was arguing for a solution where the fundamental object was an Ensemble which might contain only one Ball, so that no calling code would ever have to know whether you were operating on just one Ball (do you ever do that for your application?) or on many.
Will you be having any forces between the balls (hard sphere/collision, gravity, electromagnetic)? I'm guessing so. Will you be having a large enough number of balls to want to use Barnes-Hut simulation ideas? If so, then you should definitely use the Ball class idea so that you can store them easily in octrees or something else along those lines. Also, using the Barnes-Hut simulation will cut down the complexity of the simulation to O(N log N) from O(N^2).
Really though, if you don't have forces between the balls or aren't using many balls, you don't need the possible speed gains from using parallel arrays and should go with the Ball class idea for that as well.
