Two instances of class are equal but different hash code

I'm working on a geometry-in-space project and I have different geometrical entities, among which is Point. Sometimes two points are equal except for small numerical errors due to calculation, such as 1 and 1.0000000001, so I implemented the __eq__ method with the math.isclose() function to sort these things out.
import math
class Point(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __eq__(self, other):
        if isinstance(other, Point):
            equal_x = math.isclose(other.x, self.x, rel_tol=5e-3, abs_tol=1e-5)
            equal_y = math.isclose(other.y, self.y, rel_tol=5e-3, abs_tol=1e-5)
            equal_z = math.isclose(other.z, self.z, rel_tol=5e-3, abs_tol=1e-5)
            if equal_x and equal_y and equal_z:
                return True
        return False
How can I implement the __hash__ method so that two such objects are treated as equal by sets and dictionaries?
The ultimate goal is to use the following function to "uniquify" a list of such object and remove duplicates:
def f12(seq):
    # from Raymond Hettinger
    # https://twitter.com/raymondh/status/944125570534621185
    return list(dict.fromkeys(seq))

There is a general problem with your __eq__ method.
Most people expect an equality method to be transitive. This means that when a point a is equal to a point b, and b is equal to another point c, then a and c should also be equal. Your implementation does not guarantee that.
Imagine each point has a sphere around it (with a radius of half the tolerance), and two points are considered equal when their spheres overlap. Now picture three points a, b and c in a row: a's sphere overlaps b's, and b's overlaps c's, but a's and c's do not.
So a and b should have the same hash code, and b and c should have the same hash code, but not a and c? How could that be possible?
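To make this concrete, here is a small check using the same tolerances as the question (rel_tol=5e-3, abs_tol=1e-5) on plain numbers; the values 100.0, 100.4 and 100.8 are just illustrative choices for a, b and c:

```python
import math

def close(a, b):
    # Same tolerances as the question's __eq__
    return math.isclose(a, b, rel_tol=5e-3, abs_tol=1e-5)

a, b, c = 100.0, 100.4, 100.8
print(close(a, b), close(b, c), close(a, c))  # True True False
```

a is "equal" to b and b to c, yet a is not "equal" to c, so no hash function can make sets and dicts behave consistently with this __eq__.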
I would propose adding a separate method is_close_to and implementing the logic there.
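A minimal sketch of that idea, assuming the same tolerances as the question. The helper unique_close is a hypothetical replacement for f12: since close points cannot share a hash reliably, it falls back to an O(n²) scan, and because "close to" is not transitive, the result can depend on the input order:

```python
import math

class Point:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def is_close_to(self, other, rel_tol=5e-3, abs_tol=1e-5):
        # Tolerant comparison, kept separate from == and hashing
        return all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, b in ((self.x, other.x), (self.y, other.y), (self.z, other.z))
        )

def unique_close(points):
    # Keep each point unless it is close to a point already kept (O(n^2))
    kept = []
    for p in points:
        if not any(p.is_close_to(q) for q in kept):
            kept.append(p)
    return kept
```

For example, [Point(100, 0, 0), Point(100.4, 0, 0), Point(100.8, 0, 0)] keeps two points, while the same points in the order 100.4, 100, 100.8 keep only one.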
Edit:
@JLPeyret points out that you can use a grid and compute the hash value of a point from the grid cell that contains it. In this case two nearby points can lie on either side of a cell boundary and therefore be assigned different hash values. If such a probabilistic approach works for you, take a look at locality-sensitive hashing.
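A rough sketch of the grid idea, working on plain (x, y, z) tuples. CELL_SIZE, grid_key and dedupe_by_cell are hypothetical names, and the cell size is an assumption you would tune to your tolerance. It also shows the boundary problem: 0.00099 and 0.00101 are very close but land in different cells:

```python
import math

CELL_SIZE = 1e-3  # hypothetical grid spacing

def grid_key(x, y, z, cell=CELL_SIZE):
    # Integer indices of the grid cell containing the point; usable as a hash key
    return (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))

def dedupe_by_cell(points, cell=CELL_SIZE):
    # Keep the first point seen in each cell
    seen = {}
    for p in points:
        seen.setdefault(grid_key(*p, cell=cell), p)
    return list(seen.values())

pts = [(0.0004, 0, 0), (0.0006, 0, 0), (0.00099, 0, 0), (0.00101, 0, 0)]
print(dedupe_by_cell(pts))  # [(0.0004, 0, 0), (0.00101, 0, 0)]
```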

Instead of giving your points an 'invalid' __eq__ method, either give them an isclose method, using the code you already have, and use that instead of == or make them equal in the standard sense by rounding the coordinates.
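A minimal sketch of the rounding option, assuming 4 decimal places is an acceptable precision (DIGITS is a hypothetical parameter). __eq__ and __hash__ both use the same rounded key, so equal points always hash alike and the question's f12 works. Rounding has a similar edge case to the grid: two values just either side of a rounding boundary still compare unequal.

```python
class Point:
    DIGITS = 4  # hypothetical precision used for comparison and hashing

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def _key(self):
        # Rounded coordinates shared by __eq__ and __hash__
        return (round(self.x, self.DIGITS),
                round(self.y, self.DIGITS),
                round(self.z, self.DIGITS))

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __repr__(self):
        return f"Point({self.x}, {self.y}, {self.z})"

points = [Point(1, 2, 3), Point(1.0000000001, 2, 3), Point(1.5, 2, 3)]
print(list(dict.fromkeys(points)))  # [Point(1, 2, 3), Point(1.5, 2, 3)]
```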

Related

How can I represent an "infinite" set based off a predicate, without storing all the elements?

For homework, I was asked to write a class that acts like a mathematical - i.e., potentially infinite - set. The constructor needs to have a parameter that will be given a function that returns a boolean value (a boolean predicate). It will be given as a lambda, for example lambda x: x%3==2 or lambda x: x*x>5.
The resulting object should represent the set of all natural numbers (including 0) that satisfy the predicate.
I also need to implement __or__, __and__ and __sub__ to give the union, intersection and difference of two sets.
So far, I have this code:
class Infset:
    def __init__(self, f):
        self.inf = set()
        self.x = 0
        while True:
            if f(self.x) == True:
                self.inf.add(self.x)
                self.x += 1
            else:
                self.x += 1
Of course, this really does try to make an infinite set, which results in a MemoryError.
How can I represent a potentially infinite set with finite storage space?

Instead of storing numbers, you need to store the function f itself. To do unions and so on, you need to create a new f based on self.f and other.f which gives the right answer for whether a given x is in the union.
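A minimal sketch of that idea, keeping the question's class name Infset. Membership is answered by calling the stored predicate, and each set operation builds a new predicate from self.f and other.f. The first() method is a hypothetical helper for listing members; it loops forever if the set has fewer than n elements.

```python
import itertools

class Infset:
    def __init__(self, f):
        self.f = f  # predicate deciding membership for natural numbers

    def __contains__(self, x):
        # Only natural numbers (including 0) can be members
        return isinstance(x, int) and x >= 0 and bool(self.f(x))

    def __or__(self, other):
        return Infset(lambda x: self.f(x) or other.f(x))

    def __and__(self, other):
        return Infset(lambda x: self.f(x) and other.f(x))

    def __sub__(self, other):
        return Infset(lambda x: self.f(x) and not other.f(x))

    def first(self, n):
        # The n smallest members, found by testing 0, 1, 2, ... lazily
        return list(itertools.islice((x for x in itertools.count() if self.f(x)), n))

a = Infset(lambda x: x % 3 == 2)
b = Infset(lambda x: x * x > 5)
print((a & b).first(3))  # [5, 8, 11]
```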

Python Trig Functions Return Complex Numbers?

I am writing code that accepts the degree by which a motor turns and uses that data to calculate the distance covered by the wheels (using distance = no. of rotations * distance covered per rotation).
It then makes an error adjustment (taking into consideration environmental factors such as friction).
Finally, using trigonometry, it calculates the distance moved along the x-axis and y-axis.
All the above is done by the function straight contained within the class CoordinateManager. This function is called by an instance of another class.
class CoordinateManager:
    goalcord = [20, 0]
    def __init__(self):
        self.curcord = [0, 0]
        self.theta = 0
    def get_compass_angle(self):
        compass = Sensor(address='in2')
        return compass.value(0)
    def turn(self, iangle, fangle):
        self.theta = self.theta + (fangle-iangle)
    def straight(self, turnangle):
        d = turnangle*2*3.14*2/360
        d = 1.8120132*(d**0.8938054)
        thetarad = radians(self.theta)
        dx = d*sin(thetarad)
        dy = d*cos(thetarad)
        self.curcord[0] += dx
        self.curcord[1] += dy
Printing both d and self.theta shows that they contain correct values.
This must mean that the array self.curcord has valid values too. However, this has not been the case. Printing the two elements of self.curcord outputs complex numbers (some big float + another big floatj).
I can think of no logical explanation for this other than that the trigonometric functions must be returning complex numbers. However, I think the chances that a python built-in lib function returns wrong values are extraordinarily slim.
Is there any logical error that I may be overlooking?
Edit: I just tried changing the last two lines to:
self.curcord[0] += dx
self.curcord[1] += dy
I just tried using .real when displaying the values. Even though the values are real now, they are still wrong. I will look further into whether this is caused by some calculation error.

Since you said in the comments above that turnangle can be any integer, the problem can be directly traced to this line:
d = 1.8120132*(d**0.8938054)
Since turnangle can be negative, the value of d before this line is executed can also be negative; a negative value raised to an arbitrary decimal power is in general complex.
Therefore the problem does not lie with the trig functions at all. The above also leads me to believe that when you said
Printing both d and self.theta shows that they contain correct values
... you only did so after this line:
d = turnangle*2*3.14*2/360
This would explain why you wrongly thought the problem must lie elsewhere.
UPDATE:
It is a very bad habit to set a variable to some function of itself like you did. Try to use a different variable name to avoid confusion; as you saw above, I had to refer to "this line" rather than to a variable name.
Perhaps something like this would work, assuming that the behaviour of the motor is the same regardless of the sign of turnangle?
from math import copysign
d = copysign(1.8120132 * (abs(d) ** 0.8938054), d)
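A quick check, assuming Python 3, that the complex numbers come from the power and not from the trig functions, plus one way to keep the sign: math.copysign applies the correction to the magnitude and then restores the original sign. The helper name scaled_distance is hypothetical.

```python
import math

def scaled_distance(d):
    # Hypothetical helper: apply the empirical correction to |d|, then restore d's sign.
    # Assumes the motor behaves the same in both directions.
    return math.copysign(1.8120132 * abs(d) ** 0.8938054, d)

naive = (-2.0) ** 0.8938054            # negative base, fractional exponent
print(type(naive).__name__)            # complex
fixed = scaled_distance(-2.0)
print(type(fixed).__name__, fixed < 0)  # float True
```

Note the parentheses in (-2.0) ** 0.8938054: without them, ** binds tighter than unary minus and the result would be a negative float.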

How's this for collision detection?

I have to write a collide method inside a Rectangle class that takes another Rectangle object as a parameter and returns True if it collides with the rectangle performing the method and False if it doesn't. My solution was to use a for loop that iterates through every value of x and y in one rectangle to see if it falls within the other, but I suspect there might be more efficient or elegant ways to do it. This is the method (I think all the names are pretty self-explanatory, just ask if anything isn't clear):
def collide(self,target):
    result = False
    for x in range(self.x,self.x+self.width):
        if x in range(target.get_x(),target.get_x()+target.get_width()):
            result = True
    for y in range(self.y,self.y+self.height):
        if y in range(target.get_y(),target.get_y()+target.get_height()):
            result = True
    return result
Thanks in advance!

The problem of collision detection is a well-known one, so I thought rather than speculate I might search for a working algorithm using a well-known search engine. It turns out that good literature on rectangle overlap is less easy to come by than you might think. Before we move on to that, perhaps I can comment on your use of constructs like
if x in range(target.get_x(),target.get_x()+target.get_width()):
It is to Python's credit that such an obvious expression of your idea actually succeeds as intended. What you may not realize is that (in Python 2, anyway) each use of range() creates a list, which the in test then scans element by element. In Python 3, range() returns a lazy range object whose membership test for integers is computed arithmetically, but you still create a new object and make several method calls on every test. What I suspect you may have meant is
if target.get_x() <= x < target.get_x()+target.get_width():
(I am using a half-open interval test to reflect your use of range().) This has the merit of replacing up to N equality comparisons with two chained comparisons. By a relatively simple mathematical operation (subtracting target.get_x() from each term in the comparison) we can transform this into
if 0 <= x-target.get_x() < target.get_width():
Do not overlook the value of eliminating redundant method calls like these, though it's often simpler to save an evaluated expression in a variable for later reference.
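A quick check that the chained, half-open comparison accepts exactly the same integers as the range() membership test, including the shifted form. The three function names are hypothetical, chosen only for this comparison:

```python
def in_range_membership(x, start, width):
    # The question's style: build a range and test membership
    return x in range(start, start + width)

def in_range_chained(x, start, width):
    # Two chained comparisons, half-open like range()
    return start <= x < start + width

def in_range_shifted(x, start, width):
    # Same test after subtracting start from every term
    return 0 <= x - start < width

# All three agree for every integer around a target covering 3..7
for x in range(-5, 15):
    assert in_range_membership(x, 3, 5) == in_range_chained(x, 3, 5) == in_range_shifted(x, 3, 5)
```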
Of course, after that scrutiny we have to look with renewed vigor at
for x in range(self.x,self.x+self.width):
This sets a lower and an upper bound on x, and the test inside the loop is repeated for every value between them. Delving beyond the code into the purpose of the algorithm is worth doing, however, because any list creation the inner test might have done is now duplicated many times over (by the width of the object, to be precise). I take the liberty of paraphrasing
for x in range(self.x,self.x+self.width):
    if x in range(target.get_x(),target.get_x()+target.get_width()):
        result = True
into pseudocode: "if any x between self.x and self.x+self.width lies between the target's x and the target's x+width, then the objects are colliding". In other words, whether two ranges overlap. But you sure are doing a lot of work to find that out.
Also, just because two objects collide in the x dimension doesn't mean they collide in space. In fact, if they do not also collide in the y dimension, the objects are disjoint; testing only one dimension would wrongly report these two rectangles as colliding:
+----+
|    |
|    |
+----+
+----+
|    |
|    |
+----+
So you want to know if they collide in BOTH dimensions, not just one. Ideally one would define a one-dimensional collision detection (which by now we just about have ...) and then apply in both dimensions. I also hope that those accessor functions can be replaced by simple attribute access, and my code is from now on going to assume that's the case.
Having gone this far, it's probably time to take a quick look at the principles in this YouTube video, which makes the geometry relatively clear but doesn't express the formula at all well. It explains the principles quite well as long as you are using the same coordinate system. Basically, two objects A and B overlap horizontally if A's left side is between B's left and right sides. They also overlap if B's left side is between A's left and right sides. Both conditions might be true, but in Python you should think about using the keyword or to avoid unnecessary comparisons.
So let's define a one-dimensional overlap function:
def oned_ol(aleft, aright, bleft, bright):
    return (bleft <= aleft < bright) or (aleft <= bleft < aright)
I'm going to cheat and use this for both dimensions, since the inside of my function doesn't know which dimension's data I am calling it with. If I am correct, the following formulation should do:
def rect_overlap(self, target):
    return oned_ol(self.x, self.x+self.width, target.x, target.x+target.width) \
        and oned_ol(self.y, self.y+self.height, target.y, target.y+target.height)
If you insist on using those accessor methods you will have to re-cast the code to include them. I've done sketchy testing on the 1-D overlap function, and none at all on rect_overlap, so please let me know - caveat lector. Two things emerge.
A superficial examination of code can lead to "optimization" of a hopelessly inefficient algorithm, so sometimes it's better to return to first principles and look more carefully at your algorithm.
If you use expressions as arguments to a function they are available by name inside the function body without the need to make an explicit assignment.
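Putting the pieces together, here is a small runnable sketch. Rect is a hypothetical stand-in with plain x, y, width and height attributes instead of the get_x() style accessors. This oned_ol tests whether either left edge lies inside the other half-open interval, which also handles identical and fully nested intervals:

```python
from dataclasses import dataclass

def oned_ol(aleft, aright, bleft, bright):
    # Half-open intervals [aleft, aright) and [bleft, bright) overlap
    # if either left edge lies inside the other interval
    return (bleft <= aleft < bright) or (aleft <= bleft < aright)

@dataclass
class Rect:
    # Hypothetical minimal rectangle with plain attributes
    x: int
    y: int
    width: int
    height: int

    def rect_overlap(self, target):
        # Overlap requires overlap in BOTH dimensions
        return (oned_ol(self.x, self.x + self.width, target.x, target.x + target.width)
                and oned_ol(self.y, self.y + self.height, target.y, target.y + target.height))

a = Rect(0, 0, 4, 3)
stacked = Rect(0, 5, 4, 3)      # same x range, separate y range
overlapping = Rect(2, 1, 4, 3)
print(a.rect_overlap(stacked), a.rect_overlap(overlapping))  # False True
```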

def collide(self, target):
    # self left of target?
    if self.x + self.width < target.x:
        return False
    # self right of target?
    if self.x > target.x + target.width:
        return False
    # self above target?
    if self.y + self.height < target.y:
        return False
    # self below target?
    if self.y > target.y + target.height:
        return False
    return True
Something like that (depends on your coord system, i.e. y positive up or down)
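For comparison, here is the rejection-based collide on a hypothetical Rect dataclass, assuming y grows downward (screen coordinates). Because it uses strict < and > comparisons, two rectangles that merely share an edge count as colliding, whereas the half-open range() convention treats them as separate; pick whichever your game needs:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    # Hypothetical rectangle; y assumed to grow downward
    x: float
    y: float
    width: float
    height: float

    def collide(self, target):
        # Reject if there is a gap on any side; otherwise they touch or overlap
        if self.x + self.width < target.x:     # self entirely left of target
            return False
        if self.x > target.x + target.width:   # self entirely right of target
            return False
        if self.y + self.height < target.y:    # self entirely above target
            return False
        if self.y > target.y + target.height:  # self entirely below target
            return False
        return True

a = Rect(0, 0, 4, 3)
print(a.collide(Rect(4, 0, 2, 2)))  # True: shared edge counts as a collision
```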

python: representing square grid that wraps on itself (cylinder)

I am modeling something that occurs on a square grid that wraps on itself (i.e., if you walk up past the highest point, you end up at the lowest point, like a cylinder; if you walk to the right, you just hit the boundary). I need to keep track of the location of various agents, the amount of resources at different points, and calculate the direction that agents will be moving in based on certain rules.
What's the best way to model this?
Should I make a class that represents points, which has methods to return neighboring points in each direction? If so, I would probably need to make it hashable so that I can use it as keys for the dictionary that contains the full grid (I assume such grid should be a dictionary?)
Or should I make a class that describes the whole grid and not expose individual points as independent objects?
Or should I just use regular (x, y) tuples and have methods elsewhere that allow looking up neighbors?
A lot of what I'll need to model is not yet clearly defined. Furthermore, I expect the geometry of the surface might possibly change one day (e.g., it could wrap on itself in both directions).
EDIT: One additional question: should I attach the information about the quantity of resources to each Point instance; or should I have a separate class that contains a map of resources indexed by Point?

If you want a hashable Point class without too much work, subclass tuple and add your own neighbor methods.
class Point(tuple):
    def r_neighbor(self):
        return Point((self[0] + 1, self[1]))
    def l_neighbor(self):
        [...]
x = Point((10, 11))
print(x)
print(x.r_neighbor())
The tuple constructor wants an iterable, hence the double-parens in Point((10, 11)); if you want to avoid that, you can always override __new__ (overriding __init__ is pointless because tuples are immutable):
def __new__(cls, x, y):
    return super(Point, cls).__new__(cls, (x, y))
This might also be the place to apply modular arithmetic -- though that will really depend on what you are doing:
def __new__(cls, x, y, gridsize=100):
    return super(Point, cls).__new__(cls, (x % gridsize, y % gridsize))
or to enable arbitrary dimension grids, and go back to using tuples in __new__:
def __new__(cls, tup, gridsize=100):
    return super(Point, cls).__new__(cls, (x % gridsize for x in tup))
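The question's geometry wraps only vertically, so applying % to both coordinates would turn the cylinder into a torus. A minimal sketch of the cylinder case, assuming hypothetical WIDTH and HEIGHT class attributes (the question gives no sizes) and a neighbors() method that simply omits cells beyond the left and right walls:

```python
class CylPoint(tuple):
    WIDTH = 10   # hypothetical grid size
    HEIGHT = 10

    def __new__(cls, x, y):
        # Wrap vertically (cylinder); horizontally the grid has hard walls
        if not 0 <= x < cls.WIDTH:
            raise ValueError("x outside the grid")
        return super().__new__(cls, (x, y % cls.HEIGHT))

    def neighbors(self):
        # Right, left, up, down; cells past the left/right walls are skipped
        x, y = self
        result = []
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx = x + dx
            if 0 <= nx < self.WIDTH:
                result.append(CylPoint(nx, y + dy))
        return result

print(CylPoint(0, 0).neighbors())  # [(1, 0), (0, 1), (0, 9)]
```

Because CylPoint is still a tuple, it compares and hashes like (x, y), so it can be used directly as a dict key. Switching to a torus later would only mean wrapping x as well.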
Regarding your question about resources: since Point is an immutable class, it's a poor place to store information about resources that might change. A defaultdict would be handy; you wouldn't have to initialize it.
from collections import defaultdict
grid = defaultdict(list)
p = Point((10, 13))
grid[(10, 13)] = [2, 3, 4]
print(grid[p])  # prints [2, 3, 4]
print(grid[p.r_neighbor()])  # no KeyError; prints []
If you want more flexibility, you could use a dict instead of a list in defaultdict; but defaultdict(defaultdict) won't do what you want, because the inner dictionaries would have no default factory and missing keys would still raise KeyError. You have to create a new defaultdict factory function instead:
def intdict():
    return defaultdict(int)
grid = defaultdict(intdict)
or more concisely
grid = defaultdict(lambda: defaultdict(int))
then
p = Point((10, 13))
grid[(10, 13)]["coins"] = 50
print(grid[p]["coins"])  # prints 50
print(grid[p.r_neighbor()]["coins"])  # prints 0; again, no KeyError

"I need to keep track of the location of various agents, the amount of resources at different points, and calculate the direction that agents will be moving in based on certain rules."
Sounds like a graph to me, though I try to see a graph in every problem. All the operations you mentioned (move around, store resources, find out where to move to) are very common on graphs. You would also be able to easily change the topology, from a cylinder to a torus or in any other way.
The only issue is that it takes more space than other representations.
On the plus side you can use a graph library to create the graph and probably even some graph algorithms to calculate where agents go.
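Without committing to any particular graph library, the idea can be sketched with a plain dict of adjacency lists. The functions cylinder_graph and bfs_distances are hypothetical names; the wrap_x flag shows how the same builder could produce a torus instead, and breadth-first search stands in for the kind of graph algorithm you might use to decide where agents go. This assumes a height of at least 3, so the two vertical neighbours are distinct cells.

```python
from collections import deque

def cylinder_graph(width, height, wrap_x=False):
    # Adjacency lists keyed by (x, y); y always wraps, x wraps only if wrap_x (torus)
    graph = {}
    for x in range(width):
        for y in range(height):
            nbrs = [(x, (y + 1) % height), (x, (y - 1) % height)]
            for nx in (x - 1, x + 1):
                if wrap_x:
                    nbrs.append((nx % width, y))
                elif 0 <= nx < width:
                    nbrs.append((nx, y))
            graph[(x, y)] = nbrs
    return graph

def bfs_distances(graph, start):
    # Number of steps from start to every reachable cell
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

cyl = cylinder_graph(4, 5)
resources = {cell: 0 for cell in cyl}  # per-cell data kept outside the graph
print(bfs_distances(cyl, (0, 0))[(0, 4)])  # 1, thanks to the vertical wrap
```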

List of objects or parallel arrays of properties?

The question is, basically: what would be more preferable, both performance-wise and design-wise - to have a list of objects of a Python class or to have several lists of numerical properties?
I am writing some sort of a scientific simulation which involves a rather large system of interacting particles. For simplicity, let's say we have a set of balls bouncing inside a box so each ball has a number of numerical properties, like x-y-z-coordinates, diameter, mass, velocity vector and so on. How to store the system better? Two major options I can think of are:
to make a class "Ball" with those properties and some methods, then store a list of objects of the class, e. g. [b1, b2, b3, ...bn, ...], where for each bn we can access bn.x, bn.y, bn.mass and so on;
to make an array of numbers for each property, then for each i-th "ball" we can access its 'x' coordinate as xs[i], 'y' coordinate as ys[i], 'mass' as masses[i] and so on;
To me it seems that the first option represents a better design. The second option looks somewhat uglier, but might be better in terms of performance, and it could be easier to use it with numpy and scipy, which I try to use as much as I can.
I am still not sure if Python will be fast enough, so it may be necessary to rewrite it in C++ or something, after initial prototyping in Python. Would the choice of data representation be different for C/C++? What about a hybrid approach, e.g. Python with C++ extension?
Update: I never expected any performance gain from parallel arrays per se, but in a mixed environment like Python + Numpy (or whatever SlowScriptingLanguage + FastNativeLibrary) using them may (or may not?) let you move more work out of your slow scripting code and into the fast native library.

Having an object for each ball in this example is certainly better design. Parallel arrays are really a workaround for languages that do not support proper objects. I wouldn't use them in a language with OO capabilities unless it's a tiny case that fits within a function (and maybe not even then) or if I've run out of every other optimization option and the profiler shows that property access is the culprit. This applies twice as much to Python as to C++, as the former places a large emphasis on readability and elegance.

I agree that parallel arrays are almost always a bad idea, but don't forget that you can use views into a numpy array when you're setting things up... (Yes, I know this is effectively using parallel arrays, but I think it's the best option in this case...)
This is great if you know the number of "balls" you're going to create beforehand, as you can allocate an array for the coordinates, and store a view into that array for each ball object.
You have to be a bit careful to do operations in-place on the coords array, but it makes updating coordinates for numerous "balls" much, much, much faster.
For example...
import numpy as np
class Ball(object):
    def __init__(self, coords):
        self.coords = coords
    def _set_coord(self, i, value):
        self.coords[i] = value
    x = property(lambda self: self.coords[0],
                 lambda self, value: self._set_coord(0, value))
    y = property(lambda self: self.coords[1],
                 lambda self, value: self._set_coord(1, value))
    def move(self, dx, dy):
        self.x += dx
        self.y += dy
def main():
    n_balls = 10
    n_dims = 2
    coords = np.zeros((n_balls, n_dims))
    balls = [Ball(coords[i, :]) for i in range(n_balls)]
    # Just to illustrate that the coords are updating
    ball = balls[0]
    # Random walk by updating coords array
    print('Moving all the balls randomly by updating coords')
    for step in range(5):
        # Add a random value to all coordinates (in place, so the views stay valid)
        coords += 0.5 - np.random.random((n_balls, n_dims))
        # Display the coords for a particular ball and the
        # corresponding row of the coords array
        print('  Value of ball.x, ball.y:', ball.x, ball.y)
        print('  Value of coords[0,:]:', coords[0, :])
    # Move an individual ball object
    print('Moving a ball individually through Ball.move()')
    ball.move(0.5, 0.5)
    print('  Value of ball.x, ball.y:', ball.x, ball.y)
    print('  Value of coords[0,:]:', coords[0, :])
main()
Just to illustrate, this outputs something like:
Moving all the balls randomly by updating coords
Value of ball.x, ball.y: -0.125713650677 0.301692195466
Value of coords[0,:]: [-0.12571365 0.3016922 ]
Value of ball.x, ball.y: -0.304516863495 -0.0447543559805
Value of coords[0,:]: [-0.30451686 -0.04475436]
Value of ball.x, ball.y: -0.171589457954 0.334844443821
Value of coords[0,:]: [-0.17158946 0.33484444]
Value of ball.x, ball.y: -0.0452864552743 -0.0297552313656
Value of coords[0,:]: [-0.04528646 -0.02975523]
Value of ball.x, ball.y: -0.163829876915 0.0153203173857
Value of coords[0,:]: [-0.16382988 0.01532032]
Moving a ball individually through Ball.move()
Value of ball.x, ball.y: 0.336170123085 0.515320317386
Value of coords[0,:]: [ 0.33617012 0.51532032]
The advantage here is that updating a single numpy array is going to be much, much faster than iterating through all of your ball objects, but you retain a more object-oriented approach.
Just my thoughts on it, anyway.
EDIT: To give some idea of the speed difference, with 1,000,000 balls:
In [104]: %timeit coords[:,0] += 1.0
100 loops, best of 3: 11.8 ms per loop
In [105]: %timeit [item.x + 1.0 for item in balls]
1 loops, best of 3: 1.69 s per loop
So, updating the coordinates directly using numpy is roughly 2 orders of magnitude faster when using a large number of balls. (the difference is smaller when using 10 balls, as per the example, roughly a factor of 2x, rather than 150x)

I think it depends on what you're going to be doing with them, and how often you're going to be working with (all attributes of one particle) vs (one attribute of all particles). The former is better suited to the object approach; the latter is better suited to the array approach.
I was facing a similar problem (although in a different domain) a couple of years ago. The project got deprioritized before I actually implemented this phase, but I was leaning towards a hybrid approach, where in addition to the Ball class I would have an Ensemble class. The Ensemble would not be a list or other simple container of Balls, but would have its own attributes (which would be arrays) and its own methods. Whether the Ensemble is created from the Balls, or the Balls from the Ensemble, depends on how you're going to construct them.
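The hybrid idea can be sketched roughly as follows, with numpy as in the other answer. Ensemble, BallView, the chosen properties (positions, masses) and the methods are all hypothetical: the ensemble owns one array per property and does whole-ensemble updates in place, while BallView gives per-ball code an object whose attributes read and write the same memory.

```python
import numpy as np

class Ensemble:
    # Owns one array per property; shape (n, dims) for positions, (n,) for masses
    def __init__(self, positions, masses):
        self.positions = np.asarray(positions, dtype=float)
        self.masses = np.asarray(masses, dtype=float)

    def __len__(self):
        return len(self.masses)

    def __getitem__(self, i):
        return BallView(self, i)

    def move_all(self, velocities, dt):
        # Vectorised, in-place update of every ball at once
        self.positions += np.asarray(velocities, dtype=float) * dt

    def center_of_mass(self):
        return (self.positions * self.masses[:, None]).sum(axis=0) / self.masses.sum()

class BallView:
    # Per-ball object backed by the ensemble's arrays
    def __init__(self, ensemble, index):
        self._ens = ensemble
        self._i = index

    @property
    def position(self):
        return self._ens.positions[self._i]  # a view onto this ball's row

    @property
    def mass(self):
        return float(self._ens.masses[self._i])

    @mass.setter
    def mass(self, value):
        self._ens.masses[self._i] = value

e = Ensemble([[0.0, 0.0], [2.0, 0.0]], [1.0, 3.0])
e.move_all([[1.0, 0.0], [1.0, 0.0]], 0.5)
print(e.center_of_mass())  # [2. 0.]
```

Whether you build the Ensemble from Balls or hand out views like this is, as the answer says, a question of how you construct them.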
One of my coworkers was arguing for a solution where the fundamental object was an Ensemble which might contain only one Ball, so that no calling code would ever have to know whether you were operating on just one Ball (do you ever do that for your application?) or on many.

Will you be having any forces between the balls (hard sphere/collision, gravity, electromagnetic)? I'm guessing so. Will you be having a large enough number of balls to want to use Barnes-Hut simulation ideas? If so, then you should definitely use the Ball class idea so that you can store them easily in octrees or something else along those lines. Also, using the Barnes-Hut simulation will cut down the complexity of the simulation to O(N log N) from O(N^2).
Really though, if you don't have forces between the balls or aren't using many balls, you don't need the possible speed gains from using parallel arrays and should go with the Ball class idea for that as well.
