Special Values of a Class - Python

Right now I am creating a class which represents a closed interval. Its core functionality is to provide an intersect method.
class Interval:
    def __init__(self, a, b):
        # check a <= b, otherwise swap
        self.a = a
        self.b = b

    def intersect(self, other):
        a = self.a if self.a > other.a else other.a
        b = self.b if self.b < other.b else other.b
        if b < a:
            # return some value representing an empty interval,
            # which still provides the intersect method
            ...
        return Interval(a, b)
It should be possible to represent special values like all points [-oo, oo] or the empty set {}, which still serve the intersect method. My current approach is to create a new class, but this seems kinda tedious.
class EmptyInterval:
    def intersect(self, other):
        return self
Assuming those special values' intersect methods take precedence, I'd prepend a check to the Interval class' method:
class Interval:
    ...
    def intersect(self, other):
        if not isinstance(other, Interval):
            return other.intersect(self)
        ...
To clarify - the following should be legal:
a = Interval(1, 2)
b = Interval(3, 4)
c = a.intersect(b)  # resulting in an empty interval
c.intersect(a)      # resulting again in an empty interval
Is there some elegant / more Pythonic / less nauseatingly ugly way to implement such behavior?
First I thought of inheritance, but that seems quite unfitting because of the precedence those special values should have; i.e. I do not know how to implement it via inheritance.

Define a couple of special functions in your class Interval:
# (requires: import math)
@staticmethod
def everything():
    return Interval(-math.inf, math.inf)

@staticmethod
def nothing():
    return Interval(math.nan, math.nan)
You may find it more natural to write nothing() like this:
    return Interval(0, 0)
or this:
    return Interval(math.inf, math.inf)
It rather depends on your other code, and what you think is the most natural way to represent the empty interval. Note that any less-than or greater-than comparison with NaN returns False, so this may have some impact on which representation you choose for the empty interval (for example, what should nothing().intersect(nothing()) return?).
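The NaN caveat is easy to check directly; a minimal sketch of my own, not part of the answer above:
import math

nan = math.nan
print(nan < 0, nan > 0, nan == nan)  # False False False
# Any bound test written with < or > therefore treats a NaN bound as "no",
# which is exactly why the nothing() representation needs care.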

Maybe this could be another solution:
Instead of passing a and b separately, I could pass a tuple (a, b). Further, I could declare a couple of singletons as class variables. During instantiation I'd pass such a singleton, and would then only have to check whether the value is one of the singletons and act accordingly.
class Interval:
    EMPTY = object()
    EVERYTHING = object()

    def __init__(self, bounds):
        self.bounds = bounds

    def intersect(self, other):
        if self.bounds is self.EMPTY or other.bounds is self.EMPTY:
            return Interval(self.EMPTY)
        ...
        if b < a:
            return Interval(self.EMPTY)
        return Interval((a, b))
I guess this may be less error prone than John's answer, given the general behavior math.inf and/or math.nan impose. It would also allow strictly forbidding those values from being passed, as Interval(math.nan, 1) would be nonsensical.
But it may be more effort to implement in a more complex setting.
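For illustration, here is a fuller sketch of that idea; the EVERYTHING handling and the max/min clamping are my own assumptions, not part of the answer above:
class Interval:
    EMPTY = object()
    EVERYTHING = object()

    def __init__(self, bounds):
        self.bounds = bounds  # a (a, b) tuple or one of the singletons

    def intersect(self, other):
        if self.bounds is Interval.EMPTY or other.bounds is Interval.EMPTY:
            return Interval(Interval.EMPTY)
        if self.bounds is Interval.EVERYTHING:
            return Interval(other.bounds)
        if other.bounds is Interval.EVERYTHING:
            return Interval(self.bounds)
        a = max(self.bounds[0], other.bounds[0])
        b = min(self.bounds[1], other.bounds[1])
        return Interval(Interval.EMPTY) if b < a else Interval((a, b))

print(Interval((1, 2)).intersect(Interval((3, 4))).bounds is Interval.EMPTY)  # True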


Pythonic way of sorting classes with possible Nones in variables

I have a class that looks more or less like this:
class Something():
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b
I want to be able to sort it in a list; normally I'd just implement a method like this:
def __lt__(self, other):
    return (self.a, self.b) < (other.a, other.b)
But this will raise an error in the following case:
sorted([Something(1, None), Something(1, 1)])
What I want is for None values to be treated as greater than everything else, giving the following output:
[Something(1, 1), Something(1, None)]
The first thing that comes to my mind is changing __lt__ to:
def __lt__(self, other):
    if self.a is not None and other.a is not None:
        if self.a != other.a:
            return self.a < other.a
    elif self.a is None:
        return False  # None sorts as greater
    elif other.a is None:
        return True
    if self.b is not None and other.b is not None:
        if self.b != other.b:
            return self.b < other.b
    elif self.b is None:
        return False
    elif other.b is None:
        return True
    return False
This would give me the correct results, but it's just ugly; Python usually has a simpler way, and I don't really want to repeat this for each variable used in sorting my full class (omitted here to make the problem clearer).
So what is the Pythonic way of solving this?
Note: I also tried the following, but I'm assuming something even better is possible:
def __lt__(self, other):
    sorting_attributes = ['a', 'b']
    for attribute in sorting_attributes:
        self_value = getattr(self, attribute)
        other_value = getattr(other, attribute)
        if self_value is not None and other_value is not None:
            if self_value != other_value:
                return self_value < other_value
        elif self_value is None:
            return False  # None sorts as greater
        elif other_value is None:
            return True
    return False
Really trying to internalize the Zen of Python, and I know that my code is ugly, so how do I fix it?
A completely different design I thought of later (posted separately because it's so different it should really be evaluated independently):
Map all your attributes to tuples, where the first element of every tuple is a bool based on the None-ness of the attribute, and the second is the attribute value itself. None/non-None mismatches would short-circuit on the bool representing None-ness, preventing the TypeError; everything else would fall back to comparing the good types:
def __lt__(self, other):
    def _key(attr):
        # (attr is None, attr) makes None greater than everything;
        # use (attr is not None, attr) to make it less instead
        return (attr is None, attr)
    return (_key(self.a), _key(self.b)) < (_key(other.a), _key(other.b))
Probably slightly slower than my other solution in the case where no None/non-None pair occurs, but much simpler code. It also has the advantage of continuing to raise TypeErrors when mismatched types other than None/non-None arise, rather than potentially misbehaving. I'd definitely call this one my Pythonic solution, even if it is slightly slower in the common case.
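Put together with the question's Something class, the whole thing stays short; a sketch for illustration:
class Something:
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

    def __lt__(self, other):
        def _key(attr):
            # (False, value) sorts before (True, None), so None is greatest
            return (attr is None, attr)
        return (_key(self.a), _key(self.b)) < (_key(other.a), _key(other.b))

items = sorted([Something(1, None), Something(1, 1)])
print([(s.a, s.b) for s in items])  # [(1, 1), (1, None)]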
An easy way to do this is to convert None to infinity, i.e. float('inf'):
def __lt__(self, other):
    def convert(i):
        return float('inf') if i is None else i
    return [convert(i) for i in (self.a, self.b)] < [convert(i) for i in (other.a, other.b)]
A solution for the general case (where there may not be a convenient "bigger than any value" solution, and you don't want the code to grow more complex as the number of attributes increases), which still operates as fast as possible in the presumed common case of no None values. It does assume TypeError means None was involved, so if you're likely to have mismatched types besides None, this gets more complicated, but frankly, a class design like that is painful to contemplate. This works for any scenario with two or more keys (so attrgetter returns a tuple) and only requires changing the names used to construct the attrgetter to add or remove fields to compare.
# (requires: import operator)
def __lt__(self, other, _key=operator.attrgetter('a', 'b')):
    # Get the keys once for both inputs efficiently (avoids repeated lookup)
    sattrs = _key(self)
    oattrs = _key(other)
    try:
        return sattrs < oattrs  # Fast path for no Nones or only paired Nones
    except TypeError:
        for sattr, oattr in zip(sattrs, oattrs):
            # Only care if exactly one is None; until then, values must be equal,
            # or the TypeError wouldn't have occurred where it did
            if (sattr is None) ^ (oattr is None):
                # Exactly one is None, so if it's the right side, self is lesser
                return oattr is None
        # TypeError implied we should see a mismatch, so assert this to be sure
        # we didn't have a non-None-related type mismatch
        assert False, "TypeError raised, but no None/non-None pair seen"
A useful feature of this design is that under no circumstances are rich comparisons invoked for any given attribute more than once; the failed attempt at the fast path proves that there must (assuming the invariant that types are either compatible or None holds) be a run of zero or more attribute pairs with equal values, followed by a None/non-None mismatch. Since everything we care about is known to be equal or a None/non-None mismatch, we don't need to invoke potentially expensive rich comparisons again; we just do cheap identity testing to find the None/non-None mismatch and then return based on which side was None.

What's the best way to implement a number that has an optional minimum or maximum value?

I want to create a number that you can set a maximum and/or minimum value to. So it'd work like this:
>>> n = myNum(5, minimum=0, maximum=10)
>>> n += 10
>>> print(n)
10
>>> n = myNum(-12, minimum=3)
>>> print(n)
3
The problem is that however I try to implement it, it seems to become very tedious and long, despite the fact that it seems like such a simple concept. Is there an elegant way to do this without, say, overriding every single magic method having to do with numbers?
You should rather do something like this:
n = min(max(5 + 10, 0), 10)
and
n = max(-12, 3)
From your comment, you can make a convenient function:
def between_min_max(value, min_val, max_val):
    return min(max(value, min_val), max_val)
and use it later in your code:
min_val = 0
max_val = 10
n = between_min_max(5, min_val, max_val)
# and you can reuse min_val, max_val later
This might be overkill. You could try creating your own class and then overloading operators on your class. You can create the class mynum:
class mynum:
    def __init__(self, val, minval, maxval):
        self.val = val
        self.minval = minval
        self.maxval = maxval
and declare your numbers in your code as instances of mynum:
n = mynum(5, 0, 10)  # for instance
Then you can overload operators on your class, so that it behaves the way that you want it to. For adding, put this inside of your mynum class definition:
def __add__(self, operand):  # overload things like n + 10
    self.val += operand
    if self.val > self.maxval:  # replace checks with max(min(...)...) if you like
        self.val = self.maxval
    elif self.val < self.minval:
        self.val = self.minval
    return self.val
This post has some good info on a starting point for this. The downside is that this method requires you to overload every operator that could possibly give you an invalid value for your mynum instance.
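One way to cut down the per-operator repetition mentioned above is to funnel every operator's raw result through a single clamping helper. A minimal sketch of that idea (the _wrap name is my own; only + and - are shown):
class mynum:
    def __init__(self, val, minval=float('-inf'), maxval=float('inf')):
        self.minval = minval
        self.maxval = maxval
        self.val = min(max(val, minval), maxval)  # clamp on construction

    def _wrap(self, raw):
        # Every operator hands its raw result to this one helper
        return mynum(raw, self.minval, self.maxval)

    def __add__(self, other):
        return self._wrap(self.val + other)

    def __sub__(self, other):
        return self._wrap(self.val - other)

    def __repr__(self):
        return str(self.val)

n = mynum(5, 0, 10)
print(n + 10)       # 10
print(mynum(-12, 3))  # 3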

Overloading Addition, Subtraction, and Multiplication Operators

How do you go about overloading the addition, subtraction, and multiplication operators so we can add, subtract, and multiply two vectors of different or identical sizes? For example, if the vectors are different sizes, we must be able to add, subtract, or multiply the two vectors according to the smallest vector's size.
I've created a function that allows you to modify different vectors, but now I'm struggling to overload the operators and haven't a clue on where to begin. I will paste the code below. Any ideas?
def __add__(self, y):
    # element-wise sum, truncated to the shorter of the two vectors
    n = min(len(self.vector), len(y.vector))
    return Vec([self.vector[i] + y.vector[i] for i in range(n)])
You define the __add__, __sub__, and __mul__ methods for the class, that's how. Each method takes two objects (the operands of +/-/*) as arguments and is expected to return the result of the computation.
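For example, a minimal sketch of such a class (Vec is the name from the question; zip truncates to the shorter vector, which matches the stated requirement):
class Vec:
    def __init__(self, values):
        self.vector = list(values)

    def __add__(self, other):
        return Vec([x + y for x, y in zip(self.vector, other.vector)])

    def __sub__(self, other):
        return Vec([x - y for x, y in zip(self.vector, other.vector)])

    def __mul__(self, other):
        return Vec([x * y for x, y in zip(self.vector, other.vector)])

    def __repr__(self):
        return f'Vec({self.vector})'

print(Vec([1, 2, 3]) + Vec([10, 20]))  # Vec([11, 22])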
Nothing wrong with the accepted answer on this question, but I'm adding some quick snippets to illustrate how this can be used. (Note that you could also "overload" the method to handle multiple types.)
"""Return the difference of another Transaction object, or another
class object that also has the `val` property."""

class Transaction(object):
    def __init__(self, val):
        self.val = val

    def __sub__(self, other):
        return self.val - other.val

buy = Transaction(10.00)
sell = Transaction(7.00)
print(buy - sell)
# 3.0

"""Return a Transaction object with `val` as the difference of this
Transaction.val property and another object with a `val` property."""

class Transaction(object):
    def __init__(self, val):
        self.val = val

    def __sub__(self, other):
        return Transaction(self.val - other.val)

buy = Transaction(20.00)
sell = Transaction(5.00)
result = buy - sell
print(result.val)
# 15.0

"""Return difference of this Transaction.val property and an integer."""

class Transaction(object):
    def __init__(self, val):
        self.val = val

    def __sub__(self, other):
        return self.val - other

buy = Transaction(8.00)
print(buy - 6.00)
# 2.0
The docs have the answer. Basically, there are special methods that get called on an object when you add, multiply, etc.; for instance, __add__ is the normal add function.

Can someone help me understand special methods vs normal methods?

What is the difference between using a special method and just defining a normal class method? I was reading this site which lists a lot of them.
For example it gives a class like this.
class Word(str):
    '''Class for words, defining comparison based on word length.'''

    def __new__(cls, word):
        # Note that we have to use __new__. This is because str is an immutable
        # type, so we have to initialize it early (at creation)
        if ' ' in word:
            print("Value contains spaces. Truncating to first space.")
            word = word[:word.index(' ')]  # Word is now all chars before first space
        return str.__new__(cls, word)

    def __gt__(self, other):
        return len(self) > len(other)

    def __lt__(self, other):
        return len(self) < len(other)

    def __ge__(self, other):
        return len(self) >= len(other)

    def __le__(self, other):
        return len(self) <= len(other)
For each of those special methods, why can't I just make a normal method instead? What are they doing differently? I think I just need a fundamental explanation that I can't find. Thanks.
It is the Pythonic way to do this:
word1 = Word('first')
word2 = Word('second')
if word1 > word2:
    pass
instead of directly using a comparator method:
class NotMagicWord(str):
    def is_greater(self, other):
        return len(self) > len(other)

word1 = NotMagicWord('first')
word2 = NotMagicWord('second')
if word1.is_greater(word2):
    pass
And the same goes for all the other magic methods. You define the __len__ method to tell Python an object's length via the built-in len function, for example. All magic methods are called implicitly during standard operations such as binary operators, object calls, comparisons, and much more. A Guide to Python's Magic Methods is really good; read it and see what behavior you can give to your objects. It is similar to operator overloading in C++, if you are familiar with that.
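For instance, defining __len__ is enough to make the built-in len work on your own objects (a toy sketch):
class Playlist:
    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # len(playlist) calls this implicitly
        return len(self.songs)

print(len(Playlist(['intro', 'outro'])))  # 2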
A method like __gt__ is called when you use comparison operators in your code. Writing something like
value1 > value2
Is the equivalent of writing
value1.__gt__(value2)
"Magic methods" are used by Python to implement a lot of its underlying structure.
For example, let's say I have a simple class to represent an (x, y) coordinate pair:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
So, __init__ would be an example of one of these "magic methods" -- it allows me to automatically initialize the class by simply doing Point(3, 2). I could write this without using magic methods by creating my own "init" function, but then I would need to make an explicit method call to initialize my class:
class Point(object):
    def init(self, x, y):
        self.x = x
        self.y = y
        return self

p = Point().init(x, y)
Let's take another example -- if I wanted to compare two point variables, I could do:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
This lets me compare two points by doing p1 == p2. In contrast, if I made this a normal eq method, I would have to be more explicit by doing p1.eq(p2).
Basically, magic methods are Python's way of implementing a lot of its syntactic sugar in a way that allows it to be easily customizable by programmers.
For example, I could construct a class that pretends to be a function by implementing __call__:
class Foobar(object):
    def __init__(self, a):
        self.a = a

    def __call__(self, b):
        return self.a + b

f = Foobar(3)
print(f(4))  # prints 7
Without the magic method, I would have to manually do f.call(4), which means I can no longer pretend the object is a function.
Special methods are handled specially by the rest of the Python language. For example, if you try to compare two Word instances with <, the __lt__ method of Word will be called to determine the result.
The magic methods are called when you use <, ==, > to compare the objects. functools has a helper called total_ordering that will fill in the missing comparison methods if you just define __eq__ and __gt__.
Because str already has all the comparison operations defined, it's necessary to add them via a mixin if you want to take advantage of total_ordering:
from functools import total_ordering

@total_ordering
class OrderByLen(object):
    def __eq__(self, other):
        return len(self) == len(other)

    def __gt__(self, other):
        return len(self) > len(other)

class Word(OrderByLen, str):
    '''Class for words, defining comparison based on word length.'''

    def __new__(cls, word):
        # Note that we have to use __new__. This is because str is an immutable
        # type, so we have to initialize it early (at creation)
        if ' ' in word:
            print("Value contains spaces. Truncating to first space.")
            word = word[:word.index(' ')]  # Word is now all chars before first space
        return str.__new__(cls, word)

print(Word('cat') < Word('dog'))        # False
print(Word('cat') > Word('dog'))        # False
print(Word('cat') == Word('dog'))       # True
print(Word('cat') <= Word('elephant'))  # True
print(Word('cat') >= Word('elephant'))  # False

What's a correct and good way to implement __hash__()?

What's a correct and good way to implement __hash__()?
I am talking about the function that returns a hashcode that is then used to insert objects into hashtables aka dictionaries.
As __hash__() returns an integer and is used for "binning" objects into hashtables I assume that the values of the returned integer should be uniformly distributed for common data (to minimize collisions).
What's a good practice to get such values? Are collisions a problem?
In my case I have a small class which acts as a container class holding some ints, some floats and a string.
An easy, correct way to implement __hash__() is to use a key tuple. It won't be as fast as a specialized hash, but if you need that then you should probably implement the type in C.
Here's an example of using a key for hash and equality:
class A:
    def __key(self):
        return (self.attr_a, self.attr_b, self.attr_c)

    def __hash__(self):
        return hash(self.__key())

    def __eq__(self, other):
        if isinstance(other, A):
            return self.__key() == other.__key()
        return NotImplemented
Also, the documentation of __hash__ has more information that may be valuable in some particular circumstances.
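To see the key-tuple approach in action, here is a hypothetical usage with an __init__ added (the attribute names follow the example above):
class A:
    def __init__(self, attr_a, attr_b, attr_c):
        self.attr_a = attr_a
        self.attr_b = attr_b
        self.attr_c = attr_c

    def __key(self):
        return (self.attr_a, self.attr_b, self.attr_c)

    def __hash__(self):
        return hash(self.__key())

    def __eq__(self, other):
        if isinstance(other, A):
            return self.__key() == other.__key()
        return NotImplemented

d = {A(1, 2.5, 'x'): 'found'}
print(d[A(1, 2.5, 'x')])  # 'found' -- equal keys hash equal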
John Millikin proposed a solution similar to this:
class A(object):
    def __init__(self, a, b, c):
        self._a = a
        self._b = b
        self._c = c

    def __eq__(self, othr):
        return (isinstance(othr, type(self))
                and (self._a, self._b, self._c) ==
                    (othr._a, othr._b, othr._c))

    def __hash__(self):
        return hash((self._a, self._b, self._c))
The problem with this solution is that hash(A(a, b, c)) == hash((a, b, c)). In other words, the hash collides with that of the tuple of its key members. Maybe this does not matter very often in practice?
Update: the Python docs now recommend to use a tuple as in the example above. Note that the documentation states
The only required property is that objects which compare equal have the same hash value
Note that the opposite is not true. Objects which do not compare equal may have the same hash value. Such a hash collision will not cause one object to replace another when used as a dict key or set element as long as the objects do not also compare equal.
Outdated/bad solution
The Python documentation on __hash__ suggests to combine the hashes of the sub-components using something like XOR, which gives us this:
class B(object):
    def __init__(self, a, b, c):
        self._a = a
        self._b = b
        self._c = c

    def __eq__(self, othr):
        if isinstance(othr, type(self)):
            return ((self._a, self._b, self._c) ==
                    (othr._a, othr._b, othr._c))
        return NotImplemented

    def __hash__(self):
        return (hash(self._a) ^ hash(self._b) ^ hash(self._c) ^
                hash((self._a, self._b, self._c)))
Update: as Blckknght points out, changing the order of a, b, and c could cause problems. I added an additional ^ hash((self._a, self._b, self._c)) to capture the order of the values being hashed. This final ^ hash(...) can be removed if the values being combined cannot be rearranged (for example, if they have different types and therefore the value of _a will never be assigned to _b or _c, etc.).
Paul Larson of Microsoft Research studied a wide variety of hash functions. He told me that
for c in some_string:
    hash = 101 * hash + ord(c)
worked surprisingly well for a wide variety of strings. I've found that similar polynomial techniques work well for computing a hash of disparate subfields.
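Translated to Python, that polynomial scheme might look like this (a sketch; 101 is the multiplier quoted above):
def poly_hash(s, multiplier=101):
    # Horner's rule: h = ((c0*m + c1)*m + c2)*m + ...
    h = 0
    for c in s:
        h = multiplier * h + ord(c)
    return h

print(poly_hash('cat'))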
A good way to implement hash (as well as list, dict, tuple) is to make the object have a predictable order of items by making it iterable using __iter__. So to modify an example from above:
class A:
    def __init__(self, a, b, c):
        self._a = a
        self._b = b
        self._c = c

    def __iter__(self):
        yield "a", self._a
        yield "b", self._b
        yield "c", self._c

    def __hash__(self):
        return hash(tuple(self))

    def __eq__(self, other):
        return (isinstance(other, type(self))
                and tuple(self) == tuple(other))
(here __eq__ is not required for hash, but it's easy to implement).
Now add some mutable members to see how it works:
a = 2; b = 2.2; c = 'cat'
hash(A(a, b, c)) # -5279839567404192660
dict(A(a, b, c)) # {'a': 2, 'b': 2.2, 'c': 'cat'}
list(A(a, b, c)) # [('a', 2), ('b', 2.2), ('c', 'cat')]
tuple(A(a, b, c)) # (('a', 2), ('b', 2.2), ('c', 'cat'))
Things only fall apart if you try to put non-hashable members in the object model:
hash(A(a, b, [1])) # TypeError: unhashable type: 'list'
I can try to answer the second part of your question.
The collisions will probably result not from the hash code itself, but from mapping the hash code to an index in a collection. So for example your hash function could return random values from 1 to 10000, but if your hash table only has 32 entries you'll get collisions on insertion.
In addition, I would think that collisions would be resolved by the collection internally, and there are many methods to resolve collisions. The simplest (and worst) is, given an entry to insert at index i, add 1 to i until you find an empty spot and insert there. Retrieval then works the same way. This results in inefficient retrievals for some entries, as you could have an entry that requires traversing the entire collection to find!
Other collision resolution methods reduce the retrieval time by moving entries around in the hash table when an item is inserted, to spread things out. This increases the insertion time but assumes you read more than you insert. There are also methods that try to branch different colliding entries out so that entries don't cluster in one particular spot.
Also, if you need to resize the collection you will need to rehash everything or use a dynamic hashing method.
In short, depending on what you're using the hash code for you may have to implement your own collision resolution method. If you're not storing them in a collection, you can probably get away with a hash function that just generates hash codes in a very large range. If so, you can make sure your container is bigger than it needs to be (the bigger the better of course) depending on your memory concerns.
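The "add 1 to i until you find an empty spot" strategy described above is linear probing; a toy sketch, assuming the table never fills up:
def insert(table, key, value):
    i = hash(key) % len(table)
    # Step forward from the home slot until a free slot (or the same key)
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % len(table)
    table[i] = (key, value)

table = [None] * 8
insert(table, 'spam', 1)
insert(table, 'eggs', 2)
print(table)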
Here are some links if you're interested in more: the Wikipedia article on coalesced hashing, and Wikipedia's summary of various collision resolution methods.
Also, "File Organization And Processing" by Tharp covers a lot of collision resolution methods extensively. IMO it's a great reference for hashing algorithms.
A very good explanation of when and how to implement the __hash__ function is on the programiz website (retrieved 2019-12-13). As for a personal implementation of the method, that site provides an example that matches the answer of millerdev:
class Person:
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __eq__(self, other):
        return self.age == other.age and self.name == other.name

    def __hash__(self):
        print('The hash is:')
        return hash((self.age, self.name))

person = Person(23, 'Adam')
print(hash(person))
It depends on the size of the hash value you return. It's simple logic: if you need to return a 32-bit int based on the hash of four 32-bit ints, you're going to get collisions.
I would favor bit operations, like the following C pseudo-code:
int a;
int b;
int c;
int d;
int hash = (a & 0xF000F000) | (b & 0x0F000F00) | (c & 0x00F000F0) | (d & 0x000F000F);
Such a system could work for floats too, if you simply took them as their bit value rather than actually representing a floating-point value; maybe better.
For strings, I've got little/no idea.
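In Python, taking a float "as its bit value" as suggested above can be done with the struct module (a sketch):
import struct

def float_bits(x):
    # Reinterpret the 8 bytes of an IEEE-754 double as an unsigned integer
    return struct.unpack('<Q', struct.pack('<d', x))[0]

print(hex(float_bits(1.5)))  # 0x3ff8000000000000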
@dataclass(frozen=True) (Python 3.7)
This awesome new feature, among other good things, automatically defines a __hash__ and __eq__ method for you, making it just work as usually expected in dicts and sets:
dataclass_cheat.py
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MyClass1:
    n: int
    s: str

@dataclass(frozen=True)
class MyClass2:
    n: int
    my_class_1: MyClass1

d = {}
d[MyClass1(n=1, s='a')] = 1
d[MyClass1(n=2, s='a')] = 2
d[MyClass1(n=2, s='b')] = 3
d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] = 4
d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] = 5
d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] = 6

assert d[MyClass1(n=1, s='a')] == 1
assert d[MyClass1(n=2, s='a')] == 2
assert d[MyClass1(n=2, s='b')] == 3
assert d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] == 4
assert d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] == 5
assert d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] == 6

# Due to `frozen=True`
o = MyClass1(n=1, s='a')
try:
    o.n = 2
except FrozenInstanceError:
    pass
else:
    raise AssertionError('expected FrozenInstanceError')
As we can see in this example, the hashes are being calculated based on the contents of the objects, and not simply on the addresses of instances. This is why something like:
d = {}
d[MyClass1(n=1, s='a')] = 1
assert d[MyClass1(n=1, s='a')] == 1
works even though the second MyClass1(n=1, s='a') is a completely different instance from the first with a different address.
frozen=True is mandatory; without it the class is not hashable, since otherwise users could inadvertently make containers inconsistent by modifying objects after they are used as keys. Further documentation: https://docs.python.org/3/library/dataclasses.html
Tested on Python 3.10.7, Ubuntu 22.10.
