How to perform __eq__, __gt__ etc. on an int object in Python?

I am working on a class that validates against multiple rules, e.g. whether a certain pattern appears more times than, exactly as many times as, or fewer times than a certain number. I have the output of the regular expression I am validating in a list, and I check the length of that list. Now how do I call one of these methods (__gt__, __eq__, etc.) dynamically?
My approach:
# method_name can be any one of the values '__eq__', '__gt__', etc.
func = getattr(len(re.findall(r'\d+', 'password1')), method_name)
if func(desired_length):
    print(":validated")
else:
    raise Exception("Not sufficient complexity")
For example, for method_name='__eq__' and desired_length=1, the above should evaluate to True. For method_name='__gt__' and desired_length=1, it should evaluate to False (it would only be True if digits appeared more than once in the string).
But I realize int objects don't really implement these methods. Is there any way I can achieve this?

Rather than using getattr on the int instance here, maybe you should consider the operator module. Then you can grab the comparison operator and pass the int instance and the desired length. A full example would be something like this:
import operator
import re

method_name = '__eq__'   # the operator module also provides the alias operator.eq
desired_length = 1
func = getattr(operator, method_name)
n_ints = len(re.findall(r'\d+', 'password1'))
if func(n_ints, desired_length):
    print('Yeah Buddy!')
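Building on that, the whole rule set can be driven by data. The rule tuples and the validate helper below are hypothetical, just to sketch how the operator lookup composes with re.findall:

```python
import operator
import re

# Hypothetical rules: (regex, operator-module function name, threshold)
rules = [
    (r'\d', 'ge', 1),      # at least one digit
    (r'[a-z]', 'gt', 3),   # more than three lowercase letters
]

def validate(password, rules):
    """Apply each rule's comparison to the number of regex matches."""
    for pattern, method_name, threshold in rules:
        compare = getattr(operator, method_name)  # e.g. operator.ge
        if not compare(len(re.findall(pattern, password)), threshold):
            raise Exception("Not sufficient complexity: %s" % pattern)
    return True

print(validate('password1', rules))  # 1 digit >= 1, 8 lowercase > 3 -> True
```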


Unable to understand __lt__ method [closed]

Hi, I was solving this question on LeetCode [Given a list of non-negative integers, arrange them such that they form the largest number.] and I saw this solution.
I'm unable to understand how the class LargerNumKey works. Also, what is the purpose of __lt__, and what are the variables x and y?
class LargerNumKey(str):
    def __lt__(x, y):
        return x+y > y+x

class Solution:
    def largestNumber(self, nums):
        largest_num = ''.join(sorted(map(str, nums), key=LargerNumKey))
        return '0' if largest_num[0] == '0' else largest_num
The __lt__ "dunder" method is what allows you to use the < less-than sign for an object. It might make more sense written as follows:
class LargerNumKey(str):
    def __lt__(self, other):
        return self+other > other+self

# This calls LargerNumKey.__lt__(LargerNumKey('0'), LargerNumKey('1'))
LargerNumKey('0') < LargerNumKey('1')
Behind the scenes, because str is subclassed, self+other actually produces a plain str object rather than a LargerNumKey, so you don't get infinite recursion from defining a type's inequality in terms of its own inequality operator.
The reason this works is perhaps more interesting:
The first fact we need is that for positive integers whose decimal representations have the same length, (x > y) == (str(x) > str(y)). Since x+y and y+x are concatenations of the same digits, they always have the same length, so when the custom __lt__ operates it is really asking which of the two integers represented by those concatenations is larger.
The second interesting fact is that the inequality defined this way is actually transitive (if s < t and t < u, then s < u), so sorted() can place all the numbers in the correct order just by getting the correct answer for each possible pair.
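Putting the two facts together, a quick check of the key class against the question's own example input:

```python
class LargerNumKey(str):
    def __lt__(self, other):
        # "self is less than other" here means: self should come
        # before other in the largest-number ordering
        return self + other > other + self

nums = [3, 30, 34, 5, 9]
largest = ''.join(sorted(map(str, nums), key=LargerNumKey))
print(largest)  # 9534330
```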
__lt__ is a magic method that lets you change the behavior of the < operator, and sorted uses the < operator to compare values. So when Python compares two values with <, it checks whether those objects have the magic method __lt__ defined; if they do, it uses that method for the comparison. The variables x and y in the example are the two values being compared: if you had a line of code like x < y, then x and y would be passed as arguments to __lt__ (and sorted presumably does have such a line of code). You don't have to call them x and y; you can name them whatever you want, and often you will see them called self and other.
sorted works by comparing two items at a time. For example, let's call them x and y. So somewhere sorted has to compare them, probably with a line that looks like:
if x < y:
However, if you pass sorted a key argument, then it instead compares them more like this:
if key(x) < key(y):
Since the example passes LargerNumKey as the key, it ends up looking like this after python looks up key:
if LargerNumKey(x) < LargerNumKey(y):
When Python then sees the < operator, it looks for the __lt__ method, and because it finds it, turns the statement into basically:
if LargerNumKey(x).__lt__(LargerNumKey(y)):
Because __lt__ is a method on an object, the object itself becomes the first argument (x in this case). Also, because LargerNumKey is a subclass of str, it behaves exactly like a regular string, except for the __lt__ method that you overrode.
This is a useful technique when you want things to be sortable. You can use __lt__ to allow your objects to be sorted in any way you wish. And if the objects you are sorting have the __lt__ method defined, then you don't have to even pass key. But since we are working with different types of objects and don't want to use the default __lt__ method, we use key instead.
References:
Python Docs
https://rszalski.github.io/magicmethods/#comparisons
Note that while my example pretends that sorted is Python code, it is in fact usually C code. However, since Python is "pseudo code that runs", I think it conveys the idea accurately.
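To illustrate the point above that objects defining __lt__ themselves can be sorted without any key argument, here is a minimal sketch (the Task class is made up for illustration):

```python
class Task:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name

    def __lt__(self, other):
        # sorted() only needs <, so defining __lt__ alone is enough
        return self.priority < other.priority

tasks = [Task(3, 'deploy'), Task(1, 'build'), Task(2, 'test')]
print([t.name for t in sorted(tasks)])  # ['build', 'test', 'deploy']
```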
This'd also pass without __lt__:
from functools import cmp_to_key

class Solution:
    def largestNumber(self, nums):
        nums = list(map(str, nums))
        nums.sort(
            key=cmp_to_key(lambda a, b: 1 if a + b > b + a
                           else -1 if a + b < b + a else 0),
            reverse=True)
        return str(int(''.join(nums)))

print(Solution().largestNumber([10, 2]))
print(Solution().largestNumber([3, 30, 34, 5, 9]))
Outputs
210
9534330
References
For additional details, see the Discussion Board. There are plenty of accepted solutions in a variety of languages, with explanations, efficient algorithms, and asymptotic time/space complexity analysis.

Is it really necessary to hash the same for classes that compare the same?

Reading this answer, it seems that if __eq__ is defined in a custom class, __hash__ needs to be defined as well. This is understandable.
However, it is not clear why, effectively, __eq__ returning True should imply self.__hash__() == other.__hash__().
Imagining a class like this:
class Foo:
    ...
    self.Name
    self.Value
    ...
    def __eq__(self, other):
        return self.Value == other.Value
    ...
    def __hash__(self):
        return id(self.Name)
This way, class instances could be compared by value (which could be the only reasonable use) but considered identical by name.
A set then could not contain multiple instances with equal names, yet comparison would still work.
What could be the problem with such definition?
The reason for defining __eq__, __lt__, and the others by Value is to be able to sort instances by Value and to use functions like max. For example, the class should represent a physical output of a device (say, a heating element). Each of these outputs has a unique Name; the Value is the power of the output. To find the optimal combination of heating elements to turn on, it is useful to compare them by power (Value). In a set or dictionary, however, it should not be possible to have multiple outputs with the same name. Of course, different outputs with different names might easily have equal power.
The problem is that it does not make sense: a hash is used to do efficient bucketing of objects. When you have a set, which is implemented as a hash table, each hash points to a bucket, which is usually a list of elements. To check whether an element is in the set (or another hash-based container), you go to the bucket pointed to by its hash and then iterate over all elements in that bucket, comparing them one by one.
In other words, the hash is not supposed to be a comparator (it can, and sometimes should, give you a false positive). In particular, in your example your set will not work: it will not recognize duplicates, because they do not compare equal to each other.
class Foo:
    def __eq__(self, other):
        return self.Value == other.Value
    def __hash__(self):
        return id(self.Name)

a = set()
el = Foo()
el.Name = 'x'
el.Value = 1
el2 = Foo()
el2.Name = 'x'
el2.Value = 2
a.add(el)
a.add(el2)
print(len(a))  # should be 1, right? Well, it is 2
Actually it is even worse than that: if you have two objects with the same Value but different Names, they are not recognized as the same either.
class Foo:
    def __eq__(self, other):
        return self.Value == other.Value
    def __hash__(self):
        return id(self.Name)

a = set()
el = Foo()
el.Name = 'x'
el.Value = 2
el2 = Foo()
el2.Name = 'a'
el2.Value = 2
a.add(el)
a.add(el2)
print(len(a))  # should be 1, right? Well, it is 2 again
while doing it properly (thus, "if a == b, then hash(a) == hash(b)") gives:
class Foo:
    def __eq__(self, other):
        return self.Name == other.Name
    def __hash__(self):
        return id(self.Name)

a = set()
el = Foo()
el.Name = 'x'
el.Value = 1
el2 = Foo()
el2.Name = 'x'
el2.Value = 2
a.add(el)
a.add(el2)
print(len(a))  # really is 1
Update
There is also a non-deterministic part, which is hard to reproduce easily, but essentially a hash does not uniquely determine a bucket. Usually it is something like
bucket_id = hash(object) % size_of_allocated_memory
consequently, things that have different hashes can still end up in the same bucket. You can therefore get two elements comparing equal inside the set due to equality of Values even though the Names differ, as well as the other way around, depending on the actual internal implementation, memory constraints, etc.
In general there are many more examples where things can go wrong, as a hash is defined as a function h : X -> Z such that x == y => h(x) == h(y), and people implementing containers, authorization protocols, and other tools are free to assume this property. If you break it, every single tool using hashes can break. Furthermore, it can break over time: you update some library and your code stops working, because a perfectly valid update to the underlying libraries (one relying on the above assumption) can expose your violation of it.
Update 2
Finally, in order to solve your issue: you simply should not define your __eq__ and __lt__ operators to handle sorting. They are about actual comparison of elements, which should be compatible with the rest of the behaviour. All you have to do is define a separate comparator and use it in your sorting routines (sorting in Python accepts a key function, so you do not need to rely on <, >, etc.). The other way around is to have valid <, >, = defined on Values, and, in order to keep names unique, keep a set of... well... names, not the objects themselves. Whichever path you choose, the crucial point is:
equality and hashing have to be compatible, that's all.
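A short sketch of the suggested separation (class and attribute names are hypothetical): identity and hashing go by name, while sorting and max use a key function instead of __lt__:

```python
class Output:
    """Hypothetical heating-element output: identity by name, sortable by power."""
    def __init__(self, name, power):
        self.name = name
        self.power = power

    def __eq__(self, other):
        return self.name == other.name   # identity: name only

    def __hash__(self):
        return hash(self.name)           # consistent with __eq__

# Duplicate names collapse; equal powers with different names do not.
outputs = {Output('A', 5), Output('B', 2), Output('C', 5)}
print(len(outputs))  # 3

# Sorting by power needs no __lt__ at all: pass a key function instead.
by_power = sorted(outputs, key=lambda o: o.power)
strongest = max(outputs, key=lambda o: o.power)
print(strongest.power)  # 5
```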
It is possible to implement your class like this and not have any problems. However, you have to be 100% sure that no two different objects will ever produce the same hash. Consider the following example:
class Foo:
    def __init__(self, name, value):
        self.name = name
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
    def __hash__(self):
        return hash(self.name[0])

s = set()
s.add(Foo('a', 1))
s.add(Foo('b', 1))
print(len(s))  # output: 2
But you have a problem if a hash collision occurs:
s.add(Foo('abc', 1))
print(len(s)) # output: 2
In order to prevent this, you would have to know exactly how the hashes are generated (which, if you rely on functions like id or hash, might vary between implementations!) and also the values of the attribute(s) used to generate the hash (name in this example). That's why ruling out the possibility of a hash collision is very difficult, if not impossible. It's basically like begging for unexpected things to happen.

Is there a way in Python to utilize infinite subsets of R or C?

I understand that it is intrinsically impossible for computers to store infinite sets (aside from the use of generators to produce countably infinite sets), but I was wondering if there's a way to represent, say, the set of complex numbers with |z| < 1. I know I could do this with comprehensions if a package had a "set of all complex numbers" object, but my initial searches have come up empty.
I presume the better way to deal with such sets is to test inclusion given a number (i.e., given z, is |z| < 1?) rather than to try to have some type of object, but I just thought I'd ask. Thanks!
You can easily create a class to abstract away the < test:
>>> class complex_subset(object):
...     def __init__(self, norm_below):
...         self.norm_below = norm_below
...     def __contains__(self, item):
...         return abs(complex(item)) < self.norm_below
...
>>> complex_below_norm_1 = complex_subset(norm_below=1)
>>> 0 in complex_below_norm_1
True
>>> 3 in complex_below_norm_1
False
>>> 0.5+0.5j in complex_below_norm_1
True
And of course, you can generalize complex_subset with keyword __init__ arguments to define your __contains__ method.
If you want to be able to compare complex_subsets with each other, you have to write the appropriate __eq__, __gt__ and __lt__ methods.
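As a sketch of that, comparisons between two discs could be defined by radius; the subset semantics chosen for __lt__ here are an assumption, not the only option:

```python
class ComplexSubset:
    """Open disc |z| < norm_below in the complex plane."""
    def __init__(self, norm_below):
        self.norm_below = norm_below

    def __contains__(self, item):
        return abs(complex(item)) < self.norm_below

    def __eq__(self, other):
        return self.norm_below == other.norm_below

    def __lt__(self, other):
        # interpreted as proper subset: a smaller disc fits inside a larger one
        return self.norm_below < other.norm_below

print(0.5 + 0.5j in ComplexSubset(1))        # True (|z| ~ 0.707 < 1)
print(ComplexSubset(1) < ComplexSubset(2))   # True (subset relation)
print(ComplexSubset(1) == ComplexSubset(1))  # True
```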

In Python, is there a good "magic method" to represent object similarity?

I want instances of my custom class to be able to compare themselves to one another for similarity. This is different than the __cmp__ method, which is used for determining the sorting order of objects.
Is there a magic method that makes sense for this? Is there any standard syntax for doing this?
How I imagine this could look:
>>> x = CustomClass("abc")
>>> y = CustomClass("abd")
>>> z = CustomClass("xyz")
>>> x.__<???>__(y)
0.75
>>> x <?> y
0.75
>>> x.__<???>__(z)
0.0
>>> x <?> z
0.0
Where <???> is the magic method name and <?> is the operator.
Take a look at the numeric types emulation in the datamodel and pick an operator hook that suits you.
I don't think there is currently an operator that is an exact match, though, so by overloading a standard operator you'll end up surprising some poor hapless future code maintainer (which could even be you).
For a Levenshtein Distance I'd just use a regular method instead. I'd find a one.similarity(other) method a lot clearer when reading the code.
Well, you could override __eq__ to mean both boolean logical equality and 'fuzzy' similarity, by returning a sufficiently weird result from __eq__:
class FuzzyBool(object):
    def __init__(self, quality, tolerance=0):
        self.quality, self._tolerance = quality, tolerance
    def __nonzero__(self):
        return self.quality <= self._tolerance
    __bool__ = __nonzero__  # Python 3 spelling of __nonzero__
    def tolerance(self, tolerance):
        return FuzzyBool(self.quality, tolerance)
    def __repr__(self):
        return "sorta %s" % bool(self)

class ComparesFuzzy(object):
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return FuzzyBool(abs(self.value - other.value))
    def __hash__(self):
        return hash((ComparesFuzzy, self.value))
>>> a = ComparesFuzzy(1)
>>> b = ComparesFuzzy(2)
>>> a == b
sorta False
>>> (a == b).tolerance(3)
sorta True
The default behavior of the comparator should be that it is truthy only if the compared values are exactly equal, so that normal equality is unaffected.
No, there is not. You can make a class method, but I don't think there is any intuitive operator to overload that would do what you're looking for. And, to avoid confusion, I would avoid overloading unless it is obviously intuitive.
I would simply call it as a method: x.similarity(y)
I don't think there is a magic method (and corresponding operator) that would make sense for this in any context.
However, if, with a bit of fantasy, your instances can be seen as vectors, then checking for similarity could be analogous to calculating the scalar product. It would make sense then to use __mul__ and multiplication sign for this (unless you have already defined product for CustomClass instances).
No magic function/operator for that.
When I think of "similarity" for ints and floats, I think of the difference being lower than a certain threshold. Perhaps that's something you might use?
E.g., being able to calculate the "difference" between your objects might make the __sub__ method a suitable fit.
In the example you've cited, I would use difflib. This conducts spell-check like comparisons between strings. But in general, if you really are comparing objects rather than strings, then I agree with the others; you should probably create something context-specific.
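For string arguments like the question's example, difflib already ships a similarity measure; its ratio() is 2*M/T (M matching characters, T total characters), so the exact values differ from the imagined 0.75:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # ratio() = 2 * (matching characters) / (total characters in both strings)
    return SequenceMatcher(None, a, b).ratio()

print(similarity("abc", "abd"))  # 2*2/6 = 0.666...
print(similarity("abc", "xyz"))  # 0.0
```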

Under what circumstances are __rmul__ called?

Say I have a list l. Under what circumstance is l.__rmul__(self, other) called?
I basically understood the documentation, but I would also like to see an example to clarify its usages beyond any doubt.
When Python attempts to multiply two objects, it first tries to call the left object's __mul__() method. If the left object doesn't have a __mul__() method (or the method returns NotImplemented, indicating it doesn't work with the right operand in question), then Python wants to know if the right object can do the multiplication. If the right operand is the same type as the left, Python knows it can't, because if the left object can't do it, another object of the same type certainly can't either.
If the two objects are different types, though, Python figures it's worth a shot. However, it needs some way to tell the right object that it is the right object in the operation, in case the operation is not commutative. (Multiplication is, of course, but not all operators are, and in any case * is not always used for multiplication!) So it calls __rmul__() instead of __mul__().
As an example, consider the following two statements:
print("nom" * 3)
print(3 * "nom")
In the first case, Python calls the string's __mul__() method. The string knows how to multiply itself by an integer, so all is well. In the second case, the integer does not know how to multiply itself by a string, so its __mul__() returns NotImplemented and the string's __rmul__() is called. It knows what to do, and you get the same result as the first case.
Now we can see that __rmul__() allows all of the string's special multiplication behavior to be contained in the str class, such that other types (such as integers) do not need to know anything about strings to be able to multiply by them. A hundred years from now (assuming Python is still in use) you will be able to define a new type that can be multiplied by an integer in either order, even though the int class has known nothing of it for more than a century.
By the way, the string class's __mul__() has a bug in some versions of Python. If it doesn't know how to multiply itself by an object, it raises a TypeError instead of returning NotImplemented. That means you can't multiply a string by a user-defined type even if the user-defined type has an __rmul__() method, because the string never lets it have a chance. The user-defined type has to go first (e.g. Foo() * 'bar' instead of 'bar' * Foo()) so its __mul__() is called. They seem to have fixed this in Python 2.7 (I tested it in Python 3.2 also), but Python 2.6.6 has the bug.
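A minimal sketch of that "multiplied by an int in either order" idea (the Repeat class is hypothetical):

```python
class Repeat:
    """Hypothetical type that an int can multiply from either side."""
    def __init__(self, text):
        self.text = text

    def __mul__(self, other):
        if isinstance(other, int):
            return Repeat(self.text * other)
        return NotImplemented   # let Python try other.__rmul__

    def __rmul__(self, other):
        # int.__mul__(3, Repeat(...)) returns NotImplemented,
        # so Python falls back to this reflected method
        return self.__mul__(other)

print((Repeat('ab') * 3).text)  # ababab
print((3 * Repeat('ab')).text)  # ababab
```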
Binary operators by their nature have two operands. Each operand may be on either the left or the right side of an operator. When you overload an operator for some type, you can specify for which side of the operator the overloading is done. This is useful when invoking the operator on two operands of different types. Here's an example:
class Foo(object):
    def __init__(self, val):
        self.val = val
    def __str__(self):
        return "Foo [%s]" % self.val

class Bar(object):
    def __init__(self, val):
        self.val = val
    def __rmul__(self, other):
        return Bar(self.val * other.val)
    def __str__(self):
        return "Bar [%s]" % self.val

f = Foo(4)
b = Bar(6)
obj = f * b   # Bar [24]
obj2 = b * f  # ERROR
Here, obj will be a Bar with val = 24, but the assignment to obj2 generates an error because Bar has no __mul__ and Foo has no __rmul__.
I hope this is clear enough.
In some tutorial vector classes, __mul__() is defined as the dot product, so the result is a scalar (just a number, e.g. x1*x2 + y1*y2), while __rmul__() is defined to return a point with x = x1*x2 and y = y1*y2. That is a design choice of those particular classes, though, not a general rule: __rmul__() is simply the reflected version of __mul__().
