python set contains vs. list contains


I'm using Python 2.7.
Consider the following snippet of code (the example is contrived):
import datetime

class ScheduleData:
    def __init__(self, date):
        self.date = date

    def __eq__(self, other):
        try:
            return self.date == other.date
        except AttributeError as e:
            return self.date == other

    def __hash__(self):
        return hash(self.date)

schedule_set = set()
schedule_set.add(ScheduleData(datetime.date(2010, 8, 7)))
schedule_set.add(ScheduleData(datetime.date(2010, 8, 8)))
schedule_set.add(ScheduleData(datetime.date(2010, 8, 9)))
print (datetime.date(2010, 8, 8) in schedule_set)
schedule_list = list(schedule_set)
print (datetime.date(2010, 8, 8) in schedule_list)
The output from this is unexpected (to me, at least):
[08:02 PM toolscripts]$ python test.py
True
False
In the first case, the given date is found in schedule_set because I have overridden the __hash__ and __eq__ methods.
From my understanding, the in operator checks hash and equality for sets, but for lists it simply iterates over the items and checks equality.
So what is happening here? Why does my second test for in on the list schedule_list fail?
Do I have to override some other method for lists?

The issue is that the comparison invokes __eq__ on the opposite operand from the one you expect. The __eq__ method you defined works for ScheduleData() == datetime.date(), but the in operator on a list performs the comparison in the opposite order, datetime.date() == ScheduleData(), which does not invoke your __eq__. The right-hand operand's __eq__ is only tried when the left-hand operand's __eq__ returns NotImplemented, and in Python 2 datetime.date.__eq__ returns False instead. (A set lookup, by contrast, first matches on hash and then, in CPython 2.7, compares the stored ScheduleData against the date, so your __eq__ does run there.)
The reason this problem occurs in Python 2 and not in Python 3 has to do with the definition of datetime.date.__eq__ in the standard library. Take, for example, the following two classes:
class A(object):
    def __eq__(self, other):
        print ('A.__eq__')
        return False

class B(object):
    def __eq__(self, other):
        print ('B.__eq__')

items = [A()]
B() in items
Running this code prints B.__eq__ under both Python 2 and Python 3. The B object is used as the left-hand side, just as your datetime.date object is in Python 2. However, if I redefine B.__eq__ to resemble the Python 3 definition of datetime.date.__eq__:
class B(object):
    def __eq__(self, other):
        print ('First B.__eq__')
        if isinstance(self, other.__class__):
            print ('B.__eq__')
        return NotImplemented
Then:
First B.__eq__
A.__eq__
is printed under both Python 2 and 3. Returning NotImplemented tells Python to retry the comparison with the arguments reversed, which is why A.__eq__ runs.
Using timetuple in your class will fix this problem, as @TimPeters stated (an interesting quirk I was unaware of), though it seems that it need not be a function:
class ScheduleData:
    timetuple = None
is all you'd need in addition to what you have already.

@RyanHaining is correct. For a truly bizarre workaround, add this method to your class:
def timetuple(self):
    return None
Then your program will print True twice. The reasons for this are involved, having to do with an unfortunate history of comparisons in Python 2 being far too loose. The timetuple() workaround is mostly explained in this part of the docs:
Note: In order to stop comparison from falling back to the default scheme of comparing object addresses, datetime comparison normally raises TypeError if the other comparand isn’t also a datetime object. However, NotImplemented is returned instead if the other comparand has a timetuple() attribute. This hook gives other kinds of date objects a chance at implementing mixed-type comparison. If not, when a datetime object is compared to an object of a different type, TypeError is raised unless the comparison is == or !=. The latter cases return False or True, respectively.
datetime was one of the first types added to Python that tried to offer less surprising comparison behavior. But it couldn't become "really clean" until Python 3.
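Putting both answers together, here is a minimal sketch (my own, not the asker's exact class) of a ScheduleData that works with both containers. It assumes Python 3, where datetime.date.__eq__ already returns NotImplemented for foreign types; the timetuple = None attribute is the Python 2.7 hook quoted above and is not needed on Python 3. The __eq__ also returns NotImplemented for unrelated types, so the other operand gets a chance instead of a guess.

```python
import datetime


class ScheduleData(object):
    """Wraps a date and compares equal to that date (sketch)."""

    # Python 2.7 only: the presence of a timetuple attribute makes
    # datetime.date.__eq__ return NotImplemented instead of False, so the
    # reflected ScheduleData.__eq__ gets a turn. Python 3 does not need it.
    timetuple = None

    def __init__(self, date):
        self.date = date

    def __eq__(self, other):
        if isinstance(other, ScheduleData):
            return self.date == other.date
        if isinstance(other, datetime.date):
            return self.date == other
        return NotImplemented  # let the other operand decide

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__, so spell it out.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.date)


target = datetime.date(2010, 8, 8)
schedule_list = [ScheduleData(datetime.date(2010, 8, day)) for day in (7, 8, 9)]
schedule_set = set(schedule_list)

print(target in schedule_set)   # True
# Whichever operand Python compares first, ScheduleData.__eq__ ends up deciding.
print(target in schedule_list)  # True
```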

Related

Should __eq__ compare objects of two different types?

In the problem I'm working on, there are data identifiers of the form scope:name, where both scope and name are strings. name has different parts separated by dots, like part1.part2.part3.part4.part5. On many occasions, but not always, scope is just equal to part1 of name. The code I'm writing has to work with different systems that provide or require the identifiers in different patterns. Sometimes they just require the full string representation like scope:name; on other occasions, calls have two different parameters, scope and name. When receiving information from other systems, sometimes the full string scope:name is returned, sometimes scope is omitted and should be inferred from name, and sometimes a dict that contains scope and name is returned.
To ease the use of these identifiers, I have created a class to manage them internally, so that I don't have to write the same conversions, splits and formats over and over again. The class is quite simple. It only has two attributes (scope and name), a method to parse strings into objects of the class, and some magic methods to represent the objects. In particular, __str__(self) returns the object in the form scope:name, which is the fully qualified name (fqn) of the identifier:
class DID(object):
    """Represent a data identifier."""

    def __init__(self, scope, name):
        self.scope = scope
        self.name = name

    @classmethod
    def parse(cls, s, auto_scope=False):
        """Create a DID object given its string representation.

        Parameters
        ----------
        s : str
            The string, i.e. 'scope:name', or 'name' if auto_scope is True.
        auto_scope : bool, optional
            If True, and when no scope is provided, the scope will be set to
            the projectname. Default False.

        Returns
        -------
        DID
            The DID object that represents the given fully qualified name.
        """
        if isinstance(s, basestring):
            arr = s.split(':', 2)
        else:
            raise TypeError('string expected.')
        if len(arr) == 1:
            if auto_scope:
                return cls(s.split('.', 1)[0], s)
            else:
                raise ValueError(
                    "Expecting 'scope:name' when auto_scope is False"
                )
        elif len(arr) == 2:
            return cls(*arr)
        else:
            raise ValueError("Too many ':'")

    def __repr__(self):
        return "DID(scope='{0.scope}', name='{0.name}')".format(self)

    def __str__(self):
        return u'{0.scope}:{0.name}'.format(self)
As I said, the code has to perform comparisons with strings and use the string representation in some method calls. I am tempted to write the __eq__ magic method and its counterpart __ne__. The following is an implementation of just __eq__:
# APPROACH 1:
def __eq__(self, other):
    if isinstance(other, self.__class__):
        return self.scope == other.scope and self.name == other.name
    elif isinstance(other, basestring):
        return str(self) == other
    else:
        return False
As you can see, it defines equality both between two DIDs and between a DID and a string, so that either can be compared with the other. My issue with this is whether it is good practice:
On the one hand, when other is a string, the method converts self to a string, and I keep thinking of "explicit is better than implicit". You could end up thinking that you are working with two strings, which is not the case for self.
On the other hand, from the point of view of meaning, a DID represents the fqn scope:name, and it makes sense to compare it for equality with strings, just as an int and a float can be compared, or any two objects derived from basestring.
I have also thought about not including the basestring case in the implementation, but to me this is even worse and prone to mistakes:
# APPROACH 2:
def __eq__(self, other):
    if isinstance(other, self.__class__):
        return self.scope == other.scope and self.name == other.name
    else:
        return False
In approach 2, a comparison for equality between a DID object and a string, both representing the same identifier, returns False. To me, this is even more prone to mistakes.
What is the best practice in this situation? Should the comparison between a DID and a string be implemented as in approach 1, even though objects of different types might then be considered equal? Should I use approach 2, even though s != DID.parse(s)? Or should I not implement __eq__ and __ne__ at all, so that there are never misunderstandings?

A few classes in Python (but I can't think of anything in the standard library off the top of my head) define an equality operator that handles multiple types on the RHS. One common library that does support this is NumPy, with:
import numpy as np
np.array(1) == 1
evaluating to True. In general, I'd discourage this sort of thing, as there are lots of corner cases where this behaviour can get tricky; e.g. see the write-up of the __hash__ method in the Python 3 documentation (similar things exist in Python 2, but it's end-of-life). In cases where I have written similar code, I've tended to end up with something closer to:
def __eq__(self, other):
    if isinstance(other, str):
        try:
            other = self.parse(other)  # parse the string, not the str type
        except ValueError:
            return NotImplemented
    if isinstance(other, DID):
        return self.scope == other.scope and self.name == other.name
    return NotImplemented
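A self-contained sketch of the same idea, assuming Python 3 (str rather than basestring) and a simplified parse() that drops the auto_scope option. The __hash__ method is my addition: because a DID can now equal a string, hashing str(self) keeps equal objects hashing alike, which is one of the corner cases alluded to above.

```python
class DID(object):
    """Python 3 sketch of a scope:name identifier (simplified)."""

    def __init__(self, scope, name):
        self.scope = scope
        self.name = name

    @classmethod
    def parse(cls, s):
        # Simplified parser: exactly one ':' required; auto_scope is omitted.
        if not isinstance(s, str):
            raise TypeError('string expected.')
        scope, sep, name = s.partition(':')
        if not sep or ':' in name:
            raise ValueError("Expecting 'scope:name'")
        return cls(scope, name)

    def __eq__(self, other):
        if isinstance(other, str):
            try:
                other = self.parse(other)
            except ValueError:
                return NotImplemented  # not a valid identifier string
        if isinstance(other, DID):
            return (self.scope, self.name) == (other.scope, other.name)
        return NotImplemented

    def __hash__(self):
        # A DID and its equal 'scope:name' string must hash alike.
        return hash(str(self))

    def __str__(self):
        return '{0}:{1}'.format(self.scope, self.name)


did = DID('user', 'file.1')
print(did == 'user:file.1')    # True
print('user:file.1' == did)    # True: str.__eq__ returns NotImplemented, DID.__eq__ is tried
print(did == 'file.1')         # False: parse fails, both sides return NotImplemented
print('user:file.1' in {did})  # True: hash and equality agree
```

Returning NotImplemented (rather than False) for unparseable strings and foreign types leaves the final decision to Python's fallback, which for == is an identity check.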
Further to this, I'd suggest making objects like this immutable, and you have a few ways of doing this. Python 3 has nice dataclasses, but given that you seem to be stuck on Python 2, you might use namedtuples, something like:
from collections import namedtuple

class DID(namedtuple('DID', ('scope', 'name'))):
    __slots__ = ()

    @classmethod
    def parse(cls, s, auto_scope=False):
        return cls('foo', 'bar')  # placeholder: put the real parsing logic here

    def __eq__(self, other):
        if isinstance(other, str):
            try:
                other = self.parse(other)
            except ValueError:
                return NotImplemented
        return super(DID, self).__eq__(other)
which gives you immutability and a __repr__ method for free, but you might want to keep your own __str__ method. The __slots__ attribute means that accidentally assigning to a misspelled attribute such as obj.scopes will fail, but you might want to allow this behaviour.
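On Python 3.7+, a frozen dataclass gives the same immutability with less code. A sketch under that assumption, again with a simplified parse(); note that the generated __eq__ only compares a DID with another DID (it returns NotImplemented for other types), so on its own this corresponds to approach 2 unless you add string handling yourself.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class DID:
    """Immutable scope:name identifier (Python 3.7+ sketch)."""
    scope: str
    name: str

    @classmethod
    def parse(cls, s):
        # Simplified parser: exactly one ':' required.
        scope, sep, name = s.partition(':')
        if not sep or ':' in name:
            raise ValueError("Expecting 'scope:name'")
        return cls(scope, name)

    def __str__(self):
        return '{0}:{1}'.format(self.scope, self.name)


did = DID.parse('user:file.1')
print(repr(did))  # DID(scope='user', name='file.1')
print(str(did))   # user:file.1
try:
    did.scope = 'other'  # frozen: assignment is rejected
except FrozenInstanceError:
    print('DID is immutable')
```

frozen=True together with the default eq=True also makes the dataclass generate a __hash__ based on the fields.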

Special method like __str__ that returns a number representation of an object

Say I have a Python class as follows:
class TestClass():
    value = 20

    def __str__(self):
        return str(self.value)
The __str__ method will automatically be called any time I try to use an instance of TestClass as a string, like in print. Is there any equivalent for treating it as a number? For example, in
an_object = TestClass()
if an_object > 30:
    ...
where some hypothetical __num__ function would be automatically called to interpret the object as a number. How could this be easily done?
Ideally I'd like to avoid overloading every normal mathematical operator.

You can provide __float__(), __int__(), and/or __complex__() methods to convert objects to numbers. There is also a __round__() method you can provide for custom rounding. See the data model section of the Python documentation. The __bool__() method technically fits here too, since Booleans are a subclass of integers in Python.
While Python does implicitly convert objects to strings for e.g. print(), it never converts objects to numbers without you saying to. Thus, Foo() + 42 isn't valid just because Foo has an __int__ method. You have to explicitly use int() or float() or complex() on them. At least that way, you know what you're getting just by reading the code.
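A quick sketch of that point, assuming Python 3 (Foo is a hypothetical class): defining __int__ enables int(foo), but not foo + 1.

```python
class Foo(object):
    """Defines __int__ but none of the arithmetic special methods."""

    def __int__(self):
        return 42


foo = Foo()
print(int(foo))      # 42: explicit conversion calls Foo.__int__
print(int(foo) + 1)  # 43

try:
    foo + 1  # Foo has no __add__, and int.__radd__ does not call __int__
except TypeError as exc:
    print('TypeError:', exc)
```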
To get classes to actually behave like numbers, you have to implement all the special methods for the operations that numbers participate in, including arithmetic and comparisons. As you note, this gets annoying. You can, however, write a mixin class so that at least you only have to write it once. Such as:
class NumberMixin(object):
    def __eq__(self, other): return self.__num__() == self.__getval__(other)
    # other comparison methods
    def __add__(self, other): return self.__num__() + self.__getval__(other)
    def __radd__(self, other): return self.__getval__(other) + self.__num__()
    # etc., I'm not going to write them all out, are you crazy?
This class expects two special methods on the class it's mixed in with.
__num__() - converts self to a number. Usually this will be an alias for the conversion method for the most precise type supported by the object. For example, your class might have __int__() and __float__() methods, but __int__() will truncate the number, so you assign __num__ = __float__ in your class definition. On the other hand, if your class has a natural integral value, you might want to provide __float__ so it can also be converted to a float, but you'd use __num__ = __int__ since it should behave like an integer.
__getval__() - a static method that obtains the numeric value from another object. This is useful when you want to be able to support operations with objects other than numeric types. For example, when comparing, you might want to be able to compare to objects of your own type, as well as to traditional numeric types. You can write __getval__() to fish out the right attribute or call the right method of those other objects. Of course with your own instances you can just rely on float() to do the right thing, but __getval__() lets you be as flexible as you like in what you accept.
A simple example class using this mixin:
class FauxFloat(NumberMixin):
    def __init__(self, value): self.value = float(value)
    def __int__(self): return int(self.value)
    def __float__(self): return float(self.value)
    def __round__(self, digits=0): return round(self.value, digits)
    def __str__(self): return str(self.value)
    __repr__ = __str__
    __num__ = __float__

    @staticmethod
    def __getval__(obj):
        if isinstance(obj, FauxFloat):
            return float(obj)
        if hasattr(type(obj), "__num__") and callable(type(obj).__num__):
            return type(obj).__num__(obj)  # don't call dunder method on instance
        try:
            return float(obj)
        except TypeError:
            return int(obj)
ff = FauxFloat(42)
print(ff + 13) # 55.0
For extra credit, you could register your class so it'll be seen as a subclass of an appropriate abstract base class:
import numbers
numbers.Real.register(FauxFloat)
issubclass(FauxFloat, numbers.Real) # True
For extra extra credit, you might also create a global num() function that calls __num__() on objects that have it, otherwise falling back to the older methods.
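One way to avoid writing every method by hand is to generate them in a loop with the operator module. This is a sketch assuming Python 3; _coerce, _install and the TestClass below are hypothetical names. Unlike the mixin above, it returns NotImplemented when __getval__ cannot convert the other operand, so comparisons with unrelated types fall back to Python's defaults instead of raising, and it defines __hash__ to agree with the numeric __eq__.

```python
import numbers
import operator


class NumberMixin(object):
    """Forwards comparisons and arithmetic to self.__num__() (sketch)."""

    def __hash__(self):
        # Equal numbers must hash alike, so hash the numeric value.
        return hash(self.__num__())


def _coerce(obj, other):
    """Return other's numeric value via obj.__getval__, or None if it has none."""
    try:
        return obj.__getval__(other)
    except (TypeError, ValueError):
        return None


def _install(name, op, reflected_name=None):
    """Attach a forward (and optionally a reflected) operator method."""
    def forward(self, other):
        value = _coerce(self, other)
        if value is None:
            return NotImplemented
        return op(self.__num__(), value)
    setattr(NumberMixin, name, forward)

    if reflected_name:
        def reflected(self, other):
            value = _coerce(self, other)
            if value is None:
                return NotImplemented
            return op(value, self.__num__())
        setattr(NumberMixin, reflected_name, reflected)


# Comparisons need no reflected versions: Python itself swaps < and >, etc.
for _name in ('lt', 'le', 'eq', 'ne', 'gt', 'ge'):
    _install('__%s__' % _name, getattr(operator, _name))

# Arithmetic gets both x + y (__add__) and y + x (__radd__) variants.
for _name in ('add', 'sub', 'mul', 'truediv'):
    _install('__%s__' % _name, getattr(operator, _name), '__r%s__' % _name)


class TestClass(NumberMixin):
    """The question's class, made number-like."""

    def __init__(self, value=20):
        self.value = value

    def __float__(self):
        return float(self.value)

    __num__ = __float__

    @staticmethod
    def __getval__(obj):
        # Accept real numbers and TestClass instances; reject everything else.
        if isinstance(obj, (TestClass, numbers.Real)):
            return float(obj)
        raise TypeError('not a number: %r' % (obj,))


an_object = TestClass()
print(an_object > 30)   # False
print(an_object + 1)    # 21.0
print(1 + an_object)    # 21.0
print(an_object == 20)  # True
```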

In the case of numbers it is a bit more complicated, but it's possible! You have to implement the comparison special methods in your class to fit your needs. Each operator module function below corresponds to the special method of the same name:
operator.__lt__(a, b) # less than
operator.__le__(a, b) # less than or equal
operator.__eq__(a, b) # equal
operator.__ne__(a, b) # not equal
operator.__ge__(a, b) # greater than or equal
operator.__gt__(a, b) # greater than
See the Python documentation on operators.
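For comparisons specifically, functools.total_ordering can fill in the remaining methods once you define __eq__ and one ordering method. A sketch assuming Python 3.4+ (where total_ordering supports NotImplemented from the underlying method); _value_of is a hypothetical helper.

```python
import functools
import numbers


@functools.total_ordering
class TestClass(object):
    """Compares by .value; total_ordering derives <=, > and >= (sketch)."""

    def __init__(self, value=20):
        self.value = value

    @staticmethod
    def _value_of(other):
        # Unwrap TestClass instances, accept real numbers, else None.
        if isinstance(other, TestClass):
            return other.value
        if isinstance(other, numbers.Real):
            return other
        return None

    def __eq__(self, other):
        other_value = self._value_of(other)
        if other_value is None:
            return NotImplemented
        return self.value == other_value

    def __lt__(self, other):
        other_value = self._value_of(other)
        if other_value is None:
            return NotImplemented
        return self.value < other_value

    def __hash__(self):
        return hash(self.value)


an_object = TestClass()
print(an_object > 30)   # False
print(an_object >= 20)  # True
print(10 < an_object)   # True
```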

Looks like you need the __gt__ method.
class A:
    val = 0

    def __gt__(self, other):
        return self.val > other

a = A()
a.val = 12
a > 10
If you just want to convert the object to an int, you should define the __int__ method (or __float__).

Overriding special methods on builtin types

Can magic methods be overridden outside of a class?
When I do something like this
def __int__(x):
    return x + 5

a = 5
print(int(a))
it prints '5' instead of '10'. Do I do something wrong or magic methods just can't be overridden outside of a class?

Short answer: not really.
You cannot arbitrarily change the behaviour of int(), a builtin function (which internally calls __int__()), on builtin types such as int itself.
You can however change the behaviour of custom objects like this:
Example:
class Foo(object):
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Mutates self in place and returns None, so x + 5 shows no result.
        self.value += other

    def __repr__(self):
        return "<Foo(value={0:d})>".format(self.value)
Demo:
>>> x = Foo(5)
>>> x + 5
>>> x
<Foo(value=10)>
This implements two special methods:
__repr__(), which gets called by repr()
__add__(), which gets called by the + operator.
Update: As per the comments above, technically you can redefine the builtin function int. Example:
def int(x):
    return x + 5

int(5)  # returns 10
However, this is not recommended, and it does not change the overall behaviour of the object x.
Update #2: The reason you cannot change the behaviour of builtin types (without modifying the underlying source or using Cython or ctypes) is that builtin types in Python are not exposed to, or mutable by, the user, unlike in homoiconic languages (see: Homoiconicity). Even then, I'm not really sure you can with Cython/ctypes; but the real question is "Why do you want to do this?"
Update #3: See Python's documentation on Data Model (object.__complex__ for example).

You can redefine a top-level __int__ function, but nobody ever calls that.
As implied in the Data Model documentation, when you write int(x), that calls x.__int__(), not __int__(x).
And even that isn't really true. First, __int__ is a special method, meaning it's allowed to call type(x).__int__(x) rather than x.__int__(), but that doesn't matter here. Second, it's not required to call __int__ unless you give it something that isn't already an int (and call it with the one-argument form). So, it could be as if it was written like this:
def int(x, base=None):
    if base is not None:
        return do_basey_stuff(x, base)
    if isinstance(x, int):
        return x
    return type(x).__int__(x)
So, there is no way to change what int(5) will do… short of just shadowing the builtin int function with a different builtin/global/local function of the same name, of course.
But what if you wanted to, say, change int(5.5)? That's not an int, so it's going to call float.__int__(5.5). So, all we have to do is monkeypatch that, right?
Well, yes, except that Python allows builtin types to be immutable, and most of the builtin types in CPython are. So, if you try it:
>>> _real_float_int = float.__int__
>>> def _float_int(self):
...     return _real_float_int(self) + 5
>>> _float_int(5.5)
10
>>> float.__int__ = _float_int
TypeError: can't set attributes of built-in/extension type 'float'
However, if you're defining your own types, that's a different story:
>>> class MyFloat(float):
...     def __int__(self):
...         return super().__int__() + 5
>>> f = MyFloat(5.5)
>>> int(f)
10
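A small sketch of the type-lookup point above, assuming Python 3 (Number is a hypothetical class): int() ignores an __int__ stored on the instance, but it does pick up a patched method on a class you define yourself, which float does not allow.

```python
class Number(object):
    def __int__(self):
        return 1


n = Number()
# Shadow the method with an instance attribute of the same name.
n.__int__ = lambda: 99

print(n.__int__())  # 99: ordinary attribute lookup finds the instance attribute
print(int(n))       # 1: int() looks up __int__ on type(n), not on the instance

# Patching a class you defined yourself is allowed, and int() sees it.
Number.__int__ = lambda self: 7
print(int(n))       # 7
```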

OOP way to implement class for comparison in Python

Using Python, I am trying to implement a set of types including a "don't care" type, for fuzzy matching. I have implemented it like so:
class Matchable(object):
    def __init__(self, match_type = 'DEFAULT'):
        self.match_type = match_type

    def __eq__(self, other):
        return (self.match_type == 'DONTCARE' or other.match_type == 'DONTCARE' \
            or self.match_type == other.match_type)
Coming from an OO background, this solution seems inelegant; using the Matchable class results in ugly code. I'd prefer to eliminate match_type, and instead make each type its own class inherited from a superclass, then use type checking to do the comparisons. However type checking appears to be generally frowned upon:
http://www.canonical.org/~kragen/isinstance/
Is there a better (more pythonic) way to implement this functionality?
Note: I'm aware of the large number of questions and answers about Python "enums", and it may be that one of those answers is appropriate. The requirement for the overridden __eq__ function complicates matters, and I haven't seen a way to use the proposed enum implementations for this case.
The best OO way I can come up with of doing this is:
class Match(object):
    def __eq__(self, other):
        return isinstance(self, DontCare) or isinstance(other, DontCare) or type(self) == type(other)

class DontCare(Match):
    pass

class A(Match):
    pass

class B(Match):
    pass

d = DontCare()
a = A()
b = B()
print d == a  # True
print a == d  # True
print d == b  # True
print a == b  # False
print d == 1  # True
print a == 1  # False

The article you linked says that isinstance isn't always evil, and I think in your case it is appropriate. The main complaint in the article is that using isinstance to check whether an object supports a particular interface reduces opportunities to use implied interfaces, and it's a fair point. In your case, however, you would essentially be using a DontCare class to provide an annotation for how an object should be treated in comparisons, and isinstance would be checking such an annotation, which should be perfectly fine.
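A sketch of the isinstance-based approach, assuming Python 3. It differs from the question's version in two deliberate ways: Match.__eq__ returns NotImplemented for non-Match objects (so a == 1 is decided by Python's fallback rather than by a type comparison), and the classes are made unhashable, because a DontCare makes equality non-transitive.

```python
class Match(object):
    """Instances compare equal if they have the same concrete class (sketch)."""

    def __eq__(self, other):
        if isinstance(other, DontCare):
            return True                # the wildcard matches anything
        if isinstance(other, Match):
            return type(self) is type(other)
        return NotImplemented          # let non-Match objects decide

    def __ne__(self, other):
        # Python 2 does not derive != from __eq__; Python 3 does.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # a == d and d == b, yet a != b: equality is not transitive,
    # so these objects are deliberately unhashable.
    __hash__ = None


class DontCare(Match):
    def __eq__(self, other):
        return True


class A(Match):
    pass


class B(Match):
    pass


d, a, b = DontCare(), A(), B()
print(d == a, a == d, d == b)  # True True True
print(a == b, a != b)          # False True
print(a == 1, d == 1, 1 == d)  # False True True
```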

I guess you just need to check if any of the operands is a fuzzy type, no?
class Fuzzy(object):
    def __eq__(*args):
        def isFuzzy(obj):
            return isinstance(obj, Fuzzy)
        return any(map(isFuzzy, args))
Now you can do:
>>> class DefaultClass(object):
...     pass
>>> class DontCareClass(Fuzzy):
...     pass
>>> DefaultClass() == DontCareClass()
True
Since we're using isinstance, this will work just fine with polymorphism. This is a perfectly legitimate use of isinstance in my opinion. What you want to avoid is type checking when you can just rely on duck typing, but this is not one of those cases.
EDIT: Actually, for practical purposes, this would be perfectly fine too:
class Fuzzy(object):
    def __eq__(*args):
        return True

Is there a way to check if two objects contain the same values in each of their variables in python?

How do I check if two instances of a
class FooBar(object):
    def __init__(self, param):
        self.param = param
        self.param_2 = self.function_2(param)
        self.param_3 = self.function_3()
are identical? By identical I mean they have the same values in all of their variables.
a = FooBar(param)
b = FooBar(param)
I thought of
if a == b:
    print "a and b are identical"
Will this do it without side effects?
The background for my question is unit testing. I want to achieve something like:
self.failUnlessEqual(self.my_object.a_function(), another_object)

If you want the == to work, then implement the __eq__ method in your class to perform the rich comparison.
If all you want to do is compare the equality of all attributes, you can do that succinctly by comparison of __dict__ in each object:
class MyClass:
    def __eq__(self, other):
        return self.__dict__ == other.__dict__
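A slightly more defensive sketch of the __dict__ comparison, assuming Python 3; param_2 and param_3 are computed by hypothetical stand-ins for function_2/function_3. Returning NotImplemented for other types avoids the AttributeError that other.__dict__ would raise for, say, an int. The exact type check means subclasses never compare equal; use isinstance if they should.

```python
class FooBar(object):
    def __init__(self, param):
        self.param = param
        # Hypothetical stand-ins for the question's function_2 / function_3.
        self.param_2 = param * 2
        self.param_3 = str(param)

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented  # avoids AttributeError on other types
        return vars(self) == vars(other)

    def __ne__(self, other):
        # Needed on Python 2 only; Python 3 derives != from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Mutable objects compared by value should not be hashable.
    __hash__ = None


a = FooBar(3)
b = FooBar(3)
print(a == b)  # True: same attribute values
print(a is b)  # False: still two distinct objects
print(a == 3)  # False: different type, no AttributeError
```

In a unittest, assertEqual(a, b) then uses this __eq__.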

For instances of a class that doesn't define __eq__, the == operator will only return True if the two objects are the same object (i.e. if they refer to the same address in memory).
To get more 'bespoke' behaviour, you'll want to override the rich comparison operators, in this case specifically __eq__. Try adding this to your class:
def __eq__(self, other):
    if self.param == other.param \
            and self.param_2 == other.param_2 \
            and self.param_3 == other.param_3:
        return True
    else:
        return False
(the comparison of all params could be neatened up here, but I've left them in for clarity).
Note that if the parameters are themselves objects you've defined, those objects will have to define __eq__ in a similar way for this to work.
Another point to note is that if you try to compare a FooBar object with another type of object in the way I've done above, python will try to access the param, param_2 and param_3 attributes of the other type of object which will throw an AttributeError. You'll probably want to check the object you're comparing with is an instance of FooBar with isinstance(other, FooBar) first. This is not done by default as there may be situations where you would like to return True for comparison between different types.
See AJ's answer for a tidier way to simply compare all parameters that also shouldn't throw an attribute error.
For more information on the rich comparison see the python docs.

For Python 3.7 onwards, you can also use a dataclass to get exactly what you want very easily. For example:
from dataclasses import dataclass

@dataclass
class FooBar:
    param: str
    param2: float
    param3: int

a = FooBar("test_text", 2.0, 3)
b = FooBar("test_text", 2.0, 3)
print(a == b)
prints True.

According to Learning Python by Lutz, the "==" operator tests value equivalence, comparing all nested objects recursively. The "is" operator tests whether two objects are the same object, i.e. at the same address in memory (same pointer value).
Except for caching/reuse of small integers and simple strings, two objects such as x = [1,2] and y = [1,2] are equal ("==") in value, but y "is" x returns False. The same is true of two floats x = 3.567 and y = 3.567. This means their addresses are different, or in other words, hex(id(x)) != hex(id(y)).
For class objects, we have to override the method __eq__() to make two class A objects like x = A(1,[2,3]) and y = A(1,[2,3]) "==" in content. By default, "==" on class instances resorts to comparing ids only, and id(x) != id(y) in this case, so x != y.
In summary, if x "is" y, then normally x == y (float NaN is a notable exception, since it never compares equal to anything, itself included), but the opposite is not true.
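A quick check of these claims in Python 3; x, y, z and nan are just illustrative names. NaN is included because it is a well-known exception to "is implies ==": it never compares equal to anything, itself included.

```python
x = [1, 2]
y = [1, 2]
print(x == y)          # True: equal contents
print(x is y)          # False: two distinct list objects
print(id(x) == id(y))  # False

z = x
print(z is x, z == x)  # True True

nan = float('nan')
print(nan is nan)      # True
print(nan == nan)      # False: NaN never equals anything
```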

If this is something you want to use in your tests, where you just want to verify that the fields of simple objects are equal, look at compare from testfixtures:
from testfixtures import compare
compare(a, b)

To avoid the possibility of adding or removing attributes to the model and forgetting to make the appropriate changes to your __eq__ function, you can define it as follows (this assumes a Django model, whose _meta.fields lists its fields).
def __eq__(self, other):
    if self.__class__ == other.__class__:
        fields = [field.name for field in self._meta.fields]
        for field in fields:
            if not getattr(self, field) == getattr(other, field):
                return False
        return True
    else:
        raise TypeError('Comparing object is not of the same type.')
In this way, all the object attributes are compared. Now you can check for attribute equality either with object.__eq__(other) or object == other.
