Compare two custom lists in Python

I'm having trouble comparing two lists of objects in Python.
I'm converting a message into an instance of this class:
class StatusMessage(object):
    def __init__(self, conversation_id, platform):
        self.__conversation_id = str(conversation_id)
        self.__platform = str(platform)
    @property
    def conversation_id(self):
        return self.__conversation_id
    @property
    def platform(self):
        return self.__platform
Now when I create two lists of type StatusMessage
>>> expected = []
>>> expected.append(StatusMessage(1, "abc"))
>>> expected.append(StatusMessage(2, "bbc"))
>>> actual = []
>>> actual.append(StatusMessage(1, "abc"))
>>> actual.append(StatusMessage(2, "bbc"))
and then I compare the two lists using
>>> cmp(actual, expected)
or
>>> len(set(expected_messages_list).difference(actual_list)) == 0
I keep getting failures.
When I debug and actually compare for each item within the list like
>>> actual[0].conversation_id == expected[0].conversation_id
>>> actual[0].platform == expected[0].platform
then I always see
True
The following returns -1:
>>> cmp(actual[0], expected[0])
Why is this so? What am I missing?

You must tell Python how to check two instances of class StatusMessage for equality.
For example, adding the method
    def __eq__(self, other):
        return (self is other) or (self.conversation_id, self.platform) == (other.conversation_id, other.platform)
will have the following effect:
>>> cmp(expected,actual)
0
>>> expected == actual
True
If you want to use cmp with your StatusMessage objects, consider implementing the __lt__ and __gt__ methods as well. I don't know by which rule you want to consider one instance lesser or greater than another instance.
In addition, consider returning False or error-checking for comparing a StatusMessage object with an arbitrary object that has no conversation_id or platform attribute. Otherwise, you will get an AttributeError:
>>> actual[0] == 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "a.py", line 16, in __eq__
    return (self is other) or (self.conversation_id, self.platform) == (other.conversation_id, other.platform)
AttributeError: 'int' object has no attribute 'conversation_id'
You can find one reason why the self is other check is a good idea here (possibly unexpected results in multithreaded applications).

Because you are trying to compare two custom objects, you have to define what makes the objects equal or not. You do this by defining the __eq__() method on the StatusMessage class:
class StatusMessage(object):
    def __eq__(self, other):
        return (self.conversation_id == other.conversation_id and
                self.platform == other.platform)
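Putting the two answers together, here is a minimal sketch for Python 3 (where cmp no longer exists, so == is used instead). The _key helper is a hypothetical name introduced only for this example, and __hash__ is added on the assumption that the set-based comparison from the question should also work:

```python
class StatusMessage:
    """Value-like message; equality and hashing use both fields."""

    def __init__(self, conversation_id, platform):
        self._conversation_id = str(conversation_id)
        self._platform = str(platform)

    @property
    def conversation_id(self):
        return self._conversation_id

    @property
    def platform(self):
        return self._platform

    def _key(self):
        # Hypothetical helper: the tuple that defines equality.
        return (self._conversation_id, self._platform)

    def __eq__(self, other):
        if not isinstance(other, StatusMessage):
            # Let Python try the other operand, then fall back to False.
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Needed for set()/dict use; must agree with __eq__.
        return hash(self._key())


expected = [StatusMessage(1, "abc"), StatusMessage(2, "bbc")]
actual = [StatusMessage(1, "abc"), StatusMessage(2, "bbc")]

print(expected == actual)                          # True
print(len(set(expected).difference(actual)) == 0)  # True
print(actual[0] == 1)                              # False, no AttributeError
```

Returning NotImplemented for foreign types avoids the AttributeError shown above without hard-coding False.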

Related

Equality Comparison with NumPy Instance Invokes `__bool__`

I have defined a class where its __ge__ method returns an instance of itself, and whose __bool__ method is not allowed to be invoked (similar to a Pandas Series).
Why is X.__bool__ invoked during np.int8(0) <= x, but not for any of the other examples? Who is invoking it? I have read the Data Model docs but I haven’t found my answer there.
import numpy as np
import pandas as pd
class X:
    def __bool__(self):
        print(f"{self}.__bool__")
        assert False
    def __ge__(self, other):
        print(f"{self}.__ge__")
        return X()
x = X()
np.int8(0) <= x
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D90>.__bool__
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "<stdin>", line 4, in __bool__
# AssertionError
0 <= x
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5DF0>
x >= np.int8(0)
# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D30>
pd_ge = pd.Series.__ge__
def ge_wrapper(self, other):
    print("pd.Series.__ge__")
    return pd_ge(self, other)
pd.Series.__ge__ = ge_wrapper
pd_bool = pd.Series.__bool__
def bool_wrapper(self):
    print("pd.Series.__bool__")
    return pd_bool(self)
pd.Series.__bool__ = bool_wrapper
np.int8(0) <= pd.Series([1,2,3])
# Console output:
# pd.Series.__ge__
# 0 True
# 1 True
# 2 True
# dtype: bool
I suspect that np.int8.__le__ is defined so that instead of returning NotImplemented and letting X.__ge__ take over, it instead tries to return something like not (np.int8(0) > x), and then np.int8.__gt__ returns NotImplemented. Once X.__gt__(x, np.int8(0)) returns an instance of X rather than a Boolean value, then we need to call x.__bool__() in order to compute the value of not x.
(Still trying to track down where int8.__gt__ is defined to confirm.)
(Update: not quite. int8 uses a single generic rich comparison function that simply converts the value to a 0-dimensional array, then returns the result of PyObject_RichCompare on the array and x.)
I did find this function that appears to ultimately implement np.int8.__le__:
static NPY_INLINE int
rational_le(rational x, rational y) {
    return !rational_lt(y,x);
}
It's not clear to me how we avoid getting to this function if one of the arguments (like X) would not be a NumPy type. I think I give up.
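For what it's worth, the visible behaviour can be reproduced without NumPy. The sketch below is pure Python and only an analogy: PoliteLeft and EagerLeft are hypothetical stand-ins, not NumPy types. It assumes the difference comes down to whether the left operand returns NotImplemented or handles the comparison itself and then coerces the result to a bool.

```python
class X:
    """Like the question's X: comparing returns an X, truth-testing is forbidden."""
    bool_calls = 0

    def __bool__(self):
        X.bool_calls += 1
        raise AssertionError("X.__bool__ must not be called")

    def __ge__(self, other):
        return X()


class PoliteLeft:
    """Hypothetical left operand that defers to the right operand."""

    def __le__(self, other):
        return NotImplemented  # Python then tries other.__ge__(self)


class EagerLeft:
    """Hypothetical left operand that handles the comparison itself
    and forces the result into a bool."""

    def __le__(self, other):
        return bool(other >= self)  # calls other.__ge__, then __bool__


x = X()

result = PoliteLeft() <= x
print(type(result).__name__)   # X

try:
    EagerLeft() <= x
except AssertionError as exc:
    print("raised:", exc)
```

In the first case Python falls back to the reflected X.__ge__ and returns its result untouched; in the second, __bool__ is called on the X that __ge__ returned, which matches the two different object addresses in the question's output.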
TL;DR
X.__array_priority__ = 1000
The biggest hint is that it works with a pd.Series.
First I tried having X inherit from pd.Series. This worked (i.e. __bool__ no longer called).
To determine whether NumPy is using an isinstance check or duck-typing approach, I removed the explicit inheritance and added (based on this answer):
@property
def __class__(self):
    return pd.Series
The operation no longer worked (i.e. __bool__ was called).
So now I think we can conclude NumPy is using a duck-typing approach. So I checked to see what attributes are being accessed on X.
I added the following to X:
def __getattribute__(self, item):
    print("getattr", item)
    return object.__getattribute__(self, item)
Again instantiating X as x, and invoking np.int8(0) <= x, we get:
getattr __array_priority__
getattr __array_priority__
getattr __array_priority__
getattr __array_struct__
getattr __array_interface__
getattr __array__
getattr __array_prepare__
<__main__.X object at 0x000002022AB5DBE0>.__ge__
<__main__.X object at 0x000002021A73BE50>.__bool__
getattr __array_struct__
getattr __array_interface__
getattr __array__
Traceback (most recent call last):
  File "<stdin>", line 32, in <module>
    np.int8(0) <= x
  File "<stdin>", line 21, in __bool__
    assert False
AssertionError
Ah-ha! What is __array_priority__? Who cares, really. With a little digging, all we need to know is that NDFrame (from which pd.Series inherits) sets this value as 1000.
If we add X.__array_priority__ = 1000, it works! __bool__ is no longer called.
What made this so difficult (I believe) is that the NumPy code didn't show up in the call stack because it is written in C. I could investigate further if I tried out the suggestion here.

Overriding __eq__ and __hash__ to compare a dict attribute of two instances

I'm struggling to understand how to correctly compare objects based on an underlying dict attribute that each instance possesses.
Since I'm overriding __eq__, do I need to override __hash__ as well? I haven't a firm grasp on when/where to do so and could really use some help.
I created a simple example below to illustrate the maximum recursion exception that I've run into. A RegionalCustomerCollection organizes account IDs by geographical region. RegionalCustomerCollection objects are said to be equal if the regions and their respective accountids are. Essentially, all items() should be equal in content.
from collections import defaultdict
class RegionalCustomerCollection(object):
    def __init__(self):
        self.region_accountids = defaultdict(set)
    def get_region_accountid(self, region_name=None):
        return self.region_accountids.get(region_name, None)
    def set_region_accountid(self, region_name, accountid):
        self.region_accountids[region_name].add(accountid)
    def __eq__(self, other):
        if (other == self):
            return True
        if isinstance(other, RegionalCustomerCollection):
            return self.region_accountids == other.region_accountids
        return False
    def __repr__(self):
        return ', '.join(["{0}: {1}".format(region, acctids)
                          for region, acctids
                          in self.region_accountids.items()])
Let's create two object instances and populate them with some sample data:
>>> a = RegionalCustomerCollection()
>>> b = RegionalCustomerCollection()
>>> a.set_region_accountid('northeast',1)
>>> a.set_region_accountid('northeast',2)
>>> a.set_region_accountid('northeast',3)
>>> a.set_region_accountid('southwest',4)
>>> a.set_region_accountid('southwest',5)
>>> b.set_region_accountid('northeast',1)
>>> b.set_region_accountid('northeast',2)
>>> b.set_region_accountid('northeast',3)
>>> b.set_region_accountid('southwest',4)
>>> b.set_region_accountid('southwest',5)
Now let's try to compare the two instances and generate the recursion exception:
>>> a == b
...
RuntimeError: maximum recursion depth exceeded while calling a Python object
Your object shouldn't return a hash because it's mutable. If you put this object into a dictionary or set and then change it afterward, you may never be able to find it again.
In order to make an object unhashable, you need to do the following:
class MyClass(object):
    __hash__ = None
This will ensure that the object is unhashable.
>>> m = MyClass()
>>> hash(m)
TypeError: unhashable type: 'MyClass'
Does this answer your question? I'm suspecting not because you were explicitly looking for a hash function.
As far as the RuntimeError you're receiving, it's because of the following line:
if self == other:
    return True
That gets you into an infinite recursion loop. Try the following instead:
if self is other:
    return True
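You don't need to override __hash__ to compare two objects. You only need __hash__ if instances must work as set members or dictionary keys, in which case it has to be consistent with __eq__. (In Python 3, a class that defines __eq__ without defining __hash__ becomes unhashable.)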
Also, you have infinite recursion here:
def __eq__(self, other):
    if (other == self):
        return True
    if isinstance(other, RegionalCustomerCollection):
        return self.region_accountids == other.region_accountids
    return False
If both objects are of type RegionalCustomerCollection then you'll have infinite recursion since == calls __eq__.
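A minimal sketch of the corrected class, combining the suggestions above: the identity check uses is, foreign types get NotImplemented (a choice made for this example; returning False as in the question also works), and the class is marked unhashable because it is mutable. The sample data is shortened from the question.

```python
from collections import defaultdict


class RegionalCustomerCollection:
    # Mutable container, so mark it explicitly unhashable.
    __hash__ = None

    def __init__(self):
        self.region_accountids = defaultdict(set)

    def set_region_accountid(self, region_name, accountid):
        self.region_accountids[region_name].add(accountid)

    def __eq__(self, other):
        if self is other:  # identity check: never calls __eq__ again
            return True
        if isinstance(other, RegionalCustomerCollection):
            return self.region_accountids == other.region_accountids
        return NotImplemented


a = RegionalCustomerCollection()
b = RegionalCustomerCollection()
for region, accountid in [("northeast", 1), ("northeast", 2), ("southwest", 4)]:
    a.set_region_accountid(region, accountid)
    b.set_region_accountid(region, accountid)

same = a == b
print(same)        # True

b.set_region_accountid("southwest", 5)
different = a == b
print(different)   # False
```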

Is it possible to make a variable an object of a class by changing its type?

Assume that we have a test class defined as below:
>>> class test():
        pass
Normally, when I run the code below, I make obj an object of my test class:
>>> obj=test()
>>>
>>> obj
<__main__.test object at 0x00000000031B2390>
>>> type(obj)
<class '__main__.test'>
>>>
As you can see above, obj has two features: a value and a type.
Below, I assign the value of obj, as a string, to another variable called var1:
>>> var1='<__main__.test object at 0x00000000031B2390>'
>>>
>>> type(var1)
<class 'str'>
>>>
As you can see above, obj and var1 are equal in value but different in type. And, as you know, we can convert an object to a string using the str() function, as below:
>>> obj=str(obj)
>>> obj
'<__main__.test object at 0x00000000031B2390>'
>>>
Now, I want to know whether there is any way to reverse the above function. I mean, is there any way to turn a string-type variable into an object?
I mean, is there any way to make var1 equal to obj?
In other words, assume that I know <__main__.test object at 0x00000000031B2390> is the value of an object of a class, but I know neither the name of the object nor the name of the class. Now I want to create another object of that class. Is there any way?
You are confusing the representation of an object (obj.__repr__(), or repr(obj)) with its value (which doesn't necessarily have a sensible meaning for all types of object *). Here is an example of an object with a misleading representation:
>>> a = 1
>>> a
1
>>> class FakeInt(object):
        def __repr__(self):
            return "1"
>>> b = FakeInt()
>>> b
1
>>> a == b
False
>>> a + b
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    a + b
TypeError: unsupported operand type(s) for +: 'int' and 'FakeInt'
a looks like b, but isn't equal to it and cannot be used in the same way.
It is conventional to implement __repr__ and __eq__ instance methods such that eval(repr(obj)) == obj. This allows you to create new objects that are equal to the existing ones, for example:
>>> class SillyString(object):
        def __init__(self, s):
            self.string = s
        def __repr__(self):
            return "SillyString({0.string!r})".format(self)
        def __eq__(self, other):
            return self.string == other.string
>>> s = SillyString("foo")
>>> repr(s)
"SillyString('foo')"
>>> eval(repr(s)) == s
True # equal to s
>>> eval(repr(s)) is s
False # but not identical to s, i.e. a separate object
However this seems pointless for your test objects, because they don't have attributes or methods to copy anyway.
* For example, here is an object definition:
class some_obj(object):
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
What should its "value" be? foo? bar? Some combination of the two?
There are a fair few misconceptions here.
First, it is untrue to say that a variable has two things, a value and a type. It has one thing only: a value. That value has a type. Variables in Python do not have types, they are simply names pointing at things.
Secondly, assigning a string that looks like the repr of an object does not somehow make the value equal to the object. A string is just a string.
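A short sketch of both points. It assumes you still hold a reference to the object itself (not just its printed form) and that the class can be constructed without arguments; the class name Test is only for illustration.

```python
class Test:
    pass


obj = Test()
text = repr(obj)            # e.g. '<__main__.Test object at 0x...>'

print(type(text).__name__)  # str
print(text == obj)          # False: a string never equals the instance

# With the object itself (not its repr) in hand, its class is recoverable:
cls = type(obj)
another = cls()             # assumes the constructor needs no arguments
print(isinstance(another, Test), another is obj)   # True False
```

From the string alone there is nothing to recover: the default repr only names the class and a memory address.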

Method inside a method in Python

I have seen source code where more than one method is called on an object, e.g. x.y().z(). Can someone please explain this to me? Does this mean that z() is inside y(), or what?
This calls the method y() on object x, then calls the method z() on the result of y(); the value of the whole expression is the result of z().
For example:
friendsFavePizzaTopping = person.getBestFriend().getFavoritePizzaTopping()
After this, friendsFavePizzaTopping is the person's best friend's favorite pizza topping.
Important to note: getBestFriend() must return an object that has the method getFavoritePizzaTopping(). If it does not, an AttributeError will be raised.
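To make that concrete, here is a small runnable sketch. The Person class, its constructor arguments, and the sample names are assumptions made up for this example; only the two method names come from the line above.

```python
class Person:
    def __init__(self, name, favorite_topping, best_friend=None):
        self.name = name
        self.favorite_topping = favorite_topping
        self.best_friend = best_friend

    def getBestFriend(self):
        return self.best_friend          # another Person, or None

    def getFavoritePizzaTopping(self):
        return self.favorite_topping     # a str


alice = Person("Alice", "mushroom")
bob = Person("Bob", "pepperoni", best_friend=alice)

# The chained form ...
chained = bob.getBestFriend().getFavoritePizzaTopping()

# ... is the same as doing it in two steps.
friend = bob.getBestFriend()
stepwise = friend.getFavoritePizzaTopping()

print(chained, stepwise)   # mushroom mushroom

# alice has no best friend, so getBestFriend() returns None and the chain fails.
try:
    alice.getBestFriend().getFavoritePizzaTopping()
except AttributeError as exc:
    print("AttributeError:", exc)
```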
Each method is evaluated in turn, left to right. Consider:
>>> s='HELLO'
>>> s.lower()
'hello'
>>> s='HELLO '
>>> s.lower()
'hello '
>>> s.lower().strip()
'hello'
>>> s.lower().strip().upper()
'HELLO'
>>> s.lower().strip().upper().replace('H', 'h')
'hELLO'
The requirement is that the object on the left of each dot must provide the method on its right. Often that means that the objects are similar types -- or at least share compatible methods or an understood cast.
As an example, consider this class:
class Foo:
    def __init__(self, name):
        self.name=name
    def m1(self):
        return Foo(self.name+'=>m1')
    def m2(self):
        return Foo(self.name+'=>m2')
    def __repr__(self):
        return '{}: {}'.format(id(self), self.name)
    def m3(self):
        return .25 # return is no longer a Foo
Notice that, in the style of an immutable type, each method returns a new object (a new Foo for m1 and m2, or a new float for m3). Now try those methods:
>>> foo = Foo('init')
>>> foo
4463545376: init
>>> foo.m1()
4463545304: init=>m1
^^^^ different object id
>>> foo
4463545376: init
^^^^ foo still the same because you need to assign it to change
Now assign:
>>> foo=foo.m1().m2()
>>> foo
4464102576: init=>m1=>m2
Now use m3() and it will be a float; not a Foo anymore:
>>> foo=foo.m1().m2().m3()
>>> foo
0.25
Now a float -- can't use Foo methods anymore:
>>> foo.m1()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'm1'
But you can use float methods:
>>> foo.as_integer_ratio()
(1, 4)
In the case of:
x.y().z()
You're almost always looking at immutable objects. Mutable objects don't return anything that would HAVE a function like that (for the most part, but I'm simplifying). For instance...
class x:
    def __init__(self):
        self.y_done = False
        self.z_done = False
    def y(self):
        new_x = x()
        new_x.y_done = True
        return new_x
    def z(self):
        new_x = x()
        new_x.z_done = True
        return new_x
You can see that each of x.y and x.z returns an x object. That object is used to make the consecutive call, e.g. in x.y().z(), x.z is not called on x, but on x.y().
x.y().z() =>
tmp = x.y()
result = tmp.z()
In @dawg's excellent example, he's using strings (which are immutable in Python) whose methods return strings.
string = 'hello'
string.upper() # returns a NEW string with value "HELLO"
string.upper().replace("E","O") # returns a NEW string that's based off "HELLO"
string.upper().replace("E","O") + "W"
# "HOLLOW"
The . "operator" is Python syntax for attribute access. x.y is (nearly) identical to
getattr(x, 'y')
so x.y() is (nearly) identical to
getattr(x, 'y')()
(I say "nearly identical" because it's possible to customize attribute access for a user-defined class. From here on out, I'll assume no such customization is done, and you can assume that x.y is in fact identical to getattr(x, 'y').)
If the thing that x.y() returns has an attribute z such that
foo = getattr(x, 'y')
bar = getattr(foo(), 'z')
is legal, then you can chain the calls together without needing the name foo in the middle:
bar = getattr(getattr(x, 'y')(), 'z')
Converting back to dot notation gives you
bar = getattr(x.y(), 'z')
or simply
bar = x.y().z
so calling bar() is the same as calling x.y().z().
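The equivalence can be checked directly. In this sketch Outer and Inner are hypothetical classes invented for the example, with no customized attribute access:

```python
class Inner:
    def z(self):
        return "z result"


class Outer:
    def y(self):
        return Inner()


x = Outer()

# Dot notation and the spelled-out getattr form do the same thing.
dotted = x.y().z()
spelled_out = getattr(getattr(x, "y")(), "z")()

print(dotted, spelled_out)   # z result z result
```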
x.y().z() means that the x object has the method y() and the object returned by x.y() has the method z(). If you want to apply y() to x and then apply z() to the result, you write x.y().z(). This is like:
val = x.y()
result = val.z()
Example:
my_dict = {'key':'value'}
my_dict is a dict object. my_dict.get('key') returns 'value', which is a str object, so I can now call any str method on it, like:
my_dict.get('key').upper()
This will return 'VALUE'.
That is (sometimes a sign of) bad code.
It violates the Law of Demeter. Here is a quote from Wikipedia explaining what is meant:
Each unit should have only limited knowledge about other units: only units "closely" related to the current unit.
Each unit should only talk to its friends; don't talk to strangers.
Only talk to your immediate friends.
Suppose you have a car, which itself has an engine:
class Car:
    def __init__(self):
        self._engine = None
    @property
    def engine(self):
        return self._engine
    @engine.setter
    def engine(self, value):
        self._engine = value
class Porsche_engine:
    def start(self):
        print("starting")
So if you make a new car and set the engine to Porsche you could do the following:
>>> from car import *
>>> c=Car()
>>> e=Porsche_engine()
>>> c.engine=e
>>> c.engine.start()
starting
If you are making this call from another object, that object has knowledge not only of the Car but also of the engine, which is bad design.
Additionally, if you do not know whether a Car has an engine, calling start directly
>>> c=Car()
>>> c.engine.start()
may result in an error:
AttributeError: 'NoneType' object has no attribute 'start'
Edit:
To avoid (further) misunderstandings and misreadings of what I am saying:
There are two usages:
1) As I pointed out, an object calling methods on other objects returned from a third object is a violation of the LoD. This is one way to read the question.
2) An exception to that is method chaining, which is not bad design.
A better design would be for the Car itself to have a start() method which delegates to the engine.
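A minimal sketch of that delegation. Passing the engine to the constructor, returning a string instead of printing, and raising RuntimeError for a missing engine are all choices made for this example, not part of the code above:

```python
class PorscheEngine:
    def start(self):
        return "starting"


class Car:
    def __init__(self, engine=None):
        self._engine = engine

    def start(self):
        # Delegate to the engine; callers never touch the engine directly.
        if self._engine is None:
            raise RuntimeError("this car has no engine")
        return self._engine.start()


car = Car(PorscheEngine())
print(car.start())   # starting

try:
    Car().start()
except RuntimeError as exc:
    print(exc)       # this car has no engine
```

Callers now depend only on Car, and the missing-engine case is reported by Car itself instead of surfacing as an AttributeError on None.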

Test equality of two functions in python

I want to make two functions equal to each other, like this:
def fn_maker(fn_signature):
    def _fn():
        pass
    _fn.signature = fn_signature
    return _fn
# test equality of two function instances based on the equality of their signature values
>>> fa = fn_maker(1)
>>> fb = fn_maker(1)
>>> fc = fn_maker(2)
>>> fa == fb # should be True, same signature values
True
>>> fa == fc # should be False, different signature values
False
How should I do it? I know I could probably override __eq__ and __ne__ if fa, fb, fc were instances of some class. But here __eq__ is not in dir(fa), and adding it to the list doesn't work.
I figured out some workaround like using a cache, e.g.,
def fn_maker(fn_signature):
    if fn_signature in fn_maker.cache:
        return fn_maker.cache[fn_signature]
    def _fn():
        pass
    _fn.signature = fn_signature
    fn_maker.cache[fn_signature] = _fn
    return _fn
fn_maker.cache = {}
This way there is a guarantee that there is only one function for the same signature value (kinda like a singleton). But I am really looking for some neater solutions.
If you turned your functions into instances of some class that overrides __call__() as well as the comparison operators, it will be very easy to achieve the semantics you want.
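A minimal sketch of that idea. SignedFunction is a hypothetical class name, and the empty __call__ body mirrors the placeholder _fn from the question:

```python
class SignedFunction:
    """Callable whose equality is defined by its signature value."""

    def __init__(self, signature):
        self.signature = signature

    def __call__(self, *args, **kwargs):
        # Placeholder body, mirroring the empty _fn in the question.
        return None

    def __eq__(self, other):
        if not isinstance(other, SignedFunction):
            return NotImplemented
        return self.signature == other.signature

    def __hash__(self):
        # Keep instances usable in sets/dicts, consistent with __eq__.
        return hash(self.signature)


def fn_maker(fn_signature):
    return SignedFunction(fn_signature)


fa = fn_maker(1)
fb = fn_maker(1)
fc = fn_maker(2)

print(fa == fb)   # True
print(fa == fc)   # False
print(fa())       # None
```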
It is not possible to override the __eq__ implementation for functions (tested with Python 2.7):
>>> def f():
... pass
...
>>> class A(object):
... pass
...
>>> a = A()
>>> a == f
False
>>> setattr(A, '__eq__', lambda x,y: True)
>>> a == f
True
>>> setattr(f.__class__, '__eq__', lambda x,y: True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'function'
I don't think it's possible.
But overriding __call__ seems a nice solution to me.
