I am writing a Python2 module that emulates a certain library. The results may be float, int, long, unicode, str, tuple, list, and custom objects. Lists may not contain lists, but they may contain tuples. Tuples may not contain lists or tuples. Otherwise, lists and tuples may contain any of the other types listed above.
(Actually, the module should not return long or str, but if it does, they should be caught and reported as different when compared to int and unicode, respectively.)
I am writing a testing program that checks the results against known answers by the library my module tries to emulate. The obvious answer would be to test the values and the types, but one problem I'm facing is that in corner cases, possible results to test for are -0.0 (which should be distinguished from 0.0) and NaN (Not a Number - a value a float can take).
However:
>>> a = float('nan')
>>> b = float('nan')
>>> a == b
False
>>> c = float('-0.0')
>>> c
-0.0
>>> d = 1.0 - 1.0
>>> c == d
True
The is operator doesn't help a bit:
>>> a is b
False
>>> d is 0.0
False
repr helps:
>>> repr(a) == repr(b)
True
>>> repr(c) == repr(d)
False
>>> repr(d) == repr(0.0)
True
But only to a point, since it doesn't help with objects:
>>> class e:
...     pass
...
>>> f = e()
>>> g = e()
>>> f.x = float('nan')
>>> g.x = float('nan')
>>> f == g
False
>>> repr(f) == repr(g)
False
This works though:
>>> repr(f.__dict__) == repr(g.__dict__)
True
But it fails with tuples and lists:
>>> h = [float('nan'), f]
>>> i = [float('nan'), g]
>>> h == i
False
>>> repr(h) == repr(i)
False
>>> repr(h.__dict__) == repr(i.__dict__)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__dict__'
It seems I'm close, so I need to know:
Is there a simpler way to check for actual equality that doesn't have the burden of converting to string?
If not, how would I go about comparing lists or tuples containing objects?
Edit: To be clear, what I'm after is a full comparison function. My test function looks roughly like this:
>>> def test(expression, expected):
...     actual = eval(expression)
...     if not reallyequal(actual, expected):
...         report_error(expression, actual, expected)
My question concerns what reallyequal() should look like.
Edit 2: I've found the Python standard module unittest but unfortunately none of the checks covers this use case, so it seems that if I intend to use it, I should use something like self.assertTrue(reallyequal(actual, expected)).
I'm actually surprised that it's so hard to make unit tests including expected NaNs and minus zeros nested within the results. I'm still using the repr solution which is a half-solution, but I'm open to other ideas.
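A minimal sketch of how the unittest idea above could be wired up, assuming the repr-based reallyequal() described in this question. The class name CornerCaseTest and the helper assertReallyEqual are made up for illustration; unittest itself provides no such assertion.

```python
import unittest


def reallyequal(actual, expected):
    # Assumption: the repr-based half-solution mentioned in the question.
    return repr(actual) == repr(expected)


class CornerCaseTest(unittest.TestCase):
    def assertReallyEqual(self, actual, expected):
        # Hypothetical helper, not part of unittest.
        self.assertTrue(reallyequal(actual, expected),
                        "%r != %r" % (actual, expected))

    def test_nan(self):
        self.assertReallyEqual(float('nan'), float('nan'))

    def test_negative_zero(self):
        self.assertReallyEqual(-0.0, -0.0)
        self.assertFalse(reallyequal(-0.0, 0.0))

    def test_nested(self):
        self.assertReallyEqual([float('nan'), (1, -0.0)],
                               [float('nan'), (1, -0.0)])
```

Wrapping the check in one helper keeps the failure message in a single place, so a mismatch reports both values instead of a bare "False is not true".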
Here is one implementation:
import math

def really_equal(actual, expected, tolerance=0.0001):
    """Compare actual and expected for 'actual' equality."""
    # 1. Both same type?
    if not isinstance(actual, type(expected)):
        return False
    # 2. Deal with floats (edge cases, tolerance)
    if isinstance(actual, float):
        if actual == 0.0:
            # str() keeps the sign, so 0.0 and -0.0 stay distinct
            return str(actual) == str(expected)
        elif math.isnan(actual):
            return math.isnan(expected)
        return abs(actual - expected) < tolerance
    # 3. Deal with tuples and lists (item-by-item, recursively)
    if isinstance(actual, (tuple, list)):
        # zip() stops at the shorter sequence, so compare lengths first
        return len(actual) == len(expected) and all(
            really_equal(i1, i2, tolerance) for i1, i2 in zip(actual, expected))
    # 4. Fall back to 'classic' equality
    return actual == expected
A few of your edge cases from "classic" equality:
>>> float('nan') == float('nan')
False
>>> really_equal(float('nan'), float('nan'))
True
>>> 0.0 == -0.0
True
>>> really_equal(0.0, -0.0)
False
>>> "foo" == u"foo"
True
>>> really_equal("foo", u"foo")
False
>>> 1L == 1
True
>>> really_equal(1L, 1)
False
Classes should implement their own __eq__ "magic method" to determine whether or not two instances are equal - they will fall through to # 4 and be compared there:
>>> class Test(object):
...     def __init__(self, val):
...         self.val = val
...     def __eq__(self, other):
...         return self.val == other.val
...
>>> a = Test(1)
>>> b = Test(1)
>>> really_equal(a, b)
True
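For the float corner cases alone, string conversion is not strictly needed. The following is only a sketch, assuming exact matching rather than a tolerance; same_float is a hypothetical name. math.isnan covers NaN, and math.copysign exposes the sign of a zero.

```python
import math


def same_float(a, b):
    """Hypothetical helper: NaN matches NaN, and 0.0 does not match -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    # copysign(1.0, x) is -1.0 for -0.0 and 1.0 for 0.0, so it tells them apart.
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
```

Whether to keep a tolerance, as in the function above, or to demand exact matches depends on how closely the emulated library must be reproduced.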
From the answers and comments it seems clear that the answer to my first question (is there a simpler way than using repr()?) is no, there is no simpler way. So I've researched more on how to accomplish this as simply as possible and I've come up with this solution which answers my second question.
repr() works for the most part, but fails on objects of custom classes. Since the default repr() of a custom object is not useful as-is anyway for any meaningful purpose, what I've done is to override the __repr__ method of each base class like this:
class MyClass:
    def __repr__(self):
        return self.__class__.__name__ + "(" \
            + repr(sorted(self.__dict__.items(), key=lambda t: t[0])) + ")"
Now I can use repr() on any of the values and get an expression that actually represents these values uniquely, that my test program can catch.
def reallyequal(actual, expected):
    return repr(actual) == repr(expected)
(which I will actually embed in the test function due to its simplicity).
Here it is in action:
>>> reallyequal(-0.0, 0.0)
False
>>> reallyequal(float('nan'),float('nan'))
True
>>> f = MyClass()
>>> f.x = float('nan')
>>> g = MyClass()
>>> g.x = float('nan')
>>> reallyequal(f, g)
True
>>> h = [f,3]
>>> i = [g,4]
>>> reallyequal(h, i)
False
>>> i[1] = 3
>>> reallyequal(h, i)
True
>>> g.x = 1
>>> reallyequal(h, i)
False
>>> f.x = 1L
>>> reallyequal(h, i)
False
>>> f.x = 1
>>> reallyequal(h, i)
True
Edit: Edited to incorporate commenter's suggestions re repr results with __dict__.
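For comparison, here is a sketch of a recursive check that walks into lists, tuples and instance attributes without going through repr(). It is not the approach used above: deep_equal is a hypothetical name, and the sketch assumes new-style classes (on Python 2, type() does not distinguish instances of different old-style classes).

```python
import math


def deep_equal(actual, expected):
    """Hypothetical strict comparison: same types, same values, recursively."""
    if type(actual) is not type(expected):
        return False
    if isinstance(actual, float):
        if math.isnan(actual) or math.isnan(expected):
            return math.isnan(actual) and math.isnan(expected)
        # Also compare the sign so that 0.0 and -0.0 differ.
        return (actual == expected and
                math.copysign(1.0, actual) == math.copysign(1.0, expected))
    if isinstance(actual, (list, tuple)):
        return (len(actual) == len(expected) and
                all(deep_equal(a, e) for a, e in zip(actual, expected)))
    if isinstance(actual, dict):
        return (set(actual) == set(expected) and
                all(deep_equal(actual[k], expected[k]) for k in actual))
    if hasattr(actual, '__dict__'):
        # Custom objects: compare their attribute dictionaries.
        return deep_equal(vars(actual), vars(expected))
    return actual == expected
```

The trade-off against the repr() version is more code in exchange for not having to override __repr__ in every class.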
Related
I have a function in Python called object_from_DB. The definition isn't important except that it takes an ID value as an argument, uses the sqlite3 library to pull matching values from a table in a .db file, and then uses those values as arguments in the initialization of an object. The database is in no way changed by the use of this function.
This sample code, in light of this, baffles me.
>>> x = object_from_DB(422)
>>> y = object_from_DB(422)
>>> x == y
False
Why does this happen, and what technique will make x == y evaluate to True?
By default, two distinct instances of any user-defined class are unequal:
>>> class X: pass
...
>>> a = X()
>>> b = X()
>>> a == b
False
If you want different behaviour, you have to define it:
class Y:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value
>>> c = Y(3)
>>> d = Y(3)
>>> e = Y(4)
>>> c == d
True
>>> d == e
False
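A slightly fuller sketch of the same idea, in case instances are also compared with != or stored in sets. Record is a hypothetical stand-in for whatever class object_from_DB builds, and its two attributes are made up for illustration.

```python
class Record(object):
    """Hypothetical stand-in for the class returned by object_from_DB."""

    def __init__(self, id_, name):
        self.id_ = id_
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Record):
            # Let Python try the other operand instead of raising AttributeError.
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash((self.id_, self.name))
```

Returning NotImplemented for foreign types means that comparing a Record with, say, an int simply gives False rather than an exception.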
I have trouble understanding the meaning of not in a statement such as
not int(x)
It evaluates to True if x is equal to 0.
But if x is any other number it evaluates to False.
I would like an explanation for this behavior, thanks.
not some_object will return True if some_object is falsy, i.e. if bool(some_object) will return False.
For any integer z, bool(z) will always be True unless z==0. So not int(x) is just a way of checking whether x, after you convert it to an integer (using int), is zero.
Demo:
>>> x = '-7' # this is not 0 after conversion to an integer
>>> bool(int(x))
True
>>> x = '0'
>>> bool(x) # a non-empty string is truthy
True
>>> bool(int(x))
False
>>> not int(x) # you can omit the call to bool in a boolean context
True
In a boolean context, we can omit the call to bool. Using the implicit booleanness of objects can come in handy, especially when you want to check if some object is empty (such as empty strings, sets, lists, dictionaries...).
>>> not {}
True
>>> not []
True
>>> not set()
True
>>> not ''
True
>>> not tuple()
True
>>> not 0.0
True
>>> not 0j
True
>>> not [1,2,3]
False
The methods involved here are __nonzero__ for Python2 and __bool__ for Python3. Theoretically, we could override these. Consider the following Python2 example:
>>> class LyingList(list):
...     def __nonzero__(self): # for Py3, override __bool__
...         return True
...
>>> liar = LyingList([])
>>> liar
[]
>>> not liar
False
uh oh!
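If a class has to behave the same on both versions, one common pattern is to define __bool__ and alias __nonzero__ to it. A small sketch, with Basket as a hypothetical class:

```python
class Basket(object):
    """Hypothetical container that is falsy when empty on Python 2 and 3."""

    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):            # hook used by Python 3
        return len(self.items) > 0

    __nonzero__ = __bool__         # Python 2 looks up this name instead


empty = Basket()
full = Basket([1, 2, 3])
print(not empty)   # truthiness follows the number of items
print(not full)
```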
I have defined a list as below:
list = [1,3,2,[4,5,6]]
then defined a comparator method as below:
def reverseCom(x,y):
    if(x>y):
        return -1
    elif(x<y):
        return 1
    else:
        return 0
Now I have sorted the list using reverseCom:
list.sort(reverseCom)
print list
Result : [[4, 5, 6], 3, 2, 1]
Yet the element [4, 5, 6] is not comparable with the other elements of the list. Why doesn't this throw an error?
Can you help me understand how sort works with a user-defined comparator in Python?
This is a Python 2 quirk. In Python 2, numeric and non numeric values are comparable, and numeric values are always considered to be less than the value of container objects:
>>> 1 < [1]
True
>>> 1 < [2]
True
>>> 1558 < [1]
True
>>> 1 < {}
True
When comparing two container values of different types, on the other hand, it is the name of their type that is taken into consideration:
>>> () < []
False
>>> 'tuple' < 'list'
False
>>> {} < []
True
>>> 'dict' < 'list'
True
This feature, however, has been dropped in Python 3, which made numeric and non-numeric values no longer comparable:
>>> 1 < [1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < list()
EDIT: this next explanation is fully experimentation-based, and I couldn't find sound documentation to back it up. If any one does find it, I'd be glad to read through it.
It appears Python 2 has even more rules when it comes to comparison of user-defined objects/non-container objects.
In this case it appears that numeric values are always greater than non-numeric non-container values.
>>> class A: pass
...
>>> a = A()
>>> 1 > a
True
>>> 2.7 > a
True
Now, when comparing two objects of different, non-numeric, non-container types, it seems that it is their address that is taken into account:
>>> class A: pass
...
>>> class B: pass
...
>>> a = A()
>>> a
<__main__.A instance at 0x0000000002265348>
>>> b = B()
>>> b
<__main__.B instance at 0x0000000002265048>
>>> a < b
False
>>> b < a
True
Which is really bananas, if you ask me.
Of course, all that can be changed around if you care to override the __lt__() and __gt__() methods inside your class definition, which determine the standard behavior of the < and > operators.
Further documentation on how these methods operate can be found here.
Bottom line: avoid comparisons between different types as much as you can. The result is unpredictable, unintuitive and not all that well documented. Also, use Python 3 whenever possible.
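As a sketch of what the comparator looks like once the list holds a single type: functools.cmp_to_key (available from Python 2.7 and 3.2) adapts an old-style comparison function, and reverse=True gives the same order without one. The list of numbers here is made up for illustration.

```python
from functools import cmp_to_key


def reverse_cmp(x, y):
    # Same contract as the question's comparator: negative means x sorts first.
    if x > y:
        return -1
    elif x < y:
        return 1
    return 0


numbers = [1, 3, 2, 6, 5, 4]
by_cmp = sorted(numbers, key=cmp_to_key(reverse_cmp))
by_flag = sorted(numbers, reverse=True)
print(by_cmp == by_flag)   # both give descending order
```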
Your comparator actually works, i.e., does not throw any error:
In [9]: reverseCom([4,5,6],1)
Out[9]: -1
In [10]: reverseCom([4,5,6],2)
Out[10]: -1
In [11]: reverseCom([4,5,6],3)
Out[11]: -1
It works because, in Python 2, list instances always compare greater than int instances:
In [12]: [1,2,3] > 5
Out[12]: True
In [13]: ['hello'] > 5
Out[13]: True
In [14]: [] > -1
Out[14]: True
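If lists really must sort ahead of numbers, the ordering can be spelled out instead of relying on this Python 2 behaviour. A sketch, assuming a hypothetical key function named lists_first; it also runs on Python 3 because an int is never compared directly with a list:

```python
data = [1, 3, 2, [4, 5, 6]]


def lists_first(value):
    # The leading boolean decides the order between lists and numbers, so the
    # second element is only compared when both values are of the same kind.
    return (isinstance(value, list), value)


result = sorted(data, key=lists_first, reverse=True)
print(result)   # [[4, 5, 6], 3, 2, 1]
```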
In Python, if I have the following code:
r = Numeric(str)
i = int(r)
if r == i :
    return i
return r
Is this equivalent to:
r = Numeric(str)
return r
Or do the == values of different types r and i give different return values r and i?
It all depends on whether the class implements an adequate __eq__ method to override the == operator.
Edit: Added a little example:
>>> class foo:
...     def __init__(self,x):
...         self.x = x
...     def __eq__(self,y):
...         return int(self.x)==int(y)
...
>>> f = foo(5)
>>> f == '5'
True
>>> 5 == '5'
False
Let's see:
>>> float(2) == int(2)
True
Different types can be considered equal using ==.
Question: "do the == values of different types r and i give different return values r and i?"
Answer: clearly they are different; they have different types.
>>> print(type(i))
<type 'int'>
>>> print(type(n))
<class '__main__.Numeric'>
In the above example, I declared a class called Numeric to have something to test. If you actually have a module that implements a class called Numeric, it won't say __main__.Numeric but something else.
If the class implements a __eq__() method function, then the results of == will depend on what that function does.
class AlwaysEqual(object):
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return True
With the above, we can now do:
>>> x = AlwaysEqual(42)
>>> print(x == 6*9)
True
>>> print(x == "The answer to life, the universe, and everything")
True
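To make the difference in the original snippet concrete, here is a sketch with a made-up Numeric class (the real one is not shown in the question, so its behaviour is an assumption). The function simplify is a hypothetical wrapper around the question's code.

```python
class Numeric(object):
    """Hypothetical stand-in: wraps a float parsed from text."""

    def __init__(self, text):
        self.value = float(text)

    def __int__(self):
        return int(self.value)

    def __eq__(self, other):
        if isinstance(other, Numeric):
            return self.value == other.value
        if isinstance(other, (int, float)):
            return self.value == other
        return NotImplemented

    def __hash__(self):
        return hash(self.value)


def simplify(text):
    r = Numeric(text)
    i = int(r)
    if r == i:
        return i    # a plain int
    return r        # still a Numeric


print(type(simplify("3")))
print(type(simplify("3.5")))
```

Under these assumptions the two versions are not equivalent: the first can hand back an int, the second always hands back a Numeric.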
... the is keyword that can be used for equality in strings.
>>> s = 'str'
>>> s is 'str'
True
>>> s is 'st'
False
I tried both __is__() and __eq__() but they didn't work.
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __is__(self, s):
...         return self.s == s
...
>>>
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work
False
>>>
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __eq__(self, s):
...         return self.s == s
...
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work, but again failed
False
>>>
Testing strings with is only works when the strings are interned. Unless you really know what you're doing and explicitly interned the strings you should never use is on strings.
is tests for identity, not equality. That means Python simply compares the memory address an object resides in. is basically answers the question "Do I have two names for the same object?", so overloading it would make no sense.
For example, ("a" * 100) is ("a" * 100) is False. Usually Python writes each string into a different memory location; interning mostly happens for string literals.
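A small sketch of the distinction, using a string built at run time. Whether a and b happen to be the same object is an implementation detail, so the sketch only relies on the two things that are guaranteed: == compares values, and explicitly interned equal strings are the same object.

```python
try:
    from sys import intern   # Python 3
except ImportError:
    pass                      # Python 2: intern is already a builtin

a = "".join(["st", "r"])      # assembled at run time
b = "str"

same_value = (a == b)                      # value equality
same_object = intern(a) is intern(b)       # identity after explicit interning
print(same_value)
print(same_object)
```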
The is operator is equivalent to comparing id(x) values. For example:
>>> s1 = 'str'
>>> s2 = 'str'
>>> s1 is s2
True
>>> id(s1)
4564468760
>>> id(s2)
4564468760
>>> id(s1) == id(s2) # equivalent to `s1 is s2`
True
id is currently implemented to use pointers as the comparison. So you can't overload is itself, and AFAIK you can't overload id either.
So, you can't. Unusual in python, but there it is.
The Python is keyword tests object identity. You should NOT use it to test for string equality. It may seem to work frequently because Python implementations, like those of many very high level languages, perform "interning" of strings. That is to say that string literals and values are internally kept in a hashed list and those which are identical are rendered as references to the same object. (This is possible because Python strings are immutable).
However, as with any implementation detail, you should not rely on this. If you want to test for equality use the == operator. If you truly want to test for object identity then use is --- and I'd be hard-pressed to come up with a case where you should care about string object identity. Unfortunately you can't count on whether two strings are somehow "intentionally" identical object references because of the aforementioned interning.
The is keyword compares objects (or, rather, compares if two references are to the same object).
Which is, I think, why there's no mechanism to provide your own implementation.
It happens to work sometimes on strings because Python stores strings 'cleverly', such that when you create two identical strings they are stored in one object.
>>> a = "string"
>>> b = "string"
>>> a is b
True
>>> c = "str"+"ing"
>>> a is c
True
You can hopefully see the reference vs data comparison in a simple 'copy' example:
>>> a = {"a":1}
>>> b = a
>>> c = a.copy()
>>> a is b
True
>>> a is c
False
If you are not afraid of messing with bytecode, you can intercept and patch COMPARE_OP with the argument 8 ("is") to call your hook function on the objects being compared. The dis module documentation is a good starting point.
And don't forget to intercept __builtin__.id() too, in case someone writes id(a) == id(b) instead of a is b.
'is' compares object identity whereas == compares values.
Example:
a=[1,2]
b=[1,2]
#a==b returns True
#a is b returns False
p=q=[1,2]
#p==q returns True
#p is q returns True
is fails to compare a string variable to string value and two string variables when the string starts with '-'. My Python version is 2.6.6
>>> s = '-hi'
>>> s is '-hi'
False
>>> s = '-hi'
>>> k = '-hi'
>>> s is k
False
>>> '-hi' is '-hi'
True
You can't overload the is operator. What you want to overload is the == operator. This can be done by defining a __eq__ method in the class.
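A sketch of what that could look like for the class in the question. The constructor argument and the __ne__ and __hash__ methods are additions for illustration, not part of the original class.

```python
class MyString(object):
    def __init__(self, s='string'):
        self.s = s

    def __eq__(self, other):
        # Accept either another MyString or a plain string.
        if isinstance(other, MyString):
            return self.s == other.s
        return self.s == other

    def __ne__(self, other):
        # Needed on Python 2, where != is not derived from ==.
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.s)


m = MyString()
print(m == 'string')   # compares values through __eq__
print(m == 'ss')
```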
You are using identity comparison. == is probably what you want. The exception to this is when you want to check whether one item and another are the EXACT same object at the same memory position. In your examples, the items aren't the same, since one is of a different type (MyString) than the other (str). Also, there's no such thing as someclass.__is__ in Python (unless, of course, you put it there yourself). If there were, is could no longer be relied on to simply compare memory locations.
When I first encountered the is keyword, it confused me as well. I would have thought that is and == were no different. They produced the same output from the interpreter on many objects. This type of assumption is actually EXACTLY what is... is for. It's the python equivalent "Hey, don't mistake these two objects. they're different.", which is essentially what [whoever it was that straightened me out] said. Worded much differently, but one point == the other point.
For some helpful examples and some text to help with the sometimes confusing differences, visit a document from python.org's mail host written by "Danny Yoo" or, if that's offline, use the unlisted pastebin I made of its body.
In case they, in some 20 or so blue moons (blue moons are a real event), are both down, I'll quote the code examples:
###
>>> my_name = "danny"
>>> your_name = "ian"
>>> my_name == your_name
0 #or False
###
###
>>> my_name[1:3] == your_name[1:3]
1 #or True
###
###
>>> my_name[1:3] is your_name[1:3]
0
###
Assertion errors can easily arise when the is keyword is used to compare objects. For example, objects a and b might hold the same value without sharing the same memory address. Therefore, doing an
>>> a == b
is going to evaluate to
True
But if
>>> a is b
evaluates to
False
you should probably check
>>> type(a)
and
>>> type(b)
These might be different and a reason for failure.
Because of string interning, this could look strange:
a = 'hello'
'hello' is a #True
b= 'hel-lo'
'hel-lo' is b #False