Curiously:
>>> a = 123
>>> b = 123
>>> a is b
True
>>> a = 123.
>>> b = 123.
>>> a is b
False
It seems a is b is more or less defined as id(a) == id(b). It is easy to introduce bugs this way:
basename, ext = os.path.splitext(fname)
if ext is '.mp3':
    # do something
else:
    # do something else
Some fnames unexpectedly ended up in the else block. The fix is simple: use ext == '.mp3' instead. Nonetheless, if ext is '.mp3' seems, on the surface, like a nice Pythonic way to write this, and it's arguably more readable than the "correct" way.
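For illustration, here is a CPython session (a sketch; exact interning behaviour is an implementation detail, and newer interpreters even emit a SyntaxWarning for is with a literal): the extension returned by splitext compares equal to '.mp3' but is usually not the same object.
>>> import os
>>> basename, ext = os.path.splitext('song.mp3')
>>> ext == '.mp3'
True
>>> ext is '.mp3'
False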
Since strings are immutable, what are the technical details of why it's wrong? When is an identity check better, and when is an equality check better?
They are fundamentally different.
== compares by calling the __eq__ method
is returns true if and only if the two references are to the same object
So, in comparison with, say, Java:
is is the same as == for objects
== is the same as equals for objects
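A minimal sketch of the difference, using two equal lists:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a == b      # value comparison via __eq__
True
>>> a is b      # two distinct objects
False
>>> c = a
>>> c is a      # two names for the same object
True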
As far as I can tell, is checks for object identity equivalence. As there's no compulsory "string interning", two strings that just happen to have the same characters in sequence are, typically, not the same string object.
When you extract a substring from a string (or, really, any subsequence from a sequence), you will end up with two different objects, containing the same value(s).
So, use is when and only when you are comparing object identities. Use == when comparing values.
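For example, two identical slices of the same string compare equal but are, in CPython, distinct objects (a sketch; interning can blur this for one-character slices):
>>> s = 'hello world'
>>> s[:5] == s[:5]
True
>>> s[:5] is s[:5]
False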
A simple rule for deciding whether to use is or == in Python
Here is an easy rule (unless you want to get into Python interpreter internals or are building frameworks that do funny things with Python objects):
Use is only for None comparison.
if foo is None
Otherwise use ==.
if x == 3
Then you are on the safe side. The rationale for this is already explained in the comments above. Don't use is if you are not 100% sure why you are doing it.
It would also be useful to define a class like this to be used as the default value for constants in your API. In this case, it is more correct to use is than the == operator.
class Sentinel(object):
    """A constant object that does not change even when copied."""
    def __deepcopy__(self, memo):
        # Always return the same object because this is essentially a constant.
        return self

    def __copy__(self):
        # called via copy.copy(x)
        return self
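A hedged usage sketch (the names MISSING and get_setting are made up for illustration): create the sentinel once and test for it with is, so that callers can still pass None as a legitimate value:
import copy

MISSING = Sentinel()   # the one shared sentinel instance

def get_setting(settings, name, default=None):
    value = settings.get(name, MISSING)
    if value is MISSING:           # identity check against the unique sentinel
        return default
    return value

settings = {'volume': None}
print(get_setting(settings, 'volume', default=10))    # None (explicitly stored)
print(get_setting(settings, 'timeout', default=10))   # 10 (key absent)
assert copy.deepcopy(MISSING) is MISSING               # still the same object after copying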
PyCharm will warn you when you use is with a literal, with a message such as SyntaxWarning: "is" with a literal. Did you mean "=="?. So, when comparing with a literal, always use ==. Otherwise, you may prefer is in order to compare objects through their references.
Related
I asked a similar question: why do only str and bool variables with the same value refer to one memory location?
a = 'something'
b = 'something'
if a is b: print('True') # True
But we never wrote a = b anywhere; the interpreter saw that the strings are equal to each other and made both names refer to one memory cell.
Of course, if we assign a new value to either of these two variables there is no conflict; that variable will simply refer to another memory location:
b = 'something more'
if a is b: print('True') # False
The same thing happens with the boolean type:
a = True
b = True
if a is b: print('True') # True
At first I thought this happens with all immutable types. But no. One immutable type remains - tuple - and it behaves differently: when we assign the same values to two variables, they already refer to different memory cells. Why does this happen only with tuple among the immutable types?
a = (1,9,8)
b = (1,9,8)
if a is b: print('True') # False
In Python, == checks for value equality, while is basically checks whether the two names refer to the same object, like so: id(a) == id(b).
Python has some built-in singletons which it starts off with (I'm guessing small integers and some commonly used strings).
So, if you dig deeper into your statement
a = 'something'
b = 'something'
id(a)
# 139702804094704
id(b)
# 139702804094704
a is b
# True
But if you change it a bit:
a = 'something else'
b = 'something else'
id(a)
# 139702804150640
id(b)
# 139702804159152
a is b
# False
We're getting False because Python uses different memory location for a and b this time, unlike before.
My guess is that with tuples (and someone correct me if I'm mistaken) Python allocates different memory every time you create one.
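For what it's worth, CPython currently caches the integers -5 through 256 as an implementation detail, which is easy to see at the boundary (results may differ in other interpreters or future versions):
>>> a = 256
>>> b = 256
>>> a is b       # inside the small-integer cache
True
>>> a = 257
>>> b = 257
>>> a is b       # outside the cache: typically two distinct objects
False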
Why do some types cache values? Because you shouldn't be able to notice the difference!
is is a very specialized operator. Nearly always you should use == instead, which will do exactly what you want.
The cases where you want to use is instead of == basically are when you're dealing with objects that have overloaded the behavior of == to not mean what you want it to mean, or where you're worried that you might be dealing with such objects.
If you're not sure whether you're dealing with such objects or not, you're probably not, which means that == is always right and you don't have to ever use is.
It can be a matter of "style points" to use is with known singleton objects, like None, but there's nothing wrong with using == there (again, in the absence of a pathological implementation of ==).
If you're dealing with potentially untrustworthy objects, then you should never do anything that may invoke a method that they control... and that's a good place to use is. But almost nobody is doing that, and those who do should be aware of the zillion other ways a malicious object could cause problems.
If an object implements == incorrectly then you can get all kinds of weird problems. In the course of debugging those problems, of course you can and should use is! But that shouldn't be your normal way of comparing objects in code you write.
The one other case where you might want to use is rather than == is as a performance optimization, if the object you're dealing with implements == in a particularly expensive way. This is not going to be the case very often at all, and most of the time there are better ways to reduce the number of times you have to compare two objects (e.g. by comparing hash codes instead) which will ultimately have a much better effect on performance without bringing correctness into question.
If you use == wherever you semantically want an equality comparison, then you will never even notice when some types sneakily reuse instances on you.
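As a toy illustration of the "pathological ==" case (the class Agreeable is made up for this example), an equality test against None can be fooled, while the identity test cannot:
class Agreeable(object):
    def __eq__(self, other):
        return True              # pathological: claims equality with everything

x = Agreeable()
print(x == None)   # True, because Agreeable.__eq__ runs
print(x is None)   # False, x is not the None object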
Python is wonderfully optimized, but I have a case where I'd like to work around it. It seems that for small numbers and strings, Python will automatically collapse multiple objects into one. For example:
>>> a = 1
>>> b = 1
>>> id(a) == id(b)
True
>>> a = str(a)
>>> b = str(b)
>>> id(a) == id(b)
True
>>> a += 'foobar'
>>> b += 'foobar'
>>> id(a) == id(b)
False
>>> a = a[:-6]
>>> b = b[:-6]
>>> id(a) == id(b)
True
I have a case where I'm comparing objects based on their Python ids. This is working really well except for the few cases where I run into small numbers. Does anyone know how to turn off this optimization for specific strings and integers? Something akin to an anti-intern()?
You shouldn't be relying on these objects to be different objects at all. There's no way to turn this behavior off without modifying and recompiling Python, and which particular objects it applies to is subject to change without notice.
You can't turn it off without re-compiling your own version of CPython.
But if you want to have "separate" versions of the same small integers, you can do that by maintaining your own id (for example a uuid4) associated with the object.
Since ints and strings are immutable, there's no obvious reason to do this - if you can't modify the object at all, you shouldn't care whether you have the "original" or a copy because there is no use-case where it can make any difference.
Related: How to create the int 1 at two different memory locations?
Sure, it can be done, but it's never really a good idea:
Z = 1

class MyString(str):
    def __init__(self, *args):
        global Z
        super(MyString, self).__init__()
        self.i = Z    # every instance gets its own serial number
        Z += 1
>>> a = MyString("1")
>>> b = MyString("1")
>>> a is b
False
By the way, to check whether two objects have the same id, just use a is b instead of id(a) == id(b).
The Python documentation on id() says
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So it's guaranteed to be unique during the object's lifetime; it must be intended as a way to tell whether two variables are bound to the same object.
In a comment on StackOverflow here, Alex Martelli says the CPython implementation is not the authoritative Python, and other correct implementations of Python can and do behave differently in some ways - and that the Python Language Reference (PLR) is the closest thing Python has to a definitive specification.
In the PLR section on objects it says much the same:
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The ‘is‘ operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).
The language reference doesn't say it's guaranteed to be unique. It also says (re: the object's lifetime):
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
and:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
This isn't actually an answer; I was hoping it would end up somewhere conclusive, but I don't want to delete it now that I've quoted and cited sources.
I'll go with turning your premise around: "Python will automatically collapse multiple objects into one" - no it won't; they were never multiple objects, and they can't be, because they have the same id().
If id() is Python's definitive answer on whether two objects are the same or different, your premise is incorrect - this isn't an optimization, it's a fundamental part of Python's view on the world.
This version accounts for wim's concerns about more aggressive interning in the future. It will use more memory, which is why I discarded it originally, but it is probably more future-proof.
>>> class Wrapper(object):
...     def __init__(self, obj):
...         self.obj = obj
>>> a = 1
>>> b = 1
>>> aWrapped = Wrapper(a)
>>> bWrapped = Wrapper(b)
>>> aWrapped is bWrapped
False
>>> aUnWrapped = aWrapped.obj
>>> bUnwrapped = bWrapped.obj
>>> aUnWrapped is bUnwrapped
True
Or a version that works like the pickle answer (wrap + pickle = wrapple):
class Wrapple(object):
    def __init__(self, obj):
        self.obj = obj

    @staticmethod
    def dumps(obj):
        return Wrapple(obj)

    def loads(self):
        return self.obj

aWrapped = Wrapple.dumps(a)
aUnWrapped = aWrapped.loads()
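A quick hedged demo, assuming the Wrapple class above: each wrapper is a distinct object, while unwrapping gives back the original:
>>> a = 1
>>> aWrapped = Wrapple.dumps(a)
>>> bWrapped = Wrapple.dumps(a)
>>> aWrapped is bWrapped
False
>>> aWrapped.loads() is a
True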
Well, seeing as no one posted a response that was useful, I'll just let you know what I ended up doing.
First, some friendly advice to someone who might read this one day. This is not recommended for normal use, so if you're contemplating it, ask yourself if you have a really good reason. There are good reasons, but they are rare, and if someone says there aren't, they just aren't thinking hard enough.
In the end, I just used pickle.dumps() on all the objects and passed the output in instead of the real object. On the other side I checked the id and then used pickle.loads() to restore the object. The nice part of this solution is that it works for all types, including None and booleans.
>>> import pickle
>>> a = 1
>>> b = 1
>>> a is b
True
>>> aPickled = pickle.dumps(a)
>>> bPickled = pickle.dumps(b)
>>> aPickled is bPickled
False
>>> aUnPickled = pickle.loads(aPickled)
>>> bUnPickled = pickle.loads(bPickled)
>>> aUnPickled is bUnPickled
True
>>> aUnPickled
1
I tracked down an error in my program to a line where I was testing for the existence of an object in a list of objects. The line always returned False, which meant that the object wasn't in the list. In fact, it kept happening, even when I did the following:
class myObject(object):
    __slots__ = ('mySlot')
    def __init__(self, myArgument):
        self.mySlot = myArgument
print(myObject(0)==myObject(0)) # prints False
a=0
print(myObject(a)==myObject(a)) # prints False
a=myObject(a)
print(a==a) # prints True
I've used deepcopy before, but I'm not experienced enough with Python to know when it is and isn't necessary, or mechanically what the difference is. I've also heard of pickling, but never used it. Can someone explain to me what's going on here?
Oh, and another thing. The line
if x in myIterable:
probably tests equality between x and each element in myIterable, right? So if I can change the perceived equality between two objects, I can modify the output of that line? Is there a built-in for that and all of the other inline operators?
"It passes the second operand to the __eq__() method of the first."
Incorrect. It passes the first operand to the __contains__() method of the second, or, if no such method exists, iterates over the second performing equality comparisons.
Perhaps you meant to use is, which compares identity instead of equality.
The line myObject(0) == myObject(0) in your code creates two different instances of myObject, and since you haven't defined __eq__, they are compared for identity (i.e. memory location).
x.__eq__(y) <==> x == y, and your assumption that if x in myIterable: uses "compares equal" for the in keyword is correct, unless the iterable defines __contains__.
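A small sketch of that rule (EvenNumbers is a made-up toy container): when __contains__ exists it decides the in test; otherwise in falls back to iterating and comparing with ==:
class EvenNumbers(object):
    def __contains__(self, item):
        # `in` is answered here; no equality comparisons are performed
        return isinstance(item, int) and item % 2 == 0

print(2 in EvenNumbers())   # True
print(3 in EvenNumbers())   # False
print(2 in [1, 2, 3])       # True: the list compares each element with ==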
print(myObject(0)==myObject(0)) # prints False
#because id(left_hand_side_myObject) != id(right_hand_side_myObject)
a=0
print(myObject(a)==myObject(a)) # prints False
#again because id(left_hand_side_myObject) != id(right_hand_side_myObject)
a=myObject(a)
print(a==a) # prints True
#because id(a) == id(a)
myObject(a) == myObject(a) returns false because you are creating two separate instances of myObject (with the same attribute a). So the two objects have the same attributes, but they are different instances of your class, so they are not equal.
If you want to check whether an object is in a list, then yeah,
if x in myIterable
would probably be the easiest way to do that.
If you want to check whether an object has the exact same attributes as another object in a list, maybe try something like this:
x = myObject(a)
for y in myIterable:
    if x.mySlot == y.mySlot:
        print("Exists")
        break
Or, you could define __eq__(self, other) in your class to set the conditions for equality.
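For example, here is a sketch of giving myObject value-based equality (so == and the in test compare by mySlot instead of identity); __hash__ is added so instances remain hashable:
class myObject(object):
    __slots__ = ('mySlot',)
    def __init__(self, myArgument):
        self.mySlot = myArgument
    def __eq__(self, other):
        # equal when the other object is a myObject holding the same value
        return isinstance(other, myObject) and self.mySlot == other.mySlot
    def __hash__(self):
        return hash(self.mySlot)

print(myObject(0) == myObject(0))        # True now
print(myObject(0) in [myObject(0), 1])   # True: `in` uses == on each element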
Say you have a function that needs to maintain some sort of state and behave differently depending on that state. I am aware of two ways to implement this where the state is stored entirely by the function:
Using a function attribute
Using a mutable default value
Using a slightly modified version of Felix Kling's answer to another question, here is an example function that can be used with re.sub() so that only the third match to a regex will be replaced:
Function attribute:
def replace(match):
    replace.c = getattr(replace, "c", 0) + 1
    return repl if replace.c == 3 else match.group(0)
Mutable default value:
def replace(match, c=[0]):
    c[0] += 1
    return repl if c[0] == 3 else match.group(0)
To me the first seems cleaner, but I have seen the second more commonly. Which is preferable and why?
I would use a closure instead; it has no side effects.
Here is an example (I've just modified the original example from Felix Kling's answer):
def replaceNthWith(n, replacement):
    c = [0]
    def replace(match):
        c[0] += 1
        return replacement if c[0] == n else match.group(0)
    return replace
And the usage:
# reset state (in our case count, c=0) for each string manipulation
re.sub(pattern, replaceNthWith(n, replacement), str1)
re.sub(pattern, replaceNthWith(n, replacement), str2)
#or persist state between calls
replace = replaceNthWith(n, replacement)
re.sub(pattern, replace, str1)
re.sub(pattern, replace, str2)
For the mutable default: what should happen if somebody calls replace(match, c=[])?
For the function attribute: you break encapsulation (yes, I know Python doesn't enforce encapsulation in classes, for various reasons...).
Both ways feel strange to me. The first, though, is much better. But when you think about it as "something with a state that can do operations with that state and additional input", it really sounds like a normal object. And when something sounds like an object, it should be an object...
SO, my solution would be to use a simple object with a __call__ method:
class StatefulReplace(object):
    def __init__(self, initial_c=0):
        self.c = initial_c

    def __call__(self, match):
        self.c += 1
        return repl if self.c == 3 else match.group(0)
And then you can write in the global space or your module init:
replace = StatefulReplace(0)
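A hedged usage sketch (pattern, repl and the input string are made-up values; the class above reads the module-level name repl):
import re

repl = 'X'
pattern = r'\d'

replace = StatefulReplace(0)
print(re.sub(pattern, replace, 'a1b2c3d4'))   # a1b2cXd4: only the third match replaced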
How about:
Use a class
Use a global variable
True, these are not stored entirely within the function. I would probably use a class:
class Replacer(object):
    c = 0

    @staticmethod  # if you like
    def replace(match):
        Replacer.c += 1
        ...
To answer your actual question, use getattr. It's a very clear and readable way to store data away for later. It should be pretty obvious to someone reading it what you're trying to do.
The mutable default argument version is an example of a common programming error (assuming you'll get a new list every time). For that reason alone I would avoid it. Someone reading it later might decide that it's a good idea without fully understanding the consequences. And even in this case, it seems as though your function would only work once (your c value is never reset to zero).
To me, both of these approaches look dodgy. The problem is crying out for a class instance. We don't normally think about functions as maintaining state between calls; that's what classes are for.
That said, I've used function attributes before for this sort of thing. Particularly if it's a one-shot function defined within other code (i.e. not possible for it to be used from anywhere else), just tacking on attributes to it is more concise than defining a whole new class and creating an instance of it.
I would never abuse default values for this. There's a large barrier to understanding because the natural purpose of default values is to provide default values of arguments, not to maintain state between calls. Plus a default argument invites you to supply a non-default value, and you typically get very strange behaviour if you do that with a function that is abusing default values to maintain state.
I would like to perform an operation on an argument based on the fact that it might be a map-like object or a sequence-like object. I understand that no strategy is going to be 100% reliable for type-like checking, but I'm looking for a robust solution.
Based on this answer, I know how to determine whether something is a sequence and I can do this check after checking if the object is a map.
def ismap(arg):
    # How to implement this?

def isseq(arg):
    return hasattr(arg, "__iter__")

def operation(arg):
    if ismap(arg):
        # Do something with a dict-like object
    elif isseq(arg):
        # Do something with a sequence-like object
    else:
        # Do something else
Because a sequence can be seen as a map where keys are integers, should I just try to find a key that is not an integer? Or maybe I could look at the string representation? or...?
UPDATE
I selected SilentGhost's answer because it looks like the most "correct" one, but for my needs, here is the solution I ended up implementing:
if hasattr(arg, 'keys') and hasattr(arg, '__getitem__'):
    # Do something with a map
elif hasattr(arg, '__iter__'):
    # Do something with a sequence/iterable
else:
    # Do something else
Essentially, I don't want to rely on an ABC because there are many custom classes that behave like sequences and dictionaries but still do not extend the Python collections ABCs (see @Manoj's comment). I thought the keys attribute (mentioned by someone who removed his/her answer) was a good enough check for mappings.
Classes extending the Sequence and Mapping ABCs will work with this solution as well.
>>> from collections.abc import Mapping, Sequence
>>> isinstance('ac', Sequence)
True
>>> isinstance('ac', Mapping)
False
>>> isinstance({3:42}, Mapping)
True
>>> isinstance({3:42}, Sequence)
False
collections abstract base classes (ABCs)
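As a hedged sketch (the class name is made up), a user-defined type only needs to implement the abstract methods to count as a Mapping, so the isinstance checks above also cover custom classes that do extend the ABCs:
from collections.abc import Mapping

class LowerCaseDict(Mapping):
    """Toy read-only mapping that lower-cases its keys."""
    def __init__(self, data):
        self._data = {k.lower(): v for k, v in data.items()}
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

d = LowerCaseDict({'Name': 'Ada'})
print(isinstance(d, Mapping))   # True
print(d['NAME'])                # Ada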
Sequences have an __add__ method that implements the + operator. Maps do not have that method, since adding to a map requires both a key and a value, and the + operator only has one right-hand side.
So you may try:
def ismap(arg):
    return isseq(arg) and not hasattr(arg, "__add__")
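A quick sanity check of this heuristic (CPython; note that operation() must test ismap before isseq, since dicts are iterable too):
>>> ismap({3: 42})     # dict: iterable, no __add__
True
>>> ismap([1, 2, 3])   # list: defines __add__
False
>>> isseq('abc')       # str counts as a sequence here
True
>>> ismap('abc')
False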