In Python:
len(a) can be replaced by a.__len__()
str(a) or repr(a) can be replaced by a.__str__() or a.__repr__()
== is __eq__, + is __add__, etc.
Is there a similar method to get id(a)? If not, is there any workaround to get a unique id of a Python object without using id()?
Edit (additional question): if not, is there any reason not to define an __id__() method?
No, this behavior cannot be changed. id() is used to get "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime" (source). No other special meaning is given to this integer (in CPython it is the address of the memory location where the object is stored, but this cannot be relied upon in portable Python).
Since there is no special meaning for the return value of id(), it makes no sense to allow you to return a different value instead.
Further, while you could guarantee that id() would return unique integers for your own objects, you could not possibly satisfy the global uniqueness constraint, since your object cannot possibly have knowledge of all other living objects. It would be possible (and likely) that one of your special values clashes with the identity of another object alive in the runtime. This would not be an acceptable scenario.
If you need a return value that has some special meaning then you should define a method where appropriate and return a useful value from it.
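To see that a hypothetical __id__ method would simply be ignored, consider this sketch (the name __id__ is made up for illustration; Python does not recognize it as a special method, so defining it has no effect on id()):

```python
class WithFakeId:
    def __id__(self):
        # __id__ is not a special method Python knows about;
        # id() never calls it
        return 42

obj = WithFakeId()
print(id(obj) == 42)       # False: id() returns the interpreter-assigned identity
print(id(obj) == id(obj))  # True: the identity is stable for the object's lifetime
```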
An object isn't aware of its own name (it can have many), let alone of any unique ID associated with it. So - in short - no. The reason that __len__ and co. work is that they are already bound to the object - an object is not bound to its ID.
I read somewhere that in Python the id() function gives the address of the object being pointed to by a variable. For example, with x = 5, id(x) will give the address of the object 5 and not the address of the variable x. Then how can we know the address of the variable x?
Firstly - the id() function doesn't officially return the address; it returns an object identifier which is guaranteed to be unique for the lifetime of that object. It just so happens that CPython uses the address for that unique id, but it could change that definition at any time. Knowing what id() actually means is of no use anyway - there is nothing in Python that allows objects to be accessed via their id.
You asked about the address of the variable, but in Python, variables don't have an address.
I know in languages like C, C++ etc, a named variable is simply a named location in memory into which a data item is stored.
In Python though - and certainly in CPython, variables aren't a fixed location in memory. In Python all variables simply exist as a key in a dictionary that is maintained as your code runs.
When you say
x = 5
in Python, it finds the int object 5 and then builds a key-value pair in the local scope dictionary. In very real terms this is equivalent to:
__dict__['x'] = 5
or something similar depending on the scope rules.
So there will be an address somewhere in memory which holds the string 'x', but that isn't the address of the variable at all.
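A quick sketch of this idea, using globals() for module-level names (real code should rarely poke at scope dictionaries like this):

```python
x = 5
# the "variable" x is just an entry in the scope dictionary
print(globals()['x'])      # 5

# writing a new key into that dictionary creates a module-level name
globals()['y'] = 10
print(y)                   # 10
```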
The python3 documentation says
id(object)
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So this is not guaranteed to be the address. (What did you want to do with the address?)
In CPython it just happens to be the address, because an object's address is necessarily unique for as long as the object is alive, so it is an easy choice.
Generally, Python does not use pointers in the way C does. I recommend that you instead search for how whatever you'd like to do is generally done in Python. Changing the way you think about the task is likely a more frictionless path than imposing a C mentality onto Python.
getattr(dir, "__name__") is dir.__name__ evaluates to False - is there an alternative to getattr that would yield True?
The __name__ attribute of built-in functions is implemented (on the CPython reference interpreter) as a property (technically, a get-set descriptor), not stored as an attribute in the form of a Python object.
Properties act like attributes, but call a function when the value is requested, and in this case, the function converts the C-style string name of the function to a Python str on demand. So each time you look up dir.__name__, you get a freshly constructed str representing the data; as noted in the comments, this means there is no way to have an is check pass: even dir.__name__ is dir.__name__ returns False, because each lookup of __name__ returns a new str.
The language gives no guarantees of how __name__ is implemented, so you shouldn't be assuming it returns the same object each time. There are very few language guaranteed singletons (None, True, False, Ellipsis and NotImplemented are the biggies, and all classes have unique identities); assuming is will work with anything not in that set when it's not an object you controlled the creation of is a bad idea. If you want to check if the values are the same, test with ==, not is.
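A quick demonstration on CPython (the result of the is check is an implementation detail, not a language guarantee):

```python
# each attribute lookup builds a fresh str on CPython,
# so identity comparison is unreliable
print(dir.__name__ is dir.__name__)   # False on CPython (not guaranteed elsewhere)

# equality comparison is the reliable check
print(dir.__name__ == dir.__name__)   # True
print(dir.__name__)                   # 'dir'
```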
Update to address traversing an arbitrary graph of python objects without getting hung up by descriptors and other stuff (like __getattr__) that dynamically generate objects (and therefore shouldn't be invoked to describe the static graph):
The inspect.getattr_static function should let you "traverse an arbitrary graph of Python objects reachable from a starting one while assuming as little as possible about the types of objects and the implementation of their attributes" (as your comment requested). When the attribute is actually an attribute, it returns the value, but it doesn't trigger dynamic lookup for descriptors (like @property), __getattr__ or __getattribute__. So inspect.getattr_static(dir, '__name__') will return the getset_descriptor that CPython uses to implement __name__ without actually retrieving the string. On a different object where __name__ is a real attribute (e.g. the inspect module itself), it will return the attribute (inspect.getattr_static(inspect, '__name__') returns 'inspect').
While it's not perfect (some properties may actually be backed by real Python objects, not dynamically generated ones, that you can't otherwise access), it's at least a workable solution; you won't end up creating new objects by accident, and you won't end up in infinite loops of property lookup (e.g. every callable can have __call__ looked up on it forever, wrapping itself over and over as it goes), so you can at least arrive at a solution that mostly reflects the object graph accurately, and doesn't end up recursing to death.
Notably, it will preserve identity semantics properly. If two objects have the same attribute (by identity), the result will match as expected. If two objects share a descriptor (e.g. __name__ for built-in functions such as bin and dir), then it returns the descriptor itself, which will match on identity. And it does it all without needing to know up front whether what you have is an attribute or a descriptor.
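A short sketch of how inspect.getattr_static behaves on CPython for these two cases (the descriptor's type name is an implementation detail):

```python
import inspect

# for dir, __name__ is implemented as a descriptor on the type;
# getattr_static returns the descriptor without invoking it
desc = inspect.getattr_static(dir, '__name__')
print(type(desc).__name__)   # 'getset_descriptor' on CPython

# built-in functions share one descriptor, so identity checks work
print(desc is inspect.getattr_static(bin, '__name__'))  # True

# for a module, __name__ is a plain attribute, so the value comes back
print(inspect.getattr_static(inspect, '__name__'))      # 'inspect'
```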
I come from Java where even mutable objects can be "hashable".
And I am playing with Python 3.x these days just for fun.
Here is the definition of hashable in Python (from the Python glossary).
hashable
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().
I read it and I am thinking...
Still... why didn't they make even mutable objects hashable in Python, e.g. using the same default hashing mechanism as for user-defined objects, i.e. as described by the last two sentences above?
Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().
This feels somewhat weird... so user-defined mutable objects are hashable (via this default hashing mechanism) but built-in mutable objects are not hashable. Doesn't this just complicate things? I don't see what benefits it brings, could someone explain?
In Python, mutable objects can be hashable, but it is generally not a good idea, because equality is usually defined in terms of those mutable attributes, and this can lead to all sorts of crazy behavior.
If built-in mutable objects were hashed based on identity, like the default hashing mechanism for user-defined objects, their hash would be inconsistent with their equality. And that is absolutely a problem. However, user-defined objects by default compare and hash based on identity, so it isn't as bad a situation, although this state of affairs isn't very useful.
Note: if you implement __eq__ in a user-defined class without also defining __hash__, __hash__ is set to None, making instances of the class unhashable.
So, from the Python 3 data model documentation:
User-defined classes have __eq__() and __hash__() methods by
default; with them, all objects compare unequal (except with
themselves) and x.__hash__() returns an appropriate value such that
x == y implies both that x is y and hash(x) == hash(y).
A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the
__hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve
their hash value, and will also be correctly identified as unhashable
when checking isinstance(obj, collections.abc.Hashable).
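The behavior described in that quote can be sketched like this (the class name Point is illustrative):

```python
from collections.abc import Hashable

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    # no __hash__ defined, so Python implicitly sets Point.__hash__ = None

p = Point(1, 2)
print(Point.__hash__ is None)    # True
try:
    hash(p)
except TypeError as e:
    print(e)                     # unhashable type: 'Point'
print(isinstance(p, Hashable))   # False
```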
Calculating a hash value is like giving an identity to an object, which simplifies comparing objects. Comparison by hash value is generally faster than comparison by value: for an object, you compare its attributes; for a collection, you compare its items, recursively…
If an object is mutable, you need to recalculate its hash value after each change. If the object compared equal to another one, after a change it becomes unequal. So mutable objects must be compared by value, not by hash. It makes no sense to compare mutable objects by hash value.
Edit: Java HashCode
Typically, hashCode() just returns the object's address in memory if you don't override it.
See the reference about the hashCode function.
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.)
So, the default Java hashCode function works the same way as the default Python __hash__ function.
In Java, if you use a mutable object in a HashSet, for instance, and its hashCode is overridden to depend on its state, the HashSet stops working properly: after a mutation the object can no longer be retrieved from its bucket, so the check for containment fails.
From reading other comments/answers, it seems like what you're not buying is that you have to change a hash of a mutable entity when it mutates, and that you can just hash by id, so I'll try to elaborate on this point.
To quote you:
@kindall Hm... Who says that the hash value has to come from the values in the list? And that if you e.g. add a new value you have to rehash the list, get a new hash value, etc.? In other languages that's not how it is... this is my point. In other languages the hash value just comes from the id (or is the id itself, just like for user-defined mutable Python objects)... And OK... I just feel it makes things a bit too complicated in Python (especially for beginners... not for me).
This isn't exactly false (although I do not know what "other" languages you are referencing), you could do that, but there are some pretty dire consequences:
class HashableList(list):
    def __hash__(self):
        return id(self)

x = HashableList([1, 2, 3])
y = HashableList([1, 2, 3])
our_set = {x}

print("Is x in our_set? ", x in our_set)
print("Is y in our_set? ", y in our_set)
print("Are x and y equal? ", x == y)
This (unexpectedly) outputs:
Is x in our_set? True
Is y in our_set? False <-- potentially confusing
Are x and y equal? True
This means that the hash is not consistent with equality, which is just downright confusing.
You might counter with "well, just hash by the contents then", but I think you already understand that if the contents change then you get other undesirable behavior (for example):
class HashableListByContents(list):
    def __hash__(self):
        return sum(hash(x) for x in self)

a = HashableListByContents([1, 2, 3])
b = HashableListByContents([1, 2, 3])
our_set = {a}

print('Is a in our_set? ', a in our_set)
print('Is b in our_set? ', b in our_set)
print('Are a and b equal? ', a == b)
This outputs:
Is a in our_set? True
Is b in our_set? True
Are a and b equal? True
So far so good! But...
a.append(2)
print('Is a still in our set? ', a in our_set)
this outputs:
Is a still in our set? False <-- potentially confusing
I am not a Python beginner, so I would not presume to know what would or would not confuse a Python beginner, but either way this seems confusing to me (at best). My two cents is that it's simply incorrect to hash mutable objects. I mean we have functional purists that claim mutable objects are just incorrect, period! Python won't stop you from doing any of what you described, because it would never force a paradigm like that, but it's really asking for trouble no matter what route you go down.
HTH!
Just wondering what the logic behind this one is? On the surface it seems kind of inefficient that every time you do something simple like x = x + 1, it has to take a new address and discard the old one.
A Python variable (called an identifier or name in Python) is a reference to a value. The id() function tells you something about that value, not the name.
Many values are not mutable; integers, strings and floats do not change in place. When you add 1 to an integer, you get back a new integer, which then replaces the reference to the old value.
You can look at Python names as labels tied to values. If you imagine values as balloons, you are retying the label to a new balloon each time you assign to that name. If there are no other labels attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many often-used, small values, Python actually uses a process called interning, where it caches a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the sys.intern() function (intern() in Python 2) to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
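A small sketch of interning on CPython (these results are implementation details, not language guarantees):

```python
import sys

a = 256
b = int('256')
print(a is b)   # True on CPython: ints in [-5, 256] come from a shared cache

c = int('257')
d = int('257')
print(c is d)   # typically False on CPython: 257 is outside the cache

s = sys.intern('a relatively long string value')
t = sys.intern('a relatively long string value')
print(s is t)   # True: sys.intern guarantees a single shared copy
```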
The basic types are immutable, so every time you modify one, it needs to be instantiated again
...which is perfectly fine, especially for thread-safe functions
The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned, at least in CPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
In Python, "primitive" types like ints and strings are immutable, which means they cannot be modified.
Python is actually quite efficient because, as @Wooble commented, «Very short strings and small integers are interned»: if two variables reference the same (small) immutable value, their id is the same (reducing duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
The reason behind the use of immutable types is a safe approach to the concurrent access on those values.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where those types that in Python are immutable, are not.
To be accurate, the assignment x = x + 1 doesn't modify the object that x references; it just makes x point to another object whose value is x + 1.
To understand the logic behind, one needs to understand the difference between value semantics and reference semantics.
An object with value semantics means only its value matters, not its identity, while an object with reference semantics focuses on its identity (in Python, identity can be obtained from id(obj)).
Typically, value semantics implies immutability of the object. Conversely, if an object is mutable (i.e. allows in-place change), it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in-place without losing their original addresses/identities. This makes sense in that it's the identity of an object with reference semantics that makes itself distinguishable from other objects.
In contrast, an object with value-semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value (not the identity) is significant, when a change is needed, it's safe to swap to another identity with a different value. This is called referential transparency. Note that this is impossible for objects with reference semantics.
Secondly, this is beneficial in practice. As the OP noted, it seems inefficient to discard the old object each time it's changed, but most of the time it's more efficient than the alternative. For one thing, Python (like other languages) has interning/caching schemes so that fewer objects need to be created. What's more, if value-semantic objects were designed to be mutable, they would take much more space in most cases.
For example, Date has value semantics. If it were designed to be mutable, any method returning a date from an internal field would expose a handle to the outside world, which is risky (e.g. the caller could directly modify that internal field without going through the public interface). Similarly, if one passes a date object by reference to some function/method, the object could be modified there, which may not be expected. To avoid these kinds of side effects, one has to program defensively: instead of directly returning the inner date field, return a clone of it; instead of passing by reference, pass by value, which means extra copies are made. As one can imagine, this creates more objects than necessary. What's worse, the code becomes more complicated with all this extra cloning.
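Python's own datetime.date illustrates the point: because it is immutable, a class can hand it out without defensive copies. A minimal sketch (the Employee class is a made-up example):

```python
from datetime import date, timedelta

class Employee:
    def __init__(self, hired):
        self._hired = hired

    @property
    def hired(self):
        # date is immutable, so returning the field directly is safe:
        # callers cannot mutate our internal state through it
        return self._hired

e = Employee(date(2020, 1, 1))
d = e.hired
d = d + timedelta(days=30)   # produces a new date; the original is untouched
print(e.hired)               # 2020-01-01
```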
In a word, immutability enforces the value-semantics, it usually involves less object creation, has less side-effects and less hassles, and is more test-friendly. Besides, immutable objects are inherently thread-safe, which means less locks and better efficiency in multithreading environment.
That's the reason why basic value-semantic data types like numbers, strings, dates and times are all immutable (well, string in C++ is an exception; that's why there are so many const string& parameters around, to avoid strings being modified unexpectedly). As a lesson, Java made the mistake of designing the value-semantic classes Date, Point, Rectangle and Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects(i.e. those with reference semantics). This is a good hint to distinguish between value semantics and reference semantics.
I have several python scripts run parallel this simple code:
test_id = id('test')
Is test_id unique or not?
http://docs.python.org/library/functions.html#id
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object.
So yes, the IDs are unique - but only within a single interpreter process; ids from separate parallel scripts are unrelated and may collide.
However, since Python strings are immutable, id('test') may be the same throughout one process, since 'test' is 'test' is likely to be True.
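Within one CPython process, identical string literals compiled together are usually merged, though this is an implementation detail:

```python
a = 'test'
b = 'test'
# CPython interns identifier-like string literals, so these usually
# share one object; only equality is guaranteed by the language
print(a == b)   # True
print(a is b)   # typically True on CPython, but do not rely on it
```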
What do you mean by unique? Unique among what?
It is just an identifier for the piece of memory holding the parameter's value. For immutable objects with the same value it is often the same:
>>> id('foo') == id('fo' + 'o')
True
In CPython, id is the pointer to the object in memory.
>>> a = [1,2,3]
>>> b = a
>>> id(a) == id(b)
True
So, if you have multiple references to the same object, the id will not distinguish them. (And in some corner cases - small strings, and numbers smaller than 257 - the object is created only once, so distinct-looking values share an id.)
It might help if you talked about what you were trying to do - it isn't really typical to use the id() builtin for anything, least of all strings, unless you really know what you're doing.
Python docs nicely describe the id() builtin function:
This is an integer (or long integer)
which is guaranteed to be unique and
constant for this object during its
lifetime. Two objects with
non-overlapping lifetimes may have the
same id() value.
As I read this, the return values of id() are really only guaranteed to be unique within one interpreter instance - and even then only if the lifetimes of the items overlap. Saving these ids for later use, sending them over sockets, etc. is not useful. Again, I don't think this is really a tool for people who don't know that they need it.
If you want to generate IDs which are unique across multiple program instances, you might check out the uuid module.
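A sketch of the uuid approach, which gives identifiers that are meaningful across processes and machines, unlike id():

```python
import uuid

# uuid4 produces a random 128-bit identifier; collisions are
# astronomically unlikely, even across machines
u = uuid.uuid4()
print(u)            # random each run
print(len(str(u)))  # 36: the canonical 8-4-4-4-12 hex form
```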
It also occurs to me that you might be trying to produce hashes from Python objects.
Probably there is some approach to your problem which will be cleaner than trying to use the id() function, maybe the problem needs reformulating.