Unique id function in Python or not?


I have several Python scripts running this simple code in parallel:
test_id = id('test')
Is test_id unique or not?

http://docs.python.org/library/functions.html#id
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object.
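So yes, the IDs are unique, but only among objects that are alive at the same time within a single interpreter process. Nothing guarantees uniqueness across scripts running in parallel.
Also, since Python strings are immutable, the interpreter is free to reuse one object for equal string literals, so id('test') may be the same everywhere the literal 'test' appears: 'test' is 'test' is likely to be True.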

What do you mean by unique? Unique among what?
It is just an identifier for the object (in CPython, its location in memory) that holds the value. For immutable objects with the same value it is often the same:
>>> id('foo') == id('fo' + 'o')
True

In CPython, id is the pointer to the object in memory.
>>> a = [1,2,3]
>>> b = a
>>> id(a) == id(b)
True
So if you have multiple references to the same object, they all share one id. In some corner cases CPython also creates an object only once and reuses it, for example short strings and the integers from -5 to 256, so equal values can end up with the same id as well.
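As a small sketch of the point above, the following compares an alias with an independent copy. The names a, b and c are only illustrative.

```python
a = [1, 2, 3]
b = a          # second name bound to the same list object
c = list(a)    # new list object with equal contents

print(a is b, id(a) == id(b))   # True True
print(a == c, a is c)           # True False
```

Equal contents do not imply equal ids; only references to the very same live object do.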

It might help if you talked about what you were trying to do - it isn't really typical to use the id() builtin for anything, least of all strings, unless you really know what you're doing.
Python docs nicely describe the id() builtin function:
This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
As I read this, the return values of id() are really only guaranteed to be unique in one interpreter instance - and even then only if the lifetimes of the items overlap. Saving these ids for later use, sending them over sockets, etc. seems not useful. Again, I don't think this is really for people who don't know that they need it.
If you want to generate IDs which are unique across multiple program instances, you might check out the uuid module.
It also occurs to me that you might be trying to produce hashes from Python objects.
Probably there is some approach to your problem which will be cleaner than trying to use the id() function, maybe the problem needs reformulating.
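If the goal is an identifier that stays distinct across several scripts running in parallel, a minimal sketch with the standard uuid module could look like this. Choosing uuid4 (random UUIDs) is an assumption here; the variable names are illustrative.

```python
import uuid

# uuid4() builds the identifier from random bits. It does not depend on
# object addresses, so separate processes will not collide in practice.
test_id = uuid.uuid4()
other_id = uuid.uuid4()

print(test_id)
print(test_id != other_id)   # True
```

Unlike the result of id(), such a value can be stored or sent to another process and still be meaningful there.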

Related

Python: address of a variable; are integer values immutable?

I read somewhere that in Python the id() function gives the address of the object that a variable points to. For example, with x = 5, id(x) gives the address of the object 5 and not the address of the variable x. How, then, can we find the address of the variable x?

Firstly - the id() function doesn't officially return the address; it returns an object identifier which is guaranteed to be unique for the lifetime of that object. It just so happens that CPython uses the address for that id, but it could change that definition at any time. Knowing what the value of id() actually means is of little use anyway - there is nothing in the Python language itself that allows objects to be accessed via their id.
You asked about the address of the variable, but in Python, variables don't have an address.
I know in languages like C, C++ etc, a named variable is simply a named location in memory into which a data item is stored.
In Python though - and certainly in CPython - variables aren't a fixed location in memory. Conceptually, a variable simply exists as a key in a namespace dictionary that is maintained as your code runs.
When you say
x = 5
in Python, it finds the int object 5 and then builds a key-value pair in the dictionary for the current scope. In very real terms this is equivalent to:
__dict__['x'] = 5
or something similar depending on the scope rules.
So there will be an address somewhere in memory which holds the string 'x', but that isn't the address of the variable at all.
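The dictionary view of names can be made visible with a small sketch. It runs assignments against an explicit dictionary (called namespace here, a made-up name) through exec. This mirrors how module-level names are stored; locals inside functions may be stored differently by the implementation.

```python
namespace = {}

# Running an assignment against this dict stores the name as a key.
exec("x = 5", namespace)
print(namespace["x"])   # 5

# Rebinding the name replaces the value stored under the same key.
exec("x = x + 1", namespace)
print(namespace["x"])   # 6
```

The name x is just a key; what has an identity is the int object stored under it.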

The python3 documentation says
id(object)
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So this is not guaranteed to be the address. (What did you want to do with the address?)
In CPython it just happens to be the address, because the address of an object is unique while the object is alive, which makes it an easy choice.
Generally, Python does not use pointers in the same way as C does. I recommend that you instead look up how whatever you'd like to do is usually done in Python. Changing the way you think about the task is likely to cause less friction than imposing a C mentality onto Python.

understanding python id() uniqueness

Python documentation for id() function states the following:
This is an integer which is guaranteed to be unique and constant for
this object during its lifetime. Two objects with non-overlapping
lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
Although, the snippet below shows that id's are repeated. Since I didn't explicitly del the objects, I presume they are all alive and unique (I do not know what non-overlapping means).
>>> g = [0, 1, 0]
>>> for h in g:
...     print(h, id(h))
...
0 10915712
1 10915744
0 10915712
>>> a=0
>>> b=1
>>> c=0
>>> d=[a, b,c]
>>> for e in d:
...     print(e, id(e))
...
0 10915712
1 10915744
0 10915712
>>> id(a)
10915712
>>> id(b)
10915744
>>> id(c)
10915712
>>>
How can the id values for different objects be the same? Is it so because the value 0 (object of class int) is a constant and the interpreter/C compiler optimizes?
If I were to do a = c, then I understand c to have the same id as a since c would just be a reference to a (alias). I expected the objects a and c to have different id values otherwise, but, as shown above, they have the same values.
What's happening? Or am I looking at this the wrong way?
I would expect the id's for user-defined class' objects to ALWAYS be unique even if they have the exact same member values.
Could someone explain this behavior? (I looked at the other questions that ask uses of id(), but they steer in other directions)
EDIT (09/30/2019):
To extend what I already wrote: I ran Python interpreters in separate terminals and checked the id of 0 in each of them. Multiple instances of the same interpreter all reported the same id for 0. Python 2 and Python 3 reported different values from each other, but every Python 2 instance reported the same value.
My question is because the id()'s documentation doesn't state any such optimizations, which seems misleading (I don't expect every quirk to be noted, but some note alongside the CPython note would be nice)...
EDIT 2 (09/30/2019):
The question stems from wanting to understand this behavior and to know whether there are any hooks to optimize user-defined classes in a similar way (for example, by modifying the __eq__ method to identify whether two objects are the same, so that perhaps they would point to the same address in memory, i.e. have the same id, or by using some metaclass properties).

Ids are guaranteed to be unique for the lifetime of the object. If an object gets deleted, a new object can acquire the same id. CPython will delete items immediately when their refcount drops to zero. The garbage collector is only needed to break up reference cycles.
CPython may also cache and re-use certain immutable objects like small integers and strings defined by literals that are valid identifiers. This is an implementation detail that you should not rely upon. It is generally considered improper to use is checks on such objects.
There are certain exceptions to this rule, for example, using an is check on possibly-interned strings as an optimization before comparing them with the normal == operator is fine. The dict builtin uses this strategy for lookups to make them faster for identifiers.
a is b or a == b # This is OK
If the string happens to be interned, then the above can return true with a simple id comparison instead of a slower character-by-character comparison, but it still returns true if and only if a == b (because if a is b then a == b must also be true). However, a good implementation of .__eq__() would already do an is check internally, so at best you would only avoid the overhead of calling the .__eq__().
Thanks for the answer, would you elaborate around the uniqueness for user-defined objects, are they always unique?
The id of any object (be it user-defined or not) is unique for the lifetime of the object. It's important to distinguish objects from variables. It's possible to have two or more variables refer to the same object.
>>> a = object()
>>> b = a
>>> c = object()
>>> a is b
True
>>> a is c
False
Caching optimizations mean that you are not always guaranteed to get a new object in cases where one might naively think one should, but this does not in any way violate the uniqueness guarantee of IDs. Builtin types like int and str may have some caching optimizations, but they follow exactly the same rules: If they are live at the same time, and their IDs are the same, then they are the same object.
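As a quick check of that rule for user-defined objects, here is a sketch with a hypothetical class Thing that does no caching. All instances are kept alive in a list, so their lifetimes overlap and their ids must all differ.

```python
class Thing:
    """Hypothetical user-defined class with no caching."""

    def __init__(self, value):
        self.value = value


# Every instance holds the same value, and all stay alive in the list.
things = [Thing(0) for _ in range(1000)]
unique_ids = {id(t) for t in things}

print(len(unique_ids))   # 1000
```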
Caching is not unique to builtin types. You can implement caching for your own objects.
>>> def the_one(it=object()):
...     return it
...
>>> the_one() is the_one()
True
Even user-defined classes can cache instances. For example, this class only makes one instance of itself.
>>> class TheOne:
...     _the_one = None
...     def __new__(cls):
...         if not cls._the_one:
...             cls._the_one = super().__new__(cls)
...         return cls._the_one
...
>>> TheOne() is TheOne() # There can be only one TheOne.
True
>>> id(TheOne()) == id(TheOne()) # This is what an is-check does.
True
Note that each construction expression evaluates to an object with the same id as the other. But this id is unique to the object. Both expressions reference the same object, so of course they have the same id.
The above class only keeps one instance, but you could also cache some other number. Perhaps recently used instances, or those configured in a way you expect to be common (as ints do), etc.
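A sketch of that idea, caching one instance per set of constructor arguments, might look like the following. The class Point and its _cache attribute are made up for illustration; note that the cache keeps every instance alive for the life of the program.

```python
class Point:
    """Hypothetical value class that reuses one instance per (x, y) pair."""

    _cache = {}

    def __new__(cls, x, y):
        key = (x, y)
        instance = cls._cache.get(key)
        if instance is None:
            # First request for this pair: create and remember the object.
            instance = super().__new__(cls)
            instance.x = x
            instance.y = y
            cls._cache[key] = instance
        return instance


p = Point(1, 2)
q = Point(1, 2)
r = Point(3, 4)

print(p is q)   # True: same arguments, same cached object
print(p is r)   # False: different arguments, different object
```

This is the same kind of reuse that small ints get, done at the class level instead of inside the interpreter.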

Calculate an identifier for an object [duplicate]

This would be similar to the java.lang.Object.hashcode() method.
I need to store objects I have no control over in a set, and make sure that values are overwritten only if two objects are actually the same object (not merely objects that contain the same values).

id(x)
will do the trick for you. But I'm curious, what's wrong about the set of objects (which does combine objects by value)?
For your particular problem I would probably keep the set of ids or of wrapper objects. A wrapper object will contain one reference and compare by x==y <==> x.ref is y.ref.
It's also worth noting that Python objects have a hash function as well. This function is necessary to put an object into a set or dictionary. It is allowed to collide for different objects, though good implementations of hash try to make that unlikely.
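The wrapper idea from this answer could be sketched as follows. IdentityRef is a hypothetical name; the wrapper keeps a reference to the wrapped object, so that object stays alive and its id cannot be reused while the wrapper exists.

```python
class IdentityRef:
    """Hypothetical wrapper that compares and hashes by object identity."""

    def __init__(self, ref):
        self.ref = ref

    def __eq__(self, other):
        return isinstance(other, IdentityRef) and self.ref is other.ref

    def __hash__(self):
        return id(self.ref)


a = [1, 2]
b = [1, 2]   # equal to a, but a different object

# Lists are unhashable, yet the wrappers can live in a set.
seen = {IdentityRef(a), IdentityRef(b), IdentityRef(a)}
print(len(seen))   # 2
```

The duplicate wrapper around a collapses into one entry, while b is kept even though it equals a by value.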

That's what "is" is for.
Instead of testing "if a == b", which tests for the same value,
test "if a is b", which will test for the same identifier.

As ilya n mentions, id(x) produces a unique identifier for an object.
But your question is confusing, since Java's hashCode method doesn't give a unique identifier. Java's hashCode works like most hash functions: it always returns the same value for the same object, two objects that are equal always get equal codes, and unequal hash codes imply unequal objects. In particular, two different and unequal objects can get the same value.
This is confusing because cryptographic hash functions are quite different from this, and more like (though not exactly) the "unique id" that you asked for.
The Python equivalent of Java's hashCode method is hash(x).
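To illustrate the difference between hash(x) and id(x), here is a sketch with a hypothetical class Version that defines equality and hashing by value. Two equal instances share a hash but remain distinct objects.

```python
class Version:
    """Hypothetical class with value-based equality and hashing."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (isinstance(other, Version)
                and (self.major, self.minor) == (other.major, other.minor))

    def __hash__(self):
        return hash((self.major, self.minor))


a = Version(3, 8)
b = Version(3, 8)

print(a == b)               # True: same value
print(hash(a) == hash(b))   # True: equal objects must hash equally
print(a is b)               # False: two separate live objects
```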

You don't have to compare objects before placing them in a set. set() semantics already takes care of this.
class A(object):
    a = 10
    b = 20
    def __hash__(self):
        return hash((self.a, self.b))
a1 = A()
a2 = A()
a3 = A()
a4 = a1
s = set([a1,a2,a3,a4])
s
=> set([<__main__.A object at 0x222a8c>, <__main__.A object at 0x220684>, <__main__.A object at 0x22045c>])
Note: You really don't have to override hash to prove this behaviour :-)

Python, what is the object method of built-in id()?

In Python:
len(a) can be replaced by a.__len__()
str(a) or repr(a) can be replaced by a.__str__() or a.__repr__()
== is __eq__, + is __add__, etc.
Is there a similar method to get id(a)? If not, is there any workaround to get a unique id of a Python object without using id()?
Edit: an additional question: if not, is there any reason not to define an __id__() method?

No, this behavior cannot be changed. id() is used to get "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime" (source). No other special meaning is given to this integer (in CPython it is the address of the memory location where the object is stored, but this cannot be relied upon in portable Python).
Since there is no special meaning for the return value of id(), it makes no sense to allow you to return a different value instead.
Further, while you could guarantee that id() would return unique integers for your own objects, you could not possibly satisfy the global uniqueness constraint, since your object cannot possibly have knowledge of all other living objects. It would be possible (and likely) that one of your special values clashes with the identity of another object alive in the runtime. This would not be an acceptable scenario.
If you need a return value that has some special meaning then you should define a method where appropriate and return a useful value from it.
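One way to follow that advice is to give objects their own identifier attribute. This is only a sketch: the class Tracked, the attribute uid and the choice of a simple counter are all assumptions, not something the id() machinery provides.

```python
import itertools


class Tracked:
    """Hypothetical class that numbers its instances in creation order."""

    _next_uid = itertools.count(1)

    def __init__(self):
        # Each instance takes the next number from the shared counter.
        self.uid = next(Tracked._next_uid)


first = Tracked()
second = Tracked()

print(first.uid != second.uid)   # True
```

Such a value is never reused within the process, unlike id(), which may be recycled after an object is destroyed.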

An object isn't aware of its own name (it can have many), let alone of any unique ID it has associated with it. So - in short - no. The reasons that __len__ and co. work is that they are bound to the object already - an object is not bound to its ID.

Why do Python variables take a new address (id) every time they're modified?

Just wondering what the logic behind this one is? On the surface it seems kind of inefficient, that every time you do something simple like "x=x+1" that it has to take a new address and discard the old one.

A Python variable (called an identifier or name in Python) is a reference to a value. The id() function tells you something about that value, not the name.
Many values are not mutable; integers, strings, floats all do not change in place. When you add 1 to another integer, you return a new integer that then replaces the reference to the old value.
You can look at Python names as labels, tied to values. If you imagine values as balloons, you are retying the label to a new balloon each time you assign to that name. If there are no other labels attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many often used and small values, Python actually uses a process called interning, where it will cache a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the intern() function (sys.intern() in Python 3) to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
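A minimal sketch of explicit string interning, using sys.intern (where intern lives in Python 3). The strings are built at run time so that equality alone does not tell us whether they are the same object.

```python
import sys

parts = ["hello", " ", "world"]
a = "".join(parts)
b = "".join(parts)
print(a == b)   # True: equal text

# sys.intern returns the one canonical object for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)   # True
```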

Because the basic types are immutable, every time you modify a value a new object has to be created
...which is perfectly fine, especially for thread-safe functions

The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned, at least in CPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
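To contrast rebinding a name with changing an object in place, here is a short sketch. The names are illustrative.

```python
# Rebinding: x = x + 1 creates a new int and points x at it.
x = 1000
y = x
x = x + 1
print(x, y)     # 1001 1000
print(x is y)   # False

# Mutation: append changes the list itself, so its id stays the same.
items = [1]
alias = items
items_id = id(items)
items.append(2)
print(alias)                   # [1, 2]
print(id(items) == items_id)   # True
```

The int that y refers to is untouched by the assignment to x, while the list change is visible through every name bound to it.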

In Python, "primitive" types like ints and strings are immutable, which means they cannot be modified.
Python is actually quite efficient because, as @Wooble commented, «Very short strings and small integers are interned.»: if two variables reference the same (small) immutable value, their id is the same (which reduces the number of duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
One reason for using immutable types is that they make concurrent access to those values safe.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where some types that are immutable in Python, such as strings, are mutable.

To be accurate, the assignment x=x+1 doesn't modify the object that x is referencing; it just makes x point to another object whose value is x+1.
To understand the logic behind this, one needs to understand the difference between value semantics and reference semantics.
For an object with value semantics, only its value matters, not its identity, while an object with reference semantics is defined by its identity (in Python, the identity is returned by id(obj)).
Typically, value semantics implies immutability of the object. Conversely, if an object is mutable (i.e. can be changed in place), it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in place without losing their original addresses/identities. This makes sense because it is the identity of such an object that distinguishes it from other objects.
In contrast, an object with value semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value (not the identity) is significant, when a change is needed it is safe to swap in another object with a different value. This is called referential transparency. Note that this is impossible for objects with reference semantics.
Secondly, this is beneficial in practice. As the OP thought, it seems inefficient to discard the old object each time it is changed, but most of the time it is more efficient than the alternative. For one thing, Python (like other languages) has an interning/caching scheme so that fewer objects are created. What's more, if objects with value semantics were designed to be mutable, they would take much more space in most cases.
For example, a date has value semantics. If it is designed to be mutable, any method that returns a date from an internal field exposes a handle to the outside world, which is risky (e.g. outside code can directly modify this internal field without going through the public interface). Similarly, if one passes a date object by reference to some function/method, the object could be modified in that function/method, which may not be expected. To avoid these kinds of side effects, one has to program defensively: instead of directly returning the inner date field, one returns a clone of it; instead of passing by reference, one passes by value, which means extra copies are made. As one can imagine, this tends to create more objects than necessary. What's worse, the code becomes more complicated with all this extra cloning.
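Python's own datetime.date is immutable, so the date scenario can be sketched without any defensive copying. The class Invoice and its due property are hypothetical, used only to stand in for "a method that returns a date from an internal field".

```python
from datetime import date


class Invoice:
    """Hypothetical class that exposes an internal date field directly."""

    def __init__(self, due):
        self._due = due

    @property
    def due(self):
        return self._due   # no defensive copy needed


invoice = Invoice(date(2024, 1, 31))
exposed = invoice.due

# "Changing" a date produces a new object; the original is untouched.
later = exposed.replace(day=1)

try:
    exposed.day = 1   # attributes of date objects are read-only
except AttributeError:
    print("date objects cannot be modified in place")

print(invoice.due)   # 2024-01-31
print(later)         # 2024-01-01
```

Because the caller cannot alter the returned object, the internal field is safe even though it is handed out directly.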
In short, immutability enforces value semantics; it usually involves less object creation, has fewer side effects and less hassle, and is more test-friendly. Besides, immutable objects are inherently thread-safe, which means fewer locks and better efficiency in a multithreaded environment.
That's the reason why basic data types with value semantics like numbers, strings, dates and times are all immutable (well, string in C++ is an exception, which is why there is so much const string& around to avoid strings being modified unexpectedly). As a lesson, Java made the mistake of designing the value-semantic classes Date, Point, Rectangle and Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects(i.e. those with reference semantics). This is a good hint to distinguish between value semantics and reference semantics.
