python;address of variable,integer values are immutable? - python

I read somewhere that in python id() function gives the address of object being pointed to by variable.for eg; x =5, id(a) will give the address of object 5 and not the address of variable x.then how can we know the address of variable x??

Firstly - the id() function doesn't officially return the address, it returns a unique object identifier which is guaranteed to be unique for the life time of that object. It just so happens that CPython uses the address for that unique id, but it could change that definition at any time. It is of no use anyway knowing what the id() actually means - there is nothing in Python that allows objects to be accessed via their id.
You asked about the address of the variable, but in Python, variables don't have an address.
I know in languages like C, C++ etc, a named variable is simply a named location in memory into which a data item is stored.
In Python though - and certainly in CPython, variables aren't a fixed location in memory. In Python all variables simply exist as a key in a dictionary that is maintained as your code runs.
When you say
x = 5
in python, it finds the int(5) object and then builds a key value pair in the local scope dictionary. in a very real terms this equivalent to :
__dict__['x'] = 5
or something similar depending on the scope rules.
So there will be an address somewhere in memory which holds the string 'x', but that isn't the address of the variable at all.

The python3 documentation says
id(object)
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So this is not guaranteed to be the address. (What did you want to do with the address?)
In CPython it just happens to be the address, because the address of an object is unique apparently and so it is an easy choice.
Generally, Python does not use pointers in the same way as C does. I recommend you to instead search for how whatever you'd like to do is generally done in python. Changing the way you think about the task is likely a more frictionless way than imposing C mentality onto Python.

Related

Object Creation in Python - What happenes to the modified variable?

After getting in touch with the more deeper workings of Python, I've come to the understanding that assigning a variable equals to the creation of a new object which has their own specific address regardless of the variable-name that was assigned to the object.
In that case though, it makes me wonder what happens to an object that was created and modified later on. Does it sit there and consumes memory?
The scenarion in mind looks something like this:
# Creates object with id 10101001 (random numbers)
x = 5
# Creates object with id 10010010 using the value from object 10101001.
x += 10
What happens to object with the id 10101001?
Out of curiosity too, why do objects need an ID AND a refrence that is the variable name, wouldn't it be better to just assign the address with the variable name?
I apologize in advance for the gringe this question might invoke in someone.
Here is a great talk that was given at PyCon by Ned Batchelder this year about how Python manages variables.
https://www.youtube.com/watch?v=_AEJHKGk9ns
I think it will help clear up some of your confusion.
First of all Augmented assignment statements states:
An augmented assignment expression like x += 1 can be rewritten x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.
So depending on the type of x this might not create a new object.
Python is reference counted. So the reference count of the object with id 10101001 decremented. If this count hits zero, the is freed almost immediately. But most low range integers are cached anyways. Refer to Objects, Types and Reference Counts for all the details.
Regarding the id of an object:
CPython implementation detail: This is the address of the object in memory.
So basically id and reference are the same. The variable name is just a binding to the object itself.

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far sounds like the same thing. But assign to the variable and you don't modify the object itself, rather you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that The Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In python, the memory location and size is irrelevant to the interpreter. The closest python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens by i = 1000. This tells python to create an integer object with a value of 1000, if it does not already exist, and bind to to the name 'i'. An object can be bound to multiple names quite easily, e.g:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand, for example I've always been confused by it even if I've a somewhat accurate understanding of how Python (and cpython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are indeed pointers to objects and for example that a list object is indeed an array of pointers to values. After a = b both a and b are pointing to the same object.
There are a couple of tricky parts where this simple model of Python semantic seems to fail, for example with list augmented operator += but for that it's important to note that a += b in Python is not the same as a = a + b but it's a special increment operation (that can also be defined for user types with the __iadd__ method; a += b is indeed a = a.__iadd__(b)).
Another important thing to understand is that while in Python all variables are indeed pointers still there is no pointer concept. In other words you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is defined by
void increment(int &x) {
x += 1;
}
or in C by
void increment(int *x) {
*x += 1;
}
in Python cannot be defined because there's no way to pass "a variable", you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
who said you should? Unless you are discussing issues that are directly related to name binding operations; it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding" then use this terminology because Python language reference uses it: to be as specific and precise as possible, to help resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type - in Python it isn't the type that a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.

Python, what is the object method of built-in id()?

In Python:
len(a) can be replaced by a.__len__()
str(a) or repr(a) can be replaced by a.__str__() or a.__repr__()
== is __eq__, + is __add__, etc.
Is there similar method to get the id(a) ? If not, is there any workaround to get an unique id of a python object without using id() ?
edit: additional question: if not ? is there any reason not to define a __id__() ?
No, this behavior cannot be changed. id() is used to get "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime" (source). No other special meaning is given to this integer (in CPython it is the address of the memory location where the object is stored, but this cannot be relied upon in portable Python).
Since there is no special meaning for the return value of id(), it makes no sense to allow you to return a different value instead.
Further, while you could guarantee that id() would return unique integers for your own objects, you could not possibly satisfy the global uniqueness constraint, since your object cannot possibly have knowledge of all other living objects. It would be possible (and likely) that one of your special values clashes with the identity of another object alive in the runtime. This would not be an acceptable scenario.
If you need a return value that has some special meaning then you should define a method where appropriate and return a useful value from it.
An object isn't aware of its own name (it can have many), let alone of any unique ID it has associated with it. So - in short - no. The reasons that __len__ and co. work is that they are bound to the object already - an object is not bound to its ID.

Why do Python variables take a new address (id) every time they're modified?

Just wondering what the logic behind this one is? On the surface it seems kind of inefficient, that every time you do something simple like "x=x+1" that it has to take a new address and discard the old one.
The Python variable (called an identifier or name, in Python) is a reference to a value. The id() function says something for that value, not the name.
Many values are not mutable; integers, strings, floats all do not change in place. When you add 1 to another integer, you return a new integer that then replaces the reference to the old value.
You can look at Python names as labels, tied to values. If you imagine values as balloons, you are retying the label a new balloon each time you assign to that name. If there are no other labels attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many often used and small values, Python actually uses a process called interning, where it will cache a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the intern() function to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
Because the basic types are immutable, so every time you modify it, it needs to be instantiated again
...which is perfectly fine, especially for thread-safe functions
The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned at least in cPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
In python "primitive" types like ints and strings are immutable, which means they can not be modified.
Python is actually quite efficient, because, as #Wooble commented, «Very short strings and small integers are interned.»: if two variables reference the same (small) immutable value their id is the same (reducing duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
The reason behind the use of immutable types is a safe approach to the concurrent access on those values.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where those types that in Python are immutable, are not.
To be accurate, assignment x=x+1 doesn't modify the object that x is referencing, it just lets the x point to another object whose value is x+1.
To understand the logic behind, one needs to understand the difference between value semantics and reference semantics.
An object with value semantics means only its value matters, not its identity. While an object with reference semantics focuses on its identity(in Python, identity can be returned from id(obj)).
Typically, value semantics implies immutability of the object. Or conversely, if an object is mutable(i.e. in-place change), that means it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in-place without losing their original addresses/identities. This makes sense in that it's the identity of an object with reference semantics that makes itself distinguishable from other objects.
In contrast, an object with value-semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value(not its identity) is significant, when a change is needed, it's safe to swap it to another identity with different value. This is called referential transparency. Be noted that this is impossible for the objects with reference semantics.
Secondly, this is beneficial in practice. As the OP thought, it seems inefficient to discard the old objects each time when it's changed , but most time it's more efficient than not. For one thing, Python(or any other language) has intern/cache scheme to make less objects to be created. What's more, if objects of value-semantics were designed to be mutable, it would take much more space in most cases.
For example, Date has a value semantics. If it's designed to be mutable, any method that returning a date from internal field will exposes the handle to outside world, which is risky(e.g. outside can directly modify this internal field without resorting to public interface). Similarly, if one passes any date object by reference to some function/method, this object could be modified in that function/method, which may be not as expected. To avoid these kinds of side-effect, one has to do defensive programming: instead of directly returning the inner date field, he returns a clone of it; instead of passing by reference, he passes by value which means extra copies are made. As one could imagine, there are more chances to create more objects than necessary. What's worse, code becomes more complicated with these extra cloning.
In a word, immutability enforces the value-semantics, it usually involves less object creation, has less side-effects and less hassles, and is more test-friendly. Besides, immutable objects are inherently thread-safe, which means less locks and better efficiency in multithreading environment.
That's the reason why basic data types of value-semantics like number, string, date, time are all immutable(well, string in C++ is an exception, that's why there're so many const string& stuffs to avoid string being modified unexpectedly). As a lesson, Java made mistakes on designing value-semantic class Date, Point, Rectangle, Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects(i.e. those with reference semantics). This is a good hint to distinguish between value semantics and reference semantics.

Uniq id function in python or not?

I have several python scripts run parallel this simple code:
test_id = id('test')
Is test_id unique or not?
http://docs.python.org/library/functions.html#id
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object.
So yes, the IDs are unique.
However, since Python strings are immutable, id('test') may be the same for all strings since 'test' is 'test' is likely to be True.
What do you mean unique? Unique among what?
It is just identifier for part of memory, used by parameter's value. For immutable objects with the same value it is often the same:
>>> id('foo') == id('fo' + 'o')
True
In CPython, id is the pointer to the object in memory.
>>> a = [1,2,3]
>>> b = a
>>> id(a) == id(b)
True
So, if you have multiple references to the same object (and on some corner cases, small strings are created only once and also numbers smaller than 257) it will not be unique
It might help if you talked about what you were trying to do - it isn't really typical to use the id() builtin for anything, least of all strings, unless you really know what you're doing.
Python docs nicely describe the id() builtin function:
This is an integer (or long integer)
which is guaranteed to be unique and
constant for this object during its
lifetime. Two objects with
non-overlapping lifetimes may have the
same id() value.
As I read this, the return values of id() are really only guaranteed to be unique in one interpreter instance - and even then only if the lifetimes of the items overlap. Saving these ids for later use, sending them over sockets, etc. seems not useful. Again, I don't think this is really for people who don't know that they need it.
If you want to generate IDs which are unique across multiple program instances, you might check out the uuid module.
It also occurs to me that you might be trying to produce hashes from Python objects.
Probably there is some approach to your problem which will be cleaner than trying to use the id() function, maybe the problem needs reformulating.

Categories

Resources