How do Python variables work? - python

How do variables work in Python?
I tried to understand it by assigning a value to a variable (a) and checking its memory address. But when I changed the value of that variable (a), I got a different memory address. What is the reason for that? Is that memory address in the stack area of memory, and what is its scope? When I call del a, only the identifier 'a' is deleted, but the object is still in memory. After that I called id(3); is that memory address in the code section of memory? How are Python variables stored in memory? Can anyone explain more?
Code:
#!/usr/bin/python3
import _ctypes
a=45
s=id(a)
print(s)
a=a+2
d=id(a)
print(d)
print(_ctypes.PyObj_FromPtr(s))
del a
print(_ctypes.PyObj_FromPtr(d))
print(id(3))
Output:
10915904
10915968
45
47
10914560

What you're seeing is an optimization detail of CPython (the most common Python implementation, and the one you get if you download the language from python.org).
Since small integers are used so frequently, CPython always stores the numbers -5 through 256 in memory, and uses those stored integer objects whenever those numbers come up. This means that all instances of, say, 5 will have the same memory address.
>>> a = 5
>>> b = 5
>>> id(a) == id(b)
True
>>> c = 4
>>> id(a) == id(c)
False
>>> c += 1
>>> id(a) == id(c)
True
This won't be true for other integers or non-integer values, which are only created when needed (here in the interactive interpreter, where each line is compiled separately):
>>> a = 300
>>> b = 300
>>> id(a) == id(b)
False
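To see the cache without the compiler's constant sharing getting in the way, each integer can be built at run time with int() from a string, so the two values never come from the same compiled constant. A minimal sketch, assuming CPython; the helper name shares_object is made up for illustration, and other implementations may answer differently:

```python
def shares_object(n):
    """Return True if two ints built separately at run time are one object."""
    # int() parses the text at run time, so the compiler has no chance to
    # merge the two values into a single shared constant.
    first = int(str(n))
    second = int(str(n))
    return first is second


# Probe values on both sides of the cached range -5..256.
for n in (-6, -5, 0, 256, 257, 1000):
    print(n, shares_object(n))
```

Values inside -5..256 come back as the preallocated objects; values outside it are fresh objects on every call.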

Related

Using 'is' to check int identity behaves differently in an IDE and in the terminal [duplicate]

After diving into Python's source code, I found that it maintains an array of PyInt_Objects ranging from int(-5) to int(256) (src/Objects/intobject.c).
A small experiment confirms it:
>>> a = 1
>>> b = 1
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
But if I run the same code together in a .py file (or join the statements with semicolons), the result is different:
>>> a = 257; b = 257; a is b
True
I was curious why they are still the same object, so I dug deeper into the syntax tree and the compiler and came up with the call hierarchy below:
PyRun_FileExFlags()
    mod = PyParser_ASTFromFile()
        node *n = PyParser_ParseFileFlagsEx() //source to cst
            parsetoke()
                ps = PyParser_New()
                for (;;)
                    PyTokenizer_Get()
                    PyParser_AddToken(ps, ...)
        mod = PyAST_FromNode(n, ...) //cst to ast
    run_mod(mod, ...)
        co = PyAST_Compile(mod, ...) //ast to CFG
            PyFuture_FromAST()
            PySymtable_Build()
            co = compiler_mod()
        PyEval_EvalCode(co, ...)
            PyEval_EvalCodeEx()
Then I added some debug code in PyInt_FromLong and before/after PyAST_FromNode, and executed a test.py:
a = 257
b = 257
print "id(a) = %d, id(b) = %d" % (id(a), id(b))
The output looks like:
DEBUG: before PyAST_FromNode
name = a
ival = 257, id = 176046536
name = b
ival = 257, id = 176046752
name = a
name = b
DEBUG: after PyAST_FromNode
run_mod
PyAST_Compile ok
id(a) = 176046536, id(b) = 176046536
Eval ok
This means that during the CST-to-AST transformation two different PyInt_Objects are created (this actually happens in the ast_for_atom() function), but they are merged later.
I find the source of PyAST_Compile and PyEval_EvalCode hard to follow, so I'm asking for help. I'd appreciate it if someone could give me a hint.
Python caches integers in the range [-5, 256], so integers in that range are usually but not always identical.
What you see for 257 is the Python compiler optimizing identical literals when compiled in the same code object.
When typing in the Python shell each line is a completely different statement, parsed and compiled separately, thus:
>>> a = 257
>>> b = 257
>>> a is b
False
But if you put the same code into a file:
$ echo 'a = 257
> b = 257
> print a is b' > testing.py
$ python testing.py
True
This happens whenever the compiler has a chance to analyze the literals together, for example when defining a function in the interactive interpreter:
>>> def test():
...     a = 257
...     b = 257
...     print a is b
...
>>> import dis
>>> dis.dis(test)
2 0 LOAD_CONST 1 (257)
3 STORE_FAST 0 (a)
3 6 LOAD_CONST 1 (257)
9 STORE_FAST 1 (b)
4 12 LOAD_FAST 0 (a)
15 LOAD_FAST 1 (b)
18 COMPARE_OP 8 (is)
21 PRINT_ITEM
22 PRINT_NEWLINE
23 LOAD_CONST 0 (None)
26 RETURN_VALUE
>>> test()
True
>>> test.func_code.co_consts
(None, 257)
Note how the compiled code contains a single constant for the 257.
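The same effect can be reproduced from Python 3 with the built-in compile() and exec(), which let you choose exactly what gets compiled together. A minimal sketch, assuming CPython 3.x; the helper run() is just for illustration, and the exact contents of co_consts vary between versions:

```python
def run(source):
    """Compile source into one code object, execute it, return its globals."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


# Both assignments are compiled together, so they share one 257 constant.
together = run("a = 257\nb = 257")
print(together["a"] is together["b"])

# Floats have no cache, yet the same constant sharing applies.
floats = run("x = 5.0\ny = 5.0")
print(floats["x"] is floats["y"])

# The constant table of the compiled module lists 257 only once.
code = compile("a = 257\nb = 257", "<example>", "exec")
print(code.co_consts)
```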
In conclusion, the Python bytecode compiler is not able to perform massive optimizations (like the compilers of statically typed languages), but it does more than you think. One of these things is to analyze the use of literals and avoid duplicating them.
Note that this has nothing to do with the cache, because it also works for floats, which do not have a cache:
>>> a = 5.0
>>> b = 5.0
>>> a is b
False
>>> a = 5.0; b = 5.0
>>> a is b
True
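For more complex literals, like tuples, it "doesn't work" (at least in the Python 2 used here; newer CPython versions fold constant tuples and may share them as well):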
>>> a = (1,2)
>>> b = (1,2)
>>> a is b
False
>>> a = (1,2); b = (1,2)
>>> a is b
False
But the literals inside the tuple are shared:
>>> a = (257, 258)
>>> b = (257, 258)
>>> a[0] is b[0]
False
>>> a[1] is b[1]
False
>>> a = (257, 258); b = (257, 258)
>>> a[0] is b[0]
True
>>> a[1] is b[1]
True
(Note that constant folding and the peephole optimizer can change behaviour even between bugfix versions, so which examples return True or False is basically arbitrary and will change in the future).
As for why you see two PyInt_Objects being created, I'd guess this is done to avoid comparing literals. For example, the number 257 can be written as several different literals:
>>> 257
257
>>> 0x101
257
>>> 0b100000001
257
>>> 0o401
257
The parser has two choices:
1. Convert the literals to some common base before creating the integers, check whether they are equivalent, and then create a single integer object.
2. Create the integer objects and check whether they are equal. If they are, keep only a single value and assign it to all the literals; otherwise, you already have the integers to assign.
The Python parser probably uses the second approach, which avoids rewriting the conversion code and is also easier to extend (for example, it works for floats as well).
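A quick check in Python 3 is consistent with that guess: the four spellings above collapse into one constant once the compiler sees them together. This assumes CPython 3.x and is an illustration, not proof of how the parser itself is written:

```python
# Four different spellings of the same integer, compiled as one unit.
source = "a = 257\nb = 0x101\nc = 0b100000001\nd = 0o401"
code = compile(source, "<literals>", "exec")

namespace = {}
exec(code, namespace)

# Every spelling is parsed into an equal int value ...
print(namespace["a"] == namespace["b"] == namespace["c"] == namespace["d"])
# ... and the compiler keeps a single 257 in the constant table,
# so all four names end up bound to the same object.
print(code.co_consts.count(257))
print(namespace["a"] is namespace["d"])
```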
Reading the Python/ast.c file, the function that parses all numbers is parsenumber, which calls PyOS_strtoul to obtain the integer value (for integers) and eventually calls PyLong_FromString:
x = (long) PyOS_strtoul((char *)s, (char **)&end, 0);
if (x < 0 && errno == 0) {
    return PyLong_FromString((char *)s,
                             (char **)0,
                             0);
}
As you can see, the parser does not check whether it has already found an integer with the given value, which explains why you see two int objects being created.
It also means that my guess was correct: the parser first creates the constants and only afterwards optimizes the bytecode to use the same object for equal constants.
The code that does this check must be somewhere in Python/compile.c or Python/peephole.c, since these are the files that transform the AST into bytecode.
In particular, the compiler_add_o function seems to be the one that does it. There is this comment in compiler_lambda:
/* Make None the first constant, so the lambda can't have a
   docstring. */
if (compiler_add_o(c, c->u->u_consts, Py_None) < 0)
    return 0;
So it seems like compiler_add_o is used to insert constants for functions/lambdas etc.
The compiler_add_o function stores the constants into a dict object, and from this immediately follows that equal constants will fall in the same slot, resulting in a single constant in the final bytecode.
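The idea can be sketched in a few lines of Python. This is a hypothetical, simplified model (the ConstTable class is made up for illustration and is not CPython's C code): a dict maps each constant to its slot, so adding an equal constant twice returns the existing slot. The key pairs the value with its type, as CPython's compiler also does, because 1, 1.0 and True compare equal and hash alike but must stay distinct constants:

```python
class ConstTable:
    """Hypothetical dict-backed constant table (simplified, not CPython's)."""

    def __init__(self):
        self.index_of = {}  # (type, value) key -> slot number
        self.consts = []    # slot number -> constant object

    def add(self, value):
        # Pair the value with its type so 1, 1.0 and True get separate slots.
        # Simplification: CPython adds extra special cases, e.g. to keep
        # 0.0 and -0.0 apart, which this key would merge.
        key = (type(value), value)
        if key not in self.index_of:
            self.index_of[key] = len(self.consts)
            self.consts.append(value)
        return self.index_of[key]


table = ConstTable()
slots = [table.add(v) for v in (257, 257, 1, 1.0, True, None)]
print(slots)
print(table.consts)
```

Both 257 literals land in slot 0, which is why the bytecode ends up loading one shared object for them.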

Python, variable storage in memory

>>> a=[1234,1234] #list
>>> a
[1234, 1234]
>>> id(a[0])
38032480
>>> id(a[1])
38032480
>>> b=1234 #b is a variable of integer type
>>> id(b)
38032384
Why is id(b) not the same as id(a[0]) and id(a[1]) in Python?
When the CPython REPL executes a line, it will:
parse, and compile it to a code object of bytecode, and then
execute the bytecode.
The compilation result can be checked through the dis module:
>>> dis.dis('a = [1234, 1234, 5678, 90123, 5678, 4321]')
1 0 LOAD_CONST 0 (1234)
2 LOAD_CONST 0 (1234)
4 LOAD_CONST 1 (5678)
6 LOAD_CONST 2 (90123)
8 LOAD_CONST 1 (5678)
10 LOAD_CONST 3 (4321)
12 BUILD_LIST 6
14 STORE_NAME 0 (a)
16 LOAD_CONST 4 (None)
18 RETURN_VALUE
Note that all 1234s are loaded with "LOAD_CONST 0", and all 5678s are loaded with "LOAD_CONST 1". These refer to the constant table associated with the code object. Here, the table is (1234, 5678, 90123, 4321, None).
The compiler knows that all the copies of 1234 in the code object are the same, so it allocates only one object for all of them.
Therefore, as OP observed, a[0] and a[1] do indeed refer to the same object: the same constant from the constant table of the code object of that line of code.
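To mimic the REPL from a script, each "line" can be compiled separately with compile() and run with exec(). A minimal sketch, assuming CPython; the "<line 1>" and "<line 2>" filenames are arbitrary labels:

```python
namespace = {}

# One REPL line = one code object: both 1234 literals share one constant.
exec(compile("a = [1234, 1234]", "<line 1>", "exec"), namespace)
a = namespace["a"]
print(a[0] is a[1])

# A second line is compiled on its own, so in CPython it usually gets its
# own 1234 object. This is an implementation detail, not a guarantee.
exec(compile("b = 1234", "<line 2>", "exec"), namespace)
print(namespace["b"] is a[0])
```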
When you execute b = 1234, this will again be compiled and executed, independent of the previous line, so a different object will be allocated.
(You may read http://akaptur.com/blog/categories/python-internals/ for a brief introduction for how code objects are interpreted)
Outside of the REPL, when you execute a *.py file, each function is compiled into a separate code object, so when we run:
a = [1234, 1234]
b = 1234
print(id(a[0]), id(a[1]))
print(id(b))
a = (lambda: [1234, 1234])()
b = (lambda: 1234)()
print(id(a[0]), id(a[1]))
print(id(b))
We may see something like:
4415536880 4415536880
4415536880
4415536912 4415536912
4415537104
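The first three numbers share the same address 4415536880, and they belong to the constants of the "__main__" code object.
Then a[0] and a[1] have address 4415536912, a constant of the first lambda.
b has address 4415537104, a constant of the second lambda.
(This output comes from an older CPython. Starting with 3.8, the compiler merges equal constants across all code objects compiled from the same source, so newer versions may print the same address on all four lines.)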
Also note that this result is valid for CPython only. Other implementations have different strategies on allocating constants. For instance, running the above code in PyPy gives:
19745 19745
19745
19745 19745
19745
There is no rule or guarantee stating that id(a[0]) should equal id(a[1]), so the question itself is moot. The question you should be asking is why id(a[0]) and id(a[1]) are in fact the same.
If you do a.append(1234) followed by id(a[2]) you may or may not get the same id. As @hiro protagonist has pointed out, these are just internal optimizations that you shouldn't depend upon.
A Python list is very much unlike a C array.
A C array is just a block of contiguous memory, so the address of its first (0-th) element is the address of the array itself, by definition. Array access in C is just pointer arithmetic, and the [] notation is just a thin layer of syntactic sugar over that pointer arithmetic. In a function parameter list, int x[] is just another way of writing int *x.
For the sake of the example, let's assume that in Python, id(x) is the "memory address of x", as &x would be in C. (This is not true for all Python implementations, and not even guaranteed in CPython. It's just a unique number.)
In C, an int is just an architecture-dependent number of bytes, so for int x = 1 the expression &x points to these bytes.
Everything in Python is an object, including numbers. This is why id(1) refers to an object of type int describing number 1. You can call its methods: (1).__str__() will return a string '1'.
So, when you have x = [1, 2, 3], id(x) is a "pointer" to a list object with three elements. The list object itself is pretty complex. But x[0] is not the bytes that comprise the integer value 1; it's internally a reference to an int object for number 1. Thus id(x[0]) is a "pointer" to that object.
In C terms, the elements of the array could be seen as pointers to the objects stored in it, not the objects themselves.
Since there's no point in having two objects representing the same number 1, CPython keeps a single one, so id(1) is always the same during one interpreter run. An illustration:
x = [1, 2, 3]
y = [1, 100, 1000]
assert id(x) != id(y) # obviously
assert id(x[0]) == id(y[0]) == id(1) # yes, the same int object
CPython actually preallocates objects for a few most-used small numbers (see comments here). For larger numbers, it's not so, which can lead to two 'copies' of a larger number having different id() values.
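Because list slots hold references, the same object can sit in several slots at once, and mutating it through one slot is visible through all of them. A small sketch (the names are arbitrary) that also shows the common [[0]] * 3 pitfall, which follows from the same rule:

```python
inner = [0]
outer = [inner, inner]  # two slots, one referenced list object

# Both slots refer to the same object ...
print(outer[0] is outer[1])
# ... so a change made through one slot is visible through the other.
outer[0].append(1)
print(outer)
print(inner)

# [[0]] * 3 copies the reference three times, not the inner list.
grid = [[0]] * 3
grid[0][0] = 9
print(grid)
```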
Note that id() gives the identity of the object that a variable or literal refers to. For every object used in your program (even one created inside the id() call itself), id() returns an identifier that is unique among the objects alive at that moment; once an object is freed, its id may be reused by a new object. This can be used by:
The user: to check whether two names refer to the same object, as in a is b
Python: to save memory, i.e. to avoid keeping unnecessary duplicates of the same value
In your case, it isn't even guaranteed that a[0] and a[1] have the same id, even though their values are equal. It depends on the order in which literals and objects are created during the program's lifetime, which Python handles internally.
Case 1:
Type "help", "copyright", "credits" or "license" for more information.
>>> a=[1234,1234]
>>> id(a[0])
52687424
>>> id(a[1])
52687424
Case 2 (note that at the end of this case, a[0] and a[1] have the same value but different ids):
Type "help", "copyright", "credits" or "license" for more information.
>>> a=[1,1234]
>>> id(1)
1776174736
>>> id(1234)
14611088
>>> id(a[0])
1776174736
>>> id(a[1])
14611008
>>> a[0]=1234
>>> id(1234)
14611104
>>> id(a[0])
14611152
>>> id(a[1])
14611008
>>>

Python object references

I'm aware that in Python every identifier or variable name is a reference to the actual object.
a = "hello"
b = "hello"
When I compare the two strings
a == b
the output is
True
If I wrote equivalent code in Java, the output would be false, because the comparison is between the references (which are different), not the actual objects.
So what I see here is that the interpreter replaces the references (variable names) with the actual objects at run time.
So, is it safe for me to assume that "every time the interpreter sees an already assigned variable name, it replaces it with the object it refers to"? I googled it but couldn't find the answer I was looking for.
If you actually ran that in Java, you'd probably find it prints true because of string interning, but that's somewhat irrelevant.
I'm not sure what you mean by "replaces it with the object it is referring to". What actually happens is that when you write a == b, Python calls a.__eq__(b), which is just like any other method call on a with b as an argument.
If you want an equivalent to Java-like ==, use the is operator: a is b. That compares whether the name a refers to the same object as b, regardless of whether they compare as equal.
Python interning:
>>> a = "hello"
>>> b = "hello"
>>> c = "world"
>>> id(a)
4299882336
>>> id(b)
4299882336
>>> id(c)
4299882384
Short strings tend to get interned automatically, which explains why a is b evaluates to True. See here for more.
To show that equal strings don't always have the same id:
>>> a = "hello"+" world"
>>> b = "hello world"
>>> c = a
>>> a == b
True
>>> a is b
False
>>> b is c
False
>>> a is c
True
also:
>>> str([]) == str("[]")
True
>>> str([]) is str("[]")
False
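Interning can also be requested explicitly with sys.intern() from the standard library. A minimal sketch, assuming CPython, where strings assembled at run time are normally not interned:

```python
import sys

# Build equal strings at run time so the compiler cannot share one constant.
parts = ["hello", " world"]
a = "".join(parts)
b = "".join(parts)
print(a == b)  # compares contents
print(a is b)  # compares identity of two separately built objects

# sys.intern() returns one canonical object per distinct string value,
# so after interning both names refer to the same object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)
```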

assign references

Is there a way to assign references in python?
For example, in php i can do this:
$a = 10;
$b = &$a;
$a = 20;
echo $a." ".$b; // 20, 20
how can i do same thing in python?
In Python, if you do this with mutable types such as dicts and lists, it works exactly as you want: assignment binds references. That's why, when you run the following:
>>> a = {'key' : 'value'}
>>> b = a
>>> b['key'] = 'new-value'
>>> print a['key']
you get 'new-value'.
Strictly speaking, if you do the following:
>>> a = 5
>>> b = a
>>> print id(a) == id(b)
you'll get True.
But because ints are immutable, you can't change the value of the object b refers to. You can only create a new object with a new value, based on b, and bind b to it. For example, if you do the following:
>>> print id(b)
>>> b = b + 1
>>> print id(b)
you'll get two different values.
This means that Python created a new object, computed its value based on b's value, and then gave this new object the name b. This applies to all immutable types. Connecting the two previous examples:
>>> a = 5
>>> b = a
>>> print id(a)==id(b)
True
>>> b += 1
>>> print id(b)==id(a)
False
So when you assign in Python, you always assign a reference. But some types cannot be changed, so when you "change" them, you actually create a new object and bind the name to it.
In Python, everything is by default a reference. So when you do something like:
x=[1,2,3]
y=x
x[1]=-1
print y
It prints [1,-1,3].
The reason this does not work when you do
x=1
y=x
x=-1
print y
is that ints are immutable. They cannot be changed. Think about it, does a number really ever change? When you assign a new value to x, you are assigning a new value - not changing the old one. So y still points to the old one. Other immutable types (e.g. strings and tuples) behave in the same way.
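Python has no direct equivalent of PHP's &, but a common workaround follows from the answers above: share one mutable object and change its contents instead of rebinding the name. A minimal sketch; the Ref class is a hypothetical helper, not a standard library type (a one-element list works the same way):

```python
class Ref:
    """Hypothetical mutable box: every name bound to it sees one value."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Ref({self.value!r})"


a = Ref(10)
b = a          # b names the same Ref object, similar to $b = &$a in PHP
a.value = 20   # mutate the shared box instead of rebinding a
print(a.value, b.value)
```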

How is the 'is' keyword implemented in Python?

... the is keyword that can be used for equality in strings.
>>> s = 'str'
>>> s is 'str'
True
>>> s is 'st'
False
I tried both __is__() and __eq__() but they didn't work.
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __is__(self, s):
...         return self.s == s
...
>>>
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work
False
>>>
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __eq__(self, s):
...         return self.s == s
...
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work, but again failed
False
>>>
Testing strings with is only works when the strings are interned. Unless you really know what you're doing and explicitly interned the strings you should never use is on strings.
is tests for identity, not equality. That means Python simply compares the memory addresses the objects reside at. is basically answers the question "Do I have two names for the same object?", so overloading it would make no sense.
For example, ("a" * 100) is ("a" * 100) evaluated to False when this was written (newer CPython versions may fold both sides into one shared constant and return True). Usually Python puts each string in a different memory location; interning mostly happens for string literals.
The is operator is equivalent to comparing id(x) values. For example:
>>> s1 = 'str'
>>> s2 = 'str'
>>> s1 is s2
True
>>> id(s1)
4564468760
>>> id(s2)
4564468760
>>> id(s1) == id(s2) # equivalent to `s1 is s2`
True
In CPython, id() currently returns the object's memory address, and is compares those pointers directly. So you can't overload is itself, and AFAIK you can't overload id either.
So, you can't. Unusual in Python, but there it is.
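A small sketch showing that a method named __is__ is simply ignored: it is an ordinary method that the is operator never consults. operator.is_ is the standard library's function form of the same identity test:

```python
import operator


class MyString:
    def __init__(self):
        self.s = "string"

    def __is__(self, other):
        # "is" has no special method, so Python never calls this.
        return self.s == other


m = MyString()
target = "string"

print(m is target)         # plain identity test; __is__ is not consulted
print(operator.is_(m, m))  # function form of "m is m"
print(id(m) == id(m))      # same identity check via id() on a live object
```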
The Python is keyword tests object identity. You should NOT use it to test for string equality. It may seem to work frequently because Python implementations, like those of many very high-level languages, perform "interning" of strings. That is to say, string literals (and some other strings) are internally kept in a hashed table, and identical ones are rendered as references to the same object. (This is possible because Python strings are immutable.)
However, as with any implementation detail, you should not rely on this. If you want to test for equality use the == operator. If you truly want to test for object identity then use is --- and I'd be hard-pressed to come up with a case where you should care about string object identity. Unfortunately you can't count on whether two strings are somehow "intentionally" identical object references because of the aforementioned interning.
The is keyword compares objects (or, rather, compares if two references are to the same object).
Which is, I think, why there's no mechanism to provide your own implementation.
It happens to work sometimes on strings because Python stores strings 'cleverly', so that two identical strings are often stored as one object.
>>> a = "string"
>>> b = "string"
>>> a is b
True
>>> c = "str"+"ing"
>>> a is c
True
You can hopefully see the reference vs data comparison in a simple 'copy' example:
>>> a = {"a":1}
>>> b = a
>>> c = a.copy()
>>> a is b
True
>>> a is c
False
If you are not afraid of messing with bytecode, you can intercept and patch COMPARE_OP with argument 8 ("is") to call your own hook on the objects being compared (in CPython 3.9 and later, is compiles to a separate IS_OP instruction instead). The dis module documentation is a good place to start.
And don't forget to intercept __builtin__.id() too (builtins.id() in Python 3), in case someone writes id(a) == id(b) instead of a is b.
'is' compares object identity whereas == compares values.
Example:
a=[1,2]
b=[1,2]
#a==b returns True
#a is b returns False
p=q=[1,2]
#p==q returns True
#p is q returns True
is fails when comparing a string variable to a string value, and when comparing two string variables, if the string starts with '-'. My Python version is 2.6.6.
>>> s = '-hi'
>>> s is '-hi'
False
>>> s = '-hi'
>>> k = '-hi'
>>> s is k
False
>>> '-hi' is '-hi'
True
You can't overload the is operator. What you want to overload is the == operator. This can be done by defining a __eq__ method in the class.
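Applied to the class from the question, a minimal sketch of the __eq__ approach, assuming the goal is to compare a MyString with plain str values (or another MyString) by content:

```python
class MyString:
    def __init__(self, s="string"):
        self.s = s

    def __eq__(self, other):
        # Compare by content against plain str or another MyString.
        if isinstance(other, MyString):
            return self.s == other.s
        if isinstance(other, str):
            return self.s == other
        return NotImplemented  # let Python try the other operand

    def __hash__(self):
        # Defining __eq__ sets __hash__ to None; hash by content so that
        # objects comparing equal also hash equally.
        return hash(self.s)


m = MyString()
other = "string"
print(m == "string")  # MyString.__eq__ is called
print("string" == m)  # str.__eq__ returns NotImplemented, so m.__eq__ is tried
print(m == "ss")
print(m is other)     # identity is unaffected by __eq__
```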
You are using identity comparison. == is probably what you want. The exception is when you want to check whether one item and another are the EXACT same object at the same memory location. In your examples, the items aren't the same, since one is of a different type (MyString) than the other (str). Also, there's no such thing as someclass.__is__ in Python (unless, of course, you put it there yourself). If there were, is could no longer be relied on to simply compare memory locations.
When I first encountered the is keyword, it confused me as well. I would have thought that is and == were no different, since they produced the same output from the interpreter on many objects. This type of assumption is actually EXACTLY what is... is for. It's the Python equivalent of "Hey, don't mistake these two objects, they're different.", which is essentially what [whoever it was that straightened me out] said. Worded much differently, but one point == the other point.
For some helpful examples and text on the sometimes confusing differences, visit a document on python.org's mail host written by Danny Yoo,
or, if that's offline, use the unlisted pastebin I made of its body.
In case both are down in some 20 or so blue moons (blue moons are a real event), I'll quote the code examples:
###
>>> my_name = "danny"
>>> your_name = "ian"
>>> my_name == your_name
0 #or False
###
###
>>> my_name[1:3] == your_name[1:3]
1 #or True
###
###
>>> my_name[1:3] is your_name[1:3]
0
###
Assertion errors can easily arise with the is keyword when comparing objects. For example, objects a and b might hold the same value without sharing the same memory address. In that case,
>>> a == b
is going to evaluate to
True
But if
>>> a is b
evaluates to
False
you should probably check
>>> type(a)
and
>>> type(b)
These might be different and a reason for failure.
Because of string interning, this can look strange:
a = 'hello'
'hello' is a #True
b= 'hel-lo'
'hel-lo' is b #False
