In the following, I can see that when adding to an integer in Python, it computes the sum, stores the resulting value at a new memory address, and then sets the variable to point to that address:
>>> a=100
>>> id(a)
4304852448
>>> a+=5
>>> id(a)
4304852608
Is there a way to see what value is at the (old) memory address 4304852448 (0x10096d5e0)? For example: value_of(0x10096d5e0)
it adds the integer, assigns that resultant value to a new memory address
No; the object that represents the resulting value has a different memory address. Typically this object is created on the fly, but for integers specifically (which are immutable; some other types get a similar optimization) the object may already exist and be reused.
Is there a way to see what value is at the (old) memory address 4304852448 (0x10096d5e0)?
This question is not well posed. First off, "memory addresses" in this context are virtualized, possibly more than once. Second, in the Python model, addresses do not "have values". They are potentially the location of objects, which in turn represent values.
That said: at the location of the previous id(a), the old object will still be present if and only if it has not been garbage-collected. For this, it is sufficient that some other reference is held to the object. (The timing of garbage collection is implementation-defined, but the reference CPython implementation implements garbage collection by reference counting.)
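As a rough illustration of the point about references keeping an object alive, here is a minimal sketch. It assumes CPython's reference counting, and it uses a hypothetical Box class in place of int, because plain integers do not support weak references. The weak reference only observes the object; it does not keep it alive.

```python
import gc
import weakref


class Box:
    """Hypothetical stand-in for a value; unlike int, it supports weak references."""

    def __init__(self, value):
        self.value = value


a = Box(100)
keep = a                  # a second reference to the same object
probe = weakref.ref(a)    # observes the object without keeping it alive

a = Box(105)              # rebind the name; the old object survives through `keep`
alive_while_referenced = probe() is keep

del keep                  # drop the last strong reference
gc.collect()              # not needed in CPython, harmless on other implementations
alive_after_release = probe() is not None

print(alive_while_referenced, alive_after_release)
```

On CPython the old object disappears as soon as the last reference is dropped; other implementations may reclaim it later.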
You are not, in general, entitled to examine the underlying memory directly (except perhaps with some kind of process spy that has the appropriate permissions), because Python is just not that low-level of a language. (As in @xprilion's answer, the CPython implementation provides a lower-level memory interface via ctypes; however, the code there is effectively doing an unsafe cast.)
If you did (and assuming CPython), you would not see a binary representation of the integer 100 in the memory location indicated by calling id(a) the first time - you would see instead the first word of the PyObject struct used to implement objects in the C code, which would usually be the refcount, but could be a pointer to the next live heap object in a reference-tracing Python build (if I'm reading it correctly).
Once it's garbage-collected, the contents of that memory are again undefined. They depend on the implementation, and even specifically for the C implementation, they depend on what the standard library free() does with the memory. (Normally it will be left untouched, because there is no reason to zero it out and it takes time to do so. A debug build might write special values into that memory, because it helps detect memory corruption.)
You are trying to dereference the memory address, much like the pointers used in C/C++. The following code might help you -
import ctypes
a = 100
b = id(a)
print(b, ": ", a)
a+=5
print(id(a), ": ", a)
print(ctypes.cast(b, ctypes.py_object).value)
OUTPUT:
140489243334080 : 100
140489243334240 : 105
100
The above example establishes that the value stored in the previous address of a remains the same.
Hope this answers the question. Read @Karl's answer for more information about what's happening in the background - https://stackoverflow.com/a/58468810/2650341
I have some problems with SharedMemory in Python 3.8; any help would be appreciated.
Question 1.
SharedMemory has a size parameter, and the docs say its unit is bytes. I created an instance with a size of 1 byte and then wrote bytearray([1, 2, 3, 4]) into shm.buf. It works and raises no exception. Why?
Question 2.
Why does printing the buffer show a memory address?
Why does the result show 4096 bytes allocated when I set the size to 1 byte?
Why are the buffer address and the buffer[3:4] address 3 x 16 x 16 bytes apart?
Why is the buffer[3:4] address the same as the buffer[1:3] address?
from multiprocessing import shared_memory, Process

def writer(buffer):
    print(buffer) # output: <memory at 0x7f982b6234c0>
    buffer[:5] = bytearray([1, 2, 3, 4, 5])
    buffer[4] = 12
    print(buffer[3:4]) # output: <memory at 0x7f982b6237c0>
    print(buffer[1:3]) # output: <memory at 0x7f982b6237c0>

if __name__ == '__main__':
    shm = shared_memory.SharedMemory('my_memory', create=True, size=1)
    print(shm.size) # output: 4096
    writer(shm.buf)
    print('shm is :', shm) # output: shm is : SharedMemory('my_memory', size=4096)
In answer to question 2:
buffer[3:4] is not, as you seem to suppose, an array reference. It is an expression that slices buffer and produces a new, temporary memoryview object; print shows that object's address, and then the object is discarded. buffer[1:3] then does the same, and the new temporary object happens to be allocated at the same memory location as the now-discarded result of buffer[3:4], because that location had just been freed and was available for reuse.
If you don't throw away the slices after creating them, they will be allocated to different locations. Try this:
>>> b34 = buffer[3:4]
>>> b13 = buffer[1:3]
>>> b34
<memory at 0x0000024E65662940>
>>> b13
<memory at 0x0000024E65662A00>
In this case they are at different locations because there are variables that refer to them both.
And both are at different locations to buffer because they are all 3 different variables that are only related to one another by history, because of the way they were created.
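To make the point about slices concrete, here is a small sketch that uses a plain bytearray as a stand-in for the shared block (an assumption made so the example runs without creating a named shared-memory segment). Each slice is its own memoryview object, yet writes through the slices land in the same underlying bytes.

```python
data = bytearray(8)        # stand-in for the shared memory block
buffer = memoryview(data)

b34 = buffer[3:4]          # a new memoryview object covering byte 3
b13 = buffer[1:3]          # another new memoryview object covering bytes 1 and 2

# Three separate Python objects, all alive at once, so all distinct.
distinct_objects = b34 is not b13 and b34 is not buffer and b13 is not buffer

b34[0] = 12                # writes data[3]
b13[0] = 7                 # writes data[1]

print(distinct_objects, list(data))
```

The address printed for a memoryview is the address of the view object itself, not of the bytes it exposes, which is why it says nothing about where buffer[3] lives.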
Python variables are not blocks of raw memory that you can index into with a pointer, and thinking that you can access them directly as if they were C byte arrays is not helpful. What a particular Python interpreter does with your data deep down inside is generally not the business of a Python programmer. At the application level, why should anyone care where and how buffer[3:4] is stored?
There is one good reason: if you have huge amounts of data, you may need to understand the details of the implementation because you have a performance problem. But generally the solution at the application level is to use modules like array or pandas where very clever people have already thought about those problems and come up with solutions.
What you are asking is not a question about Python, but about the details of a particular CPython implementation. If you are interested in that sort of thing, and there is no reason why you should not be, you can always go and read the source. But that might be a little overambitious for the moment.
I read somewhere that in Python the id() function gives the address of the object that a variable points to. For example, with x = 5, id(x) will give the address of the object 5 and not the address of the variable x. How, then, can we find the address of the variable x?
Firstly - the id() function doesn't officially return the address, it returns a unique object identifier which is guaranteed to be unique for the lifetime of that object. It just so happens that CPython uses the address for that unique id, but it could change that definition at any time. It is of no use anyway knowing what the id() actually means - there is nothing in the Python language itself that allows objects to be accessed via their id.
You asked about the address of the variable, but in Python, variables don't have an address.
I know in languages like C, C++ etc, a named variable is simply a named location in memory into which a data item is stored.
In Python though - and certainly in CPython - variables aren't a fixed location in memory. Conceptually, every variable simply exists as a key in a namespace (a dictionary, at least at module scope) that is maintained as your code runs.
When you say
x = 5
in Python, it finds the int object 5 and then builds a key-value pair in the dictionary for the current scope. In very real terms this is equivalent to:
__dict__['x'] = 5
or something similar depending on the scope rules.
So there will be an address somewhere in memory which holds the string 'x', but that isn't the address of the variable at all.
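The dictionary view of names can be sketched with a throwaway module object. The module name "demo" is hypothetical, and this only models module-level names; CPython stores function locals differently.

```python
import types

mod = types.ModuleType("demo")   # hypothetical, empty module
namespace = mod.__dict__         # the dictionary that backs the module's names

exec("x = 5", namespace)         # run an assignment with that dictionary as its globals
x_is_a_key = "x" in namespace and namespace["x"] == 5

namespace["y"] = 7               # writing the key directly creates the name y
print(x_is_a_key, mod.x, mod.y)
```

Nothing here yields an "address of x": the name is only a key, and the value it maps to is the object.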
The python3 documentation says
id(object)
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So this is not guaranteed to be the address. (What did you want to do with the address?)
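What the documentation does guarantee can be checked without knowing what the number means. A minimal sketch: the id stays constant while an object is mutated in place, and two objects that are alive at the same time never share an id.

```python
items = [1, 2]
first_id = id(items)

items.append(3)                         # mutate in place: same object, same id
same_after_mutation = id(items) == first_id

old = items                             # keep the old object alive
items = [1, 2, 3]                       # rebind the name to a new, equal list
different_object = id(items) != id(old)

print(same_after_mutation, different_object)
```

Both results hold on any conforming implementation, whether or not the id happens to be an address.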
In CPython it just happens to be the address, because the address of an object is unique apparently and so it is an easy choice.
Generally, Python does not use pointers in the same way as C does. I recommend you to instead search for how whatever you'd like to do is generally done in python. Changing the way you think about the task is likely a more frictionless way than imposing C mentality onto Python.
In Python, every integer seems to have a 10-digit id which starts with 438. I was trying to find a number that is the same as its id. I wrote some simple code to find the number:
for i in range(4380000000,4390000000):
    if i==id(i):
        print(i)
    else:
        pass
When I ran this for the first time I got no such number. Then I ran it for the second time and still I got no number.
When I ran it for the third time, I got a number: 4384404848
Then I checked if id(4384404848)==4384404848 and I got False.
Why did Python return a number that is not equal to its id? Or did the same number have different ids while the program was running and after it had stopped?
(EDIT: The assumption “every integer seems to have a 10-digit id which starts from 438” is wrong.)
https://docs.python.org/2/library/functions.html#id
is guaranteed to be unique and constant for this object during its lifetime.
Consider id to be a unique identifier or "hash" calculated for this object. It may (and most likely will) be different each time you run your program.
Edit: Just to add, if you're using the CPython implementation (which is the most popular), then it is the address of the object in memory. That should clarify why it was not the same in different runs of the same program.
As a separate note, you should never rely on the value of the id() on any object other than its uniqueness for that given run.
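One way to see why the match from the loop could not be reproduced afterwards: in CPython, a large integer value is not tied to one object. The sketch below assumes CPython, where each call to int() on a large value builds a fresh object; the language itself does not promise this.

```python
n = int("4384404848")
m = int("4384404848")

same_value = n == m
same_object = n is m     # False on CPython: two separate int objects

print(same_value, same_object, id(n) == id(m))
```

So id(4384404848) describes whichever temporary object holds that value at that moment, and a later evaluation is free to use a different object at a different address.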
every integer seems to have a 10-digit id which starts from 438
is an incorrect assumption. On my machine:
>>> x = 5
>>> id(x)
38888712L
NO
It doesn't always start with 438
You should think of it like a unique registration number for a college student or an employee id number (but for Python objects).
Look at what the docs say
id(object)
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
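To make things clear: if several variables refer to the same object, they are all aliases for it and share one id. For small integers such as the ones below, CPython reuses a single cached object per value, so variables holding equal values end up sharing an id.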
Look at the interpreter.
>>> a=10
>>> id(10)
26775680
>>> b=20
>>> id(20)
26775440
Unique right. Now look,
>>> a=10
>>> b=10
>>> id(a)
26775680
>>> id(b)
26775680
Also look,
>>> a=10
>>> id(a)
26775680
>>> a=20
>>> b=a
>>> id(a)
26775440
>>> id(b)
26775440
So every object is assigned a unique number, and that number is nothing but its id().
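The interpreter sessions above rely on small integers, which CPython caches. For objects that are not cached, equal values do not imply a shared id. A minimal sketch with lists:

```python
a = [1, 2]
b = a            # alias: a second name for the same object
c = [1, 2]       # equal value, but a separately created object

print(a is b, id(a) == id(b))   # True True
print(a == c, a is c)           # True False
```

Sharing an id is a property of aliasing (or of caching by the implementation), not of holding the same value.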
Since OP asked!
Implementations of Python.
Meaning:
An "implementation" of Python should be taken to mean a program or environment which provides support for the execution of programs written in the Python language, as represented by the CPython reference implementation.
So what that means is that CPython is the language engine which runs Python code (the language). Why is it named CPython? To differentiate Python (the language) from CPython (the implementation).
So basically CPython is the most common Python implementation (CPython: written in C, often referred to as simply ‘Python’). The one you download from python.org is this one.
You need to distinguish between a language and an implementation. Python is a language.
According to Wikipedia,
"A programming language is a notation for writing programs, which are specifications of a computation or algorithm".
This means that it's simply the rules and syntax for writing code. Separately we have a
programming language implementation
which in most cases, is the actual interpreter or compiler.
So CPython - Implementation in C
There's Jython - Implementation in Java
IronPython - Implementation in C#
And some more. Take a look at them here Implementations. Download and mess with them to know more.
I am trying to understand gc because I have a large list in a program which I need to delete to free up some badly needed memory. The basic question I want to answer is: how can I find out what is being tracked by gc and what has been freed? The following code illustrates my problem:
import gc
old=gc.get_objects()
a=1
new=gc.get_objects()
b=[e for e in new if e not in old]
print "Problem 1: len(new)-len(old)>1 :", len(new), len(old)
print "Problem 2: none of the element in b contain a or id(a): ", a in b, id(a) in b
print "Problem 3: The reference counts are insanely high, WHY?? "
IMHO this is weird behavior that isn't addressed in the docs. For starters, why does assigning a single variable create multiple entries for the gc? And why is none of them the variable I made? Where is the entry for the variable I created in get_objects()?
EDIT: In response to martjin's first response I checked the following
a="foo"
print a in gc.get_objects()
Still no-go :( how can I check that a is being tracked by gc?
The result of gc.get_objects() is itself not tracked; it would create a circular reference otherwise:
>>> import gc
>>> print gc.get_objects.__doc__
get_objects() -> [...]
Return a list of objects tracked by the collector (excluding the list
returned).
You do not see a listed because that references one of the low-integer singletons. Python re-uses the same set of int objects for values between -5 and 256. As such, a = 1 does not create a new object to be tracked. Nor will you see any other primitive types.
CPython garbage collection only needs to track container types, that is, types that can reference other values, because the only thing that GC needs to do is break circular references.
Note that by the time any Python script starts, already some automatic code has been run. site.py sets up your Python path for example, which involves lists, mappings, etc. Then there are the memoized int values mentioned above, CPython also caches tuple() objects for re-use, etc. As a result, on start-up, easily 5k+ objects are already alive before one line of your code has started.
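To check whether a particular object is tracked, gc.is_tracked() is more direct than searching the list returned by gc.get_objects(). The sketch below is written for Python 3 on CPython (the snippets above use Python 2 print statements); the variable names are illustrative only.

```python
import gc

number = 1
text = "foo"
items = [number, text]          # a container, so the collector tracks it

tracked = (gc.is_tracked(number), gc.is_tracked(text), gc.is_tracked(items))

# Compare by identity: `items in gc.get_objects()` would compare by equality instead.
found = any(obj is items for obj in gc.get_objects())

print(tracked, found)
```

The int and the str are still reference-counted and freed normally; they are simply invisible to the cycle collector.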
Just wondering what the logic behind this one is? On the surface it seems kind of inefficient, that every time you do something simple like "x=x+1" that it has to take a new address and discard the old one.
The Python variable (called an identifier or name, in Python) is a reference to a value. The id() function says something for that value, not the name.
Many values are not mutable; integers, strings, floats all do not change in place. When you add 1 to another integer, you return a new integer that then replaces the reference to the old value.
You can look at Python names as labels, tied to values. If you imagine values as balloons, you are retying the label to a new balloon each time you assign to that name. If there are no other labels attached to a balloon anymore, it simply drifts away in the wind, never to be seen again. The id() function gives you a unique number for that balloon.
See this previous answer of mine where I talk a little bit more about that idea of values-as-balloons.
This may seem inefficient. For many often used and small values, Python actually uses a process called interning, where it will cache a stash of these values for re-use. None is such a value, as are small integers and the empty tuple (()). You can use the intern() function (sys.intern() in Python 3) to do the same with strings you expect to use a lot.
But note that values are only cleaned up when their reference count (the number of 'labels') drops to 0. Loads of values are reused all over the place all the time, especially those interned integers and singletons.
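A short sketch of string interning in Python 3, where the function lives in the sys module. The strings are built at run time so that the example does not depend on how the compiler treats literals.

```python
import sys

# Two equal strings assembled separately at run time.
left = "".join(["value", "-", "semantics"])
right = "".join(["value", "-", "semantics"])

a = sys.intern(left)
b = sys.intern(right)

print(left == right, a is b)   # True True
```

After interning, both names point at one shared object, so identity comparisons and dictionary lookups on such strings can be cheaper.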
Because the basic types are immutable, so every time you modify it, it needs to be instantiated again
...which is perfectly fine, especially for thread-safe functions
The = operator doesn't modify an object, it assigns the name to a completely different object, which may or may not already have an id.
For your example, integers are immutable; there's no way to add something to one and keep the same id.
And, in fact, small integers are interned, at least in CPython, so if you do:
x = 1
y = 2
x = x + 1
Then x and y may have the same id.
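The difference between rebinding a name and changing an object in place shows up clearly with augmented assignment. A minimal sketch comparing an int with a list:

```python
n = 1000
before = n                 # keep the original int object alive
n += 1                     # ints are immutable: n is rebound to a new object
rebound = n is not before

items = [1]
same = items
items += [2]               # lists are mutable: the existing object is extended
mutated_in_place = items is same

print(rebound, before, mutated_in_place, same)
```

For the int, the other name still sees 1000; for the list, the other name sees the change, because there was only ever one list.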
In Python, "primitive" types like ints and strings are immutable, which means they cannot be modified.
Python is actually quite efficient, because, as @Wooble commented, «Very short strings and small integers are interned.»: if two variables reference the same (small) immutable value their id is the same (reducing duplicated immutables).
>>> a = 42
>>> b = 5
>>> id(a) == id(b)
False
>>> b += 37
>>> id(a) == id(b)
True
The reason behind the use of immutable types is a safe approach to the concurrent access on those values.
At the end of the day it depends on a design choice.
Depending on your needs you can take more advantage of an implementation instead of another.
For instance, a different philosophy can be found in a somewhat similar language, Ruby, where those types that in Python are immutable, are not.
To be accurate, the assignment x=x+1 doesn't modify the object that x is referencing; it just makes x point to another object whose value is x+1.
To understand the logic behind this, one needs to understand the difference between value semantics and reference semantics.
An object with value semantics is one where only its value matters, not its identity, while an object with reference semantics is defined by its identity (in Python, the identity is what id(obj) returns).
Typically, value semantics implies immutability of the object. Conversely, if an object is mutable (i.e. it can be changed in place), that means it has reference semantics.
Let's briefly explain the rationale behind this immutability.
Objects with reference semantics can be changed in-place without losing their original addresses/identities. This makes sense in that it's the identity of an object with reference semantics that makes itself distinguishable from other objects.
In contrast, an object with value-semantics should never change itself.
First, this is possible and reasonable in theory. Since only the value (not its identity) is significant, when a change is needed, it's safe to swap the object for another one with a different identity and a different value. This is called referential transparency. Note that this is impossible for objects with reference semantics.
Secondly, this is beneficial in practice. As the OP thought, it seems inefficient to discard the old object each time it's changed, but most of the time it's more efficient than the alternative. For one thing, Python (like other languages) has an interning/caching scheme so that fewer objects are created. What's more, if objects with value semantics were designed to be mutable, they would take much more space in most cases.
For example, Date has value semantics. If it's designed to be mutable, any method that returns a date from an internal field exposes a handle to the outside world, which is risky (e.g. outside code can directly modify this internal field without going through the public interface). Similarly, if one passes a date object by reference to some function/method, the object could be modified in that function/method, which may not be what was expected. To avoid these kinds of side effects, one has to program defensively: instead of directly returning the inner date field, one returns a clone of it; instead of passing by reference, one passes by value, which means extra copies are made. As one can imagine, this tends to create more objects than necessary. What's worse, the code becomes more complicated with all this extra cloning.
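In Python terms, the date example can be sketched with datetime.date, which is immutable, and a frozen dataclass. The Event class is hypothetical and exists only for illustration. Because neither object can be changed in place, the internal field can be handed out without a defensive copy.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical value object that holds an immutable date."""
    name: str
    when: date


event = Event("launch", date(2024, 1, 1))

exposed = event.when                  # safe to hand out: a date cannot be altered
moved = exposed.replace(year=2025)    # returns a new date, the original is untouched

try:
    event.when = moved                # a frozen dataclass rejects in-place changes
    rejected = False
except FrozenInstanceError:
    rejected = True

print(event.when, moved, rejected)
```

Callers who want a different date have to build a new value, so the event's own state cannot change behind its back.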
In a word, immutability enforces value semantics; it usually involves less object creation, has fewer side effects and fewer hassles, and is more test-friendly. Besides, immutable objects are inherently thread-safe, which means fewer locks and better efficiency in a multithreaded environment.
That's the reason why basic data types with value semantics like number, string, date and time are all immutable (well, string in C++ is an exception, which is why there are so many const string& parameters to keep strings from being modified unexpectedly). As a cautionary example, Java made the mistake of designing the value-semantic classes Date, Point, Rectangle and Dimension as mutable.
As we know, objects in OOP have three characteristics: state, behavior and identity. Objects with value semantics are not typical objects in that their identities do not matter at all. Usually they are passive, and mostly used to describe other real, active objects(i.e. those with reference semantics). This is a good hint to distinguish between value semantics and reference semantics.