Update ctypes pointer in place - python

For various reasons I would like to update the value of a ctypes pointer in place. In other words, what I want is to modify the internal buffer the pointer object wraps. Here is one possible approach:
from ctypes import *
a = pointer(c_int(123))
b = pointer(c_int(456))
memmove(addressof(a), addressof(b), sizeof(c_void_p))
a._objects.clear()
a._objects.update(b._objects)
Now a.contents returns c_long(456). However, playing with the _objects attribute seems to lean too heavily on implementation details (will this even behave correctly?). Is there a more idiomatic way to do this?

Since eryksun hasn't posted his answer I'll add it here myself. This is how it should be done:
from ctypes import *
a = pointer(c_int(123))
b = pointer(c_int(456))
tmp = pointer(a)[0]
tmp.contents = b.contents
Now a.contents is c_int(456). The key is that tmp shares its buffer with a (which is why you'll find tmp._b_needsfree_ == 0).
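As a quick check of what the trick buys you, here is a minimal, self-contained sketch: after the retarget, a and b point at the same c_int, so a write through one is visible through the other.
from ctypes import c_int, pointer

a = pointer(c_int(123))
b = pointer(c_int(456))

tmp = pointer(a)[0]         # shares its buffer with a
tmp.contents = b.contents   # retarget a in place

print(a.contents.value)     # 456
a.contents.value = 789      # write through a...
print(b.contents.value)     # 789: ...and b sees it, since both point at the same c_int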

A pointer is a variable that holds a memory address, so the call memmove(addressof(a), addressof(b), ...) copies the address held by b into a; a now points at the same memory location that b points to. If that is what you want, you're done.
If what you want is to set the value of the integer pointed to by a to the same value as that of the integer pointed to by b, then what you need is to copy the contents of the memory at the address pointed to by b into the memory pointed to by a. Like so...
memmove(cast(a, c_void_p).value, cast(b, c_void_p).value, sizeof(c_int))
Now the pointer a still points at its own memory location, but that location holds the same value as the memory pointed to by b.
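A minimal sketch contrasting the two memmove calls discussed above (variable names mirror the question; the comments show what I would expect on CPython):
from ctypes import c_int, c_void_p, pointer, cast, memmove, addressof, sizeof

# Variant 1: copy the address, so a is retargeted at b's int.
a = pointer(c_int(123))
b = pointer(c_int(456))
memmove(addressof(a), addressof(b), sizeof(c_void_p))
print(addressof(a.contents) == addressof(b.contents))   # True: same target int

# Variant 2: copy the value, so a keeps its own int, which now holds 456.
a = pointer(c_int(123))
b = pointer(c_int(456))
memmove(cast(a, c_void_p).value, cast(b, c_void_p).value, sizeof(c_int))
print(a.contents.value)                                  # 456
print(addressof(a.contents) == addressof(b.contents))    # False: still two distinct ints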
The _objects attribute is not necessary in either case (IMHO).
Sorry for the boring post, but that, I'm afraid, is the (only?) way to do pointers :)

How does SharedMemory in Python define its size?

I have a problem with SharedMemory in Python 3.8; any help would be appreciated.
Question 1.
SharedMemory has a size parameter, and the docs say the unit is bytes. I created an instance with a size of 1 byte and then wrote a bytearray several bytes long into shm.buf (see the code below). It works without raising any exception. Why?
Question 2.
Why does printing buffer show a memory address?
Why, when I set the size to 1 byte, does the result show 4096 bytes allocated?
Why are the addresses of buffer and buffer[3:4] 3×16×16 (0x300) bytes apart?
Why is the address of buffer[3:4] the same as the address of buffer[1:3]?
from multiprocessing import shared_memory, Process

def writer(buffer):
    print(buffer)  # output: <memory at 0x7f982b6234c0>
    buffer[:5] = bytearray([1, 2, 3, 4, 5])
    buffer[4] = 12
    print(buffer[3:4])  # output: <memory at 0x7f982b6237c0>
    print(buffer[1:3])  # output: <memory at 0x7f982b6237c0>

if __name__ == '__main__':
    shm = shared_memory.SharedMemory('my_memory', create=True, size=1)
    print(shm.size)  # output: 4096
    writer(shm.buf)
    print('shm is :', shm)  # output: shm is : SharedMemory('my_memory', size=4096)
In answer to question 2:
buffer[3:4] is not, as you seem to suppose, an array reference. It is an expression that creates a new, unnamed memoryview object as a slice of buffer; your function prints it (showing its address), then throws it away. Then buffer[1:3] does something similar, and the new unnamed object coincidentally ends up at the same memory location as the now-discarded buffer[3:4] slice, because Python's garbage collection knew that location was free.
If you don't throw away the slices after creating them, they will be allocated to different locations. Try this:
>>> b34 = buffer[3:4]
>>> b13 = buffer[1:3]
>>> b34
<memory at 0x0000024E65662940>
>>> b13
<memory at 0x0000024E65662A00>
In this case they are at different locations because there are variables that refer to them both.
And both are at different locations to buffer because they are all 3 different variables that are only related to one another by history, because of the way they were created.
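To see the same effect without shared memory, here is a minimal sketch using a plain bytearray (shm.buf is also a memoryview, so the behaviour should be analogous):
buf = memoryview(bytearray(16))

s1 = buf[3:4]              # each slice creates a brand-new memoryview object
s2 = buf[3:4]
print(s1 is s2)            # False: two distinct Python objects
print(id(s1) == id(s2))    # False, because both are still alive here
s1[0] = 42
print(buf[3])              # 42: both slices still view the same underlying bytes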
Python variables are not blocks of raw memory that you can index into with a pointer, and thinking that you can access them directly as if they were C byte arrays is not helpful. What a particular Python interpreter does with your data deep down inside is generally not the business of a Python programmer. At the application level, why should anyone care where and how buffer[3:4] is stored?
There is one good reason: if you have huge amounts of data, you may need to understand the details of the implementation because you have a performance problem. But generally the solution at the application level is to use modules like array or pandas where very clever people have already thought about those problems and come up with solutions.
What you are asking is not a question about Python, but about the details of a particular CPython implementation. If you are interested in that sort of thing, and there is no reason why you should not be, you can always go and read the source. But that might be a little overambitious for the moment.

Really simple question about reference of list in python

I have a really simple question about references in python.
I assume you are familiar with this:
aa = [1,2]
bb = aa
aa[0] = 100
print(bb)
As you might guess, the output will be
[100, 2]
and it's totally OK ✔
Let's do another example:
l = [[],[],[]]
a = l[0]
l[0] = [1,2]
print(a)
But here the output is:
[]
I know why that happened.
It's because, on line 3, we made an entirely different list and replaced l[0] with it (not modifying l[0], but replacing it).
Now my question is: "Can I somehow replace l[0] with [1,2] and also keep a as a reference?"
P.S not like l[0].append(1,2)
Short answer: Not really, but there might be something close enough.
The problem is, under the covers, a is a pointer-to-a-list, and l is a pointer-to-a-list-of-pointers-to-lists. When you write a = l[0], what that actually translates to at the CPU is "dereference the pointer l, treat the resulting region of memory as a list object, get the first object (which will be the address of another list), and set the value of pointer a to that address". Once you've done that, a and l[0] are only related by coincidence; they are two separate pointers that happen, for the moment, to point at the same object. If you assign to either variable, you're changing the value of a pointer, not the contents of the pointed-to object.
Broadly speaking, there are a few ways the computer could practically do what you ask.
Modify the pointed-to object (list) without modifying either pointer. That's what the append function does, along with the many other mutators of Python lists. If you want to do this in a way that perhaps more clearly expresses your intent, you could do l[0][:] = [1,2]. That's a list copy operation, copying into the object pointed to by both l[0] and a (see the sketch after this list). This is your best bet as a developer, though note that copy operations are O(n).
Implement a as a pointer-to-a-pointer-to-a-list that is automatically dereferenced (to merely a pointer-to-list) when accessed. This is not, AFAIK, something Python provides any support for; almost no language does. In C you could say list ** a = &(l[0]); but then any time you want to actually do anything with a you'd have to use *a instead.
Tell the interpreter to observe that a is an alias to l[0], rather than its own, separate variable. As far as I know, Python doesn't support this either. In C, you could do it as #define a (l[0]) though you'd want to #undef a when it went out of scope.
Rather than making a a list variable (which is implemented as a pointer-to-list), make it a function: a = lambda: l[0]. This means you have to use a() instead of a anywhere you want to get the actual content of l[0], and you can't assign to l[0] through a() (or through a directly). But it does work, in Python. You could even go so far as to use properties, which would let you skip the parentheses and assign through a, but at the cost of writing a bunch more code to wrap the lists (I'm not aware of a way to attach properties to lists directly, though one might exist, so you'd instead have to create a new object wrapping the list).
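A small sketch of options 1 and 4 above (illustrative only):
# Option 1: mutate the list object in place via slice assignment.
l = [[], [], []]
a = l[0]
l[0][:] = [1, 2]   # copies into the existing list; a is still the same object
print(a)           # [1, 2]

# Option 4: make a a callable that looks up l[0] on every access.
l = [[], [], []]
a = lambda: l[0]
l[0] = [1, 2]      # rebinds l[0]; a() re-reads it each time
print(a())         # [1, 2]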
If you had written a = l, then a[0] would yield [1,2]. As it is, you are replacing the list that "a" was referencing, so the connection between a and l[0] is lost. You need to either make a = l and use a[0], or reset the reference by assigning a = l[0] again.
The original list at l[0] and the [1,2] list are two different objects. There is no way to do that short of re-assigning the a variable to l[0] again once the change has been made.

Possible to get "value of address" in python?

In the following, I can see that when adding an integer in python, it adds the integer, assigns that resultant value to a new memory address and then sets the variable to point to that memory address:
>>> a=100
>>> id(a)
4304852448
>>> a+=5
>>> id(a)
4304852608
Is there a way to see what the value at the (old) memory address 4304852448 (0x10096d5e0) is? For example: value_of(0x10096d5e0)
it adds the integer, assigns that resultant value to a new memory address
No; the object that represents the resulting value has a different memory address. Typically this object is created on the fly, but specifically for integers (which are immutable, and for some other types applying a similar optimization) the object may already exist and be reused.
Is there a way to see what the value at the (old) memory address 4304852448 (0x10096d5e0) is?
This question is not well posed. First off, "memory addresses" in this context are virtualized, possibly more than once. Second, in the Python model, addresses do not "have values". They are potentially the location of objects, which in turn represent values.
That said: at the location of the previous id(a), the old object will still be present if and only if it has not been garbage-collected. For this, it is sufficient that some other reference is held to the object. (The timing of garbage collection is implementation-defined; the CPython reference implementation reclaims most objects promptly via reference counting.)
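For example (a minimal sketch, using 1000 rather than 100 to stay clear of CPython's small-integer cache):
a = 1000
keep = a                   # hold a second reference to the original object
old_id = id(a)
a += 5                     # rebinds a to a different int object
print(id(a) == old_id)     # False: a now names a new object
print(id(keep) == old_id)  # True: the old object is still alive at that address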
You are not, in general, entitled to examine the underlying memory directly (except perhaps with some kind of process spy that has the appropriate permissions), because Python is just not that low-level of a language. (As in #xprilion's answer, the CPython implementation provides a lower-level memory interface via ctypes; however, the code there is effectively doing an unsafe cast.)
If you did (and assuming CPython), you would not see a binary representation of the integer 100 in the memory location indicated by calling id(a) the first time - you would see instead the first word of the PyObject struct used to implement objects in the C code, which would usually be the refcount, but could be a pointer to the next live heap object in a reference-tracing Python build (if I'm reading it correctly).
Once it's garbage-collected, the contents of that memory are again undefined. They depend on the implementation, and even specifically for the C implementation, they depend on what the standard library free() does with the memory. (Normally it will be left untouched, because there is no reason to zero it out and it takes time to do so. A debug build might write special values into that memory, because it helps detect memory corruption.)
You are trying to dereference the memory address, much like the pointers used in C/C++. The following code might help you -
import ctypes
a = 100
b = id(a)
print(b, ": ", a)
a+=5
print(id(a), ": ", a)
print(ctypes.cast(b, ctypes.py_object).value)
OUTPUT:
140489243334080 : 100
140489243334240 : 105
100
The above example establishes that the value stored in the previous address of a remains the same.
Hope this answers the question. Read #Karl's answer for more information about what's happening in the background - https://stackoverflow.com/a/58468810/2650341

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?

Why should I refer to "names" and "binding" in Python instead of "variables" and "assignment"?
I know this question is a bit general but I really would like to know :)
In C and C++, a variable is a named memory location. The value of the variable is the value stored in that location. Assign to the variable and you modify that value. So the variable is the memory location, not the name for it.
In Python, a variable is a name used to refer to an object. The value of the variable is that object. So far that sounds like the same thing. But assign to the variable and you don't modify the object itself; rather, you alter which object the variable refers to. So the variable is the name, not the object.
For this reason, if you're considering the properties of Python in the abstract, or if you're talking about multiple languages at once, then it's useful to use different names for these two different things. To keep things straight you might avoid talking about variables in Python, and refer to what the assignment operator does as "binding" rather than "assignment".
Note that the Python grammar talks about "assignments" as a kind of statement, not "bindings". At least some of the Python documentation calls names variables. So in the context of Python alone, it's not incorrect to do the same. Different definitions for jargon words apply in different contexts.
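A short sketch of that distinction (assignment rebinds the name; only mutation changes the object):
a = [1, 2]
b = a
a = [3, 4]      # rebinds the name a; the original list object is untouched
print(b)        # [1, 2]

a = [1, 2]
b = a
a.append(3)     # mutates the object that both names are bound to
print(b)        # [1, 2, 3]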
In, for example, C, a variable is a location in memory identified by a specific name. For example, int i; means that there is a 4-byte (usually) variable identified by i. This memory location is allocated regardless of whether a value is assigned to it yet. When C runs i = 1000, it is changing the value stored in the memory location i to 1000.
In Python, the memory location and size are irrelevant to the interpreter. The closest Python comes to a "variable" in the C sense is a value (e.g. 1000) which exists as an object somewhere in memory, with or without a name attached. Binding it to a name happens via i = 1000. This tells Python to create an integer object with a value of 1000, if it does not already exist, and bind it to the name 'i'. An object can be bound to multiple names quite easily, e.g.:
>>> a = [] # Create a new list object and bind it to the name 'a'
>>> b = a # Get the object bound to the name 'a' and bind it to the name 'b'
>>> a is b # Are the names 'a' and 'b' bound to the same object?
True
This explains the difference between the terms, but as long as you understand the difference it doesn't really matter which you use. Unless you're pedantic.
I'm not sure the name/binding description is the easiest to understand; for example, I've always been confused by it even though I have a reasonably accurate understanding of how Python (and CPython in particular) works.
The simplest way to describe how Python works if you're coming from a C background is to understand that all variables in Python are really pointers to objects, and that, for example, a list object is really an array of pointers to values. After a = b, both a and b point to the same object.
There are a couple of tricky parts where this simple model of Python semantics seems to fail, for example with the list augmented operator +=. For that it's important to note that a += b in Python is not the same as a = a + b but a special in-place operation (which can also be defined for user types with the __iadd__ method; a += b is essentially a = a.__iadd__(b)).
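For example, with lists (a minimal sketch of the difference):
a = [1, 2]
b = a
a += [3]        # list.__iadd__ extends the list in place, so b sees it
print(b)        # [1, 2, 3]

a = a + [4]     # builds a new list and rebinds a; b is unaffected
print(b)        # [1, 2, 3]
print(a)        # [1, 2, 3, 4]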
Another important thing to understand is that, while in Python all variables are indeed pointers, there is still no pointer concept. In other words, you cannot pass a "pointer to a variable" to a function so that the function can change the variable: what in C++ is written as
void increment(int &x) {
    x += 1;
}
or in C by
void increment(int *x) {
    *x += 1;
}
cannot be defined in Python, because there is no way to pass "a variable"; you can only pass "values". The only way to pass a generic writable place in Python is to use a callback closure.
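A sketch of that closure workaround, roughly mirroring the C increment above (names are illustrative):
def main():
    x = 10

    def write_x(value):
        nonlocal x          # write back into the enclosing scope
        x = value

    def increment(read, write):
        # the two closures together act like a pointer to the caller's variable
        write(read() + 1)

    increment(lambda: x, write_x)
    print(x)                # 11

main()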
Who said you should? Unless you are discussing issues directly related to name-binding operations, it is perfectly fine to talk about variables and assignments in Python as in any other language. Naturally the precise meaning is different in different programming languages.
If you are debugging an issue connected with "Naming and binding", then use this terminology because the Python language reference uses it: being as specific and precise as possible helps resolve the problem by avoiding unnecessary ambiguity.
On the other hand, if you want to know what is the difference between variables in C and Python then these pictures might help.
I would say that the distinction is significant because of several of the differences between C and Python:
Duck typing: a C variable is always an instance of a given type; in Python it isn't, since the type of the object a name refers to can change.
Shallow copies - Try the following:
>>> a = [4, 5, 6]
>>> b = a
>>> b[1] = 0
>>> a
[4, 0, 6]
>>> b = 3
>>> a
[4, 0, 6]
This makes sense as a and b are both names that spend some of the time bound to a list instance rather than being separate variables.

Python: cracking the gc enigma

I am trying to understand gc because I have a large list in a program which I need to delete to free up some badly needed memory. The basic question I want to answer is: how can I find what is being tracked by gc and what has been freed? Following is code illustrating my problem:
import gc
old=gc.get_objects()
a=1
new=gc.get_objects()
b=[e for e in new if e not in old]
print "Problem 1: len(new)-len(old)>1 :", len(new), len(old)
print "Problem 2: none of the element in b contain a or id(a): ", a in b, id(a) in b
print "Problem 3: The reference counts are insanely high, WHY?? "
IMHO this is weird behavior that isn't addressed in the docs. For starters, why does assigning a single variable create multiple entries for the gc? And why is none of them the variable I made? Where is the entry for the variable I created in get_objects()?
EDIT: In response to Martijn's first response I checked the following:
a="foo"
print a in gc.get_objects()
Still no go :( How can I check whether a is being tracked by gc?
The result of gc.get_objects() is itself not tracked; it would create a circular reference otherwise:
>>> import gc
>>> print gc.get_objects.__doc__
get_objects() -> [...]
Return a list of objects tracked by the collector (excluding the list
returned).
You do not see a listed because it references one of the low-integer singletons. Python re-uses the same set of int objects for values between -5 and 256. As such, a = 1 does not create a new object to be tracked. Nor will you see any other primitive types.
CPython garbage collection only needs to track container types, types that can reference other values, because the only thing the GC needs to do is break circular references.
Note that by the time any Python script starts, already some automatic code has been run. site.py sets up your Python path for example, which involves lists, mappings, etc. Then there are the memoized int values mentioned above, CPython also caches tuple() objects for re-use, etc. As a result, on start-up, easily 5k+ objects are already alive before one line of your code has started.
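As a postscript on the edit's question ("how can I check that a is being tracked by gc?"): gc.is_tracked(), available since Python 2.7, answers it directly. A minimal sketch (results as I would expect on CPython):
import gc

print(gc.is_tracked(1))        # False: ints hold no references, nothing to track
print(gc.is_tracked("foo"))    # False: same for strings
print(gc.is_tracked([1, 2]))   # True: lists are containers and can form cycles
print(gc.is_tracked({}))       # False: CPython skips dicts holding no container values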
