When does ctypes free memory?

In Python I'm using ctypes to exchange data with a C library, and the call interface involves nested pointers-to-structs.
If the memory was allocated in C, then Python should (deeply) extract a copy of any needed values and then explicitly ask the C library to deallocate the memory.
If the memory was allocated in Python, presumably it will be deallocated soon after the corresponding ctypes object passes out of scope. How does this work for pointers? If I create a pointer object from a string buffer, do I need to keep a variable referencing that original buffer object in scope, to prevent the pointer from dangling? Or does the pointer object itself do this for me automatically (even though it won't return the original object)? Does it make any difference whether I'm using pointer, POINTER, cast, c_void_p, or from_address(addressof(...))?

Nested pointers to simple objects seem fine. The documentation is explicit that ctypes doesn't support "original object return", but it implies that a pointer does store a Python reference to keep its target object alive (the precise mechanics may be implementation-specific).
>>> from ctypes import *
>>> x = c_int(7)
>>> triple_ptr = pointer(pointer(pointer(x)))
>>> triple_ptr.contents.contents.contents.value == x.value
True
>>> triple_ptr.contents.contents.contents is x
False
>>> triple_ptr._objects['1']._objects['1']._objects['1'] is x # CPython 3.5
True
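For what it's worth, the keep-alive appears to hold even when the pointer is the only surviving reference (a quick check; _objects is CPython bookkeeping, not a public API):
>>> p = pointer(c_int(42))  # no other reference to the c_int survives
>>> p.contents.value  # still valid: p._objects keeps the target alive
42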
It looks like the pointer function is no different from the POINTER template constructor (much as create_string_buffer relates to c_char * size).
>>> type(pointer(x)) is type(POINTER(c_int)(x))
True
Casting to a void pointer also seems to keep the reference (though I'm not sure why it modifies the original pointer's _objects).
>>> ptr = pointer(x)
>>> ptr._objects
{'1': c_int(7)}
>>> pvoid = cast(ptr, c_void_p)
>>> pvoid._objects is ptr._objects
True
>>> pvoid._objects
{139665053613048: <__main__.LP_c_int object at 0x7f064de87bf8>, '1': c_int(7)}
>>> pvoid._objects['1'] is x
True
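The shared dict also suggests that the result of the cast keeps the target alive by itself (a fresh sketch, again relying on CPython's _objects bookkeeping):
>>> pv = cast(pointer(c_int(11)), c_void_p)  # no other references survive
>>> cast(pv, POINTER(c_int)).contents.value  # still valid via pv._objects
11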
Creating an object directly from a memory buffer (or address thereof) looks more fraught.
>>> v = c_void_p.from_buffer(triple_ptr)
>>> v2 = c_void_p.from_buffer_copy(triple_ptr)
>>> type(v._objects)
<class 'memoryview'>
>>> POINTER(POINTER(POINTER(c_int))).from_buffer(v)[0][0][0] == x.value
True
>>> p3 = POINTER(POINTER(POINTER(c_int))).from_address(addressof(triple_ptr))
>>> v2._objects is None is p3._objects is p3._b_base_
True
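Since from_address stores no back-reference (p3._b_base_ is None above), keeping the source object alive is apparently the caller's job. A minimal sketch of the pattern (variable names are mine):
>>> src = create_string_buffer(b"data")
>>> alias = (c_char * 4).from_address(addressof(src))
>>> alias.raw  # only valid while src is still referenced somewhere
b'data'
>>> keepalive = src  # hold a reference for as long as alias is in use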
Incidentally, byref probably keeps alive the memory it references.
>>> byref(x)._obj is x
True
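And a quick check suggests that the object returned by byref is itself enough to keep its target alive (CPython-specific, as before):
>>> ref = byref(c_int(99))  # the c_int has no other reference
>>> ref._obj.value  # still alive, held by the byref result
99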

Related

How to annotate pointer to C character array in method signature?

I am writing a wrapper for a C library in Python. I am trying to properly annotate all of the methods so my IDE can help me catch errors. I am stuck annotating one method; can you help me figure out the proper annotation?
One of the methods in the C library works as follows:
Takes one arg: pointer to a character buffer
Buffer is made via: char_buffer = ctypes.create_string_buffer(16)
Populates the char buffer with the output value
Done via CMethod(char_buffer)
One then parses the buffer by doing something like char_buffer.value.
How can I annotate the wrapper method to look for a pointer to a character buffer? Currently, I have the below, but I think this is incorrect, since POINTER seems to be just a function in _ctypes.py.
from ctypes import POINTER
def wrapped_method(char_buffer: POINTER):
    CMethod(char_buffer)
According to [Python.Docs]: ctypes.create_string_buffer(init_or_size, size=None):
This function creates a mutable character buffer. The returned object is a ctypes array of c_char.
Example:
>>> import ctypes
>>>
>>> CharArr16 = ctypes.c_char * 16
>>> s = ctypes.create_string_buffer(16)
>>>
>>> isinstance(s, CharArr16)
True
>>> isinstance(s, ctypes.c_char * 15)
False
>>> isinstance(s, ctypes.c_char * 17)
False
>>>
>>> # A more general form, but it WILL FAIL for non-array instances
...
>>> isinstance(s, s._type_ * s._length_)
True
>>>
>>> # A more general form that WILL WORK
...
>>> issubclass(CharArr16, ctypes.Array)
True
>>> isinstance(s, ctypes.Array)
True
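Given that, a reasonable annotation for the wrapper is ctypes.Array (or, on Python 3.9+, the more precise ctypes.Array[ctypes.c_char]). A minimal sketch, with a hypothetical stand-in for the C routine so the example runs on its own:
import ctypes

def CMethod(buf):
    # Hypothetical stand-in for the C routine from the question; a real
    # binding would come from ctypes.CDLL(...). It fills the buffer in place.
    buf.value = b"example"

def wrapped_method(char_buffer: "ctypes.Array[ctypes.c_char]") -> None:
    # The IDE / type checker now knows char_buffer is an array of c_char.
    CMethod(char_buffer)

char_buffer = ctypes.create_string_buffer(16)
wrapped_method(char_buffer)
print(char_buffer.value)  # b'example'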

Ctypes read data from a double pointer

I am working on a C++ DLL with a C wrapper, and I am creating a Python wrapper for future users (I discovered ctypes on Monday). One of the methods of my DLL (it wraps a class) returns an unsigned short **, called data, which corresponds to an image. In C++, I get the value of a pixel using data[row][column].
In Python I declared the function on the following model:
mydll.cMyFunction.argtypes = [c_void_p]
mydll.cMyFunction.restype = POINTER(POINTER(c_ushort))
When I call this function, I get result = <__main__.LP_LP_c_ushort at 0x577fac8>,
and when I look at the data at this address (using result.contents.contents) I get the correct value of the first pixel. But I don't know how to access the values of the rest of my image. Is there an easy way to do something like C++'s data[i][j]?
Yes, just use result[i][j]. Here's a contrived example:
>>> from ctypes import *
>>> ppus = POINTER(POINTER(c_ushort))
>>> ppus
<class '__main__.LP_LP_c_ushort'>
>>> # This creates an array of pointers to ushort[5] arrays
>>> x = (POINTER(c_ushort)*5)(*[cast((c_ushort*5)(n, n+1, n+2, n+3, n+4), POINTER(c_ushort)) for n in range(0, 25, 5)])
>>> a = cast(x, ppus)  # gets a ushort**
>>> a
<__main__.LP_LP_c_ushort object at 0x00000000026F39C8>
>>> a[0] # deref to get the first ushort[5] array
<__main__.LP_c_ushort object at 0x00000000026F33C8>
>>> a[0][0] # get an item from a row
0
>>> a[0][1]
1
>>>
>>> a[1][0]
5
So if you are returning the ushort** correctly from C, it should "just work".
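If you need the whole image on the Python side, here is a short sketch (assuming you know the dimensions from elsewhere in the API; here the 5x5 of the contrived example):
>>> rows, cols = 5, 5
>>> image = [[a[i][j] for j in range(cols)] for i in range(rows)]
>>> image[1][:3]
[5, 6, 7]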

Why does SimpleNamespace have a different size than that of an empty class?

Consider the following:
In [1]: import sys, types
In [2]: class A:
...: pass
...:
In [3]: a1 = A()
In [4]: a1.a, a1.b, a1.c = 1, 2, 3
In [5]: a2 = types.SimpleNamespace(a=1,b=2,c=3)
In [6]: sys.getsizeof(a1)
Out[6]: 56
In [7]: sys.getsizeof(a2)
Out[7]: 48
Where is this size discrepancy coming from? Looking at:
In [10]: types.__file__
Out[10]: '/Users/juan/anaconda3/lib/python3.5/types.py'
I find:
import sys
# Iterators in Python aren't a matter of type but of protocol. A large
# and changing number of builtin types implement *some* flavor of
# iterator. Don't check the type! Use hasattr to check for both
# "__iter__" and "__next__" attributes instead.
def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None) # Same as FunctionType
CodeType = type(_f.__code__)
MappingProxyType = type(type.__dict__)
SimpleNamespace = type(sys.implementation)
Ok, well, here goes nothing:
>>> import sys
>>> sys.implementation
namespace(cache_tag='cpython-35', hexversion=50660080, name='cpython', version=sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0))
>>> type(sys.implementation)
<class 'types.SimpleNamespace'>
I seem to be chasing my own tail here.
I was able to find this related question, but no answer to my particular query.
I am using CPython 3.5 on a 64-bit system. These 8 bytes seem just the right size for some errant reference that I cannot pinpoint.
Consider the following classes, which have different sizes:
class A_dict:
    pass

class A_slot_0:
    __slots__ = []

class A_slot_1:
    __slots__ = ["a"]

class A_slot_2:
    __slots__ = ["a", "b"]
Each of these has a different fundamental memory footprint:
>>> [cls.__basicsize__ for cls in [A_dict, A_slot_0, A_slot_1, A_slot_2]]
[32, 16, 24, 32]
Why? In the source of type_new (in typeobject.c), which is responsible for creating the underlying type and computing the basic size of an instance, we see that tp_basicsize is computed as:
the tp_basicsize of the underlying type (object ... 16 bytes);
another sizeof(PyObject *) for each slot;
a sizeof(PyObject *) if a __dict__ is required;
a sizeof(PyObject *) if a __weakref__ is defined.
A plain class such as A_dict will have a __dict__ and a __weakref__ defined, whereas a class with slots has no __weakref__ by default. Hence the size of plain A_dict is 32 bytes. You could consider it to effectively consist of PyObject_HEAD plus two pointers.
Now, consider a SimpleNamespace, which is defined in namespaceobject.c. Here the type is simply:
typedef struct {
PyObject_HEAD
PyObject *ns_dict;
} _PyNamespaceObject;
and tp_basicsize is defined as sizeof(_PyNamespaceObject), making it one pointer larger than a plain object, and thus 24 bytes.
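You can check both numbers directly (values from a 64-bit CPython, matching the build discussed above):
>>> import types
>>> object.__basicsize__
16
>>> types.SimpleNamespace.__basicsize__
24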
NOTE:
The difference here is effectively that A_dict provides support for taking weak references, while types.SimpleNamespace does not.
>>> import weakref, types
>>> weakref.ref(types.SimpleNamespace())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot create weak reference to 'types.SimpleNamespace' object

dict does not reference elements? Python 2.7 changed behavior

Given the example:
>>> import gc
>>> d = { 1 : object() }
>>> gc.get_referrers(d[1])
[] # Python 2.7
[{1: <object object at 0x003A0468>}] # Python 2.5
Why is d not listed as a referrer to the object?
EDIT1: Although the dict in d references the object, why is the dictionary not listed?
The doc mentions that:
This function will only locate those containers which support garbage
collection; extension types which do refer to other objects but do not
support garbage collection will not be found.
At first glance it would seem that dictionaries do not support it; in fact the dict is simply not being tracked.
And here is why:
The garbage collector tries to avoid tracking simple containers which
can’t be part of a cycle. In Python 2.7, this is now true for tuples
and dicts containing atomic types (such as ints, strings, etc.).
Transitively, a dict containing tuples of atomic types won’t be
tracked either. This helps reduce the cost of each garbage collection
by decreasing the number of objects to be considered and traversed by
the collector.
— From What's new in Python 2.7
It seems that object() is considered an atomic type here, and trying this with an instance of a user-defined class (that is, not a bare object) confirms it, as your code then works:
# Python 2.7
>>> class A(object): pass
>>> r = A()
>>> d = {1: r}
>>> del r
>>> gc.get_referrers(d[1])
[{1: <__main__.A object at 0x0000000002663708>}]
See also issue 4688.
This is a change in how objects are tracked in Python 2.7; tuples and dictionaries containing only atomic types (including instances of object()), which can never require cycle breaking, are no longer tracked.
See http://bugs.python.org/issue4688; this was implemented to avoid a performance issue when creating loads of tuples or dictionaries.
The work-around is to add an object to your dictionary that does need tracking:
>>> import gc
>>> r = object()
>>> d = {1: r}
>>> gc.is_tracked(d)
False
>>> class Foo(object): pass
...
>>> d['_'] = Foo()
>>> gc.is_tracked(d)
True
>>> d in gc.get_referrers(r)
True
Once tracked, a dictionary only goes back to being untracked after a gc collection cycle:
>>> del d['_']
>>> gc.is_tracked(d)
True
>>> d in gc.get_referrers(r)
True
>>> gc.collect()
0
>>> gc.is_tracked(d)
False
>>> d in gc.get_referrers(r)
False

numpy ndarray hashability

I have some problems understanding how the hashability of numpy objects is managed.
>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
...
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True
How come
numpy objects define a __hash__ method but are nevertheless not hashable, while
a class deriving from numpy.ndarray defines __hash__ and is hashable?
Am I missing something?
I'm using Python 2.7.1 and numpy 1.6.1
Thanks for any help!
EDIT: added objects ids
EDIT2:
And following deinonychusaur's comment, trying to figure out whether hashing is based on content, I played with numpy.ndarray dtypes and found something I find quite strange:
>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]
I'm puzzled... is there some (type-independent) caching mechanism in numpy?
I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google, I found this post on the python.scientific.devel mailing list, which states that arrays have never been intended to be hashable - so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.
EDIT: Note that nparray.__hash__() is just the default id-based hash (on 2.6 it equals id(nparray); on 2.7 the pointer is rotated by 4 bits, which matches your nparray.__hash__() == id(nparray) // 16), so this is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses, and wouldn't stop you from calling ndarray.__hash__ explicitly?
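The __hash__ = None mechanism itself is easy to see with a plain Python class (a sketch, independent of numpy):
>>> class Unhashable(object):
...     __hash__ = None
...
>>> hash(Unhashable())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Unhashable'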
Things are different in Python 3.2.2 and the current numpy 2.0.0 (dev) from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see the Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False as expected. hash(vector) also fails.
This is not a clear answer, but here is a trail to follow to understand this behavior.
I refer here to the numpy code of the 1.6.1 release.
According to the numpy.ndarray object implementation (see numpy/core/src/multiarray/arrayobject.c), the tp_hash slot is set to NULL.
NPY_NO_EXPORT PyTypeObject PyArray_Type = {
#if defined(NPY_PY3K)
    PyVarObject_HEAD_INIT(NULL, 0)
#else
    PyObject_HEAD_INIT(NULL)
    0,                                  /* ob_size */
#endif
    "numpy.ndarray",                    /* tp_name */
    sizeof(PyArrayObject),              /* tp_basicsize */
    /* ... slots elided ... */
    &array_as_mapping,                  /* tp_as_mapping */
    (hashfunc)0,                        /* tp_hash */
This tp_hash slot seems to be overridden in numpy/core/src/multiarray/multiarraymodule.c; see DUAL_INHERIT, DUAL_INHERIT2 and the initmultiarray function, where the tp_hash attribute is modified.
Ex:
PyArrayDescr_Type.tp_hash = PyArray_DescrHash
According to hashdescr.c, the hash is implemented as follows:
* How does this work ? The hash is computed from a list which contains all the
* information specific to a type. The hard work is to build the list
* (_array_descr_walk). The list is built as follows:
* * If the dtype is builtin (no fields, no subarray), then the list
* contains 6 items which uniquely define one dtype (_array_descr_builtin)
* * If the dtype is a compound array, one walk on each field. For each
* field, we append title, names, offset to the final list used for
* hashing, and then append the list recursively built for each
* corresponding dtype (_array_descr_walk_fields)
* * If the dtype is a subarray, one adds the shape tuple to the list, and
* then append the list recursively built for each corresponding type
* (_array_descr_walk_subarray)
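Consistent with this, dtype objects themselves are hashable, and dtypes that compare equal hash equal:
>>> import numpy as np
>>> hash(np.dtype('float64')) == hash(np.dtype(np.float64))
True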
