Correct way to init object in cython function - python

I'm using the following Cython code:
cimport numpy as np
np.import_array()
cdef class Seen:
    cdef bint sint_
    def __cinit__(self):
        print('INIT seen object')
        self.sint_ = 0
    cdef saw_int(self, object val):
        self.sint_ = 1
def test(object val):
    test_type(val)
def test_type(object val, Seen seen=Seen()):
    print('BEFORE:', seen.sint_)
    val = int(val)
    seen.saw_int(val)
    print('AFTER:', seen.sint_)
Build it and call functions like so:
import test
import numpy as np
test.test(-1)
print('')
test.test(np.iinfo(np.uint64).max)
The output, which raises some questions:
INIT seen object
BEFORE: False
AFTER: True
BEFORE: True
AFTER: True
As the output shows, the Seen object is not instantiated again on the second test.test call. However, if I change the test_type declaration like so:
cdef test_type(object val):
    cdef Seen seen=Seen()
    ...
then initialization happens on each call.
So, two questions:
Why do these two implementations of test_type behave differently? As far as I remember from the Cython docs, the two are interchangeable.
How should I pass a seen object to test_type so that it defaults to a newly initialized one, given that (..., Seen seen=Seen()) does not work?

The default value of a function argument is evaluated once, when the function is defined, and the same object is then reused on every call. If you want a new Seen instance each time you call test_type, do the following:
def test_type(object val, Seen seen=None):
    if seen is None:
        seen = Seen()
    print('BEFORE:', seen.sint_)
    val = int(val)
    seen.saw_int(val)
    print('AFTER:', seen.sint_)
Caveat: I'm not very familiar with cython, so there might be a subtlety that I am missing. But this would be the issue in ordinary Python code, and I suspect the same issue applies here.
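The difference can be reproduced in ordinary Python. The following is a minimal sketch, assuming a plain-Python stand-in for the Seen class; the names shared_default, fresh_default and instances_created are hypothetical and only serve the illustration:

```python
class Seen:
    """Plain-Python stand-in for the cdef class in the question."""
    instances_created = 0  # hypothetical counter, for illustration only

    def __init__(self):
        Seen.instances_created += 1
        self.sint_ = False

    def saw_int(self, val):
        self.sint_ = True


def shared_default(val, seen=Seen()):
    # Seen() ran once, when this def statement was executed.
    before = seen.sint_
    seen.saw_int(int(val))
    return before, seen.sint_


def fresh_default(val, seen=None):
    # None is a sentinel; a new Seen is built on each call that omits it.
    if seen is None:
        seen = Seen()
    before = seen.sint_
    seen.saw_int(int(val))
    return before, seen.sint_


first_shared = shared_default(-1)
second_shared = shared_default(2)   # sees the state left by the first call
first_fresh = fresh_default(-1)
second_fresh = fresh_default(2)     # starts from a clean object
```

With the shared default, the second call starts with sint_ already set, which matches the BEFORE: True line in the question's output.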

Related

Global variables vs. parameters for read-only access

I'm wondering what the difference is between this code:
import multiprocessing
g = {} # global data dictionary
class Foo:
    @staticmethod
    def bar():
        elems = [1,2,3,4,5]
        g["important_data"] = get_important_data()
        with multiprocessing.Pool(10) as p:
            for res in p.imap(f, elems):
                pass  # do whatever
where f is a function that will use g["important_data"] in a read-only manner; and this code:
import multiprocessing
class Foo:
    @staticmethod
    def bar():
        elems = [1,2,3,4,5]
        important_data = get_important_data()
        with multiprocessing.Pool(10) as p:
            for res in p.imap(f, (elems, important_data)):
                pass  # do whatever
where f does exactly the same computation on elems as above, but is handed the additional data as a parameter important_data, rather than accessing it via the global variable g.
I'm currently working with code where the original authors wrote the comment # Globals for multiprocessing to prevent shared memory over g, but I don't know what this means. I know that different processes in Python have their own copies of global variables by default (so the first implementation would not work at all if f were meant to write to g; however, as mentioned, this is read-only), but the second implementation seems to copy important_data as well, so where's the difference?
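One detail of the second variant is worth spelling out: imap, like the built-in map, calls the function once per item of the iterable it is given, so a tuple (elems, important_data) would be treated as two work items. The following is a sketch under the assumption that the intent is to call f(elem, important_data) for every element; the two-argument signature of f and the body of get_important_data are hypothetical, and the built-in map stands in for Pool.imap so the snippet runs without spawning processes:

```python
from functools import partial


def get_important_data():
    # Hypothetical stand-in for the function named in the question.
    return {"offset": 100}


def f(elem, important_data):
    # Reads important_data without modifying it.
    return elem + important_data["offset"]


elems = [1, 2, 3, 4, 5]
important_data = get_important_data()

# Bind the extra argument up front, then map over the elements only.
worker = partial(f, important_data=important_data)
results = list(map(worker, elems))
```

The same partial object could be handed to a pool in place of f, provided f is defined at module level so that it can be pickled.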

Can I force an expression to be treated as a constant by numba?

Given some global non-mutated object of some type not known to numba:
from types import SimpleNamespace
a = SimpleNamespace(b=2)
I'd like to be able to reference a member of this object as a compile-time constant within a jitted function, something like this:
@numba.njit
def foo():
    # return a.b  # fails, because numba tries to evaluate at runtime
    return numba.mark_this_as_constant(a.b)
Does mark_this_as_constant exist in numba under a different name already? Is it possible to write this myself, perhaps with a custom type?
I can get what I want today with:
def foo(a_b=a.b):
    @numba.njit
    def foo():
        return a_b
    return foo
foo = foo()
but this is pretty gross, and requires me to list every closure at the top, rather than at the point of use.
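Have you tried something like this?
import numba
from types import SimpleNamespace
a = SimpleNamespace(b=2)
a_b = a.b  # plain module-level value that the jitted function can reference
@numba.njit
def foo():
    return a_b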

Multiple inheritance of cython cdef classes

I have some classes implemented as cdef class in cython. In client python code, I would like to compose the classes with multiple inheritance, but I'm getting a type error. Here is a minimal reproducible example:
In [1]: %load_ext cython
In [2]: %%cython
   ...: cdef class A:
   ...:     cdef int x
   ...:     def __init__(self):
   ...:         self.x = 0
   ...: cdef class B:
   ...:     cdef int y
   ...:     def __init__(self):
   ...:         self.y = 0
   ...:
In [3]: class C(A, B):
   ...:     pass
   ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-83ef5091d3a6> in <module>()
----> 1 class C(A, B):
      2     pass
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict
Is there any way to get around this?
The docs say that:
A Python class can inherit from multiple extension types provided that the usual Python rules for multiple inheritance are followed (i.e. the C layouts of all the base classes must be compatible).
I'm trying to understand what this could possibly mean given the trivial example above.
It's pretty restricted. As best as I can tell, all but one of the classes have to be empty. Empty classes can have def functions, but not cdef functions or cdef attributes.
Take a Cython class:
cdef class A:
    cdef int x
This translates to C code:
struct __pyx_obj_2bc_A { // the name might be different
    PyObject_HEAD
    int x;
};
Essentially just a C struct containing the basic Python object stuff, and an integer.
The restriction is that a derived class must contain only one PyObject_HEAD and that its PyObject* should also be interpretable as a struct __pyx_obj_2bc_A* or a struct __pyx_obj_2bc_B*.
In your case the two integers x and y would attempt to occupy the same memory (so conflict). However, if one of the types was empty then they would share the PyObject_HEAD but not conflict further.
cdef functions cause a struct __pyx_vtabstruct_2bc_A *__pyx_vtab; to be added to the struct (so it's not empty). This contains function pointers which allows inherited classes to override the cdef functions.
Having two cdef classes that inherit from a common third class is OK, even if the common third class is not empty.
cdef class A:
    cdef int x
cdef class B(A):
    cdef int y
cdef class C(A):
    pass
class D(B,C):
    pass
The internal Python code that does this check is the function best_base if you really want to investigate the details of the algorithm.
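The same layout check can be triggered from pure Python with __slots__, which also adds fixed storage to the instance struct. This is only an analogy to cdef attributes (an assumption made for illustration, not what Cython generates), and the class names B2, C2 and Base are hypothetical:

```python
class A:
    __slots__ = ("x",)   # fixed storage, comparable to "cdef int x"


class B:
    __slots__ = ("y",)   # fixed storage, comparable to "cdef int y"


try:
    class C(A, B):       # two unrelated non-empty layouts
        pass
    conflict_message = None
except TypeError as exc:
    conflict_message = str(exc)


class Base:
    __slots__ = ("x",)


class B2(Base):
    __slots__ = ("y",)   # extends the layout of Base


class C2(Base):
    __slots__ = ()       # adds nothing to the layout of Base


class D(B2, C2):         # accepted: the layout of B2 also covers C2
    pass


d = D()
d.x, d.y = 1, 2
```

Here conflict_message holds the text of the TypeError raised for C, while D is created without complaint because C2 adds no storage of its own on top of the shared base.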
With reference to "is there any way to get round this?" the answer is "not really." Your best option is probably composition rather than inheritance (i.e. have class C hold an A and B object, rather than inherit from A and B)
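A minimal sketch of the composition approach, assuming plain-Python stand-ins for the two cdef classes (in real Cython code, x and y would need to be declared public or exposed through def methods or properties to be reachable from Python):

```python
class A:
    """Stand-in for cdef class A."""

    def __init__(self):
        self.x = 0


class B:
    """Stand-in for cdef class B."""

    def __init__(self):
        self.y = 0


class C:
    """Holds an A and a B instead of inheriting from both."""

    def __init__(self):
        self.a = A()
        self.b = B()

    @property
    def x(self):
        # Forward attribute access to the wrapped A instance.
        return self.a.x

    @property
    def y(self):
        # Forward attribute access to the wrapped B instance.
        return self.b.y


c = C()
values = (c.x, c.y)
```

Since C no longer inherits from A or B, no layout check is involved; the cost is that each member has to be forwarded explicitly.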

Method changes both instances even if applied to only one of them

I'm struggling to understand why my simple code behaves like this. I create two instances, a and b, that each take an array as an argument. Then I define a method to change the array of one instance, but both get changed. Any idea why this happens, and how can I keep the method from changing the other instance?
import numpy as np
class Test:
    def __init__(self, arg):
        self.arg = arg
    def change(self, i, j, new):
        self.arg[i][j] = new
array = np.array([[11,12,13]])
a = Test(array)
b = Test(array)
# prints the same as expected
print(a.arg)
print(b.arg)
print()
a.change(0,0,3)
# still prints the same, even though I did
# not change b.arg
print(a.arg)
print(b.arg)
Because you assigned the same array object to the arg attribute of both instances, a change made through one is visible through the other. You can use np.array(x, copy=True) or x.copy() to generate a new array object:
array = np.array([[11,12,13]])
a = Test(array.copy())
b = Test(np.array(array, copy=True))
Alternatively, if your arg is always a np.array, you could do it in the __init__ method (as noted by roganjosh in the comments):
class Test:
    def __init__(self, arg):
        self.arg = np.array(arg, copy=True)
    ...
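The aliasing can be checked directly with the is operator. The following is a minimal sketch that uses nested lists instead of NumPy arrays so that it runs with the standard library alone; the same reasoning applies to any mutable object, and the copy_arg flag is a hypothetical parameter added for illustration:

```python
import copy


class Test:
    def __init__(self, arg, copy_arg=False):
        # Either keep a reference to the caller's object or own a copy.
        self.arg = copy.deepcopy(arg) if copy_arg else arg

    def change(self, i, j, new):
        self.arg[i][j] = new


data = [[11, 12, 13]]

a = Test(data)
b = Test(data)
same_object = a.arg is b.arg        # both names refer to one list
a.change(0, 0, 3)
b_after_change = b.arg[0][0]        # changed as well

c = Test(data, copy_arg=True)
d = Test(data, copy_arg=True)
independent = c.arg is not d.arg    # each instance owns its copy
c.change(0, 0, 99)
d_after_change = d.arg[0][0]        # unaffected by the change to c
```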

Cython: Pass by Reference

Problem
I would like to pass a vector by reference to a function in Cython.
cdef extern from "MyClass.h" namespace "MyClass":
    void MyClass_doStuff "MyClass::doStuff"(vector[double]& input) except +
cdef class MyClass:
    ...
    @staticmethod
    def doStuff(vector[double]& input):
        MyClass_doStuff(input)
Question
The above code doesn't throw an error during compilation but it's also not working. input is simply unchanged after the method.
I have also tried the recommendation in this question but in this case the cdef-function won't be accessible from Python ("unknown member doStuff...").
Is passing by reference possible and, if so, how to do it properly?
Edit
This is not a duplicate of cython-c-passing-by-reference as I refer to the question in the section above. The proposed solution does not accomplish my goal of having a python function taking a parameter by reference.
The problem
The trouble, as Kevin and jepio say in the comments to your question, is how you handle the vector in Python. Cython does define a cpp vector class, which is automatically converted to/from a list at the boundary to Cython code.
The trouble is that conversion step. Your function
def doStuff(vector[double]& input):
    MyClass_doStuff(input)
is transformed to something close to
def doStuff(list input):
    vector[double] v= some_cython_function_to_make_a_vector_from_a_list(input)
    MyClass_doStuff(v)
    # nothing to copy the vector back into the list
The answer(s)
I think you have two options. The first would be to write the process out in full (i.e. do two manual copies):
def doStuff(list input):
    cdef vector[double] v = input
    MyClass_doStuff(v)
    input[:] = v
This will be slow for large vectors, but works for me (my test function is v.push_back(10.0)):
>>> l=[1,2,3,4]
>>> doStuff(l)
>>> l
[1.0, 2.0, 3.0, 4.0, 10.0]
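The slice assignment input[:] = v is what makes the result visible to the caller. A pure-Python sketch of the difference, in which the list built inside each function is a stand-in for the C++ vector after push_back(10.0) (the function names rebinding and copy_back are hypothetical):

```python
def rebinding(values):
    # Stand-in for the C++ call working on a converted copy.
    result = [float(v) for v in values] + [10.0]
    # Rebinds the local name only; the caller's list is untouched.
    values = result


def copy_back(values):
    result = [float(v) for v in values] + [10.0]
    # Slice assignment replaces the contents of the caller's list in place.
    values[:] = result


unchanged = [1, 2, 3, 4]
rebinding(unchanged)

updated = [1, 2, 3, 4]
copy_back(updated)
```

After these calls, unchanged still holds the original four integers, while updated holds the converted floats plus the appended value, as in the session above.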
The second option is to define your own wrapper class that directly contains a vector[double]:
cdef class WrappedVector:
    cdef vector[double] v
    # note the absence of:
    #   automatically defined type conversions (e.g. from list)
    #   operators to change v (e.g. [])
    #   etc.
    # you're going to have to write these yourself!
and then write
def doStuff(WrappedVector input):
    MyClass_doStuff(input.v)
