Cython: Pass by Reference

Problem
I would like to pass a vector by reference to a function in Cython.
cdef extern from "MyClass.h" namespace "MyClass":
    void MyClass_doStuff "MyClass::doStuff"(vector[double]& input) except +

cdef class MyClass:
    ...
    @staticmethod
    def doStuff(vector[double]& input):
        MyClass_doStuff(input)
Question
The above code compiles without error, but it doesn't work: input is simply unchanged after the method call.
I have also tried the recommendation in this question but in this case the cdef-function won't be accessible from Python ("unknown member doStuff...").
Is passing by reference possible and, if so, how to do it properly?
Edit
This is not a duplicate of cython-c-passing-by-reference as I refer to the question in the section above. The proposed solution does not accomplish my goal of having a python function taking a parameter by reference.

The problem
The trouble, as Kevin and jepio say in the comments to your question, is how you handle the vector in Python. Cython does define a cpp vector class, which is automatically converted to/from a list at the boundary to Cython code.
The trouble is that conversion step: when your function is called:
def doStuff(vector[double]& input):
    MyClass_doStuff(input)
it is transformed to something close to
def doStuff(list input):
    cdef vector[double] v = some_cython_function_to_make_a_vector_from_a_list(input)
    MyClass_doStuff(v)
    # nothing copies the vector back into the list
The answer(s)
I think you have two options. The first would be to write the process out in full (i.e. do two manual copies):
def doStuff(list input):
    cdef vector[double] v = input
    MyClass_doStuff(v)
    input[:] = v
This will be slow for large vectors, but works for me (my test function is v.push_back(10.0)):
>>> l=[1,2,3,4]
>>> doStuff(l)
>>> l
[1.0, 2.0, 3.0, 4.0, 10.0]
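The copy-in/copy-out pattern above can be sketched in pure Python. The key step is `input[:] = v`: slice assignment mutates the caller's list in place rather than rebinding the local name (here `v.append(10.0)` stands in for the C++ call):

```python
def do_stuff(input):
    # copy in: analogous to "cdef vector[double] v = input"
    v = list(input)
    # stand-in for MyClass_doStuff(v), which appends 10.0
    v.append(10.0)
    # copy out: slice assignment mutates the caller's list in place
    input[:] = v

l = [1.0, 2.0, 3.0, 4.0]
do_stuff(l)
print(l)  # [1.0, 2.0, 3.0, 4.0, 10.0]
```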
The second option is to define your own wrapper class that directly contains a vector[double]
cdef class WrappedVector:
    cdef vector[double] v
    # note the absence of:
    #   automatically defined type conversions (e.g. from list)
    #   operators to change v (e.g. [])
    #   etc.
    # you're going to have to write these yourself!
and then write
def doStuff(WrappedVector input):
    MyClass_doStuff(input.v)

Related

Correct way to init object in cython function

I'm using the following Cython code:
cimport numpy as np
np.import_array()

cdef class Seen:
    cdef bint sint_

    def __cinit__(self):
        print('INIT seen object')
        self.sint_ = 0

    cdef saw_int(self, object val):
        self.sint_ = 1

def test(object val):
    test_type(val)

def test_type(object val, Seen seen=Seen()):
    print('BEFORE:', seen.sint_)
    val = int(val)
    seen.saw_int(val)
    print('AFTER:', seen.sint_)
Build it and call functions like so:
import test
import numpy as np
test.test(-1)
print('')
test.test(np.iinfo(np.uint64).max)
The output, which raises my questions:
INIT seen object
BEFORE: False
AFTER: True
BEFORE: True
AFTER: True
As the output shows, the Seen object is not instantiated on the second test.test call. But if I change the test_type declaration like so:
cdef test_type(object val):
    cdef Seen seen = Seen()
    ...
init happens on each call.
So, two questions:
Why do the two versions of test_type behave differently? As far as I remember from the Cython docs, the two are interchangeable.
How should I pass the seen object to test_type with a default that inits a new one, if (..., Seen seen=Seen()) doesn't work?
The default value of a function is evaluated once, when the function is defined. If you want a new Seen instance each time you call test_type, do the following:
def test_type(object val, Seen seen=None):
    if seen is None:
        seen = Seen()
    print('BEFORE:', seen.sint_)
    val = int(val)
    seen.saw_int(val)
    print('AFTER:', seen.sint_)
Caveat: I'm not very familiar with cython, so there might be a subtlety that I am missing. But this would be the issue in ordinary Python code, and I suspect the same issue applies here.
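The same issue does indeed exist in plain Python; a minimal sketch showing both the shared-default pitfall and the `None`-sentinel fix:

```python
def append_to(item, bucket=[]):
    # the default list is created once, when the function is defined,
    # so every call without an argument shares the same list
    bucket.append(item)
    return bucket

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same list survives between calls

def append_to_fresh(item, bucket=None):
    if bucket is None:
        bucket = []  # a new list on every call
    bucket.append(item)
    return bucket

print(append_to_fresh(1))  # [1]
print(append_to_fresh(2))  # [2]
```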

Is there a standard docstring format to show what signature is expected for an argument that takes a function?

My __init__ method accepts another function as an argument called func_convert:
class Adc:
    """Reads data from ADC
    """

    def __init__(self, func_convert):
        """Setup the ADC

        Parameters
        ----------
        func_convert : [???]
            user-supplied conversion function
        """
        self._func_convert = func_convert

    def read(self):
        data = 0  # some fake data
        return self._func_convert(data)
The argument func_convert allows a custom scaling function to be supplied once at instantiation that gets called to convert the data each time it's read. The function must accept a single int argument and return a single float. One possible example would be:
def adc_to_volts(value):
    return value * 3.0 / 2**16 - 1.5

adc = Adc(adc_to_volts)
volts = adc.read()
Is there a standard way to document what the expected signature of func_convert is in the parameters section of the __init__ docstring? If it makes a difference, I'm using numpy docstring style (I think).
I don't know if this standard exists for docstrings - you can of course explain what the function needs in simple sentences, but I assume you want a standard, documentation generator-friendly way to do this.
If you don't mind switching tools, this is possible using type hints and the Callable object from the typing module:
from typing import Callable

class Adc:
    """
    Reads data from ADC
    """

    def __init__(self, func_convert: Callable[[int], float]) -> None:
        self._func_convert = func_convert

    def read(self):
        data = 0  # some fake data
        return self._func_convert(data)
If you want to follow numpy docstring style, there are some examples from numpy which show how function parameters are described:
1)
apply_along_axis(func1d, axis, arr, *args, **kwargs)
...
Parameters
----------
func1d : function (M,) -> (Nj...)
    This function should accept 1-D arrays. It is applied to 1-D
    slices of `arr` along the specified axis.
2)
apply_over_axes(func, a, axes)
...
Parameters
----------
func : function
    This function must take two arguments, `func(a, axis)`.
3)
set_string_function(f, repr=True)
...
Parameters
----------
f : function or None
    Function to be used to pretty print arrays. The function should expect
    a single array argument and return a string of the representation of
    the array. If None, the function is reset to the default NumPy function
    to print arrays.
TLDR: they are described manually, without any special syntax or guidelines. If your goal is to create docstrings similar to numpy's, you can describe them in any way you want. But I strongly suggest following @jfaccioni's answer and using type hints.
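The two suggestions combine naturally. Here is a sketch that pairs the `Callable[[int], float]` hint with a numpy-style description of the expected signature (the wording of the docstring description is my own, not a numpydoc requirement):

```python
from typing import Callable

class Adc:
    """Reads data from ADC."""

    def __init__(self, func_convert: Callable[[int], float]) -> None:
        """Setup the ADC.

        Parameters
        ----------
        func_convert : callable
            Conversion function with signature ``func_convert(raw: int) -> float``.
        """
        self._func_convert = func_convert

    def read(self) -> float:
        data = 0  # some fake data
        return self._func_convert(data)

adc = Adc(lambda value: value * 3.0 / 2**16 - 1.5)
print(adc.read())  # -1.5
```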

Same memory address for different strings in cython

I wrote a tree object in cython that has many nodes, each containing a single unicode character. I wanted to test whether the character gets interned if I use Py_UNICODE or str as the variable type. I'm trying to test this by creating multiple instances of the node class and getting the memory address of the character for each, but somehow I end up with the same memory address, even if the different instances contain different characters. Here is my code:
from libc.stdint cimport uintptr_t

cdef class Node():
    cdef:
        public str character
        public unsigned int count
        public Node lo, eq, hi

    def __init__(self, str character):
        self.character = character

    def memory(self):
        return <uintptr_t>&self.character[0]
I am trying to compare the memory locations like so, from Python:
a = Node("a")
a2 = Node("a")
b = Node("b")
print(a.memory(), a2.memory(), b.memory())
But the memory addresses that print out are all the same. What am I doing wrong?
Obviously, what you are doing is not what you think you are doing.
self.character[0] doesn't return the address/reference of the first character (as would be the case for an array, for example), but a Py_UCS4 value (i.e. an unsigned 32-bit integer), which is copied to a local, temporary variable on the stack.
In your function, <uintptr_t>&self.character[0] gets you the address of that local variable on the stack, which by chance is always the same, because every call to memory sees the same stack layout.
To make it clearer, here is the difference from a char * c_string, where &c_string[0] gives you the address of the first character in c_string.
Compare:
%%cython
from libc.stdint cimport uintptr_t

cdef char *c_string = "name"

def get_addresses_from_chars():
    for i in range(4):
        print(<uintptr_t>&c_string[i])

cdef str py_string = "name"

def get_addresses_from_pystr():
    for i in range(4):
        print(<uintptr_t>&py_string[i])
And now:
>>> get_addresses_from_chars() # works - different addresses every time
# ...7752
# ...7753
# ...7754
# ...7755
>>> get_addresses_from_pystr() # works differently - the same address.
# ...0672
# ...0672
# ...0672
# ...0672
You can see it this way: c_string[...] is cdef functionality, but py_string[...] is Python functionality and thus, by construction, cannot return an address.
To influence the stack-layout, you could use a recursive function:
def memory(self, level):
    if level == 0:
        return <uintptr_t>&self.character[0]
    else:
        return self.memory(level-1)
Now calling it with a.memory(0), a.memory(1) and so on will give you different addresses (unless tail-call optimization kicks in; I don't believe it will, but you could disable optimization (-O0) just to be sure). Depending on the level/recursion depth, the local variable whose address is returned sits in a different place on the stack.
To see whether Unicode objects are interned, it is enough to use id, which yields the address of the object (a CPython implementation detail), so you don't need Cython at all:
>>> id(a.character) == id(a2.character)
# True
or in Cython, doing the same as id does (a little bit faster):
%%cython
from libc.stdint cimport uintptr_t
from cpython cimport PyObject
...

def memory(self):
    # cast from object to PyObject*, so the address can be used
    return <uintptr_t>(<PyObject*>self.character)
You need to cast an object to PyObject *, so that Cython will allow taking the address of the variable.
And now:
>>> ...
>>> print(a.memory(), a2.memory(), b.memory())
# ...5800 ...5800 ...5000
If you want to get the address of the first code point in the unicode object (which is not the same as the address of the string object), you can use <Py_UNICODE *>self.character, which Cython will replace by a call to PyUnicode_AsUnicode, e.g.:
%%cython
...
def memory(self):
    return <uintptr_t>(<Py_UNICODE*>self.character), id(self.character)
and now
>>> ...
>>> print(a.memory(), a2.memory(), b.memory())
# (...768, ...800) (...768, ...800) (...144, ...000)
i.e. "a" is interned and has a different address than "b", and the code-point buffer has a different address than the object containing it (as one would expect).
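The interning check can also be seen from pure Python. Strings built at runtime are not interned automatically in CPython, but sys.intern maps equal strings onto one shared object; a minimal sketch (relies on CPython behavior):

```python
import sys

# build equal strings at runtime so CPython's compile-time
# constant folding does not merge them for us
a = "".join(["inter", "ned?"])
b = "".join(["inter", "ned?"])
print(a == b)  # True  -- equal contents
print(a is b)  # False -- two distinct objects, different id()s

ai = sys.intern(a)
bi = sys.intern(b)
print(ai is bi)  # True -- interning yields one shared object
```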

Python - Inner Functions, Closures, and Factory Functions - how to factor out?

I'm hoping a Python expert out there can offer some assistance on the confusion I'm experiencing currently with Inner Functions, Closures, and Factory Functions. Upon looking for an implemented example of a General Hough Transform I found this:
https://github.com/vmonaco/general-hough/blob/master/src/GeneralHough.py
I'd like to translate this into C++ and it seems the first step is to factor out the inner function in general_hough_closure():
def general_hough_closure(reference_image):
'''
Generator function to create a closure with the reference image and origin
at the center of the reference image
Returns a function f, which takes a query image and returns the accumulator
'''
referencePoint = (reference_image.shape[0]/2, reference_image.shape[1]/2)
r_table = build_r_table(reference_image, referencePoint)
def f(query_image):
return accumulate_gradients(r_table, query_image)
return f
I seem to be stuck on how this function works. "f" does not seem to be called anywhere, and I'm not sure how the function knows what "query_image" is. I've tried various Googling for tips on inner functions, closures, and factory functions, for example this and some similar pages, but all the examples I can find are more simplified and therefore not much help. Can anybody offer some direction?
The code is just returning the function f as a whole thing. There's no need to "know what the argument is" -- f will know it at the time it is called. The classic example is this:
>>> def f(x):
... def g(y):
... return x + y
... return g
...
>>> f
<function f at 0x7f8500603ae8>
>>> f(1)
<function f.<locals>.g at 0x7f8500603a60>
>>> s = f(1)
>>> s(2)
3
Here, as in your function, g closes over another value (x or r_table, respectively), while still expecting its actual argument.
Since there is a closed-over value, you cannot directly factor out f. One traditional approach is to return an object containing the value, which has some kind of call method representing the function; the easier way in C++ nowadays is to use a lambda function:
auto f(int x) {
    auto g = [x](int y) {
        return x + y;
    };
    return g;
}
In C++ you have the "advantage" that it will yell at you if you don't specify which values you are closing over (that's the [x] here). But internally, it does pretty much the same thing (constructing an anonymous class with an x member).
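That "anonymous class with an x member" can be written out explicitly in Python, which makes the equivalence between a closure and a callable object concrete:

```python
def f(x):
    def g(y):
        return x + y  # closes over x
    return g

class Adder:
    """Explicit version of the closure: the captured state lives in
    an attribute, like the [x] capture in the C++ lambda."""
    def __init__(self, x):
        self.x = x

    def __call__(self, y):
        return self.x + y

print(f(1)(2))       # 3
print(Adder(1)(2))   # 3 -- same behavior, state held explicitly
```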
C++ before C++11 does not have functions as first-class values.
You can use the following class to emulate the semantics (pseudo code):
class GeneralHoughClosure {
public:
    GeneralHoughClosure(reference_image) {
        referencePoint = (reference_image.shape[0]/2, reference_image.shape[1]/2)
        r_table = build_r_table(reference_image, referencePoint)
    }

    void Run(query_image) {
        return accumulate_gradients(r_table, query_image)
    }

    void operator()(query_image) {
        return accumulate_gradients(r_table, query_image)
    }
}
Then, you can use it as follows:
gg = new GeneralHoughClosure(reference_image)
gg.Run(queryImage1)
gg(queryImage2)

Python: Best way to deal with functions with long list of arguments?

I've found various detailed explanations of how to pass long lists of arguments into a function, but I still doubt whether that's the proper way to do it.
In other words, I suspect that I'm doing it wrong, but I can't see how to do it right.
The problem: I have a (not very long) recursive function, which uses quite a number of variables and needs to modify the contents of at least some of them.
What I end up with is something like this:
def myFunction(alpha, beta, gamma, zeta, alphaList, betaList, gammaList, zetaList):
    <some operations>
    myFunction(alpha, beta, modGamma, zeta, modAlphaList, betaList, gammaList, modZetaList)
...and I want to see the changes made to the original variables (in C I would just pass a reference, but I hear that in Python it's always a copy?).
Sorry if noob, I don't know how to phrase this question so I can find relevant answers.
You could wrap up all your parameters in a class, like this:
class FooParameters:
    alpha = 1.0
    beta = 1.0
    gamma = 1.0
    zeta = 1.0
    alphaList = []
    betaList = []
    gammaList = []
    zetaList = []
and then your function takes a single parameter instance:
def myFunction(params):
    omega = params.alpha * params.beta + exp(params.gamma)
    # more magic...
calling like:
testParams = FooParameters()
testParams.gamma = 2.3
myFunction(testParams)
print(testParams.zetaList)
Because the params instance is passed by reference, changes in the function are preserved.
This is commonly used in matplotlib, for example. They pass the long list of arguments using * or **, like:
def function(*args, **kwargs):
    # do something
Calling function:
function(1, 2, 3, 4, 5, a=1, b=2, c=3)
Here 1, 2, 3, 4, 5 will go to args, and a=1, b=2, c=3 will go to kwargs as a dictionary. So they arrive at your function like:
args = (1, 2, 3, 4, 5)
kwargs = {'a': 1, 'b': 2, 'c': 3}
And you can treat them in the way you want.
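A minimal sketch of how the positional and keyword arguments arrive inside the function:

```python
def function(*args, **kwargs):
    # args collects positionals as a tuple, kwargs collects keywords as a dict
    return args, kwargs

args, kwargs = function(1, 2, 3, 4, 5, a=1, b=2, c=3)
print(args)    # (1, 2, 3, 4, 5)
print(kwargs)  # {'a': 1, 'b': 2, 'c': 3}
```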
I don't know where you got the idea that Python copies values when passing into a function. That is not at all true.
On the contrary: each parameter in a function is an additional name referring to the original object. If you change the value of that object in some way - for example, if it's a list and you change one of its members - then the original will also see that change. But if you rebind the name to something else - say by doing alpha = my_completely_new_value - then the original remains unchanged.
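The distinction between mutating the object and rebinding the name can be seen in a few lines:

```python
def mutate(lst):
    lst.append(4)   # changes the object the caller also sees

def rebind(lst):
    lst = [99]      # only rebinds the local name; the caller is unaffected

a = [1, 2, 3]
mutate(a)
print(a)  # [1, 2, 3, 4]
rebind(a)
print(a)  # [1, 2, 3, 4] -- unchanged
```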
You may be tempted to do something akin to this:
def myFunction(*args):
    var_names = ['alpha', 'beta', 'gamma', 'zeta']
    locals().update(zip(var_names, args))

myFunction(alpha, beta, gamma, zeta)
However, this 'often' won't work. I suggest introducing another namespace:
from collections import OrderedDict

def myFunction(*args):
    var_names = ['alpha', 'beta', 'gamma', 'zeta']
    vars = OrderedDict(zip(var_names, args))
    # get them all via vars[var_name]

myFunction(*vars.values())  # since we used an OrderedDict we can simply do *.values()
you can capture the non-modified values in a closure (note that parameters with defaults must come after the required ones):
def myFunction(alpha, beta, gamma, zeta, alphaList, betaList, gammaList, zetaList):
    def myInner(al, zl, g=gamma):
        <some operations>
        myInner(modAlphaList, modZetaList, modGamma)
    myInner(alphaList, zetaList)
(BTW, this is about the only way to write a truly recursive function in Python.)
You could pass in a dictionary and return a new dictionary. Or put your method in a class and have alpha, beta etc. be attributes.
You should put myFunction in a class. Set up the class with the appropriate attributes and call the appropriate functions. The state is then well contained in the class.
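The class-based suggestion can be sketched in a few lines (all names here are hypothetical, standing in for the alpha/zetaList-style state): the recursive step mutates attributes on the instance, so the changes persist after the recursion finishes.

```python
class Simulation:
    """State container: the recursive method mutates attributes in place."""

    def __init__(self):
        self.gamma = 1.0
        self.zeta_list = []

    def step(self, depth):
        if depth == 0:
            return
        self.gamma *= 0.5           # modified state persists on the instance
        self.zeta_list.append(depth)
        self.step(depth - 1)        # recurse without re-passing every variable

sim = Simulation()
sim.step(3)
print(sim.gamma)      # 0.125
print(sim.zeta_list)  # [3, 2, 1]
```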
