Making a Python generator via C++20 coroutines

Let's say I have this python code:
def double_inputs():
    while True:
        x = yield
        yield x * 2

gen = double_inputs()
next(gen)
print(gen.send(1))
It prints "2", just as expected.
I can make a generator in C++20 like this:
#include <coroutine>
#include <iostream>
#include <string>

template <class T>
struct generator {
    struct promise_type;
    using coro_handle = std::coroutine_handle<promise_type>;

    struct promise_type {
        T current_value;
        auto get_return_object() { return generator{coro_handle::from_promise(*this)}; }
        auto initial_suspend() { return std::suspend_always{}; }
        auto final_suspend() noexcept { return std::suspend_always{}; } // must be noexcept in C++20
        void return_void() {} // required: the coroutine falls off the end without co_return
        void unhandled_exception() { std::terminate(); }
        auto yield_value(T value) {
            current_value = value;
            return std::suspend_always{};
        }
    };

    bool next() { return coro ? (coro.resume(), !coro.done()) : false; }
    T value() { return coro.promise().current_value; }

    generator(generator const &rhs) = delete;
    generator(generator &&rhs) : coro(rhs.coro) { rhs.coro = nullptr; }
    ~generator() {
        if (coro)
            coro.destroy();
    }

private:
    generator(coro_handle h) : coro(h) {}
    coro_handle coro;
};

generator<char> hello() {
    // TODO: send a string here via co_await, but HOW???
    std::string word = "hello world";
    for (auto &ch : word) {
        co_yield ch;
    }
}

int main(int, char **) {
    for (auto i = hello(); i.next();) {
        std::cout << i.value() << ' ';
    }
}
This generator just produces a string letter by letter, but the string is hard-coded into it. In Python, it is possible not only to yield something FROM the generator but also to send something TO it. I believe this could be done via co_await in C++.
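For reference, this is what the two-way flow looks like in Python: the argument passed to send() becomes the value of the yield expression inside the generator (the generator name here is illustrative):

```python
def echo_upper():
    # The value passed to send() becomes the result of the bare `yield`.
    word = yield
    for ch in word:
        yield ch.upper()

gen = echo_upper()
next(gen)                # run to the first `yield` (the "priming" step)
first = gen.send("hi")   # deliver "hi"; the generator yields 'H'
rest = list(gen)         # remaining letters: ['I']
print(first + "".join(rest))
```

This is exactly the behavior the question wants to reproduce with co_await: one suspension point that consumes a value, followed by suspension points that produce values.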
I need it to work like this:
generator<char> hello() {
    std::string word = co_await producer; // wait for a string from the producer somehow
    for (auto &ch : word) {
        co_yield ch;
    }
}

int main(int, char **) {
    auto gen = hello();      // make the consumer
    producer("hello world"); // produce the string
    for (; gen.next();) {
        std::cout << gen.value() << ' '; // consume the string letter by letter
    }
}
How can I achieve that? How do I make this "producer" using C++20 coroutines?

You have essentially two problems to overcome if you want to do this.
The first is that C++ is a statically typed language. This means that the types of everything involved need to be known at compile time. This is why your generator type needs to be a template, so that the user can specify what type it shepherds from the coroutine to the caller.
So if you want to have this bi-directional interface, then something on your hello function must specify both the output type and the input type.
The simplest way to go about this is to just create an object and pass a non-const reference to that object to the generator. Each time it does a co_yield, the caller can modify the referenced object and then ask for a new value. The coroutine can read from the reference and see the given data.
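The shared-object idea is easy to sketch in Python terms (names here are illustrative): instead of sending values in, the caller and the generator share one mutable object, and the generator re-reads it on every resume.

```python
def doubler(box):
    # Each resume reads the shared, mutable `box` afresh.
    while True:
        yield box["value"] * 2

box = {"value": 1}
gen = doubler(box)
print(next(gen))    # 2
box["value"] = 21   # the caller mutates the shared object...
print(next(gen))    # ...and the generator sees the change on the next resume
```

This mirrors passing a non-const reference into the C++ coroutine: no special machinery is needed, only an agreed-upon shared object.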
However, if you insist on using the future type for the coroutine as both output and input, then you need to both solve the first problem (by making your generator template take OutputType and InputType) as well as this second problem.
See, your goal is to get a value to the coroutine. The problem is that the source of that value (the function calling your coroutine) has a future object. But the coroutine cannot access the future object. Nor can it access the promise object that the future references.
Or at least, it can't do so easily.
There are two ways to go about this, with different use cases. The first manipulates the coroutine machinery to backdoor a way into the promise. The second manipulates a property of co_yield to do basically the same thing.
Transform
The promise object for a coroutine is usually hidden and inaccessible from the coroutine. It is accessible to the future object, which the promise creates and which acts as an interface to the promised data. But it is also accessible during certain parts of the co_await machinery.
Specifically, when you perform a co_await on any expression in a coroutine, the machinery looks at your promise type to see if it has a function called await_transform. If so, it will call that promise object's await_transform on every expression you co_await on (at least, in a co_await that you directly write, not implicit awaits, such as the one created by co_yield).
As such, we need to do two things: create an overload of await_transform on the promise type, and create a type whose sole purpose is to allow us to call that await_transform function.
So that would look something like this:
struct generator_input {};

...

// Within the promise type:
auto await_transform(generator_input);
One quick note. The downside of using await_transform like this is that, by specifying even one overload of this function for our promise, we impact every co_await in any coroutine that uses this type. For a generator coroutine, that's not very important, since there's not much reason to co_await unless you're doing a hack like this. But if you were creating a more general mechanism that could distinctly await on arbitrary awaitables as part of its generation, you'd have a problem.
OK, so we have this await_transform function; what does this function need to do? It needs to return an awaitable object, since co_await is going to await on it. But the purpose of this awaitable object is to deliver a reference to the input type. Fortunately, the mechanism co_await uses to convert the awaitable into a value is provided by the awaitable's await_resume method. So ours can just return an InputType&:
// Within the `generator<OutputType, InputType>`:
struct passthru_value
{
    InputType &ret_;

    bool await_ready() { return true; }
    void await_suspend(coro_handle) {}
    InputType &await_resume() { return ret_; }
};

// Within the promise type:
auto await_transform(generator_input)
{
    return passthru_value{input_value}; // where `input_value` is the `InputType` object stored by the promise
}
This gives the coroutine access to the value, by invoking co_await generator_input{};. Note that this returns a reference to the object.
The generator type can easily be modified to allow the ability to modify an InputType object stored in the promise. Simply add a pair of send functions for overwriting the input value:
void send(const InputType &input)
{
    coro.promise().input_value = input;
}
void send(InputType &&input)
{
    coro.promise().input_value = std::move(input);
}
This represents an asymmetric transport mechanism. The coroutine retrieves a value at a place and time of its own choosing. As such, it is under no real obligation to respond instantly to any changes. This is good in some respects, as it allows a coroutine to insulate itself from deleterious changes. If you're using a range-based for loop over a container, that container cannot be directly modified (in most ways) by the outside world or else your program will exhibit UB. So if the coroutine is fragile in that way, it can copy the data from the user and thus prevent the user from modifying it.
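The asymmetric behavior described above can be modeled in Python (the generator name is illustrative): the generator samples the shared value once, at a time of its own choosing, and copies it so that later changes from the caller cannot disturb the loop.

```python
def spell(channel):
    # Sample the shared channel once and copy it; later external
    # changes cannot disturb the loop below.
    word = str(channel["word"])
    for ch in word:
        yield ch

channel = {"word": "hi"}
gen = spell(channel)
first = next(gen)        # the generator copies "hi" now
channel["word"] = "XX"   # too late: the copy is already taken
second = next(gen)
print(first + second)
```

The copy is the insulation step: the caller's mutation after the first resume has no effect, just as a C++ coroutine can copy out of the promise's input_value before iterating.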
All in all, the needed code isn't that large. Here's a runnable example of your code with these modifications:
#include <coroutine>
#include <exception>
#include <iostream>
#include <string>

struct generator_input {};

template <typename OutputType, typename InputType>
struct generator {
    struct promise_type;
    using coro_handle = std::coroutine_handle<promise_type>;

    struct passthru_value
    {
        InputType &ret_;

        bool await_ready() { return true; }
        void await_suspend(coro_handle) {}
        InputType &await_resume() { return ret_; }
    };

    struct promise_type {
        OutputType current_value;
        InputType input_value;

        auto get_return_object() { return generator{coro_handle::from_promise(*this)}; }
        auto initial_suspend() { return std::suspend_always{}; }
        auto final_suspend() noexcept { return std::suspend_always{}; } // must be noexcept in C++20
        void unhandled_exception() { std::terminate(); }
        auto yield_value(OutputType value) {
            current_value = value;
            return std::suspend_always{};
        }
        void return_void() {}
        auto await_transform(generator_input)
        {
            return passthru_value{input_value};
        }
    };

    bool next() { return coro ? (coro.resume(), !coro.done()) : false; }
    OutputType value() { return coro.promise().current_value; }

    void send(const InputType &input)
    {
        coro.promise().input_value = input;
    }
    void send(InputType &&input)
    {
        coro.promise().input_value = std::move(input);
    }

    generator(generator const &rhs) = delete;
    generator(generator &&rhs) : coro(rhs.coro) { rhs.coro = nullptr; }
    ~generator() {
        if (coro)
            coro.destroy();
    }

private:
    generator(coro_handle h) : coro(h) {}
    coro_handle coro;
};

generator<char, std::string> hello() {
    auto word = co_await generator_input{};
    for (auto &ch : word) {
        co_yield ch;
    }
}

int main(int, char **)
{
    auto test = hello();
    test.send("hello world");
    while (test.next())
    {
        std::cout << test.value() << ' ';
    }
}
Be more yielding
An alternative to using an explicit co_await is to exploit a property of co_yield. Namely, co_yield is an expression and therefore has a value. Specifically, it is (mostly) equivalent to co_await p.yield_value(e), where p is the promise object and e is the value being yielded.
Fortunately, we already have a yield_value function; it returns std::suspend_always. But it could instead return an object that always suspends yet can also be unpacked by co_await into an InputType&:
struct yield_thru
{
    InputType &ret_;

    bool await_ready() { return false; }
    void await_suspend(coro_handle) {}
    InputType &await_resume() { return ret_; }
};

...

// In the promise:
auto yield_value(OutputType value) {
    current_value = value;
    return yield_thru{input_value};
}
This is a symmetric transport mechanism; for every value you yield, you receive a value (which may be the same one as before). Unlike the explicit co_await method, you can't receive a value before you start to generate them. This could be useful for certain interfaces.
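Python's yield expression has exactly this symmetric shape: every yield hands a value out and receives one back from send(). A small sketch (names are illustrative):

```python
def running_total():
    total = 0
    while True:
        # Symmetric exchange: each yield hands out the current total
        # and receives the next addend from send().
        sent = yield total
        total += sent if sent is not None else 0

gen = running_total()
next(gen)           # prime: the first yield hands out 0
print(gen.send(5))  # each send both delivers a value and gets one back
print(gen.send(7))
```

Note the priming next(): just as described above, no value can be received before the first one is generated.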
And of course, you could combine them as you see fit.

Related

Is there a way to inherit a PyTypeObject from a Python library into a custom type in CPython?

I'm trying to create a custom type in CPython that inherits from an already defined type object in Python. My approach thus far is to use PyImport_ImportModule, then access the PyTypeObject and set it to the tp_base attribute in my PyTypeObject.
Importing:
PyTypeObject typeObj = {...}; // previously defined PyTypeObject for a fallback

int init() {
    PyObject* obj = PyImport_ImportModule("<absolute_path_to_python_module>");
    if (obj && PyObject_HasAttrString(obj, "<python_type_object>")) {
        PyTypeObject* type_ptr = (PyTypeObject*) PyObject_GetAttrString(obj, "<python_type_object>");
        typeObj = *type_ptr;
    }
    if (PyType_Ready(&typeObj) < 0) return -1;
    ... // other initialization stuff
}
Inheriting:
PyTypeObject CustomType = {
    ... // initialization stuff
    .tp_base = &typeObj,
};
The custom type is able to inherit the functions, but it fails on isinstance(CustomType(), TypeObj). When I try to access the __bases__ attribute of the custom type, it raises a segmentation fault.
Types are identified by identity, not value. When you try to do typeObj = *type_ptr;, even if it "works", typeObj has a different memory address than *type_ptr, and will never be considered the same type. You need to preserve the identity.
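The identity rule is easy to demonstrate from the Python side: two class objects with identical contents are still different types, so copying a type's contents to a new address can never make instances of one pass an isinstance check against the other.

```python
class A:
    x = 1

class B:
    x = 1   # same contents as A, but a distinct type object

a = A()
print(isinstance(a, A))  # True
print(isinstance(a, B))  # False: types compare by identity, not structure
print(A is B)            # False
```

This is exactly why `typeObj = *type_ptr;` cannot work: the copy lives at a different address and is therefore a different type.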
Change your code to something like:
PyTypeObject fallback_type_obj = {...}; // previously defined PyTypeObject for a fallback
PyTypeObject *type_ptr = NULL;

int init() {
    PyObject* obj = PyImport_ImportModule("<absolute_path_to_python_module>");
    if (obj && PyObject_HasAttrString(obj, "<python_type_object>")) {
        type_ptr = (PyTypeObject*) PyObject_GetAttrString(obj, "<python_type_object>");
    }
    if (type_ptr == NULL) {
        if (PyType_Ready(&fallback_type_obj) < 0) return -1;
        type_ptr = &fallback_type_obj;
    }
    ... // other initialization stuff
}
You then make a heap type that uses type_ptr as the base (a static/global type won't work, since type_ptr isn't initialized to anything useful statically).
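At the Python level, the heap-type equivalent is the three-argument type() call: the imported base type object is used directly, so its identity is preserved. Here OrderedDict merely stands in for whatever type PyObject_GetAttrString would fetch:

```python
import collections

# Stand-in for the type object fetched from the imported module.
Base = collections.OrderedDict

# Build a heap type that uses Base itself (not a copy) as its base.
Custom = type("Custom", (Base,), {"greet": lambda self: "hi"})

obj = Custom()
print(isinstance(obj, Base))        # True: the base's identity is preserved
print(Custom.__bases__ == (Base,))  # True
print(obj.greet())
```

Because the base type object participates by reference, isinstance and __bases__ behave correctly, which is what the C-level heap-type approach achieves as well.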

How can I access a Rust Iterator from Python using PyO3?

I'm quite new with Rust, and my first 'serious' project has involved writing a Python wrapper for a small Rust library using PyO3. This has mostly been quite painless, but I'm struggling to work out how to expose lazy iterators over Rust Vecs to Python code.
So far, I have been collecting the values produced by the iterator and returning a list, which obviously isn't the best solution. Here's some code which illustrates my problem:
use pyo3::prelude::*;

// The Rust iterator, from the library I'm wrapping.
pub struct RustIterator<'a> {
    position: usize,
    view: &'a Vec<isize>,
}

impl<'a> Iterator for RustIterator<'a> {
    type Item = &'a isize;

    fn next(&mut self) -> Option<Self::Item> {
        let result = self.view.get(self.position);
        if let Some(_) = result { self.position += 1 };
        result
    }
}

// The Rust struct, from the library I'm wrapping.
struct RustStruct {
    v: Vec<isize>,
}

impl RustStruct {
    fn iter(&self) -> RustIterator {
        RustIterator { position: 0, view: &self.v }
    }
}

// The Python wrapper class, which exposes the
// functions of RustStruct in a Python-friendly way.
#[pyclass]
struct PyClass {
    rust_struct: RustStruct,
}

#[pymethods]
impl PyClass {
    #[new]
    fn new(v: Vec<isize>) -> Self {
        let rust_struct = RustStruct { v };
        Self { rust_struct }
    }

    // This is what I'm doing so far, which works
    // but doesn't iterate lazily.
    fn iter(&self) -> Vec<isize> {
        let mut output_v = Vec::new();
        for item in self.rust_struct.iter() {
            output_v.push(*item);
        }
        output_v
    }
}
I've tried to wrap the RustIterator class with a Python wrapper, but I can't use PyO3's #[pyclass] proc. macro with lifetime parameters. I looked into pyo3::types::PyIterator but this looks like a way to access a Python iterator from Rust rather than the other way around.
How can I access a lazy iterator over RustStruct.v in Python? It's safe to assume that the type contained in the Vec always derives Copy and Clone, and answers which require some code on the Python end are okay (but less ideal).
You can make your RustIterator a #[pyclass] and then implement the appropriate trait (PyIterProtocol), delegating to the Rust iterator itself.
Not tested, but something like:
#[pyclass]
pub struct RustIterator<'a> {
    position: usize,
    view: &'a Vec<isize>,
}

impl<'a> Iterator for RustIterator<'a> {
    type Item = &'a isize;

    fn next(&mut self) -> Option<Self::Item> {
        let result = self.view.get(self.position);
        if let Some(_) = result { self.position += 1 };
        result
    }
}

#[pyproto]
impl PyIterProtocol for RustIterator<'_> {
    fn __next__(mut slf: PyRefMut<Self>) -> IterNextOutput<isize, &'static str> {
        match slf.next() {
            Some(value) => IterNextOutput::Yield(*value),
            None => IterNextOutput::Return("Ended"),
        }
    }
}

Why do strings in python 2.7 not have the "__iter__" attribute, but strings in python 3.7 have the "__iter__" attribute

In both python 2.7 and python 3.7 I am able to do the following:
string = "hello world"
for letter in string:
    print(letter)
But if I check for the iterable attribute:
python 2.7
hasattr(string, "__iter__")
>> False
python 3.7
hasattr(string, "__iter__")
>> True
This is something that I have not found to be documented when making a 2to3 migration, and it has caused a little headache because I have hasattr(entity, "__iter__") checks in my code. In python 2.7 this works to differentiate a string from a collection, but not in python 3.7
Really I am asking, what was the design decision, why was it implemented like this. I hope that's not too speculative.
In CPython 3, str has its own iterator class, str_iterator (run type(iter("str")) to see it); an instance of it is returned by "str".__iter__() or iter("str"). CPython 2 never implemented a dedicated iterator class for str, so calling iter("str") gives you an instance of the generic sequence iterator.
See the first piece of code below: f = t->tp_iter; returns the custom iterator if one exists; otherwise PySeqIter_New(o) returns an instance of the seqiterobject structure (second piece of code). As the third piece of code shows, this generic iterator calls PySequence_GetItem(seq, it->it_index); on each iteration.
If you implement your own class that supports iteration, you need to define either an __iter__ method or __getitem__. If you choose the first option, the returned object must implement __iter__ and __next__ (CPython 3) or next (CPython 2).
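Both protocols can be seen side by side in a short sketch (class names are illustrative): a class with only __getitem__ is iterated by calling it with 0, 1, 2, ... until IndexError, while a class with __iter__ hands out an explicit iterator.

```python
class ViaGetItem:
    # No __iter__: for-loops fall back to __getitem__(0), __getitem__(1), ...
    # until IndexError is raised.
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)
        return i * 10

class ViaIter:
    # Explicit iterator protocol: __iter__ returns an object with __next__.
    def __iter__(self):
        return iter([0, 10, 20])

print(list(ViaGetItem()))  # both iterate identically
print(list(ViaIter()))
```

The __getitem__ fallback is precisely the seqiterobject path shown in the C code below; the __iter__ path corresponds to tp_iter being set.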
It is a bad idea to check hasattr(entity, "__iter__").
If you need to check whether an object is iterable, use isinstance(entity, Iterable) (with Iterable imported from collections.abc).
If you need to exclude only str, use isinstance(entity, Iterable) and not isinstance(entity, str) (CPython 3).
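Put together, the recommended check looks like this (the helper name is illustrative):

```python
from collections.abc import Iterable

def is_collection(obj):
    # Iterable, but not a plain string.
    return isinstance(obj, Iterable) and not isinstance(obj, str)

print(is_collection([1, 2]))      # True
print(is_collection("hello"))     # False
print(is_collection(iter("ab")))  # True: iterators themselves are Iterable
```

Unlike the hasattr check, this works identically on every CPython 3 string type and survives the 2-to-3 migration described in the question.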
1
PyObject *
PyObject_GetIter(PyObject *o)
{
    PyTypeObject *t = o->ob_type;
    getiterfunc f;

    f = t->tp_iter;
    if (f == NULL) {
        if (PySequence_Check(o))
            return PySeqIter_New(o);
        return type_error("'%.200s' object is not iterable", o);
    }
    else {
        ...
    }
}
2
typedef struct {
    PyObject_HEAD
    Py_ssize_t it_index;
    PyObject *it_seq; /* Set to NULL when iterator is exhausted */
} seqiterobject;

PyObject *
PySeqIter_New(PyObject *seq)
{
    seqiterobject *it;

    if (!PySequence_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(seqiterobject, &PySeqIter_Type);
    if (it == NULL)
        return NULL;
    it->it_index = 0;
    Py_INCREF(seq);
    it->it_seq = seq;
    _PyObject_GC_TRACK(it);
    return (PyObject *)it;
}
3
static PyObject *
iter_iternext(PyObject *iterator)
{
    seqiterobject *it;
    PyObject *seq;
    PyObject *result;

    assert(PySeqIter_Check(iterator));
    it = (seqiterobject *)iterator;
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    if (it->it_index == PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError, "iter index too large");
        return NULL;
    }

    result = PySequence_GetItem(seq, it->it_index);
    if (result != NULL) {
        it->it_index++;
        return result;
    }
    if (PyErr_ExceptionMatches(PyExc_IndexError) ||
        PyErr_ExceptionMatches(PyExc_StopIteration))
    {
        PyErr_Clear();
        it->it_seq = NULL;
        Py_DECREF(seq);
    }
    return NULL;
}

What is the proper syntax to bind a typedef type using pybind11?

I have a struct very similar to this:
struct data_point
{
    data_point() = delete;
    data_point(const int& data) :
        m_data(data)
    {}

    int m_data;
};
I also have this type declared as such.
typedef std::vector<data_point> data_list;
The binding for this struct is defined:
PYBIND11_MODULE(data_lib, ref)
{
    py::class_<data_point> dp(ref, "data_point");
    dp.def(py::init<const int&>());
    dp.def_readwrite("m_data", &data_point::m_data);
}
How do I define a binding for the typedef list type?
It's not clear to me from the pybind11 documentation how to do this.
For this specific issue, pybind11 automatically converts a std::vector to a Python list when you include pybind11/stl.h, so a binding for the typedef is unnecessary.
Ex:
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <vector>

namespace py = pybind11;

struct data_point
{
    data_point() = delete;
    data_point(const int& data) :
        m_data(data)
    {}

    int m_data;
};

std::vector<data_point> make_vec() {
    return {data_point(20), data_point(18)};
}

PYBIND11_MODULE(data_lib, ref)
{
    py::class_<data_point> dp(ref, "data_point");
    dp.def(py::init<const int&>());
    dp.def_readwrite("m_data", &data_point::m_data);
    ref.def("make_vec", &make_vec, "A function that returns a vector of data_points");
}
In python, when you import the data_lib library you will be able to use functions that return lists of data_point.
import data_lib
p = data_lib.make_vec()
print(len(p))
output: 2

Boost.Python return python object which references to existing c++ objects

Suppose I want to get a reference to some global / internal C++ object. One method is to declare the function with boost::python::return_value_policy<reference_existing_object>().
Both GetGlobalObjectA and GetGlobalObjectB return the reference to the original c++ object without create a new copy;
But how to make GetGlobalObjectByID return a ref to the existing c++ object?
struct A { uint32_t value; };
struct B { uint64_t value; };

A globalA;
B globalB;

boost::python::object GetGlobalObjectByID(int id)
{
    // boost::python::object will return a new copy of the C++ object, not the global one.
    if (id == 1)
        return boost::python::object(&globalA);
    else if (id == 2)
        return boost::python::object(&globalB);
    else
        return boost::python::object(nullptr);
}

A& GetGlobalObjectA() { return globalA; }
B& GetGlobalObjectB() { return globalB; }

BOOST_PYTHON_MODULE(myModule)
{
    using namespace boost::python;
    class_<A>("A");
    class_<B>("B");
    def("GetGlobalObjectByID", GetGlobalObjectByID);
    def("GetGlobalObjectA", GetGlobalObjectA, return_value_policy<reference_existing_object>());
    def("GetGlobalObjectB", GetGlobalObjectB, return_value_policy<reference_existing_object>());
}
Based on this answer, I found it is possible to use reference_existing_object::apply as a converter.
template <typename T>
inline object MagicMethod(T* ptr)
{
    typename reference_existing_object::apply<T*>::type converter;
    handle<> h(converter(ptr));
    return object(h);
}
And here is the modified version.
boost::python::object GetGlobalObjectByID(int id)
{
    if (id == 1)
        return MagicMethod(&globalA);
    else if (id == 2)
        return MagicMethod(&globalB);
    else
        return boost::python::object();
}
