I'm struggling to work out how to create a Python Enum object using the Python C API. The enum class has tp_base assigned to PyEnum_Type, so it inherits from Enum. But I can't figure out a way to tell the Enum base class what items are in the enum. I want to allow iteration and lookup from Python using the __members__ attribute that every Python Enum provides.
Thank you,
Jelle
It is not straightforward at all. Enum is a Python class built with a Python metaclass. It is possible to create one in C, but that just means emulating the constructing Python code in C - the end result is the same, and while it may speed things up slightly, you will most probably run this code only once per program run.
In any case it is possible, but it is not easy at all. I'll show how to do it in Python:
from enum import Enum
class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3
print(Color)
print(Color.RED)
is the same as:
from enum import Enum
name = 'Color'
bases = (Enum,)
enum_meta = type(Enum)
namespace = enum_meta.__prepare__(name, bases)
namespace['RED'] = 1
namespace['GREEN'] = 2
namespace['BLUE'] = 3
Color = enum_meta(name, bases, namespace)
print(Color)
print(Color.RED)
The latter is the code that you need to translate into C.
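Either way, the resulting class is a perfectly ordinary Enum, so the __members__ mapping, iteration and lookup that the question asks about all work as usual:
print(list(Color.__members__))   # ['RED', 'GREEN', 'BLUE']
print(list(Color))               # [<Color.RED: 1>, <Color.GREEN: 2>, <Color.BLUE: 3>]
print(Color['GREEN'])            # Color.GREEN
print(Color(3))                  # Color.BLUE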
Edited note: An answer on a very similar question details how enum.Enum has a functional interface that can be used instead. That is almost certainly the correct approach. I think my answer here is a useful alternative approach to be aware of, although it probably isn't the best solution to this problem.
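For reference, that functional interface looks something like the following in pure Python (from C you would simply call enum.Enum the same way, e.g. via PyObject_CallFunction), which sidesteps the metaclass plumbing entirely:
from enum import Enum
Colour = Enum('Colour', {'RED': 1, 'GREEN': 2, 'BLUE': 3})
print(Colour.__members__['RED'])   # Colour.RED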
I'm aware that this answer is slightly cheating, but this is exactly the kind of code that's better written in Python, and in the C API we still have access to the full Python interpreter. My reasoning for this is that the main reason to keep things entirely in C is performance, and it seems unlikely that creating enum objects will be performance critical.
I'll give three versions, essentially depending on the level of complexity.
First, the simplest case: the enum is entirely known and defined at compile time. Here we simply set up an empty global dict, run the Python code, then extract the enum from the global dict:
PyObject* get_enum(void) {
const char str[] = "from enum import Enum\n"
"class Colour(Enum):\n"
" RED = 1\n"
" GREEN = 2\n"
" BLUE = 3\n"
"";
PyObject *global_dict=NULL, *should_be_none=NULL, *output=NULL;
global_dict = PyDict_New();
if (!global_dict) goto cleanup;
should_be_none = PyRun_String(str, Py_file_input, global_dict, global_dict);
if (!should_be_none) goto cleanup;
// extract Colour from global_dict
output = PyDict_GetItemString(global_dict, "Colour");
if (!output) {
// PyDict_GetItemString does not set exceptions
PyErr_SetString(PyExc_KeyError, "could not get 'Colour'");
} else {
Py_INCREF(output); // PyDict_GetItemString returns a borrowed reference
}
cleanup:
Py_XDECREF(global_dict);
Py_XDECREF(should_be_none);
return output;
}
Second, we might want to change what we define in C at runtime. For example, maybe the input parameters pick the enum values. Here, I'm going to use string formatting to insert the appropriate values into our string. There are a number of options here: sprintf, PyBytes_Format, the C++ standard library, using Python strings (perhaps with another call into Python code?). Pick whichever you're most comfortable with.
PyObject* get_enum_fmt(int red, int green, int blue) {
const char str[] = "from enum import Enum\n"
"class Colour(Enum):\n"
" RED = %d\n"
" GREEN = %d\n"
" BLUE = %d\n"
"";
PyObject *formatted_str=NULL, *global_dict=NULL, *should_be_none=NULL, *output=NULL;
formatted_str = PyBytes_FromFormat(str, red, green, blue);
if (!formatted_str) goto cleanup;
global_dict = PyDict_New();
if (!global_dict) goto cleanup;
should_be_none = PyRun_String(PyBytes_AsString(formatted_str), Py_file_input, global_dict, global_dict);
if (!should_be_none) goto cleanup;
// extract Colour from global_dict
output = PyDict_GetItemString(global_dict, "Colour");
if (!output) {
// PyDict_GetItemString does not set exceptions
PyErr_SetString(PyExc_KeyError, "could not get 'Colour'");
} else {
Py_INCREF(output); // PyDict_GetItemString returns a borrowed reference
}
cleanup:
Py_XDECREF(formatted_str);
Py_XDECREF(global_dict);
Py_XDECREF(should_be_none);
return output;
}
Obviously you can do as much or as little as you like with string formatting - I've just picked a simple example to show the point. The main differences from the previous version are the call to PyBytes_FromFormat to set up the string, and the call to PyBytes_AsString that gets the underlying char* out of the prepared bytes object.
Finally, we could prepare the enum attributes in a Python dict on the C side and pass that in. This necessitates a bit of a change. Essentially I use @AnttiHaapala's lower-level Python code, but insert namespace.update(contents) after the call to __prepare__.
PyObject* get_enum_dict(const char* key1, int value1, const char* key2, int value2) {
const char str[] = "from enum import Enum\n"
"name = 'Colour'\n"
"bases = (Enum,)\n"
"enum_meta = type(Enum)\n"
"namespace = enum_meta.__prepare__(name, bases)\n"
"namespace.update(contents)\n"
"Colour = enum_meta(name, bases, namespace)\n";
PyObject *global_dict=NULL, *contents_dict=NULL, *value_as_object=NULL, *should_be_none=NULL, *output=NULL;
global_dict = PyDict_New();
if (!global_dict) goto cleanup;
// create and fill the contents dictionary
contents_dict = PyDict_New();
if (!contents_dict) goto cleanup;
value_as_object = PyLong_FromLong(value1);
if (!value_as_object) goto cleanup;
int set_item_result = PyDict_SetItemString(contents_dict, key1, value_as_object);
Py_CLEAR(value_as_object);
if (set_item_result!=0) goto cleanup;
value_as_object = PyLong_FromLong(value2);
if (!value_as_object) goto cleanup;
set_item_result = PyDict_SetItemString(contents_dict, key2, value_as_object);
Py_CLEAR(value_as_object);
if (set_item_result!=0) goto cleanup;
set_item_result = PyDict_SetItemString(global_dict, "contents", contents_dict);
if (set_item_result!=0) goto cleanup;
should_be_none = PyRun_String(str, Py_file_input, global_dict, global_dict);
if (!should_be_none) goto cleanup;
// extract Colour from global_dict
output = PyDict_GetItemString(global_dict, "Colour");
if (!output) {
// PyDict_GetItemString does not set exceptions
PyErr_SetString(PyExc_KeyError, "could not get 'Colour'");
} else {
Py_INCREF(output); // PyDict_GetItemString returns a borrowed reference
}
cleanup:
Py_XDECREF(contents_dict);
Py_XDECREF(global_dict);
Py_XDECREF(should_be_none);
return output;
}
Again, this presents a reasonably flexible way to get values from C into a generated enum.
For the sake of testing I used the following simple Cython wrapper - this is just presented for completeness, to help people try these functions.
cdef extern from "cenum.c":
    object get_enum()
    object get_enum_fmt(int, int, int)
    object get_enum_dict(char*, int, char*, int)

def py_get_enum():
    return get_enum()

def py_get_enum_fmt(red, green, blue):
    return get_enum_fmt(red, green, blue)

def py_get_enum_dict(key1, value1, key2, value2):
    return get_enum_dict(key1, value1, key2, value2)
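As a quick check (cenum_wrapper is just what I happened to call the compiled Cython module, so adjust the name to taste), the generated class behaves like any other Enum:
import cenum_wrapper
Colour = cenum_wrapper.py_get_enum()
print(list(Colour.__members__))   # ['RED', 'GREEN', 'BLUE']
print(Colour.GREEN.value)         # 2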
To reiterate: this answer is only partly in the C API, but the approach of calling Python from C is one that I've found productive at times for "run-once" code that would be tricky to write entirely in C.
Related
I have a simple class below,
class MyClass(int):
    def __index__(self):
        return 1
According to operator.index documentation,
operator.index(a)
Return a converted to an integer. Equivalent to a.__index__()
But when I use operator.index with a MyClass instance, I get 100 instead of 1 (I do get 1 if I use a.__index__()). Why is that?
>>> a = MyClass(100)
>>>
>>> import operator
>>> print(operator.index(a))
100
>>> print(a.__index__())
1
This actually appears to be a deep-rooted issue in CPython. If you look at the source code for operator.py, you can see the definition of index:
def index(a):
    "Same as a.__index__()."
    return a.__index__()
So...why is it not equivalent? It's literally calling __index__. Well, at the bottom of the source, there's the culprit:
try:
    from _operator import *
except ImportError:
    pass
else:
    from _operator import __doc__
It's overwriting the definitions with a native _operator module. In fact, if you comment this out (either by modifying the actual library or making your own fake operator.py* and importing that), it works. So, we can find the source code for the native _operator library, and look at the related part:
static PyObject *
_operator_index(PyObject *module, PyObject *a)
{
return PyNumber_Index(a);
}
So, it's a wrapper around the PyNumber_Index function. PyNumber_Index is a wrapper around _PyNumber_Index, so we can look at that:
PyObject *
_PyNumber_Index(PyObject *item)
{
PyObject *result = NULL;
if (item == NULL) {
return null_error();
}
if (PyLong_Check(item)) {
Py_INCREF(item);
return item;
}
if (!_PyIndex_Check(item)) {
PyErr_Format(PyExc_TypeError,
"'%.200s' object cannot be interpreted "
"as an integer", Py_TYPE(item)->tp_name);
return NULL;
}
result = Py_TYPE(item)->tp_as_number->nb_index(item);
if (!result || PyLong_CheckExact(result))
return result;
if (!PyLong_Check(result)) {
PyErr_Format(PyExc_TypeError,
"__index__ returned non-int (type %.200s)",
Py_TYPE(result)->tp_name);
Py_DECREF(result);
return NULL;
}
/* Issue #17576: warn if 'result' not of exact type int. */
if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
"__index__ returned non-int (type %.200s). "
"The ability to return an instance of a strict subclass of int "
"is deprecated, and may be removed in a future version of Python.",
Py_TYPE(result)->tp_name)) {
Py_DECREF(result);
return NULL;
}
return result;
}
PyObject *
PyNumber_Index(PyObject *item)
{
PyObject *result = _PyNumber_Index(item);
if (result != NULL && !PyLong_CheckExact(result)) {
Py_SETREF(result, _PyLong_Copy((PyLongObject *)result));
}
return result;
}
You can see that before it even calls nb_index (the C name for __index__), it calls PyLong_Check on the argument, and if that's true, it just returns the item with no modification. PyLong_Check is a macro that checks for long subtyping (an int in Python is a PyLong):
#define PyLong_Check(op) \
PyType_FastSubclass(Py_TYPE(op), Py_TPFLAGS_LONG_SUBCLASS)
#define PyLong_CheckExact(op) Py_IS_TYPE(op, &PyLong_Type)
So, basically, the takeaway is that for whatever reason, probably for speed, int subclasses don't get their __index__ method called, and instead just get _PyLong_Copy'd to the resulting return value, but only in the native _operator module, and not in the non-native operator.py. This conflict of implementation as well as inconsistency in documentation leads me to believe that this is an issue, either in the documentation or the implementation, and you may want to raise it as one.
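You can see the effect of that _PyLong_Copy in the original example: on a CPython new enough to match the source quoted above, the value comes back unchanged but as a plain int rather than a MyClass instance:
>>> b = operator.index(a)
>>> b, type(b)
(100, <class 'int'>)
>>> b is a
False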
It's likely a documentation rather than an implementation issue, as CPython has a habit of sacrificing correctness for speed: (nan,) == (nan,) but nan != nan.
* You may have to name it something like fake_operator.py then import it with import fake_operator as operator
This is because your type is an int subclass. __index__ will not be used because the instance is already an integer. That much is by design, and unlikely to be considered a bug in CPython. PyPy behaves the same.
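By contrast, for a type that is not an int subclass, operator.index() does go through __index__; a quick check:
import operator
class NotAnInt:
    def __index__(self):
        return 1
print(operator.index(NotAnInt()))   # 1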
In _operator.c:
static PyObject *
_operator_index(PyObject *module, PyObject *a)
/*[clinic end generated code: output=d972b0764ac305fc input=6f54d50ea64a579c]*/
{
return PyNumber_Index(a);
}
Note that the operator.py Python code is generally not used; it is only a fallback for the case where the compiled _operator module is not available. That explains why the result differs from a.__index__().
In abstract.c, cropped after the relevant PyLong_Check part:
/* Return an exact Python int from the object item.
Raise TypeError if the result is not an int
or if the object cannot be interpreted as an index.
*/
PyObject *
PyNumber_Index(PyObject *item)
{
PyObject *result = _PyNumber_Index(item);
if (result != NULL && !PyLong_CheckExact(result)) {
Py_SETREF(result, _PyLong_Copy((PyLongObject *)result));
}
return result;
}
...
/* Return a Python int from the object item.
Can return an instance of int subclass.
Raise TypeError if the result is not an int
or if the object cannot be interpreted as an index.
*/
PyObject *
_PyNumber_Index(PyObject *item)
{
PyObject *result = NULL;
if (item == NULL) {
return null_error();
}
if (PyLong_Check(item)) {
Py_INCREF(item);
return item; /* <---- short-circuited here */
}
...
}
The documentation for operator.index is inaccurate, so this may be considered a minor documentation issue:
>>> import operator
>>> operator.index.__doc__
'Same as a.__index__()'
So, why isn't __index__ considered for integers? The probable answer is found in PEP 357, under the discussion section titled Speed:
Implementation should not slow down Python because integers and long integers used as indexes will complete in the same number of instructions. The only change will be that what used to generate an error will now be acceptable.
We do not want to slow down the most common case for slicing with integers, having to check for an nb_index slot every time.
Update
This answer is incorrect; I misread the documentation. See Aplet123's answer instead. Tl;dr: the problem is actually that the C implementation doesn't match the documentation or the Python implementation. The C implementation behaves more like: a if isinstance(a, int) else a.__index__().
To prove it, try defining MyClass.__int__(). The outcome will be the same.
Original answer
See the documentation for object.__index__():
object.__index__(self)
Called to implement operator.index(), and whenever Python needs to losslessly convert the numeric object to an integer object (such as in slicing, or in the built-in bin(), hex() and oct() functions). Presence of this method indicates that the numeric object is an integer type. Must return an integer.
If __int__(), __float__() and __complex__() are not defined then corresponding built-in functions int(), float() and complex() fall back to __index__().
(added bold)
a.__int__() exists, so its return value is used instead.
>>> a.__int__
<method-wrapper '__int__' of MyClass object at 0x7f2c5f0f4ec8>
>>> a.__int__()
100
I am working concurrently in C# and in Python.
Is there a difference, in terms of what is being created in memory, between passing a reference type in C#, and passing (by assignment) in Python? It seems in either case, if the variable is changed* in the function, it is changed in the outside scope as well.
(*) of course in Python it must be mutable for this to occur. An immutable object cannot be changed - but that is another topic.
Are we basically just talking different terminology for the same process, or is there a conceptual difference to be learned here, in terms of the underlying mechanism in memory?
First, all arguments are passed by value by default in C#. This has nothing to do with the type being a reference type or a value type, both behave exactly the same way.
Now, the question is, what is a variable? A variable is a placeholder for a value, nothing more. When a variable is passed by copy, a copy of the value is made.
And what is the value stored in a variable? Well, if the type of the variable is a reference type, the value is basically the memory address of the object it's referencing. If it's a value type, then the value is the object itself.
So when you say:
It seems in either case, if the variable is changed* in the function, it is changed in the outside scope as well.
That is deeply wrong, because you seem to me to be mixing up the type of the argument with how it is passed along:
First example:
var a = new object();
Foo(a);
var isNull = ReferenceEquals(a, null); //false!
void Foo(object o) { o = null; }
Here, a reference-typed variable a is passed by value: a copy is made, and then inside Foo that copy is reassigned to null. a doesn't care that a copy was reassigned inside Foo; it will still point to the same object.
Things of course change if you pass the argument by reference:
var a = new object();
Foo(ref a);
var isNull = ReferenceEquals(a, null); //true!
void Foo(ref object o) { o = null; }
Now you are not passing a copy of a named o, you are passing a itself under the alias o.
Things behave exactly the same with value types:
var a = 1;
Foo(a);
var isZero = a == 0; //false!
void Foo(int i) { i = 0; }
And
var a = 1;
Foo(ref a);
var isZero = a == 0; //true!
void Foo(ref int i) { i = 0; }
The difference between value types and reference types when you pass them along by value is due to what the value of the variable is. As we said before, reference-typed variables store the address, so even if you pass along a copy, the copy points to the same object, so any changes to the object are visible from both variables:
var ii = new List<int>();
Foo(ii);
var b = ii.Count == 1; //true!
void Foo(List<int> list) { list.Add(1); }
But with value types, the value is the object itself, so you are passing along a copy of the object, and you are therefore modifying a copy:
struct MutableStruct
{
public int I { get; set; }
}
var m = new MutableStruct();
Foo(m);
var b = m.I == 1; //false!
void Foo(MutableStruct mutableStruct) { mutableStruct.I = 1; }
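On the Python side of the question's comparison, the behaviour matches the reference-type-passed-by-value case: rebinding the parameter name inside a function is invisible to the caller, while mutating the object it refers to is visible. A minimal sketch:
def rebind(lst):
    lst = []          # rebinds the local name only; the caller's variable is untouched
def mutate(lst):
    lst.append(1)     # mutates the shared object; the change is visible to the caller
a = [0]
rebind(a)
print(a)   # [0]
mutate(a)
print(a)   # [0, 1]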
Does this make things clearer?
Asked because of this: Default argument in c++
Say I have a function such as this: void f(int p1=1, int p2=2, int p3=3, int p4=4);
And I want to call it using only some of the arguments - the rest will be the defaults.
Something like this would work:
template<bool P1=true, bool P2=true, bool P3=true, bool P4=true>
void f(int p1=1, int p2=2, int p3=3, int p4=4);
// specialize:
template<>
void f<false, true, false, false>(int p1) {
f(1, p1);
}
template<>
void f<false, true, true, false>(int p1, int p2) {
f(1, p1, p2);
}
// ... and so on.
// Would need a specialization for each combination of arguments
// which is very tedious and error-prone
// Use:
f<false, true, false, false>(5); // passes 5 as p2 argument
But it requires too much code to be practical.
Is there a better way to do this?
Use the Named Parameters Idiom (→ FAQ link).
The Boost.Parameter library (→ link) can also solve this task, but you pay for it with code verbosity and greatly reduced clarity. It's also deficient in handling constructors. And it requires having the Boost library installed, of course.
Have a look at the Boost.Parameter library.
It implements named parameters in C++. Example:
#include <boost/parameter/name.hpp>
#include <boost/parameter/preprocessor.hpp>
#include <iostream>
//Define
BOOST_PARAMETER_NAME(p1)
BOOST_PARAMETER_NAME(p2)
BOOST_PARAMETER_NAME(p3)
BOOST_PARAMETER_NAME(p4)
BOOST_PARAMETER_FUNCTION(
(void),
f,
tag,
(optional
(p1, *, 1)
(p2, *, 2)
(p3, *, 3)
(p4, *, 4)))
{
std::cout << "p1: " << p1
<< ", p2: " << p2
<< ", p3: " << p3
<< ", p4: " << p4 << "\n";
}
//Use
int main()
{
//Prints "p1: 1, p2: 5, p3: 3, p4: 4"
f(_p2=5);
}
Although Boost.Parameter is amusing, it unfortunately suffers from a number of issues, among them placeholder collisions (and having to debug quirky preprocessor/template errors):
BOOST_PARAMETER_NAME(p1)
Will create the _p1 placeholder that you then use later on. If you have two different headers declaring the same placeholder, you get a conflict. Not fun.
There is a much simpler answer (both conceptually and practically), somewhat based on the Builder pattern: the Named Parameters Idiom.
Instead of specifying such a function:
void f(int a, int b, int c = 10, int d = 20);
You specify a structure, on which you will override the operator():
the constructor is used to ask for mandatory arguments (not strictly in the Named Parameters Idiom, but nobody said you had to follow it blindly), and default values are set for the optional ones
each optional parameter is given a setter
Generally, it is combined with chaining, which consists of making the setters return a reference to the current object so that the calls can be chained on a single line.
class f {
public:
// Take mandatory arguments, set default values
f(int a, int b): _a(a), _b(b), _c(10), _d(20) {}
// Define setters for optional arguments
// Remember the Chaining idiom
f& c(int v) { _c = v; return *this; }
f& d(int v) { _d = v; return *this; }
// Finally define the invocation function
void operator()() const;
private:
int _a;
int _b;
int _c;
int _d;
}; // class f
The invocation is:
f(/*a=*/1, /*b=*/2).c(3)(); // the last () being to actually invoke the function
I've seen a variant that puts the mandatory arguments as parameters to operator(); this avoids keeping the arguments as attributes, but the syntax is a bit weirder:
f().c(3)(/*a=*/1, /*b=*/2);
Once the compiler has inlined all the constructor and setter calls (which is why they are defined here, while operator() is not), it should result in code about as efficient as the "regular" function invocation.
This isn't really an answer, but...
In C++ Template Metaprogramming by David Abrahams and Aleksey Gurtovoy (published in 2004!) the authors talk about this:
While writing this book, we reconsidered the interface used for named
function parameter support. With a little experimentation we
discovered that it’s possible to provide the ideal syntax by using
keyword objects with overloaded assignment operators:
f(slew = .799, name = "z");
They go on to say:
We’re not going to get into the implementation details of this named
parameter library here; it’s straightforward enough that we suggest
you try implementing it yourself as an exercise.
This was in the context of template metaprogramming and Boost::MPL. I'm not too sure how their "straightforward" implementation would jive with default parameters, but I assume it would be transparent.
I have a self-made C library that I want to access using Python. The problem is that the code consists essentially of two parts: an initialization that reads in data from a number of files and does a few calculations that only need to happen once, and another part that is called in a loop and repeatedly uses the data generated before. To the latter function I want to pass parameters from Python.
My idea was to write two C wrapper functions, "init" and "loop" - "init" reads the data and returns a void pointer to a structure that "loop" can use together with additional parameters that I can pass on from Python. Something like
void *init() {
mystruct *ret = (mystruct *)malloc(sizeof(mystruct));
/* Fill ret with data */
return ret;
}
float loop(void *data, float par1, float par2) {
/* do stuff with data, par1, par2, return result */
}
I tried calling "init" from Python and treating the result as a c_void_p, but since "loop" changes some of the contents of "data" and ctypes' void pointers are immutable, this did not work.
Other solutions to similar problems I saw seem to require knowledge of how much memory "init" would use, and I do not know that.
Is there a way to pass data from one C function to another through python without telling python exactly what or how much it is? Or is there another way to solve my problem?
I tried (and failed) to write a minimum crashing example, and after some debugging it turned out there was a bug in my C code. Thanks to everyone who replied!
Hoping that this might help other people, here is a sort-of-minimal working version (still without separate 'free' - sorry):
pybug.c:
#include <stdio.h>
#include <stdlib.h>
typedef struct inner_struct_s {
int length;
float *array;
} inner_struct_t;
typedef struct mystruct_S {
int id;
float start;
float end;
inner_struct_t *inner;
} mystruct_t;
void init(void **data) {
int i;
mystruct_t *mystruct = (mystruct_t *)malloc(sizeof(mystruct_t));
inner_struct_t *inner = (inner_struct_t *)malloc(sizeof(inner_struct_t));
inner->length = 10;
inner->array = calloc(inner->length, sizeof(float));
for (i=0; i<inner->length; i++)
inner->array[i] = 2*i;
mystruct->id = 0;
mystruct->start = 0;
mystruct->end = inner->length;
mystruct->inner = inner;
*data = mystruct;
}
float loop(void *data, float par1, float par2, int newsize) {
mystruct_t *str = data;
inner_struct_t *inner = str->inner;
int i;
inner->length = newsize;
inner->array = realloc(inner->array, newsize * sizeof(float));
for (i=0; i<inner->length; i++)
inner->array[i] = par1 + i * par2;
return inner->array[inner->length-1];
}
compile as
cc -c -fPIC pybug.c
cc -shared -o libbug.so pybug.o
Run in python:
from ctypes import *
sl = CDLL('libbug.so')
# What arguments do functions take / return?
sl.init.argtypes = [POINTER(c_void_p)]
sl.loop.restype = c_float
sl.loop.argtypes = [c_void_p, c_float, c_float, c_int]
# Init takes a pointer to a pointer
px = c_void_p()
sl.init(byref(px))
# Call the loop a couple of times
for i in range(10):
    print sl.loop(px, i, 5, 10*i+5)
You should have a corresponding function to free the data buffer when the caller is done. Otherwise I don't see the issue. Just pass the pointer to loop that you get from init.
init.restype = c_void_p
loop.argtypes = [c_void_p, c_float, c_float]
loop.restype = c_float
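For the cleanup, the C library would need to export something it doesn't have yet - say void destroy(void *data) (the name is my invention) - that frees whatever init allocated; on the Python side it would then be declared and used roughly like this:
from ctypes import CDLL, c_void_p, c_float
lib = CDLL('./mylib.so')           # hypothetical library name
lib.init.restype = c_void_p
lib.loop.argtypes = [c_void_p, c_float, c_float]
lib.loop.restype = c_float
lib.destroy.argtypes = [c_void_p]  # hypothetical: void destroy(void *data)
lib.destroy.restype = None
data = lib.init()
try:
    result = lib.loop(data, 1.0, 2.0)
finally:
    lib.destroy(data)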
I'm not sure what you mean by "ctypes' void pointers are immutable", unless you're talking about c_char_p and c_wchar_p. The issue there is that if you pass a Python string as an argument, it uses Python's private pointer to the string buffer. If a function can change the string, you should first copy it to a c_char or c_wchar array.
Here's a simple example showing the problem of passing a Python string (2.x byte string) as an argument to a function that modifies it. In this case it changes index 0 to '\x00':
>>> import os
>>> from ctypes import *
>>> open('tmp.c', 'w').write("void f(char *s) {s[0] = 0;}")
>>> os.system('gcc -shared -fPIC -o tmp.so tmp.c')
0
>>> tmp = CDLL('./tmp.so')
>>> tmp.f.argtypes = [c_void_p]
>>> tmp.f.restype = None
>>> tmp.f('a')
>>> 'a'
'\x00'
>>> s = 'abc'
>>> tmp.f(s)
>>> s
'\x00bc'
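To illustrate the suggested fix, continuing the same session: copy the string into a mutable c_char array first and let the function modify the copy, leaving Python's string objects alone:
>>> buf = create_string_buffer('abc')
>>> tmp.f(cast(buf, c_void_p))
>>> buf.raw
'\x00bc\x00'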
This is specific to passing Python strings as arguments. It isn't a problem to pass pointers to data structures that are intended to be mutable, either ctypes data objects such as a Structure, or pointers returned by libraries.
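For example, a ctypes Structure is just a mutable block of memory, so C-level writes into it are expected and visible from Python (here simulated with ctypes.memset rather than a custom library):
from ctypes import Structure, c_int, addressof, memset, sizeof
class Pair(Structure):
    _fields_ = [("x", c_int), ("y", c_int)]
p = Pair(1, 2)
memset(addressof(p), 0, sizeof(p))   # a C-level write straight into the structure's buffer
print(p.x)   # 0
print(p.y)   # 0 - the change is visible from Python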
Is your C code in a DLL? If so, you might consider creating a global pointer in there. init() will do any initialization required and set the pointer to newly allocated memory, and loop() will operate on that memory. Also, don't forget to free it up with a close() function.
What are my chances of instantiating, keeping, and serializing/deserializing to/from binary data Python classes reflecting this pattern (adapted from RFC 2246 [TLS]):
enum { apple, orange } VariantTag;
struct {
uint16 number;
opaque string<0..10>; /* variable length */
} V1;
struct {
uint32 number;
opaque string[10]; /* fixed length */
} V2;
struct {
select (VariantTag) { /* value of selector is implicit */
case apple: V1; /* VariantBody, tag = apple */
case orange: V2; /* VariantBody, tag = orange */
} variant_body; /* optional label on variant */
} VariantRecord;
Basically I would have to define a (variant) class VariantRecord, which varies depending on the value of VariantTag. That's not that difficult. The challenge is to find the most generic way to build a class that serializes/deserializes to and from a byte stream... Pickle, Google protocol buffers and marshal are all not an option.
I had a little success with an explicit "def serialize" in my class, but I'm not very happy with it, because it's not generic enough.
I hope I have expressed the problem clearly.
My current solution for the case VariantTag = apple looks like this, but I don't like it too much:
import binascii
import struct
class VariantRecord(object):
    def __init__(self, number, opaque):
        self.number = number
        self.opaque = opaque
    def serialize(self):
        out = struct.pack('>HB%ds' % len(self.opaque), self.number, len(self.opaque), self.opaque)
        return out
v = VariantRecord(10, 'Hello')
print binascii.hexlify(v.serialize())
>> 000a0548656c6c6f
Regards
Two suggestions:
- For the variable-length structure, use a fixed format and just slice the result.
- Use struct.Struct.
e.g. if I've understood your formats correctly (is the length byte, which appeared in your example but wasn't mentioned originally, present in the other variant as well?):
>>> import binascii
>>> import struct
>>> V1 = struct.Struct(">H10p")
>>> V2 = struct.Struct(">L10p")
>>> def serialize(variant, n, s):
    if variant:
        return V2.pack(n,s)
    else:
        return V1.pack(n,s)[:len(s)+3]
>>> print binascii.hexlify(serialize(False, 10, 'hello')) #V1
000a0568656c6c6f
>>> print binascii.hexlify(serialize(True, 10, 'hello')) #V2
0000000a0568656c6c6f00000000
>>>
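For the reverse direction, one possibility (my own sketch, reusing the Struct objects above) is to pad the sliced V1 bytes back up to the fixed size before unpacking; the fixed-length V2 format can be unpacked directly:
>>> def deserialize_v1(data):
    padded = data.ljust(V1.size, '\x00')   # restore the bytes dropped by the slice
    return V1.unpack(padded)               # -> (number, opaque)
>>> deserialize_v1(serialize(False, 10, 'hello'))
(10, 'hello')
>>> V2.unpack(serialize(True, 10, 'hello'))
(10, 'hello')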