I'm writing a parser for a binary format. The format involves a number of different tables, also in binary, each containing fields of varying sizes (usually somewhere between 50 and 100 fields per table).
Most of these structures will have bitfields and will look something like this when represented in C:
struct myHeader
{
    unsigned char fieldA : 3;
    unsigned char fieldB : 2;
    unsigned char fieldC : 3;
    unsigned short fieldD : 14;
    unsigned char fieldE : 4;
};
I came across the struct module, but its lowest resolution is a byte rather than a bit; otherwise the module would pretty much be the right fit for this work.
I know bitfields are supported using ctypes, but I'm not sure how to interface ctypes structs containing bitfields here.
My other option is to manipulate the bits myself, assemble them into bytes, and then use those with the struct module. But since I have close to 50-100 different types of such structures, writing that code by hand becomes error-prone. I'm also worried about efficiency, since this tool might be used to parse gigabytes of binary data.
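To make that option concrete, this is roughly the kind of helper I have in mind (just a sketch; the MSB-first, big-endian packing is my assumption about the format):

def unpack_bits(raw, widths):
    # Treat the record's bytes as one big integer and slice fields out
    # by shifting and masking, most significant field first.
    value = int.from_bytes(raw, byteorder='big')
    total_bits = len(raw) * 8
    fields = []
    pos = 0
    for width in widths:
        shift = total_bits - pos - width
        fields.append((value >> shift) & ((1 << width) - 1))
        pos += width
    return fields

# e.g. unpack_bits(b'\x25\x0f\xa0\x80', [3, 2, 3, 14, 4]) -> [1, 0, 5, 1000, 2]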
Thanks.
Using bitstring (which you mention you're looking at) it should be easy enough to implement. First, create some data to decode:
>>> import bitstring
>>> myheader = "uint:3, uint:2, uint:3, uint:14, uint:4"
>>> a = bitstring.pack(myheader, 1, 0, 5, 1000, 2)
>>> a.bin
'00100101000011111010000010'
>>> a.tobytes()
'%\x0f\xa0\x80'
And then decoding it again is just
>>> a.readlist(myheader)
[1, 0, 5, 1000, 2]
Your main concern might well be the speed. The library is well optimised Python, but that's not nearly as fast as a C library would be.
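For the actual parsing case, i.e. decoding records straight from a file rather than packing them, a minimal sketch could look like this (the file name is made up, and it assumes the file is an exact sequence of such 26-bit records):

import bitstring

fmt = "uint:3, uint:2, uint:3, uint:14, uint:4"
stream = bitstring.ConstBitStream(filename='data.bin')  # hypothetical input file

headers = []
while stream.pos < len(stream):
    # readlist advances the stream position by 26 bits per record
    headers.append(stream.readlist(fmt))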
I haven't rigorously tested this, but it seems to work with unsigned types (edit: it works with signed byte/short types, too).
Edit 2: This is really hit or miss. It depends on the way the library's compiler packed the bits into the struct, which is not standardized. For example, with gcc 4.5.3 it works as long as I don't use the attribute to pack the struct, i.e. __attribute__ ((__packed__)) (so instead of 6 bytes it gets packed into 4 bytes, which you can check with __alignof__ and sizeof). I can make it almost work by adding _pack_ = True to the ctypes Structure definition, but it fails for fieldE. gcc notes: "Offset of packed bit-field ‘fieldE’ has changed in GCC 4.4".
import ctypes

class MyHeader(ctypes.Structure):
    _fields_ = [
        ('fieldA', ctypes.c_ubyte, 3),
        ('fieldB', ctypes.c_ubyte, 2),
        ('fieldC', ctypes.c_ubyte, 3),
        ('fieldD', ctypes.c_ushort, 14),
        ('fieldE', ctypes.c_ubyte, 4),
    ]

lib = ctypes.cdll.LoadLibrary('C/bitfield.dll')
hdr = MyHeader()
lib.set_header(ctypes.byref(hdr))
for x in hdr._fields_:
    print("%s: %d" % (x[0], getattr(hdr, x[0])))
Output:
fieldA: 3
fieldB: 1
fieldC: 5
fieldD: 12345
fieldE: 9
C:
typedef struct _MyHeader {
    unsigned char fieldA : 3;
    unsigned char fieldB : 2;
    unsigned char fieldC : 3;
    unsigned short fieldD : 14;
    unsigned char fieldE : 4;
} MyHeader, *pMyHeader;

int set_header(pMyHeader hdr) {
    hdr->fieldA = 3;
    hdr->fieldB = 1;
    hdr->fieldC = 5;
    hdr->fieldD = 12345;
    hdr->fieldE = 9;
    return(0);
}
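If the raw bytes come from a file rather than from a C function, the same ctypes structure can also be filled straight from a buffer (a sketch with a made-up file name; the bit-packing caveats from the edit above still apply):

import ctypes

class MyHeader(ctypes.Structure):
    _fields_ = [
        ('fieldA', ctypes.c_ubyte, 3),
        ('fieldB', ctypes.c_ubyte, 2),
        ('fieldC', ctypes.c_ubyte, 3),
        ('fieldD', ctypes.c_ushort, 14),
        ('fieldE', ctypes.c_ubyte, 4),
    ]

with open('data.bin', 'rb') as f:           # hypothetical input file
    raw = f.read(ctypes.sizeof(MyHeader))

hdr = MyHeader.from_buffer_copy(raw)        # copies the bytes into the struct
print(hdr.fieldA, hdr.fieldD)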
I found Modules/_datetimemodule.c, which seems to be the right file, but I need a bit of help as C is not my strength.
>>> import datetime
>>> import sys
>>> d = datetime.datetime.now()
>>> sys.getsizeof(d)
48
>>> d = datetime.datetime(2018, 12, 31, 23, 59, 59, 123)
>>> sys.getsizeof(d)
48
So a timezone-unaware datetime object needs 48 bytes. Looking at the PyDateTime_DateTimeType, it seems to be a PyDateTime_DateType and a PyDateTime_TimeType. Maybe also _PyDateTime_BaseTime?
From looking at the code, I have the impression that one component is stored for each field in YYYY-mm-dd HH:MM:ss, meaning:
Year: e.g. int (e.g. int16_t would be 16 bits)
Month: e.g. int8_t
Day: e.g. int8_t
Hour: e.g. int8_t
Minute: e.g. int8_t
Second: e.g. int8_t
Microsecond: e.g. uint16_t
But that would be 2 * 16 + 5 * 8 = 72 bits = 9 bytes, not the 48 bytes Python tells me.
Where is my assumption about the internal structure of datetime wrong? How can I see this in the code?
(I guess this might differ between Python implementations; if so, please focus on CPython.)
You're missing a key part of the picture: the actual datetime struct definitions, which lie in Include/datetime.h. There are also important comments in there. Here are some key excerpts:
/* Fields are packed into successive bytes, each viewed as unsigned and
 * big-endian, unless otherwise noted:
 *
 * byte offset
 *  0           year     2 bytes, 1-9999
 *  2           month    1 byte,  1-12
 *  3           day      1 byte,  1-31
 *  4           hour     1 byte,  0-23
 *  5           minute   1 byte,  0-59
 *  6           second   1 byte,  0-59
 *  7           usecond  3 bytes, 0-999999
 * 10
 */
...
/* # of bytes for year, month, day, hour, minute, second, and usecond. */
#define _PyDateTime_DATETIME_DATASIZE 10
...
/* The datetime and time types have hashcodes, and an optional tzinfo member,
* present if and only if hastzinfo is true.
*/
#define _PyTZINFO_HEAD \
PyObject_HEAD \
Py_hash_t hashcode; \
char hastzinfo; /* boolean flag */
...
/* All datetime objects are of PyDateTime_DateTimeType, but that can be
* allocated in two ways too, just like for time objects above. In addition,
* the plain date type is a base class for datetime, so it must also have
* a hastzinfo member (although it's unused there).
*/
...
#define _PyDateTime_DATETIMEHEAD \
_PyTZINFO_HEAD \
unsigned char data[_PyDateTime_DATETIME_DATASIZE];
typedef struct
{
_PyDateTime_DATETIMEHEAD
} _PyDateTime_BaseDateTime; /* hastzinfo false */
typedef struct
{
_PyDateTime_DATETIMEHEAD
unsigned char fold;
PyObject *tzinfo;
} PyDateTime_DateTime; /* hastzinfo true */
Additionally, note the following lines in Modules/_datetimemodule.c:
static PyTypeObject PyDateTime_DateTimeType = {
PyVarObject_HEAD_INIT(NULL, 0)
"datetime.datetime", /* tp_name */
sizeof(PyDateTime_DateTime), /* tp_basicsize */
That tp_basicsize line says sizeof(PyDateTime_DateTime), not sizeof(_PyDateTime_BaseDateTime), and the type doesn't implement any special __sizeof__ handling. That means the datetime.datetime type reports its instance size as the size of a time-zone aware datetime, even for unaware instances.
The 48-byte count you're seeing breaks down as follows:
8-byte refcount
8-byte type pointer
8-byte cached hash
1-byte "hastzinfo" flag
10-byte manually packed unsigned char[10] containing datetime data
1-byte "fold" flag (DST-related)
4-byte padding, to align the tzinfo pointer
8-byte tzinfo pointer
This is true even though the actual memory layout of your unaware instance doesn't have a fold flag or tzinfo pointer.
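If you want to double-check that breakdown, here is a rough ctypes mirror of the layout (a sketch assuming a typical 64-bit CPython build; the field list is illustrative, not the real CPython definition):

import ctypes

class DateTimeLayout(ctypes.Structure):
    _fields_ = [
        ('ob_refcnt', ctypes.c_ssize_t),   # 8-byte refcount
        ('ob_type', ctypes.c_void_p),      # 8-byte type pointer
        ('hashcode', ctypes.c_ssize_t),    # 8-byte cached hash (Py_hash_t)
        ('hastzinfo', ctypes.c_char),      # 1-byte flag
        ('data', ctypes.c_ubyte * 10),     # 10-byte packed datetime data
        ('fold', ctypes.c_ubyte),          # 1-byte fold flag
        ('tzinfo', ctypes.c_void_p),       # 8-byte pointer, 8-byte aligned -> 4 bytes of padding before it
    ]

print(ctypes.sizeof(DateTimeLayout))  # 48 on a typical 64-bit build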
This is, of course, all implementation details. It may be different on a different Python implementation, or a different CPython version, or a 32-bit CPython build, or a CPython debug build (there's extra stuff in the PyObject_HEAD when CPython is compiled with Py_TRACE_REFS defined).
From PEP 393 I understand that Python can use multiple encodings internally when storing strings: latin1, UCS-2, UCS-4. Is it possible to find out what encoding is used to store a particular string, e.g. in the interactive interpreter?
There is a CPython C API function that returns the kind of a unicode object: PyUnicode_KIND.
In case you have Cython and IPython1 you can easily access that function:
In [1]: %load_ext cython
...:
In [2]: %%cython
...:
...: cdef extern from "Python.h":
...: int PyUnicode_KIND(object o)
...:
...: cpdef unicode_kind(astring):
...: if type(astring) is not str:
...: raise TypeError('astring must be a string')
...: return PyUnicode_KIND(astring)
In [3]: a = 'a'
...: b = 'Ǧ'
...: c = '😀'
In [4]: unicode_kind(a), unicode_kind(b), unicode_kind(c)
Out[4]: (1, 2, 4)
Where 1 represents latin-1 and 2 and 4 represent UCS-2 and UCS-4 respectively.
You could then use a dictionary to map these numbers into a string that represents the encoding.
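For example (a trivial sketch that assumes the unicode_kind helper compiled above):

# Map PyUnicode_KIND values to human-readable names
KIND_NAMES = {1: 'latin-1', 2: 'UCS-2', 4: 'UCS-4'}

print(KIND_NAMES[unicode_kind('a')])   # latin-1
print(KIND_NAMES[unicode_kind('😀')])  # UCS-4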
1 It's also possible without Cython and/or IPython; the combination is just very handy. Otherwise it would be more code (without IPython) and/or require a manual installation (without Cython).
The only way you can test this from the Python layer (without resorting to manually mucking about with object internals via ctypes or Python extension modules) is by checking the ordinal value of the largest character in the string, which determines whether the string is stored as ASCII/latin-1, UCS-2 or UCS-4. A solution would be something like:
def get_bpc(s):
    maxordinal = ord(max(s, default='\0'))
    if maxordinal < 256:
        return 1
    elif maxordinal < 65536:
        return 2
    else:
        return 4
You can't actually rely on sys.getsizeof, because for non-ASCII strings (even one byte per character strings that fit in the latin-1 range) the string may or may not have its UTF-8 representation populated. Tricks like adding an extra character and comparing sizes can therefore show the size decrease, and the caching can even happen "at a distance", so you're not necessarily the one responsible for the existence of the cached UTF-8 form on the string you're checking. For example:
>>> e = 'é'
>>> sys.getsizeof(e)
74
>>> sys.getsizeof(e + 'a')
75
>>> class é: pass # One of several ways to trigger creation/caching of UTF-8 form
>>> sys.getsizeof(e)
77 # !!! Grew three bytes even though it's the same variable
>>> sys.getsizeof(e + 'a')
75 # !!! Adding a character shrunk the string!
One way of finding out which exact internal encoding CPython uses for a specific unicode string is to peek in the actual (CPython) object.
According to PEP 393 (Specification section), all unicode string objects start with PyASCIIObject:
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:2;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
    } state;
    wchar_t *wstr;
} PyASCIIObject;
Character size is stored in the kind bit-field, as described in the PEP, as well as in the code comments in unicodeobject.h:
00 => str is not initialized (data are in wstr)
01 => 1 byte (Latin-1)
10 => 2 byte (UCS-2)
11 => 4 byte (UCS-4)
After we get the address of the string with id(string), we can use the ctypes module to read the object's bytes (and the kind field):
import ctypes
mystr = "x"
first_byte = ctypes.c_uint8.from_address(id(mystr)).value
The offset from the object's start to kind is PyObject_HEAD + Py_ssize_t length + Py_hash_t hash, which in turn is Py_ssize_t ob_refcnt + pointer to ob_type + Py_ssize_t length + size of another pointer for the hash type:
offset = 2 * ctypes.sizeof(ctypes.c_ssize_t) + 2 * ctypes.sizeof(ctypes.c_void_p)
(which is 32 on x64)
All put together:
import ctypes

def bytes_per_char(s):
    offset = 2 * ctypes.sizeof(ctypes.c_ssize_t) + 2 * ctypes.sizeof(ctypes.c_void_p)
    kind = ctypes.c_uint8.from_address(id(s) + offset).value >> 2 & 3
    size = {0: ctypes.sizeof(ctypes.c_wchar), 1: 1, 2: 2, 3: 4}
    return size[kind]
Gives:
>>> bytes_per_char('test')
1
>>> bytes_per_char('đžš')
2
>>> bytes_per_char('😀')
4
Note we had to handle the special case of kind == 0, because then the character type is exactly wchar_t (which is 16 or 32 bits, depending on the platform).
I would really appreciate an explanation of the output of the following piece of code. I don't understand why sizeof(struct_2) and sizeof(my_struct_2) are different, given that sizeof(struct_1) and sizeof(c_int) are the same.
It seems ctypes packs a struct within a struct in some different way?
from ctypes import *

class struct_1(Structure):
    pass

int8_t = c_int8
int16_t = c_int16
uint8_t = c_uint8

struct_1._fields_ = [
    ('tt1', int16_t),
    ('tt2', uint8_t),
    ('tt3', uint8_t),
]

class struct_2(Structure):
    pass

int8_t = c_int8
int16_t = c_int16
uint8_t = c_uint8

struct_2._fields_ = [
    ('t1', int8_t),
    ('t2', uint8_t),
    ('t3', uint8_t),
    ('t4', uint8_t),
    ('t5', int16_t),
    ('t6', struct_1),
    ('t7', struct_1 * 6),
]

class my_struct_2(Structure):
    #_pack_ = 1 # This will give answer as 34
    #_pack_ = 4 # 36
    _fields_ = [
        ('t1', c_int8),
        ('t2', c_uint8),
        ('t3', c_uint8),
        ('t4', c_uint8),
        ('t5', c_int16),
        ('t6', c_int),
        ('t7', c_int * 6),
    ]
print "size of c_int : ", sizeof(c_int)
print "size of struct_1 : ", sizeof(struct_1)
print "size of my struct_2 : ", sizeof(my_struct_2)
print "siz of origional struct_2: ", sizeof(struct_2)
OUTPUT:
size of c_int : 4
size of struct_1 : 4
size of my struct_2 : 36
size of original struct_2: 34 ==> why not 36 ??
EDIT:
Renamed t6 -> t7 (the array of struct_1) and removed _pack_ = 2 from struct_2, but I still see different sizes for struct_2 and my_struct_2.
The difference arises from the presence or absence of padding between or after elements in the structure layout, because when the sizes differ, the larger one accounts for more bytes than are required for all the individual structure members. Members t5 and t6 alone are sufficient to demonstrate the difference, and there is no difference if t5 (only) is omitted.
A little experimentation shows that by default (i.e. when the _pack_ member is not specified), ctypes provides 2-byte alignment for structure type struct_1, but 4-byte alignment for type c_int. Or it does on my system, anyway. The ctypes documentation claims that by default it lays out structures the same way that the system's C compiler (by default) does, and that indeed seems to be the case. Consider this C program:
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

int main() {
    struct s {
        int16_t x;
        int8_t y;
        uint8_t z;
    };

    struct t1 {
        int16_t x;
        struct s y;
    };

    struct t2 {
        int16_t x;
        int y;
    };

    printf("The size of int is %zu\n", sizeof(int));
    printf("The size of struct s is %zu\n", sizeof(struct s));
    printf("The size of struct t1 is %zu\n", sizeof(struct t1));
    printf("The size of struct t2 is %zu\n", sizeof(struct t2));

    printf("\nThe offset of t1.y is %zu\n", offsetof(struct t1, y));
    printf("The offset of t2.y is %zu\n", offsetof(struct t2, y));
}
Its output for me (on CentOS 7 w/ GCC 4.8 on x86_64) is:
The size of int is 4
The size of struct s is 4
The size of struct t1 is 6
The size of struct t2 is 8
The offset of t1.y is 2
The offset of t2.y is 4
Observe that the sizes of int and struct s are the same (4 bytes), but the compiler is aligning the struct s on a 2-byte boundary within struct t1, whereas it aligns the int on a 4-byte boundary within struct t2. This matches perfectly with the behavior of ctypes on the same system.
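You can also confirm the ctypes side of this directly with ctypes.alignment() (a minimal sketch; the exact numbers are platform- and compiler-dependent):

import ctypes

# Rebuild struct_1 from the question and compare its alignment with c_int's.
class struct_1(ctypes.Structure):
    _fields_ = [
        ('tt1', ctypes.c_int16),
        ('tt2', ctypes.c_uint8),
        ('tt3', ctypes.c_uint8),
    ]

print(ctypes.sizeof(struct_1), ctypes.alignment(struct_1))           # 4 2 here
print(ctypes.sizeof(ctypes.c_int), ctypes.alignment(ctypes.c_int))   # 4 4 here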
As for why GCC chooses the alignment it does, I observe that if I add a member of type int to struct s then GCC switches to using 4-byte alignment for the struct, as well as arranging (by default) for the offset of the int within the structure to be a multiple of 4 bytes. It is reasonable to conclude that GCC is laying out members within a struct and choosing the alignment of the struct overall so that all members of every aligned struct instance are themselves naturally aligned. Do note, however, that this is just an example. C implementations are largely at their own discretion with respect to choosing structure layout and alignment requirements.
I am using zlib to compress some data, and I am running into a weird issue: data compressed with Python is smaller than the same data compressed with C++. I have 130 MB of simulation data I want to save compressed (otherwise there are too many files for all the necessary data).
Using C++, I have something of the sort:
// calculate inputData (double * 256 * 256 * 256)
unsigned int length = inputLength;
unsigned int outLength = length + length / 1000 + 12 + 1;
printf("Length: %d %d\n", length, outLength);

Byte *outData = new Byte[outLength];

z_stream strm;
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.next_in = (Byte *) inputData;
strm.avail_in = length;

deflateInit(&strm, -1);
do {
    strm.next_out = outData;
    strm.avail_out = outLength;
    deflate(&strm, Z_FINISH);
    unsigned int have = outLength - strm.avail_out;
    fwrite(outData, 1, have, output);
} while (strm.avail_out == 0);
deflateEnd(&strm);

delete[] outData;
The result using C++ is around 120MB, which is hardly what I expect, as the original is close to 130MB.
In python:
from array import array
import zlib

# read data from uncompressed file
arrD = array('d', data)
file.write(zlib.compress(arrD))
The result using Python is around 50 MB with the same input data, less than half the C++ result. The C++ code is mostly based on the one used in Python's implementation, which makes this issue even weirder.
For C++, I am using Visual Studio 2010 Professional with zlib 1.2.8 compiled by myself.
For Python, I am using the official Python 3.4.2.
Let's suppose I have a kernel to compute the element-wise sum of two arrays. Rather than passing a, b, and c as three parameters, I make them structure members as follows:
typedef struct
{
    __global uint *a;
    __global uint *b;
    __global uint *c;
} SumParameters;

__kernel void compute_sum(__global SumParameters *params)
{
    uint id = get_global_id(0);
    params->c[id] = params->a[id] + params->b[id];
    return;
}
There is information on structures if you RTFM of PyOpenCL [1], and others have addressed this question too [2] [3] [4]. But none of the OpenCL struct examples I've been able to find have pointers as members.
Specifically, I'm worried about whether host/device address spaces match, and whether host/device pointer sizes match. Does anyone know the answer?
[1] http://documen.tician.de/pyopencl/howto.html#how-to-use-struct-types-with-pyopencl
[2] Struct Alignment with PyOpenCL
[3] http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/
[4] http://acooke.org/cute/Somesimple0.html
No, there is no guarantee that address spaces match. For the basic types (float, int, …) you have alignment requirements (section 6.1.5 of the standard) and you have to use the cl_ type names of the OpenCL implementation when programming in C (PyOpenCL does that job under the hood, I'd say).
For pointers it's even more clear-cut, because of this mismatch. The very beginning of section 6.9 of the standard v1.2 (section 6.8 in version 1.1) states:
Arguments to kernel functions declared in a program that are pointers
must be declared with the __global, __constant or __local qualifier.
And in point p.:
Arguments to kernel functions that are declared to be a struct or
union do not allow OpenCL objects to be passed as elements of the
struct or union.
Note also the point d.:
Variable length arrays and structures with flexible (or unsized)
arrays are not supported.
So there is no way to make your kernel run as described in your question, and that's why you haven't been able to find examples of OpenCL structs that have pointers as members.
I can still propose a workaround that takes advantage of the fact that the kernel is compiled just in time. It requires that you pack your data properly, that you pay attention to alignment, and that the size doesn't change during the execution of the program. I honestly would go for a kernel taking 3 buffers as arguments, but anyhow, here it is.
The idea is to use the preprocessor option -D, as in the following Python example:
Kernel:
typedef struct {
    uint a[SIZE];
    uint b[SIZE];
    uint c[SIZE];
} SumParameters;

kernel void foo(global SumParameters *params){
    int idx = get_global_id(0);
    params->c[idx] = params->a[idx] + params->b[idx];
}
Host code:
import numpy as np
import pyopencl as cl

def bar():
    mf = cl.mem_flags
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prog_f = open('kernels.cl', 'r')
    # a = (1, 2, 3), b = (4, 5, 6)
    ary = np.array([(1, 2, 3), (4, 5, 6), (0, 0, 0)], dtype='uint32, uint32, uint32')
    cl_ary = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=ary)
    # Here we should compute the size, but it's hardcoded for the example
    size = 3
    # The important part follows, using the -D option
    prog = cl.Program(ctx, prog_f.read()).build(options="-D SIZE={0}".format(size))
    prog.foo(queue, (size,), None, cl_ary)
    result = np.zeros_like(ary)
    cl.enqueue_copy(queue, result, cl_ary).wait()
    print result
And the result:
[(1L, 2L, 3L) (4L, 5L, 6L) (5L, 7L, 9L)]
I don't know the answer to my own question, but there are 3 workarounds I can come up with off the top of my head. I consider Workaround 3 the best option.
Workaround 1: We only have 3 parameters here, so we could just make a, b, and c kernel parameters. But I've read there's a limit on the number of parameters you can pass to a kernel, and I personally like to refactor any function that takes more than 3-4 arguments to use structs (or, in Python, tuples or keyword arguments). So this solution makes the code harder to read, and doesn't scale.
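For reference, the plain three-argument version of this workaround might look like the following in PyOpenCL (a self-contained sketch; sizes, names and the kernel source are mine, not from the question):

import numpy as np
import pyopencl as cl

src = """
__kernel void compute_sum(__global const uint *a, __global const uint *b, __global uint *c)
{
    uint id = get_global_id(0);
    c[id] = a[id] + b[id];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

n = 16
a = np.arange(n, dtype=np.uint32)
b = np.arange(n, dtype=np.uint32) * 10
c = np.empty_like(a)

a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, c.nbytes)

prog = cl.Program(ctx, src).build()
prog.compute_sum(queue, (n,), None, a_buf, b_buf, c_buf)
cl.enqueue_copy(queue, c, c_buf)
print(c)   # a + b, element-wise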
Workaround 2: Dump everything in a single giant array. Then the kernel would look like this:
typedef struct
{
    uint ai;
    uint bi;
    uint ci;
} SumParameters;

__kernel void compute_sum(__global SumParameters *params, __global uint *data)
{
    uint id = get_global_id(0);
    data[params->ci + id] = data[params->ai + id] + data[params->bi + id];
    return;
}
In other words, instead of using pointers, use offsets into a single array. This looks an awful lot like the beginnings of implementing my own memory model, and it feels like it's reinventing a wheel that exists somewhere in PyOpenCL, or OpenCL, or both.
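On the host side, Workaround 2 then boils down to one flat buffer plus a small offsets struct, something like this sketch (sizes and names are illustrative):

import numpy as np

# One flat data buffer; a, b and c are just offset ranges inside it.
n = 1024
data = np.zeros(3 * n, dtype=np.uint32)
data[0:n] = np.arange(n, dtype=np.uint32)           # the "a" region
data[n:2 * n] = np.arange(n, dtype=np.uint32) * 2   # the "b" region

# The offsets struct mirroring SumParameters {ai, bi, ci}.
params = np.array([(0, n, 2 * n)], dtype=[('ai', 'u4'), ('bi', 'u4'), ('ci', 'u4')])

# data and params would then each go into a cl.Buffer and be passed to compute_sum.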
Workaround 3: Make setter kernels. Like this:
__kernel void set_a(__global SumParameters *params, __global uint *a)
{
    params->a = a;
    return;
}
and ditto for set_b, set_c. Then execute these kernels with worksize 1 to set up the data structure. You still need to know how big a block to allocate for params, but if it's too big, nothing bad will happen (except a little wasted memory), so I'd say just assume the pointers are 64 bits.
This workaround's performance is probably awful (I imagine a kernel call has enormous overhead), but fortunately that shouldn't matter too much for my application: my kernel is going to run for seconds at a time, not a graphics workload that has to hit 30-60 fps, so the time taken by the extra kernel calls to set parameters should be a tiny fraction of my workload, no matter how high the per-kernel-call overhead is.