Python C Module - Malloc fails in specific version of Python

I'm writing a Python module to perform IO in an O_DIRECT context. One of the limitations of O_DIRECT is that you must read into a buffer aligned on a 4096-byte boundary on 2.4 and 2.5 kernels, while 2.6 and up will accept any multiple of 512.
The obvious memory allocation candidate for this is posix_memalign(void **memptr, size_t alignment, size_t size)
In my code, I allocate an area like so:
char *buffer = NULL;
int mem_ret = posix_memalign((void**)&buffer, alignment, size);
if (!buffer) {
    PyErr_NoMemory();
    return NULL;
}
/* I do some stuff here */
free(buffer);
When I compile and import the module with python3.2, this (and the rest of the unshown module) works fine.
When I attempt the same with python2.7 (I'd like to preserve compatibility) it throws the PyErr_NoMemory exception, and mem_ret == ENOMEM, indicating it was unable to allocate.
Why would the version of Python I compile against affect how posix_memalign operates?
OS: Ubuntu 12.04 LTS
Compilers: Clang and GCC show the same behaviour
UPDATE
I now have a working piece of code, thanks to user694733
However, the fact that it works has me even more confused:
#if PY_MAJOR_VERSION >= 3
char *buffer = NULL;
int mem_ret = posix_memalign((void**)&buffer, alignment, count);
#else
void *mem = NULL;
int mem_ret = posix_memalign(&mem, alignment, count);
char *buffer = (char*)mem;
#endif
Can anyone explain why the incorrect first block works under Python3, but not 2.7, and more importantly why the correct second block does not work under Python3?
UPDATE 2
The plot thickens: having settled on the correct form of the code below, I tested on 4 different versions of Python.
void *mem = NULL;
int mem_ret = posix_memalign(&mem, alignment, count);
char *buffer = (char*)mem;
if (!buffer) {
    PyErr_NoMemory();
    return NULL;
}
/* Do stuff with buffer */
free(buffer);
Under Python 2.7: This code operates as expected.
Under Python 3.1: This code operates as expected.
Under Python 3.2: This code generates mem_ret == ENOMEM and returns NULL for buffer
Under Python 3.3: This code operates as expected.
The Python versions not included in the Ubuntu repositories were installed from the PPA at https://launchpad.net/~fkrull/+archive/deadsnakes
If the version-tagged Python binaries are to be believed, the versions I have installed are:
python2.7
python3.1
python3.2mu (--with-pymalloc --with-wide-unicode)
python3.3m (--with-pymalloc)
Could the use of the wide-unicode flag in the default Python3 distribution be causing this error? If so, how is this happening?
For clarity, the ENOMEM failure to allocate will occur with any variant of malloc(), even something as simple as malloc(512).

For a quick work-around, stick to mmap instead of malloc+memalign.
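For example, a minimal sketch (my own illustration, not from the original answer): anonymous mappings returned by mmap are always page-aligned, which satisfies the O_DIRECT requirement on every kernel the question mentions:

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t size = 4096;  /* keep it a multiple of 512 (or 4096 for old kernels) */

    /* MAP_ANONYMOUS memory is page-aligned (at least 4096 bytes), so it is
       safe to hand to read() on an O_DIRECT file descriptor. */
    char *buffer = mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buffer == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* ... read(fd, buffer, size) with an O_DIRECT fd would go here ... */

    munmap(buffer, size);
    return 0;
}

Note that the buffer must then be released with munmap rather than free.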

posix_memalign may not be the same body of code in one compilation environment as another. You could easily imagine that Python 3 would use different feature test macros from Python 2. That could mean it ends up running different code.
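A sketch of that pitfall (my illustration, not necessarily what happened here): glibc only declares posix_memalign when _POSIX_C_SOURCE >= 200112L (or _XOPEN_SOURCE >= 600), and CPython's own pyconfig.h defines feature macros like these itself, with values that can differ between builds. If the declaration goes missing, the compiler can fall back to an implicit int declaration, and the value that comes back in mem_ret is then garbage:

/* Define the feature macro before ANY include. In an extension module this
   effectively means including Python.h first, as its docs require, so that
   one consistent set of feature macros is in effect. */
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *mem = NULL;
    /* With the declaration visible, the return code is trustworthy:
       0 on success, EINVAL or ENOMEM on failure. */
    int mem_ret = posix_memalign(&mem, 4096, 512);
    if (mem_ret != 0) {
        fprintf(stderr, "posix_memalign: %d\n", mem_ret);
        return 1;
    }
    free(mem);
    return 0;
}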
You might have a look at the symbols that are used... oftentimes the output of ldd or nm will have mangled names that indicate what version is actually being used.
Additionally, what does an strace show of the allocation system call? I find that's a good way of seeing if the arguments passed in are incorrect, which can be a reason for getting ENOMEM.

Related

ITK Filter slower in C++ than Python

I have made a module in Python using SimpleITK, which I tried to speed up by reimplementing in C++. It turns out to be quite a lot slower.
The bottleneck is the usage of the DisplacementFieldJacobianDeterminantFilter.
These two snippets give an example of the usage of the filters.
1000 generations: C++ = 55s, Python = 8s
Should I expect the C++ to be faster?
import sys
import SimpleITK as sitk

def test_DJD(label_path, ngen):
    im = sitk.ReadImage(label_path)
    for i in range(ngen):
        jacobian = sitk.DisplacementFieldJacobianDeterminant(im)

if __name__ == '__main__':
    label = sys.argv[1]
    ngen = int(sys.argv[2])
    test_DJD(label, ngen)
And the C++ code:
#include <string>
#include <sstream>
#include "itkImage.h"
#include "itkImageFileReader.h"
#include "itkDisplacementFieldJacobianDeterminantFilter.h"

typedef itk::Vector<float, 3> VectorType;
typedef itk::Image<VectorType, 3> VectorImageType;
typedef itk::DisplacementFieldJacobianDeterminantFilter<VectorImageType> JacFilterType;
typedef itk::Image<float, 3> FloatImageType;

int main(int argc, char** argv) {
    std::string idealJacPath = argv[1];
    std::string numGensString = argv[2];
    int numGens;
    std::istringstream(numGensString) >> numGens;

    typedef itk::ImageFileReader<VectorImageType> VectorReaderType;
    VectorReaderType::Pointer reader = VectorReaderType::New();
    reader->SetFileName(idealJacPath);
    reader->Update();
    VectorImageType::Pointer vectorImage = reader->GetOutput();

    JacFilterType::Pointer jacFilter = JacFilterType::New();
    FloatImageType::Pointer generatedJac = FloatImageType::New();

    for (int i = 0; i < numGens; i++) {
        jacFilter->SetInput(vectorImage);
        jacFilter->Update();
        jacFilter->Modified();
        generatedJac = jacFilter->GetOutput();
    }
    return 0;
}
I'm using C++ ITK 4.8.2, compiled in 'release' mode on Ubuntu 15.04, and the Python SimpleITK v 9.0.
You seem to be benchmarking using loops. Using loops for benchmarking is not a good practice, because compilers and interpreters do a lot of optimizations to them.
I believe that here:
for i in range(ngen):
    jacobian = sitk.DisplacementFieldJacobianDeterminant(im)
the Python interpreter most probably realizes that you are only using the last value assigned to the jacobian variable, and therefore executes only ONE iteration of the loop. This is a very common loop optimization.
On the other hand, since you call a couple of dynamic methods in the C++ version (jacFilter->Update();), it is possible that the compiler could not infer that the other calls are unused, making your C++ version slower since all the invocations of the filter's Update() method are actually made.
Another possible cause is that the ITK pipeline in Python is not being forced to update, since you explicitly call jacFilter->Modified() in C++ but there is no equivalent explicit call in the Python version.

affinity.get_process_affinity_mask(pid) returns ValueError 22

I want to pin my thread workers to a certain CPU (I want to test how the GIL impacts my program...), and I found a third-party library called affinity.
I used pip install affinity to make it available in my VM (Linux). Unfortunately, I got the error below:
>>> pid = os.getpid()
>>> affinity.get_process_affinity_mask(pid)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: (22, 'Invalid argument')
From a look at the code, it's supposed to work on the Linux platform:
...
elif sys.platform in ('linux2'):
from _affinity import set_process_affinity_mask, get_process_affinity_mask
...
Could anyone give me some clue about this error? Or is there any other way I could use in my case?
Looking at the C code and the man page for sched_getaffinity, I'm not too surprised that it might fail with 'invalid argument'. The C code is passing in a single unsigned long where the function expects a cpu_set_t, which can be up to 128 unsigned longs.
I don't know much about the cpu_set_t structure, but on the surface it appears it might be one value per physical CPU, with individual cores represented as bits within one of those values. In that case, I would expect this module to fail on any machine with more than one CPU.
My system is a single-CPU dual core, so the module works for me. Is your VM configured with more than one physical CPU? As a test, try reconfiguring it to have only one with multiple cores and see if you are more successful.
If I'm right, the only way around this is modifying the C module to handle the cpu_set_t result correctly, probably using the macros described in CPU_SET(3).
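A sketch of what that fix could look like (hypothetical code, not the actual affinity module), using the _S macros so that the mask size passed to the kernel matches the dynamically allocated set:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    int max_cpus = 1024;  /* generous upper bound; grow and retry if needed */
    cpu_set_t *set = CPU_ALLOC(max_cpus);
    size_t size = CPU_ALLOC_SIZE(max_cpus);

    CPU_ZERO_S(size, set);
    /* Pass the real size of the dynamically sized set instead of a single
       unsigned long, so larger machines don't trigger EINVAL. */
    if (sched_getaffinity(0, size, set) < 0) {
        perror("sched_getaffinity");
        CPU_FREE(set);
        return 1;
    }
    printf("CPUs in affinity mask: %d\n", CPU_COUNT_S(size, set));
    CPU_FREE(set);
    return 0;
}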
What is your VM environment? VM software, CPU/Core count, Linux version?
Give this test program a try and see if your output is any different:
$ cat test.c
#include <stdio.h>

// The CPU_SET man page says to use the second definition, but my system
// wouldn't compile this code without the first one.
#define __USE_GNU
#define _GNU_SOURCE

#include <sched.h>
#include <errno.h>

int main(void) {
    cpu_set_t cur_mask;
    unsigned int len = sizeof(cpu_set_t);
    if (sched_getaffinity(0, len, &cur_mask) < 0) {
        printf("Error: %d\n", errno);
        return errno;
    }
    int cpu_count = CPU_COUNT(&cur_mask);
    printf("Cpu count: %d and size needed: %d\n", cpu_count, CPU_ALLOC_SIZE(cpu_count));
    return 0;
}
$ gcc -std=c99 test.c
$ ./a.out
Cpu count: 2 and size needed: 8
On my system, it seems one unsigned long is enough to hold up to 64 CPUs, so it appears much simpler than I thought. Different hardware/architectures/kernel versions could always differ though.

I want to embed Python in an MFC application, by linking dynamically to Python

I want to embed Python in my MFC application, by linking dynamically to Python.
hModPython = AfxLoadLibrary("Python23.dll");

pFnPyRun_SimpleString *pFunction = NULL;
pFnPy_Initialize *pPy_Initialize = NULL;

pFunction = (pFnPyRun_SimpleString *)::GetProcAddress(hModPython, "PyRun_SimpleString");
pPy_Initialize = (pFnPy_Initialize *)::GetProcAddress(hModPython, "Py_Initialize");

try
{
    pPy_Initialize();
    if (pFunction)
    {
        (*pFunction)("import sys"); // call the code
    }
    else
    {
        AfxMessageBox("unable to access function from python23.dll.", MB_ICONSTOP|MB_OK);
    }
}
catch(...)
{
}
And then I want to execute a Python script through my MFC application -
HANDLE hFile = CreateFile(file, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwSize = GetFileSize(hFile, NULL);
DWORD dwRead;
char *s = new char[dwSize + 1];
ReadFile(hFile, s, dwSize, &dwRead, NULL);
s[dwSize] = '\0';
CString wholefile(s);
wholefile.Remove('\r');
wholefile += "\n";
CloseHandle(hFile);

pFnPy_CompileString *pFPy_CompileString = (pFnPy_CompileString *)::GetProcAddress(hModPython, "Py_CompileString");
CString fl(file);
PyObject* pCodeObject = pFPy_CompileString(wholefile.GetBuffer(0), fl.GetBuffer(0), Py_file_input);

if (pCodeObject != NULL)
{
    pFnPyEval_EvalCode *pFPyEval_EvalCode = (pFnPyEval_EvalCode *)::GetProcAddress(hModPython, "PyEval_EvalCode");
    PyObject* pObject = pFPyEval_EvalCode((PyCodeObject*)pCodeObject, m_Dictionary, m_Dictionary);
}
I am facing two problems here: I want to link to Python dynamically, and also make my VC application independent of the location where Python is installed on the user's machine. However, I am required to include Python.h for my code to compile the following declaration.
PyObject* pCodeObject
Is there a workaround for this? Or do I have to specify the include for "Python.h", which would mean the program becomes path-dependent again?
I tried copying some of the Python definitions, including PyObject, into a header in my MFC app. Then it compiles fine, but the Py_CompileString call fails, so I am still unable to run a script from my MFC application by linking to Python dynamically.
How can this be done? Is there a different approach to linking to Python dynamically?
What you are doing is quite tricky... I don't understand the problem with Python.h. It is needed only for compiling your program, not for running it. Surely you do not need it to be compile-time-path-independent, do you?
Anyway, you may get rid of the PyObject* definition simply by replacing it with void*, because they are binary compatible types. And it looks like you don't care too much about the type safety of your solution.
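For instance, the function-pointer typedefs from the question could be declared without Python.h roughly like this (a sketch; the typedef names come from the question, and the Py_file_input value should be double-checked against the headers of the Python you target):

/* Opaque handles: every PyObject* / PyCodeObject* becomes a void*. */
typedef void  pFnPy_Initialize(void);
typedef int   pFnPyRun_SimpleString(const char *command);
typedef void *pFnPy_CompileString(const char *str, const char *filename,
                                  int start);
typedef void *pFnPyEval_EvalCode(void *code, void *globals, void *locals);

/* Normally defined by Python's headers; 257 has been its value for a long
   time, but verify against your Python version. */
#define Py_file_input 257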
I'd say that the reason your Py_CompileString fails may be that you have an error in your script, or something. You should really look into the raised Python exception.
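A sketch of that, staying with the GetProcAddress style from the question (PyErr_Print prints the pending exception and traceback to stderr and clears the error indicator; the surrounding variables are the question's):

/* Hypothetical continuation of the question's code. */
typedef void pFnPyErr_Print(void);

pFnPyErr_Print *pPyErr_Print =
    (pFnPyErr_Print *)GetProcAddress(hModPython, "PyErr_Print");

if (pCodeObject == NULL && pPyErr_Print != NULL)
{
    (*pPyErr_Print)();  /* dumps the Python traceback to stderr */
}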

Passing Linux boot opts to Init

I would like to pass some parameters to a customized Linux init via the boot options configured in the bootloader at boot.
I've written test inits in both Python and C. The Python version is able to see anything in the kernel boot options that doesn't have a '=' or '.' in it. The values are found in sys.argv. However, the C program doesn't seem to get passed the values. I would have thought the sys.argv list in Python was generated by parsing the **argv array. Below are the test scripts and screenshots that will hopefully help clarify.
The kernel boot line is:
kernel /grub/linux-2.6.38.4 root=/dev/vda1 init=/argv-{p|c} one two three four five
Python version:
#!/usr/bin/python
import sys

i = 0
print("Printing argv[] (Python) ...")
for each in range(i, len(sys.argv)):
    print("argv[%d] - %s" % (i, sys.argv[i]))
    i += 1
print("...finished printing argv[]")
C version:
#include <stdio.h>

int main(int argc, char **argv)
{
    int i;
    printf("Printing argv[] (C) ...\n");
    for (i; i < argc; i++) {
        printf("argv[%d] - %s\n", i, argv[i]);
    }
    printf("...finished printing argv[]\n");
}
You can see that just before the test programs exit (and cause a panic), the Python version spits out the boot options the kernel didn't digest, while the C version doesn't. I've looked at the sysvinit source code and it looks to me (not being a C dev) like it works the same way?
How do I get the boot options passed to my C init program?
(oh, and the C program works as expected when not being run as init)
I don't know C, but I think where it says int i; (line 4) it should be int i = 0;. If I am wrong, add a comment to my answer and I will delete it.
Edit: you could also do i = 0 in the for loop: for(i = 0; i < argc; i++).
You need to initialize i to 0, as Artur said. If you don't, the value of i is whatever happened to be in memory at the time the program ran. Sometimes it will work; other times i will be >= argc and the loop will be skipped; in the worst case i is negative and your program segfaults.
Also, in Python try:
# i does not need to be initialized.
# For counting, xrange is better: it does not build the whole list in memory.
for i in xrange(1, len(sys.argv)):
    print("arg[%d] - %s" % (i, sys.argv[i]))
# i does not need to be incremented manually.

How do you call Python code from C code?

I want to extend a large C project with some new functionality, but I really want to write it in Python. Basically, I want to call Python code from C code. However, Python->C wrappers like SWIG allow for the OPPOSITE, that is writing C modules and calling C from Python.
I'm considering an approach involving IPC or RPC (I don't mind having multiple processes); that is, having my pure-Python component run in a separate process (on the same machine) and having my C project communicate with it by writing to and reading from a socket (or Unix pipe). Is that a reasonable approach? Is there something better, like some special RPC mechanism?
Thanks for the answers so far; however, I'd like to focus on IPC-based approaches, since I want to have my Python program in a separate process from my C program. I don't want to embed a Python interpreter.
I recommend the approaches detailed here. It starts by explaining how to execute strings of Python code, then from there details how to set up a Python environment to interact with your C program, call Python functions from your C code, manipulate Python objects from your C code, etc.
EDIT: If you really want to go the route of IPC, then you'll want to use the struct module or better yet, protlib. Most communication between a Python and C process revolves around passing structs back and forth, either over a socket or through shared memory.
I recommend creating a Command struct with fields and codes to represent commands and their arguments. I can't give much more specific advice without knowing more about what you want to accomplish, but in general I recommend the protlib library, since it's what I use to communicate between C and Python programs (disclaimer: I am the author of protlib).
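For a flavor of what that looks like, here is a hypothetical Command struct on the C side (the field names and layout are invented for illustration; on the Python end the same frame could be unpacked with struct.unpack or described with protlib):

#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* A fixed-layout, fixed-size frame: two 32-bit fields and a 32-byte name,
   40 bytes total with no padding. On a little-endian host the Python side
   can read it with struct.unpack("<ii32s", data). */
struct Command {
    int32_t code;      /* which operation the peer should perform */
    int32_t arg;       /* a single integer argument */
    char    name[32];  /* NUL-padded string argument */
};

/* Write one command frame to a socket or pipe. */
static int send_command(int fd, int32_t code, int32_t arg, const char *name)
{
    struct Command cmd;
    memset(&cmd, 0, sizeof cmd);
    cmd.code = code;
    cmd.arg  = arg;
    strncpy(cmd.name, name, sizeof cmd.name - 1);
    return write(fd, &cmd, sizeof cmd) == (ssize_t)sizeof cmd ? 0 : -1;
}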
Have you considered just wrapping your python application in a shell script and invoking it from within your C application?
Not the most elegant solution, but it is very simple.
See the relevant chapter in the manual: http://docs.python.org/extending/
Essentially you'll have to embed the python interpreter into your program.
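The skeleton is short; a minimal sketch along the lines of that chapter (Python 2 here, linked with e.g. -lpython2.7):

#include <Python.h>

int main(void)
{
    Py_Initialize();                 /* start the embedded interpreter */
    PyRun_SimpleString("from time import time\n"
                       "print 'embedded Python says:', time()\n");
    Py_Finalize();                   /* shut the interpreter down */
    return 0;
}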
I haven't used an IPC approach for Python<->C communication but it should work pretty well. I would have the C program do a standard fork-exec and use redirected stdin and stdout in the child process for the communication. A nice text-based communication will make it very easy to develop and test the Python program.
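A sketch of that setup (child.py is a hypothetical script that reads requests on stdin and answers on stdout):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0 || pipe(from_child) < 0) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                         /* child: becomes Python */
        dup2(to_child[0], STDIN_FILENO);    /* commands arrive on stdin */
        dup2(from_child[1], STDOUT_FILENO); /* replies leave on stdout */
        close(to_child[1]);
        close(from_child[0]);
        execlp("python", "python", "child.py", (char *)NULL);
        perror("execlp");                   /* only reached on failure */
        _exit(127);
    }

    /* parent: plain text over the two pipes keeps debugging easy */
    close(to_child[0]);
    close(from_child[1]);
    write(to_child[1], "ping\n", 5);

    char reply[128];
    ssize_t n = read(from_child[0], reply, sizeof reply - 1);
    if (n > 0) {
        reply[n] = '\0';
        printf("child said: %s", reply);
    }
    return 0;
}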
If I had decided to go with IPC, I'd probably splurge with XML-RPC -- cross-platform, lets you easily put the Python server project on a different node later if you want, has many excellent implementations (see here for many, including C and Python ones, and here for the simple XML-RPC server that's part of the Python standard library -- not as highly scalable as other approaches but probably fine and convenient for your use case).
It may not be a perfect IPC approach for all cases (or even a perfect RPC one, by all means!), but the convenience, flexibility, robustness, and broad range of implementations outweigh a lot of minor defects, in my opinion.
This seems quite nice: http://thrift.apache.org/. There is even a book about it.
Details:
The Apache Thrift software framework, for scalable cross-language
services development, combines a software stack with a code generation
engine to build services that work efficiently and seamlessly between
C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa,
JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
I've used the "standard" approach of Embedding Python in Another Application. But it's complicated/tedious. Each new function in Python is painful to implement.
I saw an example of Calling PyPy from C. It uses CFFI to simplify the interface but it requires PyPy, not Python. Read and understand this example first, at least at a high level.
I modified the C/PyPy example to work with Python. Here's how to call Python from C using CFFI.
My example is more complicated because I implemented three functions in Python instead of one. I wanted to cover additional aspects of passing data back and forth.
The complicated part is now isolated to passing the address of api to Python. That only has to be implemented once. After that it's easy to add new functions in Python.
interface.h
// These are the three functions that I implemented in Python.
// Any additional function would be added here.
struct API {
    double (*add_numbers)(double x, double y);
    char* (*dump_buffer)(char *buffer, int buffer_size);
    int (*release_object)(char *obj);
};
test_cffi.c
//
// Calling Python from C.
// Based on Calling PyPy from C:
// http://doc.pypy.org/en/latest/embedding.html#more-complete-example
//
#include <stdio.h>
#include <assert.h>
#include "Python.h"
#include "interface.h"
struct API api; /* global var */

int main(int argc, char *argv[])
{
    int rc;

    // Start Python interpreter and initialize "api" in interface.py using
    // old style "Embedding Python in Another Application":
    // https://docs.python.org/2/extending/embedding.html#embedding-python-in-another-application
    PyObject *pName, *pModule, *py_results;
    PyObject *fill_api;

#define PYVERIFY(exp) if ((exp) == 0) { fprintf(stderr, "%s[%d]: ", __FILE__, __LINE__); PyErr_Print(); exit(1); }

    Py_SetProgramName(argv[0]);  /* optional but recommended */
    Py_Initialize();
    PyRun_SimpleString(
        "import sys;"
        "sys.path.insert(0, '.')" );
    PYVERIFY( pName = PyString_FromString("interface") )
    PYVERIFY( pModule = PyImport_Import(pName) )
    Py_DECREF(pName);

    PYVERIFY( fill_api = PyObject_GetAttrString(pModule, "fill_api") )
    // "k" = [unsigned long],
    // see https://docs.python.org/2/c-api/arg.html#c.Py_BuildValue
    PYVERIFY( py_results = PyObject_CallFunction(fill_api, "k", &api) )
    assert(py_results == Py_None);

    // Call Python function from C using cffi.
    printf("sum: %f\n", api.add_numbers(12.3, 45.6));

    // More complex example.
    char buffer[20];
    char * result = api.dump_buffer(buffer, sizeof buffer);
    assert(result != 0);
    printf("buffer: %s\n", result);

    // Let Python perform garbage collection on result now.
    rc = api.release_object(result);
    assert(rc == 0);

    // Close Python interpreter.
    Py_Finalize();
    return 0;
}
interface.py
import cffi
import sys
import traceback

ffi = cffi.FFI()
ffi.cdef(file('interface.h').read())

# Hold references to objects to prevent garbage collection.
noGCDict = {}

# Add two numbers.
# This function was copied from the PyPy example.
@ffi.callback("double (double, double)")
def add_numbers(x, y):
    return x + y

# Convert input buffer to repr(buffer).
@ffi.callback("char *(char*, int)")
def dump_buffer(buffer, buffer_len):
    try:
        # First attempt to access data in buffer.
        # Using the ffi/lib objects:
        # http://cffi.readthedocs.org/en/latest/using.html#using-the-ffi-lib-objects
        # One char at a time; looks inefficient.
        #data = ''.join([buffer[i] for i in xrange(buffer_len)])

        # Second attempt.
        # FFI Interface:
        # http://cffi.readthedocs.org/en/latest/using.html#ffi-interface
        # Works, but the doc says "str() gives inconsistent results".
        #data = str( ffi.buffer(buffer, buffer_len) )

        # Convert C buffer to Python str.
        # Doc says [:] is recommended instead of str().
        data = ffi.buffer(buffer, buffer_len)[:]

        # The goal is to return repr(data)
        # but it has to be converted to a C buffer.
        result = ffi.new('char []', repr(data))

        # Save a reference to data so it's not freed until released by the C program.
        noGCDict[ffi.addressof(result)] = result
        return result
    except:
        print >>sys.stderr, traceback.format_exc()
        return ffi.NULL

# Release object so that Python can reclaim the memory.
@ffi.callback("int (char*)")
def release_object(ptr):
    try:
        del noGCDict[ptr]
        return 0
    except:
        print >>sys.stderr, traceback.format_exc()
        return 1

def fill_api(ptr):
    global api
    api = ffi.cast("struct API*", ptr)
    api.add_numbers = add_numbers
    api.dump_buffer = dump_buffer
    api.release_object = release_object
Compile:
gcc -o test_cffi test_cffi.c -I/home/jmudd/pgsql-native/Python-2.7.10.install/include/python2.7 -L/home/jmudd/pgsql-native/Python-2.7.10.install/lib -lpython2.7
Execute:
$ test_cffi
sum: 57.900000
buffer: 'T\x9e\x04\x08\xa8\x93\xff\xbf]\x86\x04\x08\x00\x00\x00\x00\x00\x00\x00\x00'
$
A few tips for binding it with Python 3:
file() is not supported; use open():
ffi.cdef(open('interface.h').read())
PyString_FromString exists only in Python 2; the Python 3 equivalent, which creates a str from a UTF-8 encoded null-terminated character buffer, is PyUnicode_FromString. Change to:
PYVERIFY( pName = PyUnicode_FromString("interface") )
The program name must be passed as wide characters in Python 3:
wchar_t *name = Py_DecodeLocale(argv[0], NULL);
Py_SetProgramName(name);
For compiling:
gcc cc.c -o cc -I/usr/include/python3.6m -I/usr/include/x86_64-linux-gnu/python3.6m -lpython3.6m
I butchered the dump_buffer def... maybe it will give some ideas:
def get_prediction(buffer, buffer_len):
    try:
        data = ffi.buffer(buffer, buffer_len)[:]
        result = ffi.new('char []', data)
        print('\n I am doing something here here........', data)
        resultA = ffi.new('char []', b"Failed")  ### New message
        ##noGCDict[ffi.addressof(resultA)] = resultA
        return resultA
    except:
        print(traceback.format_exc(), file=sys.stderr)
        return ffi.NULL
Hopefully it will help and save you some time
Apparently Python would need to be able to compile to a Win32 DLL; that would solve the problem.
In the same way that compiling C# code to Win32 DLLs makes it usable by any development tool.
