I want to calculate something in C++ and return the result to Python. This is part of the C++ code:
Mat flow_map_x, flow_map_y;
std::vector<uchar> encoded_x, encoded_y;
flow_map_x = ...;
flow_map_y = ...;
Mat flow_img_x(flow_map_x.size(), CV_8UC1);
Mat flow_img_y(flow_map_y.size(), CV_8UC1);
encoded_x.resize(flow_img_x.total());
encoded_y.resize(flow_img_y.total());
memcpy(encoded_x.data(), flow_img_x.data, flow_img_x.total());
memcpy(encoded_y.data(), flow_img_y.data, flow_img_y.total());
bp::str tmp = bp::str((const char*) encoded_x.data());
The error when running the Python script is:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
After debugging, I found that the error comes from this line:
bp::str tmp = bp::str((const char*) encoded_x.data());
I'm not good at C++. Could anyone tell me how to fix the error? Thanks in advance!
You can't, because the data in encoded_x is not UTF-8 text. You probably want a bytes object holding a copy of the raw data:
Using PyObject* PyBytes_FromStringAndSize(const char *v, Py_ssize_t len). Or you can use PyByteArray_FromStringAndSize with the same arguments to get a bytearray instead.
bp::object tmp(bp::handle<>(PyBytes_FromStringAndSize(
    // Data to make `bytes` object from
    reinterpret_cast<const char*>(encoded_x.data()),
    // Amount of data to read
    static_cast<Py_ssize_t>(encoded_x.size())
)));
In this case, you can get rid of the vector and use flow_img_x.data and flow_img_x.total() directly.
Or a memoryview to not copy the data, but just access the std::vectors data
Using PyObject* PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags)
bp::object tmp(bp::handle<>(PyMemoryView_FromMemory(
    reinterpret_cast<char*>(encoded_x.data()),
    static_cast<Py_ssize_t>(encoded_x.size()),
    PyBUF_WRITE // Or `PyBUF_READ` if you want a read-only view
)));
(If the vector were const, you would use const_cast<char*>(reinterpret_cast<const char*>(encoded_x.data())) and only PyBUF_READ.)
In this case you have to make sure the vector stays alive for as long as the memoryview is in use, but no unnecessary copy is created.
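The practical difference between the two approaches can be sketched in pure Python, with a bytearray standing in for the vector's storage, bytes() for PyBytes_FromStringAndSize, and memoryview() for PyMemoryView_FromMemory:

```python
# A bytearray stands in for the std::vector's underlying storage.
data = bytearray(b'\x80\x01\x02')

copy = bytes(data)       # like PyBytes_FromStringAndSize: an independent copy
view = memoryview(data)  # like PyMemoryView_FromMemory: shares the storage

data[0] = 0x7F           # mutate the underlying buffer
assert view[0] == 0x7F   # the view sees the change
assert copy[0] == 0x80   # the copy is unaffected
```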
Related
I have a C function that takes as arguments a void * pointer and an integer length for the size of the buffer pointed to.
e.g.
char* myfunc(void *mybuffer, int buflen)
On the python side I have a bytes object of binary data read from a file.
What I am trying to figure out is the right conversions to be able to call the C function from Python, and I am struggling a bit.
I understand the conversions for dealing with simple string data (e.g. encoding to UTF-8 and using a char_p type), but dealing with a bytes object has been a bit of a struggle.
Thanks in advance!
Given your commented description, you can just use the obvious types if you don't need to free the returned char* memory: you can pass a bytes object to a void* parameter. Here's a quick demo:
test.c
#include <stdio.h>
#include <stdint.h>
#ifdef _WIN32
# define API __declspec(dllexport)
#else
# define API
#endif
API char* myfunc(void *mybuffer, int buflen) {
    const uint8_t* tmp = (const uint8_t*)mybuffer;
    for(int i = 0; i < buflen; ++i) // show the passed bytes
        printf("%02X\n", tmp[i]);
    return "output"; // static string, no deallocation required
}
test.py
import ctypes as ct
import os
dll = ct.CDLL('./test')
dll.myfunc.argtypes = ct.c_void_p, ct.c_int
dll.myfunc.restype = ct.c_char_p
buf = bytes([1,2,0,0xaa,0x55]) # including embedded null
ret = dll.myfunc(buf, len(buf))
print(ret)
Output:
01
02
00
AA
55
b'output'
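The "bytes object into a void* parameter" idea can also be tried without compiling anything, using ctypes.memmove as a stand-in for a C function that takes untyped pointer arguments:

```python
import ctypes as ct

src = bytes([1, 2, 0, 0xAA, 0x55])       # immutable Python bytes, embedded null included
dst = ct.create_string_buffer(len(src))  # writable C buffer on the Python side

# ctypes passes a pointer to the bytes object's internal buffer,
# just as it would for a void* parameter of your own function.
ct.memmove(dst, src, len(src))
assert dst.raw == src
```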
I have the following C function.
/* returns (uint8_t *)outbuf */
uint8_t *func(uint8_t *inbuf, uint32_t inbuf_len, uint32_t *outbuf_len);
This function returns outbuf. The output length is unknown before the call, so the function receives a pointer to the length as the argument outbuf_len; the caller is responsible for freeing outbuf.
I want to get the result of this function from python, so I started writing the following code:
import ctypes as ct
libb = ct.cdll.LoadLibrary('./a.so')
libb.func.restype = ct.c_char_p
inbuf = bytearray(inbuf_len)
inbuf = python_data
arr = ct.c_ubyte * inbuf_len
outbuf_len = ct.c_uint # there is no ct.c_uint_p...
outbuf = libb.func(arr.from_buffer_copy(inbuf), inbuf_len, outbuf_len)
print hexlify(outbuf) #prints only the first 4 bytes of outbuf
The problems I have are:
I didn't find a pointer-to-uint type in ctypes, so how can I pass the outbuf_len pointer to the C function?
When printing outbuf, only the first 4 bytes pointed to by the pointer are printed.
How do I free() the outbuf buffer from Python?
I have the source of the C function, so it is possible to change how arguments are passed to the C function.
Thanks.
If you'll be passing Python byte strings as the input buffer, here's a way to do it. I made a minimal example of the C call:
test.c
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
# define API __declspec(dllexport)
#else
# define API
#endif
// double every character in the input buffer as an example
API uint8_t *func(uint8_t *inbuf, uint32_t inbuf_len, uint32_t *outbuf_len) {
    *outbuf_len = inbuf_len * 2;
    uint8_t* outbuf = malloc(*outbuf_len);
    for(uint32_t i = 0; i < inbuf_len; ++i) {
        outbuf[i*2] = inbuf[i];
        outbuf[i*2+1] = inbuf[i];
    }
    return outbuf;
}
API void freebuf(uint8_t* buf) {
    free(buf);
}
test.py
import ctypes as ct
dll = ct.CDLL('./test')
# c_char_p accepts Python byte strings and is compatible with C uint8_t*
# but don't use it for the output buffer because ctypes converts the pointer
# back to a Python byte string and you would lose access to the pointer for
# later freeing it. Use POINTER(ct.c_char) to get the actual pointer back.
dll.func.argtypes = ct.c_char_p, ct.c_uint32, ct.POINTER(ct.c_uint32)
dll.func.restype = ct.POINTER(ct.c_char)
dll.freebuf.argtypes = ct.POINTER(ct.c_char),
dll.freebuf.restype = None
def func(inbuf):
    outlen = ct.c_uint32() # create storage for the output length and pass by reference
    outbuf = dll.func(inbuf, len(inbuf), ct.byref(outlen))
    # Slicing the returned POINTER(c_char) returns a Python byte string.
    # If you used POINTER(c_uint8) for the return value instead,
    # you'd get a list of integer byte values.
    data = outbuf[:outlen.value]
    # Can free the pointer now if you want, or return it for freeing later
    dll.freebuf(outbuf)
    return data
print(func(b'ABC'))
Output:
b'AABBCC'
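The slicing behaviour described in the comments above can be checked without any compiled library by casting a ctypes buffer to the two pointer types:

```python
import ctypes as ct

buf = ct.create_string_buffer(b'AABBCC')

pc = ct.cast(buf, ct.POINTER(ct.c_char))
assert pc[:6] == b'AABBCC'                 # POINTER(c_char) slices to a byte string

pu = ct.cast(buf, ct.POINTER(ct.c_uint8))
assert pu[:6] == [65, 65, 66, 66, 67, 67]  # POINTER(c_uint8) slices to a list of ints
```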
In my project I work with word vectors as numpy arrays with a dimension of 300. I want to store the processed arrays in a mongo database, base64 encoded, because this saves a lot of storage space.
Python code
import base64
import numpy as np
vector = np.zeros(300, dtype=np.float32) # represents some word-vector
vector = base64.b64encode(vector) # base64 encoding
# Saving vector to MongoDB...
In MongoDB it is saved as binary. In C++ I would like to load this binary data as a std::vector<float>. Therefore I have to decode the data first and then load it correctly. I was able to get the binary data into the C++ program with mongocxx as a uint8_t* with a size of 1600, but now I don't know what to do and would be happy if someone could help me. Thank you (:
C++ Code
const bsoncxx::document::element elem_vectors = doc["vectors"];
const bsoncxx::types::b_binary vectors = elem_vectors.get_binary();
const uint32_t b_size = vectors.size; // == 1600
const uint8_t* first = vectors.bytes;
// How To parse this as a std::vector<float> with a size of 300?
Solution
I added these lines to my C++ code and was able to load a vector with 300 elements and all correct values.
const std::string encoded(reinterpret_cast<const char*>(first), b_size);
std::string decoded = decodeBase64(encoded);
std::vector<float> vec(300);
for (size_t i = 0; i < decoded.size() / sizeof(float); ++i) {
    vec[i] = *(reinterpret_cast<const float*>(decoded.c_str() + i * sizeof(float)));
}
To mention: thanks to #Holt's info, it is not wise to base64-encode a NumPy array and then store it as binary. It is much better to call tobytes() on the NumPy array and store that in MongoDB, because it reduces the document size from 1.7 kB (base64) to 1.2 kB (raw bytes), and it saves computation time because the encoding (and decoding!) doesn't have to be computed at all!
Thanks to #Holt for pointing out my mistake.
First, you can't save storage space by using base64 encoding. On the contrary, it wastes storage: for an array of 300 floats, the raw storage is only 300 * 4 = 1200 bytes, while after you encode it, the storage will be 1600 bytes! See more about base64 here.
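The size overhead is easy to check from Python; base64 expands every 3 input bytes to 4 output characters:

```python
import base64
import numpy as np

vector = np.zeros(300, dtype=np.float32)
raw = vector.tobytes()           # 300 * 4 = 1200 bytes
encoded = base64.b64encode(raw)  # ceil(1200 / 3) * 4 = 1600 bytes

assert len(raw) == 1200
assert len(encoded) == 1600
```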
Second, you want to parse the bytes into a vector<float>. You need to decode the bytes if you keep the base64 encoding. I suggest you use some third-party library or try this question. Suppose you already have a decode function:
std::string base64_decode(std::string const& encoded_string); // or something like that.
You need to use reinterpret_cast to get the value.
const std::string encoded(reinterpret_cast<const char*>(first), b_size);
std::string decoded = base64_decode(encoded);
std::vector<float> vec(300);
for (size_t i = 0; i < decoded.size() / sizeof(float); ++i) {
    vec[i] = *(reinterpret_cast<const float*>(decoded.c_str()) + i);
}
I am trying to convert my Python code to a DLL using cffi so I can access it from my C# application, and I am trying to send an image from my C# code to a Python function. Below is my code to read the file and convert it to a byte array.
[DllImport(@"plugin-1.5.dll", CallingConvention = CallingConvention.Cdecl)]
static extern IntPtr test2(byte[] img, StringBuilder url);
void call(){
StringBuilder sb = new StringBuilder(20);
sb.Append("test");
Bitmap target = (Bitmap)Bitmap.FromFile(@"C:\Users\LaserTrac\Desktop\lala_192_1.jpg");
ImageFormat fmt = new ImageFormat(target.RawFormat.Guid);
var imageCodecInfo = ImageCodecInfo.GetImageEncoders().FirstOrDefault(codec => codec.FormatID == target.RawFormat.Guid);
//this is for situations where the image is not read from disk but is stored in memory (e.g. the image comes from a camera or snapshot)
if (imageCodecInfo == null)
{
fmt = ImageFormat.Jpeg;
}
byte[] image_byte_array = null;
long length = 0;
//Image img = Image.FromFile(@"");
using (MemoryStream ms = new MemoryStream())
{
target.Save(ms, fmt);
image_byte_array = ms.ToArray();
length = ms.Length;
}
test2(image_byte_array, sb);
}
And below is what I tried on the Python side.
plugin.h
extern char* test2(unsigned char*, char* url);
python file
@ffi.def_extern()
def test2(img, url):
    print(img)
    url = ffi.string(url)
    #img1 = Image.open(io.BytesIO(ffi.string(img)))
    #img1.save("lala1.jpg")
    decoded = cv2.imdecode(np.frombuffer(ffi.buffer(img)))
    #img_np = cv2.imdecode(ffi.buffer(img), cv2.IMREAD_COLOR)
    cv2.imwrite("filename.jpg", decoded)
    p = ffi.new("char[]", "test".encode('ascii'))
    return p
The error I got is: buffer size must be a multiple of element size.
How can I convert this img object to a Python image object?
In the code above, when I tried
img_np = cv2.imdecode(ffi.buffer(img), cv2.IMREAD_COLOR)
I got TypeError: Expected Ptr<cv::UMat> for argument 'buf',
and with the line decoded = cv2.imdecode(np.frombuffer(ffi.buffer(img)))
I got ValueError: buffer size must be a multiple of element size.
A workaround is to convert the image to a base64 string, since string and int variables are working, but I would also like this approach to work because I have done the same thing in a C++ DLL, and for some reason I am trying to convert my Python code to a DLL.
ffi.buffer(p) must return a buffer of some size. If p were of a known-length array type like char[100], it would return a buffer of 100 bytes. If it pointed to a struct, i.e. had type mystruct_t *, the buffer would be sizeof(mystruct_t) bytes. But here it is an unsigned char *, so the returned buffer contains only sizeof(unsigned char) bytes, i.e. a single byte.
Use the ffi.buffer(p, size) variant instead. To know the size, pass the length variable as an extra argument from C# to Python.
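The ValueError from np.frombuffer in the question has the same root cause: the default dtype is float64 (8 bytes per element), so the buffer length must be a multiple of 8. Reading the buffer as uint8 always fits. A sketch, with a plain bytes object standing in for ffi.buffer(img, length):

```python
import numpy as np

buf = bytes([1, 2, 3, 4, 5])  # stands in for ffi.buffer(img, length)

try:
    np.frombuffer(buf)        # default dtype float64: 5 is not a multiple of 8
except ValueError:
    pass                      # this is the "multiple of element size" error

arr = np.frombuffer(buf, dtype=np.uint8)  # one element per byte: always fits
assert arr.tolist() == [1, 2, 3, 4, 5]
```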
I am using ctypes (imported as c) in Python 3 to execute a C++ shared library. The library is loaded into python using:
smpLib = c.cdll.LoadLibrary(os.getcwd()+os.sep+'libsmpDyn.so')
One of the functions has the extern 'C' declaration const char* runSmpModel(...). The python function prototype is coded and run as:
proto_SMP = c.CFUNCTYPE(c.c_char_p,...)
runSmpModel = proto_SMP(('runSmpModel',smpLib))
res = runSmpModel(...)
This all works beautifully, but I'm unable to decode the res variable and obtain the string passed out by the C runSmpModel function. The value of res is displayed (I'm using ipython3) as b'\xd0'. The best solution I've found online, res.decode('utf-8'), gives me the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: unexpected end of data
The const char* return value from the runSmpModel function comes from
std::string scenID = SMPLib::SMPModel::runModel(...);
return scenID.c_str();
Inside runModel, it is ultimately defined as shown here, where scenName is an input string:
auto utcBuffId = newChars(500);
sprintf(utcBuffId, "%s_%u", scenName.c_str(), microSeconds); // catenate scenario name & time
uint64_t scenIdhash = (std::hash < std::string>() (utcBuffId)); // hash it
auto hshCode = newChars(100);
sprintf(hshCode, "%032llX", scenIdhash);
scenId = hshCode;
The value of this specific res should be 0000000000000000BBB00C6CA8B8872E. How can I decode this string?
After a lot of further testing, I've identified the problem as the length of the string passed from the C function. No problems if the string is up to 15 characters in length, but if it's 16 or longer, no dice. For a minimal working example, the C code is:
extern "C" {
const char* testMeSO()
{
string scenarioID = "abcdefghijklmnop";
return scenarioID.c_str();
}
}
and the Python code is (same definition of smpLib as shown above):
proto_TST = c.CFUNCTYPE(c.c_char_p)
testMeSO = proto_TST(('testMeSO',smpLib))
res = testMeSO()
print("Scenario ID: %s"%res.decode('utf-8'))
This gives the decode error unless a character is removed from the scenarioID variable in the C function. So it seems the question is: how can Python read a C char* longer than 15 characters using ctypes?
After several days of debugging and testing, I've finally gotten this working, using the second solution posted by #Petesh on this SO post. The apparent 15-character limit is not imposed by ctypes at all: testMeSO returns scenarioID.c_str(), a pointer into a local std::string that is destroyed when the function returns, so the pointer dangles. Strings of up to 15 characters only seemed to work because the small-string optimization stores them in an on-stack buffer that happened to still be readable.
Essentially, the solution is to pass the C function an extra char *buff buffer, created with ctypes.create_string_buffer(32*16), together with an unsigned int buffsize of value 32*16. Then, in the C function, execute scenarioID.copy(buff, buffsize). The Python prototype function is modified in the obvious way.
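The caller-provided-buffer pattern can be sketched in pure Python, with ctypes.memmove standing in for what the C side's scenarioID.copy(buff, buffsize) would do:

```python
import ctypes as ct

buffsize = 32 * 16
buff = ct.create_string_buffer(buffsize)  # zero-initialized buffer owned by Python

# The C function would do scenarioID.copy(buff, buffsize); memmove simulates it.
scen_id = b'0000000000000000BBB00C6CA8B8872E'
ct.memmove(buff, scen_id, len(scen_id))

# .value reads up to the first null byte, so decoding now works for any length.
assert buff.value.decode('utf-8') == '0000000000000000BBB00C6CA8B8872E'
```

Because Python owns the buffer, there is also no dangling pointer and nothing for the C side to allocate or free.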