Mimic Python's strip() function in C

I started on a little toy project in C lately and have been scratching my head over the best way to mimic the strip() functionality that is part of Python's string objects.
Reading up on fscanf and sscanf, I found that the string is processed only up to the first whitespace character encountered.
fgets doesn't help either, as I still have newlines sticking around.
I did try using strchr() to search for a whitespace character and explicitly setting the returned pointer to '\0', but that doesn't seem to work.

Python's str.strip() method removes both leading and trailing whitespace. The two halves of the problem are very different when working on a C "string" (a \0-terminated array of char).
For trailing whitespace: set a pointer (or, equivalently, an index) to the existing trailing \0. Keep decrementing it until it reaches the start of the string or any non-whitespace character, then write a \0 just after the point where this backward scan stopped.
For leading whitespace: set a pointer (or index) to the start of the string and keep incrementing it until it hits a non-whitespace character (possibly the trailing \0); then memmove the rest of the string so that the first non-whitespace character lands at the start of the string, with everything after it following.
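The two steps above can be sketched as a single in-place function. This is a minimal sketch, not a standard API; the name strip_inplace is hypothetical. It trims the end first so the leading scan can stop at the new terminator, and it casts to unsigned char because passing a negative char to isspace() is undefined behaviour.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip leading and trailing whitespace from s in place.
   Returns s for convenience. */
char *strip_inplace(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    /* trailing whitespace: walk back from the '\0' and terminate early */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';

    /* leading whitespace: find the first non-white character ... */
    while (start < len && isspace((unsigned char)s[start]))
        start++;

    /* ... and slide the remainder, including the '\0', to the front */
    memmove(s, s + start, len - start + 1);
    return s;
}
```

memmove (rather than memcpy) is required here because the source and destination ranges overlap.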

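There is no standard C implementation of a strip() or trim() function. That said, here's the one included in the Linux kernel, lightly adapted for user space (standard headers, unsigned char casts for isspace(), and a trailing loop that never forms a pointer before the start of the string):
#include <ctype.h>
#include <string.h>

char *strstrip(char *s)
{
    size_t size;
    char *end;

    size = strlen(s);
    if (!size)
        return s;

    /* walk back over trailing whitespace, then terminate the string */
    end = s + size;
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';

    /* skip leading whitespace; the result points into the original buffer */
    while (*s && isspace((unsigned char)*s))
        s++;

    return s;
}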

If you want to remove, in place, the final newline on a line, you can use this snippet:
size_t s = strlen(buf);
if (s && (buf[s-1] == '\n')) buf[--s] = 0;
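If the input may come from a file with Windows "\r\n" line endings, the same idea can be extended to drop a trailing '\r' as well. A minimal sketch; chomp is a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: remove a trailing '\n', then a trailing '\r' if one
   is left, so both "\n" and "\r\n" endings are handled. Returns the new length. */
size_t chomp(char *buf)
{
    size_t s = strlen(buf);
    if (s && buf[s - 1] == '\n')
        buf[--s] = '\0';
    if (s && buf[s - 1] == '\r')
        buf[--s] = '\0';
    return s;
}
```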
To faithfully mimic Python's str.strip([chars]) method (as I understand its workings), you need to allocate space for a new string, fill it, and return it. Afterwards, when you no longer need the stripped string, you need to free the memory it uses to avoid memory leaks.
Or you can use C pointers, modify the initial string, and achieve a similar result.
Suppose your initial string is "____forty two____\n" and you want to strip all the underscores and the '\n':
____forty two____\n
^ ptr
If you move ptr to the 'f' and replace the first '_' after "two" with a '\0', the result is the same as Python's "____forty two____\n".strip("_\n"):
____forty two\0___\n
    ^ ptr
Again, this is not quite the same as Python: the string is modified in place, there's no second string, and you cannot revert the changes (the original string is lost).
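The allocating variant described above might look like the following minimal sketch, which uses strspn() for the leading run and a strchr() lookup for the trailing run. strip_chars is a hypothetical name, and unlike Python the chars argument is required here (passing " \t\n\v\f\r" roughly mimics the no-argument form for ASCII input).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, loosely modelled on Python's s.strip(chars):
   returns a newly malloc'd copy of s with every leading and trailing
   character that appears in `chars` removed. The caller must free() the
   result. Returns NULL if malloc fails. */
char *strip_chars(const char *s, const char *chars)
{
    const char *start = s + strspn(s, chars);  /* skip the leading run */
    const char *end = start + strlen(start);   /* one past the last char */

    /* end[-1] is always a real character here, never the '\0' */
    while (end > start && strchr(chars, end[-1]) != NULL)
        end--;                                  /* skip the trailing run */

    size_t n = (size_t)(end - start);
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, n);
    out[n] = '\0';
    return out;
}
```

The original string is left untouched, which is closer to Python's behaviour than the in-place pointer trick, at the cost of an allocation the caller has to release.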

I wrote C code to implement this function. I also wrote a few trivial tests to make sure my function does sensible things.
This function writes to a buffer you provide, and should never write past the end of the buffer, so it should not be prone to buffer overflow security issues.
Note: only Test() uses stdio.h, so if you just need the function, you only need to include ctype.h (for isspace()) and string.h (for strlen()).
// strstrip.c -- implement white space stripping for a string in C
//
// This code is released into the public domain.
//
// You may use it for any purpose whatsoever, and you don't need to advertise
// where you got it, but you aren't allowed to sue me for giving you free
// code; all the risk of using this is yours.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

// strstrip() -- strip leading and trailing white space from a string
//
// Copies from sIn to sOut, writing at most lenOut characters
// (including the terminating '\0').
//
// Returns number of characters in returned string, or -1 on an error.
// If you get -1 back, then nothing was written to sOut at all.
int
strstrip(char *sOut, unsigned int lenOut, char const *sIn)
{
    char const *pStart, *pEnd;
    unsigned int len;
    char *pOut;

    // if there is no room for any output, or a null pointer, return error!
    if (0 == lenOut || !sIn || !sOut)
        return -1;

    pStart = sIn;
    pEnd = sIn + strlen(sIn);   // one past the last character

    // skip any leading whitespace
    while (*pStart && isspace((unsigned char)*pStart))
        ++pStart;

    // skip any trailing whitespace (never moves back past pStart)
    while (pEnd > pStart && isspace((unsigned char)pEnd[-1]))
        --pEnd;

    pOut = sOut;
    len = 0;

    // copy into output buffer, leaving room for the terminator
    while (pStart < pEnd && len < lenOut - 1)
    {
        *pOut++ = *pStart++;
        ++len;
    }

    // ensure output buffer is properly terminated
    *pOut = '\0';
    return (int)len;
}

void
Test(const char *s)
{
    int len;
    char buf[1024];

    len = strstrip(buf, sizeof(buf), s);
    if (!s)
        s = "**null**";  // don't ask printf to print a null string
    if (-1 == len)
        *buf = '\0';     // don't ask printf to print garbage from buf
    printf("Input: \"%s\" Result: \"%s\" (%d chars)\n", s, buf, len);
}

int
main(void)
{
    Test(NULL);
    Test("");
    Test(" ");
    Test(" ");
    Test("x");
    Test(" x");
    Test(" x ");
    Test(" x y z ");
    Test("x y z");
    return 0;
}

This potential 'solution' is by no means as complete or thorough as the others presented here. It's for my own toy project in C, a text-based adventure game that I'm working on with my 14-year-old son. If you're using fgets(), then strcspn() may just work for you as well. The sample code below is the beginning of an interactive console-based loop.
#include <stdio.h>
#include <string.h> // for strcspn()

int main(void)
{
    char input[64];

    puts("Press <q> to exit..");
    do {
        printf("> ");
        fflush(stdout);                        // show the prompt before reading
        if (fgets(input, sizeof input, stdin) == NULL)
            break;                             // EOF or read error
        input[strcspn(input, "\n")] = 0;       // fgets() keeps the '\n'; replace it with 0
        if (input[0] == '\0') continue;
        printf("You entered '%s'\n", input);
    } while (strcmp(input, "q") != 0);         // strcmp() returns 0 when input is "q"
    puts("Goodbye!");
    return 0;
}

Related

C++ byte array (struct) interpreted by Python

I am trying to pass a C++ struct from my Arduino to my Raspberry Pi. I have a struct that looks like this:
struct node_status
{
char *node_type = "incubator";
char *sub_type; // set the type of incubator
int sub_type_id;
bool sleep = false; // set to sleep
int check_in_time = 1000; // set check in time
bool LOCK = false; // set if admin control is true/false
} nodeStatus;
I tried using the Python module named struct:
from struct import *
print("Rcvd Node Status msg from 0{:o}".format(header.from_node))
print("node_type: {}".format(unpack("10s",payload[0]))) #node_type
node_type = unpack("10s",payload[0])
print("sub_type: {}".format(unpack("10s",payload[1]), header.from_node)) #sub_type
sub_type = unpack("10s",payload[1])
print("sub_type_id: {}".format(unpack("b",payload[2])))
sub_type_id = unpack("b",payload[2])
print("sleep: {}".format(unpack("?",payload)[3])) #sleep
sleep = unpack("?",payload[3])
print("check_in_time: {}".format(unpack("l",payload[4]))) #check_in_time
check_in_time = unpack("l",payload[4])
print("Lock: {}".format(unpack("?",payload[5]))) #LOCK
Lock = unpack("?",payload[5])
but I am not having much luck. I also looked at just using the ctypes module, but I don't seem to be getting anywhere:
from ctypes import *
class interpret_nodes_status(Structure):
_fields_ = [('node_type',c_char_p),
('sub_type',c_char_p),
('sub_type_id',c_int),
('sleep',c_bool),
('check_in_time',c_int),
('LOCK',c_bool)]
nodestatus = translate_nodes_status(payload)
but that just gives me an error
TypeError: bytes or integer address expected instead of bytearray instance
What can I do? Where am I going wrong with this?
EDIT:
I am using the RF24Mesh Library from
https://github.com/nRF24/RF24Mesh
This is how I send the message:
RF24NetworkHeader header();
if (!mesh.write(&nodeStatus, /*type*/ 126, sizeof(nodeStatus), /*to node*/ 000))
{ // Send the data
if ( !mesh.checkConnection() )
{
Serial.println("Renewing Address");
mesh.renewAddress();
}
}
else
{
Serial.println("node status msg Sent");
return;
}
}

Your C++ program is just sending the struct, but the struct doesn't contain any of the string data. It only contains pointers (addresses), which are not usable by any other process (different address spaces).
You would need to determine a way to send all the required data, which would likely mean sending the length of each string and its data.
One way to do that would be to use a maximum length and just store the strings in your struct:
struct node_status
{
char node_type[48];
char sub_type[48]; // set the type of incubator
int sub_type_id;
bool sleep = false; // set to sleep
int check_in_time = 1000; // set check in time
bool LOCK = false; // set if admin control is true/false
} nodeStatus;
You would then need to copy strings into those buffers instead of assigning them, and check for buffer overflow. If the strings are ever entered by users, this has security implications.
Another approach is to pack the data into a single block just when you send it.
You could use multiple writes as well, but I don't know this mesh library or how you would set the type parameter to do that. Using a buffer looks something like this:
// be sure to check for null on your strings, too.
int lennodetype = strlen(nodeStatus.node_type);
int lensubtype = strlen(nodeStatus.sub_type);
// room for the two length prefixes, both strings, and the struct itself
int bufsize = 2 * sizeof(int) + lennodetype + lensubtype + sizeof(nodeStatus);
byte* buffer = new byte[bufsize];
int offset = 0;
memcpy(buffer+offset, &lennodetype, sizeof(int));
offset += sizeof(int);
memcpy(buffer+offset, nodeStatus.node_type, lennodetype * sizeof(char));
offset += lennodetype * sizeof(char);
memcpy(buffer+offset, &lensubtype, sizeof(int));
offset += sizeof(int);
memcpy(buffer+offset, nodeStatus.sub_type, lensubtype * sizeof(char));
offset += lensubtype * sizeof(char);
// this still copies the pointers, which aren't needed, but simplifies the code,
// and a few unused bytes shouldn't matter too much. You could adjust this line to
// eliminate them if you wanted.
memcpy(buffer+offset, &nodeStatus, sizeof(nodeStatus));
if (!mesh.write(buffer,
                /*type*/ 126,
                bufsize,
                /*to node*/ 000))
{ // Send the data
    if ( !mesh.checkConnection() )
    {
        Serial.println("Renewing Address");
        mesh.renewAddress();
    }
}
else
{
    Serial.println("node status msg Sent");
}
delete [] buffer;
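The buffer code above copies each length with the sender's native int size and byte order; on 8-bit AVR Arduinos int is only 2 bytes, so the Python side has to know exactly which board sent the data. One way to pin the wire format down is to write the length prefix byte by byte. This is a minimal sketch in C, not part of RF24Mesh: put_string and get_string are hypothetical names, and the 4-byte little-endian prefix is an arbitrary choice that Python could read with struct.unpack_from("<I", payload, offset).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write s into buf as a 4-byte little-endian length
   followed by the raw bytes (no '\0'). Returns bytes written, or 0 if it
   does not fit in cap bytes. */
size_t put_string(uint8_t *buf, size_t cap, const char *s)
{
    size_t len = strlen(s);
    if (cap < 4 || len > cap - 4)          /* prefix + data must fit */
        return 0;
    for (int i = 0; i < 4; i++)            /* least significant byte first */
        buf[i] = (uint8_t)((uint32_t)len >> (8 * i));
    memcpy(buf + 4, s, len);
    return 4 + len;
}

/* Hypothetical inverse: read a length-prefixed string into out (capacity
   outcap, always '\0'-terminated). Returns bytes consumed, or 0 on error. */
size_t get_string(const uint8_t *buf, size_t avail, char *out, size_t outcap)
{
    uint32_t len = 0;
    if (avail < 4)
        return 0;
    for (int i = 0; i < 4; i++)
        len |= (uint32_t)buf[i] << (8 * i);
    if (len > avail - 4 || len >= outcap)  /* truncated input or no room for '\0' */
        return 0;
    memcpy(out, buf + 4, len);
    out[len] = '\0';
    return 4 + (size_t)len;
}
```

Because the shifts define the byte order explicitly, the same bytes come out whether the sender is little- or big-endian and whatever size its int is.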
Now that the data is actually SENT (a prerequisite for reading it), the data you need should all be in the payload array. You will need to unpack it, but you can't just pass unpack a single byte; it needs the whole buffer (this assumes the sender's int is 4 bytes, little-endian):
import struct

(len_node_type,) = struct.unpack_from("<i", payload, 0)
offset = 4
(node_type,) = struct.unpack_from("{}s".format(len_node_type), payload, offset)
offset += len_node_type
(len_sub_type,) = struct.unpack_from("<i", payload, offset)
offset += 4
(sub_type,) = struct.unpack_from("{}s".format(len_sub_type), payload, offset)
offset += len_sub_type
...

I upvoted Garr Godfrey's answer, as it is a good one indeed. However, it will increase the struct's size. This is neither a good nor a bad thing, but if for some reason you would like to keep the solution based on char* pointers instead of arrays (e.g. you don't know the maximum length of the strings), it can be achieved the following way. My code assumes that int is 4 bytes and little-endian, and that bool and char are 1 byte each:
#include <assert.h>   // provides the static_assert macro in C11
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

//_Static_assert(sizeof(int)==4u, "Int size has to be 4 bytes");
//the above one is the C11 keyword; the one below is the C++ form
//(also available in C11 through <assert.h>). Feel free to ifdef that if you need it
static_assert(sizeof(int)==4u, "Int size has to be 4 bytes");

struct node_status
{
    char* node_type;
    char* sub_type;     // set the type of incubator
    int sub_type_id;
    bool sleep;         // set to sleep
    int check_in_time;  // set check in time
    bool LOCK;          // set if admin control is true/false
};

size_t serialize_node_status(const struct node_status* st, char* buffer)
{
    //this relies on the assumption that the buffer is large enough
    //and that the string pointers are not null
    size_t offset = 0u;
    size_t l = 0;

    l = strlen(st->node_type) + 1;   // include the '\0'
    memcpy(buffer+offset, st->node_type, l);
    offset += l;
    l = strlen(st->sub_type) + 1;
    memcpy(buffer+offset, st->sub_type, l);
    offset += l;
    l = sizeof(st->sub_type_id);
    memcpy(buffer+offset, &st->sub_type_id, l);
    offset += l;
    l = sizeof(st->sleep);
    memcpy(buffer+offset, &st->sleep, l);
    offset += l;
    l = sizeof(st->check_in_time);
    memcpy(buffer+offset, &st->check_in_time, l);
    offset += l;
    l = sizeof(st->LOCK);
    memcpy(buffer+offset, &st->LOCK, l);
    offset += l;
    return offset;
}

// sending (inside a function):
char buf[100] = {0}; //pick the needed size or allocate it dynamically
struct node_status nodeStatus = {"abcz", "x", 20, true, 999, false};
size_t serialized_bytes = serialize_node_status(&nodeStatus, buf);
mesh.write(buf, /*type*/ 126, serialized_bytes, /*to node*/ 000);
Side note: assigning string literals directly to (non-const) char pointers is not valid C++.
So the string members should either be const char*, e.g. const char* node_type, or the file should be compiled as C (where you can get away with it). Arduino often sets its own compilation options, so it is likely to work anyway thanks to a compiler extension (or just a suppressed warning). Since I'm not sure exactly what is going to be used, I wrote a C11-compatible version.
And then on Python's end:
INT_SIZE = 4

class node_status:
    def __init__(self,
                 nt: str,
                 st: str,
                 stid: int,
                 sl: bool,
                 cit: int,
                 lck: bool):
        self.node_type = nt
        self.sub_type = st
        self.sub_type_id = stid
        self.sleep = sl
        self.check_in_time = cit
        self.LOCK = lck

    def __str__(self):
        s = f'node_type={self.node_type} sub_type={self.sub_type}'
        s += f' sub_type_id={self.sub_type_id} sleep={self.sleep}'
        s += f' check_in_time={self.check_in_time} LOCK={self.LOCK}'
        return s

    @classmethod
    def from_bytes(cls, b: bytes):
        # each string ends at its '\0'; keep the text, drop the terminator
        offset = b.index(0x00) + 1
        nt = str(b[:offset - 1], 'utf-8')
        b = b[offset:]
        offset = b.index(0x00) + 1
        st = str(b[:offset - 1], 'utf-8')
        b = b[offset:]
        stid = int.from_bytes(b[:INT_SIZE], 'little')
        b = b[INT_SIZE:]
        sl = bool(b[0])
        b = b[1:]
        cit = int.from_bytes(b[:INT_SIZE], 'little')
        b = b[INT_SIZE:]
        lck = bool(b[0])
        b = b[1:]
        assert len(b) == 0
        return cls(nt, st, stid, sl, cit, lck)

# and the deserialization goes like this:
fromMesh1 = bytes([0x61,0x62,0x63,0x0,0x78,0x79,0x7A,0x0,0x14,0x0,0x0,0x0,0x1,0xE7,0x3,0x0,0x0,0x1])
fromMesh2 = bytes([0x61,0x62,0x63,0x0,0x78,0x79,0x7A,0x0,0x14,0x0,0x0,0x0,0x1,0xE7,0x3,0x0,0x0,0x0])
fromMesh3 = bytes([0x61,0x62,0x63,0x7A,0x0,0x78,0x0,0x14,0x0,0x0,0x0,0x1,0xE7,0x3,0x0,0x0,0x0])
print(node_status.from_bytes(fromMesh1))
print(node_status.from_bytes(fromMesh2))
print(node_status.from_bytes(fromMesh3))

These are all good answers, but not what was required. I suppose a more in-depth knowledge of the RF24Mesh library was needed. I was able to find the answer with the help of some RF24 pros. Here is my solution:
I had to change the struct to use fixed sizes (char name[10]) on the C++ Arduino side.
struct node_status
{
char node_type[10] = "incubator";
char sub_type[10] = "chicken"; // set the type of incubator
int sub_type_id = 1;
bool sleep = false; // set to sleep
int check_in_time = 1000; // set check in time
bool LOCK = false; // set if admin control is true/false
} nodeStatus;
Unfortunately, it looks like read() returns a payload whose length is whatever you passed to read(), not the length actually received. This is unintuitive and should be improved, not to mention that the parameter specifying the length of the payload to return should be optional.
Until that gets fixed, I have to slice the payload to only the length that struct.unpack() needs (which can be determined from the format string, e.g. with struct.calcsize()). So, basically:
import struct

# get the max sized payload despite what was actually received
head, payload = network.read(144)
# unpack 30 bytes
(
    node_type,
    sub_type,
    sub_type_id,
    sleep,
    check_in_time,
    LOCK,
) = struct.unpack("<10s10si?i?", payload[:30])
I finally got it to work using this method. I want to be fair in awarding the points, so I'd like your opinion on whose answer came closest to this method. Please comment below.

Converting Python program to C: How can I multiply a character by a specified value and store it into a variable?

I'm in need of general help converting a small buffer overflow script from Python to C. It's a bit of a hack job, and I am struggling to get the data types right. I can compile everything with only a single warning: "initialization makes pointer from integer without a cast - char *buff = ("%0*i", 252, 'A');"
This line is supposed to give the variable buff the value of 252 'A' characters.
I know that changing the data type can fix this, but the rest of the program relies on overflow being a pointer char *.
If anyone has any tips for me regarding any parts of the program they would be greatly appreciated.
cheers, Shiv
ORIGINAL Python:
stack_addr = 0xbffff1d0
rootcode = "\x31"
def conv(num):
    return struct.pack("<I",num)
buff = "A" * 172
buff += conv(stack_addr)
buff += "\x90" * 30
buff += rootcode
buff += "A" * 22
print "targetting vulnerable program"
call(["./vuln", buff])
Converted C code:
//endianess convertion
int conv(int stack_addr)
{
(stack_addr>>8) | (stack_addr<<8);
return(0);
}
int main(int argc, char *argv[])
{
int stack_addr = 0xbffff1d0;
int rootcode = *"\x31"
char *buff = ("%0*i", 252, 'A'); //give buff the value of 252 'A's
buff += conv(stack_addr); //endian conversion
buff += ("%0*i", 30, '\x90'); //append buff variable with 30 '\x90'
buff = buff + rootcode; //append buff with value of rootcode variable
buff += ("%0*i", 22, 'A'); //append buff with 22 'A's
}

The easiest way is to write a string with the needed number of characters manually, using the copy-paste feature of your favourite text editor.
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
You can also build it from individual characters using a for loop, as described below. However, you can skip building a separate long string and append individual characters directly to the final string. This can be done in two ways: with strcat and without it. The first way is a little cleaner:
char buff[400] = ""; // note: this should be an array, not a pointer!
// the array should be big enough to hold the final string; 400 seems enough
for (int i = 0; i < 252; i++)
strcat(buff, "A"); // this part appends one string of length 1
The strcat function is inefficient: it recalculates the length of the string each time you append the string "A" to it. You don't need speed here, but if you ever decide to write it efficiently, don't use strcat; instead, append individual chars (bytes) to the array using plain C:
char buff[400]; // note: this should be an array, not a pointer!
int pos = 0; // position at which to write data
for (int i = 0; i < 252; i++)
buff[pos++] = 'A'; // this part appends one char 'A'; note single quotes
...
buff[pos++] = '\0'; // don't forget to terminate the string!
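If you want to avoid the per-character loop altogether, memset() from <string.h> can write a run of identical bytes in one call. A minimal sketch; append_repeated is a hypothetical helper name, and buf must already hold a '\0'-terminated string (an empty one is fine):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append `count` copies of `c` to the string in buf
   (total capacity cap), keeping it '\0'-terminated.
   Returns 0 on success, -1 if the result would not fit. */
int append_repeated(char *buf, size_t cap, char c, size_t count)
{
    size_t pos = strlen(buf);
    if (count >= cap - pos)          /* need room for count chars plus '\0' */
        return -1;
    memset(buf + pos, c, count);     /* one call instead of a loop */
    buf[pos + count] = '\0';
    return 0;
}
```

For example, starting from char buff[400] = "", calling append_repeated(buff, sizeof buff, 'A', 252) leaves 252 'A's in buff, and the size check refuses any append that would overflow the array.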

How to receive a buffer of ints and strings from a client, and store them right? (C++ server, Python client)

I have a simple C++ server which receives a char * buffer from a Python client and unpacks it in order to use the data.
The Python client sends a buffer which includes two "different" data types: string and int.
The buffer should look like this:
which means that if the client wants to send the message code 200 and the data "ok", it would have to send the buffer [2002ok].
But I have decided that the client will send the buffer as chars,
so the buffer would look like this: [Èok]
(È = 200's ASCII value, = 2's ASCII value)
(edit: I don't know why, but the ASCII value of 2 cannot be shown here.)
The problem is that when I unpack the 3 parts of the buffer, they are somehow distorted.
Here is my client side (Python):
msg = chr(200) + chr(0) + chr(0) + chr(0) + chr(2) + "ok"
print(">>>>" + (msg))
sock.send((msg.encode()))
And here is my server side (C++):
uint8_t msgCode = helpMe.getCode(client_socket);
std::cout << "The message code is " << static_cast<unsigned int>(msgCode) << std::endl;
int DataLen = helpMe.getLength(client_socket);
std::string StrData = helpMe.getString(client_socket, DataLen);
Here are the "Helper" functions I used (unpacking the data):
using std::string;
uint8_t Helper::getCode(SOCKET sc)
{
uint8_t code;
getPartFromSocket(sc, reinterpret_cast<char*>(&code), sizeof(code), 0);
return code;
}
uint32_t Helper::getLength(SOCKET sc)
{
uint32_t length;
getPartFromSocket(sc, reinterpret_cast<char*>(&length), sizeof(length), 0);
return length;
}
std::string Helper::getString(SOCKET sc, size_t length)
{
std::string s(length + 1, 0);
getPartFromSocket(sc, (char*)s.data(), length, 0);
// possible since C++17 ^
return s;
}
void Helper::getPartFromSocket(SOCKET sc, char * buffer, size_t bytesNum, int flags)
{
if (bytesNum == 0)
{
return;
}
int res = recv(sc, buffer, bytesNum, flags);
if (res == INVALID_SOCKET)
{
std::string s = "Error while recieving from socket: ";
s += std::to_string(sc);
throw std::exception(s.c_str());
}
}
The client seems to work fine; its output is:
È ok
but the server's output, which is supposed to be:
The message code is 200
is actually
The message code is ├
Where is my mistake?
Thanks, M.

You should change the way you receive data:
void Helper::getPartFromSocket(SOCKET sc, char* buffer, size_t bytesNum, int flags);
instead of internally creating an array. Then you can do:
uint8_t Helper::getCode(SOCKET sc)
{
uint8_t code;
getPartFromSocket(sc, reinterpret_cast<char*>(&code), sizeof(code), 0);
return code;
}
uint32_t Helper::getLength(SOCKET sc)
{
uint32_t length;
getPartFromSocket(sc, reinterpret_cast<char*>(&length), sizeof(length), 0);
return length;
}
std::string Helper::getString(SOCKET sc, size_t length)
{
std::string s(length, 0);
getPartFromSocket(sc, s.data(), length, 0);
// possible since C++17 ^
return s;
}
i.e. you write the data directly to where it should be placed. At the same time, you solve your memory leak issue.
The problem with endianness remains. You obviously write big-endian on the Python side, but as shown above, you'll (most likely; it's machine-dependent, but big-endian machines have become very rare these days) read little-endian. To be independent of the machine's byte order on the C++ side too, you could modify the code as follows:
uint32_t length = 0;
for (unsigned int i = 0; i < sizeof(length); ++i)
{
    uint8_t byte;
    getPartFromSocket(sc, reinterpret_cast<char*>(&byte), sizeof(byte), 0);
    // little endian transmitted:
    // length |= static_cast<uint32_t>(byte) << 8*i;
    // big endian transmitted:
    length |= static_cast<uint32_t>(byte) << 8*(sizeof(length) - (i + 1));
    // simpler: just adjust loop variable; = 1, <= sizeof ^
}
return length;
Edit: some remarks from the comments, since these have been moved away:
Actually, there's already a function doing this: ntohl (thanks, WhozCraig, for the hint), so it gets much simpler:
uint32_t length;
getPartFromSocket(sc, reinterpret_cast<char*>(&length), sizeof(length), 0);
return ntohl(length);
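Both versions read straight from the socket. If the whole message has already been received into a byte buffer, the same big-endian decoding can be done with shifts, independent of the host's byte order. A minimal sketch in C (it also compiles as C++); parse_message is a hypothetical name, and the layout (1-byte code, 4-byte big-endian length, then the data) is the one the Python client in the question builds:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode [1-byte code][4-byte big-endian length][data].
   Returns 0 on success, -1 if msg is too short. *data points into msg and
   is not '\0'-terminated. */
int parse_message(const unsigned char *msg, size_t msglen,
                  uint8_t *code, uint32_t *length, const unsigned char **data)
{
    if (msglen < 5)
        return -1;
    *code = msg[0];
    *length = ((uint32_t)msg[1] << 24) | ((uint32_t)msg[2] << 16)
            | ((uint32_t)msg[3] << 8)  |  (uint32_t)msg[4];
    if (*length > msglen - 5)        /* not all of the data has arrived */
        return -1;
    *data = msg + 5;
    return 0;
}
```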
Another problem spotted during the discussion, this time on the Python side:
sock.send((msg.encode()))
encode() by default produces a UTF-8-encoded string, which is certainly not what we want in this case (200 will be converted to two bytes). Instead, we need to use the local machine's encoding (on a Windows host, quite likely cp1252 for Western Europe or cp1250 for Central and Eastern Europe).

ctypes return a string from c function

I'm a Python veteran, but haven't dabbled much in C. After half a day of not finding anything on the internet that works for me, I thought I would ask here and get the help I need.
What I want to do is write a simple C function that accepts a string and returns a different string. I plan to bind this function in several languages (Java, Obj-C, Python, etc.) so I think it has to be pure C?
Here's what I have so far. Notice I get a segfault when trying to retrieve the value in Python.
hello.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
const char* hello(char* name) {
static char greeting[100] = "Hello, ";
strcat(greeting, name);
strcat(greeting, "!\n");
printf("%s\n", greeting);
return greeting;
}
main.py
import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = "Frank"
c_name = ctypes.c_char_p(name)
foo = hello.hello(c_name)
print c_name.value # this comes back fine
print ctypes.c_char_p(foo).value # segfault
I've read that the segfault is caused by C releasing the memory that was initially allocated for the returned string. Maybe I'm just barking up the wrong tree?
What's the proper way to accomplish what I want?

Your problem is that greeting was allocated on the stack, but the stack is destroyed when the function returns. You could allocate the memory dynamically:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

const char* hello(char* name) {
    char* greeting = malloc(100);
    if (greeting == NULL)
        return NULL;
    snprintf(greeting, 100, "Hello, %s!\n", name);
    printf("%s\n", greeting);
    return greeting;
}
But that's only part of the battle because now you have a memory leak. You could plug that with another ctypes call to free().
...or a much better approach is to read up on the official C API for Python (Python 2.x at http://docs.python.org/2/c-api/ and Python 3.x at http://docs.python.org/3/c-api/). Have your C function create a Python string object and hand that back; it will be garbage collected by Python automatically. Since you are writing the C side, you don't have to play the ctypes game.
...edit..
I didn't compile and test, but I think this .py would work:
import ctypes
import ctypes.util

# define the interface
libhello = ctypes.cdll.LoadLibrary('./hello.so')
# find the C library (works on Linux/macOS; on Windows find_library('c') may return None)
libc = ctypes.CDLL(ctypes.util.find_library('c'))
# declare the functions we use; c_void_p keeps the raw pointer so it can be freed
libhello.hello.argtypes = (ctypes.c_char_p,)
libhello.hello.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None

# wrap hello to make sure the free is done
def hello(name):
    ptr = libhello.hello(name.encode())
    try:
        return ctypes.string_at(ptr).decode()
    finally:
        libc.free(ptr)

# do the deed
print(hello("Frank"))
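Calling libc's free() from ctypes assumes that hello.so and the Python process share the same malloc/free. That usually holds on Linux, but it is not guaranteed on Windows, where a DLL may be linked against a different C runtime. A common alternative, sketched below as an assumption rather than the only way, is to export a matching free function from the same library; hello_free is a hypothetical name. On the Python side you would set restype = ctypes.c_void_p, read the text with ctypes.string_at(), and pass the pointer back to hello_free().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc'd greeting, or NULL on allocation failure.
   The caller must release it with hello_free(). */
char *hello(const char *name)
{
    size_t n = strlen("Hello, ") + strlen(name) + strlen("!\n") + 1;
    char *greeting = malloc(n);
    if (greeting != NULL)
        snprintf(greeting, n, "Hello, %s!\n", name);
    return greeting;
}

/* Hypothetical export: frees a string returned by hello() with the free()
   that matches the malloc() used inside this library. */
void hello_free(char *p)
{
    free(p);
}
```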

In hello.c you return a local array. You have to return a pointer to an array, which has to be dynamically allocated using malloc.
char* hello(char* name)
{
char hello[] = "Hello ";
char excla[] = "!\n";
char *greeting = malloc ( sizeof(char) * ( strlen(name) + strlen(hello) + strlen(excla) + 1 ) );
if( greeting == NULL) exit(1);
strcpy( greeting , hello);
strcat(greeting, name);
strcat(greeting, excla);
return greeting;
}

I ran into the same problem today and found that you must override the default return type (int) by setting restype on the function. See "Return types" in the ctypes documentation.
import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = "Frank"
c_name = ctypes.c_char_p(name)
hello.hello.restype = ctypes.c_char_p # override the default return type (int)
foo = hello.hello(c_name)
print c_name.value
print ctypes.c_char_p(foo).value

I also ran into the same problem but used a different approach. I was supposed to find the string in a list of strings that matches a certain value.
Basically, I initialized a char array with the size of the longest string in my list (plus one for the terminating '\0'), then passed it as an argument to my function to hold the corresponding value.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void find_gline(char **ganal_lines, /* line array */
                size_t size,        /* array size */
                char *idnb,         /* id number to look for */
                char *resline) {    /* output buffer: longest line + 1 bytes */
    /* Iterates over the lines, finds the first one that contains idnb,
       and copies it (including its '\0') into resline */
    for (size_t i = 0; i < size; i++) {
        char *line = ganal_lines[i];
        if (strstr(line, idnb) != NULL) {
            size_t llen = strlen(line);
            for (size_t k = 0; k <= llen; k++) {  /* <= also copies the '\0' */
                resline[k] = line[k];
            }
            return;
        }
    }
}
This function was wrapped by the corresponding Python function:
import ctypes

def find_gline_wrap(lines: list, arg: str, cdll):
    """Return the first line in `lines` that contains `arg` ("" if none does)."""
    # the C side works on bytes, not str
    blines = [line.encode("utf-8") for line in lines]
    mlen = max((len(b) for b in blines), default=0)  # length of the longest line
    linelen = len(blines)
    line_array = ctypes.c_char_p * linelen
    # set arg types
    cdll.find_gline.argtypes = [
        line_array,
        ctypes.c_size_t,
        ctypes.c_char_p,
        ctypes.c_char_p,
    ]
    cdll.find_gline.restype = None
    ganal_lines = line_array(*blines)
    idnb = arg.encode("utf-8")
    # writable, zero-filled buffer: longest line plus its '\0'
    resline = ctypes.create_string_buffer(mlen + 1)
    cdll.find_gline(ganal_lines, linelen, idnb, resline)
    # .value stops at the first '\0', so no manual trimming is needed
    return resline.value.decode("utf-8")

Here's what happens, and why it's breaking. When hello() is called, the C stack pointer is moved up, making room for any memory needed by your function. Along with some function call overhead, all of your function's locals are managed there. So that static char greeting[100] means that 100 bytes of the increased stack are for that string. You then use some functions that manipulate that memory. At the end, you place a pointer to the greeting memory on the stack. Then you return from the call, at which point the stack pointer is retracted back to its original, before-call position. So those 100 bytes that were on the stack for the duration of your call are essentially up for grabs again as the stack is further manipulated, including the address field which pointed to that value and which you returned. At that point, who knows what happens to it, but it's likely set to zero or some other value. And when you try to access it as if it were still valid memory, you get a segfault.
To get around this, you need to manage that memory differently. You can have your function allocate the memory on the heap, but then you'll need to make sure it gets free()'d at a later date by your binding. Or you can write your function so that the binding language passes it a chunk of memory to use.
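The second option, where the binding supplies the memory, could look like this minimal sketch; hello_into is a hypothetical name. It reuses snprintf()'s convention of returning the full length needed (excluding the '\0'), so a caller can detect truncation and retry with a bigger buffer, and nothing needs to be freed across the language boundary. From Python, ctypes.create_string_buffer(100) would provide a writable buffer to pass as out.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical export: writes the greeting into the caller's buffer `out`
   (capacity outlen, always '\0'-terminated when outlen > 0). Returns the
   length the full greeting needs, excluding '\0', or a negative value on
   error. A return value >= outlen means the text was truncated. */
int hello_into(const char *name, char *out, size_t outlen)
{
    return snprintf(out, outlen, "Hello, %s!\n", name);
}
```

Calling it with out = NULL and outlen = 0 is allowed by snprintf() and simply reports the size to allocate.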

How do I debug code that segfaults unless run through gdb?

This is single-threaded code.
In particular: the ahocorasick Python extension module (easy_install ahocorasick).
I isolated the problem to a trivial example:
import ahocorasick
t = ahocorasick.KeywordTree()
t.add("a")
When I run it in gdb, all is fine; the same happens when I enter these instructions into the Python CLI. However, when I run the script normally, I get a segfault.
To make it even weirder, the line that causes the segfault (identified by core dump analysis) is a regular int increment (see the bottom of the function body).
I'm completely stuck at this point. What can I do?
int
aho_corasick_addstring(aho_corasick_t *in, unsigned char *string, size_t n)
{
aho_corasick_t* g = in;
aho_corasick_state_t *state,*s = NULL;
int j = 0;
state = g->zerostate;
// As long as we have transitions follow them
while( j != n &&
(s = aho_corasick_goto_get(state,*(string+j))) != FAIL )
{
state = s;
++j;
}
if ( j == n ) {
/* dyoo: added so that if a keyword ends up in a prefix
of another, we still mark that as a match.*/
aho_corasick_output(s) = j;
return 0;
}
while( j != n )
{
// Create new state
if ( (s = xalloc(sizeof(aho_corasick_state_t))) == NULL )
return -1;
s->id = g->newstate++;
debug(printf("allocating state %d\n", s->id)); /* debug */
s->depth = state->depth + 1;
/* FIXME: check the error return value of
aho_corasick_goto_initialize. */
aho_corasick_goto_initialize(s);
// Create transition
aho_corasick_goto_set(state,*(string+j), s);
debug(printf("%u -> %c -> %u\n",state->id,*(string+j),s->id));
state = s;
aho_corasick_output(s) = 0;
aho_corasick_fail(s) = NULL;
++j; // <--- HERE!
}
aho_corasick_output(s) = n;
return 0;
}

There are other tools you can use that will find faults that do not necessarily crash the program.
valgrind, Electric Fence, Purify, Coverity, and lint-like tools may be able to help you.
You might need to build your own Python in some cases for these to be usable. Also, for memory corruption issues, there is (or was; I haven't built extensions in a while) a way to make Python use direct memory allocation instead of its own allocator.

Have you tried translating that while loop to a for loop? Maybe there's some subtle misunderstanding with the ++j that will disappear if you use something more intuitive.
