I am sending strings to my BPF C code and I am not sure if the strings passed in are null-terminated. If they are not, is there a way to make them null-terminated? I am sending my lines of code to BPF so I can count them manually using my stringCounter function, but I keep hitting an infinite loop. Here is what my Python code looks like:
import ctypes
from bcc import BPF

b = BPF(src_file="hello.c")
lookupTable = b["lookupTable"]

# add hello.csv to the lookupTable array
f = open("hello copy.csv", "r")
contents = f.readlines()
for i in range(0, len(contents)):
    string = contents[i].encode('utf-8')
    lookupTable[ctypes.c_int(i)] = ctypes.create_string_buffer(string, len(string))
And here is the code I found for my null-terminated string counter:
int stringLength(char* txt)
{
    int i = 0, count = 0;
    while (txt[i++] != '\0') {
        count += 1;
    }
    return count;
}
ctypes.create_string_buffer(string, len(string)) is not zero-terminated. But ctypes.create_string_buffer(string) is. It's easy to see that, since ctypes.create_string_buffer(string)[-1] is b'\x00', whereas ctypes.create_string_buffer(string, len(string))[-1] is the last byte in string.
In other words, if you want a zero-terminated buffer, let create_string_buffer figure out the length. (It uses the actual length from the Python bytes object, so it doesn't get fooled by internal NUL bytes, if you were worried about that.)
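A quick interactive check of that difference (a minimal sketch):

import ctypes

s = b"hello"
auto_sized = ctypes.create_string_buffer(s)           # size is len(s) + 1, NUL-terminated
exact_sized = ctypes.create_string_buffer(s, len(s))  # size is len(s), no trailing NUL

print(len(auto_sized), auto_sized[-1])    # 6 b'\x00'
print(len(exact_sized), exact_sized[-1])  # 5 b'o'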
I'm unfamiliar with BPF, but for ctypes: if your string isn't modified by the C code, you don't need create_string_buffer, which exists to create mutable buffers. Python Unicode and byte strings are always passed to C code as nul-terminated wchar_t* or char*, respectively. Assuming your function is in test.dll or test.so:
import ctypes as ct
dll = ct.CDLL('./test')
dll.stringLength.argtypes = ct.c_char_p,
dll.stringLength.restype = ct.c_int
print(dll.stringLength('somestring'.encode())) # If string is Unicode
print(dll.stringLength(b'someotherstring')) # If already a byte string
Output:
10
15
Note this doesn't preclude having a nul in the string itself, but your count function will return a shorter value in that case:
print(dll.stringLength(b'some\0string')) # Output: 4
Your code could probably be written as the following, assuming there isn't some requirement that a BPF object have hard-coded ctypes types as indexes and values:
with open("hello copy.csv") as file:
for i,line in enumerate(file):
lookupTable[i] = string.encode()
We are able to defeat the small integer intern in this way (a calculation allows us to avoid the caching layer):
>>> n = 674039
>>> one1 = 1
>>> one2 = (n ** 9 + 1) % (n ** 9)
>>> one1 == one2
True
>>> one1 is one2
False
How can you defeat the small string intern, i.e. to see the following result:
>>> one1 = "1"
>>> one2 = <???>
>>> type(one2) is str and one1 == one2
True
>>> one1 is one2
False
sys.intern mentions that "Interned strings are not immortal", but there's no context about how a string could be kicked out of the intern pool, or how you can create a str instance that avoids the caching layer.
Since interning is a CPython implementation detail, answers relying on undocumented implementation details are ok/expected.
Unicode strings consisting of only one character (with a code point smaller than 128, or more precisely from latin1) are the most complicated case, because those strings aren't really interned but (similar to the integer pool, and identical to the behavior for bytes) are created at interpreter start-up and stored in an array for as long as the interpreter is alive:
struct _Py_unicode_state {
    ...
    /* Single character Unicode strings in the Latin-1 range are being
       shared as well. */
    PyObject *latin1[256];
    ...
    /* This dictionary holds all interned unicode strings...
    */
    PyObject *interned;
    ...
};
So every time a length-1 unicode string is created, the character value gets looked up in the latin1 array. E.g. in unicode_decode_utf8:
/* ASCII is equivalent to the first 128 ordinals in Unicode. */
if (size == 1 && (unsigned char)s[0] < 128) {
    if (consumed) {
        *consumed = 1;
    }
    return get_latin1_char((unsigned char)s[0]);
}
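You can observe this sharing from pure Python: in CPython, different ways of producing the same one-character ASCII string all hand back the identical object from that latin1 array (an implementation detail, not a language guarantee):

# CPython implementation detail: all of these return the same cached object.
a = "x"
b = "xy"[0]
c = chr(ord("x"))
print(a is b, a is c)  # True True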
One could even argue that if there were a way to circumvent this in the interpreter, we'd be talking about a (performance) bug.
One possibility is to populate the unicode data ourselves using the C-API. I use Cython for the proof of concept, but ctypes could also be used to the same effect:
%%cython
cdef extern from *:
    """
    PyObject* create_new_unicode(char *ch)
    {
        PyUnicodeObject *ob = (PyUnicodeObject *)PyUnicode_New(1, 127);
        Py_UCS1 *data = PyUnicode_1BYTE_DATA(ob);
        data[0] = ch[0]; // fill data without using unicode_decode_utf8
        return (PyObject*)ob;
    }
    """
    object create_new_unicode(char *ch)

def gen1():
    return create_new_unicode(b"1")
Noteworthy details:
PyUnicode_New would not look up the latin1 array, because the characters aren't set yet.
For simplicity, the above works only for ASCII characters - thus we pass 127 as maxchar to PyUnicode_New. As a result, we can interpret the data via PyUnicode_1BYTE_DATA, which makes it easy to manipulate manually without much ado.
And now:
a,b=gen1(), gen1()
a is b, a == b
# yields (False, True)
as wanted.
Here is a similar idea, but implemented with ctypes:
from ctypes import POINTER, py_object, c_ssize_t, byref, pythonapi

PyUnicode_New = pythonapi.PyUnicode_New
PyUnicode_New.argtypes = (c_ssize_t, c_ssize_t)
PyUnicode_New.restype = py_object

PyUnicode_CopyCharacters = pythonapi._PyUnicode_FastCopyCharacters
PyUnicode_CopyCharacters.argtypes = (py_object, c_ssize_t, py_object, c_ssize_t, c_ssize_t)
PyUnicode_CopyCharacters.restype = c_ssize_t

def clone(orig):
    cloned = PyUnicode_New(1, 127)
    PyUnicode_CopyCharacters(cloned, 0, orig, 0, 1)
    return cloned
Noteworthy details:
It is not possible to use PyUnicode_1BYTE_DATA with ctypes, because it is a macro. An alternative would be to calculate the offset of the data member and access that memory directly (but this depends on the platform and doesn't feel very portable).
As a workaround, PyUnicode_CopyCharacters is used (there are probably other ways to achieve the same), which is more abstract and portable than directly calculating/accessing the memory.
Actually, _PyUnicode_FastCopyCharacters is used, because PyUnicode_CopyCharacters would check that the target unicode has multiple references and raise. _PyUnicode_FastCopyCharacters doesn't perform those checks and does as asked.
And now:
a="1"
b=clone(a)
a is b, a==b
# yields (False, True)
For strings longer than 1 character, it is a lot easier to avoid interning, e.g.:
a="12"
b="123"[0:2]
a is b, a == b
#yields (False, True)
I am attempting to read my player's health. I have been on a roll but have run into a problem. I am able to tell what type of information is at a certain address but can't read what the actual value is; for example, here is the response I receive:
<ctypes.c_char_Array_64 object at 0x0000000002EBF9C8>
I am looking for the information held in the c_char_Array_64 object but have no idea how I would go about getting it.
Here is my code:
class User:
    ctypes.wintypes.DWORD = "Entity"
    ctypes.wintypes.c_int = "Team"
    ctypes.wintypes.c_int = "Health"
    ctypes.wintypes.c_int = "Player"

    def getSelfInfo(self):
        adr1 = clientdll + dw_LocalPlayer
        adr2 = ctypes.create_string_buffer(64)
        bytes_read = ctypes.c_size_t()
        rPM(PROCESS.handle, adr1, adr2, sys.getsizeof(ctypes.wintypes.DWORD), ctypes.byref(bytes_read))
        print adr2

t = User()
t.getSelfInfo()
You need to get the value:
print(adr2.value)
From the docs:
If you need mutable memory blocks, ctypes has a create_string_buffer()
function which creates these in various ways. The current memory block
contents can be accessed (or changed) with the raw property; if you
want to access it as NUL terminated string, use the value property:
>>> from ctypes import *
>>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes
>>> print sizeof(p), repr(p.raw)
3 '\x00\x00\x00'
>>> p = create_string_buffer("Hello") # create a buffer containing a NUL terminated string
>>> print sizeof(p), repr(p.raw)
6 'Hello\x00'
>>> print repr(p.value)
'Hello'
>>> p = create_string_buffer("Hello", 10) # create a 10 byte buffer
>>> print sizeof(p), repr(p.raw)
10 'Hello\x00\x00\x00\x00\x00'
>>> p.value = "Hi"
>>> print sizeof(p), repr(p.raw)
10 'Hi\x00lo\x00\x00\x00\x00\x00'
>>>
The empty slice of most ctypes array types will return the Python equivalent type. So to convert your 64-byte buffer to a str (in Py3, bytes), you can do:
print adr2[:]
That will read the full raw 64 bytes, mind you. If you want to read it as a C-style string (so the first NUL byte terminates the Python-equivalent str), you'd use .value:
print adr2.value
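A small sketch of the access styles on a buffer like yours (Python 3 here, hence the bytes results):

import ctypes

buf = ctypes.create_string_buffer(8)  # 8 NUL bytes, like a smaller adr2
buf.value = b"Hi"                     # writes b'H', b'i' and a terminating NUL

print(buf.value)  # b'Hi'                          -> stops at the first NUL
print(buf.raw)    # b'Hi\x00\x00\x00\x00\x00\x00'  -> all 8 raw bytes
print(buf[:])     # same as .raw for a char array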
for line in fo:
    line = " ".join(line.split())
    line = line.strip()
I am getting an error:
line = ''.join(line.split())
TypeError: sequence item 0: expected str instance, bytes found
It works fine in Python 2.x but not on 3.4.
Kindly suggest a proper solution.
' ' is a str object, and you're calling its join method with a sequence of bytes objects. As the documentation states, in python-3.x:
str.join(iterable): Return a string which is the concatenation of the strings in iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.
But in this case, since you are dealing with bytes objects, you cannot use str methods. The bytes object itself comes with a join() method that can be used in the same manner as str.join. You can also use io.BytesIO, or you can do in-place concatenation with a bytearray object; as the documentation mentions, bytearray objects are mutable and have an efficient overallocation mechanism. (Both alternatives are sketched below.)
So you can simply add a b prefix to the empty string to make it a byte object:
line = b" ".join(line.split())
Also, if your file contains text, you can simply open it in text mode ('r') instead of binary mode ('rb').
with open("input.txt", "r") as f:
# Do something with f
Note that despite the separation between str and bytes objects in python-3.x, in python-2.x you only have str. You can see this by checking the type of a literal with a b prefix:
In [2]: type(b'')
Out[2]: str
And that's what makes the following snippet work in Python 2:
"".join([b'www', b'www'])
You could use the str() function with an explicit encoding:
lines = str(lines, 'utf-8')
before running the command to avoid errors.
This way you decode the lines variable into a string.
If you came here searching for a solution to join a custom class implemented in C/C++, the simplest method is to add a join method to the class itself and create a binding to Python.
For example, for a class that can hold either a list or a map, both of which should be joinable, example code in pybind11 would be something like this:
py::class_<Data> _data(m, "Data");
_data.def(py::init<>())
    .def("join", [] (Data &d, const char *j = " ") {
        std::string ret;
        if (d.isObject())
            for (auto &o: d.object())
                ret += o.first + j;
        else if (d.isList())
            for (auto &o: d.list())
                ret += o.stringValue() + j;
        return ret;
    });
Then in Python, it is a simple matter of calling the join method on the class:
data.join('_')
I have a function in C that reads byte by byte from a given buffer and returns the result of a mathematical formula.
I need to write the same function in Python.
The buffer in C is a struct, and in Python I used the ctypes Structure class.
My prototype in C is int calc_formula(char *buff, int len),
so calling the function in C is straightforward, but how do I define such a function in Python?
I tried to define the following and have some questions:
def calc_formula(buff, len):
    # some code
In C I called the function with a pointer to the struct's first char. How do I do that in Python? Is buff passed as a pointer? My buffer is very large, and if it can't be done I will use a global variable (which is less preferred).
I need to read the buffer byte by byte, so in C I simply increment the buffer pointer. What's the way to do that in Python? I read about the ctypes Union class, which I could define over the Structure to go over it byte by byte. Do you have a better solution?
UPDATE
I tried bbrame's solution:
def calc_formula(buff, len):
    sum = 0
    for curChar in buff:
        numericByteValue = ord(curChar)
        sum += numericByteValue
    return sum
When I try this code with calc_formula(input_buff, len), I get the following:
error: TypeError: 't_input_buff' object is not iterable - input_buff is an instance of t_input_buff, which is a Structure subclass. What can be the problem?
(It gives me the error when it reaches the for statement.)
On the ctypes side, try using the type c_char_p rather than char* (see the ctypes documentation).
In Python the parameter (buff) will then be a Python string. Loop through it as follows:
def calc_formula(buff, len):
    sum = 0
    for curChar in buff:
        numericByteValue = ord(curChar)
        sum += numericByteValue
    return sum
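If you do need to keep passing a Structure instance rather than a plain byte string, note that a ctypes Structure is not iterable itself, but it supports the buffer protocol, so you can view its raw bytes with bytes(). A minimal sketch in Python 3, using a hypothetical t_input_buff stand-in since the real one isn't shown:

import ctypes

# Hypothetical stand-in for the asker's t_input_buff Structure.
class t_input_buff(ctypes.Structure):
    _fields_ = [("data", ctypes.c_char * 8)]

def calc_formula(buff, length):
    raw = bytes(buff)[:length]  # copy of the structure's raw bytes
    return sum(raw)             # iterating bytes in Python 3 yields ints

input_buff = t_input_buff(b"ABC")
print(calc_formula(input_buff, 3))  # 65 + 66 + 67 = 198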
UPDATE
I solved it with the ctypes Union class;
for the answer, look at this question.
I'm trying to figure out why this works, after lots and lots of messing about.
obo.library_version is a C function which requires a char ** as the input and does a strcpy to the passed-in buffer.
from ctypes import *

_OBO_C_DLL = 'obo.dll'
STRING = c_char_p

OBO_VERSION = _stdcall_libraries[_OBO_C_DLL].OBO_VERSION
OBO_VERSION.restype = c_int
OBO_VERSION.argtypes = [POINTER(STRING)]

def library_version():
    s = create_string_buffer('\000' * 32)
    t = cast(s, c_char_p)
    res = obo.library_version(byref(t))
    if res != 0:
        raise Error("OBO error %r" % res)
    return t.value, s.raw, s.value

library_version()
The above code returns
('OBO Version 1.0.1', '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', '')
What I don't understand is why 's' does not have any value. Does anyone have any ideas? Thanks.
When you cast s to c_char_p you store a new object in t, not a reference. So when you pass t to your function by reference, s doesn't get updated.
UPDATE:
You are indeed correct:
cast takes two parameters, a ctypes object that is or can be converted to a pointer of some kind, and a ctypes pointer type. It returns an instance of the second argument, which references the same memory block as the first argument.
In order to get a reference to your string buffer, you need to use the following for your cast:
t = cast(s, POINTER(c_char*33))
I have no idea why c_char_p doesn't create a reference where this does, but there you go.
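Here is a small sketch showing that the POINTER(c_char * 33) cast really does alias the original buffer, so anything a C function writes through it shows up in s (Python 3 syntax, hence the bytes literals):

from ctypes import create_string_buffer, cast, POINTER, c_char

s = create_string_buffer(33)            # 33 NUL bytes
t = cast(s, POINTER(c_char * 33))

# t.contents shares s's memory, so writing through it is visible in s.
t.contents.value = b"hello from the cast"
print(s.value)  # b'hello from the cast'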
Because library_version requires a char**, they don't want you to allocate the character buffer yourself (as you're doing with create_string_buffer). Instead, they just want you to pass in a reference to a pointer so they can return the address where the version string can be found.
So all you need to do is allocate the pointer, and then pass in a reference to that pointer.
The following code should work, although I don't have obo.dll (or know of a suitable replacement) to test it.
from ctypes import *

_OBO_C_DLL = 'obo.dll'
STRING = c_char_p

_stdcall_libraries = dict()
_stdcall_libraries[_OBO_C_DLL] = WinDLL(_OBO_C_DLL)

OBO_VERSION = _stdcall_libraries[_OBO_C_DLL].OBO_VERSION
OBO_VERSION.restype = c_int
OBO_VERSION.argtypes = [POINTER(STRING)]

def library_version():
    s_res = c_char_p()
    res = OBO_VERSION(byref(s_res))
    if res != 0:
        raise Error("OBO error %r" % res)
    return s_res.value

library_version()
[Edit]
I've gone a step further and written my own DLL containing a possible implementation of OBO_VERSION that does not require an allocated character buffer and is not subject to any memory leaks.
int OBO_VERSION(char **pp_version)
{
    static char result[] = "Version 2.0";
    *pp_version = result;
    return 0; // success
}
As you can see, OBO_VERSION simply sets the value of *pp_version to a pointer to a null-terminated character array. This is likely how the real OBO_VERSION works. I've tested this against my originally suggested technique above, and it works as prescribed.