I am using the Python construct parser to process some binary data but am not managing to obtain strings in the way I expected.
Note that in the simplified example below I could use unpack or even just a slice, but the real data I am parsing does not align neatly to byte boundaries.
Some example code:
from construct import BitStruct, BitField, Padding, String
struct = BitStruct("foo",
BitField("bar", 8),
BitField("baz", 16),
Padding(4),
BitField("bat", 4)
)
struct2 = BitStruct("foo",
BitField("bar", 8),
String("baz", 16),
Padding(4),
BitField("bat", 4)
)
data = "\x01AB\xCD"
print struct.parse(data)
print struct2.parse(data)
This prints the output:
Container:
bar = 1
baz = 16706
bat = 13
Container:
bar = 1
baz = '\x00\x01\x00\x00\x00\x00\x00\x01\x00\x01\x00\x00\x00\x00\x01\x00'
bat = 13
I was expecting String to give me back AB as an actual string. Instead, it returns the individual bits of those two bytes, one bit per byte.
How can I persuade construct to return me the actual ASCII string?
I solved this by creating an Adapter. The original ASCII values are parsed into a list of integers, which can then be converted into a string.
It's not the most elegant way, but because BitStruct operates only on bit values, it seems to be the easiest workaround. An improved version would parse strings with different character widths (e.g. 7-bit ASCII).
from binascii import hexlify
from construct import BitStruct, BitField, Padding, Array, Octet, Adapter
class BitStringAdapter(Adapter):
    def _encode(self, obj, context):
        # string -> list of character codes, e.g. "AB" -> [65, 66]
        return [ord(c) for c in obj]

    def _decode(self, obj, context):
        # list of character codes -> string, e.g. [65, 66] -> "AB"
        return "".join(chr(b) for b in obj)

struct = BitStruct("foo",
    BitField("bar", 8),
    BitStringAdapter(Array(2, Octet("baz"))),
    Padding(4),
    BitField("bat", 4)
)

data = "\x01AB\xCD"
out = struct.parse(data)
print out
print hexlify(struct.build(out))
This outputs:
Container:
bar = 1
baz = 'AB'
bat = 13
0141420d
This is correct: the high nibble C of the last byte is discarded because it is marked as padding, so it is rebuilt as 0. That is fine.
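For comparison, the same fields can be pulled out with only the standard library by treating the whole record as one big integer and shifting. This is a minimal Python 3 sketch, not part of construct; `read_bits` and `read_ascii` are hypothetical helper names, and bit offsets are counted from the most significant bit of the first byte. The `bits_per_char` parameter covers the 7-bit ASCII case mentioned above.

```python
def read_bits(data: bytes, offset: int, width: int) -> int:
    """Return `width` bits starting `offset` bits from the MSB of data[0]."""
    total = len(data) * 8
    if offset < 0 or width <= 0 or offset + width > total:
        raise ValueError("bit range outside data")
    value = int.from_bytes(data, "big")  # whole record as one integer
    shift = total - offset - width       # bits to drop on the right
    return (value >> shift) & ((1 << width) - 1)


def read_ascii(data: bytes, offset: int, nchars: int, bits_per_char: int = 8) -> str:
    """Decode `nchars` characters packed back to back, `bits_per_char` bits each."""
    codes = (read_bits(data, offset + i * bits_per_char, bits_per_char)
             for i in range(nchars))
    return "".join(chr(c) for c in codes)


data = b"\x01AB\xCD"
bar = read_bits(data, 0, 8)    # 1
baz = read_ascii(data, 8, 2)   # 'AB'
bat = read_bits(data, 28, 4)   # 13 (bits 24..27 are the padding)
print(bar, baz, bat)           # 1 AB 13
```

With 7-bit characters, `read_ascii(b"\x91\xa4", 0, 2, bits_per_char=7)` returns `'Hi'`, since the 14 bits `1001000` and `1101001` are 72 and 105.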
The Python module bitstruct can also be used to parse bit fields. It uses format strings, just like the standard library struct module.
The format specifier 't' is for text.
>>> from bitstruct import unpack
>>> data = b'\x01AB\xCD'
>>> unpack("u8u16p4u4", data)
(1, 16706, 13)
>>> unpack("u8t16p4u4", data)
(1, u'AB', 13)
I have a string in Python of type POINTER(wintypes.BYTE). I am using DATA_BLOB in Python:
from ctypes import Structure, POINTER, wintypes
class CREATE_DATA_BLOB(Structure):
    _fields_ = [('cbData', wintypes.DWORD), ('pbData', POINTER(wintypes.BYTE))]
I have a DLL that encrypts the data. After it encrypts the data, the result is saved in the pbData member of the DATA_BLOB structure. The problem is the values inside pbData: pbData[0], for example, contains -42. Some of the values are between 0 and 255, which is fine, but some look like completely random numbers, and I can't figure out how to turn these non-ASCII numbers into characters. In C++ I use the WriteFile function, pass pbData, and everything works. In Python this is not the case; I get this error when I try to write pbData to a text file:
file.write(data_out.pbData)
TypeError: write() argument must be str, not LP_c_byte
I really don't know how to fix this problem.
Listing [Python 3.Docs]: ctypes - A foreign function library for Python.
There are several problems:
1. wintypes.BYTE is signed ([Python.Bugs]: wrong type for wintypes.BYTE), so any byte above 0x7F shows up as a negative number (0xD6, for example, reads as -42)
2. file.write works with Python strings (in your case), not ctypes pointers, and there's no implicit conversion between them
3. Going further (this would appear after solving the other 2): you have "special" chars in your buffer. That means you shouldn't treat it as a "normal string", but as a binary sequence (otherwise you may get encode / decode errors). As a consequence, open the file you want to dump its contents to in binary mode (e.g.: file = open(file_name, "wb")).
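To see where values like -42 come from, here is a small sketch. It uses ctypes.c_byte, which is the signed type that wintypes.BYTE is an alias of, so it runs on any OS; the list of "signed" values is hypothetical, chosen to mirror the buffer used below.

```python
import ctypes

# 0xD6 (214) stored in a signed byte reads back as -42
print(ctypes.c_byte(0xD6).value)   # -42

# Hypothetical values as they might come out of pbData[i] with a signed type
signed_values = [65, 66, -42, 67, 68, -39]
# Masking with 0xFF maps -128..-1 back onto 128..255
raw = bytes(v & 0xFF for v in signed_values)
print(raw)                         # b'AB\xd6CD\xd9'
```

Declaring the pointer as c_ubyte, as in the session below, avoids the masking step altogether.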
>>> import ctypes as ct
>>> from ctypes import wintypes as wt
>>>
>>> class CREATE_DATA_BLOB(ct.Structure):
...     _fields_ = [
...         ("cbData", wt.DWORD),
...         ("pbData", ct.POINTER(ct.c_ubyte)),  # wt.BYTE is signed !!!
...     ]
...
>>>
>>> buf_initial = b"AB\xD6CD\xD9EF\x9CGH" # Contains the 3 chars you mentioned
>>> buf_initial
b'AB\xd6CD\xd9EF\x9cGH'
>>> # Populate the structure as it was done from C++
...
>>> blob = CREATE_DATA_BLOB(len(buf_initial), ct.cast(ct.create_string_buffer(buf_initial), ct.POINTER(ct.c_ubyte)))
>>> blob.cbData, blob.pbData
(11, <__main__.LP_c_ubyte object at 0x00000154FF6998C8>)
>>>
>>> buf_final = bytes(blob.pbData[:blob.cbData]) # Convert the pointer explicitly to Python bytes
>>> buf_final
b'AB\xd6CD\xd9EF\x9cGH'
>>> buf_initial == buf_final
True
>>>
>>> with open("q058436070_out.bin", "wb") as file:
...     file.write(buf_final)
...
11
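An alternative to slicing the pointer is ctypes.string_at, which copies a given number of bytes starting at an address into a bytes object. Because it copies raw memory, it gives the same result whether pbData is declared with a signed or an unsigned byte type. A sketch, assuming a portable stand-in structure (`DataBlobSketch` is a hypothetical name, and ct.c_uint32 replaces wt.DWORD so it also runs off Windows):

```python
import ctypes as ct


class DataBlobSketch(ct.Structure):
    # c_uint32 stands in for wintypes.DWORD (both are 32-bit unsigned)
    _fields_ = [
        ("cbData", ct.c_uint32),
        ("pbData", ct.POINTER(ct.c_byte)),  # signed on purpose, like wintypes.BYTE
    ]


payload = b"AB\xd6CD\xd9EF\x9cGH"
backing = ct.create_string_buffer(payload)  # keep this object alive while blob is used
blob = DataBlobSketch(len(payload), ct.cast(backing, ct.POINTER(ct.c_byte)))

print(blob.pbData[2])                          # -42: signed view of 0xD6
data = ct.string_at(blob.pbData, blob.cbData)  # raw copy of cbData bytes
print(data == payload)                         # True

# Then write it in binary mode, as above:
# with open("out.bin", "wb") as f:
#     f.write(data)
```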
I am receiving single bytes via serial, and I know that every 4 of them form a float. For example, I receive b'\x3c' and b'\xff', and I want to combine them into b'\x3c\xff'.
What is the best way to convert it?
You can use join() as you do with strings.
byte_1 = b'\x3c'
byte_2 = b'\xff'
joined_bytes = b''.join([byte_1, byte_2]) #b'\x3c\xff'
You can use it together with the struct module to decode your float; be aware that struct.unpack returns a tuple even if it contains only one element.
import struct
byte_1 = b'\x3c'
byte_2 = b'\xff'
byte_3 = b'\x20'
byte_4 = b'\xff'
joined_bytes = b''.join([byte_1, byte_2, byte_3, byte_4])
result = struct.unpack('f', joined_bytes)
print(result[0])
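Since the bytes arrive one at a time, a small buffer can collect them and decode every complete group of 4. This is a sketch assuming little-endian IEEE 754 single-precision floats (`"<f"`); `floats_from_byte_stream` is a hypothetical name, and you would pass `">f"` if the sender is big-endian. Note that `'f'` without a prefix, as above, uses the native byte order of the receiving machine.

```python
import struct


def floats_from_byte_stream(chunks, fmt="<f"):
    """Yield one float for every complete group of bytes in `chunks`.

    `chunks` is any iterable of bytes objects (e.g. single bytes read
    one at a time from the serial port).
    """
    size = struct.calcsize(fmt)  # 4 for "<f"
    pending = bytearray()
    for chunk in chunks:
        pending += chunk
        while len(pending) >= size:
            (value,) = struct.unpack(fmt, bytes(pending[:size]))
            del pending[:size]  # drop the bytes just decoded
            yield value


# Simulate bytes arriving one at a time
incoming = [bytes([b]) for b in struct.pack("<f", 1.5) + struct.pack("<f", -2.0)]
print(list(floats_from_byte_stream(incoming)))  # [1.5, -2.0]
```

Any trailing bytes that do not yet form a full float stay in the buffer until more data arrives.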
I convert a string to a JSON object using the json library:
a = '{"index":1}'
import json
json.loads(a)
{'index': 1}
However, if I instead change the string a to contain a leading 0, then it breaks down:
a = '{"index":01}'
import json
json.loads(a)
>>> JSONDecodeError: Expecting ',' delimiter
I believe this is due to the fact that it is invalid JSON if an integer begins with a leading zero as described in this thread.
Is there a way to remedy this? If not, then I guess the best way is to remove any leading zeroes by a regex from the string first, then convert to json?
A leading 0 in a number literal in JSON is invalid unless the number literal is just the character 0 or starts with "0." (as in 0.5). The Python json module is quite strict in that it will not accept such number literals, in part because a leading 0 is sometimes used to denote octal notation rather than decimal notation, so deserialising such numbers could lead to unintended programming errors. That is, should 010 be parsed as the number 8 (in octal notation) or as 10 (in decimal notation)?
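The ambiguity is easy to see with Python's own int(), where the same digits give different numbers depending on the base you ask for:

```python
# "010" read as decimal vs. as octal
print(int("010"))     # 10
print(int("010", 8))  # 8
```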
You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slowdown, as you will no longer be using the C implementation of the module.
Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.
import json
import re
import threading

# a more lenient number regex (modified from json.scanner.NUMBER_RE)
NUMBER_RE = re.compile(
    r'(-?(?:\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))

# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()


def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
    with _LOCK:
        original_number_re = json.scanner.NUMBER_RE
        try:
            json.scanner.NUMBER_RE = number_re
            return json.scanner._original_py_make_scanner(context)
        finally:
            json.scanner.NUMBER_RE = original_number_re


json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner


class MyJsonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # overwrite the stricter scan_once implementation
        self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)


d = MyJsonDecoder()
n = d.decode('010')
assert n == 10
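# check that the normal route still raises an error
try:
    json.loads('010')
except json.JSONDecodeError:
    pass
else:
    raise AssertionError("the standard decoder accepted a leading zero")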
I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite valid. It's useful if recreating the JSON in a valid form is not possible for some reason.
First, using regex on JSON is evil, almost as bad as killing a kitten.
If you want to represent 01 as a valid JSON value, then consider using this structure:
a = '{"index" : "01"}'
import json
json.loads(a)
If you need the string literal 01 to behave like a number, then consider just casting it to an integer in your Python script.
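A minimal sketch of that approach: the JSON stays valid because "01" is a string, and the conversion happens after parsing.

```python
import json

data = json.loads('{"index": "01"}')  # "01" stays a string, so the JSON is valid
index = int(data["index"])            # int() parses "01" as decimal 1
print(data["index"], index)           # 01 1
```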
How to convert string int JSON into real int with json.loads
Please see the post above
You need to use your own version of the decoder.
More information can be found on GitHub:
https://github.com/simplejson/simplejson/blob/master/index.rst
c = '{"value": 02}'
value= json.loads(json.dumps(c))
print(value)
This seems to work... It is strange:
>>> c = '{"value": 02}'
>>> import json
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 02}
>>> c = '{"value": 0002}'
>>> value = json.loads(json.dumps(c))
>>> print(value)
{"value": 0002}
As @Dunes pointed out, loads produces a string as the outcome, not a dict, so this is not a valid solution.
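The reason is that json.dumps(c) serialises the whole text as a single JSON string literal, and json.loads simply turns it back into the same Python string. A quick check:

```python
import json

c = '{"value": 02}'
round_tripped = json.loads(json.dumps(c))

print(type(round_tripped))  # <class 'str'>
print(round_tripped == c)   # True: nothing was parsed into a dict
```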
However, DEMJSON seems to decode it properly.
https://pypi.org/project/demjson/ -- alternative way
>>> c = '{"value": 02}'
>>> import demjson
>>> demjson.decode(c)
{'value': 2}
I am attempting to read my player's health. I have been on a roll but have run into a problem: I am able to read what type of information is at a certain address, but I can't read the actual value. For example, here is the response I receive:
<ctypes.c_char_Array_64 object at 0x0000000002EBF9C8>
I am looking for what information is held in the c_char_Array_64 object but have no idea how I would go about it.
Here is my code:
class User:
    ctypes.wintypes.DWORD = "Entity"
    ctypes.wintypes.c_int = "Team"
    ctypes.wintypes.c_int = "Health"
    ctypes.wintypes.c_int = "Player"

    def getSelfInfo(self):
        adr1 = clientdll + dw_LocalPlayer
        adr2 = ctypes.create_string_buffer(64)
        bytes_read = ctypes.c_size_t()
        (rPM(PROCESS.handle, adr1, adr2, sys.getsizeof(ctypes.wintypes.DWORD), ctypes.byref(bytes_read)))
        print adr2

t = User()
t.getSelfInfo()
You need to get the value:
print(adr2.value)
From the docs:
If you need mutable memory blocks, ctypes has a create_string_buffer()
function which creates these in various ways. The current memory block
contents can be accessed (or changed) with the raw property; if you
want to access it as NUL terminated string, use the value property:
>>> from ctypes import *
>>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes
>>> print sizeof(p), repr(p.raw)
3 '\x00\x00\x00'
>>> p = create_string_buffer("Hello") # create a buffer containing a NUL terminated string
>>> print sizeof(p), repr(p.raw)
6 'Hello\x00'
>>> print repr(p.value)
'Hello'
>>> p = create_string_buffer("Hello", 10) # create a 10 byte buffer
>>> print sizeof(p), repr(p.raw)
10 'Hello\x00\x00\x00\x00\x00'
>>> p.value = "Hi"
>>> print sizeof(p), repr(p.raw)
10 'Hi\x00lo\x00\x00\x00\x00\x00'
>>>
The full slice ([:]) of most ctypes array types returns the equivalent Python type. So to convert your 64-byte buffer to a str (bytes in Py3), you can do:
print adr2[:]
That will read the full raw 64 bytes, mind you. If you want to read it as a C-style string (so the first NUL byte terminates the Python equivalent str), you'd use .value:
print adr2.value
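One more thing to check: if the value you are after is a number such as health (an assumption about your data), neither .value nor [:] gives it to you directly, because both return raw bytes. A Python 3 sketch that simulates the filled buffer and reinterprets its first 4 bytes, assuming the target stores a little-endian 32-bit int (as on x86):

```python
import ctypes
import struct

# Stand-in for the buffer the memory read filled in: assume the target
# stored the 32-bit little-endian int 100 at its start.
adr2 = ctypes.create_string_buffer(64)
struct.pack_into("<i", adr2, 0, 100)

print(adr2.value)    # b'd': bytes up to the first NUL, not a number
print(adr2.raw[:4])  # b'd\x00\x00\x00': the raw bytes
health = struct.unpack_from("<i", adr2, 0)[0]
print(health)        # 100
```

Relatedly, sys.getsizeof(ctypes.wintypes.DWORD) in the question returns the size of the Python type object, not of the C type; ctypes.sizeof gives the C size.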
I have the following value:
value = bytearray(b'\x85\x13\xbd|\xfb\xbc\xc3\x95\xbeL6L\xfa\xbf0U_`$]\xca\xee]z\xef\xa0\xd6(\x15\x8b\xca\x0e\x1f7\xa9\xf0\xa4\x98\xc5\xdf\xcdM5\xef\xc2\x052`\xeb\x13\xd9\x99B.\x95\xb2\xbd\x96\xd9\x14\xe6F\x9e\xfd\xd8\x00')
When I convert it in Python 3.x, it works well:
>>> int.from_bytes(value, byteorder='little')
2909369579440607969688280064437289348250138784421305732473112318543540722321676649649580720015118044118243611774710427666475769804427735898727217762490192773
How can I convert it in Python 2.7? I already read "convert a string of bytes into an int (python)", which suggests
struct.unpack(fmt, value)[0]
but I don't know what to use for fmt.
You can just write your own from_bytes function in Python 2:
def from_bytes(data, big_endian=False):
    if isinstance(data, str):
        data = bytearray(data)
    if big_endian:
        data = reversed(data)
    num = 0
    for offset, byte in enumerate(data):
        num += byte << (offset * 8)
    return num
Used like this:
>>> data = b'\x85\x13\xbd|\xfb\xbc\xc3\x95\xbeL6L\xfa\xbf0U_`$]\xca\xee]z\xef\xa0\xd6(\x15\x8b\xca\x0e\x1f7\xa9\xf0\xa4\x98\xc5\xdf\xcdM5\xef\xc2\x052`\xeb\x13\xd9\x99B.\x95\xb2\xbd\x96\xd9\x14\xe6F\x9e\xfd\xd8\x00'
>>> from_bytes(data)
2909369579440607969688280064437289348250138784421305732473112318543540722321676649649580720015118044118243611774710427666475769804427735898727217762490192773L
As for struct, you cannot really use it here, as it only supports unpacking fixed-size elements of certain kinds, up to 8-byte integers. Since you want to handle arbitrary-length byte strings, you will have to use something else.
You can use a combination of .encode('hex') and int(x, 16):
num = int(str(value).encode('hex'), 16)
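Note that this parses the bytes as big endian. To parse them as little endian, reverse them first:
int(str(value)[::-1].encode('hex'), 16)
(''.join(reversed(value)) does not work on a bytearray, because iterating a bytearray yields integers, not one-character strings.)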
reference: https://stackoverflow.com/a/444814/8747
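The .encode('hex') trick is Python 2 only (Python 3 bytes have no encode method). A variant of the same hex approach using binascii.hexlify runs unchanged on both Python 2.7 and 3; `bytes_to_int_le` is a hypothetical helper name, and the function treats the input as an unsigned little-endian integer.

```python
import binascii


def bytes_to_int_le(value):
    """Interpret a str/bytes/bytearray as an unsigned little-endian integer."""
    ordered = bytearray(value)
    ordered.reverse()                               # most significant byte first
    hex_digits = binascii.hexlify(bytes(ordered))   # e.g. '0201' or b'0201'
    return int(hex_digits, 16) if hex_digits else 0  # empty input -> 0


print(bytes_to_int_le(bytearray(b"\x01\x02")))  # 513 (0x0201)
```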