What is a suitable buffer for Python's struct module

In Python I'm accessing a binary file by reading it into a string and then using struct.unpack(...). Now I want to write to that string using struct.pack_into(...), but I get the error "Cannot use string as modifiable buffer". What would be a suitable buffer for use with the struct module?

As noted in another answer, struct.pack is probably all you need and should use. However, objects of type array support the buffer protocol and can be modified:
>>> import array, struct
>>> a = array.array('c', ' ' * 1000)
>>> c = 'a'; i = 1
>>> struct.pack_into('ci', a, 0, c, i)
>>> a
array('c', 'a\x00\x00\x00\x01\x00\x00\x00 ...
The original buffer protocol was a bit of a hack primarily for C extensions. It has been deprecated and replaced by a new C-level buffer API and memoryview objects in Python 3 (and in the upcoming 2.7).
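On 2.7 and later a plain bytearray is itself a writable buffer that struct.pack_into accepts, and memoryview gives a view onto it; a minimal sketch, assuming Python 2.7:
>>> import struct
>>> buf = bytearray(8)                     # writable, zero-filled
>>> struct.pack_into('<ci', buf, 0, 'a', 1)
>>> memoryview(buf).tobytes()
'a\x01\x00\x00\x00\x00\x00\x00'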

If you aren't trying to pack it into a specific object, just use struct.pack to return a string.
Otherwise, ctypes.create_string_buffer is one way to obtain a mutable buffer.
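For instance, a minimal sketch (the 8-byte size and '<ci' format are arbitrary choices for illustration):
>>> import ctypes, struct
>>> buf = ctypes.create_string_buffer(8)   # mutable, zero-filled buffer
>>> struct.pack_into('<ci', buf, 0, 'a', 1)
>>> buf.raw
'a\x01\x00\x00\x00\x00\x00\x00'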

Two possibilities leap immediately to mind:
You can use the Python StringIO module to make a read/write buffer with file semantics (a sketch follows below).
You can use the Python array module to get a buffer you can treat like a list, but which will contain just binary bytes.
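A StringIO object isn't a writable buffer in the struct.pack_into sense, but you can write packed bytes to it with file semantics; a minimal Python 2 sketch using cStringIO:
>>> import struct
>>> from cStringIO import StringIO
>>> buf = StringIO()                  # file-like, in-memory buffer
>>> buf.write(struct.pack('<ci', 'a', 1))
>>> buf.getvalue()
'a\x01\x00\x00\x00'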

Related

hash-like string shortener with decoder

I have a really long string that I want to shorten into a smaller string of random characters similar to what a hash does. However, I want to be able to undo it later to read it. As far as I know hashes are unable to be undone and thus I could not read it later.
You can use Python's builtin compression library zlib
>>> from zlib import compress, decompress
>>> original = 'A' * 1024
>>> len(original)
1024
>>> compressed = compress(original.encode('utf-8'))
>>> len(compressed)
17
>>> original == decompress(compressed).decode('utf-8')
True
Note that the original string must contain some patterns to be compressed efficiently. In general, the more entropy original has, the longer compressed will be.
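For example, incompressible random bytes actually come out slightly longer than the input because of zlib's framing overhead:
>>> import os
>>> from zlib import compress
>>> random_data = os.urandom(1024)    # maximum-entropy input
>>> len(compress(random_data)) > 1024
True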
I ended up using a database and just stuck with the long strings. Originally, I was going to shorten them and not have a database, but I think a database is better anyway, and it lets me store the long form.

Python Struct Packing Unpacking

I've two bytes \x22\x38. (Read from a process's memory, so preferably little-endian.)
I am pretty sure these bytes get converted to 0x588, but I don't know how.
I want to know how Python's struct module can be used to convert \x22\x38 to 0x588.
There's something else going on if somehow 0x22/0x38 maps to anything other than 0x2238 or 0x3822, but in any event:
>>> import struct
>>> data = b'\x22\x38'
>>> struct.unpack('<h', data)
(14370,)
>>> struct.unpack('>h', data)
(8760,)
Note that unpack() returns a tuple. h is for short (as in a C short); the < or > sets the endianness. See the struct module docs for full info.
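To see those values in the hex form the question uses, wrap the results in hex():
>>> hex(struct.unpack('<h', data)[0])
'0x3822'
>>> hex(struct.unpack('>h', data)[0])
'0x2238'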

python: how to generate char by adding int

I can use 'a'+1 to get 'b' in C, so what is the convenient way to do this in Python?
I can write it like:
chr(ord('a')+1)
but I don't know whether it is the best way.
Yes, this is the best way. Python doesn't automatically convert between a character and an int the way C and C++ do.
Python doesn't actually have a character type, unlike C, so yes, chr(ord(...)) is the way to do it.
If you wanted to do it a bit more cleanly, you could do something like:
def add(c, x):
    return chr(ord(c) + x)
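For example:
>>> add('a', 1)
'b'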
There is the bytearray type in Python -
it is slower than regular strings, but behaves mostly like a C string:
it is mutable, accessing individual elements yields integers in the range 0 - 255 instead of length-1 substrings, and you can assign to the elements. Still, it is represented as a string, and in Python 2 it can be used in most places a string can without being cast to a str object:
>>> text = bytearray("a")
>>> text
bytearray(b'a')
>>> print text
a
>>> text[0]+=1
>>> print text
b
>>> text[0]
98
>>> print "other_text" + text
other_textb
When using Python 3, to use the contents of a bytearray as a text object, simply call its decode method with an appropriate encoding such as "latin1" or "utf-8":
>>> print ("other_text" + text.decode("latin1"))
What you're doing is really the right way. Python does not conflate a character with its numerical codepoint, as C and similar languages do. The reason is that once you go beyond ASCII, the same integral value can represent different characters, depending on the encoding. C emphasizes direct access to the underlying hardware formats, but Python emphasizes well-defined semantics.

Exploits in Python - manipulating hex strings

I'm quite new to python and trying to port a simple exploit I've written for a stack overflow (just a nop sled, shell code and return address). This isn't for nefarious purposes but rather for a security lecture at a university.
Given a hex string (deadbeef), what are the best ways to:
represent it as a series of bytes
add or subtract a value
reverse the order (for x86 memory layout, i.e. efbeadde)
Any tips and tricks regarding common tasks in exploit writing in python are also greatly appreciated.
In Python 2.6 and above, you can use the built-in bytearray class.
To create your bytearray object:
b = bytearray.fromhex('deadbeef')
To alter a byte, you can reference it using array notation:
b[2] += 7
To reverse the bytearray in place, use b.reverse(). To create an iterator that iterates over it in reverse order, you can use the reversed function: reversed(b).
You may also be interested in the new bytes class in Python 3, which is like bytearray but immutable.
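Putting those pieces together, a short session on Python 2.7 (the += 7 on one byte is an arbitrary illustration):
>>> import binascii
>>> b = bytearray.fromhex('deadbeef')
>>> b[2] += 7                 # adjust a single byte
>>> b.reverse()               # in-place reversal
>>> binascii.hexlify(bytes(b))
'efc5adde'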
Not sure if this is the best way...
hex_str = "deadbeef"
bytes = "".join(chr(int(hex_str[i:i+2],16)) for i in xrange(0,len(hex_str),2))
rev_bytes = bytes[::-1]
Or might be simpler:
bytes = "\xde\xad\xbe\xef"
rev_bytes = bytes[::-1]
In Python 2.x, regular str values are binary-safe. You can use the binascii module's b2a_hex and a2b_hex functions to convert to and from hexadecimal.
You can use ordinary string methods to reverse or otherwise rearrange your bytes. However, doing any kind of arithmetic would require you to use the ord function to get numeric values for individual bytes, then chr to convert the result back, followed by concatenation to reassemble the modified string.
For mutable sequences with easier arithmetic, use the array module with type code 'B'. These can be initialized from the results of a2b_hex if you're starting from hexadecimal.
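A small sketch of that approach (the += 1 on the first byte is arbitrary, just to show the easier arithmetic):
>>> import array, binascii
>>> a = array.array('B', binascii.a2b_hex('deadbeef'))
>>> a[0] += 1
>>> binascii.b2a_hex(a.tostring()[::-1])
'efbeaddf'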

Endianness of integers in Python

I'm working on a program where I store some data in an integer and process it bitwise. For example, I might receive the number 48, which I will process bit-by-bit. In general the endianness of integers depends on the machine representation of integers, but does Python do anything to guarantee that the ints will always be little-endian? Or do I need to check endianness like I would in C and then write separate code for the two cases?
I ask because my code runs on a Sun machine and, although the one it's running on now uses Intel processors, I might have to switch to a machine with Sun processors in the future, which I know is big-endian.
Python's int has the same endianness as the processor it runs on. The struct module lets you convert byte blobs to ints (and vice versa, and some other data types too) in either native, little-endian, or big-endian ways, depending on the format string you choose: start the format with '@' or no endianness character to use native endianness (and native sizes -- everything else uses standard sizes), '=' for native byte order with standard sizes, '<' for little-endian, '>' or '!' for big-endian.
This is byte-by-byte, not bit-by-bit; not sure exactly what you mean by bit-by-bit processing in this context, but I assume it can be accommodated similarly.
For fast "bulk" processing in simple cases, consider also the array module -- the fromstring and tostring methods can operate on large number of bytes speedily, and the byteswap method can get you the "other" endianness (native to non-native or vice versa), again rapidly and for a large number of items (the whole array).
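A quick sketch of byteswap (the values shown assume 'i' items are 4 bytes, as on most platforms):
>>> import array
>>> a = array.array('i', [1, 2, 3])
>>> a.byteswap()              # flip every item to the other endianness
>>> a
array('i', [16777216, 33554432, 50331648])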
If you need to process your data 'bitwise' then the bitstring module might be of help to you. It can also deal with endianness between platforms.
The struct module is the best standard method of dealing with endianness between platforms. For example, this packs and unpacks the integers 1, 2, 3 into two 'shorts' and one 'long' (2 and 4 bytes with the standard sizes) using explicit big-endian byte order:
>>> from struct import *
>>> pack('>hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
To check the endianness of the platform programmatically you can use
>>> import sys
>>> sys.byteorder
which will either return "big" or "little".
The following snippet will tell you if your system default is little endian (otherwise it is big-endian)
import struct
little_endian = (struct.unpack('<I', struct.pack('=I', 1))[0] == 1)
Note, however, this will not affect the behavior of bitwise operators: 1<<1 is equal to 2 regardless of the default endianness of your system.
Check when?
When doing bitwise operations, the integer you get out will have the same endianness as the ints you put in. You don't need to check for that. You only need to care about endianness when converting integers to/from sequences of bytes, in both languages, as far as I know.
In Python you use the struct module for this, most commonly struct.pack() and struct.unpack().
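For example, round-tripping the 48 from the question (48 is 0x30, ASCII '0') through explicit little-endian bytes:
>>> import struct
>>> struct.pack('<I', 48)
'0\x00\x00\x00'
>>> struct.unpack('<I', '0\x00\x00\x00')[0]
48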
