struct.unpack 6 bytes into short and int fails. Why?


s = '\x01\x00\x02\x00\x00\x00'
struct.unpack('hi',s)
I expect to get (1,2), but instead get the error:
error: unpack requires a string argument of length 8
If I perform the two unpacks separately it works:
myshort = struct.unpack('h',s[:2])
myint = struct.unpack('i',s[2:])
Also, interestingly, it will accept it if the format string is 'ih' instead of 'hi'.
What am I missing?

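This is because of C structure alignment. By default struct uses native alignment, so in 'hi' padding bytes are inserted after the short to place the int on its natural boundary, and the format needs 8 bytes. With 'ih' the int comes first and is already aligned, and no padding is added at the end, which is why that format accepts 6 bytes. If you actually want your data items to remain unaligned, prefix the format string with an = sign: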
>>> s = '\x01\x00\x02\x00\x00\x00'
>>> struct.unpack('=hi',s)
(1, 2)
Refer to the documentation, section 7.3.2.1, Byte Order, Size, and Alignment.
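To see the padding directly, struct.calcsize reports how many bytes each format expects. The following is a minimal sketch in Python 3 syntax (so the data is a bytes literal, unlike the Python 2 string in the question). The native sizes mentioned in the comments assume a typical platform where int is 4 bytes with 4-byte alignment; they may differ elsewhere.

```python
import struct

data = b'\x01\x00\x02\x00\x00\x00'  # short 1 followed by int 2, little-endian

# Native mode (no prefix) follows the C compiler's alignment rules.
native_hi = struct.calcsize('hi')   # typically 8: 2 (short) + 2 (padding) + 4 (int)
native_ih = struct.calcsize('ih')   # typically 6: no padding is added at the end

# Standard mode ('=', '<', '>' or '!') uses fixed sizes and no padding.
packed_hi = struct.calcsize('=hi')  # 6

# '<' also pins the byte order to little-endian, matching the data above.
result = struct.unpack('<hi', data)
print(native_hi, native_ih, packed_hi, result)
```

Using '<' rather than '=' is a design choice for data with a known byte order: '=' removes the padding but still uses the byte order of the machine running the code.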

Related

How to get the last byte item from a bytes list in Python?

I have a bytes list and want to get the last item, while preserving its bytes type. Using [-1] it gives out an int type, so this is not a direct solution.
Example code:
x = b'\x41\x42\x43'
y = x[-1]
print (y, type(y))
# outputs:
67 <class 'int'>
For an arbitrary index I know how to do it:
x = b'\x41\x42\x43'
i = 2 # assume here a valid index in reference to list length
y = x[i:i+1]
print (y, type(y))
# outputs:
b'C' <class 'bytes'>
Probably I can calculate the list length and then point an absolute length-1 number, rather than relative to list end.
However, is there a more elegant way to do this ? (i.e. similar to the simple [-1]) I also cannot imagine how to adapt the [i:i+1] principle in reverse from list end.

There are a number of ways; I think the one you're interested in is:
someBytes[-1:]
Edit: To elaborate a bit on what is going on and why this works: a bytes object is an immutable arbitrary memory buffer, so a single element of a bytes object is a byte, which is best represented by an int. This is why someBytes[-1] will be the last int in the buffer.
This can be counterintuitive when you're using a bytes object like a string for whatever reason (pattern matching, handling ASCII data and not bothering to convert to a string), because a string in Python (or a str, to be pedantic) represents the idea of textual data and isn't tied to any particular binary encoding (though it defaults to UTF-8). So the last element of "hello" is "o", which is a string, since Python has no single char type, just strings of length 1.
So if you're treating a bytes object like a memory buffer you likely want an int, but if you're treating it like a string you want a bytes object of length 1. The line above tells Python to return a slice of the bytes object from the last element to the end, which results in a slice of length one containing only the last value. A slice of a bytes object is itself a bytes object.
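The difference between indexing and slicing can be sketched as follows (Python 3; the variable names are only illustrative):

```python
data = b'\x41\x42\x43'     # same bytes as in the question, i.e. b'ABC'

last_as_int = data[-1]     # indexing yields an int
last_as_bytes = data[-1:]  # slicing yields a bytes object of length 1

# Slicing never raises IndexError, so an empty input gives an empty result.
empty_tail = b''[-1:]

print(last_as_int, last_as_bytes, empty_tail)
```

The empty-input behavior is worth keeping in mind: data[-1] raises IndexError on an empty bytes object, while data[-1:] quietly returns b''.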

You can trivially cast that int back to a bytes object:
>>> z = bytes([y])
>>> z == b'C'
True
...in the event that you can't easily get around fetching the values as ints, say because another function you don't have control of returns them that way.

If you have:
x = b'\x41\x42\x43'
Then you will get:
>>> x
b'ABC'
As you said, x[-1] will give you the ord() value.
>>> x[-1]
67
However, if you want the character itself, you can decode the bytes first:
>>> x.decode()[-1]
'C'
If you want the hexadecimal value 43, you can format it as follows:
>>> "{0:x}".format(x[-1])
'43'

From the example above:
>>> z = bytes([y])
>>> z == b'C'
True
You can get the same result with a slice:
x[-1:]
Output:
b'C'
Also note that
bytes(b'\x41\x42\x43')
gives
b'ABC'

Python not passing correct size of string to C

When I try passing a 16 character string from python to C and scramble it, I keep getting random error codes back.
s = ctypes.c_wchar_p("H86ETJJJJHGFTYHr")
print(libc.hash_password(s))
At the start of the code I added a statement to return the size of the string back to Python; however, it keeps returning a value of 8:
if (sizeof(my_string) != 17) return sizeof(my_string);
If I try to return a single element of the array, it will return a number, which I am assuming is the ascii value of the character, and the code does not error out.
This works for the last element as well, which is correctly recognised as a null.
The code works within C itself perfectly. So how could I get C to accept the correct size string, or python to accept the return string?
EDIT: Forgot to mention, when I do
sizeof(*my_string)
it returns a 1
EDIT 2:
Here is the function definition
unsigned char *hash_password(char *input_string)

In Python 3, "H86ETJJJJHGFTYHr" is a str object made up of Unicode codepoints. Your C function declaration is unsigned char *hash_password(char *input_string). Python str is marshaled as wchar_t* when passed via ctypes, not char*. Use a bytes object for that.
Assuming sizeof is ctypes.sizeof, it works like C and returns the size of the equivalent C object. For a c_wchar_p, that's a wchar_t*, and pointers typically have a size of 4 or 8 bytes (32- or 64-bit OS). It is not the length of the string.
It's also always a good idea to declare the argument types and return type of a function when using ctypes, so it can check the type and number of arguments correctly, instead of guessing:
import ctypes
dll = ctypes.CDLL('./your.dll')
dll.hash_password.argtypes = ctypes.c_char_p,
dll.hash_password.restype = ctypes.c_char_p
A quick-and-dirty example (note printf returns length of string printed):
>>> from ctypes import *
>>> dll = CDLL('msvcrt')
>>> dll.printf('hello\n') # ctypes assume wchar_t* for str, so passes UTF16-encoded data
h1 # of 68 00 65 00 ... and prints only to first null, 1 char.
>>> dll.printf.argtypes=c_char_p, # tell ctypes the correct argument type
>>> dll.printf('hello\n') # now it detects str is incorrect.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
>>> dll.printf(b'hello\n') # pass bytes, and `char*` is marshaled to C
hello
6
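The distinction between pointer size, buffer size, and string length can also be checked without loading any library. This is a small sketch using only ctypes itself; the password is the one from the question, written as a bytes literal so that it maps to char*.

```python
import ctypes

password = b"H86ETJJJJHGFTYHr"  # bytes, not str, so it matches char*

# A mutable char buffer holding the 16 characters plus the terminating NUL.
buf = ctypes.create_string_buffer(password)

pointer_size = ctypes.sizeof(ctypes.c_char_p)  # size of a pointer: 4 or 8
buffer_size = ctypes.sizeof(buf)               # size of the whole array
string_length = len(buf.value)                 # bytes up to the first NUL

print(pointer_size, buffer_size, string_length)
```

Here buffer_size is 17 and string_length is 16, while pointer_size depends only on the platform and says nothing about the string.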

In C, sizeof never returns the length of a string; it returns the size in memory of the variable.
For a string declared as
char *string;
Then string is a pointer to a character, and on your system it seems that pointers are 64 bits (i.e. 8 bytes).
When you do *string in C you get the content of the first element that string points to - i.e. a single character.
To get the length of a string in C, use strlen(my_string).

sizeof returns the size of an object in memory. This is not the same thing as the length of a string.
In your C code, my_string is a pointer to a string. sizeof(my_string) is the size of this pointer: 8 bytes on a 64-bit machine. sizeof(*my_string) is the size of what my_string points to. Since you're getting 1, it likely means that there's another problem in your C code, which is that you're mixing up single-byte characters (char, whose size is always 1 by definition) and wide characters (wchar_t, whose size is almost always 2 or 4).
Your string is a null-terminated wide character string. To obtain its length in C, call wcslen. Note that this means that your whole string processing code must use wchar_t and wcsxxx functions. If your string is a byte string, use char, strlen and other functions that work on byte strings.

Python struct.pack and unpack

I'm in no way an experienced Python programmer, which is why I believe there may be an obvious answer to this, but I just can't wrap my head around struct.pack and unpack.
I have the following code:
struct.pack("<"+"I"*elements, *self.buf[:elements])
I want to reverse this packing, but I'm not sure how. I know that "<" means little-endian and "I" is unsigned int, and that's about it; I'm not sure how to use struct.unpack to reverse the packing.

struct.pack takes non-byte values (e.g. integers, strings, etc.) and converts them to bytes. And conversely, struct.unpack takes bytes and converts them to their 'higher-order' equivalents.
For example:
>>> from struct import pack, unpack
>>> packed = pack('>hhl', 1, 2, 3)  # '>' fixes big-endian order and standard sizes
>>> packed
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpacked = unpack('>hhl', packed)
>>> unpacked
(1, 2, 3)
So in your instance, you have little-endian unsigned integers (elements many of them). You can unpack them using the same structure string (the '<' + 'I' * elements part) - e.g. struct.unpack('<' + 'I' * elements, value).
Example from: https://docs.python.org/3/library/struct.html
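A round trip for the format in the question might look like the sketch below. The values list is made up for illustration and stands in for self.buf[:elements].

```python
import struct

values = [1, 2, 3, 0xFFFFFFFF]   # hypothetical stand-in for self.buf[:elements]
elements = len(values)

fmt = "<" + "I" * elements       # equivalent to "<%dI" % elements
packed = struct.pack(fmt, *values)

# Unpacking with the same format string reverses the packing.
unpacked = struct.unpack(fmt, packed)

# If only the packed bytes are available, the count can be recovered
# from the length, because each "<I" item occupies exactly 4 bytes.
recovered_count = len(packed) // struct.calcsize("<I")

print(unpacked, recovered_count)
```

Note that unpack always returns a tuple, so wrap it in list() if the original container type matters.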

Looking at the documentation: https://docs.python.org/3/library/struct.html
obj = struct.pack("<"+"I"*elements, *self.buf[:elements])
struct.unpack("<"+"I"*elements, obj)
Does this work for you?

Read string from binary file

I want to read bytes 1,2 and 3 from a file. I know it corresponds to a string (in this case it's ELF of a Linux binary header)
Following examples I could find on the net I came up with this:
import struct

with open('hello', 'rb') as f:
    f.seek(1)
    bytes = f.read(3)
    st = struct.unpack('s', bytes)
    print st
Looking at the official documentation of struct it seems that passing s as argument should allow me to read a string.
I get the error:
st = struct.unpack('s', bytes)
struct.error: unpack requires a string argument of length 1
EDIT: Using Python 2.7

In your special case, it is enough to just check
if bytes == 'ELF':
which tests in one step that the three bytes are the characters E, L and F.
If you want the numerical values instead, you still do not need to unpack anything here. Just use ord(bytes[i]) (with i in 0, 1, 2) to get the values of the three bytes.
Alternatively you can use
byte_values = struct.unpack('bbb', bytes)
to get a tuple of the three bytes. You can also unpack that tuple on the fly in case the bytes have nameable semantics like this:
width, height, depth = struct.unpack('bbb', bytes)
Use 'BBB' instead of 'bbb' if your byte values should be unsigned.

In Python 2, read returns a string, in the sense of a "string of bytes". To get a single byte, use bytes[i]; it returns another string containing just that byte. If you need the numeric value of a byte, use ord: ord(bytes[i]). Finally, to get numeric values for all bytes, use map(ord, bytes).
In [4]: s = "foo"
In [5]: s[0]
Out[5]: 'f'
In [6]: ord(s[0])
Out[6]: 102
In [7]: map(ord, s)
Out[7]: [102, 111, 111]
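As for the error itself: a bare 's' in a struct format means a string of exactly one byte, and a count in front of it, as in '3s', gives the length. Below is a minimal sketch in Python 3 syntax (the question uses Python 2.7, where the same format string applies). An in-memory io.BytesIO object stands in for the real file, and the 8-byte header is made up for illustration.

```python
import io
import struct

# Hypothetical stand-in for open('hello', 'rb'): 0x7f followed by 'ELF'.
fake_file = io.BytesIO(b"\x7fELF\x02\x01\x01\x00")

fake_file.seek(1)
raw = fake_file.read(3)

# '3s' reads three bytes as one string; unpack still returns a tuple.
(magic,) = struct.unpack("3s", raw)

print(magic)
```

For a fixed-length string like this, unpacking adds little over using raw directly, which is the point made in the answers above.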

Python Struct, size changed by alignment.

Here's the hex code I am trying to unpack.
b'ABCDFGHa\x00a\x00a\x00a\x00a\x00\x00\x00\x00\x00\x00\x01' (it's not supposed to make any sense)
labels = unpack('BBBBBBBHHHHH5sB', msg)
struct.error: unpack requires a bytes argument of length 24
From what I counted, both of those are length = 23, both the format in my unpack function and the length of the hex values. I don't understand.
Thanks in advance

Most processors access data faster when the data is on natural boundaries, meaning data of size 2 should be on even addresses, data of size 4 should be accessed on addresses divisible by four, etc.
struct by default maintains this alignment. Since your structure starts out with 7 'B', a padding byte is added to align the next 'H' on an even address. To prevent this in Python, precede your string with '='.
Example:
>>> import struct
>>> struct.calcsize('BBB')
3
>>> struct.calcsize('BBBH')
6
>>> struct.calcsize('=BBBH')
5
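Applied to the format in the question, the same check looks like the sketch below. It uses repeat counts ('7B5H5sB' is the same format as 'BBBBBBBHHHHH5sB') and the '<' prefix, which assumes the message is little-endian; the native size mentioned in the comment assumes 2-byte alignment for unsigned short.

```python
import struct

msg = b'ABCDFGHa\x00a\x00a\x00a\x00a\x00\x00\x00\x00\x00\x00\x01'

# Native mode pads after the seven 'B' items so the first 'H' is aligned.
native_size = struct.calcsize('7B5H5sB')     # typically 24

# Standard mode has no padding, so the size matches the 23-byte message.
standard_size = struct.calcsize('<7B5H5sB')

labels = struct.unpack('<7B5H5sB', msg)
print(native_size, standard_size, len(msg), labels)
```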

I think H is enforcing 2-byte alignment after your 7 B.
Aha, the alignment info is at the top of http://docs.python.org/library/struct.html, not down by the definition of the format characters.
