change endian of hex in bitstring unpack - python

I'm using the module bitstring to unpack a 24 byte boundary file. I dont have control over the input file. The default interpretation of the module is big-endian apparently which is easy to fix when unpacking data types like int or float but some data I want to be represented as hex values. Using the unpack hex values it displays the incorrect byte ordering. Is there a fix for this? Example input: D806 desired output: 06D8
from bitstring import ConstBitStream
fp = ConstBitStream(filename="testfile.bin")
firstChunk = fp.read(2*8)
data=firstChunk.unpack('hex:16')
print(data)

You could use ordinary Python formatting on a little-endian integer interpretation.
Rather than a read then unpack you also can do both together:
print('{:0>4X}'.format(fp.read('uintle:16')))
This reads then next 16 bits from the stream, interprets it as an unsigned little-endian integer then formats it as four characters of hexadecimal, right-aligned and padded with zeros.

Related

How to convert with Python this hex to its decoded output?

I'm trying to convert the following (that seems to be an HEX) with Python with its decoded output:
I want to convert this:
To this:
How to do this?
This is the string:
0x00000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000000006331b7e000000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000474657374000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007566963746f727900000000000000000000000000000000000000000000000000
First you need to convert the hex into a bytearray:
hex = 0x00000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000000006331b7e000000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000474657374000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007566963746f727900000000000000000000000000000000000000000000000000
b = bytearray.fromhex(hex).decode()
Then you will need to determine the layout of the bytes. For example, an unit256 is probably 32 bytes which is 64 hex digits:
a = b[:64]
print(int.from_bytes(a, "big"))
Here I assume the bytes are in big-endian. If they are instead little-endian, you can use "little" instead of "big". You will need to learn about so-called "endianness" to understand this better.
You can get the other uint256 in a similar way.
As for the strings, I don't know what their length is. You will have to research the format for Ethereum blockchain data. Once you determine the length, you can use a similar technique to get the bytes for each string and then decode it into characters.
Just use the inbuilt decode function in Python:
str="0x00000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000000006331b7e000000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000474657374000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007566963746f727900000000000000000000000000000000000000000000000000" #YOUR HEX
str.decode("hex")
Alternatively, if that does not work, you can use:
bytearray.fromhex(str).decode()

Bytes representation in Python

I need to do some work with bytes in Python and I've come across a byte string I don't really understand:
b"H\x00\x84\xffQ\x00\xa6\xff+\x00\x96\xff\xc2\xffI\xff\xa5\xff'\xff\x8a\xff\x19\xff\x19\xff\xf6\xfe\xb0\xfe\xc7\xfeJ\xfel\xfe\xf8\xfd+\xfe\xef\xfd:\xfe\xc3\xfd*\xfe_\xfd\xdf\xfd\n\xfd\xa3\xfd\xc6\xfcq\xfd\xbd\xfc?\xfd"
So according to what I know, bytes should be represented as \xhh, where hh are hexadecimal values (from 0 to f). However, in the third segment, there is \xffQ, and farther on there are other characters which shouldn't appear: I, ', *, :, ? etc.
I've used hex() method to see what would be the outcome, and I got this:
480084ff5100a6ff2b0096ffc2ff49ffa5ff27ff8aff19ff19fff6feb0fec7fe4afe6cfef8fd2bfeeffd3afec3fd2afe5ffddffd0afda3fdc6fc71fdbdfc3ffd
As you can see, some parts of the hex are the same, but e.g. \xffQ was changed into ff51. I need to append some data to this byte string, so I'd like to know what's going on there (or how to get the same result).
Both repr and str when processing a bytes object will print ASCII characters where possible. Otherwise their hexadecimal values will be shown in the form \xNN.
It might help you to visualise the content if you print it as all hexadecimal as follows:
b = b"H\x00\x84\xffQ\x00\xa6\xff+\x00\x96\xff\xc2\xffI\xff\xa5\xff'\xff\x8a\xff\x19\xff\x19\xff\xf6\xfe\xb0\xfe\xc7\xfeJ\xfel\xfe\xf8\xfd+\xfe\xef\xfd:\xfe\xc3\xfd*\xfe_\xfd\xdf\xfd\n\xfd\xa3\xfd\xc6\xfcq\xfd\xbd\xfc?\xfd"
print(''.join(hex(b_) for b_ in b))
Output:
0x480x00x840xff0x510x00xa60xff0x2b0x00x960xff0xc20xff0x490xff0xa50xff0x270xff0x8a0xff0x190xff0x190xff0xf60xfe0xb00xfe0xc70xfe0x4a0xfe0x6c0xfe0xf80xfd0x2b0xfe0xef0xfd0x3a0xfe0xc30xfd0x2a0xfe0x5f0xfd0xdf0xfd0xa0xfd0xa30xfd0xc60xfc0x710xfd0xbd0xfc0x3f0xfd
Or you can use binascii module if you want to visualize the content:
import binascii
print(binascii.hexlify(b"hello world"))
Output:
68656c6c6f20776f726c64

How to unpack from a binary file a byte array using Python?

I'm giving myself a crash course in reading a binary file using Python. I'm new to both, so please bear with me.
The file format's documentation tells me that the first 16 bytes are a GUID and further reading tells me that this GUID is formatted thus:
typedef struct {
unsigned long Data1;
unsigned short Data2;
unsigned short Data3;
byte Data4[8];
} GUID,
UUID,
*PGUID;
I've got as far us being able to unpack the first three entries in the struct, but I'm getting stumped on #4. It's an array of 8 bytes I think but I'm not sure how to unpack it.
import struct
fp = open("./file.bin", mode='rb')
Data1 = struct.unpack('<L', fp.read(4)) # unsigned long, little-endian
Data2 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data3 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data4 = struct.unpack('<s', bytearray(fp.read(8))) # byte array with 8 entries?
struct.error: unpack requires a bytes object of length 1
What am I doing wrong for Data4? (I'm using Python 3.2 BTW)
Data1 thru 3 are OK. If I use hex() on them I am getting the correct data that I'd expect to see (woohoo) I'm just failing over on the syntax of this byte array.
Edit: Answer
I'm reading a GUID as defined in MS-DTYP and this nailed it:
data = uuid.UUID(bytes_le=fp.read(16))
If you want an 8-byte string, you need to put the number 8 in there:
struct.unpack('<8s', bytearray(fp.read(8)))
From the docs:
A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.
…
For the 's' format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
However, I'm not sure why you're doing this in the first place.
fp.read(8) gives you an 8-byte bytes object. You want an 8-byte bytes object. So, just do this:
Data4 = fp.read(8)
Converting the bytes to a bytearray has no effect except to make a mutable copy. Unpacking it just gives you back a copy of the same bytes you started with. So… why?
Well, actually, struct.unpack returns a tuple whose one value is a copy of the same bytes you started with, but you can do that with:
Data4 = (fp.read(8),)
Which raises the question of why you want four single-element tuples in the first place. You're going to be doing Data1[0], etc. all over the place for no good reason. Why not this?
Data1, Data2, Data3, Data4 = struct.unpack('<LHH8s', fp.read(16))
Of course if this is meant to read a UUID, it's always better to use the "batteries included" than to try to build your own batteries from nickel and cadmium ore. As icktoofay says, just use the uuid module:
data = uuid.UUID(bytes_le=fp.read(16))
But keep in mind that Python's uuid uses the 4-2-2-1-1-6 format, not the 4-2-2-8 format. If you really need exactly that format, you'll need to convert it, which means either struct or bit twiddling anyway. (Microsoft's GUID makes things even more fun by using a 4-2-2-2-6 format, which is not the same as either, and representing the first 3 in native-endian and the last two in big-endian, because they like to make things easier…)
UUIDs are supported by Python with the uuid module. Do something like this:
import uuid
my_uuid = uuid.UUID(bytes_le=fp.read(16))

python reverse byteorder from network service

I get the following bytes from a network service: \x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01 These are 8 bit number. I want to change the representation to my system's representation (32 bits) to be able to work on the bytes. How would I do this with python? Is there a special 'reverse' function for this?
best regards
If you have 8-bit numbers the byte order is irrelevant, as there is only one byte in each of them. If you want to convert every character to integer you can write:
struct.unpack("11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
struct.unpack("!11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
map(ord, "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
It's equivalent.
If string contains 16-bit or 32-bit integers, you can write things like:
struct.unpack("!IIHB", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
which would be decoded as two 4-byte, one 2-byte and one 1-byte unsigned integers. The ! (which is equivalent to big-endian >) means that string is in network byte order, so all integers larger than one byte can be converted correctly to your native byte order.
EDIT: If what you want is to get eleven numbers and process them in reversed order, you should use one of above methods and call reversed, for example: reversed(map(ord, data)); but this reverses the order regardless of your native byte order. You didn't say what the data really is thou and I'm not convinced endianness does matter here.
Determine which byte order the bytes are in, and supply the correct byte order character to struct.unpack.
If you want to reverse all of the bytes in a string, you can do this:
'example string'[::-1]
I would recommend the struct module for unpacking network or otherwise binary data, as you otherwise don't have a good way to tell where exactly the reversing needs to happen. It allows you to specify the byte order.
I'm not sure what you mean by 8308040460020081150101, but the struct package should have everything you need.
Have you looked at the core struct library? It has methods for converting byte orders.

Convert Python byte to "unsigned 8 bit integer"

I am reading in a byte array/list from socket. I want Python to treat the first byte as an "unsigned 8 bit integer". How is it possible to get its integer value as an unsigned 8 bit integer?
Use the struct module.
import struct
value = struct.unpack('B', data[0])[0]
Note that unpack always returns a tuple, even if you're only unpacking one item.
Also, have a look at this SO question.
bytes/bytearray is a sequence of integers. If you just access an element by its index you'll have an integer:
>>> b'abc'
b'abc'
>>> _[0]
97
By their very definition, bytes and bytearrays contain integers in the range(0, 256). So they're "unsigned 8-bit integers".
Another very reasonable and simple option, if you just need the first byte’s integer value, would be something like the following:
value = ord(data[0])
If you want to unpack all of the elements of your received data at once (and they’re not just a homogeneous array), or if you are dealing with multibyte objects like 32-bit integers, then you’ll need to use something like the struct module.

Categories

Resources