Reading Strings from a binary file - python

I have a binary file written by a Delphi program. This is what I know:
Block 1: 4 bytes, a 32-bit integer value.
Block 2: A string value (its length is not fixed and differs between files)
Block 3: 4 bytes, a 32-bit integer value.
Block 4: A string value (its length is not fixed and differs between files)
...
Block N
I wrote this to read the first block's value:
import struct
f = open("filename", 'rb')
value = struct.unpack('i', f.read(4))
What about the string values? What would a good solution look like? Is there any way to iterate over the string and find the terminating "\0" delimiter of each string value, as in C?

It's a little more complex with unpack if you don't know the length. Here is a reference that should solve your problem:
packing and unpacking variable length array/string using the struct module in python

I discovered that Delphi uses a 7-bit integer compression at the beginning of a string to specify how many bytes need to be read. I found the same algorithm implemented in Python here. So I just have to pass the file to the decode7bit(bytes) function and it will tell me how many bytes I have to read next.
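A minimal sketch of that idea, assuming the length prefix is the common 7-bit encoding in which each byte carries seven data bits (least significant group first) and a set high bit means another byte follows. The names read_7bit_length and read_prefixed_string, and the latin-1 default encoding, are illustrative assumptions and not the decode7bit function mentioned above, so check the result against a real file before relying on it.

```python
import io


def read_7bit_length(stream):
    """Read a 7-bit encoded integer from a binary stream.

    Assumption: low 7 bits of each byte are data (least significant
    group first); a set high bit means another byte follows.
    """
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a length prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def read_prefixed_string(stream, encoding="latin-1"):
    """Read a length prefix, then that many bytes, and decode them."""
    length = read_7bit_length(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a string")
    return data.decode(encoding)


# Hypothetical sample data: the prefix 0x05 followed by five bytes of text.
sample = io.BytesIO(b"\x05hello")
print(read_prefixed_string(sample))  # hello
```

With a real file, the same functions could be called on the object returned by open("filename", "rb"), alternating with struct.unpack for the 4-byte integer blocks.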

Related

File compression/decompression of binary representations of integer lists

Currently, I have a system that converts a list of integers to their binary representations. I calculate the number of bytes each number requires and then use the to_bytes() function to convert them to bytes, like so:
import math
with open(outFileName, "wb") as o:
    for n in result:
        numBytes = math.ceil(n.bit_length() / 8)
        o.write(n.to_bytes(numBytes, 'little'))
However, since the encoded numbers are of varying lengths, how can an unpacking program or function know how many bytes each number occupies? I have heard of the struct module, and specifically the pack function, but with a focus on efficiency and on keeping the file as small as possible, what would be the best way to let such an unpacking program retrieve the exact list of originally encoded integers?
You can't. Your encoding maps different lists of integers to the same sequence of bytes, so it is impossible to know which one was the original input. For example, [1, 2] and [513] both produce the bytes 01 02.
You need a different encoding.
Take a look at using the high bit of each byte to mark whether more bytes follow. There are other ways that might be better, depending on the distribution of your integers, such as Golomb coding.
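As a rough sketch of the high-bit approach, assuming all integers are non-negative: each byte stores seven bits of the number and the top bit says whether another byte follows. The function names encode_varints and decode_varints are made up for this illustration.

```python
def encode_varints(numbers):
    """Encode non-negative integers, 7 data bits per byte.

    The high bit of a byte is set when more bytes of the same
    number follow, so the decoder can find the boundaries.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers are supported")
        while True:
            low = n & 0x7F
            n >>= 7
            if n:
                out.append(low | 0x80)  # more bytes follow
            else:
                out.append(low)         # last byte of this number
                break
    return bytes(out)


def decode_varints(data):
    """Recover the list of integers written by encode_varints."""
    numbers = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(current)
            current = 0
            shift = 0
    if shift:
        raise ValueError("data ends in the middle of a number")
    return numbers


values = [0, 1, 2, 513, 300]
assert decode_varints(encode_varints(values)) == values
```

Unlike the original scheme, [1, 2] and [513] now encode to different byte sequences, and small numbers still take a single byte.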

stuck in a python code for encryption. error is Unknown format code 'x' for object of type 'str'

I am very new to programming and cryptography, but I took part in a CTF competition where they provided us with a hex string that we are supposed to crack. Through some work and research, I got this code:
import binascii
my_ciphertext = "0f05080e1220360106190c3610061c360207061e361e01081d4e1a2e0600070e362607210c1b0c4814"
binary_rep_of_ciphertext = binascii.unhexlify(my_ciphertext)# makes it binary
array_of_ciphertext = bytearray(binary_rep_of_ciphertext)#makes binary things to array elements
def xor_string_and_char(my_char_value):
    result = ''.join([chr(cc ^ my_char_value) for cc in array_of_ciphertext])
    return '{:x}'.format(result) # convert back to hexadecimal
x = 0
assert x==0
while x in range(255):
    my_plaintext = xor_string_and_char(x)
    print('b' + my_plaintext)
    x=x+1
but I keep getting an error, and I don't know how to fix it. I am not sure what is wrong with the code because I am not good at Python at all (PS: please use newbie language to explain).
error:
Unknown format code 'x' for object of type 'str'
The problem is that the 'x' (hexadecimal) format code is defined for integers, not for strings. In your code, result is a string built by joining the chr() of each XORed byte, so '{:x}'.format(result) fails. Instead, use integers to represent bytes, and arrays of integers to represent multiple bytes, where each integer is in the range 0..255 inclusive.
Strings in Python are, fortunately, used for text: they consist of characters rather than bytes and should not be used to represent bytes. If your cryptographic algorithm takes bytes as input and produces bytes as output (which is commonly the case if you use XOR), then strings should only be used for reporting and debugging.
If you want to present bytes as hex, you can use hexlify, which is the reverse of the unhexlify used in your code. That is a rather old-school method, though: Python >= 3.5 provides other ways to handle byte arrays and hexadecimal encoding and decoding, such as the bytes.hex() method.
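One way the question's loop could be restructured along those lines is sketched below. It keeps everything as bytes and only converts to hex for printing. The helper name xor_with_key is invented here, and trying all 256 single-byte keys is an assumption about what the challenge intends.

```python
import binascii

CIPHERTEXT_HEX = "0f05080e1220360106190c3610061c360207061e361e01081d4e1a2e0600070e362607210c1b0c4814"
ciphertext = binascii.unhexlify(CIPHERTEXT_HEX)  # bytes, not str


def xor_with_key(data, key):
    """XOR every byte of data with a single-byte key (0..255)."""
    return bytes(b ^ key for b in data)


for key in range(256):
    candidate = xor_with_key(ciphertext, key)
    # .hex() gives the hexadecimal text; the bytes repr shows any readable ASCII.
    print(key, candidate.hex(), candidate)
```

Scanning the output for a line that reads as ordinary text is then a manual step.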

How to write a single hex value to file in Python?

I'm trying to write a single hex value, say 'F', to a file:
a = int('F', 16)
f.write(chr(a))
However, this code segment gives me a file containing 0F. I just want the single hex digit F in my file. I know this is because a char is represented by a byte. Is there a way to directly write the hex value without the padding?
Use string formatting:
f.write("{:X}".format(a))
It will write it as F:
>>> "{:X}".format(a)
'F'
What you are trying to do is not possible on most modern operating systems. The smallest data unit that a general-purpose computing platform can handle is one byte.
Check this wiki article for additional details. It states:
"Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures."
You can use the struct module to write raw data to a file. This will write a single byte to the file:
import struct
open('file','wb').write(struct.pack('b', 0xf))
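The two suggestions above produce different files, which a short comparison may make clearer. This is only an illustration that builds the bytes in memory instead of writing a file; the variable names are arbitrary.

```python
import struct

a = int("F", 16)  # the integer 15

# Text approach: the file holds the character 'F', stored as byte 0x46.
as_text = "{:X}".format(a).encode("ascii")

# Raw approach: the file holds one byte whose value is 15, that is 0x0F.
as_raw = struct.pack("B", a)

print(as_text, as_text.hex())  # b'F' 46
print(as_raw, as_raw.hex())    # b'\x0f' 0f
```

Either way the file is one byte long. A hex viewer shows 46 for the text version and 0F for the raw version.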

Binary I/O in Python

I am experimenting with binary reading and writing to and from files in Python. I am trying to teach myself a bit of programming (it's not really teaching myself, since I use the internet, but anyway...). My problem is that reading a file in Python in binary does not actually output the bits to me, but seems to process it into text already.
Example:
My system has a file "Test.txt" in the same folder as the script.
The content of this file is the following text written in notepad:
Testing Temp "Testing"
This is a small piece of the code that is giving me some confusion:
f=open("Test.txt", "rb")
print(f.read(22))
This results in the following output:
b'Testing Temp "Testing"'
However, I want bits in the form of a string (so a string of 0's and 1's) as output. How can I do this?
What you have is a sequence of bytes (note the b at the beginning).
You can access the value of every single byte using indexing. In your example, if s=f.read(22) then s[0] will be 84 which is the ASCII code for T.
If you want to obtain the binary representation of a byte you use the bin built-in:
>>> bin(84)
'0b1010100'
It also adds the 0b prefix which is python's prefix for binary literals:
>>> 0b1010100
84
To obtain the bit-per-bit binary representation you can simply access every byte and call bin on each value:
def to_bits(contents):
    return ''.join(bin(byte)[2:].zfill(8) for byte in contents)
which results in:
>>> to_bits(b'Testing Temp "Testing"')
'01010100011001010111001101110100011010010110111001100111001000000101010001100101011011010111000000100000001000100101010001100101011100110111010001101001011011100110011100100010'
Note that you have to call zfill(8) because bin can return a representation shorter than 8 bits:
>>> bin(1)[2:]
'1'
>>> bin(1)[2:].zfill(8)
'00000001'
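An alternative sketch that avoids slicing off the 0b prefix: the format built-in with the '08b' format spec pads each byte to eight binary digits directly. The function name to_bits_fmt is chosen here only to distinguish it from to_bits above.

```python
def to_bits_fmt(contents):
    """Return the bits of a bytes object as a string of 0s and 1s."""
    # '08b' means: binary digits, zero-padded to a width of 8.
    return "".join(format(byte, "08b") for byte in contents)


print(to_bits_fmt(b"T"))     # 01010100
print(to_bits_fmt(b"\x01"))  # 00000001
```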

How to unpack from a binary file a byte array using Python?

I'm giving myself a crash course in reading a binary file using Python. I'm new to both, so please bear with me.
The file format's documentation tells me that the first 16 bytes are a GUID and further reading tells me that this GUID is formatted thus:
typedef struct {
  unsigned long Data1;
  unsigned short Data2;
  unsigned short Data3;
  byte Data4[8];
} GUID,
  UUID,
  *PGUID;
I've got as far as being able to unpack the first three entries in the struct, but I'm getting stumped on #4. It's an array of 8 bytes, I think, but I'm not sure how to unpack it.
import struct
fp = open("./file.bin", mode='rb')
Data1 = struct.unpack('<L', fp.read(4)) # unsigned long, little-endian
Data2 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data3 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data4 = struct.unpack('<s', bytearray(fp.read(8))) # byte array with 8 entries?
struct.error: unpack requires a bytes object of length 1
What am I doing wrong for Data4? (I'm using Python 3.2 BTW)
Data1 thru 3 are OK. If I use hex() on them I am getting the correct data that I'd expect to see (woohoo) I'm just failing over on the syntax of this byte array.
Edit: Answer
I'm reading a GUID as defined in MS-DTYP and this nailed it:
data = uuid.UUID(bytes_le=fp.read(16))
If you want an 8-byte string, you need to put the number 8 in there:
struct.unpack('<8s', bytearray(fp.read(8)))
From the docs:
A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.
…
For the 's' format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
However, I'm not sure why you're doing this in the first place.
fp.read(8) gives you an 8-byte bytes object. You want an 8-byte bytes object. So, just do this:
Data4 = fp.read(8)
Converting the bytes to a bytearray has no effect except to make a mutable copy. Unpacking it just gives you back a copy of the same bytes you started with. So… why?
Well, actually, struct.unpack returns a tuple whose one value is a copy of the same bytes you started with, but you can do that with:
Data4 = (fp.read(8),)
Which raises the question of why you want four single-element tuples in the first place. You're going to be doing Data1[0], etc. all over the place for no good reason. Why not this?
Data1, Data2, Data3, Data4 = struct.unpack('<LHH8s', fp.read(16))
Of course if this is meant to read a UUID, it's always better to use the "batteries included" than to try to build your own batteries from nickel and cadmium ore. As icktoofay says, just use the uuid module:
data = uuid.UUID(bytes_le=fp.read(16))
But keep in mind that Python's uuid uses the 4-2-2-1-1-6 format, not the 4-2-2-8 format. If you really need exactly that format, you'll need to convert it, which means either struct or bit twiddling anyway. (Microsoft's GUID makes things even more fun by using a 4-2-2-2-6 format, which is not the same as either, and representing the first 3 in native-endian and the last two in big-endian, because they like to make things easier…)
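To see how the 4-2-2-8 struct layout lines up with what the uuid module returns, here is a small sketch on made-up data. The 16 sample bytes are hypothetical and stand in for fp.read(16).

```python
import struct
import uuid

raw = bytes(range(16))  # hypothetical sample: 00 01 02 ... 0f

# 4-2-2-8 layout: little-endian long, two shorts, then 8 raw bytes.
data1, data2, data3, data4 = struct.unpack("<LHH8s", raw)

# The same 16 bytes read through the uuid module.
guid = uuid.UUID(bytes_le=raw)

print(hex(data1), hex(data2), hex(data3), data4.hex())
print(guid)

# The first three uuid fields match Data1..Data3, and the last 8 bytes
# of the big-endian form match Data4.
assert guid.fields[:3] == (data1, data2, data3)
assert guid.bytes[8:] == data4
```

So if the 4-2-2-8 form is needed, Data4 can be taken from guid.bytes[8:] and the three integers from guid.fields without a second pass through struct.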
UUIDs are supported by Python with the uuid module. Do something like this:
import uuid
my_uuid = uuid.UUID(bytes_le=fp.read(16))
