Python extraction of date data from a binary file

Python extraction of date data from a binary file - python

I have a file that I open in binary format using
with open(filename, 'br') as f2
I then need to extract certain blocks of Hex. There will be lots of these 'dates' in the code that will look like:
F2 96 E6 20 36 1B E4 40
I need to extract every instance of this in order for me to complete my date editing on it. Each 'date' will end with hex char 40 as above.
I have tried regex but these do not seem to work as I want.
For example
re.findall(b'............\\\x40', filename)
Can anyone assist?

I think your are confusing bytes with hex representation. 0x40 is a hexadecimal representation of the integer 64 and it's ascii code of the symbol #.
with open(filename, 'rb') as f:
results = re.findall('.{7}#', f.read())
print results
Please note, that following regexps are equivalent: '.{7}#', '.......#', '.......\x40'
>>> print 0x40, hex(64)
64 0x40
>>> print chr(0x40)
#

Related

How to read double, float and int values from binary files in python?

I have a binary file that was created in C++. The first value is double and the second is integer. I am reading the values fine using the following code in C++.
double dob_value;
int integer_value;
fread(&dob_value, sizeof(dob_value), 1, fp);
fread(&integer_value, sizeof(integer_value), 1, fp);
I am trying to read the same file in python but I am running into issues. My dob_value is 400000000.00 and my integer_value 400000. I am using following code in python for double.
def interpret_float(x):
return struct.unpack('d',x[4:]+x[:4])
with open(file_name, 'rb') as readfile:
dob = readfile.read(8)
dob_value = interpret_float(dob)[0]
val = readfile.read(4)
test2 = readfile.read(4)
integer_value = int.from_bytes(test2, "little")
My dob_value is 400000000.02384186 . My question is where is this extra decimals coming from? Also, how do I get the correct integer_value? With above code, my integer_value is 1091122467. I also have float values after integer but I haven't looked into that yet.

If the link goes broken and just in case the test.bin contains 00 00 00 00 84 D7 B7 41 80 1A 06 00 70 85 69 C0.
Your binary contains correct 41B7D78400000000 hexadecimal representation of 400000000.0 in the first 8 bytes. Running
import binascii
import struct
fname = r'test.bin'
with open(fname, 'rb') as readfile:
dob = readfile.read(8)
print(struct.unpack('d', dob)[0])
print(binascii.hexlify(dob))
outputs
>> 400000000.0
>> b'0000000084d7b741'
which is also correct little endian representation of the double. When you swap parts, you get
print(binascii.hexlify(dob[4:]+dob[:4]))
>> b'84d7b74100000000'
and if you check the decimal value, it will give you 5.45e-315, not what you expect. Moreover,
struct.unpack('d', dob[4:]+dob[:4])[0]
>>5.44740625e-315
So I'm not sure how you could get 400000000.02384186 from the code above. However, to obtain 400000000.02384186 using your test.bin, just skip the four bytes in the beginning:
with open(fname, 'rb') as readfile:
val = readfile.read(4)
dob = readfile.read(8)
dob = dob[4:]+dob[:4]
print(binascii.hexlify(dob))
print(struct.unpack('d', dob)[0])
>>b'801a060084d7b741'
>>400000000.02384186
Binary value 0x41B7D78400061A80 corresponds to 400000000.02384186. So you first read incorrect bytes, then incorrectly swap parts and get a result close to what you expect. Considering integer value, the 400000 is 0x00061A80, which is also present in the binary, but you definitely read past that bytes, since you used them for double, so you get wrong values.

How to read hex values at specific addresses in Python?

Say I have a file and I'm interested in reading and storing hex values at certain addresses, like the snippet below:
22660 00 50 50 04 00 56 0F 50 25 98 8A 19 54 EF 76 00
22670 75 38 D8 B9 90 34 17 75 93 19 93 19 49 71 EF 81
I want to read the value at 0x2266D, and be able to replace it with another hex value, but I can't understand how to do it. I've tried using open('filename', 'rb'), however this reads it as the ASCII representation of the values, and I don't see how to pick and choose when addresses I want to change.
Thanks!
Edit: For an example, I have
rom = open("filename", 'rb')
for i in range(5):
test = rom.next().split()
print test
rom.close()
This outputs: ['NES\x1a', 'B\x00\x00\x00\x00\x00\x00\x00\x00\x00!\x0f\x0f\x0f(\x0f!!', '!\x02\x0f\x0f', '!\x0f\x01\x08', '!:\x0f\x0f\x03!', '\x0f', '\x0f\x0f', '!', '\x0f\x0f!\x0f\x03\x0f\x12', '\x0f\x0f\x0f(\x02&%\x0f', '\x0f', '#', '!\x0f\x0f1', '!"#$\x14\x14\x14\x13\x13\x03\x04\x0f\x0f\x03\x13#!!\x00\x00\x00\x00\x00!!', '(', '\x0f"\x0f', '#\x14\x11\x12\x0f\x0f\x0f#', '\x10', "5'4\x0270&\x02\x02\x02\x02\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f126&\x13\x0f\x0f\x0f\x13&6222\x0f", '\x1c,', etc etc.
Much more than 5 bytes, and while some of it is in hex, some has been replaced with ASCII.

There's no indication that some of the bytes were replaced by their ASCII representations. Some bytes happen to be printable.
With a binary file, you can simply seek to the location offset and write the bytes in. Working with the line-iterator in the case of binary file is problematic, as there's no meaningful "lines" in the binary blob.
You can do in-place editing like follows (in fake Python):
with open("filename", "rb+") as f:
f.seek(0x2266D)
the_byte = f.read(1)
if len(the_byte) != 1:
# something's wrong; bolt out ...
else
transformed_byte = your_function(the_byte)
f.seek(-1, 1) # back one byte relative to the current position
f.write(transformed_byte)
But of course, you may want to do the edit on a copy, either in-memory (and commit later, as in the answer of #JosepValls), or on a file copy. The problem with gulping the whole file in memory is, of course, sometimes the system may choke ;) For that purpose you may want to mmap part of the file.

Given that is not a very big file (roms should fit fine in today's computer's memory), just do data = open('filename', 'rb').read(). Now you can do whatever you want to the data (if you print it, it will show ascii, but that is just data!). Unfortunately, string objects don't support item assignment, see this answer for more:
Change one character in a string in Python?
In your case:
data = data[0:0x2266C] + str(0xFF) + data[0x2266D:]

Write bitstream to file python 3 [duplicate]

I have a string (it could be an integer too) in Python and I want to write it to a file. It contains only ones and zeros I want that pattern of ones and zeros to be written to a file. I want to write the binary directly because I need to store a lot of data, but only certain values. I see no need to take up the space of using eight bit per value when I only need three.
For instance. Let's say I were to write the binary string "01100010" to a file. If I opened it in a text editor it would say b (01100010 is the ascii code for b). Do not be confused though. I do not want to write ascii codes, the example was just to indicate that I want to directly write bytes to the file.
Clarification:
My string looks something like this:
binary_string = "001011010110000010010"
It is not made of of the binary codes for numbers or characters. It contains data relative only to my program.

To write out a string you can use the file's .write method. To write an integer, you will need to use the struct module
import struct
#...
with open('file.dat', 'wb') as f:
if isinstance(value, int):
f.write(struct.pack('i', value)) # write an int
elif isinstance(value, str):
f.write(value) # write a string
else:
raise TypeError('Can only write str or int')
However, the representation of int and string are different, you may with to use the bin function instead to turn it into a string of 0s and 1s
>>> bin(7)
'0b111'
>>> bin(7)[2:] #cut off the 0b
'111'
but maybe the best way to handle all these ints is to decide on a fixed width for the binary strings in the file and convert them like so:
>>> x = 7
>>> '{0:032b}'.format(x) #32 character wide binary number with '0' as filler
'00000000000000000000000000000111'

Alright, after quite a bit more searching, I found an answer. I believe that the rest of you simply didn't understand (which was probably my fault, as I had to edit twice to make it clear). I found it here.
The answer was to split up each piece of data, convert them into a binary integer then put them in a binary array. After that, you can use the array's tofile() method to write to a file.
from array import *
bin_array = array('B')
bin_array.append(int('011',2))
bin_array.append(int('010',2))
bin_array.append(int('110',2))
with file('binary.mydata', 'wb') as f:
bin_array.tofile(f)

I want that pattern of ones and zeros to be written to a file.
If you mean you want to write a bitstream from a string to a file, you'll need something like this...
from cStringIO import StringIO
s = "001011010110000010010"
sio = StringIO(s)
f = open('outfile', 'wb')
while 1:
# Grab the next 8 bits
b = sio.read(8)
# Bail if we hit EOF
if not b:
break
# If we got fewer than 8 bits, pad with zeroes on the right
if len(b) < 8:
b = b + '0' * (8 - len(b))
# Convert to int
i = int(b, 2)
# Convert to char
c = chr(i)
# Write
f.write(c)
f.close()
...for which xxd -b outfile shows...
0000000: 00101101 01100000 10010000 -`.

Brief example:
my_number = 1234
with open('myfile', 'wb') as file_handle:
file_handle.write(struct.pack('i', my_number))
...
with open('myfile', 'rb') as file_handle:
my_number_back = struct.unpack('i', file_handle.read())[0]

Appending to an array.array 3 bits at a time will still produce 8 bits for every value. Appending 011, 010, and 110 to an array and writing to disk will produce the following output: 00000011 00000010 00000110. Note all the padded zeros in there.
It seems like, instead, you want to "compact" binary triplets into bytes to save space. Given the example string in your question, you can convert it to a list of integers (8 bits at a time) and then write it to a file directly. This will pack all the bits together using only 3 bits per value rather than 8.
Python 3.4 example
original_string = '001011010110000010010'
# first split into 8-bit chunks
bit_strings = [original_string[i:i + 8] for i in range(0, len(original_string), 8)]
# then convert to integers
byte_list = [int(b, 2) for b in bit_strings]
with open('byte.dat', 'wb') as f:
f.write(bytearray(byte_list)) # convert to bytearray before writing
Contents of byte.dat:
hex: 2D 60 12
binary (by 8 bits): 00101101 01100000 00010010
binary (by 3 bits): 001 011 010 110 000 000 010 010
^^ ^ (Note extra bits)
Note that this method will pad the last values so that it aligns to an 8-bit boundary, and the padding goes to the most significant bits (left side of the last byte in the above output). So you need to be careful, and possibly add zeros to the end of your original string to make your string length a multiple of 8.

numpy.array.tofile() binary file looks "strange" in notepad++

I am just wondering how the function actually stores the data. Because to me, it looks completely strange. Say I have the following code:
import numpy as np
filename = "test.dat"
print(filename)
fileobj = open(filename, mode='wb')
off = np.array([1, 300], dtype=np.int32)
off.tofile(fileobj)
fileobj.close()
fileobj2 = open(filename, mode='rb')
off = np.fromfile(fileobj2, dtype = np.int32)
print(off)
fileobj2.close()
Now I expect 8 bytes inside the file, where each element is represented by 4 bytes (and I could live with any endianness). However when I open up the file in a hex editor (used notepad++ with hex editor plugin) I get the following bytes:
01 00 C4 AC 00
5 bytes, and I have no idea at all what it represents. The first byte looks like it is the number, but then what follows is something weird, certainly not "300".
Yet reloading shows the original array.
Is this something I don't understand in python, or is it a problem in notepad++? - I notice the hex looks different if I select a different "encoding" (huh?). Also: Windows does report it being 8 bytes long.

You can tell very easily that the file actually does have 8 bytes, the same 8 bytes you'd expect (01 00 00 00 2C 01 00 00) just by using anything other than Notepad++ to look at the file, including just replacing your off = np.fromfile(fileobj2, dtype=np.int32) with off = fileobj2.read()thenprinting the bytes (which will give youb'\x01\x00\x00\x00,\x01\x00\x00'`*).
And, from your comments, after I suggested that, you tried it, and saw exactly that.
Which means this is either a bug in Notepad++, or a problem with the way you're using it; Python, NumPy, and your own code are perfectly fine.
* In case it isn't clear: '\x2c' and ',' are the same character, and bytes uses the printable ASCII representation for printable ASCII characters, as well as familiar escapes like '\n', when possible, only using the hex backslash escape for other values.

What are you expecting 300 to look like?
Write the array, and read it back as binary (in ipython):
In [478]: np.array([1,300],np.int32).tofile('test')
In [479]: with open('test','rb') as f: print(f.read())
b'\x01\x00\x00\x00,\x01\x00\x00'
There are 8 bytes, , is just a displayable byte.
Actually, I don't have to go through a file to get this:
In [505]: np.array([1,300]).tostring()
Out[505]: b'\x01\x00\x00\x00,\x01\x00\x00'
Do the same with:
[255]
b'\xff\x00\x00\x00'
[256]
b'\x00\x01\x00\x00'
[300]
b',\x01\x00\x00'
[1,255]
b'\x01\x00\x00\x00\xff\x00\x00\x00'
With powers of 2 (and 1 less) it is easy to identify a pattern in the bytes.
frombuffer converts a byte string back to an array:
In [513]: np.frombuffer(np.array([1,300]).tostring(),int)
Out[513]: array([ 1, 300])
In [514]: np.frombuffer(np.array([1,300]).data,int)
Out[514]: array([ 1, 300])
Judging from this last expression, the tofile is just writing the array buffer to the file as bytes.

How to Read-in Binary of a File in Python

In Python, when I try to read in an executable file with 'rb', instead of getting the binary values I expected (0010001 etc.), I'm getting a series of letters and symbols that I do not know what to do with.
Ex: ???}????l?S??????V?d?\?hG???8?O=(A).e??????B??$????????: ???Z?C'???|lP#.\P?!??9KRI??{F?AB???5!qtWI??8𜐮???!ᢉ?]?zъeF?̀z??/?n??
How would I access the binary numbers of a file in Python?
Any suggestions or help would be appreciated. Thank you in advance.

That is the binary. They are stored as bytes, and when you print them, they are interpreted as ASCII characters.
You can use the bin() function and the ord() function to see the actual binary codes.
for value in enumerate(data):
print bin(ord(value))

Byte sequences in Python are represented using strings. The series of letters and symbols that you see when you print out a byte sequence is merely a printable representation of bytes that the string contains. To make use of this data, you usually manipulate it in some way to obtain a more useful representation.
You can use ord(x) or bin(x) to obtain decimal and binary representations, respectively:
>>> f = open('/tmp/IMG_5982.JPG', 'rb')
>>> data = f.read(10)
>>> data
'\x00\x00II*\x00\x08\x00\x00\x00'
>>> data[2]
'I'
>>> ord(data[2])
73
>>> hex(ord(data[2]))
'0x49'
>>> bin(ord(data[2]))
'0b1001001'
>>> f.close()
The 'b' flag that you pass to open() does not tell Python anything about how to represent the file contents. From the docs:
Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
Unless you just want to look at what the binary data from the file looks like, Mark Pilgrim's book, Dive Into Python, has an example of working with binary file formats. The example shows how you can read IDv1 tags from an MP3 file. The book's website seems to be down, so I'm linking to a mirror.

Each character in the string is the ASCII representation of a binary byte. If you want it as a string of zeros and ones then you can convert each byte to an integer, format it as 8 binary digits and join everything together:
>>> s = "hello world"
>>> ''.join("{0:08b}".format(ord(x)) for x in s)
'0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
Depending on if you really need to analyse / manipulate things at the binary level an external module such as bitstring could be helpful. Check out the docs; to just get the binary interpretation use something like:
>>> f = open('somefile', 'rb')
>>> b = bitstring.Bits(f)
>>> b.bin
0100100101001001...

Use ord(x) to get the integer value of each byte.
>>> with open('settings.dat', 'rb') as file:
... data = file.read()
...
>>> for index, value in enumerate(data):
... print '0x%08x 0x%02x' % (index, ord(value))
...
0x00000000 0x28
0x00000001 0x64
0x00000002 0x70
0x00000003 0x30
0x00000004 0x0d
0x00000005 0x0a
0x00000006 0x53
0x00000007 0x27
0x00000008 0x4d
0x00000009 0x41
0x0000000a 0x49
0x0000000b 0x4e
0x0000000c 0x5f
0x0000000d 0x57
0x0000000e 0x49
0x0000000f 0x4e

If you realy want to convert the binaray bytes to a stream of bits, you have to remove the first two chars ('0b') from the output of bin() and reverse the result:
with open("settings.dat", "rb") as fp:
print "".join( (bin(ord(c))[2:][::-1]).ljust(8,"0") for c in fp.read() )
If you use Python prior to 2.6, you have no bin() function.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python extraction of date data from a binary file - python

Related

How to read double, float and int values from binary files in python?

How to read hex values at specific addresses in Python?

Write bitstream to file python 3 [duplicate]

numpy.array.tofile() binary file looks "strange" in notepad++

How to Read-in Binary of a File in Python

Categories

Resources